LISTSERV 16.5 - CODE4LIB Archives

Another vote for Whisper.

I am very much not a Python and command-line person. I can muddle through, but would prefer any other option.

So on Windows I've been using StoryToolkitAI, which is built on Whisper and adds a GUI. It's primarily designed for integration with a video editing tool, but you can ignore that part entirely.

https://github.com/octimot/StoryToolkitAI
https://github.com/octimot/StoryToolkitAI/releases


The results are astonishingly good. I've used StoryToolkitAI to transcribe interviews, translate a podcast from spoken Spanish to English text, and generate subtitle files for recorded conference sessions.
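
For anyone who does want to script the subtitle step, here is a minimal sketch in Python. It assumes the openai-whisper package (with ffmpeg installed) and relies on `transcribe()` returning a `segments` list of dicts with `start`, `end`, and `text` keys. The function names are made up for illustration, and this is not how StoryToolkitAI does it internally.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' (seconds) and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base",
                      translate: bool = False) -> str:
    """Hypothetical helper: run Whisper on one file and return SRT text."""
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    # task="translate" asks Whisper for English output from non-English speech.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("session.mp4", translate=True)` on a Spanish recording would be the scripted equivalent of the podcast translation mentioned above; the file name is only a placeholder.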

-Chad



---

Chad Haefele

he/him/his

Head of User Experience

UNC University Libraries

[log in to unmask]






________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Martin, Will <[log in to unmask]>
Sent: Wednesday, December 13, 2023 2:30 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: [CODE4LIB] Voice to text software - Windows or Linux

I had good luck using Whisper to transcribe a bunch of recordings of phone calls. They were mostly short, and the audio quality was pretty poor since they were recorded on a bulk system in the '90s, but it got most of it.

Occasionally it would get stuck in a loop, resulting in odd output where the transcript contained the same phrase several times over before Whisper finally realized there was nothing more in that section of the audio and it was time to move on to the next chunk.
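
One way to tidy up that kind of looped output afterwards is to drop back-to-back segments whose text is identical. The sketch below is an assumption about how that cleanup could look, not something Whisper does itself; it expects a list of dicts with a `text` key, which is the shape of the `segments` list that openai-whisper returns, and the function name is hypothetical.

```python
def collapse_repeats(segments, max_repeats: int = 1):
    """Keep at most `max_repeats` consecutive segments with the same text.

    Comparison ignores case and extra whitespace. Only back-to-back
    duplicates are dropped; a phrase that recurs later is kept.
    """
    kept = []
    last_key = None
    run_length = 0
    for seg in segments:
        key = " ".join(seg["text"].lower().split())
        if key == last_key:
            run_length += 1
        else:
            last_key = key
            run_length = 1
        if run_length <= max_repeats:
            kept.append(seg)
    return kept
```

This will also remove genuine repetition (a caller saying "hello?" three times, say), so it is worth spot-checking against the audio before trusting it on a whole collection.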

Still, it transcribed about 2,000 phone calls in a little over a week of processing on a reasonably standard desktop computer, which is so, so much faster than manual transcription.
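
A run that lasts a week will probably get interrupted at some point, so a batch script benefits from being restartable. The following is a minimal sketch under that assumption, not the script used for the phone calls: it writes one `.txt` per audio file and skips anything already transcribed. The folder layout, the suffix list, and the function names are all hypothetical; `whisper_transcriber` assumes the openai-whisper package.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to match the collection.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}


def transcribe_folder(audio_dir, out_dir, transcribe):
    """Transcribe every audio file in audio_dir that has no transcript yet.

    `transcribe` is any callable taking a file path and returning text.
    Returns the list of transcript paths written during this run.
    """
    audio_dir, out_dir = Path(audio_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(audio_dir.iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        target = out_dir / (audio.stem + ".txt")
        if target.exists():
            continue  # finished in an earlier run
        text = transcribe(str(audio))
        # Write to a temporary name first so a crash never leaves a
        # half-written file that looks like a finished transcript.
        partial = out_dir / (audio.stem + ".txt.part")
        partial.write_text(text, encoding="utf-8")
        partial.replace(target)
        written.append(target)
    return written


def whisper_transcriber(model_name: str = "base"):
    """Load a Whisper model once and return a path -> text callable."""
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"].strip()
```

Usage would be along the lines of `transcribe_folder("calls", "transcripts", whisper_transcriber())`; rerunning the same command after an interruption picks up where it left off.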

Will Martin

-----Original Message-----
From: Code for Libraries <[log in to unmask]> On Behalf Of Majewski, Steven Dennis (sdm7g)
Sent: Monday, December 11, 2023 3:08 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Voice to text software - Windows or Linux


++ for Whisper.
I’ve used it to transcribe podcasts.
It occasionally has problems identifying obscure and unusual words, but it seems to work better than most.

When I tested it on an episode of the Tetrapod Zoology podcast, it mis-transcribed:

“Tetrapod Zoology” as “Tedgeport Zoology” podcast,
“Mesozoic Art” as “Mises Ewick Art”, and
“TetZoo” as “Tetsu”

But everything else looks spot on.
(I haven’t checked the spelling of proper names, but human transcribers would have trouble there.)

It worked even better on a podcast with less technical jargon and neologisms.
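
For recurring mis-hearings like the ones listed above, a small find-and-replace pass over the finished transcript is one possible fix. This is a minimal sketch assuming you maintain the glossary by hand; the dictionary holds only the three examples from this message, and the function name is hypothetical.

```python
import re

# Known mis-transcriptions mapped to the intended terms (hand-maintained).
CORRECTIONS = {
    "Tedgeport Zoology": "Tetrapod Zoology",
    "Mises Ewick Art": "Mesozoic Art",
    "Tetsu": "TetZoo",
}


def apply_corrections(text: str, corrections=CORRECTIONS) -> str:
    """Replace whole-word, case-insensitive matches of known mis-hearings."""
    if not corrections:
        return text
    # Longest phrases first so a multi-word entry wins over a shorter one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

The word-boundary match keeps "Tetsu" from being rewritten inside a longer word. Separately, openai-whisper's `transcribe()` accepts an `initial_prompt` argument that some people fill with expected vocabulary; whether that would have helped with this episode is untested here.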

There are other ports/implementations of Whisper:

In C++:  https://github.com/ggerganov/whisper.cpp
& Rust: https://github.com/Gadersd/whisper-burn

And others using different ML frameworks than PyTorch, which are worth trying if you need more performance.


- sdm


> On Dec 10, 2023, at 2:52 PM, Sove67 <[log in to unmask]> wrote:
>
> Hi Charles,
>
> I'm not familiar with any on that list, but you may be interested in
> Whisper: https://github.com/openai/whisper
>
> Rather than traditional "pipeline" ASR software, this model utilizes
> machine learning. It was built by OpenAI, the folks who made ChatGPT. I've
> been impressed by it.
>
>   1. It runs on local hardware, no connection to an exterior server needed
>   2. Able to translate speech to text in real time. (Kaldi, the topmost
>   example on your list has been noted as being "several times slower":
>   https://deepgram.com/learn/benchmarking-top-open-source-speech-models#kaldi-gigaspeech-xl
>   )
>   3. Open Source software licensed under the MIT license, so it can be
>   used & modified for free in private or commercial settings.
>   4. I know it runs on Windows 10 (see the setup section on the GitHub
>   page), and it should be compatible with Linux systems, using an installer
>   like Anaconda:
>   https://www.linuxlinks.com/machine-learning-linux-whisper-automatic-speech-recognition-system/
>
>
> Best of luck with your project!
> - Kaleb A (Langara LIT Student)
>
> On Fri, Dec 8, 2023 at 1:28 PM charles meyer <[log in to unmask]> wrote:
>
>> My esteemed listmates,
>>
>> Has anyone used any of these in Windows 10 or any Linux distro?
>>
>>
>> https://www.ubuntupit.com/best-open-source-speech-recognition-tools-for-linux/
>>
>> Thank you,
>>
>> Charles.
>>
>> Charlotte County Public Library
>>