I had good luck using Whisper to transcribe a bunch of recordings of phone calls. They were mostly short, and the audio quality was pretty poor since they were recorded on a bulk system in the '90s, but Whisper got most of the content.
Occasionally it would get stuck in a loop, resulting in odd output where the transcript contained the same phrase several times over before Whisper finally realized there was nothing more in that section of the audio and it was time to move on to the next chunk.
Still, it transcribed about 2,000 phone calls in a little over a week of processing on a reasonably standard desktop computer, which is so, so much faster than manual transcription.
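For the looping problem, a small post-processing pass over the transcript text can collapse a phrase that repeats back to back. This is only a minimal sketch of one way to clean up afterwards, not something built into Whisper; the function name collapse_repeats and both of its parameters are made up for illustration, and it only catches exact, consecutive repeats.

```python
def collapse_repeats(text, min_phrase_words=3, max_phrase_words=12):
    """Collapse a phrase that repeats back to back into a single copy.

    Hypothetical helper for cleaning up looped transcript output.
    Phrases shorter than min_phrase_words are left alone so that
    ordinary repetition ("very very") is not touched.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        longest = min(max_phrase_words, (len(words) - i) // 2)
        # Try the longest candidate phrase first.
        for n in range(longest, min_phrase_words - 1, -1):
            phrase = words[i:i + n]
            j = i + n
            # Skip over every immediate repeat of this phrase.
            while words[j:j + n] == phrase:
                j += n
            if j > i + n:
                out.extend(phrase)
                i = j
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


looped = ("thank you for calling thank you for calling "
          "thank you for calling how can I help")
print(collapse_repeats(looped))
```

Comparison is exact, so repeats that differ in capitalization or punctuation would slip through; normalizing the words before comparing is an obvious extension.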
Will Martin
-----Original Message-----
From: Code for Libraries <[log in to unmask]> On Behalf Of Majewski, Steven Dennis (sdm7g)
Sent: Monday, December 11, 2023 3:08 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Voice to text software - Windows or Linux
++ for Whisper.
I’ve used it transcribing podcasts.
It occasionally has problems identifying obscure and unusual words, but it seems to work better than most.
When I tested it on an episode of the Tetrapod Zoology podcast, it mis-transcribed:
“Tetrapod Zoology” as “Tedgeport Zoology”,
“Mesozoic Art” as “Mises Ewick Art”, and
“TetZoo” as “Tetsu”.
But everything else looks spot on.
(I haven’t checked the spelling of proper names, but human transcribers would have trouble there too.)
It worked even better on a podcast with less technical jargon and fewer neologisms.
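If the same unusual terms keep coming out wrong, one option is a find-and-replace pass over the finished transcript. The following is a rough sketch using only the Python standard library; the CORRECTIONS table holds the three mis-hearings listed above, and the name apply_corrections is hypothetical. It assumes you have already spotted the bad spellings by hand.

```python
import re

# Mis-hearings noted above, mapped to the intended terms.
CORRECTIONS = {
    "Tedgeport Zoology": "Tetrapod Zoology",
    "Mises Ewick Art": "Mesozoic Art",
    "Tetsu": "TetZoo",
}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace known mis-transcriptions with the intended terms.

    Hypothetical helper: matches whole words or phrases only, and
    tries longer entries first so a short entry cannot clobber part
    of a longer one.
    """
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    return pattern.sub(lambda m: corrections[m.group(1)], text)


print(apply_corrections("Welcome to the Tedgeport Zoology podcast, or Tetsu."))
```

Matching is case-sensitive and exact, so this only helps with errors that recur in the same form.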
There are other ports/implementations of Whisper:
In C++: https://github.com/ggerganov/whisper.cpp
In Rust: https://github.com/Gadersd/whisper-burn
There are also others using ML frameworks other than PyTorch, which are worth trying if you need more performance.
- sdm
> On Dec 10, 2023, at 2:52 PM, Sove67 <[log in to unmask]> wrote:
>
> Hi Charles,
>
> I'm not familiar with any on that list, but you may be interested in
> Whisper: https://github.com/openai/whisper
>
> Rather than traditional "pipeline" ASR software, this model utilizes
> machine learning. It was built by OpenAI, the folks who made ChatGPT. I've
> been impressed by it.
>
> 1. It runs on local hardware, no connection to an exterior server needed
> 2. Able to transcribe speech to text in real time. (Kaldi, the topmost
> example on your list, has been noted as being "several times slower":
> https://deepgram.com/learn/benchmarking-top-open-source-speech-models#kaldi-gigaspeech-xl
> )
> 3. Open Source software licensed under the MIT license, so it can be
> used & modified for free in private or commercial settings.
> 4. I know it runs on Windows 10 (see the setup section on the GitHub
> page), and it should be compatible with Linux systems, using an installer
> like Anaconda:
> https://www.linuxlinks.com/machine-learning-linux-whisper-automatic-speech-recognition-system/
>
>
> Best of luck with your project!
> - Kaleb A (Langara LIT Student)
>
> On Fri, Dec 8, 2023 at 1:28 PM charles meyer <[log in to unmask]> wrote:
>
>> My esteemed listmates,
>>
>> Has anyone used any of these in Windows 10 or any Linux distro?
>>
>>
>> https://www.ubuntupit.com/best-open-source-speech-recognition-tools-for-linux/
>>
>> Thank you,
>>
>> Charles.
>>
>> Charlotte County Public Library
>>