kama-meshi

The awesome speech recognition toolkit: Vosk!

What is Vosk?

Vosk is a speech recognition toolkit supporting over 20 languages.
The language model is lightweight at about 50 MB and easy to embed, so you can easily run speech recognition completely offline.

Vosk provides bindings for Python, Java, C#, and also Node.js!

  • Supports 20+ languages and dialects
  • Works offline, even on lightweight devices - Raspberry Pi, Android, iOS

See Vosk's page for details.

Let's try!

Install Vosk

Let's try Vosk with Python!
Vosk can be installed with pip. However, I prefer Poetry, so I'll install it with that.

⚠️ Poetry will try to install the latest version (0.3.38), but that version is not compatible with macOS. So I pinned the version to the one that pip installs. (as of 2022-05-19)

You can download the Python module (test_simple.py) from the Vosk examples.

Download the language model

The language model is available here. Extract the zip file and place it in a folder named model in your working directory.

Prepare an audio file

You will need an audio file in the correct format: 16 kHz, 16-bit, mono PCM in a WAV file.

If you are an English speaker, you can get the test audio from the Vosk examples.

You can convert your own audio with ffmpeg.

ffmpeg -i my_voice.wav -ar 16000 -ac 1 -c:a pcm_s16le my_voice_16khz.wav
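
If you want to double-check the converted file before running Vosk, Python's standard wave module can read the header. This is only a minimal sketch: check_wav_format is a hypothetical helper name I made up, and it assumes the file is a WAV file that should be 16 kHz, 16-bit, mono PCM as described above.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file looks fine.

    Hypothetical helper, not part of Vosk. It only inspects the WAV header.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {wf.getsampwidth() * 8}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
    return problems
```

Calling check_wav_format("my_voice_16khz.wav") should return an empty list when the conversion worked. If the file has no WAV header at all, wave.open raises wave.Error instead.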

Run Vosk

Run the Python module...

Run in terminal

Done!! 🎉
There are some differences, but Vosk also recognized the Japanese kanji characters. 🀄

I'm a Japanese speaker, so I recognized a Japanese audio file.
The text of the audio is "ご視聴ありがとうございました！グッドボタンとチャンネル登録よろしくお願いします！" ("Thank you for watching! Please hit the like button and subscribe to the channel!").

The complete commands are below.

poetry add vosk@0.3.32
curl -O https://raw.githubusercontent.com/alphacep/vosk-api/v0.3.32/python/example/test_simple.py
curl -O https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip
unzip vosk-model-small-ja-0.22.zip
mv vosk-model-small-ja-0.22/ model/
poetry run python test_simple.py my_voice_16khz.wav
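
If you are curious what the example script does, here is a rough sketch of the same idea, not a copy of test_simple.py. It reads the WAV file in chunks, feeds them to a KaldiRecognizer, and collects the "text" field from the JSON results. The function names transcribe and extract_text are my own hypothetical names, and the sketch assumes the model has been unpacked into a folder called model.

```python
import json
import wave


def extract_text(result_json):
    """Pull the recognized text out of one Vosk JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe(wav_path, model_dir="model"):
    """Feed a WAV file to Vosk in chunks and return the recognized text.

    Hypothetical helper; assumes a 16-bit mono PCM WAV file and a model
    unpacked into model_dir.
    """
    # Imported here so extract_text can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns a true value when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # FinalResult flushes whatever audio is still buffered.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)
```

With the model and audio file in place, print(transcribe("my_voice_16khz.wav")) would print the recognized sentence.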

The code is on GitHub and Replit.
I hope you'll enjoy Vosk too! Thank you.

kama-meshi / HelloVosk

Sample Vosk repl with Python.

Hello Vosk

This is a sample repl for Vosk with Python.

Sample voice

Let's recognize this voice 🎤

"ご視聴ありがとうございました！グッドボタンとチャンネル登録よろしくお願いします！"

Usage

poetry install
poetry run python main.py

My repl is on Replit.

https://replit.com/@kama-meshi/HelloVosk

Special Thanks
