End-to-End Speech Recognition with Python

Let's start with why you should use the Picovoice Python SDK when there are alternative libraries and in-depth tutorials on speech recognition with Python.

Private - processes voice data on the device
Cross-platform - Linux, macOS, Windows, Raspberry Pi, …
Real-time - audio is processed locally, so there is no network round trip

I guess I don't need to add "accurate": I haven't seen any vendor claim mediocre accuracy 🙃

Now, let's get started!

1 — Install Picovoice

pip3 install picovoice

2 — Create a Picovoice Instance
The Picovoice SDK consists of Porcupine Wake Word, which enables custom hotwords, and Rhino Speech-to-Intent, which enables custom voice commands. Together they enable hands-free experiences. For example:
Porcupine, set an alarm for 1 hour and 13 seconds.
Porcupine detects the hotword "Porcupine", then Rhino captures the user’s intent and returns the intent and its details, as shown below:

{
    is_understood: true,
    intent: setAlarm,
    slots: {
        hours: 1,
        seconds: 13
    }
}
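
To set the alarm, the slots from an inference like the one above have to be turned into a single duration. Here is a minimal sketch of that conversion. The helper name `slots_to_seconds`, the lookup table, and the "minutes" slot are assumptions for illustration and are not part of the Picovoice SDK; slot values are converted with `int` in case they arrive as strings.

```python
# Assumed seconds-per-unit table. "hours" and "seconds" mirror the example
# output above; "minutes" is a hypothetical extra slot.
SECONDS_PER_UNIT = {"hours": 3600, "minutes": 60, "seconds": 1}


def slots_to_seconds(slots):
    """Convert a slots mapping such as {"hours": "1", "seconds": "13"} to seconds."""
    total = 0
    for name, value in slots.items():
        if name in SECONDS_PER_UNIT:
            # Slot values may arrive as strings, so convert defensively.
            total += int(value) * SECONDS_PER_UNIT[name]
    return total


print(slots_to_seconds({"hours": "1", "seconds": "13"}))  # 3613
```

Slots that are not time units are ignored, so the same helper keeps working if the context later gains other slot types.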

To create a Picovoice instance we need paths to the Porcupine and Rhino models, plus callbacks for hotword detection and inference completion. For simplicity, we'll use pre-trained Porcupine and Rhino models; however, you can train custom ones on the Picovoice Console. While exploring the Picovoice Console, grab your AccessKey, too! Signing up for the Picovoice Console is free, with no credit card required.

from picovoice import Picovoice

keyword_path = ...  # path to Porcupine wake word file (.PPN)
def wake_word_callback():
    # Called when Porcupine detects the wake word
    pass

context_path = ...  # path to Rhino context file (.RHN)
def inference_callback(inference):
    # Called when Rhino finishes inferring the intent of the command
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")

pv = Picovoice(
    access_key="${YOUR_ACCESS_KEY}",  # AccessKey from the Picovoice Console
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)

Do not forget to replace the model path and AccessKey placeholders.

3 — Process Audio with Picovoice
Pass frames of audio to the engine:

pv.process(audio_frame)
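
The engine consumes audio one fixed-size frame at a time; the instance reports the size and sample rate it expects through `pv.frame_length` and `pv.sample_rate`. If your audio is already in memory rather than coming from a microphone, it has to be cut into frames first. A minimal sketch, assuming a plain list of 16-bit samples; the helper `split_into_frames` and the frame length of 512 used in the demo are illustrative assumptions, so read the real value from `pv.frame_length`:

```python
def split_into_frames(samples, frame_length):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        yield samples[start:start + frame_length]


# 1200 silent samples with an assumed frame length of 512 give two full frames.
frames = list(split_into_frames([0] * 1200, 512))
print(len(frames), len(frames[0]))  # 2 512
```

Each yielded frame could then be handed to `pv.process(frame)`.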

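4 — Read Audio from the Microphone
Install [pvrecorder](https://pypi.org/project/pvrecorder/) with pip3 install pvrecorder, then read the audio:

from pvrecorder import PvRecorder
# `-1` is the default input audio device.
# frame_length matches the frame size the engine expects.
recorder = PvRecorder(device_index=-1, frame_length=pv.frame_length)
recorder.start()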

Read an audio frame from the recorder and pass it to the .process method, repeating for as long as the app should listen:

pcm = recorder.read()
pv.process(pcm)
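
In a running app, the read-and-process step repeats until something asks it to stop. Below is one way to structure that loop so the recorder is always stopped on exit. `FakeRecorder` and `FakeEngine` are hypothetical stand-ins that keep the sketch runnable without a microphone or an AccessKey; `run_audio_loop` and its `max_frames` parameter are also assumptions for illustration.

```python
import threading


class FakeRecorder:
    """Hypothetical stand-in for a recorder: returns silent frames."""

    def __init__(self, frame_length):
        self.frame_length = frame_length
        self.is_recording = False

    def start(self):
        self.is_recording = True

    def read(self):
        return [0] * self.frame_length

    def stop(self):
        self.is_recording = False


class FakeEngine:
    """Hypothetical stand-in for the engine: counts the frames it receives."""

    def __init__(self):
        self.frames_seen = 0

    def process(self, pcm):
        self.frames_seen += 1


def run_audio_loop(recorder, engine, stop_event, max_frames=None):
    """Feed frames to the engine until stop_event is set or max_frames is hit."""
    recorder.start()
    try:
        count = 0
        while not stop_event.is_set():
            engine.process(recorder.read())
            count += 1
            if max_frames is not None and count >= max_frames:
                break
    finally:
        # Stop the recorder even if processing raises.
        recorder.stop()


stop_event = threading.Event()
recorder = FakeRecorder(frame_length=512)
engine = FakeEngine()
run_audio_loop(recorder, engine, stop_event, max_frames=10)
print(engine.frames_seen)  # 10
```

With the real objects, the same function could run in a background thread, for example `threading.Thread(target=run_audio_loop, args=(recorder, pv, stop_event), daemon=True).start()`, so that it does not compete with a GUI event loop; setting `stop_event` then ends the loop.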

5 — Create a GUI with Tkinter
Tkinter is the standard GUI framework shipped with Python. Create a window, add a label showing the remaining time to it, then launch:

import tkinter as tk

window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()

# Assumed minimal close handler; also stop the recorder here if it is running.
def on_close():
    window.destroy()

window.protocol('WM_DELETE_WINDOW', on_close)

window.mainloop()
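
The label starts at '00 : 00 : 00', so the remaining time needs to be rendered in that same layout on every update. A minimal sketch of the formatting step, kept separate from Tkinter so it can be checked on its own; the helper name `format_remaining` is an assumption for illustration:

```python
def format_remaining(total_seconds):
    """Render a duration in seconds as 'HH : MM : SS', clamping at zero."""
    total_seconds = max(0, int(total_seconds))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d} : {minutes:02d} : {seconds:02d}"


print(format_remaining(3613))  # 01 : 00 : 13
```

In the GUI, a callback could call `time_label.configure(text=format_remaining(remaining))` and reschedule itself with `window.after(1000, ...)` to refresh the label once per second.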

Some resources:
Source code for the tutorial
Original Medium Article
Picovoice SDK
Picovoice Console