August 16th, 2024 · 2 min read
There are several approaches for adding speech recognition capabilities to a Python application. In this article, I’d like to introduce a new paradigm for adding purpose-built, context-aware voice assistants to Python apps using the Picovoice platform.
Picovoice enables developers to create voice experiences similar to Alexa and Google for existing Python apps. Unlike cloud-based alternatives, Picovoice is:
- Private and secure — no voice data leaves the app
- Accurate — focused on the domain of interest
- Cross-platform — Linux, macOS, Windows, Raspberry Pi, …
- Reliable and zero-latency — eliminates unpredictable network delays
In what follows, I’ll introduce Picovoice by building a voice-enabled alarm clock using the Picovoice SDK, Picovoice Console, and the Tkinter GUI framework. The code is open-source and available on Picovoice’s GitHub repository.
1 — Install Picovoice
Install Picovoice from a terminal:
pip3 install picovoice
2 — Create an Instance of Picovoice
Picovoice is an end-to-end voice recognition platform with wake word detection and intent inference capabilities. Picovoice uses the Porcupine Wake Word engine for voice activation and the Rhino Speech-to-Intent engine for inferring intent from follow-on voice commands. For example, when a user says:
Picovoice, set an alarm for 2 hours and 31 seconds.
Porcupine detects the utterance of the Picovoice wake word. Rhino then infers the user’s intent from the follow-on command and returns a structured inference:
{
  is_understood: true,
  intent: setAlarm,
  slots: {
    hours: 2,
    seconds: 31
  }
}
Create an instance of Picovoice by providing paths to Porcupine and Rhino models and callbacks for wake word detection and inference completion:
from picovoice import Picovoice

access_key = "${YOUR_ACCESS_KEY}"  # AccessKey obtained from Picovoice Console (see step 3)

keyword_path = ...  # path to Porcupine wake word file (.PPN)

def wake_word_callback():
    # invoked when the wake word is detected
    pass

context_path = ...  # path to Rhino context file (.RHN)

def inference_callback(inference):
    # invoked once Rhino finishes inference on the follow-on command
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")

pv = Picovoice(
    access_key=access_key,
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)
Several pre-trained Porcupine and Rhino models are available on their GitHub repositories [1][2]. For this demo, we use the pre-trained Picovoice Porcupine model and the pre-trained Alarm Rhino model. Developers can also create custom models using Picovoice Console.
3 — Get your Free AccessKey
Sign up for Picovoice Console to get your AccessKey; it is free. The AccessKey is used for authentication and authorization when using the Picovoice SDK.
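To avoid hard-coding the key, you can read it from the environment at startup. A minimal sketch, assuming the key is stored in a (hypothetical) PICOVOICE_ACCESS_KEY environment variable:
import os

# Hypothetical setup: the AccessKey is stored in the PICOVOICE_ACCESS_KEY environment variable.
access_key = os.environ["PICOVOICE_ACCESS_KEY"]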
4 — Process Audio with Picovoice
Once the engine is instantiated, it can process a stream of audio. Simply pass frames of audio to the engine:
pv.process(audio_frame)
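Each frame must be single-channel, 16-bit PCM audio. The required frame length and sample rate are exposed as properties on the engine instance; a quick check (not part of the demo itself):
# The engine expects frames of `pv.frame_length` samples recorded at `pv.sample_rate` Hz
# (typically 512 samples at 16000 Hz).
print(pv.frame_length)
print(pv.sample_rate)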
5 — Read audio from the Microphone
Install pvrecorder (pip3 install pvrecorder). Then, read the audio:
from pvrecorder import PvRecorder

# `-1` selects the default input audio device; the frame length must match the engine's.
recorder = PvRecorder(frame_length=pv.frame_length, device_index=-1)
recorder.start()
Read frames of audio from the recorder and pass them to Picovoice’s .process method:
pcm = recorder.read()
pv.process(pcm)
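Putting the two together, a minimal processing loop could look like the sketch below; the actual demo drives this from the GUI, and the cleanup calls here are illustrative:
# Sketch of a blocking processing loop with illustrative cleanup.
try:
    while True:
        pv.process(recorder.read())
except KeyboardInterrupt:
    pass
finally:
    recorder.stop()
    recorder.delete()
    pv.delete()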
6 — Create a Cross-Platform GUI using Tkinter
Tkinter is the standard GUI framework shipped with Python. Create a window, add a label to it showing the remaining time, and launch the app:
import tkinter as tk

def on_close():
    recorder.stop()  # stop recording before closing the window
    window.destroy()

window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()
window.protocol('WM_DELETE_WINDOW', on_close)
window.mainloop()
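To keep the label current, the countdown needs a periodic refresh. One way to do this, as a sketch rather than the demo’s exact code, is Tkinter’s after() scheduler; alarm_deadline is a hypothetical timestamp, and these lines would go just before window.mainloop():
import time

alarm_deadline = time.time() + 90  # hypothetical: alarm set to fire 90 seconds from now

def refresh_label():
    # recompute the remaining time and schedule the next refresh in one second
    remaining = max(0, int(alarm_deadline - time.time()))
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    time_label.configure(text=f'{hours:02d} : {minutes:02d} : {seconds:02d}')
    window.after(1000, refresh_label)

refresh_label()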
7 — Putting it Together
The full demo is about 200 lines of code covering the GUI, audio recording, and voice recognition. Audio processing runs on a separate thread so it does not block the main GUI thread; a sketch of that worker thread follows.
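As a sketch (the is_running flag and the thread start are illustrative, not the demo’s exact code):
import threading

is_running = True  # hypothetical flag; the real demo clears it on shutdown

def run_audio():
    # read and process audio off the main thread so the Tkinter event loop stays responsive
    recorder.start()
    while is_running:
        pv.process(recorder.read())
    recorder.stop()

threading.Thread(target=run_audio, daemon=True).start()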
If you have technical questions or suggestions, please open an issue on Picovoice’s GitHub repository. If you wish to modify or improve this demo, feel free to submit a pull request.