DEV Community

Dilek Karasoy for Picovoice


Evaluating Hotword Detection Software

So far, we've covered adding hotwords to several platforms with:
Python
Vue
Angular
No-code Platform for MCUs
Raspberry Pi
Arduino
Today, let's go over how to scientifically evaluate the performance of several hotword detection engines.

Hotword detection accuracy is measured by two parameters: False Rejection Rate (FRR) and False Acceptance Rate (FAR). The FRR is the probability that the software misses the wake-up phrase. The FAR is usually expressed per hour: the number of times the wake word software incorrectly detects the wake phrase in an hour of audio. Ideally, both would be zero. In practice, vendors often quote only one of the two.
For example, when someone claims their wake word software is 99% accurate, they may mean that its FRR is 1%. Without the FAR per hour, that figure is misleading, because an engine can reach 100% detection accuracy (0% FRR) simply by triggering so readily that its FAR is high.
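
To make the trade-off concrete, here is a minimal Python sketch of how the two numbers could be computed. It assumes a simplified engine that emits one confidence score per audio clip and fires when the score reaches a threshold. The `Metrics` class, the `evaluate` function, and all of the scores are hypothetical and made up for illustration; they are not taken from the benchmark itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    frr: float           # fraction of wake-phrase clips the engine missed
    far_per_hour: float  # false detections per hour of audio without the wake phrase


def evaluate(keyword_scores: List[float],
             background_scores: List[float],
             background_hours: float,
             threshold: float) -> Metrics:
    """Compute FRR and FAR per hour for one detection threshold.

    Assumption: the engine fires whenever a clip's score is >= threshold.
    """
    missed = sum(1 for s in keyword_scores if s < threshold)
    false_alarms = sum(1 for s in background_scores if s >= threshold)
    return Metrics(frr=missed / len(keyword_scores),
                   far_per_hour=false_alarms / background_hours)


# Made-up scores: four clips containing the wake phrase, and five clips
# without it that together cover two hours of audio.
KEYWORD_SCORES = [0.9, 0.8, 0.6, 0.4]
BACKGROUND_SCORES = [0.7, 0.5, 0.3, 0.2, 0.1]
BACKGROUND_HOURS = 2.0

if __name__ == "__main__":
    for threshold in (0.0, 0.5, 0.75):
        m = evaluate(KEYWORD_SCORES, BACKGROUND_SCORES, BACKGROUND_HOURS, threshold)
        print(f"threshold={threshold}: FRR={m.frr:.2f}, FAR/hour={m.far_per_hour:.2f}")
```

With a threshold of 0.0, this toy engine never misses the wake phrase, yet it fires on every background clip. That is why a single accuracy figure says little without the matching FAR per hour.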

This benchmark was developed on Ubuntu 20.04 with Python 3.8 using the LibriSpeech, DEMAND, and crowdsourced datasets, and was then open-sourced. You can substitute other datasets. You can find the repo on Picovoice's GitHub.

Clone the repository using:

git clone --recurse-submodules git@github.com:Picovoice/wakeword-benchmark.git

Make sure the Python packages listed in requirements.txt are installed for your Python version, since the engines are run through their Python bindings.
Of the benchmarked engines, PocketSphinx can be installed from PyPI, while Porcupine and Snowboy are obtained by cloning their repos. You can use other engines as well; just make sure you follow their installation instructions before proceeding to the next step.
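
Before launching a long run, it can help to confirm that the bindings are importable from the Python interpreter you plan to use. The sketch below is one way to do that. The module names in `ASSUMED_MODULES` are assumptions about what each engine installs and may not match your setup, and `check_bindings` is a hypothetical helper, not part of the benchmark.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed top-level module names; adjust them to whatever your engines install.
ASSUMED_MODULES = ("pocketsphinx", "pvporcupine", "snowboydetect")


def check_bindings(module_names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules this interpreter can locate.

    find_spec only looks the module up; it does not import or run it.
    """
    return {name: importlib.util.find_spec(name) is not None
            for name in module_names}


if __name__ == "__main__":
    for name, found in check_bindings(ASSUMED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same `python3` you will use for the benchmark, since packages installed for another Python version will not be found.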

To see the available options, run:

python3 benchmark.py -h

Then run the accuracy benchmark:

python3 benchmark.py \
--librispeech_dataset_path ${LIBRISPEECH_DATASET_PATH} \
--demand_dataset_path ${DEMAND_DATASET_PATH} \
--keyword ${KEYWORD} \
--access-key ${ACCESS_KEY}
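
The command above relies on four shell variables, and an unset one expands to an empty argument. As a hedged sketch, a small Python wrapper could check them first and build the same argument list. `build_command` and `REQUIRED_VARS` are hypothetical names introduced here; only the flags and variable names come from the command above.

```python
from typing import List, Mapping

# Flag -> environment variable, mirroring the shell command above.
REQUIRED_VARS = {
    "--librispeech_dataset_path": "LIBRISPEECH_DATASET_PATH",
    "--demand_dataset_path": "DEMAND_DATASET_PATH",
    "--keyword": "KEYWORD",
    "--access-key": "ACCESS_KEY",
}


def build_command(env: Mapping[str, str]) -> List[str]:
    """Return the benchmark argv, or raise if a variable is unset or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError("unset environment variables: " + ", ".join(missing))
    command = ["python3", "benchmark.py"]
    for flag, var in REQUIRED_VARS.items():
        command += [flag, env[var]]
    return command
```

Passing `os.environ` to `build_command` and handing the result to `subprocess.run(..., check=True)` from the repository directory would then start the run, failing early with a readable message if a variable is missing.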

Learn more about evaluating hotword detection software performance
