DEV Community

jeikabu

Posted on • Originally published at rendered-obsolete.github.io on

Home Assistant Voice Recognition with Rhasspy

With the impending demise of Snips, I’ve been looking for a suitable replacement offline speech recognition solution. After some research, Rhasspy seems like a real winner: it supports a variety of toolkits, has good documentation, and is easy to get working.

A Raspberry Pi3B with stock Debian really struggled with this at times. It might be possible to alleviate this by picking different services or adjusting other configuration, but you might be better off just using a more powerful device (like a Pi4 or Jetson Nano) or running it remotely.

Installation

Normally, I like to go through manual installation. But installing Pocketsphinx and OpenFST for Jasper was enough of a headache that I decided to go the container route.

Follow the Rhasspy installation docs. I’m running both Hass and Rhasspy on the same Raspberry Pi. From my PC I connect to the pi as pi3.local; adjust this based on the name of your device or use the IP address. If working directly on the device, everything is localhost.

If you haven’t already, install docker using the convenience script:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
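Note the group change only takes effect after you log out and back in. After that, a quick sanity check (hello-world is Docker’s stock test image):

```shell
# Confirm the daemon is reachable without sudo
docker version
# Pull and run a throwaway test container
docker run --rm hello-world
```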

You can run Rhasspy with the recommended:

docker run -d -p 12101:12101 \
      --restart unless-stopped \
      -v "$HOME/.config/rhasspy/profiles:/profiles" \
      --device /dev/snd:/dev/snd \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en

Or, use docker-compose:

  1. Install docker-compose via alternative install options:

    sudo pip install docker-compose
    
  2. Use the recommended docker-compose.yml:

    rhasspy:
        image: "synesthesiam/rhasspy-server:latest"
        restart: unless-stopped
        volumes:
            - "$HOME/.config/rhasspy/profiles:/profiles"
        ports:
            - "12101:12101"
        devices:
            - "/dev/snd:/dev/snd"
        command: --user-profiles /profiles --profile en
    
  3. Run: docker-compose up
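If you’d rather not tie up a terminal, the same stack can also run detached:

```shell
# Start in the background, then follow the service's log output
docker-compose up -d
docker-compose logs -f rhasspy
```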

If docker-compose up fails with ImportError: No module named ssl_match_hostname, see this issue:

# Remove problematic `ssl-match-hostname`
sudo pip uninstall backports.ssl-match-hostname docker-compose
# Install alternative `ssl-match-hostname`
sudo apt-get install -y python-backports.ssl-match-hostname \
    python-backports.shutil-get-terminal-size
# Reinstall docker-compose
sudo pip install docker-compose

Docker Shell

When running things with docker, getting a shell in the context of a container takes an extra step.

  1. Show running containers with docker ps or docker container ls:

    CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS         PORTS                      NAMES
    4181a2880c84   synesthesiam/rhasspy-server:latest   "/run.sh --user-prof…"   26 hours ago   Up 4 minutes   0.0.0.0:12101->12101/tcp   pi_rhasspy_1
    
  2. Get a shell to the container:

    pc> docker exec -it pi_rhasspy_1 /bin/bash
    # Now you're in the container
    root@4181a2880c84:/#
    

Replace pi_rhasspy_1 with the “container id” or “name” of the appropriate container.

Configuration

Once docker outputs rhasspy_1 | Running on https://0.0.0.0:12101 (CTRL + C to quit), Rhasspy should be up and running. Ignore what it says and use http instead of https: point your browser at http://pi3.local:12101.

At this point I was able to configure everything via the Settings tab. Should that not cooperate, everything can also be done via json.

Audio

The first things to get working are audio input and output. Refer back to an earlier post about working with ALSA.

  1. Settings > Microphone (“Audio Recording”)
    1. Use arecord directly (ALSA) (default is PyAudio)
    2. Select appropriate Input Device
  2. Settings > Sounds (“Audio Playing”)
    1. Use aplay directly (ALSA)
    2. Select appropriate Output Device

To verify audio recording/playback works, use arecord and aplay from a docker shell.
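For example, a minimal record-and-playback loop from the docker shell (the duration and format flags here are just reasonable defaults, not anything Rhasspy requires; use -D to target a specific card):

```shell
# List capture and playback hardware
arecord -l
aplay -l
# Record 3 seconds of 16-bit, 16 kHz mono audio, then play it back
arecord -d 3 -f S16_LE -r 16000 -c 1 test.wav
aplay test.wav
```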

If you want to use PyAudio for input instead of ALSA, it’s handy to see what PyAudio sees:

# Install pyaudio for Python 3
sudo apt-get install -y python3-pyaudio
# Launch python REPL
python3

Then, run the following (from SO#1, SO#2):

import pyaudio
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))

## OR

import pyaudio
p = pyaudio.PyAudio()
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(numdevices):
    if p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels') > 0:
        print("Input Device id", i, "-", p.get_device_info_by_host_api_device_index(0, i).get('name'))

Rhasspy TTS

Testing text-to-speech also seems to be the easiest way to validate that audio output is working.

  1. Settings > Text to Speech
    • eSpeak didn’t work for me, but both flite and pico-tts did
  2. Speech tab, in Sentence put hello and Speak
  3. Check Log tab for FliteSentenceSpeaker lines to see e.g. command lines it’s using

Intent Recognition

One way to validate audio input is to set up Rhasspy to recognize intents.

  1. Settings > Intent Recognition
    • Default OpenFST should work
  2. Sentences tab to configure recognized intents
  3. Speech tab, use Hold to Record or Tap to Record for mic input
  4. Saying what time is it should output:
{
  "intent": {
    "entities": {},
    "hass_event": {
      "event_data": {},
      "event_type": "rhasspy_GetTime"
    },
    "intent": {
      "confidence": 1,
      "name": "GetTime"
    },
    "raw_text": "what time is it"
  }
}
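You can also exercise intent recognition without a microphone: Rhasspy exposes a text-to-intent HTTP endpoint alongside the UI (path as documented for the Rhasspy 2.4 HTTP API):

```shell
# Submit text and get back the recognized intent as JSON
curl -X POST -d "what time is it" http://pi3.local:12101/api/text-to-intent
```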

Wake word

Another way to validate audio input is to set up a phrase that triggers Rhasspy to recognize intents (i.e. “hey siri”, “ok google”, etc.).

  1. Settings > Wake Word
  2. PocketSphinx is the only fully open/offline option
    • “Wake Keyphrase” is the trigger phrase
  3. Save Settings and wait for Rhasspy to restart
  4. Train (mentioned in the docs)
  5. Check Log for PocketsphinxWakeListener: Hotword detected

If your wake keyphrase contains a new word, the log will complain it’s not in dictionary.txt after you Save Settings:

[WARNING:955754080] PocketsphinxWakeListener: XXX not in dictionary
[DEBUG:3450672] PocketsphinxWakeListener: Loading wake decoder with hmm=/profiles/en/acoustic_model, dict=/profiles/en/dictionary.txt

It seems like either adding a custom word via the Words tab and/or hitting Train should fix this, but I haven’t yet figured out the correct incantation.

Hass Integration

Integrating with Home Assistant is accomplished by leveraging Hass’ REST API and POSTing to the /api/events endpoint.

  1. Hass: Create long-lived access token
    1. Open Hass: http://pi3.local:8123/profile
    2. Long-Lived Access Tokens > Create Token
    3. Also read the Hass authentication docs
  2. Rhasspy: Configure intent handling with Hass
    1. Open Rhasspy: http://pi3.local:12101/
    2. Settings > Intent Handling
    3. Hass URL http://172.17.0.1:8123 (the docker host, 172.17.0.2 is the container itself)
      • If not using docker could instead use localhost
    4. Access Token the token from above
    5. Save Settings > OK to restart

Check Hass REST API is working:

curl -X GET -H "Authorization: Bearer <ACCESS TOKEN>" -H "Content-Type: application/json" http://pi3.local:8123/api/

Should return:

{"message": "API running."}

Note that from within the container you can’t connect to services outside the container using localhost. There are a few ways around this, but that’s why we’re using 172.17.0.1 above:

# Shell into container
docker exec -it pi_rhasspy_1 /bin/bash
# Try Hass REST API to `localhost`
curl -X GET -H "Authorization: Bearer <ACCESS TOKEN>" -H "Content-Type: application/json" http://localhost:8123/api/
curl: (7) Failed to connect to localhost port 8123: Connection refused
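Rather than hard-coding 172.17.0.1, you can ask the container itself: on the default bridge network, the docker host is the container’s default gateway:

```shell
# Inside the container: the default route points at the docker host
ip route | awk '/default/ { print $3 }'
```

Whatever address this prints is what belongs in Rhasspy’s Hass URL setting.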

Let’s test the Rhasspy->Hass connection:

  1. Open Hass: http://pi3.local:8123/profile
  2. Developer Tools > Events > Listen to events
    • rhasspy_GetTime and Start Listening.
  3. Like for intent recognition, say “what time is it”
  4. Hass should output:
{
  "event_type": "rhasspy_GetTime",
  "data": {},
  "origin": "REMOTE",
  "time_fired": "2019-12-17T16:02:51.366090+00:00",
  "context": {
      "id": "012345678901234567890123456789",
      "parent_id": null,
      "user_id": "deadbeefdeadbeefdeadbeefdeadbeef"
  }
}
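To isolate the Hass side, you can also fire the event by hand; Home Assistant’s REST API accepts POST /api/events/<event_type>:

```shell
# Simulate Rhasspy by firing rhasspy_GetTime directly
curl -X POST \
    -H "Authorization: Bearer <ACCESS TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{}' \
    http://pi3.local:8123/api/events/rhasspy_GetTime
```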

Let’s test Hass automation:

  1. Open Hass: http://pi3.local:8123/profile
  2. Configuration > Automation > +
  3. Create an Event trigger:
    • Triggers
      • Trigger type: Event
    • Actions
      • Action type: Call service
      • Service: system_log.write
      • Service data: {message: 'Hello event'}
  4. Like for intent recognition, say “what time is it”
  5. In Hass, Developer Tools > Logs should show the message.

Hass TTS

To use Rhasspy’s TTS we can leverage its REST API:

curl -X POST -d "hello world" http://pi3.local:12101/api/text-to-speech

To trigger this from Hass, we can use the RESTful Command integration. In configuration.yaml:

rest_command:
  tts:
    url: http://localhost:12101/api/text-to-speech
    method: POST
    payload: '{{ message }}'

The payload is a Jinja2 template that can be set by the caller.

Test the tts REST command:

  1. Open Hass: http://pi3.local:8123/profile
  2. Developer Tools > Services
  3. Specify rest_command.tts service and with data message: "hello"
  4. Call Service to trigger Rhasspy TTS
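The same service call also works from the command line, since Hass exposes services at /api/services/<domain>/<service>:

```shell
# Trigger rest_command.tts without the web UI
curl -X POST \
    -H "Authorization: Bearer <ACCESS TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"message": "hello"}' \
    http://pi3.local:8123/api/services/rest_command/tts
```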

Let’s add it to our Hass automation:

  1. Configuration > Automation
  2. Edit the previous item (click the pencil ✎)
  3. Add Action :
    • Action type: Call service
    • Service: rest_command.tts (it should auto-complete for you)
    • Service data: {message: 'hello world'}
  4. Like for intent recognition, say “what time is it”

This should trigger a full loop:

speech -> Rhasspy -> intent -> Hass -> text -> Rhasspy -> speech

Systemd

I’d like Rhasspy to auto-start similar to Hass.

It would seem that mixing docker with systemd is bad mojo, making me contemplate re-installing Hass via docker. Docker says little on starting containers with systemd other than don’t cross the streams with restarts. And so far google has turned up dubious results, mostly from several years ago, that don’t work with current versions of docker.

Create /etc/systemd/system/rhasspy@homeassistant.service:

[Unit]
Description=Rhasspy
Wants=home-assistant@homeassistant.service 
Requires=docker.service
After=home-assistant@homeassistant.service docker.service

[Service]
Type=exec
ExecStart=docker run --rm \
    --name rhasspy \
    -p 12101:12101 \
    -v "/home/homeassistant/.config/rhasspy/profiles:/profiles" \
    --device /dev/snd:/dev/snd \
    synesthesiam/rhasspy-server:latest \
    --user-profiles /profiles --profile en 
ExecStop=docker stop rhasspy
# Restart on failure
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Wants/Requires/After: Docker must be running, and ideally Hass is too (but we can start Rhasspy without it)
Type: a stronger requirement than simply ensuring the process starts
ExecStart: starts the container
ExecStop: stops the container

For ExecStart, note a few differences from the original docker run:

--rm Remove the container on exit. Otherwise we get “name taken” errors on restarts.
--name Give it a predictable name to simplify ExecStop, and make it easier to open docker shells
-v Docker defaults to creating files as root. /srv/ might be better, but I thought this would make the profiles easier to find.
--restart unless-stopped Removed since systemd is managing the lifetime.

Configure it to auto-start and start it:

sudo systemctl --system daemon-reload
sudo systemctl enable rhasspy@homeassistant
sudo systemctl start rhasspy@homeassistant

To debug:

# Check running containers
docker container ls
# Check log output
sudo journalctl -f -u rhasspy@homeassistant
# Open docker shell
docker exec -it rhasspy /bin/bash

Note: if you forget to remove $HOME from the docker run command, the service will fail with:

Dec 18 19:00:25 pi3 docker[4764]: /usr/bin/docker: invalid reference format.

Top comments (5)

Karolis (@krusenas)

Great, this is on my to-do list! :) I will try running model inference on an intel NUC to see if it works better than rpi.

jeikabu

I have a friend that loves those and uses them extensively. I like that they have a wide range of features/capabilities, but have yet to actually get one.

Karolis (@krusenas)

Wouldn't buy it again though, I would look for something with an AMD CPU instead of Intel :)

jeikabu

Something based on Raven Ridge or a newer APU would be interesting.

Those NUCs are a bit pricier than I expected. Looks like min-spec starts at $200+. It’s a bit over my impulse-buy limit, but I could see grabbing just one.

Ahmad H. Ibrahim (@amadib)

Curious if you got a chance to consider or evaluate the Almond+Ada integration?