Aswin O
Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi

Introduction

In this post, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:

  • Capture voice input through a web interface.
  • Process the text using a local LLM (like Mistral) running on the Pi.
  • Generate voice responses using Piper for text-to-speech (TTS).
  • Stream everything in real-time via WebSockets.

All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!


1. Setting up the Raspberry Pi

First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.

Steps:

  1. Update the system:

   sudo apt-get update
   sudo apt-get upgrade

  2. Enable the audio interface:

   sudo raspi-config

Navigate to System Options > Audio and select the correct output/input device.
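
Before moving on, it’s worth confirming that the USB microphone and speaker actually work from the command line. A quick check, assuming ALSA’s standard tools (arecord/aplay) are available:

   # List capture devices and note the card/device of the USB microphone
   arecord -l

   # Record five seconds from the default capture device, then play it back
   arecord -d 5 -f cd test.wav
   aplay test.wav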


2. Installing Ollama for Local LLMs

Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I pulled the Mistral model:

ollama pull mistral

To confirm it works, I ran a quick test:

ollama run mistral

The model was ready to process text right on the Pi!
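
Besides the CLI, Ollama also exposes a local HTTP API on port 11434, which comes in handy later when wiring it into a server. A quick sanity check from the Pi (the prompt here is just an example):

   curl http://localhost:11434/api/generate -d '{
     "model": "mistral",
     "prompt": "Say hello in one short sentence.",
     "stream": false
   }'

With "stream" left at its default of true, the API returns the response token by token, which is the building block for real-time streaming to the browser.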


3. Setting up Piper for Text-to-Speech (TTS)

For offline voice generation, I chose Piper — a fantastic open-source TTS engine.

  1. Install dependencies:

   sudo apt-get install wget build-essential libsndfile1

  2. Download Piper for ARM64 (Raspberry Pi):

   wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
   tar -xvzf piper_arm64.tar.gz
   chmod +x piper
   sudo mv piper /usr/local/bin/

  3. Test if Piper works:

   echo "Hello, world!" | piper --model en_US --output_file output.wav
   aplay output.wav

Now the Pi could "talk" back!
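
For lower latency you can also skip the intermediate WAV file and stream Piper’s raw audio straight to the speaker. A minimal sketch, assuming your Piper build supports the --output-raw flag and the voice outputs 16-bit audio at 22.05 kHz (the default for most Piper voices):

   echo "Streaming straight to the speaker" | \
     piper --model en_US --output-raw | \
     aplay -r 22050 -f S16_LE -t raw -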


4. Creating the Backend (Node.js)

I built a simple Node.js server to:

  • Accept text from the client (voice input from a web app).
  • Process it using Mistral (via Ollama).
  • Convert the LLM response to speech with Piper.
  • Stream the audio back to the client.

server.js:

const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Serve generated files (like output.wav) so the client can fetch them over HTTP
app.use(express.static(__dirname));

// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // ws delivers messages as Buffers, so convert to a string first
    const prompt = message.toString();
    console.log('Received:', prompt);

    // Run Mistral via Ollama (interpolating input into a shell command is
    // fine for a local demo, but not safe for untrusted input)
    exec(`ollama run mistral "${prompt}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send('Error processing your request.');
        return;
      }

      // Convert the LLM response to speech using Piper
      exec(`echo "${stdout}" | piper --model en_US --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send('Error generating speech.');
          return;
        }

        // Send the response text and the audio file name back to the client
        ws.send(JSON.stringify({ text: stdout, audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
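
One thing that bites quickly with this setup is quoting: LLM responses often contain quotes, newlines, and backticks that break the echo "..." | piper shell pipeline. A safer variant is to spawn Piper directly and write the text to its stdin. A minimal sketch (speak is a hypothetical helper, not part of the server above):

const { spawn } = require('child_process');

// Hypothetical helper: synthesize `text` to `outFile` with Piper,
// writing the text to stdin instead of interpolating it into a shell command.
function speak(text, outFile, callback) {
  const piper = spawn('piper', ['--model', 'en_US', '--output_file', outFile]);

  piper.on('close', (code) => {
    callback(code === 0 ? null : new Error(`piper exited with code ${code}`));
  });

  piper.stdin.write(text);
  piper.stdin.end();
}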

5. Building the Real-Time Web Interface (React)

For the frontend, I created a simple React app to:

  • Record voice input.
  • Display real-time text responses.
  • Play the generated speech audio.

App.js:

import React, { useState, useEffect, useRef } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);
  const wsRef = useRef(null);

  useEffect(() => {
    // Create the WebSocket once, not on every render
    const ws = new WebSocket('ws://localhost:3002');
    wsRef.current = ws;

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setResponse(data.text);

      // Fetch the generated WAV file from the Express server and play it
      fetch(`http://localhost:3001/${data.audio}`)
        .then((res) => res.blob())
        .then((blob) => {
          setAudio(URL.createObjectURL(blob));
        });
    };

    return () => ws.close();
  }, []);

  const handleSend = () => {
    if (wsRef.current && wsRef.current.readyState === WebSocket.OPEN) {
      wsRef.current.send(text);
    }
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
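
Note that App.js above only shows a textarea; to get actual voice input into it, one option is the browser’s Web Speech API. A minimal sketch (SpeechRecognition support varies by browser, and some browsers route recognition through an online service, so check yours if staying fully offline matters):

// Hypothetical helper: use the browser's SpeechRecognition API to turn a
// spoken phrase into text, then hand the transcript to setText.
function listen(setText) {
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!SpeechRecognition) {
    console.warn('SpeechRecognition is not supported in this browser');
    return;
  }

  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';

  recognition.onresult = (event) => {
    // The first result's transcript is the recognized phrase
    setText(event.results[0][0].transcript);
  };

  recognition.start();
}

Wiring it up is a one-liner in the JSX, e.g. a button with onClick={() => listen(setText)}.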

6. Running the Project

Once the backend and frontend were ready, I launched both:

  • Start the backend:
  node server.js
  • Run the React app:
  npm start

I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.
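
One small gotcha when opening the app from another device: App.js above points at localhost, which only works when the browser runs on the Pi itself. Deriving the backend URLs from the page’s own hostname keeps things working across the network; a minimal sketch:

// Build backend URLs from wherever the page was served, instead of hard-coding localhost
const host = window.location.hostname;
const ws = new WebSocket(`ws://${host}:3002`);
const audioUrl = `http://${host}:3001/output.wav`;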


Conclusion

Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:

  • Ollama for running local LLMs (like Mistral)
  • Piper for high-quality text-to-speech
  • WebSockets for real-time communication
  • React for a smooth web interface

... I now have a personalized voice AI that works without relying on the cloud.
