Aswin O
Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi

Introduction

In this post, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:

  • Capture voice input through a web interface.
  • Process the text using a local LLM (like Mistral) running on the Pi.
  • Generate voice responses using Piper for text-to-speech (TTS).
  • Stream everything in real-time via WebSockets.

All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!


1. Setting up the Raspberry Pi

First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.

Steps:

  1. Update the system:

   sudo apt-get update
   sudo apt-get upgrade

  2. Enable the audio interface:

   sudo raspi-config

Navigate to System Options > Audio and select the correct output/input device.
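
Before moving on, it’s worth confirming that the USB microphone and speaker actually work from the command line. A quick check, assuming ALSA’s standard tools (arecord/aplay) are available:

   # List capture devices and note the card/device of the USB microphone
   arecord -l

   # Record five seconds from the default capture device, then play it back
   arecord -d 5 -f cd test.wav
   aplay test.wav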


2. Installing Ollama for Local LLMs

Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I pulled the Mistral model:

ollama pull mistral

To confirm it works, I ran a quick test:

ollama run mistral

The model was ready to process text right on the Pi!
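
Besides the CLI, Ollama also exposes a local HTTP API on port 11434, which comes in handy later when wiring it into a server. A quick sanity check from the Pi (the prompt here is just an example):

   curl http://localhost:11434/api/generate -d '{
     "model": "mistral",
     "prompt": "Say hello in one short sentence.",
     "stream": false
   }'

With "stream" left at its default of true, the API returns the response token by token, which is the building block for real-time streaming to the browser.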


3. Setting up Piper for Text-to-Speech (TTS)

For offline voice generation, I chose Piper — a fantastic open-source TTS engine.

  1. Install dependencies:

   sudo apt-get install wget build-essential libsndfile1

  2. Download Piper for ARM64 (Raspberry Pi):

   wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
   tar -xvzf piper_arm64.tar.gz
   chmod +x piper
   sudo mv piper /usr/local/bin/

  3. Test if Piper works:

   echo "Hello, world!" | piper --model en_US --output_file output.wav
   aplay output.wav

Now the Pi could "talk" back!
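
For lower latency you can also skip the intermediate WAV file and stream Piper’s raw audio straight to the speaker. A minimal sketch, assuming your Piper build supports the --output-raw flag and the voice outputs 16-bit audio at 22.05 kHz (the default for most Piper voices):

   echo "Streaming straight to the speaker" | \
     piper --model en_US --output-raw | \
     aplay -r 22050 -f S16_LE -t raw -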


4. Creating the Backend (Node.js)

I built a simple Node.js server to:

  • Accept text from the client (voice input from a web app).
  • Process it using Mistral (via Ollama).
  • Convert the LLM response to speech with Piper.
  • Stream the audio back to the client.

server.js:

const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Serve generated files (like output.wav) so the client can fetch them over HTTP
app.use(express.static(__dirname));

// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // ws delivers messages as Buffers, so convert to a string first
    const prompt = message.toString();
    console.log('Received:', prompt);

    // Run Mistral via Ollama (interpolating input into a shell command is
    // fine for a local demo, but not safe for untrusted input)
    exec(`ollama run mistral "${prompt}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send('Error processing your request.');
        return;
      }

      // Convert the LLM response to speech using Piper
      exec(`echo "${stdout}" | piper --model en_US --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send('Error generating speech.');
          return;
        }

        // Send the response text and the audio file name back to the client
        ws.send(JSON.stringify({ text: stdout, audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
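
One thing that bites quickly with this setup is quoting: LLM responses often contain quotes, newlines, and backticks that break the echo "..." | piper shell pipeline. A safer variant is to spawn Piper directly and write the text to its stdin. A minimal sketch (speak is a hypothetical helper, not part of the server above):

const { spawn } = require('child_process');

// Hypothetical helper: synthesize `text` to `outFile` with Piper,
// writing the text to stdin instead of interpolating it into a shell command.
function speak(text, outFile, callback) {
  const piper = spawn('piper', ['--model', 'en_US', '--output_file', outFile]);

  piper.on('close', (code) => {
    callback(code === 0 ? null : new Error(`piper exited with code ${code}`));
  });

  piper.stdin.write(text);
  piper.stdin.end();
}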

5. Building the Real-Time Web Interface (React)

For the frontend, I created a simple React app to:

  • Record voice input.
  • Display real-time text responses.
  • Play the generated speech audio.

App.js:

import React, { useState, useEffect, useRef } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);
  const wsRef = useRef(null);

  useEffect(() => {
    // Create the WebSocket once, not on every render
    const ws = new WebSocket('ws://localhost:3002');
    wsRef.current = ws;

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setResponse(data.text);

      // Fetch the generated WAV file from the Express server and play it
      fetch(`http://localhost:3001/${data.audio}`)
        .then((res) => res.blob())
        .then((blob) => {
          setAudio(URL.createObjectURL(blob));
        });
    };

    return () => ws.close();
  }, []);

  const handleSend = () => {
    if (wsRef.current && wsRef.current.readyState === WebSocket.OPEN) {
      wsRef.current.send(text);
    }
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
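
Note that App.js above only shows a textarea; to get actual voice input into it, one option is the browser’s Web Speech API. A minimal sketch (SpeechRecognition support varies by browser, and some browsers route recognition through an online service, so check yours if staying fully offline matters):

// Hypothetical helper: use the browser's SpeechRecognition API to turn a
// spoken phrase into text, then hand the transcript to setText.
function listen(setText) {
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!SpeechRecognition) {
    console.warn('SpeechRecognition is not supported in this browser');
    return;
  }

  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';

  recognition.onresult = (event) => {
    // The first result's transcript is the recognized phrase
    setText(event.results[0][0].transcript);
  };

  recognition.start();
}

Wiring it up is a one-liner in the JSX, e.g. a button with onClick={() => listen(setText)}.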

6. Running the Project

Once the backend and frontend were ready, I launched both:

  • Start the backend:
  node server.js
  • Run the React app:
  npm start

I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.
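
One small gotcha when opening the app from another device: App.js above points at localhost, which only works when the browser runs on the Pi itself. Deriving the backend URLs from the page’s own hostname keeps things working across the network; a minimal sketch:

// Build backend URLs from wherever the page was served, instead of hard-coding localhost
const host = window.location.hostname;
const ws = new WebSocket(`ws://${host}:3002`);
const audioUrl = `http://${host}:3001/output.wav`;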


Conclusion

Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:

  • Ollama for running local LLMs (like Mistral)
  • Piper for high-quality text-to-speech
  • WebSockets for real-time communication
  • React for a smooth web interface

... I now have a personalized voice AI that works without relying on the cloud.
