Introduction
In this document, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:
- Capture voice input through a web interface.
- Process the text using a local LLM (like Mistral) running on the Pi.
- Generate voice responses using Piper for text-to-speech (TTS).
- Stream everything in real-time via WebSockets.
All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!
1. Setting up the Raspberry Pi
First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.
Steps:
- Update the system:
sudo apt-get update
sudo apt-get upgrade
- Enable the audio interface:
sudo raspi-config
Navigate to System Options > Audio and select the correct output/input device.
2. Installing Ollama for Local LLMs
Ollama makes it easy to run local LLMs like Mistral on a Raspberry Pi. One caveat: a 7B model like Mistral realistically needs a Pi with 8 GB of RAM (the quantized weights alone are around 4 GB), and generation will be noticeably slower than on a desktop. I installed Ollama with:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, I pulled the Mistral model:
ollama pull mistral
To confirm it works, I ran a quick interactive test (type a prompt at the REPL; /bye exits):
ollama run mistral
The model was ready to process text right on the Pi!
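Besides the CLI, Ollama exposes an HTTP API on port 11434; POSTing to /api/generate streams newline-delimited JSON, one chunk per line, each carrying a "response" field. The backend later in this post shells out to the CLI instead, but the parsing half of the API route can be sketched as a pure function (the sample input below is illustrative, not a captured response):

```javascript
// Joins the "response" fields of Ollama's streamed NDJSON into the full reply.
// The live endpoint would be http://localhost:11434/api/generate.
function collectOllamaStream(ndjson) {
  return ndjson
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .map((chunk) => chunk.response || '')
    .join('');
}

// Example: three streamed chunks shaped like Ollama's output
const sample = [
  '{"response":"Hello","done":false}',
  '{"response":" there","done":false}',
  '{"response":"!","done":true}',
].join('\n');

console.log(collectOllamaStream(sample)); // "Hello there!"
```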
3. Setting up Piper for Text-to-Speech (TTS)
For offline voice generation, I chose Piper — a fantastic open-source TTS engine.
- Install dependencies:
sudo apt-get install wget build-essential libsndfile1
- Download Piper for ARM64 (Raspberry Pi):
wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
tar -xvzf piper_arm64.tar.gz
The archive unpacks into a piper/ directory holding the binary alongside its bundled libraries and espeak-ng data, so keep them together and symlink the binary onto your PATH:
sudo mv piper /opt/piper
sudo ln -s /opt/piper/piper /usr/local/bin/piper
- Test if Piper works. Note that --model expects the path to a downloaded .onnx voice file (for example en_US-lessac-medium from the rhasspy/piper-voices collection), not a bare language code:
echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file output.wav
aplay output.wav
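Piper voices are distributed as .onnx files (plus a matching .onnx.json config) in the rhasspy/piper-voices repository on Hugging Face. The download path can be derived from the voice id; the helper below assumes the repository layout at the time of writing (language/locale/name/quality), so verify the resulting URL before relying on it:

```javascript
// Builds a download URL for a Piper voice given its id, e.g. "en_US-lessac-medium".
// Assumes the rhasspy/piper-voices layout: <lang>/<locale>/<name>/<quality>/<id>.onnx
function piperVoiceUrl(voiceId) {
  const [locale, name, quality] = voiceId.split('-');
  const lang = locale.split('_')[0];
  return `https://huggingface.co/rhasspy/piper-voices/resolve/main/` +
    `${lang}/${locale}/${name}/${quality}/${voiceId}.onnx`;
}

console.log(piperVoiceUrl('en_US-lessac-medium'));
```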
Now the Pi could "talk" back!
4. Creating the Backend (Node.js)
I built a simple Node.js server to:
- Accept text from the client (voice input from a web app).
- Process it using Mistral (via Ollama).
- Convert the LLM response to speech with Piper.
- Stream the audio back to the client.
server.js:
const express = require('express');
const { execFile, spawn } = require('child_process');
const WebSocket = require('ws');
const app = express();
const PORT = 3001;
// Serve the generated output.wav over HTTP and allow the React dev server
// on port 3000 to fetch it cross-origin
app.use((req, res, next) => {
res.setHeader('Access-Control-Allow-Origin', '*');
next();
});
app.use(express.static(__dirname));
// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });
wss.on('connection', (ws) => {
console.log('Client connected');
ws.on('message', (message) => {
const prompt = message.toString(); // ws delivers a Buffer, not a string
console.log('Received:', prompt);
// Run Mistral via Ollama; execFile passes the prompt as an argument,
// so user input is never interpolated into a shell command
execFile('ollama', ['run', 'mistral', prompt], (err, stdout) => {
if (err) {
console.error('LLM error:', err);
ws.send('Error processing your request.');
return;
}
// Convert the LLM response to speech by piping it to Piper's stdin
const piper = spawn('piper', ['--model', 'en_US-lessac-medium.onnx', '--output_file', 'output.wav']);
piper.stdin.write(stdout);
piper.stdin.end();
piper.on('close', (code) => {
if (code !== 0) {
console.error('Piper exited with code', code);
ws.send('Error generating speech.');
return;
}
// Send the text and the audio file name back to the client
ws.send(JSON.stringify({ text: stdout.trim(), audio: 'output.wav' }));
});
});
});
});
app.listen(PORT, () => {
console.log(`Server running at http://localhost:${PORT}`);
});
5. Building the Real-Time Web Interface (React)
For the frontend, I created a simple React app to:
- Record voice input.
- Display real-time text responses.
- Play the generated speech audio.
App.js:
import React, { useEffect, useRef, useState } from 'react';
function App() {
const [text, setText] = useState('');
const [response, setResponse] = useState('');
const [audio, setAudio] = useState(null);
const wsRef = useRef(null);
useEffect(() => {
// Open the socket once on mount, not on every render
const ws = new WebSocket('ws://localhost:3002');
wsRef.current = ws;
ws.onmessage = (event) => {
let data;
try {
data = JSON.parse(event.data);
} catch {
// The server sends plain strings for errors
setResponse(event.data);
return;
}
setResponse(data.text);
// The backend must serve output.wav over HTTP on port 3001
fetch(`http://localhost:3001/${data.audio}`)
.then((res) => res.blob())
.then((blob) => setAudio(URL.createObjectURL(blob)));
};
return () => ws.close();
}, []);
const handleSend = () => {
wsRef.current.send(text);
};
return (
<div>
<h1>Voice Assistant</h1>
<textarea value={text} onChange={(e) => setText(e.target.value)} />
<button onClick={handleSend}>Send</button>
<h2>Response:</h2>
<p>{response}</p>
{audio && <audio controls src={audio} />}
</div>
);
}
export default App;
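App.js uses a textarea as a stand-in for the voice capture mentioned in the goals. In browsers that support the Web Speech API (Chrome exposes it as webkitSpeechRecognition), the mic can fill that textarea instead. A browser-only sketch; bestTranscript is a pure helper over the shape of the API's event.results:

```javascript
// Takes a SpeechRecognition results list (a list of result groups, each with
// ranked alternatives) and joins the top alternative of each group.
function bestTranscript(results) {
  return Array.from(results)
    .map((group) => group[0].transcript)
    .join(' ')
    .trim();
}

// Browser-only wiring; onText receives the recognized text (e.g. setText)
function startDictation(onText) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!Recognition) throw new Error('Speech recognition not supported');
  const rec = new Recognition();
  rec.lang = 'en-US';
  rec.onresult = (event) => onText(bestTranscript(event.results));
  rec.start();
}
```

Wired to a "Speak" button, startDictation(setText) would fill the same state that handleSend already sends over the WebSocket.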
6. Running the Project
Once the backend and frontend were ready, I launched both:
- Start the backend:
node server.js
- Run the React app:
npm start
I opened the web app at my Raspberry Pi's IP on port 3000 and sent a prompt. Voilà! The assistant responded in real time, all processed locally.
Conclusion
Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:
- Ollama for running local LLMs (like Mistral)
- Piper for high-quality text-to-speech
- WebSockets for real-time communication
- React for a smooth web interface
... I now have a personalized voice AI that works without relying on the cloud.