Building a voice-activated AI assistant with Node.js and the Claude API
I wanted to build something fun: a voice assistant that actually understands context, remembers what you said earlier in the conversation, and costs about $15 a month to run.
Here's how I built it using the Web Speech API + Node.js + Claude API access via SimplyLouie.
The stack
- Frontend: Vanilla JS + Web Speech API (built into Chrome/Edge — no library needed)
- Backend: Node.js 18+ with Express (the server code relies on the built-in fetch)
- AI: Claude via SimplyLouie's developer API at $10/month flat
- Storage: In-memory conversation history (upgradeable to Redis)
Step 1: The frontend — capture voice input
<!DOCTYPE html>
<html>
<head>
<title>Voice AI</title>
</head>
<body>
<button id="startBtn">🎤 Hold to talk</button>
<div id="transcript"></div>
<div id="response"></div>
<script>
const btn = document.getElementById('startBtn');
const transcriptEl = document.getElementById('transcript');
const responseEl = document.getElementById('response');
// Chrome and Edge expose the constructor under the webkit prefix
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();
recognition.continuous = false;
recognition.interimResults = false;
recognition.lang = 'en-US';
// Hold the button to record, release to stop
btn.addEventListener('mousedown', () => recognition.start());
btn.addEventListener('mouseup', () => recognition.stop());
recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript;
  transcriptEl.textContent = `You said: ${transcript}`;
  const reply = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: transcript })
  }).then(r => r.json());
  // The backend sends { error } instead of { response } when the AI call fails
  if (!reply.response) {
    responseEl.textContent = `Error: ${reply.error || 'no response'}`;
    return;
  }
  responseEl.textContent = `AI: ${reply.response}`;
  // Speak the response back
  const utterance = new SpeechSynthesisUtterance(reply.response);
  window.speechSynthesis.speak(utterance);
};
</script>
</body>
</html>
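Not every browser ships the Web Speech API, and those that do may only expose the prefixed constructor. A minimal sketch of a feature check follows. The helper name `getSpeechRecognition` is hypothetical (it is not part of any library), and it takes the window object as a parameter so it can be tried outside a browser.

```javascript
// Hypothetical helper: returns the speech recognition constructor,
// or null when the browser does not provide one.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In the page it would be called with the real window object:
//   const Recognition = getSpeechRecognition(window);
//   if (!Recognition) {
//     btn.disabled = true;
//     btn.textContent = 'Voice input not supported in this browser';
//   }
```

Checking up front lets the page show a clear message instead of throwing when the button is pressed.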
Step 2: The backend — handle conversation context
// Requires Node 18+ for the built-in fetch
const express = require('express');
const app = express();
app.use(express.json());
// Serve the Step 1 page from the same origin
// (assumption: it is saved as public/index.html)
app.use(express.static('public'));
// Simple in-memory conversation store (keyed by session)
const conversations = new Map();
app.post('/api/chat', async (req, res) => {
  const sessionId = req.headers['x-session-id'] || 'default';
  const { message } = req.body || {};
  // Reject empty or malformed input before it reaches the history
  if (typeof message !== 'string' || !message.trim()) {
    return res.status(400).json({ error: 'message is required' });
  }
  // Get or create conversation history
  if (!conversations.has(sessionId)) {
    conversations.set(sessionId, []);
  }
  const history = conversations.get(sessionId);
  // Add user message
  history.push({ role: 'user', content: message });
  // Keep about the last 10 exchanges to stay within context limits.
  // An odd count keeps the window starting on a user turn.
  const recentHistory = history.slice(-19);
  try {
    const response = await fetch('https://simplylouie.com/api/chat', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${process.env.LOUIE_API_KEY}`
      },
      body: JSON.stringify({
        messages: recentHistory,
        system: 'You are a helpful voice assistant. Keep responses concise, under 2 sentences when possible, since they will be read aloud.'
      })
    });
    if (!response.ok) {
      throw new Error(`Upstream API returned ${response.status}`);
    }
    const data = await response.json();
    const aiReply = data.response || data.content;
    // Store assistant response
    history.push({ role: 'assistant', content: aiReply });
    res.json({ response: aiReply });
  } catch (err) {
    // Drop the unanswered user message so roles keep alternating
    history.pop();
    console.error(err);
    res.status(500).json({ error: 'AI unavailable' });
  }
});
app.listen(3000, () => console.log('Voice AI running on :3000'));
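The handler caps how much history it sends with a fixed-size slice. A fixed slice does not by itself guarantee that the window begins on a user message, and Claude-style chat APIs typically expect the first message to come from the user. Whether SimplyLouie's endpoint enforces this is an assumption. Here is a sketch of a more defensive trim, using a hypothetical helper named `trimHistory`:

```javascript
// Hypothetical helper: keep at most maxMessages recent messages and drop
// any leading non-user messages so the window begins with a user turn.
function trimHistory(history, maxMessages) {
  if (maxMessages <= 0) return [];
  let recent = history.slice(-maxMessages);
  while (recent.length > 0 && recent[0].role !== 'user') {
    recent = recent.slice(1);
  }
  return recent;
}

// Possible use inside the handler:
//   const recentHistory = trimHistory(history, 20);
```

The loop costs almost nothing and keeps the request valid even if an earlier failure left the stored history with an unusual role order.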
Step 3: Add session tracking so it remembers the conversation
The frontend needs to send a consistent session ID:
// Add to frontend: generate once per page load
const sessionId = Math.random().toString(36).slice(2, 11);
// Update the fetch call:
const reply = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-session-id': sessionId // <-- add this
  },
  body: JSON.stringify({ message: transcript })
}).then(r => r.json());
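Now it remembers what you talked about earlier in the session. The ID is regenerated on every page load and the history lives in server memory, so a reload or a server restart starts a fresh conversation.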
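Because each page load creates a new session ID, the in-memory Map on the server only ever grows. One way to bound it, sketched here under assumptions, is to evict sessions that have been idle for a while. The helper name `evictIdleSessions`, the separate `lastSeen` map, and the 30-minute window are all illustrative and not part of the original code.

```javascript
// Hypothetical helper: remove sessions idle for longer than maxIdleMs.
// `lastSeen` maps sessionId -> timestamp (ms) of that session's latest request.
function evictIdleSessions(conversations, lastSeen, maxIdleMs, now = Date.now()) {
  let removed = 0;
  for (const [sessionId, seenAt] of lastSeen) {
    if (now - seenAt > maxIdleMs) {
      conversations.delete(sessionId);
      lastSeen.delete(sessionId);
      removed += 1;
    }
  }
  return removed;
}

// Possible wiring on the server (assumed, adjust to taste):
//   lastSeen.set(sessionId, Date.now());   // inside the /api/chat handler
//   setInterval(() => evictIdleSessions(conversations, lastSeen, 30 * 60 * 1000), 60 * 1000);
```

Moving to Redis later would replace this with key expiry, but for a single-process demo a periodic sweep is enough.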
What it can do
Once it's running:
You: "What's the capital of France?"
AI: "Paris."
You: "What's the population there?"
AI: "Paris has about 2.1 million people in the city proper."
Note how the second question works — it understands "there" means Paris because of the conversation history.
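To make that concrete, this is roughly what the stored history looks like at the moment the second question is sent. The `messages` field matches the request body built in Step 2. The exact wire format SimplyLouie expects beyond that is not shown in this post, so treat the shape as illustrative.

```javascript
// History for the session when the follow-up question arrives
const history = [
  { role: 'user', content: "What's the capital of France?" },
  { role: 'assistant', content: 'Paris.' },
  { role: 'user', content: "What's the population there?" }
];

// The whole array goes out with the request, so the model sees "Paris."
// right before the question that says "there"
const body = JSON.stringify({ messages: history });
console.log(body);
```

The model has no memory of its own between requests. The context comes entirely from resending the earlier turns.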
The cost math
I ran this for a week with about 200 voice interactions:
- Claude API via SimplyLouie: $10/month flat (developer tier)
- Hosting (Railway): $5/month
- Web Speech API: free (browser built-in)
- Total: $15/month
For comparison, I estimate that building this with OpenAI's Whisper for speech-to-text plus the GPT-4 API would run $40-60/month at the same volume.
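The arithmetic behind the $15 total, as a quick sketch. It assumes the one-week sample of about 200 interactions holds steady for a whole month, which is an extrapolation and not something measured.

```javascript
// Monthly figures from the list above (USD)
const apiCost = 10;     // flat developer tier
const hostingCost = 5;  // Railway
const speechCost = 0;   // Web Speech API is built into the browser

// Assumption: ~200 interactions per week, every week of the year
const interactionsPerWeek = 200;
const interactionsPerMonth = interactionsPerWeek * 52 / 12;

const monthlyTotal = apiCost + hostingCost + speechCost;
const costPerInteraction = monthlyTotal / interactionsPerMonth;

console.log(`Monthly total: $${monthlyTotal}`);
console.log(`Interactions per month: ${Math.round(interactionsPerMonth)}`);
console.log(`Cost per interaction: $${costPerInteraction.toFixed(4)}`);
```

Under that assumption the flat $15 works out to roughly 1.7 cents per interaction, and the per-interaction figure falls as usage rises because the API price is flat.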
Optional: add wake word detection
If you want it to always listen (like Alexa):
let isListening = true;
// Restart recognition whenever it ends so the page keeps listening
recognition.onend = () => {
  if (isListening) recognition.start();
};
recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript.toLowerCase();
  // Only respond if wake word detected
  if (!transcript.includes('hey louie')) return;
  const actualMessage = transcript.replace('hey louie', '').trim();
  // ... rest of handler
};
recognition.start();
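The wake word check can be pulled into a small function so it is easy to try without a microphone. This is a sketch, not the original handler: the name `extractCommand` is hypothetical, and unlike the snippet above it keeps only what was said after the wake word and strips the punctuation a recognizer may insert.

```javascript
// Hypothetical helper: returns the text spoken after the wake word,
// or null when the wake word was not heard.
function extractCommand(transcript, wakeWord = 'hey louie') {
  const text = transcript.toLowerCase();
  const index = text.indexOf(wakeWord);
  if (index === -1) return null;
  // Drop everything up to and including the wake word,
  // then any leading whitespace or punctuation
  return text.slice(index + wakeWord.length).replace(/^[\s,.!?]+/, '').trim();
}

// Possible use inside recognition.onresult:
//   const command = extractCommand(event.results[0][0].transcript);
//   if (!command) return; // no wake word, or nothing said after it
```

Returning null for "no wake word" and an empty string for "wake word only" lets the caller ignore both with a single falsy check.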
Get started
The developer API that powers this: simplylouie.com/developers
$10/month flat rate. No per-token billing surprises. You hit the API, it works.
Full repo for this project is in the comments — drop your questions there too.