brian austin

Posted on Apr 15

Building a voice-activated AI assistant with Node.js and Claude API

#ai #node #webdev #javascript

I wanted to build something fun: a voice assistant that actually understands context, remembers what you said earlier in the conversation, and costs about $15 a month to run.

Here's how I built it using the Web Speech API + Node.js + Claude API access via SimplyLouie.

The stack

Frontend: Vanilla JS + Web Speech API (built into Chrome/Edge — no library needed)
Backend: Node.js + Express
AI: Claude via SimplyLouie's developer API at $10/month flat
Storage: In-memory conversation history (upgradeable to Redis)
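The plan is to start in memory and maybe move to Redis later. One way to keep that switch cheap is to put the Map behind async methods from day one, because a Redis client's calls are async anyway. Here is a minimal sketch under that assumption. The `ConversationStore` class and its method names are hypothetical and are not part of Express or any library.

```javascript
// Hypothetical ConversationStore: keeps history in a Map for now, but every
// method is async so a Redis-backed version could drop in later without
// changing the route handler.
class ConversationStore {
  constructor() {
    this.sessions = new Map(); // sessionId -> [{ role, content }, ...]
  }

  // Return a copy of the session's history (empty if the session is new)
  async get(sessionId) {
    return [...(this.sessions.get(sessionId) ?? [])];
  }

  // Append one message; returns the new history length
  async append(sessionId, message) {
    if (!this.sessions.has(sessionId)) this.sessions.set(sessionId, []);
    const history = this.sessions.get(sessionId);
    history.push(message);
    return history.length;
  }

  // Forget a session entirely
  async clear(sessionId) {
    this.sessions.delete(sessionId);
  }
}
```

With this in place, the handler would call `await store.append(sessionId, { role: 'user', content: message })` rather than pushing onto the array directly. A Redis version could implement the same three methods with one list per session, using `RPUSH` to append, `LRANGE` to read, and `DEL` to clear. Each message would be stored as a JSON string.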

Step 1: The frontend — capture voice input

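```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Voice AI</title>
</head>
<body>
  <button id="startBtn">🎤 Hold to talk</button>
  <div id="transcript"></div>
  <div id="response"></div>

  <script>
    const btn = document.getElementById('startBtn');
    const transcriptEl = document.getElementById('transcript');
    const responseEl = document.getElementById('response');

    // Chrome and Edge expose the constructor under a webkit prefix
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      btn.disabled = true;
      btn.textContent = 'Speech recognition is not supported in this browser';
      throw new Error('Web Speech API not available'); // stops the rest of this script
    }

    const recognition = new SpeechRecognition();
    recognition.continuous = false;     // one utterance per button press
    recognition.interimResults = false; // only deliver the final transcript
    recognition.lang = 'en-US';

    // Pointer events cover both mouse and touch; start() throws if already listening
    btn.addEventListener('pointerdown', () => {
      try { recognition.start(); } catch (err) { /* already started */ }
    });
    btn.addEventListener('pointerup', () => recognition.stop());

    recognition.onresult = async (event) => {
      const transcript = event.results[0][0].transcript;
      transcriptEl.textContent = `You said: ${transcript}`;

      try {
        const res = await fetch('/api/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ message: transcript })
        });
        if (!res.ok) throw new Error(`Server returned ${res.status}`);
        const reply = await res.json();

        responseEl.textContent = `AI: ${reply.response}`;

        // Speak the response back, cutting off anything still playing
        window.speechSynthesis.cancel();
        window.speechSynthesis.speak(new SpeechSynthesisUtterance(reply.response));
      } catch (err) {
        responseEl.textContent = `Error: ${err.message}`;
      }
    };

    recognition.onerror = (event) => {
      responseEl.textContent = `Speech recognition error: ${event.error}`;
    };
  </script>
</body>
</html>
```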
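Even with a concise system prompt, the model will sometimes return a paragraph. Speaking it sentence by sentence keeps each utterance short, which makes it easy to stop playback between sentences. There are also reports of Chrome cutting off very long utterances, though I have not verified that here. Below is a sketch of a hypothetical `chunkForSpeech` helper. It is plain JavaScript with no browser APIs, so it can be tested in Node.

```javascript
// Hypothetical helper: group a reply into chunks of whole sentences, each at
// most maxLen characters. A single sentence longer than maxLen becomes its
// own chunk instead of being cut mid-word.
function chunkForSpeech(text, maxLen = 200) {
  // Split after ., ! or ? followed by whitespace, keeping the punctuation
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLen || !current) {
      // Fits in the current chunk, or starts a new (possibly oversized) one
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

To use it in the page, replace the single `speak()` call with a loop: `for (const chunk of chunkForSpeech(reply.response)) window.speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));`. Because `speak()` adds each utterance to a queue, the chunks play in order. The 200-character default is an arbitrary choice.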

Step 2: The backend — handle conversation context

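```javascript
// Requires Node.js 18+ (for the global fetch) and `npm install express`
const express = require('express');
const app = express();
app.use(express.json());
// Serve the Step 1 page (assumes it is saved as public/index.html)
app.use(express.static('public'));

// Simple in-memory conversation store (keyed by session).
// Everything in it is lost when the server restarts.
const conversations = new Map();

app.post('/api/chat', async (req, res) => {
  const sessionId = req.headers['x-session-id'] || 'default';
  const { message } = req.body;

  if (typeof message !== 'string' || !message.trim()) {
    return res.status(400).json({ error: 'message must be a non-empty string' });
  }

  // Get or create conversation history
  if (!conversations.has(sessionId)) {
    conversations.set(sessionId, []);
  }
  const history = conversations.get(sessionId);

  // Add user message
  history.push({ role: 'user', content: message });

  // Keep the last 10 exchanges (20 messages) plus the new message to stay
  // within context limits; the odd count keeps the window starting on a user turn
  const recentHistory = history.slice(-21);

  try {
    const response = await fetch('https://simplylouie.com/api/chat', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${process.env.LOUIE_API_KEY}`
      },
      body: JSON.stringify({
        messages: recentHistory,
        system: 'You are a helpful voice assistant. Keep responses concise, under 2 sentences when possible, since they will be read aloud.'
      })
    });

    if (!response.ok) {
      throw new Error(`Upstream API returned ${response.status}`);
    }

    const data = await response.json();
    // The field name depends on the upstream response format
    const aiReply = data.response || data.content;
    if (typeof aiReply !== 'string') {
      throw new Error('Unexpected response shape from upstream API');
    }

    // Store assistant response
    history.push({ role: 'assistant', content: aiReply });

    res.json({ response: aiReply });
  } catch (err) {
    console.error(err);
    // Drop the unanswered user message so the history keeps alternating roles
    history.pop();
    res.status(500).json({ error: 'AI unavailable' });
  }
});

// Hosts such as Railway pass the port in the PORT environment variable
const port = process.env.PORT || 3000;
app.listen(port, () => console.log(`Voice AI running on :${port}`));
```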
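The handler trims history with a fixed slice. If the stored history ever gets out of step, for example when a failed upstream call leaves a user message behind, a fixed slice can start the window on an assistant reply. Below is a slightly more careful sketch. It assumes the SimplyLouie endpoint, like Claude's Messages API, expects the conversation to open with a user turn. `windowHistory` is a hypothetical helper name.

```javascript
// Hypothetical helper: keep the last `maxExchanges` user/assistant pairs plus
// the newest user message, and make sure the window opens on a user turn.
function windowHistory(history, maxExchanges = 10) {
  const maxMessages = maxExchanges * 2 + 1;
  let start = Math.max(0, history.length - maxMessages);
  // If the cut lands on an assistant reply, move forward to the next user message
  while (start < history.length && history[start].role !== 'user') {
    start++;
  }
  return history.slice(start);
}
```

In the handler, `const recentHistory = windowHistory(history);` would replace the slice.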

Step 3: Add session tracking so it remembers the conversation

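The frontend needs to send the same session ID with every request. Without one, the server falls back to the 'default' session, so every browser tab would share a single conversation: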

```javascript
// Add near the top of the frontend script: one ID per page load
// (crypto.randomUUID() needs a secure context such as HTTPS or localhost)
const sessionId = crypto.randomUUID();

// Update the fetch call inside recognition.onresult:
const reply = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-session-id': sessionId  // <-- add this
  },
  body: JSON.stringify({ message: transcript })
}).then(r => r.json());
```

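Now it remembers what you talked about earlier in the session. That memory lasts until the page is reloaded, which generates a new session ID, or until the server restarts, which empties the in-memory Map.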
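Every new session ID adds an entry to the Map, and nothing ever removes one, so memory use grows for as long as the server runs. Below is a minimal sketch of an idle-session sweeper. It assumes each Map entry is changed from a bare array to `{ history, lastSeen }` and that the handler updates `lastSeen` on every request. Both of those changes are hypothetical, as are the function name and the 30-minute default.

```javascript
// Hypothetical sweeper: delete sessions idle for longer than maxIdleMs.
// Assumes each Map value looks like { history: [...], lastSeen: <ms timestamp> }.
function evictIdleSessions(sessions, now = Date.now(), maxIdleMs = 30 * 60 * 1000) {
  let removed = 0;
  // Deleting from a Map while iterating it with for...of is safe in JavaScript
  for (const [id, session] of sessions) {
    if (now - session.lastSeen > maxIdleMs) {
      sessions.delete(id);
      removed++;
    }
  }
  return removed;
}
```

Run it on a timer, for example `setInterval(() => evictIdleSessions(conversations), 60_000).unref();`. The `unref()` call keeps the timer from holding the Node process open on its own.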

What it can do

Once it's running:

You: "What's the capital of France?"
AI: "Paris."

You: "What's the population there?"
AI: "Paris has about 2.1 million people in the city proper."

Note how the second question works: the model resolves "there" to Paris because the earlier exchange is sent along with the new question.
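To make that concrete, here is roughly the messages array the Step 2 handler sends upstream for the second question. The contents are copied from the exchange above, the system prompt is left out for brevity, and nothing here calls an API.

```javascript
// Illustrative payload for the second question in the exchange above.
// Message contents are copied from that example; nothing here calls an API.
const history = [
  { role: 'user', content: "What's the capital of France?" },
  { role: 'assistant', content: 'Paris.' },
  { role: 'user', content: "What's the population there?" },
];

// The handler posts the whole recent history, not just the latest question,
// so "there" can be resolved against "Paris." (system prompt omitted here)
const payload = { messages: history };

console.log(JSON.stringify(payload, null, 2));
```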

The cost math

I ran this for a week with about 200 voice interactions:

Claude API via SimplyLouie: $10/month flat (developer tier)
Hosting (Railway): $5/month
Web Speech API: free (browser built-in)
Total: $15/month

For comparison, my rough estimate for building this with OpenAI's Whisper for speech-to-text plus the GPT-4 API is $40-60/month at the same volume. The real figure depends heavily on how long the recordings and replies are.
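With a flat rate, the number worth watching is cost per interaction, which drops as usage grows. The calculation below extrapolates the one week of usage to a month. That is an assumption, since real usage varies from week to week.

```javascript
// Back-of-the-envelope cost per interaction, using the numbers above:
// about 200 interactions in one week, $10 API + $5 hosting per month.
const interactionsPerWeek = 200;
const weeksPerMonth = 52 / 12;   // about 4.33
const monthlyCost = 10 + 5;      // USD: SimplyLouie developer tier + Railway

const interactionsPerMonth = interactionsPerWeek * weeksPerMonth;
const costPerInteraction = monthlyCost / interactionsPerMonth;

console.log(interactionsPerMonth.toFixed(0)); // "867"
console.log(costPerInteraction.toFixed(3));   // "0.017"
```

So at this volume each voice interaction costs a little under two cents. Doubling usage would halve that figure, because neither line item changes.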

Optional: add wake word detection

If you want it to always listen (like Alexa):

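```javascript
// Declare the flag before the handlers that read it
let isListening = true;

// Continuously restart recognition: each session ends after a result
// or a stretch of silence
recognition.onend = () => {
  if (isListening) recognition.start();
};

recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript.toLowerCase();

  // Only respond if the wake word was detected
  if (!transcript.includes('hey louie')) return;

  const actualMessage = transcript.replace('hey louie', '').trim();
  // ... rest of handler (send actualMessage to /api/chat as in Step 1)
};

// To turn it off: set isListening = false, then call recognition.stop()
recognition.start();
```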
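Two details in the wake word check are easy to get wrong. The recognizer may return words spoken before the wake word, and it may add punctuation right after it. Below is a sketch of a hypothetical `extractCommand` helper that keeps only what follows the wake word. The punctuation handling is an assumption about what the recognizer might emit.

```javascript
// Hypothetical helper: return the words after the wake word (lowercased),
// or null if the wake word is absent. Leading punctuation and spaces after
// the wake word are stripped ("Hey Louie, what time is it").
function extractCommand(transcript, wakeWord = 'hey louie') {
  const text = transcript.toLowerCase();
  const index = text.indexOf(wakeWord);
  if (index === -1) return null;
  return text
    .slice(index + wakeWord.length)
    .replace(/^[\s,.!?]+/, '')
    .trim();
}
```

In `onresult`, `const actualMessage = extractCommand(event.results[0][0].transcript); if (actualMessage === null) return;` would replace the wake word check. While it is always listening, the microphone can also pick up the assistant's own spoken reply. It may be worth pausing recognition while the reply plays: set `isListening = false` and call `recognition.stop()`, then resume when the `SpeechSynthesisUtterance` fires its `end` event.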

Get started

The developer API that powers this: simplylouie.com/developers

$10/month flat rate. No per-token billing surprises. You hit the API, it works.

Full repo for this project is in the comments — drop your questions there too.
