Mart Schweiger

Posted on Apr 3 • Originally published at assemblyai.com

Node.js Voice Agent with AssemblyAI Universal-3 Pro Streaming

#ai #node #tutorial #javascript


Build a real-time voice agent in Node.js using the AssemblyAI Universal-3 Pro Streaming model (u3-rt-pro) for speech-to-text — no Python required, no heavy framework dependencies.

Two modes in one repo:

Terminal agent (src/agent.js): captures microphone input with the mic package and plays TTS audio from your terminal
Browser server (src/server.js): a Node.js WebSocket server with a browser UI that captures audio via getUserMedia

Why AssemblyAI Universal-3 Pro for Node.js?

Metric	AssemblyAI Universal-3 Pro	Deepgram Nova-3
P50 latency	307 ms	516 ms
Word Error Rate	8.14%	9.87%
Neural turn detection	✅	❌ (VAD only)
Mid-session prompting	✅	❌
Real-time diarization	✅	❌
Anti-hallucination	✅	❌

Neural turn detection eliminates the need for a separate VAD library. The model uses both acoustic and linguistic signals to detect when a speaker has finished — not just when they've gone silent.

Quick Start

git clone https://github.com/kelseyefoster/voice-agent-nodejs-assemblyai
cd voice-agent-nodejs-assemblyai

npm install
cp .env.example .env
# Edit .env with your API keys

Terminal Agent

npm start
# Speak into your mic — Ctrl+C to quit

Browser Server

npm run server
# Open http://localhost:3000

AssemblyAI WebSocket URL

const AAI_WS_URL =
  `wss://streaming.assemblyai.com/v3/ws` +
  `?speech_model=u3-rt-pro` +
  `&encoding=pcm_s16le` +
  `&sample_rate=16000` +
  `&end_of_turn_confidence_threshold=0.4` +
  `&min_end_of_turn_silence_when_confident=300` +
  `&max_turn_silence=1500` +
  `&token=${ASSEMBLYAI_API_KEY}`;
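The template string above interpolates the key directly, so an unset environment variable silently becomes `token=undefined`. A minimal sketch of the same URL built with `URLSearchParams`, which also escapes values; `buildStreamingUrl` is a hypothetical helper, and the parameter names and defaults are copied from the snippet above rather than verified against the API reference:

```javascript
// Hypothetical helper (not part of the repo): builds the streaming URL
// from the defaults shown in the article plus optional overrides.
const AAI_WS_BASE = "wss://streaming.assemblyai.com/v3/ws";

function buildStreamingUrl(token, overrides = {}) {
  // Fail early instead of connecting with "token=undefined".
  if (!token) throw new Error("Missing AssemblyAI token");

  const params = new URLSearchParams({
    speech_model: "u3-rt-pro",
    encoding: "pcm_s16le",
    sample_rate: "16000",
    end_of_turn_confidence_threshold: "0.4",
    min_end_of_turn_silence_when_confident: "300",
    max_turn_silence: "1500",
  });

  // Overrides replace the defaults one key at a time.
  for (const [key, value] of Object.entries(overrides)) {
    params.set(key, String(value));
  }
  params.set("token", token);

  return `${AAI_WS_BASE}?${params.toString()}`;
}
```

Usage would look like `buildStreamingUrl(process.env.ASSEMBLYAI_API_KEY, { max_turn_silence: 2000 })`.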

Message Handling

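ws.on("message", async (data) => {
  const msg = JSON.parse(data.toString());

  if (msg.type === "Begin") {
    console.log(`Session: ${msg.id}`);
    return;
  }
  if (msg.type !== "Turn") return;

  if (!msg.end_of_turn) {
    // Partial transcript: overwrite the current terminal line.
    process.stdout.write(`\r${msg.transcript}`);
    return;
  }

  // Final transcript: generate a reply and speak it.
  // Errors are caught here so a failed LLM or TTS call does not
  // surface as an unhandled promise rejection.
  try {
    const reply = await generateResponse(msg.transcript);
    await speak(reply);
  } catch (err) {
    console.error("Failed to respond:", err);
  }
});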
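The branching on message type can be pulled into a small pure function, which makes it testable without a live socket. This is only a sketch: `classifyMessage` and its `kind` labels are hypothetical, and the only fields it assumes are the ones the handler above already reads (`type`, `id`, `transcript`, `end_of_turn`).

```javascript
// Hypothetical helper: turn a raw WebSocket frame into a small event object.
function classifyMessage(data) {
  let msg;
  try {
    msg = JSON.parse(data.toString());
  } catch {
    // Malformed JSON should not crash the handler.
    return { kind: "invalid" };
  }
  if (msg === null || typeof msg !== "object") return { kind: "invalid" };

  if (msg.type === "Begin") return { kind: "begin", sessionId: msg.id };

  if (msg.type === "Turn") {
    const text = String(msg.transcript ?? "").trim();
    if (!msg.end_of_turn) return { kind: "partial", text };
    // Skip empty final turns so the response step never runs on no input.
    return text ? { kind: "final", text } : { kind: "ignored" };
  }

  return { kind: "other", type: msg.type };
}
```

The socket handler then reduces to a `switch` on `classifyMessage(data).kind`, calling `generateResponse` only for `"final"`.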

Sending Audio

Browser (getUserMedia + ScriptProcessor)

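// ScriptProcessorNode is deprecated in favor of AudioWorklet but is still widely supported.
processor.onaudioprocess = (e) => {
  // Web Audio delivers Float32 samples in the range [-1, 1].
  const float32 = e.inputBuffer.getChannelData(0);
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Scale to 16-bit signed PCM and clamp to the int16 range.
    int16[i] = Math.max(-32768, Math.min(32767, Math.round(float32[i] * 32767)));
  }
  ws.send(int16.buffer);
};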
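The URL declares `sample_rate=16000`, but a browser `AudioContext` commonly runs at 44.1 or 48 kHz unless a rate is requested when it is created. The snippet above does not show how the context is set up, so the following is a sketch under that assumption: if the capture rate is higher than 16 kHz, average neighboring samples down before converting. Both function names are hypothetical, and the box-filter averaging is a simple stand-in for a proper resampler.

```javascript
// Hypothetical helper: naive downsampler that averages each group of input
// samples into one output sample (a simple box filter, not a full resampler).
function downsample(input, inputRate, targetRate = 16000) {
  if (inputRate === targetRate) return input;
  if (inputRate < targetRate) throw new Error("Upsampling is not supported");

  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);

  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// Same conversion as the article's loop, extracted into a function.
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    int16[i] = Math.max(-32768, Math.min(32767, Math.round(float32[i] * 32767)));
  }
  return int16;
}
```

Inside `onaudioprocess` this would read `ws.send(floatTo16BitPCM(downsample(float32, audioContext.sampleRate)).buffer)`, where `audioContext` is whatever context created the processor.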

Terminal (mic package)

const micStream = micInstance.getAudioStream();
micStream.on("data", (chunk) => {
  aaiWs.send(chunk); // raw PCM s16le bytes
});
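Microphone streams emit chunks of whatever size the OS hands over, and streaming APIs often expect frames within a bounded duration. A minimal sketch that regroups incoming bytes into fixed-length frames before sending; `createFrameBuffer` is a hypothetical helper, and the 100 ms frame length is an assumption to check against the AssemblyAI streaming docs:

```javascript
// Hypothetical helper: regroup arbitrary-sized PCM chunks into fixed-duration frames.
// Assumes mono s16le audio, i.e. 2 bytes per sample.
function createFrameBuffer(onFrame, { sampleRate = 16000, frameMs = 100 } = {}) {
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * 2;
  let pending = Buffer.alloc(0);

  return {
    push(chunk) {
      pending = Buffer.concat([pending, chunk]);
      // Emit as many whole frames as are available; keep the remainder.
      while (pending.length >= frameBytes) {
        onFrame(pending.subarray(0, frameBytes));
        pending = pending.subarray(frameBytes);
      }
    },
    pendingBytes() {
      return pending.length;
    },
  };
}
```

Wired into the snippet above, the callback would be something like `(frame) => { if (aaiWs.readyState === 1) aaiWs.send(frame); }` (1 is the `OPEN` state), with `micStream.on("data", (chunk) => frames.push(chunk))` feeding it.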

Turn Detection Tuning

Parameter	Default	Effect of a lower value	Effect of a higher value
`end_of_turn_confidence_threshold`	0.4	Faster response	Fewer false triggers
`min_end_of_turn_silence_when_confident`	300 ms	Snappier	More natural pauses
`max_turn_silence`	1500 ms	Faster cutoff	More thinking time
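One way to work with these knobs is to keep a few named profiles and merge them over the defaults. The sketch below is illustrative only: the "snappy" and "patient" values are made-up starting points rather than recommendations from AssemblyAI, and the sanity checks (threshold between 0 and 1, minimum silence not exceeding maximum) are assumptions.

```javascript
// Defaults copied from the table above.
const TURN_DEFAULTS = {
  end_of_turn_confidence_threshold: 0.4,
  min_end_of_turn_silence_when_confident: 300, // ms
  max_turn_silence: 1500, // ms
};

// Hypothetical profiles; the numbers are illustrative, not tuned values.
const TURN_PROFILES = {
  default: {},
  snappy: {
    end_of_turn_confidence_threshold: 0.3,
    min_end_of_turn_silence_when_confident: 200,
    max_turn_silence: 1000,
  },
  patient: {
    end_of_turn_confidence_threshold: 0.6,
    min_end_of_turn_silence_when_confident: 500,
    max_turn_silence: 2500,
  },
};

function turnDetectionParams(profile = "default") {
  if (!Object.prototype.hasOwnProperty.call(TURN_PROFILES, profile)) {
    throw new Error(`Unknown profile: ${profile}`);
  }
  const p = { ...TURN_DEFAULTS, ...TURN_PROFILES[profile] };

  // Assumed sanity checks, not documented API limits.
  const t = p.end_of_turn_confidence_threshold;
  if (t < 0 || t > 1) throw new Error("Threshold must be between 0 and 1");
  if (p.min_end_of_turn_silence_when_confident > p.max_turn_silence) {
    throw new Error("Minimum silence exceeds maximum turn silence");
  }
  return p;
}
```

The returned object can be spread into the query string, so switching between a quick-reply kiosk and a slower dictation flow is a one-word change.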

Mid-Session Keyterm Prompting

Inject domain-specific vocabulary without restarting:

ws.send(JSON.stringify({
  type: "UpdateConfiguration",
  keyterms: ["AssemblyAI", "Universal-3", "your-product-name"],
}));
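In practice the update is easier to reuse behind a small guard that cleans the term list and skips the send when the socket is not open. A sketch, assuming the message shape shown above (`type: "UpdateConfiguration"` with a `keyterms` array) is what the API accepts; `sendKeyterms` is a hypothetical helper:

```javascript
// Hypothetical helper: send a keyterm update only when it can succeed.
// Returns true if a message was sent, false otherwise.
function sendKeyterms(ws, terms) {
  const OPEN = 1; // WebSocket readyState value for an open connection

  // Trim whitespace, drop empty strings, and remove duplicates.
  const keyterms = [...new Set(terms.map((t) => String(t).trim()).filter(Boolean))];

  if (keyterms.length === 0 || ws.readyState !== OPEN) return false;

  // Message shape copied from the article's example.
  ws.send(JSON.stringify({ type: "UpdateConfiguration", keyterms }));
  return true;
}
```

For example, `sendKeyterms(ws, productNames)` could run whenever the conversation moves to a new topic.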
