Integrate Node.js with Retell AI and Twilio: Lessons from My Setup
TL;DR
Most Node.js voice integrations fail when Twilio's webhook timing conflicts with Retell AI's streaming latency: you get dropped calls or overlapping audio. This setup uses Retell AI as the conversation layer, Twilio for PSTN connectivity, and Node.js webhooks for session management. The aim: low response latency (see the latency breakdown in the FAQ), proper call state tracking, and no audio collisions. Stack: Express.js, Retell SDK (@retellai/retell-sdk), Twilio Node.js client, ws, environment-based config.
Prerequisites
API Keys & Credentials
You'll need active accounts with Twilio (phone number provisioning, voice API access) and Retell AI (agent creation, API key). Generate your Twilio Auth Token and Account SID from the console. Retell requires an API key from your dashboard—store both in .env files, never hardcoded.
Node.js & Dependencies
Node.js 18+ (LTS recommended). Install express (webhook server), twilio (SDK for phone integration), @retellai/retell-sdk (Retell API client), ws (WebSocket server), and dotenv (environment variables). Run npm install express twilio @retellai/retell-sdk ws dotenv in your project directory.
System Requirements
Publicly accessible server or ngrok tunnel (localhost won't work for Twilio webhooks). Use HTTPS for webhooks, and note that Twilio Media Streams only connect to secure wss:// URLs. Minimum 512MB RAM for concurrent call handling; 2GB+ if scaling beyond 10 simultaneous calls.
Network & Security
Firewall rules allowing inbound traffic on port 443. Webhook signature validation enabled (Twilio sends X-Twilio-Signature headers). Test locally with ngrok before deploying to production.
Step-by-Step Tutorial
Most Node.js + Retell AI + Twilio integrations fail because developers treat them as a unified system. They're not. Retell handles AI conversation logic. Twilio handles telephony. Your Node.js server is the bridge. Mixing their responsibilities creates race conditions and double-billing.
Architecture & Flow
flowchart LR
A[Incoming Call] --> B[Twilio]
B --> C[Your Node.js Server]
C --> D[Retell AI Agent]
D --> E[AI Response]
E --> C
C --> B
B --> A
Critical separation: Twilio owns the phone connection. Retell owns the conversation state. Your server translates between them via webhooks.
Configuration & Setup
Install dependencies for webhook handling and telephony bridging:
npm install express twilio @retellai/retell-sdk dotenv
Environment variables (production secrets, not hardcoded):
# .env file
TWILIO_ACCOUNT_SID=ACxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
RETELL_API_KEY=key_xxxxx
RETELL_AGENT_ID=agent_xxxxx
SERVER_URL=https://your-domain.ngrok.io
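To fail fast when a variable is missing, you can validate the environment once at startup instead of discovering a blank token mid-call. A minimal sketch, assuming the variable names from the .env file above plus RETELL_AGENT_ID (used by the complete example later); `loadConfig` is a hypothetical helper, not part of any SDK.

```javascript
// Hypothetical startup check: fail fast if a required variable is missing.
const REQUIRED_VARS = [
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_PHONE_NUMBER',
  'RETELL_API_KEY',
  'RETELL_AGENT_ID',
  'SERVER_URL',
];

function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Strip trailing slashes so `${serverUrl}${req.originalUrl}` never yields "//"
  const serverUrl = env.SERVER_URL.replace(/\/+$/, '');
  if (!serverUrl.startsWith('https://')) {
    throw new Error('SERVER_URL must be an https:// URL');
  }
  return {
    twilioAccountSid: env.TWILIO_ACCOUNT_SID,
    twilioAuthToken: env.TWILIO_AUTH_TOKEN,
    twilioPhoneNumber: env.TWILIO_PHONE_NUMBER,
    retellApiKey: env.RETELL_API_KEY,
    retellAgentId: env.RETELL_AGENT_ID,
    serverUrl,
    port: Number(env.PORT) || 3000,
  };
}
```

Usage: call `require('dotenv').config()` first, then `const config = loadConfig(process.env);` before creating the Express app, so a misconfigured deploy crashes at boot rather than on the first call.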
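Retell agent configuration (create via dashboard or API). The greeting and prompt belong to the Retell LLM referenced by llm_id, not to the agent object:
const retellAgentConfig = {
agent_name: "Customer Support Agent",
voice_id: "11labs-Adrian", // ElevenLabs voice
language: "en-US",
response_engine: {
type: "retell-llm",
llm_id: "llm_xxxxx" // Retell LLM that holds the prompt and greeting below
},
enable_backchannel: true, // "mm-hmm" responses during user speech
ambient_sound: "office", // check Retell's docs for the accepted preset names
interruption_sensitivity: 0.7 // 0-1 scale, higher = easier to interrupt
};
// Configured on the Retell LLM (llm_xxxxx), not on the agent:
const retellLlmConfig = {
begin_message: "Thanks for calling. How can I help you today?",
general_prompt: "You are a helpful customer support agent. Be concise and professional."
};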
Step-by-Step Implementation
1. Webhook Handler for Incoming Calls
When Twilio receives a call, it hits YOUR server's webhook. You must return TwiML that bridges to Retell:
require('dotenv').config();
const express = require('express');
const twilio = require('twilio');
// The client's export and method names differ between Retell SDK versions; match them to the README of the version you install
const { RetellClient } = require('@retellai/retell-sdk');
const app = express();
app.use(express.urlencoded({ extended: false }));
const retellClient = new RetellClient({
apiKey: process.env.RETELL_API_KEY
});
// YOUR webhook endpoint - Twilio calls this on incoming call
app.post('/webhook/twilio-incoming', async (req, res) => {
const twiml = new twilio.twiml.VoiceResponse();
try {
// Register call with Retell; the returned call_id identifies the audio WebSocket
const retellCall = await retellClient.call.register({
agent_id: "agent_xxxxx", // Your Retell agent ID
audio_websocket_protocol: "twilio",
audio_encoding: "mulaw", // Twilio's audio format
sample_rate: 8000 // Twilio uses 8kHz
});
// Connect Twilio call to Retell's WebSocket
const connect = twiml.connect();
connect.stream({
url: `wss://api.retellai.com/audio-websocket/${retellCall.call_id}`
});
} catch (error) {
console.error('Retell registration failed:', error);
twiml.say('Sorry, the system is unavailable. Please try again later.');
}
// One response path for both the bridge and the fallback message
res.type('text/xml');
res.send(twiml.toString());
});
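Why this breaks in production: if retellClient.call.register() stalls, the caller hears dead air until Twilio's webhook request times out and the call fails. Cap registration at about 2 seconds and return the fallback TwiML when the cap is hit.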
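One way to enforce that cap is to race the registration against a timer. A minimal sketch; `withTimeout` is a hypothetical helper, and the 2000 ms budget is the figure from the paragraph above, not a Twilio or Retell setting.

```javascript
// Hypothetical helper: reject if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Always clear the timer so a fast success doesn't leave a pending timeout behind
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the handler you would write `const retellCall = await withTimeout(retellClient.call.register({ ... }), 2000, 'Retell registration');`, and the existing catch block returns the fallback TwiML. Note that the timer only bounds how long the webhook waits; the underlying registration request keeps running in the background.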
2. Retell Event Webhook
Retell sends call events (started, ended, transcript) to YOUR server:
// YOUR webhook endpoint - Retell sends events here
app.post('/webhook/retell-events', express.json(), (req, res) => {
const event = req.body;
switch(event.event) {
case 'call_started':
console.log(`Call ${event.call.call_id} started`);
// Initialize session state, log to analytics
break;
case 'call_ended':
console.log(`Call ${event.call.call_id} ended. Duration: ${event.call.end_timestamp - event.call.start_timestamp}ms`);
// Save transcript, calculate costs
break;
case 'call_analyzed':
// Post-call analysis with sentiment, summary
console.log('Analysis:', event.call.call_analysis);
break;
}
res.sendStatus(200); // Always return 200 or Retell retries
});
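The "Initialize session state" step can start as an in-memory map keyed by call_id. A minimal sketch, assuming the event names and the call_id, start_timestamp, end_timestamp, and call_analysis fields used in the handler above; `applyRetellEvent` is hypothetical, and a real deployment would persist sessions to a database rather than process memory.

```javascript
// Hypothetical in-memory session tracker driven by Retell's call events.
const sessions = new Map();

function applyRetellEvent(store, event) {
  const call = event.call || {};
  const id = call.call_id;
  if (!id) return null; // ignore payloads without a call id

  const session = store.get(id) || { callId: id, status: 'unknown' };
  switch (event.event) {
    case 'call_started':
      session.status = 'active';
      session.startedAt = call.start_timestamp;
      break;
    case 'call_ended':
      session.status = 'ended';
      session.endedAt = call.end_timestamp;
      // Timestamps are milliseconds, so the difference is a duration in ms
      if (call.start_timestamp && call.end_timestamp) {
        session.durationMs = call.end_timestamp - call.start_timestamp;
      }
      break;
    case 'call_analyzed':
      session.analysis = call.call_analysis;
      break;
    default:
      // Unknown event types are stored but don't change the status
      break;
  }
  store.set(id, session);
  return session;
}
```

In the webhook, call `applyRetellEvent(sessions, req.body)` before `res.sendStatus(200)`. Keeping the update synchronous means the 200 goes back quickly, which avoids retries.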
3. Start Server with ngrok
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Server running on port ${PORT}`);
console.log(`Expose with: ngrok http ${PORT}`);
console.log(`Set Twilio webhook to: https://YOUR_NGROK_URL/webhook/twilio-incoming`);
});
Production deployment: Replace ngrok with a real domain. Configure Twilio webhook URL in dashboard under Phone Numbers → Active Numbers → Voice Configuration.
Error Handling & Edge Cases
Race condition: Twilio might try to open the media stream before Retell is ready. Awaiting retellClient.call.register() before you return TwiML prevents this. The stream URL exists before Twilio ever sees it, and Twilio only opens the WebSocket asynchronously after it parses your TwiML.
Audio quality issues: Twilio uses 8kHz mulaw. Retell expects this format. If you hear robotic voices, verify audio_encoding: "mulaw" and sample_rate: 8000 match.
Webhook signature validation (prevents spoofed requests):
const validateTwilioSignature = (req, res, next) => {
const signature = req.headers['x-twilio-signature'];
const url = `${process.env.SERVER_URL}${req.originalUrl}`;
if (!twilio.validateRequest(process.env.TWILIO_AUTH_TOKEN, signature, url, req.body)) {
return res.status(403).send('Forbidden');
}
next();
};
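// Register the route once, with the middleware in front of the step 1 handler
// (a second app.post for the same path never runs, because the first one already responds)
app.post('/webhook/twilio-incoming', validateTwilioSignature, async (req, res) => {
// Same handler body as step 1
});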
Testing & Validation
- Local testing: run ngrok http 3000 and update the Twilio webhook URL.
- Call your Twilio number: you should hear the Retell agent greeting.
- Check logs: verify that call_started and call_ended events fire.
- Test interruption: talk over the agent; it should stop mid-sentence if interruption_sensitivity is configured.
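Common failure: a 502 Bad Gateway in Twilio's debugger usually means ngrok couldn't reach your local server, or your server didn't respond before Twilio's webhook timeout. Confirm the server is running on the tunneled port and add timeout handling.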
System Diagram
Call flow showing how Twilio, your Node.js server, and Retell AI exchange webhooks, audio, and call events.
sequenceDiagram
participant Caller
participant Twilio
participant YourServer
participant Retell
Caller->>Twilio: Dials your Twilio number
Twilio->>YourServer: POST /webhook/twilio-incoming
YourServer->>Retell: Register call (agent_id, mulaw, 8000 Hz)
Retell-->>YourServer: call_id
YourServer-->>Twilio: TwiML Connect/Stream to Retell audio WebSocket
Twilio->>Retell: Opens media stream
Retell->>YourServer: POST /webhook/retell-events (call_started)
Retell->>Twilio: Agent greeting audio
Twilio->>Caller: Plays greeting
Caller->>Twilio: Speaks
Twilio->>Retell: Caller audio frames
Retell->>Twilio: Agent response audio
Caller->>Twilio: Hangs up
Twilio->>Retell: Stream stops
Retell->>YourServer: POST /webhook/retell-events (call_ended)
Retell->>YourServer: POST /webhook/retell-events (call_analyzed)
Testing & Validation
Local Testing
Most Node.js Twilio integrations break because devs skip local webhook testing. Twilio needs a public URL to POST call events—your localhost:3000 won't cut it.
Use ngrok to expose your local server:
// Start your Express server first
app.listen(PORT, () => {
console.log(`Server running on port ${PORT}`);
console.log(`Run: ngrok http ${PORT}`);
console.log('Then update Twilio webhook URL to: https://YOUR_NGROK_URL/webhook/twilio');
});
// Test the webhook endpoint manually. This request is unsigned, so once
// signature validation is enabled it should be rejected with a 403.
// curl -X POST https://YOUR_NGROK_URL/webhook/twilio \
// -d "CallSid=TEST123" \
// -d "From=+15555551234" \
// -d "To=+15555556789"
Real-world problem: on ngrok's free tier the public URL usually changes every time you restart the tunnel, so for local dev you have to update the Twilio console webhook URL each session (or reserve a static domain). Production deployments need a stable domain.
Webhook Validation
Twilio signs every webhook request. If you skip validation, attackers can spam your /webhook/twilio endpoint and rack up API costs.
// Validate Twilio signature before processing
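app.post('/webhook/twilio', async (req, res) => {
const signature = req.headers['x-twilio-signature'];
// Rebuild the exact public URL Twilio signed (originalUrl keeps the full path and query string)
const url = `https://${req.headers.host}${req.originalUrl}`;
if (!twilio.validateRequest(process.env.TWILIO_AUTH_TOKEN, signature, url, req.body)) {
console.error('Invalid Twilio signature - possible attack');
return res.status(403).send('Forbidden');
}
// Process webhook only after validation passes: register the call, then bridge audio
const retellCall = await retellClient.call.register({
agent_id: process.env.RETELL_AGENT_ID,
audio_websocket_protocol: 'twilio',
audio_encoding: 'mulaw',
sample_rate: 8000
});
const twiml = new twilio.twiml.VoiceResponse();
const connect = twiml.connect();
connect.stream({ url: `wss://api.retellai.com/audio-websocket/${retellCall.call_id}` });
res.type('text/xml').send(twiml.toString());
});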
This will bite you: Missing signature validation = $500 surprise bill when bots hit your webhook 10k times overnight.
Real-World Example
Barge-In Scenario
Most voice agents break when users interrupt mid-sentence. Here's what actually happens when a user cuts off your agent during a 15-second product pitch:
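// Twilio streams audio chunks to your Node.js server
// app.ws() comes from the express-ws package: require('express-ws')(app);
app.ws('/media-stream', (ws) => {
let audioBuffer = [];
let isAgentSpeaking = false; // set to true by your outbound (agent audio) path while the agent talks
let lastSpeechTimestamp = Date.now();
ws.on('message', (msg) => {
const event = JSON.parse(msg);
if (event.event === 'media') {
// User audio chunk (base64, 8kHz mu-law, ~20ms) arrives while agent is talking
const frame = Buffer.from(event.media.payload, 'base64');
audioBuffer.push(frame);
if (audioBuffer.length > 50) audioBuffer.shift(); // cap at ~1s of frames
// Detect speech energy on the newest frame to trigger barge-in
const speechDetected = detectSpeechEnergy(frame);
if (speechDetected && isAgentSpeaking) {
// CRITICAL: Stop TTS immediately, don't wait for completion
// ('clear' tells Twilio to discard agent audio it has buffered for playback)
ws.send(JSON.stringify({
event: 'clear',
streamSid: event.streamSid
}));
isAgentSpeaking = false;
audioBuffer = []; // Flush buffer to prevent stale audio
lastSpeechTimestamp = Date.now();
}
}
});
});
// Decode one G.711 mu-law byte to a 16-bit linear PCM sample
function mulawToLinear(byte) {
const u = ~byte & 0xff;
const magnitude = (((u & 0x0f) << 3) + 0x84) << ((u & 0x70) >> 4);
return (u & 0x80) ? 0x84 - magnitude : magnitude - 0x84;
}
function detectSpeechEnergy(frame) {
// Twilio sends mu-law, not PCM: decode each byte before computing RMS energy
if (frame.length === 0) return false;
let sumSquares = 0;
for (const byte of frame) {
const sample = mulawToLinear(byte);
sumSquares += sample * sample;
}
const rms = Math.sqrt(sumSquares / frame.length);
return rms > 500; // Threshold tuned for background noise rejection
}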
The race condition: Twilio's media events arrive every 20ms, but your STT processing takes 80-120ms. If you don't flush audioBuffer on barge-in, the agent speaks over the user with 100ms of stale audio.
Event Logs
Trimmed Media Streams messages during an interruption (IDs shortened, timestamps relative to stream start):
// t=0ms: Stream starts (Twilio -> server)
{ event: 'start', streamSid: 'MZ123', start: { streamSid: 'MZ123', callSid: 'CA456' } }
// t=340ms: User interrupts (Twilio -> server)
{ event: 'media', streamSid: 'MZ123', media: { payload: 'dGVzdA==', timestamp: '340' } }
// t=360ms: Speech detected, clear command sent (server -> Twilio)
{ event: 'clear', streamSid: 'MZ123' }
// t=380ms: Twilio has dropped the buffered agent audio; caller audio keeps arriving as media events
Edge Cases
Multiple rapid interrupts: User says "wait... no... actually..." within 500ms. Solution: Debounce barge-in detection with 200ms window to avoid cutting off natural pauses.
False positives from background noise: Dog barking triggers barge-in. Fix: Increase RMS threshold from 500 to 800 and add frequency analysis to filter non-speech sounds (< 300Hz or > 3400Hz).
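Network jitter: frames can arrive in bursts even though the WebSocket (TCP) delivers them in order. Use each message's sequenceNumber to detect gaps, and smooth speech detection over a short window (around 50ms) instead of reacting to a single frame.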
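The 200ms debounce from the first edge case can be expressed as "require speech in consecutive frames before firing." A minimal sketch, assuming Twilio's 20ms media frames (so 200ms is 10 frames) and a per-frame boolean from an energy check such as detectSpeechEnergy; `BargeInDebouncer` is a hypothetical name. In this version a single quiet frame resets the count, so short clicks and noise bursts never trigger a barge-in.

```javascript
// Hypothetical debouncer: fire barge-in only after speech persists for `holdMs`.
class BargeInDebouncer {
  constructor({ frameMs = 20, holdMs = 200 } = {}) {
    this.framesNeeded = Math.ceil(holdMs / frameMs);
    this.consecutive = 0;
  }

  // Feed one frame's speech decision; returns true exactly once per sustained burst.
  push(isSpeech) {
    if (!isSpeech) {
      this.consecutive = 0; // a quiet frame resets the run
      return false;
    }
    this.consecutive += 1;
    return this.consecutive === this.framesNeeded;
  }
}
```

In the media handler you would replace `if (speechDetected && isAgentSpeaking)` with `if (debouncer.push(speechDetected) && isAgentSpeaking)`, creating one debouncer per connection.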
Common Issues & Fixes
Race Conditions in Audio Streaming
Most Node.js Twilio integrations break when Retell AI's audio stream overlaps with Twilio's media events. The symptom: duplicate audio chunks or dropped frames when isAgentSpeaking flips mid-stream.
// Twilio Media Streams deliver media over the WebSocket, not as HTTP POSTs
// WRONG: one global flag shared by every call, and async work that isn't awaited
let isAgentSpeaking = false;
wss.on('connection', (ws) => {
ws.on('message', (msg) => {
const event = JSON.parse(msg);
if (event.event === 'media') {
isAgentSpeaking = true; // Race condition here: every concurrent call flips the same flag
processAudioChunk(event.media.payload); // not awaited, so chunks overlap
}
});
});
// CORRECT: Guard with a per-connection processing flag
wss.on('connection', (ws) => {
let isProcessingAudio = false; // one flag per call, not per process
ws.on('message', async (msg) => {
const event = JSON.parse(msg);
if (event.event !== 'media' || !event.media.payload) return;
if (isProcessingAudio) return; // Drop frame, don't queue
isProcessingAudio = true;
try {
const audioBuffer = Buffer.from(event.media.payload, 'base64');
await processAudioChunk(audioBuffer); // your own async handler
} finally {
isProcessingAudio = false; // Always release lock
}
});
});
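Why this breaks: Twilio sends media events every 20ms. If your processAudioChunk() awaits I/O that takes 25ms, the next message handler starts before the previous one finishes, so async work interleaves and shared state (buffers, speaking flags) is updated out of order, which shows up as garbled or duplicated audio. The per-connection isProcessingAudio guard keeps one chunk in flight per call and drops the rest.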
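If dropping frames is not acceptable, a swapped-in alternative to the flag is to serialize processing per connection with a promise chain, so each chunk starts only after the previous one finishes. A minimal sketch; `createSerialQueue` is a hypothetical helper, and `processAudioChunk` stands in for your own async handler.

```javascript
// Hypothetical per-connection queue: runs async tasks strictly one after another.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Chain onto the previous task; catch so one failure doesn't block later chunks
    const run = tail.then(task);
    tail = run.catch((err) => console.error('audio task failed:', err));
    return run;
  };
}
```

Create one queue inside each `wss.on('connection', ...)` callback and call `enqueue(() => processAudioChunk(buf))` per media message. The trade-off: nothing is dropped, but if processing is consistently slower than 20ms per frame the queue grows without bound, so pair it with a length cap.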
Webhook Signature Validation Failures
Signature validation often fails in production even for genuine Twilio requests. The culprit is usually a URL mismatch: your reverse proxy (ngrok, nginx) changes the scheme, host, or path that Express sees, so the URL you rebuild differs from the one Twilio signed.
// Parse the form body into an object: validateRequest needs the POST params, not a raw Buffer
app.use('/webhook/twilio', express.urlencoded({ extended: false }));
function validateTwilioSignature(req) {
const signature = req.headers['x-twilio-signature'];
const url = `https://${req.headers.host}${req.originalUrl}`; // Use FULL URL
return twilio.validateRequest(
process.env.TWILIO_AUTH_TOKEN,
signature,
url,
req.body
);
}
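Production fix: If validation still fails, log req.originalUrl vs req.url and compare the full URL you rebuild with the webhook URL configured in Twilio. Mounted routers and rewriting proxies can change the path or drop the query string, and any difference breaks the HMAC. Behind nginx, set app.set('trust proxy', true) and use req.protocol, which then honors X-Forwarded-Proto, so you rebuild the public https URL instead of the internal http one.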
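Another option is to rebuild the signed URL explicitly: prefer a configured public base URL, otherwise use the X-Forwarded-Proto and X-Forwarded-Host headers your proxy sets. A minimal sketch; `signedUrlFor` is hypothetical and assumes the proxy sets those headers and is the only thing allowed to reach the server (otherwise clients could forge them).

```javascript
// Hypothetical helper: rebuild the public URL Twilio signed for this request.
function signedUrlFor(req, publicBaseUrl) {
  // 1. An explicit base URL (e.g. SERVER_URL) wins: it's exactly what you gave Twilio
  if (publicBaseUrl) {
    return publicBaseUrl.replace(/\/+$/, '') + req.originalUrl;
  }
  // 2. Otherwise use the proxy's forwarded headers, taking the first value of a list
  const first = (value) => (value || '').split(',')[0].trim();
  const proto = first(req.headers['x-forwarded-proto']) || 'https';
  const host = first(req.headers['x-forwarded-host']) || req.headers.host;
  // originalUrl keeps the full path and query string; req.url may be rewritten by routers
  return `${proto}://${host}${req.originalUrl}`;
}
```

Pass the result as the url argument of twilio.validateRequest, e.g. `twilio.validateRequest(token, signature, signedUrlFor(req, process.env.SERVER_URL), req.body)`.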
Complete Working Example
This is the full server that bridges Twilio's voice infrastructure with Retell AI's conversational engine. Save it as server.js: it answers inbound calls, validates Twilio's signature, and relays audio in both directions between Twilio and Retell. Conversation state lives in Retell; use the event webhook from the earlier section to track calls on your side.
// server.js - Twilio + Retell AI relay for inbound calls
require('dotenv').config();
const express = require('express');
const WebSocket = require('ws');
const twilio = require('twilio');
// Export and method names differ between Retell SDK versions; match them to your installed version
const { RetellClient } = require('@retellai/retell-sdk');
const app = express();
const PORT = process.env.PORT || 3000;
const retellClient = new RetellClient({ apiKey: process.env.RETELL_API_KEY });
// Retell AI configuration - matches agent setup from previous sections
const retellAgentConfig = {
agent_id: process.env.RETELL_AGENT_ID,
audio_websocket_protocol: 'twilio', // Retell exchanges Twilio Media Streams messages
audio_encoding: 'mulaw',
sample_rate: 8000
};
// Twilio signature validation - prevents webhook spoofing
function validateTwilioSignature(req) {
const signature = req.headers['x-twilio-signature'];
const url = `https://${req.headers.host}${req.originalUrl}`;
return twilio.validateRequest(
process.env.TWILIO_AUTH_TOKEN,
signature,
url,
req.body
);
}
// Inbound call handler - Twilio hits this when call arrives
app.post('/incoming-call', express.urlencoded({ extended: false }), async (req, res) => {
if (!validateTwilioSignature(req)) {
return res.status(403).send('Forbidden');
}
const twiml = new twilio.twiml.VoiceResponse();
try {
// Register first: Retell's audio WebSocket is addressed by call_id, not agent_id
const retellCall = await retellClient.call.register(retellAgentConfig);
const connect = twiml.connect();
// Stream audio to our WebSocket server; custom values travel as <Parameter> elements
const stream = connect.stream({ url: `wss://${req.headers.host}/media-stream` });
stream.parameter({ name: 'callId', value: retellCall.call_id });
} catch (error) {
console.error('Retell registration failed:', error);
twiml.say('Sorry, the system is unavailable. Please try again later.');
}
res.type('text/xml');
res.send(twiml.toString());
});
// WebSocket server - relays Twilio's media stream to Retell and back
const wss = new WebSocket.Server({ noServer: true });
wss.on('connection', (ws) => {
let retellWs = null;
const pending = []; // Twilio messages received before Retell's socket is open
// Twilio → Retell: forward caller audio and stream events unchanged
ws.on('message', (message) => {
const raw = message.toString();
const event = JSON.parse(raw);
if (event.event === 'start') {
// call_id arrives in start.customParameters
const callId = event.start.customParameters && event.start.customParameters.callId;
if (!callId) {
ws.close();
return;
}
retellWs = new WebSocket(`wss://api.retellai.com/audio-websocket/${callId}`);
retellWs.on('open', () => {
pending.forEach((m) => retellWs.send(m));
pending.length = 0;
});
// Retell → Twilio: agent audio and any control messages, passed through as-is
retellWs.on('message', (data) => {
if (ws.readyState === WebSocket.OPEN) ws.send(data.toString());
});
retellWs.on('close', () => ws.close());
retellWs.on('error', (err) => console.error('Retell WebSocket error:', err));
}
if (retellWs && retellWs.readyState === WebSocket.OPEN) {
retellWs.send(raw);
} else {
pending.push(raw);
}
});
// Cleanup on disconnect
ws.on('close', () => {
if (retellWs) retellWs.close();
});
});
// Upgrade HTTP to WebSocket (only for the media stream path)
const server = app.listen(PORT, () => {
console.log(`Server running on port ${PORT}`);
});
server.on('upgrade', (request, socket, head) => {
if (!request.url.startsWith('/media-stream')) {
socket.destroy();
return;
}
wss.handleUpgrade(request, socket, head, (ws) => {
wss.emit('connection', ws, request);
});
});
Run Instructions
Prerequisites:
- Node.js 18+
- ngrok for webhook tunneling: ngrok http 3000
- Environment variables in .env:
RETELL_API_KEY=your_retell_key
RETELL_AGENT_ID=your_agent_id
TWILIO_AUTH_TOKEN=your_twilio_token
PORT=3000
Start the server:
npm install express ws twilio dotenv @retellai/retell-sdk
node server.js
Configure Twilio webhook:
Set your phone number's voice webhook to https://your-ngrok-url.ngrok.io/incoming-call (HTTP POST). Call your Twilio number—the agent answers immediately and streams audio through Retell AI's conversational engine.
What breaks in production: If you see "Connection closed before receiving a message" errors, Retell's WebSocket rejected the connection. Verify your Retell API key and agent ID. If calls start failing after you restart ngrok, the tunnel URL changed; update the Twilio webhook (or use a reserved domain) before testing again.
FAQ
Technical Questions
How do I handle audio streaming between Twilio and Retell AI in Node.js?
Twilio sends audio chunks via WebSocket to your Node.js server. You receive these chunks in the media event, extract the payload, and forward them to Retell's WebSocket endpoint. The key is maintaining two concurrent WebSocket connections: one from Twilio (inbound) and one to Retell (outbound). Use the streamSid from Twilio's start message to correlate audio streams. When Retell responds with synthesized audio, you send it back to Twilio using the same streamSid. This bidirectional flow requires careful buffer management: don't queue audio indefinitely or you'll introduce latency that breaks natural conversation flow.
What's the difference between Retell AI and VAPI for Node.js integration?
Both platforms handle AI voice agents, but they differ in webhook architecture and audio handling. Retell uses WebSocket-first streaming for real-time audio, which suits Twilio integration where every hop of latency counts. VAPI uses REST webhooks with optional WebSocket fallback, giving you more flexibility for batch processing or asynchronous workflows. For Twilio specifically, Retell's native WebSocket support reduces complexity: you don't need to manage separate REST polling loops. Choose Retell if you're building Twilio-centric systems; choose VAPI if you need multi-channel support (phone, web, SMS).
How do I validate Twilio webhook signatures in Node.js?
Twilio signs every webhook request with an HMAC-SHA1 signature in the X-Twilio-Signature header. To check it, take the full URL Twilio requested, append every POST parameter sorted by name (name followed by value, no separators), compute HMAC-SHA1 over that string with your auth token as the key, Base64-encode the result, and compare it with the header. twilio.validateRequest does all of this for you; the validateTwilioSignature helpers in this article just call it with the rebuilt URL and the parsed body. Always validate before processing: it proves the request came from Twilio and wasn't tampered with, though on its own it doesn't stop someone from replaying a captured request.
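For debugging mismatches, or for signing a test request so the earlier curl command passes validation, it helps to see those steps as code. A minimal sketch using Node's built-in crypto module and following the algorithm described above; `computeTwilioSignature` and `isValidTwilioSignature` are hypothetical names, and in the webhook itself you should keep using twilio.validateRequest.

```javascript
const crypto = require('crypto');

// Full URL + each POST param (sorted by name) as name+value,
// HMAC-SHA1 keyed with the auth token, Base64-encoded.
function computeTwilioSignature(authToken, url, params = {}) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(Buffer.from(data, 'utf-8')).digest('base64');
}

// Constant-time comparison so response timing doesn't leak how many bytes matched
function isValidTwilioSignature(authToken, signature, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signature || '');
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

To sign a local test call, compute the signature for your ngrok URL and form fields, then add it to curl with `-H "X-Twilio-Signature: <value>"`.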
Performance
Why is my Retell AI agent slow to respond?
Latency compounds at three stages: (1) Twilio audio capture and transmission (50-150ms), (2) Retell's STT + LLM inference (200-800ms depending on model), (3) TTS synthesis and audio streaming back (100-400ms). Total: 350ms to 1.35s. Optimize by raising Retell's responsiveness setting (interruption_sensitivity only controls how easily callers can cut in, not how fast the agent replies) and by using faster LLM models (GPT-3.5 instead of GPT-4). Monitor timestamp values in webhook events to identify which stage is the bottleneck.
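The total is just the sum of the per-stage ranges. Writing it down as a budget makes it easy to see which stage dominates once you plug in your own measurements. A minimal sketch using the ranges quoted above (estimates, not measured values); `latencyBudget` is a hypothetical helper.

```javascript
// Per-stage latency ranges in milliseconds, from the estimates above
const stages = {
  twilioTransport: { min: 50, max: 150 },
  sttPlusLlm: { min: 200, max: 800 },
  ttsAndPlayback: { min: 100, max: 400 },
};

function latencyBudget(stages) {
  let min = 0;
  let max = 0;
  let worstStage = null;
  for (const [name, range] of Object.entries(stages)) {
    min += range.min;
    max += range.max;
    // Track the stage with the largest worst-case contribution
    if (worstStage === null || range.max > stages[worstStage].max) worstStage = name;
  }
  return { min, max, worstStage };
}

console.log(latencyBudget(stages));
// → { min: 350, max: 1350, worstStage: 'sttPlusLlm' }
```

With these numbers the STT + LLM stage accounts for most of the worst case, which is why model choice matters more than transport tuning.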
How do I prevent audio buffer overflow in Node.js?
Twilio sends audio at 8kHz (8,000 samples/second = 160 bytes per 20ms frame). If your audioBuffer grows unbounded, you'll hit memory limits and introduce multi-second latency. Implement a fixed-size circular buffer (e.g., 2-second capacity = 16,000 samples). When full, drop oldest frames instead of queuing indefinitely. Monitor rms (root mean square) values—if RMS stays near zero for >1.5s, the user is silent; flush the buffer and reset state. This prevents stale audio from being processed after long pauses.
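A fixed-capacity ring buffer keeps memory flat. At 8kHz mu-law (1 byte per sample), 2 seconds is 16,000 bytes, or 100 frames of 160 bytes. A minimal sketch; `FrameRingBuffer` is hypothetical and stores whole 20ms frames rather than individual samples.

```javascript
// Hypothetical ring buffer of audio frames; overwrites the oldest frame when full.
class FrameRingBuffer {
  constructor(capacityFrames = 100) { // 100 x 20 ms = 2 s
    this.frames = new Array(capacityFrames);
    this.capacity = capacityFrames;
    this.start = 0; // index of the oldest frame
    this.size = 0;
    this.dropped = 0;
  }

  push(frame) {
    const end = (this.start + this.size) % this.capacity;
    this.frames[end] = frame;
    if (this.size < this.capacity) {
      this.size += 1;
    } else {
      // Buffer was full: the write above replaced the oldest frame, so advance start
      this.start = (this.start + 1) % this.capacity;
      this.dropped += 1;
    }
  }

  // Oldest-to-newest copy of the buffered frames
  toArray() {
    const out = [];
    for (let i = 0; i < this.size; i++) out.push(this.frames[(this.start + i) % this.capacity]);
    return out;
  }

  // e.g. after RMS stays near zero for more than 1.5 s
  clear() {
    this.start = 0;
    this.size = 0;
  }
}
```

The dropped counter is worth exporting as a metric: a steadily rising value means processing can't keep up with the 20ms frame rate.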
What sample rate should I use for Twilio + Retell?
Twilio Media Streams deliver 8kHz mu-law audio. Retell supports 8kHz, 16kHz, and 24kHz. Stick with 8kHz to avoid transcoding overhead, since every conversion adds 20-50ms latency. Set sample_rate: 8000 in retellAgentConfig. Upsampling a phone call to 16kHz costs CPU and latency without improving quality, because the telephone network has already limited the audio bandwidth.
Platform Comparison
Should I use Twilio's built-in AI or integrate Retell AI?
Twilio's Autopilot (now Flex) is tightly integrated but limited to Twilio's LLM models. Retell AI gives you choice: OpenAI
Resources
Retell AI Documentation: Official Retell AI API docs – Complete reference for retellAgentConfig, agent creation, and WebSocket audio streaming protocols.
Twilio Voice API: Twilio Voice documentation – TwiML generation, webhook signature validation (twilio.validateRequest), and call control via the twilio SDK.
Node.js Webhook Security: OWASP Webhook Signature Validation – Best practices for validating signature headers in production.
GitHub Reference: Retell AI + Twilio integration examples – Sample implementations using retellClient and WebSocket event handling.