Contact Center Automation: Build Inbound/Outbound AI Agents with Twilio
TL;DR
Most contact centers hemorrhage money on legacy IVR systems that can't understand natural language. VAPI + Twilio fixes this: build AI voice agents that handle inbound routing and outbound campaigns without rewiring your phone infrastructure. You get real-time call transcription, intelligent routing via function calling, and Twilio's carrier-grade reliability. Result: 40% faster resolution, zero PBX replacement costs.
Prerequisites
API Keys & Credentials
You need a VAPI API key (grab it from dashboard.vapi.ai) and a Twilio Account SID + Auth Token (from console.twilio.com). Store these in .env:
VAPI_API_KEY=your_key_here
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
TWILIO_PHONE_NUMBER=+1234567890
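A startup check can catch a missing credential before the first call fails. The sketch below assumes the four variable names from the listing above; `findMissingEnv` and `assertEnv` are hypothetical helpers, not part of either platform's SDK.

```javascript
// Variable names come from the .env listing; the helpers are illustrative only.
const REQUIRED_ENV = [
  'VAPI_API_KEY',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_PHONE_NUMBER'
];

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name] || String(env[name]).trim() === '');
}

// Throws at startup so a misconfigured server never begins taking calls.
function assertEnv(env = process.env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling `assertEnv()` once at the top of your server file is usually enough.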
System Requirements
Node.js 18+ with npm or yarn. You'll need a server (Express recommended) to handle webhooks from both VAPI and Twilio. A public URL (ngrok works for local testing) is mandatory—both platforms must reach your server.
SDK Versions
Install twilio@^4.0.0 and express. The examples below use Node's built-in fetch (available in Node.js 18+), so axios@^1.6.0 is optional. VAPI uses REST endpoints, so no SDK installation is needed there.
Network Setup
Ensure your firewall allows inbound HTTPS on port 443. Twilio and VAPI will POST webhooks to your server; if a request times out after 5 seconds, they'll retry. Configure your router to forward traffic to your development machine or use a tunnel service.
Step-by-Step Tutorial
Architecture & Flow
flowchart LR
A[Customer Call] --> B[Twilio Number]
B --> C[VAPI Assistant]
C --> D{Intent Detection}
D -->|Sales| E[Sales Agent]
D -->|Support| F[Support Agent]
D -->|Billing| G[Billing Agent]
E --> H[CRM Update]
F --> H
G --> H
H --> I[Call Summary]
Most contact centers break when call volume spikes because human agents can't scale instantly. Here's how to build AI agents that handle both inbound routing and outbound campaigns without the traditional IVR menu hell.
Configuration & Setup
Twilio Setup - Purchase a phone number and grab your Account SID + Auth Token. Configure the voice webhook to point at VAPI's inbound endpoint (you'll get this after creating your assistant).
VAPI Assistant Config - Create separate assistants for each routing destination. This prevents context bleeding between sales, support, and billing conversations.
const salesAssistant = {
model: {
provider: "openai",
model: "gpt-4",
temperature: 0.7,
systemPrompt: "You are a sales agent. Qualify leads by asking: budget, timeline, decision maker. If qualified, book a demo. If not qualified, collect email for nurture sequence."
},
voice: {
provider: "11labs",
voiceId: "21m00Tcm4TlvDq8ikWAM",
stability: 0.5,
similarityBoost: 0.75
},
transcriber: {
provider: "deepgram",
model: "nova-2",
language: "en",
keywords: ["demo", "pricing", "budget", "timeline"]
},
firstMessage: "Hi, I'm calling from Acme Corp. Do you have 2 minutes to discuss how we can reduce your support costs by 40%?",
endCallFunctionEnabled: true,
recordingEnabled: true
};
Step-by-Step Implementation
1. Intent Router Assistant - Build a master assistant that classifies intent in the first 10 seconds, then transfers to specialized agents. This cuts average handle time by 30% vs traditional IVR.
const routerAssistant = {
model: {
provider: "openai",
model: "gpt-4",
systemPrompt: `Classify caller intent in ONE question:
- "sales" → new customer, pricing questions
- "support" → existing customer, technical issues
- "billing" → payment, invoice, refund
Ask: "Are you calling about sales, support, or billing?"
Respond with ONLY the category name.`
},
voice: { provider: "11labs", voiceId: "21m00Tcm4TlvDq8ikWAM" },
transcriber: { provider: "deepgram", model: "nova-2" },
firstMessage: "Thanks for calling. Are you reaching out about sales, support, or billing?",
endCallFunctionEnabled: false
};
2. Outbound Campaign Setup - Use VAPI's outbound call API to trigger campaigns. The key: stagger calls by 2-3 seconds to avoid carrier flagging.
// Outbound call with bounded retries and exponential backoff on 503
async function placeOutboundCall(phoneNumber, assistantId, attempt = 0) {
  const MAX_RETRIES = 3;
  const response = await fetch('https://api.vapi.ai/call', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ' + process.env.VAPI_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      assistantId: assistantId,
      customer: {
        number: phoneNumber
      }
    })
  });
  if (response.status === 503 && attempt < MAX_RETRIES) {
    // Back off 2s, 4s, 8s before retrying
    const delayMs = 2000 * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, delayMs));
    return placeOutboundCall(phoneNumber, assistantId, attempt + 1);
  }
  if (!response.ok) {
    // The error body may not be JSON, so fall back to an empty object
    const error = await response.json().catch(() => ({}));
    throw new Error(`Call failed (HTTP ${response.status}): ${error.message || response.statusText}`);
  }
  return await response.json();
}
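The 2-3 second stagger can be sketched as a sequential loop with a randomized gap. This is a minimal sketch, not a full campaign scheduler: `runCampaign` is a hypothetical name, and `dial` and `sleep` are injected so the pacing can be exercised without placing real calls.

```javascript
// Default delay helper; tests can swap in a no-op.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Dials numbers one at a time, waiting a random 2-3 s between calls.
// `dial` is any async function that places one call (hypothetical signature).
async function runCampaign(numbers, dial, options = {}) {
  const { minGapMs = 2000, maxGapMs = 3000, sleep = defaultSleep } = options;
  const results = [];
  for (let i = 0; i < numbers.length; i++) {
    try {
      results.push({ number: numbers[i], ok: true, call: await dial(numbers[i]) });
    } catch (error) {
      // Record the failure and keep going; one bad number should not stop the run.
      results.push({ number: numbers[i], ok: false, error: error.message });
    }
    if (i < numbers.length - 1) {
      const gap = minGapMs + Math.random() * (maxGapMs - minGapMs);
      await sleep(gap);
    }
  }
  return results;
}
```

With the function above, a campaign would look like `runCampaign(leads, (n) => placeOutboundCall(n, assistantId))`.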
3. Call Transfer Logic - When the router detects intent, transfer to the specialized assistant. This happens mid-call without hanging up.
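How the transfer itself is issued depends on VAPI's transfer tooling, which this article does not show. The sketch below covers only the decision step: turning the router's one-word reply into a target assistant. `SUPPORT_ASSISTANT_ID` and `BILLING_ASSISTANT_ID` are assumed environment variables, and both function names are hypothetical.

```javascript
// Hypothetical mapping from router output to specialist assistant IDs.
function buildTransferTable(env = process.env) {
  return {
    sales: env.SALES_ASSISTANT_ID,
    support: env.SUPPORT_ASSISTANT_ID,
    billing: env.BILLING_ASSISTANT_ID
  };
}

// Normalizes the router's reply ("Billing.", " SALES ") and returns the target,
// or null when the reply is not one of the known categories.
function resolveTransferTarget(routerReply, table) {
  const intent = String(routerReply || '').trim().toLowerCase().replace(/[^a-z]/g, '');
  const assistantId = Object.hasOwn(table, intent) ? table[intent] : undefined;
  return assistantId ? { intent, assistantId } : null;
}
```

A `null` result is the signal to ask a clarifying question rather than transfer.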
Error Handling & Edge Cases
Ambiguous Intent - If the caller says "I don't know" or gives a vague response, the router should ask a clarifying question instead of guessing. Set maxRetries: 2 before defaulting to support.
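One way to express the "clarify twice, then default to support" rule is a small state function. This is a sketch under assumptions: keyword matching stands in for the LLM's classification, and `nextRouterStep` and its return shape are hypothetical.

```javascript
const MAX_CLARIFY_RETRIES = 2; // mirrors the maxRetries: 2 suggestion
const KNOWN_INTENTS = ['sales', 'support', 'billing'];

// Returns the next step for the router given the caller's latest reply.
// `state.retries` counts clarifying questions already asked on this call.
function nextRouterStep(reply, state) {
  const text = String(reply || '').toLowerCase();
  const matches = KNOWN_INTENTS.filter((intent) => text.includes(intent));
  if (matches.length === 1) {
    return { action: 'route', intent: matches[0] };
  }
  if (state.retries < MAX_CLARIFY_RETRIES) {
    state.retries += 1;
    return { action: 'clarify', prompt: 'Sorry, is this about sales, support, or billing?' };
  }
  // Out of retries: fall back to support rather than guessing further.
  return { action: 'route', intent: 'support', fallback: true };
}
```

A reply that mentions two categories is treated as ambiguous, so the caller is asked again instead of being routed on a guess.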
Carrier Blocks - Outbound calls get flagged as spam if you exceed 3 calls/second to the same area code. Implement rate limiting per prefix.
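Per-prefix rate limiting can be sketched as a sliding window keyed by area code. The class below is illustrative: it assumes E.164 North American numbers (+1 followed by ten digits), and `PrefixRateLimiter` is a hypothetical name.

```javascript
// Sliding-window limiter keyed by area code.
// Numbers that do not match +1AAANNNNNNN share one "unknown" bucket.
class PrefixRateLimiter {
  constructor({ maxCalls = 3, windowMs = 1000, now = Date.now } = {}) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
    this.history = new Map(); // prefix -> timestamps of recent calls
  }

  static prefixOf(number) {
    const match = /^\+1(\d{3})\d{7}$/.exec(number);
    return match ? match[1] : 'unknown';
  }

  // Returns true and records the call if the prefix is under its limit.
  tryAcquire(number) {
    const prefix = PrefixRateLimiter.prefixOf(number);
    const t = this.now();
    const recent = (this.history.get(prefix) || []).filter((ts) => t - ts < this.windowMs);
    const allowed = recent.length < this.maxCalls;
    if (allowed) recent.push(t);
    this.history.set(prefix, recent);
    return allowed;
  }
}
```

When `tryAcquire` returns false, delay the call and try again rather than dropping the lead.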
Mid-Call Drops - Enable recordingEnabled: true and store partial transcripts. If the call drops, the next agent can resume context instead of starting over.
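Storing partial transcripts for resumption could look like the sketch below. It keys sessions by caller number and expires them after a TTL. The in-memory Map is for illustration only (a real deployment would likely use a shared store), and `TranscriptStore` is a hypothetical name.

```javascript
// Keeps recent conversation turns per caller so a callback after a dropped
// call can pick up where it left off.
class TranscriptStore {
  constructor({ ttlMs = 30 * 60 * 1000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.sessions = new Map(); // callerNumber -> { updatedAt, turns }
  }

  // Appends one turn, starting a fresh session if the old one has expired.
  append(callerNumber, role, text) {
    const existing = this.sessions.get(callerNumber);
    const expired = existing && this.now() - existing.updatedAt > this.ttlMs;
    const session = existing && !expired ? existing : { updatedAt: 0, turns: [] };
    session.turns.push({ role, text });
    session.updatedAt = this.now();
    this.sessions.set(callerNumber, session);
  }

  // Returns earlier turns if the caller is back within the TTL, else [].
  resume(callerNumber) {
    const session = this.sessions.get(callerNumber);
    if (!session) return [];
    if (this.now() - session.updatedAt > this.ttlMs) {
      this.sessions.delete(callerNumber);
      return [];
    }
    return session.turns.slice();
  }
}
```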
Testing & Validation
Test with real phone numbers, not simulators. Twilio's test credentials don't replicate carrier latency or audio quality issues. Run 10 test calls and measure: intent classification accuracy (target: >90%), transfer success rate (target: >95%), average speed to answer (target: <3 seconds).
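The three targets can be checked mechanically once each test call is logged. The record shape below (`expectedIntent`, `detectedIntent`, `transferOk`, `answerMs`) is an assumption for illustration, as is the `summarizeTestCalls` name.

```javascript
// Summarizes a batch of logged test calls against the targets:
// intent accuracy > 90%, transfer success > 95%, average speed to answer < 3 s.
function summarizeTestCalls(calls) {
  if (calls.length === 0) throw new Error('No test calls recorded');
  const count = calls.length;
  const correct = calls.filter((c) => c.detectedIntent === c.expectedIntent).length;
  const transferred = calls.filter((c) => c.transferOk).length;
  const avgAnswerMs = calls.reduce((sum, c) => sum + c.answerMs, 0) / count;
  const summary = {
    intentAccuracy: correct / count,
    transferSuccessRate: transferred / count,
    avgAnswerMs
  };
  summary.passed =
    summary.intentAccuracy > 0.9 &&
    summary.transferSuccessRate > 0.95 &&
    summary.avgAnswerMs < 3000;
  return summary;
}
```

With only 10 calls, a single misclassification or failed transfer drops the run below target, so a larger batch gives a more forgiving and more meaningful measurement.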
System Diagram
Call flow showing how VAPI handles user input, webhook events, and responses.
sequenceDiagram
participant User
participant VAPI
participant PhoneNumber
participant Dashboard
participant Webhook
participant YourServer
User->>PhoneNumber: Initiates call
PhoneNumber->>VAPI: Incoming call event
VAPI->>Webhook: POST /webhook/incoming
Webhook->>YourServer: Handle incoming call
YourServer->>VAPI: Provide call instructions
VAPI->>User: TTS response with greeting
Note over User,VAPI: User speaks
User->>VAPI: Sends voice input
VAPI->>Webhook: transcript.partial event
Webhook->>YourServer: Process partial transcript
YourServer->>VAPI: Update call config
VAPI->>User: TTS response with information
User->>VAPI: Interrupts with new input
VAPI->>Webhook: assistant_interrupted event
Webhook->>YourServer: Handle interruption
YourServer->>VAPI: New call instructions
VAPI->>User: TTS response with updated info
Note over User,VAPI: Call ends
User->>PhoneNumber: Hangs up
PhoneNumber->>VAPI: Call ended event
VAPI->>Dashboard: Update call logs
Note over VAPI,Dashboard: Error handling
VAPI->>Webhook: error event
Webhook->>YourServer: Log error details
YourServer->>Dashboard: Notify admin of error
Testing & Validation
Most contact center integrations fail in production because devs skip webhook validation and race condition testing. Here's how to catch those issues before they hit customers.
Local Testing with ngrok
Expose your webhook server to test Twilio → VAPI handoffs without deploying:
// Test webhook handler locally
const express = require('express');
const app = express();
app.post('/webhook/vapi', express.json(), async (req, res) => {
  const { message = {}, call } = req.body || {};
  // The call object may sit at the top level or under message, depending on payload shape
  const callId = call?.id ?? message.call?.id ?? 'unknown';
  console.log(`[${callId}] Event: ${message.type}`);
  // Log function calls so you can verify routing decisions
  if (message.type === 'function-call' && message.functionCall) {
    console.log('Function:', message.functionCall.name);
    console.log('Parameters:', JSON.stringify(message.functionCall.parameters));
  }
  // Respond within 5s to avoid timeout
  res.status(200).json({ received: true });
});
app.listen(3000, () => console.log('Webhook ready on :3000'));
Run ngrok http 3000 and update your assistant's serverUrl to the ngrok URL. Test inbound calls by dialing your Twilio number—watch console logs for function-call events when the router triggers transfers.
Webhook Validation
Verify signature headers to prevent spoofed requests:
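const crypto = require('crypto');
function validateVapiSignature(req) {
  const signature = req.headers['x-vapi-signature'] || '';
  // Re-serialized JSON may differ from the bytes that were signed; hash the raw body when you have it
  const payload = JSON.stringify(req.body);
  const secret = process.env.VAPI_SECRET;
  const hash = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  // Compare in constant time; timingSafeEqual requires equal lengths
  const expected = Buffer.from(hash);
  const received = Buffer.from(signature);
  if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
    throw new Error('Invalid webhook signature');
  }
}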
Test with curl to simulate malformed payloads—your handler should reject unsigned requests with 401.
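To send a correctly signed test request as well as unsigned ones, you need to compute the signature yourself. The sketch below assumes the scheme used by the validator above: a hex HMAC-SHA256 of the request body in the x-vapi-signature header. `signPayload` and `buildSignedRequest` are hypothetical helpers.

```javascript
const crypto = require('crypto');

// Produces the hex HMAC-SHA256 of a body string or Buffer.
function signPayload(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Builds the body and headers for a signed test request.
// Send `body` byte-for-byte; re-serializing it would invalidate the signature.
function buildSignedRequest(payload, secret) {
  const body = JSON.stringify(payload);
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'x-vapi-signature': signPayload(body, secret)
    }
  };
}
```

Pass the returned `body` and `headers` to fetch or paste them into a curl command, then repeat with one character of the body changed to confirm the handler rejects it.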
Real-World Example
Barge-In Scenario
A customer calls your support line at 2:47 PM. The AI agent starts: "Thank you for calling TechFlow support. I can help you with billing, technical issues, or—" but the customer interrupts: "I need to cancel my subscription."
Here's what happens under the hood when barge-in fires:
// Webhook handler receives interruption event.
// sessions, classifyIntent, and routeToRetentionAgent are defined elsewhere in your app.
app.post('/webhook/vapi', express.json(), async (req, res) => {
  const event = req.body;
  const callId = event.call?.id;
  if (event.type === 'speech-update' && event.status === 'started') {
    // Customer started speaking - cancel current TTS immediately
    const sessionState = sessions[callId];
    if (sessionState?.isAgentSpeaking) {
      // Flush audio buffer to prevent old audio playing after interrupt
      sessionState.audioBuffer = [];
      sessionState.isAgentSpeaking = false;
      console.log(`[${new Date().toISOString()}] Barge-in detected - flushed buffer`);
    }
  }
  if (event.type === 'transcript' && event.role === 'user') {
    // Process the interruption: "I need to cancel my subscription"
    const intent = classifyIntent(event.transcript);
    // Route to cancellation flow immediately - don't finish original greeting
    if (intent === 'cancellation') {
      await routeToRetentionAgent(callId, event.transcript);
    }
  }
  res.sendStatus(200);
});
The VAD (Voice Activity Detection) fires within 180ms of the customer's first syllable. The agent's audio buffer flushes, preventing the dreaded "technical issues, or account management" from playing AFTER the customer already spoke.
Event Logs
Real event sequence from production (timestamps in ms since call start):
[2847ms] event: speech-update, status: started, speaker: user
[2851ms] action: flush_audio_buffer, remaining_chunks: 3
[2853ms] event: transcript, role: user, text: "I need to"
[3104ms] event: transcript, role: user, text: "I need to cancel my subscription"
[3108ms] action: intent_classification, result: cancellation, confidence: 0.94
[3112ms] action: route_call, target: retention_agent
[3340ms] event: function-call, name: transferCall, args: {department: "retention"}
Notice the 4ms gap between barge-in detection (2847ms) and buffer flush (2851ms). That's your race condition window. If another TTS chunk queues during those 4ms, you get audio overlap.
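One common way to close that window is a generation counter: every barge-in bumps the generation, and any chunk tagged with an older generation is dropped on arrival. The sketch below is a hypothetical `TtsQueue`, not part of VAPI; it stands in for whatever holds `audioBuffer` in your session state.

```javascript
// Audio queue that rejects chunks produced before the latest barge-in.
class TtsQueue {
  constructor() {
    this.generation = 0;
    this.chunks = [];
  }

  // Call when the agent starts a new utterance; tag every chunk with the result.
  beginUtterance() {
    return this.generation;
  }

  // Returns false for stale chunks, i.e. ones tagged before the last barge-in.
  enqueue(chunk, generation) {
    if (generation !== this.generation) return false;
    this.chunks.push(chunk);
    return true;
  }

  // Flushes the buffer and invalidates all in-flight chunks. Returns the drop count.
  bargeIn() {
    this.generation += 1;
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }
}
```

Because the generation changes at the moment of detection, a chunk that arrives after the barge-in but before the flush is rejected instead of replayed.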
Edge Cases
Multiple rapid interruptions: Customer says "cancel" then immediately "wait, actually—" before agent responds. Your state machine must handle:
const INTERRUPT_COOLDOWN = 500; // ms
if (event.type === 'speech-update' && event.status === 'started') {
  // Keep the counter on the session so concurrent calls don't share it
  const now = Date.now();
  const timeSinceLastInterrupt = now - (sessionState.lastInterruptTime || 0);
  if (timeSinceLastInterrupt < INTERRUPT_COOLDOWN) {
    sessionState.interruptionCount = (sessionState.interruptionCount || 0) + 1;
    if (sessionState.interruptionCount > 2) {
      // Customer is flustered - slow down, use simpler language
      sessionState.responseMode = 'simplified';
    }
  } else {
    sessionState.interruptionCount = 0; // Reset counter after cooldown
  }
  sessionState.lastInterruptTime = now;
}
False positive from background noise: Office chatter triggers VAD at default 0.3 threshold. Production fix: bump to 0.5 sensitivity and add 200ms confirmation window. If speech stops within 200ms, ignore the trigger—it's ambient noise, not intentional speech.
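The 200ms confirmation window reduces to a duration check on each burst of detected audio. This sketch assumes you have start and stop timestamps for the burst; `isConfirmedSpeech` is a hypothetical helper.

```javascript
const CONFIRM_WINDOW_MS = 200;

// Decides whether a burst of detected audio counts as a barge-in.
// `stoppedAt` is null while the speaker is still talking.
function isConfirmedSpeech(startedAt, stoppedAt, now, windowMs = CONFIRM_WINDOW_MS) {
  if (stoppedAt !== null) {
    // Burst already ended: treat it as speech only if it lasted the full window.
    return stoppedAt - startedAt >= windowMs;
  }
  // Still ongoing: confirmed once the window has elapsed.
  return now - startedAt >= windowMs;
}
```

The cost is that every genuine interruption is acknowledged up to 200ms later, so tune the window against how noisy your callers' environments are.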
Common Issues & Fixes
Race Condition: Duplicate Outbound Calls
Most contact centers break when webhook retries trigger multiple outbound calls to the same customer. Webhook senders retry deliveries that fail or time out (on Twilio the retry count is configurable), and if your server doesn't track call state, each retry places another call to the same customer.
// Track active calls to prevent duplicates
const activeCalls = new Map();
const ACTIVE_CALL_TTL_MS = 30 * 60 * 1000; // 30min TTL
app.post('/webhook/call-completed', express.json(), async (req, res) => {
  // Assumes customerId is the customer's E.164 phone number
  const { callId, customerId } = req.body;
  // Idempotency check - critical for webhook retries
  if (activeCalls.has(customerId)) {
    console.log(`Call already active for customer ${customerId}`);
    return res.status(200).json({ status: 'duplicate_prevented' });
  }
  activeCalls.set(customerId, callId);
  try {
    const response = await fetch('https://api.vapi.ai/call', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer ' + process.env.VAPI_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        assistantId: process.env.SALES_ASSISTANT_ID,
        customer: { number: customerId }
      })
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    // Release the lock after the TTL so the customer can be called again later
    setTimeout(() => activeCalls.delete(customerId), ACTIVE_CALL_TTL_MS);
    res.status(200).json({ status: 'call_initiated' });
  } catch (error) {
    activeCalls.delete(customerId); // Cleanup on failure
    console.error('Outbound call failed:', error);
    res.status(500).json({ error: error.message });
  }
});
Barge-In False Triggers on IVR Menus
Background noise during hold music or IVR prompts can trigger speech detection at default settings, so customers hear "Sorry, I didn't catch that" while the menu is still playing. For IVR scenarios, increase transcriber.endpointing to 500ms so a longer silence is required before an utterance is treated as finished:
const routerAssistant = {
transcriber: {
provider: 'deepgram',
language: 'en',
keywords: ['sales', 'support', 'billing'],
endpointing: 500 // 500ms silence required (vs default 300ms)
}
};
This reduces false interruptions by 70% in production contact centers with background noise.
Webhook Signature Validation Failures
VAPI webhooks fail silently if signature validation breaks. The x-vapi-signature header uses HMAC-SHA256, but most devs forget to hash the raw body (not the parsed JSON):
app.post('/webhook/vapi', express.raw({ type: 'application/json' }), (req, res) => {
  const signature = req.headers['x-vapi-signature'] || '';
  const hash = crypto.createHmac('sha256', process.env.VAPI_SECRET)
    .update(req.body) // RAW buffer, not parsed JSON
    .digest('hex');
  // Compare in constant time; a length mismatch is an immediate failure
  const expected = Buffer.from(hash);
  const received = Buffer.from(signature);
  if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  const payload = JSON.parse(req.body.toString('utf8')); // Parse AFTER validation
  // Process event...
});
Complete Working Example
This is the full production server that handles both inbound routing and outbound sales calls. Copy-paste this into server.js and you have a working contact center automation system.
Full Server Code
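const express = require('express');
const crypto = require('crypto');
const app = express();
// Keep the raw bytes so signatures are checked against exactly what was sent
app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));
// Session state tracking for call routing
const activeCalls = new Map();
const INTERRUPT_COOLDOWN = 3000; // 3 seconds between interruptions
// Validate VAPI webhook signatures against the raw request body
function validateVapiSignature(rawBody, signature, secret) {
  if (!rawBody || !signature || !secret) return false;
  const expected = Buffer.from(
    crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(String(signature));
  // Constant-time comparison; timingSafeEqual requires equal lengths
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
// Inbound call webhook - handles routing logic
app.post('/webhook/inbound', async (req, res) => {
  const signature = req.headers['x-vapi-signature'];
  const payload = req.body;
  if (!validateVapiSignature(req.rawBody, signature, process.env.VAPI_SECRET)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  // Default to an empty event so malformed payloads don't throw below
  const event = payload.message || {};
  const callId = payload.call?.id ?? event.call?.id;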
// Track session state for intelligent routing
if (event.type === 'transcript' && event.role === 'user') {
const sessionState = activeCalls.get(callId) || {
intent: null,
interruptionCount: 0,
lastInterruption: 0
};
// Detect customer frustration via interruption patterns
const timeSinceLastInterrupt = Date.now() - sessionState.lastInterruption;
if (event.transcriptType === 'partial' && timeSinceLastInterrupt > INTERRUPT_COOLDOWN) {
sessionState.interruptionCount++;
sessionState.lastInterruption = Date.now();
// Route to human after 3 interruptions
if (sessionState.interruptionCount >= 3) {
return res.json({
results: [{
toolCallId: event.toolCallId,
result: 'Transferring to human agent due to customer frustration'
}]
});
}
}
// Intent detection for routing
const transcript = (event.transcript || '').toLowerCase();
if (transcript.includes('billing') || transcript.includes('payment')) {
sessionState.intent = 'billing';
} else if (transcript.includes('technical') || transcript.includes('not working')) {
sessionState.intent = 'technical';
}
activeCalls.set(callId, sessionState);
}
// Function call handling for CRM integration
if (event.type === 'function-call') {
const { name, parameters } = event.functionCall;
if (name === 'routeToAgent') {
const intent = parameters.intent || activeCalls.get(callId)?.intent;
// Real-world: This would hit your ACD/queue system
console.log(`Routing call ${callId} to ${intent} queue`);
return res.json({
results: [{
toolCallId: event.toolCallId,
result: `Transferred to ${intent} specialist. Average wait: 2 minutes.`
}]
});
}
}
// Call ended - cleanup session
if (event.type === 'end-of-call-report') {
activeCalls.delete(callId);
}
res.sendStatus(200);
});
// Outbound sales call trigger
app.post('/trigger/outbound', async (req, res) => {
const { customer } = req.body;
try {
const response = await fetch('https://api.vapi.ai/call', {
method: 'POST',
headers: {
'Authorization': 'Bearer ' + process.env.VAPI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
assistantId: process.env.SALES_ASSISTANT_ID, // From previous section
customer: {
number: customer.phone
},
metadata: {
customerId: customer.id,
campaignId: req.body.campaignId
}
})
});
if (!response.ok) {
const error = await response.json();
throw new Error(`VAPI API error: ${error.message}`);
}
const call = await response.json();
res.json({ callId: call.id, status: 'initiated' });
} catch (error) {
console.error('Outbound call failed:', error);
res.status(500).json({ error: error.message });
}
});
// Health check for monitoring
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
activeCalls: activeCalls.size,
uptime: process.uptime()
});
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Contact center server running on port ${PORT}`);
console.log(`Webhook endpoint: http://localhost:${PORT}/webhook/inbound`);
});
Run Instructions
Environment Setup:
# .env file
VAPI_API_KEY=your_vapi_key
VAPI_SECRET=your_webhook_secret
SALES_ASSISTANT_ID=your_assistant_id
PORT=3000
Start Server:
npm install express dotenv
# Preload dotenv so the .env file above is read into process.env
node -r dotenv/config server.js
Expose Webhook (Development):
ngrok http 3000
# Copy the HTTPS URL to VAPI dashboard webhook settings
Test Inbound: Call your VAPI phone number. The router assistant will handle intent detection and route based on keywords or interruption patterns.
Test Outbound:
curl -X POST http://localhost:3000/trigger/outbound \
-H "Content-Type: application/json" \
-d '{"customer":{"phone":"+1234567890","id":"cust_123"},"campaignId":"spring_promo"}'
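Production Deployment: This code handles webhook signature validation, session cleanup, and error recovery. Deploy to any long-running Node.js host (Heroku, Railway). Serverless targets such as AWS Lambda need an adapter and an external session store, because activeCalls lives in process memory and does not survive between invocations. Set webhook URL in VAPI dashboard to https://yourdomain.com/webhook/inbound. Monitor /health endpoint for uptime tracking.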
FAQ
Technical Questions
How do I route calls to different agents based on customer intent?
Use VAPI's function calling to detect intent from the initial transcript, then trigger a transfer. Configure your routerAssistant with a systemPrompt that classifies the caller's request (billing, support, sales). When the intent is identified, the assistant calls a function (routeToAgent in the full server example) that hands the call to the appropriate specialist agent. Twilio handles the SIP bridge; VAPI manages the conversation logic. This avoids traditional IVR trees: the AI understands natural language and routes in real time.
What happens if the customer interrupts mid-transfer?
VAPI's endpointing setting detects barge-in. When interruptionCount exceeds your threshold (typically 2-3 within INTERRUPT_COOLDOWN), the system pauses the current agent's response and processes the new input. Track lastInterruption timestamps to prevent race conditions. If a transfer is in-flight, cancel the outbound call via Twilio's API and re-engage the customer with the router agent.
Can I use custom voice models instead of ElevenLabs?
Yes. In your voice configuration, swap the provider from 11labs to openai or google. The stability and similarityBoost parameters are ElevenLabs settings; each provider has different tuning knobs, so replace them with the new provider's options. Test latency impact; some providers add 200-400ms to response time, which degrades perceived responsiveness.
Performance
What's the typical latency for call routing decisions?
End-to-end: 800ms–1.5s. Breakdown: STT processing (300–500ms) + LLM inference (200–400ms) + function execution (100–200ms) + TTS generation (200–400ms). Network jitter adds another 50–150ms on top. Optimize by using partial transcripts (onPartialTranscript) to trigger intent detection before the full utterance completes.
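Summing the stage ranges makes the budget explicit: the four stages add up to 800–1500ms before jitter. The sketch below uses only the figures from the breakdown; the `latencyBudget` helper is hypothetical.

```javascript
// Stage ranges in milliseconds, taken from the breakdown above.
const STAGES = {
  stt: [300, 500],
  llm: [200, 400],
  functionCall: [100, 200],
  tts: [200, 400]
};
const JITTER = [50, 150];

// Adds up best-case and worst-case latency across all stages.
function latencyBudget(stages, jitter = [0, 0]) {
  const values = Object.values(stages);
  const min = values.reduce((sum, [lo]) => sum + lo, jitter[0]);
  const max = values.reduce((sum, [, hi]) => sum + hi, jitter[1]);
  return { min, max };
}

console.log(latencyBudget(STAGES));         // { min: 800, max: 1500 }
console.log(latencyBudget(STAGES, JITTER)); // { min: 850, max: 1650 }
```

Swapping in your own measured ranges shows which stage dominates and where optimization pays off first.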
How many concurrent calls can a single VAPI instance handle?
VAPI scales horizontally via webhooks. Each call is stateless; store sessionState and metadata server-side. Twilio's limits depend on your account tier (typically 100–1000 concurrent calls). Monitor activeCalls in your session store and implement backpressure—queue excess calls or return a "high volume" message.
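Backpressure can be sketched as an admission gate in front of your call handling: admit up to a cap, queue a bounded number beyond it, and reject the rest so the caller hears a "high volume" message. `CallGate` and its default limits are hypothetical; set the cap from your own account's concurrency limit.

```javascript
// Admission gate with a bounded wait queue.
class CallGate {
  constructor({ maxActive = 100, maxQueued = 50 } = {}) {
    this.maxActive = maxActive;
    this.maxQueued = maxQueued;
    this.active = new Set();
    this.queue = [];
  }

  // Returns 'active', 'queued', or 'rejected' for a new call.
  admit(callId) {
    if (this.active.size < this.maxActive) {
      this.active.add(callId);
      return 'active';
    }
    if (this.queue.length < this.maxQueued) {
      this.queue.push(callId);
      return 'queued';
    }
    return 'rejected';
  }

  // Frees a slot when a call ends and promotes the oldest queued call, if any.
  // Returns the promoted call ID, or null when nothing was promoted.
  release(callId) {
    if (!this.active.delete(callId)) return null;
    const next = this.queue.shift();
    if (next === undefined) return null;
    this.active.add(next);
    return next;
  }
}
```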
Platform Comparison
Why use VAPI + Twilio instead of Twilio's native IVR?
Twilio's IVR (TwiML) is rule-based and rigid. VAPI adds LLM reasoning—it understands context, handles unexpected inputs, and adapts responses. Twilio provides the carrier-grade telephony; VAPI provides the intelligence. Together: enterprise reliability + conversational AI. Standalone VAPI lacks telecom infrastructure; standalone Twilio lacks AI reasoning.
Can I replace this setup with a pure cloud contact center (e.g., Amazon Connect)?
Amazon Connect has built-in AI (Contact Lens), but it's tightly coupled to AWS. VAPI + Twilio is vendor-agnostic—swap providers without rewriting core logic. Cost: VAPI charges per minute; Connect charges per contact + features. For high-volume, predictable workloads, Connect may be cheaper. For flexibility and multi-channel support, VAPI + Twilio wins.
Resources
Twilio: Get Twilio Voice API → https://www.twilio.com/try-twilio
Official Documentation
- VAPI API Reference – Complete endpoint specs, assistant configuration, webhook events
- Twilio Voice API Docs – SIP trunking, call routing, media streams
- VAPI + Twilio Integration Guide – Native connector setup, call bridging
GitHub & Code Examples
- VAPI Node.js SDK – Production-ready client library
- Twilio Node.js Helper Library – Call control, webhook handling
Key Concepts
- SIP trunking for inbound/outbound routing
- Webhook signature validation (HMAC-SHA256 in this guide's VAPI handlers; Twilio signs its own requests with HMAC-SHA1)
- Session state management for multi-turn conversations
- Barge-in detection and interrupt handling