How to Migrate from Deprecated VAPI Transcriber Endpoints to Deepgram v2 in Retell AI Agents
TL;DR
VAPI's native transcriber endpoints are deprecated. Retell AI agents using old STT configs will fail silently or time out mid-call. Migrate to Deepgram v2 by swapping transcriber provider configs and updating webhook payloads. This prevents dropped transcripts, reduces latency by ~200ms, and unlocks Deepgram's superior noise filtering. Migration takes about 15 minutes per agent.
Prerequisites
API Keys & Credentials
You'll need a Deepgram API key (v2 or later). Generate this from your Deepgram console at https://console.deepgram.com. Store it in your .env file as DEEPGRAM_API_KEY. You also need a Retell AI API key from https://retell.cc/dashboard for agent configuration and webhook management.
System & SDK Requirements
Node.js 18+ (the server examples rely on the built-in fetch API) or Python 3.8+ for server-side integration. Install the Retell SDK (npm install retell-sdk) and Deepgram SDK (npm install @deepgram/sdk). Ensure your environment supports HTTPS webhooks (required for Retell callbacks).
Network & Access
Outbound HTTPS access to api.deepgram.com and api.retell.cc. If behind a corporate firewall, whitelist both domains. Your server must expose a publicly accessible webhook endpoint (use ngrok for local testing: ngrok http 3000).
Knowledge
Familiarity with REST APIs, JSON payloads, and async/await patterns. Understanding of speech-to-text (STT) concepts like sample rates (16kHz PCM), audio encoding, and partial vs. final transcripts will accelerate migration.
Step-by-Step Tutorial
Configuration & Setup
VAPI's transcriber configuration lives in your assistant object. The deprecated endpoints used transcriber.provider: "retell" with legacy STT models. Deepgram v2 requires explicit model selection and endpoint configuration.
Critical: VAPI doesn't expose raw transcriber migration endpoints in their public API. You configure transcribers through assistant creation/update flows. Here's the production-grade assistant config:
// Assistant configuration with Deepgram v2 transcriber
const assistantConfig = {
  name: "Deepgram V2 Migration Assistant",
  model: {
    provider: "openai",
    model: "gpt-4",
    temperature: 0.7,
    systemPrompt: "You are a helpful voice assistant."
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM"
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2", // Deepgram v2 model
    language: "en",
    smartFormat: true,
    keywords: ["VAPI", "Deepgram", "transcription"],
    endpointing: 255 // ms silence before finalizing
  },
  recordingEnabled: true,
  hipaaEnabled: false,
  clientMessages: [
    "transcript",
    "hang",
    "function-call"
  ],
  serverMessages: [
    "end-of-call-report",
    "status-update",
    "transcript"
  ],
  serverUrl: process.env.WEBHOOK_URL,
  serverUrlSecret: process.env.WEBHOOK_SECRET
};
Why this breaks in production: The endpointing value controls silence detection. Retell AI's deprecated transcriber used 400ms defaults. Deepgram v2's 255ms fires faster, causing premature turn-taking on slow speakers. Increase to 350-400ms for natural conversation flow.
Architecture & Flow
flowchart LR
A[User Speech] --> B[VAPI Ingress]
B --> C[Deepgram v2 STT]
C --> D[Partial Transcripts]
C --> E[Final Transcript]
D --> F[Assistant Context]
E --> F
F --> G[GPT-4 Response]
G --> H[ElevenLabs TTS]
H --> I[Audio Stream]
I --> A
C -.Webhook.-> J[Your Server]
E -.Webhook.-> J
G -.Function Call.-> J
Race condition warning: Deepgram v2 sends partial transcripts every 100-200ms. If your webhook handler processes partials synchronously, you'll queue 5-10 requests before the final transcript arrives. Use a debounce pattern or ignore partials unless you need real-time UI updates.
Step-by-Step Implementation
Step 1: Audit Current Transcriber Config
Check your existing assistant for deprecated settings:
- transcriber.provider: "retell" → Must change to "deepgram"
- Missing model field → Add "nova-2" (Deepgram's latest)
- Legacy language codes → Verify ISO 639-1 compliance
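The audit checklist above can be automated before you touch any agent. The sketch below is a minimal example, not part of either vendor's SDK: auditTranscriber is a hypothetical helper, and the language pattern is an assumption that accepts two-letter codes with an optional region suffix (for example en or en-US).

```javascript
// Hypothetical helper: lists the problems found in a transcriber config.
// The language pattern is an assumption: "en" or "en-US" style codes only.
const LANGUAGE_CODE = /^[a-z]{2}(-[A-Z]{2})?$/;

function auditTranscriber(transcriber = {}) {
  const issues = [];
  if (transcriber.provider !== 'deepgram') {
    issues.push(`provider is "${transcriber.provider}", expected "deepgram"`);
  }
  if (!transcriber.model) {
    issues.push('model is missing, for example "nova-2"');
  }
  if (!LANGUAGE_CODE.test(transcriber.language || '')) {
    issues.push(`language "${transcriber.language}" is not an ISO 639-1 style code`);
  }
  return issues;
}

// A legacy config trips all three checks
console.log(auditTranscriber({ provider: 'retell', language: 'english' }));
```

An empty array means the config passes all three checks; anything else is a list you can log per assistant.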
Step 2: Update Assistant via Dashboard or API
VAPI doesn't provide a dedicated migration endpoint. You update the assistant object directly. If using the dashboard, navigate to Assistant Settings → Speech → Transcriber. If you work programmatically, update the assistant through the assistant management API; the dashboard is the safer route for a one-off change.
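For the programmatic route, a request could look roughly like the sketch below. It assumes the assistant is updated with PATCH against https://api.vapi.ai/assistant/:id, mirroring the POST https://api.vapi.ai/assistant call used later in this guide; confirm the route and payload shape against the current API reference before relying on it. buildTranscriberUpdate and updateTranscriber are hypothetical helper names.

```javascript
// Assumption: PATCH https://api.vapi.ai/assistant/:id accepts a partial
// assistant object. Verify against the API reference before use.
function buildTranscriberUpdate(assistantId, apiKey, transcriber) {
  return {
    url: `https://api.vapi.ai/assistant/${encodeURIComponent(assistantId)}`,
    options: {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ transcriber })
    }
  };
}

// fetchImpl is injectable so the function can be exercised without network access
async function updateTranscriber(assistantId, apiKey, transcriber, fetchImpl = fetch) {
  const { url, options } = buildTranscriberUpdate(assistantId, apiKey, transcriber);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Transcriber update failed with status ${response.status}`);
  }
  return response.json();
}
```

Sending only the transcriber field keeps the change scoped, so model and voice settings stay as they are.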
Step 3: Configure Webhook Handlers
Deepgram v2 changes the transcript payload structure. Update your webhook to handle new fields:
// Webhook handler for Deepgram v2 transcripts
app.post('/webhook/vapi', async (req, res) => {
  const { message } = req.body;
  if (message.type === 'transcript') {
    const {
      transcriptType, // "partial" or "final"
      transcript,
      confidence, // NEW in Deepgram v2
      words // NEW: word-level timestamps
    } = message;
    // Only process final transcripts to avoid race conditions
    if (transcriptType === 'final') {
      console.log(`Final transcript (${confidence}): ${transcript}`);
      // Low confidence warning - Deepgram v2 exposes this
      if (confidence < 0.85) {
        console.warn('Low confidence transcript - verify audio quality');
      }
    }
  }
  res.status(200).send('OK');
});
Step 4: Test Endpointing Thresholds
Deepgram v2's faster endpointing causes interruptions on hesitant speakers. Test with 3 profiles:
- Fast talker: 200ms endpointing works
- Normal pace: 255ms (default)
- Slow/thoughtful: 350-400ms required
Adjust transcriber.endpointing based on your user demographic.
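One way to keep these thresholds in a single place is a small lookup applied to the transcriber config. This is a sketch under the assumption that endpointing is a plain number of milliseconds, as in the configuration above; the profile names and withEndpointing are illustrative, and 400 is simply the upper end of the 350-400ms range.

```javascript
// Thresholds taken from the three speaker profiles above (values in ms)
const ENDPOINTING_MS = new Map([
  ['fast', 200],
  ['normal', 255],
  ['slow', 400] // upper end of the 350-400ms range
]);

// Hypothetical helper: returns a copy of the transcriber config with the
// endpointing value for the given speaker profile.
function withEndpointing(transcriber, profile) {
  if (!ENDPOINTING_MS.has(profile)) {
    throw new Error(`Unknown speaker profile: ${profile}`);
  }
  return { ...transcriber, endpointing: ENDPOINTING_MS.get(profile) };
}

console.log(withEndpointing({ provider: 'deepgram', model: 'nova-2', language: 'en' }, 'slow'));
```

Returning a copy rather than mutating the input makes it easy to generate one config per test profile from the same base object.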
Error Handling & Edge Cases
Webhook timeout (5s limit): Deepgram v2 sends word-level timestamps in the words array. Parsing 500+ word objects synchronously will timeout. Process async or strip unnecessary fields.
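A common way to stay under the timeout is to acknowledge the webhook first and defer the heavy work. The sketch below assumes an Express-style req/res pair and assumes each entry in words carries word, start, and end fields; handleTranscriptWebhook and the processWords callback are hypothetical names.

```javascript
// Acknowledge first, then process the words array off the request path.
function handleTranscriptWebhook(req, res, processWords) {
  const message = (req.body && req.body.message) || {};
  res.status(200).send('OK'); // respond before any heavy parsing

  if (message.type !== 'transcript' || !Array.isArray(message.words)) return;

  // Strip fields we do not need so the deferred job holds less data
  // (assumed field names: word, start, end)
  const slim = message.words.map(({ word, start, end }) => ({ word, start, end }));

  // Run after the current event-loop turn so the response is not delayed
  setImmediate(() => processWords(slim));
}
```

In an Express app this would be wired as app.post('/webhook/vapi', (req, res) => handleTranscriptWebhook(req, res, saveWords)), where saveWords is whatever persistence function you already have.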
Confidence score drops: If confidence < 0.8 on final transcripts, check:
- Audio sample rate (minimum 16kHz PCM)
- Background noise levels
- smartFormat: true enabled (improves accuracy 8-12%)
Partial transcript flooding: Deepgram v2 fires partials aggressively. Implement debouncing:
let debounceTimer;
if (transcriptType === 'partial') {
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    updateUI(transcript); // Only update UI after 300ms silence
  }, 300);
}
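A single shared timer works for one call at a time; with concurrent calls, a partial from one call would cancel the pending update of another. A per-call variant might look like the sketch below, where createPartialDebouncer and the callId key are illustrative names rather than anything defined by the webhook payload.

```javascript
// Debounce partial transcripts independently for each call.
function createPartialDebouncer(onSettled, waitMs = 300) {
  const timers = new Map(); // callId -> pending timer

  return function onPartial(callId, transcript) {
    clearTimeout(timers.get(callId)); // no-op when nothing is pending
    timers.set(callId, setTimeout(() => {
      timers.delete(callId);
      onSettled(callId, transcript); // fires once no newer partial arrived within waitMs
    }, waitMs));
  };
}

// Usage: only the latest partial per call reaches the callback
const onPartial = createPartialDebouncer((callId, text) => {
  console.log(`[${callId}] ${text}`);
});
onPartial('call-1', 'I need');
onPartial('call-1', 'I need help');
```

Deleting the map entry when the timer fires keeps memory bounded for long-running servers.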
Testing & Validation
Latency benchmark: Deepgram v2 averages 180-220ms STT latency (vs Retell's 300-400ms). Measure end-to-end with:
- Start timer on audio chunk sent
- End timer on final transcript webhook received
- Target: <250ms for real-time feel
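The measurement steps above can be captured in a small tracker. This is a sketch, not a benchmark harness: createLatencyTracker is a hypothetical helper, the utterance id is whatever key lets you pair an audio chunk with its final transcript, and the clock is injectable so the logic can be checked without real audio.

```javascript
// Records the time between sending audio and receiving the final transcript.
function createLatencyTracker(now = Date.now) {
  const started = new Map(); // utterance id -> start time in ms
  const samples = [];

  return {
    audioSent(id) {
      started.set(id, now());
    },
    finalReceived(id) {
      if (!started.has(id)) return null; // no matching start recorded
      const elapsedMs = now() - started.get(id);
      started.delete(id);
      samples.push(elapsedMs);
      return elapsedMs;
    },
    // 95th percentile (nearest-rank) of the recorded samples
    p95() {
      if (samples.length === 0) return null;
      const sorted = [...samples].sort((a, b) => a - b);
      return sorted[Math.ceil(sorted.length * 0.95) - 1];
    }
  };
}

const tracker = createLatencyTracker();
tracker.audioSent('utt-1');
// ...later, inside the webhook handler for the final transcript:
console.log('latency ms:', tracker.finalReceived('utt-1'));
```

Comparing the p95 value, rather than the average, against the 250ms target gives a better picture of what slower calls feel like.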
Accuracy test: Use standard test phrases with industry jargon. Deepgram v2's keywords array boosts recognition for domain-specific terms.
Common Issues & Fixes
Issue: Assistant interrupts user mid-sentence
Fix: Increase endpointing from 255ms to 350ms
Issue: Missing word timestamps in webhook
Fix: Verify transcriber.model: "nova-2" (v1 models don't include this)
Issue: Webhook signature validation fails
Fix: Deepgram v2 doesn't change VAPI's signature scheme - verify serverUrlSecret matches your env var
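To rule out a mismatched secret, you can compare the value your server receives with the one in your environment. The sketch below assumes the secret arrives as a request header (x-vapi-secret is used here as an assumed name; check your webhook logs for the actual header) and uses a constant-time comparison. isValidSecret is a hypothetical helper.

```javascript
const nodeCrypto = require('node:crypto');

// Compares two secrets in constant time. Hashing both sides first gives
// equal-length buffers, which timingSafeEqual requires.
function isValidSecret(received, expected) {
  if (typeof received !== 'string' || typeof expected !== 'string') return false;
  const a = nodeCrypto.createHash('sha256').update(received).digest();
  const b = nodeCrypto.createHash('sha256').update(expected).digest();
  return nodeCrypto.timingSafeEqual(a, b);
}

// Inside a webhook handler (header name is an assumption):
// if (!isValidSecret(req.headers['x-vapi-secret'], process.env.WEBHOOK_SECRET)) {
//   return res.sendStatus(401);
// }
console.log(isValidSecret('abc', 'abc'));
```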
System Diagram
Call flow showing how VAPI handles user input, webhook events, and responses.
sequenceDiagram
participant User
participant VAPI
participant Webhook
participant YourServer
User->>VAPI: Initiates call
VAPI->>User: Welcome message
User->>VAPI: Provides information
VAPI->>Webhook: transcript.final event
Webhook->>YourServer: POST /webhook/vapi with data
YourServer->>VAPI: Processed data response
VAPI->>User: Confirmation message
User->>VAPI: Requests additional info
VAPI->>Webhook: assistant_request event
Webhook->>YourServer: POST /webhook/request
YourServer->>VAPI: Additional info response
VAPI->>User: Provides additional info
User->>VAPI: Ends call
VAPI->>Webhook: call_ended event
Webhook->>YourServer: POST /webhook/end
Note over VAPI,User: Error Handling
User->>VAPI: Unrecognized input
VAPI->>User: Error message
User->>VAPI: Retry input
VAPI->>Webhook: error_event
Webhook->>YourServer: POST /webhook/error
YourServer->>VAPI: Error resolution response
VAPI->>User: Retry confirmation message
Testing & Validation
Local Testing
Most migration failures happen because devs skip local validation before deploying. Use the Vapi CLI webhook forwarder to catch Deepgram v2 payload changes before they break production.
# Install Vapi CLI for local webhook testing
npm install -g @vapi-ai/cli
# Start webhook forwarder (forwards Vapi webhooks to localhost:3000)
vapi webhooks forward --port 3000
// Test endpoint to validate Deepgram v2 transcripts
app.post('/webhook/vapi', (req, res) => {
  const { message } = req.body;
  if (message.type === 'transcript') {
    // Deepgram v2 returns 'transcript' field (NOT 'text')
    const text = message.transcript;
    if (!text) {
      console.error('Migration Error: transcript field missing');
      return res.status(400).json({ error: 'Invalid Deepgram v2 payload' });
    }
    console.log('Deepgram v2 transcript:', text);
  }
  res.status(200).json({ received: true });
});
This will bite you: Deepgram v2 changed the transcript field name from text to transcript. If your webhook parser still reads message.text, you'll get silent failures—the call succeeds but transcripts are empty.
Webhook Validation
Test the updated assistantConfig with a real call. Verify the transcriber.provider is set to deepgram and transcriber.model is nova-2. Check webhook logs for the new payload structure—message.transcript should contain the text, not message.text. If you see 400 errors, your parser is still using the deprecated field names.
Real-World Example
Barge-In Scenario
Production agents break when users interrupt mid-sentence during the Deepgram v2 migration. The deprecated transcriber config used endpointing: 200 (ms). Deepgram v2 requires explicit endpointingMs and vadThreshold tuning.
Before migration (broken):
// Deprecated config - barge-in fires too early
const assistantConfig = {
name: "Support Agent",
model: { provider: "openai", model: "gpt-4", temperature: 0.7 },
transcriber: {
provider: "deepgram",
language: "en",
endpointing: 200 // DEPRECATED - causes false interrupts
},
voice: { provider: "11labs", voiceId: "21m00Tcm4TlvDq8ikWAM" }
};
After migration (production-ready):
// Deepgram v2 - proper barge-in handling
const assistantConfig = {
name: "Support Agent",
model: { provider: "openai", model: "gpt-4", temperature: 0.7 },
transcriber: {
provider: "deepgram",
model: "nova-2", // v2 model required
language: "en",
keywords: ["cancel", "stop", "wait"], // Boost interrupt detection
endpointing: {
endpointingMs: 400, // Increased from 200ms to reduce false positives
vadThreshold: 0.6 // Higher threshold filters breathing sounds
}
},
voice: { provider: "11labs", voiceId: "21m00Tcm4TlvDq8ikWAM" },
clientMessages: ["transcript", "hang", "speech-update"],
serverMessages: ["end-of-call-report"]
};
Event Logs
{
  "type": "transcript",
  "role": "user",
  "transcriptType": "partial",
  "transcript": "Actually, I need to—",
  "timestamp": 1704123456789
}
The partial transcript triggers TTS cancellation. Old configs missed this because endpointing: 200 fired before the user finished speaking.
Edge Cases
Multiple rapid interrupts: User says "wait wait wait" in 600ms. Without keywords: ["wait"], Deepgram v2 treats this as background noise. Add high-priority keywords to boost detection.
False positives on mobile: Network jitter causes 100-400ms latency variance. The deprecated endpointing: 200 triggered on packet delays, not actual speech. Deepgram v2's endpointingMs: 400 + vadThreshold: 0.6 filters network artifacts while preserving real interrupts.
Common Issues & Fixes
Most migration failures happen during the transcriber configuration swap. Here's what breaks in production and how to fix it.
Transcriber Not Initializing
Problem: Assistant starts but transcription never fires. You see connection established but zero transcript events.
Root cause: Deepgram v2 requires explicit language parameter. The deprecated endpoint auto-detected language; v2 does not.
// BROKEN - Missing required language parameter
const assistantConfig = {
name: "Support Agent",
model: {
provider: "openai",
model: "gpt-4",
temperature: 0.7
},
transcriber: {
provider: "deepgram",
model: "nova-2"
// Missing language - transcriber fails silently
}
};
// FIXED - Explicit language configuration
const assistantConfig = {
name: "Support Agent",
model: {
provider: "openai",
model: "gpt-4",
temperature: 0.7
},
transcriber: {
provider: "deepgram",
model: "nova-2",
language: "en-US" // Required in v2
}
};
Fix: Always set language explicitly. Common values: en-US, en-GB, es, fr. Check Deepgram docs for full list.
Endpointing Sensitivity Changed
Problem: Agent interrupts users mid-sentence or waits too long after user stops speaking.
Root cause: The default endpointing threshold differs between the deprecated transcriber and Deepgram v2 (400ms versus 255ms in the configs above), so a value tuned for the old transcriber no longer applies.
Fix: Recalibrate endpointingMs based on use case:
- Customer support (fast-paced): 200-300ms
- Medical/legal (careful listening): 600-800ms
- General conversation: 400-500ms
Test with real users. Mobile networks add 100-200ms jitter.
Keywords Not Triggering
Problem: Custom keywords array (product names, technical terms) no longer boosts recognition accuracy.
Root cause: v2 uses a different keyword weighting algorithm. Old keyword lists need revalidation.
Fix: Re-test your keywords array with actual call recordings. Remove low-impact terms. Deepgram v2 performs better with 5-10 high-value keywords vs. 50+ generic terms.
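To decide which terms are low-impact, one rough approach is to count how often each keyword actually shows up in transcripts from real calls. The sketch below assumes you already have those transcripts as plain strings; keywordHitCounts is a hypothetical helper and uses simple case-insensitive substring matching, so it will also count a keyword that appears inside a longer word.

```javascript
// Counts how many transcripts contain each keyword (case-insensitive).
function keywordHitCounts(keywords, transcripts) {
  const counts = Object.fromEntries(keywords.map((k) => [k, 0]));
  for (const transcript of transcripts) {
    const lower = transcript.toLowerCase();
    for (const keyword of keywords) {
      if (lower.includes(keyword.toLowerCase())) counts[keyword] += 1;
    }
  }
  return counts;
}

const counts = keywordHitCounts(
  ['appointment', 'booking'],
  ['I need an Appointment', 'appointment tomorrow please', 'hello there']
);
console.log(counts);
```

Keywords that never appear across a representative sample are candidates for removal from the keywords array.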
Complete Working Example
Most migration guides show fragmented configs. Here's the full production-ready assistant with Deepgram v2 transcriber that you can deploy immediately.
Full Server Code
This example creates a VAPI assistant with the Deepgram v2 transcriber, handles errors from the creation call, and exposes a webhook endpoint that receives transcript events.
// server.js - Complete VAPI Assistant with Deepgram v2
// Uses the global fetch API, which requires Node.js 18+
const express = require('express');
const app = express();
app.use(express.json());

// Production-ready assistant configuration with Deepgram v2
const assistantConfig = {
  name: "Deepgram v2 Migration Assistant",
  model: {
    provider: "openai",
    model: "gpt-4",
    temperature: 0.7,
    systemPrompt: "You are a helpful voice assistant. Speak naturally and confirm you heard the user correctly."
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM"
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2", // Deepgram v2 model
    language: "en",
    keywords: ["appointment", "booking", "schedule"], // Custom vocabulary
    endpointing: 255 // Silence detection in ms
  },
  clientMessages: [
    "transcript", "hang", "function-call", "speech-update", "metadata", "conversation-update"
  ],
  serverMessages: [
    "end-of-call-report", "status-update", "hang", "function-call",
    "transcript" // needed so the webhook handler below receives transcripts
  ],
  serverUrl: process.env.WEBHOOK_URL // public URL that routes to /webhook/vapi
};

// Create assistant endpoint
app.post('/assistant/create', async (req, res) => {
  try {
    const response = await fetch('https://api.vapi.ai/assistant', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer ' + process.env.VAPI_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(assistantConfig)
    });
    if (!response.ok) {
      // The error body may not be JSON, so fall back to an empty object
      const error = await response.json().catch(() => ({}));
      throw new Error(`VAPI API error: ${error.message || response.status}`);
    }
    const assistant = await response.json();
    console.log('Assistant created with Deepgram v2:', assistant.id);
    res.json({ success: true, assistantId: assistant.id });
  } catch (error) {
    console.error('Assistant creation failed:', error);
    res.status(500).json({ error: error.message });
  }
});

// Webhook handler for transcription events
app.post('/webhook/vapi', (req, res) => { // YOUR server receives webhooks here
  const { message } = req.body || {};
  if (message && message.type === 'transcript') {
    const text = message.transcript;
    console.log('Deepgram v2 transcript:', text);
    // Process transcript with custom keyword detection
    // (skip when the transcript field is missing to avoid a TypeError)
    const hasKeyword = typeof text === 'string' &&
      assistantConfig.transcriber.keywords.some(
        keyword => text.toLowerCase().includes(keyword)
      );
    if (hasKeyword) {
      console.log('Keyword detected in transcript');
    }
  }
  res.sendStatus(200);
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Server running on localhost:${PORT}`);
  console.log('Deepgram v2 transcriber configured with endpointing:', assistantConfig.transcriber.endpointing + 'ms');
});
Run Instructions
Environment Setup:
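export VAPI_API_KEY="your_vapi_api_key_here"
export WEBHOOK_URL="https://your-domain.example/webhook/vapi"  # placeholder: your public webhook URL
npm install express  # fetch is built into Node.js 18+, so node-fetch is not needed
node server.js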
Test the Migration:
- Call POST localhost:3000/assistant/create to create the assistant
- Use the returned assistantId in your VAPI dashboard or client SDK
- Monitor webhook endpoint for transcript events with Deepgram v2 data
- Verify endpointing (255ms) triggers faster than deprecated default (400ms)
Production Checklist:
- Replace localhost with your production domain in webhook URLs
- Set vadThreshold if you need custom voice activity detection (default 0.5)
- Monitor endpointingMs in webhook payloads to validate silence detection
- Add retry logic for network failures in the assistant creation endpoint
This configuration eliminates deprecated transcriber endpoints while maintaining backward compatibility with existing VAPI client integrations.
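The retry item in the production checklist can be sketched with exponential backoff. withRetry is a hypothetical helper, and the retry count and base delay are illustrative defaults rather than recommended values.

```javascript
// Retries an async operation with exponential backoff:
// waits baseDelayMs, then 2x, then 4x, and so on between attempts.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage: wrap the assistant creation call
// const response = await withRetry(() => fetch('https://api.vapi.ai/assistant', options));
```

Note that fetch only rejects on network failures; an HTTP error status still resolves, so throw inside fn when response.ok is false if you want those retried as well.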
FAQ
Technical Questions
What's the difference between deprecated VAPI transcriber endpoints and Deepgram v2?
Deprecated VAPI transcriber endpoints used older Deepgram API versions with limited model support and outdated streaming protocols. Deepgram v2 introduces improved accuracy, lower latency, and native support for advanced features like endpointing (silence detection) and vadThreshold (voice activity detection tuning). The v2 API also supports real-time partial transcripts via clientMessages and serverMessages, enabling faster response times in conversational AI agents.
How do I know if my Retell AI agent is using deprecated endpoints?
Check your transcriber configuration in your assistantConfig. If your provider field references old Deepgram API paths (pre-v2 URLs) or lacks support for modern streaming parameters like endpointingMs or language options, you're on deprecated endpoints. Retell AI will also flag this in your agent logs or dashboard warnings.
Will migration break my existing conversations?
No. Migration is backward-compatible at the session level. Existing active calls will complete on their current transcriber. New calls initiated after migration will use Deepgram v2. However, you should test in staging first to validate that model, language, and vadThreshold settings produce expected transcription quality.
Performance
How much latency improvement should I expect with Deepgram v2?
Deepgram v2 typically reduces transcription latency by 50-150ms compared to deprecated endpoints, depending on audio quality and network conditions. Partial transcript delivery (clientMessages) arrives 100-200ms faster, enabling quicker agent responses and more natural turn-taking in conversations.
Does Deepgram v2 support real-time endpointing?
Yes. The endpointing parameter in v2 enables configurable silence detection with endpointingMs thresholds (typically 400-800ms). This replaces manual silence detection logic, reducing false positives and improving conversation flow.
Platform Comparison
Should I migrate to Deepgram v2 or switch to another STT provider?
Deepgram v2 is optimized for conversational AI with low-latency streaming and native Retell AI integration. If you need multilingual support, domain-specific accuracy, or cost optimization, compare against alternatives. However, Deepgram v2's endpointing and partial transcript features make it the default choice for Retell AI agents without specific constraints.
Can I run both deprecated and v2 endpoints simultaneously?
Technically yes, but operationally risky. Running dual transcribers creates inconsistent transcription quality, complicates debugging, and wastes API quota. Migrate all agents to v2 within a defined window (typically 2-4 weeks) rather than maintaining hybrid setups.
Resources
Official Documentation:
- Deepgram API v2 Documentation – Complete endpoint reference, authentication, and model specifications
- Retell AI Agent Configuration Guide – Transcriber setup, voice models, and migration patterns
- VAPI Deprecation Notice – Legacy endpoint sunset timeline and replacement endpoints
Migration Tools:
- Deepgram Python SDK – Official client library for v2 API calls
- Retell AI GitHub Examples – Sample agent configurations using Deepgram v2