How to Build a Vapi Voice Agent from Scratch (Complete 2026 Guide)
Vapi is one of the fastest ways to add voice AI to a product or business workflow. In under an hour, you can have a phone agent that answers calls, holds a natural conversation, calls tools, and integrates with your backend.
This is the guide I wish existed when I started — no fluff, just working code and the decisions that matter.
What Vapi Is (and What It Isn't)
Vapi handles the hard parts of voice AI so you don't have to:
- Speech-to-text (Deepgram, Google, Assembly — your choice)
- LLM inference (GPT-4o, Claude, Gemini — your choice)
- Text-to-speech (ElevenLabs, PlayHT, Cartesia — your choice)
- WebRTC/telephony infrastructure (Twilio, Vonage)
- Turn detection, interruption handling, latency optimization
What you provide: a system prompt, tool definitions, and your API keys.
What Vapi is NOT: a full no-code chatbot builder. You need to understand JSON configs and basic API concepts. If you want zero-code, use Retell AI instead.
Architecture Overview
A Vapi agent has 5 layers:
Phone call / WebRTC
↓
Vapi platform (STT → LLM → TTS)
↓
Your system prompt (assistant behavior)
↓
Tool calls (your server functions)
↓
Your backend (calendar, CRM, database)
The most important design decision: tool architecture. Your agent's intelligence is capped by what tools you give it. Get that right and everything else is configuration.
Prerequisites
- Vapi account at vapi.ai ($20 minimum to start, ~$0.05/min)
- OpenAI API key (or Anthropic for Claude)
- Twilio account (phone number: $1.15/mo) OR just use Vapi's web call for testing
- A publicly accessible HTTPS endpoint for tool calls (ngrok works for local dev)
Step 1: Create Your First Assistant via API
Don't use the dashboard for production. Use the API — it's reproducible and version-controllable.
import requests
import json

VAPI_API_KEY = "your_vapi_key"

assistant_config = {
    "name": "Maya - AI Receptionist",
    "model": {
        "provider": "openai",
        "model": "gpt-4o",
        "temperature": 0.3,  # Lower = more consistent, higher = more natural
        "systemPrompt": """You are Maya, the AI receptionist for Sonrisa Dental.
Your role:
- Answer calls warmly and professionally
- Determine if caller is new patient or existing
- For new patients: collect name, phone, reason for visit, insurance
- For existing patients: help reschedule or answer questions
- Book appointments using the book_appointment tool
- Escalate emergencies to the on-call line: (555) 911-0000
Rules:
- Keep responses under 2 sentences
- Confirm bookings by reading back the details
- Never discuss fees beyond "please ask when you arrive"
- If asked if you're AI: "I'm Maya, Sonrisa's virtual receptionist"
"""
    },
    "voice": {
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",  # Rachel: warm, professional
        "stability": 0.5,
        "similarityBoost": 0.75
    },
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-2",
        "language": "en-US"
    },
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "check_availability",
                "description": "Check available appointment slots for a given date range",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "preferred_date": {
                            "type": "string",
                            "description": "Preferred date in YYYY-MM-DD format, or 'this week', 'next week'"
                        },
                        "service_type": {
                            "type": "string",
                            "description": "Type of dental service: cleaning, checkup, emergency, cosmetic"
                        }
                    },
                    "required": ["preferred_date"]
                }
            },
            "server": {
                "url": "https://your-server.com/api/check-availability",
                "timeoutSeconds": 8
            }
        },
        {
            "type": "function",
            "function": {
                "name": "book_appointment",
                "description": "Book an appointment for a patient",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "patient_name": {"type": "string"},
                        "phone": {"type": "string"},
                        "service_type": {"type": "string"},
                        "appointment_slot": {"type": "string", "description": "ISO 8601 datetime"},
                        "is_new_patient": {"type": "boolean"},
                        "insurance": {"type": "string"}
                    },
                    "required": ["patient_name", "phone", "appointment_slot", "is_new_patient"]
                }
            },
            "server": {
                "url": "https://your-server.com/api/book-appointment",
                "timeoutSeconds": 10
            }
        }
    ],
    # A statement rather than a question, so the caller leads the conversation
    "firstMessage": "Thank you for calling Sonrisa Dental, this is Maya.",
    "endCallMessage": "Have a great day! We'll see you soon.",
    "endCallPhrases": ["goodbye", "bye bye", "that's all", "that's everything"],
    "recordingEnabled": True,
    "silenceTimeoutSeconds": 30,
    "maxDurationSeconds": 600
}

response = requests.post(
    "https://api.vapi.ai/assistant",
    headers={
        "Authorization": f"Bearer {VAPI_API_KEY}",
        "Content-Type": "application/json"
    },
    json=assistant_config
)
response.raise_for_status()  # Fail loudly on auth or validation errors
assistant = response.json()
print(f"Assistant created: {assistant['id']}")
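Because the tool definitions carry most of the agent's capability, it can help to sanity-check them before sending the config. The sketch below is a hypothetical helper (validate_tools is my own name, not part of Vapi's API). It assumes the tool layout used in the config above and flags required parameters that have no schema, missing descriptions, and non-HTTPS server URLs.

```python
def validate_tools(tools: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of tool definitions."""
    problems = []
    for tool in tools:
        fn = tool.get("function", {})
        name = fn.get("name", "<unnamed>")
        params = fn.get("parameters", {})
        properties = params.get("properties", {})
        # Every required parameter needs a matching entry in properties.
        for key in params.get("required", []):
            if key not in properties:
                problems.append(f"{name}: required parameter '{key}' has no schema")
        if not fn.get("description"):
            problems.append(f"{name}: missing description")
        if not tool.get("server", {}).get("url", "").startswith("https://"):
            problems.append(f"{name}: server.url should be an HTTPS endpoint")
    return problems


# Hypothetical example with a deliberate mistake in "required".
example_tools = [{
    "type": "function",
    "function": {
        "name": "check_availability",
        "description": "Check available appointment slots",
        "parameters": {
            "type": "object",
            "properties": {"preferred_date": {"type": "string"}},
            "required": ["preferred_date", "service_type"],
        },
    },
    "server": {"url": "https://your-server.com/api/check-availability"},
}]

for problem in validate_tools(example_tools):
    print(problem)
```

Running a check like this against assistant_config["tools"] before the POST catches schema typos locally instead of during a live call.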
Step 2: Build Your Tool Server
When the agent calls a tool, Vapi sends a POST to your server URL. Your server must respond within the timeout window.
Here's a minimal FastAPI server that handles both tools:
from fastapi import FastAPI, Request
from datetime import datetime, timezone
import json

app = FastAPI()

# Mock calendar data: replace with your actual booking system
AVAILABLE_SLOTS = {
    "cleaning": ["2026-02-27T10:00:00", "2026-02-27T14:00:00", "2026-02-28T09:00:00"],
    "checkup": ["2026-02-27T11:00:00", "2026-02-28T15:00:00"],
    "emergency": ["2026-02-26T16:00:00"],  # Same day for emergencies
}
BOOKINGS = []  # Replace with your database


@app.post("/api/check-availability")
async def check_availability(request: Request):
    body = await request.json()
    # Vapi sends the tool call in this format; guard against an empty list
    tool_call = (body.get("message", {}).get("toolCalls") or [{}])[0]
    args = tool_call.get("function", {}).get("arguments", "{}")
    params = json.loads(args) if isinstance(args, str) else args

    service = params.get("service_type", "checkup")
    slots = AVAILABLE_SLOTS.get(service, AVAILABLE_SLOTS["checkup"])

    # Format slots so the agent can speak them naturally
    formatted = []
    for slot in slots[:3]:  # Max 3 options to avoid overwhelming the caller
        dt = datetime.fromisoformat(slot)
        formatted.append(dt.strftime("%A %B %d at %I:%M %p"))

    return {
        "results": [{
            "toolCallId": tool_call.get("id"),
            "result": f"Available slots: {', '.join(formatted)}"
        }]
    }


@app.post("/api/book-appointment")
async def book_appointment(request: Request):
    body = await request.json()
    tool_call = (body.get("message", {}).get("toolCalls") or [{}])[0]
    args = tool_call.get("function", {}).get("arguments", "{}")
    params = json.loads(args) if isinstance(args, str) else args

    # Validate the slot before saving anything, so a bad value never
    # produces a half-made booking or an unhandled exception
    try:
        slot_dt = datetime.fromisoformat(params["appointment_slot"])
    except (KeyError, TypeError, ValueError):
        return {
            "results": [{
                "toolCallId": tool_call.get("id"),
                "result": "Could not book: the appointment time was missing or invalid. Ask the caller to confirm the date and time."
            }]
        }

    # Save to your database here
    booking = {
        "id": f"BK{len(BOOKINGS)+1:04d}",
        "patient": params.get("patient_name"),
        "phone": params.get("phone"),
        "service": params.get("service_type", "checkup"),
        "slot": params.get("appointment_slot"),
        "new_patient": params.get("is_new_patient", True),
        "insurance": params.get("insurance", "unspecified"),
        # datetime.utcnow() is deprecated; use an aware UTC timestamp
        "created_at": datetime.now(timezone.utc).isoformat()
    }
    BOOKINGS.append(booking)

    # Send confirmation SMS here (Twilio)
    # send_sms(params.get("phone"), f"Confirmed: {booking['service']} on {booking['slot']}")

    formatted_slot = slot_dt.strftime("%A %B %d at %I:%M %p")
    return {
        "results": [{
            "toolCallId": tool_call.get("id"),
            "result": f"Appointment confirmed for {booking['patient']} on {formatted_slot}. Confirmation ID: {booking['id']}"
        }]
    }

# Run: uvicorn server:app --reload --port 8000
# Expose locally: ngrok http 8000
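Both handlers repeat the same payload parsing, so that logic could be factored into a pair of small helpers. The sketch below assumes the request shape the handlers above already read (message.toolCalls[0] with an id and a function holding name and arguments). The names extract_tool_call and tool_result are hypothetical, not Vapi functions.

```python
import json
from typing import Optional


def extract_tool_call(body: dict) -> tuple[Optional[str], Optional[str], dict]:
    """Return (tool_call_id, function_name, arguments) from a tool-call POST body."""
    calls = body.get("message", {}).get("toolCalls") or [{}]
    call = calls[0]
    fn = call.get("function", {})
    raw = fn.get("arguments") or {}
    # Arguments may arrive as a JSON string or as an already-parsed object.
    args = json.loads(raw) if isinstance(raw, str) else raw
    return call.get("id"), fn.get("name"), args


def tool_result(tool_call_id: Optional[str], text: str) -> dict:
    """Wrap a speakable result in the response envelope used by the handlers."""
    return {"results": [{"toolCallId": tool_call_id, "result": text}]}


# Hypothetical sample payload for illustration.
sample = {"message": {"toolCalls": [{
    "id": "call_123",
    "function": {"name": "check_availability",
                 "arguments": "{\"service_type\": \"cleaning\"}"},
}]}}

call_id, name, args = extract_tool_call(sample)
print(tool_result(call_id, f"{name} called with {args}"))
```

Keeping the parsing in one place means a change in payload shape only has to be handled once, and the helpers can be unit-tested without starting the server.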
Step 3: Attach a Phone Number
# Import an existing Twilio number and attach it to the assistant
number_response = requests.post(
    "https://api.vapi.ai/phone-number",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "provider": "twilio",
        "assistantId": assistant["id"],
        "name": "Sonrisa Main Line",
        # Your existing Twilio number and credentials:
        "twilioPhoneNumber": "+15551234567",
        "twilioAccountSid": "ACxxx",
        "twilioAuthToken": "your_auth_token"
    }
)
number_response.raise_for_status()
phone = number_response.json()
print(f"Number attached: {phone['number']}")
Or buy directly through Vapi (they handle Twilio):
buy_response = requests.post(
    "https://api.vapi.ai/phone-number/buy",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "areaCode": "512",  # Target local area code
        "assistantId": assistant["id"]
    }
)
Step 4: Configure Webhooks for Call Events
Track every call outcome for analysis and follow-up:
# Add to assistant config (or update existing)
update_response = requests.patch(
    f"https://api.vapi.ai/assistant/{assistant['id']}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "serverUrl": "https://your-server.com/api/vapi-webhook",
        "serverUrlSecret": "your_webhook_secret"
    }
)
# Add this handler to the tool server from Step 2
@app.post("/api/vapi-webhook")
async def vapi_webhook(request: Request):
    body = await request.json()
    message_type = body.get("message", {}).get("type")
    if message_type == "end-of-call-report":
        report = body["message"]
        # Collect tool names once and reuse them below
        tools_called = [t.get("name") for t in report.get("toolCalls", [])]
        call_data = {
            "duration": report.get("durationSeconds"),
            "ended_reason": report.get("endedReason"),  # "customer-ended-call", "silence-timed-out", etc.
            "cost": report.get("cost"),  # In USD
            "transcript": report.get("transcript"),
            "summary": report.get("summary"),
            "tools_called": tools_called,
            "appointment_booked": "book_appointment" in tools_called
        }
        # Log it, send to Slack, update CRM, whatever
        print(f"Call ended: {call_data}")
        # If an appointment was booked, email the practice
        if call_data["appointment_booked"]:
            # send_email_to_practice(call_data)
            pass
    return {"status": "ok"}
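The webhook above accepts any POST, so the serverUrlSecret set on the assistant is never actually checked. A minimal sketch of that check follows. It assumes Vapi forwards the secret in an x-vapi-secret request header (verify the header name against the current Vapi docs before relying on it), and is_authorized is a hypothetical helper name.

```python
import hmac

# Assumption: the header Vapi uses to forward serverUrlSecret.
SECRET_HEADER = "x-vapi-secret"


def is_authorized(headers: dict, expected_secret: str) -> bool:
    """Constant-time comparison of the shared secret sent with a webhook."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    received = normalized.get(SECRET_HEADER, "")
    return hmac.compare_digest(received.encode(), expected_secret.encode())


print(is_authorized({"X-Vapi-Secret": "your_webhook_secret"}, "your_webhook_secret"))
print(is_authorized({}, "your_webhook_secret"))
```

Inside the FastAPI handler this could be called as is_authorized(dict(request.headers), "your_webhook_secret"), returning a 401 response when it is False. hmac.compare_digest is used instead of == so the comparison time does not leak how much of the secret matched.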
The Settings That Matter Most
Temperature: Keep it low (0.2–0.4) for receptionists. Higher temperature = more natural conversation but more hallucinations. A receptionist that confidently books the wrong slot is worse than one that sounds a bit stiff.
Turn detection sensitivity: Vapi's default is usually fine, but if you're getting cut-offs (agent interrupts before caller finishes), increase endpointingConfig.onPunctuationSeconds.
First message: Don't make it a question. "Thank you for calling X, this is Maya" lets the caller control what happens next. Starting with a question ("How can I help you today?") sounds more bot-like.
Max duration: Set a hard cap (maxDurationSeconds: 600). Rare, but some callers will keep an agent on the line indefinitely. A 10-minute cap prevents runaway costs.
Tool timeouts: Your tools must respond within the timeout. If they don't, the agent gets no result and has to handle the uncertainty gracefully. Keep tools fast (< 5 seconds) or the call feels broken.
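One way to keep a slow backend from stalling the call is to enforce your own deadline inside the tool server, shorter than the timeoutSeconds in the assistant config, and return a speakable fallback when it is exceeded. The sketch below uses asyncio.wait_for for that. run_with_deadline, the two lookup functions, and the fallback wording are all hypothetical stand-ins.

```python
import asyncio


async def run_with_deadline(work, seconds: float, fallback: str) -> str:
    """Await `work`, but return `fallback` if it exceeds the deadline."""
    try:
        return await asyncio.wait_for(work, timeout=seconds)
    except asyncio.TimeoutError:
        return fallback


async def fast_lookup() -> str:
    return "Available slots: Friday at 10:00 AM"


async def slow_lookup() -> str:
    await asyncio.sleep(0.2)  # Stand-in for a slow calendar API
    return "Available slots: Friday at 10:00 AM"


FALLBACK = "The calendar is slow right now. Offer to take a message instead."

print(asyncio.run(run_with_deadline(fast_lookup(), 0.05, FALLBACK)))
print(asyncio.run(run_with_deadline(slow_lookup(), 0.05, FALLBACK)))
```

With a fallback string the agent always receives something it can say, rather than an empty result it has to improvise around.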
Testing Your Agent
Test before going live with real calls:
# Initiate a test call programmatically
test_call = requests.post(
    "https://api.vapi.ai/call",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "assistantId": assistant["id"],
        "customer": {
            "number": "+1YOUR_TEST_NUMBER"  # Call your own phone
        },
        "phoneNumberId": phone["id"]
    }
)
print(f"Test call initiated: {test_call.json()['id']}")
Test these scenarios every time before deploying:
- New patient booking — happy path
- Existing patient reschedule
- Emergency call (should get different response + escalation)
- "Are you a robot?" question
- Long silence (should handle gracefully)
- Caller hangs up mid-booking (check if partial data is saved)
- Tool failure (take your server down, see what the agent does)
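Several of these scenarios can be rehearsed against the tool server alone, before spending minutes on real calls. The sketch below builds synthetic request bodies in the shape the Step 2 handlers read (message.toolCalls[0].function.arguments). make_tool_call_payload and the scenario names are hypothetical, and a real Vapi request may carry more fields than shown here.

```python
import json


def make_tool_call_payload(name: str, arguments: dict, call_id: str = "test-call-1") -> dict:
    """Build a request body shaped like the one the tool handlers parse."""
    return {
        "message": {
            "toolCalls": [{
                "id": call_id,
                "function": {"name": name, "arguments": json.dumps(arguments)},
            }]
        }
    }


SCENARIOS = {
    "new patient, happy path": make_tool_call_payload("book_appointment", {
        "patient_name": "Test Patient",
        "phone": "+15551234567",
        "appointment_slot": "2026-02-27T10:00:00",
        "is_new_patient": True,
    }),
    "caller hung up mid-booking (no slot)": make_tool_call_payload("book_appointment", {
        "patient_name": "Test Patient",
        "phone": "+15551234567",
    }),
}

for label, payload in SCENARIOS.items():
    print(label, "->", len(json.dumps(payload)), "bytes")
```

Each payload can then be posted to the local server, for example with FastAPI's TestClient (which requires httpx) or with requests against the ngrok URL, asserting on the result text that comes back.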
Cost Breakdown
For a small dental practice (200 calls/month, avg 2 min):
| Component | Cost |
|---|---|
| Vapi platform | ~$20 (400 min @ $0.05) |
| OpenAI GPT-4o | ~$8 (included in Vapi pricing) |
| ElevenLabs TTS | ~$5 |
| Deepgram STT | ~$3 |
| Twilio phone | $1.15 |
| Total | ~$37/month |
At a $299/month service price, that's ~88% gross margin.
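The arithmetic behind those figures, as a small sketch you can rerun with your own call volume. It simply sums the rows as listed in the table. If the model cost really is included in the Vapi platform fee, drop that row and the margin rises slightly.

```python
calls_per_month = 200
avg_minutes = 2
minutes = calls_per_month * avg_minutes  # 400 minutes

monthly_costs = {
    "Vapi platform": round(minutes * 0.05, 2),  # $0.05 per minute
    "OpenAI GPT-4o": 8.00,
    "ElevenLabs TTS": 5.00,
    "Deepgram STT": 3.00,
    "Twilio phone": 1.15,
}

total = round(sum(monthly_costs.values()), 2)
service_price = 299.00
gross_margin = (service_price - total) / service_price

print(f"Total cost: ${total:.2f}/month")
print(f"Gross margin: {gross_margin:.0%}")
```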
Where to Go from Here
- Multi-language support: Add language to the transcriber config. Vapi supports 30+ languages. For a CDMX dental practice, add Spanish as a fallback.
- Human handoff: Use Vapi's transfer tool to forward calls to a human during business hours.
- Calendar integration: Replace the mock slots with a real Cal.com or Google Calendar API call in your tool server.
- CRM sync: Log every call and booking to HubSpot or whatever CRM the business uses.
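For the multi-language item, the change is a single field in the transcriber block from Step 1. The sketch below derives a Spanish variant without mutating the original config. The helper name transcriber_for is hypothetical, and the "es" code is an assumption, so check which language codes your chosen transcriber model accepts.

```python
import copy

# Transcriber block from the Step 1 assistant config.
base_transcriber = {"provider": "deepgram", "model": "nova-2", "language": "en-US"}


def transcriber_for(language: str, base: dict = base_transcriber) -> dict:
    """Return a copy of the transcriber config with a different language."""
    updated = copy.deepcopy(base)
    updated["language"] = language
    return updated


spanish_transcriber = transcriber_for("es")  # "es" is an assumed language code
print(spanish_transcriber)
print(base_transcriber["language"])  # The original config is left untouched
```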
The core architecture above scales to any voice use case: lead qualification, outbound sales calls, appointment reminders (outbound), customer support triage, or intake for law firms.
Rey Midas builds AI-powered business automation at MidasTools. We deploy Vapi-based phone agents for service businesses — dental practices, law firms, and real estate offices — starting at $499 setup.