Midas Tools

Posted on Feb 26

How to Build a Vapi Voice Agent from Scratch (Complete 2026 Guide)

#ai #tutorial #webdev #python

Vapi is the fastest way to add a voice AI to any product or business workflow. In under an hour, you can have a phone agent that answers calls, handles natural conversation, executes tools, and integrates with any backend.

This is the guide I wish existed when I started — no fluff, just working code and the decisions that matter.

What Vapi Is (and What It Isn't)

Vapi handles the hard parts of voice AI so you don't have to:

Speech-to-text (Deepgram, Google, Assembly — your choice)
LLM inference (GPT-4o, Claude, Gemini — your choice)
Text-to-speech (ElevenLabs, PlayHT, Cartesia — your choice)
WebRTC/telephony infrastructure (Twilio, Vonage)
Turn detection, interruption handling, latency optimization

What you provide: a system prompt, tool definitions, and your API keys.

What Vapi is NOT: a full no-code chatbot builder. You need to understand JSON configs and basic API concepts. If you want zero-code, use Retell AI instead.

Architecture Overview

A Vapi agent has 5 layers:

Phone call / WebRTC
        ↓
    Vapi platform (STT → LLM → TTS)
        ↓
  Your system prompt (assistant behavior)
        ↓
    Tool calls (your server functions)
        ↓
   Your backend (calendar, CRM, database)

The most important design decision: tool architecture. Your agent's intelligence is capped by what tools you give it. Get that right and everything else is configuration.
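One way to keep tool definitions consistent is to build them with a small helper. The sketch below is an illustration only: `make_tool` is a hypothetical helper, not part of any Vapi SDK, and the dictionary it returns simply mirrors the tool entries used in Step 1 of this guide.

```python
def make_tool(name, description, properties, required, url, timeout_seconds=8):
    """Build one tool entry in the shape used by the assistant config in Step 1.

    Hypothetical helper for illustration; not part of any Vapi SDK.
    """
    # Catch a common mistake early: a required parameter that was never defined.
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError(f"required parameters not defined: {missing}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
        "server": {"url": url, "timeoutSeconds": timeout_seconds},
    }


check_availability_tool = make_tool(
    name="check_availability",
    description="Check available appointment slots for a given date range",
    properties={
        "preferred_date": {"type": "string"},
        "service_type": {"type": "string"},
    },
    required=["preferred_date"],
    url="https://your-server.com/api/check-availability",
)
```

Keeping every tool in one shape makes it easier to review what the agent can and cannot do, which is the design decision that matters most here.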

Prerequisites

Vapi account at vapi.ai ($20 minimum to start, ~$0.05/min)
OpenAI API key (or Anthropic for Claude)
Twilio account (phone number: $1.15/mo) OR just use Vapi's web call for testing
A publicly accessible HTTPS endpoint for tool calls (ngrok works for local dev)

Step 1: Create Your First Assistant via API

Don't use the dashboard for production. Use the API — it's reproducible and version-controllable.

import requests
import json

VAPI_API_KEY = "your_vapi_key"

assistant_config = {
    "name": "Maya — AI Receptionist",
    "model": {
        "provider": "openai",
        "model": "gpt-4o",
        "temperature": 0.3,  # Lower = more consistent, higher = more natural
        "systemPrompt": """You are Maya, the AI receptionist for Sonrisa Dental.

Your role:
- Answer calls warmly and professionally
- Determine if caller is new patient or existing
- For new patients: collect name, phone, reason for visit, insurance
- For existing patients: help reschedule or answer questions
- Book appointments using the book_appointment tool
- Escalate emergencies to the on-call line: (555) 911-0000

Rules:
- Keep responses under 2 sentences
- Confirm bookings by reading back the details
- Never discuss fees beyond "please ask when you arrive"
- If asked if you're AI: "I'm Maya, Sonrisa's virtual receptionist"
"""
    },
    "voice": {
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",  # Rachel — warm, professional
        "stability": 0.5,
        "similarityBoost": 0.75
    },
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-2",
        "language": "en-US"
    },
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "check_availability",
                "description": "Check available appointment slots for a given date range",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "preferred_date": {
                            "type": "string",
                            "description": "Preferred date in YYYY-MM-DD format, or 'this week', 'next week'"
                        },
                        "service_type": {
                            "type": "string",
                            "description": "Type of dental service: cleaning, checkup, emergency, cosmetic"
                        }
                    },
                    "required": ["preferred_date"]
                }
            },
            "server": {
                "url": "https://your-server.com/api/check-availability",
                "timeoutSeconds": 8
            }
        },
        {
            "type": "function",
            "function": {
                "name": "book_appointment",
                "description": "Book an appointment for a patient",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "patient_name": {"type": "string"},
                        "phone": {"type": "string"},
                        "service_type": {"type": "string"},
                        "appointment_slot": {"type": "string", "description": "ISO 8601 datetime"},
                        "is_new_patient": {"type": "boolean"},
                        "insurance": {"type": "string"}
                    },
                    "required": ["patient_name", "phone", "appointment_slot", "is_new_patient"]
                }
            },
            "server": {
                "url": "https://your-server.com/api/book-appointment",
                "timeoutSeconds": 10
            }
        }
    ],
    "firstMessage": "Thank you for calling Sonrisa Dental, this is Maya.",
    "endCallMessage": "Have a great day! We'll see you soon.",
    "endCallPhrases": ["goodbye", "bye bye", "that's all", "that's everything"],
    "recordingEnabled": True,
    "silenceTimeoutSeconds": 30,
    "maxDurationSeconds": 600
}

response = requests.post(
    "https://api.vapi.ai/assistant",
    headers={
        "Authorization": f"Bearer {VAPI_API_KEY}",
        "Content-Type": "application/json"
    },
    json=assistant_config
)

response.raise_for_status()  # Fail loudly if Vapi rejects the config
assistant = response.json()
print(f"Assistant created: {assistant['id']}")
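To make the "version-controllable" part concrete, the config can live in a JSON file in your repository while the API key stays out of it. A minimal sketch, assuming an environment variable named `VAPI_API_KEY` and a file path of your choosing (both are conventions picked for this example, not Vapi requirements):

```python
import json
import os
from pathlib import Path


def load_api_key(env_var="VAPI_API_KEY"):
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def save_config(config, path):
    """Write the assistant config as stable, diff-friendly JSON."""
    text = json.dumps(config, indent=2, sort_keys=True, ensure_ascii=False)
    Path(path).write_text(text + "\n", encoding="utf-8")


def load_config(path):
    """Load a config previously written by save_config."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With this in place, the request above could send `json=load_config("assistant.json")` and build its `Authorization` header from `load_api_key()`, so a config change shows up as an ordinary diff in code review.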

Step 2: Build Your Tool Server

When the agent calls a tool, Vapi sends a POST to your server URL. Your server must respond within the timeout window.
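The request body carries the tool call inside a `message` object. The helper below is a sketch that assumes the payload shape the server code in this guide relies on (`message.toolCalls[0].function.arguments`, with `arguments` arriving either as a JSON string or as an object). `extract_tool_call` is a name made up for this example, so check the shape against a real request from your own assistant before relying on it.

```python
import json


def extract_tool_call(body):
    """Return (tool_call_id, params) from a tool-call request body.

    Assumes the payload shape used in this guide. Returns (None, {}) when
    the body carries no tool call, instead of raising IndexError.
    """
    tool_calls = (body.get("message") or {}).get("toolCalls") or []
    if not tool_calls:
        return None, {}
    tool_call = tool_calls[0]
    args = (tool_call.get("function") or {}).get("arguments") or {}
    if isinstance(args, str):
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            args = {}
    if not isinstance(args, dict):
        args = {}
    return tool_call.get("id"), args


sample = {
    "message": {
        "toolCalls": [{
            "id": "call_123",
            "function": {
                "name": "check_availability",
                "arguments": '{"preferred_date": "2026-02-27"}',
            },
        }]
    }
}
print(extract_tool_call(sample))
```

Sharing one parser between handlers removes the duplicated three-line parsing step and gives malformed requests a defined result.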

Here's a minimal FastAPI server that handles both tools:

from fastapi import FastAPI, Request
from datetime import datetime, timedelta
import json

app = FastAPI()

# Mock calendar data — replace with your actual booking system
AVAILABLE_SLOTS = {
    "cleaning": ["2026-02-27T10:00:00", "2026-02-27T14:00:00", "2026-02-28T09:00:00"],
    "checkup": ["2026-02-27T11:00:00", "2026-02-28T15:00:00"],
    "emergency": ["2026-02-26T16:00:00"],  # Same day for emergencies
}

BOOKINGS = []  # Replace with your database

@app.post("/api/check-availability")
async def check_availability(request: Request):
    body = await request.json()

    # Vapi sends tool call in this format
    tool_call = body.get("message", {}).get("toolCalls", [{}])[0]
    args = tool_call.get("function", {}).get("arguments", "{}")
    params = json.loads(args) if isinstance(args, str) else args

    service = params.get("service_type", "checkup")
    slots = AVAILABLE_SLOTS.get(service, AVAILABLE_SLOTS["checkup"])

    # Format slots for the agent to speak naturally
    formatted = []
    for slot in slots[:3]:  # Max 3 options to avoid overwhelming caller
        dt = datetime.fromisoformat(slot)
        formatted.append(dt.strftime("%A %B %d at %I:%M %p"))

    return {
        "results": [{
            "toolCallId": tool_call.get("id"),
            "result": f"Available slots: {', '.join(formatted)}"
        }]
    }

@app.post("/api/book-appointment")
async def book_appointment(request: Request):
    body = await request.json()

    tool_call = body.get("message", {}).get("toolCalls", [{}])[0]
    args = tool_call.get("function", {}).get("arguments", "{}")
    params = json.loads(args) if isinstance(args, str) else args

    # Save to your database here
    booking = {
        "id": f"BK{len(BOOKINGS)+1:04d}",
        "patient": params.get("patient_name"),
        "phone": params.get("phone"),
        "service": params.get("service_type", "checkup"),
        "slot": params.get("appointment_slot"),
        "new_patient": params.get("is_new_patient", True),
        "insurance": params.get("insurance", "unspecified"),
        "created_at": datetime.utcnow().isoformat()
    }
    BOOKINGS.append(booking)

    # Send confirmation SMS here (Twilio)
    # send_sms(params.get("phone"), f"Confirmed: {booking['service']} on {booking['slot']}")

    slot_dt = datetime.fromisoformat(params["appointment_slot"])
    formatted_slot = slot_dt.strftime("%A %B %d at %I:%M %p")

    return {
        "results": [{
            "toolCallId": tool_call.get("id"),
            "result": f"Appointment confirmed for {params['patient_name']} on {formatted_slot}. Confirmation ID: {booking['id']}"
        }]
    }

# Run: uvicorn server:app --reload --port 8000
# Expose locally: ngrok http 8000
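The booking handler above saves the booking before it parses `appointment_slot`, so a malformed value from the model would raise an exception after the record is already stored. One possible guard is to validate the slot first. A sketch, where `parse_slot` and `describe_slot` are hypothetical helper names:

```python
from datetime import datetime


def parse_slot(value):
    """Parse an ISO 8601 datetime string; return None if it is not usable."""
    if not isinstance(value, str):
        return None
    # Older Python versions reject a trailing "Z", so normalize it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def describe_slot(value):
    """Return a speakable description, or an error message for the agent."""
    slot = parse_slot(value)
    if slot is None:
        return "That time could not be understood. Ask the caller to pick a slot again."
    return slot.strftime("%A %B %d at %I:%M %p")


print(describe_slot("2026-02-27T10:00:00"))
print(describe_slot("next Tuesday-ish"))
```

In `book_appointment`, calling `parse_slot` before building the `booking` dict and returning the error text as the tool `result` gives the agent a chance to recover within the same call.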

Step 3: Attach a Phone Number

# Import a Twilio number you already own and attach the assistant to it
number_response = requests.post(
    "https://api.vapi.ai/phone-number",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "provider": "twilio",
        "assistantId": assistant["id"],
        "name": "Sonrisa Main Line",
        # Use existing Twilio number:
        "twilioPhoneNumber": "+15551234567",
        "twilioAccountSid": "ACxxx",
        "twilioAuthToken": "your_auth_token"
    }
)
phone = number_response.json()
print(f"Number attached: {phone['number']}")

Or buy directly through Vapi (they handle Twilio):

buy_response = requests.post(
    "https://api.vapi.ai/phone-number/buy",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "areaCode": "512",  # Target local area code
        "assistantId": assistant["id"]
    }
)
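The phone numbers in these requests are written in E.164 format (a leading `+`, the country code, then the subscriber digits, 15 digits at most). A quick check before calling the API can catch formatting mistakes early. This is a sketch: `is_e164` is a name chosen for this example, and it checks the format only, not whether the number exists.

```python
import re

# E.164: "+" followed by 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number):
    """Return True if the string looks like an E.164 phone number."""
    return isinstance(number, str) and E164_PATTERN.fullmatch(number) is not None


print(is_e164("+15551234567"))
print(is_e164("(555) 123-4567"))
```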

Step 4: Configure Webhooks for Call Events

Track every call outcome for analysis and follow-up:

# Add to assistant config (or update existing)
update_response = requests.patch(
    f"https://api.vapi.ai/assistant/{assistant['id']}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "serverUrl": "https://your-server.com/api/vapi-webhook",
        "serverUrlSecret": "your_webhook_secret"
    }
)

Then add a handler for these events to the FastAPI server from Step 2:

@app.post("/api/vapi-webhook")
async def vapi_webhook(request: Request):
    body = await request.json()
    message_type = body.get("message", {}).get("type")

    if message_type == "end-of-call-report":
        report = body["message"]
        # Collect tool names once so the two fields below stay in sync
        tools_called = [t["name"] for t in report.get("toolCalls", [])]
        call_data = {
            "duration": report.get("durationSeconds"),
            "ended_reason": report.get("endedReason"),  # "customer-ended-call", "silence-timed-out", etc.
            "cost": report.get("cost"),  # In USD
            "transcript": report.get("transcript"),
            "summary": report.get("summary"),
            "tools_called": tools_called,
            "appointment_booked": "book_appointment" in tools_called
        }

        # Log it, send to Slack, update CRM, whatever
        print(f"Call ended: {call_data}")

        # If appointment was booked, email the practice
        if call_data["appointment_booked"]:
            # send_email_to_practice(call_data)
            pass

    return {"status": "ok"}
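The assistant update sets `serverUrlSecret`, but the handler above never checks it, so anyone who finds the URL could post fake events. A sketch of the check, assuming Vapi forwards the secret in an `x-vapi-secret` request header (confirm the header name in Vapi's server URL documentation) and that the expected value is stored in a `VAPI_WEBHOOK_SECRET` environment variable, a name chosen for this example:

```python
import hmac
import os


def is_authorized(headers, expected_secret=None):
    """Compare the incoming secret header with the configured secret.

    `headers` is any mapping that supports .get() with lowercase names.
    The header name "x-vapi-secret" is an assumption; verify it in the docs.
    """
    if expected_secret is None:
        expected_secret = os.environ.get("VAPI_WEBHOOK_SECRET", "")
    received = headers.get("x-vapi-secret", "")
    if not expected_secret or not received:
        return False
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(received.encode(), expected_secret.encode())


print(is_authorized({"x-vapi-secret": "your_webhook_secret"}, "your_webhook_secret"))
print(is_authorized({}, "your_webhook_secret"))
```

In the FastAPI handler, `request.headers` can be passed in directly, and a failed check would return a 401 before any event is processed.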

The Settings That Matter Most

Temperature: Keep it low (0.2–0.4) for receptionists. Higher temperature = more natural conversation but more hallucinations. A receptionist that confidently books the wrong slot is worse than one that sounds a bit stiff.

Turn detection sensitivity: Vapi's default is usually fine, but if you're getting cut-offs (agent interrupts before caller finishes), increase endpointingConfig.onPunctuationSeconds.

First message: Don't make it a question. "Thank you for calling X, this is Maya" lets the caller control what happens next. Starting with a question ("How can I help you today?") sounds more bot-like.

Max duration: Set a hard cap (maxDurationSeconds: 600). It's rare, but some callers will keep an agent on the line indefinitely. At ~$0.05/min, a 10-minute cap limits the platform cost of any single call to roughly $0.50.

Tool timeouts: Your tools must respond within the timeout. If they don't, the agent gets no result and has to handle the uncertainty gracefully. Keep tools fast (< 5 seconds) or the call feels broken.
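One way to stay inside the timeout is to give the slow backend call its own, shorter budget and return a spoken fallback when that budget is exceeded. The sketch below uses `asyncio.wait_for` from the standard library. `slow_calendar_lookup` is a stand-in for your real backend call, and the 5-second default mirrors the guideline above.

```python
import asyncio

FALLBACK = "I'm having trouble reaching the calendar. Can I take your number and call you back?"


async def with_budget(coro, budget_seconds=5.0, fallback=FALLBACK):
    """Await `coro`, but return `fallback` if it exceeds the time budget."""
    try:
        return await asyncio.wait_for(coro, timeout=budget_seconds)
    except asyncio.TimeoutError:
        return fallback


async def slow_calendar_lookup(delay):
    """Stand-in for a real backend call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "Available slots: Friday at 10:00 AM"


async def demo():
    # Short budgets keep the demo fast; production would use the default.
    fast = await with_budget(slow_calendar_lookup(0.01), budget_seconds=0.5)
    slow = await with_budget(slow_calendar_lookup(1.0), budget_seconds=0.05)
    return fast, slow


fast_result, slow_result = asyncio.run(demo())
print(fast_result)
print(slow_result)
```

Returning the fallback as the tool `result` means the agent always has something to say, even when the backend is slow.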

Testing Your Agent

Test before going live with real calls:

# Initiate a test call programmatically
test_call = requests.post(
    "https://api.vapi.ai/call",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "assistantId": assistant["id"],
        "customer": {
            "number": "+1YOUR_TEST_NUMBER"  # Call your own phone
        },
        "phoneNumberId": phone["id"]
    }
)
print(f"Test call initiated: {test_call.json()['id']}")

Test these scenarios every time before deploying:

New patient booking — happy path
Existing patient reschedule
Emergency call (should get different response + escalation)
"Are you a robot?" question
Long silence (should handle gracefully)
Caller hangs up mid-booking (check if partial data is saved)
Tool failure (take your server down, see what the agent does)
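Some of these checks can run without placing a call. The sketch below validates that a tool handler's return value has the `results` / `toolCallId` / `result` shape used by the server in Step 2. `check_tool_response` is a hypothetical helper, and it covers only the response shape, not the conversation itself.

```python
def check_tool_response(response, expected_id):
    """Return a list of problems found in a tool response (empty if none)."""
    results = response.get("results") if isinstance(response, dict) else None
    if not isinstance(results, list) or not results:
        return ["response must contain a non-empty 'results' list"]
    first = results[0]
    if not isinstance(first, dict):
        return ["each entry in 'results' must be an object"]
    problems = []
    if first.get("toolCallId") != expected_id:
        problems.append("toolCallId does not match the incoming tool call")
    result = first.get("result")
    if not isinstance(result, str) or not result:
        problems.append("result must be a non-empty string the agent can speak")
    return problems


good = {"results": [{"toolCallId": "call_1", "result": "Available slots: Friday at 10:00 AM"}]}
bad = {"results": [{"toolCallId": None, "result": ""}]}
print(check_tool_response(good, "call_1"))
print(check_tool_response(bad, "call_1"))
```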

Cost Breakdown

For a small dental practice (200 calls/month, avg 2 min):

Component	Cost
Vapi platform	~$20 (400 min @ $0.05)
OpenAI GPT-4o	~$8 (included in Vapi pricing)
ElevenLabs TTS	~$5
Deepgram STT	~$3
Twilio phone	$1.15
Total	~$37/month

At a $299/month service price, that's ~88% gross margin.
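The arithmetic behind the table, as a small script you can rerun with your own volumes. The figures are the ones from the table above, and the per-component costs are the article's estimates, not quoted prices.

```python
calls_per_month = 200
avg_minutes = 2
vapi_rate_per_min = 0.05

minutes = calls_per_month * avg_minutes  # 200 calls * 2 min = 400 minutes
costs = {
    "Vapi platform": minutes * vapi_rate_per_min,  # 400 * 0.05
    "OpenAI GPT-4o": 8.00,
    "ElevenLabs TTS": 5.00,
    "Deepgram STT": 3.00,
    "Twilio phone": 1.15,
}
total = sum(costs.values())
price = 299.00
gross_margin = (price - total) / price

print(f"Total: ${total:.2f}/month")
print(f"Gross margin: {gross_margin:.1%}")
```

The unrounded figures are $37.15 per month and a margin of about 87.6%, which the table rounds to ~$37 and ~88%.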

Where to Go from Here

Multi-language support: Change language in the transcriber config (the example in Step 1 uses en-US). Vapi supports 30+ languages. For a dental practice in Mexico City (CDMX), add Spanish as a fallback.
Human handoff: Use Vapi's transfer tool to forward calls to a human during business hours.
Calendar integration: Replace the mock slots with a real Cal.com or Google Calendar API call in your tool server.
CRM sync: Log every call and booking to HubSpot or whatever CRM the business uses.
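Before querying a real calendar, the tool server has to turn the `preferred_date` argument into concrete dates, since the tool description allows both `YYYY-MM-DD` and phrases like "this week" or "next week" (the mock server in Step 2 ignores this argument). A sketch, assuming weeks start on Monday; `resolve_date_range` is a hypothetical helper name:

```python
from datetime import date, timedelta


def resolve_date_range(preferred, today):
    """Map a preferred_date argument to an inclusive (start, end) date range.

    Returns None when the value is neither an ISO date nor a known phrase.
    """
    text = (preferred or "").strip().lower()
    monday = today - timedelta(days=today.weekday())  # Monday of this week
    if text == "this week":
        # Start from today so days that have already passed are never offered.
        return today, monday + timedelta(days=6)
    if text == "next week":
        start = monday + timedelta(days=7)
        return start, start + timedelta(days=6)
    try:
        day = date.fromisoformat(text)
    except ValueError:
        return None
    return day, day


today = date(2026, 2, 26)  # Example date matching the mock slots in Step 2
print(resolve_date_range("this week", today))
print(resolve_date_range("next week", today))
print(resolve_date_range("2026-02-27", today))
```

The resulting range can then be passed to whichever calendar API replaces the mock slots.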

The core architecture above scales to any voice use case: lead qualification, outbound sales calls, appointment reminders (outbound), customer support triage, or intake for law firms.

Rey Midas builds AI-powered business automation at MidasTools. We deploy Vapi-based phone agents for service businesses — dental practices, law firms, and real estate offices — starting at $499 setup.
