Phone trees. We all hate them.
"Press 1 for sales. Press 2 for support. Press 3 for billing. Press 9 to repeat this menu."
Even the companies running them hate them. They are expensive to build, painful to maintain, and callers abandon them in frustration. But the alternative — routing every call to a human — does not scale.
For years, the only way to build a smarter IVR was to hire a telephony specialist, negotiate a contract with an enterprise platform like Avaya or Genesys, and spend months in integration hell.
That is no longer true.
What Changed: AI Understands Intent
The old IVR model was brittle because it depended on exact inputs: press a number, say a keyword, follow a script. One mismatch and the caller was lost.
Modern LLMs are good at inferring intent. A caller can say "I got double-charged last month" and an AI agent can tell that this is a billing issue. It does not need a menu or a keyword. It listens, understands, and routes the call, or resolves the issue entirely.
The remaining challenge is the plumbing: how do you connect an LLM to an actual phone call?
The Traditional IVR Stack (Why It Is Hard)
If you tried to build this from scratch, here is what you would need:
- SIP trunk — a carrier connection to receive phone calls
- Media server — to handle the RTP audio stream
- ASR/STT pipeline — to convert speech to text in real time
- NLU layer — to extract intent from the transcript
- TTS engine — to synthesize the AI response back to speech
- Call routing logic — to transfer, queue, or hang up
- Monitoring and failover — because calls cannot go down at 2AM
Each of these is a non-trivial system. Telephony engineers spend careers specializing in just the media layer. Most backend developers have never touched SIP or RTP.
The result: AI IVR projects get stuck in the plumbing phase and never ship.
A Different Model: Offload the Telephony
VoIPBin handles the entire telephony stack — SIP, RTP, STT, TTS, DTMF — and exposes it through a simple REST API.
Your AI agent only ever sees text. It receives a caller's transcribed message via webhook, responds with text, and VoIPBin handles everything else: synthesizing the voice, playing it on the call, listening for the next utterance, and looping back.
Here is how the architecture looks:
Caller ──► VoIPBin (SIP/RTP/STT) ──► Webhook (your server)
                                               │
                                     LLM (GPT, Claude...)
                                               │
                                 Text response back to VoIPBin
                                               │
                                         VoIPBin (TTS) ──► Caller hears voice
Your server is just an HTTP endpoint. No telephony knowledge required.
Building the AI IVR: Step by Step
Step 1: Get Your API Key
Sign up via the API — no OTP, no approval queue:
curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-name",
    "email": "you@example.com",
    "password": "yourpassword"
  }'
The response includes accesskey.token — that is your API key for all subsequent requests.
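If you would rather script these setup calls from Python than from curl, a small standard-library helper can build the same JSON POST requests. This is a sketch, not an official client: the base URL, paths, and payload fields are copied from the curl examples in this walkthrough, and extract_token assumes the signup response nests the key as {"accesskey": {"token": ...}}, which is what the dotted name accesskey.token suggests.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.voipbin.net/v1.0"  # same base URL as the curl examples


def build_request(path: str, payload: dict,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the VoIPBin API."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Same Bearer scheme the curl examples use after signup
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_token(signup_response: dict) -> str:
    # Assumption: the key is nested as {"accesskey": {"token": "..."}}
    return signup_response["accesskey"]["token"]
```

Usage: token = extract_token(send(build_request("/auth/signup", {"name": "your-name", "email": "you@example.com", "password": "yourpassword"}))), then pass token as the third argument when posting to "/numbers" or "/flows". Keeping build_request separate from send makes the request easy to inspect without touching the network.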
Step 2: Get a Phone Number
curl -X POST https://api.voipbin.net/v1.0/numbers \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "country_code": "US",
    "number_type": "local"
  }'
Save the number — callers will dial this to reach your AI IVR.
Step 3: Build the Webhook Handler
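When a call comes in, VoIPBin sends your server a webhook with the caller's speech transcribed to text. On the very first request of a call there is no speech yet, so the handler simply greets the caller. On every later turn, your job is to: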
- Run the text through your LLM
- Return a response with the next action
Here is a minimal Python example using FastAPI and OpenAI:
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

WEBHOOK_URL = "https://yourserver.com/ivr/webhook"

SYSTEM_PROMPT = """
You are an AI receptionist for Acme Corp.
When a caller describes their issue, classify it into one of:
- BILLING: payment issues, invoices, refunds
- SUPPORT: technical problems, bugs, how-to questions
- SALES: pricing, demos, new accounts
- OTHER: anything else
If you can resolve the issue directly with a short answer, do so.
Otherwise, tell the caller you are connecting them to the right team
and include the routing tag at the end of your message like [ROUTE:BILLING].
"""

# Routing tag -> transfer destination, checked in this order
ROUTES = {
    "BILLING": "+18005550100",  # billing team
    "SUPPORT": "+18005550200",  # support team
    "SALES": "+18005550300",    # sales team
}

@app.post("/ivr/webhook")
async def handle_call(request: Request):
    body = await request.json()
    caller_speech = body.get("speech_result", "")
    call_id = body.get("call_id", "")

    if not caller_speech:
        # First turn: greet the caller
        return JSONResponse({
            "action": "talk",
            "text": "Hello, thank you for calling Acme Corp. How can I help you today?",
            "webhook_url": WEBHOOK_URL,
        })

    # Process with the LLM. The sync client blocks the event loop while it
    # waits; under real load, AsyncOpenAI with await avoids that.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": caller_speech},
        ],
    )
    ai_text = response.choices[0].message.content

    # Transfer if the model emitted a routing tag; strip the tag before TTS
    for route, destination in ROUTES.items():
        tag = f"[ROUTE:{route}]"
        if tag in ai_text:
            return JSONResponse({
                "action": "transfer",
                "text": ai_text.replace(tag, "").strip(),
                "destination": destination,
            })

    # No tag: the AI resolved it, so continue the conversation
    return JSONResponse({
        "action": "talk",
        "text": ai_text + " Is there anything else I can help you with?",
        "webhook_url": WEBHOOK_URL,
    })
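Exact substring checks for routing tags miss tags the model writes slightly differently, such as [ROUTE: billing] or [route:SALES]. One way to make routing more forgiving is a small regular-expression parser. This is a sketch: the tag format is just the convention this article's system prompt defines, and KNOWN_ROUTES mirrors its three transfer categories (OTHER deliberately falls through to continuing the conversation).

```python
import re
from typing import Optional, Tuple

# Matches [ROUTE:BILLING], [ROUTE: billing], [route:Sales ], etc.
ROUTE_TAG = re.compile(r"\[\s*ROUTE\s*:\s*([A-Z]+)\s*\]", re.IGNORECASE)

KNOWN_ROUTES = {"BILLING", "SUPPORT", "SALES"}


def parse_route(ai_text: str) -> Tuple[str, Optional[str]]:
    """Return (text with routing tags removed, route name or None)."""
    match = ROUTE_TAG.search(ai_text)
    route = match.group(1).upper() if match else None
    if route not in KNOWN_ROUTES:
        # Missing or unknown tag (e.g. OTHER): keep talking, don't transfer
        route = None
    # Always strip tags so the caller never hears "[ROUTE:...]" spoken aloud
    clean_text = ROUTE_TAG.sub("", ai_text).strip()
    return clean_text, route
```

In the handler, clean_text, route = parse_route(ai_text) replaces the per-tag checks: transfer to that team's number when route is not None, otherwise return a talk action with clean_text.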
Step 4: Link the Number to Your Webhook
curl -X POST https://api.voipbin.net/v1.0/flows \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AI IVR Flow",
    "actions": [
      {
        "type": "webhook",
        "url": "https://yourserver.com/ivr/webhook",
        "method": "POST"
      }
    ]
  }'
Then assign this flow to the number from Step 2 (check the API docs for the exact request), and every incoming call will be routed through your AI agent.
The Conversation Loop
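What makes this more powerful than a traditional IVR is that the conversation can be stateful, so the caller does not have to start over. (The Step 3 handler sends the LLM only the latest utterance; keeping a history per call is what adds the state.) A caller can say: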
"I was double-charged in April"
And if the AI asks a follow-up:
"Which card was charged? Can you read me the last four digits?"
The caller answers, the AI gathers the information, and when it transfers to billing, it can pass along a structured summary of what was discussed.
You can maintain conversation history keyed to call_id:
# Simple in-memory store (use Redis in production)
call_history = {}

# Replaces the Step 3 handler; app, client, and SYSTEM_PROMPT are reused
@app.post("/ivr/webhook")
async def handle_call(request: Request):
    body = await request.json()
    call_id = body.get("call_id")
    caller_speech = body.get("speech_result", "")

    # Retrieve or initialize conversation history
    history = call_history.get(call_id, [])
    if caller_speech:
        history.append({"role": "user", "content": caller_speech})

    # Send the whole conversation so the model sees earlier turns
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    ai_text = response.choices[0].message.content
    history.append({"role": "assistant", "content": ai_text})
    call_history[call_id] = history

    # ... routing logic as before
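The "use Redis in production" comment points at the main weakness of a plain dict: entries are never removed, so memory grows with every call, and nothing bounds how much history is sent to the model on long calls. A sketch of an in-process store that caps messages per call and drops idle calls follows. The class name and both limits are hypothetical placeholders, and it is still single-process, so a multi-worker deployment would still want Redis or similar.

```python
import time
from typing import Dict, List, Optional


class CallHistoryStore:
    """In-memory chat history per call_id, capped in size and evicted when idle.

    Single-process only: history is lost on restart and not shared
    between workers.
    """

    def __init__(self, max_messages: int = 20, ttl_seconds: float = 1800.0):
        self.max_messages = max_messages  # hypothetical default; must be >= 1
        self.ttl_seconds = ttl_seconds    # forget calls idle longer than this
        self._data: Dict[str, List[dict]] = {}
        self._last_seen: Dict[str, float] = {}

    def get(self, call_id: str, now: Optional[float] = None) -> List[dict]:
        """Return a copy of the call's history (empty if unknown or evicted)."""
        self._evict_idle(now)
        return list(self._data.get(call_id, []))

    def append(self, call_id: str, role: str, content: str,
               now: Optional[float] = None) -> None:
        """Add one message and trim the history to the newest max_messages."""
        history = self._data.setdefault(call_id, [])
        history.append({"role": role, "content": content})
        if len(history) > self.max_messages:
            del history[: len(history) - self.max_messages]
        self._last_seen[call_id] = time.monotonic() if now is None else now

    def end_call(self, call_id: str) -> None:
        """Forget a call, e.g. when the final call-ended webhook arrives."""
        self._data.pop(call_id, None)
        self._last_seen.pop(call_id, None)

    def _evict_idle(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        stale = [cid for cid, seen in self._last_seen.items()
                 if now - seen > self.ttl_seconds]
        for cid in stale:
            self.end_call(cid)
```

In the handler, history = store.get(call_id) takes the place of call_history.get(call_id, []), and each user and assistant message goes through store.append(call_id, role, text). The optional now argument exists mainly so the eviction logic can be tested without waiting.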
What You Did Not Have to Do
Look at what is missing from this code:
- No SIP stack
- No RTP handling
- No audio codec configuration
- No jitter buffer tuning
- No STT streaming pipeline
- No TTS voice synthesis
- No telephony carrier negotiation
You wrote an HTTP server that talks to an LLM. VoIPBin handled everything else.
Going Further
Once the basic IVR is working, a few extensions that are straightforward to add:
Post-call summary: When the call ends, VoIPBin sends a final webhook. Log the full transcript, run it through your LLM to generate a structured summary, and store it in your CRM.
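For the post-call summary, the per-call history list already holds the whole conversation in chat-message form. A sketch of turning it into a summarization request; the "Caller"/"Agent" labels, the prompt wording, and the JSON keys it asks for are illustrative choices, not anything VoIPBin or your CRM requires.

```python
from typing import List

SPEAKERS = {"user": "Caller", "assistant": "Agent"}


def format_transcript(history: List[dict]) -> str:
    """Render chat-style history as a readable Caller/Agent transcript."""
    lines = []
    for message in history:
        label = SPEAKERS.get(message["role"])
        if label:  # skip system or other roles
            lines.append(f"{label}: {message['content']}")
    return "\n".join(lines)


def build_summary_messages(history: List[dict]) -> List[dict]:
    """Messages for an LLM call that returns a JSON summary for the CRM."""
    instructions = (
        "Summarize this phone call as JSON with the keys "
        '"reason", "resolved" (true or false), and "follow_up".'
    )
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": format_transcript(history)},
    ]
```

Pass build_summary_messages(history) as messages to the same client.chat.completions.create call the handler already uses, then store the reply alongside the raw transcript.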
Business hours logic: Check the current time in your webhook handler. Outside business hours, tell the caller and offer a callback option instead.
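The business-hours check is plain date arithmetic. A minimal sketch, assuming a Monday-to-Friday, 9:00-17:00 schedule in a single office time zone; both the schedule and the function name are placeholders.

```python
from datetime import datetime, time, timezone, tzinfo


def is_business_hours(now: datetime, tz: tzinfo,
                      open_at: time = time(9, 0),
                      close_at: time = time(17, 0)) -> bool:
    """True Monday-Friday from open_at (inclusive) to close_at (exclusive) in tz."""
    local = now.astimezone(tz)  # convert the aware timestamp to office time
    if local.weekday() >= 5:    # 5 = Saturday, 6 = Sunday
        return False
    return open_at <= local.time() < close_at
```

In the webhook, call is_business_hours(datetime.now(timezone.utc), tz) before routing. For tz, ZoneInfo("America/New_York") from the standard zoneinfo module handles daylight-saving changes, which a fixed UTC offset would not.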
Voicemail fallback: If no agents are available, switch the action from transfer to record to capture a voicemail.
Sentiment escalation: If the caller seems frustrated (something you can ask the LLM to flag in its reply), skip the routing step and connect directly to a senior agent.
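One way to implement escalation is to extend the same tag convention the routing uses: ask the model to add an [ESCALATE] tag when the caller sounds frustrated, and let that tag win over any routing tag. A sketch; the tag name, the extra prompt line, and every phone number here are hypothetical placeholders.

```python
import re
from typing import Optional

# Hypothetical line to append to the system prompt
ESCALATION_PROMPT = (
    "If the caller sounds angry or frustrated, add [ESCALATE] at the end of your reply."
)

SENIOR_AGENT = "+18005550999"  # hypothetical senior-agent line
ROUTE_NUMBERS = {
    "BILLING": "+18005550100",
    "SUPPORT": "+18005550200",
    "SALES": "+18005550300",
}

# Matches [ESCALATE] and routing tags like [ROUTE: billing], case-insensitively
TAG = re.compile(r"\[\s*(ESCALATE|ROUTE\s*:\s*[A-Z]+)\s*\]", re.IGNORECASE)


def pick_destination(ai_text: str) -> Optional[str]:
    """Return the number to transfer to, or None to keep talking."""
    # Normalize each tag body, e.g. "route: sales" -> "ROUTE:SALES"
    tags = [re.sub(r"\s+", "", t).upper() for t in TAG.findall(ai_text)]
    if "ESCALATE" in tags:
        return SENIOR_AGENT  # frustration overrides normal routing
    for tag in tags:
        route = tag.split(":", 1)[1]
        if route in ROUTE_NUMBERS:
            return ROUTE_NUMBERS[route]
    return None
```

The tags still need to be stripped from the text before it is spoken; TAG.sub("", ai_text).strip() does that with the same pattern.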
All of these are just logic in your webhook handler — no telephony config changes needed.
The Bottom Line
Traditional IVR systems are expensive and inflexible because they were built in an era when telephony was the hard part. Today, the hard part is building good AI — and the telephony layer can be abstracted away entirely.
Your IVR is now just a Python function. Update the system prompt and the routing changes. Add a new route and it takes a few lines. The entire system can be deployed and iterated on like any other web service.
If you want to try it:
- Sign up: https://voipbin.net
- API docs: https://api.voipbin.net/v1.0
- MCP server: uvx voipbin-mcp (works in Claude Code and Cursor)
- Go SDK: go get github.com/voipbin/voipbin-go
The phone tree that took six months to build can now be replaced in an afternoon. And when your requirements change next week, you update a prompt — not a call flow diagram.