Every caller hears it: "Press 1 for Sales. Press 2 for Support. Press 3 for Billing."
And every caller hates it.
Numeric IVR menus were invented because computers could not understand speech. That limitation is gone. Your AI can now listen to what callers actually say and route them instantly — no dial-pad gymnastics required.
This post shows you how to build natural language call routing with a real phone number, in about 50 lines of Python.
## The Problem with Traditional IVR
Classic IVR works like this:
- Play a menu
- Wait for a DTMF keypress
- Branch on the digit
Simple code, terrible experience:
- Callers forget which option they want by option 7
- "Press 0 to hear these options again" is a UX failure
- Callers say "representative" into the void and nothing happens
- Any menu change requires re-recording audio
The fix is obvious: let the caller say what they want.
## The Architecture

```
Caller dials your number
          |
          v
VoIPBin answers;
STT converts speech to text
          |
          v
Your webhook receives
the transcript
          |
          v
LLM classifies intent:
sales / support / billing / other
          |
          v
VoIPBin transfers the call
to the right destination
```
Your server never touches audio. VoIPBin handles RTP, STT, and TTS. You write pure business logic.
## Step 1: Sign Up and Get a Number

```bash
# Create an account
curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username":"you","password":"pass","email":"you@example.com"}'

# Save the accesskey.token as TOKEN

# Buy a phone number
curl -X POST https://api.voipbin.net/v1.0/numbers \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"country_code":"US","area_code":"415"}'
```

The number is active immediately. Point it at your webhook in the VoIPBin dashboard.
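The signup step above notes that `accesskey.token` should be saved as `TOKEN`. A quick way to pull it out of the JSON response with nothing but the Python standard library (the response shape here is assumed from that note; substitute your real signup response for the echoed example):

```shell
# Extract accesskey.token from a signup-style response; pipe the real
# curl output into the same python3 one-liner. jq works too if installed.
TOKEN=$(echo '{"accesskey":{"token":"abc123"}}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["accesskey"]["token"])')
echo "$TOKEN"
```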
## Step 2: Answer the Call and Prompt the Caller

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/call", methods=["POST"])
def handle_call():
    return jsonify({
        "actions": [
            {
                "type": "talk",
                "text": "Hi! Who would you like to speak with? "
                        "You can say Sales, Support, Billing, or describe your issue."
            },
            {
                "type": "input",
                "speech": {"timeout": 5},
                "action_url": "https://yourserver.com/route"
            }
        ]
    })
```

The `input` action listens for speech, transcribes it, and POSTs the result to `/route`.
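Before writing the routing handler, it helps to know what that POST looks like. The payload shape below is an assumption inferred from the parsing code in the next step (`{"speech": {"results": [{"text": ...}]}}`); a small defensive extractor keeps the handler from crashing when a field is missing:

```python
# Hypothetical /route payload, matching the shape the handler reads.
example_payload = {
    "speech": {"results": [{"text": "I need help with my invoice"}]}
}

def extract_transcript(payload: dict) -> str:
    # Walk the nested structure defensively so any missing level yields "".
    results = (payload.get("speech") or {}).get("results") or []
    return results[0].get("text", "") if results else ""
```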
## Step 3: Classify Intent and Transfer

```python
import openai

ROUTING_TABLE = {
    "sales": "+14155550100",
    "support": "+14155550101",
    "billing": "+14155550102",
}

def classify_intent(transcript: str) -> str:
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a call routing classifier. "
                    "Respond with exactly one word: sales, support, billing, or unknown."
                )
            },
            {"role": "user", "content": transcript}
        ],
        max_tokens=5,
    )
    return resp.choices[0].message.content.strip().lower()

@app.route("/route", methods=["POST"])
def route_call():
    data = request.json
    transcript = data.get("speech", {}).get("results", [{}])[0].get("text", "")
    intent = classify_intent(transcript)
    destination = ROUTING_TABLE.get(intent)
    if destination:
        return jsonify({
            "actions": [
                {"type": "talk", "text": f"Connecting you to {intent} now."},
                {"type": "transfer", "destination": destination}
            ]
        })
    # Fallback: unknown intent goes to the main desk
    return jsonify({
        "actions": [
            {"type": "talk", "text": "Let me transfer you to our main desk."},
            {"type": "transfer", "destination": "+14155550199"}
        ]
    })

if __name__ == "__main__":
    app.run(port=5000)
```

That is the complete router.
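One hardening worth adding: despite the one-word instruction, models occasionally reply with punctuation or a short sentence. A small guard (hypothetical helper, not part of any SDK) clamps any off-script reply to a known label before the table lookup:

```python
VALID_INTENTS = {"sales", "support", "billing", "unknown"}

def normalize_intent(raw: str) -> str:
    # Strip whitespace, casing, and trailing punctuation; anything that
    # still isn't a known label falls through to "unknown".
    word = raw.strip().lower().rstrip(".!?")
    return word if word in VALID_INTENTS else "unknown"
```

Call it as `normalize_intent(classify_intent(transcript))` in `route_call` so a chatty reply lands at the main desk instead of raising a `KeyError`-style surprise later.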
## What Callers Experience
| Caller says | Intent | Goes to |
|---|---|---|
| "I want to buy something" | sales | Sales team |
| "My account is broken" | support | Support team |
| "Question about my invoice" | billing | Billing team |
| "Can I speak to someone?" | unknown | Main desk |
| "Quiero hablar con soporte" | support | Support team |
That last row matters. LLM-based routing handles multilingual callers with zero extra configuration.
## Useful Extensions
**Priority routing** — detect urgent calls before classification runs:

```python
# At the top of route_call(), before classify_intent():
PRIORITY_KEYWORDS = {"urgent", "outage", "down", "emergency"}

if any(kw in transcript.lower() for kw in PRIORITY_KEYWORDS):
    return jsonify({
        "actions": [
            {"type": "talk", "text": "This sounds urgent. Connecting you to on-call now."},
            {"type": "transfer", "destination": "+14155550911"}
        ]
    })
```
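Note that a substring check also fires on words like "download" or "countdown". A whole-word variant avoids the false positives; this helper is a sketch, not part of the router above:

```python
import re

PRIORITY_KEYWORDS = {"urgent", "outage", "down", "emergency"}

def is_priority(transcript: str) -> bool:
    # Tokenize into whole words so "download" does not match "down".
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return bool(PRIORITY_KEYWORDS & words)
```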
**Multi-turn context** — if intent is unclear, ask a follow-up and send the full exchange to the LLM:

```python
def classify_with_context(turns: list[dict]) -> str:
    messages = [{"role": "system", "content": "Route to: sales, support, billing, or unknown."}]
    messages.extend(turns)
    resp = openai.chat.completions.create(
        model="gpt-4o-mini", messages=messages, max_tokens=5
    )
    return resp.choices[0].message.content.strip().lower()
```
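Since each webhook hit is an independent request, the `turns` list has to be stored somewhere between hits. A minimal sketch, assuming the webhook payload carries a call ID to key on; the in-memory dict is for illustration only and should be Redis or a database in production:

```python
from collections import defaultdict

# Hypothetical per-call turn store; in-process memory won't survive
# restarts or multiple workers, so swap this for Redis in production.
_turns: dict[str, list[dict]] = defaultdict(list)

def record_turn(call_id: str, role: str, text: str) -> list[dict]:
    _turns[call_id].append({"role": role, "content": text})
    return _turns[call_id]

# Example: three webhook hits for the same call accumulate context
record_turn("call-123", "user", "I have a question")
record_turn("call-123", "assistant", "Is it about an order, an issue, or an invoice?")
turns = record_turn("call-123", "user", "It's about my invoice")
# turns now holds the full exchange, ready for classify_with_context(turns)
```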
## Why This Scales
Your server is stateless. Every webhook hit is an independent HTTP request. Run it behind any load balancer with zero sticky-session config. Ten calls or ten thousand — same code, same latency.
VoIPBin owns the stateful parts: active call legs, audio streams, DTMF detection. Your code stays clean.
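Concretely, scaling a stateless Flask app is just a matter of adding workers. For example, assuming gunicorn as the WSGI server and the router saved as `app.py`:

```shell
pip install gunicorn
# 4 independent worker processes; no shared state or sticky sessions needed
gunicorn --workers 4 --bind 0.0.0.0:5000 app:app
```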
## Run It Locally

```bash
pip install flask openai

# Tunnel for local testing
npx localtunnel --port 5000

# Set your VoIPBin webhook to <tunnel-url>/call
# Call your purchased number
```
For production, deploy to any Python host (Railway, Render, Fly.io) and update the webhook URL.
## What You Built
- A phone number that understands natural language
- LLM-based intent classification — swap models any time
- Automatic transfer to the right team
- Multilingual support at zero extra cost
- A stateless, horizontally scalable webhook server
No telephony SDK. No DTMF parsing. No recorded menus to maintain.
The days of "Press 1 for Sales" are over.
## Resources

- VoIPBin: https://voipbin.net
- Sign up: `POST https://api.voipbin.net/v1.0/auth/signup`
- MCP Server: `uvx voipbin-mcp` (works in Claude Code and Cursor)
- Go SDK: `go get github.com/voipbin/voipbin-go`