Your app is live. Users love it. Then someone files a support ticket:
"I called your number and got a dead line."
You check your notes — you never actually set up a phone number. You assumed voice support would come later. But "later" has arrived, and now you need:
- A real phone number users can call
- Something that answers 24/7 (not just during business hours)
- Intelligent responses, not a static IVR menu from 2003
- Something you can build in a weekend, not a quarter
This post walks through building an AI-powered inbound call handler using VoIPBin — a CPaaS built specifically for AI agents. You write the conversation logic. VoIPBin handles the telephony.
How It Works
The architecture is straightforward:
Incoming Call
↓
VoIPBin (answers, handles audio)
↓
Webhook → Your Server
↓
Your AI (processes text, decides response)
↓
VoIPBin (speaks the response via TTS)
Your server never touches audio. It receives a text transcript, returns a text reply. VoIPBin handles the rest — RTP, STT, TTS, codec negotiation, DTMF, silence detection. All of it.
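To make the text-in, text-out contract concrete, here is a minimal sketch of a single turn as a plain function. The field names `speech_text`, `call_id`, and `actions` follow the payloads used later in this post; `handle_turn` is a hypothetical name, and the echo reply is only a placeholder for your AI.

```python
from typing import Any


def handle_turn(payload: dict[str, Any]) -> dict[str, Any]:
    """Turn one transcript payload into a list of actions (no audio involved)."""
    # Field names are assumed from the webhook examples later in this post.
    caller_text = payload.get("speech_text", "")
    # Placeholder for the AI step: echo what the caller said.
    reply = f"You said: {caller_text}"
    return {"actions": [{"type": "talk", "text": reply, "language": "en-US"}]}


if __name__ == "__main__":
    print(handle_turn({"call_id": "demo", "speech_text": "Where is my order?"}))
```

Everything your server sees and returns is a small JSON-shaped dict, which is why the handler can be tested without a phone.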
Step 1: Get Your API Key
No OTP, no credit card form. One POST:
curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "yourname",
    "password": "yourpassword",
    "email": "you@example.com",
    "firstname": "Jane",
    "lastname": "Dev"
  }'
The response includes accesskey.token. That is your API key for everything that follows.
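If you script the signup, you will probably want to pull the token out programmatically. The sketch below assumes the response body is JSON shaped like `{"accesskey": {"token": "..."}}`, which is inferred from the `accesskey.token` path above and not confirmed against the API; `extract_token` is a hypothetical helper.

```python
import json


def extract_token(signup_response: str) -> str:
    """Return accesskey.token from the signup response body (a JSON string)."""
    data = json.loads(signup_response)
    # Assumed shape: {"accesskey": {"token": "..."}}; adjust if the real body differs.
    token = data.get("accesskey", {}).get("token")
    if not token:
        raise ValueError("signup response did not contain accesskey.token")
    return token


if __name__ == "__main__":
    sample = '{"accesskey": {"token": "example-token"}}'  # made-up sample, not real output
    print(extract_token(sample))
```

Export the result as `TOKEN` in your shell so the later curl commands can read `$TOKEN`.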
Step 2: Provision a Phone Number
Search for an available number and purchase it:
# Search available numbers (US)
curl -s "https://api.voipbin.net/v1.0/numbers/available?country_code=US&limit=5" \
  -H "Authorization: Bearer $TOKEN"

# Purchase the number
curl -s -X POST "https://api.voipbin.net/v1.0/numbers" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "number": "+12025550142",
    "call_flow_id": "YOUR_FLOW_ID"
  }'
The call_flow_id links this number to a VoIPBin Flow, the routing logic that runs when someone calls in. The flow itself is created in Step 3, so create it first and substitute its ID for YOUR_FLOW_ID.
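The same purchase call can be prepared from Python. This is a sketch using only the standard library; it builds the request without sending it, mirrors the curl example's URL, headers, and body, and `build_purchase_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.voipbin.net/v1.0"


def build_purchase_request(token: str, number: str, flow_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST /numbers request from the curl example."""
    body = json.dumps({"number": number, "call_flow_id": flow_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/numbers",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_purchase_request("TOKEN", "+12025550142", "YOUR_FLOW_ID")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the payload before you spend money on a number.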
Step 3: Create an AI Call Flow
A VoIPBin Flow defines what happens when the call connects. For an AI handler, you need a talk action (to greet the caller) followed by a webhook action (to loop your AI into the conversation):
curl -s -X POST "https://api.voipbin.net/v1.0/flows" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AI Inbound Handler",
    "actions": [
      {
        "type": "talk",
        "text": "Thanks for calling. How can I help you today?",
        "language": "en-US"
      },
      {
        "type": "input",
        "timeout": 5,
        "speech": true,
        "webhook": {
          "url": "https://your-server.com/call-webhook",
          "method": "POST"
        }
      }
    ]
  }'
When the caller speaks, VoIPBin transcribes their words and POSTs the transcript to your webhook URL.
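Since the same `input` action shows up again in the webhook responses, it can help to generate the flow body from code so the webhook URL lives in one place. The sketch below reproduces the JSON from the curl example; `input_action` and `build_flow` are hypothetical helper names.

```python
import json


def input_action(webhook_url: str, timeout: int = 5) -> dict:
    """Speech-input action that posts the transcript to webhook_url."""
    return {
        "type": "input",
        "timeout": timeout,
        "speech": True,
        "webhook": {"url": webhook_url, "method": "POST"},
    }


def build_flow(webhook_url: str, greeting: str, name: str = "AI Inbound Handler") -> dict:
    """Flow body: greet the caller, then listen and hand the transcript to the webhook."""
    return {
        "name": name,
        "actions": [
            {"type": "talk", "text": greeting, "language": "en-US"},
            input_action(webhook_url),
        ],
    }


if __name__ == "__main__":
    flow = build_flow(
        "https://your-server.com/call-webhook",
        "Thanks for calling. How can I help you today?",
    )
    print(json.dumps(flow, indent=2))
```

The printed JSON can be used as the `-d` body of the flow-creation request.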
Step 4: Build the AI Webhook Handler
Here is a minimal Python + FastAPI handler that uses OpenAI to generate responses:
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from env

SYSTEM_PROMPT = """
You are a helpful customer support assistant for Acme Corp.
Answer questions about orders, returns, and business hours.
Keep responses concise, under 3 sentences, since this is a phone call.
"""


@app.post("/call-webhook")
async def handle_call(request: Request):
    body = await request.json()

    # VoIPBin sends the caller's transcribed speech
    caller_text = body.get("speech_text", "")
    call_id = body.get("call_id", "")
    print(f"[{call_id}] Caller said: {caller_text}")

    # Ask your AI
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": caller_text}
        ]
    )
    reply = response.choices[0].message.content
    print(f"[{call_id}] AI reply: {reply}")

    # Return the next actions for VoIPBin to execute
    return JSONResponse({
        "actions": [
            {
                "type": "talk",
                "text": reply,
                "language": "en-US"
            },
            {
                "type": "input",
                "timeout": 5,
                "speech": True,
                "webhook": {
                    "url": "https://your-server.com/call-webhook",
                    "method": "POST"
                }
            }
        ]
    })
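On a live call, silence is worse than a clumsy answer, so you may want a guard around the AI step. The sketch below is one possible approach, not part of VoIPBin or OpenAI: `safe_reply` is a hypothetical wrapper that takes any text-to-text function, skips the model when the transcript is empty, and falls back to a canned sentence if the model call raises. It assumes an empty `speech_text` is possible, which the `body.get("speech_text", "")` default in the handler suggests but does not prove.

```python
from typing import Callable

FALLBACK_REPLY = "Sorry, I'm having trouble right now. Could you say that again?"
EMPTY_REPLY = "I didn't catch that. Could you repeat it?"


def safe_reply(generate: Callable[[str], str], caller_text: str) -> str:
    """Return a speakable reply even when the transcript is empty or the AI call fails."""
    if not caller_text.strip():
        # Nothing was transcribed; skip the model call and re-prompt instead.
        return EMPTY_REPLY
    try:
        # Treat a None result like an empty string so .strip() is always safe.
        reply = (generate(caller_text) or "").strip()
    except Exception:
        # Any model or network error becomes a spoken fallback, not an HTTP 500.
        return FALLBACK_REPLY
    return reply or FALLBACK_REPLY


if __name__ == "__main__":
    print(safe_reply(lambda text: f"Echo: {text}", "Where is my order?"))
    print(safe_reply(lambda text: f"Echo: {text}", ""))
```

In the handler, `generate` would be a small function that wraps the `client.chat.completions.create` call and returns the message content.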
The loop is self-sustaining:
- VoIPBin speaks the greeting
- Caller responds → VoIPBin transcribes → webhook fires
- Your AI generates a reply → returned as talk + input actions
- Repeat until the caller hangs up
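You can rehearse this loop locally before pointing a real number at it. The sketch below is a hypothetical harness, not a VoIPBin tool: `simulate_call` feeds scripted caller lines to any handler function and collects what would be spoken, and `scripted_handler` stands in for the FastAPI handler with a canned reply in place of the LLM call.

```python
from typing import Callable


def simulate_call(handler: Callable[[dict], dict], utterances: list[str]) -> list[str]:
    """Feed scripted caller lines to a handler and collect what would be spoken."""
    spoken = []
    for text in utterances:
        result = handler({"call_id": "sim-1", "speech_text": text})
        actions = result["actions"]
        spoken.extend(a["text"] for a in actions if a["type"] == "talk")
        # The loop continues only while the handler keeps returning an input action.
        if not any(a["type"] == "input" for a in actions):
            break
    return spoken


def scripted_handler(payload: dict) -> dict:
    """Stand-in for the real webhook: canned reply instead of an LLM call."""
    reply = f"Got it: {payload['speech_text']}"
    return {
        "actions": [
            {"type": "talk", "text": reply, "language": "en-US"},
            {"type": "input", "timeout": 5, "speech": True},
        ]
    }


if __name__ == "__main__":
    print(simulate_call(scripted_handler, ["Where is my order?", "Thanks"]))
```

The `break` mirrors the real behavior described above: once a response omits the `input` action, there is no next turn.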
Step 5: Handle Call End Gracefully
Sometimes you want to close the call intentionally — say, after resolving the issue or detecting a goodbye:
def build_response(reply: str, end_call: bool = False) -> dict:
    actions = [
        {"type": "talk", "text": reply, "language": "en-US"}
    ]
    if end_call:
        actions.append({"type": "hangup"})
    else:
        actions.append({
            "type": "input",
            "timeout": 5,
            "speech": True,
            "webhook": {
                "url": "https://your-server.com/call-webhook",
                "method": "POST"
            }
        })
    return {"actions": actions}
Detect "goodbye", "thanks, bye", or a low-confidence transcript and end cleanly.
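A simple way to make that decision is a keyword check plus an optional confidence threshold. This is a rough sketch: `should_end_call` is a hypothetical helper, the phrase list is illustrative, and the `confidence` argument assumes your transcript payload carries a score, which this post does not confirm.

```python
import re
from typing import Optional

# Whole-word match so "buy" or "standby" do not count as "bye".
GOODBYE = re.compile(r"\b(goodbye|bye|that's all)\b", re.IGNORECASE)


def should_end_call(
    caller_text: str,
    confidence: Optional[float] = None,
    min_confidence: float = 0.4,
) -> bool:
    """Decide whether to hang up after speaking the next reply."""
    # Hypothetical confidence score; pass None if your payload has no such field.
    if confidence is not None and confidence < min_confidence:
        return True
    return bool(GOODBYE.search(caller_text))


if __name__ == "__main__":
    print(should_end_call("Thanks, bye"))
    print(should_end_call("I want to buy a lamp"))
```

The result plugs straight into `build_response(reply, end_call=should_end_call(caller_text))`.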
What You Get
With roughly 100 lines of application code, you now have:
| Capability | How it's handled |
|---|---|
| Real phone number | VoIPBin provisioning API |
| Call answering | VoIPBin Flow |
| Speech-to-text | VoIPBin STT (automatic) |
| AI response logic | Your webhook + LLM |
| Text-to-speech | VoIPBin TTS (automatic) |
| Concurrent callers | VoIPBin scales it |
| 24/7 availability | Your server + VoIPBin infra |
You did not write a single line of audio processing code. No RTP sockets. No codec handling. No SIP state machines.
Add Conversation Memory (Optional)
For multi-turn awareness, store history in a dict keyed by call_id:
from collections import defaultdict

call_history = defaultdict(list)


@app.post("/call-webhook")
async def handle_call(request: Request):
    body = await request.json()
    caller_text = body.get("speech_text", "")
    call_id = body.get("call_id", "")

    call_history[call_id].append(
        {"role": "user", "content": caller_text}
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            *call_history[call_id]
        ]
    )
    reply = response.choices[0].message.content

    call_history[call_id].append(
        {"role": "assistant", "content": reply}
    )
    return JSONResponse(build_response(reply))
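Now your AI remembers everything said earlier in the call, with no extra infrastructure needed. Because the dict lives in a single process's memory, history is lost on restart and is not shared between multiple server workers.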
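The dict above is never pruned, so a long-running server would keep every call's transcript forever. Here is one possible way to cap each call's history and drop it when the call ends. `MAX_MESSAGES`, `remember`, and `forget` are hypothetical names, and the cap of 20 is an arbitrary choice.

```python
from collections import defaultdict

MAX_MESSAGES = 20  # arbitrary cap (an assumption); tune for your model's context window

call_history: dict[str, list[dict]] = defaultdict(list)


def remember(call_id: str, role: str, content: str) -> list[dict]:
    """Append a message and keep only the most recent MAX_MESSAGES for this call."""
    history = call_history[call_id]
    history.append({"role": role, "content": content})
    # Drop the oldest entries in place so the stored list object stays the same.
    del history[:-MAX_MESSAGES]
    return history


def forget(call_id: str) -> None:
    """Free the history once the call is over, e.g. right before returning a hangup."""
    call_history.pop(call_id, None)


if __name__ == "__main__":
    for i in range(25):
        remember("call-1", "user", f"message {i}")
    print(len(call_history["call-1"]))
    forget("call-1")
```

In the handler, `remember(call_id, "user", caller_text)` would replace the direct `append` calls, and `forget(call_id)` would run whenever you return a `hangup` action.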
Try It
- Signup: POST https://api.voipbin.net/v1.0/auth/signup
- Docs: voipbin.net
- Golang SDK: go get github.com/voipbin/voipbin-go
- MCP Server (for Claude Code / Cursor): uvx voipbin-mcp
If you've built something with AI + voice — or have questions about the webhook loop — drop a comment below. Always happy to talk through the architecture.