DEV Community

voipbin

Build a 24/7 AI Inbound Call Handler Without Hiring a Telephony Engineer

Your app is live. Users love it. Then someone files a support ticket:

"I called your number and got a dead line."

You check your notes — you never actually set up a phone number. You assumed voice support would come later. But "later" has arrived, and now you need:

  • A real phone number users can call
  • Something that answers 24/7 (not just during business hours)
  • Intelligent responses, not a static IVR menu from 2003
  • Something you can build in a weekend, not a quarter

This post walks through building an AI-powered inbound call handler using VoIPBin — a CPaaS built specifically for AI agents. You write the conversation logic. VoIPBin handles the telephony.


How It Works

The architecture is straightforward:

Incoming Call
     ↓
 VoIPBin (answers, handles audio)
     ↓
 Webhook → Your Server
     ↓
 Your AI (processes text, decides response)
     ↓
 VoIPBin (speaks the response via TTS)

Your server never touches audio. It receives a text transcript, returns a text reply. VoIPBin handles the rest — RTP, STT, TTS, codec negotiation, DTMF, silence detection. All of it.


Step 1: Get Your API Key

No OTP, no credit card form. One POST:

curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "yourname",
    "password": "yourpassword",
    "email": "you@example.com",
    "firstname": "Jane",
    "lastname": "Dev"
  }'

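The response includes accesskey.token. That is your API key for everything that follows. Export it as TOKEN in your shell, since the commands below pass it as $TOKEN in the Authorization header.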
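
If you would rather script the signup than copy the token by hand, the sketch below shows one way to do it in Python with only the standard library. It assumes the response body nests the token as {"accesskey": {"token": "..."}}, which is inferred from the accesskey.token path above rather than from the API reference; the helper names extract_token and sign_up are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the curl call above.
SIGNUP_URL = "https://api.voipbin.net/v1.0/auth/signup"


def extract_token(signup_response: dict) -> str:
    """Pull accesskey.token out of a parsed signup response.

    Assumption: the token is nested as {"accesskey": {"token": "..."}}.
    """
    accesskey = signup_response.get("accesskey")
    token = accesskey.get("token") if isinstance(accesskey, dict) else None
    if not isinstance(token, str) or not token:
        raise ValueError("signup response did not contain accesskey.token")
    return token


def sign_up(payload: dict) -> str:
    """POST the signup payload and return the token (makes a network call)."""
    request = urllib.request.Request(
        SIGNUP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_token(json.load(response))
```

Failing loudly when the token is missing is deliberate: an empty $TOKEN would otherwise surface later as a confusing authorization error on an unrelated request.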


Step 2: Provision a Phone Number

Search for an available number and purchase it:

# Search available numbers (US)
curl -s "https://api.voipbin.net/v1.0/numbers/available?country_code=US&limit=5" \
  -H "Authorization: Bearer $TOKEN"

# Purchase the number
curl -s -X POST "https://api.voipbin.net/v1.0/numbers" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "number": "+12025550142",
    "call_flow_id": "YOUR_FLOW_ID"
  }'

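The call_flow_id links this number to a VoIPBin Flow, the routing logic that runs when someone calls in. Replace YOUR_FLOW_ID with the ID of the flow you create in Step 3, which in practice means creating the flow before you purchase the number.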
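
A small guard can catch the two easiest mistakes in the purchase request before it is sent: a number that is not in E.164 form, and a flow ID still set to the placeholder. This is a sketch that assumes the API expects E.164 numbers, as the +12025550142 example suggests; build_purchase_body is a hypothetical helper, not part of any VoIPBin SDK.

```python
import re

# E.164: a leading "+", then at most 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string looks like an E.164 phone number."""
    return bool(E164_PATTERN.fullmatch(number))


def build_purchase_body(number: str, call_flow_id: str) -> dict:
    """Build the JSON body for the purchase request shown above."""
    if not is_e164(number):
        raise ValueError(f"not an E.164 number: {number!r}")
    if not call_flow_id or call_flow_id == "YOUR_FLOW_ID":
        raise ValueError("call_flow_id must be the ID of an existing flow")
    return {"number": number, "call_flow_id": call_flow_id}
```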


Step 3: Create an AI Call Flow

A VoIPBin Flow defines what happens when the call connects. For an AI handler, you need a talk action (to greet the caller) followed by an input action (to capture the caller's speech and send the transcript to your webhook, which loops your AI into the conversation):

curl -s -X POST "https://api.voipbin.net/v1.0/flows" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AI Inbound Handler",
    "actions": [
      {
        "type": "talk",
        "text": "Thanks for calling. How can I help you today?",
        "language": "en-US"
      },
      {
        "type": "input",
        "timeout": 5,
        "speech": true,
        "webhook": {
          "url": "https://your-server.com/call-webhook",
          "method": "POST"
        }
      }
    ]
  }'

When the caller speaks, VoIPBin transcribes their words and POSTs the transcript to your webhook URL.
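
The same flow body can be generated in Python, which keeps the webhook URL in one place instead of repeating it in every request. The sketch below mirrors the JSON from the curl call; the helper names input_action and build_flow are hypothetical, and the field names are copied from the example above rather than from the API reference.

```python
import json


def input_action(webhook_url: str, timeout: int = 5) -> dict:
    """An input action that listens for speech and posts the transcript."""
    return {
        "type": "input",
        "timeout": timeout,
        "speech": True,
        "webhook": {"url": webhook_url, "method": "POST"},
    }


def build_flow(name: str, greeting: str, webhook_url: str,
               language: str = "en-US") -> dict:
    """Greet the caller, then hand the conversation to the webhook."""
    return {
        "name": name,
        "actions": [
            {"type": "talk", "text": greeting, "language": language},
            input_action(webhook_url),
        ],
    }


flow = build_flow(
    "AI Inbound Handler",
    "Thanks for calling. How can I help you today?",
    "https://your-server.com/call-webhook",
)
flow_json = json.dumps(flow)  # request body for POST /v1.0/flows
```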


Step 4: Build the AI Webhook Handler

Here is a minimal Python + FastAPI handler that uses OpenAI to generate responses:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from env

SYSTEM_PROMPT = """
You are a helpful customer support assistant for Acme Corp.
Answer questions about orders, returns, and business hours.
Keep responses concise — under 3 sentences — since this is a phone call.
"""

@app.post("/call-webhook")
async def handle_call(request: Request):
    body = await request.json()

    # VoIPBin sends the caller's transcribed speech
    caller_text = body.get("speech_text", "")
    call_id = body.get("call_id", "")

    print(f"[{call_id}] Caller said: {caller_text}")

    # Ask your AI
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": caller_text}
        ]
    )
    reply = response.choices[0].message.content

    print(f"[{call_id}] AI reply: {reply}")

    # Return the next actions for VoIPBin to execute
    return JSONResponse({
        "actions": [
            {
                "type": "talk",
                "text": reply,
                "language": "en-US"
            },
            {
                "type": "input",
                "timeout": 5,
                "speech": True,
                "webhook": {
                    "url": "https://your-server.com/call-webhook",
                    "method": "POST"
                }
            }
        ]
    })

The loop is self-sustaining:

  1. VoIPBin speaks the greeting
  2. Caller responds → VoIPBin transcribes → webhook fires
  3. Your AI generates a reply → returned as talk + input actions
  4. Repeat until the caller hangs up
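
You can exercise this loop without placing a call by separating the turn logic from FastAPI and the LLM. The sketch below is one way to do that: handle_turn, simulate_call, and echo_reply are hypothetical names, the payload fields speech_text and call_id are the ones used in the handler above, and the reply generator is injected so a stub can stand in for OpenAI.

```python
from typing import Callable, List

WEBHOOK_URL = "https://your-server.com/call-webhook"  # placeholder from the post


def handle_turn(payload: dict, generate_reply: Callable[[str], str]) -> dict:
    """Turn one webhook payload into the next talk + input actions."""
    caller_text = payload.get("speech_text", "")
    reply = generate_reply(caller_text)
    return {
        "actions": [
            {"type": "talk", "text": reply, "language": "en-US"},
            {
                "type": "input",
                "timeout": 5,
                "speech": True,
                "webhook": {"url": WEBHOOK_URL, "method": "POST"},
            },
        ]
    }


def simulate_call(utterances: List[str],
                  generate_reply: Callable[[str], str]) -> List[str]:
    """Feed scripted caller lines through handle_turn and collect the replies."""
    spoken = []
    for text in utterances:
        payload = {"call_id": "test-call", "speech_text": text}
        result = handle_turn(payload, generate_reply)
        spoken.append(result["actions"][0]["text"])
    return spoken


def echo_reply(text: str) -> str:
    """Stub that stands in for the LLM during local testing."""
    return f"You said: {text}"
```

Calling simulate_call(["Where is my order?", "Thanks"], echo_reply) walks through two turns and returns the text VoIPBin would have been asked to speak on each one.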

Step 5: Handle Call End Gracefully

Sometimes you want to close the call intentionally — say, after resolving the issue or detecting a goodbye:

def build_response(reply: str, end_call: bool = False) -> dict:
    actions = [
        {"type": "talk", "text": reply, "language": "en-US"}
    ]

    if end_call:
        actions.append({"type": "hangup"})
    else:
        actions.append({
            "type": "input",
            "timeout": 5,
            "speech": True,
            "webhook": {
                "url": "https://your-server.com/call-webhook",
                "method": "POST"
            }
        })

    return {"actions": actions}

Detect "goodbye", "thanks, bye", or a low-confidence transcript and end cleanly.
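
One rough way to make that decision is a keyword check plus a confidence threshold, sketched below. The phrase list and the 0.4 threshold are illustrative guesses, and the confidence argument is an assumption: the post does not show the webhook payload carrying a confidence score, so pass None if yours does not.

```python
import re
from typing import Optional

# Whole-word matches only, so "buy" or "standby" do not count as "bye".
GOODBYE_PATTERN = re.compile(
    r"\b(goodbye|good bye|bye|that'?s all|hang up)\b", re.IGNORECASE
)
MIN_CONFIDENCE = 0.4  # hypothetical threshold, tune against real calls


def should_end_call(caller_text: str, confidence: Optional[float] = None) -> bool:
    """Return True when the call should be closed after the next reply."""
    if confidence is not None and confidence < MIN_CONFIDENCE:
        return True
    return bool(GOODBYE_PATTERN.search(caller_text))
```

The result can be passed straight into build_response as end_call. Keyword matching will misfire on sentences like "don't say goodbye yet", so asking the LLM to flag the end of the conversation may be more robust once the basics work.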


What You Get

With roughly 100 lines of application code, you now have:

Capability          | How it's handled
--------------------|----------------------------
Real phone number   | VoIPBin provisioning API
Call answering      | VoIPBin Flow
Speech-to-text      | VoIPBin STT (automatic)
AI response logic   | Your webhook + LLM
Text-to-speech      | VoIPBin TTS (automatic)
Concurrent callers  | VoIPBin scales it
24/7 availability   | Your server + VoIPBin infra

You did not write a single line of audio processing code. No RTP sockets. No codec handling. No SIP state machines.


Add Conversation Memory (Optional)

For multi-turn awareness, store history in a dict keyed by call_id:

from collections import defaultdict

call_history = defaultdict(list)

@app.post("/call-webhook")
async def handle_call(request: Request):
    body = await request.json()
    caller_text = body.get("speech_text", "")
    call_id = body.get("call_id", "")

    call_history[call_id].append(
        {"role": "user", "content": caller_text}
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            *call_history[call_id]
        ]
    )
    reply = response.choices[0].message.content

    call_history[call_id].append(
        {"role": "assistant", "content": reply}
    )

    return JSONResponse(build_response(reply))

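Now your AI remembers everything said in the call, with no extra infrastructure. Keep in mind that the dict lives in a single process's memory: history is lost on restart, is not shared across multiple workers, and keeps growing until you delete the entries for finished calls.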
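
A minimal sketch of keeping that in-memory history bounded follows. The cap of 20 messages and the helper names remember and forget are hypothetical; when to call forget is also an assumption, since the post does not describe a call-ended event, so the simplest trigger is the turn on which you return a hangup action.

```python
from collections import defaultdict
from typing import Dict, List

MAX_MESSAGES = 20  # hypothetical cap on messages kept per call

call_history: Dict[str, List[dict]] = defaultdict(list)


def remember(call_id: str, role: str, content: str) -> List[dict]:
    """Append a message and drop the oldest ones beyond the cap."""
    history = call_history[call_id]
    history.append({"role": role, "content": content})
    # The slice is empty until the list grows past MAX_MESSAGES.
    del history[:-MAX_MESSAGES]
    return history


def forget(call_id: str) -> None:
    """Release the history for a finished call."""
    call_history.pop(call_id, None)
```

Trimming also keeps the prompt sent to the LLM from growing without limit on long calls, at the cost of the model forgetting the earliest turns.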


Try It

  • Signup: POST https://api.voipbin.net/v1.0/auth/signup
  • Docs: voipbin.net
  • Golang SDK: go get github.com/voipbin/voipbin-go
  • MCP Server (for Claude Code / Cursor): uvx voipbin-mcp

If you've built something with AI + voice — or have questions about the webhook loop — drop a comment below. Always happy to talk through the architecture.
