Most tutorials about AI voice calling assume you already know telephony. They throw terms like SIP trunking, RTP streams, codec negotiation, and DTMF detection at you before you've even written a line of code.
This isn't that tutorial.
In 10 minutes, you'll have a real AI voice call running — a bot that picks up, talks back with synthesized speech, and hangs up cleanly. No telephony background required.
## What You're Building
A simple flow:
- You trigger an outbound call via a REST API
- VoIPBin connects the call and hits your webhook
- Your webhook returns instructions: "speak this text, then wait for input"
- VoIPBin handles all the audio — STT, TTS, RTP — and sends transcriptions back to you
- You respond with the next action
Your code never touches audio. It just speaks HTTP.
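That whole contract can be sketched as plain data. This is a minimal sketch using the event type and action field names from the steps later in this post; it's just the shape of the conversation, not a VoIPBin SDK:

```python
# A sketch of the webhook contract: VoIPBin POSTs an event, your
# webhook returns an ordered list of actions. Field names here match
# the examples later in this tutorial.

def respond(event: dict) -> dict:
    if event.get("type") == "call.answered":
        # Call just connected: greet, then wait for speech.
        return {"actions": [
            {"type": "talk", "text": "Hello!", "language": "en-US", "gender": "female"},
            {"type": "listen", "timeout": 5000},
        ]}
    # Anything unexpected: end the call cleanly.
    return {"actions": [{"type": "hangup"}]}

print(respond({"type": "call.answered"}))
```

Everything after this point is just filling in that `respond` function.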
## Prerequisites
- Any language that can run an HTTP server (examples below use Python)
- A publicly reachable URL for your webhook (use ngrok if you're local)
- 10 minutes
## Step 1: Sign Up and Get Your API Key
```bash
curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-name",
    "email": "you@example.com",
    "password": "yourpassword"
  }'
```
Response:
```json
{
  "accesskey": {
    "token": "YOUR_API_TOKEN_HERE"
  }
}
```
No email confirmation. No OTP. Just a token. Keep it somewhere safe; every request below sends it as a Bearer token.
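One habit worth adopting right away: read the token from an environment variable instead of pasting it into code. A minimal sketch (the `VOIPBIN_API_TOKEN` variable name is my own convention, not something VoIPBin requires):

```python
import os

# export VOIPBIN_API_TOKEN=... in your shell; never commit the token.
API_TOKEN = os.environ.get("VOIPBIN_API_TOKEN", "YOUR_API_TOKEN_HERE")

# Reused by every authenticated request in this tutorial.
HEADERS = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
```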
## Step 2: Write Your Webhook Handler
This is the core of your AI bot. VoIPBin calls this URL when the call connects, and whenever speech is transcribed.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/call-webhook", methods=["POST"])
def handle_call():
    data = request.json
    print("Incoming event:", data)

    call_type = data.get("type")

    if call_type == "call.answered":
        # Call just connected — greet the user
        return jsonify({
            "actions": [
                {
                    "type": "talk",
                    "text": "Hello! This is your AI assistant. How can I help you today?",
                    "language": "en-US",
                    "gender": "female"
                },
                {
                    "type": "listen",
                    "timeout": 5000
                }
            ]
        })
    elif call_type == "call.speech":
        transcript = data.get("transcript", "")
        print(f"User said: {transcript}")
        # Simple echo response — replace with your AI logic
        return jsonify({
            "actions": [
                {
                    "type": "talk",
                    "text": f"You said: {transcript}. Thanks for calling. Goodbye!",
                    "language": "en-US",
                    "gender": "female"
                },
                {
                    "type": "hangup"
                }
            ]
        })

    # Default: hang up
    return jsonify({"actions": [{"type": "hangup"}]})

if __name__ == "__main__":
    app.run(port=5000)
```
Run it:
```bash
pip install flask
python app.py
```
Expose it with ngrok:
```bash
ngrok http 5000
# Copy the https URL, e.g. https://abc123.ngrok.io
```
## Step 3: Make the Call
Now trigger an outbound call. Replace the values with your token, webhook URL, and a real destination number:
```bash
curl -s -X POST "https://api.voipbin.net/v1.0/calls" \
  -H "Authorization: Bearer YOUR_API_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "flow_id": null,
    "source": {
      "type": "sip",
      "target": "sip:bot@voipbin.net"
    },
    "destination": {
      "type": "tel",
      "target": "+15551234567"
    },
    "webhook_url": "https://abc123.ngrok.io/call-webhook"
  }'
```
VoIPBin dials the number. When it answers, your webhook gets called. The bot speaks, listens, echoes back what was said, and hangs up.
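If you'd rather trigger the call from Python than curl, the same request can be built with the standard library. This mirrors the curl payload above; nothing is sent over the network until you call `urlopen`:

```python
import json
import os
import urllib.request

# Same payload as the curl example; swap in your own number and webhook URL.
payload = {
    "flow_id": None,
    "source": {"type": "sip", "target": "sip:bot@voipbin.net"},
    "destination": {"type": "tel", "target": "+15551234567"},
    "webhook_url": "https://abc123.ngrok.io/call-webhook",
}

def build_call_request(token: str) -> urllib.request.Request:
    # Build the POST without sending it; urlopen() below actually dials.
    return urllib.request.Request(
        "https://api.voipbin.net/v1.0/calls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_call_request(os.environ.get("VOIPBIN_API_TOKEN", "YOUR_API_TOKEN_HERE"))
# Uncomment to actually place the call:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```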
That's it.
## What Just Happened (Under the Hood)
Here's why this was so simple:
You wrote zero audio code. VoIPBin handled:
- Dialing via PSTN
- Encoding and streaming RTP audio
- Speech-to-text (STT) on the caller's voice
- Text-to-speech (TTS) for your bot's responses
- Session state and timing
Your server only processed JSON and returned JSON. This is VoIPBin's Media Offloading model — your AI logic runs as a stateless HTTP service, and the telephony infrastructure handles everything media-related.
This matters a lot when you scale. Your webhook can be a serverless function, a container, or any backend. No persistent connections. No WebSocket management. No codec knowledge needed.
## Swap in Real AI Logic
The echo response is a placeholder. Replacing it with GPT-4o, Claude, or any LLM is straightforward:
```python
import openai

client = openai.OpenAI()

def ai_response(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful phone assistant. Keep responses short and clear."},
            {"role": "user", "content": transcript}
        ]
    )
    return response.choices[0].message.content
```

Then, in your webhook, replace the echo branch:

```python
elif call_type == "call.speech":
    transcript = data.get("transcript", "")
    reply = ai_response(transcript)
    return jsonify({
        "actions": [
            {"type": "talk", "text": reply, "language": "en-US", "gender": "female"},
            {"type": "listen", "timeout": 5000}
        ]
    })
```
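One practical note: on a live call, an LLM error or an overly long answer shouldn't strand the caller in silence. Here's a small wrapper (my own sketch, not part of VoIPBin or OpenAI) that works with any `ai_response`-style callable, falling back to a canned reply and capping reply length for speech:

```python
def safe_reply(
    transcript: str,
    generate,
    fallback: str = "Sorry, I'm having trouble right now. Please try again later.",
    max_chars: int = 300,
) -> str:
    # generate is any callable like ai_response above. If it raises or
    # returns nothing, the caller still hears something instead of dead air.
    try:
        reply = (generate(transcript) or "").strip()
    except Exception:
        return fallback
    if not reply:
        return fallback
    # Spoken replies should stay short; truncate overly long answers.
    return reply[:max_chars]
```

In the webhook branch above, `reply = safe_reply(transcript, ai_response)` is a drop-in replacement.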
Now you have a real AI phone assistant — no telephony expertise required, no audio handling, no infrastructure to manage.
## What's Next
From here, you can extend this pattern in many directions:
- Multi-turn conversations — maintain chat history between turns using a session store
- Inbound calls — assign a phone number (or use a Direct Hash SIP URI for no-number setups) and handle incoming calls the same way
- Structured data extraction — after each call, log the transcript and run a summarization pass
- Outbound campaigns — loop over a list of numbers and trigger calls programmatically
The core loop stays the same: webhook in, JSON out.
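For the multi-turn case, the simplest session store is an in-memory dict keyed by a call identifier. This is a sketch assuming each webhook event carries some call ID you can key on (check the VoIPBin docs for the exact field name); swap the dict for Redis or a database once you run more than one webhook instance:

```python
from collections import defaultdict

# History of {"role", "content"} turns per call, LLM-message shaped.
SESSIONS: dict = defaultdict(list)

def remember(call_id: str, role: str, text: str) -> list:
    # Append one turn and return the full history to use as the LLM prompt.
    SESSIONS[call_id].append({"role": role, "content": text})
    return SESSIONS[call_id]

def end_call(call_id: str) -> None:
    # Drop state when the call ends so memory doesn't grow unbounded.
    SESSIONS.pop(call_id, None)
```

In the `call.speech` branch you'd call `remember(call_id, "user", transcript)`, pass the returned history to your LLM, then `remember(call_id, "assistant", reply)` before responding.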
## Resources
- VoIPBin API docs
- MCP Server (for Claude Desktop / Cursor): `uvx voipbin-mcp`
- Golang SDK: `go get github.com/voipbin/voipbin-go`
- Sign up: `POST https://api.voipbin.net/v1.0/auth/signup`
If you build something with this, drop a comment — always curious what people connect AI phone calls to first.