DEV Community

voipbin

Posted on

Make Your First AI Phone Call in 10 Minutes

Most tutorials about AI voice calling assume you already know telephony. They throw terms like SIP trunking, RTP streams, codec negotiation, and DTMF detection at you before you've even written a line of code.

This isn't that tutorial.

In 10 minutes, you'll have a real AI voice call running: a bot that calls a phone number, speaks with synthesized speech, listens to the reply, and hangs up cleanly. No telephony background required.


What You're Building

A simple flow:

  1. You trigger an outbound call via a REST API
  2. VoIPBin connects the call and hits your webhook
  3. Your webhook returns instructions: "speak this text, then wait for input"
  4. VoIPBin handles all the audio — STT, TTS, RTP — and sends transcriptions back to you
  5. You respond with the next action

Your code never touches audio. It just speaks HTTP.


Prerequisites

  • Any language that can run an HTTP server (examples below use Python)
  • A publicly reachable URL for your webhook (use ngrok if you're local)
  • 10 minutes

Step 1: Sign Up and Get Your API Key

curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-name",
    "email": "you@example.com",
    "password": "yourpassword"
  }'

Response:

{
  "accesskey": {
    "token": "YOUR_API_TOKEN_HERE"
  }
}

No email confirmation. No OTP. Just a token. Keep it.
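If you'd rather script this step, here is a minimal Python sketch of the same signup request. It uses only the standard library and assumes the endpoint and the `accesskey.token` response shape shown above. The helper names are our own, not part of any VoIPBin SDK.

```python
import json
import urllib.request

SIGNUP_URL = "https://api.voipbin.net/v1.0/auth/signup"


def extract_token(body: str) -> str:
    """Pull the API token out of a signup response body.

    Assumes the response shape shown above: {"accesskey": {"token": "..."}}.
    """
    data = json.loads(body)
    return data["accesskey"]["token"]


def sign_up(name: str, email: str, password: str) -> str:
    """POST the same JSON body as the curl example and return the API token."""
    payload = json.dumps(
        {"name": name, "email": email, "password": password}
    ).encode("utf-8")
    req = urllib.request.Request(
        SIGNUP_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=10) as resp:
        return extract_token(resp.read().decode("utf-8"))
```

Calling `sign_up("your-name", "you@example.com", "yourpassword")` returns the token string. Store it somewhere your later scripts can read it, such as an environment variable.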


Step 2: Write Your Webhook Handler

This is the core of your AI bot. VoIPBin calls this URL when the call connects, and whenever speech is transcribed.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/call-webhook", methods=["POST"])
def handle_call():
    data = request.get_json(silent=True) or {}  # tolerate empty or non-JSON bodies instead of raising
    print("Incoming event:", data)

    call_type = data.get("type")

    if call_type == "call.answered":
        # Call just connected — greet the user
        return jsonify({
            "actions": [
                {
                    "type": "talk",
                    "text": "Hello! This is your AI assistant. How can I help you today?",
                    "language": "en-US",
                    "gender": "female"
                },
                {
                    "type": "listen",
                    "timeout": 5000
                }
            ]
        })

    elif call_type == "call.speech":
        transcript = data.get("transcript", "")
        print(f"User said: {transcript}")

        # Simple echo response — replace with your AI logic
        return jsonify({
            "actions": [
                {
                    "type": "talk",
                    "text": f"You said: {transcript}. Thanks for calling. Goodbye!",
                    "language": "en-US",
                    "gender": "female"
                },
                {
                    "type": "hangup"
                }
            ]
        })

    # Default: hang up
    return jsonify({"actions": [{"type": "hangup"}]})

if __name__ == "__main__":
    app.run(port=5000)

Save the code as app.py, then run it:

pip install flask
python app.py

Expose it with ngrok:

ngrok http 5000
# Copy the https URL, e.g. https://abc123.ngrok.io
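Before spending a real call, you can exercise the handler by posting fake events to it yourself. The sketch below uses only the standard library. Its events contain just the fields `handle_call()` reads (`type` and `transcript`). Real VoIPBin payloads will carry more fields, so treat these as stand-ins rather than the actual event schema.

```python
import json
import urllib.request

# Local Flask server from Step 2. Swap in your ngrok URL to test the tunnel too.
WEBHOOK_URL = "http://localhost:5000/call-webhook"


def post_event(url: str, event: dict) -> dict:
    """POST a fake event to the webhook and return its parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def action_types(reply: dict) -> list:
    """List the action types in a webhook reply, e.g. ["talk", "listen"]."""
    return [action["type"] for action in reply.get("actions", [])]
```

With `python app.py` running, `action_types(post_event(WEBHOOK_URL, {"type": "call.answered"}))` should give `["talk", "listen"]`. A `{"type": "call.speech", "transcript": "hello"}` event should give `["talk", "hangup"]`, matching the two branches in `handle_call()`.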

Step 3: Make the Call

Now trigger an outbound call. Replace the values with your token, webhook URL, and a real destination number:

curl -s -X POST "https://api.voipbin.net/v1.0/calls" \
  -H "Authorization: Bearer YOUR_API_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "flow_id": null,
    "source": {
      "type": "sip",
      "target": "sip:bot@voipbin.net"
    },
    "destination": {
      "type": "tel",
      "target": "+15551234567"
    },
    "webhook_url": "https://abc123.ngrok.io/call-webhook"
  }'

VoIPBin dials the number. When the callee answers, your webhook gets called. The bot speaks its greeting, listens, echoes back what was said, and hangs up.

That's it.
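If you'd rather place calls from Python, or loop over several numbers as in the outbound-campaign idea later on, here is a standard-library sketch. It sends the same request body as the curl command above. It makes two assumptions. First, the token is read from a `VOIPBIN_TOKEN` environment variable, a name chosen for this sketch. Second, this tutorial doesn't show the response format for `/v1.0/calls`, so the helper returns raw response bodies instead of parsing them.

```python
import json
import os
import urllib.request
from typing import List, Optional

CALLS_URL = "https://api.voipbin.net/v1.0/calls"


def build_call_request(token: str, webhook_url: str, destination: str) -> urllib.request.Request:
    """Build the same POST /v1.0/calls request as the curl example."""
    body = {
        "flow_id": None,  # serialized as JSON null, as in the curl example
        "source": {"type": "sip", "target": "sip:bot@voipbin.net"},
        "destination": {"type": "tel", "target": destination},
        "webhook_url": webhook_url,
    }
    return urllib.request.Request(
        CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def place_calls(destinations: List[str], webhook_url: str, token: Optional[str] = None) -> List[str]:
    """Place one outbound call per number and return the raw response bodies."""
    # VOIPBIN_TOKEN is a hypothetical variable name used only in this sketch
    token = token or os.environ["VOIPBIN_TOKEN"]
    results = []
    for number in destinations:
        req = build_call_request(token, webhook_url, number)
        # urlopen raises urllib.error.HTTPError on a failed request, stopping the loop
        with urllib.request.urlopen(req, timeout=10) as resp:
            results.append(resp.read().decode("utf-8"))
    return results
```

For example, `place_calls(["+15551234567"], "https://abc123.ngrok.io/call-webhook")` dials one number. For a long list you would probably want per-call error handling and some pacing between requests. VoIPBin's rate limits aren't covered in this tutorial.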


What Just Happened (Under the Hood)

Here's why this was so simple:

You wrote zero audio code. VoIPBin handled:

  • Dialing out over the PSTN (the public telephone network)
  • Encoding and streaming RTP audio
  • Speech-to-text (STT) on the caller's voice
  • Text-to-speech (TTS) for your bot's responses
  • Session state and timing

Your server only processed JSON and returned JSON. This is VoIPBin's Media Offloading model — your AI logic runs as a stateless HTTP service, and the telephony infrastructure handles everything media-related.

This matters when you scale. Because each event arrives as a self-contained HTTP request, your webhook can be a serverless function, a container, or any other backend. No persistent connections. No WebSocket management. No codec knowledge needed.


Swap in Real AI Logic

The echo response is a placeholder. Replacing it with GPT-4o, Claude, or any other LLM is straightforward. Here is a version using OpenAI's Python SDK:

import openai

# pip install openai (v1+). The client reads your key from the OPENAI_API_KEY environment variable.
client = openai.OpenAI()

def ai_response(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful phone assistant. Keep responses short and clear."},
            {"role": "user", "content": transcript}
        ]
    )
    # message.content can be None in edge cases; fall back to an empty string
    return (response.choices[0].message.content or "").strip()

# In handle_call(), replace the call.speech branch with:
    elif call_type == "call.speech":
        transcript = data.get("transcript", "")
        reply = ai_response(transcript)

        # Speak the reply, then listen again instead of hanging up
        return jsonify({
            "actions": [
                {"type": "talk", "text": reply, "language": "en-US", "gender": "female"},
                {"type": "listen", "timeout": 5000}
            ]
        })

Now you have a working AI phone assistant that keeps talking and listening until the caller hangs up. You needed no telephony expertise, wrote no audio handling, and have no telephony infrastructure to manage.


What's Next

From here, you can extend this pattern in many directions:

  • Multi-turn conversations — maintain chat history between turns using a session store
  • Inbound calls — assign a phone number (or use a Direct Hash SIP URI for no-number setups) and handle incoming calls the same way
  • Structured data extraction — after each call, log the transcript and run a summarization pass
  • Outbound campaigns — loop over a list of numbers and trigger calls programmatically

The core loop stays the same: webhook in, JSON out.


Resources

  • VoIPBin API docs
  • MCP Server (for Claude Desktop / Cursor): uvx voipbin-mcp
  • Golang SDK: go get github.com/voipbin/voipbin-go
  • Sign up: POST https://api.voipbin.net/v1.0/auth/signup

If you build something with this, drop a comment — always curious what people connect AI phone calls to first.
