DEV Community

voipbin
Your AI Agent Can Make Real Phone Calls — Without Touching RTP or SIP

If you're building AI agents that need to communicate with humans, you've probably hit the same wall: voice is hard.

Not the AI part. The telephony part.

RTP, SIP, DTMF, codecs, NAT traversal: this stack is decades old (RTP and SIP were standardized in the 1990s) and was not designed for agents. Many developers end up either avoiding voice entirely or spending weeks fighting infrastructure before writing a single line of agent logic.

There's a better path.

The Core Problem: Agents Shouldn't Handle Audio

A typical DIY voice bot pipeline:

  1. Receive raw RTP audio from the caller
  2. Run STT to get a transcript
  3. Send the transcript to your LLM
  4. Run TTS on the response
  5. Stream audio back over RTP

Every step adds latency, and the audio-facing steps bring codec and infrastructure concerns. None of it is your actual product.
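To make the shape of that pipeline concrete, here is a minimal sketch of one conversational turn. The function name and the three stage callables are hypothetical placeholders, not any real library's API; in a real DIY build each one would wrap an RTP stack, an STT engine, an LLM client, or a TTS engine.

```python
from typing import Callable


def handle_turn(
    rtp_audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One turn of the DIY pipeline; all stage callables are hypothetical."""
    transcript = stt(rtp_audio)    # step 2: audio received in step 1 becomes text
    reply_text = llm(transcript)   # step 3: the only part that is your product
    return tts(reply_text)         # step 4: audio to stream back over RTP (step 5)
```

Only the `llm` argument is agent logic. Everything else in the signature is plumbing you would have to build and operate yourself.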

Media Offloading: Let VoIPBin Handle Audio

VoIPBin's approach is Media Offloading: the platform handles RTP, STT, and TTS, and your AI agent only ever sees text.

Caller → VoIPBin (RTP/STT) → Your Agent (text only) → VoIPBin (TTS/RTP) → Caller
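With audio offloaded, the agent side shrinks to a text-in, text-out function. The sketch below is a hypothetical stand-in for your agent: the keyword rules are placeholders for an LLM call, and how transcripts are delivered to your code is not covered in this post, so that part is left out.

```python
def agent_reply(transcript: str) -> str:
    """Hypothetical agent logic: takes a transcript, returns text to speak."""
    text = transcript.strip().lower()
    if not text:
        # Nothing usable came back from transcription.
        return "Sorry, I didn't catch that. Could you repeat it?"
    if "refund" in text:
        return "I can help with a refund. What is your order number?"
    # Placeholder for a real LLM call.
    return "Thanks. Let me look into that for you."
```

Note that nothing here imports an audio library or knows what a codec is.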

Getting Started

1. Sign Up

curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username": "myagent", "password": "mypassword"}'

You get an accesskey.token immediately — no email verification needed.
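If you would rather sign up from Python than curl, the same request can be sketched with the standard library. The endpoint and body come from the curl command above; the response shape in `extract_token` is an assumption inferred from the name "accesskey.token", so check it against a real response.

```python
import json
import urllib.request

SIGNUP_URL = "https://api.voipbin.net/v1.0/auth/signup"  # from the curl example


def build_signup_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the signup POST request."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        SIGNUP_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    # ASSUMPTION: the token is nested as {"accesskey": {"token": "..."}}.
    return json.loads(response_body)["accesskey"]["token"]


# To actually send it:
# with urllib.request.urlopen(build_signup_request("myagent", "mypassword")) as resp:
#     token = extract_token(resp.read().decode("utf-8"))
```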

2. Create a Call Flow

curl -X POST "https://api.voipbin.net/v1.0/flows?accesskey=YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Bot",
    "actions": [
      {"type": "talk", "text": "Hello! How can I help you today?"},
      {"type": "transcribe", "end_silence_timeout": 2}
    ]
  }'
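The flow body is plain JSON, so it is easy to generate from code. This sketch mirrors the curl example; the helper names `build_flow` and `build_flow_request` are made up for illustration, and the only fields used are the ones shown above.

```python
import json
import urllib.parse
import urllib.request

FLOWS_URL = "https://api.voipbin.net/v1.0/flows"  # endpoint from the curl example


def build_flow(name: str, greeting: str, end_silence_timeout: int = 2) -> dict:
    """Build the flow body: speak a greeting, then transcribe the caller."""
    return {
        "name": name,
        "actions": [
            {"type": "talk", "text": greeting},
            {"type": "transcribe", "end_silence_timeout": end_silence_timeout},
        ],
    }


def build_flow_request(access_key: str, flow: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that creates the flow."""
    url = FLOWS_URL + "?" + urllib.parse.urlencode({"accesskey": access_key})
    return urllib.request.Request(
        url,
        data=json.dumps(flow).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to create the flow for real.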

3. Make a Call

curl -X POST "https://api.voipbin.net/v1.0/calls?accesskey=YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "flow_id": "<flow-id>",
    "destination": "+15551234567"
  }'

VoIPBin dials out, handles the audio, and your agent logic runs on transcripts.
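Here is a rough Python equivalent of the call request, with one guard added. The E.164 check is my own assumption based on the `+15551234567` example; the post does not say which number formats the API accepts, so loosen or drop it as needed.

```python
import json
import re
import urllib.parse
import urllib.request

CALLS_URL = "https://api.voipbin.net/v1.0/calls"  # endpoint from the curl example
E164 = re.compile(r"\+[1-9]\d{1,14}")  # "+" followed by at most 15 digits


def build_call_request(
    access_key: str, flow_id: str, destination: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that starts an outbound call."""
    # ASSUMPTION: destinations are E.164 numbers, as in the example above.
    if not E164.fullmatch(destination):
        raise ValueError(f"not an E.164 number: {destination!r}")
    url = CALLS_URL + "?" + urllib.parse.urlencode({"accesskey": access_key})
    body = json.dumps({"flow_id": flow_id, "destination": destination})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Failing fast on a malformed number is cheaper than discovering it after the request has gone out.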

No Phone Number? No Problem

VoIPBin supports Direct Hash SIP URIs — no number provisioning needed:

sip:direct.<12-hex-chars>@sip.voipbin.net

Great for internal tools, dev testing, or agent-to-agent communication.
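If your tooling passes these URIs around, a small helper can catch typos early. This is a sketch based only on the pattern shown above; accepting both upper- and lowercase hex is an assumption, and both function names are invented for the example.

```python
import re

HASH_RE = re.compile(r"[0-9a-fA-F]{12}")  # ASSUMPTION: either hex case is valid
DIRECT_URI_RE = re.compile(r"sip:direct\.[0-9a-fA-F]{12}@sip\.voipbin\.net")


def direct_sip_uri(hash_hex: str) -> str:
    """Format a Direct Hash SIP URI from a 12-character hex string."""
    if not HASH_RE.fullmatch(hash_hex):
        raise ValueError(f"expected 12 hex characters, got {hash_hex!r}")
    return f"sip:direct.{hash_hex}@sip.voipbin.net"


def is_direct_sip_uri(uri: str) -> bool:
    """Check whether a string matches the Direct Hash URI pattern."""
    return DIRECT_URI_RE.fullmatch(uri) is not None
```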

Use It From Claude Code (MCP)

VoIPBin ships an MCP server. Add it to your Claude Code MCP settings:

{
  "mcpServers": {
    "voipbin": {
      "command": "uvx",
      "args": ["voipbin-mcp"],
      "env": { "VOIPBIN_API_KEY": "your-access-key" }
    }
  }
}

Then just tell Claude Code: "make a test call to this number" — no curl needed.

What You Skip

  • No RTP stack to manage
  • No codec negotiation
  • No STT/TTS infrastructure to deploy
  • No SIP registration

You keep: your agent logic and LLM calls.
