SMS-based OTP is everywhere. It is also increasingly broken.
SIM-swapping attacks let bad actors hijack someone's phone number and intercept every text message. SS7 vulnerabilities have allowed nation-state-level interception of SMS OTPs for years. And delivery rates vary wildly depending on carrier relationships, country regulations, and anti-spam filters.
Voice OTP is a credible alternative — and for certain user segments (elderly users, regions with poor SMS delivery, high-security flows) it's already preferred. Let's build one.
How Voice OTP Works
Instead of sending a text, your backend places an outbound call to the user's number. An AI voice reads the code aloud. The user hears it and types it into your app.
That's it. No SMS gateway contract. No SMS spam filter between you and the user. Just a phone call.
The flow looks like this:
User requests OTP
│
▼
Your Backend generates code → stores in cache → calls VoIPBin API
│
▼
VoIPBin places outbound call → reads OTP via TTS
│
▼
User hears code → enters it in your app
│
▼
Your Backend validates → grants access
No special telephony knowledge required. No SIP trunk to configure. Just an API call.
Getting Started with VoIPBin
First, grab an API key. It's a single POST request — no email confirmation, no OTP to get through (the irony).
curl -s -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username": "yourname", "password": "yourpass"}'
Response:
{
  "accesskey": {
    "token": "your-api-token-here"
  }
}
Save that token. You'll use it in every subsequent request.
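Since the token goes into every request, it is worth keeping it out of source control. A minimal sketch, assuming you export it as an environment variable named VOIPBIN_TOKEN (the variable name and the load_token helper are choices made for this example, not something VoIPBin requires):

```python
import os


def load_token(env_var: str = "VOIPBIN_TOKEN") -> str:
    """Return the API token from the environment, failing fast if it is missing.

    The variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before starting the server")
    return token
```

The backend could then call load_token() once at startup instead of hardcoding the string.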
The Backend: Generate and Deliver OTP via Voice
Here's a minimal Python backend using FastAPI that:
- Generates a 6-digit OTP
- Stores it in memory (use Redis in production)
- Places an outbound call via VoIPBin that reads the code aloud
import hmac
import secrets

import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

VOIPBIN_TOKEN = "your-api-token-here"
VOIPBIN_BASE = "https://api.voipbin.net/v1.0"
CALLER_NUMBER = "+12025551234"  # your VoIPBin number

# In-memory OTP store; use Redis in production
otp_store: dict[str, str] = {}


class OTPRequest(BaseModel):
    phone: str  # E.164 format, e.g. "+14155551234"
    user_id: str


class OTPVerify(BaseModel):
    user_id: str
    code: str


@app.post("/otp/request")
async def request_otp(req: OTPRequest):
    # secrets is a cryptographically secure source; random is not.
    # Zero-padding keeps codes such as "004217" at six digits.
    code = f"{secrets.randbelow(10**6):06d}"
    otp_store[req.user_id] = code

    # Speak the digits with pauses for clarity
    spoken = ". ".join(list(code))
    message = (
        f"Your verification code is: {spoken}. "
        f"I repeat: {spoken}. "
        "Do not share this code with anyone."
    )

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{VOIPBIN_BASE}/calls",
            headers={
                "Authorization": f"Bearer {VOIPBIN_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "source": {"type": "tel", "target": CALLER_NUMBER},
                "destination": {"type": "tel", "target": req.phone},
                "actions": [
                    {
                        "type": "talk",
                        "text": message,
                        "language": "en-US",
                        "gender": "female",
                    }
                ],
            },
        )

    if resp.status_code not in (200, 201):
        raise HTTPException(status_code=502, detail="Failed to place call")

    return {"status": "call_initiated", "user_id": req.user_id}


@app.post("/otp/verify")
async def verify_otp(req: OTPVerify):
    stored = otp_store.get(req.user_id)
    # Constant-time comparison avoids leaking how many digits matched
    if not stored or not hmac.compare_digest(stored.encode(), req.code.encode()):
        raise HTTPException(status_code=401, detail="Invalid or expired OTP")

    del otp_store[req.user_id]  # codes are single-use
    return {"status": "verified", "user_id": req.user_id}
Install dependencies and start the server (this assumes the code is saved as main.py):
pip install fastapi uvicorn httpx
uvicorn main:app --reload
Test the flow:
# Step 1: request OTP
curl -X POST http://localhost:8000/otp/request \
  -H "Content-Type: application/json" \
  -d '{ "phone": "+14155551234", "user_id": "user_42" }'

# Step 2: verify (after user enters code)
curl -X POST http://localhost:8000/otp/verify \
  -H "Content-Type: application/json" \
  -d '{ "user_id": "user_42", "code": "391847" }'
The call is placed immediately. The user's phone rings, they hear the code read aloud twice, and your /otp/verify endpoint handles the rest.
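One gap worth noting: the verify endpoint reports "Invalid or expired OTP", but a plain dictionary never expires anything. Below is a rough sketch of an in-memory store with a time-to-live, as a stand-in until you move to Redis. The ExpiringOTPStore class, its method names, and the 5-minute default are assumptions made for illustration, not part of VoIPBin or FastAPI.

```python
import hmac
import secrets
import time
from typing import Callable


class ExpiringOTPStore:
    """In-memory OTP store with a time-to-live (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Generate a 6-digit code, store it with its expiry time, and return it."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._entries[user_id] = (code, self._clock() + self._ttl)
        return code

    def verify(self, user_id: str, submitted: str) -> bool:
        """Return True once for a matching, unexpired code; the code is then consumed."""
        entry = self._entries.get(user_id)
        if entry is None:
            return False
        code, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]  # expired: drop it
            return False
        if not hmac.compare_digest(code.encode(), submitted.encode()):
            return False
        del self._entries[user_id]  # single use
        return True
```

In the endpoints, issue() would replace the direct write to otp_store and verify() would replace the lookup and comparison. With Redis, a key expiry gives you the same behavior without the manual bookkeeping.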
Making the Voice Clearer
OTP readability matters. A voice that runs "three nine one eight four seven" together is hard to follow. A few techniques help:
Pace the digits. The ". ".join(list(code)) trick in the example above inserts a short pause between each digit. VoIPBin's TTS engine respects punctuation pauses.
Repeat the code. Always say it twice. Users are often distracted when they pick up an unexpected call.
Keep it short. The full message should be under 15 seconds. Don't add marketing copy — this is a security interaction.
Use a consistent caller ID. Register a number with VoIPBin and use it exclusively for OTP calls. This lets users recognize the call and builds trust over time.
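The pacing and repetition tips can be folded into one small helper. This is a sketch only: build_otp_message is a name invented for this example, and it produces the same wording as the message built in the backend above.

```python
def build_otp_message(code: str, repeats: int = 2) -> str:
    """Build the spoken text: digits separated by '. ', code said `repeats` times."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("OTP must contain ASCII digits only")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    spoken = ". ".join(code)  # "391847" -> "3. 9. 1. 8. 4. 7"
    parts = [f"Your verification code is: {spoken}."]
    parts += [f"I repeat: {spoken}."] * (repeats - 1)
    parts.append("Do not share this code with anyone.")
    return " ".join(parts)
```

Keeping the wording in one function also makes it easier to check the message length against the 15-second guideline when you change the text.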
Handling Call Failures
What if the call doesn't connect? Have the request endpoint return the call ID so the frontend can poll its status and offer a fallback:
@app.post("/otp/request")
async def request_otp(req: OTPRequest):
    code = f"{secrets.randbelow(10**6):06d}"  # needs "import secrets"
    # Store a failed-attempt counter next to the code.
    # /otp/verify must then compare against entry["code"].
    otp_store[req.user_id] = {"code": code, "attempts": 0}

    # ... place call ...

    # Return call ID so frontend can poll status
    call_id = resp.json().get("id")
    return {
        "status": "call_initiated",
        "call_id": call_id,
        "fallback_available": True,  # offer SMS fallback in UI
    }
On the frontend, show a "Didn't receive the call?" button after 20 seconds. That button can either retry the voice call or fall back to SMS — your choice.
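The snippet above stores an attempts counter but never reads it. One way to use it, sketched under the assumption that three wrong guesses should invalidate the code (the limit, the check_code helper, and its return values are all illustrative):

```python
import hmac

MAX_ATTEMPTS = 3  # assumed limit for this example


def check_code(store: dict[str, dict], user_id: str, submitted: str) -> str:
    """Return "verified", "invalid", or "locked", updating the store as a side effect."""
    entry = store.get(user_id)
    if entry is None:
        return "invalid"

    if hmac.compare_digest(entry["code"].encode(), submitted.encode()):
        del store[user_id]  # consume the code on success
        return "verified"

    entry["attempts"] += 1
    if entry["attempts"] >= MAX_ATTEMPTS:
        del store[user_id]  # too many wrong guesses: force a fresh code
        return "locked"
    return "invalid"
```

A "locked" result is a natural point for the frontend to offer a new call or the SMS fallback.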
Multilingual Support
VoIPBin's TTS supports multiple languages. If you know the user's locale, map it to the matching language code and pass that in the talk action:
language_map = {
    "ko": "ko-KR",
    "ja": "ja-JP",
    "es": "es-ES",
    "fr": "fr-FR",
    "de": "de-DE",
    "en": "en-US",
}

lang = language_map.get(user_locale, "en-US")

actions = [
    {
        "type": "talk",
        "text": message,
        "language": lang,
        "gender": "female",
    }
]
For a global user base, this is the kind of UX detail that actually matters. A Japanese user hearing their OTP in Japanese is a meaningfully better experience.
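In practice the locale often arrives as a full tag such as "ja_JP" or "es-MX" rather than a bare two-letter code, and a direct dictionary lookup would then fall back to English. A small sketch that normalizes the tag first; resolve_tts_language is a hypothetical helper, and the map simply mirrors the one above:

```python
from typing import Optional

LANGUAGE_MAP = {
    "ko": "ko-KR",
    "ja": "ja-JP",
    "es": "es-ES",
    "fr": "fr-FR",
    "de": "de-DE",
    "en": "en-US",
}


def resolve_tts_language(user_locale: Optional[str], default: str = "en-US") -> str:
    """Map a locale such as "ja", "ja-JP", or "ja_JP" to a TTS language code."""
    if not user_locale:
        return default
    # Keep only the primary language subtag: "es_MX" -> "es"
    primary = user_locale.replace("_", "-").split("-")[0].lower()
    return LANGUAGE_MAP.get(primary, default)
```

Note that this collapses regional variants, so "es-MX" resolves to "es-ES". Extend the map if regional voices matter for your users.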
Why Not Just Use an SMS Gateway?
Fair question. SMS OTP is well-understood and cheap for most use cases. Voice OTP makes more sense when:
- Your users are at elevated risk of SIM-swapping (crypto, fintech, high-value accounts)
- SMS delivery is unreliable in your target market
- Your users are less mobile-native and find a phone call more natural
- You're building a voice-first product anyway and want consistent UX
- Regulatory requirements in your industry recommend or require voice-based verification
You can also offer both and let the user choose. Many security-conscious apps show: "Send code via SMS" / "Call me instead."
What You've Built
With about 60 lines of Python and a single VoIPBin API call, you have:
- Outbound voice calls that speak a dynamically generated OTP
- OTP validation with a clean POST endpoint
- Language-aware TTS for international users
- A foundation to add DTMF collection, retry logic, and SMS fallback
No SIP configuration. No audio encoding. No carrier contract. VoIPBin handles the call placement, number management, and TTS — you just send JSON.
Want to go further? Try collecting the OTP during the call via DTMF — the user presses their keypad instead of typing into your app, and VoIPBin returns the digits via webhook. It's a tighter UX for certain flows.
Check the VoIPBin docs, or explore the MCP integration: run uvx voipbin-mcp to test calls directly from Claude Code or Cursor.