Your AI voice bot picks up the phone. The caller says:
"Hey, I was calling back about my order from last week."
And your bot responds:
"Hello! How can I help you today?"
The caller sighs. Explains everything again. Gets frustrated. Hangs up.
This is the dirty secret of most AI voice bots: every call starts from zero. No memory. No context. No recognition of returning callers. It does not matter how smart your LLM is — if the bot has no memory of past interactions, it will always feel robotic.
The solution is not complicated. This post shows you exactly how to build persistent caller context that survives across sessions.
Why Voice Bots Forget
Most voice bot architectures look like this:
Inbound call → Speech-to-Text → LLM prompt → Text-to-Speech → Response
Each call initializes a fresh LLM context. The phone number that called you? Ignored. The fact that this same number called three times this week? Unknown. The issue they reported on Monday? Gone.
The fix requires two things:
- A caller identity anchor — the phone number from the inbound call
- A persistent store — a database or cache keyed to that number
VoIPBin gives you the caller ID on every webhook event. You supply the memory layer. Together, they make a bot that actually knows who it is talking to.
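Caller ID is only a useful anchor if the same person always maps to the same string. The `from` field in this post's examples is already in E.164 form ("+14155551234"). However, caller ID formats can vary between carriers and trunks, and a key that differs by a single character makes a returning caller look new. Below is a minimal normalization sketch. `normalize_caller_id` is a hypothetical helper, and its defaults (country code "1", 10-digit national numbers) are illustrative assumptions for North American numbers.

```python
import re


def normalize_caller_id(raw: str, default_country_code: str = "1", national_length: int = 10) -> str:
    """Normalize a caller ID string into an E.164-style key such as "+14155551234".

    Simplified sketch: strips formatting characters and, when there is no
    leading "+", assumes a national number for `default_country_code`
    (both defaults are illustrative assumptions for North American numbers).
    """
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)
    if not digits:
        # Withheld or anonymous caller IDs carry no digits; do not invent a key.
        raise ValueError(f"caller ID has no digits: {raw!r}")
    if has_plus:
        return "+" + digits
    if len(digits) == len(default_country_code) + national_length and digits.startswith(default_country_code):
        # Country code already present, just missing the "+"
        return "+" + digits
    return "+" + default_country_code + digits
```

Withheld numbers raise instead of producing a shared junk key, so the caller can simply be treated as a first-time caller. For real-world numbering plans, a dedicated library such as `phonenumbers` handles the many cases this sketch ignores.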
Architecture Overview
Inbound call (with caller ID)
↓
VoIPBin Webhook
↓
Look up caller in Redis
↓
Build LLM prompt with history
↓
AI responds (via VoIPBin TTS)
↓
Append exchange to Redis
↓
Call ends → persist to DB
The caller ID acts as the session key. Redis holds short-term conversation memory. A database (Postgres, Mongo — your choice) holds long-term history.
Setup
Install dependencies:
```bash
pip install flask redis openai requests
```
Sign up for VoIPBin (no OTP, instant token):
```bash
curl -s -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username":"yourbot","password":"yourpassword","email":"you@example.com"}'
# Returns: {"token": "<your-access-token>"}
```
Set your environment variables:
```bash
export VOIPBIN_TOKEN="<your-access-token>"
export OPENAI_API_KEY="<your-openai-key>"
export REDIS_URL="redis://localhost:6379"
```
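The Python code reads these variables with `os.environ[...]`, so a missing one surfaces as a bare `KeyError`, and only one at a time. A small startup check like the hypothetical helpers below reports every missing variable in a single message. They take the environment as a parameter, so they can be tested with a plain dict.

```python
import os
from typing import Mapping

# Names used by the setup commands and code in this post.
REQUIRED_ENV_VARS = ("VOIPBIN_TOKEN", "OPENAI_API_KEY", "REDIS_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variable names that are unset or empty, in declared order."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def require_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise a single RuntimeError listing every missing variable."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` at the top of the Flask module, before `redis.from_url` and `openai.OpenAI` run, would surface all missing settings at once.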
Building the Memory Layer
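```python
import json
import os
from datetime import datetime, timezone

import redis

# Shared Redis client; REDIS_URL comes from the environment (see Setup)
r = redis.from_url(os.environ["REDIS_URL"])

CALLER_HISTORY_PREFIX = "caller:"
MAX_HISTORY_TURNS = 10  # Keep the last 10 exchanges (user + assistant pairs) per caller
HISTORY_TTL = 86400 * 30  # 30 days, in seconds


def get_caller_history(phone_number: str) -> list:
    """Return the stored turns for this caller, oldest first, or [] if none."""
    key = f"{CALLER_HISTORY_PREFIX}{phone_number}"
    raw = r.get(key)
    if raw:
        return json.loads(raw)
    return []


def save_caller_history(phone_number: str, history: list) -> None:
    """Trim to the most recent exchanges and store with a refreshed TTL."""
    key = f"{CALLER_HISTORY_PREFIX}{phone_number}"
    # Each exchange is two entries (user + assistant), so keep 2x entries
    trimmed = history[-MAX_HISTORY_TURNS * 2:]
    r.setex(key, HISTORY_TTL, json.dumps(trimmed))


def append_exchange(phone_number: str, user_msg: str, bot_msg: str) -> None:
    """Append one user/assistant exchange to the caller's history.

    Note: this is a read-modify-write; two simultaneous calls from the
    same number could overwrite each other's latest exchange.
    """
    now = datetime.now(timezone.utc).isoformat()
    history = get_caller_history(phone_number)
    history.append({"role": "user", "content": user_msg, "timestamp": now})
    history.append({"role": "assistant", "content": bot_msg, "timestamp": now})
    save_caller_history(phone_number, history)
```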
The Webhook Handler
VoIPBin sends a webhook when a call comes in and when speech is recognized. Here is the core handler:
```python
import os

import openai
from flask import Flask, jsonify, request

# Assumes the memory-layer code above (r, get_caller_history, append_exchange)
# is defined in this module or imported into it.

app = Flask(__name__)
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = """
You are a helpful customer support voice assistant.
You will be given the caller's phone number and their conversation history.
Use this context to provide personalized, continuous support.
Keep responses concise (under 40 words) - this is a phone call, not a chat.
If you recognize a returning caller, acknowledge it naturally.
"""

SESSION_TTL = 3600  # Per-call session keys expire after one hour


@app.route("/webhook/call", methods=["POST"])
def handle_call_event():
    data = request.get_json(silent=True) or {}
    event_type = data.get("type")
    caller_number = data.get("from")  # e.g. "+14155551234"
    call_id = data.get("call_id")

    if event_type == "call.ringing":
        history = get_caller_history(caller_number) if caller_number else []
        is_returning = len(history) > 0

        # Remember who this call belongs to so later events can find the history
        if caller_number:
            r.setex(f"call:{call_id}:caller", SESSION_TTL, caller_number)
        r.setex(f"call:{call_id}:returning", SESSION_TTL, "1" if is_returning else "0")

        greeting = build_greeting(caller_number, is_returning, history)
        return jsonify({
            "actions": [
                {"type": "talk", "text": greeting},
                {"type": "listen"}
            ]
        })

    elif event_type == "call.speech_recognized":
        user_speech = data.get("speech", "")

        # Prefer the number stored at ring time; fall back to this event's "from"
        stored = r.get(f"call:{call_id}:caller")
        caller_number = stored.decode() if stored else caller_number
        history = get_caller_history(caller_number) if caller_number else []

        # Build LLM messages: system prompt, caller context, recent history, new speech
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        messages.append({
            "role": "system",
            "content": f"Caller phone number: {caller_number}. Prior conversation history follows."
        })
        for turn in history[-6:]:  # Last 3 exchanges (user + assistant each)
            messages.append({"role": turn["role"], "content": turn["content"]})
        messages.append({"role": "user", "content": user_speech})

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            max_tokens=100
        )
        bot_reply = (response.choices[0].message.content or "").strip()

        if caller_number:
            append_exchange(caller_number, user_speech, bot_reply)

        return jsonify({
            "actions": [
                {"type": "talk", "text": bot_reply},
                {"type": "listen"}
            ]
        })

    # Acknowledge any other event types without taking action
    return jsonify({"status": "ok"})


def build_greeting(phone_number: str, is_returning: bool, history: list) -> str:
    """Return a generic greeting for new callers and a personalized one otherwise."""
    if not is_returning:
        return "Hello! How can I help you today?"

    # Most recent thing the caller said, used as their "last topic"
    last_user_msg = next(
        (h["content"] for h in reversed(history) if h["role"] == "user"),
        None
    )
    if last_user_msg:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Generate a brief, warm phone greeting (under 20 words) for a returning caller. Mention their last topic naturally."},
                {"role": "user", "content": f"Last topic: {last_user_msg}"}
            ],
            max_tokens=50
        )
        greeting = response.choices[0].message.content
        if greeting:
            return greeting.strip()

    return "Welcome back! Great to hear from you again. How can I help?"


if __name__ == "__main__":
    app.run(port=5000)
```
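The message-assembly step in the `call.speech_recognized` branch can be pulled out into a pure function, which makes it testable without Redis, Flask, or OpenAI. This is a refactoring sketch: `build_llm_messages` is a hypothetical name, and it mirrors the handler's choice of the last three exchanges while dropping stored metadata such as `timestamp`.

```python
from typing import Any


def build_llm_messages(
    system_prompt: str,
    caller_number: str,
    history: list[dict[str, Any]],
    user_speech: str,
    max_exchanges: int = 3,
) -> list[dict[str, str]]:
    """Assemble the chat messages sent to the LLM for one speech event."""
    messages = [
        {"role": "system", "content": system_prompt},
        {
            "role": "system",
            "content": f"Caller phone number: {caller_number}. Prior conversation history follows.",
        },
    ]
    # Each exchange is a user turn plus an assistant turn, so keep 2x entries.
    # Guard against max_exchanges == 0, since history[-0:] would return everything.
    recent = history[-max_exchanges * 2:] if max_exchanges > 0 else []
    # Keep only role and content; the API does not need the stored timestamp.
    messages.extend({"role": t["role"], "content": t["content"]} for t in recent)
    messages.append({"role": "user", "content": user_speech})
    return messages
```

Inside `handle_call_event`, the inline loop could then become `messages = build_llm_messages(SYSTEM_PROMPT, caller_number, history, user_speech)`.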
Register Your Webhook with VoIPBin
Point VoIPBin to your server:
```bash
curl -X POST https://api.voipbin.net/v1.0/numbers/<your-number-id>/webhook \
  -H "Authorization: Bearer $VOIPBIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"webhook_url": "https://your-server.com/webhook/call"}'
```
Now every inbound call to your VoIPBin number triggers the webhook with the caller's phone number included.
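If you would rather register the webhook from the same Python codebase, the curl call above translates directly to the standard library's `urllib.request`. This sketch reuses the endpoint and JSON body from that curl command. `build_webhook_request` and `register_webhook` are hypothetical helper names, and the code assumes the endpoint replies with a JSON body (or an empty one).

```python
import json
import os
import urllib.request

VOIPBIN_API = "https://api.voipbin.net/v1.0"


def build_webhook_request(number_id: str, webhook_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl command."""
    body = json.dumps({"webhook_url": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        f"{VOIPBIN_API}/numbers/{number_id}/webhook",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def register_webhook(number_id: str, webhook_url: str) -> dict:
    """Send the registration request; return the decoded JSON response, or {} if empty."""
    req = build_webhook_request(number_id, webhook_url, os.environ["VOIPBIN_TOKEN"])
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}
```

Separating request construction from sending keeps the request shape checkable in a unit test without touching the network.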
What This Unlocks
Once your bot has memory, returning callers no longer have to repeat themselves:
First call:
Caller: "I need help resetting my password."
Bot: "Of course! I've sent a reset link to your email."
Second call (3 days later):
Bot: "Welcome back! Did the password reset work out for you?"
Caller: "Actually no, I never got that email."
Bot: "Got it — let me resend that to a different address."
The caller did not explain the context again. The bot already knew.
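The second-call greeting works because each stored turn carries a timestamp. A hypothetical helper like the one below could pass both the last topic and the time elapsed into `build_greeting`'s prompt, for example `f"Last topic: {ctx['last_topic']} ({ctx['days_since']} days ago)"`. The sketch assumes every turn has the ISO-8601 `timestamp` field that `append_exchange` writes, and it treats timezone-less timestamps as UTC.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def returning_caller_context(
    history: list[dict[str, Any]], now: Optional[datetime] = None
) -> Optional[dict[str, Any]]:
    """Summarize the caller's last interaction for a greeting prompt.

    Returns None for first-time callers (no stored user turns).
    """
    user_turns = [t for t in history if t.get("role") == "user"]
    if not user_turns:
        return None
    last = user_turns[-1]
    last_time = datetime.fromisoformat(last["timestamp"])
    if last_time.tzinfo is None:
        # Entries written with datetime.utcnow() are naive; treat them as UTC.
        last_time = last_time.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return {
        "last_topic": last["content"],
        "days_since": (now - last_time).days,
    }
```

Passing `now` explicitly keeps the function deterministic in tests. In the handler it can be omitted.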
Production Considerations
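Caller privacy: Phone numbers are PII. Consider hashing them before using them as Redis keys, and prefer a keyed hash such as HMAC-SHA256 with a server-side secret over a bare sha256(phone_number). The space of phone numbers is small enough that an unkeyed hash can be reversed by brute force. Store only what you need.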
History size: Limit turns per caller. Unbounded history means unbounded token costs. MAX_HISTORY_TURNS = 10 is a reasonable default for most support bots.
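`MAX_HISTORY_TURNS` caps the number of exchanges, but turns vary widely in length. One alternative is to cap history by size. The sketch below uses character count as a rough stand-in for tokens (an approximation, not a token count). It keeps only whole exchanges, so the model never sees an assistant reply without the question it answered. `trim_to_budget` is a hypothetical helper, and it assumes history is a flat list of user/assistant pairs, as `append_exchange` writes it.

```python
from typing import Any


def trim_to_budget(history: list[dict[str, Any]], max_chars: int) -> list[dict[str, Any]]:
    """Keep the most recent whole exchanges whose combined content fits max_chars.

    Character count is only a rough proxy for tokens; swap in a real
    tokenizer (for example tiktoken) if you need exact limits.
    """
    kept: list[dict[str, Any]] = []
    used = 0
    # Walk backwards one exchange (user + assistant pair) at a time.
    for i in range(len(history) - 2, -1, -2):
        pair = history[i:i + 2]
        size = sum(len(t["content"]) for t in pair)
        if used + size > max_chars:
            break
        kept[:0] = pair  # prepend so the result stays oldest-first
        used += size
    return kept
```

Stopping at the first exchange that does not fit, rather than skipping it, keeps the retained history contiguous.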
Redis vs database: Redis for hot, active sessions. Move to Postgres or MongoDB for long-term storage after the call ends.
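Here is a sketch of the long-term side. SQLite from the standard library stands in for Postgres or MongoDB so the code runs without a server. The table layout and the functions `init_db`, `persist_call`, and `load_caller_calls` are hypothetical. The post does not show which VoIPBin event signals the end of a call, so wiring these functions into the webhook is left out.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create the long-term history table and a per-caller index if missing."""
    with conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS call_history (
                call_id    TEXT PRIMARY KEY,
                caller_key TEXT NOT NULL,
                ended_at   TEXT NOT NULL,
                turns_json TEXT NOT NULL
            )
            """
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_call_history_caller ON call_history (caller_key)"
        )


def persist_call(conn: sqlite3.Connection, call_id: str, caller_key: str, turns: list) -> None:
    """Write one finished call's turns; re-running for the same call_id replaces the row."""
    ended_at = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO call_history (call_id, caller_key, ended_at, turns_json) "
            "VALUES (?, ?, ?, ?)",
            (call_id, caller_key, ended_at, json.dumps(turns)),
        )


def load_caller_calls(conn: sqlite3.Connection, caller_key: str) -> list:
    """Return every persisted turn for a caller, oldest call first."""
    rows = conn.execute(
        "SELECT turns_json FROM call_history WHERE caller_key = ? ORDER BY ended_at, rowid",
        (caller_key,),
    ).fetchall()
    return [turn for (turns_json,) in rows for turn in json.loads(turns_json)]
```

The Redis key in this post holds a rolling window that spans calls, so persisting it wholesale at every hangup would duplicate earlier turns. Tracking each call's own turns separately (for example under a hypothetical `call:{call_id}:turns` key) avoids that. With Postgres, the same shape works through a driver such as psycopg, using `%s` placeholders instead of `?`.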
Multi-number bots: If you run bots on multiple VoIPBin numbers, namespace your keys: caller:{bot_number}:{caller_number} to avoid cross-bot memory bleed.
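The two key-related suggestions (hashing caller numbers and namespacing by bot number) can be combined in one helper. This is a sketch with a hypothetical `caller_key` function. It uses HMAC-SHA256 with a server-side secret rather than a bare hash, and it assumes the secret is loaded from configuration, for example a hypothetical `CALLER_KEY_SECRET` environment variable.

```python
import hashlib
import hmac


def caller_key(bot_number: str, caller_number: str, secret: bytes) -> str:
    """Build a Redis key of the form "caller:{bot_number}:{digest}".

    The caller number is replaced by an HMAC-SHA256 hex digest so the raw
    number never appears in key names; the bot number stays readable.
    """
    digest = hmac.new(secret, caller_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"caller:{bot_number}:{digest}"
```

The digest is deterministic for a given secret, so the same caller always lands on the same key. A side effect is that rotating the secret resets every caller's memory.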
The Result
With about a hundred lines of Python, your AI voice bot now:
- Recognizes returning callers by phone number
- Loads prior conversation context before responding
- Generates personalized greetings based on the last interaction
- Appends each exchange to persistent memory automatically
This is not a complex feature. Most voice bots lack it because the telephony side has been hard to build. VoIPBin handles the telephony (RTP, STT, TTS, and caller ID delivery); you handle the logic. The phone call becomes just another API call.
Try VoIPBin: https://voipbin.net — signup is instant, no OTP.
MCP Server (use directly from Claude Desktop or Cursor): uvx voipbin-mcp
Go SDK: go get github.com/voipbin/voipbin-go