voipbin

Posted on Apr 23

Build AI Voice Bots That Remember: Persistent Context Across Multiple Calls

#ai #voip #tutorial #python

Your AI voice bot picks up the phone. The caller says:

"Hey, I was calling back about my order from last week."

And your bot responds:

"Hello! How can I help you today?"

The caller sighs. Explains everything again. Gets frustrated. Hangs up.

This is the dirty secret of most AI voice bots: every call starts from zero. No memory. No context. No recognition of returning callers. It does not matter how smart your LLM is — if the bot has no memory of past interactions, it will always feel robotic.

The solution is not complicated. This post shows you exactly how to build persistent caller context that survives across sessions.

Why Voice Bots Forget

Most voice bot architectures look like this:

Inbound call → Speech-to-Text → LLM prompt → Text-to-Speech → Response

Each call initializes a fresh LLM context. The phone number that called you? Ignored. The fact that this same number called three times this week? Unknown. The issue they reported on Monday? Gone.

The fix requires two things:

A caller identity anchor — the phone number from the inbound call
A persistent store — a database or cache keyed to that number

VoIPBin gives you the caller ID on every webhook event. You supply the memory layer. Together, they make a bot that actually knows who it is talking to.

Architecture Overview

Inbound call (with caller ID)
        ↓
   VoIPBin Webhook
        ↓
  Look up caller in Redis
        ↓
  Build LLM prompt with history
        ↓
  AI responds (via VoIPBin TTS)
        ↓
  Append exchange to Redis
        ↓
  Call ends → persist to DB

The caller ID acts as the session key. Redis holds short-term conversation memory. A database (Postgres, Mongo — your choice) holds long-term history.
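The last step of the diagram, persisting to a database when the call ends, could look roughly like the sketch below. It uses sqlite3 from the standard library as a stand-in for Postgres or Mongo. The table name `call_history` and the helpers `init_db`, `persist_call`, and `load_calls` are hypothetical names chosen for illustration. This post does not show the name of VoIPBin's call-end event, so wire `persist_call` to whichever event your account actually delivers.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create the long-term history table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS call_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            caller TEXT NOT NULL,
            call_id TEXT NOT NULL,
            ended_at TEXT NOT NULL,
            transcript TEXT NOT NULL
        )
        """
    )
    conn.commit()


def persist_call(conn: sqlite3.Connection, caller: str, call_id: str, turns: list) -> None:
    """Store one finished call as a JSON transcript, keyed by caller."""
    ended_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO call_history (caller, call_id, ended_at, transcript) "
        "VALUES (?, ?, ?, ?)",
        (caller, call_id, ended_at, json.dumps(turns)),
    )
    conn.commit()


def load_calls(conn: sqlite3.Connection, caller: str, limit: int = 5) -> list:
    """Return the most recent calls for a caller, newest first."""
    rows = conn.execute(
        "SELECT call_id, ended_at, transcript FROM call_history "
        "WHERE caller = ? ORDER BY id DESC LIMIT ?",
        (caller, limit),
    ).fetchall()
    return [
        {"call_id": call_id, "ended_at": ended_at, "turns": json.loads(transcript)}
        for call_id, ended_at, transcript in rows
    ]


# Demonstration with an in-memory database; use a file path or a real server in practice.
conn = sqlite3.connect(":memory:")
init_db(conn)
persist_call(
    conn,
    "+14155551234",
    "call-1",
    [{"role": "user", "content": "I need help resetting my password."}],
)
```

Storing the transcript as a single JSON column keeps the sketch short. A table with one row per turn would be easier to query if you later need to search across conversations.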

Setup

Install dependencies:

pip install flask redis openai

Create a VoIPBin account to get an access token:

curl -s -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username":"yourbot","password":"yourpassword","email":"you@example.com"}'
# Returns: {"token": "<your-access-token>"}

Set your environment variables:

export VOIPBIN_TOKEN="<your-access-token>"
export OPENAI_API_KEY="<your-openai-key>"
export REDIS_URL="redis://localhost:6379"

Building the Memory Layer

import redis
import json
import os
from datetime import datetime, timezone

r = redis.from_url(os.environ["REDIS_URL"])

CALLER_HISTORY_PREFIX = "caller:"
MAX_HISTORY_TURNS = 10  # Keep last 10 exchanges per caller
HISTORY_TTL = 86400 * 30  # 30 days

def get_caller_history(phone_number: str) -> list:
    """Return the stored message list for this caller, or [] if there is none."""
    key = f"{CALLER_HISTORY_PREFIX}{phone_number}"
    raw = r.get(key)
    if raw:
        return json.loads(raw)
    return []

def save_caller_history(phone_number: str, history: list):
    key = f"{CALLER_HISTORY_PREFIX}{phone_number}"
    # One exchange is two entries (user + assistant), so keep 2 * MAX_HISTORY_TURNS.
    trimmed = history[-MAX_HISTORY_TURNS * 2:]
    # setex rewrites the value and resets the 30-day TTL on every save.
    r.setex(key, HISTORY_TTL, json.dumps(trimmed))

def append_exchange(phone_number: str, user_msg: str, bot_msg: str):
    history = get_caller_history(phone_number)
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
    now = datetime.now(timezone.utc).isoformat()
    history.append({"role": "user", "content": user_msg, "timestamp": now})
    history.append({"role": "assistant", "content": bot_msg, "timestamp": now})
    save_caller_history(phone_number, history)

The Webhook Handler

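VoIPBin sends a webhook when a call comes in and when speech is recognized. Here is the core handler. It lives in the same file as the memory layer, so it can use r, get_caller_history, and append_exchange directly: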

from flask import Flask, request, jsonify
import openai
import os

app = Flask(__name__)
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = """
You are a helpful customer support voice assistant.
You will be given the caller's phone number and their conversation history.
Use this context to provide personalized, continuous support.
Keep responses concise (under 40 words) - this is a phone call, not a chat.
If you recognize a returning caller, acknowledge it naturally.
"""

@app.route("/webhook/call", methods=["POST"])
def handle_call_event():
    data = request.get_json(silent=True) or {}  # tolerate empty or non-JSON bodies
    event_type = data.get("type")
    caller_number = data.get("from")  # e.g. "+14155551234"
    call_id = data.get("call_id")

    if event_type == "call.ringing":
        history = get_caller_history(caller_number)
        is_returning = len(history) > 0

        # Store call session state
        r.setex(f"call:{call_id}:caller", 3600, caller_number)
        r.setex(f"call:{call_id}:returning", 3600, "1" if is_returning else "0")

        greeting = build_greeting(caller_number, is_returning, history)
        return jsonify({
            "actions": [
                {"type": "talk", "text": greeting},
                {"type": "listen"}
            ]
        })

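    elif event_type == "call.speech_recognized":
        user_speech = data.get("speech", "")
        # The session key can be missing (expired, or Redis restarted mid-call),
        # so fall back to the caller ID from the event payload instead of crashing.
        stored = r.get(f"call:{call_id}:caller")
        caller_number = stored.decode() if stored else caller_number
        history = get_caller_history(caller_number)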

        # Build LLM messages with full history
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        messages.append({
            "role": "system",
            "content": f"Caller phone number: {caller_number}. Prior conversation history follows."
        })
        for turn in history[-6:]:  # Last 3 exchanges
            messages.append({"role": turn["role"], "content": turn["content"]})
        messages.append({"role": "user", "content": user_speech})

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            max_tokens=100
        )
        bot_reply = response.choices[0].message.content

        append_exchange(caller_number, user_speech, bot_reply)

        return jsonify({
            "actions": [
                {"type": "talk", "text": bot_reply},
                {"type": "listen"}
            ]
        })

    return jsonify({"status": "ok"})


def build_greeting(phone_number: str, is_returning: bool, history: list) -> str:
    if not is_returning:
        return "Hello! How can I help you today?"

    last_user_msg = next(
        (h["content"] for h in reversed(history) if h["role"] == "user"),
        None
    )
    if last_user_msg:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Generate a brief, warm phone greeting (under 20 words) for a returning caller. Mention their last topic naturally."},
                {"role": "user", "content": f"Last topic: {last_user_msg}"}
            ],
            max_tokens=50
        )
        return response.choices[0].message.content

    return "Welcome back! Great to hear from you again. How can I help?"


if __name__ == "__main__":
    app.run(port=5000)
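The prompt assembly inside the speech handler is the part most worth unit testing, because it decides what the LLM actually sees. One option is to pull it into a pure function, as in this minimal sketch. The name `build_messages` and the `max_exchanges` parameter are not part of the handler above; they are introduced here for illustration only.

```python
def build_messages(system_prompt: str, caller_number: str, history: list,
                   user_speech: str, max_exchanges: int = 3) -> list:
    """Assemble the chat messages for one turn of the call.

    history is the stored list of {"role", "content", "timestamp"} entries.
    Only role and content are forwarded; timestamps stay in storage.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {
            "role": "system",
            "content": f"Caller phone number: {caller_number}. "
                       "Prior conversation history follows.",
        },
    ]
    # One exchange is two entries. Guard against max_exchanges == 0,
    # because history[-0:] would return the whole list.
    recent = history[-max_exchanges * 2:] if max_exchanges > 0 else []
    for turn in recent:
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": user_speech})
    return messages
```

With this in place, the handler would pass the result straight to `client.chat.completions.create(messages=...)`, and the history window can be checked without a live call, Redis, or an API key.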

Register Your Webhook with VoIPBin

Point VoIPBin to your server:

curl -X POST https://api.voipbin.net/v1.0/numbers/<your-number-id>/webhook \
  -H "Authorization: Bearer $VOIPBIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"webhook_url": "https://your-server.com/webhook/call"}'

Now every inbound call to your VoIPBin number triggers the webhook with the caller's phone number included.

What This Unlocks

Once your bot has memory, the interaction quality jumps immediately:

First call:

Caller: "I need help resetting my password."
Bot: "Of course! I've sent a reset link to your email."

Second call (3 days later):

Bot: "Welcome back! Did the password reset work out for you?"
Caller: "Actually no, I never got that email."
Bot: "Got it — let me resend that to a different address."

The caller did not explain the context again. The bot already knew.

Production Considerations

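Caller privacy: Phone numbers are PII. Consider hashing them before using them as Redis keys. A plain sha256(phone_number) can be reversed by enumerating the phone number space, so prefer a keyed hash such as HMAC-SHA256 with a secret stored outside Redis. Store only what you need.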

History size: Limit turns per caller. Unbounded history means unbounded token costs. MAX_HISTORY_TURNS = 10 is a reasonable default for most support bots.

Redis vs database: Redis for hot, active sessions. Move to Postgres or MongoDB for long-term storage after the call ends.

Multi-number bots: If you run bots on multiple VoIPBin numbers, namespace your keys: caller:{bot_number}:{caller_number} to avoid cross-bot memory bleed.
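The privacy and multi-number points above can be combined in one key builder. The sketch below is one possible approach, not a VoIPBin feature: it namespaces by bot number and replaces the caller's number with an HMAC-SHA256 digest, since an unkeyed hash of a phone number is easy to brute-force. The function name `caller_key` and the environment variable `CALLER_KEY_SECRET` are assumptions made for this example.

```python
import hashlib
import hmac
import os


def caller_key(bot_number: str, caller_number: str, secret: bytes) -> str:
    """Build a Redis key of the form caller:{bot_number}:{hmac_of_caller}.

    The bot number is your own number, so it is kept readable.
    The caller's number is replaced by a keyed digest.
    """
    digest = hmac.new(secret, caller_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"caller:{bot_number}:{digest}"


# Hypothetical secret source; the fallback value is for local experiments only.
SECRET = os.environ.get("CALLER_KEY_SECRET", "dev-only-secret").encode("utf-8")
key = caller_key("+12025550100", "+14155551234", SECRET)
```

The same caller always maps to the same key for a given secret, so lookups keep working, but rotating the secret orphans existing history. Plan for that before using this in production.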

The Result

With a little over 100 lines of Python, your AI voice bot now:

Recognizes returning callers by phone number
Loads prior conversation context before responding
Generates personalized greetings based on the last interaction
Appends each exchange to persistent memory automatically

This is not a complex feature. It is missing from most voice bots because the infrastructure was hard. VoIPBin handles the telephony — RTP, STT, TTS, and caller ID delivery. You handle the logic. The phone call becomes just another API call.

Try VoIPBin: https://voipbin.net — signup is instant, no OTP.

MCP Server (use directly from Claude Desktop or Cursor): uvx voipbin-mcp

Go SDK: go get github.com/voipbin/voipbin-go

DEV Community