I got tired of seeing small businesses miss calls. So I built Vokio — a voice AI agent that answers real phone calls, remembers callers between sessions, and generates a post-call summary automatically.
Here's the full architecture and how I solved the hard parts.
Stack
- Python + Flask (webhook server)
- Vapi (telephony + STT)
- Claude Haiku (conversation)
- Deepgram Nova 3 (Spanish STT)
- Azure TTS (Spanish voices)
- SQLite (memory between calls)
How it works
Vapi handles the phone call and speech recognition. Instead of using a built-in LLM, I configured Vapi to use a Custom LLM — pointing it to my Flask server. Every time the caller says something, Vapi sends a POST request to my endpoint and expects an OpenAI-compatible streaming response.
@app.route("/chat/completions", methods=["POST"])
def chat():
data = request.json or {}
phone = data["call"]["customer"]["number"]
last_message = [m for m in data["messages"] if m["role"] == "user"][-1]["content"]
response = generate_response(phone, last_message)
# return SSE streaming response
The memory system
This is the interesting part. Every caller is stored in SQLite by phone number. When a call comes in, I query the DB and inject the caller's history into the system prompt.
def generate_response(phone: str, user_message: str) -> str:
caller = get_caller(phone)
history = get_history(phone)
if caller and caller.get("name"):
system += f"\n\nYou already know this caller, their name is {caller['name']}."
if caller and caller.get("sentiment") == "frustrated":
system += "\n\nNOTE: This caller was frustrated last time. Be especially patient."
If the caller was frustrated last time, Claude knows and adjusts its tone automatically.
Post-call analysis
When the call ends, Vapi sends a webhook to /end-call. I pass the transcript to Claude and get back:
- 2-line summary
- Urgency score (1-5)
- Required action ("call back today", "send quote", "none")
- Customer sentiment (satisfied / neutral / frustrated)
All stored in SQLite. The business knows at a glance which calls need attention today.
The tricky parts
Streaming is required. Vapi expects SSE streaming responses. If you return regular JSON the call hangs up immediately. Took me a while to figure that out.
Tool use + verbal response. When Claude uses a tool (like saving the caller's name) without producing text, you get an empty response. I added a follow-up API call to get the verbal response without Claude announcing what it just saved.
dotenv loading. If environment variables already exist in the system, load_dotenv() won't override them. Use override=True.
Result
A 5-minute call costs approximately €0.28. The agent answers in Spanish, detects if the caller switches to English or Catalan, and responds in the same language automatically.
I packaged it as a ready-to-deploy template with 6 business sector prompts included (dental clinic, restaurant, hair salon, real estate, mechanic, hostel).
Top comments (0)