I've been building voice AI systems for small businesses, and I wanted to share the architecture behind a real-time voice agent designed specifically for restaurants. This post walks through how we connected OpenAI's Realtime API with Twilio to create an AI that answers phone calls, handles reservations, and takes orders — all without human intervention.
The Problem
Restaurants miss 15-25% of incoming calls. During peak hours, that number can hit 40%. Every missed call is a lost reservation, a missed takeout order, or a frustrated customer who calls your competitor instead.
Hiring a dedicated phone person costs $2,500-4,000/month and only covers one shift. Answering services cost $500-1,500/month and mostly just take messages. We wanted to build something that could actually handle the calls — book reservations, take orders, answer questions — 24/7.
Architecture Overview
The system uses a triage agent pattern with specialized sub-agents:
Incoming Call (Twilio)
  → WebSocket Connection
    → Triage Agent (classifies intent)
      → Reservation Agent (books/modifies/cancels)
      → Order Agent (takes takeout/delivery orders)
      → Inquiry Agent (hours, menu, location)
      → Feedback Agent (complaints, suggestions)
Key Components
Twilio Voice + Media Streams — Handles the telephony layer. When a call comes in, Twilio establishes a WebSocket connection and streams raw audio.
OpenAI Realtime API — Processes audio in real-time. We use function calling to give the AI structured tools for booking reservations, checking availability, etc.
Google Calendar Integration — Real-time sync for reservations. The AI checks availability before confirming any booking.
Menu OCR Pipeline — Restaurant owners upload a PDF or photo of their menu. We extract items, prices, and descriptions automatically.
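To make the telephony hand-off concrete, here is a minimal sketch of the glue between the first two components. It assumes the Realtime session is configured for G.711 mu-law input, so Twilio's base64 audio payload can be forwarded without transcoding. The Twilio event names (`start`, `media`, `stop`) and the Realtime `input_audio_buffer.append` event come from the public docs; `CallState` and `handleTwilioMessage` are hypothetical names for illustration, not our production code.

```typescript
// Sketch: translate Twilio Media Streams messages into Realtime client events.
// Only the fields used here are typed; real Twilio messages carry more.
type TwilioMessage =
  | { event: 'connected' }
  | { event: 'start'; start: { streamSid: string; callSid: string } }
  | { event: 'media'; media: { payload: string } } // base64 audio chunk
  | { event: 'stop' };

type RealtimeEvent = { type: 'input_audio_buffer.append'; audio: string };

// Hypothetical per-call state kept by the bridge.
interface CallState {
  streamSid: string | null;
  ended: boolean;
}

function handleTwilioMessage(raw: string, state: CallState): RealtimeEvent | null {
  const msg = JSON.parse(raw) as TwilioMessage;
  switch (msg.event) {
    case 'start':
      // Remember the stream ID; it is needed to send audio back to the caller.
      state.streamSid = msg.start.streamSid;
      return null;
    case 'media':
      // Forward the caller's audio chunk unchanged (assumes matching audio formats).
      return { type: 'input_audio_buffer.append', audio: msg.media.payload };
    case 'stop':
      state.ended = true;
      return null;
    default:
      // 'connected' and any other events need no forwarding in this sketch.
      return null;
  }
}
```

In a real server, each non-null return value would be serialized and sent over the Realtime WebSocket, and audio coming back from the model would be wrapped in a Twilio `media` message tagged with the stored `streamSid`.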
The Triage Pattern
The most important architectural decision was the triage pattern. Instead of one monolithic prompt trying to handle everything, we route calls to specialized agents:
// Simplified triage logic. `Agent` is the shared interface all agents implement;
// the services passed to each constructor are created elsewhere at startup.
async function triageCall(transcript: string): Promise<Agent> {
  const intent = await classifyIntent(transcript);
  switch (intent) {
    case 'reservation':
      return new ReservationAgent(calendarService);
    case 'order':
      return new OrderAgent(menuService, posIntegration);
    case 'inquiry':
      return new InquiryAgent(restaurantInfo);
    case 'feedback':
      return new FeedbackAgent(notificationService);
    default:
      // Unrecognized intents fall back to a general-purpose agent.
      return new GeneralAgent();
  }
}
Each agent has its own system prompt, tool definitions, and context. Because the model only sees the tools and instructions relevant to the caller's intent, responses stay focused and, in our experience, hallucinations drop noticeably.
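The triage step depends on a classifier that maps a transcript to an intent. As a rough illustration only, here is a keyword-based stand-in. It is not how the production `classifyIntent` works (a model-based classifier is the more likely choice), and the keyword lists and the name `classifyIntentByKeyword` are assumptions made for this sketch.

```typescript
type Intent = 'reservation' | 'order' | 'inquiry' | 'feedback' | 'general';

// Hypothetical keyword patterns, checked in order; the first match wins.
// No `g` flag, so RegExp.test stays stateless between calls.
const INTENT_KEYWORDS: Array<[Exclude<Intent, 'general'>, RegExp]> = [
  ['reservation', /\b(reserv\w*|book\w*|table)\b/i],
  ['order', /\b(order\w*|takeout|take-out|delivery|pickup)\b/i],
  ['feedback', /\b(complain\w*|feedback|suggest\w*|manager)\b/i],
  ['inquiry', /\b(hours?|open|close|menu|location|address|parking)\b/i],
];

function classifyIntentByKeyword(transcript: string): Intent {
  for (const [intent, pattern] of INTENT_KEYWORDS) {
    if (pattern.test(transcript)) return intent;
  }
  // Nothing matched: let the general agent handle the call.
  return 'general';
}
```

A stand-in like this is mostly useful for tests and as a fallback; because the first match wins, a phrase such as "order from the menu" is routed to the order agent rather than the inquiry agent.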
Handling Reservations
The reservation agent checks availability before it confirms any booking. It works through two tools:
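// Tool definitions in the shape the Realtime API expects: each tool is a
// function whose parameters are described by a JSON Schema object.
const reservationTools = [
  {
    type: 'function',
    name: 'check_availability',
    description: 'Check if a specific date/time has open tables',
    parameters: {
      type: 'object',
      properties: {
        date: { type: 'string', description: 'Date as YYYY-MM-DD' },
        time: { type: 'string', description: '24-hour time as HH:MM' },
        party_size: { type: 'integer', minimum: 1 }
      },
      required: ['date', 'time', 'party_size']
    }
  },
  {
    type: 'function',
    name: 'create_reservation',
    description: 'Book a confirmed reservation',
    parameters: {
      type: 'object',
      properties: {
        date: { type: 'string', description: 'Date as YYYY-MM-DD' },
        time: { type: 'string', description: '24-hour time as HH:MM' },
        party_size: { type: 'integer', minimum: 1 },
        customer_name: { type: 'string' },
        phone: { type: 'string' },
        special_requests: { type: 'string' } // optional
      },
      required: ['date', 'time', 'party_size', 'customer_name', 'phone']
    }
  }
];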
The AI naturally confirms details back to the caller: "Let me confirm — party of 4, this Friday at 7pm, under the name Johnson?"
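For readers wondering what happens when the model calls one of these tools, here is a hedged sketch of the server side. The event shapes (`response.function_call_arguments.done`, `conversation.item.create` with a `function_call_output` item, then `response.create`) follow the Realtime API docs. `SlotBook` is a hypothetical in-memory stand-in for the Google Calendar lookup, and the per-slot capacity is an assumed simplification.

```typescript
interface FunctionCallDone {
  type: 'response.function_call_arguments.done';
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments produced by the model
}

type ClientEvent =
  | { type: 'conversation.item.create';
      item: { type: 'function_call_output'; call_id: string; output: string } }
  | { type: 'response.create' };

// Hypothetical stand-in for the calendar: tracks seats booked per date/time slot.
class SlotBook {
  private readonly booked = new Map<string, number>();
  private readonly capacity: number;
  constructor(capacity: number) { this.capacity = capacity; }
  seatsLeft(date: string, time: string): number {
    return this.capacity - (this.booked.get(`${date}T${time}`) ?? 0);
  }
  reserve(date: string, time: string, partySize: number): boolean {
    if (partySize > this.seatsLeft(date, time)) return false;
    const key = `${date}T${time}`;
    this.booked.set(key, (this.booked.get(key) ?? 0) + partySize);
    return true;
  }
}

function runTool(name: string, rawArgs: string, book: SlotBook): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArgs);
  } catch {
    return { ok: false, error: 'malformed_json' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, error: 'invalid_arguments' };
  }
  const { date, time, party_size: partySize } =
    parsed as { date?: unknown; time?: unknown; party_size?: unknown };
  // Never trust model-produced arguments: validate before touching the calendar.
  if (
    typeof date !== 'string' || !/^\d{4}-\d{2}-\d{2}$/.test(date) ||
    typeof time !== 'string' || !/^\d{2}:\d{2}$/.test(time) ||
    typeof partySize !== 'number' || !Number.isInteger(partySize) || partySize < 1
  ) {
    return { ok: false, error: 'invalid_arguments' };
  }
  switch (name) {
    case 'check_availability':
      return { ok: true, available: partySize <= book.seatsLeft(date, time) };
    case 'create_reservation':
      // Re-check capacity at booking time so a full slot is never confirmed.
      return book.reserve(date, time, partySize)
        ? { ok: true, confirmed: true }
        : { ok: false, error: 'slot_full' };
    default:
      return { ok: false, error: 'unknown_tool' };
  }
}

// Builds the two events to send back: the tool result, then a request to respond.
function handleFunctionCall(event: FunctionCallDone, book: SlotBook): ClientEvent[] {
  const output = JSON.stringify(runTool(event.name, event.arguments, book));
  return [
    { type: 'conversation.item.create',
      item: { type: 'function_call_output', call_id: event.call_id, output } },
    { type: 'response.create' },
  ];
}
```

Returning errors as structured tool output, rather than throwing, lets the model explain the problem to the caller and offer another time.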
Multi-Language Support
One unexpected win: the system automatically responds in the caller's language. OpenAI's Realtime API handles language detection natively. For restaurants in diverse cities, this is huge — no need to hire multilingual staff.
What We Learned
Things that work well:
- Structured tool calling prevents most hallucination issues
- The triage pattern keeps each agent focused and accurate
- Real-time audio processing feels natural to callers (sub-second latency)
- Automatic language detection is a massive differentiator
Things that need work:
- Very noisy environments on the caller's end can cause transcription issues
- Complex multi-party negotiations (event planning for 50+ people) still need human handoff
- Some older callers are uncomfortable talking to an AI
Results
For a typical restaurant doing ~25 calls/day:
- Missed calls dropped from ~20% to near 0%
- ~$1,200/month in recovered revenue from calls that would have gone to voicemail
- Staff freed up from phone duty during peak hours
- Setup time: ~30 minutes (connect calendar, upload menu, forward number)
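As a sanity check on those numbers, the back-of-the-envelope arithmetic looks roughly like this. The 30-day month is my assumption, and the per-call figure is only what the stated totals imply, not a separately measured value.

```typescript
// Figures taken from the results above.
const callsPerDay = 25;
const missedRate = 0.2; // ~20% of calls missed before the AI agent
const recoveredRevenuePerMonth = 1200; // USD

// Assumption: the restaurant takes calls 30 days a month.
const daysPerMonth = 30;

// Roughly 150 calls per month that previously went unanswered.
const missedCallsPerMonth = callsPerDay * missedRate * daysPerMonth;

// Implied average value of each recovered call: roughly $8.
const impliedRevenuePerRecoveredCall = recoveredRevenuePerMonth / missedCallsPerMonth;
```

That average folds together calls that lead to a booking or order and calls that lead to nothing, so it reads as a conservative per-call figure.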
Resources
If you're interested in building something similar or want to see how this works in practice:
- How AI Phone Systems Reduce Missed Calls for Busy Restaurants — Deep dive into the missed call problem and how AI solves it
- Virtual Receptionist vs AI Phone Agent for Restaurants — Comparison of different approaches by cost and capability
The full architecture handles edge cases I didn't cover here — call transfers, SMS confirmations, POS integration with Square and Toast, and more. Happy to answer questions in the comments.
What's your experience with voice AI in production? I'd love to hear about other real-world use cases.