We've been running AI voice agents that answer phone calls for small businesses (dental practices, restaurants, trades) for about 6 months now. Here's what actually happened in production vs. what we expected.
The Stack
- Voice: ElevenLabs for synthesis, Deepgram for STT
- Brain: LLM-powered conversation engine (GPT-4o / Claude for different use cases)
- Orchestration: VAPI for call flow management
- Integrations: Various PMS/CRM systems via API
- Infrastructure: Redundant telephony with automatic failover
Lesson 1: Latency is everything
Users will tolerate a slightly imperfect response. They will NOT tolerate a 2-second pause before the response starts. We spent more engineering time on latency optimization than on conversation quality.
What worked:
- Streaming TTS (start speaking before the full response is generated)
- Aggressive caching for common responses (hours, location, basic FAQ)
- Edge-deployed STT for faster transcription
- Pre-computed response fragments for predictable conversation paths
Our p95 response latency went from ~1800ms to ~650ms. That single improvement increased our "caller stayed on the line" rate from 72% to 91%.
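To make the caching and streaming ideas above concrete, here is a minimal sketch, not the production implementation. It assumes the LLM output arrives as an iterable of text tokens and that an intent label is already available; `CACHED_ANSWERS`, `sentence_chunks`, and `respond` are hypothetical names, and the canned answers are placeholders.

```python
from typing import Iterable, Iterator

# Hypothetical canned answers for predictable questions (placeholder text).
CACHED_ANSWERS = {
    "hours": "We're open Monday to Friday, nine to five.",
    "location": "We're at 12 Main Street.",
}

SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as it is complete.

    Handing sentence-sized chunks to TTS lets speech start before the
    model has finished generating the full response.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush any trailing text that did not end with punctuation.
        yield buffer.strip()


def respond(intent: str, llm_tokens: Iterable[str]) -> Iterator[str]:
    """Serve a cached answer when one exists; otherwise stream the LLM output."""
    cached = CACHED_ANSWERS.get(intent)
    if cached is not None:
        yield cached
        return
    yield from sentence_chunks(llm_tokens)
```

Each string yielded by `respond` would be sent to the TTS engine immediately, so the caller hears the first sentence while later ones are still being generated.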
Lesson 2: The 80/20 of calls is real
Across all our deployments:
- ~40% of calls are appointment/reservation booking (highly automatable)
- ~25% are FAQ (hours, location, directions, "do you take my insurance")
- ~15% are rescheduling/cancellations
- ~10% are complex enough to need a human
- ~10% are spam/robocalls (yes, your AI will talk to other AIs)
We built increasingly sophisticated handling for the top 80% (booking, FAQ, rescheduling) and increasingly fast escalation for the bottom 20%.
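A rough sketch of that split, assuming an upstream classifier returns an intent and a confidence score. The `Intent` names, the `route` function, the 0.7 threshold, and the choice to hang up on spam are all illustrative assumptions rather than details from our system.

```python
from enum import Enum


class Intent(Enum):
    BOOKING = "booking"
    FAQ = "faq"
    RESCHEDULE = "reschedule"
    COMPLEX = "complex"
    SPAM = "spam"


# The roughly 80% of calls that are worth automating end to end.
AUTOMATED = {Intent.BOOKING, Intent.FAQ, Intent.RESCHEDULE}


def route(intent: Intent, confidence: float, threshold: float = 0.7) -> str:
    """Decide what to do with a call once its intent is classified.

    The threshold is an assumed value; it would need tuning per deployment.
    """
    if intent is Intent.SPAM:
        return "hang_up"  # assumption: spam is dropped rather than escalated
    if intent in AUTOMATED and confidence >= threshold:
        return "automate"
    # Complex calls and low-confidence classifications go to a human quickly.
    return "escalate"
```

Escalating on low confidence, not only on the "complex" label, keeps misclassified calls from getting stuck in an automated flow.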
Lesson 3: After-hours is the real product
This surprised us. We pitched the product as "never miss a call," but the biggest impact turned out to be after-hours coverage specifically.
One dental practice's data:
- Total calls: ~45/day
- Calls between 6 PM and 9 AM: ~12/day (27%)
- Previously answered after-hours: 0
- Bookings from after-hours calls: 4-6/week
- Revenue impact: ~€5,000-7,000/month in new patient value
The AI doesn't need to be perfect during business hours when humans are available as backup. It needs to be GOOD after hours when the alternative is voicemail (which converts at roughly 0%).
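For anyone reproducing this kind of breakdown from call logs, here is a small sketch of the after-hours classification and the share calculation. It assumes timestamps are already in the practice's local time and ignores weekends and holidays; `OPEN`, `CLOSE`, and `is_after_hours` are hypothetical names.

```python
from datetime import datetime, time

# Business hours implied by the "6 PM to 9 AM" after-hours window above.
OPEN = time(9, 0)
CLOSE = time(18, 0)


def is_after_hours(ts: datetime) -> bool:
    """Return True if a call falls outside 9 AM to 6 PM local time."""
    return not (OPEN <= ts.time() < CLOSE)


# Share of after-hours calls for the practice above: ~12 of ~45 per day.
after_hours_share = 12 / 45
print(f"{after_hours_share:.0%}")  # prints "27%"
```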
Lesson 4: Multilingual is harder than you think
We serve Irish and Argentine markets, so English + Spanish support was a requirement. Issues we hit:
- Language detection: Callers often start in one language and switch mid-sentence. Code-switching is common in bilingual communities.
- Cultural expectations: Argentine Spanish callers expect more warmth and small talk than Irish English callers. Cutting to the chase can come across as rude to Argentine callers.
- Proper nouns: Names, street addresses, and business names in one language embedded in another language's sentence structure.
- Accent variation: Dublin English ≠ Cork English ≠ Belfast English. Argentine Spanish ≠ Mexican Spanish.
Solution: Separate conversation models per language with shared business logic. More expensive, much better results.
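One way to picture "separate conversation models, shared business logic" is the sketch below. It is an illustration under assumptions, not our actual architecture: `LanguageProfile`, `PROFILES`, `book_slot`, and `handle_booking` are hypothetical names, and the phrasing in each profile is placeholder text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    """Everything that varies by language and culture (wording, tone)."""
    code: str
    greeting: str
    confirm_template: str
    unavailable_template: str


PROFILES = {
    "en-IE": LanguageProfile(
        code="en-IE",
        greeting="Hello, how can I help?",
        confirm_template="You're booked for {slot}.",
        unavailable_template="Sorry, that slot is taken.",
    ),
    "es-AR": LanguageProfile(
        code="es-AR",
        greeting="¡Hola! ¿Cómo estás? ¿En qué te puedo ayudar?",
        confirm_template="¡Listo! Te anoté para las {slot}.",
        unavailable_template="Uy, ese horario ya está ocupado.",
    ),
}


def book_slot(taken: set, slot: str) -> bool:
    """Shared business logic: identical no matter which language is spoken."""
    if slot in taken:
        return False
    taken.add(slot)
    return True


def handle_booking(lang: str, taken: set, slot: str) -> str:
    """Run the shared logic, then phrase the result with the caller's profile."""
    profile = PROFILES[lang]
    if book_slot(taken, slot):
        return profile.confirm_template.format(slot=slot)
    return profile.unavailable_template
```

Keeping the scheduling rules out of the language profiles means a fix to the booking logic applies to every language at once.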
Lesson 5: Monitor everything
Every call is:
- Recorded (with consent notification)
- Transcribed
- Scored on resolution (did the caller get what they needed?)
- Flagged if the AI seemed confused or the caller expressed frustration
We review flagged calls weekly. This is how we find edge cases and improve. Without this loop, quality degrades silently.
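A simplified sketch of the flagging step, assuming each call record carries a transcript, a resolution label, and a count of how often the agent had to ask the caller to repeat. The keyword list and the `max_fallbacks` threshold are placeholder heuristics, and `CallRecord`, `flag_reasons`, and `resolution_rate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Placeholder phrases; a real list would be built from reviewed calls.
FRUSTRATION_PHRASES = ("speak to a human", "this is ridiculous", "not what i asked")


@dataclass
class CallRecord:
    call_id: str
    transcript: str
    resolved: bool        # did the caller get what they needed?
    fallback_count: int   # times the agent asked the caller to repeat


def flag_reasons(call: CallRecord, max_fallbacks: int = 2) -> List[str]:
    """Return the reasons a call should go into the weekly review queue."""
    reasons = []
    if call.fallback_count > max_fallbacks:
        reasons.append("agent_confused")
    text = call.transcript.lower()
    if any(phrase in text for phrase in FRUSTRATION_PHRASES):
        reasons.append("caller_frustrated")
    return reasons


def resolution_rate(calls: List[CallRecord]) -> float:
    """Fraction of calls marked resolved (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(call.resolved for call in calls) / len(calls)
```

A call with a non-empty `flag_reasons` result lands in the review queue; tracking `resolution_rate` week over week is what makes silent degradation visible.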
Lesson 6: The competitive moat is integration, not AI
Every AI voice platform uses similar LLMs and TTS engines. The actual differentiator is:
- How well you integrate with industry-specific software (Dentrix, OpenTable, ServiceTitan, etc.)
- How well you understand the business rules (a dental practice has different scheduling logic than a restaurant)
- How fast you can onboard a new business (we got this down to ~2 hours for dental)
The AI is becoming commoditized. The workflow around it isn't.
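In code, the integration layer tends to look like a thin interface with one adapter per system. The sketch below is hypothetical: `SchedulingBackend` and its methods are invented for illustration and do not reflect the real Dentrix, OpenTable, or ServiceTitan APIs. `InMemoryBackend` is a stand-in used only to show the shape.

```python
from typing import Dict, List, Optional, Protocol


class SchedulingBackend(Protocol):
    """Hypothetical interface each PMS/CRM adapter would implement."""

    def available_slots(self, date: str) -> List[str]: ...

    def book(self, date: str, slot: str, caller_name: str) -> bool: ...


class InMemoryBackend:
    """Stand-in adapter for local testing; a real one would call a vendor API."""

    def __init__(self, slots: Dict[str, List[str]]) -> None:
        # Copy the lists so booking does not mutate the caller's data.
        self._slots = {date: list(times) for date, times in slots.items()}

    def available_slots(self, date: str) -> List[str]:
        return list(self._slots.get(date, []))

    def book(self, date: str, slot: str, caller_name: str) -> bool:
        free = self._slots.get(date, [])
        if slot not in free:
            return False
        free.remove(slot)
        return True


def offer_first_slot(
    backend: SchedulingBackend, date: str, caller_name: str
) -> Optional[str]:
    """Book the earliest free slot on a date, or return None if there is none."""
    for slot in backend.available_slots(date):
        if backend.book(date, slot, caller_name):
            return slot
    return None
```

The conversation engine only ever talks to `SchedulingBackend`, so onboarding a business on a new system means writing one adapter rather than touching the call flow.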
Numbers after 6 months
| Metric | Value |
|---|---|
| Total calls handled | ~180,000 |
| Avg resolution rate | 87% |
| Avg call duration | 2.4 min |
| Customer satisfaction | 4.2/5 |
| Missed call reduction | 94% |
| p95 response latency | 650ms |
What's next
- Real-time sentiment detection to escalate faster when callers are frustrated
- Proactive outbound (appointment reminders, follow-ups)
- Multi-modal (voice + SMS in the same conversation)
- Better handling of multi-party calls
If you're building in this space, happy to chat. The field is moving incredibly fast and there's room for more players, especially in underserved verticals and non-English markets.
Built with love at VoiceFleet — AI voice agents for local businesses.