VoiceFleet

Posted on Apr 4 • Originally published at voicefleet.ai

6 Months of AI Receptionists in Production: What We Learned


We've been running AI voice agents that answer phone calls for small businesses (dental practices, restaurants, trades) for about 6 months now. Here's what actually happened in production vs. what we expected.

The Stack

Voice: ElevenLabs for synthesis, Deepgram for speech-to-text (STT)
Brain: LLM-powered conversation engine (GPT-4o / Claude for different use cases)
Orchestration: VAPI for call flow management
Integrations: Various PMS/CRM systems via API
Infrastructure: Redundant telephony with automatic failover
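
The post doesn't describe how failover works. As a minimal sketch, assume each telephony provider can be wrapped as a callable that either connects a call or raises. Failover can then be as simple as trying providers in priority order. `route_with_failover`, `AllProvidersFailed` and the two stand-in providers are hypothetical names.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the failover chain has failed."""


def route_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    call_ref: str,
) -> tuple[str, str]:
    """Try each (name, connect) pair in priority order.

    Returns (provider_name, session_id) from the first provider whose
    connect function succeeds; raises AllProvidersFailed if none do.
    """
    errors: list[str] = []
    for name, connect in providers:
        try:
            return name, connect(call_ref)
        except Exception as exc:  # a real system would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Usage with two stand-in providers (both hypothetical).
def primary(call_ref: str) -> str:
    raise ConnectionError("trunk unreachable")


def backup(call_ref: str) -> str:
    return f"backup-session-{call_ref}"


print(route_with_failover([("primary", primary), ("backup", backup)], "abc123"))
# ('backup', 'backup-session-abc123')
```

A production setup would also need health checks and timeouts, so that a hung provider fails fast instead of leaving the caller in silence.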

Lesson 1: Latency is everything

Users will tolerate a slightly imperfect response. They will NOT tolerate a 2-second pause before the response starts. We spent more engineering time on latency optimization than on conversation quality.

What worked:

Streaming TTS (start speaking before the full response is generated)
Aggressive caching for common responses (hours, location, basic FAQ)
Edge-deployed STT for faster transcription
Pre-computed response fragments for predictable conversation paths
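
Streaming TTS works only if the synthesizer receives speakable text as early as possible. One common approach is to buffer the LLM's streamed tokens and flush them at sentence boundaries. The sketch below assumes the LLM streams plain-text tokens. `speakable_chunks` is a hypothetical helper, not part of any vendor SDK.

```python
import re
from typing import Iterable, Iterator

# A buffer "ends a sentence" when its last non-space character is . ! or ?
_SENTENCE_END = re.compile(r"[.!?]\s*$")


def speakable_chunks(tokens: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks for streaming TTS.

    Each chunk is yielded as soon as the buffer ends a sentence and holds at
    least `min_chars` characters, so synthesis can start before the LLM is done.
    """
    buf = ""
    for tok in tokens:
        buf += tok
        if len(buf.strip()) >= min_chars and _SENTENCE_END.search(buf):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush the tail when the stream ends


tokens = ["We're open ", "9 to 5. ", "Would you ", "like to book", " a visit?"]
for chunk in speakable_chunks(tokens, min_chars=10):
    print(chunk)  # each chunk would be sent to the TTS engine here
# We're open 9 to 5.
# Would you like to book a visit?
```

Naive punctuation splitting will cut text like "Dr. Byrne" or "€5.50" in two. `min_chars` softens this, but a real pipeline still needs abbreviation and number handling.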

Our p95 response latency went from ~1800ms to ~650ms. That single improvement increased our "caller stayed on the line" rate from 72% to 91%.
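
A p95 of 650ms means 95% of responses started within 650ms. The post doesn't say which percentile definition was used, so this sketch assumes the nearest-rank method. It also assumes each sample measures the time from the end of the caller's utterance to the first audio played back.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


latencies_ms = [420, 380, 510, 640, 600, 450, 700, 1900, 480, 530]
print(percentile_nearest_rank(latencies_ms, 50))  # 510
print(percentile_nearest_rank(latencies_ms, 95))  # 1900
```

With only ten samples, the single 1,900ms outlier becomes the p95 while the median stays at 510ms. Tail percentiles, not averages, expose the long pauses callers notice.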

Lesson 2: The 80/20 of calls is real

Across all our deployments:

~40% of calls are appointment/reservation booking (highly automatable)
~25% are FAQ (hours, location, directions, "do you take my insurance")
~15% are rescheduling/cancellations
~10% are complex enough to need a human
~10% are spam/robocalls (yes, your AI will talk to other AIs)

We built increasingly sophisticated handling for the top 80% and increasingly fast exits for the remaining 20%: escalation to a human for complex calls, a quick hang-up for spam.
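
One way to express "automate the predictable majority, exit fast on the rest" is a routing table keyed on a classified intent. The intent labels, the `route` function and the 0.7 confidence threshold are assumptions for illustration. The post doesn't say how calls are classified.

```python
from enum import Enum


class Intent(Enum):
    BOOK = "book"
    FAQ = "faq"
    RESCHEDULE = "reschedule"
    COMPLEX = "complex"
    SPAM = "spam"


# Hypothetical handling table mirroring the call mix above.
ROUTES = {
    Intent.BOOK: "automate",           # ~40%
    Intent.FAQ: "automate",            # ~25%
    Intent.RESCHEDULE: "automate",     # ~15%
    Intent.COMPLEX: "escalate_to_human",  # ~10%
    Intent.SPAM: "end_call",           # ~10%
}


def route(intent: Intent, confidence: float, threshold: float = 0.7) -> str:
    """Pick an action for a classified call; any low-confidence label escalates."""
    if confidence < threshold:
        return "escalate_to_human"
    return ROUTES[intent]


print(route(Intent.FAQ, 0.92))  # automate
print(route(Intent.FAQ, 0.40))  # escalate_to_human
```

Falling back to a human whenever the classifier is unsure keeps mistakes cheap. A wrongly escalated FAQ costs staff a minute, while a wrongly automated complex call may cost the customer.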

Lesson 3: After-hours is the real product

This surprised us. We pitched it as "never miss a call," but the real transformation was after-hours coverage.

One dental practice's data:

Total calls: ~45/day
Calls between 6 PM and 9 AM: ~12/day (27%)
Previously answered after-hours: 0
Bookings from after-hours calls: 4-6/week
Revenue impact: ~€5,000-7,000/month in new patient value

The AI doesn't need to be perfect during business hours when humans are available as backup. It needs to be GOOD after hours when the alternative is voicemail (which converts at roughly 0%).
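
The practice's after-hours window (6 PM to 9 AM) wraps past midnight, so a single `start <= t < end` comparison won't work. This minimal sketch assumes call timestamps have already been converted to the practice's local time, for example with `zoneinfo.ZoneInfo("Europe/Dublin")`. It ignores weekends and holidays.

```python
from datetime import time


def is_after_hours(t: time, close: time = time(18, 0), open_: time = time(9, 0)) -> bool:
    """True if local time t falls in [close, midnight) or [midnight, open_)."""
    return t >= close or t < open_


def after_hours_share(call_times: list[time]) -> float:
    """Fraction of calls that arrived outside business hours."""
    if not call_times:
        return 0.0
    return sum(is_after_hours(t) for t in call_times) / len(call_times)


# A synthetic day matching the practice's averages: 12 of 45 calls outside 9 AM to 6 PM.
day = [time(19, 30)] * 12 + [time(11, 0)] * 33
print(f"{after_hours_share(day):.0%}")  # 27%
```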

Lesson 4: Multilingual is harder than you think

We serve Irish and Argentine markets, so English + Spanish support was a requirement. Issues we hit:

Language detection: Callers often start in one language and switch mid-sentence. Code-switching is common in bilingual communities.
Cultural expectations: Argentine Spanish callers expect more warmth and small talk than Irish English callers; in Argentine culture, cutting straight to the chase can come across as rude.
Proper nouns: Names, street addresses, and business names in one language embedded in another language's sentence structure.
Accent variation: Dublin English ≠ Cork English ≠ Belfast English. Argentine Spanish ≠ Mexican Spanish.

Solution: Separate conversation models per language with shared business logic. More expensive, much better results.
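
The post doesn't describe how the per-language models and the shared logic are wired together. This is a minimal sketch of that split. It assumes each language gets its own prompt and phrasing while slot-finding stays language-agnostic. `LanguageProfile`, `PROFILES` and the prompt text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def next_free_slot(booked: set[str], candidates: list[str]) -> Optional[str]:
    """Shared business logic: language-agnostic, returns data rather than phrasing."""
    for slot in candidates:
        if slot not in booked:
            return slot
    return None


@dataclass(frozen=True)
class LanguageProfile:
    """Per-language conversation settings (hypothetical fields)."""
    system_prompt: str  # would be passed to that language's conversation model (not shown)
    offer_slot: str     # template containing {slot}
    no_slots: str


PROFILES = {
    "en-IE": LanguageProfile(
        system_prompt="Friendly, efficient receptionist. Get to the point.",
        offer_slot="The next free slot is {slot}. Shall I book it?",
        no_slots="Sorry, we're fully booked that day.",
    ),
    "es-AR": LanguageProfile(
        system_prompt="Warm receptionist. Use voseo and a little small talk first.",
        offer_slot="¡Perfecto! El próximo turno libre es a las {slot}. ¿Te lo reservo?",
        no_slots="Uy, lo siento mucho, ese día no nos quedan turnos.",
    ),
}


def offer_next_slot(lang: str, booked: set[str], candidates: list[str]) -> str:
    profile = PROFILES[lang]                   # language-specific phrasing
    slot = next_free_slot(booked, candidates)  # identical logic for every language
    return profile.offer_slot.format(slot=slot) if slot else profile.no_slots


print(offer_next_slot("es-AR", {"10:00"}, ["10:00", "10:30"]))
# ¡Perfecto! El próximo turno libre es a las 10:30. ¿Te lo reservo?
```

Keeping the business rules out of the per-language layer means a scheduling fix lands once, not once per language.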

Lesson 5: Monitor everything

Every call is:

Recorded (with consent notification)
Transcribed
Scored on resolution (did the caller get what they needed?)
Flagged if the AI seemed confused or the caller expressed frustration

We review flagged calls weekly. This is how we find edge cases and improve. Without this loop, quality degrades silently.
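
The post says calls are scored on resolution and flagged for confusion or frustration, but not how. The keyword heuristics below are a hypothetical stand-in, so treat this as a sketch of the flagging step only. A production system might use a sentiment model or an LLM judge instead.

```python
# Hypothetical phrase lists; real systems would tune or replace these.
FRUSTRATION_PHRASES = ("real person", "speak to someone", "not listening", "ridiculous")
CONFUSION_PHRASES = ("didn't catch", "could you repeat")


def flag_reasons(
    transcript: list[tuple[str, str]], resolved: bool, max_reprompts: int = 2
) -> list[str]:
    """Return why a call should go to weekly human review (empty list = no flag).

    transcript: (speaker, text) turns, with speaker either "caller" or "agent".
    """
    reasons = []
    caller = [text.lower() for who, text in transcript if who == "caller"]
    agent = [text.lower() for who, text in transcript if who == "agent"]
    if not resolved:
        reasons.append("unresolved")
    if any(p in line for line in caller for p in FRUSTRATION_PHRASES):
        reasons.append("caller_frustration")
    # Count agent turns that ask the caller to repeat themselves.
    reprompts = sum(any(p in line for p in CONFUSION_PHRASES) for line in agent)
    if reprompts > max_reprompts:
        reasons.append("agent_confused")
    return reasons


call = [
    ("agent", "Hi, how can I help?"),
    ("caller", "I need to move my appointment"),
    ("agent", "Sorry, I didn't catch that."),
    ("caller", "MOVE my appointment"),
    ("agent", "Sorry, I didn't catch that."),
    ("agent", "Sorry, I didn't catch that."),
    ("caller", "Can I speak to a real person?"),
]
print(flag_reasons(call, resolved=False))
# ['unresolved', 'caller_frustration', 'agent_confused']
```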

Lesson 6: The competitive moat is integration, not AI

Every AI voice platform uses similar LLMs and TTS engines. The actual differentiator is:

How well you integrate with industry-specific software (Dentrix, OpenTable, ServiceTitan, etc.)
How well you understand the business rules (a dental practice has different scheduling logic than a restaurant)
How fast you can onboard a new business (we got this down to ~2 hours for dental)

The AI is becoming commoditized. The workflow around it isn't.
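
Integration work tends to end up behind an adapter boundary. The conversation engine then asks for slots and bookings the same way whether the back end is a dental practice-management system, a restaurant reservation tool or a field-service platform. This is a hypothetical interface, not the Dentrix, OpenTable or ServiceTitan API. The in-memory adapter models only one business rule: fixed opening hours with no overlapping appointments.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


class SchedulingAdapter(ABC):
    """Hypothetical boundary between the conversation engine and a booking system."""

    @abstractmethod
    def free_slots(self, day: datetime, duration: timedelta) -> list[datetime]: ...

    @abstractmethod
    def book(self, start: datetime, duration: timedelta, caller: str) -> str: ...


class InMemoryAdapter(SchedulingAdapter):
    """Toy stand-in for a vendor integration: fixed hours, no overlapping bookings."""

    def __init__(self, open_hour: int = 9, close_hour: int = 17, step_min: int = 15):
        self.open_hour, self.close_hour = open_hour, close_hour
        self.step = timedelta(minutes=step_min)
        self.bookings: list[tuple[datetime, datetime, str]] = []

    def _clashes(self, start: datetime, end: datetime) -> bool:
        # Half-open intervals [start, end) overlap iff each starts before the other ends.
        return any(start < b_end and b_start < end for b_start, b_end, _ in self.bookings)

    def free_slots(self, day: datetime, duration: timedelta) -> list[datetime]:
        t = day.replace(hour=self.open_hour, minute=0, second=0, microsecond=0)
        close = day.replace(hour=self.close_hour, minute=0, second=0, microsecond=0)
        slots = []
        while t + duration <= close:
            if not self._clashes(t, t + duration):
                slots.append(t)
            t += self.step
        return slots

    def book(self, start: datetime, duration: timedelta, caller: str) -> str:
        end = start + duration
        if self._clashes(start, end):
            raise ValueError("slot is no longer free")
        self.bookings.append((start, end, caller))
        return f"appt-{len(self.bookings)}"


pms = InMemoryAdapter()
monday = datetime(2025, 1, 6)
first = pms.free_slots(monday, timedelta(minutes=30))[0]
print(first, pms.book(first, timedelta(minutes=30), "caller-1"))
# 2025-01-06 09:00:00 appt-1
```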

Numbers after 6 months

Metric	Value
Total calls handled	~180,000
Avg resolution rate	87%
Avg call duration	2.4 min
Customer satisfaction	4.2/5
Missed call reduction	94%
p95 response latency	650ms
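
The post doesn't define how each metric in the table is calculated. This sketch assumes every call carries a resolution flag, a duration and an optional 1-5 satisfaction score. It also assumes missed-call reduction compares missed calls before and after deployment over comparable periods. All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class CallRecord:
    resolved: bool               # caller got what they needed, per the resolution score
    duration_s: float            # call length in seconds
    csat: Optional[int] = None   # 1-5 survey score, None if the caller skipped it


def summarize(calls: Sequence[CallRecord]) -> dict[str, float]:
    if not calls:
        raise ValueError("no calls")
    rated = [c.csat for c in calls if c.csat is not None]
    return {
        "resolution_rate": sum(c.resolved for c in calls) / len(calls),
        "avg_duration_min": mean(c.duration_s for c in calls) / 60,
        "csat": mean(rated) if rated else float("nan"),
    }


def missed_call_reduction(missed_before: int, missed_after: int) -> float:
    """Relative drop in missed calls, e.g. 0.94 for a 94% reduction."""
    return (missed_before - missed_after) / missed_before


calls = [CallRecord(True, 120, 4), CallRecord(True, 168, 5), CallRecord(False, 144)]
print(summarize(calls))
print(f"{missed_call_reduction(50, 3):.0%}")  # 94%
```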

What's next

Real-time sentiment detection to escalate faster when callers are frustrated
Proactive outbound (appointment reminders, follow-ups)
Multi-modal (voice + SMS in the same conversation)
Better handling of multi-party calls

If you're building in this space, we're happy to chat. The field is moving incredibly fast, and there's room for more players, especially in underserved verticals and non-English markets.

Built with love at VoiceFleet — AI voice agents for local businesses.
