VoiceFleet

Originally published at voicefleet.ai

How AI Receptionists Work: A Technical Deep Dive into Dental Practice Phone Automation

I keep seeing "AI receptionist" thrown around without anyone explaining what's actually happening under the hood. Here's a technical breakdown of how the call flow works for dental practices.

The 6-Step Pipeline

1. Call Routing (0-2s)

Standard SIP forwarding. Three modes: primary (AI first), overflow (human first, AI fallback after 3-4 rings), after-hours only. No hardware needed at the practice.
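
A minimal sketch of the three routing modes as a decision function. The names (`RoutingMode`, `route_call`), the opening hours, and the four-ring threshold are illustrative assumptions, not VoiceFleet's actual configuration; in practice this decision would sit in the SIP forwarding rules rather than in application code.

```python
from datetime import time
from enum import Enum


class RoutingMode(Enum):
    PRIMARY = "primary"          # AI answers first
    OVERFLOW = "overflow"        # human first, AI after N unanswered rings
    AFTER_HOURS = "after_hours"  # AI only outside opening hours


def route_call(mode, now, rings_unanswered,
               opens=time(9, 0), closes=time(17, 30), overflow_rings=4):
    """Return "ai" or "human" for an incoming call.

    opens/closes/overflow_rings are hypothetical defaults for illustration.
    """
    if mode is RoutingMode.PRIMARY:
        return "ai"
    if mode is RoutingMode.OVERFLOW:
        return "ai" if rings_unanswered >= overflow_rings else "human"
    # AFTER_HOURS: the practice answers during opening hours, the AI otherwise.
    return "human" if opens <= now < closes else "ai"
```

For example, `route_call(RoutingMode.OVERFLOW, time(11, 0), rings_unanswered=4)` hands the call to the AI, while the same call with two rings stays with the front desk.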

2. ASR — Speech Recognition (Real-Time)

Converting speech to text at 95-97% accuracy. The tricky parts:

  • Regional accents (Irish English has significant variation between Dublin, Cork, rural)
  • Domain-specific vocabulary ("periapical abscess", "composite veneer", "occlusal splint")
  • Noisy environments (caller in car, on street)
  • Interruptions and corrections
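
ASR accuracy figures like the one above are commonly reported as one minus the word error rate (WER). Whether the 95-97% figure was measured that way is an assumption here. The sketch below computes WER with a word-level edit distance and shows how a single split clinical term costs two errors.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# "periapical" heard as "peri apical": one substitution plus one insertion.
wer = word_error_rate("i have a periapical abscess",
                      "i have a peri apical abscess")
```

Here `wer` is 2 errors over 5 reference words, which is why domain vocabulary has a large effect on short utterances.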

3. Intent + Entity Extraction (50-200ms)

LLM processes the transcript and determines:

  • Intent: book, cancel, reschedule, ask question, report emergency
  • Entities: dates, dentist preferences, treatment type, patient name
  • Sentiment: calm, anxious, in pain

Example input: "I was in last week for a filling and it's still quite sore"
→ Intent: post-treatment concern (triage trigger)
→ Entities: patient context (recent filling), symptom (pain)
→ Action: run emergency triage protocol
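
One way to make the extraction step robust is to have the LLM return JSON and validate it before acting on it. The schema below is a hypothetical sketch: the field names, the label sets, and the `needs_triage` rule are assumptions chosen to mirror the example above, not a documented VoiceFleet format.

```python
import json
from dataclasses import dataclass, field

# Hypothetical label sets based on the intents and sentiments listed above.
INTENTS = {"book", "cancel", "reschedule", "question", "emergency",
           "post_treatment_concern"}
SENTIMENTS = {"calm", "anxious", "in_pain"}


@dataclass
class Extraction:
    intent: str
    entities: dict = field(default_factory=dict)
    sentiment: str = "calm"

    @property
    def needs_triage(self):
        # Assumed rule: emergencies, post-treatment concerns and callers
        # in pain all go through the triage protocol.
        return (self.intent in {"emergency", "post_treatment_concern"}
                or self.sentiment == "in_pain")


def parse_extraction(raw_json):
    """Validate the model's JSON output and reject unknown labels."""
    data = json.loads(raw_json)
    intent = data.get("intent")
    sentiment = data.get("sentiment", "calm")
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment!r}")
    return Extraction(intent, data.get("entities", {}), sentiment)


raw = ('{"intent": "post_treatment_concern", '
       '"entities": {"recent_treatment": "filling", "symptom": "pain"}, '
       '"sentiment": "in_pain"}')
result = parse_extraction(raw)
```

Rejecting labels outside the known sets means a malformed model response fails loudly instead of silently booking or cancelling something.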

4. PMS Query (200-500ms)

This is where it gets interesting. The AI connects to practice management systems (Dentally, SOE, Exact, Carestream) via API and:

  • Queries real-time appointment availability
  • Respects booking rules (appointment types, durations, provider assignments)
  • Checks patient records (returning patient? usual provider?)
  • Applies business logic (new patients get 45-min slots, emergencies get same-day)

The appointment is booked and in the diary before the call ends. No "someone will call you back."
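
The booking rules above can be sketched as a filter over the free slots returned by the PMS. Everything here is illustrative: `Slot`, `pick_slot`, and the 30-minute default for returning patients are assumptions, and a real integration would go through the specific PMS vendor's API rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Slot:
    start: datetime
    minutes: int
    provider: str


def required_minutes(is_new_patient, default=30):
    # From the article: new patients get 45-minute slots.
    # The 30-minute default for everyone else is an assumption.
    return 45 if is_new_patient else default


def pick_slot(slots, now, is_new_patient=False, is_emergency=False,
              usual_provider=None):
    """Return the best free slot, or None if nothing fits the rules."""
    need = required_minutes(is_new_patient)
    candidates = [s for s in slots if s.start > now and s.minutes >= need]
    if is_emergency:
        # From the article: emergencies get same-day appointments.
        candidates = [s for s in candidates if s.start.date() == now.date()]
    # Prefer the patient's usual provider, then the earliest start time.
    candidates.sort(key=lambda s: (s.provider != usual_provider, s.start))
    return candidates[0] if candidates else None


now = datetime(2024, 5, 6, 9, 0)
free = [
    Slot(datetime(2024, 5, 6, 10, 0), 30, "Dr A"),
    Slot(datetime(2024, 5, 6, 14, 0), 45, "Dr B"),
    Slot(datetime(2024, 5, 7, 9, 0), 45, "Dr A"),
]
new_patient_slot = pick_slot(free, now, is_new_patient=True)
```

In this example the new patient is offered the 14:00 slot with Dr B, because the 10:00 slot is too short for a 45-minute appointment.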

5. Response Generation (100-300ms)

LLM generates contextual response → TTS with natural prosody. Modern TTS includes pauses, intonation, even filler words ("let me check that for you").

6. Conversational Loop

Steps 2-5 repeat. Full context maintained throughout. Handles topic switches, corrections, multi-part requests.
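
A rough sketch of how context might be carried across turns: each pass through steps 2-5 merges newly extracted entities into a running state, so a later correction simply overwrites the earlier value. `ConversationState` and its fields are hypothetical names, not part of any described product API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationState:
    intent: Optional[str] = None
    entities: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, transcript, intent=None, entities=None):
        """Record one caller turn and merge what was extracted from it."""
        self.history.append(transcript)
        if intent is not None:
            self.intent = intent             # topic switch replaces the intent
        self.entities.update(entities or {})  # corrections overwrite old values
        return self


state = ConversationState()
state.update("I'd like a check-up on Tuesday",
             intent="book", entities={"treatment": "check-up", "day": "Tuesday"})
state.update("Sorry, I meant Thursday", entities={"day": "Thursday"})
```

After the second turn the state still holds the booking intent and the treatment type, but the day has been corrected to Thursday.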

Total per-exchange latency: 400ms-1s.
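
As a sanity check on that total, the per-step ranges quoted for steps 3-5 can be summed directly. ASR is listed as real-time and has no figure attached, so it is left out of this sketch.

```python
# (min_ms, max_ms) per stage, taken from the step headings above.
STAGES_MS = {
    "intent_entity_extraction": (50, 200),
    "pms_query": (200, 500),
    "response_generation": (100, 300),
}


def total_budget(stages):
    """Sum the lower and upper bounds across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low_ms, high_ms = total_budget(STAGES_MS)
```

The itemised stages add up to 350 ms at best and 1,000 ms at worst, which is close to the quoted 400ms-1s range; the remaining time at the low end is not itemised in the figures above. The PMS query is the largest single contributor in both cases.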

What's Automatable vs. What Needs Humans

~75% of dental practice calls follow predictable patterns: booking, confirming, rescheduling, directions, insurance queries, treatment FAQs. All automatable.

The remaining 25% (emergencies, complex treatment discussions, complaints, billing disputes) get warm-transferred to a human with a conversation summary, so the patient never has to repeat themselves.
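
A minimal sketch of the hand-off decision, assuming the four categories above map to intent labels. The label names, `Handoff`, and `plan_handoff` are hypothetical; the point is only that the summary travels with the transfer.

```python
from dataclasses import dataclass

# Hypothetical labels for the call types that go to a human.
HUMAN_ONLY = {"emergency", "complex_treatment", "complaint", "billing_dispute"}


@dataclass
class Handoff:
    reason: str
    summary: str


def plan_handoff(intent, caller_name, facts):
    """Return a Handoff for calls that need a human, otherwise None."""
    if intent not in HUMAN_ONLY:
        return None
    lines = [f"Caller: {caller_name}",
             f"Reason: {intent.replace('_', ' ')}"]
    lines += [f"- {fact}" for fact in facts]
    return Handoff(reason=intent, summary="\n".join(lines))


handoff = plan_handoff(
    "billing_dispute",
    "Mary Byrne",
    ["Charged twice for a hygiene visit", "Wants a refund to the original card"],
)
```

The receptionist who picks up sees the caller's name, the reason, and the key facts before saying hello.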

The Security Stack

For health data:

  • AES-256 at rest, TLS 1.3 in transit
  • EU data centres only (GDPR)
  • Configurable retention (default 90 days, auto-delete)
  • No model training on patient data
  • DPA and DPIA support
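
The retention rule can be sketched as a periodic purge that drops anything older than the configured window. The record shape (a dict with a `created_at` timestamp) and the function names are assumptions for illustration; a production system would delete from its datastore and backups, not filter a list.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, now, retention_days=90):
    """True once a record has reached the end of its retention window."""
    return now - created_at >= timedelta(days=retention_days)


def purge(records, now, retention_days=90):
    """Return only the records that are still inside the retention window."""
    return [r for r in records
            if not is_expired(r["created_at"], now, retention_days)]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": "call-1", "created_at": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    {"id": "call-2", "created_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
kept = purge(records, now)
```

Here `call-1` is 121 days old and is dropped, while `call-2` is kept. Passing a different `retention_days` covers practices that configure a shorter or longer window.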

Interesting Stat

92% of callers don't realise they're talking to AI. The average AI call lasts 2:15 (min:sec) versus 3:40 for a human; the AI is faster because it has instant PMS access.


Full guide with setup details →