
Sam


Building an AI Voice Agent for Appointment Booking: What I Learned

Over the past few months I’ve been building VoiceIntego, an AI voice agent that answers calls and books appointments for service businesses (dental clinics, HVAC, plumbing). Here are some of the technical lessons that surprised me along the way.

  1. Latency is the whole game

With text chatbots, a 2-second delay is fine. On a phone call, anything over ~800ms feels broken, and people start talking over the AI. The hard part isn’t the LLM response on its own; it’s the full round trip of speech-to-text → LLM → text-to-speech. You have to stream every stage and start TTS before the full response has been generated.
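
One way to start TTS early is to cut the LLM's token stream into sentences and hand each one to the synthesizer as soon as it closes. Here is a minimal sketch of that idea. `sentences_from_tokens` is a hypothetical helper, and the plain list of tokens stands in for whatever streaming API your LLM exposes. The punctuation rule is deliberately naive and will split too early on things like "Dr." or "2 p.m.".

```python
from typing import Iterable, Iterator

# Assumption: a sentence ends at one of these characters.
SENTENCE_ENDINGS = (".", "?", "!")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as it is complete in the token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush as soon as the buffered text ends with sentence punctuation,
        # so TTS can start speaking while the LLM is still generating.
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    # Whatever is left when the stream ends is spoken as a final chunk.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_tokens = ["Sure", ",", " I", " can", " help", ".", " What", " day", " works", "?"]
    for sentence in sentences_from_tokens(llm_tokens):
        print(sentence)  # in a real pipeline this would go to the TTS engine
```

Because this is a generator, the first sentence is available to the TTS stage before the later tokens have even arrived.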

  2. Interruptions break naive pipelines

Real callers interrupt. “Actually, can we do Tuesday instead—” mid-sentence. A simple request/response loop can’t handle this. You need barge-in detection: monitor the incoming audio stream and cancel the current TTS playback the moment the caller starts speaking again.
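
The cancel-on-speech idea can be sketched with `asyncio`: playback runs as a task, and it gets cancelled the moment a "caller is speaking" event fires. Everything here is a stand-in. `play_tts` fakes playback with a short sleep, and the `asyncio.Event` is assumed to be set by whatever voice activity detector you use on the incoming audio.

```python
import asyncio


async def play_tts(chunks, played):
    """Stand-in for TTS playback: 'plays' one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0.05)  # simulated playback time per chunk
        played.append(chunk)


async def speak_with_barge_in(chunks, caller_speaking: asyncio.Event):
    """Play chunks, but stop as soon as caller_speaking is set."""
    played = []
    playback = asyncio.create_task(play_tts(chunks, played))
    barge_in = asyncio.create_task(caller_speaking.wait())

    # Wake up on whichever happens first: playback ends or the caller talks.
    await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = not playback.done()

    # Cancel whichever task is still running and let it unwind cleanly.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted


async def demo(barge_in_after=None):
    caller_speaking = asyncio.Event()
    if barge_in_after is not None:
        # Simulate the speech detector firing partway through playback.
        asyncio.get_running_loop().call_later(barge_in_after, caller_speaking.set)
    chunks = ["Your", "appointment", "is", "on", "Tuesday"]
    return await speak_with_barge_in(chunks, caller_speaking)


if __name__ == "__main__":
    print(asyncio.run(demo(barge_in_after=0.12)))
```

Returning the chunks that were actually played is useful in practice, since the conversation history should reflect what the caller heard rather than what the model generated.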

  3. Booking logic needs guardrails, not vibes

Letting the LLM “decide” availability is a recipe for double-bookings. The reliable pattern: the LLM extracts intent (date, time, service), then deterministic code checks the actual calendar API and confirms. The model handles language; your code handles truth.
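
A rough sketch of that split, assuming the LLM has already produced a structured request. `BookingRequest`, `is_slot_free`, and `try_book` are hypothetical names, and the in-memory `booked` list stands in for a real calendar API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BookingRequest:
    """What the LLM extracts from the call. It never decides availability."""
    service: str
    start: datetime
    duration: timedelta


def is_slot_free(request, booked):
    """Return True if the request overlaps none of the (start, end) pairs."""
    end = request.start + request.duration
    return all(end <= b_start or request.start >= b_end for b_start, b_end in booked)


def try_book(request, booked):
    """Book only if the deterministic check passes; report the outcome."""
    if not is_slot_free(request, booked):
        return False
    booked.append((request.start, request.start + request.duration))
    return True


if __name__ == "__main__":
    booked = []  # stand-in for the calendar's existing appointments
    two_pm = datetime(2025, 9, 9, 14, 0)
    print(try_book(BookingRequest("cleaning", two_pm, timedelta(hours=1)), booked))
    # A second request that overlaps the first is refused by code, not by the model.
    overlapping = BookingRequest("cleaning", two_pm + timedelta(minutes=30), timedelta(hours=1))
    print(try_book(overlapping, booked))
```

With a real calendar backend, the check and the write should happen as one atomic operation on the calendar side, otherwise two simultaneous calls can still race for the same slot.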

  4. Confirmation loops matter more than you’d think

Always read the booking back: “So that’s a cleaning on Tuesday the 9th at 2pm — correct?” Phone audio is noisy and names/times get misheard constantly. One extra confirmation turn cuts errors dramatically.
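
It helps to build the read-back from the structured booking rather than from the model's free text, so the caller hears exactly what will be written to the calendar. A small sketch, with hypothetical helper names and an English-only format:

```python
from datetime import datetime


def ordinal(day: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def spoken_time(when: datetime) -> str:
    """Format a time the way a person would say it, e.g. '2pm' or '2:30pm'."""
    hour = when.hour % 12 or 12
    meridiem = "am" if when.hour < 12 else "pm"
    minutes = f":{when.minute:02d}" if when.minute else ""
    return f"{hour}{minutes}{meridiem}"


def confirmation_prompt(service: str, when: datetime) -> str:
    """Build the sentence the agent reads back before committing the booking."""
    return (
        f"So that's a {service} on {when:%A} the {ordinal(when.day)} "
        f"at {spoken_time(when)}. Is that correct?"
    )


if __name__ == "__main__":
    print(confirmation_prompt("cleaning", datetime(2025, 9, 9, 14, 0)))
```

Note that `%A` follows the process locale, so the weekday name comes out in English only under the default or an English locale.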

  5. Phone numbers and edge cases everywhere

Voicemail detection, callers who mumble, background noise, people who say “yeah” to mean no. The happy path is maybe 20% of the work.
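
For the "yeah that means no" case, one defensive option is to treat any mixed or unrecognized reply as unclear and ask again, rather than guessing. This is only an illustrative sketch: the keyword sets are made up for the example, and a production system would more likely lean on the LLM or a trained intent classifier.

```python
import re

# Assumption: tiny hand-picked keyword sets, for illustration only.
AFFIRM = {"yes", "yeah", "yep", "correct", "right", "sure"}
NEGATE = {"no", "nope", "not", "wrong", "actually", "instead", "wait"}


def classify_reply(utterance: str) -> str:
    """Return 'confirmed', 'rejected', or 'unclear' for a confirmation reply."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    has_yes = bool(words & AFFIRM)
    has_no = bool(words & NEGATE)
    if has_yes and not has_no:
        return "confirmed"
    if has_no and not has_yes:
        return "rejected"
    # Mixed signals ("yeah, no...") or nothing recognizable: ask again.
    return "unclear"


if __name__ == "__main__":
    for reply in ["Yep, that's right", "Yeah, no, Tuesday doesn't work", "mmhmm"]:
        print(reply, "->", classify_reply(reply))
```

The point of the third bucket is that a wrong "confirmed" costs a bad booking, while an extra "Sorry, was that a yes?" costs a couple of seconds.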

If you’re building something in this space, happy to compare notes. You can see what I’m working on at VoiceIntego.
