<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Autor Technologies Inc.</title>
    <description>The latest articles on DEV Community by Autor Technologies Inc. (@autor_tech).</description>
    <link>https://dev.to/autor_tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842303%2Fbf0b0e32-20ca-43ae-b5aa-7032797fc21e.png</url>
      <title>DEV Community: Autor Technologies Inc.</title>
      <link>https://dev.to/autor_tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/autor_tech"/>
    <language>en</language>
    <item>
      <title>We Analyzed 10,000 Automated Healthcare Voice Calls — Here's What We Found</title>
      <dc:creator>Autor Technologies Inc.</dc:creator>
      <pubDate>Tue, 31 Mar 2026 16:24:13 +0000</pubDate>
      <link>https://dev.to/autor_tech/we-analyzed-10000-automated-healthcare-voice-calls-heres-what-we-found-32me</link>
      <guid>https://dev.to/autor_tech/we-analyzed-10000-automated-healthcare-voice-calls-heres-what-we-found-32me</guid>
      <description>&lt;p&gt;Last October, we hit a milestone at Autor that I didn't see coming: Loquent, our production voice AI platform, processed its 10,000th automated healthcare call. Instead of celebrating, we did what any team of engineers would do — we pulled the data, locked ourselves in a room for a week, and tore apart every single pattern we could find.&lt;/p&gt;

&lt;p&gt;What we discovered changed how we build voice AI. Some of it confirmed our assumptions. Most of it didn't.&lt;/p&gt;

&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;For context, Loquent handles automated calls for healthcare and dental clinics across Canada. We're talking appointment scheduling, confirmations, cancellations, insurance verification questions, and general intake routing. The system runs 24/7 on a stack built with Twilio for telephony, Anthropic Claude for conversation intelligence, Deepgram for speech-to-text, and ElevenLabs for text-to-speech. We built the first version in under 8 weeks and have been iterating on it for the past six months.&lt;/p&gt;

&lt;p&gt;The 10,000 calls in this dataset span 14 clinic clients — a mix of dental offices, family practices, and specialist clinics in Ontario and British Columbia. Call durations ranged from 12 seconds (hang-ups) to 14 minutes (complex scheduling with insurance questions). The median call was 2 minutes and 38 seconds.&lt;/p&gt;

&lt;p&gt;Here's what the data told us.&lt;/p&gt;

&lt;h2&gt;Finding 1: 73% of Calls Follow Just 4 Patterns&lt;/h2&gt;

&lt;p&gt;We categorized every call by intent. Out of the dozens of potential reasons someone calls a clinic, four patterns dominated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Appointment booking: 31%&lt;/li&gt;
&lt;li&gt;Appointment confirmation/change: 24%&lt;/li&gt;
&lt;li&gt;Cancellation: 11%&lt;/li&gt;
&lt;li&gt;"Am I covered for this?" (insurance/billing questions): 7%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 73% of all inbound call volume handled by four well-defined flows. The remaining 27% was a grab bag — prescription refill requests, referral follow-ups, directions to the clinic, and a surprising number of people just wanting to talk to "a real person" about nothing specific.&lt;/p&gt;

&lt;p&gt;This matters because it means you don't need a general-purpose conversational AI to handle the majority of healthcare front-desk calls. You need four really good, tightly scoped flows with clean handoff logic for everything else. We spent months trying to make Loquent handle every possible conversation gracefully. The data told us to stop doing that and instead make those four flows bulletproof.&lt;/p&gt;
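&lt;p&gt;As a rough illustration of what "four tight flows plus clean handoff" looks like, here's a minimal routing sketch. The intent names and flow handlers are ours for illustration, not Loquent's actual code:&lt;/p&gt;

```python
# Minimal sketch of "four scoped flows + handoff": classify the caller's
# intent, dispatch to a dedicated flow, and fall back to a human for
# everything else. Flow names are illustrative.

CORE_FLOWS = {
    "book": lambda ctx: "booking flow",
    "confirm_or_change": lambda ctx: "confirmation flow",
    "cancel": lambda ctx: "cancellation flow",
    "insurance": lambda ctx: "insurance Q&A flow",
}

def route_call(intent: str, ctx: dict) -> str:
    """Send the 73% to a scoped flow; everything else to a human."""
    flow = CORE_FLOWS.get(intent)
    if flow is None:
        return "handoff: transferring to front desk"
    return flow(ctx)
```

&lt;p&gt;The point of the structure is that the long tail never touches the core flows: an unrecognized intent short-circuits straight to the handoff path.&lt;/p&gt;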

&lt;h2&gt;Finding 2: Latency Tolerance Is Exactly 1.8 Seconds&lt;/h2&gt;

&lt;p&gt;We measured caller drop-off rates against our system's response latency — the time between when a caller finishes speaking and when the AI begins its response. The data was clear: at 1.2 seconds or less, drop-off rates were near zero. Between 1.2 and 1.8 seconds, drop-off crept up slightly. Above 1.8 seconds, we saw a cliff. Callers either hung up or started talking over the AI, derailing the conversation.&lt;/p&gt;

&lt;p&gt;1.8 seconds. That's your budget for the entire pipeline: speech-to-text transcription, LLM inference, text-to-speech generation, and audio delivery back through Twilio. In practice, this means we run Deepgram's streaming transcription (~300ms), Claude Haiku for most routine responses (~400-600ms), and ElevenLabs with their Turbo v2 model (~350ms). At the fast end that totals roughly 1.05 seconds, which leaves only about 150-200ms of network overhead before we drift past the 1.2-second comfort zone the drop-off data points to.&lt;/p&gt;
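&lt;p&gt;The budget arithmetic is easy to sanity-check in code. A rough sketch using the approximate per-stage numbers from this post; real stage timings vary call to call:&lt;/p&gt;

```python
# Back-of-envelope latency budget for the STT -> LLM -> TTS pipeline.
# A sketch, not a benchmark: the stage numbers are the rough figures
# quoted in the post.

BUDGET_MS = 1800  # the drop-off cliff observed in the call data

def pipeline_latency(stt_ms: int, llm_ms: int, tts_ms: int, network_ms: int) -> int:
    """Total response latency is the sum of every stage in the loop."""
    return stt_ms + llm_ms + tts_ms + network_ms

def within_budget(total_ms: int, budget_ms: int = BUDGET_MS) -> bool:
    return total_ms <= budget_ms

# Typical fast path: streaming STT ~300ms, Haiku ~400ms, TTS ~350ms.
fast_path = pipeline_latency(300, 400, 350, 150)   # 1200 ms
slow_path = pipeline_latency(300, 600, 350, 600)   # 1850 ms, over budget
```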

&lt;p&gt;For complex queries where we need Claude Sonnet's reasoning — like disambiguating between similar appointment types or handling multi-step insurance questions — we've built a "thinking buffer" that plays a natural filler phrase ("Let me check that for you...") to buy an extra 2-3 seconds. This single trick reduced our complex-query drop-off rate by 41%.&lt;/p&gt;
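&lt;p&gt;The thinking-buffer idea can be sketched with plain asyncio: kick off the slow model call, speak the filler while it runs, then speak the answer. Here &lt;code&gt;play_audio&lt;/code&gt; and &lt;code&gt;ask_model&lt;/code&gt; are stand-ins for real TTS and LLM calls:&lt;/p&gt;

```python
# Sketch of the "thinking buffer": for a query routed to a slower model,
# speak a filler phrase first so the caller hears something within the
# latency budget while inference runs concurrently.
import asyncio

async def play_audio(text: str, spoken: list) -> None:
    spoken.append(text)          # stand-in for streaming TTS audio out

async def ask_model(question: str) -> str:
    await asyncio.sleep(0)       # stand-in for a slow Sonnet call
    return f"answer to: {question}"

async def answer_with_buffer(question: str, complex_query: bool, spoken: list) -> None:
    task = asyncio.create_task(ask_model(question))  # start inference now
    if complex_query:
        # The filler buys a couple of extra seconds of perceived latency.
        await play_audio("Let me check that for you...", spoken)
    await play_audio(await task, spoken)
```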

&lt;h2&gt;Finding 3: Morning Callers Are 2.3x More Patient Than Afternoon Callers&lt;/h2&gt;

&lt;p&gt;This one surprised us. We segmented call behavior by time of day and found a pattern so consistent it changed our system design.&lt;/p&gt;

&lt;p&gt;Callers between 8am and 11am had an average interaction length of 3 minutes 12 seconds and tolerated longer AI response times before dropping off. Callers between 2pm and 5pm averaged 1 minute 54 seconds and were significantly more likely to request a human transfer.&lt;/p&gt;

&lt;p&gt;Our theory: morning callers are often calling during a planned moment — they're at their desk, coffee in hand, checking things off a list. Afternoon callers are squeezing in a call between meetings or during a break. They want speed.&lt;/p&gt;

&lt;p&gt;We now dynamically adjust Loquent's behavior based on time of day. Afternoon calls get shorter confirmations, faster routing, and more aggressive escalation to human staff. Morning calls get slightly more conversational, exploratory flows. This alone improved our afternoon completion rate by 18%.&lt;/p&gt;
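&lt;p&gt;A minimal sketch of the time-aware switching, with illustrative thresholds; the real profiles tune many more knobs than these two:&lt;/p&gt;

```python
# Sketch of time-of-day behavior profiles: pick a conversational profile
# from the hour of day. Field names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class CallProfile:
    max_followups: int         # how exploratory the agent may be
    escalate_after_turns: int  # how quickly to offer a human

MORNING = CallProfile(max_followups=2, escalate_after_turns=6)
AFTERNOON = CallProfile(max_followups=0, escalate_after_turns=3)

def profile_for_hour(hour: int) -> CallProfile:
    """Morning callers (8-11am) tolerate exploration; afternoon callers want speed."""
    return MORNING if 8 <= hour < 12 else AFTERNOON
```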

&lt;h2&gt;Finding 4: The "Second Sentence" Problem&lt;/h2&gt;

&lt;p&gt;Here's a pattern we almost missed. In 34% of calls where the AI's first response was correct and helpful, the caller still asked to speak to a human. We dug into the transcripts and found the issue wasn't accuracy — it was the AI's second sentence.&lt;/p&gt;

&lt;p&gt;The AI would correctly answer the question, then add a follow-up that felt robotic or presumptuous. Things like: "Is there anything else I can help you with today?" delivered in the exact same cadence as a phone tree. Or worse, immediately pivoting to: "I can also help you with appointment scheduling, prescription inquiries, or billing questions."&lt;/p&gt;

&lt;p&gt;Real receptionists don't do this. They pause. They let the caller process. They read the room.&lt;/p&gt;

&lt;p&gt;We rewrote our prompt engineering to include explicit "breath" instructions — moments where the AI generates a brief pause and waits for the caller to lead. We also cut the generic menu-style follow-ups entirely. The result: human transfer requests after successful first responses dropped from 34% to 12%.&lt;/p&gt;

&lt;h2&gt;Finding 5: 6% of Callers Will Try to Break Your AI (And That's Fine)&lt;/h2&gt;

&lt;p&gt;We identified a consistent 6% of callers across all clinics who deliberately tested the AI. They'd ask trick questions, try to confuse it, speak in fragments, or demand things the AI clearly couldn't do. We affectionately call these "stress-test callers" internally.&lt;/p&gt;

&lt;p&gt;Early on, we tried to make the system handle these gracefully — clever redirects, patient re-prompts, escalation paths. We burned weeks on it. The data showed us something freeing: these callers almost always called back within 24 hours and had a normal, productive interaction the second time. They were curious, not hostile.&lt;/p&gt;

&lt;p&gt;We now let these calls fail gracefully with a simple "I want to make sure you get the help you need — let me connect you with the team" after two confused exchanges. No heroics. Our engineering time is better spent on the 94%.&lt;/p&gt;
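&lt;p&gt;The "two confused exchanges, then hand off" rule is a few lines of state. A sketch, with illustrative names:&lt;/p&gt;

```python
# Sketch of graceful failure: count consecutive low-understanding turns
# and hand off to a human once the limit is hit, instead of retrying
# forever. The handoff line is the one quoted in the post.

HANDOFF_LINE = ("I want to make sure you get the help you need — "
                "let me connect you with the team")

class ConfusionTracker:
    def __init__(self, limit: int = 2):
        self.limit = limit
        self.confused_turns = 0

    def record_turn(self, understood: bool):
        """Return the handoff line once the limit is hit, else None."""
        if understood:
            self.confused_turns = 0  # a good turn resets the counter
            return None
        self.confused_turns += 1
        if self.confused_turns >= self.limit:
            return HANDOFF_LINE
        return None
```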

&lt;h2&gt;What This Changed For Us&lt;/h2&gt;

&lt;p&gt;After this analysis, we made three architectural decisions that shaped Loquent's next iteration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Flow specialization over generalization.&lt;/strong&gt; We rebuilt our four core flows from scratch, each with its own optimized prompt chain, latency budget, and escalation logic. The "general conversation" handler became a thin routing layer, not a Swiss Army knife.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Time-aware behavior.&lt;/strong&gt; Loquent now adapts its conversational style, response length, and escalation thresholds based on time of day. The morning version and the afternoon version are meaningfully different systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Silence as a feature.&lt;/strong&gt; We invested heavily in teaching the AI when not to talk. Strategic pauses, shorter confirmations, and eliminating the "anything else?" reflex made the system feel less like a phone tree and more like a receptionist who respects your time.&lt;/p&gt;

&lt;h2&gt;The Numbers After the Rebuild&lt;/h2&gt;

&lt;p&gt;Six weeks after implementing these changes across all 14 clinics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overall call completion rate: 74% → 82%&lt;/li&gt;
&lt;li&gt;Average call duration: 2:38 → 2:11&lt;/li&gt;
&lt;li&gt;Human transfer requests: 22% → 14%&lt;/li&gt;
&lt;li&gt;Client satisfaction (post-call survey): 3.4/5 → 4.1/5&lt;/li&gt;
&lt;li&gt;Peak hour handling capacity: up 23% (same infrastructure cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these improvements came from a better model or a fancier tech stack. They came from reading our own data honestly and being willing to simplify.&lt;/p&gt;

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Most healthcare voice AI problems are scope problems, not intelligence problems.&lt;/strong&gt; You don't need AGI to book a dental cleaning. You need four flows that work perfectly and clean handoffs for everything else.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency isn't a "nice to have" metric — it's the metric.&lt;/strong&gt; Every millisecond above 1.8 seconds costs you callers. Architect your entire pipeline around this constraint from day one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time of day changes caller behavior more than you'd expect.&lt;/strong&gt; Build your system to adapt, or you're leaving completion rate on the table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The AI's second sentence matters more than the first.&lt;/strong&gt; Getting the answer right is table stakes. How the AI handles the moment after the answer determines whether the caller stays or bounces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Not every edge case deserves engineering time.&lt;/strong&gt; The 6% who stress-test your system will come back. Focus your effort on the 94% who just want their appointment booked.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;If you're building something similar, we'd love to hear about it. Reach out at &lt;a href="mailto:hello@autor.ca"&gt;hello@autor.ca&lt;/a&gt; or visit &lt;a href="https://www.autor.ca" rel="noopener noreferrer"&gt;autor.ca&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>typescript</category>
    </item>
    <item>
      <title>How We Built a Production Voice AI Agent in Under 8 Weeks (With Twilio + Anthropic Claude)</title>
      <dc:creator>Autor Technologies Inc.</dc:creator>
      <pubDate>Tue, 24 Mar 2026 23:36:05 +0000</pubDate>
      <link>https://dev.to/autor_tech/how-we-built-a-production-voice-ai-agent-in-under-8-weeks-with-twilio-anthropic-claude-8n8</link>
      <guid>https://dev.to/autor_tech/how-we-built-a-production-voice-ai-agent-in-under-8-weeks-with-twilio-anthropic-claude-8n8</guid>
      <description>&lt;p&gt;Earlier this year, we shipped Loquent — a production conversational AI platform that handles real phone calls, books appointments, processes patient follow-ups, and verifies insurance — completely autonomously, 24/7.&lt;/p&gt;

&lt;p&gt;We built it in under 8 weeks.&lt;/p&gt;

&lt;p&gt;This isn't a tutorial about building a toy chatbot. This is a breakdown of what it actually takes to get voice AI into production — the architecture decisions, the hard lessons, and the specific stack we used.&lt;/p&gt;




&lt;h2&gt;The Problem We Were Solving&lt;/h2&gt;

&lt;p&gt;Healthcare and dental clinics miss a massive percentage of inbound calls. Front desks get overwhelmed during peak hours. Patients call after hours and get voicemail. Appointments slip through.&lt;/p&gt;

&lt;p&gt;The ask: build an AI system that could handle inbound and outbound calls — booking appointments, confirming details, following up with patients, verifying insurance — without a human in the loop.&lt;/p&gt;

&lt;p&gt;Not a demo. Not a prototype. Production. Real patients. Real clinics. Real calls.&lt;/p&gt;




&lt;h2&gt;The Stack&lt;/h2&gt;

&lt;p&gt;Before diving into architecture, here's what we ended up with:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Voice / Telephony&lt;/td&gt;
&lt;td&gt;Twilio Voice + Media Streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speech-to-Text&lt;/td&gt;
&lt;td&gt;Deepgram Streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Anthropic Claude (claude-sonnet)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-Speech&lt;/td&gt;
&lt;td&gt;ElevenLabs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;NestJS + Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend Dashboard&lt;/td&gt;
&lt;td&gt;Next.js + React&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL + Prisma&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue&lt;/td&gt;
&lt;td&gt;Redis + BullMQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;AWS (ECS, RDS, ElastiCache)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrations&lt;/td&gt;
&lt;td&gt;HubSpot, Salesforce, Zendesk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We evaluated OpenAI's Realtime API, but at the time its latency and reliability at production call volumes weren't where we needed them. We went with the Deepgram → Claude → ElevenLabs pipeline instead, which gave us full control over each layer.&lt;/p&gt;




&lt;h2&gt;Architecture Overview&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Caller dials in
     ↓
Twilio receives call → webhook fires to our backend
     ↓
Twilio Media Stream opens WebSocket to our server
     ↓
Audio chunks stream in real-time → Deepgram STT
     ↓
Transcript fed to Claude with conversation context + clinic data
     ↓
Claude response → ElevenLabs TTS → audio streamed back via Twilio
     ↓
Actions extracted (book appointment, send confirmation, update CRM)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole loop needs to complete in under 1.5 seconds to feel natural. That's the hard constraint everything else is built around.&lt;/p&gt;




&lt;h2&gt;The Latency Problem&lt;/h2&gt;

&lt;p&gt;This was the hardest engineering challenge. Users tolerate maybe 1–2 seconds of silence before it feels broken. We were dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepgram STT: ~200–400ms&lt;/li&gt;
&lt;li&gt;Claude inference: ~400–800ms&lt;/li&gt;
&lt;li&gt;ElevenLabs TTS first-chunk: ~300–500ms&lt;/li&gt;
&lt;li&gt;Twilio playback: ~100ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's already pushing 2 seconds before any network overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we did:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Stream everything.&lt;/strong&gt; We don't wait for a complete Claude response before starting TTS. The moment Claude starts outputting tokens, we pipe them to ElevenLabs sentence by sentence. The first audio chunk starts playing while Claude is still generating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. End-of-utterance detection.&lt;/strong&gt; We use Deepgram's endpointing, but also built our own silence detection layer. Aggressive endpointing cuts off users mid-sentence. Too conservative and the response feels laggy. We tuned this per use case — a patient confirming an appointment has different speech patterns than one describing symptoms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Claude prompt engineering for speed.&lt;/strong&gt; Verbose responses kill latency. We prompt Claude to be concise, speak like a receptionist, and never use filler phrases that add tokens without value. We also give it explicit response format guidance — short sentences, direct answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pre-warm everything.&lt;/strong&gt; ElevenLabs has cold start latency. We keep connections warm with keepalive pings. Same with our database pool.&lt;/p&gt;

&lt;p&gt;With all of this in place, we got average response latency down to ~900ms, with occasional spikes to 1.4s. It feels natural.&lt;/p&gt;




&lt;h2&gt;Designing the Claude Prompt&lt;/h2&gt;

&lt;p&gt;This took more iteration than the infrastructure. The system prompt has to do a lot:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are [Clinic Name]'s AI receptionist. Your name is [Agent Name].

You are speaking with a patient over the phone. Be warm, professional, 
and concise. Speak in short sentences. Never say "Certainly!" or 
"Absolutely!" or similar filler phrases.

CLINIC CONTEXT:
- Name: [Clinic Name]
- Hours: [Hours]
- Providers: [Provider list with availability]
- Services: [Service list]

CURRENT PATIENT CONTEXT:
[Injected dynamically: patient name, upcoming appointments, 
last visit, insurance status]

AVAILABLE ACTIONS:
[JSON schema of actions Claude can trigger: book_appointment, 
cancel_appointment, send_confirmation, transfer_to_human, etc.]

RULES:
- If you cannot handle the request, transfer to a human. Never guess.
- Confirm all bookings by repeating back date, time, and provider.
- Never discuss billing details — transfer to billing team.
- If the patient seems distressed, offer to transfer immediately.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;Claude needs to know what it can and cannot do&lt;/strong&gt;. An AI that tries to handle everything and fails is worse than one that gracefully transfers when out of scope.&lt;/p&gt;

&lt;p&gt;We use Claude's tool use (function calling) for actions — booking, cancelling, sending confirmations. This gives us clean structured outputs instead of trying to parse intent from natural language.&lt;/p&gt;
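&lt;p&gt;To make that concrete, here's a sketch of one tool definition in the JSON-schema shape Anthropic's Messages API expects for tool use, plus a tiny dispatcher for the resulting tool calls. The handler names are illustrative, not our actual API:&lt;/p&gt;

```python
# Sketch of action dispatch via tool use: each action the agent can take
# is declared as a JSON schema, and tool_use blocks from the model are
# routed to real handlers. Unknown actions trigger a human transfer,
# never a guess.

BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book an appointment for the current patient.",
    "input_schema": {
        "type": "object",
        "properties": {
            "provider": {"type": "string"},
            "slot": {"type": "string", "description": "ISO 8601 start time"},
        },
        "required": ["provider", "slot"],
    },
}

def dispatch_tool_call(name: str, tool_input: dict, handlers: dict) -> str:
    """Route a tool call from the model to the matching backend handler."""
    handler = handlers.get(name)
    if handler is None:
        return "transfer_to_human"  # out of scope: hand off, never guess
    return handler(**tool_input)
```

&lt;p&gt;The structured input means the booking handler validates and executes with no intent parsing in between.&lt;/p&gt;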




&lt;h2&gt;Multi-Tenant Architecture&lt;/h2&gt;

&lt;p&gt;Loquent serves multiple clinics, each with their own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phone number(s)&lt;/li&gt;
&lt;li&gt;Providers and availability&lt;/li&gt;
&lt;li&gt;Booking rules and constraints&lt;/li&gt;
&lt;li&gt;Brand voice and agent name&lt;/li&gt;
&lt;li&gt;CRM integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt is dynamically assembled per-call using the clinic's configuration. We built a dashboard where clinic admins can update their agent's name, working hours, provider list, and escalation rules without touching code.&lt;/p&gt;
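&lt;p&gt;The per-call assembly itself is simple string templating over the clinic's configuration. A sketch with illustrative field names; the real template carries many more sections:&lt;/p&gt;

```python
# Sketch of per-tenant prompt assembly: the same template, filled from
# whichever clinic's config the inbound number maps to.

TEMPLATE = (
    "You are {clinic}'s AI receptionist. Your name is {agent}.\n"
    "Hours: {hours}\n"
    "Providers: {providers}"
)

def build_system_prompt(config: dict) -> str:
    """Render the system prompt from one clinic's dashboard-managed config."""
    return TEMPLATE.format(
        clinic=config["clinic"],
        agent=config["agent"],
        hours=config["hours"],
        providers=", ".join(config["providers"]),
    )
```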

&lt;p&gt;Each clinic gets isolated data — separate database schemas, separate API credentials, separate call logs.&lt;/p&gt;




&lt;h2&gt;The Integrations Layer&lt;/h2&gt;

&lt;p&gt;The hard part isn't the AI — it's making the AI useful by connecting it to real data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Appointment booking:&lt;/strong&gt; We built adapters for common dental/healthcare practice management systems. The adapter pattern let us add new integrations without touching the core engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CRM sync:&lt;/strong&gt; After every call, we write a structured summary back to HubSpot or Salesforce — caller ID, intent, outcome, booking details, and a Claude-generated call summary. This is actually one of the features clinics love most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confirmation messages:&lt;/strong&gt; Post-call, we trigger SMS/email confirmations via Twilio Messaging and SendGrid. Patients get a confirmation within 30 seconds of booking.&lt;/p&gt;




&lt;h2&gt;What Broke in Production (And How We Fixed It)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Callers interrupting the AI mid-sentence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users naturally interrupt. The AI was finishing its sentence before responding to the interruption, which felt robotic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Barge-in detection. When Deepgram detects speech while TTS is playing, we immediately stop audio playback, flush the TTS buffer, and re-run inference with the new input. Feels much more natural.&lt;/p&gt;
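&lt;p&gt;A sketch of the barge-in logic, with the audio player as a stand-in object rather than a real Twilio stream:&lt;/p&gt;

```python
# Sketch of barge-in handling: when STT reports speech while TTS audio
# is playing, stop playback, discard queued audio, and capture the new
# utterance for re-inference. The player is a stand-in interface.

class BargeInController:
    def __init__(self, player):
        self.player = player
        self.pending_input = None

    def on_speech_detected(self, transcript: str) -> None:
        if self.player.is_playing:
            self.player.stop()           # cut the AI off immediately
            self.player.flush_buffer()   # drop unplayed TTS audio
        self.pending_input = transcript  # re-run inference on this next
```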

&lt;p&gt;&lt;strong&gt;Problem 2: Claude hallucinating availability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Early builds had Claude generating appointment times that didn't exist. Patients were being told "Tuesday at 2pm" when the provider wasn't available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Availability is never in the prompt. Instead it's a tool call. Claude calls &lt;code&gt;get_availability(provider, date_range)&lt;/code&gt; and we return actual real-time slots. Claude can only offer what the function returns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3: Long calls running up costs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some patients would keep the AI on the phone indefinitely — confused, or just chatty. Unbounded calls = unbounded cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Configurable max duration per clinic. At 10 minutes, the AI politely offers to transfer to a human or to schedule a callback. Average call length is now 2.5 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 4: Noisy environments destroying STT accuracy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Callers in cars, waiting rooms, restaurants. Background noise crushed Deepgram accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Deepgram's noise suppression model + a fallback that asks the caller to repeat if confidence drops below threshold. "I'm sorry, I didn't quite catch that — could you repeat that for me?"&lt;/p&gt;
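&lt;p&gt;The confidence fallback is essentially a one-line guard in front of the conversation engine. A sketch, with an illustrative threshold value:&lt;/p&gt;

```python
# Sketch of the low-confidence fallback: if the STT confidence score is
# below threshold, re-prompt the caller instead of acting on a likely
# misheard request. The 0.6 threshold is illustrative.

REPROMPT = ("I'm sorry, I didn't quite catch that — "
            "could you repeat that for me?")

def handle_transcript(text: str, confidence: float, threshold: float = 0.6) -> str:
    """Return the transcript to act on, or a re-prompt if confidence is low."""
    if confidence < threshold:
        return REPROMPT
    return text
```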




&lt;h2&gt;Numbers After Launch&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calls handled:&lt;/strong&gt; Thousands of automated calls per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average handle time:&lt;/strong&gt; 2.5 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transfer rate:&lt;/strong&gt; ~18% (calls that go to a human)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Booking completion rate:&lt;/strong&gt; ~74% of calls that started with booking intent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uptime:&lt;/strong&gt; 99.7%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average response latency:&lt;/strong&gt; ~900ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The clinics running Loquent have effectively eliminated missed after-hours calls. One client told us their front desk used to spend the first hour of every morning re-booking patients who couldn't get through the day before. Loquent eliminated that entirely.&lt;/p&gt;




&lt;h2&gt;What We'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with tool use from day one.&lt;/strong&gt; We initially tried to have Claude make decisions through natural language reasoning. Switching to structured tool calls for all actions made the system dramatically more reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in evals earlier.&lt;/strong&gt; We didn't set up proper evaluation pipelines until week 5. Building a test call suite in week 1 would have caught several issues earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate the conversation engine from the telephony layer sooner.&lt;/strong&gt; The abstraction between "what the AI is doing" and "how the call works" should be clean from the start. We refactored this at week 6 and it made everything better.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;We're now extending Loquent to handle outbound campaigns — appointment reminders, recall messaging, post-visit follow-ups. The same architecture works; you just flip the direction of the call.&lt;/p&gt;

&lt;p&gt;We're also exploring multi-agent setups where a triage agent hands off to specialist agents (billing, clinical questions, booking) with full context preservation.&lt;/p&gt;




&lt;p&gt;If you're building something similar or want to talk through the architecture, we're at &lt;a href="https://getloquent.com" rel="noopener noreferrer"&gt;getloquent.com&lt;/a&gt; and &lt;a href="https://www.autor.ca" rel="noopener noreferrer"&gt;autor.ca&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Autor is a Toronto-based AI development studio. We build custom AI agents, voice assistants, and full-stack AI products for businesses. &lt;a href="https://www.autor.ca" rel="noopener noreferrer"&gt;autor.ca&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
