<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Luis Pham</title>
    <description>The latest articles on DEV Community by Luis Pham (@luispham).</description>
    <link>https://dev.to/luispham</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901735%2Fbdbc03f5-9ac3-4fe0-8df9-e16707cc7167.png</url>
      <title>DEV Community: Luis Pham</title>
      <link>https://dev.to/luispham</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/luispham"/>
    <language>en</language>
    <item>
      <title>What I Learned About Latency While Building a Real-Time Voice AI Agent</title>
      <dc:creator>Luis Pham</dc:creator>
      <pubDate>Fri, 08 May 2026 13:10:00 +0000</pubDate>
      <link>https://dev.to/luispham/what-i-learned-about-latency-while-building-a-real-time-voice-ai-agent-g6o</link>
      <guid>https://dev.to/luispham/what-i-learned-about-latency-while-building-a-real-time-voice-ai-agent-g6o</guid>
      <description>&lt;h1&gt;
  
  
  What I Learned About Latency While Building a Real-Time Voice AI Agent
&lt;/h1&gt;

&lt;p&gt;When I started building a real-time voice AI agent, I thought about latency mostly as an engineering problem.&lt;/p&gt;

&lt;p&gt;Reduce the delay.&lt;br&gt;&lt;br&gt;
Make the response faster.&lt;br&gt;&lt;br&gt;
Stream audio as quickly as possible.&lt;/p&gt;

&lt;p&gt;That is still true.&lt;/p&gt;

&lt;p&gt;But after working on &lt;a href="https://ringbooker.com" rel="noopener noreferrer"&gt;RingBooker&lt;/a&gt;, an AI receptionist for salons, spas, med spas, and beauty clinics, I started to think about latency differently.&lt;/p&gt;

&lt;p&gt;Latency is not only a technical metric.&lt;/p&gt;

&lt;p&gt;On a phone call, latency is part of the user experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  A small delay feels bigger on the phone
&lt;/h2&gt;

&lt;p&gt;In a web app, a short delay is usually fine.&lt;/p&gt;

&lt;p&gt;A button can show a loading state.&lt;br&gt;&lt;br&gt;
A page can display a spinner.&lt;br&gt;&lt;br&gt;
A chatbot can show typing dots.&lt;/p&gt;

&lt;p&gt;The user understands that something is happening.&lt;/p&gt;

&lt;p&gt;A phone call does not have that same visual feedback.&lt;/p&gt;

&lt;p&gt;When the caller stops talking and the AI does not respond, even a short pause can feel strange.&lt;/p&gt;

&lt;p&gt;The caller may wonder:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did it hear me?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is the call still connected?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That emotional reaction matters.&lt;/p&gt;

&lt;p&gt;The system might be working perfectly in the background, but the caller does not see that.&lt;/p&gt;

&lt;p&gt;They only hear silence.&lt;/p&gt;

&lt;h2&gt;
  
  
  End-to-end latency matters more than one number
&lt;/h2&gt;

&lt;p&gt;At first, it is tempting to measure only one part of the system.&lt;/p&gt;

&lt;p&gt;Model response time.&lt;br&gt;&lt;br&gt;
Speech-to-text time.&lt;br&gt;&lt;br&gt;
Text-to-speech time.&lt;br&gt;&lt;br&gt;
Network delay.&lt;/p&gt;

&lt;p&gt;But the caller experiences the whole chain.&lt;/p&gt;

&lt;p&gt;For a voice AI agent, the real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How long does it take from the moment the caller stops speaking to the moment they hear a useful response?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That path can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;caller audio&lt;/li&gt;
&lt;li&gt;voice activity detection&lt;/li&gt;
&lt;li&gt;speech-to-text&lt;/li&gt;
&lt;li&gt;intent understanding&lt;/li&gt;
&lt;li&gt;model response&lt;/li&gt;
&lt;li&gt;tool calls or retrieval&lt;/li&gt;
&lt;li&gt;text-to-speech&lt;/li&gt;
&lt;li&gt;audio streaming back to the caller&lt;/li&gt;
&lt;li&gt;telephony network behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optimizing one piece helps, but it does not always fix the felt experience.&lt;/p&gt;

&lt;p&gt;The full loop is what matters.&lt;/p&gt;
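
&lt;p&gt;A minimal sketch of that idea in Python. The stage names follow the list above, the millisecond values are invented for illustration (they are not measurements from RingBooker), and the stages are assumed to run strictly one after another, which a streaming pipeline would not do.&lt;/p&gt;

```python
# Hypothetical per-stage timings for one turn, in milliseconds.
# Stage names follow the list above; the numbers are invented for illustration.
STAGE_MS = {
    "voice_activity_detection": 200,
    "speech_to_text": 150,
    "model_response": 400,
    "tool_calls": 300,
    "text_to_speech": 120,
    "audio_streaming": 80,
}


def end_to_end_ms(stages: dict[str, int]) -> int:
    """Total delay the caller hears, assuming the stages run one after another."""
    return sum(stages.values())


def slowest_stage(stages: dict[str, int]) -> str:
    """Name of the stage contributing the most delay."""
    return max(stages, key=stages.get)


total = end_to_end_ms(STAGE_MS)
print(f"end-to-end: {total} ms, slowest stage: {slowest_stage(STAGE_MS)}")
```

&lt;p&gt;With these made-up numbers, halving the model response time to 200 ms would still leave 1050 ms of silence, which is why a single-stage metric can look good while the call still feels slow.&lt;/p&gt;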

&lt;h2&gt;
  
  
  “Fast” is not always the same as “natural”
&lt;/h2&gt;

&lt;p&gt;This surprised me.&lt;/p&gt;

&lt;p&gt;I assumed faster would always feel better.&lt;/p&gt;

&lt;p&gt;But if the AI responds instantly, it can feel unnatural.&lt;/p&gt;

&lt;p&gt;Humans usually leave tiny pauses in conversation.&lt;/p&gt;

&lt;p&gt;They breathe.&lt;br&gt;&lt;br&gt;
They process.&lt;br&gt;&lt;br&gt;
They acknowledge.&lt;br&gt;&lt;br&gt;
They sometimes say “okay” before moving forward.&lt;/p&gt;

&lt;p&gt;A voice AI that snaps back too quickly can feel robotic, even if the latency is technically great.&lt;/p&gt;

&lt;p&gt;So the goal is not always the lowest possible delay.&lt;/p&gt;

&lt;p&gt;The goal is a response rhythm that feels alive and useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some delays are more acceptable than others
&lt;/h2&gt;

&lt;p&gt;Not all latency feels the same.&lt;/p&gt;

&lt;p&gt;If the caller asks a simple question, they expect a quick answer.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Are you open today?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A long delay here feels bad.&lt;/p&gt;

&lt;p&gt;But if the caller asks something more complex, a short pause can feel normal.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can you help me find an appointment for a color service next week?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In that case, a small delay may be acceptable if the AI acknowledges what is happening:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Let me get a few details first.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I can help collect that request.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The user experience depends on context.&lt;/p&gt;

&lt;p&gt;This is where product design and engineering meet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Silence needs design
&lt;/h2&gt;

&lt;p&gt;One thing I learned is that silence cannot be ignored.&lt;/p&gt;

&lt;p&gt;If the system needs time, the conversation should make that clear.&lt;/p&gt;

&lt;p&gt;This does not mean adding filler everywhere.&lt;/p&gt;

&lt;p&gt;Too much filler is annoying.&lt;/p&gt;

&lt;p&gt;But the AI needs ways to keep the caller oriented.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;acknowledge the request&lt;/li&gt;
&lt;li&gt;ask one clear follow-up question&lt;/li&gt;
&lt;li&gt;avoid long unexplained pauses&lt;/li&gt;
&lt;li&gt;do not overtalk while processing&lt;/li&gt;
&lt;li&gt;hand off when the request is too specific&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of voice UX is about making the caller feel that the system is still present.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool calls make latency more complicated
&lt;/h2&gt;

&lt;p&gt;For a simple conversation, the AI can respond directly.&lt;/p&gt;

&lt;p&gt;But real products often need tools.&lt;/p&gt;

&lt;p&gt;For local businesses, a voice agent may need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;check business hours&lt;/li&gt;
&lt;li&gt;understand service rules&lt;/li&gt;
&lt;li&gt;collect appointment details&lt;/li&gt;
&lt;li&gt;look up knowledge base information&lt;/li&gt;
&lt;li&gt;prepare a call summary&lt;/li&gt;
&lt;li&gt;decide whether to hand off&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every tool call can add delay.&lt;/p&gt;

&lt;p&gt;This creates a tradeoff.&lt;/p&gt;

&lt;p&gt;More context can make the answer better.&lt;/p&gt;

&lt;p&gt;But too much waiting can make the call feel worse.&lt;/p&gt;

&lt;p&gt;The question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does this tool call improve the caller experience enough to justify the delay?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not only an engineering question.&lt;/p&gt;

&lt;p&gt;It is a product question.&lt;/p&gt;
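
&lt;p&gt;One way to make that tradeoff explicit is to give each tool call a time budget. The sketch below is an assumption about how this could look, not how RingBooker works: &lt;code&gt;lookup_business_hours&lt;/code&gt;, &lt;code&gt;answer_with_budget&lt;/code&gt;, and the budget values are hypothetical, and the sleep stands in for a real network call.&lt;/p&gt;

```python
import asyncio


async def lookup_business_hours() -> str:
    """Hypothetical slow tool; the sleep stands in for a network call."""
    await asyncio.sleep(0.2)
    return "We are open until 6 pm today."


async def answer_with_budget(tool, budget_s: float, fallback: str) -> str:
    """Use the tool's answer only if it arrives within the budget."""
    try:
        return await asyncio.wait_for(tool(), timeout=budget_s)
    except asyncio.TimeoutError:
        # The tool took too long: keep the call moving instead of waiting.
        return fallback


FALLBACK = "I'll pass this to the team so they can follow up."
fast = asyncio.run(answer_with_budget(lookup_business_hours, 1.0, FALLBACK))
slow = asyncio.run(answer_with_budget(lookup_business_hours, 0.05, FALLBACK))
print(fast)
print(slow)
```

&lt;p&gt;The budget is the product decision: a larger value favors a better answer, a smaller one favors a call that keeps moving.&lt;/p&gt;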

&lt;h2&gt;
  
  
  Call summaries changed how I thought about latency
&lt;/h2&gt;

&lt;p&gt;Early on, I wanted the AI to answer as much as possible during the call.&lt;/p&gt;

&lt;p&gt;But for local businesses, the after-call summary is often just as important.&lt;/p&gt;

&lt;p&gt;The caller needs a fast, useful interaction.&lt;/p&gt;

&lt;p&gt;The business needs structured context after the call.&lt;/p&gt;

&lt;p&gt;That means the AI does not always need to solve everything live.&lt;/p&gt;

&lt;p&gt;Sometimes it is better to keep the call simple, collect the right information, and give the business a clear next step.&lt;/p&gt;

&lt;p&gt;This reduces pressure on the live conversation.&lt;/p&gt;

&lt;p&gt;It also avoids forcing the caller to wait while the AI tries to do too much.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency affects trust
&lt;/h2&gt;

&lt;p&gt;This is probably the most important lesson.&lt;/p&gt;

&lt;p&gt;When a voice AI pauses awkwardly, talks over the caller, or responds too slowly, the problem is not just speed.&lt;/p&gt;

&lt;p&gt;The problem is trust.&lt;/p&gt;

&lt;p&gt;The caller may start to feel:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This is not reliable.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For salons, spas, med spas, and other appointment-based businesses, that matters.&lt;/p&gt;

&lt;p&gt;A caller might be trying to book a same-day service, ask about a consultation, reschedule, or decide whether the business feels responsive.&lt;/p&gt;

&lt;p&gt;The phone call is part of the trust-building process.&lt;/p&gt;

&lt;p&gt;If the AI feels slow or confused, the business may feel slow or confused too.&lt;/p&gt;

&lt;p&gt;That is why latency is not just a backend concern.&lt;/p&gt;

&lt;p&gt;It is a brand experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would measure
&lt;/h2&gt;

&lt;p&gt;If I were starting from scratch, I would measure latency in layers.&lt;/p&gt;

&lt;p&gt;Not only:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How fast did the model respond?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how long until speech was detected&lt;/li&gt;
&lt;li&gt;how long until the caller’s intent was understood&lt;/li&gt;
&lt;li&gt;how long until the first useful audio came back&lt;/li&gt;
&lt;li&gt;how often the AI talked over the caller&lt;/li&gt;
&lt;li&gt;how often the caller interrupted&lt;/li&gt;
&lt;li&gt;how often the AI needed to recover&lt;/li&gt;
&lt;li&gt;how often a human handoff was needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The raw timing is useful.&lt;/p&gt;

&lt;p&gt;But the conversation outcome matters more.&lt;/p&gt;

&lt;p&gt;A fast bad answer is still a bad answer.&lt;/p&gt;
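
&lt;p&gt;A rough sketch of what layered measurement could look like. The &lt;code&gt;Turn&lt;/code&gt; record and its field names are hypothetical, not taken from a real telephony or voice SDK, and the sample values are invented. It pairs one raw timing number with the conversation-quality rates from the list above.&lt;/p&gt;

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Turn:
    """One caller turn. Field names are hypothetical, not from a real SDK."""
    caller_stopped_ms: int        # when the caller stopped speaking
    first_audio_ms: int           # when the caller first heard the reply
    ai_talked_over_caller: bool
    caller_interrupted: bool
    handed_off: bool


def summarize(turns: list[Turn]) -> dict[str, float]:
    """Combine raw timing with conversation-quality rates."""
    gaps = [t.first_audio_ms - t.caller_stopped_ms for t in turns]
    n = len(turns)
    return {
        "median_gap_ms": median(gaps),
        "overtalk_rate": sum(t.ai_talked_over_caller for t in turns) / n,
        "interrupt_rate": sum(t.caller_interrupted for t in turns) / n,
        "handoff_rate": sum(t.handed_off for t in turns) / n,
    }


# Invented sample data for four turns of one call.
turns = [
    Turn(2000, 2800, False, False, False),
    Turn(9500, 10100, True, False, False),
    Turn(15000, 16400, False, True, False),
    Turn(22000, 23000, False, False, True),
]
report = summarize(turns)
print(report)
```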

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Latency in voice AI is not just about making the system faster.&lt;/p&gt;

&lt;p&gt;It is about making the conversation feel responsive.&lt;/p&gt;

&lt;p&gt;The caller should feel heard.&lt;br&gt;&lt;br&gt;
The AI should avoid awkward silence.&lt;br&gt;&lt;br&gt;
The system should not overtalk.&lt;br&gt;&lt;br&gt;
The business should receive useful context.&lt;/p&gt;

&lt;p&gt;That is the hard part.&lt;/p&gt;

&lt;p&gt;The best voice AI products will not only optimize milliseconds.&lt;/p&gt;

&lt;p&gt;They will optimize the feeling of the call.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>performance</category>
      <category>ux</category>
    </item>
    <item>
      <title>The Hidden UX Problem in Voice AI: When Should the AI Stop Talking?</title>
      <dc:creator>Luis Pham</dc:creator>
      <pubDate>Sun, 03 May 2026 13:08:00 +0000</pubDate>
      <link>https://dev.to/luispham/the-hidden-ux-problem-in-voice-ai-when-should-the-ai-stop-talking-4a62</link>
      <guid>https://dev.to/luispham/the-hidden-ux-problem-in-voice-ai-when-should-the-ai-stop-talking-4a62</guid>
      <description>&lt;h1&gt;
  
  
  The Hidden UX Problem in Voice AI: When Should the AI Stop Talking?
&lt;/h1&gt;

&lt;p&gt;One of the hardest parts of building a voice AI product is not making the AI talk.&lt;/p&gt;

&lt;p&gt;It is knowing when the AI should stop talking.&lt;/p&gt;

&lt;p&gt;I did not fully appreciate this at the beginning.&lt;/p&gt;

&lt;p&gt;When I started building &lt;a href="https://ringbooker.com" rel="noopener noreferrer"&gt;RingBooker&lt;/a&gt;, an AI receptionist for salons, spas, med spas, and beauty clinics, I was focused on the obvious problems:&lt;/p&gt;

&lt;p&gt;Latency.&lt;br&gt;&lt;br&gt;
Speech recognition.&lt;br&gt;&lt;br&gt;
Call routing.&lt;br&gt;&lt;br&gt;
Booking intent.&lt;br&gt;&lt;br&gt;
Call summaries.&lt;/p&gt;

&lt;p&gt;Those are all important.&lt;/p&gt;

&lt;p&gt;But the more I worked on real phone-call flows, the more I realized that silence, interruption, and timing are part of the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text AI can explain. Voice AI has to pace itself.
&lt;/h2&gt;

&lt;p&gt;In a text interface, long answers can still work.&lt;/p&gt;

&lt;p&gt;The user can skim.&lt;br&gt;&lt;br&gt;
They can scroll.&lt;br&gt;&lt;br&gt;
They can reread.&lt;br&gt;&lt;br&gt;
They can ignore parts of the answer.&lt;/p&gt;

&lt;p&gt;On a phone call, the user cannot skim.&lt;/p&gt;

&lt;p&gt;They have to listen in real time.&lt;/p&gt;

&lt;p&gt;That means every extra sentence costs attention.&lt;/p&gt;

&lt;p&gt;A voice agent that gives a complete answer may still feel bad if it talks for too long.&lt;/p&gt;

&lt;p&gt;This is especially true for local business calls.&lt;/p&gt;

&lt;p&gt;A caller usually does not want a long explanation. They want to know what to do next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI should not fill every silence
&lt;/h2&gt;

&lt;p&gt;This was one of my early mistakes.&lt;/p&gt;

&lt;p&gt;I assumed silence was always bad.&lt;/p&gt;

&lt;p&gt;So I wanted the AI to respond quickly, keep the conversation moving, and avoid awkward pauses.&lt;/p&gt;

&lt;p&gt;But not every pause needs to be filled.&lt;/p&gt;

&lt;p&gt;Sometimes the caller is thinking.&lt;br&gt;&lt;br&gt;
Sometimes they are checking their calendar.&lt;br&gt;&lt;br&gt;
Sometimes they are asking someone next to them.&lt;br&gt;&lt;br&gt;
Sometimes they are about to correct themselves.&lt;/p&gt;

&lt;p&gt;If the AI jumps in too quickly, it feels pushy.&lt;/p&gt;

&lt;p&gt;If it waits too long, it feels broken.&lt;/p&gt;

&lt;p&gt;That middle ground is harder than it sounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interruptions are not edge cases
&lt;/h2&gt;

&lt;p&gt;In voice AI, interruption handling is not a feature you add later.&lt;/p&gt;

&lt;p&gt;It is core UX.&lt;/p&gt;

&lt;p&gt;People interrupt naturally.&lt;/p&gt;

&lt;p&gt;They say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Actually, wait...”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“No, I meant tomorrow.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can I ask something else?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the AI ignores that and keeps talking, the user immediately feels that they are not being heard.&lt;/p&gt;

&lt;p&gt;This is different from text.&lt;/p&gt;

&lt;p&gt;In text, the assistant can finish its response and the user can reply after.&lt;/p&gt;

&lt;p&gt;On a call, the timing itself communicates whether the AI is listening.&lt;/p&gt;
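
&lt;p&gt;In code, this behavior is often called barge-in. The sketch below is a simplified illustration under my own assumptions: the &lt;code&gt;TurnManager&lt;/code&gt; class and its event names are hypothetical, and a real system would hook these events up to voice activity detection and an audio player instead of a log.&lt;/p&gt;

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class TurnManager:
    """Minimal barge-in logic. Event names are hypothetical."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.log: list[str] = []

    def start_reply(self, text: str) -> None:
        self.state = State.SPEAKING
        self.log.append(f"play: {text}")

    def on_caller_speech(self) -> None:
        # Caller spoke while the agent was talking: stop and listen.
        if self.state is State.SPEAKING:
            self.log.append("stop playback")
        self.state = State.LISTENING

    def on_reply_finished(self) -> None:
        if self.state is State.SPEAKING:
            self.state = State.LISTENING


manager = TurnManager()
manager.start_reply("We have openings today at 2 and 4.")
manager.on_caller_speech()  # e.g. the caller says "No, I meant tomorrow."
print(manager.state, manager.log)
```

&lt;p&gt;The point of the sketch is the ordering: the caller's speech always wins, and the agent returns to listening before it decides what to say next.&lt;/p&gt;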

&lt;h2&gt;
  
  
  A good voice agent needs shorter answers
&lt;/h2&gt;

&lt;p&gt;This is one of the product rules I keep coming back to.&lt;/p&gt;

&lt;p&gt;For phone calls, shorter is usually better.&lt;/p&gt;

&lt;p&gt;Not because the user is impatient, but because voice is linear.&lt;/p&gt;

&lt;p&gt;The caller cannot jump ahead.&lt;/p&gt;

&lt;p&gt;For example, if someone calls a salon and asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Do you have anything today?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI does not need to explain the entire booking process.&lt;/p&gt;

&lt;p&gt;It probably needs to say something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I can help check the request. What service are you looking for?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then collect the useful details.&lt;/p&gt;

&lt;p&gt;The goal is not to sound smart.&lt;/p&gt;

&lt;p&gt;The goal is to move the call forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI should ask one thing at a time
&lt;/h2&gt;

&lt;p&gt;This is another lesson from phone UX.&lt;/p&gt;

&lt;p&gt;In a chatbot, you can ask multiple questions at once:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What service do you need, what day works best, and do you have a preferred provider?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That can work in text.&lt;/p&gt;

&lt;p&gt;On a call, it often fails.&lt;/p&gt;

&lt;p&gt;The caller may answer only one part.&lt;/p&gt;

&lt;p&gt;Or they may forget the first question.&lt;/p&gt;

&lt;p&gt;Or they may respond vaguely.&lt;/p&gt;

&lt;p&gt;For voice, one question at a time usually works better.&lt;/p&gt;

&lt;p&gt;Bad:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What service are you looking for, what time do you prefer, and is there a specific stylist?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Better:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What service are you looking for?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Do you have a preferred day or time?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Do you have a preferred stylist?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It feels slower on paper, but it is often smoother in conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowing when to hand off is part of the UX
&lt;/h2&gt;

&lt;p&gt;Sometimes the best thing the AI can do is stop trying to solve the call.&lt;/p&gt;

&lt;p&gt;This is especially true when the caller asks for something sensitive, complex, or very specific.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a medical-aesthetic treatment question that needs professional judgment&lt;/li&gt;
&lt;li&gt;a pricing question that depends on consultation&lt;/li&gt;
&lt;li&gt;a complaint&lt;/li&gt;
&lt;li&gt;a caller who repeatedly asks for a human&lt;/li&gt;
&lt;li&gt;a policy exception&lt;/li&gt;
&lt;li&gt;a complicated reschedule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those moments, continuing to talk can hurt trust.&lt;/p&gt;

&lt;p&gt;A good voice agent should be comfortable saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I’ll pass this to the team so they can follow up with the right answer.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not failure.&lt;/p&gt;

&lt;p&gt;That is good product behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  The summary should replace unnecessary talking
&lt;/h2&gt;

&lt;p&gt;Another thing I learned: the AI does not need to explain everything to the caller if the real value is in the follow-up.&lt;/p&gt;

&lt;p&gt;For local businesses, the call often has two users:&lt;/p&gt;

&lt;p&gt;The caller.&lt;br&gt;&lt;br&gt;
The business team.&lt;/p&gt;

&lt;p&gt;The caller wants a quick response.&lt;/p&gt;

&lt;p&gt;The business wants clean context.&lt;/p&gt;

&lt;p&gt;So instead of making the AI over-explain, the product can capture the details and send the team a useful summary.&lt;/p&gt;

&lt;p&gt;That means the AI can keep the call shorter while still creating value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice AI needs boundaries, not just intelligence
&lt;/h2&gt;

&lt;p&gt;A lot of AI products are designed to show how capable the model is.&lt;/p&gt;

&lt;p&gt;But on the phone, capability without restraint can feel uncomfortable.&lt;/p&gt;

&lt;p&gt;The AI should know:&lt;/p&gt;

&lt;p&gt;When to answer.&lt;br&gt;&lt;br&gt;
When to ask a follow-up.&lt;br&gt;&lt;br&gt;
When to pause.&lt;br&gt;&lt;br&gt;
When to stop.&lt;br&gt;&lt;br&gt;
When to hand off.&lt;/p&gt;

&lt;p&gt;Those decisions shape the user experience as much as the model quality.&lt;/p&gt;

&lt;p&gt;Maybe more.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am trying to optimize for
&lt;/h2&gt;

&lt;p&gt;For RingBooker, I am not trying to make the AI sound like the most impressive receptionist in the world.&lt;/p&gt;

&lt;p&gt;I am trying to make it useful in the calls a beauty business normally misses:&lt;/p&gt;

&lt;p&gt;After-hours calls.&lt;br&gt;&lt;br&gt;
Peak-hour overflow.&lt;br&gt;&lt;br&gt;
Same-day requests.&lt;br&gt;&lt;br&gt;
Reschedules.&lt;br&gt;&lt;br&gt;
Consultation inquiries.&lt;br&gt;&lt;br&gt;
Human handoff requests.&lt;/p&gt;

&lt;p&gt;In those moments, the AI does not need to dominate the conversation.&lt;/p&gt;

&lt;p&gt;It needs to help the caller feel heard and give the business enough context to act.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The hidden UX problem in voice AI is restraint.&lt;/p&gt;

&lt;p&gt;Knowing what to say matters.&lt;/p&gt;

&lt;p&gt;But knowing when to stop talking may matter even more.&lt;/p&gt;

&lt;p&gt;A voice AI agent should not try to win the conversation.&lt;/p&gt;

&lt;p&gt;It should help the caller get to the next step.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceai</category>
      <category>startup</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Why Voice AI for Local Businesses Is Harder Than a Chatbot</title>
      <dc:creator>Luis Pham</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:06:00 +0000</pubDate>
      <link>https://dev.to/luispham/why-voice-ai-for-local-businesses-is-harder-than-a-chatbot-2d0o</link>
      <guid>https://dev.to/luispham/why-voice-ai-for-local-businesses-is-harder-than-a-chatbot-2d0o</guid>
      <description>&lt;h1&gt;
  
  
  Why Voice AI for Local Businesses Is Harder Than a Chatbot
&lt;/h1&gt;

&lt;p&gt;I used to think a voice AI agent was basically a chatbot with audio.&lt;/p&gt;

&lt;p&gt;User speaks.&lt;br&gt;&lt;br&gt;
AI understands.&lt;br&gt;&lt;br&gt;
AI replies.&lt;/p&gt;

&lt;p&gt;That was the simple version in my head.&lt;/p&gt;

&lt;p&gt;But after working on &lt;a href="https://ringbooker.com" rel="noopener noreferrer"&gt;RingBooker&lt;/a&gt;, an AI receptionist for salons, spas, med spas, and beauty clinics, I started to see voice AI very differently.&lt;/p&gt;

&lt;p&gt;A chatbot can be useful even when it feels a little slow.&lt;/p&gt;

&lt;p&gt;A phone call cannot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text gives the user more patience
&lt;/h2&gt;

&lt;p&gt;When someone uses a chatbot, they expect a short delay.&lt;/p&gt;

&lt;p&gt;They can see the answer being generated.&lt;br&gt;&lt;br&gt;
They can reread the message.&lt;br&gt;&lt;br&gt;
They can scroll back.&lt;br&gt;&lt;br&gt;
They can pause before replying.&lt;/p&gt;

&lt;p&gt;The experience gives them space.&lt;/p&gt;

&lt;p&gt;A phone call does not.&lt;/p&gt;

&lt;p&gt;On a call, silence feels uncomfortable almost immediately.&lt;/p&gt;

&lt;p&gt;If the AI waits too long, the caller may think the call dropped.&lt;/p&gt;

&lt;p&gt;If the AI replies too quickly, it can feel unnatural.&lt;/p&gt;

&lt;p&gt;If the AI talks too much, the caller interrupts.&lt;/p&gt;

&lt;p&gt;That makes the timing much harder to get right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice AI has to feel alive
&lt;/h2&gt;

&lt;p&gt;In text, the user judges the quality mostly by the final answer.&lt;/p&gt;

&lt;p&gt;In voice, the user judges the whole interaction.&lt;/p&gt;

&lt;p&gt;The pause before the answer.&lt;br&gt;&lt;br&gt;
The tone.&lt;br&gt;&lt;br&gt;
The interruption handling.&lt;br&gt;&lt;br&gt;
The confidence.&lt;br&gt;&lt;br&gt;
The moment when the AI says “Let me check that.”&lt;br&gt;&lt;br&gt;
The way it handles uncertainty.&lt;/p&gt;

&lt;p&gt;Even when the underlying model is good, the experience can still feel bad if the voice flow is awkward.&lt;/p&gt;

&lt;p&gt;This was one of the first things I had to accept:&lt;/p&gt;

&lt;p&gt;The model is only one part of the product.&lt;/p&gt;

&lt;p&gt;The conversation design is just as important.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interruptions are normal, not edge cases
&lt;/h2&gt;

&lt;p&gt;In a chatbot, the flow is usually clean.&lt;/p&gt;

&lt;p&gt;The user sends a message.&lt;br&gt;&lt;br&gt;
The assistant replies.&lt;br&gt;&lt;br&gt;
Then the user sends another message.&lt;/p&gt;

&lt;p&gt;Phone calls are not like that.&lt;/p&gt;

&lt;p&gt;People interrupt.&lt;/p&gt;

&lt;p&gt;They correct themselves.&lt;/p&gt;

&lt;p&gt;They ask a second question before the first one is answered.&lt;/p&gt;

&lt;p&gt;They start with one intent and change it halfway through.&lt;/p&gt;

&lt;p&gt;For local businesses, this happens all the time.&lt;/p&gt;

&lt;p&gt;A salon caller might say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Do you have anything today? Actually, tomorrow morning would be better.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A med spa caller might say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I’m interested in laser. Wait, is that the same as IPL?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A spa caller might ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How much is a massage? Also, do you have couples appointments?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the AI cannot handle interruptions, the caller feels trapped inside a script.&lt;/p&gt;

&lt;p&gt;That is not a good experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The input is messy
&lt;/h2&gt;

&lt;p&gt;Most AI demos happen in clean environments.&lt;/p&gt;

&lt;p&gt;Real phone calls do not.&lt;/p&gt;

&lt;p&gt;People call from cars.&lt;br&gt;&lt;br&gt;
They call from busy rooms.&lt;br&gt;&lt;br&gt;
They speak quietly.&lt;br&gt;&lt;br&gt;
They use vague words.&lt;br&gt;&lt;br&gt;
They ask incomplete questions.&lt;br&gt;&lt;br&gt;
They may not know the correct service name.&lt;/p&gt;

&lt;p&gt;For a chatbot, messy input is annoying.&lt;/p&gt;

&lt;p&gt;For a phone agent, messy input is the default.&lt;/p&gt;

&lt;p&gt;This changes the product design.&lt;/p&gt;

&lt;p&gt;The AI has to ask follow-up questions, but not too many.&lt;/p&gt;

&lt;p&gt;It has to collect useful information, but not sound like a form.&lt;/p&gt;

&lt;p&gt;It has to be helpful, but not overconfident.&lt;/p&gt;

&lt;p&gt;That balance is difficult.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local businesses need boundaries
&lt;/h2&gt;

&lt;p&gt;One mistake I see in many AI product ideas is trying to make the AI do everything.&lt;/p&gt;

&lt;p&gt;Answer every question.&lt;br&gt;&lt;br&gt;
Book every appointment.&lt;br&gt;&lt;br&gt;
Handle every exception.&lt;br&gt;&lt;br&gt;
Replace every human step.&lt;/p&gt;

&lt;p&gt;For local businesses, I think that is the wrong starting point.&lt;/p&gt;

&lt;p&gt;A salon, spa, or med spa does not need an AI that pretends to be perfect.&lt;/p&gt;

&lt;p&gt;They need an AI that can reliably help with the calls the team cannot always answer.&lt;/p&gt;

&lt;p&gt;That might mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;answering after-hours calls&lt;/li&gt;
&lt;li&gt;collecting booking intent&lt;/li&gt;
&lt;li&gt;asking for preferred time&lt;/li&gt;
&lt;li&gt;capturing service details&lt;/li&gt;
&lt;li&gt;summarizing the call&lt;/li&gt;
&lt;li&gt;handing off when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The handoff is not a failure.&lt;/p&gt;

&lt;p&gt;Sometimes the handoff is the product working correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust is more fragile on the phone
&lt;/h2&gt;

&lt;p&gt;In a chatbot, a wrong answer is bad.&lt;/p&gt;

&lt;p&gt;On a phone call, a wrong answer can feel worse.&lt;/p&gt;

&lt;p&gt;The caller is giving attention in real time. They may be trying to book something, ask about pricing, reschedule, or decide whether the business feels trustworthy.&lt;/p&gt;

&lt;p&gt;If the AI sounds too confident about something it should not promise, trust drops.&lt;/p&gt;

&lt;p&gt;If it pretends to know a policy it does not know, trust drops.&lt;/p&gt;

&lt;p&gt;If it refuses to hand off when the caller asks for a human, trust drops.&lt;/p&gt;

&lt;p&gt;For appointment-based businesses, trust matters because the call is often part of the buying decision.&lt;/p&gt;

&lt;p&gt;This is especially true for services like hair color, skin treatments, injections, laser, massage, or first-time consultations.&lt;/p&gt;

&lt;p&gt;The caller is not only asking for information.&lt;/p&gt;

&lt;p&gt;They are testing whether the business feels responsive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The transcript is not enough
&lt;/h2&gt;

&lt;p&gt;At first, I thought the transcript would be one of the most important outputs.&lt;/p&gt;

&lt;p&gt;But the more I thought about the workflow, the more I realized most business owners do not want to read long transcripts.&lt;/p&gt;

&lt;p&gt;They want the useful summary.&lt;/p&gt;

&lt;p&gt;Who called?&lt;br&gt;&lt;br&gt;
What did they want?&lt;br&gt;&lt;br&gt;
How urgent was it?&lt;br&gt;&lt;br&gt;
What service were they asking about?&lt;br&gt;&lt;br&gt;
What should the team do next?&lt;/p&gt;
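
&lt;p&gt;Those questions map naturally onto a small structured record. This is only a sketch: the &lt;code&gt;CallSummary&lt;/code&gt; name, its fields, and the sample values are hypothetical and not RingBooker's actual schema.&lt;/p&gt;

```python
from dataclasses import dataclass, asdict


@dataclass
class CallSummary:
    """Fields mirror the questions above; names and values are hypothetical."""
    caller: str
    request: str
    urgency: str
    service: str
    next_step: str


# Invented example of what the team might receive after a call.
summary = CallSummary(
    caller="Returning client, number ending 0123",
    request="Move Friday appointment to next week",
    urgency="low",
    service="hair color",
    next_step="Call back with two available times",
)
print(asdict(summary))
```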

&lt;p&gt;A clean summary can be more useful than a perfect transcript.&lt;/p&gt;

&lt;p&gt;This is one of the biggest differences between building a demo and building a product.&lt;/p&gt;

&lt;p&gt;The demo is about showing that the AI can talk.&lt;/p&gt;

&lt;p&gt;The product is about helping the business take action after the call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The goal is not to sound impressive
&lt;/h2&gt;

&lt;p&gt;A good voice AI product should not be measured only by how smart it sounds.&lt;/p&gt;

&lt;p&gt;For local businesses, I think the better questions are:&lt;/p&gt;

&lt;p&gt;Did it answer quickly?&lt;br&gt;&lt;br&gt;
Did it understand the caller’s intent?&lt;br&gt;&lt;br&gt;
Did it avoid making promises it should not make?&lt;br&gt;&lt;br&gt;
Did it know when to ask a follow-up question?&lt;br&gt;&lt;br&gt;
Did it know when to hand off?&lt;br&gt;&lt;br&gt;
Did it send the team something useful?&lt;/p&gt;

&lt;p&gt;That is a more practical benchmark.&lt;/p&gt;

&lt;p&gt;It is also harder than it sounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build around first
&lt;/h2&gt;

&lt;p&gt;If I were starting again, I would not start with the most complex booking flow.&lt;/p&gt;

&lt;p&gt;I would start with the most common missed-call situations:&lt;/p&gt;

&lt;p&gt;After-hours callers.&lt;br&gt;&lt;br&gt;
Busy-hour overflow.&lt;br&gt;&lt;br&gt;
Same-day appointment requests.&lt;br&gt;&lt;br&gt;
Reschedules.&lt;br&gt;&lt;br&gt;
Basic pricing questions.&lt;br&gt;&lt;br&gt;
Consultation inquiries.&lt;br&gt;&lt;br&gt;
Human handoff requests.&lt;/p&gt;

&lt;p&gt;These are not the most glamorous flows, but they are the ones that happen every day.&lt;/p&gt;

&lt;p&gt;And for a local business, capturing one missed opportunity can matter more than having a perfect AI demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Voice AI is not just chatbot logic plus speech.&lt;/p&gt;

&lt;p&gt;It is a different product surface.&lt;/p&gt;

&lt;p&gt;The user experience is faster, messier, and less forgiving.&lt;/p&gt;

&lt;p&gt;That is what makes it hard.&lt;/p&gt;

&lt;p&gt;But that is also what makes it interesting.&lt;/p&gt;

&lt;p&gt;For local businesses, the phone is still where many high-intent customers show up. If AI can help answer those calls without pretending to replace the human team, I think there is a real product there.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceai</category>
      <category>startup</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Building an AI Receptionist for Salons: What Was Harder Than I Expected</title>
      <dc:creator>Luis Pham</dc:creator>
      <pubDate>Tue, 28 Apr 2026 06:47:08 +0000</pubDate>
      <link>https://dev.to/luispham/building-an-ai-phone-agent-for-salons-what-was-harder-than-i-expected-hn6</link>
      <guid>https://dev.to/luispham/building-an-ai-phone-agent-for-salons-what-was-harder-than-i-expected-hn6</guid>
      <description>&lt;h1&gt;
  
  
  Building an AI Receptionist for Salons: What Was Harder Than I Expected
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;a href="https://ringbooker.com" rel="noopener noreferrer"&gt;RingBooker&lt;/a&gt;, an AI receptionist for salons, spas, med spas, and other appointment-based businesses.&lt;/p&gt;

&lt;p&gt;When I started, I thought the product was mostly about the AI.&lt;/p&gt;

&lt;p&gt;Answer the phone.&lt;br&gt;&lt;br&gt;
Understand the caller.&lt;br&gt;&lt;br&gt;
Collect the booking details.&lt;br&gt;&lt;br&gt;
Send the business a summary.&lt;/p&gt;

&lt;p&gt;Simple enough.&lt;/p&gt;

&lt;p&gt;It turned out the AI model was only part of the difficulty. The harder part was the phone call itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  A phone call has no “loading state”
&lt;/h2&gt;

&lt;p&gt;With a chatbot, a user can wait.&lt;/p&gt;

&lt;p&gt;They can see the message is generating. They can scroll back. They can reread the answer.&lt;/p&gt;

&lt;p&gt;On a phone call, silence feels broken.&lt;/p&gt;

&lt;p&gt;Even a short delay can make the caller wonder:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is it still listening?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That changed how I thought about latency.&lt;/p&gt;

&lt;p&gt;At first, I treated latency like a normal backend metric. How fast is the response? How long does the model take? How quickly can the audio come back?&lt;/p&gt;

&lt;p&gt;But in a real phone call, the user does not care about the number.&lt;/p&gt;

&lt;p&gt;They care about whether the conversation feels alive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The caller will interrupt
&lt;/h2&gt;

&lt;p&gt;This was another thing I underestimated.&lt;/p&gt;

&lt;p&gt;In text, the flow is clean. User sends a message. Assistant replies.&lt;/p&gt;

&lt;p&gt;On a call, people interrupt constantly.&lt;/p&gt;

&lt;p&gt;They start with one request, then change it halfway through.&lt;/p&gt;

&lt;p&gt;“Do you have anything today? Actually tomorrow is better.”&lt;/p&gt;

&lt;p&gt;“I need a refill. Wait, maybe a full set.”&lt;/p&gt;

&lt;p&gt;“Can I speak to someone? No, actually I just want to know the price first.”&lt;/p&gt;

&lt;p&gt;If the AI keeps talking when the caller is trying to correct something, the whole experience feels wrong.&lt;/p&gt;

&lt;p&gt;Barge-in is not a small feature. It is part of the core UX.&lt;/p&gt;
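
&lt;p&gt;One way to picture barge-in is as a small turn-taking state machine. The sketch below is only an illustration, not RingBooker's actual code: &lt;code&gt;CallSession&lt;/code&gt;, &lt;code&gt;Turn&lt;/code&gt;, and the idea of splitting a reply into short text chunks are all assumptions. It shows the core rule: the moment the caller speaks, unplayed audio is dropped and the agent goes back to listening.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Turn(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


@dataclass
class CallSession:
    """Tracks who has the floor on a call. All names are hypothetical."""

    turn: Turn = Turn.LISTENING
    pending_audio: List[str] = field(default_factory=list)
    interruptions: int = 0

    def start_reply(self, chunks: List[str]) -> None:
        # Queue the reply in small chunks so playback can be cut off early.
        self.pending_audio = list(chunks)
        self.turn = Turn.SPEAKING

    def play_next_chunk(self) -> Optional[str]:
        # Return the next chunk to play, or None when there is nothing to say.
        if self.turn is not Turn.SPEAKING or not self.pending_audio:
            self.turn = Turn.LISTENING
            return None
        return self.pending_audio.pop(0)

    def on_caller_speech(self) -> None:
        # Barge-in: the caller started talking, so drop unplayed audio and listen.
        if self.turn is Turn.SPEAKING:
            self.pending_audio.clear()
            self.interruptions += 1
        self.turn = Turn.LISTENING
```

&lt;p&gt;In a real system, &lt;code&gt;on_caller_speech&lt;/code&gt; would be driven by some form of voice activity detection, and the chunks would be audio frames rather than strings. The point of the sketch is that the reply has to be interruptible by design, not as an afterthought.&lt;/p&gt;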

&lt;h2&gt;
  
  
  Local business calls are messy
&lt;/h2&gt;

&lt;p&gt;A lot of AI demos assume the user gives clean input.&lt;/p&gt;

&lt;p&gt;Real callers do not.&lt;/p&gt;

&lt;p&gt;They speak from cars.&lt;br&gt;&lt;br&gt;
They call from noisy rooms.&lt;br&gt;&lt;br&gt;
They use vague phrases.&lt;br&gt;&lt;br&gt;
They ask two questions at once.&lt;br&gt;&lt;br&gt;
They sometimes do not know the exact service name.&lt;/p&gt;

&lt;p&gt;For salons and spas, this is common.&lt;/p&gt;

&lt;p&gt;A caller may say “nails” when they mean an acrylic full set.&lt;br&gt;&lt;br&gt;
A med spa caller may ask about “laser” without knowing which treatment.&lt;br&gt;&lt;br&gt;
A hair salon caller may ask for “color” without knowing whether it is highlights, a root touch-up, or a color correction.&lt;/p&gt;

&lt;p&gt;So the AI cannot just collect form fields. It has to ask enough follow-up questions without making the call feel like an interrogation.&lt;/p&gt;

&lt;p&gt;That balance is harder than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The summary matters more than the transcript
&lt;/h2&gt;

&lt;p&gt;At first I cared a lot about the transcript.&lt;/p&gt;

&lt;p&gt;Then I realized the business owner probably does not want to read a full call transcript.&lt;/p&gt;

&lt;p&gt;They want the useful part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who called&lt;/li&gt;
&lt;li&gt;what they wanted&lt;/li&gt;
&lt;li&gt;how urgent it was&lt;/li&gt;
&lt;li&gt;what questions they asked&lt;/li&gt;
&lt;li&gt;what should happen next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a busy salon owner, a clean summary is more valuable than a perfect transcript.&lt;/p&gt;
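
&lt;p&gt;As a rough illustration, the list above maps naturally onto a small record. This is a sketch under my own assumptions: the &lt;code&gt;CallSummary&lt;/code&gt; name, its field names, and the message layout are hypothetical, not a fixed format.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallSummary:
    """One record per call. Field names are illustrative assumptions."""

    caller: str
    request: str
    urgency: str  # free text such as "today" or "no rush"
    questions: List[str] = field(default_factory=list)
    next_step: str = "Call back"

    def to_message(self) -> str:
        # Render a short note the owner can read at a glance.
        lines = [
            f"Caller: {self.caller}",
            f"Wants: {self.request}",
            f"Urgency: {self.urgency}",
        ]
        if self.questions:
            lines.append("Asked: " + "; ".join(self.questions))
        lines.append(f"Next step: {self.next_step}")
        return "\n".join(lines)
```

&lt;p&gt;Keeping the summary as structured fields, with the transcript stored separately, makes it easy to send the same information as a text message, an email, or a dashboard row.&lt;/p&gt;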

&lt;p&gt;This changed the product direction for me.&lt;/p&gt;

&lt;p&gt;The call itself is only half the product. The handoff to the business is the other half.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI should not try to handle everything
&lt;/h2&gt;

&lt;p&gt;This is probably the biggest lesson so far.&lt;/p&gt;

&lt;p&gt;It is tempting to make the AI answer every question and complete every flow.&lt;/p&gt;

&lt;p&gt;But for real businesses, that is risky.&lt;/p&gt;

&lt;p&gt;Some calls should go to a human.&lt;br&gt;&lt;br&gt;
Some questions depend on policy.&lt;br&gt;&lt;br&gt;
Some prices depend on consultation.&lt;br&gt;&lt;br&gt;
Some callers just want reassurance.&lt;/p&gt;

&lt;p&gt;A useful AI phone agent needs to know its boundary.&lt;/p&gt;
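
&lt;p&gt;A boundary like this can be written down as an explicit rule instead of being left to the model. The sketch below is hypothetical: the topic labels, the &lt;code&gt;MIN_CONFIDENCE&lt;/code&gt; value, and the &lt;code&gt;route_call&lt;/code&gt; function are assumptions for illustration, and a real product would tune them per business.&lt;/p&gt;

```python
# Topics that always go to a person; this set is an assumption for illustration.
HANDOFF_TOPICS = {"policy", "consultation_pricing", "reassurance"}

# Hypothetical cutoff below which the agent should not answer on its own.
MIN_CONFIDENCE = 0.7


def route_call(topic: str, confidence: float, asked_for_human: bool = False) -> str:
    """Return "handoff" or "answer" for one caller request."""
    if asked_for_human:
        return "handoff"
    if topic in HANDOFF_TOPICS:
        return "handoff"
    if confidence >= MIN_CONFIDENCE:
        return "answer"
    # Unsure: hand off rather than guess.
    return "handoff"
```

&lt;p&gt;The useful property is that the default is a handoff. The agent only answers when the topic is allowed and it is reasonably sure what the caller wants.&lt;/p&gt;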

&lt;p&gt;For RingBooker, I started thinking less about “AI replacing the front desk” and more about “AI covering the calls the team cannot answer.”&lt;/p&gt;

&lt;p&gt;That framing feels much healthier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The existing phone number is part of the product
&lt;/h2&gt;

&lt;p&gt;This was not obvious to me at the beginning.&lt;/p&gt;

&lt;p&gt;For many local businesses, the phone number is everywhere:&lt;/p&gt;

&lt;p&gt;Google Business Profile, website, ads, Instagram, business cards, printed signs, old customers’ phones.&lt;/p&gt;

&lt;p&gt;Asking them to change that number is a huge ask.&lt;/p&gt;

&lt;p&gt;So call forwarding became an important part of the product idea.&lt;/p&gt;

&lt;p&gt;The business should be able to keep the number customers already know, while RingBooker sits behind the front-desk line for missed, overflow, or after-hours calls.&lt;/p&gt;

&lt;p&gt;It sounds like a small detail, but for local businesses it is a big trust issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would tell another builder
&lt;/h2&gt;

&lt;p&gt;If you are building a voice AI product, do not start with only the model.&lt;/p&gt;

&lt;p&gt;Start with the awkward parts of the call.&lt;/p&gt;

&lt;p&gt;What happens when the caller interrupts?&lt;br&gt;&lt;br&gt;
What happens when the audio is bad?&lt;br&gt;&lt;br&gt;
What happens when the AI is unsure?&lt;br&gt;&lt;br&gt;
What happens when the caller asks for a human?&lt;br&gt;&lt;br&gt;
What happens after the call ends?&lt;/p&gt;

&lt;p&gt;Those edge cases are not edge cases for long. They become the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;I still think voice AI will become very important for local businesses.&lt;/p&gt;

&lt;p&gt;But I no longer think the goal is to make the AI sound impressive.&lt;/p&gt;

&lt;p&gt;The goal is simpler and harder:&lt;/p&gt;

&lt;p&gt;Answer quickly.&lt;br&gt;&lt;br&gt;
Be clear.&lt;br&gt;&lt;br&gt;
Do not overpromise.&lt;br&gt;&lt;br&gt;
Know when to hand off.&lt;br&gt;&lt;br&gt;
Give the business a useful next step.&lt;/p&gt;

&lt;p&gt;That is the version of the product I’m trying to build.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceai</category>
      <category>startup</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
