Building an AI Phone Agent for Salons: What Was Harder Than I Expected
I’m building RingBooker, an AI phone agent for salons, spas, med spas, and other appointment-based businesses.
When I started, I thought the product was mostly about the AI.
Answer the phone.
Understand the caller.
Collect the booking details.
Send the business a summary.
Simple enough.
It turned out the AI model was only part of the difficulty. The harder part was the phone call itself.
A phone call has no “loading state”
With a chatbot, a user can wait.
They can see the message is generating. They can scroll back. They can reread the answer.
On a phone call, silence feels broken.
Even a short delay can make the caller wonder:
“Is this still listening?”
That changed how I thought about latency.
At first I was looking at latency like a normal backend metric. How fast is the response? How long does the model take? How quickly can the audio come back?
But in a real phone call, the user does not care about the number.
They care about whether the conversation feels alive.
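One way to act on this is to give the call a "silence budget": if the real reply is not ready in time, say something short so the line does not go dead. This is a minimal sketch, not how RingBooker is implemented. The names `respond_with_filler`, `generate_reply`, `speak`, and the 0.7 second budget are all assumptions for illustration.

```python
import asyncio

# Assumed threshold for illustration, not a measured value.
SILENCE_BUDGET_S = 0.7


async def respond_with_filler(generate_reply, speak,
                              budget_s=SILENCE_BUDGET_S,
                              filler="One moment."):
    """Speak a short filler if the reply is not ready within budget_s.

    generate_reply and speak are hypothetical async callables standing in
    for the model call and the text-to-speech output.
    """
    reply_task = asyncio.create_task(generate_reply())

    # Wait for the reply, but only up to the silence budget.
    done, _ = await asyncio.wait({reply_task}, timeout=budget_s)
    if not done:
        # The reply is late: fill the silence instead of leaving dead air.
        await speak(filler)

    reply = await reply_task
    await speak(reply)
    return reply
```

The point of the design is that the filler is tied to what the caller hears, not to a backend number: a fast reply never triggers it, a slow one always does.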
The caller will interrupt
This was another thing I underestimated.
In text, the flow is clean. User sends a message. Assistant replies.
On a call, people interrupt constantly.
They start with one request, then change it halfway through.
“Do you have anything today? Actually tomorrow is better.”
“I need a refill. Wait, maybe a full set.”
“Can I speak to someone? No, actually I just want to know the price first.”
If the AI keeps talking when the caller is trying to correct something, the whole experience feels wrong.
Barge-in is not a small feature. It is part of the core UX.
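To make the idea concrete, here is a rough sketch of barge-in using `asyncio`: the agent's audio plays as one task and is cancelled the moment the caller starts speaking. The function `speak_interruptible`, the `play_audio` callable, and the `caller_speaking` event are hypothetical; in a real system the event would be set by whatever voice-activity detection the telephony stack provides.

```python
import asyncio


async def speak_interruptible(play_audio, caller_speaking: asyncio.Event) -> bool:
    """Play agent audio, but stop as soon as the caller starts talking.

    Returns True if playback finished, False if the caller barged in.
    play_audio is a hypothetical async callable that streams audio out.
    """
    playback = asyncio.create_task(play_audio())
    barge_in = asyncio.create_task(caller_speaking.wait())

    # Whichever happens first wins: the audio ends or the caller speaks.
    done, pending = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )

    # Cancel the loser and wait for the cancellation to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    if playback in done:
        playback.result()  # re-raise any playback error
        return True
    return False
```

When this returns False, the agent should stop and listen to the correction rather than resume the sentence it was in the middle of.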
Local business calls are messy
A lot of AI demos assume the user gives clean input.
Real callers do not.
They speak from cars.
They call from noisy rooms.
They use vague phrases.
They ask two questions at once.
They sometimes do not know the exact service name.
For salons and spas, this is common.
A caller may say “nails” when they mean an acrylic full set.
A med spa caller may ask about “laser” without knowing which treatment.
A hair salon caller may ask for “color” without knowing whether it is highlights, a root touch-up, or a color correction.
So the AI cannot just collect form fields. It has to ask enough follow-up questions without making the call feel like an interrogation.
That balance is harder than I expected.
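A simple way to think about that balance is to clarify vague service words first, then fill in what is still missing, and stop after a fixed number of follow-ups. The sketch below is only an illustration: the `CLARIFY` and `PROMPTS` tables, the required fields, and the cap of three questions are all assumptions, not RingBooker's actual logic.

```python
from typing import Optional

# Hypothetical clarifying questions for vague service words.
CLARIFY = {
    "nails": "Are you thinking of a full set, a refill, or something else?",
    "color": "Is that highlights, a root touch-up, or a color correction?",
    "laser": "Which laser treatment are you interested in?",
}

# Hypothetical prompts for the details a booking request needs.
PROMPTS = {
    "service": "What service would you like to book?",
    "day": "What day works best for you?",
    "name": "Can I get your name?",
}
REQUIRED = ("service", "day", "name")

# Assumed cap so the call does not turn into an interrogation.
MAX_FOLLOW_UPS = 3


def next_question(slots: dict, asked: int) -> Optional[str]:
    """Return the next follow-up question, or None when it is time to stop."""
    if asked >= MAX_FOLLOW_UPS:
        return None  # stop asking; leave the gaps for staff to fill in

    service = (slots.get("service") or "").strip().lower()
    if service in CLARIFY:
        return CLARIFY[service]  # the service word is too vague to book

    for field in REQUIRED:
        if not slots.get(field):
            return PROMPTS[field]
    return None  # everything needed has been collected
```

For example, `next_question({"service": "nails"}, 0)` asks which nail service the caller means, while `next_question({}, 3)` returns `None` because the question budget is already spent.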
The summary matters more than the transcript
At first I cared a lot about the transcript.
Then I realized the business owner probably does not want to read a full call transcript.
They want the useful part:
- who called
- what they wanted
- how urgent it was
- what questions they asked
- what should happen next
For a busy salon owner, a clean summary is more valuable than a perfect transcript.
This changed the product direction for me.
The call itself is only half the product. The handoff to the business is the other half.
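The five items above map naturally onto a small data structure. This is a minimal sketch of what such a summary could look like; the `CallSummary` class, its field names, and the urgency labels are my own illustrative choices, not a fixed schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallSummary:
    caller: str       # who called
    request: str      # what they wanted
    urgency: str      # how urgent it was, e.g. "low", "normal", "high" (assumed labels)
    questions: List[str] = field(default_factory=list)  # what questions they asked
    next_step: str = ""  # what should happen next

    def to_message(self) -> str:
        """Render the summary as a short plain-text message for the owner."""
        lines = [
            f"Caller: {self.caller}",
            f"Wanted: {self.request}",
            f"Urgency: {self.urgency}",
        ]
        if self.questions:
            lines.append("Questions: " + "; ".join(self.questions))
        if self.next_step:
            lines.append(f"Next step: {self.next_step}")
        return "\n".join(lines)
```

A summary built this way fits in a text message or notification, which is about as much as a busy owner will read between clients.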
The AI should not try to handle everything
This is probably the biggest lesson so far.
It is tempting to make the AI answer every question and complete every flow.
But for real businesses, that is risky.
Some calls should go to a human.
Some questions depend on policy.
Some prices depend on consultation.
Some callers just want reassurance.
A useful AI phone agent needs to know its boundary.
For RingBooker, I started thinking less about “AI replacing the front desk” and more about “AI covering the calls the team cannot answer.”
That framing feels much healthier.
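That boundary can be written down as an explicit rule instead of being left to the model. The sketch below is one possible shape for it, under stated assumptions: the intent names, the 0.6 confidence threshold, and the `decide` function are hypothetical and only meant to show the idea.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"              # the agent handles it
    HANDOFF = "handoff"            # transfer to a person now
    TAKE_MESSAGE = "take_message"  # nobody is free: capture details for a callback


# Hypothetical intent labels; a real system would define its own.
HUMAN_ONLY_INTENTS = {"ask_for_human", "complaint"}
POLICY_INTENTS = {"policy_question", "consultation_pricing"}

# Assumed threshold below which the agent should not trust its own reading.
MIN_CONFIDENCE = 0.6


def decide(intent: str, confidence: float, staff_available: bool) -> Action:
    """Decide whether the agent should answer or step aside."""
    needs_human = (
        intent in HUMAN_ONLY_INTENTS
        or intent in POLICY_INTENTS
        or confidence < MIN_CONFIDENCE
    )
    if not needs_human:
        return Action.ANSWER
    return Action.HANDOFF if staff_available else Action.TAKE_MESSAGE
```

After hours, `staff_available` is simply False, so anything outside the agent's boundary becomes a message for the team instead of a guess.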
The existing phone number is part of the product
This was not obvious to me at the beginning.
For many local businesses, the phone number is everywhere:
Google Business Profile, website, ads, Instagram, business cards, printed signs, old customers’ phones.
Asking them to change that number is a huge ask.
So call forwarding became an important part of the product idea.
The business should be able to keep the number customers already know, while RingBooker sits behind the front-desk line for missed, overflow, or after-hours calls.
It sounds like a small detail, but for local businesses it is a big trust issue.
What I would tell another builder
If you are building a voice AI product, do not start only with the model.
Start with the awkward parts of the call.
What happens when the caller interrupts?
What happens when the audio is bad?
What happens when the AI is unsure?
What happens when the caller asks for a human?
What happens after the call ends?
Those edge cases are not edge cases for long. They become the product.
Final thought
I still think voice AI will become very important for local businesses.
But I no longer think the goal is to make the AI sound impressive.
The goal is simpler and harder:
Answer quickly.
Be clear.
Do not overpromise.
Know when to hand off.
Give the business a useful next step.
That is the version of the product I’m trying to build.