How I Built an AI Voice Agent That Makes Outbound Sales Calls

#ai #voice #saas #startup

Every startup founder knows the outbound sales grind. You either hire appointment setters at $3K–$5K a month and deal with constant turnover, or you make the calls yourself and burn hours that should go toward building the product.
I decided to build a third option.

The Problem

I was running outbound campaigns for a small business and quickly hit a ceiling. One person can make maybe 50–60 calls a day before their voice gives out and their energy tanks. Hiring was expensive and unreliable: I went through three appointment setters in four months.
I kept thinking: these calls follow a script. The qualifying questions are the same every time. The objection handling follows predictable patterns. This is exactly the kind of structured, repetitive work that AI should be doing.
So I built Novara AI.

How It Works Under the Hood

The architecture is simpler than you might think. There are three core components working together in real time:
Speech-to-Text: When the prospect speaks, their audio is transcribed as it streams in, so the text is ready almost as soon as they stop talking. Modern STT models have become very accurate, even with accents, background noise, and crosstalk.
The Brain (LLM): The transcript gets fed to a language model along with the user's call script, objection handling rules, and qualification criteria. The model decides what to say next — not by picking from a decision tree, but by understanding the conversation context and generating a natural response that stays on-script.
Text-to-Speech: The response gets converted to spoken audio using voice synthesis. Modern TTS no longer sounds robotic. With natural pacing, filler words, and appropriate pauses, the quality has crossed a threshold where most people on the other end don't realize they're talking to AI.
The entire loop — listen, think, speak — happens fast enough that the conversation flows without awkward silences.
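To make the loop concrete, here is a minimal sketch of a single turn, assuming the three components can each be treated as a plain function. The names `CallConfig`, `CallState`, `build_prompt`, and `run_turn` are made up for illustration and are not Novara's actual code; the STT, LLM, and TTS steps are passed in as stand-in callables rather than tied to any real vendor API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallConfig:
    """What the customer supplies; field names are illustrative."""
    script: str
    objection_rules: list[str]
    qualification_criteria: list[str]


@dataclass
class CallState:
    # (speaker, text) turns kept so the model sees the conversation context.
    history: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(config: CallConfig, state: CallState) -> str:
    """Combine the script, rules, criteria, and transcript into one prompt."""
    rules = "\n".join(f"- {r}" for r in config.objection_rules)
    criteria = "\n".join(f"- {c}" for c in config.qualification_criteria)
    transcript = "\n".join(f"{who}: {text}" for who, text in state.history)
    return (
        f"Call script:\n{config.script}\n\n"
        f"Objection handling rules:\n{rules}\n\n"
        f"Qualification criteria:\n{criteria}\n\n"
        f"Conversation so far:\n{transcript}\n\n"
        "Reply with the agent's next line, staying on-script."
    )


def run_turn(
    audio_chunk: bytes,
    config: CallConfig,
    state: CallState,
    transcribe: Callable[[bytes], str],  # STT stand-in (assumption)
    generate: Callable[[str], str],      # LLM stand-in (assumption)
    synthesize: Callable[[str], bytes],  # TTS stand-in (assumption)
) -> bytes:
    """One listen -> think -> speak cycle; returns audio to play back."""
    heard = transcribe(audio_chunk)
    state.history.append(("prospect", heard))
    reply = generate(build_prompt(config, state))
    state.history.append(("agent", reply))
    return synthesize(reply)
```

Passing the components in as callables keeps the sketch testable with fakes and makes it easy to swap providers. A production version would stream audio in both directions instead of handling one chunk per turn.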

The Hardest Part Wasn't the AI

The voice technology was actually the easier part. The hard part was making it useful for non-technical people.
My target users are agency owners, real estate teams, solar companies — people who want results, not an API to tinker with. They don't want to write code or configure webhooks. They want to paste their script, upload a contact list, and start getting meetings booked.
So the entire setup flow had to be dead simple. Upload your script. Set your targets. Hit launch. See meetings appear in your calendar. I got it down to about 10 minutes from signup to first call.
That simplicity is what I think matters most in this space. There are plenty of AI voice platforms that are powerful but require a developer to set up. Most small businesses and agencies don't have that luxury.

What I've Learned So Far

A few things that surprised me after launching:
Scripts matter more than AI quality. The best voice AI in the world will fail with a bad script. The companies getting the best results are the ones who already had proven cold call scripts from their human reps. AI just runs those scripts at 10x the volume.
Inbound follow-up is the hidden goldmine. Everyone thinks about cold outreach, but the highest-converting use case is actually speed-to-lead: calling inbound leads within minutes of form submission. A lead that gets called within 2 minutes converts at 5–10x the rate of one that gets called the next day.
People don't care that it's AI. I expected more pushback from prospects. In practice, most people either don't notice or don't care, as long as the conversation is helpful and relevant. What they hate is irrelevant, poorly timed calls, and that's a script problem, not an AI problem.
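One way to act on speed-to-lead is to let inbound leads jump ahead of cold contacts in the dialing queue. The sketch below is an illustration under that assumption, not a description of how Novara schedules calls. `CallQueue`, `QueuedCall`, and `within_target` are hypothetical names, and the 2-minute window comes from the figure above.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

INBOUND, COLD = 0, 1  # lower number is dialed first


@dataclass(order=True)
class QueuedCall:
    priority: int
    queued_at: datetime
    phone: str = field(compare=False)  # not used for ordering


class CallQueue:
    """Min-heap ordered by (priority, queued_at): inbound first, oldest first."""

    def __init__(self) -> None:
        self._heap: list[QueuedCall] = []

    def add_inbound(self, phone: str, submitted_at: datetime) -> None:
        heapq.heappush(self._heap, QueuedCall(INBOUND, submitted_at, phone))

    def add_cold(self, phone: str, added_at: datetime) -> None:
        heapq.heappush(self._heap, QueuedCall(COLD, added_at, phone))

    def next_call(self) -> Optional[QueuedCall]:
        return heapq.heappop(self._heap) if self._heap else None


def within_target(call: QueuedCall, now: datetime,
                  target: timedelta = timedelta(minutes=2)) -> bool:
    """True if an inbound lead is still inside the response window."""
    return call.priority == INBOUND and now - call.queued_at <= target
```

With this ordering, a form submission that arrives mid-campaign is the next number dialed, and `within_target` can feed a simple metric for how often the window is actually met.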

Where AI Calling Is Headed

We're still early. In the next 12 months, I expect to see AI calling agents that can handle multi-call sequences (first call qualifies, second call closes), integrate natively with CRMs to pull prospect context mid-call, and switch between languages seamlessly.
The companies that adopt this now are going to have a structural cost advantage. When your competitor is paying $15K a month for three SDRs and you're getting the same output from AI at a fraction of that, the math compounds fast.
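To put a number on the human baseline, here is a rough per-call calculation using the figures from this post: three SDRs at $15K a month in total, each making 50–60 calls a day. The 21 working days per month is my assumption, and `cost_per_call` is just an illustrative helper.

```python
WORKING_DAYS_PER_MONTH = 21  # assumption, not a figure from this post


def cost_per_call(monthly_cost: float, calls_per_day: float,
                  working_days: int = WORKING_DAYS_PER_MONTH) -> float:
    """Monthly spend divided by the number of calls placed in that month."""
    return monthly_cost / (calls_per_day * working_days)


# Human baseline: three SDRs, $15K a month in total,
# each placing 50 to 60 calls a day.
sdr_best = cost_per_call(15_000, 3 * 60)
sdr_worst = cost_per_call(15_000, 3 * 50)
print(f"SDR team: ${sdr_best:.2f} to ${sdr_worst:.2f} per call")
# prints "SDR team: $3.97 to $4.76 per call"
```

To compare, call `cost_per_call` with your own AI plan price and daily call volume. No AI pricing is assumed here, since it varies by plan and usage.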
If you want to see what I built, check out Novara AI at getnovara.io. There's a free trial if you want to test it with real calls.

Tech Stack (For the Curious)

For those wondering about the technical side: the frontend is React, the voice pipeline uses a combination of real-time STT and TTS models, and the conversation engine is LLM-powered with custom guardrails to keep calls on-script. Happy to answer questions in the comments.
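For anyone curious what a guardrail can look like, here is a deliberately simple sketch of a check that runs on the model's reply before it reaches TTS. It is an illustration only, not the production guardrails: `Guardrails` and `apply_guardrails` are hypothetical names, and the rules (banned phrases, a word limit, a scripted fallback line) are assumptions about what such a check might cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    banned_phrases: tuple[str, ...]  # things the agent must never say
    max_words: int                   # keeps replies short enough for a call
    fallback: str                    # scripted line used when a check fails


def apply_guardrails(reply: str, rails: Guardrails) -> str:
    """Return the reply if it passes every check, else the fallback line."""
    text = " ".join(reply.split())  # collapse stray whitespace and newlines
    if not text:
        return rails.fallback
    lowered = text.lower()
    if any(phrase.lower() in lowered for phrase in rails.banned_phrases):
        return rails.fallback
    if len(text.split()) > rails.max_words:
        return rails.fallback
    return text
```

Falling back to a scripted line, rather than asking the model to retry, avoids adding another round trip to the conversation. Substring matching is crude, so a real system would likely need something more robust.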