I spent 3-4 weeks building an AI voice agent that cold-calls HVAC owners for me. Real outbound. Real objection handling. Real appointments booked. When we listen back through the recordings, the owners almost never clock it as AI. They just think it's a human SDR with a slightly weird cadence.
This is the build, and the honest reason it's sitting on the shelf for high-volume right now.
The stack
Make.com runs the whole thing — seven scenarios wired around a single Google Sheet called NeverMiss – Outbound Calls that acts as the CRM.
- Dialler: every 5 minutes, pulls one row where `status=new`, `enabled=TRUE`, `attempts<3`, and fires a call into Vapi
- Logger: webhook from Vapi after every call. Writes transcript, recording, outcome back. Uses an OpenAI module to classify the transcript into structured outcomes
- Watch Dog: every 10 minutes, catches rows where the call went out but the Logger webhook never fired (Vapi had a bad minute), flips them to `logger-missed` for retry
- Reconciler: sweeps stuck rows, enforces attempt caps, handles edge cases
- Recording sync: pulls MP3s out of Vapi before the 30-day retention wipes them, stores in Drive
- Booking Handler: when the agent closes a booking mid-call, writes the calendar event and fires a confirmation
- CallBack Dialer: separate path for inbound callback requests (form-fill → call within 60 seconds)
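The real dialler is a Make.com scenario, not code, but its row filter is easy to picture in Python. A minimal sketch, assuming each Sheet row arrives as a plain dict with `enabled` already parsed to a boolean; the `pick_next_row` name is made up for illustration:

```python
from typing import Optional

MAX_ATTEMPTS = 3  # mirrors the attempts<3 rule in the list above


def pick_next_row(rows: list[dict]) -> Optional[dict]:
    """Return the first row the dialler may call, or None if nothing qualifies."""
    for row in rows:
        if (
            row.get("status") == "new"
            and row.get("enabled") is True
            and int(row.get("attempts", 0)) < MAX_ATTEMPTS
        ):
            return row
    return None
```

Returning a single row per run matches the "one row every 5 minutes" pacing described above.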
One Vapi agent on the voice side. Four Twilio numbers rotating for caller ID reputation. ElevenLabs for the actual voice. The complexity isn't in any one piece. It's in how tightly they're wired so nothing quietly breaks.
The Sheet is the CRM, with row locks
The Sheet has columns you would expect: business_name, phone_e164, status, call_id, recording_url, email, last_called_at, attempts, notes.
And some you might not: lead_uid (UUID, stable key), lock_token + lock_expires_at, next_callable_at, logger_ran_at.
The lock columns are the interesting part. Before the dialler fires a call, it writes a lock token and an expiry (lock_expires_at = now + 10 min). That's the contract: if any other scenario or retry sees a row with a live lock, it skips it. If the lock is stale (expired), the row is free for reclaim.
Stops the Watch Dog from re-queueing a row that's actively being called. Stops two dialler runs from racing the same lead. Stops weird double-call bugs.
This is the kind of thing you don't need until volume hits, and then you really need it.
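The lock contract can be sketched in a few lines of Python. This is an illustration of the rule, not the Make.com implementation: rows are assumed to be in-memory dicts, the function names are hypothetical, and the sketch ignores the fact that a read-then-write against a real Sheet is not atomic.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_TTL = timedelta(minutes=10)  # lock_expires_at = now + 10 min


def lock_is_live(row: dict, now: datetime) -> bool:
    """A lock counts only if a token is set and the expiry is still in the future."""
    expires = row.get("lock_expires_at")
    return bool(row.get("lock_token")) and expires is not None and expires > now


def try_acquire_lock(row: dict, now: datetime) -> Optional[str]:
    """Claim the row if it is unlocked or its lock is stale; return the new token."""
    if lock_is_live(row, now):
        return None  # someone else owns this row, so skip it
    token = str(uuid.uuid4())
    row["lock_token"] = token
    row["lock_expires_at"] = now + LOCK_TTL
    return token
```

A caller would use `try_acquire_lock(row, datetime.now(timezone.utc))` and only place the call when a token comes back.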
How it sounds like a human
Most AI voice builds fail on two things: the voice itself, and the gap between the human finishing their sentence and the agent replying. Both telegraph "robot" immediately.
For voice, ElevenLabs inside Vapi, cloned from a real sample. Tested OpenAI, Deepgram, PlayHT — ElevenLabs still wins on natural breath patterns and micro-pauses.
For latency, Vapi's tunable VAD and smart endpointing, tuned so the gap between you finishing and the agent replying sits right where a human would land. Too low, it interrupts. Too high, the silence tells on it.
The system prompt is about 1,100 words. The agent's named Alex. Rules include:
- Never read the prospect back to themselves
- Use natural fillers
- If asked "is this a robot" — be honest
- If asked for pricing — give a range and offer to send email details
- If told to stop calling — confirm and end gracefully
Specific objections each get short inline responses that flip to a question. Nothing read aloud — everything handled as conversation.
The agent can also call two tools mid-call:
- `book_appointment`: triggers the Booking Handler scenario which writes to Calendar and emails the confirm
- `capture_email`: writes to the Sheet, kicks off a follow-up sequence
When those tools fire, the agent pauses briefly, gets the response, reads the confirmation back. No human in the loop on a successful call.
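One way to picture the tool round trip is a small dispatcher that maps a tool name to a handler and hands back the sentence the agent reads aloud. A rough sketch only: the argument shapes, handler bodies, and fallback wording are all assumptions, and none of this is Vapi's actual payload schema.

```python
from typing import Callable


def book_appointment(args: dict) -> str:
    # Placeholder: the real scenario writes a Calendar event and emails a confirmation.
    return f"Booked for {args['slot']}"


def capture_email(args: dict) -> str:
    # Placeholder: the real scenario writes to the Sheet and starts a follow-up sequence.
    return f"Saved {args['email']}"


TOOLS: dict[str, Callable[[dict], str]] = {
    "book_appointment": book_appointment,
    "capture_email": capture_email,
}


def handle_tool_call(name: str, args: dict) -> str:
    """Run the named tool and return the text the agent should read back."""
    handler = TOOLS.get(name)
    if handler is None:
        return "Sorry, I can't do that right now."  # hypothetical fallback line
    return handler(args)
```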
The glue — Logger, Watch Dog, recording sync
This is the boring but critical part.
Logger
Every Vapi call ends with a webhook hitting a Make scenario. The payload has the call ID, transcript, recording URL, end reason, and a Vapi-generated summary. The Logger finds the matching Sheet row by lead_uid, then runs the transcript through OpenAI to classify outcome into my taxonomy:
- `booked`
- `email_captured`
- `voicemail`
- `no_answer`
- `do_not_call`
- `no_interest`
- `callback_requested`
Writes everything back, fires downstream actions — calendar events, DNC flags, re-queue for voicemails.
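Because the classifier is an LLM, it helps to check its answer against the taxonomy before acting on it. A minimal sketch, assuming the model returns a bare label string; the action names in `DOWNSTREAM` are hypothetical stand-ins for the calendar, DNC, and re-queue steps mentioned above:

```python
from typing import Optional

OUTCOMES = {
    "booked", "email_captured", "voicemail", "no_answer",
    "do_not_call", "no_interest", "callback_requested",
}

# Only the three downstream actions named in the post are mapped here.
DOWNSTREAM = {
    "booked": "create_calendar_event",
    "do_not_call": "set_dnc_flag",
    "voicemail": "requeue",
}


def normalise_outcome(raw: str) -> str:
    """Trim and lowercase the model's label, rejecting anything off-taxonomy."""
    label = raw.strip().lower()
    if label not in OUTCOMES:
        raise ValueError(f"unexpected outcome label: {raw!r}")
    return label


def downstream_action(outcome: str) -> Optional[str]:
    """Return the follow-up action for an outcome, or None if there is none."""
    return DOWNSTREAM.get(outcome)
```

Raising on an unknown label keeps a hallucinated category from silently landing in the Sheet.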
Watch Dog
Vapi sometimes fails to fire the post-call webhook. Without a watchdog, those rows sit silently in dialing status forever while you think everything is fine.
Mine runs every 10 minutes, finds rows with a last_called_at older than 5 minutes but logger_ran_at still empty, and flips status to logger-missed so the dialler picks them up again. Also clears stale lock tokens.
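The sweep rule translates almost directly into code. A sketch under the same assumptions as before (rows as dicts holding timezone-aware datetimes, a hypothetical `sweep` function), not the actual Make.com scenario:

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=5)  # how long to wait for the Logger webhook


def sweep(rows: list[dict], now: datetime) -> list[str]:
    """Flag rows whose Logger never ran and clear expired locks.

    Returns the lead_uid of every row flipped to logger-missed.
    """
    flipped = []
    for row in rows:
        called = row.get("last_called_at")
        if called is not None and not row.get("logger_ran_at") and now - called > GRACE:
            row["status"] = "logger-missed"
            flipped.append(row.get("lead_uid"))
        expires = row.get("lock_expires_at")
        if expires is not None and expires <= now:
            # Stale lock: release it so the row can be reclaimed.
            row["lock_token"] = None
            row["lock_expires_at"] = None
    return flipped
```

Running it twice over the same rows gives the same result, which matters for a job that fires every 10 minutes.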
Recording sync
Vapi deletes recordings after 30 days. I want them forever — to retrain the prompt, spot objection patterns, audit outcomes. So a Make scenario runs every 4 hours, pulls the MP3, uploads to a Drive folder named <lead_uid>_<company>_<date>.mp3, writes the Drive URL back, flips a synced flag.
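The naming pattern is simple, but company names often contain characters that are awkward in filenames. A small sketch of one way to build the name; the sanitising step and the ISO date format are assumptions, since the post only gives the `<lead_uid>_<company>_<date>.mp3` shape:

```python
import re
from datetime import date


def recording_filename(lead_uid: str, company: str, call_date: date) -> str:
    """Build '<lead_uid>_<company>_<date>.mp3' with a filename-safe company part."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    safe_company = re.sub(r"[^A-Za-z0-9]+", "-", company).strip("-")
    return f"{lead_uid}_{safe_company}_{call_date.isoformat()}.mp3"
```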
Owners can't tell it's AI: actual numbers
The recording sync is how I proved this instead of just claiming it.
After 200+ connected calls, I tagged every recording: did the prospect ever ask "is this a bot?", and did they ever seem confused?
11 out of 200 asked if it was AI. That's 5.5%.
And most of the 11 asked after Alex had already booked a callback or captured their email — curiosity, not suspicion.
The other 94.5% just had a normal conversation. Successful bookings averaged 3 min 40 sec, identical to a human SDR doing the same call, meaning prospects weren't shortcutting it because they smelled a robot.
I'd share clips, but they're real owners' voices without opt-in. Trust the data.
Why I've parked it for volume
Here's the uncomfortable part.
Most HVAC numbers on public directories aren't direct owner lines. They're main business lines routing through:
- An IVR ("press 1 for service, 2 for billing")
- A call center
- A front-desk receptionist whose actual job is keeping people like me away from the owner
Current voice AI is still bad at IVRs. Vapi, Retell, Bland — all of them. The agent hears "press 1 for service" and either sits silent until the menu times out, or tries to talk to a recording.
Human gatekeepers are a coin flip: warm ones might put you through, suspicious ones kill the call with one question the agent can't answer.
Until voice AI gets genuinely good at navigating phone trees — pressing digits intelligently, detecting robot-vs-human, adjusting mid-call — high-volume outbound to public numbers burns money for 10-15% owner connection rates.
So I've paused it. Not dead. Paused. When the models catch up, it's right back on.
Why it still works
If you've got direct owner mobile numbers, this system is a weapon.
| Metric | Number |
|---|---|
| Connection rate (direct mobiles) | 40-55% |
| Book rate (on connected calls) | 8-14% |
| All-in cost per connected minute | ~$0.18 |
That's cold-outbound economics that actually pencil.
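To see what those ranges mean per booking, multiply the two rates: the chance that a single dial ends in a booking is connection rate times book rate, so the expected number of dials per booking is the inverse. A quick sketch using only the figures from the table (the function name is made up):

```python
def dials_per_booking(connection_rate: float, book_rate: float) -> float:
    """Expected dials needed for one booking, given both rates as fractions."""
    if not (0 < connection_rate <= 1 and 0 < book_rate <= 1):
        raise ValueError("rates must be in (0, 1]")
    return 1 / (connection_rate * book_rate)


best = dials_per_booking(0.55, 0.14)   # top of both ranges
worst = dials_per_booking(0.40, 0.08)  # bottom of both ranges
print(f"roughly {best:.0f} to {worst:.0f} dials per booking")
```

That works out to roughly 13 dials per booking at the top of both ranges and roughly 31 at the bottom.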
Where direct mobiles come from ethically:
- LinkedIn scraping through Apollo / Clay / Wiza
- Network-sourced lists
- Trade show attendees
- Re-engagement on your own past leads
Also works brilliantly for inbound callback — form-fill, Alex calls within 60 seconds, qualifies, books. Running that in production right now. Book rates are silly.
What I'm building instead
Inbound. AI receptionists catching the calls HVAC owners are already missing: nights, weekends, summer peak.
Same agent pattern, flipped direction, better unit economics, 100% connection rate (they're the ones calling you).
That's what NeverMiss is now. If you run HVAC, plumbing, or roofing and want to hear what a modern AI receptionist sounds like when the call lands at 9:47pm on a Saturday, nevermisshq.com has the live demo.
And when voice AI finally cracks IVR? The dialler comes right off the shelf.
Happy to answer build questions in the comments.
Rayhan Mahmood runs NeverMiss, AI automation for US home service businesses.