DEV Community

Sam


What I learned building an AI voice agent stack solo (Vapi + n8n, 2 months in)

Two months ago I started building voice agents for small service businesses: dental clinics and HVAC companies that lose real money every time a call goes to voicemail. I'm doing it solo, alongside a day job, which means every wrong turn costs me a weekend I don't get back.

Here's what actually went wrong, what I'd tell myself on day one, and the parts of the stack that held up.

The stack, briefly

  • Vapi for the voice layer (speech-to-text, the LLM turn, text-to-speech)
  • n8n self-hosted on a cheap VPS for orchestration — booking lookups, calendar writes, follow-up triggers
  • A Google Sheets + n8n layer for scheduling and logging while I'm pre-revenue and don't want to pay for tooling I haven't validated

Nothing here is exotic. That was the point. I wanted boring, debuggable infrastructure I could reason about at 11pm.

Lesson 1: The hard problem isn't the AI. It's the handoff.

I assumed the voice model would be the scary part. It wasn't. Modern voice platforms handle the conversation surprisingly well out of the box.

The actual pain was everything around the conversation — what happens when the agent needs to check an appointment slot, write to a calendar, or hand off to a human gracefully. That orchestration logic is where I lost the most time, and it's the part no demo video ever shows you.

If you're evaluating this space: budget your time for the plumbing, not the model.

Lesson 2: Self-hosting n8n is worth it, but prune your execution data or die

Running n8n in Docker on a small VPS is genuinely fine for low volume. What nobody warned me about: execution data accumulates fast and will quietly eat your disk.

The fix is one environment variable:
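
A minimal sketch of that setting, assuming the variable in question is n8n's documented `EXECUTIONS_DATA_PRUNE` flag. The second variable is an optional retention window, and 168 hours is an arbitrary example value rather than a recommendation. Defaults differ between n8n versions, so check the docs for the release you run.

```shell
# Add to the n8n container's environment (for example an env file
# or the "environment:" section of a compose file).

# Enable automatic deletion of old execution data.
EXECUTIONS_DATA_PRUNE=true

# Optional, assumed value: keep executions for at most 168 hours (7 days).
EXECUTIONS_DATA_MAX_AGE=168
```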

Set it early. I found out the way you'd expect — a workflow failing for no obvious reason, an hour of confusion, then a df -h showing a nearly full disk.
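
To catch this before a workflow fails, a small disk check can run on a schedule. The script below is a sketch, not what I had in place at the time: the `usage_percent` helper, the `/` mount point and the 80% threshold are all illustrative assumptions.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) for the filesystem holding a path.
usage_percent() {
  # -P forces POSIX output: one line per filesystem, capacity in column 5.
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Both values are illustrative defaults; override them via the environment.
MOUNT_POINT="${MOUNT_POINT:-/}"
THRESHOLD="${THRESHOLD:-80}"

used="$(usage_percent "$MOUNT_POINT")"
if [ "$used" -ge "$THRESHOLD" ]; then
  echo "WARNING: $MOUNT_POINT is ${used}% full (threshold ${THRESHOLD}%)"
else
  echo "OK: $MOUNT_POINT is ${used}% full"
fi
```

Run it from cron, or from an n8n schedule trigger that alerts on the WARNING line, so you see a filling disk as a message instead of as a mystery failure.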

Lesson 3: Cold outreach taught me more than my landing page did

I ran a cold email campaign to roughly 1,600 leads over two months. Clean domain warmup, SPF/DKIM/DMARC all verified, aggregate reports showing no auth failures.

Replies: basically zero.

That stung, but it was useful. It forced me to confront that deliverability being technically correct and the message being compelling are completely different problems. The infrastructure was fine. The offer and the targeting weren't sharp enough yet. No amount of DNS hygiene fixes a message that doesn't land.

Lesson 4: Narrow beats broad, faster than I expected

Early on I wanted to serve "service businesses." Too vague. The moment I picked one vertical and wrote scripts for specific call patterns — new patient booking, after-hours emergencies, the weird edge cases a real receptionist handles — everything got easier. The demos got sharper. The objections got predictable.

If you're building anything agent-shaped: pick the narrowest viable slice and over-fit to it. You can generalize later.

What I'd tell myself on day one

  1. The model is the easy 20%. Plan for the orchestration.
  2. Turn on data pruning before you need it.
  3. Correct infrastructure ≠ a message people respond to. Validate the offer separately.
  4. Go narrower than feels comfortable.

If you're building in the voice-agent or automation space and have hit the same walls, I'd genuinely like to compare notes in the comments.

I'm building VoiceIntego, AI voice agents for service businesses, mostly so businesses stop losing jobs to voicemail. Still early. Happy to talk shop.
