TheAutomate.io

Posted on • Originally published at theautomate.io

The Quote Was $0.07 a Minute. The Bill Wasn't.

TL;DR

  • The $0.07/min voice agent cost quote is real. It's just not the whole bill.
  • In production, voice agent cost runs $0.13 to $0.31 per minute once the model, telephony, and TTS are counted.
  • If you're budgeting for voice AI, you need all four layers. Not just one.

That $0.07 a minute figure is accurate. It covers the voice engine and nothing else. Voice agent cost in a live production build is a different number entirely once you count everything that actually makes the call work.

Why Does the $0.07 Quote Feel Like a Bait and Switch?

It's not a bait and switch. It's a scope mismatch. The vendor quoted one layer of a four-layer stack.

The voice engine handles real-time audio. That's the part most platforms lead with on their pricing pages. It's the most visible component, and it's genuinely priced around that figure. But it doesn't think, it doesn't route calls, and it doesn't turn text into speech. It just moves audio.

When a prospect asks "what does it cost", they mean end to end. Most pricing pages don't answer that question. They answer a much smaller one.

This is why voice agent cost surprises people after they commit. Not because anyone lied. Because the question and the answer weren't about the same thing.

What Does the LLM Add to Voice Agent Cost?

The language model is the most variable part of the voice agent cost stack, and it's often the biggest surprise.

Every time your agent speaks, it's running a prompt through a model. Faster, cheaper models keep the cost low. More capable models cost more per call. The gap between them is real, and the right choice depends on how much reasoning your use case actually needs.

This is exactly why the cheap-first, expensive-on-retry pattern exists. You route straightforward turns through a cheaper model and only escalate to the heavier one when the call demands it. It's one of the most practical ways to control voice agent cost in production without sacrificing call quality.
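Here's a minimal sketch of that cheap-first routing idea. Everything in it is an assumption for illustration: the Reply type, the confident flag, and the two stub models are hypothetical stand-ins, not any vendor's API. In a real build the escalation signal might come from a classifier, a tool-call failure, or a retry counter.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Reply:
    text: str
    confident: bool  # hypothetical signal: is this answer good enough to speak?


# Hypothetical model interface: takes the caller's utterance, returns a Reply.
ModelFn = Callable[[str], Reply]


def answer_turn(utterance: str, cheap: ModelFn, heavy: ModelFn) -> Tuple[Reply, str]:
    """Try the cheap model first; escalate to the heavy one only when needed."""
    reply = cheap(utterance)
    if reply.confident:
        return reply, "cheap"
    return heavy(utterance), "heavy"


def cheap_stub(utterance: str) -> Reply:
    # Stand-in for a fast, inexpensive model: only trusts itself on short turns.
    simple = len(utterance.split()) <= 6
    return Reply(text="Sure, I can help with that.", confident=simple)


def heavy_stub(utterance: str) -> Reply:
    # Stand-in for a more capable, more expensive model.
    return Reply(text="Let me work through that in detail.", confident=True)


turns = [
    "What are your opening hours?",
    "I need to restructure two loans and compare the fees across both lenders",
]
for turn in turns:
    reply, tier = answer_turn(turn, cheap_stub, heavy_stub)
    print(tier, "->", reply.text)
```

The point of the shape is that the heavy model is only billed on the turns that fail the cheap pass. Logging the tier per turn gives you the escalation rate, which is the number to watch at volume.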

If you're running a high volume of calls, the model layer is where your cost discipline either holds or falls apart.

What Does Telephony Add to the Bill?

Telephony is the part nobody mentions in the demo. It's also unavoidable.

Calls have to travel somewhere. Whether you're using Twilio, Vonage, or a platform-bundled solution, there's a per-minute charge on the PSTN side. Some platforms include it. Most don't. If you're calling Australian mobile numbers, you're paying Australian termination rates.

According to numbering and infrastructure guidance from ACMA, the Australian Communications and Media Authority, calls to certain number ranges carry different cost structures. That's worth checking before you assume flat-rate global pricing applies to your use case.

Telephony alone won't blow your budget. But if you didn't model it in, it'll make your unit economics look worse than expected once real calls start flowing.

What Does Text-to-Speech Do to Voice Agent Cost?

TTS is cheap per unit, but it runs on every single utterance the agent makes, so it adds up.

Every word your agent says goes through a TTS engine. ElevenLabs, Deepgram, Cartesia, and platform-native options are all priced differently. Some are billed by character. Some by minute. Some are bundled into the voice platform tier.
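Because providers bill in different units, it helps to normalise a per-character price into a cost per minute of call time before comparing. The sketch below is one rough way to do that. The price, the characters-per-spoken-minute figure, and the agent's share of talk time are all placeholder assumptions, not real provider rates. Swap in your own numbers.

```python
def tts_cost_per_call_minute(price_per_1k_chars: float,
                             chars_per_spoken_minute: float,
                             agent_talk_share: float) -> float:
    """Convert a per-character TTS price into a cost per minute of call time.

    agent_talk_share is the fraction of the call during which the agent is
    speaking (0.0 to 1.0). TTS is only billed for what the agent says.
    """
    if not 0.0 <= agent_talk_share <= 1.0:
        raise ValueError("agent_talk_share must be between 0 and 1")
    chars_per_call_minute = chars_per_spoken_minute * agent_talk_share
    return price_per_1k_chars * chars_per_call_minute / 1000.0


# Placeholder inputs for illustration only.
example = tts_cost_per_call_minute(
    price_per_1k_chars=0.10,       # hypothetical price, not a quoted rate
    chars_per_spoken_minute=900,   # assumed speaking pace
    agent_talk_share=0.5,          # assume the agent talks half the time
)
print(f"TTS cost per call minute: ${example:.3f}")
```

Once every provider is expressed per call minute, the TTS line sits in the same unit as the other layers of the bill.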

The quality gap between providers is real. A cheap TTS voice sounds robotic. That matters for conversion on outbound calls, especially if you're in finance broking or insurance where trust is everything. You're not going to choose a voice that undermines the call just to save fractions of a cent.

The full picture on voice agent cost looks something like this:

  • Voice engine (real-time audio routing)
  • Language model (reasoning and response generation)
  • Telephony (PSTN call routing and termination)
  • Text-to-speech (converting model output to audio)

All four are real costs. All four run on every call. The $0.13 to $0.31 per minute range reflects that reality.
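The arithmetic behind that range can be sketched in a few lines. The $0.07 engine rate and the $0.13 to $0.31 range come from this post. The per-layer split in example_layers and the monthly volume are hypothetical placeholders, there only to show the shape of the model.

```python
from decimal import Decimal
from typing import Dict

ENGINE = Decimal("0.07")       # quoted voice engine rate, from the article
ALL_IN_LOW = Decimal("0.13")   # low end of the article's production range
ALL_IN_HIGH = Decimal("0.31")  # high end of the article's production range

# What the other three layers (LLM, telephony, TTS) add on top of the engine.
other_low = ALL_IN_LOW - ENGINE    # 0.06 per minute
other_high = ALL_IN_HIGH - ENGINE  # 0.24 per minute

REQUIRED_LAYERS = {"voice_engine", "llm", "telephony", "tts"}


def all_in_per_minute(layers: Dict[str, Decimal]) -> Decimal:
    """Sum per-minute rates, refusing to run if any of the four layers is missing."""
    missing = REQUIRED_LAYERS - set(layers)
    if missing:
        raise ValueError(f"missing cost layers: {sorted(missing)}")
    return sum(layers.values(), Decimal("0"))


# Hypothetical split, for illustration only. Only voice_engine is from the article.
example_layers = {
    "voice_engine": ENGINE,
    "llm": Decimal("0.05"),
    "telephony": Decimal("0.02"),
    "tts": Decimal("0.04"),
}

rate = all_in_per_minute(example_layers)
minutes_per_month = 10_000  # assumed volume
print(f"Other layers add ${other_low} to ${other_high} per minute")
print(f"Quoted:  ${ENGINE * minutes_per_month:.2f} per month")
print(f"All in:  ${rate * minutes_per_month:.2f} per month")
```

Decimal is used so the cents don't drift with floating-point rounding. The missing-layer check is the useful part: a budget that only has the engine in it fails loudly instead of quietly looking cheap.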

So How Do You Keep Voice Agent Cost Under Control?

Model selection and call design are the two levers you actually control.

You can't negotiate telephony rates much. TTS is mostly fixed by quality tier. But you can control how often the heavy model fires, and you can design calls that resolve faster.

Shorter calls with tighter prompts cost less per outcome. It's not about being cheap. It's about not burning budget on unnecessary turns. A well-scoped agent that handles one job cleanly will almost always beat a do-everything agent on unit economics.
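"Cost per outcome" can be made concrete with a small calculation: per-minute rate, times average call length, divided by the share of calls that actually resolve. The sketch below assumes that definition. The $0.20 rate sits inside the $0.13 to $0.31 range above, and the call lengths and resolution rates are hypothetical, picked only to show how the comparison works.

```python
def cost_per_outcome(rate_per_minute: float,
                     avg_call_minutes: float,
                     resolution_rate: float) -> float:
    """Average spend per resolved call.

    resolution_rate is the fraction of calls that achieve their goal (0 to 1).
    Unresolved calls still cost money, so they are spread over the resolved ones.
    """
    if not 0.0 < resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be greater than 0 and at most 1")
    return rate_per_minute * avg_call_minutes / resolution_rate


# Hypothetical comparison at the same all-in rate.
scoped = cost_per_outcome(rate_per_minute=0.20, avg_call_minutes=2.5, resolution_rate=0.8)
broad = cost_per_outcome(rate_per_minute=0.20, avg_call_minutes=6.0, resolution_rate=0.6)

print(f"Scoped agent:        ${scoped:.2f} per resolved call")
print(f"Do-everything agent: ${broad:.2f} per resolved call")
```

Same per-minute rate, very different cost per result. Call length and resolution rate are the inputs that call design actually moves.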

For a deeper look at where AI build costs can spiral in unexpected ways, the model dependency post covers what happens when a key piece of your stack disappears mid-build. The cost there isn't just dollars.

Key Takeaways

  • The $0.07/min quote is the voice engine only. Production voice agent cost runs $0.13 to $0.31 per minute all in.
  • Four cost layers sit under every call: voice engine, LLM, telephony, and TTS. Model your budget against all four.
  • Call design and model routing are the main levers. Shorter, tighter calls cost less per outcome.

If you're about to sign off on a voice AI build and haven't stress-tested the cost model, DM me AUDIT. I'll send you the five questions worth asking before you commit.

