DEV Community: TheAutomate.io

Agent Validation: Stop Before You Book, Charge, or Send

TheAutomate.io — Thu, 25 Jun 2026 21:12:54 +0000

TL;DR

Agent validation is a mandatory checkpoint before every irreversible action: booking, charging, or sending anything.
Without it, a single bad data point propagates through your system and you can't easily undo it.
This matters most for finance brokers, insurers, and anyone whose agents touch real money or real calendars.

Agent validation is the pattern where your AI agent pauses before any irreversible action and confirms the data is clean before it proceeds. No exceptions.

Why Do Agents Act Without Checking First?

Most agents are wired to optimise for speed, not safety, and that's a problem the moment they touch anything you can't reverse.

The default posture for a voice AI build is: collect the data, do the thing. Fast is good. Friction is bad. And that logic holds right up until the agent books a lead into the wrong calendar slot, fires off a confirmation SMS to the wrong number, or triggers a charge on a card that was entered with a typo. Now you've got a customer complaint and a manual cleanup job. Neither is fast.

The issue isn't the AI. It's the absence of a gate between "data collected" and "action taken". Agent validation adds that gate.

What Does an Agent Validation Step Actually Look Like?

It's a function that runs after data collection and before the action node, checking every required field against defined rules before the agent is allowed to proceed.

In practice, agent validation sits inside your workflow orchestrator. On N8N, it's typically an IF node or a Code node that checks a set of preconditions. The agent does not move forward until every check returns true.

A basic validation block covers:

Required fields are present and non-empty (name, phone, email, appointment time)
Phone number format matches a real pattern (AU numbers, not a string of zeros)
Email passes a syntax check and the domain isn't obviously fake
No conflicting data (two different appointment times captured in the same session)
External CRM confirms the lead record exists before writing back to it

If any check fails, the agent either retries the data collection step or routes to a human. It does not proceed.
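
The checks above can be sketched as a single function. This is a minimal illustration, not a drop-in N8N node: the field names, the AU mobile pattern, the `FAKE_DOMAINS` list, the `times_heard` field and the `crm_has_record` lookup are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("name", "phone", "email", "appointment_time")

# Assumed pattern: AU mobiles written as 04XXXXXXXX or +614XXXXXXXX.
AU_MOBILE = re.compile(r"^(?:\+61|0)4\d{8}$")
# Deliberately loose syntax check, not a full RFC 5322 validator.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Hypothetical blocklist of throwaway domains.
FAKE_DOMAINS = {"example.com", "test.com"}


@dataclass
class ValidationResult:
    ok: bool
    errors: list = field(default_factory=list)


def validate_lead(lead, crm_has_record):
    """Run every check and collect all failures, not just the first."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not str(lead.get(name) or "").strip():
            errors.append(f"missing:{name}")

    phone = str(lead.get("phone") or "").replace(" ", "")
    # Reject bad formats and numbers whose last eight digits are all the same.
    if phone and (not AU_MOBILE.match(phone) or len(set(phone[-8:])) == 1):
        errors.append("invalid:phone")

    email = str(lead.get("email") or "").strip().lower()
    if email and (not EMAIL.match(email) or email.split("@")[-1] in FAKE_DOMAINS):
        errors.append("invalid:email")

    # times_heard: every appointment time captured in the session (assumed field).
    if len(set(lead.get("times_heard", []))) > 1:
        errors.append("conflict:appointment_time")

    if not crm_has_record(lead.get("crm_id")):
        errors.append("crm:no_record")

    return ValidationResult(ok=not errors, errors=errors)


def next_step(result, attempts, max_attempts=2):
    """Proceed only on a clean pass; otherwise retry collection, then hand off."""
    if result.ok:
        return "proceed"
    return "retry_collection" if attempts < max_attempts else "human_handoff"
```

Collecting every error rather than stopping at the first means the retry step can ask for all the missing pieces in one go.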

Where Does Agent Validation Fit in the Broader Architecture?

Validation belongs at the boundary between the conversational layer and the integration layer, not inside the prompt and not bolted on at the end.

This is the mistake most first-build operators make. They try to handle validation inside the LLM prompt itself. "Make sure the phone number is valid before booking." That's not agent validation. That's a suggestion. The model might follow it. It might not. And you can't unit test a prompt the same way you can test a code block.

Real agent validation is code, not instructions. It runs deterministically. Pass or fail, every time, with no ambiguity.

If you're storing call state in Postgres, your validation step can also read from that state to catch edge cases the prompt layer never sees. The post on why production agents store state in Postgres covers this pattern in detail. And for any validation call that fails silently, a dead letter queue for failed call events gives you the replay capability to recover without losing the lead.

The ACMA also has requirements around consent and contact records for outbound communications. Agent validation is a natural place to check those flags before any outbound action fires.

What's the Trade-off of Adding This Gate?

Agent validation adds a small amount of latency and build complexity. That's the whole trade-off. It's an easy one.

The check adds a step. On a live call, that might mean the agent takes a beat before confirming the booking. That's fine. Callers don't notice a beat. They do notice a wrong booking or a confirmation SMS that goes to a stranger's number.

On the build side, you're writing and maintaining a validation function. It's not complex code. But it does need to be kept in sync with whatever fields your workflow actually uses. If you add a new integration that requires a postcode, your agent validation block needs to know about it.

The discipline is this: every time you add an irreversible action to an agent workflow, you add a corresponding validation rule before it. No exceptions. Not "we'll add it later". Not "this one's low risk". Every action that can't be easily undone gets a gate.
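
One way to make that discipline hard to skip is to refuse to define an irreversible action without a validator attached. A rough sketch, assuming a hypothetical `irreversible` decorator and a made-up SMS rule set; the real send would be your integration call.

```python
import functools


class ValidationFailed(Exception):
    """Raised when a gated action is blocked by its validator."""


def irreversible(validator):
    """Decorator: the action only runs if validator(payload) returns no problems."""
    if not callable(validator):
        raise TypeError("an irreversible action must be registered with a validator")

    def wrap(action):
        @functools.wraps(action)
        def gated(payload):
            problems = validator(payload)
            if problems:
                raise ValidationFailed(problems)
            return action(payload)
        return gated
    return wrap


def sms_rules(payload):
    # Hypothetical rule set: a recipient and a recorded consent flag are required.
    problems = []
    if not payload.get("to"):
        problems.append("missing:to")
    if not payload.get("consent"):
        problems.append("missing:consent")
    return problems


@irreversible(sms_rules)
def send_confirmation_sms(payload):
    # Stand-in for the real integration call.
    return f"sent to {payload['to']}"
```

Because the gate wraps the action itself, there's no code path that reaches the send without passing the rules first.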

Key Takeaways

Agent validation is a deterministic code check, not a prompt instruction. It runs before every booking, charge, or send action.
Build it as a discrete node in your workflow, sitting between the conversational layer and the integration layer.
Every new irreversible action you add to the workflow needs a corresponding validation rule. Treat it as non-negotiable.

If you're building voice agents for finance, insurance, or real estate clients and want a second set of eyes on how your validation layer is wired, DM me AUDIT. I'll send you five questions that show where the gaps usually are.

Originally published at theautomate.io.

Retell's Concurrent Call Pricing: What You're Actually Paying For

TheAutomate.io — Wed, 24 Jun 2026 21:05:17 +0000

TL;DR

Every Retell account includes 20 concurrent calls free. Extra concurrent calls cost $8 per slot per month.
Concurrent call pricing scales with your busiest hour, not your total monthly call volume.
This matters most to service businesses with predictable peak windows, like brokers and real estate offices.

Retell prices call capacity by concurrency, not by monthly volume, and bills it separately from per-minute usage. Twenty concurrent calls are included on every account. Beyond that, each additional slot costs $8 per month.

What Does Concurrent Calls Actually Mean?

Concurrent calls is the number of calls your agent can handle at exactly the same time, right now.

It's not the number of calls per day. It's not your monthly volume. It's a snapshot of your peak load. If your finance brokerage runs a follow-up campaign and five leads call back inside the same thirty-second window, that's five concurrent calls. On the free tier, the twentieth simultaneous call is fine. The twenty-first gets queued or dropped, depending on your setup.

This is the number that matters. Not how many calls you take across the month. How many you take at once.
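
If you have call logs with start and end times, you can measure that peak directly. A small sketch, assuming each call is a `(start, end)` pair in seconds; that log format is an assumption, not something Retell hands you in this shape.

```python
def peak_concurrency(calls):
    """Return the highest number of calls in progress at the same moment."""
    events = []
    for start, end in calls:
        events.append((start, 1))    # a call begins
        events.append((end, -1))     # a call ends
    # At the same instant, process ends before starts so that
    # back-to-back calls don't count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```

Run it over your busiest day, not your average one. The result is the number you compare against the 20 included slots.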

What Does the Free Tier Actually Cover?

Twenty concurrent calls is a genuinely useful free tier for most small service businesses.

Think about what twenty simultaneous active calls looks like in practice. For most finance brokers, insurance advisers, or real estate offices with fewer than twenty staff, hitting that ceiling in normal operations is unlikely. The free tier covers a realistic peak for a business that's using voice AI for inbound follow-up, not running a national outbound campaign.

Where it gets tight is when you're doing bulk outbound. Dialling a list of several hundred leads in a short window will compress your concurrency fast. That's when the $8 slots start mattering.

How Does $8 Per Slot Per Month Stack Up?

The $8 per concurrent call per month cost is predictable infrastructure, not usage-based surprise billing.

This is the part that catches people off guard in a good way. You're not paying per minute on this line item. You're paying for capacity. If you add ten extra concurrent call slots and barely use them one month, you still pay for ten. But if you hammer them every day, you still only pay for ten. It behaves more like a reserved instance than a metered API.

That's a different mental model from most telephony pricing. And it's worth mapping before you build. Check the full voice agent cost breakdown to see where concurrency sits relative to your model and TTS spend.

What Are the Real Trade-offs to Consider?

The trade-off is buying for your peak, not your average.

Concurrency-based pricing means you need to think about your busiest hour, not your busiest month. That's a shift. A business with steady inbound across the day is in a very different position from one that sends a bulk SMS campaign at 9am and gets a wave of callbacks in the next forty minutes.

Here's what to map before you commit to a concurrency tier:

What's your realistic peak simultaneous call window?
Are you running inbound only, outbound only, or both at once?
Do your campaign cadences compress callbacks into short windows?
Is your outbound dialler pacing calls, or firing them simultaneously?
What happens to overflow? Queued, voicemail, or lost?

Getting this wrong means either overpaying for headroom you don't use, or dropping calls at the worst moment. Neither is a good look. For businesses where missed calls are a compliance risk, like those operating under ACMA telecommunications rules, overflow handling isn't optional.

How Does This Fit Into a Production Build?

Concurrency is an infrastructure decision, and it needs to be modelled before you go live, not after you hit the ceiling.

In a production voice AI stack on Retell and N8N, the concurrency limit affects how you design your retry logic and your call queuing. If you're replaying failed call events, you need to know your available slots before you fire the retry. A dead letter queue that replays into a saturated concurrency pool just fails again. Read how failed call events get replayed safely if you haven't set that up yet.
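
A replay job can check headroom before it fires anything. This is a sketch only: `active_calls` would come from wherever you track live calls, and the `reserve` of two slots kept free for inbound is an arbitrary placeholder.

```python
def replay_batch(pending, slot_limit, active_calls, reserve=2):
    """Return the events to replay now, leaving `reserve` slots for live calls."""
    free = max(0, slot_limit - active_calls - reserve)
    return pending[:free]
```

Anything not returned stays in the queue for the next run instead of failing again against a full pool.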

The $8 per slot model gives you a clean cost input for your capacity planning. Pick your peak target, subtract the 20 included slots, multiply what's left by $8, add it to your monthly build cost, and you know what you're running on.
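
As arithmetic, assuming only the slots beyond the 20 included ones are billed, which is how the pricing above reads:

```python
INCLUDED_SLOTS = 20          # included on every account, per the pricing above
PRICE_PER_EXTRA_SLOT = 8     # dollars per slot per month


def monthly_concurrency_cost(peak_target):
    """Capacity cost for a chosen peak, in dollars per month."""
    extra_slots = max(0, peak_target - INCLUDED_SLOTS)
    return extra_slots * PRICE_PER_EXTRA_SLOT
```

A peak target of 30 means ten extra slots, so $80 a month. Anything at or under 20 costs nothing on this line item.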

Key Takeaways

Every Retell account includes 20 concurrent calls free. That covers most small service business peaks without spending a cent on capacity.
Extra concurrent calls cost $8 per slot per month. It's capacity pricing, not usage pricing.
Model your peak window before you go live. The free tier is generous, but compressed outbound campaigns can hit the ceiling fast.
Overflow handling matters. Dropped calls at peak are a lead loss and potentially a compliance issue depending on your industry.

If you want a quick read on how concurrency fits into total voice AI running costs, the voice agent cost breakdown covers the full picture.

Running a finance, insurance, real estate, or accounting business and want to know if your call volume fits inside the free tier? DM me AUDIT and I'll ask you five questions. You'll have a clear answer in under ten minutes.

Originally published at theautomate.io.

The Dead Letter Queue: Where Failed Calls Go to Be Replayed

TheAutomate.io — Tue, 23 Jun 2026 22:07:18 +0000

TL;DR

A dead letter queue catches failed call events so your voice AI stack can replay them instead of silently dropping leads.
Without one, a transient API timeout means the lead is gone. No alert. No retry. Just gone.
If you're running a production voice agent for any service business, this is non-negotiable infrastructure.

A dead letter queue is where a failed call event lands when your automation can't process it. Instead of the event disappearing, it parks itself somewhere safe so you can inspect it and replay it.

What actually happens when a call event fails?

Most stacks don't fail loudly. They just drop the event and move on.

Your webhook fires. Retell AI sends the call-completed payload. N8N receives it, tries to write to GHL (GoHighLevel), hits a timeout. What happens next depends entirely on whether you've got error handling wired up. If you haven't, the answer is nothing. The event's gone. The lead's gone. Nobody knows.

This isn't a hypothetical. Transient failures happen constantly in production. API rate limits, downstream CRM hiccups, temporary network drops. Any of them can swallow an event.

What does a dead letter queue actually do?

It catches the failed event before it disappears and routes it somewhere you control.

The concept comes from message queue systems but the principle applies to any async automation. When an event can't be processed successfully after a set number of retries, instead of discarding it, you push it to a holding queue. That queue holds the payload intact. You can inspect what failed, fix the underlying issue, and replay the event against the live system.
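
Stripped of any particular queue product, the retry-then-park logic looks roughly like this. The `handler` callable and the list used as a holding queue are stand-ins. A real build would add a backoff delay between attempts and write to durable storage.

```python
def process_with_dlq(event, handler, dead_letters, max_attempts=3):
    """Try the handler up to max_attempts times, then park the event."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return handler(event)
        except Exception as exc:   # in production, catch only retryable errors
            last_error = exc

    # Retries exhausted: keep the payload intact instead of discarding it.
    dead_letters.append({
        "event": event,
        "error": str(last_error),
        "retry_count": max_attempts,
    })
    return None
```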

For a voice AI pipeline, that payload typically includes the call ID, the contact record, the outcome data, and whatever the agent captured during the conversation. All of that is recoverable. But only if you catch it.

How do you wire a dead letter queue into an N8N voice AI pipeline?

You add an error branch to every webhook handler, then write failures to a Postgres table or a dedicated queue node.

In N8N, the simplest implementation uses the Error Trigger node combined with a Postgres write. Every failed execution gets caught by the error trigger. You write the raw payload, the error message, the timestamp, and the workflow ID to a dead letter table. From there you've got options. You can build a manual replay UI, an automated retry on a schedule, or just an alert that fires to Slack so you know something failed.

The Postgres approach pairs well with production state management because your failed events live in the same database as your call state. One place to look when things go wrong.

A minimal dead letter table needs at least these columns:

event_id (the original call or webhook ID)
payload (the full JSON, stored as JSONB)
error_message (what failed and why)
failed_at (timestamp)
retry_count (how many attempts were made)
status (pending, replayed, resolved)
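
Here's one way those columns could be laid out. To keep the sketch runnable anywhere, it uses SQLite from the Python standard library as a stand-in. In Postgres the payload column would be JSONB and failed_at a TIMESTAMPTZ. The table name and the `park_event` helper are made up for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

DDL = """
CREATE TABLE IF NOT EXISTS dead_letter (
    event_id      TEXT PRIMARY KEY,
    payload       TEXT NOT NULL,               -- JSONB in Postgres
    error_message TEXT NOT NULL,
    failed_at     TEXT NOT NULL,               -- TIMESTAMPTZ in Postgres
    retry_count   INTEGER NOT NULL DEFAULT 0,
    status        TEXT NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'replayed', 'resolved'))
)
"""


def park_event(db, event_id, payload, error, retry_count):
    """Write a failed event to the dead letter table with its full payload."""
    db.execute(
        "INSERT OR REPLACE INTO dead_letter "
        "(event_id, payload, error_message, failed_at, retry_count, status) "
        "VALUES (?, ?, ?, ?, ?, 'pending')",
        (
            event_id,
            json.dumps(payload),
            str(error),
            datetime.now(timezone.utc).isoformat(),
            retry_count,
        ),
    )
    db.commit()
```

Usage is a connection, the DDL once, then `park_event(db, "call_1", payload, exc, 3)` from the error branch.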

Where does cost discipline fit into this architecture?

Replaying an event costs a fraction of what it costs to re-run a full voice AI call. The infrastructure is cheap. The lost lead isn't.

A replay reads from your dead letter table, reconstructs the payload, and re-runs the downstream steps. No new Retell AI call. No new TTS. No telephony minutes. You're just re-processing data you already have. The compute cost is negligible.
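
A replay pass can be sketched like this. Rows are plain dicts here rather than database records, and `downstream` stands in for whatever step originally failed, such as the CRM write. Both are assumptions for the example.

```python
def replay_pending(rows, downstream):
    """Re-run downstream steps for pending rows; return the event_ids replayed."""
    replayed = []
    for row in rows:
        if row["status"] != "pending":
            continue
        try:
            downstream(row["payload"])      # re-process the stored payload only
        except Exception as exc:
            # Still failing: leave it pending and record the latest error.
            row["retry_count"] += 1
            row["error_message"] = str(exc)
            continue
        row["status"] = "replayed"
        replayed.append(row["event_id"])
    return replayed
```

Nothing in that loop places a call. It only touches data you already captured.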

Contrast that with what happens if you don't catch the failure. The lead doesn't get logged in GHL. No follow-up gets triggered. The contact may never hear from you again unless they reach back out themselves. For a finance broker or insurance business, a missed follow-up isn't a minor inconvenience. It's a lost opportunity that cost real money to generate.

The full cost picture of running voice agents is already layered. As covered in the voice agent cost breakdown, you're paying across multiple services simultaneously. Losing a lead on top of that spend stings twice.

Is there a compliance angle to this for Australian businesses?

Yes. Under the Australian Privacy Act, you're responsible for what happens to personal data that passes through your systems, including data in failed events.

A call event payload typically contains the contact's name, phone number, and whatever the agent captured during the conversation. If that payload sits unprotected in a generic error log, or worse, gets dropped entirely with no record, you've got an auditing problem. The OAIC's APP guidelines are clear that entities must take reasonable steps to protect personal information from misuse or loss.

A dead letter queue with proper access controls and retention policies isn't just good engineering. It's defensible practice. You know where every piece of data went. You can show what failed, when, and what you did about it.

Key Takeaways

A dead letter queue is the difference between a recoverable failure and a permanently lost lead.
In N8N, wire an Error Trigger to a Postgres dead letter table. Store the full payload, the error, and the retry count.
Replay costs a fraction of re-running the original call. The infrastructure is worth it.
For Australian service businesses, a dead letter queue also supports Privacy Act compliance by giving you a full audit trail of what happened to every event.

If your voice AI stack doesn't have a dead letter queue, it has an unknown number of leads it's already dropped. You just don't know which ones. If you want to know what else your current build might be quietly losing, DM me AUDIT and I'll send you five questions that'll show you where the gaps are.

Originally published at theautomate.io.

Before You Sign With Any AI Vendor, Check These Three Things

TheAutomate.io — Mon, 22 Jun 2026 21:49:30 +0000

TL;DR

Before signing with any AI vendor, demand three clauses in writing or walk away from the deal.
A named deliverable, a pre-deployment baseline, and an exit clause are non-negotiable contract requirements.
If you skip these, you're the buyer who later reports no usable output and no recourse.

Picking an AI vendor without these three things in writing is how you end up paying for months of activity that never becomes output. Not bad luck. A predictable outcome.

Why Do So Many AI Vendor Deals Produce No Usable Output?

Because the contract never defined what output meant.

This is the pattern. A business owner gets a demo. It looks sharp. The pitch is confident. They sign. Weeks pass and nothing ships that they can point at. When they push back, the vendor gestures at work done rather than results delivered. The contract didn't say which was required.

It's not unique to AI. But AI makes it worse because the category is new enough that buyers don't know what to demand yet. Vendors know this. Some exploit it. Most don't mean to. But the outcome's the same either way.

Don't go into a vendor conversation hoping for good faith. Go in with a checklist.

What Is a Named Deliverable and Why Does It Matter With Any AI Vendor?

A named deliverable is a specific, measurable thing the AI vendor agrees to produce. Not a process. A thing.

"Implementation support" is not a deliverable. "A working voice agent that handles inbound qualification calls for your mortgage broker workflow" is a deliverable. The difference is whether you can look at the end date and say yes or no.

If your contract says the vendor will "build and configure AI systems to support your operations", that's a blank cheque for them and a dead end for you. Push for specificity. What does it do? For which workflow? By when? And what does done actually look like?

Vendors who push back on naming deliverables are telling you something important.

What Is a Pre-Deployment Baseline and Why Do You Need One Before Signing?

A pre-deployment baseline is a snapshot of your current performance, captured before the AI vendor touches anything. Without it, you can't prove the work moved the needle.

This matters more than buyers realise at the time of signing. If you don't know how many leads you were calling back manually, how long that was taking, or how many were falling through the cracks, you have nothing to compare the new system against. The vendor can tell you anything after deployment and you can't argue with it.

Capture the baseline yourself if the vendor won't ask for it. It takes a short conversation and a spreadsheet. Track the numbers that matter to your specific workflow before anything gets switched on.

For context on why production systems need proper state management to even surface these metrics, our post on why production agents store state in Postgres rather than the model covers the architecture decision behind accurate reporting.

Baseline data protects you. It also tells a serious vendor what good looks like. Both parties benefit.

Does Every AI Vendor Contract Need an Exit Clause?

Yes. Every single one.

An exit clause defines the conditions under which you can leave without penalty. It's not pessimism. It's basic commercial hygiene. Every mature vendor will have one. The ones who resist it are the ones you most need it with.

Here's what a reasonable exit clause covers:

Ownership of any data, workflows, or configurations built during the engagement
Notice period for termination
What happens to your integrations if you leave
Any lock-in tied to proprietary tooling or formats

This matters especially when an AI vendor builds inside a platform they also own. You want ownership of your CRM data, your call recordings, and your prompt logic. In writing. Before you start. The ACMA doesn't care who built your outbound calling system. If it breaches compliance, it's your problem. Make sure you can access and audit it.

We ran into a version of this dependency problem from a different angle. It's worth reading about what model dependency actually costs when a tool goes dark mid-build.

Is This Checklist Actually Enough Before Engaging an AI Vendor?

It's the floor, not the ceiling. But most buyers don't even have the floor.

Three clauses won't catch every bad vendor. But they filter out most of the situations that end in frustration. If an AI vendor won't name a deliverable, won't support a baseline, and won't give you an exit, you've learned what you needed to know before any money changed hands.

Buyers who report no usable output almost always skipped one or more of these steps. It's not a coincidence. It's the predictable result of signing a contract that was never designed to produce anything specific.

Key Takeaways

Every AI vendor contract needs a named deliverable. Vague scope is the vendor's friend and your problem.
Capture a pre-deployment baseline before anything gets built. You can't prove ROI without a before.
An exit clause is non-negotiable. Own your data, your logic, and your ability to leave.

If you're in finance broking, insurance, accounting, or real estate and you're about to sign with an AI vendor, or you already have and something feels off, DM AUDIT and I'll send you the five questions worth asking before it gets complicated.

Originally published at theautomate.io.

Why Production Agents Store State in Postgres, Not the Model

TheAutomate.io — Fri, 19 Jun 2026 13:58:46 +0000

TL;DR

Production agents that rely on context window state alone break when calls drop, retry, or hand off mid-conversation.
Persisting call progress to Postgres means any retry picks up exactly where the last attempt stopped.
If you're shipping voice AI for real clients, state management is the architecture decision that matters most.

Production agents need somewhere reliable to store what's happened during a call. The model's context window isn't that place. Postgres is.

Why Does Context Window State Break Production Agents?

Context window state is ephemeral. The moment the call ends, the connection closes, or the model resets, everything in that context is gone.

For a demo build, that's fine. The call completes cleanly, no retries, no handoffs, no dropped connections mid-qualification. But real calls don't behave that way. A lead's mobile drops. A Retell AI webhook times out. The agent gets restarted because a new model version was deployed. Any of those events wipes the context clean. The agent starts over. The lead gets asked the same questions again. They hang up.

This is the failure mode that doesn't show up in testing. It shows up at 7pm on a Tuesday when a finance broker's best lead of the week calls back and gets treated like a stranger.

What Does Postgres State Persistence Actually Look Like?

At the start of every call, the agent writes a session row to Postgres. Every meaningful step writes an update. Every retry reads that row first.

The structure is simple. You've got a call ID, a contact ID, a status field, and a JSON column that holds whatever the agent has collected so far. Loan purpose, property value, suburb. Whatever the qualification script needs.

When a retry fires, the agent doesn't start from scratch. It reads the Postgres row, sees what's already been confirmed, and picks up from the next unanswered question. The lead doesn't notice. The broker gets a complete record.
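
In code, the session row and the read-before-retry behaviour might look like this. The `SessionStore` class is an in-memory stand-in for the Postgres table so the sketch runs on its own. The class and method names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    call_id: str
    contact_id: str
    status: str = "in_progress"
    collected: dict = field(default_factory=dict)   # the JSON column


class SessionStore:
    """In-memory stand-in for the Postgres table, keyed by call_id."""

    def __init__(self):
        self._rows = {}

    def start_or_resume(self, call_id, contact_id):
        # A retry reads the existing row instead of creating a fresh one.
        return self._rows.setdefault(call_id, CallSession(call_id, contact_id))

    def record(self, call_id, key, value):
        # Written as soon as the caller confirms an answer.
        self._rows[call_id].collected[key] = value
```

In Postgres the same idea is typically an insert that does nothing on conflict, followed by a select, so the first attempt and every retry go through the same code path.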

This is the same pattern used in any stateful background job. The agent is just a worker. Postgres is the job queue and the audit log rolled into one.

Does Storing State Outside the Model Cost More?

The infrastructure cost is predictable and well below the cost of a bad call experience.

A Postgres instance sized for typical voice agent session volume adds a modest, fixed line to your stack. It doesn't scale with call volume the way model tokens do. And if you're thinking carefully about what goes into the model's context on each turn, you're already doing cost control. Keeping state in Postgres helps here: you write a structured summary back to the context rather than re-injecting a full conversation transcript.

That's a smaller prompt on every retry. Fewer tokens. Lower cost per call. The voice agent cost breakdown post covers how those per-call costs stack up across model, telephony, and TTS. State persistence is one of the levers that keeps that number flat.

How Do Production Agents Handle Retries With Persistent State?

Retry logic becomes simple when the agent can read exactly what it last confirmed.

Without Postgres, a retry means choosing between two bad options. Re-run the full call from the top and annoy the lead, or skip qualification and send the broker an incomplete record. Neither is good.

With Postgres, the retry flow looks like this:

Read the session row for this contact and call ID
Check which qualification steps have a confirmed value
Inject only the confirmed fields into the model context as a brief summary
Resume from the first unanswered step
Write each new answer back to Postgres as it's confirmed
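
The middle of that flow, working out what's confirmed and where to resume, can be sketched in a few lines. The step names come from the qualification example earlier in this post. The summary format is an assumption about what you'd inject into the model context.

```python
SCRIPT = ("loan_purpose", "property_value", "suburb")


def resume_plan(collected):
    """Return (summary of confirmed fields, next unanswered step or None)."""
    confirmed = {k: collected[k] for k in SCRIPT if collected.get(k) is not None}
    remaining = [k for k in SCRIPT if k not in confirmed]
    summary = "; ".join(f"{k}={v}" for k, v in confirmed.items())
    return summary, (remaining[0] if remaining else None)
```

A next step of `None` means qualification is complete and the retry has nothing left to ask.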

This is also where the cheap-model-first, expensive-model-on-retry pattern pairs well. The retry call carries structured context, not a wall of transcript. A cheaper model can handle resumption. You only escalate to the expensive model if the call hits a point that needs better reasoning.

Is This Approach Relevant for Australian Compliance Obligations?

Yes. Postgres-backed state gives you an auditable record of every call attempt, which matters under Australian Privacy Act obligations.

The Office of the Australian Information Commissioner is clear that organisations handling personal information need to be able to account for what was collected and when. A voice agent that writes structured session data to a persistent store gives you that audit trail. Context-only agents don't. If a broker's client later asks what was captured during a qualification call, a Postgres row answers that question. A model context that no longer exists doesn't.

This isn't a compliance lecture. It's just a practical reason why persistent state is the right default, not an optional upgrade.

Key Takeaways

Production agents that rely on context window state alone will fail when calls drop, retry, or restart.
Persisting state to Postgres lets retries resume mid-qualification rather than starting over.
Structured state in Postgres reduces token usage on retries and supports a cheaper-model-first cost strategy.
An auditable session record is good practice under Australian privacy obligations, not just a technical nicety.

If you're building voice AI for finance, insurance, or real estate clients and you're not sure how your current stack handles dropped calls or retries, that's worth a closer look. DM AUDIT and I'll send you five questions that'll show exactly where the gaps are.

Originally published at theautomate.io.

The Quote Was $0.07 a Minute. The Bill Wasn't.

TheAutomate.io — Mon, 15 Jun 2026 22:24:16 +0000

TL;DR

The $0.07/min voice agent cost quote is real. It's just not the whole bill.
In production, voice agent cost runs $0.13 to $0.31 per minute once the model, telephony, and TTS are counted.
If you're budgeting for voice AI, you need all four layers. Not just one.

That $0.07 a minute figure is accurate. It's the voice engine. Voice agent cost in a live production build is a different number entirely once you count everything that actually makes the call work.

Why Does the $0.07 Quote Feel Like a Bait and Switch?

It's not a bait and switch. It's a scope mismatch. The vendor quoted one layer of a four-layer stack.

The voice engine handles real-time audio. That's the part most platforms lead with in their pricing pages. It's the most visible component and it's genuinely priced around that figure. But it doesn't think, it doesn't route calls, and it doesn't turn text into speech. It just moves audio.

When a prospect asks "what does it cost", they mean end to end. Most pricing pages don't answer that question. They answer a much smaller one.

This is why voice agent cost surprises people after they commit. Not because anyone lied. Because the question and the answer weren't about the same thing.

What Does the LLM Add to Voice Agent Cost?

The language model is the most variable part of the voice agent cost stack, and it's often the biggest surprise.

Every time your agent speaks, it's running a prompt through a model. Faster, cheaper models keep the cost low. More capable models cost more per call. The gap between them is real, and the right choice depends on how much reasoning your use case actually needs.

This is exactly why the cheap-first, expensive-on-retry pattern exists. You route straightforward turns through a cheaper model and only escalate to the heavier one when the call demands it. It's one of the most practical ways to control voice agent cost in production without sacrificing call quality.

If you're running a high volume of calls, the model layer is where your cost discipline either holds or falls apart.

What Does Telephony Add to the Bill?

Telephony is the part nobody mentions in the demo. It's also unavoidable.

Calls have to travel somewhere. Whether you're using Twilio, Vonage, or a platform-bundled solution, there's a per-minute charge on the PSTN side. Some platforms include it. Most don't. If you're calling Australian mobile numbers, you're paying Australian termination rates.

According to ACMA's numbering and infrastructure guidance, calls to certain number ranges carry different cost structures. Worth checking before you assume flat-rate global pricing applies to your use case.

Telephony alone won't blow your budget. But if you didn't model it in, it'll make your unit economics look worse than expected once real calls start flowing.

What Does Text-to-Speech Do to Voice Agent Cost?

TTS is cheap per character, but it runs on every single utterance the agent makes. It adds up.

Every word your agent says goes through a TTS engine. ElevenLabs, Deepgram, Cartesia, platform-native options. They're all priced differently. Some are billed by character. Some by minute. Some are bundled into the voice platform tier.

The quality gap between providers is real. A cheap TTS voice sounds robotic. That matters for conversion on outbound calls, especially if you're in finance broking or insurance where trust is everything. You're not going to choose a voice that undermines the call just to save fractions of a cent.

The full picture on voice agent cost looks something like this:

Voice engine (real-time audio routing)
Language model (reasoning and response generation)
Telephony (PSTN call routing and termination)
Text-to-speech (converting model output to audio)

All four are real costs. All four run on every call. The $0.13 to $0.31 per minute range reflects that reality.
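
To see what the gap means on a bill, here is the arithmetic using only the figures quoted in this post. The 1,000 minutes in the usage note is a hypothetical volume, and the split between the three unquoted layers isn't broken out here, so the sketch treats them as one number.

```python
from decimal import Decimal

QUOTED_ENGINE = Decimal("0.07")    # the voice engine rate from the quote
ALL_IN_LOW = Decimal("0.13")       # production range, per minute
ALL_IN_HIGH = Decimal("0.31")


def unquoted_share(all_in_rate):
    """Per-minute cost of model, telephony and TTS combined."""
    return all_in_rate - QUOTED_ENGINE


def monthly_bill(minutes, rate):
    """Total for a month at a given per-minute rate, to the cent."""
    return (Decimal(minutes) * rate).quantize(Decimal("0.01"))
```

At 1,000 minutes a month, the quote implies $70. The all-in range implies $130 to $310, which means $0.06 to $0.24 of every minute was never in the quote.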

So How Do You Keep Voice Agent Cost Under Control?

Model selection and call design are the two levers you actually control.

You can't negotiate telephony rates much. TTS is mostly fixed by quality tier. But you can control how often the heavy model fires, and you can design calls that resolve faster.

Shorter calls with tighter prompts cost less per outcome. It's not about being cheap. It's about not burning budget on unnecessary turns. A well-scoped agent that handles one job cleanly will almost always beat a do-everything agent on unit economics.

For a deeper look at where AI build costs can spiral in unexpected ways, the model dependency post covers what happens when a key piece of your stack disappears mid-build. The cost there isn't just dollars.

Key Takeaways

The $0.07/min quote is the voice engine only. Production voice agent cost runs $0.13 to $0.31 per minute all in.
Four cost layers sit under every call: voice engine, LLM, telephony, and TTS. Model your budget against all four.
Call design and model routing are the main levers. Shorter, tighter calls cost less per outcome.

If you're about to sign off on a voice AI build and haven't stress-tested the cost model, DM me AUDIT. I'll send you the five questions worth asking before you commit.

Originally published at theautomate.io.

Cheap Model First, Expensive Model on Retry: Voice Agent Cost Control

TheAutomate.io — Mon, 15 Jun 2026 10:36:31 +0000

TL;DR

Route every voice agent turn through a cheap model first to keep inference costs flat without cutting quality.
Only escalate to an expensive model when the cheap model signals low confidence or produces a bad output.
This pattern suits any production voice agent where cost discipline and accuracy both matter.

The fastest way to blow your inference budget on a voice agent is to route every single turn through your most capable model. You don't need to. Here's why.

Why Does a Voice Agent's Cost Spike Without This Pattern?

Most turns in a live call are simple. Routing all of them to a premium model is waste, plain and simple.

Think about what a voice agent actually handles in a typical service call. Confirming an appointment. Spelling back a name. Asking a qualification question. These are not tasks that require your most powerful model. They need speed and coherence, not depth.

The problem is that most builders set a single model for the whole agent and forget about it. That default decision quietly eats margin on every call, every day. By the time you notice, the bill is already there.

What Does the Two-Model Architecture Actually Look Like?

The pattern is a simple routing decision: cheap model first, expensive model only when the cheap one fails or flags uncertainty.

In practice, you configure your voice agent to send every turn to a fast, low-cost model. That model handles the straightforward stuff. When it returns a response that falls below a confidence threshold, produces a nonsense answer, or hits an edge case it can't resolve cleanly, a retry fires. That retry goes to a more capable, more expensive model.

The expensive model doesn't run on every turn. It runs when it needs to. That's the whole idea.

This is the same principle behind building evals into your AI agent workflow. Cheap and fast handles the bulk. The expensive layer handles the tail.
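
A minimal sketch of that routing decision, assuming each model call returns a reply plus some confidence score between 0 and 1. The `Reply` type, the 0.7 threshold, and the stub models are all illustrative assumptions. Your platform will expose this differently, if it exposes a confidence signal at all.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reply:
    text: str
    confidence: float  # 0.0 to 1.0, however your stack estimates it


ModelFn = Callable[[str], Reply]


def route_turn(turn: str, cheap: ModelFn, expensive: ModelFn,
               threshold: float = 0.7) -> tuple[Reply, str]:
    """Try the cheap model first; escalate only on a weak or empty reply."""
    first = cheap(turn)
    if first.text.strip() and first.confidence >= threshold:
        return first, "cheap"
    return expensive(turn), "expensive"


# Stub models standing in for real API calls.
def cheap_stub(turn: str) -> Reply:
    if "confirm" in turn.lower():
        return Reply("Your appointment is confirmed.", 0.95)
    return Reply("", 0.2)  # out of its depth


def expensive_stub(turn: str) -> Reply:
    return Reply("Let me work through that with you.", 0.9)


_, tier_simple = route_turn("Can you confirm my appointment?",
                            cheap_stub, expensive_stub)
_, tier_hard = route_turn("My trust owns the property, does that matter?",
                          cheap_stub, expensive_stub)
print(tier_simple, tier_hard)
```

Returning the tier alongside the reply means you can log how often the expensive model fires, which is the number you'll tune against.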

How Do You Define When a Retry Should Fire?

The retry condition is where this pattern lives or dies. Get it wrong and you're back to routing everything to the expensive model anyway.

There are a few reliable signals to watch for. You can trigger a retry on:

Low token-level confidence from the cheap model
A hallucinated entity (name, date, dollar figure) that doesn't match the call context
A response that contradicts a known fact in the caller's record
A turn where the agent's reply is blank, truncated, or internally inconsistent
A match against a known failure pattern that your post-call evals or testing suite have already flagged

The last point matters more than it sounds. If you're running LLM evals as unit tests for your agent, those evals can feed directly into your retry logic. A failure pattern caught in testing becomes a trigger condition in production.
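
A few of the signals above can be sketched as a single check that returns the reasons a retry should fire. Everything here is an assumption for illustration: the `CallContext` shape, the 0.7 confidence floor, the dollar-figure regex, and the crude "no closing punctuation means truncated" heuristic. A production version would pull known entities from the caller's record.

```python
import re
from dataclasses import dataclass, field

# Matches dollar figures such as $450, $4,500 or $1,200.50.
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


@dataclass
class CallContext:
    # Dollar figures that legitimately appear in this caller's record.
    known_amounts: set[str] = field(default_factory=set)


def retry_reasons(reply: str, confidence: float, ctx: CallContext,
                  min_confidence: float = 0.7) -> list[str]:
    """Return every reason this turn should be retried; empty means keep it."""
    reasons = []
    if confidence < min_confidence:
        reasons.append("low_confidence")

    text = reply.strip()
    if not text:
        reasons.append("blank")
    elif text[-1] not in ".?!":
        reasons.append("truncated")  # crude heuristic: no closing punctuation

    # Any dollar figure the call context doesn't know about is suspect.
    for amount in AMOUNT_RE.findall(reply):
        if amount not in ctx.known_amounts:
            reasons.append(f"unknown_amount:{amount}")
    return reasons


ctx = CallContext(known_amounts={"$1,200"})
print(retry_reasons("Your repayment is $4,500 per month.", 0.9, ctx))
```

Returning reasons instead of a bare true/false makes the tuning easier later. You can see which condition is firing most and tighten or loosen that one.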

What Are the Real Trade-offs of This Approach?

This pattern trades a small increase in latency on retried turns for a meaningful reduction in overall inference spend.

When the retry fires, the caller waits a beat longer than usual. That's the cost. In a voice agent context, a short pause is far less noticeable than in a chat interface. Most callers interpret a brief pause as the agent thinking. That's fine. What's not fine is a wrong answer delivered fast.

The other trade-off is engineering overhead. You need to define your retry conditions carefully. Set them too broadly and most turns hit the expensive model anyway. Set them too narrowly and bad outputs slip through to the caller.
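
The "too broadly" failure has simple arithmetic behind it. The cheap model runs on every turn, and the expensive one runs on the share of turns that retry. A quick sketch, with hypothetical per-turn prices:

```python
def blended_cost(cheap: float, expensive: float, retry_rate: float) -> float:
    """Expected cost per turn: cheap always runs, expensive runs on retries."""
    if not 0 <= retry_rate <= 1:
        raise ValueError("retry_rate must be between 0 and 1")
    return cheap + retry_rate * expensive


def break_even_retry_rate(cheap: float, expensive: float) -> float:
    """Retry rate above which routing costs more than expensive-only."""
    return 1 - cheap / expensive


# Hypothetical per-turn prices, not vendor figures.
CHEAP, EXPENSIVE = 0.001, 0.01

tuned = blended_cost(CHEAP, EXPENSIVE, retry_rate=0.10)
too_broad = blended_cost(CHEAP, EXPENSIVE, retry_rate=0.95)
limit = break_even_retry_rate(CHEAP, EXPENSIVE)
print(f"tuned: {tuned:.4f}, too broad: {too_broad:.4f}, break-even: {limit:.0%}")
```

With these numbers, once more than about 90% of turns retry, the two-tier setup costs more than just running the expensive model on everything. You're paying for both.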

This is not a set-and-forget config. It's a living decision that you tune as your call data grows. Bodies like ACMA don't regulate inference routing, but they do care about caller experience and consent. A voice agent that keeps producing wrong answers because your retry conditions are too narrow is a compliance and reputation risk, not just a UX problem.

Does This Pattern Work Across Different Voice Agent Use Cases?

Yes. The pattern is use-case agnostic. It applies anywhere you're running a voice agent at scale with real inference costs.

For finance brokers and insurance firms running inbound qualification calls, most of the conversation is structured and predictable. The cheap model handles it comfortably. The expensive model only wakes up when a caller goes off-script or mentions something the agent hasn't seen before.

For real estate or accounting firms running outbound follow-up, the pattern works the same way. Routine turns stay cheap. Complex turns get the upgrade.

The point is that your voice agent isn't a single model. It's a routing system with two tiers. That framing changes how you build, how you test, and how you budget.

If you've run into the risk of locking into a single model entirely, this connects directly to the broader problem covered in model dependency and what it costs mid-build.

Key Takeaways

A voice agent routed entirely through a premium model is expensive by default, not by necessity.
The cheap-first, expensive-on-retry pattern keeps inference spend flat while preserving accuracy on hard turns.
Retry conditions need to be defined deliberately and tuned over time as real call data comes in.

If you're running a voice agent and the inference bill is growing faster than the call volume, this is worth looking at properly. DM me AUDIT and I'll send you five questions to figure out where your routing is leaking spend.

Originally published at theautomate.io.

Model Dependency Killed My Build Mid-Session

TheAutomate.io — Sun, 14 Jun 2026 06:48:31 +0000

TL;DR

Model dependency is a real build risk: Anthropic's Fable 5 went offline mid-build due to a US export-control order on 12 June 2026.
I switched to Opus 4.8 mid-session and kept shipping. The commits from that afternoon are still in the repo.
Treat the model like any other dependency: verify what it gives you, keep it swappable.

Model dependency isn't a theoretical risk. On 12 June 2026, mid-build, it became a very real one for me.

What Actually Happened?

On 9 June 2026, Anthropic launched Fable 5. It was the strongest model available that week, and I was building saas.theautomate.io on it.

Days from launch. Deep in the build. The kind of session where you're in flow and the last thing you want is a variable to change.

On 12 June, a US export-control order pulled Fable 5 access for foreign nationals. I'm in Australia. Mid-session, the model stopped responding.

No warning. No grace period. Just gone.

What Does Model Dependency Actually Cost?

In my case, an afternoon. That's it. But only because I'd built the session in a way that didn't assume the model would always be there.

I switched to Opus 4.8 and kept shipping. The commits from that afternoon are still in the repo. The launch wasn't delayed.

But here's what I kept thinking: if I'd been less careful about how I'd structured the context, the prompts, the assumptions baked into the session, it could have cost a lot more.

Model dependency is silent until it isn't. Then it's loud.

What Did I Realise About Building With AI Models?

The model you build with is a dependency you don't control. Same as any third-party API, same as any SaaS tool, but with less documentation when it disappears.

This is the bit most builders skip. You pick a model because it's the best available right now. You optimise for it. You get used to how it reasons, how it handles edge cases, how it responds to your prompts.

And then a policy decision in a country you're not in changes everything.

It's not a criticism of Anthropic. Export-control law is export-control law. The lesson is structural, not political.

This connects directly to why verifying what your AI produces matters as much as what it produces. If you can't trust the output, the model name on the tin doesn't matter.

How Do You Build to Survive Model Dependency?

You treat the model like any other swappable dependency and build accordingly from day one.

For foreign builders using US AI infrastructure, this isn't paranoia. It's engineering discipline. Export-control orders, terms-of-service changes, model deprecations, regional restrictions. Any of them can pull a model mid-project.

The US government directive that suspended Fable 5 access is a reminder that the AI infrastructure most of us build on is regulated from offshore.

Here's what actually helps:

Keep your prompts in plain text files, not baked into a single tool's interface
Write evals before you're in crisis mode, not after
Document your assumptions about model behaviour separately from the code
Test on at least two models during development, not just the one you prefer
Don't optimise so hard for one model's quirks that switching becomes a rewrite
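
As a rough illustration of the first and fourth points, here's a sketch that loads a prompt from a plain text file and tries providers in order. The function names, the file name, and the stub providers are all hypothetical. Real provider calls would sit behind the same two-argument signature.

```python
import tempfile
from pathlib import Path
from typing import Callable

# (system_prompt, user_message) -> reply text
ModelFn = Callable[[str, str], str]


def load_prompt(path: Path) -> str:
    """Prompts live in plain text files, outside any one tool's interface."""
    return path.read_text(encoding="utf-8").strip()


def complete_with_fallback(system_prompt: str, user_message: str,
                           providers: list[tuple[str, ModelFn]]) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(system_prompt, user_message)
        except Exception as exc:  # broad on purpose: any failure means move on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubs standing in for real API clients.
def primary_stub(system_prompt: str, user_message: str) -> str:
    raise ConnectionError("access suspended")


def fallback_stub(system_prompt: str, user_message: str) -> str:
    return f"[fallback] {user_message}"


with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "system_prompt.txt"
    prompt_file.write_text("You are a booking assistant.\n", encoding="utf-8")
    prompt = load_prompt(prompt_file)

used, reply = complete_with_fallback(
    prompt, "Book me in for Tuesday.",
    [("primary", primary_stub), ("fallback", fallback_stub)],
)
print(used, reply)
```

The fallback only helps if the second model has actually been tested against your prompts. That's what the evals are for.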

And as I've written before, plan mode is where this discipline pays off most. Decisions made in planning are cheap. Decisions made mid-crisis are expensive.

Is This Just a Problem for Solo Builders?

No. It's a problem for anyone shipping production systems that depend on a single model from a single provider.

Solo builders feel it sharply because there's no team to absorb the disruption. But the same model dependency risk sits inside every automation stack that calls an AI endpoint without a fallback.

For Australian SMBs using AI-integrated tools, whether it's voice agents, document processing, or CRM automation, the question isn't whether this will happen to your vendor. It's whether your vendor planned for it.

The honest answer from most of them is: probably not explicitly.

This is why the build decisions you make early matter so much. Not just for performance. For resilience.

Key Takeaways

Model dependency is a structural build risk, not a theoretical one. It happened mid-build on 12 June 2026.
Losing access to Fable 5 cost an afternoon, not a launch, because the session was built to be swappable.
Treat your model like any other third-party dependency: verify its outputs, document your assumptions, and keep a fallback ready.

Building something with AI at the core? DM me AUDIT and I'll send you five questions to check whether your stack survives losing its primary model mid-project. Takes ten minutes. Worth it before you find out the hard way.

Originally published at theautomate.io.

LLM Evals Are the Unit Tests of AI Agent Work

TheAutomate.io — Tue, 09 Jun 2026 21:50:27 +0000

TL;DR

LLM evals are the unit tests of AI agent work. If you're not writing them, regressions will find your clients before you do.
We started writing them after the third production regression. Same story every time: a prompt change breaks a downstream behaviour silently.
This post is for builders shipping real production agents who want a repeatable way to catch breakage before it goes live.

LLM evals are automated checks that verify your agent still behaves the way you intended after every change. They're the unit tests of LLM work. And if you haven't written any yet, you're flying blind.

What are LLM evals and why do they matter?

LLM evals are test cases that check whether your language model outputs match expected behaviour given a known input.

Think of them the way a developer thinks about unit tests. You write a test that says: given this input, the model should respond in this way. You run the test after every change. If it breaks, you know before you deploy. Without LLM evals, you're relying on manual spot-checking or, worse, client complaints to surface regressions. Neither scales. Neither protects your reputation.

For AI agent builders, evals matter even more than they do in traditional software. Model behaviour can shift with a single prompt edit. A word change in a system prompt can alter tone, compliance posture, or the entire decision branch. You need a fast way to know when that happens.

What problem do LLM evals actually solve?

They catch silent regressions. The changes that don't throw an error but break the intended behaviour anyway.

The production regression pattern is predictable. You tweak a prompt to fix one thing. Something else shifts. The agent starts handling an edge case differently. Nobody notices until a client calls. By then the damage is done.

This is the specific failure mode LLM evals guard against. Not syntax errors. Not crashed workflows. The subtle drift where the agent's tone hardens, or it stops collecting a required field, or it starts offering information it shouldn't. That kind of breakage is invisible to logs. Evals make it visible.

You can read more about how prompt description choices affect downstream behaviour in this post on why code description is now the bottleneck.

How do you structure an LLM evals architecture in production?

You need three things: a set of fixed test inputs, a defined expected output or rubric, and a runner that checks the two against each other after every deploy.

The inputs are real examples drawn from production. Actual edge cases. The scenarios that burned you. The rubric can be exact-match for structured outputs, or LLM-as-judge for open-ended responses where you need a model to score another model. The runner sits in your CI pipeline or triggers on every N8N workflow deploy.
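
Those three pieces fit in very little code. A minimal sketch, assuming the agent can be called as a plain function from text in to text out. The `EvalCase` shape, the stub agent, and both test cases are illustrative. Here the rubric is a simple predicate, and an LLM-as-judge would slot into the same `check` field.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    user_input: str               # fixed test input
    check: Callable[[str], bool]  # rubric: True means the output is acceptable


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case and return the names of the ones that failed."""
    return [c.name for c in cases if not c.check(agent(c.user_input))]


# Stub agent standing in for the real model call.
def agent_stub(user_input: str) -> str:
    if "rate" in user_input.lower():
        return "I can't quote rates, but I can book you in with a broker."
    return "Sure, what's the best number to reach you on?"


CASES = [
    EvalCase("no_rate_quotes", "What interest rate can I get?",
             lambda out: "%" not in out),
    EvalCase("collects_phone", "I'd like a call back.",
             lambda out: "number" in out.lower()),
]

failures = run_evals(agent_stub, CASES)
print("failures:", failures)
```

In CI, a non-empty failure list blocks the deploy. That's the whole contract.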

For anyone building on N8N, the cost structure here is worth thinking about carefully. A self-hosted instance gives you the freedom to run evals on every change without per-execution billing. That matters when you're running dozens of test cases on every push. For context on how infrastructure cost shapes build decisions, see this breakdown on managed agents vs N8N agent cost.

A working eval suite for a production voice agent typically covers:

Intent classification: does the agent recognise what the caller wants?
Field extraction: does it collect the right data without being prompted twice?
Compliance guardrails: does it stay within scope and avoid prohibited topics?
Tone consistency: does it match the persona defined in the system prompt?
Handoff triggers: does it escalate correctly when the condition is met?

What are the real trade-offs with LLM evals?

Writing evals takes time upfront. Not writing them costs more time later, just distributed across incidents you didn't plan for.

The most common objection is that it slows down shipping. It does, briefly. But a production regression in a voice agent calling finance broker leads has real consequences. Compliance exposure. Broken client trust. Manual cleanup. The upfront cost of writing evals is small compared to any of those.

The other trade-off is eval quality. A bad eval gives you false confidence. If your rubric is too loose, the test passes and the regression still ships. LLM-as-judge approaches help here, but they introduce their own variability. The Gartner research on AI reliability reinforces the industry-wide consensus: testing and monitoring LLM outputs are non-negotiable for production deployments.

The best time to write your first eval was before your first production regression. The second best time is now.

Key Takeaways

LLM evals are the unit tests of AI agent work. They catch silent regressions before your clients do.
Start with real production edge cases as your test inputs. Don't write theoretical scenarios.
The cost of not writing LLM evals shows up as incidents, not as time on a sprint board.

If your production agents don't have LLM evals yet and you want a second set of eyes on the build, DM AUDIT and I'll send you five questions. We'll work out whether your current setup is exposed and what it'd take to fix it.

Originally published at theautomate.io.

Code Description Is Now the Bottleneck (Not the Code)

TheAutomate.io — Mon, 08 Jun 2026 22:15:04 +0000

TL;DR

Spec-Kit hit 90,000 stars in seven months because code description is now the slowest part of building.
Writing the code is no longer the hard part. Telling the AI what to build clearly enough is.
Builders shipping real systems need to get better at spec work, not just prompting.

Spec-Kit hit 90,000 stars in seven months. That's the signal. Code description is the new bottleneck, and the tools that solve it are growing faster than almost anything else in the dev stack.

Why Did Spec-Kit Grow So Fast?

Because the AI can write the code. It can't read your mind.

Spec-Kit solves a specific pain: documenting what code does so the AI has enough context to work with it accurately. That's not a glamorous problem. But it's the real one. Most builders have felt it. You hand Claude or Cursor a messy repo and the output degrades fast. The model doesn't know what's connected to what. Code description fills that gap. Spec-Kit growing this fast tells you the gap was enormous.

What Does This Mean for Builders Shipping Automation?

The craft is shifting from writing to describing, and most builders haven't caught up.

I ship voice AI agents and N8N workflows for Aussie service businesses. Finance brokers, insurance outfits, accounting firms. The build itself moves fast now. Claude Code can wire a Retell AI agent to GHL in a fraction of the time it used to take. But the part that still slows everything down? Getting the spec right before a single line runs. What should the agent say? Under what conditions? What happens when the call goes sideways? That's code description work, even when there's no code yet. This is exactly why Plan Mode is the cheapest phase of any automation build. You're solving the description problem before it becomes a build problem.

Where Does Code Description Actually Break Down?

It breaks at handover, at scope changes, and whenever someone new touches the system.

Here's where most teams feel the pain:

A workflow built in N8N has no documentation. The next builder can't extend it without reverse-engineering it.
A voice agent prompt made sense to the person who wrote it. Six weeks later, nobody remembers why a particular branch exists.
A client asks for a change. The builder has to re-read the whole system before touching anything.
The handover call turns into a two-hour archaeology session instead of a clean knowledge transfer.

Code description isn't just about AI tools reading your codebase. It's about humans understanding what was built. Both problems compound when you skip it. We bake description discipline into every handover. The offboarding kit we give clients includes prompts in plain text precisely because of this.

Is Code Description a Skill or a Tool Problem?

It's both. But the skill comes first.

Spec-Kit is a tool. It helps. But a tool won't fix unclear thinking. If you can't describe what a system should do in plain English, no amount of tooling saves the build. The developers and AI builders getting the best output from Claude Code right now aren't necessarily the fastest typists or the deepest prompt engineers. They're the ones who can write a clear, specific brief before they start. That's a writing skill. A thinking skill. And it's in short supply. The broader research on AI-assisted development from McKinsey consistently points at specification quality as a key driver of output quality. Not the model. Not the tool. The input.

What Should You Actually Change in Your Build Process?

Write the spec before you open the IDE. Every time.

That sounds obvious. Almost nobody does it consistently. Here's what it looks like in practice on my builds:

Write what the system needs to do in plain English before any tooling opens.
Define the edge cases explicitly. What happens when the call drops? When the CRM field is empty? When the lead says no?
Document every workflow node's intent, not just its function. Future you will thank present you.
Treat the spec as a deliverable, not a pre-task. It earns its own line on the project.

This is the same reason we charge for the handover call. Documentation and description work is real labour. It prevents the rebuild that costs three times as much.

Key Takeaways

Spec-Kit's growth to 90,000 stars in seven months is a clear signal: code description is the current bottleneck in AI-assisted development.
Writing code is no longer the hard part. Describing what you want, clearly enough for an AI to act on it reliably, is.
Builder discipline around specs and documentation pays back on every handover, every change request, and every new system that gets bolted on.

If you're building automation for your business and your specs are more vibes than structure, let's fix that before it costs you. DM me AUDIT and I'll send you the five questions I ask before touching any new build.

Originally published at theautomate.io.

Plan Mode Is the Cheapest Phase. Use It.

TheAutomate.io — Thu, 04 Jun 2026 22:00:25 +0000

TL;DR

Plan mode is the cheapest phase of any automation build and the only one that decides if you're solving the right problem.
Skipping it means you ship fast and fix slowly. Usually expensively.
This is for builders and SMB owners who want working systems, not polished mistakes.

Plan mode is where the real work happens. Not the build. Not the deploy. The phase where you figure out what you're actually trying to fix.

What Is Plan Mode and Why Does It Cost So Little?

Plan mode is thinking time. It's the phase before you touch a single tool, write a single node, or record a single voice prompt.

No compute costs. No API calls. No Retell AI sessions burning credits. Just you, a doc, and a clear-eyed look at the problem. That's why it's cheap. It's also why most people rush it. Cheap feels optional. It's not.

The decisions made in plan mode follow the build everywhere. What the agent says. What it doesn't say. Which CRM field it writes to. Whether it calls at all. Get those wrong here and you'll be unravelling them for weeks.

What Does "Solving the Right Problem" Actually Mean?

It means confirming the pain before you build the fix. Most automation briefs describe a symptom, not a root cause.

A finance broker says they need an AI agent to follow up leads faster. That's the symptom. The root cause might be that their intake form collects the wrong data. Or that the lead source is low-quality. Or that their follow-up sequence is fine but nobody's reviewing call outcomes.

Build a fast follow-up agent for a broken intake form and you've just automated the wrong thing. Faster. Plan mode is the only phase that forces that question before it's expensive to answer.

What Should You Actually Do in Plan Mode?

Map the process that exists, not the process you wish existed.

Talk to the person who handles the task manually. Watch them do it once if you can. Most of what they know isn't written down anywhere. It lives in their head and their inbox and their gut feel about which leads are worth a call.

Here's what plan mode should produce before you open N8N or GHL:

A plain-English description of the current process, step by step
The specific failure point the client wants fixed
The data inputs the automation will need and where they actually come from
The compliance constraints that apply (ACMA and DNCR matter for any outbound calling in Australia)
A clear definition of what "done" looks like

Without that list, you're guessing. And guessing in build mode is expensive.

How Does Skipping Plan Mode Show Up in the Build?

It shows up as rework. Lots of it. Usually after the client has already seen a demo.

You build a voice agent that handles inbound queries. Looks good in testing. Then you find out the client's CRM doesn't have an API endpoint that maps to what the agent needs to write. Or the call flow assumes a linear conversation and the actual leads are anything but. Or the agent's been routing calls to a number that rings a phone nobody picks up.

None of these are AI problems. They're plan mode problems. You can read more about how scoping breaks down in production in 18 N8N Workflows for Clients Who Had No Idea What an API Was. Same pattern comes up every time.

For a deeper look at how compliance constraints should factor into planning for AU finance and insurance builds, the ACMA guidance on outbound calling is worth reading before you scope anything.

Does Plan Mode Change How You Price the Work?

Yes. Plan mode should be a paid engagement, not a free discovery call.

If you're doing plan mode properly, it takes real time. You're mapping processes, identifying data sources, checking compliance requirements, and writing a brief that the entire build depends on. That's billable work. It's also where you earn the right to charge what the build is actually worth.

We covered the same logic in Why We Charge for the Handover Call. The principle's identical. If a phase of the project produces real output that shapes everything after it, it's not a courtesy. It's a line item.

Plan mode done well also protects the client. They get a written record of what was agreed before anyone touches a tool. That's worth paying for too.

Key Takeaways

Plan mode is the cheapest phase of any build. It's also the only one that decides whether you're solving the right problem.
Skipping it doesn't save time. It moves the cost into rework, late-stage scope changes, and broken demos.
Plan mode output should be a written brief covering the current process, the specific failure point, the required data inputs, and the compliance constraints.
If your plan mode is thorough enough to be useful, it's thorough enough to be paid.

If you're not sure whether your current build has a plan mode problem or an execution problem, DM AUDIT and I'll send you the five questions I ask before touching any client system.

Originally published at theautomate.io.

The Offboarding Kit: What Clients Actually Get Back

TheAutomate.io — Mon, 01 Jun 2026 22:12:20 +0000

TL;DR

The offboarding kit is the clean exit package every automation client deserves, covering prompts, data, recordings, and credentials.
Clients leave with everything they need to run or rebuild without you.
If you're building for Australian SMBs, a solid offboarding kit is also basic professional conduct.

A good offboarding kit hands the client back everything that belongs to them. Prompts in plain text. Knowledge base exported. Recordings in their storage. Credentials rotated.

Why does the offboarding kit matter at all?

Because a builder who keeps your IP in their account isn't a builder. They're a gatekeeper.

Most automation engagements end quietly. The client moves on, the builder moves on, and nobody thinks hard about what actually got handed over. That's fine until the client needs to change something, and realises they can't. The offboarding kit exists to close that gap before it opens. It's the document that says: here's everything, it's yours, you don't need us to run it.

This also matters for how you're perceived. An Australian SMB owner in finance or real estate talks to other Australian SMB owners. Word gets around. Clients who leave with a clean offboarding kit tend to refer more than clients who leave feeling like something was left behind.

What goes into the prompts section of the offboarding kit?

Every system prompt, every branch, every persona instruction delivered as plain text files the client can open, read, and edit without logging into anything.

Not a screenshot. Not a PDF. Plain text. The reason is simple: if the client ever needs to hand this to another builder, that builder should be able to pick it up cold. Proprietary formats create unnecessary friction.

We document the prompt structure in a short README alongside the files. What each prompt does, what inputs it expects, what it's been tuned to avoid. Not a novel. Just enough that someone competent can orient themselves in under an hour.

We charge for the handover call that walks clients through this material. If you're curious why that's a line item and not a courtesy, the reasoning is here.

How do you export the knowledge base cleanly?

Export everything the agent used to answer questions. Raw documents, structured FAQs, any product or compliance content that was loaded into context.

The knowledge base is often the thing clients built over months of iteration. Answering edge cases, adding product nuances, correcting the agent when it said something wrong. That history lives in the files you loaded. It shouldn't disappear when the engagement ends.

We export in formats the client can actually use. Not just what the platform spits out by default. If the source documents were Word files, they leave as Word files. If it was a structured FAQ, they get a clean spreadsheet. The client should be able to hand this to a new provider and say "this is what the agent knows".

For businesses in regulated sectors, retaining this material is more than good practice. The Australian Privacy Act has specific obligations around data handling that clients in finance and insurance should be across before anything gets deleted.

Where do call recordings go in the offboarding kit?

Recordings go into storage the client controls. Not the builder's Retell AI account. Not an N8N cloud bucket with shared credentials.

This one catches people out. Recording infrastructure set up quickly tends to default to builder-owned storage. It's convenient during the build. It becomes a problem at the end of it.

A proper offboarding kit migrates recordings to client-owned cloud storage before the engagement closes. Or, better, sets that up from day one. The client should be able to pull a specific call, review it, share it with a compliance team, without asking anyone for access.

This is especially relevant for finance brokers and insurance operators where call recordings have a practical compliance function. Not hypothetically. Actually.

What does credential rotation look like at offboarding?

Every API key, every webhook secret, every platform login the builder had access to gets rotated the day the engagement ends.

Not the day after. Not when someone gets around to it. The day it ends.

A clean offboarding kit includes a credential rotation checklist. The items on it:

Retell AI API keys (generate new, revoke old)
N8N webhook secrets and any external service credentials stored in workflow variables
GHL sub-account access (remove the builder's user)
Any third-party integrations where the builder's personal API key was used instead of a dedicated service account
SMS and email sending credentials if those were provisioned through the builder's account

Rotation isn't about distrust. It's hygiene. The client's stack shouldn't be exposed because a builder's account gets compromised six months after the engagement ended. For more on how self-hosted N8N affects this compared to managed options, this cost and control breakdown is worth reading.

Key Takeaways

The offboarding kit is what separates a professional engagement from a dependency trap.
Prompts in plain text, knowledge base exported, recordings in client storage, credentials rotated. Four things. Do all four.
Set up clean ownership from day one and the offboarding kit writes itself.

If you're an Australian SMB thinking about automation and want to know whether your current setup would survive a clean exit, DM AUDIT and I'll send you the five questions we use to check.

Originally published at theautomate.io.