DEV Community

HumanPages.ai

Posted on • Originally published at humanpages.ai

I burned $153 in 30 minutes with an agent loop — here's the pattern that stopped it

An agent spent $153 in half an hour doing nothing useful. No human signed off on that. No one could stop it in time. That's not a horror story — it's a Tuesday for anyone running autonomous agents in production.

The original incident, documented by Dmitry Amavashev on dev.to, is worth reading. Short version: an agent hit a failing task, retried it across multiple models trying to self-correct, and racked up API costs faster than any monitoring alert could fire. The fix wasn't smarter prompting. It was circuit breakers — explicit spend caps and failure thresholds that killed the loop before it went further.

This happens more often than people admit. An agent loop burning money in silence is one of those problems that only gets discussed after someone's credit card statement arrives.

Why loops are a cost problem, not just a logic problem

The instinct when building agents is to optimize for task completion. You want the thing to finish. So you add retries, fallbacks, model escalation. You build a system that fights to succeed.

The problem is that fighting to succeed is expensive when the underlying task is broken. If your agent is retrying because the API it's calling returns malformed data, or because the instructions are ambiguous, more compute doesn't fix that. It just costs more while failing the same way.

Amavashev's pattern that stopped it: set a hard spend cap per task run. Set a maximum retry count. When either threshold is hit, the agent stops and surfaces the failure instead of absorbing it. This sounds obvious until you're three weeks into building something cool and you've told yourself you'll add cost controls later.

Later rarely comes before the invoice.
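The pattern fits in a few dozen lines. Here's a minimal sketch — the class and method names (`CircuitBreaker`, `record`, `BudgetExceeded`) are my own, not Amavashev's; wire it into whatever per-call cost accounting your stack already has:

```python
# Minimal circuit breaker for an agent task run: a hard spend cap
# plus a max retry count. Either threshold trips the run.
# All names here are illustrative, not from the original post.

class BudgetExceeded(Exception):
    """Raised when a task run trips the spend or retry threshold."""

class CircuitBreaker:
    def __init__(self, max_spend_usd: float, max_retries: int):
        self.max_spend_usd = max_spend_usd
        self.max_retries = max_retries
        self.spent = 0.0
        self.attempts = 0

    def record(self, cost_usd: float) -> None:
        """Record one attempt's cost; raise instead of silently continuing."""
        self.spent += cost_usd
        self.attempts += 1
        if self.spent >= self.max_spend_usd:
            raise BudgetExceeded(f"spend cap hit: ${self.spent:.2f}")
        if self.attempts >= self.max_retries:
            raise BudgetExceeded(f"retry cap hit: {self.attempts} attempts")
```

Call `record()` after every inference call, and let `BudgetExceeded` propagate up to whatever surfaces failures. The point is that the exception is the feature: the loop cannot keep spending without someone deciding it should.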

The specific failure mode nobody talks about

Model escalation is the sneaky one. You start with a cheap model, it fails, you escalate to a more capable one, it also fails for a different reason, you escalate again. Each escalation multiplies cost. A task that costs $0.02 on GPT-3.5 might cost $2.00 on GPT-4, and if you're doing that across dozens of retries, you're not debugging a problem — you're paying a premium to fail at scale.

The circuit breaker pattern works because it reframes the goal. The goal isn't task completion at any cost. The goal is task completion within acceptable parameters. When those parameters are violated, the correct output is failure with a clear error, not continued spending.

This is a hard mindset shift. Developers hate shipping something that gives up. But an agent that stops and says "I couldn't do this, here's why" is more useful than one that keeps running until your AWS bill looks like a phone number.

What this has to do with hiring humans

Here's the practical question: what should happen when an agent hits that failure state?

At Human Pages, we see this as a routing problem. When an agent loop breaks — genuinely breaks, not just a transient error — the task often needs a human. Not because humans are better at everything, but because some tasks have ambiguity that no amount of model escalation resolves.

Real example from our platform: an agent was processing invoices for a small accounting firm. It hit a batch of invoices with inconsistent date formats and started retrying its parsing logic. Left to its own devices, it would have spent indefinitely trying to normalize data that needed a human to look at it and say "this one's obviously from a European vendor, the date format is DD/MM/YYYY." Two seconds of human judgment. The agent couldn't get there on its own.

With a circuit breaker in place, the agent stopped, flagged the batch, and posted a task to Human Pages: "Review 14 invoices with ambiguous date formatting, correct and re-upload." A human picked it up, finished in 22 minutes, got paid in USDC. Total cost: $18. The agent loop alternative, extrapolating from Amavashev's numbers, would have cost more and delivered nothing.

Building the escape hatch

The circuit breaker pattern gives agents a way to fail gracefully. But graceful failure needs somewhere to go. Right now, most agent architectures have two outcomes: success, or an error logged to a dashboard nobody checks.

A third outcome — "escalate to a human" — changes the economics entirely. The agent doesn't burn compute trying to solve an unsolvable problem. The human gets a well-defined task with clear context. The work actually gets done.

This isn't about agents being bad at things. It's about matching tools to problems. An agent retrying a parsing error fifteen times isn't demonstrating persistence — it's demonstrating that no one thought about what should happen when it fails. Adding a human escalation path is an architectural decision, not an admission of defeat.
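The three-outcome idea can be made concrete by treating "needs a human" as a first-class return value rather than an exception swallowed by a retry loop. A sketch — `Outcome`, `handle`, and the task shape are hypothetical, and `post_human_task` stands in for whatever API posts work to a human marketplace:

```python
# Three outcomes for an agent task: success, hard error, or
# escalation to a human with a well-defined task and context.
# All names here are illustrative assumptions.

from enum import Enum

class Outcome(Enum):
    SUCCESS = "success"
    ERROR = "error"
    NEEDS_HUMAN = "needs_human"

def handle(outcome: Outcome, task: str, post_human_task):
    if outcome is Outcome.SUCCESS:
        return "done"
    if outcome is Outcome.NEEDS_HUMAN:
        # Package the remaining work into a task a human can pick up
        # without reconstructing the agent's state from logs.
        return post_human_task({
            "title": f"Review: {task}",
            "context": "what was tried, what failed, what's left",
        })
    return "failed"  # surfaced error, no further spend
```

The design choice that matters is that the escalation path carries context. A human picking up "Review 14 invoices with ambiguous date formatting" can start immediately; a human staring at a dashboard error cannot.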

The economics are real. $153 in 30 minutes is roughly $300/hour for a task that wasn't completed. Most human workers on Human Pages complete tasks at a cost well under that, and they actually finish.

The spend cap is table stakes now

If you're running agents in production without hard spend limits, you're not being bold. You're being careless. The technology makes it easy to forget that every inference call costs money, and that autonomous systems can make a lot of inference calls very quickly when something goes wrong.

Amavashev's post is useful because it's specific. Not "be careful with costs" — a hard cap per task, a max retry count, explicit thresholds. That's implementable today. It takes an afternoon. The alternative is checking your API dashboard in the morning and having a bad day.

The agents that will actually stick around in production are the ones that know when to stop. Stopping and asking for help is not a bug in an agent. It's a feature that costs a lot less than the alternative.

Maybe the question worth sitting with: if your agent can't tell the difference between a problem it should keep trying to solve and one it should hand off, what exactly is it autonomous about?
