How to stop an AI agent from burning $47,000 in a loop nobody noticed.

#security #ai #devops #agents

A multi-agent research system sat in production for eleven days doing exactly what it was built to do. Four agents, LangChain-style, coordinating over A2A to pull market data and summarize it. Every health check passed the whole time. No crash, no 500, no timeout... from the outside the system was perfectly healthy.

Two of the four agents had quietly locked into a recursive loop, passing clarification requests and verification instructions back and forth, thousands of times, around the clock. Nobody noticed because nothing was technically wrong. The thing that finally caught it was a person opening the invoice and asking why the number was so high. The number was $47,000.

That story has been making the rounds because it's so relatable, but it isn't a one-off. Uber said it burned through its entire 2026 AI coding budget in four months. One company reportedly ran up a $500M Claude bill after rolling out access with no usage caps. The FinOps Foundation said that around April, the conversation across the industry flipped from "go fast" to "we need guardrails, how do we control this." This is the dominant operational failure mode in production agents right now, and it's barely about the model at all.

Why this keeps happening

The reason isn't that people are careless. It's that the spend cap, when it exists at all, lives in the wrong place.

Most teams "control" cost with monitoring. A billing alert. A dashboard. Maybe a daily spend report. But every one of those is a postmortem. By the time the alert fires, the money is already gone. The $47K loop didn't trip anything because the team was watching user metrics, signups, queries completed, response quality, not per-agent spend in real time. The bill was a monthly line item, not a guardrail.

And the other common fix, putting a budget check inside the agent, has its own problem: the agent is the thing that's misbehaving. A loop that's lost the plot isn't going to cleanly evaluate its own "am I allowed to keep going" check. You're asking the runaway process to stop itself. Sometimes it does. The eleven-day ones are when it doesn't.

There's also the part nobody likes to admit: nothing here was broken. The agents were "working." That's exactly why it ran for eleven days. A failure that looks like normal operation won't get caught by anything watching for failures.

The fix is a hard ceiling that lives outside the agent

The cap has to sit somewhere the loop can't reach, and it has to fire before the call, not after the invoice. That means a budget that's enforced on the tool call itself, not advice in a prompt and not an alert after the fact.

This is the part I work on, so here's how we do it in Faramesh. Your whole policy lives in one file, and a budget is a few lines:

agent "research-crew/analysis" {
  default deny

  rules {
    permit market_data_lookup
    permit summarize
  }

  rate_limit "market_data_lookup": 30 per minute

  budget daily {
    max       $50
    on_exceed defer
  }
}

Two things are doing the work. The rate_limit caps how fast a single tool can be hit, so an agent stuck in a tight loop can't fire the same call hundreds of times a minute. The budget block puts a hard daily ceiling on spend, and on_exceed defer means that when the ceiling is hit, the next call doesn't run, it pauses and waits for a human. The loop stops at the dollar, not at the invoice.

The important part is where this runs. Faramesh sits between the agent and its tools as a local daemon, and every tool call goes through it before it executes. The decision is deterministic, there's no model in that path, so the same spend under the same policy always gets the same answer. The runaway agent can't talk its way past it, because the check isn't inside the agent. It's a wall in front of the tool.

On the $47K incident specifically: a daily cap would have turned an eleven-day, $47,000 silent loop into a one-day, $50 pause and a notification. Same loop. Same bug. Wildly different outcome, because the ceiling didn't depend on anyone noticing.

The takeaway

Runaway cost isn't really a model problem and it isn't bad luck. It's an architecture problem. The spend limit is usually in a place the spend can route around, inside the agent, or after the fact in a billing alert. Move it outside the agent and in front of the action, and the worst case stops being "we found out when the invoice came" and starts being "it paused and pinged us."

Faramesh is open source. If you want to put a hard ceiling in front of your agents, the repo's at github.com/faramesh/faramesh-core. If you wire it up and something's off, tell me, that's all useful feedback right now.

Top comments (2)

Gautam K • Jul 30

I've been circling the same problem space and kept arriving at your architecture, enforcement outside the agent, deterministic, in front of the tool call. Then I found Faramesh.

What's the pull actually been? Are people installing it and leaving it in, or reading the post and moving on? And is anyone treating agent spend as a budget line yet, or is it still an engineer's side concern?

Alex Morgan • Jun 24

This is the right architecture. The pattern you're describing — cap outside the agent, enforced before the call, not after the invoice — maps to what I've been calling "upstream economic grounding" in my own writing on AI FinOps. Most of the market is building downstream dashboards to watch the fire, and you've put the extinguisher in front of the agent. The on_exceed defer behavior is the detail that makes this production-ready — pausing rather than hard-failing means the loop stops at the dollar, not at a crashed pipeline.

Curious about your thinking on how this interacts with model-level budgets (Anthropic's task budgets, etc.) — do you see those as complementary layers or competing approaches? Either way, useful project. Glad someone's building this.