Gabriel Anhaia

An AI Agent Burned $4,200 in 63 Hours. Three Guardrails That Catch It.


In an April 2026 postmortem on Medium, Sattyam Jain walks through what happened to a developer who closed his laptop late on a Friday night, drove to his sister's wedding, and trusted an autonomous agent to keep working through the weekend. By Monday morning, 63 hours later, the laptop opened to a $4,200 invoice. An inbox full of provider rate-limit emails. A process that, per Jain, had consumed roughly 46 times the token count of Shakespeare's complete works without producing a single useful output.

The numbers in Jain's reconstruction are specific: $42 in the first hour, $200 by hour four, $1,000 by hour twelve, then a long, steady climb until manual intervention. The cost curve is the symptom. The failure shape is the lesson, and the lesson generalizes to almost every agent in production.

The failure shape

The agent's loop was four steps: plan, call a tool, receive a 429 rate-limit error, replan. Each replan was indistinguishable from the original plan because the rate-limit error wasn't fed back as state — it was treated as a transient blip the planner ignored. Roughly 4,800 iterations per hour, per Jain. Friday night through Monday morning. No backoff, no cumulative cost ceiling, no idle-loop detection.
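
Reduced to code, the shape is something like this — a hypothetical sketch, using the same stand-in helper names as the guarded version later in this post. Note what is missing: no backoff, no cost counter, no memory of past errors.

while not done(task):
    plan = planner(task)            # same input, so same plan, every time
    tool, args = next_tool(plan)
    try:
        result = call_tool(tool, args)
    except HTTPError:               # the 429 lands here...
        continue                    # ...and is discarded: replan immediately
    task = update(task, result)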

Read that loop again. It contains three failure modes stacked on each other.

The first is retry without state. Every replan started fresh. The agent had no memory that the last six attempts had hit the same 429, so it could not adapt — could not switch tools, could not pause, could not escalate. To the agent, hour 47 was a brand-new task; to the credit card statement, it was indistinguishable from hour 46 except for the digit.

The second is no per-trace cost ceiling. The total cost across a single conversation, a single trace, a single user-initiated session, was unbounded. There was a per-API-call rate limit on the provider's side. There was no budget cap on the customer's side. A $50 ceiling, as Jain points out, would have killed the run inside the first hour and left a clean error in the logs.

The third is no idle-loop circuit breaker. The agent was producing the same plan repeatedly. The same prompts, the same tool calls, the same outcomes. Nothing in the system was looking at the diff between "useful work" and "spinning." A simple check (have the last N tool calls been identical and unsuccessful?) would have tripped on iteration 30, not iteration 4,800.

A weekend with the laptop closed turned a missing safeguard into a $4,200 line on a credit card statement. Every team running agents in production has the same gap. Most just haven't been caught yet.

Three guardrails, in order of pain-to-add

Here they are in the order you should put them into production, lowest effort first.

1. Per-trace cost ceiling

The hardest guardrail to argue against. Every agent invocation gets a maximum dollar budget. Once the cumulative cost across all model calls and tool calls inside the trace crosses the ceiling, the trace aborts with a clear error. Not a warning. An abort.

Set the ceiling at 5–10x the 95th-percentile cost of a normal run. Cheap enough to catch a runaway, generous enough not to false-positive on legitimate hard tasks. The number should live in config, not code; you'll want to raise it for specific premium customers without redeploying.
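
A sketch of that calibration, assuming you keep a history of per-trace costs somewhere queryable; ceiling_from_history and TRACE_MAX_USD are hypothetical names, not an established API.

import os
import statistics

def ceiling_from_history(costs_usd: list[float],
                         multiplier: float = 8.0) -> float:
    # 95th percentile of normal runs, times a 5-10x headroom factor
    p95 = statistics.quantiles(costs_usd, n=20)[18]
    return multiplier * p95

# read at startup from config/env so it changes without a redeploy
MAX_USD = float(os.environ.get("TRACE_MAX_USD", "50.0"))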

2. Retry budget per error class

The 429 in Jain's postmortem repeated 4,800 times an hour. A retry budget treats each error class as a finite resource: at most N retries on rate-limit errors per trace, at most M retries on provider 5xx errors, exponential backoff between them, hard fail when the budget is exhausted.

The shape that matters: the budget is per error class, not per call site, and the class includes the source. A rate-limit error from your model provider and a rate-limit error from your search-tool API should not decrement the same 429 budget. Tag the class on the way in; a minimal sketch of that tagging follows.
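
A sketch of tagged budgets, under the assumption that every call site attaches a source label ("model", "search") to its errors. The names here (RetryLedger, error_class, RETRY_BUDGETS) are hypothetical; the simplified dataclass later in this post keys on bare status codes to stay short.

from collections import Counter

RETRY_BUDGETS = {"rate_limit": 8, "server_error": 10}

def error_class(status: int) -> str | None:
    if status == 429:
        return "rate_limit"
    if 500 <= status < 600:
        return "server_error"
    return None            # not a retryable class we track

class RetryLedger:
    def __init__(self):
        self.counts = Counter()

    def record(self, source: str, status: int):
        cls = error_class(status)
        if cls is None:
            return
        # "model" and "search" decrement separate budgets
        self.counts[(source, cls)] += 1
        if self.counts[(source, cls)] > RETRY_BUDGETS[cls]:
            # swap in your abort exception (TraceAborted, defined below)
            raise RuntimeError(f"retry-budget: {source}/{cls}")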

3. Idle-loop circuit breaker

The hardest of the three to get right. The detector watches the agent's last N actions and trips when the same action repeats without producing new state. If the tool, arguments, and outcome are all identical, you are looking at an idle loop. If the arguments shift and an intermediate result actually moves the task forward, that is progress.

Hash the (tool name, arguments, outcome category) tuple. Keep a sliding window. If the same hash appears more than K times in the last W seconds, break the loop and surface a structured error. (The code below uses a count-based window over the last N actions, a simpler proxy for the same check.) The shape of the alert matters: you want to know which tuple was repeating, not just "agent stopped."

A decorator that wraps all three

Here's a Python decorator that wraps an agent's invoke function with cost-cap, retry-budget, and idle-loop detection. It's written so you can drop it in front of any agent loop without rewriting the loop itself. The trace state lives in a dataclass; the decorator handles aborts.

import hashlib
import time
from collections import deque
from dataclasses import dataclass, field
from functools import wraps


class TraceAborted(RuntimeError):
    pass


@dataclass
class Guardrails:
    # ceilings (configuration)
    max_usd: float = 50.0      # per-trace cost cap
    max_429: int = 8           # retry budget for rate-limit errors
    max_5xx: int = 10          # retry budget for provider 5xx errors
    idle_window: int = 12      # how many recent actions to watch
    idle_threshold: int = 4    # identical repeats before tripping

    # running state
    spent_usd: float = 0.0
    err_429: int = 0
    err_5xx: int = 0
    actions: deque = field(default_factory=deque)

    def __post_init__(self):
        # size the sliding window from config
        # instead of hardcoding the maxlen
        self.actions = deque(maxlen=self.idle_window)

    def charge(self, usd: float, kind: str = "model"):
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            raise TraceAborted(
                f"cost-cap hit: ${self.spent_usd:.2f} "
                f"> ${self.max_usd:.2f}"
            )

charge is the cheapest of the three to wire up and catches the largest class of failures, so it goes first. The next two methods bolt on the retry budget and the idle-loop hash. Same dataclass, same exception type, no new infrastructure.

    def record_error(self, status: int):
        if status == 429:
            self.err_429 += 1
            if self.err_429 > self.max_429:
                raise TraceAborted("retry-budget: 429")
        elif 500 <= status < 600:
            self.err_5xx += 1
            if self.err_5xx > self.max_5xx:
                raise TraceAborted("retry-budget: 5xx")

    def record_action(self, tool: str, args: dict,
                      outcome: str):
        # hash the (tool, args, outcome-category) tuple;
        # identical repeats in the window mean spinning
        key = hashlib.sha1(
            f"{tool}|{sorted(args.items())}|{outcome}"
            .encode()
        ).hexdigest()
        self.actions.append(key)
        if (self.actions.count(key)
                >= self.idle_threshold):
            raise TraceAborted(
                f"idle-loop: tool={tool} repeated"
            )

That's the state. Now the decorator. The pattern: the wrapped function receives a guardrails keyword argument if the caller supplies one, otherwise a fresh instance. Every model call inside the agent reaches into the same instance to charge cost and record errors.

def with_guardrails(fn):
    @wraps(fn)
    def wrapper(*args, guardrails=None, **kwargs):
        rails = guardrails or Guardrails()
        kwargs["guardrails"] = rails
        t0 = time.monotonic()
        try:
            return fn(*args, **kwargs)
        except TraceAborted as e:
            elapsed = time.monotonic() - t0
            # surface as a structured event, not a silent kill;
            # log_abort is your own logging/telemetry hook
            log_abort(
                reason=str(e),
                spent_usd=rails.spent_usd,
                elapsed_s=elapsed,
                err_429=rails.err_429,
                err_5xx=rails.err_5xx,
            )
            raise
    return wrapper

And the call site inside the agent — this is the part most teams miss. The decorator catches aborts; the agent loop has to feed it the data:

@with_guardrails
def run_agent(task: str, *, guardrails: Guardrails):
    # done / planner / next_tool / call_tool / cost_of / summarize /
    # backoff / update / HTTPError are stand-ins for your agent's own pieces
    while not done(task):
        plan = planner(task)
        tool, args = next_tool(plan)

        try:
            result = call_tool(tool, args)
            guardrails.charge(usd=cost_of(tool, args))
            guardrails.record_action(
                tool, args, outcome=summarize(result)
            )
        except HTTPError as e:
            guardrails.record_error(e.status_code)
            # record the failure too, so repeated 429s show up
            # in the idle-loop window as identical actions
            guardrails.record_action(
                tool, args, outcome=f"http-{e.status_code}"
            )
            backoff(e.status_code)
            continue

        task = update(task, result)
    return task

A few things that matter.

guardrails.charge runs after the tool call, so the cost it records is real. Charge an estimate up front instead and the ledger drifts: rate-limited retries that never billed still count against the cap, and calls whose actual usage differs from the estimate count wrong.

record_action takes a summarized outcome string, not the raw tool response. You want the category of the outcome (success / 429 / parse-failure / empty-result) in the idle-loop hash, not the raw bytes — otherwise every successful tool call hashes differently and the detector never trips.
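
A minimal summarize sketch under that constraint. The categories are illustrative; the one rule is that the return value comes from a small closed set, never raw content.

def summarize(result) -> str:
    # collapse the raw response to a coarse category so identical
    # failures hash identically inside the idle-loop window
    if result is None or result in ("", [], {}):
        return "empty-result"
    if isinstance(result, dict) and result.get("error"):
        return "tool-error"   # category only, never the message text
    return "success"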

The decorator surfaces an abort as a TraceAborted exception. That should propagate up to your agent's outermost handler, which converts it into a structured event for your traces. Do not swallow it. The whole point is that the trace has a specific, attributable end state: "killed by cost cap at $50.07 after 14 minutes," not "agent stopped, unclear why."
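
The outermost handler can be as small as this; emit_trace_event and trace_id stand in for whatever telemetry hook and correlation ID you already have.

try:
    answer = run_agent(task, guardrails=Guardrails(max_usd=50.0))
except TraceAborted as abort:
    emit_trace_event(trace_id, status="guardrail-kill",
                     reason=str(abort))
    raise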

What this catches and what it doesn't

The three-guardrail set catches the failure shape in Jain's postmortem cleanly. Cost cap trips first; if it doesn't, retry budget trips on the 429s; if neither does, idle-loop detection trips on the repeated identical plans. Three independent fences, each calibrated to a different signal.

It does not catch the failure mode where the agent is making progress in the wrong direction — emailing the wrong customers, deleting the wrong rows, racking up valid-looking tool calls that all succeed and all do harm. That requires a separate layer of permissions and approvals, which is the topic Jain tackles in his earlier "loop of death" piece. Combine the two layers in production.

It also does not catch a slow-burn cost run where each step is cheap and the total stays under the cap but the agent never converges. Set a wall-clock timeout. Five minutes for an interactive agent, an hour for a batch job, one shift for a long-running research task. Pick the number, surface the abort the same way.
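
Bolting a wall-clock ceiling onto the same dataclass is a few lines. A sketch, assuming the agent loop calls check_clock() once per iteration; max_wall_s and check_clock are new names, not part of the code above.

@dataclass
class TimedGuardrails(Guardrails):
    max_wall_s: float = 300.0   # five minutes for an interactive agent
    started_at: float = field(default_factory=time.monotonic)

    def check_clock(self):
        elapsed = time.monotonic() - self.started_at
        if elapsed > self.max_wall_s:
            raise TraceAborted(
                f"wall-clock: {elapsed:.0f}s "
                f"> {self.max_wall_s:.0f}s"
            )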

What goes on the wall

The on-call wall poster, if you want one, is three lines.

Cost cap per trace. Retry budget per error class. Idle-loop detector with a hashed-action window.

Add wall-clock timeout as a fourth if your agents have ever needed one. Audit-log every abort with the reason and the trace ID. Make sure the alerting tier knows the difference between "the agent finished its task" and "the agent was killed by a guardrail." If those two outcomes look the same in your dashboard, you have already shipped the next $4,200 weekend.

If this was useful

The LLM Observability Pocket Guide covers exactly this — what to put on a trace span, how to design cost telemetry that survives a re-pricing, the eval rigs for catching regressions before they reach a Friday night. The AI Agents Pocket Guide is the companion — patterns for planner-executor and supervisor agents that don't fall over when a tool returns 429.
