Alex Spinov

Posted on Jun 3 • Originally published at blog.spinov.online

A Budget Brake That Stops a Scraper Before $200

#python #webscraping #costoptimization #dataengineering

A spend alert is a receipt. It tells you the money is already gone.

That distinction sounds pedantic until a loop is running. In November 2025, four AI agents fell into an infinite retry loop. Nobody noticed for eleven days. The bill was $47,000. Joe Carpenter, who wrote that incident up on Dev.to on April 25, 2026, put the failure of the usual safety net better than I can: "Spend alerts — fire after the damage is done. An alert at $1,000 doesn't help when an agent burns $4,700 per day." (dev.to/dingdawg)

That line has been rattling around in my head for a week. Because it names the whole problem. The dashboard, the alert at 80%, the provider's monthly cap: every one of them tells you what happened. None of them refuses the next call. They narrate the fire. They don't stop it.

So this post is about the thing that does stop it: a small piece of code that sits in front of the spend and says no before any money moves. I'll show you the whole thing, about forty lines, standard library only. I'll run it locally and paste the real output. Then I'll tell you exactly where it stops working, because it does.

TL;DR

Observability is after-the-fact. A spend alert at $1,000 cannot un-spend the $4,700 a runaway loop burned today. You need a guard before the step, not a report after it.
BudgetBrake does two things: it refuses the next step when the projected session total would cross a ceiling you set, and it stops a loop that keeps spinning on the same state without making progress.
Two guards, two very different price tags. In my demo the stasis guard caught a stuck retry loop at $0.30. The ceiling alone, on a loop whose cursor kept drifting, didn't fire until $200. Same kind of bug, more than a 600× difference in what it cost you.
The honest catch: a brake limits your future spend, not your past. It cannot tell a runaway loop from an expensive-but-legitimate run. And my cost numbers are an illustrative model, not a measured per-run invoice. I'll be clear about which is which.

The frame everyone gets backwards: observability is not enforcement

Here is the claim I want to defend, and I think it's worth arguing about: cost dashboards and spend alerts do not stop a runaway scraper. They describe it. The only thing that stops it is a brake that refuses the next run before it spends.

Observability is a receipt. Enforcement is a fuse. Those are different objects, and the industry sells you the first one as if it were the second.

Watch what each one actually does when a loop goes bad at 3am. The dashboard updates. The alert at 80% of budget fires, into a Slack channel nobody is reading at 3am. The provider's monthly hard cap is set to a number large enough to not be annoying, which means it's set too high to save you. Eleven days later someone opens the invoice. Every one of those tools did its job perfectly. The job just wasn't "stop the loop."

I am not arguing against observability. You absolutely want the receipt, for the post-mortem, for the trend, for knowing which actor is your money pit. I run that stuff too. I'm arguing it's the wrong layer for prevention. You don't put a smoke detector where you needed a circuit breaker.

Simon Willison made a related point on May 30, 2026, writing up how Anthropic contains Claude across products (simonwillison.net) — the whole piece is about putting hard limits around an agent so its blast radius is bounded, using sandboxes like gVisor and Seatbelt. Same instinct, different layer: he's containing what the agent can touch; I'm containing what it can spend. And Robinhood shipped a money version of the same idea on May 27 — they let AI agents trade stocks, but only out of a pre-loaded dedicated wallet, with bigger trades gated behind a human (techcrunch.com). A wallet with a fixed amount in it is a budget brake. It physically cannot spend what isn't there. That's the shape of the thing.

Where I'm coming from (and what I'm not claiming)

I should be straight about my evidence, because it's easy to fake authority on cost and I don't want to.

I have logged 2,190 production scraper runs across 32 published actors. The single busiest one, a Trustpilot review scraper, accounts for 962 of those runs by itself. Those are raw lifetime counters from my own dashboard at apify.com/knotless_cadence, as of May 2026, not a sampled estimate.

What that gives me is not a per-run billing ledger. I don't have a clean spreadsheet that says "run #1,847 cost me $0.41." Nobody logs that by default, and I didn't either. What I have is 2,190 runs of watching where the bill comes from — and it's almost never the steady-state scrape. It's the loop. A retry that re-fires the same failing request. A pagination cursor that doesn't advance and walks off the end of the data forever. A job that "finishes" but the scheduler restarts it every five minutes because the exit code is wrong. None of those show up as a single scary line item. They show up as a slow drip that you find on the invoice, which is exactly Carpenter's $4,700-a-day, eleven-days-late story at a smaller scale.

So when I put a dollar figure in the code below, treat it as an illustrative model, not as my measured cost. I'll mark it loudly. The mechanism is the real part. The number is a placeholder you replace with your provider's pricing.

The brake

The idea is small enough to hold in your head. Before every paid step, you ask the brake for permission. The brake projects what the session total would be if this step ran, and if that crosses your ceiling, it raises instead of returning. After the step, you tell it what the step actually cost. And separately, you feed it a key describing the work state (a page number, a cursor, a task id) so it can notice when the state stops changing, which is what a stuck loop looks like from the outside.

Three methods. reserve is the fuse before spend. settle is the bookkeeping after. note_progress is the stasis detector. That's it.

from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised before a step that would push the session past its ceiling."""


class LoopStalled(Exception):
    """Raised when the run repeats the same state without making progress."""


@dataclass
class BudgetBrake:
    ceiling_usd: float                 # hard cap for the whole session
    max_stall: int = 5                 # consecutive repeats of one state before we stop
    spent_usd: float = 0.0             # what has actually been settled
    _last_state: object = field(default=None, repr=False)
    _stall_count: int = 0

    def reserve(self, estimated_usd):
        """Call this BEFORE a paid step. Refuse if the projection crosses the cap."""
        projected = self.spent_usd + estimated_usd
        if projected > self.ceiling_usd:
            raise BudgetExceeded(
                f"refused step: projected ${projected:.2f} "
                f"exceeds ceiling ${self.ceiling_usd:.2f} "
                f"(spent ${self.spent_usd:.2f} + next ${estimated_usd:.2f})"
            )
        return projected

    def settle(self, actual_usd):
        """Call this AFTER the step with what it really cost."""
        self.spent_usd += actual_usd
        return self.spent_usd

    def note_progress(self, state_key):
        """Watch the work state. If it stops changing, the loop is stuck."""
        if state_key == self._last_state:
            self._stall_count += 1
        else:
            self._stall_count = 0
            self._last_state = state_key
        if self._stall_count >= self.max_stall:
            raise LoopStalled(
                f"no progress for {self._stall_count} steps "
                f"on state {state_key!r}; stopping at ${self.spent_usd:.2f}"
            )

Note the order in reserve: it checks before it lets you proceed. This is the whole difference from an alert. An alert adds up the spend after the fact and tells you the number is high. reserve refuses to let the number get there. The call site looks like this:

COST_PER_STEP = 0.05  # illustrative unit cost — set yours from provider pricing

def run_normal(pages):
    brake = BudgetBrake(ceiling_usd=200.0)
    for page in range(1, pages + 1):
        brake.reserve(COST_PER_STEP)      # ask permission before spending
        # ... do the paid work here (fetch, parse, LLM call) ...
        brake.settle(COST_PER_STEP)       # book the real cost
        brake.note_progress(f"page:{page}")  # state advances every step
    return f"normal run done: {pages} pages, spent ${brake.spent_usd:.2f}"

That COST_PER_STEP = 0.05 is the placeholder I warned you about. Five cents a step is a made-up unit so the demo is reproducible. Put your real number there: a fetch through a paid proxy, an LLM call, an actor compute unit, whatever your step actually costs. The brake doesn't care what the cost represents. It cares that you told it the truth about the magnitude.
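One way to turn "your real number" into something reserve can use is to hand it a worst-case figure rather than an average. The sketch below does that for an LLM-style step billed per token. The function name and every price in it are hypothetical placeholders, not any provider's actual rates; the assumption is that each retry is billed like a full attempt.

```python
def worst_case_llm_step_usd(prompt_tokens, max_output_tokens,
                            usd_per_1k_input, usd_per_1k_output,
                            max_retries=0):
    """Upper bound for one step: every attempt billed at the full token budget.

    All prices are caller-supplied assumptions; nothing here is real pricing.
    """
    per_attempt = ((prompt_tokens / 1000) * usd_per_1k_input
                   + (max_output_tokens / 1000) * usd_per_1k_output)
    attempts = 1 + max_retries  # the first try plus each allowed retry
    return per_attempt * attempts


# Made-up rates: $0.50 per 1k input tokens, $1.50 per 1k output tokens.
estimate = worst_case_llm_step_usd(2000, 500, 0.5, 1.5, max_retries=2)
print(f"reserve this much before the step: ${estimate:.2f}")
```

Reserving the upper bound and settling the actual cost afterwards means the brake errs toward refusing early, which is the direction you want a fuse to err in.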

Two ways a loop goes wrong, two guards

Now the interesting part — and the part that surprised me when I ran it.

There are two distinct failure modes, and they need different guards. A loop can spin on the same state forever (a retry that keeps hitting the same broken request). Or it can spend forever while appearing to make progress (a cursor that increments past the real data and just keeps paginating into the void). The first is caught by note_progress. The second is only caught by the ceiling. Here are both, with the runaway loops written out:

def run_runaway():
    """A retry loop that never advances. Two guards could catch it; stasis fires first."""
    brake = BudgetBrake(ceiling_usd=200.0)
    step = 0
    while True:
        step += 1
        try:
            brake.reserve(COST_PER_STEP)
        except BudgetExceeded as exc:
            return f"BRAKE (ceiling) at step {step}: {exc}"
        brake.settle(COST_PER_STEP)
        try:
            brake.note_progress("page:42-retry")  # same state every time
        except LoopStalled as exc:
            return f"BRAKE (stasis) at step {step}: {exc}"


def run_runaway_no_stasis_guard():
    """Same loop, but the cursor drifts each step, so only the ceiling stops it."""
    brake = BudgetBrake(ceiling_usd=200.0)
    step = 0
    while True:
        step += 1
        try:
            brake.reserve(COST_PER_STEP)
        except BudgetExceeded as exc:
            return f"BRAKE (ceiling) at step {step}: {exc}"
        brake.settle(COST_PER_STEP)
        brake.note_progress(f"page:{step}")  # state moves, stasis never trips

And here is the actual output from running the file. Not paraphrased. I ran python3 budget_brake.py and pasted what it printed:

cost model: $0.05/step (illustrative — set yours)

normal run done: 120 pages, spent $6.00
BRAKE (stasis) at step 6: no progress for 5 steps on state 'page:42-retry'; stopping at $0.30
BRAKE (ceiling) at step 4000: refused step: projected $200.00 exceeds ceiling $200.00 (spent $199.95 + next $0.05)

Read those last two lines together, because that's the headline.

The stuck retry loop got stopped at step 6, having spent thirty cents. The stasis guard noticed the state page:42-retry had repeated five times in a row and pulled the plug. The drifting loop (the one that looks like it's working because the page number keeps going up) ran until the ceiling refused step 4000, with $199.95 already spent. Same kind of bug. One cost you less than a coffee; the other cost you essentially the whole budget. The difference is entirely whether your guard understands "no progress" or only understands "too much money."

That's the design lesson I didn't expect to learn from my own demo. A ceiling alone is a backstop, not a brake. It only fires when the damage is nearly maxed out, which is uncomfortably close to the spend-alert problem I started this post complaining about. The thing that actually saves you money is the progress check, because it catches the loop while it's cheap.

A floating-point footnote, because I'd want to know

One honest oddity in that output. The ceiling fired on step 4000 with the message saying projected $200.00 exceeds ceiling $200.00 — which reads like a contradiction, since 200 doesn't exceed 200. It's floating-point accumulation. After 3,999 additions of 0.05, the real stored total is 199.95000000001122, so the projected 200.00000000001123 genuinely is a hair over 200. The .2f formatting rounds it for display and hides the crime.

I'm leaving it in rather than papering over it, because it's a real thing that will bite you: if you accumulate money in floats over thousands of steps, your ceiling is fuzzy by a rounding error. For a $200 fuse, a fraction-of-a-cent slop is fine. If you need it exact, accumulate in integer cents or use decimal.Decimal. I mention it because pretending the demo was clean would be exactly the kind of too-perfect output that makes you distrust an article.
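For the exact version, here is a minimal sketch of the ceiling guard accumulating in integer cents. CentsBrake and completed_steps are hypothetical names for illustration; this is a variant I'm assuming would slot in where the float version sits, not the class from the listing above.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised before a step that would push the session past its ceiling."""


@dataclass
class CentsBrake:
    """Hypothetical integer-cents variant of the ceiling guard."""
    ceiling_cents: int      # hard cap, in whole cents
    spent_cents: int = 0    # settled spend, in whole cents

    def reserve(self, estimated_cents: int) -> int:
        projected = self.spent_cents + estimated_cents
        if projected > self.ceiling_cents:
            raise BudgetExceeded(
                f"refused step: projected ${projected / 100:.2f} "
                f"exceeds ceiling ${self.ceiling_cents / 100:.2f}"
            )
        return projected

    def settle(self, actual_cents: int) -> int:
        self.spent_cents += actual_cents
        return self.spent_cents


def completed_steps(ceiling_cents: int, step_cents: int) -> int:
    """Count how many steps run before reserve refuses one."""
    brake = CentsBrake(ceiling_cents=ceiling_cents)
    done = 0
    while True:
        try:
            brake.reserve(step_cents)
        except BudgetExceeded:
            return done
        brake.settle(step_cents)
        done += 1


print(completed_steps(20_000, 5))  # $200.00 ceiling, $0.05 steps
```

With integers there is no drift: the loop completes exactly 4,000 five-cent steps and the 4,001st is the one refused, at a projected $200.05. The float run above refused step 4000 instead, one step early, because of the accumulated rounding error.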

Where this brake stops working (the part I won't skip)

Tahosin, who wrote a sharp post on May 31 about adding a black-box logger to a Python agent and querying a $200 crash with DuckDB (dev.to/tahosin), described their own safety function as "a seatbelt, not a vault." I'm going to be just as blunt about mine, because a tool you trust past its limits is worse than no tool.

A budget brake limits your future spend. It does nothing about money already gone. If you call reserve after the expensive thing happened, you've already paid. The fuse only works in front of the spend, every time, with no path around it. One unguarded code path and the whole thing is theater.
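One way to shrink the number of unguarded paths is to make the guard the only doorway to paid work. A rough sketch, assuming a brake with the same reserve/settle shape as the one above: TinyBrake is a stand-in so the example runs on its own, and paid_step is a hypothetical helper name.

```python
from contextlib import contextmanager


class BudgetExceeded(Exception):
    """Raised before a step that would push the session past its ceiling."""


class TinyBrake:
    """Stand-in exposing only reserve/settle, like the brake in this post."""

    def __init__(self, ceiling_usd):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def reserve(self, estimated_usd):
        if self.spent_usd + estimated_usd > self.ceiling_usd:
            raise BudgetExceeded(f"refused at ${self.spent_usd:.2f}")

    def settle(self, actual_usd):
        self.spent_usd += actual_usd


@contextmanager
def paid_step(brake, estimated_usd):
    """Reserve on entry, settle on exit; the body never runs if reserve refuses."""
    brake.reserve(estimated_usd)            # raises before any paid work starts
    bill = {"actual_usd": estimated_usd}    # body may overwrite with the real cost
    try:
        yield bill
    finally:
        # Assumption: a step that raised was probably still billed, so book it.
        brake.settle(bill["actual_usd"])


brake = TinyBrake(ceiling_usd=0.50)
with paid_step(brake, 0.25) as bill:
    bill["actual_usd"] = 0.25  # ... paid fetch or LLM call would go here ...
print(f"spent so far: ${brake.spent_usd:.2f}")
```

If every paid call in the codebase lives inside a with paid_step(...) block, forgetting the reserve stops being possible at the call site. Settling in the finally clause is a design choice; whether a failed request costs money depends on your provider.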

It cannot tell a runaway loop from an expensive but legitimate run. If a real, correct job needs to spend $250 and your ceiling is $200, the brake stops it at $200 and looks like a false alarm. The ceiling encodes your appetite, not the job's correctness. You'll tune it, and tuning it wrong in either direction has a cost: too low and you kill good runs, too high and you're back to finding out on the invoice.

The stasis detector only catches loops that look stuck. A loop with a drifting cursor (my second demo) slides right past it, because from the outside the state keeps changing. Choosing the right state_key is the actual skill here, and it's job-specific. "Page number" is a bad key if your bug increments the page number. "Hash of the last response body" is better but not free.
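To make the response-hash idea concrete, here is a small sketch using hashlib from the standard library. state_key_from_body and first_stalled_step are hypothetical helpers, and the byte strings stand in for real response bodies. The stall counting mirrors note_progress: the first sighting of a state counts as zero repeats.

```python
import hashlib


def state_key_from_body(body: bytes) -> str:
    """Digest of the response body; identical bodies give identical keys."""
    return hashlib.sha256(body).hexdigest()


def first_stalled_step(bodies, max_stall=5):
    """Return the 1-based step where one body has repeated max_stall times, else None."""
    last_key = None
    repeats = 0
    for step, body in enumerate(bodies, start=1):
        key = state_key_from_body(body)
        if key == last_key:
            repeats += 1
        else:
            repeats = 0
            last_key = key
        if repeats >= max_stall:
            return step
    return None


# A cursor that walks off the end: three real pages, then the same empty payload.
real_pages = [b'{"items":[1]}', b'{"items":[2]}', b'{"items":[3]}']
past_the_end = [b'{"items":[]}'] * 10
print(first_stalled_step(real_pages + past_the_end))
```

The page number keeps climbing in that scenario, but the body hash stops changing, so the stall is visible. The caveat is that this only works when a dead page really does return identical bytes; a timestamp or request id in the body would change the hash every time, so you may need to hash a normalized subset of the response instead.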

And the cost numbers are a model. Five cents a step is illustrative. Your real per-step cost varies by proxy, by model, by token count, by retries-within-a-step. The brake is only as honest as the estimate you hand reserve. Garbage estimate in, false confidence out.

None of that makes it useless. It makes it a fuse, not a vault. A fuse that trips at thirty cents on a stuck loop is worth having, even knowing everything it can't do.

What I'd actually ship on Monday

Three moves, in order of effort.

Put a reserve call in front of every paid step, today, even with a rough cost estimate. A rough estimate that refuses is worth more than a precise alert that observes. This is the move that would have capped an incident like Carpenter's $47,000 bill at whatever ceiling the team had set, for example about $200 with the ceiling used in my demo.

Pick a state_key that genuinely advances when work advances, and feed it to note_progress. This is the cheap-catch guard — the one that stopped my loop at $0.30. Spend a minute making sure your key can't accidentally track the bug instead of the work.

Keep your dashboard and your alerts. They're the receipt, and the receipt is useful: for the post-mortem, for finding your money pit, for the trend line. Just stop asking the receipt to do the brake's job. Log it, query it later if you want (DuckDB over an event stream, like tahosin did, is a genuinely nice way to do the after-the-fact part). But put the fuse in front of the spend, where it can still say no.

Follow for the numbers from the next batch of production runs. And I'm genuinely curious: what's the worst surprise bill an agent or a scraper ever handed you, and what tripped first — an alert, a human, or nothing until the invoice? Tell me in the comments, I read every one.

Written with AI assistance; the code was run locally and the output above is the real, unedited program output. The cost figures are an illustrative model, clearly marked, not a measured billing ledger.