DEV Community

Albert Mavashev
Albert Mavashev

Posted on • Originally published at runcycles.io

Your AI Agent Budget Check Has a Race Condition

When I first started putting budget limits around agent workflows, I thought the solution would be simple.

Track the spend.

Check what is left.

Stop the next call if the budget is gone.

That works in a demo. It even works in light testing.

Then you run the same workflow with concurrency, retries, or a restart in the middle, and the whole thing gets shaky.

The problem is not the math.

The problem is where the decision gets made.

The naive version

A lot of first implementations look roughly like this:

def call_model(prompt: str, estimated_cost: int) -> str:
    remaining = get_remaining_budget()

    if remaining < estimated_cost:
        raise RuntimeError("budget exceeded")

    result = llm_call(prompt)

    actual_cost = calculate_cost(result)
    record_spend(actual_cost)

    return result
Enter fullscreen mode Exit fullscreen mode

At first glance, this seems fine.

  • Check the remaining budget
  • Make the call
  • Record the spend

For a single worker, single process, no retries, no failures, it mostly works.

Production is not that environment.

Where it breaks

1. Concurrency

Say you have $5 left.

Now 10 workers all check the budget at about the same time.

They all read the same value.

They all think they have room.

They all proceed.

You did not have one bug. You had ten correct reads and one broken design.

That is a classic time-of-check vs time-of-use problem.

2. Retries

Now add retry logic.

Maybe the model call times out.

Maybe the network flakes.

Maybe your framework retries automatically and your application retries too.

Did the first attempt spend money?

Did the second one?

Did both get recorded?

Did neither?

If your budget tracking is tied to local control flow, retries turn accounting into guesswork.

3. Restarts and crashes

If the process dies after the model call but before record_spend(), your state is wrong.

The money is gone.

Your counter says it is not.

If you keep the budget in memory, a restart makes it even worse. The counter resets. The spend does not.

The real issue

A budget check inside application code is not an authority.

It is a hint.

It is only as correct as the current process, the current thread, and the current execution path. Once multiple workers share the same budget, you need the decision to happen in one place, atomically.

That means:

  • check the budget
  • reserve the amount
  • make the call
  • reconcile the actual cost

Not:

  • read the budget
  • hope nothing else changes
  • make the call
  • update later

Those are not the same thing.

A better pattern: reserve, execute, commit

The simplest durable shape I found was this:

reservation = reserve_budget(
    scope="tenant/acme/workflow/summarizer",
    amount=estimated_cost,
    idempotency_key=run_step_id,
)

try:
    result = llm_call(prompt)
    actual_cost = calculate_cost(result)

    commit_budget(
        reservation_id=reservation.id,
        amount=actual_cost,
    )
except Exception:
    release_budget(reservation_id=reservation.id)
    raise
Enter fullscreen mode Exit fullscreen mode

That changes the semantics in a useful way.

Before the model call happens, the budget is already spoken for.

A concurrent worker cannot grab the same dollars.

A retry can reuse the same idempotency key.

A failure can release what was reserved but not spent.

The important part is not the API shape.

The important part is that the reservation is atomic.

Why this belongs outside the agent

I also learned that once agents start sharing budgets across tenants, workflows, runs, and tools, this logic stops being “just a wrapper.”

Now you need:

  • atomic reservations
  • idempotency
  • retry safety
  • shared scope rules
  • audit history
  • behavior that is consistent across runtimes

At that point, budget control starts to look less like a helper function and more like infrastructure.

That was the point where I stopped treating it as app code and pulled it into its own service.

One practical rule

If your budget check looks like:

remaining = read_balance()
if remaining >= estimated_cost:
    do_the_thing()
    write_new_balance()
Enter fullscreen mode Exit fullscreen mode

you do not have enforcement yet.

You have a race condition with good intentions.

Closing thought

Most agent failures are not exotic.

They come from very ordinary bugs:

stale reads, duplicate retries, counters that live in the wrong place, and side effects that happen before anyone realizes the run has gone off the rails.

A budget limit only matters if it can say no before the next step happens.

Everything else is reporting.


I ran into this while building Cycles, an open-source budget authority for autonomous agents. What looked like a simple spend check turned into a distributed systems problem: concurrency, retries, idempotency, and state that had to stay correct under failure.

That was the real lesson. Once multiple workers can spend from the same pool, the budget check has to be atomic or it is not real.

If you want to see how I implemented this in Cycles, more at https://runcycles.io

Top comments (2)

Collapse
 
acytryn profile image
Andre Cytryn

the TOCTOU framing is exactly right. and it's not just a concurrency thing. even with a single worker, a crash after the LLM call but before record_spend() means you've spent money with no record of it. the double-entry bookkeeping analogy actually maps well here: reserve creates a pending debit, commit settles it, release voids it. that's basically what payment rails have been doing for decades.

one thing I've run into: estimating cost upfront is hard when you don't know the output length. do you over-reserve and release the delta, or try to predict more accurately? curious how Cycles handles that.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.