The incident
Short story: my agent got stuck retrying a failing loop that called several paid models and tools.
The agent's stack:
GPT-4o + Stable Diffusion + TradingView Charts + Kling.
Hundreds of iterations. $153 in under 30 minutes.
My enforcement layer did not work
Rate limiters — control velocity, not total cost.
Provider caps — per-provider, not cross-provider.
Observability — tells me after. I need before.
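To make the first point concrete, here is a small arithmetic sketch. The numbers are illustrative assumptions, not figures from my incident: a loop that stays under a rate limit still accumulates cost linearly with time, while a hard cap bounds the number of calls no matter how long the loop runs.

```python
def spend_at_rate_limit(calls_per_minute: int, cents_per_call: int, minutes: int) -> int:
    """Total spend (in cents) for a loop that never exceeds the rate limit."""
    return calls_per_minute * cents_per_call * minutes


def calls_before_cap(cap_cents: int, cents_per_call: int) -> int:
    """Number of calls a hard spending cap lets through before blocking."""
    return cap_cents // cents_per_call


# Hypothetical numbers: 10 calls/minute at 50 cents each, for 30 minutes.
print(spend_at_rate_limit(10, 50, 30))  # 15000 cents, and still growing
print(calls_before_cap(2000, 50))       # 40 calls, then the loop is blocked
```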
The pattern that worked for me: reserve-run-commit
My budget control in three steps:
- Reserve the estimated cost before the call (in instrumented code)
- Execute the action (LLM call, tool call)
- Settle: on success, commit the actual cost and release the unused portion; on failure, release the full reservation
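The steps above can be sketched with a tiny in-memory ledger. This is a minimal illustration, not the Cycles implementation: `Budget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, amounts are arbitrary integer units, and the action is assumed to report its own actual cost.

```python
class BudgetExceeded(Exception):
    """Raised when a reservation would push spending past the limit."""


class Budget:
    """Hypothetical in-memory ledger; amounts are arbitrary integer units."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.committed = 0  # actual cost of finished actions
        self.reserved = 0   # estimated cost of actions still in flight

    def remaining(self) -> int:
        return self.limit - self.committed - self.reserved

    def reserve(self, estimate: int) -> int:
        if estimate > self.remaining():
            raise BudgetExceeded(f"need {estimate}, have {self.remaining()}")
        self.reserved += estimate
        return estimate

    def commit(self, reservation: int, actual: int) -> None:
        # Charge the actual cost; the rest of the reservation is freed.
        self.reserved -= reservation
        self.committed += actual

    def release(self, reservation: int) -> None:
        # Nothing was spent, so the whole reservation is freed.
        self.reserved -= reservation


def run_with_budget(budget: Budget, estimate: int, action):
    """Run action(), which is assumed to return (result, actual_cost)."""
    reservation = budget.reserve(estimate)  # 1. reserve before the call
    try:
        result, actual = action()           # 2. execute
    except Exception:
        budget.release(reservation)         # 3b. failure: release everything
        raise
    budget.commit(reservation, actual)      # 3a. success: commit actual cost
    return result
```

With a limit of 100, an action estimated at 60 that actually costs 40 leaves 60 available afterwards, and a failed action leaves the balance unchanged.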
The critical piece for retries: each reservation has an idempotency key. If the agent retries the exact same action, the second reservation is a no-op. Budget only gets locked once per logical action, not once per attempt.
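A rough sketch of the idempotency idea, under stated assumptions: the key here is derived by hashing the action kind, name, and payload, which is my illustrative choice and not necessarily how Cycles builds its keys. `action_key` and `IdempotentBudget` are hypothetical names.

```python
import hashlib
import json


def action_key(kind: str, name: str, payload: dict) -> str:
    """Derive a stable key for one logical action (hypothetical scheme)."""
    raw = json.dumps([kind, name, payload], sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotentBudget:
    """Toy ledger that locks budget once per key, not once per attempt."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.reservations: dict[str, int] = {}  # idempotency key -> amount

    def locked(self) -> int:
        return sum(self.reservations.values())

    def reserve(self, key: str, estimate: int) -> bool:
        """Return True for a new reservation, False if the key already has one."""
        if key in self.reservations:
            return False  # retry of the same action: no-op
        if self.locked() + estimate > self.limit:
            raise RuntimeError("budget exceeded")
        self.reservations[key] = estimate
        return True


budget = IdempotentBudget(limit=12000)
key = action_key("llm.completion", "openai:gpt-4o", {"prompt": "summarize"})
budget.reserve(key, 5000)  # first attempt locks 5000
budget.reserve(key, 5000)  # retry with the same key locks nothing more
print(budget.locked())     # 5000
```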
The critical piece for concurrent agents: the reservation is atomic. Two agents can't both check the balance, both see enough, and both proceed. One gets through, the other gets blocked.
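A minimal single-process sketch of that atomicity, using a lock so the balance check and the update happen as one step. `AtomicBudget` and `race` are hypothetical names; a shared service would presumably need a database transaction or an equivalent primitive instead of `threading.Lock`.

```python
import threading


class AtomicBudget:
    """Toy budget whose check-and-reserve step cannot interleave."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, estimate: int) -> bool:
        # Check and update under one lock: no agent can observe a stale balance.
        with self._lock:
            if self.reserved + estimate > self.limit:
                return False
            self.reserved += estimate
            return True


def race(budget: AtomicBudget, estimate: int, agents: int = 2) -> list[bool]:
    """Start several agents at once and collect who got a reservation."""
    barrier = threading.Barrier(agents)
    outcomes: list[bool] = []

    def agent() -> None:
        barrier.wait()  # line everyone up so the reservations collide
        outcomes.append(budget.try_reserve(estimate))

    threads = [threading.Thread(target=agent) for _ in range(agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


# Two agents each want 60 out of 100: exactly one is let through.
print(sorted(race(AtomicBudget(limit=100), estimate=60)))  # [False, True]
```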
What it looks like in code: the @cycles decorator
import openai
from runcycles import cycles

# Reserve the estimate before the call; commit or release afterwards.
@cycles(estimate=5000, action_kind="llm.completion", action_name="openai:gpt-4o")
def ask(prompt: str) -> str:
    return openai.chat.completions.create(...)
The decorator handles the reserve-commit lifecycle. If the budget is gone before the call, it raises a BudgetExceededError and the call never fires. Nothing is billed.
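For readers curious how a decorator can own that lifecycle, here is a toy stand-in. It is not the `runcycles` implementation: `budgeted`, the `ledger` dict, and the locally defined `BudgetExceededError` are all assumptions for illustration, and the wrapped function is assumed to return `(result, actual_cost)`.

```python
import functools


class BudgetExceededError(Exception):
    """Local stand-in for the error a budget decorator might raise."""


def budgeted(ledger: dict, estimate: int):
    """Toy decorator: reserve before the call, settle after it."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if ledger["remaining"] < estimate:
                # Budget is gone: fail before the wrapped call ever runs.
                raise BudgetExceededError(f"need {estimate}, have {ledger['remaining']}")
            ledger["remaining"] -= estimate           # reserve
            try:
                result, actual = func(*args, **kwargs)
            except Exception:
                ledger["remaining"] += estimate       # release the full reservation
                raise
            ledger["remaining"] += estimate - actual  # keep actual, release the rest
            return result

        return wrapper

    return decorator


ledger = {"remaining": 6000}


@budgeted(ledger, estimate=5000)
def ask(prompt: str):
    # A real function would call a provider; this one fakes a cost of 1200.
    return prompt.upper(), 1200


print(ask("hello"))         # HELLO
print(ledger["remaining"])  # 4800
```

A second call to `ask` would now raise `BudgetExceededError` without running the function body, because 4800 is below the 5000 estimate.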
Works with any LLM provider: OpenAI, Anthropic, Bedrock, or anything else.
The demo
I built a runaway agent demo that shows the failure mode in under 60 seconds — same agent, same bug, two outcomes. No API key needed to run it.
demo: https://github.com/runcycles/cycles-runaway-demo
What I built (Cycles Protocol + Reference implementation)
Full docs: https://runcycles.io
Self-hosted, multi-language SDKs, Apache 2.0 licensed.
What's your approach to agent cost control? Still rolling your own counters and limiters, using nothing, or something else?