AI agents are expensive to run when you let them operate without boundaries. But token cost isn't random — it's a design choice.
The token budget pattern gives every agent a hard cap: a maximum number of tokens per task, per session, or per day. When an agent approaches its limit, it summarizes, escalates, or stops. It doesn't just keep going.
Why This Matters
Without a token budget:
- A single runaway loop can burn 100x your expected cost
- Long-running tasks accumulate context until they're slow and expensive
- You discover the problem on your billing statement, not in your logs
The Three-Level Budget
Task budget: "This task should not exceed X input + Y output tokens."
Session budget: "This agent session runs for at most Z tokens total."
Daily budget: "This agent burns no more than N tokens per day. Write to alert.json if approaching limit."
Build all three into your SOUL.md. The daily budget is your safety net.
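To make the three levels concrete, here is a minimal Python sketch of a tracker that charges every model call against all three budgets at once. The class and method names (`Budget`, `TokenBudgets`, `record`, `start_task`) are hypothetical, not part of any particular agent framework, and the sketch assumes you can read input and output token counts from your model provider's response.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """One budget level: a hard cap and a running total."""
    limit: int
    used: int = 0

    def remaining(self) -> int:
        return max(self.limit - self.used, 0)


class TokenBudgets:
    """Tracks task, session, and daily token usage together (hypothetical helper)."""

    def __init__(self, task_limit: int, session_limit: int, daily_limit: int):
        self.levels = {
            "task": Budget(task_limit),
            "session": Budget(session_limit),
            "daily": Budget(daily_limit),
        }

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Charge one model call against all three levels."""
        total = input_tokens + output_tokens
        for budget in self.levels.values():
            budget.used += total

    def start_task(self) -> None:
        """Reset only the task level; session and daily totals carry over."""
        self.levels["task"].used = 0

    def exceeded(self) -> list[str]:
        """Names of the levels whose cap has been reached."""
        return [name for name, b in self.levels.items() if b.used >= b.limit]
```

Calling `record()` after every model call and `exceeded()` before the next one keeps the check in one place. Persisting the daily total across restarts is left out here; in practice it would need to live in a file or a database.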
SOUL.md Template
TOKEN BUDGET:
- Per task: 8,000 tokens (input + output)
- Per session: 40,000 tokens
- Daily limit: 200,000 tokens
- If within 20% of any limit: write alert to outbox.json, escalate to operator
- If limit reached: stop current task, write summary to current-task.json, halt
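The two rules at the end of the template can be sketched as a small decision function. This is an illustration in Python, not how any specific agent runtime reads SOUL.md. It assumes "within 20% of any limit" means that 20% or less of the budget remains, and the names `check_budget`, `check_all`, and `BudgetAction` are made up for the example.

```python
from enum import Enum


class BudgetAction(Enum):
    CONTINUE = "continue"
    ALERT = "alert"  # 20% or less of a budget remains: alert and escalate
    HALT = "halt"    # a limit is reached: stop, write summary, halt


# Limits copied from the template above.
LIMITS = {"task": 8_000, "session": 40_000, "daily": 200_000}
ALERT_FRACTION = 0.20  # assumption: "within 20%" refers to remaining budget


def check_budget(used: int, limit: int) -> BudgetAction:
    """Decide what to do for a single budget level."""
    if used >= limit:
        return BudgetAction.HALT
    if limit - used <= limit * ALERT_FRACTION:
        return BudgetAction.ALERT
    return BudgetAction.CONTINUE


def check_all(usage: dict[str, int]) -> BudgetAction:
    """Return the most severe action across task, session, and daily levels."""
    actions = [check_budget(usage[name], limit) for name, limit in LIMITS.items()]
    if BudgetAction.HALT in actions:
        return BudgetAction.HALT
    if BudgetAction.ALERT in actions:
        return BudgetAction.ALERT
    return BudgetAction.CONTINUE
```

For example, `check_all({"task": 1_000, "session": 33_000, "daily": 50_000})` returns `BudgetAction.ALERT`, because only 7,000 of the 40,000 session tokens remain.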
The Escalation Rule
A budget without an escalation rule is just a number. When your agent hits the limit, it should:
- Summarize what it completed
- Note what remains
- Write both to a handoff file
- Stop
That way the work isn't lost — it's recoverable. The agent didn't fail; it handed off cleanly.
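A rough sketch of the handoff step in Python follows. The JSON field names and the `write_handoff` function are assumptions for illustration; the only parts taken from the steps above are the content (what was completed, what remains) and the idea of writing both to a file before stopping.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(path: str, completed: list[str], remaining: list[str],
                  reason: str = "token budget reached") -> dict:
    """Write a handoff file so another agent or a human can resume the work."""
    handoff = {
        "status": "halted",
        "reason": reason,
        "completed": completed,  # what the agent finished
        "remaining": remaining,  # what is still to do
        "written_at": datetime.now(timezone.utc).isoformat(),
    }
    target = Path(path)
    # Write to a temporary file first, then swap it in, so a crash
    # mid-write does not leave a half-written handoff behind.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(handoff, indent=2), encoding="utf-8")
    tmp.replace(target)
    return handoff
```

The agent would call something like `write_handoff("current-task.json", completed=[...], remaining=[...])` as its last action and then exit, rather than trying to squeeze in one more step.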
What It Costs to Ignore This
We ran a loop agent without a session budget for two days. It processed its queue fine, but it kept accumulating context instead of flushing it. By hour 48, each cycle was using 3x the tokens it used in hour 1, for identical work.
Adding a context flush rule and a session budget cut per-cycle cost by 67%.
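A context flush rule can be as simple as the sketch below. It is not the rule we ran, just one plausible shape for it: when the accumulated context passes a threshold, replace it with a summary. The function names are hypothetical, `summarize` stands in for whatever summarization call your agent uses, and the four-characters-per-token estimate is a crude assumption that a real tokenizer should replace.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def flush_if_needed(
    context: list[str],
    max_context_tokens: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Collapse the context into a single summary once it grows too large."""
    total = sum(estimate_tokens(message) for message in context)
    if total <= max_context_tokens:
        return context  # still under the threshold, keep everything
    return [summarize(context)]  # flush: carry forward only the summary
```

Running this at the start of every cycle keeps context size bounded, so a cycle late in the run costs about the same as an early one. As a sanity check on the numbers above, a cycle that drops from 3x back to 1x the baseline is a reduction of about 67%.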
The Full Reliability Stack
Token budgets pair naturally with:
- Circuit breakers — stop on repeated failures
- Session budgets — stop after N steps or M seconds
- Dead letter queues — route budget-exceeded tasks to human review
Together they make your agent's resource consumption predictable. Predictable beats cheap.
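Here is a simplified Python sketch of how two of these pieces could fit around a task budget. Everything in it is illustrative: `CircuitBreaker`, `run_queue`, and the `run_task` callback are hypothetical names, `run_task` is assumed to return the tokens a task used and to raise on failure, and the dead letter queue is just a list. A real agent would usually stop a task mid-run when it hits its budget rather than checking afterwards.

```python
from typing import Callable


class CircuitBreaker:
    """Opens after a number of consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record(self, success: bool) -> None:
        # Any success resets the streak; each failure extends it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def open(self) -> bool:
        return self.consecutive_failures >= self.max_failures


def run_queue(
    tasks: list[str],
    run_task: Callable[[str], int],
    task_budget: int,
    breaker: CircuitBreaker,
    dead_letters: list[str],
) -> list[str]:
    """Run tasks until the queue is empty or the breaker opens."""
    done = []
    for task in tasks:
        if breaker.open:
            break  # repeated failures: stop instead of burning more tokens
        try:
            used = run_task(task)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        if used > task_budget:
            dead_letters.append(task)  # over budget: route to human review
        else:
            done.append(task)
    return done
```

Each mechanism answers a different question: the budget caps how much a task may spend, the breaker caps how many times the agent may fail in a row, and the dead letter queue decides where over-budget work goes next.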
All these patterns — token budgets, circuit breakers, dead letter queues — are in the Ask Patrick Library with working config templates. askpatrick.co