The Token Budget Pattern: How to Stop AI Agent Overspending Before It Starts

#aiagents #devops #productivity #programming

AI agents are expensive to run when you let them operate without boundaries. But token cost isn't random — it's a design choice.

The token budget pattern gives every agent a hard cap: a maximum number of tokens per task, per session, or per day. When an agent approaches its limit, it summarizes, escalates, or stops. It doesn't just keep going.

Why This Matters

Without a token budget:

A single runaway loop can burn 100x your expected cost
Long-running tasks accumulate context until they're slow and expensive
You discover the problem on your billing statement, not in your logs

The Three-Level Budget

Task budget: "This task should not exceed X input + Y output tokens."
Session budget: "This agent session runs for at most Z tokens total."
Daily budget: "This agent burns no more than N tokens per day. Write an alert to outbox.json when approaching the limit."

Build all three into your SOUL.md. The daily budget is your safety net.
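If you also want to track the three levels in code, a minimal sketch could look like the following. The class and method names (Budget, TokenBudgets, charge, start_task, exhausted) are hypothetical, and the numbers reuse the template values from this post; treat it as one possible shape, not a reference implementation.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """One budget level: a hard cap and a running total."""
    limit: int
    used: int = 0

    def remaining(self) -> int:
        return max(self.limit - self.used, 0)


@dataclass
class TokenBudgets:
    """Task, session, and daily budgets that are charged together."""
    task: Budget
    session: Budget
    daily: Budget

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        # Every model call counts against all three levels at once.
        total = input_tokens + output_tokens
        for level in (self.task, self.session, self.daily):
            level.used += total

    def start_task(self) -> None:
        # A new task resets only the task-level counter.
        self.task.used = 0

    def exhausted(self) -> list[str]:
        # Names of the levels that have reached their cap.
        levels = {"task": self.task, "session": self.session, "daily": self.daily}
        return [name for name, b in levels.items() if b.used >= b.limit]


budgets = TokenBudgets(Budget(8_000), Budget(40_000), Budget(200_000))
budgets.charge(input_tokens=5_000, output_tokens=3_500)
print(budgets.exhausted())
```

The point of charging all three counters in one call is that no single level can be bypassed: a task that stays under its own cap still moves the session and daily totals.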

SOUL.md Template

TOKEN BUDGET:
- Per task: 8,000 tokens (input + output)
- Per session: 40,000 tokens
- Daily limit: 200,000 tokens
- If within 20% of any limit (usage at 80% or more): write alert to outbox.json, escalate to operator
- If limit reached: stop current task, write summary to current-task.json, halt
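The two "If" rules in the template can be sketched as a small check. This assumes "within 20% of any limit" means usage at or above 80% of the cap, and that outbox.json holds a JSON list of alert records; both are assumptions for illustration, as are the function names.

```python
import json
from pathlib import Path

WARN_PERCENT = 80  # assumption: "within 20% of a limit" means 80% or more used


def budget_status(used: int, limit: int) -> str:
    """Return "ok", "warn", or "halt" for one budget level."""
    if used >= limit:
        return "halt"
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    if used * 100 >= limit * WARN_PERCENT:
        return "warn"
    return "ok"


def write_alert(outbox: Path, level: str, used: int, limit: int) -> None:
    """Append one alert record to the outbox file (assumed to be a JSON list)."""
    alerts = json.loads(outbox.read_text()) if outbox.exists() else []
    alerts.append({"level": level, "used": used, "limit": limit})
    outbox.write_text(json.dumps(alerts, indent=2))
```

A caller would run budget_status for each level after every model call, call write_alert on "warn", and stop the task on "halt".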

The Escalation Rule

A budget without an escalation rule is just a number. When your agent hits the limit, it should:

Summarize what it completed
Note what remains
Write both to a handoff file (current-task.json in the template above)
Stop

That way the work isn't lost — it's recoverable. The agent didn't fail; it handed off cleanly.
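The four escalation steps can be sketched as below. This is a simplified illustration: it assumes each step's token cost is known up front, and the "summary" is just a list of step names, where a real agent would more likely write a model-generated summary. The function names and the JSON fields are hypothetical.

```python
import json
from pathlib import Path


def write_handoff(path: Path, completed: list[str], remaining: list[str], reason: str) -> dict:
    """Record finished and unfinished work so a later run can resume."""
    handoff = {
        "status": "halted",
        "reason": reason,
        "completed": completed,
        "remaining": remaining,
    }
    path.write_text(json.dumps(handoff, indent=2))
    return handoff


def run_with_budget(steps: list[tuple[str, int]], limit: int, handoff_path: Path) -> dict:
    """Run (name, token_cost) steps in order; hand off before exceeding the limit."""
    used, done = 0, []
    for i, (name, cost) in enumerate(steps):
        if used + cost > limit:
            # Summarize, note what remains, write the handoff file, stop.
            remaining = [n for n, _ in steps[i:]]
            return write_handoff(handoff_path, done, remaining, "task budget reached")
        used += cost
        done.append(name)
    return {"status": "done", "completed": done, "remaining": []}
```

Checking the budget before each step, rather than after, is what keeps the last completed step intact in the handoff.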

What It Costs to Ignore This

We ran a loop agent without a session budget for two days. It processed a queue fine, but it kept accumulating context instead of flushing it. By hour 48, each cycle was using 3x the tokens of hour 1. The work was identical.

Adding a context flush rule and a session budget cut per-cycle cost by 67%.
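For reference, a drop from 3x back to the hour-1 level would be a reduction of 1 - 1/3, or about 67%, which matches the figure above. A context flush rule can be sketched roughly like this; the word-count token estimate, the summarize callback, and the thresholds are stand-ins chosen for illustration, not the setup used in the run described here.

```python
from typing import Callable


def estimate_tokens(messages: list[str]) -> int:
    # Crude stand-in: count whitespace-separated words instead of real tokens.
    return sum(len(m.split()) for m in messages)


def flush_context(messages: list[str], keep_last: int,
                  summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the most recent messages with a single summary entry."""
    cut = len(messages) - keep_last
    if cut <= 0:
        return messages
    return [summarize(messages[:cut])] + messages[cut:]


def run_cycles(n_cycles: int, max_context_tokens: int, keep_last: int = 2) -> list[int]:
    """Simulate a loop agent and return the context size after each cycle."""
    context: list[str] = []
    sizes = []
    for i in range(n_cycles):
        context.append(f"result of cycle {i}")
        if estimate_tokens(context) > max_context_tokens:
            context = flush_context(
                context, keep_last,
                lambda old: f"summary of {len(old)} earlier entries",
            )
        sizes.append(estimate_tokens(context))
    return sizes
```

With the flush in place, context size oscillates under the threshold instead of growing with every cycle, so cycle 48 costs about the same as cycle 1.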

The Full Reliability Stack

Token budgets pair naturally with:

Circuit breakers: stop on repeated failures
Step and time limits: stop a session after N steps or M seconds
Dead letter queues: route budget-exceeded tasks to human review

Together they make your agent's resource consumption predictable. Predictable beats cheap.
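As a rough sketch of how these pieces might fit together, the example below combines a consecutive-failure circuit breaker with a dead letter queue for tasks that exceed a token limit. All names are hypothetical, and the task tuples (id, token cost, succeeded) stand in for real agent runs.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    """Opens after max_failures consecutive failures."""
    max_failures: int = 3
    failures: int = 0

    def record(self, success: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures = 0 if success else self.failures + 1

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures


@dataclass
class DeadLetterQueue:
    """Holds tasks that need human review."""
    items: list = field(default_factory=list)

    def route(self, task_id: str, reason: str) -> None:
        self.items.append({"task": task_id, "reason": reason})


def process(tasks, token_limit: int, breaker: CircuitBreaker, dlq: DeadLetterQueue) -> list[str]:
    """tasks: iterable of (task_id, token_cost, succeeded) tuples."""
    completed = []
    for task_id, cost, succeeded in tasks:
        if breaker.open:
            dlq.route(task_id, "circuit open")
            continue
        if cost > token_limit:
            dlq.route(task_id, "budget exceeded")
            continue
        breaker.record(succeeded)
        if succeeded:
            completed.append(task_id)
    return completed
```

Nothing is silently dropped in this arrangement: every task either completes, fails and counts toward the breaker, or lands in the queue with a reason attached.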

All these patterns — token budgets, circuit breakers, dead letter queues — are in the Ask Patrick Library with working config templates. askpatrick.co