The Token Budget Pattern: How to Stop AI Agent Cost Surprises Before They Happen
Most AI agent teams look at their API bill at the end of the month and work backwards to figure out which tasks ran over budget. That's the wrong order.
The better approach: budget per task, before execution.
The Problem With Aggregate Billing
You're running 8 agents. Total spend this month: $180. Which task drove the spike? You don't know without digging. And by the time you're digging, you've already paid.
Silent cost accumulation is a first-class reliability problem. An agent that does the right thing but costs 10x your estimate isn't reliable — it's unpredictable.
The Token Budget Field
Add one field to your task state:
```json
{
  "task": "analyze_competitor_content",
  "max_tokens": 50000,
  "estimated_tokens": 12000,
  "cost_estimate_usd": 0.18,
  "status": "pending"
}
```
Before executing, the agent checks: will this task exceed max_tokens? If yes — write the estimate to outbox.json and stop. Flag it before running, not after.
What to Put in max_tokens
Start with 2x your p95 expected token count for that task type. After two weeks of logs, you'll have real data to calibrate against.
For reference:
- Simple data retrieval: 2,000–5,000 tokens
- Summarization tasks: 5,000–15,000 tokens
- Multi-step reasoning: 15,000–50,000 tokens
- Full document analysis: 50,000–200,000 tokens
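Calibrating against logs is mechanical once you have the run history. Here's a minimal sketch of the "2x p95" rule using the nearest-rank percentile method; `suggest_budget` and the sample token counts are illustrative, not from the article:

```python
import math

def suggest_budget(runs: list[int]) -> int:
    """Return a max_tokens budget of 2x the p95 of observed token counts."""
    ordered = sorted(runs)
    # Nearest-rank p95: the value at the 95th-percentile position.
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return 2 * ordered[idx]

# Ten logged runs for one task type (hypothetical numbers):
runs = [4100, 4800, 5200, 6100, 4500, 5900, 7400, 5100, 4900, 6800]
print(suggest_budget(runs))  # → 14800
```

With only ten samples, nearest-rank p95 is just the max, which is exactly the conservative behavior you want early on; the budget tightens naturally as logs accumulate.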
The Pre-Execution Cost Check Pattern
```python
def check_budget_before_run(task_spec: dict) -> bool:
    """Return True if safe to proceed, False if flagged."""
    max_tokens = task_spec.get('max_tokens', float('inf'))
    estimated_tokens = task_spec.get('estimated_tokens', 0)
    if estimated_tokens > max_tokens:
        flag_to_outbox({
            'type': 'budget_exceeded',
            'task': task_spec['task'],
            'estimated': estimated_tokens,
            'budget': max_tokens,
            'action_required': 'approve or adjust budget before proceeding'
        })
        return False
    return True
```
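The check calls `flag_to_outbox`, which the article references but doesn't show. One plausible shape, assuming `outbox.json` is an append-only JSON Lines file (the filename comes from the article; the format and timestamp field are assumptions):

```python
import json
from datetime import datetime, timezone

OUTBOX_PATH = "outbox.json"  # filename from the article; JSONL format is an assumption

def flag_to_outbox(event: dict) -> None:
    """Append one JSON event per line so a human (or supervisor) can review it."""
    stamped = {**event, "flagged_at": datetime.now(timezone.utc).isoformat()}
    with open(OUTBOX_PATH, "a") as f:
        f.write(json.dumps(stamped) + "\n")
```

Append-only keeps the write atomic enough for a single agent; if multiple agents share one outbox, you'd want per-agent files or a lock.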
This runs in milliseconds. The cost of checking is zero. The cost of not checking compounds.
Track Actual vs Estimated
After each run, log actual spend:
```json
{
  "task": "analyze_competitor_content",
  "estimated_tokens": 12000,
  "actual_tokens": 14300,
  "variance_pct": 19,
  "timestamp": "2026-03-08T22:00:00Z"
}
```
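Building that record is a one-liner of arithmetic. A sketch with field names mirroring the log entry above; `record_run` is a hypothetical helper, not from the article:

```python
from datetime import datetime, timezone

def record_run(task: str, estimated_tokens: int, actual_tokens: int) -> dict:
    """Build the actual-vs-estimated record after a run completes."""
    variance_pct = round((actual_tokens - estimated_tokens) / estimated_tokens * 100)
    return {
        "task": task,
        "estimated_tokens": estimated_tokens,
        "actual_tokens": actual_tokens,
        "variance_pct": variance_pct,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

entry = record_run("analyze_competitor_content", 12000, 14300)
print(entry["variance_pct"])  # → 19
```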
After 30 days of this, you'll know which task types consistently over-estimate vs under-estimate. That's real operational intelligence — not guesswork.
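Rolling those records into per-task-type numbers is a small aggregation. A sketch, assuming `records` stands in for whatever log store you append run entries to (the sample values are illustrative):

```python
from collections import defaultdict
from statistics import mean

def variance_by_task(records: list[dict]) -> dict[str, float]:
    """Mean variance_pct per task type; positive means you under-estimate."""
    by_task = defaultdict(list)
    for r in records:
        by_task[r["task"]].append(r["variance_pct"])
    return {task: mean(v) for task, v in by_task.items()}

records = [
    {"task": "summarize", "variance_pct": 12},
    {"task": "summarize", "variance_pct": 20},
    {"task": "retrieve", "variance_pct": -5},
]
print(variance_by_task(records))  # summarize averages +16, retrieve -5
```

Task types with consistently positive means need bigger budgets; consistently negative ones can be tightened.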
The 5-Minute Token Budget Audit
- List your top 5 tasks by frequency
- Check the last 10 runs for each — what's the p95 token count?
- Set `max_tokens` to 2x p95 for each
- Add the pre-execution check to your task loop
- Review variance weekly — tighten budgets as you accumulate data
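Wiring the check into a task loop can look like this. A self-contained sketch: the inlined check restates the budget comparison from earlier, and `run_task` plus the task list are hypothetical stand-ins for your agent's executor:

```python
def check_budget_before_run(task_spec: dict) -> bool:
    # Same comparison as the full pattern: estimate vs. budget.
    return task_spec.get("estimated_tokens", 0) <= task_spec.get("max_tokens", float("inf"))

def run_task(spec: dict) -> None:
    pass  # placeholder for your real execution path

def process(tasks: list[dict]) -> list[str]:
    """Execute only tasks within budget; over-budget tasks are skipped for review."""
    completed = []
    for spec in tasks:
        if not check_budget_before_run(spec):
            continue  # in the full pattern, this is where flag_to_outbox fires
        run_task(spec)
        completed.append(spec["task"])
    return completed

tasks = [
    {"task": "ok_task", "max_tokens": 50000, "estimated_tokens": 12000},
    {"task": "too_big", "max_tokens": 50000, "estimated_tokens": 90000},
]
print(process(tasks))  # → ['ok_task']
```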
Real Numbers
After implementing per-task token budgets across 5 agents:
- Identified 2 tasks that ran 8x their expected cost on edge cases
- Monthly bill dropped from $180 to $94 — not from cutting tasks, from catching runaway runs before they completed
- Zero surprise spikes in the 30 days since implementation
The key insight: most cost spikes aren't from malicious or broken behavior. They're from tasks hitting unexpected input sizes that nobody modeled. A budget check catches this every time.
The configs I use for per-task token budgeting across 5 production agents are in the Ask Patrick Library. Pre-built, battle-tested, ready to adapt.