The Token Budget Pattern: How to Stop AI Agent Cost Surprises Before They Happen
Most AI agent teams look at their API bill at the end of the month and work backwards to figure out which tasks ran over budget. That's the wrong order.
The better approach: budget per task, before execution.
The Problem With Aggregate Billing
You're running 8 agents. Total spend this month: $180. Which task drove the spike? You don't know without digging. And by the time you're digging, you've already paid.
Silent cost accumulation is a first-class reliability problem. An agent that does the right thing but costs 10x your estimate isn't reliable — it's unpredictable.
The Token Budget Field
Add one field to your task state:
```json
{
  "task": "analyze_competitor_content",
  "max_tokens": 50000,
  "estimated_tokens": 12000,
  "cost_estimate_usd": 0.18,
  "status": "pending"
}
```
Before executing, the agent checks: will this task exceed max_tokens? If yes — write the estimate to outbox.json and stop. Flag it before running, not after.
What to Put in max_tokens
Start with 2x your p95 expected token count for that task type. After two weeks of logs, you'll have real data to calibrate against.
For reference:
- Simple data retrieval: 2,000–5,000 tokens
- Summarization tasks: 5,000–15,000 tokens
- Multi-step reasoning: 15,000–50,000 tokens
- Full document analysis: 50,000–200,000 tokens
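Calibrating against logs is mechanical once you have the run history. Here's a minimal sketch of the "2x p95" rule using the nearest-rank percentile method; `suggest_budget` and the sample token counts are illustrative, not from the article:

```python
import math

def suggest_budget(runs: list[int]) -> int:
    """Return a max_tokens budget of 2x the p95 of observed token counts."""
    ordered = sorted(runs)
    # Nearest-rank p95: the value at the 95th-percentile position.
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return 2 * ordered[idx]

# Ten logged runs for one task type (hypothetical numbers):
runs = [4100, 4800, 5200, 6100, 4500, 5900, 7400, 5100, 4900, 6800]
print(suggest_budget(runs))  # → 14800
```

With only ten samples, nearest-rank p95 is just the max, which is exactly the conservative behavior you want early on; the budget tightens naturally as logs accumulate.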
The Pre-Execution Cost Check Pattern
```python
def check_budget_before_run(task_spec: dict) -> bool:
    """Return True if safe to proceed, False if flagged."""
    max_tokens = task_spec.get('max_tokens', float('inf'))
    estimated_tokens = task_spec.get('estimated_tokens', 0)
    if estimated_tokens > max_tokens:
        flag_to_outbox({
            'type': 'budget_exceeded',
            'task': task_spec['task'],
            'estimated': estimated_tokens,
            'budget': max_tokens,
            'action_required': 'approve or adjust budget before proceeding'
        })
        return False
    return True
```
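The check calls `flag_to_outbox`, which the article references but doesn't show. One plausible shape, assuming `outbox.json` is an append-only JSON Lines file (the filename comes from the article; the format and timestamp field are assumptions):

```python
import json
from datetime import datetime, timezone

OUTBOX_PATH = "outbox.json"  # filename from the article; JSONL format is an assumption

def flag_to_outbox(event: dict) -> None:
    """Append one JSON event per line so a human (or supervisor) can review it."""
    stamped = {**event, "flagged_at": datetime.now(timezone.utc).isoformat()}
    with open(OUTBOX_PATH, "a") as f:
        f.write(json.dumps(stamped) + "\n")
```

Append-only keeps the write atomic enough for a single agent; if multiple agents share one outbox, you'd want per-agent files or a lock.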
This runs in milliseconds. The cost of checking is zero. The cost of not checking compounds.
Track Actual vs Estimated
After each run, log actual spend:
```json
{
  "task": "analyze_competitor_content",
  "estimated_tokens": 12000,
  "actual_tokens": 14300,
  "variance_pct": 19,
  "timestamp": "2026-03-08T22:00:00Z"
}
```
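Building that record is a one-liner of arithmetic. A sketch with field names mirroring the log entry above; `record_run` is a hypothetical helper, not from the article:

```python
from datetime import datetime, timezone

def record_run(task: str, estimated_tokens: int, actual_tokens: int) -> dict:
    """Build the actual-vs-estimated record after a run completes."""
    variance_pct = round((actual_tokens - estimated_tokens) / estimated_tokens * 100)
    return {
        "task": task,
        "estimated_tokens": estimated_tokens,
        "actual_tokens": actual_tokens,
        "variance_pct": variance_pct,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

entry = record_run("analyze_competitor_content", 12000, 14300)
print(entry["variance_pct"])  # → 19
```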
After 30 days of this, you'll know which task types consistently over-estimate vs under-estimate. That's real operational intelligence — not guesswork.
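Rolling those records into per-task-type numbers is a small aggregation. A sketch, assuming `records` stands in for whatever log store you append run entries to (the sample values are illustrative):

```python
from collections import defaultdict
from statistics import mean

def variance_by_task(records: list[dict]) -> dict[str, float]:
    """Mean variance_pct per task type; positive means you under-estimate."""
    by_task = defaultdict(list)
    for r in records:
        by_task[r["task"]].append(r["variance_pct"])
    return {task: mean(v) for task, v in by_task.items()}

records = [
    {"task": "summarize", "variance_pct": 12},
    {"task": "summarize", "variance_pct": 20},
    {"task": "retrieve", "variance_pct": -5},
]
print(variance_by_task(records))  # summarize averages +16, retrieve -5
```

Task types with consistently positive means need bigger budgets; consistently negative ones can be tightened.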
The 5-Minute Token Budget Audit
- List your top 5 tasks by frequency
- Check the last 10 runs for each — what's the p95 token count?
- Set `max_tokens` to 2x p95 for each
- Add the pre-execution check to your task loop
- Review variance weekly — tighten budgets as you accumulate data
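Wiring the check into a task loop can look like this. A self-contained sketch: the inlined check restates the budget comparison from earlier, and `run_task` plus the task list are hypothetical stand-ins for your agent's executor:

```python
def check_budget_before_run(task_spec: dict) -> bool:
    # Same comparison as the full pattern: estimate vs. budget.
    return task_spec.get("estimated_tokens", 0) <= task_spec.get("max_tokens", float("inf"))

def run_task(spec: dict) -> None:
    pass  # placeholder for your real execution path

def process(tasks: list[dict]) -> list[str]:
    """Execute only tasks within budget; over-budget tasks are skipped for review."""
    completed = []
    for spec in tasks:
        if not check_budget_before_run(spec):
            continue  # in the full pattern, this is where flag_to_outbox fires
        run_task(spec)
        completed.append(spec["task"])
    return completed

tasks = [
    {"task": "ok_task", "max_tokens": 50000, "estimated_tokens": 12000},
    {"task": "too_big", "max_tokens": 50000, "estimated_tokens": 90000},
]
print(process(tasks))  # → ['ok_task']
```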
Real Numbers
After implementing per-task token budgets across 5 agents:
- Identified 2 tasks that ran 8x their expected cost on edge cases
- Monthly bill dropped from $180 to $94 — not from cutting tasks, from catching runaway runs before they completed
- Zero surprise spikes in the 30 days since implementation
The key insight: most cost spikes aren't from malicious or broken behavior. They're from tasks hitting unexpected input sizes that nobody modeled. A budget check catches this every time.
The configs I use for per-task token budgeting across 5 production agents are in the Ask Patrick Library. Pre-built, battle-tested, ready to adapt.