The Session Budget Pattern: How to Stop AI Agents From Running Forever

#aiagents #devops #productivity #programming

Most AI agent failures aren't crashes. They're drifts.

The agent keeps running. It keeps making decisions. But somewhere around step 47 it stopped being useful and started being expensive, or dangerous.

The fix is a session budget.

What Is a Session Budget?

A session budget sets hard limits on what an agent can consume in a single run:

Max steps — e.g., 50 actions before it must stop
Max tokens — e.g., 100K tokens processed
Max wall-clock time — e.g., 10 minutes per session

When any limit is hit, the agent doesn't crash. It writes its current state to a handoff file and exits cleanly.
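As a minimal sketch, the three limits above can be tracked by a small object that the agent loop checks after every step. The class name `SessionBudget`, its methods, and the injectable `clock` are hypothetical, not part of any particular agent framework. The reason strings reuse the config key names from the implementation below so they can go straight into a handoff file.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SessionBudget:
    """Hypothetical helper: tracks consumption against hard per-session limits."""
    max_steps: int = 50
    max_tokens: int = 100_000
    max_runtime_seconds: float = 10 * 60
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    steps: int = field(default=0, init=False)
    tokens: int = field(default=0, init=False)
    started_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def record_step(self, tokens_used: int) -> None:
        """Count one agent action and the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used

    def exhausted(self) -> Optional[str]:
        """Return the name of the first limit reached, or None if still within budget."""
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tokens >= self.max_tokens:
            return "max_tokens"
        if self.clock() - self.started_at >= self.max_runtime_seconds:
            return "max_runtime_minutes"
        return None
```

The agent would call `record_step` after each action and stop as soon as `exhausted()` returns a reason. Wall-clock time uses a monotonic clock so system clock adjustments can't extend or cut short a session.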

Why Unbounded Agents Fail

An agent without a session budget has no natural stopping point. This causes:

Token accumulation — context window fills with residue from earlier steps, degrading decision quality
Cost amplification — a stuck loop runs until you notice (or your API bill does)
Silent drift — the agent's behavior shifts as it accumulates context, but it keeps reporting success

The Implementation

In your SOUL.md or agent config:

Session limits:
- max_steps: 50
- max_tokens: 100000
- max_runtime_minutes: 10

On limit hit:
- Write current state to handoff.json
- Log: "Session budget exhausted at step [N]. Reason: [limit type]. Resumable: true."
- Exit cleanly
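One way to wire the "On limit hit" steps into an agent loop, sketched in Python. It assumes each task is a string and that a hypothetical `do_task` callable performs one step and returns the number of tokens it used. The handoff written here is a reduced subset of the fields listed below, and the function returns the stop reason rather than exiting, so the caller decides how the process ends.

```python
import json
import logging
import time
from pathlib import Path
from typing import Callable, List, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

# Mirrors the "Session limits" block above (the article's example values).
LIMITS = {"max_steps": 50, "max_tokens": 100_000, "max_runtime_minutes": 10}


def run_session(
    tasks: List[str],
    do_task: Callable[[str], int],
    handoff_path: Path,
    session_id: str,
    limits: dict = LIMITS,
) -> Optional[str]:
    """Work through tasks until done or a limit is hit.

    Returns None if every task finished, else the name of the limit reached.
    """
    start = time.monotonic()
    steps = tokens = 0
    pending = list(tasks)
    while pending:
        reason = None
        if steps >= limits["max_steps"]:
            reason = "max_steps"
        elif tokens >= limits["max_tokens"]:
            reason = "max_tokens"
        elif time.monotonic() - start >= limits["max_runtime_minutes"] * 60:
            reason = "max_runtime_minutes"

        if reason is not None:
            # 1. Write current state to handoff.json (subset of the full schema).
            handoff_path.write_text(json.dumps({
                "session_id": session_id,
                "stopped_at_step": steps,
                "reason": reason,
                "pending_tasks": pending,
            }, indent=2))
            # 2. Log in the format the config specifies.
            log.info("Session budget exhausted at step %d. Reason: %s. Resumable: true.",
                     steps, reason)
            # 3. Return instead of raising: hitting the budget is an expected stop.
            return reason

        tokens += do_task(pending[0])  # a task stays pending if do_task raises
        pending.pop(0)
        steps += 1
    return None
```

A caller would then exit with status 0 in both cases, keeping non-zero exit codes for genuine crashes so monitoring can tell the two apart.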

The handoff file should contain:

{
  "session_id": "abc-123",
  "stopped_at_step": 50,
  "reason": "max_steps",
  "pending_tasks": [...],
  "context_summary": "...",
  "resume_instructions": "..."
}
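Because the handoff file is the only thing the next run sees, it is worth writing it atomically so a crash mid-write can't leave truncated JSON behind. A sketch, assuming a local filesystem: write to a temporary file in the same directory, then `os.replace` it over `handoff.json`. The `Handoff` dataclass and both helper names are illustrative, not a fixed schema.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Handoff:
    """Fields from the handoff.json example above (illustrative container)."""
    session_id: str
    stopped_at_step: int
    reason: str  # e.g. "max_steps"; other reason names are assumptions
    pending_tasks: list = field(default_factory=list)
    context_summary: str = ""
    resume_instructions: str = ""


def write_handoff(path: Path, handoff: Handoff) -> None:
    """Write atomically: readers see either the old file or the complete new one."""
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(handoff), f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave stray temp files behind
        raise


def read_handoff(path: Path) -> Optional[Handoff]:
    """Return the saved handoff, or None if there is nothing to resume."""
    if not path.exists():
        return None
    return Handoff(**json.loads(path.read_text()))
```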

Session Budget vs Circuit Breaker

These are complementary, not competing:

Circuit breaker triggers on repeated failures (e.g., 3 consecutive errors)
Session budget triggers on consumption (steps, tokens, time)

You want both. A circuit breaker catches error loops. A session budget catches runaway success loops: the agent is technically working, but consuming far more than intended.
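A sketch of how the two guards can sit side by side, both checked after every step. Only the step limit is modeled here to keep it short; `CircuitBreaker`, `StepBudget`, `stop_reason`, and the `"circuit_breaker"` reason string are hypothetical names, and the threshold of 3 matches the example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CircuitBreaker:
    """Trips after N consecutive failures; any success resets the count."""
    max_consecutive_failures: int = 3
    consecutive_failures: int = field(default=0, init=False)

    def record(self, ok: bool) -> None:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures


@dataclass
class StepBudget:
    """Consumption limit: counts every step, successful or not."""
    max_steps: int = 50
    steps: int = field(default=0, init=False)

    def record(self) -> None:
        self.steps += 1

    @property
    def exhausted(self) -> bool:
        return self.steps >= self.max_steps


def stop_reason(breaker: CircuitBreaker, budget: StepBudget) -> Optional[str]:
    """Either guard can end the session; None means keep going."""
    if breaker.tripped:
        return "circuit_breaker"  # error loop
    if budget.exhausted:
        return "max_steps"  # runaway "success" loop
    return None
```

The key difference shows up in what resets: a single success clears the breaker, but nothing clears the budget, which is why only the budget catches an agent that succeeds its way through far too many steps.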

The "Runaway into Recoverable" Shift

Without a session budget: the agent runs 200 steps, spends $12 in API calls, and produces degraded output. You don't know why.

With a session budget: the agent runs 50 steps and writes handoff.json, and the next scheduled run picks up where it left off. Each session's consumption is capped, so total cost is predictable. Output quality is consistent.
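The "next scheduled run picks up where it left off" step might look like this, assuming the handoff schema above and a scheduler (cron or similar) that starts the agent with a list of new tasks. `tasks_for_next_run` is a hypothetical helper: pending work from the last session runs first, and the consumed handoff is renamed rather than deleted so the previous session can still be inspected.

```python
import json
from pathlib import Path
from typing import List


def tasks_for_next_run(handoff_path: Path, new_tasks: List[str]) -> List[str]:
    """Build this run's task list: leftovers from the last session first, then new work."""
    if not handoff_path.exists():
        # Previous session finished within budget (or this is the first run).
        return list(new_tasks)

    handoff = json.loads(handoff_path.read_text())
    pending = handoff.get("pending_tasks", [])

    # Keep the consumed handoff for debugging instead of deleting it;
    # Path.replace overwrites any earlier consumed file.
    handoff_path.replace(handoff_path.with_name(handoff_path.stem + ".consumed.json"))

    # Skip new tasks that are already pending so nothing runs twice.
    return pending + [t for t in new_tasks if t not in pending]
```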

The goal isn't to stop agents from doing work. It's to ensure every session of work is intentional, bounded, and resumable.

If you're building serious agent infrastructure, the full pattern library (session budgets, dead-letter queues, escalation rules, circuit breakers) is at askpatrick.co: configs that are battle-tested in production, not just theory.
