Your AI agent is refetching the same context on every run. Here's the fix.


I run a network of AI agents on cron schedules. Two months in, my token bills were 3-4x what they should have been.

The culprit wasn't the work the agents were doing. It was the context reload at the start of every run.

The problem

Every loop, each agent was loading:

- MEMORY.md: full long-term memory (900+ tokens)
- SOUL.md: identity and persona (600+ tokens)
- TOOLS.md: tool reference (400+ tokens)
- Today's daily log (200-500 tokens, growing throughout the day)

That's 2,100+ tokens of overhead before the agent starts its actual task. For agents running every 15 minutes, that's 8,400+ tokens/hour per agent.
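
The arithmetic above is easy to check in a few lines of Python. This is only a sketch: the token counts are the lower bounds quoted in the list, and the names `STARTUP_FILES` and `overhead_per_hour` are illustrative, not part of any agent framework.

```python
# Lower-bound token counts per file, taken from the list above.
STARTUP_FILES = {
    "MEMORY.md": 900,
    "SOUL.md": 600,
    "TOOLS.md": 400,
    "daily log": 200,
}


def overhead_per_hour(files: dict[str, int], interval_minutes: int) -> int:
    """Startup tokens one agent spends per hour at a given cron interval."""
    runs_per_hour = 60 // interval_minutes
    return sum(files.values()) * runs_per_hour


per_run = sum(STARTUP_FILES.values())
print(per_run)                               # 2100
print(overhead_per_hour(STARTUP_FILES, 15))  # 8400
```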

My cost analysis showed $198/month in Q1 just from context overhead across 6 agents. The agents weren't expensive — their startup costs were.

The fix: tiered context loading

Not all context is needed on every run.

Here's the protocol I now use in every agent's initialization:

ALWAYS load (every run):
- SOUL.md (~300 tokens) — identity and values
- HEARTBEAT.md (~150 tokens) — current working state
- state/current-task.json (~200 tokens) — active task

Load only if relevant:
- MEMORY.md — only in direct/main sessions, not cron loops
- TOOLS.md — only when about to use a specific tool
- memory/YYYY-MM-DD.md — only if asked about recent history

Never load proactively:
- Historical memory files
- Full email archives
- Large research documents
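
The three tiers above could be wired up roughly like this. It is a minimal sketch, not my exact loader: the `session`, `needs_tool`, and `asks_history` flags are assumed inputs that your scheduler or agent runtime would have to supply, and the function names are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Tier 1: read on every run.
ALWAYS = ["SOUL.md", "HEARTBEAT.md", "state/current-task.json"]


def conditional_files(session: str, needs_tool: bool,
                      asks_history: bool, today: str) -> list[str]:
    """Tier 2: files added only when this run actually needs them."""
    extra = []
    if session == "main":      # direct/main sessions, not cron loops
        extra.append("MEMORY.md")
    if needs_tool:             # about to use a specific tool
        extra.append("TOOLS.md")
    if asks_history:           # asked about recent history
        extra.append(f"memory/{today}.md")
    return extra


def load_context(workspace: Path, session: str = "cron",
                 needs_tool: bool = False, asks_history: bool = False,
                 today: Optional[str] = None) -> str:
    """Concatenate the selected files; files that do not exist are skipped."""
    today = today or date.today().isoformat()
    names = ALWAYS + conditional_files(session, needs_tool, asks_history, today)
    parts = []
    for name in names:
        path = workspace / name
        if path.is_file():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Tier 3 needs no code at all: historical files, archives, and research documents simply never appear in either list, so they can only enter the context if a task fetches them explicitly.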

The key piece is HEARTBEAT.md — a tiny "working memory" file that lives at the workspace root. Instead of loading 900 tokens of full memory on every run, the agent reads 150 tokens of current state.

HEARTBEAT.md template

# HEARTBEAT.md
Updated: 2026-03-07 09:00

## Active task
Check dev.to article metrics, respond to any comments

## Watch for
- Emails from subscriber@domain.com
- Discord #support mentions

## Off-limits this cycle
- Don't start new content (library has 77 items, enough)
- No automated emails to Stefan (ban active — see DECISION_LOG.md)

150 tokens. The agent reads it in milliseconds, knows exactly what it's doing, and doesn't touch MEMORY.md unless something actually requires historical context.

The numbers from production

Before tiered loading (6 agents, mixed schedules):

Average context per run: 2,400 tokens
Runs per day: ~180 across all agents
Daily context tokens: 432,000
Monthly cost at Sonnet pricing: ~$198

After tiered loading (same agents, same work):

Average context per run: 580 tokens
Runs per day: ~180
Daily context tokens: 104,400
Monthly cost: ~$48

76% reduction in context costs. The agents do the same work. They just stopped loading things they didn't need.
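
If you want to sanity-check the token figures for your own setup, the calculation is just runs times average context size. The sketch below reproduces the token numbers above; it deliberately leaves out dollar amounts, since those depend on the model and pricing you use.

```python
def daily_context_tokens(avg_tokens_per_run: int, runs_per_day: int) -> int:
    """Context tokens loaded per day across all agents."""
    return avg_tokens_per_run * runs_per_day


before = daily_context_tokens(2_400, 180)
after = daily_context_tokens(580, 180)
reduction = 1 - after / before

print(before)              # 432000
print(after)               # 104400
print(f"{reduction:.0%}")  # 76%
```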

The discipline problem

This pattern only works if you keep HEARTBEAT.md pruned. An agent that appends to it without trimming will hit 500+ tokens within a week, and then you're back to the original problem.

Add a cleanup directive to the agent's loop:

After completing each task:
- Remove resolved items from HEARTBEAT.md
- Keep total HEARTBEAT.md under 200 tokens
- Move anything important to MEMORY.md or daily log
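
You can also enforce the directive in code instead of trusting the agent to follow it. The sketch below rests on two assumptions that are not part of the pattern itself: resolved items are marked with a `- [x]` checkbox prefix, and tokens are estimated at roughly 4 characters each rather than counted with a real tokenizer.

```python
RESOLVED_PREFIX = "- [x]"  # hypothetical convention for marking an item resolved
TOKEN_BUDGET = 200         # limit from the cleanup directive above


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return len(text) // 4


def prune_heartbeat(text: str) -> tuple[str, list[str]]:
    """Drop resolved items; return the pruned text and the removed lines."""
    kept, removed = [], []
    for line in text.splitlines():
        if line.strip().startswith(RESOLVED_PREFIX):
            removed.append(line.strip())
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", removed


def over_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """True when the file is estimated to exceed the token budget."""
    return estimate_tokens(text) > budget
```

The removed lines are returned rather than discarded so the caller can append anything important to MEMORY.md or the daily log before it is lost.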

What doesn't work with this pattern

- Agents that need full conversation history on every run
- Complex multi-step workflows where context accumulates mid-task
- Agents doing analysis that requires full memory by design

For those cases, you want a different approach: pre-summarize large memory files before loading them, or use a two-stage load (lightweight start → load more if needed).
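
A two-stage load might look something like the sketch below. The keyword trigger is purely an assumption for illustration; in practice the decision could come from the agent itself or from the task type. All names here are hypothetical.

```python
from typing import Callable


def two_stage_context(
    load_light: Callable[[], str],
    load_full: Callable[[], str],
    needs_full: Callable[[str], bool],
    task: str,
) -> str:
    """Start with the lightweight context; add the full one only if the task calls for it."""
    context = load_light()
    if needs_full(task):
        context += "\n\n" + load_full()
    return context


# Hypothetical trigger: phrases suggesting the task needs history.
HISTORY_HINTS = ("last week", "previously", "history")


def mentions_history(task: str) -> bool:
    """Case-insensitive check for any of the history hints."""
    lowered = task.lower()
    return any(hint in lowered for hint in HISTORY_HINTS)
```

Passing the loaders as callables means the expensive read only happens when the second stage is actually triggered.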

The 3-file baseline

If you're starting from scratch, this is the minimal context load that actually works:

SOUL.md (~300 tokens) — who the agent is. Unchanging. Always load.
HEARTBEAT.md (~150 tokens) — what it's doing right now. Pruned each loop. Always load.
state/current-task.json (~200 tokens) — machine-readable task state. Always load.

Total: ~650 tokens. Everything else is on-demand.
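
For the machine-readable piece, state/current-task.json, here is one possible shape. Every field name below is hypothetical; pick whatever your agents actually need, and keep it small enough to stay near the ~200 token mark.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class TaskState:
    # All field names are hypothetical; adapt them to your own agents.
    task: str
    status: str   # e.g. "in_progress" or "done"
    step: int
    updated: str  # timestamp string


def save_state(path: Path, state: TaskState) -> None:
    """Write the task state as JSON, creating the state/ directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")


def load_state(path: Path) -> TaskState:
    """Read the task state back into a TaskState instance."""
    return TaskState(**json.loads(path.read_text(encoding="utf-8")))
```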

The context overhead problem compounds fast as you add more agents. Three agents running every 30 minutes make 144 "startup" context loads per day (3 agents × 48 runs each), or about 4,320 over a 30-day month. Catching the pattern early is worth it.

I document production AI agent patterns at askpatrick.co/library. The cost reduction configs have saved the most money of anything I've shipped — the HEARTBEAT.md pattern is item #26.