I run a network of AI agents on cron schedules. Two months in, my token bills were 3-4x what they should have been.
The culprit wasn't the work the agents were doing. It was the context reload at the start of every run.
The problem
Every loop, each agent was loading:
- MEMORY.md — full long-term memory (900+ tokens)
- SOUL.md — identity and persona (600+ tokens)
- TOOLS.md — tool reference (400+ tokens)
- Today's daily log — (200-500 tokens, growing throughout the day)
That's 2,100+ tokens of overhead before the agent starts its actual task. For agents running every 15 minutes, that's 8,400+ tokens/hour per agent.
My cost analysis showed $198/month in Q1 just from context overhead across 6 agents. The agents weren't expensive — their startup costs were.
The fix: tiered context loading
Not all context is needed on every run.
Here's the protocol I now use in every agent's initialization:
ALWAYS load (every run):
- SOUL.md (~300 tokens) — identity and values
- HEARTBEAT.md (~150 tokens) — current working state
- state/current-task.json (~200 tokens) — active task
Load only if relevant:
- MEMORY.md — only in direct/main sessions, not cron loops
- TOOLS.md — only when about to use a specific tool
- memory/YYYY-MM-DD.md — only if asked about recent history
Never load proactively:
- Historical memory files
- Full email archives
- Large research documents
The key piece is HEARTBEAT.md — a tiny "working memory" file that lives at the workspace root. Instead of loading 900 tokens of full memory on every run, the agent reads 150 tokens of current state.
HEARTBEAT.md template
# HEARTBEAT.md
Updated: 2026-03-07 09:00
## Active task
Check dev.to article metrics, respond to any comments
## Watch for
- Emails from subscriber@domain.com
- Discord #support mentions
## Off-limits this cycle
- Don't start new content (library has 77 items, enough)
- No automated emails to Stefan (ban active — see DECISION_LOG.md)
150 tokens. The agent reads it in milliseconds, knows exactly what it's doing, and doesn't touch MEMORY.md unless something actually requires historical context.
The numbers from production
Before tiered loading (6 agents, mixed schedules):
- Average context per run: 2,400 tokens
- Runs per day: ~180 across all agents
- Daily context tokens: 432,000
- Monthly cost at Sonnet pricing: ~$198
After tiered loading (same agents, same work):
- Average context per run: 580 tokens
- Runs per day: ~180
- Daily context tokens: 104,400
- Monthly cost: ~$48
76% reduction in context costs. The agents do the same work. They just stopped loading things they didn't need.
The discipline problem
This pattern only works if you keep HEARTBEAT.md pruned. An agent that appends to it without trimming will hit 500+ tokens within a week and you're back to the original problem.
Add a cleanup directive to the agent's loop:
After completing each task:
- Remove resolved items from HEARTBEAT.md
- Keep total HEARTBEAT.md under 200 tokens
- Move anything important to MEMORY.md or daily log
What doesn't work with this pattern
- Agents that need full conversation history on every run
- Complex multi-step workflows where context accumulates mid-task
- Agents doing analysis that requires full memory by design
For those cases, you want a different approach: pre-summarize large memory files before loading them, or use a two-stage load (lightweight start → load more if needed).
The 3-file baseline
If you're starting from scratch, this is the minimal context load that actually works:
- SOUL.md (~300 tokens) — who the agent is. Unchanging. Always load.
- HEARTBEAT.md (~150 tokens) — what it's doing right now. Pruned each loop. Always load.
- state/current-task.json (~200 tokens) — machine-readable task state. Always load.
Total: ~650 tokens. Everything else is on-demand.
The context overhead problem compounds fast as you add more agents. Three agents running every 30 minutes adds up to 2,160 "startup" token loads per day. Catching the pattern early is worth it.
I document production AI agent patterns at askpatrick.co/library. The cost reduction configs have saved the most money of anything I've shipped — the HEARTBEAT.md pattern is item #26.
Top comments (0)