We run 6 AI agents simultaneously — Atlas, Apollo, Prometheus, Ares, Peitho, Hephaestus. On a productive day they execute 50+ "waves" (task dispatches) and produce 300+ files.
Three weeks in, we noticed something: response quality dropped ~30% midday, even when prompts hadn't changed.
The culprit: prompt cache TTL.
## What Happened
Claude's prompt cache has a 5-minute TTL. If your agent's tick interval exceeds 5 minutes, you pay full token cost on every system prompt re-read. In a multi-agent system, that compounds fast.
Our setup:
- 6 agents, each with ~2,000-token system prompts
- Tick intervals ranging from 3–8 minutes
- 57 wave cycles per day
Agents with 6–8 min ticks were hitting zero cache hits for their system prompts. Every tick = full cold read. At $0.003/1K tokens (Sonnet), that's real money burning quietly.
## The Fix
Three changes:
### 1. Align tick intervals below 270s
Don't sleep 300s "to wait 5 minutes." The sleep plus your own processing time lands every request just past the TTL, so you miss the cache window every time. Capping at 270s leaves headroom and keeps the cache warm.
```python
# Bad: cache expires on every tick
TICK_INTERVAL = 360  # 6 minutes

# Good: stays warm
TICK_INTERVAL = 240  # 4 minutes, well inside the 5-minute TTL
```
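One way to enforce the cap mechanically rather than by convention (a sketch under our own naming: `dispatch_wave` is a hypothetical stand-in for the agent's per-tick work, and the 30s margin assumes each tick's processing fits inside it):

```python
import time

CACHE_TTL = 300     # Anthropic prompt-cache TTL, in seconds
SAFETY_MARGIN = 30  # headroom for per-tick processing time

def next_tick_delay(desired_interval: int) -> int:
    """Clamp the sleep so the next request still lands inside the cache window."""
    return min(desired_interval, CACHE_TTL - SAFETY_MARGIN)

def dispatch_wave() -> None:
    """Hypothetical stand-in for the agent's actual per-tick work."""

def run_agent(tick_interval: int) -> None:
    while True:
        dispatch_wave()
        time.sleep(next_tick_delay(tick_interval))  # a 360s config still sleeps only 270s
```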
### 2. Pin system prompts at cache boundaries
Structure your prompts so the stable prefix (persona, rules, context) comes first and dynamic content (task queue, state) comes last. Claude caches from the top — a single changed token in the middle busts the whole prefix.
```
[CACHED]  Persona + Rules + Background context
[DYNAMIC] Current task | Current state | This tick's observations
```
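With the Anthropic Messages API, that split maps onto a `cache_control` breakpoint: mark the end of the stable prefix as an ephemeral cache block and put everything dynamic after it. A minimal sketch (the prompt text and `build_system_blocks` name are ours, purely illustrative):

```python
STABLE_PREFIX = (
    "You are Atlas, one of six agents.\n"  # persona + rules + background;
    "Rules: ...\nBackground: ..."          # must be byte-identical on every tick
)

def build_system_blocks(dynamic_state: str) -> list[dict]:
    """Stable prefix first (cached), per-tick state last (after the breakpoint)."""
    return [
        {
            "type": "text",
            "text": STABLE_PREFIX,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint ends here
        },
        # Below the breakpoint: free to change every tick without busting the prefix
        {"type": "text", "text": dynamic_state},
    ]

# Passed as: client.messages.create(..., system=build_system_blocks(state), messages=[...])
```

The point of the helper is discipline: nothing dynamic can accidentally creep above the breakpoint, which is exactly the single-changed-token failure mode.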
### 3. Use a heartbeat ping
We added a lightweight heartbeat tick every 90s that sends a minimal prompt to keep the cache warm — no task dispatch, just a ping that touches the cached prefix.
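A minimal version of that heartbeat as a background thread (a sketch; `send_ping` stands in for a tiny request, e.g. `max_tokens=1`, that reuses the cached system prefix):

```python
import threading

HEARTBEAT_INTERVAL = 90  # seconds, comfortably inside the 300s TTL

def send_ping() -> None:
    """Hypothetical: a minimal API call that touches the cached prefix,
    no task dispatch."""

def start_heartbeat(stop: threading.Event) -> threading.Thread:
    def loop() -> None:
        # Event.wait returns False on timeout (fire a ping) and True once stopped
        while not stop.wait(HEARTBEAT_INTERVAL):
            send_ping()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Note the trade-off: each ping costs a small number of tokens, so it only pays off when the cached prefix (our ~2,000-token system prompts) is much larger than the ping.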
## Results
| Metric | Before | After |
|---|---|---|
| Cache hit rate (est.) | ~40% | ~85% |
| Avg tokens/wave | 4,200 | 2,800 |
| Response latency | 8.2s | 4.1s |
| Cost/day (est.) | $0.94 | $0.49 |
Latency cut in half. Cost nearly halved. Same quality output.
## The Bigger Lesson
When you run agents 24/7, invisible performance decay is the enemy. The cache TTL problem didn't throw an error. It just quietly made everything worse until we measured it.
If you're building multi-agent systems with Claude, instrument your cache hit rates from day one. The Anthropic docs show how to read `cache_read_input_tokens` from the API response — log it on every tick.
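The logging itself is a one-liner per tick. A sketch (the `usage` field names are the real API response fields; the hit-rate formula and `log_metric` are our own):

```python
def cache_hit_rate(input_tokens: int, cache_read: int, cache_creation: int) -> float:
    """Fraction of this request's prompt tokens that were served from the cache."""
    total = input_tokens + cache_read + cache_creation
    return cache_read / total if total else 0.0

# Per tick, from the Messages API response:
#   u = response.usage
#   log_metric("cache_hit_rate",
#              cache_hit_rate(u.input_tokens,
#                             u.cache_read_input_tokens or 0,
#                             u.cache_creation_input_tokens or 0))
```

Graph this over the day and the TTL problem we hit shows up as a visible sawtooth long before the cost does.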
## What We're Building
This is part of Pantheon — our production multi-agent orchestration system. We packaged the architecture, agent templates, PAX communication protocol, and launch playbook into the Atlas Starter Kit ($97 one-time).
The kit includes the exact prompt structure, tick timing config, and heartbeat pattern we use in production.
GitHub: Wh0FF24/whoffagents-site
If you're fighting cache drift in your own agents, drop a comment — happy to share specific configs.