Recca Tsai

Posted on Apr 25 • Originally published at recca0120.github.io

12 More Days Scanned: Claude Code Sub-Agent Cache TTL Has Been 100% 5m for 17 Straight Days — This Isn''t a Regression, It''s the New Default

#claudecode #promptcaching #agents #python

Originally published at recca0120.github.io

}}">Two weeks ago I scanned 95 days of Claude Code logs and found that since 4/9, sub-agents had been 100% downgraded to 5m TTL — 5 consecutive days, 4,840 API calls, with the main agent completely untouched. I left the conclusion at "monitoring," since 5 days could still be rollout flapping.

Today 4/26, I re-ran the same Python. The streak is now 17 days, 15,727 API calls, 0 1h writes. This isn't flapping — Anthropic's server has quietly hard-coded the sub-agent default TTL to 5m. No changelog, no announcement, and the main issue was just closed without resolution.

This is a follow-up: latest data, cost math, community and media state, and why cnighswonger's proxy can't save you here either.

Past Two Weeks of Data

Scan covers 4/13–4/25 (cut-off of last post → today):

Metric	Main agent	Sub-agent
Total API calls	60,291	15,727
1h writes	100% (150.7M tokens)	0
5m writes	0	100% (60.4M tokens)
Consecutive 1h-write days	13	0
Consecutive 5m-write days	0	13

Add the 4/9–4/12 stretch and the sub-agent has run 17 straight days at 100% 5m, with 0 1h writes. Sub-agent workload didn't drop — 4/14 (the day I posted last) hit 2,648 calls, 4/17 spiked to 2,821, both two-week highs. The full cost impact landed on me.

Key contrast: the main agent stayed 100% 1h the entire time, untouched. So this is unambiguously server-side discrimination against the "sub-agent identity" — not quota throttling, not a client version, not a workflow change.

How Much More Expensive: The Math

Anthropic's official cache pricing:

Cache write to 5m TTL: 1.25× base input price
Cache write to 1h TTL: 2× base input price
Cache read (both): 0.1× base input price

Intuition says 5m writes are cheaper — 1.25× vs 2×, a 37.5% saving. But sub-agent workflows defeat that intuition.

A typical sub-agent runs 30 minutes, 5 turns. Between turns it waits for the LLM to think, runs tools, parses results. 3 inter-turn gaps over 5 minutes is normal. Each gap past TTL expires the cache and forces a rewrite next turn.

Total cost (with base input as 1×):

Old (1h TTL):
  1 cache write @ 2×  = 2.0
  4 cache reads @ 0.1× = 0.4
  Total = 2.4×

New (5m TTL):
  4 cache writes @ 1.25× = 5.0
  1 cache read   @ 0.1× = 0.1
  Total = 5.1×

About 2.1×. A heavy sub-agent workflow (parallel Task fan-out, long plan-execute, code-review pipelines) that used to cost \$10 now costs \$21.

This assumes inter-turn gaps average over 5m. If your sub-agent finishes every turn within 5m (e.g. pure retrieval), the impact is much smaller. The hardest-hit are sub-agents that "run long, wait for tool results."

GitHub Activity Past Week

Issue #46829: Closed by Anthropic

cnighswonger's #46829 was closed by Anthropic without a fix. Comments are uniformly angry:

DaQue: "I don't like the stealth nerf."
rinchen: "Yet another issue closed without resolution by Anthropic."
lizthegrey (Engineering Director at Honeycomb, jumped in 4/25): posted her own grep one-liner, listed her affected versions and dates (4/01 v2.1.81, 4/09 v2.1.85, 4/13–4/17 v2.1.92, 4/21 v2.1.114), and explicitly stated she provided redacted jsonl transcripts to Anthropic. The most credible piece of evidence submitted so far.

# lizthegrey's one-liner
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude | \
  jq 'select(.isSidechain == false and (.message.model | startswith("claude-haiku") | not) and .message.usage.cache_creation.ephemeral_5m_input_tokens > 0) | .timestamp + "," + .version' 2>/dev/null | \
  sed 's/T.*,/,/' | sort | uniq -c

Same data source as my 60-line Python from the last post, just more concise. Drop-in usable.

Issue #50213: Sub-agent Trailing Block Missing cache_control

ofekron added measurements on #50213 on 4/17: every built-in sub-agent (Explore, Plan, general-purpose) shows nonzero cache_creation on second spawn — the trailing system-context block has no cache_control marker, so each fresh spawn wastes ~4.7K tokens rewriting. 0 new comments past week — this issue is being ignored.

Together the two issues say the same thing: Anthropic's posture toward sub-agent cache leans toward "save where we can," not "optimize where we can."

No Movement from Anthropic Staff

bcherny's earlier mention of a "per-request env var / flag for TTL" — still not shipped
Jarred Sumner's earlier defense in The Register that "sub-agent 5m is a one-shot optimization" — no response to the 4/9 100% 5m data
Anthropic posted nothing on these issues in the past week

Update (2026-04-26): Official Position vs My Data

After publishing, I dug into Anthropic's public posture. Boris Cherny (creator of Claude Code), via The Register:

"One-hour cache has been implemented in some places for subscribers, while a five-minute cache is the true default."

So Anthropic's official line is "5m is the true default; 1h is opt-in for some subscriber scenarios" — which actually agrees with this post's framing of "this is the new default, not a regression."

But the official stance can't explain one thing: time-series data from }}">the first audit shows that from 2026-02-07 to 03-05, 28 consecutive days, sub-agents received 100% 1h (not mixed, not 50%). If those 28 days of "1h treatment" were a "special case," it was a stably-allocated special case, not an occasional gift.

This post's 17 days of 100% 5m can be re-positioned: the sub-agent 1h treatment subscribers used to receive is being stably revoked. Anthropic didn't "change the default," but the "1h special case formerly granted to sub-agents" effectively disappeared. That's a fact the official statement can't paper over.

Media Coverage and a Bigger Thread

This isn't just blowing up on GitHub:

The Register (4/13): Anthropic: Claude quota drain not caused by cache tweaks — Anthropic publicly denies a cache link, with Sumner's defense quoted in full
XDA Developers: Anthropic quietly nerfed Claude Code's 1-hour cache
DevOps.com: Developers Using Anthropic Claude Code Hit by Token Drain Crisis

Worth tracking: Issue #41930 — since 3/23 every paid tier has been hit by abnormal quota burn, Pro / Max 5× / Max 20× included. Single prompts eat 3–7% of session quota; 5h windows drain in as little as 19 minutes. The community treats cache TTL regression, autocompact cascades, and sub-agent fan-out as stacked root causes. My 4/9 second-wave finding fills in the timeline of "sub-agent specifically got worse again on 4/9."

Can cnighswonger's Proxy Save This? My Take

cnighswonger/claude-code-cache-fix v3.0.3 has nice A/B numbers on CC v2.1.117: through the proxy 95.5% cache hit rate, direct 82.3%. It runs 7 hot-reloadable extensions, including ttl-management, which "detects server TTL tier and injects correct cache_control markers."

But for the "server force-writes sub-agent into 5m" problem, the proxy probably can't save you. My read:

The proxy fixes "caches that should hit but miss because of client bugs" (unstable fingerprint, non-deterministic tool ordering, inconsistent cache_control markers)
It can't fix "client marks 1h, server still writes 5m" — that's server-side behavior, the proxy can't rewrite responses
From our 17 days of 100% 5m / 0 1h writes, the server is doing the latter for sub-agents

Easy to verify: install the proxy, run the same script against ~/.claude/projects/*.jsonl, see if sub-agent ephemeral_1h_input_tokens ever goes from 0 to nonzero. If it stays 0, the server-side change is confirmed.

This isn't a knock on cnighswonger's proxy — it has demonstrated value for the main agent and any cache-miss scenario. Just don't expect it to "bring back sub-agent 1h TTL."

Conclusion: This Is the New Default

In the 4/14 post I called the 4/9 wave "a second silent regression." On 4/26 I'm revising the wording: this is no longer a regression — it's Anthropic's new default for sub-agents.

Evidence weight:

17 consecutive days (4/9–4/25)
15,727 API calls in just the past 13 days
0 1h writes (not low — actually zero)
Main agent untouched (clear differential treatment)
Media + GitHub + community on fire, Anthropic stays silent

If you lean heavily on sub-agents:

Scan your own data first — use the Python from the last post, or lizthegrey's jq one-liner above
Calculate the actual cost impact — it's not "a bit more," it's about 2×
Re-evaluate your sub-agent workflows — anything doable in the main agent shouldn't fan out to sub-agents
Drop a data point on issue #46829 — closed but still indexed. With Honeycomb-tier voices already pushing, more data makes external coverage easier to follow up

}}">Background — Claude Code session cost & cache misconception covers the cache cost logic. }}">First audit covers how to scan your own logs to verify. Read both for the full picture.