DEV Community

정상록
Claude Code's Silent Cache TTL Change: Why Your Quota Is Draining Faster

If you've been using Claude Code and noticed your quota draining significantly faster since early March 2026, you're not alone. A developer's meticulous analysis of 3 months of session data has uncovered the reason — and it's a single configuration change that Anthropic made without any announcement.

The Discovery

Developer Sean Swanson analyzed thousands of JSONL session files spanning three months and posted his findings in GitHub Issue #46829. Here's what he found:

  • February 1, 2026: Anthropic set Claude Code's prompt cache TTL to 1 hour
  • March 7, 2026: TTL was rolled back to 5 minutes
  • No changelog, no announcement, no documentation update

Understanding Cache TTL Economics

Claude Code caches system prompts, rule files, and conversation history to avoid resending them with every request. The TTL determines how long this cache stays valid.

Here's the pricing structure that makes this significant:

  Operation            Cost (vs base token price)
  5-min cache write    1.25x
  1-hour cache write   2x
  Cache read           0.1x

The critical insight: cache reads are far cheaper than writes — 12.5x cheaper than a 5-minute cache write (0.1x vs 1.25x) and 20x cheaper than a 1-hour cache write (0.1x vs 2x).

With a 5-minute TTL, any pause longer than 5 minutes — reading docs, reviewing code, grabbing coffee — triggers a full cache expiration. Your next prompt then requires a complete cache rewrite instead of a cheap read.
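The per-prompt impact can be sketched with some back-of-envelope arithmetic. The multipliers come from the pricing table above; the base price and context size are assumptions for illustration (a Sonnet-class input rate of $3 per million tokens):

```python
# Cost of resubmitting cached context, depending on whether the cache
# is still warm (read) or has expired (full rewrite).
BASE_PRICE_PER_MTOK = 3.00   # assumed base input price, USD per million tokens
WRITE_5MIN_MULT = 1.25       # 5-minute cache write (from the table above)
READ_MULT = 0.10             # cache read (from the table above)

def prompt_cost(cached_tokens: int, cache_hit: bool) -> float:
    """USD cost of re-sending `cached_tokens` of context on the next prompt."""
    mult = READ_MULT if cache_hit else WRITE_5MIN_MULT
    return cached_tokens / 1_000_000 * BASE_PRICE_PER_MTOK * mult

context = 200_000  # assumed: large system prompt + rules + history

hit = prompt_cost(context, cache_hit=True)    # resumed within the TTL
miss = prompt_cost(context, cache_hit=False)  # paused past the TTL
print(f"cache hit:  ${hit:.3f}")   # $0.060
print(f"cache miss: ${miss:.3f}")  # $0.750 — 12.5x more expensive
```

With a 200K-token context, every coffee break that outlasts the 5-minute TTL turns a $0.06 read into a $0.75 rewrite on the next prompt.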

The Numbers

Swanson's analysis showed:

  • 220M tokens written to the 5-minute cache tier over 3 months
  • These would have been cache reads under the 1-hour TTL
  • Estimated 20-32% increase in cache creation costs
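Those 220M tokens translate into a rough dollar figure. Each token billed as a 5-minute write (1.25x) that would have been a read (0.1x) under the 1-hour TTL carries a 1.15x-of-base premium; the base price below is an assumed Sonnet-class input rate, so treat the result as an order-of-magnitude sketch:

```python
# Rough extra spend implied by Swanson's 3-month numbers.
BASE_PRICE_PER_MTOK = 3.00       # assumed base input price, USD per million tokens
WRITE_5MIN_MULT = 1.25           # 5-minute cache write multiplier
READ_MULT = 0.10                 # cache read multiplier
tokens = 220_000_000             # tokens written to the 5-min tier (from the issue)

extra_mult = WRITE_5MIN_MULT - READ_MULT   # premium per token: 1.15x base
extra_cost = tokens / 1_000_000 * BASE_PRICE_PER_MTOK * extra_mult
print(f"~${extra_cost:,.0f} extra over 3 months")  # ~$759
```

Subscription users never see that dollar amount directly, of course — the extra writes show up as quota draining faster instead.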

Real-World Impact

The effects have been widespread across subscription tiers:

  • Max subscribers ($200/mo): First-time quota limit hits since subscribing
  • Pro users ($20/mo): As few as 2 prompts possible in 5 hours
  • Enterprise teams: Going from all-day Opus sessions to 2-hour limits
  • AMD's AI Director: Publicly raised similar concerns

Anthropic's Response

Two key engineers addressed the issue:

Jarred Sumner (Bun creator, now at Anthropic):

"A significant portion of Claude Code requests are one-shot calls, so 5-minute TTL is actually cheaper [overall]."

No plans to expose TTL settings to users.

Boris Cherny (Claude Code creator):

"A cache miss on a 1M token context window is very expensive. We're exploring reducing the default context window to 400K."

The Bug Factor

Adding complexity to the debate: two bugs were discovered in the caching code. Community members argue that debating a 5-minute vs. 1-hour TTL is pointless until those bugs are fixed, since the cache may not be working as intended either way.

Additionally, AWS Bedrock users found that TTL is hardcoded to 5 minutes in their environment (#32671).

What You Can Do

  1. Minimize session gaps: Keep interactions within 5-minute windows
  2. Optimize context: Trim unnecessary system prompts and rule files
  3. Monitor usage: Track your daily quota consumption to plan work
  4. Contribute data: Share your usage patterns in #46829
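If you want to gather your own numbers before contributing, a small script can tally cache activity from local session logs. This is a sketch of the kind of analysis Swanson ran: the `~/.claude/projects` path and the `message.usage` field names are assumptions based on commonly reported Claude Code JSONL layouts, so adjust them to match your own files:

```python
import json
from pathlib import Path

def tally_cache_tokens(lines):
    """Sum cache-write and cache-read tokens from JSONL session lines."""
    writes = reads = 0
    for line in lines:
        try:
            usage = json.loads(line).get("message", {}).get("usage", {})
        except (json.JSONDecodeError, AttributeError):
            continue  # skip non-JSON or non-object lines
        writes += usage.get("cache_creation_input_tokens", 0)
        reads += usage.get("cache_read_input_tokens", 0)
    return writes, reads

if __name__ == "__main__":
    # Assumed default session log location; adjust to your setup.
    sessions = Path.home() / ".claude" / "projects"
    total_w = total_r = 0
    for f in sessions.rglob("*.jsonl"):
        w, r = tally_cache_tokens(f.read_text().splitlines())
        total_w += w
        total_r += r
    print(f"cache writes: {total_w:,} tokens")
    print(f"cache reads:  {total_r:,} tokens")
```

A high write-to-read ratio in your own logs is exactly the symptom described above, and worth reporting in the issue thread.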

The Bigger Picture

The Register noted:

"Focusing on cache optimization may be evidence that Anthropic's quotas are providing less processing time than before."

Whether intentional cost optimization or unintended side effect, this incident highlights the importance of transparency in AI tool pricing. When a single configuration change can alter the effective value of a subscription by 20-32%, users deserve to know about it upfront.


Have you noticed changes in your Claude Code quota consumption? I'd love to hear about your experience in the comments.
