Your Claude Max weekly limit runs out in 3-4 days instead of 7.
You're paying $200/month. You're not imagining it.
We run Claude Code full-time building PaperLink - a document sharing and analytics platform. 8+ hour daily sessions, Clean Architecture across 4 layers, 30+ custom Claude skills, 9 automated hooks. We're not casual users - and we burned through limits faster than anyone we talked to.
We tracked this for months and found 4 root causes stacking simultaneously. None of them were obvious.
Root Cause #1: Anthropic Quietly Changed the Rules
In March 2026, Anthropic reduced peak-hour limits and removed the off-peak bonus that heavy users relied on.
There was no announcement. No changelog entry. We noticed when our weekly limit started dying on Wednesday instead of lasting until Sunday. The community confirmed it:
"Claude Code is the best coding tool I've ever used, for the 45 minutes a day I can actually use it."
This is the new baseline. Anthropic did not restore previous limits.
Root Cause #2: Cache Bug (10-20x Token Inflation)
Claude Code CLI versions 2.1.110 through 2.1.117 had a prompt caching bug that caused 10-20x token inflation per request.
What this means: a request that should have cost 5,000 tokens was costing 50,000-100,000. We run dozens of requests per session across our skill pipeline - the weekly limit was evaporating before we could finish a single feature.
Most people never noticed because token spend is not visible in the Claude Code UI by default.
Fix: Update to CLI version 2.1.118 or later.
# Check your current version
claude --version
# Update
npm i -g @anthropic-ai/claude-code@latest
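If you script your environment setup, a quick sanity check can catch a downgrade back into the buggy range before a session starts. A minimal sketch in Python; the exact format of the `claude --version` output is an assumption, and the parsing helpers are our own:

```python
import re

MIN_VERSION = (2, 1, 118)  # first release after the cache bug

def parse_version(text: str) -> tuple:
    """Extract the first x.y.z version number from CLI output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.groups())

def cli_is_safe(version_output: str) -> bool:
    """True if the installed CLI is at or past the cache-bug fix."""
    return parse_version(version_output) >= MIN_VERSION

# Feed in whatever `claude --version` prints, e.g. via subprocess
print(cli_is_safe("2.1.117"))  # False: inside the buggy 2.1.110-2.1.117 range
print(cli_is_safe("2.1.118"))  # True
```

Tuple comparison handles the ordering correctly (2.2.0 passes, 2.1.117 fails), which string comparison would not.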
Root Cause #3: The Most Expensive Configuration Possible
After the Opus 4.7 release, we were running the most expensive configuration possible:
- Model: claude-opus-4-7 (most expensive model)
- Effort level: xhigh (maximum thinking tokens per request)
We wanted the best output quality for our Clean Architecture codebase. But every single request - including trivial ones like "rename this variable" - was using the maximum token budget.
Fix: Pin the model to claude-opus-4-6. For 95% of coding tasks, the output quality is identical. The token cost is significantly lower.
{
  "model": "claude-opus-4-6",
  "effortLevel": "high"
}
Root Cause #4: Long Sessions Are Token Killers
We analyzed our usage patterns and found 87% of token spend came from agentic sessions lasting 8+ hours.
PaperLink has a full skill pipeline - business requirements, technical plans, implementation, code review, tests - all running through Claude Code. A single feature can mean 50+ tool calls in one session. Here's why that's expensive:
- Context grows with every request/response pair
- At 100,000 tokens of context, every new response costs roughly 2x what it costs at 50,000
- After 6-7 hours, Claude starts "freelancing" - renaming variables, refactoring out-of-scope files, doing things you didn't ask for
- Quality degrades, you correct it, corrections add more context, which increases cost further
It's a vicious cycle.
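To see why one long session costs more than several short ones, here's a toy cost model (our own illustration, not Anthropic's pricing): each turn re-sends the accumulated context as input, so per-turn cost grows linearly and total spend grows roughly quadratically with session length.

```python
def session_cost(turns: int, tokens_per_turn: int = 2_000, base_context: int = 5_000) -> int:
    """Total input tokens for a session where every turn re-sends all prior context."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the whole context is re-sent as input
        context += tokens_per_turn  # each request/response pair grows the context
    return total

# One 50-turn session vs five fresh 10-turn sessions
one_long = session_cost(50)
five_short = 5 * session_cost(10)
print(one_long, five_short)
```

With these toy numbers the single 50-turn session costs nearly 4x the five short ones combined, even though the total number of turns is identical. That gap is the quadratic term.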
Fix: Set an auto-compact window and disable the 1M context option.
{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "200000",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1"
  }
}
CLAUDE_CODE_AUTO_COMPACT_WINDOW at 200k tokens forces context compression before it balloons. CLAUDE_CODE_DISABLE_1M_CONTEXT caps the per-request token ceiling.
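The compaction behavior can be sketched as a simple threshold rule. This is a toy model of what we understand the setting to do; the trigger internals and the compression ratio are assumptions:

```python
COMPACT_WINDOW = 200_000   # CLAUDE_CODE_AUTO_COMPACT_WINDOW
SUMMARY_RATIO = 0.15       # assumed: compaction keeps ~15% of the window as a summary

def apply_turn(context_tokens: int, new_tokens: int) -> int:
    """Grow context by one exchange, compacting when it crosses the window."""
    context_tokens += new_tokens
    if context_tokens > COMPACT_WINDOW:
        # Older turns get summarized; context collapses to a fraction of the window
        context_tokens = int(COMPACT_WINDOW * SUMMARY_RATIO)
    return context_tokens

context = 0
for turn in range(60):
    context = apply_turn(context, 4_000)
print(context)  # bounded: context never runs away past the window
```

The point is not the exact numbers but the shape: without a window, context grows without bound and every request gets more expensive; with one, cost per request stays capped.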
The Complete settings.json Fix
Here's the full configuration that doubled our weekly limit:
{
  "model": "claude-opus-4-6",
  "effortLevel": "high",
  "autoUpdatesChannel": "stable",
  "alwaysThinkingEnabled": false,
  "env": {
    "DISABLE_AUTOUPDATER": "1",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "200000"
  }
}
Two settings worth explaining:
alwaysThinkingEnabled: false - This was forcing extended thinking tokens on every request, even trivial ones. Extended thinking is powerful for complex architectural decisions - we use it when Claude needs to reason across domain, application, infrastructure, and presentation layers. But burning thinking tokens on "add a semicolon" is pure waste.
DISABLE_AUTOUPDATER: 1 - After the cache bug in versions 2.1.110-2.1.117, we pinned to a stable version. Auto-updates to buggy CLI versions can silently inflate your token spend for days before you notice.
One Thing I Removed
We had ENABLE_PROMPT_CACHING_1H in our env vars. Turns out Max subscription already gets 1-hour prompt cache TTL automatically. This variable was redundant - and according to the community, may have actually added overhead rather than helping.
The Result
| Before | After |
|---|---|
| Weekly limit exhausted in 3-4 days | Weekly limit lasts 6-7 days |
| Frequent "limit reached" mid-session | Rare limit hits, usually end of week |
| Long sessions degrading quality | Shorter, higher-quality sessions |
What Did NOT Change
Anthropic did not restore previous limits. This is the new normal.
These settings minimize your token spend within the current pricing model. If Anthropic changes limits again, the fundamentals still apply: control your model cost, control your context size, avoid buggy CLI versions.
Quick Checklist
Before your next session, verify:
- [ ] CLI version >= 2.1.118 (claude --version)
- [ ] Model pinned to claude-opus-4-6 (or claude-sonnet-4-6 for lighter tasks)
- [ ] Effort level set to high, not xhigh
- [ ] alwaysThinkingEnabled is false
- [ ] Auto-compact window set to 200k
- [ ] 1M context disabled
- [ ] Auto-updater disabled (update manually when stable)
- [ ] No redundant ENABLE_PROMPT_CACHING_1H env var
We're still pushing Claude Code hard every day on PaperLink - if we find more optimizations, we'll update this post.
What's the biggest Claude Code frustration you've hit? Drop your experience in the comments - curious how common these patterns are.