Your Claude Max weekly limit runs out in 3-4 days instead of 7.
You're paying $200/month. You're not imagining it.
We run Claude Code full-time building PaperLink - a document sharing and analytics platform. 8+ hour daily sessions, Clean Architecture across 4 layers, 30+ custom Claude skills, 9 automated hooks. We're not casual users - and we burned through limits faster than anyone we talked to.
We tracked this for months and found 4 root causes stacking simultaneously. None of them were obvious.
Root Cause #1: Anthropic Quietly Changed the Rules
In March 2026, Anthropic reduced peak-hour limits and removed the off-peak bonus that heavy users relied on.
There was no announcement. No changelog entry. We noticed when our weekly limit started dying on Wednesday instead of lasting until Sunday. The community confirmed it:
"Claude Code is the best coding tool I've ever used, for the 45 minutes a day I can actually use it."
This is the new baseline. Anthropic did not restore previous limits.
Root Cause #2: Cache Bug (10-20x Token Inflation)
Claude Code CLI versions 2.1.110 through 2.1.117 had a prompt caching bug that caused 10-20x token inflation per request.
What this means: a request that should have cost 5,000 tokens was costing 50,000-100,000. We run dozens of requests per session across our skill pipeline - the weekly limit was evaporating before we could finish a single feature.
Most people never noticed because token spend is not visible in the Claude Code UI by default.
Fix: Update to CLI version 2.1.118 or later.
# Check your current version
claude --version
# Update
npm i -g @anthropic-ai/claude-code@latest
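If you script your environment setup, a quick sanity check can catch a downgrade back into the buggy range before a session starts. A minimal sketch in Python; the exact format of the `claude --version` output is an assumption, and the parsing helpers are our own:

```python
import re

MIN_VERSION = (2, 1, 118)  # first release after the cache bug

def parse_version(text: str) -> tuple:
    """Extract the first x.y.z version number from CLI output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.groups())

def cli_is_safe(version_output: str) -> bool:
    """True if the installed CLI is at or past the cache-bug fix."""
    return parse_version(version_output) >= MIN_VERSION

# Feed in whatever `claude --version` prints, e.g. via subprocess
print(cli_is_safe("2.1.117"))  # False: inside the buggy 2.1.110-2.1.117 range
print(cli_is_safe("2.1.118"))  # True
```

Tuple comparison handles the ordering correctly (2.2.0 passes, 2.1.117 fails), which string comparison would not.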
Root Cause #3: The Most Expensive Configuration Possible
After the Opus 4.7 release, we were running the most expensive configuration possible:
- Model: claude-opus-4-7 (most expensive model)
- Effort level: xhigh (maximum thinking tokens per request)
We wanted the best output quality for our Clean Architecture codebase. But every single request - including trivial ones like "rename this variable" - was using the maximum token budget.
Fix: Pin the model to claude-opus-4-6. For 95% of coding tasks, the output quality is identical. The token cost is significantly lower.
{
  "model": "claude-opus-4-6",
  "effortLevel": "high"
}
Root Cause #4: Long Sessions Are Token Killers
We analyzed our usage patterns and found 87% of token spend came from agentic sessions lasting 8+ hours.
PaperLink has a full skill pipeline - business requirements, technical plans, implementation, code review, tests - all running through Claude Code. A single feature can mean 50+ tool calls in one session. Here's why that's expensive:
- Context grows with every request/response pair
- At 100,000 tokens of context, every new response costs roughly 2x what it costs at 50,000
- After 6-7 hours, Claude starts "freelancing" - renaming variables, refactoring out-of-scope files, doing things you didn't ask for
- Quality degrades, you correct it, corrections add more context, which increases cost further
It's a vicious cycle.
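To see why one long session costs more than several short ones, here's a toy cost model (our own illustration, not Anthropic's pricing): each turn re-sends the accumulated context as input, so per-turn cost grows linearly and total spend grows roughly quadratically with session length.

```python
def session_cost(turns: int, tokens_per_turn: int = 2_000, base_context: int = 5_000) -> int:
    """Total input tokens for a session where every turn re-sends all prior context."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the whole context is re-sent as input
        context += tokens_per_turn  # each request/response pair grows the context
    return total

# One 50-turn session vs five fresh 10-turn sessions
one_long = session_cost(50)
five_short = 5 * session_cost(10)
print(one_long, five_short)
```

With these toy numbers the single 50-turn session costs nearly 4x the five short ones combined, even though the total number of turns is identical. That gap is the quadratic term.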
Fix: Set an auto-compact window and disable the 1M context option.
{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "200000",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1"
  }
}
CLAUDE_CODE_AUTO_COMPACT_WINDOW at 200k tokens forces context compression before it balloons. CLAUDE_CODE_DISABLE_1M_CONTEXT caps the per-request token ceiling.
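The compaction behavior can be sketched as a simple threshold rule. This is a toy model of what we understand the setting to do; the trigger internals and the compression ratio are assumptions:

```python
COMPACT_WINDOW = 200_000   # CLAUDE_CODE_AUTO_COMPACT_WINDOW
SUMMARY_RATIO = 0.15       # assumed: compaction keeps ~15% of the window as a summary

def apply_turn(context_tokens: int, new_tokens: int) -> int:
    """Grow context by one exchange, compacting when it crosses the window."""
    context_tokens += new_tokens
    if context_tokens > COMPACT_WINDOW:
        # Older turns get summarized; context collapses to a fraction of the window
        context_tokens = int(COMPACT_WINDOW * SUMMARY_RATIO)
    return context_tokens

context = 0
for turn in range(60):
    context = apply_turn(context, 4_000)
print(context)  # bounded: context never runs away past the window
```

The point is not the exact numbers but the shape: without a window, context grows without bound and every request gets more expensive; with one, cost per request stays capped.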
The Complete settings.json Fix
Here's the full configuration that doubled our weekly limit:
{
  "model": "claude-opus-4-6",
  "effortLevel": "high",
  "autoUpdatesChannel": "stable",
  "alwaysThinkingEnabled": false,
  "env": {
    "DISABLE_AUTOUPDATER": "1",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "200000"
  }
}
Two settings worth explaining:
alwaysThinkingEnabled: false - This was forcing extended thinking tokens on every request, even trivial ones. Extended thinking is powerful for complex architectural decisions - we use it when Claude needs to reason across domain, application, infrastructure, and presentation layers. But burning thinking tokens on "add a semicolon" is pure waste.
DISABLE_AUTOUPDATER: 1 - After the cache bug in versions 2.1.110-2.1.117, we pinned to a stable version. Auto-updates to buggy CLI versions can silently inflate your token spend for days before you notice.
One Thing I Removed
We had ENABLE_PROMPT_CACHING_1H in our env vars. Turns out Max subscription already gets 1-hour prompt cache TTL automatically. This variable was redundant - and according to the community, may have actually added overhead rather than helping.
The Result
| Before | After |
|---|---|
| Weekly limit exhausted in 3-4 days | Weekly limit lasts 6-7 days |
| Frequent "limit reached" mid-session | Rare limit hits, usually end of week |
| Long sessions degrading quality | Shorter, higher-quality sessions |
What Did NOT Change
Anthropic did not restore previous limits. This is the new normal.
These settings minimize your token spend within the current pricing model. If Anthropic changes limits again, the fundamentals still apply: control your model cost, control your context size, avoid buggy CLI versions.
Quick Checklist
Before your next session, verify:
- [ ] CLI version >= 2.1.118 (claude --version)
- [ ] Model pinned to claude-opus-4-6 (or claude-sonnet-4-6 for lighter tasks)
- [ ] Effort level set to high, not xhigh
- [ ] alwaysThinkingEnabled is false
- [ ] Auto-compact window set to 200k
- [ ] 1M context disabled
- [ ] Auto-updater disabled (update manually when stable)
- [ ] No redundant ENABLE_PROMPT_CACHING_1H env var
We're still pushing Claude Code hard every day on PaperLink - if we find more optimizations, we'll update this post.
What's the biggest Claude Code frustration you've hit? Drop your experience in the comments - curious how common these patterns are.