Anthropic's April 1 communication about token usage changes referenced "7% of users" seeing higher costs during peak hours. That framing buried the real story. The policy change is minor. Two cache bugs are not.
The actual problem
Two separate bugs are causing 10–20x token inflation for power users, and neither has anything to do with peak-hour pricing:
Bug 1 — Sentinel replacement bug (standalone binary)
If you installed Claude Code via the 228MB standalone installer, a sentinel replacement bug triggers a full cache rebuild on every request. Your context isn't being read from cache — it's being rewritten from scratch each time. The standalone binary distribution is the trigger. This is tracked in GitHub issue #40524.
Bug 2 — Resume session bug (v2.1.69+)
If you use --resume or --continue flags on v2.1.69 or later, a cache prefix mismatch causes the entire conversation history to be rewritten instead of read. The cache exists. Claude just isn't reading it correctly — it's appending instead of resuming.
If you run autonomous loops with --continue, or regularly resume long sessions, you're likely burning 10–20x what you should be. The 7% headline does not describe your situation.
Who is affected
- Anyone using the standalone binary installer (not npm)
- Anyone running --resume or --continue on v2.1.69+
- Power users with long sessions, multi-turn autonomous workflows, or heavy CLAUDE.md files
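If you're not sure whether your install falls in the resume-bug range, a version comparison settles it. The helper below is illustrative, not an official diagnostic — compare it against whatever `claude --version` prints on your machine.

```shell
# Illustrative helper (not an official diagnostic): reports "affected"
# when a Claude Code version falls in the resume-bug range (v2.1.69+).
resume_bug_range() {
  min=2.1.69
  if [ "$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)" = "$min" ]; then
    echo affected
  else
    echo ok
  fi
}

# Feed it your installed version, e.g.:
#   resume_bug_range "$(claude --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')"
resume_bug_range 2.1.70   # prints: affected
```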
If you're running Claude Code via npm and never use resume flags, your exposure is limited to the peak-hour policy change, which is comparatively small.
5 workarounds you can apply today
1. Switch from standalone binary to npm
npm install -g @anthropic-ai/claude-code
This sidesteps the sentinel replacement bug entirely. If you installed via the 228MB standalone installer, uninstall it and switch to the npm package. Same CLI, different distribution path, no cache rebuild on every request.
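To confirm which distribution you're actually running after the switch, one heuristic (assuming the usual npm global layout — this is not an official check) is that the npm package resolves through a node_modules directory, while the standalone binary does not.

```shell
# Heuristic check (assumes the usual npm global layout): the npm package
# lives under a node_modules directory; the standalone binary does not.
is_npm_install() {
  case "$1" in
    *node_modules*) echo npm ;;
    *) echo standalone-or-other ;;
  esac
}

# The bin entry is usually a symlink, so resolve it before checking:
is_npm_install "$(readlink -f "$(command -v claude)")"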
2. Avoid --resume and --continue until patched
Until the fix ships, these flags are a token sink. For tasks that span sessions, use /clear to start a clean context rather than resuming a broken one. The resume flag is supposed to save tokens. Right now it costs them.
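One way to carry state forward without touching the broken resume path is an explicit handoff note: summarize where the last session ended, then feed that into a fresh session. This is a sketch — handoff.md and the prompt wording are hypothetical, and `claude -p` (one-shot print mode) is assumed from Claude Code's CLI; check `claude --help` for the exact flag on your version.

```shell
# Sketch: carry state via an explicit handoff note instead of --continue.
# handoff.md is a hypothetical file you (or the previous session) maintain.
build_handoff_prompt() {
  printf 'Continue this work. Notes from the previous session:\n%s\n' "$(cat "$1")"
}

# Then start a fresh session with the assembled prompt, e.g.:
#   claude -p "$(build_handoff_prompt handoff.md)"
```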
3. Shrink your CLAUDE.md to under 800 tokens
CLAUDE.md loads on every session start, before cache has anything to offer. If your file is 3,000 tokens of instructions, that's 3,000 tokens of cold-load on every session. When the cache breaks, that entire block gets reloaded repeatedly.
Audit what's actually required at session start versus what can be loaded on demand. Cut references to documentation you rarely need. Move heavy reference material to separate files and load them explicitly when relevant. A lean CLAUDE.md is a structural hedge against cache failures — not just an optimization.
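To know whether you're near the 800-token mark, a rough size check is enough. The snippet below uses the common ~4-characters-per-token heuristic — an approximation, not the tokenizer Claude actually uses.

```shell
# Rough token estimate: ~4 characters per token. This is a common
# approximation, not the tokenizer Claude actually uses.
estimate_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}

if [ -f CLAUDE.md ]; then
  echo "CLAUDE.md is roughly $(estimate_tokens CLAUDE.md) tokens"
fi
```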
4. Move heavy autonomous work outside 8am–2pm ET
The peak-hour multiplier policy is real, even if smaller than the cache bugs. Autonomous loops that run for hours are better scheduled off-peak. This doesn't fix the bugs, but it reduces the compounding effect of running a token-inflating workflow during an already-expensive window.
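A small helper makes the window concrete. The 8am–2pm ET boundary comes from the policy described above; the function itself is just illustrative glue you might drop into a scheduling script.

```shell
# Illustrative: classify an ET hour (0-23) against the 8am-2pm peak window.
peak_status() {
  if [ "$1" -ge 8 ] && [ "$1" -lt 14 ]; then echo peak; else echo off-peak; fi
}

# Current hour in ET (strip a leading zero so it reads as decimal):
hour=$(TZ=America/New_York date +%H)
peak_status "${hour#0}"
```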
5. Measure your actual burn
npx ccusage@latest
Run this before and after applying the workarounds above. You can't manage what you're not measuring. ccusage pulls your actual usage data and gives you a clear picture of where tokens are going.
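Once you have a before and an after number from ccusage, quantifying the drop is one line of arithmetic. The helper below is just that — the token counts themselves come from ccusage's output, and the example figures are made up for illustration.

```shell
# Percent change between two token counts (bash integer arithmetic,
# so the result is truncated toward zero).
pct_change() {
  echo $(( ($2 - $1) * 100 / $1 ))
}

pct_change 1200000 90000   # prints: -92  (a 10x+ drop, in percent)
```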
The structural principle: lazy-load your context
The cache bugs hit hardest when there's a large baseline context — instructions, personas, reference material — all loaded upfront regardless of what you're actually doing. The more you front-load, the more you lose when cache breaks.
The pattern that holds up under cache failure is lazy-loading: only load what the current task requires, defer everything else. If you want to see this in practice, PRISM Forge takes this approach with 23 personas — none are loaded at session start, each is loaded on demand when a signal fires. That's not a fix for the cache bug, but it means the baseline context stays small, so cache failures are cheaper.
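A minimal sketch of the pattern, in plain shell: load a context file only when the task text contains its trigger word, instead of concatenating everything at session start. File names and trigger words here are hypothetical, and PRISM Forge's actual mechanism may differ — this only shows load-on-signal versus load-upfront.

```shell
# Load a persona file only when the task mentions its trigger word.
# personas/*.md and the trigger list are hypothetical examples.
load_context() {
  task="$1"
  for persona in security database performance; do
    case "$task" in
      *"$persona"*) cat "personas/$persona.md" ;;
    esac
  done
}

# e.g. load_context "audit the database schema"
# emits only personas/database.md; an unrelated task emits nothing.
```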
Where this lands
Anthropic is actively working the fix — GitHub #40524 is open and patches are in progress. These bugs will be resolved.
The hygiene habits are worth keeping anyway. Lean CLAUDE.md, lazy-loaded context, measured token usage — these aren't workarounds for a temporary bug. They're how you keep costs predictable as sessions get longer and workflows get more autonomous.
The 7% framing will probably age badly once the cache fixes ship and people can see what was actually happening. Build your setup like you're the 10–20x case, because for a subset of power users, that's exactly what it is.