Claude Code Is Silently Burning 10-20x Your Token Budget — Here's the Fix

#claudecode #ai #devtools #productivity

If you're on Claude Max and wondering why your usage cap disappears in an hour, you're not alone. I went down this rabbit hole after burning through my entire daily quota in a single coding session.

The Problem

There are two independent bugs inflating Claude Code token usage right now:

Bug 1: Broken prompt caching in the standalone binary

The standalone Claude Code binary (installed via claude.ai/install.sh or npm install -g) ships with Anthropic's custom Bun fork. Baked into this fork is a native-layer string replacement that modifies the JSON request body after serialization but before TLS encryption. It targets a billing attribution sentinel (cch=24b72) and replaces it with a 5-character hash.

The issue: it replaces the first occurrence in the serialized body. Since messages[] comes before system[] in the JSON, if your conversation history contains the sentinel string (from discussing billing, reading CC source, or having it in your CLAUDE.md), the wrong instance gets replaced. This changes the message content on every request, breaking the cache prefix and forcing a full rebuild — roughly $0.04-0.15 per request wasted.

Bug 2: MCP connector tool schema injection

Cloud MCP connectors (Ahrefs, Supabase, Similarweb, etc.) inject their complete tool definition schemas into every API call, regardless of whether you're using them. Ahrefs alone adds 100+ tool definitions. That's tens of thousands of tokens of overhead on every single message.

The Fixes

Switch to npx to bypass the Bun binary:

# One-time alias setup
echo "alias claude='npx @anthropic-ai/claude-code'" >> ~/.bashrc
source ~/.bashrc

This runs Claude Code through Node instead of the custom Bun fork, sidestepping the sentinel replacement bug entirely. Only downside is slightly slower startup (npx checks the registry each time).

Disconnect unused MCP connectors from Claude Settings → Connectors. Keep only what you actively use. Disconnected connectors show "Needs authentication" and stop injecting tool schemas.

Other mitigations:

Use /compact to compress context mid-session
Avoid --resume (can inherit broken cache from previous sessions)
Start fresh sessions frequently
Keep your CLAUDE.md lean

Status

Anthropic is aware of both issues. The cache bug is tracked at anthropics/claude-code#24147 and #40524. No public fix date yet, but the word is it's coming.

Credit to Paweł Huryn and Alex Volkov for surfacing this publicly on X.

Top comments (1)

Harjot Singh • May 31

10-20x is dramatic but believable, and the "silently" is the real crime - the waste is invisible because the tool re-sends context, re-reads files, and reprocesses history without ever surfacing that you're paying for the same tokens over and over. If your fix is around prompt caching, scoped reads, or compacting the session, that's exactly the right lever: the budget evaporates on input/overhead, not on the code you actually wanted.

The structural endgame of this fix is to stop one model from carrying all of it - route the mechanical bulk to cheap models and reserve the premium budget for the hard parts. That's how Moonshift keeps a full build to ~$3 flat: a multi-agent pipeline (prompt to a shipped SaaS on your own GitHub + Vercel) where context is scoped per agent and routing enforces the cheap-80/expensive-20 split. First run's free, no card. Strong post - is your fix mostly caching/context-trimming, or did you also change how you structure tasks to avoid the re-read loop? Curious how much of the 10-20x was pure context bloat.