Your Claude Code bill went up twice in 30 days. You felt it on the invoice before you understood why.
On June 15, 2026, Anthropic moves Agent SDK, claude -p, and Claude Code GitHub Actions onto a separate metered credit pool that does not roll over. Once the pool drains, you pay full API rates.
At the same time, the new Opus 4.7 tokenizer reports about 1.46x as many text tokens as 4.6 at the same per-token price. Simon Willison flagged it as "actually a pretty big price bump." Image content can hit 3.01x. Same prompt, same dollars per token, more tokens per request.

Five open-source repos fight back. I ranked them by max stated savings, with install commands and where each percentage actually comes from. Every number in this post is vendor-stated. Real savings shift with codebase size, MCP server count, and how often your sessions repeat work.
Why your bill is climbing right now
Three things are stacking on top of each other.
First, the June 15 split. Programmatic Claude Code usage gets its own dedicated budget instead of sharing the chat pool. The Pro plan ships $20 of Agent SDK credit, Max 5x ships $100, Max 20x ships $200. None of it rolls over. Interactive Claude Code in your terminal is unaffected.
Second, the Opus 4.7 tokenizer. Same price per token, more tokens per request. Willison's testing measured 1.46x for plain text and up to 3.01x for images. A 15MB PDF only inflated 1.08x, so the impact varies with content type.
Third, Fast Mode now defaults to Opus 4.7 in recent Claude Code releases. Faster, smarter, and quietly more expensive per request than the 4.6 baseline you had a month ago.
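The tokenizer effect is easy to put in numbers. A back-of-the-envelope sketch, using the multipliers quoted above; the request mix and base token counts are made up for illustration:

```python
# Vendor-stated tokenizer multipliers for Opus 4.7 vs 4.6 (quoted above).
MULTIPLIERS = {"text": 1.46, "image": 3.01, "pdf": 1.08}

def inflated_tokens(base_tokens: dict[str, int]) -> tuple[int, int]:
    """Return (old_total, new_total) token counts for a request mix."""
    old = sum(base_tokens.values())
    new = sum(round(n * MULTIPLIERS[kind]) for kind, n in base_tokens.items())
    return old, new

# A made-up session: mostly text, some image content.
old, new = inflated_tokens({"text": 40_000, "image": 5_000})
print(old, new, f"{new / old:.2f}x")  # effective per-request bump at the same price
```

Same per-token price, but the blended multiplier lands wherever your content mix puts it.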
The fix is not "use Sonnet for everything." The fix is fewer tokens hitting the wire on every call you do make.
The five tools, ranked by max stated savings
Verify each one in your own workflow. Percentages come straight from each repo's README or the third-party listing noted.
- lean-ctx: 60% to 95% reduction across reads, up to 99% on cached reads (vendor README).
- airis-mcp-gateway: up to 97% context token reduction. The 97% figure comes from the VoltAgent awesome-claude-code-subagents listing*, not the repo itself. The repo's own README says only "Token Efficiency: Measurable reduction in initial context overhead" with no number.
- agentmemory: 92% fewer tokens than pasting full context across sessions. Badge sits at the top of the README.
- 9router: 20% to 40% per request via RTK token compression on tool output. Worked example in the README: 47K tokens shrunk to 28K.
- cc-ledger: 0% direct savings. This is the meter. You need it on before you can prove the other four did anything.
(*third-party listing, not vendor)
lean-ctx: compress the inputs
lean-ctx is a single Rust binary that sits between Claude Code and your filesystem. It hooks every file read, every grep, every shell command. Output gets compressed before it reaches the model.
Headline claim is 60% to 95% reduction, with up to 99% on cached reads. The 99% figure assumes cache hits. Cold runs see less.
Three commands stand it up:
curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup
lean-ctx init --agent claude-code
Best fit: workflows where you re-read the same files often. Worst fit: thin sessions with mostly short prompts. The compression overhead pays off only when there is something to compress.
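The cached-read idea behind the "up to 99%" figure can be sketched in a few lines. This is a toy model, not lean-ctx's actual implementation: the first read returns full content, repeat reads of unchanged content return a short reference instead.

```python
import hashlib

class ReadCache:
    """Toy model of cached reads: full content once, a short ref on repeats."""

    def __init__(self):
        self.seen: dict[str, str] = {}

    def read(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode()).hexdigest()[:12]
        if self.seen.get(path) == digest:
            return f"<unchanged:{path}#{digest}>"  # roughly constant size on a hit
        self.seen[path] = digest
        return content

cache = ReadCache()
big_file = "x" * 10_000
first = cache.read("src/main.rs", big_file)   # cold read: full 10,000 chars
second = cache.read("src/main.rs", big_file)  # cache hit: a ~36-char reference
print(len(first), len(second))
```

On a hit, the model sees a stub instead of the file, which is where near-total savings on repeated reads come from. Cold runs still pay full price.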
airis-mcp-gateway: compress the tool listings
If your Claude Code talks to Sentry, GitHub, Linear, Postgres, and a couple of other MCP servers, the system prompt pays a tax for every tool listing on every turn. airis-mcp-gateway aggregates many MCP servers behind a single SSE endpoint with intelligent routing and on-demand lifecycle management.
The 97% figure that travels with this repo comes from the VoltAgent awesome-claude-code-subagents listing. The repo itself is more conservative. Read both before you quote a number to your team.
Production install:
curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh | bash
Skip this one if your .mcp.json lists fewer than five servers. The savings come from collapsing tool listings, and a slim setup has nothing to collapse.
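The tool-listing tax is simple to model. All numbers here are hypothetical (real per-tool listing sizes depend on schema verbosity), but the shape holds: many servers each advertising many tools pay overhead every turn, and a gateway that exposes only what the turn needs collapses it.

```python
def listing_overhead(servers: int, tools_per_server: int,
                     tokens_per_tool: int = 150) -> int:
    """Tokens spent on tool listings in the system prompt, paid every turn."""
    return servers * tools_per_server * tokens_per_tool

full = listing_overhead(servers=7, tools_per_server=20)  # 21,000 tokens per turn
gated = listing_overhead(servers=1, tools_per_server=5)  # 750 tokens per turn
print(full, gated, f"{1 - gated / full:.0%} saved")
```

This is also why a two-server setup gains little: there is not enough overhead to collapse.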
agentmemory: stop paying for re-explaining your project
Every new Claude Code session starts fresh. You re-paste the stack notes, the rules, the patterns. agentmemory kills that cost. It captures what the agent does via hooks, compresses into searchable observations, and injects relevant prior context into future sessions.
The README claims 92% fewer tokens against the worst-case "paste full context every session" baseline. Worked comparison in the repo: about 170K tokens per year (around $10) with agentmemory versus 19.5M+ tokens pasting full context manually.
Install:
npx @agentmemory/agentmemory
If you already use Claude Code's /resume and a tight CLAUDE.md, the marginal savings are smaller than the badge implies. Still useful. Just not 92% useful.
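A toy version of the mechanism, not agentmemory's actual code: record short observations as you work, then inject only the ones relevant to the next session's first prompt instead of the whole project brief.

```python
class MemoryStore:
    """Toy session memory: store short observations, retrieve by keyword overlap."""

    def __init__(self):
        self.observations: list[str] = []

    def record(self, note: str) -> None:
        self.observations.append(note)

    def relevant(self, prompt: str, limit: int = 3) -> list[str]:
        words = set(prompt.lower().split())
        scored = [(len(words & set(o.lower().split())), o) for o in self.observations]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [obs for score, obs in scored[:limit] if score > 0]

mem = MemoryStore()
mem.record("project uses pnpm not npm")
mem.record("tests live in tests/unit and run with vitest")
mem.record("database is postgres 16 via docker compose")
print(mem.relevant("fix the failing vitest tests"))
```

The savings come from injecting a few matched lines instead of the full context dump, which is exactly the baseline the 92% figure compares against.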
9router: compress the outputs and arbitrage providers
9router is a multi-provider router with two tricks. The first is RTK Token Saver, which auto-compresses tool_result content like git diff, grep, find, ls, tree, and log output. The README quotes 20% to 40% per request and shows a worked example: 47K tokens without RTK, 28K tokens with it, a 40% cut on that one call.
The second trick is provider routing. 9router fronts 40+ providers including Kiro AI (Claude 4.5), OpenCode Free, Vertex AI's $300 credit pool, GLM at $0.6/1M, MiniMax at $0.2/1M, and Kimi at $9/month.
Install:
npm install -g 9router
9router
Caveat worth saying out loud. Routing to non-Anthropic providers changes the trust profile, the latency, the model quality, and the data handling. Read the provider's terms before you push real client code through it.
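For a feel of what output compression does, here is a hedged sketch in the spirit of RTK, not 9router's actual algorithm: drop blank and duplicate lines from verbose tool output and truncate very long ones, accepting some loss.

```python
def compress_tool_output(text: str, max_line: int = 120) -> str:
    """Toy compressor for verbose tool output (ls, find, log dumps):
    drop blanks and exact duplicate lines, truncate long ones. Lossy by design."""
    seen, out = set(), []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line in seen:
            continue
        seen.add(line)
        out.append(line[:max_line] + ("…" if len(line) > max_line else ""))
    return "\n".join(out)

# Repetitive listings compress hard; varied diffs compress far less.
raw = "\n".join(["node_modules/.bin/tsc"] * 50 + ["src/app.ts", "", "src/app.ts"])
slim = compress_tool_output(raw)
print(len(raw), len(slim))
```

That spread is why the vendor quotes a 20% to 40% range per request rather than one number: the ratio depends entirely on how repetitive the tool output is.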
cc-ledger: see what you spent
You cannot manage what you cannot see. cc-ledger captures every Claude Code edit, prompt, and per-turn token cost via Claude Code hooks. It writes to ~/.cc-ledger/ledger.db and tracks five token classes per turn: input, output, cache_read, cache_write_5m, and cache_write_1h.
Those classes match Anthropic's own pricing model. Cache reads bill at 0.10x input rates. 5-minute cache writes bill at 1.25x. 1-hour cache writes bill at 2x. Without a ledger, prompt caching feels invisible. With one, you see the line move.
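Those multipliers make per-turn cost a small arithmetic exercise. A sketch, where only the cache multipliers come from the pricing above; the $/1M rates and token counts are placeholders, not a quote of current Anthropic pricing:

```python
def turn_cost(tokens: dict[str, int],
              input_per_m: float, output_per_m: float) -> float:
    """Cost of one turn from the five token classes the ledger tracks.
    Cache multipliers are relative to the input rate, per the pricing above."""
    rate = {
        "input": input_per_m,
        "output": output_per_m,
        "cache_read": 0.10 * input_per_m,
        "cache_write_5m": 1.25 * input_per_m,
        "cache_write_1h": 2.00 * input_per_m,
    }
    return sum(n / 1_000_000 * rate[cls] for cls, n in tokens.items())

# Placeholder $15/$75 per 1M rates; token counts are made up.
cost = turn_cost(
    {"input": 2_000, "output": 800, "cache_read": 180_000,
     "cache_write_5m": 20_000, "cache_write_1h": 0},
    input_per_m=15.0, output_per_m=75.0,
)
print(f"${cost:.4f}")
```

Note how cache reads dominate the token count but not the cost. That inversion is what stays invisible without a ledger.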
Install:
curl -fsSL https://ccledger.dev/install | bash
cc-ledger also computes "shadow billing": an estimate of what your subscription usage would have cost at API rates. The repo sat at six stars on May 15, 2026, so it is early-stage. Use it for visibility, not as a billing system of record.
The do-this-in-order recipe
Stack the savings in this order. Each step compounds on the last, and the ledger lets you see what each step actually saved.
- Install cc-ledger first. You need a baseline. Run it, then work normally for one day. Note the daily spend.
- Install agentmemory. This kills the cost of re-explaining your project on every new session.
- Install lean-ctx. This compresses every file read and shell command before it hits the model.
- Add airis-mcp-gateway only if you have five or more MCP servers configured. Otherwise skip it.
- Add 9router only if you are willing to route to non-Anthropic providers. Highest impact for Pro users on tight budgets, also the most disruptive change to your workflow.
- Re-check cc-ledger after one week. Compare against your baseline.
One install at a time, with the ledger between each step. That's how you tell which tool moved the line.
What Anthropic itself recommends (free)
Before any third-party tool, the free moves from Anthropic's own cost guide:
- Run /clear between unrelated tasks so context does not balloon.
- Run /compact with custom instructions to keep only what the next step needs.
- Set MAX_THINKING_TOKENS=8000 so extended thinking has a ceiling.
- Prefer CLI tools over MCP servers when the CLI is installed already.
- Move CLAUDE.md detail into skills, so the bytes load only when needed.
Anthropic also publishes a "$13 per developer per active day" enterprise benchmark, and notes that agent teams use about 7x more tokens than standard sessions. Worth knowing before you launch a fleet of subagents on a tight budget.
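Those two numbers combine into a quick budget check. The $13/day and 7x figures are Anthropic's as quoted above; the team size and day count below are mine:

```python
BASELINE_PER_DEV_DAY = 13.0   # Anthropic's enterprise benchmark, quoted above
AGENT_TEAM_MULTIPLIER = 7     # agent teams vs standard sessions, quoted above

def monthly_estimate(devs: int, active_days: int, agent_teams: bool = False) -> float:
    """Rough monthly spend: devs x active days x per-day benchmark."""
    per_day = BASELINE_PER_DEV_DAY * (AGENT_TEAM_MULTIPLIER if agent_teams else 1)
    return devs * active_days * per_day

print(monthly_estimate(5, 20))                    # 1300.0
print(monthly_estimate(5, 20, agent_teams=True))  # 9100.0
```

A five-person team that flips on agent teams goes from four figures to five figures a year. Run the numbers before you launch the fleet.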
Risks and caveats
A few honest gotchas, in order of how often they bite people.
Every percentage above is vendor-stated. The 97% airis figure comes from a third-party listing, not the repo itself. The 99% lean-ctx figure assumes cache hits. The 92% agentmemory figure compares against the worst-case baseline. The 40% 9router figure is one worked example, not a benchmark.
9router routes traffic to non-Anthropic providers. That changes trust, latency, quality, and data handling.
cc-ledger is early-stage. Use it for visibility, not as a billing system of record.
The June 15 split is recent. Anthropic has changed billing twice in two months. Check the official pricing page before making subscription decisions based on this post.
Stacking five tools adds operational surface area. Bash install, Docker, npm global, npx daemon, hook scripts. Install one at a time and re-measure with cc-ledger between each.
None of these tools is endorsed by Anthropic. They are community projects.
The bill went up twice in 30 days. The fix is not one tool. The fix is a stack with a meter on top. Install the ledger first, then layer the rest.
Full breakdown with every source link: https://www.buildthisnow.com/blog/guide/mechanics/cut-claude-code-token-costs