DEV Community

Phasu Yeneng

9 Verified Tools to Stop Burning Claude Tokens Unnecessarily

You're not using Claude more — you're just wasting more context.

I went looking for real, working tools after seeing a widely-shared list that mixed legitimate repos with hallucinated ones. This article only covers tools I could verify on GitHub, organized by the type of waste they fix.


Why tokens disappear faster than you expect

Before the tools: understanding where tokens actually go.

Most developers assume their prompts are the main cost. They're not. The real culprits are:

  • Verbose model output — Claude explaining what it's about to do, then doing it, then summarizing what it did
  • Raw CLI output — dumping full git log, npm install, or test runner output directly into context
  • Bloated CLAUDE.md — this file loads on every turn before Claude reads a single line of your code. A 5,000-token CLAUDE.md costs 5,000 tokens per message, before you've typed a word
  • Code navigation by content — when Claude reads entire files to find one function instead of navigating by symbol
  • Ghost tokens — leftover context from earlier in the session that no longer contributes to the task but still costs money every turn

Each category has a different fix. Here's what actually works.


Category 1 — Shrink what Claude writes back

Caveman

The simplest idea with surprisingly good results: make Claude talk like a caveman. Short words. No filler. Still technically accurate.

It ships as a Claude Code skill that cuts roughly 65–75% of output tokens. The compression is aggressive but the signal stays intact: you get "fix auth bug in login.js line 42" instead of three paragraphs explaining what the fix does.

Works as a plugin for Cursor, Windsurf, Cline, and others too.

# install as a Claude Code skill
# see: https://github.com/juliusbrussee/caveman

Best for: Long coding sessions where Claude's explanations are eating your context budget.


claude-token-efficient

Single CLAUDE.md file. Drop it into your project. Done.

It bakes response-terseness rules directly into Claude's instructions, forcing shorter output on heavy workflows without you having to prompt for it every time. No code changes, no new dependencies.
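The repo's actual file isn't reproduced here, but terseness rules of this kind typically look like the following (a hypothetical sketch, not the project's contents):

```markdown
## Response style (illustrative terseness rules)
- Lead with the answer; explain only when asked.
- No preamble ("I'll now...") and no closing summaries.
- Prefer code blocks and diffs over prose descriptions.
- Report completed changes as one-line, commit-style summaries.
```

Because CLAUDE.md is read on every turn, rules placed here apply to every response without any per-prompt reminders.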

Best for: Projects where you want a permanent "be concise" baseline without running an extra tool.


Category 2 — Compress what you send in

RTK (Rust Token Killer)

A CLI proxy that filters terminal output before it reaches Claude. Instead of Claude seeing 2,000 tokens of raw git status, it sees ~200 tokens of the relevant parts.

A shell hook transparently rewrites commands — git status becomes rtk git status — and Claude never sees the rewrite, just the compressed result.

# without rtk
git status  → ~2,000 tokens raw output

# with rtk
rtk git status  → ~200 tokens filtered output

Claims 60–90% reduction on common dev commands. Single Rust binary, zero dependencies.
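RTK itself is a Rust binary, but the core move, filtering a command's output down to the signal before it enters context, can be sketched in a few lines of Python (the filtering rule here is my own illustration, not RTK's actual logic):

```python
def compress_git_status(porcelain: str) -> str:
    """Collapse `git status --porcelain` output into a compact summary.

    Porcelain format is already one line per changed file; this keeps
    only those lines plus a count, dropping all surrounding prose.
    """
    lines = [ln.strip() for ln in porcelain.splitlines() if ln.strip()]
    if not lines:
        return "clean"
    return f"{len(lines)} changed: " + ", ".join(lines)
```

Conceptually, piping `git status --porcelain` through a filter like this is what the `rtk git status` rewrite amounts to: the model sees the file list, not the boilerplate.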

Best for: Teams running agentic workflows where Claude executes a lot of shell commands.


Headroom

A context optimization layer that sits between your app and the LLM. Three compression modes:

  • SmartCrusher — JSON compression
  • CodeCompressor — AST-aware code compression (understands structure, not just text)
  • Kompress-base — general text compression

The AST-aware approach is the interesting one. It doesn't just truncate code — it understands which parts of a file are structurally relevant and compresses accordingly.
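Headroom's internals aren't shown in its README excerpted here, but "AST-aware" compression in general means parsing the code and keeping the structurally load-bearing parts. A minimal Python sketch (my own, not Headroom's CodeCompressor) that keeps signatures and docstrings while eliding function bodies:

```python
import ast

def compress_source(code: str) -> str:
    """Keep the structural skeleton of a Python module (imports,
    signatures, docstrings) and replace function bodies with `...`."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = []
            if ast.get_docstring(node) is not None:
                body.append(node.body[0])  # preserve the docstring
            body.append(ast.Expr(ast.Constant(...)))  # elide the rest
            node.body = body
    return ast.unparse(tree)
```

The point of going through the AST rather than regexes is that the output is still valid, parseable code: Claude can see every signature and contract without paying for a single implementation line.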

Best for: Applications (not just Claude Code) that programmatically build context before sending to any LLM.


claude-token-optimizer

Reusable CLAUDE.md setup prompts that structure your documentation so Claude only loads what it needs per task.

One real-world example from the repo: a RedwoodJS project reduced session start from 11,000 tokens down to 1,300 by restructuring which docs load when.

Best for: Projects with large documentation that Claude currently loads all at once.


Category 3 — MCP-level optimization

Token Optimizer MCP

Intelligent token optimization for Claude Code via MCP. Claims 95%+ reduction through caching, compression, and tool-usage intelligence: it tracks which tools Claude actually uses and trims the tool definitions it sends accordingly.

Best for: Claude Code users running MCP-heavy workflows.


PromptThrift MCP

Compresses conversation history using a local Gemma 4 model (runs on your machine, no extra API cost) or heuristic fallback. Key feature: pinned facts — you can mark specific context as protected so it survives compression.

Claims 70–90% reduction on conversation history while keeping critical context intact.
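The pinned-facts idea is easy to picture with a heuristic-fallback sketch in Python (the structure and field names are my invention, not PromptThrift's API): older turns are dropped unless pinned, and a placeholder records what was removed; the real tool would have the local model summarize them instead.

```python
def compress_history(turns, pinned_ids, keep_last=4):
    """Keep the most recent turns verbatim, keep pinned older turns,
    and collapse everything else into a single placeholder entry."""
    recent = turns[-keep_last:]
    older = turns[:-keep_last]
    kept = [t for t in older if t["id"] in pinned_ids]
    dropped = len(older) - len(kept)
    summary = {"id": "summary",
               "text": f"[{dropped} earlier turns compressed]"}
    return kept + [summary] + recent
```

The pinning step is what separates this from naive truncation: a decision made in turn 2 ("we use PostgreSQL, not SQLite") survives even when turn 2 itself is long gone.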

Best for: Long multi-turn sessions where early context becomes expensive dead weight.


Token Savior

MCP server specifically built for code navigation. Instead of Claude reading entire files to find a function, it indexes your codebase by symbol — functions, classes, imports, call graph — and navigates by pointer.

Also includes a persistent memory engine that stores decisions, conventions, and session summaries in SQLite and re-injects them as a compact delta at session start.
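Symbol indexing of this kind is straightforward to sketch. A minimal Python version (my own illustration, not Token Savior's implementation) maps each function and class to its file and line, so a lookup costs a pointer instead of a full-file read:

```python
import ast

def index_symbols(source: str, path: str) -> dict:
    """Map every function/class name in a Python file to (path, line),
    so later lookups can jump straight to the definition."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            index[node.name] = (path, node.lineno)
    return index
```

Built once per file and kept warm across sessions, an index like this turns "read 400 lines to find `validate_token`" into "open file X at line N".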

Claims 97% reduction on code navigation across 170+ real sessions.

Best for: Large codebases where Claude spends significant tokens just finding the right file and function.


Category 4 — Clean up ghost tokens

Token Optimizer

Targets "ghost tokens" — context that's still technically in the window but no longer relevant to the current task. Also helps survive context compaction without losing quality.

More of a diagnostic and cleanup tool than a compression layer.

Best for: Sessions that run long and accumulate stale context from earlier tasks.


The free fix most people skip

Before installing anything: audit your CLAUDE.md.

According to Claude Code's official documentation, CLAUDE.md loads before every response, before Claude reads your code, before anything else. It's the most expensive file in your project on a per-token basis.

The recommended limit is under 200 lines. If yours is longer, move sections into on-demand skill files that only load when invoked. Most CLAUDE.md files I've seen in the wild are 3–5x longer than they need to be.
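A restructured layout might look like this (the file names and sections are illustrative, not a prescribed convention):

```markdown
# CLAUDE.md — slim, always-loaded baseline
## Project
TypeScript monorepo; pnpm workspaces; tests via vitest.
## Hard rules
- Never edit generated files under dist/.
- Run `pnpm test` before declaring a task done.
## On-demand references (loaded only when invoked)
- Styling conventions → .claude/skills/style/
- Deployment runbook → .claude/skills/deploy/
```

Everything under "On-demand references" costs zero tokens until Claude actually needs it, which is the whole trick.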


Summary

| Tool | What it fixes | Claimed reduction |
| --- | --- | --- |
| Caveman | Verbose model output | ~65–75% output tokens |
| claude-token-efficient | Verbose model output | Drop-in terseness |
| RTK | Raw CLI output in context | 60–90% on shell commands |
| Headroom | Input context size | AST-aware compression |
| claude-token-optimizer | Doc structure / loading | 11k → 1.3k session start |
| Token Optimizer MCP | MCP tool definitions | 95%+ |
| PromptThrift MCP | Conversation history | 70–90% |
| Token Savior | Code navigation | ~97% |
| Token Optimizer | Ghost tokens / session health | Varies |

The numbers above come from each project's own documentation — treat them as upper bounds under ideal conditions, not guarantees.

Pick one from Category 1 and one from Category 2. Those two changes alone will have the most impact on day-to-day cost for most workflows.
