DEV Community

Phasu Yeneng

9 Verified Tools to Stop Burning Claude Tokens Unnecessarily

You're not using Claude more — you're just wasting more context.

I went looking for real, working tools after seeing a widely-shared list that mixed legitimate repos with hallucinated ones. This article only covers tools I could verify on GitHub, organized by the type of waste they fix.


Why tokens disappear faster than you expect

Before the tools: understanding where tokens actually go.

Most developers assume their prompts are the main cost. They're not. The real culprits are:

  • Verbose model output — Claude explaining what it's about to do, then doing it, then summarizing what it did
  • Raw CLI output — dumping full git log, npm install, or test runner output directly into context
  • Bloated CLAUDE.md — this file loads on every turn before Claude reads a single line of your code. A 5,000-token CLAUDE.md costs 5,000 tokens per message, before you've typed a word
  • Code navigation by content — when Claude reads entire files to find one function instead of navigating by symbol
  • Ghost tokens — leftover context from earlier in the session that no longer contributes to the task but still costs money every turn

Each category has a different fix. Here's what actually works.


Category 1 — Shrink what Claude writes back

Caveman

The simplest idea with surprisingly good results: make Claude talk like a caveman. Short words. No filler. Still technically accurate.

It ships as a Claude Code skill that cuts roughly 65–75% of output tokens. The compression is aggressive but the signal stays intact: you get "fix auth bug in login.js line 42" instead of three paragraphs explaining what the fix does.

Works as a plugin for Cursor, Windsurf, Cline, and others too.

# install as a Claude Code skill
# see: https://github.com/juliusbrussee/caveman

Best for: Long coding sessions where Claude's explanations are eating your context budget.


claude-token-efficient

Single CLAUDE.md file. Drop it into your project. Done.

It bakes response-terseness rules directly into Claude's instructions, forcing shorter output on heavy workflows without you having to prompt for it every time. No code changes, no new dependencies.
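The repo's actual file isn't reproduced here, but terseness rules of this kind typically look like the following (a hypothetical sketch, not the project's contents):

```markdown
## Response style (illustrative terseness rules)
- Lead with the answer; explain only when asked.
- No preamble ("I'll now...") and no closing summaries.
- Prefer code blocks and diffs over prose descriptions.
- Report completed changes as one-line, commit-style summaries.
```

Because CLAUDE.md is read on every turn, rules placed here apply to every response without any per-prompt reminders.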

Best for: Projects where you want a permanent "be concise" baseline without running an extra tool.


Category 2 — Compress what you send in

RTK (Rust Token Killer)

A CLI proxy that filters terminal output before it reaches Claude. Instead of Claude seeing 2,000 tokens of raw git status, it sees ~200 tokens of the relevant parts.

A shell hook transparently rewrites commands — git status becomes rtk git status — and Claude never sees the rewrite, just the compressed result.

# without rtk
git status  → ~2,000 tokens raw output

# with rtk
rtk git status  → ~200 tokens filtered output

Claims 60–90% reduction on common dev commands. Single Rust binary, zero dependencies.
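RTK itself is a Rust binary, but the core move, filtering a command's output down to the signal before it enters context, can be sketched in a few lines of Python (the filtering rule here is my own illustration, not RTK's actual logic):

```python
def compress_git_status(porcelain: str) -> str:
    """Collapse `git status --porcelain` output into a compact summary.

    Porcelain format is already one line per changed file; this keeps
    only those lines plus a count, dropping all surrounding prose.
    """
    lines = [ln.strip() for ln in porcelain.splitlines() if ln.strip()]
    if not lines:
        return "clean"
    return f"{len(lines)} changed: " + ", ".join(lines)
```

Conceptually, piping `git status --porcelain` through a filter like this is what the `rtk git status` rewrite amounts to: the model sees the file list, not the boilerplate.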

Best for: Teams running agentic workflows where Claude executes a lot of shell commands.


Headroom

A context optimization layer that sits between your app and the LLM. Three compression modes:

  • SmartCrusher — JSON compression
  • CodeCompressor — AST-aware code compression (understands structure, not just text)
  • Kompress-base — general text compression

The AST-aware approach is the interesting one. It doesn't just truncate code — it understands which parts of a file are structurally relevant and compresses accordingly.
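Headroom's internals aren't shown in its README excerpted here, but "AST-aware" compression in general means parsing the code and keeping the structurally load-bearing parts. A minimal Python sketch (my own, not Headroom's CodeCompressor) that keeps signatures and docstrings while eliding function bodies:

```python
import ast

def compress_source(code: str) -> str:
    """Keep the structural skeleton of a Python module (imports,
    signatures, docstrings) and replace function bodies with `...`."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = []
            if ast.get_docstring(node) is not None:
                body.append(node.body[0])  # preserve the docstring
            body.append(ast.Expr(ast.Constant(...)))  # elide the rest
            node.body = body
    return ast.unparse(tree)
```

The point of going through the AST rather than regexes is that the output is still valid, parseable code: Claude can see every signature and contract without paying for a single implementation line.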

Best for: Applications (not just Claude Code) that programmatically build context before sending to any LLM.


claude-token-optimizer

Reusable CLAUDE.md setup prompts that structure your documentation so Claude only loads what it needs per task.

One real-world example from the repo: a RedwoodJS project reduced session start from 11,000 tokens down to 1,300 by restructuring which docs load when.

Best for: Projects with large documentation that Claude currently loads all at once.


Category 3 — MCP-level optimization

Token Optimizer MCP

Intelligent token optimization for Claude Code via MCP. Claims 95%+ reduction through caching, compression, and tool-usage intelligence: it tracks which tools Claude actually uses and trims the tool definitions it sends accordingly.

Best for: Claude Code users running MCP-heavy workflows.


PromptThrift MCP

Compresses conversation history using a local Gemma 4 model (runs on your machine, no extra API cost) or heuristic fallback. Key feature: pinned facts — you can mark specific context as protected so it survives compression.

Claims 70–90% reduction on conversation history while keeping critical context intact.
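The pinned-facts idea is easy to picture with a heuristic-fallback sketch in Python (the structure and field names are my invention, not PromptThrift's API): older turns are dropped unless pinned, and a placeholder records what was removed; the real tool would have the local model summarize them instead.

```python
def compress_history(turns, pinned_ids, keep_last=4):
    """Keep the most recent turns verbatim, keep pinned older turns,
    and collapse everything else into a single placeholder entry."""
    recent = turns[-keep_last:]
    older = turns[:-keep_last]
    kept = [t for t in older if t["id"] in pinned_ids]
    dropped = len(older) - len(kept)
    summary = {"id": "summary",
               "text": f"[{dropped} earlier turns compressed]"}
    return kept + [summary] + recent
```

The pinning step is what separates this from naive truncation: a decision made in turn 2 ("we use PostgreSQL, not SQLite") survives even when turn 2 itself is long gone.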

Best for: Long multi-turn sessions where early context becomes expensive dead weight.


Token Savior

MCP server specifically built for code navigation. Instead of Claude reading entire files to find a function, it indexes your codebase by symbol — functions, classes, imports, call graph — and navigates by pointer.

Also includes a persistent memory engine that stores decisions, conventions, and session summaries in SQLite and re-injects them as a compact delta at session start.
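Symbol indexing of this kind is straightforward to sketch. A minimal Python version (my own illustration, not Token Savior's implementation) maps each function and class to its file and line, so a lookup costs a pointer instead of a full-file read:

```python
import ast

def index_symbols(source: str, path: str) -> dict:
    """Map every function/class name in a Python file to (path, line),
    so later lookups can jump straight to the definition."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            index[node.name] = (path, node.lineno)
    return index
```

Built once per file and kept warm across sessions, an index like this turns "read 400 lines to find `validate_token`" into "open file X at line N".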

Claims 97% reduction on code navigation across 170+ real sessions.

Best for: Large codebases where Claude spends significant tokens just finding the right file and function.


Category 4 — Clean up ghost tokens

Token Optimizer

Targets "ghost tokens" — context that's still technically in the window but no longer relevant to the current task. Also helps survive context compaction without losing quality.

More of a diagnostic and cleanup tool than a compression layer.

Best for: Sessions that run long and accumulate stale context from earlier tasks.


The free fix most people skip

Before installing anything: audit your CLAUDE.md.

According to Claude Code's official documentation, CLAUDE.md loads before every response, before Claude reads your code, before anything else. It's the most expensive file in your project on a per-token basis.

The recommended limit is under 200 lines. If yours is longer, move sections into on-demand skill files that only load when invoked. Most CLAUDE.md files I've seen in the wild are 3–5x longer than they need to be.
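A restructured layout might look like this (the file names and sections are illustrative, not a prescribed convention):

```markdown
# CLAUDE.md — slim, always-loaded baseline
## Project
TypeScript monorepo; pnpm workspaces; tests via vitest.
## Hard rules
- Never edit generated files under dist/.
- Run `pnpm test` before declaring a task done.
## On-demand references (loaded only when invoked)
- Styling conventions → .claude/skills/style/
- Deployment runbook → .claude/skills/deploy/
```

Everything under "On-demand references" costs zero tokens until Claude actually needs it, which is the whole trick.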


Summary

| Tool | What it fixes | Claimed reduction |
| --- | --- | --- |
| Caveman | Verbose model output | ~65–75% output tokens |
| claude-token-efficient | Verbose model output | Drop-in terseness |
| RTK | Raw CLI output in context | 60–90% on shell commands |
| Headroom | Input context size | AST-aware compression |
| claude-token-optimizer | Doc structure / loading | 11k → 1.3k session start |
| Token Optimizer MCP | MCP tool definitions | 95%+ |
| PromptThrift MCP | Conversation history | 70–90% |
| Token Savior | Code navigation | ~97% |
| Token Optimizer | Ghost tokens / session health | Varies |

The numbers above come from each project's own documentation — treat them as upper bounds under ideal conditions, not guarantees.

Pick one from Category 1 and one from Category 2. Those two changes alone will have the most impact on day-to-day cost for most workflows.
