Stop Paying for Noise: Trim LLM Tokens from Both Ends of the Pipe

#ai #coding #productivity #claude

The Token Tax You Are Paying

Every time an LLM-powered coding agent runs cargo test or git status, it swallows reams of output. Most of that is noise: progress bars, ANSI escapes, empty lines. You pay for every token. On the other side, verbose model replies add output tokens on top of that. The result is a slow, expensive loop that scales badly.

Two open-source tools attack the problem from opposite ends of the pipe. RTK strips input noise before it reaches the model. caveman forces the model to talk like, well, a caveman. Together they keep more of your token budget for work that matters.

How RTK Compresses the Input Stream

RTK is an OSS CLI proxy. It sits between your terminal and the LLM, reading command output and dropping everything that is not signal.
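To make "dropping everything that is not signal" concrete, here is a minimal Python sketch of that kind of filter. It is not RTK's code and makes no claim about how RTK works internally; the function name strip_noise and the sample output are made up for illustration. It only handles the three noise types named earlier: ANSI escapes, redrawn progress bars, and empty lines.

```python
import re

# ANSI CSI sequences (colours, cursor movement), e.g. "\x1b[32m" or "\x1b[2K".
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_noise(raw: str) -> str:
    """Illustrative filter (hypothetical, not RTK's implementation)."""
    kept = []
    # Split on "\n" only: str.splitlines() would also split on "\r",
    # which we need intact to detect progress-bar redraws.
    for line in raw.split("\n"):
        line = line.rstrip("\r")        # tolerate Windows-style line endings
        line = line.split("\r")[-1]     # a bar redraws via "\r"; keep the last frame
        line = ANSI_CSI.sub("", line).rstrip()
        if line:                        # drop empty and whitespace-only lines
            kept.append(line)
    return "\n".join(kept)


# Made-up sample resembling build output.
sample = (
    "\x1b[32m   Compiling\x1b[0m foo v0.1.0\n"
    "\n"
    "\rBuilding [=>  ] 1/3\rBuilding [==> ] 2/3\rBuilding [===>] 3/3\r\n"
    "   \n"
    "test result: ok. 3 passed\r\n"
)
cleaned = strip_noise(sample)
print(cleaned)
```

A real proxy would also need command-specific rules (for example, collapsing passing tests into a summary), which is where most of the savings on something like cargo test would come from.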

The numbers are stark. Across 2,927 real-world developer commands, RTK saved 10.3M tokens from 11.6M input tokens—an 89.2% reduction [Source]. The tool is not guessing; it is measuring.
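One arithmetic note: dividing the rounded figures gives 88.8%, not 89.2%. The gap is within what rounding to one decimal place allows, so the published percentage was presumably computed from the unrounded token counts. A quick check in Python, using only the numbers quoted above:

```python
saved_m = 10.3   # tokens saved, in millions (rounded figure from the text)
total_m = 11.6   # raw input tokens, in millions (rounded figure from the text)

point_estimate = saved_m / total_m

# Each figure is rounded to one decimal, so the true value lies within +/-0.05M.
lowest = (saved_m - 0.05) / (total_m + 0.05)
highest = (saved_m + 0.05) / (total_m - 0.05)

print(f"from rounded figures: {point_estimate:.1%}")                   # 88.8%
print(f"range allowed by rounding: {lowest:.1%} to {highest:.1%}")     # 88.0% to 89.6%
```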

Per-command compression rates from the RTK website show that the savings vary widely by command, from just under 50% to over 90%:

cargo test: 91.8%
git status: 80.8%
find: 78.3%
grep: 49.5%

The RTK repository describes it as a “CLI proxy that reduces LLM token consumption by 60-90% on common dev commands.” The tool is lightweight and plugs into existing workflows without changing how you run commands.

caveman Takes the Output Side

If RTK handles the flood of input tokens, caveman disciplines the output. It is a Claude Code skill that instructs the model to respond with minimal words. The caveman repository states it “cuts 65% of tokens by talking like caveman.”

The principle is simple: fewer output tokens mean faster completion and lower costs. caveman is meant to keep the substance of the response and strip only the fluff. For routine tasks such as explaining an error or summarising a diff, a 65% saving comes at little cost.

Why Both Sides Matter

Input token reduction is the biggest lever. An 89% drop on commands that run hundreds of times per session rapidly compounds. Output reduction is smaller in absolute terms but still valuable; 65% less output per interaction keeps the conversation tight and responsive.
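A back-of-the-envelope model shows how the two reductions combine. Everything except the 89.2% and 65% figures is an assumption made for illustration: the session sizes and the per-million-token prices are hypothetical, and the model ignores input that RTK does not touch (system prompt, file contents, conversation history).

```python
from dataclasses import dataclass


@dataclass
class Session:
    command_output_tokens: int  # tool output fed to the model as input
    reply_tokens: int           # tokens the model writes back


def session_cost(s: Session, input_price: float, output_price: float,
                 input_cut: float = 0.0, output_cut: float = 0.0) -> float:
    """Cost in dollars; prices are per million tokens."""
    inp = s.command_output_tokens * (1 - input_cut)
    out = s.reply_tokens * (1 - output_cut)
    return (inp * input_price + out * output_price) / 1_000_000


# Hypothetical session and hypothetical prices (not taken from any price list).
session = Session(command_output_tokens=2_000_000, reply_tokens=100_000)
INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0

baseline = session_cost(session, INPUT_PRICE, OUTPUT_PRICE)
input_only = session_cost(session, INPUT_PRICE, OUTPUT_PRICE, input_cut=0.892)
both = session_cost(session, INPUT_PRICE, OUTPUT_PRICE,
                    input_cut=0.892, output_cut=0.65)

print(f"baseline:   ${baseline:.2f}")
print(f"input only: ${input_only:.2f}")
print(f"both ends:  ${both:.2f}")
```

Under these assumptions the input cut accounts for most of the saving, which matches the point above. With a different mix, such as an agent that writes long replies from short command output, the balance would shift toward the output side.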

Using both tools creates a high-efficiency loop: slim input, slim output, same results. Neither tool requires complex configuration, and both are open source: RTK under the MIT licence, caveman under similarly permissive terms.

What Is Missing

The evidence shows each tool works independently. No combined benchmark exists yet. The 65% output figure for caveman comes from the repository description alone; per-task examples would strengthen the case. RTK’s aggregate data is solid, but session-level detail is not published. These gaps do not undermine the core claim—that trimming both ends of the pipe saves meaningful money—but they are worth noting before measuring an integrated setup.

A Grounded Takeaway

If you pay for LLM tokens, you are paying for noise. RTK and caveman attack that noise at the input and output stages respectively. The savings are measurable, and both tools are free to use. Start with RTK—the 89% input reduction is the headline figure—and add caveman when verbose model responses are eating into your budget.

Would you use both tools in the same workflow? The per-tool data suggests it is worth trying.
