Jangwook Kim

Originally published at jangwook.net

I Tried RTK (Rust Token Killer) — A CLI Proxy That Cuts LLM Token Costs 60–90%

A month into using Claude Code, I opened the bill and flinched. The real culprit wasn't the API calls themselves — it was Bash command output packing the context window. find . -name "*.ts" spitting out hundreds of lines, cargo test dumping thousands — all tokens, all billable.

RTK (Rust Token Killer) directly targets that problem. It sits between your AI coding agent and the shell, compressing command output before it ever reaches the LLM. The claim: 60–90% token savings. I ran actual tests to see if that holds up.

What RTK Actually Does

One sentence: command output filter. It takes the results of git status, find, ls -la and strips everything the LLM doesn't need to understand the state of your project.

Four strategies do the work:

  • Smart Filtering: removes irrelevant lines (progress bars, timestamps, repeated headers)
  • Grouping: condenses similar items (28 .ts files instead of 28 individual lines)
  • Truncation: cuts output above a threshold, marks it ...(truncated)
  • Deduplication: removes repeated output patterns
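
To make these strategies concrete, here is a rough shell sketch of three of them (deduplication, grouping, truncation) built from standard tools. It only illustrates the ideas and is not RTK's actual implementation; the function names and the default 20-line threshold are made up for the example.

```shell
# Illustrative only: these helpers mimic the ideas, not RTK's real filters.

# Deduplication: collapse consecutive repeated lines into "line (xN)".
dedupe() {
    uniq -c | awk '{ n = $1; sub(/^ *[0-9]+ /, ""); if (n > 1) print $0 " (x" n ")"; else print $0 }'
}

# Grouping: replace a list of file paths with a count per extension.
group_by_ext() {
    sed 's|.*/||' | awk -F. 'NF > 1 { c[$NF]++ } END { for (e in c) print c[e] " ." e " files" }' | sort
}

# Truncation: keep the first N lines (default 20) and mark the cut.
truncate_lines() {
    max="${1:-20}"
    awk -v max="$max" 'NR <= max { print } END { if (NR > max) print "...(truncated)" }'
}
```

For example, find . -name "*.ts" | group_by_ext turns a long file listing into one line per extension.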

Integration with Claude Code happens via a PreToolUse hook. If you understand how Claude Code hooks work, the design clicks immediately — rtk init -g registers it automatically under ~/.claude/hooks/. After that, when Claude Code runs git status, the hook intercepts it and rewrites it as rtk git status. Claude Code never knows.
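
The rewrite step can be pictured as a small prefixing function. The sketch below is hypothetical: the function name and the three-command allowlist are stand-ins for illustration, and RTK's real hook logic is certainly more involved.

```shell
# Hypothetical sketch of the rewrite idea; not RTK's actual hook code.
# Prefix a command with "rtk" if its first word is in an (assumed) allowlist.
rewrite_command() {
    cmd="$1"
    first_word="${cmd%% *}"
    case "$first_word" in
        git|find|ls) printf 'rtk %s\n' "$cmd" ;;   # assumed subset of supported commands
        *)           printf '%s\n' "$cmd" ;;        # everything else passes through unchanged
    esac
}

rewrite_command "git status"   # prints: rtk git status
rewrite_command "echo hello"   # prints: echo hello
```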

Supported agents: Claude Code, Cursor, Windsurf, Cline, GitHub Copilot CLI, Gemini CLI, Antigravity, Hermes. Single Rust binary, zero runtime dependencies.

I Installed It and Measured

cargo install --git https://github.com/rtk-ai/rtk

Requires the Rust toolchain. On my M3 Mac, compiling took about 1 minute 40 seconds. Verify:

rtk --version
# rtk 0.40.0

Before integrating with Claude Code, I ran manual benchmarks against my blog repository (Astro-based, 258 files).

Test 1: find command

find src/content/blog/ko/ -name "*.md" -type f
  • Original: 15,360 chars
  • RTK: 2,070 chars
  • Savings: 86.5% (99.9% in tokens)
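
For anyone repeating the measurement, the character savings is simply (original - compressed) / original. Here is a small helper for that arithmetic, assuming both character counts were captured with something like wc -c (the function name is made up for this example):

```shell
# savings_pct ORIGINAL COMPRESSED: percent reduction, one decimal place.
savings_pct() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (o - c) / o * 100 }'
}

savings_pct 15360 2070    # the find numbers above: prints 86.5
```

Piping the plain find command into wc -c, and then the same command prefixed with rtk, should give the two inputs.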

Here's what RTK's find output looks like:

28F 1D:

./ claude-agent-sdk-subagents-orchestration-tutorial-2026.md
claude-api-prompt-caching-cost-optimization-guide.md
claude-code-agentic-workflow-patterns-5-types.md
...

It summarizes the count (28F 1D: 28 files, 1 directory) and compresses path prefixes. The dramatic savings come from the fact that find output is mostly repeated path segments.
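
The prefix trick is easy to imitate with awk. This is a rough illustration of the idea, not RTK's algorithm: it prints the shared directory once, then only the part of each path that follows it. The function name and output layout are invented for the example.

```shell
# Illustrative imitation of path-prefix compression; not RTK's real output format.
# Usage: find DIR ... | strip_prefix DIR
strip_prefix() {
    prefix="${1%/}/"          # normalize to exactly one trailing slash
    echo "$prefix"            # print the shared directory once
    awk -v p="$prefix" 'index($0, p) == 1 { print substr($0, length(p) + 1); next } { print }'
}
```

With 28 files under the same directory, the repeated src/content/blog/ko/ segment is paid for once instead of 28 times.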

Test 2: ls -la (large directory)

ls -la src/content/blog/ko/
  • Original: 23,848 chars (full permissions, owner, timestamps)
  • RTK: 12,069 chars
  • Savings: 49.4%

Drops the permissions string (drwxr-xr-x), owner (jangwook staff), and exact timestamps. Leaves filename and size. Reasonable — that's usually what the LLM actually needs.
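
A rough equivalent of that column dropping can be written in awk. This sketch assumes the usual 9-column ls -la long format and is only an approximation of what RTK does; the function name is made up.

```shell
# Illustrative only: keep size and name from "ls -la" lines, drop the rest.
# Assumes the usual 9-column long format; not RTK's actual filter.
slim_ls() {
    awk 'NF >= 9 {
        name = $9
        for (i = 10; i <= NF; i++) name = name " " $i   # rejoin names containing spaces
        print $5, name
    }'
}
# Usage: ls -la src/content/blog/ko/ | slim_ls
```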

Test 3: git status (small output)

  • Original: 274 chars
  • RTK: Actually grew larger (expanded tracked file list)
  • Savings: None (counterproductive)

This is important. Small outputs like git status get inflated by RTK rather than compressed.

Test 4: git log --oneline -20

  • Savings: 0% (passthrough)

Short, already-structured output passes through untouched.

Full stats (rtk gain):

RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════

Total commands:    6
Input tokens:      9.1K
Output tokens:     3.8K
Tokens saved:      5.5K (60.6%)
Total exec time:   153ms (avg 25ms)
Efficiency meter: ███████████████░░░░░░░░░ 60.6%

By Command
  rtk ls:    49.4% saved
  rtk find:  99.9% saved (in tokens)
  rtk git:   0% saved

60.6% average across 6 commands. But this test was skewed toward large-output commands (find, ls), so the headline number looks better than a real mixed workload would.

Where It Works and Where It Doesn't

Honestly, the advertised "60–90%" doesn't show up for every command. The effect splits sharply by command type.

Commands where RTK helps a lot:

Command | Why
find | Mostly repeated path prefixes → dramatic compression on summary
ls -la (large dirs) | Permissions and owner removal adds up
cargo test | Strips success cases, shows only failures
npm test / jest | Test summary compression
docker ps | Long container IDs and port info compressed
grep -r (large) | Context line removal

Commands where RTK barely helps:

Command | Why
git status (small changes) | Already short → nothing to compress
git log --oneline | Already condensed format
cat (single file) | Content is the point, can't compress
echo, pwd, simple commands | Passthrough

Project shape matters. TypeScript monorepo with frequent tsc --noEmit runs? RTK probably helps noticeably. Mostly API calls and small file edits? Savings will be minimal.

How to Integrate with Claude Code

Two steps.

Step 1: Install

cargo install --git https://github.com/rtk-ai/rtk

Step 2: Register the Claude Code hook

rtk init -g

-g means global (~/.claude/). Drop -g if you only want per-project application. It asks:

  • Whether to auto-patch ~/.claude/settings.json (say y)
  • Whether to create ~/.claude/RTK.md (Claude's awareness file for rtk usage)

Restart Claude Code. After that, all Bash tool calls are automatically rewritten through RTK. Run git status through Claude Code and check rtk gain — if it shows a new record, the integration worked.

One caveat: the hook only fires on Bash tool calls. Claude Code's built-in Read, Grep, and Glob tools bypass the hook entirely. Actual savings depend on how often your agent uses Bash for file exploration vs. native tools.

Another thing: there's a different Rust tool also named rtk, reachingforthejack/rtk (Rust Type Kit). If rtk gain doesn't work after installation, check for a name collision.
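
A quick way to rule out the collision is to check which binary is on your PATH and whether it understands the gain subcommand. The sketch below uses only commands mentioned in this post; the function name is made up, and it assumes a different tool named rtk would exit non-zero on rtk gain.

```shell
# Sketch: confirm the rtk on PATH is Rust Token Killer.
# Assumption: a different tool named rtk exits non-zero on "rtk gain".
check_rtk() {
    if ! command -v rtk > /dev/null 2>&1; then
        echo "rtk not found on PATH"
        return 1
    fi
    echo "using: $(command -v rtk)"
    if rtk gain > /dev/null 2>&1; then
        echo "rtk gain works: this looks like Rust Token Killer"
    else
        echo "rtk gain failed: possible name collision"
        return 1
    fi
}
```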

Putting RTK in Context

Three layers where you can cut LLM agent costs:

  1. Model selection: switch to cheaper models (Haiku, Flash)
  2. API layer: prompt caching, batch API, MCP schema compression
  3. Shell layer: RTK (command output compression)

RTK is layer 3. If you haven't touched this layer yet, there's likely real savings here. If you've already optimized model choice and caching, RTK's marginal contribution gets smaller.

What I genuinely like: it's transparent. rtk init -g once, done. No changing how you write prompts. No manually prepending rtk to every command. Just use Claude Code the same way you always have. That kind of zero-behavior-change optimization has very low psychological friction.

Honest Limitations

A few things frustrated me after a few days of real use.

First: RTK's compression can strip important information. I had a failed test where the error stack trace was partially truncated. You can access raw output with rtk err <command> or rtk proxy <command>, but Claude Code can't automatically decide when it needs the full output.
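
Until the agent can make that call itself, a manual fallback is possible. The wrapper below is a hypothetical sketch: it assumes compressed output carries the ...(truncated) marker described earlier and that rtk proxy <command> returns the raw output. The function name is made up.

```shell
# Hypothetical wrapper: rerun through "rtk proxy" when output looks truncated.
# Assumes the "...(truncated)" marker appears in compressed output.
rtk_or_raw() {
    out="$(rtk "$@")"
    case "$out" in
        *"...(truncated)"*) rtk proxy "$@" ;;        # fall back to raw output
        *)                  printf '%s\n' "$out" ;;  # compressed output was complete
    esac
}
# Usage: rtk_or_raw cargo test
```

Note that the fallback runs the command a second time, so it only suits commands that are safe and cheap to repeat.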

Second: Installation requires the Rust toolchain. There are no official binary releases as of v0.40.0. For teams without Rust in their stack, onboarding friction is real.

Third: Don't let the benchmark numbers mislead you. My actual Claude Code costs are dominated by Read, Grep, and code generation — areas RTK doesn't touch. The "60–90%" figure applies to find/ls-heavy tests. In a mixed real-world workload, 10–30% is the more honest estimate.
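
The gap between the headline figure and the real-world number is mostly weighting: overall savings is roughly the share of tokens that come from Bash output multiplied by RTK's savings rate on that output. Here is a sketch with made-up inputs (the 30% Bash share is an assumption for illustration, not a measurement):

```shell
# overall_savings BASH_SHARE_PCT RTK_SAVINGS_PCT: blended savings across the whole bill.
overall_savings() {
    awk -v share="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", share * rate / 100 }'
}

overall_savings 30 60.6   # 30% of tokens from Bash, 60.6% saved on those: prints 18.2
```

Drop the Bash share to 10% and the same 60.6% rate yields about 6% overall.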

I wouldn't call RTK overhyped. But "cuts your Claude Code bill in half" is too much to expect. For projects with heavy file traversal or frequent test runs, it's genuinely useful. For everything else, single-digit savings is the likely outcome.

Finding Missed Savings: rtk discover

One underrated feature is discover. It analyzes your existing Claude Code session history and back-calculates how much you would have saved if those commands had run through RTK.

rtk discover

In my case, the biggest token wasters in my history were find, npm install (verbose logs), and docker logs. Of those, only find typically ran through Claude Code's Bash tool — the others I'd been running directly in terminal, outside the hook's reach.

You can also check per-session stats:

rtk session

This shows token savings and hook rewrite rates per Claude Code session. Oddly useful for understanding your own Bash-heavy vs. native-tool usage patterns.

RTK for Teams

Solo use is easy — install, rtk init -g, done. Team adoption needs more thought.

You'll want an onboarding script: guiding everyone from Rust install → RTK install → Claude Code hook registration. Put it in your Makefile or scripts/setup.sh:

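#!/bin/bash
# scripts/setup-rtk.sh: install RTK and register the Claude Code hook.
set -euo pipefail  # stop on the first failure so rtk init never runs after a failed install

# RTK is built from source, so cargo (the Rust toolchain) must be present.
if ! command -v cargo > /dev/null 2>&1; then
    echo "Rust required: https://rustup.rs" >&2
    exit 1
fi

cargo install --git https://github.com/rtk-ai/rtk
rtk init -g --auto-patch
echo "RTK installed. Restart Claude Code."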

Document it in CLAUDE.md. Noting the RTK setup in the team's CLAUDE.md helps new members understand what's happening when commands get auto-rewritten. RTK's absence on any individual machine doesn't break Claude Code; it just means that person isn't saving tokens.

CI/CD: don't bother. RTK only matters for local dev environments where a developer is working with an AI agent. Don't add it to your CI pipeline.

The honest blocker for team adoption is Rust. "Install Rust and wait 1:40 for a compile" is a real ask for Python or Node.js teams. Official binary releases would fix this — but they don't exist as of v0.40.0.

How RTK Compares to Other Token Optimization Approaches

Where does RTK fit in the cost optimization stack?

Method | Layer | Applies To | Complexity | Expected Savings
Model downgrade (Haiku/Flash) | Pricing | All API calls | Low | 5–20x (price diff)
Prompt caching | API | Repeated system prompts | Medium | 40–70%
MCP schema compression (mcp2cli) | API | MCP tool injection | Medium | 96–99%
RTK | Shell | Bash command output | Low | 0–90% (varies)

Model downgrades have the biggest absolute impact but come with quality trade-offs. Prompt caching is powerful for repetitive workflows. MCP schema compression delivers dramatic savings if you're running many MCP tools. RTK has the lowest implementation friction and zero workflow change requirement.

As covered in the real cost of AI agents in production, agent costs accumulate across multiple factors. RTK addresses one slice of that — shell command output — and does it transparently. The right framing is "another layer in the cost stack," not a silver bullet.

Should You Install It?

My take: try it, but set realistic expectations. A 1 minute 40 second cargo install and one rtk init -g command. Check rtk gain after a week — if you're saving 10%+ keep it running, if not, rtk init -g --uninstall and move on.
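
That one-week check can be scripted. The sketch below reads rtk gain output on stdin and applies the 10% rule; it assumes the "Tokens saved: ... (NN.N%)" line format shown in the stats earlier, which may differ between versions. The function name is made up.

```shell
# Sketch: decide keep/uninstall from "rtk gain" output on stdin.
# Assumes a line like "Tokens saved:      5.5K (60.6%)".
keep_or_uninstall() {
    awk -v min="${1:-10}" '
        /^Tokens saved:/ {
            pct = $0
            sub(/^.*\(/, "", pct)   # drop everything up to the opening parenthesis
            sub(/%.*$/, "", pct)    # drop the percent sign and what follows
            found = 1
        }
        END {
            if (!found)                  print "no stats found"
            else if (pct + 0 >= min + 0) print "keep"
            else                         print "uninstall"
        }'
}
# Usage: rtk gain | keep_or_uninstall 10
```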

RTK is a well-built tool. 50+ supported commands, clean agent integration, solid tracking. The expectation adjustment is the key thing: this is a "nice optimization layer," not a "cut your bill in half" move.

MIT license, free, open source. Nothing to lose.

GitHub: rtk-ai/rtk

Official site: rtk-ai.app
