DEV Community

Hari Venkata Krishna Kotha

RTK, Model Routing, and the Community Tools That Actually Work With Claude Code

This is Part 2 of a series on getting more out of Claude Code. Part 1 covered the 50,000 token overhead problem, the 44% reduction fix, and the memory/lessons.md system.

In Part 1, I mentioned RTK saved me 60-90% on tool output tokens. This post goes deeper: how RTK actually works under the hood, the difference between Unix and Windows installations, model routing for subagents, environment variables for cost control, and 7 community tools I tested (most of which I didn't end up using).

RTK: How It Actually Works

RTK (Rust Token Killer) is a Rust-based CLI proxy that intercepts shell commands, runs them, and compresses the output before it reaches your AI tool's context window. It supports 10+ AI coding tools including Claude Code, GitHub Copilot, Cursor, Gemini CLI, Codex, Windsurf, Cline, and OpenCode, but this post focuses on Claude Code.

Version note: RTK is actively developed. The latest release is v0.35.0 (April 6, 2026), which expanded AWS CLI filters. I'm running v0.34.2 in this post — features and exact command output may differ slightly in newer versions.

RTK applies four optimization strategies to every CLI command output before it enters your context window:

Raw Output (5,000 tokens)
    ↓
Smart Filtering (remove ANSI codes, spinner artifacts, progress bars)
    ↓
Grouping (consolidate related output lines into summaries)
    ↓
Deduplication (collapse identical lines, like hundreds of passing tests)
    ↓
Truncation (keep errors/warnings, trim verbose success output)
    ↓
Filtered Output (500-2,000 tokens)
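The four stages can be sketched in Python. This is an illustration of the idea only — RTK is written in Rust and its real filters are per-command and far more sophisticated; the regexes and thresholds here are my assumptions:

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[a-zA-Z]")  # ANSI escape sequences

def smart_filter(lines):
    # Stage 1: strip ANSI codes and drop progress-bar artifacts
    cleaned = [ANSI.sub("", ln) for ln in lines]
    return [ln for ln in cleaned if not re.match(r"^\s*[.=#]{4,}", ln)]

def deduplicate(lines):
    # Stages 2-3: collapse runs of identical lines (e.g. hundreds of PASSED tests)
    out, prev, count = [], None, 0
    for ln in lines + [None]:  # sentinel flushes the final run
        if ln == prev:
            count += 1
            continue
        if count > 1:
            out.append(f"  ... and {count - 1} more identical lines")
        if ln is not None:
            out.append(ln)
        prev, count = ln, 1
    return out

def truncate(lines, limit=40):
    # Stage 4: always keep errors/warnings, trim verbose success output
    if len(lines) <= limit:
        return lines
    important = re.compile(r"error|warn|fail", re.I)
    kept = [ln for ln in lines if important.search(ln)]
    return lines[: max(limit - len(kept), 0)] + kept

def compress(raw: str) -> str:
    lines = smart_filter(raw.splitlines())
    return "\n".join(truncate(deduplicate(lines)))
```

The key property: errors and warnings survive every stage, while repetitive success noise collapses to a one-line summary.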

Why This Matters More Than You Think: The Re-Read Tax

This is the concept that changed how I think about Claude Code optimization.

When Claude runs a command, the output stays in context. On the next turn, Claude re-reads ALL prior context, including every command output from earlier in the session. Then on the turn after that, it re-reads everything again.

Here's the math. Say you run git diff and it produces 2,000 tokens of output. Over a 10-turn conversation after that command:

Turn 1: 2,000 tokens read
Turn 2: 2,000 tokens re-read
Turn 3: 2,000 tokens re-read
...
Turn 10: 2,000 tokens re-read
Total: 20,000 tokens consumed from one command

With RTK compressing that diff to 800 tokens (a 60% reduction):

Total: 8,000 tokens instead of 20,000
Savings: 12,000 tokens from a single command
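The arithmetic generalizes: output of size t that stays in context for n subsequent turns costs roughly n × t tokens. A quick sketch (simplified — real accounting involves prompt caching and compaction, so treat this as the upper bound the post describes):

```python
def reread_cost(output_tokens: int, turns: int) -> int:
    # The output is read once, then re-read on every subsequent turn
    return output_tokens * turns

raw = reread_cost(2_000, 10)       # unfiltered git diff over 10 turns
filtered = reread_cost(800, 10)    # same diff after RTK compression
print(raw, filtered, raw - filtered)  # 20000 8000 12000
```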

Now multiply across 80+ commands in a real coding session. From my actual work building a .NET 10 Blazor application: 80 RTK commands, 152K input tokens, 39K output tokens, 113.6K tokens saved at 74.6% efficiency. The re-read savings compound on top of that — each saved token gets re-read on every subsequent turn, so the actual context reduction is a multiple of the direct savings.

Unix vs Windows: Two Different Integration Models

This is something the README doesn't make obvious. RTK works fundamentally differently depending on your OS.

Unix (macOS/Linux) uses Hook Mode:

How it works:
1. RTK installs a PreToolUse hook in Claude Code's hooks system
2. When Claude runs any Bash command, the hook rewrites the command BEFORE execution
   (e.g., git status becomes rtk git status)
3. RTK filters the output transparently
4. Claude doesn't know RTK exists

Token overhead: 0
Setup: rtk init -g --hook-only

The --hook-only flag is important. Without it, RTK also creates an RTK.md file with instructions for Claude. But since the hook works transparently (Claude doesn't need to know about RTK), that file adds unnecessary per-turn overhead for zero benefit.
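Conceptually, the hook's rewrite step is tiny. A sketch of the idea (this is not RTK's actual hook code, and the set of filtered commands below is an assumption for illustration):

```python
# Commands RTK knows how to filter (illustrative subset, not the real list)
FILTERED = {"git", "cargo", "pytest", "dotnet", "docker", "kubectl", "npm", "pnpm"}

def rewrite(command: str) -> str:
    # PreToolUse hook logic: prefix supported commands with `rtk` before
    # execution; pass everything else through untouched
    parts = command.split()
    if parts and parts[0] in FILTERED:
        return "rtk " + command
    return command

print(rewrite("git status"))   # rtk git status
print(rewrite("echo hello"))   # echo hello
```

Because the rewrite happens before execution, Claude never sees the `rtk` prefix — which is why hook mode needs no instructions in context at all.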

Windows uses CLAUDE.md Mode (the only option on Windows):

How it works:
1. RTK adds instructions to ~/.claude/CLAUDE.md
2. These instructions tell Claude: "prefix all Bash commands with rtk"
3. Claude reads the instructions every turn and writes: rtk git status
4. RTK binary filters the output

Token overhead: the CLAUDE.md instructions add some per-turn overhead
Setup: rtk init -g --claude-md

Windows can't use hook mode. When you run rtk init -g on Windows, RTK explicitly tells you "Hook-based mode requires Unix (macOS/Linux)" and falls back to --claude-md automatically. Note that --claude-md is now labeled "legacy mode" in the latest RTK help text (v0.34+), but on Windows it remains the only working option.

Is the CLAUDE.md overhead worth it on Windows?

Yes. A single rtk git diff typically saves more tokens than the instructions cost. A single rtk pytest can save thousands of tokens. The overhead pays for itself on your first filtered command, and every command after that is pure savings.
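To make the break-even concrete, here's the rough arithmetic. The 200-token instruction overhead and the per-command savings below are assumed figures for illustration, not measured values:

```python
INSTRUCTION_OVERHEAD = 200  # tokens per turn for CLAUDE.md instructions (assumed)

def net_savings(turns: int, per_command_savings: list[int]) -> int:
    # Direct savings only; the re-read tax makes the real win larger
    return sum(per_command_savings) - INSTRUCTION_OVERHEAD * turns

# 10-turn session: one filtered test run plus two filtered git diffs
print(net_savings(10, [4_000, 1_200, 1_200]))  # 4400 — net positive
```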

Installing RTK on Windows: Step by Step

This is what I actually did. Recording it because several things aren't obvious from the docs:

# Step 1: Install RTK
# Option A: Homebrew (macOS)
brew install rtk

# Option B: Curl installer (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

# Option C: Cargo (Windows — use Git Bash, not PowerShell)
cargo install --git https://github.com/rtk-ai/rtk

# Step 2: Find where cargo put the binary (Windows only)
# Usually: C:\Users\<username>\.cargo\bin\rtk.exe
# Add this to your system PATH if it's not already

# Step 3: Initialize for Claude Code
rtk init -g --claude-md

# Step 4: Verify it works
rtk --version
rtk git status

Things that tripped me up:

  • cargo install rtk (without the git URL) installs the wrong package (Rust Type Kit, a completely different tool). Always use the full git URL.
  • Run from Git Bash, not native PowerShell. Some RTK shell integrations assume bash.
  • If you use VS Code's integrated terminal, make sure it's set to Git Bash, not PowerShell.
  • The binary path needs to be in your PATH environment variable for Claude Code to find it.

RTK Configuration

RTK stores config at:

  • Windows: %APPDATA%\rtk\config.toml (or ~/.config/rtk/config.toml in Git Bash)
  • macOS/Linux: ~/.config/rtk/config.toml

Two settings worth knowing:

# Exclude specific commands from filtering
# (if RTK strips output you actually need to see)
[hooks]
exclude = ["some-command-that-needs-raw-output"]

# Tee: saves raw output when commands fail
# Your safety net if RTK strips a critical error message
[tee]
enabled = true
rotation_limit = 5

The tee feature is like a flight recorder on an airplane. During normal operation, you never need it. But if RTK strips a critical error and Claude misses a bug, you can recover the unfiltered output.

Measuring Your Savings

# Cumulative savings across all sessions
rtk gain

# Per-command breakdown
rtk gain --history

# Find commands you ran WITHOUT rtk that could have been filtered
rtk discover

Here's the actual rtk gain output from my work laptop while building a .NET 10 Blazor application:

[Screenshot: RTK gain output showing 80 commands, 113.6K tokens saved at 74.6% efficiency, with rtk dotnet test as the top filter at 99.1% savings across 19 runs]

74.6% efficiency across 80 commands. 113,600 tokens saved. The rtk dotnet test filter alone saved 108K tokens across 19 runs. dotnet test output is verbose by default (test discovery, build output, individual test results, summary), and RTK strips it down to just failures and counts.

The rtk discover command is the most useful when starting out. It scans your session logs for commands you ran without the rtk prefix that could have been filtered — essentially a report of your missed savings.

Commands Worth Knowing

A few commands that aren't in the basic README but are useful:

# Show your RTK adoption across recent Claude Code sessions
rtk session

# Claude Code spending vs RTK savings analysis
rtk cc-economics

# Filter for .NET commands (build, test, restore, format)
rtk dotnet test
rtk dotnet build

# Learn CLI corrections from your error history
rtk learn

The rtk dotnet filter is the one that produced 99% savings on my tests. If you're a .NET developer, that filter alone justifies the install. There are similar specialized filters for Cargo, Vitest, Pytest, Playwright, Prettier, Prisma, Next.js, ESLint, TypeScript, Docker, kubectl, and more — around 100 commands in total.

When RTK Shines vs When It Doesn't

This is the most important thing to understand about RTK, and nobody talks about it: RTK only intercepts Bash commands. Claude Code's built-in tools (Read, Write, Edit, Grep, Glob, WebFetch, WebSearch) bypass Bash entirely and never touch RTK.

In a typical Claude Code session, you might run 5-10 Bash commands vs 50-100 dedicated tool calls. If your session is mostly Read/Edit/Grep operations, RTK savings will be minimal — not because RTK is broken, but because there's nothing for it to intercept.

RTK shines in sessions where Bash is heavily used:

  • Running builds: rtk dotnet build, rtk cargo build, rtk next build
  • Running tests: rtk dotnet test, rtk vitest run, rtk pytest, rtk playwright test
  • Git operations: rtk git diff, rtk git log, rtk git status
  • Package managers: rtk pnpm install, rtk npm run build
  • Docker/K8s: rtk docker ps, rtk kubectl get pods

This is exactly what my work data showed: 80 commands, 74.6% efficiency, and the biggest savings came from rtk dotnet test (99% reduction across 19 runs). When I'm building features and running test suites repeatedly, RTK saves real tokens. When I'm in a code review session reading files and editing inline, RTK has nothing to do.

Sessions where RTK savings are minimal:

  • Conversation-heavy sessions (design discussions, explanations)
  • Code review sessions (mostly Read/Edit dedicated tools)
  • File search and exploration (Grep/Glob dedicated tools)
  • Very short sessions (1-3 turns) — the re-read tax hasn't compounded yet

This isn't a bug. It's a fundamental architecture choice. If you're optimizing token usage, install RTK AND make sure you're using dedicated tools instead of cat/head/find/grep via Bash. Both matter.

Model Routing: Stop Burning Opus Tokens on File Searches

If you're on Opus (or even Sonnet), every subagent Claude spawns runs on the same model by default. That means when Claude kicks off a code-reviewer agent, an exploration search, or a simple git status check through a subagent, it burns your most expensive tokens.

The fix is adding model routing rules to your global rules files. I created a performance.md in ~/.claude/rules/common/ with explicit model assignments:

Use Haiku for:

  • File search, grep, glob, codebase exploration
  • Summarizing search results or documentation
  • Simple formatting, renaming, mechanical edits
  • Reading and reporting file contents
  • Git status checks, log summaries

Use Sonnet for:

  • Code generation, implementation, refactoring
  • Code review
  • Test writing
  • Build error fixing
  • Planning and documentation

Use Opus only for:

  • Architecture decisions requiring multi-system reasoning
  • Deep debugging across 5+ files with complex interactions
  • Multi-dimensional analysis tasks

The rule file sets the default subagent model to Sonnet and lists specific overrides. Claude Code reads this on every session and applies the routing automatically when spawning subagents with the model parameter.

This doesn't change your main conversation model. It only affects subagents. But subagents can account for a significant portion of token usage in complex sessions, especially when Claude spawns multiple exploration or review agents.
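A minimal sketch of what such a performance.md might contain. Claude Code reads rules files as natural-language instructions, so the wording below is one possible phrasing (my assumption), not an official schema:

```markdown
# Subagent Model Routing

Default subagent model: sonnet.

- Use haiku for search, grep/glob, file reading, and summarization subagents.
- Use sonnet for code generation, review, test writing, and planning subagents.
- Escalate to opus only for architecture decisions or debugging that spans
  multiple systems.
```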

Environment Variable Worth Setting

One variable that gives you cost control without changing your workflow:

# Cap extended thinking tokens (default is 31,999 which can be excessive)
export MAX_THINKING_TOKENS=10000

# These go in your shell profile (~/.bashrc, ~/.zshrc,
# or Windows environment variables)

MAX_THINKING_TOKENS is the most impactful. Claude's extended thinking can use up to 32K tokens of internal reasoning before responding. For most tasks, 10K is more than enough. The default is generous and burns tokens on over-analysis.

7 Community Tools I Tested (And Why I Kept Only 2)

I deep-researched seven community tools that claim to enhance Claude Code. Here's the honest breakdown:

Tools I Kept

1. RTK (Rust Token Killer) — Already covered above. The single most impactful optimization tool.

2. lessons.md Pattern (from CCO, the Claude Code Optimizer) — Not really a "tool," but a methodology. Keep a lessons.md file in each project and write a rule every time you correct Claude. Simple, effective, zero overhead. Covered in Part 1.

Tools I Evaluated and Skipped

3. claude-mem (Memory Manager)
Promises persistent memory across sessions via an embedded vector database. Sounds great in theory. Concerns I found during evaluation:

  • Has reported Windows compatibility issues including a multi-GB ONNX model download requirement
  • The built-in memory system in ~/.claude/projects/<project>/memory/ already handles persistent memory with simple markdown files, no vector DB needed
  • Verdict: Skip on Windows. Linux/Mac users may have a smoother experience.

4. CCO (Claude Code Optimizer)
A package of configuration files (skills, rules, agents) designed for Claude Code. The self-improvement loop pattern (lessons.md) is genuinely useful and I adopted it. But the rest of the configuration overlapped heavily with what I already had from Everything Claude Code.

  • Verdict: Adopt the lessons.md pattern. Skip the rest if you already have ECC.

5. Superinterface / CLine / Similar IDE Extensions
Various tools that wrap Claude Code with additional UI. The problem: Claude Code already works well in the terminal and VS Code. Adding another layer introduces latency, potential conflicts, and more things that can break.

  • Verdict: Unnecessary complexity for most workflows.

6. Custom MCP Servers for Token Tracking
Some community members built MCP servers that track token usage per conversation. Interesting idea, but RTK's rtk gain command already gives you this data without the setup overhead.

  • Verdict: RTK covers this use case.

7. Automated Session Management Tools
Tools that auto-compact, auto-checkpoint, or auto-restart sessions. The problem is they make assumptions about when you want to compact or restart. Claude Code's built-in compaction (with the strategic-compact skill nudging you at good breakpoints) worked better for me than automated approaches.

  • Verdict: Use the strategic-compact skill instead.

The Pattern

Most community tools try to solve problems that Claude Code already handles, just not obviously. Before installing any third-party tool, check if there's a built-in feature, a rule file, or a skill that does the same thing with less overhead.

The Complete Optimization Stack

Here's everything I run, in priority order:

| # | What | Token Impact | Setup Time |
|---|------|--------------|------------|
| 1 | RTK | 60-90% tool output savings | 30 seconds |
| 2 | Environment variables (MAX_THINKING_TOKENS) | Caps runaway thinking | 10 seconds |
| 3 | Skills audit (global vs project-level) | Frees 74% of skill overhead | 15 minutes |
| 4 | Model routing rules | Routes subagents to cheaper models | 10 minutes |
| 5 | Memory system (user + feedback files) | Smarter responses across sessions | 10 minutes |
| 6 | lessons.md file | Permanent mistake prevention | 30 seconds |

Total setup time: under 30 minutes. The compound savings across a week of coding sessions add up fast.


Part 1 covered the token overhead problem and the 44% fix. Parts 3 and 4 (Skills.sh ecosystem guide and curated skills by category) are in the works.
