I'm on a paid Claude Code plan. A few weeks ago, I noticed I was hitting my usage limits far sooner than expected. I wasn't doing anything unusual, just regular development work. But Claude kept running out of context mid-conversation, forgetting things I'd said 10 messages ago, and compacting earlier than it should. (Compaction is when Claude Code summarizes earlier messages to free up context space. When it happens too early, you lose nuance and detail from earlier in the conversation.)
I went looking for answers. LinkedIn, Dev.to, Instagram, Reddit. Most articles said the same things, and honestly, half of them were copies of each other. Token reduction tips, useful skills lists, prompt tricks. I decided to stop bookmarking and start testing. Tried every method I came across, measured the results, and kept what actually worked.
Here's what I found.
## The 50,000 Token Problem You Don't Know You Have
When you install skills in Claude Code, their metadata loads into your context window on every single message. And when a skill's trigger matches your prompt, the full content loads too. The more skills you have installed, the more metadata overhead you carry per turn, and the more likely full skill content gets pulled in during a busy session.
I came across the Everything Claude Code repository and was honestly amazed. Skills, agents, commands, rules, all packaged together. So I did what most people would do: installed everything globally.
That was a mistake.
Here's what my setup looked like before I realized the problem:
| Component | Size | Estimated Tokens |
|---|---|---|
| Skills (global) | 196KB | ~50,000 |
| Agent definitions | 58KB | ~15,000 |
| Command files | 142KB | ~36,000 |
| Rule files | 9KB | ~2,000 |
| **TOTAL** | 405KB | ~103,000 |
(Rough estimate: 1KB of text ≈ 250 tokens. Not all of this loads on every turn because skills use progressive disclosure, loading only metadata first and full content when triggered. But the potential overhead is still massive, and in practice, a busy session triggers many of them.)
Over 100,000 tokens of potential overhead sitting in my setup. That's a significant chunk of Claude's context window spent on instructions, most of which weren't relevant to what I was doing at that moment.
No wonder my conversations were getting compacted early. No wonder Claude was "forgetting" things. There wasn't enough room left for the actual work.
## How to Check Your Own Overhead
Before you do anything else, run this in your terminal (Windows users: use Git Bash, not PowerShell):
```shell
du -sh ~/.claude/skills/ ~/.claude/agents/ ~/.claude/commands/ ~/.claude/rules/
```
Reading your results:
Each line shows the size of a directory. Add them up for your total overhead.
Example output:
```
144K    /Users/you/.claude/skills/
 76K    /Users/you/.claude/agents/
172K    /Users/you/.claude/commands/
  9K    /Users/you/.claude/rules/
```
That's 401KB total. To estimate tokens, multiply your total KB by 250 (1KB ≈ 250 tokens). So 401KB ≈ 100,000 tokens of potential overhead. Not all of it loads every turn (skills use progressive disclosure), but the more skills you have, the more likely multiple will trigger and load fully during a session.
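If you'd rather have the arithmetic done for you, the same directories can feed an `awk` one-liner. A sketch: `-k` forces kilobyte units so the sum is consistent, and the 250-tokens-per-KB multiplier is the rough rule of thumb from above, not a precise tokenizer count.

```shell
# Sum the kilobyte column from `du -sk`, then apply 1KB ~ 250 tokens.
du -sk ~/.claude/skills/ ~/.claude/agents/ ~/.claude/commands/ ~/.claude/rules/ 2>/dev/null \
  | awk '{kb += $1} END {printf "~%dKB total, ~%d tokens of potential overhead\n", kb, kb * 250}'
```

On the example output above, this prints roughly `~401KB total, ~100250 tokens of potential overhead`.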
If your skills directory alone is over 100KB, you're almost certainly carrying skills you don't use in most projects.
For context, my setup was 405KB before I touched anything. After moving domain-specific skills to project level and cleaning up unused agents, it dropped to 232KB. Same capabilities, 44% less overhead.
## The Fix: 44% Reduction in One Afternoon
The principle is simple: only keep things globally that you use in 80%+ of your projects. Everything else goes to project level, where it only loads when you're working in that specific project.
I went from 20 global skills down to 6. The other 14 moved to the projects that actually needed them.
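Mechanically, demoting a skill is just moving its directory. A sketch, assuming the default Claude Code layout; `docker-patterns` and the project path are illustrative names, and you may need to start a new session afterwards for the change to take effect.

```shell
# Project-level skills live under the project's own .claude/skills/.
mkdir -p ~/projects/my-api/.claude/skills

# Move the domain-specific skill out of the global directory.
mv ~/.claude/skills/docker-patterns ~/projects/my-api/.claude/skills/
```

After the move, the skill's metadata only loads when you're working inside that project.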
| Component | Before | After | Saved |
|---|---|---|---|
| Skills (global) | 196KB | 51KB | 145KB (74% reduction) |
| Agent definitions | 58KB | 52KB | 6KB |
| Command files | 142KB | 120KB | 22KB |
| Rule files | 9KB | 9KB | 0KB (modified, not reduced) |
| **TOTAL** | 405KB | 232KB | 173KB (~44% reduction) |
What I kept globally (the skills I use in every project):
- Coding standards (applies to every language)
- Security review (should check this everywhere)
- TDD workflow (I practice TDD daily)
- Verification loop (prevents claiming things are done before checking)
- Strategic compaction (suggests when to compact context manually)
- Continuous learning (tracks patterns across sessions)
What I moved to project level:
Docker patterns, Python patterns, React patterns, e2e testing, eval harness, iterative retrieval, full-stack patterns, and several others. These are useful but only in specific projects. Loading Docker patterns while I'm writing documentation is pure waste.
The difference was immediate. Conversations lasted longer before compaction. Claude held context from earlier in the session. Fewer "I don't have context on that" moments.
## The Tool Output Problem Nobody Talks About
Most optimization advice focuses on what's loaded at the start of a conversation: skills, rules, CLAUDE.md. But there's another source of token waste that's just as big, and almost nobody mentions it.
Every time Claude runs a CLI command (git status, npm test, a build command), the raw output gets dumped into the context window. And here's the thing most people miss: that output gets re-read on every subsequent turn. It doesn't disappear.
Think about it this way. You ask Claude to run your test suite. The output is 5,000 tokens. 4,950 of those tokens are passing tests. 50 tokens are the actual failures you care about. But all 5,000 tokens sit in context and get re-read on turn 2, turn 3, turn 4, and every turn after.
Over a 20-turn session with 50 tool calls, you can easily accumulate 100,000+ tokens of tool output. Most of it noise.
## RTK: The Token Saver That Actually Made a Difference
RTK (Rust Token Killer) is an open-source tool that filters CLI output before it enters Claude's context window. It applies four optimization passes: smart filtering (removes noise), grouping (aggregates similar items like errors by type), truncation (keeps relevant context, cuts redundancy), and deduplication (collapses repeated log lines with counts).
Real savings from my sessions:
| Command Category | Example Commands | Token Savings |
|---|---|---|
| Build output | cargo build, tsc, next build | 80-90% |
| Test output | vitest, pytest, playwright | 90-99% |
| Git operations | git status, git diff, git log | 59-80% |
| File listings | ls, find, grep | 60-75% |
The way I explain it to people: imagine you ask a librarian to check something. Without RTK, the librarian carries back the entire bookshelf, drops it on your desk, and says "the answer is on page 47." With RTK, the librarian comes back with just page 47, highlighted. Same answer. But your desk isn't buried anymore.
## Installing RTK
```shell
# macOS/Linux (recommended)
brew install rtk

# Or via Cargo (IMPORTANT: do NOT run "cargo install rtk" without
# the git URL — that installs "Rust Type Kit", a completely
# different package. If "rtk gain" fails, you have the wrong one.)
cargo install --git https://github.com/rtk-ai/rtk

# Or via quick-install script
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

# Then add to Claude Code globally
rtk init -g
```
On Unix (macOS/Linux), RTK installs as a PostToolUse hook. It works transparently. Claude doesn't even know it's there. Zero token overhead.
On Windows, it works through Git Bash. The hook and RTK.md get installed the same way. If you're using Claude Code with Git Bash as your shell (which most Windows developers do), the experience is identical to macOS/Linux. The RTK.md file that gets created adds about 1,200 tokens of instructions, but a single filtered git diff saves more than that. Net positive after your first tool call.
Windows-specific tips:
- Download the pre-built binary from the releases page (rtk-x86_64-pc-windows-msvc.zip), or install via `cargo install --git https://github.com/rtk-ai/rtk` in Git Bash
- Make sure the binary path is in your system PATH
- Run `rtk init -g` the same as on Unix
- Run from Git Bash, not native PowerShell (some shell integrations assume bash)
## Measuring Your Savings
RTK has built-in analytics:
```shell
# See your cumulative savings
rtk gain

# See savings per command type
rtk gain --history

# Find commands you ran WITHOUT rtk that could have been optimized
rtk discover
```
The rtk discover command is the most useful one when you're starting out. It scans your Claude Code session logs and shows you exactly which commands you could have filtered but didn't.
## The Memory System That Stops Claude From Asking the Same Questions
The last piece that made a real difference wasn't about reducing tokens. It was about making Claude smarter across sessions.
Claude Code has a file-based memory system at ~/.claude/projects/<project>/memory/. You create markdown files with frontmatter and Claude reads them at the start of every session.
I use four types:
User memories: Who I am, my tech stack, my preferences. Instead of explaining my setup every session, Claude already knows.
Feedback memories: Every time I correct Claude, the correction gets saved. "Use plain text in forms, not bullets." "Don't suggest tools I haven't used." Claude stops repeating the same mistakes.
Project memories: Current state of work. Deadlines, decisions, context that would otherwise be lost between sessions.
Reference memories: Where to find things in external systems. "Bug tracking is in Linear project X." Saves the "where is that tracked?" conversation every time.
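As a concrete sketch, here's how one of those files might be created. The base path comes from above; the project name (`my-api`), the frontmatter fields, and the content are all illustrative, so adapt them to whatever your setup actually expects.

```shell
# Hypothetical user-memory file; field names and content are illustrative.
mkdir -p ~/.claude/projects/my-api/memory
cat > ~/.claude/projects/my-api/memory/user.md <<'EOF'
---
type: user
updated: 2026-03-15
---
- Backend: Python 3.12 + FastAPI; frontend: React + TypeScript.
- Prefers plain text over bullet lists in form copy.
- Deploys via Docker Compose; no Kubernetes.
EOF
```

Once it exists, you stop re-explaining your stack at the start of every session.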
## lessons.md: One File That Changes Everything
This is the simplest thing I did and possibly the most impactful. I keep a lessons.md file in every project's .claude/ directory. Every time I correct Claude on something, it writes a rule:
```markdown
## 2026-03-15 - Don't add error handling for impossible cases

**Rule:** Only add try-catch blocks at system boundaries (user input,
API calls, file I/O). Don't wrap internal function calls that can't
realistically fail.

**Why:** Added defensive error handling around a pure math function.
User said "this function takes two integers and adds them, it can't
throw. You're adding complexity for nothing."

**Applies when:** Writing or reviewing error handling in any codebase.
```
Claude reads this file at the start of every session. The correction sticks permanently. Over a few weeks, the file becomes a precise set of rules that make Claude work exactly the way you need.
The principle is simple: never correct the same mistake twice. The first correction is a lesson. The second one means the system failed.
## The Priority Order
If you're starting from scratch, here's what I'd do in order:
| Priority | What | Effort | Impact |
|---|---|---|---|
| 1 | Install RTK | 30 seconds | 60-90% tool output savings |
| 2 | Audit global skills, move domain-specific to project level | 15 minutes | Free up context window |
| 3 | Set up basic memory files (user profile + 2-3 feedback entries) | 10 minutes | Smarter responses, fewer repeated mistakes |
| 4 | Start a lessons.md file | 30 seconds to create, 30 seconds per correction | Permanent mistake prevention |
| 5 | Set MAX_THINKING_TOKENS env variable | 10 seconds | Cap runaway thinking, save tokens on over-analysis |
| 6 | Add model routing rules for subagents | 10 minutes | Route exploration/search subagents to cheaper models |
None of this is complicated. Most of it takes less than 15 minutes. But the compound effect of doing all six is significant: longer sessions, better context retention, fewer repeated mistakes, and lower token bills.
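For item 5 in the table, the variable can be set in your shell profile. A sketch: the variable name comes from the priority table above, but the cap value here is an illustrative guess, so tune it to your workload.

```shell
# Cap Claude's extended-thinking budget per request.
# 8000 is an illustrative value, not a recommendation.
export MAX_THINKING_TOKENS=8000
```

Add the line to `~/.zshrc` or `~/.bashrc` (or your Claude Code settings, if it supports env configuration there) so the cap persists across sessions.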
The tools are there. Most people just don't know they exist, or don't realize how much overhead they're carrying.
This is Part 1 of a series on getting more out of Claude Code. Part 2 covers RTK in depth, including Windows setup, configuration, subagent behavior, and community tools that complement it.