Hari Venkata Krishna Kotha

How I cut Claude Code's token overhead by 44% and stopped hitting usage limits mid-session.

I'm on a paid Claude Code plan. A few weeks ago, I noticed I was hitting my usage limits way faster than expected. I wasn't doing anything unusual, just regular development work. But Claude kept running out of context mid-conversation, forgetting things I'd said 10 messages ago, and compacting earlier than it should. (Compaction is when Claude Code summarizes earlier messages to free up context space. When it happens too early, you lose nuance and detail from earlier in the conversation.)

I went looking for answers. LinkedIn, Dev.to, Instagram, Reddit. Most articles said the same things, and honestly, half of them were copies of each other. Token reduction tips, useful skills lists, prompt tricks. I decided to stop bookmarking and start testing. Tried every method I came across, measured the results, and kept what actually worked.

Here's what I found.

The 50,000 Token Problem You Don't Know You Have

When you install skills in Claude Code, their metadata loads into your context window on every single message. And when a skill's trigger matches your prompt, the full content loads too. The more skills you have installed, the more metadata overhead you carry per turn, and the more likely full skill content gets pulled in during a busy session.

I came across the Everything Claude Code repository and was honestly amazed. Skills, agents, commands, rules, all packaged together. So I did what most people would do: installed everything globally.

That was a mistake.

Here's what my setup looked like before I realized the problem:

Component          Size       Estimated Tokens
Skills (global)    196KB      ~50,000
Agent definitions  58KB       ~15,000
Command files      142KB      ~36,000
Rule files         9KB        ~2,000
TOTAL              405KB      ~103,000 tokens

(Rough estimate: 1KB of text ≈ 250 tokens. Not all of this loads on every turn because skills use progressive disclosure, loading only metadata first and full content when triggered. But the potential overhead is still massive, and in practice, a busy session triggers many of them.)

Over 100,000 tokens of potential overhead sitting in my setup. That's a significant chunk of Claude's context window spent on instructions, most of which weren't relevant to what I was doing at that moment.

No wonder my conversations were getting compacted early. No wonder Claude was "forgetting" things. There wasn't enough room left for the actual work.

How to Check Your Own Overhead

Before you do anything else, run this in your terminal (Windows users: use Git Bash, not PowerShell):

du -sh ~/.claude/skills/ ~/.claude/agents/ ~/.claude/commands/ ~/.claude/rules/

Reading your results:

Each line shows the size of a directory. Add them up for your total overhead.

Example output:

144K    /Users/you/.claude/skills/
76K     /Users/you/.claude/agents/
172K    /Users/you/.claude/commands/
9K      /Users/you/.claude/rules/

That's 401KB total. To estimate tokens, multiply your total KB by 250 (1KB ≈ 250 tokens). So 401KB ≈ 100,000 tokens of potential overhead. Not all of it loads every turn (skills use progressive disclosure), but the more skills you have, the more likely multiple will trigger and load fully during a session.
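If you'd rather not do the arithmetic by hand, a small shell function can sum the directories and apply the 250-tokens-per-KB estimate. This is a sketch of my own, not part of Claude Code; it uses du -sk (sizes in 1KB blocks) and silently skips directories that don't exist:

```shell
# Sketch: sum the on-disk size (KB) of the given directories and
# estimate tokens at ~250 tokens per KB. Missing dirs are skipped.
estimate_tokens() {
  local total_kb=0 dir
  for dir in "$@"; do
    [ -d "$dir" ] && total_kb=$((total_kb + $(du -sk "$dir" | cut -f1)))
  done
  echo "${total_kb}KB ≈ $((total_kb * 250)) tokens"
}

estimate_tokens ~/.claude/skills ~/.claude/agents ~/.claude/commands ~/.claude/rules
```

Run it before and after cleanup and you get a single comparable number per setup.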

If your skills directory alone is over 100KB, you're almost certainly carrying skills you don't use in most projects.

For context, my setup was 405KB before I touched anything. After moving domain-specific skills to project level and cleaning up unused agents, it dropped to 232KB. Same capabilities, 44% less overhead.

The Fix: 44% Reduction in One Afternoon

The principle is simple: only keep things globally that you use in 80%+ of your projects. Everything else goes to project level, where it only loads when you're working in that specific project.

I went from 20 global skills down to 6. The other 14 moved to the projects that actually needed them.

Component          Before     After      Saved
Skills (global)    196KB      51KB       145KB (74% reduction)
Agent definitions  58KB       52KB       6KB
Command files      142KB      120KB      22KB
Rule files         9KB        9KB        0KB (modified, not reduced)
TOTAL              405KB      232KB      173KB (~44% reduction)

What I kept globally (the skills I use in every project):

  • Coding standards (applies to every language)
  • Security review (should check this everywhere)
  • TDD workflow (I practice TDD daily)
  • Verification loop (prevents claiming things are done before checking)
  • Strategic compaction (suggests when to compact context manually)
  • Continuous learning (tracks patterns across sessions)

What I moved to project level:
Docker patterns, Python patterns, React patterns, e2e testing, eval harness, iterative retrieval, full-stack patterns, and several others. These are useful but only in specific projects. Loading Docker patterns while I'm writing documentation is pure waste.
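Mechanically, "moving to project level" is just moving a directory. Here's a sketch of a helper I'd use for it; the function name demote_skill is mine, and it assumes the standard ~/.claude/skills layout for global skills and .claude/skills inside the project:

```shell
# Sketch: move one globally installed skill into a project's
# .claude/skills directory so it only loads for that project.
# "demote_skill" is a made-up helper, not an official command.
demote_skill() {
  local skill="$1" project="${2:-.}"
  mkdir -p "$project/.claude/skills"
  mv ~/.claude/skills/"$skill" "$project/.claude/skills/"
}

# e.g. from a project root:
#   demote_skill docker-patterns
```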

The difference was immediate. Conversations lasted longer before compaction. Claude held context from earlier in the session. Fewer "I don't have context on that" moments.

The Tool Output Problem Nobody Talks About

Most optimization advice focuses on what's loaded at the start of a conversation: skills, rules, CLAUDE.md. But there's another source of token waste that's just as big, and almost nobody mentions it.

Every time Claude runs a CLI command (git status, npm test, a build command), the raw output gets dumped into the context window. And here's the thing most people miss: that output gets re-read on every subsequent turn. It doesn't disappear.

Think about it this way. You ask Claude to run your test suite. The output is 5,000 tokens. 4,950 of those tokens are passing tests. 50 tokens are the actual failures you care about. But all 5,000 tokens sit in context and get re-read on turn 2, turn 3, turn 4, and every turn after.

Over a 20-turn session with 50 tool calls, you can easily accumulate 100,000+ tokens of tool output. Most of it noise.

RTK: The Token Saver That Actually Made a Difference

RTK (Rust Token Killer) is an open-source tool that filters CLI output before it enters Claude's context window. It applies four optimization passes: smart filtering (removes noise), grouping (aggregates similar items like errors by type), truncation (keeps relevant context, cuts redundancy), and deduplication (collapses repeated log lines with counts).

Real savings from my sessions:

Command Category  Example Commands               Token Savings
Build output      cargo build, tsc, next build   80-90%
Test output       vitest, pytest, playwright     90-99%
Git operations    git status, git diff, git log  59-80%
File listings     ls, find, grep                 60-75%

The way I explain it to people: imagine you ask a librarian to check something. Without RTK, the librarian carries back the entire bookshelf, drops it on your desk, and says "the answer is on page 47." With RTK, the librarian comes back with just page 47, highlighted. Same answer. But your desk isn't buried anymore.

Installing RTK

# macOS/Linux (recommended)
brew install rtk

# Or via Cargo (IMPORTANT: do NOT run "cargo install rtk" without
# the git URL — that installs "Rust Type Kit", a completely
# different package. If "rtk gain" fails, you have the wrong one.)
cargo install --git https://github.com/rtk-ai/rtk

# Or via quick-install script
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

# Then add to Claude Code globally
rtk init -g

On Unix (macOS/Linux), RTK installs as a PostToolUse hook. It works transparently. Claude doesn't even know it's there. Zero token overhead.
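If you want to confirm the hook actually landed, global Claude Code settings live in ~/.claude/settings.json (the default location; adjust the path if yours differs), so a quick grep shows what rtk init -g wrote:

```shell
# Show any rtk-related entries in the global Claude Code settings,
# or a friendly message if none are found.
grep -n "rtk" ~/.claude/settings.json || echo "no rtk hook found"
```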

On Windows, it works through Git Bash. The hook and RTK.md get installed the same way. If you're using Claude Code with Git Bash as your shell (which most Windows developers do), the experience is identical to macOS/Linux. The RTK.md file that gets created adds about 1,200 tokens of instructions, but a single filtered git diff saves more than that. Net positive after your first tool call.

Windows-specific tips:

  • Download the pre-built binary from the releases page (rtk-x86_64-pc-windows-msvc.zip), or install via cargo install --git https://github.com/rtk-ai/rtk in Git Bash
  • Make sure the binary path is in your system PATH
  • Run rtk init -g the same as on Unix
  • Run from Git Bash, not native PowerShell (some shell integrations assume bash)

Measuring Your Savings

RTK has built-in analytics:

# See your cumulative savings
rtk gain

# See savings per command type
rtk gain --history

# Find commands you ran WITHOUT rtk that could have been optimized
rtk discover

The rtk discover command is the most useful one when you're starting out. It scans your Claude Code session logs and shows you exactly which commands you could have filtered but didn't.

The Memory System That Stops Claude From Asking the Same Questions

The last piece that made a real difference wasn't about reducing tokens. It was about making Claude smarter across sessions.

Claude Code has a file-based memory system at ~/.claude/projects/<project>/memory/. You create markdown files with frontmatter and Claude reads them at the start of every session.

I use four types:

User memories: Who I am, my tech stack, my preferences. Instead of explaining my setup every session, Claude already knows.

Feedback memories: Every time I correct Claude, the correction gets saved. "Use plain text in forms, not bullets." "Don't suggest tools I haven't used." Claude stops repeating the same mistakes.

Project memories: Current state of work. Deadlines, decisions, context that would otherwise be lost between sessions.

Reference memories: Where to find things in external systems. "Bug tracking is in Linear project X." Saves the "where is that tracked?" conversation every time.
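To make this concrete, here's a sketch of seeding a user-profile memory file. The directory follows the path above, but the project name, frontmatter fields, and file contents are illustrative, not a fixed schema:

```shell
# Sketch: create a user-profile memory file for a hypothetical project.
# Frontmatter fields are illustrative -- use whatever schema you settle on.
mkdir -p ~/.claude/projects/my-project/memory
cat > ~/.claude/projects/my-project/memory/user.md <<'EOF'
---
type: user
updated: 2026-03-15
---
- Stack: TypeScript + Node, Postgres
- Prefers plain text in forms, not bullets
- Don't suggest tools I haven't used
EOF
```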

lessons.md: One File That Changes Everything

This is the simplest thing I did and possibly the most impactful. I keep a lessons.md file in every project's .claude/ directory. Every time I correct Claude on something, it writes a rule:

## 2026-03-15 - Don't add error handling for impossible cases

**Rule:** Only add try-catch blocks at system boundaries (user input,
API calls, file I/O). Don't wrap internal function calls that can't
realistically fail.
**Why:** Added defensive error handling around a pure math function.
User said "this function takes two integers and adds them, it can't
throw. You're adding complexity for nothing."
**Applies when:** Writing or reviewing error handling in any codebase.

Claude reads this file at the start of every session. The correction sticks permanently. Over a few weeks, the file becomes a precise set of rules that make Claude work exactly the way you need.

The principle is simple: never correct the same mistake twice. The first correction is a lesson. The second one means the system failed.

The Priority Order

If you're starting from scratch, here's what I'd do in order:

Priority  What                                                             Effort                                      Impact
1         Install RTK                                                      30 seconds                                  60-90% tool output savings
2         Audit global skills, move domain-specific to project level       15 minutes                                  Free up context window
3         Set up basic memory files (user profile + 2-3 feedback entries)  10 minutes                                  Smarter responses, fewer repeated mistakes
4         Start a lessons.md file                                          30 seconds, then 30 seconds per correction  Permanent mistake prevention
5         Set MAX_THINKING_TOKENS env variable                             10 seconds                                  Cap runaway thinking, save tokens on over-analysis
6         Add model routing rules for subagents                            10 minutes                                  Route exploration/search subagents to cheaper models
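For items 5 and 6 the setup is small. MAX_THINKING_TOKENS is read from the environment (the cap value below is just an example), and a subagent can be pinned to a cheaper model via the model field in its frontmatter; the agent name and description here are made up for illustration:

```shell
# Item 5: cap extended thinking via environment variable.
# 12000 is an example value -- tune it to your workload.
export MAX_THINKING_TOKENS=12000

# Item 6: route a subagent to a cheaper model with the "model"
# frontmatter field. Agent name and description are hypothetical.
mkdir -p ~/.claude/agents
cat > ~/.claude/agents/code-searcher.md <<'EOF'
---
name: code-searcher
description: Fast read-only exploration and search across the repo
model: haiku
---
Search and summarize code. Do not edit files.
EOF
```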

None of this is complicated. Most of it takes less than 15 minutes. But the compound effect of doing all six is significant: longer sessions, better context retention, fewer repeated mistakes, and lower token bills.

The tools are there. Most people just don't know they exist, or don't realize how much overhead they're carrying.


This is Part 1 of a series on getting more out of Claude Code. Part 2 covers RTK in depth, including Windows setup, configuration, subagent behavior, and community tools that complement it.
