I've been using Claude Code on a 200-file TypeScript project. The model is great. The token bill was not.
The problem wasn't the model — it was what I was feeding it. Every session, the agent would read 30-40 files trying to orient itself before doing any actual work. Same files, same discoveries, same wasted tokens. Every single time.
After a lot of trial and error, I got my average input tokens per query from about 8,200 down to 2,100. Here's what worked, in order of impact.
Step 1: Write a real CLAUDE.md (not a vague one)
Most people write something like:
This is a TypeScript project using Express and React.
Please follow best practices.
This tells the agent almost nothing. It's going to read your whole codebase anyway.
What actually works is being specific about decisions, not descriptions:
## Auth
- Auth uses middleware in src/auth/middleware.ts
- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts
- DO NOT touch src/auth/legacy.ts — deprecated, will be removed Q2
## Database
- Prisma ORM, schema in prisma/schema.prisma
- All migrations must be backward-compatible
- Connection pooling handled by src/db/pool.ts, do not create new connections
## Conventions
- All API handlers in src/handlers/, one file per resource
- Error handling through src/lib/errors.ts, do not use try/catch in handlers
- Tests mirror src/ structure in tests/
The key: tell the agent what it would otherwise spend 10 minutes figuring out. Decisions, not descriptions. "We use Express" is useless. "Auth uses JWT with refresh rotation in this specific file" saves the agent from reading your entire auth directory.
Impact: about 20% token reduction. Significant, but not enough.
Step 2: Stop letting the agent grep your whole project
Here's what happens when you ask "how does authentication work in this project" without any context management:
- Agent searches for "auth" across the codebase
- Gets 40+ hits across middleware, tests, configs, legacy code, node_modules if you're unlucky
- Reads 15-20 files to piece together the picture
- Burns 8,000+ tokens before writing a single line of code
The agent doesn't need 40 files. It needs the auth middleware, the two things it depends on, and the two things that depend on it. That's 5 files.
The question is: how do you give the agent the right 5 files instead of all 40?
This is where I stopped being able to solve it with prompting alone.
Step 3: Give the agent a dependency graph
I built a tool called vexp that pre-computes a dependency graph of your codebase at the AST level. Not grep, not text search — actual parsed relationships: who imports what, who calls what, what types flow where.
When the agent asks about authentication, instead of grep-matching "auth" across 40 files, it gets the relevant subgraph: the auth function, its dependencies, and its dependents, packed into a token budget you control.
Before (grep approach):
Agent reads: 40 files, 8,247 tokens
Relevant files: 5
Wasted: about 80% of input tokens
After (dependency graph):
Agent reads: capsule with 5 relevant nodes, 2,140 tokens
Relevant files: 5
Wasted: near zero
Same information, 74% fewer tokens.
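To make the subgraph idea concrete, here is a simplified sketch of the selection step. It is not vexp's actual code: the `GraphNode` shape, the `selectSubgraph` function, the per-node token counts, and the two handler file names are assumptions made up for illustration. It walks outward from a seed node in both directions (dependencies and dependents) and keeps nodes, nearest first, while they fit in a token budget.

```typescript
// Illustrative sketch only; names and shapes here are hypothetical.
type NodeId = string;

interface GraphNode {
  id: NodeId;
  tokens: number;       // assumed pre-computed token cost of this node
  dependsOn: NodeId[];  // outgoing edges: what this node imports or calls
}

// Build the reverse index (who depends on whom) from the forward edges.
function buildDependents(nodes: Map<NodeId, GraphNode>): Map<NodeId, NodeId[]> {
  const dependents = new Map<NodeId, NodeId[]>();
  nodes.forEach((node) => {
    for (const dep of node.dependsOn) {
      const list = dependents.get(dep) ?? [];
      list.push(node.id);
      dependents.set(dep, list);
    }
  });
  return dependents;
}

// Breadth-first walk from the seed, following edges in both directions.
// A node that does not fit the remaining budget is skipped and not expanded.
function selectSubgraph(
  nodes: Map<NodeId, GraphNode>,
  seed: NodeId,
  tokenBudget: number,
  maxDepth: number = 1,
): NodeId[] {
  const dependents = buildDependents(nodes);
  const selected: NodeId[] = [];
  const seen = new Set<NodeId>([seed]);
  let frontier: NodeId[] = [seed];
  let used = 0;

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: NodeId[] = [];
    for (const id of frontier) {
      const node = nodes.get(id);
      if (!node) continue; // edge points outside the indexed graph
      if (used + node.tokens > tokenBudget) continue;
      used += node.tokens;
      selected.push(id);
      const neighbors = node.dependsOn.concat(dependents.get(id) ?? []);
      for (const neighbor of neighbors) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return selected;
}

// Toy graph: token counts and the two handler files are made up.
const graph = new Map<NodeId, GraphNode>();
const addNode = (id: NodeId, tokens: number, dependsOn: NodeId[]): void => {
  graph.set(id, { id, tokens, dependsOn });
};
addNode("src/auth/middleware.ts", 600, ["src/auth/refresh.ts", "src/lib/errors.ts"]);
addNode("src/auth/refresh.ts", 400, []);
addNode("src/lib/errors.ts", 300, []);
addNode("src/handlers/users.ts", 500, ["src/auth/middleware.ts"]);
addNode("src/handlers/orders.ts", 500, ["src/auth/middleware.ts", "src/db/pool.ts"]);
addNode("src/db/pool.ts", 700, []);

const selected = selectSubgraph(graph, "src/auth/middleware.ts", 2500);
console.log(selected);
```

With a depth of 1, the walk returns the middleware, its two dependencies, and its two dependents. `src/db/pool.ts` is two hops away, so it stays out of the result. Lowering the budget trims the farthest nodes first, which is the behavior you want from a budget you control.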
Step 4: Solve the "amnesia" problem
Token reduction is half the problem. The other half: every new session starts from zero.
Monday the agent spends 20 minutes discovering that your payment module has a non-obvious dependency on a legacy Redis cache. Tuesday, new session, same 20 minutes. Wednesday, same again.
I tried several ways to make agents save their own notes:
- "After completing a task, save your observations" — ignored 90% of the time
- Detailed save instructions in CLAUDE.md — maybe 15% compliance
- Making it a "required step" — agent writes "completed successfully, no issues" and moves on
The models are optimized for current-task completion. A tool that only benefits future sessions has zero value to the current context window. The incentive structure works against you.
What actually worked: passive observation. Instead of asking the agent to save things, watch what it does. Track which files it reads, what changes it makes at the AST level, and infer observations from its behavior. The agent that spent 20 minutes on your Redis dependency didn't save a note about it — but the tool call pattern and code changes tell you exactly what it learned.
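A rough sketch of what passive observation can look like, assuming you have a log of the agent's tool calls. The `ToolCall` shape, the `observeSession` function, the "read twice or edited" heuristic, and the file paths are all hypothetical; a real agent log will look different, and this is not how vexp stores anything.

```typescript
// Hypothetical event shape for one agent tool call.
interface ToolCall {
  tool: "read" | "edit";
  path: string;
}

interface Observation {
  path: string;
  reads: number;
  edited: boolean;
}

// Summarize what a session focused on without asking the agent to write notes.
// Heuristic (an assumption): a file matters if it was edited or read at least twice.
function observeSession(calls: ToolCall[]): Observation[] {
  const byPath = new Map<string, Observation>();
  for (const call of calls) {
    let obs = byPath.get(call.path);
    if (!obs) {
      obs = { path: call.path, reads: 0, edited: false };
      byPath.set(call.path, obs);
    }
    if (call.tool === "read") {
      obs.reads += 1;
    } else {
      obs.edited = true;
    }
  }

  const result: Observation[] = [];
  byPath.forEach((obs) => {
    if (obs.edited || obs.reads >= 2) result.push(obs);
  });
  // Most-read files first.
  return result.sort((a, b) => b.reads - a.reads);
}

// Made-up session: the agent keeps returning to the Redis cache file.
const session: ToolCall[] = [
  { tool: "read", path: "src/payments/charge.ts" },
  { tool: "read", path: "src/cache/redis.ts" },
  { tool: "read", path: "src/payments/charge.ts" },
  { tool: "read", path: "src/cache/redis.ts" },
  { tool: "read", path: "src/cache/redis.ts" },
  { tool: "edit", path: "src/payments/charge.ts" },
  { tool: "read", path: "README.md" },
];

const observations = observeSession(session);
console.log(observations);
```

The agent never wrote a note, but the log alone shows that the payment work kept pulling it back to the cache file. That is the signal worth keeping for the next session.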
These observations get linked to the code graph. When the underlying code changes, the linked observations are automatically flagged as stale, so the agent isn't fed outdated context.
- Session 1: Agent discovers Redis dependency — observation saved passively
- Session 2: Agent gets the observation immediately — skips the 20-minute rediscovery
- Session 3: Someone refactors the Redis cache out — observation flagged stale — agent re-explores
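One way to implement the staleness check in that lifecycle is to store a fingerprint of the code alongside each observation and compare it later. The sketch below assumes that approach; `LinkedObservation`, `fingerprint`, and `isStale` are illustrative names, the node ID is made up, and the hash is a small FNV-1a-style function over UTF-16 code units chosen only to keep the example dependency-free.

```typescript
interface LinkedObservation {
  note: string;
  nodeId: string;         // graph node the note is attached to
  hashAtCapture: string;  // fingerprint of the node's source when the note was saved
}

// Small non-cryptographic hash (FNV-1a style). Good enough to detect edits in a sketch.
function fingerprint(source: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < source.length; i++) {
    hash ^= source.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// An observation is stale if its node is gone or its source no longer matches.
function isStale(obs: LinkedObservation, currentSources: Map<string, string>): boolean {
  const current = currentSources.get(obs.nodeId);
  return current === undefined || fingerprint(current) !== obs.hashAtCapture;
}

// Session 1: the note is captured against the code as it exists then.
const originalSource = "const ttl = 60;";
const redisNote: LinkedObservation = {
  note: "Payment module depends on the legacy Redis cache",
  nodeId: "src/cache/redis.ts#getCached", // hypothetical node ID
  hashAtCapture: fingerprint(originalSource),
};

// Session 2: code unchanged, so the note is still served.
const unchanged = new Map<string, string>();
unchanged.set(redisNote.nodeId, originalSource);

// Session 3: someone changed the code, so the note is flagged.
const edited = new Map<string, string>();
edited.set(redisNote.nodeId, "const ttl = 90;");

console.log(isStale(redisNote, unchanged), isStale(redisNote, edited));
```

Treating a deleted node as stale covers the refactor case in Session 3: once the Redis cache is removed, the note has nothing left to attach to and the agent re-explores.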
The combined result
| Metric | Before | After |
|---|---|---|
| Avg input tokens/query | 8,200 | 2,100 |
| Session start orientation | 5-10 min | under 30 sec |
| Repeated discoveries | Every session | Once |
| Token reduction | — | 65-74% |
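For anyone checking the math, the reduction figure is just the relative drop in input tokens. A tiny sketch (the `reductionPercent` helper is only for illustration):

```typescript
// Relative reduction in input tokens, as a percentage.
function reductionPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Averages from the table above: roughly 74%.
const averageCase = reductionPercent(8200, 2100);

// The single auth query from Step 3: also roughly 74%.
const authQuery = reductionPercent(8247, 2140);

console.log(averageCase.toFixed(1), authQuery.toFixed(1));
```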
On a practical level this means:
- If you're on Claude Max/Pro: 2-3x more work before hitting usage caps
- If you're on API: direct cost savings on input tokens
- On any plan: the agent starts working immediately instead of spending the first 10 minutes reading
Setup
vexp works as a VS Code extension or standalone CLI. It's an MCP server, so it works with any agent that speaks MCP: Claude Code, Cursor, Windsurf, Cline, Roo Code, Copilot, aider, Codex.
# VS Code
Search "vexp" in the extension marketplace
# CLI (for Claude Code, terminal agents)
npm install -g vexp-cli
Free tier: 2,000 nodes, full memory tools, no account needed. Runs 100% local — single Rust binary, SQLite, zero network calls.
Pro ($19/mo): multi-repo support, 50k nodes, priority updates.
What I'd do if I were starting today
- Write a specific CLAUDE.md — decisions, not descriptions. 30 minutes, 20% improvement.
- Set up a dependency graph — stop letting the agent grep. This is where the real token savings are.
- Let memory accumulate — don't try to make the agent save notes. Observe passively and let the context build itself over 3-4 sessions.
The first step is free and takes 30 minutes. The rest takes about 5 minutes to install.
I'm the developer behind vexp. Happy to answer questions about the architecture, MCP integration, or anything else in the comments.