Context windows keep growing. 200k tokens. A million. The assumption is that bigger windows mean better answers when working with code.
In practice, that's not what happens.
## The attention problem
Say you have a typical 80-file TypeScript project. That's about 63,000 tokens. Any modern model can fit that in its context window, no problem.
But fitting it isn't the same as understanding it. There's a growing body of research showing that attention quality falls off as context gets longer. At some point, stuffing more tokens in actually makes the output worse. The model starts losing track of things, latency goes up, and the reasoning gets sloppy.
And when you think about it, most of what's in those 63k tokens is noise for the kind of questions you're usually asking. You want to know how services connect, what the API surface looks like, how the type system is structured. The model doesn't need to read through every loop body, error handler, and validation chain to answer that. That stuff is maybe 80% of your token budget, and it's not helping.
## What the model actually needs
When you're asking about architecture, what matters is:
- What functions and methods exist, their parameters and return types
- What types and interfaces are defined
- How modules connect and export
- Class hierarchies and trait implementations
What doesn't matter:
- How you iterate through a list
- What happens inside a try/catch
- Variable assignments in function bodies
- The internals of a CRUD operation the model has seen a thousand times
## Skim: strip implementation, keep structure
I built Skim to do this automatically. It uses tree-sitter to parse code at the AST level and strips out implementation nodes while keeping the structural signal intact.
```bash
skim file.ts  # structure mode
```

```typescript
// Before: Full implementation
export class UserService {
  constructor(private db: Database, private cache: Cache) {}

  async getUser(id: string): Promise<User | null> {
    const cached = await this.cache.get(`user:${id}`);
    if (cached) return JSON.parse(cached);
    const user = await this.db.query('SELECT * FROM users WHERE id = $1', [id]);
    if (user) await this.cache.set(`user:${id}`, JSON.stringify(user), 3600);
    return user;
  }

  async updateUser(id: string, data: Partial<User>): Promise<User> {
    const updated = await this.db.query(
      'UPDATE users SET ... WHERE id = $1 RETURNING *', [id]
    );
    await this.cache.del(`user:${id}`);
    return updated;
  }
}

// After: Structure mode
export class UserService {
  constructor(private db: Database, private cache: Cache) {}
  async getUser(id: string): Promise<User | null> { /* ... */ }
  async updateUser(id: string, data: Partial<User>): Promise<User> { /* ... */ }
}
```
The model can still see what UserService does, what it depends on, and what each method accepts and returns. It just doesn't have to wade through the caching logic and SQL queries to get there.
## Four modes

| Mode | Reduction | Good for |
|---|---|---|
| `structure` | 60% | Understanding architecture, reviewing design |
| `signatures` | 88% | Mapping API surfaces, understanding interfaces |
| `types` | 91% | Analyzing the type system, domain modeling |
| `full` | 0% | Passthrough, same as `cat` |
```bash
skim src/ --mode=types       # just type definitions
skim src/ --mode=signatures  # function and method signatures
skim 'src/**/*.ts'           # glob patterns, parallel processing
```
## Real numbers
Here's what that 80-file TypeScript project looks like across modes:
| Mode | Tokens | Reduction |
|---|---|---|
| Full | 63,198 | 0% |
| Structure | 25,119 | 60.3% |
| Signatures | 7,328 | 88.4% |
| Types | 5,181 | 91.8% |
In types mode, the whole project comes down to about 5k tokens. That fits in a single prompt with plenty of room left for your question. You can ask things like "explain the entire authentication flow" or "how do these services interact?" and the model actually has enough headroom to reason about it properly.
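The reduction percentages in the table follow directly from the token counts; a quick sanity check (plain arithmetic, nothing Skim-specific):

```rust
// Verify the table's reduction figures from the raw token counts.
fn reduction(original: u64, transformed: u64) -> f64 {
    100.0 * (1.0 - transformed as f64 / original as f64)
}

fn main() {
    for (mode, tokens) in [("structure", 25_119u64), ("signatures", 7_328), ("types", 5_181)] {
        println!("{mode}: {:.1}% reduction", reduction(63_198, tokens));
    }
    // structure: 60.3%, signatures: 88.4%, types: 91.8%
}
```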
## Pipe workflows
Skim just writes to stdout, so it plugs into whatever you're already using:
```bash
# Feed to Claude
skim src/ --mode=structure | claude "Review the architecture"

# Feed to any LLM API
skim src/ --mode=types | curl -X POST api.openai.com/... -d @-

# Quick structural overview
skim src/ | less

# See token counts
skim src/ --show-stats 2>&1 >/dev/null
# Output: Files: 80, Lines: 12,450, Tokens (original): 63,198, Tokens (transformed): 25,119
```
This was a deliberate design choice. Skim is a streaming reader (think cat but with some brains), not a file compression tool. Everything goes to stdout so you can pipe it wherever.
## Under the hood
The parsing is done with tree-sitter, the same incremental parser that handles syntax highlighting in most modern editors. Each language defines which AST node types to keep for each mode:
- Structure: function, class, and interface declarations stay; bodies get replaced with `/* ... */`
- Signatures: just function signatures and method declarations
- Types: type definitions, interfaces, enums, type aliases
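In essence, each mode is a keep/drop predicate over AST node kinds. As a rough sketch of that idea (the node-kind names follow tree-sitter's TypeScript grammar, but the mapping below is illustrative, not Skim's actual rule set):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mode { Structure, Signatures, Types }

// Decide whether a node kind survives a given mode. A real traversal
// walks the tree and emits source spans for surviving nodes; this only
// shows the filtering decision.
fn keep(kind: &str, mode: Mode) -> bool {
    match mode {
        Mode::Types => matches!(kind,
            "interface_declaration" | "type_alias_declaration" | "enum_declaration"),
        Mode::Signatures => matches!(kind,
            "function_signature" | "method_signature"),
        // Structure keeps everything except function/method bodies.
        Mode::Structure => kind != "statement_block",
    }
}

fn main() {
    assert!(keep("interface_declaration", Mode::Types));
    assert!(keep("method_signature", Mode::Signatures));
    assert!(!keep("statement_block", Mode::Structure));
}
```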
Internally it's a strategy pattern where each language owns its transformation rules:
```rust
impl Language {
    pub(crate) fn transform_source(&self, source: &str, mode: Mode, config: &Config) -> Result<String> {
        match self {
            Language::Json => json::transform_json(source),  // serde_json
            _ => tree_sitter_transform(source, *self, mode), // tree-sitter
        }
    }
}
```
JSON gets its own path through serde_json because it's data, not code. Everything else goes through tree-sitter.
On the performance side, it parses and transforms a 3,000-line file in about 14.6ms. The hot path uses zero-copy string slicing, referencing source bytes directly without allocating. A caching layer with mtime-based invalidation makes repeated reads 40-50x faster, and rayon parallelizes the work when you're processing multiple files.
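The invalidation rule is simple: a cached transform stays valid until the file's modification time changes. A minimal sketch of that idea in standard-library Rust (the struct and method names here are made up for illustration, not Skim's internals):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

// Cache transform output keyed by path, invalidated when mtime changes.
struct TransformCache {
    entries: HashMap<PathBuf, (SystemTime, String)>,
}

impl TransformCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get_or_compute(
        &mut self,
        path: &Path,
        transform: impl Fn(&str) -> String,
    ) -> std::io::Result<String> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some((cached_mtime, out)) = self.entries.get(path) {
            if *cached_mtime == mtime {
                return Ok(out.clone()); // cache hit: skip reading and parsing
            }
        }
        let out = transform(&fs::read_to_string(path)?);
        self.entries.insert(path.to_path_buf(), (mtime, out.clone()));
        Ok(out)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("skim_cache_demo.ts");
    fs::write(&path, "export const x = 1;")?;
    let mut cache = TransformCache::new();
    let first = cache.get_or_compute(&path, |s| s.to_uppercase())?;
    let second = cache.get_or_compute(&path, |s| s.to_uppercase())?; // served from cache
    assert_eq!(first, second);
    Ok(())
}
```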
## 9 languages
TypeScript, JavaScript, Python, Rust, Go, Java, Markdown, JSON, YAML. It figures out the language from the file extension. If you want to add a new tree-sitter language, it takes about 30 minutes.
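Extension-based detection is a small lookup. A sketch under the assumption of conventional extensions (the enum and extension list are illustrative, not Skim's actual code):

```rust
// Map a file extension to one of the nine supported languages.
#[derive(Debug, PartialEq)]
enum Lang { TypeScript, JavaScript, Python, Rust, Go, Java, Markdown, Json, Yaml }

fn detect(path: &str) -> Option<Lang> {
    // Take everything after the last '.' as the extension.
    let ext = path.rsplit('.').next()?;
    Some(match ext {
        "ts" | "tsx" => Lang::TypeScript,
        "js" | "jsx" | "mjs" => Lang::JavaScript,
        "py" => Lang::Python,
        "rs" => Lang::Rust,
        "go" => Lang::Go,
        "java" => Lang::Java,
        "md" => Lang::Markdown,
        "json" => Lang::Json,
        "yml" | "yaml" => Lang::Yaml,
        _ => return None, // unknown extension: skip the file
    })
}

fn main() {
    assert_eq!(detect("src/app.ts"), Some(Lang::TypeScript));
    assert_eq!(detect("config.yaml"), Some(Lang::Yaml));
    assert_eq!(detect("photo.png"), None);
}
```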
## Getting started
```bash
# Try without installing
npx rskim src/

# Install via npm
npm install -g rskim

# Install via cargo
cargo install rskim

# Basic usage
skim file.ts                     # structure mode (default)
skim src/ --mode=signatures      # signatures for a directory
skim 'src/**/*.ts' --mode=types  # glob pattern, types only
skim src/ --show-stats           # token count comparison
```
Full docs on GitHub: github.com/dean0x/skim
Website: dean0x.github.io/x/skim
## When to reach for it
- You want to ask an LLM about architecture or design and the codebase is too noisy at full size
- You're getting an overview of unfamiliar code and don't need implementation details yet
- You're documenting API surfaces
- Token costs are adding up (at $3/M input tokens, resending a 63k-token project costs roughly $0.19 per query, and that compounds query after query)
- You're running a local model where context is more limited
When you actually need the model to look at implementation (debugging a specific function, refactoring logic), just use full mode or plain cat.
Open source, MIT licensed. Supports 9 languages, built in Rust. Curious how others are dealing with this when they work with AI on larger codebases.