Context windows keep growing. 200k tokens. A million. The assumption is that bigger windows mean better answers when working with code.
In practice, that's not what happens.
## The attention problem
Say you have a typical 80-file TypeScript project. That's about 63,000 tokens. Any modern model can fit that in its context window, no problem.
But fitting it isn't the same as understanding it. There's a growing body of research showing that attention quality falls off as context gets longer. At some point, stuffing more tokens in actually makes the output worse. The model starts losing track of things, latency goes up, and the reasoning gets sloppy.
And when you think about it, most of what's in those 63k tokens is noise for the kind of questions you're usually asking. You want to know how services connect, what the API surface looks like, how the type system is structured. The model doesn't need to read through every loop body, error handler, and validation chain to answer that. That stuff is maybe 80% of your token budget, and it's not helping.
## What the model actually needs
When you're asking about architecture, what matters is:
- What functions and methods exist, their parameters and return types
- What types and interfaces are defined
- How modules connect and export
- Class hierarchies and trait implementations
What doesn't matter:
- How you iterate through a list
- What happens inside a try/catch
- Variable assignments in function bodies
- The internals of a CRUD operation the model has seen a thousand times
## Skim: strip implementation, keep structure
I built Skim to do this automatically. It uses tree-sitter to parse code at the AST level and strips out implementation nodes while keeping the structural signal intact.
```bash
skim file.ts  # structure mode
```

```typescript
// Before: Full implementation
export class UserService {
  constructor(private db: Database, private cache: Cache) {}

  async getUser(id: string): Promise<User | null> {
    const cached = await this.cache.get(`user:${id}`);
    if (cached) return JSON.parse(cached);
    const user = await this.db.query('SELECT * FROM users WHERE id = $1', [id]);
    if (user) await this.cache.set(`user:${id}`, JSON.stringify(user), 3600);
    return user;
  }

  async updateUser(id: string, data: Partial<User>): Promise<User> {
    const updated = await this.db.query(
      'UPDATE users SET ... WHERE id = $1 RETURNING *', [id]
    );
    await this.cache.del(`user:${id}`);
    return updated;
  }
}

// After: Structure mode
export class UserService {
  constructor(private db: Database, private cache: Cache) {}
  async getUser(id: string): Promise<User | null> { /* ... */ }
  async updateUser(id: string, data: Partial<User>): Promise<User> { /* ... */ }
}
```
The model can still see what UserService does, what it depends on, and what each method accepts and returns. It just doesn't have to wade through the caching logic and SQL queries to get there.
## Four modes

| Mode | Reduction | Good for |
|---|---|---|
| `structure` | 60% | Understanding architecture, reviewing design |
| `signatures` | 88% | Mapping API surfaces, understanding interfaces |
| `types` | 91% | Analyzing the type system, domain modeling |
| `full` | 0% | Passthrough, same as `cat` |
```bash
skim src/ --mode=types       # just type definitions
skim src/ --mode=signatures  # function and method signatures
skim 'src/**/*.ts'           # glob patterns, parallel processing
```
## Real numbers
Here's what that 80-file TypeScript project looks like across modes:
| Mode | Tokens | Reduction |
|---|---|---|
| Full | 63,198 | 0% |
| Structure | 25,119 | 60.3% |
| Signatures | 7,328 | 88.4% |
| Types | 5,181 | 91.8% |
In types mode, the whole project comes down to about 5k tokens. That fits in a single prompt with plenty of room left for your question. You can ask things like "explain the entire authentication flow" or "how do these services interact?" and the model actually has enough headroom to reason about it properly.
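The reduction percentages in the table follow directly from the token counts; a quick sanity check (plain arithmetic, nothing Skim-specific):

```rust
// Verify the table's reduction figures from the raw token counts.
fn reduction(original: u64, transformed: u64) -> f64 {
    100.0 * (1.0 - transformed as f64 / original as f64)
}

fn main() {
    for (mode, tokens) in [("structure", 25_119u64), ("signatures", 7_328), ("types", 5_181)] {
        println!("{mode}: {:.1}% reduction", reduction(63_198, tokens));
    }
    // structure: 60.3%, signatures: 88.4%, types: 91.8%
}
```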
## Pipe workflows
Skim just writes to stdout, so it plugs into whatever you're already using:
```bash
# Feed to Claude
skim src/ --mode=structure | claude "Review the architecture"

# Feed to any LLM API
skim src/ --mode=types | curl -X POST api.openai.com/... -d @-

# Quick structural overview
skim src/ | less

# See token counts
skim src/ --show-stats 2>&1 >/dev/null
# Output: Files: 80, Lines: 12,450, Tokens (original): 63,198, Tokens (transformed): 25,119
```
This was a deliberate design choice. Skim is a streaming reader (think cat but with some brains), not a file compression tool. Everything goes to stdout so you can pipe it wherever.
## Under the hood
The parsing is done with tree-sitter, the same incremental parser that handles syntax highlighting in most modern editors. Each language defines which AST node types to keep for each mode:
- Structure: function, class, and interface declarations stay; bodies get replaced with `/* ... */`
- Signatures: just function signatures and method declarations
- Types: type definitions, interfaces, enums, type aliases
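In essence, each mode is a keep/drop predicate over AST node kinds. As a rough sketch of that idea (the node-kind names follow tree-sitter's TypeScript grammar, but the mapping below is illustrative, not Skim's actual rule set):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mode { Structure, Signatures, Types }

// Decide whether a node kind survives a given mode. A real traversal
// walks the tree and emits source spans for surviving nodes; this only
// shows the filtering decision.
fn keep(kind: &str, mode: Mode) -> bool {
    match mode {
        Mode::Types => matches!(kind,
            "interface_declaration" | "type_alias_declaration" | "enum_declaration"),
        Mode::Signatures => matches!(kind,
            "function_signature" | "method_signature"),
        // Structure keeps everything except function/method bodies.
        Mode::Structure => kind != "statement_block",
    }
}

fn main() {
    assert!(keep("interface_declaration", Mode::Types));
    assert!(keep("method_signature", Mode::Signatures));
    assert!(!keep("statement_block", Mode::Structure));
}
```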
Internally it's a strategy pattern where each language owns its transformation rules:
```rust
impl Language {
    pub(crate) fn transform_source(&self, source: &str, mode: Mode, config: &Config) -> Result<String> {
        match self {
            Language::Json => json::transform_json(source),  // serde_json
            _ => tree_sitter_transform(source, *self, mode), // tree-sitter
        }
    }
}
```
JSON gets its own path through serde_json because it's data, not code. Everything else goes through tree-sitter.
On the performance side, it parses and transforms a 3,000-line file in about 14.6ms. The hot path uses zero-copy string slicing, referencing source bytes directly without allocating. A caching layer with mtime-based invalidation makes repeated reads 40-50x faster, and rayon parallelizes the work when you're processing multiple files.
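The invalidation rule is simple: a cached transform stays valid until the file's modification time changes. A minimal sketch of that idea in standard-library Rust (the struct and method names here are made up for illustration, not Skim's internals):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

// Cache transform output keyed by path, invalidated when mtime changes.
struct TransformCache {
    entries: HashMap<PathBuf, (SystemTime, String)>,
}

impl TransformCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get_or_compute(
        &mut self,
        path: &Path,
        transform: impl Fn(&str) -> String,
    ) -> std::io::Result<String> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some((cached_mtime, out)) = self.entries.get(path) {
            if *cached_mtime == mtime {
                return Ok(out.clone()); // cache hit: skip reading and parsing
            }
        }
        let out = transform(&fs::read_to_string(path)?);
        self.entries.insert(path.to_path_buf(), (mtime, out.clone()));
        Ok(out)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("skim_cache_demo.ts");
    fs::write(&path, "export const x = 1;")?;
    let mut cache = TransformCache::new();
    let first = cache.get_or_compute(&path, |s| s.to_uppercase())?;
    let second = cache.get_or_compute(&path, |s| s.to_uppercase())?; // served from cache
    assert_eq!(first, second);
    Ok(())
}
```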
## 9 languages
TypeScript, JavaScript, Python, Rust, Go, Java, Markdown, JSON, YAML. It figures out the language from the file extension. If you want to add a new tree-sitter language, it takes about 30 minutes.
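Extension-based detection is a small lookup. A sketch under the assumption of conventional extensions (the enum and extension list are illustrative, not Skim's actual code):

```rust
// Map a file extension to one of the nine supported languages.
#[derive(Debug, PartialEq)]
enum Lang { TypeScript, JavaScript, Python, Rust, Go, Java, Markdown, Json, Yaml }

fn detect(path: &str) -> Option<Lang> {
    // Take everything after the last '.' as the extension.
    let ext = path.rsplit('.').next()?;
    Some(match ext {
        "ts" | "tsx" => Lang::TypeScript,
        "js" | "jsx" | "mjs" => Lang::JavaScript,
        "py" => Lang::Python,
        "rs" => Lang::Rust,
        "go" => Lang::Go,
        "java" => Lang::Java,
        "md" => Lang::Markdown,
        "json" => Lang::Json,
        "yml" | "yaml" => Lang::Yaml,
        _ => return None, // unknown extension: skip the file
    })
}

fn main() {
    assert_eq!(detect("src/app.ts"), Some(Lang::TypeScript));
    assert_eq!(detect("config.yaml"), Some(Lang::Yaml));
    assert_eq!(detect("photo.png"), None);
}
```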
## Getting started
```bash
# Try without installing
npx rskim src/

# Install via npm
npm install -g rskim

# Install via cargo
cargo install rskim

# Basic usage
skim file.ts                     # structure mode (default)
skim src/ --mode=signatures      # signatures for a directory
skim 'src/**/*.ts' --mode=types  # glob pattern, types only
skim src/ --show-stats           # token count comparison
```
Full docs on GitHub: github.com/dean0x/skim
Website: dean0x.github.io/x/skim
## When to reach for it
- You want to ask an LLM about architecture or design and the codebase is too noisy at full size
- You're getting an overview of unfamiliar code and don't need implementation details yet
- You're documenting API surfaces
- Token costs are adding up (at $3/M input tokens, resending a 63k-token project costs roughly $0.19 per query, and that compounds query after query)
- You're running a local model where context is more limited
When you actually need the model to look at implementation (debugging a specific function, refactoring logic), just use full mode or plain cat.
Open source, MIT licensed. Supports 9 languages, built in Rust. Curious how others are dealing with this when they work with AI on larger codebases.