Stacklit on GitHub -- the tool I built after running these tests.
I have been using AI agents on real projects for the past year. Claude Code, Cursor, Aider. The one problem that never goes away: every session starts by the agent reading files to understand the codebase. Same files. Same tokens. Every time.
So I tested four tools that claim to solve this. I ran them on FastAPI (108,075 lines of Python, 1,131 files) and measured what actually came out.
The four tools
Repomix (23k stars) -- packs your entire repo into one XML or Markdown file. Every line of source code in a single output.
Aider repo-map (part of Aider, 43k stars) -- generates an ephemeral text map of functions and classes ranked by relevance. Built into Aider, not available separately.
Codebase Memory MCP (1.4k stars) -- builds a SQLite knowledge graph from tree-sitter ASTs. 66 languages. Queryable through 14 MCP tools.
Stacklit (new, my project) -- generates a committed JSON index with module graph, exports, types, and activity. One file, committed to git.
The big comparison table
| Feature | Repomix | Aider repo-map | CB Memory MCP | Stacklit |
|---|---|---|---|---|
| What it produces | Full code dump (XML/MD) | Ephemeral text map | SQLite knowledge graph | JSON index file |
| Output size (FastAPI) | ~800k tokens | ~8k-15k tokens | per-query | 4,142 tokens |
| Committed to repo | No (too large) | No (ephemeral) | No (local DB) | Yes |
| Works with Claude Code | Yes (paste) | No | Yes (MCP) | Yes (file + MCP) |
| Works with Cursor | Yes (paste) | No | Yes (MCP) | Yes (file + MCP) |
| Works with Aider | Yes (paste) | Built-in | No | Yes (reads file) |
| Works with Copilot | Manual paste | No | No | Yes (reads file) |
| Dependency graph | No | No | Yes | Yes |
| Module detection | No | No | Yes | Yes |
| Export/function signatures | Full source code | Function names only | Full signatures | Signatures with types |
| Type definitions | Full source code | No | Yes | Yes (struct/class fields) |
| Git activity heatmap | No | No | Partial | Yes (90-day) |
| Visual output | No | No | No | HTML with 4 views |
| MCP server | No | No | Yes (14 tools) | Yes (7 tools) |
| Monorepo aware | No | No | No | Yes (8 formats) |
| Incremental updates | No | No | Partial | Yes (Merkle hash) |
| Languages (full parsing) | N/A (dumps everything) | Many (tree-sitter) | 66 (tree-sitter) | 11 (tree-sitter) |
| Languages (basic) | N/A | N/A | N/A | Any (line count) |
| Runtime required | Node.js | Python | Running server (C binary) | None |
| Install | npx repomix | Built into Aider | Download + run | npx stacklit init |
| Binary size | ~50MB (Node) | Python env | ~2MB (C) | 32MB (Go, no CGO) |
| Configuration | repomix.config.json | In Aider config | CLI flags | stacklit.toml |
| Open source | MIT | Apache 2.0 | MIT | MIT |
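The "Incremental updates (Merkle hash)" row deserves a quick unpacking. The idea behind a Merkle-style index is that each directory's hash is derived from its children's hashes, so an unchanged subtree keeps its old hash and can be skipped on re-index. A minimal sketch of that scheme (my illustration, not Stacklit's actual implementation):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Content hash of a single file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def dir_hash(path: Path) -> str:
    """Merkle-style hash: a directory's hash covers its children's
    (name, hash) pairs, so any change bubbles up to the root."""
    parts = []
    for child in sorted(path.iterdir()):
        h = dir_hash(child) if child.is_dir() else file_hash(child)
        parts.append(f"{child.name}:{h}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()
```

Comparing stored hashes against freshly computed ones tells the indexer exactly which subtrees need re-parsing and which can be reused as-is.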
Token cost breakdown on FastAPI
This is the number that matters. I ran each tool on FastAPI and counted tokens using tiktoken (cl100k_base encoding, same as GPT-4/Claude):
| Tool | Output tokens | Context windows used | Time |
|---|---|---|---|
| Repomix (XML) | ~800,000 | 4-6 windows (overflows 200k) | ~8s |
| Repomix (compressed) | ~400,000 | 2-3 windows | ~12s |
| Aider repo-map | ~8,000-15,000 | Fits in one | Per-prompt |
| CB Memory MCP | Varies per query | N/A (streaming) | Sub-ms per query |
| Stacklit | 4,142 | Fits in one | 0.4s |
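The "context windows used" column is just ceiling division of output tokens by the window size (real sessions need a bit more headroom for the prompt and response, which is why the table shows ranges). A quick sanity check, assuming a 200k-token window:

```python
import math

WINDOW = 200_000  # assumed context window size, in tokens

def windows_needed(output_tokens: int) -> int:
    """How many full context windows a static output occupies."""
    return math.ceil(output_tokens / WINDOW)

print(windows_needed(800_000))  # Repomix XML dump -> 4
print(windows_needed(4_142))    # Stacklit index   -> 1
```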
Stacklit produces the smallest static output. It does not include source code. It includes structure: which modules exist, what they export, how they connect, what changed recently.
Token cost across 4 projects
Not just FastAPI. I ran Stacklit on four popular open source repos:
| Project | Language | Files | Lines | Stacklit tokens |
|---|---|---|---|---|
| Express.js | JavaScript | 141 | 21,346 | 3,765 |
| FastAPI | Python | 1,131 | 108,075 | 4,142 |
| Gin | Go | 100 | 23,829 | 3,361 |
| Axum | Rust | 300 | 43,997 | 14,371 |
Full outputs are in the examples directory.
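One way to read that table is index tokens per 1,000 source lines, which makes the per-language density visible (Axum's index is noticeably denser than the others):

```python
# (lines, index tokens) taken from the table above.
projects = {
    "Express.js": (21_346, 3_765),
    "FastAPI":    (108_075, 4_142),
    "Gin":        (23_829, 3_361),
    "Axum":       (43_997, 14_371),
}

for name, (lines, tokens) in projects.items():
    print(f"{name}: {tokens / lines * 1000:.0f} tokens per 1k lines")
```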
When to use each tool
Repomix: the brute force approach
Best for: pasting a small repo into ChatGPT for a one-shot question. Repos under 50 files where token cost does not matter.
The problem at scale: a 500-file repo produces 500,000+ tokens. That overflows most context windows. The agent gets all the code but no structural understanding. It still has to figure out the architecture from raw source.
Aider repo-map: the smart but locked approach
Best for: people who already use Aider. The repo-map is genuinely good. It ranks code by relevance to your current task using a PageRank-style algorithm.
The catch: it only works inside Aider. You cannot use it with Claude Code, Cursor, or Copilot. The map regenerates every prompt and is not shareable.
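The ranking idea itself is simple enough to sketch: treat files and symbols as a graph, let importance flow along references, and surface the highest-ranked items. A toy power iteration in that spirit (this illustrates the concept only, not Aider's actual implementation):

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Toy power iteration: a module referenced by well-referenced
    modules ranks higher."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, outs in graph.items():
            for target in outs:
                new[target] += damping * rank[node] / len(outs)
        rank = new
    return rank

# app references db and utils; db references utils
refs = {"app": ["db", "utils"], "db": ["utils"], "utils": []}
ranks = pagerank(refs)
top = max(ranks, key=ranks.get)  # "utils": the most-referenced module
```

A repo-map built this way spends its token budget on the symbols most likely to matter for the current prompt, which is why it stays in the 8k-15k range even on large repos.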
Codebase Memory MCP: the power user approach
Best for: large codebases where you need deep semantic queries. Call path tracing, dead code detection, relationship traversal across 66 languages.
The trade-off: you run a server process. The knowledge graph lives in a local database. Switch machines or share with a teammate? They rebuild locally. There is no committed artifact.
Stacklit: the committed index approach
Best for: teams where multiple people (or multiple agents) work on the same repo. The index is a JSON file you commit to git. Clone the repo, the index is there.
It works with every tool without per-tool configuration. Claude Code reads it as a file. Cursor reads it as a file. The MCP server is optional, for tools that prefer querying.
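Because the index is plain JSON, "reads it as a file" means exactly that: any agent or script can load it and answer structural questions without touching source. A sketch with a hypothetical index shape (the field names here are illustrative guesses, not Stacklit's actual schema):

```python
import json

# Hypothetical stacklit.json excerpt -- field names are illustrative,
# not the tool's actual schema.
index_text = """
{
  "modules": {
    "app/routes": {"exports": ["get_user(id: int) -> User"], "imports": ["app/db"]},
    "app/db":     {"exports": ["User", "connect() -> Conn"],  "imports": []}
  }
}
"""

index = json.loads(index_text)
# An agent can answer "what depends on app/db?" without reading source:
dependents = [m for m, info in index["modules"].items()
              if "app/db" in info["imports"]]
print(dependents)  # -> ['app/routes']
```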
What Stacklit extracts per language
| Language | Parser | What you get in the index |
|---|---|---|
| Go | stdlib AST | imports, exports with full signatures, struct fields, interface methods |
| TypeScript/JS | tree-sitter | ESM/CJS/dynamic imports, classes, interfaces, type aliases, enums |
| Python | tree-sitter | imports, classes with all methods, type hints, decorators, `__main__` |
| Rust | tree-sitter | use/mod/crate, pub items with generics, trait methods, struct fields |
| Java | tree-sitter | imports, public classes, method signatures with parameter types |
| C# | tree-sitter | using directives, public types, method signatures |
| Ruby | tree-sitter | require/require_relative, classes, modules, methods |
| PHP | tree-sitter | namespace use, classes, traits, public methods |
| Kotlin | tree-sitter | imports, classes, objects, data classes, functions |
| Swift | tree-sitter | imports, structs, classes, protocols, enums |
| C/C++ | tree-sitter | #include, functions, structs, unions, typedefs |
Everything else gets basic support: language detection and line count per module.
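That basic tier amounts to extension-based language detection plus a line count per module. Roughly (my sketch of the fallback idea, with an illustrative extension map, not Stacklit's actual detection table):

```python
from pathlib import Path

# Illustrative extension map -- not the tool's actual detection table.
EXT_LANG = {".ex": "Elixir", ".hs": "Haskell", ".lua": "Lua", ".erl": "Erlang"}

def basic_info(path: Path) -> dict:
    """Fallback for languages without a tree-sitter grammar:
    identify the language and count lines, nothing more."""
    lang = EXT_LANG.get(path.suffix.lower(), "Unknown")
    lines = len(path.read_text(errors="replace").splitlines())
    return {"module": path.name, "language": lang, "lines": lines}
```

That is enough for the module graph and activity heatmap to stay complete, even though exports and types are missing for those files.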
Where Stacklit falls short
I want to be upfront about this:
- 11 languages with full extraction, not 66. Codebase Memory MCP covers more languages deeply. If your stack is Elixir or Haskell, Stacklit gives you line counts, not full extraction.
- No function-level call graphs. Stacklit maps module dependencies, not "which function calls which." CB Memory MCP and Axon do this.
- No runtime queries. The index is a snapshot. It does not answer questions about the codebase on demand the way a running MCP server does. (Though Stacklit does have an MCP server that reads from the index.)
- No source code in the output. Repomix gives the agent actual code. Stacklit gives a map. Sometimes the agent needs the code and will still read files.
My actual recommendation
Use Stacklit as a baseline for every repo. It takes under half a second to generate (0.4s on FastAPI) and costs nothing to maintain with a git hook.
Then layer other tools on top for specific needs:
- Repomix for one-shot full-codebase prompts
- Aider if that is your daily driver
- CB Memory MCP for deep semantic analysis on large codebases
They are not mutually exclusive. A committed stacklit.json makes every other tool work better because the agent starts with context instead of from zero.
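The git-hook maintenance mentioned above can be a three-line pre-commit hook. This sketch assumes `npx stacklit init` (the command from this post) regenerates stacklit.json in place; adjust if the tool ships a dedicated regeneration subcommand:

```shell
#!/bin/sh
# .git/hooks/pre-commit -- keep the committed index fresh.
# Assumes `npx stacklit init` rebuilds stacklit.json in place.
npx stacklit init || exit 1
git add stacklit.json
```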
Try it now
npx stacklit init
One command. Scans your codebase. Generates the index. Opens a visual map in your browser.
Works on macOS, Linux, Windows. MIT licensed. Zero runtime dependencies.
The examples directory has full outputs from Express.js, FastAPI, Gin, and Axum so you can see what the index looks like before running it.
Which of these tools do you use? Have you tried combining them? Genuinely curious what setups people have landed on.