GDS K S
# I tested 4 codebase-to-AI tools on FastAPI (108k lines). Here are the token costs.

Stacklit on GitHub -- the tool I built after running these tests.


I have been using AI agents on real projects for the past year: Claude Code, Cursor, Aider. One problem never goes away: every session starts with the agent reading files to understand the codebase. Same files. Same tokens. Every time.

So I tested four tools that claim to solve this. I ran them on FastAPI (108,075 lines of Python, 1,131 files) and measured what actually came out.

## The four tools

Repomix (23k stars) -- packs your entire repo into one XML or Markdown file. Every line of source code in a single output.

Aider repo-map (part of Aider, 43k stars) -- generates an ephemeral text map of functions and classes ranked by relevance. Built into Aider, not available separately.

Codebase Memory MCP (1.4k stars) -- builds a SQLite knowledge graph from tree-sitter ASTs. 66 languages. Queryable through 14 MCP tools.

Stacklit (new, my project) -- generates a committed JSON index with module graph, exports, types, and activity. One file, committed to git.

## The big comparison table

| Feature | Repomix | Aider repo-map | CB Memory MCP | Stacklit |
|---|---|---|---|---|
| What it produces | Full code dump (XML/MD) | Ephemeral text map | SQLite knowledge graph | JSON index file |
| Output size (FastAPI) | ~800k tokens | ~8k-15k tokens | per-query | 4,142 tokens |
| Committed to repo | No (too large) | No (ephemeral) | No (local DB) | Yes |
| Works with Claude Code | Yes (paste) | No | Yes (MCP) | Yes (file + MCP) |
| Works with Cursor | Yes (paste) | No | Yes (MCP) | Yes (file + MCP) |
| Works with Aider | Yes (paste) | Built-in | No | Yes (reads file) |
| Works with Copilot | Manual paste | No | No | Yes (reads file) |
| Dependency graph | No | No | Yes | Yes |
| Module detection | No | No | Yes | Yes |
| Export/function signatures | Full source code | Function names only | Full signatures | Signatures with types |
| Type definitions | Full source code | No | Yes | Yes (struct/class fields) |
| Git activity heatmap | No | No | Partial | Yes (90-day) |
| Visual output | No | No | No | HTML with 4 views |
| MCP server | No | No | Yes (14 tools) | Yes (7 tools) |
| Monorepo aware | No | No | No | Yes (8 formats) |
| Incremental updates | No | No | Partial | Yes (Merkle hash) |
| Languages (full parsing) | N/A (dumps everything) | Many (tree-sitter) | 66 (tree-sitter) | 11 (tree-sitter) |
| Languages (basic) | N/A | N/A | N/A | Any (line count) |
| Runtime required | Node.js | Python | Running C binary | None |
| Install | `npx repomix` | Built into Aider | Download + run | `npx stacklit init` |
| Binary size | ~50MB (Node) | Python env | ~2MB (C) | 32MB (Go, no CGO) |
| Configuration | `repomix.config.json` | In Aider config | CLI flags | `stacklit.toml` |
| Open source | MIT | Apache 2.0 | MIT | MIT |
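The "Yes (Merkle hash)" row refers to content-hash-based change detection. As a rough illustration of the general idea (not Stacklit's actual implementation), a Merkle-style tree hash lets you skip re-indexing any subtree whose combined hash hasn't changed:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hash one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def tree_hash(root: Path) -> str:
    """Combine child hashes into one directory-level hash (Merkle-style).
    If the root hash matches the last run, nothing needs re-indexing;
    if it differs, recurse into children to find the dirty subtrees."""
    parts = []
    for child in sorted(root.iterdir()):
        if child.is_dir():
            parts.append(child.name + ":" + tree_hash(child))
        elif child.is_file():
            parts.append(child.name + ":" + file_hash(child))
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()
```

The payoff is that an unchanged repo costs one hash comparison instead of a full re-parse.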

## Token cost breakdown on FastAPI

This is the number that matters. I ran each tool on FastAPI and counted tokens with tiktoken using the cl100k_base encoding (GPT-4's tokenizer; Claude uses its own, but the magnitudes are comparable):

| Tool | Output tokens | Context windows used | Time |
|---|---|---|---|
| Repomix (XML) | ~800,000 | 4-6 windows (overflows 200k) | ~8s |
| Repomix (compressed) | ~400,000 | 2-3 windows | ~12s |
| Aider repo-map | ~8,000-15,000 | Fits in one | Per-prompt |
| CB Memory MCP | Varies per query | N/A (streaming) | Sub-ms per query |
| Stacklit | 4,142 | Fits in one | 0.4s |
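If you want to reproduce these counts yourself, the measurement is a few lines. A minimal sketch (with a crude chars/4 fallback for machines without tiktoken installed):

```python
def count_tokens(text: str) -> int:
    """Token count via tiktoken's cl100k_base encoding, with a rough
    chars/4 estimate as a fallback when tiktoken isn't installed."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # approximation only
```

Run it over each tool's output file, e.g. `count_tokens(Path("stacklit.json").read_text())`.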

Stacklit produces the smallest static output. It does not include source code. It includes structure: which modules exist, what they export, how they connect, what changed recently.
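To make "structure, not source" concrete, here is a hypothetical sketch of the shape such an index takes. Field names and values are illustrative, not Stacklit's actual schema:

```json
{
  "modules": [
    {
      "path": "fastapi/routing.py",
      "exports": ["APIRouter", "APIRoute"],
      "imports": ["fastapi.dependencies", "starlette.routing"],
      "lines": 1234,
      "recent_commits": 7
    }
  ],
  "graph": {
    "fastapi.routing": ["fastapi.dependencies", "starlette.routing"]
  }
}
```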

## Token cost across 4 projects

Not just FastAPI. I ran Stacklit on four popular open source repos:

| Project | Language | Files | Lines | Stacklit tokens |
|---|---|---|---|---|
| Express.js | JavaScript | 141 | 21,346 | 3,765 |
| FastAPI | Python | 1,131 | 108,075 | 4,142 |
| Gin | Go | 100 | 23,829 | 3,361 |
| Axum | Rust | 300 | 43,997 | 14,371 |

Full outputs are in the examples directory.

## When to use each tool

### Repomix: the brute force approach

Best for: pasting a small repo into ChatGPT for a one-shot question. Repos under 50 files where token cost does not matter.

The problem at scale: a 500-file repo produces 500,000+ tokens. That overflows most context windows. The agent gets all the code but no structural understanding. It still has to figure out the architecture from raw source.

### Aider repo-map: the smart but locked approach

Best for: people who already use Aider. The repo-map is genuinely good. It ranks code by relevance to your current task using a PageRank-style algorithm.

The catch: it only works inside Aider. You cannot use it with Claude Code, Cursor, or Copilot. The map regenerates every prompt and is not shareable.
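For intuition on what "PageRank-style" means here: symbols that many other modules reference rank highest, so they go into the map first. A toy power-iteration version over an import graph (pure Python, nothing to do with Aider's actual code):

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Toy PageRank over a dependency graph.
    `graph` maps a module to the modules it references;
    every module must appear as a key (use [] for leaves)."""
    nodes = set(graph) | {m for refs in graph.values() for m in refs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, refs in graph.items():
            if refs:
                share = damping * rank[src] / len(refs)
                for dst in refs:
                    nxt[dst] += share
            else:
                # dangling node: spread its rank evenly
                for n in nodes:
                    nxt[n] += damping * rank[src] / len(nodes)
        rank = nxt
    return rank

# Heavily-imported modules (db, models) end up with the highest rank.
deps = {"app": ["models", "db"], "api": ["models", "db"],
        "models": ["db"], "db": []}
ranks = pagerank(deps)
```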

### Codebase Memory MCP: the power user approach

Best for: large codebases where you need deep semantic queries. Call path tracing, dead code detection, relationship traversal across 66 languages.

The trade-off: you run a server process. The knowledge graph lives in a local database. Switch machines or share with a teammate? They rebuild locally. There is no committed artifact.

### Stacklit: the committed index approach

Best for: teams where multiple people (or multiple agents) work on the same repo. The index is a JSON file you commit to git. Clone the repo, the index is there.

It works with every tool without per-tool configuration. Claude Code reads it as a file. Cursor reads it as a file. The MCP server is optional, for tools that prefer querying.

## What Stacklit extracts per language

| Language | Parser | What you get in the index |
|---|---|---|
| Go | stdlib AST | imports, exports with full signatures, struct fields, interface methods |
| TypeScript/JS | tree-sitter | ESM/CJS/dynamic imports, classes, interfaces, type aliases, enums |
| Python | tree-sitter | imports, classes with all methods, type hints, decorators, `__main__` |
| Rust | tree-sitter | use/mod/crate, pub items with generics, trait methods, struct fields |
| Java | tree-sitter | imports, public classes, method signatures with parameter types |
| C# | tree-sitter | using directives, public types, method signatures |
| Ruby | tree-sitter | require/require_relative, classes, modules, methods |
| PHP | tree-sitter | namespace use, classes, traits, public methods |
| Kotlin | tree-sitter | imports, classes, objects, data classes, functions |
| Swift | tree-sitter | imports, structs, classes, protocols, enums |
| C/C++ | tree-sitter | #include, functions, structs, unions, typedefs |

Everything else gets basic support: language detection and line count per module.
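That fallback tier is simple enough to sketch in a few lines. Here is a hypothetical version of extension-based detection plus line counting (illustrative only, not Stacklit's code; the extension map is an assumption):

```python
from pathlib import Path

# Minimal extension-to-language map; extend as needed (illustrative only).
EXT_LANG = {".ex": "Elixir", ".exs": "Elixir", ".hs": "Haskell",
            ".lua": "Lua", ".r": "R"}

def basic_module_info(path: Path) -> dict:
    """Fallback indexing: language from file extension, size from line count."""
    text = path.read_text(errors="replace")
    return {
        "path": str(path),
        "language": EXT_LANG.get(path.suffix.lower(), "unknown"),
        "lines": text.count("\n") + (1 if text and not text.endswith("\n") else 0),
    }
```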

## Where Stacklit falls short

I want to be upfront about this:

- 11 languages with full extraction, not 66. Codebase Memory MCP covers more languages deeply. If your stack is Elixir or Haskell, Stacklit gives you line counts, not full extraction.
- No function-level call graphs. Stacklit maps module dependencies, not "which function calls which." CB Memory MCP and Axon do this.
- No runtime queries. The index is a snapshot. It does not answer questions about the codebase on demand the way a running MCP server does. (Though Stacklit does have an MCP server that reads from the index.)
- No source code in the output. Repomix gives the agent actual code. Stacklit gives a map. Sometimes the agent needs the code and will still read files.

## My actual recommendation

Use Stacklit as a baseline for every repo. It takes 90 milliseconds to generate and costs nothing to maintain with a git hook.
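A hypothetical pre-commit hook for that, assuming `npx stacklit init` rewrites `stacklit.json` in place (the hook body is a sketch, not shipped with the tool):

```shell
#!/bin/sh
# .git/hooks/pre-commit (make it executable with chmod +x)
# Regenerate the index and stage it so every commit carries a fresh map.
npx stacklit init
git add stacklit.json
```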

Then layer other tools on top for specific needs:

- Repomix for one-shot full-codebase prompts
- Aider if that is your daily driver
- CB Memory MCP for deep semantic analysis on large codebases

They are not mutually exclusive. A committed `stacklit.json` makes every other tool work better, because the agent starts with context instead of from zero.

## Try it now

```shell
npx stacklit init
```

One command. Scans your codebase. Generates the index. Opens a visual map in your browser.

Works on macOS, Linux, Windows. MIT licensed. Zero runtime dependencies.

github.com/glincker/stacklit

The examples directory has full outputs from Express.js, FastAPI, Gin, and Axum so you can see what the index looks like before running it.


Which of these tools do you use? Have you tried combining them? Genuinely curious what setups people have landed on.
