Your LLM already thinks in graphs. Your codebase is a graph. Connect them.
Every time an LLM works on your code, it burns most of its context just figuring out where things are. Grep, read the wrong file, grep again, read another wrong file.
It eventually finds the bug. 80k tokens later.
The fix is in maybe 3 files. The other 12 were orientation.
Your codebase is already a graph. Functions call functions, routes point to handlers, services import services, frontend HTTP calls hit backend endpoints. The
structure is right there.
LLMs are neural networks. They process relationships between things. That's literally what attention does.
But we make them grep through files like it's 2004. Read a file, hope it's relevant, read another one. No structure, no map, just vibes and keyword matching.
repo-graph scans your codebase once. Extracts the graph that already exists. Entities, relationships, feature flows. Serves it over MCP. Now the LLM traverses the
structure instead of brute-forcing the filesystem.
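
To make "the graph that already exists" concrete: a minimal sketch of one way to hold such a graph in memory. The schema below is a hypothetical stand-in, not repo-graph's actual format; it just shows typed entities plus typed edges.

```python
# Hypothetical schema -- illustrative only, not repo-graph's actual format.
from dataclasses import dataclass, field

@dataclass
class Entity:
    id: str    # e.g. "go:handlers.GetGroups" (made-up naming convention)
    kind: str  # "function" | "route" | "service" | "component" | ...
    file: str  # where it's defined
    line: int

@dataclass
class Edge:
    src: str   # entity id
    dst: str   # entity id
    rel: str   # "calls" | "imports" | "handles" | "http_calls" | ...

@dataclass
class CodeGraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def out_edges(self, src: str) -> list[Edge]:
        """Everything a given entity points at -- one lookup, no grep."""
        return [e for e in self.edges if e.src == src]
```

In this shape, "which handler serves this route" is one edge lookup instead of a filesystem search.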
Same bug fix on a Go + Angular monorepo. Same model, same prompt, fresh context, no hints.
Without repo-graph: 75,308 tokens. 4 minutes 36 seconds. About 15 files explored.
With repo-graph: 29,838 tokens. 30 seconds. 2 files read.
Without it, the model greps for keywords, reads files, greps more, reads more, and eventually narrows things down. With it, the model calls flow("groups"), gets the exact handler function and file, reads it, fixes it.
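
For illustration only, a flow("groups") response might look something like this. The flow(...) call is the only thing taken from the run above; every field name, entity, and path below is a made-up stand-in:

```python
# Hypothetical response shape -- all names and paths are illustrative stand-ins.
flow_result = {
    "feature": "groups",
    "entry": {"route": "GET /api/groups", "file": "backend/routes/groups.go"},
    "chain": [
        {"entity": "GetGroupsHandler", "file": "backend/handlers/groups.go"},
        {"entity": "GroupService.List", "file": "backend/services/groups.go"},
    ],
    "frontend": [
        {"entity": "GroupListComponent",
         "file": "frontend/src/app/groups/group-list.component.ts"},
    ],
}
```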
It extracts modules, functions, routes, services, components and how they connect. Auto-generates feature flows by tracing from route entry points through handler
chains. Cross-stack linking matches frontend HTTP calls to backend routes automatically.
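
Both steps reduce to cheap graph operations. A minimal sketch of the idea, assuming a plain adjacency-list call graph; the entity names and the path-normalization rule here are assumptions, not repo-graph's actual code:

```python
import re
from collections import deque

# Hypothetical call graph: entity id -> ids it calls. Illustrative only.
CALLS = {
    "route:GET /api/groups": ["handler:GetGroups"],
    "handler:GetGroups": ["service:GroupService.List"],
    "service:GroupService.List": ["db:Query"],
}

def trace_flow(entry: str, calls: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a route entry point through the handler chain."""
    seen, order, queue = {entry}, [], deque([entry])
    while queue:
        node = queue.popleft()
        order.append(node)
        for callee in calls.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order

def normalize(path: str) -> str:
    """Collapse path parameters (:id, {id}) so both stacks compare equal."""
    return re.sub(r"(:\w+|\{\w+\})", "*", path)

def link_cross_stack(frontend_calls: list[str],
                     backend_routes: list[str]) -> list[tuple[str, str]]:
    """Pair frontend HTTP calls with backend routes by normalized path."""
    routes = {normalize(r): r for r in backend_routes}
    return [(c, routes[normalize(c)])
            for c in frontend_calls if normalize(c) in routes]

print(trace_flow("route:GET /api/groups", CALLS))
# ['route:GET /api/groups', 'handler:GetGroups',
#  'service:GroupService.List', 'db:Query']
print(link_cross_stack(["GET /api/groups/:id"], ["GET /api/groups/{id}"]))
# [('GET /api/groups/:id', 'GET /api/groups/{id}')]
```

The point is that once the edges exist, a "feature flow" is just a reachability query.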
13 languages out of the box. Regex heuristics, not AST. No build step. One dependency. Adding a new language is one file.
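
A hedged sketch of what one such per-language file could contain, using Go function declarations as the example. The pattern and the entity shape are my assumptions, not repo-graph's actual extractor:

```python
import re

# Heuristic regex, not an AST: matches `func Name(` and `func (r Recv) Name(`.
GO_FUNC = re.compile(r"^func\s+(?:\([^)]*\)\s+)?(\w+)\s*\(", re.MULTILINE)

def extract_go_functions(source: str, path: str) -> list[dict]:
    """Emit one entity per function definition the heuristic finds."""
    entities = []
    for m in GO_FUNC.finditer(source):
        line = source.count("\n", 0, m.start()) + 1
        entities.append({"kind": "function", "name": m.group(1),
                         "file": path, "line": line})
    return entities

src = """
func GetGroups(w http.ResponseWriter, r *http.Request) {}

func (s *GroupService) List() {}
"""
print(extract_go_functions(src, "backend/handlers/groups.go"))
# [{'kind': 'function', 'name': 'GetGroups', 'file': ..., 'line': 2},
#  {'kind': 'function', 'name': 'List', 'file': ..., 'line': 4}]
```

That's the trade "regex heuristics, not AST" makes: a handful of patterns per language instead of a parser, so there's no build step and a new language is cheap, at the cost of some precision on unusual code.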
```bash
pip install mcp-repo-graph
repo-graph-init --repo /path/to/your/project
```
Code is a graph. LLMs process graphs. repo-graph connects the two. A few hundred tokens to query the structure vs tens of thousands to explore the filesystem.

