Martin Vogel

How I Cut My AI Coding Agent's Token Usage by 120x With a Code Knowledge Graph

AI coding agents are powerful — but they're also blind. Every time Claude Code, Codex, or Gemini CLI needs to understand your codebase, they explore it file by file. Grep here, read there, grep again. For a simple question like "what calls ProcessOrder?", an agent might burn through 45,000 tokens just opening files and scanning for matches.

I built codebase-memory-mcp to fix this. It parses your codebase into a persistent knowledge graph — functions, classes, call chains, imports, HTTP routes — and exposes it through 14 MCP tools. The same question now costs ~200 tokens and answers in under 1ms.

The Problem: File-by-File Exploration Doesn't Scale

Here's what actually happens when you ask an AI agent "trace the callers of ProcessOrder":

  1. Agent greps for ProcessOrder across all files (~15,000 tokens)
  2. Reads each matching file to understand context (~25,000 tokens)
  3. Follows imports to find indirect callers (~20,000 tokens)
  4. Gives up after hitting context limits, missing half the call chain

Multiply this by every question in a coding session and you're burning hundreds of thousands of tokens per hour — most of it reading files that aren't relevant.
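The back-of-envelope arithmetic for the steps above (token figures from the post; the questions-per-hour rate is my assumption):

```python
# Token cost of one "trace the callers" question via file-by-file exploration,
# using the per-step estimates from the post.
grep_pass = 15_000    # grep ProcessOrder across all files
read_pass = 25_000    # read each matching file for context
import_pass = 20_000  # follow imports to find indirect callers

per_question = grep_pass + read_pass + import_pass
print(per_question)          # 60000 tokens for a single structural question

# At an assumed 10 such questions per hour, the burn is already six figures:
print(per_question * 10)     # 600000 tokens/hour
```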

The Fix: Parse Once, Query Forever

codebase-memory-mcp runs a one-time indexing pass using tree-sitter AST parsing. It extracts every function, class, method, import, call relationship, and HTTP route into a SQLite-backed graph. After that, the graph stays fresh automatically via a background watcher that detects file changes.
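The graph itself can be pictured as two tables, nodes and edges. A minimal sketch of the idea in SQLite; this is illustrative only, not codebase-memory-mcp's actual schema:

```python
import sqlite3

# Hypothetical two-table code knowledge graph -- not the project's real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,        -- 'function', 'class', 'route', ...
    name TEXT NOT NULL,        -- qualified name, e.g. 'orders.ProcessOrder'
    file TEXT NOT NULL,
    line INTEGER
);
CREATE TABLE edges (
    src  INTEGER REFERENCES nodes(id),
    dst  INTEGER REFERENCES nodes(id),
    kind TEXT NOT NULL         -- 'CALLS', 'IMPORTS', 'ROUTES_TO', ...
);
CREATE INDEX idx_edges_dst ON edges(dst, kind);
""")

# One indexing pass fills the tables; later queries never touch source files.
con.executemany("INSERT INTO nodes VALUES (?,?,?,?,?)", [
    (1, "function", "orders.ProcessOrder", "orders.go", 42),
    (2, "function", "api.HandleCheckout",  "api.go",    10),
])
con.execute("INSERT INTO edges VALUES (2, 1, 'CALLS')")

# "what calls ProcessOrder?" becomes an indexed lookup, not a grep:
callers = con.execute("""
    SELECT n.name FROM edges e JOIN nodes n ON n.id = e.src
    WHERE e.dst = 1 AND e.kind = 'CALLS'
""").fetchall()
print(callers)   # [('api.HandleCheckout',)]
```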

```
You: "what calls ProcessOrder?"

Agent calls: trace_call_path(function_name="ProcessOrder", direction="inbound")

→ Returns structured call chain in ~200 tokens, <1ms
```

No LLM is embedded in the server. Your agent is the intelligence layer — it just gets precise structural answers instead of raw file contents.
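A depth-limited trace like this maps naturally onto a single recursive SQL query. A sketch over a hypothetical nodes/calls schema (not the project's real one) of how inbound callers can be collected without the agent reading any files:

```python
import sqlite3

# Hypothetical graph: CheckoutHandler -> HandleCheckout -> ProcessOrder
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calls (src INTEGER, dst INTEGER);
""")
con.executemany("INSERT INTO nodes VALUES (?,?)",
                [(1, "ProcessOrder"), (2, "HandleCheckout"), (3, "CheckoutHandler")])
con.executemany("INSERT INTO calls VALUES (?,?)", [(3, 2), (2, 1)])

# Walk the CALLS edges backwards with one recursive CTE.
rows = con.execute("""
WITH RECURSIVE inbound(id, depth) AS (
    SELECT id, 0 FROM nodes WHERE name = 'ProcessOrder'
    UNION
    SELECT c.src, inbound.depth + 1
    FROM calls c JOIN inbound ON c.dst = inbound.id
    WHERE inbound.depth < 3                 -- configurable depth
)
SELECT n.name, i.depth FROM inbound i JOIN nodes n ON n.id = i.id
WHERE i.depth > 0 ORDER BY i.depth
""").fetchall()
print(rows)   # [('HandleCheckout', 1), ('CheckoutHandler', 2)]
```

The structured result (caller name plus depth) is what keeps the answer in the ~200-token range: the agent gets the chain, not the files it came from.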

Benchmarks: 120x Token Reduction

I ran agent-vs-agent testing across 31 languages (372 questions). Five representative structural queries on a real multi-service project:

| Query type | Knowledge graph | File-by-file search | Savings |
| --- | --- | --- | --- |
| Find function by pattern | ~200 tokens | ~45,000 tokens | 225x |
| Trace call chain (depth 3) | ~800 tokens | ~120,000 tokens | 150x |
| Dead code detection | ~500 tokens | ~85,000 tokens | 170x |
| List all HTTP routes | ~400 tokens | ~62,000 tokens | 155x |
| Architecture overview | ~1,500 tokens | ~100,000 tokens | 67x |
| **Total** | **~3,400** | **~412,000** | **121x** |

That's a 99.2% reduction. The cost difference between graph queries and file exploration adds up fast over a full development session.
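The totals in the table check out:

```python
# Aggregate figures from the benchmark table above.
graph_tokens = 3_400
file_tokens = 412_000

speedup = file_tokens / graph_tokens
reduction = 1 - graph_tokens / file_tokens

print(round(speedup))       # 121  -- the "121x" total in the table
print(f"{reduction:.1%}")   # 99.2%
```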

It Handles the Linux Kernel

The stress test I'm most proud of: indexing the entire Linux kernel.

  • 28 million lines of code, 75,000 files
  • 2.1 million nodes, 4.9 million edges
  • Indexed in 1 minute on an M3 Pro in fast mode; about 5 minutes for advanced indexing, which also covers large files and digs deeper. An average repo indexes in under a second, depending on your hardware (the more CPU cores, the better).

The pipeline is RAM-first: LZ4-compressed bulk read, in-memory SQLite, fused Aho-Corasick pattern matching, single dump at the end. Memory is released back to the OS after indexing completes. Average-sized repos index in milliseconds.
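The "build in RAM, dump once" idea is easy to sketch with Python's sqlite3 module (the real pipeline is C, so this only illustrates the shape of the approach):

```python
import os
import sqlite3
import tempfile

# Sketch: index into an in-memory database, then persist it to disk
# with a single sequential backup instead of per-insert disk I/O.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
mem.executemany("INSERT INTO nodes (name) VALUES (?)",
                [("parse",), ("index",), ("query",)])

path = os.path.join(tempfile.mkdtemp(), "graph.db")
disk = sqlite3.connect(path)
mem.backup(disk)              # the single dump at the end
disk_count = disk.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(disk_count)             # 3

mem.close()                   # RAM is released once the dump is done
disk.close()
```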

64 Languages, Zero Dependencies

All 64 language grammars are vendored as C source and compiled into a single static binary. Nothing to install, nothing that breaks when tree-sitter updates a grammar upstream.

Programming languages (39): Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Swift, Dart, Zig, Elixir, Haskell, OCaml, Objective-C, Lua, Bash, Perl, Groovy, Erlang, R, Clojure, F#, Julia, Vim Script, Nix, Common Lisp, Elm, Fortran, CUDA, COBOL, Verilog, Emacs Lisp

Scientific (5): MATLAB, Lean 4, FORM, Magma, Wolfram

Config/markup (20): HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile, JSON, XML, Markdown, Makefile, CMake, Protobuf, GraphQL, Vue, Svelte, Meson, GLSL, INI

This matters because real-world codebases aren't monolingual. A typical project has Go backends, TypeScript frontends, SQL migrations, Dockerfiles, YAML configs, and shell scripts. One indexing pass captures all of it. We've also introduced more advanced indexing using LSP-like techniques, effectively an "LSP + tree-sitter" hybrid. It's currently supported for Go, C, and C++, with more languages on the way.

14 MCP Tools

The full tool surface:

| Tool | What it does |
| --- | --- |
| `search_graph` | Find functions/classes by name pattern, label, degree |
| `trace_call_path` | Follow callers/callees at configurable depth |
| `get_architecture` | Languages, packages, entry points, routes, hotspots, clusters |
| `detect_changes` | Map git diff to affected symbols with risk classification |
| `query_graph` | Raw Cypher queries (`MATCH (f:Function)-[:CALLS]->(g)...`) |
| `search_code` | Full-text search across indexed source |
| `get_code_snippet` | Read a specific function/class by qualified name |
| `get_graph_schema` | Inspect available node/edge types |
| `manage_adr` | Architecture Decision Records that persist across sessions |
| `index_repository` | Trigger initial index (auto-sync handles the rest) |
| `list_projects` | Show all indexed repos with stats |
| `delete_project` | Clean up a project's graph data |
| `index_status` | Check indexing progress |
| `ingest_traces` | Import OpenTelemetry traces into the graph |
Works With 8 AI Agents

One install command auto-detects and configures all of these:

  • Claude Code — full integration with skills + PreToolUse hooks
  • Codex CLI — MCP config + AGENTS.md instructions
  • Gemini CLI — MCP config + BeforeTool hooks
  • Zed — JSONC settings integration
  • OpenCode — MCP config + AGENTS.md
  • Antigravity — MCP config + AGENTS.md
  • Aider — CONVENTIONS.md instructions
  • KiloCode — MCP settings + rules

The hooks are advisory — they remind agents to check the graph before reaching for grep/glob/read, without blocking anything.

Setup: 3 Commands

```
# Download (or use the one-liner: curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/scripts/setup.sh | bash)
tar xzf codebase-memory-mcp-*.tar.gz
mv codebase-memory-mcp ~/.local/bin/

# Auto-configure all detected agents
codebase-memory-mcp install

# Restart your agent, then:
"Index this project"
```

That's it. No Docker, no API keys, no npm install, no runtime dependencies. A ~15MB static binary for macOS (arm64/amd64), Linux (arm64/amd64), or Windows (amd64).

Built-In Graph Visualization

If you download the UI variant, you get a 3D interactive graph explorer at localhost:9749:

*Graph visualization showing the codebase knowledge graph with nodes and edges*

It runs as a background thread alongside the MCP server — available whenever your agent is connected.

The Design Philosophy

A lot of code intelligence tools embed an LLM for natural language → graph query translation. This means extra API keys, extra cost, and another model to configure and keep updated.

With MCP, the agent you're already talking to is the query translator. It reads tool descriptions, understands your question, and makes the right tool call. No intermediate LLM needed.

Similarly, the tool focuses on structural precision over semantic fuzziness. When an agent asks "what calls X?", it needs an exact answer — not a ranked list of "probably related" functions. The graph gives exact call chains with import-aware, type-inferred resolution.

What's Next

  • LSP-style hybrid type resolution — already live for Go, C, and C++ (more languages coming)
  • Cross-service HTTP linking — discovers REST routes and matches them to HTTP call sites with confidence scoring
  • Louvain community detection — automatically discovers functional modules by clustering call edges

Try It

If you're burning tokens on file-by-file exploration, give it a shot. Index your project and ask your agent a structural question — the difference is immediate.


Built with pure C, tree-sitter, and SQLite. No runtime dependencies. 780+ stars and growing. We built it for developers using coding agents, and we're aiming to make it the most performant solution in this space: more efficient coding agents mean cheaper token burn for everyone, which in turn means more good tools shipping faster.
