Martin Vogel

How I Cut My AI Coding Agent's Token Usage by 120x With a Code Knowledge Graph

AI coding agents are powerful — but they're also blind. Every time Claude Code, Codex, or Gemini CLI needs to understand your codebase, they explore it file by file. Grep here, read there, grep again. For a simple question like "what calls ProcessOrder?", an agent might burn through 45,000 tokens just opening files and scanning for matches.

I built codebase-memory-mcp to fix this. It parses your codebase into a persistent knowledge graph — functions, classes, call chains, imports, HTTP routes — and exposes it through 14 MCP tools. The same question now costs ~200 tokens and answers in under 1ms.

The Problem: File-by-File Exploration Doesn't Scale

Here's what actually happens when you ask an AI agent "trace the callers of ProcessOrder":

  1. Agent greps for ProcessOrder across all files (~15,000 tokens)
  2. Reads each matching file to understand context (~25,000 tokens)
  3. Follows imports to find indirect callers (~20,000 tokens)
  4. Gives up after hitting context limits, missing half the call chain

Multiply this by every question in a coding session and you're burning hundreds of thousands of tokens per hour — most of it reading files that aren't relevant.
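The back-of-envelope arithmetic for the steps above (token figures from the post; the questions-per-hour rate is my assumption):

```python
# Token cost of one "trace the callers" question via file-by-file exploration,
# using the per-step estimates from the post.
grep_pass = 15_000    # grep ProcessOrder across all files
read_pass = 25_000    # read each matching file for context
import_pass = 20_000  # follow imports to find indirect callers

per_question = grep_pass + read_pass + import_pass
print(per_question)          # 60000 tokens for a single structural question

# At an assumed 10 such questions per hour, the burn is already six figures:
print(per_question * 10)     # 600000 tokens/hour
```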

The Fix: Parse Once, Query Forever

codebase-memory-mcp runs a one-time indexing pass using tree-sitter AST parsing. It extracts every function, class, method, import, call relationship, and HTTP route into a SQLite-backed graph. After that, the graph stays fresh automatically via a background watcher that detects file changes.
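The graph itself can be pictured as two tables, nodes and edges. A minimal sketch of the idea in SQLite; this is illustrative only, not codebase-memory-mcp's actual schema:

```python
import sqlite3

# Hypothetical two-table code knowledge graph -- not the project's real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,        -- 'function', 'class', 'route', ...
    name TEXT NOT NULL,        -- qualified name, e.g. 'orders.ProcessOrder'
    file TEXT NOT NULL,
    line INTEGER
);
CREATE TABLE edges (
    src  INTEGER REFERENCES nodes(id),
    dst  INTEGER REFERENCES nodes(id),
    kind TEXT NOT NULL         -- 'CALLS', 'IMPORTS', 'ROUTES_TO', ...
);
CREATE INDEX idx_edges_dst ON edges(dst, kind);
""")

# One indexing pass fills the tables; later queries never touch source files.
con.executemany("INSERT INTO nodes VALUES (?,?,?,?,?)", [
    (1, "function", "orders.ProcessOrder", "orders.go", 42),
    (2, "function", "api.HandleCheckout",  "api.go",    10),
])
con.execute("INSERT INTO edges VALUES (2, 1, 'CALLS')")

# "what calls ProcessOrder?" becomes an indexed lookup, not a grep:
callers = con.execute("""
    SELECT n.name FROM edges e JOIN nodes n ON n.id = e.src
    WHERE e.dst = 1 AND e.kind = 'CALLS'
""").fetchall()
print(callers)   # [('api.HandleCheckout',)]
```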

```
You: "what calls ProcessOrder?"

Agent calls: trace_call_path(function_name="ProcessOrder", direction="inbound")

→ Returns structured call chain in ~200 tokens, <1ms
```

No LLM is embedded in the server. Your agent is the intelligence layer — it just gets precise structural answers instead of raw file contents.
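A depth-limited trace like this maps naturally onto a single recursive SQL query. A sketch over a hypothetical nodes/calls schema (not the project's real one) of how inbound callers can be collected without the agent reading any files:

```python
import sqlite3

# Hypothetical graph: CheckoutHandler -> HandleCheckout -> ProcessOrder
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calls (src INTEGER, dst INTEGER);
""")
con.executemany("INSERT INTO nodes VALUES (?,?)",
                [(1, "ProcessOrder"), (2, "HandleCheckout"), (3, "CheckoutHandler")])
con.executemany("INSERT INTO calls VALUES (?,?)", [(3, 2), (2, 1)])

# Walk the CALLS edges backwards with one recursive CTE.
rows = con.execute("""
WITH RECURSIVE inbound(id, depth) AS (
    SELECT id, 0 FROM nodes WHERE name = 'ProcessOrder'
    UNION
    SELECT c.src, inbound.depth + 1
    FROM calls c JOIN inbound ON c.dst = inbound.id
    WHERE inbound.depth < 3                 -- configurable depth
)
SELECT n.name, i.depth FROM inbound i JOIN nodes n ON n.id = i.id
WHERE i.depth > 0 ORDER BY i.depth
""").fetchall()
print(rows)   # [('HandleCheckout', 1), ('CheckoutHandler', 2)]
```

The structured result (caller name plus depth) is what keeps the answer in the ~200-token range: the agent gets the chain, not the files it came from.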

Benchmarks: 120x Token Reduction

I ran agent-vs-agent testing across 31 languages (372 questions). Five representative structural queries on a real multi-service project:

| Query type | Knowledge graph | File-by-file search | Savings |
| --- | --- | --- | --- |
| Find function by pattern | ~200 tokens | ~45,000 tokens | 225x |
| Trace call chain (depth 3) | ~800 tokens | ~120,000 tokens | 150x |
| Dead code detection | ~500 tokens | ~85,000 tokens | 170x |
| List all HTTP routes | ~400 tokens | ~62,000 tokens | 155x |
| Architecture overview | ~1,500 tokens | ~100,000 tokens | 67x |
| **Total** | **~3,400** | **~412,000** | **121x** |

That's a 99.2% reduction. The cost difference between graph queries and file exploration adds up fast over a full development session.
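The totals in the table check out:

```python
# Aggregate figures from the benchmark table above.
graph_tokens = 3_400
file_tokens = 412_000

speedup = file_tokens / graph_tokens
reduction = 1 - graph_tokens / file_tokens

print(round(speedup))       # 121  -- the "121x" total in the table
print(f"{reduction:.1%}")   # 99.2%
```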

It Handles the Linux Kernel

The stress test I'm most proud of: indexing the entire Linux kernel.

  • 28 million lines of code, 75,000 files
  • 2.1 million nodes, 4.9 million edges
  • Indexed in 1 minute on an M3 Pro in fast mode; about 5 minutes for advanced indexing, which also covers large files and digs deeper. An average repo indexes in under a second, depending on your hardware (the more CPU cores, the better).

The pipeline is RAM-first: LZ4-compressed bulk read, in-memory SQLite, fused Aho-Corasick pattern matching, single dump at the end. Memory is released back to the OS after indexing completes. Average-sized repos index in milliseconds.
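The "build in RAM, dump once" idea is easy to sketch with Python's sqlite3 module (the real pipeline is C, so this only illustrates the shape of the approach):

```python
import os
import sqlite3
import tempfile

# Sketch: index into an in-memory database, then persist it to disk
# with a single sequential backup instead of per-insert disk I/O.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
mem.executemany("INSERT INTO nodes (name) VALUES (?)",
                [("parse",), ("index",), ("query",)])

path = os.path.join(tempfile.mkdtemp(), "graph.db")
disk = sqlite3.connect(path)
mem.backup(disk)              # the single dump at the end
disk_count = disk.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(disk_count)             # 3

mem.close()                   # RAM is released once the dump is done
disk.close()
```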

64 Languages, Zero Dependencies

All 64 language grammars are vendored as C source and compiled into a single static binary. Nothing to install, nothing that breaks when tree-sitter updates a grammar upstream.

Programming languages (39): Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Swift, Dart, Zig, Elixir, Haskell, OCaml, Objective-C, Lua, Bash, Perl, Groovy, Erlang, R, Clojure, F#, Julia, Vim Script, Nix, Common Lisp, Elm, Fortran, CUDA, COBOL, Verilog, Emacs Lisp

Scientific (5): MATLAB, Lean 4, FORM, Magma, Wolfram

Config/markup (20): HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile, JSON, XML, Markdown, Makefile, CMake, Protobuf, GraphQL, Vue, Svelte, Meson, GLSL, INI

This matters because real-world codebases aren't monolingual. A typical project has Go backends, TypeScript frontends, SQL migrations, Dockerfiles, YAML configs, and shell scripts. One indexing pass captures all of it. We've also introduced more advanced indexing using LSP-like techniques, effectively an "LSP + tree-sitter" hybrid. It's currently supported for Go, C, and C++, with more languages on the way.

14 MCP Tools

The full tool surface:

| Tool | What it does |
| --- | --- |
| `search_graph` | Find functions/classes by name pattern, label, degree |
| `trace_call_path` | Follow callers/callees at configurable depth |
| `get_architecture` | Languages, packages, entry points, routes, hotspots, clusters |
| `detect_changes` | Map git diff to affected symbols with risk classification |
| `query_graph` | Raw Cypher queries (`MATCH (f:Function)-[:CALLS]->(g)...`) |
| `search_code` | Full-text search across indexed source |
| `get_code_snippet` | Read a specific function/class by qualified name |
| `get_graph_schema` | Inspect available node/edge types |
| `manage_adr` | Architecture Decision Records that persist across sessions |
| `index_repository` | Trigger initial index (auto-sync handles the rest) |
| `list_projects` | Show all indexed repos with stats |
| `delete_project` | Clean up a project's graph data |
| `index_status` | Check indexing progress |
| `ingest_traces` | Import OpenTelemetry traces into the graph |
Works With 8 AI Agents

One install command auto-detects and configures all of these:

  • Claude Code — full integration with skills + PreToolUse hooks
  • Codex CLI — MCP config + AGENTS.md instructions
  • Gemini CLI — MCP config + BeforeTool hooks
  • Zed — JSONC settings integration
  • OpenCode — MCP config + AGENTS.md
  • Antigravity — MCP config + AGENTS.md
  • Aider — CONVENTIONS.md instructions
  • KiloCode — MCP settings + rules

The hooks are advisory — they remind agents to check the graph before reaching for grep/glob/read, without blocking anything.

Setup: 3 Commands

```
# Download (or use the one-liner: curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/scripts/setup.sh | bash)
tar xzf codebase-memory-mcp-*.tar.gz
mv codebase-memory-mcp ~/.local/bin/

# Auto-configure all detected agents
codebase-memory-mcp install

# Restart your agent, then:
"Index this project"
```

That's it. No Docker, no API keys, no npm install, no runtime dependencies. A ~15MB static binary for macOS (arm64/amd64), Linux (arm64/amd64), or Windows (amd64).

Built-In Graph Visualization

If you download the UI variant, you get a 3D interactive graph explorer at localhost:9749:

*Graph visualization showing the codebase knowledge graph with nodes and edges*

It runs as a background thread alongside the MCP server — available whenever your agent is connected.

The Design Philosophy

A lot of code intelligence tools embed an LLM for natural language → graph query translation. This means extra API keys, extra cost, and another model to configure and keep updated.

With MCP, the agent you're already talking to is the query translator. It reads tool descriptions, understands your question, and makes the right tool call. No intermediate LLM needed.

Similarly, the tool focuses on structural precision over semantic fuzziness. When an agent asks "what calls X?", it needs an exact answer — not a ranked list of "probably related" functions. The graph gives exact call chains with import-aware, type-inferred resolution.

What's Next

  • LSP-style hybrid type resolution — already live for Go, C, and C++ (more languages coming)
  • Cross-service HTTP linking — discovers REST routes and matches them to HTTP call sites with confidence scoring
  • Louvain community detection — automatically discovers functional modules by clustering call edges

Try It

If you're burning tokens on file-by-file exploration, give it a shot. Index your project and ask your agent a structural question — the difference is immediate.


Built with pure C, tree-sitter, and SQLite. No runtime dependencies. 780+ stars and growing. We built it for developers using coding agents, and we're aiming to make it the most performant solution in this space: more efficient coding agents mean cheaper token burn for everyone, which in turn means more good tools shipping faster.
