When Claude Code explores a codebase it does not know, it spawns Explore agents that scan files with grep, glob, and Read. Every one of those calls costs tokens and time. And most of that work is not even useful analysis. It is discovery: figuring out where things live before the agent can start reading the code that actually matters.
CodeGraph removes that discovery step. Instead of letting the agent explore blind, it hands the agent a map.
## What it is
CodeGraph is an open-source tool that builds a pre-indexed knowledge graph of your codebase: symbols, call graphs, imports, inheritance, and code structure. The agent queries that graph in one shot instead of scanning files one by one.
It runs as an MCP server for Claude Code, is MIT licensed, supports 19+ languages, and is 100 percent local. No API keys. No data leaving your machine. Just a SQLite database.
## How it works
The pipeline is straightforward:
- **Extraction.** tree-sitter parses your source into ASTs. Language-specific queries pull out nodes (functions, classes, methods) and edges (calls, imports, extends, implements).
- **Storage.** Everything goes into a local SQLite database with FTS5 full-text search.
- **Resolution.** References get linked up: function calls map to their definitions, imports map to source files, class inheritance is traced.
- **Auto-sync.** A file watcher uses native OS events to keep the graph fresh as you code. Changes are debounced with a short quiet window. No configuration needed.
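To make the extraction step concrete, here is the node-and-edge idea sketched with Python's built-in `ast` module instead of tree-sitter. This is a single-language toy, not CodeGraph's actual extractor; the real pipeline runs tree-sitter queries per language.

```python
import ast

SOURCE = """
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return self.bark()

    def bark(self):
        return "woof"
"""

def extract(source: str):
    """Walk the AST, collecting nodes (definitions) and edges (calls, extends)."""
    tree = ast.parse(source)
    nodes, edges = [], []
    for n in ast.walk(tree):
        if isinstance(n, ast.ClassDef):
            nodes.append(("class", n.name))
            for base in n.bases:  # inheritance edges
                if isinstance(base, ast.Name):
                    edges.append(("extends", n.name, base.id))
        elif isinstance(n, ast.FunctionDef):
            nodes.append(("function", n.name))
            for sub in ast.walk(n):  # call edges inside the body
                if isinstance(sub, ast.Call):
                    callee = sub.func
                    if isinstance(callee, ast.Attribute):
                        edges.append(("calls", n.name, callee.attr))
                    elif isinstance(callee, ast.Name):
                        edges.append(("calls", n.name, callee.id))
    return nodes, edges

nodes, edges = extract(SOURCE)
print(edges)  # includes ('extends', 'Dog', 'Animal') and ('calls', 'speak', 'bark')
```

The resolution step then turns those name-based edges into links between concrete definitions.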
When the agent needs to understand the code, it asks the graph a question and gets entry points, related symbols, and code snippets back in a single call.
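A single-call lookup can be sketched against a toy version of that SQLite index. The table names, columns, and file paths here are hypothetical; CodeGraph's real schema will differ.

```python
import sqlite3

# Toy index: a symbol table, a typed edge table, and an FTS5 name index.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT);
CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
CREATE VIRTUAL TABLE symbols_fts USING fts5(name);
""")
db.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", [
    (1, "createExtensionHost", "function", "src/extHost.ts"),
    (2, "sendMessage", "method", "src/ipc.ts"),
])
db.execute("INSERT INTO edges VALUES (1, 2, 'calls')")
db.executemany("INSERT INTO symbols_fts (rowid, name) VALUES (?, ?)",
               [(1, "createExtensionHost"), (2, "sendMessage")])

# Full-text search finds the entry point, then one join walks the call edge.
# No grep, no file reads.
(entry,) = db.execute(
    "SELECT rowid FROM symbols_fts WHERE symbols_fts MATCH 'createExtensionHost'"
).fetchone()
callee = db.execute("""
    SELECT s.name, s.file FROM edges e JOIN symbols s ON s.id = e.dst
    WHERE e.src = ? AND e.kind = 'calls'
""", (entry,)).fetchone()
print(callee)
```

The point is the shape of the work: one indexed query replaces a grep-and-read loop over the source tree.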
## The numbers
This is the part that gets attention. The maintainer benchmarked Claude Code's Explore agent with and without CodeGraph across six real codebases, including VS Code, Excalidraw, and the Swift compiler.
The reported average: 92 percent fewer tool calls and 71 percent faster exploration.
A concrete example: on the VS Code codebase, answering "how does the extension host communicate with the main process" took 52 tool calls without CodeGraph and 3 with it. On a Java codebase, the agent answered the full question in a single CodeGraph call and zero file reads.
A fair note: these are self-reported, single-query benchmarks, so treat them as best-case. But the underlying idea does not depend on the exact percentage. Giving an agent a structured index instead of forcing it to explore blind is sound regardless.
## Getting started
The interactive installer wires everything up:
```shell
npx @colbymchenry/codegraph
```
It installs CodeGraph globally, configures the MCP server in your Claude config, sets up auto-allow permissions, and adds global instructions. Then restart Claude Code and initialize a project:
```shell
cd your-project
codegraph init -i
```
Once a `.codegraph/` directory exists, Claude Code uses the tools automatically.
## A bonus that stands on its own
Even if you ignore the AI angle, one command is worth knowing: `codegraph affected`. It traces import dependencies transitively to find which test files are impacted by a set of changed files.
```shell
git diff --name-only HEAD | codegraph affected --stdin --quiet
```
Drop that in a CI script or git hook and you only run the tests that a change can actually break. That is a genuine speedup with no AI involved at all.
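Under the hood this is a transitive walk over the reverse import graph. A minimal sketch of that idea, with a made-up project layout (codegraph derives the real graph from its index):

```python
from collections import deque

# Toy reverse-import graph: module -> modules that import it.
# File names are hypothetical.
IMPORTED_BY = {
    "src/utils.py": ["src/parser.py"],
    "src/parser.py": ["src/indexer.py", "tests/test_parser.py"],
    "src/indexer.py": ["tests/test_indexer.py"],
}

def affected_tests(changed):
    """BFS over dependents of the changed files; keep anything under tests/."""
    seen, queue = set(changed), deque(changed)
    while queue:
        f = queue.popleft()
        for dependent in IMPORTED_BY.get(f, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(f for f in seen if f.startswith("tests/"))

# Editing utils.py reaches test_indexer.py through two hops of imports.
print(affected_tests(["src/utils.py"]))
```

The transitive part matters: a naive "which tests import this file" check would miss `tests/test_indexer.py` here, because it only reaches the changed file through an intermediate module.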
## Is it useful?
Very, if you work in a large or unfamiliar codebase with Claude Code. The bigger and less familiar the repo, the more discovery the agent has to do, and the more CodeGraph saves you in tokens and wall-clock time.
The honest limit: on a small project, an agent can grep a 20-file repo cheaply enough that the index buys you little. The payoff scales with codebase size.
## The takeaway
AI coding agents are not slow because the model is slow. They are slow because they spend most of their time figuring out where things are. CodeGraph solves that once, locally, and keeps the map current as you work. For anyone using Claude Code on a serious codebase, it is a low-effort, high-return addition.