Introduction
"~35% cheaper · ~70% fewer tool calls · 100% local"
This is article No. 71 in the "One Open Source Project a Day" series. Today we are exploring CodeGraph.
Start with a scenario: you ask Claude Code "How is AuthService being called?" Without any assistance, Claude's approach is: glob-scan directories, run multiple greps, read several files — then finally answer. The whole process might trigger 10–15 tool calls and consume hundreds of thousands of tokens.
CodeGraph's insight is to front-load this work: before you start, it has already parsed your codebase with tree-sitter into a semantic graph stored in a local SQLite database, then exposes 8 query tools to AI agents via MCP. When the agent needs to understand code, a single codegraph_context call returns entry points, related symbols, and code snippets — no file reading required.
9.6k Stars, 588 Forks. Benchmarks across 7 real open-source projects: average 35% cost savings, 70% fewer tool calls, 49% speed improvement. On VS Code's large TypeScript repository, one architecture Q&A dropped from 1.4M tokens to 393k — cost from $0.64 to $0.42.
What You Will Learn
- CodeGraph's four-stage pipeline: Extract → Store → Resolve → Auto-Sync
- The 8 MCP tools and when to use each
- A detailed breakdown of benchmark results across 7 projects: why do larger codebases benefit more?
- How 19-language support and 13-framework route recognition work
- Complete setup walkthrough from installation to Claude Code integration
- codegraph affected: using dependency tracing for smart CI test selection
Prerequisites
- Familiarity with Claude Code, Cursor, or similar AI coding tools
- Basic understanding of MCP (Model Context Protocol)
- Node.js experience
Project Background
Project Introduction
CodeGraph is a local semantic code knowledge graph tool designed specifically to improve AI coding agent efficiency. Its core insight:
AI agents spend a large share of their tokens and time in the "discovery phase" (scanning directories, searching for symbols, reading files) rather than on the actual reasoning and generation.
CodeGraph's solution is to outsource the discovery phase to a pre-built index: the index is built before you start working, so AI agents can pull structured code knowledge directly instead of exploring the file system from scratch.
The technology choices are pragmatic: tree-sitter for AST parsing (mature, multi-language, high-performance), SQLite FTS5 for full-text search (zero external dependencies, fully local), and native OS file events for live sync (FSEvents/inotify/ReadDirectoryChangesW).
Author/Team
- Author: Colby McHenry (GitHub: colbymchenry)
- Repository: colbymchenry/codegraph
- Distribution: npm package @colbymchenry/codegraph
Project Stats
- ⭐ GitHub Stars: 9,600+
- 🍴 Forks: 588
- 📦 npm package: @colbymchenry/codegraph
- 🔧 Runtime: Node.js 20–24
- 💻 Platforms: Windows, macOS, Linux
- 📄 License: MIT
- 🌐 Repository: colbymchenry/codegraph
Main Features
Core Utility
CodeGraph inserts a pre-built index layer between AI agents and codebases:
Codebase (TypeScript / Python / Go / ...)
↓ tree-sitter parsing
Semantic graph (symbols + relationships + call chains)
↓ stored in SQLite FTS5
Local knowledge base
↓ exposed via MCP
AI coding agents (Claude Code / Cursor / Codex CLI / OpenCode)
Without CodeGraph:
User: "How is AuthService being called?"
→ Agent: glob("src/**/*.ts") # Tool call 1
→ Agent: grep("AuthService") # Tool call 2
→ Agent: read("auth.service.ts") # Tool call 3
→ Agent: grep("import.*Auth") # Tool call 4
→ Agent: read("user.controller.ts") # Tool call 5
→ Agent: read("app.module.ts") # Tool call 6
... 10–15 total tool calls, massive token consumption
With CodeGraph:
User: "How is AuthService being called?"
→ Agent: codegraph_callers("AuthService") # Tool call 1
→ Returns: full caller list + call sites + code snippets
→ Agent answers directly, no file reading needed
Quick Start
One-command install (recommended):
# Run the interactive installer — auto-detects installed AI agents and configures them
npx @colbymchenry/codegraph
# Initialize in your project (-i for interactive)
cd your-project
codegraph init -i
Non-interactive install (CI environments):
# Auto-detect all installed agents, global install
codegraph install --yes
# Target specific agents
codegraph install --target=cursor,claude --yes
# Project-local install
codegraph install --target=auto --location=local
Manual Claude Code configuration:
npm install -g @colbymchenry/codegraph
Add to ~/.claude.json (or project-level .claude.json):
{
"mcpServers": {
"codegraph": {
"type": "stdio",
"command": "codegraph",
"args": ["serve", "--mcp"]
}
}
}
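If you would rather script this manual step, the entry above can be merged into an existing config without overwriting other servers. The sketch below is a hypothetical helper (add_codegraph_server is not part of CodeGraph); it assumes only the JSON shape shown above.

```python
import json
from copy import deepcopy

# The server entry exactly as shown in the manual configuration above.
CODEGRAPH_ENTRY = {
    "type": "stdio",
    "command": "codegraph",
    "args": ["serve", "--mcp"],
}


def add_codegraph_server(config):
    """Return a copy of `config` with the codegraph MCP server registered.

    Hypothetical helper for illustration; existing servers are preserved.
    """
    merged = deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["codegraph"] = deepcopy(CODEGRAPH_ENTRY)
    return merged


# Example: a config that already lists another (made-up) server.
existing = {"mcpServers": {"other": {"type": "stdio", "command": "other-server"}}}
print(json.dumps(add_codegraph_server(existing), indent=2))
```

Returning a copy rather than mutating the input makes it easy to diff the result against the file on disk before writing it back.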
Verify installation:
codegraph status # Check index status and stats
codegraph query "UserService" # Test symbol search
The 8 MCP Tools
The complete toolset CodeGraph exposes to AI agents:
| Tool | Purpose | Typical Invocation |
|---|---|---|
| codegraph_search | Find symbols by name | "Find all functions called authenticate" |
| codegraph_context | Build code context for a task | "What code is relevant to the login flow?" |
| codegraph_callers | Find what calls a function | "What calls AuthService?" |
| codegraph_callees | Find what a function calls | "What does processPayment call internally?" |
| codegraph_impact | Analyze change impact radius | "What breaks if I change this function?" |
| codegraph_node | Get details about a specific symbol | "Show me UserController's full signature" |
| codegraph_files | Get indexed file structure | "What is the overall project structure?" |
| codegraph_status | Check index health and stats | "How many symbols are indexed? Last sync?" |
codegraph_context is the most important tool — it doesn't just return search results; it intelligently assembles a comprehensive context package for a given task, including entry points, related symbols, and code snippets:
# Command-line equivalent
codegraph context "fix user login bug"
# → Automatically finds login-related functions, call chains, and relevant files
# packaged into context Claude can consume directly
Project Advantages
| Dimension | CodeGraph | Native AI Agent (no assist) | Other code indexers |
|---|---|---|---|
| Tool call count | ~70% fewer | High (re-scans each task) | Partial reduction |
| Token usage | ~59% fewer | High | Partial reduction |
| Data privacy | 100% local | Depends on agent | Most require uploads |
| Real-time sync | Native OS file events | N/A | Usually polling or manual |
| Language support | 19+ languages | Depends on agent | Usually 3–5 |
| Framework route detection | 13 frameworks | None | Rare |
| Installation complexity | One npx command | N/A | Usually requires server |
Detailed Analysis
1. The Four-Stage Pipeline
Stage 1: Extraction
tree-sitter parses source files into ASTs, extracting:
- Symbols: functions, classes, methods, interfaces, variable definitions
- Relationships: function calls, module imports, class inheritance, interface implementations
tree-sitter's key advantage: it is a fault-tolerant parser — it can extract partial structure even when code has syntax errors. This is critical for indexing files that are actively being edited.
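To make the extraction stage concrete, here is a minimal sketch of pulling symbols and call relationships out of a syntax tree. It swaps in Python's standard-library ast module as a stand-in for tree-sitter, so it only handles Python source and, unlike tree-sitter, it is not fault-tolerant (a syntax error raises an exception). The extract function and sample source are illustrative, not CodeGraph code.

```python
import ast

SOURCE = '''
class AuthService:
    def login(self, user):
        return check_password(user)

def check_password(user):
    return True
'''


def extract(source):
    """Return (symbols, call_edges) found in a Python source string."""
    tree = ast.parse(source)
    symbols = []     # (kind, name, line)
    call_edges = []  # (enclosing function name, called name)

    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
            # Record plain-name calls made anywhere inside this function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    call_edges.append((node.name, inner.func.id))
    return sorted(symbols, key=lambda s: s[2]), call_edges


symbols, edges = extract(SOURCE)
print(symbols)
print(edges)
```

At this point the edges are only names ("login calls something named check_password"); tying those names to concrete definitions is a separate resolution step.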
Stage 2: Storage
All data lands in a local SQLite database using the FTS5 (Full-Text Search 5) extension:
-- Symbols table (simplified)
CREATE VIRTUAL TABLE symbols USING fts5(
name, -- Symbol name
kind, -- function/class/method/...
file_path, -- Source file
line_start, -- Starting line
signature, -- Function signature
docstring, -- Documentation comment
code_snippet -- Code excerpt
);
-- Relationships table
CREATE TABLE edges (
from_id INTEGER, -- Caller symbol ID
to_id INTEGER, -- Callee symbol ID
kind TEXT, -- calls/imports/inherits/implements
file TEXT,
line INTEGER
);
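With symbols and edges stored this way, "who calls X?" becomes a join. The sketch below uses Python's built-in sqlite3 module and, as an assumption made for portability, a plain symbols table instead of the FTS5 virtual table shown above. The rows are made-up sample data and find_callers is a hypothetical helper, not CodeGraph's own query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hand-written symbols and edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file_path TEXT);
        CREATE TABLE edges (from_id INTEGER, to_id INTEGER, kind TEXT, file TEXT, line INTEGER);
    """)
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", [
        (1, "AuthService.login", "method", "auth.service.ts"),
        (2, "UserController.login", "method", "user.controller.ts"),
        (3, "AppModule.bootstrap", "method", "app.module.ts"),
    ])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", [
        (2, 1, "calls", "user.controller.ts", 42),
        (3, 1, "calls", "app.module.ts", 10),
        (2, 1, "imports", "user.controller.ts", 1),
    ])
    return conn


def find_callers(conn, symbol_name):
    """Return (caller name, file, line) for every 'calls' edge into symbol_name."""
    rows = conn.execute(
        """
        SELECT caller.name, e.file, e.line
        FROM edges AS e
        JOIN symbols AS callee ON callee.id = e.to_id
        JOIN symbols AS caller ON caller.id = e.from_id
        WHERE callee.name = ? AND e.kind = 'calls'
        ORDER BY e.file, e.line
        """,
        (symbol_name,),
    )
    return rows.fetchall()


for row in find_callers(build_demo_db(), "AuthService.login"):
    print(row)
```

The same query shape with from_id and to_id swapped gives the callees of a symbol.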
Stage 3: Resolution
The critical step: resolving abstract "called something named X" into concrete "called the definition in file Y at line Z."
Source code: import { AuthService } from './auth.service'
...
this.authService.login(user)
↓ resolution
Graph edges: UserController.login → AuthService.login (calls)
UserController → AuthService (imports)
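A toy version of this resolution step might look like the following. It assumes two lookup tables that the extraction stage would have produced (a definition index and a per-file import table); both tables, their contents, and the resolve function are hypothetical and far simpler than real module resolution.

```python
from typing import Optional, Tuple

# Hypothetical definition index: (file, symbol name) -> line of the definition.
DEFINITIONS = {
    ("auth.service.ts", "AuthService"): 5,
    ("user.controller.ts", "UserController"): 8,
}

# Hypothetical per-file import table: local name -> file it was imported from.
IMPORTS = {
    "user.controller.ts": {"AuthService": "auth.service.ts"},
}


def resolve(file: str, name: str) -> Optional[Tuple[str, int]]:
    """Resolve a name used in `file` to (defining file, line), or None."""
    # 1. A definition in the same file wins.
    if (file, name) in DEFINITIONS:
        return file, DEFINITIONS[(file, name)]
    # 2. Otherwise follow the import that introduced the name.
    source_file = IMPORTS.get(file, {}).get(name)
    if source_file and (source_file, name) in DEFINITIONS:
        return source_file, DEFINITIONS[(source_file, name)]
    # 3. Unknown names stay unresolved instead of being guessed.
    return None


print(resolve("user.controller.ts", "AuthService"))
```

Leaving unknown names unresolved keeps the graph honest: a missing edge is easier to work around than a wrong one.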
Stage 4: Auto-Sync
Uses native OS file events (not polling!) to detect changes:
- macOS: FSEvents
- Linux: inotify
- Windows: ReadDirectoryChangesW
A 2-second debounce prevents triggering mass rebuilds when files change rapidly — it waits for changes to settle before doing incremental updates.
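The debounce idea can be sketched in a few lines. The Debouncer class below is an illustration of the technique, not CodeGraph's implementation; timestamps are passed in explicitly (an assumption made so the behavior is easy to follow and test) rather than read from a real clock or file watcher.

```python
class Debouncer:
    """Collect changed paths and release them once no change has arrived for `delay` seconds."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self.pending = set()
        self.last_change = None

    def record(self, path, now):
        """Register a change event observed at time `now` (in seconds)."""
        self.pending.add(path)
        self.last_change = now

    def flush(self, now):
        """Return the settled batch, or an empty list if changes are still arriving."""
        if self.last_change is None or now - self.last_change < self.delay:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_change = None
        return batch


d = Debouncer(delay=2.0)
d.record("a.ts", now=0.0)
d.record("b.ts", now=1.5)   # a second change restarts the quiet period
print(d.flush(now=3.0))     # only 1.5 s since the last change: nothing released
print(d.flush(now=3.5))     # 2.0 s of quiet: both files released as one batch
```

Because pending paths live in a set, a file saved ten times in a row is still re-indexed only once per batch.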
2. Benchmark Deep Dive
Test conditions: Claude Code (headless, Opus 4.7) answering architecture questions. Each result is the median of 4 runs on the same question, across 7 real open-source repositories.
| Project | Language | Size | Cost ↓ | Token ↓ | Speed ↑ | Tool Calls ↓ |
|---|---|---|---|---|---|---|
| VS Code | TypeScript | ~10k files | 35% | 73% | 41% | 72% |
| Excalidraw | TypeScript | ~600 files | 47% | 73% | 60% | 86% |
| Django | Python | ~2.7k files | 34% | 64% | 59% | 81% |
| Tokio | Rust | ~700 files | 52% | 81% | 63% | 89% |
| OkHttp | Java | ~640 files | 17% | 41% | 36% | 64% |
| Gin | Go | ~150 files | 22% | 23% | 34% | 19% |
| Alamofire | Swift | ~100 files | 38% | 59% | 51% | 77% |
| Average | | | 35% | 59% | 49% | 70% |
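As a quick sanity check, the Average row can be reproduced from the seven per-project rows. The snippet copies the percentages from the table above and assumes the averages are simple unweighted means rounded to whole percents.

```python
# Percent improvements copied from the table: (cost, tokens, speed, tool calls).
RESULTS = {
    "VS Code": (35, 73, 41, 72),
    "Excalidraw": (47, 73, 60, 86),
    "Django": (34, 64, 59, 81),
    "Tokio": (52, 81, 63, 89),
    "OkHttp": (17, 41, 36, 64),
    "Gin": (22, 23, 34, 19),
    "Alamofire": (38, 59, 51, 77),
}


def column_averages(results):
    """Unweighted mean of each column, rounded to the nearest whole percent."""
    columns = zip(*results.values())
    return [round(sum(col) / len(results)) for col in columns]


print(column_averages(RESULTS))  # [35, 59, 49, 70]
```

The result matches the headline figures, which also means each project counts equally regardless of its size.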
Patterns worth noting:
Tokio (Rust, 700 files) sees the biggest gains (81% token reduction, 89% fewer tool calls): Rust's type system is complex — agents originally needed extensive file exploration to understand trait implementations and generic relationships. CodeGraph's pre-built relationships make this dramatically cheaper.
Gin (Go, 150 files) sees the smallest gains (23% token reduction, 19% fewer tool calls): Small Go projects have simple file structures. Agents can already navigate them efficiently, so CodeGraph's marginal value is lower.
VS Code's absolute numbers are the most striking: the same question costs $0.64 (1.4M tokens) without CodeGraph, $0.42 (393k tokens) with it. A single task saves $0.22.
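The VS Code percentages can be re-derived from the absolute figures quoted above. Since those figures are rounded (1.4M, 393k), the results land within a point of the table's 73% and 35% rather than exactly on them.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


token_cut = percent_reduction(1_400_000, 393_000)
cost_cut = percent_reduction(0.64, 0.42)
saved_per_task = round(0.64 - 0.42, 2)

print(f"tokens: {token_cut:.1f}% fewer")
print(f"cost:   {cost_cut:.1f}% lower, ${saved_per_task:.2f} saved per task")
```

Note that cost falls much less than token count; the article does not break down why, so treat the two numbers as separate measurements.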
Takeaway: The larger the codebase, the more complex the dependencies, and the richer the language's type system, the greater CodeGraph's benefit. For developers using Claude Code heavily on large projects, the ROI is clear.
3. 19 Languages + 13 Framework Route Detection
Language support (via tree-sitter grammars):
TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Vue, Liquid, Pascal/Delphi, Scala
Framework route detection is a differentiating feature — CodeGraph doesn't just recognize symbols, it understands the mapping between URL routes and their handler functions:
# Django
urlpatterns = [
path('users/<int:pk>/', UserDetailView.as_view()),
]
# → CodeGraph knows GET /users/{id}/ maps to UserDetailView
# FastAPI
@app.get("/items/{item_id}")
async def read_item(item_id: int):
...
# → CodeGraph knows GET /items/{id} maps to read_item()
The 13 supported frameworks: Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin/chi/gorilla/mux, Axum/actix/Rocket, ASP.NET, Vapor, React Router/SvelteKit.
This means AI agents can ask "Where is the handler for /api/users/:id?" and get a precise answer, without needing to scan routing config files.
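One way to picture route detection is as a normalization step: each framework's parameter syntax is rewritten into a common form, and URLs are then matched against the normalized patterns. The sketch below is an illustration under that assumption, not CodeGraph's detector. It handles only the simple parameter forms shown, keeps each parameter's own name (so Django's pk becomes {pk} rather than {id}), and the getUser Express handler is hypothetical.

```python
import re


def normalize(pattern):
    """Rewrite framework-specific route parameters to a common {name} form."""
    # Django style: <int:pk> or <pk>
    pattern = re.sub(r"<(?:\w+:)?(\w+)>", r"{\1}", pattern)
    # Express style: /:id
    pattern = re.sub(r"/:(\w+)", r"/{\1}", pattern)
    # Django patterns omit the leading slash; add it for consistency.
    return "/" + pattern.lstrip("/")


def to_regex(normalized):
    """Compile a normalized pattern into a regex with one named group per parameter."""
    parts = re.split(r"(\{\w+\})", normalized)
    body = "".join(
        f"(?P<{part[1:-1]}>[^/]+)" if part.startswith("{") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{body}$")


ROUTES = {
    normalize("users/<int:pk>/"): "UserDetailView",
    normalize("/items/{item_id}"): "read_item",
    normalize("/api/users/:id"): "getUser",  # hypothetical Express handler
}


def find_handler(url):
    """Return the handler name for the first route matching `url`, else None."""
    for pattern, handler in ROUTES.items():
        if to_regex(pattern).match(url):
            return handler
    return None


print(find_handler("/users/42/"))
print(find_handler("/api/users/abc"))
```

Once every framework's routes share one representation, the question "which handler serves this URL?" no longer depends on knowing which framework the project uses.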
4. codegraph affected — Smart CI Test Selection
An underappreciated feature: by tracing import dependencies, it identifies which test files are actually affected by changed source files.
# CI scenario: only run tests affected by this change
git diff --name-only | codegraph affected --stdin
# Manually specify changed files
codegraph affected src/auth.ts
# With filter (only e2e tests)
codegraph affected src/auth.ts --filter "e2e/*"
How it works:
Changed: src/auth.ts
↓ CodeGraph queries the dependency graph
Direct importers: user.service.ts, auth.controller.ts
Indirect importers: app.module.ts, integration.test.ts
↓ Filter to test files only
Affected tests: auth.spec.ts, user.service.spec.ts, integration.test.ts
↓ Output
[these files] ← run only these, not the full test suite
On large projects, this can compress CI test time from tens of minutes to a few minutes.
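The walk described above is a breadth-first search over the reversed import graph. Here is a minimal sketch using a hand-written import graph that mirrors the example; the graph contents, the is_test naming rule, and the affected_tests function are assumptions for illustration, not CodeGraph internals.

```python
from collections import deque

# Hypothetical import graph: file -> files it imports.
IMPORTS = {
    "user.service.ts": ["src/auth.ts"],
    "auth.controller.ts": ["src/auth.ts"],
    "app.module.ts": ["auth.controller.ts"],
    "integration.test.ts": ["app.module.ts"],
    "auth.spec.ts": ["src/auth.ts"],
    "user.service.spec.ts": ["user.service.ts"],
    "billing.spec.ts": ["billing.ts"],
}


def is_test(path):
    """Assumed naming convention for test files."""
    return path.endswith((".spec.ts", ".test.ts"))


def affected_tests(changed, imports):
    """Return test files that directly or transitively import any changed file."""
    # Invert the graph: file -> files that import it.
    importers = {}
    for src, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(src)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(path for path in seen if is_test(path))


print(affected_tests(["src/auth.ts"], IMPORTS))
```

billing.spec.ts never appears in the result because nothing on its import chain touches src/auth.ts, which is exactly the saving this feature is after.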
5. Configuration and Performance Notes
Project config file (.codegraph/config.json):
{
"version": 1,
"languages": ["typescript", "javascript"],
"exclude": ["node_modules/**", "dist/**", "build/**", "*.min.js"],
"maxFileSize": 1048576,
"extractDocstrings": true,
"trackCallSites": true
}
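To get a feel for what the exclude and maxFileSize settings do, here is a rough sketch of a file filter driven by the config above. It assumes maxFileSize is in bytes (1048576 = 1 MiB) and approximates the glob patterns with Python's fnmatch, whose wildcard also matches path separators; CodeGraph's actual matching rules may differ, and should_index is a hypothetical name.

```python
import json
from fnmatch import fnmatchcase

CONFIG = json.loads("""
{
  "version": 1,
  "languages": ["typescript", "javascript"],
  "exclude": ["node_modules/**", "dist/**", "build/**", "*.min.js"],
  "maxFileSize": 1048576,
  "extractDocstrings": true,
  "trackCallSites": true
}
""")


def should_index(path, size_bytes, config=CONFIG):
    """Return True if a file passes the size limit and matches no exclude pattern."""
    # Assumption: files strictly larger than maxFileSize (bytes) are skipped.
    if size_bytes > config["maxFileSize"]:
        return False
    # Assumption: patterns are matched against the project-relative path.
    return not any(fnmatchcase(path, pattern) for pattern in config["exclude"])


for path, size in [
    ("src/app.ts", 2_048),
    ("node_modules/react/index.js", 100),
    ("src/vendor/lib.min.js", 100),
    ("src/generated/big.ts", 2_000_000),
]:
    print(path, should_index(path, size))
```

Excluding build output and minified bundles keeps generated code out of the graph, where it would only add noise to search and caller results.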
SQLite backend selection:
CodeGraph ships with two SQLite backends:
- Native better-sqlite3 (default, recommended): High performance, supports concurrent reads
- WASM fallback: Better compatibility, but 5–10x slower than native, and concurrent operations may produce database is locked errors
If you encounter performance issues or lock errors:
# Rebuild the native module
npm rebuild better-sqlite3
# Check which backend is active
codegraph status
CLI Reference
codegraph # Run interactive installer
codegraph init [path] # Initialize in a project
codegraph uninit [path] # Remove CodeGraph from a project
codegraph index [path] # Full index (--force to rebuild)
codegraph sync [path] # Incremental update
codegraph status [path] # Show statistics
codegraph query <search> # Search symbols
codegraph files [path] # Show file structure
codegraph context <task> # Build AI context for a task
codegraph affected [files] # Find affected test files
codegraph serve --mcp # Start MCP server
Library API (embed CodeGraph in your own tools):
import CodeGraph from '@colbymchenry/codegraph';
const cg = await CodeGraph.init('/path/to/project');
// Full index with progress callbacks
await cg.indexAll({
onProgress: (p) => console.log(`${p.phase}: ${p.current}/${p.total}`)
});
// Search symbols
const results = cg.searchNodes('UserService');
// Get call chain
const callers = cg.getCallers(results[0].node.id);
// Build AI context
const context = await cg.buildContext('fix login bug', {
maxNodes: 20,
includeCode: true
});
// Impact radius analysis (depth 2)
const impact = cg.getImpactRadius(results[0].node.id, 2);
cg.watch(); // Start file watching for auto-sync
cg.close(); // Clean up resources
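The depth argument to getImpactRadius suggests a depth-limited walk over caller relationships. The Python sketch below shows one plausible reading of that idea; it is not the library's implementation, and the CALLERS graph and impact_radius function are hypothetical.

```python
from collections import deque

# Hypothetical reverse call graph: symbol -> symbols that call it.
CALLERS = {
    "AuthService.login": ["UserController.login"],
    "UserController.login": ["Router.dispatch"],
    "Router.dispatch": ["Server.handle"],
}


def impact_radius(symbol, depth, callers=CALLERS):
    """Return {symbol: hops} for every caller reachable within `depth` hops."""
    distances = {symbol: 0}
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        # Do not expand past the requested depth.
        if distances[current] == depth:
            continue
        for caller in callers.get(current, ()):
            if caller not in distances:
                distances[caller] = distances[current] + 1
                queue.append(caller)
    del distances[symbol]  # the starting symbol is not part of its own impact
    return distances


print(impact_radius("AuthService.login", 2))
```

With depth 2, Server.handle (three hops away) is left out, which keeps the answer focused on code most likely to need a second look after a change.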
Project Links & Resources
Official Resources
- 🌟 GitHub: https://github.com/colbymchenry/codegraph
- 📦 npm: @colbymchenry/codegraph
- ⚡ Quick install: npx @colbymchenry/codegraph
Target Audience
- Heavy Claude Code / Cursor users: Working on large projects and looking to reduce cost and improve response speed
- Large TypeScript/Rust/Python project developers: Codebases large enough that AI agent file-scanning overhead is noticeable
- CI/CD engineers: Using codegraph affected for smart test selection to eliminate unnecessary full test runs
- Toolchain developers: Embedding code semantic analysis into their own tools via the Library API
Summary
Key Takeaways
- Core value: Inserts a pre-built semantic index between AI agents and codebases — average 35% cost savings, 70% fewer tool calls, 49% speed improvement
- Technology choices: tree-sitter (AST parsing) + SQLite FTS5 (full-text search) + native OS file events (live sync) — zero external service dependencies
- 8 MCP tools: codegraph_context is the most critical; one call returns a complete context package for the task at hand
- 19 languages + 13 framework route detection covering mainstream development stacks
- codegraph affected: dependency-traced smart test selection, a CI acceleration tool
- Gains scale with codebase size: Tokio (Rust, 700 files) reaches 89% fewer tool calls; small Go projects see ~19%
One-Line Review
CodeGraph does something deceptively simple yet extremely practical: it converts the code discovery work that AI agents redo on every task into a reusable local index — not a feature addition, but a workflow architecture optimization.