DEV Community

Ansh Sonkar


AI coding agents don't need more context, they need a graph

GitHub - Get the full info about Carto

The problem

AI coding tools are confident. They'll propose a 12-line patch to a file with 83 transitive dependents like it's nothing. You accept it. Things break downstream. The agent can't see this coming because nothing in its context models your import graph. It reads the file it's editing, maybe a few greps, maybe a vector search hit, then writes confidently into code it doesn't understand. Most of the time the patch lands somewhere reasonable and tests pass. The bad case is silent: a confident refactor on a file that 60+ other files import, the cascade doesn't run locally, and three days later something downstream breaks.

So I built Carto

It indexes your codebase into a local SQLite database (import graph, domain map, blast radius for every file) and exposes a validate_diff tool your AI calls before showing you anything. It gets back the risk level, every file the change touches transitively, and whether it crosses domain boundaries it shouldn't. The AI sees all of this before it proposes the patch. It revises, splits the change, or flags it. The bad diff never makes it to your screen.
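
To make "every file the change touches transitively" concrete, here is a minimal sketch of the idea, not Carto's actual code. It inverts an import graph and walks it breadth-first from the changed files. The names (`reverseGraph`, `transitiveDependents`, `classify`) and the risk thresholds are assumptions made up for illustration.

```typescript
// Hypothetical sketch: names and thresholds are illustrative, not Carto's API.
type ImportGraph = Map<string, string[]>; // file -> files it imports
type Risk = "LOW" | "MEDIUM" | "HIGH";

// Invert "A imports B" into "B is imported by A" so dependents can be walked.
function reverseGraph(imports: ImportGraph): Map<string, string[]> {
  const dependents = new Map<string, string[]>();
  imports.forEach((deps, file) => {
    for (const dep of deps) {
      const list = dependents.get(dep) || [];
      list.push(file);
      dependents.set(dep, list);
    }
  });
  return dependents;
}

// Breadth-first walk collecting every file that transitively imports a changed file.
function transitiveDependents(
  dependents: Map<string, string[]>,
  changed: string[]
): Set<string> {
  const seen = new Set<string>(changed);
  const queue = changed.slice();
  const result = new Set<string>();
  while (queue.length > 0) {
    const file = queue.shift() as string;
    for (const parent of dependents.get(file) || []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        result.add(parent);
        queue.push(parent);
      }
    }
  }
  return result;
}

// Assumed thresholds, chosen only for this example.
function classify(dependentCount: number): Risk {
  if (dependentCount >= 50) return "HIGH";
  if (dependentCount >= 10) return "MEDIUM";
  return "LOW";
}

const imports: ImportGraph = new Map<string, string[]>();
imports.set("src/api.ts", ["src/format.ts"]);
imports.set("src/cli.ts", ["src/api.ts"]);
imports.set("src/format.ts", []);

const affected = transitiveDependents(reverseGraph(imports), ["src/format.ts"]);
const risk = classify(affected.size);
```

In this toy graph, changing `src/format.ts` reaches both `src/api.ts` and `src/cli.ts`, even though only one of them imports it directly. That indirect hop is the part a grep-based context usually misses.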

Real case

Last week Claude Code proposed a 12-line refactor of a Postgres formatter. Looked clean. Lint passed. I was about to accept. validate_diff returned HIGH with 83 transitive dependents and a high_blast violation on the modified file. The agent threw out its own patch and asked if I wanted to split the change instead. That one tool call is the entire reason the rest of the system exists.

Beyond validation

Ask Carto what files to touch for any task, which files break if you change one, what patterns already exist before you write new code, or get a full architectural overview of the codebase (domains, entry points, key patterns). And because every decision is recorded in a local SQLite log, the AI remembers what was already decided yesterday, last week, or three sessions ago. did_we_discuss_this("snake_case naming") returns the prior decision. The AI stops re-litigating settled questions inside the same repo.
This was the smallest piece to build and honestly the most useful one. The agent's worst trait is that it forgets, and SQLite turns out to be a fine memory.
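
The shape of that memory is simple enough to sketch. The version below swaps SQLite for an in-memory array so it runs with no dependencies, and the matching rule (every query word must appear in the stored topic) is a guess, not how did_we_discuss_this actually matches. `DecisionLog`, `record`, and `lookup` are hypothetical names.

```typescript
// Hypothetical in-memory stand-in for a SQLite-backed decision log.
interface Decision {
  topic: string;
  decision: string;
  decidedAt: string; // ISO date string
}

class DecisionLog {
  private entries: Decision[] = [];

  record(topic: string, decision: string, decidedAt: string): void {
    this.entries.push({ topic, decision, decidedAt });
  }

  // Return the most recent entry whose topic contains every word of the query.
  lookup(query: string): Decision | undefined {
    const tokens = query
      .toLowerCase()
      .split(/\s+/)
      .filter((t) => t.length > 0);
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const entry = this.entries[i];
      if (!entry) continue;
      const topic = entry.topic.toLowerCase();
      if (tokens.every((t) => topic.indexOf(t) !== -1)) return entry;
    }
    return undefined;
  }
}

const log = new DecisionLog();
log.record("naming convention: snake_case", "Use snake_case for DB columns", "2024-01-10");
log.record("error handling", "Throw typed errors, never strings", "2024-01-12");

const prior = log.lookup("snake_case naming");
const unknown = log.lookup("tabs vs spaces");
```

Scanning from the newest entry backwards means a later decision on the same topic overrides an earlier one, which is the behavior you want when a convention changes.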

Under the hood

tree-sitter parses every file for imports and symbols (0.05-0.2ms per file); Babel goes deeper on API handler files only, extracting routes and models. Domains are detected by running Leiden+CPM graph clustering over the import graph: files that heavily import each other naturally cluster together, and domain names are inferred from path tokens. Blast radius queries run on a Uint32Array bitset layer built from SQLite. Median 20.7× faster than raw SQL on a 7,567-file repo. The bitset class itself is 60 lines, zero deps. Word-level OR, AND, popcount, iterate. Three pre-allocated bitsets total in the BFS hot loop, no allocation churn per hop.
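
Here is roughly what a bitset BFS of that shape can look like. This is a sketch under assumptions, not the real 60-line class: the method names are invented, files are assumed to be numbered 0..n-1, and the AND step is written as an AND-NOT against the visited set.

```typescript
// Illustrative sketch only: names and layout are assumptions, not Carto's code.
class Bitset {
  readonly words: Uint32Array;

  constructor(bitCount: number) {
    this.words = new Uint32Array((bitCount + 31) >>> 5); // 32 bits per word
  }
  set(i: number): void {
    this.words[i >>> 5] |= 1 << (i & 31);
  }
  has(i: number): boolean {
    return (this.words[i >>> 5] & (1 << (i & 31))) !== 0;
  }
  clear(): void {
    for (let w = 0; w < this.words.length; w++) this.words[w] = 0;
  }
  // this |= other, one 32-bit word at a time.
  or(other: Bitset): void {
    for (let w = 0; w < this.words.length; w++) this.words[w] |= other.words[w];
  }
  // this &= ~other, used to drop files that were already visited.
  andNot(other: Bitset): void {
    for (let w = 0; w < this.words.length; w++) this.words[w] &= ~other.words[w];
  }
  isEmpty(): boolean {
    for (let w = 0; w < this.words.length; w++) {
      if (this.words[w] !== 0) return false;
    }
    return true;
  }
  // Count set bits with the classic parallel bit-summing trick per word.
  popcount(): number {
    let total = 0;
    for (let w = 0; w < this.words.length; w++) {
      let v = this.words[w];
      v = v - ((v >>> 1) & 0x55555555);
      v = (v & 0x33333333) + ((v >>> 2) & 0x33333333);
      v = (v + (v >>> 4)) & 0x0f0f0f0f;
      v = v + (v >>> 8);
      v = v + (v >>> 16);
      total += v & 0x3f;
    }
    return total;
  }
  // Call fn with the index of every set bit.
  forEach(fn: (i: number) => void): void {
    for (let w = 0; w < this.words.length; w++) {
      let word = this.words[w];
      let index = w << 5;
      while (word !== 0) {
        if ((word & 1) !== 0) fn(index);
        word >>>= 1;
        index++;
      }
    }
  }
}

// directDependents[f] holds the files that import file f directly.
// Returns the start file plus everything that transitively imports it.
function blastRadius(directDependents: Bitset[], start: number, fileCount: number): Bitset {
  const visited = new Bitset(fileCount);
  let frontier = new Bitset(fileCount);
  let next = new Bitset(fileCount);
  frontier.set(start);
  visited.set(start);
  while (!frontier.isEmpty()) {
    next.clear();
    frontier.forEach((f) => next.or(directDependents[f]));
    next.andNot(visited); // keep only newly reached files
    visited.or(next);
    const swap = frontier; // reuse the same three bitsets on every hop
    frontier = next;
    next = swap;
  }
  return visited;
}

// Files: 0 = formatter, 1 imports 0, 2 imports 1, 3 is unrelated.
const fileCount = 4;
const directDependents: Bitset[] = [];
for (let f = 0; f < fileCount; f++) directDependents.push(new Bitset(fileCount));
directDependents[0].set(1);
directDependents[1].set(2);

const radius = blastRadius(directDependents, 0, fileCount);
```

The three bitsets (`visited`, `frontier`, `next`) are allocated once and swapped between hops, so the loop itself allocates nothing. Each hop costs a handful of word-level passes rather than a SQL round trip.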

simulate_change_impact

This tool only exists because of bitmap OR aggregation. Computing the union blast radius across multiple files simultaneously has no SQLite equivalent at this latency. It's the same insight that makes validate_diff cheap: a union of 20 files' blast radii reduces to one OR pass over data already in memory. validate_diff runs in 0.040ms at the median on a 7,567-file repo. Budget going in was p50 under 5ms, p99 under 15ms. Cleared by a factor of 30 to 60. The reason the budget mattered: this has to run inside the agent loop on every proposed diff. At 50ms the agent skips the call. At 0.04ms there's no reason not to make it.
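
The OR aggregation is small enough to show in full. This sketch assumes each file's blast radius is already cached as a Uint32Array bitmap of the same length. The function names (`bitmapOf`, `unionBlastRadius`, `countBits`) are hypothetical and only serve the example.

```typescript
// Hypothetical sketch: assumes per-file blast radii are cached as equal-length bitmaps.

// Build a bitmap with the given file ids set (helper for the example).
function bitmapOf(wordCount: number, ids: number[]): Uint32Array {
  const bitmap = new Uint32Array(wordCount);
  for (const id of ids) bitmap[id >>> 5] |= 1 << (id & 31);
  return bitmap;
}

// Union of many blast radii: one OR per word per input bitmap.
function unionBlastRadius(radii: Uint32Array[], wordCount: number): Uint32Array {
  const out = new Uint32Array(wordCount);
  for (const radius of radii) {
    for (let w = 0; w < wordCount; w++) out[w] |= radius[w];
  }
  return out;
}

// Count set bits by repeatedly clearing the lowest one (Kernighan's method).
function countBits(bitmap: Uint32Array): number {
  let total = 0;
  for (let w = 0; w < bitmap.length; w++) {
    let word = bitmap[w];
    while (word !== 0) {
      word &= word - 1;
      total++;
    }
  }
  return total;
}

// 64 files fit in two 32-bit words.
const wordCount = 2;
const radiusA = bitmapOf(wordCount, [1, 2, 40]);
const radiusB = bitmapOf(wordCount, [2, 3, 63]);

const union = unionBlastRadius([radiusA, radiusB], wordCount);
const affectedCount = countBits(union);
```

File 2 appears in both radii but is counted once, so the union covers files 1, 2, 3, 40, and 63. On a repo the size quoted above, one bitmap is 7,567 bits, or 237 words, so a 20-file union is a few thousand integer ORs.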

No daemon. Just four Git hooks (pre-commit, post-checkout, post-merge, post-rewrite) keep the index fresh. Stale files re-parse inline at MCP query time. 22 MCP tools total. carto init auto-wires into Cursor, Claude Code, Kiro, Windsurf, VS Code Copilot, Codex, Claude Desktop, Zed, JetBrains. One install, restart your AI tool, and the AI calls Carto on its own from then on.

Why this shape was already in my head: I was building Emfirge, a cloud security agent that maps AWS infrastructure into a graph and simulates blast radius for every change. To make Emfirge's AI understand AWS, I wrote a module called cartography.py. It mapped resources, built a graph, wrote it into a structured map. The AI stopped hallucinating about IAM and VPC peering. One night I was watching Claude Code propose a refactor inside a file with 60+ dependents and realized I'd already solved this once. For AWS. Same exact shape. Source code and cloud infra are both directed graphs of components with declared dependencies. Carto is cartography.py retargeted at source.

npm install -g carto-md
cd your-project
carto init

MIT. Local only. No telemetry, no cloud, no account.

github.com/theanshsonkar/carto
