Not another code graph engine. A lightweight navigation workflow for AI coding agents.
You ask Claude about a function. It gives you a confident, detailed explanation. You build on it for an hour. Then you find out it was wrong.
Or: you change a function, tests pass, you ship. Three days later, four other places called that function, and Claude never mentioned them.
Same root cause: Claude does not have a reliable way to navigate your codebase.
It starts from scratch every time. It reads what you give it. It guesses what it does not have. That is where you get hallucinations, missed impact, and fixes that are locally correct but globally incomplete.
The fix is not a smarter model. The fix is a map.
Why this matters for real-world engineers
If you use AI for coding regularly, you have probably seen this already.
The model starts in the wrong place, reads a few files that look related, misses one critical connection, and then builds the wrong mental model from there.
Sometimes it still produces code that looks plausible. Sometimes it even almost works. That is exactly what makes it dangerous.
Because now you are not saving time anymore. You are doing one of these instead:
- babysitting its search process
- repeatedly correcting its assumptions
- re-pasting the right files into context
- cleaning up a solution built on the wrong part of the system
At that point, the bottleneck is no longer code generation. It is codebase navigation.
And the bigger or messier the repo gets, the worse this becomes.
Most real repositories are not clean demo projects. They have:
- historical baggage
- duplicated patterns
- stale modules
- hidden wiring
- config-driven behavior
- one weird file that everything secretly depends on
A human engineer eventually learns those paths over time. AI does not. Every session starts with partial memory, incomplete context, and a high chance of exploring the wrong route.
That is the exact problem this project is trying to solve.
The dilemma
Here is the dilemma every Claude Code user runs into:
Option A: Let Claude read everything.
It greps the whole repo, opens 20 files, reads thousands of lines, and tries to be thorough.
That sounds safe, but on a real repo it gets expensive fast:
- token burn goes through the roof
- context gets noisy
- the model starts forgetting what it read five minutes ago
Option B: Let Claude read what it thinks is relevant.
Now it moves faster.
It opens 3 or 4 files, gives you a confident answer, and starts coding.
That sounds efficient, but this is how you get the dangerous kind of bug:
- it fixes one obvious path
- it misses the second path it never knew existed
- you only find out later that the change was incomplete
Both options suck. Read too much and it is expensive and slow. Read too little and you miss things.
There is a third option: give Claude a map.
Think of your codebase like Tokyo's subway system. Without a map, you can still get somewhere, but you spend the whole time wandering between lines and hoping you guessed the right transfer. With a map, you glance once, see the route, and move.
The map does not stop exploration. It just stops dumb exploration.
What AI Index actually is
AI Index is an AI-maintained repository graph.
It is not documentation for humans. It is not a call graph. It is not the source of truth.
Its job is simple:
- tell the agent which domain to open first
- show the main change surfaces inside that domain
- record the non-obvious "also check this" rules
- point back to real files, not prose summaries
The code is still the source of truth. The index is the traversal layer that helps the agent reach the right code faster and more completely.
Single source of truth matters
A lot of AI mistakes start the moment the model reads human-written documentation, treats it as truth, and never double-checks the actual implementation. If that documentation is stale, the model inherits the stale mental model too.
AI Index avoids that trap on purpose. It does not try to explain what a function does in prose. It points the agent to the real file, and lets the code answer the question. The map tells Claude where to look. The code tells Claude what is true.
Why this is not just another CLAUDE.md / AI_INDEX template
A flat AI_INDEX is still too close to a phone book:
```
auth → src/auth/
payments → src/payments/
billing → src/billing/
```
That helps Claude find a folder. But it still does not tell Claude which surfaces move together.
The newer AI Index shape is closer to an operations card:
```markdown
# AI Index — repo-name

## Domain Index

| Domain | File | Owns | Open When |
|---|---|---|---|
| Auth | `AI_INDEX/auth.md` | login, tokens, middleware | auth bugs, session issues |
```

```markdown
# Domain — Auth

## Scope

- owns: `src/auth/`
- change_surfaces:
  - services: `src/auth/`
  - tests: `tests/auth/`
- must_check:
  - `src/billing/` when auth payload shape changes
```
Now Claude does not just know where auth lives. It knows what belongs to the same change surface, and what else must be checked before calling the edit done.
That is the difference between a list of folders and a real navigation layer.
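To make the shape concrete, here is a minimal, hypothetical sketch (not part of the plugin; the plugin has Claude read the markdown directly) of how a script could turn the Domain Index table into a routing structure an agent can act on:

```python
def parse_domain_index(markdown: str) -> list[dict]:
    """Parse rows of a '| Domain | File | Owns | Open When |' table.

    Illustrative only: assumes the four-column table shape shown above.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        # Skip non-table lines and the |---|---| separator row.
        if not line.startswith("|") or set(line) <= {"|", "-", " "}:
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 4 and cells[0] != "Domain":  # skip header row
            rows.append({
                "domain": cells[0],
                "file": cells[1].strip("`"),
                "owns": [o.strip() for o in cells[2].split(",")],
                "open_when": cells[3],
            })
    return rows


index = """\
| Domain | File | Owns | Open When |
|---|---|---|---|
| Auth | `AI_INDEX/auth.md` | login, tokens, middleware | auth bugs, session issues |
"""
print(parse_domain_index(index))
```

The point of the structure, not the parser: each row carries a trigger condition ("Open When") and a pointer to a domain file, so the agent resolves a task to a domain in one read instead of a repo-wide search.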
The problem is not intelligence. It is navigation.
We tested existing approaches like Aider's repo map and similar repo-map tools.
They are useful, but they solve a different problem: helping models understand a repository. In practice, AI coding assistants usually fail because they:
- read too many irrelevant files
- miss critical connections like registries, routing, or config wiring
- build incorrect mental models
- waste tokens exploring blindly
- fail to trace impact correctly
The bottleneck is not intelligence. The bottleneck is navigation.
This project is not trying to compete with full code intelligence systems. It is solving a narrower and more practical problem:
AI coding assistants do not fail because they are not smart enough. They fail because they do not know where to look.
We focus on navigation, not summarization.
Why not just use search, grep, or repo maps?
Search and grep help, but they are reactive. They only work well if you already know what to look for.
Repo maps help with orientation, but a summary is not a navigation system. It still does not tell the model where to start for this task, how to move from one file to another, or what paths to follow when tracing impact.
The repeated failure mode is not "the model does not understand the code." It is "the model is looking at the wrong code."
Once it starts from the wrong place, every step after that compounds the error.
AI Index gives it:
- a starting point
- a small set of relevant domain files
- change surfaces and must-check rules
- a fallback when the index is incomplete
The goal is not to eliminate exploration. The goal is to guide it.
What this plugin bundle does
A Claude Code plugin bundle: a small set of skills plus an AI Index format that gives Claude a persistent navigation layer for your codebase.
| Skill | What it does |
|---|---|
| `/ai-index` | Default entry point. Decide whether to use an existing AI Index, build one, or sync it after changes |
| `/use-ai-index` | Read the root index first, open only relevant domain files, and sweep the right change surface |
| `/generate-graph` | Build the AI Index from scratch |
| `/sync-graph` | Update only the affected graph files after meaningful code changes |
| `/debug` | Locate → root cause → pattern sweep → fix |
| `/new-feature` | Find pattern → trace impact → implement |
The index is split by default:

- `AI_INDEX.md` for read order, global rules, and the domain index
- `AI_INDEX/<domain>.md` for change surfaces, must-check rules, and critical traversal nodes
It can live in the repo, or beside the repo if you want to keep it as a research asset. Either way, Claude uses it as a navigation layer, not as source of truth.
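For reference, the on-disk layout looks like this (the domain names are illustrative placeholders):

```
your-repo/
├── AI_INDEX.md          # read order, global rules, domain index
└── AI_INDEX/
    ├── auth.md          # change surfaces, must_check rules, traversal nodes
    └── payments.md
```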
How it works
The map
/generate-graph builds an AI Index as a small root file plus per-domain files. The root tells Claude which domains to open. The domain files record the real change surfaces and must-check rules.
The workflow
Instead of dumping context into the model, we do this:
1. Build an AI-maintained graph. AI inspects the real repo structure, defines domains by change ownership, and writes the root file plus domain files.
2. Navigate using the graph. Start from the root index, open only the relevant domains, and follow change surfaces plus must-check rules.
3. Read source code only when needed. The graph narrows the search space. The model still reads actual source for correctness.
4. Sync only when the traversal shape changes. Add new routes, services, jobs, configs, or domain edges to the graph. Leave the graph alone for tiny internal edits.
The first build is done by AI. Later updates are also done by AI. Scripts are not the source of truth anymore.
Your workflow
You still do not need to learn a whole new system.
- First time on a repo: start with `/ai-index` or `/generate-graph`
- Repo already has an index: use `/use-ai-index`
- After meaningful code changes: run `/sync-graph`
- Bug fix or feature work: let `/debug` or `/new-feature` use the graph as the starting point
That is it. The point is to give Claude a better starting map, not to make you maintain a second giant documentation system.
Does it actually work?
Eight benchmark tasks across repos of different sizes (small hobby project to 77K-file monorepo), comparing: graph-guided navigation vs. no map vs. project docs vs. fullstack-debug vs. Aider's PageRank map.
Test 1 — Bug fix: missing rate limit (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 14K | 14K |
| Tool calls | 10 | 12 |
| Found root cause? | ✅ | ✅ |
| Found cascade impact? | ✅ | ❌ |
Same tokens, but B missed the restore/undo path. It fixed the main bug and left a secondary code path broken. A found it because the map pushed Claude to inspect both related paths.
Test 2 — Bug fix: UI refresh issue (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 5K | 5.1K |
| Tool calls | 4 | 5 |
| Found root cause? | ✅ | ✅ |
Simple UI bug — comparable performance. Graph doesn't help much when the entry point is obvious.
Test 3 — New feature planning (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 11K | 14K |
| Tool calls | 10 | 14 |
| Identified impact correctly? | ✅ | ✅ |
23% fewer tokens. The graph told Claude which files to skip. B explored files that turned out to be irrelevant.
Test 4 — Understanding a flow (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 5K | 6K |
| Tool calls | 5 | 8 |
| Accurate explanation? | ✅ | ✅ |
17% fewer tokens, 37% fewer tool calls. Graph provided entry points directly.
Test 5 — Pattern audit: find all instances of a bug pattern (small repo)
| Metric | A (graph) | B (no map) | A + exhaustive sweep |
|---|---|---|---|
| Tokens | 16K | 22K | 16K + $0.02 |
| Tool calls | 12 | 18 | 12 + sweep |
| Coverage | ~80% | ~60% | 100% |
Neither agent alone hits 100%. Graph scopes the search area, then an optional exhaustive sweep scans every file for the same bug pattern — costs about $0.02 on a large repo. Full coverage.
Test 6 — Bug fix: missing feature flag (large repo, 77K files)
| Metric | A (graph) | C (no map) |
|---|---|---|
| Tokens | 48K | 72K |
| Tool calls | 14 | 26 |
| Found root cause? | ✅ | ✅ |
33% fewer tokens on a 77K-file repo. The graph narrowed the search from the entire monorepo to a single domain. C explored broadly before finding the right area.
Test 7 — Cross-repo investigation: frontend calling backend (large repo)
| Metric | A (graph) | C (no map) |
|---|---|---|
| Tokens | 55K | 82K |
| Tool calls | 18 | 33 |
| Found the backend endpoint? | ✅ | ✅ |
| Found the wiring gap? | ✅ | ❌ |
C found the backend endpoint. A found that too, plus the wiring gap around `get_tool_input_text()`: infrastructure ready, caller not wired. The graph saved 33% of tokens over the no-map run.
Test 8 — New feature investigation: session context tool calls (large repo, 4 approaches)
Frontend developer asks: can we add tool calls, in/out flags, and tool names to the session context API?
| Metric | A (graph) | C (no map) | D (project docs) | E (fullstack-debug) | Aider map |
|---|---|---|---|---|---|
| Tokens | 61K | 47K | 64K | 49K | N/A |
| Tool calls | 17 | 30 | 35 | 32 | N/A |
| Found endpoint? | ✅ | ✅ | ✅ | ✅ | ❌ |
| Found existing helpers? | ✅ | ✅ | ✅ | ✅ | — |
| Extra insight | — | — | ⚠️ ingestion caveat | — | — |
Aider's map optimizes for editing context, not investigation. Its PageRank-based ranking prioritizes "globally important" functions — on the 77K-file repo, the session context endpoint wasn't important enough to make it into the 560-line map. A task-specific graph with explicit edges performs better for tracing and investigation. Agent D (project docs) found a critical caveat about data storage that others missed. Agent A used fewest tool calls (17 vs 30-35).
Honest note: in Test 8, the graph version actually used MORE tokens (61K vs 47K). The graph guided Claude to read deeper — it found an ingestion caveat the others missed, but it cost more tokens doing so. The graph doesn't always save tokens. Its value is coverage, not cost.
Summary: when does each approach help?
| Task type | Token savings (graph vs no map) | Quality difference |
|---|---|---|
| Bug fix (clear entry point) | ~0% | Graph finds cascade impact others miss |
| Bug fix (UI flow) | ~3% | Comparable |
| New feature planning | 23% | Graph knows which files to skip |
| Understanding a flow | 17% | Graph provides entry points directly |
| Pattern audit (large repo) | 42% | Graph + exhaustive sweep = 100% coverage |
| Cross-repo investigation | 33% | Graph points to the right repo/domain |
| Feature investigation (large repo) | Varies | Graph wins on investigation; docs may still surface caveats |
Key findings
The graph's biggest value isn't saving tokens — it's preventing missed impact. On a 10-file repo, savings are 17-23%. On a 77K-file repo, savings jump to 33-42%. But finding the cascade bug (the restore/undo path that only the graph version caught) — that's a qualitative difference, not quantitative.
(42% is the peak saving on pattern audits across large repos. Average across all task types is 17–33%. We show the full range in the benchmarks above.)
Aider's map and this graph solve different problems. Aider optimizes for editing context (which files to include when making changes). This plugin optimizes for investigation and impact tracing (which files are connected to your change). On the 77K-file repo, the session context endpoint wasn't in Aider's 560-line map at all — it wasn't globally important, just task-relevant.
No single approach achieves 100% coverage on pattern audits. The best workflow is a hybrid: graph scopes down the search area, then an exhaustive sweep finds every remaining instance for ~$0.02.
Project notes can still surface useful caveats — but the current design goal is to fold those caveats into repo-level rules or must_check, not maintain a separate Docs: field.
What this is NOT
- Not a full code intelligence platform
- Not a semantic search engine
- Not a replacement for reading code
- Not trying to be the most accurate graph possible
What this IS
An AI-maintained repository graph for change-complete navigation.
Not perfect understanding. Not complete graphs. Just this:
Find the right code quickly, and do not miss the related paths that matter.
Get it
Install through the marketplace repo:
```
/plugin add-marketplace https://github.com/ithiria894/AI-Index
/plugin install codebase-navigator
```
This release ships as a Claude Code plugin made of skills and docs. It is not a standalone MCP server.
Then start with /ai-index (or jump straight to /generate-graph if you are bootstrapping a new repo).
github.com/ithiria894/AI-Index
Built from research, source code analysis, and way too many hours of watching Claude confidently explain code it had not actually read.