Not another code graph engine. A lightweight navigation workflow for AI coding agents.
You ask Claude about a function. It gives you a confident, detailed explanation. You build on it for an hour. Then you find out it was wrong.
Or: you change a function, tests pass, you ship. Three days later, four other places called that function, and Claude never mentioned them.
Same root cause: Claude does not have a reliable way to navigate your codebase.
It starts from scratch every time. It reads what you give it. It guesses what it does not have. That is where you get hallucinations, missed impact, and fixes that are locally correct but globally incomplete.
The fix is not a smarter model. The fix is a map.
Why this matters for real-world engineers
If you use AI for coding regularly, you have probably seen this already.
The model starts in the wrong place, reads a few files that look related, misses one critical connection, and then builds the wrong mental model from there.
Sometimes it still produces code that looks plausible. Sometimes it even almost works. That is exactly what makes it dangerous.
Because now you are not saving time anymore. You are doing one of these instead:
- babysitting its search process
- repeatedly correcting its assumptions
- re-pasting the right files into context
- cleaning up a solution built on the wrong part of the system
At that point, the bottleneck is no longer code generation. It is codebase navigation.
And the bigger or messier the repo gets, the worse this becomes.
Most real repositories are not clean demo projects. They have:
- historical baggage
- duplicated patterns
- stale modules
- hidden wiring
- config-driven behavior
- one weird file that everything secretly depends on
A human engineer eventually learns those paths over time. AI does not. Every session starts with partial memory, incomplete context, and a high chance of exploring the wrong route.
That is the exact problem this project is trying to solve.
The dilemma
Here is the dilemma every Claude Code user runs into:
Option A: Let Claude read everything.
It greps the whole repo, opens 20 files, reads thousands of lines, and tries to be thorough.
That sounds safe, but on a real repo it gets expensive fast:
- token burn goes through the roof
- context gets noisy
- the model starts forgetting what it read five minutes ago
Option B: Let Claude read what it thinks is relevant.
Now it moves faster.
It opens 3 or 4 files, gives you a confident answer, and starts coding.
That sounds efficient, but this is how you get the dangerous kind of bug:
- it fixes one obvious path
- it misses the second path it never knew existed
- you only find out later that the change was incomplete
Both options suck. Read too much and it is expensive and slow. Read too little and you miss things.
There is a third option: give Claude a map.
Think of your codebase like Tokyo's subway system. Without a map, you can still get somewhere, but you spend the whole time wandering between lines and hoping you guessed the right transfer. With a map, you glance once, see the route, and move.
The map does not stop exploration. It just stops dumb exploration.
What AI Index actually is
AI Index is an AI-maintained repository graph.
It is not documentation for humans. It is not a call graph. It is not the source of truth.
Its job is simple:
- tell the agent which domain to open first
- show the main change surfaces inside that domain
- record the non-obvious "also check this" rules
- point back to real files, not prose summaries
The code is still the source of truth. The index is the traversal layer that helps the agent reach the right code faster and more completely.
Single source of truth matters
A lot of AI mistakes start the moment the model reads human-written documentation, treats it as truth, and never double-checks the actual implementation. If that documentation is stale, the model inherits the stale mental model too.
AI Index avoids that trap on purpose. It does not try to explain what a function does in prose. It points the agent to the real file, and lets the code answer the question. The map tells Claude where to look. The code tells Claude what is true.
Why this is not just another CLAUDE.md / AI_INDEX template
A flat AI_INDEX is still too close to a phone book:
```
auth → src/auth/
payments → src/payments/
billing → src/billing/
```
That helps Claude find a folder. But it still does not tell Claude which surfaces move together.
The newer AI Index shape is closer to an operations card:
```markdown
# AI Index — repo-name

## Domain Index

| Domain | File | Owns | Open When |
|---|---|---|---|
| Auth | `AI_INDEX/auth.md` | login, tokens, middleware | auth bugs, session issues |
```

```markdown
# Domain — Auth

## Scope

- owns: `src/auth/`
- change_surfaces:
  - services: `src/auth/`
  - tests: `tests/auth/`
- must_check:
  - `src/billing/` when auth payload shape changes
```
Now Claude does not just know where auth lives. It knows what belongs to the same change surface, and what else must be checked before calling the edit done.
That is the difference between a list of folders and a real navigation layer.
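To make the shape concrete, here is a minimal, hypothetical sketch (not part of the plugin; the plugin has Claude read the markdown directly) of how a script could turn the Domain Index table into a routing structure an agent can act on:

```python
def parse_domain_index(markdown: str) -> list[dict]:
    """Parse rows of a '| Domain | File | Owns | Open When |' table.

    Illustrative only: assumes the four-column table shape shown above.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        # Skip non-table lines and the |---|---| separator row.
        if not line.startswith("|") or set(line) <= {"|", "-", " "}:
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 4 and cells[0] != "Domain":  # skip header row
            rows.append({
                "domain": cells[0],
                "file": cells[1].strip("`"),
                "owns": [o.strip() for o in cells[2].split(",")],
                "open_when": cells[3],
            })
    return rows


index = """\
| Domain | File | Owns | Open When |
|---|---|---|---|
| Auth | `AI_INDEX/auth.md` | login, tokens, middleware | auth bugs, session issues |
"""
print(parse_domain_index(index))
```

The point of the structure, not the parser: each row carries a trigger condition ("Open When") and a pointer to a domain file, so the agent resolves a task to a domain in one read instead of a repo-wide search.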
The problem is not intelligence. It is navigation.
We tested existing approaches like Aider's repo map and similar repo-map tools.
They are useful, but they solve a different problem: helping models understand a repository. In practice, AI coding assistants usually fail because they:
- read too many irrelevant files
- miss critical connections like registries, routing, or config wiring
- build incorrect mental models
- waste tokens exploring blindly
- fail to trace impact correctly
The bottleneck is not intelligence. The bottleneck is navigation.
This project is not trying to compete with full code intelligence systems. It is solving a narrower and more practical problem:
AI coding assistants do not fail because they are not smart enough. They fail because they do not know where to look.
We focus on navigation, not summarization.
Why not just use search, grep, or repo maps?
Search and grep help, but they are reactive. They only work well if you already know what to look for.
Repo maps help with orientation, but a summary is not a navigation system. It still does not tell the model where to start for this task, how to move from one file to another, or what paths to follow when tracing impact.
The repeated failure mode is not "the model does not understand the code." It is "the model is looking at the wrong code."
Once it starts from the wrong place, every step after that compounds the error.
AI Index gives it:
- a starting point
- a small set of relevant domain files
- change surfaces and must-check rules
- a fallback when the index is incomplete
The goal is not to eliminate exploration. The goal is to guide it.
What this plugin bundle does
A Claude Code plugin bundle: a small set of skills plus an AI Index format that gives Claude a persistent navigation layer for your codebase.
| Skill | What it does |
|---|---|
| `/ai-index` | Default entry point. Decide whether to use an existing AI Index, build one, or sync it after changes |
| `/use-ai-index` | Read the root index first, open only relevant domain files, and sweep the right change surface |
| `/generate-graph` | Build the AI Index from scratch |
| `/sync-graph` | Update only the affected graph files after meaningful code changes |
| `/debug` | Locate → root cause → pattern sweep → fix |
| `/new-feature` | Find pattern → trace impact → implement |
The index is split by default:

- `AI_INDEX.md` for read order, global rules, and the domain index
- `AI_INDEX/<domain>.md` for change surfaces, must-check rules, and critical traversal nodes
It can live in the repo, or beside the repo if you want to keep it as a research asset. Either way, Claude uses it as a navigation layer, not as source of truth.
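For reference, the on-disk layout looks like this (the domain names are illustrative placeholders):

```
your-repo/
├── AI_INDEX.md          # read order, global rules, domain index
└── AI_INDEX/
    ├── auth.md          # change surfaces, must_check rules, traversal nodes
    └── payments.md
```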
How it works
The map
/generate-graph builds an AI Index as a small root file plus per-domain files. The root tells Claude which domains to open. The domain files record the real change surfaces and must-check rules.
The workflow
Instead of dumping context into the model, we do this:
1. Build an AI-maintained graph. AI inspects the real repo structure, defines domains by change ownership, and writes the root file plus domain files.
2. Navigate using the graph. Start from the root index, open only the relevant domains, and follow change surfaces plus must-check rules.
3. Read source code only when needed. The graph narrows the search space. The model still reads actual source for correctness.
4. Sync only when the traversal shape changes. Add new routes, services, jobs, configs, or domain edges to the graph. Leave the graph alone for tiny internal edits.
The first build is done by AI. Later updates are also done by AI. Scripts are not the source of truth anymore.
Your workflow
You still do not need to learn a whole new system.
- First time on a repo: start with `/ai-index` or `/generate-graph`
- Repo already has an index: use `/use-ai-index`
- After meaningful code changes: run `/sync-graph`
- Bug fix or feature work: let `/debug` or `/new-feature` use the graph as the starting point
That is it. The point is to give Claude a better starting map, not to make you maintain a second giant documentation system.
Does it actually work?
Eight benchmark tasks across repos of different sizes (small hobby project to 77K-file monorepo), comparing: graph-guided navigation vs. no map vs. project docs vs. fullstack-debug vs. Aider's PageRank map.
Test 1 — Bug fix: missing rate limit (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 14K | 14K |
| Tool calls | 10 | 12 |
| Found root cause? | ✅ | ✅ |
| Found cascade impact? | ✅ | ❌ |
Same tokens, but B missed the restore/undo path. It fixed the main bug and left a secondary code path broken. A found it because the map pushed Claude to inspect both related paths.
Test 2 — Bug fix: UI refresh issue (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 5K | 5.1K |
| Tool calls | 4 | 5 |
| Found root cause? | ✅ | ✅ |
Simple UI bug — comparable performance. Graph doesn't help much when the entry point is obvious.
Test 3 — New feature planning (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 11K | 14K |
| Tool calls | 10 | 14 |
| Identified impact correctly? | ✅ | ✅ |
23% fewer tokens. The graph told Claude which files to skip. B explored files that turned out to be irrelevant.
Test 4 — Understanding a flow (small repo)
| Metric | A (graph) | B (no map) |
|---|---|---|
| Tokens | 5K | 6K |
| Tool calls | 5 | 8 |
| Accurate explanation? | ✅ | ✅ |
17% fewer tokens, 37% fewer tool calls. Graph provided entry points directly.
Test 5 — Pattern audit: find all instances of a bug pattern (small repo)
| Metric | A (graph) | B (no map) | A + exhaustive sweep |
|---|---|---|---|
| Tokens | 16K | 22K | 16K + $0.02 |
| Tool calls | 12 | 18 | 12 + sweep |
| Coverage | ~80% | ~60% | 100% |
Neither agent alone hits 100%. Graph scopes the search area, then an optional exhaustive sweep scans every file for the same bug pattern — costs about $0.02 on a large repo. Full coverage.
Test 6 — Bug fix: missing feature flag (large repo, 77K files)
| Metric | A (graph) | C (no map) |
|---|---|---|
| Tokens | 48K | 72K |
| Tool calls | 14 | 26 |
| Found root cause? | ✅ | ✅ |
33% fewer tokens on a 77K-file repo. The graph narrowed the search from the entire monorepo to a single domain. C explored broadly before finding the right area.
Test 7 — Cross-repo investigation: frontend calling backend (large repo)
| Metric | A (graph) | C (no map) |
|---|---|---|
| Tokens | 55K | 82K |
| Tool calls | 18 | 33 |
| Found the backend endpoint? | ✅ | ✅ |
| Found the wiring gap? | ✅ | ❌ |
C found the backend endpoint. A found that too, plus the wiring gap around `get_tool_input_text()`: infrastructure ready, caller not wired. The graph saved 33% of tokens over the no-map run.
Test 8 — New feature investigation: session context tool calls (large repo, 4 approaches)
Frontend developer asks: can we add tool calls, in/out flags, and tool names to the session context API?
| Metric | A (graph) | C (no map) | D (project docs) | E (fullstack-debug) | Aider map |
|---|---|---|---|---|---|
| Tokens | 61K | 47K | 64K | 49K | N/A |
| Tool calls | 17 | 30 | 35 | 32 | N/A |
| Found endpoint? | ✅ | ✅ | ✅ | ✅ | ❌ |
| Found existing helpers? | ✅ | ✅ | ✅ | ✅ | — |
| Extra insight | — | — | ⚠️ ingestion caveat | — | — |
Aider's map optimizes for editing context, not investigation. Its PageRank-based ranking prioritizes "globally important" functions — on the 77K-file repo, the session context endpoint wasn't important enough to make it into the 560-line map. A task-specific graph with explicit edges performs better for tracing and investigation. Agent D (project docs) found a critical caveat about data storage that others missed. Agent A used fewest tool calls (17 vs 30-35).
Honest note: in Test 8, the graph version actually used MORE tokens (61K vs 47K). The graph guided Claude to read deeper — it found an ingestion caveat the others missed, but it cost more tokens doing so. The graph doesn't always save tokens. Its value is coverage, not cost.
Summary: when does each approach help?
| Task type | Token savings (graph vs no map) | Quality difference |
|---|---|---|
| Bug fix (clear entry point) | ~0% | Graph finds cascade impact others miss |
| Bug fix (UI flow) | ~3% | Comparable |
| New feature planning | 23% | Graph knows which files to skip |
| Understanding a flow | 17% | Graph provides entry points directly |
| Pattern audit (large repo) | 42% | Graph + exhaustive sweep = 100% coverage |
| Cross-repo investigation | 33% | Graph points to the right repo/domain |
| Feature investigation (large repo) | Varies | Graph wins on investigation; docs may still surface caveats |
Key findings
The graph's biggest value isn't saving tokens — it's preventing missed impact. On a 10-file repo, savings are 17-23%. On a 77K-file repo, savings jump to 33-42%. But finding the cascade bug (the restore/undo path that only the graph version caught) — that's a qualitative difference, not quantitative.
(42% is the peak saving on pattern audits across large repos. Average across all task types is 17–33%. We show the full range in the benchmarks above.)
Aider's map and this graph solve different problems. Aider optimizes for editing context (which files to include when making changes). This plugin optimizes for investigation and impact tracing (which files are connected to your change). On the 77K-file repo, the session context endpoint wasn't in Aider's 560-line map at all — it wasn't globally important, just task-relevant.
No single approach achieves 100% coverage on pattern audits. The best workflow is a hybrid: graph scopes down the search area, then an exhaustive sweep finds every remaining instance for ~$0.02.
Project notes can still surface useful caveats — but the current design goal is to fold those caveats into repo-level rules or must_check, not maintain a separate Docs: field.
What this is NOT
- Not a full code intelligence platform
- Not a semantic search engine
- Not a replacement for reading code
- Not trying to be the most accurate graph possible
What this IS
An AI-maintained repository graph for change-complete navigation.
Not perfect understanding. Not complete graphs. Just this:
Find the right code quickly, and do not miss the related paths that matter.
Get it
Install through the marketplace repo:
```
/plugin add-marketplace https://github.com/ithiria894/AI-Index
/plugin install codebase-navigator
```
This release ships as a Claude Code plugin made of skills and docs. It is not a standalone MCP server.
Then start with /ai-index (or jump straight to /generate-graph if you are bootstrapping a new repo).
github.com/ithiria894/AI-Index
Built from research, source code analysis, and way too many hours of watching Claude confidently explain code it had not actually read.