DEV Community

inspiringsource

A deterministic alternative to embedding-based repo understanding

Hey everyone, I'm Avi, a CS student at FHNW in Switzerland.

I’ve been a bit frustrated with how AI coding tools handle larger codebases. Most of them rely on embeddings + prompting, which works well for fuzzy retrieval but can feel inconsistent, hard to reason about, and token-heavy.

So I wanted to try something more “boring” and predictable.

I built a small prototype called ai-context-map. It uses static analysis to build a structural graph of a repo:

  • files
  • imports / dependencies
  • some basic symbols (mostly Python for now)

The idea is to precompute a map of the repo so an AI (or even a human) doesn’t have to rediscover structure every time.

No ML, no embeddings, no API calls. Just parsing + graph stuff.
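To give a feel for the "just parsing" part: the import edges of the graph can be pulled straight from Python's standard ast module. This is only a sketch of the approach, not the actual ai-context-map implementation; extract_imports is a hypothetical helper name.

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Collect the module names a Python source file imports."""
    tree = ast.parse(source)
    modules = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules

print(extract_imports("import os\nfrom src.services import auth"))
# ['os', 'src.services']
```

Running this over every file and treating "module X imports module Y" as a directed edge is enough to get a dependency graph with zero ML involved.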


It outputs something like a .ai/context.yaml file. A very simplified example:

entry_points:
  - path: src/main.py

core_modules:
  - src/services/auth.py

task_routes:
  api_change:
    - src/api/routes.py
    - src/services/auth.py

anchors:
  - symbol: login_user
    file: src/services/auth.py
    line: 42
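Here's roughly how a consumer (an agent, a script) might use such a map: route a task type to its precomputed file list instead of scanning the whole repo. This is a hypothetical consumer, with the dict mirroring the YAML above; files_for_task is an illustrative name, not part of the tool.

```python
# Context map as parsed from .ai/context.yaml (shape mirrors the example above).
context = {
    "entry_points": [{"path": "src/main.py"}],
    "core_modules": ["src/services/auth.py"],
    "task_routes": {
        "api_change": ["src/api/routes.py", "src/services/auth.py"],
    },
    "anchors": [{"symbol": "login_user", "file": "src/services/auth.py", "line": 42}],
}

def files_for_task(ctx: dict, task: str) -> list[str]:
    """Return the precomputed file list for a task, falling back to entry points."""
    routed = ctx.get("task_routes", {}).get(task)
    if routed:
        return routed
    return [e["path"] for e in ctx.get("entry_points", [])]

print(files_for_task(context, "api_change"))
# ['src/api/routes.py', 'src/services/auth.py']
```

The lookup is fully deterministic: the same task name always yields the same files, which is the whole point compared to a similarity search.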

What I'm trying to figure out is whether this direction even makes sense.

  • Where does a purely static / graph-based approach fall apart compared to embeddings?
  • Are there tools doing something similar already that I should look into?
  • If you work with larger repos: would something deterministic like this actually help, or is vector search + big context already “good enough”?

One thing I'm curious about:

Could something like this reduce how many files an AI needs to look at, and therefore reduce token usage?

Repo:
https://github.com/inspiringsource/ai-context-map

Would really appreciate feedback (also “this is useless” is fine)

Top comments (1)

inspiringsource

The core idea is to help AI agents find the right files first, instead of scanning the repo blindly.

In theory that could reduce how many files need to be read, which might lower token usage and make edits more reliable.

Not sure how much this actually holds up in practice.