阿小卡卡

Posted on May 17

I deleted 4080 lines and rebuilt from scratch. Here's why call graphs are wrong for AI-generated code.

#opensource #productivity #showdev #tooling

AI writes 5000 lines in 5 minutes. I need 3 hours to review them.
The code works. The tests pass. But I no longer understand my own project.

I tried fixing this 3 times. Failed 3 times. Deleted 4080 lines along the way.
Here's what finally worked.

My take: every code visualization tool you've ever tried is showing you the wrong layer. Call graphs, dependency maps, ASTs: all wrong. Here's why.

Attempt 1: Call graphs

First instinct: visualize how functions call each other. Tools like Madge and dependency-cruiser draw this kind of graph beautifully at the module level.

The result was a hairball.

parseRequest() → validateInput() → fetchUser() → comparePassword() → ...

Each node was a function name. Each edge was a call. Beautiful. Useless.

The problem: I didn't want to know which function calls which. I wanted to know what the project does for the user. A call graph shows the how, not the what.

Imagine asking "What does scrambled eggs with tomatoes taste like?" and getting "prepare() calls slice() then whisk()". That's a call graph.
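To make the "how, not what" point concrete, here is a minimal sketch of what a call-graph extractor produces. It uses Python's standard `ast` module rather than any of the tools named above, and the source string and function names are hypothetical stand-ins for a real module.

```python
import ast

# Hypothetical source standing in for an AI-generated module.
SOURCE = '''
def parse_request(raw):
    return validate_input(raw)

def validate_input(data):
    return fetch_user(data)

def fetch_user(data):
    return compare_password(data)

def compare_password(data):
    return True
'''


def call_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for direct calls to plain names."""
    edges = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Every call expression inside this function body becomes one edge.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.append((func.name, node.func.id))
    return edges


for caller, callee in call_edges(SOURCE):
    print(f"{caller}() -> {callee}()")
```

Every edge is correct, and none of them says that this chain is a login.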

Attempt 2: AST + import maps

OK, so call graphs are too granular. Let me zoom out — module-level dependencies, AST analysis, the whole tree.

Tools I tried: ts-morph, PyCG, language servers.

Result: a prettier hairball. Now my nodes were files instead of functions, but the fundamental problem was the same: structure, not semantics.

I could see "auth/login.ts imports utils/jwt.ts" but that's not "the user logs in by submitting their password, the system verifies it, then signs a JWT".

The first sentence is structure. The second is the actual story.
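The module-level view can be sketched the same way. This is an illustration using Python's `ast` module, not the output of ts-morph or PyCG. The file paths and contents are made up.

```python
import ast

# Hypothetical files; paths and contents are invented for illustration.
FILES = {
    "auth/login.py": "from utils import jwt\nimport db.users\n",
    "utils/jwt.py": "import hmac\n",
}


def import_edges(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (importing file, imported module) pairs."""
    edges = []
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                edges += [(path, alias.name) for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                module = node.module or ""
                edges += [(path, f"{module}.{alias.name}") for alias in node.names]
    return edges


for src, dst in import_edges(FILES):
    print(f"{src} imports {dst}")
```

The output is the "structure" sentence, one line per import. The "story" sentence is nowhere in it.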

Attempt 3: UCG (Universal Code Graph)

Now I got architectural. I designed a 5-layer system:

Language adapter (TS, Python) → UCG IR (universal graph)
  → Framework plugins → AI semantic layer → Canvas

The idea: structure is truth, semantics is decoration. Build a precise call graph at the bottom, let AI add labels on top.

I built it. It compiled. It rendered.

It still showed the wrong thing.

The semantic labels AI generated were like putting lipstick on the call graph. "src/graph/" got labeled "Canvas Rendering" — accurate but uninformative. "f-login" got labeled "User Authentication" — same thing the path already said.

The real insight: The structure of code (which functions call which) and the structure of features (what the project does for the user) are two different graphs. They're not aligned. You can't get from one to the other by adding labels.
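One way to see the misalignment is to invert a feature-to-file mapping, as in the sketch below. The feature names and file lists are hypothetical, not taken from a real project.

```python
from collections import defaultdict

# Hypothetical mapping from user-facing features to the files they touch.
FEATURE_FILES = {
    "User Login": ["auth/login.ts", "utils/jwt.ts", "db/users.ts"],
    "Password Reset": ["auth/reset.ts", "utils/jwt.ts", "db/users.ts", "mail/send.ts"],
}


def files_to_features(feature_files: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the mapping: for each file, list the features that use it."""
    index = defaultdict(list)
    for feature, files in feature_files.items():
        for path in files:
            index[path].append(feature)
    return dict(index)


# Files that serve more than one feature cannot carry a single feature label.
shared = {
    path: features
    for path, features in files_to_features(FEATURE_FILES).items()
    if len(features) > 1
}
print(shared)
```

Because the relationship is many-to-many in both directions, one label per file or directory cannot recover the feature graph.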

I deleted 4080 lines of UCG code in one commit.

What finally worked: FCG (Feature & Flow Graph)

The breakthrough was admitting AI doesn't need help finding the structure. AI already knows what each feature does, because it just wrote the code. So I let AI describe features directly, in user terms:

Overview view — 6 epics laid out in user-journey order, semantic edges between them.

{
  "id": "f-login",
  "name": "User Login",
  "steps": [
    { "id": "input", "name": "Receive credentials", "role": "input" },
    { "id": "find", "name": "Find user", "role": "data-read" },
    { "id": "verify", "name": "Verify password", "role": "auth" },
    { "id": "issue", "name": "Issue token", "role": "compute" },
    { "id": "return", "name": "Return token", "role": "output" },
    { "id": "fail", "name": "Return auth failed", "role": "error" }
  ],
  "flow": [
    { "from": "input", "to": "find", "kind": "next" },
    { "from": "find", "to": "verify", "kind": "next" },
    { "from": "verify", "to": "issue", "kind": "conditional", "condition": "password correct" },
    { "from": "verify", "to": "fail", "kind": "conditional", "condition": "password wrong" },
    { "from": "issue", "to": "return", "kind": "next" }
  ]
}

This is what I actually want to see when reviewing AI work. Not "function A calls function B". But "step 1 happens, then step 2, except when X then step 3".

It's the story of the feature.
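As a rough illustration of reading a feature as a story, the sketch below turns flow edges into sentences. It assumes each step carries an `id` that the flow edges reference. The `narrate` helper is hypothetical and not part of CodeSee.

```python
# A trimmed copy of the login feature; the step "id" fields are an assumption.
FEATURE = {
    "name": "User Login",
    "steps": [
        {"id": "input", "name": "Receive credentials"},
        {"id": "find", "name": "Find user"},
        {"id": "verify", "name": "Verify password"},
        {"id": "issue", "name": "Issue token"},
        {"id": "fail", "name": "Return auth failed"},
    ],
    "flow": [
        {"from": "input", "to": "find", "kind": "next"},
        {"from": "find", "to": "verify", "kind": "next"},
        {"from": "verify", "to": "issue", "kind": "conditional",
         "condition": "password correct"},
        {"from": "verify", "to": "fail", "kind": "conditional",
         "condition": "password wrong"},
    ],
}


def narrate(feature: dict) -> list[str]:
    """Turn each flow edge into one plain-language sentence."""
    names = {step["id"]: step["name"] for step in feature["steps"]}
    lines = []
    for edge in feature["flow"]:
        src, dst = names[edge["from"]], names[edge["to"]]
        if edge["kind"] == "conditional":
            lines.append(f"{src}: if {edge['condition']}, then {dst}")
        else:
            lines.append(f"{src}, then {dst}")
    return lines


for line in narrate(FEATURE):
    print(line)
```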

I called this format FCG (Feature & Flow Graph) and built a viewer for it. AI writes the JSON during normal workflow (every code change → AI updates features.json). I read the graph, not the code.
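Since the JSON is rewritten on every change, a small consistency check is cheap insurance. The sketch below is one possible check, not CodeSee's actual validation. It assumes steps are identified by an `id` field, and `check_feature` is a hypothetical name.

```python
def check_feature(feature: dict) -> list[str]:
    """Report flow edges whose endpoints do not match any declared step id."""
    known = {step["id"] for step in feature["steps"] if "id" in step}
    problems = []
    for edge in feature["flow"]:
        for end in ("from", "to"):
            if edge[end] not in known:
                problems.append(
                    f"flow edge {edge['from']} -> {edge['to']}: "
                    f"unknown step id '{edge[end]}'"
                )
    return problems


# Hypothetical inputs: one consistent feature, one with a typo in an edge.
GOOD = {
    "steps": [{"id": "verify"}, {"id": "issue"}],
    "flow": [{"from": "verify", "to": "issue", "kind": "next"}],
}
BAD = {
    "steps": [{"id": "verify"}, {"id": "issue"}],
    "flow": [{"from": "verify", "to": "isue", "kind": "next"}],
}

print(check_feature(GOOD))
print(check_feature(BAD))
```

Running a check like this after each AI update would catch edges that point at steps that no longer exist.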

Three principles that emerged

After validating on two real projects (a 40-feature simulation engine and a hackathon submission), three rules crystallized:

Semantic control belongs to AI / features.json: naming, ordering, grouping. The frontend never infers semantics.
Visuals and interaction belong to the frontend: drag, zoom, theme, layout algorithms.
When uncertain, let AI write it down explicitly, with no frontend heuristics. The schema gets richer; the canvas stays a pure consumer.
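The split between these rules can be sketched as a renderer that picks a style purely from the explicit `kind` on each edge and refuses to guess. The style values and the `async` kind are assumptions for illustration, not CodeSee's real schema or theme.

```python
# Visual choices live in the frontend; this table is a hypothetical theme.
STYLES = {
    "next": {"line": "solid", "animated": False},
    "async": {"line": "dashed", "animated": True},
    "conditional": {"line": "solid", "animated": False},
}


def edge_style(edge: dict) -> dict:
    """Pick a style from the edge's explicit kind; never infer a kind."""
    kind = edge["kind"]
    if kind not in STYLES:
        # Rule 3: unknown semantics go back to the schema, not into a heuristic.
        raise ValueError(f"unknown edge kind {kind!r}; add it to the schema")
    style = dict(STYLES[kind])
    # The label text comes from the JSON as written, not from the renderer.
    style["label"] = edge.get("condition", "")
    return style


print(edge_style({"from": "verify", "to": "issue",
                  "kind": "conditional", "condition": "password correct"}))
```

The renderer decides how an edge looks, while what the edge means comes only from the JSON.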

These rules saved me from rebuilding it a fourth time.

Real example: the tool modeling itself

Here's what convinced me FCG works:

I had AI generate features.json for CodeSee, the tool itself. The graph shows install → scan → viewer → layout → sync as a clean user journey: 22 features grouped under 6 epics, with the cross-feature relationships rendered as semantic edges.

Features view — drill down into any epic, see its features grouped in containers.

A new contributor can now understand the project in 30 seconds by opening the live demo — no need to read the code.

Drill down further and you see the actual flow inside a single feature:

Steps view — async (dashed animated), conditional (with condition labels), error branches all rendered with distinct visual language.

What I learned

If you're working with AI-generated code and feeling that same loss of ownership:

Don't try to make sense of the code structure. The code is fine; you just can't keep up with the volume.
Don't try to summarize code with AI labels. Wrong abstraction layer.
Have AI describe what the code does for the user, in user terms. Then you read the description.

The best part: AI is already doing this work mentally when it generates code. It just doesn't usually write it down. CodeSee's prompts make it write it down.

Try it

CodeSee is open source (MIT) and the viewer runs as a static site — no install needed:

→ Live demo (modeling itself, drop your own features.json to try)
→ GitHub repo

Works with Cursor, Claude Code, Kiro, Copilot, Codex, Gemini CLI, and any AI IDE that reads AGENTS.md or SKILL.md.

I'm an independent developer building this in my spare time. If you give it a try, I'd love feedback — open an issue or find me on LinuxDo.

What's the worst code visualization tool you've used? What do you wish existed instead? Drop a comment — I'm collecting pain points to drive the roadmap.

Building this took three rebuilds and two real projects to validate. One stable schema came out of it. If it sounds useful: a star helps me know I'm onto something — ⭐ here.