HIROKI II

Posted on May 31

Graphs That Teach: How Understand-Anything Turns Codebases Into Interactive Maps

#ai #claude #knowledgegraph #devtools

I still remember my first day at a new job. The codebase had 200,000 lines spread across a microservices architecture I'd never seen before. My manager said, "Take the week to get familiar." I spent most of it drowning in grep results and IDE tabs.

Most code visualization tools are built to impress. They generate sprawling graphs with hundreds of nodes that look great in a demo but overwhelm you when you actually need to understand something.

That's exactly what Lum1104, a Georgia Tech researcher studying LLM multi-agent collaboration, set out to fix. His project tagline isn't subtle: "Graphs that teach > graphs that impress."

The Problem: Code Maps for AI, Not Humans

We've seen plenty of knowledge graph tools lately. Many are built for AI agents to navigate code — giving them structured context so they can generate better responses or write patches.

But humans are different. When you join a new team, you don't need a perfect graph. You need a story — how does this API route connect to that database call? What's the entry point for the payment flow?

Existing tools leave you doing detective work: open a file, follow an import, jump to a definition, repeat. Every question burns time and mental energy.

The Philosophy: Scan Once, Reuse Many Times

Understand-Anything (GitHub: Lum1104/Understand-Anything, 45.9k stars) flips this model. Instead of having an LLM re-read your source files for every single question — burning tokens and context — it compresses the entire project into a single JSON knowledge graph.

Once. Up front.

All subsequent interactions query this pre-built graph rather than raw source code. A similar project (code-review-graph) reported using 6.8× fewer tokens for review tasks and 49× fewer for daily coding. That's more than an efficiency gain; it's a fundamentally different way to think about code comprehension.
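
To make "scan once, reuse many times" concrete, here's a minimal Python sketch of answering questions from a pre-built graph instead of the source files. The schema (`nodes`, `edges`, `summary`, `type`) and the file names are my own assumptions for illustration; the project's real JSON format may look quite different.

```python
import json

# Hypothetical graph schema for illustration; the real file format may differ.
GRAPH_JSON = """
{
  "nodes": [
    {"id": "api/routes.py", "summary": "HTTP routes for payments"},
    {"id": "services/payment.py", "summary": "Payment business logic"},
    {"id": "db/orders.py", "summary": "Order persistence"}
  ],
  "edges": [
    {"source": "api/routes.py", "target": "services/payment.py", "type": "imports"},
    {"source": "services/payment.py", "target": "db/orders.py", "type": "imports"}
  ]
}
"""

def load_graph(text):
    """Parse the graph once; every later question reuses this object."""
    return json.loads(text)

def dependencies_of(graph, node_id):
    """Return the ids of the nodes that node_id points to directly."""
    return [e["target"] for e in graph["edges"] if e["source"] == node_id]

def summary_of(graph, node_id):
    """Look up the stored summary instead of re-reading the source file."""
    for node in graph["nodes"]:
        if node["id"] == node_id:
            return node["summary"]
    return None

graph = load_graph(GRAPH_JSON)
print(dependencies_of(graph, "api/routes.py"))
print(summary_of(graph, "db/orders.py"))
```

Both lookups touch only the parsed JSON, which is the whole point: the expensive reading happened earlier, and each question is now a cheap dictionary walk.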

How It Works: The Best of Both Worlds

The architecture is genuinely elegant. It uses a hybrid approach:

Tree-sitter (deterministic): Parses source into concrete syntax trees. Same input, same output, every time. This extracts structural facts — imports, exports, function/class definitions, call sites, inheritance. It also enables fingerprint-based change detection for incremental updates.

LLM (semantic): Reads the parsed structure alongside original source to produce plain-English summaries, tags, architectural layer assignments, business-domain mapping, and guided tours.

The result is reproducible and smart — the structural side never changes unexpectedly, but the semantic side adapts to your specific codebase.
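
Here's a rough sketch of the deterministic half. To keep it dependency-free I've swapped Tree-sitter for Python's built-in `ast` module, so this only handles Python source and is not how the project actually parses files. The fingerprint is a plain SHA-256 of the file content, which is also an assumption; the real tool may fingerprint something more structural.

```python
import ast
import hashlib

def structural_facts(source):
    """Collect imports and top-level definitions from Python source."""
    tree = ast.parse(source)
    facts = {"imports": [], "functions": [], "classes": []}
    for node in tree.body:
        if isinstance(node, ast.Import):
            facts["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            facts["imports"].append(node.module or "")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            facts["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            facts["classes"].append(node.name)
    return facts

def fingerprint(source):
    """Hash the file content; an unchanged hash means analysis can be skipped."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def changed_files(old_fingerprints, sources):
    """Return the paths whose fingerprint differs from the stored one."""
    return sorted(
        path for path, src in sources.items()
        if old_fingerprints.get(path) != fingerprint(src)
    )

SAMPLE = (
    "import os\n"
    "from json import loads\n\n"
    "class Repo:\n"
    "    pass\n\n"
    "def scan(path):\n"
    "    return os.listdir(path)\n"
)

facts = structural_facts(SAMPLE)
stored = {"a.py": fingerprint(SAMPLE)}
```

Parsing the same text always yields the same facts, and comparing fingerprints tells you which files need a fresh pass. Only those files would then go to the LLM for the semantic half.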

The Multi-Agent Pipeline

Behind the scenes, up to seven specialized agents collaborate (the last one runs only for `/understand-knowledge`):

Agent	What It Does
`project-scanner`	Discovers files, detects languages and frameworks
`file-analyzer`	Extracts functions, classes, imports; builds graph nodes and edges
`architecture-analyzer`	Identifies layers (API, Service, Data, UI, Utility)
`tour-builder`	Generates guided learning paths ordered by dependency
`graph-reviewer`	Validates completeness and referential integrity
`domain-analyzer`	Extracts business domains, flows, and process steps
`article-analyzer`	Extracts entities from wiki articles (for `/understand-knowledge`)
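
To give a feel for what "referential integrity" could mean for the `graph-reviewer` step, here's a small sketch. The graph shape (`id`, `layer`, `source`, `target`) is hypothetical, and the checks are my guess at the kind of validation involved rather than the agent's actual rules. The layer names come from the table above.

```python
# Layer names taken from the architecture-analyzer row in the table.
LAYERS = {"API", "Service", "Data", "UI", "Utility"}

def review_graph(graph):
    """Return a list of problems; an empty list means the graph passed."""
    problems = []
    known = {node["id"] for node in graph["nodes"]}
    for node in graph["nodes"]:
        if node.get("layer") not in LAYERS:
            problems.append(f"node {node['id']} has unknown layer {node.get('layer')!r}")
    for edge in graph["edges"]:
        for end in ("source", "target"):
            if edge[end] not in known:
                problems.append(
                    f"edge {edge['source']} -> {edge['target']} has dangling {end}"
                )
    return problems

# Hypothetical graph with one edge pointing at a node that was never created.
graph = {
    "nodes": [
        {"id": "routes.py", "layer": "API"},
        {"id": "payment.py", "layer": "Service"},
    ],
    "edges": [
        {"source": "routes.py", "target": "payment.py"},
        {"source": "payment.py", "target": "orders.py"},
    ],
}

problems = review_graph(graph)
```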

File analyzers run in parallel — up to 5 concurrent, processing 20-30 files per batch. And because it's incremental, only changed files get re-analyzed.
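
The batching pattern itself is simple to sketch. This version uses a thread pool capped at five workers and a batch size of 25 (a value I picked from inside the 20-30 range). `analyze_batch` is a placeholder; in the real pipeline that step would be an agent call.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5   # "up to 5 concurrent" from the article
BATCH_SIZE = 25   # inside the 20-30 range; the exact value is an assumption

def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def analyze_batch(paths):
    """Placeholder analysis: record the length of each file path."""
    return {path: len(path) for path in paths}

def analyze_all(paths):
    """Run batches through a bounded worker pool and merge the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for partial in pool.map(analyze_batch, batches(paths, BATCH_SIZE)):
            results.update(partial)
    return results

files = [f"src/module_{i}.py" for i in range(60)]
results = analyze_all(files)
```

Sixty files become three batches, and the pool never runs more than five of them at once. Pair this with change detection and a typical re-scan only has to feed a handful of files through it.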

What Makes It Different

Beyond the core knowledge graph, three features stood out to me:

1. Guided Onboarding Tours

The /understand-onboard command generates a walkthrough of the architecture, ordered by dependency. New team members don't just get a graph — they get a path through it.
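
"Ordered by dependency" is essentially a topological sort. Here's a minimal sketch using Python's standard `graphlib` (3.9+). The module names are made up, and I'm assuming the tour visits dependencies before the code that uses them; the actual tour builder may order things differently or add LLM-written narration on top.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each key depends on the modules in its set.
DEPENDS_ON = {
    "api/routes": {"services/payment"},
    "services/payment": {"db/orders", "utils/money"},
    "db/orders": {"utils/money"},
    "utils/money": set(),
}

def tour_order(depends_on):
    """List modules so that every module appears after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

order = tour_order(DEPENDS_ON)
for step, module in enumerate(order, start=1):
    print(step, module)
```

A newcomer following this order never hits a module that relies on something they haven't seen yet.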

2. Diff Impact Analysis

Ever made a change and wondered what you might have broken? /understand-diff maps your git diff onto the knowledge graph and shows which parts of the system are affected. It's like getting architectural foresight before you merge.
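
Conceptually, impact analysis is a reverse walk over the dependency edges: start at the changed nodes and collect everything that depends on them. A minimal sketch, with a hypothetical edge list and the changed files passed in by hand (in practice you might get them from `git diff --name-only`):

```python
from collections import deque

# Hypothetical import edges: (importer, imported).
EDGES = [
    ("api/routes", "services/payment"),
    ("services/payment", "db/orders"),
    ("services/refund", "db/orders"),
    ("ui/checkout", "api/routes"),
    ("utils/log", "utils/fmt"),
]

def impacted(edges, changed):
    """Return every node that depends, directly or transitively, on a changed node."""
    dependents = {}
    for importer, imported in edges:
        dependents.setdefault(imported, set()).add(importer)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Report only the downstream nodes, not the changed ones themselves.
    return sorted(seen - set(changed))

print(impacted(EDGES, ["db/orders"]))
```

Changing `db/orders` reaches the payment and refund services, the API routes, and the checkout UI, while the unrelated `utils` modules stay out of the result.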

3. Business Domain View

Not everything is technical. /understand-domain extracts business domains, flows, and steps, showing how code maps to real business processes. Non-engineers, such as the marketing team, finally get a bridge into what the engineering team has built.

Platform Support

Understand-Anything supports 17 platforms, including Claude Code (native plugin), Cursor, VS Code + GitHub Copilot, Copilot CLI, Codex, OpenCode, OpenClaw, Antigravity, Gemini CLI, Pi Agent, Vibe CLI, Hermes, Cline, KIMI CLI, and Trae.

Installation is refreshingly simple:

```
# For Claude Code
/plugin marketplace add Lum1104/Understand-Anything
/plugin install understand-anything

# One-line for others
curl -fsSL https://raw.githubusercontent.com/Lum1104/Understand-Anything/main/install.sh | bash
```

What It's Not

This is important. Understand-Anything doesn't write code for you. It's not a code generator. It's a comprehension tool.

If you're looking for something that will automatically refactor your monolith or generate new features, look elsewhere. But if you need to understand a codebase — whether it's your first day or your thousandth — this changes the game.

The Bigger Picture

The "scan once, reuse many times" philosophy feels like a pattern we'll see more of. As codebases grow and AI coding assistants become standard, the bottleneck shifts from writing code to understanding it.

Tools like Understand-Anything don't replace human judgment — they augment it. They give you the context to make better decisions faster.

And in a world where we're all increasingly working with code we didn't write, that's not just useful. It's essential.

Try it: github.com/Lum1104/Understand-Anything

License: MIT

Latest: v2.7.3 (May 2026)

Top comments (2)

Harjot Singh • May 31

Turning a codebase into an interactive map attacks the real problem with onboarding into any nontrivial repo: the structure exists but it's invisible - you have to reconstruct the mental model by jumping between files, and a flat file tree tells you nothing about what actually depends on what. A graph that surfaces the call/import/dependency relationships gives you the architecture at a glance instead of after two weeks of spelunking. The hard part, and where I'd be curious about your approach, is signal-to-noise: a full dependency graph of a real codebase is a hairball, so the value is entirely in what you choose to collapse, cluster, and hide.

The reason this resonates: the same map that helps a human understand a codebase is exactly the structured context an AI needs to work in it well - relationships and architecture, not just raw files. It's the context-engineering problem from the visualization side. It's central to how I build Moonshift, the thing I work on - a multi-agent pipeline that takes a prompt to a deployed SaaS, where giving agents the right structural view of the code (not a file dump) is what keeps generation coherent. Multi-model routing keeps a build ~$3 flat, first run free no card. Genuinely useful tool. How do you tame the hairball on large repos - clustering by module, or interactive expand-on-demand? That's the make-or-break for codebase graphs in my experience.

HIROKI II • Jun 9

You've nailed the exact tension — the full dependency graph is a hairball, so the value lives entirely in what you hide. For Understand-Anything on the human side, the answer is architectural layer abstraction + query-driven subgraphs: the architecture-analyzer auto-categorizes every node into API/Service/Data/UI/Utility layers, and you filter by layer rather than manually expanding nodes. /understand-diff isolates only affected nodes; semantic search (/understand auth) returns a subgraph scoped to your question. The 10K-node hairball collapses to ~50 relevant nodes per interaction.

That said — for your Moonshift use case, CodeGraph (32.1K stars) might be more directly relevant. Understand-Anything is built for human comprehension; CodeGraph is built as an AI-native context engine. Instead of rendering a visual graph, it compresses a repo into a SQLite graph with FTS5 search — your agent asks "who calls X?" in a single MCP tool call and gets exact call chains back. No file-by-file exploration, no agent loops. We benchmarked it across 7 repos: 57% fewer tokens, 71% fewer tool calls on average. At your ~$3/build cost model, the math is straightforward.

Wrote about it here if you're curious: dev.to/hiroki-ii-ai/codegraph-the-... — curious how you're handling context engineering on the Moonshift side. Are you pre-computing a project graph, or extracting on-demand per prompt?