Sahil Singh

Posted on Feb 8 • Originally published at glue.tools

Why Your Codebase Is a Graph, Not Files

#devtools #programming #architecture #ai

Open your IDE. You see a file tree. Directories and files, neatly organized. This is a lie.

Your codebase isn't a tree. It's a graph. And until you see it as a graph, you'll keep making the same mistakes: breaking things you didn't know were connected, duplicating logic that already exists somewhere, and estimating 2 days for work that takes 2 weeks.

The File Tree Illusion

File systems impose a hierarchy on code. Every file lives in exactly one directory. This creates an illusion of organization: "auth stuff is in /auth, billing is in /billing, utils are in /utils."

But code doesn't respect directory boundaries. authMiddleware.ts imports from sessionStore.ts, which depends on redisClient.ts, which shares a connection pool with cacheService.ts, which is used by billingService.ts. The "auth stuff" and the "billing stuff" are connected through three intermediate dependencies.

In a file tree, these look separate. In a dependency graph, they sit on the same path, four hops apart.
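
The chain above can be traced mechanically once the couplings are written down as edges. The following is a minimal sketch, not a real indexer: the directory prefixes, the hand-written edge list, and the helper names buildGraph and shortestPath are all assumptions made for illustration. It uses a breadth-first search to find the shortest chain of files linking the auth middleware to the billing service.

```typescript
type Graph = Map<string, string[]>;

// Build an undirected adjacency list: for impact analysis, coupling matters
// in both directions.
function buildGraph(edges: Array<[string, string]>): Graph {
  const graph: Graph = new Map();
  const link = (from: string, to: string): void => {
    graph.set(from, [...(graph.get(from) ?? []), to]);
  };
  for (const [a, b] of edges) {
    link(a, b);
    link(b, a);
  }
  return graph;
}

// Breadth-first search: returns the shortest chain of files from start to
// goal, or null when the two are not connected.
function shortestPath(graph: Graph, start: string, goal: string): string[] | null {
  const cameFrom = new Map<string, string | null>([[start, null]]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === goal) {
      const path: string[] = [];
      let node: string | null = current;
      while (node !== null) {
        path.unshift(node);
        node = cameFrom.get(node) ?? null;
      }
      return path;
    }
    for (const neighbor of graph.get(current) ?? []) {
      if (!cameFrom.has(neighbor)) {
        cameFrom.set(neighbor, current);
        queue.push(neighbor);
      }
    }
  }
  return null;
}

// Hypothetical edges mirroring the chain in the text (directories are assumed).
const couplings: Array<[string, string]> = [
  ["auth/authMiddleware.ts", "auth/sessionStore.ts"],
  ["auth/sessionStore.ts", "utils/redisClient.ts"],
  ["utils/redisClient.ts", "utils/cacheService.ts"],
  ["utils/cacheService.ts", "billing/billingService.ts"],
];

const authToBilling = shortestPath(
  buildGraph(couplings),
  "auth/authMiddleware.ts",
  "billing/billingService.ts"
);
console.log(authToBilling?.join(" -> "));
```

The returned path has five files: the two endpoints plus the three intermediate dependencies. In a real project the edge list would come from parsing imports and resource usage rather than being typed by hand.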

What the Graph Reveals

When you map your codebase as a graph — nodes for files/functions, edges for dependencies — patterns emerge that are invisible in a file tree:

Hidden Dependencies

Function A in the auth service calls Function B in the user service, which reads from the same database table that Function C in the billing service writes to. Changing Function C can break Function A through a path that no import statement reveals.
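
One way to surface this kind of path is to record table reads and writes next to call edges and traverse both. The sketch below assumes a hypothetical DataAccess record per function and an invented table name ("accounts"). Neither comes from a real tool, and collecting this data in practice would need query analysis or manual annotation.

```typescript
// Hypothetical description of which tables each function reads and writes.
interface DataAccess {
  fn: string;
  reads: string[];
  writes: string[];
}

// Link every writer of a table to every reader of that table. These edges
// exist even though neither function imports the other.
function implicitDataEdges(accesses: DataAccess[]): Array<[string, string, string]> {
  const edges: Array<[string, string, string]> = [];
  for (const writer of accesses) {
    for (const table of writer.writes) {
      for (const reader of accesses) {
        if (reader.fn !== writer.fn && reader.reads.includes(table)) {
          edges.push([writer.fn, reader.fn, table]);
        }
      }
    }
  }
  return edges;
}

// Impact flows from writer to reader and from callee to caller.
function affectedBy(
  changed: string,
  dataEdges: Array<[string, string, string]>,
  calls: Array<[string, string]> // [caller, callee]
): string[] {
  const impact = new Map<string, string[]>();
  const add = (from: string, to: string): void => {
    impact.set(from, [...(impact.get(from) ?? []), to]);
  };
  for (const [writer, reader] of dataEdges) add(writer, reader);
  for (const [caller, callee] of calls) add(callee, caller);

  const seen = new Set<string>();
  const stack: string[] = [changed];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const next of impact.get(current) ?? []) {
      if (next !== changed && !seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return Array.from(seen).sort();
}

const accesses: DataAccess[] = [
  { fn: "billing.functionC", reads: [], writes: ["accounts"] },
  { fn: "user.functionB", reads: ["accounts"], writes: [] },
  { fn: "auth.functionA", reads: [], writes: [] },
];
const calls: Array<[string, string]> = [["auth.functionA", "user.functionB"]];

const impacted = affectedBy("billing.functionC", implicitDataEdges(accesses), calls);
console.log(impacted); // [ 'auth.functionA', 'user.functionB' ]
```

Function A shows up in the result even though no import connects it to Function C. The link runs through the shared table and then up the call edge.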

Feature Boundaries

Files that change together in the same commits form natural clusters. These clusters are your actual features — regardless of what directory they're in. The "checkout feature" might span files in /controllers, /services, /models, and /utils.
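
A rough way to find these clusters is to count how often each pair of files appears in the same commit. The sketch below assumes the commit history has already been exported as lists of changed file paths (for example from git log). The file names and the threshold of two shared commits are illustrative choices, not recommendations.

```typescript
// Count, for every pair of files, how many commits touched both.
function coChangeCounts(commits: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // De-duplicate and sort so each pair gets one stable key.
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]} <-> ${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Hypothetical history: each inner array is the set of files in one commit.
const commits: string[][] = [
  ["controllers/checkout.ts", "services/payment.ts", "models/order.ts"],
  ["controllers/checkout.ts", "services/payment.ts"],
  ["models/order.ts", "utils/currency.ts", "services/payment.ts"],
  ["auth/login.ts"],
];

const pairCounts = coChangeCounts(commits);

// Keep only pairs that changed together at least twice.
const frequentPairs = Array.from(pairCounts.entries())
  .filter(([, count]) => count >= 2)
  .map(([pair]) => pair);
console.log(frequentPairs);
```

Here the frequent pairs tie together a controller, a service, and a model from three different directories, which is the shape of the "checkout feature" described above.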

Complexity Hotspots

Nodes with the most incoming edges are often your riskiest files. A utility function imported by 47 other files is a single point of failure: change it, and you have 47 potential regressions.
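
Counting importers per file (fan-in) is enough to produce a first hotspot list. The sketch below assumes the import edges are already available as [importer, imported] pairs, with each pair listed once. The file names are made up for the example.

```typescript
// Rank files by how many other files import them (fan-in), highest first.
// Ties are broken alphabetically so the output is deterministic.
function rankByFanIn(imports: Array<[string, string]>): Array<[string, number]> {
  const fanIn = new Map<string, number>();
  for (const [, imported] of imports) {
    fanIn.set(imported, (fanIn.get(imported) ?? 0) + 1);
  }
  return Array.from(fanIn.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Hypothetical [importer, imported] edges.
const importEdges: Array<[string, string]> = [
  ["auth/login.ts", "utils/formatDate.ts"],
  ["billing/invoice.ts", "utils/formatDate.ts"],
  ["reports/summary.ts", "utils/formatDate.ts"],
  ["billing/invoice.ts", "utils/money.ts"],
];

const hotspots = rankByFanIn(importEdges);
console.log(hotspots); // [ [ 'utils/formatDate.ts', 3 ], [ 'utils/money.ts', 1 ] ]
```

Fan-in counts only direct importers. Files that depend on the hotspot through intermediate modules need a transitive traversal.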

Knowledge Silos

Overlaying commit authorship on the graph reveals which parts of the codebase are touched by only one or two developers. These are your bus factor risks: areas where knowledge is concentrated in too few heads.
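
A simple approximation is to count distinct authors per file from version control history. The sketch below assumes the history has been flattened into hypothetical { file, author } records. The author labels and the cutoff of two authors are placeholders.

```typescript
interface FileChange {
  file: string;
  author: string;
}

// Return files touched by at most maxAuthors distinct authors, with the
// sorted list of those authors.
function knowledgeSilos(changes: FileChange[], maxAuthors = 2): Map<string, string[]> {
  const authorsByFile = new Map<string, Set<string>>();
  for (const { file, author } of changes) {
    const authors = authorsByFile.get(file) ?? new Set<string>();
    authors.add(author);
    authorsByFile.set(file, authors);
  }
  const silos = new Map<string, string[]>();
  authorsByFile.forEach((authors, file) => {
    if (authors.size <= maxAuthors) {
      silos.set(file, Array.from(authors).sort());
    }
  });
  return silos;
}

// Hypothetical change records.
const history: FileChange[] = [
  { file: "billing/tax.ts", author: "dev-a" },
  { file: "billing/tax.ts", author: "dev-a" },
  { file: "auth/session.ts", author: "dev-a" },
  { file: "auth/session.ts", author: "dev-b" },
  { file: "auth/session.ts", author: "dev-c" },
];

const silos = knowledgeSilos(history);
console.log(silos); // Map(1) { 'billing/tax.ts' => [ 'dev-a' ] }
```

Commit counts alone say nothing about how recent or how deep each person's knowledge is, so treat the result as a list of places to ask questions rather than a verdict.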

Why This Matters for AI Tools

Many AI coding tools that index your codebase treat it as a collection of files. They embed each file as a vector, search by semantic similarity, and return the closest matches.

This is file-tree thinking applied to AI. It misses:

Structural relationships (what calls what)
Transitive dependencies (what's connected through intermediate hops)
Feature boundaries (what forms a logical unit)

Graph-based code intelligence addresses this gap. Instead of "find files similar to my query," it answers "find files structurally connected to what I'm working on." That is the difference between running a web search and actually understanding the codebase.
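
To make "structurally connected" concrete, here is a minimal sketch of a hop-limited lookup: given the file being edited, return every file within a fixed number of edges, along with its distance. The function name neighborhood, the edge list, and the two-hop limit are assumptions for illustration. This is not how any particular tool implements retrieval.

```typescript
// Return every file within maxHops edges of start, mapped to its distance.
function neighborhood(
  edges: Array<[string, string]>,
  start: string,
  maxHops: number
): Map<string, number> {
  const adjacency = new Map<string, string[]>();
  for (const [a, b] of edges) {
    adjacency.set(a, [...(adjacency.get(a) ?? []), b]);
    adjacency.set(b, [...(adjacency.get(b) ?? []), a]);
  }

  const distance = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const hops = distance.get(current) as number;
    if (hops === maxHops) continue; // do not expand past the hop limit
    for (const next of adjacency.get(current) ?? []) {
      if (!distance.has(next)) {
        distance.set(next, hops + 1);
        queue.push(next);
      }
    }
  }
  distance.delete(start); // the starting file is not its own neighbor
  return distance;
}

// Hypothetical edges: the auth-to-billing chain plus one unrelated pair.
const structuralEdges: Array<[string, string]> = [
  ["authMiddleware.ts", "sessionStore.ts"],
  ["sessionStore.ts", "redisClient.ts"],
  ["redisClient.ts", "cacheService.ts"],
  ["cacheService.ts", "billingService.ts"],
  ["emailTemplates.ts", "mailer.ts"],
];

const context = neighborhood(structuralEdges, "authMiddleware.ts", 2);
console.log(context);
```

With a two-hop limit the result contains sessionStore.ts and redisClient.ts but not the unrelated mail files, however similar their text might look to an embedding search.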

Practical Implications

Stop estimating from file counts. "It's just 3 files" means nothing if those files have 40 downstream dependents.
Map dependencies before coding. Understand the graph neighborhood of your changes, not just the files you're editing.
Use graph metrics for code health. Cyclomatic complexity measures function complexity. Graph metrics (coupling, cohesion, centrality) measure system complexity.
Let the graph define features. Community detection algorithms (like Louvain) can suggest feature boundaries automatically from dependency patterns.
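
The first two points come down to one question: how many files depend, directly or transitively, on the ones being changed? The sketch below answers it by walking import edges backwards from the changed files. The [importer, imported] edge format, the function name downstreamDependents, and the sample files are assumptions for illustration.

```typescript
// Collect every file that imports a changed file, directly or through
// intermediate modules. The changed files themselves are excluded.
function downstreamDependents(
  imports: Array<[string, string]>, // [importer, imported]
  changed: string[]
): Set<string> {
  const dependentsOf = new Map<string, string[]>();
  for (const [importer, imported] of imports) {
    dependentsOf.set(imported, [...(dependentsOf.get(imported) ?? []), importer]);
  }

  const changedSet = new Set(changed);
  const reached = new Set<string>();
  const stack: string[] = [...changed];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const dependent of dependentsOf.get(current) ?? []) {
      if (!changedSet.has(dependent) && !reached.has(dependent)) {
        reached.add(dependent);
        stack.push(dependent);
      }
    }
  }
  return reached;
}

// Hypothetical import edges.
const projectImports: Array<[string, string]> = [
  ["api/orders.ts", "services/pricing.ts"],
  ["api/cart.ts", "services/pricing.ts"],
  ["web/checkoutPage.ts", "api/cart.ts"],
  ["services/pricing.ts", "utils/money.ts"],
  ["jobs/invoice.ts", "utils/money.ts"],
];

const blastRadius = downstreamDependents(projectImports, ["utils/money.ts"]);
console.log(blastRadius.size); // 5
```

A one-file change to utils/money.ts reaches five other files in this toy graph. That number, not the file count of the diff, is the better starting point for an estimate.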

Your codebase is a graph. Start treating it like one.

