There are generally three practical approaches people consider:
- Store dependencies as adjacency lists (custom structure)
- Use a relational database (e.g. PostgreSQL)
- Use a graph database
This post breaks down how each approach behaves across:
- Creation
- Updates
- Deletion
- Maintenance
- Storage internals
- Time complexity
The Problem: Code is a Graph
Example: Function Structure in a Real Repo
Consider a simple codebase:
// auth.js
export function validateUser() {
  checkAuth();
  logAccess();
}

function checkAuth() {
  dbQuery();
}

// db.js
export function dbQuery() {
  connectDB();
}

function connectDB() {}
function logAccess() {}
This creates a dependency graph:
validateUser → checkAuth → dbQuery → connectDB
validateUser → logAccess
How Each Approach Stores This
Adjacency List
{
  "validateUser": ["checkAuth", "logAccess"],
  "checkAuth": ["dbQuery"],
  "dbQuery": ["connectDB"],
  "connectDB": [],
  "logAccess": []
}
Graph Database
(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)
Approach 1: Adjacency List
Creation
adj["validateUser"] = ["checkAuth", "logAccess"];
adj["checkAuth"] = ["dbQuery"];
adj["dbQuery"] = ["connectDB"];
adj["connectDB"] = [];
adj["logAccess"] = [];
Updating
// Add new dependency
adj["validateUser"].push("newFunc");
// If you need reverse lookup (extra structure),
// initialize the entry before pushing
(reverse["newFunc"] ||= []).push("validateUser");
Deletion
adj["validateUser"] = adj["validateUser"].filter(f => f !== "logAccess");
// Also clean reverse map
reverse["logAccess"] = reverse["logAccess"].filter(f => f !== "validateUser");
Maintenance Problems
- Need to maintain forward + reverse maps
- Easy to get inconsistent
- Hard to track multi-hop dependencies
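The consistency problem can be tamed by routing every mutation through a single code path. A minimal sketch (the helper names `addEdge`/`removeEdge` are illustrative, not from any library):

```javascript
// Keep the forward and reverse maps in sync through one code path.
const adj = {};     // caller -> callees
const reverse = {}; // callee -> callers

function addEdge(from, to) {
  (adj[from] ||= []).push(to);
  (reverse[to] ||= []).push(from);
}

function removeEdge(from, to) {
  adj[from] = (adj[from] || []).filter(f => f !== to);
  reverse[to] = (reverse[to] || []).filter(f => f !== from);
}

addEdge("validateUser", "checkAuth");
addEdge("validateUser", "logAccess");
removeEdge("validateUser", "logAccess");
// adj["validateUser"] is now ["checkAuth"], and both maps agree
```

This removes one class of bugs, but you are still reimplementing what a graph store gives you for free.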
Storage Internals
- Stored as plain objects / JSON
- Relationships = strings
- No indexing on relationships
Traversal = manual DFS/BFS every time
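Answering a multi-hop question like "what does validateUser eventually call?" means hand-rolling that traversal every time. A minimal iterative DFS over the adjacency list from earlier:

```javascript
const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};

// Iterative DFS: collect every function reachable from `start`.
function transitiveDeps(start) {
  const seen = new Set();
  const stack = [...(adj[start] || [])];
  while (stack.length) {
    const fn = stack.pop();
    if (seen.has(fn)) continue;
    seen.add(fn);
    stack.push(...(adj[fn] || []));
  }
  return [...seen].sort();
}

// transitiveDeps("validateUser")
// -> ["checkAuth", "connectDB", "dbQuery", "logAccess"]
```

Every consumer of the graph ends up copying some variant of this loop.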
Approach 2: PostgreSQL (Relational Model)
Representation
functions(id, name)
calls(from_id, to_id)
Creation
INSERT INTO functions (id, name) VALUES (1, 'validateUser');
INSERT INTO functions (id, name) VALUES (2, 'checkAuth');
INSERT INTO calls (from_id, to_id) VALUES (1, 2);
Updating
INSERT INTO calls (from_id, to_id) VALUES (1, 6);
Deletion
DELETE FROM calls WHERE from_id = 1 AND to_id = 5;
How Traversal Works (Joins)
In PostgreSQL, related rows are not physically linked to each other. Relationships are reconstructed at query time using JOIN operations.
To traverse:
A → B → C → D
PostgreSQL performs multiple JOINs:
SELECT f1.name, f2.name, f3.name
FROM functions f1
JOIN calls c1 ON f1.id = c1.from_id
JOIN functions f2 ON f2.id = c1.to_id
JOIN calls c2 ON f2.id = c2.from_id
JOIN functions f3 ON f3.id = c2.to_id;
What a JOIN Does Here
- Combines rows from multiple tables
- Matches foreign keys (from_id → to_id)
- Builds intermediate result sets step by step
Each JOIN is essentially reconstructing the graph edge manually.
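The steps above can be sketched as a simple hash join over the two tables. This is a toy model of one join strategy a planner may pick, not how PostgreSQL actually executes it:

```javascript
// Rows as they would live in the two tables.
const functions = [
  { id: 1, name: "validateUser" },
  { id: 2, name: "checkAuth" },
  { id: 3, name: "dbQuery" },
];
const calls = [
  { from_id: 1, to_id: 2 },
  { from_id: 2, to_id: 3 },
];

// Hash join: build a hash table on functions.id,
// then probe it once per row of calls.
function joinCallsToNames() {
  const byId = new Map(functions.map(f => [f.id, f.name]));
  return calls.map(c => ({
    caller: byId.get(c.from_id),
    callee: byId.get(c.to_id),
  }));
}
```

Each additional hop repeats this build-and-probe work against a (potentially larger) intermediate result.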
Why Joins Become Expensive (in this use case)
- Each hop requires another JOIN
- Intermediate results grow at each step
- Work increases with graph depth
So cost becomes:
proportional to number of joins × intermediate result size
Storage Internals
- Stored as rows in tables
- Relationships represented via foreign keys
- No direct pointer traversal
Traversal = repeated JOIN operations
Time Complexity
- Single hop (with index): O(log N)
- Multi-hop traversal: O(k × join_cost)
Where:
- k = number of hops
- join_cost grows with data size
Approach 3: Graph Database
Representation
(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)
Creation
CREATE (a:Function {name: "validateUser"})
CREATE (b:Function {name: "checkAuth"})
CREATE (a)-[:CALLS]->(b)
Updating
MATCH (a:Function {name: "validateUser"})
CREATE (a)-[:CALLS]->(:Function {name: "newFunc"})
Deletion
MATCH (a:Function {name: "validateUser"})-[r:CALLS]->(b:Function {name: "logAccess"})
DELETE r
Maintenance
- No reverse mapping needed
- Relationships are first-class
- Consistency handled by DB
Storage Internals
- Nodes stored separately
- Relationships stored as pointers (node → relationship → node)
- Index-free adjacency
Traversal = follow pointers directly (no joins, no scans)
Why Graph Databases Win: How Traversal Works
In adjacency lists, traversal looks like:
A → B → C → D
But internally:
- You look up A
- Then look up B
- Then look up C
- Repeat
Each step is a separate key lookup.
In graph databases:
A → (pointer) → B → (pointer) → C → (pointer) → D
Traversal becomes:
follow pointer → follow pointer → follow pointer
No searching. No joins.
What This Means
- Each hop is constant time
- Traversal cost depends on path length, not total data
- Even large graphs remain efficient
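The same idea can be modeled in memory: if each node holds direct references to its neighbor objects (instead of string keys that must be looked up in a map), each hop is a plain pointer dereference. A rough sketch of index-free adjacency, not how any particular database lays out its records:

```javascript
// Nodes reference each other directly; no map lookup per hop.
function node(name) {
  return { name, calls: [] };
}

const connectDB = node("connectDB");
const dbQuery = node("dbQuery");
dbQuery.calls.push(connectDB);
const checkAuth = node("checkAuth");
checkAuth.calls.push(dbQuery);

// Follow the first outgoing edge k times: O(k), independent of graph size.
function hop(start, k) {
  let cur = start;
  for (let i = 0; i < k; i++) cur = cur.calls[0];
  return cur.name;
}

// hop(checkAuth, 2) -> "connectDB"
```

Adding a million unrelated nodes would not change the cost of those two hops.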
Time Complexity
Adjacency List
- Direct lookup: O(1)
- Traversal (DFS/BFS): O(V + E)
- Reverse lookup: O(V + E) (unless extra structure maintained)
Graph Database
- Starting node lookup: O(log N)
- Each hop traversal: O(1)
Total traversal:
O(k)
(where k = number of relationships followed)
Key Insight
- Adjacency List → O(V + E) in the worst case, touching the whole graph
- Graph DB → O(k), proportional to the path you actually follow
This is why graph databases scale much better for deep, connected data.
Final Takeaway
At small scale, all approaches work.
At large scale:
Either you maintain graph logic yourself
Or you use a system built for graphs
Graph databases like Neo4j are best suited for problems where relationships and multi-hop traversals are core to the system.
They remove the complexity of managing relationships manually and make traversal efficient and natural.