There are generally three practical approaches people consider:
- Store dependencies as adjacency lists (custom structure)
- Store them in a relational database such as PostgreSQL
- Use a graph database
This post breaks down how each approach behaves across:
- Creation
- Updates
- Deletion
- Maintenance
- Storage internals
- Time complexity
The Problem: Code is a Graph
Example: Function Structure in a Real Repo
Consider a simple codebase:
// auth.js
export function validateUser() {
  checkAuth();
  logAccess();
}

function checkAuth() {
  dbQuery();
}

function logAccess() {}

// db.js
export function dbQuery() {
  connectDB();
}

function connectDB() {}
This creates a dependency graph:
validateUser → checkAuth → dbQuery → connectDB
validateUser → logAccess
How Each Approach Stores This
Adjacency List
{
  "validateUser": ["checkAuth", "logAccess"],
  "checkAuth": ["dbQuery"],
  "dbQuery": ["connectDB"],
  "connectDB": [],
  "logAccess": []
}
Graph Database
(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)
Approach 1: Adjacency List
Creation
adj["validateUser"] = ["checkAuth", "logAccess"];
adj["checkAuth"] = ["dbQuery"];
adj["dbQuery"] = ["connectDB"];
adj["connectDB"] = [];
adj["logAccess"] = [];
Updating
// Add new dependency
adj["validateUser"].push("newFunc");
// If you need reverse lookup (extra structure); create the array if missing
(reverse["newFunc"] ??= []).push("validateUser");
Deletion
adj["validateUser"] = adj["validateUser"].filter(f => f !== "logAccess");
// Also clean reverse map
reverse["logAccess"] = reverse["logAccess"].filter(f => f !== "validateUser");
Maintenance Problems
- Need to maintain forward + reverse maps
- Easy to get inconsistent
- Hard to track multi-hop dependencies
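One way to reduce the inconsistency risk is to route every mutation through a small helper that updates both maps together. A minimal sketch (the `addCall`/`removeCall` names are hypothetical, not from the original):

```javascript
// Hypothetical helpers that keep the forward and reverse maps in sync.
const adj = {};     // caller -> [callees]
const reverse = {}; // callee -> [callers]

function addCall(from, to) {
  (adj[from] ??= []).push(to);
  (reverse[to] ??= []).push(from);
}

function removeCall(from, to) {
  adj[from] = (adj[from] ?? []).filter(f => f !== to);
  reverse[to] = (reverse[to] ?? []).filter(f => f !== from);
}

addCall("validateUser", "checkAuth");
removeCall("validateUser", "checkAuth");
```

This centralizes the invariant in one place, but you are still the one enforcing it.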
Storage Internals
- Stored as plain objects / JSON
- Relationships = strings
- No indexing on relationships
Traversal = manual DFS/BFS every time
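For example, answering "what does validateUser eventually call?" means hand-writing that DFS yourself. A minimal sketch over the adjacency list above:

```javascript
// Manual DFS: collect everything a function transitively calls.
const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};

function transitiveCalls(fn, seen = new Set()) {
  for (const callee of adj[fn] ?? []) {
    if (!seen.has(callee)) {
      seen.add(callee);
      transitiveCalls(callee, seen);
    }
  }
  return seen;
}

// transitiveCalls("validateUser") →
// Set { "checkAuth", "dbQuery", "connectDB", "logAccess" }
```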
Approach 2: PostgreSQL (Relational Model)
Representation
functions(id, name)
calls(from_id, to_id)
Creation
INSERT INTO functions (id, name) VALUES (1, 'validateUser');
INSERT INTO functions (id, name) VALUES (2, 'checkAuth');
INSERT INTO calls (from_id, to_id) VALUES (1, 2);
Updating
-- assumes newFunc was already inserted into functions with id 6
INSERT INTO calls (from_id, to_id) VALUES (1, 6);
Deletion
-- remove the validateUser → logAccess edge (logAccess has id 5)
DELETE FROM calls WHERE from_id = 1 AND to_id = 5;
How Traversal Works (Joins)
In PostgreSQL, related rows are not physically linked. Relationships are reconstructed at query time using JOIN operations.
To traverse:
A → B → C → D
PostgreSQL performs a pair of JOINs per hop:
SELECT f1.name, f2.name, f3.name, f4.name
FROM functions f1
JOIN calls c1 ON f1.id = c1.from_id
JOIN functions f2 ON f2.id = c1.to_id
JOIN calls c2 ON f2.id = c2.from_id
JOIN functions f3 ON f3.id = c2.to_id
JOIN calls c3 ON f3.id = c3.from_id
JOIN functions f4 ON f4.id = c3.to_id;
What a JOIN Does Here
- Combines rows from multiple tables
- Matches foreign keys (from_id → to_id)
- Builds intermediate result sets step by step
Each JOIN is essentially reconstructing the graph edge manually.
Why Joins Become Expensive (in this use case)
- Each hop requires another JOIN
- Intermediate results grow at each step
- Work increases with graph depth
So cost becomes:
proportional to number of joins × intermediate result size
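To see the growth concretely, here is a small JS simulation of what each JOIN step does over a calls(from_id, to_id) table. This illustrates the relational pattern, not actual PostgreSQL internals (which use indexes and smarter join algorithms):

```javascript
// Simulate repeated joins over a calls(from_id, to_id) table.
const calls = [
  [1, 2], [1, 5], // validateUser -> checkAuth, logAccess
  [2, 3],         // checkAuth -> dbQuery
  [3, 4],         // dbQuery -> connectDB
];

// One "join" step: extend every partial path by every matching edge.
function joinStep(paths) {
  const next = [];
  for (const path of paths) {
    const tail = path[path.length - 1];
    for (const [from, to] of calls) {
      if (from === tail) next.push([...path, to]); // scans the table per row
    }
  }
  return next;
}

let paths = [[1]];          // start at validateUser
for (let hop = 0; hop < 3; hop++) {
  paths = joinStep(paths);  // each hop is another join over the whole table
}
// paths is now [[1, 2, 3, 4]]: the only 3-hop chain from node 1
```

Each hop re-scans the edge set against a possibly growing intermediate result, which is exactly the joins × intermediate-size cost described above.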
Storage Internals
- Stored as rows in tables
- Relationships represented via foreign keys
- No direct pointer traversal
Traversal = repeated JOIN operations
Time Complexity
- Single hop (with index): O(log N)
- Multi-hop traversal: O(k × join_cost)
Where:
- k = number of hops
- join_cost grows with data size
Approach 3: Graph Database
Representation
(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)
Creation
CREATE (a:Function {name: "validateUser"})
CREATE (b:Function {name: "checkAuth"})
CREATE (a)-[:CALLS]->(b)
Updating
MATCH (a:Function {name: "validateUser"})
CREATE (a)-[:CALLS]->(:Function {name: "newFunc"})
Deletion
MATCH (a:Function {name: "validateUser"})-[r:CALLS]->(b:Function {name: "logAccess"})
DELETE r
Maintenance
- No reverse mapping needed
- Relationships are first-class
- Consistency handled by DB
Storage Internals
- Nodes stored separately
- Relationships stored as pointers (node → relationship → node)
- Index-free adjacency
Traversal = follow pointers directly (no joins, no scans)
Why Graph Databases Win: How Traversal Works
In adjacency lists, traversal looks like:
A → B → C → D
But internally:
- You look up A in the map
- Then look up B by its key
- Then look up C
- Repeat
Each hop is a key lookup (and, if the map is persisted as JSON, a deserialization step) rather than a direct pointer.
In graph databases:
A → (pointer) → B → (pointer) → C → (pointer) → D
Traversal becomes:
follow pointer → follow pointer → follow pointer
No searching. No joins.
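Index-free adjacency can be approximated in JS by storing direct object references instead of string keys, so each hop is a pointer dereference. This is only a conceptual sketch, not how Neo4j actually lays out its records:

```javascript
// Nodes hold direct references to their callees: no key lookup per hop.
function node(name) {
  return { name, calls: [] };
}

const connectDB = node("connectDB");
const dbQuery = node("dbQuery");
const checkAuth = node("checkAuth");
dbQuery.calls.push(connectDB);
checkAuth.calls.push(dbQuery);

// Two hops = two pointer dereferences, independent of graph size.
const reached = checkAuth.calls[0].calls[0];
// reached.name === "connectDB"
```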
What This Means
- Each hop is constant time
- Traversal cost depends on path length, not total data
- Even large graphs remain efficient
Time Complexity
Adjacency List
- Direct lookup: O(1)
- Traversal (DFS/BFS): O(V + E)
- Reverse lookup: O(V + E) (unless extra structure maintained)
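The O(V + E) reverse-lookup figure comes from having to scan every entry when no reverse map is maintained:

```javascript
// Without a reverse map, "who calls dbQuery?" scans every list: O(V + E).
const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};

function callersOf(target) {
  const callers = [];
  for (const [fn, callees] of Object.entries(adj)) { // visits every vertex
    if (callees.includes(target)) callers.push(fn);  // and every edge
  }
  return callers;
}

// callersOf("dbQuery") → ["checkAuth"]
```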
Graph Database
- Starting node lookup: O(log N)
- Each hop traversal: O(1)
Total traversal:
O(k)
(where k = number of relationships followed)
Key Insight
- Adjacency List → O(V + E) for full traversals and reverse lookups
- Graph DB → O(k), proportional only to the path you actually follow
This is why graph databases scale much better for deep, connected data.
Final Takeaway
At small scale, all approaches work.
At large scale, you either:
- maintain graph logic yourself, or
- use a system built for graphs.
Graph databases like Neo4j are best suited for problems where relationships and multi-hop traversals are core to the system.
They remove the complexity of managing relationships manually and make traversal efficient and natural.