Prajwal Gaonkar

How to Store Code Intelligence Graphs Efficiently

There are generally three practical approaches people consider:

  • Store dependencies as adjacency lists (custom structure)
  • Store them in a relational database (e.g. PostgreSQL)
  • Use a graph database

This post breaks down how each approach behaves across:

  • Creation
  • Updates
  • Deletion
  • Maintenance
  • Storage internals
  • Time complexity

The Problem: Code is a Graph

Example: Function Structure in a Real Repo

Consider a simple codebase:

// auth.js
export function validateUser() {
  checkAuth();
  logAccess();
}

function checkAuth() {
  dbQuery();
}

// db.js
export function dbQuery() {
  connectDB();
}

function connectDB() {}

function logAccess() {}

This creates a dependency graph:

validateUser → checkAuth → dbQuery → connectDB
validateUser → logAccess

How Each Approach Stores This

Adjacency List

{
  "validateUser": ["checkAuth", "logAccess"],
  "checkAuth": ["dbQuery"],
  "dbQuery": ["connectDB"],
  "connectDB": [],
  "logAccess": []
}

Graph Database

(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)

Approach 1: Adjacency List

Creation

adj["validateUser"] = ["checkAuth", "logAccess"];
adj["checkAuth"] = ["dbQuery"];
adj["dbQuery"] = ["connectDB"];
adj["connectDB"] = [];
adj["logAccess"] = [];

Updating

// Add a new dependency
adj["validateUser"].push("newFunc");

// If you need reverse lookup, you must maintain an extra structure.
// Initialize the entry first — calling .push on a missing key throws.
(reverse["newFunc"] ??= []).push("validateUser");

Deletion

adj["validateUser"] = adj["validateUser"].filter(f => f !== "logAccess");

// Also clean reverse map
reverse["logAccess"] = reverse["logAccess"].filter(f => f !== "validateUser");

Maintenance Problems

  • Need to maintain forward + reverse maps
  • Easy to get inconsistent
  • Hard to track multi-hop dependencies
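One way to reduce the inconsistency risk is to funnel every mutation through a pair of helpers so the forward and reverse maps can never drift apart. A minimal sketch (the `addEdge`/`removeEdge` names are illustrative, not from the post's codebase):

```javascript
const adj = {};     // caller -> [callees]
const reverse = {}; // callee -> [callers]

// Every edge mutation goes through these two functions,
// so both maps are always updated together.
function addEdge(from, to) {
  (adj[from] ??= []).push(to);
  (reverse[to] ??= []).push(from);
}

function removeEdge(from, to) {
  adj[from] = (adj[from] ?? []).filter(f => f !== to);
  reverse[to] = (reverse[to] ?? []).filter(f => f !== from);
}

addEdge("validateUser", "checkAuth");
addEdge("validateUser", "logAccess");
removeEdge("validateUser", "logAccess");
```

This contains the consistency problem but does not eliminate it: any code path that touches the maps directly can still desynchronize them.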

Storage Internals

  • Stored as plain objects / JSON
  • Relationships = strings
  • No indexing on relationships

Traversal = manual DFS/BFS every time
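Because nothing is indexed, every multi-hop question means walking the structure by hand. A minimal depth-first walk over the adjacency list from earlier (`reachable` is an illustrative name):

```javascript
// Collect everything transitively reachable from `start`
// via an explicit-stack DFS.
function reachable(adj, start) {
  const seen = new Set();
  const stack = [start];
  while (stack.length > 0) {
    const fn = stack.pop();
    if (seen.has(fn)) continue;
    seen.add(fn);
    for (const callee of adj[fn] ?? []) stack.push(callee);
  }
  seen.delete(start); // report only the dependencies, not the start node
  return [...seen];
}

const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};
```

Every "what does X depend on, transitively?" query re-runs this walk from scratch.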


Approach 2: PostgreSQL (Relational Model)

Representation

functions(id, name)
calls(from_id, to_id)

Creation

INSERT INTO functions (id, name) VALUES (1, 'validateUser');
INSERT INTO functions (id, name) VALUES (2, 'checkAuth');
INSERT INTO calls (from_id, to_id) VALUES (1, 2);

Updating

INSERT INTO calls (from_id, to_id) VALUES (1, 6);

Deletion

DELETE FROM calls WHERE from_id = 1 AND to_id = 5;

How Traversal Works (Joins)

In PostgreSQL, related rows are not physically linked to each other. Relationships are reconstructed at query time using JOIN operations.

To traverse:

A → B → C → D

PostgreSQL performs multiple JOINs:

SELECT f1.name, f2.name, f3.name
FROM functions f1
JOIN calls c1 ON f1.id = c1.from_id
JOIN functions f2 ON f2.id = c1.to_id
JOIN calls c2 ON f2.id = c2.from_id
JOIN functions f3 ON f3.id = c2.to_id;

What a JOIN Does Here

  • Combines rows from multiple tables
  • Matches foreign keys (from_id → to_id)
  • Builds intermediate result sets step by step

Each JOIN is essentially reconstructing the graph edge manually.
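The row-matching a JOIN performs can be simulated in a few lines. This is a sketch of the mechanics (a naive nested-loop join), not of PostgreSQL's actual executor, which picks among several join strategies:

```javascript
const functions = [
  { id: 1, name: "validateUser" },
  { id: 2, name: "checkAuth" },
  { id: 3, name: "dbQuery" },
];
const calls = [
  { from_id: 1, to_id: 2 },
  { from_id: 2, to_id: 3 },
];

// One "join": for each row on the left, scan the right side for matches.
// Note the spread overwrites from_id/to_id with the newest hop's values.
function join(left, right, match) {
  const out = [];
  for (const l of left) {
    for (const r of right) {
      if (match(l, r)) out.push({ ...l, ...r });
    }
  }
  return out;
}

// Two hops = two joins; an intermediate result set is
// materialized at every step before the next join runs.
const hop1 = join(functions, calls, (f, c) => f.id === c.from_id);
const hop2 = join(hop1, calls, (row, c) => row.to_id === c.from_id);
```

The point is structural: each additional hop adds another pass over the `calls` table, fed by whatever the previous step produced.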


Why Joins Become Expensive (in this use case)

  • Each hop requires another JOIN
  • Intermediate results grow at each step
  • Work increases with graph depth

So cost becomes:

proportional to number of joins × intermediate result size


Storage Internals

  • Stored as rows in tables
  • Relationships represented via foreign keys
  • No direct pointer traversal

Traversal = repeated JOIN operations


Time Complexity

  • Single hop (with index): O(log N)
  • Multi-hop traversal: O(k × join_cost)

Where:

  • k = number of hops
  • join_cost grows with data size

Approach 3: Graph Database

Representation

(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)

Creation

CREATE (a:Function {name: "validateUser"})
CREATE (b:Function {name: "checkAuth"})
CREATE (a)-[:CALLS]->(b)

Updating

MATCH (a:Function {name: "validateUser"})
CREATE (a)-[:CALLS]->(:Function {name: "newFunc"})

Deletion

MATCH (a:Function {name: "validateUser"})-[r:CALLS]->(b:Function {name: "logAccess"})
DELETE r

Maintenance

  • No reverse mapping needed
  • Relationships are first-class
  • Consistency handled by DB

Storage Internals

  • Nodes stored separately
  • Relationships stored as pointers (node → relationship → node)
  • Index-free adjacency

Traversal = follow pointers directly (no joins, no scans)
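Index-free adjacency can be pictured as nodes holding direct references to their neighbors, so a hop is one pointer dereference. A small in-memory sketch of the idea (not Neo4j's actual on-disk record layout):

```javascript
// Each node holds direct references to the nodes it calls.
function node(name) {
  return { name, calls: [] };
}

const connectDB = node("connectDB");
const dbQuery = node("dbQuery");
const checkAuth = node("checkAuth");
dbQuery.calls.push(connectDB);
checkAuth.calls.push(dbQuery);

// Following a path is pure pointer-chasing: O(1) per hop,
// independent of how many other nodes exist in the graph.
function follow(start, hops) {
  let current = start;
  for (let i = 0; i < hops; i++) current = current.calls[0];
  return current.name;
}
```

No key lookup, no row matching: the edge *is* the reference.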


Why Graph Databases Win: How Traversal Works

In adjacency lists, traversal looks like:

A → B → C → D

But internally:

  • You look up A's entry
  • Then search its list for B
  • Then search B's list for C
  • Repeat

Each step involves work.


In graph databases:

A → (pointer) → B → (pointer) → C → (pointer) → D

Traversal becomes:

follow pointer → follow pointer → follow pointer

No searching. No joins.


What This Means

  • Each hop is constant time
  • Traversal cost depends on path length, not total data
  • Even large graphs remain efficient

Time Complexity

Adjacency List

  • Direct lookup: O(1)
  • Traversal (DFS/BFS): O(V + E)
  • Reverse lookup: O(V + E) (unless extra structure maintained)
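The O(V + E) reverse-lookup cost comes from having to scan every adjacency entry to answer "who calls X?" when no reverse map is maintained. A sketch (`callersOf` is an illustrative name):

```javascript
// Without a reverse map, finding callers means visiting
// every vertex and every edge: O(V + E) for a single query.
function callersOf(adj, target) {
  const callers = [];
  for (const [fn, callees] of Object.entries(adj)) {
    if (callees.includes(target)) callers.push(fn);
  }
  return callers;
}

const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};
```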

Graph Database

  • Starting node lookup: O(log N)
  • Each hop traversal: O(1)

Total traversal:

O(k)
(where k = number of relationships followed)


Key Insight

  • Adjacency List → O(V + E) for a full DFS/BFS traversal
  • Graph DB → O(k) for a k-hop path query

This is why graph databases scale much better for deep, connected data.


Final Takeaway

At small scale, all approaches work.

At large scale:

Either you maintain graph logic yourself
Or you use a system built for graphs

Graph databases like Neo4j are best suited for problems where relationships and multi-hop traversals are core to the system.

They remove the complexity of managing relationships manually and make traversal efficient and natural.
