Prajwal Gaonkar

How to Store Code Intelligence Graphs Efficiently

There are generally three practical approaches people consider:

  • Store dependencies as adjacency lists (custom structure)
  • Use a relational database (e.g. PostgreSQL)
  • Use a graph database

This post breaks down how each approach behaves across:

  • Creation
  • Updates
  • Deletion
  • Maintenance
  • Storage internals
  • Time complexity

The Problem: Code is a Graph

Example: Function Structure in a Real Repo

Consider a simple codebase:

// auth.js
export function validateUser() {
  checkAuth();
  logAccess();
}

function checkAuth() {
  dbQuery();
}

// db.js
export function dbQuery() {
  connectDB();
}

function connectDB() {}

function logAccess() {}

This creates a dependency graph:

validateUser → checkAuth → dbQuery → connectDB
validateUser → logAccess

How Each Approach Stores This

Adjacency List

{
  "validateUser": ["checkAuth", "logAccess"],
  "checkAuth": ["dbQuery"],
  "dbQuery": ["connectDB"],
  "connectDB": [],
  "logAccess": []
}

Graph Database

(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)

Approach 1: Adjacency List

Creation

adj["validateUser"] = ["checkAuth", "logAccess"];
adj["checkAuth"] = ["dbQuery"];
adj["dbQuery"] = ["connectDB"];
adj["connectDB"] = [];
adj["logAccess"] = [];

Updating

// Add new dependency
adj["validateUser"].push("newFunc");

// If you need reverse lookup, you must maintain an extra structure —
// and remember to initialise the entry before pushing:
(reverse["newFunc"] ||= []).push("validateUser");

Deletion

adj["validateUser"] = adj["validateUser"].filter(f => f !== "logAccess");

// Also clean the reverse map (guarding against a missing entry)
reverse["logAccess"] = (reverse["logAccess"] || []).filter(f => f !== "validateUser");

Maintenance Problems

  • Need to maintain forward + reverse maps
  • Easy to get inconsistent
  • Hard to track multi-hop dependencies
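The consistency problem above can be mitigated by routing every mutation through a pair of helpers. A minimal sketch (the helper names `addEdge`/`removeEdge` are illustrative, not from the post):

```javascript
// Keep forward and reverse maps in sync by mutating both in one place.
const adj = {};
const reverse = {};

function addEdge(from, to) {
  (adj[from] ||= []).push(to);      // forward: who `from` calls
  (reverse[to] ||= []).push(from);  // reverse: who calls `to`
}

function removeEdge(from, to) {
  adj[from] = (adj[from] || []).filter((f) => f !== to);
  reverse[to] = (reverse[to] || []).filter((f) => f !== from);
}

addEdge("validateUser", "checkAuth");
addEdge("validateUser", "logAccess");
removeEdge("validateUser", "logAccess");
```

This doesn't remove the bookkeeping — it just concentrates it, which is exactly the kind of graph logic a graph database handles for you.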

Storage Internals

  • Stored as plain objects / JSON
  • Relationships = strings
  • No indexing on relationships

Traversal = manual DFS/BFS every time
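That manual traversal can be sketched as a plain DFS over the adjacency list from the example above (a minimal sketch, assuming the same function names):

```javascript
const adj = {
  validateUser: ["checkAuth", "logAccess"],
  checkAuth: ["dbQuery"],
  dbQuery: ["connectDB"],
  connectDB: [],
  logAccess: [],
};

// Collect every function reachable from `start` — its transitive dependencies.
function transitiveDeps(graph, start) {
  const seen = new Set();
  const stack = [...(graph[start] || [])];
  while (stack.length > 0) {
    const fn = stack.pop();
    if (seen.has(fn)) continue;
    seen.add(fn);
    stack.push(...(graph[fn] || []));
  }
  return [...seen];
}

console.log(transitiveDeps(adj, "validateUser").sort());
// [ 'checkAuth', 'connectDB', 'dbQuery', 'logAccess' ]
```

You write (and maintain) this loop yourself for every traversal pattern — forward, reverse, depth-limited, and so on.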


Approach 2: PostgreSQL (Relational Model)

Representation

functions(id, name)
calls(from_id, to_id)

Creation

INSERT INTO functions (id, name) VALUES (1, 'validateUser');
INSERT INTO functions (id, name) VALUES (2, 'checkAuth');
INSERT INTO calls (from_id, to_id) VALUES (1, 2);

Updating

INSERT INTO calls (from_id, to_id) VALUES (1, 6);

Deletion

DELETE FROM calls WHERE from_id = 1 AND to_id = 5;

How Traversal Works (Joins)

In PostgreSQL, related rows are not physically linked. Each relationship is reconstructed at query time using JOIN operations.

To traverse:

A → B → C → D

PostgreSQL performs multiple JOINs:

SELECT f1.name, f2.name, f3.name
FROM functions f1
JOIN calls c1 ON f1.id = c1.from_id
JOIN functions f2 ON f2.id = c1.to_id
JOIN calls c2 ON f2.id = c2.from_id
JOIN functions f3 ON f3.id = c2.to_id;

What a JOIN Does Here

  • Combines rows from multiple tables
  • Matches foreign keys (from_id → to_id)
  • Builds intermediate result sets step by step

Each JOIN is essentially reconstructing the graph edge manually.


Why Joins Become Expensive (in this use case)

  • Each hop requires another JOIN
  • Intermediate results grow at each step
  • Work increases with graph depth

So cost becomes:

proportional to number of joins × intermediate result size
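A rough way to see this cost model: treat `calls` as a flat table and simulate a k-hop query where every hop re-scans the table, the way an unindexed join would. This is a deliberately naive sketch (real PostgreSQL uses indexes and smarter join strategies), just to make the "work per hop" visible:

```javascript
// Edges stored as flat rows, like the `calls(from_id, to_id)` table.
const calls = [
  { from: "validateUser", to: "checkAuth" },
  { from: "validateUser", to: "logAccess" },
  { from: "checkAuth", to: "dbQuery" },
  { from: "dbQuery", to: "connectDB" },
];

// One simulated JOIN: match every frontier value against every edge row.
function hop(frontier) {
  return calls.filter((e) => frontier.includes(e.from)).map((e) => e.to);
}

let frontier = ["validateUser"];
let rowsScanned = 0;
for (let k = 0; k < 3; k++) {
  rowsScanned += calls.length; // each hop scans the whole calls table
  frontier = hop(frontier);
}
console.log(frontier, rowsScanned); // [ 'connectDB' ] 12
```

Three hops over four edges already scans twelve rows; the same pattern on a large codebase is where join cost bites.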


Storage Internals

  • Stored as rows in tables
  • Relationships represented via foreign keys
  • No direct pointer traversal

Traversal = repeated JOIN operations


Time Complexity

  • Single hop (with index): O(log N)
  • Multi-hop traversal: O(k × join_cost)

Where:

  • k = number of hops
  • join_cost grows with data size

Approach 3: Graph Database

Representation

(validateUser)-[:CALLS]->(checkAuth)
(validateUser)-[:CALLS]->(logAccess)
(checkAuth)-[:CALLS]->(dbQuery)
(dbQuery)-[:CALLS]->(connectDB)

Creation

CREATE (a:Function {name: "validateUser"})
CREATE (b:Function {name: "checkAuth"})
CREATE (a)-[:CALLS]->(b)

Updating

MATCH (a:Function {name: "validateUser"})
CREATE (a)-[:CALLS]->(:Function {name: "newFunc"})

Deletion

MATCH (a:Function {name: "validateUser"})-[r:CALLS]->(b:Function {name: "logAccess"})
DELETE r

Maintenance

  • No reverse mapping needed
  • Relationships are first-class
  • Consistency handled by DB

Storage Internals

  • Nodes stored separately
  • Relationships stored as pointers (node → relationship → node)
  • Index-free adjacency

Traversal = follow pointers directly (no joins, no scans)
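The pointer-chasing idea can be modelled in plain JavaScript: each node object holds direct references to its callees, so a hop is a single dereference rather than a key lookup or a join. A toy model of index-free adjacency (the `walk` helper is illustrative):

```javascript
// Each node carries pointers (object references) to the nodes it calls.
const connectDB = { name: "connectDB", calls: [] };
const dbQuery = { name: "dbQuery", calls: [connectDB] };
const logAccess = { name: "logAccess", calls: [] };
const checkAuth = { name: "checkAuth", calls: [dbQuery] };
const validateUser = { name: "validateUser", calls: [checkAuth, logAccess] };

// Follow the first outgoing CALLS edge up to k times — O(1) per hop.
function walk(node, k) {
  let current = node;
  for (let i = 0; i < k && current.calls.length > 0; i++) {
    current = current.calls[0];
  }
  return current.name;
}

console.log(walk(validateUser, 3)); // "connectDB"
```

No map lookup, no table scan: each hop is just `current.calls[0]`, which is the essence of what a graph database's storage layer does on disk.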


Why Graph Databases Win: How Traversal Works

In adjacency lists, traversal looks like:

A → B → C → D

But internally:

  • You look up A in the map
  • Then look up B by key
  • Then look up C by key
  • Repeat for every hop

Each hop is a separate key lookup, and every traversal pattern is a hand-written DFS/BFS.


In graph databases:

A → (pointer) → B → (pointer) → C → (pointer) → D

Traversal becomes:

follow pointer → follow pointer → follow pointer

No searching. No joins.


What This Means

  • Each hop is constant time
  • Traversal cost depends on path length, not total data
  • Even large graphs remain efficient

Time Complexity

Adjacency List

  • Direct lookup: O(1)
  • Traversal (DFS/BFS): O(V + E)
  • Reverse lookup: O(V + E) (unless extra structure maintained)

Graph Database

  • Starting node lookup: O(log N)
  • Each hop traversal: O(1)

Total traversal:

O(k)
(where k = number of relationships followed)


Key Insight

  • Adjacency List → O(V + E) for a full traversal, plus manual bookkeeping for reverse lookups
  • Relational DB → O(k × join_cost) for a k-hop path
  • Graph DB → O(k) for a k-hop path

This is why graph databases scale much better for deep, connected data.


Final Takeaway

At small scale, all approaches work.

At large scale:

Either you maintain graph logic yourself, or you use a system built for graphs.

Graph databases like Neo4j are best suited for problems where relationships and multi-hop traversals are core to the system.

They remove the complexity of managing relationships manually and make traversal efficient and natural.
