DEV Community

eyanpen
eyanpen

Posted on

How FalkorDB Stores Edges: Why Neighbor Lookup Is O(degree)

Many people have a question when they first see FalkorDB's architecture:

It doesn't use traditional adjacency lists but maintains edges with sparse matrices — how does it efficiently find all edges of a given node?

And a follow-up question:

If neighbor data is already stored contiguously, why is the query complexity still O(degree) instead of O(1)?


1. How Traditional Graph Databases Store Edges

Traditional graph databases (like Neo4j) typically use:

Adjacency List

For example:

A -> B
A -> C
A -> D
Enter fullscreen mode Exit fullscreen mode

Internally it looks more like:

A:
  edge1 -> edge2 -> edge3
Enter fullscreen mode Exit fullscreen mode

That is:

  • Each node maintains its own edge linked list
  • To find all edges of a node:

    • Simply traverse the linked list

Therefore the complexity is:

O(degree)
Enter fullscreen mode Exit fullscreen mode

Where:

degree = number of edges
Enter fullscreen mode Exit fullscreen mode

For example:

  • out_degree
    Number of outgoing edges

  • in_degree
    Number of incoming edges


2. FalkorDB Is Completely Different: Sparse Matrix

FalkorDB's core design is not an adjacency list.

It is based on:

  • Sparse Matrix
  • GraphBLAS

to maintain the entire graph.

For example:

A(id=0) -> B(id=1)
Enter fullscreen mode Exit fullscreen mode

Internal representation:

M[0,1] = edge_id
Enter fullscreen mode Exit fullscreen mode

Meaning:

source=0
target=1
Enter fullscreen mode Exit fullscreen mode

An edge exists.


3. One Matrix Per Edge Type

For example:

(:User)-[:FRIEND]->(:User)
(:User)-[:LIKES]->(:Post)
Enter fullscreen mode Exit fullscreen mode

FalkorDB maintains:

FRIEND matrix
LIKES matrix
Enter fullscreen mode Exit fullscreen mode

This way during traversal:

No need to scan the entire graph.


4. How Multi-edges Are Maintained

FalkorDB supports:

A -[:CALL]-> B
A -[:CALL]-> B
A -[:CALL]-> B
Enter fullscreen mode Exit fullscreen mode

Therefore a matrix cell cannot simply be:

M[0,1] = 1
Enter fullscreen mode Exit fullscreen mode

It's more like:

M[0,1] = [3,8,15]
Enter fullscreen mode Exit fullscreen mode

That is:

edge ids
Enter fullscreen mode Exit fullscreen mode

Essentially similar to:

  • sparse tensor
  • compressed adjacency structure

5. How to Efficiently Find Edges?

Many people mistakenly think:

0 0 0 1 0 0 1 1 0
Enter fullscreen mode Exit fullscreen mode

Means:

You must scan the entire row to find the 1s.

That's completely wrong.

Because:

Sparse Matrix Doesn't Store Zeros At All


6. What Does a Sparse Matrix Actually Store?

For example:

[0,0,0,1,0,0,1,1,0]
Enter fullscreen mode Exit fullscreen mode

The actual storage looks more like:

[3,6,7]
Enter fullscreen mode Exit fullscreen mode

Meaning:

index 3 has an edge
index 6 has an edge
index 7 has an edge
Enter fullscreen mode Exit fullscreen mode

Zeros don't exist at all.

Therefore:

Finding neighbors of node A:

neighbors(A) = [3,6,7]
Enter fullscreen mode Exit fullscreen mode

Return directly.


7. CSR / CSC: Industrial-Grade Sparse Matrix Structures

Real implementations typically use:

  • CSR (Compressed Sparse Row)
  • CSC (Compressed Sparse Column)

For example:

Matrix:

A: 0 0 0 1 0 0 1 1
B: 1 0 0 0 0 0 0 0
C: 0 1 0 0 1 0 0 0
Enter fullscreen mode Exit fullscreen mode

CSR might store it as:

indices = [3,6,7,0,1,4]
row_ptr = [0,3,4,6]
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • A's data is at indices[0:3]
  • B's data is at indices[3:4]
  • C's data is at indices[4:6]

So:

Finding all edges of A:

indices[0:3]
Enter fullscreen mode Exit fullscreen mode

Gives us:

[3,6,7]
Enter fullscreen mode Exit fullscreen mode

8. Why Is the Complexity Still O(degree)?

This is the most commonly misunderstood point.

Many people ask:

Since [3,6,7] is already in contiguous memory,
isn't a direct memcpy just O(1)?

The answer:

Locating the array is O(1)

But:

Traversing the array is still O(k)

Where:

k = degree
Enter fullscreen mode Exit fullscreen mode

9. What Does Algorithmic Complexity Actually Measure?

For example:

MATCH (a)-[e]->()
RETURN e
Enter fullscreen mode Exit fullscreen mode

The database doesn't just return:

an array pointer
Enter fullscreen mode Exit fullscreen mode

It must:

  • Traverse each edge
  • Decode the edge object
  • Construct the result set
  • Return to the client

Therefore:

for edge in neighbors:
    emit(edge)
Enter fullscreen mode Exit fullscreen mode

Must execute:

degree times
Enter fullscreen mode Exit fullscreen mode

So the overall complexity is:

O(degree)
Enter fullscreen mode Exit fullscreen mode

10. Output-sensitive Complexity

This is a classic concept:

The size of the output itself counts toward complexity

For example:

If:

A has 1 million edges
Enter fullscreen mode Exit fullscreen mode

Even if:

finding the array start
Enter fullscreen mode Exit fullscreen mode

Only takes:

O(1)
Enter fullscreen mode Exit fullscreen mode

But:

Returning 1 million edges:

Cannot be:

O(1)
Enter fullscreen mode Exit fullscreen mode

Because:

You must at least "look at" each element.


11. Why Is FalkorDB Still Fast?

Because:

[3,6,7]
Enter fullscreen mode Exit fullscreen mode

Is:

  • Contiguous memory
  • Cache-friendly
  • SIMD-friendly

The CPU can:

  • Prefetch
  • Vector load
  • Branch prediction

While traditional adjacency lists:

edge1 -> edge2 -> edge3
Enter fullscreen mode Exit fullscreen mode

Involve:

Pointer chasing

Which causes:

  • Cache misses
  • Memory stalls
  • Branch mispredictions

Therefore:

FalkorDB has clear advantages in:

  • High fan-out traversal
  • Multi-hop pattern matching
  • Graph analytics
  • GraphRAG

scenarios.


12. Neo4j vs FalkorDB: The Essential Difference

Neo4j is more like:

nodes + edge linked lists
Enter fullscreen mode Exit fullscreen mode

Suited for:

  • OLTP
  • Single-hop queries
  • High-frequency edge updates

FalkorDB is more like:

a graph computation engine
Enter fullscreen mode Exit fullscreen mode

Suited for:

  • Multi-hop traversal
  • Pattern matching
  • Graph analytics
  • Vectorized computation

For example:

(A)-[:F]->(B)-[:F]->(C)
Enter fullscreen mode Exit fullscreen mode

Neo4j:

pointer traversal
Enter fullscreen mode Exit fullscreen mode

FalkorDB:

matrix multiply
Enter fullscreen mode Exit fullscreen mode

That is:

F × F
Enter fullscreen mode Exit fullscreen mode

This is its biggest architectural difference.


13. Final Summary

FalkorDB's core philosophy:

Don't store "empty"
Only store "existing edges"

Therefore:

0 0 0 1 0 0 1
Enter fullscreen mode Exit fullscreen mode

Actually becomes:

[3,6]
Enter fullscreen mode Exit fullscreen mode

Querying all edges of a node:

  • Locating adjacency data:

    • O(1)
  • Returning all edges:

    • O(degree)

Where:

degree = number of edges for the current node
Enter fullscreen mode Exit fullscreen mode

Not:

total number of edges in the entire graph
Enter fullscreen mode Exit fullscreen mode

This is the core performance model of a Sparse Matrix graph database.


14. Does Splitting Edges Into Multiple Types vs. a Single Type Affect Query Speed?

A common question:

Since locating edges is O(1) and returning edges is O(degree),
does categorizing edges into one type vs. multiple types affect query speed?

The answer: It depends on whether the query specifies an edge type.


When the Query Specifies an Edge Type

For example:

MATCH (a)-[:FRIEND]->(b) RETURN b
Enter fullscreen mode Exit fullscreen mode

FalkorDB only scans the FRIEND matrix.

If all edges are categorized as a single type (e.g., :REL), the matrix contains all edges, making the degree larger.

Multiple types = smaller matrices = less traversal = faster.


When the Query Does Not Specify an Edge Type

For example:

MATCH (a)-[]->(b) RETURN b
Enter fullscreen mode Exit fullscreen mode

FalkorDB needs to merge results from multiple matrices.

In this case:

  • Total traversal volume is the same (total degree)
  • Multiple types have slight merge overhead
  • Single type traverses one matrix directly

The difference is minimal, approximately no impact.


Summary

Scenario Single Type vs. Multiple Types Impact
Query specifies edge type Multiple types faster Only scans the corresponding matrix, smaller degree
Query does not specify edge type Nearly no difference Same total degree, slight merge overhead with multiple types

Practical modeling recommendation:

Splitting into multiple types is the better practice.
Most real-world queries specify a relationship type, and splitting types significantly reduces the number of edges that need to be traversed.

Top comments (0)