DEV Community

Ian Cowley
Ian Cowley

Posted on

I built a Zero-Allocation C# Knowledge Graph (because JVM graphs are too bloated)

If you are building AI agents, you eventually hit the "Memory Wall".

Your agent doesn't just need semantic text chunks (Vector Search) or structured tables (SQL). It often needs to trace relationships. For example: Find all suppliers connected to this failing part, or Find the common connection between User_A and User_B.

To solve this, the industry tells you to spin up a massive Java-based Graph Database container.

I've been writing C# and SQL for decades, and I despise unnecessary bloat. I didn't want to run a 2GB JVM container locally just to traverse a few hundred thousand relationships.

So, I built Glacier.Graph.

It is a zero-dependency, bare-metal C# Knowledge Graph. It completely bypasses the .NET Garbage Collector and can persist 230,000 edges to disk in 30 milliseconds.

Here is how I architected the engine to achieve memory-bandwidth speeds.

The Problem with "Object-Oriented" Graphs

When most developers try to build a graph in memory, they immediately reach for Object-Oriented Programming:

public class Node 
{
    public string Id { get; set; }
    public List<Edge> Edges { get; set; }
}

public class Edge 
{
    public Node Target { get; set; }
    public string RelationType { get; set; }
}

Enter fullscreen mode Exit fullscreen mode

If you have 100,000 nodes and 500,000 relationships, the .NET Garbage Collector now has to track 600,000 object headers scattered randomly across the heap. Traversing this graph means chasing pointer references through RAM, missing the CPU cache every single time, and dealing with brutal GC pauses.

The Fix: The "Forward Star" Representation

Instead of objects, Glacier.Graph uses the Forward Star (or Compressed Sparse Row) technique.

All edges are stored in flat, primitive integer arrays. When you add a relationship, the engine doesn't allocate an object; it just increments an index and writes to int[] arrays.

// The core of the Graph Store
private int[] _head;      // Starting edge index for a node
private int[] _to;        // Target node ID
private int[] _relation;  // Type of relation (e.g. "KNOWS")
private int[] _next;      // Index of the next edge for this node

Enter fullscreen mode Exit fullscreen mode

When you traverse the graph, you aren't doing heap allocations. You are just doing sequential array index lookups. The CPU cache prefetcher loves this, resulting in near-instantaneous traversal speeds.

The Benchmark: Traversing 150,000 Nodes

I generated a complex synthetic graph with 150,000 nodes and 233,000 edges (including chains, branches, and back-references).

I then ran a Breadth-First Search (BFS) to find the shortest path between User_1 and User_99999.

Here is the raw console output:

==========================================
 Glacier.Graph | High-Performance Graph DB
==========================================

[1] Initializing Graph Engine and generating data...
    Total Nodes: 150,001 | Total Edges: 233,333

[3] Executing BFS Shortest Path (User_1 -> User_99999)...
    Path found in 5.3066 ms!
    Hops: 25
    Route: User_1 -> ... (24 intermediate nodes) ... -> User_99999

[4] Executing 4-Hop Neighborhood Search around User_1...
    Neighborhood scanned in 0.4416 ms!
    Discovered 11 connected entities within 4 hops.

Enter fullscreen mode Exit fullscreen mode

Finding a 25-hop path through 150,000 nodes took 5.3 milliseconds. Mapping an entire 4-hop neighborhood took 0.44 milliseconds.

You can't even complete the HTTP handshake to a standard database in the time it takes this engine to traverse the entire network.

Blazing Fast Persistence

Because the graph is just a collection of primitive int[] arrays, saving the graph to disk is incredibly fast. We don't use JSON or heavy serialization.

Using MemoryMarshal.AsBytes(), the engine grabs the raw bytes of the arrays directly out of RAM and blasts them to the SSD.

[2] Saving raw memory arrays to disk...
    Saved 27.46 MB to 'graph_database.bin' in 30 ms.

[3] Destroying graph in memory and reloading from disk...
    Graph revived from disk in 41 ms!

Enter fullscreen mode Exit fullscreen mode

30 milliseconds to save. 41 milliseconds to revive.

Built for AI Agents

Like my other libraries, I didn't want to build a bloated REST API.

Glacier.Graph includes a built-in Model Context Protocol (MCP) server over standard I/O. You can point your AI agents (via AgentDevKit, Claude Desktop, or Cursor) directly at the .dll.

The AI instantly gets JSON-RPC access to add_node, add_edge, find_shortest_path, and find_neighborhood. It allows LLMs to autonomously traverse a high-speed Knowledge Graph without any Python dependencies.

Try it out

If you are tired of massive containerized databases and want bare-metal C# performance for your relational AI data, give it a shot.

GitHub: ian-cowley/Glacier.Graph

Let me know what your traversal times look like!

Top comments (2)

Collapse
 
motedb profile image
mote

The Forward Star representation is clever, but the part that caught my eye was the "save raw bytes to disk with MemoryMarshal.AsBytes" approach. That's the same instinct behind how we handle episodic memory in robot controllers — serialize structured state as raw binary, not objects.

One thing I'd push back on though: the benchmark numbers look great, but they assume you have the graph in memory. For embedded deployments — our robot controllers have 256KB RAM total — you don't have 150K nodes sitting in RAM. You're lucky if you can fit 10K relationships before you have to page to flash.

The CSR representation is actually perfect for this constraint, which the article doesn't fully explore. With flat int arrays, you can memory-map sections from flash storage without deserializing the whole thing. The CPU prefetcher loves sequential int reads. Your 5.3ms BFS on 150K nodes is doing the full graph in RAM — the interesting question is what happens when you're doing the same traversal across a memory-mapped file on flash with 50x slower random access.

The MCP over stdio approach is clean though. No network stack, no REST overhead — just process-to-process bytes. That's how you integrate on embedded: not HTTP servers, not APIs, just shared memory or memory-mapped files.

Have you measured traversal time when the graph isn't fully resident in RAM? That's where CSR really shines or falls apart depending on your access patterns.

Collapse
 
iancowley profile image
Ian Cowley

Great callout on the memory constraint and CSR.

I just implemented a secondary CsrGraphStore storage engine to test this. I built a compiler that flattens the dynamic Forward Star graph into a static CSR binary format, and implemented a strict software LRU page cache over the file stream to simulate a hard 256KB RAM limit, forcing the system to heavily rely on OS page swapping/disk reads during traversal.

I ran a benchmark on a 450,000 node / 700,000 edge graph, doing a 29-hop BFS:

In-RAM (Forward Star): ~8.5 ms
Out-of-Core (CSR with a strict 256KB cache): ~208 ms
While a 208 ms traversal is a significant jump from 8 ms, it's actually an incredible success for a 256KB environment.

The CSR sequential layout is exactly what saves it from total collapse. If we tried to page the Forward Star representation with only 256KB of RAM, the random pointer-chasing in the _next array would have caused catastrophic cache misses for every edge read, easily pushing that 208 ms into several seconds of disk thrashing.

I've pushed version 1.0.1 to GitHub which includes this CSR implementation and a parameter to enforce the maxMemoryBytes cache limit if you want to play around with it.

Thanks for the insightful feedback, it definitely pushed the engine in a better direction for embedded/constrained setups!