TJ Sweet
Cutting Cypher Latency: Streaming Traversal and Query-Shape Specialization in NornicDB

Below are the headline numbers that motivated the execution model choices in NornicDB. They’re presented first so you can calibrate the rest of the post: the goal is not “benchmarks as marketing,” but to show the scale of the overhead we’re targeting and then explain where it comes from.

Results at a glance (same hardware)

LDBC Social Network Benchmark (M3 Max, 64GB)

| Query Type | NornicDB | Neo4j | Speedup |
| --- | --- | --- | --- |
| Message content lookup | 6,389 ops/sec | 518 ops/sec | 12× |
| Recent messages (friends) | 2,769 ops/sec | 108 ops/sec | 25× |
| Avg friends per city | 4,713 ops/sec | 91 ops/sec | 52× |
| Tag co-occurrence | 2,076 ops/sec | 65 ops/sec | 32× |

Northwind Benchmark (M3 Max, 64GB)

| Operation | NornicDB | Neo4j | Speedup |
| --- | --- | --- | --- |
| Index lookup | 7,623 ops/sec | 2,143 ops/sec | 3.6× |
| Count nodes | 5,253 ops/sec | 798 ops/sec | 6.6× |
| Write: node | 5,578 ops/sec | 1,690 ops/sec | 3.3× |
| Write: edge | 6,626 ops/sec | 1,611 ops/sec | 4.1× |

Parser mode comparison (Northwind query suite)

NornicDB supports two Cypher parser modes that can be switched at runtime:

  • ⚡ nornic (default): lightweight validation + direct execution
  • 🌳 antlr: strict OpenCypher parsing + full parse tree (better diagnostics, higher overhead)
| Query | ⚡ nornic | 🌳 antlr | Slowdown |
| --- | --- | --- | --- |
| Count all nodes | 3,272 hz | 45 hz | 73× |
| Count all relationships | 3,693 hz | 50 hz | 74× |
| Find customer by ID | 4,213 hz | 2,153 hz | 2.0× |
| Products supplied by supplier | 4,023 hz | 53 hz | 76× |
| Supplier→Category traversal | 3,225 hz | 22 hz | 147× |
| Products with/without orders | 3,881 hz | 0.82 hz | 4,753× |
| Create/delete relationship | 3,974 hz | 62 hz | 64× |

Suite runtime:

| Mode | Total time |
| --- | --- |
| ⚡ nornic | 17.5s |
| 🌳 antlr | 35.3s |

Those deltas—especially the big outliers—are what this post is about: where does that overhead come from, and what changes when you design around it?


The problem with “general” execution pipelines

Most mature databases follow a layered approach:

  • Parse query text into a syntax tree
  • Build a logical plan
  • Optimize the plan (often cost-based)
  • Produce a physical plan
  • Execute the plan using a generic operator runtime

That architecture has real advantages: flexibility, correctness, and a framework for optimizing complex queries. But it also has costs that show up in production for common graph workloads:

  • Row-by-row operator overhead (Volcano-style pipelines) can dominate lightweight traversals.
  • Intermediate materialization increases memory traffic.
  • Object churn and indirections increase GC pressure and cache misses.
  • Planning overhead becomes noticeable when queries are small but frequent.

For many real-world graph applications—lookups, short traversals, neighborhood expansions, and simple aggregations—those overheads can outweigh the actual graph work.
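The per-row overhead described above is easy to see in miniature. The sketch below contrasts a Volcano-style iterator pipeline (one virtual `next()` call per row, per operator) with a fused loop that does the same filter-and-count in a single pass. It is an illustrative toy, not NornicDB code:

```python
# Volcano-style pipeline: each operator pulls rows one at a time from its
# child, so every row pays several dispatch calls before it is aggregated.

class Scan:
    def __init__(self, rows):
        self.rows = iter(rows)
    def next(self):
        return next(self.rows, None)

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

class CountAgg:
    def __init__(self, child):
        self.child = child
    def result(self):
        n = 0
        while self.child.next() is not None:  # one call chain per row
            n += 1
        return n

volcano_count = CountAgg(Filter(Scan(range(1_000)), lambda r: r % 2 == 0)).result()

# Fused equivalent: one tight loop, no per-row operator dispatch.
fused_count = sum(1 for r in range(1_000) if r % 2 == 0)

assert volcano_count == fused_count == 500
```

Both paths produce the same answer; the difference is how much machinery runs per row, which is exactly the overhead that dominates when the per-row graph work is tiny.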


What we built: a hybrid engine with streaming fast paths

NornicDB takes a hybrid approach:

  • A general Cypher engine to support a wide set of queries.
  • Optimized streaming executors for common traversal + aggregation shapes.
  • Runtime-switchable parsing modes to trade strictness/debuggability for throughput.

The default production mode favors minimal overhead in the hot path. For query shapes we know are common, we aim to fuse pattern matching and aggregation into tight loops and avoid expensive intermediate structures.

Stream-parse-execute (default mode)

In the default “nornic” parser mode, the engine is designed around a stream-parse-execute approach. The intent is to avoid building heavy intermediate parse structures when we don’t need them, and to push execution decisions into a lightweight, shape-aware path.

This is not a claim that NornicDB has “no planning” anywhere. The codebase still contains analysis artifacts and caching for specific features. The claim is narrower and more useful:

For common traversal and aggregation shapes, NornicDB bypasses generic logical-plan execution and uses pattern-specialized, single-pass streaming executors.
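One way to picture that bypass is a shape classifier in front of the executor: recognize a known query template and route it to a specialized path, otherwise fall back to the general engine. The regex patterns and shape names below are hypothetical stand-ins, not NornicDB's actual detection logic:

```python
# Illustrative query-shape dispatcher. Real engines would classify on a
# structured representation rather than raw text; regexes keep the sketch short.
import re

SHAPES = {
    # MATCH (n) RETURN count(n), with an optional label
    "count_nodes": re.compile(r"^MATCH \(n(?::\w+)?\) RETURN count\(n\)$", re.I),
}

def classify(query: str) -> str:
    q = " ".join(query.split())  # normalize whitespace
    for shape, pattern in SHAPES.items():
        if pattern.match(q):
            return shape       # route to the specialized streaming executor
    return "general"           # fall back to the full engine

assert classify("MATCH (n:Person) RETURN count(n)") == "count_nodes"
assert classify("MATCH (a)-[:KNOWS]->(b) RETURN b") == "general"
```

The important property is that misclassification is safe: anything unrecognized simply takes the general path, so specialization only changes performance, never semantics.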

Strict parsing when you want it: ANTLR mode

NornicDB also supports an ANTLR-based parser mode. This mode is stricter and provides better error reporting (line/column), which is valuable during development and debugging. It’s also more expensive: building full parse trees and walking them introduces overhead that can dominate certain query classes.

That tradeoff is intentional. The same engine can run in:

  • Production mode (lower overhead, practical throughput)
  • Debug mode (strict validation and better diagnostics)
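Runtime-switchable modes can be as simple as a session-level flag that selects which parser front end handles the next query. The `Session` class and mode names below are a hypothetical sketch of that idea, not NornicDB's configuration API:

```python
# Illustrative runtime parser-mode switch: the same session can flip between
# a lightweight fast path and a strict path without restarting the engine.

PARSERS = {
    "nornic": lambda q: ("fast", q.strip()),    # lightweight validation
    "antlr":  lambda q: ("strict", q.strip()),  # full parse tree (stand-in)
}

class Session:
    def __init__(self, mode="nornic"):
        self.mode = mode
    def parse(self, query):
        return PARSERS[self.mode](query)

s = Session()
assert s.parse(" MATCH (n) RETURN n ")[0] == "fast"

s.mode = "antlr"  # switch at runtime, e.g. while debugging a failing query
assert s.parse("MATCH (n) RETURN n")[0] == "strict"
```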

Why this model performs well

Performance improvements come from removing layers of overhead on the path that matters most for many graph workloads: traversal + filter + aggregate.

1) Fused traversal and aggregation

For eligible query shapes, NornicDB executes traversal and aggregation in a single pass. Instead of producing intermediate row sets and feeding them through multiple generic operators, the executor performs direct scans and aggregates as it traverses.
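As a concrete (and deliberately tiny) illustration of fusing traversal with aggregation, the function below counts distinct friends-of-friends in one pass over an adjacency map, aggregating as it traverses instead of materializing two-hop paths first. The adjacency dict stands in for graph storage; none of this is NornicDB internals:

```python
# Single-pass two-hop traversal with in-place aggregation: no intermediate
# path rows are built, only the running result set.

adj = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob"],
}

def fof_count(person: str) -> int:
    seen = set()
    for friend in adj.get(person, []):      # hop 1
        for fof in adj.get(friend, []):     # hop 2, aggregated immediately
            if fof != person:
                seen.add(fof)
    return len(seen)

assert fof_count("alice") == 1  # only "dave" is two hops out
```

A generic operator pipeline would typically expand all two-hop paths into rows, deduplicate, and then count; the fused loop keeps only the accumulator live.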

2) Streaming execution and early termination

For a subset of query shapes, NornicDB’s execution can stream results and short-circuit work early—for example, when a query contains a LIMIT and the engine can stop once enough rows are produced.

A precise statement is:

Streaming traversal is real for optimized query classes, including LIMIT short-circuiting and selected no-materialization fast paths. This is shape-dependent, not universal for every Cypher query.
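The LIMIT short-circuit is straightforward to sketch with a generator: rows are produced lazily, and once the limit is reached the rest of the traversal simply never executes. Illustrative only:

```python
# Lazy edge scan plus LIMIT-style truncation: the producer stops as soon as
# the consumer has enough rows.

def scan_edges(adj, src):
    for dst in adj[src]:
        yield dst  # produced one row at a time, on demand

def limited(rows, n):
    out = []
    for row in rows:
        out.append(row)
        if len(out) == n:
            break  # early termination: remaining traversal never runs
    return out

adj = {"a": ["b", "c", "d", "e"]}
assert limited(scan_edges(adj, "a"), 2) == ["b", "c"]
```

With a materializing executor, all four neighbors would be scanned and buffered before the limit was applied; here the scan stops after two.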

3) Fewer intermediate structures in hot paths

The largest gains often come not from clever algorithms, but from not doing unnecessary work:

  • Avoiding full path materialization when only aggregates are needed
  • Avoiding row-by-row operator dispatch
  • Avoiding heavy parse trees in the production fast path

In traversal-heavy workloads, these effects compound.


A note on correctness: constraints and transactions

Performance only matters if results are correct and operations are safe.

NornicDB is not just a query interpreter. It includes:

  • Schema constraints and validation logic
  • Explicit transaction control (BEGIN / COMMIT / ROLLBACK)
  • Storage-backed transaction handling for supported backends

A publication-safe way to state this is:

NornicDB enforces schema constraints and supports explicit storage-backed transactions, while also using optimized fast paths for eligible query shapes.
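The transactional contract above can be sketched with a buffered-write model: writes stage into the transaction, become visible on COMMIT, and vanish on ROLLBACK. The `Txn` class below is a minimal illustration of those semantics over a dict, not NornicDB's storage-backed transaction layer:

```python
# Minimal BEGIN/COMMIT/ROLLBACK semantics: staged writes are invisible to the
# store until commit, and rollback discards them entirely.

class Txn:
    def __init__(self, store):
        self.store, self.staged = store, {}
    def set(self, key, value):
        self.staged[key] = value   # buffered; not yet visible in the store
    def commit(self):
        self.store.update(self.staged)
    def rollback(self):
        self.staged.clear()        # uncommitted writes are dropped

store = {}

tx = Txn(store)
tx.set("n1", {"name": "Ada"})
tx.rollback()
assert store == {}                 # rolled-back write never landed

tx = Txn(store)
tx.set("n1", {"name": "Ada"})
tx.commit()
assert store["n1"]["name"] == "Ada"
```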


The real tradeoff: hot-path query shape management

The largest downside of shape-specialized execution isn’t performance—it’s organizational cost.

Every optimized path has a lifecycle:

  • Detect and classify the shape reliably
  • Implement an optimized executor
  • Prove semantic equivalence with the general engine
  • Add regression tests and performance baselines
  • Keep it correct as Cypher features expand

This is real management overhead, and historically it’s why many engines converge on generic operator runtimes.

Why this tradeoff looks different now

Historically, query-shape specialization has carried high human overhead. In an agent-driven world, workloads are more template-like, and agents can automate the specialization loop: mine the top shapes, generate optimized executors, generate differential tests against a reference engine, and maintain coverage metrics. That shifts the work from manual tuning to automated verification and makes specialized execution economically viable again.

The key point isn’t that AI “writes the database for you.” It’s that:

  • Workloads become more template-like when generated by tools and agents.
  • Specialization can be treated as a pipeline: observe → prioritize → implement → verify → measure.
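The "verify" step of that pipeline is essentially differential testing: feed the same randomized inputs to the specialized executor and a reference implementation and assert identical results. A minimal sketch, with illustrative function names:

```python
# Differential test: the optimized single-pass path must agree with a
# straightforward reference implementation on randomized inputs.
import random

def general_count(rows, pred):
    # reference engine: materialize the filtered rows, then count
    return len([r for r in rows if pred(r)])

def specialized_count(rows, pred):
    # optimized path: single pass, no intermediate list
    n = 0
    for r in rows:
        if pred(r):
            n += 1
    return n

rng = random.Random(42)  # seeded for reproducibility
for _ in range(100):
    rows = [rng.randint(0, 9) for _ in range(rng.randint(0, 50))]
    assert specialized_count(rows, lambda r: r > 4) == general_count(rows, lambda r: r > 4)
```

Scaled up, the same idea covers whole query shapes: generate queries from the template, run both engines, and diff row sets, which is exactly the kind of mechanical verification loop agents are good at maintaining.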

What this model is best at (and what it’s not)

This execution model shines when:

  • Queries are traversal-heavy and relatively structured
  • Workloads are dominated by a small set of templates
  • You care about low latency and predictable performance
  • Aggregations can be fused into traversal

It’s not designed to claim universal dominance in every Cypher edge case. There will always be queries where a deep optimizer and a fully generalized runtime are the right tools. NornicDB’s approach is to optimize what matters most and retain a general path for everything else.


Closing thoughts

NornicDB’s execution model is a deliberate choice: remove overhead from the hot path by using streaming, shape-specialized executors for common Cypher patterns, while maintaining constraints and transactional boundaries.

If you’re curious, the best way to evaluate these claims is to run the benchmarks and inspect which queries hit optimized paths versus fallback behavior. Performance claims only matter when engineers can reproduce them—and that’s the bar we’re aiming for.
