TJ Sweet
Cutting Cypher Latency: Streaming Traversal and Query-Shape Specialization in NornicDB

Below are the headline numbers that motivated the execution model choices in NornicDB. They’re presented first so you can calibrate the rest of the post: the goal is not “benchmarks as marketing,” but to show the scale of the overhead we’re targeting and then explain where it comes from.

Results at a glance (same hardware)

LDBC Social Network Benchmark (M3 Max, 64GB)

| Query Type | NornicDB | Neo4j | Speedup |
| --- | --- | --- | --- |
| Message content lookup | 6,389 ops/sec | 518 ops/sec | 12× |
| Recent messages (friends) | 2,769 ops/sec | 108 ops/sec | 25× |
| Avg friends per city | 4,713 ops/sec | 91 ops/sec | 52× |
| Tag co-occurrence | 2,076 ops/sec | 65 ops/sec | 32× |

Northwind Benchmark (M3 Max, 64GB)

| Operation | NornicDB | Neo4j | Speedup |
| --- | --- | --- | --- |
| Index lookup | 7,623 ops/sec | 2,143 ops/sec | 3.6× |
| Count nodes | 5,253 ops/sec | 798 ops/sec | 6.6× |
| Write: node | 5,578 ops/sec | 1,690 ops/sec | 3.3× |
| Write: edge | 6,626 ops/sec | 1,611 ops/sec | 4.1× |

Parser mode comparison (Northwind query suite)

NornicDB supports two Cypher parser modes that can be switched at runtime:

  • ⚡ nornic (default): lightweight validation + direct execution
  • 🌳 antlr: strict OpenCypher parsing + full parse tree (better diagnostics, higher overhead)
| Query | ⚡ nornic | 🌳 antlr | Slowdown |
| --- | --- | --- | --- |
| Count all nodes | 3,272 hz | 45 hz | 73× |
| Count all relationships | 3,693 hz | 50 hz | 74× |
| Find customer by ID | 4,213 hz | 2,153 hz | 2.0× |
| Products supplied by supplier | 4,023 hz | 53 hz | 76× |
| Supplier→Category traversal | 3,225 hz | 22 hz | 147× |
| Products with/without orders | 3,881 hz | 0.82 hz | 4,753× |
| Create/delete relationship | 3,974 hz | 62 hz | 64× |

Suite runtime:

| Mode | Total time |
| --- | --- |
| ⚡ nornic | 17.5s |
| 🌳 antlr | 35.3s |

Those deltas—especially the big outliers—are what this post is about: where does that overhead come from, and what changes when you design around it?


The problem with “general” execution pipelines

Most mature databases follow a layered approach:

  • Parse query text into a syntax tree
  • Build a logical plan
  • Optimize the plan (often cost-based)
  • Produce a physical plan
  • Execute the plan using a generic operator runtime

That architecture has real advantages: flexibility, correctness, and a framework for optimizing complex queries. But it also has costs that show up in production for common graph workloads:

  • Row-by-row operator overhead (Volcano-style pipelines) can dominate lightweight traversals.
  • Intermediate materialization increases memory traffic.
  • Object churn and indirections increase GC pressure and cache misses.
  • Planning overhead becomes noticeable when queries are small but frequent.

For many real-world graph applications—lookups, short traversals, neighborhood expansions, and simple aggregations—those overheads can outweigh the actual graph work.
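The per-row overhead described above is easy to see in miniature. The sketch below contrasts a Volcano-style iterator pipeline (one virtual `next()` call per row, per operator) with a fused loop that does the same filter-and-count in a single pass. It is an illustrative toy, not NornicDB code:

```python
# Volcano-style pipeline: each operator pulls rows one at a time from its
# child, so every row pays several dispatch calls before it is aggregated.

class Scan:
    def __init__(self, rows):
        self.rows = iter(rows)
    def next(self):
        return next(self.rows, None)

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

class CountAgg:
    def __init__(self, child):
        self.child = child
    def result(self):
        n = 0
        while self.child.next() is not None:  # one call chain per row
            n += 1
        return n

volcano_count = CountAgg(Filter(Scan(range(1_000)), lambda r: r % 2 == 0)).result()

# Fused equivalent: one tight loop, no per-row operator dispatch.
fused_count = sum(1 for r in range(1_000) if r % 2 == 0)

assert volcano_count == fused_count == 500
```

Both paths produce the same answer; the difference is how much machinery runs per row, which is exactly the overhead that dominates when the per-row graph work is tiny.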


What we built: a hybrid engine with streaming fast paths

NornicDB takes a hybrid approach:

  • A general Cypher engine to support a wide set of queries.
  • Optimized streaming executors for common traversal + aggregation shapes.
  • Runtime-switchable parsing modes to trade strictness/debuggability for throughput.

The default production mode favors minimal overhead in the hot path. For query shapes we know are common, we aim to fuse pattern matching and aggregation into tight loops and avoid expensive intermediate structures.

Stream-parse-execute (default mode)

In the default “nornic” parser mode, the engine is designed around a stream-parse-execute approach. The intent is to avoid building heavy intermediate parse structures when we don’t need them, and to push execution decisions into a lightweight, shape-aware path.

This is not a claim that NornicDB has “no planning” anywhere. The codebase still contains analysis artifacts and caching for specific features. The claim is narrower and more useful:

For common traversal and aggregation shapes, NornicDB bypasses generic logical-plan execution and uses pattern-specialized, single-pass streaming executors.
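One way to picture that bypass is a shape classifier in front of the executor: recognize a known query template and route it to a specialized path, otherwise fall back to the general engine. The regex patterns and shape names below are hypothetical stand-ins, not NornicDB's actual detection logic:

```python
# Illustrative query-shape dispatcher. Real engines would classify on a
# structured representation rather than raw text; regexes keep the sketch short.
import re

SHAPES = {
    # MATCH (n) RETURN count(n), with an optional label
    "count_nodes": re.compile(r"^MATCH \(n(?::\w+)?\) RETURN count\(n\)$", re.I),
}

def classify(query: str) -> str:
    q = " ".join(query.split())  # normalize whitespace
    for shape, pattern in SHAPES.items():
        if pattern.match(q):
            return shape       # route to the specialized streaming executor
    return "general"           # fall back to the full engine

assert classify("MATCH (n:Person) RETURN count(n)") == "count_nodes"
assert classify("MATCH (a)-[:KNOWS]->(b) RETURN b") == "general"
```

The important property is that misclassification is safe: anything unrecognized simply takes the general path, so specialization only changes performance, never semantics.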

Strict parsing when you want it: ANTLR mode

NornicDB also supports an ANTLR-based parser mode. This mode is stricter and provides better error reporting (line/column), which is valuable during development and debugging. It’s also more expensive: building full parse trees and walking them introduces overhead that can dominate certain query classes.

That tradeoff is intentional. The same engine can run in:

  • Production mode (lower overhead, practical throughput)
  • Debug mode (strict validation and better diagnostics)
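Runtime-switchable modes can be as simple as a session-level flag that selects which parser front end handles the next query. The `Session` class and mode names below are a hypothetical sketch of that idea, not NornicDB's configuration API:

```python
# Illustrative runtime parser-mode switch: the same session can flip between
# a lightweight fast path and a strict path without restarting the engine.

PARSERS = {
    "nornic": lambda q: ("fast", q.strip()),    # lightweight validation
    "antlr":  lambda q: ("strict", q.strip()),  # full parse tree (stand-in)
}

class Session:
    def __init__(self, mode="nornic"):
        self.mode = mode
    def parse(self, query):
        return PARSERS[self.mode](query)

s = Session()
assert s.parse(" MATCH (n) RETURN n ")[0] == "fast"

s.mode = "antlr"  # switch at runtime, e.g. while debugging a failing query
assert s.parse("MATCH (n) RETURN n")[0] == "strict"
```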

Why this model performs well

Performance improvements come from removing layers of overhead on the path that matters most for many graph workloads: traversal + filter + aggregate.

1) Fused traversal and aggregation

For eligible query shapes, NornicDB executes traversal and aggregation in a single pass. Instead of producing intermediate row sets and feeding them through multiple generic operators, the executor performs direct scans and aggregates as it traverses.
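As a concrete (and deliberately tiny) illustration of fusing traversal with aggregation, the function below counts distinct friends-of-friends in one pass over an adjacency map, aggregating as it traverses instead of materializing two-hop paths first. The adjacency dict stands in for graph storage; none of this is NornicDB internals:

```python
# Single-pass two-hop traversal with in-place aggregation: no intermediate
# path rows are built, only the running result set.

adj = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob"],
}

def fof_count(person: str) -> int:
    seen = set()
    for friend in adj.get(person, []):      # hop 1
        for fof in adj.get(friend, []):     # hop 2, aggregated immediately
            if fof != person:
                seen.add(fof)
    return len(seen)

assert fof_count("alice") == 1  # only "dave" is two hops out
```

A generic operator pipeline would typically expand all two-hop paths into rows, deduplicate, and then count; the fused loop keeps only the accumulator live.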

2) Streaming execution and early termination

For a subset of query shapes, NornicDB’s execution can stream results and short-circuit work early—for example, when a query contains a LIMIT and the engine can stop once enough rows are produced.

A precise statement is:

Streaming traversal is real for optimized query classes, including LIMIT short-circuiting and selected no-materialization fast paths. This is shape-dependent, not universal for every Cypher query.
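The LIMIT short-circuit is straightforward to sketch with a generator: rows are produced lazily, and once the limit is reached the rest of the traversal simply never executes. Illustrative only:

```python
# Lazy edge scan plus LIMIT-style truncation: the producer stops as soon as
# the consumer has enough rows.

def scan_edges(adj, src):
    for dst in adj[src]:
        yield dst  # produced one row at a time, on demand

def limited(rows, n):
    out = []
    for row in rows:
        out.append(row)
        if len(out) == n:
            break  # early termination: remaining traversal never runs
    return out

adj = {"a": ["b", "c", "d", "e"]}
assert limited(scan_edges(adj, "a"), 2) == ["b", "c"]
```

With a materializing executor, all four neighbors would be scanned and buffered before the limit was applied; here the scan stops after two.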

3) Fewer intermediate structures in hot paths

The largest gains often come not from clever algorithms, but from not doing unnecessary work:

  • Avoiding full path materialization when only aggregates are needed
  • Avoiding row-by-row operator dispatch
  • Avoiding heavy parse trees in the production fast path

In traversal-heavy workloads, these effects compound.


A note on correctness: constraints and transactions

Performance only matters if results are correct and operations are safe.

NornicDB is not just a query interpreter. It includes:

  • Schema constraints and validation logic
  • Explicit transaction control (BEGIN / COMMIT / ROLLBACK)
  • Storage-backed transaction handling for supported backends

A publication-safe way to state this is:

NornicDB enforces schema constraints and supports explicit storage-backed transactions, while also using optimized fast paths for eligible query shapes.
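The transactional contract above can be sketched with a buffered-write model: writes stage into the transaction, become visible on COMMIT, and vanish on ROLLBACK. The `Txn` class below is a minimal illustration of those semantics over a dict, not NornicDB's storage-backed transaction layer:

```python
# Minimal BEGIN/COMMIT/ROLLBACK semantics: staged writes are invisible to the
# store until commit, and rollback discards them entirely.

class Txn:
    def __init__(self, store):
        self.store, self.staged = store, {}
    def set(self, key, value):
        self.staged[key] = value   # buffered; not yet visible in the store
    def commit(self):
        self.store.update(self.staged)
    def rollback(self):
        self.staged.clear()        # uncommitted writes are dropped

store = {}

tx = Txn(store)
tx.set("n1", {"name": "Ada"})
tx.rollback()
assert store == {}                 # rolled-back write never landed

tx = Txn(store)
tx.set("n1", {"name": "Ada"})
tx.commit()
assert store["n1"]["name"] == "Ada"
```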


The real tradeoff: hot-path query shape management

The largest downside of shape-specialized execution isn’t performance—it’s organizational cost.

Every optimized path has a lifecycle:

  • Detect and classify the shape reliably
  • Implement an optimized executor
  • Prove semantic equivalence with the general engine
  • Add regression tests and performance baselines
  • Keep it correct as Cypher features expand

This is real management overhead, and historically it’s why many engines converge on generic operator runtimes.

Why this tradeoff looks different now

Historically, query-shape specialization has carried high human overhead. In an agent-driven world, workloads are more template-like, and agents can automate the specialization loop: mine the top shapes, generate optimized executors, generate differential tests against a reference engine, and maintain coverage metrics. That shifts the work from manual tuning to automated verification and makes specialized execution economically viable again.

The key point isn’t that AI “writes the database for you.” It’s that:

  • Workloads become more template-like when generated by tools and agents.
  • Specialization can be treated as a pipeline: observe → prioritize → implement → verify → measure.
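The "verify" step of that pipeline is essentially differential testing: feed the same randomized inputs to the specialized executor and a reference implementation and assert identical results. A minimal sketch, with illustrative function names:

```python
# Differential test: the optimized single-pass path must agree with a
# straightforward reference implementation on randomized inputs.
import random

def general_count(rows, pred):
    # reference engine: materialize the filtered rows, then count
    return len([r for r in rows if pred(r)])

def specialized_count(rows, pred):
    # optimized path: single pass, no intermediate list
    n = 0
    for r in rows:
        if pred(r):
            n += 1
    return n

rng = random.Random(42)  # seeded for reproducibility
for _ in range(100):
    rows = [rng.randint(0, 9) for _ in range(rng.randint(0, 50))]
    assert specialized_count(rows, lambda r: r > 4) == general_count(rows, lambda r: r > 4)
```

Scaled up, the same idea covers whole query shapes: generate queries from the template, run both engines, and diff row sets, which is exactly the kind of mechanical verification loop agents are good at maintaining.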

What this model is best at (and what it’s not)

This execution model shines when:

  • Queries are traversal-heavy and relatively structured
  • Workloads are dominated by a small set of templates
  • You care about low latency and predictable performance
  • Aggregations can be fused into traversal

It’s not designed to claim universal dominance in every Cypher edge case. There will always be queries where a deep optimizer and a fully generalized runtime are the right tools. NornicDB’s approach is to optimize what matters most and retain a general path for everything else.


Closing thoughts

NornicDB’s execution model is a deliberate choice: remove overhead from the hot path by using streaming, shape-specialized executors for common Cypher patterns, while maintaining constraints and transactional boundaries.

If you’re curious, the best way to evaluate these claims is to run the benchmarks and inspect which queries hit optimized paths versus fallback behavior. Performance claims only matter when engineers can reproduce them—and that’s the bar we’re aiming for.
