TJ Sweet
The Full Graph-RAG Stack As Declarative Pipelines in Cypher

Most RAG systems aren’t architected so much as assembled:

  • embedding service
  • vector search service
  • reranker
  • LLM endpoint
  • application glue for retries, timeouts, auth, and marshaling

It works, until you spend more time maintaining orchestration than improving retrieval quality.

https://github.com/orneryd/NornicDB/commits/main/?since=2026-03-03&until=2026-03-03

This update to NornicDB changes that model: retrieval, embedding, reranking, and inference are now first-class Cypher procedures. The important part is not “new APIs.” The important part is that these stages now execute as part of the query engine.


What landed

New Cypher primitives:

  • db.retrieve
  • db.rretrieve
  • db.rerank
  • db.infer
  • db.index.vector.embed

These are read-oriented pipeline operators designed to compose inside Cypher, not wrappers around separate app-tier flows.


Why this is materially different

1) The pipeline is now declarative and inspectable

Instead of this in app code:

```javascript
// app-tier orchestration: four awaited service hops, each needing
// its own retries, timeouts, auth, and marshaling
const embedding = await embed(query)
const results = await vectorSearch(embedding)
const reranked = await rerank(query, results)
const answer = await infer(query, reranked)
```

you can express the same intent in Cypher:

```cypher
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates, rerankTopK: 20}) YIELD id, final_score
RETURN id, final_score
```

That makes the pipeline:

  • versionable
  • benchmarkable
  • testable
  • visible to the query planner and cache

2) Less orchestration overhead in the hot path

You still call models. But you remove a lot of unnecessary app-layer choreography between stages:

  • fewer service hops
  • less JSON marshaling back-and-forth
  • fewer per-hop retries/timeouts to coordinate

This reduces tail latency and shrinks the operational failure surface.
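To see why hop count matters for the tail, a back-of-envelope sketch (illustrative numbers, not a benchmark from this release):

```cypher
// If each of four service hops independently misses its latency budget
// 1% of the time, at least one hop misses on 1 - 0.99^4, about 3.9%
// of requests: roughly four times the single-hop rate.
RETURN 1 - 0.99^4 AS p_at_least_one_slow_hop
```

Collapsing stages into one query removes those independent failure points rather than forcing the app tier to budget for each of them.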

3) Graph + semantic logic are fused in one plan

Because these are Cypher stages, you can combine semantic retrieval with graph constraints directly:

```cypher
MATCH (u:User {id: $userId})-[:MEMBER_OF]->(g:Group)
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 50, embedding) YIELD node, score
WHERE (node)-[:VISIBLE_TO]->(g)
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates}) YIELD id, final_score
RETURN id, final_score
```

This is not “vector DB results, then post-filter in app code.” It’s one composable query flow.


Query planner + cache: why this is practical, not just ergonomic

Adding new procedures is easy. Making them production-usable is harder. The key enabler is how query planning and caching interact with these primitives.

Planning path

NornicDB already routes CALL procedures through the Cypher executor dispatch path. That means these RAG primitives participate in the same execution flow as other query stages, rather than being side-channel operations.

Query plan cache

NornicDB keeps a parsed/structured plan cache for repeated query shapes. For RAG workloads, this matters because many queries are template-like:

  • same Cypher structure
  • different parameters ($query, $userId, etc.)

So the engine avoids repeated parse/analysis overhead for the same pipeline shape, and only rebinds inputs.
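Concretely, a templated pipeline like this one (reusing the procedures shown above) can hit the plan cache on every invocation after the first, since only the parameter values change:

```cypher
// Identical query text on every call -> one plan-cache entry.
// Only the parameter map ($query) is rebound per invocation.
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
RETURN node.id AS id, score
ORDER BY score DESC
```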

Result cache policy (important boundaries)

Read-query result caching now treats these primitives intentionally:

  • cacheable by default:
    • db.retrieve
    • db.rretrieve
    • db.rerank
    • db.index.vector.embed
  • db.infer is not cached by default
    • can be opted in per call (cache: true / cache_enabled: true)

This is the right split:

  • retrieval/rerank/embed are often deterministic enough for reuse under normal invalidation rules
  • inference can be non-deterministic and should require explicit opt-in
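Assuming a minimal call shape for db.infer (the map keys and YIELD column here are illustrative, not a documented signature), the opt-in looks like:

```cypher
// db.infer is uncached by default; pass the documented flag to opt in.
// (cache_enabled: true is the alternate spelling.)
CALL db.infer({query: $query, context: $context, cache: true}) YIELD answer
RETURN answer
```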

Correctness under writes

Cached read results follow normal invalidation behavior on writes. So this is not “cache forever”; it is “cache when safe, invalidate on data mutation.”

Net effect: you keep low overhead for repeated pipeline templates without pretending inference is always deterministic.


Procedure boundaries (clear contract)

  • db.retrieve: retrieval stage
  • db.rretrieve: retrieval shorthand that auto-reranks when a reranker is configured/available
  • db.rerank: a pure stage-2 reranking API over caller-provided candidates (does not run retrieval itself)
  • db.index.vector.embed: returns the embedding array, for explicit manual pipeline control
  • db.infer: inference stage, uncached by default

That split keeps simple flows short and advanced flows explicit.
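For contrast with the explicit pipelines above, a sketch of the shorthand path (the parameter map and YIELD columns are assumptions for illustration; final_score mirrors db.rerank above):

```cypher
// Single-stage shorthand: db.rretrieve runs retrieval and, if a
// reranker is configured, reranks automatically before yielding.
CALL db.rretrieve({query: $query, topK: 10}) YIELD id, final_score
RETURN id, final_score
ORDER BY final_score DESC
```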


What this is not

This does not mean:

  • instant hosted model platform
  • one-line “AI solved” pipeline
  • no tradeoffs in model/provider choice

You still choose providers and quality/latency/cost tradeoffs. What changed is where orchestration logic lives.


Common patterns today

  • vector systems with retrieval APIs, but app-driven orchestration
  • graph + external RAG glue
  • managed black-box pipelines with limited control

This approach is different: orchestration becomes query-native and composable in Cypher, with planner/cache semantics instead of ad-hoc application control flow.


Why this matters

The main gain is not syntactic convenience. It is reducing accidental complexity:

  • fewer moving parts outside the data layer
  • fewer duplicated pipelines across services/repos
  • better observability and repeatability for retrieval flows
  • easier benchmarking of real pipeline templates

The strategic question shifts from:

“How should we glue these services together?”

to:

“Which query pipeline shape should we run for this workload?”

That is a better problem to have.
