Most RAG systems aren’t architected so much as assembled:
- embedding service
- vector search service
- reranker
- LLM endpoint
- application glue for retries, timeouts, auth, and marshaling
It works, until you spend more time maintaining orchestration than improving retrieval quality.
https://github.com/orneryd/NornicDB/commits/main/?since=2026-03-03&until=2026-03-03
This update to NornicDB changes that model: retrieval, embedding, reranking, and inference are now first-class Cypher procedures. The important part is not “new APIs.” The important part is that these stages now execute as part of the query engine.
What landed
New Cypher primitives:
- `db.retrieve`
- `db.rretrieve`
- `db.rerank`
- `db.infer`
- `db.index.vector.embed`
These are read-oriented pipeline operators designed to compose inside Cypher, not wrappers around separate app-tier flows.
Why this is materially different
1) The pipeline is now declarative and inspectable
Instead of this in app code:
const embedding = await embed(query)
const results = await vectorSearch(embedding)
const reranked = await rerank(query, results)
const answer = await infer(query, reranked)
you can express the same intent in Cypher:
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates, rerankTopK: 20}) YIELD id, final_score
RETURN id, final_score
That makes the pipeline:
- versionable
- benchmarkable
- testable
- governed by normal query execution semantics
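Concretely, the whole pipeline becomes one parametrized template string on the application side — a minimal sketch, assuming a generic driver exposing a `session.run(text, params)` call (the driver API here is an assumption, not a documented NornicDB client):

```javascript
// The Cypher template is just data: it can live in version control and be
// unit-tested like any other query.
const RAG_PIPELINE = `
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates, rerankTopK: 20}) YIELD id, final_score
RETURN id, final_score
`;

// One call replaces the four awaits from the app-tier version.
async function askCorpus(session, query) {
  const result = await session.run(RAG_PIPELINE, { query });
  return result.records;
}
```

Because the template is a value, testing it means asserting on a string and stubbing one `run` function, not mocking four services.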
2) Less orchestration overhead in the hot path
You still call models. But you remove a lot of unnecessary app-layer choreography between stages:
- fewer service hops
- less JSON marshaling back and forth
- fewer per-hop retries/timeouts to coordinate
This reduces tail latency and shrinks the operational failure surface.
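To make the choreography cost concrete, here is a sketch of the kind of per-hop guard code that moves out of the hot path. `withRetry` is an illustrative helper, not a NornicDB API; the point is that with four services every stage needs its own retry/timeout policy, while a query-native pipeline leaves one boundary to defend:

```javascript
// Generic retry-with-timeout wrapper: the glue each service hop needs.
async function withRetry(fn, { attempts = 3, timeoutMs = 2000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await Promise.race([
        fn(),
        new Promise((_, rej) => setTimeout(() => rej(new Error('timeout')), timeoutMs)),
      ]);
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}

// Before: four guarded hops, each a place to fail, retry, and marshal JSON.
//   await withRetry(() => embed(query))
//   await withRetry(() => vectorSearch(embedding))
//   await withRetry(() => rerank(query, results))
//   await withRetry(() => infer(query, reranked))
// After: one guarded hop around a single Cypher call.
//   await withRetry(() => session.run(pipelineCypher, { query }))
```

Fewer wrapped hops also means fewer places where retry amplification and timeout budgets interact badly under load.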
3) Graph + semantic logic are fused in one plan
Because these are Cypher stages, you can combine semantic retrieval with graph constraints directly:
MATCH (u:User {id: $userId})-[:MEMBER_OF]->(g:Group)
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 50, embedding) YIELD node, score
WHERE (node)-[:VISIBLE_TO]->(g)
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates}) YIELD id, final_score
RETURN id, final_score
This is not “vector DB results, then post-filter in app code.” It’s one composable query flow.
Query planner + cache: why this is practical, not just ergonomic
Adding new procedures is easy. Making them production-usable is harder. The key enabler is how query planning and caching interact with these primitives.
Planning path
NornicDB already routes CALL procedures through the Cypher executor dispatch path. That means these RAG primitives participate in the same execution flow as other query stages, rather than being side-channel operations.
Query plan cache
NornicDB keeps a parsed/structured plan cache for repeated query shapes. For RAG workloads, this matters because many queries are template-like:
- same Cypher structure
- different parameters (`$query`, `$userId`, etc.)
So the engine avoids repeated parse/analysis overhead for the same pipeline shape, and only rebinds inputs.
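The mechanism can be sketched as a cache keyed by query text, where the parsed plan is built once and later executions only rebind parameters. This is an illustrative model of the behavior described above, not NornicDB's internal implementation:

```javascript
// Plan cache sketch: pay the parse/analysis cost once per query shape.
class PlanCache {
  constructor(parse) {
    this.parse = parse;      // expensive step: parse + analyze the query text
    this.plans = new Map();  // query text -> cached structured plan
    this.parses = 0;         // how often the parse cost was actually paid
  }
  plan(queryText) {
    let plan = this.plans.get(queryText);
    if (!plan) {
      this.parses++;
      plan = this.parse(queryText);
      this.plans.set(queryText, plan);
    }
    return plan;
  }
  execute(queryText, params) {
    // Same plan for the same template; only the inputs are rebound.
    return { plan: this.plan(queryText), params };
  }
}
```

Running the same RAG template with different `$query` values hits the cached plan every time after the first parse.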
Result cache policy (important boundaries)
Read-query result caching now treats these primitives intentionally:
- cacheable by default: `db.retrieve`, `db.rretrieve`, `db.rerank`, `db.index.vector.embed`
- `db.infer` is not cached by default
- `db.infer` can be opted in per call (`cache: true` / `cache_enabled: true`)
This is the right split:
- retrieval/rerank/embed are often deterministic enough for reuse under normal invalidation rules
- inference can be non-deterministic and should require explicit opt-in
Correctness under writes
Cached read results follow normal invalidation behavior on writes. So this is not “cache forever”; it is “cache when safe, invalidate on data mutation.”
Net effect: you keep low overhead for repeated pipeline templates without pretending inference is always deterministic.
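The whole policy — cacheable-by-default retrieval stages, opt-in `db.infer`, invalidation on writes — can be modeled in a few lines. The procedure names and opt-in flags come from the post; the class itself is an illustrative sketch, not NornicDB internals:

```javascript
// Read-result cache policy sketch.
const CACHEABLE_BY_DEFAULT = new Set([
  'db.retrieve', 'db.rretrieve', 'db.rerank', 'db.index.vector.embed',
]);

class ResultCache {
  constructor() { this.entries = new Map(); }
  key(proc, args) { return proc + ':' + JSON.stringify(args); }
  shouldCache(proc, opts = {}) {
    // db.infer requires explicit opt-in; retrieval stages cache by default.
    if (opts.cache === true || opts.cache_enabled === true) return true;
    return CACHEABLE_BY_DEFAULT.has(proc);
  }
  call(proc, args, run, opts) {
    const k = this.key(proc, args);
    if (this.shouldCache(proc, opts) && this.entries.has(k)) return this.entries.get(k);
    const result = run(args);
    if (this.shouldCache(proc, opts)) this.entries.set(k, result);
    return result;
  }
  // "Cache when safe, invalidate on data mutation."
  onWrite() { this.entries.clear(); }
}
```

A crude clear-all on write is the simplest correct invalidation; a real engine can be more surgical, but the observable contract is the same.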
Procedure boundaries (clear contract)
- `db.retrieve`: retrieval stage
- `db.rretrieve`: retrieval shorthand; auto-reranks if configured/available
- `db.rerank`: a true Stage-2 API over caller-provided candidates (does not run retrieval)
- `db.index.vector.embed`: returns an embedding array for explicit manual pipeline control
- `db.infer`: inference stage, not cached by default
That split keeps simple flows short and advanced flows explicit.
What this is not
This does not mean:
- instant hosted model platform
- one-line “AI solved” pipeline
- no tradeoffs in model/provider choice
You still choose providers and quality/latency/cost tradeoffs. What changed is where orchestration logic lives.
Common patterns today:
- vector systems with retrieval APIs, but app-driven orchestration
- graph + external RAG glue
- managed black-box pipelines with limited control
This approach is different: orchestration becomes query-native and composable in Cypher, with planner/cache semantics instead of ad-hoc application control flow.
Why this matters
The main gain is not syntactic convenience. It is reducing accidental complexity:
- fewer moving parts outside the data layer
- fewer duplicated pipelines across services/repos
- better observability and repeatability for retrieval flows
- easier benchmarkability of real pipeline templates
The strategic question shifts from:
“How should we glue these services together?”
to:
“Which query pipeline shape should we run for this workload?”
That is a better problem to have.