Most RAG systems aren’t architected so much as assembled:
- embedding service
- vector search service
- reranker
- LLM endpoint
- application glue for retries, timeouts, auth, and marshaling
It works, until you spend more time maintaining orchestration than improving retrieval quality.
https://github.com/orneryd/NornicDB/commits/main/?since=2026-03-03&until=2026-03-03
This update to NornicDB changes that model: retrieval, embedding, reranking, and inference are now first-class Cypher procedures. The important part is not “new APIs.” The important part is that these stages now execute as part of the query engine.
What landed
New Cypher primitives:
- `db.retrieve`
- `db.rretrieve`
- `db.rerank`
- `db.infer`
- `db.index.vector.embed`
These are read-oriented pipeline operators designed to compose inside Cypher, not wrappers around separate app-tier flows.
Why this is materially different
1) The pipeline is now declarative and inspectable
Instead of this in app code:
const embedding = await embed(query)
const results = await vectorSearch(embedding)
const reranked = await rerank(query, results)
const answer = await infer(query, reranked)
you can express the same intent in Cypher:
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates, rerankTopK: 20}) YIELD id, final_score
RETURN id, final_score
That makes the pipeline:
- versionable
- benchmarkable
- testable
- governed by normal query execution semantics
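Concretely, the whole pipeline becomes one parametrized template string on the application side — a minimal sketch, assuming a generic driver exposing a `session.run(text, params)` call (the driver API here is an assumption, not a documented NornicDB client):

```javascript
// The Cypher template is just data: it can live in version control and be
// unit-tested like any other query.
const RAG_PIPELINE = `
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates, rerankTopK: 20}) YIELD id, final_score
RETURN id, final_score
`;

// One call replaces the four awaits from the app-tier version.
async function askCorpus(session, query) {
  const result = await session.run(RAG_PIPELINE, { query });
  return result.records;
}
```

Because the template is a value, testing it means asserting on a string and stubbing one `run` function, not mocking four services.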
2) Less orchestration overhead in the hot path
You still call models. But you remove a lot of unnecessary app-layer choreography between stages:
- fewer service hops
- less JSON marshaling back and forth
- fewer per-hop retries/timeouts to coordinate
This reduces tail latency and shrinks the operational failure surface.
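To make the choreography cost concrete, here is a sketch of the kind of per-hop guard code that moves out of the hot path. `withRetry` is an illustrative helper, not a NornicDB API; the point is that with four services every stage needs its own retry/timeout policy, while a query-native pipeline leaves one boundary to defend:

```javascript
// Generic retry-with-timeout wrapper: the glue each service hop needs.
async function withRetry(fn, { attempts = 3, timeoutMs = 2000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await Promise.race([
        fn(),
        new Promise((_, rej) => setTimeout(() => rej(new Error('timeout')), timeoutMs)),
      ]);
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}

// Before: four guarded hops, each a place to fail, retry, and marshal JSON.
//   await withRetry(() => embed(query))
//   await withRetry(() => vectorSearch(embedding))
//   await withRetry(() => rerank(query, results))
//   await withRetry(() => infer(query, reranked))
// After: one guarded hop around a single Cypher call.
//   await withRetry(() => session.run(pipelineCypher, { query }))
```

Fewer wrapped hops also means fewer places where retry amplification and timeout budgets interact badly under load.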
3) Graph + semantic logic are fused in one plan
Because these are Cypher stages, you can combine semantic retrieval with graph constraints directly:
MATCH (u:User {id: $userId})-[:MEMBER_OF]->(g:Group)
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 50, embedding) YIELD node, score
WHERE (node)-[:VISIBLE_TO]->(g)
WITH collect({id: node.id, content: coalesce(node.content, toString(node)), score: score}) AS candidates
CALL db.rerank({query: $query, candidates: candidates}) YIELD id, final_score
RETURN id, final_score
This is not “vector DB results, then post-filter in app code.” It’s one composable query flow.
Query planner + cache: why this is practical, not just ergonomic
Adding new procedures is easy. Making them production-usable is harder. The key enabler is how query planning and caching interact with these primitives.
Planning path
NornicDB already routes CALL procedures through the Cypher executor dispatch path. That means these RAG primitives participate in the same execution flow as other query stages, rather than being side-channel operations.
Query plan cache
NornicDB keeps a parsed/structured plan cache for repeated query shapes. For RAG workloads, this matters because many queries are template-like:
- same Cypher structure
- different parameters (`$query`, `$userId`, etc.)
So the engine avoids repeated parse/analysis overhead for the same pipeline shape, and only rebinds inputs.
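The mechanism can be sketched as a cache keyed by query text, where the parsed plan is built once and later executions only rebind parameters. This is an illustrative model of the behavior described above, not NornicDB's internal implementation:

```javascript
// Plan cache sketch: pay the parse/analysis cost once per query shape.
class PlanCache {
  constructor(parse) {
    this.parse = parse;      // expensive step: parse + analyze the query text
    this.plans = new Map();  // query text -> cached structured plan
    this.parses = 0;         // how often the parse cost was actually paid
  }
  plan(queryText) {
    let plan = this.plans.get(queryText);
    if (!plan) {
      this.parses++;
      plan = this.parse(queryText);
      this.plans.set(queryText, plan);
    }
    return plan;
  }
  execute(queryText, params) {
    // Same plan for the same template; only the inputs are rebound.
    return { plan: this.plan(queryText), params };
  }
}
```

Running the same RAG template with different `$query` values hits the cached plan every time after the first parse.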
Result cache policy (important boundaries)
Read-query result caching now treats these primitives intentionally:
- cacheable by default: `db.retrieve`, `db.rretrieve`, `db.rerank`, `db.index.vector.embed`
- `db.infer` is not cached by default
- `db.infer` can be opted in per call (`cache: true` / `cache_enabled: true`)
This is the right split:
- retrieval/rerank/embed are often deterministic enough for reuse under normal invalidation rules
- inference can be non-deterministic and should require explicit opt-in
Correctness under writes
Cached read results follow normal invalidation behavior on writes. So this is not “cache forever”; it is “cache when safe, invalidate on data mutation.”
Net effect: you keep low overhead for repeated pipeline templates without pretending inference is always deterministic.
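The whole policy — cacheable-by-default retrieval stages, opt-in `db.infer`, invalidation on writes — can be modeled in a few lines. The procedure names and opt-in flags come from the post; the class itself is an illustrative sketch, not NornicDB internals:

```javascript
// Read-result cache policy sketch.
const CACHEABLE_BY_DEFAULT = new Set([
  'db.retrieve', 'db.rretrieve', 'db.rerank', 'db.index.vector.embed',
]);

class ResultCache {
  constructor() { this.entries = new Map(); }
  key(proc, args) { return proc + ':' + JSON.stringify(args); }
  shouldCache(proc, opts = {}) {
    // db.infer requires explicit opt-in; retrieval stages cache by default.
    if (opts.cache === true || opts.cache_enabled === true) return true;
    return CACHEABLE_BY_DEFAULT.has(proc);
  }
  call(proc, args, run, opts) {
    const k = this.key(proc, args);
    if (this.shouldCache(proc, opts) && this.entries.has(k)) return this.entries.get(k);
    const result = run(args);
    if (this.shouldCache(proc, opts)) this.entries.set(k, result);
    return result;
  }
  // "Cache when safe, invalidate on data mutation."
  onWrite() { this.entries.clear(); }
}
```

A crude clear-all on write is the simplest correct invalidation; a real engine can be more surgical, but the observable contract is the same.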
Procedure boundaries (clear contract)
- `db.retrieve`: retrieval stage
- `db.rretrieve`: retrieval shorthand; auto-reranks if configured/available
- `db.rerank`: a true Stage-2 API over caller-provided candidates (does not run retrieval)
- `db.index.vector.embed`: returns an embedding array for explicit manual pipeline control
- `db.infer`: inference stage, not cached by default
That split keeps simple flows short and advanced flows explicit.
What this is not
This does not mean:
- instant hosted model platform
- one-line “AI solved” pipeline
- no tradeoffs in model/provider choice
You still choose providers and quality/latency/cost tradeoffs. What changed is where orchestration logic lives.
Common patterns today:
- vector systems with retrieval APIs, but app-driven orchestration
- graph + external RAG glue
- managed black-box pipelines with limited control
This approach is different: orchestration becomes query-native and composable in Cypher, with planner/cache semantics instead of ad-hoc application control flow.
Why this matters
The main gain is not syntactic convenience. It is reducing accidental complexity:
- fewer moving parts outside the data layer
- fewer duplicated pipelines across services/repos
- better observability and repeatability for retrieval flows
- easier benchmarkability of real pipeline templates
The strategic question shifts from:
“How should we glue these services together?”
to:
“Which query pipeline shape should we run for this workload?”
That is a better problem to have.