<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TJ Sweet</title>
    <description>The latest articles on DEV Community by TJ Sweet (@orneryd).</description>
    <link>https://dev.to/orneryd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3786783%2F11c19ce3-48d8-4966-a446-764dbc4558c6.png</url>
      <title>DEV Community: TJ Sweet</title>
      <link>https://dev.to/orneryd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/orneryd"/>
    <language>en</language>
    <item>
      <title>I’m looking for a small number of maintainers for NornicDB</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Mon, 30 Mar 2026 23:18:15 +0000</pubDate>
      <link>https://dev.to/orneryd/im-looking-for-a-small-number-of-maintainers-for-nornicdb-2pn6</link>
      <guid>https://dev.to/orneryd/im-looking-for-a-small-number-of-maintainers-for-nornicdb-2pn6</guid>
      <description>&lt;p&gt;NornicDB is a Neo4j-compatible graph + vector database with MVCC, auditability-oriented features, hybrid retrieval, and a strong bias toward performance and operational simplicity. It’s the kind of system where correctness, latency, storage behavior, and developer ergonomics all matter at the same time.&lt;/p&gt;

&lt;p&gt;I’m not looking for “contributors” in the generic open source sense. I’m looking for people whose engineering habits match the shape of this project.&lt;/p&gt;

&lt;p&gt;The people I work best with tend to have a few things in common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they use agentic tooling well, but don’t use it as a substitute for taste or rigor&lt;/li&gt;
&lt;li&gt;they like spec-driven development, not just coding until tests pass&lt;/li&gt;
&lt;li&gt;they default to TDD or regression-first work when touching complex systems&lt;/li&gt;
&lt;li&gt;they care about performance, memory behavior, query shapes, and hot paths&lt;/li&gt;
&lt;li&gt;they care about developer experience, naming, docs, tooling, and maintainability&lt;/li&gt;
&lt;li&gt;they can hold correctness and pragmatism in their head at the same time&lt;/li&gt;
&lt;li&gt;they are comfortable working on systems that have database, query engine, protocol, and infrastructure concerns all mixed together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a beginner-friendly maintenance surface. It’s a real database codebase, and a lot of the work sits in the uncomfortable middle where product expectations, compatibility, performance, and internal simplicity all pull in different directions.&lt;/p&gt;

&lt;p&gt;The kinds of things maintainers might work on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cypher and Bolt compatibility&lt;/li&gt;
&lt;li&gt;MVCC and transactional behavior&lt;/li&gt;
&lt;li&gt;vector and hybrid retrieval execution paths&lt;/li&gt;
&lt;li&gt;storage engine correctness and performance&lt;/li&gt;
&lt;li&gt;audit/history/retention semantics&lt;/li&gt;
&lt;li&gt;benchmarks, profiling, and allocation reduction&lt;/li&gt;
&lt;li&gt;test infrastructure, spec coverage, and regression prevention&lt;/li&gt;
&lt;li&gt;docs and tooling that improve the contributor experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I care much more about engineering taste than resume pedigree.&lt;/p&gt;

&lt;p&gt;If you’re the kind of person who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;writes tests for bugs before fixing them&lt;/li&gt;
&lt;li&gt;gets annoyed by hidden allocations and avoidable abstractions&lt;/li&gt;
&lt;li&gt;wants docs and tooling to be part of the product, not an afterthought&lt;/li&gt;
&lt;li&gt;uses modern AI tooling to move faster, but still insists on clear specs and defensible code&lt;/li&gt;
&lt;li&gt;likes the idea of maintaining infrastructure that other serious teams can trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d like to talk.&lt;/p&gt;

&lt;p&gt;If that sounds like you, reply here or DM me with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a few lines on what kinds of systems work you like&lt;/li&gt;
&lt;li&gt;links to anything you’ve built, maintained, or profiled&lt;/li&gt;
&lt;li&gt;what parts of NornicDB you’d most want to touch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m intentionally looking for a small number of strong fits, not a large intake.&lt;/p&gt;

</description>
      <category>database</category>
      <category>contributorswanted</category>
      <category>github</category>
      <category>sre</category>
    </item>
    <item>
      <title>The "Boxing In" Strategy: How Go is the Goldilocks Language for AI-Assisted Engineering</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Sun, 29 Mar 2026 17:20:24 +0000</pubDate>
      <link>https://dev.to/orneryd/the-boxing-in-strategy-how-go-is-the-goldilocks-language-for-ai-assisted-engineering-l30</link>
      <guid>https://dev.to/orneryd/the-boxing-in-strategy-how-go-is-the-goldilocks-language-for-ai-assisted-engineering-l30</guid>
      <description>&lt;p&gt;There is a growing realization among developers using AI agents like Cursor, Windsurf, or GitHub Copilot: the choice of programming language is no longer just about runtime performance or ecosystem. It is now about &lt;strong&gt;LLM Steering.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During the development of &lt;strong&gt;NornicDB&lt;/strong&gt; and other projects, I used &lt;strong&gt;AI-assisted engineering&lt;/strong&gt;. I want to make a clear distinction here: this is not "vibe coding." To me, "vibing" is just going with whatever the AI suggests—a passive approach that often leads to technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-assisted engineering&lt;/strong&gt; is a deliberate, high-rigor cycle: using AI for research and planning, drafting a spec, reviewing it, whiteboarding the logic, using the AI to validate the theory in isolated code, and &lt;em&gt;then&lt;/em&gt; applying it to the project. In this workflow, Go is structurally unique. It doesn't just run well; it "boxes in" the AI during that final implementation phase, preventing the hallucination-filled "spaghetti" that often plagues AI-generated code in more flexible languages.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. The "GPS" Effect: Forcing Explicit Intent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The greatest weakness of LLMs is &lt;strong&gt;abstraction drift&lt;/strong&gt;. In languages with deep inheritance or highly flexible functional patterns (like TypeScript or Python), an AI often loses the architectural thread, suggesting three different ways to solve the same problem.&lt;/p&gt;

&lt;p&gt;Go solves this by being &lt;strong&gt;intentionally limited&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Package Boundaries:&lt;/strong&gt; Go’s strict folder-to-package mapping acts as a physical guardrail. The LLM is structurally discouraged from creating complex, circular dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No "Magic":&lt;/strong&gt; Because Go lacks hidden meta-programming, complex decorators, or deep class hierarchies, the AI is forced to write &lt;strong&gt;explicit code&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
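&lt;p&gt;As a minimal illustration of what "explicit" buys you (generic Go, not NornicDB code): failure is a returned value, so there is no hidden control flow for the model to hallucinate around.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// lookupNode returns the stored value for id, or an explicit error.
// There is no exception machinery or metaprogramming for an LLM to
// misuse: the only way to signal failure is the returned error value.
func lookupNode(store map[string]string, id string) (string, error) {
	v, ok := store[id]
	if !ok {
		return "", errors.New("node not found: " + id)
	}
	return v, nil
}

func main() {
	store := map[string]string{"n1": "hello"}
	if v, err := lookupNode(store, "n1"); err == nil {
		fmt.Println(v)
	}
	if _, err := lookupNode(store, "n2"); err != nil {
		fmt.Println("explicit error:", err)
	}
}
```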

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;My Opinion:&lt;/strong&gt; I believe that for a probabilistic model like an LLM, "explicit" is synonymous with "predictable." By narrowing the solution space to a few idiomatic paths, Go acts as a structural GPS. It doesn't let the AI get "too clever," which is usually when logic begins to break down.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. The OODA Loop: Validating Theory at Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A core part of my engineering process is using AI to validate a theory in code before it ever touches the main repository. Go’s near-instant compilation makes this &lt;strong&gt;Observe-Orient-Decide-Act (OODA)&lt;/strong&gt; loop incredibly tight.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant Feedback:&lt;/strong&gt; If a validation cycle takes 30 seconds (common in C++ or heavy Java apps), the momentum of the engineering process dies. Go allows me to test a theoretical concurrency pattern or a pointer-safety fix in milliseconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling Synergy:&lt;/strong&gt; Because &lt;code&gt;go fmt&lt;/code&gt;, &lt;code&gt;go test&lt;/code&gt;, and the race detector (&lt;code&gt;go test -race&lt;/code&gt;) are standard and built-in, the AI can generate and run validation tests that match production standards immediately.&lt;/li&gt;
&lt;/ul&gt;
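&lt;p&gt;A quick sketch of that loop (illustrative code, names invented): the table-driven shape below is exactly what an agent can generate, run with &lt;code&gt;go run&lt;/code&gt;, or move into a &lt;code&gt;_test.go&lt;/code&gt; file for &lt;code&gt;go test -race&lt;/code&gt;, and iterate on in seconds.&lt;/p&gt;

```go
package main

import "fmt"

// Clamp bounds v to the range [lo, hi].
func Clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	// Table-driven cases, the same shape idiomatic `go test` tests use.
	cases := []struct{ v, lo, hi, want int }{
		{5, 0, 10, 5},
		{-3, 0, 10, 0},
		{42, 0, 10, 10},
	}
	for _, c := range cases {
		if got := Clamp(c.v, c.lo, c.hi); got != c.want {
			panic(fmt.Sprintf("Clamp(%d,%d,%d) = %d, want %d", c.v, c.lo, c.hi, got, c.want))
		}
	}
	fmt.Println("all cases pass")
}
```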




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Logical Cross-Pollination (The C/C++ Factor)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I’ve noticed anecdotally that LLMs seem to leverage their massive training data in C and C++ to improve their Go logic. While the syntax differs, the &lt;strong&gt;underlying systems logic&lt;/strong&gt;—concurrency patterns, pointer safety, and memory alignment—is highly transferable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Logic Transfer:&lt;/strong&gt; Algorithmic patterns (like HNSW for vector search or MVCC for transaction isolation) translate beautifully from C++ logic into Go implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Contamination" Risk (Criticism):&lt;/strong&gt; You must be the "Adult in the Room." Because Go looks like the C-family, LLMs will occasionally try to write "Go-flavored C," attempting manual memory management or pointer arithmetic that fights Go’s garbage collector. This is why the &lt;strong&gt;Review&lt;/strong&gt; and &lt;strong&gt;Whiteboarding&lt;/strong&gt; stages of my process are non-negotiable.&lt;/li&gt;
&lt;/ul&gt;
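&lt;p&gt;A toy example of the contamination risk (names invented): both functions below are correct, but the first is the "Go-flavored C" pattern a review pass should flag.&lt;/p&gt;

```go
package main

import "fmt"

// cStyleSum is the kind of "Go-flavored C" an LLM may emit:
// manual index bookkeeping and a pointless pointer round-trip
// that fight the language. It works, but review should catch it.
func cStyleSum(xs []int) int {
	total := 0
	for i := 0; i < len(xs); i++ {
		total += *(&xs[i]) // unnecessary address-of/dereference
	}
	return total
}

// idiomaticSum is what the reviewed spec should converge on.
func idiomaticSum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(cStyleSum(xs), idiomaticSum(xs))
}
```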




&lt;h3&gt;
  
  
  &lt;strong&gt;Proof of Concept: The NornicDB Experience&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When I implemented &lt;strong&gt;Snapshot Isolation (SI)&lt;/strong&gt; and a &lt;strong&gt;BYOM (Bring Your Own Model)&lt;/strong&gt; embedding engine into NornicDB, the AI didn't just "vibe" out the code. We went through a rigorous spec and validation phase.&lt;/p&gt;

&lt;p&gt;Because Go handles concurrency through first-class language constructs (goroutines, channels, and &lt;code&gt;select&lt;/code&gt;), the AI-generated implementation of that spec was structurally sound from the first draft. In more permissive languages, the AI might have suggested five different async libraries; in Go, it just followed the spec into a &lt;code&gt;select&lt;/code&gt; block.&lt;/p&gt;
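&lt;p&gt;For readers who haven't seen the shape, here is a generic sketch of that channel-and-&lt;code&gt;select&lt;/code&gt; pattern (not NornicDB's actual implementation): one goroutine owns the state and serves requests until told to stop.&lt;/p&gt;

```go
package main

import "fmt"

// worker serves read requests against state it exclusively owns and
// exits on done: the single-owner, channel-based shape that Go's
// `select` pushes an implementation toward.
func worker(state map[string]int, reqs <-chan string, resp chan<- int, done <-chan struct{}) {
	for {
		select {
		case key := <-reqs:
			resp <- state[key]
		case <-done:
			return
		}
	}
}

func main() {
	reqs := make(chan string)
	resp := make(chan int)
	done := make(chan struct{})
	go worker(map[string]int{"a": 1}, reqs, resp, done)

	reqs <- "a"
	fmt.Println(<-resp) // 1
	close(done)
}
```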

&lt;p&gt;&lt;strong&gt;The result?&lt;/strong&gt; A hybrid system that hits &lt;strong&gt;~0.6ms p50&lt;/strong&gt; for vector search and &lt;strong&gt;~1.6ms&lt;/strong&gt; for 1-hop graph traversals. The "box" didn't limit the performance—it ensured the AI built it correctly according to the plan.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion: Boxes, Not Blank Canvases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you’re struggling with AI-assisted development, stop giving your agents a blank canvas. A blank canvas is where hallucinations happen. Give them a &lt;strong&gt;box&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Go is that box. It isn’t opinionated in a way that restricts your freedom, but it is foundational in a way that forces the AI to implement your validated vision with rigor. When the language enforces the boundaries, the engineer is finally free to focus on the high-level architecture and the deep planning that "vibe coding" often skips.&lt;/p&gt;

&lt;p&gt;Is Go the perfect language? No. But for a rigorous AI-assisted engineering workflow, it’s the most reliable one we have.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I am the author of &lt;strong&gt;NornicDB&lt;/strong&gt;, an open-source hybrid database. You can see how these engineering patterns resulted in high-performance infrastructure at &lt;a href="https://github.com/orneryd/NornicDB" rel="noopener noreferrer"&gt;github.com/orneryd/NornicDB&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>agents</category>
      <category>sre</category>
      <category>llm</category>
    </item>
    <item>
      <title>~1ms hybrid graph + vector queries (network is now the bottleneck)</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Thu, 26 Mar 2026 00:16:17 +0000</pubDate>
      <link>https://dev.to/orneryd/1ms-hybrid-graph-vector-queries-network-is-now-the-bottleneck-340k</link>
      <guid>https://dev.to/orneryd/1ms-hybrid-graph-vector-queries-network-is-now-the-bottleneck-340k</guid>
      <description>&lt;p&gt;I finally have benchmark results worth sharing.&lt;/p&gt;




&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~0.6ms&lt;/strong&gt; p50 — vector search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~1.6ms&lt;/strong&gt; p50 — vector + 1-hop graph traversal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~6k–15k req/s&lt;/strong&gt; locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When deployed remotely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~110ms p50&lt;/strong&gt;, which closely matches the network RTT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ The database is fast enough that &lt;strong&gt;the network dominates total latency&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What was tested
&lt;/h3&gt;

&lt;p&gt;Two query types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vector only&lt;/strong&gt; (embedding similarity, top-k)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector + one-hop graph traversal&lt;/strong&gt; (expand into knowledge graph)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;800 requests&lt;/li&gt;
&lt;li&gt;noisy / real-ish text inputs&lt;/li&gt;
&lt;li&gt;concurrent execution&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Local (M3 Max 64GB, native macOS installer)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vector only&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: ~0.58ms&lt;/li&gt;
&lt;li&gt;p95: ~0.80ms&lt;/li&gt;
&lt;li&gt;~15.7k req/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vector + graph&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: ~1.6ms&lt;/li&gt;
&lt;li&gt;p95: ~2.3ms&lt;/li&gt;
&lt;li&gt;~6k req/s&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Remote (GCP, 8 cores, 32GB RAM)
&lt;/h3&gt;

&lt;p&gt;Client → server latency: ~110ms&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector only&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: ~110.7ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vector + graph&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: ~112.9ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The delta between local and remote ≈ network RTT.&lt;/p&gt;




&lt;h3&gt;
  
  
  What’s interesting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Adding graph traversal costs &lt;strong&gt;~1ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Latency distribution is tight (low variance)&lt;/li&gt;
&lt;li&gt;Hybrid queries behave as near-constant-time at small traversal depths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most systems treat this as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;vector DB + graph DB + glue code&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;one execution engine&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  How this compares (public numbers)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vector DBs (Pinecone / Weaviate / Qdrant)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Typically &lt;strong&gt;5–50ms p50&lt;/strong&gt; depending on index + scale&lt;/li&gt;
&lt;li&gt;Often network + ANN dominates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Neo4j (graph + vector)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graph queries: typically &lt;strong&gt;10–100ms+&lt;/strong&gt; depending on traversal&lt;/li&gt;
&lt;li&gt;Vector is a newer add-on layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TigerGraph&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong traversal performance (parallelized)&lt;/li&gt;
&lt;li&gt;Still generally &lt;strong&gt;multi-ms to 10s of ms&lt;/strong&gt; for real queries&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Important caveats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;These are &lt;strong&gt;single-node, in-memory-ish conditions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Dataset is not at billion-scale (yet)&lt;/li&gt;
&lt;li&gt;Remote throughput is &lt;strong&gt;latency-bound&lt;/strong&gt;, not compute-bound&lt;/li&gt;
&lt;li&gt;Found a &lt;strong&gt;response consistency bug&lt;/strong&gt; (fixed next)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What this suggests
&lt;/h3&gt;

&lt;p&gt;If hybrid queries are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~1–2ms compute&lt;/li&gt;
&lt;li&gt;+100ms network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then optimizing the DB further doesn’t matter unless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you colocate compute&lt;/li&gt;
&lt;li&gt;or batch / pipeline queries&lt;/li&gt;
&lt;/ul&gt;
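&lt;p&gt;The arithmetic behind that claim can be sketched directly (an illustrative model only; real pipelining is messier): batching pays the RTT once instead of once per query.&lt;/p&gt;

```go
package main

import "fmt"

// sequentialMs models n queries issued one at a time: each pays
// both the compute cost and a full network round trip.
func sequentialMs(n int, computeMs, rttMs float64) float64 {
	return float64(n) * (computeMs + rttMs)
}

// batchedMs models the same n queries sent as one batch: a single
// RTT, plus the per-query compute on the server.
func batchedMs(n int, computeMs, rttMs float64) float64 {
	return rttMs + float64(n)*computeMs
}

func main() {
	// 100 hybrid queries at ~1.6ms compute over a ~110ms RTT link.
	fmt.Printf("sequential: %.0fms\n", sequentialMs(100, 1.6, 110))
	fmt.Printf("batched:    %.0fms\n", batchedMs(100, 1.6, 110))
}
```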




&lt;h3&gt;
  
  
  Takeaway
&lt;/h3&gt;

&lt;p&gt;We’re hitting a point where:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;hybrid retrieval is cheaper than the network it rides on&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Looking for feedback on:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;deeper traversal benchmarks (2–3 hops)&lt;/li&gt;
&lt;li&gt;scaling behavior (dataset + concurrency)&lt;/li&gt;
&lt;li&gt;fair comparisons vs existing systems&lt;/li&gt;
&lt;li&gt;real-world workloads (RAG, entity resolution, etc.)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If this resonates (or sounds wrong), I’d love to hear why.&lt;/p&gt;

&lt;h3&gt;
  
  
  Addendum: test setup + external verification
&lt;/h3&gt;

&lt;p&gt;For anyone who wants to reproduce or challenge these numbers: the benchmark used a single-node dataset with &lt;strong&gt;67,280 nodes&lt;/strong&gt;, &lt;strong&gt;40,921 edges&lt;/strong&gt;, and &lt;strong&gt;67,298 embeddings&lt;/strong&gt; indexed with &lt;strong&gt;HNSW (CPU-only)&lt;/strong&gt;. The workload was &lt;strong&gt;800 requests per query type&lt;/strong&gt;, noisy natural-language prompts, concurrent clients, and two query shapes: (1) vector top-k, (2) vector top-k + &lt;strong&gt;1-hop&lt;/strong&gt; graph expansion over returned entities. Local runs used an M3 Max with the native macOS installer; remote runs were on GCP (8 vCPU, 32GB RAM).&lt;/p&gt;

&lt;p&gt;The key observation is straightforward: local compute stayed in low-ms, while remote p50 tracked client↔server RTT (~110ms), so end-to-end latency was network-bound. If you run this yourself, please share p50/p95, dataset size, and hop depth so results are directly comparable.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nodes&lt;/td&gt;
&lt;td&gt;67,280&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edges&lt;/td&gt;
&lt;td&gt;40,921&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;67,298&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector index&lt;/td&gt;
&lt;td&gt;HNSW, CPU-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request count&lt;/td&gt;
&lt;td&gt;800 per query type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query types&lt;/td&gt;
&lt;td&gt;Vector top-k; Vector top-k + 1-hop traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Verification queries (same shape)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Vector-only (same query shape as benchmark)&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NORNIC_USERNAME&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$NORNIC_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ENDPOINT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "statements":[
      {
        "statement":"CALL db.index.vector.queryNodes('&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'idx_original_text'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;', $topK, $text) YIELD node, score RETURN node.originalText AS originalText, score ORDER BY score DESC LIMIT $topK",
        "parameters":{"text":"get it delivered","topK":5},
        "resultDataContents":["row"]
      }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Vector + one-hop graph (same query shape as benchmark)&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NORNIC_USERNAME&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$NORNIC_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ENDPOINT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "statements":[
      {
        "statement":"CALL db.index.vector.queryNodes('&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'idx_original_text'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;', $topK, $text) YIELD node, score MATCH (node:OriginalText)-[:TRANSLATES_TO]-&amp;gt;(t:TranslatedText) WHERE t.language = $targetLang RETURN node.originalText AS originalText, score, t.language AS language, coalesce(t.auditedText, t.translatedText) AS translatedText ORDER BY score DESC, language LIMIT $topK",
        "parameters":{"text":"get it delivered","topK":5,"targetLang":"es"},
        "resultDataContents":["row"]
      }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/orneryd/NornicDB/releases/tag/v1.0.33" rel="noopener noreferrer"&gt;https://github.com/orneryd/NornicDB/releases/tag/v1.0.33&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>performance</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building a Low-Latency MVCC Graph+Vector Database: The Pitfalls That Actually Matter</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Wed, 25 Mar 2026 15:01:22 +0000</pubDate>
      <link>https://dev.to/orneryd/building-a-low-latency-mvcc-graphvector-database-the-pitfalls-that-actually-matter-6ac</link>
      <guid>https://dev.to/orneryd/building-a-low-latency-mvcc-graphvector-database-the-pitfalls-that-actually-matter-6ac</guid>
      <description>&lt;p&gt;Most posts about graph+vector systems focus on feature lists. The hard part is not features. It is maintaining low tail latency while preserving snapshot isolation, temporal history, and managed embeddings in one database runtime.&lt;/p&gt;

&lt;p&gt;This post focuses on the non-obvious engineering problems that showed up in production-like conditions, and the techniques that actually resolved them.&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Latency budgets are architecture budgets
&lt;/h2&gt;

&lt;p&gt;For hybrid retrieval, every boundary in the online path (transport, embedding, retrieval, rerank, graph materialization) adds fixed cost. If you need “instant-feeling” responses, boundary placement is a performance decision, not just an org-chart decision.&lt;/p&gt;

&lt;p&gt;The practical pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep protocol flexibility at the edge.&lt;/li&gt;
&lt;li&gt;Keep the hot retrieval and consistency path tight.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2) Snapshot isolation for graphs requires topology-aware validation
&lt;/h2&gt;

&lt;p&gt;In graph storage, SI is not just “row version check at commit.” You must validate graph structure races:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;edge creation racing with endpoint deletion&lt;/li&gt;
&lt;li&gt;concurrent adjacency mutations around node deletes&lt;/li&gt;
&lt;li&gt;traversal visibility consistency across snapshot boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without topology-aware commit validation, you can pass SI-style checks and still commit structurally invalid graph states.&lt;/p&gt;
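&lt;p&gt;A minimal sketch of what topology-aware commit validation can look like (field and function names here are invented, not NornicDB internals): an edge write is rejected if either endpoint was deleted by a transaction that committed after our snapshot was taken.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// nodeMeta carries the per-node facts commit validation needs.
// Illustrative layout only.
type nodeMeta struct {
	deleted   bool
	deletedAt uint64 // commit timestamp of the delete, if any
}

// validateEdgeCommit rejects an edge write when an endpoint was
// deleted by a transaction invisible to our snapshot: the classic
// edge-creation-vs-endpoint-deletion race described above.
func validateEdgeCommit(snapshotTS uint64, from, to nodeMeta) error {
	for _, n := range []nodeMeta{from, to} {
		if n.deleted && n.deletedAt > snapshotTS {
			return errors.New("write-write conflict: endpoint deleted after snapshot")
		}
	}
	return nil
}

func main() {
	alive := nodeMeta{}
	deletedLater := nodeMeta{deleted: true, deletedAt: 42}

	fmt.Println(validateEdgeCommit(40, alive, deletedLater)) // conflict
	fmt.Println(validateEdgeCommit(50, alive, alive))        // clean commit: <nil>
}
```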

&lt;h2&gt;
  
  
  3) MVCC retention can create historical lookup cliffs
&lt;/h2&gt;

&lt;p&gt;Once you introduce pruning, historical reads can degrade badly if lookup depends on sparse post-prune chains. This becomes visible only under real retention churn.&lt;/p&gt;

&lt;p&gt;The fix is to persist per-key retention anchors in MVCC metadata and resolve historical visibility from deterministic retained floors, not optimistic chain walking. That stabilizes historical lookups even after repeated prune cycles.&lt;/p&gt;
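&lt;p&gt;A simplified sketch of anchor-based resolution (invented field names; the real metadata layout is more involved): reads below the retained floor resolve at the floor, deterministically, instead of walking a chain that pruning may have thinned out.&lt;/p&gt;

```go
package main

import "fmt"

// version is one MVCC entry for a key.
type version struct {
	commitTS uint64
	value    string
}

// resolveAt returns the newest version visible at ts, never reaching
// below the key's retention anchor (the deterministic floor left
// behind by pruning).
func resolveAt(versions []version, anchorTS, ts uint64) (string, bool) {
	if ts < anchorTS {
		ts = anchorTS // reads below the floor resolve at the floor
	}
	var best *version
	for i := range versions {
		if versions[i].commitTS <= ts && (best == nil || versions[i].commitTS > best.commitTS) {
			best = &versions[i]
		}
	}
	if best == nil {
		return "", false
	}
	return best.value, true
}

func main() {
	vs := []version{{commitTS: 10, value: "a"}, {commitTS: 30, value: "b"}}
	v, _ := resolveAt(vs, 10, 5) // ts below the anchor: clamped up to it
	fmt.Println(v)
}
```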

&lt;h2&gt;
  
  
  4) “Current-only” indexing is mandatory when history exists
&lt;/h2&gt;

&lt;p&gt;Temporal history and online retrieval have different goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temporal history exists for audit/reconstruction&lt;/li&gt;
&lt;li&gt;online search exists for current relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If historical versions leak into live vector/keyword indexes, retrieval quality drifts and stale entities contaminate candidates. Current-only indexing for live search avoids that failure mode while preserving full historical queryability through MVCC/temporal APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  5) Async embeddings create intentional dual-latency behavior
&lt;/h2&gt;

&lt;p&gt;When the database manages embeddings, write behavior naturally splits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast commit path for transactional state&lt;/li&gt;
&lt;li&gt;deferred embedding work with longer completion windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is expected. The requirement is clear operational semantics and instrumentation that separates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commit latency&lt;/li&gt;
&lt;li&gt;queueing latency&lt;/li&gt;
&lt;li&gt;embedding execution latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that separation, healthy async behavior gets misdiagnosed as storage/query regressions.&lt;/p&gt;

&lt;h2&gt;
  
  
  6) NFS exposed lock contention that fast local storage hid
&lt;/h2&gt;

&lt;p&gt;A key lesson from this release cycle: moving to Docker + NFS did not just make things slower, it changed what was visible.&lt;/p&gt;

&lt;p&gt;On very fast local storage, some lock contention patterns were masked by short I/O stalls. Under NFS latency variance, those same code paths held locks across work that did not need to be in the critical section. Tail spikes made the contention obvious.&lt;/p&gt;

&lt;p&gt;What changed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We narrowed lock scope in storage API hot paths.&lt;/li&gt;
&lt;li&gt;We applied targeted unlock/relock boundaries around non-critical, longer-running work.&lt;/li&gt;
&lt;li&gt;We kept correctness-sensitive state transitions inside the minimal protected region.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result was not “NFS became fast.” The result was that storage-path lock contention stopped amplifying NFS latency into avoidable tail spikes.&lt;/p&gt;

&lt;h2&gt;
  
  
  7) Conflict semantics and retries are part of performance, not just correctness
&lt;/h2&gt;

&lt;p&gt;Under contention, raw engine-specific conflict leaks produce unstable client behavior and poor retry patterns. Normalizing conflicts into a consistent retryable class and using bounded retry helpers at API boundaries improves both correctness and latency predictability under concurrent load.&lt;/p&gt;
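&lt;p&gt;A sketch of such a boundary helper, assuming a single normalized retryable error class (names invented, not NornicDB's API):&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable is the single normalized conflict class callers see,
// regardless of which engine internals raised the conflict.
var errRetryable = errors.New("transient conflict, safe to retry")

// withRetry runs fn up to maxAttempts times, retrying only on the
// normalized retryable class; any other error returns immediately.
// Bounded attempts keep client latency predictable under contention.
func withRetry(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); err == nil || !errors.Is(err, errRetryable) {
			return err
		}
	}
	return err
}

func main() {
	attempts := 0
	err := withRetry(3, func() error {
		attempts++
		if attempts < 3 {
			return errRetryable // simulated write-write conflict
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```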

&lt;h2&gt;
  
  
  8) Timings must be interpreted by query shape, not averages
&lt;/h2&gt;

&lt;p&gt;Mixed workloads contain fundamentally different operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;point lookups&lt;/li&gt;
&lt;li&gt;indexed reads&lt;/li&gt;
&lt;li&gt;bulk scans/deletes&lt;/li&gt;
&lt;li&gt;embedding-triggering writes&lt;/li&gt;
&lt;li&gt;validation queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsecond reads and multi-second maintenance or async-adjacent writes can coexist in the same healthy system. Performance analysis only makes sense when timings are tied to operation class and execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The core challenge in this category is not “graph” and not “vector” in isolation. It is enforcing one coherent consistency and latency contract across transactional graph state, temporal history, and managed embedding workflows.&lt;/p&gt;

&lt;p&gt;The pitfalls above are where that contract usually breaks. They are also where the most meaningful performance and reliability gains came from in practice.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>In the New Agentic World: The Software Career Ladder Is Being Rewritten</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Wed, 04 Mar 2026 16:39:23 +0000</pubDate>
      <link>https://dev.to/orneryd/in-the-new-agentic-world-the-software-career-ladder-is-being-rewritten-4cek</link>
      <guid>https://dev.to/orneryd/in-the-new-agentic-world-the-software-career-ladder-is-being-rewritten-4cek</guid>
      <description>&lt;p&gt;I’m going to be direct: software is not going away, but the &lt;em&gt;shape&lt;/em&gt; of software work is changing faster than most people are willing to admit.&lt;/p&gt;

&lt;p&gt;I believe we are entering an agentic era where AI doesn’t just autocomplete code, it co-implements systems. That changes who gets hired, what skills are considered “core,” and where human judgment still matters.&lt;/p&gt;

&lt;p&gt;This post is intentionally forward-looking. I’ll separate &lt;strong&gt;my opinion/projection&lt;/strong&gt; from what is &lt;strong&gt;currently supported by evidence&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Thesis (Opinion)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Software architecture becomes rarer, higher-stakes, and more formal
&lt;/h3&gt;

&lt;p&gt;I expect fewer people to hold true architecture roles, and those roles to become more selective and possibly more credentialed over time. In an agentic world, architecture is no longer “draw boxes and arrows.” It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;defining system boundaries AI can safely operate within,&lt;/li&gt;
&lt;li&gt;setting policy and compliance constraints,&lt;/li&gt;
&lt;li&gt;owning failure modes and rollback design,&lt;/li&gt;
&lt;li&gt;deciding what must stay deterministic vs probabilistic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: fewer architects, but more responsibility per architect.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Data engineering fluency becomes the new baseline for “software engineer”
&lt;/h3&gt;

&lt;p&gt;I expect a big shift where what we currently call “data engineering” becomes normal engineering literacy. If your product has AI in it, then data quality, lineage, retrieval, embedding strategy, and observability are not specialist concerns - they’re table stakes.&lt;/p&gt;

&lt;p&gt;My stronger take: the engineer who can’t reason about data pipelines and model interfaces will feel like a frontend engineer in 2008 who refused to learn JavaScript.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) DevOps does not disappear - it evolves into AI governance in production
&lt;/h3&gt;

&lt;p&gt;DevOps/SRE becomes even more critical. The work shifts toward validating AI-proposed changes, enforcing guardrails, and making sure “it worked in a prompt” doesn’t become “we took down prod.”&lt;/p&gt;

&lt;p&gt;Infra will be increasingly generated, but trust will still be earned through verification, policy, and incident response.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) The entry-level ladder is getting steeper, right now
&lt;/h3&gt;

&lt;p&gt;The painful truth: a lot of junior-level code tasks are exactly what agents absorb first. That doesn’t mean juniors are useless; it means old apprenticeship pathways are breaking before new ones are built.&lt;/p&gt;

&lt;p&gt;Large companies with structured graduate programs may keep hiring at scale. Everyone else may expect “AI-augmented mid-level output” from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Current Evidence Supports
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Strong support: AI/data skills are rising fast
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;WEF Future of Jobs 2025&lt;/strong&gt; reports strong growth in AI, big data, and software-related roles and skills.
Source: &lt;a href="https://www.weforum.org/publications/the-future-of-jobs-report-2025/" rel="noopener noreferrer"&gt;World Economic Forum&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BLS&lt;/strong&gt; still projects strong software developer growth while data-centric occupations remain among the fastest-rising categories.
Source: &lt;a href="https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm" rel="noopener noreferrer"&gt;U.S. Bureau of Labor Statistics&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Strong support: DevOps/platform quality becomes more important with AI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DORA&lt;/strong&gt; findings suggest AI can improve parts of the development process, but delivery outcomes depend heavily on platform quality and operational fundamentals.
Source: &lt;a href="https://dora.dev/research/2024/dora-report/" rel="noopener noreferrer"&gt;DORA 2024 Report&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Moderate-to-strong support: junior role pressure is real, but uneven
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;There is growing evidence and credible discussion that entry-level pathways are under pressure as AI handles routine implementation work.&lt;/li&gt;
&lt;li&gt;At the same time, hiring patterns are uneven by company type, geography, and maturity of internal training pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Strong support: AI coding tools increase productivity in many contexts - but not automatically
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Studies around AI coding assistants show productivity and confidence gains in many settings.&lt;/li&gt;
&lt;li&gt;Results vary by team process, review culture, and test rigor; quality regressions can occur without guardrails.

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness" rel="noopener noreferrer"&gt;GitHub Copilot research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://survey.stackoverflow.co/2024/ai" rel="noopener noreferrer"&gt;Stack Overflow 2024 AI survey&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where I’m Projecting Beyond the Data (And I’m Owning That)
&lt;/h2&gt;

&lt;p&gt;These are my bets, not settled facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;“Few software architects”&lt;/strong&gt;: evidence shows role polarization, but there is no definitive evidence yet of a shift toward formal accreditation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“Data engineering becomes the default SWE identity”&lt;/strong&gt;: evidence supports convergence of skills, but not full replacement of traditional engineering tracks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“Junior ladder pulled up”&lt;/strong&gt;: directionally supported, but likely to be cyclical and industry-dependent rather than absolute.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Practical Career Map for the Agentic Era
&lt;/h2&gt;

&lt;p&gt;If you’re a student, junior, or mid-level engineer, here’s the adaptation path I believe matters most:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learn systems + data together&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Build things where retrieval, metrics, and model behavior are first-class concerns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treat AI output as untrusted code&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Verification, testing, and failure analysis are career accelerators now.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Develop “prompt-to-production” judgment&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Anyone can generate code; fewer people can make safe, maintainable, compliant systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in public with measurable outcomes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Show latency reductions, lower error rates, improved reliability - not just demos.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Get good at platform constraints&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
CI/CD, policy-as-code, secrets, observability, rollback plans: this is where human leverage compounds.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;I don’t think software engineering is dying. I think it’s splitting.&lt;/p&gt;

&lt;p&gt;One path becomes high-trust engineering: architecture, data systems, platform reliability, and governance.&lt;br&gt;&lt;br&gt;
The other path becomes commoditized implementation mediated by agents.&lt;/p&gt;

&lt;p&gt;My opinion is simple: the winners won’t be the people who “use AI.”&lt;br&gt;&lt;br&gt;
They’ll be the people who can &lt;strong&gt;direct, constrain, verify, and operationalize AI&lt;/strong&gt; at system level.&lt;/p&gt;

&lt;p&gt;That’s the new craft.&lt;/p&gt;

</description>
      <category>careerdevelopment</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>The Full Graph-RAG Stack As Declarative Pipelines in Cypher</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Wed, 04 Mar 2026 16:15:04 +0000</pubDate>
      <link>https://dev.to/orneryd/the-full-graph-rag-stack-as-declarative-pipelines-in-cypher-dn1</link>
      <guid>https://dev.to/orneryd/the-full-graph-rag-stack-as-declarative-pipelines-in-cypher-dn1</guid>
      <description>&lt;p&gt;Most RAG systems aren’t architected so much as assembled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embedding service&lt;/li&gt;
&lt;li&gt;vector search service&lt;/li&gt;
&lt;li&gt;reranker&lt;/li&gt;
&lt;li&gt;LLM endpoint&lt;/li&gt;
&lt;li&gt;application glue for retries, timeouts, auth, and marshaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works, until you spend more time maintaining orchestration than improving retrieval quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/orneryd/NornicDB/commits/main/?since=2026-03-03&amp;amp;until=2026-03-03" rel="noopener noreferrer"&gt;https://github.com/orneryd/NornicDB/commits/main/?since=2026-03-03&amp;amp;until=2026-03-03&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This update to &lt;strong&gt;NornicDB&lt;/strong&gt; changes that model: retrieval, embedding, reranking, and inference are now first-class Cypher procedures. The important part is not “new APIs.” The important part is that these stages now execute as part of the query engine.&lt;/p&gt;




&lt;h2&gt;
  
  
  What landed
&lt;/h2&gt;

&lt;p&gt;New Cypher primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;db.retrieve&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.rretrieve&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.rerank&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.infer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.index.vector.embed&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are read-oriented pipeline operators designed to compose inside Cypher, not wrappers around separate app-tier flows.&lt;/p&gt;
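&lt;p&gt;For the simplest flows, a single retrieval call can stand in for the whole pipeline. A minimal sketch (the parameter map and yielded columns for &lt;code&gt;db.retrieve&lt;/code&gt; are assumed here for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical minimal call; parameter and column names are illustrative
CALL db.retrieve({query: $query, limit: 10}) YIELD id, content, score
RETURN id, score
ORDER BY score DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;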




&lt;h2&gt;
  
  
  Why this is materially different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) The pipeline is now declarative and inspectable
&lt;/h3&gt;

&lt;p&gt;Instead of this in app code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;vectorSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;you can express the same intent in Cypher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.index.vector.embed&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;$query&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.index.vector.queryNodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'doc_index'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;({&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;node.id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;content:&lt;/span&gt; &lt;span class="nf"&gt;coalesce&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node.content&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;)),&lt;/span&gt; &lt;span class="py"&gt;score:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.rerank&lt;/span&gt;&lt;span class="ss"&gt;({&lt;/span&gt;&lt;span class="py"&gt;query:&lt;/span&gt; &lt;span class="n"&gt;$query&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;candidates:&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;rerankTopK&lt;/span&gt;&lt;span class="dl"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes the pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;versionable&lt;/li&gt;
&lt;li&gt;benchmarkable&lt;/li&gt;
&lt;li&gt;testable&lt;/li&gt;
&lt;li&gt;subject to the same query planning and execution semantics as any other stage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) Less orchestration overhead in the hot path
&lt;/h3&gt;

&lt;p&gt;You still call models. But you remove a lot of unnecessary app-layer choreography between stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fewer service hops&lt;/li&gt;
&lt;li&gt;less JSON marshalling back-and-forth&lt;/li&gt;
&lt;li&gt;fewer per-hop retries/timeouts to coordinate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces tail latency and shrinks the operational failure surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Graph + semantic logic are fused in one plan
&lt;/h3&gt;

&lt;p&gt;Because these are Cypher stages, you can combine semantic retrieval with graph constraints directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;u:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;$userId&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:MEMBER_OF&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;g:&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.index.vector.embed&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;$query&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.index.vector.queryNodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'doc_index'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:VISIBLE_TO&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;({&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;node.id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;content:&lt;/span&gt; &lt;span class="nf"&gt;coalesce&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node.content&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;)),&lt;/span&gt; &lt;span class="py"&gt;score:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.rerank&lt;/span&gt;&lt;span class="ss"&gt;({&lt;/span&gt;&lt;span class="py"&gt;query:&lt;/span&gt; &lt;span class="n"&gt;$query&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;candidates:&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not “vector DB results, then post-filter in app code.” It’s one composable query flow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Query planner + cache: why this is practical, not just ergonomic
&lt;/h2&gt;

&lt;p&gt;Adding new procedures is easy. Making them production-usable is harder. The key enabler is how query planning and caching interact with these primitives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Planning path
&lt;/h3&gt;

&lt;p&gt;NornicDB already routes &lt;code&gt;CALL&lt;/code&gt; procedures through the Cypher executor dispatch path. That means these RAG primitives participate in the same execution flow as other query stages, rather than being side-channel operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query plan cache
&lt;/h3&gt;

&lt;p&gt;NornicDB keeps a parsed/structured plan cache for repeated query shapes. For RAG workloads, this matters because many queries are template-like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;same Cypher structure&lt;/li&gt;
&lt;li&gt;different parameters (&lt;code&gt;$query&lt;/code&gt;, &lt;code&gt;$userId&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the engine avoids repeated parse/analysis overhead for the same pipeline shape, and only rebinds inputs.&lt;/p&gt;
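&lt;p&gt;Concretely, a template like this is parsed and planned once, then re-executed with different parameter bindings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// One query shape: the plan cache keys on structure, not parameter values
CALL db.index.vector.embed($query) YIELD embedding
CALL db.index.vector.queryNodes('doc_index', 20, embedding) YIELD node, score
RETURN node.id AS id, score
// Run with {query: 'where are prescriptions?'} and later with
// {query: 'refill policy'}: same cached plan, only inputs are rebound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;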

&lt;h3&gt;
  
  
  Result cache policy (important boundaries)
&lt;/h3&gt;

&lt;p&gt;Read-query result caching now draws an intentional line between these primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cacheable by default:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;db.retrieve&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.rretrieve&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.rerank&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;db.index.vector.embed&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;db.infer&lt;/code&gt; is &lt;strong&gt;not&lt;/strong&gt; cached by default

&lt;ul&gt;
&lt;li&gt;can be opted in per call (&lt;code&gt;cache: true&lt;/code&gt; / &lt;code&gt;cache_enabled: true&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is the right split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval/rerank/embed are often deterministic enough for reuse under normal invalidation rules&lt;/li&gt;
&lt;li&gt;inference can be non-deterministic and should require explicit opt-in&lt;/li&gt;
&lt;/ul&gt;
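&lt;p&gt;When repeated identical prompts are expected (FAQ-style traffic, canned summaries), inference caching can be opted into per call. A sketch, assuming &lt;code&gt;db.infer&lt;/code&gt; takes a parameter map like the other stages (the &lt;code&gt;prompt&lt;/code&gt; and &lt;code&gt;context&lt;/code&gt; keys and the yielded column are illustrative; &lt;code&gt;cache: true&lt;/code&gt; is the documented opt-in):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Explicit opt-in: inference results are not cached by default
CALL db.infer({prompt: $query, context: candidates, cache: true}) YIELD answer
RETURN answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;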

&lt;h3&gt;
  
  
  Correctness under writes
&lt;/h3&gt;

&lt;p&gt;Cached read results follow normal invalidation behavior on writes. So this is not “cache forever”; it is “cache when safe, invalidate on data mutation.”&lt;/p&gt;

&lt;p&gt;Net effect: you keep low overhead for repeated pipeline templates without pretending inference is always deterministic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Procedure boundaries (clear contract)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;db.retrieve&lt;/code&gt;: retrieval stage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db.rretrieve&lt;/code&gt;: retrieval shorthand, auto-rerank if configured/available&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db.rerank&lt;/code&gt;: &lt;strong&gt;true Stage-2 API&lt;/strong&gt; over caller-provided candidates (does not run retrieval)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db.index.vector.embed&lt;/code&gt;: returns embedding array for explicit manual pipeline control&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db.infer&lt;/code&gt;: inference stage, default non-cached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That split keeps simple flows short and advanced flows explicit.&lt;/p&gt;
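&lt;p&gt;For comparison, the shorthand stage collapses retrieval and (when configured) reranking into a single call. A sketch with assumed parameter names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical shorthand form: retrieval + auto-rerank in one stage
CALL db.rretrieve({query: $query, limit: 20}) YIELD id, final_score
RETURN id, final_score
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;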




&lt;h2&gt;
  
  
  What this is not
&lt;/h2&gt;

&lt;p&gt;This does &lt;strong&gt;not&lt;/strong&gt; mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;instant hosted model platform&lt;/li&gt;
&lt;li&gt;one-line “AI solved” pipeline&lt;/li&gt;
&lt;li&gt;no tradeoffs in model/provider choice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You still choose providers and quality/latency/cost tradeoffs. What changed is where orchestration logic lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common patterns today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;vector systems with retrieval APIs, but app-driven orchestration&lt;/li&gt;
&lt;li&gt;graph + external RAG glue&lt;/li&gt;
&lt;li&gt;managed black-box pipelines with limited control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach is different: orchestration becomes query-native and composable in Cypher, with planner/cache semantics instead of ad-hoc application control flow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;The main gain is not syntactic convenience. It is reducing accidental complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fewer moving parts outside the data layer&lt;/li&gt;
&lt;li&gt;fewer duplicated pipelines across services/repos&lt;/li&gt;
&lt;li&gt;better observability and repeatability for retrieval flows&lt;/li&gt;
&lt;li&gt;easier benchmarkability of real pipeline templates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strategic question shifts from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How should we glue these services together?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Which query pipeline shape should we run for this workload?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a better problem to have.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Architectural Consolidation for Low-Latency Retrieval Systems: Why We Co-Located Transport, Embedding, Search, and Reranking</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Mon, 02 Mar 2026 18:13:10 +0000</pubDate>
      <link>https://dev.to/orneryd/architectural-consolidation-for-low-latency-retrieval-systems-why-we-co-located-transport-4kci</link>
      <guid>https://dev.to/orneryd/architectural-consolidation-for-low-latency-retrieval-systems-why-we-co-located-transport-4kci</guid>
      <description>&lt;p&gt;Most Graph-RAG systems are built as a chain of services:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;API ingress
&lt;/li&gt;
&lt;li&gt;query embedding service
&lt;/li&gt;
&lt;li&gt;vector DB
&lt;/li&gt;
&lt;li&gt;sparse/BM25 service
&lt;/li&gt;
&lt;li&gt;fusion/rerank service
&lt;/li&gt;
&lt;li&gt;generation service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh1zu4y2o83gyp6z72vy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh1zu4y2o83gyp6z72vy.png" alt="Typical Graph-RAG Architecture" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That decomposition is clean on paper. It is rarely cheap on the critical path.&lt;/p&gt;

&lt;p&gt;NornicDB made a deliberate architectural trade: &lt;strong&gt;co-locate the online retrieval path in one runtime/container&lt;/strong&gt; (transport, query embedding, retrieval, fusion, rerank, and response assembly) and optimize that path hard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vzl2z0k7g0deuhw75g8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vzl2z0k7g0deuhw75g8.png" alt="NornicDB co-location" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post is about that choice: what it buys, what it costs, and how we mitigate the costs in code today.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why consolidate at all?
&lt;/h3&gt;

&lt;p&gt;If you split 5 stages across services and each boundary adds even ~1.0–1.5 ms of serialization/network/scheduler overhead, you can burn &lt;strong&gt;5–7.5 ms&lt;/strong&gt; before meaningful retrieval work.&lt;/p&gt;

&lt;p&gt;That’s basically the whole budget for “feels instant” search.&lt;/p&gt;

&lt;p&gt;In the co-located NornicDB path, we cut most of that boundary tax out. In a recent run against a 1M-document corpus, we saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026/02/18 08:01:14 🔍 Search request database="nornic" query="where a prescriptions?"
2026/02/18 08:01:14 ⏱️ Search timing: method=rrf_hybrid cache_hit=false fallback=false total_ms=0 vector_ms=0 bm25_ms=0 fusion_ms=0 candidates[v=26,b=0,f=26] returned=20 query="where a prescriptions?"
[HTTP] POST /nornicdb/search 200 7.96575ms
2026/02/18 08:01:36 🔍 Search request database="nornic" query="where to get the drugs?"
2026/02/18 08:01:36 ⏱️ Search timing: method=rrf_hybrid cache_hit=false fallback=false total_ms=0 vector_ms=0 bm25_ms=0 fusion_ms=0 candidates[v=26,b=0,f=26] returned=20 query="where to get the drugs?"
[HTTP] POST /nornicdb/search 200 7.334291ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mean from those two samples: &lt;strong&gt;~7.65 ms&lt;/strong&gt; end-to-end HTTP.&lt;/p&gt;




&lt;h3&gt;
  
  
  The architectural shape we optimized for
&lt;/h3&gt;

&lt;p&gt;NornicDB keeps compatibility/protocol flexibility at the edge (Bolt/Cypher, REST, GraphQL, gRPC including Qdrant-compatible flows), but collapses online retrieval internals into one operational surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in-process embedding path&lt;/li&gt;
&lt;li&gt;in-process hybrid retrieval orchestration&lt;/li&gt;
&lt;li&gt;in-process optional stage-2 reranking&lt;/li&gt;
&lt;li&gt;in-process transactional graph + vector state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the core reason deployment can be “single container, one runtime, one rollback unit” instead of “service choreography.”&lt;/p&gt;




&lt;h3&gt;
  
  
  Why compressed ANN exists in this architecture
&lt;/h3&gt;

&lt;p&gt;Compression wasn’t added as a “nice-to-have index type.”&lt;br&gt;&lt;br&gt;
It was added as a &lt;strong&gt;scaling lever that preserves the single-runtime model longer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Raw 1024-d float32 vector = 1024 × 4 = &lt;strong&gt;4096 bytes&lt;/strong&gt; before indexing overhead.&lt;br&gt;&lt;br&gt;
At scale, memory bandwidth and cache locality become the bottleneck, not just algorithmic complexity.&lt;/p&gt;

&lt;p&gt;With IVFPQ-style compression, the per-vector payload can drop by an order of magnitude or more (profile-dependent), which improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in-memory density&lt;/li&gt;
&lt;li&gt;cache residency&lt;/li&gt;
&lt;li&gt;tail-latency stability under load&lt;/li&gt;
&lt;li&gt;throughput per dollar on fixed hardware&lt;/li&gt;
&lt;/ul&gt;
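&lt;p&gt;As a rough illustration of the footprint difference (actual ratios depend on the chosen compression profile; the 64-byte code size below is an assumed example, not a NornicDB default):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw float32:   1024 dims × 4 bytes          = 4096 bytes/vector
IVFPQ codes:   e.g. 64 subvectors × 1 byte  =   64 bytes/vector (~64× smaller)
at 1M vectors: ~4.1 GB raw vs ~64 MB compressed (plus codebook/index overhead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;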

&lt;p&gt;In code, compressed mode is explicitly gated and safety-wrapped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pkg/search/search.go&lt;/code&gt; uses compressed profile resolution&lt;/li&gt;
&lt;li&gt;if compressed profile is inactive -&amp;gt; standard path&lt;/li&gt;
&lt;li&gt;if compressed build/load fails -&amp;gt; &lt;strong&gt;automatic fallback to standard path&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So compression is a scalability primitive, not a reliability gamble.&lt;/p&gt;




&lt;h2&gt;
  
  
  Costs of co-location, and how NornicDB mitigates each one
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Reduced independent scaling of subcomponents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; embedding/rerank/generation cannot be scaled as independent deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigations implemented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-database overrides&lt;/strong&gt; for embedding/search/HNSW/k-means and related knobs (&lt;code&gt;docs/operations/configuration.md&lt;/code&gt;), so you can tune behavior by workload without splitting the whole system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider decoupling at runtime&lt;/strong&gt;: embedding and rerank can be local or external (OpenAI/Ollama/HTTP) via config (&lt;code&gt;pkg/server/server.go&lt;/code&gt;, &lt;code&gt;docs/operations/configuration.md&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Planned next step: &lt;strong&gt;sharding roadmap&lt;/strong&gt; (&lt;code&gt;docs/plans/sharding*.md&lt;/code&gt;) for horizontal scale without returning to “everything is a remote hop.”&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2) Tighter resource coupling (CPU/memory/cache contention)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; one process means shared contention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigations implemented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File-backed vector store&lt;/strong&gt; path to bound RAM during large builds and persistence (&lt;code&gt;pkg/search/search.go&lt;/code&gt;: &lt;code&gt;vectorFileStore&lt;/code&gt; low-RAM path).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime strategy switching&lt;/strong&gt; across CPU brute/GPU brute/HNSW using thresholds (&lt;code&gt;pkg/search/search.go&lt;/code&gt;, &lt;code&gt;docs/operations/configuration.md&lt;/code&gt;), with debounced transitions and replay-before-cutover behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compressed ANN mode&lt;/strong&gt; to reduce memory footprint and bandwidth pressure at high vector counts.&lt;/li&gt;
&lt;li&gt;Async write and queue controls exposed via config for throughput/consistency tuning (&lt;code&gt;docs/operations/configuration.md&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3) Larger blast radius per deploy
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; one deploy can affect the full online path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigations implemented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fail-open reranking load path&lt;/strong&gt;: server starts immediately; reranker loads async; if unavailable/health-check fails, search continues without stage-2 rerank (&lt;code&gt;pkg/server/server.go&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-open rerank execution&lt;/strong&gt;: rerank errors revert to original order (&lt;code&gt;pkg/search/search.go&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compressed ANN fallback&lt;/strong&gt;: compression failures fall back to standard retrieval path (&lt;code&gt;pkg/search/search.go&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version/compat checks + rebuild path&lt;/strong&gt; for persisted indexes (&lt;code&gt;docs/operations/configuration.md&lt;/code&gt;, &lt;code&gt;pkg/search/search.go&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4) Harder team autonomy boundaries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; fewer service boundaries can blur ownership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigations implemented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit extension seams via &lt;strong&gt;plugin systems&lt;/strong&gt; (APOC-style and Heimdall plugin interfaces) in &lt;code&gt;docs/user-guides/heimdall-plugins.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Protocol boundaries remain explicit at API edges (Bolt/Cypher, REST, GraphQL, gRPC), so interface ownership is still clear even when runtime is co-located.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5) Vendor/runtime lock-in risk
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; too many in-process optimizations can trap you in one stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigations implemented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Protocol pluralism&lt;/strong&gt; in the product surface: Bolt/Cypher, REST, GraphQL, Qdrant-compatible gRPC, additive native gRPC (&lt;code&gt;README.md&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider pluralism&lt;/strong&gt; for model execution: local + external provider modes for embedding/rerank (&lt;code&gt;docs/operations/configuration.md&lt;/code&gt;, &lt;code&gt;pkg/server/server.go&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Compatibility-first stance (Neo4j + Qdrant workflows) keeps migration cost low.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tradeoff summary
&lt;/h2&gt;

&lt;p&gt;NornicDB’s stance is not “microservices are bad.”&lt;br&gt;&lt;br&gt;
It’s: &lt;strong&gt;for this workload, on this latency budget, boundary placement is a performance decision first&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your top concern is strict per-stage org isolation, split services.&lt;/li&gt;
&lt;li&gt;If your top concern is single-digit-ms retrieval with simpler operations, co-location wins more often.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NornicDB chose co-location, then added mitigations to avoid common co-location failure modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;configurable per-DB policy&lt;/li&gt;
&lt;li&gt;runtime strategy adaptation&lt;/li&gt;
&lt;li&gt;compressed ANN for memory scale&lt;/li&gt;
&lt;li&gt;fail-open degradation paths&lt;/li&gt;
&lt;li&gt;future sharding trajectory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is the architecture story:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;one deployable runtime today, with deliberate seams for scale tomorrow.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>latency</category>
      <category>scaling</category>
      <category>knowledgegraph</category>
    </item>
    <item>
      <title>Cutting Cypher Latency: Streaming Traversal and Query-Shape Specialization in NornicDB</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Thu, 26 Feb 2026 18:56:46 +0000</pubDate>
      <link>https://dev.to/orneryd/cutting-cypher-latency-streaming-traversal-and-query-shape-specialization-in-nornicdb-2j68</link>
      <guid>https://dev.to/orneryd/cutting-cypher-latency-streaming-traversal-and-query-shape-specialization-in-nornicdb-2j68</guid>
      <description>&lt;p&gt;Below are the headline numbers that motivated the execution model choices in NornicDB. They’re presented first so you can calibrate the rest of the post: the goal is not “benchmarks as marketing,” but to show the scale of the overhead we’re targeting and then explain where it comes from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results at a glance (same hardware)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LDBC Social Network Benchmark (M3 Max, 64GB)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Type&lt;/th&gt;
&lt;th&gt;NornicDB&lt;/th&gt;
&lt;th&gt;Neo4j&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Message content lookup&lt;/td&gt;
&lt;td&gt;6,389 ops/sec&lt;/td&gt;
&lt;td&gt;518 ops/sec&lt;/td&gt;
&lt;td&gt;12×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recent messages (friends)&lt;/td&gt;
&lt;td&gt;2,769 ops/sec&lt;/td&gt;
&lt;td&gt;108 ops/sec&lt;/td&gt;
&lt;td&gt;25×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg friends per city&lt;/td&gt;
&lt;td&gt;4,713 ops/sec&lt;/td&gt;
&lt;td&gt;91 ops/sec&lt;/td&gt;
&lt;td&gt;52×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tag co-occurrence&lt;/td&gt;
&lt;td&gt;2,076 ops/sec&lt;/td&gt;
&lt;td&gt;65 ops/sec&lt;/td&gt;
&lt;td&gt;32×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Northwind Benchmark (M3 Max, 64GB)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;NornicDB&lt;/th&gt;
&lt;th&gt;Neo4j&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Index lookup&lt;/td&gt;
&lt;td&gt;7,623 ops/sec&lt;/td&gt;
&lt;td&gt;2,143 ops/sec&lt;/td&gt;
&lt;td&gt;3.6×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Count nodes&lt;/td&gt;
&lt;td&gt;5,253 ops/sec&lt;/td&gt;
&lt;td&gt;798 ops/sec&lt;/td&gt;
&lt;td&gt;6.6×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write: node&lt;/td&gt;
&lt;td&gt;5,578 ops/sec&lt;/td&gt;
&lt;td&gt;1,690 ops/sec&lt;/td&gt;
&lt;td&gt;3.3×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write: edge&lt;/td&gt;
&lt;td&gt;6,626 ops/sec&lt;/td&gt;
&lt;td&gt;1,611 ops/sec&lt;/td&gt;
&lt;td&gt;4.1×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Parser mode comparison (Northwind query suite)
&lt;/h3&gt;

&lt;p&gt;NornicDB supports two Cypher parser modes that can be switched at runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;⚡ nornic&lt;/strong&gt; (default): lightweight validation + direct execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🌳 antlr&lt;/strong&gt;: strict OpenCypher parsing + full parse tree (better diagnostics, higher overhead)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;⚡ nornic&lt;/th&gt;
&lt;th&gt;🌳 antlr&lt;/th&gt;
&lt;th&gt;Slowdown&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Count all nodes&lt;/td&gt;
&lt;td&gt;3,272 hz&lt;/td&gt;
&lt;td&gt;45 hz&lt;/td&gt;
&lt;td&gt;73×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Count all relationships&lt;/td&gt;
&lt;td&gt;3,693 hz&lt;/td&gt;
&lt;td&gt;50 hz&lt;/td&gt;
&lt;td&gt;74×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Find customer by ID&lt;/td&gt;
&lt;td&gt;4,213 hz&lt;/td&gt;
&lt;td&gt;2,153 hz&lt;/td&gt;
&lt;td&gt;2×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products supplied by supplier&lt;/td&gt;
&lt;td&gt;4,023 hz&lt;/td&gt;
&lt;td&gt;53 hz&lt;/td&gt;
&lt;td&gt;76×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supplier→Category traversal&lt;/td&gt;
&lt;td&gt;3,225 hz&lt;/td&gt;
&lt;td&gt;22 hz&lt;/td&gt;
&lt;td&gt;147×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products with/without orders&lt;/td&gt;
&lt;td&gt;3,881 hz&lt;/td&gt;
&lt;td&gt;0.82 hz&lt;/td&gt;
&lt;td&gt;4,753×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create/delete relationship&lt;/td&gt;
&lt;td&gt;3,974 hz&lt;/td&gt;
&lt;td&gt;62 hz&lt;/td&gt;
&lt;td&gt;64×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Suite runtime:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Total time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;⚡ nornic&lt;/td&gt;
&lt;td&gt;17.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌳 antlr&lt;/td&gt;
&lt;td&gt;35.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Those deltas—especially the big outliers—are what this post is about: where does that overhead come from, and what changes when you design around it?&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with “general” execution pipelines
&lt;/h2&gt;

&lt;p&gt;Most mature databases follow a layered approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse query text into a syntax tree&lt;/li&gt;
&lt;li&gt;Build a logical plan&lt;/li&gt;
&lt;li&gt;Optimize the plan (often cost-based)&lt;/li&gt;
&lt;li&gt;Produce a physical plan&lt;/li&gt;
&lt;li&gt;Execute the plan using a generic operator runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That architecture has real advantages: flexibility, correctness, and a framework for optimizing complex queries. But it also has costs that show up in production for common graph workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row-by-row operator overhead&lt;/strong&gt; (Volcano-style pipelines) can dominate lightweight traversals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intermediate materialization&lt;/strong&gt; increases memory traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Object churn&lt;/strong&gt; and indirections increase GC pressure and cache misses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning overhead&lt;/strong&gt; becomes noticeable when queries are small but frequent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many real-world graph applications—lookups, short traversals, neighborhood expansions, and simple aggregations—those overheads can outweigh the actual graph work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we built: a hybrid engine with streaming fast paths
&lt;/h2&gt;

&lt;p&gt;NornicDB takes a hybrid approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;general Cypher engine&lt;/strong&gt; to support a wide set of queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized streaming executors&lt;/strong&gt; for common traversal + aggregation shapes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime-switchable parsing modes&lt;/strong&gt; to trade strictness/debuggability for throughput.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default production mode favors minimal overhead in the hot path. For query shapes we know are common, we aim to fuse pattern matching and aggregation into tight loops and avoid expensive intermediate structures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stream-parse-execute (default mode)
&lt;/h3&gt;

&lt;p&gt;In the default “nornic” parser mode, the engine is designed around a stream-parse-execute approach. The intent is to avoid building heavy intermediate parse structures when we don’t need them, and to push execution decisions into a lightweight, shape-aware path.&lt;/p&gt;

&lt;p&gt;This is not a claim that NornicDB has “no planning” anywhere. The codebase still contains analysis artifacts and caching for specific features. The claim is narrower and more useful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For common traversal and aggregation shapes, NornicDB bypasses generic logical-plan execution and uses pattern-specialized, single-pass streaming executors.&lt;/p&gt;
&lt;/blockquote&gt;
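&lt;p&gt;A minimal sketch of that dispatch shape follows; the classifier below is a toy regexp stand-in, not NornicDB’s actual shape detection:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative sketch of shape-specialized dispatch: classify the query's
// shape first, and only fall back to the general engine when no specialized
// executor matches.

var countAllNodes = regexp.MustCompile(`(?i)^MATCH \(n\) RETURN count\(n\)$`)

// execute returns the result and whether the fast path handled the query.
func execute(query string, nodeCount int64) (int64, bool) {
	if countAllNodes.MatchString(query) {
		// Fast path: answer from a maintained counter, no scan, no plan.
		return nodeCount, true
	}
	// The general engine would run here; signal fallback for the sketch.
	return 0, false
}

func main() {
	n, fast := execute("MATCH (n) RETURN count(n)", 1_000_000)
	fmt.Println(n, fast) // 1000000 true
}
```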

&lt;h3&gt;
  
  
  Strict parsing when you want it: ANTLR mode
&lt;/h3&gt;

&lt;p&gt;NornicDB also supports an ANTLR-based parser mode. This mode is stricter and provides better error reporting (line/column), which is valuable during development and debugging. It’s also more expensive: building full parse trees and walking them introduces overhead that can dominate certain query classes.&lt;/p&gt;

&lt;p&gt;That tradeoff is intentional. The same engine can run in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production mode&lt;/strong&gt; (lower overhead, practical throughput)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debug mode&lt;/strong&gt; (strict validation and better diagnostics)&lt;/li&gt;
&lt;/ul&gt;
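&lt;p&gt;One way to picture a runtime-switchable parser mode is an atomically swapped function value, so in-flight queries keep the mode they started with. The names here are hypothetical, not NornicDB’s actual switch mechanism:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative shape of a runtime-switchable parser mode. atomic.Value
// lets readers load the active mode without locking; each query uses
// whichever mode was active when it started.

type parseFn func(query string) (string, error)

var current atomic.Value // holds a parseFn

func fastParse(q string) (string, error)   { return "nornic:" + q, nil }
func strictParse(q string) (string, error) { return "antlr:" + q, nil }

func setMode(strict bool) {
	if strict {
		current.Store(parseFn(strictParse))
	} else {
		current.Store(parseFn(fastParse))
	}
}

func parse(q string) (string, error) {
	return current.Load().(parseFn)(q)
}

func main() {
	setMode(false)
	out, _ := parse("MATCH (n) RETURN n")
	fmt.Println(out) // nornic:MATCH (n) RETURN n
	setMode(true)
	out, _ = parse("MATCH (n) RETURN n")
	fmt.Println(out) // antlr:MATCH (n) RETURN n
}
```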




&lt;h2&gt;
  
  
  Why this model performs well
&lt;/h2&gt;

&lt;p&gt;Performance improvements come from removing layers of overhead on the path that matters most for many graph workloads: traversal + filter + aggregate.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Fused traversal and aggregation
&lt;/h3&gt;

&lt;p&gt;For eligible query shapes, NornicDB executes traversal and aggregation in a single pass. Instead of producing intermediate row sets and feeding them through multiple generic operators, the executor performs direct scans and aggregates as it traverses.&lt;/p&gt;
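&lt;p&gt;On a toy adjacency list, the difference between the two styles looks like this; the data structures are illustrative, not NornicDB’s storage layout:&lt;/p&gt;

```go
package main

import "fmt"

// Sketch of the fusion idea: instead of materializing every (node, neighbor)
// row and aggregating afterwards, the fused version counts while it traverses.

type Graph struct {
	adj map[int][]int
}

// Unfused: build the intermediate row set, then aggregate over it.
func avgDegreeMaterialized(g Graph) float64 {
	var rows [][2]int
	for node, nbrs := range g.adj {
		for _, nb := range nbrs {
			rows = append(rows, [2]int{node, nb})
		}
	}
	if len(g.adj) == 0 {
		return 0
	}
	return float64(len(rows)) / float64(len(g.adj))
}

// Fused: aggregate during traversal; no intermediate rows are allocated.
func avgDegreeFused(g Graph) float64 {
	var edges int
	for _, nbrs := range g.adj {
		edges += len(nbrs)
	}
	if len(g.adj) == 0 {
		return 0
	}
	return float64(edges) / float64(len(g.adj))
}

func main() {
	g := Graph{adj: map[int][]int{1: {2, 3}, 2: {1}, 3: {1}}}
	fmt.Println(avgDegreeMaterialized(g), avgDegreeFused(g)) // both ~1.333
}
```

The results are identical; only the allocation and memory-traffic profile differs, which is where the gains described above come from.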

&lt;h3&gt;
  
  
  2) Streaming execution and early termination
&lt;/h3&gt;

&lt;p&gt;For a subset of query shapes, NornicDB’s execution can stream results and short-circuit work early—for example, when a query contains a LIMIT and the engine can stop once enough rows are produced.&lt;/p&gt;

&lt;p&gt;A precise statement is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Streaming traversal is real for optimized query classes, including LIMIT short-circuiting and selected no-materialization fast paths. This is shape-dependent, not universal for every Cypher query.&lt;/p&gt;
&lt;/blockquote&gt;
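&lt;p&gt;A sketch of LIMIT short-circuiting over a callback-driven scan (toy code, not NornicDB’s executor):&lt;/p&gt;

```go
package main

import "fmt"

// Sketch of LIMIT short-circuiting: the traversal yields rows through a
// callback and stops as soon as the consumer has enough, rather than
// materializing every match first.

// scan visits candidates in order and invokes emit for each match;
// it stops early when emit returns false. It returns how many
// candidates were actually visited, i.e. how much work was done.
func scan(candidates []int, match func(int) bool, emit func(int) bool) int {
	visited := 0
	for _, c := range candidates {
		visited++
		if match(c) && !emit(c) {
			break
		}
	}
	return visited
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i
	}
	var got []int
	visited := scan(data,
		func(v int) bool { return v%2 == 0 }, // predicate: even values
		func(v int) bool { // LIMIT 3: stop once three rows are emitted
			got = append(got, v)
			return len(got) < 3
		})
	fmt.Println(got, visited) // [0 2 4] 5: stopped after 5 of 1000 candidates
}
```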

&lt;h3&gt;
  
  
  3) Fewer intermediate structures in hot paths
&lt;/h3&gt;

&lt;p&gt;The largest gains often come not from clever algorithms, but from not doing unnecessary work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoiding full path materialization when only aggregates are needed&lt;/li&gt;
&lt;li&gt;Avoiding row-by-row operator dispatch&lt;/li&gt;
&lt;li&gt;Avoiding heavy parse trees in the production fast path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In traversal-heavy workloads, these effects compound.&lt;/p&gt;




&lt;h2&gt;
  
  
  A note on correctness: constraints and transactions
&lt;/h2&gt;

&lt;p&gt;Performance only matters if results are correct and operations are safe.&lt;/p&gt;

&lt;p&gt;NornicDB is not just a query interpreter. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema constraints&lt;/strong&gt; and validation logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit transaction control&lt;/strong&gt; (BEGIN / COMMIT / ROLLBACK)&lt;/li&gt;
&lt;li&gt;Storage-backed transaction handling for supported backends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A publication-safe way to state this is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NornicDB enforces schema constraints and supports explicit storage-backed transactions, while also using optimized fast paths for eligible query shapes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The real tradeoff: hot-path query shape management
&lt;/h2&gt;

&lt;p&gt;The largest downside of shape-specialized execution isn’t performance—it’s organizational cost.&lt;/p&gt;

&lt;p&gt;Every optimized path has a lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect and classify the shape reliably&lt;/li&gt;
&lt;li&gt;Implement an optimized executor&lt;/li&gt;
&lt;li&gt;Prove semantic equivalence with the general engine&lt;/li&gt;
&lt;li&gt;Add regression tests and performance baselines&lt;/li&gt;
&lt;li&gt;Keep it correct as Cypher features expand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is real management overhead, and historically it’s why many engines converge on generic operator runtimes.&lt;/p&gt;
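&lt;p&gt;The “prove semantic equivalence” step in that lifecycle can be sketched as a differential test that replays a corpus through both paths; both engines below are toy stand-ins:&lt;/p&gt;

```go
package main

import "fmt"

// Sketch of differential testing for a specialized executor: run every
// corpus entry through both the fast path and the reference engine and
// flag any divergence.

// referenceCount iterates like a generic operator pipeline would.
func referenceCount(rows []int) int {
	n := 0
	for range rows {
		n++
	}
	return n
}

// fastCount is the "specialized" path: answer without iterating.
func fastCount(rows []int) int { return len(rows) }

// differentialTest returns the indices of corpus entries where the two
// paths disagree; an empty result means the fast path is safe to ship.
func differentialTest(corpus [][]int) []int {
	var failures []int
	for i, rows := range corpus {
		if referenceCount(rows) != fastCount(rows) {
			failures = append(failures, i)
		}
	}
	return failures
}

func main() {
	corpus := [][]int{{}, {1}, {1, 2, 3}}
	fmt.Println(len(differentialTest(corpus))) // 0: the paths agree
}
```

This is the loop that agents can run continuously as shapes and executors accumulate.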

&lt;h3&gt;
  
  
  Why this tradeoff looks different now
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Historically, query-shape specialization has high human overhead. In an agent-driven world, the workload is more template-like, and agents can automate the specialization loop: mine top shapes, generate optimized executors, generate differential tests against a reference engine, and maintain coverage metrics. This shifts the work from manual tuning to automated verification and makes specialized execution economically viable again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key point isn’t that AI “writes the database for you.” It’s that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workloads become more template-like when generated by tools and agents.&lt;/li&gt;
&lt;li&gt;Specialization can be treated as a pipeline: observe → prioritize → implement → verify → measure.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What this model is best at (and what it’s not)
&lt;/h2&gt;

&lt;p&gt;This execution model shines when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries are traversal-heavy and relatively structured&lt;/li&gt;
&lt;li&gt;Workloads are dominated by a small set of templates&lt;/li&gt;
&lt;li&gt;You care about low latency and predictable performance&lt;/li&gt;
&lt;li&gt;Aggregations can be fused into traversal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not designed to claim universal dominance in every Cypher edge case. There will always be queries where a deep optimizer and a fully generalized runtime are the right tools. NornicDB’s approach is to optimize what matters most and retain a general path for everything else.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;NornicDB’s execution model is a deliberate choice: remove overhead from the hot path by using streaming, shape-specialized executors for common Cypher patterns, while maintaining constraints and transactional boundaries.&lt;/p&gt;

&lt;p&gt;If you’re curious, the best way to evaluate these claims is to run the benchmarks and inspect which queries hit optimized paths versus fallback behavior. Performance claims only matter when engineers can reproduce them—and that’s the bar we’re aiming for.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>How I sped up HNSW construction ~2.7x</title>
      <dc:creator>TJ Sweet</dc:creator>
      <pubDate>Mon, 23 Feb 2026 14:33:56 +0000</pubDate>
      <link>https://dev.to/orneryd/how-i-sped-up-hnsw-construction-27x-2jhn</link>
      <guid>https://dev.to/orneryd/how-i-sped-up-hnsw-construction-27x-2jhn</guid>
      <description>&lt;h2&gt;
  
  
  HNSW Build Time at 1M Embeddings: 27 Minutes to 10 Minutes by Fixing Insertion Order
&lt;/h2&gt;

&lt;p&gt;For a 1M-embedding corpus, we reduced HNSW construction time from about 27 minutes to about 10 minutes (2.7x) without changing recall or graph quality.&lt;/p&gt;

&lt;p&gt;This post explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the problem (where traversal work is wasted during construction),&lt;/li&gt;
&lt;li&gt;the solution (BM25-seeded insertion order),&lt;/li&gt;
&lt;li&gt;and the math behind the observed speedup.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All numbers in this writeup use the validated parameters from the current implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;M=16&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ef_construction=100&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;seed set size = &lt;code&gt;256 * 8 = 2,048&lt;/code&gt; nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Problem: Random insertion order creates traversal waste
&lt;/h2&gt;

&lt;p&gt;HNSW build quality and build cost both depend on insertion order. With random insertion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;early nodes form accidental local hubs,&lt;/li&gt;
&lt;li&gt;new nodes frequently enter a poor region first,&lt;/li&gt;
&lt;li&gt;greedy search spends extra distance evaluations before finding useful neighbors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That wasted traversal work compounds over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual A: Where random-order traversal waste comes from
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i307tin9lpkg5so44g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i307tin9lpkg5so44g2.png" alt="Random Insertion Order" width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In practice, this increases construction cost by a multiplicative overhead factor. I will call that factor &lt;code&gt;beta&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build_time = ideal_time * beta
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ideal_time&lt;/code&gt; is the minimum cost if each insert reaches good neighbors with minimal detours,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;beta &amp;gt; 1&lt;/code&gt; captures wasted traversal and repair work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Baseline mechanics: why layer-0 dominates at 1M scale
&lt;/h2&gt;

&lt;p&gt;Using &lt;code&gt;M=16&lt;/code&gt;, the level distribution gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;P(node at layer &amp;gt;= 1) = 1/M = 1/16 = 6.25%&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;so &lt;code&gt;93.75%&lt;/code&gt; of inserted nodes exist only in layer 0, which is where all of their traversal work happens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;code&gt;ef_construction=100&lt;/code&gt;, expected distance computations per insertion are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Upper layers: 0.067 * 100 * 16   =   107
Layer 0:     1.000 * 100 * 32    = 3,200
                                    -----
Total per insert                  = 3,307
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;32&lt;/code&gt; above is &lt;code&gt;2*M&lt;/code&gt;, the layer-0 connection bound.)&lt;/p&gt;

&lt;p&gt;So the primary optimization target is not exotic upper-layer behavior; it is reducing layer-0 traversal waste during insertion.&lt;/p&gt;
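&lt;p&gt;The per-insert arithmetic above is easy to reproduce directly. Note that the &lt;code&gt;0.067&lt;/code&gt; factor is the expected number of upper layers per node, &lt;code&gt;1/(M-1) = 1/15&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// Reproduces the per-insert estimate above: with M=16 the expected number
// of upper layers per node is 1/15 (about 0.067), each upper-layer search
// touches roughly ef*M candidates, and layer 0 touches ef*2M.
func distanceOpsPerInsert(M, ef int) float64 {
	upperLayers := 1.0 / float64(M-1) // expected upper layers per node
	upper := upperLayers * float64(ef) * float64(M)
	layer0 := float64(ef) * float64(2*M) // 2*M is the layer-0 connection bound
	return upper + layer0
}

func main() {
	ops := distanceOpsPerInsert(16, 100)
	fmt.Printf("%.0f distance ops per insert\n", ops) // 3307 distance ops per insert
}
```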

&lt;h2&gt;
  
  
  Solution: BM25-seeded insertion creates a backbone first
&lt;/h2&gt;

&lt;p&gt;Instead of random insertion order, we pick a lexically diverse seed set from BM25 and insert those vectors first.&lt;/p&gt;

&lt;p&gt;Seed extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take high-IDF terms,&lt;/li&gt;
&lt;li&gt;for each term, take top docs by term frequency,&lt;/li&gt;
&lt;li&gt;defaults: &lt;code&gt;NORNICDB_HNSW_LEXICAL_SEED_MAX_TERMS=256&lt;/code&gt;, &lt;code&gt;NORNICDB_HNSW_LEXICAL_SEED_PER_TERM=8&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;maximum seed set: &lt;code&gt;256 * 8 = 2,048&lt;/code&gt; nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;insert seed nodes first,&lt;/li&gt;
&lt;li&gt;insert remaining &lt;code&gt;N - seed_count&lt;/code&gt; nodes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives the graph a broad early backbone, so later inserts find useful neighbors quickly instead of wandering.&lt;/p&gt;
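&lt;p&gt;The seed-extraction steps can be sketched as follows; the in-memory maps are illustrative, since NornicDB drives this from its BM25 index:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// Sketch of seed extraction: rank terms by IDF, take the top docs per
// term, and cap the seed set at maxTerms*perTerm unique docs.
func seedSet(idf map[string]float64, topDocs map[string][]int, maxTerms, perTerm int) []int {
	terms := make([]string, 0, len(idf))
	for t := range idf {
		terms = append(terms, t)
	}
	// Highest-IDF (rarest, most discriminative) terms first.
	sort.Slice(terms, func(i, j int) bool { return idf[terms[i]] > idf[terms[j]] })
	if len(terms) > maxTerms {
		terms = terms[:maxTerms]
	}
	seen := map[int]bool{}
	var seeds []int
	for _, t := range terms {
		docs := topDocs[t]
		if len(docs) > perTerm {
			docs = docs[:perTerm] // assumed already ranked by term frequency
		}
		for _, d := range docs {
			if !seen[d] {
				seen[d] = true
				seeds = append(seeds, d)
			}
		}
	}
	return seeds
}

func main() {
	idf := map[string]float64{"rare": 4.2, "common": 0.3, "mid": 1.7}
	topDocs := map[string][]int{"rare": {7, 8, 9}, "mid": {8, 10}, "common": {1}}
	fmt.Println(seedSet(idf, topDocs, 2, 2)) // [7 8 10]
}
```

With the defaults (`256` terms, `8` docs per term) the cap works out to the `2,048`-node seed set used throughout this post.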

&lt;h3&gt;
  
  
  Visual B: Seed-first construction reduces detours
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x4cub1aggkg4t0ixbcu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x4cub1aggkg4t0ixbcu.png" alt="Seed-first construction reduces detours" width="800" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The math check on the 27 -&amp;gt; 10 minute result
&lt;/h2&gt;

&lt;p&gt;Use a conservative distance-op estimate for 1024-dim &lt;code&gt;float32&lt;/code&gt; vectors in Go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compute plus memory effects: about &lt;code&gt;160 ns&lt;/code&gt; per distance operation in this workload class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then ideal floor for 1M insertions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ideal_time
= 1,000,000 * 3,307 * 160 ns
= 529 s
= 8.8 min
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now map measured times to &lt;code&gt;beta&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;beta_random = 27 / 8.8 = 3.07
beta_seeded = 10 / 8.8 = 1.13
speedup     = beta_random / beta_seeded
            = 3.07 / 1.13
            = 2.72x ~= 2.7x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the key point: the reported speedup is exactly what you expect if seeded order mostly removes traversal waste.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual C: Overhead factor (&lt;code&gt;beta&lt;/code&gt;) before vs after
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;beta (lower is better)

Random order   | ############################### 3.07
Seeded order   | ###########                     1.13
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Interpretation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;random build spends about &lt;code&gt;3.07x&lt;/code&gt; the ideal work,&lt;/li&gt;
&lt;li&gt;seeded build is close to the floor at about &lt;code&gt;1.13x&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Time decomposition for the 1M run
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total build time decomposition (minutes)

Case            Ideal floor   Overhead   Total
Random order      8.8          18.2      27.0
Seeded order      8.8           1.2      10.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Visual D: Same floor, different overhead
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Random order | [#########.................][##################] 27.0
Seeded order | [#########.................][#]                  10.0
               ideal floor (8.8 min)       overhead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both runs share the same algorithmic floor; the difference is how much overhead is paid while traversing and wiring the graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this does not require a recall tradeoff
&lt;/h2&gt;

&lt;p&gt;This change does not reduce &lt;code&gt;ef_construction&lt;/code&gt;, &lt;code&gt;M&lt;/code&gt;, or search-time quality knobs. It changes insertion order so the builder spends less effort reaching good neighborhoods.&lt;/p&gt;

&lt;p&gt;That is why a large build-time gain can occur without reducing recall or graph quality: the graph is built with the same target connectivity constraints, but with less wasted traversal on the way there.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to reproduce in your environment
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Keep HNSW params fixed (&lt;code&gt;M&lt;/code&gt;, &lt;code&gt;ef_construction&lt;/code&gt; unchanged).&lt;/li&gt;
&lt;li&gt;Build once with seeding disabled (or an effectively empty seed set).&lt;/li&gt;
&lt;li&gt;Build once with defaults:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;NORNICDB_HNSW_LEXICAL_SEED_MAX_TERMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;256
&lt;span class="nv"&gt;NORNICDB_HNSW_LEXICAL_SEED_PER_TERM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Log and compare:
&lt;ul&gt;
&lt;li&gt;total build time,&lt;/li&gt;
&lt;li&gt;insertion throughput,&lt;/li&gt;
&lt;li&gt;recall on a fixed validation query set.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Use the ratio as the primary cross-machine signal. Absolute minutes depend on CPU, memory bandwidth, cache behavior, and runtime effects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Secondary effect: same seed mechanism helps k-means init
&lt;/h2&gt;

&lt;p&gt;The same BM25-derived seed mechanism is also used by &lt;code&gt;bm25+kmeans++&lt;/code&gt; seed mode for centroid initialization. That improves initial centroid spread and typically reduces convergence iterations in the k-means phase.&lt;/p&gt;

&lt;p&gt;The important architectural detail is reuse: one seed extraction pass supports both HNSW construction order and k-means initialization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The 27-to-10 minute result is not a tuning artifact. It is a direct consequence of reducing traversal waste during graph construction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the same quality parameters,&lt;/li&gt;
&lt;li&gt;improve insertion geometry,&lt;/li&gt;
&lt;li&gt;move &lt;code&gt;beta&lt;/code&gt; from about &lt;code&gt;3.07&lt;/code&gt; to about &lt;code&gt;1.13&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 1M scale, this is enough to produce a repeatable 2.7x build-time improvement while preserving result quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/orneryd/NornicDB" rel="noopener noreferrer"&gt;https://github.com/orneryd/NornicDB&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>hnsw</category>
      <category>rag</category>
      <category>indexing</category>
    </item>
  </channel>
</rss>
