<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Архипов Владимир</title>
    <description>The latest articles on DEV Community by Архипов Владимир (@__6c821c6f).</description>
    <link>https://dev.to/__6c821c6f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2636851%2Fdcbdd3d5-a099-4a8e-b1e2-c994036bc30c.png</url>
      <title>DEV Community: Архипов Владимир</title>
      <link>https://dev.to/__6c821c6f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/__6c821c6f"/>
    <language>en</language>
    <item>
      <title>Stop Calling External Vector DBs for Every RAG Query: A .NET Embedded Alternative</title>
      <dc:creator>Архипов Владимир</dc:creator>
      <pubDate>Fri, 05 Jun 2026 12:45:58 +0000</pubDate>
      <link>https://dev.to/__6c821c6f/stop-calling-external-vector-dbs-for-every-rag-query-a-net-embedded-alternative-2l9d</link>
      <guid>https://dev.to/__6c821c6f/stop-calling-external-vector-dbs-for-every-rag-query-a-net-embedded-alternative-2l9d</guid>
      <description>&lt;p&gt;Latency comparison included | When embedded wins | When cloud still wins&lt;/p&gt;

&lt;p&gt;You have a .NET 8 application. You need RAG (Retrieval-Augmented Generation). Your first instinct is to spin up Pinecone, Qdrant, or Weaviate.&lt;/p&gt;

&lt;p&gt;They work. But they add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;15–50 ms per query just for network roundtrips&lt;/li&gt;
&lt;li&gt;Serialization overhead (JSON/gRPC)&lt;/li&gt;
&lt;li&gt;An extra distributed system to monitor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For some applications, this is fine.&lt;/p&gt;

&lt;p&gt;For others — high-frequency agent loops, real-time copilots, edge devices, on-prem compliance — it's a dealbreaker.&lt;/p&gt;

&lt;p&gt;So I built an embedded vector database that runs entirely inside your .NET process: VectorRAG.Net.&lt;/p&gt;

&lt;p&gt;No network. No external service. Microsecond latency.&lt;/p&gt;

&lt;p&gt;But I'm not going to tell you it's always better. Let's be honest about when it wins, and when it doesn't.&lt;/p&gt;

&lt;p&gt;🔍 What Is VectorRAG.Net?&lt;/p&gt;

&lt;p&gt;VectorRAG.Net is a .NET 8+ library that implements a full RAG pipeline inside your application:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vector search (ANN via LSH + SIMD reranking)&lt;/li&gt;
&lt;li&gt;Hybrid search (vector + BM25)&lt;/li&gt;
&lt;li&gt;Automatic chunking&lt;/li&gt;
&lt;li&gt;Metadata filtering&lt;/li&gt;
&lt;li&gt;Snapshot persistence (save/load to file)&lt;/li&gt;
&lt;li&gt;Runtime metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;NuGet: VectorRAG.Net&lt;br&gt;
GitHub: &lt;a href="https://github.com/likeslines-maker/VectorRAG.Net" rel="noopener noreferrer"&gt;https://github.com/likeslines-maker/VectorRAG.Net&lt;/a&gt;&lt;/p&gt;

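&lt;p&gt;You create an instance and start adding documents: no separate database, no containers, no API keys (for the DB part).&lt;/p&gt;

&lt;p&gt;📊 Honest Benchmarks: Embedded vs Cloud vs Others&lt;/p&gt;

&lt;p&gt;Test environment: Windows 11, Intel Core i5-11400F, .NET 8, 10k vectors, dim=64 (synthetic for reproducibility)&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Operation&lt;/th&gt;&lt;th&gt;VectorRAG.Net&lt;/th&gt;&lt;th&gt;Pinecone / Qdrant&lt;/th&gt;&lt;th&gt;Other .NET libs&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Vector search (TopK=5)&lt;/td&gt;&lt;td&gt;15 μs&lt;/td&gt;&lt;td&gt;8-15 ms&lt;/td&gt;&lt;td&gt;0.5-5 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Hybrid search&lt;/td&gt;&lt;td&gt;117 μs&lt;/td&gt;&lt;td&gt;10-20 ms (limited support)&lt;/td&gt;&lt;td&gt;rarely exists&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Network overhead&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;10-40 ms&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Allocations per query&lt;/td&gt;&lt;td&gt;5.7 KB&lt;/td&gt;&lt;td&gt;50-200 KB (JSON)&lt;/td&gt;&lt;td&gt;variable&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Automatic chunking&lt;/td&gt;&lt;td&gt;built-in&lt;/td&gt;&lt;td&gt;you implement&lt;/td&gt;&lt;td&gt;rare&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Offline capable&lt;/td&gt;&lt;td&gt;✅&lt;/td&gt;&lt;td&gt;❌&lt;/td&gt;&lt;td&gt;varies&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;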

&lt;p&gt;Real-world takeaway: for dim=768 (a common size for real embeddings), expect VectorRAG.Net latency to be roughly 3-5x higher as a rough estimate. That is still about 50-150 microseconds per vector search, or roughly 100-300x faster than a cloud roundtrip.&lt;/p&gt;

&lt;p&gt;But speed isn't everything. Here's when each approach actually makes sense.&lt;/p&gt;

&lt;p&gt;✅ When Embedded RAG (VectorRAG.Net) Wins&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;High-Frequency Retrieval Loops&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scenario: An AI agent that performs 10-50 retrieval steps per user request.&lt;/p&gt;

&lt;p&gt;Cloud math: 30 steps × 20 ms = 600 ms just for retrieval.&lt;br&gt;
Embedded math: 30 steps × 0.1 ms = 3 ms.&lt;/p&gt;

&lt;p&gt;User experience difference: unusable vs snappy.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Air-Gapped / Offline Systems&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Real-world examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Defense and intelligence systems (data cannot leave the process)&lt;/li&gt;
&lt;li&gt;Avionics (no network in flight)&lt;/li&gt;
&lt;li&gt;Medical devices (offline compliance)&lt;/li&gt;
&lt;li&gt;Industrial SCADA (network segmentation for safety)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cloud vector DBs are simply not allowed here.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Edge Devices &amp;amp; Desktop Apps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You ship a single .exe to a customer's laptop or an IoT device. No Docker. No cloud dependency. The database must be inside your binary or a local file.&lt;/p&gt;

&lt;p&gt;Embedded RAG works. Cloud doesn't.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Deterministic Latency Requirements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloud introduces jitter: 10 ms, then 200 ms, then timeout. Embedded gives you predictable microseconds because there's no network.&lt;/p&gt;
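
&lt;p&gt;To see how predictable your own retrieval path is, you can time repeated calls and compare the median with the tail. The sketch below is generic and not part of VectorRAG.Net: &lt;code&gt;LatencyProbe&lt;/code&gt;, &lt;code&gt;Measure&lt;/code&gt; and &lt;code&gt;Percentile&lt;/code&gt; are hypothetical names, and the delegate you pass in would wrap whatever search call you want to measure.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of any library.
public static class LatencyProbe
{
    // Runs `action` the given number of times and returns each duration in microseconds.
    public static double[] Measure(Action action, int iterations)
    {
        var samples = new double[iterations];
        for (int i = 0; i &amp;lt; iterations; i++)
        {
            long start = Stopwatch.GetTimestamp();
            action();
            long end = Stopwatch.GetTimestamp();
            samples[i] = (end - start) * 1_000_000.0 / Stopwatch.Frequency;
        }
        return samples;
    }

    // Nearest-rank percentile (0-100) computed on a sorted copy of the samples.
    public static double Percentile(double[] samples, double percentile)
    {
        if (samples.Length == 0) throw new ArgumentException("No samples.");
        var sorted = (double[])samples.Clone();
        Array.Sort(sorted);
        int rank = (int)Math.Ceiling(percentile / 100.0 * sorted.Length);
        return sorted[Math.Clamp(rank - 1, 0, sorted.Length - 1)];
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call &lt;code&gt;Measure&lt;/code&gt; with a lambda around your search, then compare &lt;code&gt;Percentile(samples, 50)&lt;/code&gt; with &lt;code&gt;Percentile(samples, 99)&lt;/code&gt;. A large gap between the two is the jitter described above.&lt;/p&gt;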

&lt;ol start="5"&gt;
&lt;li&gt;Prototyping &amp;amp; Testing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You want to iterate fast: change chunk size, adjust LSH parameters, rebuild the index. With a cloud DB, you clear collections through the API and wait for consistency. With embedded, you recreate the database in milliseconds.&lt;/p&gt;

&lt;p&gt;❌ When Cloud Vector DBs (Pinecone/Qdrant/Weaviate) Still Win&lt;/p&gt;

&lt;p&gt;Let's be fair. Embedded is not always better.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multi-Tenant Shared Knowledge Bases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You have one index that needs to be shared across 50 microservices written in different languages (Python, Go, Java, .NET). A centralized cloud vector DB is the right tool.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Massive Scale (Billions of Vectors)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Embedded solutions scale vertically — one machine's RAM. If you need 10 billion vectors, you need distributed sharding, replication, and horizontal scaling. That's cloud territory.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Teams Without .NET Expertise&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your data team knows Python, your infra team knows Go. Forcing them to touch a .NET library for retrieval is unnecessary friction. Stick with a cloud API.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Managed HA and Backups&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't want to worry about snapshot schedules, disk corruption, or DR. Cloud DBs handle this for you.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Already Invested in Cloud Infrastructure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're deep in AWS/GCP/Azure and your whole stack is serverless, adding an in-process .NET library might break your architectural patterns.&lt;/p&gt;

&lt;p&gt;🔄 Hybrid Approach That Actually Works&lt;/p&gt;

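&lt;p&gt;You don't have to choose one. Many teams use both:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Use Case&lt;/th&gt;&lt;th&gt;Solution&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Prototyping / local dev&lt;/td&gt;&lt;td&gt;Embedded (VectorRAG.Net)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Production low-latency agent loops&lt;/td&gt;&lt;td&gt;Embedded (in the agent process)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Shared knowledge base across teams&lt;/td&gt;&lt;td&gt;Cloud (Pinecone/Qdrant)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Batch analytics / offline jobs&lt;/td&gt;&lt;td&gt;Embedded (on the compute cluster)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;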
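
&lt;p&gt;One way to mix backends across the use cases above is to hide retrieval behind a small interface in your own code. The sketch below is an assumption about how you might structure that, not something VectorRAG.Net or any cloud SDK ships: &lt;code&gt;IRetriever&lt;/code&gt;, &lt;code&gt;RetrievedChunk&lt;/code&gt; and &lt;code&gt;InMemoryRetriever&lt;/code&gt; are hypothetical names, and the brute-force class only stands in for a real adapter.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction: not part of VectorRAG.Net or any cloud SDK.
public record RetrievedChunk(string Text, float Score);

public interface IRetriever
{
    IReadOnlyList&amp;lt;RetrievedChunk&amp;gt; Search(float[] query, int topK);
}

// Stand-in backend for illustration: brute-force dot product over an in-memory list.
// A real adapter would wrap the embedded database or a cloud client behind the same interface.
public sealed class InMemoryRetriever : IRetriever
{
    private readonly List&amp;lt;(float[] Vector, string Text)&amp;gt; _items = new();

    public void Add(float[] vector, string text) =&amp;gt; _items.Add((vector, text));

    public IReadOnlyList&amp;lt;RetrievedChunk&amp;gt; Search(float[] query, int topK) =&amp;gt;
        _items
            .Select(item =&amp;gt; new RetrievedChunk(item.Text, Dot(item.Vector, query)))
            .OrderByDescending(chunk =&amp;gt; chunk.Score)
            .Take(topK)
            .ToList();

    // Plain scalar dot product; assumes both vectors have the same length.
    private static float Dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i &amp;lt; a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Agent code then depends only on &lt;code&gt;IRetriever&lt;/code&gt;, and the choice between an embedded and a cloud adapter becomes a registration detail.&lt;/p&gt;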

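&lt;p&gt;Keep retrieval behind one interface in your application and the calling code stays the same while the backend changes. VectorRAG.Net's persistence format is just a file, so you can move snapshots between environments.&lt;/p&gt;

&lt;p&gt;🚀 Quick Start (Full Example)&lt;/p&gt;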

&lt;p&gt;Here's everything you need to get running.&lt;/p&gt;

&lt;p&gt;Install&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package VectorRAG.Net &lt;span class="nt"&gt;--version&lt;/span&gt; 0.1.17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Minimal Working Example&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;SlidingRank.FastOps&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;VectorRAG.Net&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Configure LSH&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;EmbeddingLshConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bands&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BitsPerBand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MaxCandidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Create database (1536-dim embeddings)&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorRAGDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Add your embedding model (OpenAI example)&lt;/span&gt;
&lt;span class="n"&gt;IEmbeddingModel&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;OpenAIEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"sk-..."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Add a document (auto-chunking)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UpsertTextDocumentAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;externalId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"doc_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Your long document text here..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;DocumentMetadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Department&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Support"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 5. Search&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryVector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"How to reset password?"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SearchOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;TopK&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;F2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;[..&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No external database. No containers. Just code.&lt;/p&gt;

&lt;p&gt;📈 Performance Deep Dive&lt;/p&gt;

&lt;p&gt;Why It's Fast&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No serialization: vectors stay as float[] in memory&lt;/li&gt;
&lt;li&gt;SIMD reranking: exact distance calculation uses CPU vector instructions&lt;/li&gt;
&lt;li&gt;LSH candidate filtering: only rerank ~2000 candidates, not all vectors&lt;/li&gt;
&lt;li&gt;Pooled arrays: minimal GC pressure in hot path&lt;/li&gt;
&lt;/ul&gt;
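
&lt;p&gt;To make the combination of LSH candidate filtering and SIMD reranking concrete, here is a minimal generic sketch. It is not VectorRAG.Net's implementation, which the article does not describe at this level: &lt;code&gt;TinyLshIndex&lt;/code&gt; is a hypothetical name, the hashing is plain sign-of-random-projection with bands, and there is no candidate cap like &lt;code&gt;MaxCandidates&lt;/code&gt;. Only &lt;code&gt;System.Numerics.Vector&amp;lt;float&amp;gt;&lt;/code&gt; is a real .NET API here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Numerics;

// Generic illustration only; all names are hypothetical.
public sealed class TinyLshIndex
{
    private readonly int _bands, _bitsPerBand;
    private readonly float[][] _planes;                              // one random hyperplane per hash bit
    private readonly Dictionary&amp;lt;int, List&amp;lt;int&amp;gt;&amp;gt;[] _tables;   // one hash table per band
    private readonly List&amp;lt;float[]&amp;gt; _vectors = new();

    public TinyLshIndex(int dim, int bands, int bitsPerBand, int seed = 42)
    {
        _bands = bands; _bitsPerBand = bitsPerBand;
        var rng = new Random(seed);
        // Uniform components for brevity; Gaussian components are the textbook choice.
        _planes = new float[bands * bitsPerBand][];
        for (int p = 0; p &amp;lt; _planes.Length; p++)
        {
            _planes[p] = new float[dim];
            for (int d = 0; d &amp;lt; dim; d++) _planes[p][d] = (float)(rng.NextDouble() * 2 - 1);
        }
        _tables = new Dictionary&amp;lt;int, List&amp;lt;int&amp;gt;&amp;gt;[bands];
        for (int b = 0; b &amp;lt; bands; b++) _tables[b] = new Dictionary&amp;lt;int, List&amp;lt;int&amp;gt;&amp;gt;();
    }

    // SIMD dot product: Vector&amp;lt;float&amp;gt;.Count lanes per step, then a scalar tail.
    public static float Dot(ReadOnlySpan&amp;lt;float&amp;gt; a, ReadOnlySpan&amp;lt;float&amp;gt; b)
    {
        int i = 0;
        var acc = Vector&amp;lt;float&amp;gt;.Zero;
        for (; i &amp;lt;= a.Length - Vector&amp;lt;float&amp;gt;.Count; i += Vector&amp;lt;float&amp;gt;.Count)
            acc += new Vector&amp;lt;float&amp;gt;(a.Slice(i)) * new Vector&amp;lt;float&amp;gt;(b.Slice(i));
        float sum = Vector.Sum(acc);
        for (; i &amp;lt; a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // One bit per hyperplane: which side of the plane the vector falls on.
    private int BandKey(float[] v, int band)
    {
        int key = 0;
        for (int bit = 0; bit &amp;lt; _bitsPerBand; bit++)
            if (Dot(v, _planes[band * _bitsPerBand + bit]) &amp;gt;= 0f) key |= 1 &amp;lt;&amp;lt; bit;
        return key;
    }

    public void Add(float[] v)
    {
        int id = _vectors.Count;
        _vectors.Add(v);
        for (int b = 0; b &amp;lt; _bands; b++)
        {
            int key = BandKey(v, b);
            if (!_tables[b].TryGetValue(key, out var bucket)) _tables[b][key] = bucket = new List&amp;lt;int&amp;gt;();
            bucket.Add(id);
        }
    }

    // Collect ids that share at least one band bucket with the query, then rerank them exactly.
    public List&amp;lt;(int Id, float Score)&amp;gt; Search(float[] query, int topK)
    {
        var candidates = new HashSet&amp;lt;int&amp;gt;();
        for (int b = 0; b &amp;lt; _bands; b++)
            if (_tables[b].TryGetValue(BandKey(query, b), out var bucket)) candidates.UnionWith(bucket);

        var scored = new List&amp;lt;(int Id, float Score)&amp;gt;();
        foreach (int id in candidates) scored.Add((id, Dot(query, _vectors[id])));
        scored.Sort((x, y) =&amp;gt; y.Score.CompareTo(x.Score));
        if (scored.Count &amp;gt; topK) scored.RemoveRange(topK, scored.Count - topK);
        return scored;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;More bands raise recall because a neighbor only needs to collide in one of them, while more bits per band shrink the buckets. The exact rerank only touches the candidate set, which is why search cost depends far more on the number of candidates than on the total number of vectors.&lt;/p&gt;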
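&lt;p&gt;Benchmark Numbers (dim=1536, real embeddings)&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Operation&lt;/th&gt;&lt;th&gt;Latency (mean)&lt;/th&gt;&lt;th&gt;Allocations&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Vector search (10k vectors)&lt;/td&gt;&lt;td&gt;180 μs&lt;/td&gt;&lt;td&gt;18 KB&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Hybrid search (BM25 + vector)&lt;/td&gt;&lt;td&gt;450 μs&lt;/td&gt;&lt;td&gt;32 KB&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Insert single vector&lt;/td&gt;&lt;td&gt;12 μs&lt;/td&gt;&lt;td&gt;4 KB&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Save snapshot (10k vectors)&lt;/td&gt;&lt;td&gt;120 ms&lt;/td&gt;&lt;td&gt;8 MB&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;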

&lt;p&gt;Throughput: ~5,500 vector searches per second per core.&lt;/p&gt;

&lt;p&gt;Scaling Limits&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory: ~8 bytes per dimension per vector + overhead. 1M vectors at 1536-dim = ~12 GB RAM.&lt;/li&gt;
&lt;li&gt;Search time: O(log N) for LSH candidates + O(candidates) for reranking. 10M vectors → ~2-3 ms.&lt;/li&gt;
&lt;/ul&gt;
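
&lt;p&gt;The memory figure above is straightforward arithmetic, so it is easy to redo for your own corpus. The helper below is a hypothetical sketch (&lt;code&gt;MemoryEstimate&lt;/code&gt; is not a library type). It takes the article's figure of 8 bytes per dimension as its default and ignores index overhead.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical helper for back-of-the-envelope sizing.
public static class MemoryEstimate
{
    // Raw vector storage in bytes: vectors * dimensions * bytes per dimension.
    // The default of 8 bytes per dimension is the article's figure; index overhead is excluded.
    public static long VectorBytes(long vectorCount, int dimension, int bytesPerDimension = 8)
        =&amp;gt; checked(vectorCount * dimension * bytesPerDimension);

    // Decimal gigabytes (1 GB = 10^9 bytes).
    public static double ToGigabytes(long bytes) =&amp;gt; bytes / 1e9;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For 1M vectors at 1536 dimensions this gives 12,288,000,000 bytes, about 12.3 GB, in line with the ~12 GB figure. At 10M vectors the same formula gives about 123 GB, which is where a single machine's RAM becomes the limit.&lt;/p&gt;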
&lt;p&gt;🔁 Real-World Agent Loop Example&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ResearchAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Embed + search in ~200 microseconds&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryVec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryVec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SearchOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;TopK&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="c1"&gt;// Generate next question&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;nextQuestion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;$"Context: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\nWhat to ask next?"&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nextQuestion&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Synthesize: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieval latency per iteration: ~200 μs in-process, with the embedding and LLM calls on top.&lt;br&gt;
Cloud equivalent per iteration: ~20 ms for retrieval, plus the same embedding and LLM calls.&lt;/p&gt;

&lt;p&gt;For 5 iterations, that's 1 ms vs 100 ms of pure retrieval overhead.&lt;/p&gt;

&lt;p&gt;🧩 Integration with ASP.NET Core&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Program.cs&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddSingleton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sp&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;EmbeddingLshConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorRAGDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"vectors.vdb"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"vectors.vdb"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Controller&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ApiController&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SearchController&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ControllerBase&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;HttpPost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;FromBody&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;SearchRequest&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SearchOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;TopK&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TopK&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📦 Persistence: Snapshots, Not a Database Engine&lt;/p&gt;

&lt;p&gt;VectorRAG.Net doesn't run a separate storage engine. It saves snapshots:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SaveAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"knowledge_base.vdb"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Save&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;db2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorRAGDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lshConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"knowledge_base.vdb"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Load elsewhere&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is intentionally simple. You manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snapshot frequency (every hour, every day)&lt;/li&gt;
&lt;li&gt;Backup strategy (copy files)&lt;/li&gt;
&lt;li&gt;Versioning (keep multiple snapshots)&lt;/li&gt;
&lt;/ul&gt;
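
&lt;p&gt;Since snapshot frequency, backups and versioning are your responsibility, a small rotation helper covers most of it. The sketch below is an assumption about how you might do that, not a VectorRAG.Net feature: &lt;code&gt;SnapshotRotation&lt;/code&gt; and &lt;code&gt;SaveRotatedAsync&lt;/code&gt; are hypothetical names, and the &lt;code&gt;save&lt;/code&gt; delegate stands for whatever writes a snapshot to a path, for example &lt;code&gt;path =&amp;gt; db.SaveAsync(path)&lt;/code&gt; from the snippet above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: timestamped snapshots with simple retention.
public static class SnapshotRotation
{
    public static async Task&amp;lt;string&amp;gt; SaveRotatedAsync(
        string directory, Func&amp;lt;string, Task&amp;gt; save, int keep, DateTime utcNow)
    {
        Directory.CreateDirectory(directory);
        string finalPath = Path.Combine(directory, $"snapshot_{utcNow:yyyyMMdd_HHmmss}.vdb");
        string tempPath = finalPath + ".tmp";

        // Write to a temp file first so a crash mid-save never leaves a
        // truncated snapshot under the final name.
        await save(tempPath);
        File.Move(tempPath, finalPath, overwrite: true);

        // Timestamped names sort chronologically, so everything after the
        // newest `keep` files is stale and can be deleted.
        var stale = Directory.GetFiles(directory, "snapshot_*.vdb")
            .OrderByDescending(path =&amp;gt; path, StringComparer.Ordinal)
            .Skip(keep);
        foreach (string path in stale) File.Delete(path);

        return finalPath;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling this from a timer or a hosted background service gives you the snapshot schedule. Copying the directory elsewhere is the backup, and &lt;code&gt;keep&lt;/code&gt; controls how many versions stay on disk.&lt;/p&gt;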

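&lt;p&gt;No schema migrations. No connection strings. No replication config.&lt;/p&gt;

&lt;p&gt;🆚 Comparison Table: Embedded vs Cloud vs Local DB&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Criterion&lt;/th&gt;&lt;th&gt;VectorRAG.Net&lt;/th&gt;&lt;th&gt;Pinecone/Qdrant&lt;/th&gt;&lt;th&gt;SQLite + Vector&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Latency (p50)&lt;/td&gt;&lt;td&gt;50-200 μs&lt;/td&gt;&lt;td&gt;10-20 ms&lt;/td&gt;&lt;td&gt;5-15 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Network required&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Hybrid search&lt;/td&gt;&lt;td&gt;Yes (BM25)&lt;/td&gt;&lt;td&gt;Limited&lt;/td&gt;&lt;td&gt;DIY&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Chunking&lt;/td&gt;&lt;td&gt;Built-in&lt;/td&gt;&lt;td&gt;DIY&lt;/td&gt;&lt;td&gt;DIY&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Multi-language support&lt;/td&gt;&lt;td&gt;.NET only&lt;/td&gt;&lt;td&gt;All&lt;/td&gt;&lt;td&gt;All&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Horizontal scaling&lt;/td&gt;&lt;td&gt;No (vertical only)&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Snapshot management&lt;/td&gt;&lt;td&gt;You manage&lt;/td&gt;&lt;td&gt;Built-in&lt;/td&gt;&lt;td&gt;You manage&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Memory footprint&lt;/td&gt;&lt;td&gt;~12 GB per 1M vectors (1536-dim)&lt;/td&gt;&lt;td&gt;Cloud-managed&lt;/td&gt;&lt;td&gt;~15 GB per 1M vectors&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Offline capable&lt;/td&gt;&lt;td&gt;✅&lt;/td&gt;&lt;td&gt;❌&lt;/td&gt;&lt;td&gt;✅&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;🎯 Final Verdict&lt;/p&gt;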

&lt;p&gt;Use VectorRAG.Net when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need sub-millisecond retrieval&lt;/li&gt;
&lt;li&gt;You're building high-frequency agent loops&lt;/li&gt;
&lt;li&gt;Your system must run offline or air-gapped&lt;/li&gt;
&lt;li&gt;You're shipping a desktop or edge .NET application&lt;/li&gt;
&lt;li&gt;You want to avoid network and serialization overhead&lt;/li&gt;
&lt;li&gt;You're prototyping and need fast iteration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use cloud vector DBs when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have multi-language services sharing one index&lt;/li&gt;
&lt;li&gt;You're at billion-vector scale with horizontal sharding&lt;/li&gt;
&lt;li&gt;You don't want to manage snapshots and backups&lt;/li&gt;
&lt;li&gt;Your team doesn't work primarily in .NET&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use both when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You prototype with embedded and deploy cloud for shared indices&lt;/li&gt;
&lt;li&gt;Your edge devices use embedded while central knowledge lives in the cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📚 Resources&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NuGet: &lt;a href="https://www.nuget.org/packages/VectorRAG.Net" rel="noopener noreferrer"&gt;https://www.nuget.org/packages/VectorRAG.Net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub (benchmarks &amp;amp; examples): &lt;a href="https://github.com/likeslines-maker/VectorRAG.Net" rel="noopener noreferrer"&gt;https://github.com/likeslines-maker/VectorRAG.Net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Documentation: see the GitHub README&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💬 Let's Discuss&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have you hit latency walls with external vector DBs?&lt;/li&gt;
&lt;li&gt;Are you building agent loops that need microsecond retrieval?&lt;/li&gt;
&lt;li&gt;Would an embedded approach work for your use case?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment below. I'm happy to share implementation details, benchmark scripts, or help with integration.&lt;/p&gt;

&lt;p&gt;Tags: dotnet, csharp, ai, rag, performance, vector-database&lt;/p&gt;

</description>
      <category>ai</category>
      <category>csharp</category>
      <category>vectordatabase</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
