Latency comparison included | When embedded wins | When cloud still wins
You have a .NET 8 application. You need RAG (Retrieval-Augmented Generation). Your first instinct is to spin up Pinecone, Qdrant, or Weaviate.
They work. But they add:
15–50 ms per query just for network roundtrips
Serialization overhead (JSON/gRPC)
An extra distributed system to monitor
For some applications, this is fine.
For others — high-frequency agent loops, real-time copilots, edge devices, on-prem compliance — it's a dealbreaker.
So I built an embedded vector database that runs entirely inside your .NET process: VectorRAG.Net.
No network. No external service. Microsecond latency.
But I'm not going to tell you it's always better. Let's be honest about when it wins, and when it doesn't.
🔍 What Is VectorRAG.Net?
VectorRAG.Net is a .NET 8+ library that implements a full RAG pipeline inside your application:
Vector search (ANN via LSH + SIMD reranking)
Hybrid search (vector + BM25)
Automatic chunking
Metadata filtering
Snapshot persistence (save/load to file)
Runtime metrics
NuGet: VectorRAG.Net
GitHub: https://github.com/likeslines-maker/VectorRAG.Net
You create an instance and start adding documents — no separate database, no containers, no API keys (for the DB part).
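The article does not describe how the library's automatic chunking splits text, so the following is only a generic illustration of the idea: a fixed-size character window with overlap. The `Chunk` helper and its parameters are hypothetical and are not part of VectorRAG.Net.

```csharp
using System;
using System.Collections.Generic;

// Split text into windows of `size` characters, each overlapping the previous by `overlap`.
// Hypothetical helper for illustration; not the library's chunker.
static List<string> Chunk(string text, int size, int overlap)
{
    if (size <= 0 || overlap < 0 || overlap >= size)
        throw new ArgumentException("Require size > 0 and 0 <= overlap < size.");

    var chunks = new List<string>();
    for (int start = 0; start < text.Length; start += size - overlap)
    {
        int length = Math.Min(size, text.Length - start);
        chunks.Add(text.Substring(start, length));
        if (start + length == text.Length) break; // last window reached the end
    }
    return chunks;
}

foreach (var chunk in Chunk("abcdefghij", size: 4, overlap: 1))
    Console.WriteLine(chunk);
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.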
📊 Honest Benchmarks: Embedded vs Cloud vs Others
Test environment: Windows 11, Intel Core i5-11400F, .NET 8, 10k vectors, dim=64 (synthetic for reproducibility)
| Operation | VectorRAG.Net | Pinecone / Qdrant | Other .NET libs |
|---|---|---|---|
| Vector search (TopK=5) | 15 μs | 8-15 ms | 0.5-5 ms |
| Hybrid search | 117 μs | 10-20 ms (limited support) | rarely exists |
| Network overhead | 0 | 10-40 ms | 0 |
| Allocations per query | 5.7 KB | 50-200 KB (JSON) | variable |
| Automatic chunking | built-in | you implement | rare |
| Offline capable | ✅ | ❌ | varies |
Real-world takeaway: for dim=768 (real embeddings), expect VectorRAG.Net latency to rise roughly 3-5x, to somewhere around 50-150 microseconds per vector search. That is still roughly 100-300x faster than a cloud round trip.
But speed isn't everything. Here's when each approach actually makes sense.
✅ When Embedded RAG (VectorRAG.Net) Wins
- High-Frequency Retrieval Loops
Scenario: An AI agent that performs 10-50 retrieval steps per user request.
Cloud math: 30 steps × 20 ms = 600 ms just for retrieval.
Embedded math: 30 steps × 0.1 ms = 3 ms.
User experience difference: unusable vs snappy.
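The arithmetic above generalizes to any step count and per-query latency. A minimal sketch follows; the `RetrievalBudget` helper is made up for illustration, and the 20 ms and 0.1 ms figures are the scenario's assumptions rather than measurements.

```csharp
using System;

// Total retrieval time for one user request: steps × per-query latency.
static TimeSpan RetrievalBudget(int steps, TimeSpan perQuery) => perQuery * steps;

var cloud = RetrievalBudget(30, TimeSpan.FromMilliseconds(20));      // assumed 20 ms per cloud query
var embedded = RetrievalBudget(30, TimeSpan.FromMicroseconds(100));  // assumed 0.1 ms per embedded query

Console.WriteLine($"Cloud:    {cloud.TotalMilliseconds} ms");
Console.WriteLine($"Embedded: {embedded.TotalMilliseconds} ms");
```

Plugging in your own measured latencies and step counts shows quickly whether retrieval is a meaningful share of the request time or noise next to the LLM calls.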
- Air-Gapped / Offline Systems
Real-world examples:
Defense and intelligence systems (data cannot leave the process)
Avionics (no network in flight)
Medical devices (offline compliance)
Industrial SCADA (network segmentation for safety)
Cloud vector DBs are simply not allowed here.
- Edge Devices & Desktop Apps
You ship a single .exe to a customer's laptop or an IoT device. No Docker. No cloud dependency. The database must be inside your binary or a local file.
Embedded RAG works. Cloud doesn't.
- Deterministic Latency Requirements
Cloud introduces jitter: 10 ms, then 200 ms, then timeout. Embedded gives you predictable microseconds because there's no network.
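Jitter shows up in the tail percentiles rather than in the mean, so it helps to report p50 and p99 separately when comparing backends. The sketch below uses a nearest-rank percentile; the `Percentile` helper and the sample latencies are made up for illustration.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
static double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("No samples.", nameof(samples));
    var sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Clamp(rank - 1, 0, sorted.Length - 1)];
}

// Made-up latencies in milliseconds: mostly around 10 ms with a single 200 ms outlier.
double[] latenciesMs = { 10, 11, 10, 12, 10, 200, 11, 10, 12, 10 };

Console.WriteLine($"p50 = {Percentile(latenciesMs, 50)} ms");
Console.WriteLine($"p99 = {Percentile(latenciesMs, 99)} ms");
```

In a real comparison the samples would come from timing each query with `Stopwatch` under production-like load.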
- Prototyping & Testing
You want to iterate fast — change chunk size, adjust LSH parameters, rebuild the index. With cloud, you're cleaning collections via API, waiting for consistency. With embedded, you recreate the database in milliseconds.
❌ When Cloud Vector DBs (Pinecone/Qdrant/Weaviate) Still Win
Let's be fair. Embedded is not always better.
- Multi-Tenant Shared Knowledge Bases
You have one index that needs to be shared across 50 microservices written in different languages (Python, Go, Java, .NET). A centralized cloud vector DB is the right tool.
- Massive Scale (Billions of Vectors)
Embedded solutions scale vertically — one machine's RAM. If you need 10 billion vectors, you need distributed sharding, replication, and horizontal scaling. That's cloud territory.
- Teams Without .NET Expertise
Your data team knows Python, your infra team knows Go. Forcing them to touch a .NET library for retrieval is unnecessary friction. Stick with a cloud API.
- Managed HA and Backups
You don't want to worry about snapshot schedules, disk corruption, or DR. Cloud DBs handle this for you.
- Already Invested in Cloud Infrastructure
If you're deep in AWS/GCP/Azure and your whole stack is serverless, adding an in-process .NET library might break your architectural patterns.
🔄 Hybrid Approach That Actually Works
You don't have to choose one. Many teams use both:
| Use Case | Solution |
|---|---|
| Prototyping / local dev | Embedded (VectorRAG.Net) |
| Production low-latency agent loops | Embedded (in the agent process) |
| Shared knowledge base across teams | Cloud (Pinecone/Qdrant) |
| Batch analytics / offline jobs | Embedded (on the compute cluster) |
Same code, different backend. VectorRAG.Net's persistence format is just a file — you can move snapshots between environments.
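If you want the same calling code to run against either backend, one option is to hide retrieval behind a small interface of your own. The sketch below is an assumption about how that could look: `IRetriever` and `InMemoryRetriever` are hypothetical names, not part of VectorRAG.Net or of any cloud SDK, and the in-memory class is a brute-force stand-in for a real embedded index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical abstraction: callers depend on this, not on a concrete backend.
public interface IRetriever
{
    Task<IReadOnlyList<(string Id, float Score)>> SearchAsync(float[] query, int topK);
}

// Stand-in for an embedded backend: brute-force dot product over an in-memory list.
public sealed class InMemoryRetriever : IRetriever
{
    private readonly List<(string Id, float[] Vector)> _items = new();

    public void Add(string id, float[] vector) => _items.Add((id, vector));

    public Task<IReadOnlyList<(string Id, float Score)>> SearchAsync(float[] query, int topK)
    {
        IReadOnlyList<(string Id, float Score)> hits = _items
            .Select(item => (item.Id, Score: Dot(query, item.Vector)))
            .OrderByDescending(hit => hit.Score)
            .Take(topK)
            .ToList();
        return Task.FromResult(hits);
    }

    private static float Dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```

A cloud-backed class would implement the same `SearchAsync` by calling the vendor's client, and the agent or controller code would only ever see `IRetriever`. The method is asynchronous so that a network-bound implementation fits without changing callers.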
🚀 Quick Start (Full Example)
Here's everything you need to get running.
Install
dotnet add package VectorRAG.Net --version 0.1.17
Minimal Working Example
using System;
using SlidingRank.FastOps;
using VectorRAG.Net;

// 1. Configure LSH
var lshConfig = new EmbeddingLshConfig(Bands: 24, BitsPerBand: 12, MaxCandidates: 2048);

// 2. Create database (1536-dim embeddings)
var db = new VectorRAGDatabase(dimension: 1536, lshConfig: lshConfig);

// 3. Add your embedding model (OpenAI example)
IEmbeddingModel embedder = new OpenAIEmbeddingModel(apiKey: "sk-...");

// 4. Add a document (auto-chunking)
await db.UpsertTextDocumentAsync(
    externalId: "doc_001",
    text: "Your long document text here...",
    metadata: new DocumentMetadata { Department = "Support" },
    embeddingModel: embedder
);

// 5. Search
var queryVector = await embedder.GenerateEmbeddingAsync("How to reset password?");
var results = db.Search(queryVector, new SearchOptions { TopK = 5 });

foreach (var result in results)
{
    // Guard the slice: chunks shorter than 100 characters would otherwise throw
    var preview = result.Text[..Math.Min(100, result.Text.Length)];
    Console.WriteLine($"{result.Score:F2}: {preview}");
}
That's it. No external database. No containers. Just code.
📈 Performance Deep Dive
Why It's Fast
No serialization — vectors stay as float[] in memory
SIMD reranking — exact distance calculation uses CPU vector instructions
LSH candidate filtering — only rerank ~2000 candidates, not all vectors
Pooled arrays — minimal GC pressure in hot path
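To make the LSH and SIMD points concrete, here is a generic sketch of both ideas using only `System.Numerics`. It is not the library's code: the article says only that LSH with bands and bits per band is used, so the sign-of-random-projection hash below is an assumption, and `Dot` and `BandKey` are illustrative names.

```csharp
using System;
using System.Numerics;

// SIMD dot product: multiplies Vector<float>.Count lanes per iteration, then a scalar tail.
static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length) throw new ArgumentException("Length mismatch.");
    var acc = Vector<float>.Zero;
    int width = Vector<float>.Count;
    int i = 0;
    for (; i <= a.Length - width; i += width)
        acc += new Vector<float>(a.Slice(i, width)) * new Vector<float>(b.Slice(i, width));
    float sum = Vector.Sum(acc);
    for (; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// One LSH band (assumed hash family): each hyperplane contributes one bit,
// set when the vector lies on the non-negative side of that hyperplane.
// In practice the hyperplanes would be drawn randomly once, when the index is created.
static int BandKey(ReadOnlySpan<float> vector, float[][] hyperplanes)
{
    int key = 0;
    for (int bit = 0; bit < hyperplanes.Length; bit++)
        if (Dot(vector, hyperplanes[bit]) >= 0f) key |= 1 << bit;
    return key;
}

// Two fixed hyperplanes in 2D, chosen by hand so the result is easy to follow.
var planes = new[] { new float[] { 1f, 0f }, new float[] { 0f, 1f } };
Console.WriteLine(BandKey(new float[] { 3.0f, -2.0f }, planes));
Console.WriteLine(BandKey(new float[] { 3.1f, -1.9f }, planes));
Console.WriteLine(BandKey(new float[] { -3.0f, 2.0f }, planes));
```

Vectors that share a band key land in the same bucket and become rerank candidates; the first two vectors above get the same key and the third does not. Only those candidates then go through the exact `Dot` comparison, which is why the query cost depends on the candidate count rather than on the full collection size.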
Benchmark Numbers (dim=1536, real embeddings)
| Operation | Latency (mean) | Allocations |
|---|---|---|
| Vector search (10k vectors) | 180 μs | 18 KB |
| Hybrid search (BM25 + vector) | 450 μs | 32 KB |
| Insert single vector | 12 μs | 4 KB |
| Save snapshot (10k vectors) | 120 ms | 8 MB |
Throughput: ~5,500 vector searches per second per core.
Scaling Limits
Memory: ~8 bytes per dimension per vector + overhead. 1M vectors at 1536-dim = ~12 GB RAM.
Search time: O(log N) for LSH candidates + O(candidates) for reranking. 10M vectors → ~2-3 ms.
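The memory and throughput figures above follow from simple arithmetic, which makes it easy to redo the estimate for your own collection size. The helper names below are made up for illustration; the 8 bytes per dimension and the 180 μs mean latency are the article's figures.

```csharp
using System;

// Raw vector storage only; index structures and metadata add overhead on top.
static double EstimateVectorGigabytes(long vectors, int dimension, int bytesPerDimension = 8)
    => vectors * (double)dimension * bytesPerDimension / 1e9;

// Single-core throughput if every search takes the given mean latency.
static double SearchesPerSecond(double meanLatencyMicroseconds)
    => 1_000_000.0 / meanLatencyMicroseconds;

Console.WriteLine($"{EstimateVectorGigabytes(1_000_000, 1536):F1} GB for 1M vectors at 1536-dim");
Console.WriteLine($"{SearchesPerSecond(180):F0} searches per second per core at 180 μs");
```

The same function shows that the 10M-vector case needs roughly ten times the RAM of the 1M case, so memory is likely to become the limit before search latency does.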
🔁 Real-World Agent Loop Example
public async Task<string> ResearchAsync(string topic)
{
    var findings = new List<string>();
    for (int step = 0; step < 5; step++)
    {
        // The search itself takes ~200 microseconds; embedding time depends on
        // your model (a remote embedding API adds its own network round trip)
        var queryVec = await _embedder.GenerateEmbeddingAsync(topic);
        var results = _db.Search(queryVec, new SearchOptions { TopK = 3 });

        // Keep the retrieved text so the final synthesis has something to work with
        foreach (var result in results)
            findings.Add(result.Text);

        // Generate next question
        var nextQuestion = await _llm.GenerateAsync(
            $"Context: {results[0].Text}\nWhat to ask next?"
        );
        topic = nextQuestion;
    }
    return await _llm.GenerateAsync($"Synthesize: {string.Join("\n", findings)}");
}
Latency per iteration: ~200 μs for retrieval + LLM call.
Cloud equivalent per iteration: ~20 ms for retrieval + LLM call.
For 5 iterations, that's 1 ms vs 100 ms of pure retrieval overhead.
🧩 Integration with ASP.NET Core
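// Program.cs
builder.Services.AddSingleton(sp =>
{
    var lshConfig = new EmbeddingLshConfig(24, 12, 2048);
    var db = new VectorRAGDatabase(1536, lshConfig);
    // Load the snapshot once at startup. DI factories cannot be async,
    // so this blocks until the load completes.
    if (File.Exists("vectors.vdb"))
        db.LoadAsync("vectors.vdb").GetAwaiter().GetResult();
    return db;
});

// Controller
[ApiController]
public class SearchController : ControllerBase
{
    private readonly VectorRAGDatabase _db;
    private readonly IEmbeddingModel _embedder;

    // Both dependencies come from DI; the IEmbeddingModel registration is not shown here
    public SearchController(VectorRAGDatabase db, IEmbeddingModel embedder)
    {
        _db = db;
        _embedder = embedder;
    }

    [HttpPost("search")]
    public async Task<IActionResult> Search([FromBody] SearchRequest req)
    {
        var vector = await _embedder.GenerateEmbeddingAsync(req.Query);
        var results = _db.Search(vector, new SearchOptions { TopK = req.TopK });
        return Ok(results);
    }
}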
📦 Persistence: Snapshots, Not a Database Engine
VectorRAG.Net doesn't run a separate storage engine. It saves snapshots:
await db.SaveAsync("knowledge_base.vdb"); // Save
var db2 = new VectorRAGDatabase(1536, lshConfig);
await db2.LoadAsync("knowledge_base.vdb"); // Load elsewhere
This is intentionally simple. You manage:
Snapshot frequency (every hour, every day)
Backup strategy (copy files)
Versioning (keep multiple snapshots)
No schema migrations. No connection strings. No replication config.
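Since snapshot frequency and versioning are left to you, a small rotation helper is usually enough. The sketch below uses only `System.IO`; `SnapshotName` and `PruneSnapshots` are hypothetical helpers, and the `kb-<timestamp>.vdb` naming scheme is an assumption chosen so that sorting by name equals sorting by time.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Timestamped file name, e.g. kb-20250101T120000Z.vdb, so ordinal order equals time order.
static string SnapshotName(DateTime utc) =>
    "kb-" + utc.ToString("yyyyMMdd'T'HHmmss'Z'", CultureInfo.InvariantCulture) + ".vdb";

// Keep only the newest `keep` snapshots in a directory; returns the paths that were deleted.
static string[] PruneSnapshots(string directory, int keep, string pattern = "*.vdb")
{
    var stale = new DirectoryInfo(directory)
        .GetFiles(pattern)
        .OrderByDescending(f => f.Name, StringComparer.Ordinal)
        .Skip(keep)
        .Select(f => f.FullName)
        .ToArray();
    foreach (var path in stale) File.Delete(path);
    return stale;
}

// Demo with empty placeholder files in a temporary directory.
var dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
foreach (var day in new[] { 1, 2, 3 })
    File.WriteAllText(Path.Combine(dir, SnapshotName(new DateTime(2025, 1, day, 0, 0, 0, DateTimeKind.Utc))), "");

foreach (var deleted in PruneSnapshots(dir, keep: 2))
    Console.WriteLine($"Deleted {Path.GetFileName(deleted)}");
```

In an application, the file would be written by `db.SaveAsync(Path.Combine(dir, SnapshotName(DateTime.UtcNow)))` on whatever schedule you choose, followed by a call to `PruneSnapshots`.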
🆚 Comparison Table: Embedded vs Cloud vs Local DB
| Criterion | VectorRAG.Net | Pinecone/Qdrant | SQLite + Vector |
|---|---|---|---|
| Latency (p50) | 50-200 μs | 10-20 ms | 5-15 ms |
| Network required | No | Yes | No |
| Hybrid search | Yes (BM25) | Limited | DIY |
| Chunking | Built-in | DIY | DIY |
| Multi-language support | .NET only | All | All |
| Horizontal scaling | No (vertical only) | Yes | No |
| Snapshot management | You manage | Built-in | You manage |
| Memory footprint | ~12 GB/1M vectors (1536-dim) | Cloud-managed | ~15 GB/1M vectors |
| Offline capable | ✅ | ❌ | ✅ |
🎯 Final Verdict
Use VectorRAG.Net when:
You need sub-millisecond retrieval
You're building high-frequency agent loops
Your system must run offline or air-gapped
You're shipping a desktop or edge .NET application
You want to avoid network and serialization overhead
You're prototyping and need fast iteration
Use cloud vector DBs when:
You have multi-language services sharing one index
You're at billion-vector scale with horizontal sharding
You don't want to manage snapshots and backups
Your team doesn't work primarily in .NET
Use both when:
You prototype with embedded, then deploy cloud for shared indices
Your edge devices use embedded while central knowledge lives in the cloud
📚 Resources
NuGet: https://www.nuget.org/packages/VectorRAG.Net
GitHub (benchmarks & examples): https://github.com/likeslines-maker/VectorRAG.Net
Documentation: See GitHub README
💬 Let's Discuss
Have you hit latency walls with external vector DBs?
Are you building agent loops that need microsecond retrieval?
Would an embedded approach work for your use case?
Drop a comment below. I'm happy to share implementation details, benchmark scripts, or help with integration.
Tags: dotnet, csharp, ai, rag, performance, vector-database