Building Production RAG Systems in .NET 10: The Complete Guide to Embeddings
The Hallucination Problem
Your company spent $50K building an internal chatbot. It tells customers "yes, we ship internationally" when you only ship to the US. Your support team is drowning in corrections.
Sound familiar?
This happens because an LLM on its own generates responses from patterns in its training data, not from your actual data. It hallucinates: it confidently states false information.
RAG (Retrieval-Augmented Generation) addresses this. Instead of hoping the LLM knows about your data, you retrieve the relevant documents and put them in the prompt before it answers.
What Are Embeddings?
Think of embeddings as a way to convert text into mathematics.
The Simple Version
Text: "The quick brown fox"
↓
Embedding (float array, e.g. 1536 dimensions; the size depends on the model)
[0.234, -0.156, 0.892, ..., 0.421]
↓
This vector captures semantic meaning
Why Vectors Matter
Two sentences with different words can have similar embeddings if they mean the same thing:
Sentence A: "Our Q3 revenue exceeded $5 million"
Embedding A: [0.234, -0.156, 0.892, ...]
Sentence B: "Q3 generated more than $5M in sales"
Embedding B: [0.235, -0.154, 0.894, ...]
← Very similar! The model understands they mean the same thing.
But this completely different sentence:
Sentence C: "I like coffee"
Embedding C: [0.892, 0.234, -0.156, ...]
← Very different vector! Different meaning.
This is how RAG systems find relevant documents by meaning, not just keyword matches.
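"Similar" here usually means cosine similarity: the cosine of the angle between two vectors, close to 1 when they point the same way and close to 0 when they are unrelated. The sketch below is a plain C# implementation, not a library API; the class name `VectorMath` is made up for illustration, and the three-component vectors in the usage note are just the truncated sample values from above, not real model output.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; treat it as unrelated to everything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Fed the first three components of Embedding A and Embedding B, this returns a value very close to 1; A against Embedding C comes out below 0.1. Real embeddings behave the same way, just across hundreds or thousands of dimensions.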
The RAG Pipeline in .NET 10
Step 1: Generate Embeddings from Your Documents
// In .NET 10 with Microsoft.Extensions.AI
// VectorStore and VectorDocument are placeholder types standing in for your vector database client
using Microsoft.Extensions.AI;

public class DocumentEmbedder
{
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embeddingClient;
    private readonly VectorStore _vectorStore;

    public DocumentEmbedder(IEmbeddingGenerator<string, Embedding<float>> client, VectorStore store)
    {
        _embeddingClient = client;
        _vectorStore = store;
    }

    // Embed your documents once
    public async Task IndexDocumentsAsync(List<string> documents)
    {
        // GenerateAsync returns one embedding per input, in input order
        var embeddings = await _embeddingClient.GenerateAsync(documents);
        var vectors = embeddings.Select((e, i) => new VectorDocument
        {
            Id = Guid.NewGuid().ToString(),
            Content = documents[i],
            Vector = e.Vector.ToArray(),
            // String-keyed metadata so it can be read back as Metadata["Source"]
            Metadata = new Dictionary<string, string> { ["Source"] = "DocumentLibrary" }
        }).ToList();
        await _vectorStore.UpsertAsync(vectors);
    }
}
Key point: You embed documents once and store them. For a given embedding model, the same text maps to effectively the same vector every time, so you only need to re-embed when a document or the model changes.
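One way to act on "embed once" is to key each document by a hash of its content, so re-running the indexer skips text that is already stored. Random IDs such as `Guid.NewGuid()` would instead add a duplicate entry each time the same document is indexed. The sketch below is only an illustration: `ContentHashing`, `ComputeId`, and `SelectUnindexed` are hypothetical names, and it assumes you can load the set of IDs already present in your vector store.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class ContentHashing
{
    // Stable ID derived from the text itself: identical content always maps to the same ID.
    public static string ComputeId(string content)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(content));
        return Convert.ToHexString(hash);
    }

    // Returns only the documents whose content hash is not already in the index.
    public static List<string> SelectUnindexed(IEnumerable<string> documents, ISet<string> indexedIds)
    {
        var pending = new List<string>();
        foreach (var doc in documents)
        {
            if (!indexedIds.Contains(ComputeId(doc)))
                pending.Add(doc);
        }
        return pending;
    }
}
```

Only the documents returned by `SelectUnindexed` need to go through the embedding call, which keeps repeated indexing runs cheap.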
Step 2: When User Asks, Search Semantically
public class RAGResponseGenerator
{
    private readonly VectorStore _vectorStore;
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embeddingClient;
    private readonly IChatClient _chatClient;

    public RAGResponseGenerator(VectorStore vectorStore,
        IEmbeddingGenerator<string, Embedding<float>> embeddingClient, IChatClient chatClient)
    {
        _vectorStore = vectorStore;
        _embeddingClient = embeddingClient;
        _chatClient = chatClient;
    }

    public async Task<string> AnswerAsync(string userQuestion)
    {
        // 1. Embed the question (use the same model that embedded the documents)
        var queryEmbedding = await _embeddingClient
            .GenerateAsync(new[] { userQuestion });
        // 2. Search vector database for similar documents
        var relevantDocs = await _vectorStore.SearchAsync(
            vector: queryEmbedding[0].Vector.ToArray(),
            topK: 5,
            threshold: 0.7 // Minimum similarity score
        );
        // 3. Build context from relevant documents
        var context = string.Join("\n\n", relevantDocs
            .Select(d => $"Source: {d.Metadata["Source"]}\n{d.Content}"));
        // 4. Generate response grounded in real data
        var response = await _chatClient.GetResponseAsync(new[]
        {
            new ChatMessage(ChatRole.System,
                "You are a helpful assistant. Answer using ONLY the provided context. " +
                "If the context doesn't contain the answer, say 'I don't have that information.'"),
            new ChatMessage(ChatRole.User, $"Context:\n{context}\n\nQuestion: {userQuestion}")
        });
        return response.Text;
    }
}
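To make the search step less of a black box, here is a minimal in-memory sketch of what a `topK` plus `threshold` query does: score every stored vector against the query, drop weak matches, and return the best few. It is a brute-force illustration, not how a real vector database is implemented (those use approximate indexes), and `InMemoryVectorIndex`, `StoredChunk`, and `SearchHit` are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record StoredChunk(string Id, string Content, float[] Vector);
public sealed record SearchHit(StoredChunk Chunk, double Score);

public sealed class InMemoryVectorIndex
{
    private readonly List<StoredChunk> _chunks = new();

    public void Upsert(StoredChunk chunk)
    {
        _chunks.RemoveAll(c => c.Id == chunk.Id); // replace any existing entry with the same ID
        _chunks.Add(chunk);
    }

    // Scores every stored vector, keeps those at or above the threshold, returns the best topK.
    public IReadOnlyList<SearchHit> Search(float[] query, int topK, double threshold)
    {
        return _chunks
            .Select(c => new SearchHit(c, Cosine(query, c.Vector)))
            .Where(hit => hit.Score >= threshold)
            .OrderByDescending(hit => hit.Score)
            .Take(topK)
            .ToList();
    }

    private static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

An empty result is a useful signal in its own right: when nothing clears the threshold, the generator can answer "I don't have that information" without calling the chat model at all.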
Real-World Use Cases
1. Enterprise Document Search
Problem: "Find all contracts where we agreed to 30-day payment terms"
Keyword search fails. It finds "30 days" but also matches "30-day warranty" in unrelated docs.
RAG solution:
// Semantic search understands intent
var searchResults = await _vectorStore.SearchAsync(
query: "payment terms agreements",
topK: 20
);
// Returns contracts actually discussing payment terms
// Not just keyword matches
2. Customer Support Automation
Problem: Support tickets are repetitive. Your FAQ is massive.
RAG solution:
public class SupportChatbot
{
    // _vectorStore and _chatClient are fields, declared as in RAGResponseGenerator (Step 2)
    public async Task<string> AnswerSupportQuestionAsync(string question)
    {
        // Search FAQ, past tickets, knowledge base
        var relevantArticles = await _vectorStore.SearchAsync(
            query: question,
            filter: new { Type = "FaqOrTicket" }
        );
        // Generate response from actual support history
        var context = string.Join("\n\n", relevantArticles.Select(a => a.Content));
        var response = await _chatClient.GetResponseAsync(
            $"Answer using only this context:\n{context}\n\nCustomer asks: {question}"
        );
        return response.Text;
    }
}
Result: Consistent answers based on real support history, not hallucinated solutions.
3. Technical Documentation Assistant
Problem: Your API docs are 500 pages. Developers give up.
RAG solution:
// "How do I paginate API results?"
// Search finds: Authentication docs, Pagination section, Examples
// Returns: Exactly what the developer needs
var docSearch = await _vectorStore.SearchAsync(
query: "pagination API results",
filter: new { DocumentType = "ApiDocs" },
topK: 3
);
4. Code Analysis & Documentation
Problem: Onboarding takes weeks. New devs can't find relevant code examples.
RAG solution:
public class CodebaseAssistant
{
    // Assumes your entire codebase has already been embedded into _codeVectorStore
    public async Task ShowExamplesAsync()
    {
        // "Show me examples of dependency injection usage"
        var examples = await _codeVectorStore.SearchAsync(
            query: "dependency injection usage examples",
            topK: 10
        );
        // Returns actual code from your repo
    }
}
DO's and DON'Ts for RAG in .NET
✅ DO
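- Chunk documents smartly. Chunks of 512-1024 tokens are a common starting point; tune the size on your own data. Too small and each chunk loses context. Too large and retrieval gets less precise while every retrieved chunk costs more prompt tokens.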
var chunks = ChunkDocument(doc, chunkSize: 512, overlap: 100);
- Store metadata. Source, date, version - makes results traceable.
var vector = new VectorDocument
{
Content = text,
Vector = embedding,
Metadata = new Dictionary<string, string> { ["Source"] = "SalesReport", ["Date"] = DateTime.UtcNow.ToString("O") }
};
- Monitor similarity scores. Not all search results are good results.
var results = await vectorStore.SearchAsync(query, topK: 5);
var confident = results.Where(r => r.SimilarityScore > 0.75);
- Regenerate embeddings when documents change significantly.
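The `ChunkDocument` call in the chunking tip is not defined anywhere in this article, so here is one hedged way it could look. This sketch counts whitespace-separated words as a stand-in for tokens, which is an assumption: a real pipeline would count tokens with the tokenizer that matches its embedding model, and would usually try to split on sentence or section boundaries as well.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Splits text into windows of `chunkSize` words; each window repeats the last
    // `overlap` words of the previous one so context is not cut at a hard boundary.
    public static List<string> ChunkDocument(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        string[] words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        int step = chunkSize - overlap; // how far the window advances each time

        for (int start = 0; start < words.Length; start += step)
        {
            int length = Math.Min(chunkSize, words.Length - start);
            chunks.Add(string.Join(" ", words, start, length));
            if (start + length >= words.Length) break; // this window reached the end of the text
        }
        return chunks;
    }
}
```

With `chunkSize: 4` and `overlap: 1`, a ten-word document yields three chunks, and the last word of each chunk reappears as the first word of the next.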
❌ DON'T
- Embed raw PDFs. Extract text first. Preserve structure.
// Bad
var embedding = await client.GenerateAsync(pdfBytes);
// Good
var text = ExtractTextFromPdf(pdf);
var embedding = await client.GenerateAsync(text);
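- Trust low similarity scores. A match scoring around 0.45 is usually too weak to ground an answer. The right cutoff depends on your embedding model and similarity metric, so calibrate it on your own data.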
// Bad: Use anything over 0.5
// Good: Use results > 0.7, fall back to "I don't know"
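- Use outdated embeddings for new documents. Vectors produced by different embedding models or model versions are not comparable, so mixing them gives inconsistent results.
- Forget about cost. Embedding a million documents is expensive. Plan your chunk strategy.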
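To put a number on the cost point, a back-of-the-envelope estimate is enough: overlap means some tokens get embedded more than once, so the total is chunks times chunk size, not just the raw document length. The sketch below is an assumption-laden illustration. `EmbeddingCostEstimator` is a hypothetical helper, it counts every chunk as full (an upper bound), and the price per million tokens is a parameter you must fill in from your provider's current pricing.

```csharp
using System;

public static class EmbeddingCostEstimator
{
    // Number of overlapping chunks needed to cover one document of `tokens` tokens.
    public static long ChunksPerDocument(long tokens, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));
        if (tokens <= 0) return 0;
        if (tokens <= chunkSize) return 1;

        long step = chunkSize - overlap;
        // The first chunk covers chunkSize tokens; each further chunk adds `step` new tokens.
        return 1 + (tokens - chunkSize + step - 1) / step;
    }

    // Upper bound on tokens sent to the embedding model (every chunk counted as full).
    public static long EmbeddedTokens(long documents, long tokensPerDocument, int chunkSize, int overlap)
        => documents
           * ChunksPerDocument(tokensPerDocument, chunkSize, overlap)
           * Math.Min(chunkSize, tokensPerDocument);

    // pricePerMillionTokens is a placeholder: look up the real figure for your model.
    public static decimal EstimateCost(long embeddedTokens, decimal pricePerMillionTokens)
        => embeddedTokens / 1_000_000m * pricePerMillionTokens;
}
```

For example, a 2,000-token document with `chunkSize: 512` and `overlap: 100` needs 5 chunks, so up to 2,560 tokens are embedded for it, about 28% more than the raw text. Across a million such documents that is roughly 2.56 billion tokens before any re-indexing.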
Vector Databases for .NET
| Database | .NET Support | Best For | Cost |
|---|---|---|---|
| Azure Cosmos DB | ✅ Native | Enterprise, serverless | $$$ |
| Azure AI Search | ✅ Native SDK | Managed vector and hybrid search on Azure | $$$ |
| pgvector (PostgreSQL) | ✅ Npgsql | Self-hosted, low cost | $ |
| Milvus | ✅ Community | Open source, scalable | $ |
| Pinecone | ✅ REST API | Managed, serverless | $$ |
Minimal Example with Cosmos DB
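// Deployment names are placeholders; extension names are from Microsoft.Extensions.AI.OpenAI and may differ by package version
var azureClient = new AzureOpenAIClient(endpoint, credentials);
services.AddChatClient(azureClient.GetChatClient("chat-deployment").AsIChatClient());
services.AddEmbeddingGenerator(azureClient.GetEmbeddingClient("embedding-deployment").AsIEmbeddingGenerator());
// VectorStoreCosmosDb is the app's own Cosmos DB-backed store
services.AddScoped<VectorStoreCosmosDb>();
// Dependency injection handles the rest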
Measuring RAG Quality
Retrieval Metrics
Precision: Of top-5 results, how many are relevant?
var relevant = searchResults.Count(r => r.IsRelevant);
var precision = (double)relevant / searchResults.Count; // cast avoids integer division
// Target: > 0.8
Recall: Of all relevant documents, did we find them?
var foundRelevant = relevantDocuments
.Count(d => searchResults.Contains(d));
var recall = (double)foundRelevant / totalRelevantDocuments; // cast avoids integer division
// Target: > 0.7
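Both metrics can be computed from nothing more than the ranked list of retrieved document IDs and a hand-labeled set of relevant IDs for each test question. The sketch below shows one way to do that; `RetrievalMetrics` and its method names are hypothetical, and it assumes you have such a labeled evaluation set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetrievalMetrics
{
    // Precision@k: fraction of the top-k retrieved IDs that are relevant.
    public static double PrecisionAtK(IReadOnlyList<string> retrievedIds, ISet<string> relevantIds, int k)
    {
        var topK = retrievedIds.Take(k).ToList();
        if (topK.Count == 0) return 0;
        int hits = topK.Count(id => relevantIds.Contains(id));
        return (double)hits / topK.Count;
    }

    // Recall@k: fraction of all relevant IDs that appear in the top-k results.
    public static double RecallAtK(IReadOnlyList<string> retrievedIds, ISet<string> relevantIds, int k)
    {
        if (relevantIds.Count == 0) return 0;
        int found = retrievedIds.Take(k).Distinct().Count(id => relevantIds.Contains(id));
        return (double)found / relevantIds.Count;
    }
}
```

If a search returns five documents of which two are relevant, and four relevant documents exist in total, precision@5 is 0.4 and recall@5 is 0.5. Averaging both over a set of test questions gives a baseline to compare against whenever you change chunk size, model, or threshold.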
Conclusion
RAG reduces hallucinations by grounding the model's answers in your actual data.
Key Takeaways:
- Embeddings = Text as math. They capture semantic meaning.
- RAG pipeline = Search → Feed → Generate. Find relevant docs, include them, answer based on reality.
- .NET 10 + Microsoft.Extensions.AI makes this native and simple.
- Vector databases store and search embeddings at scale.
- Production readiness requires a chunking strategy, metadata, and similarity thresholds.
Next Steps:
- Review Generative AI for Beginners .NET v2 - Lesson 3 covers RAG
- Choose your vector database (start with pgvector for simplicity)
- Extract and chunk your documents
- Build your first RAG pipeline
Resources
- Microsoft.Extensions.AI Docs
- Generative AI for Beginners .NET v2
- RAG vs Fine-tuning
- pgvector for PostgreSQL
- Azure OpenAI Embeddings
What's your biggest question about RAG? Drop it below!