You swap to a better model. Still wrong answers. You tune your prompt. Still hallucinations. You increase the temperature — no, lower it — still garbage. Sound familiar?
Here are three failure modes I hit repeatedly while building a RAG API in .NET:
The confident wrong answer. The LLM states a fact with full certainty. The document says the opposite. You look at the retrieved chunk — it was cut in the middle of a sentence, and the half that made it into context was the setup, not the conclusion.
The "I don't know" on an obvious question. The user asks something the document clearly answers. The LLM shrugs. You trace it: the exact answer spans the last two words of chunk N and the first sentence of chunk N+1. Neither chunk scores high enough on its own to make the retrieval cut.
The bloated non-answer. The LLM returns 400 words of vague summary when the user needed a number. The retrieved chunk was an entire page. There were five relevant sentences in it and 900 tokens of noise.
The LLM isn't the problem. The chunks are.
Root Cause: Chunk Boundaries Define Retrieval Quality
In RAG, the pipeline works like this: you embed each chunk into a vector, store those vectors, and at query time you find the chunks most similar to the user's question. The LLM never sees your document — it sees only the chunks you hand it.
This means every embedding is only as good as the text it encodes. And the text it encodes is defined entirely by where you drew the chunk boundaries.
Too large: the chunk contains the answer buried in irrelevant context. The embedding drifts toward the noise. Token cost spikes. The LLM has to wade through padding to find the signal.
Too small: the answer spans two chunks. Each chunk, alone, doesn't capture enough meaning to rank high. Both miss the similarity threshold. The answer is never retrieved.
Wrong boundary: you cut mid-sentence. The embedding captures a dangling clause, not a complete thought. Semantic similarity breaks down.
The defaults in this project — ChunkSize: 1000 characters, ChunkOverlap: 200 — are a starting point, not gospel:
// src/RagApi.Application/Models/DocumentProcessingOptions.cs
public class DocumentProcessingOptions
{
    public string DefaultChunkingStrategy { get; set; } = "Fixed";
    public int ChunkSize { get; set; } = 1000;  // characters
    public int ChunkOverlap { get; set; } = 200;
}
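These defaults typically flow in through the standard .NET options pattern. Assuming a configuration section named `DocumentProcessing` (the section name is my guess, not confirmed by the project), the equivalent appsettings.json fragment would be:

```json
{
  "DocumentProcessing": {
    "DefaultChunkingStrategy": "Fixed",
    "ChunkSize": 1000,
    "ChunkOverlap": 200
  }
}
```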
But what matters even more than the size is where you draw the boundaries. That's what the three strategies below address.
The Pipeline in One Paragraph
Before diving into strategies, here's where chunking lives in the full upload flow. DocumentService.UploadDocumentAsync runs four sequential steps: extract text from the raw file (PDF, DOCX, TXT, Markdown), chunk the text using the selected strategy, generate embeddings for every chunk, and upsert those embeddings into the vector store. Chunking is step 2 — everything after it depends on getting step 2 right.
// Step 1: Extract text from document
var text = await _documentProcessor.ExtractTextAsync(fileStream, contentType, cancellationToken);
// Step 2: Chunk the text
var chunks = _documentProcessor.ChunkText(document.Id, text, chunkingOptions);
// Step 3: Generate embeddings for all chunks
var embeddings = await _embeddingService.GenerateEmbeddingsAsync(chunkTexts, cancellationToken);
// Step 4: Store chunks in vector database
await _vectorStore.UpsertChunksAsync(_workspaceContext.Current.CollectionName, chunks, cancellationToken);
Now let's look at each strategy.
Strategy 1: Fixed-Size With Paragraph-Aware Overlap
Good for: mixed document corpora, financial reports, legal docs, anything where you don't know the structure in advance.
The "fixed" in the name is slightly misleading. This strategy doesn't blindly slice at character N. It splits at paragraph boundaries first, then accumulates paragraphs until adding the next paragraph would exceed ChunkSize. At that point it saves the current chunk and begins the next one with ChunkOverlap characters carried over from the tail of the previous chunk.
// src/RagApi.Infrastructure/DocumentProcessing/DocumentProcessor.cs
private static List<DocumentChunk> ChunkByFixed(Guid documentId, string text, ChunkingOptions options)
{
    var chunks = new List<DocumentChunk>();
    var paragraphs = Regex.Split(text, options.SeparatorPattern)
        .Where(p => !string.IsNullOrWhiteSpace(p))
        .ToList();

    var currentChunk = new StringBuilder();
    var chunkIndex = 0;

    foreach (var paragraph in paragraphs)
    {
        if (currentChunk.Length > 0 &&
            currentChunk.Length + paragraph.Length > options.ChunkSize)
        {
            chunks.Add(CreateChunk(documentId, currentChunk.ToString().Trim(), chunkIndex++, ...));

            // Start new chunk with overlap
            var overlapText = GetOverlapText(currentChunk.ToString(), options.ChunkOverlap);
            currentChunk.Clear();
            currentChunk.Append(overlapText);
        }

        currentChunk.AppendLine(paragraph);
    }

    // ... flush final chunk
    return chunks;
}
The overlap is word-boundary aware — GetOverlapText finds the first space after text.Length - overlapSize rather than slicing at a raw character index. This prevents a chunk from starting with "...ompany reported a record" when it should start with "company reported a record".
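The post doesn't show GetOverlapText itself, so here's a minimal sketch of what a word-boundary-aware version can look like (my reconstruction, not the project's actual code):

```csharp
// Sketch of a word-boundary-aware overlap helper (my reconstruction, not the
// project's actual GetOverlapText implementation).
static string GetOverlapText(string text, int overlapSize)
{
    if (overlapSize <= 0 || text.Length <= overlapSize)
        return text;

    // Ideal start of the overlap window, counted back from the end.
    var start = text.Length - overlapSize;

    // Advance to the first space at or after 'start' so the overlap begins
    // on a whole word instead of a partial one like "...ompany".
    var spaceIndex = text.IndexOf(' ', start);
    if (spaceIndex == -1)
        return text.Substring(start); // one long unbroken word: raw slice

    return text.Substring(spaceIndex + 1);
}
```

The fallback matters: if the tail of the chunk is a single unbroken token longer than the overlap, a raw slice is the only option left.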
Tradeoffs:
- ✅ Predictable token budget — you know your maximum chunk size
- ✅ Works on any document type without structure assumptions
- ✅ Overlap means a sentence at a chunk boundary still has context in the next chunk
- ❌ A single paragraph larger than `ChunkSize` will still be split mid-paragraph
- ❌ Overlap is character-based, not semantic — the carried-over text might not be the most relevant part
When to use: your default for any document corpus where you don't know the structure in advance. Switch to one of the targeted strategies once you know what you're ingesting.
Strategy 2: Sentence-Based
Good for: factual Q&A over research papers, product manuals, FAQs — text where the answer to a question is typically one or two complete sentences.
The key insight is that embedding quality peaks when the encoded text is a complete thought. A sentence is the smallest unit of complete meaning. This strategy splits on .!? boundaries, accumulates sentences until the next sentence would overflow ChunkSize, and carries the last sentence of the previous chunk into the next one as overlap.
// src/RagApi.Infrastructure/DocumentProcessing/DocumentProcessor.cs
private static List<DocumentChunk> ChunkBySentence(Guid documentId, string text, ChunkingOptions options)
{
    var sentences = Regex.Split(text, @"(?<=[.!?])\s+")
        .Select(s => s.Trim())
        .Where(s => !string.IsNullOrWhiteSpace(s))
        .ToList();

    var chunks = new List<DocumentChunk>();
    var chunkIndex = 0;
    var currentChunk = new StringBuilder();
    var lastSentence = string.Empty; // one-sentence overlap

    foreach (var sentence in sentences)
    {
        if (currentChunk.Length > 0 &&
            currentChunk.Length + sentence.Length + 1 > options.ChunkSize)
        {
            chunks.Add(CreateChunk(documentId, currentChunk.ToString().Trim(), chunkIndex++, ...));

            // Start next chunk with the last sentence as overlap
            currentChunk.Clear();
            if (!string.IsNullOrWhiteSpace(lastSentence))
                currentChunk.Append(lastSentence).Append(' ');
        }

        lastSentence = sentence;
        currentChunk.Append(sentence).Append(' ');
    }

    // ... flush final chunk
    return chunks;
}
One-sentence overlap means the answer sentence is never the very first token of a chunk with no preceding context. The chunk before it and the chunk after it both have at least one connecting sentence.
Tradeoffs:
- ✅ Embeddings capture complete semantic units — cosine similarity is more reliable
- ✅ Best retrieval precision for direct factual questions
- ❌ Chunk sizes vary wildly — a three-word sentence and a 200-word sentence get equal weight
- ❌ The regex splitter breaks on abbreviations: `"Mr. Smith arrived"` becomes two sentences. Same for `"e.g."`, `"i.e."`, and decimal numbers. Good enough for most corpora; not production-grade for scientific text
When to use: FAQ documents, product manuals, research papers, anything dense with discrete facts where users ask direct questions.
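If abbreviation breakage matters for your corpus, one low-effort mitigation is to shield known abbreviations before the split and restore them afterward. A sketch (`SplitSentences` and the abbreviation list are mine, not the project's):

```csharp
// Sketch: protect known abbreviations before the sentence split, restore them
// afterward. The abbreviation list is illustrative, not exhaustive, and this
// is not the project's code — just one low-effort mitigation.
static List<string> SplitSentences(string text)
{
    string[] abbreviations = { "Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc." };
    const string marker = "\u0001"; // placeholder unlikely to appear in prose

    foreach (var abbr in abbreviations)
        text = text.Replace(abbr, abbr.Replace(".", marker));

    return Regex.Split(text, @"(?<=[.!?])\s+")       // same split as above
        .Select(s => s.Replace(marker, ".").Trim())  // restore the dots
        .Where(s => !string.IsNullOrWhiteSpace(s))
        .ToList();
}
```

For scientific text you'd still want a real sentence tokenizer; this only patches the cases you enumerate.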
Strategy 3: Paragraph-Based
Good for: well-structured prose — internal wikis, policy PDFs, documentation sites — where a paragraph is a coherent topic unit.
This is the simplest strategy: split on blank lines, make each paragraph exactly one chunk, no size cap.
// src/RagApi.Infrastructure/DocumentProcessing/DocumentProcessor.cs
private static List<DocumentChunk> ChunkByParagraph(Guid documentId, string text)
{
    var paragraphs = Regex.Split(text, @"\n\n|\r\n\r\n")
        .Select(p => p.Trim())
        .Where(p => !string.IsNullOrWhiteSpace(p))
        .ToList();

    var chunks = new List<DocumentChunk>();
    var position = 0;

    for (int i = 0; i < paragraphs.Count; i++)
    {
        var para = paragraphs[i];
        chunks.Add(CreateChunk(documentId, para, i, position, position + para.Length));
        position += para.Length + 2; // +2 accounts for the blank-line separator
    }

    return chunks;
}
No size cap is intentional. The paragraph boundary is the semantic boundary. Imposing an artificial size limit would require introducing mid-paragraph cuts, which is exactly the failure mode we're trying to avoid. You accept variable sizes in exchange for zero mid-thought cuts.
Tradeoffs:
- ✅ Most semantically coherent chunks
- ✅ Zero mid-thought cuts — every chunk is a complete idea
- ❌ Sizes vary wildly — a one-liner and a 2000-word section get equal treatment
- ❌ Very large paragraphs overflow the LLM context window. There is no safety size cap in this implementation — something to add if you're ingesting documents with monster paragraphs
When to use: well-structured prose where the author already did the work of organizing information into coherent blocks.
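If you do need to guard against monster paragraphs, a small post-processing step can sub-split anything over a cap at sentence boundaries. This is a sketch of one possible safety cap, not part of the project (`CapParagraph` is a hypothetical name):

```csharp
// Sketch: a safety cap for the paragraph strategy (not in the project's code).
// Oversized paragraphs are sub-split at sentence boundaries rather than
// shipped whole; everything else passes through as a single chunk.
static IEnumerable<string> CapParagraph(string paragraph, int maxSize)
{
    if (paragraph.Length <= maxSize)
    {
        yield return paragraph;
        yield break;
    }

    var current = new StringBuilder();
    foreach (var sentence in Regex.Split(paragraph, @"(?<=[.!?])\s+"))
    {
        if (current.Length > 0 && current.Length + sentence.Length + 1 > maxSize)
        {
            yield return current.ToString().Trim();
            current.Clear();
        }
        current.Append(sentence).Append(' ');
    }

    if (current.Length > 0)
        yield return current.ToString().Trim();
}
```

You keep the author's paragraph boundaries in the common case and only fall back to sentence cuts when a paragraph would blow past your embedding model's input limit.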
Decision Table
| Strategy | Boundary | Overlap | Best for | Watch out for |
|---|---|---|---|---|
| Fixed | Paragraph | Character (word-safe) | Mixed/unknown docs, legal | Long single paragraphs |
| Sentence | Sentence (`.!?`) | Last sentence | Factual Q&A, manuals, research | Abbreviations, lists |
| Paragraph | Blank line | None | Structured prose, wikis, policy | Huge paragraphs, no size cap |
How to Choose
The decision is simpler than it looks:
- Don't know your document structure? → use `Fixed`. It's the safe default and handles the widest range of inputs.
- Users ask specific factual questions? → use `Sentence`. Precision beats coverage for Q&A workloads.
- Documents are well-structured prose where paragraphs are deliberate? → use `Paragraph`. Let the author's structure do the work.
- Mixing document types across a workspace? → use `Fixed` as the default, and override per upload using the `chunkingStrategy` parameter.
That last point is important. Every upload can override the default strategy without touching the server config:
// DocumentService.UploadDocumentAsync signature
public async Task<Document> UploadDocumentAsync(
    Stream fileStream,
    string fileName,
    string contentType,
    List<string>? tags = null,
    ChunkingStrategy? chunkingStrategy = null,  // override per upload
    CancellationToken cancellationToken = default)
Pass ChunkingStrategy.Sentence for a product manual, ChunkingStrategy.Paragraph for a policy doc, and null (uses config default) for everything else. The strategy is resolved at call time — no restart required.
Where This Lives
All three strategies are implemented in DocumentProcessor.cs in the open-source dotnet-rag-api project — a full RAG API built on .NET 8 with Clean Architecture, Qdrant for vector storage, and support for OpenAI, Azure OpenAI, or local Ollama as the AI provider.
The IDocumentProcessor interface is decoupled from the vector store: you can run it against Qdrant Cloud, Azure AI Search, or a local Qdrant instance and the chunking logic doesn't change. The same three strategies work regardless of which backend you use for embeddings and retrieval.
If you're hitting retrieval quality issues, look at your chunks before you look at your model.