When Retrieval Returns Nothing
Your RAG system works perfectly in testing. You feed it documents, run queries, get relevant chunks back. Deploy to production and suddenly 40% of user queries return empty results — not bad results, literally nothing. The retriever finds zero documents. The LLM falls back to "I don't have enough information to answer that."
This doesn't happen in tutorials because they use clean, preprocessed datasets. Production data arrives messy, inconsistent, and structurally unpredictable.
The problem is embedding normalization drift. During development, you probably embedded your document corpus with consistent preprocessing — lowercase, whitespace normalized, maybe some punctuation stripping. But user queries arrive raw. When query preprocessing doesn't match document preprocessing, cosine similarity tanks.
Here's what actually breaks:
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Document embedded during ingestion (preprocessed to lowercase)
doc_text = "the quick brown fox jumps over the lazy dog"
doc_embedding = model.encode(doc_text, normalize_embeddings=True)

# User query arrives with different capitalization
query_text = "The Quick Brown Fox Jumps Over The Lazy Dog"
query_embedding = model.encode(query_text, normalize_embeddings=True)

# Both vectors are unit-normalized, so the dot product is cosine similarity
similarity = np.dot(doc_embedding, query_embedding)
print(f"Cosine similarity: {similarity:.4f}")
```
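The guard against this is to route documents and queries through one shared normalization step before encoding. Here's a minimal sketch: the `normalize` helper and its exact cleanup steps are illustrative assumptions, not a specific library API, so adapt them to whatever preprocessing your ingestion pipeline actually applied.

```python
import re
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def normalize(text: str) -> str:
    """Apply identical cleanup to documents and queries before encoding."""
    text = text.lower()               # match the lowercased corpus
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip()

# Same path for both sides: ingest-time documents and runtime queries
doc_embedding = model.encode(
    normalize("the quick brown fox jumps over the lazy dog"),
    normalize_embeddings=True,
)
query_embedding = model.encode(
    normalize("The Quick Brown Fox Jumps Over The Lazy Dog"),
    normalize_embeddings=True,
)
```

By construction, capitalization and whitespace differences can no longer pull the two embeddings apart, because both strings hit the encoder in the same canonical form.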
---
*Continue reading the full article on [TildAlice](https://tildalice.io/rag-pipeline-failures-3-production-issues/)*