The Boring Debug Checklist That Fixes Most “RAG Failures”

#rag #dataengineering #llm #architecture

Introduction
There’s a quiet lie in the RAG world:
“When retrieval breaks, upgrade the model.”
Except… almost every RAG failure I’ve seen had nothing to do with the model.
The failure was upstream, invisible, unmonitored, and extremely boring.
If you debug RAG in the wrong order, you’ll waste weeks tuning embeddings while your ingestion pipeline is silently betraying you.
So here’s the practical, battle-tested checklist to debug RAG before you blame the model.

1. Ingestion Check (the silent killer)
Diff your extraction outputs weekly.
Even a slight change in the structure of the extracted text shifts everything downstream. Common culprits:
• Table formats
• Nested headings
• OCR quirks
• Export tool differences
• Collapsed HTML
Ingestion drift is the #1 cause of sudden retrieval collapse.
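One way to make the weekly diff routine is to keep a snapshot of the extracted text for each document and compare it with the previous run. The sketch below assumes a simple `{doc_id: extracted_text}` snapshot format and a hypothetical `drift_report` helper. It uses only Python's standard `difflib`, comparing line by line so that flattened tables and collapsed headings show up as large drops in similarity.

```python
import difflib


def drift_report(old: dict[str, str], new: dict[str, str],
                 threshold: float = 0.95) -> dict[str, list]:
    """Compare two extraction snapshots keyed by document ID (hypothetical format)."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = []
    for doc_id in sorted(old.keys() & new.keys()):
        # Line-level comparison, so flattened tables or merged headings
        # register as large drops in similarity.
        ratio = difflib.SequenceMatcher(
            None, old[doc_id].splitlines(), new[doc_id].splitlines()
        ).ratio()
        if ratio < threshold:
            changed.append((doc_id, round(ratio, 3)))
    return {"removed": removed, "added": added, "changed": changed}


if __name__ == "__main__":
    last_week = {
        "pricing.pdf": "Plan | Price\nBasic | $10\nPro | $20",
        "faq.html": "## Billing\nHow do I pay?",
    }
    this_week = {
        # Same table, but the export tool now emits it as a single line.
        "pricing.pdf": "Plan Price Basic $10 Pro $20",
        "faq.html": "## Billing\nHow do I pay?",
        "terms.md": "# Terms",
    }
    print(drift_report(last_week, this_week))
    # → {'removed': [], 'added': ['terms.md'], 'changed': [('pricing.pdf', 0.0)]}
```

The 0.95 threshold is an arbitrary starting point. Calibrate it against a few weeks of known-good runs so that routine content edits don't drown out real structural drift.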

2. Chunking Check (where retrieval truly breaks)
Chunking looks trivial. In practice it's fragile:
• Boundary drift
• Overlap variance
• Mid-sentence cuts
• Format inconsistencies
• Mixed Markdown/HTML/PDF segment logic
Everything in retrieval depends on stable chunks.
Version your segmenter like you version code.
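Here is a minimal sketch of what versioning the segmenter can look like. It assumes a sentence-based chunker with a soft character budget and sentence overlap. `SegmenterConfig`, `chunk`, and `mid_sentence_cuts` are hypothetical names, and the regex sentence splitter is a deliberate simplification. Every chunk carries a fingerprint of the exact config that produced it, so a changed boundary rule or overlap setting is visible in the index instead of silently mixing with older chunks.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SegmenterConfig:
    # Bump `version` whenever the splitting logic itself changes.
    version: str = "v1"
    max_chars: int = 400          # soft budget per chunk
    overlap_sentences: int = 1    # sentences carried into the next chunk

    def fingerprint(self) -> str:
        # Stable short hash of the whole config, stored on every chunk.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk(text: str, cfg: SegmenterConfig) -> list[dict]:
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        # Close the chunk before it would exceed the budget; never split a sentence.
        if current and len(" ".join(current + [sentence])) > cfg.max_chars:
            chunks.append(" ".join(current))
            current = current[-cfg.overlap_sentences:] if cfg.overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    fp = cfg.fingerprint()
    return [{"index": i, "segmenter": fp, "text": c} for i, c in enumerate(chunks)]


def mid_sentence_cuts(chunks: list[dict]) -> int:
    # Chunks that don't end in terminal punctuation were likely cut mid-sentence.
    return sum(1 for c in chunks if not c["text"].rstrip().endswith((".", "!", "?")))


if __name__ == "__main__":
    doc = ("RAG is a system. Systems fail in boring ways. "
           "Check ingestion first. Then check chunking.")
    cfg = SegmenterConfig(max_chars=60)
    chunks = chunk(doc, cfg)
    for c in chunks:
        print(c["index"], c["segmenter"], c["text"])
    print("mid-sentence cuts:", mid_sentence_cuts(chunks))  # → mid-sentence cuts: 0
```

Storing the fingerprint on each chunk also makes partial re-indexing easy to spot: an index that contains more than one fingerprint was built by more than one segmenter.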

3. Metadata Check
If metadata is wrong, stale, missing, or flattened, retrieval becomes misleading even when the embeddings are perfect.
Check:
• doc IDs
• hierarchy
• section labels
• timestamps
• type tags
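These checks can be automated as a per-chunk validator. The sketch makes several assumptions: each chunk's metadata is a plain dict with the five fields listed, `hierarchy` should be a list of headings (a joined string counts as flattened), timestamps are ISO 8601 with a UTC offset, and type tags come from a fixed vocabulary. `ALLOWED_TYPES`, the one-year staleness window, and `metadata_problems` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("doc_id", "hierarchy", "section", "timestamp", "type")
ALLOWED_TYPES = {"policy", "faq", "api_ref"}  # hypothetical tag vocabulary


def metadata_problems(meta: dict, now: datetime,
                      max_age: timedelta = timedelta(days=365)) -> list[str]:
    # Treat empty values the same as missing ones.
    problems = [f"missing:{name}" for name in REQUIRED_FIELDS if not meta.get(name)]

    # A heading path joined into one string has lost its structure.
    hierarchy = meta.get("hierarchy")
    if hierarchy and not isinstance(hierarchy, list):
        problems.append("flattened:hierarchy")

    ts = meta.get("timestamp")
    if ts:
        try:
            parsed = datetime.fromisoformat(ts)
        except ValueError:
            problems.append("invalid:timestamp")
        else:
            if parsed.tzinfo is None:
                # Naive timestamps can't be compared safely across sources.
                problems.append("naive:timestamp")
            elif now - parsed > max_age:
                problems.append("stale:timestamp")

    tag = meta.get("type")
    if tag and tag not in ALLOWED_TYPES:
        problems.append(f"unknown:type={tag}")
    return problems


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    good = {"doc_id": "kb-42", "hierarchy": ["Billing", "Refunds"], "section": "Refunds",
            "timestamp": "2024-11-03T09:00:00+00:00", "type": "faq"}
    bad = {"doc_id": "kb-43", "hierarchy": "Billing > Refunds", "section": "",
           "timestamp": "2022-05-01T00:00:00+00:00", "type": "FAQ"}
    print(metadata_problems(good, now))  # → []
    print(metadata_problems(bad, now))
    # → ['missing:section', 'flattened:hierarchy', 'stale:timestamp', 'unknown:type=FAQ']
```

Note that a case mismatch in a type tag ("FAQ" vs "faq") is enough to make an exact-match filter silently exclude the chunk.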

4. Embedding Check
Most embedding issues come from inconsistency, not quality:
• Mixed model versions
• Vector norm drift
• Partial corpus updates
• Tokenization mismatch
• Hidden characters in text
Drift in vector space is easy to detect. Most teams never look.
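Looking is cheap. Here is a minimal sketch that assumes each stored record keeps the embedding model name, the vector, and the source text (a hypothetical schema), and that you recorded a baseline mean vector norm when the index was last known to be healthy. It flags mixed model versions, mixed dimensions, mean-norm drift beyond a tolerance, and invisible Unicode format characters (category `Cf`, e.g. zero-width spaces) in the text. `embedding_health` and the model names are illustrative.

```python
import math
import statistics
import unicodedata


def embedding_health(records: list[dict], expected_model: str,
                     baseline_norm: float, tolerance: float = 0.05) -> dict:
    """records: [{"id", "model", "vector", "text"}] (hypothetical schema)."""
    if not records:
        raise ValueError("no records to check")
    norms = [math.sqrt(sum(x * x for x in r["vector"])) for r in records]
    mean_norm = statistics.fmean(norms)
    return {
        # Any model name other than the expected one suggests a partial re-embed.
        "mixed_models": sorted({r["model"] for r in records} - {expected_model}),
        # More than one dimensionality means incompatible vectors share an index.
        "dims": sorted({len(r["vector"]) for r in records}),
        # Relative change in mean norm versus the recorded baseline.
        "norm_drift": abs(mean_norm - baseline_norm) / baseline_norm > tolerance,
        # Unicode "Cf" (format) characters are invisible in most viewers
        # but can still reach the tokenizer.
        "hidden_chars": [r["id"] for r in records
                         if any(unicodedata.category(ch) == "Cf" for ch in r["text"])],
    }


if __name__ == "__main__":
    records = [
        {"id": "a", "model": "emb-v2", "vector": [0.6, 0.8], "text": "refund policy"},
        {"id": "b", "model": "emb-v2", "vector": [1.0, 0.0], "text": "billing\u200bcycle"},
        {"id": "c", "model": "emb-v1", "vector": [3.0, 4.0], "text": "plan limits"},
    ]
    print(embedding_health(records, expected_model="emb-v2", baseline_norm=1.0))
    # → {'mixed_models': ['emb-v1'], 'dims': [2], 'norm_drift': True, 'hidden_chars': ['b']}
```

A baseline of 1.0 fits models that return unit-normalized vectors. For other models, record the observed mean norm instead.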

5. Retrieval Config Check
Defaults are landmines.
Tune:
• top-k
• similarity metric
• MMR / diversity
• filters
• hybrid search weights
A tiny change in config often fixes “unrecoverable” retrieval problems.
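Part of the fix is making every knob explicit so that nothing silently falls back to a library default. This is a minimal, library-agnostic sketch that assumes each candidate arrives with a dense vector, a keyword score already normalized to [0, 1], and a metadata dict (a hypothetical schema). It pins the similarity metric to cosine, blends dense and keyword scores linearly (one common way to weight hybrid search, not the only one), applies exact-match metadata filters, and re-ranks with Maximal Marginal Relevance (MMR). `RetrievalConfig` and `retrieve` are illustrative names.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalConfig:
    # Every knob pinned explicitly; values are illustrative, not recommendations.
    top_k: int = 5
    fetch_k: int = 20           # candidates kept before MMR re-ranking
    mmr_lambda: float = 0.7     # 1.0 = pure relevance, 0.0 = pure diversity
    hybrid_alpha: float = 0.6   # weight on dense similarity vs. keyword score
    filters: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    # Assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _mmr_score(cand: dict, selected: list[dict], relevance: dict, lam: float) -> float:
    # Relevance minus similarity to the closest already-selected chunk.
    redundancy = max((cosine(cand["vector"], s["vector"]) for s in selected), default=0.0)
    return lam * relevance[cand["id"]] - (1 - lam) * redundancy


def retrieve(query_vec: list[float], candidates: list[dict], cfg: RetrievalConfig) -> list[str]:
    # 1. Exact-match metadata filters.
    pool = [c for c in candidates
            if all(c["meta"].get(k) == v for k, v in cfg.filters.items())]
    # 2. Hybrid relevance: linear blend of dense cosine and keyword score.
    relevance = {c["id"]: cfg.hybrid_alpha * cosine(query_vec, c["vector"])
                 + (1 - cfg.hybrid_alpha) * c["keyword_score"] for c in pool}
    pool = sorted(pool, key=lambda c: relevance[c["id"]], reverse=True)[:cfg.fetch_k]
    # 3. Greedy MMR selection of top_k results.
    selected: list[dict] = []
    while pool and len(selected) < cfg.top_k:
        best = max(pool, key=lambda c: _mmr_score(c, selected, relevance, cfg.mmr_lambda))
        selected.append(best)
        pool.remove(best)
    return [c["id"] for c in selected]


if __name__ == "__main__":
    query = [1.0, 1.0]
    candidates = [
        {"id": "a", "vector": [1.0, 0.9], "keyword_score": 0.0, "meta": {"lang": "en"}},
        {"id": "b", "vector": [1.0, 0.8], "keyword_score": 0.0, "meta": {"lang": "en"}},  # near-duplicate of a
        {"id": "c", "vector": [0.6, 1.0], "keyword_score": 0.0, "meta": {"lang": "en"}},
        {"id": "d", "vector": [1.0, 0.9], "keyword_score": 0.0, "meta": {"lang": "de"}},
    ]
    base = dict(top_k=2, hybrid_alpha=1.0, filters={"lang": "en"})
    print(retrieve(query, candidates, RetrievalConfig(mmr_lambda=1.0, **base)))  # → ['a', 'b']
    print(retrieve(query, candidates, RetrievalConfig(mmr_lambda=0.5, **base)))  # → ['a', 'c']
```

With `mmr_lambda=1.0`, the near-duplicate `b` takes the second slot. At 0.5, MMR swaps it for the more distinct `c`, and the filter keeps `d` out entirely. A single parameter change like this can noticeably change what the model gets to see.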

6. Eval Sanity Check
Never tune without a ground-truth eval set: a fixed list of real queries, each labeled with the chunks that should come back.
Otherwise you’re chasing randomness.
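Even a tiny eval harness turns tuning into measurement. This sketch assumes the eval set is a hand-labeled list of (query, relevant chunk IDs) pairs and that the retriever is any function returning ranked chunk IDs. `evaluate` is a hypothetical helper, and hit rate and recall at k are simply two common retrieval metrics to start with.

```python
from typing import Callable

# (query, IDs of chunks a correct retrieval should return; at least one per query)
EvalSet = list[tuple[str, set[str]]]


def evaluate(retriever: Callable[[str, int], list[str]], eval_set: EvalSet,
             k: int = 5) -> dict[str, float]:
    if not eval_set:
        raise ValueError("eval set is empty")
    hits, recall_sum = 0, 0.0
    for query, relevant in eval_set:
        found = set(retriever(query, k)[:k]) & relevant
        hits += bool(found)                        # at least one relevant chunk in top k
        recall_sum += len(found) / len(relevant)   # share of relevant chunks retrieved
    n = len(eval_set)
    return {"hit_rate@k": hits / n, "recall@k": recall_sum / n}


if __name__ == "__main__":
    eval_set = [
        ("how do refunds work", {"kb-1", "kb-2"}),
        ("reset password", {"kb-7"}),
    ]
    # Stand-in retriever with canned results; swap in the real pipeline.
    canned = {
        "how do refunds work": ["kb-1", "kb-9", "kb-3"],
        "reset password": ["kb-4", "kb-5"],
    }
    print(evaluate(lambda q, k: canned[q][:k], eval_set, k=3))
    # → {'hit_rate@k': 0.5, 'recall@k': 0.25}
```

Run it before and after every config change, and keep the change only if the numbers improve on the same fixed set.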

Conclusion: Debug Before You Blame
Models are rarely the problem.
RAG is a system, and systems fail in predictable, boring ways.
This checklist has saved more retrieval pipelines than prompt tuning ever has.

DEV Community
