Quick Reference: Terms You'll Encounter
Technical Acronyms:
- RAG: Retrieval-Augmented Generation—enhancing LLM responses with retrieved context
- BERT: Bidirectional Encoder Representations from Transformers—foundational language model
- MRR: Mean Reciprocal Rank—average of reciprocal ranks of first relevant result
- nDCG: Normalized Discounted Cumulative Gain—measures ranking quality with position weighting
- API: Application Programming Interface—programmatic service access
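To make the two ranking metrics above concrete, here is a minimal sketch in plain Python. The function names and input shapes are assumptions chosen for illustration: MRR takes one list of 0/1 relevance flags per query (in ranked order), and nDCG takes one list of graded relevance scores for a single query, using the common log2 position discount.

```python
import math

def mean_reciprocal_rank(ranked_relevance):
    """Average, over queries, of 1 / (rank of the first relevant result).

    ranked_relevance: one list per query of 0/1 flags in ranked order.
    A query with no relevant result contributes 0.
    """
    total = 0.0
    for flags in ranked_relevance:
        for position, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / position
                break  # only the first relevant result counts
    return total / len(ranked_relevance)

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# Three queries: first relevant hit at rank 2, at rank 1, and never.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))  # (1/2 + 1 + 0) / 3 = 0.5
print(ndcg([3, 2, 1]))  # already in ideal order, so the ratio is 1.0
print(ndcg([1, 2, 3]))  # best document ranked last, so the score drops below 1
```

The sketch assumes the `gains` list contains every judged document for the query; if relevant documents are missing from the list, the ideal ordering (and so the score) would be overstated.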
Statistical & Mathematical Terms:
- Precision@K: Proportion of top K results that are relevant
- Recall@K: Proportion of all relevant items found in top K
- Latency: Time from query to response (typically measured in milliseconds)
- Throughput: Queries processed per second
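The four terms above can be computed in a few lines. This is a small sketch, not a benchmark harness: the document IDs are made up, and `measure` is a hypothetical helper that times sequential calls to whatever search function you pass in.

```python
import time

def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top k results that are relevant (divides by k by convention)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of all relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def measure(search_fn, queries):
    """Return (mean latency in ms, throughput in queries/sec) for sequential calls."""
    start = time.perf_counter()
    for query in queries:
        search_fn(query)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return 1000.0 * elapsed / len(queries), len(queries) / elapsed

ranked = ["d3", "d1", "d7", "d2", "d9"]   # system output, best first
relevant = {"d1", "d2", "d4"}             # ground-truth relevant documents

print(precision_at_k(ranked, relevant, 5))  # 2 of the top 5 are relevant: 0.4
print(recall_at_k(ranked, relevant, 5))     # 2 of 3 relevant items found: about 0.667
```

Note that precision and recall pull in different directions as K grows: a larger K usually finds more of the relevant items (higher recall) while diluting the top of the list (lower precision).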
Introduction: Why First-Pass Retrieval Isn't Enough
Imagine you're hiring for a senior engineering role. You receive 500 resumes.
Stage 1 (Initial Screening): HR quickly scans each resume for keywords—"Python," "distributed systems," "5+ years." This takes 30 seconds per resume. They pass 50 candidates forward. Fast but imprecise—some great candidates with unusual backgrounds get filtered out, and some keyword-stuffers slip through.
Stage 2 (Deep Evaluation): The hiring manager spends 10 minutes with each of the 50 resumes, reading project descriptions, evaluating career progression, checking for red flags. Much more accurate, but you couldn't do this for all 500.
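This is how two-stage retrieval works. The first stage (bi-encoder) embeds the query and each document separately, so document vectors can be computed ahead of time and searched quickly, but the match is approximate. The second stage (cross-encoder reranker) reads the query and a candidate document together, which is slower but more precise, so it runs only on the shortlist. Together, they deliver both speed and accuracy.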
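The retrieve-then-rerank flow can be sketched with toy scorers. Everything here is an illustrative assumption: `embed` and `cosine` stand in for a neural bi-encoder (each text is vectorized on its own, so document vectors are precomputed), and `pair_score` stands in for a cross-encoder (it looks at the query and the document together). A real system would swap in trained models; only the control flow carries over.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def embed(text):
    """Stage-1 stand-in: bag-of-words vector built from one text in isolation."""
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pair_score(query, doc):
    """Stage-2 stand-in: scores the (query, document) pair jointly.

    Rewards query word pairs that appear in the same order in the document,
    something the order-free stage-1 vectors cannot see.
    """
    q, d = tokenize(query), tokenize(doc)
    doc_bigrams = set(zip(d, d[1:]))
    doc_tokens = set(d)
    bigram_hits = sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams)
    unigram_hits = sum(1 for token in set(q) if token in doc_tokens)
    return 2.0 * bigram_hits + unigram_hits

def first_pass(query, doc_vectors, k):
    """Cheap similarity over the whole corpus; returns the top-k document indices."""
    q_vec = embed(query)
    order = sorted(range(len(doc_vectors)),
                   key=lambda i: cosine(q_vec, doc_vectors[i]), reverse=True)
    return order[:k]

def rerank(query, docs, candidates):
    """Costlier joint scoring, applied only to the shortlist."""
    return sorted(candidates, key=lambda i: pair_score(query, docs[i]), reverse=True)

docs = [
    "python python python distributed systems systems",  # keyword-stuffed
    "a practical guide to building distributed systems in python for teams",
    "python snakes in the wild",
    "gardening tips for spring",
]
doc_vectors = [embed(d) for d in docs]  # computed once, ahead of query time

query = "distributed systems in python"
shortlist = first_pass(query, doc_vectors, k=3)
final = rerank(query, docs, shortlist)

print(docs[shortlist[0]])  # stage 1 favors the keyword-stuffed document
print(docs[final[0]])      # stage 2 promotes the practical guide
```

As in the hiring analogy, the keyword-stuffer wins the quick screen, and the slower pairwise read corrects the order. The shortlist size `k` is the main tuning knob: too small and good candidates never reach stage 2, too large and the expensive scorer dominates latency.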
Here's another analogy: First-pass retrieval is a metal detector; reranking is the archaeologist's brush. The detector quickly finds where to dig. The brush carefully reveals what's actually there. Using only the detector means missing subtle finds. Using only the brush means spending years searching the wrong field.
A third way to think about it: Retrieval is Google's first page; reranking is actually reading the articles. The ranking algorithm gets you close, but clicking through and reading determines what's truly relevant to your specific question.
Bi-Encoders vs Cross-Encoders: The Core Trade-off