Why Are Retrieval Strategies Important?
In the first six articles, we covered document chunking, embedding generation, and vector storage. Now suppose a user asks: "What are the best practices for Python asynchronous programming?"
Your vector database holds 100,000 documents. The naive approach: run a similarity search and return the Top-K most similar documents.
But problems arise:
- Problem 1: Result duplication. The 5 returned articles might all talk about asyncio, with none covering aiohttp or practical pitfalls.
- Problem 2: Low-quality results mixed in. The 5th article might be semantically somewhat related, but actually discusses Go's concurrency model — useless for Python users.
- Problem 3: Queries with explicit conditions. The user asked for "articles about Python in 2024", but pure vector retrieval completely ignores the "2024" time constraint.
This article compares 4 retrieval strategies to help you solve these problems.
Four Retrieval Strategies at a Glance
| Strategy | Core Idea | Problem Solved | Best For |
|---|---|---|---|
| Similarity Search | Sort by vector similarity | Basic retrieval | General use |
| MMR | Balance relevance & diversity | Result duplication | Multi-angle answers needed |
| Threshold Filtering | Only keep high-similarity results | Low-quality mixing in | Quality over quantity |
| Self-Query | Parse query to generate filters | Explicit conditions in query | Time/category constraints |
Experiment Environment
We use 10 technical blog articles as test data, each with metadata (year, category, tags); the full source code is linked at the end of the article:
```json
[
  {"title": "Python Async Programming: From asyncio to aiohttp", "year": 2024, "category": "Backend"},
  {"title": "2024 Python Performance Optimization Guide", "year": 2024, "category": "Backend"},
  {"title": "JavaScript Async Programming: Promise and async/await", "year": 2023, "category": "Frontend"},
  {"title": "2023 Frontend Framework Comparison: React vs Vue vs Angular", "year": 2023, "category": "Frontend"},
  {"title": "Go Microservices: gRPC and Kubernetes", "year": 2024, "category": "Backend"},
  {"title": "Rust Systems Programming: Memory Safety and Zero-Cost Abstractions", "year": 2023, "category": "Systems"},
  {"title": "Python Machine Learning: From NumPy to PyTorch", "year": 2024, "category": "AI"},
  {"title": "2024 Cloud Native Trends: Service Mesh and eBPF", "year": 2024, "category": "Cloud Native"},
  {"title": "Database Selection Guide: PostgreSQL vs MySQL vs MongoDB", "year": 2023, "category": "Database"},
  {"title": "Python Web Scraping: Scrapy vs Playwright", "year": 2024, "category": "Backend"}
]
```
Query: "Python asynchronous programming"
Strategy 1: Similarity Search
Principle
The most basic retrieval method. Convert the query text to a vector, find the K most similar documents in the vector store.
```python
results = vectorstore.similarity_search("Python async programming", k=4)
```
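Under the hood, similarity search is just nearest-neighbor ranking over embedding vectors. A minimal pure-Python sketch of the idea, using toy 2-D vectors in place of real embeddings (`cosine_sim` and `top_k` are illustrative helpers, not library APIs):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    # Score every document against the query, keep the k highest
    scored = [(i, cosine_sim(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))  # documents 0 and 1 rank highest
```

A real vector store does the same thing with approximate nearest-neighbor indexes so it scales past brute force.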
Experiment Results
Retrieved 4 documents, covering 3 categories:
| Rank | Year | Category | Title |
|---|---|---|---|
| 1 | 2024 | Cloud Native | 2024 Cloud Native Trends: Service Mesh and eBPF |
| 2 | 2023 | Frontend | 2023 Frontend Framework Comparison: React vs Vue vs Angular |
| 3 | 2024 | Backend | Python Web Scraping: Scrapy vs Playwright |
| 4 | 2024 | Backend | 2024 Python Performance Optimization Guide |
Analysis
- ✅ Simple and direct, one line of code
- ❌ Results concentrated in few categories (Backend appears twice)
- ❌ May miss content from other relevant angles
Note: The top result is a "Cloud Native" article, which seems counter-intuitive. This is because the BGE model considers this article semantically related (both involve "technology trends" and "services"), but for humans it's clearly not precise enough. This is exactly why multiple strategies should be combined.
Strategy 2: MMR (Maximum Marginal Relevance)
Principle
MMR selects results greedily — at each step it picks the candidate dᵢ with the highest score:

MMR(dᵢ) = λ × Sim(query, dᵢ) − (1−λ) × max over selected dⱼ of Sim(dᵢ, dⱼ)
- First term: Relevance between document di and the query (larger is better)
- Second term: Similarity between document di and already selected documents (smaller is better, ensures diversity)
- λ (lambda_mult): Balance parameter, 0.5 means equal weight for relevance and diversity
```python
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "lambda_mult": 0.5, "fetch_k": 20},
)
```
`fetch_k=20` means the retriever first fetches the 20 nearest candidates, then applies MMR to pick 4 of them. A larger candidate pool generally yields better diversity.
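The greedy selection can be sketched in a few lines of plain Python (toy 3-D vectors stand in for real embeddings; `mmr_select` is an illustrative helper, not the LangChain implementation):

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    # Greedily pick the candidate maximizing
    # λ·Sim(query, d) − (1−λ)·max Sim(d, already selected)
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in candidates:
            relevance = cosine_sim(query_vec, doc_vecs[i])
            redundancy = max((cosine_sim(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected

docs = [[1.0, 0.0, 0.0],   # doc 0: highly relevant
        [0.9, 0.2, 0.0],   # doc 1: near-duplicate of doc 0
        [0.3, 0.3, 0.9]]   # doc 2: different angle
query = [1.0, 0.1, 0.1]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # [0, 2] — skips the duplicate
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1] — pure relevance
```

With λ=1.0 the redundancy term vanishes and the output matches plain similarity ranking, which is why the two extremes below behave as the comments say.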
Experiment Results
Retrieved 4 documents, covering 4 categories:
| Rank | Year | Category | Title |
|---|---|---|---|
| 1 | 2024 | Cloud Native | 2024 Cloud Native Trends: Service Mesh and eBPF |
| 2 | 2024 | Backend | Python Web Scraping: Scrapy vs Playwright |
| 3 | 2023 | Systems | Rust Systems Programming: Memory Safety and Zero-Cost Abstractions |
| 4 | 2023 | Database | Database Selection Guide: PostgreSQL vs MySQL vs MongoDB |
Comparison Analysis
| Metric | Similarity Search | MMR |
|---|---|---|
| Categories covered | 3 | 4 |
| Category list | Backend, Cloud Native, Frontend | Backend, Cloud Native, Systems, Database |
| Characteristic | Concentrated in few categories | More dispersed, more diverse |
MMR Parameter Tuning
```python
# Relevance only — equivalent to plain similarity search
search_kwargs={"k": 4, "lambda_mult": 1.0}

# Diversity only — results may not be very relevant
search_kwargs={"k": 4, "lambda_mult": 0.0}

# Balance both (recommended)
search_kwargs={"k": 4, "lambda_mult": 0.5, "fetch_k": 20}
```
Strategy 3: Similarity Threshold Filtering
Principle
Only keep results that are similar enough to the query; discard the rest.
Important: Chroma returns a distance, not a similarity score — smaller distance means more similar — so when filtering on distance you keep results at or below the threshold.
```python
# Inspect the distance distribution first
query = "Python async programming"
results_with_score = vectorstore.similarity_search_with_score(query, k=10)
for doc, score in results_with_score:
    print(f"distance={score:.4f} | {doc.metadata['title']}")
```
Distance Distribution (Measured)
```
distance=0.8652 | 2024 Cloud Native Trends: Service Mesh and eBPF
distance=0.8764 | 2023 Frontend Framework Comparison: React vs Vue vs Angular
distance=0.8833 | Python Web Scraping: Scrapy vs Playwright
distance=0.8857 | 2024 Python Performance Optimization Guide
distance=0.8906 | Python Machine Learning: From NumPy to PyTorch
distance=0.9019 | Rust Systems Programming: Memory Safety and Zero-Cost Abstractions
distance=0.9024 | Python Async Programming: From asyncio to aiohttp
distance=0.9145 | JavaScript Async Programming: Promise and async/await
distance=0.9147 | Database Selection Guide: PostgreSQL vs MySQL vs MongoDB
distance=0.9481 | Go Microservices: gRPC and Kubernetes
```
Manual Threshold Filtering
```python
threshold = 0.89
filtered = [(doc, score) for doc, score in results_with_score if score <= threshold]
# Result: 4 documents (the first 4, with distance <= 0.89)
```
Analysis
- ✅ Can filter out obviously irrelevant results (Go article at distance 0.9481)
- ⚠️ The threshold needs experimentation: too strict a distance cutoff returns nothing, too loose a cutoff filters nothing out
- 💡 Recommendation: Run a batch of queries to see distance distribution, then set the threshold
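One simple heuristic for that recommendation: collect distances from a batch of queries and take a percentile as the cutoff. `pick_threshold` is a hypothetical helper, shown here on the measured distances above:

```python
def pick_threshold(distances, keep_fraction=0.4):
    # Sort observed distances and keep roughly the closest keep_fraction
    ordered = sorted(distances)
    idx = max(0, int(len(ordered) * keep_fraction) - 1)
    return ordered[idx]

# The measured distances from the experiment above
distances = [0.8652, 0.8764, 0.8833, 0.8857, 0.8906,
             0.9019, 0.9024, 0.9145, 0.9147, 0.9481]

threshold = pick_threshold(distances)
kept = [d for d in distances if d <= threshold]
print(f"threshold={threshold}, kept={len(kept)}")  # threshold=0.8857, kept=4
```

The resulting cutoff (0.8857) matches the manual 0.89 choice above: both keep the top 4 documents.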
Strategy 4: Self-Query (Query Parsing + Metadata Filtering)
Principle
User queries often aren't pure semantic questions — they come with explicit conditions:
- "Articles about Python in 2024" → year=2024, tags=Python
- "Backend development category articles" → category=Backend
- "Frontend-related articles from 2023" → year=2023, category=Frontend
Self-Query core flow:
Natural language query → Parser → Structured filter conditions → Metadata filtering → Vector retrieval
Parser Implementation
Production environments can use an LLM (e.g. LangChain's SelfQueryRetriever) for parsing; here a rule-based parser demonstrates the core logic:
```python
import re

# Categories and tags as they appear in the sample data
CATEGORIES = ["Backend", "Frontend", "Systems", "AI", "Cloud Native", "Database"]
TAGS = ["Python", "JavaScript", "Go", "Rust"]

def parse_query(query: str) -> dict:
    filters = {}
    semantic = query
    # Extract a year such as "2024"
    if match := re.search(r"(20\d{2})", query):
        filters["year"] = int(match.group(1))
        semantic = semantic.replace(match.group(1), "").strip()
    # Extract a category
    for cat in CATEGORIES:
        if cat in query:
            filters["category"] = cat
            break
    # Extract a tag
    for tag in TAGS:
        if tag in query:
            filters["tags"] = tag
            break
    return {"semantic_query": semantic, "filters": filters}
```
Experiment Results
Query 1: "Articles about Python in 2024"
Parse result:
Semantic query: Python
Filter conditions: {'year': 2024, 'tags': 'Python'}
After metadata filtering: 4 documents remain:
- Python Async Programming: From asyncio to aiohttp
- 2024 Python Performance Optimization Guide
- Python Machine Learning: From NumPy to PyTorch
- Python Web Scraping: Scrapy vs Playwright
Query 2: "Backend development category articles"
Parse result:
Semantic query: Backend development
Filter conditions: {'category': 'Backend'}
After metadata filtering: 4 documents remain:
- Python Async Programming: From asyncio to aiohttp
- 2024 Python Performance Optimization Guide
- Go Microservices: gRPC and Kubernetes
- Python Web Scraping: Scrapy vs Playwright
Query 3: "Frontend-related articles from 2023"
Parse result:
Semantic query: Frontend
Filter conditions: {'year': 2023}
After metadata filtering: 4 documents remain:
- JavaScript Async Programming: Promise and async/await
- 2023 Frontend Framework Comparison: React vs Vue vs Angular
- Rust Systems Programming: Memory Safety and Zero-Cost Abstractions
- Database Selection Guide: PostgreSQL vs MySQL vs MongoDB

Note that only the year filter was extracted here, so non-Frontend articles slipped through — a concrete example of how the parser's coverage limits the strategy's effectiveness.
Analysis
- ✅ Precisely responds to users' explicit conditions (time, category, tags)
- ✅ Filter first, then retrieve — dramatically reduces the scope of vector comparisons
- ⚠️ Parser quality determines effectiveness (rule-based vs LLM parsing)
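The "filter first, then retrieve" step is easy to sketch: apply the parsed filters to document metadata before doing any vector work. A toy example with a hypothetical `matches` helper and a three-article subset of the test data:

```python
def matches(meta: dict, filters: dict) -> bool:
    # A document passes when every filter key/value agrees with its metadata
    return all(meta.get(key) == value for key, value in filters.items())

articles = [
    {"title": "Python Async Programming: From asyncio to aiohttp", "year": 2024, "category": "Backend"},
    {"title": "JavaScript Async Programming: Promise and async/await", "year": 2023, "category": "Frontend"},
    {"title": "Go Microservices: gRPC and Kubernetes", "year": 2024, "category": "Backend"},
]

filters = {"year": 2024, "category": "Backend"}
candidates = [a for a in articles if matches(a, filters)]
print(len(candidates))  # 2 — only the 2024 Backend articles survive
```

Vector similarity is then computed only over `candidates`, which is what shrinks the comparison scope.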
Using LLM Parser in Production
```python
from langchain.retrievers.self_query.base import SelfQueryRetriever

self_query_retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Technical blog articles",
    metadata_field_info=[...],  # Define the metadata fields
)
results = self_query_retriever.invoke("Articles about Python in 2024")
```
Note: In LangChain 1.2.16 community packages, the module location of SelfQueryRetriever may vary. Please adjust the import path according to your installed version.
Four Strategies Comparison Summary
| Strategy | Best For | Core Parameters | Caveats |
|---|---|---|---|
| Similarity Search | General use, highest relevance | k | Results may duplicate |
| MMR | Multi-angle answers needed | lambda_mult, fetch_k | Parameters need tuning |
| Threshold Filtering | Quality over quantity | score_threshold | Needs experimentation |
| Self-Query | Queries with explicit conditions | Parser quality | Rule-based or LLM parser |
Combined Usage Recommendation
In real production environments, combining strategies works best:
User query
↓
Self-Query parsing → Metadata filtering (narrow scope)
↓
Vector retrieval → MMR (ensure diversity)
↓
Threshold filtering (remove low-quality)
↓
Top-K results → LLM generates answer
```python
# Combined example
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "lambda_mult": 0.5,
        "fetch_k": 50,
        # Self-Query parsed conditions; note that recent Chroma versions
        # require multiple conditions to be wrapped in {"$and": [...]}
        "filter": {"year": 2024, "category": "Backend"},
    },
)
```
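The whole pipeline can also be mocked end to end in plain Python. This is a toy sketch with made-up titles and 3-D vectors (`retrieve` and its parameters are illustrative, not a library API): metadata filtering drops the 2023 article, MMR drops the near-duplicate, and the distance threshold drops the off-topic one.

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, filters, k=2, lambda_mult=0.5, max_distance=0.6):
    # 1. Metadata filtering (Self-Query output) narrows the candidate pool
    pool = [d for d in docs
            if all(d["meta"].get(key) == v for key, v in filters.items())]
    # 2. Greedy MMR over the filtered pool ensures diversity
    selected = []
    while pool and len(selected) < k:
        def score(d):
            rel = cos(query_vec, d["vec"])
            red = max((cos(d["vec"], s["vec"]) for s in selected), default=0.0)
            return lambda_mult * rel - (1 - lambda_mult) * red
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    # 3. Threshold filtering removes low-quality hits (distance = 1 - cosine)
    return [d for d in selected if 1 - cos(query_vec, d["vec"]) <= max_distance]

docs = [
    {"title": "python-async",      "vec": [1.0, 0.0, 0.0],   "meta": {"year": 2024}},
    {"title": "python-async-copy", "vec": [0.95, 0.0, 0.05], "meta": {"year": 2024}},
    {"title": "databases",         "vec": [0.0, 1.0, 0.0],   "meta": {"year": 2024}},
    {"title": "old-python",        "vec": [1.0, 0.0, 0.0],   "meta": {"year": 2023}},
]

result = retrieve([1.0, 0.1, 0.0], docs, filters={"year": 2024})
print([d["title"] for d in result])  # ['python-async']
```

Each stage removes a different failure mode, which is the point of combining the strategies rather than picking one.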
Complete Code
The complete code for this article is open-sourced at:
https://github.com/chendongqi/llm-in-action/tree/main/07-retrieval-strategies
Core files:
- `retrieval_strategies.py` — complete comparison experiment of the four retrieval strategies
- `data/sample_articles.json` — the 10 test articles
Summary
This article compared 4 retrieval strategies through code experiments:
- Similarity Search — Simple and direct, good for general scenarios
- MMR — Uses λ parameter to balance relevance and diversity, solves result duplication
- Threshold Filtering — Sets thresholds through distance distribution, filters out low-quality results
- Self-Query — Parses natural language into structured filter conditions, precisely responds to constrained queries
Key Insight: There is no best retrieval strategy, only the one most suitable for the current query. Combining Self-Query + MMR + threshold filtering is how you build a retrieval system that is both precise and comprehensive.