WonderLab
RAG Series (7): Retrieval Strategies — How to Find the Most Relevant Content

Why Are Retrieval Strategies Important?

In the first six articles, we covered document chunking, embedding generation, and vector storage. Now suppose a user asks: "What are the best practices for Python asynchronous programming?"

Your vector database has 100,000 documents. The most naive approach is: do a similarity search and return the Top-K most similar documents.

But problems arise:

  • Problem 1: Result duplication. The 5 returned articles might all talk about asyncio, with none covering aiohttp or practical pitfalls.
  • Problem 2: Low-quality results mixed in. The 5th article might be semantically somewhat related, but actually discusses Go's concurrency model — useless for Python users.
  • Problem 3: Queries with explicit conditions. The user asked for "articles about Python in 2024", but pure vector retrieval completely ignores the "2024" time constraint.

This article compares 4 retrieval strategies to help you solve these problems.


Four Retrieval Strategies at a Glance

| Strategy | Core Idea | Problem Solved | Best For |
| --- | --- | --- | --- |
| Similarity Search | Sort by vector similarity | Basic retrieval | General use |
| MMR | Balance relevance & diversity | Result duplication | Multi-angle answers needed |
| Threshold Filtering | Only keep high-similarity results | Low-quality results mixed in | Quality over quantity |
| Self-Query | Parse query to generate filters | Explicit conditions in query | Time/category constraints |

Experiment Environment

We use 10 technical blog articles as test data, each with metadata (year, category, tags). The complete source code is linked at the end of the article.

```json
[
  {"title": "Python Async Programming: From asyncio to aiohttp", "year": 2024, "category": "Backend"},
  {"title": "2024 Python Performance Optimization Guide", "year": 2024, "category": "Backend"},
  {"title": "JavaScript Async Programming: Promise and async/await", "year": 2023, "category": "Frontend"},
  {"title": "2023 Frontend Framework Comparison: React vs Vue vs Angular", "year": 2023, "category": "Frontend"},
  {"title": "Go Microservices: gRPC and Kubernetes", "year": 2024, "category": "Backend"},
  {"title": "Rust Systems Programming: Memory Safety and Zero-Cost Abstractions", "year": 2023, "category": "Systems"},
  {"title": "Python Machine Learning: From NumPy to PyTorch", "year": 2024, "category": "AI"},
  {"title": "2024 Cloud Native Trends: Service Mesh and eBPF", "year": 2024, "category": "Cloud Native"},
  {"title": "Database Selection Guide: PostgreSQL vs MySQL vs MongoDB", "year": 2023, "category": "Database"},
  {"title": "Python Web Scraping: Scrapy vs Playwright", "year": 2024, "category": "Backend"}
]
```

Query: "Python asynchronous programming"


Strategy 1: Similarity Search

Principle

The most basic retrieval method. Convert the query text to a vector, find the K most similar documents in the vector store.

```python
results = vectorstore.similarity_search("Python async programming", k=4)
```

Experiment Results

Retrieved 4 documents, covering 3 categories:

| Rank | Year | Category | Title |
| --- | --- | --- | --- |
| 1 | 2024 | Cloud Native | 2024 Cloud Native Trends: Service Mesh and eBPF |
| 2 | 2023 | Frontend | 2023 Frontend Framework Comparison: React vs Vue vs Angular |
| 3 | 2024 | Backend | Python Web Scraping: Scrapy vs Playwright |
| 4 | 2024 | Backend | 2024 Python Performance Optimization Guide |

Analysis

  • ✅ Simple and direct, one line of code
  • ❌ Results concentrated in few categories (Backend appears twice)
  • ❌ May miss content from other relevant angles

Note: The top result is a "Cloud Native" article, which seems counter-intuitive. The BGE embedding model considers it semantically related (both texts involve "technology trends" and "services"), but to a human reader it is clearly not precise enough. This is exactly why multiple strategies should be combined.


Strategy 2: MMR (Maximum Marginal Relevance)

Principle

MMR's core formula:

```
MMR(dᵢ) = λ · Sim(q, dᵢ) − (1 − λ) · max_{dⱼ ∈ S} Sim(dᵢ, dⱼ)
```

  • First term: relevance between candidate document dᵢ and the query q (larger is better)
  • Second term: similarity between dᵢ and the set S of already selected documents (smaller is better, which ensures diversity)
  • λ (`lambda_mult`): balance parameter; 0.5 gives equal weight to relevance and diversity

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "lambda_mult": 0.5, "fetch_k": 20},
)
```

`fetch_k=20` means the retriever first fetches the 20 most similar candidates, then applies MMR to pick 4 of them. A larger candidate pool yields better diversity.
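To make the formula concrete, here is a from-scratch MMR sketch in plain Python (cosine similarity over toy 3-D vectors; `mmr_select` and `docs` are names invented for this illustration, not LangChain's internal API):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two vectors given as plain lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mmr_select(query_vec, candidates, k=4, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading off relevance vs. redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in remaining:
            relevance = cosine_sim(query_vec, candidates[i])
            # Similarity to the closest already-selected document
            redundancy = max(
                (cosine_sim(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

# doc 0 and doc 1 are near-duplicates; doc 2 is less similar but diverse
docs = [[0.9, 0.1, 0], [0.89, 0.11, 0], [0.7, 0, 0.7]]
picked = mmr_select([1, 0, 0], docs, k=2, lambda_mult=0.5)  # [0, 2]
```

With `lambda_mult=0.5` the near-duplicate (index 1) loses out to the diverse document (index 2); with `lambda_mult=1.0` the selection degenerates to pure similarity ranking, matching the tuning note that 1.0 is equivalent to similarity search.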

Experiment Results

Retrieved 4 documents, covering 4 categories:

Rank Year Category Title
1 2024 Cloud Native 2024 Cloud Native Trends: Service Mesh and eBPF
2 2024 Backend Python Web Scraping: Scrapy vs Playwright
3 2023 Systems Rust Systems Programming: Memory Safety and Zero-Cost Abstractions
4 2023 Database Database Selection Guide: PostgreSQL vs MySQL vs MongoDB

Comparison Analysis

| Metric | Similarity Search | MMR |
| --- | --- | --- |
| Categories covered | 3 | 4 |
| Category list | Backend, Cloud Native, Frontend | Backend, Cloud Native, Systems, Database |
| Characteristic | Concentrated in few categories | More dispersed, more diverse |

MMR Parameter Tuning

```python
# Only pursue relevance
search_kwargs={"k": 4, "lambda_mult": 1.0}  # Equivalent to similarity search

# Only pursue diversity
search_kwargs={"k": 4, "lambda_mult": 0.0}  # Results may not be very relevant

# Balance both (recommended)
search_kwargs={"k": 4, "lambda_mult": 0.5, "fetch_k": 20}
```

Strategy 3: Similarity Threshold Filtering

Principle

Only keep results whose similarity score (distance) exceeds a threshold; discard everything below.

Important: Chroma returns distance, not similarity scores. Smaller distance means more similar.

```python
# First check the distance distribution
results_with_score = vectorstore.similarity_search_with_score(query, k=10)
for doc, score in results_with_score:
    print(f"distance={score:.4f} | {doc.metadata['title']}")
```

Distance Distribution (Measured)

```
distance=0.8652 | 2024 Cloud Native Trends: Service Mesh and eBPF
distance=0.8764 | 2023 Frontend Framework Comparison: React vs Vue vs Angular
distance=0.8833 | Python Web Scraping: Scrapy vs Playwright
distance=0.8857 | 2024 Python Performance Optimization Guide
distance=0.8906 | Python Machine Learning: From NumPy to PyTorch
distance=0.9019 | Rust Systems Programming: Memory Safety and Zero-Cost Abstractions
distance=0.9024 | Python Async Programming: From asyncio to aiohttp
distance=0.9145 | JavaScript Async Programming: Promise and async/await
distance=0.9147 | Database Selection Guide: PostgreSQL vs MySQL vs MongoDB
distance=0.9481 | Go Microservices: gRPC and Kubernetes
```

Manual Threshold Filtering

```python
threshold = 0.89
filtered = [(doc, score) for doc, score in results_with_score if score <= threshold]
# Result: 4 documents (the first 4, with distance <= 0.89)
```

Analysis

  • ✅ Can filter out obviously irrelevant results (Go article at distance 0.9481)
  • ⚠️ Threshold needs experimentation: too high = nothing returned, too low = no filtering effect
  • 💡 Recommendation: Run a batch of queries to see distance distribution, then set the threshold
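To tie the recommendation together, the helper below (hypothetical names, plain Python) replays the filtering step on the distances measured above, plus the distance→relevance conversion that score-threshold retrievers typically expect (relevance ≈ 1 − cosine distance; verify against your store's convention):

```python
def filter_by_distance(scored_docs, threshold):
    """Keep only documents whose distance is at or below the threshold."""
    return [(title, dist) for title, dist in scored_docs if dist <= threshold]

def distance_to_relevance(distance):
    """For cosine distance, relevance scores are commonly 1 - distance."""
    return 1.0 - distance

# Abridged from the measured distribution above
measured = [
    ("2024 Cloud Native Trends: Service Mesh and eBPF", 0.8652),
    ("2023 Frontend Framework Comparison: React vs Vue vs Angular", 0.8764),
    ("Python Web Scraping: Scrapy vs Playwright", 0.8833),
    ("2024 Python Performance Optimization Guide", 0.8857),
    ("Python Machine Learning: From NumPy to PyTorch", 0.8906),
    ("Go Microservices: gRPC and Kubernetes", 0.9481),
]

kept = filter_by_distance(measured, threshold=0.89)  # 4 documents survive
```

Under that conversion, a distance cutoff of 0.89 corresponds to `score_threshold ≈ 0.11` if you switch the retriever to `search_type="similarity_score_threshold"`.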

Strategy 4: Self-Query (Query Parsing + Metadata Filtering)

Principle

User queries often aren't pure semantic questions — they come with explicit conditions:

  • "Articles about Python in 2024" → year=2024, tags=Python
  • "Backend development category articles" → category=Backend
  • "Frontend-related articles from 2023" → year=2023, category=Frontend

Self-Query core flow:

```
Natural language query → Parser → Structured filter conditions → Metadata filtering → Vector retrieval
```

Parser Implementation

Production environments can use an LLM (such as LangChain's SelfQueryRetriever) for parsing; here we use a rule-based parser to demonstrate the core logic:

```python
import re

# Categories taken from the test dataset; the tag list covers the ones
# used by the example queries — extend both for your own corpus.
CATEGORIES = ["Backend", "Frontend", "Systems", "AI", "Cloud Native", "Database"]
TAGS = ["Python", "JavaScript", "Go", "Rust"]

def parse_query(query: str) -> dict:
    filters = {}
    semantic = query

    # Extract year
    if match := re.search(r"(20\d{2})", query):
        filters["year"] = int(match.group(1))

    # Extract category
    for cat in CATEGORIES:
        if cat in query:
            filters["category"] = cat
            break

    # Extract tags
    for tag in TAGS:
        if tag in query:
            filters["tags"] = tag
            break

    # (The full demo additionally strips filler words from `semantic`;
    # see the linked source code.)
    return {"semantic_query": semantic, "filters": filters}
```

Experiment Results

Query 1: "Articles about Python in 2024"

```
Parse result:
  Semantic query: Python
  Filter conditions: {'year': 2024, 'tags': 'Python'}
After metadata filtering, 4 documents remain:
  - Python Async Programming: From asyncio to aiohttp
  - 2024 Python Performance Optimization Guide
  - Python Machine Learning: From NumPy to PyTorch
  - Python Web Scraping: Scrapy vs Playwright
```

Query 2: "Backend development category articles"

```
Parse result:
  Semantic query: Backend development
  Filter conditions: {'category': 'Backend'}
After metadata filtering, 4 documents remain:
  - Python Async Programming: From asyncio to aiohttp
  - 2024 Python Performance Optimization Guide
  - Go Microservices: gRPC and Kubernetes
  - Python Web Scraping: Scrapy vs Playwright
```

Query 3: "Frontend-related articles from 2023"

```
Parse result:
  Semantic query: Frontend
  Filter conditions: {'year': 2023}
After metadata filtering, 4 documents remain:
  - JavaScript Async Programming: Promise and async/await
  - 2023 Frontend Framework Comparison: React vs Vue vs Angular
  - Rust Systems Programming: Memory Safety and Zero-Cost Abstractions
  - Database Selection Guide: PostgreSQL vs MySQL vs MongoDB
```

Notice that the parser only extracted the year here and missed the "Frontend" category, which is why off-topic 2023 articles slip in — a concrete illustration of how parser quality caps the effectiveness of Self-Query.

Analysis

  • ✅ Precisely responds to users' explicit conditions (time, category, tags)
  • ✅ Filter first, then retrieve — dramatically reduces the scope of vector comparisons
  • ⚠️ Parser quality determines effectiveness (rule-based vs LLM parsing)
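The "filter first" step can be demonstrated end to end without a vector store. The sketch below applies parsed filter conditions to the test articles in memory; `apply_filters` is a name made up for this illustration, and tag matching is approximated by searching the title, which happens to work for this dataset:

```python
ARTICLES = [
    {"title": "Python Async Programming: From asyncio to aiohttp", "year": 2024, "category": "Backend"},
    {"title": "2024 Python Performance Optimization Guide", "year": 2024, "category": "Backend"},
    {"title": "JavaScript Async Programming: Promise and async/await", "year": 2023, "category": "Frontend"},
    {"title": "2023 Frontend Framework Comparison: React vs Vue vs Angular", "year": 2023, "category": "Frontend"},
    {"title": "Go Microservices: gRPC and Kubernetes", "year": 2024, "category": "Backend"},
    {"title": "Rust Systems Programming: Memory Safety and Zero-Cost Abstractions", "year": 2023, "category": "Systems"},
    {"title": "Python Machine Learning: From NumPy to PyTorch", "year": 2024, "category": "AI"},
    {"title": "2024 Cloud Native Trends: Service Mesh and eBPF", "year": 2024, "category": "Cloud Native"},
    {"title": "Database Selection Guide: PostgreSQL vs MySQL vs MongoDB", "year": 2023, "category": "Database"},
    {"title": "Python Web Scraping: Scrapy vs Playwright", "year": 2024, "category": "Backend"},
]

def apply_filters(articles, filters):
    """Metadata filtering: keep articles matching every parsed condition."""
    out = []
    for a in articles:
        if "year" in filters and a["year"] != filters["year"]:
            continue
        if "category" in filters and a["category"] != filters["category"]:
            continue
        # Tag matching approximated via the title for this sketch
        if "tags" in filters and filters["tags"] not in a["title"]:
            continue
        out.append(a)
    return out

# "Articles about Python in 2024" → {'year': 2024, 'tags': 'Python'}
hits = apply_filters(ARTICLES, {"year": 2024, "tags": "Python"})  # 4 documents
```

This reproduces the 4 documents of Query 1 above before any vector comparison happens, which is exactly where the speed-up comes from.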

Using LLM Parser in Production

```python
from langchain.retrievers.self_query.base import SelfQueryRetriever

self_query_retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Technical blog articles",
    metadata_field_info=[...],  # Define metadata fields
)
results = self_query_retriever.invoke("Articles about Python in 2024")
```

Note: In LangChain 1.2.16 community packages, the module location of SelfQueryRetriever may vary. Please adjust the import path according to your installed version.


Four Strategies Comparison Summary

| Strategy | Best For | Core Parameters | Caveats |
| --- | --- | --- | --- |
| Similarity Search | General use, highest relevance | k | Results may duplicate |
| MMR | Multi-angle answers needed | lambda_mult, fetch_k | Parameters need tuning |
| Threshold Filtering | Quality over quantity | score_threshold | Threshold needs experimentation |
| Self-Query | Queries with explicit conditions | Parser quality | Rule-based or LLM parser |

Combined Usage Recommendation

In real production environments, combining strategies works best:

```
User query
    ↓
Self-Query parsing → Metadata filtering (narrow the scope)
    ↓
Vector retrieval → MMR (ensure diversity)
    ↓
Threshold filtering (remove low-quality results)
    ↓
Top-K results → LLM generates the answer
```
```python
# Combined example
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "lambda_mult": 0.5,
        "fetch_k": 50,
        # Conditions parsed by Self-Query. Note: recent Chroma versions require
        # multiple conditions to be wrapped in an operator, e.g.
        # {"$and": [{"year": 2024}, {"category": "Backend"}]}
        "filter": {"year": 2024, "category": "Backend"},
    },
)
```
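For intuition, here is the whole pipeline as one self-contained function over a handful of the test articles. Everything here is a toy: `similarity` uses word overlap as a stand-in for embedding similarity, and the names (`combined_retrieve`, `min_score`, `DEMO_ARTICLES`) are invented for this sketch:

```python
import re

DEMO_ARTICLES = [
    {"title": "Python Async Programming: From asyncio to aiohttp", "year": 2024},
    {"title": "2024 Python Performance Optimization Guide", "year": 2024},
    {"title": "Go Microservices: gRPC and Kubernetes", "year": 2024},
    {"title": "JavaScript Async Programming: Promise and async/await", "year": 2023},
    {"title": "2024 Cloud Native Trends: Service Mesh and eBPF", "year": 2024},
]

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard word overlap — a toy stand-in for embedding similarity."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0

def combined_retrieve(query, articles, k=3, lambda_mult=0.5, min_score=0.1):
    # 1) Self-Query parsing (year only, for brevity)
    filters = {}
    if m := re.search(r"(20\d{2})", query):
        filters["year"] = int(m.group(1))
    # 2) Metadata filtering narrows the candidate pool
    pool = [a for a in articles
            if "year" not in filters or a["year"] == filters["year"]]
    # 3) MMR: greedily trade off relevance against redundancy
    selected = []
    while pool and len(selected) < k:
        def mmr_score(a):
            rel = similarity(query, a["title"])
            red = max((similarity(a["title"], s["title"]) for s in selected),
                      default=0.0)
            return lambda_mult * rel - (1 - lambda_mult) * red
        best = max(pool, key=mmr_score)
        pool.remove(best)
        selected.append(best)
    # 4) Threshold filtering removes weakly related picks
    return [a for a in selected if similarity(query, a["title"]) >= min_score]

results = combined_retrieve("Python async programming 2024", DEMO_ARTICLES)
# The 2023 JavaScript article is removed by metadata filtering; the weakly
# related Cloud Native article is dropped by the threshold step
```

Swapping `similarity` for real embedding similarity and `filters` for the full parser gives the production pipeline sketched in the diagram above.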

Complete Code

The complete code for this article is open-sourced at:

https://github.com/chendongqi/llm-in-action/tree/main/07-retrieval-strategies

Core files:

  • retrieval_strategies.py — Complete comparison experiment of four retrieval strategies
  • data/sample_articles.json — 10 test articles data

Summary

This article compared 4 retrieval strategies through code experiments:

  1. Similarity Search — Simple and direct, good for general scenarios
  2. MMR — Uses λ parameter to balance relevance and diversity, solves result duplication
  3. Threshold Filtering — Sets thresholds through distance distribution, filters out low-quality results
  4. Self-Query — Parses natural language into structured filter conditions, precisely responds to constrained queries

Key Insight: There is no best retrieval strategy, only the one most suitable for the current query. Combining Self-Query + MMR + threshold filtering is how you build a retrieval system that is both precise and comprehensive.

