🧠 The Concept
In standard RAG, the system is a bit "dumb."
If you ask for "Romantic movies from the 90s", a standard vector search just embeds the whole sentence and fetches whatever text sounds similar.
It doesn't actually understand that "90s" is a date range or a category.
Self-Querying Retrieval changes this by giving the LLM a "Search Bar" and "Filters".
1. The Analysis: The LLM looks at your query first.
2. The Translation: It separates the semantic meaning (what to search for) from the structured constraints (the "facts" like date, rating, or category).
3. The Structured Search: It writes a formal database query to filter the results before doing the vector search.
📖 Example-based explanation:
Imagine your company has thousands of policy documents stored in a Vector DB. Each document has Metadata attached to it:
Department: (Sales, Engineering, HR)
Year: (2022, 2023, 2024)
Type: (Benefit, Conduct, Salary)
User Query: "What were the maternity leave benefits for Engineering staff in 2023?"
Standard RAG (The "Basic" Way):
It searches for the entire sentence. It might find a 2024 policy for Sales because the word "maternity" appeared frequently there. It's a "blurry" search.
Self-Querying RAG:
The LLM acts as a translator first. It creates two parts:
1. The Semantic Query: "maternity leave benefits"
2. The Filter: Department == 'Engineering' AND Year == 2023
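Conceptually, the translator's output looks something like this (an illustrative sketch only, not the exact object LangChain builds internally):

```python
# Illustrative decomposition of the HR query above
structured_query = {
    "query": "maternity leave benefits",                                 # -> semantic vector search
    "filter": 'and(eq("Department", "Engineering"), eq("Year", 2023))',  # -> metadata pre-filter
}
```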
⚙️ Practical Implementation:
To implement Self-Querying Retrieval, you need two things:
- A vector store that supports metadata filtering (like Chroma).
- An LLM that can translate natural language into structured filters.
The following code uses LangChain's built-in components:
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.retrievers import SelfQueryRetriever
from langchain_classic.chains.query_constructor.schema import AttributeInfo ## meta-data schema for llm to understand
from dotenv import load_dotenv
load_dotenv()
llm_groq = init_chat_model("openai/gpt-oss-120b", model_provider = 'groq', temperature = 0)
# --- STEP 1: Document Loading ---
# Imagine 'products.txt' contains raw paragraphs about different electronics
with open("./RAG/Retrieval_techniques/products.txt", "w") as f:
    f.write("""
The HP ProBook 15 is a professional laptop priced at 1200 dollars. It has a stellar 4.5 rating.
The Acer Aspire is a basic student laptop. It is very affordable at 400 dollars but has a 3.8 rating.
The Razer Blade is a high-end gaming laptop for 2500 dollars, boasting a near-perfect 4.9 rating.
The LG UltraWide is a 4K monitor. It costs 600 dollars and is rated 4.2 by experts.
""")
loader = TextLoader("./RAG/Retrieval_techniques/products.txt")
raw_documents = loader.load()
# --- STEP 2: Chunking ---
# Small chunk size so each product description ends up in its own chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = text_splitter.split_documents(raw_documents)
# --- STEP 3: Automated Metadata Enrichment (The LLM Part) ---
class ProductSchema(BaseModel):
"""Schema for extracting product details from text."""
category: str = Field(description="laptop, monitor, or other electronics")
price: int = Field(description="The price as an integer")
brand: str = Field(description="The brand name")
rating: float = Field(description="The numerical rating 0-5")
# Initialize LLM with structured output
llm_structured = llm_groq.with_structured_output(ProductSchema)
print("--- Starting Automated Enrichment ---")
enriched_chunks = []
for chunk in chunks:
    # LLM reads the chunk content and generates the structured data
    extracted = llm_structured.invoke(chunk.page_content)
    # Update the chunk's metadata dictionary
    chunk.metadata.update(extracted.model_dump())  ## adding the new metadata to the chunk
    enriched_chunks.append(chunk)
    print(f"Processed: {chunk.metadata['brand']} - ${chunk.metadata['price']}")
# --- STEP 4: Vector Store ---
embedding_model = HuggingFaceEndpointEmbeddings(
model="sentence-transformers/all-MiniLM-L6-v2", ## this model returns 384 sized vector
task="feature-extraction",
)
vectorstore = Chroma.from_documents(enriched_chunks, embedding_model)
# --- STEP 5: Define Metadata for the "Self-Query" Brain ---
# AttributeInfo: A schema object that defines the name, description, and data type of your metadata fields.
# Under-the-Hood: It is formatted into the system prompt for the LLM. It tells the model:
# "Here are the 'columns' available in our database. When a user mentions price,
#  use the field 'price' and treat it as an integer."
# Without this, the LLM wouldn't know which keys exist in your VectorDB metadata.
metadata_field_info = [
AttributeInfo(name="category", description="The type of product", type="string"),
AttributeInfo(name="price", description="The cost in USD", type="integer"),
AttributeInfo(name="brand", description="The brand name", type="string"),
AttributeInfo(name="rating", description="The customer rating 0-5", type="float"),
]
document_content_description = "Product descriptions from the store catalog"
# --- STEP 6: The Self-Querying Retriever ---
retriever = SelfQueryRetriever.from_llm(
    llm_groq,                       # LLM used to build the structured query
    vectorstore,                    # where the chunks are stored
    document_content_description,   # description for the LLM's understanding
    metadata_field_info,            # metadata schema for the LLM's understanding
    verbose=True,
)
# Under-the-Hood SelfQueryRetriever:
# It sends the user's prompt + metadata_field_info to the llm_groq
# User: "Laptops under $1000" -> the LLM produces a semantic query plus a filter like: price < 1000 AND category == "laptop"
# LangChain takes that structured query and translates it into the native filtering language of your specific VectorDB (e.g., Chroma's where clause or Pinecone's filter syntax)
# It performs a search where it first discards all chunks that don't match the metadata filter and then performs a semantic vector search on the remaining chunks.
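# For illustration only (the exact syntax here is an assumption): for the test query below,
# the translated Chroma `where` clause would look roughly like:
# {"$and": [{"category": {"$eq": "laptop"}},
#           {"price": {"$lt": 1500}},
#           {"rating": {"$gte": 4}}]}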
# --- STEP 7: Testing ---
query = "Show me laptops cheaper than 1500 with at least a 4 star rating"
results = retriever.invoke(query)
for doc in results:
print(f"Result: {doc.page_content} | Metadata: {doc.metadata}")
🚧 Cons of Self-Querying Retrieval
- Logic errors: The entire system relies on the LLM's ability to correctly translate a natural-language sentence into a precise filter.
- Schema drift: If you add new documents with a new metadata field (like color), the retriever won't know it exists until you manually update the AttributeInfo list in your code. It is not "plug-and-play" with dynamic data.
- Latency: Standard RAG is one step (Query -> Vector Search). Self-Querying RAG is three steps:
  Query -> LLM (to generate the filter)
  Filter -> Vector DB (to prune the data)
  Filtered Results -> Vector Search
- Ingestion overhead: You have to build a separate "Ingestion Pipeline" that uses an LLM to extract metadata from every single chunk before it goes into the database (Step 3 in the code above).
Possible Solutions:
| Challenge | Strategy to Overcome |
|---|---|
| Logic Errors | Use few-shot prompting to show the LLM examples of correct query translations. |
| Latency | Use a smaller, faster model for query construction. |
| Schema Drift | Use a Pydantic model to validate metadata during both ingestion and retrieval. |
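For the Schema Drift row, here is a minimal sketch of the idea, reusing the ProductSchema model from the ingestion code above (the validate_metadata helper is an illustrative name, not a LangChain API):

```python
from pydantic import ValidationError

def validate_metadata(chunk) -> bool:
    """Reject chunks whose metadata doesn't match the schema the retriever expects."""
    try:
        ProductSchema(**chunk.metadata)  # same Pydantic model used during enrichment
        return True
    except ValidationError as e:
        print(f"Schema drift detected, skipping chunk: {e}")
        return False

# During ingestion: only store chunks that pass validation
valid_chunks = [c for c in enriched_chunks if validate_metadata(c)]
vectorstore = Chroma.from_documents(valid_chunks, embedding_model)
```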
🎯 Conclusion
In modern architectures, we often use Hybrid Self-Querying.
The LLM still generates a filter, but we apply it "softly", giving higher weight to matches that meet the criteria rather than discarding everything else, as sketched below.
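A minimal sketch of that idea, assuming the LLM's filter has already been turned into a plain dict of equality constraints (the helper names and the boost value are illustrative, not a standard LangChain API):

```python
def matches_filter(doc, flt: dict) -> bool:
    """Check whether a document's metadata satisfies the LLM-generated filter."""
    return all(doc.metadata.get(key) == value for key, value in flt.items())

def hybrid_self_query(vectorstore, semantic_query: str, llm_filter: dict, k: int = 10, boost: float = 0.5):
    """Soft filtering: rank filter-matching documents higher instead of discarding the rest."""
    # Plain similarity search over everything -- no hard metadata filter
    docs_and_scores = vectorstore.similarity_search_with_relevance_scores(semantic_query, k=k)
    reranked = []
    for doc, score in docs_and_scores:
        # Boost documents that satisfy the structured filter; keep the others with their raw score
        final_score = score + boost if matches_filter(doc, llm_filter) else score
        reranked.append((doc, final_score))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

# Example: semantic query + soft filter on category
results = hybrid_self_query(vectorstore, "affordable laptop with a good rating", {"category": "laptop"})
```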