🧠 The Concept
In standard RAG, the system is a bit "dumb."
If you ask for "Romantic movies from the 90s", a standard vector search just embeds the whole sentence and fetches whatever text sounds similar.
It doesn't actually understand that "90s" is a date range or a category.
Self-Querying Retrieval changes this by giving the LLM a "Search Bar" and "Filters".
1. The Analysis: The LLM looks at your query first.
2. The Translation: It separates the semantic meaning (what to search for) from the structured constraints (the "facts" like date, rating, or category).
3. The Structured Search: It writes a formal database query to filter the results before doing the vector search.
📖 Example-based explanation:
Imagine your company has thousands of policy documents stored in a Vector DB. Each document has Metadata attached to it:
Department: (Sales, Engineering, HR)
Year: (2022, 2023, 2024)
Type: (Benefit, Conduct, Salary)
User Query: "What were the maternity leave benefits for Engineering staff in 2023?"
Standard RAG (The "Basic" Way):
It searches for the entire sentence. It might find a 2024 policy for Sales because the word "maternity" appeared frequently there. It's a "blurry" search.
Self-Querying RAG:
The LLM acts as a translator first. It creates two parts:
1. The Semantic Query: "maternity leave benefits"
2. The Filter: Department == 'Engineering' AND Year == 2023
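Conceptually, the translator's output looks something like this (an illustrative sketch only, not the exact object LangChain builds internally):

```python
# Illustrative decomposition of the HR query above
structured_query = {
    "query": "maternity leave benefits",                                 # -> semantic vector search
    "filter": 'and(eq("Department", "Engineering"), eq("Year", 2023))',  # -> metadata pre-filter
}
```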
⚙️ Practical Implementation:
To implement Self-Querying Retrieval, you need two things:
- A vector store that supports metadata filtering (like Chroma).
- An LLM that can translate natural language into structured filters.
The following code uses LangChain's built-in components:
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.retrievers import SelfQueryRetriever
from langchain_classic.chains.query_constructor.schema import AttributeInfo ## meta-data schema for llm to understand
from dotenv import load_dotenv
load_dotenv()
llm_groq = init_chat_model("openai/gpt-oss-120b", model_provider = 'groq', temperature = 0)
# --- STEP 1: Document Loading ---
# Imagine 'products.txt' contains raw paragraphs about different electronics
with open("./RAG/Retrieval_techniques/products.txt", "w") as f:
    f.write("""
The HP ProBook 15 is a professional laptop priced at 1200 dollars. It has a stellar 4.5 rating.
The Acer Aspire is a basic student laptop. It is very affordable at 400 dollars but has a 3.8 rating.
The Razer Blade is a high-end gaming laptop for 2500 dollars, boasting a near-perfect 4.9 rating.
The LG UltraWide is a 4K monitor. It costs 600 dollars and is rated 4.2 by experts.
""")
loader = TextLoader("./RAG/Retrieval_techniques/products.txt")
raw_documents = loader.load()
# --- STEP 2: Chunking ---
# Small chunk size so each product description ends up in its own chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = text_splitter.split_documents(raw_documents)
# --- STEP 3: Automated Metadata Enrichment (The LLM Part) ---
class ProductSchema(BaseModel):
"""Schema for extracting product details from text."""
category: str = Field(description="laptop, monitor, or other electronics")
price: int = Field(description="The price as an integer")
brand: str = Field(description="The brand name")
rating: float = Field(description="The numerical rating 0-5")
# Initialize LLM with structured output
llm_structured = llm_groq.with_structured_output(ProductSchema)
print("--- Starting Automated Enrichment ---")
enriched_chunks = []
for chunk in chunks:
    # LLM reads the chunk content and generates the structured data
    extracted = llm_structured.invoke(chunk.page_content)
    # Update the chunk's metadata dictionary
    chunk.metadata.update(extracted.model_dump())  ## adding the new metadata to the chunk
    enriched_chunks.append(chunk)
    print(f"Processed: {chunk.metadata['brand']} - ${chunk.metadata['price']}")
# --- STEP 4: Vector Store ---
embedding_model = HuggingFaceEndpointEmbeddings(
model="sentence-transformers/all-MiniLM-L6-v2", ## this model returns 384 sized vector
task="feature-extraction",
)
vectorstore = Chroma.from_documents(enriched_chunks, embedding_model)
# --- STEP 5: Define Metadata for the "Self-Query" Brain ---
# AttributeInfo: A schema object that defines the name, description, and data type of your metadata fields.
# Under-the-Hood: It is formatted into the system prompt for the LLM. It tells the model:
# "Here are the 'columns' available in our database. When a user mentions price,
#  use the field 'price' and treat it as an integer."
# Without this, the LLM wouldn't know which keys exist in your VectorDB metadata.
metadata_field_info = [
AttributeInfo(name="category", description="The type of product", type="string"),
AttributeInfo(name="price", description="The cost in USD", type="integer"),
AttributeInfo(name="brand", description="The brand name", type="string"),
AttributeInfo(name="rating", description="The customer rating 0-5", type="float"),
]
document_content_description = "Product descriptions from the store catalog"
# --- STEP 6: The Self-Querying Retriever ---
retriever = SelfQueryRetriever.from_llm(
    llm_groq,                       # LLM used to build the structured query
    vectorstore,                    # where the chunks are stored
    document_content_description,   # description for the LLM's understanding
    metadata_field_info,            # metadata schema for the LLM's understanding
    verbose=True,
)
# Under-the-Hood SelfQueryRetriever:
# It sends the user's prompt + metadata_field_info to the llm_groq
# User: "Laptops under $1000" -> the LLM produces a semantic query plus a filter like: price < 1000 AND category == "laptop"
# LangChain takes that structured query and translates it into the native filtering language of your specific VectorDB (e.g., Chroma's where clause or Pinecone's filter syntax)
# It performs a search where it first discards all chunks that don't match the metadata filter and then performs a semantic vector search on the remaining chunks.
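# For illustration only (the exact syntax here is an assumption): for the test query below,
# the translated Chroma `where` clause would look roughly like:
# {"$and": [{"category": {"$eq": "laptop"}},
#           {"price": {"$lt": 1500}},
#           {"rating": {"$gte": 4}}]}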
# --- STEP 7: Testing ---
query = "Show me laptops cheaper than 1500 with at least a 4 star rating"
results = retriever.invoke(query)
for doc in results:
print(f"Result: {doc.page_content} | Metadata: {doc.metadata}")
🚧 Cons of Self-Querying Retrieval
- Logic errors: The entire system relies on the LLM's ability to correctly translate a natural-language sentence into a precise filter.
- Schema drift: If you add new documents with a new metadata field (like color), the retriever won't know it exists until you manually update the AttributeInfo list in your code. It is not "plug-and-play" with dynamic data.
- Latency: Standard RAG is one step (Query -> Vector Search). Self-Querying RAG is three steps:
  Query -> LLM (to generate the filter)
  Filter -> Vector DB (to prune the data)
  Filtered Results -> Vector Search
- Ingestion overhead: You have to build a separate "Ingestion Pipeline" that uses an LLM to extract metadata from every single chunk before it goes into the database (Step 3 in the code above).
Possible Solutions:
| Challenge | Strategy to Overcome |
|---|---|
| Logic Errors | Use few-shot prompting to show the LLM examples of correct query translations. |
| Latency | Use a smaller, faster model for query construction. |
| Schema Drift | Use a Pydantic model to validate metadata during both ingestion and retrieval. |
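For the Schema Drift row, here is a minimal sketch of the idea, reusing the ProductSchema model from the ingestion code above (the validate_metadata helper is an illustrative name, not a LangChain API):

```python
from pydantic import ValidationError

def validate_metadata(chunk) -> bool:
    """Reject chunks whose metadata doesn't match the schema the retriever expects."""
    try:
        ProductSchema(**chunk.metadata)  # same Pydantic model used during enrichment
        return True
    except ValidationError as e:
        print(f"Schema drift detected, skipping chunk: {e}")
        return False

# During ingestion: only store chunks that pass validation
valid_chunks = [c for c in enriched_chunks if validate_metadata(c)]
vectorstore = Chroma.from_documents(valid_chunks, embedding_model)
```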
🎯 Conclusion
In modern architectures, we often use Hybrid Self-Querying.
The LLM still generates a filter, but we apply it "softly", giving higher weight to matches that meet the criteria rather than discarding everything else, as sketched below.
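A minimal sketch of that idea, assuming the LLM's filter has already been turned into a plain dict of equality constraints (the helper names and the boost value are illustrative, not a standard LangChain API):

```python
def matches_filter(doc, flt: dict) -> bool:
    """Check whether a document's metadata satisfies the LLM-generated filter."""
    return all(doc.metadata.get(key) == value for key, value in flt.items())

def hybrid_self_query(vectorstore, semantic_query: str, llm_filter: dict, k: int = 10, boost: float = 0.5):
    """Soft filtering: rank filter-matching documents higher instead of discarding the rest."""
    # Plain similarity search over everything -- no hard metadata filter
    docs_and_scores = vectorstore.similarity_search_with_relevance_scores(semantic_query, k=k)
    reranked = []
    for doc, score in docs_and_scores:
        # Boost documents that satisfy the structured filter; keep the others with their raw score
        final_score = score + boost if matches_filter(doc, llm_filter) else score
        reranked.append((doc, final_score))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

# Example: semantic query + soft filter on category
results = hybrid_self_query(vectorstore, "affordable laptop with a good rating", {"category": "laptop"})
```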