Agentic RAG: When Your Retrieval System Learns to Think for Itself

#ai #programming #tutorial #rag

Traditional RAG retrieves once and generates. Agentic RAG retrieves, evaluates, and decides whether to retrieve again.

Traditional RAG vs Agentic RAG

Traditional: User asks → Vector search once → Generate answer
Agentic: User asks → Agent analyzes → Decides search strategy → Verifies results → Re-searches if needed → Generates

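The agent reasons between steps: after each retrieval, an LLM call decides whether the evidence is sufficient or another search is needed.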
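
For contrast, the traditional single-pass pipeline fits in a few lines. This is a minimal sketch, assuming a retriever object with a search(query, top_k) method that returns a list of strings and an LLM object with a generate(prompt) method, the same interface the class in the next section relies on. The function name traditional_rag is illustrative.

```python
def traditional_rag(query, retriever, llm, top_k=5):
    """Single pass: one search, one generation, no evaluation step."""
    # Assumes retriever.search returns a list of text chunks (strings).
    chunks = retriever.search(query, top_k=top_k)
    context = "\n".join(chunks)
    return llm.generate(f"Based on:\n{context}\nQuery: {query}")
```

Nothing here checks whether the retrieved chunks actually answer the question, which is the gap the agentic loop is meant to close.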

The Core Loop

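class AgenticRAG:
    """Retrieve, evaluate, and re-retrieve until an answer passes verification."""

    def __init__(self, retriever, llm, max_rounds=3):
        self.retriever = retriever  # expects search(query, top_k) -> list of strings
        self.llm = llm              # expects generate(prompt) -> string
        self.max_rounds = max_rounds

    def _decide(self, query, context):
        prompt = (
            f"Query: {query}\nExisting info: {context or 'none'}\n"
            "Reply with exactly one word: RETRIEVE, GENERATE, or REFINE."
        )
        return self.llm.generate(prompt).strip().upper()

    def _verify(self, answer, sources):
        prompt = (
            f"Answer: {answer}\nSources: {sources}\n"
            "Is every claim backed by a source? Reply yes or no."
        )
        # startswith avoids false positives such as "no, but yes in part"
        return self.llm.generate(prompt).strip().lower().startswith("yes")

    def _generate(self, query, context):
        return self.llm.generate(f"Based on:\n{context}\nQuery: {query}")

    def run(self, query):
        context = []
        answer = None  # last draft; returned if nothing passes verification
        search_query = query
        for _ in range(self.max_rounds):
            joined = "\n".join(context)
            decision = self._decide(query, joined)
            if "GENERATE" in decision:
                answer = self._generate(query, joined)
                if self._verify(answer, joined):
                    return answer
                # Verification failed: fall through and gather more evidence.
            elif "REFINE" in decision:
                # Assumption: REFINE means rewriting the search query.
                search_query = self.llm.generate(
                    f"Rewrite this search query to retrieve better results: {query}"
                )
            context.extend(self.retriever.search(search_query, top_k=5))
        # Round limit reached: return the last draft, or a best-effort answer.
        if answer is None:
            answer = self._generate(query, "\n".join(context))
        return answer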
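
To try the loop without a real model or vector store, two small test doubles are enough. The sketch below is hypothetical: KeywordRetriever and ScriptedLLM are illustrative names, not part of any library, and they only mimic the search(query, top_k) and generate(prompt) interface the class relies on.

```python
class KeywordRetriever:
    """Toy retriever: ranks documents by how many words they share with the query."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query, top_k=5):
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.docs]
        # Keep only documents with at least one shared word, best match first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [d for _, d in ranked[:top_k]]


class ScriptedLLM:
    """Test double: returns canned replies in order and records each prompt."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.prompts = []

    def generate(self, prompt):
        self.prompts.append(prompt)
        return self.replies.pop(0)


retriever = KeywordRetriever(["RRF fuses ranked lists", "BM25 is a keyword ranker"])
llm = ScriptedLLM(["RETRIEVE", "GENERATE", "RRF fuses ranked lists.", "yes"])
hits = retriever.search("what does RRF do", top_k=1)
```

Passing these two objects to AgenticRAG and calling run("what does RRF do") should consume the four scripted replies in order: a RETRIEVE decision, a GENERATE decision, the draft answer, and the verification verdict.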

Three Critical Decisions

1. When to Stop Searching

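Combine three stop conditions: a confidence threshold, a hard round limit (max 3, the max_rounds default above), and an information-gain check that stops once a new round adds little the agent has not already seen.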
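
The three conditions can be combined in one function, sketched below. The name should_stop, the 0.8 confidence threshold, and the 0.2 minimum gain are illustrative assumptions to tune on your own data. The confidence score is assumed to come from the caller (for example, a score the LLM reports), and information gain is approximated as the share of this round's chunks that are not already in the context.

```python
def should_stop(round_idx, confidence, seen, new_chunks,
                max_rounds=3, min_confidence=0.8, min_gain=0.2):
    """Return (stop, reason) for the retrieval loop.

    round_idx is zero-based. All thresholds are illustrative assumptions.
    """
    # 1. Hard round limit: never run more than max_rounds rounds.
    if round_idx + 1 >= max_rounds:
        return True, "round_limit"
    # 2. Confidence threshold: the caller-supplied score is high enough.
    if confidence >= min_confidence:
        return True, "confident"
    # 3. Information gain: fraction of this round's chunks that are new.
    if new_chunks:
        novel = [c for c in new_chunks if c not in seen]
        gain = len(novel) / len(new_chunks)
    else:
        gain = 0.0
    if gain < min_gain:
        return True, "low_gain"
    return False, "continue"
```

Returning the reason alongside the flag makes it easier to log why a query stopped early, which helps when tuning the thresholds.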

2. Which Tools to Give the Agent

Start with vector search plus BM25 keyword search, and merge the two result lists with reciprocal rank fusion (RRF). Add SQL and web search later.
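
RRF itself is small enough to write by hand. The sketch below assumes each retriever returns a ranked list of document IDs; the function name and the sample IDs are hypothetical, and k=60 is a commonly used default rather than a required value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only rank positions, it sidesteps the problem that vector similarity scores and BM25 scores live on different scales.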

3. How Deep to Verify

Run a light check (one LLM call) on every query. Reserve the deep check (re-search and compare) for high-stakes answers, since it costs an extra retrieval and at least one more LLM call.
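
The two tiers might look like the sketch below. It assumes the same generate(prompt) and search(query, top_k) interface as the core loop. The function names are illustrative, and re-searching with the answer text as the query is only one possible reading of "re-search + compare".

```python
def light_check(llm, answer, sources):
    """One LLM call: is every claim in the answer backed by the sources?"""
    prompt = (
        f"Answer: {answer}\nSources: {sources}\n"
        "Is every claim backed by a source? Reply yes or no."
    )
    return llm.generate(prompt).strip().lower().startswith("yes")


def deep_check(llm, retriever, answer, sources, top_k=5):
    """Light check first, then re-search and compare against fresh evidence."""
    if not light_check(llm, answer, sources):
        return False
    # Assumption: use the answer itself as the query for the second search.
    fresh = retriever.search(answer, top_k=top_k)
    # Pass only if independently retrieved chunks also support the answer.
    return light_check(llm, answer, fresh)


def verify(llm, retriever, answer, sources, high_stakes=False):
    """Pick the verification depth based on how costly a wrong answer is."""
    if high_stakes:
        return deep_check(llm, retriever, answer, sources)
    return light_check(llm, answer, sources)
```

The deep path costs two LLM calls and one extra search, so gating it behind a high_stakes flag keeps the common case cheap.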

When to Use Each

Scenario	Use
Simple fact lookup	Traditional RAG
Multi-hop reasoning	Agentic RAG
Numerical aggregation	Agentic RAG + SQL
Latency-sensitive (<500ms)	Traditional RAG
High accuracy (medical/legal)	Agentic RAG + deep verify
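
The table can be turned into a simple rule-based router, sketched below. The function name, the flag names, and the returned labels are hypothetical. The precedence when several rows apply at once (for example, a high-stakes query with a tight latency budget) is an assumption, since the table does not rank its rows.

```python
def choose_pipeline(multi_hop=False, aggregation=False,
                    high_stakes=False, latency_budget_ms=None):
    """Map query traits to a pipeline label, following the table above."""
    # Assumed precedence: accuracy first, then latency, then capability.
    if high_stakes:
        return "agentic_rag+deep_verify"
    if latency_budget_ms is not None and latency_budget_ms < 500:
        return "traditional_rag"
    if aggregation:
        return "agentic_rag+sql"
    if multi_hop:
        return "agentic_rag"
    # Simple fact lookup is the default case.
    return "traditional_rag"
```

How the flags get set is left open here: they could come from request metadata, a keyword heuristic, or a cheap classifier call.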