DEV Community

The Pragmatic Architect


From Naive to Agentic: The Complete RAG Evolution in 21 Patterns

Naive RAG, Advanced RAG (Multi-Query), Multi-Step RAG, Agentic RAG, Hybrid RAG, Reranked RAG, Metadata-Filtered RAG, Parent Document RAG, Contextual Compression RAG, Corrective RAG, Graph RAG, Structured Data RAG, Conversational RAG, Citation-Grounded RAG, Adaptive Router RAG, Multimodal RAG, Fusion RAG, Multi-Hop RAG, PDF RAG, Image OCR RAG, Local Image OCR RAG
Retrieval-Augmented Generation (RAG) Patterns

The Evolution of RAG: 21 Patterns from Prototype to Production

Retrieval-Augmented Generation (RAG) started simple. Chunk your docs. Embed them. Retrieve the top-k. Stuff it in a prompt. That worked. Until it didn't.

Until your retrieval missed context that lived three chunks away. Until your LLM hallucinated over perfectly good documents. Until your users asked questions that required reasoning, not just lookup.

New patterns emerged to fix the failures of the ones before them: Query rewriting. Reranking. Hypothetical document embeddings. Graph-based retrieval. Self-RAG. Corrective RAG. Agentic loops that decide whether to retrieve at all. Each one solves something real, and each introduces tradeoffs worth understanding.

This guide walks through the complete evolution. Every pattern. What it solves. When to reach for it. And most importantly, why you probably need more than one of them.

21 patterns. One throughline: the relentless pursuit of actually getting the right answer.

Why Most RAG Systems Fail in Production

Before we get into the patterns, let's be very clear about the root cause of RAG failure.

Most teams build RAG and then blame the LLM when things go wrong. "The model hallucinated." "GPT-4 got confused." "We need a bigger context window." Nine times out of ten, the model is fine. The retrieval pipeline is the problem.

Retrieval fails in four specific ways:

Too shallow: You retrieved text, but it was the wrong text. The user's question used different words than your document. Semantic similarity only gets you so far.
Too narrow: You retrieved from one source, one index, or one modality. But the answer lived in a CSV, a graph, a PDF, or an image. Your pipeline never looked there.
Too brittle: One bad query, one ambiguous question, or one follow-up that references previous context, and the whole thing breaks down.
Too disconnected: The answer requires combining two facts from two different places. Your pipeline can only retrieve one thing at a time.

Every pattern in this guide is a direct response to one of these four failure modes. Keep that in mind as we go.

The Five Stages of RAG Evolution

The 21 patterns group naturally into five stages. Each stage solves the problems the previous stage created.

Stage 1: Foundation Patterns — Get It Working

These are the patterns every team starts with. They are fast, cheap, and get you 60–70% of the way there. The other 30% is why the remaining 18 patterns exist.

Pattern 01 — Naive RAG: This is where everyone begins, and there is nothing wrong with that. The idea is simple: split your documents into chunks, embed them into a vector database, embed the user's query, find the most similar chunks, and pass them to an LLM to generate an answer. For internal knowledge bases or lightweight prototypes where speed-to-value matters most, Naive RAG is appropriate.
Pattern 02 — Advanced RAG (Multi-Query): This fixes the vocabulary mismatch problem directly. Instead of running one vector search, the system generates multiple query variants. If a user asks, "What is the remote work policy?" the system might also search for "work from home guidelines" or "distributed team rules."
Pattern 13 — Conversational RAG: A user asks, "What is our parental leave policy?" The system answers. Then the user asks, "What about for adoptions?" and a stateless pipeline has no idea what the follow-up refers to. Conversational RAG solves this with history-aware query rewriting: "What about for adoptions?" becomes "What is the parental leave policy for adoptive parents?" Use it in any conversational interface or chat-based product, and build it in early.
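The Stage 1 loop is worth seeing end to end. Below is a minimal Naive RAG sketch: the documents, the bag-of-words "embedding," and the similarity scoring are all toy stand-ins (assumptions made so it runs with no dependencies); a real system would use a learned embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" -- a stand-in for a real embedding
    # model (e.g. a sentence-transformer), used here so the sketch runs
    # without external services.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Illustrative "chunked" knowledge base.
DOCS = [
    "Employees may work remotely up to three days per week.",
    "The equipment stipend covers up to $300 per year.",
    "Parental leave is twelve weeks at full pay.",
]

def retrieve(query, k=1):
    # Embed the query, score every chunk, return the top-k --
    # the retrieved text would then be stuffed into the LLM prompt.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

Everything that follows in this guide is a refinement of one of these three steps: how the query is formed, how candidates are scored, and what happens before the prompt is built.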

Stage 2: Retrieval Quality Patterns — Make Retrieval Actually Good

This is the stage most teams underinvest in. If Stage 1 gets you working, Stage 2 gets you trustworthy.

Pattern 05 — Hybrid RAG: Semantic search has a well-known blind spot: exact terms (acronyms, SKUs, legal clauses). Keyword search handles this perfectly. Hybrid RAG combines dense retrieval (vector similarity) with a sparse keyword scorer (BM25), then merges the candidate sets.
Pattern 06 — Reranked RAG: Retrieval and ranking are two different problems. Top-k vector search retrieves candidates that are "probably relevant," but not necessarily in the right order. Reranked RAG separates these concerns by retrieving a broader set of candidates (e.g., top 15) and running a second scoring pass with a reranker model that evaluates the full query-document pair.
Pattern 07 — Metadata-Filtered RAG: Not every question should search everything. A question from an employee in Singapore shouldn't retrieve the US vacation policy. Metadata filtering applies structured constraints (department, region, document type) before semantic search even runs, reducing noise at the source.
Pattern 08 — Parent Document RAG: Small chunks (200 tokens) improve retrieval precision, but lose context. Parent Document RAG uses fine-grained child chunks for precise retrieval. Once a child chunk is found, the system expands it back to its full parent section for the answering stage, giving you both precision and completeness.
Pattern 09 — Contextual Compression RAG: You retrieve a 500-token section, but the answer lives in just 50 tokens. Contextual Compression adds a step where the retrieved document is passed through the LLM to extract only the relevant parts before generating the final answer. Less noise means sharper answers and lower token costs.
Pattern 10 — Corrective RAG: Sometimes, retrieval comes back with weak evidence, and Naive RAG will confidently answer anyway. Corrective RAG adds a self-evaluation loop. If retrieved documents fall below a quality threshold, the system rewrites the query and retrieves again. It recovers from its own bad first pass instead of failing silently.
Pattern 17 — Fusion RAG: Instead of combining two retrieval methods with one query, Fusion RAG generates multiple query variants and runs each through multiple retrievers. The results are merged using Reciprocal Rank Fusion. The ensemble catches what any individual strategy would miss.
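Reciprocal Rank Fusion, the merge step behind Fusion RAG (and a common way to combine hybrid results), is small enough to show in full. The document names below are illustrative; the `k=60` smoothing constant is the value conventionally used in practice.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Merge several ranked lists of doc ids into one.
    # A document earns 1 / (k + rank) from every list it appears in,
    # so items ranked well by multiple retrievers float to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two retrievers partially disagree; fusion rewards the overlap.
vector_hits = ["policy.md", "faq.md", "handbook.md"]   # dense retriever
keyword_hits = ["handbook.md", "policy.md"]            # BM25 retriever
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Note that RRF only needs ranks, not scores, which is exactly why it works across retrievers whose scores live on incompatible scales.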

Stage 3: Reasoning and Orchestration — Handle Complex Questions

These patterns are for questions that cannot be answered with a single retrieval call.

Pattern 03 — Multi-Step RAG: "What is our remote work policy, and how does it compare to our equipment stipend rules?" This is two questions. Multi-Step RAG decomposes compound questions, retrieves separately for each part, and synthesizes a final answer.
Pattern 18 — Multi-Hop RAG: Multi-Hop is different: the second retrieval depends on the result of the first. To find the "most cost-effective standing desk that qualifies for a stipend," the system must first retrieve the stipend limit, then use that number to filter the catalog. This is chain-of-retrieval reasoning.
Pattern 15 — Adaptive Router RAG: An HR question should hit the policy store. A product question should hit the catalog. Adaptive Router RAG adds a routing layer before retrieval, sending the query only to the most relevant index based on intent.
Pattern 04 — Agentic RAG: Agentic RAG gives an LLM-powered agent access to retrieval as a tool, alongside web search or calculators. The agent decides which tool to use, whether the retrieved information is sufficient, and if more steps are needed. It is a bridge from passive retrieval to active reasoning.
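The standing-desk example makes the multi-hop dependency concrete. Here is a sketch of the two hops, with a made-up policy store and catalog (both assumptions for illustration); in a real pipeline each hop would be a retrieval call, not a dictionary lookup.

```python
import re

# Hypothetical stores standing in for real indexes.
POLICY_STORE = {
    "stipend": "The annual equipment stipend is capped at $300 per employee.",
}
CATALOG = [
    {"item": "standing desk pro", "price": 450},
    {"item": "standing desk basic", "price": 250},
]

def hop1_stipend_limit():
    # Hop 1: retrieve the policy chunk and extract the dollar limit.
    match = re.search(r"\$(\d+)", POLICY_STORE["stipend"])
    return int(match.group(1))

def hop2_eligible_desks(limit):
    # Hop 2: the result of hop 1 becomes the filter for the catalog --
    # this dependency is what single-shot retrieval cannot express.
    return [row["item"] for row in CATALOG if row["price"] <= limit]

answer = hop2_eligible_desks(hop1_stipend_limit())
```

The key property: hop 2's query cannot even be formed until hop 1 returns, which is what distinguishes Multi-Hop from Multi-Step decomposition.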

Stage 4: Trust and Grounding — Make It Safe for Production

This stage separates toys from production systems.

Pattern 14 — Citation-Grounded RAG: If your RAG system affects real decisions, it has to cite its sources. Full stop. This pattern formats the retrieved context with explicit source labels and instructs the model to cite them. Users are no longer trusting an AI; they are verifying a claim against a source they already trust.
Pattern 10 (again) — Corrective RAG as a Trust Pattern: The core trust problem isn't just wrong answers; it is confidently wrong answers. Corrective RAG reduces false confidence by refusing to answer from low-quality evidence. It either improves the retrieval or escalates.
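The mechanical core of citation grounding is prompt construction: every passage gets an explicit source label, and the instructions demand those labels back. A minimal sketch, with a hypothetical source label and passage:

```python
def build_grounded_prompt(question, passages):
    # passages: list of (source_label, text) pairs from the retriever.
    context = "\n".join(f"[{label}] {text}" for label, text in passages)
    return (
        "Answer using ONLY the sources below. "
        "Cite each claim with its [label]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "How long is parental leave?",
    [("HR-policy-p4", "Parental leave is twelve weeks at full pay.")],
)
```

The "say you don't know" instruction is the cheap cousin of Corrective RAG: it gives the model a sanctioned exit when the evidence is thin, instead of forcing a confident guess.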

Stage 5: Enterprise and Multimodal — Handle Real Business Data

Most business knowledge is not clean text. It's PDFs, CSVs, slide decks, and images. These patterns make RAG work on real data.

Pattern 12 — Structured Data RAG: This is perhaps the highest-ROI pattern here. Many answers live partly in a document (policy rules) and partly in a CSV (equipment catalog). This pattern combines semantic retrieval over text with direct reasoning over structured tables simultaneously.
Pattern 11 — Graph RAG: Vector similarity cannot capture relational knowledge like, "Which teams depend on the authentication service?" Graph RAG loads knowledge as nodes and edges, building context from graph traversal instead of chunk retrieval.
Pattern 16 — Multimodal RAG: Information trapped in architecture diagrams, Visio exports, or PowerPoint slides is invisible to text-only RAG. Multimodal RAG extracts textual representations from these sources, storing them in a vector store to be retrieved alongside traditional documents.
Pattern 19 — PDF RAG: If your enterprise runs on paper, it runs on PDFs. PDF RAG extracts text at the page level, indexes those pages with source labels, and provides answers with precise page-level citations.
Pattern 20 — Image OCR RAG: For scanned receipts or field inspection photos, Image OCR RAG relies on pre-extracted text (processed during ingestion) stored in a structured JSON file. At query time, it retrieves against the text and points back to the original image.
Pattern 21 — Local Image OCR RAG: This runs OCR live at ingestion time, locally on your machine, rather than relying on pre-extracted JSON or cloud APIs.
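The structured half of Pattern 12 is often the easiest win: many "RAG" questions are really exact lookups over a table, where embeddings add nothing. A sketch with an invented CSV export (the catalog rows and column names are assumptions for illustration):

```python
import csv
import io

# A small catalog the way it often arrives from the business: a CSV export.
CSV_EXPORT = """item,price,category
standing desk,250,furniture
laptop riser,60,accessories
monitor,180,hardware
"""

ROWS = list(csv.DictReader(io.StringIO(CSV_EXPORT)))

def answer_price_question(item_name):
    # The structured path: an exact lookup, no embeddings involved.
    # A full Structured Data RAG pipeline would run this alongside
    # semantic retrieval over the policy text and merge both into
    # the final prompt.
    for row in ROWS:
        if row["item"] == item_name:
            return int(row["price"])
    return None
```

Routing "what does X cost?" to this path instead of the vector store is precisely the kind of decision Pattern 15's router makes.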

The Maturity Model — Where Is Your System Right Now?

Here is the honest maturity ladder most teams follow. Not from simple to "fancy," but from simple to fit for the actual shape of the problem.

Level 1 — Baseline: Pattern 01 (Naive RAG). You have a working system. Great starting point, not a destination.
Level 2 — Better Recall: Add Pattern 02 (multi-query) or Pattern 05 (hybrid). Users stop getting "no answer."
Level 3 — Better Precision: Add Pattern 06 (reranking). The right answer moves to position 1.
Level 4 — Better Trust: Add Pattern 14 (citations) and Pattern 10 (corrective). Stakeholders stop asking "how do we know this is right?"
Level 5 — Better Workflow Fit: Add Pattern 03 (multi-step), Pattern 15 (adaptive routing), and Pattern 18 (multi-hop) to handle compound questions.
Level 6 — Full Enterprise Coverage: Add Patterns 12, 11, 19, 20, and 21. The system can now answer from structured data, graphs, PDFs, and images.

Which Pattern Should You Use First?

Let the failure mode guide your choice:

Recall is the problem? → Start with Pattern 02 or 05.
Precision is the problem? → Add Pattern 06.
Context memory is the problem? → Add Pattern 13.
Grounding is the problem? → Add Pattern 14.
Data modality is the problem? → Add Pattern 12.
Query complexity is the problem? → Add Pattern 18 or 03.
Source modality is the problem? → Add Patterns 19, 20, or 21.

The Bottom Line

The biggest mistake in RAG is treating it like a single architectural decision. "We are using RAG" is about as informative as "we are using a database." Which one? For what? Optimized how?

RAG is a design space, and the patterns in this guide are its vocabulary. Start with Naive RAG. Break it intentionally. Chase the failure modes up the ladder. It is the shortest route from having an LLM to having an AI system that actually operates on business knowledge.

The full working code for all 21 patterns is here: https://github.com/eagleeyethinker/rag-evolution-patterns

Work through them in order. Run the demos. Break them. Fix them. By the time you reach Pattern 21, you will understand RAG deeply enough to build a production system that earns user trust, not just demo applause.

If this was useful, share it with someone on your team who is still on Pattern 1 and wondering why production is harder than the demo. They will thank you.

Satish Gopinathan is an AI Strategist, Enterprise Architect, and the voice behind The Pragmatic Architect. Read more at eagleeyethinker.com or Subscribe on LinkedIn.

Tags: #RAG #LLM #AIEngineering #GenerativeAI #EnterpriseAI #MachineLearning #VectorSearch #LangChain #AIInProduction #BuildingWithAI
