DEV Community

Ahmet Özel


Classical RAG vs Agentic RAG: a practical decision guide

"Should I use RAG or an agent?" comes up in almost every LLM project I work on. The honest answer is that they are not competing choices. Classical RAG and agentic RAG sit on a spectrum, and picking the wrong end of it either wastes money or gives you weak answers. This post is a practical way to decide, based on a guide and demo I put together.

Repo with runnable code: https://github.com/ahmet-ozel/rag-architecture-guide

Classical RAG in one paragraph

Classical RAG is a fixed pipeline: embed the query, retrieve the top-k chunks from a vector store, stuff them into the prompt, and generate an answer. One retrieval, one generation. It is cheap, fast, and predictable. For a knowledge base where the answer lives in one or two documents, this is usually all you need, and adding anything more just increases latency and cost.
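
A minimal sketch of that pipeline in Python, to make the "one retrieval, one generation" shape concrete. It is not the code from the repo: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and `generate` is a hypothetical callable that wraps whatever LLM you use.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: replace with a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a single prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


def classical_rag(
    query: str, chunks: list[str], generate: Callable[[str], str], k: int = 2
) -> str:
    """One retrieval, one generation. No loops, no tool choice."""
    context = retrieve(query, chunks, k)
    return generate(build_prompt(query, context))
```

Passing `generate=lambda prompt: prompt` is a handy way to inspect exactly what the model would see before you wire in a real LLM call.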

Agentic RAG in one paragraph

Agentic RAG hands control to the model. Instead of a fixed pipeline, the LLM decides what to do: reformulate the query, retrieve, check whether the result is good enough, retrieve again from a different source, call a tool, and only then answer. It can loop. This is far more powerful for hard questions, but it is slower, costs more tokens, and is harder to make deterministic.
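
The control flow can be sketched roughly like this. Everything here is an assumption made for illustration: `decide` is a hypothetical callable standing in for the LLM's "what next?" step, `tools` is a plain dict of retrieval functions, and the step budget is my own guard against the loop running away. Real agent frameworks differ in the details.

```python
from typing import Callable

# (action, argument): action is a tool name or the special value "answer".
Decision = tuple[str, str]


def agentic_rag(
    query: str,
    decide: Callable[[str, list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 4,
) -> str:
    """Let `decide` pick the next action until it answers or the budget runs out."""
    notes: list[str] = []  # observations gathered so far, fed back to `decide`
    for _ in range(max_steps):
        action, arg = decide(query, notes)
        if action == "answer":
            return arg
        if action not in tools:
            # Record the mistake so the model can correct itself next step.
            notes.append(f"unknown tool: {action}")
            continue
        notes.append(f"{action}({arg!r}) -> {tools[action](arg)}")
    # Hard stop: without a cap the loop is where cost and latency blow up.
    return "I could not find a reliable answer within the step budget."
```

Each pass through the loop is at least one extra model call, which is where the extra tokens and latency come from.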

A decision tree that works in practice

Start simple and only add complexity when the data forces you to:

  1. Is the answer usually contained in a single chunk or document? Use classical RAG.
  2. Does answering require combining information from several documents or steps of reasoning? Lean agentic.
  3. Do you need to query multiple sources (a vector DB, a SQL table, an external API) to answer? Agentic, because the model needs to choose tools.
  4. Are latency and cost tight constraints (high traffic, user-facing)? Bias toward classical, and only escalate to an agent for the queries that actually need it.
  5. Can you tolerate non-deterministic behavior? If not, classical with strong retrieval beats an agent that occasionally loops in unexpected ways.
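
If it helps to see the checklist as code, here is one way to encode it. The field names and the three return labels are hypothetical, and the order in which the questions take precedence is my assumption for this sketch, not a rule: determinism wins first, then whether an agent is needed at all, then the latency and cost constraint.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Answers to the five questions for a given workload (hypothetical fields)."""
    answer_in_single_chunk: bool
    needs_multi_step: bool
    needs_multiple_sources: bool
    tight_latency_or_cost: bool
    tolerates_nondeterminism: bool


def choose_architecture(p: WorkloadProfile) -> str:
    """Return 'classical', 'agentic', or 'classical_with_escalation'."""
    # Question 5: if non-determinism is unacceptable, stay on the fixed pipeline.
    if not p.tolerates_nondeterminism:
        return "classical"
    # Questions 2 and 3: multi-step reasoning or multiple sources pull toward an agent.
    needs_agent = p.needs_multi_step or p.needs_multiple_sources
    # Question 1: nothing forces an agent, so keep it simple.
    if p.answer_in_single_chunk and not needs_agent:
        return "classical"
    if not needs_agent:
        return "classical"
    # Question 4: under tight budgets, default to classical and escalate selectively.
    if p.tight_latency_or_cost:
        return "classical_with_escalation"
    return "agentic"
```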

A pattern I like: run classical RAG first, and if a confidence or self-check step says the retrieved context is weak, escalate that single query to the agentic path. Most queries stay cheap; only the hard ones pay the agent tax.
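
A rough sketch of that router, under stated assumptions: `classical` and `agentic` are hypothetical callables wrapping the two paths, the confidence value could be a top retrieval similarity or a self-check score scaled to [0, 1], and the 0.5 threshold is a placeholder you would tune on your own eval set.

```python
from typing import Callable, NamedTuple


class Draft(NamedTuple):
    answer: str
    confidence: float  # assumed to be in [0, 1]; how you compute it is up to you


def answer_with_escalation(
    query: str,
    classical: Callable[[str], Draft],
    agentic: Callable[[str], str],
    threshold: float = 0.5,  # placeholder value, not a recommendation
) -> tuple[str, str]:
    """Try the cheap path first; escalate only when confidence is low.

    Returns (answer, path) so the path taken can be logged per query.
    """
    draft = classical(query)
    if draft.confidence >= threshold:
        return draft.answer, "classical"
    return agentic(query), "agentic"
```

Returning the path alongside the answer makes it easy to track what share of traffic ends up on the expensive route.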

The part everyone skips: evaluation

Neither architecture can be judged without measurement. Before you argue about architecture, build an eval set of real questions with known good answers. Then track:

  • Retrieval quality: are the right chunks being retrieved at all? (recall@k, hit rate)
  • Answer quality: faithfulness (is the answer grounded in the retrieved context?) and relevance (does it address the question that was asked?).
  • Cost and latency per query, so you can see what agentic behavior actually costs you.
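
The two retrieval metrics named above are small enough to write by hand. This is a minimal sketch using their standard definitions; the chunk IDs and the shape of the eval data (a ranked list of retrieved IDs plus a set of relevant IDs per question) are assumptions for the example, not the repo's format.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        raise ValueError("each eval question needs at least one relevant chunk")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def hit_rate(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Fraction of questions with at least one relevant chunk in the top-k."""
    if not runs:
        return 0.0
    hits = sum(1 for retrieved, relevant in runs if set(retrieved[:k]) & relevant)
    return hits / len(runs)


# Usage: each run pairs the ranked retrieved IDs with the known-relevant IDs.
runs = [
    (["c1", "c7", "c3"], {"c1", "c3"}),  # both relevant chunks retrieved
    (["c9", "c4", "c2"], {"c5"}),        # relevant chunk missed entirely
]
per_question_recall = [recall_at_k(r, rel, k=3) for r, rel in runs]
overall_hit_rate = hit_rate(runs, k=3)
```

If these numbers are low, no amount of agentic looping on top will reliably fix the answers.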

Most "RAG is bad" complaints I see are actually retrieval problems: bad chunking, wrong embedding model, or no reranking. Fixing retrieval often beats switching to an agent.

What the demo covers

The repo walks through both architectures end to end with ChromaDB for vector search and works across OpenAI, Gemini, Claude, Ollama, and vLLM, so you can run it fully local or against a hosted model. It includes the chunking and retrieval steps, the agentic tool-selection loop, and the evaluation metrics so you can compare the two on your own data.

Takeaway

Default to classical RAG. Add agentic behavior when your questions genuinely need multi-step reasoning or multiple sources, and measure the cost when you do. Architecture is a dial, not a switch.

Repo: https://github.com/ahmet-ozel/rag-architecture-guide

How are you deciding between fixed pipelines and agentic retrieval in production? I am especially curious where people draw the line on cost.
