I'm Luhui Dev, a developer who has been breaking down Agent engineering and exploring how AI can be applied in education.
I focus on Agent Harness, LLM application engineering, AI for Math, and the productization of education SaaS.
A Support Ticket That Opens a Data Maze
Say your company just launched an AI customer-support system.
A major account sends in a ticket: "What's the remaining warranty period on the servers we purchased under Project Alpha last quarter? Could you also share the original contract terms and the current technical support contact?"
It sounds like an ordinary question. But when your tech lead reads the ticket, they pause for a moment.
Because they know answering it requires the system to:
- Look up the customer's profile and project history in the CRM
- Look up the procurement contract and warranty terms for Project Alpha in the ERP / contract management system
- Look up the stock-in date and device serial numbers for that batch of servers in the asset management system
- Look up the current customer-success owner in the HR system
These four systems are maintained by different teams, run on different databases, and enforce different access controls.
A standard RAG system is helpless here. The best it can do is say, "Sorry, I couldn't find relevant information."
This is exactly the problem Agentic RAG is built to solve.
Traditional RAG: A One-Shot Retrieval Clerk
Let's quickly recap how RAG works.
The core idea behind RAG (Retrieval-Augmented Generation) is simple: an LLM's training knowledge is static, while enterprise data is dynamic and private. The fix is to retrieve relevant document chunks from a database before generating an answer, stuff them into the context, and have the LLM answer based on that material.
User question → [vector search] → retrieve relevant chunks → [LLM] → generate answer
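As a minimal sketch of that one-shot flow, the snippet below ranks a handful of chunks once and builds a prompt from the top hits. Everything here is illustrative: `embed` is a toy bag-of-words stand-in for a real embedding model, and the chunk texts, `retrieve`, and `build_prompt` are hypothetical names. The LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Single retrieval pass: rank every chunk once and keep the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into the context the LLM will see."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical knowledge-base chunks.
chunks = [
    "Servers carry a 36-month warranty from the stock-in date.",
    "The cafeteria opens at 8 am.",
    "Warranty claims go through the support portal.",
]
question = "What is the warranty on the servers?"
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
```

There is no second chance in this flow: whatever `retrieve` returns on its only pass is all the model gets.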
This pipeline works well when there's a single knowledge base and a clear question. But it has two fundamental limits.
Limit one: a single retrieval pass, no iteration. Retrieve once, hand it to the LLM once, done. If the first pass misses the key information, the whole chain breaks, and the LLM is left guessing or saying "I don't know."
Limit two: a single corpus, no routing. Traditional RAG assumes all knowledge lives in one unified vector database. In a real enterprise, data is scattered across CRM, ERP, Confluence, data warehouses, private document stores -- each with its own access point and permission boundary.
Here's an analogy: traditional RAG is a librarian who can only find books on the first floor, while the book you need might be sitting on the fourth floor, behind a different access pass.
Agentic RAG: A Retrieval Department That Thinks
The core shift in Agentic RAG is this: turn a single retrieval pass into a planned, iterative retrieval process.
It's no longer a passive query-and-return pipeline. It's a workflow run by multiple specialized agents, each with a distinct responsibility.
Let's use the support-ticket example to walk through how the whole workflow operates.
Step 1: The Orchestrator Decomposes the Task
The user's question first reaches the Orchestrator.
The orchestrator doesn't retrieve anything directly. It first understands the structure of the question: how many independent information needs are involved? Are there dependencies between them? Which data sources need to be accessed?
For our ticket, the orchestrator breaks it down into:
- Subtask A: Look up the customer's "Project Alpha" basics (customer ID, project number) in the CRM
- Subtask B: Use the project number to look up warranty terms in the contract system
- Subtask C: Use the project number to look up device serial numbers and stock-in dates in the asset management system
- Subtask D: Look up the current technical support owner in the HR system
Note that Subtasks B and C depend on the result of Subtask A (they need the project number first). Subtask D has no such dependency, so it can run in parallel with the others.
This dependency graph is the execution plan produced by the Planner Agent.
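One way to represent such a plan, sketched here with Python's standard-library `graphlib`, is a mapping from each subtask to the subtasks it depends on. The subtask identifiers and the `execution_waves` helper are hypothetical; a real planner would produce the graph from the LLM's decomposition.

```python
from graphlib import TopologicalSorter

# Each subtask maps to the set of subtasks it depends on (names are illustrative).
plan = {
    "A_crm_project_lookup": set(),
    "B_contract_warranty_terms": {"A_crm_project_lookup"},
    "C_asset_serials_and_stock_in": {"A_crm_project_lookup"},
    "D_hr_support_owner": set(),
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; everything inside one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(plan)
```

With this plan, the first wave holds A and D, and the second wave holds B and C, which matches the dependencies described above.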
Step 2: Query Rewriting for Each Data Source
Every data source expects queries in a different form. The CRM might need keyword search, the contract system might need structured SQL, and the vector database needs semantic search.
The Query Rewriter translates each natural-language subtask into a query format the target source can understand:
- For the CRM vector store: "Alpha project procurement record {customer name}"
- For the contract system: SELECT warranty_terms FROM contracts WHERE project_id = 'Alpha-XXX'
- For asset management: "Alpha project server stock-in date serial number"
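A rewriter along these lines could be sketched as a function that returns a source-specific query object. This is an assumption-laden illustration: `SourceQuery`, `rewrite`, the customer name, and the in-memory SQLite table are all hypothetical, and in practice an LLM would usually generate the query text. The SQL uses a bound parameter rather than string interpolation so that values coming from the user never end up inside the statement itself.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceQuery:
    source: str          # which system the query targets
    kind: str            # "semantic" or "sql"
    text: str            # search string or SQL statement
    params: tuple = ()   # bound parameters for SQL queries


def rewrite(source: str, customer_name: str, project_id: str) -> SourceQuery:
    """Translate one subtask into the form its target source expects."""
    if source == "crm":
        return SourceQuery("crm", "semantic",
                           f"Alpha project procurement record {customer_name}")
    if source == "contracts":
        return SourceQuery("contracts", "sql",
                           "SELECT warranty_terms FROM contracts WHERE project_id = ?",
                           (project_id,))
    if source == "assets":
        return SourceQuery("assets", "semantic",
                           "Alpha project server stock-in date serial number")
    raise ValueError(f"no rewriter for source {source!r}")


# Hypothetical stand-in for the contract system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contracts (project_id TEXT, warranty_terms TEXT)")
db.execute("INSERT INTO contracts VALUES (?, ?)",
           ("Alpha-2024-087", "36 months from stock-in"))

query = rewrite("contracts", "Example Corp", "Alpha-2024-087")
rows = db.execute(query.text, query.params).fetchall()
```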
Step 3: Parallel Retrieval Across Permission Boundaries
The Search Fanout Agent queries multiple data sources at once.
There's a key engineering problem here: permissions.
Different data sources have different access controls. CRM data might be open to the sales team, HR data might only be accessible to admins, and contract data might require legal sign-off. An Agentic RAG framework needs to maintain a "credential pool" at this layer -- different access tokens for different data sources -- and make sure retrieval never exceeds the current user's actual authorization scope.
This isn't just a technical problem; it's a compliance one too: an AI system shouldn't let you reach data you were never authorized to see just because you phrased the request in natural language.
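The sketch below shows one possible shape for this layer: a per-source credential pool plus a scope check that runs before any query is sent. The role names, tokens, and `search_source` connector are placeholders invented for illustration, not part of any real framework.

```python
import asyncio

# Hypothetical mapping of roles to the sources they may query.
ROLE_SCOPES = {
    "sales": {"crm"},
    "support": {"crm", "assets", "contracts"},
    "admin": {"crm", "assets", "contracts", "hr"},
}

# Hypothetical credential pool: one service token per data source.
CREDENTIAL_POOL = {
    "crm": "token-crm",
    "assets": "token-assets",
    "contracts": "token-contracts",
    "hr": "token-hr",
}


async def search_source(source: str, query: str, token: str) -> str:
    """Placeholder for a real connector call authenticated with `token`."""
    await asyncio.sleep(0)
    return f"{source}: results for {query!r}"


async def fan_out(role: str, queries: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Query every permitted source concurrently; report the ones that were denied."""
    allowed = ROLE_SCOPES.get(role, set())
    permitted = {s: q for s, q in queries.items() if s in allowed}
    denied = sorted(set(queries) - set(permitted))
    results = await asyncio.gather(
        *(search_source(s, q, CREDENTIAL_POOL[s]) for s, q in permitted.items())
    )
    return dict(zip(permitted, results)), denied


results, denied = asyncio.run(fan_out("support", {
    "crm": "Project Alpha profile",
    "contracts": "warranty terms",
    "assets": "serial numbers",
    "hr": "support owner",
}))
```

The important design choice is that the scope check uses the requesting user's role, not the service token's reach: the pool may hold a token for HR, but a support user's request never uses it.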
Step 4: Sufficiency Checking -- the Most Important Innovation
Once all retrieval results come back, they're passed to the Sufficient Context Agent.
This is the design that most distinguishes Agentic RAG from traditional RAG: the system actively judges whether the information gathered so far is enough to answer the original question, and if not, it spells out exactly what's missing before retrieving again.
In our ticket example, the checker might find:
✅ Found: customer profile, project number, device serial numbers
✅ Found: technical support owner
❌ Missing: the contract system returned a document, but the warranty terms are in an attached PDF that the vector search didn't hit
Instead of just saying "not enough information," the checker outputs a precise description of the gap:
"Project number Alpha-2024-087, device serial numbers SN-XXX-YYY-ZZZ, and stock-in date March 2024 have been retrieved. The main contract file has been found, but the warranty terms are in Contract Attachment B. Re-search the contract attachment store specifically for 'Attachment B warranty period.'"
That feedback drives a second retrieval round: the rewriter generates a more precise query targeted at the contract attachment.
This "retrieve → evaluate → retrieve again" loop continues until the sufficiency checker determines the information is complete, or the maximum iteration limit is reached.
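The loop can be sketched as follows. This is a simplified illustration under stated assumptions: `check_sufficiency` stands in for an LLM-based checker by comparing a set of required facts against what has been found, and `fake_retrieve` simulates a retriever that only finds the warranty terms once the gap description points it there. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    sufficient: bool
    gap: str = ""  # precise description of what is still missing


def check_sufficiency(required: set[str], evidence: dict[str, str]) -> Verdict:
    """Stand-in for the LLM checker: compare required facts with what was found."""
    missing = sorted(required - evidence.keys())
    if not missing:
        return Verdict(True)
    return Verdict(False, gap="Missing: " + ", ".join(missing))


def agentic_retrieve(required: set[str],
                     retrieve: Callable[[str], dict[str, str]],
                     max_rounds: int = 3) -> tuple[dict[str, str], int, bool]:
    """Retrieve, evaluate, and retrieve again until sufficient or out of rounds."""
    evidence: dict[str, str] = {}
    hint = ""  # the gap description from the previous round
    for round_no in range(1, max_rounds + 1):
        evidence.update(retrieve(hint))
        verdict = check_sufficiency(required, evidence)
        if verdict.sufficient:
            return evidence, round_no, True
        hint = verdict.gap  # feed the gap back into the next retrieval
    return evidence, max_rounds, False


def fake_retrieve(hint: str) -> dict[str, str]:
    """Simulated retriever: the attachment is only found with a targeted hint."""
    if "warranty_terms" in hint:
        return {"warranty_terms": "Contract Attachment B"}
    return {"project_number": "Alpha-2024-087", "support_owner": "Li Ming"}


evidence, rounds, complete = agentic_retrieve(
    {"project_number", "support_owner", "warranty_terms"}, fake_retrieve
)
```

In this simulation the first round leaves `warranty_terms` missing, the gap description steers the second round, and the loop exits after two rounds.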
Step 5: Synthesis Produces the Final Answer
Once everything is in place, the Synthesis Agent combines fragments from four different systems into one coherent, accurate, and source-attributed answer:
"The three servers purchased under Project Alpha (project number Alpha-2024-087, serial numbers SN-XXX-001 through 003) carry a 36-month warranty from their stock-in date (March 15, 2024), per Section 4.2 of Contract Attachment B, expiring March 14, 2027. The current technical support owner is Li Ming (extension 4521, liming@company.com)."
Every sentence has a traceable source.
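The expiry date in the sample answer can be checked with a little date arithmetic. The sketch below assumes one particular convention, namely that the stock-in date is the first covered day and the last covered day is the day before the 36-month anniversary. A real contract may define the period differently, and `add_months` and `warranty_expiry` are hypothetical helper names.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def warranty_expiry(stock_in: date, months: int) -> date:
    """Last covered day, assuming coverage starts on the stock-in date itself."""
    return add_months(stock_in, months) - timedelta(days=1)


expiry = warranty_expiry(date(2024, 3, 15), 36)
```

Under that convention, a stock-in date of March 15, 2024 and a 36-month term give an expiry of March 14, 2027, which agrees with the answer above.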
Cross-System Permissions: Harder Than the Technology
The handling of permission boundaries deserves its own discussion.
In a real enterprise, data permissions are a multi-dimensional problem:
| Dimension | Description | Example |
|---|---|---|
| Role-based access | Different roles see different data | Sales can see a contract summary but not the full text |
| Data classification | A single database can hold multiple sensitivity levels | Employee salary vs. employee directory |
| Time-based access | Some data has time-limited access rules | Financial data is read-only during an audit |
| Cross-system access | Data from System A must not surface in System B's context | GDPR restricts transferring personal data outside the EU/EEA without safeguards |
An Agentic RAG framework needs to enforce these rules on every single retrieval call, not just authorize access once at indexing time.
That means the architecture needs permission checks at query time, rather than the blunt approach of vectorizing everything into one big store.
In database terms: traditional RAG is like joining every table into one giant table and handing it to the LLM. Agentic RAG is like generating a permission-filtered SQL query dynamically for every request.
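A minimal sketch of query-time filtering, assuming each retrieved chunk carries its source and a classification label. The `Chunk` and `User` types, the three classification levels, and the sample data are hypothetical; the example reuses the salary-versus-directory case from the table above.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, lowest to highest.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    classification: str


@dataclass(frozen=True)
class User:
    name: str
    clearance: str
    sources: frozenset


def authorized(user: User, chunk: Chunk) -> bool:
    """A chunk is visible only if both its source and its sensitivity are allowed."""
    return (chunk.source in user.sources
            and LEVELS[chunk.classification] <= LEVELS[user.clearance])


def retrieve_for(user: User, candidates: list[Chunk]) -> list[Chunk]:
    """Filter on every call, so a permission change takes effect on the next query."""
    return [c for c in candidates if authorized(user, c)]


candidates = [
    Chunk("Employee directory entry", "hr", "internal"),
    Chunk("Employee salary record", "hr", "restricted"),
    Chunk("Contract summary", "contracts", "internal"),
]
agent = User("support-agent", "internal", frozenset({"hr"}))
visible = retrieve_for(agent, candidates)
```

Here the support agent sees the directory entry but neither the salary record (too sensitive) nor the contract summary (source not granted), even though all three chunks matched the search.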
Three Decisions You Can't Avoid in Practice
When you actually build Agentic RAG in production, three decisions come up every time.
Decision one: routing strategy -- static rules or LLM routing?
Static routing: predefine rules based on keywords or metadata in the query to decide which data source to hit. Fast and predictable, but weak on open-ended queries.
LLM routing: let the LLM understand the query's intent and dynamically decide where to route it. Flexible, but every routing decision burns an LLM call, adding latency and cost.
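The two options are not mutually exclusive. One possible combination, which is an illustration rather than something the approaches above prescribe, tries static keyword rules first and pays for an LLM call only when no rule matches. The rule table and the `stub_llm_router` function are hypothetical.

```python
from typing import Callable

# Hypothetical keyword-to-source rules for static routing.
STATIC_RULES = {
    "warranty": "contracts",
    "contract": "contracts",
    "serial": "assets",
    "contact": "hr",
}


def route(query: str, llm_router: Callable[[str], list[str]]) -> tuple[list[str], str]:
    """Return (target sources, strategy used). Static rules win when they match."""
    text = query.lower()
    hits = {source for keyword, source in STATIC_RULES.items() if keyword in text}
    if hits:
        return sorted(hits), "static"
    return llm_router(query), "llm"


def stub_llm_router(query: str) -> list[str]:
    """Placeholder for a real LLM call that reads the query's intent."""
    return ["crm", "contracts", "assets"]


simple = route("What is the remaining warranty period?", stub_llm_router)
open_ended = route("Summarize everything about Project Alpha", stub_llm_router)
```

The first query is resolved by a rule at no extra cost, while the open-ended one falls through to the (stubbed) LLM router.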
Decision two: iteration depth -- when do you stop?
The system can get stuck in an infinite loop -- every round of retrieval feels like something is still missing, so it keeps searching.
Engineering-wise, you need:
- A maximum iteration count (typically 2-4 rounds)
- A time budget (answer with what you have once you time out)
- A degradation strategy (answer with available information and flag it as potentially incomplete once the iteration limit is hit)
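These three guards can be sketched together as a small wrapper around the retrieval loop. The `Answer` type, the `step` callback signature, and the default limits are assumptions made for illustration. The clock is injectable so the time budget can be exercised without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    complete: bool  # False means the answer was produced in degraded mode
    rounds: int


def run_with_budget(step: Callable[[int], tuple[str, bool]],
                    max_rounds: int = 3,
                    time_budget_s: float = 5.0,
                    clock: Callable[[], float] = time.monotonic) -> Answer:
    """Run step(round_no) -> (partial_answer, done) under round and time limits."""
    deadline = clock() + time_budget_s
    text, rounds = "", 0
    for round_no in range(1, max_rounds + 1):
        if clock() >= deadline:
            break  # time budget exhausted: answer with what we have
        text, done = step(round_no)
        rounds = round_no
        if done:
            return Answer(text, True, rounds)
    # Degradation: return the best partial answer and flag it.
    return Answer(text + " (may be incomplete)", False, rounds)


# A step that never reports completion, to show the iteration cap.
capped = run_with_budget(lambda n: (f"partial after round {n}", False))

# A fake clock that jumps past the deadline before round 2 starts.
ticks = iter([0.0, 1.0, 6.0])
timed_out = run_with_budget(lambda n: (f"partial after round {n}", False),
                            clock=lambda: next(ticks))
```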
Decision three: the latency-vs-accuracy trade-off
Agentic RAG is slower than traditional RAG -- there's no avoiding it. Multiple LLM calls, parallel retrieval, and sufficiency evaluation all add latency at every step.
| Approach | Cost multiplier | Latency multiplier | Best fit |
|---|---|---|---|
| Traditional RAG | 1x | 1x | Simple Q&A, single knowledge base |
| Adaptive RAG | 1.5-2x | 1.2-2x | Mixed scenarios with varying query complexity |
| CRAG (Corrective RAG) | 3-5x | 2-3x | High accuracy needs, tolerant of latency measured in seconds |
| Full Agentic RAG | 5-10x | 3-6x | Complex multi-hop, cross-store, async scenarios |
Not every scenario needs full Agentic RAG.
Classifying intent at the query level -- routing complex queries through the Agentic pipeline and simple ones through traditional RAG -- keeps average cost and latency within a reasonable range.
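To see why this helps, here is a back-of-the-envelope calculation using the multipliers from the table. The 20% share of complex queries is purely an assumed figure for illustration, and the sketch ignores the cost of the intent classifier itself.

```python
def blended_multiplier(agentic_share: float, agentic_multiplier: float) -> float:
    """Average multiplier when a share of queries takes the agentic path (1x otherwise)."""
    if not 0.0 <= agentic_share <= 1.0:
        raise ValueError("agentic_share must be between 0 and 1")
    return (1.0 - agentic_share) * 1.0 + agentic_share * agentic_multiplier


SHARE = 0.2  # assumption: 20% of queries are complex enough for the agentic path

cost_low = blended_multiplier(SHARE, 5)      # table: 5x cost, low end
cost_high = blended_multiplier(SHARE, 10)    # table: 10x cost, high end
latency_low = blended_multiplier(SHARE, 3)   # table: 3x latency, low end
latency_high = blended_multiplier(SHARE, 6)  # table: 6x latency, high end
```

Under that assumption the average lands at roughly 1.8x to 2.8x the cost and 1.4x to 2.0x the latency of traditional RAG, well below the 5-10x and 3-6x of sending every query through the full pipeline.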
Closing Thoughts
I think the essence of Agentic RAG is turning retrieval into an executable strategy: if one pass isn't enough, keep searching until it is. And the system itself decides what "enough" means.
That shift sounds simple, but it requires moving from a stateless "query-response" model to a stateful "goal-plan-execute-evaluate-iterate" workflow.
This is the same general challenge every agent system faces: state management is the core difficulty.
If you're building an enterprise AI system that spans multiple data sources, Agentic RAG isn't just an upgrade to your retrieval technique. It forces you to rethink data architecture, permission design, and workflow orchestration. Getting those three things right matters more than which framework or cloud vendor you pick.
References
- Google Research, Unlocking dependable responses with Gemini Enterprise Agent Platform's Agentic RAG, June 2026
- Microsoft, Agentic Retrieval Overview -- Azure AI Search, 2026-04-01 GA
- Microsoft, What is a Knowledge Source? -- Azure AI Search
- Amazon Web Services, Knowledge Bases for Amazon Bedrock -- Multiple Data Sources, April 2024
- MarsDevs, Agentic RAG: The 2026 Production Guide (includes cost/latency comparisons across approaches)
- Google Research, Deeper Insights into Retrieval-Augmented Generation: The Role of Sufficient Context
