Luhui Dev

Posted on Jun 19 • Originally published at luhuidev.com

Field Notes: How Agentic RAG Handles the Real Mess of Enterprise Data

#ai #agents #luhuidev

I’m Luhui Dev, a developer who has been breaking down Agent engineering and exploring how AI can be applied in education.
I focus on Agent Harness, LLM application engineering, AI for Math, and the productization of education SaaS.

A Support Ticket That Opens a Data Maze

Say your company just launched an AI customer-support system.

A major account sends in a ticket: "What's the remaining warranty period on the servers we purchased under Project Alpha last quarter? Could you also share the original contract terms and the current technical support contact?"

It sounds like an ordinary question. But when your tech lead reads the ticket, they pause for a moment.

Because they know answering it requires the system to:

Look up the customer's profile and project history in the CRM
Look up the procurement contract and warranty terms for Project Alpha in the ERP / contract management system
Look up the stock-in date and device serial numbers for that batch of servers in the asset management system
Look up the current technical support owner in the HR system

These four systems are maintained by different teams, run on different databases, and enforce different access controls.

A standard RAG system is helpless here. The best it can do is say, "Sorry, I couldn't find relevant information."

This is exactly the problem Agentic RAG is built to solve.

Traditional RAG: A One-Shot Retrieval Clerk

Let's quickly recap how RAG works.

The core idea behind RAG (Retrieval-Augmented Generation) is simple: an LLM's training knowledge is static, while enterprise data is dynamic and private. The fix is to retrieve relevant document chunks from a database before generating an answer, stuff them into the context, and have the LLM answer based on that material.

User question → [vector search] → retrieve relevant chunks → [LLM] → generate answer
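To make the single-pass shape concrete, here's a minimal sketch. It's a toy under stated assumptions: word overlap stands in for vector similarity, `fake_llm` is a placeholder for a real model call, and the corpus and function names are made up for illustration.

```python
import re
from collections import Counter

# Toy corpus; in practice these would be chunks from an indexed knowledge base.
CHUNKS = [
    "Project Alpha servers carry a 36-month warranty.",
    "The cafeteria opens at 8 am on weekdays.",
    "Support tickets are answered within one business day.",
]

def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, chunk: str) -> int:
    # Word-overlap count stands in for vector similarity (an assumption).
    return sum((tokenize(query) & tokenize(chunk)).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    ranked = sorted(CHUNKS, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call; it only echoes the prompt's context.
    return prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))   # one retrieval pass
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return fake_llm(prompt)                # one generation pass
```

Notice that `answer` has no way to look again: whatever `retrieve` returns on the first call is all the model ever sees.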

This pipeline works well when there's a single knowledge base and a clear question. But it has two fundamental limits.

Limit one: a single retrieval pass, no iteration. Retrieve once, hand it to the LLM once, done. If the first pass misses the key information, the whole chain breaks, and the LLM is left guessing or saying "I don't know."

Limit two: a single corpus, no routing. Traditional RAG assumes all knowledge lives in one unified vector database. In a real enterprise, data is scattered across CRM, ERP, Confluence, data warehouses, private document stores -- each with its own access point and permission boundary.

Here's an analogy: traditional RAG is a librarian who can only find books on the first floor, while the book you need might be sitting on the fourth floor, behind a different access pass.

Agentic RAG: A Retrieval Department That Thinks

The core shift in Agentic RAG is this: turn a single retrieval pass into a planned, iterative retrieval process.

It's no longer a passive query-and-return pipeline. It's a workflow run by multiple specialized agents, each with a distinct responsibility.

Let's use the support-ticket example to walk through how the whole workflow operates.

Step 1: The Orchestrator Decomposes the Task

The user's question first reaches the Orchestrator.

The orchestrator doesn't retrieve anything directly. It first understands the structure of the question: how many independent information needs are involved? Are there dependencies between them? Which data sources need to be accessed?

For our ticket, the orchestrator breaks it down into:

Subtask A: Look up the customer's "Project Alpha" basics (customer ID, project number) in the CRM
Subtask B: Use the project number to look up warranty terms in the contract system
Subtask C: Use the project number to look up device serial numbers and stock-in dates in the asset management system
Subtask D: Look up the current technical support owner in the HR system

Note that Subtasks B and C depend on the result of Subtask A (they need the project number first). Subtask D has no such dependency, so it can run in parallel with Subtask A.

This dependency graph is the execution plan. It comes out of the orchestrator's planning step, which some designs split out as a separate Planner Agent.
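One way to turn that dependency graph into an execution order is a topological sort that groups subtasks into "waves." The sketch below uses Python's standard-library `graphlib`; the subtask names and the `execution_waves` helper are my own illustrative choices, not part of any particular framework.

```python
from graphlib import TopologicalSorter

# Each subtask maps to the set of subtasks it depends on (names are illustrative).
plan = {
    "A_crm_project": set(),
    "B_contract_warranty": {"A_crm_project"},
    "C_asset_serials": {"A_crm_project"},
    "D_hr_support_owner": set(),
}

def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; tasks within one wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(plan)
```

For the ticket, the first wave holds Subtasks A and D, and the second holds B and C, which matches the dependencies described above.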

Step 2: Query Rewriting for Each Data Source

Every data source expects queries in a different form. The CRM might need keyword search, the contract system might need structured SQL, and the vector database needs semantic search.

The Query Rewriter translates each natural-language subtask into a query format the target source can understand:

For the CRM (keyword search): "Alpha project procurement record {customer name}"
For the contract system: SELECT warranty_terms FROM contracts WHERE project_id = 'Alpha-XXX'
For asset management: "Alpha project server stock-in date serial number"
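Here's a rough sketch of what a rewriter's output could look like. In a real system the rewriting would typically be model-driven; fixed templates stand in for it here. `SourceQuery`, `rewrite`, the subtask names, and the `facts` keys are all hypothetical. One deliberate choice: the SQL variant carries the project ID as a bound parameter rather than formatting it into the string.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceQuery:
    source: str
    kind: str            # "keyword" or "sql"
    text: str
    params: tuple = ()   # bound parameters for SQL queries

def rewrite(subtask: str, facts: dict[str, str]) -> SourceQuery:
    """Map a subtask name to a query in the target source's format."""
    if subtask == "crm_project":
        text = f"Alpha project procurement record {facts['customer_name']}"
        return SourceQuery("crm", "keyword", text)
    if subtask == "contract_warranty":
        # Bind the project id as a parameter instead of formatting it into
        # the SQL string, so retrieved or user-supplied text cannot inject SQL.
        sql = "SELECT warranty_terms FROM contracts WHERE project_id = ?"
        return SourceQuery("contracts", "sql", sql, (facts["project_id"],))
    if subtask == "asset_serials":
        return SourceQuery("assets", "keyword",
                           "Alpha project server stock-in date serial number")
    raise ValueError(f"no rewrite rule for subtask {subtask!r}")
```

The `?` placeholder follows the `qmark` style used by drivers such as `sqlite3`; other drivers use different placeholder syntax.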

Step 3: Parallel Retrieval Across Permission Boundaries

The Search Fanout Agent queries multiple data sources at once.

There's a key engineering problem here: permissions.

Different data sources have different access controls. CRM data might be open to the sales team, HR data might only be accessible to admins, and contract data might require legal sign-off. An Agentic RAG framework needs to maintain a "credential pool" at this layer -- different access tokens for different data sources -- and make sure retrieval never exceeds the current user's actual authorization scope.

This isn't just a technical problem; it's a compliance one too: phrasing a request in natural language shouldn't let you reach data you were never authorized to see.
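A minimal sketch of the idea, assuming a simple role-to-source mapping: the fan-out only dispatches queries to sources the current user's role may read, and each call uses that source's own token. The token values, role names, and `search_source` connector are placeholders I made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical service tokens, one per data source.
CREDENTIAL_POOL = {"crm": "tok-crm", "contracts": "tok-contracts", "hr": "tok-hr"}

# Hypothetical mapping from user role to the sources that role may read.
ROLE_SCOPES = {
    "support_agent": {"crm", "contracts"},
    "admin": {"crm", "contracts", "hr"},
}

def search_source(source: str, query: str, token: str) -> str:
    # Stand-in for a real connector call authenticated with `token`.
    return f"{source} results for {query!r}"

def fan_out(role: str, queries: dict[str, str]) -> dict[str, str]:
    """Query every source the role may read, in parallel; mark the rest as denied."""
    allowed = ROLE_SCOPES.get(role, set())
    results = {s: "DENIED" for s in queries if s not in allowed}
    permitted = {s: q for s, q in queries.items() if s in allowed}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            s: pool.submit(search_source, s, q, CREDENTIAL_POOL[s])
            for s, q in permitted.items()
        }
        for source, future in futures.items():
            results[source] = future.result()
    return results
```

The scope check happens before any request is sent, so a denied source is never contacted at all. An unknown role gets an empty scope and is denied everywhere.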

Step 4: Sufficiency Checking -- the Most Important Innovation

Once all retrieval results come back, they're passed to the Sufficient Context Agent.

This is the design that most distinguishes Agentic RAG from traditional RAG: the system actively judges whether the information gathered so far is enough to answer the original question, and if not, it spells out exactly what's missing before retrieving again.

In our ticket example, the checker might find:

✅ Found: customer profile, project number, device serial numbers, stock-in date
✅ Found: technical support owner
❌ Missing: the contract system returned a document, but the warranty terms are in an attached PDF that the first-round query didn't hit

Instead of just saying "not enough information," the checker outputs a precise description of the gap:

"Project number Alpha-2024-087, device serial numbers SN-XXX-YYY-ZZZ, and stock-in date March 2024 have been retrieved. The main contract file has been found, but the warranty terms are in Contract Attachment B. Re-search the contract attachment store specifically for 'Attachment B warranty period.'"

That feedback drives a second retrieval round: the rewriter generates a more precise query targeted at the contract attachment.

This "retrieve → evaluate → retrieve again" loop continues until the sufficiency checker determines the information is complete, or the maximum iteration limit is reached.
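The loop can be sketched in a few lines. Everything here is a stand-in: `retrieve_round` fakes a data source that only surfaces the attachment for a targeted query, and `check_sufficiency` is a simple set comparison where a real checker would likely be model-driven. The point is only the control flow, where the gap description feeds the next query.

```python
REQUIRED = {"project_number", "serial_numbers", "support_owner", "warranty_terms"}

def retrieve_round(query: str) -> dict[str, str]:
    # Stand-in retrieval: the attachment is only found by a targeted query.
    found = {
        "project_number": "Alpha-2024-087",
        "serial_numbers": "SN-XXX-YYY-ZZZ",
        "support_owner": "Li Ming",
    }
    if "Attachment B" in query:
        found["warranty_terms"] = "36 months from stock-in date"
    return found

def check_sufficiency(facts: dict[str, str]) -> tuple[bool, str]:
    """Return (is_sufficient, gap_description)."""
    missing = sorted(REQUIRED - facts.keys())
    if not missing:
        return True, ""
    have = ", ".join(sorted(facts))
    return False, f"Have: {have}. Missing: {', '.join(missing)}."

def rewrite_for_gap(gap: str) -> str:
    # A real rewriter would be model-driven; this rule covers the one known gap.
    if "warranty_terms" in gap:
        return "Attachment B warranty period"
    return gap

def answer_ticket(question: str, max_rounds: int = 3) -> tuple[dict[str, str], int]:
    facts: dict[str, str] = {}
    query = question
    for round_no in range(1, max_rounds + 1):
        facts.update(retrieve_round(query))
        sufficient, gap = check_sufficiency(facts)
        if sufficient:
            return facts, round_no
        query = rewrite_for_gap(gap)   # the gap description drives the next query
    return facts, max_rounds
```

With the stubbed source above, the first round comes back without the warranty terms, the gap description triggers the targeted "Attachment B" query, and the second round completes the set.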

Step 5: Synthesis Produces the Final Answer

Once everything is in place, the Synthesis Agent combines fragments from four different systems into one coherent, accurate, and source-attributed answer:

"The three servers purchased under Project Alpha (project number Alpha-2024-087, serial numbers SN-XXX-001 through 003) carry a 36-month warranty from their stock-in date (March 15, 2024), per Section 4.2 of Contract Attachment B, expiring March 14, 2027. The current technical support owner is Li Ming (extension 4521, liming@company.com)."

Every sentence has a traceable source.
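The dates in that answer are plain arithmetic, and it can be worth computing them in code rather than trusting the model to do calendar math. Here's a small sketch using only the standard library. The helper names are mine, and it assumes the stock-in date counts as the first covered day, which is what makes a 36-month term starting March 15, 2024 end on March 14, 2027.

```python
import calendar
from datetime import date, timedelta

def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    last_day = calendar.monthrange(year, month + 1)[1]
    return date(year, month + 1, min(start.day, last_day))

def warranty_end(stock_in: date, term_months: int) -> date:
    # Assumes the stock-in date counts as day one, so the last covered day
    # is the day before the anniversary.
    return add_months(stock_in, term_months) - timedelta(days=1)

def days_remaining(stock_in: date, term_months: int, today: date) -> int:
    return max(0, (warranty_end(stock_in, term_months) - today).days)
```

Calling `warranty_end(date(2024, 3, 15), 36)` gives `date(2027, 3, 14)`, and `days_remaining` answers the customer's actual question about how much warranty is left as of a given day.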

Cross-System Permissions: Harder Than the Technology

The handling of permission boundaries deserves its own discussion.

In a real enterprise, data permissions are a multi-dimensional problem:

Dimension	Description	Example
Role-based access	Different roles see different data	Sales can see a contract summary but not the full text
Data classification	A single database can hold multiple sensitivity levels	Employee salary vs. employee directory
Time-based access	Some data has time-limited access rules	Financial data is read-only during an audit
Cross-system access	Data from System A must not surface in System B's context	GDPR restricts moving personal data outside the EEA without safeguards

An Agentic RAG framework needs to enforce these rules on every single retrieval call, not just authorize access once at indexing time.

That means the architecture needs permission checks at query time, rather than the blunt approach of vectorizing everything into one big store.

In database terms: traditional RAG is like joining every table into one giant table and handing it to the LLM. Agentic RAG is like generating a permission-filtered SQL query dynamically for every request.
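As a rough illustration of query-time filtering, the sketch below attaches a source and a classification to each retrieved chunk and filters against the requesting user before anything reaches the LLM. The classification levels, the `Chunk` and `User` shapes, and the sample data are assumptions for the example, not a real permission model.

```python
from dataclasses import dataclass

LEVELS = {"public": 0, "internal": 1, "restricted": 2}

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    classification: str   # one of LEVELS

@dataclass(frozen=True)
class User:
    name: str
    clearance: str            # highest classification the user may read
    sources: frozenset[str]   # systems the user may query

def visible_chunks(user: User, candidates: list[Chunk]) -> list[Chunk]:
    """Drop every chunk the user may not see before anything reaches the LLM."""
    limit = LEVELS[user.clearance]
    return [
        c for c in candidates
        if c.source in user.sources and LEVELS[c.classification] <= limit
    ]

candidates = [
    Chunk("Employee directory entry", "hr", "internal"),
    Chunk("Employee salary record", "hr", "restricted"),
    Chunk("Contract summary", "contracts", "internal"),
]
sales = User("sales_rep", "internal", frozenset({"contracts"}))
```

Because the filter runs per request, changing a user's clearance takes effect on the next query without re-indexing anything.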

Three Decisions You Can't Avoid in Practice

When you actually build Agentic RAG in production, three decisions come up every time.

Decision one: routing strategy -- static rules or LLM routing?

Static routing: predefine rules based on keywords or metadata in the query to decide which data source to hit. Fast and predictable, but weak on open-ended queries.

LLM routing: let the LLM understand the query's intent and dynamically decide where to route it. Flexible, but every routing decision burns an LLM call, adding latency and cost.
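The two strategies don't have to be exclusive. One possible middle ground, sketched here as my own assumption rather than a recommendation from any framework, is to try cheap static rules first and pay for an LLM call only when no rule matches. The keyword table and `stub_llm_route` are placeholders.

```python
from typing import Callable

# Hypothetical keyword rules; a real deployment would tune these per source.
STATIC_RULES = {
    "crm": ("customer", "account"),
    "contracts": ("contract", "warranty"),
    "assets": ("serial", "stock-in"),
    "hr": ("owner", "contact"),
}

def static_route(query: str) -> set[str]:
    text = query.lower()
    return {src for src, words in STATIC_RULES.items()
            if any(w in text for w in words)}

def route(query: str, llm_route: Callable[[str], set[str]]) -> tuple[set[str], str]:
    """Try the cheap static rules first; call the LLM router only on a miss."""
    sources = static_route(query)
    if sources:
        return sources, "static"
    return llm_route(query), "llm"

def stub_llm_route(query: str) -> set[str]:
    # Placeholder for a model call that classifies the query's intent.
    return {"crm"}
```

The second element of the return value records which path was taken, which makes it easy to measure how often the LLM fallback actually fires.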

Decision two: iteration depth -- when do you stop?

Without a stopping rule, the system can loop indefinitely: every round of retrieval seems to leave something missing, so it keeps searching.

Engineering-wise, you need:

A maximum iteration count (typically 2-4 rounds)
A time budget (answer with what you have once you time out)
A degradation strategy (answer with available information and flag it as potentially incomplete once the iteration limit is hit)
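Those three safeguards fit naturally into one small budget object. This is a sketch under assumed defaults (3 rounds, 10 seconds); `RetrievalBudget`, `run_with_budget`, and the callback signatures are illustrative names, not an existing API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class RetrievalBudget:
    max_rounds: int = 3
    max_seconds: float = 10.0
    started: float = field(default_factory=time.monotonic)
    rounds_used: int = 0

    def allows_another_round(self) -> bool:
        within_rounds = self.rounds_used < self.max_rounds
        within_time = time.monotonic() - self.started < self.max_seconds
        return within_rounds and within_time

def run_with_budget(retrieve_once, is_sufficient, budget: RetrievalBudget) -> dict:
    """Loop until the context is sufficient or the budget runs out."""
    context: list[str] = []
    while budget.allows_another_round():
        budget.rounds_used += 1
        context.extend(retrieve_once(budget.rounds_used))
        if is_sufficient(context):
            return {"context": context, "complete": True}
    # Degradation: return what was found and flag it as possibly incomplete.
    return {"context": context, "complete": False}
```

The `complete` flag is what lets the synthesis step tell the user that the answer may be partial instead of silently presenting it as final.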

Decision three: the latency-vs-accuracy trade-off

Agentic RAG is slower than traditional RAG -- there's no avoiding it. Multiple LLM calls, fan-out retrieval, and sufficiency evaluation each add latency, and every extra iteration repeats that cost.

Approach	Cost multiplier	Latency multiplier	Best fit
Traditional RAG	1x	1x	Simple Q&A, single knowledge base
Adaptive RAG	1.5-2x	1.2-2x	Mixed scenarios with varying query complexity
CRAG (Corrective RAG)	3-5x	2-3x	High accuracy needs, tolerant of second-scale latency
Full Agentic RAG	5-10x	3-6x	Complex multi-hop, cross-store, async scenarios

Not every scenario needs full Agentic RAG.

Classifying intent at the query level -- routing complex queries through the Agentic pipeline and simple ones through traditional RAG -- keeps average cost and latency within a reasonable range.
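A quick back-of-the-envelope calculation shows why. The 20% share of complex queries below is an assumed traffic mix chosen purely for illustration; the 5x and 10x figures are the ends of the cost range from the table above. The sketch also ignores the cost of the intent classifier itself.

```python
def blended_multiplier(complex_share: float, agentic_multiplier: float,
                       simple_multiplier: float = 1.0) -> float:
    """Average multiplier when only a share of traffic takes the agentic path."""
    if not 0.0 <= complex_share <= 1.0:
        raise ValueError("complex_share must be between 0 and 1")
    return (1.0 - complex_share) * simple_multiplier + complex_share * agentic_multiplier

# Assumed traffic mix: 20% of queries are complex enough for the agentic path.
low = blended_multiplier(0.20, 5.0)    # low end of the 5-10x cost range
high = blended_multiplier(0.20, 10.0)  # high end of the 5-10x cost range
```

Under that assumed mix, the average cost lands at roughly 1.8x to 2.8x of traditional RAG instead of 5-10x.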

Closing Thoughts

I think the essence of Agentic RAG is turning retrieval into an executable strategy: if one pass isn't enough, keep searching until it is. And the system itself decides what "enough" means.

That shift sounds simple, but it requires moving from a stateless "query-response" model to a stateful "goal-plan-execute-evaluate-iterate" workflow.

This is the same general challenge every agent system faces: state management is the core difficulty.

If you're building an enterprise AI system that spans multiple data sources, Agentic RAG isn't just an upgrade to your retrieval technique. It forces you to rethink data architecture, permission design, and workflow orchestration. Getting those three things right matters more than which framework or cloud vendor you pick.

References

Google Research, Unlocking dependable responses with Gemini Enterprise Agent Platform's Agentic RAG, June 2026
Microsoft, Agentic Retrieval Overview -- Azure AI Search, 2026-04-01 GA
Microsoft, What is a Knowledge Source? -- Azure AI Search
Amazon Web Services, Knowledge Bases for Amazon Bedrock -- Multiple Data Sources, April 2024
MarsDevs, Agentic RAG: The 2026 Production Guide (includes cost/latency comparisons across approaches)
Google Research, Deeper Insights into Retrieval-Augmented Generation: The Role of Sufficient Context
