
WonderLab

Agent Series (7): Knowledge Base Integration — The Right Way for Agents to Use RAG

RAG Meets Agent — It's More Than "Giving the LLM a Search Box"

Most people encounter RAG in this form: user asks a question → retrieve from a knowledge base → stuff the results into the prompt → LLM generates an answer.

That's Pipeline RAG. It works. But it has a fundamental problem — it doesn't think.

Pipeline RAG runs retrieval for every question, regardless of whether it's "How much does WonderBot cost?" (genuinely needs the KB) or "How do you average a Python list?" (the LLM already knows). It's like a worker with one tool: no matter what the job is, first make a trip to the warehouse.

Agentic RAG solves this: let the Agent decide when to retrieve, what to retrieve, and whether the result is good enough.

This article focuses on three core capabilities:

  1. Retrieval decision: Does this question need a KB lookup at all?
  2. Multi-KB routing: It does need retrieval — but from which knowledge base?
  3. Quality gating + fallback: Got results — are they good enough? If not, what next?

Pipeline RAG vs Agentic RAG: The Architectural Difference

Pipeline RAG (always retrieves):
  User question
    ↓
  Vector retrieval (regardless of question type)
    ↓
  Inject into prompt
    ↓
  LLM generates

Agentic RAG (intelligent decisions):
  User question
    ↓
  [Decision node] Does this need retrieval?
    ├─ No  → LLM answers directly (common knowledge / math / general coding)
    └─ Yes → Which knowledge base?
                ├─ product_kb  (features / pricing)
                ├─ ops_kb      (deployment / monitoring)
                └─ faq_kb      (accounts / refunds / invoices)
                      ↓
                Is retrieval quality sufficient?
                  ├─ Yes → LLM generates
                  └─ No  → rewrite query → retry (max 2×) → LLM generates

The core difference: the LLM is the control center, not just a downstream text generator.


Demo 1: Pipeline RAG vs Agentic RAG

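Five test questions: three genuinely need the knowledge base, two don't (one general-knowledge, one arithmetic). Both snippets rely on two helpers not shown here: _ask(system_prompt, user_message), which returns the LLM's text reply, and unified_retriever, which appears to search all documents at once.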

Pipeline RAG

def pipeline_rag(question: str) -> dict:
    """Pipeline RAG: retrieve → inject → generate, always."""
    docs = unified_retriever.invoke(question)
    context = "\n".join(d.page_content for d in docs)
    answer = _ask(
        f"Answer based on the following reference material.\nReference: {context}",
        question,
    )
    return {"answer": answer, "retrieved": True, "docs": len(docs)}

Agentic RAG

def agentic_rag(question: str) -> dict:
    """Agentic RAG: decide first, then (optionally) retrieve."""
    decision = _ask(
        "Decide whether the following question requires a knowledge base lookup.\n"
        "Needs retrieval: product pricing/features, operations procedures, service policies\n"
        "Skip retrieval: general knowledge, arithmetic, standard programming syntax\n"
        "Answer only yes or no",
        f"Question: {question}",
    ).strip().lower()

    if "yes" not in decision:
        answer = _ask("You are a knowledgeable assistant. Answer directly.", question)
        return {"answer": answer, "retrieved": False, "docs": 0}
    else:
        docs = unified_retriever.invoke(question)
        context = "\n".join(d.page_content for d in docs)
        answer = _ask(f"Answer based on the following reference.\nReference: {context}", question)
        return {"answer": answer, "retrieved": True, "docs": len(docs)}
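
The "yes" not in decision check above is a substring test, so any reply that merely contains "yes" counts as a yes. A slightly stricter parse is sketched below, assuming the model's reply starts with its verdict. The parse_decision name and the retrieve-by-default policy are illustrative and not part of the original code.

```python
import re

def parse_decision(raw: str, default: bool = True) -> bool:
    """Map an LLM reply to True (retrieve) or False (skip).

    Only the first word of the reply is inspected. Anything that is not a
    clear yes/no falls back to `default`; retrieving by default is an
    assumption that a wasted lookup is cheaper than a missed one.
    """
    match = re.match(r"\W*(\w+)", raw.strip().lower())
    first_word = match.group(1) if match else ""
    if first_word in {"yes", "y"}:
        return True
    if first_word in {"no", "n"}:
        return False
    return default

print(parse_decision("Yes."))           # True
print(parse_decision("no"))             # False
print(parse_decision("I am not sure"))  # True (falls back to default)
```

Swapping this in would mean replacing the substring test with something like `if not parse_decision(decision):`.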

Measured Results

Question Type | Pipeline Retrieval | Agentic Retrieval | Question
─────────────────────────────────────────────────────────────────
Product feat. |    ✓ (3 docs)     |    ✓ (3 docs)    | WonderBot Basic plan — monthly API calls?
Ops           |    ✓ (3 docs)     |    ✓ (3 docs)    | Minimum memory to deploy WonderBot?
User service  |    ✓ (3 docs)     |    ✓ (3 docs)    | Can I get a refund after 30 days?
General know. |    ✓ (3 docs)     |    ✗ skipped     | How to average a Python list?
Arithmetic    |    ✓ (3 docs)     |    ✗ skipped     | What is 1024 divided by 32?

Pipeline RAG retrieved for all five questions, including "What is 1024 divided by 32?", where KB content adds nothing. Agentic RAG skipped retrieval for both questions that don't need the KB (the Python one and the arithmetic one).

Not every question is worth a warehouse trip.
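
Skipping retrieval is not free: agentic_rag above calls _ask twice per question (once to decide, once to answer), while pipeline_rag calls it once. A small tally for the five test questions makes the trade visible. It assumes one retrieval per unified_retriever.invoke call and one LLM call per _ask call.

```python
# Retrieval outcomes copied from the results table above:
# True means the agentic variant chose to retrieve.
agentic_retrieved = [True, True, True, False, False]
n = len(agentic_retrieved)

# pipeline_rag: one retrieval and one LLM call per question.
pipeline = {"retrievals": n, "llm_calls": n}

# agentic_rag: one decision call plus one answer call per question,
# and a retrieval only when the decision was "yes".
agentic = {"retrievals": sum(agentic_retrieved), "llm_calls": 2 * n}

print(pipeline)  # {'retrievals': 5, 'llm_calls': 5}
print(agentic)   # {'retrievals': 3, 'llm_calls': 10}
```

Two saved retrievals cost five extra decision calls here. Whether the agentic variant comes out cheaper depends on the relative cost of a lookup and a short yes/no completion, and on how much answer quality improves when irrelevant context stays out of the prompt.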


Demo 2: Multi-Knowledge-Base Routing

Real enterprise deployments typically maintain multiple knowledge bases: product docs, ops manuals, user FAQ. Different questions belong in different KBs.

Three Knowledge Bases

from langchain_core.documents import Document  # assumes LangChain's Document class

PRODUCT_DOCS = [
    Document(page_content="WonderBot Pro pricing: Basic ¥99/mo, Pro ¥299/mo, Enterprise custom."),
    Document(page_content="API limits: Basic 10K calls/mo, Pro 100K/mo; overage billed at ¥0.01/call."),
    Document(page_content="Supported LLMs: GPT-4, Claude 3, Gemini Pro, GLM-4 — switchable in the console."),
    Document(page_content="Data security: stored on China-region servers, Level-3 security certified."),
]

OPS_DOCS = [
    Document(page_content="Deployment: Docker 20+, ≥8GB RAM, ≥4 CPU cores, recommended: docker-compose."),
    Document(page_content="Troubleshooting: service down → docker ps; API timeout → check LLM connectivity."),
    Document(page_content="Backup: auto daily at 2am, 30-day retention, restore via restore.sh."),
    Document(page_content="Alerts: CPU >80% for 5min, memory >90%, API error rate >5% → WeChat Work webhook."),
]

FAQ_DOCS = [
    Document(page_content="Password reset: 'Forgot password' → enter email → check reset link → set new password."),
    Document(page_content="Refund policy: full refund within 7 days; prorated within 30 days; none after 30."),
    Document(page_content="Invoice: request in Billing Center → fill company info → e-invoice in 3-5 business days."),
    Document(page_content="API Key: create/revoke in Developer Settings; max 5 keys per account."),
]

LangGraph Routing

def route_node(state: RoutingState) -> RoutingState:
    """Step 1: LLM selects the target knowledge base"""
    decision = _ask(
        "Based on the question, pick the right knowledge base (output name only):\n"
        "product - product features, pricing, tech specs, supported models\n"
        "ops     - deployment, operations, troubleshooting, monitoring, backups\n"
        "faq     - accounts, passwords, refunds, invoices, API Keys",
        f"Question: {state['question']}",
    ).strip().lower()
    ...

The graph topology is minimal:

route → retrieve → generate

route_node's output determines which retriever retrieve_node uses.
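
The elided tail of route_node is not shown, so the following is not a reconstruction of it. It is one hypothetical way to turn a free-form LLM reply into a valid KB name and look up the matching retriever. The names normalize_route and retrievers, and the choice of faq as the fallback, are assumptions made for illustration.

```python
VALID_ROUTES = ("product", "ops", "faq")

def normalize_route(raw: str, fallback: str = "faq") -> str:
    """Reduce a free-form LLM reply to one of the known KB names.

    Returns the valid name that appears earliest in the reply, or
    `fallback` when none appears. The fallback choice is an assumption.
    """
    text = raw.strip().lower()
    hits = [(text.find(name), name) for name in VALID_ROUTES if name in text]
    return min(hits)[1] if hits else fallback

# Hypothetical retriever table: each value stands in for a real retriever.
retrievers = {
    "product": lambda q: f"[product_kb results for: {q}]",
    "ops": lambda q: f"[ops_kb results for: {q}]",
    "faq": lambda q: f"[faq_kb results for: {q}]",
}

route = normalize_route("Ops.")
print(route)                              # ops
print(retrievers[route]("API timeouts"))  # [ops_kb results for: API timeouts]
```

Keeping the route-to-retriever mapping in a dictionary means an unexpected reply can never raise a KeyError, because the normalizer only returns names that exist in the table.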

Measured Routing Accuracy

Six questions (two per knowledge base), real execution:

Expected KB    | Actual Route | Match | Question
─────────────────────────────────────────────────────────────────────
→ product      |  product     |  ✓   | Pro plan monthly price? Which LLMs are supported?
→ product      |  ops         |  ✗   | Where is data stored? What security certification?
→ ops          |  ops         |  ✓   | How do I troubleshoot API timeouts?
→ ops          |  ops         |  ✓   | What alert fires when CPU hits 80%?
→ faq          |  faq         |  ✓   | I bought 15 days ago — how much of a refund do I get?
→ faq          |  ops         |  ✗   | How do I get a VAT invoice for my company?

Routing accuracy: 4/6 = 67%
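
The accuracy figure can be recomputed from the table above. This small sketch copies the (expected, actual) pairs by hand and also lists where the errors went:

```python
# (expected, actual) pairs copied from the routing table above.
results = [
    ("product", "product"),
    ("product", "ops"),
    ("ops", "ops"),
    ("ops", "ops"),
    ("faq", "faq"),
    ("faq", "ops"),
]

correct = sum(expected == actual for expected, actual in results)
accuracy = correct / len(results)
print(f"{correct}/{len(results)} = {accuracy:.0%}")  # 4/6 = 67%

# Misrouted pairs: which expected KB lost a question, and to where.
misroutes = [(e, a) for e, a in results if e != a]
print(misroutes)  # [('product', 'ops'), ('faq', 'ops')]
```

Both errors land in ops while every ops question routes correctly, which suggests the ops description is attracting questions that belong elsewhere.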

Two misroutes worth examining:

  • "Where is data stored / security certification" → routed to ops (should be product): the LLM probably associated "data storage" with infrastructure/operations rather than product capabilities
  • "VAT invoice for my company" → routed to ops (should be faq): "company" in the question may have triggered an ops association

Six questions is a small sample, so treat 67% as a rough baseline for a single zero-shot routing prompt: useful, but not production-ready for high-stakes routing. A common improvement is to add boundary examples:

# Improvement: add boundary examples to the routing prompt
route_prompt = """
Determine the correct knowledge base:
product: product pricing / features / supported models / data security certification
ops: service deployment / troubleshooting / monitoring / backup procedures
faq: accounts / passwords / refunds / invoices / API Keys / billing

Examples:
"which models are supported" → product
"invoice" → faq          ← billing is always faq, even for companies
"data storage security"   → product  ← security certs are product features

Question: {question}
"""

Full Example Answer

Question: "How do I troubleshoot API timeouts?" → routed to ops, retrieved and generated:

Routed to: ops_kb
Answer: For API timeout troubleshooting, follow these steps:
1. Check LLM service connectivity to ensure the network connection is healthy.
2. Verify Docker container status using `docker ps` to confirm services are running.
3. If the cause is memory overflow, increase the Docker memory limit.

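The route was correct, and steps 1 and 2 come straight from the ops troubleshooting document. Step 3 (memory overflow) does not appear in the OPS_DOCS excerpt shown above, so it either comes from a document not listed here or was added by the LLM.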


Demo 3: Quality Gating + Query Rewriting Fallback

When retrieval quality is poor, blindly generating from low-quality context produces bad answers. A better approach: rewrite the query and try again.

Core Flow

retrieve → evaluate_quality
                 ├─ score ≥ 0.6 → generate
                 ├─ score < 0.6 and retries < 2 → rewrite_query → retrieve (loop back)
                 └─ score < 0.6 and retries = 2 → generate (fallback)

LangGraph Implementation

from typing import TypedDict

QUALITY_THRESHOLD = 0.6
MAX_RETRIES = 2

class QualityGateState(TypedDict):
    question:      str
    rewritten_q:   str    # current query (starts as original question)
    context:       str
    quality_score: float
    answer:        str
    attempts:      int
    path:          list

def qg_rewrite_node(state: QualityGateState) -> QualityGateState:
    """Rewrite the vague query into something more specific"""
    rewritten = _ask(
        "Rewrite the following vague question as a more specific search query, "
        "keeping the original intent but adding relevant keywords. Output only the rewritten query.",
        state["question"],
    ).strip()
    return {**state, "rewritten_q": rewritten, "attempts": state["attempts"] + 1}

def should_rewrite(state: QualityGateState) -> str:
    if state["quality_score"] >= QUALITY_THRESHOLD:
        return "generate"           # quality is sufficient
    if state["attempts"] >= MAX_RETRIES:
        return "generate"           # retry limit reached — fallback generate
    return "rewrite"                # quality too low — rewrite and retry
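
To see how the threshold and the retry cap interact without running LangGraph, the loop can be replayed in plain Python with pre-recorded scores. This is a stand-in sketch: should_rewrite here takes plain arguments instead of the state dict, and trace_path is a hypothetical helper.

```python
QUALITY_THRESHOLD = 0.6
MAX_RETRIES = 2

def should_rewrite(score: float, attempts: int) -> str:
    """Same branching as should_rewrite above, on plain arguments."""
    if score >= QUALITY_THRESHOLD:
        return "generate"
    if attempts >= MAX_RETRIES:
        return "generate"
    return "rewrite"

def trace_path(scores: list[float]) -> list[str]:
    """Replay the loop with one recorded score per retrieval.

    Expects up to MAX_RETRIES + 1 scores, one for each retrieval made.
    """
    path, attempts = [], 0
    for score in scores:
        path += ["retrieve", f"eval({score:.2f})"]
        if should_rewrite(score, attempts) == "generate":
            break
        path.append("rewrite")
        attempts += 1
    path.append("generate")
    return path

print(" → ".join(trace_path([0.50, 0.00, 0.00])))
# retrieve → eval(0.50) → rewrite → retrieve → eval(0.00) → rewrite → retrieve → eval(0.00) → generate
```

With MAX_RETRIES = 2 the graph performs at most three retrievals and two rewrites before it falls back to generation.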

Measured Results

Three extremely vague questions:

Original Question       | Retries | Final Quality | Execution Path
─────────────────────────────────────────────────────────────────────
"how much does it cost" |    2    |     0.00      | retrieve → eval(0.50) → rewrite → retrieve → eval(0.00) → rewrite → retrieve → eval(0.00) → generate
"something went wrong"  |    2    |     0.50      | retrieve → eval(0.50) → rewrite → retrieve → eval(0.50) → rewrite → retrieve → eval(0.50) → generate
"money stuff"           |    2    |     0.50      | retrieve → eval(0.50) → rewrite → retrieve → eval(0.50) → rewrite → retrieve → eval(0.50) → generate

Detailed trace for "how much does it cost":

Original query:  "how much does it cost"
  ↓ retrieve  → pulled backup / deployment / refund docs (unrelated)
  ↓ evaluate  → quality score 0.50 (LLM sees slight relevance)
  ↓ rewrite   → "product price range query" (too generic, lost context)
  ↓ retrieve  → quality drops further
  ↓ evaluate  → quality score 0.00
  ↓ rewrite   → "product price range query" (no improvement)
  ↓ retrieve  → third attempt
  ↓ evaluate  → quality score 0.00
  ↓ generate  → fallback answer (retry limit reached)

Final answer: "Based on the provided reference material, pricing information is not
               included. If you need pricing details, please contact the service
               provider or visit their official website."

This result teaches an important lesson: query rewriting can't fix a fundamentally underspecified question. "How much does it cost" never names a product, so the rewrite "product price range query" had nothing specific to add, and its more generic wording made retrieval worse, not better.

The deeper fix is to add a clarification step when quality remains persistently low:

# Better approach: ask the user to clarify instead of generating from bad context
# (this check goes inside should_rewrite, before the retry-limit fallback)
if state["attempts"] >= MAX_RETRIES and state["quality_score"] < 0.3:
    return "clarify"   # new node: ask "Which product/service are you asking about?"

This is the real challenge in Agentic RAG — low retrieval quality isn't always a retrieval strategy problem. Sometimes the question itself is missing information.
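
A self-contained sketch of the router with the clarify branch folded in, checked against the three final scores from the results table. The next_step name is hypothetical, and the 0.3 cutoff is the one from the fragment above.

```python
QUALITY_THRESHOLD = 0.6
MAX_RETRIES = 2
CLARIFY_BELOW = 0.3  # cutoff taken from the fragment above

def next_step(quality_score: float, attempts: int) -> str:
    """Decide the next node: generate, rewrite, or clarify."""
    if quality_score >= QUALITY_THRESHOLD:
        return "generate"
    if attempts < MAX_RETRIES:
        return "rewrite"
    if quality_score < CLARIFY_BELOW:
        return "clarify"   # retries exhausted and context is near-useless
    return "generate"      # retries exhausted, context is mediocre

# Final quality scores copied from the measured results table.
final_scores = {
    "how much does it cost": 0.00,
    "something went wrong": 0.50,
    "money stuff": 0.50,
}
for question, score in final_scores.items():
    print(f"{question!r}: {next_step(score, attempts=2)}")
# 'how much does it cost': clarify
# 'something went wrong': generate
# 'money stuff': generate
```

With a 0.3 cutoff only the first question escalates to the user. The other two end at 0.50 and still fall back to generation, so catching them would need a higher cutoff, for example one equal to QUALITY_THRESHOLD.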


Agentic RAG Design Checklist

Key decision points when building an Agentic RAG system:

Retrieval Decision Layer

  • [ ] Define which question types need retrieval (domain-specific vs. general knowledge)
  • [ ] Include specific boundary examples in the decision prompt to reduce ambiguity
  • [ ] Define explicit skip_retrieval categories: pure math, coding syntax, general facts

Knowledge Base Routing Layer

  • [ ] Write clear descriptions for each KB (type + typical questions + boundary cases)
  • [ ] If routing accuracy < 80%, add few-shot examples or use a dedicated classifier
  • [ ] Support cross-KB retrieval when questions span multiple domains

Quality Gating Layer

  • [ ] Set a sensible threshold (0.6 is a reasonable starting point)
  • [ ] Cap max retries (2 is usually enough — diminishing returns after that)
  • [ ] Log every rewrite and quality score to drive future improvements
  • [ ] When quality stays persistently low, escalate to user clarification rather than hallucinating

Production Concerns

  • [ ] Add domain-specific examples to routing prompts for problematic edge cases
  • [ ] Consider hybrid retrieval (vector + BM25) to improve baseline quality
  • [ ] Track which questions skip retrieval and which trigger rewrites — use the data to iterate
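
For the hybrid retrieval item, one widely used way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The article does not say which fusion method it has in mind, so this is an illustrative choice. The document ids and both rankings below are made up, and k = 60 is the constant commonly used with RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids.

    score(d) = sum over lists of 1 / (k + rank of d in that list),
    with ranks starting at 1. Ties are broken alphabetically by id.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["pricing", "api_limits", "refund"]  # hypothetical vector ranking
bm25_hits = ["refund", "pricing", "invoice"]       # hypothetical BM25 ranking
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['pricing', 'refund', 'api_limits', 'invoice']
```

Documents that both retrievers rank highly rise to the top, and RRF needs only ranks, so the two retrievers' raw scores never have to be put on a common scale.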

Summary

Five key conclusions:

  1. Pipeline RAG's problem isn't retrieval itself but the lack of judgment: retrieving for every question wastes resources and can inject irrelevant content that confuses the LLM
  2. Agentic RAG's essence is LLM-as-scheduler: retrieval, routing, and evaluation are all LLM decisions, not hardwired pipeline steps
  3. Multi-KB routing accuracy with a plain prompt is limited: the zero-shot routing prompt scored 67% (4/6) here. Production needs few-shot examples or a dedicated classifier
  4. Quality gating + query rewriting is not a silver bullet: extremely vague questions may produce worse rewrites, and the real fix is asking the user
  5. LangGraph makes Agentic RAG easy to extend: with the route → retrieve → generate topology, adding a new KB means registering one more retriever and updating the routing prompt, with no structural changes

Next up: Context Engineering — token budget management, dynamic context assembly, and how to make every token count in a 128K context window.

