Nishant Prakash

I Was Done Getting Answers — So I Built RAG That Asks Questions Too

“What if a RAG system could not only fetch information but also reason about it, critique itself, and write a report, all autonomously?”
That question sent me down a rabbit hole that ended with Data-Inspector, a proof-of-concept Agentic RAG pipeline built with Ollama, LangChain, Tavily, and Streamlit.

RAG was not enough

The Spark — From RAG to Agentic RAG

Traditional RAG systems are brilliant at retrieval and response, but not at reasoning or reflection.

They typically:

  1. Retrieve documents relevant to a query.
  2. Feed them to a large language model (LLM).
  3. Generate an answer that sounds confident, even when it’s wrong.

The model reads, but it doesn’t think.

So I began wondering: what if we could assign roles inside the RAG flow?
One agent fetches data, another summarizes, another synthesizes, another critiques, like a research team working in harmony.

That’s how Data-Inspector was born: a system that doesn’t just “search and answer” but “reads, reasons, and reviews.”


What Exactly Is Agentic RAG?

Before diving into code, let’s unpack what this buzzword really means.

Agentic RAG (Retrieval-Augmented Generation) is an evolution of the classic RAG pipeline.
While traditional RAG enhances an LLM with external knowledge, Agentic RAG gives that process a mind of its own.

From Static Pipelines to Autonomous Reasoners

In standard RAG, you have a single, linear pipeline:

Retrieve → Generate

It’s powerful but static: there’s no reflection, no iteration, and no specialization.
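To see how little machinery that involves, here’s a minimal sketch of the static chain, assuming a pre-built LangChain vector store db and a local model llm (both names are illustrative):

docs = db.similarity_search(query, k=4)  # Retrieve the top matching chunks
context = "\n\n".join(d.page_content for d in docs)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {query}")  # Generate once, done

One pass, one answer; nothing ever checks whether that answer holds up.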

Agentic RAG transforms this static chain into a network of intelligent roles, each responsible for one cognitive task:

Retrieve → Understand → Synthesize → Critique → Generate → (Loop back if needed)

Every role acts as an agent, capable of reasoning over its inputs, producing structured outputs, and handing them off to the next stage.
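In code, those hand-offs can be plain function calls. Here’s a hedged sketch of the loop; search_urls, synthesize, and review mirror the agent methods shown later, while fetch, summarize, and write_report are illustrative names:

def run_pipeline(query):
    sources = retriever.search_urls(query)                # Retrieve
    summaries = [summarizer.summarize(chunk)              # Understand
                 for src in sources
                 for chunk in prepare_chunks(fetch(src))]
    synthesis = synthesizer.synthesize(query, summaries)  # Synthesize
    review = critic.review(query, synthesis, summaries)   # Critique
    return write_report(query, synthesis, review)         # Generate (loop back if the critic objects)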


The Key Principles Behind Agentic RAG

  • Role-based Autonomy. Each agent (retriever, summarizer, critic, etc.) has a clearly defined job and communicates via structured data (JSON, Markdown); see the schema sketch after this list.

This modularity allows independent improvement of each skill, like retraining just your summarizer agent for better factual grounding.

  • Reflection Loops. Agentic systems don’t stop at the first output. They evaluate and refine.

This is what turns a “talkative assistant” into a “thoughtful collaborator.”

  • Dynamic Knowledge Access. Instead of relying only on a static vector database, agentic systems can trigger live searches, query APIs, or even plan multi-step reasoning chains.

  • Transparency & Explainability. Each stage produces interpretable intermediate artifacts (summaries, reviews, critique logs), making the system auditable and debuggable.
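To make the “structured data” contract concrete, here’s a hedged sketch of a hand-off schema. The fields mirror the summarizer output shown later; the typed-dict form is my assumption, not the project’s actual code:

from typing import TypedDict

class ChunkSummary(TypedDict):
    title: str
    key_points: list[str]   # the bullets downstream agents consume
    methods: str
    evidence: list[str]
    limitations: list[str]  # caveats the critic can seize on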


Common Architectures of Agentic RAG

| Architecture Type | Description | Example Use |
| --- | --- | --- |
| Planner–Executor Loop | A planning agent decomposes a task; executors handle retrieval and summarization. | Workflow orchestration in research assistants. |
| Critic–Refiner Loop | The system critiques its own output and regenerates it. | Self-RAG, Self-Refine, Reflexion. |
| Multi-Agent Collaboration | Multiple specialized agents work in a pipeline, passing structured outputs downstream. | Data-Inspector 😛 |

The approach I took, multi-agent collaboration, felt the most natural.
Each Python class became a self-contained professional: retriever, summarizer, synthesizer, and critic, all orchestrated by a pipeline.


Architecture Overview — A RAG System with Personality

Data-Inspector/
├── agents/
│   ├── retriever.py       # Retrieval
│   ├── summarizer.py      # Summarization
│   ├── synthesizer.py     # Knowledge fusion
│   └── critic.py          # Review / Reflection
├── rag/
│   ├── chunker.py         # Document processing
│   └── vectorstore.py     # Vector memory (optional)
├── pipeline.py            # Agentic orchestration
└── ui_streamlit.py        # Interactive interface

Each component acts like a neuron in a cognitive system, independent yet collaborative.


Retrieval — Learning to Find Relevant Knowledge

The Retriever is powered by the Tavily API. It’s the system’s scout, locating relevant information for the query.

from tavily import TavilyClient

class WebRetriever:
    def __init__(self, api_key, max_sources=5):  # init assumed; the post shows only search_urls
        self.client = TavilyClient(api_key=api_key)
        self.max_sources = max_sources

    def search_urls(self, query):
        res = self.client.search(query=query, max_results=self.max_sources)
        return res.get("results", [])[: self.max_sources]

Unlike traditional RAG’s static embeddings, this retrieves live knowledge, keeping the system temporally aware and factually updated.
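Hypothetical usage, assuming the key lives in the conventional TAVILY_API_KEY environment variable:

import os

retriever = WebRetriever(api_key=os.environ["TAVILY_API_KEY"], max_sources=5)
results = retriever.search_urls("agentic RAG vs traditional RAG")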


Chunking — Learning to Read Like a Human

HTML pages are noisy. The chunker.py module cleans and splits them into coherent text segments.

def prepare_chunks(raw_html):
    cleaned = clean_text(raw_html)  # strip markup and boilerplate (chunker.py helper)
    chunks = chunk_text(cleaned)    # split into overlapping segments (chunker.py helper)
    return chunks

Breaking long text into overlapping chunks lets the summarizer think locally while preserving context globally, just like a human scanning through paragraphs.
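The post doesn’t show chunker.py’s internals, but an overlapping splitter can be this simple (the sizes are illustrative assumptions):

def chunk_text(text, chunk_size=1500, overlap=200):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighboring chunks share context
    return chunks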


Summarization — Turning Reading Into Understanding

Each chunk passes through the SummarizerAgent, guided by a structured system prompt.

SYSTEM_SUMMARIZER = """
You are a precise technical summarizer...
Return JSON with: key_points[], methods, evidence[], limitations[]
"""

Sample output:

{
  "title": "RAG vs Fine-Tuning",
  "key_points": ["RAG adapts faster", "Fine-tuning offers deeper control"],
  "limitations": ["Depends on retrieval quality"]
}

All agents speak in JSON, a shared language that prevents context drift and ensures machine-readable collaboration.
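The agent class itself isn’t shown in the post; here’s a hedged sketch assuming a local model served through the langchain-ollama wrapper, with the model name and prompt framing as assumptions:

import json
from langchain_ollama import OllamaLLM

class SummarizerAgent:
    def __init__(self, model="llama3"):
        self.llm = OllamaLLM(model=model)  # local Ollama model via LangChain

    def summarize(self, chunk):
        prompt = f"System:{SYSTEM_SUMMARIZER}\nUser: Summarize:\n{chunk}"
        out = self.llm.invoke(prompt)  # raw completion string
        return json.loads(out)         # enforce the JSON contract (a robust version would catch parse errors)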


Synthesis — Connecting the Dots

The SynthesisAgent merges multiple summaries into a unified comparative analysis.

def synthesize(self, query, summaries):
    # One bullet per source keeps the fused prompt compact
    bulletized = "\n".join(
        f"- {s['title']}: {', '.join(s['key_points'][:5])}" for s in summaries
    )
    prompt = f"System:{self.system}\nUser: Query: {query}\n{bulletized}"
    return self.llm.invoke(prompt)

Here, the model evolves from “reader” to “analyst,” forming relationships between insights and organizing them logically.


Critique — Giving the System a Conscience

The CriticAgent inspects the synthesized narrative and calls out weak logic or missing perspectives.

def review(self, query, synthesis, summaries):
    prompt = f"System:{self.system}\nUser: Query: {query}\nSYNTHESIS:\n{synthesis}"
    out = self.llm.invoke(prompt)
    return json.loads(out)  # the critic answers in strict JSON (see the example below)

Output example:

{
  "missing_perspectives": ["Data bias"],
  "weak_arguments": ["Unsupported claims about fine-tuning benefits"],
  "overall_risk": "medium"
}

This reflective loop transforms a basic RAG pipeline into a self-aware reasoning system.
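The exact retry wiring in pipeline.py isn’t shown above; here’s a hedged sketch of how the critique could feed back into synthesis (the threshold and prompt splice are assumptions):

review = critic.review(query, synthesis, summaries)
if review.get("overall_risk") == "high":
    # Re-synthesize with the critic's objections appended to the query
    issues = "; ".join(review.get("weak_arguments", []) +
                       review.get("missing_perspectives", []))
    synthesis = synthesizer.synthesize(f"{query}\nAddress these issues: {issues}", summaries)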


Report Generation — From Thought to Thesis

Finally, all insights are compiled into a Markdown report via pipeline.py.

report_prompt = f"""
System:{SYSTEM_REPORT}

User: Query: {query}
SYNTHESIS: {synthesis}
CRITIC REVIEW: {review}
Write final report in Markdown.
"""
report_md = self.report_llm.invoke(report_prompt)

The result reads like an academic mini-paper:

  • Executive summary
  • Comparative analysis
  • Decision framework
  • Risks and gaps
  • References

The system doesn’t just compute, it articulates.
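ui_streamlit.py isn’t reproduced in the post either; a minimal front end for this pipeline could look like the following hedged sketch (run_pipeline as the entry point is an assumption):

import streamlit as st

st.title("Data-Inspector")
query = st.text_input("What should the agents investigate?")
if st.button("Run analysis") and query:
    with st.spinner("Agents at work..."):
        report_md = run_pipeline(query)  # hypothetical entry point in pipeline.py
    st.markdown(report_md)  # render the final Markdown report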


Why Agentic RAG Outperforms Traditional RAG

| Feature | Traditional RAG | Agentic RAG (Data-Inspector) |
| --- | --- | --- |
| Architecture | Single linear chain | Multi-agent collaboration |
| Learning Behavior | Retrieval + generation only | Retrieval + reasoning + reflection |
| Error Handling | None — one-shot generation | Built-in self-critique loop |
| Explainability | Opaque output | Transparent intermediate JSONs |
| Adaptability | Static embeddings | Dynamic web retrieval + modular agents |
| Output Depth | Fluent but shallow | Analytical, reference-backed synthesis |

Agentic RAG = Traditional RAG + Cognition.
It elevates retrieval-augmented generation into reason-augmented generation.


Lessons Learned

  • Prompts are contracts. Each agent must have a clear, bounded responsibility; otherwise, outputs collapse into noise.
  • Autonomy is discipline disguised as freedom. Structured interaction enables creativity without chaos.
  • Critique breeds truth. The CriticAgent was the breakthrough: the moment the system began questioning itself, quality skyrocketed.

Looking Ahead

Agentic RAG hints at a future where models won’t just generate answers but will collaborate intelligently.
When Data-Inspector finished its first report, it didn’t feel like I’d run code; it felt like I’d led a discussion with a team of invisible colleagues.


Explore the Project

GitHub: Data-Inspector — Agentic RAG Demo

Run it locally:

pip install -r requirements.txt
streamlit run app/ui_streamlit.py

Final Reflection

What began as a question, “Can RAG think critically?”, evolved into an experiment in digital reasoning.
And maybe that’s the trajectory AI will take next:

from systems that answer questions to systems that question their own answers.
