Nishant Prakash

I Was Done Getting Answers — So I Built RAG That Asks Questions Too

“What if a RAG system could not only fetch information but also reason about it, critique itself, and write a report, all autonomously?”
That question sent me down a rabbit hole that ended with Data-Inspector, a proof-of-concept Agentic RAG pipeline built with Ollama, LangChain, Tavily, and Streamlit.

RAG was not enough

The Spark — From RAG to Agentic RAG

Traditional RAG systems are brilliant at retrieval and response, but not at reasoning or reflection.

They typically:

  1. Retrieve documents relevant to a query.
  2. Feed them to a large language model (LLM).
  3. Generate an answer that sounds confident, even when it’s wrong.

The model reads, but it doesn’t think.

So I began wondering: what if we could assign roles inside the RAG flow?
One agent fetches data, another summarizes, another synthesizes, another critiques, like a research team working in harmony.

That’s how Data-Inspector was born: a system that doesn’t just “search and answer” but “reads, reasons, and reviews.”


What Exactly Is Agentic RAG?

Before diving into code, let’s unpack what this buzzword really means.

Agentic RAG (Retrieval-Augmented Generation) is an evolution of the classic RAG pipeline.
While traditional RAG enhances an LLM with external knowledge, Agentic RAG gives that process a mind of its own.

From Static Pipelines to Autonomous Reasoners

In standard RAG, you have a single, linear pipeline:

Retrieve → Generate

It’s powerful but static: there’s no reflection, no iteration, and no specialization.
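To see how little machinery that involves, here’s a minimal sketch of the static chain, assuming a pre-built LangChain vector store db and a local model llm (both names are illustrative):

docs = db.similarity_search(query, k=4)  # Retrieve the top matching chunks
context = "\n\n".join(d.page_content for d in docs)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {query}")  # Generate once, done

One pass, one answer; nothing ever checks whether that answer holds up.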

Agentic RAG transforms this static chain into a network of intelligent roles, each responsible for one cognitive task:

Retrieve → Understand → Synthesize → Critique → Generate → (Loop back if needed)

Every role acts as an agent, capable of reasoning over its inputs, producing structured outputs, and handing them off to the next stage.
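In code, those hand-offs can be plain function calls. Here’s a hedged sketch of the loop; search_urls, synthesize, and review mirror the agent methods shown later, while fetch, summarize, and write_report are illustrative names:

def run_pipeline(query):
    sources = retriever.search_urls(query)                # Retrieve
    summaries = [summarizer.summarize(chunk)              # Understand
                 for src in sources
                 for chunk in prepare_chunks(fetch(src))]
    synthesis = synthesizer.synthesize(query, summaries)  # Synthesize
    review = critic.review(query, synthesis, summaries)   # Critique
    return write_report(query, synthesis, review)         # Generate (loop back if the critic objects)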


The Key Principles Behind Agentic RAG

  • Role-based Autonomy. Each agent (retriever, summarizer, critic, etc.) has a clearly defined job and communicates via structured data (JSON, Markdown); see the schema sketch after this list.

This modularity allows independent improvement of each skill, like retraining just your summarizer agent for better factual grounding.

  • Reflection Loops. Agentic systems don’t stop at the first output. They evaluate and refine.

This is what turns a “talkative assistant” into a “thoughtful collaborator.”

  • Dynamic Knowledge Access. Instead of relying only on a static vector database, agentic systems can trigger live searches, query APIs, or even plan multi-step reasoning chains.

  • Transparency & Explainability. Each stage produces interpretable intermediate artifacts (summaries, reviews, critique logs), making the system auditable and debuggable.
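To make the “structured data” contract concrete, here’s a hedged sketch of a hand-off schema. The fields mirror the summarizer output shown later; the typed-dict form is my assumption, not the project’s actual code:

from typing import TypedDict

class ChunkSummary(TypedDict):
    title: str
    key_points: list[str]   # the bullets downstream agents consume
    methods: str
    evidence: list[str]
    limitations: list[str]  # caveats the critic can seize on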


Common Architectures of Agentic RAG

| Architecture Type | Description | Example Use |
| --- | --- | --- |
| Planner–Executor Loop | A planning agent decomposes a task; executors handle retrieval and summarization. | Workflow orchestration in research assistants. |
| Critic–Refiner Loop | The system critiques its own output and regenerates it. | Self-RAG, Self-Refine, Reflexion. |
| Multi-Agent Collaboration | Multiple specialized agents work in a pipeline, passing structured outputs downstream. | Data-Inspector 😛 |

The approach I took, multi-agent collaboration, felt the most natural.
Each Python class became a self-contained professional: retriever, summarizer, synthesizer, and critic, all orchestrated by a pipeline.


Architecture Overview — A RAG System with Personality

Data-Inspector/
├── agents/
│   ├── retriever.py       # Retrieval
│   ├── summarizer.py      # Summarization
│   ├── synthesizer.py     # Knowledge fusion
│   └── critic.py          # Review / Reflection
├── rag/
│   ├── chunker.py         # Document processing
│   └── vectorstore.py     # Vector memory (optional)
├── pipeline.py            # Agentic orchestration
└── ui_streamlit.py        # Interactive interface

Each component acts like a neuron in a cognitive system, independent yet collaborative.


Retrieval — Learning to Find Relevant Knowledge

The Retriever is powered by the Tavily API. It’s the system’s scout, locating relevant information for the query.

from tavily import TavilyClient

class WebRetriever:
    def __init__(self, api_key, max_sources=5):  # init assumed; the post shows only search_urls
        self.client = TavilyClient(api_key=api_key)
        self.max_sources = max_sources

    def search_urls(self, query):
        res = self.client.search(query=query, max_results=self.max_sources)
        return res.get("results", [])[: self.max_sources]

Unlike traditional RAG’s static embeddings, this retrieves live knowledge, keeping the system temporally aware and factually updated.
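Hypothetical usage, assuming the key lives in the conventional TAVILY_API_KEY environment variable:

import os

retriever = WebRetriever(api_key=os.environ["TAVILY_API_KEY"], max_sources=5)
results = retriever.search_urls("agentic RAG vs traditional RAG")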


Chunking — Learning to Read Like a Human

HTML pages are noisy. The chunker.py module cleans and splits them into coherent text segments.

def prepare_chunks(raw_html):
    cleaned = clean_text(raw_html)  # strip markup and boilerplate (chunker.py helper)
    chunks = chunk_text(cleaned)    # split into overlapping segments (chunker.py helper)
    return chunks

Breaking long text into overlapping chunks lets the summarizer think locally while preserving context globally, just like a human scanning through paragraphs.
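The post doesn’t show chunker.py’s internals, but an overlapping splitter can be this simple (the sizes are illustrative assumptions):

def chunk_text(text, chunk_size=1500, overlap=200):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighboring chunks share context
    return chunks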


Summarization — Turning Reading Into Understanding

Each chunk passes through the SummarizerAgent, guided by a structured system prompt.

SYSTEM_SUMMARIZER = """
You are a precise technical summarizer...
Return JSON with: key_points[], methods, evidence[], limitations[]
"""

Sample output:

{
  "title": "RAG vs Fine-Tuning",
  "key_points": ["RAG adapts faster", "Fine-tuning offers deeper control"],
  "limitations": ["Depends on retrieval quality"]
}

All agents speak in JSON, a shared language that prevents context drift and ensures machine-readable collaboration.
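The agent class itself isn’t shown in the post; here’s a hedged sketch assuming a local model served through the langchain-ollama wrapper, with the model name and prompt framing as assumptions:

import json
from langchain_ollama import OllamaLLM

class SummarizerAgent:
    def __init__(self, model="llama3"):
        self.llm = OllamaLLM(model=model)  # local Ollama model via LangChain

    def summarize(self, chunk):
        prompt = f"System:{SYSTEM_SUMMARIZER}\nUser: Summarize:\n{chunk}"
        out = self.llm.invoke(prompt)  # raw completion string
        return json.loads(out)         # enforce the JSON contract (a robust version would catch parse errors)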


Synthesis — Connecting the Dots

The SynthesisAgent merges multiple summaries into a unified comparative analysis.

def synthesize(self, query, summaries):
    # One bullet per source keeps the fused prompt compact
    bulletized = "\n".join(
        f"- {s['title']}: {', '.join(s['key_points'][:5])}" for s in summaries
    )
    prompt = f"System:{self.system}\nUser: Query: {query}\n{bulletized}"
    return self.llm.invoke(prompt)

Here, the model evolves from “reader” to “analyst,” forming relationships between insights and organizing them logically.


Critique — Giving the System a Conscience

The CriticAgent inspects the synthesized narrative and calls out weak logic or missing perspectives.

def review(self, query, synthesis, summaries):
    prompt = f"System:{self.system}\nUser: Query: {query}\nSYNTHESIS:\n{synthesis}"
    out = self.llm.invoke(prompt)
    return json.loads(out)  # the critic answers in strict JSON (see the example below)

Output example:

{
  "missing_perspectives": ["Data bias"],
  "weak_arguments": ["Unsupported claims about fine-tuning benefits"],
  "overall_risk": "medium"
}

This reflective loop transforms a basic RAG pipeline into a self-aware reasoning system.
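The exact retry wiring in pipeline.py isn’t shown above; here’s a hedged sketch of how the critique could feed back into synthesis (the threshold and prompt splice are assumptions):

review = critic.review(query, synthesis, summaries)
if review.get("overall_risk") == "high":
    # Re-synthesize with the critic's objections appended to the query
    issues = "; ".join(review.get("weak_arguments", []) +
                       review.get("missing_perspectives", []))
    synthesis = synthesizer.synthesize(f"{query}\nAddress these issues: {issues}", summaries)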


Report Generation — From Thought to Thesis

Finally, all insights are compiled into a Markdown report via pipeline.py.

report_prompt = f"""
System:{SYSTEM_REPORT}

User: Query: {query}
SYNTHESIS: {synthesis}
CRITIC REVIEW: {review}
Write final report in Markdown.
"""
report_md = self.report_llm.invoke(report_prompt)

The result reads like an academic mini-paper:

  • Executive summary
  • Comparative analysis
  • Decision framework
  • Risks and gaps
  • References

The system doesn’t just compute, it articulates.
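ui_streamlit.py isn’t reproduced in the post either; a minimal front end for this pipeline could look like the following hedged sketch (run_pipeline as the entry point is an assumption):

import streamlit as st

st.title("Data-Inspector")
query = st.text_input("What should the agents investigate?")
if st.button("Run analysis") and query:
    with st.spinner("Agents at work..."):
        report_md = run_pipeline(query)  # hypothetical entry point in pipeline.py
    st.markdown(report_md)  # render the final Markdown report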


Why Agentic RAG Outperforms Traditional RAG

| Feature | Traditional RAG | Agentic RAG (Data-Inspector) |
| --- | --- | --- |
| Architecture | Single linear chain | Multi-agent collaboration |
| Learning Behavior | Retrieval + generation only | Retrieval + reasoning + reflection |
| Error Handling | None — one-shot generation | Built-in self-critique loop |
| Explainability | Opaque output | Transparent intermediate JSONs |
| Adaptability | Static embeddings | Dynamic web retrieval + modular agents |
| Output Depth | Fluent but shallow | Analytical, reference-backed synthesis |

Agentic RAG = Traditional RAG + Cognition.
It elevates retrieval-augmented generation into reason-augmented generation.


Lessons Learned

  • Prompts are contracts. Each agent must have a clear, bounded responsibility; otherwise, outputs collapse into noise.
  • Autonomy is discipline disguised as freedom. Structured interaction enables creativity without chaos.
  • Critique breeds truth. The CriticAgent was the breakthrough: the moment the system began questioning itself, quality skyrocketed.

Looking Ahead

Agentic RAG hints at a future where models won’t just generate answers but will collaborate intelligently.
When Data-Inspector finished its first report, it didn’t feel like I’d run code; it felt like I’d led a discussion with a team of invisible colleagues.


Explore the Project

GitHub: Data-Inspector — Agentic RAG Demo

Run it locally:

pip install -r requirements.txt
streamlit run app/ui_streamlit.py

Final Reflection

What began as a question, “Can RAG think critically?”, evolved into an experiment in digital reasoning.
And maybe that’s the trajectory AI will take next:

from systems that answer questions to systems that question their own answers.
