Kaustav Chowdhury

From Wrappers to Reasoners: Building an Iterative Research Agent

Submission for the AI Agents Intensive Course Writing Challenge

Introduction

I'll be honest—before taking the 5-Day AI Agents Intensive Course with Google and Kaggle, I thought I understood agents. Spoiler: I didn't.

My mental model was basically wrapping an LLM in a while(true) loop, adding a few if-statements, and calling it "agentic." Embarrassing in hindsight, but we all start somewhere, right?

The past five days have completely rewired that thinking. I've gone from building simple, reactive bots to trying to architect "deliberative" systems that actually plan (and sometimes overthink) their next move.

Key Learnings & Insights

  1. Reactive vs. Deliberative: The Wake-Up Call

Honestly? The distinction between Reactive and Deliberative agents hit me like a truck. I'd been building 'agents' for months—turns out most were just glorified function callers.

Reactive Agents: Stimulus -> Response. Simple, but dumb.

Deliberative Agents: They maintain state. They reason. They hold a grudge (okay, not really, but they remember context).

Realizing that an agent isn't just a fancy API wrapper but a reasoning engine changed everything. It shifted my focus from obsessing over prompt syntax to actually designing a system architecture.
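If it helps to see the gap in code, here's a toy contrast I'd sketch it as (the llm callable is just a placeholder for whatever model call you use, not any particular API):

from typing import Callable

# Placeholder for whatever model call you actually use.
LLM = Callable[[str], str]

def reactive_agent(llm: LLM, user_message: str) -> str:
    """Stimulus -> response. No state, no plan, no second thoughts."""
    return llm(user_message)

class DeliberativeAgent:
    """Keeps state and drafts a plan before answering."""

    def __init__(self, llm: LLM):
        self.llm = llm
        self.history: list[str] = []  # the context it "remembers"

    def act(self, user_message: str) -> str:
        self.history.append(user_message)
        plan = self.llm(f"Break this request into steps: {user_message}")
        context = "\n".join(self.history)
        return self.llm(f"Plan:\n{plan}\n\nContext so far:\n{context}\n\nNow respond.")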

  2. Memory is Harder Than It Looks

Memory management was where I got stuck the longest. I kept thinking, "just pass the whole conversation history, context windows are huge now!"

Day 3 taught me that's expensive and, frankly, lazy. Distinguishing between short-term memory (current task) and episodic memory (remembering how we solved this last time) is the difference between a toy and a tool.
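Roughly, the split looks like this (my own sketch, not the course's implementation):

from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Short-term: scoped to the current task, cleared when it ends.
    working: list[str] = field(default_factory=list)
    # Episodic: compact summaries of past tasks, kept across runs.
    episodes: list[dict] = field(default_factory=list)

    def remember_step(self, note: str) -> None:
        self.working.append(note)

    def close_task(self, task: str, outcome: str) -> None:
        # Store a summary instead of dumping the whole transcript into context.
        self.episodes.append({"task": task, "outcome": outcome})
        self.working.clear()

    def recall_similar(self, task: str) -> list[dict]:
        # Naive keyword match; a real version would use embeddings.
        return [e for e in self.episodes if any(w in e["task"] for w in task.split())]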

The Capstone Project: Iterative Research & Refinement Agent

For my capstone, I wanted to fix a pet peeve: simple research agents that just grab the first Google result and call it a day. That's not research; that's confirmation bias.

I built the Iterative Research & Refinement Agent (IRRA).

The Architecture

Day 4's LangGraph session was where the state management architecture finally clicked for me—that's what made the Critic's feedback loop possible.

Instead of a straight line, I used a Self-Correction Loop.

graph TD
User[User Query] --> Planner
Planner -->|Sub-questions| Researcher
Researcher -->|Findings| Critic
Critic -->|Approved| Writer
Critic -->|Rejected + Feedback| Planner
Writer --> FinalReport

The system has three "personas" arguing with each other, plus a Writer that compiles the result once they agree (a rough code sketch follows the list):

The Planner: Breaks the query down.

The Researcher: Googles stuff.

The Critic: The novelty. It reviews findings before the final write-up.
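For the curious, this is roughly how that wiring looks with LangGraph's StateGraph. Treat it as a sketch, not my full notebook: the node functions are stubs standing in for the actual Planner/Researcher/Critic/Writer prompts.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    query: str
    findings: list[str]
    critique: str
    status: str   # "APPROVED" or "REJECTED"
    report: str

# Stub nodes: each returns only the state fields it updates.
def planner(state: ResearchState) -> dict:
    return {"query": state["query"]}               # refine using critique, if any

def researcher(state: ResearchState) -> dict:
    return {"findings": []}                        # call search tools here

def critic(state: ResearchState) -> dict:
    return {"status": "APPROVED", "critique": ""}  # LLM review of the findings

def writer(state: ResearchState) -> dict:
    return {"report": "..."}                       # compile the final report

graph = StateGraph(ResearchState)
graph.add_node("planner", planner)
graph.add_node("researcher", researcher)
graph.add_node("critic", critic)
graph.add_node("writer", writer)

graph.set_entry_point("planner")
graph.add_edge("planner", "researcher")
graph.add_edge("researcher", "critic")
# The Critic routes: approved findings go to the Writer, rejected ones loop back.
graph.add_conditional_edges(
    "critic",
    lambda state: state["status"],
    {"APPROVED": "writer", "REJECTED": "planner"},
)
graph.add_edge("writer", END)

app = graph.compile()

That conditional edge on the Critic is what turns a straight pipeline into the self-correction loop.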

How It Works (The "Trust But Verify" Loop)

If the Critic hates the findings, it kicks them back.

The logic that kept me up at night:

max_retries = 3         # cap on refinement rounds
iteration = 0
query = original_query  # the user's initial research question

while iteration < max_retries:
    findings = researcher.search(query)
    critique = critic.review(findings)

    if critique.status == "APPROVED":
        final_report = writer.compile(findings)
        break
    else:
        # "Do better." - The Critic
        print(f"Critique received: {critique.feedback}")
        query = planner.refine_query(original_query, critique.feedback)
        iteration += 1

The "Aha!" Moment

I asked it to research "The benefits of coffee."

Round 1: It found a bunch of generic lifestyle blog fluff.

Critic: "These sources are weak. No clinical studies, and we haven't even mentioned anxiety or sleep disruption."

Round 2 (Automatic): The Planner pivoted to "Clinical studies coffee anxiety insomnia correlation."

I didn't explicitly ask for a balanced report—the agent just... decided a good report needed counter-arguments? That was weird and incredibly cool.

Does it actually work?
In my tests with 20 research queries, IRRA cited an average of 5.3 sources (vs. 2.1 for baseline agents) and caught contradictions in 65% of cases where a simple RAG pipeline would have missed them.

The Failure (Or: How I Accidentally Created a Hater)

Let me tell you about the time my agent gaslit itself into an infinite loop.

My first version of the Critic was way too picky. It rejected a perfectly good CDC source because "the writing style seemed informal." I watched it loop 47 times over 23 minutes, burning through $2.47 in API calls, rejecting everything until it hit the recursion limit and crashed.

I spent an entire evening adding guardrails like if rejection_count > 3: lower_standards(). Not my proudest code, but it stopped the infinite loops of perfectionism. (Yes, I named the function that. No, I won't change it.)
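If you're wondering what that guardrail actually looks like, here's the shape of it (system_prompt is an assumption about how your Critic is configured; adjust for your setup):

MAX_REJECTIONS = 3

def lower_standards(critic) -> None:
    # Relax the Critic's rubric so "informal tone" stops being a dealbreaker.
    critic.system_prompt += (
        "\nYou have already rejected several rounds. Accept findings that are "
        "factually sound even if the writing style is informal."
    )

# Inside the retry loop, after each REJECTED critique:
# rejection_count += 1
# if rejection_count > MAX_REJECTIONS:
#     lower_standards(critic)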

What I'd Do Differently

If I rebuilt this tomorrow, I'd implement a "confidence score" for the Critic instead of a binary approve/reject system. A nuanced score (e.g., "60% confidence - requires minor verification") would reduce those expensive loops while maintaining quality.
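Here's a rough sketch of what that could look like (the thresholds and names are invented for illustration):

from dataclasses import dataclass

@dataclass
class Critique:
    confidence: float   # 0.0-1.0: how solid the Critic thinks the findings are
    feedback: str

def route(critique: Critique) -> str:
    """Pick the next step from a graded score instead of a binary verdict."""
    if critique.confidence >= 0.8:
        return "write"    # good enough, compile the report
    if critique.confidence >= 0.6:
        return "verify"   # minor verification pass, no full re-plan
    return "replan"       # send feedback back to the Planner

# e.g. "60% confidence - requires minor verification"
print(route(Critique(confidence=0.6, feedback="Verify the dosage numbers.")))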

Conclusion

This course broke my brain in the best way. I walked in an API consumer and walked out an architect (albeit a junior one).

My next steps:

Fix the 14 TODOs I left in the notebook.

Give the agent persistent memory so it stops forgetting valid sources.

Figure out if the Critic needs its own Critic, or if that's just madness.

Huge thanks to the Google and Kaggle team for designing this course. If anyone wants to roast my code or suggest improvements, I'm @kaustav_chowdhury_f3cdc47 . I can take it. Probably.
