How to Build a Resilient AI-Powered Research Agent with LangChain, Gemini, and DuckDuckGo


In this guide, you'll build a research super-agent that:

✅ Performs deep research on any topic
✅ Handles web/API failures gracefully
✅ Enhances its output with verifiable claims
✅ Generates a full professional report
✅ Uses Google Gemini, DuckDuckGo, Wikipedia, and optionally SerpAPI


🤔 Why This Matters

Most AI research agents work like this:

  1. Take a topic
  2. Query an LLM
  3. Dump the result

What’s wrong with that?

  • They fail when web search APIs are down
  • They hallucinate facts
  • They don’t verify anything
  • They can’t format output beyond a paragraph dump

This agent fixes all of that by combining multiple tools, smart rate-limiting, and multi-phase verification.

Let’s build it, step by step.


⚙️ Step 1: Install Required Tools

We need LangChain to connect LLMs and tools, the langchain-community package that ships the DuckDuckGo tool, Gemini for the heavy reasoning, and python-dotenv for environment configs.

pip install langchain-google-genai langchain-community duckduckgo-search python-dotenv requests

🧾 Step 2: Set Environment Variables

Create a .env file in your project directory with your API keys:

GOOGLE_API_KEY=your_google_api_key_here
SERPAPI_KEY=your_serpapi_key_here  # optional

❗ You can still run the system without SerpAPI. DuckDuckGo and Wikipedia are enough for many topics.
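If you'd rather fail fast on a missing key, a small sanity check right after loading the .env does the job (a minimal sketch; the exact messages are up to you):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

# GOOGLE_API_KEY is required; SERPAPI_KEY is optional.
if not os.getenv("GOOGLE_API_KEY"):
    raise SystemExit("GOOGLE_API_KEY is missing - add it to your .env file")
if not os.getenv("SERPAPI_KEY"):
    print("SERPAPI_KEY not set - continuing with DuckDuckGo and Wikipedia only")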


🔍 Step 3: Create a Multi-Source Search System

We use DuckDuckGoSearchRun via LangChain for fast, no-auth search. If that fails, we fall back to Wikipedia's REST API. If a SerpAPI key is present, we try SerpAPI as a last resort.

import time, os, requests
from langchain_community.tools import DuckDuckGoSearchRun
from dotenv import load_dotenv

load_dotenv()

class ImprovedSearchSystem:
    def __init__(self):
        self.search = DuckDuckGoSearchRun()
        self.search_delay = 3.0   # minimum seconds between outgoing searches
        self.last_search_time = 0

    def safe_web_search(self, query):
        # Throttle: never fire two searches within search_delay seconds.
        now = time.time()
        if now - self.last_search_time < self.search_delay:
            time.sleep(self.search_delay - (now - self.last_search_time))
        self.last_search_time = time.time()

        # Try each source in order; SerpAPI only joins the list if a key is set.
        for name, method in [
            ("DuckDuckGo", self._try_duckduckgo),
            ("Wikipedia", self._try_wikipedia),
            ("SerpAPI", self._try_serpapi if os.getenv("SERPAPI_KEY") else None)
        ]:
            if method is None:
                continue
            try:
                result = method(query)
                if result and len(result.strip()) > 50:  # accept only substantial results
                    return f"[{name}] {result}"
            except Exception:
                continue  # this source failed; fall through to the next one
        return "Search unavailable or no results."

    def _try_duckduckgo(self, query):
        return self.search.invoke(query)

    def _try_wikipedia(self, query):
        # Wikipedia page titles use underscores; keep the first few words of the query.
        title = "_".join(query.split()[:3])
        r = requests.get(
            f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
            timeout=10,
        )
        if r.status_code == 200:
            return r.json().get("extract", "")
        return ""

    def _try_serpapi(self, query):
        params = {"q": query, "api_key": os.getenv("SERPAPI_KEY")}
        resp = requests.get("https://serpapi.com/search", params=params, timeout=10)
        results = resp.json().get("organic_results", [])
        return "\n".join(f"{item['title']}: {item.get('snippet', '')}" for item in results[:2])

What this does:

  • Rate-limits outgoing requests so you never hammer a single source
  • Skips broken sources and falls through to the next one
  • Times out instead of hanging when the network is slow
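You can smoke-test the fallback chain on its own before wiring in the LLM (the queries here are just examples):

searcher = ImprovedSearchSystem()

# The second call demonstrates the built-in throttle between searches.
print(searcher.safe_web_search("edge AI accelerators"))
print(searcher.safe_web_search("neuromorphic computing"))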

🧠 Step 4: Add Google Gemini via LangChain

We'll use Gemini to do the heavy lifting: writing, verifying, summarizing, polishing.

from langchain_google_genai import ChatGoogleGenerativeAI

class RobustResearchOrchestrator:
    def __init__(self):
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash-latest",
            temperature=0.7,
            google_api_key=os.getenv("GOOGLE_API_KEY")
        )
        self.search = ImprovedSearchSystem()
        self.llm_delay = 2.0     # minimum seconds between LLM calls
        self.last_llm_time = 0

    def _rate_limit_llm(self):
        # Same throttle pattern as the search system, applied to Gemini calls.
        now = time.time()
        if now - self.last_llm_time < self.llm_delay:
            time.sleep(self.llm_delay - (now - self.last_llm_time))
        self.last_llm_time = time.time()
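Before going further, it's worth a one-off check that the Gemini connection works (a quick sanity test, nothing more):

orchestrator = RobustResearchOrchestrator()
print(orchestrator.llm.invoke("Reply with OK if you can read this.").content)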

📚 Step 5: Research with Web Context

    def research_with_llm_knowledge(self, topic, web_context=""):
        self._rate_limit_llm()
        prompt = f"""
Research the following topic thoroughly:

Topic: {topic}
{f"Web context: {web_context}" if web_context else ""}

Include:
1. Definitions
2. 2024-2025 breakthroughs
3. Key players
4. Challenges
5. Future outlook
6. Technical details
"""
        return self.llm.invoke(prompt).content

Why this matters:
You’re not just asking the LLM to recall facts from training; you’re grounding it with timely web data as context.
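Run in isolation, the step looks like this (topic chosen purely for illustration, reusing the orchestrator from the sanity check above):

context = orchestrator.search.safe_web_search("solid-state batteries 2024 update")
draft = orchestrator.research_with_llm_knowledge("solid-state batteries", context)
print(draft[:500])  # preview the opening of the draft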


🔎 Step 6: Enhance with Web-Verified Claims

    def enhance_with_verification(self, content, topic):
        self._rate_limit_llm()
        extract_prompt = f"""
Extract 2 verifiable claims from this research on {topic}:

{content[:2000]}
"""
        # Drop blank lines the model may emit between claims.
        claims = [c.strip() for c in self.llm.invoke(extract_prompt).content.splitlines() if c.strip()]
        verified = []

        for claim in claims[:2]:
            search_result = self.search.safe_web_search(claim)
            if search_result and len(search_result) > 100:
                verified.append(f"{claim}\nVerified: {search_result}")

        if not verified:
            return content

        enhance_prompt = f"""
Update this research with the following verified information:

Original:
{content}

Verified Data:
{chr(10).join(verified)}

Mark updated lines with [VERIFIED].
"""
        return self.llm.invoke(enhance_prompt).content

Why this matters:
You extract and fact-check your own LLM output. Self-auditing makes your system more trustworthy.
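Continuing the running example, verification is a single call on the draft:

enhanced = orchestrator.enhance_with_verification(draft, "solid-state batteries")
print("[VERIFIED]" in enhanced)  # True when the model marked any updated lines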


📄 Step 7: Format as a Full Report

    def create_report(self, enhanced, topic):
        self._rate_limit_llm()
        report_prompt = f"""
Convert the following into a full technical report on '{topic}' with:

- Executive Summary
- Introduction
- Breakthroughs
- Key Players
- Technical Details
- Challenges
- Future Outlook
- Conclusion

Content:
{enhanced}
"""
        return self.llm.invoke(report_prompt).content

💅 Step 8: Polish for Professional Quality

    def polish_report(self, report):
        self._rate_limit_llm()
        return self.llm.invoke(f"Polish this technical report for clarity, flow, and professional tone:\n\n{report}").content

🧪 Step 9: Run the Entire Pipeline

    def run_pipeline(self, topic):
        context = self.search.safe_web_search(f"{topic} 2024 update")
        content = self.research_with_llm_knowledge(topic, context)
        enhanced = self.enhance_with_verification(content, topic)
        report = self.create_report(enhanced, topic)
        final = self.polish_report(report)
        return final

🧯 Step 10: Command Line Runner

def run_robust_research(topic):
    orchestrator = RobustResearchOrchestrator()
    return orchestrator.run_pipeline(topic)

if __name__ == "__main__":
    topic = input("Enter a research topic: ").strip()
    print(run_robust_research(topic))

🧠 Example: Output for "AI Hardware in 2025" (abridged)

Executive Summary:
AI hardware has rapidly evolved...

Breakthroughs:
- NVIDIA’s Blackwell GPU architecture...
- Intel’s neuromorphic Loihi chip...

Key Players:
- NVIDIA, Intel, Graphcore, Cerebras...

Challenges:
- Energy efficiency, heat dissipation...

[VERIFIED] Blackwell GPUs announced March 2025 with 4x efficiency...

🔚 Final Thoughts

This agent is:

✅ Self-healing (multi-source fallbacks)
✅ Fact-aware (claim verification)
✅ Pro-quality (full report structure)
✅ Flexible (extendable to PDF, Slack, APIs; see the sketch below)
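As one example of that flexibility, a few lines turn the final report into a Markdown file you can convert or share (the filename is an arbitrary choice):

report = run_robust_research("AI Hardware in 2025")

# Persist the polished report; from here, tools like pandoc can produce a PDF.
with open("research_report.md", "w", encoding="utf-8") as f:
    f.write(report)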

You now have a bulletproof research pipeline that works even if APIs don’t.
