DEV Community

Sangmin Lee

Posted on • Originally published at claudeguide.io

Building an Autonomous Research Agent with Claude

Originally published at claudeguide.io/autonomous-research-agent-claude

A Claude research agent takes a question, runs multiple web searches, reads source documents, deduplicates findings, and produces a structured report, all without manual intervention at each step. A well-architected agent can process 10–15 sources and deliver a 500-word cited summary in under 60 seconds, at a cost of roughly $0.05–$0.15 per research session using Claude 3.7 Sonnet.
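As a rough sanity check on that cost range, the arithmetic can be sketched as follows. Everything numeric here is an assumption for illustration: the prices (USD 3 per million input tokens and USD 15 per million output tokens, the Claude 3.7 Sonnet list price at the time of writing; check the current pricing page), the token counts per source, and the hypothetical `estimate_session_cost` helper.

```python
# Rough cost model for one research session. All numbers are illustrative
# assumptions, not measurements from the article.

INPUT_USD_PER_MTOK = 3.00    # assumed Sonnet input price per million tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumed Sonnet output price per million tokens


def estimate_session_cost(
    num_sources: int,
    tokens_per_source: int = 1_000,  # roughly 4,000 characters of page text
    overhead_tokens: int = 3_000,    # system prompts, tool schemas, search snippets
    output_tokens: int = 1_500,      # search plan, extracted facts, final report
) -> float:
    """Return an estimated session cost in USD for a single pass over the sources."""
    input_tokens = num_sources * tokens_per_source + overhead_tokens
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    for n in (10, 15):
        print(f"{n} sources: ~${estimate_session_cost(n):.3f}")
```

Under these assumptions, 10 to 15 sources land at roughly $0.06 to $0.08, near the low end of the quoted range. A multi-turn tool loop resends earlier context on every call, so real input token counts (and cost) will be higher than this single-pass estimate.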


Architecture: Four-Component Pattern

A reliable research agent architecture separates concerns into four sequential components:

Question
    │
    ▼
[Planner]  → generates a search plan (3–5 queries)
    │
    ▼
[Searcher] → executes searches, fetches pages (parallel)
    │
    ▼
[Synthesizer] → reads sources, extracts key facts, tracks citations
    │
    ▼
[Formatter] → assembles structured report with inline citations
    │
    ▼
Report

Each component is a Claude call with a specific system prompt and tool set. This separation makes the agent easier to debug (swap one component at a time), cheaper to run (use Haiku for search planning, Sonnet for synthesis), and more reliable than a single monolithic agent loop.
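The four stages can be wired together as plain functions, each with its own model and system prompt. The sketch below is one possible shape, not the article's implementation: the `Stage` dataclass, `run_pipeline`, the prompts, and the model identifiers are assumptions for illustration (the article only says Haiku for planning and Sonnet for synthesis). The model call is injected as a callable so the wiring can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Callable

# Signature of the injected model call: (model, system_prompt, user_input) -> text.
# In a real agent this would wrap client.messages.create(...); here it is a
# plain callable so each stage can be swapped or stubbed independently.
ModelCall = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Stage:
    name: str
    model: str   # assumed model identifiers; substitute the ones you use
    system: str  # stage-specific system prompt


PIPELINE: tuple[Stage, ...] = (
    Stage("planner", "claude-3-5-haiku-latest",
          "Turn the question into 3-5 specific search queries, one per line."),
    Stage("searcher", "claude-3-5-haiku-latest",
          "Run each query and return the most relevant pages."),
    Stage("synthesizer", "claude-3-7-sonnet-latest",
          "Extract key facts from the sources and keep the source URL for each."),
    Stage("formatter", "claude-3-7-sonnet-latest",
          "Assemble the facts into a structured report with inline citations."),
)


def run_pipeline(question: str, call_model: ModelCall,
                 stages: tuple[Stage, ...] = PIPELINE) -> dict[str, str]:
    """Feed each stage's output into the next and keep every intermediate result."""
    outputs: dict[str, str] = {}
    current = question
    for stage in stages:
        current = call_model(stage.model, stage.system, current)
        outputs[stage.name] = current  # retained so a single stage can be debugged
    return outputs


if __name__ == "__main__":
    # Stub model call that just labels its input, to show the data flow.
    def fake_call(model: str, system: str, user_input: str) -> str:
        return f"[{model}] {user_input}"

    for name, text in run_pipeline("What changed in HTTP/3?", fake_call).items():
        print(name, "->", text)
```

Keeping every intermediate output is what makes the "swap one component at a time" debugging style practical: a single stage can be re-run against a saved input without repeating the stages before it.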


Tool Definitions

Define four tools that cover the agent's capabilities. These tool schemas should be cached (see the prompt caching guide) since they don't change between sessions.

import anthropic
import httpx
from bs4 import BeautifulSoup
from typing import Any

client = anthropic.Anthropic()

# Tool schemas
RESEARCH_TOOLS = [
    {
        "name": "web_search",
        "description": """Search the web for current information on a topic. 
        Returns a list of results with titles, URLs, and snippets.
        Use specific, targeted queries. Prefer queries with named entities, dates, or version numbers.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string (be specific)"
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return (1-10)",
                    "default": 5
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_page",
        "description": """Fetch and read the full text content of a web page.
        Use this after web_search to get the full content of a promising result.
        Returns cleaned text with the source URL for citation.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "Full URL of the page to read"
                },
                "max_chars": {
                    "type": "integer",
                    "description": "Maximum characters to return (default 4000)",
                    "default": 4000
                }
            },
            "required": ["url"]
        }
    },
    {
        "name": "extract_facts",
        "description": """Extract and store a specific fact or claim from a source.
        Call this for each distinct claim you want to include in the final report.
        Facts are deduplicated automatically — duplicate claims from different sources 
        will be merged with all source URLs retained.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "fact": {
                    "type": "string",
                    "description": "The factual claim to store"
                },
                "source_url": {
                    "type": "string",
                    "description": "URL where this fact was found"
                },
                "source_title": {
                    "type": "string",
                    "description": "Title of the source page"
                },
                "confidence": {
                    "type": "string",
                    "enum": ["high", "medium", "low"],
                    "description": "Confidence in this fact"
                }
            },
            "required": ["fact", "source_url", "confidence"]
        }
    },
    {
        "name": "write_report",
        "description": """Compile all extracted facts into a structured research report.
        Call this once after you have gathered sufficient facts (minimum 5).
        The report will include an executive summary, key findings, and a sources list.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {
                    "type": "string",
                    "description": "Report title"
                },
                "executive_summary": {
                    "type": "string",
                    "description": "2-3 sentence summary of findings"
                },
                "format": {
                    "type": "string",
                    "enum": ["markdown", "json", "plain"],
                    "default": "markdown"
                }
            },
            "required": ["title", "executive_summary"]
        }
    }
]
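Since the schemas are static, one way to cache them is to mark the last tool with a `cache_control` breakpoint, which in Anthropic's prompt caching covers the tool definitions up to and including that entry. The helper below is a sketch: `with_cached_tools` is a hypothetical name, and it copies the list so the original schemas stay untouched.

```python
import copy


def with_cached_tools(tools: list[dict]) -> list[dict]:
    """Return a deep copy of `tools` with a cache breakpoint on the final entry."""
    cached = copy.deepcopy(tools)
    if cached:
        # An "ephemeral" breakpoint on the last tool asks the API to cache the
        # whole tools prefix ending at that tool.
        cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached


if __name__ == "__main__":
    # Two abbreviated stand-ins for the full schemas.
    demo_tools = [
        {"name": "web_search", "description": "Search the web.",
         "input_schema": {"type": "object", "properties": {}}},
        {"name": "read_page", "description": "Fetch a page.",
         "input_schema": {"type": "object", "properties": {}}},
    ]
    cached = with_cached_tools(demo_tools)
    print(cached[-1]["cache_control"])         # breakpoint on the copy
    print("cache_control" in demo_tools[-1])   # whether the original was modified
```

The result would be passed as `tools=with_cached_tools(RESEARCH_TOOLS)` to `client.messages.create`. Caching only takes effect above a model-dependent minimum prefix length, so four short schemas on their own may fall below it; placing the breakpoint on the system prompt instead covers the tools and the system prompt together.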

Tool Implementations


import json
import re
from urllib.parse import quote_plus

# In-memory fact store with deduplication
fact_store: list[dict] = []

def execute_tool(tool_name: str, tool_input: dict) -
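The listing breaks off at the `execute_tool` signature. As a rough illustration of the deduplicating fact store that the `extract_facts` description promises, here is a minimal sketch; it is not the author's implementation. The normalization rule (lowercase, punctuation stripped, whitespace collapsed), the `store_fact` helper, and the JSON string return value are all assumptions, and only the `extract_facts` branch is shown.

```python
import json
import re

# In-memory fact store; each entry keeps every source that stated the fact.
fact_store: list[dict] = []


def _normalize(fact: str) -> str:
    """Assumed dedup key: lowercase, drop punctuation, collapse whitespace."""
    stripped = re.sub(r"[^\w\s]", "", fact.lower())
    return re.sub(r"\s+", " ", stripped).strip()


def store_fact(fact: str, source_url: str, confidence: str,
               source_title: str = "") -> dict:
    """Add a fact, or merge the source into an existing matching fact."""
    key = _normalize(fact)
    source = {"url": source_url, "title": source_title}
    for entry in fact_store:
        if entry["key"] == key:
            # Same claim seen before: retain the new URL unless already recorded.
            if all(s["url"] != source_url for s in entry["sources"]):
                entry["sources"].append(source)
            return {"status": "merged", "sources": len(entry["sources"])}
    fact_store.append({"key": key, "fact": fact,
                       "confidence": confidence, "sources": [source]})
    return {"status": "stored", "sources": 1}


def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Dispatch a tool call and return a JSON string for the tool_result block."""
    if tool_name == "extract_facts":
        return json.dumps(store_fact(**tool_input))
    # web_search, read_page and write_report are omitted from this sketch.
    return json.dumps({"error": f"tool not implemented in this sketch: {tool_name}"})
```

Exact matching on a normalized string only merges near-identical wording. Paraphrased claims from different sources would still be stored separately, so a production agent would likely need a fuzzier comparison.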

The cookbook includes a production-ready research agent with Brave/Tavily integration, cost tracking, async parallel search, and export to Markdown, Notion, and Google Docs — ready to run in under 10 minutes.
