Building Advanced AI Agents with LangChain's DeepAgents: A Hands-On Guide


When Simple Tool-Calling Isn't Enough—Building Agents That Actually Think


TL;DR

From my experience working with LLM agents, the biggest challenge isn't getting them to call a function—it's getting them to handle complex, multi-step workflows without falling apart. Traditional agents are like assistants who forget what they were doing mid-task. LangChain's DeepAgents changes this completely. In my opinion, it's the first framework that truly enables agents to plan strategically, remember context persistently, delegate to specialists, and iterate toward quality. This guide walks you through building a real AI policy research agent that I designed to demonstrate these capabilities. You'll see actual code, understand the design decisions behind each piece, and get a working system that produces professional research reports autonomously.


Introduction

Let me tell you about a problem that drove me crazy for months.

Building an agent that calls a single tool? Easy. Getting an LLM to search the web or query a database? Done it a hundred times. But when I tried to build something more sophisticated—an agent that could research a complex topic, synthesize findings, review its own work, and produce a polished report—everything broke down.

The agents I built were what I now call "shallow." They'd execute one step, maybe two, then lose track of what they were doing. Token limits would overflow. Context would get muddled. Quality would suffer because there was no review process. In my experience, this is the wall most developers hit when moving from demos to production AI systems.

Then I discovered LangChain's DeepAgents library, and honestly, it changed how I think about agent architecture entirely.

What struck me first was the philosophy behind it. The DeepAgents team studied production systems like Claude Code and Deep Research—real applications handling genuinely complex workflows—and extracted the patterns that made them work. The result is a framework that gives agents four critical capabilities that shallow agents lack:

Planning tools that let agents break down tasks strategically before diving in. File system access that provides persistent memory outside the conversation context. Sub-agent creation that enables delegation to focused specialists. And long-term memory through LangGraph's Store that maintains state across sessions.

In my view, these aren't just nice features—they're fundamental architectural requirements for any agent doing serious work.

So I decided to build something real to prove it out. Not a toy demo, but an actual policy research system that could rival human analysis. The kind of agent I wish I'd had when I was doing regulatory research manually. In this guide, I'll walk you through exactly how I built it, the decisions I made along the way, and why each piece matters.


What's This Article About?

This article is my attempt to show you—through actual working code—how to build an AI agent that thinks strategically rather than just reacting to inputs.

Here's What We're Building:

Based on my experience with various agent frameworks, I designed a policy research system that demonstrates every key capability of the DeepAgents architecture. This isn't theoretical—it's a complete implementation that I've tested extensively.

The agent can:

  • Accept complex research questions about AI regulations (like "What's the latest on the EU AI Act?")
  • Break down the research into logical steps using planning tools
  • Delegate the actual investigation to a specialized research sub-agent
  • Save intermediate work to files so context never overflows
  • Invoke a critique sub-agent to review draft quality
  • Iterate based on feedback to produce professional reports

What You'll Learn (From My Mistakes and Successes):

Through building this, I learned several critical patterns:

  1. Strategic Planning: In my opinion, the write_todos tool is underrated. I initially skipped it, thinking agents could just "figure it out." Wrong. Explicit planning transforms chaotic execution into methodical workflow.

  2. Context Management: This one burned me hard. My early agents would hit token limits mid-research and forget everything. File system operations (read_file, write_file, edit_file) solve this completely. In my experience, this is non-negotiable for complex tasks.

  3. Sub-Agent Delegation: I used to cram everything into one mega-prompt. Bad idea. Specialized sub-agents—each with focused responsibilities—produce dramatically better results. One agent researches, another critiques. Clean separation of concerns.

  4. Custom System Prompts: Generic prompts produce generic results. I learned to design detailed, workflow-specific instructions that guide agents step-by-step through complex processes.

  5. Tool Integration: External capabilities like web search aren't add-ons—they're core to agent functionality. I'll show you how I integrated Tavily search seamlessly.

  6. Model Flexibility: One thing I love about DeepAgents is model-agnostic design. I've run this same system on OpenAI, Gemini, and Anthropic models interchangeably.

The Technical Architecture (How I Designed It):

When I sat down to architect this system, I thought about three layers:

Layer 1: Main Orchestrator

  • Receives research queries
  • Plans the workflow
  • Coordinates sub-agents
  • Manages file system state
  • Delivers final output

Layer 2: Specialized Sub-Agents

  • Research Sub-Agent: Conducts deep investigation using web search
  • Critique Sub-Agent: Reviews outputs for quality, accuracy, completeness

Layer 3: Infrastructure

  • File system for persistent state
  • LangGraph Store for long-term memory
  • Tavily API for real-time information gathering

From my perspective, this layered approach is what enables scalability. Each component has a single, clear responsibility.

Why This Pattern Matters:

In my experience building production AI systems, I've learned that architecture matters more than model choice. A well-designed agent with GPT-3.5 will outperform a poorly designed agent with GPT-4. The patterns you'll learn here—planning, delegation, state management, quality control—apply regardless of which LLM you're using or what domain you're working in.

Whether you're building content creation pipelines, code generation systems, data analysis workflows, or customer service automation, these fundamentals remain constant.


Tech Stack

  • Agent Framework: LangChain DeepAgents, the core library for building deep, planful agents with context management
  • LLM Provider (Primary): OpenAI GPT-4o, the main language model for agent reasoning and generation
  • LLM Provider (Alternative): Google Gemini 2.5 Flash, a fully interchangeable model option
  • Default Model: Claude Sonnet 4.5, DeepAgents' internal default when no model is specified
  • Web Search: Tavily API, a real-time internet search tool for research gathering
  • State Management: LangGraph Store, providing long-term memory and session persistence
  • Model Initialization: LangChain init_chat_model, a unified interface for multiple LLM providers
  • File Operations: built-in file tools (read_file, write_file, edit_file, ls) for context management
  • Planning: built-in write_todos for task breakdown and progress tracking
  • Sub-Agent Management: built-in task tool for creating and delegating to specialized sub-agents
  • Environment Management: Python os.environ for API key and configuration management
  • Development Environment: Jupyter Notebook for interactive development and testing

Why Read It?

If you're building AI agents that do more than simple tool calls, this article is essential reading. Here's why:

1. Move Beyond Basic Agents

Most tutorials show you how to build agents that can call a function or search the web. But when you try to scale those patterns to real-world applications—research assistants, code generation systems, multi-step workflows—you hit a wall. This article shows you how to break through that limitation with architectures designed for complexity.

2. Solve Real Production Challenges

You'll learn solutions to problems you'll actually face:

  • Context overflow: How to handle tasks that exceed token limits through file-based state management
  • Task planning: How to make agents think strategically instead of reactively
  • Quality control: How to build self-reviewing systems through sub-agent delegation
  • Memory management: How to maintain state across sessions for long-running projects
  • Modularity: How to break complex agents into focused, maintainable components

3. Complete, Working Implementation

This isn't pseudo-code or theoretical concepts. You get:

  • Full source code for a production-quality research agent
  • Step-by-step explanations of each component
  • Design decisions explained in context
  • Multiple LLM provider options (OpenAI, Gemini, Anthropic)
  • Ready-to-run Jupyter notebook with all dependencies
  • Professional prompt engineering examples

4. Learn Patterns You Can Reuse

The patterns demonstrated here apply far beyond policy research:

  • Content creation systems with review workflows
  • Code generation with testing and refinement
  • Data analysis with iterative exploration
  • Customer service with escalation and specialization
  • Report generation with research and synthesis
  • Any multi-stage workflow requiring planning and quality control

5. Understand the "Why" Behind Design Choices

Each section explains not just what the code does, but why it's structured that way:

  • Why file systems prevent context overflow
  • Why sub-agents improve focus and quality
  • Why custom prompts are critical for complex tasks
  • Why planning tools enable strategic execution
  • Why this architecture scales where basic agents don't

6. Stay Current with AI Agent Evolution

The AI agent landscape is evolving rapidly from simple tool-calling to sophisticated, planful systems. DeepAgents represents the cutting edge of this evolution, incorporating lessons from production systems like Claude Code. Understanding this architecture prepares you for where the field is heading.

7. Practical Business Value

These techniques have immediate business applications:

  • Automating research and analysis workflows
  • Building intelligent content creation pipelines
  • Creating self-improving code generation systems
  • Developing sophisticated customer support agents
  • Implementing complex decision-making systems

Whether you're a developer building production AI systems, a researcher exploring agent architectures, or a technical leader evaluating AI capabilities, this guide provides practical knowledge you can apply immediately.


Let's Design

The Architecture Philosophy

When I set out to build this policy research agent, the fundamental question was: how do we create an AI system that thinks strategically rather than reactively? Traditional agents process tasks linearly—receive input, call tools, return output. But complex research requires something different: planning, delegation, iteration, and quality control.

The DeepAgents architecture addresses this through four interconnected capabilities that work together to enable sophisticated behavior:

1. Strategic Planning Layer

The Problem: Basic agents jump straight into action without considering the best approach. They can't break down complex tasks or track progress across multiple steps.

The Solution: DeepAgents provides a write_todos tool that enables agents to:

  • Decompose large research questions into specific subtasks
  • Create actionable checklists before starting work
  • Track which steps are complete and which remain
  • Adjust the plan dynamically as new information emerges

In Our Implementation: The main agent first saves the research question to question.txt, then creates a todo list outlining the research workflow: gather information, analyze findings, write the report, critique the draft, and finalize. This planning step transforms reactive execution into strategic orchestration.
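To make this concrete, here's roughly the kind of plan the agent produces when it calls write_todos for our query. This is an illustrative sketch of the content, not DeepAgents' literal internal schema:

# Illustrative only: the shape of the agent's self-written plan
todos = [
    {"content": "Save the research question to question.txt", "status": "completed"},
    {"content": "Delegate investigation to policy-research-agent", "status": "in_progress"},
    {"content": "Synthesize findings into final_report.md", "status": "pending"},
    {"content": "Request review from policy-critique-agent", "status": "pending"},
    {"content": "Revise and deliver the final report", "status": "pending"},
]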

2. Persistent Context Management

The Problem: LLMs have token limits. Complex tasks generate large amounts of intermediate data—research findings, draft content, notes—that quickly overflow the context window. Once that happens, the agent loses track of its work.

The Solution: DeepAgents integrates file system operations (read_file, write_file, edit_file, ls) that allow agents to:

  • Store intermediate results outside the conversation context
  • Retrieve specific information when needed
  • Build up complex outputs incrementally
  • Continue work across multiple sessions

In Our Implementation: The agent uses three key files:

  • question.txt: Stores the original research query for reference
  • final_report.md: Holds the evolving research report through drafts and revisions
  • Working memory in file system: Prevents context overflow even with extensive research

This file-based approach means the agent can handle research projects of any size without hitting token limits.
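In practice these are ordinary tool calls the agent issues against its virtual file system. Here's a hedged sketch of the sequence during one run (the tool names are the DeepAgents built-ins; the call signatures are simplified for illustration):

# Simplified illustration of the agent's own file-tool calls during a run
write_file("question.txt", query)              # persist the question outside the context window
write_file("final_report.md", first_draft)     # store the draft as it grows
draft = read_file("final_report.md")           # reload only what's needed, when it's needed
edit_file("final_report.md", revisions)        # apply targeted revisions after critique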

3. Specialized Sub-Agent Delegation

The Problem: Trying to do everything in one agent leads to bloated context, unclear responsibilities, and lower quality outputs. Research requires different skills than critique. Gathering information is different from synthesizing it.

The Solution: DeepAgents allows creation of focused sub-agents via the task tool. Each sub-agent has:

  • Its own specialized system prompt defining clear responsibilities
  • Dedicated tools appropriate to its function
  • Isolated context that doesn't clutter the main agent
  • Single, well-defined output that returns to the main agent

In Our Implementation: We use two sub-agents:

Policy Research Sub-Agent:

  • Purpose: Conduct in-depth investigation of AI regulations and policies
  • Tools: Internet search via Tavily API
  • Instructions: Find key updates, cite sources, compare global approaches, write professionally
  • Output: Comprehensive research findings passed back to main agent

Policy Critique Sub-Agent:

  • Purpose: Quality control and editorial review
  • Tools: File reading to access the draft report
  • Instructions: Check accuracy, verify citations, assess balance and tone
  • Output: Constructive feedback without direct modification

This separation of concerns means each sub-agent can focus deeply on its specialty without distractions.

4. Intelligent Workflow Orchestration

The Problem: Who coordinates all these pieces? How does the main agent know when to delegate, when to write, when to revise?

The Solution: A carefully crafted custom system prompt that serves as the "brain" of the operation. This prompt:

  • Defines the overall workflow step-by-step
  • Specifies when to invoke each sub-agent
  • Enforces quality standards and formatting requirements
  • Provides context about the agent's role and capabilities

In Our Implementation: The policy_research_instructions prompt creates a clear five-step workflow:

  1. Save the Question: Write user query to question.txt for reference
  2. Delegate Research: Invoke policy-research-agent to gather comprehensive information
  3. Synthesize Report: Write findings to final_report.md with proper structure and citations
  4. Quality Review: Optionally invoke policy-critique-agent for editorial feedback
  5. Finalize: Revise based on feedback and output the complete professional report

The prompt also enforces standards:

  • Markdown formatting with clear section headers
  • Citation style using [Title](URL) format
  • Professional, neutral tone suitable for policy briefings
  • Sources section at the end

The Complete Flow

Here's how everything works together when a user asks: "What are the latest updates on the EU AI Act and its global impact?"

User Query
    ↓
Main Deep Agent
    ↓
1. Saves question to question.txt (context management)
    ↓
2. Creates todo list (planning)
    ↓
3. Invokes Policy Research Sub-Agent
    ↓
    Research Sub-Agent:
    - Uses Tavily search for EU AI Act updates
    - Finds regulations, news, analysis
    - Compares global approaches
    - Formats findings professionally
    - Returns comprehensive research to Main Agent
    ↓
4. Main Agent writes draft to final_report.md
    ↓
5. Invokes Policy Critique Sub-Agent
    ↓
    Critique Sub-Agent:
    - Reads final_report.md
    - Checks accuracy and citations
    - Verifies balanced analysis
    - Returns constructive feedback to Main Agent
    ↓
6. Main Agent revises draft based on feedback
    ↓
7. Outputs final professional policy report

Why This Architecture Scales

This design handles complexity through:

  • Modularity: Each component has a single, clear responsibility
  • Extensibility: Easy to add new sub-agents for different research domains
  • Robustness: File system prevents context overflow regardless of task size
  • Quality: Built-in review cycle ensures professional outputs
  • Flexibility: Works with any LLM provider (OpenAI, Gemini, Anthropic, etc.)
  • Maintainability: Clear separation between orchestration, execution, and review

Unlike basic agents that collapse under complexity, this architecture gets stronger as tasks become more sophisticated. The planning layer ensures strategic execution, the file system prevents memory issues, sub-agents maintain focus, and the workflow orchestration keeps everything coordinated.

This is the fundamental difference between shallow and deep agents: the ability to think, plan, delegate, and iterate rather than just react and respond.


Let's Get Cooking

Now let's build this system step by step, understanding each component and why it matters.

Step 1: Install Dependencies and Setup Environment

First, we need the core libraries that power our deep agent system.

!pip install deepagents tavily-python langchain-google-genai langchain-openai

What We're Installing:

  • deepagents: The core LangChain library providing planning, file tools, and sub-agent capabilities
  • tavily-python: Client for Tavily web search API (our research tool)
  • langchain-google-genai: Integration for Google's Gemini models (alternative LLM)
  • langchain-openai: Integration for OpenAI's GPT models (primary LLM)

Why These Dependencies: DeepAgents is model-agnostic, so we install multiple LLM provider options. Tavily provides the real-time web search capability our research sub-agent needs.


Step 2: Configure API Keys

import os
from getpass import getpass

# Required for web search functionality
os.environ['TAVILY_API_KEY'] = getpass('Enter Tavily API Key: ')

# Choose your preferred LLM provider
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

# Optional: If using Gemini instead
# os.environ['GOOGLE_API_KEY'] = getpass('Enter Google API Key: ')

Why This Matters:

  • The Tavily API key enables our research sub-agent to search the web for real-time information
  • You can use either OpenAI or Google (or other providers)—DeepAgents works seamlessly with all of them
  • Using getpass keeps your API keys secure and out of your code

Getting API Keys:

  • Tavily: Sign up at tavily.com for web search access
  • OpenAI: Get your key from platform.openai.com
  • Google: Access through Google AI Studio

Step 3: Import Core Libraries

import os
from typing import Literal
from tavily import TavilyClient
from deepagents import create_deep_agent

# Initialize the search client
tavily_client = TavilyClient()

Why We Import These:

  • typing.Literal: Enables type hints for our search function parameters
  • TavilyClient: Provides the interface to web search functionality
  • create_deep_agent: The main factory function for building our deep agent system

Design Note: We initialize the Tavily client at the module level because it will be used inside our tool function, which gets called by the sub-agent.
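If you prefer explicit configuration over environment-variable pickup, tavily-python also accepts the key directly:

import os
from tavily import TavilyClient

# Equivalent to TavilyClient() when TAVILY_API_KEY is already set in the environment
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])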


Step 4: Define the Web Search Tool

from typing import Literal

def internet_search(
    query: str,
    max_results: int = 5,
    topic: Literal["general", "news", "finance"] = "general",
    include_raw_content: bool = False,
):
    """Run a web search and return relevant results.

    This tool allows agents to gather real-time information from the internet.
    """
    search_docs = tavily_client.search(
        query,
        max_results=max_results,
        include_raw_content=include_raw_content,
        topic=topic,
    )
    return search_docs

What This Tool Does:

  • Accepts a search query string and optional parameters
  • Calls Tavily's search API to find relevant web content
  • Returns structured search results (titles, URLs, snippets, content)
  • Supports different topic categories (general, news, finance)

Why We Structure It This Way:

  • Clear docstring: Helps the LLM understand when and how to use this tool
  • Typed parameters: The Literal type hint tells the agent exactly what topic values are valid
  • Sensible defaults: 5 results and general search work for most cases
  • Raw content option: Can retrieve full page content when needed for deep research

How the Agent Uses It: The research sub-agent will automatically call this function when it needs to gather information about AI policies, regulations, or related topics.
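Before handing the tool to the agent, it's worth sanity-checking it on its own (this assumes your TAVILY_API_KEY is already configured):

# Quick standalone check of the search tool
results = internet_search("EU AI Act enforcement timeline", max_results=3, topic="news")
for doc in results.get("results", []):
    print(doc["title"], "-", doc["url"])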


Step 5: Create the Research Sub-Agent Configuration

sub_research_prompt = """
You are a specialized AI policy researcher.
Conduct in-depth research on government policies, global regulations, and ethical frameworks related to artificial intelligence.

Your answer should:
- Provide key updates and trends
- Include relevant sources and laws (e.g., EU AI Act, U.S. Executive Orders)
- Compare global approaches when relevant
- Be written in clear, professional language

Only your FINAL message will be passed back to the main agent.
"""

research_sub_agent = {
    "name": "policy-research-agent",
    "description": "Used to research specific AI policy and regulation questions in depth.",
    "system_prompt": sub_research_prompt,
    "tools": [internet_search],
}

Breaking This Down:

The System Prompt:

  • Defines the sub-agent's identity and expertise (AI policy researcher)
  • Specifies output requirements (updates, sources, comparisons)
  • Sets quality standards (clear, professional language)
  • Reminds the agent that only the final message returns to the parent

Why This Prompt Works:

  • Focused role: "Specialized AI policy researcher" gives clear identity
  • Concrete requirements: The bullet points tell the agent exactly what to include
  • Examples: Mentioning "EU AI Act, U.S. Executive Orders" helps the agent understand scope
  • Important constraint: "Only your FINAL message will be passed back" prevents verbose intermediate steps

The Configuration Dictionary:

  • name: Identifier the main agent uses to invoke this sub-agent
  • description: Helps the main agent decide when to delegate to this specialist
  • system_prompt: The instructions that define this sub-agent's behavior
  • tools: List of functions this sub-agent can call (just internet search in this case)

Design Philosophy: This sub-agent has ONE job: research AI policies thoroughly. It has the internet search tool to do that job well, and its prompt focuses it entirely on that task. No distractions, no scope creep.


Step 6: Create the Critique Sub-Agent Configuration

sub_critique_prompt = """
You are a policy editor reviewing a report on AI governance.
Check the report at `final_report.md` and the question at `question.txt`.

Focus on:
- Accuracy and completeness of legal information
- Proper citation of policy documents
- Balanced analysis of regional differences
- Clarity and neutrality of tone

Provide constructive feedback, but do NOT modify the report directly.
"""

critique_sub_agent = {
    "name": "policy-critique-agent",
    "description": "Critiques AI policy research reports for completeness, clarity, and accuracy.",
    "system_prompt": sub_critique_prompt,
}

The Critique System Prompt:

  • Defines role as editorial reviewer (not researcher)
  • Specifies exactly what files to check
  • Lists concrete quality criteria to evaluate
  • Explicitly prohibits direct modification (feedback only)

Why No Tools Here: Unlike the research sub-agent, the critique agent doesn't need internet search. It has access to the file system (built into all DeepAgents) to read the draft report and provide feedback. That's all it needs.

The Review Criteria:
Each bullet point gives the critic something specific to check:

  • Accuracy and completeness: Are the facts right? Is anything missing?
  • Proper citation: Are sources properly attributed?
  • Balanced analysis: Are regional differences fairly represented?
  • Clarity and neutrality: Is the tone appropriate for policy work?

Why "Do NOT modify": This is crucial. We want the critique agent to identify issues and suggest improvements, but leave the actual editing to the main agent. This separation prevents the critic from overstepping and ensures the main agent maintains control of the final output.

The Workflow: Main agent → writes draft → critique agent reviews → provides feedback → main agent revises. Clean separation of concerns.


Step 7: Design the Main Agent System Prompt

policy_research_instructions = """
You are an expert AI policy researcher and analyst.
Your job is to investigate questions related to global AI regulation, ethics, and governance frameworks.

1️⃣ Save the user's question to `question.txt`
2️⃣ Use the `policy-research-agent` to perform in-depth research
3️⃣ Write a detailed report to `final_report.md`
4️⃣ Optionally, ask the `policy-critique-agent` to critique your draft
5️⃣ Revise if necessary, then output the final, comprehensive report

When writing the final report:
- Use Markdown with clear sections (## for each)
- Include citations in [Title](URL) format
- Add a ### Sources section at the end
- Write in professional, neutral tone suitable for policy briefings
"""

This Is the Brain of the Operation. Let me explain why each part matters:

Identity and Purpose ("You are an expert AI policy researcher..."):

  • Establishes the agent's high-level role
  • Sets expectations for quality and expertise
  • Provides context for decision-making

The Numbered Workflow (Steps 1-5):
This is the most important part. It gives the agent a clear execution plan:

  1. Save the question: Creates a persistent reference the agent can check later
  2. Delegate to research sub-agent: Uses the specialist to gather information
  3. Write the report: Synthesizes findings into a structured document
  4. Get critique: Optionally invokes the editor for quality review
  5. Finalize: Revises based on feedback and delivers the result

Why Numbered Steps Work: They create a mental model for the LLM. The agent knows there's a sequence, knows what comes next, and can track progress.

The Formatting Requirements (Markdown, citations, etc.):
These aren't just style preferences—they're quality controls:

  • Markdown sections: Ensures structured, navigable reports
  • Title citations: Makes sources clickable and verifiable
  • Sources section: Consolidates references for easy checking
  • Professional tone: Appropriate for the policy analysis domain

Why This Prompt Architecture Works:

  1. Clear role: The agent knows who it is
  2. Explicit workflow: The agent knows what to do
  3. Quality standards: The agent knows how to do it well
  4. File-based state: The agent can handle any complexity
  5. Delegation model: The agent knows when to get help

This is the difference between an agent that wanders aimlessly and one that executes strategically.


Step 8: Initialize the Main Deep Agent

from langchain.chat_models import init_chat_model
from deepagents import create_deep_agent

# Initialize with OpenAI GPT-4o
model = init_chat_model(model="openai:gpt-4o")

# Alternative: Use Google Gemini instead
# model = init_chat_model(model="google_genai:gemini-2.5-flash")

# Create the deep agent
agent = create_deep_agent(
    model=model,
    tools=[internet_search],
    system_prompt=policy_research_instructions,
    subagents=[research_sub_agent, critique_sub_agent],
)

What's Happening Here:

Model Initialization:

model = init_chat_model(model="openai:gpt-4o")
  • init_chat_model is LangChain's unified interface for any LLM
  • Format is "provider:model_name"
  • Switching providers is as simple as changing this string
  • If you don't specify a model, DeepAgents defaults to Claude Sonnet 4.5

Creating the Deep Agent:

agent = create_deep_agent(...)

This single function call assembles the entire complex system:

  • model: The LLM that powers reasoning and decision-making
  • tools: Functions the main agent can call directly (internet search)
  • system_prompt: The workflow instructions we defined above
  • subagents: The specialized agents available for delegation

What create_deep_agent Does Internally:

  1. Sets up the planning system (todo management)
  2. Configures file system tools (read, write, edit, ls)
  3. Registers the sub-agents for delegation
  4. Integrates the custom tools we provided
  5. Wraps everything in a LangGraph workflow
  6. Connects to the LLM for execution

Why This Is Powerful: In these few lines, we've created an agent that can:

  • Plan and track complex tasks
  • Search the web for information
  • Delegate to specialized sub-agents
  • Manage persistent state through files
  • Maintain long-term memory
  • Execute multi-step workflows

All the complexity is abstracted away by create_deep_agent.

Model Flexibility Note: Notice how easy it is to switch between OpenAI and Gemini (or any other provider). This is intentional—DeepAgents is designed to be model-agnostic. The architecture works the same regardless of which LLM powers it.
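For instance, pointing the same system at Anthropic is one line. The model string below is illustrative, and you'd need langchain-anthropic installed plus an ANTHROPIC_API_KEY set:

# Illustrative: requires `pip install langchain-anthropic` and ANTHROPIC_API_KEY
model = init_chat_model(model="anthropic:claude-sonnet-4-5")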


Step 9: Invoke the Agent with a Research Query

query = "What are the latest updates on the EU AI Act and its global impact?"

result = agent.invoke({"messages": [{"role": "user", "content": query}]})

What Happens When You Run This:

The Invocation Format:

{"messages": [{"role": "user", "content": query}]}

This is the standard LangChain message format. The agent receives it as if it were a chat conversation.
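One more useful detail: DeepAgents keeps its virtual file system in the same state dictionary, so you can seed files at invocation time and read them back from the result. A sketch following the state convention the library documents (the notes.txt content here is just an example):

# Seed the virtual file system alongside the messages, then inspect it afterward
result = agent.invoke({
    "messages": [{"role": "user", "content": query}],
    "files": {"notes.txt": "Prior research context goes here."},
})
print(list(result["files"].keys()))  # e.g. notes.txt, question.txt, final_report.md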

The Execution Flow (what happens behind the scenes):

  1. Main Agent Receives Query

    • Reads the question: "What are the latest updates on the EU AI Act..."
    • Consults its system prompt (the numbered workflow)
    • Decides to start at step 1

  2. Step 1: Save the Question

    # Agent calls: write_file(path="question.txt", content=query)

    • Uses the built-in file tool to persist the question
    • Creates a reference point for later use

  3. Step 2: Delegate to Research Sub-Agent

    # Agent calls: task(agent="policy-research-agent",
    #                   instruction="Research the EU AI Act updates and global impact")

    • Research sub-agent receives the task
    • Calls internet_search("EU AI Act latest updates")
    • Calls internet_search("EU AI Act global impact")
    • Calls internet_search("AI regulations worldwide comparison")
    • Synthesizes findings into a comprehensive report
    • Returns its final message to the main agent

  4. Step 3: Write the Draft Report

    # Agent calls: write_file(path="final_report.md", content=research_findings)

    • Takes the sub-agent's research
    • Structures it into Markdown sections
    • Includes citations in [Title](URL) format
    • Writes to the file system

  5. Step 4: Get Editorial Critique

    # Agent calls: task(agent="policy-critique-agent",
    #                   instruction="Review the draft report for quality")

    • Critique sub-agent reads final_report.md
    • Checks accuracy, citations, balance, tone
    • Provides constructive feedback
    • Returns its review to the main agent

  6. Step 5: Revise and Finalize

    # Agent calls: edit_file(path="final_report.md", edits=improvements)

    • Main agent incorporates feedback
    • Refines sections based on critique
    • Ensures all quality standards are met
    • Outputs the final comprehensive report

The Result: result contains the complete execution trace, including:

  • All intermediate steps taken
  • Tool calls made by each agent
  • Sub-agent invocations and responses
  • The final policy report
  • File system changes

Viewing the Output:

# Get the final message (messages are LangChain message objects, so use .content)
print(result["messages"][-1].content)

# Or read the report from the agent's virtual file system, which DeepAgents
# returns under the "files" key of the result state
# print(result["files"]["final_report.md"])

What Makes This Different from Basic Agents:

  • Basic agent: Search → Generate → Done (one step, no planning)
  • Deep agent: Plan → Delegate → Research → Write → Critique → Revise → Deliver (strategic, multi-stage)

The deep agent produces higher quality results because it:

  • Plans before acting
  • Uses specialists for specific tasks
  • Iterates based on feedback
  • Manages complexity through file system
  • Maintains focus through sub-agent delegation

This is the power of the DeepAgents architecture in action.


Let's Setup

Prerequisites

Before running this implementation, ensure you have:

1. Python Environment

  • Python 3.8 or higher
  • pip package manager
  • Jupyter Notebook or JupyterLab (recommended for interactive development)

2. API Access

  • Tavily API key for web search (sign up at tavily.com)
  • An LLM provider key: OpenAI (platform.openai.com) or Google (Google AI Studio) if using Gemini

3. Development Environment

  • Code editor (VS Code, PyCharm, or similar)
  • Terminal access for pip installations
  • Stable internet connection for API calls

Installation Steps

Step 1: Set Up Virtual Environment (Recommended)

# Create a virtual environment
python -m venv deepagents-env

# Activate it (Windows)
deepagents-env\Scripts\activate

# Activate it (Mac/Linux)
source deepagents-env/bin/activate

Step 2: Install Core Dependencies

pip install deepagents tavily-python langchain-google-genai langchain-openai

Step 3: Verify Installation

# Test imports
import deepagents
from tavily import TavilyClient
from langchain.chat_models import init_chat_model

print("✅ All dependencies installed successfully")

Step 4: Configure Environment Variables

Create a .env file in your project directory:

TAVILY_API_KEY=your_tavily_key_here
OPENAI_API_KEY=your_openai_key_here
# GOOGLE_API_KEY=your_google_key_here  # If using Gemini

Or set them programmatically:

import os
from getpass import getpass

os.environ['TAVILY_API_KEY'] = getpass('Enter Tavily API Key: ')
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
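If you go the .env route, a small loader sketch (assumes python-dotenv is installed via pip install python-dotenv):

from dotenv import load_dotenv

# Reads the .env file in the current directory and populates os.environ
load_dotenv()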

Step 5: Test Your Setup

Run this quick test to verify everything works:

from tavily import TavilyClient
from langchain.chat_models import init_chat_model
from deepagents import create_deep_agent

# Test Tavily connection
client = TavilyClient()
test_search = client.search("test query", max_results=1)
print("✅ Tavily search working")

# Test model initialization
model = init_chat_model(model="openai:gpt-4o")
print("✅ LLM connection working")

# Test DeepAgents
simple_agent = create_deep_agent(model=model, tools=[])
print("✅ DeepAgents framework working")

print("\n🎉 Setup complete! You're ready to build deep agents.")

Project Structure

Organize your project like this:

deepagents-project/
│
├── deepagents_research.ipynb    # Main notebook (from GitHub)
├── .env                          # Your API keys (DO NOT commit to git)
├── .gitignore                    # Ignore .env and other sensitive files
├── requirements.txt              # Dependency list
└── README.md                     # Project documentation

Common Setup Issues and Solutions

Issue: "ModuleNotFoundError: No module named 'deepagents'"

  • Solution: Make sure you activated your virtual environment before installing

Issue: "Invalid API key" errors

  • Solution: Double-check your API keys, ensure no extra spaces, verify they're active

Issue: "Rate limit exceeded" from Tavily or OpenAI

  • Solution: Check your API usage limits, consider upgrading your plan, or add rate limiting

Issue: File permission errors

  • Solution: Ensure you have write permissions in your working directory

Quick Start Checklist

  • [ ] Python 3.8+ installed
  • [ ] Virtual environment created and activated
  • [ ] Dependencies installed (pip install deepagents tavily-python langchain-google-genai langchain-openai)
  • [ ] Tavily API key obtained and configured
  • [ ] LLM provider API key obtained and configured
  • [ ] Test imports successful
  • [ ] Tavily search test successful
  • [ ] Model initialization test successful
  • [ ] Ready to run the full implementation

Next Steps

Once your setup is complete, you can:

  1. Download the complete code from the GitHub repository
  2. Run through the implementation step by step
  3. Experiment with different research queries
  4. Modify sub-agent prompts for different domains
  5. Add additional sub-agents for specialized tasks
  6. Integrate with your own applications

Let's Run

Running the Complete System

Now that everything is set up, let's execute the full policy research workflow.

Step 1: Start Jupyter Notebook

jupyter notebook

Navigate to your deepagents_research.ipynb file and open it.

Step 2: Run Initialization Cells

Execute cells in order to:

  1. Import dependencies
  2. Configure API keys
  3. Define the search tool
  4. Create sub-agent configurations
  5. Set up the main agent prompt
  6. Initialize the deep agent

Step 3: Execute a Research Query

Run the invocation cell:

query = "What are the latest updates on the EU AI Act and its global impact?"
result = agent.invoke({"messages": [{"role": "user", "content": query}]})

What You'll See During Execution:

The agent will output its thought process and actions:

💭 Thought: I need to research the EU AI Act updates systematically.

🔧 Action: write_file
   Path: question.txt
   Content: What are the latest updates on the EU AI Act...

✅ Observation: File created successfully

💭 Thought: Now I'll delegate the research to my specialized sub-agent.

🔧 Action: task
   Agent: policy-research-agent
   Instruction: Research latest EU AI Act updates and global impact

   [Sub-agent starts executing...]

   🔍 Searching: "EU AI Act 2025 updates"
   🔍 Searching: "EU AI Act global impact"
   🔍 Searching: "AI regulations worldwide comparison"

   ✅ Sub-agent complete: [Returns comprehensive research findings]

💭 Thought: I have the research. Now I'll structure it into a report.

🔧 Action: write_file
   Path: final_report.md
   Content: [Structured Markdown report with citations]

✅ Observation: Report written

💭 Thought: Let me get editorial feedback to ensure quality.

🔧 Action: task
   Agent: policy-critique-agent
   Instruction: Review the draft report

   [Critique sub-agent executes...]

   ✅ Sub-agent complete: [Returns constructive feedback]

💭 Thought: Based on the feedback, I'll make final revisions.

🔧 Action: edit_file
   Path: final_report.md
   Edits: [Improvements based on critique]

✅ Final report complete!

Step 4: View the Results

Access the final report:

# Option 1: From the result object (messages are LangChain message objects)
final_message = result["messages"][-1].content
print(final_message)

# Option 2: Read from the virtual file system, returned under the
# "files" key of the result state
report = result["files"]["final_report.md"]
print(report)

# Option 3: View all generated files
print("Files created:", list(result["files"].keys()))

Example Output Structure:

The final report will look something like this:

# Latest Updates on the EU AI Act and Global Impact

## Executive Summary

The EU AI Act, formally adopted in [date], represents the world's first 
comprehensive regulatory framework for artificial intelligence...

## Key Updates

### Regulatory Timeline
- Final text published: [date]
- Implementation begins: [date]
- Full enforcement: [date]

### Core Provisions
- Risk-based classification system
- Prohibited AI practices
- High-risk AI requirements
- Transparency obligations

## Global Impact

### Regional Responses

**United States**: The U.S. has responded with executive orders...

**China**: China's AI regulations focus on...

**United Kingdom**: The UK has taken a different approach...

### International Standards

The EU AI Act is influencing global AI governance through...

## Industry Implications

Organizations worldwide are adapting to these regulations by...

## Sources

- [EU AI Act Official Text](https://example.com)
- [Global AI Policy Tracker](https://example.com)
- [Industry Analysis Report](https://example.com)

Step 5: Try Different Queries

Experiment with various research questions:

# Example 1: Different topic
query1 = "How are different countries regulating AI in healthcare?"
result1 = agent.invoke({"messages": [{"role": "user", "content": query1}]})

# Example 2: Specific comparison
query2 = "Compare AI ethics frameworks between US, EU, and China"
result2 = agent.invoke({"messages": [{"role": "user", "content": query2}]})

# Example 3: Narrow focus
query3 = "What are the key compliance requirements for AI systems under the EU AI Act?"
result3 = agent.invoke({"messages": [{"role": "user", "content": query3}]})

Step 6: Inspect the Workflow

View the complete execution trace:

# See all steps taken (messages are LangChain message objects)
for msg in result["messages"]:
    role = getattr(msg, "type", "unknown")            # e.g. "human", "ai", "tool"
    content = str(getattr(msg, "content", ""))[:200]  # first 200 chars
    print(f"\n[{role}]: {content}...")

# Count AI messages that issued tool calls
tool_calls = [msg for msg in result["messages"] if getattr(msg, "tool_calls", None)]
print(f"\nMessages with tool calls: {len(tool_calls)}")

# See sub-agent invocations (delegations go through the built-in `task` tool)
sub_agent_calls = [
    call for msg in tool_calls for call in msg.tool_calls if call["name"] == "task"
]
print(f"Sub-agent invocations: {len(sub_agent_calls)}")
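If you'd rather watch the steps as they happen instead of reconstructing them afterward, the compiled agent (a LangGraph graph) also supports streaming; a minimal sketch:

# Stream the full state after each step; the newest message shows what just happened
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()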

Performance Metrics:

Monitor agent performance:

import time

start_time = time.time()
result = agent.invoke({"messages": [{"role": "user", "content": query}]})
end_time = time.time()

execution_time = end_time - start_time
print(f"⏱️  Total execution time: {execution_time:.2f} seconds")

# Token usage (availability varies by provider; recent LangChain versions
# attach usage_metadata to AI messages when the provider reports it)
usage = getattr(result["messages"][-1], "usage_metadata", None)
if usage:
    print(f"🎫 Tokens used: {usage}")

Debugging Tips:

If something goes wrong:

  1. Check API Keys: Verify all keys are set correctly

   print("Tavily:", "✅" if os.getenv('TAVILY_API_KEY') else "❌")
   print("OpenAI:", "✅" if os.getenv('OPENAI_API_KEY') else "❌")

  2. Test Components Individually:

   # Test the search function
   test_result = internet_search("test query", max_results=1)
   print("Search working:", "✅" if test_result else "❌")

   # Test the model
   test_response = model.invoke([{"role": "user", "content": "Hi"}])
   print("Model working:", "✅" if test_response else "❌")

  3. Enable Verbose Logging:

   import logging
   logging.basicConfig(level=logging.DEBUG)

  4. Check the File System:

   # List all files the agent created (the "files" key of the result state)
   print("Files created:", list(result["files"].keys()))

   # Read any file to debug
   print("Question file:", result["files"]["question.txt"])

Expected Behavior:

  • Normal execution: 30-90 seconds depending on query complexity
  • Sub-agent calls: Typically 2 (research + critique)
  • File operations: 3-5 (write question, write report, edit report)
  • Search queries: 3-7 (depending on research depth)

Success Indicators:

✅ Agent completes all workflow steps
✅ Final report is well-structured Markdown
✅ Sources are properly cited
✅ Report addresses the original question
✅ Professional tone maintained throughout
✅ No errors or crashes

Next Experiments:

Once you have it working:

  1. Modify sub-agent prompts for different domains (technology, finance, healthcare)
  2. Add additional sub-agents (fact-checker, summarizer, translator); a fact-checker sketch follows this list
  3. Adjust search parameters (more results, different topics)
  4. Try different LLM models (compare GPT-4 vs Gemini vs Claude)
  5. Integrate with your own data sources or APIs
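As a concrete starting point for the second experiment, here's how I'd sketch a fact-checker in the same configuration style we used earlier. The name and prompt are my own illustrations, not part of the original system:

fact_check_prompt = """
You are a fact-checker reviewing an AI policy report.
Read `final_report.md`, verify each factual claim with web search,
and report any statements that are unsupported or outdated.
Do NOT modify the report; return findings only.
"""

# Illustrative sub-agent config, reusing the internet_search tool from Step 4
fact_check_sub_agent = {
    "name": "policy-fact-check-agent",
    "description": "Verifies factual claims in policy reports against current sources.",
    "system_prompt": fact_check_prompt,
    "tools": [internet_search],
}

# Register it alongside the existing sub-agents
agent = create_deep_agent(
    model=model,
    tools=[internet_search],
    system_prompt=policy_research_instructions,
    subagents=[research_sub_agent, critique_sub_agent, fact_check_sub_agent],
)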

Closing Thoughts

We've just built something remarkable: an AI agent that doesn't just respond to queries but thinks strategically, delegates to specialists, manages complex state, and iterates toward high-quality outputs. This isn't the future of AI agents—it's what's possible right now with LangChain's DeepAgents.

What We Accomplished

In this hands-on guide, you learned how to:

  • Transform a basic tool-calling agent into a sophisticated planning system
  • Leverage file-based context management to handle tasks of any complexity
  • Design and coordinate specialized sub-agents for focused execution
  • Implement iterative quality control through automated review workflows
  • Build production-ready research systems that rival human analysis

The policy research agent we built demonstrates patterns that extend far beyond this specific use case. Whether you're building code generation systems, content creation pipelines, data analysis workflows, or customer service automation, the principles remain the same: plan strategically, delegate intelligently, manage state persistently, and iterate toward quality.

The Deeper Implications

The evolution from shallow to deep agents represents a fundamental shift in how we build AI systems. We're moving from tools that execute individual tasks to systems that orchestrate complex workflows. The key insight is that intelligence emerges not just from powerful models, but from thoughtful architecture—planning layers, memory systems, delegation patterns, and quality controls.

DeepAgents embodies this philosophy. By providing built-in planning tools, file system access, sub-agent creation, and long-term memory, it gives developers the building blocks to create genuinely sophisticated AI systems without reinventing infrastructure.

What's Next for AI Agents

The trajectory is clear: agents are becoming more capable, more modular, and more specialized. We're heading toward ecosystems where:

  • Agent swarms collaborate on complex problems, each bringing specialized expertise
  • Persistent memory allows agents to maintain context across days, weeks, or months of work
  • Self-improvement loops enable agents to learn from feedback and enhance their own prompts
  • Multi-modal capabilities combine text, code, images, and data seamlessly
  • Human-AI collaboration reaches new levels as agents become true thought partners

The foundation you've learned here—planning, delegation, state management, quality control—will remain relevant as these capabilities evolve.

Practical Next Steps

Where should you go from here?

Immediate Experiments:

  1. Adapt this architecture to your own domain (swap policy research for tech analysis, financial reports, or medical literature reviews)
  2. Add more specialized sub-agents (fact-checkers, translators, data analyzers)
  3. Integrate with your own data sources, databases, or APIs
  4. Experiment with different LLM models to find the best performance/cost balance
  5. Build quality metrics to measure agent performance over time

Production Considerations:

  • Implement proper error handling and retry logic (a minimal sketch follows this list)
  • Add monitoring and logging for production deployments
  • Build evaluation frameworks to assess agent output quality
  • Consider cost optimization strategies (caching, smaller models for sub-agents)
  • Design human-in-the-loop workflows for critical decisions
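For the first item, here's a minimal retry wrapper I'd start from: plain exponential backoff around the invoke call. In production you'd likely reach for a library like tenacity and catch your provider's specific transient exceptions instead of the broad Exception used here:

import time

def invoke_with_retries(agent, payload, max_attempts=3, base_delay=2.0):
    """Retry agent.invoke with exponential backoff on transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return agent.invoke(payload)
        except Exception as exc:  # narrow this to your provider's transient errors
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

result = invoke_with_retries(agent, {"messages": [{"role": "user", "content": query}]})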

Learning More:

  • Explore the DeepAgents GitHub repository for advanced examples
  • Study the LangGraph documentation for state management patterns
  • Join the LangChain community to learn from other practitioners
  • Experiment with Claude Code and Deep Research to see these patterns at scale

The Bigger Picture

What excites me most about this technology isn't just what it can do today, but what it enables tomorrow. As agents become more sophisticated, the barrier to building intelligent systems continues to fall. Complex workflows that once required teams of specialists can now be orchestrated by thoughtfully designed agent systems.

This democratization of AI capabilities means:

  • Small teams can build products that previously required large organizations
  • Individuals can leverage AI to amplify their expertise and productivity
  • New categories of applications become possible
  • The focus shifts from AI implementation to AI orchestration

We're entering an era where the key skill isn't writing every algorithm from scratch, but knowing how to compose powerful systems from intelligent components.

Final Thoughts

The agent you built today is more than a research tool—it's a pattern for building intelligent systems. The planning layer, the delegation model, the context management, the quality control—these aren't specific to policy research. They're fundamental architectural principles for any complex AI workflow.

As you apply these patterns to your own projects, remember: the goal isn't to replace human intelligence, but to augment it. The best AI systems are those that handle complexity strategically, maintain context persistently, leverage specialization effectively, and iterate toward quality relentlessly.

That's what DeepAgents enables. That's what you now know how to build.

The future of AI isn't about smarter models alone—it's about smarter architectures. And you're now equipped to create them.


Want to go deeper?

Share your builds: I'd love to see what you create with DeepAgents. Tag your projects on social media or share them in the LangChain community.

Now go build something remarkable. 🚀
