WonderLab
One Open Source Project a Day (No.33): DeerFlow - ByteDance's SuperAgent Execution Engine

Introduction

"LLMs shouldn't just talk about actions — they should actually execute them."

This is article No.33 in the "One Open Source Project a Day" series. Today's project is DeerFlow (GitHub).

Most AI Agent frameworks share a hidden limitation: they're good at suggesting, but not at doing. Generating code is easy — but actually running it, handling errors, iterating, and producing a deliverable artifact? That's the real challenge for complex research and automation tasks.

DeerFlow (Deep Exploration and Efficient Research Flow) is ByteDance's open-source answer to this problem. Completely rewritten in v2.0, it's no longer just a deep research framework — it's a general-purpose SuperAgent execution engine that runs code in real sandboxes, orchestrates parallel sub-agents, and handles tasks that take minutes to hours — from a single prompt all the way to a research report, a webpage, or a working program.

It hit #1 on GitHub Trending shortly after launch and now sits at 59k+ Stars, making it one of the most watched open-source projects in the AI Agent space.

What You'll Learn

  • DeerFlow's core positioning and the v1 → v2 architectural evolution
  • The SuperAgent execution flow: Lead Agent + parallel sub-agent orchestration
  • How sandbox-isolated code execution works and its security design
  • The Skills-as-Markdown extensibility mechanism
  • Multi-model support strategy and Chinese model optimization

Prerequisites

  • Basic understanding of LLM API calls (OpenAI-compatible interface format)
  • Some Docker experience (recommended for deployment)
  • Python basics (optional, for customization)

Project Background

What Is It?

DeerFlow stands for Deep Exploration and Efficient Research Flow, open-sourced by ByteDance. The project has gone through two major versions:

  • v1.x: Positioned as a deep research framework — multi-round search, web scraping, and consolidated report generation
  • v2.0 (February 2026, complete rewrite): Elevated to a general-purpose SuperAgent execution engine, introducing real sandbox code execution and a much broader range of supported task types

v2.0 shares no code with v1.x — it's a ground-up architectural rebuild, marking the project's transition from a "research assistance tool" to a "production-grade Agent execution engine."

About the Team

  • Organization: ByteDance (official open-source project)
  • Nature: Community-driven, led by ByteDance engineers, accepts external contributions
  • Release Timeline: v1.0 in early 2025; v2.0 released February 2026
  • Milestone: Hit #1 on GitHub Trending on February 28, 2026

Project Stats

  • ⭐ GitHub Stars: 59,200+
  • 🍴 Forks: 7,500+
  • 🐛 Open Issues: ~365
  • 📄 License: MIT
  • 🔄 Active Branches: main (v2.x), main-1.x (v1.x maintenance)

Key Features

Core Purpose

DeerFlow's fundamental value proposition is making AI Agents actually do things rather than just talk about things:

| Capability | Traditional Agent Frameworks | DeerFlow v2.0 |
| --- | --- | --- |
| Code Execution | Generates code (doesn't run it) | Real execution in isolated sandbox |
| Task Duration | Seconds to minutes | Minutes to hours |
| Task Decomposition | Sequential execution | Parallel sub-agent orchestration |
| Output Type | Text suggestions | Real deliverables: files, pages, programs |
| Context Limits | Bound by single model window | Sub-agent divide-and-conquer |

Use Cases

  1. Deep Research Reports

    • Given a research topic, automatically performs multi-round search, web scraping, and data synthesis to produce a structured report
  2. Code Generation & Validation

    • From requirements to a working program — real execution and debugging in the sandbox, iterating until it works
  3. Data Analysis & Visualization

    • Upload a data file; the Agent writes analysis scripts, generates charts, and outputs a ready-to-use analytics report
  4. Web Development

    • Describe what you need; the Agent writes HTML/CSS/JS, validates it in the sandbox, and delivers a complete webpage
  5. Content Creation

    • Automatically generate slides, podcast summaries, technical blog posts, and other content formats

Quick Start

Recommended (Docker):

# Clone the repository
git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

# Generate configuration file
make config

# Edit config — fill in your model API keys
# Supports OpenAI, Claude, DeepSeek, Qwen, Doubao, etc.
vim config.yaml

# Initialize and start
make docker-init
make docker-start

# Access the web UI
# http://localhost:2026

Local development mode:

# Check environment requirements (Python 3.12+, Node.js 22+)
make check

# Install dependencies (uv for Python, pnpm for JS)
make install

# Start development servers
make dev

Built-in Skills

DeerFlow ships with several production-ready skills out of the box:

| Skill | Functionality |
| --- | --- |
| Deep Research | Multi-round search + web scraping + consolidated research report |
| Report Generation | Formatted report generation |
| Slide Creation | Presentation slide creation |
| Web Page Development | Full webpage development |
| GitHub Deep Research | In-depth GitHub repository analysis |

How It Compares

| Dimension | DeerFlow | AutoGen | CrewAI | Manus |
| --- | --- | --- | --- | --- |
| Real Code Execution | ✅ Sandbox isolated | — | — | ✅ (commercial) |
| Open Source | MIT | MIT | MIT | ❌ Closed |
| Chinese Model Support | ✅ First-class | Average | Average | — |
| Production Validated | ✅ ByteDance | — | — | — |
| Skills Extensibility | ✅ Markdown | Python class | Python class | — |
| Deployment Complexity | Medium (Docker) | Low | Low | No self-hosting |

Why choose DeerFlow?

  • Validated in ByteDance's production environments — reliability is battle-tested
  • Sandbox execution produces actual deliverables, not just text suggestions
  • First-class support for DeepSeek, Qwen, Doubao, and other Chinese models
  • Skills-as-Markdown has the lowest extension barrier in its class

Deep Dive

System Architecture

DeerFlow supports two deployment modes, sharing the same frontend but differing in backend process count:

Standard Mode — Recommended for production
┌─────────────────────────────────────┐
│  Nginx (Reverse Proxy + Routing)    │
├──────────────┬──────────────────────┤
│  Frontend    │  Gateway API         │
│  (Web UI)    │  (REST + WebSocket)  │
│              ├──────────────────────┤
│              │  LangGraph Server    │
│              │  (Standalone Agent   │
│              │   Runtime)           │
└──────────────┴──────────────────────┘

Gateway Mode — Experimental, lighter deployment
┌─────────────────────────────────────┐
│  Nginx                              │
├──────────────┬──────────────────────┤
│  Frontend    │  Gateway API         │
│              │  (Embedded Agent     │
│              │   Runtime)           │
└──────────────┴──────────────────────┘

Core Execution Flow

DeerFlow's agent orchestration is a three-tier structure:

User Input (Prompt)
        │
        ▼
┌───────────────────────────────────────┐
│          Lead Agent                   │
│  Task decomposition → Sub-task plan   │
│  → Result aggregation                 │
└──────┬─────────────┬─────────────┬────┘
       │             │             │
       ▼             ▼             ▼
  Researcher     Coder Agent    Reporter
  Sub-Agent      (Code Gen +    Sub-Agent
  (Search/Crawl)  Sandbox Exec) (Report Synthesis)
       │             │
       ▼             ▼
  Search APIs    Docker Sandbox
  Web Scraping   bash / Python
                 File System

The Lead Agent is the system's "brain", responsible for:

  1. Understanding task intent and breaking it into parallelizable sub-tasks
  2. Assigning each sub-task to the appropriate Sub-Agent
  3. Aggregating results from all Sub-Agents into the final output
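The three responsibilities above form a fan-out / fan-in pattern. It can be sketched with plain asyncio (a hypothetical illustration, not DeerFlow's actual code; the sub-agent functions are stand-ins for real LLM-backed agents):

```python
import asyncio

# Hypothetical stand-ins for DeerFlow's sub-agents, which in reality
# call LLMs, search APIs, and the sandbox.
async def researcher(task: str) -> str:
    return f"research findings for: {task}"

async def coder(task: str) -> str:
    return f"code output for: {task}"

SUB_AGENTS = {"research": researcher, "code": coder}

async def lead_agent(prompt: str) -> str:
    # 1. Decompose the prompt into sub-tasks (an LLM call in the real system).
    sub_tasks = [("research", f"{prompt} - background"),
                 ("code", f"{prompt} - prototype")]
    # 2. Dispatch each sub-task to the matching sub-agent; all run concurrently.
    results = await asyncio.gather(
        *(SUB_AGENTS[kind](task) for kind, task in sub_tasks))
    # 3. Aggregate the sub-agent results into one deliverable.
    return "\n".join(results)

print(asyncio.run(lead_agent("benchmark vector databases")))
```

The key property is step 2: `asyncio.gather` runs every sub-agent at once, which is what lets long tasks finish in minutes instead of hours of sequential tool calls.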

Sandbox Execution

The sandbox is one of v2.0's most important technical breakthroughs. Real code isolation is achieved through Docker containers:

# Simplified: Coder Sub-Agent's sandbox invocation
async def execute_in_sandbox(code: str, language: str = "python") -> ExecutionResult:
    """
    Execute code inside a Docker container, isolated from the host
    """
    container = await docker_client.containers.create(
        image="deerflow-sandbox:latest",
        command=["python", "-c", code],
        volumes={
            "/mnt/user-data/workspace": {"bind": "/workspace", "mode": "rw"},
            "/mnt/user-data/outputs": {"bind": "/outputs", "mode": "rw"},
        },
        network_mode="bridge",  # Restricted network access
        mem_limit="2g",         # Memory cap
        cpu_period=100000,
        cpu_quota=50000,        # CPU cap at 50%
    )

    await container.start()
    result = await container.wait()   # block until the code finishes; returns {"StatusCode": ...}
    stdout, stderr = await container.logs()

    return ExecutionResult(
        stdout=stdout.decode(),
        stderr=stderr.decode(),
        exit_code=result["StatusCode"]
    )

Sandbox filesystem layout:

Inside the Docker container:
├── /mnt/user-data/uploads    # User-uploaded files (read-only)
├── /mnt/user-data/workspace  # Agent working directory (read-write)
└── /mnt/user-data/outputs    # Final output artifacts (read-write)

This design guarantees:

  • Security isolation: Agent-generated code cannot access sensitive host files
  • Reproducibility: Every task runs in a clean container, avoiding state contamination
  • Real deliverables: Output files persist to the host machine, immediately usable by the user

Skills as Markdown

The Skills system is the crown jewel of DeerFlow's extensibility design. Unlike other frameworks that define Skills as Python classes, DeerFlow uses Markdown files — dramatically lowering the barrier to extension:

.claude/skills/deep-research/
├── SKILL.md              # Skill description, trigger conditions, execution steps
└── references/
    ├── search-strategy.md    # Search strategy specifications
    ├── report-template.md    # Report template
    └── quality-checklist.md  # Quality checklist

A typical SKILL.md structure:

# Deep Research Skill

## Trigger Conditions
Activate when the user needs to conduct deep research on a topic,
competitive analysis, or industry investigation.

## Execution Steps
1. Understand the research objective; break it into 3-5 key questions
2. Perform multi-round searches per question (minimum 3 rounds, diverse angles)
3. Crawl high-quality source pages; extract key information
4. Synthesize findings; identify consensus and contradictions
5. Generate structured output using the report template

## Output Format
- Executive summary (< 200 words)
- Deep-dive sections (500-1000 words each)
- Key findings summary
- Source reference list

## Load Resources
- load_skill_resource("references/search-strategy.md")
- load_skill_resource("references/report-template.md")

This design means non-engineers can write and customize skills — all you need is Markdown, no Python code required.
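Since a skill is plain Markdown, a loader only needs to split the file on its `## ` headings. A minimal sketch of that idea (hypothetical; DeerFlow's actual loader also handles trigger matching and `load_skill_resource()` resolution):

```python
# Parse a SKILL.md file into sections keyed by its "## " headings.
# Hypothetical sketch, not DeerFlow's real loader.

def parse_skill(markdown: str) -> dict[str, str]:
    sections: dict[str, str] = {}
    current = "title"
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()   # start a new section
            sections[current] = ""
        elif not line.startswith("# "):  # skip the top-level title line
            sections[current] = sections.get(current, "") + line + "\n"
    return {k: v.strip() for k, v in sections.items()}

skill = parse_skill("""# Deep Research Skill

## Trigger Conditions
Activate for deep research requests.

## Execution Steps
1. Break the objective into key questions
2. Search, crawl, synthesize, report
""")
print(skill["Trigger Conditions"])
```

Each section then becomes part of the Agent's prompt context, which is why editing skill behavior requires nothing more than editing prose.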

LangGraph Integration

DeerFlow chose LangGraph as the Agent orchestration layer rather than building its own state machine. LangGraph's key advantages:

  1. Explicit graph structure: Task dependencies and control flow are modeled as a stateful graph (cycles included) that is easy to visualize
  2. Checkpoints: Supports Human-in-the-Loop — pause and wait for human approval at critical nodes
  3. Persistent State: Cross-session task state saving supports interruption and resumption of long-running tasks
  4. Parallel Execution: Native parallel node execution means Sub-Agents can truly run concurrently

# DeerFlow's LangGraph workflow (simplified)
from langgraph.graph import StateGraph, END
from typing import TypedDict

class ResearchState(TypedDict):
    query: str
    sub_tasks: list[str]
    search_results: dict
    code_outputs: dict
    final_report: str

workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("planner", lead_agent_plan)
workflow.add_node("researcher", researcher_agent)
workflow.add_node("coder", coder_agent)
workflow.add_node("reporter", reporter_agent)

# Define edges (execution order)
workflow.set_entry_point("planner")
workflow.add_edge("planner", "researcher")
workflow.add_edge("planner", "coder")       # Parallel
workflow.add_edge("researcher", "reporter")
workflow.add_edge("coder", "reporter")
workflow.add_edge("reporter", END)

app = workflow.compile()
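Points 2 and 3, checkpoints and persistent state, boil down to saving the graph state after each node so a long-running task can be interrupted and resumed. A toy illustration of that idea (deliberately not LangGraph's checkpointer API):

```python
import json
import tempfile
from pathlib import Path

# Toy persistence sketch; NOT LangGraph's actual checkpointer API.
# State is written to disk after every node, so a long task can be
# interrupted and resumed without redoing completed work.

def run_with_checkpoints(nodes, state, path):
    path = Path(path)
    if path.exists():                        # resume from the last checkpoint
        saved = json.loads(path.read_text())
        state, done = saved["state"], saved["done"]
    else:
        done = []
    for name, fn in nodes:
        if name in done:
            continue                         # node already ran in a prior session
        state = fn(state)
        done.append(name)
        path.write_text(json.dumps({"state": state, "done": done}))
    return state

nodes = [
    ("planner", lambda s: {**s, "sub_tasks": ["research", "report"]}),
    ("reporter", lambda s: {**s, "final_report": "done"}),
]
ckpt = Path(tempfile.mkdtemp()) / "checkpoint.json"
print(run_with_checkpoints(nodes, {"query": "demo"}, ckpt))
```

LangGraph generalizes the same idea: a checkpointer records the typed state at every graph step, which is also what makes Human-in-the-Loop pauses possible.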

Multi-Model Strategy

DeerFlow is model-agnostic, with recommended selection criteria:

  • Long context: 100k+ tokens (for processing large search results and codebases)
  • Reasoning: Complex multi-step reasoning capability
  • Tool calling: Reliable function calling / tool use
  • Recommended Chinese models: Doubao-Seed-2.0-Code (ByteDance in-house), DeepSeek v3.2, Kimi 2.5

Configuration (config.yaml):

# Any OpenAI-compatible endpoint works
llm:
  provider: openai_compatible
  base_url: "https://ark.cn-beijing.volces.com/api/v3"
  api_key: "${DOUBAO_API_KEY}"
  model: "doubao-seed-2-0-code-250605"
  max_tokens: 16384

# Or use DeepSeek
llm:
  provider: openai_compatible
  base_url: "https://api.deepseek.com"
  api_key: "${DEEPSEEK_API_KEY}"
  model: "deepseek-v3"
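All of these providers work because they expose the same OpenAI-style chat-completions wire format. As a sketch of what that interface looks like on the wire (stdlib only; the URL, key, and model below are placeholders, and the request is built but not sent):

```python
import json
import urllib.request

# Build (but do not send) an OpenAI-compatible chat request.
# base_url, api_key, and model are placeholders, not real credentials.

def build_chat_request(base_url, api_key, model, prompt):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16384,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("https://api.deepseek.com", "sk-...", "deepseek-v3", "hi")
print(req.full_url)
```

Because every provider accepts this shape, switching from Doubao to DeepSeek in DeerFlow is only a `base_url` / `model` change in `config.yaml`, with no code changes.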

Resources

Related Projects

  • LangGraph: The Agent orchestration framework powering DeerFlow's backend
  • LangSmith / Langfuse: Observability tracing integrations
  • OpenDeepResearch (OpenAI): Comparable competitor in the deep research space

Summary

Key Takeaways

  1. Positioning leap: From v1's deep research tool to v2's general SuperAgent execution engine — the core jump is real execution capability, not just text generation
  2. Docker sandbox: Real isolated code execution means Agents produce actual deliverables, not suggestions
  3. Sub-agent parallelism: The Lead Agent + Sub-Agent architecture breaks past single-model context limits, enabling genuinely complex long-running tasks
  4. Skills-as-Markdown: Lowest-barrier extensibility in its class — non-engineers can customize Agent behavior
  5. Chinese model first-class support: First-class support for Doubao, DeepSeek, and Qwen makes it the natural choice for developers in China

Who Should Use This

  • Researchers / Analysts: Knowledge workers who need to aggregate and synthesize large amounts of information
  • AI Engineers: Development teams building production-grade Agent applications that need a reliable execution engine
  • Python Developers: Practitioners looking to learn LangGraph and multi-agent orchestration through a real-world codebase
  • Enterprise Tech Teams: Teams exploring AI automation of complex tasks — research, reporting, code generation

Visit my personal site for more useful knowledge and interesting products
