Andrew

Posted on Jun 27 • Originally published at andrew.ooo

DeerFlow 2.0 Review: ByteDance's Open SuperAgent Harness

#deerflow #bytedance #superagent #aiagents


TL;DR

DeerFlow 2.0 is ByteDance's open-source "SuperAgent harness" — a long-horizon agent runtime that orchestrates sub-agents, sandboxes, persistent memory, and an extensible skill system to run tasks that take minutes to hours. It hit 74,960 stars with 3,329 new this week, and held the #1 GitHub Trending spot on February 28th, 2026 after the v2 launch.

Ground-up rewrite from v1 — v2 shares zero code with the original Deep Research framework
SuperAgent harness — not just another LangChain wrapper; a full execution environment with Docker sandbox, persistent filesystem, run state, and a Gateway API
Sub-agents on demand — spawn isolated sub-agents with their own memory, tools, and sandbox; coordinate via a message gateway
Skill system — drop-in extensions for new capabilities (research, code, design, custom domain logic)
Bring-your-own LLM — Doubao-Seed-2.0-Code, DeepSeek v3.2, Kimi 2.5 recommended; also Claude Sonnet 4.6 via Claude Code OAuth, GPT-5 via Codex CLI, OpenRouter, vLLM
Three sandbox modes — local execution, Docker containers, or Kubernetes pods via a provisioner service
LangGraph-compatible API — Gateway exposes /api/langgraph/* paths so existing LangGraph clients work unchanged
Multilingual docs — English, 中文, 日本語, Français, Русский READMEs maintained in-tree
MCP server, IM channels, LangSmith + Langfuse tracing out of the box
Apache 2.0 — but the sandbox provisioner pulls upstream BytePlus infra by default; air-gap setups need manual config

If you've been building agent workflows on top of raw LangGraph or AutoGen and hit the wall where "who owns the sandbox? Who owns memory? Who owns the message bus?" becomes a six-month engineering project — DeerFlow has already made those decisions. Whether you agree with them is the question.

Quick Reference

Field	Value
Repo	bytedance/deer-flow
Website	deerflow.tech
Stars	74,960 (3,329 this week)
Forks	10,110
License	Apache 2.0
Language	Python (backend) + TypeScript (frontend)
Trending	#1 GitHub Trending on launch day, Feb 28 2026
Install	`git clone … && make setup` (2-minute wizard)
Deployment	Docker (recommended), local dev, or Kubernetes
Default port	`http://localhost:2026`
Built by	ByteDance (TikTok parent company)

What Problem DeerFlow Solves

If you've spent any time trying to build a "real" agent system in 2026, you know the pattern: start with a single LLM call. Add tool calling. Add memory — which kind? Conversation buffer? Vector store? Knowledge graph? Add a sandbox — host or Docker or Firecracker? Add sub-agents — now you need a message bus, run state, and somebody to own cancellation. Six months later you have a worse version of LangGraph + Open Interpreter + AutoGen glued together with duct tape.

DeerFlow 2.0's pitch is: stop building the harness. Use this one. We made the decisions.

Sandbox → Docker by default, with a clean abstraction so you can swap in Kubernetes or local execution.
Memory → long-term persistent memory with sample fixtures, exposed in a Settings UI, file-backed, reviewable.
Sub-agents → first-class. Each gets its own filesystem, context window, and tool set; coordinated by a message gateway.
Skills → the extension primitive. Skills live in skills/ (configurable via DEER_FLOW_SKILLS_PATH), and the agent picks them up automatically.
Models → BYO. YAML config supports LangChain providers, OpenRouter, the OpenAI Responses API, vLLM, and CLI-backed providers like Codex CLI and Claude Code OAuth.

Quick Start

The maintainers' "one-line agent setup" is genuinely cute — they wrote a prompt designed to be pasted into your coding agent of choice:

Help me clone DeerFlow if needed, then bootstrap it for local development by
following https://raw.githubusercontent.com/bytedance/deer-flow/main/Install.md

For humans:

# 1. Clone
git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

# 2. Run the setup wizard (≈2 min, interactive)
make setup

# 3. Verify
make doctor

# 4. Start (Docker recommended)
make docker-init   # Pulls sandbox image — once
make docker-start  # Starts services

# 5. Open (the `open` command is macOS; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:2026

The make setup wizard walks you through:

LLM provider (Doubao, DeepSeek, Kimi, OpenAI, Anthropic, OpenRouter, vLLM…)
Optional web search (Tavily, BytePlus InfoQuest, or skip)
Execution preferences — sandbox mode, bash access, file-write tools
Writes .env with keys and a minimal config.yaml

If you'd rather edit YAML directly, make config copies the full template.

Configuring Models

This is where DeerFlow earns the "harness" label. The config.yaml model block handles four provider patterns cleanly: standard OpenAI/Anthropic, OpenAI-compatible gateways (OpenRouter), the OpenAI Responses API for GPT-5 reasoning, and local vLLM with reasoning support.

models:
  - name: gpt-4o
    use: langchain_openai:ChatOpenAI
    model: gpt-4o
    api_key: $OPENAI_API_KEY

  - name: openrouter-gemini-2.5-flash
    use: langchain_openai:ChatOpenAI
    model: google/gemini-2.5-flash-preview
    api_key: $OPENROUTER_API_KEY
    base_url: https://openrouter.ai/api/v1

  - name: qwen3-32b-vllm
    use: deerflow.models.vllm_provider:VllmChatModel
    model: Qwen/Qwen3-32B
    base_url: http://localhost:8000/v1
    supports_thinking: true
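To make the `use: module:Class` convention concrete, here is a minimal sketch of how a loader could turn one of these entries into an object. This is not DeerFlow's actual loader: `resolve_class`, `expand_env`, and `build_model` are hypothetical names, and passing every remaining key straight to the constructor is an assumption (the real harness presumably handles harness-only keys such as `supports_thinking` itself).

```python
import importlib
import os


def resolve_class(spec: str):
    """Turn a 'package.module:ClassName' string into the class object."""
    module_name, _, class_name = spec.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"expected 'module:Class', got {spec!r}")
    return getattr(importlib.import_module(module_name), class_name)


def expand_env(value):
    """Replace a '$NAME' string with the value of environment variable NAME."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return value


def build_model(entry: dict):
    """Instantiate one models entry (hypothetical loader, not DeerFlow's)."""
    cls = resolve_class(entry["use"])
    # Assumption: everything except 'name' and 'use' is a constructor argument.
    kwargs = {k: expand_env(v) for k, v in entry.items() if k not in ("name", "use")}
    return cls(**kwargs)


if __name__ == "__main__":
    # Standard-library stand-in so the sketch runs without LangChain installed.
    print(build_model({"name": "demo", "use": "datetime:timedelta", "hours": 2}))
```

The same `build_model` call would work on the `gpt-4o` entry above if `langchain_openai` is installed and `OPENAI_API_KEY` is set, since that entry only carries keys `ChatOpenAI` accepts.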

The most novel piece is the set of CLI-backed providers. You can plug DeerFlow into your existing Codex CLI or Claude Code OAuth so it reuses your subscription quota instead of billing a separate API key:

  - name: claude-sonnet-4.6
    use: deerflow.models.claude_provider:ClaudeChatModel
    model: claude-sonnet-4-6
    supports_thinking: true

Claude Code reads ~/.claude/.credentials.json; Codex CLI reads ~/.codex/auth.json. On macOS, Claude Code auth sometimes needs an explicit eval "$(python3 scripts/export_claude_code_oauth.py --print-export)" — a thoughtful detail that tells you the team actually ran this on a Mac.
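Before pointing a CLI-backed provider at your machine, it can help to confirm the credential files are actually there. The sketch below checks only the two paths named above; `find_cli_credentials` is a hypothetical helper, not part of DeerFlow, and it assumes both files are JSON, as their extensions suggest.

```python
import json
from pathlib import Path
from typing import Optional

# Relative paths as given in the article; the home directory is a parameter
# so the check can be pointed at a test directory.
CREDENTIAL_FILES = {
    "claude-code": Path(".claude") / ".credentials.json",
    "codex-cli": Path(".codex") / "auth.json",
}


def find_cli_credentials(home: Optional[Path] = None) -> dict:
    """Report which CLI credential files exist and parse as JSON."""
    home = home or Path.home()
    found = {}
    for provider, relative in CREDENTIAL_FILES.items():
        try:
            json.loads((home / relative).read_text(encoding="utf-8"))
            found[provider] = True
        except (OSError, ValueError):
            # Missing, unreadable, or not valid JSON.
            found[provider] = False
    return found


if __name__ == "__main__":
    for provider, ok in find_cli_credentials().items():
        print(f"{provider}: {'found' if ok else 'missing'}")
```

A "missing" result for Claude Code on macOS is not necessarily fatal, since that is the case the export script mentioned above exists for.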

Architecture: What's Actually Running

When you run make docker-start, you get:

Gateway API — single Python process owning run state, the stream bridge, and the LangGraph-compatible HTTP surface. Single worker by default (GATEWAY_WORKERS=1).
Frontend — Next.js UI at :2026, hot-reload in dev mode
Sandbox provisioner — only started when config.yaml uses provisioner mode (sandbox.use: deerflow.community.aio_sandbox:AioSandboxProvider)
Sub-agent containers — spawned on demand inside the configured sandbox

The single-worker constraint deserves attention. From the README:

The Gateway holds run state (RunManager and the stream bridge) in process, so production defaults to a single Gateway worker (GATEWAY_WORKERS=1). Raising the worker count without a shared cross-worker stream bridge — which is not yet available — breaks run cancellation, SSE reconnects, request de-duplication, and IM channels.

Translation: scale vertically, not horizontally. Throw more CPU and RAM at one Gateway worker; don't try to run N replicas behind a load balancer. This is fine for a team of 5–50 developers. It's a hard ceiling for "give DeerFlow to 10,000 users."
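A toy model shows why in-process run state and multiple workers don't mix. The `RunManager` class here is a hypothetical stand-in (it borrows only the name from the README quote), and the round-robin routing stands in for whatever load balancer sits in front of the workers.

```python
class RunManager:
    """Hypothetical stand-in for run state held in one worker's memory."""

    def __init__(self):
        self._runs = {}

    def start(self, run_id: str) -> None:
        self._runs[run_id] = "running"

    def cancel(self, run_id: str) -> bool:
        if run_id not in self._runs:
            return False  # this worker has never heard of the run
        self._runs[run_id] = "cancelled"
        return True


def simulate(worker_count: int, requests: int = 4) -> list:
    """Start a run on worker 0, then route cancel requests round-robin."""
    workers = [RunManager() for _ in range(worker_count)]
    workers[0].start("run-1")
    return [workers[i % worker_count].cancel("run-1") for i in range(requests)]


if __name__ == "__main__":
    print(simulate(1))  # [True, True, True, True]
    print(simulate(2))  # [True, False, True, False]
```

With one worker every cancel request finds the run. With two, half of them land on a process that never saw it, which is the same failure shape the README describes for cancellation, SSE reconnects, and de-duplication.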

Skills & Sub-Agents

The skill system is the part that feels most differentiated from competitors like AutoGen or CrewAI.

Skills live in skills/ by default (override via DEER_FLOW_SKILLS_PATH)
A skill bundles a system prompt, a tool list, optional sub-agent declarations, and a small set of guardrails
The main agent doesn't call skills; it picks them up — meaning the model decides when a skill is relevant based on the user goal and the skill's frontmatter description
This is the same pattern Anthropic shipped as "AgentSkills" (see Anthropic's own agentskills.io spec — DeerFlow's design is clearly informed by it)
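As a rough illustration of "picked up, not called", the sketch below scans a skills directory and collects each skill's name and description, which is the information a model would need to decide relevance. The `<skill>/SKILL.md` layout and the `name` / `description` frontmatter keys are assumptions borrowed from the Agent Skills convention; DeerFlow's own loader and file layout may differ, and the tiny parser handles only flat `key: value` lines.

```python
import os
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Read flat 'key: value' pairs between leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover_skills(root=None) -> dict:
    """Map skill name to description for every <root>/<skill>/SKILL.md."""
    # DEER_FLOW_SKILLS_PATH and the 'skills' default come from the article;
    # the SKILL.md file name is an assumption.
    root = Path(root or os.environ.get("DEER_FLOW_SKILLS_PATH", "skills"))
    skills = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        skills[meta.get("name", skill_file.parent.name)] = meta.get("description", "")
    return skills


if __name__ == "__main__":
    for name, description in discover_skills().items():
        print(f"{name}: {description}")
```

The resulting name-to-description map is what would be placed in the model's context, leaving the choice of skill to the model rather than to routing code.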

Sub-agents follow a similar logic:

Declared in a skill or at the harness level
Each gets isolated memory and a scoped sandbox path
The message gateway forwards inbound/outbound messages, so you can wire IM channels (Lark, Slack, Discord) to a sub-agent directly
Cancellation propagates from the parent run — kill a top-level run and all sub-agents die cleanly
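The cancellation behavior in the last point can be sketched with plain asyncio. This is an analogy rather than DeerFlow's implementation: `sub_agent`, `parent_run`, and `demo` are hypothetical names, and real sub-agents run in sandboxes rather than as coroutines in one process.

```python
import asyncio


async def sub_agent(name: str, log: list) -> None:
    """Pretend to work until cancelled, then record the shutdown."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append(f"{name} cancelled")
        raise  # re-raise so the cancellation keeps propagating


async def parent_run(log: list) -> None:
    """Top-level run that waits on two sub-agents."""
    children = [asyncio.create_task(sub_agent(n, log)) for n in ("researcher", "coder")]
    try:
        # Cancelling a task that is awaiting gather() also cancels the children.
        await asyncio.gather(*children)
    except asyncio.CancelledError:
        log.append("parent cancelled")
        raise


async def demo() -> list:
    log = []
    run = asyncio.create_task(parent_run(log))
    await asyncio.sleep(0.01)  # give the sub-agents time to start
    run.cancel()               # "kill the top-level run"
    try:
        await run
    except asyncio.CancelledError:
        pass
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running it records a shutdown line for both sub-agents and for the parent, with a single `cancel()` call at the top.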

The "hours-long task" promise is real if you give it the right model (Doubao-Seed-2.0-Code, DeepSeek v3.2, or Kimi 2.5 per the README) and a generous sandbox. With smaller models, it still works, but the failure modes get more interesting.

Sandbox Modes & Sizing

Three sandbox modes ship in-tree:

Local execution — runs in your shell, no isolation. Quick demos only; treat as "trust every prompt."
Docker (default) — solid isolation, easy debugging, what 95% of evaluators will use.
Kubernetes via provisioner — real isolation and real scale; needs a provisioner service and kubeconfig.

ByteDance is unusually honest about hardware needs:

Deployment	Starting point	Recommended
Local eval (`make dev`)	4 vCPU / 8 GB / 20 GB SSD	8 vCPU / 16 GB
Docker dev (`make docker-start`)	4 vCPU / 8 GB / 25 GB SSD	8 vCPU / 16 GB
Long-running server (`make up`)	8 vCPU / 16 GB / 40 GB SSD	16 vCPU / 32 GB

And from the README: "These numbers cover DeerFlow itself. If you also host a local LLM, size that service separately." If you're running a 32B model locally and the harness on the same box, expect a beefy workstation. This isn't a Raspberry Pi project.
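If you want to sanity-check a host against the table before installing, the "Starting point" column reduces to a small lookup. The numbers are copied from the table above; `Spec` and `shortfalls` are hypothetical names, and the host figures are passed in by hand because the standard library has no portable way to read total RAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    vcpu: int
    ram_gb: int
    disk_gb: int


# "Starting point" column from the sizing table, keyed by make target.
STARTING_POINT = {
    "dev": Spec(4, 8, 20),
    "docker-start": Spec(4, 8, 25),
    "up": Spec(8, 16, 40),
}


def shortfalls(target: str, host: Spec) -> list:
    """List every resource where the host is below the starting point."""
    need = STARTING_POINT[target]
    problems = []
    if host.vcpu < need.vcpu:
        problems.append(f"vCPU: have {host.vcpu}, need {need.vcpu}")
    if host.ram_gb < need.ram_gb:
        problems.append(f"RAM: have {host.ram_gb} GB, need {need.ram_gb} GB")
    if host.disk_gb < need.disk_gb:
        problems.append(f"SSD: have {host.disk_gb} GB, need {need.disk_gb} GB")
    return problems


if __name__ == "__main__":
    print(shortfalls("up", Spec(vcpu=4, ram_gb=8, disk_gb=100)))
```

An empty list means the host clears the starting point for that target. As the README quote says, none of this accounts for a locally hosted LLM.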

The README also has an explicit security section titled "Improper Deployment May Introduce Security Risks," which is a polite way of saying that this is a Docker-out-of-Docker shaped object with file-write tools and bash access. Don't expose it to the public internet without auth.

Community Reactions

Trendshift ranked DeerFlow #1 in repository momentum after v2.
VibeCoding: "On February 28, 2026, ByteDance quietly open-sourced DeerFlow. Within days it hit #1 trending on GitHub. Within weeks it crossed 60,000 stars."
The Dev.to writeup zeroed in on the SuperAgent label as "doing real work" — not marketing fluff.
Awesome Agents' verdict was less euphoric: "a powerful open-source agent harness that executes long-horizon tasks inside Docker sandboxes — impressive engineering, but not a turnkey solution."

That last verdict is the most accurate. DeerFlow gives you the harness. It does not give you a finished product. You still bring the model, the skills, the integration code, and the patience to wait for hour-long runs.

Honest Limitations

After working through the README and tracking community feedback, here are the rough edges to budget for:

Single-worker Gateway — vertical scale only until cross-worker state lands
First-run sandbox image pull is multi-GB; expect a slow make docker-init on the first try
Default sandbox provisioner pulls from BytePlus infra — if you're behind a corporate firewall or air-gapped, you'll need to set UV_INDEX_URL and NPM_REGISTRY manually
Windows local dev requires Git Bash — native cmd.exe/PowerShell shells aren't supported, and WSL is "not guaranteed" because some scripts rely on Git for Windows utilities like cygpath
CORS + CSRF defaults assume same-origin — split-origin frontends need GATEWAY_CORS_ORIGINS set explicitly
Memory is file-backed by default — fine for a single workstation, awkward for a team; you'll want to mount a shared volume or move to a managed memory provider once usage grows
Recommended models are Chinese-origin (Doubao, DeepSeek, Kimi). Western teams will likely substitute Claude/GPT-5 — which the harness supports, but the "feels right" tuning was clearly done against the recommended set
Skill system is powerful but underdocumented. The memory documentation (backend/docs/MEMORY_SETTINGS_REVIEW.md) is great; the equivalent for skills is thinner

None of these are deal-breakers. They're a tax you pay to skip six months of harness engineering.

DeerFlow vs. The Field

	DeerFlow 2.0	LangGraph (vanilla)	AutoGen	CrewAI	Claude Code
Long-horizon harness	✅ Built-in	❌ DIY	Partial	Partial	✅ Different model
Docker sandbox	✅ Default	❌ DIY	❌ DIY	❌ DIY	✅ Different model
Sub-agents	✅ First-class	✅ Graph nodes	✅ AssistantAgents	✅ Crews	✅ Sub-agents
Persistent memory	✅ File + Settings UI	Partial	Partial	Partial	✅ Different model
Skill plug-ins	✅ Drop-in	❌	❌	❌	✅ AgentSkills
MCP server	✅	Via integration	Via integration	Via integration	✅ Native
IM channels	✅ Lark, Slack, Discord	❌	❌	❌	❌
Best for	Self-hosted long-horizon agents	Custom workflows you fully own	Multi-agent conversations	Role-based teams	Coding tasks on your machine

Claude Code is in a different category — it's a coding agent for individual developers, not a self-hostable harness. DeerFlow is the self-hostable harness that talks to Claude Code (via OAuth) so you can hand it long-horizon goals from inside your own infra.

When to Use DeerFlow

Good fit:

You want to self-host a long-horizon agent (e.g., overnight research, multi-hour codegen, scheduled report generation)
You already have a model API or a local vLLM and want to put a real harness around it
You're a small team (5–50 devs) running on one beefy server or a single-tenant Kubernetes namespace
You like the LangGraph wire format but want batteries included
You want IM channels (Lark, Slack, Discord) wired straight to an agent

Bad fit:

You need to serve 10,000 users from one deployment — the single-worker Gateway will hurt
You want a turnkey end-user product — DeerFlow is a harness, not a UX
You're allergic to Docker — local execution mode exists but it's a footgun
You need air-gapped operation today — possible, but you'll be configuring mirrors and pinning sandbox images manually

FAQ

Q: Is DeerFlow 2.0 compatible with DeerFlow 1.x configs?
No. v2 is a ground-up rewrite that shares zero code with v1. The original Deep Research framework is maintained on the main-1.x branch and still accepts contributions, but active development has moved to 2.0.

Q: Do I have to use Doubao / DeepSeek / Kimi?
No, but the README explicitly recommends them: "We strongly recommend using Doubao-Seed-2.0-Code, DeepSeek v3.2 and Kimi 2.5 to run DeerFlow." GPT-5, Claude Sonnet 4.6, Gemini 2.5, and local vLLM models all work via the YAML config. Expect to tune prompts a bit if you swap models — the harness was clearly developed against the recommended set.

Q: Can DeerFlow drive my existing Claude Code or Codex CLI?
Yes. The CLI-backed providers (deerflow.models.claude_provider:ClaudeChatModel and deerflow.models.openai_codex_provider:CodexChatModel) read your local OAuth credentials (~/.claude/.credentials.json, ~/.codex/auth.json) so DeerFlow can call them as ordinary chat models without a separate API key.

Q: Is the sandbox actually secure?
The Docker sandbox is reasonable for development. For production, use the Kubernetes provisioner mode and read the README's "Security Recommendations" section before exposing anything to the network. The local-execution mode is not sandboxed — treat it as "I trust every prompt this model will generate."

Q: Can I scale DeerFlow horizontally behind a load balancer?
Not today. The Gateway holds run state in process and there's no cross-worker stream bridge yet. Scale vertically — more CPU and RAM on one Gateway worker — or shard by team/project across multiple deployments.
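Sharding by team can be as simple as a stable hash in whatever sits in front of the deployments. This is a sketch of that idea only: `pick_deployment` is a hypothetical helper, and the deployment URLs in the example are placeholders.

```python
import hashlib


def pick_deployment(team: str, deployments: list) -> str:
    """Map a team name to one deployment with a stable hash."""
    if not deployments:
        raise ValueError("need at least one deployment")
    # sha256 rather than hash(): Python's built-in string hash is salted per
    # process, so it would route the same team differently after a restart.
    digest = hashlib.sha256(team.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(deployments)
    return deployments[index]


if __name__ == "__main__":
    # Placeholder URLs for two independent single-worker deployments.
    shards = ["http://deerflow-a.internal:2026", "http://deerflow-b.internal:2026"]
    for team in ("research", "platform", "growth"):
        print(team, "->", pick_deployment(team, shards))
```

Because run state and file-backed memory stay with the deployment that created them, changing the number of shards reassigns teams away from their existing data. An explicit team-to-deployment table is the safer choice once teams have history.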

Q: How does DeerFlow compare to LangGraph?
DeerFlow uses LangGraph's wire format and exposes a /api/langgraph/*-compatible HTTP surface, so existing LangGraph clients work unchanged. The difference is everything around the graph: DeerFlow ships the sandbox, the run manager, the message gateway, the skill loader, the memory store, and the UI. With vanilla LangGraph, you build all of that yourself.

Bottom Line

DeerFlow 2.0 is the most complete open-source SuperAgent harness available right now. It's not the easiest to deploy, and in places it's tied to BytePlus infrastructure through defaults you might not share. But it solves the actual hard problem (who owns the sandbox, the memory, the run state, the message bus) and saves you the six months you'd otherwise spend reinventing it.

If you've been writing custom LangGraph harnesses for the last year and the maintenance burden is biting, give DeerFlow a weekend. The make setup wizard is fast. Docker dev mode works on a 16 GB laptop. The architecture decisions, while opinionated, are mostly the right ones.

For larger deployments, watch the cross-worker stream bridge issue. Once that lands, DeerFlow becomes genuinely production-grade for multi-tenant agent platforms.

74K stars in four months is not a coincidence. ByteDance shipped something the open-source agent community actually needed, and they shipped it polished. That's rare.

Try it: github.com/bytedance/deer-flow
Docs: deerflow.tech
License: Apache 2.0
