clearloop for OpenWalrus

Posted on Mar 15 • Originally published at openwalrus.xyz

Built-in agents: what ships in the box

#ai #research #opensource #openwalrus

Every AI coding product calls itself "agentic." But when you actually install one, what agents ship inside it? Is there a single LLM loop with a mode switch, or are there distinct, named agents with their own tool sets and context windows?

We examined ten products — from IDE assistants to autonomous coding agents — and cataloged what each one actually ships. The findings: most products offer a single agent with multiple modes, two have converged on nearly identical subagent patterns, and only two ship full multi-agent orchestration with named roles.

What we surveyed

Ten AI coding products, all examined through their official documentation, public GitHub repositories, and vendor blog posts as of March 2026:

Claude Code (Anthropic) — CLI agent with subagent delegation
GitHub Copilot — IDE + CLI agent with built-in specialized agents
Cursor — IDE agent with modes and unnamed subagents
Windsurf (Codeium) — IDE agent with background planning
Devin (Cognition) — autonomous agent with cloud IDE
OpenHands (All Hands AI) — multi-agent framework with delegation
Aider — terminal pair programmer with dual-model pipeline
Amazon Q Developer — AWS-integrated coding agent
Jules (Google) — autonomous cloud-based coding agent
Augment Code — IDE agent + multi-agent orchestration workspace

Product by product

Claude Code

Six built-in subagents, each running in its own context window:

Subagent	Model	Purpose
Explore	Haiku (fast)	Read-only codebase search and file discovery
Plan	Inherited	Read-only research for implementation planning
General-purpose	Inherited	Complex multi-step tasks with all tools
Bash	Inherited	Terminal command execution in separate context
statusline-setup	Sonnet	Status line configuration
Claude Code Guide	Haiku	Answering questions about Claude Code features

Subagents cannot spawn other subagents — the main thread dispatches, and each subagent returns a single result. This is a deliberate one-level constraint.
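
To make the one-level constraint concrete, here is a minimal sketch of a dispatcher that refuses nested delegation. It is not Claude Code's implementation: `Dispatcher`, `Context`, `NestingError`, and the two stand-in subagents are hypothetical names we chose for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class NestingError(RuntimeError):
    """Raised when a subagent tries to delegate further."""


@dataclass
class Context:
    """An isolated context window for one unit of work."""
    depth: int                                   # 0 = main thread, 1 = subagent
    messages: List[str] = field(default_factory=list)


class Dispatcher:
    MAX_DEPTH = 1  # one level of delegation, as described above

    def __init__(self, subagents: Dict[str, Callable]):
        self.subagents = subagents

    def dispatch(self, name: str, task: str, caller: Context) -> str:
        if caller.depth >= self.MAX_DEPTH:
            raise NestingError(f"cannot dispatch {name!r}: subagents cannot spawn subagents")
        child = Context(depth=caller.depth + 1, messages=[task])  # fresh context
        return self.subagents[name](task, self, child)            # one result back


def explore(task: str, dispatcher: Dispatcher, ctx: Context) -> str:
    """Stand-in for a read-only search subagent."""
    return f"explore: {task}"


def greedy(task: str, dispatcher: Dispatcher, ctx: Context) -> str:
    """A subagent that wrongly tries to delegate; the dispatcher refuses."""
    return dispatcher.dispatch("explore", task, ctx)
```

The main thread's context never receives the subagent's intermediate messages, only the returned string, which is the context-isolation benefit the constraint is meant to protect.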

Custom agents are defined as Markdown files with YAML frontmatter, stored in .claude/agents/. Each custom agent can specify which tools it has access to, which MCP servers to use, and its own lifecycle hooks. This is the most granular built-in extensibility we found.

GitHub Copilot

Four built-in agents, automatically selected based on the task:

Agent	Purpose
Explore	Fast codebase analysis without cluttering main context
Task	Runs builds and tests. Brief summaries on success, full output on failure
Plan	Creates implementation plans by analyzing dependencies
Code Review	Surfaces genuine issues with high signal-to-noise ratio

Copilot automatically delegates to these agents and can run multiple in parallel — no manual agent selection required.
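
The Task agent's reporting rule and the parallel dispatch can be sketched in a few lines. This is an illustration under our own assumptions, not Copilot's code: `RunResult`, `summarize`, and `run_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class RunResult:
    command: str
    returncode: int
    output: str


def summarize(result: RunResult) -> str:
    """Brief summary on success, full output on failure."""
    if result.returncode == 0:
        line_count = len(result.output.splitlines())
        return f"{result.command}: ok ({line_count} lines of output omitted)"
    return f"{result.command}: FAILED (exit {result.returncode})\n{result.output}"


def run_parallel(jobs):
    """Run independent agent jobs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

The asymmetry is the point: a passing build costs the main context one line, while a failure brings back everything needed to debug it.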

Custom agents follow the same Markdown-with-YAML-frontmatter pattern as Claude Code. They are stored at .github/agents/CUSTOM-AGENT-NAME.md for repository-level agents, or in the .github-private repository for organization-wide agents. The convergence between Claude Code and Copilot on this pattern is striking: both independently arrived at the same file format for user-defined agents.

Separately, the Copilot Coding Agent (GitHub.com-based) works asynchronously on assigned issues, creating PRs from a sandboxed cloud environment.

Cursor

Four modes, not four agents:

Mode	Capabilities
Agent (default)	Most autonomous — explores, edits, runs commands, fixes errors
Ask	Read-only — searches codebase, answers questions
Manual	Direct editing of explicitly selected files
Plan	Researches codebase and creates implementation plans

Cursor's documentation mentions "default subagents for researching your codebase, running terminal commands, and executing parallel work streams," but does not name them publicly. This makes it hard to compare directly with Claude Code's named subagents.

Users can define Custom Modes with specific tool combinations and instructions. Cursor also supports Skills (via SKILL.md files) and MCP servers for external tool integration.

Windsurf

A single primary agent called Cascade, operating in four modes (Code, Chat, Plan, Arena). Arena mode runs two Cascade instances side-by-side for comparison — an evaluation feature, not agent delegation.

What's unique is the background planning agent: a separate process that "continuously refines the long-term plan while your selected model focuses on short-term actions." This planner generates automatic todo lists for complex tasks and runs alongside the main model without user prompting. No other product in our survey ships this pattern.
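
Windsurf does not publish how the planner is implemented, so the following is only a sketch of the general shape: one coroutine refines a shared todo list while another executes steps. `SharedState`, `planner`, `actor`, and the hard-coded "run tests" rule are all our own placeholders.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SharedState:
    goal: str
    todo: list = field(default_factory=list)   # long-term plan, refined by the planner
    done: list = field(default_factory=list)   # short-term actions, appended by the actor
    finished: bool = False


async def planner(state: SharedState) -> None:
    """Keeps refining the todo list while the actor works."""
    seen = 0
    while not state.finished:
        if len(state.done) > seen:              # react to the actor's progress
            seen = len(state.done)
            if "write code" in state.done and "run tests" not in state.done + state.todo:
                state.todo.append("run tests")
        await asyncio.sleep(0)                  # yield to the actor


async def actor(state: SharedState) -> None:
    """Executes whatever is next in the plan."""
    while state.todo:
        step = state.todo.pop(0)
        state.done.append(step)                 # stand-in for a model call plus tool use
        await asyncio.sleep(0)                  # yield to the planner
    state.finished = True


async def run(goal: str) -> SharedState:
    state = SharedState(goal=goal, todo=["read code", "write code"])
    await asyncio.gather(planner(state), actor(state))
    return state
```

The sketch leans on asyncio's cooperative scheduling so the two loops alternate predictably. A real planner running as a separate process would need explicit synchronization before the actor decides the plan is empty.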

Tool calls are capped at 20 per prompt. Extensibility is limited to MCP server configuration — no custom agent types.

Devin

A single autonomous agent with a full cloud IDE (terminal, editor, browser). It is not a multi-agent system, but it ships four built-in features that function like specialized tools:

Devin Search — agentic code exploration with a "Deep Mode" for complex queries
Devin Wiki — auto-indexes repositories every few hours, producing architecture diagrams and documentation
Devin Review — code review that proposes edits and can apply commits
Desktop Testing — end-to-end testing via computer use on Linux desktop apps

Devin is extensible through Playbooks (reusable task templates) and Knowledge items (persistent context recalled automatically). Cognition's engineering team has noted that they're exploring sub-agents but are cautious: "you have to be very careful about when to use subagents because the context and state management gets complex quickly."

OpenHands

The most explicitly multi-agent system among dedicated coding tools. Three built-in agents with delegation:

Agent	Purpose
CodeActAgent (default)	Implements the CodeAct framework — unifies LLM actions into a code action space. Executes bash commands and Python
BrowsingAgent	Handles web browsing tasks when delegated from CodeActAgent
DelegatorAgent	Routes tasks to micro-agents: RepoStudyAgent (repo analysis) and VerifierAgent (task completion checks)

This is real agent delegation — CodeActAgent can hand off web browsing to BrowsingAgent, and DelegatorAgent can decompose tasks across specialized micro-agents. The Python SDK lets developers define additional agents.

OpenHands also supports ACP (Agent Client Protocol) for spawning external agent runtimes, including Codex, Claude Code, Gemini CLI, and OpenCode, as persistent or one-shot sessions.

Aider

Four modes, with one genuinely interesting architectural choice:

Mode	Purpose
Code (default)	Makes file changes
Ask	Q&A about code without changes
Architect	Two-model pipeline: architect proposes, editor implements
Help	Answers questions about aider itself

Architect mode is the standout: it splits work across two models. Model 1 (the architect) reasons about the solution. Model 2 (the editor) translates that reasoning into precise file edits. This is useful when the best reasoning model (e.g., o1, Gemini 2.5 Pro) isn't the best at applying diffs, because each role can be assigned to a different model.

This is a pipeline, not agent delegation — both models operate within the same conversation flow. Extensibility is limited to model selection and configuration. No MCP support, no custom agents.
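
The pipeline shape is simple enough to sketch directly. This is not aider's code: the `Model` type, the prompt wording, and the stub models are assumptions made so the example runs without any API.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def architect_then_edit(request: str, architect: Model, editor: Model) -> str:
    """Run the two-model pipeline: one model reasons, the other writes the edit."""
    plan = architect(f"Describe how to solve this. Do not write edits.\n\n{request}")
    return editor(f"Turn this plan into precise file edits.\n\n{plan}")


# Stub models so the sketch runs offline; swap in real clients as needed.
def stub_architect(prompt: str) -> str:
    return "PLAN: rename `get_usr` to `get_user` in users.py"


def stub_editor(prompt: str) -> str:
    plan = prompt.split("\n\n", 1)[1]
    return f"EDIT users.py <- {plan}"
```

Note that the editor only ever sees the architect's output, never a delegated task it can refuse or re-plan, which is why we call this a pipeline rather than delegation.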

Amazon Q Developer

A single agentic system with seven specialized capabilities:

Capability	Purpose
Software Development Agent	Multi-file implementation with tests (66% on SWE-Bench Verified)
Code Generation	Real-time suggestions in 25+ languages
Security Scanning	Vulnerability detection (exposed credentials, injection, etc.)
Unit Test Generation	Iterative test writing within projects
Documentation Generation	In-depth docs with data flow diagrams
Code Review	Logical errors, anti-patterns, security issues
Code Transformation	.NET porting, Java version upgrades

These are not separate agents — they're capabilities within a single agentic system. The software development agent runs builds and tests to validate generated code before surfacing it. The CLI supports MCP for external tool integration.

Jules

Google's asynchronous coding agent. A single autonomous agent running in a secure cloud VM, powered by Gemini 2.5 Pro.

What makes Jules different from the IDE agents is proactivity: Suggested Tasks can identify code improvements and fixes across up to 5 repositories without being asked. Scheduled tasks run on a recurring basis. Both are patterns we haven't seen in other products' built-in agent designs.

Jules also ships persistent Memory that remembers preferences across tasks, and an API for programmatic access. But it's a single agent — no delegation, no sub-agents, no nesting.

Augment Code

Two-tier architecture — the most complex agent setup in our survey.

Tier 1: IDE Agent ("Auggie") — a standard IDE assistant with Auto/Ask/Standard modes, parallel tool execution, and native integrations (GitHub, Linear, Jira, Confluence, etc.).

Tier 2: Intent — a multi-agent orchestration workspace with three explicit roles:

Role	Purpose
Coordinator	Analyzes codebase via Context Engine, breaks specs into structured task lists, sequences work into parallel waves
Implementor	Executes specific tasks in parallel within isolated git worktrees. Each receives context scoped to its assigned work
Verifier	Checks results against the original spec. Flags inconsistencies and missing edge cases

The most notable detail: Intent supports Claude Code, Codex, and OpenCode as implementor agents. It's not vendor-locked — the orchestrator can dispatch to other companies' agents. Teams can define custom specialist agents and control orchestration rules.
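
The coordinator's "parallel waves" idea maps onto a topological sort: each wave contains the tasks whose dependencies are already finished. The sketch below uses Python's standard `graphlib` to show the shape. The function names and the trivial `implement` and `verify` stand-ins are ours, not Augment's.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict) -> list:
    """Coordinator: group tasks into waves; each wave depends only on earlier ones."""
    sorter = TopologicalSorter(dependencies)   # maps task -> set of prerequisite tasks
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())     # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


def implement(task: str) -> str:
    """Implementor stand-in: a real one would work in its own git worktree."""
    return f"{task}: done"


def verify(spec: set, results: dict) -> list:
    """Verifier: report spec items that have no completed result."""
    return sorted(item for item in spec if results.get(item) != f"{item}: done")


def run(dependencies: dict) -> dict:
    results = {}
    for wave in plan_waves(dependencies):
        with ThreadPoolExecutor() as pool:     # tasks within one wave run in parallel
            results.update(zip(wave, pool.map(implement, wave)))
    return results
```

For a spec like `{"schema": set(), "api": {"schema"}, "docs": {"schema"}, "ui": {"api"}}`, the coordinator produces three waves, and only the middle one has any parallelism to exploit.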

The landscape at a glance

Product	Built-in agents	Architecture	User-extensible	Nesting	Extensibility
Claude Code	6 subagents	Subagent delegation	Yes	1 level	Markdown files, MCP
GitHub Copilot	4 agents	Auto-delegation	Yes	Undocumented	Markdown files, MCP
Cursor	4 modes + unnamed subagents	Single + modes	Yes	Undocumented	Custom Modes, MCP
Windsurf	1 + background planner	Single + planner	Limited	No	MCP
Devin	1 + 4 features	Single agent	Yes	Not yet	Playbooks, MCP
OpenHands	3+ agents	Multi-agent	Yes	Yes (chains)	Python SDK, ACP, MCP
Aider	4 modes	Single + dual-model	Limited	No	Model config
Amazon Q	1 + 7 capabilities	Single agent	Limited	No	MCP (CLI)
Jules	1 agent	Single async	Yes	No	Jules API
Augment Intent	3 roles	Multi-agent orchestration	Yes	Yes	Custom specialists, MCP

Three architectures, one spectrum

The ten products cluster into three architectural patterns:

Single agent, multiple modes. Cursor, Windsurf, Aider, Amazon Q, Jules, and Devin all ship a single agent, most of them with mode switches. The agent loop stays the same; what changes is the tool set, the system prompt, or the autonomy level. This is the dominant pattern (6 of 10 products). It's simple to reason about and debug, but it limits the system's ability to parallelize work or isolate context.

Subagent delegation. Claude Code and GitHub Copilot both ship named subagents that run in their own context windows. The main thread dispatches tasks to specialized agents (Explore, Plan, Task) and collects results. Claude Code limits nesting to one level; Copilot's depth constraints aren't documented. Both independently converged on Markdown files with YAML frontmatter for user-defined agents — the closest thing to a standard format in this space.

Multi-agent orchestration. OpenHands and Augment Intent have explicit role-based agent systems with coordination. OpenHands delegates between CodeActAgent, BrowsingAgent, and DelegatorAgent. Augment Intent assigns Coordinator, Implementor, and Verifier roles across isolated worktrees. These architectures are the most flexible but also the hardest to debug — as we explored in our survey of agent calling patterns.
[Interactive chart — see original post]

Patterns worth noting

MCP as the universal extensibility layer. Eight of ten products support MCP (Model Context Protocol) for external tool integration. The only holdouts are Aider (no MCP support documented) and Jules (API-only extensibility). MCP hasn't standardized agent-to-agent communication, but it's become the default way to give agents new tools.

The Markdown agent pattern. Claude Code and GitHub Copilot independently converged on the same format: a Markdown file with YAML frontmatter that defines an agent's name, description, system prompt, and tool permissions. Both store these files in dotfile directories (.claude/agents/ and .github/agents/ respectively). Whether this becomes a cross-tool standard or stays vendor-specific is an open question.
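
To show how little machinery the format needs, here is a minimal sketch of reading such a file. It is an illustration only: the field names (`name`, `description`, `tools`) follow the description above, the exact schema each product accepts may differ, and the parser handles only flat `key: value` lines, so a real loader should use a proper YAML library.

```python
from dataclasses import dataclass


@dataclass
class AgentDefinition:
    name: str
    description: str
    tools: list
    system_prompt: str


def parse_agent_file(text: str) -> AgentDefinition:
    """Split '---'-delimited frontmatter from the Markdown body.

    Handles only flat 'key: value' lines, not full YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing frontmatter closing '---'") from None
    meta = {}
    for line in lines[1:end]:
        if line.strip():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    tools = [t.strip() for t in meta.get("tools", "").split(",") if t.strip()]
    return AgentDefinition(
        name=meta["name"],
        description=meta.get("description", ""),
        tools=tools,
        system_prompt="\n".join(lines[end + 1:]).strip(),  # the Markdown body
    )


# Hypothetical agent file, written for this example.
EXAMPLE = """---
name: test-runner
description: Runs the test suite and reports failures
tools: Read, Bash
---
You run tests and summarize failures. Do not edit files.
"""
```

Calling `parse_agent_file(EXAMPLE)` yields an `AgentDefinition` whose `system_prompt` is the Markdown body. The body doubling as the prompt is what makes the format pleasant to author by hand.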

Background planning. Windsurf is the only product shipping a continuously-running planning agent that operates alongside the main model. Every other product either plans on-demand (via a Plan mode) or doesn't plan at all. If continuous planning produces meaningfully better outcomes, other products will likely adopt the pattern — but we haven't seen benchmarks comparing the two approaches.

Third-party agent orchestration. Augment Intent can dispatch work to Claude Code, Codex, and OpenCode as implementor agents. This is a qualitatively different extensibility model — instead of adding tools via MCP, you're adding entire agent runtimes. OpenHands takes a similar approach with ACP. If this pattern grows, the "which coding agent should I use?" question becomes less relevant — you use an orchestrator that dispatches to whichever agent fits the task.
[Interactive chart — see original post]

What the research says

The multi-agent coordination literature supports specialization but warns about coordination overhead. The MAST taxonomy (March 2025) analyzed 1,600+ failure traces across multi-agent systems and found that specification and planning failures account for the majority of errors — not execution failures. This aligns with what we see in the product landscape: most products chose single-agent architectures specifically to avoid coordination complexity.

The Agent Drift study (January 2026) measured 21% higher performance retention with single-agent architectures compared to multi-agent setups on sustained tasks. The tradeoff is parallelism — multi-agent systems handle concurrent workstreams better, but single agents maintain coherence longer.

We explored related failure patterns in multi-agent coordination and the role of task registries in making multi-agent work observable.

Open questions

Does the Markdown agent pattern become a standard? Claude Code and Copilot converged independently. If Cursor, Windsurf, or Augment adopt the same format, it could become a de facto standard for portable agent definitions. Or each vendor could diverge, splitting the ecosystem.

Will single-agent products add delegation? Six of ten products ship a single agent with modes. Is that a limitation they'll outgrow, or a deliberate choice that holds? Cognition's stance on Devin points both ways: the team is exploring sub-agents, but its caution about context and state management suggests the single-agent design is deliberate for now.

Does orchestration beat specialization? Augment Intent orchestrates other vendors' agents. OpenHands dispatches to micro-agents. If orchestration consistently outperforms single-agent + modes, the product category shifts from "which agent?" to "which orchestrator?"

How deep should nesting go? Claude Code says one level. OpenHands allows chains. The MAST taxonomy shows coordination failures increase with depth. But some tasks genuinely require recursive decomposition.

Will background planning spread? Windsurf's continuously-running planner is unique. If it demonstrably improves outcomes on complex tasks, every product with a Plan mode will need to consider whether on-demand planning is enough.

Where does extensibility plateau? MCP adds tools. Custom agents add behaviors. ACP adds entire runtimes. Is there a natural ceiling, or does agent extensibility keep compounding? The skills pattern offers one perspective — extensibility through composable skills rather than agent proliferation.

What this means for walrus

Walrus is an agent runtime, not a coding tool. It doesn't ship subagents in the same sense as Claude Code or Copilot. But the patterns in this survey map directly to the WHS (Walrus Hook Service) architecture.

WHS hooks are walrus's built-in agents. Inference, memory, and channels are all lifecycle hooks — each independently swappable, each using the same API as third-party hooks. This is closer to Augment Intent's model (plug in different implementors) than to the single-agent-with-modes pattern.
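
The "same API for built-in and third-party hooks" idea can be sketched as a slot registry. To be clear, this is not the WHS API: `Hook`, `Runtime`, the slot names, and both store classes are hypothetical, chosen only to show why a shared interface makes hooks swappable.

```python
from typing import Dict, Protocol


class Hook(Protocol):
    """Hypothetical hook interface, not the actual WHS API."""

    def handle(self, event: str, payload: dict) -> dict: ...


class Runtime:
    """Toy runtime: every capability is a hook registered under a slot name."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Hook] = {}

    def register(self, slot: str, hook: Hook) -> None:
        # Built-in and third-party hooks use this same call, so either can
        # replace the other without the runtime knowing the difference.
        self._hooks[slot] = hook

    def emit(self, slot: str, event: str, payload: dict) -> dict:
        return self._hooks[slot].handle(event, payload)


class InMemoryStore:
    """Stand-in for a built-in memory hook."""

    def __init__(self) -> None:
        self.items: list = []

    def handle(self, event: str, payload: dict) -> dict:
        if event == "remember":
            self.items.append(payload["text"])
        return {"backend": "in-memory", "count": len(self.items)}


class LoggingStore(InMemoryStore):
    """Stand-in for a third-party replacement with the same interface."""

    def handle(self, event: str, payload: dict) -> dict:
        result = super().handle(event, payload)
        return {**result, "backend": "third-party"}
```

Registering `LoggingStore()` under the `"memory"` slot replaces the built-in store without touching any caller, which is the property the comparison to Augment Intent's pluggable implementors rests on.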

The Markdown agent pattern has legs. Claude Code and Copilot proved that Markdown + YAML frontmatter is a natural format for defining agent behavior. Walrus could adopt this for user-defined WHS hook configurations — a .walrus/hooks/ directory with Markdown files describing custom hook behavior, tool permissions, and lifecycle rules.

Background planning is a hook type. Windsurf's continuous planner runs alongside the main agent. In walrus terms, this is a lifecycle hook that watches the agent's state and adjusts the plan asynchronously — exactly the kind of concern WHS is designed to separate.

Tool isolation matters. Claude Code gives each subagent a restricted tool set (Explore and Plan are both read-only). Augment Intent gives each implementor a scoped context. WHS hooks already run with isolated permissions, so the pattern validates the architectural choice.
