Every AI coding product calls itself "agentic." But when you actually install one, what agents ship inside it? Is there a single LLM loop with a mode switch, or are there distinct, named agents with their own tool sets and context windows?
We examined ten products — from IDE assistants to autonomous coding agents — and cataloged what each one actually ships. The findings: most products offer a single agent with multiple modes, two have converged on nearly identical subagent patterns, and only two ship full multi-agent orchestration with named roles.
What we surveyed
Ten AI coding products, all examined through their official documentation, public GitHub repositories, and vendor blog posts as of March 2026:
- Claude Code (Anthropic) — CLI agent with subagent delegation
- GitHub Copilot — IDE + CLI agent with built-in specialized agents
- Cursor — IDE agent with modes and unnamed subagents
- Windsurf (Codeium) — IDE agent with background planning
- Devin (Cognition) — autonomous agent with cloud IDE
- OpenHands (All Hands AI) — multi-agent framework with delegation
- Aider — terminal pair programmer with dual-model pipeline
- Amazon Q Developer — AWS-integrated coding agent
- Jules (Google) — autonomous cloud-based coding agent
- Augment Code — IDE agent + multi-agent orchestration workspace
Product by product
Claude Code
Six built-in subagents, each running in its own context window:
| Subagent | Model | Purpose |
|---|---|---|
| Explore | Haiku (fast) | Read-only codebase search and file discovery |
| Plan | Inherited | Read-only research for implementation planning |
| General-purpose | Inherited | Complex multi-step tasks with all tools |
| Bash | Inherited | Terminal command execution in separate context |
| statusline-setup | Sonnet | Status line configuration |
| Claude Code Guide | Haiku | Answering questions about Claude Code features |
Subagents cannot spawn other subagents — the main thread dispatches, and each subagent returns a single result. This is a deliberate one-level constraint.
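The one-level constraint can be pictured as a depth check inside the dispatcher. The sketch below is an illustration under stated assumptions, not Claude Code's implementation: `Dispatcher`, `AgentContext`, `NestingError`, and the handler signature are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


class NestingError(RuntimeError):
    """Raised when a subagent tries to spawn another subagent."""


@dataclass
class AgentContext:
    """A private message history. Depth 0 is the main thread."""
    depth: int
    messages: list[str] = field(default_factory=list)


# Hypothetical handler signature: (task, own context, spawn function) -> result
Handler = Callable[[str, AgentContext, Callable[[str, str], str]], str]


class Dispatcher:
    MAX_DEPTH = 1  # the main thread (0) may spawn subagents (1), nothing deeper

    def __init__(self, handlers: dict[str, Handler]):
        self.handlers = handlers

    def spawn(self, parent: AgentContext, name: str, task: str) -> str:
        if parent.depth >= self.MAX_DEPTH:
            raise NestingError(f"{name!r} requested from depth {parent.depth}")
        child = AgentContext(depth=parent.depth + 1)  # fresh context per subagent
        child.messages.append(task)
        # The child receives a spawn function bound to its own context, so any
        # further delegation attempt is checked against MAX_DEPTH.
        result = self.handlers[name](
            task, child, lambda n, t: self.spawn(child, n, t)
        )
        parent.messages.append(f"[{name}] {result}")  # only the result crosses back
        return result


def explore(task, ctx, spawn):
    ctx.messages.append("scanned 3 files")  # stays inside the child's context
    return f"found matches for {task!r}"


def greedy(task, ctx, spawn):
    return spawn("explore", task)  # a subagent attempting to delegate


dispatcher = Dispatcher({"explore": explore, "greedy": greedy})
main = AgentContext(depth=0)
print(dispatcher.spawn(main, "explore", "auth middleware"))
print(main.messages)
```

Calling `dispatcher.spawn(main, "greedy", "anything")` raises `NestingError`, because the `greedy` handler runs at depth 1 and tries to delegate again. The parent only ever sees the single returned result, which mirrors the "each subagent returns a single result" behaviour described above.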
Custom agents are defined as Markdown files with YAML frontmatter, stored in .claude/agents/. Each custom agent can specify which tools it has access to, which MCP servers to use, and its own lifecycle hooks. This is the most granular built-in extensibility we found.
GitHub Copilot
Four built-in agents, automatically selected based on the task:
| Agent | Purpose |
|---|---|
| Explore | Fast codebase analysis without cluttering main context |
| Task | Runs builds and tests. Brief summaries on success, full output on failure |
| Plan | Creates implementation plans by analyzing dependencies |
| Code Review | Surfaces genuine issues with high signal-to-noise ratio |
Copilot automatically delegates to these agents and can run multiple in parallel — no manual agent selection required.
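The Task agent's reporting rule (brief on success, full output on failure) is easy to picture in code. The following is a minimal sketch of that rule only; it is not Copilot's implementation, and `run_task` is a hypothetical helper name.

```python
import subprocess
import sys


def run_task(cmd: list[str]) -> str:
    """Run a build or test command. Return a one-line summary on success and
    the full captured output on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        line_count = len(proc.stdout.splitlines())
        return f"OK: exited 0 ({line_count} lines of output suppressed)"
    return (
        f"FAILED (exit {proc.returncode})\n"
        f"--- stdout ---\n{proc.stdout}"
        f"--- stderr ---\n{proc.stderr}"
    )


# Stand-ins for a real build or test command, using the current interpreter.
print(run_task([sys.executable, "-c", "print('all tests passed')"]))
print(run_task([sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"]))
```

The point of the asymmetry is context economy: a passing build costs the main conversation one line, while a failing one hands back everything needed to diagnose it.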
Custom agents follow the same Markdown-with-YAML-frontmatter pattern as Claude Code. They are stored at .github/agents/CUSTOM-AGENT-NAME.md for repository-level agents, or in the .github-private repository for organization-wide agents. The convergence between Claude Code and Copilot on this pattern is striking: both independently arrived at the same file format for user-defined agents.
Separately, the Copilot Coding Agent (GitHub.com-based) works asynchronously on assigned issues, creating PRs from a sandboxed cloud environment.
Cursor
Four modes, not four agents:
| Mode | Capabilities |
|---|---|
| Agent (default) | Most autonomous — explores, edits, runs commands, fixes errors |
| Ask | Read-only — searches codebase, answers questions |
| Manual | Direct editing of explicitly selected files |
| Plan | Researches codebase and creates implementation plans |
Cursor's documentation mentions "default subagents for researching your codebase, running terminal commands, and executing parallel work streams," but does not name them publicly. This makes it hard to compare directly with Claude Code's named subagents.
Users can define Custom Modes with specific tool combinations and instructions. Cursor also supports Skills (via SKILL.md files) and MCP servers for external tool integration.
Windsurf
A single primary agent called Cascade, operating in four modes (Code, Chat, Plan, Arena). Arena mode runs two Cascade instances side-by-side for comparison — an evaluation feature, not agent delegation.
What's unique is the background planning agent: a separate process that "continuously refines the long-term plan while your selected model focuses on short-term actions." This planner generates automatic todo lists for complex tasks and runs alongside the main model without user prompting. No other product in our survey ships this pattern.
Tool calls are capped at 20 per prompt. Extensibility is limited to MCP server configuration — no custom agent types.
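To make the background-planner idea concrete, here is a small threading sketch: a planner thread keeps a todo list current while a foreground loop reports what it has done. Everything here is an assumption made for illustration (the `BackgroundPlanner` class, the event names `done` and `discovered`, and the list-editing logic that stands in for a model call); Windsurf's actual design is not documented at this level.

```python
import queue
import threading


class BackgroundPlanner(threading.Thread):
    """Keeps a long-term plan up to date from events sent by the acting loop."""

    def __init__(self, initial_plan: list[str]):
        super().__init__(daemon=True)
        self.inbox: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._plan = list(initial_plan)

    def run(self) -> None:
        while True:
            event = self.inbox.get()
            try:
                if event is None:  # sentinel: stop planning
                    return
                kind, item = event
                with self._lock:
                    # Stand-in for a model call that would rewrite the plan.
                    if kind == "done" and item in self._plan:
                        self._plan.remove(item)
                    elif kind == "discovered" and item not in self._plan:
                        self._plan.append(item)
            finally:
                self.inbox.task_done()

    def snapshot(self) -> list[str]:
        with self._lock:
            return list(self._plan)


def act(planner: BackgroundPlanner, steps: list[tuple[str, str]]) -> None:
    """Foreground loop: takes short-term actions and reports each outcome."""
    for kind, item in steps:
        planner.inbox.put((kind, item))  # hand-off does not block the actor


planner = BackgroundPlanner(["write failing test", "fix parser"])
planner.start()
act(planner, [("done", "write failing test"), ("discovered", "update docs")])
planner.inbox.join()  # wait until the planner has processed every event
print(planner.snapshot())  # ['fix parser', 'update docs']
planner.inbox.put(None)
```

The design choice worth noticing is that the actor never waits on the planner: it posts events and moves on, and reads the plan only when it needs the next long-term step.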
Devin
A single autonomous agent with a full cloud IDE (terminal, editor, browser). It is not a multi-agent system, but it ships four built-in features that function like specialized tools:
- Devin Search — agentic code exploration with a "Deep Mode" for complex queries
- Devin Wiki — auto-indexes repositories every few hours, producing architecture diagrams and documentation
- Devin Review — code review that proposes edits and can apply commits
- Desktop Testing — end-to-end testing via computer use on Linux desktop apps
Extensible through Playbooks (reusable task templates) and Knowledge items (persistent context recalled automatically). Cognition's engineering team has noted they're exploring sub-agents but are cautious — "you have to be very careful about when to use subagents because the context and state management gets complex quickly."
OpenHands
The most explicitly multi-agent system among dedicated coding tools. Three built-in agents with delegation:
| Agent | Purpose |
|---|---|
| CodeActAgent (default) | Implements the CodeAct framework — unifies LLM actions into a code action space. Executes bash commands and Python |
| BrowsingAgent | Handles web browsing tasks when delegated from CodeActAgent |
| DelegatorAgent | Routes tasks to micro-agents: RepoStudyAgent (repo analysis) and VerifierAgent (task completion checks) |
This is real agent delegation — CodeActAgent can hand off web browsing to BrowsingAgent, and DelegatorAgent can decompose tasks across specialized micro-agents. The Python SDK lets developers define additional agents.
OpenHands also supports ACP (Agent Communication Protocol) for spawning external agent runtimes — including Codex, Claude Code, Gemini CLI, and OpenCode — as persistent or one-shot sessions.
Aider
Four modes, with one genuinely interesting architectural choice:
| Mode | Purpose |
|---|---|
| Code (default) | Makes file changes |
| Ask | Q&A about code without changes |
| Architect | Two-model pipeline: architect proposes, editor implements |
| Help | Answers questions about aider itself |
Architect mode is the standout: it splits work across two models. Model 1 (the architect) reasons about the solution. Model 2 (the editor) translates that reasoning into precise file edits. Because the two roles can be assigned to different models, this is useful when the best reasoning model (e.g., o1, Gemini 2.5 Pro) isn't the best at applying diffs.
This is a pipeline, not agent delegation — both models operate within the same conversation flow. Extensibility is limited to model selection and configuration. No MCP support, no custom agents.
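The architect/editor split can be sketched as two function calls in sequence. This is a simplified illustration, not aider's code: the `Model` type, the prompts, and the `SEARCH=>REPLACE` reply format are all invented for the example, and the two stub functions stand in for real model calls.

```python
from typing import Callable

# Hypothetical model interface: prompt in, completion out.
Model = Callable[[str], str]


def architect_then_edit(request: str, source: str,
                        architect: Model, editor: Model) -> str:
    # Stage 1: the architect reasons about the change in prose.
    plan = architect(
        f"Request: {request}\n\nCode:\n{source}\n\nDescribe the change."
    )
    # Stage 2: the editor turns that plan into one concrete edit.
    edit = editor(
        f"Plan:\n{plan}\n\nCode:\n{source}\n\nReply as SEARCH=>REPLACE."
    )
    search, _, replace = edit.partition("=>")
    if not search or search not in source:
        raise ValueError("editor produced an edit that does not match the source")
    return source.replace(search, replace, 1)


def fake_architect(prompt: str) -> str:
    return "Rename the function `add` to `sum_two`; leave the body unchanged."


def fake_editor(prompt: str) -> str:
    return "def add(=>def sum_two("


source = "def add(a, b):\n    return a + b\n"
print(architect_then_edit("rename add", source, fake_architect, fake_editor))
```

Both stages share one flow of control and the editor only ever sees the architect's output plus the source, which is why this reads as a pipeline rather than delegation between agents.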
Amazon Q Developer
A single agentic system with seven specialized capabilities:
| Capability | Purpose |
|---|---|
| Software Development Agent | Multi-file implementation with tests (66% on SWE-Bench Verified) |
| Code Generation | Real-time suggestions in 25+ languages |
| Security Scanning | Vulnerability detection (exposed credentials, injection, etc.) |
| Unit Test Generation | Iterative test writing within projects |
| Documentation Generation | In-depth docs with data flow diagrams |
| Code Review | Logical errors, anti-patterns, security issues |
| Code Transformation | .NET porting, Java version upgrades |
These are not separate agents — they're capabilities within a single agentic system. The software development agent runs builds and tests to validate generated code before surfacing it. The CLI supports MCP for external tool integration.
Jules
Google's asynchronous coding agent. A single autonomous agent running in a secure cloud VM, powered by Gemini 2.5 Pro.
What makes Jules different from the IDE agents is proactivity: Suggested Tasks can identify code improvements and fixes across up to 5 repositories without being asked. Scheduled tasks run on a recurring basis. Both are patterns we haven't seen in other products' built-in agent designs.
Jules also ships persistent Memory that remembers preferences across tasks, and an API for programmatic access. But it's a single agent — no delegation, no sub-agents, no nesting.
Augment Code
Two-tier architecture — the most complex agent setup in our survey.
Tier 1: IDE Agent ("Auggie") — a standard IDE assistant with Auto/Ask/Standard modes, parallel tool execution, and native integrations (GitHub, Linear, Jira, Confluence, etc.).
Tier 2: Intent — a multi-agent orchestration workspace with three explicit roles:
| Role | Purpose |
|---|---|
| Coordinator | Analyzes codebase via Context Engine, breaks specs into structured task lists, sequences work into parallel waves |
| Implementor | Executes specific tasks in parallel within isolated git worktrees. Each receives context scoped to its assigned work |
| Verifier | Checks results against the original spec. Flags inconsistencies and missing edge cases |
The most notable detail: Intent supports Claude Code, Codex, and OpenCode as implementor agents. It's not vendor-locked — the orchestrator can dispatch to other companies' agents. Teams can define custom specialist agents and control orchestration rules.
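The three roles in the table above can be sketched as three functions: one that sequences tasks into waves, one that executes a task, and one that checks the outcome against the spec. This is a toy model under stated assumptions, not Augment's implementation: the spec format (task name mapped to its dependencies) and the function names `coordinate`, `implement`, `verify`, and `run` are hypothetical, and a thread pool stands in for agents working in isolated git worktrees.

```python
from concurrent.futures import ThreadPoolExecutor


def coordinate(spec: dict[str, list[str]]) -> list[list[str]]:
    """Coordinator: group tasks into waves so each runs after its dependencies."""
    waves: list[list[str]] = []
    done: set[str] = set()
    remaining = dict(spec)
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if set(deps) <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency in spec")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


def implement(task: str) -> str:
    """Implementor: stand-in for an agent that sees only its assigned task."""
    return f"{task}: implemented"


def verify(spec: dict[str, list[str]], results: dict[str, str]) -> list[str]:
    """Verifier: list spec tasks that have no result."""
    return sorted(t for t in spec if t not in results)


def run(spec: dict[str, list[str]]) -> tuple[dict[str, str], list[str]]:
    results: dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        for wave in coordinate(spec):
            # Tasks within one wave do not depend on each other.
            for task, output in zip(wave, pool.map(implement, wave)):
                results[task] = output
    return results, verify(spec, results)


spec = {"schema": [], "api": ["schema"], "ui": ["schema"], "e2e tests": ["api", "ui"]}
print(coordinate(spec))  # [['schema'], ['api', 'ui'], ['e2e tests']]
results, missing = run(spec)
print(missing)  # []
```

Swapping `implement` for a call into a different vendor's agent would leave `coordinate` and `verify` untouched, which is the property that makes third-party implementors possible in this kind of design.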
The landscape at a glance
| Product | Built-in agents | Architecture | User-extensible | Agent nesting | Extension mechanisms |
|---|---|---|---|---|---|
| Claude Code | 6 subagents | Subagent delegation | Yes | 1 level | Markdown files, MCP |
| GitHub Copilot | 4 agents | Auto-delegation | Yes | Undocumented | Markdown files, MCP |
| Cursor | 4 modes + unnamed subagents | Single + modes | Yes | Undocumented | Custom Modes, MCP |
| Windsurf | 1 + background planner | Single + planner | Limited | No | MCP |
| Devin | 1 + 4 features | Single agent | Yes | Not yet | Playbooks, MCP |
| OpenHands | 3+ agents | Multi-agent | Yes | Yes (chains) | Python SDK, ACP |
| Aider | 4 modes | Single + dual-model | Limited | No | Model config |
| Amazon Q | 1 + 7 capabilities | Single agent | Limited | No | MCP (CLI) |
| Jules | 1 agent | Single async | Yes | No | Jules API |
| Augment Intent | 3 roles | Multi-agent orchestration | Yes | Yes | Custom specialists |
Three architectures, one spectrum
The ten products cluster into three architectural patterns:
Single agent, multiple modes. Cursor, Windsurf, Aider, Amazon Q, Jules, and Devin all ship a single agent, with modes, features, or capabilities layered on top rather than separate agents. The agent loop is the same; what changes is the tool set, the system prompt, or the autonomy level. This is the dominant pattern (6 of 10 products). It's simple to reason about and debug, but limits the system's ability to parallelize work or isolate context.
Subagent delegation. Claude Code and GitHub Copilot both ship named subagents that run in their own context windows. The main thread dispatches tasks to specialized agents (Explore, Plan, Task) and collects results. Claude Code limits nesting to one level; Copilot's depth constraints aren't documented. Both independently converged on Markdown files with YAML frontmatter for user-defined agents — the closest thing to a standard format in this space.
Multi-agent orchestration. OpenHands and Augment Intent have explicit role-based agent systems with coordination. OpenHands delegates between CodeActAgent, BrowsingAgent, and DelegatorAgent. Augment Intent assigns Coordinator, Implementor, and Verifier roles across isolated worktrees. These architectures are the most flexible but also the hardest to debug — as we explored in our survey of agent calling patterns.
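For readers who want to check the split, the grouping above can be written down as data and tallied. The labels are this survey's own classification as stated in the three paragraphs above; the dictionary itself is just a convenience for the example.

```python
from collections import Counter

# Architecture label per product, as assigned in this survey.
ARCHITECTURE = {
    "Claude Code": "subagent delegation",
    "GitHub Copilot": "subagent delegation",
    "Cursor": "single agent",
    "Windsurf": "single agent",
    "Devin": "single agent",
    "Aider": "single agent",
    "Amazon Q Developer": "single agent",
    "Jules": "single agent",
    "OpenHands": "multi-agent orchestration",
    "Augment Code": "multi-agent orchestration",
}

counts = Counter(ARCHITECTURE.values())
for pattern, n in counts.most_common():
    print(f"{pattern}: {n} of {len(ARCHITECTURE)}")
```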
Patterns worth noting
MCP as the universal extensibility layer. Eight of ten products support MCP (Model Context Protocol) for external tool integration. The only holdouts are Aider (no MCP support documented) and Jules (API-only extensibility). MCP hasn't standardized agent-to-agent communication, but it's become the default way to give agents new tools.
The Markdown agent pattern. Claude Code and GitHub Copilot independently converged on the same format: a Markdown file with YAML frontmatter that defines an agent's name, description, system prompt, and tool permissions. Both store these files in dotfile directories (.claude/agents/ and .github/agents/ respectively). Whether this becomes a cross-tool standard or stays vendor-specific is an open question.
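To show what such a file looks like and how little machinery it needs, here is a minimal sketch that splits the frontmatter from the body. It assumes the fields named above (name, description, tool permissions as a comma-separated `tools` line, and the Markdown body as the system prompt); the example agent `test-runner` is made up, and this hand-rolled key/value parser is a stand-in for a real YAML parser, not either vendor's loader.

```python
from dataclasses import dataclass


@dataclass
class AgentDefinition:
    name: str
    description: str
    tools: list[str]
    system_prompt: str


def parse_agent_file(text: str) -> AgentDefinition:
    """Split '---'-delimited frontmatter from the Markdown body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("missing frontmatter closing '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            meta[key.strip()] = value.strip()
    return AgentDefinition(
        name=meta["name"],
        description=meta.get("description", ""),
        tools=[t.strip() for t in meta.get("tools", "").split(",") if t.strip()],
        system_prompt="\n".join(lines[end + 1:]).strip(),
    )


# Hypothetical agent file content, for illustration only.
EXAMPLE = """\
---
name: test-runner
description: Runs the test suite and reports failures
tools: Read, Grep, Bash
---
You run the project's tests and summarize any failures.
"""

agent = parse_agent_file(EXAMPLE)
print(agent.name, agent.tools)
```

Because the definition is plain text with a small, flat schema, it can be versioned alongside the code it operates on, which is part of why the format is attractive as a portable agent definition.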
Background planning. Windsurf is the only product shipping a continuously-running planning agent that operates alongside the main model. Every other product either plans on-demand (via a Plan mode) or doesn't plan at all. If continuous planning produces meaningfully better outcomes, other products will likely adopt the pattern — but we haven't seen benchmarks comparing the two approaches.
Third-party agent orchestration. Augment Intent can dispatch work to Claude Code, Codex, and OpenCode as implementor agents. This is a qualitatively different extensibility model — instead of adding tools via MCP, you're adding entire agent runtimes. OpenHands takes a similar approach with ACP. If this pattern grows, the "which coding agent should I use?" question becomes less relevant — you use an orchestrator that dispatches to whichever agent fits the task.
What the research says
The multi-agent coordination literature supports specialization but warns about coordination overhead. The MAST taxonomy (March 2025) analyzed 1,600+ failure traces across multi-agent systems and found that specification and planning failures, not execution failures, account for the majority of errors. This is consistent with what we see in the product landscape: most products ship single-agent architectures, which sidestep coordination complexity.
The Agent Drift study (January 2026) measured 21% higher performance retention with single-agent architectures compared to multi-agent setups on sustained tasks. The tradeoff is parallelism — multi-agent systems handle concurrent workstreams better, but single agents maintain coherence longer.
We explored related failure patterns in multi-agent coordination and the role of task registries in making multi-agent work observable.
Open questions
Does the Markdown agent pattern become a standard? Claude Code and Copilot converged independently. If Cursor, Windsurf, or Augment adopt the same format, it could become a de facto standard for portable agent definitions. Or each vendor could diverge, splitting the ecosystem.
Will single-agent products add delegation? Six of ten products ship a single agent with modes. Is that a limitation they'll outgrow, or a deliberate choice that holds? Cognition's stance on Devin (exploring sub-agents, but cautiously) suggests the answer isn't settled either way.
Does orchestration beat specialization? Augment Intent orchestrates other vendors' agents. OpenHands dispatches to micro-agents. If orchestration consistently outperforms single-agent + modes, the product category shifts from "which agent?" to "which orchestrator?"
How deep should nesting go? Claude Code says one level. OpenHands allows chains. The MAST taxonomy shows coordination failures increase with depth. But some tasks genuinely require recursive decomposition.
Will background planning spread? Windsurf's continuously-running planner is unique. If it demonstrably improves outcomes on complex tasks, every product with a Plan mode will need to consider whether on-demand planning is enough.
Where does extensibility plateau? MCP adds tools. Custom agents add behaviors. ACP adds entire runtimes. Is there a natural ceiling, or does agent extensibility keep compounding? The skills pattern offers one perspective — extensibility through composable skills rather than agent proliferation.
What this means for walrus
Walrus is an agent runtime, not a coding tool. It doesn't ship subagents in the same sense as Claude Code or Copilot. But the patterns in this survey map directly to WHS (Walrus Hook Service) architecture.
WHS hooks are walrus's built-in agents. Inference, memory, and channels are all lifecycle hooks — each independently swappable, each using the same API as third-party hooks. This is closer to Augment Intent's model (plug in different implementors) than to the single-agent-with-modes pattern.
The Markdown agent pattern has legs. Claude Code and Copilot proved that Markdown + YAML frontmatter is a natural format for defining agent behavior. Walrus could adopt this for user-defined WHS hook configurations — a .walrus/hooks/ directory with Markdown files describing custom hook behavior, tool permissions, and lifecycle rules.
Background planning is a hook type. Windsurf's continuous planner runs alongside the main agent. In walrus terms, this is a lifecycle hook that watches the agent's state and adjusts the plan asynchronously — exactly the kind of concern WHS is designed to separate.
Tool isolation matters. Claude Code scopes tool access per subagent (Explore and Plan are both read-only, while General-purpose gets all tools). Augment Intent gives each implementor a scoped context. WHS hooks already run with isolated permissions, and these products' choices support that architectural decision.
Further reading
- Claude Code subagents — Anthropic
- GitHub Copilot custom agents — GitHub
- Cursor agent overview — Cursor
- Windsurf Cascade — Codeium
- Devin 2.0 — Cognition
- OpenHands agents — All Hands AI
- Aider chat modes — Aider
- Amazon Q Developer features — AWS
- Jules out of beta — Google
- Augment Intent — Augment Code
- MAST: multi-agent failure taxonomy — arxiv (March 2025)
- Agent Drift: performance retention in sustained tasks — arxiv (January 2026)
- CodeAct: executable code actions — arxiv (February 2024)
- Model Context Protocol — MCP
Originally published at OpenWalrus.