When a leader agent delegates work to a team member, can that member escalate back? Can it talk to its peers? Can it recruit its own sub-team? These are the questions that determine whether your multi-agent system is a hierarchy, a mesh, or a tree of one-way pipes.
We surveyed eight frameworks — Claude Code, OpenClaw, CrewAI, AutoGen, LangGraph, OpenAI Agents SDK, Semantic Kernel, and Google ADK — to map the communication patterns they actually support. The findings: most systems are far more restricted than their marketing suggests.
## The three directions
Every agent-to-agent call fits one of three directions:
- Downward (parent → child): A leader delegates a task to a team member. This is universally supported — it's the baseline for multi-agent systems.
- Upward (child → parent): A team member escalates back to its leader. This is where frameworks diverge sharply.
- Lateral (peer → peer): Team members communicate directly without going through the leader. This is the rarest capability.
And one structural question cuts across all three: can a team member become a leader in turn? Can an agent that received a delegated task spawn its own sub-team, becoming the root of a recursive hierarchy?
## Framework by framework
### Claude Code — one level, one direction
Claude Code uses the flattest model of any framework. The main agent spawns subagents via the Task tool. Up to 10 run in parallel. Each gets its own context window.
Communication is strictly one-way. The parent sends a prompt. The subagent returns its final message. There is no mid-execution communication, no streaming, no callback. A feature request documents the gap — users have observed subagents silently abandoning strategies with no way for the parent to detect or intervene.
Can subagents spawn their own subagents? No. The Task tool is explicitly not available to subagents. Users who attempted nesting via claude -p in Bash encountered heap out-of-memory crashes. Nesting depth: 1 level, hardcoded.
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Final result only |
| Peer → Peer | No |
| Recursive nesting | No |
### OpenClaw — configurable depth, vertical only
OpenClaw spawns sub-agents via sessions_spawn. Each runs in an isolated session. Spawning is non-blocking — the parent gets a run ID immediately and can continue working.
Sub-agents "announce" their final result back to the parent's chat channel, but there's no mid-execution communication. Lateral communication between siblings doesn't exist yet — an RFC for Agent Teams proposes direct inter-agent messaging and shared state, but it's unimplemented.
The key differentiator: configurable nesting depth. maxSpawnDepth controls how deep sub-agents can recurse:
- Depth 1 (default): sub-agents cannot spawn children
- Depth 2: enables the orchestrator pattern (main → orchestrator → workers)
- Maximum: depth 5
Additional guardrails: maxChildrenPerAgent: 5, maxConcurrent: 8, runTimeoutSeconds: 900.
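The depth and child guardrails above can be modeled in a few lines. A minimal illustrative sketch, not the OpenClaw implementation: the Agent class and spawn method here are invented, and only the limit semantics come from the docs above (at maxSpawnDepth 2, main can spawn an orchestrator, which can spawn workers, and the chain stops there).

```python
# Toy model of depth-limited spawning. MAX_SPAWN_DEPTH plays the role of
# maxSpawnDepth; MAX_CHILDREN plays the role of maxChildrenPerAgent.
MAX_SPAWN_DEPTH = 2
MAX_CHILDREN = 5

class Agent:
    def __init__(self, name, depth=0):
        self.name = name
        self.depth = depth
        self.children = []

    def spawn(self, name):
        # Enforce both guardrails before creating a child session.
        if self.depth >= MAX_SPAWN_DEPTH:
            raise RuntimeError(f"{self.name}: spawn depth limit reached")
        if len(self.children) >= MAX_CHILDREN:
            raise RuntimeError(f"{self.name}: too many children")
        child = Agent(name, self.depth + 1)
        self.children.append(child)
        return child

main = Agent("main")                       # depth 0
orchestrator = main.spawn("orchestrator")  # depth 1, allowed
worker = orchestrator.spawn("worker")      # depth 2, allowed at depth limit 2
# worker.spawn("grandchild") would raise: spawn depth limit reached
```

With MAX_SPAWN_DEPTH set to 1 (the default), the orchestrator line itself would be the last legal spawn.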
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Announce (final result) |
| Peer → Peer | No (RFC pending) |
| Recursive nesting | Yes (depth 1-5, configurable) |
### CrewAI — delegation as tools
CrewAI runs two process modes: sequential (tasks in order) and hierarchical (manager coordinates workers). Communication follows hub-and-spoke.
When allow_delegation=True, agents get two delegation tools: Delegate Work (assign a task to a teammate by name) and Ask Question (query a colleague). This looks like peer-to-peer, but it's tool-mediated — the framework converts other agents into callable tools on the delegating agent. CrewAI's documentation describes this as "avoiding peer-to-peer agent traffic."
Hierarchical delegation chains are possible via allowed_agents: a management executive delegates to a communications manager, who delegates to an email agent. But this is static configuration, not dynamic sub-team creation. And it's currently broken in some configurations.
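The tool-mediated shape is easy to see in miniature. A hedged sketch, not the CrewAI API: the Agent class, handle method, and tool labels here are invented, but the structure mirrors what Delegate Work does, turning each coworker into an ordinary callable tool on the delegating agent rather than a peer it messages directly.

```python
# Toy model of tool-mediated delegation: peers are wrapped as tools, so there
# is no direct peer-to-peer traffic, only ordinary tool invocations.
class Agent:
    def __init__(self, name):
        self.name = name
        self.tools = {}

    def handle(self, task):
        # Stand-in for the agent's real LLM-driven work loop.
        return f"{self.name} completed: {task}"

    def add_delegation_tools(self, coworkers):
        # Mirrors allow_delegation=True: each coworker becomes a named tool.
        for peer in coworkers:
            self.tools[f"Delegate Work to {peer.name}"] = peer.handle

researcher = Agent("researcher")
writer = Agent("writer")
writer.add_delegation_tools([researcher])

# The writer "delegates" by calling a tool, like any other tool call.
result = writer.tools["Delegate Work to researcher"]("find sources")
```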
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Task result only |
| Peer → Peer | Via delegation tools (tool-mediated) |
| Recursive nesting | Via allowed_agents (static, ~2-3 levels) |
### AutoGen — broadcast and nest
AutoGen uses a GroupChat pattern. All agents subscribe to a shared topic. A GroupChatManager selects the next speaker using round-robin, random, manual, or LLM-driven selection. There's no direct agent-to-agent addressing — everything flows through the broadcast topic.
The nesting story is strong. AutoGen explicitly supports "recursive group chats" — an agent in one group chat can package an entire inner multi-agent conversation as a single response. No explicit depth limit.
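The broadcast topology can be sketched in a few lines. This is an illustrative model, not the autogen API: the GroupChat class here is invented, but it captures the two properties described above, a single shared topic everyone can read and a manager that picks the next speaker (round-robin in this sketch).

```python
# Toy model of the GroupChat pattern: no direct agent-to-agent addressing,
# every message lands on one shared broadcast topic.
from itertools import cycle

class GroupChat:
    def __init__(self, agents):
        self.agents = agents
        self.messages = []           # the shared topic all agents subscribe to
        self._order = cycle(agents)  # round-robin speaker selection

    def step(self, content):
        speaker = next(self._order)  # the manager picks who talks next
        msg = {"from": speaker, "content": content}
        self.messages.append(msg)    # broadcast: visible to every agent
        return msg

chat = GroupChat(["planner", "coder", "critic"])
chat.step("draft a plan")
chat.step("implement step 1")
```

Swapping the round-robin cycle for an LLM call that reads the transcript and names the next speaker gives you the LLM-driven selection mode.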
| Capability | Supported |
|---|---|
| Parent → Child | Yes (via manager) |
| Child → Parent | Via nested chat return |
| Peer → Peer | Via shared broadcast topic |
| Recursive nesting | Yes (unlimited) |
### LangGraph — the most flexible
LangGraph offers three multi-agent patterns: supervisor (central router), swarm (decentralized), and hierarchical (supervisors managing supervisors).
In swarm mode, agents can hand off to peers based on their own assessment — true peer-to-peer. In supervisor mode, all routing goes through the coordinator. Hierarchical mode is a first-class pattern: "you can create multi-level hierarchical systems by creating a supervisor that manages multiple supervisors."
Sub-graphs can have different state schemas from parent graphs, enabling private message histories per agent. Nesting is bounded by recursion_limit (default ~25 supersteps), which is configurable.
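The recursion_limit bound amounts to a superstep counter around the graph loop. A minimal model, assuming nothing about LangGraph internals beyond what the docs state: run_graph, route, and the two-node supervisor/worker loop below are all invented for illustration.

```python
# Toy model of a recursion_limit guard: count supersteps through the graph
# and abort once the limit is exceeded, so cycles cannot run forever.
RECURSION_LIMIT = 25

def run_graph(route, state, start):
    node, steps = start, 0
    while node is not None:
        steps += 1
        if steps > RECURSION_LIMIT:
            raise RecursionError("recursion_limit exceeded")
        node, state = route(node, state)  # one superstep
    return state

# A supervisor/worker cycle that terminates after three round trips.
def route(node, state):
    if node == "supervisor":
        return (None, state) if state["rounds"] >= 3 else ("worker", state)
    state["rounds"] += 1              # the worker reports via shared state
    return ("supervisor", state)

final = run_graph(route, {"rounds": 0}, "supervisor")
```

If the supervisor never issued a stop decision, the guard would halt the cycle at superstep 26 instead of looping indefinitely.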
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via shared state |
| Peer → Peer | Yes (swarm mode) |
| Recursive nesting | Yes (~25 supersteps default, configurable) |
### OpenAI Agents SDK — bidirectional by design
The Agents SDK (successor to Swarm) provides two patterns: handoffs (transfer conversation ownership) and agents-as-tools (invoke as bounded subtask).
Handoffs are bidirectional by design. Agent A lists Agent B in its handoffs. Agent B lists Agent A. Full conversation history is preserved across transfers. This enables circular flows: A → B → A. In agents-as-tools mode, the called agent returns results synchronously.
This is the only framework where true peer-to-peer communication is a core primitive rather than an opt-in mode. The nest_handoff_history beta manages context by collapsing transcript summaries across deep handoff chains.
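The bidirectional shape is simple to model. A hedged sketch, not the Agents SDK API: the Agent class and run method here are invented, but they show the two ingredients described above, mutual handoff declarations and a single history that travels with every transfer.

```python
# Toy model of bidirectional handoffs: each agent declares who it may hand
# off to, and the shared history is preserved across every transfer.
class Agent:
    def __init__(self, name, handoffs=()):
        self.name = name
        self.handoffs = set(handoffs)

    def run(self, history, reply, handoff_to=None):
        history.append((self.name, reply))   # full transcript is shared
        if handoff_to is not None:
            # Only declared targets are legal handoff destinations.
            assert handoff_to.name in self.handoffs, "not a declared handoff"
            return handoff_to
        return None

triage = Agent("triage", handoffs={"billing"})
billing = Agent("billing", handoffs={"triage"})  # lists triage back: A <-> B

history = []
nxt = triage.run(history, "routing to billing", handoff_to=billing)
nxt = nxt.run(history, "can't resolve, escalating", handoff_to=triage)
# Ownership is back with triage, with the whole exchange in history.
```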
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Yes (bidirectional handoff) |
| Peer → Peer | Yes (handoffs) |
| Recursive nesting | Yes (unlimited, beta context management) |
### Semantic Kernel — five patterns, one API
Semantic Kernel provides five orchestration patterns through a unified API: sequential, concurrent, handoff, group chat, and magentic (inspired by Magentic-One). It's merging with AutoGen into the Microsoft Agent Framework.
Handoff orchestration supports bidirectional transfers. Group chat enables shared conversation. Nested orchestrations are supported — orchestrations can contain other orchestrations, each running independently on the runtime.
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via orchestration return |
| Peer → Peer | Yes (handoff/group chat) |
| Recursive nesting | Yes (unlimited) |
### Google ADK — strict tree, shared whiteboard
Google ADK organizes agents in an explicit tree structure. Three categories: LLM agents (reasoning), workflow agents (SequentialAgent, ParallelAgent, LoopAgent), and custom agents. The framework auto-sets parent_agent on each child.
Sibling agents communicate through shared session state — a "shared digital whiteboard" — not through direct messaging. Sub-agents write results to output_key in state, which the parent reads. An agent instance can only be added as a sub-agent once (a second parent raises ValueError), enforcing a strict tree.
Workflow agents can contain sub-agents that are themselves workflow agents, enabling multi-level hierarchies. The AgentTool pattern wraps agents as callable tools.
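Both conventions fit in one small model. This is an illustrative sketch, not the google.adk API: the Agent class below is invented, but it reproduces the two rules described above, siblings coordinating through state keyed by output_key, and a second parent raising ValueError.

```python
# Toy model of the ADK conventions: a strict tree plus a shared-state
# "whiteboard" that siblings read and write instead of messaging each other.
class Agent:
    def __init__(self, name, output_key=None):
        self.name = name
        self.output_key = output_key
        self.parent_agent = None
        self.sub_agents = []

    def add_sub_agent(self, child):
        # An instance may appear in the tree exactly once.
        if child.parent_agent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent_agent = self          # framework auto-sets parent_agent
        self.sub_agents.append(child)

    def run(self, state, result):
        state[self.output_key] = result    # write to the shared whiteboard

root = Agent("root")
researcher = Agent("researcher", output_key="findings")
writer = Agent("writer", output_key="draft")
root.add_sub_agent(researcher)
root.add_sub_agent(writer)

state = {}
researcher.run(state, "three sources")
writer.run(state, f"summary of {state['findings']}")  # sibling reads state
```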
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via shared state |
| Peer → Peer | Via shared state only (indirect) |
| Recursive nesting | Yes (unlimited tree depth) |
## The full landscape
| Framework | Child → Parent | Peer → Peer | Recursive nesting | Max depth | Topology |
|---|---|---|---|---|---|
| Claude Code | Final result only | No | No | 1 | Star |
| OpenClaw | Announce only | No | Configurable | 5 | Tree |
| CrewAI | Task result | Via tools | Static config | ~3 | Hub-spoke |
| AutoGen | Nested return | Broadcast | Yes | Unlimited | Pub-sub |
| LangGraph | Shared state | Yes (swarm) | Yes | ~25 | Supervisor/Swarm |
| OpenAI SDK | Bidirectional | Yes | Yes (beta) | Unlimited | Mesh |
| Semantic Kernel | Orchestration | Yes | Yes | Unlimited | Pattern-dependent |
| Google ADK | Shared state | Indirect | Yes | Unlimited | Strict tree |
## The agent-as-tool pattern
The dominant abstraction across frameworks isn't messaging — it's wrapping agents as tools. Google ADK has AgentTool. CrewAI converts agents into delegation tools. OpenAI's Agents SDK has an explicit agents-as-tools mode. AWS documents it as a standalone pattern.
The appeal: a parent agent calls a sub-agent exactly like it calls any other tool. Same interface, same error handling, same timeout semantics. The sub-agent's entire workflow collapses into a single tool call with input and output. No protocol to design, no message format to agree on.
The limitation: it's strictly request-response. The parent can't observe the sub-agent mid-execution. The sub-agent can't escalate, ask for clarification, or redirect. If the sub-agent gets stuck or goes off-track, the parent discovers this only when the tool call returns — or times out.
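Both the appeal and the limitation show up in the wrapper itself. A generic sketch, not any framework's implementation: as_tool and research_agent are invented names, and the timeout plumbing uses the standard library rather than a real agent runtime.

```python
# The agent-as-tool wrapper: the sub-agent's entire workflow collapses into
# one blocking call, with the same timeout semantics as any other tool.
import concurrent.futures

def as_tool(agent_fn, timeout_s=30):
    def tool(task):
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(agent_fn, task)
            # The parent blocks here. It observes nothing mid-execution;
            # the only signals are a return value or a TimeoutError.
            return future.result(timeout=timeout_s)
    return tool

def research_agent(task):
    # Stand-in for a full multi-step agent workflow.
    return f"report on {task}"

research_tool = as_tool(research_agent, timeout_s=5)
result = research_tool("rust async runtimes")
```

Everything between submit and result is invisible to the parent, which is exactly the blindness described above.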
## What the research says
The MAST taxonomy (March 2025) analyzed 1,600+ traces across seven frameworks and identified 14 failure modes in three categories:
- **Specification failures:** agents disobey task/role specs, repeat steps, lose conversation history, or miss termination conditions.
- **Inter-agent misalignment:** conversation resets, failure to ask for clarification, task derailment, information withholding, ignored inputs, reasoning-action mismatch.
- **Verification failures:** premature termination, incomplete or incorrect verification.
No single failure category dominates — failures are diverse across architectures. The production failure rate across multi-agent systems: 41-87%.
Agent drift research (January 2026) introduces the Agent Stability Index measuring drift across 12 dimensions. The key finding: two-level hierarchies significantly outperform both flat and deep (3+) architectures. Workflows with explicit long-term memory show 21% higher performance retention than those relying on conversation history alone.
A documented case of circular agent-to-agent message relay persisted for 9+ days, consuming 60,000+ tokens. This is the risk of bidirectional communication without circuit breakers.
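One plausible mitigation, an assumption on my part rather than something any surveyed framework ships, is a hop-count circuit breaker: stamp every relayed message with a TTL so a circular relay dies after a handful of hops instead of nine days.

```python
# Hypothetical circuit breaker: each relay increments a hop counter, and the
# message is dropped once it exceeds a fixed budget.
MAX_HOPS = 8

def relay(message, sender, receiver):
    hops = message.get("hops", 0)
    if hops >= MAX_HOPS:
        raise RuntimeError(f"circuit breaker: {hops} hops, dropping message")
    return {
        **message,
        "hops": hops + 1,
        "route": message.get("route", []) + [f"{sender}->{receiver}"],
    }

msg = {"content": "ping"}
try:
    while True:                      # simulate a circular A <-> B relay
        msg = relay(msg, "A", "B")
        msg = relay(msg, "B", "A")
except RuntimeError as e:
    stopped = str(e)                 # the loop dies at the hop budget
```

A hop budget is crude (it also kills legitimately long chains), but it bounds the worst case at a known cost instead of 60,000+ tokens.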
DyLAN (2024) shows that dynamic team formation outperforms static teams — an LLM-powered ranker that deactivates low-performing agents mid-task improves results over fixed configurations. Static crew definitions are simpler but less adaptive.
## Open questions
**Is the agent-as-tool pattern good enough?** Most frameworks have converged on wrapping agents as tools. It's simple and prevents infinite loops. But it means the parent is flying blind during sub-agent execution — the only signal is success or timeout. Claude Code users observed subagents creating fake scripts instead of doing real research, with no way to detect it. Is mid-execution monitoring a must-have, or is it over-engineering?
**Should agents be able to escalate?** OpenAI's bidirectional handoffs are the boldest design choice in this survey. An agent can say "I can't handle this, passing back to you." No other framework makes this a first-class primitive. But bidirectional communication is also the fastest path to infinite loops and the 9-day relay case. Is the escalation capability worth the risk?
**How deep should nesting go?** The research is clear: two-level hierarchies outperform deeper ones. But OpenClaw allows depth 5, AutoGen allows unlimited, and LangGraph defaults to 25 supersteps. Are these limits set by engineering convenience or by evidence? And does the "telephone game" — information degrading through each level — impose a hard ceiling regardless of framework design?
**Can lateral communication replace hierarchy?** LangGraph's swarm mode and OpenAI's handoff mesh both enable peer-to-peer patterns. These are more flexible than trees but harder to debug. MARBLE benchmarks evaluate multi-agent collaboration across star, chain, tree, and graph topologies, but nobody has shown which topology produces the best outcomes for which task types. We explored related failure patterns in multi-agent coordination.
**Does shared state beat message passing?** Google ADK's "shared digital whiteboard" and LangGraph's shared state graphs let agents coordinate through data rather than conversation. This avoids the protocol complexity of messaging but creates hidden coupling — any agent can read or overwrite state that another agent depends on. Is implicit coordination through shared state more reliable than explicit message passing?
**What does monitoring look like for nested agents?** The task registry pattern gives parents visibility into sub-agent state. But when agents nest three levels deep, who's watching the watchers? Current frameworks either provide no monitoring (Claude Code) or basic announce mechanisms (OpenClaw). The gap between "spawned a subagent" and "here's what it's doing right now" is where most production failures hide.
Most frameworks chose restriction over flexibility — one-way communication, shallow nesting, no peer interaction. The ones that chose flexibility got infinite loops and 87% failure rates. The question isn't which direction is right. It's whether there's a middle ground that enables rich coordination without the chaos.
## Further reading
- Claude Code Subagents — flat, single-level design
- OpenClaw Sub-Agents — configurable nesting with depth limits
- LangGraph Hierarchical Agent Teams — supervisors managing supervisors
- OpenAI Agents SDK Handoffs — bidirectional by design
- MAST: Multi-Agent Systems Failure Taxonomy — 14 failure modes from 1,600+ traces
- Agent Drift — two-level hierarchies outperform flat and deep architectures
- DyLAN: Dynamic LLM Agent Network — dynamic teams outperform static ones
- Google ADK Multi-Agent Systems — strict tree with shared whiteboard