When a leader agent delegates work to a team member, can that member escalate back? Can it talk to its peers? Can it recruit its own sub-team? These are the questions that determine whether your multi-agent system is a hierarchy, a mesh, or a tree of one-way pipes.
We surveyed eight frameworks — Claude Code, OpenClaw, CrewAI, AutoGen, LangGraph, OpenAI Agents SDK, Semantic Kernel, and Google ADK — to map the communication patterns they actually support. The findings: most systems are far more restricted than their marketing suggests.
## The three directions
Every agent-to-agent call fits one of three directions:
- Downward (parent → child): A leader delegates a task to a team member. This is universally supported — it's the baseline for multi-agent systems.
- Upward (child → parent): A team member escalates back to its leader. This is where frameworks diverge sharply.
- Lateral (peer → peer): Team members communicate directly without going through the leader. This is the rarest capability.
And one structural question cuts across all three: can a team member become a leader in turn? Can an agent that received a delegated task spawn its own sub-team, becoming the root of a recursive hierarchy?
## Framework by framework
### Claude Code — one level, one direction
Claude Code uses the flattest model of any framework. The main agent spawns subagents via the Task tool. Up to 10 run in parallel. Each gets its own context window.
Communication is strictly one-way. The parent sends a prompt. The subagent returns its final message. There is no mid-execution communication, no streaming, no callback. A feature request documents the gap — users have observed subagents silently abandoning strategies with no way for the parent to detect or intervene.
Can subagents spawn their own subagents? No. The Task tool is explicitly not available to subagents. Users who attempted nesting via claude -p in Bash encountered heap out-of-memory crashes. Nesting depth: 1 level, hardcoded.
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Final result only |
| Peer → Peer | No |
| Recursive nesting | No |
### OpenClaw — configurable depth, vertical only
OpenClaw spawns sub-agents via sessions_spawn. Each runs in an isolated session. Spawning is non-blocking — the parent gets a run ID immediately and can continue working.
Sub-agents "announce" their final result back to the parent's chat channel, but there's no mid-execution communication. Lateral communication between siblings doesn't exist yet — an RFC for Agent Teams proposes direct inter-agent messaging and shared state, but it's unimplemented.
The key differentiator: configurable nesting depth. maxSpawnDepth controls how deep sub-agents can recurse:
- Depth 1 (default): sub-agents cannot spawn children
- Depth 2: enables the orchestrator pattern (main → orchestrator → workers)
- Maximum: depth 5
Additional guardrails: maxChildrenPerAgent: 5, maxConcurrent: 8, runTimeoutSeconds: 900.
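The depth and child guardrails above can be modeled in a few lines. A minimal illustrative sketch, not the OpenClaw implementation: the Agent class and spawn method here are invented, and only the limit semantics come from the docs above (at maxSpawnDepth 2, main can spawn an orchestrator, which can spawn workers, and the chain stops there).

```python
# Toy model of depth-limited spawning. MAX_SPAWN_DEPTH plays the role of
# maxSpawnDepth; MAX_CHILDREN plays the role of maxChildrenPerAgent.
MAX_SPAWN_DEPTH = 2
MAX_CHILDREN = 5

class Agent:
    def __init__(self, name, depth=0):
        self.name = name
        self.depth = depth
        self.children = []

    def spawn(self, name):
        # Enforce both guardrails before creating a child session.
        if self.depth >= MAX_SPAWN_DEPTH:
            raise RuntimeError(f"{self.name}: spawn depth limit reached")
        if len(self.children) >= MAX_CHILDREN:
            raise RuntimeError(f"{self.name}: too many children")
        child = Agent(name, self.depth + 1)
        self.children.append(child)
        return child

main = Agent("main")                       # depth 0
orchestrator = main.spawn("orchestrator")  # depth 1, allowed
worker = orchestrator.spawn("worker")      # depth 2, allowed at depth limit 2
# worker.spawn("grandchild") would raise: spawn depth limit reached
```

With MAX_SPAWN_DEPTH set to 1 (the default), the orchestrator line itself would be the last legal spawn.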
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Announce (final result) |
| Peer → Peer | No (RFC pending) |
| Recursive nesting | Yes (depth 1-5, configurable) |
### CrewAI — delegation as tools
CrewAI runs two process modes: sequential (tasks in order) and hierarchical (manager coordinates workers). Communication follows hub-and-spoke.
When allow_delegation=True, agents get two delegation tools: Delegate Work (assign a task to a teammate by name) and Ask Question (query a colleague). This looks like peer-to-peer, but it's tool-mediated — the framework converts other agents into callable tools on the delegating agent. CrewAI's documentation describes this as "avoiding peer-to-peer agent traffic."
Hierarchical delegation chains are possible via allowed_agents: a management executive delegates to a communications manager, who delegates to an email agent. But this is static configuration, not dynamic sub-team creation. And it's currently broken in some configurations.
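The tool-mediated shape is easy to see in miniature. A hedged sketch, not the CrewAI API: the Agent class, handle method, and tool labels here are invented, but the structure mirrors what Delegate Work does, turning each coworker into an ordinary callable tool on the delegating agent rather than a peer it messages directly.

```python
# Toy model of tool-mediated delegation: peers are wrapped as tools, so there
# is no direct peer-to-peer traffic, only ordinary tool invocations.
class Agent:
    def __init__(self, name):
        self.name = name
        self.tools = {}

    def handle(self, task):
        # Stand-in for the agent's real LLM-driven work loop.
        return f"{self.name} completed: {task}"

    def add_delegation_tools(self, coworkers):
        # Mirrors allow_delegation=True: each coworker becomes a named tool.
        for peer in coworkers:
            self.tools[f"Delegate Work to {peer.name}"] = peer.handle

researcher = Agent("researcher")
writer = Agent("writer")
writer.add_delegation_tools([researcher])

# The writer "delegates" by calling a tool, like any other tool call.
result = writer.tools["Delegate Work to researcher"]("find sources")
```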
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Task result only |
| Peer → Peer | Via delegation tools (tool-mediated) |
| Recursive nesting | Via allowed_agents (static, ~2-3 levels) |
### AutoGen — broadcast and nest
AutoGen uses a GroupChat pattern. All agents subscribe to a shared topic. A GroupChatManager selects the next speaker using round-robin, random, manual, or LLM-driven selection. There's no direct agent-to-agent addressing — everything flows through the broadcast topic.
The nesting story is strong. AutoGen explicitly supports "recursive group chats" — an agent in one group chat can package an entire inner multi-agent conversation as a single response. No explicit depth limit.
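The broadcast topology can be sketched in a few lines. This is an illustrative model, not the autogen API: the GroupChat class here is invented, but it captures the two properties described above, a single shared topic everyone can read and a manager that picks the next speaker (round-robin in this sketch).

```python
# Toy model of the GroupChat pattern: no direct agent-to-agent addressing,
# every message lands on one shared broadcast topic.
from itertools import cycle

class GroupChat:
    def __init__(self, agents):
        self.agents = agents
        self.messages = []           # the shared topic all agents subscribe to
        self._order = cycle(agents)  # round-robin speaker selection

    def step(self, content):
        speaker = next(self._order)  # the manager picks who talks next
        msg = {"from": speaker, "content": content}
        self.messages.append(msg)    # broadcast: visible to every agent
        return msg

chat = GroupChat(["planner", "coder", "critic"])
chat.step("draft a plan")
chat.step("implement step 1")
```

Swapping the round-robin cycle for an LLM call that reads the transcript and names the next speaker gives you the LLM-driven selection mode.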
| Capability | Supported |
|---|---|
| Parent → Child | Yes (via manager) |
| Child → Parent | Via nested chat return |
| Peer → Peer | Via shared broadcast topic |
| Recursive nesting | Yes (unlimited) |
### LangGraph — the most flexible
LangGraph offers three multi-agent patterns: supervisor (central router), swarm (decentralized), and hierarchical (supervisors managing supervisors).
In swarm mode, agents can hand off to peers based on their own assessment — true peer-to-peer. In supervisor mode, all routing goes through the coordinator. Hierarchical mode is a first-class pattern: "you can create multi-level hierarchical systems by creating a supervisor that manages multiple supervisors."
Sub-graphs can have different state schemas from parent graphs, enabling private message histories per agent. Nesting is bounded by recursion_limit (default ~25 supersteps), which is configurable.
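The recursion_limit bound amounts to a superstep counter around the graph loop. A minimal model, assuming nothing about LangGraph internals beyond what the docs state: run_graph, route, and the two-node supervisor/worker loop below are all invented for illustration.

```python
# Toy model of a recursion_limit guard: count supersteps through the graph
# and abort once the limit is exceeded, so cycles cannot run forever.
RECURSION_LIMIT = 25

def run_graph(route, state, start):
    node, steps = start, 0
    while node is not None:
        steps += 1
        if steps > RECURSION_LIMIT:
            raise RecursionError("recursion_limit exceeded")
        node, state = route(node, state)  # one superstep
    return state

# A supervisor/worker cycle that terminates after three round trips.
def route(node, state):
    if node == "supervisor":
        return (None, state) if state["rounds"] >= 3 else ("worker", state)
    state["rounds"] += 1              # the worker reports via shared state
    return ("supervisor", state)

final = run_graph(route, {"rounds": 0}, "supervisor")
```

If the supervisor never issued a stop decision, the guard would halt the cycle at superstep 26 instead of looping indefinitely.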
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via shared state |
| Peer → Peer | Yes (swarm mode) |
| Recursive nesting | Yes (~25 supersteps default, configurable) |
### OpenAI Agents SDK — bidirectional by design
The Agents SDK (successor to Swarm) provides two patterns: handoffs (transfer conversation ownership) and agents-as-tools (invoke as bounded subtask).
Handoffs are bidirectional by design. Agent A lists Agent B in its handoffs. Agent B lists Agent A. Full conversation history is preserved across transfers. This enables circular flows: A → B → A. In agents-as-tools mode, the called agent returns results synchronously.
This is the only framework where true peer-to-peer communication is a core primitive rather than an opt-in mode. The nest_handoff_history beta manages context by collapsing transcript summaries across deep handoff chains.
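The bidirectional shape is simple to model. A hedged sketch, not the Agents SDK API: the Agent class and run method here are invented, but they show the two ingredients described above, mutual handoff declarations and a single history that travels with every transfer.

```python
# Toy model of bidirectional handoffs: each agent declares who it may hand
# off to, and the shared history is preserved across every transfer.
class Agent:
    def __init__(self, name, handoffs=()):
        self.name = name
        self.handoffs = set(handoffs)

    def run(self, history, reply, handoff_to=None):
        history.append((self.name, reply))   # full transcript is shared
        if handoff_to is not None:
            # Only declared targets are legal handoff destinations.
            assert handoff_to.name in self.handoffs, "not a declared handoff"
            return handoff_to
        return None

triage = Agent("triage", handoffs={"billing"})
billing = Agent("billing", handoffs={"triage"})  # lists triage back: A <-> B

history = []
nxt = triage.run(history, "routing to billing", handoff_to=billing)
nxt = nxt.run(history, "can't resolve, escalating", handoff_to=triage)
# Ownership is back with triage, with the whole exchange in history.
```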
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Yes (bidirectional handoff) |
| Peer → Peer | Yes (handoffs) |
| Recursive nesting | Yes (unlimited, beta context management) |
### Semantic Kernel — five patterns, one API
Semantic Kernel provides five orchestration patterns through a unified API: sequential, concurrent, handoff, group chat, and magentic (inspired by Magentic-One). It's merging with AutoGen into the Microsoft Agent Framework.
Handoff orchestration supports bidirectional transfers. Group chat enables shared conversation. Nested orchestrations are supported — orchestrations can contain other orchestrations, each running independently on the runtime.
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via orchestration return |
| Peer → Peer | Yes (handoff/group chat) |
| Recursive nesting | Yes (unlimited) |
### Google ADK — strict tree, shared whiteboard
Google ADK organizes agents in an explicit tree structure. Three categories: LLM agents (reasoning), workflow agents (SequentialAgent, ParallelAgent, LoopAgent), and custom agents. The framework auto-sets parent_agent on each child.
Sibling agents communicate through shared session state — a "shared digital whiteboard" — not through direct messaging. Sub-agents write results to output_key in state, which the parent reads. An agent instance can only be added as a sub-agent once (a second parent raises ValueError), enforcing a strict tree.
Workflow agents can contain sub-agents that are themselves workflow agents, enabling multi-level hierarchies. The AgentTool pattern wraps agents as callable tools.
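Both conventions fit in one small model. This is an illustrative sketch, not the google.adk API: the Agent class below is invented, but it reproduces the two rules described above, siblings coordinating through state keyed by output_key, and a second parent raising ValueError.

```python
# Toy model of the ADK conventions: a strict tree plus a shared-state
# "whiteboard" that siblings read and write instead of messaging each other.
class Agent:
    def __init__(self, name, output_key=None):
        self.name = name
        self.output_key = output_key
        self.parent_agent = None
        self.sub_agents = []

    def add_sub_agent(self, child):
        # An instance may appear in the tree exactly once.
        if child.parent_agent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent_agent = self          # framework auto-sets parent_agent
        self.sub_agents.append(child)

    def run(self, state, result):
        state[self.output_key] = result    # write to the shared whiteboard

root = Agent("root")
researcher = Agent("researcher", output_key="findings")
writer = Agent("writer", output_key="draft")
root.add_sub_agent(researcher)
root.add_sub_agent(writer)

state = {}
researcher.run(state, "three sources")
writer.run(state, f"summary of {state['findings']}")  # sibling reads state
```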
| Capability | Supported |
|---|---|
| Parent → Child | Yes |
| Child → Parent | Via shared state |
| Peer → Peer | Via shared state only (indirect) |
| Recursive nesting | Yes (unlimited tree depth) |
## The full landscape
| Framework | Child → Parent | Peer → Peer | Recursive nesting | Max depth | Topology |
|---|---|---|---|---|---|
| Claude Code | Final result only | No | No | 1 | Star |
| OpenClaw | Announce only | No | Configurable | 5 | Tree |
| CrewAI | Task result | Via tools | Static config | ~3 | Hub-spoke |
| AutoGen | Nested return | Broadcast | Yes | Unlimited | Pub-sub |
| LangGraph | Shared state | Yes (swarm) | Yes | ~25 | Supervisor/Swarm |
| OpenAI SDK | Bidirectional | Yes | Yes (beta) | Unlimited | Mesh |
| Semantic Kernel | Orchestration | Yes | Yes | Unlimited | Pattern-dependent |
| Google ADK | Shared state | Indirect | Yes | Unlimited | Strict tree |
## The agent-as-tool pattern
The dominant abstraction across frameworks isn't messaging — it's wrapping agents as tools. Google ADK has AgentTool. CrewAI converts agents into delegation tools. OpenAI's Agents SDK has an explicit agents-as-tools mode. AWS documents it as a standalone pattern.
The appeal: a parent agent calls a sub-agent exactly like it calls any other tool. Same interface, same error handling, same timeout semantics. The sub-agent's entire workflow collapses into a single tool call with input and output. No protocol to design, no message format to agree on.
The limitation: it's strictly request-response. The parent can't observe the sub-agent mid-execution. The sub-agent can't escalate, ask for clarification, or redirect. If the sub-agent gets stuck or goes off-track, the parent discovers this only when the tool call returns — or times out.
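Both the appeal and the limitation show up in the wrapper itself. A generic sketch, not any framework's implementation: as_tool and research_agent are invented names, and the timeout plumbing uses the standard library rather than a real agent runtime.

```python
# The agent-as-tool wrapper: the sub-agent's entire workflow collapses into
# one blocking call, with the same timeout semantics as any other tool.
import concurrent.futures

def as_tool(agent_fn, timeout_s=30):
    def tool(task):
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(agent_fn, task)
            # The parent blocks here. It observes nothing mid-execution;
            # the only signals are a return value or a TimeoutError.
            return future.result(timeout=timeout_s)
    return tool

def research_agent(task):
    # Stand-in for a full multi-step agent workflow.
    return f"report on {task}"

research_tool = as_tool(research_agent, timeout_s=5)
result = research_tool("rust async runtimes")
```

Everything between submit and result is invisible to the parent, which is exactly the blindness described above.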
## What the research says
The MAST taxonomy (March 2025) analyzed 1,600+ traces across seven frameworks and identified 14 failure modes in three categories:
- **Specification failures:** agents disobey task/role specs, repeat steps, lose conversation history, or miss termination conditions.
- **Inter-agent misalignment:** conversation resets, failure to ask for clarification, task derailment, information withholding, ignored inputs, reasoning-action mismatch.
- **Verification failures:** premature termination, incomplete or incorrect verification.
No single failure category dominates — failures are diverse across architectures. The production failure rate across multi-agent systems: 41-87%.
Agent drift research (January 2026) introduces the Agent Stability Index measuring drift across 12 dimensions. The key finding: two-level hierarchies significantly outperform both flat and deep (3+) architectures. Workflows with explicit long-term memory show 21% higher performance retention than those relying on conversation history alone.
A documented case of circular agent-to-agent message relay persisted for 9+ days, consuming 60,000+ tokens. This is the risk of bidirectional communication without circuit breakers.
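One plausible mitigation, an assumption on my part rather than something any surveyed framework ships, is a hop-count circuit breaker: stamp every relayed message with a TTL so a circular relay dies after a handful of hops instead of nine days.

```python
# Hypothetical circuit breaker: each relay increments a hop counter, and the
# message is dropped once it exceeds a fixed budget.
MAX_HOPS = 8

def relay(message, sender, receiver):
    hops = message.get("hops", 0)
    if hops >= MAX_HOPS:
        raise RuntimeError(f"circuit breaker: {hops} hops, dropping message")
    return {
        **message,
        "hops": hops + 1,
        "route": message.get("route", []) + [f"{sender}->{receiver}"],
    }

msg = {"content": "ping"}
try:
    while True:                      # simulate a circular A <-> B relay
        msg = relay(msg, "A", "B")
        msg = relay(msg, "B", "A")
except RuntimeError as e:
    stopped = str(e)                 # the loop dies at the hop budget
```

A hop budget is crude (it also kills legitimately long chains), but it bounds the worst case at a known cost instead of 60,000+ tokens.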
DyLAN (2024) shows that dynamic team formation outperforms static teams — an LLM-powered ranker that deactivates low-performing agents mid-task improves results over fixed configurations. Static crew definitions are simpler but less adaptive.
## Open questions
**Is the agent-as-tool pattern good enough?** Most frameworks have converged on wrapping agents as tools. It's simple and prevents infinite loops. But it means the parent is flying blind during sub-agent execution — the only signal is success or timeout. Claude Code users observed subagents creating fake scripts instead of doing real research, with no way to detect it. Is mid-execution monitoring a must-have, or is it over-engineering?
**Should agents be able to escalate?** OpenAI's bidirectional handoffs are the boldest design choice in this survey. An agent can say "I can't handle this, passing back to you." No other framework makes this a first-class primitive. But bidirectional communication is also the fastest path to infinite loops and the 9-day relay case. Is the escalation capability worth the risk?
**How deep should nesting go?** The research is clear: two-level hierarchies outperform deeper ones. But OpenClaw allows depth 5, AutoGen allows unlimited, and LangGraph defaults to 25 supersteps. Are these limits set by engineering convenience or by evidence? And does the "telephone game" — information degrading through each level — impose a hard ceiling regardless of framework design?
**Can lateral communication replace hierarchy?** LangGraph's swarm mode and OpenAI's handoff mesh both enable peer-to-peer patterns. These are more flexible than trees but harder to debug. MARBLE benchmarks evaluate multi-agent collaboration across star, chain, tree, and graph topologies, but nobody has shown which topology produces the best outcomes for which task types. We explored related failure patterns in multi-agent coordination.
**Does shared state beat message passing?** Google ADK's "shared digital whiteboard" and LangGraph's shared state graphs let agents coordinate through data rather than conversation. This avoids the protocol complexity of messaging but creates hidden coupling — any agent can read or overwrite state that another agent depends on. Is implicit coordination through shared state more reliable than explicit message passing?
**What does monitoring look like for nested agents?** The task registry pattern gives parents visibility into sub-agent state. But when agents nest three levels deep, who's watching the watchers? Current frameworks either provide no monitoring (Claude Code) or basic announce mechanisms (OpenClaw). The gap between "spawned a subagent" and "here's what it's doing right now" is where most production failures hide.
Most frameworks chose restriction over flexibility — one-way communication, shallow nesting, no peer interaction. The ones that chose flexibility got infinite loops and 87% failure rates. The question isn't which direction is right. It's whether there's a middle ground that enables rich coordination without the chaos.
## Further reading
- Claude Code Subagents — flat, single-level design
- OpenClaw Sub-Agents — configurable nesting with depth limits
- LangGraph Hierarchical Agent Teams — supervisors managing supervisors
- OpenAI Agents SDK Handoffs — bidirectional by design
- MAST: Multi-Agent Systems Failure Taxonomy — 14 failure modes from 1,600+ traces
- Agent Drift — two-level hierarchies outperform flat and deep architectures
- DyLAN: Dynamic LLM Agent Network — dynamic teams outperform static ones
- Google ADK Multi-Agent Systems — strict tree with shared whiteboard