Sub-Agents vs Tools: ADK Multi-Agent Decision Framework

#multiagentsystems #googleadk #agentarchitecture #llmengineering

Stop Building Sub-Agents for Everything: A Decision Framework for ADK Multi-Agent Systems

Most multi-agent architecture diagrams look elegant. Clean boxes, directional arrows, specialised agents handling discrete domains. The problem? These diagrams optimise for whiteboard clarity, not production behaviour.

I've spent the last year helping SaaS teams across Canada and the US build agent systems on Google ADK. The pattern I see repeatedly: teams default to sub-agents because the architecture looks cleaner, then spend weeks debugging state-passing failures, latency spikes, and cascading errors that wouldn't exist if they'd used the right abstraction from the start.

The sub-agent vs tool decision isn't cosmetic. It determines how state flows through your system, how errors propagate, how you scale, and how much latency you add per reasoning step. Get this wrong early, and you're refactoring agent architecture later — which is significantly more disruptive than refactoring code because the behaviour is harder to test.

The Hidden Cost of Over-Architected Agent Systems

When teams first adopt ADK, there's a natural pull toward the sub-agent pattern. It maps nicely to how we think about team structure — a billing agent, an infrastructure agent, a compliance agent. Clean separation of concerns. Independent reasoning domains.

But here's what the architecture diagrams don't show: every sub-agent call involves at least one additional LLM round trip. For a coordinator that orchestrates three sub-agents sequentially, you've added 3-4 LLM calls that wouldn't exist if those tasks were tools. At 500ms per call, that's 1.5-2 seconds of latency for the sub-agent coordination alone — before you even count the actual work.
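The arithmetic above is easy to redo for your own numbers. The following is a minimal sketch that assumes a flat per-call latency; the function name and the 500 ms default are illustrative, and real round-trip times vary with model and prompt size.

```python
def coordination_overhead_ms(extra_llm_calls: int, ms_per_call: float = 500.0) -> float:
    """Latency added purely by extra LLM round trips, in milliseconds."""
    return extra_llm_calls * ms_per_call


# Three sequential sub-agents add roughly 3-4 extra LLM calls.
low = coordination_overhead_ms(3)
high = coordination_overhead_ms(4)
print(f"{low / 1000:.1f}-{high / 1000:.1f} s of added coordination latency")
```

Plugging in a measured p95 latency instead of the 500 ms assumption gives a more honest budget for the worst case.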

I've seen teams build sub-agents for simple API lookups that should have been tools. A sub-agent to fetch project metadata. A sub-agent to check IAM bindings. Deterministic operations wrapped in reasoning overhead. The result: 2-3 LLM round trips for something that should be a function call returning structured data.

The opposite failure mode is equally common. A single agent with 30+ tools becomes unmanageable. The LLM context window fills with tool descriptions. Tool selection accuracy degrades. The agent starts calling the wrong tools or hallucinating capabilities the tools don't have.

Neither extreme works. Production systems need a principled framework for when to use each pattern.

My Framework: Reasoning Boundaries Determine Architecture

The decision framework I use with teams is simple: sub-agents handle independent reasoning; tools handle deterministic operations.

If the task requires multi-step reasoning, maintains its own memory, or involves complex decision trees — that's a sub-agent. If the output is deterministic given the input, the task is self-contained, or you want to limit what the sub-task can access — that's a tool.

Here's how this plays out in ADK code. A coordinator with sub-agents looks like this:

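from google.adk.agents import Agent

# billing_agent and infra_agent are Agent instances defined elsewhere;
# get_project_metadata is a plain Python function.
coordinator = Agent(
    name="coordinator",
    model="gemini-2.0-flash",
    sub_agents=[billing_agent, infra_agent],  # delegated reasoning domains
    tools=[get_project_metadata],  # deterministic lookup, no reasoning
)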

Notice get_project_metadata is a tool, not a sub-agent. It returns structured data. No reasoning required.
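For illustration, a tool like this can be an ordinary typed function with a docstring. The sketch below is an assumption about what get_project_metadata might look like: the in-memory lookup table and the field names are hypothetical stand-ins for a real API call.

```python
# Hypothetical in-memory data standing in for a real metadata API.
_PROJECTS = {
    "demo-project": {"owner": "platform-team", "region": "northamerica-northeast1"},
}


def get_project_metadata(project_id: str) -> dict:
    """Return metadata for a project as structured data.

    Same input, same output: there is nothing here for an LLM to reason about.
    """
    project = _PROJECTS.get(project_id)
    if project is None:
        return {"status": "error", "message": f"unknown project: {project_id}"}
    return {"status": "ok", "project_id": project_id, **project}


print(get_project_metadata("demo-project"))
```

Returning a status field for the unknown-project case keeps the failure visible to the calling agent as data rather than as an exception.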

The agent-as-tool pattern wraps an agent call in a tool interface:

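from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool

# AgentTool reads the tool description from code_analysis_agent.description,
# e.g. "Analyzes Terraform code for security misconfigurations".
analysis_tool = AgentTool(agent=code_analysis_agent)
main_agent = Agent(
    name="main_agent", model="gemini-2.0-flash", tools=[analysis_tool, deploy_tool]
)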

This pattern works when you need contained reasoning that returns a structured result. The calling agent sees it as a black box. The code analysis agent does its multi-step work internally, but the main agent just gets back a report.

The critical distinction: sub-agents can access the coordinator's state and participate in broader workflows. Agent-as-tool returns a result and exits. Choose based on whether you need ongoing collaboration or isolated computation.
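The three-way choice can be written down as a small checklist. This is a sketch of the framework as described here, not an ADK API; the TaskProfile fields and the choose_pattern function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Answers to the questions the framework asks about a task."""
    needs_reasoning: bool = False          # multi-step reasoning or decision trees
    needs_coordinator_state: bool = False  # ongoing collaboration in a wider workflow


def choose_pattern(task: TaskProfile) -> str:
    """Map a task profile to 'tool', 'agent_as_tool', or 'sub_agent'."""
    if not task.needs_reasoning:
        return "tool"           # deterministic given its input
    if task.needs_coordinator_state:
        return "sub_agent"      # reasoning that collaborates with the coordinator
    return "agent_as_tool"      # contained reasoning, returns a result and exits


print(choose_pattern(TaskProfile()))
print(choose_pattern(TaskProfile(needs_reasoning=True)))
print(choose_pattern(TaskProfile(needs_reasoning=True, needs_coordinator_state=True)))
```

The ordering of the checks is the design choice: the cheapest pattern is tried first, and the sub-agent pattern is only reached when both questions are answered yes.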

Patterns I've Seen Break in Production

No retry logic between sub-agents. One sub-agent failure cascades to full pipeline failure with no graceful degradation. I've watched an entire document processing pipeline fail because a metadata extraction sub-agent timed out — and there was no fallback. The fix: sub-agents should return structured error responses, not raise exceptions that propagate to the coordinator.
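One way to get that behaviour is a small wrapper around each step. The sketch below uses plain Python rather than any ADK retry mechanism; run_step and extract_metadata are hypothetical names, and the always-failing step exists only to show the fallback path.

```python
from typing import Any, Callable


def run_step(name: str, fn: Callable[[], Any], retries: int = 2) -> dict[str, Any]:
    """Run fn, retrying on failure, and always return a structured result."""
    last_error = ""
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            return {"status": "ok", "step": name, "data": fn(), "attempts": attempt}
        except Exception as exc:  # broad on purpose: nothing should escape
            last_error = f"{type(exc).__name__}: {exc}"
    return {"status": "error", "step": name, "error": last_error, "attempts": retries + 1}


def extract_metadata() -> dict[str, str]:
    """Hypothetical step that always times out."""
    raise TimeoutError("metadata service did not respond")


result = run_step("extract_metadata", extract_metadata)
# Degrade gracefully: continue the pipeline without metadata.
metadata = result["data"] if result["status"] == "ok" else {}
print(result)
```

The coordinator can then branch on result["status"] and decide whether to continue with partial data, instead of the whole pipeline failing on one timeout.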

Large context objects passed between sub-agents. Teams try to share state by passing entire conversation histories or document contents through the coordinator. This bloats context windows and causes mysterious failures when you hit token limits mid-workflow. Use structured references instead — pass document IDs, not documents. Let each sub-agent fetch what it needs.
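A minimal sketch of the reference-passing idea, assuming a document store that each sub-agent can reach on its own. The dictionary store and all function names here are hypothetical.

```python
import json

# Hypothetical document store; in practice this would be a bucket or database.
DOCUMENT_STORE = {"doc-123": "long contract text " * 1000}


def fetch_document(doc_id: str) -> str:
    """Load the full document only where it is actually needed."""
    return DOCUMENT_STORE[doc_id]


def make_handoff(doc_id: str, task: str) -> dict:
    """What travels through the coordinator: a small reference, not the content."""
    return {"doc_id": doc_id, "task": task}


def count_characters(handoff: dict) -> dict:
    """Stand-in for a sub-agent step that resolves the reference itself."""
    text = fetch_document(handoff["doc_id"])
    return {"doc_id": handoff["doc_id"], "chars": len(text)}


handoff = make_handoff("doc-123", "summarise")
print(len(json.dumps(handoff)), "bytes passed through the coordinator")
print(count_characters(handoff))
```

The hand-off stays a few dozen bytes no matter how large the document grows, so the coordinator's context is not what hits the token limit.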

Agent-as-tool without observability. The pattern reduces visibility by design — the wrapped agent is a black box. I've debugged systems where no one could explain what the analysis agent was actually doing internally. Without explicit logging inside the wrapped agent, you lose traceability. Add structured logging before you need it.
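As a sketch of what "structured logging inside the wrapped agent" can mean, the decorator below emits one JSON line per internal step using only the standard library. The logged_step decorator and the scan_terraform step are hypothetical, and the string check is a placeholder for real analysis.

```python
import json
import logging
import time
from functools import wraps

logger = logging.getLogger("wrapped_agent")
logger.setLevel(logging.INFO)


def logged_step(step_name: str):
    """Log one JSON record per call: step name, outcome, and duration."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "ok"
                return result
            finally:
                duration_ms = round((time.perf_counter() - start) * 1000, 2)
                logger.info(json.dumps(
                    {"step": step_name, "status": status, "duration_ms": duration_ms}
                ))
        return wrapper
    return decorator


@logged_step("scan_terraform")
def scan_terraform(source: str) -> list[str]:
    """Placeholder check standing in for the agent's real analysis step."""
    return ["public bucket ACL"] if "public-read" in source else []
```

Attaching a handler (for example via logging.basicConfig) sends these records wherever the rest of the system's logs go, so each step inside the black box leaves a trace.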

Memory isolation surprises. Sub-agent memory is isolated by default in ADK. Teams assume context flows automatically, then wonder why their infrastructure agent doesn't remember what the billing agent just discovered. If you need shared context across sub-agents, you have to explicitly pass it through the coordinator.
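The explicit hand-off can be sketched with plain functions standing in for the agents. None of this is ADK API; billing_agent, infra_agent, and coordinator are hypothetical stand-ins that show only the data flow.

```python
def billing_agent(request: str) -> dict:
    """Stand-in for a billing sub-agent; returns a structured finding."""
    return {"over_budget_projects": ["proj-a"]}


def infra_agent(request: str, context: dict) -> str:
    """Stand-in for an infrastructure sub-agent that only knows what it is given."""
    flagged = context.get("over_budget_projects", [])
    if not flagged:
        return "No billing context received; nothing to review."
    return f"Reviewing infrastructure for: {', '.join(flagged)}"


def coordinator(request: str) -> str:
    findings = billing_agent(request)
    # Explicit hand-off: select what the next agent needs and pass it along.
    context = {"over_budget_projects": findings["over_budget_projects"]}
    return infra_agent(request, context)


print(coordinator("Why did costs spike this month?"))
print(infra_agent("Why did costs spike this month?", {}))
```

The second call shows the surprise described above: without the hand-off, the infrastructure agent has nothing to work with.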

The Trade-off Matrix

Factor	Sub-Agent	Agent as Tool	Plain Tool
Latency	Higher (1+ LLM calls)	Higher (1+ LLM calls)	Lowest
Independent testing	Easy	Easy	Easiest
State access	Coordinator state available	Isolated	N/A
Observability	Good	Requires explicit logging	Full visibility
Use case	Complex reasoning	Contained reasoning	Deterministic ops

Sub-agents are easy to test independently because each has a clear input/output contract, but they add latency and complexity. For time-sensitive workflows where you're measuring response time in seconds, every unnecessary LLM call hurts.

Agent-as-tool gives you contained reasoning with a clean interface, but you trade observability. Plain tools are fastest but can't reason.

Why This Matters for Platform Teams

This connects directly to the Automation pillar of the SCALE framework. Agent systems are infrastructure now. The architectural decisions you make about sub-agents vs tools compound as the system grows — affecting operational cost, debugging time, and end-user latency.

I've seen teams burn weeks refactoring agent architecture because they defaulted to sub-agents for everything. Agent behaviour is harder to test than code behaviour. When you change how reasoning flows through your system, you're not just changing code — you're changing emergent behaviour that's difficult to verify with traditional testing.

Start with tools for deterministic operations and single-purpose tasks. Add sub-agents when you need independent reasoning that can't be expressed as a function call. Use agent-as-tool when you need contained reasoning that returns a structured result to an orchestrator.

The decision framework isn't complicated. But I rarely see teams apply it systematically before building. Most start with whatever pattern they saw in a tutorial, then refactor when production behaviour surprises them.

The architecture diagram isn't the system. The latency, state flow, and error propagation are the system. Optimise for those.

Amit Malhotra is Principal GCP Architect at Buoyant Cloud Inc, helping B2B SaaS companies design production-ready platforms on GCP.

Work with a GCP specialist — book a free discovery call → https://buoyantcloudtech.com