Zrcic
Sub-Agent Architectures: Patterns, Trade-offs, and a Kotlin Implementation

The Problem

Single-agent systems hit a wall fast. Give one agent too many tools and a complex task, and you get:

  • Context bloat — every intermediate result accumulates in the conversation history
  • Tool confusion — a model with 20 tools tends to make worse decisions than one with 5 (tool-use accuracy degrades with scale; see the Gorilla LLM benchmark for empirical evidence)
  • No isolation — a failed step can corrupt the entire conversation state
  • No parallelism — sequential execution even when tasks are independent

Larger context windows reduce but don't eliminate these problems. A 1M-token context still degrades with irrelevant history, and tool confusion is a model behavior issue independent of window size. Architecture is the more scalable solution — but it's a trade-off, not a free lunch. Sub-agents add latency, coordination complexity, and cost.


What Is a Sub-Agent?

A sub-agent is a separate LLM call with its own isolated conversation context, invoked by a parent agent to handle a specific, self-contained task. "Separate instance" is a common shorthand but can mislead — it's typically the same model, same API key, same underlying compute. The isolation is logical, not physical. Every sub-agent call is another round of LLM inference costs.

What a sub-agent gets depends on how you design it. At minimum:

  • A system prompt tailored to its job
  • A lifecycle — it runs, returns a result, and is discarded

Beyond that, context isolation is a design variable, not a defining property of the pattern. In AIgent, sub-agents start with a fresh conversation history and never see the parent's history. In AutoGen, agents share a group chat history by default. In LangGraph, nodes can read and write shared state. Whether a sub-agent has isolated context or shared context depends entirely on the framework and implementation choices.

This article describes AIgent's approach — isolated context, restricted tool sets — as one point in that design space, not the only valid one.

┌──────────────────────────────────────────────────┐
│                  Parent Agent                    │
│  • Full tool set       • Persistent history      │
│  • User-facing         • Skill injection         │
└────────────────────┬─────────────────────────────┘
                     │
                [Model decides]
              to call spawn_agent
              tool with task + tools
                     │
                     ▼
          ┌──────────────────────┐
          │     Sub-Agent        │
          │  • Limited tools     │
          │  • Isolated context* │
          │  • Focused task      │
          └──────────┬───────────┘
                     │ returns result string
                     ▼
┌──────────────────────────────────────────────────┐
│           Parent Agent (continues)               │
└──────────────────────────────────────────────────┘

* Context isolation is a design choice — see note below.

When Did Sub-Agents Emerge?

Sub-agents didn't appear overnight. They evolved through distinct phases:

2022 — ReAct and Tool Use

Tool-augmented LMs existed before 2022. What the ReAct paper (Yao et al., 2022) contributed was empirical evidence that interleaving reasoning traces with actions outperforms either alone — it beat action-only approaches (like WebGPT) and reasoning-only approaches (like chain-of-thought) on question-answering and fact-verification tasks. The interleaving structure — think, act, observe, think again — became the blueprint for agent loops that followed. Still single-agent, but foundational.

2023 — Autonomous Agent Experiments

Projects like AutoGPT and BabyAGI showed what happens when you give a model the ability to spawn tasks and run them recursively. They were chaotic and unreliable, but they proved the concept: agents could delegate to themselves. The LangChain and LlamaIndex ecosystems formalized tool-using agents and chains, making sub-agent patterns accessible to developers.

2024 — Production-Grade Orchestration

CrewAI, AutoGen, and LangGraph moved the conversation from "can this work?" to "how do you build it reliably?" Role-based agent teams, graph-based execution, and structured handoffs between agents became first-class patterns. The focus shifted from raw autonomy to controlled delegation.

By 2025 — Native SDK Support

API providers began shipping sub-agent patterns as first-class features. Anthropic's Claude Agent SDK and Google's Gemini multi-agent APIs made spawning isolated sub-agents a few lines of code rather than a framework to build yourself.


Orchestration Patterns

These are structural decisions about how agents are created, what work they do, and how results flow back. They're distinct from capability mechanisms (covered in the next section), which are about what information an agent has access to.

1. ReAct Loop (Single Agent)

The building block. The model reasons, acts, observes, and repeats until done.

User input → [Think] → [Tool Call] → [Observe result] → [Think] → ... → Final answer

Every agent loop in every framework is a variation of this. In production, it's typically bounded by a max-iterations guard to prevent infinite loops:

User input → [Think] → [Tool Call] → [Observe] → ... (max N iterations) → Final answer
                                                              ↓
                                                    [Forced termination if N exceeded]

Pros:

  • Simple to implement
  • Easy to trace and debug
  • No coordination overhead

Cons:

  • Context grows linearly with each tool call
  • All tools visible at once — decision quality degrades with scale
  • No parallelism

2. Hierarchical Delegation (Orchestrator + Sub-Agents)

A parent agent breaks work into tasks and delegates each to a specialized sub-agent with a restricted tool set and fresh context.

┌─────────────────────────────────────────────────────┐
│                   Orchestrator                      │
│   Tools: [spawn_agent, search_web, get_time, ...]   │
└───────────┬──────────────────┬──────────────────────┘
            │                  │
            ▼                  ▼
   ┌──────────────┐   ┌──────────────────┐
   │  Sub-Agent A │   │   Sub-Agent B    │
   │  [search_web]│   │ [push_to_github] │
   └──────────────┘   └──────────────────┘

The key insight is tool isolation: sub-agents only get the tools relevant to their task. This improves decision quality and limits blast radius.

Context isolation — whether sub-agents see the parent's history or share state — is a design variable. Parallelism is structurally possible (sub-agents with isolated state have no ordering dependency), but requires explicit async wiring; a blocking implementation runs them sequentially.

Pros:

  • Tool scoping improves decision quality per sub-task
  • Parallelism is possible when sub-tasks are independent and isolated
  • Limits damage from a misbehaving sub-agent (if contexts are isolated)

Cons:

  • Each sub-agent is a full LLM call — latency and cost multiply with every delegation
  • Silent failures: a sub-agent that returns an empty or nonsensical result is indistinguishable from a valid response at the string boundary. Single-agent loops fail loudly (the model keeps trying); hierarchical delegation fails quietly (the parent receives garbage and reasons from it)
  • No mechanism for sub-agents to ask for clarification; ambiguous tasks produce hallucinated results
  • Coordination logic lives in the parent prompt — hard to test explicitly

3. Plan & Execute

The agent first produces a full plan (a list of steps), then executes each step — potentially spawning sub-agents per step. Planning and execution are separated.

[User input] → [Planning Agent] → [Step 1, Step 2, Step 3]
                                         │
                          ┌──────────────┼──────────────┐
                          ▼              ▼              ▼
                     [Executor 1]  [Executor 2]  [Executor 3]
                          └──────────────┼──────────────┘
                                         ▼
                                   [Synthesizer]

Pros:

  • Predictable execution path
  • Easy to inspect and approve before running

Cons:

  • Rigid — the plan can't adapt to what it discovers mid-execution
  • Upfront planning requires good foresight from the model
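The shape of the pattern fits in a few lines of Kotlin. Everything below is illustrative scaffolding — `plan`, `execute`, and `synthesize` stand in for what would be LLM calls in a real system:

```kotlin
// Hypothetical plan-and-execute skeleton; plan/execute/synthesize are stubs
// standing in for LLM calls.
data class Step(val description: String)

fun plan(input: String): List<Step> =
    // In practice: one LLM call that decomposes the request into steps.
    input.split(" then ").map { Step(it.trim()) }

fun execute(step: Step): String =
    // In practice: a ReAct loop or sub-agent per step.
    "done: ${step.description}"

fun synthesize(results: List<String>): String =
    // In practice: a final LLM call that merges step results.
    results.joinToString("; ")

fun planAndExecute(input: String): String =
    synthesize(plan(input).map { execute(it) })
```

The rigidity shows up right in the structure: `plan` runs once, before any `execute`, so nothing an executor discovers can change the step list.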

4. Role-Based Multi-Agent Teams

Multiple agents with distinct roles coordinate via shared memory or message passing. Common roles: Researcher, Writer, Critic, Executor.

┌──────────┐    findings    ┌──────────┐    draft    ┌──────────┐
│Researcher│ ─────────────▶ │  Writer  │ ──────────▶ │  Critic  │
└──────────┘                └──────────┘             └──────────┘
                                                           │
                                                    feedback│
                                                           ▼
                                                    ┌──────────┐
                                                    │  Writer  │
                                                    └──────────┘

Used heavily in CrewAI and AutoGen. Great for creative or review-heavy pipelines; complex to coordinate reliably.
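The message-passing structure above reduces to a function pipeline. A sketch with stubbed roles (all function names here are illustrative, not from any framework):

```kotlin
// Illustrative role pipeline; each function stands in for an agent's LLM call.
fun research(topic: String) = "findings about $topic"
fun write(findings: String) = "draft based on $findings"
fun critique(draft: String) = "tighten the intro"
fun revise(draft: String, feedback: String) = "$draft (revised per: $feedback)"

// Researcher → Writer → Critic → Writer, exactly as in the diagram.
fun pipeline(topic: String): String {
    val draft = write(research(topic))
    return revise(draft, critique(draft))
}
```

The hard part in real systems isn't this happy path — it's deciding when the Critic→Writer loop terminates and what happens when roles disagree.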


Capability Mechanisms

These are distinct from orchestration patterns. They don't change how agents are spawned or how work is divided — they change what knowledge or behavior an agent has access to within a single context.

| Mechanism | How It Works | When to Use |
| --- | --- | --- |
| Skill injection | Relevant instructions prepended to system prompt | Conditional behavior without spawning overhead |
| RAG | Retrieved document chunks injected into the message | Domain knowledge too large for static prompts |
| Memory | Previous conversation facts loaded into context | Personalization, continuity across sessions |

Skill-Based Injection

The parent agent dynamically loads context ("skills") into its system prompt based on the user's request. No new LLM instance — just different instructions per request.

User: "Remind me to call John tomorrow at 9am"
        │
        ▼
┌──────────────────────────────────────────────────┐
│  selectRelevant("remind", allSkills)             │
│  → matches "reminders" skill                     │
│  → injects into system prompt                    │
└──────────────────────────────────────────────────┘

In AIgent, selectRelevant is word intersection — no embeddings, no LLM call:

// src/main/kotlin/SkillLoader.kt
fun selectRelevant(input: String, all: List<Skill>): List<Skill> {
    val words = input.lowercase().split(Regex("\\W+")).filter { it.length > 2 }.toSet()
    val matched = all.filter { skill ->
        val skillWords = (skill.name + " " + skill.description).lowercase().split(Regex("\\W+")).toSet()
        words.intersect(skillWords).isNotEmpty()
    }
    return matched.ifEmpty { all }  // fallback: inject ALL skills if nothing matches
}

It splits both the user input and each skill's name+description into lowercase words (≥3 chars) and checks for any overlap. Fast and cheap — no extra API calls. The fallback on the last line is significant: if nothing matches, every skill gets injected. With two skills that's fine; with twenty it's a prompt full of unrelated instructions.

This is what "lightweight" means here. Alternative approaches — embedding similarity or a classifier LLM call — are more accurate but add latency and cost before the main response even starts.
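To make the matching and fallback behavior concrete, here is the same logic reproduced with a minimal `Skill` type (the type and sample skills are illustrative, not the repo's exact classes):

```kotlin
// Minimal reproduction of the word-intersection matcher with an
// illustrative Skill type, to show the fallback concretely.
data class Skill(val name: String, val description: String)

fun selectRelevant(input: String, all: List<Skill>): List<Skill> {
    val words = input.lowercase().split(Regex("\\W+")).filter { it.length > 2 }.toSet()
    val matched = all.filter { skill ->
        val skillWords = (skill.name + " " + skill.description).lowercase().split(Regex("\\W+")).toSet()
        words.intersect(skillWords).isNotEmpty()
    }
    return matched.ifEmpty { all }  // fallback: inject ALL skills
}

val skills = listOf(
    Skill("reminders", "Schedule and manage reminders and notifications"),
    Skill("github", "Push code and manage repositories"),
)

// "schedule a reminder" shares the token "schedule" with the reminders
// skill → one skill injected.
// "set an alert for monday" shares no token with either skill → the
// fallback fires and BOTH skills are injected.
```

Note that matching is on exact tokens: "reminder" does not match "reminders", only a literally shared word does.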

Pros:

  • Zero extra API calls — no latency overhead
  • Easy to add new behaviors: drop a .md file in skills/
  • Skills can reference specific tools, guiding the model without hard-coding logic

Cons:

  • Word intersection is brittle — "Set an alert" won't match a skill named "reminders" unless the words overlap
  • Wrong skill injection is hard to detect: the model follows injected guidance even when irrelevant
  • Fallback injects all skills when nothing matches — prompt can balloon unexpectedly
  • No isolation — a buggy skill affects the whole agent, not just a sub-task

Architecture Comparison

| Pattern | Type | Context Isolation | Tool Scoping | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Single ReAct loop | Orchestration | None | None | Low | Simple, focused tasks |
| Hierarchical delegation | Orchestration | Design variable² | Per sub-agent | Medium | Multi-step workflows |
| Plan & Execute | Orchestration | Partial | Per step | Medium | Predictable pipelines |
| Role-based teams | Orchestration | Shared state | Per role | High | Creative, review workflows |
| Skill injection | Capability mechanism | Static (prompt-level)¹ | None | Low | Conditional behavior |
| RAG | Capability mechanism | Static (prompt-level) | None | Low | Large knowledge bases |

¹ Skill injection changes what context is active at prompt-construction time. The agent still runs in a single LLM context — there is no new instance or conversation boundary.

² Context isolation in hierarchical delegation is a design variable. AIgent uses isolated contexts per sub-agent. AutoGen shares a group chat history by default. LangGraph nodes can read and write shared state.


Practical Recommendations

Use a single ReAct loop when:

  • The task needs fewer than ~5 tool calls
  • All tools are clearly relevant to the task
  • You need the full history for context throughout

Use skill injection when:

  • The agent needs different behavioral modes
  • You want guidance without spinning up new LLM instances
  • Capabilities are well-defined but not always relevant

Use sub-agents when:

  • A sub-task is self-contained and doesn't need conversation history
  • You want a model to operate with a focused tool set
  • Tasks are independent and you're willing to implement async dispatch for parallelism

Prevent recursive spawning explicitly. Don't rely on the model to know not to spawn sub-agents of sub-agents. Enforce it structurally — pass a baseTools list (without the spawn tool) to sub-agents.

Guard your ReAct loops. Add a max-iterations limit. Without it, a model that gets stuck will keep calling tools until you run out of tokens or budget.

Plan for sub-agent failures. The string "(no response)" is indistinguishable from a valid empty result. Use a structured return type that separates success from error, and make the parent handle failure explicitly.
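One way to make that boundary explicit is a sealed result type. This is a sketch with illustrative names, not the repo's API:

```kotlin
// Sketch of a structured sub-agent result; names are illustrative.
sealed interface SubAgentResult {
    data class Success(val output: String) : SubAgentResult
    data class Failure(val reason: String) : SubAgentResult
}

// Hypothetical wrapper classifying a raw sub-agent return value.
fun classify(raw: String?): SubAgentResult = when {
    raw.isNullOrBlank() || raw == "(no response)" ->
        SubAgentResult.Failure("sub-agent returned no usable output")
    else -> SubAgentResult.Success(raw)
}
```

The parent can then branch on `Failure` — retry, rephrase the task, or surface the error — instead of reasoning from garbage.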


Case Study: AIgent — A Kotlin Implementation

AIgent implements the hierarchical delegation pattern on top of Gemini's function calling API, with a skills system for capability injection. It's a working project — the code below is from the actual repo, not a simplified demo.

The Tool Interface

Every capability — from web search to spawning agents — implements the same contract:

// src/main/kotlin/tools/AgentTool.kt
interface AgentTool {
    val declaration: FunctionDeclaration
    fun handle(args: Map<String, Any>, chatId: Long?): Map<String, Any>
}

FunctionDeclaration is what gets passed to Gemini's function calling API. handle() is what runs when the model calls the tool. Adding a new capability means implementing two things and nothing else.

The Spawn Agent Tool

The parent agent delegates by calling spawn_agent:

// src/main/kotlin/tools/SpawnAgentTool.kt
override fun handle(args: Map<String, Any>, chatId: Long?): Map<String, Any> {
    val task = args["task"]?.toString() ?: return mapOf("error" to "Missing task")
    val toolNames = args["tools"] as? List<String> ?: return mapOf("error" to "Missing tools")

    val selectedTools = availableTools.filter {
        it.declaration.name().orElse("") in toolNames
    }

    val result = SubAgentService(client, selectedTools, task).run()
    return mapOf("result" to result)
}

One scoping detail: availableTools is the parent-defined pool. But toolNames comes from args["tools"] — meaning the model picks which tools to request from that pool. The parent sets hard limits; the sub-agent selects within them. A hallucinating model can't exceed the pool, but it could request the wrong subset. For security-sensitive tools, validate toolNames against an explicit allowlist.
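A minimal allowlist check could look like this (a sketch — `allowedForSubAgents` is a hypothetical set, not something in the repo):

```kotlin
// Hypothetical allowlist validation for model-requested tool names.
val allowedForSubAgents = setOf("search_web", "get_current_time")

// Returns the requested names if all are permitted, or null to signal
// that the spawn request should be rejected outright.
fun validateToolRequest(requested: List<String>): List<String>? {
    val denied = requested.filterNot { it in allowedForSubAgents }
    return if (denied.isEmpty()) requested else null
}
```

Rejecting the whole request (rather than silently filtering) makes the model's bad tool selection visible in the parent's loop instead of producing a quietly under-equipped sub-agent.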

This run() call is synchronous and blocking — sub-agents execute sequentially in the current implementation. The architecture supports parallelism (sub-agents share no state), but it requires explicit async wiring. With Kotlin coroutines:

// Parallel dispatch — not in current implementation, but straightforward to add
suspend fun spawnParallel(
    tasks: List<Pair<String, List<AgentTool>>>,
    client: Client
): List<String> = coroutineScope {
    tasks.map { (task, tools) ->
        async(Dispatchers.IO) { SubAgentService(client, tools, task).run() }
    }.awaitAll()
}

Each async block runs on the IO dispatcher — network-bound LLM calls benefit immediately. awaitAll() collects results once all sub-agents finish.

The Sub-Agent Service

Each sub-agent runs its own ReAct loop in complete isolation:

// src/main/kotlin/SubAgentService.kt
fun run(): String {
    val config = GenerateContentConfig.builder()
        .systemInstruction(Content.fromParts(Part.fromText(
            "You are a focused sub-agent. Complete the given task using the available tools. Be concise and return only the result."
        )))
        .tools(listOf(Tool.builder().functionDeclarations(tools.map { it.declaration }).build()))
        .build()

    val history = mutableListOf<Content>()
    history.add(Content.builder().role("user").parts(listOf(Part.fromText(task))).build())
    var response = client.models.generateContent("gemini-2.5-flash", history, config)

    while (response.functionCalls()?.isNotEmpty() == true) {
        val responseParts = mutableListOf<Part>()
        for (fc in response.functionCalls()!!) {
            val tool = tools.find { it.declaration.name().orElse("") == fc.name().orElse("") }
            val result = tool?.handle(fc.args().orElse(emptyMap()), null)
                ?: mapOf("error" to "Unknown function: ${fc.name().orElse("")}")
            responseParts.add(Part.fromFunctionResponse(fc.name().orElse(""), result))
        }
        history.add(Content.builder().role("model").parts(response.parts()).build())
        history.add(Content.builder().role("user").parts(responseParts).build())
        response = client.models.generateContent("gemini-2.5-flash", history, config)
    }

    return response.text() ?: "(no response)"
}

The system prompt — "Be concise and return only the result" — works well for self-contained tasks where the expected output is clear. The trade-off: the sub-agent has no way to ask for clarification. If the task is ambiguous, it will either hallucinate a plausible answer or return an empty result. The narrow prompt is a deliberate choice to keep sub-agents focused, but it means the parent must write unambiguous task descriptions. A task like "research the company" will fail silently; "find the founding year and current CEO of Stripe" will not.

Notice what's structurally absent in AIgent's implementation: no parent history passed in, no access to tools outside the assigned set, no ability to spawn further sub-agents. This is a design choice — other frameworks (AutoGen, LangGraph) make different ones.

Preventing Recursive Spawning

This design decision prevents one specific class of runaway agent bug — sub-agents spawning further sub-agents:

// src/main/kotlin/AgentService.kt

// baseTools does NOT include SpawnAgentTool
private val baseTools: List<AgentTool> = scheduler.tools + githubDocs.tools +
    listOf(CurrentTimeTool(), GoogleSearchTool(client))

// Only the parent agent gets spawn_agent
private val tools: List<AgentTool> = baseTools + listOf(SpawnAgentTool(client, baseTools))

Sub-agents receive baseTools. SpawnAgentTool is only in tools — the parent's list. This is enforced structurally, not by trusting the model.

What it does not prevent: a parent that misinterprets a sub-agent's result and spawns it again in a loop, or a sub-agent whose own ReAct while loop runs indefinitely. SubAgentService.run() has no max-iteration guard — a model that keeps generating function calls will keep looping until it hits the token limit or the API times out. A production implementation should add an iteration counter and a hard cutoff.
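A minimal guard is just a counter checked at the top of the loop — sketched here as a small helper (hypothetical, not in the repo):

```kotlin
// Hypothetical iteration budget for a ReAct loop.
class IterationGuard(private val max: Int) {
    private var count = 0
    // Returns false once the budget is spent; the loop should then
    // force-terminate and return an explicit "budget exhausted" result.
    fun tick(): Boolean = ++count <= max
}

// Usage sketch inside run():
//   while (guard.tick() && response.functionCalls()?.isNotEmpty() == true) { ... }
// followed by an explicit error result when tick() returned false.
```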

Dynamic Skill Injection

Before each response, the agent injects relevant skills into its system prompt based on keyword matching:

// src/main/kotlin/AgentService.kt
private fun buildConfig(input: String): GenerateContentConfig {
    val relevantSkills = skillLoader.selectRelevant(input, allSkills)
    val systemPrompt = buildString {
        append(soul)
        if (relevantSkills.isNotEmpty()) {
            append("\n\n# Active Skills\n")
            relevantSkills.forEach { skill -> append("\n## ${skill.name}\n${skill.content}\n") }
        }
    }
    return GenerateContentConfig.builder()
        .systemInstruction(Content.fromParts(Part.fromText(systemPrompt)))
        .tools(listOf(Tool.builder().functionDeclarations(tools.map { it.declaration }).build()))
        .build()
}

Skills are markdown files with YAML frontmatter:

---
name: reminders
description: Schedule and manage reminders, alerts, and time-based notifications
tools: [schedule_reminder, list_reminders, get_current_time]
---

Always call `get_current_time` first before scheduling a reminder...

Zero extra API calls, zero latency overhead. The brittle point is selectRelevant: it matches on overlapping words between the input and the skill name/description. "Set an alert for Monday" won't match a skill named "reminders" unless "alert" or "Monday" appears in the skill's metadata.
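For reference, parsing that frontmatter needs nothing heavier than string splitting. A minimal sketch (hypothetical parser, not the repo's SkillLoader):

```kotlin
// Hypothetical minimal frontmatter parser for skill files: splits on the
// closing "---" delimiter and reads "key: value" metadata lines.
fun parseFrontmatter(file: String): Pair<Map<String, String>, String> {
    val parts = file.removePrefix("---").split("\n---\n", limit = 2)
    val meta = parts[0].lines()
        .mapNotNull { line ->
            val idx = line.indexOf(':')
            if (idx > 0) line.take(idx).trim() to line.drop(idx + 1).trim() else null
        }.toMap()
    val body = parts.getOrElse(1) { "" }.trim()
    return meta to body
}
```

A real parser would also split the `tools:` list into names; here it stays a raw string.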


The Bottom Line

Sub-agents solve three real problems: context that grows without bound, tools that dilute decision quality, and tasks that could run in parallel but don't. The trade-offs are real too: every sub-agent is an additional LLM call with its own latency and cost, coordination logic is hard to test, and a sub-agent that fails silently is worse than one that fails loudly.

The patterns that matter most in practice: isolate tools per sub-task, prevent recursive spawning structurally, write unambiguous task descriptions, and handle errors explicitly at the boundary where sub-agents return results. The architecture is sound; the failure modes live in the implementation details.

