Zrcic
Sub-Agent Architectures: Patterns, Trade-offs, and a Kotlin Implementation

The Problem

Single-agent systems hit a wall fast. Give one agent too many tools and a complex task, and you get:

  • Context bloat — every intermediate result accumulates in the conversation history
  • Tool confusion — a model with 20 tools tends to make worse decisions than one with 5 (tool-use accuracy degrades with scale; see the Gorilla LLM benchmark for empirical evidence)
  • No isolation — a failed step can corrupt the entire conversation state
  • No parallelism — sequential execution even when tasks are independent

Larger context windows reduce but don't eliminate these problems. A 1M-token context still degrades with irrelevant history, and tool confusion is a model behavior issue independent of window size. Architecture is the more scalable solution — but it's a trade-off, not a free lunch. Sub-agents add latency, coordination complexity, and cost.


What Is a Sub-Agent?

A sub-agent is a separate LLM call with its own isolated conversation context, invoked by a parent agent to handle a specific, self-contained task. "Separate instance" is a common shorthand but can mislead — it's typically the same model, same API key, same underlying compute. The isolation is logical, not physical. Every sub-agent call is another round of LLM inference costs.

What a sub-agent gets depends on how you design it. At minimum:

  • A system prompt tailored to its job
  • A lifecycle — it runs, returns a result, and is discarded

Beyond that, context isolation is a design variable, not a defining property of the pattern. In AIgent, sub-agents start with a fresh conversation history and never see the parent's history. In AutoGen, agents share a group chat history by default. In LangGraph, nodes can read and write shared state. Whether a sub-agent has isolated context or shared context depends entirely on the framework and implementation choices.

This article describes AIgent's approach — isolated context, restricted tool sets — as one point in that design space, not the only valid one.

┌──────────────────────────────────────────────────┐
│                  Parent Agent                    │
│  • Full tool set       • Persistent history      │
│  • User-facing         • Skill injection         │
└────────────────────┬─────────────────────────────┘
                     │
                [Model decides]
              to call spawn_agent
              tool with task + tools
                     │
                     ▼
          ┌──────────────────────┐
          │     Sub-Agent        │
          │  • Limited tools     │
          │  • Isolated context* │
          │  • Focused task      │
          └──────────┬───────────┘
                     │ returns result string
                     ▼
┌──────────────────────────────────────────────────┐
│           Parent Agent (continues)               │
└──────────────────────────────────────────────────┘

* Context isolation is a design choice — see note below.

When Did Sub-Agents Emerge?

Sub-agents didn't appear overnight. They evolved through distinct phases:

2022 — ReAct and Tool Use

Tool-augmented LMs existed before 2022. What the ReAct paper (Yao et al., 2022) contributed was empirical evidence that interleaving reasoning traces with actions outperforms either alone — it beat action-only approaches (like WebGPT) and reasoning-only approaches (like chain-of-thought) on question-answering and fact-verification tasks. The interleaving structure — think, act, observe, think again — became the blueprint for agent loops that followed. Still single-agent, but foundational.

2023 — Autonomous Agent Experiments

Projects like AutoGPT and BabyAGI showed what happens when you give a model the ability to spawn tasks and run them recursively. They were chaotic and unreliable, but they proved the concept: agents could delegate to themselves. The LangChain and LlamaIndex ecosystems formalized tool-using agents and chains, making sub-agent patterns accessible to developers.

2024 — Production-Grade Orchestration

CrewAI, AutoGen, and LangGraph moved the conversation from "can this work?" to "how do you build it reliably?" Role-based agent teams, graph-based execution, and structured handoffs between agents became first-class patterns. The focus shifted from raw autonomy to controlled delegation.

By 2025 — Native SDK Support

API providers began shipping sub-agent patterns as first-class features. Anthropic's Claude Agent SDK and Google's Gemini multi-agent APIs made spawning isolated sub-agents a few lines of code rather than a framework to build yourself.


Orchestration Patterns

These are structural decisions about how agents are created, what work they do, and how results flow back. They're distinct from capability mechanisms (covered in the next section), which are about what information an agent has access to.

1. ReAct Loop (Single Agent)

The building block. The model reasons, acts, observes, and repeats until done.

User input → [Think] → [Tool Call] → [Observe result] → [Think] → ... → Final answer

Every agent loop in every framework is a variation of this. In production, it's typically bounded by a max-iterations guard to prevent infinite loops:

User input → [Think] → [Tool Call] → [Observe] → ... (max N iterations) → Final answer
                                                              ↓
                                                    [Forced termination if N exceeded]

Pros:

  • Simple to implement
  • Easy to trace and debug
  • No coordination overhead

Cons:

  • Context grows linearly with each tool call
  • All tools visible at once — decision quality degrades with scale
  • No parallelism

2. Hierarchical Delegation (Orchestrator + Sub-Agents)

A parent agent breaks work into tasks and delegates each to a specialized sub-agent with a restricted tool set and fresh context.

┌─────────────────────────────────────────────────────┐
│                   Orchestrator                      │
│   Tools: [spawn_agent, search_web, get_time, ...]   │
└───────────┬──────────────────┬──────────────────────┘
            │                  │
            ▼                  ▼
   ┌──────────────┐   ┌──────────────────┐
   │  Sub-Agent A │   │   Sub-Agent B    │
   │  [search_web]│   │ [push_to_github] │
   └──────────────┘   └──────────────────┘

The key insight is tool isolation: sub-agents only get the tools relevant to their task. This improves decision quality and limits blast radius.

Context isolation — whether sub-agents see the parent's history or share state — is a design variable. Parallelism is structurally possible (sub-agents with isolated state have no ordering dependency), but requires explicit async wiring; a blocking implementation runs them sequentially.

Pros:

  • Tool scoping improves decision quality per sub-task
  • Parallelism is possible when sub-tasks are independent and isolated
  • Limits damage from a misbehaving sub-agent (if contexts are isolated)

Cons:

  • Each sub-agent is a full LLM call — latency and cost multiply with every delegation
  • Silent failures: a sub-agent that returns an empty or nonsensical result is indistinguishable from a valid response at the string boundary. Single-agent loops fail loudly (the model keeps trying); hierarchical delegation fails quietly (the parent receives garbage and reasons from it)
  • No mechanism for sub-agents to ask for clarification; ambiguous tasks produce hallucinated results
  • Coordination logic lives in the parent prompt — hard to test explicitly

3. Plan & Execute

The agent first produces a full plan (a list of steps), then executes each step — potentially spawning sub-agents per step. Planning and execution are separated.

[User input] → [Planning Agent] → [Step 1, Step 2, Step 3]
                                         │
                          ┌──────────────┼──────────────┐
                          ▼              ▼              ▼
                     [Executor 1]  [Executor 2]  [Executor 3]
                          └──────────────┼──────────────┘
                                         ▼
                                   [Synthesizer]

Pros:

  • Predictable execution path
  • Easy to inspect and approve before running

Cons:

  • Rigid — the plan can't adapt to what it discovers mid-execution
  • Upfront planning requires good foresight from the model
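The shape of the pattern fits in a few lines of Kotlin. Everything below is illustrative scaffolding — `plan`, `execute`, and `synthesize` stand in for what would be LLM calls in a real system:

```kotlin
// Hypothetical plan-and-execute skeleton; plan/execute/synthesize are stubs
// standing in for LLM calls.
data class Step(val description: String)

fun plan(input: String): List<Step> =
    // In practice: one LLM call that decomposes the request into steps.
    input.split(" then ").map { Step(it.trim()) }

fun execute(step: Step): String =
    // In practice: a ReAct loop or sub-agent per step.
    "done: ${step.description}"

fun synthesize(results: List<String>): String =
    // In practice: a final LLM call that merges step results.
    results.joinToString("; ")

fun planAndExecute(input: String): String =
    synthesize(plan(input).map { execute(it) })
```

The rigidity shows up right in the structure: `plan` runs once, before any `execute`, so nothing an executor discovers can change the step list.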

4. Role-Based Multi-Agent Teams

Multiple agents with distinct roles coordinate via shared memory or message passing. Common roles: Researcher, Writer, Critic, Executor.

┌──────────┐    findings    ┌──────────┐    draft    ┌──────────┐
│Researcher│ ─────────────▶ │  Writer  │ ──────────▶ │  Critic  │
└──────────┘                └──────────┘             └──────────┘
                                                           │
                                                    feedback│
                                                           ▼
                                                    ┌──────────┐
                                                    │  Writer  │
                                                    └──────────┘

Used heavily in CrewAI and AutoGen. Great for creative or review-heavy pipelines; complex to coordinate reliably.
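The message-passing structure above reduces to a function pipeline. A sketch with stubbed roles (all function names here are illustrative, not from any framework):

```kotlin
// Illustrative role pipeline; each function stands in for an agent's LLM call.
fun research(topic: String) = "findings about $topic"
fun write(findings: String) = "draft based on $findings"
fun critique(draft: String) = "tighten the intro"
fun revise(draft: String, feedback: String) = "$draft (revised per: $feedback)"

// Researcher → Writer → Critic → Writer, exactly as in the diagram.
fun pipeline(topic: String): String {
    val draft = write(research(topic))
    return revise(draft, critique(draft))
}
```

The hard part in real systems isn't this happy path — it's deciding when the Critic→Writer loop terminates and what happens when roles disagree.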


Capability Mechanisms

These are distinct from orchestration patterns. They don't change how agents are spawned or how work is divided — they change what knowledge or behavior an agent has access to within a single context.

| Mechanism | How It Works | When to Use |
| --- | --- | --- |
| Skill injection | Relevant instructions prepended to system prompt | Conditional behavior without spawning overhead |
| RAG | Retrieved document chunks injected into the message | Domain knowledge too large for static prompts |
| Memory | Previous conversation facts loaded into context | Personalization, continuity across sessions |

Skill-Based Injection

The parent agent dynamically loads context ("skills") into its system prompt based on the user's request. No new LLM instance — just different instructions per request.

User: "Remind me to call John tomorrow at 9am"
        │
        ▼
┌──────────────────────────────────────────────────┐
│  selectRelevant("remind", allSkills)             │
│  → matches "reminders" skill                     │
│  → injects into system prompt                    │
└──────────────────────────────────────────────────┘

In AIgent, selectRelevant is word intersection — no embeddings, no LLM call:

// src/main/kotlin/SkillLoader.kt
fun selectRelevant(input: String, all: List<Skill>): List<Skill> {
    val words = input.lowercase().split(Regex("\\W+")).filter { it.length > 2 }.toSet()
    val matched = all.filter { skill ->
        val skillWords = (skill.name + " " + skill.description).lowercase().split(Regex("\\W+")).toSet()
        words.intersect(skillWords).isNotEmpty()
    }
    return matched.ifEmpty { all }  // fallback: inject ALL skills if nothing matches
}

It splits both the user input and each skill's name+description into lowercase words (≥3 chars) and checks for any overlap. Fast and cheap — no extra API calls. The fallback on the last line is significant: if nothing matches, every skill gets injected. With two skills that's fine; with twenty it's a prompt full of unrelated instructions.

This is what "lightweight" means here. Alternative approaches — embedding similarity or a classifier LLM call — are more accurate but add latency and cost before the main response even starts.
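To make the matching and fallback behavior concrete, here is the same logic reproduced with a minimal `Skill` type (the type and sample skills are illustrative, not the repo's exact classes):

```kotlin
// Minimal reproduction of the word-intersection matcher with an
// illustrative Skill type, to show the fallback concretely.
data class Skill(val name: String, val description: String)

fun selectRelevant(input: String, all: List<Skill>): List<Skill> {
    val words = input.lowercase().split(Regex("\\W+")).filter { it.length > 2 }.toSet()
    val matched = all.filter { skill ->
        val skillWords = (skill.name + " " + skill.description).lowercase().split(Regex("\\W+")).toSet()
        words.intersect(skillWords).isNotEmpty()
    }
    return matched.ifEmpty { all }  // fallback: inject ALL skills
}

val skills = listOf(
    Skill("reminders", "Schedule and manage reminders and notifications"),
    Skill("github", "Push code and manage repositories"),
)

// "schedule a reminder" shares the token "schedule" with the reminders
// skill → one skill injected.
// "set an alert for monday" shares no token with either skill → the
// fallback fires and BOTH skills are injected.
```

Note that matching is on exact tokens: "reminder" does not match "reminders", only a literally shared word does.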

Pros:

  • Zero extra API calls — no latency overhead
  • Easy to add new behaviors: drop a .md file in skills/
  • Skills can reference specific tools, guiding the model without hard-coding logic

Cons:

  • Word intersection is brittle — "Set an alert" won't match a skill named "reminders" unless the words overlap
  • Wrong skill injection is hard to detect: the model follows injected guidance even when irrelevant
  • Fallback injects all skills when nothing matches — prompt can balloon unexpectedly
  • No isolation — a buggy skill affects the whole agent, not just a sub-task

Architecture Comparison

| Pattern | Type | Context Isolation | Tool Scoping | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Single ReAct loop | Orchestration | None | None | Low | Simple, focused tasks |
| Hierarchical delegation | Orchestration | Design variable² | Per sub-agent | Medium | Multi-step workflows |
| Plan & Execute | Orchestration | Partial | Per step | Medium | Predictable pipelines |
| Role-based teams | Orchestration | Shared state | Per role | High | Creative, review workflows |
| Skill injection | Capability mechanism | Static (prompt-level)¹ | None | Low | Conditional behavior |
| RAG | Capability mechanism | Static (prompt-level) | None | Low | Large knowledge bases |

¹ Skill injection changes what context is active at prompt-construction time. The agent still runs in a single LLM context — there is no new instance or conversation boundary.

² Context isolation in hierarchical delegation is a design variable. AIgent uses isolated contexts per sub-agent. AutoGen shares a group chat history by default. LangGraph nodes can read and write shared state.


Practical Recommendations

Use a single ReAct loop when:

  • The task needs fewer than ~5 tool calls
  • All tools are clearly relevant to the task
  • You need the full history for context throughout

Use skill injection when:

  • The agent needs different behavioral modes
  • You want guidance without spinning up new LLM instances
  • Capabilities are well-defined but not always relevant

Use sub-agents when:

  • A sub-task is self-contained and doesn't need conversation history
  • You want a model to operate with a focused tool set
  • Tasks are independent and you're willing to implement async dispatch for parallelism

Prevent recursive spawning explicitly. Don't rely on the model to know not to spawn sub-agents of sub-agents. Enforce it structurally — pass a baseTools list (without the spawn tool) to sub-agents.

Guard your ReAct loops. Add a max-iterations limit. Without it, a model that gets stuck will keep calling tools until you run out of tokens or budget.

Plan for sub-agent failures. The string "(no response)" is indistinguishable from a valid empty result. Use a structured return type that separates success from error, and make the parent handle failure explicitly.
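One way to make that boundary explicit is a sealed result type. This is a sketch with illustrative names, not the repo's API:

```kotlin
// Sketch of a structured sub-agent result; names are illustrative.
sealed interface SubAgentResult {
    data class Success(val output: String) : SubAgentResult
    data class Failure(val reason: String) : SubAgentResult
}

// Hypothetical wrapper classifying a raw sub-agent return value.
fun classify(raw: String?): SubAgentResult = when {
    raw.isNullOrBlank() || raw == "(no response)" ->
        SubAgentResult.Failure("sub-agent returned no usable output")
    else -> SubAgentResult.Success(raw)
}
```

The parent can then branch on `Failure` — retry, rephrase the task, or surface the error — instead of reasoning from garbage.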


Case Study: AIgent — A Kotlin Implementation

AIgent implements the hierarchical delegation pattern on top of Gemini's function calling API, with a skills system for capability injection. It's a working project — the code below is from the actual repo, not a simplified demo.

The Tool Interface

Every capability — from web search to spawning agents — implements the same contract:

// src/main/kotlin/tools/AgentTool.kt
interface AgentTool {
    val declaration: FunctionDeclaration
    fun handle(args: Map<String, Any>, chatId: Long?): Map<String, Any>
}

FunctionDeclaration is what gets passed to Gemini's function calling API. handle() is what runs when the model calls the tool. Adding a new capability means implementing two things and nothing else.

The Spawn Agent Tool

The parent agent delegates by calling spawn_agent:

// src/main/kotlin/tools/SpawnAgentTool.kt
override fun handle(args: Map<String, Any>, chatId: Long?): Map<String, Any> {
    val task = args["task"]?.toString() ?: return mapOf("error" to "Missing task")
    val toolNames = args["tools"] as? List<String> ?: return mapOf("error" to "Missing tools")

    val selectedTools = availableTools.filter {
        it.declaration.name().orElse("") in toolNames
    }

    val result = SubAgentService(client, selectedTools, task).run()
    return mapOf("result" to result)
}

One scoping detail: availableTools is the parent-defined pool. But toolNames comes from args["tools"] — meaning the model picks which tools to request from that pool. The parent sets hard limits; the sub-agent selects within them. A hallucinating model can't exceed the pool, but it could request the wrong subset. For security-sensitive tools, validate toolNames against an explicit allowlist.
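A minimal allowlist check could look like this (a sketch — `allowedForSubAgents` is a hypothetical set, not something in the repo):

```kotlin
// Hypothetical allowlist validation for model-requested tool names.
val allowedForSubAgents = setOf("search_web", "get_current_time")

// Returns the requested names if all are permitted, or null to signal
// that the spawn request should be rejected outright.
fun validateToolRequest(requested: List<String>): List<String>? {
    val denied = requested.filterNot { it in allowedForSubAgents }
    return if (denied.isEmpty()) requested else null
}
```

Rejecting the whole request (rather than silently filtering) makes the model's bad tool selection visible in the parent's loop instead of producing a quietly under-equipped sub-agent.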

This run() call is synchronous and blocking — sub-agents execute sequentially in the current implementation. The architecture supports parallelism (sub-agents share no state), but it requires explicit async wiring. With Kotlin coroutines:

// Parallel dispatch — not in current implementation, but straightforward to add
suspend fun spawnParallel(
    tasks: List<Pair<String, List<AgentTool>>>,
    client: Client
): List<String> = coroutineScope {
    tasks.map { (task, tools) ->
        async(Dispatchers.IO) { SubAgentService(client, tools, task).run() }
    }.awaitAll()
}

Each async block runs on the IO dispatcher — network-bound LLM calls benefit immediately. awaitAll() collects results once all sub-agents finish.

The Sub-Agent Service

Each sub-agent runs its own ReAct loop in complete isolation:

// src/main/kotlin/SubAgentService.kt
fun run(): String {
    val config = GenerateContentConfig.builder()
        .systemInstruction(Content.fromParts(Part.fromText(
            "You are a focused sub-agent. Complete the given task using the available tools. Be concise and return only the result."
        )))
        .tools(listOf(Tool.builder().functionDeclarations(tools.map { it.declaration }).build()))
        .build()

    val history = mutableListOf<Content>()
    history.add(Content.builder().role("user").parts(listOf(Part.fromText(task))).build())
    var response = client.models.generateContent("gemini-2.5-flash", history, config)

    while (response.functionCalls()?.isNotEmpty() == true) {
        val responseParts = mutableListOf<Part>()
        for (fc in response.functionCalls()!!) {
            val tool = tools.find { it.declaration.name().orElse("") == fc.name().orElse("") }
            val result = tool?.handle(fc.args().orElse(emptyMap()), null)
                ?: mapOf("error" to "Unknown function: ${fc.name().orElse("")}")
            responseParts.add(Part.fromFunctionResponse(fc.name().orElse(""), result))
        }
        history.add(Content.builder().role("model").parts(response.parts()).build())
        history.add(Content.builder().role("user").parts(responseParts).build())
        response = client.models.generateContent("gemini-2.5-flash", history, config)
    }

    return response.text() ?: "(no response)"
}

The system prompt — "Be concise and return only the result" — works well for self-contained tasks where the expected output is clear. The trade-off: the sub-agent has no way to ask for clarification. If the task is ambiguous, it will either hallucinate a plausible answer or return an empty result. The narrow prompt is a deliberate choice to keep sub-agents focused, but it means the parent must write unambiguous task descriptions. A task like "research the company" will fail silently; "find the founding year and current CEO of Stripe" will not.

Notice what's structurally absent in AIgent's implementation: no parent history passed in, no access to tools outside the assigned set, no ability to spawn further sub-agents. This is a design choice — other frameworks (AutoGen, LangGraph) make different ones.

Preventing Recursive Spawning

This design decision prevents one specific class of runaway agent bug — sub-agents spawning further sub-agents:

// src/main/kotlin/AgentService.kt

// baseTools does NOT include SpawnAgentTool
private val baseTools: List<AgentTool> = scheduler.tools + githubDocs.tools +
    listOf(CurrentTimeTool(), GoogleSearchTool(client))

// Only the parent agent gets spawn_agent
private val tools: List<AgentTool> = baseTools + listOf(SpawnAgentTool(client, baseTools))

Sub-agents receive baseTools. SpawnAgentTool is only in tools — the parent's list. This is enforced structurally, not by trusting the model.

What it does not prevent: a parent that misinterprets a sub-agent's result and spawns it again in a loop, or a sub-agent whose own ReAct while loop runs indefinitely. SubAgentService.run() has no max-iteration guard — a model that keeps generating function calls will keep looping until it hits the token limit or the API times out. A production implementation should add an iteration counter and a hard cutoff.
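A minimal guard is just a counter checked at the top of the loop — sketched here as a small helper (hypothetical, not in the repo):

```kotlin
// Hypothetical iteration budget for a ReAct loop.
class IterationGuard(private val max: Int) {
    private var count = 0
    // Returns false once the budget is spent; the loop should then
    // force-terminate and return an explicit "budget exhausted" result.
    fun tick(): Boolean = ++count <= max
}

// Usage sketch inside run():
//   while (guard.tick() && response.functionCalls()?.isNotEmpty() == true) { ... }
// followed by an explicit error result when tick() returned false.
```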

Dynamic Skill Injection

Before each response, the agent injects relevant skills into its system prompt based on keyword matching:

// src/main/kotlin/AgentService.kt
private fun buildConfig(input: String): GenerateContentConfig {
    val relevantSkills = skillLoader.selectRelevant(input, allSkills)
    val systemPrompt = buildString {
        append(soul)
        if (relevantSkills.isNotEmpty()) {
            append("\n\n# Active Skills\n")
            relevantSkills.forEach { skill -> append("\n## ${skill.name}\n${skill.content}\n") }
        }
    }
    return GenerateContentConfig.builder()
        .systemInstruction(Content.fromParts(Part.fromText(systemPrompt)))
        .tools(listOf(Tool.builder().functionDeclarations(tools.map { it.declaration }).build()))
        .build()
}

Skills are markdown files with YAML frontmatter:

---
name: reminders
description: Schedule and manage reminders, alerts, and time-based notifications
tools: [schedule_reminder, list_reminders, get_current_time]
---

Always call `get_current_time` first before scheduling a reminder...

Zero extra API calls, zero latency overhead. The brittle point is selectRelevant: it matches on overlapping words between the input and the skill name/description. "Set an alert for Monday" won't match a skill named "reminders" unless "alert" or "Monday" appears in the skill's metadata.
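For reference, parsing that frontmatter needs nothing heavier than string splitting. A minimal sketch (hypothetical parser, not the repo's SkillLoader):

```kotlin
// Hypothetical minimal frontmatter parser for skill files: splits on the
// closing "---" delimiter and reads "key: value" metadata lines.
fun parseFrontmatter(file: String): Pair<Map<String, String>, String> {
    val parts = file.removePrefix("---").split("\n---\n", limit = 2)
    val meta = parts[0].lines()
        .mapNotNull { line ->
            val idx = line.indexOf(':')
            if (idx > 0) line.take(idx).trim() to line.drop(idx + 1).trim() else null
        }.toMap()
    val body = parts.getOrElse(1) { "" }.trim()
    return meta to body
}
```

A real parser would also split the `tools:` list into names; here it stays a raw string.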


The Bottom Line

Sub-agents solve three real problems: context that grows without bound, tools that dilute decision quality, and tasks that could run in parallel but don't. The trade-offs are real too: every sub-agent is an additional LLM call with its own latency and cost, coordination logic is hard to test, and a sub-agent that fails silently is worse than one that fails loudly.

The patterns that matter most in practice: isolate tools per sub-task, prevent recursive spawning structurally, write unambiguous task descriptions, and handle errors explicitly at the boundary where sub-agents return results. The architecture is sound; the failure modes live in the implementation details.

