Claude Code's sub-agent system is powerful. You define specialized agents with focused prompts, restricted tools, and independent contexts. Claude decides when to delegate, spawns sub-agents in foreground or background, and synthesizes results. It works.
But there's a design choice buried in the architecture that matters more than any individual feature: who decides what happens next? In Claude Sub-agents, the answer is the LLM. The parent agent reads your request, evaluates sub-agent descriptions, and decides which one to spawn. The routing logic lives in inference, not in config.
This article explores why that matters, when it becomes a problem, and how duckflux offers an alternative where the orchestration is deterministic while the work inside each step stays as creative as the LLM needs to be.
## How Claude Sub-agents work
Claude Sub-agents are markdown files with YAML frontmatter that define specialized AI assistants. Each sub-agent has its own system prompt, tool restrictions, model choice, and permission mode.
```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Grep, Glob, Bash
model: sonnet
---
You are a senior code reviewer. When invoked, analyze the code
and provide specific, actionable feedback on quality, security,
and best practices.
```
At runtime, Claude reads the `description` field of each available sub-agent and decides whether to delegate. You can nudge this with natural language ("use the code-reviewer agent") or force it with @-mentions, but the routing is fundamentally an LLM decision.
Sub-agents run in their own context window. They can't spawn other sub-agents. Results return to the parent, which synthesizes them. For parallel work, you can run multiple sub-agents in the background, or use agent teams for cross-session coordination.
Key capabilities:
- Isolation. Each sub-agent has its own context, tools, and permissions.
- Model routing. Haiku for cheap exploration, Opus for complex reasoning, Sonnet as default.
- Worktree isolation. `isolation: worktree` gives a sub-agent a temporary git worktree.
- Persistent memory. Sub-agents can accumulate learnings across sessions.
- Hooks. `PreToolUse`, `PostToolUse`, and `Stop` hooks for lifecycle control.
- Background execution. Sub-agents run concurrently while you keep working.
## The non-determinism problem
Here's the thing: Claude Sub-agents are orchestrated by inference. The LLM decides:
- Whether to delegate at all.
- Which sub-agent to spawn.
- What prompt to write for the sub-agent.
- When to synthesize results vs. spawn more agents.
- Whether to chain sub-agents or return to you.
Each of these decisions is a probabilistic inference. On a good day, Claude makes the right calls. On a bad day, it forgets to delegate, picks the wrong sub-agent, writes a vague task prompt, or synthesizes prematurely.
This is fine for interactive, exploratory work. You're in the loop, you can redirect, you can say "no, use the reviewer agent." But the moment you want a repeatable pipeline (plan, code, test, review, deploy), you're asking the LLM to be a reliable router. And LLMs are unreliable routers. They forget steps, miscount iterations, and silently skip transitions.
The sub-agent docs themselves acknowledge this: sub-agents cannot spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent's orchestration logic is just... its next token prediction.
Compare this to how we treat human workflows. Nobody says "here are five specialists, figure out the order." We define processes, assign roles to steps, and execute deterministically. The specialists bring creativity; the process brings structure.
## What is duckflux?
duckflux is a declarative, YAML-based workflow DSL. The execution order is defined in config, not inferred by an LLM. The runtime handles sequencing, loops, parallelism, retries, events, and tracing.
```yaml
flow:
  - type: exec
    run: npm test
```
The key difference: duckflux separates orchestration from execution. The workflow file defines what happens in what order. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside each step. The workflow DSL handles the plumbing between steps.
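As an illustrative sketch of that separation, here is one flow mixing step types. The `http` participant type and its `url`/`method` fields are assumptions made for illustration based on the capabilities listed above, not confirmed duckflux syntax:

```yaml
participants:
  summarize:
    type: exec                    # LLM does the creative work inside this step
    run: cat PROMPT_SUMMARY.md | $AGENT
  archive:
    type: exec                    # plain shell command, no LLM involved
    run: tar czf report.tar.gz ./report
  notify:
    type: http                    # assumed HTTP participant type
    url: https://hooks.example.com/notify
    method: POST
flow:
  - summarize
  - archive
  - notify
```

The ordering never depends on inference: `archive` runs after `summarize` because the config says so.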
## The determinism spectrum
It's not binary. Different parts of a pipeline need different levels of determinism.
| Concern | Needs determinism? | Why |
|---|---|---|
| Step ordering | Yes | Plan before code, test before deploy. Not negotiable. |
| Retry logic | Yes | "Retry 3 times with backoff" is a policy, not a creative decision. |
| Quality gates | Yes | Tests pass or they don't. Exit codes, not vibes. |
| Error handling | Yes | "If deploy fails, notify Slack" is a business rule. |
| Code generation | No | The LLM should be creative here. |
| Code review | No | The LLM should reason freely about quality. |
| Planning | No | Breaking tasks into subtasks is inherently creative. |
Claude Sub-agents put everything on the non-deterministic side. duckflux lets you draw the line where it makes sense: deterministic orchestration, non-deterministic execution.
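To make that line-drawing concrete, here is a sketch of a flow where the gates are deterministic and only the generation step is creative. The `when:` expression and the `test.exitCode` field are assumptions extrapolated from duckflux's `until:` condition syntax, not confirmed DSL features:

```yaml
participants:
  generate:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT   # non-deterministic: LLM writes the code
  test:
    type: exec
    run: npm test                      # deterministic gate: exit code decides
  deploy:
    type: exec
    run: ./deploy.sh
    when: test.exitCode == 0           # assumed condition syntax
flow:
  - generate
  - test
  - deploy
```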
## Concepts side by side

| Claude Sub-agents | duckflux | Notes |
|---|---|---|
| Sub-agent (markdown file) | Participant | A unit of work. In duckflux, not limited to LLM invocations. |
| `description` (LLM-routed) | Flow position / `when` guard | Explicit placement replaces LLM routing decisions. |
| Parent decides delegation | `flow` array | Ordering is declared, not inferred. |
| `maxTurns` | `retry.max` / `loop.max` | Iteration caps per step, not per agent context. |
| `isolation: worktree` | `cwd` per participant | Working directory isolation per step. |
| Background sub-agents | `parallel:` construct | Concurrent execution declared in config. |
| Chained sub-agents | Sequential flow | No LLM needed to decide "run B after A." |
| Sub-agent hooks | `onError`, `when`, `emit`/`wait` | Lifecycle control in the DSL, not in hook scripts. |
| Persistent memory | `execution.context` / `set` | Workflow-scoped state. Cross-session memory is outside duckflux scope. |
| Model routing | N/A (bring your own agent CLI) | duckflux orchestrates commands; model choice is per-agent. |
## Migration patterns

### Chained sub-agents

In Claude Sub-agents, chaining requires the parent to decide the sequence:

```
Use the code-reviewer subagent to find performance issues,
then use the optimizer subagent to fix them
```
The parent LLM interprets "then" and decides to spawn the optimizer after the reviewer. If it misunderstands, it might run them in parallel, skip the optimizer, or synthesize prematurely.
duckflux:
```yaml
participants:
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT
  optimize:
    type: exec
    run: cat PROMPT_OPTIMIZE.md | $AGENT
flow:
  - review
  - optimize
```
"Then" is a line break in the YAML. No inference needed.
### Parallel research

Claude Sub-agents can run research in parallel via background tasks:

```
Research the authentication, database, and API modules
in parallel using separate subagents
```
Again, the parent decides whether to actually parallelize, which sub-agents to use, and how to synthesize.
duckflux:
```yaml
flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT
      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT
      - as: api-research
        type: exec
        run: cat PROMPT_API.md | $AGENT
```
Parallelism is declared. All three run concurrently. The outputs are collected in an array for the next step. No LLM routing decision required.
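If the next step is an LLM synthesis pass, the pattern extends naturally. A sketch, where the `synthesize` participant and its prompt file are hypothetical and how the collected outputs reach the prompt is left to your prompt design:

```yaml
flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT
      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT
  - as: synthesize
    type: exec
    run: cat PROMPT_SYNTH.md | $AGENT   # consumes the collected research outputs
```

Synthesis is still creative LLM work; whether and when it happens is not.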
### Review loop with quality gates

A common Claude Sub-agents pattern: code, then review, then fix if needed. The parent decides when to stop.

```
Use the coder subagent to implement the feature,
then use the reviewer subagent to check it.
If there are issues, have the coder fix them.
Repeat until the reviewer approves.
```
The parent LLM manages the iteration. It decides whether to loop, how many times, and when to stop. If it loses track, the loop might run forever (capped by maxTurns) or stop too early.
duckflux:
```yaml
participants:
  code:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT
    onError: retry
    retry:
      max: 3
      backoff: 2s
  test:
    type: exec
    run: npm test
  lint:
    type: exec
    run: npm run lint
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT
flow:
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review
```
The loop condition, iteration cap, and quality gates are all in the config. The LLM does creative work inside `code` and `review`. The DSL handles the loop, the exit condition, and the gates. `test` and `lint` are real commands with real exit codes, not prompt instructions asking the agent to self-report.
### Event-driven coordination
Claude Sub-agents have no event system. If sub-agent A needs to signal sub-agent B, the parent synthesizes A's output and writes B's prompt. The coordination happens in the parent's inference.
duckflux has native emit + wait for cases where steps genuinely need to signal each other:
```yaml
flow:
  - parallel:
      # First branch: prepare data, then signal readiness
      - steps:
          - as: data-prep
            type: exec
            run: ./prepare-data.sh
          - emit: "data.ready"
      # Second branch: wait for the signal, then process
      - steps:
          - wait:
              event: "data.ready"
              timeout: 5m
          - as: process
            type: exec
            run: ./process.sh
```
Events work across parallel branches, across parent/child workflows, and with external event hubs (NATS, Redis). This is coordination infrastructure that the sub-agent model lacks entirely.
## When to keep sub-agents
Sub-agents are the right tool when:
- You're working interactively. Typing in Claude Code, exploring a codebase, asking questions. The LLM-routed delegation is exactly right here because you're in the loop.
- The workflow is genuinely emergent. You don't know the steps upfront. The agent needs to figure out what to do based on what it finds.
- Context preservation matters. Each sub-agent's isolated context window prevents pollution of the main conversation. This is a real advantage for high-volume operations.
- You need model routing. Sending cheap tasks to Haiku and expensive tasks to Opus within a single session is built into the sub-agent model.
## When to switch to duckflux
Switch when:
- The workflow is repeatable. If you've typed the same chaining instructions more than twice, it should be a config file.
- You need guaranteed step ordering. Plan, code, test, review, deploy. Always in that order. No exceptions.
- You need real quality gates. Not "please run the tests", but `npm test` as an actual step with an exit code.
- You need audit trails. Structured JSON traces per step, visible in the web server UI.
- You need cross-agent events. Steps signaling each other, waiting for external events, publishing to message queues.
- You want provider independence. duckflux orchestrates `$AGENT`, not Claude specifically. Swap agents per step.
## What you gain

| Concern | Claude Sub-agents | duckflux |
|---|---|---|
| Routing | LLM decides (probabilistic) | Config declares (deterministic) |
| Step ordering | Parent LLM inference | `flow` array, top to bottom |
| Quality gates | Prompt instructions | Real commands with exit codes |
| Retry | `maxTurns` (global per agent) | `retry.max` with backoff (per step) |
| Parallel | Background sub-agents (LLM decides) | `parallel:` construct (declared) |
| Events | None | `emit` + `wait` (cross-branch, cross-workflow) |
| Tracing | Transcript files | Structured JSON + web server UI |
| Provider lock-in | Claude Code only | Any agent CLI, any runtime |
## What you lose
- Interactive delegation. The natural "use the reviewer agent" UX in Claude Code. duckflux is a runner, not an interactive assistant.
- Context isolation. Sub-agents protect the parent's context window. duckflux steps are independent commands, but they don't share a conversation context across steps.
- Tool restriction enforcement. Sub-agents have framework-level tool control. In duckflux, that's the agent's responsibility.
- Model routing within the workflow. Sub-agents can use different models per agent. In duckflux, each `exec` step invokes whatever CLI you point it at.
- Persistent memory. Sub-agents accumulate learnings across sessions. duckflux has `execution.context` for within-workflow state, but cross-session memory is outside scope.
## A hybrid approach
You don't have to choose one or the other. The most practical architecture uses both:
```yaml
# ci-pipeline.flux.yaml
participants:
  plan:
    type: exec
    run: claude --agent planner --print "$(cat SPEC.md)"
  code:
    type: exec
    run: claude --agent coder --print "Implement the plan in PLAN.md"
    onError: retry
    retry:
      max: 3
  test:
    type: exec
    run: npm test
  lint:
    type: exec
    run: npm run lint
  review:
    type: exec
    run: claude --agent reviewer --print "Review the implementation"
flow:
  - plan
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review
```
Each `claude --agent` step is a Claude Sub-agent invocation. The sub-agent gets its isolated context, restricted tools, and specialized prompt. But the orchestration (ordering, looping, gating, retrying) is declarative. The LLM does creative work. The YAML handles plumbing.
This is the core argument: decouple what the LLM is good at (reasoning, generation, analysis) from what config files are good at (sequencing, retrying, branching, gating). Don't ask the LLM to be a router when you already know the route.
## Getting started

1. Install the runtime: `bun add -g @duckflux/runner`
2. Identify your repeatable workflows. Which sub-agent chains do you run the same way every time?
3. Extract the ordering into a `.flux.yaml`. Each sub-agent becomes a participant. The chain becomes the flow.
4. Add real quality gates. Replace "please run the tests" with actual `npm test` steps.
5. Run it: `duckflux run my-pipeline.flux.yaml`
6. Observe via `duckflux server --trace-dir ./traces` for a visual trace of every step.
Tip: Keep using Claude Sub-agents for interactive exploration and ad-hoc tasks. Use duckflux for the workflows you've already figured out and want to run reliably, repeatably, and without babysitting.
## Final thoughts
Claude Sub-agents represent a real step forward in AI-assisted development. The isolated contexts, tool restrictions, and model routing are well-designed primitives.
But the orchestration layer, where the LLM decides what to delegate, when, and in what order, is the weak link. Not because Claude is bad at it, but because orchestration is fundamentally a deterministic problem being solved with a probabilistic tool.
duckflux doesn't replace the agents. It replaces the part of the system that shouldn't be guessing.
Check the duckflux docs for the full DSL reference, or jump straight to the spec.