DEV Community

Gustavo Gondim

Posted on • Originally published at docs.duckflux.openvibes.tech

Migrating from Claude Sub-agents to duckflux

Claude Code's sub-agent system is powerful. You define specialized agents with focused prompts, restricted tools, and independent contexts. Claude decides when to delegate, spawns sub-agents in foreground or background, and synthesizes results. It works.

But there's a design choice buried in the architecture that matters more than any individual feature: who decides what happens next? In Claude Sub-agents, the answer is the LLM. The parent agent reads your request, evaluates sub-agent descriptions, and decides which one to spawn. The routing logic lives in inference, not in config.

This article explores why that matters, when it becomes a problem, and how duckflux offers an alternative where the orchestration is deterministic while the work inside each step stays as creative as the LLM needs to be.


How Claude Sub-agents work

Claude Sub-agents are markdown files with YAML frontmatter that define specialized AI assistants. Each sub-agent has its own system prompt, tool restrictions, model choice, and permission mode.

```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a senior code reviewer. When invoked, analyze the code
and provide specific, actionable feedback on quality, security,
and best practices.
```

At runtime, Claude reads the description field of each available sub-agent and decides whether to delegate. You can nudge this with natural language ("use the code-reviewer agent") or force it with @-mentions, but the routing is fundamentally an LLM decision.

Sub-agents run in their own context window. They can't spawn other sub-agents. Results return to the parent, which synthesizes them. For parallel work, you can run multiple sub-agents in the background, or use agent teams for cross-session coordination.

Key capabilities:

  • Isolation. Each sub-agent has its own context, tools, and permissions.
  • Model routing. Haiku for cheap exploration, Opus for complex reasoning, Sonnet as default.
  • Worktree isolation. isolation: worktree gives a sub-agent a temporary git worktree.
  • Persistent memory. Sub-agents can accumulate learnings across sessions.
  • Hooks. PreToolUse, PostToolUse, Stop hooks for lifecycle control.
  • Background execution. Sub-agents run concurrently while you keep working.

The non-determinism problem

Here's the thing: Claude Sub-agents are orchestrated by inference. The LLM decides:

  1. Whether to delegate at all.
  2. Which sub-agent to spawn.
  3. What prompt to write for the sub-agent.
  4. When to synthesize results vs. spawn more agents.
  5. Whether to chain sub-agents or return to you.

Each of these decisions is a probabilistic inference. On a good day, Claude makes the right calls. On a bad day, it forgets to delegate, picks the wrong sub-agent, writes a vague task prompt, or synthesizes prematurely.

This is fine for interactive, exploratory work. You're in the loop, you can redirect, you can say "no, use the reviewer agent." But the moment you want a repeatable pipeline (plan, code, test, review, deploy), you're asking the LLM to be a reliable router. And LLMs are unreliable routers. They forget steps, miscount iterations, and silently skip transitions.

The sub-agent docs themselves acknowledge this: sub-agents cannot spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent's orchestration logic is just... its next token prediction.

Compare this to how we treat human workflows. Nobody says "here are five specialists, figure out the order." We define processes, assign roles to steps, and execute deterministically. The specialists bring creativity; the process brings structure.


What is duckflux?

duckflux is a declarative, YAML-based workflow DSL. The execution order is defined in config, not inferred by an LLM. The runtime handles sequencing, loops, parallelism, retries, events, and tracing.

```yaml
flow:
  - type: exec
    run: npm test
```

The key difference: duckflux separates orchestration from execution. The workflow file defines what happens in what order. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside each step. The workflow DSL handles the plumbing between steps.
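As a minimal sketch of that separation (participant names and prompt files here are illustrative, not from the duckflux docs), a flow can mix an LLM step with a plain shell step:

```yaml
participants:
  summarize:              # creative work: the LLM writes the summary
    type: exec
    run: cat PROMPT_SUMMARIZE.md | $AGENT

  publish:                # plumbing: an ordinary shell command, no LLM
    type: exec
    run: ./publish.sh

flow:                     # ordering is declared here, never inferred
  - summarize
  - publish
```

Swapping `$AGENT` for a different CLI changes who does the creative work without touching the orchestration.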


The determinism spectrum

It's not binary. Different parts of a pipeline need different levels of determinism.

| Concern | Needs determinism? | Why |
| --- | --- | --- |
| Step ordering | Yes | Plan before code, test before deploy. Not negotiable. |
| Retry logic | Yes | "Retry 3 times with backoff" is a policy, not a creative decision. |
| Quality gates | Yes | Tests pass or they don't. Exit codes, not vibes. |
| Error handling | Yes | "If deploy fails, notify Slack" is a business rule. |
| Code generation | No | The LLM should be creative here. |
| Code review | No | The LLM should reason freely about quality. |
| Planning | No | Breaking tasks into subtasks is inherently creative. |

Claude Sub-agents put everything on the non-deterministic side. duckflux lets you draw the line where it makes sense: deterministic orchestration, non-deterministic execution.


Concepts side by side

| Claude Sub-agents | duckflux | Notes |
| --- | --- | --- |
| Sub-agent (markdown file) | Participant | A unit of work. In duckflux, not limited to LLM invocations. |
| description (LLM-routed) | Flow position / when guard | Explicit placement replaces LLM routing decisions. |
| Parent decides delegation | flow array | Ordering is declared, not inferred. |
| maxTurns | retry.max / loop.max | Iteration caps per step, not per agent context. |
| isolation: worktree | cwd per participant | Working directory isolation per step. |
| Background sub-agents | parallel: construct | Concurrent execution declared in config. |
| Chained sub-agents | Sequential flow | No LLM needed to decide "run B after A." |
| Sub-agent hooks | onError, when, emit/wait | Lifecycle control in the DSL, not in hook scripts. |
| Persistent memory | execution.context / set | Workflow-scoped state. Cross-session memory is outside duckflux scope. |
| Model routing | N/A (bring your own agent CLI) | duckflux orchestrates commands; model choice is per-agent. |

Migration patterns

Chained sub-agents

In Claude Sub-agents, chaining requires the parent to decide the sequence:

```text
Use the code-reviewer subagent to find performance issues,
then use the optimizer subagent to fix them
```

The parent LLM interprets "then" and decides to spawn the optimizer after the reviewer. If it misunderstands, it might run them in parallel, skip the optimizer, or synthesize prematurely.

duckflux:

```yaml
participants:
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT

  optimize:
    type: exec
    run: cat PROMPT_OPTIMIZE.md | $AGENT

flow:
  - review
  - optimize
```

"Then" is a line break in the YAML. No inference needed.

Parallel research

Claude Sub-agents can run research in parallel via background tasks:

```text
Research the authentication, database, and API modules
in parallel using separate subagents
```

Again, the parent decides whether to actually parallelize, which sub-agents to use, and how to synthesize.

duckflux:

```yaml
flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT

      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT

      - as: api-research
        type: exec
        run: cat PROMPT_API.md | $AGENT
```

Parallelism is declared. All three run concurrently. The outputs are collected in an array for the next step. No LLM routing decision required.
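If a later step needs to synthesize the three results, one pattern (a sketch; PROMPT_SYNTH.md and the output file names are hypothetical) is to have each branch write its output to a file, then add a sequential step after the parallel block:

```yaml
flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT > auth.md

      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT > db.md

  # Runs only after every parallel branch has finished
  - as: synthesize
    type: exec
    run: cat PROMPT_SYNTH.md auth.md db.md | $AGENT
```

The synthesis is still LLM work, but *when* it runs is a structural guarantee, not an inference.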

Review loop with quality gates

A common Claude Sub-agents pattern: code, then review, then fix if needed. The parent decides when to stop.

```text
Use the coder subagent to implement the feature,
then use the reviewer subagent to check it.
If there are issues, have the coder fix them.
Repeat until the reviewer approves.
```

The parent LLM manages the iteration. It decides whether to loop, how many times, and when to stop. If it loses track, the loop might run forever (capped by maxTurns) or stop too early.

duckflux:

```yaml
participants:
  code:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT
    onError: retry
    retry:
      max: 3
      backoff: 2s

  test:
    type: exec
    run: npm test

  lint:
    type: exec
    run: npm run lint

  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT

flow:
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review
```

The loop condition, iteration cap, and quality gates are all in the config. The LLM does creative work inside code and review. The DSL handles the loop, the exit condition, and the gates. test and lint are real commands with real exit codes, not prompt instructions asking the agent to self-report.

Event-driven coordination

Claude Sub-agents have no event system. If sub-agent A needs to signal sub-agent B, the parent synthesizes A's output and writes B's prompt. The coordination happens in the parent's inference.

duckflux has native emit + wait for cases where steps genuinely need to signal each other:

```yaml
flow:
  - parallel:
      - as: data-prep
        type: exec
        run: ./prepare-data.sh

      - as: wait-for-data
        type: exec
        run: |
          # This branch waits for data-prep to signal readiness

  - wait:
      event: "data.ready"
      timeout: 5m

  - as: process
    type: exec
    run: ./process.sh
```

Events work across parallel branches, across parent/child workflows, and with external event hubs (NATS, Redis). This is coordination infrastructure that the sub-agent model lacks entirely.
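The emitting side isn't shown above. By analogy with wait, it might look like this (hypothetical syntax, sketched from the emit/wait pairing; check the DSL reference for the exact form):

```yaml
flow:
  - as: data-prep
    type: exec
    run: ./prepare-data.sh

  # Hypothetical: publish the event that a waiting step or
  # sibling branch is blocked on
  - emit:
      event: "data.ready"
```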


When to keep sub-agents

Sub-agents are the right tool when:

  • You're working interactively. Typing in Claude Code, exploring a codebase, asking questions. The LLM-routed delegation is exactly right here because you're in the loop.
  • The workflow is genuinely emergent. You don't know the steps upfront. The agent needs to figure out what to do based on what it finds.
  • Context preservation matters. Each sub-agent's isolated context window prevents pollution of the main conversation. This is a real advantage for high-volume operations.
  • You need model routing. Sending cheap tasks to Haiku and expensive tasks to Opus within a single session is built into the sub-agent model.

When to switch to duckflux

Switch when:

  • The workflow is repeatable. If you've typed the same chaining instructions more than twice, it should be a config file.
  • You need guaranteed step ordering. Plan, code, test, review, deploy. Always in that order. No exceptions.
  • You need real quality gates. Not "please run the tests", but npm test as an actual step with an exit code.
  • You need audit trails. Structured JSON traces per step, visible in the web server UI.
  • You need cross-agent events. Steps signaling each other, waiting for external events, publishing to message queues.
  • You want provider independence. duckflux orchestrates $AGENT, not Claude specifically. Swap agents per step.

What you gain

| Concern | Claude Sub-agents | duckflux |
| --- | --- | --- |
| Routing | LLM decides (probabilistic) | Config declares (deterministic) |
| Step ordering | Parent LLM inference | flow array, top to bottom |
| Quality gates | Prompt instructions | Real commands with exit codes |
| Retry | maxTurns (global per agent) | retry.max with backoff (per step) |
| Parallel | Background sub-agents (LLM decides) | parallel: construct (declared) |
| Events | None | emit + wait (cross-branch, cross-workflow) |
| Tracing | Transcript files | Structured JSON + web server UI |
| Provider lock-in | Claude Code only | Any agent CLI, any runtime |

What you lose

  • Interactive delegation. The natural "use the reviewer agent" UX in Claude Code. duckflux is a runner, not an interactive assistant.
  • Context isolation. Sub-agents protect the parent's context window within a single conversation. duckflux steps are independent commands; there is no shared conversation at all, so nothing carries between steps unless you pass it explicitly.
  • Tool restriction enforcement. Sub-agents have framework-level tool control. In duckflux, that's the agent's responsibility.
  • Model routing within the workflow. Sub-agents can use different models per agent. In duckflux, each exec step invokes whatever CLI you point it at.
  • Persistent memory. Sub-agents accumulate learnings across sessions. duckflux has execution.context for within-workflow state, but cross-session memory is outside scope.

A hybrid approach

You don't have to choose one or the other. The most practical architecture uses both:

```yaml
# ci-pipeline.flux.yaml
participants:
  plan:
    type: exec
    run: claude --agent planner --print "$(cat SPEC.md)"

  code:
    type: exec
    run: claude --agent coder --print "Implement the plan in PLAN.md"
    onError: retry
    retry:
      max: 3

  test:
    type: exec
    run: npm test

  lint:
    type: exec
    run: npm run lint

  review:
    type: exec
    run: claude --agent reviewer --print "Review the implementation"

flow:
  - plan

  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review
```

Each claude --agent step is a Claude Sub-agent invocation. The sub-agent gets its isolated context, restricted tools, and specialized prompt. But the orchestration (ordering, looping, gating, retrying) is declarative. The LLM does creative work. The YAML handles plumbing.

This is the core argument: decouple what the LLM is good at (reasoning, generation, analysis) from what config files are good at (sequencing, retrying, branching, gating). Don't ask the LLM to be a router when you already know the route.


Getting started

  1. Install the runtime:

```shell
bun add -g @duckflux/runner
```
  2. Identify your repeatable workflows. Which sub-agent chains do you run the same way every time?

  3. Extract the ordering into a .flux.yaml. Each sub-agent becomes a participant. The chain becomes the flow.

  4. Add real quality gates. Replace "please run the tests" with actual npm test steps.

  5. Run it:

```shell
duckflux run my-pipeline.flux.yaml
```
  6. Observe via duckflux server --trace-dir ./traces for a visual trace of every step.
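When extracting the ordering, the code-reviewer sub-agent from the beginning of this article converts by moving its system prompt out of the markdown body and into a prompt file (the file name here is illustrative):

```yaml
participants:
  code-reviewer:
    type: exec
    # PROMPT_REVIEW.md holds the system prompt that used to live
    # in the sub-agent's markdown body
    run: cat PROMPT_REVIEW.md | $AGENT
```

The frontmatter's description field disappears entirely: placement in the flow array replaces it.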

Tip: Keep using Claude Sub-agents for interactive exploration and ad-hoc tasks. Use duckflux for the workflows you've already figured out and want to run reliably, repeatably, and without babysitting.


Final thoughts

Claude Sub-agents represent a real step forward in AI-assisted development. The isolated contexts, tool restrictions, and model routing are well-designed primitives.

But the orchestration layer, where the LLM decides what to delegate, when, and in what order, is the weak link. Not because Claude is bad at it, but because orchestration is fundamentally a deterministic problem being solved with a probabilistic tool.

duckflux doesn't replace the agents. It replaces the part of the system that shouldn't be guessing.


Check the duckflux docs for the full DSL reference, or jump straight to the spec.
