DEV Community

Gustavo Gondim

Posted on • Originally published at ggondim.notion.site

duckflux — A Declarative Workflow DSL Born from the Multi-Agent Orchestration Gap

TL;DR: After months exploring multi-agent orchestration with OpenClaw and Lobster, I hit a wall: no existing tool offered simple declarative spec + runtime-agnostic execution + first-class control flow. So I designed duckflux — a minimal YAML-based workflow DSL with loops, conditionals, parallelism, and events built in. The spec is done (v0.2), a Go CLI runner is working, and the next step is integrating it as the orchestration engine inside OpenClaw.



Previously, on this series

This article is the third in a series about building deterministic multi-agent development pipelines. If you're joining now, here's the short version.

In the first article, I documented two months of trial and error trying to build a code → review → test pipeline with autonomous AI agents. The core thesis: LLMs are unreliable routers — they forget steps, miscount iterations, skip transitions. Orchestration must be deterministic and implemented in code, not delegated to inference. After five failed attempts (Ralph Orchestrator, OpenClaw sub-agents, a custom event bus, skill-driven self-orchestration, and plugin hooks), I found Lobster — OpenClaw's built-in workflow engine. It was close, but lacked native loop support. I contributed a pull request adding sub-workflow steps with loops.

In the second article, I zoomed out. The problem wasn't just orchestration — it was multi-agents × multi-projects × multi-providers × multi-channels. I compiled a dataset of agent configuration formats across providers, proposed the Monoswarm pattern (a monorepo layout for managing agent swarms), and identified the still-missing piece: an orchestration layer that ties agent events to workflow transitions across projects.

Both articles ended with the same conclusion: we need a proper workflow DSL.

The gap that remained

Lobster was the closest thing to what I needed, but it was designed for linear pipelines with approval gates. My pull request added loops, but the deeper issues remained:

  • No conditional branching (if/then/else).
  • No parallel execution of multiple agents.
  • No event system for inter-agent coordination.
  • No typed expressions — conditions were shell commands returning exit codes.
  • Tied to OpenClaw's runtime — not portable to other environments.

I looked at the broader landscape:

| Tool | Where it falls short |
| --- | --- |
| Argo Workflows | Turing-complete YAML disguised as config. A conditional loop requires template recursion, manual iteration counters, and string-interpolated type casting. |
| GitHub Actions | No conditional loops. Workarounds require unrolling or recursive reusable workflows. |
| Temporal / Inngest | Code-first — Go/TS/Python SDKs. The code IS the spec. No declarative layer. |
| Airflow / Prefect | DAGs are acyclic by definition — conditional loops are architecturally impossible. |
| n8n / Make | Visual-first, JSON-heavy specs. Loop constructs require JavaScript function nodes. Specs are unreadable as text. |
| Lobster | Linear pipelines with approval gates. No native loops, no parallelism, no conditionals. |

The gap was clear: no existing tool combines a simple declarative spec + runtime-agnostic execution + first-class control flow (loops, conditionals, parallelism) + events.

So I built one.

What is duckflux

duckflux is a minimal, deterministic, runtime-agnostic DSL for orchestrating workflows through declarative YAML.

The design principles are deliberate:

  1. Readable in 5 seconds — any developer understands the flow by glancing at the YAML.
  2. Minimal by default — features are only added when absolutely necessary.
  3. Convention over configuration — sensible defaults everywhere.
  4. Runtime-agnostic — the DSL defines WHAT happens and in WHAT ORDER. The runtime decides HOW.
  5. Reuse proven standards — expressions use Google CEL (used in Kubernetes, Firebase, Envoy), schemas use JSON Schema, format is YAML.

The simplest possible workflow:

```yaml
flow:
  - as: greet
    type: exec
    run: echo "Hello, duckflux!"
```

That's it. One flow, one step. No boilerplate, no mandatory fields beyond what's needed.

A more realistic example — a code review pipeline with a retry loop, parallel checks, conditional deployment, and event notification:

```yaml
id: code-review-pipeline
name: Code Review Pipeline

defaults:
  timeout: 10m

inputs:
  repoUrl:
    type: string
    format: uri
    required: true
  maxRounds:
    type: integer
    default: 3

participants:
  coder:
    type: agent
    model: claude-sonnet-4-20250514
    tools: [read, write, bash]
    onError: retry
    retry:
      max: 2
      backoff: 5s

  reviewer:
    type: agent
    model: claude-sonnet-4-20250514
    tools: [read]
    output:
      approved:
        type: boolean
        required: true
      score:
        type: integer

flow:
  - coder

  - loop:
      until: reviewer.output.approved == true
      max: input.maxRounds
      steps:
        - reviewer
        - coder:
            when: reviewer.output.approved == false

  - parallel:
      - as: tests
        type: exec
        run: npm test
        onError: skip

      - as: lint
        type: exec
        run: npm run lint
        onError: skip

  - if:
      condition: tests.status == "success" && lint.status == "success"
      then:
        - as: deploy
          type: exec
          run: ./deploy.sh

        - as: notify
          type: emit
          event: "deploy.completed"
          payload:
            approved: reviewer.output.approved
            score: reviewer.output.score
      else:
        - as: notifyFailure
          type: emit
          event: "deploy.failed"
          payload:
            tests: tests.status
            lint: lint.status

output:
  approved: reviewer.output.approved
  score: reviewer.output.score
```

Compare this to the same scenario in Argo Workflows (~40 lines of template recursion), GitHub Actions (~50+ lines with unrolled iterations), or Temporal (~35 lines of Go code that requires compilation and a server).

Alternatives considered

Before landing on a custom YAML format, I evaluated two other approaches:

Extending Argo Workflows. Argo's YAML is expressive, but its power came from 6+ years of incremental feature additions. A conditional loop in Argo requires template recursion, manual iteration counters, and string-interpolated type casting — 13+ lines for what should be 6. The complexity is the feature, not a bug, and that's the problem.

Mermaid as executable spec. Mermaid sequence diagrams already have loop, par, and alt constructs. The DX for reading and writing is excellent, and diagrams render natively in GitHub. However, extending Mermaid for real workflow concerns (retry policies, timeouts, error handling, typed variables) requires hacking Note blocks for config and $var for expressions — creating a custom parser as proprietary as a new YAML format, just disguised as something familiar.

Custom minimal YAML (chosen). A new format, intentionally constrained, inspired by Mermaid's visual clarity but with the extensibility and tooling ecosystem of YAML. The tradeoff: a new DSL to learn, but one designed to be readable in 5 seconds and writable in 5 minutes.

The spec at a glance

The full spec is at github.com/duckflux/spec. Here's a walkthrough of the key features.

Participants

Participants are the building blocks. Each has a type that determines its behavior:

| Type | Description |
| --- | --- |
| exec | Shell command |
| http | HTTP request |
| mcp | MCP server delegation |
| workflow | Sub-workflow (composition) |
| emit | Fire an event to the event hub |
| wait | Pause execution until an event, timeout, or polling condition |

Participants can be defined in a reusable participants block or inline in the flow:

```yaml
# Reusable
participants:
  build:
    type: exec
    run: npm run build

flow:
  - build

  # Inline (single-use)
  - as: notify
    type: http
    url: https://hooks.slack.com/services/...
    method: POST
```

Control flow

Loops — repeat until a CEL condition is true or N iterations:

```yaml
- loop:
    until: reviewer.output.approved == true
    max: 3
    steps:
      - coder
      - reviewer
```
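The point of this construct is that iteration counting and the stop condition live in the runner, not in an LLM's working memory. A minimal Go sketch of how a runner can implement it as a plain counted loop (the `loopSpec` and `runLoop` names are illustrative, not the actual runner internals):

```go
package main

import "fmt"

// loopSpec mirrors the DSL's loop block: an `until` predicate and a `max` bound.
type loopSpec struct {
	until func() bool  // re-evaluated after each iteration (a CEL expression in the real runner)
	max   int          // hard iteration cap, enforced in code, never by the LLM
	steps func() error // the loop body (agent/exec steps in the real runner)
}

// runLoop executes the body until the predicate holds or max is reached.
// It returns the number of iterations actually performed.
func runLoop(l loopSpec) (int, error) {
	for i := 0; i < l.max; i++ {
		if err := l.steps(); err != nil {
			return i, err
		}
		if l.until() {
			return i + 1, nil
		}
	}
	return l.max, nil
}

func main() {
	approved := false
	round := 0
	n, err := runLoop(loopSpec{
		until: func() bool { return approved },
		max:   3,
		steps: func() error {
			round++
			if round == 2 { // the reviewer approves on the second round
				approved = true
			}
			return nil
		},
	})
	fmt.Println(n, err, approved)
}
```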

Parallel — run steps concurrently:

```yaml
- parallel:
    - as: lint
      type: exec
      run: npm run lint
    - as: test
      type: exec
      run: npm test
```

Conditionals — branch based on CEL expressions:

```yaml
- if:
    condition: tests.status == "success"
    then:
      - deploy
    else:
      - rollback
```

Guards — skip a single step conditionally:

```yaml
- deploy:
    when: reviewer.output.approved == true
```

Wait — pause for an event, a timeout, or a polling condition:

```yaml
- wait:
    event: "approval.received"
    match: event.requestId == submitForApproval.output.id
    timeout: 24h
```

Expressions with Google CEL

All conditions, input mappings, and output mappings use Google CEL. CEL is non-Turing-complete, sandboxed (no I/O, no side effects), type-checked at parse time, and has a familiar C/JS/Python-like syntax:

```yaml
- if:
    condition: reviewer.output.approved == false && loop.iteration < 3
```

CEL was chosen over JavaScript eval (security surface, runtime dependency), custom mini-DSLs (implementation burden), and JSONPath/JMESPath (poor logic support).

Events

emit publishes events, wait subscribes. Events propagate both internally (within the workflow) and externally:

```yaml
- as: notifyProgress
  type: emit
  event: "task.progress"
  payload:
    taskId: input.taskId
    status: coder.output.status
  ack: true  # block until delivery confirmed
```

Error handling

Configurable per participant or per flow step invocation, with four strategies:

```yaml
participants:
  coder:
    type: agent
    onError: retry
    retry:
      max: 3
      backoff: 2s
      factor: 2     # exponential: 2s, 4s, 8s

  deploy:
    type: exec
    onError: notify  # redirect to another participant as fallback
```

Inputs and outputs

Everything is string by default, like stdin/stdout. Schema is opt-in via JSON Schema (written in YAML):

```yaml
inputs:
  repoUrl:
    type: string
    format: uri
    required: true
  branch:
    type: string
    default: "main"

output:
  approved: reviewer.output.approved
  score: reviewer.output.score
```
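In practice this means a runner only has to resolve defaults and coerce raw CLI strings when a schema is declared. A minimal Go sketch of that resolution step (hypothetical `inputSpec` and `coerce` names; the real spec validates against full JSON Schema, not this toy subset):

```go
package main

import (
	"fmt"
	"strconv"
)

// inputSpec is a tiny stand-in for a declared input: a type, a default,
// and a required flag.
type inputSpec struct {
	typ      string
	def      string
	required bool
}

// coerce resolves a raw CLI string (--input key=value) against its spec:
// apply the default when absent, fail when required and missing, and cast
// to the declared type. Strings pass through untouched, like stdin/stdout.
func coerce(raw string, present bool, spec inputSpec) (any, error) {
	if !present {
		if spec.required {
			return nil, fmt.Errorf("missing required input")
		}
		raw = spec.def
	}
	switch spec.typ {
	case "integer":
		return strconv.Atoi(raw)
	case "boolean":
		return strconv.ParseBool(raw)
	default: // "string" and anything undeclared stay as-is
		return raw, nil
	}
}

func main() {
	branch, _ := coerce("", false, inputSpec{typ: "string", def: "main"})
	rounds, _ := coerce("3", true, inputSpec{typ: "integer"})
	fmt.Println(branch, rounds)
}
```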

JSON Schema for editor support

A JSON Schema ships with the spec, giving you autocomplete and validation in VS Code for free:

```json
{
  "yaml.schemas": {
    "./duckflux.schema.json": "*.flow.yaml"
  }
}
```

The Go runner

The spec is useless without an executor. The duckflux runner is a cross-platform CLI written in Go.

Why Go: official Google CEL implementation (cel-go), single static binary with zero runtime dependencies, native concurrency via goroutines (maps directly to parallel:), and ecosystem fit — virtually every workflow and infrastructure tool is written in Go (Argo, Temporal, Docker, Kubernetes, Terraform).

Installation

```bash
git clone https://github.com/duckflux/runner.git
cd runner
make build
./bin/duckflux version
```

Commands

Run a workflow:

```bash
duckflux run deploy.flow.yaml --input branch=main --input env=staging
```

Lint (validate without executing):

```bash
duckflux lint deploy.flow.yaml
```

Validate inputs against schema:

```bash
duckflux validate deploy.flow.yaml --input branch=main
```

What's next: duckflux meets OpenClaw

The entire journey — from Protoagent to Lobster to Monoswarm — has been converging toward one goal: a deterministic orchestration engine for multi-agent workflows inside OpenClaw.

The architecture

```text
┌─────────────────────────────────────────────────────────┐
│                    Orchestrator                         │
│                                                         │
│  /work [workflow] [project] [task]                      │
│                                                         │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│            Canonical Agents Plugin                      │
│                                                         │
│  Watch + hot-reload of AGENTS.md                        │
│  Dynamically generates OpenClaw config                  │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│              OpenClaw Gateway                           │
│                                                         │
│  Webhooks + Sandboxing + Tools                          │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│          Per-Task Containers (Docker)                   │
│                                                         │
│  Git worktrees as filesystem                            │
└─────────────────────────────────────────────────────────┘
```

Two plugins

The integration relies on two OpenClaw plugins:

Canonical Agents Plugin — watches a directory of AGENTS.md files (YAML frontmatter for model/tools/sandbox config + markdown body for the system prompt) and dynamically generates OpenClaw's agent configuration with hot-reload on changes. This is the Monoswarm pattern's .ai/ directory made executable.
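To make the format concrete, a hypothetical AGENTS.md for a coder agent might look like this (the frontmatter fields follow the model/tools/sandbox description above; the exact schema is the plugin's to define):

```markdown
---
model: claude-sonnet-4-20250514
tools: [read, write, bash]
sandbox: docker
---

You are the coder agent. Implement the task described in the assigned
issue, run the test suite locally, and commit your changes on the
task's feature branch.
```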

Orchestrator Plugin — the duckflux runner embedded as an OpenClaw plugin. Triggered by a command like /work code-review project-a TASK-123, it reads a duckflux workflow file, clones canonical agents per project, manages git worktrees, and executes the workflow — where agent participants map to OpenClaw webhook calls with isolated session keys.

The details of each plugin's implementation will be covered in a future article. For now, the important thing is how this changes the picture.

What this replaces

With duckflux as the orchestration engine:

  • Lobster is replaced by a more expressive workflow DSL with native loops, conditionals, parallelism, and events.
  • Plugin hooks for routing are replaced by declarative emit/wait in the workflow spec.
  • Shell exit codes for conditions are replaced by type-checked CEL expressions.
  • The custom orchestration plugin described in article two becomes the duckflux runner itself, embedded in OpenClaw.

The LLMs do what they're good at: writing code, analyzing code, making decisions. duckflux does what code is good at: sequencing, counting, routing, retrying.

