
Gustavo Gondim

Posted on • Edited on • Originally published at ggondim.notion.site

duckflux: A Declarative Workflow DSL Born from the Multi-Agent Orchestration Gap

TL;DR: After months exploring multi-agent orchestration with OpenClaw and Lobster, I hit a wall: no existing tool offered simple declarative spec + runtime-agnostic execution + first-class control flow. So I designed duckflux, a minimal YAML-based workflow DSL with loops, conditionals, parallelism, and events built in. The spec is now at v0.7, the TypeScript runtime ships as a CLI (quack) and an embeddable library (@duckflux/core), with pluggable event hub backends (in-memory, NATS, Redis) and built-in execution tracing. Full docs at duckflux.openvibes.tech.



Previously, on this series

This article is the third in a series about building deterministic multi-agent development pipelines. If you're joining now, here's the short version.

In the first article, I documented two months of trial and error trying to build a code -> review -> test pipeline with autonomous AI agents. The core thesis: LLMs are unreliable routers; they forget steps, miscount iterations, and skip transitions. Orchestration must be deterministic and implemented in code, not delegated to inference. After five failed attempts (Ralph Orchestrator, OpenClaw sub-agents, a custom event bus, skill-driven self-orchestration, and plugin hooks), I found Lobster, OpenClaw's built-in workflow engine. It was close, but lacked native loop support. I contributed a pull request adding sub-workflow steps with loops.

In the second article, I zoomed out. The problem wasn't just orchestration; it was multi-agents x multi-projects x multi-providers x multi-channels. I compiled a dataset of agent configuration formats across providers, proposed the Monoswarm pattern (a monorepo layout for managing agent swarms), and identified the still-missing piece: an orchestration layer that ties agent events to workflow transitions across projects.

Both articles ended with the same conclusion: we need a proper workflow DSL.

The gap that remained

Lobster was the closest thing to what I needed, but it was designed for linear pipelines with approval gates. My pull request added loops, but the deeper issues remained:

  • No conditional branching (if/then/else).
  • No parallel execution of multiple agents.
  • No event system for inter-agent coordination.
  • No typed expressions; conditions were shell commands returning exit codes.
  • Tied to OpenClaw's runtime, not portable to other environments.

I looked at the broader landscape:

| Tool | Where it falls short |
| --- | --- |
| Argo Workflows | Turing-complete YAML disguised as config. A conditional loop requires template recursion, manual iteration counters, and string-interpolated type casting. |
| GitHub Actions | No conditional loops. Workarounds require unrolling or recursive reusable workflows. |
| Temporal / Inngest | Code-first (Go/TS/Python SDKs). The code IS the spec. No declarative layer. |
| Airflow / Prefect | DAGs are acyclic by definition; conditional loops are architecturally impossible. |
| n8n / Make | Visual-first, JSON-heavy specs. Loop constructs require JavaScript function nodes. Specs are unreadable as text. |
| Lobster | Linear pipelines with approval gates. No native loops, no parallelism, no conditionals. |

The gap was clear: no existing tool combines a simple declarative spec + runtime-agnostic execution + first-class control flow (loops, conditionals, parallelism) + events.

So I built one.

What is duckflux

duckflux is a minimal, deterministic, runtime-agnostic DSL for orchestrating workflows through declarative YAML. The spec is at v0.7, with a complete TypeScript runtime and a documentation site at duckflux.openvibes.tech.

The design principles are deliberate:

  1. Readable in 5 seconds -- any developer understands the flow by glancing at the YAML.
  2. Minimal by default -- features are only added when absolutely necessary.
  3. Convention over configuration -- sensible defaults everywhere.
  4. Runtime-agnostic -- the DSL defines WHAT happens and in WHAT ORDER. The runtime decides HOW.
  5. String by default -- every participant receives and returns strings unless a schema is explicitly defined, like stdin/stdout, the universal interface.
  6. Reuse proven standards -- expressions use Google CEL (used in Kubernetes, Firebase, Envoy), schemas use JSON Schema, format is YAML.

The simplest possible workflow:

flow:
  - type: exec
    run: echo "Hello, duckflux!"

That's it. One flow, one step. No boilerplate, no mandatory fields beyond what's needed.

A more realistic example: an agentic coding pipeline where a planner breaks work into tasks, then a loop fetches each task, a coder implements it, and a reviewer checks it:

id: agentic-coding-pipeline
name: Agentic Coding Pipeline
version: "0.7"

defaults:
  timeout: 10m
  cwd: ./repo

inputs:
  goal:
    type: string
    required: true
    description: "High-level description of what needs to be built"
  taskQueueUrl:
    type: string
    required: true
  maxRounds:
    type: integer
    default: 3
    minimum: 1
    maximum: 10

participants:
  planner:
    type: exec
    run: >
      claude -p
      "Break the following goal into discrete coding tasks.
      Return a JSON array of {id, description} objects.
      Goal: " + workflow.inputs.goal
    timeout: 5m
    output:
      type: array
      items:
        type: object
        required: true

  fetchTask:
    type: http
    url: workflow.inputs.taskQueueUrl + "/next"
    method: GET
    headers:
      Accept: application/json

  coder:
    type: exec
    run: >
      claude -p
      "Implement the following task in the current repository.
      Task: " + fetchTask.output.description
    timeout: 15m
    onError: retry
    retry:
      max: 2
      backoff: 10s

  reviewer:
    type: exec
    run: >
      claude -p
      "Review the changes for the following task. Return a JSON
      object with 'approved' (boolean) and 'feedback' (string).
      Task: " + fetchTask.output.description
    timeout: 10m
    output:
      approved:
        type: boolean
        required: true
      feedback:
        type: string

flow:
  - planner

  - loop:
      max: workflow.inputs.maxRounds
      steps:
        - fetchTask
        - coder:
            input:
              task: fetchTask.output.description
        - reviewer:
            input:
              task: fetchTask.output.description

output:
  approved: reviewer.output.approved
  feedback: reviewer.output.feedback
  rounds: loop.iteration

Compare this to the same scenario in Argo Workflows (~40 lines of template recursion), GitHub Actions (~50+ lines with unrolled iterations), or Temporal (~35 lines of Go code that requires compilation and a server).

Alternatives considered

Before landing on a custom YAML format, I evaluated two other approaches:

Extending Argo Workflows. Argo's YAML is expressive, but its power came from 6+ years of incremental feature additions. A conditional loop in Argo requires template recursion, manual iteration counters, and string-interpolated type casting: 13+ lines for what should be 6. The complexity is the feature, not a bug, and that's the problem.

Mermaid as executable spec. Mermaid sequence diagrams already have loop, par, and alt constructs. The DX for reading and writing is excellent, and diagrams render natively in GitHub. However, extending Mermaid for real workflow concerns (retry policies, timeouts, error handling, typed variables) requires hacking Note blocks for config and $var for expressions, creating a custom parser as proprietary as a new YAML format, just disguised as something familiar.

Custom minimal YAML (chosen). A new format, intentionally constrained, inspired by Mermaid's visual clarity but with the extensibility and tooling ecosystem of YAML. The tradeoff: a new DSL to learn, but one designed to be readable in 5 seconds and writable in 5 minutes.

The spec at a glance

The full spec is at github.com/duckflux/spec, with complete documentation at duckflux.openvibes.tech. Here's a walkthrough of the key features.

Participants

Participants are the atomic unit of work. Each has a type that determines its behavior:

| Type | Description |
| --- | --- |
| exec | Shell command |
| http | HTTP request |
| mcp | MCP server tool call |
| workflow | Sub-workflow (composition) |
| emit | Fire an event to the event hub |

Participants can be defined in three ways: in a reusable participants block, as named inline steps (with as), or as anonymous inline steps (without a name at all):

# Reusable (in participants block)
participants:
  build:
    type: exec
    run: npm run build

flow:
  # Reference a reusable participant
  - build

  # Named inline (one-off, but addressable by name)
  - as: notify
    type: http
    url: https://hooks.slack.com/services/...
    method: POST

  # Anonymous inline (output accessible only via the I/O chain)
  - type: exec
    run: echo "done"

Implicit I/O chain

One of the most impactful features added since v0.2: the output of each step is automatically passed as input to the next step, forming a chain analogous to Unix pipes.

flow:
  - type: exec
    run: curl -s https://api.example.com/data
  - type: exec
    run: jq '.items[] | .name'
  - type: exec
    run: wc -l

Each step receives the previous step's output on stdin. No explicit input mapping needed for linear pipelines. When a participant also has an explicit input mapping, the runtime merges the chained value with the explicit mapping.

Control flow

Loops -- repeat until a CEL condition is true or a maximum number of iterations is reached:

- loop:
    until: reviewer.output.approved == true
    max: 3
    steps:
      - coder
      - reviewer

Parallel -- run steps concurrently:

- parallel:
    - as: lint
      type: exec
      run: npm run lint
    - as: test
      type: exec
      run: npm test

Conditionals -- branch based on CEL expressions:

- if:
    condition: tests.status == "success"
    then:
      - deploy
    else:
      - rollback

Guards -- skip a single step conditionally:

- deploy:
    when: reviewer.output.approved == true

Wait -- pause for an event, a timeout, or a polling condition:

# Wait for an external event
- wait:
    event: "approval.received"
    match: event.requestId == submitForApproval.output.id
    timeout: 24h

# Sleep
- wait:
    timeout: 30s

# Poll until a condition is true
- wait:
    until: now >= timestamp("2024-04-01T09:00:00Z")
    poll: 1m
    timeout: 48h

Set -- write values into a shared execution context without producing output:

- set:
    token: workflow.inputs.api_token
    region: env.AWS_REGION

- as: fetchData
  type: http
  url: "'https://api.example.com/data'"
  headers:
    Authorization: "'Bearer ' + execution.context.token"

set is transparent to the I/O chain: the chain passes through unchanged.
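
To make that transparency concrete, here's a minimal sketch composed from the exec and set constructs shown above: a set step sits between two chained exec steps, and per the chaining semantics described earlier, the first step's stdout should still reach the final step unchanged.

```yaml
flow:
  - type: exec
    run: echo "payload from step one"

  # Writes into execution.context, but does not alter the I/O chain
  - set:
      startedBranch: workflow.inputs.branch

  # Still receives "payload from step one" on stdin
  - type: exec
    run: cat
```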

Exec input passing semantics

How input reaches an exec subprocess depends on its type:

  • Map input -> environment variables. When the resolved input is an object, each key-value pair is injected as an environment variable. The run command references them via shell interpolation (${KEY}).
  • String input -> stdin. When the resolved input is a string, it's passed via stdin, enabling Unix pipe-style chaining.
# Map input: keys become environment variables
- as: deploy
  type: exec
  run: ./deploy.sh --branch="${BRANCH}" --env="${TARGET_ENV}"
  input:
    BRANCH: workflow.inputs.branch
    TARGET_ENV: execution.context.environment

# String input: passed via stdin
flow:
  - type: exec
    run: echo '{"name": "World"}'
  - type: exec
    run: jq -r '.name'

Expressions with Google CEL

All conditions, input mappings, and output mappings use Google CEL. CEL is non-Turing-complete, sandboxed (no I/O, no side effects), type-checked at parse time, and has a familiar C/JS/Python-like syntax:

- if:
    condition: reviewer.output.approved == false && loop.iteration < 3

The runtime ships with the full CEL standard library: has, size, matches, contains, startsWith, endsWith, timestamp, duration, filter, map, exists, all, and more.
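
As an illustrative sketch of how these functions compose (the participant names reuse the agentic pipeline example above; the exact condition is mine, not from the spec):

```yaml
- deploy:
    when: >
      has(reviewer.output.feedback) &&
      size(planner.output) > 0 &&
      planner.output.all(t, has(t.id)) &&
      reviewer.output.feedback.matches("(?i)lgtm")
```

CEL macros like all and exists operate over lists with a bound variable, and matches uses RE2 syntax, so the (?i) case-insensitive flag works as shown.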

CEL was chosen over JavaScript eval (security surface, runtime dependency), custom mini-DSLs (implementation burden), and JSONPath/JMESPath (poor logic support).

Variable namespaces

Since v0.3, input and output are participant-scoped: inside a participant, input means "my input" and output means "my output". Workflow-level I/O lives under workflow.inputs.* and workflow.output.

Key runtime variables:

Namespace Description
workflow.inputs.* Workflow input parameters
workflow.output Workflow final result
<step>.output A step's output (auto-parsed if JSON)
<step>.status success, failure, or skipped
execution.context.* Shared read/write scratchpad (set via set)
env.* Environment variables (read-only)
loop.iteration Current loop iteration index
input Current participant's resolved input

Events

emit publishes events; wait subscribes. Events propagate both internally (within the workflow) and externally via the event hub:

- as: notifyProgress
  type: emit
  event: "task.progress"
  payload:
    taskId: workflow.inputs.taskId
    status: coder.output.status
  ack: true  # block until delivery confirmed
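
Pairing emit with wait gives basic cross-workflow coordination. A hedged sketch, combining the emit step above with the wait construct from the control-flow section (event names and fields are illustrative):

```yaml
# Producer workflow: announce completion
- as: announceDone
  type: emit
  event: "task.completed"
  payload:
    taskId: workflow.inputs.taskId

# Consumer workflow: block until the matching event arrives
- wait:
    event: "task.completed"
    match: event.taskId == workflow.inputs.taskId
    timeout: 1h
```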

Error handling

Configurable per participant, per flow step invocation, or globally via defaults, with four strategies:

# Global defaults
defaults:
  onError: retry
  retry:
    max: 2
    backoff: 5s

participants:
  coder:
    type: exec
    run: ./code.sh
    onError: retry       # retry with exponential backoff
    retry:
      max: 3
      backoff: 2s
      factor: 2          # exponential: 2s, 4s, 8s

  deploy:
    type: exec
    run: ./deploy.sh
    onError: notify      # redirect to a fallback participant

Error strategy resolution chain: flow override > participant > defaults > fail.
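
A sketch of that resolution chain in practice, assuming flow-level overrides use the same onError key as participants (per the "per flow step invocation" note above):

```yaml
defaults:
  onError: retry        # global default

participants:
  deploy:
    type: exec
    run: ./deploy.sh
    onError: skip       # participant-level setting

flow:
  - deploy:
      onError: fail     # flow-level override wins: a failure here stops the run
```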

Inputs and outputs

Everything is string by default, like stdin/stdout. Schema is opt-in via JSON Schema (written in YAML):

inputs:
  repoUrl:
    type: string
    format: uri
    required: true
  branch:
    type: string
    default: "main"

output:
  approved: reviewer.output.approved
  score: reviewer.output.score

Input mapping supports flow-level overrides that merge with the participant's base input (instead of replacing it), so you never have to repeat shared configuration on every call:

participants:
  fetch_page:
    type: exec
    input:
      NOTION_TOKEN: execution.context.token   # base input, always present
    run: curl -sS "https://api.notion.com/v1/pages/$(cat)" -H "Authorization: Bearer ${NOTION_TOKEN}"

flow:
  - fetch_page:
      input:
        PAGE_ID: workflow.inputs.story_id    # merged with base input

JSON Schema for editor support

A JSON Schema ships with the spec, giving you autocomplete and validation in VS Code for free:

{
  "yaml.schemas": {
    "./duckflux.schema.json": "*.duck.yaml"
  }
}

Workflow files use the .duck.yaml convention (e.g., deploy.duck.yaml, review-loop.duck.yaml).

The TypeScript runtime

The original plan was a Go runner, chosen for its native CEL implementation (cel-go) and single-binary distribution. After prototyping, I switched to TypeScript: Go's plugin model can't consume npm packages, which are the core extensibility primitive for duckflux plugins. The runtime targets Bun and ships as both a CLI tool and an embeddable library.

Packages

| Package | Description |
| --- | --- |
| duckflux | CLI tool (quack run, quack lint, quack validate) |
| @duckflux/core | Engine, parser, CEL evaluator, event hub (in-memory) |
| @duckflux/hub-nats | Optional NATS JetStream event hub backend |
| @duckflux/hub-redis | Optional Redis Streams event hub backend |

Installation

# Universal installer (auto-detects apt, brew, bun, npm; falls back to standalone binary)
curl -fsSL https://duckflux.github.io/apt-repo/install.sh | bash

# Or via Homebrew
brew install duckflux/tap/quack

# Or via npm/bun
npm install -g duckflux   # or: bun add -g duckflux

# Or run without installing
npx duckflux run workflow.yaml

Standalone binaries (no Node.js or Bun required) are also available for macOS, Linux, and Windows on the GitHub Releases page.

CLI usage

# Run a workflow
quack run deploy.duck.yaml --input branch=main --input env=staging

# Run from stdin
echo '{"branch": "main"}' | quack run deploy.duck.yaml

# Validate (schema + semantics)
quack lint deploy.duck.yaml

# Validate with inputs
quack validate deploy.duck.yaml --input branch=main

# Start the web server UI for visual workflow observation
quack server --trace-dir ./traces

# Version
quack version

Library usage

Drop @duckflux/core into any TypeScript project and run workflows in-process:

import { executeWorkflow } from "@duckflux/core/engine";
import { parseWorkflowFile } from "@duckflux/core/parser";

const workflow = await parseWorkflowFile("./pipeline.yaml");
const result = await executeWorkflow(workflow, { env: "production" });

console.log(result.output);  // structured output
console.log(result.steps);   // per-step results, timings, errors

No subprocess, no serialization overhead, full TypeScript types.

Event hub backends

Async workflows that emit and wait on events work out of the box with the built-in in-memory hub. Scale up to NATS or Redis when you need cross-process delivery:

| Backend | Package | Cross-process | Use case |
| --- | --- | --- | --- |
| In-memory | built-in | No | Development, testing, single-process |
| NATS JetStream | @duckflux/hub-nats | Yes | Distributed, multi-process |
| Redis Streams | @duckflux/hub-redis | Yes | Distributed with persistence |

quack run workflow.yaml --event-backend nats --nats-url nats://localhost:4222
quack run workflow.yaml --event-backend redis --redis-addr localhost:6379

Execution tracing

Every run can produce a structured trace, written incrementally as each step completes. Choose the format that fits your workflow:

# Trace to JSON (default)
quack run workflow.yaml --trace-dir ./traces

# Trace to SQLite (queryable with any SQL client)
quack run workflow.yaml --trace-dir ./traces --trace-format sqlite

Each trace captures every step (participants and control-flow constructs alike) with timing, inputs, outputs, errors, and retry counts.

Spec v0.7 feature coverage

The runtime implements the complete duckflux v0.7 spec:

  • Participant types: exec, http, emit, workflow (+ mcp stub)
  • Control flow: loop, parallel, if/else, when guards, set, wait
  • I/O chaining: step output flows automatically as input to the next step
  • Expressions: full CEL standard library (has, size, matches, timestamp, duration, and more)
  • Error strategies: fail, skip, retry (exponential backoff), redirect to fallback participant
  • Input semantics: map input -> env vars, string input -> stdin
  • Input merge: flow override merges with participant base input instead of replacing it
  • Timeouts: per-step, per-participant, or global via defaults
  • Output schema validation: validate step and workflow output against JSON Schema definitions
  • Circular sub-workflow detection: prevents infinite recursion in nested workflows
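
The workflow participant type (sub-workflow composition) is the one feature above without an example in this article, so here's a hedged sketch. The file key is an assumption on my part; check the spec for the exact field name:

```yaml
participants:
  ci:
    type: workflow
    file: ./ci.duck.yaml        # hypothetical key, not confirmed by the spec excerpts here
    input:
      branch: workflow.inputs.branch

flow:
  - ci
```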

What's next

Tooling and ecosystem

The documentation site at duckflux.openvibes.tech covers everything from getting started to the full library API. A browser-based visual editor for building workflows is planned.

On the roadmap

Features deliberately deferred from v0.7, to be prioritized based on real-world demand:

  • DAG mode -- explicit step dependencies (depends: [stepA, stepB]) for complex graphs
  • Durability / resume -- workflow survives a runtime crash and resumes from where it stopped
  • Matrix / fan-out -- combinatorial execution (e.g., tests across 3 Node versions x 2 OS)
  • Persistent mode -- workflow running as a daemon, reacting to events continuously
  • Caching between runs -- reuse outputs from idempotent steps across executions

The thesis, revisited

The journey from Protoagent to Lobster to duckflux converged on one insight: LLMs should do what they're good at (writing code, analyzing code, making decisions), and code should do what code is good at (sequencing, counting, routing, retrying).

duckflux is the code side of that equation. A deterministic orchestration layer where the flow is explicit, the execution is predictable, and the spec is readable by both humans and machines.

