
Gustavo Gondim

Posted on • Edited on • Originally published at ggondim.notion.site

duckflux: A Declarative Workflow DSL Born from the Multi-Agent Orchestration Gap

TL;DR: After months exploring multi-agent orchestration with OpenClaw and Lobster, I hit a wall: no existing tool offered simple declarative spec + runtime-agnostic execution + first-class control flow. So I designed duckflux, a minimal YAML-based workflow DSL with loops, conditionals, parallelism, and events built in. The spec is now at v0.7, the TypeScript runtime ships as a CLI (quack) and an embeddable library (@duckflux/core), with pluggable event hub backends (in-memory, NATS, Redis) and built-in execution tracing. Full docs at duckflux.openvibes.tech.



Previously, on this series

This article is the third in a series about building deterministic multi-agent development pipelines. If you're joining now, here's the short version.

In the first article, I documented two months of trial and error trying to build a code -> review -> test pipeline with autonomous AI agents. The core thesis: LLMs are unreliable routers; they forget steps, miscount iterations, and skip transitions. Orchestration must be deterministic and implemented in code, not delegated to inference. After five failed attempts (Ralph Orchestrator, OpenClaw sub-agents, a custom event bus, skill-driven self-orchestration, and plugin hooks), I found Lobster, OpenClaw's built-in workflow engine. It was close, but lacked native loop support. I contributed a pull request adding sub-workflow steps with loops.

In the second article, I zoomed out. The problem wasn't just orchestration; it was multi-agents x multi-projects x multi-providers x multi-channels. I compiled a dataset of agent configuration formats across providers, proposed the Monoswarm pattern (a monorepo layout for managing agent swarms), and identified the still-missing piece: an orchestration layer that ties agent events to workflow transitions across projects.

Both articles ended with the same conclusion: we need a proper workflow DSL.

The gap that remained

Lobster was the closest thing to what I needed, but it was designed for linear pipelines with approval gates. My pull request added loops, but the deeper issues remained:

  • No conditional branching (if/then/else).
  • No parallel execution of multiple agents.
  • No event system for inter-agent coordination.
  • No typed expressions; conditions were shell commands returning exit codes.
  • Tied to OpenClaw's runtime, not portable to other environments.

I looked at the broader landscape:

| Tool | Where it falls short |
| --- | --- |
| Argo Workflows | Turing-complete YAML disguised as config. A conditional loop requires template recursion, manual iteration counters, and string-interpolated type casting. |
| GitHub Actions | No conditional loops. Workarounds require unrolling or recursive reusable workflows. |
| Temporal / Inngest | Code-first (Go/TS/Python SDKs). The code IS the spec. No declarative layer. |
| Airflow / Prefect | DAGs are acyclic by definition; conditional loops are architecturally impossible. |
| n8n / Make | Visual-first, JSON-heavy specs. Loop constructs require JavaScript function nodes. Specs are unreadable as text. |
| Lobster | Linear pipelines with approval gates. No native loops, no parallelism, no conditionals. |

The gap was clear: no existing tool combines a simple declarative spec + runtime-agnostic execution + first-class control flow (loops, conditionals, parallelism) + events.

So I built one.

What is duckflux

duckflux is a minimal, deterministic, runtime-agnostic DSL for orchestrating workflows through declarative YAML. The spec is at v0.7, with a complete TypeScript runtime and a documentation site at duckflux.openvibes.tech.

The design principles are deliberate:

  1. Readable in 5 seconds -- any developer understands the flow by glancing at the YAML.
  2. Minimal by default -- features are only added when absolutely necessary.
  3. Convention over configuration -- sensible defaults everywhere.
  4. Runtime-agnostic -- the DSL defines WHAT happens and in WHAT ORDER. The runtime decides HOW.
  5. String by default -- every participant receives and returns strings unless a schema is explicitly defined, like stdin/stdout, the universal interface.
  6. Reuse proven standards -- expressions use Google CEL (used in Kubernetes, Firebase, Envoy), schemas use JSON Schema, format is YAML.

The simplest possible workflow:

flow:
  - type: exec
    run: echo "Hello, duckflux!"

That's it. One flow, one step. No boilerplate, no mandatory fields beyond what's needed.

A more realistic example: an agentic coding pipeline where a planner breaks work into tasks, then a loop fetches each task, a coder implements it, and a reviewer checks it:

id: agentic-coding-pipeline
name: Agentic Coding Pipeline
version: "0.7"

defaults:
  timeout: 10m
  cwd: ./repo

inputs:
  goal:
    type: string
    required: true
    description: "High-level description of what needs to be built"
  taskQueueUrl:
    type: string
    required: true
  maxRounds:
    type: integer
    default: 3
    minimum: 1
    maximum: 10

participants:
  planner:
    type: exec
    run: >
      claude -p
      "Break the following goal into discrete coding tasks.
      Return a JSON array of {id, description} objects.
      Goal: " + workflow.inputs.goal
    timeout: 5m
    output:
      type: array
      items:
        type: object
        required: true

  fetchTask:
    type: http
    url: workflow.inputs.taskQueueUrl + "/next"
    method: GET
    headers:
      Accept: application/json

  coder:
    type: exec
    run: >
      claude -p
      "Implement the following task in the current repository.
      Task: " + fetchTask.output.description
    timeout: 15m
    onError: retry
    retry:
      max: 2
      backoff: 10s

  reviewer:
    type: exec
    run: >
      claude -p
      "Review the changes for the following task. Return a JSON
      object with 'approved' (boolean) and 'feedback' (string).
      Task: " + fetchTask.output.description
    timeout: 10m
    output:
      approved:
        type: boolean
        required: true
      feedback:
        type: string

flow:
  - planner

  - loop:
      max: workflow.inputs.maxRounds
      steps:
        - fetchTask
        - coder:
            input:
              task: fetchTask.output.description
        - reviewer:
            input:
              task: fetchTask.output.description

output:
  approved: reviewer.output.approved
  feedback: reviewer.output.feedback
  rounds: loop.iteration

Compare this to the same scenario in Argo Workflows (~40 lines of template recursion), GitHub Actions (~50+ lines with unrolled iterations), or Temporal (~35 lines of Go code that requires compilation and a server).

Alternatives considered

Before landing on a custom YAML format, I evaluated two other approaches:

Extending Argo Workflows. Argo's YAML is expressive, but its power came from 6+ years of incremental feature additions. A conditional loop in Argo requires template recursion, manual iteration counters, and string-interpolated type casting: 13+ lines for what should be 6. The complexity is the feature, not a bug, and that's the problem.

Mermaid as executable spec. Mermaid sequence diagrams already have loop, par, and alt constructs. The DX for reading and writing is excellent, and diagrams render natively in GitHub. However, extending Mermaid for real workflow concerns (retry policies, timeouts, error handling, typed variables) requires hacking Note blocks for config and $var for expressions, creating a custom parser as proprietary as a new YAML format, just disguised as something familiar.

Custom minimal YAML (chosen). A new format, intentionally constrained, inspired by Mermaid's visual clarity but with the extensibility and tooling ecosystem of YAML. The tradeoff: a new DSL to learn, but one designed to be readable in 5 seconds and writable in 5 minutes.

The spec at a glance

The full spec is at github.com/duckflux/spec, with complete documentation at duckflux.openvibes.tech. Here's a walkthrough of the key features.

Participants

Participants are the atomic unit of work. Each has a type that determines its behavior:

| Type | Description |
| --- | --- |
| exec | Shell command |
| http | HTTP request |
| mcp | MCP server tool call |
| workflow | Sub-workflow (composition) |
| emit | Fire an event to the event hub |

Participants can be defined in three ways: in a reusable participants block, as named inline steps (with as), or as anonymous inline steps (without a name at all):

# Reusable (in participants block)
participants:
  build:
    type: exec
    run: npm run build

flow:
  # Reference a reusable participant
  - build

  # Named inline (one-off, but addressable by name)
  - as: notify
    type: http
    url: https://hooks.slack.com/services/...
    method: POST

  # Anonymous inline (output accessible only via the I/O chain)
  - type: exec
    run: echo "done"

Implicit I/O chain

One of the most impactful features added since v0.2: the output of each step is automatically passed as input to the next step, forming a chain analogous to Unix pipes.

flow:
  - type: exec
    run: curl -s https://api.example.com/data
  - type: exec
    run: jq '.items[] | .name'
  - type: exec
    run: wc -l

Each step receives the previous step's output on stdin. No explicit input mapping needed for linear pipelines. When a participant also has an explicit input mapping, the runtime merges the chained value with the explicit mapping.

Control flow

Loops -- repeat until a CEL condition is true or a maximum number of iterations is reached:

- loop:
    until: reviewer.output.approved == true
    max: 3
    steps:
      - coder
      - reviewer

Parallel -- run steps concurrently:

- parallel:
    - as: lint
      type: exec
      run: npm run lint
    - as: test
      type: exec
      run: npm test

Conditionals -- branch based on CEL expressions:

- if:
    condition: tests.status == "success"
    then:
      - deploy
    else:
      - rollback

Guards -- skip a single step conditionally:

- deploy:
    when: reviewer.output.approved == true

Wait -- pause for an event, a timeout, or a polling condition:

# Wait for an external event
- wait:
    event: "approval.received"
    match: event.requestId == submitForApproval.output.id
    timeout: 24h

# Sleep
- wait:
    timeout: 30s

# Poll until a condition is true
- wait:
    until: now >= timestamp("2024-04-01T09:00:00Z")
    poll: 1m
    timeout: 48h

Set -- write values into a shared execution context without producing output:

- set:
    token: workflow.inputs.api_token
    region: env.AWS_REGION

- as: fetchData
  type: http
  url: "'https://api.example.com/data'"
  headers:
    Authorization: "'Bearer ' + execution.context.token"

set is transparent to the I/O chain: the chain passes through unchanged.
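
To make that transparency concrete, here's a minimal sketch composed from the exec and set constructs shown above: a set step sits between two chained exec steps, and per the chaining semantics described earlier, the first step's stdout should still reach the final step unchanged.

```yaml
flow:
  - type: exec
    run: echo "payload from step one"

  # Writes into execution.context, but does not alter the I/O chain
  - set:
      startedBranch: workflow.inputs.branch

  # Still receives "payload from step one" on stdin
  - type: exec
    run: cat
```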

Exec input passing semantics

How input reaches an exec subprocess depends on its type:

  • Map input -> environment variables. When the resolved input is an object, each key-value pair is injected as an environment variable. The run command references them via shell interpolation (${KEY}).
  • String input -> stdin. When the resolved input is a string, it's passed via stdin, enabling Unix pipe-style chaining.
# Map input: keys become environment variables
- as: deploy
  type: exec
  run: ./deploy.sh --branch="${BRANCH}" --env="${TARGET_ENV}"
  input:
    BRANCH: workflow.inputs.branch
    TARGET_ENV: execution.context.environment

# String input: passed via stdin
flow:
  - type: exec
    run: echo '{"name": "World"}'
  - type: exec
    run: jq -r '.name'

Expressions with Google CEL

All conditions, input mappings, and output mappings use Google CEL. CEL is non-Turing-complete, sandboxed (no I/O, no side effects), type-checked at parse time, and has a familiar C/JS/Python-like syntax:

- if:
    condition: reviewer.output.approved == false && loop.iteration < 3

The runtime ships with the full CEL standard library: has, size, matches, contains, startsWith, endsWith, timestamp, duration, filter, map, exists, all, and more.
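
As an illustrative sketch of how these functions compose (the participant names reuse the agentic pipeline example above; the exact condition is mine, not from the spec):

```yaml
- deploy:
    when: >
      has(reviewer.output.feedback) &&
      size(planner.output) > 0 &&
      planner.output.all(t, has(t.id)) &&
      reviewer.output.feedback.matches("(?i)lgtm")
```

CEL macros like all and exists operate over lists with a bound variable, and matches uses RE2 syntax, so the (?i) case-insensitive flag works as shown.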

CEL was chosen over JavaScript eval (security surface, runtime dependency), custom mini-DSLs (implementation burden), and JSONPath/JMESPath (poor logic support).

Variable namespaces

Since v0.3, input and output are participant-scoped: inside a participant, input means "my input" and output means "my output". Workflow-level I/O lives under workflow.inputs.* and workflow.output.

Key runtime variables:

Namespace Description
workflow.inputs.* Workflow input parameters
workflow.output Workflow final result
<step>.output A step's output (auto-parsed if JSON)
<step>.status success, failure, or skipped
execution.context.* Shared read/write scratchpad (set via set)
env.* Environment variables (read-only)
loop.iteration Current loop iteration index
input Current participant's resolved input

Events

emit publishes events; wait subscribes. Events propagate both internally (within the workflow) and externally via the event hub:

- as: notifyProgress
  type: emit
  event: "task.progress"
  payload:
    taskId: workflow.inputs.taskId
    status: coder.output.status
  ack: true  # block until delivery confirmed
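
Pairing emit with wait gives basic cross-workflow coordination. A hedged sketch, combining the emit step above with the wait construct from the control-flow section (event names and fields are illustrative):

```yaml
# Producer workflow: announce completion
- as: announceDone
  type: emit
  event: "task.completed"
  payload:
    taskId: workflow.inputs.taskId

# Consumer workflow: block until the matching event arrives
- wait:
    event: "task.completed"
    match: event.taskId == workflow.inputs.taskId
    timeout: 1h
```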

Error handling

Configurable per participant, per flow step invocation, or globally via defaults, with four strategies:

# Global defaults
defaults:
  onError: retry
  retry:
    max: 2
    backoff: 5s

participants:
  coder:
    type: exec
    run: ./code.sh
    onError: retry       # retry with exponential backoff
    retry:
      max: 3
      backoff: 2s
      factor: 2          # exponential: 2s, 4s, 8s

  deploy:
    type: exec
    run: ./deploy.sh
    onError: notify      # redirect to a fallback participant

Error strategy resolution chain: flow override > participant > defaults > fail.
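
A sketch of that resolution chain in practice, assuming flow-level overrides use the same onError key as participants (per the "per flow step invocation" note above):

```yaml
defaults:
  onError: retry        # global default

participants:
  deploy:
    type: exec
    run: ./deploy.sh
    onError: skip       # participant-level setting

flow:
  - deploy:
      onError: fail     # flow-level override wins: a failure here stops the run
```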

Inputs and outputs

Everything is string by default, like stdin/stdout. Schema is opt-in via JSON Schema (written in YAML):

inputs:
  repoUrl:
    type: string
    format: uri
    required: true
  branch:
    type: string
    default: "main"

output:
  approved: reviewer.output.approved
  score: reviewer.output.score

Input mapping supports flow-level overrides that merge with the participant's base input (instead of replacing it), so you never have to repeat shared configuration on every call:

participants:
  fetch_page:
    type: exec
    input:
      NOTION_TOKEN: execution.context.token   # base input, always present
    run: curl -sS "https://api.notion.com/v1/pages/$(cat)" -H "Authorization: Bearer ${NOTION_TOKEN}"

flow:
  - fetch_page:
      input:
        PAGE_ID: workflow.inputs.story_id    # merged with base input

JSON Schema for editor support

A JSON Schema ships with the spec, giving you autocomplete and validation in VS Code for free:

{
  "yaml.schemas": {
    "./duckflux.schema.json": "*.duck.yaml"
  }
}

Workflow files use the .duck.yaml convention (e.g., deploy.duck.yaml, review-loop.duck.yaml).

The TypeScript runtime

The original plan was a Go runner, chosen for its native CEL implementation (cel-go) and single-binary distribution. After prototyping, I switched to TypeScript: Go's plugin model can't consume npm packages, which are the core extensibility primitive for duckflux plugins. The runtime targets Bun and ships as both a CLI tool and an embeddable library.

Packages

| Package | Description |
| --- | --- |
| duckflux | CLI tool (quack run, quack lint, quack validate) |
| @duckflux/core | Engine, parser, CEL evaluator, event hub (in-memory) |
| @duckflux/hub-nats | Optional NATS JetStream event hub backend |
| @duckflux/hub-redis | Optional Redis Streams event hub backend |

Installation

# Universal installer (auto-detects apt, brew, bun, npm; falls back to standalone binary)
curl -fsSL https://duckflux.github.io/apt-repo/install.sh | bash

# Or via Homebrew
brew install duckflux/tap/quack

# Or via npm/bun
npm install -g duckflux   # or: bun add -g duckflux

# Or run without installing
npx duckflux run workflow.yaml

Standalone binaries (no Node.js or Bun required) are also available for macOS, Linux, and Windows on the GitHub Releases page.

CLI usage

# Run a workflow
quack run deploy.duck.yaml --input branch=main --input env=staging

# Run from stdin
echo '{"branch": "main"}' | quack run deploy.duck.yaml

# Validate (schema + semantics)
quack lint deploy.duck.yaml

# Validate with inputs
quack validate deploy.duck.yaml --input branch=main

# Start the web server UI for visual workflow observation
quack server --trace-dir ./traces

# Version
quack version

Library usage

Drop @duckflux/core into any TypeScript project and run workflows in-process:

import { executeWorkflow } from "@duckflux/core/engine";
import { parseWorkflowFile } from "@duckflux/core/parser";

const workflow = await parseWorkflowFile("./pipeline.yaml");
const result = await executeWorkflow(workflow, { env: "production" });

console.log(result.output);  // structured output
console.log(result.steps);   // per-step results, timings, errors

No subprocess, no serialization overhead, full TypeScript types.

Event hub backends

Async workflows that emit and wait on events work out of the box with the built-in in-memory hub. Scale up to NATS or Redis when you need cross-process delivery:

| Backend | Package | Cross-process | Use case |
| --- | --- | --- | --- |
| In-memory | built-in | No | Development, testing, single-process |
| NATS JetStream | @duckflux/hub-nats | Yes | Distributed, multi-process |
| Redis Streams | @duckflux/hub-redis | Yes | Distributed with persistence |

quack run workflow.yaml --event-backend nats --nats-url nats://localhost:4222
quack run workflow.yaml --event-backend redis --redis-addr localhost:6379

Execution tracing

Every run can produce a structured trace, written incrementally as each step completes. Choose the format that fits your workflow:

# Trace to JSON (default)
quack run workflow.yaml --trace-dir ./traces

# Trace to SQLite (queryable with any SQL client)
quack run workflow.yaml --trace-dir ./traces --trace-format sqlite

Each trace captures every step (participants and control-flow constructs alike) with timing, inputs, outputs, errors, and retry counts.

Spec v0.7 feature coverage

The runtime implements the complete duckflux v0.7 spec:

  • Participant types: exec, http, emit, workflow (+ mcp stub)
  • Control flow: loop, parallel, if/else, when guards, set, wait
  • I/O chaining: step output flows automatically as input to the next step
  • Expressions: full CEL standard library (has, size, matches, timestamp, duration, and more)
  • Error strategies: fail, skip, retry (exponential backoff), redirect to fallback participant
  • Input semantics: map input -> env vars, string input -> stdin
  • Input merge: flow override merges with participant base input instead of replacing it
  • Timeouts: per-step, per-participant, or global via defaults
  • Output schema validation: validate step and workflow output against JSON Schema definitions
  • Circular sub-workflow detection: prevents infinite recursion in nested workflows
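
The workflow participant type (sub-workflow composition) is the one feature above without an example in this article, so here's a hedged sketch. The file key is an assumption on my part; check the spec for the exact field name:

```yaml
participants:
  ci:
    type: workflow
    file: ./ci.duck.yaml        # hypothetical key, not confirmed by the spec excerpts here
    input:
      branch: workflow.inputs.branch

flow:
  - ci
```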

What's next

Tooling and ecosystem

The documentation site at duckflux.openvibes.tech covers everything from getting started to the full library API. A browser-based visual editor for building workflows is planned.

On the roadmap

Features deliberately deferred from v0.7, to be prioritized based on real-world demand:

  • DAG mode -- explicit step dependencies (depends: [stepA, stepB]) for complex graphs
  • Durability / resume -- workflow survives a runtime crash and resumes from where it stopped
  • Matrix / fan-out -- combinatorial execution (e.g., tests across 3 Node versions x 2 OS)
  • Persistent mode -- workflow running as a daemon, reacting to events continuously
  • Caching between runs -- reuse outputs from idempotent steps across executions

The thesis, revisited

The journey from Protoagent to Lobster to duckflux converged on one insight: LLMs should do what they're good at (writing code, analyzing code, making decisions), and code should do what code is good at (sequencing, counting, routing, retrying).

duckflux is the code side of that equation. A deterministic orchestration layer where the flow is explicit, the execution is predictable, and the spec is readable by both humans and machines.

