Orchestrating a Team of AI Agents from a Single CLI

Alex — Fri, 13 Mar 2026 19:38:09 +0000

Published: 2026-03-13 | Target: Dev.to / Hashnode | Audience: Senior devs, DevOps engineers, ML engineers

The Problem: You've Become a Human Message Bus

Picture this: you're running a feature sprint with three AI agents. Claude Code is implementing an OAuth2 flow in one terminal. OpenAI Codex is writing integration tests in another. A shell script is running database migrations somewhere else.

You're copy-pasting context between terminals, manually checking whether each agent is done, restarting the one that crashed, and deciding which task should go next.

You haven't automated your workflow. You've automated individual steps and manually stitched them together. The coordination overhead — tracking state, routing context, handling failures — still lands on you.

This is the core problem when running 3+ AI agents: the agents themselves are capable, but there's nothing managing them as a team. You fill that gap.

Before ORCH:
  You → Claude ("implement auth")
  You → Codex ("write tests for auth")
  You → Claude ("auth failed, retry with this context...")
  You → Codex ("actually, auth changed, update tests")
  You → You ("why am I doing this manually")

The Solution: A CLI Orchestrator with a Real State Machine

ORCH is a CLI tool that manages your AI agents as a team. You define agents once, add tasks, and let the orchestrator handle dispatch, retries, and coordination.

$ orch run --all

  orch · watching · 3 running · 0 queued

  14:32  ▶ Backend A    → "Implement OAuth2 flow"
  14:32  ▶ Backend B    → "Write API integration tests"
  14:32  ▶ QA           → "Verify auth edge cases"
  14:35  ✓ Backend B    DONE  (3m 12s · 4,200 tokens)
  14:38  ✓ Backend A    DONE  (6m 44s · 8,100 tokens)
  14:39  ↻ QA           RETRY  attempt 2 · found regression
  14:41  ✓ QA           DONE  (2m 15s · 2,800 tokens)

No tab-switching. No manual routing. The agents work; you watch metrics.

Architecture: Three Layers That Scale

The architecture follows domain-driven design, with a clean separation between what the system knows (domain), what it does (application), and how it talks to the outside world (infrastructure: agent adapters, storage, and process management).

┌─────────────────────────────────────────────────────┐
│                 CLI / TUI Interface                 │
│  Commands (Commander.js)    Dashboard (Ink/React)   │
└─────────────────────────┬───────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────┐
│                  Application Layer                  │
│  Orchestrator  ←→  EventBus  ←→  Services           │
│  (tick/dispatch/reconcile)      (Task/Agent/Goal)   │
└──────────┬────────────────────────────┬─────────────┘
           │                            │
┌──────────▼──────────┐   ┌─────────────▼─────────────┐
│  Domain Layer       │   │  Infrastructure Layer     │
│  State Machine      │   │  Adapters: claude/codex/  │
│  todo→in_progress   │   │  cursor/shell             │
│  →review→done       │   │  Storage: YAML/JSON/JSONL │
│  Events (31 types)  │   │  Process Manager          │
└─────────────────────┘   └───────────────────────────┘

The State Machine

Every task follows a validated state machine. Invalid transitions are rejected at the domain layer:

todo → in_progress → review → done
                   ↘ retrying → in_progress
                   ↘ failed

This isn't a glorified TODO list. When an agent finishes, the task moves to review instead of jumping straight to done. When a run fails, ORCH retries it with exponential backoff. When an agent stalls (exceeds its timeout), ORCH kills it and re-queues the task. These transitions are enforced in src/domain/transitions.ts:

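// src/domain/transitions.ts
// TaskStatus union listed here so the snippet compiles on its own
type TaskStatus = 'todo' | 'in_progress' | 'review' | 'retrying' | 'done' | 'failed' | 'cancelled';

// Allowed next states per status: done and cancelled are terminal,
// and a failed task can only be reset to todo.
const VALID_TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  todo:        ['in_progress', 'cancelled'],
  in_progress: ['review', 'retrying', 'failed', 'cancelled'],
  review:      ['done', 'in_progress', 'failed'],
  retrying:    ['in_progress', 'failed', 'cancelled'],
  done:        [],
  failed:      ['todo'],
  cancelled:   [],
};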
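A transition table only helps if every status change goes through it. Below is a minimal sketch of such a guard, together with the exponential backoff delay used for retries. The names InvalidTransitionError, assertTransition, and retryDelayMs, plus the 1 s base and 60 s cap, are hypothetical choices for illustration; ORCH's actual code and defaults may differ.

```typescript
// Hypothetical guard and backoff helper; the table mirrors transitions.ts.
type TaskStatus =
  | 'todo' | 'in_progress' | 'review' | 'retrying'
  | 'done' | 'failed' | 'cancelled';

const VALID_TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  todo:        ['in_progress', 'cancelled'],
  in_progress: ['review', 'retrying', 'failed', 'cancelled'],
  review:      ['done', 'in_progress', 'failed'],
  retrying:    ['in_progress', 'failed', 'cancelled'],
  done:        [],
  failed:      ['todo'],
  cancelled:   [],
};

class InvalidTransitionError extends Error {
  readonly from: TaskStatus;
  readonly to: TaskStatus;
  constructor(from: TaskStatus, to: TaskStatus) {
    super(`Invalid task transition: ${from} -> ${to}`);
    this.name = 'InvalidTransitionError';
    this.from = from;
    this.to = to;
  }
}

// Reject any status change the table does not list.
function assertTransition(from: TaskStatus, to: TaskStatus): void {
  if (!VALID_TRANSITIONS[from].includes(to)) {
    throw new InvalidTransitionError(from, to);
  }
}

// Exponential backoff: baseMs * 2^(attempt - 1), capped at maxMs.
// attempt is 1-based, so the first retry waits baseMs.
function retryDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}
```

Throwing on an invalid transition, rather than logging and carrying on, keeps an orchestrator bug from quietly producing impossible states such as a done task being dispatched again.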

The Event Bus

Everything in ORCH communicates through a typed event bus. Agents emit events, the orchestrator reacts, the TUI updates — all decoupled. There are 31 event types covering the full lifecycle:

// Sampling of event types from src/domain/events.ts (16 of the 31).
// The alias name is illustrative; events.ts defines the full union.
type SampleEventType =
  | 'task:created' | 'task:status_changed' | 'task:scope_overlap'
  | 'agent:output' | 'agent:completed' | 'agent:error'
  | 'run:started' | 'run:finished'
  | 'goal:created' | 'goal:status_changed'
  | 'team:created' | 'team:task_claimed'
  | 'orchestrator:error' | 'orchestrator:shutdown'
  | 'message:sent' | 'message:delivered';

The event bus supports wildcard subscriptions (onAny()), which is how the TUI activity feed works — it receives every event without knowing the event types in advance.
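To make the wildcard idea concrete, here is a minimal typed event bus with per-type subscriptions, an onAny() wildcard, and a listener cap. Only three event types are modeled, their payload shapes are invented for illustration, and the class and type names are hypothetical rather than ORCH's actual API.

```typescript
// Hypothetical payloads for three event types; ORCH's real definitions may differ.
type EventMap = {
  'task:created': { taskId: string };
  'task:status_changed': { taskId: string; from: string; to: string };
  'agent:output': { agentId: string; chunk: string };
};

type EventType = keyof EventMap;
type Listener = (payload: unknown) => void;
type AnyListener = (type: EventType, payload: EventMap[EventType]) => void;

class TypedEventBus {
  // Stored loosely typed; on() and emit() keep the public API strongly typed.
  private readonly listeners = new Map<EventType, Set<Listener>>();
  private readonly anyListeners = new Set<AnyListener>();
  private readonly maxListeners: number;

  constructor(maxListeners = 50) {
    this.maxListeners = maxListeners;
  }

  // Subscribe to one event type; returns an unsubscribe function.
  on<K extends EventType>(type: K, listener: (payload: EventMap[K]) => void): () => void {
    const set = this.listeners.get(type) ?? new Set<Listener>();
    if (set.size >= this.maxListeners) {
      // Max-listener guard: a runaway subscription loop fails loudly.
      throw new Error(`Listener limit (${this.maxListeners}) reached for ${type}`);
    }
    const stored = listener as Listener;
    set.add(stored);
    this.listeners.set(type, set);
    return () => {
      set.delete(stored);
    };
  }

  // Wildcard subscription: receives every event, whatever its type.
  onAny(listener: AnyListener): () => void {
    this.anyListeners.add(listener);
    return () => {
      this.anyListeners.delete(listener);
    };
  }

  emit<K extends EventType>(type: K, payload: EventMap[K]): void {
    this.listeners.get(type)?.forEach((listener) => listener(payload));
    this.anyListeners.forEach((listener) => listener(type, payload));
  }
}
```

Keying payload types off the event name means a typo such as 'task:create' fails at compile time, while onAny() trades that precision for the ability to observe everything, which is exactly what an activity feed needs.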

Adapters: The Plugin Layer

Each AI tool is wrapped in an adapter that implements a common interface. You get Claude, Codex, Cursor, and Shell out of the box:

src/infrastructure/adapters/
├── claude.ts    # Claude Code CLI (claude --print)
├── codex.ts     # OpenAI Codex CLI (codex exec --json)
├── cursor.ts    # Cursor Agent (headless mode)
├── shell.ts     # Any command: npm test, python bot.py, etc.
└── utils.ts     # Shared: extractTokens, createStreamingEvents

Adapters are async generators that stream events — token counts, file changes, output chunks — as they happen. The orchestrator consumes these streams and writes to JSONL event logs for replay and analysis.
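The adapter contract can be sketched as an interface whose run() method is an async generator. Everything here (AgentEvent, AgentAdapter, echoAdapter, consume) is a hypothetical illustration, and the JSONL lines are kept in memory rather than appended to a run log so the sketch stays self-contained.

```typescript
// Hypothetical event union; real adapters stream richer events.
type AgentEvent =
  | { type: 'output'; chunk: string }
  | { type: 'tokens'; input: number; output: number }
  | { type: 'file_changed'; path: string }
  | { type: 'done'; exitCode: number };

// Common interface each adapter (claude, codex, cursor, shell) would implement.
interface AgentAdapter {
  readonly name: string;
  run(prompt: string): AsyncGenerator<AgentEvent>;
}

// Toy adapter that yields canned events instead of spawning a CLI.
const echoAdapter: AgentAdapter = {
  name: 'echo',
  async *run(prompt: string): AsyncGenerator<AgentEvent> {
    yield { type: 'output', chunk: `received: ${prompt}` };
    yield { type: 'file_changed', path: 'src/auth/index.ts' };
    yield { type: 'tokens', input: 12, output: 3 };
    yield { type: 'done', exitCode: 0 };
  },
};

// Consumes the stream as it arrives: each event becomes one JSON line
// (the shape of a JSONL run log) and token usage is tallied on the fly.
async function consume(
  adapter: AgentAdapter,
  prompt: string,
): Promise<{ jsonl: string[]; tokens: number }> {
  const jsonl: string[] = [];
  let tokens = 0;
  for await (const event of adapter.run(prompt)) {
    jsonl.push(JSON.stringify({ ts: Date.now(), ...event }));
    if (event.type === 'tokens') tokens += event.input + event.output;
  }
  return { jsonl, tokens };
}
```

Because the consumer pulls events with for await, the generator does not advance past a yield until the previous event has been handled, so a slow consumer never gets flooded.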

File-Based Storage: No Database Required

State lives entirely in .orchestry/:

.orchestry/
├── config.yml          # Project config (agents, timeouts, max_concurrent)
├── state.json          # Live orchestrator state (running, claimed, retry_queue)
├── tasks/              # One YAML file per task
├── agents/             # One YAML file per agent
├── runs/               # JSON metadata + JSONL event streams per run
├── goals/              # Goal YAML files
├── teams/              # Team YAML files
└── messages/           # Message YAML files

Atomic writes (temp file → rename) prevent corruption. File locking with O_EXCL prevents concurrent orchestrator instances. JSONL streams support tail-reads for large log files without loading everything into memory.
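Both techniques fit in a few lines of Node.js. The helper names below (writeFileAtomic, acquireLock) are hypothetical; the sketch relies on rename() replacing the target in a single step on POSIX filesystems, and on the 'wx' flag, which opens the file with O_CREAT | O_EXCL.

```typescript
import { rename, unlink, writeFile } from 'node:fs/promises';

// Write to a temp file next to the target, then rename it into place.
// On POSIX, rename() swaps the file in one step when both paths are on the
// same filesystem, so readers see the old file or the new one, never a
// partial write. (A crash-safe variant would also fsync the temp file.)
async function writeFileAtomic(target: string, data: string): Promise<void> {
  const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
  await writeFile(tmp, data, 'utf8');
  await rename(tmp, target);
}

// 'wx' opens with O_CREAT | O_EXCL, so creation fails with EEXIST when the
// lock file already exists: only one orchestrator instance can hold it.
// Returns a release function that deletes the lock.
async function acquireLock(lockPath: string): Promise<() => Promise<void>> {
  try {
    await writeFile(lockPath, String(process.pid), { flag: 'wx' });
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EEXIST') {
      throw new Error(`Lock ${lockPath} is held by another orchestrator`);
    }
    throw err;
  }
  return () => unlink(lockPath);
}
```

A stale lock left behind by a crashed process needs separate handling (for example, checking whether the PID stored in the file is still alive); this sketch doesn't attempt that.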

Demo Walkthrough: Eight Steps to a Working Agent Team

Here's the complete setup for a TypeScript project with a Backend agent, a QA agent, and task coordination:

1. Initialize the project

cd ~/my-project
orch init

Creates .orchestry/config.yml with sensible defaults. An Agent Creator agent is included automatically.

2. Add your agents

orch agent add "Backend" \
  --adapter claude \
  --model claude-sonnet-4-6 \
  --role "Senior TypeScript developer. Implements features following DDD patterns. Uses Promise.all for I/O, atomic writes for file operations. Runs tsc --noEmit and vitest run before finishing."

orch agent add "QA" \
  --adapter claude \
  --model claude-sonnet-4-6 \
  --role "QA engineer. Writes Vitest tests. Checks TypeScript types. Verifies edge cases and error handling. Never fixes bugs directly — files tasks for developers."

3. Create tasks

orch task add "Implement OAuth2 flow with refresh token rotation" \
  --priority 1 \
  --scope "src/auth/**"

orch task add "Write integration tests for OAuth2 endpoints" \
  --priority 2 \
  --scope "test/integration/**" \
  --depends-on <oauth-task-id>

The --scope flag keeps ORCH from dispatching tasks with overlapping file scopes at the same time, so two agents aren't assigned the same files at once. --depends-on holds the integration-test task until the OAuth task is done.
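The ordering rule implied by --priority and --depends-on can be sketched as a pure function over the task list. The Task shape and the dispatchableTasks name are hypothetical, and this sketch deliberately leaves scope checks and agent assignment out.

```typescript
// Hypothetical task shape; ORCH's real schema may differ.
interface Task {
  id: string;
  status: 'todo' | 'in_progress' | 'review' | 'retrying' | 'done' | 'failed' | 'cancelled';
  priority: number; // 1 = most urgent
  dependsOn: string[]; // ids that must reach 'done' first
}

// Returns todo tasks whose dependencies are all done, most urgent first.
function dispatchableTasks(tasks: Task[]): Task[] {
  const statusById = new Map(tasks.map((t) => [t.id, t.status] as const));
  return tasks
    .filter((t) => t.status === 'todo')
    .filter((t) => t.dependsOn.every((dep) => statusById.get(dep) === 'done'))
    .sort((a, b) => a.priority - b.priority);
}
```

A dependency that points at an unknown task id never reaches done here, so the dependent task stays blocked instead of dispatching early, which is the safer failure mode.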

4. Set a strategic goal

orch goal add "Ship OAuth2 authentication" \
  --description "Complete implementation, tests, and documentation for the OAuth2 flow by end of sprint"

orch goal status <goal-id> active

In autonomous mode, ORCH generates tasks for idle agents based on active goals. The agents stay productive without manual task creation.

5. Run everything

orch run --watch   # daemon mode — dispatches continuously
# or
orch             # opens the TUI dashboard

6. Monitor from the dashboard

orch            # TUI with Tasks / Agents / Goals tabs

The TUI shows live task statuses, agent activity with token counts, and the event stream. You can create tasks, assign agents, and approve reviews — all with keyboard shortcuts, without leaving the terminal.

7. Inter-agent messaging

# Send findings directly to one agent (here, the QA agent)
orch msg send <qa-agent-id> "Auth tests failing on token expiry edge case — see test/auth.spec.ts:142" \
  --subject "OAuth regression"

# Or broadcast to everyone
orch msg broadcast "Sprint review in 30 min — wrap up current tasks" \
  --subject "Coordination"

8. Shared context store

# Agent writes results others can read
orch context set oauth-endpoints "POST /auth/token, POST /auth/refresh, DELETE /auth/logout"

# Another agent reads it
orch context get oauth-endpoints

The context store supports TTL for ephemeral data. Agents use it to share findings mid-run without direct coordination.
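A TTL store is small enough to sketch in full. This version is in-memory and evicts expired keys lazily on read; the ContextStore class, its method signatures, and the injectable clock (used so tests don't have to sleep) are hypothetical, not ORCH's implementation.

```typescript
interface Entry {
  value: string;
  expiresAt: number | null; // null = never expires
}

// Hypothetical in-memory key/value store with optional per-key TTL.
class ContextStore {
  private readonly entries = new Map<string, Entry>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: string, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? null : this.now() + ttlMs;
    this.entries.set(key, { value, expiresAt });
  }

  // Expired entries are treated as missing and deleted on access.
  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== null && this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Lazy eviction keeps the store free of background timers; the cost is that expired keys occupy memory until someone reads them.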

Key Design Decisions

Decision 1: File-based state over a database

The .orchestry/ directory is the entire system state. This means:

Zero infrastructure to set up or maintain
State is readable, debuggable, and git-committable
Atomic writes prevent corruption without transactions
JSONL event logs are append-only — concurrent agents don't conflict

The trade-offs: a large task list means many small file reads, so they run in parallel with Promise.all (implemented), and large JSONL logs must be read from the end rather than loaded whole (also implemented, via readJsonlTail()).
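Here is one way a tail read like readJsonlTail() can work: seek near the end, read a bounded chunk, and parse only the last few complete lines. The signature, the 64 KiB default window, and the rule of discarding the first (possibly truncated) line are assumptions of this sketch, not ORCH's actual implementation.

```typescript
import { open } from 'node:fs/promises';

// Reads at most the last maxBytes of a JSONL file and parses up to `limit`
// of its final complete lines, without loading the whole file.
async function readJsonlTail<T>(path: string, limit: number, maxBytes = 64 * 1024): Promise<T[]> {
  const handle = await open(path, 'r');
  try {
    const { size } = await handle.stat();
    const start = Math.max(0, size - maxBytes);
    const buffer = Buffer.alloc(size - start);
    const { bytesRead } = await handle.read(buffer, 0, buffer.length, start);
    let lines = buffer
      .subarray(0, bytesRead)
      .toString('utf8')
      .split('\n')
      .filter((line) => line.trim() !== '');
    // When reading starts mid-file, the first line may be cut off; drop it.
    if (start > 0) lines = lines.slice(1);
    return lines.slice(Math.max(0, lines.length - limit)).map((line) => JSON.parse(line) as T);
  } finally {
    await handle.close();
  }
}
```

If the last `limit` lines don't fit inside maxBytes, this returns fewer lines than requested; a fuller version would widen the window and read again.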

Decision 2: Reactive dispatch over polling

Early versions polled every 30 seconds, so a new task could wait up to 30 seconds before dispatching, which is unacceptable for interactive use. The fix is reactive: creating a task emits task:created on the event bus, and a debounced handler (500ms debounce) triggers a mini-dispatch cycle that skips reconciliation. Agents start working in under a second.
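The debounce itself is only a few lines. In this sketch, every task:created event re-arms a 500 ms timer and the dispatch pass runs once the burst goes quiet; debounce, dispatchOnce, and onTaskCreated are hypothetical names, and dispatchOnce merely stands in for the real mini-dispatch.

```typescript
// Trailing-edge debounce: fn runs once, waitMs after the last call.
function debounce(fn: () => void, waitMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, waitMs);
  };
}

let dispatchPasses = 0;

// Stand-in for a mini-dispatch cycle: match queued tasks to idle agents,
// skipping the heavier reconcile step.
function dispatchOnce(): void {
  dispatchPasses += 1;
}

// Hypothetical handler to wire to the event bus's task:created event.
const onTaskCreated = debounce(dispatchOnce, 500);
```

Three tasks created in quick succession (say, from a script) produce a single dispatch pass instead of three, while a lone task still starts within about half a second.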

Decision 3: Scope overlap as a safety mechanism

Two agents editing the same files simultaneously produces merge conflicts and corrupted state. The scope system detects overlap at dispatch time and blocks conflicting tasks. Instead of failing silently, it emits a task:scope_overlap event visible in the activity feed.
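Exact overlap detection between two glob patterns is surprisingly hard, so a dispatch guard can err on the safe side. This sketch compares only the literal directory prefix of each glob (the segments before the first wildcard) and treats containment as overlap. It assumes normalized, relative, forward-slash paths, and the function names are hypothetical; ORCH's matcher may be more precise.

```typescript
// Path segments before the first wildcard; a plain path keeps all segments.
function literalPrefix(glob: string): string[] {
  const segments = glob.split('/').filter((s) => s !== '');
  const firstWild = segments.findIndex((s) => /[*?[\]{}]/.test(s));
  return firstWild === -1 ? segments : segments.slice(0, firstWild);
}

function isPrefix(a: string[], b: string[]): boolean {
  return a.length <= b.length && a.every((seg, i) => seg === b[i]);
}

// Over-approximates: 'src/*.ts' vs 'src/x.md' counts as overlap,
// which is the safe direction for a dispatch guard.
function scopesOverlap(a: string, b: string): boolean {
  const pa = literalPrefix(a);
  const pb = literalPrefix(b);
  return isPrefix(pa, pb) || isPrefix(pb, pa);
}

// Returns the first conflicting [candidate, running] pair so it can be
// reported (e.g. as a task:scope_overlap event), or null if dispatch is safe.
function findScopeConflict(candidate: string[], running: string[]): [string, string] | null {
  for (const c of candidate) {
    for (const r of running) {
      if (scopesOverlap(c, r)) return [c, r];
    }
  }
  return null;
}
```

Returning the conflicting pair instead of a bare boolean is what makes a visible, explainable event possible rather than a silent skip.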

Decision 4: 31-event typed bus, not logging

Every meaningful state change is an event, not a log line. This lets the TUI react in real time, makes the context store event-sourced, and lets test suites assert on specific events instead of parsing strings. The event bus also has max-listener protection and supports wildcard subscriptions.

Decision 5: Adapters as async generators

Agent execution yields events as they happen — output chunks, token counts, file modifications. This enables:

Real-time TUI streaming (batched at 80ms for performance)
Per-run JSONL event logs for audit and replay
Token counting and cost tracking per task
Stall detection without polling (last-event timestamp)
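The article describes stall detection based on a last-event timestamp; the sketch below takes a closely related route, a timer that every event re-arms, so no polling loop is needed. withStallDetection and its parameters are hypothetical; in ORCH's case the onStall callback would kill the agent and re-queue the task.

```typescript
// Wraps an event stream; if no event arrives within stallMs, onStall fires.
// The timer is paused while the consumer handles each event, so a slow
// consumer is not mistaken for a stalled agent.
async function* withStallDetection<T>(
  source: AsyncIterable<T>,
  stallMs: number,
  onStall: () => void,
): AsyncGenerator<T> {
  let timer = setTimeout(onStall, stallMs);
  try {
    for await (const event of source) {
      clearTimeout(timer);
      yield event;
      timer = setTimeout(onStall, stallMs);
    }
  } finally {
    // Runs on normal completion, errors, and early break by the consumer.
    clearTimeout(timer);
  }
}
```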

What's Next

Workspace isolation: Currently, worktree mode creates a git worktree per run for file isolation, while isolated mode clones the entire repo. We're working on merge-back strategies for parallel agents editing the same repository.

Review pipeline: The auto-review feature (currently in review-runner.ts) runs a reviewer agent on task output before transitioning to done. We're expanding this with configurable review criteria and structured feedback.

Goal decomposition: When a goal is assigned to an agent in autonomous mode, the agent generates subtasks. We're working on goal templates that provide decomposition hints for common workflows (feature development, refactoring, documentation).

npm registry: Currently published to GitHub Packages. Moving to the public npm registry for easier installation.

Try It

npm install -g @oxgeneral/orch --registry=https://npm.pkg.github.com

cd ~/your-project
orch init
orch agent add "Backend" --adapter claude --role "TypeScript developer"
orch task add "Your first task" --priority 1
orch run --all

The full TUI launches with orch. 987 tests, strict TypeScript, MIT license.

GitHub: https://github.com/oxgeneral/ORCH

If you're running multiple AI agents and spending more time coordinating them than doing actual engineering work — that's the exact problem ORCH was built to solve. Questions about the architecture or autonomous mode? Drop them in the comments.

Tags: ai-agents, cli, devops, typescript, automation, developer-tools

Found this useful?

If this post helped you think about AI agent orchestration differently, here's how to support the project:

Star the repo on GitHub: github.com/oxgeneral/ORCH — it helps others discover the project
Install and try it: npm install -g @oxgeneral/orch --registry=https://npm.pkg.github.com
Join the discussion: GitHub Discussions — questions, ideas, and feedback welcome

Pull requests are welcome.

DEV Community: Alex