Most multi-agent setups stop at "LLM A hands off to LLM B." That covers a lot of ground, but some tasks are execution-heavy: hit a URL, parse the HTML, notice the page is bot-blocked, fall back to a curated source, run a fee calculation, and return something structured enough that the next agent can trust it. Wrapping all of that inside a plain chat completion is the wrong abstraction.
KaibanJS adds ExternalCodingAgent for exactly this case. One task in a Team is executed by a local developer CLI (today Claude Code or OpenCode, plus a mock backend for CI). Every other part of the team lifecycle stays the same: interpolated task descriptions, context passing, completion handlers, HITL gates, and errors surfaced when the subprocess fails.
This post covers the API surface and then walks through a concrete implementation: a three-agent team that converts flight cancellations into future travel credits. Eligibility comes first, customer review behind a mandatory approval gate second, and conditional resolution last. The scenario is the vehicle for the code; if you also want the product framing, Kaiban has an airline use-case page for the same workflow.
Why a separate agent type instead of tools?
KaibanJS agents already support a tools array, but ExternalCodingAgent does not use it for CLI execution. The split is intentional:
- KaibanJS owns the workflow: task ordering, board state, validation gates, handoffs between agents.
- The external CLI owns the execution session: which tools it can call, permission scope, stdout/stderr, and structured output format.
That boundary matters in production: you configure Claude Code's allowed tools via --allowedTools, not by wiring KaibanJS tool objects. The library's job is to compose the prompt, spawn the process, and surface the result (or the error) into the team state.
Each task triggers one CLI run. Reviewer feedback is appended to the prompt and triggers another run. That is the full loop, the same as for other agents, except that the "model call" is a subprocess.
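The loop can be modeled with plain functions. The following is a minimal TypeScript sketch of it. runTaskWithFeedback, RunCli, Review, and the maxRuns cap are hypothetical names for illustration, not KaibanJS APIs. In a real run, the runCli call is where the CLI subprocess would be spawned.

```typescript
// Hypothetical signatures: one runCli call stands for one CLI subprocess run,
// and review returns null to accept or a feedback string to request a revision.
type RunCli = (prompt: string) => Promise<string>;
type Review = (output: string) => Promise<string | null>;

// Sketch of the per-task loop (not the KaibanJS source).
async function runTaskWithFeedback(
  basePrompt: string,
  runCli: RunCli,
  review: Review,
  maxRuns = 3, // safety cap added for this sketch only
): Promise<{ output: string; runs: number }> {
  let prompt = basePrompt;
  let output = '';
  for (let run = 1; run <= maxRuns; run++) {
    output = await runCli(prompt); // one subprocess per attempt
    const feedback = await review(output);
    if (feedback === null) {
      return { output, runs: run }; // accepted: the loop ends here
    }
    // Feedback is appended, so the next run sees the original task plus the critique.
    prompt = `${prompt}\n\nReviewer feedback:\n${feedback}`;
  }
  return { output, runs: maxRuns }; // cap reached; the last output is returned as-is
}
```

Because the subprocess sits behind a function type, a fake runCli can stand in for Claude Code in unit tests.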
Official reference: Using ExternalCodingAgent.
Agent definition: the full parameter surface
Requirements: Node.js (the agent spawns subprocesses, so browser-only bundles are not supported), a kaibanjs release that exports ExternalCodingAgent, and whatever provider environment variable your CLI needs (e.g. ANTHROPIC_API_KEY for headless Claude Code).
```typescript
import { Agent } from 'kaibanjs';

const agent = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Coder',
  role: 'Implementation assistant',
  goal: 'Use the external CLI to satisfy each task',
  background: 'Runs in Node against workspaceRoot',
  codingBackend: 'claude-code', // or 'opencode' | 'mock'
  workspaceRoot: '/absolute/path/to/repo',
  timeoutMs: 600_000,
  cliPath: '/optional/path/to/claude', // defaults to 'claude'
  claude: {
    useBare: true, // scripted-friendly --bare flag (default: true)
    allowedTools: 'Read', // narrow allowlist strongly recommended
    permissionMode: undefined,
    maxTurns: undefined,
    maxBudgetUsd: undefined,
    extraArgs: [],
  },
});
```
Key fields at a glance (full parameter table):
| Field | Notes |
|---|---|
| codingBackend | 'claude-code', 'opencode', or 'mock' (no subprocess, deterministic output; ideal for CI). |
| workspaceRoot | Working directory (CWD) passed to the CLI; usually the repo root. |
| claude.useBare | Enables the scripted JSON output mode; leave it true unless you have a specific reason not to. |
| claude.allowedTools | Comma-separated list of tools the CLI may use. Start narrow. |
| timeoutMs | Defaults to 600 000 ms (10 min). Tune it down for fast tasks. |
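A small preflight helper can make the documented defaults explicit. This is a sketch under assumptions. resolveConfig and the ExternalAgentConfig type are hypothetical. The absolute-path check on workspaceRoot is an extra guard added here and not a documented rule. KaibanJS applies its own defaults internally.

```typescript
import { isAbsolute } from 'node:path';

// Hypothetical config shape mirroring the fields in the table above.
interface ClaudeOptions {
  useBare?: boolean;
  allowedTools?: string;
}
interface ExternalAgentConfig {
  codingBackend: 'claude-code' | 'opencode' | 'mock';
  workspaceRoot: string;
  timeoutMs?: number;
  cliPath?: string;
  claude?: ClaudeOptions;
}

// Applies the documented defaults: 600 000 ms timeout, 'claude' binary, useBare true.
function resolveConfig(cfg: ExternalAgentConfig) {
  if (!isAbsolute(cfg.workspaceRoot)) {
    // Extra guard for this sketch: a relative CWD depends on where Node was started.
    throw new Error(`workspaceRoot must be absolute, got "${cfg.workspaceRoot}"`);
  }
  // Split the comma-separated allowlist so it can be logged or audited.
  const allowedTools = (cfg.claude?.allowedTools ?? '')
    .split(',')
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  return {
    ...cfg,
    timeoutMs: cfg.timeoutMs ?? 600_000,
    // The 'claude' default is only documented for the Claude Code backend.
    cliPath: cfg.cliPath ?? (cfg.codingBackend === 'claude-code' ? 'claude' : undefined),
    useBare: cfg.claude?.useBare ?? true,
    allowedTools,
  };
}
```

Logging the resolved object at startup makes it obvious which tools a given agent is actually allowed to use.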
On structured output: if Claude Code's JSON response includes a structured_output field, the task result stored in KaibanJS state is that structured value. Otherwise it falls back to the plain text result. See docs.
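The fallback rule can be written as a pure function over the CLI's stdout. This sketch makes two assumptions: the plain-text answer sits in a result field of the JSON envelope, and stdout that is not JSON is returned as trimmed text. extractTaskResult is a hypothetical name, and the library's actual parsing may differ.

```typescript
// Assumed shape of the CLI's JSON envelope; only structured_output is named in the docs.
interface CliJsonResponse {
  result?: string;
  structured_output?: unknown;
}

// Prefer structured_output; otherwise fall back to the plain-text result.
function extractTaskResult(stdout: string): unknown {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return stdout.trim(); // not JSON at all: treat raw stdout as the text result
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return stdout.trim(); // JSON, but not an envelope object
  }
  const body = parsed as CliJsonResponse;
  if (body.structured_output !== undefined && body.structured_output !== null) {
    return body.structured_output; // structured value wins when present
  }
  return body.result ?? '';
}
```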
Worked example: reservation cancellation → future travel credit
The problem shape
A customer wants to cancel a flight but keep the value as future credit. The answer is never a simple yes/no: it depends on fare family, route, timing, carrier policy, and sometimes the text of a live airline policy page (which may block automation, require JS, or just return empty HTML).
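Deciding whether a live page is usable is the kind of judgment the CLI session has to make before it falls back to a curated source. The sketch below spells out one such decision rule. looksUnusable and choosePolicyText are hypothetical helpers. The status cutoff, the 200-character threshold, and the marker phrases are illustrative heuristics, not taken from the demo.

```typescript
// Hypothetical shape for a fetched policy page.
interface FetchedPage {
  status: number;
  html: string;
}

// Heuristics are illustrative assumptions, not the demo's rules.
function looksUnusable(page: FetchedPage): boolean {
  if (page.status >= 400) return true; // blocked, missing, or server error
  const text = page.html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop inline scripts
    .replace(/<[^>]+>/g, ' ') // drop remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  if (text.length < 200) return true; // empty shell, likely rendered client-side
  return /captcha|access denied|enable javascript/i.test(text);
}

// Use the live page when it looks real; otherwise fall back to the vetted copy.
function choosePolicyText(
  live: FetchedPage | null,
  curated: string,
): { source: 'live' | 'curated'; text: string } {
  if (live !== null && !looksUnusable(live)) {
    return { source: 'live', text: live.html };
  }
  return { source: 'curated', text: curated };
}
```

Returning the source alongside the text lets the eligibility report state which policy it relied on.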
This mix of live research, calculation, and policy judgment makes it a good fit for the ExternalCodingAgent pattern:
- Task 1 (ExternalCodingAgent): policy research plus the fee/credit calculation. This is CLI-grade work done with Bash and curl.
- Task 2 (standard agent): translate the eligibility result into a human-readable review card. The task is marked externalValidationRequired: true, which pauses the team until the customer explicitly accepts or declines.
- Task 3 (standard agent): resolve the case based on that decision, either by issuing credit references or by escalating.
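Task 1's output is only useful downstream if it is structured enough to trust. The eligibility task's expected output lists status, credit amount, fees, conditions, and policy source. Here is a minimal sketch of that contract. The EligibilityResult type, computeCredit, and the fare-minus-fee rule are hypothetical and do not encode any real carrier policy.

```typescript
type EligibilityStatus = 'eligible' | 'partial' | 'ineligible';

// Hypothetical contract for the eligibility result handed to the next agent.
interface EligibilityResult {
  status: EligibilityStatus;
  creditCents: number; // future travel credit, in integer minor units
  feeCents: number; // cancellation fee deducted from the fare
  conditions: string[];
  policySource: string;
}

// Illustrative rule only: credit = fare minus fee, floored at zero.
function computeCredit(
  fareCents: number,
  feeCents: number,
  conditions: string[],
  policySource: string,
): EligibilityResult {
  const creditCents = Math.max(0, fareCents - feeCents);
  const status: EligibilityStatus =
    creditCents === 0 ? 'ineligible' : feeCents > 0 ? 'partial' : 'eligible';
  return { status, creditCents, feeCents, conditions, policySource };
}
```

Amounts are kept in integer cents to avoid floating-point rounding in the fee arithmetic.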
The HITL gate on task 2 is the critical design choice: the workflow never cancels the booking on AI confidence alone.
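In KaibanJS the pause itself comes from externalValidationRequired. The routing rule that follows the pause can still be written as a pure, testable function. The sketch below uses hypothetical names (nextStep, GateState). The condition that credit is issued only when the customer accepts and the case is eligible is an assumption for illustration.

```typescript
type Decision = 'accept' | 'decline';

type GateState =
  | { kind: 'awaiting_validation' }
  | { kind: 'issue_credit' }
  | { kind: 'escalate' };

// Nothing irreversible is selectable until a human decision exists.
function nextStep(decision: Decision | undefined, eligible: boolean): GateState {
  if (decision === undefined) {
    return { kind: 'awaiting_validation' }; // never act on model confidence alone
  }
  if (decision === 'accept' && eligible) {
    return { kind: 'issue_credit' };
  }
  return { kind: 'escalate' }; // declined, or accepted but not eligible
}
```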
Team wiring (excerpt from the open-source demo)
```typescript
import { Agent, Task, Team } from 'kaibanjs';

// Excerpt: WORKSPACE_ROOT, inputs, and the build*Description helpers
// are defined elsewhere in the demo repository.

// --- Agent 1: research delegated to Claude Code ---
const eligibilityAgent = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Eligibility Evaluator',
  role: 'Fare Rules & Cancellation Policy Specialist',
  goal: 'Accurately evaluate flight cancellation eligibility using real-time fare rule research',
  background:
    'Expert in airline fare rules powered by Claude Code CLI. Uses curl and Bash to research current policies from official airline sources.',
  codingBackend: 'claude-code',
  workspaceRoot: WORKSPACE_ROOT,
  timeoutMs: 600_000,
  claude: {
    useBare: true,
    allowedTools: 'Bash',
  },
});

// --- Agent 2: customer-facing copy, standard agent ---
const notificationAgent = new Agent({
  name: 'Customer Notification Agent',
  role: 'Customer Service Communication Specialist',
  goal: 'Prepare clear, empathetic cancellation terms for customer review and decision',
  background:
    'Specialist in translating fare rules into customer-friendly communications.',
  maxIterations: 2,
  forceFinalAnswer: true,
});

// --- Agent 3: conditional resolution, standard agent ---
const resolutionAgent = new Agent({
  name: 'Resolution Agent',
  role: 'Reservation Cancellation & Resolution Specialist',
  goal: 'Execute the cancellation and issue future flight credit, or prepare an escalation brief',
  background:
    'Senior specialist for finalizing airline cancellations and coordinating escalations.',
  maxIterations: 2,
  forceFinalAnswer: true,
});

// --- Tasks ---
const eligibilityTask = new Task({
  id: 'eligibilityTask',
  title: 'Evaluate Cancellation Eligibility',
  description: buildEligibilityDescription(inputs),
  expectedOutput:
    'Markdown eligibility report with status, credit amount, fees, conditions, and policy source',
  agent: eligibilityAgent,
});

const notificationTask = new Task({
  id: 'notificationTask',
  title: 'Prepare Customer Notification',
  description: buildNotificationDescription(inputs),
  expectedOutput:
    'Markdown customer review card with credit details and accept/deny consequences',
  agent: notificationAgent,
  externalValidationRequired: true, // ← pauses team until customer decision
});

const resolutionTask = new Task({
  id: 'resolutionTask',
  title: 'Finalize: issue credit or escalate',
  description: buildResolutionDescription(inputs),
  expectedOutput:
    'Markdown resolution with cancellation confirmation and credit details, or escalation brief',
  agent: resolutionAgent,
});

const team = new Team({
  name: 'Cancellation Team',
  agents: [eligibilityAgent, notificationAgent, resolutionAgent],
  tasks: [eligibilityTask, notificationTask, resolutionTask],
  inputs,
  env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
});
```
How the demo wires it into Next.js
The sample app keeps Claude Code entirely server-side, so the browser never spawns a subprocess:
- POST /api/team/start: builds the team and streams task updates via SSE.
- POST /api/team/validate: resumes the paused workflow after the customer accepts or declines.
That split satisfies the Node-only requirement and keeps API keys off the client. Walkthrough video: YouTube.
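On the wire, SSE is plain text: each event is a few field lines terminated by a blank line. The sketch below shows the framing a route like /api/team/start could use. TaskUpdate, toSseFrame, parseSseFrame, and the task-update event name are hypothetical, and the demo's actual payload fields may differ.

```typescript
// Hypothetical payload for one task status change.
interface TaskUpdate {
  taskId: string;
  status: string;
  result?: unknown;
}

// One SSE frame: an event line, a data line, and a blank-line terminator.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function toSseFrame(update: TaskUpdate, eventName = 'task-update'): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(update)}\n\n`;
}

// Client-side counterpart: strip the field name and one optional leading space.
function parseSseFrame(frame: string): { event?: string; data: string } {
  let event: string | undefined;
  const data: string[] = [];
  for (const line of frame.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).replace(/^ /, '');
    else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
  }
  return { event, data: data.join('\n') };
}
```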
Task chaining
Each task's description can reference the output of an earlier task with interpolation syntax: {taskResult:task1} for the first task in the list, and so on (docs).
In this demo the notification and resolution prompts consume the eligibility result so neither agent receives a monolithic system prompt. Each one has a narrow input/output contract: eligibility out → notification in; notification out + customer decision → resolution in.
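To show what the substitution does, here is a hypothetical re-implementation of the {taskResult:taskN} placeholder, using 1-based task numbers as described above. interpolateTaskResults is not a KaibanJS export. Leaving unknown placeholders untouched is an assumption about edge-case behavior.

```typescript
// task1 maps to results[0], task2 to results[1], and so on.
function interpolateTaskResults(template: string, results: string[]): string {
  return template.replace(/\{taskResult:task(\d+)\}/g, (match: string, n: string) => {
    const idx = Number(n) - 1;
    // Keep the placeholder if that task has not produced output yet.
    return idx >= 0 && idx < results.length ? results[idx] : match;
  });
}
```

A function replacer is used so that any $ characters in a task's output are inserted literally.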
Safety and limitations
From the official limitations section:
- Trust boundary. The agent spawns arbitrary CLIs with composed prompts. Never pass unsanitized user-controlled strings into flags or extraArgs.
- Narrow allowlists. Use claude.allowedTools to restrict what the CLI can touch. Defaults are your responsibility.
- mock backend. Use it in CI and local dev to assert team wiring without real API keys or subprocesses.
- Memory. Very large stdout/stderr is not specially truncated by the library today; long-running tasks may use significant memory.
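You can mitigate two of these limitations in your own code around the team. The sketch below shows an exact-match allowlist for extraArgs and a tail buffer that caps how much output you keep when you store or forward task results. assertSafeExtraArgs and TailBuffer are hypothetical helpers, not part of KaibanJS.

```typescript
// Reject any extraArgs entry that is not an exact, pre-reviewed flag.
function assertSafeExtraArgs(args: readonly string[], allowedFlags: ReadonlySet<string>): string[] {
  for (const arg of args) {
    if (!allowedFlags.has(arg)) {
      throw new Error(`extraArgs entry not on the allowlist: ${JSON.stringify(arg)}`);
    }
  }
  return [...args];
}

// Keeps at most `limit` characters of the most recent output.
class TailBuffer {
  private chunks: string[] = [];
  private size = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  push(chunk: string): void {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Trim from the front until the buffer fits the limit again.
    while (this.size > this.limit && this.chunks.length > 0) {
      const excess = this.size - this.limit;
      const head = this.chunks[0];
      if (head.length <= excess) {
        this.chunks.shift();
        this.size -= head.length;
      } else {
        this.chunks[0] = head.slice(excess);
        this.size -= excess;
      }
    }
  }

  toString(): string {
    return this.chunks.join('');
  }
}
```

Exact matching is deliberately strict: it means a user-supplied string can never become a flag, because only entries you have reviewed in advance pass the check.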
Try it yourself
| Resource | Link |
|---|---|
| Demo repository | kaiban-ai/kaibanjs-claude-code-cancel-flight-for-future-credit-demo |
| KaibanJS | kaibanjs.com |
| ExternalCodingAgent how-to | Using ExternalCodingAgent |
| Official playground | playground/external-coding-agents in the KaibanJS repo |
| Airline use-case framing | Cancel for Future Flight Credit |
Closing
ExternalCodingAgent answers a specific question every multi-agent stack eventually hits: where do execution-heavy, tool-using steps live without polluting the rest of the workflow?
The cancellation demo gives that answer a concrete shape: one agent does research via Claude Code (bash, HTTP, structured output); two standard agents handle customer language and conditional resolution; one externalValidationRequired flag ensures a human is in the loop before anything irreversible happens.
If you are evaluating the pattern for your own workflows, clone the repo, run createCancellationTeam with codingBackend: 'mock' first to verify the wiring, then swap in 'claude-code' once you have keys in place.
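One way to make mock-first the default is to pick the backend from the environment. pickBackend and the CODING_BACKEND variable are hypothetical. Only ANTHROPIC_API_KEY comes from the setup described above.

```typescript
type CodingBackend = 'claude-code' | 'opencode' | 'mock';

// Default to 'mock'; switch to a real CLI only when explicitly requested.
function pickBackend(env: Record<string, string | undefined>): CodingBackend {
  const requested = env.CODING_BACKEND;
  if (requested === 'claude-code') {
    // Fail fast instead of letting the subprocess fail mid-workflow.
    if (!env.ANTHROPIC_API_KEY) {
      throw new Error('CODING_BACKEND=claude-code requires ANTHROPIC_API_KEY');
    }
    return 'claude-code';
  }
  if (requested === 'opencode') return 'opencode';
  return 'mock';
}
```

You would pass the result as codingBackend when constructing the agent, e.g. codingBackend: pickBackend(process.env).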
Has the API or the documentation moved ahead of this post? The source of truth is always Using ExternalCodingAgent.