Dariel Vila for KaibanJS

ExternalCodingAgent in KaibanJS: Using Claude Code for Airline Cancellations to Future Flight Credits

Most multi-agent setups stop at "LLM A hands off to LLM B." That covers a lot of ground, but some tasks are execution-heavy: hit a URL, parse the HTML, notice the page is bot-blocked, fall back to a curated source, run a fee calculation, and return something structured enough that the next agent can trust it. Wrapping all of that inside a plain chat completion is the wrong abstraction.

KaibanJS adds ExternalCodingAgent for exactly this case: one task in a Team is executed by a local developer CLI (today Claude Code or OpenCode, plus a mock backend for CI), while every other part of the team lifecycle stays the same, including interpolated task descriptions, context passing, completion handlers, HITL gates, and error handling when the subprocess fails.

This post covers the API surface and then walks through a concrete implementation: a three-agent team that handles flight cancellations converted into future travel credits. Eligibility comes first, customer review with a mandatory approval gate second, and conditional resolution last. The scenario is the vehicle for the code; if you also want the product framing, Kaiban has an airline use-case page for the same workflow.


Why a separate agent type instead of tools?

KaibanJS agents already support a tools array, but ExternalCodingAgent does not use it for CLI execution. The split is intentional:

  • KaibanJS owns the workflow: task ordering, board state, validation gates, handoffs between agents.
  • The external CLI owns the execution session: which tools it can call, permission scope, stdout/stderr, and structured output format.

That boundary matters in production—you configure Claude Code's allowed tools via --allowedTools, not by wiring KaibanJS tool objects. The library's job is to compose the prompt, spawn the process, and surface the result (or the error) into the team state.

Each task triggers one CLI run. Reviewer feedback appends to the prompt and triggers another run. That is the full loop—same as other agents, but the "model call" is a subprocess.
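To make the loop concrete, here is a minimal sketch of how a prompt could be recomposed when reviewer feedback arrives. It is illustrative only and does not reproduce KaibanJS internals: composePrompt, the section labels, and the shape of the feedback list are all assumptions.

```javascript
// Illustrative only: a hypothetical prompt composer, not the library's own code.
function composePrompt(task, feedbackHistory = []) {
  const sections = [
    `Task: ${task.description}`,
    `Expected output: ${task.expectedOutput}`,
  ];
  // Each round of reviewer feedback is appended, so the next CLI run sees it.
  if (feedbackHistory.length > 0) {
    const notes = feedbackHistory
      .map((note, index) => `${index + 1}. ${note}`)
      .join('\n');
    sections.push(`Reviewer feedback to address:\n${notes}`);
  }
  return sections.join('\n\n');
}

const task = {
  description: 'Evaluate cancellation eligibility',
  expectedOutput: 'Markdown eligibility report',
};
const firstRun = composePrompt(task); // first CLI run: no feedback yet
const secondRun = composePrompt(task, ['Cite the policy source URL']); // re-run after feedback
```

Under this sketch, each feedback round yields a longer prompt and exactly one more subprocess run, which matches the loop described above.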

Official reference: Using ExternalCodingAgent.


Agent definition: the full parameter surface

Requirements: Node.js (the agent spawns subprocesses, so browser-only bundles are not supported), a kaibanjs release that exports ExternalCodingAgent, and the provider env var your CLI needs (e.g. ANTHROPIC_API_KEY for headless Claude Code).

import { Agent } from 'kaibanjs';

const agent = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Coder',
  role: 'Implementation assistant',
  goal: 'Use the external CLI to satisfy each task',
  background: 'Runs in Node against workspaceRoot',
  codingBackend: 'claude-code', // 'opencode' | 'mock'
  workspaceRoot: '/absolute/path/to/repo',
  timeoutMs: 600_000,
  cliPath: '/optional/path/to/claude', // defaults to 'claude'
  claude: {
    useBare: true,         // scripted-friendly --bare flag (default: true)
    allowedTools: 'Read',  // narrow allowlist strongly recommended
    permissionMode: undefined,
    maxTurns: undefined,
    maxBudgetUsd: undefined,
    extraArgs: [],
  },
});

Key fields at a glance (full parameter table):

| Field | Notes |
| --- | --- |
| codingBackend | 'claude-code', 'opencode', or 'mock' (no subprocess, deterministic output; ideal for CI). |
| workspaceRoot | CWD passed to the CLI; usually the repo root. |
| claude.useBare | Enables the scripted JSON output mode; leave true unless you know why not. |
| claude.allowedTools | Comma-separated list of tools the CLI may use. Start narrow. |
| timeoutMs | Defaults to 600 000 ms (10 min). Tune down for fast tasks. |

On structured output: if Claude Code's JSON response includes a structured_output field, the task result stored in KaibanJS state is that structured value. Otherwise it falls back to the plain text result. See docs.
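The fallback rule can be sketched as a small function. This is a sketch of the documented behavior, not the library's source: pickTaskResult is a hypothetical name, and reading the plain text from a result field is an assumption about the CLI's JSON shape.

```javascript
// Sketch of the documented rule: prefer structured_output, else plain text.
// Assumption: the plain-text answer lives in a `result` field of the CLI's JSON.
function pickTaskResult(cliResponse) {
  const structured = cliResponse?.structured_output;
  if (structured !== undefined && structured !== null) {
    return structured;
  }
  return cliResponse?.result ?? '';
}

const structuredCase = pickTaskResult({
  result: 'Eligible',
  structured_output: { status: 'ELIGIBLE' },
});
const plainCase = pickTaskResult({ result: 'Eligible for credit' });
```

Downstream tasks should therefore be ready for either an object or a string, depending on what the CLI returned.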


Worked example: reservation cancellation → future travel credit

The problem shape

A customer wants to cancel a flight but keep the value as future credit. The answer is never a simple yes/no: it depends on fare family, route, timing, carrier policy, and sometimes the text of a live airline policy page (which may block automation, require JS, or just return empty HTML).

That makes it a good fit for the ExternalCodingAgent pattern:

  • Task 1 (ExternalCodingAgent): policy research + fee/credit calculation—CLI-grade work with Bash and curl.
  • Task 2 (standard agent): translate the eligibility result into a human-readable review card. The task is marked externalValidationRequired: true, which pauses the team until the customer explicitly accepts or declines.
  • Task 3 (standard agent): resolve the case—issue credit references or escalate—based on that decision.

The HITL gate on task 2 is the critical design choice: the workflow never cancels the booking on AI confidence alone.

Team wiring (excerpt from the open-source demo)

import { Agent, Task, Team } from 'kaibanjs';

// --- Agent 1: research delegated to Claude Code ---
const eligibilityAgent = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Eligibility Evaluator',
  role: 'Fare Rules & Cancellation Policy Specialist',
  goal: 'Accurately evaluate flight cancellation eligibility using real-time fare rule research',
  background:
    'Expert in airline fare rules powered by Claude Code CLI. Uses curl and Bash to research current policies from official airline sources.',
  codingBackend: 'claude-code',
  workspaceRoot: WORKSPACE_ROOT,
  timeoutMs: 600_000,
  claude: {
    useBare: true,
    allowedTools: 'Bash',
  },
});

// --- Agent 2: customer-facing copy, standard agent ---
const notificationAgent = new Agent({
  name: 'Customer Notification Agent',
  role: 'Customer Service Communication Specialist',
  goal: 'Prepare clear, empathetic cancellation terms for customer review and decision',
  background:
    'Specialist in translating fare rules into customer-friendly communications.',
  maxIterations: 2,
  forceFinalAnswer: true,
});

// --- Agent 3: conditional resolution, standard agent ---
const resolutionAgent = new Agent({
  name: 'Resolution Agent',
  role: 'Reservation Cancellation & Resolution Specialist',
  goal: 'Execute the cancellation and issue future flight credit, or prepare an escalation brief',
  background:
    'Senior specialist for finalizing airline cancellations and coordinating escalations.',
  maxIterations: 2,
  forceFinalAnswer: true,
});

// --- Tasks ---
const eligibilityTask = new Task({
  id: 'eligibilityTask',
  title: 'Evaluate Cancellation Eligibility',
  description: buildEligibilityDescription(inputs),
  expectedOutput:
    'Markdown eligibility report with status, credit amount, fees, conditions, and policy source',
  agent: eligibilityAgent,
});

const notificationTask = new Task({
  id: 'notificationTask',
  title: 'Prepare Customer Notification',
  description: buildNotificationDescription(inputs),
  expectedOutput:
    'Markdown customer review card with credit details and accept/deny consequences',
  agent: notificationAgent,
  externalValidationRequired: true, // ← pauses team until customer decision
});

const resolutionTask = new Task({
  id: 'resolutionTask',
  title: 'Finalize: issue credit or escalate',
  description: buildResolutionDescription(inputs),
  expectedOutput:
    'Markdown resolution with cancellation confirmation and credit details, or escalation brief',
  agent: resolutionAgent,
});

const team = new Team({
  name: 'Cancellation Team',
  agents: [eligibilityAgent, notificationAgent, resolutionAgent],
  tasks: [eligibilityTask, notificationTask, resolutionTask],
  inputs,
  env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
});
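The excerpt calls buildEligibilityDescription(inputs) without showing it. One plausible shape is sketched below. The demo's real builder and input fields may differ: bookingReference, airline, fareFamily, and departureUtc are hypothetical names, and the sample values are invented for illustration.

```javascript
// Hypothetical description builder; the field names on `inputs` are assumptions.
function buildEligibilityDescription(inputs) {
  return [
    'Evaluate whether this booking can be cancelled for future travel credit.',
    `Booking reference: ${inputs.bookingReference}`,
    `Airline: ${inputs.airline}`,
    `Fare family: ${inputs.fareFamily}`,
    `Departure (UTC): ${inputs.departureUtc}`,
    'If the live policy page is blocked or empty, say so and name the fallback source used.',
    'Return a Markdown report with status, credit amount, fees, conditions, and policy source.',
  ].join('\n');
}

// Invented sample values, for illustration only.
const description = buildEligibilityDescription({
  bookingReference: 'ABC123',
  airline: 'Example Air',
  fareFamily: 'Basic Economy',
  departureUtc: '2030-01-15T09:30:00Z',
});
```

Keeping the builder a pure function makes the prompt easy to snapshot-test before any CLI run is involved.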

How the demo wires it into Next.js

The sample app keeps Claude Code entirely server-side—the browser never spawns a subprocess:

  • POST /api/team/start — builds the team and streams task updates via SSE.
  • POST /api/team/validate — resumes the paused workflow after the customer accepts or declines.

That split satisfies the Node-only requirement and keeps API keys off the client. Walkthrough video: YouTube.
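For the streaming half, a task update has to be serialized into the Server-Sent Events wire format before it is written to the response. A minimal sketch, assuming a hypothetical event name and payload shape (the demo's actual events may differ):

```javascript
// Formats one Server-Sent Events message: an `event:` line, a `data:` line,
// and a blank line that terminates the message.
function formatSseEvent(eventName, payload) {
  // JSON.stringify escapes newlines, so the payload fits on one data line.
  const data = JSON.stringify(payload);
  return `event: ${eventName}\ndata: ${data}\n\n`;
}

// 'task-update' and the payload fields are illustrative assumptions.
const message = formatSseEvent('task-update', {
  taskId: 'eligibilityTask',
  status: 'DOING',
});
```

On the client, an EventSource listener for the same event name would parse the data line with JSON.parse.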


Task chaining

Each task's description can reference the output of an earlier task with interpolation syntax: {taskResult:task1} for the first task in the list, and so on (docs).

In this demo the notification and resolution prompts consume the eligibility result so neither agent receives a monolithic system prompt. Each one has a narrow input/output contract: eligibility out → notification in; notification out + customer decision → resolution in.
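To illustrate the placeholder syntax, here is a small stand-in for the substitution step. KaibanJS performs the interpolation itself, so this is not its implementation; interpolateTaskResults is a hypothetical helper that only mimics the {taskResult:taskN} convention.

```javascript
// Stand-in for the interpolation step; not the library's implementation.
// taskResults[0] corresponds to {taskResult:task1}, and so on.
function interpolateTaskResults(description, taskResults) {
  return description.replace(/\{taskResult:task(\d+)\}/g, (placeholder, n) => {
    const value = taskResults[Number(n) - 1];
    if (value === undefined) return placeholder; // keep unresolved placeholders visible
    return typeof value === 'string' ? value : JSON.stringify(value);
  });
}

const notificationPrompt = interpolateTaskResults(
  'Write a review card from this report:\n{taskResult:task1}',
  ['Status: ELIGIBLE'],
);
```

Leaving an unresolved placeholder in the text, instead of silently dropping it, makes a wiring mistake easy to spot in the prompt.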


Safety and limitations

From the official limitations section:

  • Trust boundary. The agent spawns arbitrary CLIs with composed prompts. Never pass unsanitized user-controlled strings into flags or extraArgs.
  • Narrow allowlists. Use claude.allowedTools to restrict what the CLI can touch. Defaults are your responsibility.
  • mock backend. Use it in CI and local dev to assert team wiring without real API keys or subprocesses.
  • Memory. Very large stdout/stderr is not specially truncated by the library today; long-running tasks may use significant memory.
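The first two points can be enforced in your own code before anything reaches the agent config. The sketch below makes several assumptions: the six-character booking reference pattern is a guess at a typical format, and the permitted tool set is limited to the two tool names that appear in this post.

```javascript
// Assumption: booking references are six uppercase alphanumeric characters.
const BOOKING_REFERENCE = /^[A-Z0-9]{6}$/;

// Rejects anything that is not a plain booking reference.
function assertBookingReference(value) {
  if (typeof value !== 'string' || !BOOKING_REFERENCE.test(value)) {
    throw new Error('Invalid booking reference');
  }
  return value;
}

// Fixed allowlist: only tools named here can ever reach claude.allowedTools.
const PERMITTED_TOOLS = new Set(['Read', 'Bash']);

// Builds the comma-separated allowedTools string from vetted tool names.
function toAllowedTools(tools) {
  for (const tool of tools) {
    if (!PERMITTED_TOOLS.has(tool)) {
      throw new Error(`Tool not permitted: ${tool}`);
    }
  }
  return tools.join(',');
}

const safeReference = assertBookingReference('ABC123');
const allowedTools = toAllowedTools(['Read', 'Bash']);
```

Validating against a strict pattern is safer than trying to escape arbitrary input, because the string ends up inside a prompt that a tool-using CLI will act on.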

Try it yourself

| Resource | Link |
| --- | --- |
| Demo repository | kaiban-ai/kaibanjs-claude-code-cancel-flight-for-future-credit-demo |
| KaibanJS | kaibanjs.com |
| ExternalCodingAgent how-to | Using ExternalCodingAgent |
| Official playground | playground/external-coding-agents in the KaibanJS repo |
| Airline use-case framing | Cancel for Future Flight Credit |

Closing

ExternalCodingAgent answers a specific question every multi-agent stack eventually hits: where do execution-heavy, tool-using steps live without polluting the rest of the workflow?

The cancellation demo gives that answer a concrete shape: one agent does research via Claude Code (Bash, HTTP, structured output); two standard agents handle customer language and conditional resolution; one externalValidationRequired flag ensures a human is in the loop before anything irreversible happens.

If you are evaluating the pattern for your own workflows, clone the repo, run createCancellationTeam with codingBackend: 'mock' first to verify the wiring, then swap in 'claude-code' once you have keys in place.
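One way to make that switch hard to get wrong is to derive the backend from the environment. This is a sketch, not part of the demo: resolveCodingBackend is a hypothetical helper, and treating CI=true or CI=1 as the CI signal is an assumption about your pipeline.

```javascript
// Hypothetical helper: choose 'mock' in CI or when no API key is configured.
function resolveCodingBackend(env) {
  const inCi = env.CI === 'true' || env.CI === '1';
  if (inCi || !env.ANTHROPIC_API_KEY) {
    return 'mock';
  }
  return 'claude-code';
}

// In a real project you would pass process.env and use the result as
// the agent's codingBackend value.
const ciBackend = resolveCodingBackend({ CI: 'true', ANTHROPIC_API_KEY: 'set' });
const localBackend = resolveCodingBackend({ ANTHROPIC_API_KEY: 'set' });
```

Defaulting to 'mock' when the key is missing means a misconfigured machine fails safe and never attempts a real subprocess run.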


If the API or docs have moved ahead of this post, the source of truth is always Using ExternalCodingAgent.
