
WonderLab

OpenClaw Deep Dive (6): Node Host, Canvas, and Sub-Agents

Scenario: The "Boundary Problem" of an AI Assistant

After using OpenClaw as a personal AI assistant for a while, you'll encounter scenarios where a single-process model falls short:

  1. Remote execution: You want the AI to run a script on your home Linux server, but the OpenClaw Gateway runs on your Mac — how does the AI reach that server's shell?
  2. Interactive mobile UI: You want to see a real-time AI-generated dashboard on your phone and tap buttons to trigger the next action — how does the AI push a UI to a mobile WebView, and how do taps in that WebView feed back to the AI?
  3. Parallel tasks: You ask the AI to sort 1,000 emails — processing them serially with one Agent is too slow. Can you deploy multiple AIs simultaneously?

These three scenarios correspond to three core extension mechanisms in OpenClaw: Node Host, Canvas + A2UI, and Sub-Agents.


1. Node Host: Giving AI Access to Remote Machines

Problem: The Gateway and Execution Target Are on Different Machines

The Gateway handles conversation management and Agent execution, but tools like system.run (execute shell commands) need to run on the target machine — which might be a remote server, NAS, Raspberry Pi, or a different process in a restricted environment on the same machine.

Node is the abstraction that solves this: an independent process that connects to the Gateway and responds to execution requests. A Node registers itself via the standard GatewayClient (WebSocket):

// src/node-host/runner.ts
const client = new GatewayClient({
  url: `wss://${gatewayHost}:${gatewayPort}`,
  instanceId: nodeId,         // unique node identifier (e.g. machine hostname)
  clientName: "node-host",
  role: "node",               // distinguished from "agent", "cli" roles
  caps: ["system", "browser"], // capabilities this node supports
  commands: NODE_SYSTEM_RUN_COMMANDS,  // supported command list
  onEvent: (evt) => {
    if (evt.event !== "node.invoke.request") return;
    const payload = coerceNodeInvokePayload(evt.payload);
    void handleInvoke(payload, client, skillBins);  // handle execution requests
  },
});
client.start();

Data flow:

Agent calls system.run tool
  → Gateway routes to target nodeId's WebSocket connection
  → sends node.invoke.request event
  → Node executes command (spawns child process)
  → node.invoke.result returns result
  → Gateway delivers result to Agent
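The Node side of this flow can be sketched as a small handler: run the requested command, then emit a `node.invoke.result` event when it exits. This is a minimal illustration, not OpenClaw's actual `handleInvoke` — the request/result shapes and the `send` callback are assumptions for the sketch.

```typescript
type NodeInvokeRequest = { invokeId: string; command: string; args: string[] };
type NodeInvokeResult = { invokeId: string; ok: boolean; stdout: string; stderr: string };

// Hypothetical Node-side invoke handler: spawn the command, collect output,
// and report a node.invoke.result event back toward the Gateway.
async function handleInvokeSketch(
  req: NodeInvokeRequest,
  send: (event: string, payload: NodeInvokeResult) => void,
): Promise<NodeInvokeResult> {
  const { spawn } = await import("node:child_process");
  return new Promise((resolve) => {
    const child = spawn(req.command, req.args);
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", (d) => (stdout += String(d)));
    child.stderr.on("data", (d) => (stderr += String(d)));
    child.on("close", (code) => {
      const result: NodeInvokeResult = {
        invokeId: req.invokeId,
        ok: code === 0,
        stdout,
        stderr,
      };
      send("node.invoke.result", result); // event name from the flow above
      resolve(result);
    });
  });
}
```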

Execution Security: A Three-Level Permission Model

A Node is not an unlimited shell executor. exec-approvals.ts implements a three-tier security model:

type ExecSecurity = "deny" | "allowlist" | "full";
  • deny: Reject all command execution
  • allowlist (default): Only allow commands in the exec-approvals.json whitelist
  • full: Allow all commands (for high-trust environments)
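The decision itself is simple to state in code. This is a minimal sketch of the three-tier check — the real exec-approvals.ts also tracks file hashes and richer allowlist entries, so the `Set<string>` allowlist here is an illustrative simplification:

```typescript
type ExecSecurity = "deny" | "allowlist" | "full";

// Sketch: decide whether a command may run under the given security tier.
function isCommandAllowed(
  security: ExecSecurity,
  command: string,
  allowlist: Set<string>,
): boolean {
  switch (security) {
    case "deny":
      return false; // reject everything
    case "full":
      return true; // high-trust environment: allow everything
    case "allowlist":
      return allowlist.has(command); // only whitelisted commands
  }
}
```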

The whitelist file uses hashing to prevent race-condition modifications — both reads and updates require passing a baseHash. If the file has been changed by another process, the response is "INVALID_REQUEST: exec approvals changed; reload and retry". This prevents TOCTOU attacks.
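The baseHash scheme is a compare-and-swap: an update only applies if the caller's hash still matches the file's current content. A sketch of the idea (function names and the result shape are assumptions; only the error string is taken from the text above):

```typescript
import { createHash } from "node:crypto";

function hashOf(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compare-and-swap update: reject if the file changed since the caller read it.
function applyApprovalsUpdate(
  current: string,
  baseHash: string,
  next: string,
): { ok: true; content: string } | { ok: false; error: string } {
  if (hashOf(current) !== baseHash) {
    return {
      ok: false,
      error: "INVALID_REQUEST: exec approvals changed; reload and retry",
    };
  }
  return { ok: true, content: next };
}
```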

There's also a special path: on macOS, when preferMacAppExecHost is true, execution requests are proxied through the macOS app's Exec Host (Unix socket) rather than calling spawn directly. This bypasses macOS sandbox restrictions that would otherwise block access to certain paths.

Output Caps

Command output is hard-capped:

const OUTPUT_CAP = 200_000;      // cumulative output cap (bytes)
const OUTPUT_EVENT_TAIL = 20_000; // trailing tail of output per event (bytes)

Output exceeding the cap is discarded, with truncated: true in the result. This ensures large outputs (like log files) don't blow up the Agent's context window.
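The cap behaves like a bounded accumulator: keep appending until the limit, drop the rest, and flag the truncation. A minimal sketch of that behavior (the factory function is illustrative, not OpenClaw's implementation):

```typescript
const OUTPUT_CAP = 200_000; // cumulative output cap (bytes), from the constant above

// Sketch: accumulate output chunks up to a hard cap; excess is discarded
// and the result carries truncated: true, as described in the text.
function makeCappedBuffer(cap = OUTPUT_CAP) {
  let text = "";
  let truncated = false;
  return {
    push(chunk: string) {
      const room = cap - text.length;
      if (chunk.length > room) truncated = true; // part (or all) of chunk dropped
      if (room > 0) text += chunk.slice(0, room);
    },
    result() {
      return { text, truncated };
    },
  };
}
```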


2. Canvas: Rendering AI-Generated UIs on Mobile

Problem: Can AI Replies Be More Than Text?

AI is good at generating code — so why not generate an HTML dashboard and display it directly on the user's phone with full interactivity?

Canvas is a lightweight HTTP server built into the Gateway, mounted at /__openclaw__/canvas, designed specifically to serve AI-generated HTML/JS/CSS files:

// Directory structure
~/.openclaw/canvas/
  index.html    AI writes here
  app.js
  style.css

At Gateway startup, createCanvasHostHandler:

  1. Creates a default index.html in ~/.openclaw/canvas/ (if it doesn't exist)
  2. Watches the directory with chokidar
  3. Broadcasts "reload" over WebSocket (/__openclaw__/ws) to all connected clients when files change

The live-reload script injected into every HTML page:

// automatically injected before </body> of every Canvas HTML page
const ws = new WebSocket("wss://host/__openclaw__/ws");
ws.onmessage = (ev) => {
  if (String(ev.data || "") === "reload") location.reload();
};

This means: AI rewrites index.html → chokidar detects the change → WebSocket broadcasts "reload" → the WebView on the user's phone auto-refreshes.
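The broadcast step is a simple fan-out over connected WebSocket clients. A sketch of that hub, with `send` standing in for `ws.send` on a real /__openclaw__/ws connection (the factory and its names are illustrative, not OpenClaw's API):

```typescript
// Sketch of the live-reload fan-out: track connected clients and push
// "reload" to each of them whenever the file watcher fires.
function makeReloadHub() {
  const clients = new Set<(msg: string) => void>();
  return {
    // Register a client; returns an unsubscribe function (connection closed).
    add(send: (msg: string) => void): () => void {
      clients.add(send);
      return () => clients.delete(send);
    },
    // Called by the chokidar watcher on any file change.
    onFileChange() {
      for (const send of clients) send("reload");
    },
  };
}
```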

Filesystem Boundary

File serving has security constraints. resolveFileWithinRoot uses open(fd) + realpath to verify that every requested file path lies within the Canvas root directory — the same class of protection as openBoundaryFileSync in the Plugin SDK, defending against path traversal attacks.
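The containment test at the heart of that check can be sketched lexically. Note this is weaker than the real `resolveFileWithinRoot`, which also resolves symlinks via open(fd) + realpath; the sketch only shows the "resolved path must stay under the root" condition:

```typescript
import path from "node:path";

// Sketch: reject any request whose resolved path escapes the Canvas root.
// (Lexical only; the real check also defeats symlink tricks via realpath.)
function isWithinRoot(root: string, requested: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(root, requested);
  return resolved === resolvedRoot || resolved.startsWith(resolvedRoot + path.sep);
}
```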


3. A2UI: Two-Way Communication Between WebView and Agent

Problem: How Do Button Taps in Canvas Feed Back to the Agent?

An HTML page in Canvas can display data, but it has no way to communicate with OpenClaw on its own — it's an isolated page running inside an iOS/Android WebView.

A2UI (Agent-to-UI) closes the loop in the reverse direction: it provides a cross-platform JavaScript bridge API that lets code inside a WebView trigger OpenClaw Agent actions.

The A2UI bundle (src/canvas-host/a2ui/) is served from /__openclaw__/a2ui/, and a bootstrap script is automatically injected into every Canvas HTML page:

// Bridge script injected into every Canvas HTML page (simplified)
function postToNode(payload) {
  const raw = typeof payload === "string" ? payload : JSON.stringify(payload);

  // iOS bridge
  const iosHandler = globalThis.webkit?.messageHandlers?.openclawCanvasA2UIAction;
  if (iosHandler?.postMessage) {
    iosHandler.postMessage(raw);
    return true;
  }

  // Android bridge
  const androidHandler = globalThis.openclawCanvasA2UIAction;
  if (androidHandler?.postMessage) {
    androidHandler.postMessage(raw);
    return true;
  }

  return false;
}

// Public API for Canvas pages
globalThis.openclawSendUserAction = (userAction) => {
  const id = userAction.id || crypto.randomUUID();
  return postToNode({ userAction: { ...userAction, id } });
};

Code inside a Canvas page can trigger actions like this:

// User tapped the "Run Backup" button
window.openclawSendUserAction({
  name: "run_backup",
  surfaceId: "main",
  sourceComponentId: "backup.button",
  context: { target: "nas-01", compress: true }
});

This message travels:

Canvas JS → native MessageHandler (iOS/Android)
  → OpenClaw Node Host's node.event
  → Gateway
  → corresponding Agent session (as user input)
  → Agent decides next action

Action Result Delivery

After the Agent finishes processing an action, it can notify the Canvas page via window.dispatchEvent(new CustomEvent("openclaw:a2ui-action-status", { detail: { id, ok, error } })). This completes a full request-response loop.


4. ACP: Standardized Agent Interoperability Protocol

Problem: How Do External Tools Call OpenClaw?

OpenClaw uses @agentclientprotocol/sdk, an implementation of the Agent Client Protocol (ACP), to expose a standard interface — letting any ACP-compatible tool interact with an OpenClaw Agent session.

// src/acp/server.ts
export async function serveAcpGateway(opts: AcpServerOptions): Promise<void> {
  // Listens on a local port, translating ACP requests into Gateway operations
  const agent = new AcpGatewayAgent(gateway);
  // Each ACP session maps to an OpenClaw session (sessionKey)
}

ACP sessions have two modes:

export const ACP_SPAWN_MODES = ["run", "session"] as const;
// "run"     → one-shot task: closes session after completion
// "session" → persistent session: kept alive after completion; subsequent
//             requests continue in the same context

This lets CI/CD pipelines, IDE plugins, or other AI tools use OpenClaw as a programmable AI backend, without needing to understand OpenClaw's internal protocol.


5. Sub-Agents: Parallel Task Decomposition

Problem: Serial Processing of Large Tasks Is Too Slow

A complex task (sorting 1,000 emails, analyzing 50 code files) processed serially by a single Agent is painfully slow. The sub-agent mechanism lets an Agent spawn independent sub-agents to execute sub-tasks in parallel.

// Agent uses the sessions.spawn tool to spawn a sub-agent
export async function spawnSubagentDirect(
  params: SpawnSubagentParams,
  ctx: SpawnSubagentContext,
): Promise<SpawnSubagentResult>

Core fields of SpawnSubagentParams:

type SpawnSubagentParams = {
  task: string;              // sub-task description (injected as sub-agent's first message)
  label?: string;            // human-readable label (for status display)
  agentId?: string;          // which agent config to use
  model?: string;            // sub-agent's model (can differ from parent)
  thinking?: string;         // sub-agent's thinking level
  runTimeoutSeconds?: number; // timeout control
  thread?: boolean;          // bind to chat thread (results go directly to chat)
  mode?: "run" | "session";  // one-shot vs persistent session
  cleanup?: "delete" | "keep"; // clean up session after completion?
};
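For the email-sorting scenario from the intro, a spawn call might carry params like the following. The field values are examples only, and the type is repeated here so the snippet stands alone:

```typescript
type SpawnSubagentParams = {
  task: string;
  label?: string;
  agentId?: string;
  model?: string;
  thinking?: string;
  runTimeoutSeconds?: number;
  thread?: boolean;
  mode?: "run" | "session";
  cleanup?: "delete" | "keep";
};

// Illustrative params for one email-triage sub-task (values are examples).
const emailTriageSpawn: SpawnSubagentParams = {
  task: "Classify the unread emails in batch 3 of 10 and label each one",
  label: "email-triage-3",
  mode: "run",           // one-shot: session closes when the sub-task completes
  cleanup: "delete",     // discard the sub-agent session afterwards
  runTimeoutSeconds: 300,
  thread: false,         // result returns to the parent Agent, not directly to chat
};
```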

Sub-Agent Lifecycle

Parent Agent calls sessions.spawn
  ↓
spawnSubagentDirect()
  → creates new SessionKey (format: agentId:session-xxxxxxxx)
  → queues in AGENT_LANE_SUBAGENT channel
  → registers with SubagentRegistry (tracks run state)
  ↓
Sub-Agent executes in its own Lane (parallel to parent)
  ↓
On completion: subagent-announce delivers result back to parent Agent
  → injected as a user message in the parent's session
  → parent Agent continues processing

"Don't poll" caveat: When a sub-agent completes, it auto-announces its result — the parent doesn't query for it. After spawning sub-tasks, the parent Agent should move on to other work, not wait in a loop. The result will be re-injected into the parent's context as a user message automatically.

// Sub-agent result announcement note (src/agents/subagent-spawn.ts)
export const SUBAGENT_SPAWN_ACCEPTED_NOTE =
  "auto-announces on completion, do not poll/sleep. The response will be sent back as an user message.";

Depth Limits

To prevent infinite recursion (sub-agents spawning sub-agents spawning sub-agents...), the system maintains a depth counter (subagent-depth.ts) with a default limit:

// src/config/agent-limits.ts
export const DEFAULT_SUBAGENT_MAX_SPAWN_DEPTH = 3;

Spawn requests that exceed this depth are rejected with a "forbidden" status.
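The guard amounts to incrementing the parent's depth and comparing against the limit. A sketch (only the constant's name and value and the "forbidden" status come from the text; the function and result shape are illustrative):

```typescript
// From src/config/agent-limits.ts
const DEFAULT_SUBAGENT_MAX_SPAWN_DEPTH = 3;

// Sketch: a spawn at depth parentDepth produces a child at parentDepth + 1;
// reject with "forbidden" once the child would exceed the limit.
function checkSpawnDepth(
  parentDepth: number,
  max = DEFAULT_SUBAGENT_MAX_SPAWN_DEPTH,
): { status: "ok"; childDepth: number } | { status: "forbidden"; reason: string } {
  const childDepth = parentDepth + 1;
  if (childDepth > max) {
    return { status: "forbidden", reason: `max sub-agent spawn depth ${max} exceeded` };
  }
  return { status: "ok", childDepth };
}
```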

Thread-Bound Sub-Agents

The thread: true parameter makes sub-agent results deliver directly to the current chat thread (rather than waiting for the parent Agent to forward them):

Parent Agent (in a Telegram conversation)
  → spawns sub-agent, thread: true
  → sub-agent completes, result sent directly to Telegram conversation
  → user sees sub-task output directly in Telegram

This feature depends on the subagent_spawning lifecycle hook — only channel plugins that implement this hook (chat platforms capable of thread binding) can use it.


Summary: Three "Boundary-Breaking" Mechanisms

  • Node Host — breaks the execution boundary (AI reaches remote machines). Flow: Gateway → WebSocket → Node → spawn → result returned.
  • Canvas + A2UI — breaks the UI boundary (AI output becomes interactive UI). Flow: AI writes file → chokidar → WebSocket → WebView reloads; user taps → native bridge → node.event → Agent.
  • Sub-Agents — break the concurrency boundary (AI decomposes tasks in parallel). Flow: parent Agent spawns → sub-agent executes independently → announces result → re-injected into parent session.

Together, these three mechanisms expand the capability boundaries of a "personal AI assistant": not just conversation, but a programmable execution engine + interactive UI host + multi-Agent collaboration system.

In the next and final article of this series, we'll cover OpenClaw's security model and sandbox — systematically mapping out Gateway authentication, tool policy, sandbox isolation, API key protection, and the trust boundary design of the entire system.
