DEV Community

Agent Internals

How OpenClaw Serves HTTP, WebSocket, and 70+ Methods on a Single Port

From HTTP/WebSocket multi-protocol multiplexing to configuration hot-reload — a deep breakdown of how the Gateway starts and runs

Version baseline: OpenClaw 2026.3.2

Key Takeaways

  • The Gateway serves HTTP and WebSocket traffic on a single port (18789) with no reverse proxy required, using Node.js's native upgrade event to multiplex protocols.
  • Startup is a strictly ordered sequence of roughly 48 steps in 9 phases, running from config migration and secrets resolution through to sidecar lifecycle management, with clear failure semantics for each phase.
  • HTTP routing uses a stage pipeline with first-match-wins semantics: 13 ordered stages cover webhooks, OpenAI-compatible APIs, Canvas, plugins, and health probes, with zero overhead for disabled features.
  • Configuration hot-reload supports 4 modes (off/restart/hot/hybrid), with a recursive deep-diff engine and per-subsystem reload rules that allow most changes to take effect without a full Gateway restart.
  • Slow consumer protection on WebSocket broadcasts prevents a single laggy client from causing unbounded memory growth -- low-priority events are dropped, critical events force-disconnect the slow client.

Table of Contents

  1. Gateway Overview: Single Process, Single Port, Multi-Protocol
  2. Startup Sequence in Detail
  3. HTTP Routing Architecture
  4. WebSocket Protocol
  5. Configuration Hot-Reload
  6. Health Monitoring and Presence
  7. Global Architecture Overview

1. Gateway Overview: Single Process, Single Port, Multi-Protocol

The OpenClaw Gateway is the central hub of the entire system. It runs as a single process listening on a single port, carries HTTP and WebSocket traffic side by side on that port, and routes multiple higher-level functional modules (Control UI, OpenAI-compatible API, Webhooks, plugins, Canvas, etc.) over the same connection infrastructure.

Default Port and Bind Modes

The Gateway listens on port 18789 by default. The port can be overridden at startup via the --port parameter or the OPENCLAW_GATEWAY_PORT environment variable. Early in startup, startGatewayServer() writes the resolved port back into that variable, so every port derived from it (browser, canvas) sees the actual runtime port (src/gateway/server.impl.ts:240):

// src/gateway/server.impl.ts:232-240
export async function startGatewayServer(
  port = 18789,
  opts: GatewayServerOptions = {},
): Promise<GatewayServer> {
  // ...
  // Ensure all default port derivations (browser/canvas) see the actual runtime port.
  process.env.OPENCLAW_GATEWAY_PORT = String(port);
  // ... remainder of startup
}

The bind mode is specified via the GatewayServerOptions.bind field (or gateway.bind in openclaw.json), with four options:

Bind Mode   Bind Address                          Use Case
loopback    127.0.0.1                             Local single-device, secure default
lan         0.0.0.0                               Sharing across devices on a LAN
tailnet     Tailscale IPv4 (100.64.0.0/10)        Exposing via Tailscale network
auto        Prefers loopback, falls back to LAN   Auto-detection, suitable for general deployments

When the bind mode is non-loopback, createGatewayRuntimeState() proactively prints a security warning in the logs, prompting the user to ensure authentication is configured (src/gateway/server-runtime-state.ts):

⚠️  Gateway is binding to a non-loopback address.
    Ensure authentication is configured before exposing to public networks.

Common CLI commands for starting the Gateway:

# Start with default port (loopback)
openclaw gateway run

# Bind to LAN, allowing local network connections
openclaw gateway run --bind lan --port 18789

# Run in the background (daemon mode)
nohup openclaw gateway run --bind loopback --port 18789 --force \
  > /tmp/openclaw-gateway.log 2>&1 &

# Check channel status (with probe)
openclaw channels status --probe

# Check health endpoint
curl http://127.0.0.1:18789/health

How HTTP and WebSocket Share a Single Port

Node.js's http.Server emits an upgrade event to handle HTTP-to-WebSocket protocol upgrade requests. The Gateway listens for this event in attachGatewayUpgradeHandler() (src/gateway/server-http.ts, line 650), forwarding requests with the Upgrade: websocket header to the WebSocketServer (ws library), while regular HTTP requests are handled by the handleRequest() function. This allows HTTP and WebSocket traffic to naturally multiplex over the same port without requiring an additional reverse proxy.
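To make the split concrete, here is a minimal sketch of the same idea that uses only node:http and node:crypto. It is not OpenClaw's code. The Gateway hands the socket to the ws library's WebSocketServer, whereas this sketch completes the RFC 6455 opening handshake by hand to show where the two kinds of traffic diverge. The helper names isWebSocketUpgrade, computeAcceptKey, and createSinglePortServer are hypothetical.

```typescript
import { createServer, type IncomingHttpHeaders, type IncomingMessage, type ServerResponse } from "node:http";
import { createHash } from "node:crypto";
import type { Duplex } from "node:stream";

// Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// True when the request asks to switch to the WebSocket protocol.
function isWebSocketUpgrade(headers: IncomingHttpHeaders): boolean {
  return (headers.upgrade ?? "").toLowerCase() === "websocket";
}

// Sec-WebSocket-Accept = base64(SHA-1(client key + GUID)).
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

function createSinglePortServer() {
  // Ordinary HTTP requests land here (OpenClaw runs its stage pipeline at this point).
  const server = createServer((_req: IncomingMessage, res: ServerResponse) => {
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("plain HTTP\n");
  });

  // Once an "upgrade" listener is attached, requests carrying an Upgrade header are
  // emitted here with the raw socket instead of reaching the request handler above.
  server.on("upgrade", (req: IncomingMessage, socket: Duplex) => {
    const key = req.headers["sec-websocket-key"];
    if (!isWebSocketUpgrade(req.headers) || typeof key !== "string") {
      socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
      return;
    }
    // A real server would pass the socket to a WebSocket library here
    // (for example ws's handleUpgrade); this sketch only answers the handshake.
    socket.write(
      "HTTP/1.1 101 Switching Protocols\r\n" +
        "Upgrade: websocket\r\n" +
        "Connection: Upgrade\r\n" +
        `Sec-WebSocket-Accept: ${computeAcceptKey(key)}\r\n\r\n`,
    );
  });

  return server;
}
```

A single listen() call on this server is enough to accept both kinds of traffic on one port.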


2. Startup Sequence in Detail

The startGatewayServer() function is defined at src/gateway/server.impl.ts:232 and serves as the Gateway's main entry point. The entire startup process comprises approximately 48 steps, organized into nine phases.

Source location: src/gateway/server.impl.ts is approximately 1100 lines long. The function runs from its signature at line 232 to the end of the file, where the GatewayServer object is returned. The core startup logic is concentrated in lines 250-750.

Startup Sequence ASCII Diagram

startGatewayServer(port=18789, opts)
│
├─ Phase 1: Config Pipeline
│   ├─ readConfigFileSnapshot()          Read raw openclaw.json snapshot
│   ├─ migrateLegacyConfig()             Detect legacy fields and auto-migrate
│   ├─ writeConfigFile()                 Write migration results back to disk
│   ├─ applyPluginAutoEnable()           Auto-enable qualifying plugins
│   └─ loadConfig()                      Load the final effective configuration
│
├─ Phase 2: Auth & Secrets
│   ├─ prepareSecretsRuntimeSnapshot()   Resolve SecretRefs, validate availability
│   ├─ ensureGatewayStartupAuth()        Ensure auth token is generated/saved
│   ├─ activateSecretsRuntimeSnapshot()  Activate secrets runtime snapshot
│   └─ maybeSeedControlUiAllowedOrigins  Seed CORS allowlist for non-loopback installs
│
├─ Phase 3: Registry & Methods
│   ├─ initSubagentRegistry()            Initialize subagent registry
│   ├─ listGatewayMethods()              Enumerate 73 core WS methods
│   ├─ loadGatewayPlugins()              Load plugins, merge plugin methods into gatewayMethods
│   └─ listChannelPlugins()              List built-in channel plugin gatewayMethods
│
├─ Phase 4: HTTP / WebSocket
│   ├─ loadGatewayTlsRuntime()           Load TLS certificates (optional)
│   ├─ createGatewayRuntimeState()       → {
│   │    ├─ createGatewayHttpServer()    Build HTTP(S) server (Node http/https)
│   │    ├─ WebSocketServer({ maxPayload: 25MB })
│   │    ├─ listenGatewayHttpServer()    Bind port and start accepting connections
│   │    └─ createGatewayBroadcaster()   Broadcaster (scope filtering + slow-consumer protection)
│   │   }
│   └─ attachGatewayUpgradeHandler()     Attach HTTP→WS upgrade handler
│
├─ Phase 5: Services
│   ├─ new NodeRegistry()                Node (Pi/mobile/desktop) registry
│   ├─ buildGatewayCronService()         Scheduled task service
│   ├─ createChannelManager()            Unified channel manager (Slack/Telegram/Discord...)
│   └─ startGatewayDiscovery()           mDNS/wide-area discovery broadcast
│
├─ Phase 6: Maintenance Timers
│   └─ startGatewayMaintenanceTimers()   → {
│        ├─ tick:    broadcast "tick" every 30s (keepalive)
│        ├─ health:  refreshGatewayHealthSnapshot() every 60s
│        └─ dedupe:  clean dedup cache + timeout-abort chat runs every 60s
│       }
│
├─ Phase 7: Event System
│   ├─ onAgentEvent(createAgentEventHandler())  Subscribe to agent runtime events
│   ├─ onHeartbeatEvent()                       Subscribe to heartbeat events → broadcast
│   └─ startHeartbeatRunner()                   Start heartbeat timer loop
│
├─ Phase 8: Sidecars
│   ├─ startGatewayTailscaleExposure()   Tailscale exposure (optional)
│   ├─ startGatewaySidecars() → {
│   │    ├─ startBrowserControlServer()  Browser control (optional)
│   │    ├─ startGmailWatcher()          Gmail hook watcher (optional)
│   │    ├─ startChannels()              Start all configured channels
│   │    └─ pluginServices.start()       Plugin service lifecycle
│   │   }
│   └─ hookRunner.runGatewayStart()      Fire gateway_start plugin hook
│
└─ Phase 9: Config Reloader
    └─ startGatewayConfigReloader()      chokidar watches openclaw.json
         ├─ stabilityThreshold: 200ms
         └─ debounce: 300ms (configurable)

Key Function Details

Phase 1 — readConfigFileSnapshot(): Returns a ConfigFileSnapshot containing the raw JSON, Zod validation result, and migration detection result. If the configuration file is invalid, startGatewayServer() throws an exception with an error message that includes the specific field path and prompts the user to run openclaw doctor.

Phase 2 — activateRuntimeSecrets() (src/gateway/server.impl.ts:318-362): Uses a Promise chain (secretsActivationTail) internally to implement a serialization lock, ensuring that secrets activation does not race under concurrent hot-reload scenarios. If secrets resolution fails, the runtime retains the "last-known-good" snapshot and notifies the user via the SECRETS_RELOADER_DEGRADED system event:

// src/gateway/server.impl.ts:310-316 — Promise chain serialization lock
const runWithSecretsActivationLock = async <T>(operation: () => Promise<T>): Promise<T> => {
  const run = secretsActivationTail.then(operation, operation);
  secretsActivationTail = run.then(
    () => undefined,
    () => undefined,
  );
  return await run;
};
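The same promise-chain pattern can be packaged as a small self-contained helper that demonstrates the two properties the Gateway relies on. Operations run one at a time in call order, and a failed operation does not block the ones queued behind it. createSerialLock is a hypothetical name, and its body follows the excerpt above.

```typescript
// Hypothetical factory wrapping the promise-chain lock shown above.
function createSerialLock() {
  // Tail of the chain; it is re-pointed on every call and never rejects.
  let tail: Promise<void> = Promise.resolve();

  return async function runExclusive<T>(operation: () => Promise<T>): Promise<T> {
    // Start only after the previous operation settles, whether it succeeded or failed.
    const run = tail.then(operation, operation);
    // Swallow the outcome on the tail so one failure does not poison later callers.
    tail = run.then(
      () => undefined,
      () => undefined,
    );
    // The caller still sees this operation's own result or error.
    return await run;
  };
}
```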

Phase 3 — loadGatewayPlugins() (src/gateway/server.impl.ts:428-444): Loads all enabled plugins from disk, injects built-in handlers via coreGatewayHandlers, and merges plugin-provided gatewayMethods into the gatewayMethods array. Deduplication is performed via Array.from(new Set([...])). Because a Set preserves insertion order, the core methods stay first and each channel-plugin method appears only once:

// src/gateway/server.impl.ts:443-444
const channelMethods = listChannelPlugins().flatMap((plugin) => plugin.gatewayMethods ?? []);
const gatewayMethods = Array.from(new Set([...baseGatewayMethods, ...channelMethods]));

Phase 4 — WebSocketServer({ maxPayload }): maxPayload is set to MAX_PAYLOAD_BYTES = 25 * 1024 * 1024 (25 MB), consistent with the client, to prevent unexpected disconnections during high-resolution Canvas snapshot transfers (see comments in src/gateway/server-constants.ts).

Phase 8 — startGatewaySidecars(): Defined in src/gateway/server-startup.ts, this function manages the lifecycle of "sidecar" processes including Browser Control, Gmail Watcher, channel startup, and plugin services. When the Gateway shuts down, all sidecars are stopped in an orderly fashion.


3. HTTP Routing Architecture

The HTTP request processing entry point is the internal function handleRequest() within createGatewayHttpServer(), defined at src/gateway/server-http.ts line 486.

Design Philosophy: Stage Pipeline, First Match Wins

handleRequest() wraps all routes into a GatewayHttpRequestStage[] array, trying each stage in sequence. Any stage returning true means "handled" and terminates the pipeline. The advantage of this design is that route priority is entirely determined by array order, eliminating the need for complex route-table matching logic and making priority changes easy to track in diffs.

HTTP request arrives
│
├─ [Stage 1]  hooks          POST /hooks/*  → Webhook entry point (with rate limiting)
├─ [Stage 2]  tools-invoke   POST /v1/tools/invoke → Direct tool invocation
├─ [Stage 3]  slack          Slack Events API / Interactive Components
├─ [Stage 4]  openresponses  POST /v1/responses (requires enabled=true)
├─ [Stage 5]  openai         POST /v1/chat/completions (requires enabled=true)
├─ [Stage 6]  canvas-auth    Canvas path auth pre-check
├─ [Stage 7]  a2ui           Canvas A2UI assets (/a2ui/*)
├─ [Stage 8]  canvas-http    Canvas WebSocket proxy + static assets
├─ [Stage 9]  plugin-auth    Plugin route auth pre-check
├─ [Stage 10] plugin-http    Plugin-registered custom HTTP routes
├─ [Stage 11] control-ui-avatar  /avatar/* proxy (controlUiEnabled only)
├─ [Stage 12] control-ui-http    Control UI SPA catch-all (controlUiEnabled only)
└─ [Stage 13] gateway-probes GET /health /healthz /ready /readyz
     └─ No match → 404 Not Found

Each stage's run() returns boolean | Promise<boolean>, where true indicates the request has been consumed. runGatewayHttpRequestStages() sequentially awaits each stage and returns upon encountering true.
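A stripped-down version of the first-match-wins loop might look like the following. The MiniRequest and MiniResponse types and the handle() wrapper are hypothetical stand-ins for Node's request/response objects and for runGatewayHttpRequestStages(). Only the control flow mirrors the description.

```typescript
// Hypothetical minimal request/response shapes, in place of Node's objects.
type MiniRequest = { method: string; path: string };
type MiniResponse = { status: number; body: string };

type Stage = {
  name: string;
  run: (req: MiniRequest, res: MiniResponse) => boolean | Promise<boolean>;
};

// Try each stage in array order; the first one returning true owns the request.
async function runStages(stages: readonly Stage[], req: MiniRequest, res: MiniResponse): Promise<string | null> {
  for (const stage of stages) {
    // Awaiting a plain boolean is harmless, so sync and async stages mix freely.
    if (await stage.run(req, res)) return stage.name;
  }
  return null;
}

async function handle(stages: readonly Stage[], req: MiniRequest): Promise<MiniResponse> {
  const res: MiniResponse = { status: 0, body: "" };
  const handledBy = await runStages(stages, req, res);
  if (handledBy === null) {
    // Nothing claimed the request: fall through to 404.
    res.status = 404;
    res.body = "Not Found";
  }
  return res;
}
```

Reordering the array is the only way to change priority, which is what makes such changes easy to spot in a diff.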

Probe Endpoints

Four health probe paths are handled uniformly by handleGatewayProbeRequest():

GET /health   → { ok: true, status: "live" }
GET /healthz  → { ok: true, status: "live" }
GET /ready    → { ok: true, status: "ready" }
GET /readyz   → { ok: true, status: "ready" }

Only GET and HEAD are accepted; other methods receive 405. Responses carry Cache-Control: no-store. These endpoints require no authentication and use the conventional /healthz and /readyz paths, so they can serve directly as Kubernetes liveness/readiness probe targets (a Kubernetes HTTP probe counts any 2xx or 3xx status as success).
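The probe behavior described above fits in one pure function. This is a sketch rather than handleGatewayProbeRequest() itself. The Allow header on 405 responses is this sketch's own addition, because HTTP requires it; the article does not mention it.

```typescript
type ProbeKind = "live" | "ready";
type ProbeResponse = { status: number; headers: Record<string, string>; body?: string };

// Probe path -> reported status, as listed above.
const PROBES = new Map<string, ProbeKind>([
  ["/health", "live"],
  ["/healthz", "live"],
  ["/ready", "ready"],
  ["/readyz", "ready"],
]);

// Returns null for non-probe paths so the caller can fall through to 404.
function handleProbe(method: string, path: string): ProbeResponse | null {
  const kind = PROBES.get(path);
  if (kind === undefined) return null;
  const headers: Record<string, string> = { "cache-control": "no-store" };
  if (method !== "GET" && method !== "HEAD") {
    return { status: 405, headers: { ...headers, allow: "GET, HEAD" } };
  }
  headers["content-type"] = "application/json";
  // HEAD returns the same status and headers without a body.
  const body = method === "HEAD" ? undefined : JSON.stringify({ ok: true, status: kind });
  return { status: 200, headers, body };
}
```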

Webhook Hook Rate Limiting

Defined at the top of src/gateway/server-http.ts:

const HOOK_AUTH_FAILURE_LIMIT = 20;
const HOOK_AUTH_FAILURE_WINDOW_MS = 60_000;

This means more than 20 authentication failures within 60 seconds will trigger rate limiting for that client IP. The Hook rate limiter is scoped via AUTH_RATE_LIMIT_SCOPE_HOOK_AUTH, isolated from the WS authentication rate limiter, so the two do not interfere with each other.
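One way to implement a scoped failure limiter with these constants is a per-IP sliding window, with one limiter instance per scope so that hook and WS counters never mix. The mechanism is an assumption, because the article gives only the constants and the scope name. The exact cutoff is also this sketch's choice: it blocks once 20 failures are recorded in the window. AuthFailureLimiter and the WS limiter's values are hypothetical.

```typescript
const HOOK_AUTH_FAILURE_LIMIT = 20;
const HOOK_AUTH_FAILURE_WINDOW_MS = 60_000;

// Hypothetical limiter: client IP -> timestamps of recent auth failures.
class AuthFailureLimiter {
  private readonly failures = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Drop timestamps that have slid out of the window and return the rest.
  private recent(ip: string, now: number): number[] {
    const kept = (this.failures.get(ip) ?? []).filter((t) => now - t < this.windowMs);
    this.failures.set(ip, kept);
    return kept;
  }

  recordFailure(ip: string, now: number = Date.now()): void {
    this.recent(ip, now).push(now);
  }

  // Blocked once the IP has reached `limit` failures inside the window.
  isBlocked(ip: string, now: number = Date.now()): boolean {
    return this.recent(ip, now).length >= this.limit;
  }
}

// One instance per scope keeps the counters isolated. The WS limits are not
// given in the article and are reused here only for illustration.
const limiters = {
  hookAuth: new AuthFailureLimiter(HOOK_AUTH_FAILURE_LIMIT, HOOK_AUTH_FAILURE_WINDOW_MS),
  wsAuth: new AuthFailureLimiter(HOOK_AUTH_FAILURE_LIMIT, HOOK_AUTH_FAILURE_WINDOW_MS),
};
```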

openclaw.json Configuration Example

{
  "gateway": {
    "bind": "loopback",
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        },
        "responses": {
          "enabled": false
        }
      }
    }
  }
}

After enabling the OpenAI-compatible API via openclaw config set gateway.http.endpoints.chatCompletions.enabled true, Stage 5 (openai) is added to the pipeline. If not enabled, the stage is never added to the array — zero overhead.
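The zero-overhead claim follows from building the stage array once from configuration, so a disabled endpoint never appears in it. Here is a sketch under that assumption. It uses a hypothetical buildStageNames() and shows only a few of the 13 stages.

```typescript
// Hypothetical slice of gateway.http.endpoints from the openclaw.json example above.
type EndpointConfig = {
  chatCompletions?: { enabled?: boolean };
  responses?: { enabled?: boolean };
};

type StageName = "hooks" | "openresponses" | "openai" | "gateway-probes";

// Built once at startup; disabled endpoints are never pushed, so requests
// never pay for checking them.
function buildStageNames(endpoints: EndpointConfig): StageName[] {
  const stages: StageName[] = ["hooks"];
  if (endpoints.responses?.enabled === true) stages.push("openresponses");
  if (endpoints.chatCompletions?.enabled === true) stages.push("openai");
  stages.push("gateway-probes"); // probes stay last, as in the pipeline above
  return stages;
}
```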


4. WebSocket Protocol

Frame Type Definitions

The WebSocket frame schema is defined in src/gateway/protocol/schema/frames.ts and built with @sinclair/typebox. There are three top-level frame types, which form a discriminated union keyed on the type field:

Frame Type   Direction          Description
req          Client → Server    Method call request; carries id, method, params
res          Server → Client    Method response; carries id, ok, payload or error
event        Server → Client    Server broadcast; carries event, payload, seq, stateVersion

// RequestFrameSchema (src/gateway/protocol/schema/frames.ts)
{ type: "req", id: string, method: string, params?: unknown }

// ResponseFrameSchema
{ type: "res", id: string, ok: boolean, payload?: unknown, error?: ErrorShape }

// EventFrameSchema
{ type: "event", event: string, payload?: unknown, seq?: number, stateVersion?: StateVersion }

The seq field in event frames is a monotonically increasing global sequence number (maintained by createGatewayBroadcaster()), allowing clients to detect dropped frames. stateVersion contains presence and health version numbers, enabling clients to determine whether they need to refresh their local cache.
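Here is a client-side sketch of handling these frames. It has two parts: a structural check that narrows raw JSON into the three-way union, and a tracker that uses seq to count frames a client never received. The types are hand-written mirrors of the schema above, not the TypeBox definitions, and parseFrame and createSeqTracker are hypothetical. Following the description above, the tracker assumes that event frames delivered to a client carry consecutive seq values unless some were dropped.

```typescript
// Hand-written mirrors of the three frame shapes (error and stateVersion left loose).
type RequestFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResponseFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
type EventFrame = { type: "event"; event: string; payload?: unknown; seq?: number };
type Frame = RequestFrame | ResponseFrame | EventFrame;

// Minimal structural check; a real client would validate against the schema.
function parseFrame(raw: string): Frame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const f = value as Record<string, unknown>;
  switch (f.type) {
    case "req":
      return typeof f.id === "string" && typeof f.method === "string" ? (value as RequestFrame) : null;
    case "res":
      return typeof f.id === "string" && typeof f.ok === "boolean" ? (value as ResponseFrame) : null;
    case "event":
      return typeof f.event === "string" ? (value as EventFrame) : null;
    default:
      return null;
  }
}

// Returns how many frames were skipped between the previous seq and this one.
function createSeqTracker() {
  let last: number | undefined;
  return (frame: EventFrame): number => {
    if (frame.seq === undefined) return 0;
    const missed = last === undefined ? 0 : Math.max(0, frame.seq - last - 1);
    last = frame.seq;
    return missed;
  };
}
```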

Key Constants (src/gateway/server-constants.ts)

export const MAX_PAYLOAD_BYTES    = 25 * 1024 * 1024;  // 25 MB, max single-frame payload
export const MAX_BUFFERED_BYTES   = 50 * 1024 * 1024;  // 50 MB, per-connection send buffer cap
export const DEFAULT_HANDSHAKE_TIMEOUT_MS = 10_000;    // Handshake timeout: 10 seconds
export const TICK_INTERVAL_MS     = 30_000;             // Keepalive tick: every 30 seconds
export const HEALTH_REFRESH_INTERVAL_MS = 60_000;      // Health snapshot refresh: every 60 seconds
export const DEDUPE_TTL_MS        = 5 * 60_000;        // Dedup TTL: 5 minutes

Connection Handshake Sequence

attachGatewayWsConnectionHandler() (src/gateway/server/ws-connection.ts, line 93) is the main handler for WebSocket connections. The handshake flow is as follows:

Client                          Gateway (server)
  │                                   │
  │──── TCP/TLS + HTTP Upgrade ──────>│
  │                                   │ wss.on("connection")
  │                                   │ connId = randomUUID()
  │                                   │ handshakeTimer = setTimeout(10s)
  │<─── event: connect.challenge ─────│  { nonce, ts }
  │                                   │
  │──── req: connect ────────────────>│  {
  │       (minProtocol, maxProtocol,  │    auth.token / auth.password,
  │        client info, auth, caps)   │    device / scopes
  │                                   │  }
  │                                   │ Validate token / rate limiter check
  │                                   │ Negotiate protocol version
  │                                   │ clearTimeout(handshakeTimer)
  │<─── res: hello-ok ────────────────│  {
  │       (protocol, server.connId,   │    features.methods[73+],
  │        features, snapshot,        │    features.events[],
  │        policy, canvasHostUrl)     │    policy.maxPayload,
  │                                   │    policy.tickIntervalMs,
  │                                   │    snapshot (presence+health)
  │                                   │  }
  │                                   │ clients.add(client)
  │                                   │ handshakeState = "connected"
  │                                   │
  │<══════ Normal bidirectional communication ═══════│
  │                                   │
  │   Every 30s:                      │
  │<─── event: tick ──────────────────│  { ts } (dropIfSlow=true)
  │                                   │
  │──── req: {method} ───────────────>│
  │<─── res: {id, ok, payload} ───────│
  │                                   │
  │<─── event: agent / health / ... ──│  (broadcast)

After a successful handshake, the server lists all available methods and events in the features field of hello-ok. Clients should use this as the source of truth and should not hard-code method names.
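The article states only that the server negotiates a protocol version from the client's minProtocol/maxProtocol. The sketch below assumes the server picks the highest version in the overlapping range, and the server range constants are invented. The second helper shows the recommended client habit of checking features.methods from hello-ok before calling a method.

```typescript
// Hypothetical server-side supported range (values invented for illustration).
const SERVER_MIN_PROTOCOL = 1;
const SERVER_MAX_PROTOCOL = 3;

// Pick the highest version both sides accept, or null if the ranges do not overlap.
function negotiateProtocol(clientMin: number, clientMax: number): number | null {
  const high = Math.min(clientMax, SERVER_MAX_PROTOCOL);
  const low = Math.max(clientMin, SERVER_MIN_PROTOCOL);
  return high >= low ? high : null;
}

// Client side: trust the method list from hello-ok instead of a hard-coded table.
type HelloOk = { protocol: number; features: { methods: string[]; events: string[] } };

function makeCapabilityCheck(hello: HelloOk): (method: string) => boolean {
  const methods = new Set(hello.features.methods);
  return (method) => methods.has(method);
}
```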

Method Dispatch

Including the methods contributed by channel plugins, the Gateway has 100 built-in methods. The core methods (73, per the startup trace above) are listed in the BASE_METHODS array defined in src/gateway/server-methods-list.ts and cover:

  • Session management: sessions.list, sessions.preview, sessions.reset, sessions.compact, etc.
  • Channel operations: channels.status, channels.logout
  • Agent control: agent, agents.list, agents.create, agents.update, agents.delete
  • Models/tools: models.list, tools.catalog
  • Scheduled tasks: cron.list, cron.add, cron.run, etc.
  • Node pairing: node.pair.request, node.invoke, node.list
  • Device auth: device.pair.approve, device.token.rotate
  • Chat: chat.send, chat.abort, chat.history (WebChat native methods)
  • Skills: skills.install, skills.update, skills.bins
  • TTS: tts.convert, tts.providers, tts.setProvider

Plugins inject additional method handlers via pluginRegistry.gatewayHandlers; channel plugins contribute method names via plugin.gatewayMethods.
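Here is a sketch of how a method table and dispatcher could turn req frames into res frames. Several details are assumptions for illustration that the article does not describe: the handler signature, the error codes UNKNOWN_METHOD and HANDLER_ERROR, and the rule that a plugin handler overrides a core handler with the same name.

```typescript
type RequestFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResponseFrame =
  | { type: "res"; id: string; ok: true; payload?: unknown }
  | { type: "res"; id: string; ok: false; error: { code: string; message: string } };

// A handler may return a value or a promise of one.
type Handler = (params: unknown) => unknown;

// Later entries win in a Map, so plugin handlers shadow core ones here (assumed precedence).
function buildHandlerTable(core: Record<string, Handler>, plugins: Record<string, Handler>): Map<string, Handler> {
  return new Map<string, Handler>([...Object.entries(core), ...Object.entries(plugins)]);
}

async function dispatch(table: Map<string, Handler>, req: RequestFrame): Promise<ResponseFrame> {
  const handler = table.get(req.method);
  if (!handler) {
    return { type: "res", id: req.id, ok: false, error: { code: "UNKNOWN_METHOD", message: `unknown method: ${req.method}` } };
  }
  try {
    // Echo the request id so the client can match the response.
    return { type: "res", id: req.id, ok: true, payload: await handler(req.params) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "res", id: req.id, ok: false, error: { code: "HANDLER_ERROR", message } };
  }
}
```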

Slow Consumer Protection

The broadcast function broadcastInternal() (src/gateway/server-broadcast.ts) checks each client's socket.bufferedAmount before sending:

for (const c of clients) {
  const slow = c.socket.bufferedAmount > MAX_BUFFERED_BYTES; // 50 MB
  if (slow && opts?.dropIfSlow) {
    continue; // Low-priority events like tick / heartbeat are simply dropped
  }
  if (slow) {
    c.socket.close(1008, "slow consumer"); // Critical events: forcefully disconnect the slow client
    continue;
  }
  // ... otherwise send the serialized event frame to this client
}

This prevents a single slow-consuming client from causing unbounded memory growth in the Gateway process. dropIfSlow: true is used for keepalive events such as tick and heartbeat, where loss is acceptable. For business-critical events such as agent and health, a slow client is instead disconnected with close code 1008 (policy violation) rather than left to silently miss them.
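The excerpt above can be expanded into a self-contained broadcaster that exercises both branches. SocketLike captures only the three members used; bufferedAmount, send, and close all exist on ws's WebSocket. Removing the client from the set right after close() is a simplification in this sketch, and createBroadcaster is a hypothetical name.

```typescript
const MAX_BUFFERED_BYTES = 50 * 1024 * 1024;

// Minimal view of a ws-style socket.
interface SocketLike {
  bufferedAmount: number;
  send(data: string): void;
  close(code: number, reason: string): void;
}

type Client = { id: string; socket: SocketLike };

function createBroadcaster(clients: Set<Client>) {
  let seq = 0; // global sequence number shared by all recipients
  return (event: string, payload: unknown, opts?: { dropIfSlow?: boolean }): void => {
    const frame = JSON.stringify({ type: "event", event, payload, seq: ++seq });
    for (const c of clients) {
      const slow = c.socket.bufferedAmount > MAX_BUFFERED_BYTES;
      if (slow && opts?.dropIfSlow) continue; // lossy event: skip only this client
      if (slow) {
        c.socket.close(1008, "slow consumer"); // must-deliver event: cut the client loose
        clients.delete(c); // deleting during for...of over a Set is safe
        continue;
      }
      c.socket.send(frame);
    }
  };
}
```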


5. Configuration Hot-Reload

Four Reload Modes

The hot-reload system is defined in src/gateway/config-reload.ts and src/gateway/config-reload-plan.ts, configured via gateway.reload.mode.

Default configuration (src/gateway/config-reload.ts:16-19):

// src/gateway/config-reload.ts:16-19
const DEFAULT_RELOAD_SETTINGS: GatewayReloadSettings = {
  mode: "hybrid",
  debounceMs: 300,
};

The four modes:

Mode      Behavior
off       Completely disabled; file changes trigger no action
restart   Any change triggers a full gateway restart
hot       Only performs hot-reload; changes requiring a restart are ignored (a warning is logged)
hybrid    Default: attempts hot-reload first; triggers a restart if the change involves paths that require it

Chokidar File Watching

const watcher = chokidar.watch(opts.watchPath, {
  ignoreInitial: true,
  awaitWriteFinish: { stabilityThreshold: 200, pollInterval: 50 },  // File stability threshold 200ms
  usePolling: Boolean(process.env.VITEST),
});

stabilityThreshold: 200 prevents multiple reloads when editors write files in multiple passes (e.g., VS Code's safe write mechanism). debounceMs (default 300ms) is read from configuration by resolveGatewayReloadSettings() and adds an additional delay after the chokidar event fires before performing the actual snapshot read.
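The debounceMs step is typically a trailing debounce: each new change event restarts the timer, so a burst of writes yields a single reload. Whether runReload() is wired exactly this way is an assumption. debounce() below is a generic helper, not OpenClaw code.

```typescript
// Trailing debounce: fn runs once, debounceMs after the last call in a burst.
function debounce(fn: () => void, debounceMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer); // a new event resets the countdown
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, debounceMs);
  };
}

// Usage with the watcher above (not executed here):
//   const scheduleReload = debounce(() => void runReload(), 300);
//   watcher.on("change", scheduleReload);
```

Together, chokidar's stabilityThreshold and this debounce stack two delays: the first waits for the file size to settle, and the second coalesces separate change events.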

diffConfigPaths: Recursive Deep Diff

diffConfigPaths(prev, next, prefix) (src/gateway/config-reload.ts:23-52) is a pure function that recursively compares two configuration objects and returns a list of changed dot-path strings:

diffConfigPaths(
  { gateway: { auth: { token: "old" } } },
  { gateway: { auth: { token: "new" } } },
)
// => ["gateway.auth.token"]

For array types, it uses isDeepStrictEqual for whole-array comparison rather than per-element expansion, avoiding false-positive index paths (such as memory.qmd.paths.0) for fields like memory.qmd.paths.
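Here is a sketch of a diff with the behavior described above. Plain objects recurse and produce dot-paths, while leaves and arrays are compared wholesale with Node's util.isDeepStrictEqual, so an edited array is reported once at its own path. This is a reconstruction from the description, not the actual source. Reporting an added or removed subtree as a single path is this sketch's choice.

```typescript
import { isDeepStrictEqual } from "node:util";

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Objects recurse; everything else (including arrays) is compared as a whole.
function diffConfigPaths(prev: unknown, next: unknown, prefix = ""): string[] {
  if (isPlainObject(prev) && isPlainObject(next)) {
    const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
    const changed: string[] = [];
    for (const key of keys) {
      const path = prefix ? `${prefix}.${key}` : key;
      changed.push(...diffConfigPaths(prev[key], next[key], path));
    }
    return changed;
  }
  if (isDeepStrictEqual(prev, next)) return [];
  // A changed leaf or array. "<root>" (hypothetical) covers a non-object root.
  return [prefix || "<root>"];
}
```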

GatewayReloadPlan: Per-Subsystem Restart Flags

buildGatewayReloadPlan(changedPaths) (src/gateway/config-reload-plan.ts:137) maps the diff results to a GatewayReloadPlan:

// src/gateway/config-reload-plan.ts:6-19
export type GatewayReloadPlan = {
  changedPaths: string[];
  restartGateway: boolean;        // true → requires full restart
  restartReasons: string[];       // Paths that triggered restart
  hotReasons: string[];           // Paths handled via hot-reload
  reloadHooks: boolean;           // Reload Webhook configuration
  restartGmailWatcher: boolean;   // Restart Gmail watcher
  restartBrowserControl: boolean; // Restart Browser Control service
  restartCron: boolean;           // Rebuild CronService
  restartHeartbeat: boolean;      // Rebuild HeartbeatRunner
  restartHealthMonitor: boolean;  // Rebuild ChannelHealthMonitor
  restartChannels: Set<ChannelKind>; // Set of channels that need restart
  noopPaths: string[];            // No-op paths (known safe to ignore)
};

BASE_RELOAD_RULES: Prefix-Priority Matching Rules

The rule table (src/gateway/config-reload-plan.ts:36-70) uses a "prefix-priority, first-match" approach. matchRule(path) iterates through the rule list and returns the first rule where path === rule.prefix || path.startsWith(rule.prefix + "."):

// src/gateway/config-reload-plan.ts:36-70 — Key rules excerpt
const BASE_RELOAD_RULES: ReloadRule[] = [
  { prefix: "gateway.remote", kind: "none" },
  { prefix: "gateway.reload", kind: "none" },
  { prefix: "gateway.channelHealthCheckMinutes", kind: "hot", actions: ["restart-health-monitor"] },
  { prefix: "hooks.gmail", kind: "hot", actions: ["restart-gmail-watcher"] },
  { prefix: "hooks", kind: "hot", actions: ["reload-hooks"] },
  { prefix: "agents.defaults.heartbeat", kind: "hot", actions: ["restart-heartbeat"] },
  { prefix: "models", kind: "hot", actions: ["restart-heartbeat"] },
  { prefix: "cron", kind: "hot", actions: ["restart-cron"] },
  { prefix: "browser", kind: "hot", actions: ["restart-browser-control"] },
];

Complete prefix matching map:

gateway.remote          → none    (remote config path, noop)
gateway.reload          → none    (reload's own config, prevents reload loop)
gateway.channelHealthCheckMinutes → hot + restart-health-monitor
gateway.channelHealthTiming → hot + restart-health-monitor
hooks.gmail             → hot + restart-gmail-watcher
hooks                   → hot + reload-hooks
agents.defaults.heartbeat → hot + restart-heartbeat
models                  → hot + restart-heartbeat
cron                    → hot + restart-cron
browser                 → hot + restart-browser-control
plugins                 → restart  (plugin changes require restart)
gateway                 → restart  (other gateway config requires restart)
discovery               → restart
canvasHost              → restart
meta / identity / agents / tools / routing / ... → none (noop)

Paths that do not match any rule default to restart (conservative principle).
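Putting the rule table and the conservative default together, a simplified planner might look like this. It uses a subset of the rules above and collapses GatewayReloadPlan's per-subsystem booleans into a Set of action names. RULES and buildPlan are hypothetical.

```typescript
type ReloadAction =
  | "restart-health-monitor"
  | "restart-gmail-watcher"
  | "reload-hooks"
  | "restart-heartbeat"
  | "restart-cron"
  | "restart-browser-control";

type ReloadRule = { prefix: string; kind: "none" | "hot" | "restart"; actions?: ReloadAction[] };

// Subset of the table above; order matters because the first match wins.
const RULES: ReloadRule[] = [
  { prefix: "gateway.reload", kind: "none" },
  { prefix: "hooks.gmail", kind: "hot", actions: ["restart-gmail-watcher"] },
  { prefix: "hooks", kind: "hot", actions: ["reload-hooks"] },
  { prefix: "cron", kind: "hot", actions: ["restart-cron"] },
  { prefix: "plugins", kind: "restart" },
  { prefix: "gateway", kind: "restart" },
  { prefix: "meta", kind: "none" },
];

// Exact match, or the prefix followed by a dot ("hooksExtra" must not match "hooks").
function matchRule(path: string): ReloadRule | undefined {
  return RULES.find((r) => path === r.prefix || path.startsWith(r.prefix + "."));
}

type Plan = {
  restartGateway: boolean;
  restartReasons: string[];
  hotReasons: string[];
  noopPaths: string[];
  actions: Set<ReloadAction>;
};

function buildPlan(changedPaths: string[]): Plan {
  const plan: Plan = { restartGateway: false, restartReasons: [], hotReasons: [], noopPaths: [], actions: new Set<ReloadAction>() };
  for (const path of changedPaths) {
    const rule = matchRule(path);
    if (!rule || rule.kind === "restart") {
      // Unknown paths fall back to a full restart (conservative default).
      plan.restartGateway = true;
      plan.restartReasons.push(path);
    } else if (rule.kind === "hot") {
      plan.hotReasons.push(path);
      for (const action of rule.actions ?? []) plan.actions.add(action);
    } else {
      plan.noopPaths.push(path);
    }
  }
  return plan;
}
```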

Hybrid Mode Decision Tree

File change detected (chokidar)
│
├─ debounce 300ms
│
└─ runReload()
    ├─ readConfigFileSnapshot()
    ├─ handleMissingSnapshot()   → If missing, retry 2 times (150ms interval); if still missing, warn and skip
    ├─ handleInvalidSnapshot()   → If Zod validation fails, warn and skip
    └─ applySnapshot(nextConfig)
        ├─ diffConfigPaths(currentConfig, nextConfig)
        ├─ changedPaths.length == 0  → no-op
        ├─ mode == "off"             → log + no-op
        ├─ mode == "restart"         → queueRestart(plan, nextConfig)
        ├─ plan.restartGateway == true
        │   ├─ mode == "hot"         → warn log + ignore (no restart)
        │   └─ mode == "hybrid"      → queueRestart(plan, nextConfig)
        └─ plan.restartGateway == false
            └─ onHotReload(plan, nextConfig)  → applyHotReload() (atomically switch each subsystem)
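The decision tree reduces to a pure function that returns what should happen instead of performing it, which makes the mode semantics easy to test. decideReload and the Decision labels are hypothetical names.

```typescript
type ReloadMode = "off" | "restart" | "hot" | "hybrid";
type Decision = "noop" | "restart" | "hot-reload" | "ignored";

// Mirrors the tree above branch by branch.
function decideReload(mode: ReloadMode, changedPaths: readonly string[], restartGateway: boolean): Decision {
  if (changedPaths.length === 0) return "noop";
  if (mode === "off") return "noop"; // logged, nothing else
  if (mode === "restart") return "restart"; // every change restarts
  if (restartGateway) {
    // hot: warn and skip; hybrid: fall back to a full restart
    return mode === "hot" ? "ignored" : "restart";
  }
  return "hot-reload";
}
```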

openclaw.json Hot-Reload Configuration Example

{
  "gateway": {
    "reload": {
      "mode": "hybrid",
      "debounceMs": 300
    }
  }
}

Disable auto-reload (suitable for production with manual control):

{
  "gateway": {
    "reload": {
      "mode": "off"
    }
  }
}

Force all changes to go through a full restart (most conservative mode):

{
  "gateway": {
    "reload": {
      "mode": "restart"
    }
  }
}

6. Health Monitoring and Presence

ChannelHealthMonitor Architecture

startChannelHealthMonitor() (src/gateway/channel-health-monitor.ts:77) implements periodic health checks for all configured channel accounts, automatically restarting unhealthy channel connections.

Default timing parameters (src/gateway/channel-health-monitor.ts:12-25):

// src/gateway/channel-health-monitor.ts:12-25
const DEFAULT_CHECK_INTERVAL_MS         = 5 * 60_000;    //  5 minutes (check interval)
const DEFAULT_MONITOR_STARTUP_GRACE_MS  = 60_000;        //  1 minute (monitor startup grace period)
const DEFAULT_COOLDOWN_CYCLES           = 2;             //  → cooldown = 2 × 5min = 10 minutes
const DEFAULT_MAX_RESTARTS_PER_HOUR     = 10;            //  Max 10 restarts per hour
const ONE_HOUR_MS = 60 * 60_000;

const DEFAULT_STALE_EVENT_THRESHOLD_MS  = 30 * 60_000;   // 30 minutes (no events → stale socket)
const DEFAULT_CHANNEL_CONNECT_GRACE_MS  = 120_000;       //  2 minutes (channel connect grace period)

The Gateway creates the monitor at src/gateway/server.impl.ts:694-699:

// src/gateway/server.impl.ts:694-699
let channelHealthMonitor = healthCheckDisabled
  ? null
  : startChannelHealthMonitor({
      channelManager,
      checkIntervalMs: (healthCheckMinutes ?? 5) * 60_000,
    });

The "stale socket" scenario targets platforms like Slack, where a WebSocket connection can appear alive (the health check passes) even though the platform has silently stopped pushing events. The monitor tracks each account's lastEventAt timestamp and triggers a restart when no events have arrived for 30 minutes (DEFAULT_STALE_EVENT_THRESHOLD_MS).

evaluateChannelHealth Decision Tree

runCheck() (every 5 minutes)
│
├─ now - startedAt < monitorStartupGraceMs (1min)  → skip all checks
│
└─ for each channelId / accountId in snapshot:
    ├─ isManuallyStopped?  → skip
    ├─ evaluateChannelHealth(status, policy)
    │   ├─ healthy == true  → skip
    │   └─ healthy == false
    │       ├─ Check cooldown: now - lastRestartAt <= cooldownMs (10min)  → skip
    │       ├─ pruneOldRestarts (sliding window, 1 hour)
    │       ├─ restartsThisHour >= maxRestartsPerHour (10)  → warn + skip
    │       └─ Execute restart:
    │           ├─ stopChannel(channelId, accountId)
    │           ├─ resetRestartAttempts()
    │           ├─ startChannel(channelId, accountId)
    │           └─ Update RestartRecord

Restart records (RestartRecord) are stored in a Map<string, RestartRecord> keyed by "channelId:accountId", supporting independent counters for each account in multi-account scenarios.
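The cooldown, hourly cap, and stale-socket checks can be sketched as a single per-account decision. The health rule here is deliberately simplified: an account is unhealthy if it is disconnected, or if it is connected but has been silent for 30 minutes. An account with no recorded lastEventAt is treated as healthy. The real evaluateChannelHealth() also applies startup and connect grace periods. shouldRestart and isHealthy are hypothetical.

```typescript
const CHECK_INTERVAL_MS = 5 * 60_000;
const COOLDOWN_MS = 2 * CHECK_INTERVAL_MS; // DEFAULT_COOLDOWN_CYCLES x interval = 10 min
const MAX_RESTARTS_PER_HOUR = 10;
const ONE_HOUR_MS = 60 * 60_000;
const STALE_EVENT_THRESHOLD_MS = 30 * 60_000;

type AccountStatus = { connected: boolean; lastEventAt?: number };
type RestartRecord = { lastRestartAt: number; restartsThisHour: number[] };

// Simplified health rule (assumption): disconnected, or connected but silent too long.
function isHealthy(s: AccountStatus, now: number): boolean {
  if (!s.connected) return false;
  return s.lastEventAt === undefined || now - s.lastEventAt < STALE_EVENT_THRESHOLD_MS;
}

// Keyed by "channelId:accountId" so each account has independent counters.
const records = new Map<string, RestartRecord>();

// Returns true (and records the restart) when a restart should happen now.
function shouldRestart(channelId: string, accountId: string, status: AccountStatus, now: number): boolean {
  if (isHealthy(status, now)) return false;
  const key = `${channelId}:${accountId}`;
  const rec: RestartRecord = records.get(key) ?? { lastRestartAt: -Infinity, restartsThisHour: [] };
  if (now - rec.lastRestartAt <= COOLDOWN_MS) return false; // still cooling down
  rec.restartsThisHour = rec.restartsThisHour.filter((t) => now - t < ONE_HOUR_MS); // sliding window
  if (rec.restartsThisHour.length >= MAX_RESTARTS_PER_HOUR) return false; // hourly cap reached
  rec.lastRestartAt = now;
  rec.restartsThisHour.push(now);
  records.set(key, rec);
  return true;
}
```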

Adjusting Timing via Configuration

resolveHealthTimingFromConfig() converts user-configured "minutes" into internal milliseconds:

{
  "gateway": {
    "channelHealthCheckMinutes": 10,
    "channelHealthTiming": {
      "startupGraceMinutes": 2,
      "connectGraceMinutes": 3,
      "staleEventThresholdMinutes": 60
    }
  }
}
  • channelHealthCheckMinutes: 0 completely disables health monitoring (healthCheckDisabled = true, no monitor instance is created).
  • The three fields in channelHealthTiming are independently configurable; unset fields use their default values.
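The minutes-to-milliseconds conversion can be sketched as follows. The defaults come from the constants quoted earlier: 5-minute checks, 1-minute startup grace, 2-minute connect grace, and a 30-minute stale threshold. resolveHealthTiming and its field names are hypothetical stand-ins for resolveHealthTimingFromConfig().

```typescript
type HealthTimingConfig = {
  channelHealthCheckMinutes?: number;
  channelHealthTiming?: {
    startupGraceMinutes?: number;
    connectGraceMinutes?: number;
    staleEventThresholdMinutes?: number;
  };
};

type HealthTiming = {
  checkIntervalMs: number;
  monitorStartupGraceMs: number;
  channelConnectGraceMs: number;
  staleEventThresholdMs: number;
};

const MINUTE_MS = 60_000;

// Returns null when monitoring is disabled (channelHealthCheckMinutes: 0).
function resolveHealthTiming(cfg: HealthTimingConfig): HealthTiming | null {
  const checkMinutes = cfg.channelHealthCheckMinutes ?? 5;
  if (checkMinutes === 0) return null;
  const t = cfg.channelHealthTiming;
  // Each field falls back to its own default independently.
  return {
    checkIntervalMs: checkMinutes * MINUTE_MS,
    monitorStartupGraceMs: (t?.startupGraceMinutes ?? 1) * MINUTE_MS,
    channelConnectGraceMs: (t?.connectGraceMinutes ?? 2) * MINUTE_MS,
    staleEventThresholdMs: (t?.staleEventThresholdMinutes ?? 30) * MINUTE_MS,
  };
}
```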

Changes to these fields trigger the restart-health-monitor action, which is classified as hot. In hybrid or hot mode, they therefore take effect without a full Gateway restart:

{
  "gateway": {
    "reload": {
      "mode": "hybrid"
    },
    "channelHealthCheckMinutes": 10
  }
}

Presence System

Presence works in conjunction with health monitoring. When clients connect or disconnect, upsertPresence() updates the online status, incrementPresenceVersion() bumps the presence version number, and broadcastPresenceSnapshot() pushes the latest presence snapshot to all connected clients. The presence version is also carried in the stateVersion field of broadcast event frames, including the tick frame sent every 30 seconds, so clients can detect whether they need to request a refresh.
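Here is a minimal sketch of version-stamped presence. Every mutation bumps a counter, and a client compares the counter it last saw with the one in an incoming stateVersion to decide whether to refetch. createPresence, needsRefresh, and the { presence: number } shape of stateVersion are assumptions for illustration.

```typescript
type PresenceEntry = { clientId: string; online: boolean; lastSeen: number };

// Hypothetical presence store: each upsert bumps the version clients see.
function createPresence() {
  const entries = new Map<string, PresenceEntry>();
  let version = 0;
  return {
    upsert(clientId: string, online: boolean, now: number): number {
      entries.set(clientId, { clientId, online, lastSeen: now });
      version += 1;
      return version;
    },
    snapshot(): { version: number; entries: PresenceEntry[] } {
      return { version, entries: [...entries.values()] };
    },
  };
}

// Client side: refetch only when the server's version moved past the cached one.
function needsRefresh(cachedVersion: number, stateVersion: { presence: number }): boolean {
  return stateVersion.presence > cachedVersion;
}
```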


7. Global Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                    OpenClaw Gateway (single process, single port 18789) │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                   HTTP / TLS Layer (Node http/https)          │   │
│  │                                                             │   │
│  │  incoming request                                           │   │
│  │       │                                                     │   │
│  │  Upgrade: websocket? ──YES──→ attachGatewayUpgradeHandler   │   │
│  │       │                              │                      │   │
│  │       NO                             ↓                      │   │
│  │       ↓                       WebSocketServer               │   │
│  │  handleRequest()              (maxPayload=25MB)             │   │
│  │  Stage Pipeline:              attachGatewayWsConnectionHandler│ │
│  │  [hooks]                      ├─ connect.challenge          │   │
│  │  [tools-invoke]               ├─ handshake (10s timeout)    │   │
│  │  [slack]                      ├─ hello-ok (methods+events)  │   │
│  │  [openresponses]              └─ req/res/event frames       │   │
│  │  [openai]                                                   │   │
│  │  [canvas-auth]                                              │   │
│  │  [a2ui]                                                     │   │
│  │  [canvas-http]                                              │   │
│  │  [plugin-auth]                                              │   │
│  │  [plugin-http]                                              │   │
│  │  [control-ui-avatar]                                        │   │
│  │  [control-ui-http]                                          │   │
│  │  [gateway-probes]                                           │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
│  │ NodeRegistry │  │ CronService  │  │   ChannelManager         │  │
│  │ (Pi/mobile/  │  │ (scheduled   │  │ Slack/Telegram/Discord/  │  │
│  │  desktop     │  │  tasks)      │  │ Signal/iMessage/...      │  │
│  │  nodes)      │  │              │  │                          │  │
│  └──────────────┘  └──────────────┘  └──────────┬───────────────┘  │
│                                                  │                  │
│  ┌────────────────────────────────┐   ┌──────────▼───────────────┐  │
│  │   Maintenance Timers           │   │  ChannelHealthMonitor    │  │
│  │  tick:   30s  keepalive        │   │  check: 5min             │  │
│  │  health: 60s  snapshot refresh │   │  stale: 30min            │  │
│  │  dedupe: 60s  cache cleanup    │   │  max restarts: 10/hr     │  │
│  └────────────────────────────────┘   └──────────────────────────┘  │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │               Config Reloader (chokidar)                     │   │
│  │  openclaw.json → diffConfigPaths → buildGatewayReloadPlan    │   │
│  │  stabilityThreshold: 200ms  debounce: 300ms                  │   │
│  │  mode: off / restart / hot / hybrid(default)                 │   │
│  │                                                              │   │
│  │  hot reload:  hooks / heartbeat / cron / health-monitor      │   │
│  │               / browser-control / per-channel restart        │   │
│  │  restart:     gateway / plugins / discovery / canvasHost     │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  ┌────────────────────┐  ┌────────────────────┐                    │
│  │   Secrets Runtime  │  │  HeartbeatRunner   │                    │
│  │  activateSnapshot  │  │  (agent heartbeat  │                    │
│  │  last-known-good   │  │   loop)            │                    │
│  └────────────────────┘  └────────────────────┘                    │
│                                                                     │
│  Sidecars: BrowserControl · GmailWatcher · Tailscale · PluginSvcs  │
└─────────────────────────────────────────────────────────────────────┘
         ↑ WS clients: macOS app / iOS / Android / Web UI / Pi nodes
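The left column of the diagram, a first-match-wins stage pipeline, can be sketched as follows. This assumes each stage is a function that returns true once it has handled a request; the Stage, Req, and Res types, the feature flag, the buildStages and runPipeline helpers, and the route paths are all hypothetical, and the real pipeline operates on Node's http request/response objects rather than these stand-ins. Only three of the 13 stages are modeled.

```typescript
// Simplified stand-ins for Node's IncomingMessage/ServerResponse (hypothetical).
interface Req {
  method: string;
  path: string;
}

interface Res {
  status: number;
  body: string;
}

// A stage returns true if it handled the request, false to pass it on.
interface Stage {
  name: string;
  handle: (req: Req, res: Res) => boolean;
}

// Hypothetical feature flag; a disabled feature never gets a stage at all.
interface Features {
  openaiCompat: boolean;
}

function buildStages(features: Features): Stage[] {
  const stages: Stage[] = [
    {
      name: "hooks",
      handle: (req, res) => {
        if (!req.path.startsWith("/hooks/")) return false;
        res.status = 202;
        res.body = "hook accepted";
        return true;
      },
    },
  ];
  if (features.openaiCompat) {
    stages.push({
      name: "openai",
      handle: (req, res) => {
        if (req.method !== "POST" || req.path !== "/v1/chat/completions") return false;
        res.status = 200;
        res.body = "completion";
        return true;
      },
    });
  }
  // Probes are registered last, matching their position in the diagram.
  stages.push({
    name: "gateway-probes",
    handle: (req, res) => {
      if (req.path !== "/health") return false;
      res.status = 200;
      res.body = "ok";
      return true;
    },
  });
  return stages;
}

// First match wins: the loop stops at the first stage that claims the request.
function runPipeline(stages: Stage[], req: Req): Res {
  const res: Res = { status: 404, body: "not found" };
  for (const stage of stages) {
    if (stage.handle(req, res)) break;
  }
  return res;
}
```

With openaiCompat disabled, a request in this sketch walks two stages instead of three, which mirrors the zero-overhead rule for disabled features: the check is not skipped at runtime, it simply never exists.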

Core Source File Quick Reference

| Function | Source File |
|---|---|
| Gateway main entry | src/gateway/server.impl.ts |
| HTTP routing Stage Pipeline | src/gateway/server-http.ts |
| WebSocket connection handshake | src/gateway/server/ws-connection.ts |
| WS frame schema | src/gateway/protocol/schema/frames.ts |
| WS broadcast + Slow Consumer | src/gateway/server-broadcast.ts |
| Server constants (MAX_PAYLOAD, etc.) | src/gateway/server-constants.ts |
| Runtime state initialization | src/gateway/server-runtime-state.ts |
| Maintenance timers | src/gateway/server-maintenance.ts |
| Configuration hot-reload | src/gateway/config-reload.ts |
| Reload plan construction | src/gateway/config-reload-plan.ts |
| Channel health monitoring | src/gateway/channel-health-monitor.ts |
| Gateway method list | src/gateway/server-methods-list.ts |
| Startup sidecars | src/gateway/server-startup.ts |
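The Config Reloader box in the diagram describes a two-step flow: diff the old and new openclaw.json, then map each changed path to a reload action. A minimal sketch under stated assumptions follows; the diffPaths and planReload helpers, the dotted-path format, the config keys, and the rule table are hypothetical (loosely modeled on the hot-reload and restart categories in the diagram), and the chokidar watcher and the 200ms/300ms debounce are left out.

```typescript
// JSON value type for parsed config content.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursive deep diff: returns dotted paths whose values differ.
// Objects are walked key by key; arrays and scalars compare by serialized value.
function diffPaths(prev: Json | undefined, next: Json | undefined, prefix = ""): string[] {
  if (isObject(prev) && isObject(next)) {
    const keys = Array.from(new Set(Object.keys(prev).concat(Object.keys(next)))).sort();
    const changed: string[] = [];
    for (const key of keys) {
      const path = prefix ? `${prefix}.${key}` : key;
      changed.push(...diffPaths(prev[key], next[key], path));
    }
    return changed;
  }
  return JSON.stringify(prev) === JSON.stringify(next) ? [] : [prefix];
}

// Hypothetical rule table keyed by top-level config section.
const RULES: Array<{ prefix: string; action: "hot" | "restart" }> = [
  { prefix: "hooks", action: "hot" },
  { prefix: "cron", action: "hot" },
  { prefix: "channels", action: "hot" }, // assumed key for per-channel restarts
  { prefix: "gateway", action: "restart" },
  { prefix: "plugins", action: "restart" },
];

interface ReloadPlan {
  restartGateway: boolean;
  hotPaths: string[];
}

function planReload(changed: string[]): ReloadPlan {
  const plan: ReloadPlan = { restartGateway: false, hotPaths: [] };
  for (const path of changed) {
    const rule = RULES.find((r) => path === r.prefix || path.startsWith(`${r.prefix}.`));
    // In this sketch, unknown paths fall back to a full restart.
    if (!rule || rule.action === "restart") {
      plan.restartGateway = true;
    } else {
      plan.hotPaths.push(path);
    }
  }
  return plan;
}
```

Treating any unrecognized path as restart-worthy is this sketch's conservative choice: it trades an occasional unnecessary restart for never silently ignoring a configuration change.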

Summary

  • We traced the Gateway's single-process, single-port architecture and how Node.js's native upgrade event multiplexes HTTP and WebSocket traffic without a reverse proxy.
  • The 48-step, 9-phase startup sequence was examined end to end -- from config migration and secrets resolution through plugin loading, HTTP/WS server creation, sidecar lifecycle management, and the config reloader.
  • HTTP routing uses a stage pipeline with first-match-wins semantics across 13 ordered stages, where disabled features contribute zero overhead because their stages are never added to the stage array.
  • Configuration hot-reload was broken down across its four modes (off/restart/hot/hybrid), the recursive deep-diff engine, and the per-subsystem reload rules that determine which changes can be applied without a full restart.
  • Health monitoring and slow-consumer protection ensure operational resilience: channel health checks auto-restart unhealthy connections with cooldown and hourly caps, while broadcast backpressure either drops low-priority events or force-disconnects laggy WebSocket clients.
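The backpressure rule in the last bullet can be sketched as a per-client check before each send. This assumes the socket exposes a bufferedAmount counter, as both the browser WebSocket API and the ws package do; the SocketLike interface, the MAX_BUFFERED_BYTES threshold, the dropIfSlow option, and the close code are hypothetical choices for illustration.

```typescript
// Minimal socket surface this sketch relies on (mirrors WebSocket.bufferedAmount).
interface SocketLike {
  bufferedAmount: number;
  send(data: string): void;
  close(code: number, reason: string): void;
}

// Hypothetical threshold chosen for illustration only.
const MAX_BUFFERED_BYTES = 1024 * 1024;

interface BroadcastOptions {
  // Low-priority events may be skipped for a lagging client.
  dropIfSlow: boolean;
}

function broadcast(clients: Set<SocketLike>, payload: string, opts: BroadcastOptions): void {
  clients.forEach((socket) => {
    const slow = socket.bufferedAmount > MAX_BUFFERED_BYTES;
    if (!slow) {
      socket.send(payload);
    } else if (opts.dropIfSlow) {
      // Skip this event so the client's send buffer stops growing.
      return;
    } else {
      // A critical event cannot be skipped, so disconnect instead of queueing.
      // 1008 (policy violation) is this sketch's choice of close code.
      socket.close(1008, "slow consumer");
      clients.delete(socket);
    }
  });
}
```

One way to read the asymmetry: a dropped low-priority event costs a lagging client little, while a skipped critical event would leave it silently out of sync, so in this sketch a forced reconnect is the safer failure.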

This article is based on source code analysis of OpenClaw 2026.3.2. All data comes from actual code, with no subjective inference.

--
Originally published on Agent Internals
