From HTTP/WebSocket multi-protocol multiplexing to configuration hot-reload — a deep breakdown of how the Gateway starts and runs
Version baseline: OpenClaw 2026.3.2
Key Takeaways
- The Gateway serves HTTP and WebSocket traffic on a single port (18789) with no reverse proxy required, using Node.js's native upgrade event to multiplex protocols.
- Startup is a precisely ordered 48-step, 9-phase sequence, running from config migration and secrets resolution through to sidecar lifecycle management, where each phase has clear failure semantics.
- HTTP routing uses a stage pipeline with first-match-wins semantics: 13 ordered stages cover webhooks, OpenAI-compatible APIs, Canvas, plugins, and health probes, with zero overhead for disabled features.
- Configuration hot-reload supports 4 modes (off/restart/hot/hybrid), with a recursive deep-diff engine and per-subsystem reload rules that allow most changes to take effect without a full Gateway restart.
- Slow consumer protection on WebSocket broadcasts prevents a single laggy client from causing unbounded memory growth -- low-priority events are dropped, critical events force-disconnect the slow client.
Table of Contents
- Gateway Overview: Single Process, Single Port, Multi-Protocol
- Startup Sequence in Detail
- HTTP Routing Architecture
- WebSocket Protocol
- Configuration Hot-Reload
- Health Monitoring and Presence
- Global Architecture Overview
1. Gateway Overview: Single Process, Single Port, Multi-Protocol
The OpenClaw Gateway is the central hub of the entire system. It serves externally as a single process on a single port, carrying both HTTP and WebSocket protocols simultaneously, and routing multiple higher-level functional modules (Control UI, OpenAI-compatible API, Webhooks, plugins, Canvas, etc.) over the same connection infrastructure.
Default Port and Bind Modes
The Gateway listens on port 18789 by default. The port can be overridden at startup via the --port parameter or the OPENCLAW_GATEWAY_PORT environment variable. startGatewayServer() writes this variable immediately on startup (src/gateway/server.impl.ts:240):
```typescript
// src/gateway/server.impl.ts:232-240
export async function startGatewayServer(
  port = 18789,
  opts: GatewayServerOptions = {},
): Promise<GatewayServer> {
  // ...
  // Ensure all default port derivations (browser/canvas) see the actual runtime port.
  process.env.OPENCLAW_GATEWAY_PORT = String(port);
  // ... remaining startup phases
}
```
The bind mode is specified via the GatewayServerOptions.bind field (or gateway.bind in openclaw.json), with four options:
| Bind Mode | Bind Address | Use Case |
|---|---|---|
| loopback | 127.0.0.1 | Local single-device, secure default |
| lan | 0.0.0.0 | Sharing across devices on a LAN |
| tailnet | Tailscale IPv4 (100.64.0.0/10) | Exposing via Tailscale network |
| auto | Prefers loopback, falls back to LAN | Auto-detection, suitable for general deployments |
When the bind mode is non-loopback, createGatewayRuntimeState() proactively prints a security warning in the logs, prompting the user to ensure authentication is configured (src/gateway/server-runtime-state.ts):
```text
⚠️ Gateway is binding to a non-loopback address.
Ensure authentication is configured before exposing to public networks.
```
Common CLI commands for starting the Gateway:
```bash
# Start with default port (loopback)
openclaw gateway run

# Bind to LAN, allowing local network connections
openclaw gateway run --bind lan --port 18789

# Run in the background (daemon mode)
nohup openclaw gateway run --bind loopback --port 18789 --force \
  > /tmp/openclaw-gateway.log 2>&1 &

# Check channel status (with probe)
openclaw channels status --probe

# Check health endpoint
curl http://127.0.0.1:18789/health
```
How HTTP and WebSocket Share a Single Port
Node.js's http.Server emits an upgrade event to handle HTTP-to-WebSocket protocol upgrade requests. The Gateway listens for this event in attachGatewayUpgradeHandler() (src/gateway/server-http.ts, line 650), forwarding requests with the Upgrade: websocket header to the WebSocketServer (ws library), while regular HTTP requests are handled by the handleRequest() function. This allows HTTP and WebSocket traffic to naturally multiplex over the same port without requiring an additional reverse proxy.
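The pattern can be sketched with Node's built-in node:http module alone. This is a minimal illustration, not the Gateway's code: isWebSocketUpgrade, createMultiplexedServer, and the onWebSocket callback are hypothetical names, and onWebSocket stands in for the step where the ws library would complete the handshake (with ws, one common way is new WebSocketServer({ noServer: true }) plus wss.handleUpgrade(req, socket, head, callback)).

```typescript
import { createServer } from "node:http";
import type { IncomingHttpHeaders, IncomingMessage, Server, ServerResponse } from "node:http";
import type { Duplex } from "node:stream";

// True when the request asks to switch to the WebSocket protocol.
function isWebSocketUpgrade(headers: IncomingHttpHeaders): boolean {
  const upgrade = headers.upgrade;
  const connection = headers.connection ?? "";
  return (
    typeof upgrade === "string" &&
    upgrade.toLowerCase() === "websocket" &&
    connection
      .toLowerCase()
      .split(",")
      .some((token) => token.trim() === "upgrade")
  );
}

// Hypothetical hook: in the real Gateway, ws's WebSocketServer takes over here.
type WebSocketHandler = (req: IncomingMessage, socket: Duplex, head: Buffer) => void;

function createMultiplexedServer(
  handleRequest: (req: IncomingMessage, res: ServerResponse) => void,
  onWebSocket: WebSocketHandler,
): Server {
  // Ordinary HTTP requests arrive through the "request" event.
  const server = createServer(handleRequest);
  // Upgrade requests bypass handleRequest; Node emits "upgrade" instead.
  server.on("upgrade", (req: IncomingMessage, socket: Duplex, head: Buffer) => {
    if (isWebSocketUpgrade(req.headers)) {
      onWebSocket(req, socket, head);
    } else {
      socket.destroy(); // Unknown upgrade protocol: drop the connection
    }
  });
  return server;
}
```

Calling listen(18789, "127.0.0.1") on the returned server would then accept both ordinary HTTP requests and WebSocket upgrades on one port.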
2. Startup Sequence in Detail
The startGatewayServer() function is defined at src/gateway/server.impl.ts:232 and serves as the Gateway's main entry point. The entire startup process comprises approximately 48 steps, organized into nine phases.
Source location: src/gateway/server.impl.ts is approximately 1100 lines long, running from the function signature at line 232 to the end of the file, where the GatewayServer object is returned. Core startup logic is concentrated in lines 250-750.
Startup Sequence ASCII Diagram
```text
startGatewayServer(port=18789, opts)
│
├─ Phase 1: Config Pipeline
│ ├─ readConfigFileSnapshot() Read raw openclaw.json snapshot
│ ├─ migrateLegacyConfig() Detect legacy fields and auto-migrate
│ ├─ writeConfigFile() Write migration results back to disk
│ ├─ applyPluginAutoEnable() Auto-enable qualifying plugins
│ └─ loadConfig() Load the final effective configuration
│
├─ Phase 2: Auth & Secrets
│ ├─ prepareSecretsRuntimeSnapshot() Resolve SecretRefs, validate availability
│ ├─ ensureGatewayStartupAuth() Ensure auth token is generated/saved
│ ├─ activateSecretsRuntimeSnapshot() Activate secrets runtime snapshot
│ └─ maybeSeedControlUiAllowedOrigins Seed CORS allowlist for non-loopback installs
│
├─ Phase 3: Registry & Methods
│ ├─ initSubagentRegistry() Initialize subagent registry
│ ├─ listGatewayMethods() Enumerate 73 core WS methods
│ ├─ loadGatewayPlugins() Load plugins, merge plugin methods into gatewayMethods
│ └─ listChannelPlugins() List built-in channel plugin gatewayMethods
│
├─ Phase 4: HTTP / WebSocket
│ ├─ loadGatewayTlsRuntime() Load TLS certificates (optional)
│ ├─ createGatewayRuntimeState() → {
│ │ ├─ createGatewayHttpServer() Build HTTP(S) server (Node http/https)
│ │ ├─ WebSocketServer({ maxPayload: 25MB })
│ │ ├─ listenGatewayHttpServer() Bind port and start accepting connections
│ │ └─ createGatewayBroadcaster() Broadcaster (scope filtering + slow-consumer protection)
│ │ }
│ └─ attachGatewayUpgradeHandler() Attach HTTP→WS upgrade handler
│
├─ Phase 5: Services
│ ├─ new NodeRegistry() Node (Pi/mobile/desktop) registry
│ ├─ buildGatewayCronService() Scheduled task service
│ ├─ createChannelManager() Unified channel manager (Slack/Telegram/Discord...)
│ └─ startGatewayDiscovery() mDNS/wide-area discovery broadcast
│
├─ Phase 6: Maintenance Timers
│ └─ startGatewayMaintenanceTimers() → {
│ ├─ tick: broadcast "tick" every 30s (keepalive)
│ ├─ health: refreshGatewayHealthSnapshot() every 60s
│ └─ dedupe: clean dedup cache + timeout-abort chat runs every 60s
│ }
│
├─ Phase 7: Event System
│ ├─ onAgentEvent(createAgentEventHandler()) Subscribe to agent runtime events
│ ├─ onHeartbeatEvent() Subscribe to heartbeat events → broadcast
│ └─ startHeartbeatRunner() Start heartbeat timer loop
│
├─ Phase 8: Sidecars
│ ├─ startGatewayTailscaleExposure() Tailscale exposure (optional)
│ ├─ startGatewaySidecars() → {
│ │ ├─ startBrowserControlServer() Browser control (optional)
│ │ ├─ startGmailWatcher() Gmail hook watcher (optional)
│ │ ├─ startChannels() Start all configured channels
│ │ └─ pluginServices.start() Plugin service lifecycle
│ │ }
│ └─ hookRunner.runGatewayStart() Fire gateway_start plugin hook
│
└─ Phase 9: Config Reloader
   └─ startGatewayConfigReloader() chokidar watches openclaw.json
      ├─ stabilityThreshold: 200ms
      └─ debounce: 300ms (configurable)
```
Key Function Details
Phase 1 — readConfigFileSnapshot(): Returns a ConfigFileSnapshot containing the raw JSON, Zod validation result, and migration detection result. If the configuration file is invalid, startGatewayServer() throws an exception with an error message that includes the specific field path and prompts the user to run openclaw doctor.
Phase 2 — activateRuntimeSecrets() (src/gateway/server.impl.ts:318-362): Uses a Promise chain (secretsActivationTail) internally to implement a serialization lock, ensuring that secrets activation does not race under concurrent hot-reload scenarios. If secrets resolution fails, the runtime retains the "last-known-good" snapshot and notifies the user via the SECRETS_RELOADER_DEGRADED system event:
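```typescript
// src/gateway/server.impl.ts:310-316: Promise chain serialization lock
// Tail of the chain; declared earlier in the file (initial value assumed here).
let secretsActivationTail: Promise<void> = Promise.resolve();

const runWithSecretsActivationLock = async <T>(operation: () => Promise<T>): Promise<T> => {
  // Start only after the previous activation settles, whether it succeeded or failed.
  const run = secretsActivationTail.then(operation, operation);
  // Swallow this run's outcome in the chain so one failure does not poison later runs;
  // the caller still observes the rejection through `run`.
  secretsActivationTail = run.then(
    () => undefined,
    () => undefined,
  );
  return await run;
};
```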
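The last-known-good fallback can be sketched synchronously. Everything here except the SECRETS_RELOADER_DEGRADED name is an assumption: the SecretsSnapshot shape, the resolver callback, and the returned event object are illustrative, and the real activation is asynchronous and would presumably run inside the serialization lock above.

```typescript
// Hypothetical shape: resolved secret values keyed by SecretRef id.
type SecretsSnapshot = Readonly<Record<string, string>>;

type SystemEvent = { kind: "SECRETS_RELOADER_DEGRADED"; reason: string };

type ActivationResult = {
  active: SecretsSnapshot; // Snapshot the runtime should use after this attempt
  degraded: boolean; // True when the previous snapshot was retained
  events: SystemEvent[];
};

// Try to build a fresh snapshot; on failure keep serving the last-known-good one.
function activateOrKeepLastKnownGood(
  lastKnownGood: SecretsSnapshot,
  resolveSnapshot: () => SecretsSnapshot,
): ActivationResult {
  try {
    const next = resolveSnapshot();
    return { active: next, degraded: false, events: [] };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      active: lastKnownGood,
      degraded: true,
      events: [{ kind: "SECRETS_RELOADER_DEGRADED", reason }],
    };
  }
}
```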
Phase 3 — loadGatewayPlugins() (src/gateway/server.impl.ts:428-444): Loads all enabled plugins from disk, injects built-in handlers via coreGatewayHandlers, and merges plugin-provided gatewayMethods into the gatewayMethods array. Deduplication is performed via Array.from(new Set([...])):
```typescript
// src/gateway/server.impl.ts:443-444
const channelMethods = listChannelPlugins().flatMap((plugin) => plugin.gatewayMethods ?? []);
const gatewayMethods = Array.from(new Set([...baseGatewayMethods, ...channelMethods]));
```
Phase 4 — WebSocketServer({ maxPayload }): maxPayload is set to MAX_PAYLOAD_BYTES = 25 * 1024 * 1024 (25 MB), consistent with the client, to prevent unexpected disconnections during high-resolution Canvas snapshot transfers (see comments in src/gateway/server-constants.ts).
Phase 8 — startGatewaySidecars(): Defined in src/gateway/server-startup.ts, this function manages the lifecycle of "sidecar" processes including Browser Control, Gmail Watcher, channel startup, and plugin services. When the Gateway shuts down, all sidecars are stopped in an orderly fashion.
3. HTTP Routing Architecture
The HTTP request processing entry point is the internal function handleRequest() within createGatewayHttpServer(), defined at src/gateway/server-http.ts line 486.
Design Philosophy: Stage Pipeline, First Match Wins
handleRequest() wraps all routes into a GatewayHttpRequestStage[] array, trying each stage in sequence. Any stage returning true means "handled" and terminates the pipeline. The advantage of this design is that route priority is entirely determined by array order, eliminating the need for complex route-table matching logic and making priority changes easy to track in diffs.
```text
HTTP request arrives
│
├─ [Stage 1] hooks POST /hooks/* → Webhook entry point (with rate limiting)
├─ [Stage 2] tools-invoke POST /v1/tools/invoke → Direct tool invocation
├─ [Stage 3] slack Slack Events API / Interactive Components
├─ [Stage 4] openresponses POST /v1/responses (requires enabled=true)
├─ [Stage 5] openai POST /v1/chat/completions (requires enabled=true)
├─ [Stage 6] canvas-auth Canvas path auth pre-check
├─ [Stage 7] a2ui Canvas A2UI assets (/a2ui/*)
├─ [Stage 8] canvas-http Canvas WebSocket proxy + static assets
├─ [Stage 9] plugin-auth Plugin route auth pre-check
├─ [Stage 10] plugin-http Plugin-registered custom HTTP routes
├─ [Stage 11] control-ui-avatar /avatar/* proxy (controlUiEnabled only)
├─ [Stage 12] control-ui-http Control UI SPA catch-all (controlUiEnabled only)
└─ [Stage 13] gateway-probes GET /health /healthz /ready /readyz
   └─ No match → 404 Not Found
```
Each stage's run() returns boolean | Promise<boolean>, where true indicates the request has been consumed. runGatewayHttpRequestStages() sequentially awaits each stage and returns upon encountering true.
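A minimal sketch of that loop, assuming stages only need a small request context object. The stage names are borrowed from the diagram but their matching logic is simplified, catch-all-post is a hypothetical stage, and the real stages receive Node's req and res objects.

```typescript
// A stage inspects the request context and returns true once it has handled it.
type HttpRequestStage<Ctx> = {
  name: string;
  run: (ctx: Ctx) => boolean | Promise<boolean>;
};

// Awaits stages strictly in array order; the first one returning true wins.
async function runStages<Ctx>(stages: readonly HttpRequestStage<Ctx>[], ctx: Ctx): Promise<string | null> {
  for (const stage of stages) {
    if (await stage.run(ctx)) return stage.name; // Later stages never see this request
  }
  return null; // No stage matched: the caller answers 404
}

// Illustrative request context and stages (simplified matching).
type Req = { method: string; path: string };

const stages: HttpRequestStage<Req>[] = [
  { name: "hooks", run: (r) => r.method === "POST" && r.path.startsWith("/hooks/") },
  { name: "tools-invoke", run: (r) => r.method === "POST" && r.path === "/v1/tools/invoke" },
  // Deliberately broad and async: it only sees POSTs the earlier stages declined.
  { name: "catch-all-post", run: async (r) => r.method === "POST" },
  { name: "gateway-probes", run: (r) => ["/health", "/healthz", "/ready", "/readyz"].includes(r.path) },
];
```

Because catch-all-post sits after hooks, a POST to /hooks/github never reaches it; reordering the array is the only way to change that priority.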
Probe Endpoints
Four health probe paths are handled uniformly by handleGatewayProbeRequest():
```text
GET /health  → { ok: true, status: "live" }
GET /healthz → { ok: true, status: "live" }
GET /ready   → { ok: true, status: "ready" }
GET /readyz  → { ok: true, status: "ready" }
```
Only GET and HEAD are accepted; other methods return 405. Responses carry Cache-Control: no-store. These endpoints require no authentication and follow the Kubernetes liveness/readiness naming convention; Kubernetes HTTP probes only check that the status code falls in the 200-399 range, so the JSON body is informational.
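The probe rules fit in a small pure function. This is a sketch, not handleGatewayProbeRequest() itself: the ProbeResponse shape, the Allow header on 405 responses, and the Content-Type header are assumptions.

```typescript
type ProbeResponse = {
  status: number;
  headers: Record<string, string>;
  body?: string;
};

// Probe path → reported status (a Map avoids surprises from Object.prototype keys).
const PROBE_STATUS = new Map<string, "live" | "ready">([
  ["/health", "live"],
  ["/healthz", "live"],
  ["/ready", "ready"],
  ["/readyz", "ready"],
]);

// Returns null for non-probe paths so the caller can fall through to a 404.
function handleProbe(method: string, path: string): ProbeResponse | null {
  const status = PROBE_STATUS.get(path);
  if (status === undefined) return null;
  if (method !== "GET" && method !== "HEAD") {
    return { status: 405, headers: { Allow: "GET, HEAD" } };
  }
  const headers = { "Content-Type": "application/json", "Cache-Control": "no-store" };
  // HEAD gets the same status and headers but no body.
  const body = method === "HEAD" ? undefined : JSON.stringify({ ok: true, status });
  return { status: 200, headers, body };
}
```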
Webhook Hook Rate Limiting
Defined at the top of src/gateway/server-http.ts:
```typescript
const HOOK_AUTH_FAILURE_LIMIT = 20;
const HOOK_AUTH_FAILURE_WINDOW_MS = 60_000;
```
This means more than 20 authentication failures within 60 seconds will trigger rate limiting for that client IP. The Hook rate limiter is scoped via AUTH_RATE_LIMIT_SCOPE_HOOK_AUTH, isolated from the WS authentication rate limiter, so the two do not interfere with each other.
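A sliding-window limiter with per-scope keys captures this behavior. The class, the scope strings "hook-auth" and "ws-auth", and blocking only once the count exceeds the limit (matching the "more than 20" wording) are assumptions for illustration; the actual value of AUTH_RATE_LIMIT_SCOPE_HOOK_AUTH is not shown in the source.

```typescript
const HOOK_AUTH_FAILURE_LIMIT = 20;
const HOOK_AUTH_FAILURE_WINDOW_MS = 60_000;

// Hypothetical scoped limiter: failures are tracked per (scope, ip) pair,
// so hook failures never count against WebSocket auth and vice versa.
class AuthFailureLimiter {
  private readonly failures = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Sliding window: keep only timestamps newer than now - windowMs.
  private recent(key: string, now: number): number[] {
    const kept = (this.failures.get(key) ?? []).filter((t) => now - t < this.windowMs);
    this.failures.set(key, kept);
    return kept;
  }

  recordFailure(scope: string, ip: string, now: number): void {
    this.recent(`${scope}:${ip}`, now).push(now);
  }

  // Limited once the window holds more than `limit` failures.
  isLimited(scope: string, ip: string, now: number): boolean {
    return this.recent(`${scope}:${ip}`, now).length > this.limit;
  }
}
```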
openclaw.json Configuration Example
```json
{
  "gateway": {
    "bind": "loopback",
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        },
        "responses": {
          "enabled": false
        }
      }
    }
  }
}
```
After enabling the OpenAI-compatible API via openclaw config set gateway.http.endpoints.chatCompletions.enabled true, Stage 5 (openai) is added to the pipeline. If not enabled, the stage is never added to the array — zero overhead.
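The zero-overhead property follows from building the stage array from configuration once instead of checking flags on every request. In this hypothetical sketch, buildStageNames returns stage names in the diagram's order; GatewayHttpConfig mirrors the JSON example, and treating the Control UI stages as conditionally added is an assumption.

```typescript
// Hypothetical subset of the gateway.http config shape from the JSON example.
type GatewayHttpConfig = {
  endpoints?: {
    chatCompletions?: { enabled?: boolean };
    responses?: { enabled?: boolean };
  };
};

// Returns stage names in pipeline order; disabled features are simply absent.
function buildStageNames(http: GatewayHttpConfig, controlUiEnabled: boolean): string[] {
  const stages = ["hooks", "tools-invoke", "slack"];
  if (http.endpoints?.responses?.enabled === true) stages.push("openresponses");
  if (http.endpoints?.chatCompletions?.enabled === true) stages.push("openai");
  stages.push("canvas-auth", "a2ui", "canvas-http", "plugin-auth", "plugin-http");
  if (controlUiEnabled) stages.push("control-ui-avatar", "control-ui-http");
  stages.push("gateway-probes"); // Probes always run last
  return stages;
}
```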
4. WebSocket Protocol
Frame Type Definitions
The WebSocket frame schema is defined in src/gateway/protocol/schema/frames.ts, built with @sinclair/typebox. There are three top-level frame types, discriminated by the type field (discriminated union):
| Frame Type | Direction | Description |
|---|---|---|
| req | Client → Server | Method call request; carries id, method, params |
| res | Server → Client | Method response; carries id, ok, payload or error |
| event | Server → Client | Server broadcast; carries event, payload, seq, stateVersion |
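```typescript
// Shapes described by the TypeBox schemas in src/gateway/protocol/schema/frames.ts.
// ErrorShape and StateVersion are defined by separate schemas.

// RequestFrameSchema
type RequestFrame = { type: "req"; id: string; method: string; params?: unknown };
// ResponseFrameSchema
type ResponseFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: ErrorShape };
// EventFrameSchema
type EventFrame = { type: "event"; event: string; payload?: unknown; seq?: number; stateVersion?: StateVersion };
```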
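Since type is the discriminant, a receiver can check it once and let TypeScript narrow the rest. The Gateway validates frames with TypeBox; this hand-rolled parseFrame is only a sketch, ErrorShape is a simplified stand-in, and stateVersion is omitted.

```typescript
// Simplified stand-in; the real ErrorShape schema may carry more fields.
type ErrorShape = { code: string; message: string };
type RequestFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResponseFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: ErrorShape };
type EventFrame = { type: "event"; event: string; payload?: unknown; seq?: number };
type Frame = RequestFrame | ResponseFrame | EventFrame;

// Checks the discriminant plus the required fields of each variant.
function parseFrame(raw: string): Frame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // Not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const f = data as Record<string, unknown>;
  switch (f.type) {
    case "req":
      return typeof f.id === "string" && typeof f.method === "string" ? (f as RequestFrame) : null;
    case "res":
      return typeof f.id === "string" && typeof f.ok === "boolean" ? (f as ResponseFrame) : null;
    case "event":
      return typeof f.event === "string" ? (f as EventFrame) : null;
    default:
      return null;
  }
}

// Inside each case, TypeScript knows exactly which fields exist.
function describeFrame(frame: Frame): string {
  switch (frame.type) {
    case "req":
      return `call ${frame.method}#${frame.id}`;
    case "res":
      return `${frame.ok ? "ok" : "error"} for #${frame.id}`;
    case "event":
      return `event ${frame.event}`;
  }
}
```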
The seq field in event frames is a monotonically increasing global sequence number (maintained by createGatewayBroadcaster()), allowing clients to detect dropped frames. stateVersion contains presence and health version numbers, enabling clients to determine whether they need to refresh their local cache.
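Client-side gap detection might look like this. SeqTracker is hypothetical, and it assumes the client is meant to receive a contiguous run of seq values, which is what makes a gap meaningful.

```typescript
// Hypothetical client-side tracker comparing each event's seq with the last one seen.
class SeqTracker {
  private last: number | null = null;

  // Returns how many events were missed before this one (0 when contiguous).
  observe(seq: number | undefined): number {
    if (seq === undefined) return 0; // Frames without seq cannot be checked
    const missed = this.last === null ? 0 : Math.max(0, seq - this.last - 1);
    // Never move backwards on a late or duplicate frame.
    this.last = this.last === null ? seq : Math.max(this.last, seq);
    return missed;
  }
}
```

A non-zero return value is the cue to re-request state (for example a fresh presence or health snapshot) rather than trusting the local cache.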
Key Constants (src/gateway/server-constants.ts)
```typescript
export const MAX_PAYLOAD_BYTES = 25 * 1024 * 1024; // 25 MB, max single-frame payload
export const MAX_BUFFERED_BYTES = 50 * 1024 * 1024; // 50 MB, per-connection send buffer cap
export const DEFAULT_HANDSHAKE_TIMEOUT_MS = 10_000; // Handshake timeout: 10 seconds
export const TICK_INTERVAL_MS = 30_000; // Keepalive tick: every 30 seconds
export const HEALTH_REFRESH_INTERVAL_MS = 60_000; // Health snapshot refresh: every 60 seconds
export const DEDUPE_TTL_MS = 5 * 60_000; // Dedup TTL: 5 minutes
```
Connection Handshake Sequence
attachGatewayWsConnectionHandler() (src/gateway/server/ws-connection.ts, line 93) is the main handler for WebSocket connections. The handshake flow is as follows:
```text
Client                                Gateway (server)
│                                     │
│──── TCP/TLS + HTTP Upgrade ──────>│
│                                     │ wss.on("connection")
│                                     │ connId = randomUUID()
│                                     │ handshakeTimer = setTimeout(10s)
│<─── event: connect.challenge ─────│ { nonce, ts }
│                                     │
│──── req: connect ────────────────>│ {
│ (minProtocol, maxProtocol,          │   auth.token / auth.password,
│  client info, auth, caps)           │   device / scopes
│                                     │ }
│                                     │ Validate token / rate limiter check
│                                     │ Negotiate protocol version
│                                     │ clearTimeout(handshakeTimer)
│<─── res: hello-ok ────────────────│ {
│ (protocol, server.connId,           │   features.methods[73+],
│  features, snapshot,                │   features.events[],
│  policy, canvasHostUrl)             │   policy.maxPayload,
│                                     │   policy.tickIntervalMs,
│                                     │   snapshot (presence+health)
│                                     │ }
│                                     │ clients.add(client)
│                                     │ handshakeState = "connected"
│                                     │
│<══════ Normal bidirectional communication ═══════│
│                                     │
│ Every 30s:                          │
│<─── event: tick ──────────────────│ { ts } (dropIfSlow=true)
│                                     │
│──── req: {method} ───────────────>│
│<─── res: {id, ok, payload} ───────│
│                                     │
│<─── event: agent / health / ... ──│ (broadcast)
```
After a successful handshake, the server lists all available methods and events in the features field of hello-ok. Clients should use this as the source of truth and should not hard-code method names.
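Two pieces of logic follow from this flow: version negotiation on the server and feature gating on the client. The sketch assumes the server picks the highest protocol version inside the client's [minProtocol, maxProtocol] range; the actual negotiation rule may differ, and HelloOk is a trimmed, hypothetical shape of the hello-ok payload.

```typescript
// Trimmed, hypothetical view of the hello-ok payload.
type HelloOk = {
  protocol: number;
  features: { methods: string[]; events: string[] };
};

// Assumed rule: highest version both sides support; null means no overlap.
function negotiateProtocol(
  clientMin: number,
  clientMax: number,
  serverSupported: readonly number[],
): number | null {
  const candidates = serverSupported.filter((v) => v >= clientMin && v <= clientMax);
  return candidates.length > 0 ? Math.max(...candidates) : null;
}

// Clients gate features on what hello-ok advertises, not on a hard-coded list.
function canCall(hello: HelloOk, method: string): boolean {
  return hello.features.methods.includes(method);
}
```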
Method Dispatch
The Gateway ships roughly 100 built-in methods once those contributed by channel plugins are counted on top of the 73 core methods enumerated at startup. The BASE_METHODS array in src/gateway/server-methods-list.ts covers:
- Session management: sessions.list, sessions.preview, sessions.reset, sessions.compact, etc.
- Channel operations: channels.status, channels.logout
- Agent control: agent, agents.list, agents.create, agents.update, agents.delete
- Models/tools: models.list, tools.catalog
- Scheduled tasks: cron.list, cron.add, cron.run, etc.
- Node pairing: node.pair.request, node.invoke, node.list
- Device auth: device.pair.approve, device.token.rotate
- Chat: chat.send, chat.abort, chat.history (WebChat native methods)
- Skills: skills.install, skills.update, skills.bins
- TTS: tts.convert, tts.providers, tts.setProvider
Plugins inject additional method handlers via pluginRegistry.gatewayHandlers; channel plugins contribute method names via plugin.gatewayMethods.
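Dispatch over a merged handler table can be sketched as follows. Handlers are synchronous here for brevity, the error codes and the example method myplugin.ping are hypothetical, and letting plugin entries override core entries of the same name is an assumption rather than documented behavior.

```typescript
type ErrorShape = { code: string; message: string };
type ResponseFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: ErrorShape };
type Handler = (params: unknown) => unknown;

// Later entries win, so plugin handlers shadow core handlers with the same name.
function buildHandlerTable(core: Record<string, Handler>, plugin: Record<string, Handler>): Map<string, Handler> {
  return new Map([...Object.entries(core), ...Object.entries(plugin)]);
}

// Turns a req frame's id/method/params into a res frame.
function dispatch(table: Map<string, Handler>, id: string, method: string, params: unknown): ResponseFrame {
  const handler = table.get(method);
  if (!handler) {
    return { type: "res", id, ok: false, error: { code: "UNKNOWN_METHOD", message: `unknown method: ${method}` } };
  }
  try {
    return { type: "res", id, ok: true, payload: handler(params) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "res", id, ok: false, error: { code: "HANDLER_ERROR", message } };
  }
}
```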
Slow Consumer Protection
The broadcast function broadcastInternal() (src/gateway/server-broadcast.ts) checks each client's socket.bufferedAmount before sending:
```typescript
// Excerpt from the per-client loop in broadcastInternal()
const slow = c.socket.bufferedAmount > MAX_BUFFERED_BYTES; // 50 MB
if (slow && opts?.dropIfSlow) {
  continue; // Low-priority events like tick / heartbeat are simply dropped
}
if (slow) {
  c.socket.close(1008, "slow consumer"); // Critical events: forcefully disconnect slow client
  continue;
}
```
This prevents a single slow-consuming client from causing unbounded memory growth in the Gateway process. dropIfSlow: true is used for keepalive events like tick and heartbeat where loss is acceptable, while business-critical events like agent and health force-disconnect slow clients.
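Fake sockets make both branches easy to exercise. This sketch reuses the 50 MB threshold and the 1008 close code from the excerpt; SocketLike, the frame encoding, and incrementing seq once per broadcast are assumptions.

```typescript
const MAX_BUFFERED_BYTES = 50 * 1024 * 1024; // 50 MB, from server-constants.ts

// Minimal socket surface; a ws WebSocket exposes bufferedAmount, send() and close().
interface SocketLike {
  bufferedAmount: number;
  send(data: string): void;
  close(code: number, reason: string): void;
}

type BroadcastOpts = { dropIfSlow?: boolean };

let seq = 0; // Global sequence number, bumped once per broadcast in this sketch

function broadcast(
  clients: Iterable<{ socket: SocketLike }>,
  event: string,
  payload: unknown,
  opts?: BroadcastOpts,
): void {
  const frame = JSON.stringify({ type: "event", event, payload, seq: ++seq });
  for (const c of clients) {
    const slow = c.socket.bufferedAmount > MAX_BUFFERED_BYTES;
    if (slow && opts?.dropIfSlow) continue; // Loss-tolerant event: skip this client only
    if (slow) {
      c.socket.close(1008, "slow consumer"); // Critical event: disconnect the laggard
      continue;
    }
    c.socket.send(frame);
  }
}
```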
5. Configuration Hot-Reload
Four Reload Modes
The hot-reload system is defined in src/gateway/config-reload.ts and src/gateway/config-reload-plan.ts, configured via gateway.reload.mode.
Default configuration (src/gateway/config-reload.ts:16-19):
```typescript
// src/gateway/config-reload.ts:16-19
const DEFAULT_RELOAD_SETTINGS: GatewayReloadSettings = {
  mode: "hybrid",
  debounceMs: 300,
};
```
The four modes:
| Mode | Behavior |
|---|---|
| off | Completely disabled; file changes trigger no action |
| restart | Any change triggers a full gateway restart |
| hot | Only performs hot-reload; changes requiring restart are ignored (prints a warning log) |
| hybrid | Default: attempts hot-reload first; triggers a restart if the change involves paths that require it |
Chokidar File Watching
```typescript
const watcher = chokidar.watch(opts.watchPath, {
  ignoreInitial: true,
  awaitWriteFinish: { stabilityThreshold: 200, pollInterval: 50 }, // File stability threshold 200ms
  usePolling: Boolean(process.env.VITEST),
});
```
stabilityThreshold: 200 prevents multiple reloads when editors write files in multiple passes (e.g., VS Code's safe write mechanism). debounceMs (default 300ms) is read from configuration by resolveGatewayReloadSettings() and adds an additional delay after the chokidar event fires before performing the actual snapshot read.
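The debounce layer on top of chokidar can be sketched with a resettable timer. createDebouncedReload is a hypothetical helper: every new event restarts the wait, so a burst of writes yields a single reload once the file has been quiet for debounceMs.

```typescript
// Hypothetical helper: collapses a burst of change events into one reload.
function createDebouncedReload(run: () => void, debounceMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer); // Restart the wait on every new event
    timer = setTimeout(() => {
      timer = undefined;
      run();
    }, debounceMs);
  };
}
```

Wired to chokidar, the returned function would be registered as a change listener, for example watcher.on("change", schedule).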
diffConfigPaths: Recursive Deep Diff
diffConfigPaths(prev, next, prefix) (src/gateway/config-reload.ts:23-52) is a pure function that recursively compares two configuration objects and returns a list of changed dot-path strings:
```typescript
diffConfigPaths(
  { gateway: { auth: { token: "old" } } },
  { gateway: { auth: { token: "new" } } },
);
// => ["gateway.auth.token"]
```
For array types, it uses isDeepStrictEqual for whole-array comparison rather than per-element expansion, avoiding false-positive index paths (such as memory.qmd.paths.0) for fields like memory.qmd.paths.
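A recursive diff with whole-array comparison takes only a few lines using isDeepStrictEqual from node:util. This is a sketch of the described behavior, not the code at config-reload.ts:23-52; in this version a change at the root itself is reported with an empty path.

```typescript
import { isDeepStrictEqual } from "node:util";

type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function diffConfigPaths(prev: unknown, next: unknown, prefix = ""): string[] {
  if (prev === next) return [];
  // Recurse into objects so changes are reported at the deepest dot-path.
  if (isPlainObject(prev) && isPlainObject(next)) {
    const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
    const paths: string[] = [];
    for (const key of keys) {
      const path = prefix ? `${prefix}.${key}` : key;
      paths.push(...diffConfigPaths(prev[key], next[key], path));
    }
    return paths;
  }
  // Arrays are compared as a whole, so no "paths.0"-style index paths appear.
  if (Array.isArray(prev) && Array.isArray(next) && isDeepStrictEqual(prev, next)) return [];
  return [prefix];
}
```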
GatewayReloadPlan: Per-Subsystem Restart Flags
buildGatewayReloadPlan(changedPaths) (src/gateway/config-reload-plan.ts:137) maps the diff results to a GatewayReloadPlan:
```typescript
// src/gateway/config-reload-plan.ts:6-19
export type GatewayReloadPlan = {
  changedPaths: string[];
  restartGateway: boolean; // true → requires full restart
  restartReasons: string[]; // Paths that triggered restart
  hotReasons: string[]; // Paths handled via hot-reload
  reloadHooks: boolean; // Reload Webhook configuration
  restartGmailWatcher: boolean; // Restart Gmail watcher
  restartBrowserControl: boolean; // Restart Browser Control service
  restartCron: boolean; // Rebuild CronService
  restartHeartbeat: boolean; // Rebuild HeartbeatRunner
  restartHealthMonitor: boolean; // Rebuild ChannelHealthMonitor
  restartChannels: Set<ChannelKind>; // Set of channels that need restart
  noopPaths: string[]; // No-op paths (known safe to ignore)
};
```
BASE_RELOAD_RULES: Prefix-Priority Matching Rules
The rule table (src/gateway/config-reload-plan.ts:36-70) uses a "prefix-priority, first-match" approach. matchRule(path) iterates through the rule list and returns the first rule where path === rule.prefix || path.startsWith(rule.prefix + "."):
```typescript
// src/gateway/config-reload-plan.ts:36-70: key rules excerpt
const BASE_RELOAD_RULES: ReloadRule[] = [
  { prefix: "gateway.remote", kind: "none" },
  { prefix: "gateway.reload", kind: "none" },
  { prefix: "gateway.channelHealthCheckMinutes", kind: "hot", actions: ["restart-health-monitor"] },
  { prefix: "hooks.gmail", kind: "hot", actions: ["restart-gmail-watcher"] },
  { prefix: "hooks", kind: "hot", actions: ["reload-hooks"] },
  { prefix: "agents.defaults.heartbeat", kind: "hot", actions: ["restart-heartbeat"] },
  { prefix: "models", kind: "hot", actions: ["restart-heartbeat"] },
  { prefix: "cron", kind: "hot", actions: ["restart-cron"] },
  { prefix: "browser", kind: "hot", actions: ["restart-browser-control"] },
];
```
Complete prefix matching map:
```text
gateway.remote                    → none (remote config path, noop)
gateway.reload                    → none (reload's own config, prevents reload loop)
gateway.channelHealthCheckMinutes → hot + restart-health-monitor
gateway.channelHealthTiming       → hot + restart-health-monitor
hooks.gmail                       → hot + restart-gmail-watcher
hooks                             → hot + reload-hooks
agents.defaults.heartbeat         → hot + restart-heartbeat
models                            → hot + restart-heartbeat
cron                              → hot + restart-cron
browser                           → hot + restart-browser-control
plugins                           → restart (plugin changes require restart)
gateway                           → restart (other gateway config requires restart)
discovery                         → restart
canvasHost                        → restart
meta / identity / agents / tools / routing / ... → none (noop)
```
Paths that do not match any rule default to restart (conservative principle).
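Prefix-priority matching plus the restart-by-default fallback can be sketched as follows. RULES is an abbreviated subset of the map above, and PlanSketch is a trimmed, hypothetical version of GatewayReloadPlan that collects hot actions in a set.

```typescript
type ReloadKind = "none" | "hot" | "restart";
type ReloadRule = { prefix: string; kind: ReloadKind; actions?: string[] };

// Abbreviated rule list, most specific prefixes first.
const RULES: ReloadRule[] = [
  { prefix: "gateway.remote", kind: "none" },
  { prefix: "gateway.reload", kind: "none" },
  { prefix: "hooks.gmail", kind: "hot", actions: ["restart-gmail-watcher"] },
  { prefix: "hooks", kind: "hot", actions: ["reload-hooks"] },
  { prefix: "cron", kind: "hot", actions: ["restart-cron"] },
  { prefix: "plugins", kind: "restart" },
  { prefix: "gateway", kind: "restart" },
  { prefix: "meta", kind: "none" },
];

// First rule whose prefix equals the path or is a dot-delimited parent of it.
function matchRule(path: string): ReloadRule | undefined {
  return RULES.find((r) => path === r.prefix || path.startsWith(`${r.prefix}.`));
}

type PlanSketch = { restartGateway: boolean; restartReasons: string[]; hotActions: Set<string>; noopPaths: string[] };

function buildPlan(changedPaths: string[]): PlanSketch {
  const plan: PlanSketch = { restartGateway: false, restartReasons: [], hotActions: new Set<string>(), noopPaths: [] };
  for (const path of changedPaths) {
    const rule = matchRule(path);
    if (!rule || rule.kind === "restart") {
      // Unmatched paths fall back to a restart (conservative default).
      plan.restartGateway = true;
      plan.restartReasons.push(path);
    } else if (rule.kind === "hot") {
      for (const action of rule.actions ?? []) plan.hotActions.add(action);
    } else {
      plan.noopPaths.push(path);
    }
  }
  return plan;
}
```

Because hooks.gmail precedes hooks, a change under hooks.gmail only restarts the Gmail watcher instead of also reloading every webhook.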
Hybrid Mode Decision Tree
```text
File change detected (chokidar)
│
├─ debounce 300ms
│
└─ runReload()
   ├─ readConfigFileSnapshot()
   ├─ handleMissingSnapshot() → If missing, retry 2 times (150ms interval); if still missing, warn and skip
   ├─ handleInvalidSnapshot() → If Zod validation fails, warn and skip
   └─ applySnapshot(nextConfig)
      ├─ diffConfigPaths(currentConfig, nextConfig)
      ├─ changedPaths.length == 0 → no-op
      ├─ mode == "off" → log + no-op
      ├─ mode == "restart" → queueRestart(plan, nextConfig)
      ├─ plan.restartGateway == true
      │  ├─ mode == "hot" → warn log + ignore (no restart)
      │  └─ mode == "hybrid" → queueRestart(plan, nextConfig)
      └─ plan.restartGateway == false
         └─ onHotReload(plan, nextConfig) → applyHotReload() (atomically switch each subsystem)
```
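The decision tree reduces to a pure function once the plan has been computed. decideReload and its Decision labels are hypothetical; the sketch assumes that in hot mode a restart-requiring change is ignored as a whole, as the tree shows.

```typescript
type ReloadMode = "off" | "restart" | "hot" | "hybrid";
type Decision = "noop" | "restart" | "hot-reload" | "ignored-needs-restart";

// Pure version of the decision tree: same branch order, no side effects.
function decideReload(mode: ReloadMode, changedPaths: readonly string[], planNeedsRestart: boolean): Decision {
  if (changedPaths.length === 0) return "noop";
  if (mode === "off") return "noop"; // Logged, nothing applied
  if (mode === "restart") return "restart"; // Any change restarts
  if (planNeedsRestart) {
    return mode === "hot" ? "ignored-needs-restart" : "restart"; // hot warns, hybrid restarts
  }
  return "hot-reload";
}
```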
openclaw.json Hot-Reload Configuration Example
```json
{
  "gateway": {
    "reload": {
      "mode": "hybrid",
      "debounceMs": 300
    }
  }
}
```
Disable auto-reload (suitable for production with manual control):
```json
{
  "gateway": {
    "reload": {
      "mode": "off"
    }
  }
}
```
Force all changes to go through a full restart (most conservative mode):
```json
{
  "gateway": {
    "reload": {
      "mode": "restart"
    }
  }
}
```
6. Health Monitoring and Presence
ChannelHealthMonitor Architecture
startChannelHealthMonitor() (src/gateway/channel-health-monitor.ts:77) implements periodic health checks for all configured channel accounts, automatically restarting unhealthy channel connections.
Default timing parameters (src/gateway/channel-health-monitor.ts:12-25):
```typescript
// src/gateway/channel-health-monitor.ts:12-25
const DEFAULT_CHECK_INTERVAL_MS = 5 * 60_000; // 5 minutes (check interval)
const DEFAULT_MONITOR_STARTUP_GRACE_MS = 60_000; // 1 minute (monitor startup grace period)
const DEFAULT_COOLDOWN_CYCLES = 2; // → cooldown = 2 × 5min = 10 minutes
const DEFAULT_MAX_RESTARTS_PER_HOUR = 10; // Max 10 restarts per hour
const ONE_HOUR_MS = 60 * 60_000;
const DEFAULT_STALE_EVENT_THRESHOLD_MS = 30 * 60_000; // 30 minutes (no events → stale socket)
const DEFAULT_CHANNEL_CONNECT_GRACE_MS = 120_000; // 2 minutes (channel connect grace period)
```
Where this function is called in the Gateway (src/gateway/server.impl.ts:694-699):
```typescript
// src/gateway/server.impl.ts:694-699
let channelHealthMonitor = healthCheckDisabled
  ? null
  : startChannelHealthMonitor({
      channelManager,
      checkIntervalMs: (healthCheckMinutes ?? 5) * 60_000,
    });
```
The "stale socket" scenario targets platforms like Slack where WebSocket connections can appear alive (health check passes) while the platform has silently stopped pushing events. By tracking the lastEventAt timestamp, a restart is triggered if no events are received for 30 minutes.
evaluateChannelHealth Decision Tree
```text
runCheck() (every 5 minutes)
│
├─ now - startedAt < monitorStartupGraceMs (1min) → skip all checks
│
└─ for each channelId / accountId in snapshot:
   ├─ isManuallyStopped? → skip
   ├─ evaluateChannelHealth(status, policy)
   │  ├─ healthy == true → skip
   │  └─ healthy == false
   │     ├─ Check cooldown: now - lastRestartAt <= cooldownMs (10min) → skip
   │     ├─ pruneOldRestarts (sliding window, 1 hour)
   │     ├─ restartsThisHour >= maxRestartsPerHour (10) → warn + skip
   │     └─ Execute restart:
   │        ├─ stopChannel(channelId, accountId)
   │        ├─ resetRestartAttempts()
   │        ├─ startChannel(channelId, accountId)
   │        └─ Update RestartRecord
```
Restart records (RestartRecord) are stored in a Map<string, RestartRecord> keyed by "channelId:accountId", supporting independent counters for each account in multi-account scenarios.
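The cooldown and hourly-cap checks can be sketched per "channelId:accountId" key. tryClaimRestart, RestartPolicy, and the RestartRecord fields are hypothetical; the defaults reuse DEFAULT_COOLDOWN_CYCLES × the 5-minute check interval and DEFAULT_MAX_RESTARTS_PER_HOUR.

```typescript
type RestartRecord = { lastRestartAt: number; restarts: number[] };
type RestartPolicy = { cooldownMs: number; maxRestartsPerHour: number };

const ONE_HOUR_MS = 60 * 60_000;
// Defaults: cooldown = 2 cycles × 5-minute interval, at most 10 restarts per hour.
const DEFAULT_POLICY: RestartPolicy = { cooldownMs: 2 * 5 * 60_000, maxRestartsPerHour: 10 };

// Returns true, and records the restart, when the account may be restarted at `now`.
function tryClaimRestart(
  records: Map<string, RestartRecord>,
  channelId: string,
  accountId: string,
  now: number,
  policy: RestartPolicy = DEFAULT_POLICY,
): boolean {
  const key = `${channelId}:${accountId}`; // Independent counters per account
  const record = records.get(key);
  if (record && now - record.lastRestartAt <= policy.cooldownMs) return false; // Cooling down
  // Sliding one-hour window: forget restarts older than an hour.
  const recent = (record?.restarts ?? []).filter((t) => now - t < ONE_HOUR_MS);
  if (recent.length >= policy.maxRestartsPerHour) return false; // Hourly cap reached
  records.set(key, { lastRestartAt: now, restarts: [...recent, now] });
  return true;
}
```

Under the defaults, the 10-minute cooldown alone allows at most six restarts in any hour, so the cap of 10 presumably matters only when a shorter check interval shrinks the cooldown.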
Adjusting Timing via Configuration
resolveHealthTimingFromConfig() converts user-configured "minutes" into internal milliseconds:
```json
{
  "gateway": {
    "channelHealthCheckMinutes": 10,
    "channelHealthTiming": {
      "startupGraceMinutes": 2,
      "connectGraceMinutes": 3,
      "staleEventThresholdMinutes": 60
    }
  }
}
```
- channelHealthCheckMinutes: 0 completely disables health monitoring (healthCheckDisabled = true; no monitor instance is created).
- The three fields in channelHealthTiming are independently configurable; unset fields use their default values.
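The minutes-to-milliseconds conversion, including the disable switch, can be sketched as below. resolveHealthTiming and its types are hypothetical stand-ins for resolveHealthTimingFromConfig(); the defaults are the constants from channel-health-monitor.ts listed earlier.

```typescript
// Hypothetical input type mirroring the JSON keys shown above.
type HealthConfig = {
  channelHealthCheckMinutes?: number;
  channelHealthTiming?: {
    startupGraceMinutes?: number;
    connectGraceMinutes?: number;
    staleEventThresholdMinutes?: number;
  };
};

type HealthTiming = {
  checkIntervalMs: number;
  monitorStartupGraceMs: number;
  channelConnectGraceMs: number;
  staleEventThresholdMs: number;
};

const MINUTE_MS = 60_000;

// Returns null when monitoring is disabled (channelHealthCheckMinutes: 0).
function resolveHealthTiming(cfg: HealthConfig): HealthTiming | null {
  const checkMinutes = cfg.channelHealthCheckMinutes ?? 5;
  if (checkMinutes === 0) return null;
  const t = cfg.channelHealthTiming;
  return {
    checkIntervalMs: checkMinutes * MINUTE_MS,
    monitorStartupGraceMs: (t?.startupGraceMinutes ?? 1) * MINUTE_MS,
    channelConnectGraceMs: (t?.connectGraceMinutes ?? 2) * MINUTE_MS,
    staleEventThresholdMs: (t?.staleEventThresholdMinutes ?? 30) * MINUTE_MS,
  };
}
```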
This configuration change triggers the restart-health-monitor action, categorized as hot, meaning the Gateway does not need a full restart for it to take effect (when used with hybrid or hot mode):
```json
{
  "gateway": {
    "reload": {
      "mode": "hybrid"
    },
    "channelHealthCheckMinutes": 10
  }
}
```
Presence System
Presence works in conjunction with health monitoring. When clients connect or disconnect, upsertPresence() updates the online status, incrementPresenceVersion() increments the version number, and broadcastPresenceSnapshot() broadcasts the latest presence snapshot to all connected clients. The presence version number is sent alongside the event: tick frame (every 30 seconds) and all broadcast stateVersion fields, enabling clients to detect whether they need to request a refresh.
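A versioned presence store can be sketched in a few lines. PresenceStore, its entry fields, and needsPresenceRefresh are hypothetical; the sketch only illustrates the bump-version-on-change pattern and the client-side version comparison.

```typescript
type PresenceEntry = { connId: string; client: string; since: number };

// Hypothetical store: every effective membership change bumps the version once.
class PresenceStore {
  private readonly entries = new Map<string, PresenceEntry>();
  private version = 0;

  upsert(entry: PresenceEntry): number {
    this.entries.set(entry.connId, entry);
    return ++this.version;
  }

  remove(connId: string): number {
    if (this.entries.delete(connId)) this.version++; // No bump for unknown ids
    return this.version;
  }

  snapshot(): { version: number; entries: PresenceEntry[] } {
    return { version: this.version, entries: [...this.entries.values()] };
  }
}

// Client side: refresh only when the advertised version moved past the cached one.
function needsPresenceRefresh(cachedVersion: number, advertisedVersion: number): boolean {
  return advertisedVersion > cachedVersion;
}
```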
7. Global Architecture Overview
```text
┌─────────────────────────────────────────────────────────────────────┐
│ OpenClaw Gateway (single process, single port 18789) │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ HTTP / TLS Layer (Node http/https) │ │
│ │ │ │
│ │ incoming request │ │
│ │ │ │ │
│ │ Upgrade: websocket? ──YES──→ attachGatewayUpgradeHandler │ │
│ │ │ │ │ │
│ │ NO ↓ │ │
│ │ ↓ WebSocketServer │ │
│ │ handleRequest() (maxPayload=25MB) │ │
│ │ Stage Pipeline: attachGatewayWsConnectionHandler│ │
│ │ [hooks] ├─ connect.challenge │ │
│ │ [tools-invoke] ├─ handshake (10s timeout) │ │
│ │ [slack] ├─ hello-ok (methods+events) │ │
│ │ [openresponses] └─ req/res/event frames │ │
│ │ [openai] │ │
│ │ [canvas-auth] │ │
│ │ [a2ui] │ │
│ │ [canvas-http] │ │
│ │ [plugin-auth] │ │
│ │ [plugin-http] │ │
│ │ [control-ui-avatar] │ │
│ │ [control-ui-http] │ │
│ │ [gateway-probes] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ NodeRegistry │ │ CronService │ │ ChannelManager │ │
│ │ (Pi/mobile/ │ │ (scheduled │ │ Slack/Telegram/Discord/ │ │
│ │ desktop │ │ tasks) │ │ Signal/iMessage/... │ │
│ │ nodes) │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────┬───────────────┘ │
│ │ │
│ ┌────────────────────────────────┐ ┌──────────▼───────────────┐ │
│ │ Maintenance Timers │ │ ChannelHealthMonitor │ │
│ │ tick: 30s keepalive │ │ check: 5min │ │
│ │ health: 60s snapshot refresh │ │ stale: 30min │ │
│ │ dedupe: 60s cache cleanup │ │ max restarts: 10/hr │ │
│ └────────────────────────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Config Reloader (chokidar) │ │
│ │ openclaw.json → diffConfigPaths → buildGatewayReloadPlan │ │
│ │ stabilityThreshold: 200ms debounce: 300ms │ │
│ │ mode: off / restart / hot / hybrid(default) │ │
│ │ │ │
│ │ hot reload: hooks / heartbeat / cron / health-monitor │ │
│ │ / browser-control / per-channel restart │ │
│ │ restart: gateway / plugins / discovery / canvasHost │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Secrets Runtime │ │ HeartbeatRunner │ │
│ │ activateSnapshot │ │ (agent heartbeat │ │
│ │ last-known-good │ │ loop) │ │
│ └────────────────────┘ └────────────────────┘ │
│ │
│ Sidecars: BrowserControl · GmailWatcher · Tailscale · PluginSvcs │
└─────────────────────────────────────────────────────────────────────┘
↑ WS clients: macOS app / iOS / Android / Web UI / Pi nodes
```
Core Source File Quick Reference
| Function | Source File |
|---|---|
| Gateway main entry | src/gateway/server.impl.ts |
| HTTP routing Stage Pipeline | src/gateway/server-http.ts |
| WebSocket connection handshake | src/gateway/server/ws-connection.ts |
| WS frame schema | src/gateway/protocol/schema/frames.ts |
| WS broadcast + Slow Consumer | src/gateway/server-broadcast.ts |
| Server constants (MAX_PAYLOAD, etc.) | src/gateway/server-constants.ts |
| Runtime state initialization | src/gateway/server-runtime-state.ts |
| Maintenance timers | src/gateway/server-maintenance.ts |
| Configuration hot-reload | src/gateway/config-reload.ts |
| Reload plan construction | src/gateway/config-reload-plan.ts |
| Channel health monitoring | src/gateway/channel-health-monitor.ts |
| Gateway method list | src/gateway/server-methods-list.ts |
| Startup sidecars | src/gateway/server-startup.ts |
Summary
- We traced the Gateway's single-process, single-port architecture and how Node.js's native upgrade event multiplexes HTTP and WebSocket traffic without a reverse proxy.
- The 48-step, 9-phase startup sequence was examined end to end, from config migration and secrets resolution through plugin loading, HTTP/WS server creation, sidecar lifecycle management, and the config reloader.
- HTTP routing uses a stage pipeline with first-match-wins semantics across 13 ordered stages, where disabled features contribute zero overhead because their stages are never added to the array.
- Configuration hot-reload was broken down across its four modes (off/restart/hot/hybrid), the recursive deep-diff engine, and the per-subsystem reload rules that determine which changes can be applied without a full restart.
- Health monitoring and slow-consumer protection ensure operational resilience: channel health checks auto-restart unhealthy connections with cooldown and hourly caps, while broadcast backpressure either drops low-priority events or force-disconnects laggy WebSocket clients.
This article is based on source code analysis of OpenClaw 2026.3.2. All data comes from actual code, with no subjective inference.