
Chen-Hung Wu

Posted on • Originally published at tryupskill.app

OpenClaw Agent Runner: Request Lifecycle Explained

The Six-Layer Pipeline

OpenClaw isn't a monolithic agent runtime. It's a hub-and-spoke architecture where a central Gateway orchestrates traffic from every messaging platform to a unified agent core. Here's what your request actually hits:

  1. Channel Adapter: Platform-specific ingestion (WhatsApp, Discord, Telegram, CLI)
  2. Gateway Server: WebSocket control plane, session coordination
  3. Session Resolution: Mapping messages to isolated execution contexts
  4. Lane Queue: Serial execution enforcement, race condition prevention
  5. Agent Runner: Context assembly, model invocation, tool execution
  6. Response Path: Streaming output, persistence, platform delivery

The design principle is separation of concerns: the interface layer (where messages come from) is completely decoupled from the assistant runtime (where intelligence lives). This enables one persistent assistant accessible across all platforms with centralized state.

What interviewers are actually testing: Can you decompose a system into clear boundaries? The Channel Adapter knows nothing about LLMs. The Agent Runner knows nothing about WhatsApp. That's not accidental. It's how you build systems that survive 10x growth.


Channel Adapters: Platform Normalization

Every messaging platform has its own protocol. WhatsApp uses Baileys (reverse-engineered web protocol). Telegram uses grammY. Discord uses discord.js. The Channel Adapter's job is to make these differences invisible to everything downstream.

What actually happens:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  WhatsApp   │     │  Telegram   │     │   Discord   │
│  (Baileys)  │     │   (grammY)  │     │ (discord.js)│
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                           ▼
                 ┌─────────────────┐
                 │ Normalized Msg  │
                 │ { text, media,  │
                 │   sender, ts }  │
                 └─────────────────┘
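The normalization step can be sketched in a few lines. This is a minimal illustration, not OpenClaw's actual adapter code, and the raw payload shapes below are simplified stand-ins for the real Baileys and grammY message objects:

```javascript
// Sketch: per-platform adapters collapse wildly different payloads into one
// shape. Payload field names are illustrative, not the exact library objects.
function normalizeWhatsApp(raw) {
  return {
    text: raw.message?.conversation ?? "",
    media: raw.message?.imageMessage ? [raw.message.imageMessage] : [],
    sender: raw.key?.remoteJid ?? "unknown",
    ts: raw.messageTimestamp ?? Date.now(),
  };
}

function normalizeTelegram(update) {
  return {
    text: update.message?.text ?? "",
    media: update.message?.photo ?? [],
    sender: String(update.message?.from?.id ?? "unknown"),
    ts: (update.message?.date ?? 0) * 1000, // Telegram timestamps are in seconds
  };
}
```

Everything downstream sees only `{ text, media, sender, ts }`, so the Gateway and Agent Runner never branch on platform.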

The adapter handles authentication, parses incoming messages, extracts media attachments, and enforces access control. Here's the WhatsApp configuration:

{
  "channels": {
    "whatsapp": {
      "enabled": true,
      "allowFrom": ["+1234567890"],
      "dmPolicy": "pairing"
    }
  }
}

The dmPolicy: "pairing" setting is critical. It requires device pairing before accepting DMs, which prevents random strangers from talking to your AI. I've seen production systems without this get 10,000 spam messages in an hour. Not fun to debug when your token budget explodes.
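The admission logic the adapter runs at this boundary can be sketched like this. The function name and signature are illustrative, not OpenClaw's actual API; only the config shape mirrors the example above:

```javascript
// Sketch of the DM admission check an adapter performs before anything
// reaches the Gateway: allowlist first, then pairing. Names are illustrative.
function isAllowed(config, sender, isPaired) {
  const wa = config.channels.whatsapp;
  if (!wa.enabled) return false;
  if (!wa.allowFrom.includes(sender)) return false;         // not on allowlist
  if (wa.dmPolicy === "pairing" && !isPaired) return false; // unpaired device
  return true;
}

const config = {
  channels: {
    whatsapp: { enabled: true, allowFrom: ["+1234567890"], dmPolicy: "pairing" },
  },
};
```

The point is ordering: the cheap allowlist check rejects spam before the pairing check, and both run before any tokens are spent.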

What interviewers are actually testing: Input validation at boundaries. Every system accepts external input somewhere. The question is: do you validate and normalize before it spreads through your system, or do you let garbage propagate?


Gateway Server: The Control Plane

The Gateway is where coordination happens. It's a WebSocket server running on Node.js, binding to 127.0.0.1:18789 by default. Every channel adapter connects here.

Key responsibilities:

  • Session routing: Determines which session a message belongs to
  • Frame validation: All WebSocket frames pass JSON Schema validation
  • Authentication: Token/password auth for remote connections
  • Health monitoring: Tracks system state, cron jobs, connection health

The Gateway never touches LLM logic. It's pure message routing. When a WhatsApp message arrives, the Gateway looks at the sender and message type, maps it to a session identifier, and queues it for the Agent Runner.

Session mapping follows this pattern:

| Origin | Session Key | Trust Level |
| --- | --- | --- |
| CLI / macOS app | main | Full host access |
| WhatsApp DM | agent:main:whatsapp:dm:&lt;phone&gt; | Sandboxed |
| Discord group | agent:main:discord:group:&lt;id&gt; | Sandboxed |
The main session gets host access with no Docker overhead and full filesystem. DM and group sessions run in ephemeral containers. This isn't paranoia. It's the correct threat model: you trust yourself, you don't trust random group chat members.
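Deriving those keys is mechanical. Here's a sketch under the assumption that the key format is exactly what the table shows; the function and its input shape are hypothetical:

```javascript
// Sketch: map a message origin to its session key. The "main" session is the
// only one that escapes the sandboxed agent:main:* namespace.
function sessionKey(origin) {
  if (origin.channel === "cli") return "main"; // trusted host session
  const kind = origin.isGroup ? "group" : "dm";
  const peer = origin.isGroup ? origin.groupId : origin.sender;
  return `agent:main:${origin.channel}:${kind}:${peer}`;
}
```

Because trust level is derived from the key prefix, a routing bug can't silently grant a group chat host access; it would have to produce the literal string "main".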

What interviewers are actually testing: Defense in depth. The Gateway validates frames. The Session maps trust levels. The sandbox enforces isolation. Each layer assumes the previous one might fail.


Lane Queues: Preventing State Drift

Here's where most agent frameworks break. Concurrent modifications to session state create race conditions. User sends message A. Before A finishes processing, user sends message B. Now you have two tool chains executing in parallel against the same session history. State corruption. Incoherent responses. Debugging hell.

OpenClaw's answer: Lane Queues.

┌─────────────────────────────────────────────┐
│              Lane Queue Manager              │
├─────────────────────────────────────────────┤
│  Session: main          │ Run #42 executing │
│  Session: wa:dm:+123    │ Run #17 queued    │
│  Session: dc:group:456  │ Idle              │
└─────────────────────────────────────────────┘

The rules are simple:

  1. One run per session at a time. Period.
  2. Runs queue if session is busy. FIFO ordering.
  3. Parallel lanes exist only for explicitly safe tasks, like scheduled cron jobs that don't touch session state.

This is the "Default Serial, Explicit Parallel" philosophy. Most frameworks default to parallel (fast but dangerous). OpenClaw defaults to serial (correct but slower). The 50ms you lose waiting in queue saves you hours of debugging non-deterministic state bugs.

Session locking happens before streaming begins. The SessionManager acquires a write lock while workspace is prepared, skills are injected, and context is assembled. No other run can touch that session until the lock releases.
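The serial-per-lane behavior falls out of a small amount of code. This is a minimal sketch of the pattern, not OpenClaw's implementation: each session key maps to a promise chain, so runs on the same session execute FIFO while different sessions proceed in parallel.

```javascript
// Sketch: "Default Serial, Explicit Parallel" via per-session promise chains.
class LaneQueues {
  constructor() {
    this.tails = new Map(); // sessionKey -> tail of that lane's promise chain
  }

  enqueue(sessionKey, task) {
    const tail = this.tails.get(sessionKey) ?? Promise.resolve();
    // Chain after the previous run; swallow its error so the lane keeps moving.
    const next = tail.catch(() => {}).then(task);
    this.tails.set(sessionKey, next);
    return next;
  }
}
```

Two calls to enqueue("main", ...) run strictly in order even if the first one is slow; enqueue("wa:dm:+123", ...) is unaffected because it lives on a different chain.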

What interviewers are actually testing: Concurrency control. This is the same problem as database transactions. The answer is always: define your isolation level explicitly, don't let it happen by accident.


Agent Runner: The Agentic Loop

This is where inference happens. The PiEmbeddedRunner processes requests through a five-stage loop:

1. Entry & Validation
The agent RPC accepts parameters and returns a runId immediately. Async from the start.

2. Context Assembly
This is the expensive part. The runner:

  • Loads session history from persistent JSONL files
  • Builds system prompt from workspace files (AGENTS.md, SOUL.md, TOOLS.md)
  • Queries the memory system for semantically relevant past conversations
  • Selectively injects skills to avoid prompt bloat

3. Model Invocation
Context streams to the configured provider (Anthropic, OpenAI, Gemini, local). Token counting happens here. The Context Window Guard monitors usage before the window "explodes."

4. Tool Execution
As the model returns tool calls, the runner intercepts and executes:

// Simplified tool execution flow
while (modelResponse.hasToolCalls()) {
  const call = modelResponse.nextToolCall();
  const result = await toolRegistry.execute(call.name, call.args);
  modelResponse.appendToolResult(call.id, sanitize(result));
  // Result flows back into model generation
}

Tool results undergo sanitization for size and image payloads before logging. One 10MB screenshot in your context will blow your token budget faster than anything else.
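A sanitizer in that spirit can be sketched as follows. The threshold and placeholder format are illustrative, not OpenClaw's actual values:

```javascript
// Sketch: cap text size and replace binary payloads with a placeholder
// before a tool result enters the context or the logs.
const MAX_RESULT_CHARS = 8000; // illustrative threshold

function sanitize(result) {
  if (Buffer.isBuffer(result)) {
    return `[binary payload omitted: ${result.length} bytes]`;
  }
  const text = typeof result === "string" ? result : JSON.stringify(result);
  if (text.length > MAX_RESULT_CHARS) {
    const dropped = text.length - MAX_RESULT_CHARS;
    return text.slice(0, MAX_RESULT_CHARS) + `… [truncated ${dropped} chars]`;
  }
  return text;
}
```

The key design choice is that sanitization happens once, at the boundary between tool execution and context assembly, so nothing downstream ever sees a raw 10MB screenshot.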

5. Persistence
Session state is written incrementally: every message, tool call, and result is appended to JSONL files in .openclaw/agents.main/sessions/.

The loop continues until one of three things happens:

  • Model returns a final response (no more tool calls)
  • Token limit triggers auto-compaction
  • Timeout hits (600s default)

What interviewers are actually testing: State machines. The agentic loop is a state machine with five states and explicit transitions. Can you model complex behavior as explicit states rather than implicit control flow?


Token Management: Preventing Blowups

Here's the reality of agent systems: context windows fill up fast. Every message, every tool result, every system prompt chunk. They all consume tokens. Without active management, you hit the limit mid-generation and get garbage output.

OpenClaw's token strategy:

Context Window Guard
Monitors token count during prompt assembly. Before hitting limits, it triggers summarization or stops the loop entirely. Better to fail cleanly than produce incoherent output.

Auto-Compaction
When tokens approach limits, compaction kicks in:

Before: [msg1, msg2, tool_result_50kb, msg3, msg4, ...]
After:  [summary: "User discussed X, system did Y", msg4, ...]

Compaction emits stream events and can trigger a retry. On retry, in-memory buffers reset to avoid duplicate output.

Per-Model Limits
Different models have different capacities. The runner enforces model-specific limits and reserves tokens for compaction overhead.
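Putting the guard, compaction trigger, and reserve together, the decision logic might look like this. The 4-chars-per-token estimate, the 80% threshold, and the reserve size are all illustrative assumptions, not OpenClaw's actual numbers:

```javascript
// Sketch of a context-window guard: estimate tokens, hold back a reserve for
// compaction overhead, then proceed, compact, or fail cleanly.
function guardContext(messages, modelLimit, compactionReserve = 2000) {
  // Crude estimate: ~4 characters per token (illustrative heuristic).
  const estimated = messages.reduce(
    (n, m) => n + Math.ceil(m.content.length / 4),
    0
  );
  const budget = modelLimit - compactionReserve;
  if (estimated < budget * 0.8) return { action: "proceed", estimated };
  if (estimated < budget) return { action: "compact", estimated };
  return { action: "abort", estimated }; // better than garbage mid-generation
}
```

Note the three-way outcome: compaction is attempted only while there's still room to run it, which is exactly why the reserve exists.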

Usage Logging
Everything lands in ~/.openclaw/logs/usage.jsonl:

{
  "timestamp": "2026-02-22T10:30:00Z",
  "session": "main",
  "model": "claude-sonnet-4-5",
  "input_tokens": 4521,
  "output_tokens": 892,
  "cost_usd": 0.0284
}

I've debugged sessions where a single runaway tool (listing a directory with 50,000 files) burned through $40 in tokens before anyone noticed. The logging exists for exactly this reason.

What interviewers are actually testing: Resource management. Context windows are a finite resource. How do you monitor, limit, and recover when limits are exceeded? Same pattern applies to memory, disk, network bandwidth.


Try It Yourself

Want to trace a request through the pipeline? Here's how.

Prerequisites

  • OpenClaw v2026.1.29+
  • jq for JSON parsing
  • Access to your instance's logs

Step 1: Enable Verbose Logging

openclaw config set logging.level debug
openclaw config set logging.include_tool_results true

Step 2: Send a Test Message

# Via CLI (simplest path)
openclaw chat "What time is it?"

Step 3: Trace the Request

# Find the run ID
tail -100 ~/.openclaw/logs/agent.log | grep "runId"

# Example output:
# [DEBUG] agent.run started runId=abc123 session=main

# Follow the full trace
grep "abc123" ~/.openclaw/logs/agent.log | jq .

Step 4: Inspect Session State

# View raw session history
cat ~/.openclaw/agents.main/sessions/main.jsonl | tail -5 | jq .

# Check token usage
cat ~/.openclaw/logs/usage.jsonl | tail -1 | jq .

Expected Output

{
  "runId": "abc123",
  "stages": [
    {"name": "entry", "durationMs": 2},
    {"name": "contextAssembly", "durationMs": 45},
    {"name": "modelInvocation", "durationMs": 412},
    {"name": "toolExecution", "durationMs": 0},
    {"name": "persistence", "durationMs": 8}
  ],
  "totalTokens": 1847
}

Troubleshooting

  • "Session locked" errors: Another run is in progress. Check ps aux | grep openclaw for stuck processes.
  • Compaction triggered unexpectedly: Your context is too large. Review tool results in session JSONL.
  • Latency spikes in contextAssembly: Memory queries are slow. Check your embedding index health.

Key Takeaways

OpenClaw's request lifecycle is a masterclass in separation of concerns. Channel Adapters handle platform chaos without knowing anything about LLMs. The Gateway routes and validates without touching inference. Lane Queues prevent the race conditions that plague every concurrent system. The Agent Runner implements a clean state machine for the agentic loop. And token management treats context windows as the finite resource they are.

When debugging agent systems, trace requests layer by layer. Most issues live in one of three places: context assembly (wrong state loaded), tool execution (unexpected results), or token management (limits exceeded). Understanding the pipeline means knowing exactly where to look.


