When people think about AI assistants, they picture a chat window backed by an LLM. But what actually makes an AI assistant alive is the runtime architecture behind it. OpenClaw uses a four-layer design that separates the control plane, gateway, agent runtime, and endpoint nodes into distinct layers, producing a system that's both lightweight and surprisingly powerful.
This article breaks down each layer, explaining why this architecture lets one AI assistant simultaneously connect to Telegram, Discord, and WhatsApp while controlling your Mac, iPhone, and Android devices.
┌─────────────────────────────────────────────────┐
│ Control Plane │
│ (Config · Auth · Device Pairing) │
└──────────────────────┬──────────────────────────┘
│ HTTPS / Token
┌──────────────────────▼──────────────────────────┐
│ Gateway │
│ (Single Process · WebSocket · Routing) │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Channel │ │ Provider │ │ Memory │ │
│ │ Plugins │ │ Plugins │ │ Plugins │ │
│ └─────────┘ └──────────┘ └──────────────┘ │
└──────────────────────┬──────────────────────────┘
│ Internal API
┌──────────────────────▼──────────────────────────┐
│ Agent Runtime │
│ (Sessions · Context Assembly · ReAct · Memory)│
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Context │ │ ReAct │ │ Memory │ │
│ │ Assembly │ │ Loop │ │ Flush │ │
│ └──────────┘ └──────────┘ └─────────────┘ │
└──────────────────────┬──────────────────────────┘
│ WebSocket
┌──────────────────────▼──────────────────────────┐
│ Nodes │
│ (macOS · iOS · Android · Physical Devices) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Camera │ │ Screen │ │Location│ │ Canvas │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
└─────────────────────────────────────────────────┘
Architecture Overview: Why Single-Process? (Gateway Layer, WebSocket Protocol)
The first key decision in OpenClaw's architecture: the Gateway runs as a single process. This isn't laziness — it's a deliberate design choice.
The Single-Process Philosophy
Traditional microservice architectures would split message routing, auth, and plugin management into separate services. But for a personal AI assistant, that decomposition creates complexity that far outweighs its benefits. OpenClaw's Gateway consolidates everything into one Node.js process:
- Message receiving and routing (from Channel plugins)
- WebSocket connection management (bidirectional with Nodes)
- Session state maintenance
- Plugin lifecycle management
Single process means zero-overhead internal calls, dead-simple deployment (one process handles everything), and natural state consistency.
WebSocket Protocol Design
Gateway-to-Node communication uses WebSocket with three message types:
// req: request-response pattern
{ type: "req", id: "abc123", method: "camera_snap", params: { facing: "front" } }
// res: response
{ type: "res", id: "abc123", result: { image: "base64..." } }
// event: one-way push
{ type: "event", name: "location_update", data: { lat: 35.68, lng: 139.76 } }
These three types cover every interaction: synchronous calls via req/res, async notifications via event. Clean and complete.
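As a sketch, a Gateway-side dispatcher for these three message types fits in a single switch statement. The `handlers`, `pending`, and `listeners` maps are illustrative names under assumed semantics, not OpenClaw's actual internals:

```javascript
// Minimal sketch of a Gateway-side dispatcher for the three message types.
// Handler, map, and callback names are illustrative assumptions.
const handlers = {
  camera_snap: async (params) => ({ image: "base64...", facing: params.facing }),
};
const pending = new Map();   // id → resolve fn for in-flight outbound "req" messages
const listeners = new Map(); // event name → subscriber callback

async function dispatch(msg, send) {
  switch (msg.type) {
    case "req": {
      // Request/response: run the named handler, echo the id back in a "res"
      const result = await handlers[msg.method](msg.params);
      send({ type: "res", id: msg.id, result });
      break;
    }
    case "res":
      // Resolve the promise of the matching outbound request, then forget it
      pending.get(msg.id)?.(msg.result);
      pending.delete(msg.id);
      break;
    case "event":
      // One-way push: notify any subscriber, no reply expected
      listeners.get(msg.name)?.(msg.data);
      break;
  }
}
```

Because every message carries either an `id` (req/res correlation) or a `name` (event routing), the dispatcher needs no other state.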
Security Model: Device Pairing & Token Auth
Security is non-negotiable. Every Node must go through device pairing on first connection:
- Node sends a pairing request, Gateway generates a one-time code
- User confirms pairing on a trusted device
- On success, Gateway issues a long-lived token
- Subsequent connections use the token — no re-pairing needed
# Check pending devices
openclaw gateway status
# Token stored locally after pairing
cat ~/.openclaw/nodes/<device-id>/token
This mirrors Bluetooth pairing — trust is established once, then auto-authenticated going forward.
The Core: Agent Runtime (Context Assembly, ReAct Loop, Memory Flush)
The Agent Runtime is OpenClaw's brain. It transforms a user message into a complete AI interaction.
Context Assembly
Before every AI response, the Agent Runtime assembles context from 5 sources:
- System Prompt — Defines the AI's identity, capabilities, and boundaries
- Workspace Files — SOUL.md, USER.md, AGENTS.md, and other persistent config
- Memory Files — MEMORY.md (long-term) + daily memory files
- Session History — Current conversation's context window
- Tool Results — Return values from previous tool calls
Context Assembly Flow:
System Prompt ──┐
Workspace Files ─┤
Memory Files ────┼──→ [Context Assembly] ──→ LLM API Call
Session History ─┤
Tool Results ────┘
This multi-source assembly ensures the AI has a complete "worldview" every time — it knows who it is, who you are, what happened before, and what tools it has.
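A minimal sketch of the assembly step, assuming a chat-style message array; the field names and injection order are illustrative, not OpenClaw's actual data model:

```javascript
// Sketch of multi-source context assembly. The five inputs mirror the
// sources listed above; file shapes and ordering are assumptions.
function assembleContext({ systemPrompt, workspaceFiles, memoryFiles, history, toolResults }) {
  const messages = [{ role: "system", content: systemPrompt }];
  // Persistent config and memory files are injected as extra system context
  for (const file of [...workspaceFiles, ...memoryFiles]) {
    messages.push({ role: "system", content: `[${file.name}]\n${file.content}` });
  }
  messages.push(...history);     // current conversation window
  messages.push(...toolResults); // results of earlier tool calls
  return messages;               // ready for the LLM API call
}
```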
Context Window Optimization
Context windows are limited, so OpenClaw optimizes token usage:
- Old messages auto-compress or truncate
- Oversized tool outputs get summarized
- Memory files are ranked by relevance, with the most relevant injected first
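A rough sketch of the trimming step: walk the history newest-first, clip oversized tool outputs, and stop when the token budget is spent. The 4-characters-per-token estimate and the cutoff logic are illustrative assumptions, not OpenClaw's actual heuristics:

```javascript
// Very rough heuristic: ~4 characters per token (an assumption for the sketch)
const estimateTokens = (text) => Math.ceil(text.length / 4);

function trimToWindow(messages, budget, maxToolChars = 2000) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest so recent turns survive the cut
  for (const msg of [...messages].reverse()) {
    let content = msg.content;
    if (msg.role === "tool" && content.length > maxToolChars) {
      // Oversized tool output gets clipped (a real system might summarize it)
      content = content.slice(0, maxToolChars) + " …[truncated]";
    }
    const cost = estimateTokens(content);
    if (used + cost > budget) break; // budget exhausted: drop everything older
    kept.unshift({ ...msg, content });
    used += cost;
  }
  return kept;
}
```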
The ReAct Loop
ReAct (Reasoning + Acting) is the core execution mode. When the AI needs tools, it enters a loop:
User message → LLM thinks → Decides to call tool → Execute → Get result → LLM thinks again → ...
// ReAct loop pseudocode
while (true) {
  const response = await llm.chat(context);
  if (response.hasToolCalls) {
    const results = await executeTools(response.toolCalls);
    context.append(results);
    continue; // Keep looping
  }
  // No tool calls — return final response
  return response.text;
}
This allows multi-step reasoning: search the web, then read a file, then run a command, then synthesize everything into a single answer.
Memory Flush
This is one of OpenClaw's most elegant designs. After a conversation ends (silence detected), the Agent Runtime:
- Reviews key information from the conversation
- Compresses and extracts what's worth remembering
- Writes to memory/YYYY-MM-DD.md (daily memory)
- Optionally updates MEMORY.md (long-term memory)
It's like a human reviewing their day before sleep — moving important stuff from short-term to long-term memory. This gives the AI cross-session continuity. It's no longer a goldfish.
The Memory System: Markdown as Database (Dual-Layer Memory, Vector Search)
OpenClaw made a bold choice: Markdown files as the memory storage format. No SQLite. No Redis. Just plain text.
Dual-Layer Memory Design
- Long-term memory (MEMORY.md) — User preferences, key decisions, persistent knowledge (like a human's long-term memory)
- Daily memory (memory/YYYY-MM-DD.md) — Conversation summaries and event logs for each day (like a diary)
<!-- MEMORY.md example -->
# Long-Term Memory
## User Preferences
- Prefers Chinese communication, English for tech terms
- Timezone: UTC+8 (Tokyo)
- Favorite tools: VS Code, iTerm2
## Key Decisions
- 2026-03-01: Decided to migrate blog to Astro
- 2026-02-28: API pricing set to per-token billing
Why Markdown Instead of a Database?
- Human-readable — You can read and edit memory files directly
- Version controlled — Git tracks every change
- Portable — Plain text, zero dependencies
- AI-friendly — LLMs are naturally great with Markdown
Vector Search
When memory files accumulate, simple text matching isn't enough. OpenClaw integrates vector search with multiple embedding providers:
# Vector search config
memory:
  embedding:
    provider: openai
    model: text-embedding-3-small
    dimensions: 1536
The flow: memory files get chunked and embedded → user message gets embedded → cosine similarity finds the most relevant memory fragments → those get injected into context.
Even with hundreds of days of memory files, the AI finds the most relevant history in milliseconds.
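The retrieval step described above can be sketched with plain cosine similarity. Real embeddings would come from the configured provider (e.g. text-embedding-3-small), so the tiny 2-dimensional vectors here are stand-ins:

```javascript
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank memory chunks by similarity to the query embedding, keep the top k.
// The { text, embedding } chunk shape is an illustrative assumption.
function topMemories(queryEmbedding, chunks, k = 3) {
  return [...chunks]
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The top-k fragments are what gets injected into the context ahead of the LLM call.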
Plugin Architecture & Extensibility (Four Plugin Slots)
OpenClaw's extensibility is built on four plugin slots. Each defines an interface for a category of capabilities.
┌──────────────────────────────────────────┐
│ OpenClaw Plugin System │
├──────────┬──────────┬────────┬───────────┤
│ Channel │ Provider │ Memory │ Tool │
├──────────┼──────────┼────────┼───────────┤
│ Telegram │ OpenAI │ Local │ Browser │
│ Discord │ Anthropic│ Vector │ Exec │
│ WhatsApp │ Gemini │ Custom │ Web Search│
│ Slack │ Mistral │ │ Camera │
│ WeChat │ Local LLM│ │ Custom │
└──────────┴──────────┴────────┴───────────┘
- Channel plugins — Message input/output. Each handles receiving, sending, and format conversion for a specific platform.
- Provider plugins — Connect to different LLM providers. OpenAI, Anthropic, Gemini, Mistral, local models — all via Provider plugins.
- Memory plugins — Control how memory is stored and retrieved. Default is local Markdown, but you can implement custom plugins for external storage.
- Tool plugins — Extend the AI's capabilities. Browser control, shell execution, web search, file ops — each is a Tool plugin.
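A registry for the four slots can be sketched as follows. The plugin shape (a `slot`, a `name`, and per-slot methods like `receive`/`send`) is an illustrative assumption, not OpenClaw's actual plugin API:

```javascript
// Sketch of a plugin registry keyed by the four slots. Shapes are assumptions.
const SLOTS = ["channel", "provider", "memory", "tool"];
const registry = { channel: [], provider: [], memory: [], tool: [] };

function registerPlugin(plugin) {
  if (!SLOTS.includes(plugin.slot)) {
    throw new Error(`unknown plugin slot: ${plugin.slot}`);
  }
  registry[plugin.slot].push(plugin);
}

// A Channel plugin handles receiving, sending, and format conversion
// for one platform; this Telegram-shaped example is hypothetical.
registerPlugin({
  slot: "channel",
  name: "telegram",
  receive: (raw) => ({ text: raw.message?.text, chatId: raw.message?.chat?.id }),
  send: async (chatId, text) => { /* would call the platform's send API here */ },
});
```

Because every capability category goes through the same registry, swapping Telegram for Discord, or OpenAI for a local model, is a matter of registering a different plugin in the same slot.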
Node Device Capabilities
Nodes are the most innovative part. They expose physical device capabilities to the AI:
- Camera — Photos, video clips (front/back)
- Screen — Screenshots, screen recording
- Location — GPS with accuracy control
- Canvas — Render and display UI content on devices
// Take a photo via Node
const photo = await nodes.camera_snap({
  node: "jeffs-iphone",
  facing: "back",
  quality: 0.8
});

// Get device location
const location = await nodes.location_get({
  node: "jeffs-macbook",
  desiredAccuracy: "balanced"
});
Currently supports macOS, iOS, and Android. Each Node maintains a WebSocket connection to the Gateway — the AI can invoke device capabilities anytime.
Design Trade-offs
OpenClaw's architecture boils down to three words: simple, transparent, extensible.
- Single-process Gateway sacrifices horizontal scaling for dead-simple deployment
- Markdown memory sacrifices query performance for human readability and version control
- Plugin architecture sacrifices out-of-the-box richness for infinite extensibility
The logic behind these trade-offs is consistent: OpenClaw targets the personal AI assistant use case, not enterprise SaaS. A single person's message volume doesn't call for a distributed system, but that person does need an AI that truly understands them, remembers them, and can control their devices.
🚀 Looking for a stable, cost-effective AI API gateway for your OpenClaw instance? Try Crazyrouter — unified API access to OpenAI, Claude, Gemini, and more. Pay-as-you-go, no monthly fees.