When people think about AI assistants, they picture a chat window backed by an LLM. But what actually makes an AI assistant alive is the runtime architecture behind it. OpenClaw uses a four-layer design that separates the control plane, gateway, agent runtime, and endpoint nodes into distinct layers, producing a system that's both lightweight and surprisingly powerful.
This article breaks down each layer, explaining why this architecture lets one AI assistant simultaneously connect to Telegram, Discord, and WhatsApp while controlling your Mac, iPhone, and Android devices.
┌─────────────────────────────────────────────────┐
│ Control Plane │
│ (Config · Auth · Device Pairing) │
└──────────────────────┬──────────────────────────┘
│ HTTPS / Token
┌──────────────────────▼──────────────────────────┐
│ Gateway │
│ (Single Process · WebSocket · Routing) │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Channel │ │ Provider │ │ Memory │ │
│ │ Plugins │ │ Plugins │ │ Plugins │ │
│ └─────────┘ └──────────┘ └──────────────┘ │
└──────────────────────┬──────────────────────────┘
│ Internal API
┌──────────────────────▼──────────────────────────┐
│ Agent Runtime │
│ (Sessions · Context Assembly · ReAct · Memory)│
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Context │ │ ReAct │ │ Memory │ │
│ │ Assembly │ │ Loop │ │ Flush │ │
│ └──────────┘ └──────────┘ └─────────────┘ │
└──────────────────────┬──────────────────────────┘
│ WebSocket
┌──────────────────────▼──────────────────────────┐
│ Nodes │
│ (macOS · iOS · Android · Physical Devices) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Camera │ │ Screen │ │Location│ │ Canvas │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
└─────────────────────────────────────────────────┘
Architecture Overview: Why Single-Process? (Gateway Layer, WebSocket Protocol)
The first key decision in OpenClaw's architecture: the Gateway runs as a single process. This isn't laziness — it's a deliberate design choice.
The Single-Process Philosophy
Traditional microservice architectures would split message routing, auth, and plugin management into separate services. But for a personal AI assistant, that decomposition creates complexity that far outweighs its benefits. OpenClaw's Gateway consolidates everything into one Node.js process:
- Message receiving and routing (from Channel plugins)
- WebSocket connection management (bidirectional with Nodes)
- Session state maintenance
- Plugin lifecycle management
Single process means zero-overhead internal calls, dead-simple deployment (one process handles everything), and natural state consistency.
WebSocket Protocol Design
Gateway-to-Node communication uses WebSocket with three message types:
// req: request-response pattern
{ type: "req", id: "abc123", method: "camera_snap", params: { facing: "front" } }
// res: response
{ type: "res", id: "abc123", result: { image: "base64..." } }
// event: one-way push
{ type: "event", name: "location_update", data: { lat: 35.68, lng: 139.76 } }
These three types cover every interaction: synchronous calls via req/res, async notifications via event. Clean and complete.
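As a sketch, a Gateway-side dispatcher for these three message types fits in a single switch statement. The `handlers`, `pending`, and `listeners` maps are illustrative names under assumed semantics, not OpenClaw's actual internals:

```javascript
// Minimal sketch of a Gateway-side dispatcher for the three message types.
// Handler, map, and callback names are illustrative assumptions.
const handlers = {
  camera_snap: async (params) => ({ image: "base64...", facing: params.facing }),
};
const pending = new Map();   // id → resolve fn for in-flight outbound "req" messages
const listeners = new Map(); // event name → subscriber callback

async function dispatch(msg, send) {
  switch (msg.type) {
    case "req": {
      // Request/response: run the named handler, echo the id back in a "res"
      const result = await handlers[msg.method](msg.params);
      send({ type: "res", id: msg.id, result });
      break;
    }
    case "res":
      // Resolve the promise of the matching outbound request, then forget it
      pending.get(msg.id)?.(msg.result);
      pending.delete(msg.id);
      break;
    case "event":
      // One-way push: notify any subscriber, no reply expected
      listeners.get(msg.name)?.(msg.data);
      break;
  }
}
```

Because every message carries either an `id` (req/res correlation) or a `name` (event routing), the dispatcher needs no other state.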
Security Model: Device Pairing & Token Auth
Security is non-negotiable. Every Node must go through device pairing on first connection:
- Node sends a pairing request, Gateway generates a one-time code
- User confirms pairing on a trusted device
- On success, Gateway issues a long-lived token
- Subsequent connections use the token — no re-pairing needed
# Check pending devices
openclaw gateway status
# Token stored locally after pairing
cat ~/.openclaw/nodes/<device-id>/token
This mirrors Bluetooth pairing — trust is established once, then auto-authenticated going forward.
The Core: Agent Runtime (Context Assembly, ReAct Loop, Memory Flush)
The Agent Runtime is OpenClaw's brain. It transforms a user message into a complete AI interaction.
Context Assembly
Before every AI response, the Agent Runtime assembles context from 5 sources:
- System Prompt — Defines the AI's identity, capabilities, and boundaries
- Workspace Files — SOUL.md, USER.md, AGENTS.md, and other persistent config
- Memory Files — MEMORY.md (long-term) + daily memory files
- Session History — Current conversation's context window
- Tool Results — Return values from previous tool calls
Context Assembly Flow:
System Prompt ──┐
Workspace Files ─┤
Memory Files ────┼──→ [Context Assembly] ──→ LLM API Call
Session History ─┤
Tool Results ────┘
This multi-source assembly ensures the AI has a complete "worldview" every time — it knows who it is, who you are, what happened before, and what tools it has.
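A minimal sketch of the assembly step, assuming a chat-style message array; the field names and injection order are illustrative, not OpenClaw's actual data model:

```javascript
// Sketch of multi-source context assembly. The five inputs mirror the
// sources listed above; file shapes and ordering are assumptions.
function assembleContext({ systemPrompt, workspaceFiles, memoryFiles, history, toolResults }) {
  const messages = [{ role: "system", content: systemPrompt }];
  // Persistent config and memory files are injected as extra system context
  for (const file of [...workspaceFiles, ...memoryFiles]) {
    messages.push({ role: "system", content: `[${file.name}]\n${file.content}` });
  }
  messages.push(...history);     // current conversation window
  messages.push(...toolResults); // results of earlier tool calls
  return messages;               // ready for the LLM API call
}
```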
Context Window Optimization
Context windows are limited, so OpenClaw optimizes token usage:
- Old messages auto-compress or truncate
- Oversized tool outputs get summarized
- Memory files are ranked by relevance, with the most relevant injected first
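A rough sketch of the trimming step: walk the history newest-first, clip oversized tool outputs, and stop when the token budget is spent. The 4-characters-per-token estimate and the cutoff logic are illustrative assumptions, not OpenClaw's actual heuristics:

```javascript
// Very rough heuristic: ~4 characters per token (an assumption for the sketch)
const estimateTokens = (text) => Math.ceil(text.length / 4);

function trimToWindow(messages, budget, maxToolChars = 2000) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest so recent turns survive the cut
  for (const msg of [...messages].reverse()) {
    let content = msg.content;
    if (msg.role === "tool" && content.length > maxToolChars) {
      // Oversized tool output gets clipped (a real system might summarize it)
      content = content.slice(0, maxToolChars) + " …[truncated]";
    }
    const cost = estimateTokens(content);
    if (used + cost > budget) break; // budget exhausted: drop everything older
    kept.unshift({ ...msg, content });
    used += cost;
  }
  return kept;
}
```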
The ReAct Loop
ReAct (Reasoning + Acting) is the core execution mode. When the AI needs tools, it enters a loop:
User message → LLM thinks → Decides to call tool → Execute → Get result → LLM thinks again → ...
// ReAct loop pseudocode
while (true) {
  const response = await llm.chat(context);
  if (response.hasToolCalls) {
    const results = await executeTools(response.toolCalls);
    context.append(results);
    continue; // Keep looping
  }
  // No tool calls — return final response
  return response.text;
}
This allows multi-step reasoning: search the web, then read a file, then run a command, then synthesize everything into a single answer.
Memory Flush
This is one of OpenClaw's most elegant designs. After a conversation ends (silence detected), the Agent Runtime:
- Reviews key information from the conversation
- Compresses and extracts what's worth remembering
- Writes to memory/YYYY-MM-DD.md (daily memory)
- Optionally updates MEMORY.md (long-term memory)
It's like a human reviewing their day before sleep — moving important stuff from short-term to long-term memory. This gives the AI cross-session continuity. It's no longer a goldfish.
The Memory System: Markdown as Database (Dual-Layer Memory, Vector Search)
OpenClaw made a bold choice: Markdown files as the memory storage format. No SQLite. No Redis. Just plain text.
Dual-Layer Memory Design
- Long-term memory (MEMORY.md) — User preferences, key decisions, persistent knowledge (like a human's long-term memory)
- Daily memory (memory/YYYY-MM-DD.md) — Conversation summaries and event logs for each day (like a diary)
<!-- MEMORY.md example -->
# Long-Term Memory
## User Preferences
- Prefers Chinese communication, English for tech terms
- Timezone: UTC+8 (Tokyo)
- Favorite tools: VS Code, iTerm2
## Key Decisions
- 2026-03-01: Decided to migrate blog to Astro
- 2026-02-28: API pricing set to per-token billing
Why Markdown Instead of a Database?
- Human-readable — You can read and edit memory files directly
- Version controlled — Git tracks every change
- Portable — Plain text, zero dependencies
- AI-friendly — LLMs are naturally great with Markdown
Vector Search
When memory files accumulate, simple text matching isn't enough. OpenClaw integrates vector search with multiple embedding providers:
# Vector search config
memory:
  embedding:
    provider: openai
    model: text-embedding-3-small
    dimensions: 1536
The flow: memory files get chunked and embedded → user message gets embedded → cosine similarity finds the most relevant memory fragments → those get injected into context.
Even with hundreds of days of memory files, the AI finds the most relevant history in milliseconds.
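The retrieval step described above can be sketched with plain cosine similarity. Real embeddings would come from the configured provider (e.g. text-embedding-3-small), so the tiny 2-dimensional vectors here are stand-ins:

```javascript
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank memory chunks by similarity to the query embedding, keep the top k.
// The { text, embedding } chunk shape is an illustrative assumption.
function topMemories(queryEmbedding, chunks, k = 3) {
  return [...chunks]
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The top-k fragments are what gets injected into the context ahead of the LLM call.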
Plugin Architecture & Extensibility (Four Plugin Slots)
OpenClaw's extensibility is built on four plugin slots. Each defines an interface for a category of capabilities.
┌──────────────────────────────────────────┐
│ OpenClaw Plugin System │
├──────────┬──────────┬────────┬───────────┤
│ Channel │ Provider │ Memory │ Tool │
├──────────┼──────────┼────────┼───────────┤
│ Telegram │ OpenAI │ Local │ Browser │
│ Discord │ Anthropic│ Vector │ Exec │
│ WhatsApp │ Gemini │ Custom │ Web Search│
│ Slack │ Mistral │ │ Camera │
│ WeChat │ Local LLM│ │ Custom │
└──────────┴──────────┴────────┴───────────┘
- Channel plugins — Message input/output. Each handles receiving, sending, and format conversion for a specific platform.
- Provider plugins — Connect to different LLM providers. OpenAI, Anthropic, Gemini, Mistral, local models — all via Provider plugins.
- Memory plugins — Control how memory is stored and retrieved. Default is local Markdown, but you can implement custom plugins for external storage.
- Tool plugins — Extend the AI's capabilities. Browser control, shell execution, web search, file ops — each is a Tool plugin.
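A registry for the four slots can be sketched as follows. The plugin shape (a `slot`, a `name`, and per-slot methods like `receive`/`send`) is an illustrative assumption, not OpenClaw's actual plugin API:

```javascript
// Sketch of a plugin registry keyed by the four slots. Shapes are assumptions.
const SLOTS = ["channel", "provider", "memory", "tool"];
const registry = { channel: [], provider: [], memory: [], tool: [] };

function registerPlugin(plugin) {
  if (!SLOTS.includes(plugin.slot)) {
    throw new Error(`unknown plugin slot: ${plugin.slot}`);
  }
  registry[plugin.slot].push(plugin);
}

// A Channel plugin handles receiving, sending, and format conversion
// for one platform; this Telegram-shaped example is hypothetical.
registerPlugin({
  slot: "channel",
  name: "telegram",
  receive: (raw) => ({ text: raw.message?.text, chatId: raw.message?.chat?.id }),
  send: async (chatId, text) => { /* would call the platform's send API here */ },
});
```

Because every capability category goes through the same registry, swapping Telegram for Discord, or OpenAI for a local model, is a matter of registering a different plugin in the same slot.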
Node Device Capabilities
Nodes are the most innovative part. They expose physical device capabilities to the AI:
- Camera — Photos, video clips (front/back)
- Screen — Screenshots, screen recording
- Location — GPS with accuracy control
- Canvas — Render and display UI content on devices
// Take a photo via Node
const photo = await nodes.camera_snap({
  node: "jeffs-iphone",
  facing: "back",
  quality: 0.8
});

// Get device location
const location = await nodes.location_get({
  node: "jeffs-macbook",
  desiredAccuracy: "balanced"
});
Currently supports macOS, iOS, and Android. Each Node maintains a WebSocket connection to the Gateway — the AI can invoke device capabilities anytime.
Design Trade-offs
OpenClaw's architecture boils down to three words: simple, transparent, extensible.
- Single-process Gateway sacrifices horizontal scaling for dead-simple deployment
- Markdown memory sacrifices query performance for human readability and version control
- Plugin architecture sacrifices out-of-the-box richness for infinite extensibility
The logic behind these trade-offs is consistent: OpenClaw targets the personal AI assistant use case, not enterprise SaaS. A single person's message volume doesn't call for a distributed system, but that person does need an AI that truly understands them, remembers them, and can control their devices.
🚀 Looking for a stable, cost-effective AI API gateway for your OpenClaw instance? Try Crazyrouter — unified API access to OpenAI, Claude, Gemini, and more. Pay-as-you-go, no monthly fees.