Series goal: After reading the full series, you'll be able to do custom development on OpenClaw and build a similar system from scratch.
Core questions in this article: After a message reaches the Agent, how does the AI "think"? How are tool calls executed? How are real-world problems like rate limits and context overflow handled?
Start With a Seemingly Simple Request
You send a message on WhatsApp:
"Summarize today's meeting notes and save the summary to my desktop."
This message travels through the Gateway and routing system described in the previous two articles, and eventually reaches the Agent. Then what?
This request requires the AI to do several things in sequence: understand intent → find the meeting notes file → read it → generate a summary → write to a file → report back. Any step can fail: file not found, API rate limit hit, context too long...
The Agent execution engine's job is to reliably complete this entire sequence, no matter what happens along the way.
First Challenge: Prevent Concurrent Conflicts
Before explaining how the AI "thinks," let's solve a foundational problem: can the same session process two messages at once?
No. A single AI conversation context (SessionKey) has one history transcript. If two messages write to it concurrently, the history gets corrupted and the AI's context breaks.
OpenClaw's solution is Lanes:
// src/agents/pi-embedded-runner/lanes.ts
export function resolveSessionLane(key: string) {
// Each SessionKey has its own dedicated command queue
return `session:${key}`;
}
Each SessionKey has exactly one Lane, and tasks within a Lane execute strictly in sequence:
// src/agents/pi-embedded-runner/run.ts
const sessionLane = resolveSessionLane(params.sessionKey ?? params.sessionId);
return enqueueSession(() => // ① queue in session lane
enqueueGlobal(async () => { // ② queue in global lane
// actual AI execution logic
})
);
The two nested enqueue calls each serve a purpose:
- session lane: same-session messages are serialized — prevents concurrent writes
- global lane: cross-session shared resources (model connections, file handles) are also fairly queued — prevents one session from monopolizing resources
This is a multi-level queue pattern: the inner level ensures concurrency safety, the outer level ensures resource fairness.
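The pattern can be sketched in a few lines as a promise chain per lane key. This is a minimal illustration, not OpenClaw's actual implementation: `enqueueLane` and `runInSession` are invented names standing in for `enqueueSession`/`enqueueGlobal`.

```typescript
// Illustrative sketch of the two-level queue pattern. enqueueLane and
// runInSession are invented names; the real enqueueSession/enqueueGlobal
// live in src/agents/pi-embedded-runner.
type Task<T> = () => Promise<T>;

const laneTails = new Map<string, Promise<unknown>>();

// Append a task to a named lane; tasks in the same lane run strictly in order.
function enqueueLane<T>(lane: string, task: Task<T>): Promise<T> {
  const tail = laneTails.get(lane) ?? Promise.resolve();
  const next = tail.then(task, task); // run even if the previous task failed
  laneTails.set(lane, next.catch(() => undefined)); // keep the chain alive
  return next;
}

// Inner lane serializes one conversation; outer lane shares global resources.
function runInSession<T>(sessionKey: string, task: Task<T>): Promise<T> {
  return enqueueLane(`session:${sessionKey}`, () => enqueueLane("global", task));
}
```

Because each lane's tail is a promise, a second message for the same session simply chains behind the first, with no locks needed.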
The Main Event: Five Phases of a Single Run
Once inside the Lane, the core function runEmbeddedAttempt takes over. It does five things:
Phase 1: Prepare Workspace and Skill Environment
// src/agents/pi-embedded-runner/run/attempt.ts
const sandbox = await resolveSandboxContext({ config, sessionKey, workspaceDir });
const effectiveWorkspace = sandbox?.enabled ? sandbox.workspaceDir : resolvedWorkspace;
// Switch working directory to workspace (the AI's file system perspective)
process.chdir(effectiveWorkspace);
// Load skills and apply environment variable overrides
const skillEntries = loadWorkspaceSkillEntries(effectiveWorkspace);
restoreSkillEnv = applySkillEnvOverrides({ skills: skillEntries, config });
Skills are OpenClaw's extension mechanism — like installing "apps" for the AI. A skill can provide:
- Dedicated environment variables (e.g. `GITHUB_TOKEN`)
- Documentation (injected into the system prompt, telling the AI how to use this capability)
- Predefined task templates
Skill documentation gets injected in the next phase so the AI knows what capabilities it has.
Phase 2: Build the System Prompt
The system prompt is the source of the AI's "personality" — it determines how the AI behaves, what it can and cannot do. OpenClaw's system prompt is dynamically built for each run:
// src/agents/pi-embedded-runner/run/attempt.ts
const appendPrompt = buildEmbeddedSystemPrompt({
workspaceDir: effectiveWorkspace, // where the AI's working directory is
defaultThinkLevel: params.thinkLevel, // whether deep thinking is enabled
skillsPrompt, // installed skills documentation
docsPath, // documentation path
sandboxInfo, // sandbox restrictions
tools, // available tools list
runtimeInfo: { // runtime environment
host: machineName,
os: `${os.type()} ${os.release()}`,
model: `${params.provider}/${params.modelId}`,
channel: runtimeChannel,
capabilities: runtimeCapabilities, // what this channel supports (e.g. Telegram inline buttons)
},
reactionGuidance, // Telegram/Signal emoji reaction guidance
messageToolHints, // message sending tool usage hints
// ...more params
});
Notice runtimeCapabilities: the AI behaves differently on different channels. If Telegram supports inline buttons, the AI knows it can send interactive button menus. If WhatsApp doesn't, the AI sticks to plain text. The system prompt dynamically adjusts the AI's capability descriptions based on the current channel.
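A hypothetical sketch of how a capability flag can gate a prompt fragment; the function name and the `"inline_buttons"` capability string are assumptions, not OpenClaw's exact identifiers:

```typescript
// Hypothetical: choose prompt guidance based on the channel's capability list.
// "inline_buttons" is an assumed capability name for illustration.
function capabilityHints(capabilities: string[]): string {
  if (capabilities.includes("inline_buttons")) {
    return "You may present choices as inline button menus.";
  }
  return "This channel is text-only: present choices as a numbered list.";
}
```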
Phase 3: Load Session History
The AI needs to know "what was said before" to continue the conversation:
// src/agents/pi-embedded-runner/run/attempt.ts
await repairSessionFileIfNeeded({ sessionFile: params.sessionFile });
const sessionManager = guardSessionManager(
(await createAgentSession({ sessionFile, ... })).session,
{ sessionId: params.sessionId }
);
// History length limit: DM sessions have a separate cap (prevent single-user context monopoly)
const historyLimit = getDmHistoryLimitFromSessionKey(params.sessionKey, params.config);
if (historyLimit) {
await limitHistoryTurns(sessionManager, historyLimit);
}
Session history is stored in JSONL files (~/.openclaw/agents/<agentId>/sessions/), managed by @mariozechner/pi-coding-agent's SessionManager. OpenClaw wraps it in guardSessionManager, intercepting every write to verify its integrity (e.g. tool_use and tool_result entries must be correctly paired).
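The pairing invariant can be illustrated with a small checker. This is a sketch of the idea, not `guardSessionManager`'s actual code; the entry types are simplified stand-ins for the JSONL records:

```typescript
// Sketch of the tool_use/tool_result pairing check that guardSessionManager's
// write interception enforces. Types are simplified for illustration.
type HistoryEntry =
  | { type: "text"; content: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; id: string; result: string };

function checkToolPairing(history: HistoryEntry[]): string[] {
  const open = new Set<string>();
  const errors: string[] = [];
  for (const entry of history) {
    if (entry.type === "tool_use") {
      open.add(entry.id);
    } else if (entry.type === "tool_result") {
      // A result must answer a previously opened tool_use.
      if (!open.delete(entry.id)) errors.push(`orphan tool_result ${entry.id}`);
    }
  }
  for (const id of open) errors.push(`unpaired tool_use ${id}`);
  return errors;
}
```

An unpaired `tool_use` is exactly the kind of corruption that breaks the next model call, since providers reject transcripts where a tool call has no result.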
Phase 4: Register Tools
Tools are the AI's "hands." All available tools are registered here:
// src/agents/pi-embedded-runner/run/attempt.ts
const toolsRaw = createOpenClawCodingTools({
agentId: sessionAgentId,
exec: { ...params.execOverrides, elevated: params.bashElevated },
sandbox,
messageProvider: params.messageChannel,
sessionKey: params.sessionKey ?? params.sessionId,
workspaceDir: effectiveWorkspace,
config: params.config,
abortSignal: runAbortController.signal,
// ...more context
});
// Tool policy filtering
const tools = sanitizeToolsForGoogle({ tools: toolsRaw, provider: params.provider });
const allowedToolNames = collectAllowedToolNames({ tools, clientTools: params.clientTools });
The tool set includes: file read/write, bash execution, message sending, web requests, media processing... The tool policy is detailed in the next section.
Phase 5: Subscribe to Streaming Output
// src/agents/pi-embedded-runner/run/attempt.ts
const subscribeResult = await subscribeEmbeddedPiSession({
session: sessionManager,
prompt: params.prompt,
onBlockReply: params.onBlockReply, // called when AI completes a text block
onReasoningStream: params.onReasoningStream,
// ...
});
subscribeEmbeddedPiSession is the actual entry point for AI execution, receiving and processing streaming events from the SDK.
Streaming Subscription: How the AI's Thinking Is Captured
subscribeEmbeddedPiSession handles three types of events from @mariozechner/pi-agent-core:
Event 1: Text Stream
// Each token arrival
text_delta → accumulate into deltaBuffer → detect <think> tags → filter or emit
const THINKING_TAG_SCAN_RE = /<\s*(\/?)\s*(?:think(?:ing)?|thought|antthinking)\s*>/gi;
When <think>...</think> is encountered, content is handled based on reasoningMode:
- `off`: filtered out — users don't see the AI's chain of thought
- `on`: thinking is sent as a separate message
- `stream`: thinking is pushed in real time (experimental)
The "when to send" timing for text blocks is controlled by blockReplyBreak:
- `text_end` (default): send when a full text block is complete — avoids frequent interruptions
- `paragraph`: send at each paragraph break — users see progress sooner
This involves a code-span-aware block chunker (EmbeddedBlockChunker): when splitting text, it detects whether you're inside a code block, preventing splits that would break Markdown rendering.
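The invariant is easy to show in a batch form. The real `EmbeddedBlockChunker` is streaming and more nuanced; this sketch only demonstrates the core rule, never split at a paragraph break that falls inside a ``` fence:

```typescript
// Batch sketch of fence-aware splitting in the spirit of EmbeddedBlockChunker.
// Splits on blank lines, but only when outside a ``` code fence.
function splitOutsideCodeFences(text: string): string[] {
  const lines = text.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of lines) {
    if (/^```/.test(line.trim())) inFence = !inFence; // toggle on fence markers
    if (line.trim() === "" && !inFence && current.length > 0) {
      chunks.push(current.join("\n")); // paragraph break outside a fence: split
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Splitting inside a fence would ship a message ending with an unclosed ```, which most chat clients render as broken Markdown.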
Event 2: Tool Calls
// Tool call event sequence:
tool_use_start → dispatch to appropriate tool executor
tool_use_result → write result back to SessionManager
Before a tool call executes, it passes through runBeforeToolCallHook:
// src/agents/pi-tools.before-tool-call.ts
export async function runBeforeToolCallHook(args: {
toolName: string;
params: unknown;
toolCallId?: string;
ctx?: HookContext;
}): Promise<HookOutcome> {
// 1. Tool loop detection (prevent AI from looping the same tool call)
// 2. Plugin hooks (before_tool_call hook, can intercept or modify params)
// 3. If blocked=true, return the error as a tool result back to the AI
}
Tool loop detection: if the AI calls the same tool with identical arguments more than ~10 times, it's stuck in a loop — the AI is prompted: "Repeated identical tool call detected, please try a different approach."
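A minimal sketch of that detection, counting consecutive identical (tool name, params) signatures. The threshold and the factory shape are assumptions; the real hook lives in `pi-tools.before-tool-call.ts`:

```typescript
// Sketch of tool-loop detection: block after ~10 consecutive identical calls.
// The threshold and return shape are illustrative assumptions.
function makeLoopDetector(threshold = 10) {
  let lastSignature = "";
  let repeats = 0;
  return (toolName: string, params: unknown): { blocked: boolean; reason?: string } => {
    const signature = `${toolName}:${JSON.stringify(params)}`;
    repeats = signature === lastSignature ? repeats + 1 : 1;
    lastSignature = signature;
    if (repeats > threshold) {
      return {
        blocked: true,
        reason: "Repeated identical tool call detected, please try a different approach.",
      };
    }
    return { blocked: false };
  };
}
```

Returning the reason as a tool result (rather than aborting the run) gives the model a chance to self-correct.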
Event 3: Compaction Signal
// When pi-agent-core internally triggers compaction
compaction_start → set compactionInFlight = true
compaction_done → clear flag, continue streaming
Tool Policy: What Is the AI Allowed to Do?
The AI has many tools, but not every scenario should grant access to all of them. The tool policy is the key implementation of the security boundary.
Tool Filtering: Deny/Allow Pattern
// src/agents/pi-tools.policy.ts
function makeToolPolicyMatcher(policy: SandboxToolPolicy) {
  const deny = compileGlobPatterns({ raw: expandToolGroups(policy.deny ?? []) });
  const allow = compileGlobPatterns({ raw: expandToolGroups(policy.allow ?? []) });
  return (name: string) => {
    const normalized = name.toLowerCase(); // normalize the tool name before matching
    if (matchesAnyGlobPattern(normalized, deny)) return false; // deny list takes priority
    if (allow.length === 0) return true; // no allow list = allow everything
    return matchesAnyGlobPattern(normalized, allow);
  };
}
Tool names support Glob patterns: exec:* matches all exec-series tools, bash matches only bash.
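A self-contained sketch of the deny-over-allow semantics, with glob support reduced to `*` wildcards for brevity (the real policy compiles richer patterns and tool groups):

```typescript
// Sketch of deny/allow matching. Only "*" wildcards are supported here;
// makeMatcher and globToRegExp are illustrative names.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, then turn "*" into ".*".
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function makeMatcher(deny: string[], allow: string[]) {
  const denyRes = deny.map(globToRegExp);
  const allowRes = allow.map(globToRegExp);
  return (name: string): boolean => {
    if (denyRes.some((re) => re.test(name))) return false; // deny wins
    if (allowRes.length === 0) return true;                // no allow list = allow all
    return allowRes.some((re) => re.test(name));
  };
}
```

Putting deny first means a broad allow list like `*` can still be safely narrowed by a handful of deny entries.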
Additional Restrictions for Sub-Agents
When a main Agent spawns a sub-agent to handle a subtask, the sub-agent's tool set is further restricted:
// src/agents/pi-tools.policy.ts — always denied for sub-agents
const SUBAGENT_TOOL_DENY_ALWAYS = [
"gateway", // system administration — dangerous
"agents_list", // system administration
"whatsapp_login", // interactive setup — not a task
"session_status", // status/scheduling — main agent coordinates this
"cron", // scheduled tasks — not the sub-agent's domain
"memory_search", // memory — main agent passes relevant info via spawn prompt
"memory_get",
"sessions_send", // direct session sends — sub-agents communicate through announce chain
];
// Leaf sub-agents (deepest level, cannot spawn further) additionally denied:
const SUBAGENT_TOOL_DENY_LEAF = [
"sessions_list",
"sessions_history",
"sessions_spawn", // leaves cannot spawn
];
This design flows from a clear principle: each Agent only does what it's meant to do. Sub-agents are executors, not managers; memory queries and task scheduling are the orchestrator's (main Agent's) responsibility.
Sub-agent spawn depth is configurable (maxSpawnDepth). The deeper the depth, the stricter the restrictions:
- Depth 1 with `maxSpawnDepth >= 2` (orchestrator): can spawn grandchildren
- Depth >= `maxSpawnDepth` (leaf): can only execute, cannot spawn
The Outer Retry Loop: Fighting Real-World Unreliability
The inner "single run" occasionally fails: API rate limits, expired auth, context overflow... The outer loop is dedicated to handling these situations:
// src/agents/pi-embedded-runner/run.ts
const MAX_RUN_LOOP_ITERATIONS = resolveMaxRunRetryIterations(profileCandidates.length);
// 32–160 iterations, dynamically scaled by the number of auth profiles
while (true) {
if (runLoopIterations >= MAX_RUN_LOOP_ITERATIONS) {
return { error: "Exceeded retry limit after N attempts" };
}
const attempt = await runEmbeddedAttempt({ ... });
if (attempt succeeded) {
markAuthProfileGood(profileId); // mark this profile as healthy
return success result;
}
if (isRateLimitError(attempt)) {
markAuthProfileFailure(profileId, "rate_limit"); // mark as rate-limited, enter cooldown
const advanced = await advanceAuthProfile(); // switch to next profile
if (!advanced) return failure;
continue; // retry with new profile
}
if (isContextOverflowError(attempt)) {
if (overflowCompactionAttempts < 3) {
await compactEmbeddedPiSession( ... ); // summarize session history
overflowCompactionAttempts++;
continue; // retry with compacted history
}
return context overflow failure;
}
if (isAuthError(attempt)) {
markAuthProfileFailure(profileId, "auth");
advanceAuthProfile();
continue;
}
// ...more error handling
}
Auth Profile Rotation
This is the core mechanism for handling API rate limits. You can configure multiple API keys (or OAuth accounts) as "auth profiles":
# openclaw.yml
auth:
profiles:
- id: primary
provider: anthropic
apiKey: sk-ant-...
- id: backup-1
provider: anthropic
apiKey: sk-ant-...
- id: backup-2
provider: anthropic
apiKey: sk-ant-...
When primary hits a rate limit:
1. `markAuthProfileFailure(primary, "rate_limit")` — enters cooldown period
2. `advanceAuthProfile()` — switches to `backup-1`
3. Retry with `backup-1`
4. If `backup-1` also rate-limits, switch to `backup-2`
5. All profiles in cooldown → report "API temporarily unavailable" to user
This solves a real pain point for personal AI assistants: if you kick off a complex task at midnight that requires many API calls, a single key hitting rate limits means you just wait. With profile rotation, the system automatically continues with other keys.
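The rotation-with-cooldown logic can be sketched as a small class. Names (`ProfileRotation`, `markFailure`, `advance`) and the fixed cooldown are assumptions; the real code tracks per-profile health in the runner:

```typescript
// Sketch of auth profile rotation with cooldown. Names and the single
// cooldown duration are illustrative assumptions.
interface AuthProfile {
  id: string;
  cooldownUntil: number; // epoch ms; 0 = healthy
}

class ProfileRotation {
  private index = 0;
  constructor(private profiles: AuthProfile[], private cooldownMs = 60_000) {}

  current(): AuthProfile {
    return this.profiles[this.index];
  }

  markFailure(kind: "rate_limit" | "auth", now = Date.now()): void {
    // Auth failures could warrant a longer cooldown; one duration for brevity.
    this.current().cooldownUntil = now + this.cooldownMs;
  }

  // Advance to the next profile not in cooldown; false = all exhausted.
  advance(now = Date.now()): boolean {
    for (let i = 1; i <= this.profiles.length; i++) {
      const candidate = (this.index + i) % this.profiles.length;
      if (this.profiles[candidate].cooldownUntil <= now) {
        this.index = candidate;
        return true;
      }
    }
    return false;
  }
}
```

When `advance()` returns false, the outer loop has nothing left to rotate to, which is the "API temporarily unavailable" case reported to the user.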
Context Overflow and Compaction
LLM context windows are finite (e.g. Claude's 200k tokens). Long conversations and large tool results will eventually fill it up.
When the API returns a "context exceeded" error:
// src/agents/pi-embedded-runner/run.ts
if (isLikelyContextOverflowError(attempt)) {
const compacted = await compactEmbeddedPiSession({
sessionFile: params.sessionFile,
trigger: "overflow",
// use a lighter model for summarization (not the current heavy model)
model: compactionModelId,
// ...
});
// after compaction, retry the attempt with the summarized history
}
The compaction process:
- Read the full session history
- Have an AI generate a "conversation summary"
- Replace the history messages with that summary
- Retry the request with the compacted history
This is not simple truncation — truncating causes AI "amnesia." Compaction preserves key context. Using a smaller model for the summarization task also makes sense: it doesn't require complex reasoning, and using a cheap, fast model saves time and cost.
Compaction retries a maximum of 3 times (MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3) to prevent infinite compaction loops.
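The replace-older-turns-with-a-summary step can be sketched like this; `compactHistory`, `summarize`, and the keep-recent count are illustrative stand-ins for the real `compactEmbeddedPiSession` flow:

```typescript
// Sketch of overflow compaction: summarize older turns with a (lighter) model
// and keep the most recent turns verbatim. summarize() stands in for the
// call to the compaction model; all names here are assumptions.
async function compactHistory(
  history: string[],
  summarize: (turns: string[]) => Promise<string>,
  keepRecent = 4,
): Promise<string[]> {
  if (history.length <= keepRecent) return history; // nothing worth compacting
  const toSummarize = history.slice(0, history.length - keepRecent);
  const summary = await summarize(toSummarize);
  // The summary replaces the old turns; recent turns survive verbatim.
  return [`[conversation summary] ${summary}`, ...history.slice(-keepRecent)];
}
```

Keeping the most recent turns verbatim matters: the model usually needs exact wording for the task in flight, while older context tolerates lossy summarization.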
The Full Flow Diagram
Putting everything together:
Message arrives at Agent
↓
runEmbeddedPiAgent()
→ queue in session lane (serializes same-session messages)
→ queue in global lane (fair resource sharing)
↓
Outer retry loop (up to 160 iterations)
↓ ↑ (on failure: auth rotation / context compaction / model failover)
runEmbeddedAttempt()
① Prepare workspace + skill environment
② Dynamically build system prompt (channel capabilities, skill docs)
③ Load session history (with history length cap)
④ Register tools (with policy filtering)
⑤ subscribeEmbeddedPiSession()
↓
pi-agent-core SDK inner loop:
[Model generates text]
↓ text_delta events
Detect <think> tags → filter / send separately
EmbeddedBlockChunker splitting (code-span-aware)
onBlockReply → push to Gateway → broadcast to all clients
↓
[Model decides to call a tool]
↓ tool_use event
runBeforeToolCallHook (loop detection + plugin hooks)
↓ tool executes (bash / file read-write / message send / ...)
tool_result → write back to SessionManager
↓ result returned to model, next reasoning round begins
↓
[Model finishes]
Final reply sent back to user via channel
Summary
| Problem | Solution | Key code |
|---|---|---|
| Same-session concurrent writes | Lane serial queue | src/agents/pi-embedded-runner/lanes.ts |
| System prompt varies by channel | Dynamically built appendPrompt | run/attempt.ts:buildEmbeddedSystemPrompt |
| Streaming text breaks Markdown | Code-span-aware block chunker | src/agents/pi-embedded-block-chunker.ts |
| AI stuck in tool call loop | Tool loop detection | src/agents/pi-tools.before-tool-call.ts |
| Sub-agent over-privileged | Sub-agent tool deny-list | src/agents/pi-tools.policy.ts:SUBAGENT_TOOL_DENY_ALWAYS |
| API rate limits | Auth profile rotation (up to 160 retries) | src/agents/pi-embedded-runner/run.ts:advanceAuthProfile |
| Context window overflow | Session compaction (AI summarizes history) | src/agents/pi-embedded-runner/compact.ts |
Next article covers the Plugin SDK and extension development:
How does OpenClaw let third-party developers extend its capabilities? What interfaces does a new messaging channel (e.g. WeChat Work, Zalo) need to implement? How is plugin lifecycle managed?
Source paths: src/agents/pi-embedded-runner/ | Key files: run.ts, run/attempt.ts, pi-embedded-subscribe.ts, pi-tools.policy.ts