Series goal: After reading the full series, you'll be able to do custom development on OpenClaw and build a similar system from scratch.
Core questions in this article: After a message reaches the Agent, how does the AI "think"? How are tool calls executed? How are real-world problems like rate limits and context overflow handled?
Start With a Seemingly Simple Request
You send a message on WhatsApp:
"Summarize today's meeting notes and save the summary to my desktop."
This message travels through the Gateway and routing system described in the previous two articles, and eventually reaches the Agent. Then what?
This request requires the AI to do several things in sequence: understand intent → find the meeting notes file → read it → generate a summary → write to a file → report back. Any step can fail: file not found, API rate limit hit, context too long...
The Agent execution engine's job is to reliably complete this entire sequence, no matter what happens along the way.
First Challenge: Prevent Concurrent Conflicts
Before explaining how the AI "thinks," let's solve a foundational problem: can the same session process two messages at once?
No. A single AI conversation context (SessionKey) has one history transcript. If two messages write to it concurrently, the history gets corrupted and the AI's context breaks.
OpenClaw's solution is Lanes:
// src/agents/pi-embedded-runner/lanes.ts
export function resolveSessionLane(key: string) {
// Each SessionKey has its own dedicated command queue
return `session:${key}`;
}
Each SessionKey has exactly one Lane, and tasks within a Lane execute strictly in sequence:
// src/agents/pi-embedded-runner/run.ts
const sessionLane = resolveSessionLane(params.sessionKey ?? params.sessionId);
return enqueueSession(() => // ① queue in session lane
enqueueGlobal(async () => { // ② queue in global lane
// actual AI execution logic
})
);
The two nested enqueue calls each serve a purpose:
- session lane: same-session messages are serialized — prevents concurrent writes
- global lane: cross-session shared resources (model connections, file handles) are also fairly queued — prevents one session from monopolizing resources
This is a multi-level queue pattern: the inner level ensures concurrency safety, the outer level ensures resource fairness.
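The pattern can be sketched in a few lines as a promise chain per lane key. This is a minimal illustration, not OpenClaw's actual implementation: `enqueueLane` and `runInSession` are invented names standing in for `enqueueSession`/`enqueueGlobal`.

```typescript
// Illustrative sketch of the two-level queue pattern. enqueueLane and
// runInSession are invented names; the real enqueueSession/enqueueGlobal
// live in src/agents/pi-embedded-runner.
type Task<T> = () => Promise<T>;

const laneTails = new Map<string, Promise<unknown>>();

// Append a task to a named lane; tasks in the same lane run strictly in order.
function enqueueLane<T>(lane: string, task: Task<T>): Promise<T> {
  const tail = laneTails.get(lane) ?? Promise.resolve();
  const next = tail.then(task, task); // run even if the previous task failed
  laneTails.set(lane, next.catch(() => undefined)); // keep the chain alive
  return next;
}

// Inner lane serializes one conversation; outer lane shares global resources.
function runInSession<T>(sessionKey: string, task: Task<T>): Promise<T> {
  return enqueueLane(`session:${sessionKey}`, () => enqueueLane("global", task));
}
```

Because each lane's tail is a promise, a second message for the same session simply chains behind the first, with no locks needed.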
The Main Event: Five Phases of a Single Run
Once inside the Lane, the core function runEmbeddedAttempt takes over. It does five things:
Phase 1: Prepare Workspace and Skill Environment
// src/agents/pi-embedded-runner/run/attempt.ts
const sandbox = await resolveSandboxContext({ config, sessionKey, workspaceDir });
const effectiveWorkspace = sandbox?.enabled ? sandbox.workspaceDir : resolvedWorkspace;
// Switch working directory to workspace (the AI's file system perspective)
process.chdir(effectiveWorkspace);
// Load skills and apply environment variable overrides
const skillEntries = loadWorkspaceSkillEntries(effectiveWorkspace);
restoreSkillEnv = applySkillEnvOverrides({ skills: skillEntries, config });
Skills are OpenClaw's extension mechanism — like installing "apps" for the AI. A skill can provide:
- Dedicated environment variables (e.g. `GITHUB_TOKEN`)
- Documentation (injected into the system prompt, telling the AI how to use this capability)
- Predefined task templates
Skill documentation gets injected in the next phase so the AI knows what capabilities it has.
Phase 2: Build the System Prompt
The system prompt is the source of the AI's "personality" — it determines how the AI behaves, what it can and cannot do. OpenClaw's system prompt is dynamically built for each run:
// src/agents/pi-embedded-runner/run/attempt.ts
const appendPrompt = buildEmbeddedSystemPrompt({
workspaceDir: effectiveWorkspace, // where the AI's working directory is
defaultThinkLevel: params.thinkLevel, // whether deep thinking is enabled
skillsPrompt, // installed skills documentation
docsPath, // documentation path
sandboxInfo, // sandbox restrictions
tools, // available tools list
runtimeInfo: { // runtime environment
host: machineName,
os: `${os.type()} ${os.release()}`,
model: `${params.provider}/${params.modelId}`,
channel: runtimeChannel,
capabilities: runtimeCapabilities, // what this channel supports (e.g. Telegram inline buttons)
},
reactionGuidance, // Telegram/Signal emoji reaction guidance
messageToolHints, // message sending tool usage hints
// ...more params
});
Notice runtimeCapabilities: the AI behaves differently on different channels. If Telegram supports inline buttons, the AI knows it can send interactive button menus. If WhatsApp doesn't, the AI sticks to plain text. The system prompt dynamically adjusts the AI's capability descriptions based on the current channel.
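A hypothetical sketch of how a capability flag can gate a prompt fragment; the function name and the `"inline_buttons"` capability string are assumptions, not OpenClaw's exact identifiers:

```typescript
// Hypothetical: choose prompt guidance based on the channel's capability list.
// "inline_buttons" is an assumed capability name for illustration.
function capabilityHints(capabilities: string[]): string {
  if (capabilities.includes("inline_buttons")) {
    return "You may present choices as inline button menus.";
  }
  return "This channel is text-only: present choices as a numbered list.";
}
```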
Phase 3: Load Session History
The AI needs to know "what was said before" to continue the conversation:
// src/agents/pi-embedded-runner/run/attempt.ts
await repairSessionFileIfNeeded({ sessionFile: params.sessionFile });
const sessionManager = guardSessionManager(
(await createAgentSession({ sessionFile, ... })).session,
{ sessionId: params.sessionId }
);
// History length limit: DM sessions have a separate cap (prevent single-user context monopoly)
const historyLimit = getDmHistoryLimitFromSessionKey(params.sessionKey, params.config);
if (historyLimit) {
await limitHistoryTurns(sessionManager, historyLimit);
}
Session history is stored in JSONL files (~/.openclaw/agents/<agentId>/sessions/), managed by @mariozechner/pi-coding-agent's SessionManager. OpenClaw wraps it in guardSessionManager, intercepting every write to verify its integrity (e.g. tool_use and tool_result entries must be correctly paired).
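The pairing invariant can be illustrated with a small checker. This is a sketch of the idea, not `guardSessionManager`'s actual code; the entry types are simplified stand-ins for the JSONL records:

```typescript
// Sketch of the tool_use/tool_result pairing check that guardSessionManager's
// write interception enforces. Types are simplified for illustration.
type HistoryEntry =
  | { type: "text"; content: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; id: string; result: string };

function checkToolPairing(history: HistoryEntry[]): string[] {
  const open = new Set<string>();
  const errors: string[] = [];
  for (const entry of history) {
    if (entry.type === "tool_use") {
      open.add(entry.id);
    } else if (entry.type === "tool_result") {
      // A result must answer a previously opened tool_use.
      if (!open.delete(entry.id)) errors.push(`orphan tool_result ${entry.id}`);
    }
  }
  for (const id of open) errors.push(`unpaired tool_use ${id}`);
  return errors;
}
```

An unpaired `tool_use` is exactly the kind of corruption that breaks the next model call, since providers reject transcripts where a tool call has no result.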
Phase 4: Register Tools
Tools are the AI's "hands." All available tools are registered here:
// src/agents/pi-embedded-runner/run/attempt.ts
const toolsRaw = createOpenClawCodingTools({
agentId: sessionAgentId,
exec: { ...params.execOverrides, elevated: params.bashElevated },
sandbox,
messageProvider: params.messageChannel,
sessionKey: params.sessionKey ?? params.sessionId,
workspaceDir: effectiveWorkspace,
config: params.config,
abortSignal: runAbortController.signal,
// ...more context
});
// Tool policy filtering
const tools = sanitizeToolsForGoogle({ tools: toolsRaw, provider: params.provider });
const allowedToolNames = collectAllowedToolNames({ tools, clientTools: params.clientTools });
The tool set includes: file read/write, bash execution, message sending, web requests, media processing... The tool policy is detailed in the next section.
Phase 5: Subscribe to Streaming Output
// src/agents/pi-embedded-runner/run/attempt.ts
const subscribeResult = await subscribeEmbeddedPiSession({
session: sessionManager,
prompt: params.prompt,
onBlockReply: params.onBlockReply, // called when AI completes a text block
onReasoningStream: params.onReasoningStream,
// ...
});
subscribeEmbeddedPiSession is the actual entry point for AI execution, receiving and processing streaming events from the SDK.
Streaming Subscription: How the AI's Thinking Is Captured
subscribeEmbeddedPiSession handles three types of events from @mariozechner/pi-agent-core:
Event 1: Text Stream
// Each token arrival
text_delta → accumulate into deltaBuffer → detect <think> tags → filter or emit
const THINKING_TAG_SCAN_RE = /<\s*(\/?)\s*(?:think(?:ing)?|thought|antthinking)\s*>/gi;
When <think>...</think> is encountered, content is handled based on reasoningMode:
- `off`: filtered out — users don't see the AI's chain of thought
- `on`: thinking is sent as a separate message
- `stream`: thinking is pushed in real time (experimental)
The "when to send" timing for text blocks is controlled by blockReplyBreak:
- `text_end` (default): send when a full text block is complete — avoids frequent interruptions
- `paragraph`: send at each paragraph break — users see progress sooner
This involves a code-span-aware block chunker (EmbeddedBlockChunker): when splitting text, it detects whether you're inside a code block, preventing splits that would break Markdown rendering.
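The invariant is easy to show in a batch form. The real `EmbeddedBlockChunker` is streaming and more nuanced; this sketch only demonstrates the core rule, never split at a paragraph break that falls inside a ``` fence:

```typescript
// Batch sketch of fence-aware splitting in the spirit of EmbeddedBlockChunker.
// Splits on blank lines, but only when outside a ``` code fence.
function splitOutsideCodeFences(text: string): string[] {
  const lines = text.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of lines) {
    if (/^```/.test(line.trim())) inFence = !inFence; // toggle on fence markers
    if (line.trim() === "" && !inFence && current.length > 0) {
      chunks.push(current.join("\n")); // paragraph break outside a fence: split
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Splitting inside a fence would ship a message ending with an unclosed ```, which most chat clients render as broken Markdown.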
Event 2: Tool Calls
// Tool call event sequence:
tool_use_start → dispatch to appropriate tool executor
tool_use_result → write result back to SessionManager
Before a tool call executes, it passes through runBeforeToolCallHook:
// src/agents/pi-tools.before-tool-call.ts
export async function runBeforeToolCallHook(args: {
toolName: string;
params: unknown;
toolCallId?: string;
ctx?: HookContext;
}): Promise<HookOutcome> {
// 1. Tool loop detection (prevent AI from looping the same tool call)
// 2. Plugin hooks (before_tool_call hook, can intercept or modify params)
// 3. If blocked=true, return the error as a tool result back to the AI
}
Tool loop detection: if the AI calls the same tool with identical arguments more than ~10 times, it's stuck in a loop — the AI is prompted: "Repeated identical tool call detected, please try a different approach."
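A minimal sketch of that detection, counting consecutive identical (tool name, params) signatures. The threshold and the factory shape are assumptions; the real hook lives in `pi-tools.before-tool-call.ts`:

```typescript
// Sketch of tool-loop detection: block after ~10 consecutive identical calls.
// The threshold and return shape are illustrative assumptions.
function makeLoopDetector(threshold = 10) {
  let lastSignature = "";
  let repeats = 0;
  return (toolName: string, params: unknown): { blocked: boolean; reason?: string } => {
    const signature = `${toolName}:${JSON.stringify(params)}`;
    repeats = signature === lastSignature ? repeats + 1 : 1;
    lastSignature = signature;
    if (repeats > threshold) {
      return {
        blocked: true,
        reason: "Repeated identical tool call detected, please try a different approach.",
      };
    }
    return { blocked: false };
  };
}
```

Returning the reason as a tool result (rather than aborting the run) gives the model a chance to self-correct.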
Event 3: Compaction Signal
// When pi-agent-core internally triggers compaction
compaction_start → set compactionInFlight = true
compaction_done → clear flag, continue streaming
Tool Policy: What Is the AI Allowed to Do?
The AI has many tools, but not every scenario should grant access to all of them. The tool policy is the key implementation of the security boundary.
Tool Filtering: Deny/Allow Pattern
// src/agents/pi-tools.policy.ts
function makeToolPolicyMatcher(policy: SandboxToolPolicy) {
  const deny = compileGlobPatterns({ raw: expandToolGroups(policy.deny ?? []) });
  const allow = compileGlobPatterns({ raw: expandToolGroups(policy.allow ?? []) });
  return (name: string) => {
    const normalized = name.toLowerCase(); // normalize the tool name before matching
    if (matchesAnyGlobPattern(normalized, deny)) return false; // deny list takes priority
    if (allow.length === 0) return true; // no allow list = allow everything
    return matchesAnyGlobPattern(normalized, allow);
  };
}
Tool names support Glob patterns: exec:* matches all exec-series tools, bash matches only bash.
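A self-contained sketch of the deny-over-allow semantics, with glob support reduced to `*` wildcards for brevity (the real policy compiles richer patterns and tool groups):

```typescript
// Sketch of deny/allow matching. Only "*" wildcards are supported here;
// makeMatcher and globToRegExp are illustrative names.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, then turn "*" into ".*".
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function makeMatcher(deny: string[], allow: string[]) {
  const denyRes = deny.map(globToRegExp);
  const allowRes = allow.map(globToRegExp);
  return (name: string): boolean => {
    if (denyRes.some((re) => re.test(name))) return false; // deny wins
    if (allowRes.length === 0) return true;                // no allow list = allow all
    return allowRes.some((re) => re.test(name));
  };
}
```

Putting deny first means a broad allow list like `*` can still be safely narrowed by a handful of deny entries.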
Additional Restrictions for Sub-Agents
When a main Agent spawns a sub-agent to handle a subtask, the sub-agent's tool set is further restricted:
// src/agents/pi-tools.policy.ts — always denied for sub-agents
const SUBAGENT_TOOL_DENY_ALWAYS = [
"gateway", // system administration — dangerous
"agents_list", // system administration
"whatsapp_login", // interactive setup — not a task
"session_status", // status/scheduling — main agent coordinates this
"cron", // scheduled tasks — not the sub-agent's domain
"memory_search", // memory — main agent passes relevant info via spawn prompt
"memory_get",
"sessions_send", // direct session sends — sub-agents communicate through announce chain
];
// Leaf sub-agents (deepest level, cannot spawn further) additionally denied:
const SUBAGENT_TOOL_DENY_LEAF = [
"sessions_list",
"sessions_history",
"sessions_spawn", // leaves cannot spawn
];
This design flows from a clear principle: each Agent only does what it's meant to do. Sub-agents are executors, not managers; memory queries and task scheduling are the orchestrator's (main Agent's) responsibility.
Sub-agent spawn depth is configurable (maxSpawnDepth). The deeper the depth, the stricter the restrictions:
- Depth 1 with `maxSpawnDepth >= 2` (orchestrator): can spawn grandchildren
- Depth >= `maxSpawnDepth` (leaf): can only execute, cannot spawn
The Outer Retry Loop: Fighting Real-World Unreliability
The inner "single run" occasionally fails: API rate limits, expired auth, context overflow... The outer loop is dedicated to handling these situations:
// src/agents/pi-embedded-runner/run.ts
const MAX_RUN_LOOP_ITERATIONS = resolveMaxRunRetryIterations(profileCandidates.length);
// 32–160 iterations, dynamically scaled by the number of auth profiles
while (true) {
if (runLoopIterations >= MAX_RUN_LOOP_ITERATIONS) {
return { error: "Exceeded retry limit after N attempts" };
}
const attempt = await runEmbeddedAttempt({ ... });
if (attempt succeeded) {
markAuthProfileGood(profileId); // mark this profile as healthy
return success result;
}
if (isRateLimitError(attempt)) {
markAuthProfileFailure(profileId, "rate_limit"); // mark as rate-limited, enter cooldown
const advanced = await advanceAuthProfile(); // switch to next profile
if (!advanced) return failure;
continue; // retry with new profile
}
if (isContextOverflowError(attempt)) {
if (overflowCompactionAttempts < 3) {
await compactEmbeddedPiSession( ... ); // summarize session history
overflowCompactionAttempts++;
continue; // retry with compacted history
}
return context overflow failure;
}
if (isAuthError(attempt)) {
markAuthProfileFailure(profileId, "auth");
advanceAuthProfile();
continue;
}
// ...more error handling
}
Auth Profile Rotation
This is the core mechanism for handling API rate limits. You can configure multiple API keys (or OAuth accounts) as "auth profiles":
# openclaw.yml
auth:
profiles:
- id: primary
provider: anthropic
apiKey: sk-ant-...
- id: backup-1
provider: anthropic
apiKey: sk-ant-...
- id: backup-2
provider: anthropic
apiKey: sk-ant-...
When primary hits a rate limit:
1. `markAuthProfileFailure(primary, "rate_limit")` — enters cooldown period
2. `advanceAuthProfile()` — switches to `backup-1`
3. Retry with `backup-1`
4. If `backup-1` also rate-limits, switch to `backup-2`
5. All profiles in cooldown → report "API temporarily unavailable" to user
This solves a real pain point for personal AI assistants: if you kick off a complex task at midnight that requires many API calls, a single key hitting rate limits means you just wait. With profile rotation, the system automatically continues with other keys.
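The rotation-with-cooldown logic can be sketched as a small class. Names (`ProfileRotation`, `markFailure`, `advance`) and the fixed cooldown are assumptions; the real code tracks per-profile health in the runner:

```typescript
// Sketch of auth profile rotation with cooldown. Names and the single
// cooldown duration are illustrative assumptions.
interface AuthProfile {
  id: string;
  cooldownUntil: number; // epoch ms; 0 = healthy
}

class ProfileRotation {
  private index = 0;
  constructor(private profiles: AuthProfile[], private cooldownMs = 60_000) {}

  current(): AuthProfile {
    return this.profiles[this.index];
  }

  markFailure(kind: "rate_limit" | "auth", now = Date.now()): void {
    // Auth failures could warrant a longer cooldown; one duration for brevity.
    this.current().cooldownUntil = now + this.cooldownMs;
  }

  // Advance to the next profile not in cooldown; false = all exhausted.
  advance(now = Date.now()): boolean {
    for (let i = 1; i <= this.profiles.length; i++) {
      const candidate = (this.index + i) % this.profiles.length;
      if (this.profiles[candidate].cooldownUntil <= now) {
        this.index = candidate;
        return true;
      }
    }
    return false;
  }
}
```

When `advance()` returns false, the outer loop has nothing left to rotate to, which is the "API temporarily unavailable" case reported to the user.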
Context Overflow and Compaction
LLM context windows are finite (e.g. Claude's 200k tokens). Long conversations and large tool results will eventually fill it up.
When the API returns a "context exceeded" error:
// src/agents/pi-embedded-runner/run.ts
if (isLikelyContextOverflowError(attempt)) {
const compacted = await compactEmbeddedPiSession({
sessionFile: params.sessionFile,
trigger: "overflow",
// use a lighter model for summarization (not the current heavy model)
model: compactionModelId,
// ...
});
// after compaction, retry the attempt with the summarized history
}
The compaction process:
- Read the full session history
- Have an AI generate a "conversation summary"
- Replace the history messages with that summary
- Retry the request with the compacted history
This is not simple truncation — truncating causes AI "amnesia." Compaction preserves key context. Using a smaller model for the summarization task also makes sense: it doesn't require complex reasoning, and using a cheap, fast model saves time and cost.
Compaction retries a maximum of 3 times (MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3) to prevent infinite compaction loops.
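The replace-older-turns-with-a-summary step can be sketched like this; `compactHistory`, `summarize`, and the keep-recent count are illustrative stand-ins for the real `compactEmbeddedPiSession` flow:

```typescript
// Sketch of overflow compaction: summarize older turns with a (lighter) model
// and keep the most recent turns verbatim. summarize() stands in for the
// call to the compaction model; all names here are assumptions.
async function compactHistory(
  history: string[],
  summarize: (turns: string[]) => Promise<string>,
  keepRecent = 4,
): Promise<string[]> {
  if (history.length <= keepRecent) return history; // nothing worth compacting
  const toSummarize = history.slice(0, history.length - keepRecent);
  const summary = await summarize(toSummarize);
  // The summary replaces the old turns; recent turns survive verbatim.
  return [`[conversation summary] ${summary}`, ...history.slice(-keepRecent)];
}
```

Keeping the most recent turns verbatim matters: the model usually needs exact wording for the task in flight, while older context tolerates lossy summarization.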
The Full Flow Diagram
Putting everything together:
Message arrives at Agent
↓
runEmbeddedPiAgent()
→ queue in session lane (serializes same-session messages)
→ queue in global lane (fair resource sharing)
↓
Outer retry loop (up to 160 iterations)
↓ ↑ (on failure: auth rotation / context compaction / model failover)
runEmbeddedAttempt()
① Prepare workspace + skill environment
② Dynamically build system prompt (channel capabilities, skill docs)
③ Load session history (with history length cap)
④ Register tools (with policy filtering)
⑤ subscribeEmbeddedPiSession()
↓
pi-agent-core SDK inner loop:
[Model generates text]
↓ text_delta events
Detect <think> tags → filter / send separately
EmbeddedBlockChunker splitting (code-span-aware)
onBlockReply → push to Gateway → broadcast to all clients
↓
[Model decides to call a tool]
↓ tool_use event
runBeforeToolCallHook (loop detection + plugin hooks)
↓ tool executes (bash / file read-write / message send / ...)
tool_result → write back to SessionManager
↓ result returned to model, next reasoning round begins
↓
[Model finishes]
Final reply sent back to user via channel
Summary
| Problem | Solution | Key code |
|---|---|---|
| Same-session concurrent writes | Lane serial queue | src/agents/pi-embedded-runner/lanes.ts |
| System prompt varies by channel | Dynamically built appendPrompt | run/attempt.ts:buildEmbeddedSystemPrompt |
| Streaming text breaks Markdown | Code-span-aware block chunker | src/agents/pi-embedded-block-chunker.ts |
| AI stuck in tool call loop | Tool loop detection | src/agents/pi-tools.before-tool-call.ts |
| Sub-agent over-privileged | Sub-agent tool deny-list | src/agents/pi-tools.policy.ts:SUBAGENT_TOOL_DENY_ALWAYS |
| API rate limits | Auth profile rotation (up to 160 retries) | src/agents/pi-embedded-runner/run.ts:advanceAuthProfile |
| Context window overflow | Session compaction (AI summarizes history) | src/agents/pi-embedded-runner/compact.ts |
Next article covers the Plugin SDK and extension development:
How does OpenClaw let third-party developers extend its capabilities? What interfaces does a new messaging channel (e.g. WeChat Work, Zalo) need to implement? How is plugin lifecycle managed?
Source paths: src/agents/pi-embedded-runner/ | Key files: run.ts, run/attempt.ts, pi-embedded-subscribe.ts, pi-tools.policy.ts