DEV Community

WonderLab

OpenClaw Source Deep Dive (3): Agent Execution Engine — How Does the AI Think and Act?

Series goal: After reading the full series, you'll be able to do custom development on OpenClaw and build a similar system from scratch.

Core questions in this article: After a message reaches the Agent, how does the AI "think"? How are tool calls executed? How are real-world problems like rate limits and context overflow handled?


Start With a Seemingly Simple Request

You send a message on WhatsApp:

"Summarize today's meeting notes and save the summary to my desktop."

This message travels through the Gateway and routing system described in the previous two articles, and eventually reaches the Agent. Then what?

This request requires the AI to do several things in sequence: understand intent → find the meeting notes file → read it → generate a summary → write to a file → report back. Any step can fail: file not found, API rate limit hit, context too long...

The Agent execution engine's job is to reliably complete this entire sequence, no matter what happens along the way.


First Challenge: Prevent Concurrent Conflicts

Before explaining how the AI "thinks," let's solve a foundational problem: can the same session process two messages at once?

No. A single AI conversation context (SessionKey) has one history transcript. If two messages write to it concurrently, the history gets corrupted and the AI's context breaks.

OpenClaw's solution is Lanes:

// src/agents/pi-embedded-runner/lanes.ts
export function resolveSessionLane(key: string) {
  // Each SessionKey has its own dedicated command queue
  return `session:${key}`;
}

Each SessionKey has exactly one Lane, and tasks within a Lane execute strictly in sequence:

// src/agents/pi-embedded-runner/run.ts
const sessionLane = resolveSessionLane(params.sessionKey ?? params.sessionId);

return enqueueSession(() =>     // ① queue in session lane
  enqueueGlobal(async () => {   // ② queue in global lane
    // actual AI execution logic
  })
);

The two nested enqueue calls each serve a purpose:

  • session lane: same-session messages are serialized — prevents concurrent writes
  • global lane: cross-session shared resources (model connections, file handles) are also fairly queued — prevents one session from monopolizing resources

This is a multi-level queue pattern: the inner level ensures concurrency safety, the outer level ensures resource fairness.
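The pattern can be sketched in a few lines. The following is an illustrative reimplementation (class and helper names here are my own, not OpenClaw's actual code): each lane chains tasks onto a promise tail so they run strictly one after another, and the session lane nests inside a shared global lane.

```typescript
// Minimal sketch of a per-key serial lane (illustrative, not OpenClaw's code).
class Lane {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Chain onto the tail; a failed task must not block later tasks.
    const next = this.tail.then(task, task);
    this.tail = next.catch(() => undefined);
    return next;
  }
}

const lanes = new Map<string, Lane>();

function resolveLane(key: string): Lane {
  let lane = lanes.get(key);
  if (!lane) {
    lane = new Lane();
    lanes.set(key, lane);
  }
  return lane;
}

// Nesting mirrors the session → global pattern: serialize within the
// session first, then contend fairly for shared resources in the global lane.
const globalLane = new Lane();

function runInLanes<T>(sessionKey: string, fn: () => Promise<T>): Promise<T> {
  return resolveLane(`session:${sessionKey}`).enqueue(() =>
    globalLane.enqueue(fn)
  );
}
```

A promise chain is a natural fit here: it needs no worker threads or locks, and the `catch` on the tail guarantees one crashed task never wedges the whole session.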


The Main Event: Five Phases of a Single Run

Once inside the Lane, the core function runEmbeddedAttempt takes over. It does five things:

Phase 1: Prepare Workspace and Skill Environment

// src/agents/pi-embedded-runner/run/attempt.ts
const sandbox = await resolveSandboxContext({ config, sessionKey, workspaceDir });
const effectiveWorkspace = sandbox?.enabled ? sandbox.workspaceDir : resolvedWorkspace;

// Switch working directory to workspace (the AI's file system perspective)
process.chdir(effectiveWorkspace);

// Load skills and apply environment variable overrides
const skillEntries = loadWorkspaceSkillEntries(effectiveWorkspace);
restoreSkillEnv = applySkillEnvOverrides({ skills: skillEntries, config });

Skills are OpenClaw's extension mechanism — like installing "apps" for the AI. A skill can provide:

  • Dedicated environment variables (e.g. GITHUB_TOKEN)
  • Documentation (injected into the system prompt, telling the AI how to use this capability)
  • Predefined task templates

Skill documentation gets injected in the next phase so the AI knows what capabilities it has.

Phase 2: Build the System Prompt

The system prompt is the source of the AI's "personality" — it determines how the AI behaves, what it can and cannot do. OpenClaw's system prompt is dynamically built for each run:

// src/agents/pi-embedded-runner/run/attempt.ts
const appendPrompt = buildEmbeddedSystemPrompt({
  workspaceDir: effectiveWorkspace,      // where the AI's working directory is
  defaultThinkLevel: params.thinkLevel,  // whether deep thinking is enabled
  skillsPrompt,                          // installed skills documentation
  docsPath,                              // documentation path
  sandboxInfo,                           // sandbox restrictions
  tools,                                 // available tools list
  runtimeInfo: {                         // runtime environment
    host: machineName,
    os: `${os.type()} ${os.release()}`,
    model: `${params.provider}/${params.modelId}`,
    channel: runtimeChannel,
    capabilities: runtimeCapabilities,   // what this channel supports (e.g. Telegram inline buttons)
  },
  reactionGuidance,    // Telegram/Signal emoji reaction guidance
  messageToolHints,    // message sending tool usage hints
  // ...more params
});

Notice runtimeCapabilities: the AI behaves differently on different channels. If Telegram supports inline buttons, the AI knows it can send interactive button menus. If WhatsApp doesn't, the AI sticks to plain text. The system prompt dynamically adjusts the AI's capability descriptions based on the current channel.

Phase 3: Load Session History

The AI needs to know "what was said before" to continue the conversation:

// src/agents/pi-embedded-runner/run/attempt.ts
await repairSessionFileIfNeeded({ sessionFile: params.sessionFile });

const sessionManager = guardSessionManager(
  (await createAgentSession({ sessionFile, ... })).session,
  { sessionId: params.sessionId }
);

// History length limit: DM sessions have a separate cap (prevent single-user context monopoly)
const historyLimit = getDmHistoryLimitFromSessionKey(params.sessionKey, params.config);
if (historyLimit) {
  await limitHistoryTurns(sessionManager, historyLimit);
}

Session history is stored in JSONL files (~/.openclaw/agents/<agentId>/sessions/), managed by @mariozechner/pi-coding-agent's SessionManager. OpenClaw wraps it in guardSessionManager, intercepting every write to verify its integrity (e.g. tool_use and tool_result entries must be correctly paired).
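A history cap like `limitHistoryTurns` might look like the sketch below. This is a simplified guess at the semantics (the real function operates on a SessionManager, not a raw array, and the signature here is hypothetical): walk backwards until the limit-th user turn, then keep everything from there on, so a turn is never cut in half.

```typescript
// Illustrative sketch — trim a transcript to the last `limit` user turns.
type Entry = { role: "user" | "assistant" | "tool_result"; text?: string };

function limitHistoryTurns(entries: Entry[], limit: number): Entry[] {
  let userTurns = 0;
  // Walk backwards; the `limit`-th user message marks the cut point.
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].role === "user") {
      userTurns++;
      if (userTurns === limit) return entries.slice(i);
    }
  }
  return entries; // fewer turns than the limit: keep everything
}
```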

Phase 4: Register Tools

Tools are the AI's "hands." All available tools are registered here:

// src/agents/pi-embedded-runner/run/attempt.ts
const toolsRaw = createOpenClawCodingTools({
  agentId: sessionAgentId,
  exec: { ...params.execOverrides, elevated: params.bashElevated },
  sandbox,
  messageProvider: params.messageChannel,
  sessionKey: params.sessionKey ?? params.sessionId,
  workspaceDir: effectiveWorkspace,
  config: params.config,
  abortSignal: runAbortController.signal,
  // ...more context
});

// Tool policy filtering
const tools = sanitizeToolsForGoogle({ tools: toolsRaw, provider: params.provider });
const allowedToolNames = collectAllowedToolNames({ tools, clientTools: params.clientTools });

The tool set includes: file read/write, bash execution, message sending, web requests, media processing... The tool policy is detailed in the next section.

Phase 5: Subscribe to Streaming Output

// src/agents/pi-embedded-runner/run/attempt.ts
const subscribeResult = await subscribeEmbeddedPiSession({
  session: sessionManager,
  prompt: params.prompt,
  onBlockReply: params.onBlockReply,   // called when AI completes a text block
  onReasoningStream: params.onReasoningStream,
  // ...
});

subscribeEmbeddedPiSession is the actual entry point for AI execution, receiving and processing streaming events from the SDK.


Streaming Subscription: How the AI's Thinking Is Captured

subscribeEmbeddedPiSession handles three types of events from @mariozechner/pi-agent-core:

Event 1: Text Stream

// Each token arrival
text_delta → accumulate into deltaBuffer → detect <think> tags → filter or emit

const THINKING_TAG_SCAN_RE = /<\s*(\/?)\s*(?:think(?:ing)?|thought|antthinking)\s*>/gi;

When <think>...</think> is encountered, content is handled based on reasoningMode:

  • off: filtered out — users don't see the AI's chain of thought
  • on: thinking is sent as a separate message
  • stream: thinking is pushed in real time (experimental)
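Separating thinking from visible text boils down to a small state machine. Here is a simplified, non-streaming version (the real handler works incrementally on deltas and a deltaBuffer; the function name and the reduced tag list are my own):

```typescript
// Simplified think-tag splitter over a complete text block (illustrative).
const THINKING_TAG_RE = /<\s*(\/?)\s*(?:think(?:ing)?|thought)\s*>/gi;

function splitThinking(text: string): { visible: string; thinking: string } {
  let visible = "";
  let thinking = "";
  let depth = 0; // >0 means we are inside a <think> region
  let last = 0;
  for (const m of text.matchAll(THINKING_TAG_RE)) {
    const idx = m.index ?? 0;
    const segment = text.slice(last, idx);
    if (depth > 0) thinking += segment; else visible += segment;
    depth += m[1] === "/" ? -1 : 1;
    if (depth < 0) depth = 0; // tolerate stray close tags
    last = idx + m[0].length;
  }
  const tail = text.slice(last);
  if (depth > 0) thinking += tail; else visible += tail;
  return { visible, thinking };
}
```

With reasoningMode off, the `thinking` half is simply dropped; with on, it would be forwarded as its own message.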

The "when to send" timing for text blocks is controlled by blockReplyBreak:

  • text_end (default): send when a full text block is complete — avoids frequent interruptions
  • paragraph: send at each paragraph break — users see progress sooner

This involves a code-span-aware block chunker (EmbeddedBlockChunker): when splitting text, it detects whether you're inside a code block, preventing splits that would break Markdown rendering.
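The core idea of fence-aware splitting can be shown in miniature. This sketch only tracks fenced code blocks (the real EmbeddedBlockChunker is incremental and also handles inline code spans), splitting at blank lines except when inside a fence:

```typescript
// Hedged sketch: split text at paragraph breaks, but never inside a code fence.
const FENCE = "`".repeat(3); // avoid a literal fence inside this example

function splitAtParagraphs(text: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of text.split("\n")) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;
    if (line.trim() === "" && !inFence && current.length > 0) {
      chunks.push(current.join("\n")); // paragraph boundary outside any fence
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

A blank line inside a fence is kept with its code block, so a message split never produces half a Markdown fence on each side.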

Event 2: Tool Calls

// Tool call event sequence:
tool_use_start  → dispatch to appropriate tool executor
tool_use_result → write result back to SessionManager

Before a tool call executes, it passes through runBeforeToolCallHook:

// src/agents/pi-tools.before-tool-call.ts
export async function runBeforeToolCallHook(args: {
  toolName: string;
  params: unknown;
  toolCallId?: string;
  ctx?: HookContext;
}): Promise<HookOutcome> {
  // 1. Tool loop detection (prevent AI from looping the same tool call)
  // 2. Plugin hooks (before_tool_call hook, can intercept or modify params)
  // 3. If blocked=true, return the error as a tool result back to the AI
}

Tool loop detection: if the AI calls the same tool with identical arguments more than ~10 times, it's stuck in a loop — the AI is prompted: "Repeated identical tool call detected, please try a different approach."
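A loop detector of this kind is essentially a counter keyed on the call signature. The sketch below is my own illustration (class name and the configurable threshold are assumptions based on the description above, not OpenClaw's implementation):

```typescript
// Illustrative loop detector: flag repeated identical tool calls.
class ToolLoopDetector {
  private counts = new Map<string, number>();
  constructor(private readonly threshold = 10) {}

  // Returns true when the same tool+args combination repeats past the threshold.
  record(toolName: string, params: unknown): boolean {
    const key = `${toolName}:${JSON.stringify(params)}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count > this.threshold;
  }
}
```

When `record` returns true, the hook would return an error tool result telling the model to try a different approach instead of executing the call.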

Event 3: Compaction Signal

// When pi-agent-core internally triggers compaction
compaction_start → set compactionInFlight = true
compaction_done  → clear flag, continue streaming

Tool Policy: What Is the AI Allowed to Do?

The AI has many tools, but not every scenario should grant access to all of them. The tool policy is the key implementation of the security boundary.

Tool Filtering: Deny/Allow Pattern

// src/agents/pi-tools.policy.ts
function makeToolPolicyMatcher(policy: SandboxToolPolicy) {
  const deny = compileGlobPatterns({ raw: expandToolGroups(policy.deny ?? []) });
  const allow = compileGlobPatterns({ raw: expandToolGroups(policy.allow ?? []) });

  return (name: string) => {
    if (matchesAnyGlobPattern(name, deny)) return false;  // deny list takes priority
    if (allow.length === 0) return true;                  // no allow list = allow everything
    return matchesAnyGlobPattern(name, allow);
  };
}
}

Tool names support Glob patterns: exec:* matches all exec-series tools, bash matches only bash.
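A self-contained version of this deny/allow logic looks like the following. The tiny glob compiler here supports only `*` (an assumption — the real `compileGlobPatterns` may handle more syntax and tool groups):

```typescript
// Minimal deny/allow matcher sketch with a tiny glob → RegExp conversion.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

function makeToolPolicyMatcher(policy: { deny?: string[]; allow?: string[] }) {
  const deny = (policy.deny ?? []).map(globToRegExp);
  const allow = (policy.allow ?? []).map(globToRegExp);
  return (name: string): boolean => {
    if (deny.some((re) => re.test(name))) return false; // deny wins
    if (allow.length === 0) return true;                // no allow list = allow all
    return allow.some((re) => re.test(name));
  };
}
```

Deny-takes-priority is the safe default: even if a broad allow glob accidentally covers a dangerous tool, an explicit deny entry still blocks it.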

Additional Restrictions for Sub-Agents

When a main Agent spawns a sub-agent to handle a subtask, the sub-agent's tool set is further restricted:

// src/agents/pi-tools.policy.ts — always denied for sub-agents
const SUBAGENT_TOOL_DENY_ALWAYS = [
  "gateway",        // system administration — dangerous
  "agents_list",    // system administration
  "whatsapp_login", // interactive setup — not a task
  "session_status", // status/scheduling — main agent coordinates this
  "cron",           // scheduled tasks — not the sub-agent's domain
  "memory_search",  // memory — main agent passes relevant info via spawn prompt
  "memory_get",
  "sessions_send",  // direct session sends — sub-agents communicate through announce chain
];

// Leaf sub-agents (deepest level, cannot spawn further) additionally denied:
const SUBAGENT_TOOL_DENY_LEAF = [
  "sessions_list",
  "sessions_history",
  "sessions_spawn",  // leaves cannot spawn
];

This design flows from a clear principle: each Agent only does what it's meant to do. Sub-agents are executors, not managers; memory queries and task scheduling are the orchestrator's (main Agent's) responsibility.

Sub-agent spawn depth is configurable (maxSpawnDepth). The deeper the depth, the stricter the restrictions:

  • Depth 1 with maxSpawnDepth >= 2 (orchestrator): can spawn grandchildren
  • Depth >= maxSpawnDepth (leaf): can only execute, cannot spawn
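Composing the two deny lists by depth could be as simple as the sketch below (the helper function is hypothetical, modeled on the constants above; the shortened lists are for illustration only):

```typescript
// Sketch: merge the always-deny and leaf-only deny lists based on spawn depth.
const SUBAGENT_TOOL_DENY_ALWAYS = ["gateway", "agents_list", "cron", "sessions_send"];
const SUBAGENT_TOOL_DENY_LEAF = ["sessions_list", "sessions_history", "sessions_spawn"];

function subagentDeniedTools(depth: number, maxSpawnDepth: number): string[] {
  const isLeaf = depth >= maxSpawnDepth; // leaves can only execute, never spawn
  return isLeaf
    ? [...SUBAGENT_TOOL_DENY_ALWAYS, ...SUBAGENT_TOOL_DENY_LEAF]
    : [...SUBAGENT_TOOL_DENY_ALWAYS];
}
```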

The Outer Retry Loop: Fighting Real-World Unreliability

The inner "single run" occasionally fails: API rate limits, expired auth, context overflow... The outer loop is dedicated to handling these situations:

// src/agents/pi-embedded-runner/run.ts
const MAX_RUN_LOOP_ITERATIONS = resolveMaxRunRetryIterations(profileCandidates.length);
// 32–160 iterations, dynamically scaled by the number of auth profiles

while (true) {
  if (runLoopIterations >= MAX_RUN_LOOP_ITERATIONS) {
    return { error: "Exceeded retry limit after N attempts" };
  }

  const attempt = await runEmbeddedAttempt({ ... });

  if (attempt succeeded) {
    markAuthProfileGood(profileId);  // mark this profile as healthy
    return success result;
  }

  if (isRateLimitError(attempt)) {
    markAuthProfileFailure(profileId, "rate_limit");  // mark as rate-limited, enter cooldown
    const advanced = await advanceAuthProfile();       // switch to next profile
    if (!advanced) return failure;
    continue;  // retry with new profile
  }

  if (isContextOverflowError(attempt)) {
    if (overflowCompactionAttempts < 3) {
      await compactEmbeddedPiSession( ... );  // summarize session history
      overflowCompactionAttempts++;
      continue;  // retry with compacted history
    }
    return context overflow failure;
  }

  if (isAuthError(attempt)) {
    markAuthProfileFailure(profileId, "auth");
    advanceAuthProfile();
    continue;
  }
  // ...more error handling
}

Auth Profile Rotation

This is the core mechanism for handling API rate limits. You can configure multiple API keys (or OAuth accounts) as "auth profiles":

# openclaw.yml
auth:
  profiles:
    - id: primary
      provider: anthropic
      apiKey: sk-ant-...
    - id: backup-1
      provider: anthropic
      apiKey: sk-ant-...
    - id: backup-2
      provider: anthropic
      apiKey: sk-ant-...

When primary hits a rate limit:

  1. markAuthProfileFailure(primary, "rate_limit") — enters cooldown period
  2. advanceAuthProfile() — switches to backup-1
  3. Retry with backup-1
  4. If backup-1 also rate-limits, switch to backup-2
  5. All profiles in cooldown → report "API temporarily unavailable" to user

This solves a real pain point for personal AI assistants: if you kick off a complex task at midnight that requires many API calls, a single key hitting rate limits means you just wait. With profile rotation, the system automatically continues with other keys.
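The rotation-with-cooldown state machine can be sketched compactly. Class and field names here are illustrative (OpenClaw's real state tracking lives in run.ts), and the 60-second cooldown is an arbitrary placeholder:

```typescript
// Hedged sketch of auth profile rotation with per-profile cooldowns.
interface AuthProfile { id: string; apiKey: string }

class AuthProfileRotation {
  private cooldownUntil = new Map<string, number>();
  private index = 0;
  constructor(
    private readonly profiles: AuthProfile[],
    private readonly cooldownMs = 60_000 // placeholder cooldown length
  ) {}

  markFailure(id: string): void {
    this.cooldownUntil.set(id, Date.now() + this.cooldownMs);
  }

  markGood(id: string): void {
    this.cooldownUntil.delete(id); // healthy again, usable immediately
  }

  // Advance to the next profile that is not cooling down; null if all are.
  advance(now = Date.now()): AuthProfile | null {
    for (let i = 0; i < this.profiles.length; i++) {
      this.index = (this.index + 1) % this.profiles.length;
      const candidate = this.profiles[this.index];
      if ((this.cooldownUntil.get(candidate.id) ?? 0) <= now) return candidate;
    }
    return null; // every profile rate-limited → report unavailability
  }
}
```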

Context Overflow and Compaction

LLM context windows are finite (e.g. Claude's 200k tokens). Long conversations and large tool results will eventually fill it up.

When the API returns a "context exceeded" error:

// src/agents/pi-embedded-runner/run.ts
if (isLikelyContextOverflowError(attempt)) {
  const compacted = await compactEmbeddedPiSession({
    sessionFile: params.sessionFile,
    trigger: "overflow",
    // use a lighter model for summarization (not the current heavy model)
    model: compactionModelId,
    // ...
  });
  // after compaction, retry the attempt with the summarized history
}

The compaction process:

  1. Read the full session history
  2. Have an AI generate a "conversation summary"
  3. Replace the history messages with that summary
  4. Retry the request with the compacted history

This is not simple truncation — truncating causes AI "amnesia." Compaction preserves key context. Using a smaller model for the summarization task also makes sense: it doesn't require complex reasoning, and using a cheap, fast model saves time and cost.

Compaction retries a maximum of 3 times (MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3) to prevent infinite compaction loops.
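The four-step compaction flow above can be sketched as a pure function. The `summarize` callback stands in for a call to the lighter summarization model; the signature and the `keepRecent` window are assumptions for illustration, not OpenClaw's API:

```typescript
// Simplified compaction: summarize older turns, keep recent ones verbatim.
type Message = { role: string; content: string };

async function compactHistory(
  history: Message[],
  summarize: (transcript: string) => Promise<string>,
  keepRecent = 4
): Promise<Message[]> {
  if (history.length <= keepRecent) return history; // nothing worth compacting
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join("\n");
  const summary = await summarize(transcript);
  // Replace the older turns with one summary message; recent context survives intact.
  return [{ role: "system", content: `Conversation summary: ${summary}` }, ...recent];
}
```

Keeping the most recent turns verbatim matters: the model's next action usually depends on exact recent wording, while older context tolerates lossy summarization.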


The Full Flow Diagram

Putting everything together:

Message arrives at Agent
      ↓
runEmbeddedPiAgent()
  → queue in session lane (serializes same-session messages)
  → queue in global lane (fair resource sharing)
      ↓
Outer retry loop (up to 160 iterations)
  ↓ ↑ (on failure: auth rotation / context compaction / model failover)

runEmbeddedAttempt()
  ① Prepare workspace + skill environment
  ② Dynamically build system prompt (channel capabilities, skill docs)
  ③ Load session history (with history length cap)
  ④ Register tools (with policy filtering)
  ⑤ subscribeEmbeddedPiSession()
       ↓
  pi-agent-core SDK inner loop:
  [Model generates text]
       ↓ text_delta events
  Detect <think> tags → filter / send separately
  EmbeddedBlockChunker splitting (code-span-aware)
  onBlockReply → push to Gateway → broadcast to all clients
       ↓
  [Model decides to call a tool]
       ↓ tool_use event
  runBeforeToolCallHook (loop detection + plugin hooks)
       ↓ tool executes (bash / file read-write / message send / ...)
  tool_result → write back to SessionManager
       ↓ result returned to model, next reasoning round begins
       ↓
  [Model finishes]
Final reply sent back to user via channel

Summary

| Problem | Solution | Key code |
| --- | --- | --- |
| Same-session concurrent writes | Lane serial queue | `src/agents/pi-embedded-runner/lanes.ts` |
| System prompt varies by channel | Dynamically built `appendPrompt` | `run/attempt.ts:buildEmbeddedSystemPrompt` |
| Streaming text breaks Markdown | Code-span-aware block chunker | `src/agents/pi-embedded-block-chunker.ts` |
| AI stuck in tool call loop | Tool loop detection | `src/agents/pi-tools.before-tool-call.ts` |
| Sub-agent over-privileged | Sub-agent tool deny-list | `src/agents/pi-tools.policy.ts:SUBAGENT_TOOL_DENY_ALWAYS` |
| API rate limits | Auth profile rotation (up to 160 retries) | `src/agents/pi-embedded-runner/run.ts:advanceAuthProfile` |
| Context window overflow | Session compaction (AI summarizes history) | `src/agents/pi-embedded-runner/compact.ts` |

Next article covers the Plugin SDK and extension development:

How does OpenClaw let third-party developers extend its capabilities? What interfaces does a new messaging channel (e.g. WeChat Work, Zalo) need to implement? How is plugin lifecycle managed?


Source paths: src/agents/pi-embedded-runner/ | Key files: run.ts, run/attempt.ts, pi-embedded-subscribe.ts, pi-tools.policy.ts
