WonderLab
OpenClaw Deep Dive (8): The Skill System — Teaching LLMs to Follow Workflows On Demand

Scenario: How Does the AI Know Which Command to Use?

You ask OpenClaw: "Check today's weather in Shanghai."

The AI replies with curl "wttr.in/Shanghai?format=3", executes it, and returns accurate weather data.

But there's a question worth digging into — an LLM is a language model. It doesn't inherently know that "checking weather means using wttr.in," or that "managing GitHub PRs means using the gh CLI," or that "controlling Spotify means using spotify-player."

Something is clearly "teaching" it these things. But if you shove the full documentation for 50 tools into the system prompt, the docs alone would blow out the context window.

This is the problem the Skill system solves:

  1. Documentation scale: 50+ tools, each with detailed docs — pre-loading all of them would exhaust the LLM's context.
  2. Tool availability: gh CLI not installed, spotify-player not configured — exposing unavailable tools to the LLM is meaningless and causes errors.
  3. Workflow standardization: Tool usage needs to be precisely understood and followed by the LLM, not guessed at.
  4. User experience: Users want to type /weather Shanghai directly, not write a natural-language description every time.

1. SKILL.md: A Documentation Format Designed for LLMs

Why Markdown Instead of Code?

A skill is not a program — it's an "operator's manual for the LLM." LLMs understand natural language and Markdown, so the most natural format is a Markdown file with YAML frontmatter.

Each skill is a directory containing a single SKILL.md:

````markdown
---
name: weather
description: "Get current weather and forecasts via wttr.in or Open-Meteo.
  Use when: user asks about weather, temperature, or forecasts for any location.
  NOT for: historical weather data, severe weather alerts."
metadata:
  { "openclaw": { "emoji": "🌤️", "requires": { "bins": ["curl"] } } }
---

# Weather Skill

## When to Use

**USE this skill when:**
- "What's the weather?"
- "Will it rain today/tomorrow?"

## Commands

```bash
# One-line summary
curl "wttr.in/London?format=3"
```
````

The file has two parts:

Frontmatter (machine-readable):

```typescript
// src/agents/skills/types.ts
type OpenClawSkillMetadata = {
  always?: boolean;          // bypass eligibility checks and always include
  emoji?: string;            // display use
  primaryEnv?: string;       // primary environment variable dependency
  requires?: {
    bins?: string[];         // required executables
    anyBins?: string[];      // at least one must be present
    env?: string[];          // required environment variables
    config?: string[];       // required config keys
  };
  install?: SkillInstallSpec[];  // how to install dependencies
};
```

Two critical frontmatter fields:

  • description: A one-line summary — the sole "representative" in the system prompt. The LLM uses this single line to decide whether to use the skill.
  • metadata.openclaw.requires.bins: Declares which executables are required. If they're absent at runtime, the entire skill disappears from the system prompt.

Body (LLM-readable): Detailed "when to use," "when not to use," command templates, and caveats — this part never appears in the system prompt. It's only loaded into context when the LLM actively reads the file.

This separation is the core design insight of the entire system: metadata for machines, body for the LLM, summary as the decision signal in between.
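To make the split concrete, here is a minimal sketch of separating the two halves; `splitSkillFile` is a hypothetical helper (OpenClaw's real loader also parses the YAML):

```typescript
// Sketch: split a SKILL.md into frontmatter and body.
// Hypothetical helper, for illustration only.
function splitSkillFile(raw: string): { frontmatter: string; body: string } {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { frontmatter: "", body: raw };
  return { frontmatter: match[1], body: match[2] };
}

const sample = `---
name: weather
description: "Get current weather."
---

# Weather Skill
`;
const { frontmatter, body } = splitSkillFile(sample);
console.log(frontmatter.includes("name: weather")); // true
console.log(body.trim()); // "# Weather Skill"
```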


2. Multi-Source Discovery and Priority (workspace.ts)

Where Do Skills Come From?

A user might simultaneously have system-built-in skills, personally installed skills, and project-level skills. All of these need to be discovered, and when names conflict there must be a clear override rule.

loadSkillEntries() scans six sources in increasing priority order:

```text
extra (extraDirs specified in openclaw.yml skills.load.extraDirs)
  < bundled (core built-ins, the repo's skills/ directory, shipped with OpenClaw)
    < managed (~/.openclaw/skills/, installed via `openclaw skills install`)
      < agents-skills-personal (~/.agents/skills/, personal global skills)
        < agents-skills-project (workspace .agents/skills/, project-level skills)
          < workspace (workspace skills/, highest priority)
```

Priority is implemented with a Map<name, Skill> — later assignments overwrite earlier ones:

```typescript
// src/agents/skills/workspace.ts
const merged = new Map<string, Skill>();
for (const skill of extraSkills)          merged.set(skill.name, skill);
for (const skill of bundledSkills)        merged.set(skill.name, skill);
for (const skill of managedSkills)        merged.set(skill.name, skill);
for (const skill of personalAgentsSkills) merged.set(skill.name, skill);
for (const skill of projectAgentsSkills)  merged.set(skill.name, skill);
for (const skill of workspaceSkills)      merged.set(skill.name, skill);
```

This means: a skills/github/SKILL.md in the project completely replaces the system-built-in github skill — it doesn't merge. Users can customize any skill's behavior for a specific project.

Nested Directory Detection

resolveNestedSkillsRoot() contains a heuristic: if dir/skills/*/SKILL.md exists, treat dir/skills as the actual skills root. This means ~/.openclaw/skills/ can contain either github/SKILL.md directly, or an entire toolkit with a nested skills/ subdirectory — both structures are correctly recognized.
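A sketch of that heuristic, with the filesystem abstracted behind a predicate so the decision logic stands alone (names are illustrative, not OpenClaw's actual signature):

```typescript
// Sketch of the nested-root heuristic. The predicate stands in for a
// real filesystem scan; depth checking is simplified for brevity.
function resolveSkillsRoot(
  dir: string,
  hasSkillManifests: (root: string) => boolean,
): string {
  const nested = `${dir}/skills`;
  // If dir/skills/*/SKILL.md exists, treat dir/skills as the real root.
  return hasSkillManifests(nested) ? nested : dir;
}

// A toolkit checkout with a nested skills/ directory:
const files = new Set(["/toolkit/skills/github/SKILL.md"]);
const hasManifests = (root: string) =>
  [...files].some((f) => f.startsWith(root + "/") && f.endsWith("/SKILL.md"));

console.log(resolveSkillsRoot("/toolkit", hasManifests)); // "/toolkit/skills"
console.log(resolveSkillsRoot("/flat", hasManifests));    // "/flat"
```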


3. Eligibility Filtering: Only Expose Available Skills

Should the GitHub Skill Appear If gh CLI Isn't Installed?

shouldIncludeSkill() performs runtime eligibility checks after loading:

```typescript
// Check requires.bins: do these executables exist?
// Check requires.anyBins: does at least one exist?
// Check requires.env: are these environment variables set?
// Check requires.config: are these config keys present?
// Check os: does the current OS match? (e.g. macOS-only skills)
// always: true → skip all checks, force include
```

When gh isn't installed, the requires.bins: ["gh"] check fails and the GitHub skill is removed from the list — nothing about it will appear in the LLM's system prompt.

After filtering, there's a second pass: skills marked disable-model-invocation: true are also removed. These skills can only be triggered explicitly via /command — the LLM never sees them when making autonomous decisions.
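The checks can be sketched as a pure function; this is a simplified illustration of the same shape, not OpenClaw's implementation, with binary and env lookups injected as probes:

```typescript
// Simplified eligibility check in the spirit of shouldIncludeSkill().
// Field names follow the OpenClawSkillMetadata type shown earlier.
type Requires = { bins?: string[]; anyBins?: string[]; env?: string[] };

function isEligible(
  meta: { always?: boolean; requires?: Requires },
  probes: { hasBin: (b: string) => boolean; hasEnv: (e: string) => boolean },
): boolean {
  if (meta.always) return true; // always: true bypasses every check
  const req = meta.requires ?? {};
  if (req.bins?.some((b) => !probes.hasBin(b))) return false;
  if (req.anyBins && !req.anyBins.some((b) => probes.hasBin(b))) return false;
  if (req.env?.some((e) => !probes.hasEnv(e))) return false;
  return true;
}

const probes = {
  hasBin: (b: string) => ["curl"].includes(b), // gh is not installed
  hasEnv: (_: string) => false,
};
console.log(isEligible({ requires: { bins: ["curl"] } }, probes)); // true
console.log(isEligible({ requires: { bins: ["gh"] } }, probes));   // false
console.log(isEligible({ always: true, requires: { bins: ["gh"] } }, probes)); // true
```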

Eligibility Context: Remote Environment

SkillEligibilityContext.remote supports injecting the state of a remote node:

```typescript
type SkillEligibilityContext = {
  remote?: {
    platforms: string[];
    hasBin: (bin: string) => boolean;   // does the target node have curl?
    hasAnyBin: (bins: string[]) => boolean;
    note?: string;
  };
};
```

When an Agent executes on a remote Node Host (see part 6), eligibility checks target the remote node's environment, not the Gateway's machine — so if the remote Linux server has gh but the local Mac doesn't, the GitHub skill will still appear for the LLM.


4. Progressive Disclosure: Only Summaries in the System Prompt

How Large Would Full Docs for 150 Skills Be?

At an average of 2,000 bytes per SKILL.md, 150 skills amounts to 300KB of plain text (roughly 75K tokens), enough to crowd out most of a model's context window before the conversation even starts.

The solution is progressive disclosure: the system prompt only includes three fields per skill (name, description, location). The body is loaded only when the LLM decides to use a skill.

formatSkillsForPrompt() (from the @mariozechner/pi-coding-agent SDK) formats the filtered skill list into:

```xml
<available_skills>
<skill>
  <name>weather</name>
  <description>Get current weather and forecasts via wttr.in or Open-Meteo.
    Use when: user asks about weather, temperature, or forecasts for any location.
    NOT for: historical weather data, severe weather alerts.</description>
  <location>~/.openclaw/skills/weather/SKILL.md</location>
</skill>
<skill>
  <name>github</name>
  <description>GitHub operations via gh CLI: issues, PRs, CI runs, code review.
    Use when: (1) checking PR status or CI, (2) creating/commenting on issues...</description>
  <location>~/.openclaw/skills/github/SKILL.md</location>
</skill>
</available_skills>
```

Note the location field: /Users/alice/.openclaw/skills/weather/SKILL.md is compressed to ~/.openclaw/skills/weather/SKILL.md — this detail is handled by compactSkillPaths(). Each path saves roughly 5–6 tokens; across 150 skills that's 750–900 tokens saved.
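A minimal sketch of that compaction, assuming the home directory is passed in rather than read from `os.homedir()`:

```typescript
// Sketch of home-directory compaction in the spirit of compactSkillPaths().
// Illustrative only; a real implementation would resolve the home dir itself.
function compactPath(p: string, home: string): string {
  return p === home || p.startsWith(home + "/")
    ? "~" + p.slice(home.length)
    : p;
}

console.log(compactPath("/Users/alice/.openclaw/skills/weather/SKILL.md", "/Users/alice"));
// "~/.openclaw/skills/weather/SKILL.md"
console.log(compactPath("/opt/skills/x/SKILL.md", "/Users/alice")); // unchanged
```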

Token Budget Control

```typescript
// src/agents/skills/workspace.ts
const DEFAULT_MAX_SKILLS_IN_PROMPT = 150;
const DEFAULT_MAX_SKILLS_PROMPT_CHARS = 30_000;
const DEFAULT_MAX_SKILL_FILE_BYTES = 256_000;

// If the character limit is exceeded, binary search for the largest fitting prefix
if (!fits(skillsForPrompt)) {
  let lo = 0, hi = skillsForPrompt.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (fits(skillsForPrompt.slice(0, mid))) lo = mid;
    else hi = mid - 1;
  }
  skillsForPrompt = skillsForPrompt.slice(0, lo);
}
```
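The same binary search, made self-contained and runnable; `fitsBudget` is a stand-in for the real prompt formatter's length check:

```typescript
// Runnable sketch of the prefix binary search: find the longest prefix
// of the skill list whose rendered length fits a character budget.
function largestFittingPrefix<T>(
  items: T[],
  fits: (prefix: T[]) => boolean,
): T[] {
  let lo = 0;
  let hi = items.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (fits(items.slice(0, mid))) lo = mid;
    else hi = mid - 1;
  }
  return items.slice(0, lo);
}

// Each entry "costs" 10 characters; a 35-character budget fits 3 entries.
const entries = ["a", "b", "c", "d", "e"];
const fitsBudget = (prefix: string[]) => prefix.length * 10 <= 35;
console.log(largestFittingPrefix(entries, fitsBudget).length); // 3
```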

5. Meta-Instructions in the System Prompt: Teaching the LLM How to Use Skills

Does the LLM Know What to Do With the Skill List?

The list alone isn't enough — the LLM needs explicit behavioral rules. buildSkillsSection() injects both the list and the instructions into the system prompt:

```typescript
// src/agents/system-prompt.ts
function buildSkillsSection(params: { skillsPrompt?: string; readToolName: string }) {
  return [
    "## Skills (mandatory)",
    "Before replying: scan <available_skills> <description> entries.",
    `- If exactly one skill clearly applies: read its SKILL.md at <location> with \`${readToolName}\`, then follow it.`,
    "- If multiple could apply: choose the most specific one, then read/follow it.",
    "- If none clearly apply: do not read any SKILL.md.",
    "Constraints: never read more than one skill up front; only read after selecting.",
    trimmed,   // ← the <available_skills> summary block
  ];
}
```

Several key design decisions in these instructions:

  1. (mandatory): Marked as mandatory — the LLM must scan before every reply, not just "occasionally consult."
  2. "read its SKILL.md at <location>": Explicitly specifies using the read tool, with the path right there in the <location> field — the LLM doesn't need to guess the path.
  3. "never read more than one skill up front": Prevents the LLM from eagerly reading all potentially relevant skills (which would waste tokens).
  4. "then follow it": After reading, the LLM must follow the skill, not merely reference it.

The end-to-end flow: user asks "check Shanghai weather" → LLM scans summaries → matches weather skill's description → calls read("~/.openclaw/skills/weather/SKILL.md") → reads the full workflow → executes curl "wttr.in/Shanghai?format=3".

The LLM is an active participant throughout, not a passive script executor — the Skill system gives the LLM exactly enough information to make a decision, with the full content loaded only when truly needed.


6. /Commands: The User-Triggered Path

What If the User Wants to Type /weather Shanghai Instead?

buildWorkspaceSkillCommandSpecs() scans all user-invocable: true skills (the default) and registers slash commands for messaging platforms:

```typescript
// src/auto-reply/skill-commands.ts
// /weather → weather skill
// /github  → github skill
// Conflicts get an automatic _2 suffix
```

Command names are normalized:

```typescript
function sanitizeSkillCommandName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_")
    .replace(/_+/g, "_")
    .replace(/^_+|_+$/g, "")
    .slice(0, 32);   // Discord limit: max 32 chars
}
```
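A few sample inputs show the normalization (self-contained copy of the function above, for illustration):

```typescript
// Copy of sanitizeSkillCommandName so the examples run standalone.
function sanitizeSkillCommandName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_")  // non-alphanumerics become underscores
    .replace(/_+/g, "_")           // collapse runs of underscores
    .replace(/^_+|_+$/g, "")       // trim leading/trailing underscores
    .slice(0, 32);                 // Discord limit: max 32 chars
}

console.log(sanitizeSkillCommandName("GitHub PRs"));     // "github_prs"
console.log(sanitizeSkillCommandName("spotify-player")); // "spotify_player"
console.log(sanitizeSkillCommandName("--weather--"));    // "weather"
```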

Two Trigger Modes

When a user sends /weather Shanghai, the system looks up the SkillCommandSpec for weather:

Mode 1: Through the LLM (default)

```text
/weather Shanghai
  → resolveSkillCommandInvocation() identifies the command
  → injects "weather Shanghai" as a user message into the session
  → LLM processes normally (still reads SKILL.md and decides)
```

Mode 2: Deterministic Tool Dispatch (command-dispatch: tool)

If the SKILL.md frontmatter declares:

```yaml
command-dispatch: tool
command-tool: exec
command-arg-mode: raw
```

Then execution completely bypasses the LLM:

```text
/weather Shanghai
  → dispatch.kind === "tool"
  → directly invokes the exec tool, args = "Shanghai" (passed through verbatim)
  → LLM not involved
```

This is extremely valuable for scenarios where "the input is clear, the tool is known, no reasoning needed" — execution is faster, and behavior is completely predictable.
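The two modes can be sketched as a single dispatch decision; the union shape below is illustrative, not OpenClaw's actual SkillCommandSpec:

```typescript
// Sketch of the two trigger modes as one dispatch decision.
// Field names mirror the frontmatter keys shown above; the types are
// hypothetical simplifications for illustration.
type Dispatch =
  | { kind: "llm"; message: string }               // Mode 1: inject as a user message
  | { kind: "tool"; tool: string; args: string };  // Mode 2: bypass the LLM

function resolveDispatch(
  frontmatter: { "command-dispatch"?: string; "command-tool"?: string },
  skillName: string,
  args: string,
): Dispatch {
  if (frontmatter["command-dispatch"] === "tool" && frontmatter["command-tool"]) {
    return { kind: "tool", tool: frontmatter["command-tool"], args };
  }
  return { kind: "llm", message: `${skillName} ${args}` };
}

console.log(resolveDispatch({}, "weather", "Shanghai"));
// → kind "llm", message "weather Shanghai"
console.log(resolveDispatch(
  { "command-dispatch": "tool", "command-tool": "exec" }, "weather", "Shanghai"));
// → kind "tool", tool "exec", args "Shanghai"
```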


7. Skill Sync in Sandbox Environments

When an Agent runs inside a Docker sandbox (see part 7), skill files need to be synced from the host into the container:

```typescript
// src/agents/skills/workspace.ts
export async function syncSkillsToWorkspace(params: {
  sourceWorkspaceDir: string;  // host workspace
  targetWorkspaceDir: string;  // in-container workspace
}) {
  // 1. Load the host's skill list
  // 2. Clear the container's skills/ directory
  // 3. cp each skill directory into the container
  // 4. Path safety checks (prevent path traversal)
}
```

After sync, the read tool inside the container reads the container's copy of SKILL.md, not the host path. resolveSandboxPath() ensures each skill directory name is safe — no ../..-style names can escape the container.
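A sketch of the kind of name check involved (illustrative only, not the actual resolveSandboxPath() logic):

```typescript
// Sketch: reject skill directory names that could escape the target root.
// A single path segment is safe if it is non-empty, contains no separators,
// and is not a relative-path component.
function isSafeSkillDirName(name: string): boolean {
  return (
    name.length > 0 &&
    !name.includes("/") &&
    !name.includes("\\") &&
    name !== "." &&
    name !== ".."
  );
}

console.log(isSafeSkillDirName("github"));    // true
console.log(isSafeSkillDirName("../../etc")); // false (contains a separator)
console.log(isSafeSkillDirName(".."));        // false
```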


Summary: Progressive Disclosure-Driven LLM Workflows

The Skill system's core is a clean design philosophy: don't turn documentation into code; teach the documentation to the LLM and let the LLM act on it.

| Phase | Mechanism | Purpose |
| --- | --- | --- |
| Discovery | Six-source scan + Map priority override | Lets users/projects override system-built-in skills |
| Filtering | bins/env/os eligibility checks | Only expose skills actually available in the current environment |
| Summary injection | name + description + location, character budget | Minimum token cost for LLM decision-making |
| Meta-instruction | `## Skills (mandatory)` + read tool path | Tells the LLM how to use the information |
| Progressive disclosure | LLM actively calls read(SKILL.md) after deciding | Full docs only enter context when truly needed |
| /command | buildWorkspaceSkillCommandSpecs() registers slash commands | User-triggered, bypasses natural language inference |
| Deterministic dispatch | command-dispatch: tool | Execution path never touches the LLM |

This design means skill authors only need to write Markdown — no knowledge of LLM reasoning, tool registration, or messaging platforms required. One SKILL.md file is enough for the AI to act exactly as the author intended.
