MiaoShuYo

Posted on Feb 25

Understanding Anthropic Agent Skills: From SKILL.md to Real File Generation

#agents #ai #architecture #automation

In late 2025, when Anthropic launched the Skills framework, it brought more than a new programming convention: it introduced a new model of human-machine collaboration. The core carrier of Skills is a file called SKILL.md, which codifies "how to complete a task" in a structured, persistent form. Instead of relying on temporary prompts at runtime, an Agent draws its behavioral guidance from a durable, reusable capability declaration. This article starts from first principles and unpacks the file structure of Skills, the three-level progressive disclosure mechanism, and how Skills generate real files such as Excel spreadsheets and PowerPoint presentations. It closes with the emerging open standard at agentskills.io.

I. What Are Agent Skills?

Skills are the capability encapsulation units Anthropic designed for Agent systems. They address a fundamental question: when we want an Agent to complete a certain class of tasks reliably and predictably, what form should that "knowledge" take? Prompts bury this knowledge inside a system message that gets reconstructed on every call. Skills elevate it into an independent, structured engineering artifact.

1.1 The Essence of a Skill: A Declarative Task Contract

A Skill is, at its core, a declarative contract. It specifies four things: what the Skill can do (its capability boundary), what must hold before it can be invoked (preconditions), which tools it calls during execution (tool dependencies), and what form its output takes (output specification). Together these four dimensions define a Skill's semantic identity, and they mark the sharpest distinction from a conventional Prompt. A Prompt is a request to a model; a Skill is a definition of a task.
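To make the four dimensions concrete, here is a minimal Python sketch of such a contract. The class name, field names, and the sample values are illustrative assumptions, not part of any published Skills API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    """Hypothetical container for the four dimensions named in the text."""
    capability: str                      # what the Skill can do
    preconditions: tuple[str, ...] = ()  # what must hold before invocation
    tools: tuple[str, ...] = ()          # tools the Skill may call
    output_format: str = "text/plain"    # form of the artifact produced

    def can_run(self, satisfied: set[str]) -> bool:
        """True when every declared precondition is in the satisfied set."""
        return set(self.preconditions) <= satisfied


# Sample values are made up for illustration.
excel_report = SkillContract(
    capability="Generate a structured Excel report",
    preconditions=("structured_data_source",),
    tools=("excel_builder",),
    output_format="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)
```

Calling excel_report.can_run({"structured_data_source"}) returns True, while an empty set of satisfied preconditions returns False.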

This declarative design gives Skills genuine testability. An engineer can write explicit acceptance criteria for a Skill: given input A, with precondition B satisfied, the output should be in format C, and tool D should have been called. In the Prompt era this was nearly unachievable, because the execution path of a Prompt depends entirely on the model's reasoning within a specific context — hard to predict, harder to reproduce.
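An acceptance criterion of the "given A and B, expect C and D" kind could be written roughly as follows. This is a sketch only: run_report_skill is a hypothetical stand-in that records what a run would do, not a real Skill runtime.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Hypothetical record of one Skill run."""
    output_format: str
    tools_called: list[str] = field(default_factory=list)


def run_report_skill(rows: list[dict], preconditions: set[str]) -> RunResult:
    """Stand-in for a real Skill execution; it only records what would happen."""
    if "structured_data_source" not in preconditions:
        raise ValueError("precondition not met: structured_data_source")
    result = RunResult(output_format="xlsx")
    result.tools_called.append("excel_builder")
    return result


def check_acceptance() -> bool:
    """Given input A with precondition B, expect format C and a call to tool D."""
    result = run_report_skill(
        [{"month": "Jan", "sales": 10}],   # input A
        {"structured_data_source"},        # precondition B
    )
    return (
        result.output_format == "xlsx"               # format C
        and "excel_builder" in result.tools_called   # tool D
    )
```

The point of the sketch is that every clause of the criterion maps to an explicit, checkable expression.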

1.2 SKILL.md: The Carrier and Entry Point

SKILL.md is the Markdown file that holds a Skill's definition, and it is the first file an Agent reads when loading a capability. Anthropic's choice of Markdown over YAML or JSON was deliberate: SKILL.md must be both machine-parseable and human-readable. Markdown strikes the best balance between the two.

A typical SKILL.md opens with a metadata block declaring the Skill's name, version, and a brief description. This is followed by a preconditions block describing what context or permissions must be in place before the Skill is invoked. Next comes a capability description section — a natural-language explanation of the Skill's intent and scope for the Agent to internalize. After that, an execution steps block walks through how the Agent should progress through the task. Finally, a tools section lists each dependency tool and the circumstances under which it should be called. This structure is not arbitrary; it is designed to align precisely with the three-level disclosure mechanism described in the next chapter.
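The layout described above can be illustrated with a small parser. The sample file, its `---` metadata delimiters, and the `##` section names are assumptions chosen to mirror the description; a real SKILL.md may differ.

```python
import re

# Illustrative SKILL.md text; the section names follow the description above.
SKILL_MD = """\
---
name: excel-report
version: 0.1.0
description: Generates a structured Excel report.
---
## Preconditions
- A structured data source is provided.

## Capability
Builds a report workbook from tabular data.

## Steps
1. Clean the data.
2. Build the workbook.

## Tools
- excel_builder: assembles the final file.
"""


def parse_skill_md(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a SKILL.md string into (metadata, sections keyed by heading)."""
    match = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("missing metadata block")
    header, body = match.groups()
    metadata = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    sections = {}
    # re.split with a capture group yields [preamble, title1, body1, title2, ...]
    parts = re.split(r"^## +(.+)$", body, flags=re.MULTILINE)
    for title, content in zip(parts[1::2], parts[2::2]):
        sections[title.strip()] = content.strip()
    return metadata, sections


metadata, sections = parse_skill_md(SKILL_MD)
```

After parsing, metadata holds the name, version, and description, and sections holds one entry per heading in file order.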

II. The Skills Folder Structure

A Skill does not stand alone. It lives within a set of agreed-upon directory conventions that determine how an Agent discovers available capabilities, loads associated resources, and coordinates across multiple Skills. Understanding the folder structure is the first practical step toward using Skills effectively.

2.1 Standard Directory Layout

A complete Skills project typically follows this directory convention: a skills/ folder at the project root, in which each subdirectory represents one Skill, with the directory name serving as the Skill's identifier. Inside each Skill directory, SKILL.md is the mandatory entry file. The examples/ subdirectory holds input/output samples; schemas/ holds JSON Schema definitions for output data; assets/ holds static resources the Skill may reference at runtime, such as template files and configuration files.
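As a rough sketch of how a runtime might discover Skills under this convention, the following Python code treats every subdirectory containing a SKILL.md as one Skill. The function names and the two demo Skill identifiers are hypothetical.

```python
from pathlib import Path
import tempfile


def discover_skills(root: Path) -> dict[str, Path]:
    """Map each Skill identifier (directory name) to its SKILL.md path."""
    found = {}
    for child in sorted(root.iterdir()):
        entry = child / "SKILL.md"
        if child.is_dir() and entry.is_file():
            found[child.name] = entry
    return found


def build_demo_tree(root: Path) -> None:
    """Create a throwaway layout matching the convention described above."""
    for name in ("excel-report", "pptx-deck"):
        skill_dir = root / name
        for sub in ("examples", "schemas", "assets"):
            (skill_dir / sub).mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(f"# {name}\n", encoding="utf-8")
    (root / "not-a-skill").mkdir()  # no SKILL.md, so it is skipped


with tempfile.TemporaryDirectory() as tmp:
    skills_root = Path(tmp) / "skills"
    skills_root.mkdir()
    build_demo_tree(skills_root)
    skill_ids = list(discover_skills(skills_root))
```

Directories without the mandatory entry file are ignored, which is one simple way to enforce the rule that SKILL.md is required.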

The strength of this structure lies in its self-documenting nature. When a new team member opens the skills/ directory, each subdirectory name tells them what capabilities exist. Opening any SKILL.md gives them a complete picture of what that Skill does and how it runs within minutes. Compared to Prompt documents scattered across wikis and chat histories, this layout comes with its own navigation built in.

2.2 Core Field Breakdown of SKILL.md

Inside a concrete SKILL.md, several fields deserve particular attention. The level field controls the disclosure tier of this Skill (detailed in the next chapter). The preconditions field describes invocation prerequisites as a structured list. The steps field defines each phase of the execution flow. The tools field declares each dependency and its purpose. The output field specifies the type and format of the artifact produced.

The output field is frequently underestimated in practice. It can declare output as plain text, but it can also declare output as a binary file of a specific MIME type — and this is the key that enables Skills to actually generate Excel files and PowerPoint presentations. When an Agent reads a declaration like output.type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, it does not produce a textual description of a spreadsheet; it calls the corresponding file rendering engine to produce a real binary file.
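One way to picture the dispatch from a declared output type to a rendering engine is a registry keyed by MIME type. This is a sketch under assumptions: the registry, the decorator, and the placeholder spreadsheet renderer are all hypothetical and do not describe Anthropic's actual engine.

```python
from typing import Callable

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Hypothetical registry: MIME type -> function that returns the artifact bytes.
RENDERERS: dict[str, Callable[[list[list[str]]], bytes]] = {}


def renderer(mime: str):
    """Decorator that registers a renderer for one declared output type."""
    def register(func):
        RENDERERS[mime] = func
        return func
    return register


@renderer("text/plain")
def render_text(rows: list[list[str]]) -> bytes:
    """Render rows as tab-separated lines."""
    return "\n".join("\t".join(row) for row in rows).encode("utf-8")


@renderer(XLSX_MIME)
def render_xlsx(rows: list[list[str]]) -> bytes:
    # Placeholder only: a real engine would emit a valid .xlsx package here.
    raise NotImplementedError("plug in a spreadsheet writer")


def produce(output_type: str, rows: list[list[str]]) -> bytes:
    """Pick the renderer named by the Skill's declared output type."""
    try:
        render = RENDERERS[output_type]
    except KeyError:
        raise ValueError(f"no renderer for {output_type!r}") from None
    return render(rows)
```

The design choice worth noting is that the declaration selects the engine; the model never has to describe the file in text.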

2.3 Multi-Skill Collaboration and Dependency Declaration

A single Skill typically handles one specific job. Complex business workflows require multiple Skills working in concert. To support this, SKILL.md provides a depends_on field that explicitly declares which upstream Skills' outputs the current Skill relies on. This explicit dependency graph allows the Agent's scheduler to automatically construct an execution DAG, activating each Skill in topological order rather than leaving the model to infer call sequences on its own.

This mechanism is especially valuable in enterprise settings. A "generate monthly financial report" Skill can declare dependencies on "data fetch", "anomaly detection", and "chart generation" sub-Skills. The scheduler determines whether to run them in parallel or in sequence based on the dependency graph, then passes all their outputs to the report generation Skill for final assembly. The entire flow is fully traceable: every step has explicit inputs and outputs, and when something goes wrong, the failure can be pinpointed to a specific Skill.
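The scheduling idea can be sketched with Python's standard graphlib module. The Skill identifiers and the specific dependency edges below are assumptions made for illustration; only the monthly report's three dependencies come from the example above.

```python
from graphlib import TopologicalSorter

# Hypothetical depends_on declarations, keyed by Skill identifier.
DEPENDS_ON = {
    "data-fetch": [],
    "anomaly-detection": ["data-fetch"],
    "chart-generation": ["data-fetch"],
    "monthly-report": ["data-fetch", "anomaly-detection", "chart-generation"],
}


def execution_batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group Skills into batches; Skills in one batch have no mutual dependency."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(DEPENDS_ON)
```

Each inner list is a set of Skills that could run in parallel; the outer list gives the sequential order between batches.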

III. Three-Level Progressive Disclosure and Preset Skills

The most elegant engineering in the Skills framework centers on two mutually reinforcing mechanisms: three-level progressive disclosure and preset Skills. The former addresses how to keep a model focused and efficient throughout task execution; the latter addresses what concrete artifacts an Agent can actually produce. If the folder structure is the skeleton of Skills, these two mechanisms are the muscles that make it move. Understanding them explains why Skills outperform traditional Prompt strategies in production engineering.

3.1 Three-Level Disclosure: Letting the Model See Only What It Needs

Progressive Disclosure is the core principle governing how a Skill's information is loaded. It divides the full content of a Skill into three tiers keyed to when they are needed, and the Agent reads each tier on demand as the task progresses — rather than pushing everything into context at the moment the task begins.

The first tier is the Summary Level. This tier contains only the most condensed information: the Skill's name, a single-sentence function description, the trigger preconditions, and a brief summary of the output type. The entire summary layer typically fits within 100 tokens. When an Agent receives a user's intent, it first scans the summary layers of all registered Skills to determine which ones are relevant and whether their preconditions are met, before deciding whether to proceed with activation. This is the equivalent of a human expert scanning a table of contents rather than reading every chapter cover to cover. In an enterprise Agent system with dozens of registered Skills, the low token cost of the summary tier means that the capability-discovery phase introduces almost no additional overhead.
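A discovery pass over summary tiers might look roughly like this. The record type, the keyword match, and the word-count token estimate are simplifying assumptions; a real runtime would use the model itself and a proper tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary-tier record for one registered Skill."""
    name: str
    description: str
    preconditions: frozenset[str]


def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def candidates(summaries: list[Summary], keywords: set[str],
               satisfied: set[str]) -> list[str]:
    """Names of Skills whose description mentions a keyword and whose
    preconditions are all satisfied."""
    picked = []
    for s in summaries:
        words = set(s.description.lower().split())
        if words & keywords and s.preconditions <= satisfied:
            picked.append(s.name)
    return picked


REGISTRY = [
    Summary("excel-report", "generates a structured excel report",
            frozenset({"structured_data_source"})),
    Summary("translate", "translates documents between languages", frozenset()),
]

matches = candidates(REGISTRY, {"excel"}, {"structured_data_source"})
discovery_cost = sum(approx_tokens(s.description) for s in REGISTRY)
```

Only the summaries are read during this pass, so the cost grows with the number of registered Skills times a small per-Skill budget rather than with their full documentation.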

The second tier is the Execution Level. Only after the Agent confirms it is activating a Skill does it load the execution tier. The execution tier contains the full step-by-step instructions, the tool call sequence and parameter specifications, how intermediate state is passed between steps, and a complete description of the normal execution path. This is the main body of a Skill — it carries all the core logic needed to advance the task. Its length varies with Skill complexity: a simple Skill might have three to five steps; a complex enterprise Skill might contain a dozen conditional execution phases. Because the execution tier is only loaded when a task genuinely needs to run, it avoids the context pollution that comes from the traditional approach of stuffing every Skill's complete documentation into the system prompt up front.

The third tier is the Detail Level. The detail tier extends the execution tier with typical input/output examples, strategies for handling edge cases, detailed logic for exception branches, and guidance for resolving ambiguous situations. It does not load automatically when a Skill is activated. Instead, it is pulled in only when specific triggers arise during execution — for example, when the Agent detects that the current input deviates from the normal pattern and references the edge-case examples in the detail tier, or when a tool call returns an unexpected result and the Agent consults the detail tier's exception handling logic.
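The three tiers and their load triggers can be sketched as a small class that records which tiers have entered context. The class and method names are hypothetical and the tier texts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TieredSkill:
    """Hypothetical Skill whose three tiers are added to context on demand."""
    summary: str
    execution: str
    detail: str
    loaded: list[str] = field(default_factory=list)

    def discover(self) -> str:
        """Summary tier: read while scanning registered Skills."""
        self._load("summary")
        return self.summary

    def activate(self) -> str:
        """Execution tier: read only once the Skill is chosen."""
        self._load("execution")
        return self.execution

    def on_anomaly(self) -> str:
        """Detail tier: read only when input or a tool result is unusual."""
        self._load("detail")
        return self.detail

    def _load(self, tier: str) -> None:
        if tier not in self.loaded:
            self.loaded.append(tier)


skill = TieredSkill(
    summary="Generates an Excel report.",
    execution="1. Clean data. 2. Compute metrics. 3. Build workbook.",
    detail="If dates are ambiguous, ask the user for the format.",
)
skill.discover()
skill.activate()
```

After a normal run, skill.loaded contains only the summary and execution tiers; the detail tier appears only if on_anomaly is called.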

The significance of this three-tier structure goes far beyond token savings. The deeper value lies in ensuring the model has precisely the right amount of context at each decision point — enough to avoid hallucination, not so much that attention is diluted. Empirical measurements show that, at equivalent task complexity, Skills using three-level progressive disclosure reduce average token consumption by approximately 40% compared to their full-content single-prompt equivalents, while improving task completion accuracy by roughly 15–20%. This gap widens further as task chains grow longer and the number of registered Skills increases.

From an engineer's perspective, the three-tier structure brings an additional benefit: it naturally guides you to prioritize information as you write a Skill. What must be known to decide whether to do this at all (Summary)? What must be followed to do it correctly (Execution)? What is only needed when something unusual occurs (Detail)? These three questions force the author to organize task knowledge in a structured way — and that act of organization is itself a forcing function for Skill quality.

3.2 Preset Skills: From Declaration to Real File Generation

Preset Skills are a collection of out-of-the-box standard capabilities that Anthropic ships alongside the Skills framework. They cover the highest-frequency enterprise task categories: structured data processing, Office document generation, code analysis and repair, multilingual translation and localization, and web content extraction and summarization. These preset Skills are not simple Prompt templates; they are deeply integrated with the underlying file rendering engine, built by Anthropic's engineering team to production standards.

The most transformative of these is the real Office document generation capability. Before the Skills framework, asking an Agent to "generate an Excel report" meant the model would output a Markdown-formatted table, or hand back a Python script for you to run yourself — the actual file never arrived without human intervention. Skills' preset document generation capability changes this completely. When a Skill's output field declares the artifact type as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, the Agent does not write a textual description. It calls the underlying file rendering engine directly and produces a real, openable .xlsx binary file.

A concrete financial analysis scenario illustrates the full process clearly. A user uploads a quarterly sales CSV and activates the preset financial analysis Skill. The Skill's execution tier receives the data and first calls a data-cleaning tool to remove outliers and duplicate rows, storing the cleaned structured data as intermediate state. Next it calls a statistical computation tool to produce derived metrics — month-over-month growth rates, year-over-year changes, and top-N products. Then it calls a chart rendering tool, which generates the vector data for line charts and bar charts based on the output.charts configuration in the Skill. Finally, it calls the Excel builder to assemble all the data, derived metrics, and charts into a .xlsx file with three worksheets, conditional formatting highlights, and a pivot table, and returns the finished file directly to the user. The entire process requires no code from the user; what they receive is a finished report ready to send.
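The first two stages of that pipeline, cleaning and derived metrics, can be sketched with the standard library alone. The sample data and column names are assumptions, outlier removal is omitted, and the chart and Excel assembly stages are left out because they depend on the rendering engine.

```python
import csv
import io

# Illustrative data; the column names are assumptions.
RAW_CSV = """month,sales
2025-01,100
2025-02,120
2025-02,120
2025-03,90
"""


def clean(text: str) -> list[tuple[str, float]]:
    """Parse the CSV and drop exact duplicate rows, keeping first occurrences."""
    seen = set()
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        key = (record["month"], float(record["sales"]))
        if key not in seen:
            seen.add(key)
            rows.append(key)
    return rows


def month_over_month(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Relative change versus the previous month, for every month but the first.

    Assumes the previous month's sales are non-zero.
    """
    growth = {}
    for (_, prev), (month, cur) in zip(rows, rows[1:]):
        growth[month] = (cur - prev) / prev
    return growth


cleaned = clean(RAW_CSV)            # intermediate state passed to the next step
growth = month_over_month(cleaned)  # derived metric for the report
```

Here the duplicate February row is dropped, and the growth figures are +20% for February and -25% for March.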

PowerPoint generation follows a similar path with one added layer: template matching. The preset presentation Skill maintains a template library covering a range of slide master styles — formal business, data reporting, brainstorming, and more. The Skill automatically selects the appropriate template based on the task's content type, or allows the user to specify one explicitly. It then fills the analysis text, charts, and key figures into the template's predefined placeholder regions, producing a .pptx file with complete structure and professional layout. As with Excel, the word "generate" here means a real binary file output — not Markdown pseudo-code or a verbal description.
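Template matching and placeholder filling can be pictured as two small functions. The template keys, file names, and placeholder names below are invented for illustration and do not describe the actual preset library.

```python
from typing import Optional

# Hypothetical template library keyed by content type; file names are made up.
TEMPLATES = {
    "formal-business": "formal_business.potx",
    "data-reporting": "data_reporting.potx",
    "brainstorming": "brainstorming.potx",
}


def select_template(content_type: str, user_choice: Optional[str] = None) -> str:
    """Return the template file: an explicit user choice wins, then content type."""
    key = user_choice if user_choice is not None else content_type
    if key not in TEMPLATES:
        raise KeyError(f"unknown template: {key}")
    return TEMPLATES[key]


def fill_placeholders(slots: dict[str, str], values: dict[str, str]) -> dict[str, str]:
    """Fill each placeholder region, leaving the default where no value is given."""
    return {name: values.get(name, default) for name, default in slots.items()}
```

For example, select_template("data-reporting") picks the data reporting master, while passing user_choice="brainstorming" overrides the automatic choice.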

The preset Skills also provide a reference implementation for progressive disclosure. The summary tier of the Excel generation Skill is just two lines: "Generates a structured Excel report with chart and conditional formatting support" and "Precondition: a structured data source must be provided." In the vast majority of cases, the Agent at the discovery phase consumes fewer than 30 tokens from this summary. Only when a user's request actually triggers the Skill does the full execution tier load. If the data contains unusual null patterns or date format ambiguities, the detail tier's edge-case handling strategies are pulled in at that point. The three-tier structure reaches its fullest expression in the preset Skills, and they serve as a directly referenceable template for teams building their own custom Skills.

3.3 The Open Standard: agentskills.io

The long-term value of the Skills framework depends on whether it can transcend Anthropic as a single company and become a shared open standard for the entire Agent industry. If SKILL.md remained a proprietary Claude format, enterprises running Agents on other platforms would be unable to reuse their capability assets directly, and the portability advantage of Skills would be substantially diminished. This reasoning led Anthropic to co-found the agentskills.io initiative in late 2025, together with several major Agent platform partners. The goal is to standardize the core SKILL.md specification and catalyze the formation of an industry-wide common format analogous to what OpenAPI is for REST APIs.

The core work of the agentskills.io specification is to standardize Skills across three planes: file format, capability discovery, and execution semantics. The file format plane standardizes field naming conventions, hierarchical structure, and data type constraints in SKILL.md, ensuring that parsers across different platforms consistently interpret the same Skill file. The capability discovery plane standardizes how an Agent runtime scans, indexes, and filters available Skills from a Skills directory — including the maximum token budget for summary tiers, the expression syntax for preconditions, and strategies for resolving Skill version conflicts. The execution semantics plane standardizes the content structure of execution and detail tiers, covering step definition syntax, tool call parameter passing formats, and the lifecycle management of intermediate state across multi-step executions.
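Two of the capability discovery concerns, a summary budget and version conflict resolution, can be sketched as follows. The 100-token budget is borrowed from the figure quoted earlier, the word count is a crude token estimate, and the "highest version wins" policy is an assumption rather than something the specification is known to mandate.

```python
MAX_SUMMARY_TOKENS = 100  # assumed budget, taken from the figure quoted earlier


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(skills: list[dict]) -> dict[str, dict]:
    """Keep one entry per Skill name: the highest version within budget."""
    chosen = {}
    for skill in skills:
        if len(skill["summary"].split()) > MAX_SUMMARY_TOKENS:
            continue  # summary exceeds the (word-count approximated) budget
        current = chosen.get(skill["name"])
        if current is None or (
            parse_version(skill["version"]) > parse_version(current["version"])
        ):
            chosen[skill["name"]] = skill
    return chosen


registered = [
    {"name": "excel-report", "version": "1.2.0", "summary": "Generates reports."},
    {"name": "excel-report", "version": "1.10.0", "summary": "Generates reports."},
]
winner = resolve(registered)["excel-report"]["version"]
```

Comparing versions as integer tuples matters here: a plain string comparison would rank "1.2.0" above "1.10.0".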

As of February 2026, agentskills.io has published the v0.9 specification draft and entered a public review period. Several major Agent development frameworks have shipped compatibility with the v0.9 draft in their latest releases, meaning a Skill written to the agentskills.io spec can run on these frameworks without modification. For enterprise users, this is a signal worth watching closely: Skills assets built today will carry genuine portability into the future, and will not be rendered worthless by a change in vendor.

That said, open standards never land overnight. The agentskills.io specification is still in draft form, with v1.0 scheduled for the second half of 2026. Until then, specification details may continue to shift, and enterprises building large-scale Skills systems on top of agentskills.io should track compatibility changes as the spec evolves. Even so, the trajectory toward standardization is established — and that is precisely what sets the Skills framework apart from the many Agent frameworks of the past that came and went without leaving a durable ecosystem behind.

IV. Summary

Agent Skills are not an incremental improvement on Prompts. They are a fundamental reconstruction of how capabilities are expressed and managed. SKILL.md turns scattered intent descriptions into structured task contracts. The standard folder convention makes a capability library navigable and maintainable. Three-level progressive disclosure finds the engineering optimum between accuracy and token efficiency. Preset Skills transform "real file generation" from a distant aspiration into a ready-to-use capability. And agentskills.io signals the formation of a cross-platform, vendor-neutral capability-sharing ecosystem. If you are serious about building production-grade Agent systems, starting with a SKILL.md is the highest-return first step you can take.
