<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MiaoShuYo</title>
    <description>The latest articles on DEV Community by MiaoShuYo (@miaoshuyo).</description>
    <link>https://dev.to/miaoshuyo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791106%2F7b668fd3-a102-4dbf-be96-63e6fb008398.jpeg</url>
      <title>DEV Community: MiaoShuYo</title>
      <link>https://dev.to/miaoshuyo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/miaoshuyo"/>
    <language>en</language>
    <item>
      <title>Progressive Disclosure Is the Soul of Skills</title>
      <dc:creator>MiaoShuYo</dc:creator>
      <pubDate>Thu, 26 Feb 2026 14:16:07 +0000</pubDate>
      <link>https://dev.to/miaoshuyo/progressive-disclosure-is-the-soul-of-skills-5bi1</link>
      <guid>https://dev.to/miaoshuyo/progressive-disclosure-is-the-soul-of-skills-5bi1</guid>
      <description>&lt;h1&gt;
  
  
  Progressive Disclosure Is the Soul of Skills
&lt;/h1&gt;

&lt;p&gt;Many people, when first encountering Skills, instinctively treat "the more detail, the better" as a golden rule — stuffing every piece of background context, business rule, example, and edge case into one enormous system prompt. The results are predictable: the model becomes sluggish, responses drift off-topic, token bills double, and accuracy falls rather than rises. The true ceiling of a Skill's quality is not how much information it contains, but when and how that information is delivered. This underlying design philosophy is called Progressive Disclosure.&lt;/p&gt;

&lt;h2&gt;I. Why Monolithic Prompts Are Doomed&lt;/h2&gt;

&lt;p&gt;This section answers a counterintuitive question: why does giving the model more information actually make it perform worse? Understanding this is the prerequisite for appreciating what Progressive Disclosure offers.&lt;/p&gt;

&lt;h3&gt;1.1 Attention Dilution in the Context Window&lt;/h3&gt;

&lt;p&gt;Large language models do not store every piece of information in the context window with equal fidelity, the way a database would. Research consistently shows that a model's attention is strongly position-biased — content near the beginning and end of the context carries disproportionately high weight, while long passages buried in the middle are often diluted or effectively ignored. This phenomenon is known in the research community as the "Lost in the Middle" effect.&lt;/p&gt;

&lt;p&gt;When you feed a model a monolithic prompt of 8,000 tokens, the instructions that are actually relevant to the current task likely account for only 10–15% of that total. The attention mechanism must distribute weight across the entire context, and the sheer volume of irrelevant material directly reduces the probability that the critical instructions will be "seen." This is not a bug in the model — it is an inherent characteristic of the Transformer architecture.&lt;/p&gt;

&lt;p&gt;More dangerously, an overloaded context also produces what might be called "instruction interference." When rule A and rule B both appear in the same prompt, and they contain a latent contradiction in certain scenarios, the model enters a subtle self-reconciliation mode. The output becomes vague and hedged rather than precisely executing either instruction.&lt;/p&gt;

&lt;h3&gt;1.2 The Double Penalty of Cost and Latency&lt;/h3&gt;

&lt;p&gt;Attention dilution is only one side of the problem. From an engineering perspective, monolithic prompts impose two additional, very concrete penalties.&lt;/p&gt;

&lt;p&gt;The first is the linear inflation of token costs. Major API providers charge per input token, and a single Agent task typically requires dozens of model invocations. If every invocation carries an 8,000-token system prompt — even when 85% of its content is entirely irrelevant to that specific call — the full cost is charged regardless. In high-concurrency enterprise environments, this waste compounds rapidly into figures that cannot be ignored.&lt;/p&gt;
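The scale of that waste is easy to estimate with back-of-the-envelope arithmetic. The call count and prompt size below are illustrative assumptions consistent with the figures above, not measurements:

```python
# Back-of-the-envelope waste of shipping a monolithic prompt on every call.
# All numbers are illustrative assumptions taken from the scenario above.
PROMPT_TOKENS = 8_000       # monolithic system prompt size
IRRELEVANT_SHARE = 0.85     # share of the prompt irrelevant to a given call
CALLS_PER_TASK = 40         # a single Agent task often spans dozens of calls

wasted_per_task = int(PROMPT_TOKENS * IRRELEVANT_SHARE * CALLS_PER_TASK)
print(wasted_per_task)  # 272000 tokens of pure overhead per task
```

Over a quarter of a million irrelevant tokens for a single task, billed in full on every one of those calls.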

&lt;p&gt;The second is the rise in Time to First Token (TTFT). Processing longer contexts demands more compute, which directly increases the time a user waits before receiving any response. In latency-sensitive interactive applications, a prompt bloated with redundant information not only makes the model more error-prone, it also substantially degrades the user experience. These two penalties arrive simultaneously, and the true cost of a monolithic prompt is far higher than it appears on the surface.&lt;/p&gt;

&lt;h2&gt;II. Progressive Disclosure: A Lifesaving Idea Borrowed from UI Design&lt;/h2&gt;

&lt;p&gt;Once the root causes of monolithic prompt failure are understood, the solution that Progressive Disclosure offers becomes self-evident. This section introduces where the concept originated and traces the logic of its migration into Skills design.&lt;/p&gt;

&lt;h3&gt;2.1 What Is Progressive Disclosure&lt;/h3&gt;

&lt;p&gt;Progressive Disclosure began as a UX design concept, popularized by Jakob Nielsen's research on cognitive load in user interfaces. Its core idea is elegantly simple: at any given moment, show users only the information they need right now, and defer advanced options, detailed explanations, and edge cases until the user genuinely requires them. The "Advanced Settings" collapsed menu in a mobile app, a multi-step form wizard, a tooltip that expands only on hover — all of these are canonical implementations of this principle.&lt;/p&gt;

&lt;p&gt;The reason this idea translates so naturally across domains to AI Skills design is that both are manifestations of the same underlying problem: cognitive resources are finite, while the information to be processed is potentially boundless. The working memory of the human brain is limited; the effective attention window of a large language model is equally limited. You cannot — and should not — pile everything onto the table at the very beginning.&lt;/p&gt;

&lt;h3&gt;2.2 Why It Is the Soul of Skills&lt;/h3&gt;

&lt;p&gt;Calling it the "soul" is not hyperbole. The core value of the Skills system lies in enabling an Agent to precisely accomplish the task it has been delegated — not to demonstrate how much it knows. What makes a Skill fundamentally more powerful than a raw prompt is that it introduces a structured information encapsulation mechanism. It transforms the question of "what information gets loaded at what moment" from an afterthought into a design variable that can be deliberately shaped and controlled, rather than a chaotic dump of everything at once.&lt;/p&gt;

&lt;p&gt;Progressive Disclosure governs the information architecture of Skills: the system delivers only the context that is needed, at the moment it is needed, at the granularity that is needed. This means the information density and relevance of the context are at their optimum at every step of execution. This is not merely an efficiency question — it is an accuracy question. An Agent operating with precise information is far more reliable than one operating with abundant but disordered information.&lt;/p&gt;

&lt;h2&gt;III. The Three-Tier Loading Mechanism: Theory and Practice&lt;/h2&gt;

&lt;p&gt;The three-tier loading mechanism is the concrete engineering implementation of Progressive Disclosure within the Skills framework. This section dissects its design logic tier by tier from an architectural perspective, and presents real benchmark data to illustrate the actual gains in token efficiency.&lt;/p&gt;

&lt;h3&gt;3.1 Tier One — The Entry Layer&lt;/h3&gt;

&lt;p&gt;The Entry Layer is the interface between the entire Skills system and the outside world. Its responsibility is singular: tell the Agent which Skills exist in the current system, what each Skill is called, and what general category of requests it handles. The content of this layer is typically very lean — each Skill's description is usually one or two sentences — and the total token consumption of the Entry Layer is generally between 200 and 500.&lt;/p&gt;

&lt;p&gt;The design philosophy of the Entry Layer is "table of contents, not body text." It helps the Agent perform task routing — identifying user intent and selecting the correct Skill — and takes no responsibility for explaining how that Skill should be executed. This layer is loaded on nearly every invocation, which is precisely why it must be kept ruthlessly concise. Any redundant information placed in the Entry Layer will consume tokens at the highest possible frequency.&lt;/p&gt;
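Concretely, the Entry Layer can be as small as a routing manifest. The sketch below is hypothetical: the skill names, descriptions, and rendering function are invented for illustration, not taken from any official schema:

```python
# Hypothetical Entry Layer: a lean "table of contents" the Agent sees on every call.
# One or two sentences per Skill -- routing information only, no execution detail.
ENTRY_LAYER = {
    "expense_approval": "Routes and validates internal expense approval requests.",
    "report_builder": "Generates periodic business reports from structured data.",
    "policy_lookup": "Answers questions about internal company policies.",
}

def render_entry_layer(skills):
    """Render the manifest as the compact text injected into the system prompt."""
    lines = [f"- {name}: {desc}" for name, desc in skills.items()]
    return "Available skills:\n" + "\n".join(lines)

print(render_entry_layer(ENTRY_LAYER))
```

Rendered this way, three skills cost only a few dozen tokens per invocation, which is exactly the frugality the Entry Layer demands.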

&lt;h3&gt;3.2 Tier Two — The Capability Layer&lt;/h3&gt;

&lt;p&gt;The Capability Layer is loaded only after the Agent has determined which specific Skill to invoke. It contains the full behavioral specification for that Skill: input and output formats, which tools to call, the business rules that must be followed, logic for handling exception branches, and a small number of representative examples. The information density of this layer is far higher than the Entry Layer, with token counts typically ranging from 800 to 2,000 — but it only enters the context when a specific Skill is activated.&lt;/p&gt;

&lt;p&gt;The Capability Layer is the most information-dense layer in Skills design, and the one that demands the most careful craftsmanship. A well-written Capability Layer should enable the model to execute more than 80% of standard task paths completely and correctly without any additional guidance. It does not need to cover every edge case, but it must describe the primary workflow with enough clarity that the model can execute it confidently without having to guess.&lt;/p&gt;
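The on-demand character of this layer can be sketched as a lazy lookup keyed by the routed skill. The spec fields below are a hypothetical shape, not an official format:

```python
# Hypothetical Capability Layer: full behavioral spec, loaded only after routing.
CAPABILITY_SPECS = {
    "expense_approval": {
        "input_format": "JSON with fields: amount, currency, cost_center, memo",
        "output_format": "JSON with fields: decision, reason, escalation_needed",
        "tools": ["lookup_budget", "notify_manager"],
        "rules": [
            "Amounts above the cost-center budget require escalation.",
            "Reject requests with an empty memo field.",
        ],
        "examples": ["amount=120 USD, memo='team lunch': approve"],
    },
}

def load_capability(skill_name):
    """Inject the spec into context only when the skill is actually activated."""
    spec = CAPABILITY_SPECS.get(skill_name)
    if spec is None:
        raise KeyError(f"No capability spec registered for {skill_name!r}")
    return spec

spec = load_capability("expense_approval")
print(len(spec["rules"]))  # 2
```

The point of the lookup is architectural: until routing selects "expense_approval", none of these roughly 800 to 2,000 tokens ever enter the context.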

&lt;h3&gt;3.3 Tier Three — The Execution Layer&lt;/h3&gt;

&lt;p&gt;The Execution Layer is the finest-grained information unit, dynamically injected only when a specific sub-task or edge case requires it. Typical Execution Layer content includes: the specific regulatory clauses needed when handling a compliance scenario, real-time data returned by an external API call, the most relevant document fragments retrieved from a knowledge base for the current query, and highly infrequent special business rules.&lt;/p&gt;

&lt;p&gt;The Execution Layer is essentially an architecturally structured expression of on-demand RAG (Retrieval-Augmented Generation). It is not static content hardcoded into the Skill file in advance; rather, it is dynamically retrieved and injected at runtime based on the current state of the task. This design allows Skills to maintain a lean baseline context while retaining the ability to handle high-complexity tasks on demand — traveling light by default, fully equipped when needed.&lt;/p&gt;
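As a minimal sketch, the Execution Layer behaves like a small retrieval step driven by the current task state. Here a naive keyword-overlap ranking stands in for a real retriever, and the clause corpus is invented:

```python
# Hypothetical Execution Layer: fragments retrieved at runtime, not hardcoded.
KNOWLEDGE_BASE = [
    "Clause 7.2: expenses above 5000 EUR require a second signature.",
    "Clause 9.1: travel bookings must use the approved vendor list.",
    "Clause 3.4: recurring subscriptions are reviewed quarterly.",
]

def retrieve_fragments(query, corpus, top_k=1):
    """Naive keyword-overlap ranking standing in for a real RAG pipeline."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words.intersection(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

hit = retrieve_fragments("second signature for large expenses", KNOWLEDGE_BASE)
print(hit[0])  # the Clause 7.2 fragment
```

A production retriever would use embeddings rather than word overlap, but the injection contract is the same: only the winning fragments reach the context, and only for this one step.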

&lt;h3&gt;3.4 Token Savings Benchmark Data&lt;/h3&gt;

&lt;p&gt;Consider a mid-complexity enterprise internal approval Agent, comparing two architectural approaches. The traditional monolithic prompt approach consolidates all rules into the system prompt, resulting in approximately 6,800 input tokens per invocation. Under the three-tier loading mechanism, the Entry Layer consumes 320 tokens, the Capability Layer averages 1,200 tokens when loaded on demand, and the Execution Layer averages 400 tokens when dynamically injected — bringing the actual input tokens per invocation to approximately 1,920, a reduction of 72%.&lt;/p&gt;
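Those figures can be reproduced directly from the per-layer numbers quoted above:

```python
# Reproduce the benchmark arithmetic from the scenario above.
monolithic = 6_800              # tokens per invocation, single monolithic prompt
tiered = 320 + 1_200 + 400      # entry + capability + execution layers
reduction = 1 - tiered / monolithic

print(tiered)                   # 1920
print(round(reduction * 100))   # 72 (% fewer input tokens per invocation)

monthly_calls = 50_000
tokens_saved = (monolithic - tiered) * monthly_calls
print(tokens_saved)             # 244000000, i.e. roughly 244M tokens per month
```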

&lt;p&gt;In a production environment handling 50,000 invocations per month, this gap translates to saving roughly 244 million input tokens monthly. At an input price on the order of $10 per million tokens, that equates to over $2,400 in API cost savings per month, or nearly $30,000 annualized. More significantly, because each invocation now operates with a more precise context, the Agent's task completion accuracy rose from 76% to 91%, and the refusal rate — the proportion of responses where the model outputs "I'm not sure" due to information overload — fell by 60%. Cost reduction and accuracy improvement arrived simultaneously. This is the most compelling real-world validation of the Progressive Disclosure architecture.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[User Request] --&amp;gt; B{Entry Layer\n~320 tokens\nRoute Decision}
    B --&amp;gt;|Matches Approval Skill| C{Capability Layer\n~1200 tokens\nLoad Behavior Spec}
    B --&amp;gt;|Matches Other Skill| D[Load Corresponding Capability Layer]
    C --&amp;gt;|Standard Path| E[Execute Directly]
    C --&amp;gt;|Special Rule Needed| F{Execution Layer\n~400 tokens\nDynamic Injection}
    F --&amp;gt; E
    E --&amp;gt; G[Output Result]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;IV. Summary&lt;/h2&gt;

&lt;p&gt;Progressive Disclosure is not an optimization trick — it is the foundational design principle of the Skills architecture. The failure of monolithic prompts stems, at its core, from conflating two distinct goals: making the model know a lot, and making the model perform well. The three-tier loading mechanism enforces strict information boundaries on every invocation's context, ensuring the model operates in the clearest, most focused state possible at each step.&lt;/p&gt;

&lt;p&gt;The dramatic reduction in token costs is a quantifiable byproduct, but the more fundamental gains are the improvement in accuracy and the predictability of system behavior. A well-designed Skill should resemble a seasoned expert: there is no need to have every piece of knowledge on the tip of their tongue at all times, but at precisely the right moment, they can call upon exactly the right knowledge to complete the delegated task with high quality.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>design</category>
      <category>llm</category>
      <category>performance</category>
    </item>
    <item>
      <title>Can Claude Directly Output Real Excel/PPT Files? Built-in Skills Tested</title>
      <dc:creator>MiaoShuYo</dc:creator>
      <pubDate>Thu, 26 Feb 2026 13:34:08 +0000</pubDate>
      <link>https://dev.to/miaoshuyo/can-claude-directly-output-real-excelppt-files-built-in-skills-tested-20l7</link>
      <guid>https://dev.to/miaoshuyo/can-claude-directly-output-real-excelppt-files-built-in-skills-tested-20l7</guid>
      <description>&lt;h1&gt;
  
  
  Can Claude Directly Output Real Excel/PPT Files? Built-in Skills Tested
&lt;/h1&gt;

&lt;p&gt;The first time many people saw Claude generate a directly openable &lt;code&gt;.xlsx&lt;/code&gt; file, the reaction was: "Is this a hallucination?" — after all, we've grown accustomed to large language models padding responses with code blocks or feeding us Markdown tables dressed up as Excel. But after Anthropic officially launched its built-in file Skills in early 2026, this became real: Claude can now generate genuine binary Office files directly, no code execution required, no middleware, just download and use.&lt;/p&gt;

&lt;p&gt;Think back to a year ago — the full workflow for getting a language model to produce an Excel report looked like this: ask it to write a Python script, figure out how to run it, deal with the dependency environment, discover the column widths were wrong, formulas had broken references, encoding issues appeared, then start a fresh debugging loop. The whole process took anywhere from half an hour to a full day, and had to be repeated every time. This experience drove many business users to give up entirely and go back to building spreadsheets by hand. The arrival of built-in file Skills compresses that entire chain into a single conversation.&lt;/p&gt;

&lt;p&gt;This article is based on hands-on testing. It breaks down each of the four file-generation Skills officially provided by Anthropic, analyzing their real-world performance, boundary conditions, and applicability in enterprise scenarios. The goal is to help you decide whether these Skills are genuinely worth integrating into your workflows, and in which situations they can deliver real productivity gains for your team.&lt;/p&gt;

&lt;h2&gt;1. What Are Built-in File Skills?&lt;/h2&gt;

&lt;p&gt;Built-in file Skills are a set of officially included capability modules released alongside Claude's agent platform. Users don't need to write any code or configure tool calls — simply reference the corresponding Skill name in a conversation or workflow, and Claude will invoke the underlying file-generation logic at execution time, outputting a real binary file ready for download.&lt;/p&gt;

&lt;p&gt;This is fundamentally different from the old model of "ask Claude to write Python code and then run it yourself." Built-in Skills encapsulate the file-generation execution layer on the platform side, elevating Claude's role from "providing a code solution" to "directly delivering a finished result." For business users without development skills, this is the critical leap from "can discuss" to "can actually use."&lt;/p&gt;

&lt;p&gt;Anthropic currently offers file-type Skills covering four formats: Excel workbooks, PowerPoint presentations, CSV data files, and PDF documents.&lt;/p&gt;

&lt;p&gt;From a product design perspective, there is a clear division of labor between built-in Skills and Claude's knowledge and reasoning capabilities. Claude is responsible for understanding requirements, building content structure, and deriving data logic; the Skill is responsible for transforming that content into binary output conforming to format specifications. This separation means Claude doesn't need to "understand the format," only to "understand the content" — which substantially reduces the probability of errors.&lt;/p&gt;

&lt;h3&gt;1.1 The Essential Difference Between Skills and Traditional Tool Calls&lt;/h3&gt;

&lt;p&gt;Traditional tool calls (Tool Use) require developers to pre-register functions, define schemas, and handle return values — a process that demands meaningful engineering investment. The design philosophy of Skills is to pre-package high-frequency, well-understood capabilities into "directly callable units." Developers or ordinary users simply declare "I need this capability," and the platform takes care of everything else.&lt;/p&gt;

&lt;p&gt;For file generation specifically, this kind of encapsulation is especially meaningful. Excel's &lt;code&gt;.xlsx&lt;/code&gt; format and PowerPoint's &lt;code&gt;.pptx&lt;/code&gt; format are both complex ZIP archives internally containing large amounts of XML and media resources — extremely error-prone to generate manually. Even experienced developers using libraries like &lt;code&gt;openpyxl&lt;/code&gt; or &lt;code&gt;python-pptx&lt;/code&gt; frequently run into format edge cases when generating moderately complex files. Built-in Skills completely shield users from this complexity, letting Claude focus entirely on content generation logic rather than file format handling.&lt;/p&gt;

&lt;p&gt;Another important difference is maintainability. When a tool call goes wrong, debugging typically requires tracing through schema definitions, return value parsing, and exception handling across multiple layers. When a built-in Skill produces unexpected output, the source of the problem is far more localized — usually an insufficiently precise description of the requirement, rather than a failure somewhere in the technical chain. For non-technical teams, this dramatically reduces the cognitive burden of using and troubleshooting the system.&lt;/p&gt;

&lt;h3&gt;1.2 How to Invoke File Skills&lt;/h3&gt;

&lt;p&gt;In Claude's agent environment, referencing a built-in file Skill is very straightforward. You can describe your requirements in natural language, and Claude will automatically invoke the appropriate Skill when it determines a file needs to be generated. Alternatively, when building automated workflows, you can explicitly declare the Skill by name. Both approaches share the same underlying execution logic — the only difference is the entry point of the human-computer interaction.&lt;/p&gt;

&lt;p&gt;For enterprise deployments, the more common pattern is embedding file Skills into specific nodes within business processes. For example, in the final step of a sales report automation workflow, explicitly invoking the Excel Skill to output an already-structured dataset as a standard report file, then distributing it via email or an internal system. This approach deeply integrates AI capability with business processes, rather than requiring employees to manually initiate a conversation each time.&lt;/p&gt;
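Such a workflow node might look like the sketch below. Note that invoke_skill, the skill identifier, and the node wiring are hypothetical stand-ins for whatever the actual platform exposes, not a real Anthropic API:

```python
# Hypothetical workflow node: explicitly declaring a file Skill at the final step.
# invoke_skill() is a stub standing in for the real platform call.
def invoke_skill(skill_name, payload):
    """Stub: pretend the platform generated a file and returned its path."""
    return f"/tmp/output/{payload['filename']}"  # hypothetical artifact path

def sales_report_node(dataset_rows):
    """Final node of a sales-report workflow: structured data in, .xlsx out."""
    artifact = invoke_skill(
        "excel_workbook",  # hypothetical Skill identifier
        {"filename": "weekly_sales.xlsx", "rows": dataset_rows},
    )
    return artifact  # a downstream node e-mails this file to stakeholders

print(sales_report_node([{"region": "EMEA", "total": 48_200}]))
```

The design point is that the Skill invocation is a declared step in the pipeline, triggered by the workflow engine rather than by an employee opening a chat window.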

&lt;h2&gt;2. Testing the Four File Skills&lt;/h2&gt;

&lt;p&gt;These four Skills each target different business scenarios. Key dimensions to assess during testing include: generation speed, format integrity, content quality, and responsiveness to complex requirements. Overall, Anthropic's built-in file Skills perform noticeably above expectations in baseline scenarios, but begin to show their limits when dealing with complex styling or multi-level data relationships. The test environment was the standard Claude agent interface; all tests described requirements in natural language without attaching template files or sample data.&lt;/p&gt;

&lt;h3&gt;2.1 Excel Workbook Generation Skill&lt;/h3&gt;

&lt;p&gt;The Excel Skill is the most mature of the four file Skills. In testing, describing a "financial report covering 12 months of sales data with summary formulas and conditional formatting" yielded a complete &lt;code&gt;.xlsx&lt;/code&gt; file in about 15 seconds. Upon opening it, cell formulas were fully functional, conditional formatting (highlighting negative values) rendered correctly, and the worksheet structure matched the description.&lt;/p&gt;

&lt;p&gt;This Skill excels at generating data-dense tables, particularly when you can clearly describe the data structure and calculation logic — output quality is highly reliable. In further stress testing, requesting "a product inventory tracking table with three worksheets cross-referencing each other, including VLOOKUP functions and a pivot table" also delivered correctly, with function references intact and no misalignments. This indicates the Skill has a sufficiently stable underlying implementation to handle cross-sheet references and other scenarios with high format accuracy requirements.&lt;/p&gt;

&lt;p&gt;Weak spots lie in complex embedded charts — currently generated charts are limited to basic types, and if you need combination charts (e.g., a mixed bar-line chart) or highly customized visual effects, the output will likely require manual adjustment. Additionally, when requirements include extensive format details (specific column widths, cell border styles, print area settings), the Skill's response accuracy drops. For the 80% of everyday reporting needs within most enterprises, however, this Skill is already production-ready.&lt;/p&gt;

&lt;h3&gt;2.2 PowerPoint Presentation Generation Skill&lt;/h3&gt;

&lt;p&gt;The PPT Skill delivers the biggest "wow moment" of the four Skills, but also the most noticeable gap between expectation and reality. When asked to generate "a 10-slide quarterly performance report including a cover, table of contents, data slides, and a summary," Claude produced a &lt;code&gt;.pptx&lt;/code&gt; file that was both structurally sound and content-complete — each slide had its own title, the layout was reasonably clean, and key data points were correctly distributed across the appropriate slides.&lt;/p&gt;

&lt;p&gt;The surprise is in the logical coherence of the content. Claude doesn't pile all the text onto a single slide — it genuinely understands the information-density constraints of a presentation format and breaks content into appropriately sized chunks. In the quarterly report generated during testing, the ratio of text volume on data slides to chart descriptions to narrative text fell within a reasonable range.&lt;/p&gt;

&lt;p&gt;However, the core competitive advantage of the PowerPoint format lies in visual expression — and that is precisely where the current Skill falls short. The generated slides tend toward conservative color schemes, typically white or light gray backgrounds with dark text; image placeholders cannot be automatically filled with real images, leaving only empty frames with descriptive labels; animations and transitions are all default, with no dynamic effects. If your goal is to use the output directly for external presentations, you'll likely need a round of visual polish. But if the goal is to quickly scaffold content structure, or to produce an initial draft for internal reporting, the efficiency advantage is substantial. A 10-slide presentation can be produced from nothing in under 30 seconds of waiting.&lt;/p&gt;

&lt;h3&gt;2.3 CSV Data File Generation Skill&lt;/h3&gt;

&lt;p&gt;The CSV Skill is the simplest of the four Skills in terms of logic, but also the most practically useful in data-driven workflows. Its core value isn't in replacing Excel — it's in providing clean, structured data input for downstream systems. When asked to generate "500 rows of simulated user behavior data with fields for user ID, timestamp, event type, device type, and region," the resulting CSV file was well-formed, had reasonable field distributions, was free of encoding issues, and imported into pandas without a single parsing error.&lt;/p&gt;
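A comparable dataset is easy to approximate locally with the standard library. This sketch mirrors the test request above; the field values and distributions are illustrative:

```python
# Generate simulated user-behavior rows like the ones requested in the test,
# then round-trip them through the csv module to confirm they parse cleanly.
import csv
import io
import random

random.seed(0)  # deterministic output for reproducibility
FIELDS = ["user_id", "timestamp", "event_type", "device_type", "region"]
EVENTS = ["click", "view", "purchase"]
DEVICES = ["ios", "android", "web"]
REGIONS = ["NA", "EU", "APAC"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
for i in range(500):
    writer.writerow({
        "user_id": f"u{i:04d}",
        "timestamp": f"2026-02-26T{random.randrange(24):02d}:00:00Z",
        "event_type": random.choice(EVENTS),
        "device_type": random.choice(DEVICES),
        "region": random.choice(REGIONS),
    })

rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
print(len(rows))           # 500
print(rows[0]["user_id"])  # u0000
```

The round trip through DictReader is the same sanity check the pandas import performed in the test: if the rows parse back without error, the file is structurally clean.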

&lt;p&gt;This Skill is especially well-suited for data teams that need to quickly generate test datasets, or for business analysts who want to export Claude's analytical conclusions directly into a format that can be loaded into BI tools. In another test scenario, asking Claude to "analyze the following sales data and output a CSV of quarterly totals by region" resulted in Claude completing the analysis and seamlessly invoking the CSV Skill to produce structured output — the whole process was fluid, requiring no user intervention in the intermediate steps.&lt;/p&gt;

&lt;p&gt;Compared to the Excel Skill, the CSV Skill executes faster and has a lower error rate, making it the lighter-weight choice for pure data-passing scenarios. Its limitation is the inability to carry any formatting information — formulas, colors, and comments are all lost. This is a characteristic of the CSV format itself, not a deficiency of the Skill. In scenarios where a pure data stream is clearly needed rather than a rich-format file, choosing the CSV Skill over Excel is the more pragmatic decision.&lt;/p&gt;

&lt;h3&gt;2.4 PDF Document Generation Skill&lt;/h3&gt;

&lt;p&gt;The PDF Skill is oriented toward delivering formal documents — contract drafts, report bodies, internal policy documents, and other scenarios requiring a fixed layout. PDFs generated in testing showed normal paragraph layout, clear font hierarchy, and proper headers and footers, with well-defined heading levels and reasonable body text spacing. For documents that need to be sent externally and where you don't want recipients making arbitrary edits, this format has an inherent advantage.&lt;/p&gt;

&lt;p&gt;In a simulated client report scenario, asking Claude to "generate a client-facing monthly report PDF based on the following project progress information" produced a file with a correct chapter structure, a project summary, and a plan for the following month — professional enough to serve as a genuine initial draft for client communication. The page header contained the document title and the footer contained page numbers, details that previously required dedicated document tools to handle.&lt;/p&gt;

&lt;p&gt;Compared to the other three Skills, the PDF Skill's limitation lies in the non-editable nature of the format itself — if any changes are needed, the user must either regenerate the file or open a dedicated PDF editing application. This Skill is therefore better suited to the end of a workflow, serving as the "final delivery" step rather than an intermediate collaborative document. A sensible arrangement in practice: use Word or Markdown for content collaboration, then invoke the PDF Skill to generate the delivery version once everything is finalized.&lt;/p&gt;

&lt;h2&gt;3. Enterprise Scenario Applicability Analysis&lt;/h2&gt;

&lt;p&gt;From a practical deployment perspective, built-in file Skills deliver the most value in scenarios where "the output format is highly standardized, but the content changes every time." Finance teams that need to produce fixed-format cost analysis spreadsheets every week, marketing teams that compile monthly performance reports, HR teams that output recruiting funnel data according to templates — these scenarios all share a common trait: the format is known, the content comes from context or external data, and manual assembly is time-consuming without being particularly skilled work. When that repetitive file-generation work is taken over by Skills, the people in those roles can shift their attention from "how to organize the format" to "what the data is telling us."&lt;/p&gt;

&lt;h3&gt;3.1 Amplified Value Through Integration with Data Systems&lt;/h3&gt;

&lt;p&gt;For medium-to-large enterprises, integrating file Skills with internal data systems is the key path to unlocking greater value. When Skills can directly read the latest data from CRM, ERP, or data warehouse systems and automatically generate corresponding report or presentation files, the entire chain upgrades from "AI-assisted writing" to "AI-driven automated reporting." This integration capability currently depends on MCP (Model Context Protocol) or custom tool configuration, but the baseline file-generation capability itself is already ready.&lt;/p&gt;

&lt;p&gt;A typical deployment looks like this: at the data processing node in a workflow, pull the latest business metrics via an MCP tool; then have Claude generate the data interpretation and narrative text; finally, use the Excel Skill or PDF Skill to package the results into standard-format output and automatically send it to the relevant stakeholders' inboxes. The entire process can be configured to trigger on a daily or weekly schedule, requiring no human intervention. This isn't a future vision — it's a solution that can be engineered and deployed today with the current technology stack.&lt;/p&gt;

&lt;h3&gt;3.2 Current Limitations and Applicable Boundaries&lt;/h3&gt;

&lt;p&gt;It's worth noting that these Skills still have clear bottlenecks when dealing with highly personalized visual requirements. Enterprise brand guidelines, specific template styles, and embedded company logos cannot be directly accommodated by the current built-in Skills — these still require more advanced custom Skills or post-processing steps to address. For external-facing output files with strict brand compliance requirements, built-in Skills are better treated as an intermediate content-generation artifact rather than the final deliverable.&lt;/p&gt;

&lt;p&gt;Another boundary that needs to be clearly understood is data security. Built-in Skills execute in the cloud, and data involved in the file generation process passes through the platform. For scenarios involving highly sensitive commercial information — unreleased financial data, customer personal information — you'll need to evaluate the company's data compliance policies and the platform's security agreements before use, confirming that cloud processing meets internal requirements. Anthropic provides enterprise-grade data processing agreements, but this remains something that must be confirmed with legal and security teams prior to deployment.&lt;/p&gt;

&lt;p&gt;Finally, when requirement descriptions are vague or contain many implicit assumptions, the output quality of Skills drops significantly. File Skills are fundamentally dependent on Claude's understanding of requirements, and the boundaries of that understanding are set by the precision of the description. Best practice is to clearly describe the expected file structure, data scope, and specific format requirements before invoking a file Skill, rather than relying on the AI to "guess" in order to fill gaps in the description.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Conclusion
&lt;/h2&gt;

&lt;p&gt;Anthropic's built-in file Skills mark the formal arrival of the phase where "AI directly delivers usable results." In baseline business scenarios, all four Skills — Excel, PowerPoint, CSV, and PDF — have reached production-ready maturity, and the Excel and CSV Skills are especially reliable in data-intensive scenarios. Combined with integration into internal data systems, these Skills already provide the technical foundation for building genuinely automated reporting pipelines.&lt;/p&gt;

&lt;p&gt;The current limitations — limited visual customization, occasional deviations in complex formatting, the need for compliance evaluation around sensitive data — are engineering problems that can be anticipated and mitigated at the workflow design stage, not fundamental capability gaps. For most enterprises, the time to seriously assess which internal reports and output documents can be handed off to file Skills is now, not when the technology becomes "more mature." Tools that are mature enough to be useful should be used today.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>news</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Skills vs Tools vs MCP vs Subagents vs Hooks: 2026 Ultimate Comparison</title>
      <dc:creator>MiaoShuYo</dc:creator>
      <pubDate>Thu, 26 Feb 2026 01:24:42 +0000</pubDate>
      <link>https://dev.to/miaoshuyo/skills-vs-tools-vs-mcp-vs-subagents-vs-hooks-2026-ultimate-comparison-kpi</link>
      <guid>https://dev.to/miaoshuyo/skills-vs-tools-vs-mcp-vs-subagents-vs-hooks-2026-ultimate-comparison-kpi</guid>
      <description>&lt;h1&gt;
  
  
  Skills vs Tools vs MCP vs Subagents vs Hooks: 2026 Ultimate Comparison
&lt;/h1&gt;

&lt;h2&gt;
  
  
  I. Introduction
&lt;/h2&gt;

&lt;p&gt;In 2026, as AI Agent technologies advance rapidly, concepts such as Skills, Tools, MCP, Subagents, and Hooks are becoming core keywords in enterprise intelligence transformation. These are not just different implementation approaches; they also show distinct strengths and trade-offs in scenario fit, token efficiency, maintenance cost, and composability. This article examines the essential differences among these five building blocks and provides a side-by-side comparison table to help readers make practical architecture choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  II. Core Concepts at a Glance
&lt;/h2&gt;

&lt;p&gt;Skills are the smallest capability units in an Agent system, emphasizing atomic design, composability, and reusability, which makes them suitable for flexible orchestration in complex workflows. Tools are closer to traditional API calls or plugin integrations, focusing on fast integration of single-purpose capabilities and fitting standardized, low-variance scenarios. MCP (Model Context Protocol), as a next-generation context protocol, is built for multi-model and multi-agent collaboration, significantly improving cross-system compatibility and token utilization. Subagents are "specialized workers" inside an Agent architecture that can independently handle subtasks, making them effective for multi-step reasoning and task decomposition. Hooks act as event-driven extension points, enabling fine-grained control over the Agent lifecycle and allowing custom behavior injection.&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Comparative Analysis
&lt;/h2&gt;

&lt;p&gt;The table below provides a side-by-side view of Skills, Tools, MCP, Subagents, and Hooks across key dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Skills&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;MCP&lt;/th&gt;
&lt;th&gt;Subagents&lt;/th&gt;
&lt;th&gt;Hooks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scenario Fit&lt;/td&gt;
&lt;td&gt;Complex workflows, flexible orchestration, multi-agent collaboration&lt;/td&gt;
&lt;td&gt;Standardized tasks, point capabilities, low-variance use cases&lt;/td&gt;
&lt;td&gt;Multi-model collaboration, cross-system integration&lt;/td&gt;
&lt;td&gt;Multi-step reasoning, task decomposition, parallel execution&lt;/td&gt;
&lt;td&gt;Event-driven extension, cross-cutting concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token Efficiency&lt;/td&gt;
&lt;td&gt;High; atomic design reduces redundancy&lt;/td&gt;
&lt;td&gt;Moderate; depends on invocation pattern&lt;/td&gt;
&lt;td&gt;Very high; context aggregation and distribution&lt;/td&gt;
&lt;td&gt;Moderate; can be optimized through task splitting&lt;/td&gt;
&lt;td&gt;Moderate; limited by event/context passing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance Cost&lt;/td&gt;
&lt;td&gt;Low long-term due to reuse&lt;/td&gt;
&lt;td&gt;Low initially; may fragment over time&lt;/td&gt;
&lt;td&gt;Medium; requires protocol expertise&lt;/td&gt;
&lt;td&gt;Low-to-medium; powerful but requires scheduling discipline&lt;/td&gt;
&lt;td&gt;Depends on granularity; can rise with complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composition Style&lt;/td&gt;
&lt;td&gt;Process-oriented, dynamic orchestration&lt;/td&gt;
&lt;td&gt;Point integration&lt;/td&gt;
&lt;td&gt;Protocol-level coordination across capabilities&lt;/td&gt;
&lt;td&gt;Workflow/state-oriented composition&lt;/td&gt;
&lt;td&gt;Event-driven, pluggable extension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key Advantage&lt;/td&gt;
&lt;td&gt;Flexible, scalable, highly reusable&lt;/td&gt;
&lt;td&gt;Fast integration, easy adoption&lt;/td&gt;
&lt;td&gt;Cross-platform collaboration with strong token efficiency&lt;/td&gt;
&lt;td&gt;Strong decomposition and parallelism&lt;/td&gt;
&lt;td&gt;High customizability and extension flexibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key Limitation&lt;/td&gt;
&lt;td&gt;Higher design bar; needs good decomposition&lt;/td&gt;
&lt;td&gt;Limited flexibility; repeated development risk&lt;/td&gt;
&lt;td&gt;Higher implementation threshold; needs standardization&lt;/td&gt;
&lt;td&gt;More complex resource scheduling and design&lt;/td&gt;
&lt;td&gt;Overuse can increase system complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In terms of scenario fit, Skills cover a wide range from simple tasks to complex business processes because of their atomic and composable nature, especially in enterprise contexts that require dynamic orchestration and multi-agent cooperation. Tools are more suitable for repetitive and standardized tasks, with low integration and maintenance overhead but limited flexibility. MCP removes barriers between models, agents, and external systems through protocol abstraction, improving both token efficiency and context sharing, which is ideal for large-scale multi-model collaboration. Subagents perform strongly in multi-step reasoning, decomposition, and parallel processing, though they impose higher requirements on system design and resource scheduling. Hooks provide event-level extensibility for logging, policy, observability, and other cross-cutting needs, but overuse can make systems harder to manage.&lt;/p&gt;

&lt;p&gt;Regarding token efficiency, Skills and MCP generally perform best. Skills reduce redundant context transfer through atomic design, while MCP maximizes reuse through protocol-level context aggregation and distribution. Tools and Hooks are usually moderate in efficiency, constrained by invocation patterns and context-passing mechanics. Subagents may increase token usage in parallel workloads, but with proper decomposition and context management, they can still achieve strong overall efficiency.&lt;/p&gt;

&lt;p&gt;From a maintenance perspective, Skills and Subagents can reduce long-term complexity through composability and reuse. Tools are easy to adopt at the beginning, but evolving requirements often lead to fragmentation and duplicate implementations. MCP lowers integration friction through standardization, yet demands a stronger understanding of protocol design and implementation. Hooks can either improve flexibility or increase burden depending on how granularly and consistently they are designed.&lt;/p&gt;

&lt;p&gt;For composition, Skills and Subagents naturally fit process-oriented and stateful multi-step tasks, supporting dynamic orchestration and extension. MCP serves as the protocol backbone, connecting Skills, Tools, and Subagents for cross-platform and cross-model collaboration. Tools and Hooks are better for incremental upgrades through point integrations and event-driven extensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. Conclusion
&lt;/h2&gt;

&lt;p&gt;In summary, Skills, Tools, MCP, Subagents, and Hooks each have distinct strengths and are suitable for different technical and business needs. To maximize Agent-system value, organizations should choose and combine these capabilities based on workflow complexity, system scale, and future extensibility requirements. As the ecosystem continues to evolve, boundaries among these components are likely to become more fluid, and collaborative architecture will be the dominant direction.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Understanding Anthropic Agent Skills: From SKILL.md to Real File Generation</title>
      <dc:creator>MiaoShuYo</dc:creator>
      <pubDate>Wed, 25 Feb 2026 07:16:41 +0000</pubDate>
      <link>https://dev.to/miaoshuyo/understanding-anthropic-agent-skills-from-skillmd-to-real-file-generation-31ck</link>
      <guid>https://dev.to/miaoshuyo/understanding-anthropic-agent-skills-from-skillmd-to-real-file-generation-31ck</guid>
      <description>&lt;h1&gt;
  
  
  Understanding Anthropic Agent Skills: From SKILL.md to Real File Generation
&lt;/h1&gt;

&lt;p&gt;In early 2026, when Anthropic launched the Skills framework, it brought with it more than a new programming convention — it introduced an entirely new model of human-machine collaboration. The core carrier of Skills is a file called SKILL.md, which codifies "how to complete a task" in a structured, persistent form. Instead of relying on temporary prompts at runtime, an Agent draws its behavioral guidance from a durable, reusable capability declaration. This article starts from first principles: it unpacks the file structure of Skills, the three-level progressive disclosure mechanism, and how Skills actually generate real files like Excel spreadsheets and PowerPoint presentations, closing with the emerging open standard at agentskills.io.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. What Are Agent Skills?
&lt;/h2&gt;

&lt;p&gt;Skills are the capability encapsulation units Anthropic designed for Agent systems. They address a fundamental question: when we want an Agent to complete a certain class of tasks reliably and predictably, what form should that "knowledge" take? Prompts bury this knowledge inside a system message that gets reconstructed on every call. Skills elevate it into an independent, structured engineering artifact.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 The Essence of a Skill: A Declarative Task Contract
&lt;/h3&gt;

&lt;p&gt;A Skill is, at its core, a declarative contract. It specifies: what this Skill can do (capability boundary), what must be in place before it can be invoked (preconditions), which tools it needs to call during execution (tool dependencies), and what form its output takes (output specification). These four dimensions together define the complete semantic identity of a Skill — and they mark the sharpest distinction from a conventional Prompt. A Prompt is a request to a model; a Skill is a definition of a task.&lt;/p&gt;

&lt;p&gt;This declarative design gives Skills genuine testability. An engineer can write explicit acceptance criteria for a Skill: given input A, with precondition B satisfied, the output should be in format C, and tool D should have been called. In the Prompt era this was nearly unachievable, because the execution path of a Prompt depends entirely on the model's reasoning within a specific context — hard to predict, harder to reproduce.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 SKILL.md: The Carrier and Entry Point
&lt;/h3&gt;

&lt;p&gt;SKILL.md is the Markdown file that holds a Skill's definition, and it is the first file an Agent reads when loading a capability. Anthropic's choice of Markdown over YAML or JSON was deliberate: SKILL.md must be both machine-parseable and human-readable. Markdown strikes the best balance between the two.&lt;/p&gt;

&lt;p&gt;A typical SKILL.md opens with a metadata block declaring the Skill's name, version, and a brief description. This is followed by a preconditions block describing what context or permissions must be in place before the Skill is invoked. Next comes a capability description section — a natural-language explanation of the Skill's intent and scope for the Agent to internalize. After that, an execution steps block walks through how the Agent should progress through the task. Finally, a tools section lists each dependency tool and the circumstances under which it should be called. This structure is not arbitrary; it is designed to align precisely with the three-level disclosure mechanism described in the next chapter.&lt;/p&gt;
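&lt;p&gt;The block sequence described above can be made concrete with a small sketch. The heading names and layout here are assumptions based on this article's description, not an official schema; the parser simply splits a SKILL.md body into named sections by its second-level headings.&lt;/p&gt;

```python
# Hypothetical SKILL.md layout following the structure described in the text.
SKILL_MD = """\
# excel-report
version: 1.0
description: Generates a structured Excel report.

## Preconditions
- A structured data source must be provided.

## Steps
1. Clean the input data.
2. Compute derived metrics.

## Tools
- data-cleaner
- excel-builder
"""

def split_sections(text):
    """Split a SKILL.md body into {heading: body} keyed by '## ' headings;
    everything before the first heading lands under 'metadata'."""
    sections, current, lines = {}, "metadata", []
    for line in text.splitlines():
        if line.startswith("## "):
            sections[current] = "\n".join(lines).strip()
            current, lines = line[3:].strip().lower(), []
        else:
            lines.append(line)
    sections[current] = "\n".join(lines).strip()
    return sections

sections = split_sections(SKILL_MD)
```

&lt;p&gt;An Agent runtime would do something similar at load time: read the metadata block first, then pull in the remaining sections on demand.&lt;/p&gt;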

&lt;h2&gt;
  
  
  II. The Skills Folder Structure
&lt;/h2&gt;

&lt;p&gt;A Skill does not stand alone. It lives within a set of agreed-upon directory conventions that determine how an Agent discovers available capabilities, loads associated resources, and coordinates across multiple Skills. Understanding the folder structure is the first practical step toward using Skills effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Standard Directory Layout
&lt;/h3&gt;

&lt;p&gt;A complete Skills project typically follows this directory convention: a &lt;code&gt;skills/&lt;/code&gt; folder at the project root, in which each subdirectory represents one Skill, with the directory name serving as the Skill's identifier. Inside each Skill directory, &lt;code&gt;SKILL.md&lt;/code&gt; is the mandatory entry file. The &lt;code&gt;examples/&lt;/code&gt; subdirectory holds input/output samples; &lt;code&gt;schemas/&lt;/code&gt; holds JSON Schema definitions for output data; &lt;code&gt;assets/&lt;/code&gt; holds static resources the Skill may reference at runtime, such as template files and configuration files.&lt;/p&gt;

&lt;p&gt;The strength of this structure lies in its self-documenting nature. When a new team member opens the &lt;code&gt;skills/&lt;/code&gt; directory, each subdirectory name tells them what capabilities exist. Opening any SKILL.md gives them a complete picture of what that Skill does and how it runs within minutes. Compared to Prompt documents scattered across wikis and chat histories, this layout comes with its own navigation built in.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Core Field Breakdown of SKILL.md
&lt;/h3&gt;

&lt;p&gt;Inside a concrete SKILL.md, several fields deserve particular attention. The &lt;code&gt;level&lt;/code&gt; field controls the disclosure tier of this Skill (detailed in the next chapter). The &lt;code&gt;preconditions&lt;/code&gt; field describes invocation prerequisites as a structured list. The &lt;code&gt;steps&lt;/code&gt; field defines each phase of the execution flow. The &lt;code&gt;tools&lt;/code&gt; field declares each dependency and its purpose. The &lt;code&gt;output&lt;/code&gt; field specifies the type and format of the artifact produced.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;output&lt;/code&gt; field is frequently underestimated in practice. It can declare output as plain text, but it can also declare output as a binary file of a specific MIME type — and this is the key that enables Skills to actually generate Excel files and PowerPoint presentations. When an Agent reads a declaration like &lt;code&gt;output.type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet&lt;/code&gt;, it does not produce a textual description of a spreadsheet; it calls the corresponding file rendering engine to produce a real binary file.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Multi-Skill Collaboration and Dependency Declaration
&lt;/h3&gt;

&lt;p&gt;A single Skill typically handles one specific job. Complex business workflows require multiple Skills working in concert. To support this, SKILL.md provides a &lt;code&gt;depends_on&lt;/code&gt; field that explicitly declares which upstream Skills' outputs the current Skill relies on. This explicit dependency graph allows the Agent's scheduler to automatically construct an execution DAG, activating each Skill in topological order rather than leaving the model to infer call sequences on its own.&lt;/p&gt;

&lt;p&gt;This mechanism is especially valuable in enterprise settings. A "generate monthly financial report" Skill can declare dependencies on "data fetch", "anomaly detection", and "chart generation" sub-Skills. The scheduler determines whether to run them in parallel or in sequence based on the dependency graph, then passes all their outputs to the report generation Skill for final assembly. The entire flow is fully traceable: every step has explicit inputs and outputs, and when something goes wrong, the failure can be pinpointed to a specific Skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Three-Level Progressive Disclosure and Preset Skills
&lt;/h2&gt;

&lt;p&gt;The most elegant engineering in the Skills framework centers on two mutually reinforcing mechanisms: three-level progressive disclosure and preset Skills. The former addresses how to keep a model focused and efficient throughout task execution; the latter addresses what concrete artifacts an Agent can actually produce. If the folder structure is the skeleton of Skills, these two mechanisms are the muscle that make it move. Understanding them is what explains why Skills outperform traditional Prompt strategies in production engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Three-Level Disclosure: Letting the Model See Only What It Needs
&lt;/h3&gt;

&lt;p&gt;Progressive Disclosure is the core principle governing how a Skill's information is loaded. It divides the full content of a Skill into three tiers keyed to when they are needed, and the Agent reads each tier on demand as the task progresses — rather than pushing everything into context at the moment the task begins.&lt;/p&gt;

&lt;p&gt;The first tier is the &lt;strong&gt;Summary Level&lt;/strong&gt;. This tier contains only the most condensed information: the Skill's name, a single-sentence function description, the trigger preconditions, and a brief summary of the output type. The entire summary layer typically fits within 100 tokens. When an Agent receives a user's intent, it first scans the summary layers of all registered Skills to determine which ones are relevant and whether their preconditions are met, before deciding whether to proceed with activation. This is the equivalent of a human expert scanning a table of contents rather than reading every chapter cover to cover. In an enterprise Agent system with dozens of registered Skills, the low token cost of the summary tier means that the capability-discovery phase introduces almost no additional overhead.&lt;/p&gt;

&lt;p&gt;The second tier is the &lt;strong&gt;Execution Level&lt;/strong&gt;. Only after the Agent confirms it is activating a Skill does it load the execution tier. The execution tier contains the full step-by-step instructions, the tool call sequence and parameter specifications, how intermediate state is passed between steps, and a complete description of the normal execution path. This is the main body of a Skill — it carries all the core logic needed to advance the task. Its length varies with Skill complexity: a simple Skill might have three to five steps; a complex enterprise Skill might contain a dozen conditional execution phases. Because the execution tier is only loaded when a task genuinely needs to run, it avoids the context pollution that comes from the traditional approach of stuffing every Skill's complete documentation into the system prompt up front.&lt;/p&gt;

&lt;p&gt;The third tier is the &lt;strong&gt;Detail Level&lt;/strong&gt;. The detail tier extends the execution tier with typical input/output examples, strategies for handling edge cases, detailed logic for exception branches, and guidance for resolving ambiguous situations. It does not load automatically when a Skill is activated. Instead, it is pulled in only when specific triggers arise during execution — for example, when the Agent detects that the current input deviates from the normal pattern and references the edge-case examples in the detail tier, or when a tool call returns an unexpected result and the Agent consults the detail tier's exception handling logic.&lt;/p&gt;

&lt;p&gt;The significance of this three-tier structure goes far beyond token savings. The deeper value lies in ensuring the model has precisely the right amount of context at each decision point — enough to avoid hallucination, not so much that attention is diluted. Empirical measurements show that, at equivalent task complexity, Skills using three-level progressive disclosure reduce average token consumption by approximately 40% compared to their full-content single-prompt equivalents, while improving task completion accuracy by roughly 15–20%. This gap widens further as task chains grow longer and the number of registered Skills increases.&lt;/p&gt;

&lt;p&gt;From an engineer's perspective, the three-tier structure brings an additional benefit: it naturally guides you to prioritize information as you write a Skill. What must be known to decide whether to do this at all (Summary)? What must be followed to do it correctly (Execution)? What is only needed when something unusual occurs (Detail)? These three questions force the author to organize task knowledge in a structured way — and that act of organization is itself a forcing function for Skill quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Preset Skills: From Declaration to Real File Generation
&lt;/h3&gt;

&lt;p&gt;Preset Skills are a collection of out-of-the-box standard capabilities that Anthropic ships alongside the Skills framework. They cover the highest-frequency enterprise task categories: structured data processing, Office document generation, code analysis and repair, multilingual translation and localization, and web content extraction and summarization. These preset Skills are not simple Prompt templates; they are deeply integrated with the underlying file rendering engine, built by Anthropic's engineering team to production standards.&lt;/p&gt;

&lt;p&gt;The most transformative of these is the real Office document generation capability. Before the Skills framework, asking an Agent to "generate an Excel report" meant the model would output a Markdown-formatted table, or hand back a Python script for you to run yourself — the actual file never arrived without human intervention. Skills' preset document generation capability changes this completely. When a Skill's &lt;code&gt;output&lt;/code&gt; field declares the artifact type as &lt;code&gt;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet&lt;/code&gt;, the Agent does not write a textual description. It calls the underlying file rendering engine directly and produces a real, openable .xlsx binary file.&lt;/p&gt;

&lt;p&gt;A concrete financial analysis scenario illustrates the full process clearly. A user uploads a quarterly sales CSV and activates the preset financial analysis Skill. The Skill's execution tier receives the data and first calls a data-cleaning tool to remove outliers and duplicate rows, storing the cleaned structured data as intermediate state. Next it calls a statistical computation tool to produce derived metrics — month-over-month growth rates, year-over-year changes, and top-N products. Then it calls a chart rendering tool, which generates the vector data for line charts and bar charts based on the &lt;code&gt;output.charts&lt;/code&gt; configuration in the Skill. Finally, it calls the Excel builder to assemble all the data, derived metrics, and charts into a .xlsx file with three worksheets, conditional formatting highlights, and a pivot table, and returns the finished file directly to the user. The entire process requires no code from the user; what they receive is a finished report ready to send.&lt;/p&gt;
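&lt;p&gt;The derived-metrics step in that pipeline — month-over-month growth — is simple enough to sketch with the standard library alone; inside the Skill, this work would happen in its statistical computation tool. The figures below are made-up sample data.&lt;/p&gt;

```python
# Illustrative sample data, not real figures.
monthly_sales = {"Jan": 100_000, "Feb": 120_000, "Mar": 150_000}

def mom_growth(series):
    """Month-over-month growth in percent, keyed by the later month."""
    months = list(series)
    return {
        months[i]: round((series[months[i]] / series[months[i - 1]] - 1) * 100, 1)
        for i in range(1, len(months))
    }

growth = mom_growth(monthly_sales)  # {"Feb": 20.0, "Mar": 25.0}
```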

&lt;p&gt;PowerPoint generation follows a similar path with one added layer: template matching. The preset presentation Skill maintains a template library covering a range of slide master styles — formal business, data reporting, brainstorming, and more. The Skill automatically selects the appropriate template based on the task's content type, or allows the user to specify one explicitly. It then fills the analysis text, charts, and key figures into the template's predefined placeholder regions, producing a .pptx file with complete structure and professional layout. As with Excel, the word "generate" here means a real binary file output — not Markdown pseudo-code or a verbal description.&lt;/p&gt;
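&lt;p&gt;The template-matching step can be illustrated with a keyword-overlap sketch: score each slide master style against the task's content, pick the best match, then fill named placeholders. The template names, keywords, and placeholder scheme here are all assumptions for illustration — the actual Skill's template library and matching logic are not public.&lt;/p&gt;

```python
# Hypothetical template library; names and keywords are illustrative only.
TEMPLATES = {
    "data-report": {"keywords": {"revenue", "metrics", "quarterly"}},
    "formal-business": {"keywords": {"proposal", "contract"}},
}

def pick_template(content_words):
    """Choose the template whose keyword set overlaps the content most."""
    scores = {name: len(t["keywords"] & content_words)
              for name, t in TEMPLATES.items()}
    return max(scores, key=scores.get)

def fill(template_name, placeholders):
    """Stand-in for filling a template's predefined placeholder regions."""
    return {"template": template_name, **placeholders}

deck = fill(pick_template({"quarterly", "revenue", "summary"}),
            {"title": "Q1 Review", "chart": "<vector data>"})
```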

&lt;p&gt;The preset Skills also provide a reference implementation for progressive disclosure. The summary tier of the Excel generation Skill is just two lines: &lt;code&gt;Generates a structured Excel report with chart and conditional formatting support; Precondition: a structured data source must be provided.&lt;/code&gt; In the vast majority of cases, the Agent at the discovery phase consumes fewer than 30 tokens from this summary. Only when a user's request actually triggers the Skill does the full execution tier load. If the data contains unusual null patterns or date format ambiguities, the detail tier's edge-case handling strategies are pulled in at that point. The three-tier structure reaches its fullest expression in the preset Skills, and they serve as a directly referenceable template for teams building their own custom Skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 The Open Standard: agentskills.io
&lt;/h3&gt;

&lt;p&gt;The long-term value of the Skills framework depends on whether it can transcend Anthropic as a single company and become a shared open standard for the entire Agent industry. If SKILL.md remained a proprietary Claude format, enterprises running Agents on other platforms would be unable to reuse their capability assets directly, and the portability advantage of Skills would be substantially diminished. This reasoning led Anthropic to co-found the agentskills.io initiative in late 2025, together with several major Agent platform partners. The goal is to standardize the core SKILL.md specification and catalyze the formation of an industry-wide common format analogous to what OpenAPI is for REST APIs.&lt;/p&gt;

&lt;p&gt;The core work of the agentskills.io specification is to standardize Skills across three planes: file format, capability discovery, and execution semantics. The file format plane standardizes field naming conventions, hierarchical structure, and data type constraints in SKILL.md, ensuring that parsers across different platforms consistently interpret the same Skill file. The capability discovery plane standardizes how an Agent runtime scans, indexes, and filters available Skills from a Skills directory — including the maximum token budget for summary tiers, the expression syntax for preconditions, and strategies for resolving Skill version conflicts. The execution semantics plane standardizes the content structure of execution and detail tiers, covering step definition syntax, tool call parameter passing formats, and the lifecycle management of intermediate state across multi-step executions.&lt;/p&gt;

&lt;p&gt;As of February 2026, agentskills.io has published the v0.9 specification draft and entered a public review period. Several major Agent development frameworks have shipped compatibility with the v0.9 draft in their latest releases, meaning a Skill written to the agentskills.io spec can run on these frameworks without modification. For enterprise users, this is a signal worth watching closely: Skills assets built today will carry genuine portability into the future, and will not be rendered worthless by a change in vendor.&lt;/p&gt;

&lt;p&gt;That said, open standards never land overnight. The agentskills.io specification is still in draft form, with v1.0 scheduled for the second half of 2026. Until then, specification details may continue to shift, and enterprises building large-scale Skills systems on top of agentskills.io should track compatibility changes as the spec evolves. Even so, the trajectory toward standardization is established — and that is precisely what sets the Skills framework apart from the many Agent frameworks of the past that came and went without leaving a durable ecosystem behind.&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. Summary
&lt;/h2&gt;

&lt;p&gt;Agent Skills are not an incremental improvement on Prompts. They are a fundamental reconstruction of how capabilities are expressed and managed. SKILL.md turns scattered intent descriptions into structured task contracts. The standard folder convention makes a capability library navigable and maintainable. Three-level progressive disclosure finds the engineering optimum between accuracy and token efficiency. Preset Skills transform "real file generation" from a distant aspiration into a ready-to-use capability. And agentskills.io signals the formation of a cross-platform, vendor-neutral capability-sharing ecosystem. If you are serious about building production-grade Agent systems, starting with a SKILL.md is the highest-return first step you can take.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>automation</category>
    </item>
    <item>
      <title>The Age of Skills Has Begun: Why Prompts Are Fading Fast in 2026</title>
      <dc:creator>MiaoShuYo</dc:creator>
      <pubDate>Wed, 25 Feb 2026 06:56:55 +0000</pubDate>
      <link>https://dev.to/miaoshuyo/the-age-of-skills-has-begun-why-prompts-are-fading-fast-in-2026-2e3f</link>
      <guid>https://dev.to/miaoshuyo/the-age-of-skills-has-begun-why-prompts-are-fading-fast-in-2026-2e3f</guid>
      <description>&lt;h1&gt;
  
  
  The Age of Skills Has Begun: Why Prompts Are Fading Fast in 2026
&lt;/h1&gt;

&lt;p&gt;In early 2026, Anthropic officially launched the Skills framework centered around SKILL.md. This was not a minor update — it was a paradigm-level shift. Before this, almost everyone was solving problems by "writing better Prompts." Now, more and more engineers and product teams are coming to realize that the Prompt itself is the bottleneck. Skills didn't arrive to patch Prompts; they arrived to replace them.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. The Three Fatal Flaws of Prompts
&lt;/h2&gt;

&lt;p&gt;Prompts were once the primary way to control large language models. They are lightweight, intuitive, and low-barrier — anyone can write a few sentences in natural language to tell a model "what to do." But as use cases grew more complex, teams grew larger, and task chains grew longer, three fundamental flaws in the Prompt paradigm began to surface all at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 Context Bloat and Information Pollution
&lt;/h3&gt;

&lt;p&gt;As business requirements scale up, Prompts tend to grow longer and longer. To help a model understand the background, follow rules, and produce a specific format, engineers are forced to stuff large amounts of instructions into every call. The cost is steep: the context window gets crowded with "explanatory text," while the density of truly useful information drops. Worse, overly long system prompts frequently cause "attention drift" — key constraints mentioned early on get gradually forgotten during later reasoning steps, leading to unstable outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 Poor Reusability and Maintenance Nightmares
&lt;/h3&gt;

&lt;p&gt;A carefully tuned Prompt is almost naturally locked to one specific use case. Whenever you need to reuse it in a different product, a different model, or a different language context, you typically have to start from scratch. Team collaboration makes things worse — different people write Prompts in completely different styles, making them hard to merge, review, or version-control. In many organizations, Prompts end up scattered like sticky notes across the codebase, with no reliable way to track which version is current or which one is actually running in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.3 Black-box Behavior and Unpredictable Boundaries
&lt;/h3&gt;

&lt;p&gt;The execution logic of a Prompt depends entirely on the model's internal reasoning process, with no explicit structural constraints. You cannot precisely control at which step the model calls a tool, under what condition it stops, or which branch it takes when facing ambiguity. This "trust the model to figure it out" approach may be acceptable in low-risk scenarios, but once you enter domains like finance, law, or healthcare with strict compliance requirements, the unpredictability of black-box behavior becomes a genuine business risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  II. The Four Core Advantages of Skills
&lt;/h2&gt;

&lt;p&gt;Skills did not emerge from thin air. They are a direct response to the three pain points above, while also introducing a new design philosophy: elevating the knowledge of "how to complete a task" from scattered natural language descriptions into structured, manageable, executable capability units. Each Skill is essentially a declarative task specification — telling an Agent under what preconditions to act, what steps to follow, which tools to invoke, and what output to produce.&lt;/p&gt;
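&lt;p&gt;As a rough illustration of what such a declarative specification can look like — this sketch follows the published SKILL.md convention of YAML frontmatter plus markdown instructions, but the specific skill, file names, and section headings shown here are hypothetical:&lt;/p&gt;

```markdown
---
name: invoice-summarizer
description: Summarize uploaded invoice PDFs into a fixed CSV schema. Use when the user asks for an invoice report.
---

# Invoice Summarizer

## Preconditions
- An invoice PDF has been uploaded to the workspace.

## Steps
1. Extract line items from the PDF.
2. Validate the extracted totals against the stated invoice sum.
3. Emit `summary.csv` with columns: vendor, date, total.

## Output
A `summary.csv` file plus a one-paragraph natural-language summary.
```

&lt;p&gt;Note how each part maps to the contract described above: preconditions gate when the skill may act, the steps fix the order of execution, and the output section pins down what the Agent must produce.&lt;/p&gt;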

&lt;h3&gt;
  
  
  2.1 Progressive Disclosure: Reveal on Demand, Not All at Once
&lt;/h3&gt;

&lt;p&gt;Progressive Disclosure is the most important design principle behind Skills. The traditional Prompt approach front-loads everything — all rules and context are crammed into the system prompt at the start, and the model must absorb thousands of words of instructions simultaneously. Skills work differently: throughout task execution, the model is only exposed to the information relevant to the current phase. The initial stage provides only the task goal and preconditions; specific operational rules are introduced only when entering a sub-step; exception-handling logic is loaded only when an edge case is encountered. This mechanism dramatically reduces noise from irrelevant context, keeping the model sharply focused at every decision point.&lt;/p&gt;
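&lt;p&gt;A minimal sketch of this loading pattern — purely illustrative, not the real Skills runtime: at startup only a one-line description per skill enters the context, and the full instruction body is injected only when a skill is actually triggered. The skill names and bodies below are invented for the example:&lt;/p&gt;

```python
# Illustrative sketch of progressive disclosure (not the real runtime):
# phase 1 exposes only lightweight metadata; phase 2 loads the full
# instruction body on demand for the one skill that was selected.

SKILLS = {
    "pdf-report": {
        "description": "Generate a PDF report from tabular data.",
        "body": "Step 1: validate input columns...\nStep 2: render template...",
    },
    "data-cleaning": {
        "description": "Normalize and deduplicate raw CSV input.",
        "body": "Step 1: strip whitespace...\nStep 2: drop duplicate rows...",
    },
}

def startup_context() -> str:
    """Phase 1: the model sees only one line per available skill."""
    return "\n".join(
        f"- {name}: {meta['description']}" for name, meta in SKILLS.items()
    )

def load_skill(name: str) -> str:
    """Phase 2: the full body is injected only for the chosen skill."""
    return SKILLS[name]["body"]

print(startup_context())
print(load_skill("data-cleaning"))
```

&lt;p&gt;The payoff is that the startup context stays small and constant no matter how many skills are installed; token cost grows only with the skills a task actually uses.&lt;/p&gt;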

&lt;h3&gt;
  
  
  2.2 Composability: Snap Together Like Building Blocks
&lt;/h3&gt;

&lt;p&gt;Skills are natively composable. A "data cleaning" Skill can be invoked by a "financial analysis" Skill, which in turn can be called by a "monthly report generation" Skill, forming a clear hierarchy of capabilities. This composability not only makes code reuse straightforward — more importantly, it forces engineers to decompose tasks with a modular mindset: each Skill does one thing, and does it well. By contrast, an all-in-one long Prompt easily becomes a "capability monolith," where changing anything risks breaking everything else.&lt;/p&gt;
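&lt;p&gt;The layering described above can be sketched in a few lines of Python — each function stands in for one skill with a single responsibility, and the higher-level skills reuse the lower-level ones (the skill names mirror the example in the paragraph; the logic is invented for illustration):&lt;/p&gt;

```python
# Hypothetical sketch of skill composition: each "skill" does one thing,
# and higher-level skills are built by calling lower-level ones.

def clean_data(rows):
    """'data cleaning' skill: drop empty entries and strip whitespace."""
    return [r.strip() for r in rows if r.strip()]

def financial_analysis(rows):
    """'financial analysis' skill: reuses clean_data, then totals amounts."""
    return sum(float(r) for r in clean_data(rows))

def monthly_report(rows):
    """'monthly report generation' skill: reuses financial_analysis."""
    return f"Monthly total: {financial_analysis(rows):.2f}"

print(monthly_report(["  10.5", "", "4.5 ", "  "]))  # → Monthly total: 15.00
```

&lt;p&gt;Fixing a bug in &lt;code&gt;clean_data&lt;/code&gt; fixes every skill that depends on it — exactly the property an all-in-one long Prompt cannot offer.&lt;/p&gt;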

&lt;h3&gt;
  
  
  2.3 Cross-platform: Define Once, Run Anywhere
&lt;/h3&gt;

&lt;p&gt;A well-written SKILL.md file can be used directly across Claude Desktop, API calls, and enterprise private deployments — no re-adaptation required for each platform. Going further, as the Skills standard becomes more open, Agent platforms from different vendors can theoretically read and execute the same Skills definition. This means the Skills assets an organization builds up carry genuine portability, free from lock-in to any single platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.4 Programmability: From "Hoping the Model Understands" to Explicit Control
&lt;/h3&gt;

&lt;p&gt;Skills allow engineers to define a task's preconditions, execution steps, tool invocation timing, and output schema in a structured, explicit way. This explicit structure moves core control logic out of the model's black-box reasoning and turns it into readable, auditable, and testable engineering artifacts. You no longer need to hope that the model "happened to understand your intent" — instead, you tell it clearly through structured declarations: "you must follow this process."&lt;/p&gt;
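&lt;p&gt;To make the contrast concrete, here is a toy sketch of what "explicit control" can mean in code — a task declared as preconditions, ordered steps, and a required output schema, checked by the runtime rather than trusted to the model. This is an invented illustration, not any vendor's actual Skills API:&lt;/p&gt;

```python
# Illustrative sketch (not a real Skills runtime): control logic lives in
# an auditable data structure instead of the model's black-box reasoning.

from dataclasses import dataclass
from typing import Callable

@dataclass
class SkillSpec:
    precondition: Callable[[dict], bool]   # gate: may the skill run at all?
    steps: list                            # ordered step functions
    required_output_keys: set              # output schema to enforce

    def run(self, ctx: dict) -> dict:
        if not self.precondition(ctx):
            raise ValueError("precondition not met; refusing to run")
        for step in self.steps:            # steps execute in declared order
            ctx = step(ctx)
        missing = self.required_output_keys - ctx.keys()
        if missing:
            raise ValueError(f"output schema violated, missing: {missing}")
        return ctx

spec = SkillSpec(
    precondition=lambda ctx: "amount" in ctx,
    steps=[
        lambda ctx: {**ctx, "tax": ctx["amount"] * 0.1},
        lambda ctx: {**ctx, "total": ctx["amount"] + ctx["tax"]},
    ],
    required_output_keys={"total"},
)
print(spec.run({"amount": 100.0}))  # → {'amount': 100.0, 'tax': 10.0, 'total': 110.0}
```

&lt;p&gt;Every decision point — whether to run, in what order, what must come out — is readable and testable, which is precisely what "explicit control" buys you over hoping the model inferred your intent.&lt;/p&gt;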

&lt;h2&gt;
  
  
  III. The Starting Line of a New Paradigm
&lt;/h2&gt;

&lt;p&gt;The shift from Prompts to Skills is not merely a change in writing style — it changes the underlying logic of human-AI collaboration. In the era of Prompts, humans were &lt;em&gt;persuading&lt;/em&gt; models. In the era of Skills, humans are &lt;em&gt;writing behavioral specifications&lt;/em&gt; for models. The former relies on linguistic skill and accumulated intuition; the latter relies on engineering design and systems thinking.&lt;/p&gt;

&lt;p&gt;This does not mean Prompts will disappear entirely. For exploratory experiments, rapid prototyping, and one-off tasks, Prompts remain the fastest tool available. But for production-grade Agent systems that need to run reliably, iterate continuously, and support team collaboration, Skills have already become the de facto best choice.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Prompts&lt;/th&gt;
&lt;th&gt;Skills&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reusability&lt;/td&gt;
&lt;td&gt;Low — heavily context-dependent&lt;/td&gt;
&lt;td&gt;High — modular and composable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintainability&lt;/td&gt;
&lt;td&gt;Poor — hard to version-control&lt;/td&gt;
&lt;td&gt;Good — file-based and trackable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution Determinism&lt;/td&gt;
&lt;td&gt;Low — relies on model interpretation&lt;/td&gt;
&lt;td&gt;High — structurally declared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-platform Capability&lt;/td&gt;
&lt;td&gt;Weak — tightly platform-bound&lt;/td&gt;
&lt;td&gt;Strong — standards-based and portable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team Collaboration&lt;/td&gt;
&lt;td&gt;Difficult — inconsistent styles&lt;/td&gt;
&lt;td&gt;Friendly — supports Code Review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;2026 marks the official opening of the Skills era. Teams still relying on stacked Prompts to handle complex business workflows are quietly accumulating technical debt. Organizations that build their Skills infrastructure early are building a moat that competitors will struggle to replicate. In the series ahead, we'll go from concept to hands-on practice, unpacking every core aspect of Skills in full.&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. Summary
&lt;/h2&gt;

&lt;p&gt;Prompts are conversations. Skills are contracts. The former is flexible but fragile; the latter is disciplined but reliable. As business scale grows, team collaboration deepens, and compliance requirements tighten, the structural weaknesses of Prompts surface one by one — and Skills are the engineering-grade answer built precisely for that moment. Take a fresh look at those long Prompts you've been endlessly tuning: which parts of them deserve to be promoted into a real Skill?&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
