Progressive Disclosure Is the Soul of Skills
Many people, when first encountering Skills, instinctively treat "the more detail, the better" as a golden rule — stuffing every piece of background context, business rule, example, and edge case into one enormous system prompt. The results are predictable: the model becomes sluggish, responses drift off-topic, token bills double, and accuracy falls rather than rises. The true ceiling of a Skill's quality is not how much information it contains, but when and how that information is delivered. This underlying design philosophy is called Progressive Disclosure.
I. Why Monolithic Prompts Are Doomed
This section answers a counterintuitive question: why does giving the model more information actually make it perform worse? Understanding this is the prerequisite for appreciating what Progressive Disclosure offers.
1.1 Attention Dilution in the Context Window
Large language models do not store every piece of information in the context window with equal fidelity, the way a database would. Research consistently shows that a model's attention is strongly position-biased — content near the beginning and end of the context carries disproportionately high weight, while long passages buried in the middle are often diluted or effectively ignored. This phenomenon is known in the research community as the "Lost in the Middle" effect.
When you feed a model a monolithic prompt of 8,000 tokens, the instructions that are actually relevant to the current task likely account for only 10–15% of that total. The attention mechanism must distribute weight across the entire context, and the sheer volume of irrelevant material directly reduces the probability that the critical instructions will be "seen." This is not a bug in the model — it is an inherent characteristic of the Transformer architecture.
More dangerously, an overloaded context also produces what might be called "instruction interference." When rule A and rule B both appear in the same prompt, and they contain a latent contradiction in certain scenarios, the model enters a subtle self-reconciliation mode. The output becomes vague and hedged rather than precisely executing either instruction.
1.2 The Double Penalty of Cost and Latency
Attention dilution is only one side of the problem. From an engineering perspective, monolithic prompts impose two additional, very concrete penalties.
The first is the linear inflation of token costs. Major API providers charge per input token, and a single Agent task typically requires dozens of model invocations. If every invocation carries an 8,000-token system prompt — even when 85% of its content is entirely irrelevant to that specific call — the full cost is charged regardless. In high-concurrency enterprise environments, this waste compounds rapidly into figures that cannot be ignored.
The second is the rise in Time to First Token (TTFT). Processing longer contexts demands more compute, which directly increases the time a user waits before receiving any response. In latency-sensitive interactive applications, a prompt bloated with redundant information not only makes the model more error-prone, it also substantially degrades the user experience. These two penalties arrive simultaneously, and the true cost of a monolithic prompt is far higher than it appears on the surface.
II. Progressive Disclosure: A Lifesaving Idea Borrowed from UI Design
Once the root causes of monolithic prompt failure are understood, the solution that Progressive Disclosure offers becomes self-evident. This section introduces where the concept originated and traces the logic of its migration into Skills design.
2.1 What Is Progressive Disclosure
Progressive Disclosure began as a UX design concept, originally articulated by Jakob Nielsen in his research on cognitive load in user interfaces. Its core idea is elegantly simple: at any given moment, show users only the information they need right now, and defer advanced options, detailed explanations, and edge cases until the user genuinely requires them. The "Advanced Settings" collapsed menu in a mobile app, a multi-step form wizard, a tooltip that expands only on hover — all of these are canonical implementations of this principle.
The reason this idea translates so naturally across domains to AI Skills design is that both situations are two manifestations of the same underlying problem: cognitive resources are finite, while the information to be processed is potentially boundless. The working memory of the human brain is limited; the effective attention window of a large language model is equally limited. You cannot — and should not — pile everything onto the table at the very beginning.
2.2 Why It Is the Soul of Skills
Calling it the "soul" is not hyperbole. The core value of the Skills system lies in enabling an Agent to precisely accomplish the task it has been delegated — not to demonstrate how much it knows. What makes a Skill fundamentally more powerful than a raw prompt is that it introduces a structured information encapsulation mechanism. It transforms the question of "what information gets loaded at what moment" from an afterthought into a design variable that can be deliberately shaped and controlled, rather than a chaotic dump of everything at once.
Progressive Disclosure governs the information architecture of Skills: the system delivers only the context that is needed, at the moment it is needed, at the granularity that is needed. This means the information density and relevance in the context is at its optimum at every step of execution. This is not merely an efficiency question — it is an accuracy question. An Agent operating with precise information is far more reliable than one operating with abundant but disordered information.
III. The Three-Tier Loading Mechanism: Theory and Practice
The three-tier loading mechanism is the concrete engineering implementation of Progressive Disclosure within the Skills framework. This section dissects its design logic tier by tier from an architectural perspective, and presents real benchmark data to illustrate the actual gains in token efficiency.
3.1 Tier One — The Entry Layer
The Entry Layer is the interface between the entire Skills system and the outside world. Its responsibility is singular: tell the Agent which Skills exist in the current system, what each Skill is called, and what general category of requests it handles. The content of this layer is typically very lean — each Skill's description is usually one or two sentences — and the total token consumption of the Entry Layer is generally between 200 and 500.
The design philosophy of the Entry Layer is "table of contents, not body text." It helps the Agent perform task routing — identifying user intent and selecting the correct Skill — and takes no responsibility for explaining how that Skill should be executed. This layer is loaded on nearly every invocation, which is precisely why it must be kept ruthlessly concise. Any redundant information placed in the Entry Layer will consume tokens at the highest possible frequency.
3.2 Tier Two — The Capability Layer
The Capability Layer is loaded only after the Agent has determined which specific Skill to invoke. It contains the full behavioral specification for that Skill: input and output formats, which tools to call, the business rules that must be followed, logic for handling exception branches, and a small number of representative examples. The information density of this layer is far higher than the Entry Layer, with token counts typically ranging from 800 to 2,000 — but it only enters the context when a specific Skill is activated.
The Capability Layer is the most information-dense layer in Skills design, and the one that demands the most careful craftsmanship. A well-written Capability Layer should enable the model to execute more than 80% of standard task paths completely and correctly without any additional guidance. It does not need to cover every edge case, but it must describe the primary workflow with enough clarity that the model can execute it confidently without having to guess.
3.3 Tier Three — The Execution Layer
The Execution Layer is the finest-grained information unit, dynamically injected only when a specific sub-task or edge case requires it. Typical Execution Layer content includes: the specific regulatory clauses needed when handling a compliance scenario, real-time data returned by an external API call, the most relevant document fragments retrieved from a knowledge base for the current query, and highly infrequent special business rules.
The Execution Layer is essentially an architecturally structured expression of on-demand RAG (Retrieval-Augmented Generation). It is not static content hardcoded into the Skill file in advance; rather, it is dynamically retrieved and injected at runtime based on the current state of the task. This design allows Skills to maintain a lean baseline context while retaining the ability to handle high-complexity tasks on demand — traveling light by default, fully equipped when needed.
3.4 Token Savings Benchmark Data
Consider a mid-complexity enterprise internal approval Agent, comparing two architectural approaches. The traditional monolithic prompt approach consolidates all rules into the system prompt, resulting in approximately 6,800 input tokens per invocation. Under the three-tier loading mechanism, the Entry Layer consumes 320 tokens, the Capability Layer averages 1,200 tokens when loaded on demand, and the Execution Layer averages 400 tokens when dynamically injected — bringing the actual input tokens per invocation to approximately 1,920, a reduction of 72%.
In a production environment handling 50,000 invocations per month, this gap translates to saving roughly 244 million input tokens monthly. At current GPT-4o pricing, that equates to over $2,400 in API cost savings per month, or nearly $30,000 annualized. More significantly, because each invocation now operates with a more precise context, the Agent's task completion accuracy rose from 76% to 91%, and the refusal rate — the proportion of responses where the model outputs "I'm not sure" due to information overload — fell by 60%. Cost reduction and accuracy improvement arrived simultaneously. This is the most compelling real-world validation of the Progressive Disclosure architecture.
graph TD
A[User Request] --> B{Entry Layer\n~320 tokens\nRoute Decision}
B -->|Matches Approval Skill| C{Capability Layer\n~1200 tokens\nLoad Behavior Spec}
B -->|Matches Other Skill| D[Load Corresponding Capability Layer]
C -->|Standard Path| E[Execute Directly]
C -->|Special Rule Needed| F{Execution Layer\n~400 tokens\nDynamic Injection}
F --> E
E --> G[Output Result]
IV. Summary
Progressive Disclosure is not an optimization trick — it is the foundational design principle of the Skills architecture. The failure of monolithic prompts stems, at its core, from conflating two distinct goals: making the model know a lot, and making the model perform well. The three-tier loading mechanism enforces strict information boundaries on every invocation's context, ensuring the model operates in the clearest, most focused state possible at each step.
The dramatic reduction in token costs is a quantifiable byproduct, but the more fundamental gains are the improvement in accuracy and the predictability of system behavior. A well-designed Skill should resemble a seasoned expert: there is no need to have every piece of knowledge on the tip of their tongue at all times, but at precisely the right moment, they can call upon exactly the right knowledge to complete the delegated task with high quality.
Top comments (0)