The missing piece in most LLM applications, and how AgentSkills fix it.

We've gotten pretty good at telling AI agents who they are.
"You are an expert software engineer." "You are a seasoned marketing strategist." We hand them a persona, dump in some context, maybe paste in a few examples, and then we hit send and hope for the best.
And for simple tasks? That works fine.
But the moment you ask an agent to do something that involves multiple steps, decisions, and potential failure points, things start to fall apart in ways that are hard to predict and even harder to debug.
The agent sounds confident. It just doesn't behave consistently.
Here's why and what to do about it.
The Gap Nobody Talks About
There's a meaningful difference between knowing what needs to be done and knowing how to do it reliably.
A new employee on their first day might understand the goal perfectly ("onboard this customer") but still flounder without a clear process. Do they send the welcome email first or set up the account? What if the system throws an error? Who do they escalate to?
Without a procedure, they improvise. Sometimes that works. Often it doesn't.
LLM agents have the exact same problem.
You can give an agent all the context in the world about what it's supposed to accomplish, and it'll still invent its own process every single time it runs. Skipping steps. Hallucinating validations. Silently glossing over failures.
This is the gap, and it's where most LLM applications quietly break down.
Enter AgentSkills (and Why They're a Big Deal)
AgentSkills, also called Procedure Skills, are exactly what they sound like: explicit, step-by-step instructions that teach an agent how to execute a task, not just what the task is.
Think of it less like a prompt and more like a standard operating procedure. A playbook. A binder on the shelf.
Industry leaders like Anthropic and Microsoft have both converged on this idea and formalized it around a portable format called SKILL.md. That's not a coincidence; it signals that the field is maturing from "prompt engineering" toward something more rigorous: procedure engineering.
What a Skill Actually Looks Like
A skill isn't a single prompt tucked inside a system message. It's a structured, self-contained unit of procedural knowledge: a directory that bundles everything an agent needs to execute a specific type of task.
Here's how it breaks down:
- SKILL.md is the core: the instruction manual. It contains YAML frontmatter that lets the agent automatically discover and select the right skill for the job, plus detailed step-by-step execution instructions.
- scripts/ holds small, single-purpose automation scripts (Python, Bash, Node.js) for the steps that LLMs consistently get wrong when left to their own devices. Repetitive operations, file handling, API calls: these belong in code, not in natural-language instructions.
- resources/ contains domain-specific knowledge: company standards, data schemas, regulatory rules, anything the agent needs to reference but shouldn't be expected to memorize.
- assets/ stores output templates: JSON schemas, document layouts, checklists, so the agent produces consistent, structured results every time.

Put it all together and you get a self-contained playbook: instructions, tools, references, and templates in one place.
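As a sketch, a skill for a hypothetical invoice-processing task might be laid out like this (the task, file names, and steps are illustrative, not a formal spec):

```
invoice-processing/
├── SKILL.md
├── scripts/
│   └── validate_invoice.py
├── resources/
│   └── tax_rules.md
└── assets/
    └── invoice_schema.json
```

And the SKILL.md itself might read:

```markdown
---
name: invoice-processing
description: Extract, validate, and file vendor invoices as structured JSON
---

1. Run scripts/validate_invoice.py on the input file. If it exits non-zero, report the error and stop.
2. Extract the fields listed in assets/invoice_schema.json.
3. Check tax handling against resources/tax_rules.md.
4. Output JSON that validates against the schema. Do not invent fields.
```

Note how the instructions reference the bundled scripts, resources, and assets by path: the skill is self-contained.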
The Three Layers Most Teams Confuse
Before you can appreciate why skills matter, it helps to get clear on what they're not:

- Prompts define who the agent is and what it's trying to accomplish.
- Tools (function calls, APIs, MCP servers) define what actions it can take.
- Skills define how to sequence those actions into a reliable workflow.

Most teams have prompts. Many now have tools. Very few have skills.
A skill is where workflow intelligence lives. It's the layer that answers the questions nobody bothers to write down: What comes first? What needs to be validated before moving on? What happens if this step fails?
Why Embedding All of This in a System Prompt Fails
The intuitive response to all of this is: "Can't I just put the procedure in the system prompt?"
You can. And for a single, small workflow it might work okay. But it breaks down fast for a few predictable reasons.
Fragility. Large, instruction-heavy prompts are brittle. One small tweak to the wording can cascade into completely different agent behavior. There's no modularity, no separation of concerns.
Token waste. Every time the agent runs, it pays the full token cost of every procedure, even the ones that are completely irrelevant to the current task. At scale, this adds up fast.
Inconsistency. Without explicit validation steps ("check whether the file exists before editing it"), agents will invent shortcuts. They'll confidently skip steps and never tell you they did it.
The result is the thing that makes AI in production so frustrating: agents that sound certain and behave unpredictably.
The Idea That Changes Everything: Progressive Disclosure
Here's the mental model that ties this all together, and it's dead simple.
Imagine your new employee's first day. You have two options:
Bad approach: Pile every binder, all 50 of them, on their desk. Tell them to read all of it before they start. By 11am they're exhausted, overwhelmed, and can't remember a thing.
Good approach: Put the binders on a shelf with clear labels. They glance at the labels, grab the one they need, read it, and do the job. Tomorrow, they grab a different one.
That's Progressive Disclosure.
In practice, it works in two phases:
Discovery Phase: the agent loads only skill names and short descriptions. A table of contents for procedural knowledge. Minimal tokens, maximum orientation.

Activation Phase: when a user request matches a skill's description, the agent loads the full SKILL.md and supporting assets into active memory. Only what's needed, only when it's needed.
The payoff is real: fewer hallucinations, lower token costs, better decisions when many skills exist simultaneously.
How to Design Skills That Actually Work
If you're going to build skills, these principles are worth internalizing from day one:
Write in third-person imperative. "Extract the text." Not "You should try to extract the text." Precision matters: ambiguous instructions produce ambiguous behavior.

Define failure states explicitly. What should the agent do when a script errors? When a file is missing? When validation fails? If you don't specify, the agent will improvise, and you won't like the improvisation.

Keep skills small and composable. A skill called "Marketing" is a red flag. A skill called "Ad Copy Generation" is useful. A skill called "SEO Analysis" is useful. Small, focused skills compose into larger workflows. Monolithic skills just become another fragile mega-prompt in disguise.
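The same explicitness applies inside scripts/. A hypothetical validation script (the name `check_input.py` and the exit codes are illustrative) can make each failure state a distinct, machine-checkable outcome so the agent can't gloss over it:

```python
# Hypothetical scripts/check_input.py: fail loudly with distinct exit codes
# so the SKILL.md instructions can tell the agent exactly what to do next.
import json
import sys
from pathlib import Path

EXIT_OK, EXIT_MISSING, EXIT_INVALID = 0, 2, 3

def check_input(path_str: str) -> int:
    path = Path(path_str)
    if not path.exists():                 # failure state 1: file is missing
        print(f"MISSING: {path}", file=sys.stderr)
        return EXIT_MISSING
    try:
        json.loads(path.read_text())      # failure state 2: content won't parse
    except json.JSONDecodeError as err:
        print(f"INVALID: {err}", file=sys.stderr)
        return EXIT_INVALID
    print(f"OK: {path}")
    return EXIT_OK

if __name__ == "__main__":
    sys.exit(check_input(sys.argv[1]))
```

The matching SKILL.md can then spell out the procedure: on exit code 2, ask the user for the file; on exit code 3, report the parse error and stop; never proceed past a non-zero exit.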
When Does This Actually Matter?
Not every situation calls for this level of structure. If you have one skill and it's always needed, just hand it to the agent upfront. Progressive disclosure doesn't help when there's nothing to disclose progressively.
But as your agent grows (more tasks, more workflows, more edge cases), the calculus changes:
- 10 skills, one needed at a time? Huge savings. Show only what's needed.
- 50 skills? Progressive disclosure becomes essential. Otherwise the agent drowns.
- Complex multi-step workflows? Explicit failure states and validation steps stop being nice-to-have and become the difference between an agent that works and one that confidently fails.
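To make the 50-skill case concrete, here's a back-of-envelope version of that calculus. The token counts are assumed round numbers for illustration, not measurements:

```python
# Assumed sizes (illustrative only): 50 skills, ~800 tokens per full
# procedure, ~30 tokens for a name plus one-line description.
N_SKILLS = 50
FULL_TOKENS = 800
SUMMARY_TOKENS = 30

# All 50 binders dumped on the desk, every run:
everything_upfront = N_SKILLS * FULL_TOKENS

# Labels on the shelf, plus the one binder actually needed:
progressive = N_SKILLS * SUMMARY_TOKENS + FULL_TOKENS

print(everything_upfront, progressive)  # 40000 vs 2300 tokens per run
```

Under these assumptions, progressive disclosure cuts the per-run procedural overhead by more than 90%, and the gap widens as skills are added.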
The Shift Worth Making
AgentSkills represent a genuine change in how we think about building with LLMs.
We're moving from prompt engineering, which is ultimately about describing what we want, to procedure engineering, which is about encoding how to reliably do it.
From probabilistic answers to deterministic execution.
From agents that talk about the work to agents that actually do it.
The tools and the personas are important. But without skills, you've hired a brilliant employee who has no idea how your company actually operates. Give them the binders. Label them clearly. Put them on the shelf.
That's the whole idea.
The one takeaway: MCP gives the LLM the tools. Skills tell the LLM when to use them. Progressive disclosure means "show only what's needed, when it's needed."
Thanks
Sreeni Ramadorai


