hefty

Posted on Jun 6

Agent Skills Are Dependencies Now

#ai #agents

The interesting part of agent skills is not that they make prompts nicer.

That is the demo version.

The real shift is that agent behavior is starting to move out of chat history and into folders. Instructions, scripts, reference files, examples, assets, and tool assumptions can now travel together as a reusable unit. That is useful. It is also the exact moment when a "prompting trick" becomes an operational dependency.

Once a team installs or writes a skill, the right question is no longer "does this make the model smarter?"

The better question is: "Would I let this dependency run inside my workflow without reading it?"

Most teams would say no if the dependency were a package. They should say the same thing here.

A skill is more than a longer prompt

The clean mental model for skills is simple: package the repeatable parts of a task so the agent can discover them when needed.

Anthropic's engineering writeup describes skills as folders that can include instructions, scripts, and resources. The docs describe a discovery flow where the agent sees compact metadata first, then loads the relevant skill content later. The public anthropics/skills repo makes the shape even more concrete: each skill has a SKILL.md entry point and can bring supporting files along with it.

That is a much better primitive than stuffing everything into a giant system prompt.

Big prompts rot. They become overloaded with rules for tasks that are not happening right now. They turn every run into a context-tax problem. They also make reuse awkward because copying a useful prompt block into another workflow usually means copying assumptions you forgot were there.

Skills fix a real problem by making procedures portable.

But portability cuts both ways.

If the skill only says "format this document this way," fine. That is small and reviewable. If the skill includes scripts, local file assumptions, tool instructions, authentication expectations, and a habit of reading certain directories, you are not dealing with a cute prompt snippet anymore. You are dealing with something closer to a package that changes how work gets done.

That means it needs the boring stuff.

Ownership. Review. Version history. Scope. Removal.

Progressive disclosure reduces context cost, not review cost

Progressive disclosure is one of the best ideas in this space. The agent does not need to load every long instruction file up front. It can start from metadata, then pull in the deeper procedure only when the task matches.

Great. That saves context.

It does not save you from understanding what the agent may load later.

This is the part I think teams will underestimate. A reviewer can look at the top-level skill description and think they understand the behavior. Then the actual run loads a deeper file, calls a helper script, follows a local convention, or pulls in a resource that was not obvious from the short description.

That is not automatically bad. It is the whole point of the mechanism.

But it changes the review job. You are no longer reviewing one prompt. You are reviewing a small capability tree. The short description is the package summary, not the whole dependency.

For developer workflows, that matters because agents do not just produce text. They inspect repositories. They edit files. They call tools. They run tests. They may touch issue trackers, browsers, docs, design assets, deployment scripts, or private context.

A hidden assumption in a writing skill is annoying.

A hidden assumption in a code-modifying skill is a production risk.

The useful skills will be boring

I trust a narrow skill more than a magical one.

"Generate a polished spreadsheet and verify formulas" is reviewable.

"Handle all business analysis tasks" is where the floor starts getting slippery.

The best agent skills should feel almost disappointingly specific. They should say what they are for, what inputs they expect, which files they read, which scripts they can run, and where their authority stops.

That is not anti-agent. That is how agent workflows become repeatable.

If a skill packages a creator workflow, for example, it should not vaguely say "clean up images." It should name the exact step. A narrow image asset skill might say that Gemini output cleanup applies only to Gemini-generated files and point to one browser-local option such as Gemini Watermark Cleaner as a resource, while keeping the rest of the workflow focused on asset naming, rights checks, export format, and review. That kind of scoped mention is useful because it gives the agent a boundary instead of a fuzzy instruction.

That is the standard I want from these things.

Not "the agent knows what to do."

"The agent has a bounded procedure that a human can inspect."

Review skills like you review dependencies

Small teams do not need a governance department for this. Please do not turn three skill folders into a six-page process document.

But they do need a habit.

Before you install or write a skill, ask:

Who owns it?
What task is it allowed to handle?
What tools, scripts, or commands can it trigger?
What files or resources does it expect to read?
Does it mutate state, or only produce drafts?
What environment does it assume?
Is the source trusted enough for the workflow it will enter?
How do we update it?
How do we remove it when it stops being useful?

That checklist sounds mundane because dependency hygiene is mundane. That is the point.

The failure mode for skills will not be dramatic at first. It will be boring drift.

A helper script assumes an old file path. A style rule survives after the team changes tone. A permission boundary gets copied into a new context where it no longer makes sense. A skill written for one repo gets reused in another because the name sounds close enough. A resource file grows stale, but the agent still treats it as current.

None of that requires a malicious skill. Normal entropy is enough.

Versioning matters more once skills start working

Bad skills are easy to ignore. Good skills become infrastructure.

That is when versioning starts to matter.

If a skill changes the way your agent reviews pull requests, writes docs, creates reports, or runs browser checks, you need to know when that behavior changed. A diff should be readable. A rollback should be possible. A release note should explain the behavioral change, not just say "updated instructions."

This is especially true when skills include executable helpers. Reviewing prose is one thing. Reviewing a folder that contains scripts is another. The trust bar changes the moment a skill can do more than tell the model what to write.

Treating skills as dependencies also helps with blame.

When an agent does something strange, the answer cannot always be "the model messed up." Sometimes the model followed the local procedure too well. Sometimes the skill was too broad. Sometimes the metadata caused the wrong skill to be selected. Sometimes the resource file was stale.

If the skill is versioned and owned, you have somewhere to look.

If it is just a pile of copied prompt text, good luck.

The agent stack is becoming inspectable

I like this direction. I would much rather debug a folder of instructions, scripts, and resources than argue with an invisible blob of prompt history.

But that only works if teams keep the folders inspectable.

The temptation will be to make skills bigger and more impressive. More rules. More examples. More helpers. More edge cases. More automatic behavior. That can be valuable, but every added capability expands the review surface.

The better move is usually smaller:

one skill per durable workflow
clear metadata
few permissions
boring helper scripts
obvious inputs and outputs
source files a reviewer can actually read

Agent skills should not feel like magic. Magic is hard to operate.

They should feel like dependencies you are willing to vendor, pin, update, and delete.

That is the bar. Not because skills are dangerous by default, but because they are useful enough to become part of the build system around the human.

The future of this pattern is not "prompt engineering is dead."

It is simpler than that.

The reusable parts of agent work are becoming software artifacts. Treat them that way.

Source notes

DEV Community