A stale skill is worse than no skill

#ai #agents #claude #tools

While building a skill index I almost recommended a repo with sixteen thousand stars. Looked great. Then I checked when it was last touched: January 2023. Three years dead. If an agent had loaded its instructions it would have followed them confidently, and they're wrong now. That's the whole problem nobody talks about with these skill libraries.

If you've used Claude Code or any agent setup recently, you've seen the pattern. There's a folder of SKILL.md files, or an MCP registry, or some awesome-agent-skills repo, and the agent reaches into it when it needs to do a thing. Everyone is building the library. The library is easy. It's a folder of markdown.

The part nobody builds is the part that actually matters: how does the agent know which skill in the pile won't lie to it?

A skill is instructions an AI follows. Confidently. That's different from code. When code is wrong it usually crashes, loud, and you notice. When a skill is wrong the agent just does the wrong thing and tells you it went great. So a stale or subtly-broken skill isn't neutral. It's negative. It produces a confident wrong answer instead of making the agent stop and think. No skill at all would have been safer.

I went looking for the existing solution and what I found was a hundred link lists. They tell you a skill exists. They do not tell you the one thing you need before you load it: can I trust this, and is it still true.

So I built the missing layer. It's called Skill Atlas. It's a public index of skills, but organized by job (the thing you're about to do, like "work on Upwork" or "build an MCP server" or "write Go" or "set up CI"), and every single entry carries three things a link list skips:

where it came from, and whether that source is reputable
a trust tier, A through D
a last_validated date: when an actual check last happened, not the publish date

The tiers are simple. A is canonical, the official source, you trust it because the vendor wrote it. B is community-proven, high reputation and still maintained. C is useful but verify before you lean on it. D is caution, it's stale or unmaintained, listed on purpose so you don't waste an afternoon rediscovering it's dead.
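
To make those three fields concrete, here is a minimal sketch of what one entry could look like in Python. Only `last_validated` and the A through D tiers come from the description above. The class names, the other field names, the example URL, and the example date are assumptions for illustration, not the atlas's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Tier(Enum):
    """Trust tiers as described in the post."""
    A = "canonical: the official source, written by the vendor"
    B = "community-proven: high reputation and still maintained"
    C = "useful, but verify before you lean on it"
    D = "caution: stale or unmaintained, listed on purpose"


@dataclass(frozen=True)
class Entry:
    job: str              # the thing you're about to do (hypothetical field name)
    source_url: str       # where the skill came from (hypothetical field name)
    tier: Tier            # trust tier, A through D
    last_validated: date  # when a real check last happened, not the publish date


# Illustrative values only: the URL and the date are placeholders.
entry = Entry(
    job="build an MCP server",
    source_url="https://example.com/hypothetical-skill",
    tier=Tier.C,
    last_validated=date(2025, 1, 15),
)

print(f"{entry.job}: tier {entry.tier.name}, last validated {entry.last_validated.isoformat()}")
```

Keeping the entry frozen means a tier can only change by writing a new record, which fits the idea that every tier change should come from a fresh check.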

The freshness part isn't optional, and the repo from the top of this post is why.

That repo? Sixteen thousand stars makes it an easy B by the usual logic, maybe higher. The January 2023 last-push makes it a D. Stars said trust it. The date said run.

Same story with a famous interview-prep repo: around three hundred fifty thousand stars, untouched since last August, so it drops to C. Stars are a lagging signal. People star a thing once and never unstar it when it rots. Stars plus last-push tell the truth that stars alone hide.
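
A rough sketch of the "stars plus last-push" idea, in Python. The thresholds here (6 months, 24 months, 1,000 stars) are assumptions picked so the two examples above come out as described. They are not the atlas's real rules. Tier A is left out on purpose, since it depends on the source being official rather than on stars or dates. The exact days and the value of `today` are also made up.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later, never negative."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return max(months, 0)


def community_tier(stars: int, last_push: date, today: date,
                   min_stars: int = 1000,   # assumed threshold
                   stale_months: int = 6,   # assumed threshold
                   dead_months: int = 24    # assumed threshold
                   ) -> str:
    """Tier for a non-official source. Freshness is checked before stars."""
    age = months_between(last_push, today)
    if age >= dead_months:
        return "D"  # stale or unmaintained, however many stars it has
    if age >= stale_months or stars < min_stars:
        return "C"  # useful, but verify first
    return "B"      # popular and still maintained


today = date(2026, 6, 1)  # hypothetical "now"

print(community_tier(16_000, date(2023, 1, 15), today))   # D
print(community_tier(350_000, date(2025, 8, 20), today))  # C
print(community_tier(5_000, date(2026, 4, 1), today))     # B
```

The ordering is the point: the date check runs first, so no star count can rescue a repo that stopped moving.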

That's the actual product. Not the list, the judgment about the list.

To keep it from rotting itself, there's a script that re-checks every source, liveness plus live star counts plus last-push, and a GitHub Action runs it monthly and opens an issue the moment a link dies. An entry that hasn't been re-validated in six months gets treated as "verify before trusting" no matter what tier it claims. The atlas has to hold itself to the same bar it holds everything else, or it becomes the exact thing it warns about.
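
The six-month rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script in the repo: the function name is hypothetical, "six months" is approximated as 183 days, and I'm assuming an overdue entry is capped at C ("verify before trusting") while a D stays a D rather than being promoted.

```python
from datetime import date, timedelta

# Roughly six months; the exact window is an assumption.
REVALIDATION_WINDOW = timedelta(days=183)


def effective_tier(claimed: str, last_validated: date, today: date) -> str:
    """Tier to act on: the claimed tier, capped at C when validation is overdue."""
    overdue = (today - last_validated) > REVALIDATION_WINDOW
    if overdue and claimed in ("A", "B"):
        return "C"  # treat as "verify before trusting" regardless of the claim
    return claimed  # fresh entries keep their tier; C and D are never upgraded


today = date(2025, 9, 1)  # hypothetical "now"

print(effective_tier("A", date(2025, 1, 1), today))  # C, last check was 243 days ago
print(effective_tier("B", date(2025, 6, 1), today))  # B, last check was 92 days ago
print(effective_tier("D", date(2024, 1, 1), today))  # D, overdue but already at caution
```

Computing the effective tier at read time, instead of rewriting the stored tier, keeps the original judgment on record and lets a successful re-check restore it.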

One more thing that surprised me while building it. For some jobs there is no good public skill, and the honest move is to say so. "How to win on Upwork" doesn't have a trustworthy public skill, because the real one is bespoke, it lives in your own head and your own win/loss history. So the atlas says that out loud instead of padding the slot with generic junk. The whole model is: find the good public starting point, then fork it private and make it yours. The public layer is the floor, not the ceiling.

It's 34 jobs right now. Backend stuff mostly, because that's what I do (Go, Postgres, Docker, observability, API design, auth, the usual), plus the agent-specific ones (MCP, prompt engineering) and a few soft ones (interviews, careers). MIT, and it installs as a skill itself so your agent routes to the right vetted entry at the start of a task.

Repo's here if you want to poke at it: https://github.com/luongs3/skill-atlas

The question I actually can't answer yet, and the reason I'm posting this: what other signal would make you trust a skill before loading it? Stars and last-push are what I have. They're not enough. Install counts maybe, but those lie too. If you've got a better trust signal I want to hear it.

Top comments (1)

Mustafa ERBAY • Jun 4

Stars and last-push dates are useful signals, but I suspect the strongest trust signal is still outcome validation.

A skill that was successfully executed 500 times in the last 30 days is probably more trustworthy than a 20,000-star repository that hasn’t been touched in years.

In a sense, AI skills may need something similar to SRE principles: freshness, reliability, success rates, failure rates, and continuous verification.

The challenge isn’t building a skill library.

The challenge is building a living system that knows when not to trust its own knowledge.