The agentskills.io spec recommends two things in every description: start with an action verb, and include a trigger phrase like "use when..." that tells the routing layer when to fire the skill. They take five seconds to add and they're the difference between a skill an agent picks up and a skill that sits unused in the catalog.
I sampled 500 skills at random from a 1,436-skill public corpus and measured both. 5.8% follow both recommendations. 61.8% follow neither.
Here's the full breakdown of what the SKILL.md ecosystem actually looks like in production as of late April 2026.
Methodology
Corpus: sickn33/antigravity-awesome-skills at HEAD on April 29, 2026. This is the largest publicly bundled SKILL.md collection in a single repo (1,436 indexed skills with metadata for category, source, and risk classification).
Sample: 500 skills, random with seed 42 for reproducibility.
Tool: skillcheck v1.2.0 from PyPI.
Per-skill features captured: every skillcheck diagnostic (rule, severity, message), description quality score, body line count, body and metadata token estimates, activation entropy and top-hypothesis score from --activation-hypotheses, structural features computed locally (description length in chars and words, action verb in first position, trigger-phrase presence, presence of resources/, scripts/, and references/ subdirectories, frontmatter field count and which fields), plus the antigravity-supplied category, source, and risk metadata.
Caveat one: skillcheck's description quality score is a heuristic that includes action-verb and trigger-phrase detection as positive signals. So the correlation between these two features and the score is partly mechanical. The headline finding is not "we discovered these patterns predict quality." It's "the spec recommends these patterns, the linter that encodes the spec rewards them, and almost nobody is using them."
Caveat two: antigravity's bundler injects risk, source, date_added, and category fields into the SKILL.md frontmatter when packaging skills. The author-original frontmatter analysis below excludes these injected fields.
Reproduce in five commands:
pip install skillcheck==1.2.0
git clone --depth 1 https://github.com/sickn33/antigravity-awesome-skills.git
cd antigravity-awesome-skills
# Then sample from skills_index.json with seed 42 and run skillcheck against each
# Full analysis script: see the dataset link at the bottom
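For the last two steps, here's a minimal sketch of what the sampling and per-skill run can look like. It assumes skills_index.json is a JSON array whose entries carry a path field pointing at each skill directory; the exact index schema is an assumption, so adjust to what the repo actually ships.

```python
import json
import random
import subprocess

# Assumption: skills_index.json is a JSON array and each entry has a "path"
# field pointing at the skill directory. Adjust to the real index schema.
with open("skills_index.json") as f:
    index = json.load(f)

random.seed(42)
sample = random.sample(index, 500)

results = []
for entry in sample:
    # Same invocation described in the methodology section below.
    proc = subprocess.run(
        ["skillcheck", entry["path"], "--format", "json", "--skip-ref-check"],
        capture_output=True,
        text=True,
    )
    results.append(json.loads(proc.stdout))
```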
The two-pattern adoption gap
Every skill description was classified on two binary features: does it start with an action verb (Generates, Validates, Creates, Builds, Analyzes, etc., from a 90-verb allowlist), and does it contain a trigger phrase (use when, use this skill when, when the user, when working with, whenever, etc.)?
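For concreteness, here's a rough sketch of that classification. The verb set and trigger patterns below are small illustrative subsets, not the 90-verb allowlist or the nine production regexes.

```python
import re

# Illustrative subsets only; the study used a 90-verb allowlist (gerund and
# base forms) and nine trigger-phrase regexes.
ACTION_VERBS = {"generates", "validates", "creates", "builds", "analyzes"}
TRIGGER_PATTERNS = [
    r"\buse (this skill )?when\b",
    r"\bwhen the user\b",
    r"\bwhen working with\b",
    r"\bwhenever\b",
]

def classify(description: str) -> tuple[bool, bool]:
    """Return (starts_with_action_verb, has_trigger_phrase) for one description."""
    words = description.strip().split()
    first = words[0].lower().strip(".,:;") if words else ""
    has_verb = first in ACTION_VERBS
    has_trigger = any(re.search(p, description, re.I) for p in TRIGGER_PATTERNS)
    return has_verb, has_trigger
```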
| Pattern | Count | % |
|---|---|---|
| Has both action verb and trigger phrase | 29 | 5.8% |
| Action verb only | 108 | 21.6% |
| Trigger phrase only | 54 | 10.8% |
| Neither | 309 | 61.8% |
The same four groups, scored against skillcheck's description quality metric:
| Group | n | Median score | % scoring 70+ |
|---|---|---|---|
| Has both | 29 | 90.0 | 100.0% |
| Action verb only | 108 | 70.0 | 72.2% |
| Trigger phrase only | 54 | 70.0 | 94.3% |
| Neither | 309 | 50.0 | 8.4% |
The 100% rate in the both-features group isn't magic. It reflects that skillcheck's heuristic was designed around the spec's recommendations and rewards skills that follow them. What's actually striking is the bottom line: 309 of 500 published skills skip both recommendations. That's the working majority of the ecosystem leaving easy quality on the floor.
What authors actually fill in
Outside name and description, frontmatter is mostly empty. The median author-original frontmatter (excluding the bundler's injected fields) has just two fields. Two.
| Field | Adoption |
|---|---|
| name | 99.6% |
| description | 99.6% |
| author | 10.8% |
| tags | 10.6% |
| tools | 8.8% |
| license | 3.8% |
| allowed-tools | 2.8% |
| version | 2.2% |
| triggers | 0.6% |
| user-invokable | 0.6% |
| capabilities | 0.2% |
The spec offers version, author, tags, allowed-tools, model, agent, hooks, user-invocable, disable-model-invocation, skills, mode. Almost none of them are being used. 80% of authors stop after name and description. There's an entire optional metadata layer the spec defines and the ecosystem ignores.
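For reference, here's what a skill using a few of those optional fields could look like. The skill name and every value are invented for illustration, and which fields a host agent honors varies.

```yaml
---
# Hypothetical example; name, tool names, and values are illustrative only.
name: csv-report-builder
description: Generates summary reports from CSV exports. Use when the user asks to aggregate, pivot, or chart tabular data files.
version: 0.2.0
author: example-author
tags: [csv, reporting, data]
allowed-tools: [read_file, run_command]
---
```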
Progressive disclosure adoption is 16%
The spec's load-bearing concept is progressive disclosure: keep metadata tiny so the routing layer scans it cheaply, keep the body lean so it fits the agent's context window, push heavy material into resources/, scripts/, or references/ subdirectories that load only when needed.
| Subdirectory | Adoption |
|---|---|
| resources/ | 6.4% |
| scripts/ | 4.4% |
| references/ | 8.2% |
| Any of the three | 16.0% |
84% of skills inline everything in SKILL.md. The whole architectural promise of progressive disclosure (multiple skills can sit in the agent's catalog without overwhelming context) requires authors to actually use the pattern. Most don't.
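For anyone who hasn't seen the pattern used, a skill that actually applies it looks something like this on disk (directory names from the spec, file names invented for illustration):

```
csv-report-builder/
├── SKILL.md           # small frontmatter + lean body, always scanned
├── resources/         # heavy reference material, loaded only when needed
│   └── report-templates.md
├── scripts/
│   └── build_report.py
└── references/
    └── csv-dialect-notes.md
```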
Body bloat is real
23% of skills triggered disclosure.body-bloat warnings, meaning they contain code blocks over 50 lines or tables over 20 rows in the SKILL.md body itself. These are exactly the things the progressive disclosure pattern was designed to push out into references/.
13.6% exceeded the spec's 500-line soft cap on body length. 8.4% exceeded the 5,000-token body budget when skillcheck's tokenizer flagged them (the rest weren't measured because they didn't trip the warning threshold).
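If you want to catch this locally before a linter does, here's a naive sketch of the same two checks. The thresholds come from the warning description above; real markdown parsing is messier than this, so treat it as a rough pre-flight, not skillcheck's implementation.

```python
FENCE = "`" * 3  # avoids writing a literal markdown fence inside this snippet

def bloat_warnings(body: str, max_code_lines: int = 50, max_table_rows: int = 20) -> list[str]:
    """Rough local check mirroring the body-bloat thresholds described above."""
    warnings = []
    in_code, code_lines, table_rows = False, 0, 0
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(FENCE):  # a fence opens or closes a code block
            if in_code and code_lines > max_code_lines:
                warnings.append(f"code block with {code_lines} lines")
            in_code, code_lines = not in_code, 0
        elif in_code:
            code_lines += 1
        elif stripped.startswith("|"):  # markdown table row
            table_rows += 1
            if table_rows == max_table_rows + 1:
                warnings.append(f"table exceeding {max_table_rows} rows")
        else:
            table_rows = 0
    return warnings
```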
Description length sweet spot
Quality scores rise with description length up to about 175-225 characters, then plateau:
| Length range (chars) | n | Median quality |
|---|---|---|
| 25-49 | 16 | 50.0 |
| 50-99 | 90 | 50.0 |
| 100-149 | 158 | 60.0 |
| 150-199 | 131 | 70.0 |
| 200-249 | 62 | 67.5 |
| 250-299 | 38 | 60.0 |
The spec's character cap is 1,024. Almost nobody's pushing it. The ecosystem clusters between 100 and 200 chars (median 145), which is roughly the bottom edge of the quality plateau. Authors writing 150+ char descriptions get noticeably better routing signal density.
Cross-source patterns
Antigravity's index classifies each skill's source. Quality patterns by source class:
| Source class | n | Median quality | % action verb | % trigger | % progressive disclosure |
|---|---|---|---|---|---|
| community | 394 | 60.0 | 26.6% | 17.5% | 16.2% |
| external_repo | 38 | 65.0 | 34.2% | 31.6% | 18.4% |
| official_org | 9 | 60.0 | 77.8% | 0.0% | 33.3% |
| personal | 14 | 50.0 | 0.0% | 0.0% | 0.0% |
Three observations. Skills from official org repos (Anthropic, Hugging Face, etc.) hit 77.8% action-verb adoption, miles above the community baseline, but zero trigger-phrase use; their descriptions are direct and verb-led without the "use when" preamble. Skills from individual external repos (someone's personal GitHub project) actually hit the highest trigger-phrase rate (31.6%), suggesting individual maintainers writing for their own activation problem think harder about it than community contributors writing for a shared list. Skills tagged "personal" (someone's curated set of their own work) hit 0% on both patterns, which is the cleanest signal that "I made this for me" doesn't translate to "an agent will pick this up."
Skillcheck v1.2.0 against the corpus
The new version was released April 28, 2026. The skillcheck rule set found:
- 1 of 500 skills produced an actual ERROR (0.2%): `android_ui_verification`, which has invalid characters in its name.
- 499 of 500 produced WARNINGs (99.8%).
- 0 skills passed completely clean.
Most-fired rules:
| Rule | Count |
|---|---|
| frontmatter.field.unknown | 500 |
| description.quality-score | 499 |
| disclosure.body-bloat | 115 |
| compat.unverified | 81 |
| disclosure.metadata-budget | 70 |
| sizing.body.line-count | 68 |
| disclosure.body-budget | 42 |
| frontmatter.description.person-voice | 27 |
| frontmatter.field.ecosystem | 19 |
| sizing.body.token-estimate | 14 |
| frontmatter.name.reserved-word | 11 |
The frontmatter.field.unknown warning fires on every file because antigravity injects bundler-only fields into the frontmatter (risk, source, date_added); strip those and the genuine unknown-field rate drops dramatically. Worth knowing if you're running skillcheck against bundled corpora versus author-original repos.
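If you're running against a bundled corpus and want the author-original view, here's a small sketch of stripping those fields first. It assumes PyYAML and that the frontmatter is the leading --- delimited block; both are assumptions on my part, not documented skillcheck behavior.

```python
import yaml  # PyYAML; an assumption here, not a documented skillcheck dependency

BUNDLER_FIELDS = {"risk", "source", "date_added", "category", "id"}

def author_original_frontmatter(skill_md_text: str) -> dict:
    """Parse SKILL.md frontmatter and drop the bundler-injected fields."""
    # Assumes the file starts with a "---" delimited YAML block.
    _, frontmatter, _body = skill_md_text.split("---", 2)
    data = yaml.safe_load(frontmatter) or {}
    return {k: v for k, v in data.items() if k not in BUNDLER_FIELDS}
```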
What this means if you publish skills
Four things, each fixable in a single commit per skill:
1. Start the description with an action verb (`Generates`, `Validates`, `Creates`, `Analyzes`, `Refactors`, `Audits`, etc.). Not `Expert in`, not `Comprehensive`, not `One-stop`. The verb tells the routing layer what the skill does in two syllables.
2. Include a trigger phrase (`Use when ...`, `Trigger when ...`, `Use this skill when the user ...`). The agent's routing decision is "should I activate this." A trigger phrase answers it directly.
3. Aim for 175-225 characters in the description. Short descriptions don't carry enough routing signal; long ones bury it.
4. Push large code blocks (>50 lines), large tables (>20 rows), and detailed reference material out of `SKILL.md` and into `resources/`, `scripts/`, or `references/`. The body should describe the work; the reference files should hold the work.
That's it. Four changes that move a skill from the 61.8% of the ecosystem ignoring spec recommendations to the 5.8% following them.
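For illustration (invented skill, invented wording), here's what the first three changes do to a description:

```yaml
# Before: no action verb, no trigger phrase, ~50 chars
description: Comprehensive helper for database schema migrations.

# After: verb-led, explicit trigger phrase, ~180 chars
description: >-
  Reviews and generates SQL schema migrations, flagging destructive operations
  and missing rollbacks. Use when the user asks to write, review, or debug a
  database migration.
```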
Methodology, for anyone who wants to push back
- Tool: `skillcheck` v1.2.0 from PyPI (released April 28, 2026)
- Corpus: `sickn33/antigravity-awesome-skills` at HEAD on April 29, 2026 (1,436 indexed skills)
- Sample: 500 skills, drawn with `random.seed(42)` then `random.sample`
- Per-skill processing: `skillcheck path --format json --skip-ref-check` plus `skillcheck path --activation-hypotheses --format json`
- Feature extraction: action-verb match against a 90-verb allowlist (gerund and base forms); trigger-phrase match against 9 regex patterns; structural facts computed from filesystem and parsed frontmatter
- Quality score: pulled from skillcheck's `description.quality-score` info diagnostic (a published heuristic whose source is at `src/skillcheck/rules/description.py` in the skillcheck repo)
- Frontmatter analysis: bundler-injected fields (`risk`, `source`, `date_added`, `category`, `id`) excluded from the author-original counts above
The full dataset (500 skills, all features, all diagnostics) and the analysis output are in the skillcheck repo under docs/. Anyone who wants to verify a finding, slice it differently, or run the same pipeline against a different corpus has everything they need.
What's next
This study used skillcheck's symbolic mode and the activation-hypotheses generator. The agent-native critique mode (--ingest-critique) and capability graph extraction (--ingest-graph) weren't run here because they require a real agent in the loop and would have made the corpus run significantly longer. A follow-up study using those modes on a smaller subset (50-100 skills) would tell us what an agent actually sees in a skill versus what a static linter can measure. That's the next post.
moonrunnerkc/skillcheck: Cross-agent skill quality gate for SKILL.md files. Validates frontmatter, scores description discoverability, checks file references, enforces three-tier token budgets, and flags compatibility issues across Claude Code, VS Code/Copilot, Codex, and Cursor.