The core problem with Claude Code skills is a token budget overflow that silently drops your skill descriptions before Claude ever reads them. If you have built skills, tested them manually, and then watched Claude ignore them in real sessions — you are not doing it wrong. You are hitting a documented architectural limitation that most developers never find.
This article covers the full picture: how Claude Code skills actually work under the hood, the three root causes of missed triggers, and the reliable fixes — including when to use Anthropic's new Skill Creator tool to measure and iterate.
## What Is a Claude Code Skill, Actually?
A Claude Code skill is a SKILL.md file in .claude/skills/<name>/ that Claude loads dynamically when it decides the skill is relevant to your request. It is not a plugin, not a system prompt injection, and not a function call. It is closer to a context-gated instruction block — Claude reads the description, decides if it matches the current task, and only then loads the full instructions.
The format is straightforward:
```markdown
---
name: code-review
description: >
  Reviews code for security issues, performance problems, and style violations.
  Use when reviewing a pull request, inspecting a function, or when the user
  asks "review this code" or "check this for bugs."
allowed-tools: Read, Grep
---

When reviewing code, always check:

1. SQL injection and input validation
2. Authentication and authorization checks
3. Error handling completeness
4. Performance bottlenecks (N+1 queries, missing indexes)
```
The description field is doing most of the work. It is what Claude reads to decide whether to invoke the skill.
## Why Do Claude Code Skills Fail to Trigger?
Claude Code skills fail to trigger for three distinct reasons, and each has a different fix.
Root cause 1: Token budget overflow. At session startup, all skill names and descriptions are pre-loaded into the system prompt, subject to a default budget of ~15,000 characters (~4,000 tokens). When you exceed this budget — which happens faster than you expect with five or six verbose skills — some descriptions get silently truncated. Claude literally never sees them. There is no error message.
Fix: Set the environment variable before launching Claude:
```shell
SLASH_COMMAND_TOOL_CHAR_BUDGET=30000 claude
```
This doubles the available budget. For heavy skill setups, set it in your shell profile so it persists.
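To see how close your setup is to the limit, you can tally the characters that name and description fields consume. A minimal sketch in Python; the frontmatter parsing here is a naive single-line regex for illustration, not the parser Claude Code itself uses:

```python
from pathlib import Path
import re

# Default pre-load budget from the article; raise it with
# SLASH_COMMAND_TOOL_CHAR_BUDGET at launch.
CHAR_BUDGET = 15_000


def frontmatter_field(text: str, field: str) -> str:
    """Naively extract a single-line frontmatter field (hypothetical helper)."""
    m = re.search(rf"^{field}:\s*(.+)$", text, flags=re.MULTILINE)
    return m.group(1).strip().strip('"') if m else ""


def budget_usage(skills: dict) -> tuple:
    """Sum name + description characters; return (chars_used, fraction_of_budget)."""
    used = sum(len(name) + len(desc) for name, desc in skills.items())
    return used, used / CHAR_BUDGET


if __name__ == "__main__":
    skills = {}
    for skill_md in Path(".claude/skills").glob("*/SKILL.md"):
        text = skill_md.read_text()
        skills[frontmatter_field(text, "name")] = frontmatter_field(text, "description")
    used, frac = budget_usage(skills)
    print(f"{used} chars used ({frac:.0%} of the default budget)")
    for name, desc in skills.items():
        if len(desc) > 1024:  # per-skill description limit discussed below
            print(f"  warning: {name!r} description exceeds 1,024 chars")
```

Run it from your project root; if the total is approaching the budget, some descriptions are at risk of being dropped.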
Root cause 2: YAML formatting issues. Multi-line descriptions using block scalars (> or |) occasionally break the skill loader, especially with auto-formatters like Prettier that reflow text. A skill that parses correctly in isolation can fail when Prettier reformats your SKILL.md.
Fix: Keep your description on a single logical line, and add a comment to block reformatting:
```yaml
---
name: deploy
# prettier-ignore
description: Deploys the application to production. ONLY invoke when the user explicitly says "deploy" or "ship to prod" — never invoke autonomously.
disable-model-invocation: true
---
```
Root cause 3: Claude's goal-focused behavior. Even with the budget issue resolved and valid YAML, autonomous triggering in real sessions achieves roughly a 50% success rate. Claude prioritizes completing the task as it understands it, not checking whether a skill exists for it. The architecture assumes Claude will proactively check its available tools. In practice, it often does not.
## How to Force Reliable Skill Activation
The most reliable pattern for autonomous activation is directive description language — not describing what the skill does, but commanding when it must run:
```yaml
---
name: security-review
description: "ALWAYS invoke this skill when reviewing any code changes before committing. Use for pull request reviews, diff reviews, and any time the user says 'check', 'review', or 'audit' code. DO NOT write security feedback without invoking this skill first."
---
```
The before/after difference in activation rate between descriptive and directive language is significant. In Anthropic's own testing with the Skill Creator, directive descriptions improved triggering on 5 out of 6 public skills they evaluated.
For production pipelines where you need guaranteed activation, use a UserPromptSubmit hook:
```shell
#!/usr/bin/env bash
# ~/.claude/hooks/auto-skill.sh
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty')
if echo "$PROMPT" | grep -qiE '(review|audit|check.*code|security)'; then
  echo "INSTRUCTION: Use Skill(security-review) to handle this request"
fi
```
Register it in .claude/settings.json:
```json
{
  "hooks": {
    "UserPromptSubmit": [{
      "hooks": [{
        "type": "command",
        "command": "~/.claude/hooks/auto-skill.sh"
      }]
    }]
  }
}
```
The key distinction: the hook injects "Use Skill(security-review)" — not "Check if there are relevant skills." The explicit tool call instruction is what reliably fires, not the vague reminder.
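You can sanity-check the keyword filter before wiring it into a session. A minimal sketch that inlines the hook's grep pattern as a function so it runs standalone:

```shell
# Inline copy of the hook's keyword filter, for testing the regex in isolation
match_skill() {
  printf '%s' "$1" | grep -qiE '(review|audit|check.*code|security)'
}

if match_skill "please review this diff before merging"; then
  echo "INSTRUCTION: Use Skill(security-review) to handle this request"
fi
```

The same approach works for piping a fake JSON payload through the full script, e.g. `echo '{"prompt": "audit this module"}' | bash ~/.claude/hooks/auto-skill.sh`.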
## What Does Anthropic's Skill Creator Actually Do?
Skill Creator is Anthropic's answer to the "I built this skill and have no idea if it works" problem. It shipped its major eval/benchmark upgrade on March 3, 2026, and is available at claude.com/plugins/skill-creator with 50,000+ installs.
It operates in four modes:
- Create: Interactive Q&A that generates a SKILL.md structure from your description
- Eval: You write test cases; it runs your skill against those cases and scores outputs
- Improve: Analyzes eval failures and suggests targeted changes to your description or instructions
- Benchmark: Runs a standardized assessment across your full eval set and tracks pass rate, elapsed time, and token usage across runs
The Eval and Benchmark modes are the ones that matter. The "faith-based automation" problem — build a skill, deploy it, hope it works — is the core failure mode. Skill Creator gives you measurable pass rates.
Invoke it directly:
```text
"Run evals on my code-review skill"
"Benchmark my deploy skill across 10 runs and show variance"
"Compare version 1 vs version 2 of my security-review skill"
```
The Comparator sub-agent runs blind A/B comparisons between two skill versions without knowing which is which, eliminating confirmation bias in self-evaluation.
## How to Structure a SKILL.md That Scales
The description field has a 1,024-character limit. Use it for trigger keywords, not instructions. Keep the SKILL.md body under 500 lines. For complex workflows, use supporting files:
```text
.claude/skills/deploy/
├── SKILL.md        # Trigger logic + high-level steps
├── checklist.md    # Full deployment checklist
└── scripts/
    └── verify.sh   # Post-deploy verification
```
Inside SKILL.md, reference supporting files instead of inlining:
```markdown
---
name: deploy
description: "Deploys application to production environments. Invoke when user says 'deploy', 'ship', 'release', or 'push to prod'. ALWAYS require explicit confirmation before running."
context: fork
disable-model-invocation: true
---

Follow the checklist in @checklist.md exactly.
After deploying, run @scripts/verify.sh and report results.
```
The `context: fork` flag runs the skill in an isolated subagent context, which prevents deployment operations from contaminating your main conversation state.
## What Does the Invocation Control Matrix Look Like?
Three frontmatter flags control when and how skills fire:
| Configuration | You can invoke | Claude auto-invokes | Use case |
|---|---|---|---|
| Default (no flags) | Yes | Yes | Research, formatting, analysis |
| `disable-model-invocation: true` | Yes | No | Deploy, commit, send messages |
| `user-invocable: false` | No | Yes | Background quality checks |
For anything with side effects (deploying, committing, publishing), always use `disable-model-invocation: true`. You do not want Claude autonomously deciding to deploy because you mentioned the word "ship."
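The third row of the matrix, a skill Claude can fire on its own but that you cannot call by name, is the one the article does not show. A hypothetical sketch of such a background check (the skill name and description here are illustrative, not from the article):

```yaml
---
name: background-lint
description: "Automatically checks edited files for style violations after any code change. Runs silently; reports only when violations are found."
user-invocable: false
---
```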
## Where to Find 1,200+ Community Skills
The community skills library at github.com/travisvn/awesome-claude-skills aggregates the most-starred skill collections. Notable sources:
- `obra/superpowers` (40,900+ stars): `/brainstorm`, `/write-plan`, and 20+ battle-tested general workflow skills
- `alirezarezvani/claude-skills` (4,400+ stars): 180+ production-ready skills across coding, writing, and deployment workflows
- Official Anthropic skills at `github.com/anthropics/skills`: `pdf`, `docx`, `pptx`, `xlsx`, `slack-gif-creator`, `webapp-testing`, and `brand-guidelines`
Install via Claude Code:
```text
/plugin add /path/to/skill-directory
```
All skills follow the agentskills.io open standard released December 2025, which means the same SKILL.md format works across Claude.ai, Claude Code, the API, and compatible tools like OpenCode.
## Key Takeaways
- Token budget overflow is the silent killer. Set `SLASH_COMMAND_TOOL_CHAR_BUDGET=30000` if you have more than three skills and wonder why they stop firing after you add new ones.
- Directive description language outperforms descriptive language for autonomous triggering. "ALWAYS invoke when..." beats "Helps with..." every time.
- Hooks give you guaranteed activation. For production workflows, a `UserPromptSubmit` hook with an explicit `"Use Skill(name)"` instruction is more reliable than description-based autonomous invocation.
- Skill Creator's Eval mode makes skills measurable. Establish a baseline pass rate before optimizing; otherwise you are guessing whether your changes helped.
- `disable-model-invocation: true` is required for any side-effectful skill. Deployments, commits, and API calls should never auto-invoke.
## What This Means for Builders
- Start with evals, not instructions. Write three test cases for your skill before writing the SKILL.md body. This clarifies what "working" actually means before you spend time on documentation.
- The description is the trigger, not the instructions. Spend 70% of your SKILL.md iteration time on the description field. A perfect 500-line instruction set with a vague description will never fire.
- Skill complexity caps out around 500 lines. Beyond that, refactor into supporting files with `@file` references. Large monolithic SKILL.md files are harder to test and harder for Claude to follow precisely.
- Every multi-step production workflow needs a hook. If your automation pipeline requires a specific skill to always run at a specific point, encode that requirement in a hook, not in a description you hope Claude will read correctly.