The core problem with Claude Code skills is a token budget overflow that silently drops your skill descriptions before Claude ever reads them. If you have built skills, tested them manually, and then watched Claude ignore them in real sessions — you are not doing it wrong. You are hitting a documented architectural limitation that most developers never find.
This article covers the full picture: how Claude Code skills actually work under the hood, the three root causes of missed triggers, and the reliable fixes — including when to use Anthropic's new Skill Creator tool to measure and iterate.
## What Is a Claude Code Skill, Actually?
A Claude Code skill is a SKILL.md file in .claude/skills/<name>/ that Claude loads dynamically when it decides the skill is relevant to your request. It is not a plugin, not a system prompt injection, and not a function call. It is closer to a context-gated instruction block — Claude reads the description, decides if it matches the current task, and only then loads the full instructions.
The format is straightforward:
```markdown
---
name: code-review
description: >
  Reviews code for security issues, performance problems, and style violations.
  Use when reviewing a pull request, inspecting a function, or when the user
  asks "review this code" or "check this for bugs."
allowed-tools: Read, Grep
---

When reviewing code, always check:

1. SQL injection and input validation
2. Authentication and authorization checks
3. Error handling completeness
4. Performance bottlenecks (N+1 queries, missing indexes)
```
The description field is doing most of the work. It is what Claude reads to decide whether to invoke the skill.
## Why Do Claude Code Skills Fail to Trigger?
Claude Code skills fail to trigger for three distinct reasons, and each has a different fix.
Root cause 1: Token budget overflow. At session startup, all skill names and descriptions are pre-loaded into the system prompt, subject to a default budget of ~15,000 characters (~4,000 tokens). When you exceed this budget — which happens faster than you expect with five or six verbose skills — some descriptions get silently truncated. Claude literally never sees them. There is no error message.
Fix: Set the environment variable before launching Claude:
```shell
SLASH_COMMAND_TOOL_CHAR_BUDGET=30000 claude
```
This doubles the available budget. For heavy skill setups, set it in your shell profile so it persists.
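To see how close your setup is to the limit, you can tally the characters that name and description fields consume. A minimal sketch in Python; the frontmatter parsing here is a naive single-line regex for illustration, not the parser Claude Code itself uses:

```python
from pathlib import Path
import re

# Default pre-load budget from the article; raise it with
# SLASH_COMMAND_TOOL_CHAR_BUDGET at launch.
CHAR_BUDGET = 15_000


def frontmatter_field(text: str, field: str) -> str:
    """Naively extract a single-line frontmatter field (hypothetical helper)."""
    m = re.search(rf"^{field}:\s*(.+)$", text, flags=re.MULTILINE)
    return m.group(1).strip().strip('"') if m else ""


def budget_usage(skills: dict) -> tuple:
    """Sum name + description characters; return (chars_used, fraction_of_budget)."""
    used = sum(len(name) + len(desc) for name, desc in skills.items())
    return used, used / CHAR_BUDGET


if __name__ == "__main__":
    skills = {}
    for skill_md in Path(".claude/skills").glob("*/SKILL.md"):
        text = skill_md.read_text()
        skills[frontmatter_field(text, "name")] = frontmatter_field(text, "description")
    used, frac = budget_usage(skills)
    print(f"{used} chars used ({frac:.0%} of the default budget)")
    for name, desc in skills.items():
        if len(desc) > 1024:  # per-skill description limit discussed below
            print(f"  warning: {name!r} description exceeds 1,024 chars")
```

Run it from your project root; if the total is approaching the budget, some descriptions are at risk of being dropped.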
Root cause 2: YAML formatting issues. Multi-line descriptions using block scalars (> or |) occasionally break the skill loader, especially with auto-formatters like Prettier that reflow text. A skill that parses correctly in isolation can fail when Prettier reformats your SKILL.md.
Fix: Keep your description on a single logical line, and add a comment to block reformatting:
```yaml
---
name: deploy
# prettier-ignore
description: Deploys the application to production. ONLY invoke when the user explicitly says "deploy" or "ship to prod" — never invoke autonomously.
disable-model-invocation: true
---
```
Root cause 3: Claude's goal-focused behavior. Even with the budget issue resolved and valid YAML, autonomous triggering in real sessions achieves roughly a 50% success rate. Claude prioritizes completing the task as it understands it, not checking whether a skill exists for it. The architecture assumes Claude will proactively check its available tools. In practice, it often does not.
## How to Force Reliable Skill Activation
The most reliable pattern for autonomous activation is directive description language — not describing what the skill does, but commanding when it must run:
```yaml
---
name: security-review
description: "ALWAYS invoke this skill when reviewing any code changes before committing. Use for pull request reviews, diff reviews, and any time the user says 'check', 'review', or 'audit' code. DO NOT write security feedback without invoking this skill first."
---
```
The before/after difference in activation rate between descriptive and directive language is significant. In Anthropic's own testing with the Skill Creator, directive descriptions improved triggering on 5 out of 6 public skills they evaluated.
For production pipelines where you need guaranteed activation, use a UserPromptSubmit hook:
```shell
#!/usr/bin/env bash
# ~/.claude/hooks/auto-skill.sh
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty')
if echo "$PROMPT" | grep -qiE '(review|audit|check.*code|security)'; then
  echo "INSTRUCTION: Use Skill(security-review) to handle this request"
fi
```
Register it in .claude/settings.json:
```json
{
  "hooks": {
    "UserPromptSubmit": [{
      "hooks": [{
        "type": "command",
        "command": "~/.claude/hooks/auto-skill.sh"
      }]
    }]
  }
}
```
The key distinction: the hook injects "Use Skill(security-review)" — not "Check if there are relevant skills." The explicit tool call instruction is what reliably fires, not the vague reminder.
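You can sanity-check the keyword filter before wiring it into a session. A minimal sketch that inlines the hook's grep pattern as a function so it runs standalone:

```shell
# Inline copy of the hook's keyword filter, for testing the regex in isolation
match_skill() {
  printf '%s' "$1" | grep -qiE '(review|audit|check.*code|security)'
}

if match_skill "please review this diff before merging"; then
  echo "INSTRUCTION: Use Skill(security-review) to handle this request"
fi
```

The same approach works for piping a fake JSON payload through the full script, e.g. `echo '{"prompt": "audit this module"}' | bash ~/.claude/hooks/auto-skill.sh`.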
## What Does Anthropic's Skill Creator Actually Do?
Skill Creator is Anthropic's answer to the "I built this skill and have no idea if it works" problem. It shipped its major eval/benchmark upgrade on March 3, 2026, and is available at claude.com/plugins/skill-creator with 50,000+ installs.
It operates in four modes:
- Create: Interactive Q&A that generates a SKILL.md structure from your description
- Eval: You write test cases; it runs your skill against those cases and scores outputs
- Improve: Analyzes eval failures and suggests targeted changes to your description or instructions
- Benchmark: Runs a standardized assessment across your full eval set and tracks pass rate, elapsed time, and token usage across runs
The Eval and Benchmark modes are the ones that matter. The "faith-based automation" problem — build a skill, deploy it, hope it works — is the core failure mode. Skill Creator gives you measurable pass rates.
Invoke it directly:
```text
"Run evals on my code-review skill"
"Benchmark my deploy skill across 10 runs and show variance"
"Compare version 1 vs version 2 of my security-review skill"
```
The Comparator sub-agent runs blind A/B comparisons between two skill versions without knowing which is which, eliminating confirmation bias in self-evaluation.
## How to Structure a SKILL.md That Scales
The description field has a 1,024-character limit. Use it for trigger keywords, not instructions. Keep the SKILL.md body under 500 lines. For complex workflows, use supporting files:
```text
.claude/skills/deploy/
├── SKILL.md        # Trigger logic + high-level steps
├── checklist.md    # Full deployment checklist
└── scripts/
    └── verify.sh   # Post-deploy verification
```
Inside SKILL.md, reference supporting files instead of inlining:
```markdown
---
name: deploy
description: "Deploys application to production environments. Invoke when user says 'deploy', 'ship', 'release', or 'push to prod'. ALWAYS require explicit confirmation before running."
context: fork
disable-model-invocation: true
---

Follow the checklist in @checklist.md exactly.
After deploying, run @scripts/verify.sh and report results.
```
The `context: fork` flag runs the skill in an isolated subagent context, which prevents deployment operations from contaminating your main conversation state.
## What Does the Invocation Control Matrix Look Like?
Three frontmatter flags control when and how skills fire:
| Configuration | You can invoke | Claude auto-invokes | Use case |
|---|---|---|---|
| Default (no flags) | Yes | Yes | Research, formatting, analysis |
| `disable-model-invocation: true` | Yes | No | Deploy, commit, send messages |
| `user-invocable: false` | No | Yes | Background quality checks |
For anything with side effects (deploying, committing, publishing), always use `disable-model-invocation: true`. You do not want Claude autonomously deciding to deploy because you mentioned the word "ship."
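The third row of the matrix, a skill Claude can fire on its own but that you cannot call by name, is the one the article does not show. A hypothetical sketch of such a background check (the skill name and description here are illustrative, not from the article):

```yaml
---
name: background-lint
description: "Automatically checks edited files for style violations after any code change. Runs silently; reports only when violations are found."
user-invocable: false
---
```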
## Where to Find 1,200+ Community Skills
The community skills library at github.com/travisvn/awesome-claude-skills aggregates the most-starred skill collections. Notable sources:
- `obra/superpowers` (40,900+ stars): `/brainstorm`, `/write-plan`, and 20+ battle-tested general workflow skills
- `alirezarezvani/claude-skills` (4,400+ stars): 180+ production-ready skills across coding, writing, and deployment workflows
- Official Anthropic skills at `github.com/anthropics/skills`: `pdf`, `docx`, `pptx`, `xlsx`, `slack-gif-creator`, `webapp-testing`, and `brand-guidelines`
Install via Claude Code:
```text
/plugin add /path/to/skill-directory
```
All skills follow the agentskills.io open standard released December 2025, which means the same SKILL.md format works across Claude.ai, Claude Code, the API, and compatible tools like OpenCode.
## Key Takeaways
- Token budget overflow is the silent killer. Set `SLASH_COMMAND_TOOL_CHAR_BUDGET=30000` if you have more than three skills and wonder why they stop firing after you add new ones.
- Directive description language outperforms descriptive language for autonomous triggering. "ALWAYS invoke when..." beats "Helps with..." every time.
- Hooks give you guaranteed activation. For production workflows, a `UserPromptSubmit` hook with an explicit `"Use Skill(name)"` instruction is more reliable than description-based autonomous invocation.
- Skill Creator's Eval mode makes skills measurable. Establish a baseline pass rate before optimizing; otherwise you are guessing whether your changes helped.
- `disable-model-invocation: true` is required for any side-effectful skill. Deployments, commits, and API calls should never auto-invoke.
## What This Means for Builders
- Start with evals, not instructions. Write three test cases for your skill before writing the SKILL.md body. This clarifies what "working" actually means before you spend time on documentation.
- The description is the trigger, not the instructions. Spend 70% of your SKILL.md iteration time on the description field. A perfect 500-line instruction set with a vague description will never fire.
- Skill complexity caps out around 500 lines. Beyond that, refactor into supporting files with `@file` references. Large monolithic SKILL.md files are harder to test and harder for Claude to follow precisely.
- Every multi-step production workflow needs a hook. If your automation pipeline requires a specific skill to always run at a specific point, encode that requirement in a hook, not in a description you hope Claude will read correctly.