Most teams misuse Claude Skills in one of two ways. They either turn SKILL.md into a dumping ground, or they never graduate from giant copy-pasted prompts.
Both approaches are sloppy. If you want Skills to work in a real dev workflow, you need to treat them like code and operations logic, not like prompt poetry.
Claude Skills are directories anchored by SKILL.md, with optional scripts, references, and assets. They work because of progressive disclosure. The agent starts by loading only compact metadata such as the skill name and description, then reads the full instructions only when the task matches. That lets an agent keep many skills available without bloating every session from the start.
Anthropic's own guidance makes the intended division of labour pretty clear. CLAUDE.md is for durable, always-on project context. Skills are for reusable knowledge, playbooks, and invocable workflows that should load on demand. Claude Code even folded old custom commands into the same mechanism, so legacy .claude/commands/*.md files still work, but Skills are now the better long-term shape — and the most reusable building block in any AI-powered development workflow.
## When to Use Claude Skills: CLAUDE.md vs Skills vs Hooks
A Claude Skill is worth creating when you keep pasting the same checklist, the same deployment playbook, the same code review rubric, or the same internal API gotchas into chat. Anthropic explicitly recommends creating a skill when you keep reusing the same procedure, or when a section of CLAUDE.md has grown into a process rather than a fact. That is the practical answer to the FAQ question "What is a Claude Skill and when should you use one". Use a Skill for repeatable procedure, not for general taste or broad repo rules.
The real win is control over context cost and behaviour. A good Skill is loaded only when relevant, while a bloated CLAUDE.md is loaded every session. Anthropic recommends keeping CLAUDE.md short and moving domain knowledge or procedures into Skills precisely because on-demand loading keeps the agent focused on the task in front of it.
My opinionated rule is simple. If the instruction should apply every single session, it belongs in CLAUDE.md. If the instruction is a reusable method, checklist, or workflow that matters only sometimes, it belongs in a Skill. If the action must happen automatically on every matching event, it probably belongs in a hook, not a Skill. Anthropic's feature overview frames those tools in almost exactly that layering model.
| Tool | Loading | When to use |
|---|---|---|
| CLAUDE.md | Always loaded | Project facts, durable conventions, repo-wide rules |
| Skill | Loaded on demand | Repeatable procedures, playbooks, domain checklists |
| Hook | Event-triggered | Automatic side effects on file save, commit, or session start |
A practical smell for each: if you find yourself pasting the same instructions into every chat, that is a Skill. If a CLAUDE.md section has grown into a step-by-step process, extract it into a Skill. If you want something to fire silently every time a file is saved, write a hook instead.
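For the hook case, the rough shape in Claude Code is a `settings.json` entry rather than a markdown file. A sketch, assuming the current Claude Code hooks configuration (`./scripts/format-changed.sh` is a hypothetical script; check the hooks reference for exact event names and the JSON payload the command receives on stdin):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "./scripts/format-changed.sh" }
        ]
      }
    ]
  }
}
```

The point of the contrast: this fires silently on every matching tool event, with no routing decision by the model, which is exactly what a Skill should not do.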
## Claude Skills IDE Support: VS Code, JetBrains, Cursor, and Codex
Claude Code runs across the CLI, Desktop, VS Code, JetBrains, the web, and mobile remote-control flows. Anthropic describes the CLI as the most complete local surface, while the IDE integrations trade some CLI-only capabilities for editor-native review, file context, and tighter workflow ergonomics. Configuration, project memory, and MCP servers are shared across the local surfaces, so your .claude setup follows you rather than being trapped in one editor.
For VS Code, Anthropic says the extension is the recommended interface inside the editor. It provides plan review, inline diffs, file mention support, and integrated access to the CLI. The same install flow also exposes a direct path for Cursor. For JetBrains, the current supported list includes IntelliJ IDEA, PyCharm, Android Studio, WebStorm, PhpStorm, and GoLand, with diff viewing, selection sharing, file-reference shortcuts, and diagnostic sharing built into the plugin.
JetBrains support is better than many developers realise. If you run claude from the IDE's integrated terminal, the integration features are active automatically. If you start from an external terminal, Anthropic documents the /ide command to connect Claude Code back to the JetBrains session, and it explicitly recommends launching from the same project root so Claude sees the same files your IDE sees. If you use auto-edit modes in JetBrains, Anthropic also warns that IDE configuration files can become part of the editable surface, so manual approvals are the safer default in that environment.
Now the bigger point. Claude Skills are not only a Claude Code thing. Agent Skills is an open standard. The official Agent Skills quickstart says the same skill can work in VS Code with GitHub Copilot, Claude Code, and OpenAI Codex, and OpenAI's own Codex docs say Skills are available in the Codex CLI, IDE extension, and app. The Agent Skills implementation guide adds an important portability detail: .agents/skills has emerged as the cross-client convention, while some clients also scan .claude/skills for pragmatic compatibility.
So here is the practical compatibility rule I recommend. If you are building for Claude Code first and only, author in .claude/skills. If you genuinely want cross-client portability, target the open Agent Skills shape and use .agents/skills as the canonical path. Do not pretend those two goals are identical. They are related, not identical.
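If you want both without duplicating files, one pragmatic bridge is a symlink from the cross-client path to the Claude-native one. A sketch only: whether a given client follows symlinks is an assumption you should verify, and the `mktemp` scratch directory is for demonstration (run from your repo root in practice):

```shell
# Demo in a scratch directory; in a real repo, run these from the repo root.
cd "$(mktemp -d)"
mkdir -p .claude/skills
mkdir -p .agents
# Expose the existing Claude-native skills under the open .agents/skills convention.
ln -s ../.claude/skills .agents/skills
ls -ld .agents/skills
```

The symlink keeps one source of truth; clients scanning either path see the same skill directories.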
Quick compatibility reference:
| Client | Skills path | Notes |
|---|---|---|
| Claude Code CLI | `.claude/skills/` or `~/.claude/skills/` | Most complete surface; full `allowed-tools` support |
| VS Code + Claude extension | `.claude/skills/` | Inline diffs, plan review, file mention |
| Cursor | `.claude/skills/` | Same install path as VS Code |
| JetBrains (IDEA, PyCharm, etc.) | `.claude/skills/` | Run `claude` from IDE terminal or use `/ide` to reconnect |
| GitHub Copilot, OpenAI Codex | `.agents/skills/` | Open Agent Skills standard; cross-client portability |
| Claude.ai web | Upload via UI | Dir name must match `name` field; 200-char description cap |
## SKILL.md File Structure, Folder Layout, and Storage Locations
A proper Skill is a folder, not a random markdown file sitting at repo root. The core specification requires a directory with a SKILL.md file and allows optional scripts/, references/, and assets/ directories. SKILL.md must contain YAML frontmatter followed by markdown instructions. In the spec, name and description are required, name is limited to 64 characters using lowercase letters, numbers, and hyphens, compatibility is only for real environment requirements, and allowed-tools is explicitly experimental across implementations.
Claude Code is a bit looser than the portable spec because it can derive a name from the directory and fall back to the first paragraph when description is missing. You should not rely on that if you care about portability or predictability. Claude.ai requires the directory name to match the name field, and its custom-skill upload path caps descriptions at 200 characters even though the broader spec allows much more. The portable choice is to set an explicit name, keep the directory identical, and write a precise description that fits in tight limits. That answers the FAQ topic "What should a SKILL.md file contain" without hand-waving.
Start from a structure this boring:
```
repo/
  .claude/
    skills/
      review-pr/
        SKILL.md
        scripts/
          review.sh
        references/
          checklist.md
        assets/
          comment-template.md
```
If portability across Skills-compatible clients matters more than Claude Code convenience, keep the same internal shape and swap .claude/skills/ for .agents/skills/. The folder structure is the same idea either way.
For Claude Code, the storage locations are straightforward. Project skills live at .claude/skills/<skill-name>/SKILL.md. Personal skills live at ~/.claude/skills/<skill-name>/SKILL.md. Plugin-distributed skills live under <plugin>/skills/<skill-name>/SKILL.md. Anthropic documents precedence across the built-in scopes as enterprise over personal over project, while plugin skills avoid collisions by using a namespaced form such as plugin-name:skill-name. On Windows, ~/.claude resolves to %USERPROFILE%\.claude, and CLAUDE_CONFIG_DIR can relocate the whole base directory.
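The project-scope layout is boring enough to stamp out with a tiny scaffolder. A sketch, where `new-skill.sh`, the default `review-pr` name, and the placeholder description are all hypothetical (the `mktemp` line is for demonstration; drop it and run from your repo root):

```shell
#!/usr/bin/env bash
# new-skill.sh: hypothetical scaffolder for the project-scope layout above.
set -euo pipefail
cd "$(mktemp -d)"                       # demo only; run from the repo root in practice
NAME="${1:-review-pr}"                  # kebab-case, per Anthropic's naming guidance
ROOT=".claude/skills/${NAME}"
mkdir -p "${ROOT}/scripts" "${ROOT}/references" "${ROOT}/assets"
cat > "${ROOT}/SKILL.md" <<EOF
---
name: ${NAME}
description: TODO: what it does, then "Use when ..." trigger phrases.
---

# ${NAME}
EOF
echo "created ${ROOT}"
```

Setting `name` explicitly in the generated frontmatter, matching the directory, keeps the skill portable to clients that require the two to agree.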
The choice between project and personal scope is straightforward. Use .claude/skills/ inside the repo when the Skill is tightly coupled to that codebase — for example, a deploy playbook that knows your specific cluster names or a review rubric tuned to your team's conventions. Use ~/.claude/skills/ for Skills that travel with you across projects: personal checklists, generic changelog generators, preferred debugging workflows. Anything you would put in a dotfiles repo belongs in personal scope.
A few sharp edges are worth memorising. SKILL.md must be named exactly with that casing. Anthropic's PDF guide recommends kebab-case folder names and explicitly says not to place a README.md inside the skill folder, because the operative documentation should live in SKILL.md or references/. That same guide also stresses that SKILL.md naming is case-sensitive. These are boring constraints, but boring constraints are what make tooling reliable.
Claude Code also does the right thing for monorepos. It automatically discovers nested .claude/skills/ directories when you work inside subdirectories, which is ideal for package-level or service-level skills. It also watches existing skill directories for live changes during the current session. The one restart trap is creating a top-level skills directory that did not exist when the session started. Anthropic documents that as the case where you do need to restart so the new directory can be watched.
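A quick way to audit what nested discovery will find is a plain `find` over the monorepo. The fixture below (`packages/api`, `deploy-api`) is a hypothetical demo structure built in a scratch directory; in practice run the `find` line from your monorepo root:

```shell
# Build a hypothetical nested fixture in a scratch directory for demonstration.
cd "$(mktemp -d)"
mkdir -p packages/api/.claude/skills/deploy-api
touch packages/api/.claude/skills/deploy-api/SKILL.md

# List every SKILL.md anywhere under a .claude/skills directory.
find . -type f -path '*/.claude/skills/*/SKILL.md'
```

Running this before and after adding a package-level skill is a cheap sanity check that the file landed where discovery actually looks.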
## Claude Skills Best Practices: Descriptions, Scripts, and Scope
The fastest way to create a useless Skill is to ask an LLM to invent one from generic training knowledge. Anthropic's best-practices guide warns against exactly that. The valuable bits are the domain-specific corrections, edge cases, tool choices, and conventions the model would not reliably invent on its own. The right workflow is to solve the task once with the agent, correct it until it works, then extract the method into a Skill.
Scope the Skill like a good function, not like a wiki. Anthropic says Skills should encapsulate a coherent unit of work. Too narrow, and you force multiple skills to stack for one task. Too broad, and the agent cannot activate them precisely. The best-practices guide is blunt that overly comprehensive skills can hurt more than they help because the model chases irrelevant instructions and loses the signal.
Description quality is not a cosmetic concern. It is the routing layer. Both Anthropic and the Agent Skills docs say the description field is the primary mechanism the model uses to decide whether to load a Skill at all. Good descriptions say what the Skill does, when to use it, and the trigger phrases or file types a user would actually mention. Bad descriptions are vague, overly technical, or broad enough to match nonsense. That is the real answer to the FAQ question "Why is a Claude Skill not triggering". Usually the router is bad, not the model.
The contrast is clear side by side:
Bad descriptions — too vague to route reliably:
- "Helps with code review" — matches everything, disambiguates nothing
- "Useful for development tasks" — broader than a search query
- "Assists with writing" — not a router, just a category label
Good descriptions — specific trigger language:
- "Review pull requests for security issues, migration risk, and missing tests. Use when reviewing a PR, git diff, or release critical change."
- "Generate a changelog from git log output. Use when preparing a release, writing release notes, or summarising commits since last tag."
- "Scaffold a new Go HTTP handler with request validation and error middleware. Use when adding a new endpoint or route to a Go service."
The pattern is the same each time: state what the Skill does, name the exact user phrases that should activate it, and optionally name file types or tools that are relevant. If your description would match a generic Google query, it is not specific enough.
If a workflow has side effects, make it manual. Claude Code exposes that directly. `disable-model-invocation: true` makes a Skill user-invoked only, which Anthropic recommends for actions like deploys, commits, or outbound messages. `user-invocable: false` goes the other way and hides the Skill from the slash menu while still letting Claude use it as background knowledge. That answers the FAQ topic "When should a skill be manual instead of automatic" in one sentence: manual for risk, automatic for safe repeatable guidance.
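In frontmatter, the manual case looks like this; the skill name and description are illustrative, not from any official example:

```yaml
---
name: deploy-service
description: Deploy the service to the production cluster. Use only when explicitly asked to deploy.
disable-model-invocation: true   # slash-menu only; Claude cannot auto-trigger this
---
```

The inverse, `user-invocable: false`, is the right shape for background playbooks the user should never need to invoke by name.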
Keep SKILL.md small enough to stay intelligible. Anthropic recommends keeping it under 500 lines and around 5,000 tokens, then moving detailed material into references/ or similar files with explicit loading instructions. "Read references/api-errors.md if the API returns a non-200" is a good pattern. "See references/" is lazy. Claude Code also injects the rendered Skill into the conversation as a message and does not keep re-reading the file on later turns. After context compaction, only recent Skill content is carried forward within token budgets. Huge Skills are therefore not merely ugly. They are brittle over long sessions.
A good SKILL.md can stay very plain:
```markdown
---
name: review-pr
description: Review pull requests for security issues, migration risk, and missing tests. Use when reviewing a PR, git diff, or release critical change.
compatibility: Designed for Claude Code. Requires git and gh.
disable-model-invocation: true
allowed-tools: Bash(git diff *) Bash(gh pr diff *) Read Grep Glob
---

# Review PR

Read references/checklist.md before running any commands.

1. Collect the diff and changed files.
2. Flag correctness, security, and test coverage issues.
3. Return findings grouped by severity with file references.
4. Suggest the smallest safe fix first.
```
Use scripts when determinism matters more than eloquence. The Skills scripts guide is excellent here. It says agent-facing scripts must avoid interactive prompts, document usage through --help, emit helpful error messages, prefer structured output such as JSON or CSV on stdout, send diagnostics to stderr, and support retry-safe use. It also recommends pinning one-off tool versions and describing runtime requirements explicitly in SKILL.md or the compatibility field rather than assuming the environment has the right packages.
A minimal but correct agent-facing script looks like this:
```bash
#!/usr/bin/env bash
# scripts/collect-diff.sh — called by review-pr skill
# Usage: collect-diff.sh <base-ref> [<head-ref>]
set -euo pipefail

BASE="${1:?Usage: collect-diff.sh <base-ref> [<head-ref>]}"
HEAD="${2:-HEAD}"

# Structured output to stdout so the agent can parse it
git diff "${BASE}...${HEAD}" --name-only \
  | jq -Rs '{changed_files: split("\n") | map(select(length > 0))}' \
  || { printf '{"error":"git diff failed"}\n' >&2; exit 1; }
```
Three things make this agent-safe. `set -euo pipefail` ensures the script exits loudly on any failure rather than silently proceeding. JSON on stdout gives the agent a format it can parse without guessing. Diagnostics go to stderr so the agent's stdout stream stays clean. None of this is clever. All of it is necessary.
One subtle trap is `allowed-tools`. In the spec it is experimental and support varies. In Claude Code it pre-approves specific tools while the Skill is active, but it does not restrict the universe of callable tools, and deny rules still belong in Claude Code permissions. In the Claude Agent SDK, Anthropic explicitly says the `allowed-tools` frontmatter in SKILL.md does not apply, so SDK apps must enforce tool access in the main `allowed_tools` or `allowedTools` configuration instead. If you ignore that difference, your Skill will behave differently in the CLI and in SDK-powered automation.
One more advanced pattern is worth stealing. When a workflow would flood your main thread with logs, file searches, or long research output, Claude Code lets a Skill run in a forked subagent using `context: fork` and an agent such as Explore. Anthropic shows this for research workflows, where the heavy lifting happens in isolated context and the main conversation gets the summary. For deep codebase exploration, that is a much better design than a giant inline Skill that pollutes the main session.
A forked Skill looks like this in frontmatter:
```markdown
---
name: explore-codebase
description: Deep exploration of an unfamiliar codebase. Use when onboarding to a new repo, auditing architecture, or mapping module dependencies.
context: fork
agent: Explore
compatibility: Requires Claude Code CLI.
---

# Explore Codebase

1. Walk the directory tree and summarise the top-level modules.
2. Identify the main entry points and their responsibilities.
3. Map the dependency graph between packages.
4. Return a structured summary to the main session — not the raw file list.
```
The key line is `context: fork`. Without it, the exploration output lands inline in your conversation. With it, the subagent runs in its own context window and hands back a summary. The difference matters on large repos where exploration alone can consume thousands of tokens.
## Testing Claude Skills: Triggers, Correctness, and Baseline Comparisons
A Skill is not tested because one happy-path demo worked once. Anthropic's guide breaks testing into three layers: manual testing in Claude.ai, scripted testing in Claude Code, and programmatic testing via the Skills API. The recommended evaluation areas are triggering, functional correctness, and performance against a baseline without the Skill. That is also the best answer to the FAQ question "How do you test whether a skill is reliable". You test route selection, output quality, and efficiency, not just whether the model sounded confident.
The official eval guidance gives a clean structure for test cases. Each case should include a realistic user prompt, a human-readable description of the expected output, and optional input files. The docs store those in evals/evals.json inside the Skill directory, which is a sensible convention even if you roll your own harness.
Use a fixture file and a no-nonsense eval layout like this:
```json
{
  "skill_name": "review-pr",
  "evals": [
    {
      "id": 1,
      "prompt": "Review this PR for security issues and missing tests",
      "expected_output": "Findings grouped by severity with file references and at least one test recommendation.",
      "files": ["evals/files/pr-diff.patch"]
    },
    {
      "id": 2,
      "prompt": "Summarise last week's commits",
      "expected_output": "The skill should not activate.",
      "files": []
    }
  ]
}
```
My own testing rule is harsher than most teams use, but it lines up with the official guidance. Every serious Skill should have should-trigger queries, should-not-trigger queries, at least one edge-case test, and a baseline comparison without the Skill. Anthropic's examples compare tool calls, failed API calls, clarification loops, and token use with and without the Skill because "works" is not the same as "improves the workflow".
If you test through the Claude Agent SDK, remember the plumbing. Skills are filesystem artefacts there, not programmatic registrations. Anthropic says you must enable the "Skill" tool and load the relevant filesystem settings through `settingSources` or `setting_sources`. If you omit `user` or `project`, or point `cwd` at the wrong place, the SDK will not discover the Skill. Anthropic even recommends asking "What Skills are available?" as a direct discovery check.
Also test on the model and client you actually intend to ship. The open Agent Skills quickstart explicitly warns that tool-use reliability varies across models, and some models may answer directly instead of executing the command the Skill intends. That is not always a Skill design problem. Sometimes it is a model-selection problem, and your test matrix should expose it.
## Claude Skills Troubleshooting: Common Failures and Fixes
When a Skill misbehaves, assume packaging before intelligence. The most common failures are still the boring ones.
- If the Skill is not found at all, verify the file is named exactly `SKILL.md`, with the right case, inside the correct directory. Anthropic's troubleshooting guide calls out filename case explicitly, and its Claude Code and SDK docs point you straight at `.claude/skills/*/SKILL.md` and `~/.claude/skills/*/SKILL.md` as the first checks.
- If frontmatter is invalid, check the YAML delimiters and quotes first. Anthropic's examples show the classic mistakes: missing `---`, unclosed quotes, or invalid names with spaces and capitals. Skill names should be lowercase and hyphenated.
- If the Skill exists but does not trigger, the description is usually too vague. Claude Code's own troubleshooting says to include keywords users would naturally say, verify the Skill appears when you ask "What skills are available?", and try rephrasing closer to the description. Anthropic's PDF guide adds a great diagnostic trick: ask Claude when it would use the Skill and listen to how it paraphrases the description back to you.
- If the Skill triggers too often, narrow the scope. Anthropic recommends making the description more specific, adding negative triggers, and using `disable-model-invocation: true` for workflows you want only by explicit command. Over-triggering is usually just under-specified routing language.
- If the Skill seems to lose influence in long sessions, remember that descriptions can be shortened in the Claude Code catalogue when many skills are present, and invoked Skills are later carried within token budgets after compaction. Anthropic recommends front-loading keywords in the description, trimming excess text, and, for Claude Code specifically, adjusting `SLASH_COMMAND_TOOL_CHAR_BUDGET` if description listings are being squeezed too aggressively.
- If a bundled script hangs or behaves erratically, check whether it expects interactive input. The scripts guide says agents run in non-interactive shells, so TTY prompts, password dialogs, and confirmation menus are design bugs. Accept input through flags, environment variables, or stdin and make failures explicit.
- If the SDK does not see your Skill, confirm that `allowed_tools` includes `"Skill"`, that `settingSources` or `setting_sources` contains `user` and/or `project`, and that `cwd` points at the directory that actually contains `.claude/skills/`. Without that setup, the Skill system is not enabled no matter how correct your markdown looks.
- If an MCP-backed Skill loads but the tool calls fail, Anthropic's troubleshooting checklist is sensible: verify the MCP server is connected, confirm authentication and scopes, test the MCP tool directly without the Skill, then check the exact tool names because they are case-sensitive.
The boring truth is that good Claude Skills look like good operational engineering. Clear names. Small files. Explicit triggers. Deterministic scripts where needed. Real tests. If your Skill reads like a crisp runbook, the agent has a fighting chance. If it reads like a brainstorm, you have simply hidden chaos in a folder.