TL;DR
Claude Code Skills are custom extensions that automate and optimize specific developer workflows in Claude. Use the Skill Creator system to define your skill's purpose, draft the SKILL.md, create test cases, run benchmarks, and iterate until the skill triggers reliably and performs well.
Introduction
If you use Claude Code daily, you likely repeat certain sequences: initializing projects, running tests, formatting outputs, and so on. Instead of explaining your workflow every time, Claude Code Skills let you encode these steps once and reuse them indefinitely. The Skill Creator system provides an automated, structured pathway for building, evaluating, and refining these custom skills for your workflow.
This guide covers the end-to-end process: skill anatomy, creation workflow, evaluation, optimization, and practical examples from Anthropic's official skills repository.
💡 Tip: Building API-related skills? Apidog integrates seamlessly, letting you test endpoints, validate responses, and generate docs in a unified skill workflow.
What Are Claude Code Skills?
Claude Code Skills are markdown-based instruction sets that extend Claude's built-in capabilities. Treat them like custom plugins for repeatable developer tasks.
The Skill System Architecture
Skills use a three-level loading system:
- Metadata (~100 words): Name and description, always in context
- SKILL.md body (<500 lines): Core instructions, loaded when skill triggers
- Bundled resources (unlimited): Scripts, references, assets loaded on demand
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (name, description)
│   └── Markdown instructions
└── Bundled Resources (optional)
    ├── scripts/
    ├── references/
    └── assets/
When Skills Trigger
Skills appear in Claude's available_skills list. Claude consults a skill when a request matches its description; simple tasks Claude can handle directly won't invoke one, so only complex, multi-step workflows trigger skills reliably.
Real-World Examples
| Skill | Purpose | Key Features |
|---|---|---|
| skill-creator | Create new skills | Test generation, benchmark evaluation, description tuning |
| mcp-builder | Build MCP servers | Python/Node templates, evaluation framework |
| docx | Generate Word documents | python-docx scripts, templates, styling guide |
| pdf | Extract/manipulate PDFs | Form handling, extraction, reference docs |
| frontend-design | Build web interfaces | Component library, Tailwind, accessibility checks |
The Skill Creation Workflow
Follow this systematic loop:
- Capture intent: Define the skill's purpose
- Write a draft: Create the SKILL.md file
- Create test cases: Define realistic prompts
- Run evaluations: Execute with and without the skill
- Review results: Analyze feedback and metrics
- Iterate: Refine based on findings
- Optimize description: Improve trigger accuracy
- Package: Distribute as a .skill file
Step 1: Capture Intent
Clarify what the skill should do. Extract patterns from your workflow history.
Key questions:
- What outcome should the skill achieve?
- When should it trigger (user phrases/contexts)?
- What output formats are expected?
- Are test cases needed? (Yes for verifiable outputs.)
Example: API Testing Skill
Intent: Help developers test REST APIs systematically
Trigger: User mentions API testing, endpoints, REST, GraphQL, validation
Output: Test reports with pass/fail, curl commands, response comparisons
Test cases: Yes
Step 2: Write the SKILL.md File
Every skill requires a SKILL.md with YAML frontmatter and markdown instructions.
Example Anatomy:
---
name: api-tester
description: How to test REST APIs systematically. Use when users mention API testing, endpoints, REST, GraphQL, or want to validate API responses. Make sure to suggest this skill whenever testing is involved.
compatibility: Requires curl or HTTP client tools
---
# API Tester Skill
## Core Workflow
1. **Understand the endpoint**
2. **Design test cases**
3. **Execute tests** (curl or Apidog)
4. **Validate responses**
5. **Report results**
Best Practices:
- Keep SKILL.md under 500 lines; move details to references/
- Explain reasoning, not just steps
- Use imperative statements ("Always validate status code first")
- Include input/output examples
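These limits are easy to check mechanically before packaging. A minimal sketch, assuming SKILL.md has been read into a string (the function name and checks are ours, not part of the Skill Creator):

```python
import re

def check_skill_md(text: str, max_lines: int = 500) -> list[str]:
    """Return a list of problems found in a SKILL.md string."""
    problems = []
    # Frontmatter must open and close with '---' lines and carry name/description
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        problems.append("missing YAML frontmatter")
    else:
        for field in ("name:", "description:"):
            if field not in match.group(1):
                problems.append(f"frontmatter lacks {field!r}")
    if len(text.splitlines()) > max_lines:
        problems.append(f"file exceeds {max_lines} lines; move detail to references/")
    return problems
```

Running it over every SKILL.md in a repository gives a cheap pre-flight lint.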
Step 3: Create Test Cases
Draft 2-3 realistic test prompts and store them in evals/evals.json.
Example Format:
{
"skill_name": "api-tester",
"evals": [
{
"id": 1,
"prompt": "Test the /users endpoint on api.example.com - it needs a Bearer token and returns a list of users with id, name, email fields",
"expected_output": "Test report with at least 5 test cases including auth failure, success, and pagination tests",
"files": []
},
...
]
}
Good test prompts are specific, contextual, and describe expected behavior.
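Because a malformed evals.json wastes an entire evaluation run, it is worth validating the structure up front. A hedged sketch using the field names from the example above:

```python
import json

REQUIRED_KEYS = {"id", "prompt", "expected_output", "files"}

def parse_evals(text: str) -> list[dict]:
    """Parse an evals.json document and verify every eval has the expected keys."""
    data = json.loads(text)
    for case in data["evals"]:
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"eval {case.get('id')} is missing: {sorted(missing)}")
    return data["evals"]
```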
Step 4: Run Evaluations
For each test case, run two parallel subagents:
- With skill: Uses your custom skill
- Baseline: No skill (or previous version)
Workspace structure:
api-tester-workspace/
├── iteration-1/
│   ├── eval-0-auth-failure/
│   │   ├── with_skill/
│   │   ├── without_skill/
│   │   └── eval_metadata.json
│   ├── benchmark.json
│   └── benchmark.md
...
Capture timing:
Store total_tokens and duration_ms in timing.json for each run.
Step 5: Draft Assertions
While runs complete, define quantitative assertions in eval_metadata.json.
Example:
{
"assertions": [
{
"name": "includes_auth_failure_test",
"description": "Test report includes at least one authentication failure test case",
"type": "contains",
"value": "401"
},
...
]
}
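Grading a "contains" assertion is mechanical; here is a minimal sketch of that piece (the Skill Creator's real grader is an agent, this covers only the deterministic part):

```python
def grade_output(output: str, assertions: list[dict]) -> dict:
    """Evaluate assertions against one run's output. Supports the 'contains'
    type; anything else is left as None for an agent to judge."""
    results = {}
    for a in assertions:
        if a["type"] == "contains":
            results[a["name"]] = a["value"] in output
        else:
            results[a["name"]] = None
    return results
```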
Step 6: Grade and Aggregate
After runs finish:
- Grade runs: Use a grader agent to check assertions; save results to grading.json.
- Aggregate: Run the aggregation script for benchmarks.
Example aggregation command:
python -m scripts.aggregate_benchmark api-tester-workspace/iteration-1 --skill-name api-tester
Analyze: Look for non-discriminating assertions, flaky evals, or efficiency issues.
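A non-discriminating assertion is one with the same outcome in every run, with and without the skill, so it adds no signal. A sketch of that check over per-run grading results (the data shape is assumed):

```python
def find_non_discriminating(gradings: list[dict]) -> list[str]:
    """Given one {assertion_name: passed} dict per run, return the names
    whose outcome never varies across runs."""
    names = gradings[0].keys()
    return [n for n in names if len({g[n] for g in gradings}) == 1]
```

Assertions this flags are candidates for tightening or removal in the next iteration.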
Step 7: Launch the Eval Viewer
Visualize outputs and metrics in a browser.
Generate viewer:
nohup python /path/to/skill-creator/eval-viewer/generate_review.py \
api-tester-workspace/iteration-1 \
--skill-name "api-tester" \
--benchmark api-tester-workspace/iteration-1/benchmark.json \
> /dev/null 2>&1 &
VIEWER_PID=$!
For later iterations, add --previous-workspace.
Headless environments: Use --static to generate a standalone HTML file.
Step 8: Read Feedback and Iterate
After user review, read feedback.json and focus improvements on areas with actionable comments.
Iteration loop:
- Apply improvements
- Rerun test cases
- Relaunch viewer with previous iteration
- Repeat until satisfied
Kill the viewer when finished:
kill $VIEWER_PID 2>/dev/null
Step 9: Optimize the Skill Description
The description in SKILL.md is vital for triggering accuracy.
Generate trigger eval queries: Create at least 20, mixing should-trigger and should-not-trigger cases.
Run optimization:
python -m scripts.run_loop \
--eval-set /path/to/trigger-eval.json \
--skill-path /path/to/api-tester \
--model claude-sonnet-4-6 \
--max-iterations 5 \
--verbose
Use the best_description from the output to update SKILL.md.
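What the loop optimizes is trigger accuracy: the fraction of queries where the skill fired exactly when it should have. A sketch of that metric (the per-query result format is our assumption):

```python
def trigger_accuracy(results: list[dict]) -> float:
    """Each result is {'should_trigger': bool, 'did_trigger': bool};
    accuracy is the fraction of queries where the two agree."""
    correct = sum(r["should_trigger"] == r["did_trigger"] for r in results)
    return correct / len(results)
```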
Step 10: Package and Distribute
Package your skill with:
python -m scripts.package_skill /path/to/api-tester
Distribute the resulting .skill file. Users install by placing it in their skills directory or using Claude's install command.
Common Skill Creation Mistakes
Mistake 1: Vague Description
# Bad
description: A skill for working with APIs
# Good
description: How to test REST APIs systematically. Use when users mention API testing, endpoints, REST, GraphQL, or want to validate API responses...
Mistake 2: Overly Restrictive Instructions
# Bad
ALWAYS use this exact format. NEVER deviate.
# Good
Use this format because it ensures stakeholders can quickly find the information they need. Adapt if your audience has different needs.
Mistake 3: Skipping Test Cases
Even for subjective skills, run a few qualitative checks.
Mistake 4: Ignoring Timing Data
Optimize for efficiency, not just correctness.
Mistake 5: Not Bundling Repeated Scripts
Bundle helper scripts in scripts/ to avoid duplication.
Real-World Skill Examples
MCP Builder Skill
Purpose: Build MCP servers
Features: Python/Node templates, evaluation framework, best practices
mcp-builder/
├── SKILL.md
├── reference/
│   ├── mcp_best_practices.md
│   ├── python_mcp_server.md
│   └── node_mcp_server.md
└── evaluation/
    └── evaluation.md
Docx Skill
Purpose: Generate Word docs
Features: python-docx scripts, templates, styling guide
Workflow: Gather requirements → select template → generate → validate
Frontend Design Skill
Purpose: Build web interfaces
Features: Component library, Tailwind, accessibility checks
Core workflow in SKILL.md, details in references/
Testing Your Skill with Apidog
If you're building API-related skills, Apidog integrates directly into the workflow.
Example: API Testing Skill Integration
## Running API Tests
Use Apidog for systematic testing:
1. Import the OpenAPI spec into Apidog
2. Generate test cases from the spec
3. Run tests and export results as JSON
4. Validate responses against expected schemas
For custom assertions, use Apidog's scripting feature.
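Step 4 (validating responses against expected schemas) can be sketched as a plain helper you might bundle in scripts/; the id/name/email fields echo the earlier test prompt, and the function is hypothetical, not an Apidog API:

```python
def validate_user(record: dict) -> list[str]:
    """Check one /users item against the expected schema:
    id (int), name (str), email (str)."""
    expected = {"id": int, "name": str, "email": str}
    errors = []
    for field, ftype in expected.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors
```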
Bundle Apidog Scripts:
api-tester/
├── SKILL.md
└── scripts/
    ├── run-apidog-tests.py
    └── generate-report.py
This standardizes future runs and ensures repeatability.
Conclusion
Claude Code Skills let you encode and automate custom workflows in Claude. The Skill Creator system provides a repeatable, test-driven process:
- Define intent
- Draft SKILL.md with clear, example-driven instructions
- Create realistic test cases
- Run evaluations (with/without skill)
- Analyze feedback and metrics
- Iterate improvements
- Optimize description for reliable triggering
- Package and distribute as a .skill file
FAQ
How long does it take to create a skill?
- Simple skills: 15β30 minutes
- Complex skills with references/scripts: 2β3 hours (including evaluation)
Do I need to write test cases for every skill?
- Only for skills with objectively verifiable outputs (code, file transforms, data extraction). Subjective skills (writing, design) can be checked qualitatively.
What if my skill doesn't trigger reliably?
- Optimize the description field. Include explicit trigger phrases and run the optimization loop with at least 20 eval queries.
How do I share skills with my team?
- Package with python -m scripts.package_skill <path> and distribute the .skill file. Team members install it in their skills directory.
Can skills call external APIs?
- Yes. Bundle scripts for API calls, and use environment variables for keys.
What's the file size limit for skills?
- No hard limit, but keep SKILL.md under 500 lines. Offload details to references/ and scripts/.
How do I update an existing skill?
- Copy to a writable location, edit, repackage, and preserve the original name unless creating a variant.
Build smarter workflows and automate repetitive tasks in Claude. Start building your own Code Skills today!
