Wanda

Posted on • Originally published at apidog.com
How to Create Claude Code Skills Automatically with Skill Creator

TL;DR

Claude Code Skills are custom extensions that automate and optimize specific developer workflows in Claude. Use the Skill Creator system to define your skill’s purpose, draft the SKILL.md, create test cases, run benchmarks, and iterate until the skill triggers reliably and performs well.


Introduction

If you use Claude Code daily, you likely repeat certain sequences: initializing projects, running tests, formatting outputs, and so on. Instead of explaining your workflow every time, Claude Code Skills let you encode these steps once and reuse them indefinitely. The Skill Creator system provides an automated, structured pathway for building, evaluating, and refining these custom skills for your workflow.

This guide covers the end-to-end process: skill anatomy, creation workflow, evaluation, optimization, and practical examples from Anthropic’s official skills repository.

πŸ’‘ Tip: Building API-related skills? Apidog integrates seamlessly, letting you test endpoints, validate responses, and generate docs in a unified skill workflow.


What Are Claude Code Skills?

Claude Code Skills are markdown-based instruction sets that extend Claude’s built-in capabilities. Treat them like custom plugins for repeatable developer tasks.

The Skill System Architecture

Skills use a three-level loading system:

  1. Metadata (~100 words): Name and description, always in context
  2. SKILL.md body (<500 lines): Core instructions, loaded when skill triggers
  3. Bundled resources (unlimited): Scripts, references, assets loaded on demand

skill-name/
β”œβ”€β”€ SKILL.md (required)
β”‚   β”œβ”€β”€ YAML frontmatter (name, description)
β”‚   └── Markdown instructions
└── Bundled Resources (optional)
    β”œβ”€β”€ scripts/
    β”œβ”€β”€ references/
    └── assets/

When Skills Trigger

Skills appear in Claude’s available_skills list. Claude loads a skill when the task at hand matches the skill’s description, so the description is the primary trigger signal. Simple one-off requests rarely invoke a skill; complex, multi-step workflows trigger them most reliably.

Real-World Examples

| Skill | Purpose | Key Features |
| --- | --- | --- |
| skill-creator | Create new skills | Test generation, benchmark evaluation, description tuning |
| mcp-builder | Build MCP servers | Python/Node templates, evaluation framework |
| docx | Generate Word documents | python-docx scripts, templates, styling guide |
| pdf | Extract/manipulate PDFs | Form handling, extraction, reference docs |
| frontend-design | Build web interfaces | Component library, Tailwind, accessibility checks |

The Skill Creation Workflow

Follow this systematic loop:

  1. Capture intent: Define the skill's purpose
  2. Write a draft: Create the SKILL.md file
  3. Create test cases: Define realistic prompts
  4. Run evaluations: Execute with and without the skill
  5. Review results: Analyze feedback and metrics
  6. Iterate: Refine based on findings
  7. Optimize description: Improve trigger accuracy
  8. Package: Distribute as a .skill file

Step 1: Capture Intent

Clarify what the skill should do. Extract patterns from your workflow history.

Key questions:

  • What outcome should the skill achieve?
  • When should it trigger (user phrases/contexts)?
  • What output formats are expected?
  • Are test cases needed? (Yes for verifiable outputs.)

Example: API Testing Skill

Intent: Help developers test REST APIs systematically
Trigger: User mentions API testing, endpoints, REST, GraphQL, validation
Output: Test reports with pass/fail, curl commands, response comparisons
Test cases: Yes

Step 2: Write the SKILL.md File

Every skill requires a SKILL.md with YAML frontmatter and markdown instructions.

Example Anatomy:

---
name: api-tester
description: How to test REST APIs systematically. Use when users mention API testing, endpoints, REST, GraphQL, or want to validate API responses. Make sure to suggest this skill whenever testing is involved.
compatibility: Requires curl or HTTP client tools
---

# API Tester Skill

## Core Workflow

1. **Understand the endpoint**
2. **Design test cases**
3. **Execute tests** (curl or Apidog)
4. **Validate responses**
5. **Report results**

Best Practices:

  • Keep SKILL.md under 500 lines; move details to references/
  • Explain reasoning, not just steps
  • Use imperative statements ("Always validate status code first")
  • Include input/output examples
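Some of these practices can be checked mechanically before you ever run an eval. Below is a minimal sketch of such a check; the `check_skill` helper is hypothetical (not part of Skill Creator) and uses a deliberately naive frontmatter parse instead of a YAML library:

```python
import re

MAX_BODY_LINES = 500
REQUIRED_KEYS = ("name", "description")

def check_skill(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md string."""
    problems = []
    # Frontmatter is delimited by '---' lines at the top of the file.
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter"]
    frontmatter, body = match.groups()
    keys = {line.split(":", 1)[0].strip()
            for line in frontmatter.splitlines() if ":" in line}
    for key in REQUIRED_KEYS:
        if key not in keys:
            problems.append(f"frontmatter missing '{key}'")
    if len(body.splitlines()) > MAX_BODY_LINES:
        problems.append("body exceeds 500 lines; move details to references/")
    return problems
```

Running it on a well-formed skill returns an empty list; a file with no frontmatter or a missing description comes back with a named problem you can fix before evaluation.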

Step 3: Create Test Cases

Draft 2-3 realistic test prompts and store them in evals/evals.json.

Example Format:

{
  "skill_name": "api-tester",
  "evals": [
    {
      "id": 1,
      "prompt": "Test the /users endpoint on api.example.com - it needs a Bearer token and returns a list of users with id, name, email fields",
      "expected_output": "Test report with at least 5 test cases including auth failure, success, and pagination tests",
      "files": []
    },
    ...
  ]
}

Good test prompts are specific, contextual, and describe expected behavior.
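A quick sanity check on the eval file catches malformed entries before any runs start. This `load_evals` helper is illustrative, not part of the official tooling; it assumes the field names shown in the example above:

```python
import json

REQUIRED_FIELDS = ("id", "prompt", "expected_output", "files")

def load_evals(text: str) -> list[dict]:
    """Parse evals.json content and reject entries missing required fields."""
    data = json.loads(text)
    evals = data.get("evals", [])
    for entry in evals:
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            raise ValueError(
                f"eval {entry.get('id', '?')} missing fields: {missing}")
    return evals
```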


Step 4: Run Evaluations

For each test case, run two parallel subagents:

  • With skill: Uses your custom skill
  • Baseline: No skill (or previous version)

Workspace structure:

api-tester-workspace/
β”œβ”€β”€ iteration-1/
β”‚   β”œβ”€β”€ eval-0-auth-failure/
β”‚   β”‚   β”œβ”€β”€ with_skill/
β”‚   β”‚   β”œβ”€β”€ without_skill/
β”‚   β”‚   └── eval_metadata.json
β”‚   β”œβ”€β”€ benchmark.json
β”‚   └── benchmark.md
...

Capture timing:
Store total_tokens and duration_ms in timing.json for each run.
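One way to capture those two fields is to wrap each subagent run in a small timer. The field names follow the timing.json description above; `run_fn` and its `total_tokens` attribute are stand-ins for whatever your run harness actually returns:

```python
import json
import time

def timed_run(run_fn, out_path="timing.json"):
    """Execute a run callable, then record token usage and wall-clock duration.

    run_fn is assumed to return an object with a total_tokens attribute;
    substitute whatever your subagent harness really produces.
    """
    start = time.monotonic()
    result = run_fn()
    duration_ms = int((time.monotonic() - start) * 1000)
    record = {"total_tokens": getattr(result, "total_tokens", None),
              "duration_ms": duration_ms}
    with open(out_path, "w") as fh:
        json.dump(record, fh)
    return record
```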


Step 5: Draft Assertions

While runs complete, define quantitative assertions in eval_metadata.json.

Example:

{
  "assertions": [
    {
      "name": "includes_auth_failure_test",
      "description": "Test report includes at least one authentication failure test case",
      "type": "contains",
      "value": "401"
    },
    ...
  ]
}
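A "contains" assertion like the one above reduces to a substring check. The sketch below shows the idea; the real grader agent does considerably more, and the output shape here is illustrative rather than the tool's actual grading.json schema:

```python
def grade(output: str, assertions: list[dict]) -> dict:
    """Check each assertion against a run's output and compute a pass rate."""
    results = []
    for a in assertions:
        if a["type"] == "contains":
            passed = a["value"] in output
        else:
            # Unknown assertion types are marked unscored rather than failed.
            passed = None
        results.append({"name": a["name"], "passed": passed})
    pass_rate = sum(1 for r in results if r["passed"]) / max(1, len(results))
    return {"results": results, "pass_rate": pass_rate}
```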

Step 6: Grade and Aggregate

After runs finish:

  1. Grade runs: Use a grader agent to check assertions; save to grading.json.
  2. Aggregate: Run the aggregation script for benchmarks.

Example aggregation command:

python -m scripts.aggregate_benchmark api-tester-workspace/iteration-1 --skill-name api-tester

Analyze: Look for non-discriminating assertions, flaky evals, or efficiency issues.
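The core of that analysis is comparing pass rates across conditions and spotting assertions that add no signal. A toy version of both (the official scripts.aggregate_benchmark does this properly):

```python
def aggregate(gradings: dict) -> dict:
    """Summarize per-condition assertion outcomes into benchmark-style stats.

    gradings maps a condition name ("with_skill", "without_skill") to a flat
    list of boolean assertion outcomes across all evals in an iteration.
    """
    summary = {}
    for condition, outcomes in gradings.items():
        summary[condition] = {
            "passed": sum(outcomes),
            "total": len(outcomes),
            "pass_rate": round(sum(outcomes) / max(1, len(outcomes)), 3),
        }
    return summary

def non_discriminating(with_skill: dict, baseline: dict) -> list:
    """Assertions with identical outcomes in both conditions add no signal."""
    return [name for name, ok in with_skill.items()
            if baseline.get(name) == ok]
```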


Step 7: Launch the Eval Viewer

Visualize outputs and metrics in a browser.

Generate viewer:

nohup python /path/to/skill-creator/eval-viewer/generate_review.py \
  api-tester-workspace/iteration-1 \
  --skill-name "api-tester" \
  --benchmark api-tester-workspace/iteration-1/benchmark.json \
  > /dev/null 2>&1 &
VIEWER_PID=$!

For later iterations, add --previous-workspace.

Headless environments: Use --static to generate a standalone HTML file.


Step 8: Read Feedback and Iterate

After user review, read feedback.json and focus improvements on areas with actionable comments.

Iteration loop:

  1. Apply improvements
  2. Rerun test cases
  3. Relaunch viewer with previous iteration
  4. Repeat until satisfied

Kill the viewer when finished:

kill $VIEWER_PID 2>/dev/null

Step 9: Optimize the Skill Description

The description in SKILL.md is vital for triggering accuracy.

Generate trigger eval queries: Create at least 20, mixing should-trigger and should-not-trigger cases.
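Trigger accuracy is just classification accuracy over that query set. A sketch of the scoring, where the `triggers` predicate stands in for whatever model call actually decides triggering against your description:

```python
def trigger_accuracy(queries: list, triggers) -> dict:
    """Score should-trigger vs should-not-trigger queries.

    Each query is {"prompt": str, "should_trigger": bool}; triggers is a
    callable prompt -> bool (in practice, a model call against the skill
    description).
    """
    tp = fp = tn = fn = 0
    for q in queries:
        fired = triggers(q["prompt"])
        if q["should_trigger"]:
            tp += fired
            fn += not fired
        else:
            fp += fired
            tn += not fired
    total = max(1, tp + fp + tn + fn)
    return {"accuracy": (tp + tn) / total,
            "false_triggers": fp, "missed_triggers": fn}
```

False triggers (the skill fires on unrelated prompts) and missed triggers (it stays silent when it should fire) usually call for opposite description changes, so reporting both separately is worth the extra two fields.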

Run optimization:

python -m scripts.run_loop \
  --eval-set /path/to/trigger-eval.json \
  --skill-path /path/to/api-tester \
  --model claude-sonnet-4-6 \
  --max-iterations 5 \
  --verbose

Use the best_description from the output to update SKILL.md.


Step 10: Package and Distribute

Package your skill with:

python -m scripts.package_skill /path/to/api-tester

Distribute the resulting .skill file. Users install by placing it in their skills directory or using Claude’s install command.
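Under the hood, a .skill file is essentially an archive of the skill directory. A rough stand-in for the packaging step, assuming (the real scripts.package_skill may differ) that zipping the directory with relative paths is sufficient:

```python
import os
import zipfile

def package_skill(skill_dir, out_path=None):
    """Zip a skill directory into a .skill archive, keeping relative paths."""
    if not os.path.isfile(os.path.join(skill_dir, "SKILL.md")):
        raise FileNotFoundError("SKILL.md is required at the skill root")
    out_path = out_path or skill_dir.rstrip("/") + ".skill"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(skill_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the skill root so the archive
                # unpacks into a clean skill-name/ layout.
                zf.write(full, os.path.relpath(full, skill_dir))
    return out_path
```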


Common Skill Creation Mistakes

Mistake 1: Vague Description

# Bad
description: A skill for working with APIs

# Good
description: How to test REST APIs systematically. Use when users mention API testing, endpoints, REST, GraphQL, or want to validate API responses...

Mistake 2: Overly Restrictive Instructions

# Bad
ALWAYS use this exact format. NEVER deviate.

# Good
Use this format because it ensures stakeholders can quickly find the information they need. Adapt if your audience has different needs.

Mistake 3: Skipping Test Cases
Even for subjective skills, run a few qualitative checks.

Mistake 4: Ignoring Timing Data
Optimize for efficiency, not just correctness.

Mistake 5: Not Bundling Repeated Scripts
Bundle helper scripts in scripts/ to avoid duplication.


Real-World Skill Examples

MCP Builder Skill

Purpose: Build MCP servers

Features: Python/Node templates, evaluation framework, best practices

mcp-builder/
β”œβ”€β”€ SKILL.md
β”œβ”€β”€ reference/
β”‚   β”œβ”€β”€ mcp_best_practices.md
β”‚   β”œβ”€β”€ python_mcp_server.md
β”‚   └── node_mcp_server.md
└── evaluation/
    └── evaluation.md

Docx Skill

Purpose: Generate Word docs

Features: python-docx scripts, templates, styling guide

Workflow: Gather requirements β†’ select template β†’ generate β†’ validate

Frontend Design Skill

Purpose: Build web interfaces

Features: Component library, Tailwind, accessibility checks

Core workflow in SKILL.md, details in references/


Testing Your Skill with Apidog

If you’re building API-related skills, Apidog integrates directly into the workflow.

Apidog Example

Example: API Testing Skill Integration

## Running API Tests

Use Apidog for systematic testing:

1. Import the OpenAPI spec into Apidog
2. Generate test cases from the spec
3. Run tests and export results as JSON
4. Validate responses against expected schemas

For custom assertions, use Apidog's scripting feature.

Bundle Apidog Scripts:

api-tester/
β”œβ”€β”€ SKILL.md
└── scripts/
    β”œβ”€β”€ run-apidog-tests.py
    └── generate-report.py

This standardizes future runs and ensures repeatability.
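Whichever tool executes the tests, the validation step can be expressed in plain Python. A minimal field-level check for the earlier /users example; real schema validation would use a JSON Schema library, and the expected field set here is taken from that example:

```python
import json

EXPECTED_USER_FIELDS = {"id", "name", "email"}

def validate_users_response(body: str) -> list:
    """Check a /users response body: must be a JSON array of user objects."""
    try:
        users = json.loads(body)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(users, list):
        return ["expected a JSON array of users"]
    errors = []
    for i, user in enumerate(users):
        missing = EXPECTED_USER_FIELDS - set(user)
        if missing:
            errors.append(f"user {i} missing fields: {sorted(missing)}")
    return errors
```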


Conclusion

Claude Code Skills let you encode and automate custom workflows in Claude. The Skill Creator system provides a repeatable, test-driven process:

  1. Define intent
  2. Draft SKILL.md with clear, example-driven instructions
  3. Create realistic test cases
  4. Run evaluations (with/without skill)
  5. Analyze feedback and metrics
  6. Iterate improvements
  7. Optimize description for reliable triggering
  8. Package and distribute as a .skill file

FAQ

How long does it take to create a skill?

  • Simple skills: 15–30 minutes
  • Complex skills with references/scripts: 2–3 hours (including evaluation)

Do I need to write test cases for every skill?

  • Only for skills with objectively verifiable outputs (code, file transforms, data extraction). Subjective skills (writing, design) can be checked qualitatively.

What if my skill doesn’t trigger reliably?

  • Optimize the description field. Include explicit trigger phrases and run the optimization loop with at least 20 eval queries.

How do I share skills with my team?

  • Package with python -m scripts.package_skill <path> and distribute the .skill file. Team members install it in their skills directory.

Can skills call external APIs?

  • Yes. Bundle scripts for API calls, and use environment variables for keys.

What’s the file size limit for skills?

  • No hard limit, but keep SKILL.md under 500 lines. Offload details to references/scripts.

How do I update an existing skill?

  • Copy to a writable location, edit, repackage, and preserve the original name unless creating a variant.

Build smarter workflows and automate repetitive tasks in Claudeβ€”start building your own Code Skills today!
