Building Agent Skills from Scratch

There's a lot written about agent skills, but not much about actually implementing them.

This post shows you how they work and how to integrate them into your existing agent.

View the complete implementation on GitHub

What Are Agent Skills?

Agent skills solve a simple problem: your system prompt gets bloated when you try to make your agent good at everything.

Instead of this:

You're an expert at code review, git, file organization, API testing...
[2000 lines of instructions]

You do this:

You have access to these skills:
- code-review: Reviews code for bugs and security
- git-helper: Git workflows and troubleshooting
- file-organizer: Organizes files intelligently
- api-tester: Tests REST APIs

Load them when needed.

The Core Idea

Skills are markdown files that live in a directory. Each skill has two parts:

  1. YAML frontmatter with name and description (see this guide on frontmatter if you're new to it)
  2. Markdown body with detailed instructions

When the agent needs expertise, it loads the relevant skill on the fly.

User: "Review this code for SQL injection"
  ↓
Agent: "I need the code-review skill"
  ↓
System: [Loads SKILL.md with security guidelines]
  ↓
Agent: [Follows those guidelines]

The key insight is that skills are just structured prompts. But they're modular, discoverable, and loaded on demand.

How It Actually Works

Step 1: Discovery

Scan a directory for SKILL.md files and parse their frontmatter. You only load the metadata initially, not the full content. This keeps memory usage low. (See the discovery implementation)

skills/
├── code-review/
│   └── SKILL.md        # name: code-review, description: ...
├── git-helper/
│   └── SKILL.md

During discovery, you extract just the YAML frontmatter (name, description). The full markdown content stays on disk until needed.
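
Here's a minimal sketch of what discovery might look like in Python, using PyYAML for the frontmatter. The SkillMeta structure and helper names are illustrative, not the repo's actual API:

import yaml
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillMeta:
    name: str
    description: str
    path: Path  # where the full SKILL.md lives; its content stays on disk

def discover_skills(root: str = "skills") -> dict[str, SkillMeta]:
    skills = {}
    for skill_file in Path(root).glob("*/SKILL.md"):
        text = skill_file.read_text()
        # The frontmatter sits between the first two "---" markers
        _, frontmatter, _ = text.split("---", 2)
        meta = yaml.safe_load(frontmatter)
        skills[meta["name"]] = SkillMeta(meta["name"], meta["description"], skill_file)
    return skills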

Step 2: Tool Registration

Convert each skill into an OpenAI function tool. The LLM sees these as callable functions:

activate_skill_code_review: "Reviews code for bugs, security, best practices"
activate_skill_git_helper: "Git workflows and troubleshooting"

The description is critical because it's what the LLM uses to decide which skill to activate. Be specific and clear.
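
A hedged sketch of registration, building on the discovery sketch above. The tool schema is the standard OpenAI function-tool format; the helper names are again illustrative:

def skill_to_tool(skill: SkillMeta) -> dict:
    # code-review -> activate_skill_code_review
    function_name = f"activate_skill_{skill.name.replace('-', '_')}"
    return {
        "type": "function",
        "function": {
            "name": function_name,
            "description": skill.description,  # what the LLM reads when deciding which skill to activate
            "parameters": {"type": "object", "properties": {}},  # no arguments needed
        },
    }

tools = [skill_to_tool(s) for s in discover_skills().values()]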

Step 3: Activation

When the LLM calls a skill function:

  1. Load the full SKILL.md content from disk
  2. Add it to the conversation as a tool result
  3. Let the LLM continue with those instructions

This is lazy loading. You only fetch content when actually needed. If you have 20 skills but only use 2, you've only loaded 2.
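
A rough sketch of activation, again with illustrative names. It maps the function name back to a skill, reads the file, and returns the markdown body (with a helpful error if the skill doesn't exist):

def activate_skill(function_name: str, skills: dict[str, SkillMeta]) -> str:
    # activate_skill_code_review -> code-review
    name = function_name.removeprefix("activate_skill_").replace("_", "-")
    if name not in skills:
        return f"Skill '{name}' not found. Available: {', '.join(skills)}"
    # Drop the frontmatter; the LLM only needs the markdown instructions
    body = skills[name].path.read_text().split("---", 2)[2]
    return body.strip()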

Step 4: Execution

The LLM reads the skill instructions and follows them. The skill acts like a temporary system prompt for that specific task. Once the task is done, the skill instructions fade from context (unless you keep them for multi-turn conversations).

What a Skill Looks Like

---
name: code-review
description: Reviews code for bugs, security, and best practices
version: 1.0.0
---

# Code Review Skill

You are an expert code reviewer.

## Check For

1. **Security**
   - SQL injection in queries
   - XSS in user inputs
   - Auth bypasses

2. **Quality**
   - Readability
   - Maintainability
   - DRY violations

3. **Performance**
   - N+1 queries
   - Memory leaks
   - Inefficient algorithms

## Response Format

**Summary**: Brief assessment
**Critical Issues**: Security problems (if any)
**Improvements**: Suggestions for better code
**Positives**: What works well

Notice the structure: clear sections, bullet points, and expected output format. The LLM follows structured instructions much better than prose. (For more on crafting effective prompts, see this guide.)

Why This Pattern Works

1. Context Efficiency

Instead of loading 10KB of instructions upfront, you load 100 bytes of metadata. Full instructions only come in when needed. This matters when you're paying per token.

2. Modularity

Each skill is independent. Add a new one by dropping in a SKILL.md file. No code changes needed. Want to remove a skill? Delete the directory.

3. Clarity

When debugging, you can see exactly which skill was activated and what instructions it provided. This makes troubleshooting much easier than a monolithic prompt.

4. Reusability

Share skills across projects. Someone else's api-tester skill works in your agent with zero modification. Skills become a shared library of expertise.

Key Design Decisions

Lazy Loading

Don't load every skill's full content at startup. That defeats the purpose: you're right back to a bloated upfront prompt.

Do load on demand. Parse frontmatter during discovery, but keep full content on disk until the LLM actually requests it.

Function Naming

Prefix skill functions clearly: activate_skill_code_review. This makes it obvious in logs what's happening. When you see activate_skill_* in your logs, you know a skill was activated.

Conversation Flow

The exact sequence matters. Here's what happens:

  1. User sends message
  2. LLM responds with tool_calls (requesting a skill)
  3. Critical: Add assistant message with tool_calls to conversation
  4. Add tool message with skill content
  5. LLM continues with skill instructions
  6. Final response

If you skip step 3, OpenAI will reject your request. The tool_calls must be properly formatted with a type field and nested function object. This is a common gotcha. (See OpenAI's tools documentation for details.)
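
Here's roughly what steps 3 and 4 look like as raw message dicts, assuming the chat completions message format and the activate_skill helper sketched earlier (tool_call is one entry from the model's tool_calls):

# Step 3: echo the assistant's tool call back into the conversation, properly nested
messages.append({
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": tool_call["id"],
        "type": "function",  # OpenAI requires this field
        "function": {
            "name": tool_call["function"]["name"],
            "arguments": tool_call["function"]["arguments"],
        },
    }],
})

# Step 4: return the skill's full instructions as the tool result
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": activate_skill(tool_call["function"]["name"], skills),
})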

Looping for Multiple Tool Calls

Skills can chain. A skill might activate code execution, which might need another skill. Your agent should loop until there are no more tool calls:

while True:
    # Pass the full tool list on every call so activated skills can chain into other tools
    response = llm.chat(messages=messages, tools=tools)
    if not response["tool_calls"]:
        break  # no more tool calls; the model has produced its final answer
    handle_tool_calls(response)  # appends the assistant tool_calls message and each tool result

Always pass tools in every call, even after skill activation. Otherwise, skills can't use other tools like code execution. (See full implementation for the complete loop logic.)

Practical Considerations

Skill Scope

One skill equals one domain. Keep them focused.

Good examples: code-review, git-helper, api-tester
Bad example: developer-tools (too broad)

Skill Structure

Use clear sections with examples:

  • What the skill does
  • How to approach tasks
  • Expected output format
  • Examples of good results

A wall of text doesn't work. Structure helps the LLM follow instructions.

Error Handling

What if a skill doesn't exist? Return a helpful error:

"Skill 'xyz' not found. Available: code-review, git-helper"

Common Mistakes & Troubleshooting

Loading Everything Upfront

Some implementations load every skill's full content at startup. That wastes memory and context tokens and defeats the point of lazy loading.

Fix: Load metadata only during discovery. Activate skills when needed.

Vague Skill Descriptions

The LLM uses skill descriptions to decide which to activate. Be specific.

❌ "Helps with code"
✅ "Reviews Python/JavaScript code for security vulnerabilities, PEP 8 compliance, and performance issues"

Include what the skill does, what types of tasks it handles, and key capabilities.

Wrong Tool Calls Format

Error: Missing required parameter: messages[1].tool_calls[0].type

Cause: OpenAI requires a specific nested structure. The tool_calls must have a type field and nest the function details under a function key.

Fix: Use the correct format with type: "function" and nested function object. Don't flatten it. See OpenAI's tools documentation for the exact message format.

Forgetting to Include Tools After Skill Activation

Problem: After activating a skill, the LLM can't use other tools like code execution.

Fix: Always pass tools in every LLM call. Don't remove tools after skill activation because skills might need them.

No Structure in Skills

A wall of text doesn't work. Use clear headings, bullet points, code examples, and expected output formats. The LLM follows structured instructions much better than prose.

When Skills Make Sense

Good fit:

  • Multi-domain agents that handle code, git, and devops
  • Agents with specialized workflows
  • Teams sharing common patterns
  • When you hit context limits

Not needed:

  • Single-purpose agents
  • Agents with small, focused prompts
  • Prototypes and experiments

Don't over-engineer. If your system prompt is small and manageable, you probably don't need skills.

The Standard

AgentSkills.io defines the open format:

  • SKILL.md naming convention
  • YAML frontmatter schema
  • Directory structure
  • Best practices

Following the standard means your skills work with other implementations. Skills become portable across projects and teams.

Building Your First Skill

  1. Create the directory: mkdir -p skills/my-first-skill

  2. Create SKILL.md with YAML frontmatter and markdown instructions

  3. Integrate a SkillsManager into your agent (a rough wiring sketch follows after this list; see the GitHub repo for the full code)

  4. Test it by asking your agent to use the skill and verifying it activates
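
For step 3, here's a rough sketch of how the pieces from the earlier sketches could wire together in an agent loop. The repo's SkillsManager bundles this up properly; the names below are my illustrative ones, and llm.chat stands in for your LLM client:

skills = discover_skills("skills")
tools = [skill_to_tool(s) for s in skills.values()]
messages = [
    {"role": "system", "content": "You are a helpful agent with access to skills."},
    {"role": "user", "content": "Review this code for SQL injection: ..."},
]

while True:
    response = llm.chat(messages=messages, tools=tools)  # always pass tools
    if not response["tool_calls"]:
        break
    # Step 3: the assistant message with its tool_calls, then step 4: one tool result each
    messages.append({"role": "assistant", "content": None, "tool_calls": response["tool_calls"]})
    for tool_call in response["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": activate_skill(tool_call["function"]["name"], skills),
        })

print(response["content"])  # the final answer once no more tools are requested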

That's it. No code changes needed to add new skills. Just drop in a SKILL.md file.

Bottom Line

Agent skills are structured prompts with a loading mechanism.

The pattern works because:

  1. It keeps context lean by only loading what you need
  2. It makes agents modular since skills are independent
  3. It enables skill reuse so you can share skills across projects
  4. It simplifies debugging with clear activation logs

You can build a working implementation in an afternoon. The core SkillsManager is about 130 lines of Python. (View the implementation)

Start with one skill. See if it helps. Expand from there.

The complete working implementation is available on GitHub. Use it as a reference or starting point for your own agent.
