
Guatu

Posted on • Originally published at guatulabs.dev

Building Agent Skills: A Pattern for Discoverable Capabilities

I spent three weeks building a set of "tools" for a custom agent that could manage my infrastructure, only to realize the agent had no idea how to actually use them in combination. I'd give it a read_file tool and a grep_search tool, and it would repeatedly try to read a 50MB log file into its context window instead of grepping for the error first. The tools existed, but the "skill" of knowing when and how to sequence them was missing.

If you're building AI agents, you've probably hit this. Most frameworks treat tools as a flat list of functions. You dump 20 Python functions into the system prompt and hope the LLM's reasoning is strong enough to pick the right one. It usually isn't.

The False Start: The "Tool Soup" Approach

My first instinct was to just write better descriptions. I spent hours tweaking the docstrings of my functions, adding phrases like "Use this tool ONLY when the file is larger than 10KB." I was treating the LLM like a junior dev who just needed better instructions.

The problem is that tool-calling is fundamentally different from skill execution. A tool is an atomic action (e.g., GET /api/v1/status). A skill is a capability (e.g., "Diagnose why the Kubernetes ingress is returning 502").

I tried to solve this by creating "orchestrator" tools, which were basically giant functions that wrapped other functions. This just moved the complexity into my Python code. I ended up with a monolithic diagnose_k8s_issue() function that was 300 lines long and impossible to test. I had created a rigid script, not a flexible agent: effectively a bash script with a fancy interface.

The Solution: Discoverable Skill Definitions

The shift happened when I stopped defining tools and started defining skills as discoverable metadata. Instead of just exposing a function, I created a registry where skills are defined by their intent, the tools they require, and a suggested execution pattern.

I implemented this using a structured manifest. Instead of the LLM guessing which tool to use, the agent first queries a "Skill Registry" to find a capability that matches the user's intent.

Here is the pattern I'm using now. Each skill is a standalone definition that explicitly maps the capability to the tools it needs and the order to use them in.

# skill-registry.yaml
skills:
  - id: "log-error-search"
    name: "Search Logs for Errors"
    description: "Finds specific error patterns in system logs without loading entire files."
    required_tools: ["grep", "ls"]
    execution_pattern: |
      1. Use 'ls' to identify the relevant log file in /var/log.
      2. Use 'grep' with --context=NUM (or -C NUM) to find the error and NUM surrounding lines.
      3. If no results, try searching for 'FATAL' or 'CRITICAL'.
    usage_example: "/skill:search --tool=grep --pattern='timeout' --files='/var/log/syslog'"

To make this work in practice, I changed the agent's loop. Instead of User -> LLM -> Tool, the flow became User -> LLM -> Skill Lookup -> LLM -> Tool Sequence.

When the agent identifies it needs to search logs, it doesn't just call grep. It retrieves the log-error-search skill definition. This gives the LLM a "recipe" for the task. It's the difference between giving someone a pile of ingredients and giving them a recipe book.
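A minimal sketch of that two-phase loop, assuming the manifest has already been parsed into plain Python objects. The `Skill` class, `find_skill`, and `build_prompt` are hypothetical names, and the word-overlap matching is only a stand-in for whatever lookup the agent really performs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    id: str
    description: str
    required_tools: tuple[str, ...]
    execution_pattern: str


# Hypothetical in-memory copy of the log-error-search manifest entry.
REGISTRY = [
    Skill(
        id="log-error-search",
        description="Finds specific error patterns in system logs without loading entire files.",
        required_tools=("grep", "ls"),
        execution_pattern=(
            "1. Use 'ls' to identify the relevant log file in /var/log.\n"
            "2. Use 'grep' with context lines to find the error and surrounding lines.\n"
            "3. If no results, try searching for 'FATAL' or 'CRITICAL'."
        ),
    ),
]


def find_skill(intent: str, registry: list[Skill]) -> Skill | None:
    """Return the skill whose description shares the most words with the intent."""
    intent_words = set(intent.lower().split())
    best, best_score = None, 0
    for skill in registry:
        score = len(intent_words & set(skill.description.lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best


def build_prompt(user_request: str, skill: Skill) -> str:
    """Second LLM call: the request plus only the matched skill's recipe."""
    return (
        f"Task: {user_request}\n"
        f"Allowed tools: {', '.join(skill.required_tools)}\n"
        f"Follow this pattern:\n{skill.execution_pattern}"
    )


skill = find_skill("search system logs for timeout error patterns", REGISTRY)
if skill is not None:
    print(build_prompt("Why is the API timing out?", skill))
```

The point of the split is that the first LLM call only has to name an intent, while the second call sees a single recipe instead of every tool description at once.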

If you're building these as MCP servers, you can implement this by creating a specific "discovery" tool that returns these manifests. I've written about building MCP servers with FastMCP, and applying this skill pattern there makes the tools significantly more reliable across different IDEs like Antigravity or Kiro.

Handling the "Dirty Work" of Execution

One of the biggest gaps in agent documentation is how to handle the actual execution of these skills when they hit real-world infrastructure. For example, if a skill requires searching through Kubernetes volumes, you can't just assume the agent has the right permissions or that the volume is healthy.

I hit a wall where my "Log Search" skill would fail because the underlying Longhorn volumes were hitting snapshot limits, causing the filesystem to go read-only. The agent would just report "Permission Denied," which is useless.

I had to build "pre-flight" checks into the skill execution layer. If a skill involves storage, the layer first checks the volume health. If it finds a pile of stale snapshots, the agent runs a cleanup before attempting the search.
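One way such a pre-flight layer could be structured is sketched below. Everything here is an assumption for illustration: the `run_preflight` and `run_skill` helpers, and the two check functions, which return canned values where real ones would query the storage layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A check returns None when healthy, or a human-readable problem description.
Check = Callable[[], Optional[str]]


@dataclass
class PreflightResult:
    ok: bool
    problems: list[str] = field(default_factory=list)


def run_preflight(checks: dict[str, Check]) -> PreflightResult:
    """Run every check and collect all failures instead of stopping at the first."""
    problems = []
    for name, check in checks.items():
        problem = check()
        if problem is not None:
            problems.append(f"{name}: {problem}")
    return PreflightResult(ok=not problems, problems=problems)


def run_skill(skill_id: str, checks: dict[str, Check], action: Callable[[], str]) -> str:
    """Only run the skill's action when all of its pre-flight checks pass."""
    result = run_preflight(checks)
    if not result.ok:
        return f"{skill_id} blocked by pre-flight: " + "; ".join(result.problems)
    return action()


# Hypothetical checks with canned results.
def volume_is_writable() -> Optional[str]:
    return "filesystem is mounted read-only"


def snapshot_count_ok() -> Optional[str]:
    return None


print(run_skill(
    "log-error-search",
    {"volume": volume_is_writable, "snapshots": snapshot_count_ok},
    action=lambda: "search results...",
))
```

Returning the specific problem, rather than letting the tool call fail, gives the agent something more actionable to report than "Permission Denied".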

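# Example of a cleanup command the agent can trigger via a 'maintenance' skill.
# Assumes Longhorn runs in its default longhorn-system namespace; adjust -n if yours differs.
kubectl -n longhorn-system delete snapshots.longhorn.io -l "snapshot-name=old-snapshot-2025"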

This is where the gap between "it works in the playground" and "it works in production" becomes obvious. If you're running these agents on bare metal, you need to account for the infrastructure failures I've detailed in my posts on Longhorn volume health.

Why This Pattern Works

The reason this beats a flat list of tools is cognitive load. LLMs have a limited context window and, more importantly, uneven attention across it (the "lost in the middle" phenomenon, where material buried mid-context is used less reliably). When you provide 50 tools, the probability of the LLM picking a suboptimal one increases.

By using a skill registry, you're implementing a form of "just-in-time" prompting. The agent only sees the detailed instructions for the specific skill it needs for the current step.

| Feature | Tool-Based Approach | Skill-Based Approach |
| --- | --- | --- |
| Discovery | LLM scans all tool descriptions | Agent queries registry for specific intent |
| Execution | LLM guesses the sequence | Agent follows a proven execution pattern |
| Maintenance | Change docstrings and hope for the best | Update the skill manifest in one place |
| Reliability | High variance in output | Consistent, repeatable workflows |
| Scalability | Context window fills up quickly | Only relevant skills are loaded into context |

This approach also helps with the security problem. I don't give the agent a blanket "Admin" token. Instead, I map each skill to one of two tiers of service account. A "Read-Only Log Search" skill uses a restricted token, while a "Restart Pod" skill requires a higher-privilege token and a manual approval gate.
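The mapping could look roughly like the sketch below. The skill ids, service account names, and the `resolve_service_account` helper are all hypothetical; the post does not show the actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SkillPolicy:
    service_account: str
    needs_approval: bool


# Hypothetical skill ids and service account names, one per tier.
POLICIES = {
    "log-error-search": SkillPolicy("agent-readonly", needs_approval=False),
    "restart-pod": SkillPolicy("agent-operator", needs_approval=True),
}


class ApprovalRequired(Exception):
    """Raised when a privileged skill is invoked without human sign-off."""


def resolve_service_account(skill_id: str, approved_by: str | None = None) -> str:
    """Return the service account a skill may run as, enforcing the approval gate."""
    policy = POLICIES.get(skill_id)
    if policy is None:
        # Unknown skills get no credentials at all rather than a default token.
        raise PermissionError(f"no policy registered for skill {skill_id!r}")
    if policy.needs_approval and not approved_by:
        raise ApprovalRequired(f"{skill_id} needs manual approval")
    return policy.service_account


print(resolve_service_account("log-error-search"))
print(resolve_service_account("restart-pod", approved_by="on-call engineer"))
```

Because credentials are resolved per skill, a skill that is missing from the policy table fails closed instead of inheriting a broad token.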

Lessons Learned and Gotchas

The biggest surprise was that the LLM does better when told how to use a tool than when told what the tool does. A tool description like "Greps a file" is useless. A skill pattern that says "First list the files, then grep the most recent one" is a force multiplier.

I also learned that you can't trust the LLM to always follow the registry. Sometimes it tries to be "clever" and skip a step. I had to implement a validation layer that checks the output of each step against the skill's expected state. If the ls step fails, the agent isn't allowed to attempt the grep step.
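A validation layer of this kind might be sketched as follows. The `Step` structure and `execute_skill` function are illustrative assumptions, and the lambdas stand in for real tool calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], str]           # executes the tool call and returns its output
    is_valid: Callable[[str], bool]  # checks the output against the expected state


def execute_skill(steps: list[Step]) -> list[tuple[str, str]]:
    """Run steps in order; stop at the first step whose output fails validation."""
    completed = []
    for step in steps:
        output = step.run()
        if not step.is_valid(output):
            raise RuntimeError(
                f"step {step.name!r} failed validation; later steps were not run"
            )
        completed.append((step.name, output))
    return completed


# Hypothetical stand-ins: 'ls' returns nothing, so 'grep' must never run.
steps = [
    Step("ls", run=lambda: "", is_valid=lambda out: out.strip() != ""),
    Step("grep", run=lambda: "timeout at 12:00", is_valid=lambda out: True),
]

try:
    execute_skill(steps)
except RuntimeError as err:
    print(err)
```

Keeping the gate in the execution layer means a skipped or failed step is caught by code rather than by hoping the model follows the recipe.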

If I were to do this over again, I'd move the skill registry into a vector database from the start. As the number of skills grows, even a YAML file becomes a bottleneck. Using a vector search to find the top 3 most relevant skills based on the user's query is the only way I see to scale this to hundreds of capabilities.
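To show the shape of a top-3 lookup without pulling in a database, here is a toy sketch that ranks skills by cosine similarity over word counts. This is a deliberate simplification: a real setup would swap `embed` for an embedding model and the sorted scan for a vector index, and the skill ids and descriptions are made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_skills(query: str, skills: dict[str, str], k: int = 3) -> list[str]:
    """Return the ids of the k skills whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(skills, key=lambda sid: cosine(q, embed(skills[sid])), reverse=True)
    return ranked[:k]


# Hypothetical skill ids and descriptions.
SKILLS = {
    "log-error-search": "search system logs for error patterns",
    "restart-pod": "restart a kubernetes pod",
    "volume-health": "check storage volume health and snapshots",
    "cert-renew": "renew tls certificates",
}

print(top_skills("find the error in the logs", SKILLS))
```

Only the ids returned here would have their full manifests loaded into the prompt, which keeps context usage flat as the registry grows.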

The most important takeaway is this: stop trying to make your agents "smarter" by using a larger model. Instead, make your capabilities more discoverable. The intelligence should live in the architecture of the skills, not just in the weights of the LLM.

For those building these systems for industrial or production use, I highly recommend looking into how these patterns fit into a broader multi-agent architecture. One agent can act as the "Librarian" (managing the skill registry), while another acts as the "Executor" (following the recipes). This separation of concerns prevents the executor from getting distracted by the discovery process.
