DEV Community

supreet singh

Beyond the Prompt: An Explorer’s Guide to Claude Skills (Part 1)

I’ve been spending a lot of time lately trying to figure out where "prompting" ends and "systems" begin.

The Chat Box Ceiling

Most of us start our AI journey in a chat box. We learn to ask better questions and provide more context, eventually building massive, complex prompts. Yet, even the best prompts often miss the mark.

The Claude Code Transition

I expected the friction to disappear when I moved to Claude Code because the tool lives in my local environment. It didn't work out that way. The overhead remains surprisingly high. Even with local access, if you want Claude to plan and execute a complex task, you still end up defining a massive instruction set for every single run. It becomes a tedious ritual of:

  • Specifying every parameter for the task.
  • Reminding the agent what to watch for.
  • Defining exactly how to take specific actions.

It feels less like doing the actual work and more like managing a micromanager.

The Legacy System Hurdle

When I started using Claude Code last year, "Skills" weren't really on the radar yet. My workaround was building a library of SOPs (Standard Operating Procedures). This was a necessity because I work with legacy systems—specifically ancient Linux builds and fragile ecosystems that are decades old.

Before every task, I had to manually point Claude toward that world. I would identify which SOPs were relevant, tell Claude Code to read them, and then wait for it to "load" that logic before it could finally act.

The Discovery of Skills

Recently, I stumbled upon Claude Skills. I’ve found them to be much more effective at handling this orchestration without the manual baggage of the SOP approach.

I’m not writing this as an expert who has mastered the architecture. I’m an explorer in the middle of an "Aha!" moment. I’m identifying patterns, figuring out the logic, and trying to build something that actually scales. Here is what I’m learning in real time.

1. The Conceptual Shift: Instruction vs. Orchestration

The first thing I had to unlearn is the idea that a Skill is just a "saved prompt." It isn’t.

If a prompt is a recipe, a Skill is the entire kitchen setup. A normal prompt tells Claude what to do once. A Skill defines how Claude should operate across an entire category of scenarios. It is a workflow abstraction layer. Instead of a linear prompt-response cycle, you are encoding "agentic behavior" into a reusable unit. This includes conditional logic, tool choices, and specific reasoning stages.

  • Prompt-level: "Summarize this into five bullet points."
  • Skill-level: "When I give you a financial document, identify the sender, select the right parsing method, and decide if an Excel file is needed based on the data complexity."
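The contrast can be sketched as a tiny dispatcher. Everything below is hypothetical (the function names and categories are mine, not part of Claude's API): a prompt encodes one fixed action, while a skill encodes a decision procedure.

```python
# Hypothetical sketch: a prompt is one action, a skill is a decision procedure.

def prompt_level(text: str) -> str:
    """Prompt-level: one fixed instruction, applied once."""
    return "summarize:5-bullets"

def skill_level(document: dict) -> dict:
    """Skill-level: route by document properties instead of doing one thing."""
    plan = {"sender": document.get("sender", "unknown")}
    # Select the parsing method based on the document format.
    plan["parser"] = "pdf-tables" if document["format"] == "pdf" else "csv-plain"
    # Decide whether an Excel artifact is worth generating.
    plan["emit_excel"] = document["row_count"] > 50
    return plan

plan = skill_level({"sender": "acme-bank", "format": "pdf", "row_count": 120})
```

The point is not the toy logic; it is that the branching lives in the skill, so the user never re-specifies it per run.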

Recent research into "Divide and Conquer" frameworks for LLMs supports this. Studies show that single-shot, linear instructions suffer from "superlinear noise growth." In plain English, the longer and more complex your instructions get, the more the model’s focus degrades. Moving to an orchestration model keeps the "noise" low by keeping the immediate task context tiny.

2. The Anatomy: How It’s Actually Built

Technically, a Skill is just a folder. The structure of that folder is where the magic happens. At the root, you have a file called SKILL.md. This isn’t just a readme; it is the brain of the operation. It generally has three layers:

  • The Front Matter (Identity): A brief YAML block that tells the system what the skill is. This is how Claude "discovers" the skill.
  • The Main Body (The Router): This is the orchestration layer. It doesn't hold all the instructions. Instead, it acts as a traffic controller. It tells Claude: "If Task A happens, look at File X. If Task B happens, use Tool Y."
  • Contextual Files (The Specialists): These are separate files that hold the deep, domain-specific logic.
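A minimal layout might look like the following. The folder and file names are my own illustration, and I'm treating the front-matter fields as a shape rather than an exact spec (Anthropic's docs define the canonical fields):

```markdown
<!-- expense-reports/SKILL.md -->
---
name: expense-reports
description: Parse monthly financial documents and produce summaries.
---

# Expense Reports

- If the input is a bank statement, follow `parsing.md`.
- If the user asks for a deliverable, follow `reporting.md`.
- Otherwise, summarize inline and ask which flow to run.
```

Here `parsing.md` and `reporting.md` are the "specialist" files sitting next to SKILL.md; the body above never duplicates their contents, it only points to them.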

This separation is the core of Progressive Disclosure. Instead of stuffing 5,000 words of instructions into every chat, the Skill selectively loads only what is relevant. If you are summarizing a document, it doesn't load the code-generation logic. It keeps the "cognitive scope" narrow and precise.

3. The "Token Economy" Hypothesis

One of the most exciting things about Skills is that they can significantly reduce token usage, sometimes by 60% to 70%.

This efficiency comes from that Progressive Disclosure architecture. In a typical session, Claude might recognize twenty different skills but only loads a tiny metadata snippet for each one. The heavy instructions and reference files stay out of the context window until Claude decides they are actually needed.
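A rough model of that economy, with invented token counts: each skill contributes only its metadata up front, and a full instruction body is paid for only when it's actually triggered.

```python
# Hypothetical model of progressive disclosure (all token counts are invented).

SKILLS = {
    "expense-reports": {"meta_tokens": 30, "body_tokens": 2500},
    "code-review":     {"meta_tokens": 25, "body_tokens": 3200},
    "doc-summarizer":  {"meta_tokens": 20, "body_tokens": 1800},
}

def eager_cost() -> int:
    """Old SOP approach: every instruction set is loaded up front."""
    return sum(s["meta_tokens"] + s["body_tokens"] for s in SKILLS.values())

def lazy_cost(triggered: list[str]) -> int:
    """Skills approach: all metadata snippets, but only the triggered bodies."""
    meta = sum(s["meta_tokens"] for s in SKILLS.values())
    bodies = sum(SKILLS[name]["body_tokens"] for name in triggered)
    return meta + bodies

savings = 1 - lazy_cost(["expense-reports"]) / eager_cost()
```

With these made-up numbers, triggering one skill out of three lands the savings in that same 60-70% band; the real figure obviously depends on how many skills fire per session.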

Loading less context uses fewer tokens, but it also leads to better results. This isn't just a hunch; 2025 research on "Context Rot" shows that even if a model is technically within its context window, irrelevant information causes reasoning performance to drop by anywhere from 13% to a staggering 85%. By limiting what the model sees, we aren't just saving money; we are dramatically improving the quality of the reasoning.

4. When Does a Skill Actually Make Sense?

If you just want your AI to always use a specific tone or return JSON, a simple template is fine; building a Skill for a trivial task is over-engineering. The value of a Skill explodes when you deal with conditional complexity. Think about a financial workflow:

  1. Input: A monthly report.
  2. Action: Parse expenses and analyze patterns.
  3. Condition: If spending is over budget, load the "Planning Recommendation" context.
  4. Output: Generate an Excel file and trigger a tool to email the summary to an accountant.

At this point, you have a system that automatically orchestrates the different branches of your workflow.
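The four steps above can be sketched as a conditional pipeline. All the names here are illustrative, and the Excel/email steps are stand-ins for real tool calls:

```python
# Illustrative sketch of the conditional workflow; strings stand in for tools.

def run_monthly_report(expenses: list[float], budget: float) -> dict:
    total = sum(expenses)
    steps = ["parse-expenses", "analyze-patterns"]
    # Condition: extra context is loaded only when it's actually relevant.
    if total > budget:
        steps.append("load:planning-recommendation")
    # Output: generate the artifact and hand off to a (stubbed) email tool.
    steps += ["generate-excel", "email-accountant"]
    return {"total": total, "over_budget": total > budget, "steps": steps}

result = run_monthly_report([1200.0, 800.0, 650.0], budget=2000.0)
```

The "Planning Recommendation" branch is exactly the kind of thing progressive disclosure keeps out of context in the months where spending stays under budget.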

5. The "Unanswered Questions" Log

This is what I’m currently working through and what I'll cover in the next few articles:

  • Observability: How do we actually see the Skill flow? I want to know exactly when a specific context is triggered and how to measure if the output is truly optimized.
  • Versioning: How do we update a Skill without breaking the workflows that rely on it?
  • Dynamic Tool Selection: I’ve discovered that Claude uses a search pattern to manage massive libraries. Instead of loading every tool definition, it uses a lightweight index to find tools on demand. I’m exploring how to implement custom search strategies in my own skills.
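I don't know the format of Claude's internal index, but the idea in that last bullet can be approximated with a keyword index over tool descriptions: only the definitions that match the task get pulled into context.

```python
# Hypothetical keyword index over tool descriptions; only matches are loaded.

TOOL_INDEX = {
    "send_email":   "send an email message to a recipient",
    "parse_pdf":    "extract tables and text from a pdf document",
    "build_excel":  "generate an excel spreadsheet from tabular data",
    "query_ledger": "look up transactions in the accounting ledger",
}

def find_tools(task: str, limit: int = 2) -> list[str]:
    """Score tools by keyword overlap with the task; keep only the top hits."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(desc.split())), name)
        for name, desc in TOOL_INDEX.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]

hits = find_tools("extract tables from a pdf statement")
```

A real implementation would presumably use embeddings rather than word overlap, but the contract is the same: a cheap lookup decides which expensive definitions enter the context window.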

What’s Next?

This is just the beginning. I’m currently building custom skills for my day-to-day work. While I can’t share the proprietary details, I’ll be documenting the broader architectural lessons I learn along the way.

I’ll also be publishing public skills for Openclaw. It uses a similar system, but I’ve noticed it tends to be context-heavy, which leads to significant token waste. I’m looking at how to port Claude’s efficient patterns over to Openclaw to lean out that workflow.

I’m curious about what you’re building. What messy, repetitive workflows are you trying to turn into "systems" right now? If you've managed to incorporate skills into your day-to-day, I’d love to hear how it’s changing your approach.

Top comments (1)

Kinan Hamwi

This is actually the main reason I built Opaal, which I just published today: it helps draft these long orchestration prompts. github.com/Agravak/opaal