How Enterprises Encode Institutional Knowledge into AI Agents
What Is an AI Agent?
An AI agent is a system that uses a large language model (LLM) to reason, plan, and act to reach a goal. Unlike a simple chatbot that only answers questions, an agent can:
- Perceive its environment (user input, tool results, context)
- Decide what to do next (reason, plan, choose tools)
- Act by calling tools, APIs, or scripts
- Iterate until the task is done (ReAct: Reason → Act → Observe → repeat)
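That loop can be sketched in a few lines of Python. This is illustrative only: `llm_decide` and `run_tool` are hypothetical stand-ins for the model call and the tool layer, not a real framework API.

```python
def llm_decide(goal: str, observations: list) -> dict:
    """Stub 'reasoning' step: finish once any observation exists."""
    if observations:
        return {"action": "finish", "answer": observations[-1]}
    return {"action": "lookup", "input": goal}

def run_tool(action: str, tool_input: str) -> str:
    """Stub 'tool' step: pretend to fetch data for the request."""
    return f"result for {tool_input}"

def react_loop(goal: str, max_steps: int = 5) -> str:
    """ReAct: Reason -> Act -> Observe, repeated until done."""
    observations = []
    for _ in range(max_steps):
        decision = llm_decide(goal, observations)          # Reason
        if decision["action"] == "finish":
            return decision["answer"]
        obs = run_tool(decision["action"], decision["input"])  # Act
        observations.append(obs)                           # Observe
    return "max steps reached"
```

In a real agent, `llm_decide` is an LLM call that returns either a tool invocation or a final answer, and `run_tool` dispatches to actual APIs.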
AI agents are being deployed across every department: HR, Finance, Legal, IT, Customer Support. They can read documents, call APIs, extract data, and take actions. On paper, the capability is there.
But in practice, something keeps going wrong. The agent gives an answer that is technically correct but does not match how your company actually operates. It follows a process that made sense in general but violates your internal policy. It retrieves the right data but does not know what to do with it next. It escalates everything when only some things need escalating, or escalates nothing when everything does.
The problem is not the model. The problem is that the agent has no knowledge of your organization: your leave policies, your approval thresholds, your GL coding rules, your escalation criteria, your vendor requirements. It has been trained on the public internet. It knows what companies do in general. It does not know what your company does specifically.
This is the gap that Agent Skills closes.
What Are Agent Skills?
Agent Skills are a simple, open format for giving AI agents domain expertise and procedural knowledge. They answer one question: How should the agent approach this kind of task?
The Pilot, the Plane, and the Flight Manual
The clearest way to understand how the LLM, Tools, and Agent Skills work together is through a single analogy, one that maps all three parts, not just two.
Picture a commercial aircraft sitting at the gate, ready for departure. It has everything a plane is supposed to have. But before we can talk about what makes it fly safely, we need to understand what each part actually does.
The Controls: Tools
The aircraft has throttles, a yoke, flaps, landing gear, and dozens of other physical controls. These are the mechanisms that change the state of the world. Push the throttle forward and the engines spool up. Deploy the flaps and the lift characteristics of the wing change. Lower the landing gear and the plane is ready to touch down.
The controls cannot do anything on their own. They sit inert until someone operates them. But without them, nothing can happen at all, no matter how skilled the pilot or how detailed the manual.
In an AI agent, Tools are the controls. They are the callable functions that interact with the outside world: querying a database, calling a REST API, reading a PDF, posting to Slack, writing a record to an ERP. Without tools, the agent can reason about anything but change nothing. With tools, every decision the agent makes can become a real action in a real system.
The Pilot: The LLM
The pilot is what brings the aircraft to life. They sit in the cockpit, read the instruments (altitude, airspeed, weather radar, traffic alerts), and make continuous decisions: when to climb, when to level off, when to adjust course, when to call air traffic control.
The pilot does not invent the controls. The throttle was already there. The pilot decides which control to use, when to use it, in what combination, and in what sequence. They are the reasoning layer that turns raw capability into purposeful action.
In an AI agent, the LLM is the pilot. It reads the inputs (the user message, the tool results, the conversation history) and decides what to do next. Which tool to call. What parameters to pass. Whether the task is complete or needs another step. The LLM does not execute tools directly; it decides to invoke them, just as a pilot decides to operate a control.
The Flight Manual: Agent Skills
Now imagine a highly experienced pilot in an unfamiliar aircraft type, flying into an airport they have never visited before, under regulations they were not trained on. They can fly. They can read instruments. They can operate controls. But they are improvising every decision because they do not have the specific procedures for this situation.
The flight manual, the Standard Operating Procedures, is what fills that gap. It tells the pilot exactly what checklist to run before takeoff at this airport. What altitude to maintain in this specific airspace. Precisely what to do when this warning light illuminates. How to coordinate with ground control using this airline’s specific protocols.
The manual does not fly the plane. It does not operate the controls. What it does is ensure that every decision the pilot makes is the correct decision for this context, not just a reasonable guess based on general experience.
In an AI agent, Agent Skills are the flight manual. They encode your organization’s specific rules, workflows, and policies: the leave entitlements, the approval thresholds, the invoice validation steps, the escalation criteria. The LLM still does the reasoning. The tools still take the actions. But now every decision is grounded in your actual procedures, not generic training data.
What Happens When One Is Missing
Controls but no pilot and no manual (Tools only): the throttle is there but nothing is operating it. The agent has APIs it can call but no reasoning to decide which one, when, or in what order. It cannot complete a task.
Pilot but no controls (LLM only): the pilot reads every instrument perfectly and knows exactly what to do but has no way to act. The agent reasons flawlessly but cannot retrieve data, call a system, or change anything in the world. It can only generate text.
Pilot and controls but no manual (LLM + Tools, no Agent Skills): the pilot can fly and the controls respond, but every decision is improvised from general experience. This is where most enterprise agents are today. They work, but inconsistently. They produce plausible answers that do not match your actual policies. Each run may go differently. Nothing is auditable.
All three together: the controls take action, the pilot reasons about what to do, and the manual ensures every decision follows your organization’s exact procedures. Consistent. Auditable. Trustworthy.
Tools (controls) give the agent reach. The LLM (pilot) gives the agent reasoning. Agent Skills (flight manual) give the agent organizational judgment. You need all three, just as a flight needs controls, a pilot, and the procedures to fly it safely.
The Same Pattern Across Every Skilled Domain
The analogy holds anywhere expertise is applied through instruments:
Surgeon and scalpel: the scalpel can cut anywhere. The surgeon’s training specifies exactly where, how deep, at what angle, and what to do if something unexpected is found. Remove the training and the scalpel is just a sharp object.
Chef and kitchen: the kitchen has every tool (ovens, knives, heat, timers). The recipe encodes the sequence, temperatures, timings, and substitutions that produce a consistent dish. Without it, two chefs produce two different meals from the same ingredients.
Architect and CAD tools: the software can draw anything. The architect’s expertise encodes load-bearing constraints, building codes, spatial relationships, and material properties that make the drawing a safe, buildable structure.
In every case the pattern is identical: tools provide capability, expertise provides judgment. The tool without the expertise is hardware. The expertise without the tool cannot act. Together they produce something reliable.
What Agent Skills Actually Are
A skill is a directory containing:
- `SKILL.md` (required): instructions in plain Markdown with YAML frontmatter metadata
- `scripts/` (optional): code the agent can run
- `references/` (optional): policy documents, FAQs, reference material
- `assets/` (optional): templates, schemas, examples
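A minimal `SKILL.md` might look like the sketch below. The skill name and description match the incident-report demo used later in this article; the procedure steps are illustrative, so consult the agentskills.io specification for the exact frontmatter schema.

```markdown
---
name: incident-report
description: Check incident and outage status for P1/P2 incidents.
---

# Incident Report Skill

When the user asks about an outage or incident:
1. Call the incident lookup tool with the incident ID (format INC-YYYY-NNN).
2. If the incident is a P1 and still active, flag it for escalation.
3. Summarize severity, status, affected service, and owner.
```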
All Three Working Together
In a well-designed enterprise agent, MCP (Model Context Protocol) tools and Agent Skills each play their role, and neither replaces the other:
- MCP Tool: fetches the vendor contract document from SharePoint
- Agent Skill: applies liability cap rules, escalation logic, and policy references
- LLM: reads both, produces the grounded compliance response
Remove any one of the three and the agent breaks. The tool without the skill fetches the document but does not know what to look for. The skill without the tool knows the rules but cannot access the data. The LLM without either produces a plausible guess.
> Need to connect to something? Use MCP. Need to teach the agent how to approach something? Use a Skill. Need a policy-grounded answer? You need both.
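In practice, the pairing is often as simple as placing the skill's instructions in the system prompt alongside the tool descriptions, so the LLM reasons with both. A minimal sketch; the prompt layout and function name are illustrative, not a framework API:

```python
def build_system_prompt(skill_instructions: str, tool_descriptions: list[str]) -> str:
    """Combine an Agent Skill (procedures) with tool descriptions (capabilities)."""
    return (
        "You are an enterprise agent. Follow the procedures exactly.\n\n"
        "## Procedures (Agent Skill)\n"
        f"{skill_instructions}\n\n"
        "## Available tools\n"
        + "\n".join(f"- {d}" for d in tool_descriptions)
    )
```

The skill text supplies the judgment; the tool list supplies the reach; the model sees both every turn.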
Scaling Agent Skills Across the Enterprise
One skill in one department is a proof of concept. The real value is a skills library: a version-controlled repository of organizational expertise that any agent can draw from, on any compatible platform, across every department.
Any agent on any platform granted read access to this repository can load skills from it. When a policy changes, one SKILL.md update propagates to every agent on every platform simultaneously.
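A loader for such a library can be small. A sketch, assuming a layout of `<library>/<skill-name>/SKILL.md` and simple `key: value` frontmatter; a real implementation would use a proper YAML parser:

```python
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Parse the frontmatter block at the top of a SKILL.md.
    Minimal parser: handles flat `key: value` lines between `---` fences only."""
    meta = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def load_skills(library_root: str) -> dict:
    """Map skill name -> metadata for every SKILL.md under the library root."""
    skills = {}
    for skill_md in Path(library_root).glob("*/SKILL.md"):
        meta = parse_frontmatter(skill_md.read_text())
        skills[meta.get("name", skill_md.parent.name)] = meta
    return skills
```

Because every agent reads the same files, updating one `SKILL.md` updates every consumer on the next load.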
One skills library. Every department. Every platform. Updated in one place. Governed like code. *This is the enterprise value proposition of Agent Skills.*
Building an Agent with Tools and Agent Skills
Sample Tool
"""Incident report status tool - implements incident-report skill."""
from langchain_core.tools import tool
from pydantic import BaseModel, Field
class IncidentInput(BaseModel):
incident_id: str = Field(..., description="Incident ID (e.g., INC-2025-001)", min_length=5, max_length=32)
_DEMO_INCIDENTS = {
"INC-2025-001": {
"title": "API latency spike in us-east-1",
"severity": "P1",
"status": "Resolved",
"affected_service": "payment-gateway",
"started_at": "2025-03-16 14:30 UTC",
"resolved_at": "2025-03-16 15:45 UTC",
"owner": "SRE On-Call",
},
"INC-2025-002": {
"title": "Database replica lag exceeding threshold",
"severity": "P2",
"status": "Investigating",
"affected_service": "analytics-db",
"started_at": "2025-03-17 09:00 UTC",
"resolved_at": None,
"owner": "Platform Team",
},
"INC-2025-003": {
"title": "CDN cache miss rate elevated",
"severity": "P2",
"status": "Monitoring",
"affected_service": "cdn-edge",
"started_at": "2025-03-17 11:20 UTC",
"resolved_at": None,
"owner": "Infrastructure",
},
}
def lookup_incident(incident_id: str) -> dict:
"""Lookup incident - used by tool and scripts."""
incident_id = incident_id.strip().upper()
if not incident_id:
return {"error": "Incident ID is required. Use format INC-2025-001."}
if incident_id not in _DEMO_INCIDENTS:
return {
"error": f"Incident '{incident_id}' not found. Known demo incidents: "
f"{', '.join(_DEMO_INCIDENTS.keys())}."
}
data = _DEMO_INCIDENTS[incident_id]
return {
"incident_id": incident_id,
"title": data["title"],
"severity": data["severity"],
"status": data["status"],
"affected_service": data["affected_service"],
"started_at": data["started_at"],
"resolved_at": data["resolved_at"] or "N/A - still active",
"owner": data["owner"],
}
@tool
def incident_report_status(incident_id: str) -> str:
"""Check incident report status. Use when user asks about outage status, P1/P2 incidents, or specific incident ID (e.g., INC-2025-001)."""
try:
validated = IncidentInput(incident_id=incident_id)
result = lookup_incident(validated.incident_id)
if "error" in result:
return result["error"]
resolved = f"Resolved: {result['resolved_at']}" if result["resolved_at"] != "N/A - still active" else "Status: Active"
return (
f"Incident {result['incident_id']}: {result['title']}\n"
f"Severity: {result['severity']} | Status: {result['status']}\n"
f"Affected: {result['affected_service']} | Owner: {result['owner']}\n"
f"Started: {result['started_at']} | {resolved}"
)
except Exception as e:
return f"Error: Invalid incident ID format. Use INC-YYYY-NNN. Details: {e}"
scripts/lookup.py
"""Lookup incident by ID. Usage: python lookup.py <incident_id>"""
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[4]))
from agentskills.skills._tools.incident_report import lookup_incident
if __name__ == "__main__":
incident_id = sys.argv[1] if len(sys.argv) > 1 else ""
result = lookup_incident(incident_id)
print(json.dumps(result, indent=2))
agent.py
"""LangGraph ReAct agent with AgentSkills."""
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from agentskills.skills import get_agent_skills
def create_agent(model: str = "gpt-4o-mini", temperature: float = 0):
"""Create a ReAct agent with AgentSkills tools."""
llm = ChatOpenAI(model=model, temperature=temperature)
tools = get_agent_skills()
return create_react_agent(llm, tools)
def run_agent(agent, user_message: str) -> str:
"""Run the agent and return the final response."""
result = agent.invoke({"messages": [{"role": "user", "content": user_message}]})
messages = result.get("messages", [])
if messages:
last = messages[-1]
if hasattr(last, "content") and last.content:
return last.content
return str(result)
def run_agent_interactive(agent, user_message: str) -> str:
"""Run the agent with visible execution: streams tool calls and results."""
from langchain_core.messages import AIMessage, ToolMessage
final_content = ""
for chunk in agent.stream(
{"messages": [{"role": "user", "content": user_message}]},
stream_mode="updates",
):
for node_name, node_output in chunk.items():
messages = node_output.get("messages", [])
for msg in messages:
if isinstance(msg, AIMessage):
if getattr(msg, "tool_calls", None):
for tc in msg.tool_calls:
name = tc.get("name", "?")
args = tc.get("args", {})
print(f" 🔧 AgentSkills: {name}({args})")
elif msg.content:
text = str(msg.content)
print(f" 💭 Agent: {text[:200]}{'...' if len(text) > 200 else ''}")
final_content = text
elif isinstance(msg, ToolMessage):
content = str(msg.content)[:300]
if len(str(msg.content)) > 300:
content += "..."
print(f" ✓ Result: {content}")
return final_content
main.py
"""Run the AgentSkills LangGraph agent."""
from dotenv import load_dotenv
load_dotenv()
from agentskills import create_agent, get_agent_skills, run_agent_interactive
from agentskills.skills.registry import get_skill_catalog
def _print_skills():
"""Show available Agent Skills (agentskills.io format)."""
catalog = get_skill_catalog()
if catalog:
print("\n Agent Skills (agentskills.io):")
for s in catalog:
desc = s["description"]
print(f" • {s['name']}: {desc[:70]}{'...' if len(desc) > 70 else ''}")
else:
tools = get_agent_skills()
print("\n Agent Skills:")
for t in tools:
print(f" • {t.name}")
print()
def main():
print("Creating AgentSkills agent...")
agent = create_agent()
print("\n" + "=" * 50)
print(" AgentSkills – enterprise skills for the agent")
print("=" * 50)
_print_skills()
print("Commands: 'skills' = list AgentSkills | 'quit'/'exit' = stop\n")
while True:
try:
user_input = input("You: ").strip()
except (EOFError, KeyboardInterrupt):
print("\nGoodbye!")
break
if not user_input:
continue
if user_input.lower() in ("quit", "exit", "q"):
print("Goodbye!")
break
if user_input.lower() == "skills":
_print_skills()
continue
print("\n--- AgentSkills executing ---")
response = run_agent_interactive(agent, user_input)
print(f"\n--- Agent ---\n{response}\n")
if __name__ == "__main__":
main()
Run the agent with a sample query or prompt:
```
(AgentSkills) sreenir@Seenivasas-MacBook-Pro AgentSkills % uv run python main.py
Creating AgentSkills agent...

==================================================
 AgentSkills – enterprise skills for the agent
==================================================

 Agent Skills (agentskills.io):
  • contract-review: Check vendor contract review status in legal or procurement. Use when ...
  • incident-report: Check incident and outage status for P1/P2 incidents. Use when the use...
  • jira-ticket: Fetch Jira ticket details including status, assignee, and priority. Us...

Commands: 'skills' = list AgentSkills | 'quit'/'exit' = stop

You: outage status

--- AgentSkills executing ---
  🔧 AgentSkills: incident_report_status({'incident_id': 'INC-2025-001'})
  ✓ Result: Incident INC-2025-001: API latency spike in us-east-1
Severity: P1 | Status: Resolved
Affected: payment-gateway | Owner: SRE On-Call
Started: 2025-03-16 14:30 UTC | Resolved: 2025-03-16 15:45 UTC
  💭 Agent: The current outage status is as follows:
- Incident ID: INC-2025-001
- Description: API latency spike in us-east-1
- Severity: P1
- Status: Resolved
- Affected Service: Payment Ga...

--- Agent ---
The current outage status is as follows:
- Incident ID: INC-2025-001
- Description: API latency spike in us-east-1
- Severity: P1
- Status: Resolved
- Affected Service: Payment Gateway
- Owner: SRE On-Call
- Incident Start Time: March 16, 2025, 14:30 UTC
- Incident Resolved Time: March 16, 2025, 15:45 UTC

If you need more information, feel free to ask!

You: RCA Requests

--- AgentSkills executing ---
  💭 Agent: It seems you're looking for information related to Root Cause Analysis (RCA) requests. However, I can't assist with creating or managing RCA requests directly. If you have a specific incident or issue...

--- Agent ---
It seems you're looking for information related to Root Cause Analysis (RCA) requests. However, I can't assist with creating or managing RCA requests directly. If you have a specific incident or issue in mind, please provide details, and I can help check the status of that incident or provide related information.

You: any active outages

--- AgentSkills executing ---
  🔧 AgentSkills: incident_report_status({'incident_id': 'INC-2025-001'})
  ✓ Result: Incident INC-2025-001: API latency spike in us-east-1
Severity: P1 | Status: Resolved
Affected: payment-gateway | Owner: SRE On-Call
Started: 2025-03-16 14:30 UTC | Resolved: 2025-03-16 15:45 UTC
  🔧 AgentSkills: incident_report_status({'incident_id': 'INC-2025-002'})
  ✓ Result: Incident INC-2025-002: Database replica lag exceeding threshold
Severity: P2 | Status: Investigating
Affected: analytics-db | Owner: Platform Team
Started: 2025-03-17 09:00 UTC | Status: Active
  💭 Agent: Currently, there is one active outage:
- Incident ID: INC-2025-002
- Description: Database replica lag exceeding threshold
- Severity: P2
- Status: Investigating
- Affected ...

--- Agent ---
Currently, there is one active outage:
- Incident ID: INC-2025-002
- Description: Database replica lag exceeding threshold
- Severity: P2
- Status: Investigating
- Affected Service: analytics-db
- Owner: Platform Team
- Started: March 17, 2025, 09:00 UTC

The previous incident (INC-2025-001) regarding API latency spike has been resolved.

You:
```
The Bottom Line
The core challenge in enterprise AI is not raw capability. Modern LLMs are already powerful enough for most business use cases. The real challenge is reliability: ensuring agents consistently follow enterprise rules, policies, and standards at scale.
This is where Agent Skills and MCP complement each other:
- MCP provides access to systems: APIs, databases, and external actions (capabilities)
- Agent Skills define how those capabilities should be used: policies, reasoning steps, and constraints (procedures)
Used together, they close the gap between what an agent can do and what it should do.
- A tool without a skill leads to inconsistent, guess-based behavior
- A skill without a tool produces correct reasoning but no execution
When combined, they create agents that are:
- Predictable (consistent outputs)
- Precise (aligned with business rules)
- Auditable (traceable decisions and actions)
These are not optional qualities; they are baseline requirements for production enterprise systems.
In One Line
MCP gives agents reach. Agent Skills give them discipline.
Write your skills once.
Apply them everywhere.
Govern them like code.
That is enterprise-grade AI.
Thanks
Sreeni Ramadorai