DEV Community

Richard Dillon

Primitive Shifts: AI Skills — The Knowledge Primitive Replacing Prompt Engineering

Every few months, the baseline of how AI systems work quietly moves. Engineers who noticed early weren't smarter — they were just paying attention to the right signals. The shift from "prompt engineering" to "function calling" caught teams off guard in 2023. The rise of MCP for tool integration did the same in late 2024. Now, the same pattern is playing out with AI Skills — and if you're still treating agent context as a prompt engineering problem, you're about to feel that familiar sting of realizing the floor moved while you were standing on it.

What Is It?

AI Skills are portable, structured knowledge packages that teach agents how to perform domain-specific tasks. This is categorically different from tools (which provide access to external systems) and prompts (which provide instructions for a single interaction). A skill encapsulates institutional knowledge: your team's code review standards, your API versioning conventions, the twelve edge cases in your payment processing flow that every new engineer learns the hard way.

The pattern didn't emerge from a single announcement. It crystallized from convergence. Cursor rules, GitHub Copilot custom instructions, Windsurf rules, and the .cursorrules file format all evolved toward the same abstraction independently throughout 2024 and early 2025. Different teams, solving the same underlying problem, arrived at remarkably similar solutions.

Anthropic formalized this in December 2025 with the Agent Skills open standard. OpenAI, Google, and Microsoft tooling adopted compatible interfaces within months. This wasn't vendor lock-in theater — the abstraction was obvious enough that interoperability became the path of least resistance.

Critically, skills are consumed at inference time, not training time. They extend agent capabilities without fine-tuning, sitting in context windows as structured, discoverable knowledge units. The primitive answers a specific architectural question that every production AI team eventually asks: "How do I make my agent understand our codebase conventions, compliance requirements, and domain logic without rebuilding the model?"

The answer used to be "stuff it in the system prompt and hope for the best." Now there's actual infrastructure.
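Concretely, such a knowledge unit is often nothing more than a structured markdown file. A minimal sketch (the section names and the trigger comment mirror conventions discussed later in this post; treat them as illustrative, not a fixed schema):

```markdown
# database-migrations
<!-- skill-version: 1.0.0 -->
<!-- triggers: migration, schema change, alembic -->

## GUARDRAILS
- ALWAYS add new columns as nullable first, backfill, then add constraints
- NEVER modify migrations that have been applied to production

## CONVENTIONS
Migration files live in `src/db/migrations/versions/`.
```

An agent that discovers this file when a task mentions "migration" applies the rules without any per-prompt instruction, which is the whole point of the primitive.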

Why It's Flying Under the Radar

Skills suffer from the worst kind of invisibility: they look like something teams already know how to dismiss. Engineers see .md files with instructions and assume it's prompt engineering with extra steps. The structural and discoverability properties — the actual innovation — are invisible at first glance. "We already have a CONTRIBUTING.md," teams say, missing that the format, placement, and tooling integration are what transform documentation into agent-consumable knowledge.

The adoption happened inside IDEs, not through announcement posts. Over 60,000 open-source projects adopted AGENTS.md files before most engineering teams noticed the pattern. The signal was there — you just had to be looking at repository structures rather than press releases.

Skills solve a problem teams don't name correctly. "Context finding" is the number-one developer pain point — 40% cite it in recent surveys. But teams frame this as a documentation problem or a search problem, not a knowledge-packaging problem. They invest in better docs, improved search indexing, more comprehensive onboarding materials. Meanwhile, the actual friction is that their AI agents can't find the knowledge that already exists.

Here's what should concern you: the format war already ended. While internal teams debated custom solutions and enterprise architects drew up plans for proprietary knowledge management systems, the major AI platforms quietly converged on a common interface. The window for "we'll build our own" closed without a memo.

Flask creator Armin Ronacher's observation went viral in niche circles: moving several MCP integrations to skills improved agent performance meaningfully — fewer hallucinations, better adherence to conventions, reduced back-and-forth. But this signal stayed in technical blogs and Hacker News threads. It didn't reach engineering leadership channels. Which means right now, there are teams planning six-month "AI context management" initiatives that could be solved with a well-structured markdown file and thirty minutes of work.

Hands-On: Try It Today

Let's make this concrete. The following Python script demonstrates how to create, validate, and test a skill file that teaches an AI agent your team's database migration conventions. This isn't a toy example — it's the pattern production teams are using.

#!/usr/bin/env python3
"""
skill_builder.py - Create and validate AI Skills for your codebase

This script demonstrates the emerging skill file pattern:
1. Structured markdown with semantic sections
2. Validation against the Agent Skills schema
3. Testing skill discovery with a local agent

Requires: pip install "pydantic>=2.0" "anthropic>=0.40.0"
Tested with: Python 3.11+, Anthropic Python SDK v0.40.0
"""

import re
from pathlib import Path

from pydantic import BaseModel, Field, field_validator
import anthropic


class SkillSection(BaseModel):
    """A single section within a skill file."""
    heading: str = Field(..., description="Section heading (## level)")
    content: str = Field(..., description="Section content")
    priority: int = Field(default=0, ge=0, le=10, description="0-10, higher = more critical")


class AgentSkill(BaseModel):
    """
    Represents a portable AI Skill following the emerging standard.

    Skills differ from prompts in three ways:
    1. They're discoverable (agents find them based on task context)
    2. They're portable (same skill works across Cursor, Claude Code, Copilot)
    3. They're composable (multiple skills can be active simultaneously)
    """
    name: str = Field(..., description="Skill identifier, e.g., 'database-migrations'")
    version: str = Field(default="1.0.0", pattern=r"^\d+\.\d+\.\d+$")
    triggers: list[str] = Field(
        default_factory=list,
        description="Keywords/patterns that should activate this skill"
    )
    sections: list[SkillSection] = Field(default_factory=list)
    max_rules: int = Field(default=15, description="Guardrails should stay under 15 rules")

    @field_validator('sections')
    @classmethod
    def validate_guardrails_count(cls, sections: list[SkillSection]) -> list[SkillSection]:
        """Enforce the 15-rule limit for GUARDRAILS sections."""
        for section in sections:
            if "GUARDRAIL" in section.heading.upper():
                rule_count = len(re.findall(r'^[-*]\s', section.content, re.MULTILINE))
                if rule_count > 15:
                    raise ValueError(
                        f"GUARDRAILS section has {rule_count} rules; max is 15. "
                        "Agents struggle with more — prioritize ruthlessly."
                    )
        return sections

    def to_markdown(self) -> str:
        """Export skill as AGENTS.md compatible markdown."""
        lines = [
            f"# {self.name}",
            f"<!-- skill-version: {self.version} -->",
            f"<!-- triggers: {', '.join(self.triggers)} -->",
            "",
        ]
        for section in sorted(self.sections, key=lambda s: -s.priority):
            lines.extend([
                f"## {section.heading}",
                "",
                section.content,
                "",
            ])
        return "\n".join(lines)


def create_migration_skill() -> AgentSkill:
    """
    Example: A skill teaching database migration conventions.

    This encapsulates knowledge that would otherwise live in:
    - Onboarding docs (rarely read by AI)
    - Code review comments (learned too late)
    - Tribal knowledge (never written down)
    """
    return AgentSkill(
        name="database-migrations",
        version="1.0.0",
        triggers=["migration", "schema change", "alembic", "database", "ALTER TABLE"],
        sections=[
            SkillSection(
                heading="GUARDRAILS",
                priority=10,
                content="""
- NEVER use `DROP COLUMN` without a two-phase migration plan
- ALWAYS add new columns as nullable first, backfill, then add constraints
- NEVER modify migrations that have been applied to production
- ALWAYS include rollback instructions in migration docstring
- Use `op.execute()` for data migrations, not ORM models
- Foreign keys must use `ondelete="CASCADE"` or explicit alternative
- Index names must follow pattern: `ix_{table}_{columns}`
- ALWAYS run `alembic check` before committing migration files
"""
            ),
            SkillSection(
                heading="CONVENTIONS",
                priority=5,
                content="""
Our migrations follow a specific lifecycle:

1. **Development**: Create migration with `alembic revision --autogenerate -m "description"`
2. **Review**: Migration PRs require explicit sign-off from database owner
3. **Staging**: Migrations run automatically on merge to `staging` branch
4. **Production**: Migrations require manual trigger via deployment pipeline

Migration files live in `src/db/migrations/versions/`. The revision ID format
is timestamp-based (YYYYMMDD_HHMM) not random hex.

For large tables (>1M rows), use batch operations:

```python
# Good: batched update
for batch in op.batch_alter_table('users', batch_size=10000):
    batch.update(...)

# Bad: full table lock
op.execute("UPDATE users SET ...")
```
"""
            ),
            SkillSection(
                heading="COMMON MISTAKES",
                priority=7,
                content="""
**The ORM Import Trap**: Never import application models in migrations.
Models change; migrations are immutable history. Use raw SQL or SQLAlchemy core.

**The Default Value Trap**: Adding `server_default` to existing column requires
two migrations — one to add the default, one to backfill NULL values.

**The Index Trap**: Creating indexes on large tables blocks writes. Use 
`CREATE INDEX CONCURRENTLY` via `op.execute()` with `postgresql_concurrently=True`.
"""
            ),
        ],
    )


def test_skill_discovery(skill: AgentSkill, test_query: str) -> dict:
    """
    Test whether an agent correctly discovers and applies a skill.

    This simulates what happens when a developer asks for help with
    a task that should trigger skill activation.
    """
    client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

    skill_content = skill.to_markdown()

    # Simulate skill injection (in production, IDEs do this automatically)
    system_prompt = f"""You are a coding assistant with access to project-specific skills.

Active skill:
---
{skill_content}
---

When the user's request relates to a skill's triggers, apply that skill's 
guardrails and conventions. Cite specific rules when relevant."""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": test_query}]
    )

    response_text = response.content[0].text

    # Check if skill was actually applied
    skill_indicators = [
        "nullable first" in response_text.lower(),
        "guardrail" in response_text.lower(),
        "two-phase" in response_text.lower(),
        "rollback" in response_text.lower(),
    ]

    return {
        "query": test_query,
        "response": response_text,
        "skill_applied": any(skill_indicators),
        "indicators_found": sum(skill_indicators),
    }


if __name__ == "__main__":
    # Create the skill
    skill = create_migration_skill()

    # Write to AGENTS.md (or skill-specific file)
    output_path = Path("AGENTS.md")
    output_path.write_text(skill.to_markdown())
    print(f"✓ Wrote skill to {output_path}")

    # Validate structure
    print(f"✓ Skill '{skill.name}' v{skill.version} validated")
    print(f"  Triggers: {skill.triggers}")
    print(f"  Sections: {[s.heading for s in skill.sections]}")

    # Test discovery (requires ANTHROPIC_API_KEY)
    try:
        result = test_skill_discovery(
            skill, 
            "I need to add a new 'email_verified' boolean column to the users table"
        )
        print(f"\n✓ Discovery test:")
        print(f"  Query: {result['query']}")
        print(f"  Skill applied: {result['skill_applied']}")
        print(f"  Indicators found: {result['indicators_found']}/4")
    except anthropic.AnthropicError:
        print("\n⚠ Skipping discovery test (set ANTHROPIC_API_KEY to test)")

Start with a single AGENTS.md file at your repository root. Document your team's code review standards, API conventions, and — critically — the things your AI assistant keeps getting wrong. The emerging hierarchy uses MEMORY.md for long-term knowledge, GUARDRAILS.md for hard constraints (keep it under 15 rules; agents struggle with more), and WORKSTATE.md for session continuity.

Test skill discovery by creating a skill for one workflow and verifying your IDE's agent surfaces it unprompted when relevant context appears. Measure the delta: run identical coding tasks with and without skills loaded. Teams report 20-30% reduction in back-and-forth corrections, which compounds significantly over a sprint.
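Before spending API calls, you can sanity-check discovery locally. The sketch below simulates trigger matching with naive substring checks against the `<!-- triggers: ... -->` comment format used in the script above; real IDE agents use richer semantic matching, so treat this as a rough proxy:

```python
import re

def load_triggers(skill_markdown: str) -> list[str]:
    """Parse the `<!-- triggers: ... -->` comment that to_markdown() emits."""
    match = re.search(r"<!-- triggers: (.*?) -->", skill_markdown)
    return [t.strip().lower() for t in match.group(1).split(",")] if match else []

def should_activate(skill_markdown: str, query: str) -> bool:
    """Naive proxy: activate when any trigger keyword appears in the query."""
    q = query.lower()
    return any(trigger in q for trigger in load_triggers(skill_markdown))

doc = "# database-migrations\n<!-- triggers: migration, schema change, alembic -->"
print(should_activate(doc, "Write an Alembic migration for the users table"))  # True
print(should_activate(doc, "Refactor the login view"))  # False
```

If a realistic task description fails this check, your triggers list is too narrow, which is the cheapest bug to fix in the whole pipeline.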

What This Means for Your Stack

If your team has per-developer .cursorrules files, per-repo prompt templates, and scattered system_prompt.txt files across different projects, you're accumulating non-portable knowledge silos. This is technical debt that will become painful when you need to switch tools, onboard new team members, or scale agent usage beyond individual developers.

More immediately relevant: your RAG pipelines may be over-engineered for problems skills solve natively. If you're chunking internal documentation, embedding it into a vector database, and injecting retrieved chunks into prompts at inference time, ask yourself whether a curated 500-line skill file would perform better. For institutional knowledge — conventions, constraints, common mistakes — the answer is usually yes. Lower latency, lower cost, better coherence.
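One way to frame that comparison is raw context cost per request. All numbers below are illustrative assumptions (token counts and the per-million-token price), not benchmarks; plug in your own:

```python
# Back-of-envelope cost comparison. Every number here is an assumption.
PRICE_PER_MTOK_INPUT = 3.00  # assumed $ per 1M input tokens

def cost_per_request(context_tokens: int,
                     price_per_mtok: float = PRICE_PER_MTOK_INPUT) -> float:
    """Dollar cost of the extra context injected into a single request."""
    return context_tokens / 1_000_000 * price_per_mtok

skill_tokens = 2_000  # a curated ~500-line skill file, always in context
rag_tokens = 6_000    # e.g. 8 retrieved chunks at ~750 tokens each

print(f"skill file: ${cost_per_request(skill_tokens):.4f} per request")
print(f"RAG chunks: ${cost_per_request(rag_tokens):.4f} per request")
# RAG also pays embedding + retrieval latency before the model even starts.
```

The dollar gap per request is small; the coherence gap is not, because a curated file is internally consistent while retrieved chunks may contradict each other.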

The "context window economy" is real. Atlassian's 2025 developer productivity data shows AI gains are significantly offset by knowledge-finding friction. Skills function as the compression layer that makes institutional knowledge agent-consumable. You're not solving a documentation problem; you're solving a context density problem.

Here's the architectural insight that matters: Skills and MCP form a complete architecture. MCP gives agents access to systems — databases, APIs, services, tools. Skills give agents knowledge about how to use that access correctly. Most teams have built the former without the latter. They have agents that can query their databases and call their APIs, but those agents don't know the team's conventions, constraints, or context. That's why the outputs feel generic even when the capabilities are powerful.
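As a sketch of that division of labor, a single agent request ends up carrying both layers: skills as system context, MCP tool schemas in the tool list. The tool name and schema shape below are hypothetical:

```python
def build_agent_context(skills: list[str], tool_schemas: list[dict]) -> dict:
    """Combine knowledge (skills) with access (MCP tool schemas) in one payload."""
    return {
        "system": "\n\n---\n\n".join(skills),  # knowledge: how to act correctly
        "tools": tool_schemas,                 # access: what the agent can touch
    }

context = build_agent_context(
    skills=["# database-migrations\n- ALWAYS add new columns as nullable first"],
    tool_schemas=[{
        "name": "run_sql",  # hypothetical MCP-exposed tool
        "description": "Execute SQL against the staging database",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}}},
    }],
)
print(sorted(context))  # ['system', 'tools']
```

An agent with only the `tools` half can run SQL; an agent with both halves knows it must add the column as nullable first.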

Your evaluation frameworks need updating too. You're not just testing model outputs anymore. You're testing whether the right skills were discovered and applied for the task. Add skill coverage to your agent evaluation suite.
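One minimal form of that metric: log which skills the agent actually loaded for each eval task and compare against expectations. Task names, skill names, and the log format here are all hypothetical:

```python
def skill_coverage(eval_cases: list[dict], activations: dict[str, set]) -> float:
    """Fraction of eval tasks where every expected skill was actually loaded."""
    hits = sum(
        1 for case in eval_cases
        if case["expected_skills"] <= activations.get(case["task"], set())
    )
    return hits / len(eval_cases)

cases = [
    {"task": "add email_verified column", "expected_skills": {"database-migrations"}},
    {"task": "bump API version header", "expected_skills": {"api-versioning"}},
]
# In practice this comes from your agent's logs; hardcoded for illustration.
observed = {"add email_verified column": {"database-migrations"}}

print(skill_coverage(cases, observed))  # 0.5
```

A coverage score below 1.0 tells you the failure is in discovery (triggers, file placement), not in the model, before you ever inspect an output.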

The Infrastructure Signal

When Anthropic, OpenAI, Google, and Microsoft independently adopt the same abstraction within six months, you're watching an interface crystallize into infrastructure. This is the signal that separates "interesting experiment" from "new baseline." Skills crossed that threshold.

Bessemer Venture Partners' 2025 State of AI report identifies skills as the "memory, context, and beyond" layer — what they call the dark matter of production AI systems. It's the factor that explains why two teams using the same model, same tools, and same prompts get meaningfully different results. One team's agents know things. The other team's agents are constantly relearning.

The pattern mirrors function calling's trajectory from 2023. That capability went from prompt hack to first-class API primitive to "why are you still parsing JSON manually?" in under 18 months. Skills are on the same curve, roughly twelve months behind. We're currently in the "early adopters have integrated it; everyone else is about to feel pressure" phase.

IDE vendors are building skill registries and discovery mechanisms into their agent UX. The primitives are becoming platform features, not user configurations. This is the point where opting out starts to cost more than opting in.

The research-to-production gap is closing fast. Academic work on "institutional knowledge primitives" appeared in March 2025 arxiv papers. Those same concepts shipped in production products by Q4 2025. The theory-to-deployment cycle has compressed to months, not years. By the time you read a research paper about emerging AI patterns, the major platforms have often already shipped an implementation.

Shift Rating

🟢 Adopt Now

The standard exists. Major platforms support it. The migration cost is minimal — you're writing structured markdown files, not rebuilding infrastructure. Teams that formalize their institutional knowledge into skills today will compound that advantage as agent capabilities expand. Every skill you write now becomes more valuable as models get better at following them.

Teams that wait will find themselves explaining the same context to every new AI tool, repeatedly, while competitors' agents already know the answers. The compounding effect here is real: a team with well-maintained skills gets better agent outputs, which means less time correcting agents, which means more time building, which means faster iteration on the skills themselves.

Start this week. Pick one workflow where your AI assistant consistently makes mistakes or asks for clarification. Document the implicit knowledge in a skill file. Measure the improvement. Then expand from there. The infrastructure is ready. The question is whether you are.

*This is part of **Primitive Shifts** — a monthly series tracking when new AI building blocks move from novel experiments to infrastructure you'll be expected to know.*

Follow the Primitive Shifts series on Dev.to to catch every edition.

Spotted a shift happening in your stack? Drop it in the comments.
