WonderLab

Posted on Jun 11

Open Source Project of the Day (#92): Agent Skills — Engineering Discipline for AI Coding Agents

#ai #opensource #claude #engineering

Introduction

"AI coding agents default to the shortest path — which often means skipping specs, tests, and security reviews."

This is article #92 in the Open Source Project of the Day series. Today's project is Agent Skills — a collection of production-grade engineering workflow skills for AI coding agents, built by Addy Osmani, Principal Engineer on the Google Chrome team.

You've probably used Claude Code or Cursor to write a feature, then looked back and realized the agent wrote no tests. Or an API endpoint had zero input validation. Or the architecture doc was still blank.

This isn't a coincidence. AI agents have a deep-seated pull toward the shortest path. Hand an agent a task and it will make the code run as quickly as possible, then consider the work done. Specs, test coverage, and security hardening are not on the path to "runs," so agents skip them by default.

Agent Skills starts from that observation: it captures the discipline senior engineers bring to their work and encodes it in skill files, so agents follow structured workflows and exit criteria at every development phase instead of improvising.

What You'll Learn

The full architecture: how 24 skills cover 7 phases of the development lifecycle
All 7 slash commands: the complete /spec to /ship workflow
The anatomy of a SKILL.md file: anti-rationalization tables and verification exit conditions
4 agent personas: Code Reviewer, Test Engineer, Security Auditor, Web Performance Auditor
The engineering principles embedded in the workflows: Hyrum's Law, the Beyoncé Rule, Chesterton's Fence
Installation in Claude Code, Cursor, and other AI tools

Prerequisites

Experience with Claude Code, Cursor, or a similar AI coding tool
Basic software engineering background (tests, code review, CI/CD)
Interest in making AI-assisted development more reliable and disciplined

Project Background

What Is Agent Skills?

Agent Skills is a set of production-grade engineering workflow files for AI coding agents, positioned as "the discipline layer your AI agent is missing."

The problem it addresses isn't agent capability — modern AI agents write code well. The problem is default behavior. Without constraints, agents default to shortcuts: get the code running first, write docs later, skip tests for now, handle security "in a future pass." Those deferred tasks accumulate into unmaintainable technical debt in any real project.

Author / Team

Author: Addy Osmani
Background: Principal Engineer at Google Chrome, author of Learning JavaScript Design Patterns, prominent engineering voice in the frontend community
License: MIT
Version: Main branch, continuously updated

Project Stats

⭐ GitHub Stars: 51,900+
🍴 Forks: 5,700+
📦 Content: 24 skills + 7 slash commands + 4 agent personas
📄 License: MIT

Core Features

What It Does

Agent Skills works by providing structured engineering workflows as Markdown files. When an agent processes a relevant task, it reads the skill file and follows the defined steps and checkpoints rather than improvising the shortest path.

Agent without Skills:
Task → Write code immediately → "Done"
          ↓ (skips spec, tests, security)
       Technical debt accumulates

Agent with Skills:
Task → Read skill → Execute by phase → Pass exit criteria → Actually done
         ↓                    ↓
    Clear workflow       Each phase has a verification gate

Use Cases

New feature development
- /spec forces a written specification before any code, converting requirements into testable acceptance criteria
- /plan breaks the feature into atomic tasks, each touching no more than 5 files
- /build implements one vertical slice at a time, with a test and commit per slice

Code quality
- /review conducts a code review to a Staff-engineer standard, covering readability, testability, and maintainability
- /code-simplify targets complexity reduction specifically, separate from general refactoring

Testing
- /test runs a test-driven development cycle (Red-Green-Refactor)
- "Prove-It" mode for bug fixes: first write a failing test that reproduces the bug; once the fix is in, that same test passing proves the fix

Security hardening
- The security-and-hardening skill mandates STRIDE threat modeling before writing security-sensitive code
- Separate checklist for LLM-specific risks: prompt injection, context leakage, untrusted model output

Shipping
- /ship covers the complete release chain from Git workflow through CI/CD to observability

Quick Start

Install in Claude Code:

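# Option 1: Clone the full repo and copy everything
git clone https://github.com/addyosmani/agent-skills
mkdir -p ~/.claude/skills ~/.claude/agents ~/.claude/commands
# The trailing "/." copies directory contents, so the result is the same
# whether or not the target directory already existed
cp -r agent-skills/skills/. ~/.claude/skills/
cp -r agent-skills/agents/. ~/.claude/agents/
cp -r agent-skills/commands/. ~/.claude/commands/

# Option 2: Copy individual skills as needed
# (Claude Code expects each skill as its own directory containing SKILL.md)
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/spec-driven-development ~/.claude/skills/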

Then use slash commands directly in conversation:

/spec I need a user authentication module — email registration, OAuth login, password reset

/build auto Implement the auth module from the spec above

/review Review all changes under src/auth/

/ship Prepare v1.2.0 for release

Install in Cursor:

# Project-level
mkdir -p .cursor/skills
cp -r agent-skills/skills/. .cursor/skills/

# Global
mkdir -p ~/.cursor/skills
cp -r agent-skills/skills/. ~/.cursor/skills/

Compatibility across AI tools:

Tool	Path	Command support
Claude Code	`~/.claude/skills/`	✅ Full
Cursor	`.cursor/skills/`	⚠️ Partial
Gemini CLI	`~/.gemini/skills/`	⚠️ Partial
Windsurf	`.windsurf/skills/`	⚠️ Partial
OpenCode	`.opencode/skills/`	⚠️ Partial
GitHub Copilot	Markdown system prompt	⚠️ No slash commands
Kiro IDE	Native support	✅ Full

All 7 Commands

Command	Phase	Core Principle
`/spec`	Define	Spec before code
`/plan`	Plan	Small, atomic tasks
`/build`	Build	One slice at a time
`/test`	Verify	Tests are proof
`/review`	Review	Improve code health
`/code-simplify`	Simplify	Clarity over cleverness
`/ship`	Deploy	Faster is safer

/build auto is a special mode: you approve the plan once, and the agent executes the full implementation autonomously — but still commits and tests each task individually.

Deep Dive

Skill File Anatomy

Every SKILL.md follows a consistent structure with four core sections:

SKILL.md structure
├── Frontmatter (name, description, trigger conditions)
├── Step-by-step Workflow (phased, specific steps)
├── Anti-Rationalization Table (common excuses + rebuttals)
└── Verification / Exit Criteria (what "done" actually means)

The last two sections carry most of the value.
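To make the four-part layout concrete, here is a minimal Python sketch that splits a skill file into its frontmatter and its sections. The sample file content, the section titles, and the `parse_skill` helper are all assumptions made for illustration; the real SKILL.md files in the repository may use different headings and fields.

```python
from dataclasses import dataclass

# Hypothetical SKILL.md content: field names and section titles are
# assumptions for illustration, not copied from the repository.
EXAMPLE_SKILL = """\
---
name: incremental-implementation
description: Build in vertical slices, testing and committing each one.
---

## Workflow
1. Pick the next slice.

## Common Rationalizations
| Rationalization | Reality |
|---|---|
| "I'll add tests later" | Bugs compound. |

## Verification
- [ ] Full test suite passes
"""


@dataclass
class Skill:
    meta: dict[str, str]      # frontmatter key/value pairs
    sections: dict[str, str]  # section title -> section body


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter pairs and '## ' sections."""
    meta: dict[str, str] = {}
    body = text
    if text.startswith("---\n"):
        # Everything between the first two '---' lines is frontmatter.
        header, _, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()

    sections: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return Skill(meta, sections)


skill = parse_skill(EXAMPLE_SKILL)
print(skill.meta["name"])    # incremental-implementation
print(list(skill.sections))  # ['Workflow', 'Common Rationalizations', 'Verification']
```

The point of the sketch is only that a skill is plain, structured Markdown: nothing in it requires special tooling to read, which is why the same files can be dropped into several different agents.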

Anti-rationalization tables list the shortcuts agents most commonly take, paired with the reality:

Rationalization	Reality
"I'll add tests later"	Bugs compound. A bug in Slice 1 makes Slices 2-5 wrong.
"It's faster to do it all at once"	Feels faster until something breaks across 500 changed lines.
"This refactor is small enough to include"	Refactors mixed with features make both harder to review and debug.
"Run again just to be sure"	Repeating the same command adds nothing unless the code has changed.

Exit criteria define what "done" actually means. "Seems right" is never sufficient:

incremental-implementation exit criteria:
- [ ] Each increment individually tested and committed
- [ ] Full test suite passes
- [ ] Build is clean
- [ ] Feature works end-to-end as specified
- [ ] No uncommitted changes remain

The design gives agents a concrete checklist to verify against, rather than relying on self-assessed completion.
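A checklist like this is easy to verify mechanically. The following Python sketch, which assumes the criteria are written as Markdown task-list items, reports which items are still unchecked. The function names `unmet_criteria` and `is_done` are hypothetical and not part of Agent Skills.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def unmet_criteria(checklist: str) -> list[str]:
    """Return the text of every checklist item that is still unchecked."""
    unmet = []
    for line in checklist.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            unmet.append(match.group(2).strip())
    return unmet


def is_done(checklist: str) -> bool:
    """'Done' means at least one criterion exists and none is unchecked."""
    has_items = any(CHECKBOX.match(line) for line in checklist.splitlines())
    return has_items and not unmet_criteria(checklist)


report = """\
- [x] Each increment individually tested and committed
- [x] Full test suite passes
- [x] Build is clean
- [ ] Feature works end-to-end as specified
- [x] No uncommitted changes remain
"""

print(is_done(report))         # False
print(unmet_criteria(report))  # ['Feature works end-to-end as specified']
```

Treating an empty checklist as "not done" is a deliberate choice in this sketch: a report with no criteria at all should not pass the gate by accident.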

The Full 24-Skill Map

Define (3)
  ├── interview-me             ← Requirement elicitation through structured questions
  ├── idea-refine              ← Turning rough ideas into executable directions
  └── spec-driven-development  ← Full spec before any implementation

Plan (1)
  └── planning-and-task-breakdown  ← Atomic task decomposition, ≤5 files per task

Build (7)
  ├── incremental-implementation   ← Vertical slices, test and commit per slice
  ├── test-driven-development      ← Red-Green-Refactor cycle
  ├── context-engineering          ← Precise control of agent working context
  ├── source-driven-development    ← Existing code as ground truth
  ├── doubt-driven-development     ← Actively surfacing blind spots
  ├── frontend-ui-engineering      ← Component design and accessibility
  └── api-and-interface-design     ← Contract-first, versioned interfaces

Verify (2)
  ├── browser-testing-with-devtools  ← DevTools-assisted browser testing
  └── debugging-and-error-recovery   ← Systematic fault isolation and fix

Review (4)
  ├── code-review-and-quality   ← Readability, testability, maintainability
  ├── code-simplification       ← Complexity reduction, distinct from refactoring
  ├── security-and-hardening    ← STRIDE modeling + non-negotiable checklist
  └── performance-optimization  ← Core Web Vitals and backend performance

Ship (6)
  ├── git-workflow-and-versioning       ← Trunk-based development
  ├── ci-cd-and-automation             ← Automated pipeline design
  ├── deprecation-and-migration        ← Safe API removal and migration
  ├── documentation-and-adrs           ← Architecture Decision Records
  ├── observability-and-instrumentation ← Logs, metrics, traces
  └── shipping-and-launch              ← Complete release checklist

Meta (1)
  └── using-agent-skills  ← How to use this system effectively

4 Agent Personas

Beyond skill files, the project provides 4 specialized agent personas for targeted work:

code-reviewer: Reviews code at Staff engineer standard — readability, testability, side effects, edge cases.

test-engineer: Focuses on test coverage and quality analysis, examining test design rather than just measuring coverage numbers.

security-auditor: OWASP-based security assessment with a separate checklist for LLM applications — prompt injection, context leakage, untrusted model output.

web-performance-auditor: Core Web Vitals audit with actionable optimization recommendations.

Using personas in Claude Code:

Use the security-auditor persona to review src/api/auth.ts

Use the web-performance-auditor persona to analyze homepage load performance

Embedded Engineering Principles

The skills encode several established engineering principles, some of them popularized by Google's engineering culture, directly into the workflows:

Hyrum's Law: Once an API has enough users, they will depend on every observable behavior, regardless of what the documentation says. In practice: search all callers before changing any public behavior; don't assume undocumented behavior goes unused.

Beyoncé Rule ("if you liked it, you should have put a test on it"): If a behavior is worth keeping, it's worth having a test. Code without test coverage has no safety net when changed.

Chesterton's Fence: Don't remove code you don't understand. Establish the reason it exists before deciding to delete it.

Shift Left: Move security and testing from pre-release into development. The earlier a problem is caught, the cheaper it is to fix.

Trunk-based development: Short-lived feature branches, frequent integration into the main branch. Avoids the merge conflicts that accumulate with long-running branches.
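Hyrum's Law and the Beyoncé Rule work together in practice, as this small Python sketch shows. The `list_tags` function is a made-up example, not code from the project: its docstring promises very little, yet callers can observe its ordering and de-duplication, so the tests pin both behaviors down.

```python
def list_tags(raw: str) -> list[str]:
    """Return the tags found in a comma-separated string.

    The documentation promises nothing about order or duplicates, but both
    are observable, so (per Hyrum's Law) some caller will rely on them.
    """
    seen: list[str] = []
    for tag in raw.split(","):
        tag = tag.strip().lower()
        if tag and tag not in seen:
            seen.append(tag)
    return seen


# Beyoncé Rule: behavior worth keeping gets a test, so a later "harmless"
# refactor (for example, switching to a set) fails loudly instead of silently.
def test_preserves_first_seen_order_and_dedupes():
    assert list_tags("b, a, B, c") == ["b", "a", "c"]


def test_ignores_empty_entries():
    assert list_tags("a,, ,b") == ["a", "b"]


test_preserves_first_seen_order_and_dedupes()
test_ignores_empty_entries()
print("all pinned behaviors hold")
```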

security-and-hardening: The LLM-Specific Rules

This skill includes a section dedicated to LLM applications that's worth examining on its own:

"Treat all model output as untrusted input."

Specific rules:

Never pass model output directly into SQL queries, eval(), shell commands, or innerHTML
The system prompt is not a security boundary — enforce permissions in code, not prompts
Keep private data, especially data belonging to other users, out of prompts; anything in context can be echoed back

These rules address real vulnerability patterns in current LLM application development, not hypothetical threats.
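The first rule can be illustrated with Python's standard library. The sketch below is not taken from the skill itself: the `users` table, the allowlist, and the three helper functions are hypothetical, and they only show the general shape of treating model output as data rather than as code.

```python
import html
import shlex
import sqlite3

# Hypothetical allowlist: the only complete command lines the app will run.
ALLOWED_COMMANDS = {("git", "status"), ("git", "diff")}


def find_user(conn: sqlite3.Connection, model_output: str) -> list[tuple]:
    """Look up a user by name; the model's text is bound as a parameter,
    never spliced into the SQL string."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (model_output,)
    ).fetchall()


def render_reply(model_output: str) -> str:
    """Escape model text before embedding it in HTML."""
    return "<p>" + html.escape(model_output) + "</p>"


def parse_command(model_output: str) -> list[str]:
    """Return an argv list only if it exactly matches an allowlisted command."""
    try:
        argv = shlex.split(model_output)
    except ValueError as error:  # for example, unbalanced quotes
        raise PermissionError("unparseable command") from error
    if tuple(argv) not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {model_output!r}")
    # Intended for subprocess.run(argv), i.e. without shell=True.
    return argv


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

hostile = "alice' OR '1'='1"
print(find_user(conn, hostile))     # []
print(find_user(conn, "alice"))     # [(1, 'alice')]
print(render_reply("<img src=x onerror=alert(1)>"))
print(parse_command("git status"))  # ['git', 'status']
```

The same idea carries over to the second rule: the allowlist lives in code, so no amount of prompt manipulation can widen it.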

spec-driven-development: Four Gated Phases

The spec workflow enforces a four-phase gate model, with each phase requiring human review before the next one begins:

Phase 1: Specify
  → Draft requirements covering objective, structure, testing, and boundaries
  → Surface assumptions explicitly — list them and ask for correction

Phase 2: Plan
  → Technical implementation approach with dependencies and risks
  → Reframe vague goals as testable success criteria ("faster" → specific LCP/CLS targets)

Phase 3: Tasks
  → Discrete, verifiable work items (~5 files max each)
  → Three-tier boundaries: Always do / Ask first / Never do

Phase 4: Implement
  → Execute tasks incrementally, spec stays alive
  → Update spec when decisions shift; commit it to version control

The common trap the skill flags: "I'll write the spec after I code" produces documentation, not specification. The value comes from forcing clarity before work begins.
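The gate model can be pictured as a small state machine. The Python sketch below is an illustration under stated assumptions, not how the skill is implemented (the skill is a Markdown file that the agent reads): `SpecWorkflow`, `approve`, and `oversized_tasks` are hypothetical names, and the 5-file limit is taken from the guideline mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional

PHASES = ["Specify", "Plan", "Tasks", "Implement"]
MAX_FILES_PER_TASK = 5  # the "~5 files max" guideline


@dataclass
class SpecWorkflow:
    """Tracks which phases a human reviewer has signed off on."""
    approvals: dict[str, str] = field(default_factory=dict)

    @property
    def current_phase(self) -> Optional[str]:
        """The first phase without an approval, or None once all are approved."""
        for phase in PHASES:
            if phase not in self.approvals:
                return phase
        return None

    def approve(self, phase: str, reviewer: str) -> None:
        """Record a human sign-off; phases can only be approved in order."""
        if phase != self.current_phase:
            raise ValueError(
                f"cannot approve {phase!r}; current phase is {self.current_phase!r}"
            )
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        self.approvals[phase] = reviewer


def oversized_tasks(tasks: dict[str, list[str]]) -> list[str]:
    """Return the names of tasks that touch more files than the guideline allows."""
    return [name for name, files in tasks.items() if len(files) > MAX_FILES_PER_TASK]


workflow = SpecWorkflow()
workflow.approve("Specify", reviewer="alice")
try:
    workflow.approve("Tasks", reviewer="alice")  # skips Plan, so the gate refuses
except ValueError as error:
    print(error)

print(workflow.current_phase)  # Plan
```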

Links and Resources

Official Resources

🌟 GitHub: addyosmani/agent-skills
👤 Author: Addy Osmani
📖 Book: Learning JavaScript Design Patterns — Addy Osmani

Engineering Principles Referenced

Hyrum's Law: Hyrum Wright; discussed in Software Engineering at Google
The Beyoncé Rule: Software Engineering at Google
Chesterton's Fence: G.K. Chesterton, The Thing (1929)
Shift Left Testing: Modern DevOps practice
Trunk-Based Development: trunkbaseddevelopment.com

Conclusion

Agent Skills doesn't extend what AI agents can do. It constrains what they do by default.

The capability ceiling for AI coding agents is already quite high. The gap is in default behavior: specs get skipped, tests get deferred, security gets left to "a future pass." Those deferrals accumulate into projects that are hard to understand, hard to change, and hard to trust.

The design approach here is worth studying beyond this specific project. Encode expert behavior as executable constraints rather than relying on the AI to self-assess "how things should be done." Anti-rationalization tables make the most common shortcuts explicit. Exit criteria make "done" unambiguous. The same pattern shows up in PM workflows (pm-skills) and AI design constraints (taste-skill) — structured human expertise, packaged for AI consumption.

For any engineer using AI-assisted coding, agent-skills is worth a trial. At minimum, install spec-driven-development and test-driven-development and observe whether agent behavior changes.

DEV Community