Introduction
"AI coding agents default to the shortest path — which often means skipping specs, tests, and security reviews."
This is article #92 in the Open Source Project of the Day series. Today's project is Agent Skills — a collection of production-grade engineering workflow skills for AI coding agents, built by Addy Osmani, Principal Engineer on the Google Chrome team.
You've probably used Claude Code or Cursor to write a feature, then looked back and realized the agent wrote no tests. Or an API endpoint had zero input validation. Or the architecture doc was still blank.
This isn't a coincidence. AI agents have a deep-seated pull toward the shortest path. Hand an agent a task and it will make the code run as quickly as possible, then consider the work done. Specs, test coverage, security hardening: none of those are on the path to "runs," so agents skip them by default.
Agent Skills starts from that observation. It encodes the discipline senior engineers bring to their work into skill files, so that at every development phase agents follow a structured workflow with explicit exit criteria instead of improvising.
What You'll Learn
- The full architecture: how 24 skills cover 7 phases of the development lifecycle
- All 7 slash commands: the complete /spec-to-/ship workflow
- The anatomy of a SKILL.md file: anti-rationalization tables and verification exit conditions
- 4 agent personas: Code Reviewer, Test Engineer, Security Auditor, Web Performance Auditor
- The engineering principles embedded in the workflows: Hyrum's Law, the Beyoncé Rule, Chesterton's Fence
- Installation in Claude Code, Cursor, and other AI tools
Prerequisites
- Experience with Claude Code, Cursor, or a similar AI coding tool
- Basic software engineering background (tests, code review, CI/CD)
- Interest in making AI-assisted development more reliable and disciplined
Project Background
What Is Agent Skills?
Agent Skills is a set of production-grade engineering workflow files for AI coding agents, positioned as "the discipline layer your AI agent is missing."
The problem it addresses isn't agent capability — modern AI agents write code well. The problem is default behavior. Without constraints, agents default to shortcuts: get the code running first, write docs later, skip tests for now, handle security "in a future pass." Those deferred tasks accumulate into unmaintainable technical debt in any real project.
Author / Team
- Author: Addy Osmani
- Background: Principal Engineer at Google Chrome, author of Learning JavaScript Design Patterns, prominent engineering voice in the frontend community
- License: MIT
- Version: Main branch, continuously updated
Project Stats
- ⭐ GitHub Stars: 51,900+
- 🍴 Forks: 5,700+
- 📦 Content: 24 skills + 7 slash commands + 4 agent personas
- 📄 License: MIT
Core Features
What It Does
Agent Skills works by providing structured engineering workflows as Markdown files. When an agent processes a relevant task, it reads the skill file and follows the defined steps and checkpoints rather than improvising the shortest path.
Agent without Skills:
  Task → Write code immediately → "Done"
                   ↓ (skips spec, tests, security)
      Technical debt accumulates
Agent with Skills:
  Task → Read skill → Execute by phase → Pass exit criteria → Actually done
             ↓               ↓
      Clear workflow  Each phase has a verification gate
Use Cases
- New feature development
  - /spec forces a written specification before any code, converting requirements into testable acceptance criteria
  - /plan breaks the feature into atomic tasks, each touching no more than 5 files
  - /build implements one vertical slice at a time, with a test and commit per slice
- Code quality
  - /review conducts a code review at Staff engineer standard, covering readability, testability, and maintainability
  - /code-simplify targets complexity reduction specifically, separate from general refactoring
- Testing
  - /test runs a test-driven development cycle: Red-Green-Refactor
  - "Prove-It" mode for bug fixes: write a failing test that reproduces the bug first; the passing test then proves the fix
- Security hardening
  - The security-and-hardening skill mandates STRIDE threat modeling before writing security-sensitive code
  - Separate checklist for LLM-specific risks: prompt injection, context leakage, untrusted model output
- Shipping
  - /ship covers the complete release chain, from Git workflow through CI/CD to observability
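To make "Prove-It" mode concrete, here is a minimal Python sketch. The `normalize_username` function and its bug (repeated spaces turning into repeated dashes) are hypothetical, invented for illustration; the skill does not prescribe this code. The regression test is written first, fails against the buggy version, and passes once the fix is in.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical helper: lowercase a name and join its words with single dashes.

    The buggy version was `raw.strip().lower().replace(" ", "-")`, which turned
    "Ada  Lovelace" into "ada--lovelace". Splitting on any whitespace run fixes it.
    """
    return "-".join(raw.lower().split())


class NormalizeUsernameRegressionTest(unittest.TestCase):
    def test_repeated_spaces_collapse_to_one_dash(self):
        # Written first: this failed against the buggy implementation,
        # reproducing the reported bug before any fix was attempted.
        self.assertEqual(normalize_username("Ada  Lovelace"), "ada-lovelace")

    def test_existing_behavior_still_holds(self):
        # Guards behavior that already worked, so the fix cannot regress it.
        self.assertEqual(normalize_username("  Grace Hopper "), "grace-hopper")
```

Run it with `python -m unittest`. The first test is the proof: it encodes the bug report, so it can only pass once the defect is actually gone.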
Quick Start
Install in Claude Code:
# Option 1: Clone the full repo and copy the contents of each folder
git clone https://github.com/addyosmani/agent-skills
mkdir -p ~/.claude/skills ~/.claude/agents ~/.claude/commands
cp -r agent-skills/skills/. ~/.claude/skills/
cp -r agent-skills/agents/. ~/.claude/agents/
cp -r agent-skills/commands/. ~/.claude/commands/
# Option 2: Copy individual skills as needed (Claude Code expects one folder per skill, containing its SKILL.md)
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/spec-driven-development ~/.claude/skills/
Then use slash commands directly in conversation:
/spec I need a user authentication module — email registration, OAuth login, password reset
/build auto Implement the auth module from the spec above
/review Review all changes under src/auth/
/ship Prepare v1.2.0 for release
Install in Cursor:
# Project-level
mkdir -p .cursor/skills && cp -r agent-skills/skills/. .cursor/skills/
# Global
mkdir -p ~/.cursor/skills && cp -r agent-skills/skills/. ~/.cursor/skills/
Compatibility across AI tools:
| Tool | Path | Command support |
|---|---|---|
| Claude Code | ~/.claude/skills/ | ✅ Full |
| Cursor | .cursor/skills/ | ⚠️ Partial |
| Gemini CLI | ~/.gemini/skills/ | ⚠️ Partial |
| Windsurf | .windsurf/skills/ | ⚠️ Partial |
| OpenCode | .opencode/skills/ | ⚠️ Partial |
| GitHub Copilot | Markdown system prompt | ⚠️ No slash commands |
| Kiro IDE | Native support | ✅ Full |
All 7 Commands
| Command | Phase | Core Principle |
|---|---|---|
| /spec | Define | Spec before code |
| /plan | Plan | Small, atomic tasks |
| /build | Build | One slice at a time |
| /test | Verify | Tests are proof |
| /review | Review | Improve code health |
| /code-simplify | Simplify | Clarity over cleverness |
| /ship | Deploy | Faster is safer |
/build auto is a special mode: you approve the plan once, and the agent executes the full implementation autonomously — but still commits and tests each task individually.
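The per-task discipline of /build auto can be pictured as a simple loop. The Python sketch below is an assumption about the control flow, not the skill's actual implementation: `implement`, `run_tests`, and `commit` are hypothetical callbacks standing in for the agent's real actions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    title: str


def build_auto(
    plan: List[Task],
    implement: Callable[[Task], None],
    run_tests: Callable[[], bool],
    commit: Callable[[str], None],
) -> List[str]:
    """Execute an already-approved plan one task at a time.

    Each task is implemented, tested, and committed before the next begins.
    The loop stops at the first failing test run, so a broken slice is never
    committed and later tasks are never built on top of it.
    """
    committed: List[str] = []
    for task in plan:
        implement(task)
        if not run_tests():
            raise RuntimeError(f"Tests failed after task: {task.title!r}")
        commit(f"feat: {task.title}")
        committed.append(task.title)
    return committed
```

The design choice that matters is that a failing test halts the run instead of being deferred: approval happens once, but verification still happens per task.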
Deep Dive
Skill File Anatomy
Every SKILL.md follows a consistent structure with four core sections:
SKILL.md structure
├── Frontmatter (name, description, trigger conditions)
├── Step-by-step Workflow (phased, specific steps)
├── Anti-Rationalization Table (common excuses + rebuttals)
└── Verification / Exit Criteria (what "done" actually means)
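As a rough illustration of that structure, the Python sketch below splits a SKILL.md-style document into frontmatter and body and reports which expected parts are missing. The heading names (`## Workflow`, `## Anti-Rationalization`, `## Exit Criteria`) and the required frontmatter keys are assumptions for illustration; the real files may word them differently. To stay within the standard library, it reads only flat `key: value` frontmatter lines and is not a YAML parser.

```python
from typing import Dict, List, Tuple

# Hypothetical heading names; adjust to whatever the actual SKILL.md files use.
REQUIRED_SECTIONS = ["## Workflow", "## Anti-Rationalization", "## Exit Criteria"]
REQUIRED_KEYS = ["name", "description"]


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat everything as body


def missing_parts(text: str) -> List[str]:
    """Return the frontmatter keys and section headings that are absent."""
    meta, body = split_frontmatter(text)
    missing = [f"frontmatter:{k}" for k in REQUIRED_KEYS if k not in meta]
    headings = {line.strip() for line in body.splitlines()}
    missing += [s for s in REQUIRED_SECTIONS if s not in headings]
    return missing
```

A file that reports `## Anti-Rationalization` as missing is the case worth catching: it still reads like a complete workflow but has lost the part that pushes back on shortcuts.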
The last two sections carry most of the value.
Anti-rationalization tables list the shortcuts agents most commonly take, paired with the reality:
| Rationalization | Reality |
|---|---|
| "I'll add tests later" | Bugs compound. A bug in Slice 1 makes Slices 2-5 wrong. |
| "It's faster to do it all at once" | Feels faster until something breaks across 500 changed lines. |
| "This refactor is small enough to include" | Refactors mixed with features make both harder to review and debug. |
| "Run again just to be sure" | Repeating the same command adds nothing unless the code has changed. |
Exit criteria define what "done" actually means. "Seems right" is never sufficient:
incremental-implementation exit criteria:
- [ ] Each increment individually tested and committed
- [ ] Full test suite passes
- [ ] Build is clean
- [ ] Feature works end-to-end as specified
- [ ] No uncommitted changes remain
The design gives agents a concrete checklist to verify against, rather than relying on self-assessed completion.
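A checklist in this form is easy to evaluate mechanically. The minimal Python sketch below is an illustration, not something shipped with the project: it parses Markdown task-list lines and treats the work as done only when no box is left unchecked.

```python
import re
from typing import List

# Matches Markdown task-list items such as "- [ ] Build is clean" or "- [x] ...".
CHECKBOX = re.compile(r"^\s*-\s*\[([ xX])\]\s*(.+)$")


def unchecked_items(checklist: str) -> List[str]:
    """Return the text of every item whose box is still empty."""
    pending = []
    for line in checklist.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


def is_done(checklist: str) -> bool:
    """'Done' means every exit criterion is checked, not that things seem right."""
    return not unchecked_items(checklist)
```

In practice each box would be backed by a real check (the test suite exiting cleanly, `git status --porcelain` printing nothing) rather than ticked by the agent on its own say-so.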
The Full 24-Skill Map
Define (3)
├── interview-me ← Requirement elicitation through structured questions
├── idea-refine ← Turning rough ideas into executable directions
└── spec-driven-development ← Full spec before any implementation
Plan (1)
└── planning-and-task-breakdown ← Atomic task decomposition, ≤5 files per task
Build (7)
├── incremental-implementation ← Vertical slices, test and commit per slice
├── test-driven-development ← Red-Green-Refactor cycle
├── context-engineering ← Precise control of agent working context
├── source-driven-development ← Existing code as ground truth
├── doubt-driven-development ← Actively surfacing blind spots
├── frontend-ui-engineering ← Component design and accessibility
└── api-and-interface-design ← Contract-first, versioned interfaces
Verify (2)
├── browser-testing-with-devtools ← DevTools-assisted browser testing
└── debugging-and-error-recovery ← Systematic fault isolation and fix
Review (4)
├── code-review-and-quality ← Readability, testability, maintainability
├── code-simplification ← Complexity reduction, distinct from refactoring
├── security-and-hardening ← STRIDE modeling + non-negotiable checklist
└── performance-optimization ← Core Web Vitals and backend performance
Ship (6)
├── git-workflow-and-versioning ← Trunk-based development
├── ci-cd-and-automation ← Automated pipeline design
├── deprecation-and-migration ← Safe API removal and migration
├── documentation-and-adrs ← Architecture Decision Records
├── observability-and-instrumentation ← Logs, metrics, traces
└── shipping-and-launch ← Complete release checklist
Meta (1)
└── using-agent-skills ← How to use this system effectively
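The ≤5-files cap in planning-and-task-breakdown is simple enough to enforce mechanically. This Python sketch assumes a plan is represented as a mapping from task name to the files it touches; that representation and the `MAX_FILES_PER_TASK` constant are illustrative assumptions, not part of the project.

```python
from typing import Dict, List

MAX_FILES_PER_TASK = 5  # the cap described for planning-and-task-breakdown


def oversized_tasks(
    plan: Dict[str, List[str]], limit: int = MAX_FILES_PER_TASK
) -> Dict[str, int]:
    """Return each task that touches more distinct files than the limit allows."""
    return {
        task: len(set(files))
        for task, files in plan.items()
        if len(set(files)) > limit
    }
```

A task flagged here is a signal to split it further, for example into a data-layer slice and a UI slice, before any code is written.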
4 Agent Personas
Beyond skill files, the project provides 4 specialized agent personas for targeted work:
code-reviewer: Reviews code at Staff engineer standard — readability, testability, side effects, edge cases.
test-engineer: Focuses on test coverage and quality analysis, examining test design rather than just measuring coverage numbers.
security-auditor: OWASP-based security assessment with a separate checklist for LLM applications — prompt injection, context leakage, untrusted model output.
web-performance-auditor: Core Web Vitals audit with actionable optimization recommendations.
Using personas in Claude Code:
Use the security-auditor persona to review src/api/auth.ts
Use the web-performance-auditor persona to analyze homepage load performance
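An audit becomes actionable when its findings are measured against explicit targets. As a hedged sketch, the Python below compares 75th-percentile field measurements against Google's published "good" thresholds for Core Web Vitals (LCP ≤ 2.5 s, INP ≤ 200 ms, CLS ≤ 0.1). The function name and metric keys are made up for illustration; the persona's actual output format is not shown here.

```python
from typing import Dict

# Google's published "good" thresholds, assessed at the 75th percentile of page loads.
GOOD_THRESHOLDS = {"LCP_ms": 2500.0, "INP_ms": 200.0, "CLS": 0.1}


def failing_vitals(p75: Dict[str, float]) -> Dict[str, float]:
    """Return each known metric whose 75th-percentile value exceeds its 'good' threshold."""
    return {
        metric: value
        for metric, value in p75.items()
        if metric in GOOD_THRESHOLDS and value > GOOD_THRESHOLDS[metric]
    }
```

This is the same move the spec workflow makes when it turns a vague goal like "faster" into specific LCP/CLS targets.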
Embedded Engineering Principles
The skills encode several principles from Google's engineering culture directly into the workflows:
Hyrum's Law: With a sufficient number of users of an API, every observable behavior of the system will be depended on by somebody, regardless of what the documentation promises. In practice: search all callers before changing any public behavior; don't assume undocumented behavior goes unused.
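The "search all callers" step can be partly automated. As a minimal sketch assuming a Python codebase, the function below uses the standard-library `ast` module to list every call site of a given function name. It matches by name only, so it misses dynamic calls (`getattr`, callbacks) and may flag unrelated functions that share the name; for a public API, external consumers are invisible to it entirely. The file names and `list_users` function are hypothetical.

```python
import ast
from typing import Dict, List, Tuple


def find_call_sites(sources: Dict[str, str], func_name: str) -> List[Tuple[str, int]]:
    """Return sorted (filename, line) pairs for every call to `func_name`.

    Matches plain calls `func_name(...)` and attribute calls `obj.func_name(...)`.
    """
    hits: List[Tuple[str, int]] = []
    for filename, code in sources.items():
        for node in ast.walk(ast.parse(code, filename=filename)):
            if not isinstance(node, ast.Call):
                continue
            target = node.func
            if isinstance(target, ast.Name):
                name = target.id
            elif isinstance(target, ast.Attribute):
                name = target.attr
            else:
                name = None
            if name == func_name:
                hits.append((filename, node.lineno))
    return sorted(hits)


if __name__ == "__main__":
    example_sources = {
        "billing.py": "from users import list_users\nfor u in list_users():\n    print(u)\n",
        "report.py": "import users\nfirst = users.list_users()[0]\n",
    }
    print(find_call_sites(example_sources, "list_users"))
    # prints [('billing.py', 2), ('report.py', 2)]
```

The `report.py` call site is exactly the kind of dependency Hyrum's Law warns about: it takes element `[0]`, so it silently relies on the result's ordering even if no documentation ever promised one.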
Beyoncé Rule ("if you liked it, you should have put a test on it"): If a behavior is worth keeping, it's worth having a test. Code without test coverage has no safety net when changed.
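In code, the rule amounts to pinning any behavior you rely on with a test. The Python sketch below uses a hypothetical `list_users` function whose ordering callers depend on; the test turns that ordering into an explicit, protected contract instead of an accident someone can change unnoticed.

```python
import unittest
from typing import Dict, List

# Hypothetical in-memory store; in real code this would be a database or service.
_USERS: Dict[int, str] = {3: "carol", 1: "alice", 2: "bob"}


def list_users() -> List[str]:
    """Return user names ordered by user id (a behavior callers rely on)."""
    return [_USERS[uid] for uid in sorted(_USERS)]


class ListUsersContractTest(unittest.TestCase):
    def test_users_are_ordered_by_id(self):
        # Without this test, a refactor to plain dict iteration would silently
        # return ["carol", "alice", "bob"] and break any caller using the order.
        self.assertEqual(list_users(), ["alice", "bob", "carol"])
```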
Chesterton's Fence: Don't remove code you don't understand. Establish the reason it exists before deciding to delete it.
Shift Left: Move security and testing from pre-release into development. The earlier a problem is caught, the cheaper it is to fix.
Trunk-based development: Short-lived feature branches, frequent integration into the main branch. Avoids the merge conflicts that accumulate with long-running branches.
security-and-hardening: The LLM-Specific Rules
This skill includes a section dedicated to LLM applications that's worth examining on its own:
"Treat all model output as untrusted input."
Specific rules:
- Never pass model output directly into SQL queries, eval(), shell commands, or innerHTML
- The system prompt is not a security boundary; enforce permissions in code, not prompts
- Keep private data, especially other users' data, out of prompts; anything in the context can be echoed back
These rules address real vulnerability patterns in current LLM application development, not hypothetical threats.
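Here is what "treat all model output as untrusted input" looks like in practice, as a minimal Python sketch using only the standard library. `model_output` stands in for text returned by an LLM; the `orders` table and both function names are hypothetical. The query binds the text as a parameter instead of formatting it into SQL, and the HTML path escapes the text instead of inserting it raw.

```python
import html
import sqlite3
from typing import List


def find_orders(conn: sqlite3.Connection, model_output: str) -> List[int]:
    """Look up order ids for a customer name that the model extracted.

    The model's text is bound as a parameter, never spliced into the SQL string,
    so an output like "x' OR '1'='1" is matched as a literal name.
    """
    cur = conn.execute(
        "SELECT id FROM orders WHERE customer = ? ORDER BY id", (model_output,)
    )
    return [row[0] for row in cur.fetchall()]


def render_summary(model_output: str) -> str:
    """Escape model text before it reaches HTML, for the same reason the rule bans innerHTML."""
    return f"<p>{html.escape(model_output)}</p>"
```

With string formatting instead of `?`, the injected `OR '1'='1'` would have matched every row; with the bound parameter it matches none.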
spec-driven-development: Four Gated Phases
The spec workflow enforces a four-phase gate model in which each phase requires human review before the next one begins:
Phase 1: Specify
→ Draft requirements covering objective, structure, testing, and boundaries
→ Surface assumptions explicitly — list them and ask for correction
Phase 2: Plan
→ Technical implementation approach with dependencies and risks
→ Reframe vague goals as testable success criteria ("faster" → specific LCP/CLS targets)
Phase 3: Tasks
→ Discrete, verifiable work items (~5 files max each)
→ Three-tier boundaries: Always do / Ask first / Never do
Phase 4: Implement
→ Execute tasks incrementally, spec stays alive
→ Update spec when decisions shift; commit it to version control
The common trap the skill flags: "I'll write the spec after I code" produces documentation, not specification. The value comes from forcing clarity before work begins.
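The gate model can be expressed as a small state machine in which a phase advances only after explicit human approval. This Python sketch is a hypothetical illustration of the gating idea; the class and method names are not taken from the skill.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    SPECIFY = 1
    PLAN = 2
    TASKS = 3
    IMPLEMENT = 4


class SpecWorkflow:
    """Tracks the current phase and refuses to advance without human approval."""

    def __init__(self) -> None:
        self.phase = Phase.SPECIFY
        self.approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        # A human signs off on the current phase's artifact (spec, plan, task list).
        self.approved_by = reviewer

    def advance(self) -> Phase:
        if self.approved_by is None:
            raise PermissionError(f"{self.phase.name} has not been reviewed")
        if self.phase is Phase.IMPLEMENT:
            raise ValueError("IMPLEMENT is the final phase")
        self.phase = Phase(self.phase.value + 1)
        self.approved_by = None  # every new phase needs its own review
        return self.phase
```

Resetting `approved_by` on every transition is the point: approving the spec does not implicitly approve the plan.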
Links and Resources
Official Resources
- 🌟 GitHub: addyosmani/agent-skills
- 👤 Author: Addy Osmani
- 📖 Book: Learning JavaScript Design Patterns — Addy Osmani
Engineering Principles Referenced
- Hyrum's Law: Hyrum Wright, Software Engineering at Google
- The Beyoncé Rule: Software Engineering at Google (Winters, Manshreck, and Wright)
- Chesterton's Fence: G.K. Chesterton, The Thing (1929)
- Shift Left Testing: modern DevOps practice
- Trunk-Based Development: trunkbaseddevelopment.com
Conclusion
Agent Skills doesn't extend what AI agents can do. It constrains what they do by default.
The capability ceiling for AI coding agents is already quite high. The gap is in default behavior: specs get skipped, tests get deferred, security gets left to "a future pass." Those deferrals accumulate into projects that are hard to understand, hard to change, and hard to trust.
The design approach here is worth studying beyond this specific project. Encode expert behavior as executable constraints rather than relying on the AI to self-assess "how things should be done." Anti-rationalization tables make the most common shortcuts explicit. Exit criteria make "done" unambiguous. The same pattern shows up in PM workflows (pm-skills) and AI design constraints (taste-skill) — structured human expertise, packaged for AI consumption.
For any engineer using AI-assisted coding, agent-skills is worth a trial. At minimum, install spec-driven-development and test-driven-development and observe whether agent behavior changes.