DEV Community

韩


Addy Osmani's Agent Skills: 5 Hidden Uses in 49K Stars of Workflow Magic

You'd think a 49,887-star repo with 23 skills and 7 slash commands would have nothing left to discover. Addy Osmani's agent-skills is the most-installed skills marketplace for Claude Code right now — and most teams are using it as a plugin manager, which is exactly backwards. There's a meta-skill that maps incoming tasks to the right skill, an "interview-me" loop that forces agents to interrogate ambiguous requirements, a doubt-driven-development workflow that spawns fresh-context reviewers, a source-driven-development mode that cites the documentation it read, and a browser-testing-with-devtools integration that gives your agent eyes. Each of these rewrites what an "AI coding agent" actually does.

agent-skills is a 49,887-star, MIT-licensed collection of 22 engineering workflow skills plus 1 meta-skill, created on 2026-02-15 and last pushed on 2026-06-10. It's distributed as a Claude Code plugin, a Gemini CLI skill pack, a Cursor rule set, a Windsurf rule set, an OpenCode agent, and a GitHub Copilot persona bundle. Most installs stop at /plugin install agent-skills@addy-agent-skills and never open the SKILL.md files. The whole point of the package is what's inside those files.

The 2026 state of "skills" is fragmented

The Model Context Protocol (MCP) gave us tools. Skills give us workflows. A tool answers "what can the agent do?"; a skill answers "how should the agent approach this kind of problem?" In 2026, every major agent (Claude Code, Gemini CLI, Cursor, Windsurf, OpenCode, Kiro, Codex, GitHub Copilot) has its own skills format. Addy Osmani's pack is the first one that ships as plain Markdown workflow files (SKILL.md), making it portable across all of them. The "hidden" part is that the 22 skills encode anti-rationalization tables: pre-baked rebuttals to the excuses agents invent to skip steps. That's the actual moat. Most prompt libraries give you checklists. This one gives you a checklist plus a debate opponent.

Hidden Use #1: The Meta-Skill as a Routing Layer

What most people do: Run claude --plugin-dir ./agent-skills and let the agent discover skills by name. Most sessions never touch using-agent-skills at all, so the agent picks whichever skill it thinks applies, usually the wrong one.

The hidden trick: Treat using-agent-skills as a hard entry point. Force every session to read it first, then route to the correct sub-skill via the included decision tree.

The using-agent-skills SKILL.md ships an explicit task→skill routing tree. The first thing an agent should see in any new session is the decision tree, not the skill that looks most related by name:

Task arrives
    │
    ├── Don't know what you want yet? ──────→ interview-me
    ├── Have a rough concept, need variants? → idea-refine
    ├── New project/feature/change? ──→ spec-driven-development
    ├── Have a spec, need tasks? ──────→ planning-and-task-breakdown
    ├── Implementing code? ────────────→ incremental-implementation
    │   ├── UI work? ─────────────────→ frontend-ui-engineering
    │   ├── API work? ────────────────→ api-and-interface-design
    │   ├── Need better context? ─────→ context-engineering
    │   └── Stakes high / unfamiliar code? ──→ doubt-driven-development
    ├── Writing/running tests? ────────→ test-driven-development
    │   └── Browser-based? ───────────→ browser-testing-with-devtools
    ├── Something broke? ─────────────→ debugging-and-error-recovery
    ├── Reviewing code? ───────────────→ code-review-and-quality
    └── Deploying/launching? ─────────→ shipping-and-launch
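The tree reads as a first-match router. Here is a minimal Python sketch of that routing, assuming hypothetical task-kind labels ("unclear", "implement", and so on) and treating the indented branches as add-ons layered on top of their parent skill. agent-skills itself ships the tree as Markdown for the agent to follow, not as code:

```python
from typing import List

# Top-level branches of the using-agent-skills tree (task-kind labels are hypothetical).
TOP_LEVEL = {
    "unclear": "interview-me",
    "rough-concept": "idea-refine",
    "new-work": "spec-driven-development",
    "have-spec": "planning-and-task-breakdown",
    "implement": "incremental-implementation",
    "test": "test-driven-development",
    "broken": "debugging-and-error-recovery",
    "review": "code-review-and-quality",
    "deploy": "shipping-and-launch",
}

# Indented branches under "Implementing code?", assumed to layer on the parent skill.
IMPLEMENT_ADDONS = [
    ("ui", "frontend-ui-engineering"),
    ("api", "api-and-interface-design"),
    ("needs_context", "context-engineering"),
    ("high_stakes", "doubt-driven-development"),
]


def route(kind: str, **flags: bool) -> List[str]:
    """Return the skills to load for a task, parent skill first."""
    if kind not in TOP_LEVEL:
        # Assumption: an unrecognized task shape falls back to asking questions.
        return ["interview-me"]
    skills = [TOP_LEVEL[kind]]
    if kind == "implement":
        skills += [skill for flag, skill in IMPLEMENT_ADDONS if flags.get(flag)]
    if kind == "test" and flags.get("browser"):
        skills.append("browser-testing-with-devtools")
    return skills
```

For example, route("implement", ui=True, high_stakes=True) loads incremental-implementation, then frontend-ui-engineering, then doubt-driven-development. The fallback to interview-me mirrors the "stop guessing" intent of the meta-skill rather than anything stated in the SKILL.md.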

To force this behavior, add a short instruction to your project's CLAUDE.md (or equivalent rules file):

# Project: <name>
On session start, ALWAYS read skills/using-agent-skills/SKILL.md first
and follow its decision tree before invoking any other skill.

The result: Agents stop guessing and start routing. A request like "build me a dashboard" lands in interview-me first (because it's underspecified), gets refined through idea-refine, then spec-driven-development, and only then does code get written. Hallucinated API choices drop because the agent is forced to either interview the user or stop.

Data sources: 49,887 stars on GitHub as of 2026-06-10 (verified via https://api.github.com/repos/addyosmani/agent-skills); 22 lifecycle skills + 1 meta-skill counted in the skills/ directory listing.

Hidden Use #2: Doubt-Driven Development as an In-Flight Posture

What most people do: Run /review after the code is finished and call it done. This catches some bugs and misses most architectural mistakes because the reviewer has the same context the author had.

The hidden trick: Trigger doubt-driven-development before writing non-trivial code, while correction is still cheap.

The skill runs a five-step adversarial review: CLAIM → EXTRACT → DOUBT → RECONCILE → STOP, with optional cross-model escalation. The crucial difference from /review is that it operates on a decision in flight, not on a finished artifact. The included trigger conditions are explicit:

A decision is **non-trivial** when at least one of these is true:
- It introduces or modifies branching logic
- It crosses a module or service boundary
- It asserts a property the type system or compiler cannot verify
- Its correctness depends on context the future reader cannot see
- Its blast radius is irreversible (production deploy, data migration, public API change)
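"At least one of these is true" is a plain logical OR over five criteria. A minimal Python sketch, assuming a hypothetical Decision record whose field names paraphrase the quoted criteria (they are not identifiers from the skill):

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Decision:
    """One design decision, flagged against the five quoted criteria (names are hypothetical)."""
    changes_branching: bool = False
    crosses_boundary: bool = False
    unverifiable_by_types: bool = False
    relies_on_hidden_context: bool = False
    irreversible: bool = False


def is_non_trivial(decision: Decision) -> bool:
    # Any single criterion is enough to make the decision non-trivial.
    return any(getattr(decision, f.name) for f in fields(decision))


def reasons(decision: Decision) -> List[str]:
    """Name the criteria that fired, so the DOUBT step knows what to attack."""
    return [f.name for f in fields(decision) if getattr(decision, f.name)]
```

Returning the list of reasons, not just a boolean, gives the later DOUBT step something concrete to argue against.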

The practical workflow looks like this:

# CLAIM: I want to add caching to the user-profile endpoint.
# EXTRACT: The decision is "LRU cache with 5-minute TTL, in-process".
# DOUBT:  What about distributed cache invalidation across the 12
#         app servers? What about the 2-second cold start after deploy
#         that just hit prod last week?
# RECONCILE: Use Redis with explicit pub/sub invalidation, not in-process LRU.
# STOP:   Confirm with the user before implementing.
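The value of the cycle comes from running the steps in order without skipping DOUBT. A minimal Python sketch of a recorder that enforces that ordering; the DoubtReview class is hypothetical and only illustrates the sequencing, since the skill itself is a Markdown workflow the agent follows:

```python
from typing import Dict

# The five steps named by doubt-driven-development, in the order they must run.
STEPS = ["CLAIM", "EXTRACT", "DOUBT", "RECONCILE", "STOP"]


class DoubtReview:
    """Hypothetical recorder that refuses to skip or reorder steps."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def record(self, step: str, note: str) -> None:
        done = len(self.entries)
        expected = STEPS[done] if done < len(STEPS) else None
        if step != expected:
            raise ValueError(f"expected {expected}, got {step}")
        self.entries[step] = note

    @property
    def complete(self) -> bool:
        # STOP is last: the reconciled decision goes back to the user before any code.
        return "STOP" in self.entries
```

Feeding it the caching example in order works; trying to jump from EXTRACT straight to RECONCILE raises a ValueError, which is the "skip the doubt" shortcut the skill is designed to block.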

The SKILL.md also includes an anti-rationalization table: the excuse "I'll add tests later" gets the documented counter-argument "Tests written after the code is the same code with a different name." Every skill in the pack ships with a table like this.

The result: Production changes stop being "I'll fix it in QA" exercises. The most expensive bug classes (irreversible decisions, multi-service contracts, unstated invariants) get caught at design time.

Data sources: Doubt-driven-development SKILL.md (16,397 bytes) on main branch; cross-referenced against references/orchestration-patterns.md.

Hidden Use #3: Source-Driven Development That Cites Its Sources

What most people do: Tell the agent to "use the React 19 docs" and trust that it does. Most agents use training data instead, which is 6-18 months stale for any fast-moving framework.

The hidden trick: Activate source-driven-development and require every framework-specific decision to be cited.

The process is a strict four-step gate: DETECT → FETCH → IMPLEMENT → CITE. The first step is reading the project's dependency file to identify exact versions:

package.json     → Node/React/Vue/Angular/Svelte
composer.json    → PHP/Symfony/Laravel
requirements.txt / pyproject.toml → Python
go.mod           → Go
Cargo.toml       → Rust
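The DETECT step is a lookup from manifest file to ecosystem, followed by a version read. A minimal Python sketch, assuming the manifests sit in the project root; note that it reads the declared range from package.json (for example "^19.0.0"), while the exact resolved version would have to come from the lockfile:

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Manifest file -> ecosystem, mirroring the DETECT table above.
MANIFESTS = {
    "package.json": "Node/React/Vue/Angular/Svelte",
    "composer.json": "PHP/Symfony/Laravel",
    "requirements.txt": "Python",
    "pyproject.toml": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
}


def detect(project: Path) -> Dict[str, str]:
    """Return {manifest: ecosystem} for every manifest present in the project root."""
    return {name: eco for name, eco in MANIFESTS.items() if (project / name).is_file()}


def npm_version(project: Path, package: str) -> Optional[str]:
    """Return the declared version range for one npm package, or None if absent."""
    manifest = json.loads((project / "package.json").read_text())
    for section in ("dependencies", "devDependencies", "peerDependencies"):
        if package in manifest.get(section, {}):
            return manifest[section][package]
    return None
```

The version string is what FETCH needs: it decides which docs page (or which version of a migration guide) the agent should open.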

Then the agent must fetch the official docs for that specific version rather than relying on training-data memory. The output convention is to include the source URL inline:

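import { useEffect } from "react";

function SearchResults({ query }) {
  // Per source-driven-development: pattern verified against
  // https://react.dev/reference/react/useEffect (React 19.0.0)
  useEffect(() => {
    const controller = new AbortController();
    // fetchData is an app-specific helper (not shown) that forwards the signal to fetch()
    fetchData(query, controller.signal);
    // cleanup: abort the in-flight request when `query` changes or on unmount
    return () => controller.abort();
  }, [query]);
  return null;
}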
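Because the CITE convention is a fixed comment shape (a docs URL followed by a parenthesized framework and version), a reviewer can check it mechanically. A minimal Python sketch, assuming exactly that "URL (Name x.y.z)" shape; agent-skills does not ship this checker:

```python
import re
from typing import List, Tuple

# Matches the comment convention above: a docs URL followed by "(Name x.y.z)".
CITATION = re.compile(r"(https?://\S+)\s+\(([^()]*?\d+\.\d+(?:\.\d+)?)\)")


def citations(source: str) -> List[Tuple[str, str]]:
    """Return (url, version label) pairs cited in a source snippet."""
    return CITATION.findall(source)


def uncited(source: str) -> bool:
    """True when a snippet carries no versioned citation at all."""
    return not citations(source)
```

A check like this only proves a citation exists, not that the code matches the cited page; that judgment stays with the reviewer.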

The skill ships an explicit "When NOT to use" list — variable renames, pure logic loops, anything where correctness does not depend on a specific version. This stops the agent from over-engineering simple tasks.

The result: Code stops going stale. When Next.js 16 lands, the agent fetches the actual migration guide instead of reasoning from patterns it memorized for an older release. Because every version-sensitive decision carries a citation a reviewer can check against the docs, "but this worked in our other project" bugs become much rarer.

Data sources: source-driven-development SKILL.md (8,136 bytes) on main branch; cross-referenced with references/source-driven-development.md (referenced in README).

Hidden Use #4: Browser-Testing-with-DevTools for Live Runtime Data

What most people do: Write a Playwright/Cypress test, run it once, ship the snapshot. The agent never actually sees the live DOM, console output, or network trace.

The hidden trick: Wire the Chrome DevTools MCP into the agent and trigger browser-testing-with-devtools for any UI change.

The setup is one .mcp.json block:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest", "--autoConnect"]
    }
  }
}
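Hand-editing .mcp.json is an easy way to clobber servers that are already configured. A minimal Python sketch that merges the chrome-devtools entry above into an existing file (or creates one), assuming the file holds a top-level "mcpServers" object as in the block shown:

```python
import json
from pathlib import Path

# Server entry copied from the .mcp.json block above.
CHROME_DEVTOOLS = {
    "command": "npx",
    "args": ["-y", "chrome-devtools-mcp@latest", "--autoConnect"],
}


def add_chrome_devtools(config_path: Path) -> dict:
    """Merge the chrome-devtools server into .mcp.json, keeping any other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["chrome-devtools"] = CHROME_DEVTOOLS
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Running it twice is harmless: the second run overwrites the chrome-devtools entry with the same value and leaves everything else alone.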

Once configured, the agent has a small set of DevTools-shaped tools (Screenshot, DOM Inspection, Console Read, Network Trace, Performance Profile). The skill teaches the agent to combine them: take a screenshot before the change, make the change, take a screenshot after, diff the DOM tree, read the console for warnings, trace the network request that the new component fired. The skill's stated rule: "Instead of guessing what's happening at runtime, verify it."
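The "diff the DOM tree" step boils down to a text diff of two snapshots. A minimal Python sketch using the standard library's difflib, assuming each snapshot has been serialized as text with one node per line; the real snapshot format depends on the MCP server and may be an accessibility tree rather than raw HTML:

```python
import difflib
from typing import List


def dom_diff(before: str, after: str) -> List[str]:
    """Unified diff of two serialized DOM snapshots, one node per line."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


def changed_nodes(before: str, after: str) -> List[str]:
    """Only the removed (-) and added (+) node lines, without headers or context."""
    return [
        line
        for line in dom_diff(before, after)
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

Reducing the diff to changed nodes keeps the agent's context small: it reasons about the two lines that moved, not the whole page.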

The workflow inside a Claude Code session:

User: The "Add to Cart" button is jittery on mobile.

Agent (after reading browser-testing-with-devtools/SKILL.md):
  1. Screenshot at 375px width (mobile breakpoint)
  2. Click the button 3 times, screenshot each click
  3. Read console for warnings
  4. Trace the network — find the 47ms request firing twice
  5. Diff before/after screenshots, identify the reflow trigger
  6. Propose fix: debounce the click handler by 200ms
  7. Apply fix, re-screenshot, confirm jitter is gone
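Step 6 proposes a 200ms debounce. In the browser that would be a JavaScript wrapper around setTimeout; to keep the behavior checkable, this minimal Python sketch simulates a trailing-edge debounce over click timestamps. It models only the de-duplication of the double-firing request from step 4, under the assumption that the handler should run once per burst of clicks:

```python
from typing import List


def debounced_fires(click_times_ms: List[int], wait_ms: int = 200) -> List[int]:
    """Trailing-edge debounce: return the times (ms) at which the handler runs.

    A click is dropped if another click arrives within wait_ms of it;
    a surviving click runs wait_ms after it happened.
    """
    times = sorted(click_times_ms)
    fires = []
    for i, t in enumerate(times):
        is_last = i == len(times) - 1
        if is_last or times[i + 1] - t >= wait_ms:
            fires.append(t + wait_ms)
    return fires
```

Three taps at 0, 50 and 120ms collapse into a single call at 320ms; a tap that arrives after a quiet gap starts a new burst. The trade-off is latency: every click now waits at least 200ms before the handler runs.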

The result: "It works on my machine" stops being a thing. The agent has the same runtime visibility the user has.

Data sources: browser-testing-with-devtools SKILL.md (12,094 bytes) on main branch; chrome-devtools-mcp@latest referenced as the canonical MCP server.

Hidden Use #5: /build auto — Plan-Once-Run-All With Per-Task Verification

What most people do: Run /plan to break work into tasks, then run /build task-by-task, approving each commit. By the fifth task they're rubber-stamping the agent.

The hidden trick: Use /build auto so the agent generates the plan, asks for one approval, and then runs every task autonomously, with per-task test/commit gates that pause on any failure.

From the README:

/build auto generates the plan and implements every task in a single approved pass — you approve the plan once, then it runs autonomously. It removes the human stepping between tasks, not the verification: every task is still test-driven and committed individually, and it pauses on failures or risky steps.

The key constraints are explicit in the skill definition:

- Every task is test-driven (red-green-refactor enforced per task)
- Every task ends with an individual commit (no "fix later" pile-up)
- Pause triggers: test failure, build failure, lint failure, risky op
- Risky ops whitelist: schema migrations, irreversible deletes, prod writes
- Output: per-task log + per-task commit hash + final summary diff
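The pause triggers amount to a small decision function evaluated after each task. A minimal Python sketch, assuming a hypothetical TaskResult record and hypothetical operation labels for the three risky-op categories listed above:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for the risky operations listed above.
RISKY_OPS = {"schema_migration", "irreversible_delete", "prod_write"}


@dataclass
class TaskResult:
    """Hypothetical per-task outcome reported by the build loop."""
    tests_passed: bool = True
    build_passed: bool = True
    lint_passed: bool = True
    operations: List[str] = field(default_factory=list)


def pause_reason(result: TaskResult) -> Optional[str]:
    """Return why the run should pause, or None to continue to the commit."""
    if not result.tests_passed:
        return "test failure"
    if not result.build_passed:
        return "build failure"
    if not result.lint_passed:
        return "lint failure"
    risky = sorted(RISKY_OPS.intersection(result.operations))
    if risky:
        return "risky op: " + ", ".join(risky)
    return None
```

Returning a reason string instead of a bare boolean is what makes the pause useful: the human sees why the run stopped without digging through the log.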

A typical invocation:

# In Claude Code with agent-skills installed:
/build auto
# → generates plan
# → you review the plan
# → approve
# → agent runs every task, commits each one, pauses on failure
# → final report: 14 tasks, 14 commits, 0 pauses

This works because incremental-implementation is the underlying skill, and that skill enforces a strict cycle (implement slice → test → verify → commit) at every step. /build auto is just the orchestrator that runs the cycle N times.
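Running the cycle N times can be sketched as a loop over tasks with one exit per failure. A minimal Python sketch, assuming the plan has already been approved and that implement, test, verify and commit are hypothetical hooks supplied by the caller (each step returns True on success, commit returns a commit id):

```python
from typing import Callable, Dict, List

Step = Callable[[str], bool]


def build_auto(tasks: List[str], implement: Step, test: Step, verify: Step,
               commit: Callable[[str], str]) -> Dict[str, object]:
    """Run implement -> test -> verify -> commit per task; pause at the first failure."""
    log = []
    for task in tasks:
        for name, step in (("implement", implement), ("test", test), ("verify", verify)):
            if not step(task):
                # Stop here so a human can look; earlier tasks are already committed.
                return {"status": "paused", "task": task, "failed_step": name, "log": log}
        log.append({"task": task, "commit": commit(task)})
    return {"status": "done", "log": log}
```

Committing inside the loop, not after it, is what turns a long autonomous run into a series of small, individually reviewable commits.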

The result: A 14-task feature ships as 14 reviewable commits. The agent does the boring middle; the human reviews the plan, the final diff, and any pause-trigger events. For a medium feature, the human's part shrinks from days of back-and-forth to one plan approval and one final review.

Data sources: /build auto referenced in README "Commands" section, implemented via incremental-implementation SKILL.md (verified in the skills/ directory listing).

Recap: The 5 Hidden Uses

  1. The Meta-Skill as a Routing Layer: using-agent-skills SKILL.md as a forced entry point, with the task→skill decision tree as the routing logic.
  2. Doubt-Driven Development: in-flight adversarial review with the CLAIM → EXTRACT → DOUBT → RECONCILE → STOP cycle.
  3. Source-Driven Development: DETECT → FETCH → IMPLEMENT → CITE workflow, with every framework decision cited inline.
  4. Browser-Testing-with-DevTools: Chrome DevTools MCP wired into the agent, with screenshot/DOM/console/network as runtime verification.
  5. /build auto Plan-Once-Run-All: per-task test/commit gates, autonomous execution, pause-on-failure.


Your turn

Which of these are you already using? Which is the one your team keeps skipping? Reply in the comments with the agent-skills workflow you've been most surprised by — or the one you tried and abandoned. If you've written your own SKILL.md that you wish was in the official pack, link to it. The 23-skill list isn't done yet.
