## TL;DR
- SKILL.md gives Claude Code a persistent QA methodology: 5 workflow phases, folder structure, locator rules, and a capped fix loop
- Drop it in your project root, point Claude Code at your codebase or test cases, and it writes and saves an organized Playwright suite automatically
- Tests persist on disk and run free forever via Playwright's native runner
## The Problem
Writing tests takes as long as writing the feature. Most devs either skip them or write shallow ones that break on the next refactor. The issue isn't Playwright — it's that there's no system. Every test session starts from scratch, and coverage is whatever you had time for.
Asking Claude Code to "write some tests" helps, but without instructions it's inconsistent. It writes differently every time and has no idea how you want things organized.
The fix is giving it a methodology to follow.
## What's a SKILL.md?
A markdown file in your project root. Claude Code reads it at the start of every session as its operating instructions for that project.
The difference from prompting: prompts are forgotten when the conversation ends. A SKILL.md persists. You configure the methodology once; Claude Code follows it on every session, in every project you drop it into.
## What the SKILL.md Defines
### 5 Workflow Phases
| Phase | What happens |
|---|---|
| Assess | Reads your project, proposes test scenarios, waits for your approval |
| Author | Writes organized test files by concern, saves them with meaningful names |
| Execute | Runs the suite via Playwright's native runner |
| Fix | Categorizes each failure, fixes and reruns — max 3 attempts |
| Report | Results summary, bugs found, coverage gaps, flaky test flags |
### Folder Structure
Tests go into `tests/e2e/`, organized by concern:
- `happy-path/` — core flows that must always work
- `validation/` — form errors, required fields, bad input handling
- `edge-cases/` — empty inputs, special characters, boundary values
- `accessibility/` — keyboard nav, focus order, ARIA attributes
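The layout above can be sketched as a tiny Node script (illustration only — Claude Code creates these folders itself when it saves the suite):

```typescript
// scaffold-e2e.ts — illustrative sketch of the SKILL.md folder layout.
import * as fs from "node:fs";
import * as path from "node:path";

const concerns = ["happy-path", "validation", "edge-cases", "accessibility"];

for (const concern of concerns) {
  // recursive: true makes this idempotent — no error if the folder exists
  fs.mkdirSync(path.join("tests", "e2e", concern), { recursive: true });
}
```

Keeping one folder per concern means `npx playwright test tests/e2e/happy-path` can run just the smoke set before a deploy.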
### Locator Priority
Claude Code follows this order on every test it writes:
- `getByRole` — survives refactors, matches user intent
- `getByLabel` — for form fields
- `getByText` — for buttons and visible content
- `data-testid` — when semantic locators aren't enough
- CSS selectors — last resort only
CSS selectors break every time someone touches a class name. Enforcing this in the SKILL.md means Claude Code never takes the lazy route.
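The rule is stated in prose in the SKILL.md; as a hypothetical sketch, the decision reduces to "first priority tier the element actually supports wins":

```typescript
// Hypothetical sketch of the locator-priority rule — names and types here
// are illustrative, not part of the SKILL.md itself.
type LocatorKind = "getByRole" | "getByLabel" | "getByText" | "data-testid" | "css";

const PRIORITY: LocatorKind[] = [
  "getByRole",   // survives refactors, matches user intent
  "getByLabel",  // form fields
  "getByText",   // buttons and visible content
  "data-testid", // when semantic locators aren't enough
  "css",         // last resort only
];

function pickLocator(available: LocatorKind[]): LocatorKind {
  // Walk the priority list; the first kind the element supports wins.
  const found = PRIORITY.find((kind) => available.includes(kind));
  if (!found) throw new Error("element has no usable locator");
  return found;
}
```

So an element reachable by both a CSS class and visible text gets `getByText`, never the class selector.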
### Fix Loop
When a test fails, Claude Code decides what kind of failure it is before touching anything:
- Test bug (wrong selector, race condition) → fix the test
- Real app bug → fix the app, report what broke
- Flaky (intermittent) → add a wait on that specific action
Max 3 attempts. After that it stops and explains rather than looping forever.
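A hypothetical sketch of that triage — the SKILL.md describes the categories in prose, and Claude Code makes the call by reading the actual error; the signals used here (pass rate across reruns, error-message keywords) are assumptions for illustration:

```typescript
// Illustrative triage sketch, not the SKILL.md's real decision logic.
type FailureKind = "test-bug" | "app-bug" | "flaky";

interface FixAttempt {
  kind: FailureKind;
  action: string;
}

const MAX_ATTEMPTS = 3; // per the SKILL.md cap

function triage(errorMessage: string, passRate: number): FixAttempt {
  // Intermittent (sometimes passes on rerun) → flaky: wait on that action.
  if (passRate > 0 && passRate < 1) {
    return { kind: "flaky", action: "add a wait on the failing action" };
  }
  // Selector / timing errors point at the test itself.
  if (/locator|selector|timeout/i.test(errorMessage)) {
    return { kind: "test-bug", action: "fix the test" };
  }
  // Everything else is treated as a real regression in the app.
  return { kind: "app-bug", action: "fix the app and report what broke" };
}
```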
### Language Support
The SKILL.md works with TypeScript, JavaScript, Python, Java, and C#. Claude Code detects your language from your project files and generates the right file extensions, run commands, and test syntax automatically.
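Conceptually, detection maps marker files to a language profile. The mapping below is an assumption for illustration — the marker files, extensions, and run commands are my guesses at what such a lookup could contain, not the SKILL.md's actual table:

```typescript
// Hypothetical language-detection sketch; Claude Code does this by reading
// the project, not via a hard-coded table like this one.
interface LangProfile {
  language: string;
  testExtension: string;
  runCommand: string;
}

const MARKERS: Record<string, LangProfile> = {
  "package.json":     { language: "TypeScript/JavaScript", testExtension: ".spec.ts", runCommand: "npx playwright test" },
  "requirements.txt": { language: "Python", testExtension: "_test.py", runCommand: "pytest" },
  "pom.xml":          { language: "Java", testExtension: "Test.java", runCommand: "mvn test" },
  "*.csproj":         { language: "C#", testExtension: "Tests.cs", runCommand: "dotnet test" },
};

function detectProfile(projectFiles: string[]): LangProfile | undefined {
  for (const [marker, profile] of Object.entries(MARKERS)) {
    const matches = marker.startsWith("*.")
      ? projectFiles.some((f) => f.endsWith(marker.slice(1))) // glob-ish suffix match
      : projectFiles.includes(marker);
    if (matches) return profile;
  }
  return undefined;
}
```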
## The Workflow — 4 Prompts
> "Here's my codebase / feature spec. What should I be testing?"
Claude Code reads everything, identifies scenarios grouped by concern, and gives you a numbered list to approve. Nothing gets written until you confirm. This is your only required input.
> "Generate the full test suite based on those scenarios and save the files."
It writes the tests, picks the folder structure, names files meaningfully, and saves everything to `tests/e2e/`. Locator rules and test hygiene from the SKILL.md apply automatically.
> "Run the tests and fix any failures."
The fix loop runs per the SKILL.md — categorize, fix, rerun, 3 attempts max. You get a clean report either way.
> "Here's a new user story. Add tests for it to the existing suite."
Weeks later, new feature lands. Claude Code reads your existing files, avoids duplicates, and extends the suite cleanly. The tests grow with the product.
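The duplicate-avoidance step amounts to comparing proposed scenarios against what already exists. A hypothetical sketch (title comparison is my simplification — Claude Code reads the test bodies too):

```typescript
// Illustrative dedup sketch: keep only scenarios whose titles aren't
// already present in the suite, ignoring case and surrounding whitespace.
function newScenarios(existingTitles: string[], proposed: string[]): string[] {
  const seen = new Set(existingTitles.map((t) => t.toLowerCase().trim()));
  return proposed.filter((t) => !seen.has(t.toLowerCase().trim()));
}
```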
## Swarm Mode
For full pre-release coverage, Claude Code spawns 3 sub-agents in parallel instead of running sequentially:
- Agent 1 — happy paths and success flows
- Agent 2 — validation, edge cases, error states
- Agent 3 — accessibility and UX behavior
All three write simultaneously, and Playwright runs the combined suite once they're done. Tokens are spent once, split across the three parallel agents; execution stays free no matter how many tests were written.
Use single agent for targeted checks. Use swarm mode when you need comprehensive coverage before a release.
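The split above maps whole concerns to agents. As a hypothetical sketch of that partitioning (the agent numbering mirrors the list above; the function itself is illustrative):

```typescript
// Illustrative partitioning sketch for swarm mode.
type Concern = "happy-path" | "validation" | "edge-cases" | "accessibility";

interface Scenario { title: string; concern: Concern; }

// Agent 1: success flows; Agent 2: validation + edge cases; Agent 3: a11y/UX.
const AGENT_FOR: Record<Concern, 1 | 2 | 3> = {
  "happy-path": 1,
  "validation": 2,
  "edge-cases": 2,
  "accessibility": 3,
};

function partition(scenarios: Scenario[]): Map<1 | 2 | 3, Scenario[]> {
  const buckets = new Map<1 | 2 | 3, Scenario[]>([[1, []], [2, []], [3, []]]);
  for (const s of scenarios) {
    buckets.get(AGENT_FOR[s.concern])!.push(s);
  }
  return buckets;
}
```

Splitting by concern rather than by file count keeps each agent's context coherent: one agent thinks only about success flows, another only about failure modes.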
## Get the SKILL.md
Drop it in your project root as SKILL.md. Open Claude Code. Start with Prompt 1.
https://gist.github.com/strelec00/b76230c45523a54597b6d115f78b80f7
## 5 Prompts Worth Saving
Gap analysis on an existing suite:
> "Read all test files in this project. What's covered, what's missing, and what looks redundant?"
Failure mode thinking:
> "What are the 5 most likely ways this feature could break that aren't covered by happy-path tests?"
Regression from a bug report:
> "A bug was reported where [describe it]. Write a test that would have caught this before it shipped."
Suite update after UI changes:
> "The UI for this flow changed. Here's the new spec. Update the existing tests to match without removing coverage."
Auto-generated documentation:
> "Write a short guide explaining what this test suite covers, how to run it, and what to do when a test fails."
## Limitations
Review generated locators. If Claude Code only has a plain-English description to work from, it guesses at button labels and input names. Give it component code or describe your actual UI for accurate selectors.
Quality of input determines quality of output. A vague spec produces shallow tests; specific acceptance criteria produce real coverage.
Swarm mode takes time on large codebases. For a quick pre-commit check, single agent is faster. Swarm is for depth runs where thoroughness matters more than speed.