## TL;DR
- SKILL.md gives Claude Code a persistent QA methodology: 5 workflow phases, folder structure, locator rules, and a capped fix loop
- Drop it in your project root, point Claude Code at your codebase or test cases, and it writes and saves an organized Playwright suite automatically
- Tests persist on disk and run free forever via Playwright's native runner
## The Problem
Writing tests takes as long as writing the feature. Most devs either skip them or write shallow ones that break on the next refactor. The issue isn't Playwright — it's that there's no system. Every test session starts from scratch, and coverage is whatever you had time for.
Asking Claude Code to "write some tests" helps, but without instructions it's inconsistent. It writes differently every time and has no idea how you want things organized.
The fix is giving it a methodology to follow.
## What's a SKILL.md?
A markdown file in your project root. Claude Code reads it at the start of every session as its operating instructions for that project.
The difference from prompting: prompts are forgotten when the conversation ends. A SKILL.md persists. You configure the methodology once; Claude Code follows it on every session, in every project you drop it into.
## What the SKILL.md Defines
### 5 Workflow Phases
| Phase | What happens |
|---|---|
| Assess | Reads your project, proposes test scenarios, waits for your approval |
| Author | Writes organized test files by concern, saves them with meaningful names |
| Execute | Runs the suite via Playwright's native runner |
| Fix | Categorizes each failure, fixes and reruns — max 3 attempts |
| Report | Results summary, bugs found, coverage gaps, flaky test flags |
### Folder Structure
Tests go into `tests/e2e/`, organized by concern:
- `happy-path/` — core flows that must always work
- `validation/` — form errors, required fields, bad input handling
- `edge-cases/` — empty inputs, special characters, boundary values
- `accessibility/` — keyboard nav, focus order, ARIA attributes
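The layout above can be sketched as a tiny Node script (illustration only — Claude Code creates these folders itself when it saves the suite):

```typescript
// scaffold-e2e.ts — illustrative sketch of the SKILL.md folder layout.
import * as fs from "node:fs";
import * as path from "node:path";

const concerns = ["happy-path", "validation", "edge-cases", "accessibility"];

for (const concern of concerns) {
  // recursive: true makes this idempotent — no error if the folder exists
  fs.mkdirSync(path.join("tests", "e2e", concern), { recursive: true });
}
```

Keeping one folder per concern means `npx playwright test tests/e2e/happy-path` can run just the smoke set before a deploy.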
### Locator Priority
Claude Code follows this order on every test it writes:
- `getByRole` — survives refactors, matches user intent
- `getByLabel` — for form fields
- `getByText` — for buttons and visible content
- `data-testid` — when semantic locators aren't enough
- CSS selectors — last resort only
CSS selectors break every time someone touches a class name. Enforcing this in the SKILL.md means Claude Code never takes the lazy route.
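The rule is stated in prose in the SKILL.md; as a hypothetical sketch, the decision reduces to "first priority tier the element actually supports wins":

```typescript
// Hypothetical sketch of the locator-priority rule — names and types here
// are illustrative, not part of the SKILL.md itself.
type LocatorKind = "getByRole" | "getByLabel" | "getByText" | "data-testid" | "css";

const PRIORITY: LocatorKind[] = [
  "getByRole",   // survives refactors, matches user intent
  "getByLabel",  // form fields
  "getByText",   // buttons and visible content
  "data-testid", // when semantic locators aren't enough
  "css",         // last resort only
];

function pickLocator(available: LocatorKind[]): LocatorKind {
  // Walk the priority list; the first kind the element supports wins.
  const found = PRIORITY.find((kind) => available.includes(kind));
  if (!found) throw new Error("element has no usable locator");
  return found;
}
```

So an element reachable by both a CSS class and visible text gets `getByText`, never the class selector.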
### Fix Loop
When a test fails, Claude Code decides what kind of failure it is before touching anything:
- Test bug (wrong selector, race condition) → fix the test
- Real app bug → fix the app, report what broke
- Flaky (intermittent) → add a wait on that specific action
Max 3 attempts. After that it stops and explains rather than looping forever.
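A hypothetical sketch of that triage — the SKILL.md describes the categories in prose, and Claude Code makes the call by reading the actual error; the signals used here (pass rate across reruns, error-message keywords) are assumptions for illustration:

```typescript
// Illustrative triage sketch, not the SKILL.md's real decision logic.
type FailureKind = "test-bug" | "app-bug" | "flaky";

interface FixAttempt {
  kind: FailureKind;
  action: string;
}

const MAX_ATTEMPTS = 3; // per the SKILL.md cap

function triage(errorMessage: string, passRate: number): FixAttempt {
  // Intermittent (sometimes passes on rerun) → flaky: wait on that action.
  if (passRate > 0 && passRate < 1) {
    return { kind: "flaky", action: "add a wait on the failing action" };
  }
  // Selector / timing errors point at the test itself.
  if (/locator|selector|timeout/i.test(errorMessage)) {
    return { kind: "test-bug", action: "fix the test" };
  }
  // Everything else is treated as a real regression in the app.
  return { kind: "app-bug", action: "fix the app and report what broke" };
}
```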
### Language Support
The SKILL.md works with TypeScript, JavaScript, Python, Java, and C#. Claude Code detects your language from your project files and generates the right file extensions, run commands, and test syntax automatically.
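Conceptually, detection maps marker files to a language profile. The mapping below is an assumption for illustration — the marker files, extensions, and run commands are my guesses at what such a lookup could contain, not the SKILL.md's actual table:

```typescript
// Hypothetical language-detection sketch; Claude Code does this by reading
// the project, not via a hard-coded table like this one.
interface LangProfile {
  language: string;
  testExtension: string;
  runCommand: string;
}

const MARKERS: Record<string, LangProfile> = {
  "package.json":     { language: "TypeScript/JavaScript", testExtension: ".spec.ts", runCommand: "npx playwright test" },
  "requirements.txt": { language: "Python", testExtension: "_test.py", runCommand: "pytest" },
  "pom.xml":          { language: "Java", testExtension: "Test.java", runCommand: "mvn test" },
  "*.csproj":         { language: "C#", testExtension: "Tests.cs", runCommand: "dotnet test" },
};

function detectProfile(projectFiles: string[]): LangProfile | undefined {
  for (const [marker, profile] of Object.entries(MARKERS)) {
    const matches = marker.startsWith("*.")
      ? projectFiles.some((f) => f.endsWith(marker.slice(1))) // glob-ish suffix match
      : projectFiles.includes(marker);
    if (matches) return profile;
  }
  return undefined;
}
```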
## The Workflow — 4 Prompts
> "Here's my codebase / feature spec. What should I be testing?"
Claude Code reads everything, identifies scenarios grouped by concern, and gives you a numbered list to approve. Nothing gets written until you confirm. This is your only required input.
> "Generate the full test suite based on those scenarios and save the files."
It writes the tests, picks the folder structure, names files meaningfully, and saves everything to `tests/e2e/`. Locator rules and test hygiene from the SKILL.md apply automatically.
> "Run the tests and fix any failures."
The fix loop runs per the SKILL.md — categorize, fix, rerun, 3 attempts max. You get a clean report either way.
> "Here's a new user story. Add tests for it to the existing suite."
Weeks later, new feature lands. Claude Code reads your existing files, avoids duplicates, and extends the suite cleanly. The tests grow with the product.
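The duplicate-avoidance step amounts to comparing proposed scenarios against what already exists. A hypothetical sketch (title comparison is my simplification — Claude Code reads the test bodies too):

```typescript
// Illustrative dedup sketch: keep only scenarios whose titles aren't
// already present in the suite, ignoring case and surrounding whitespace.
function newScenarios(existingTitles: string[], proposed: string[]): string[] {
  const seen = new Set(existingTitles.map((t) => t.toLowerCase().trim()));
  return proposed.filter((t) => !seen.has(t.toLowerCase().trim()));
}
```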
## Swarm Mode
For full pre-release coverage, Claude Code spawns 3 sub-agents in parallel instead of running sequentially:
- Agent 1 — happy paths and success flows
- Agent 2 — validation, edge cases, error states
- Agent 3 — accessibility and UX behavior
All three write simultaneously, and Playwright runs the combined suite once they're done. Tokens are spent once, split across the three parallel agents; execution stays free no matter how many tests were written.
Use single agent for targeted checks. Use swarm mode when you need comprehensive coverage before a release.
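The split above maps whole concerns to agents. As a hypothetical sketch of that partitioning (the agent numbering mirrors the list above; the function itself is illustrative):

```typescript
// Illustrative partitioning sketch for swarm mode.
type Concern = "happy-path" | "validation" | "edge-cases" | "accessibility";

interface Scenario { title: string; concern: Concern; }

// Agent 1: success flows; Agent 2: validation + edge cases; Agent 3: a11y/UX.
const AGENT_FOR: Record<Concern, 1 | 2 | 3> = {
  "happy-path": 1,
  "validation": 2,
  "edge-cases": 2,
  "accessibility": 3,
};

function partition(scenarios: Scenario[]): Map<1 | 2 | 3, Scenario[]> {
  const buckets = new Map<1 | 2 | 3, Scenario[]>([[1, []], [2, []], [3, []]]);
  for (const s of scenarios) {
    buckets.get(AGENT_FOR[s.concern])!.push(s);
  }
  return buckets;
}
```

Splitting by concern rather than by file count keeps each agent's context coherent: one agent thinks only about success flows, another only about failure modes.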
## Get the SKILL.md
Drop it in your project root as SKILL.md. Open Claude Code. Start with Prompt 1.
https://gist.github.com/strelec00/b76230c45523a54597b6d115f78b80f7
## 5 Prompts Worth Saving
Gap analysis on an existing suite:
> "Read all test files in this project. What's covered, what's missing, and what looks redundant?"
Failure mode thinking:
> "What are the 5 most likely ways this feature could break that aren't covered by happy-path tests?"
Regression from a bug report:
> "A bug was reported where [describe it]. Write a test that would have caught this before it shipped."
Suite update after UI changes:
> "The UI for this flow changed. Here's the new spec. Update the existing tests to match without removing coverage."
Auto-generated documentation:
> "Write a short guide explaining what this test suite covers, how to run it, and what to do when a test fails."
## Limitations
Review generated locators. If Claude Code only has a plain-English description to work from, it guesses at button labels and input names. Give it component code or describe your actual UI for accurate selectors.
Quality of input determines quality of output. A vague spec produces shallow tests; specific acceptance criteria produce real coverage.
Swarm mode takes time on large codebases. For a quick pre-commit check, single agent is faster. Swarm is for depth runs where thoroughness matters more than speed.