How I built an AI reporter for Playwright that explains test failures

Santiago Echavarria — Wed, 13 May 2026 21:50:11 +0000

After 6 years doing QA automation in Fintech, I got tired of the same cycle:

1. Test fails in CI
2. Download the report
3. Spend 20 minutes reading stack traces
4. Realize it was a one-line selector issue

So I built a custom Playwright reporter that does the debugging for you.
What it does
When a test fails, the reporter automatically:

1. Captures the test name, error message, and stack trace
2. Sends it to an AI with a structured prompt
3. Gets back a root cause, a TypeScript fix, and a prevention tip
4. Saves everything to test-results/ai-diagnosis.md

Here's what the output looks like:
❌ Login › locked out user sees lock message

🔍 Analyzing failure: "locked out user sees lock message"

❌ locked out user sees lock message

1. Root cause

The selector [data-test="error"] matched but the assertion
expected different text than what Sauce Demo returned.

2. Fix

await expect(loginPage.errorMessage).toContainText(
  'Epic sadface: Sorry, this user has been locked out.'
);

3. Prevention

Always assert the exact error string shown in the UI, not a partial guess.

📄 Diagnosis saved → test-results/ai-diagnosis.md
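
The three sections in that output come from how the prompt is structured. Here is a minimal sketch of what such a prompt builder could look like. The `buildPrompt` function, the `FailureContext` shape, the wording, and the stack-trace cap are all assumptions for illustration, not the prompt used in the repo.

```typescript
// Hypothetical shape for the data captured from a failed test.
interface FailureContext {
  testName: string;
  errorMessage: string;
  stack?: string;
}

// Builds a prompt that asks for the three sections shown in the diagnosis.
// Assumption: the wording and the default stack-trace cap are illustrative.
function buildPrompt(failure: FailureContext, maxStackLines = 15): string {
  // Keep only the top of the stack so the prompt stays small.
  const stack = (failure.stack ?? '')
    .split('\n')
    .slice(0, maxStackLines)
    .join('\n');

  return [
    'You are a senior QA engineer debugging a failed Playwright test.',
    `Test: ${failure.testName}`,
    `Error: ${failure.errorMessage}`,
    stack ? `Stack trace:\n${stack}` : 'Stack trace: (not available)',
    '',
    'Reply in Markdown with exactly these sections:',
    '1. Root cause',
    '2. Fix (a TypeScript snippet)',
    '3. Prevention',
  ].join('\n');
}
```

Asking for fixed, numbered sections is what makes the reply easy to append to a Markdown file without post-processing.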
The architecture
The key design decision was making the AI provider swappable at runtime through an environment variable. No code changes are needed; just update your .env:
```bash
# Free default
AI_PROVIDER=groq
GROQ_API_KEY=your_key

# Or swap to Claude / OpenAI
AI_PROVIDER=claude
ANTHROPIC_API_KEY=your_key
```
This is powered by a simple AIProvider interface:
```typescript
export interface AIProvider {
  // Resolves to the diagnosis text for one failed test.
  analyzeFailure(testName: string, error: string): Promise<string>;
}
```
Each provider (Groq, Claude, OpenAI) implements this interface. The factory reads the env var and returns the right one:
```typescript
export function createProvider(): AIProvider {
  const provider = process.env.AI_PROVIDER ?? 'groq';
  switch (provider) {
    case 'claude': return new ClaudeProvider();
    case 'openai': return new OpenAIProvider();
    default: return new GroqProvider(); // free, no credit card
  }
}
```
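
Each provider behind the factory can be quite small. The following is a sketch of what a Groq-backed provider might look like, assuming Groq's OpenAI-compatible chat completions endpoint. The class name `GroqProviderSketch`, the `FetchLike` type, the injected fetch function, and the fallback messages are hypothetical and are not taken from the repo. The model id is left as a constructor argument because the available models change over time.

```typescript
// Same contract as the AIProvider interface above, repeated so the sketch is self-contained.
interface AIProvider {
  analyzeFailure(testName: string, error: string): Promise<string>;
}

// Minimal subset of fetch that the sketch needs; lets a test pass in a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<{ choices?: { message?: { content?: string } }[] }>;
}>;

class GroqProviderSketch implements AIProvider {
  private readonly apiKey: string;
  private readonly model: string; // assumption: whichever model id your account offers
  private readonly fetchImpl: FetchLike;

  constructor(apiKey: string, model: string, fetchImpl: FetchLike) {
    this.apiKey = apiKey;
    this.model = model;
    this.fetchImpl = fetchImpl;
  }

  async analyzeFailure(testName: string, error: string): Promise<string> {
    const response = await this.fetchImpl(
      'https://api.groq.com/openai/v1/chat/completions',
      {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: this.model,
          messages: [
            { role: 'user', content: `Test "${testName}" failed with:\n${error}` },
          ],
        }),
      }
    );

    // A non-2xx response becomes a note in the report rather than an exception.
    if (!response.ok) return `AI diagnosis unavailable (HTTP ${response.status})`;

    const data = await response.json();
    return data.choices?.[0]?.message?.content ?? 'AI diagnosis unavailable (empty response)';
  }
}
```

On Node 18 or later, the global `fetch` can be passed as the third argument. Injecting it keeps the provider testable without network access.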
The reporter itself
Playwright lets you build custom reporters by implementing the Reporter interface. The key hook is onTestEnd:
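```typescript
import * as fs from 'node:fs';
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';
import { createProvider } from '../src/ai/providerFactory'; // path assumed from the repo layout

export class AIReporter implements Reporter {
  // onTestEnd is typed to return void, so analyses are tracked here and awaited in onEnd.
  private pending: Promise<void>[] = [];

  onTestEnd(test: TestCase, result: TestResult) {
    if (result.status !== 'failed' && result.status !== 'timedOut') return;
    this.pending.push(
      this.analyze(test, result).catch((e) => console.error('AI diagnosis failed:', e))
    );
  }

  async onEnd() {
    await Promise.all(this.pending);
  }

  private async analyze(test: TestCase, result: TestResult) {
    console.log(`\n🔍 Analyzing failure: "${test.title}"\n`);
    const message = result.error?.message ?? 'Unknown error';
    const diagnosis = await createProvider().analyzeFailure(test.title, message);
    console.log(diagnosis);
    // Create the output directory if needed, then append this diagnosis.
    fs.mkdirSync('test-results', { recursive: true });
    fs.appendFileSync('test-results/ai-diagnosis.md', `${diagnosis}\n\n`);
  }
}
export default AIReporter; // Playwright loads custom reporters from the default export
```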
Clean and minimal: the hook returns immediately for passing tests and only calls the AI provider when a test fails, so it doesn't slow down passing suites.
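
For the reporter to run, it has to be registered in the Playwright config. A minimal sketch, assuming the file lives at `reporters/AIReporter.ts` and default-exports the class, which is how Playwright loads custom reporters. The `testDir` value and the choice of the built-in `list` reporter are assumptions; the repo's actual config may differ.

```typescript
// playwright.config.ts (sketch; paths are assumptions based on the repo layout)
const config = {
  testDir: './tests',
  reporter: [
    ['list'],                      // keep the normal console output
    ['./reporters/AIReporter.ts'], // add the AI diagnosis on top
  ],
};

export default config;
```

Listing it next to a built-in reporter means the AI diagnosis supplements the usual output instead of replacing it.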
The full project structure
The repo also includes a full working example with Sauce Demo:
```
playwright-ai-reporter/
├── src/ai/
│   ├── AIProvider.interface.ts   # shared contract
│   ├── GroqProvider.ts           # free default
│   ├── ClaudeProvider.ts
│   ├── OpenAIProvider.ts
│   └── providerFactory.ts
├── reporters/
│   └── AIReporter.ts
├── tests/saucedemo/
│   ├── login.spec.ts
│   ├── cart.spec.ts
│   └── checkout.spec.ts
├── pages/                        # Page Object Models
├── fixtures/                     # shared Playwright fixtures
└── .github/workflows/
    └── playwright.yml            # CI already configured
```
CI with GitHub Actions
The workflow runs on every push. It always uploads the HTML report as an artifact, and uploads ai-diagnosis.md only when there are failures, so you know exactly what broke and why.
```yaml
- name: Upload AI diagnosis
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: ai-diagnosis
    path: test-results/ai-diagnosis.md
```
AI provider comparison
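| Provider | Cost | Speed | Quality | Best for |
| --- | --- | --- | --- | --- |
| Groq (Llama 3) | Free | Very fast | Good | Portfolio, small teams |
| Claude Haiku | ~$0.001/test | Fast | Very good | Medium teams |
| Claude Sonnet | ~$0.005/test | Medium | Excellent | Enterprise |
| GPT-4o-mini | ~$0.003/test | Fast | Very good | Alternative |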
I use Groq for local dev (free tier is more than enough) and would use Claude Sonnet for a production CI pipeline where diagnosis quality matters.
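
The per-test prices in the comparison turn into a monthly budget with simple arithmetic, since only failed tests trigger an AI call. A rough sketch, where the 40 failures per day is a made-up workload and the `estimateMonthlyCost` helper is hypothetical:

```typescript
// Rough monthly cost: only failed tests trigger an AI call.
function estimateMonthlyCost(
  failuresPerDay: number,
  costPerDiagnosisUsd: number,
  daysPerMonth = 30
): number {
  return failuresPerDay * daysPerMonth * costPerDiagnosisUsd;
}

// Hypothetical workload: 40 failing tests per day across all CI runs.
const sonnet = estimateMonthlyCost(40, 0.005);
const haiku = estimateMonthlyCost(40, 0.001);
console.log(`Sonnet ≈ $${sonnet.toFixed(2)}/month, Haiku ≈ $${haiku.toFixed(2)}/month`);
```

At that volume the difference is about $6 versus $1.20 a month, so diagnosis quality is likely to matter more than price unless failure counts are far higher.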
What's next
A few things I want to add:

- Screenshot attachment in the diagnosis when available
- Grouping repeated failures to avoid duplicate AI calls
- Slack/Teams notification with the diagnosis embedded
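
Grouping repeated failures could be done as a thin wrapper around any provider. This is only one possible approach, sketched under assumptions: the `DedupingProvider` class is hypothetical, and the rule for treating two errors as "the same" (equal after stripping ANSI color codes, digits, and extra whitespace) is a guess that would need tuning against real failures.

```typescript
// Same contract as the AIProvider interface, repeated so the sketch is self-contained.
interface AIProvider {
  analyzeFailure(testName: string, error: string): Promise<string>;
}

// Hypothetical wrapper: reuses one diagnosis for failures with the same error signature.
class DedupingProvider implements AIProvider {
  private readonly cache = new Map<string, Promise<string>>();
  private readonly inner: AIProvider;

  constructor(inner: AIProvider) {
    this.inner = inner;
  }

  // Assumption: errors match if they are equal after removing ANSI color
  // codes, replacing digit runs with "N", and collapsing whitespace.
  private signature(error: string): string {
    return error
      .replace(/\u001b\[[0-9;]*m/g, '')
      .replace(/\d+/g, 'N')
      .replace(/\s+/g, ' ')
      .trim();
  }

  analyzeFailure(testName: string, error: string): Promise<string> {
    const key = this.signature(error);
    let diagnosis = this.cache.get(key);
    if (!diagnosis) {
      // Cache the promise itself so concurrent failures share one in-flight call.
      diagnosis = this.inner.analyzeFailure(testName, error);
      this.cache.set(key, diagnosis);
    }
    return diagnosis;
  }
}
```

One trade-off of this design is that the shared diagnosis is written for the first test that hit the error, so a real version would probably list the other affected test names alongside it.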

Try it yourself
```bash
git clone https://github.com/sechavarriar/playwright-ai-reporter
cd playwright-ai-reporter
npm install
npx playwright install chromium
cp .env.example .env

# Add your free Groq key from console.groq.com

npm test
```
The repo is at https://github.com/sechavarriar/playwright-ai-reporter — feedback and PRs very welcome. Especially curious if anyone has ideas for handling flaky tests differently.