Triage Playwright flakes from CI logs before opening traces

#testing #playwright

Debugging a flaky Playwright test rarely starts as a clean session. It starts with a red CI job, a rerun button, and a long trace or log that someone has to interpret under time pressure.

I built Playwright Flake Triage Toolkit as a small local CLI for the first pass: scan Playwright JSON reports, JUnit XML, and CI logs, then produce a Markdown or JSON checklist of likely causes.

GitHub: https://github.com/vasiliy0/playwright-flake-triage

PyPI: https://pypi.org/project/playwright-flake-triage/

What it tries to answer

Instead of replacing the Playwright trace viewer, the tool answers a narrower question:

What kind of flake is this likely to be, and what should I check first?

Current categories include:

ambiguous/brittle selectors
auth/session state mismatch
timeout or readiness instability
network/backend dependency flakes
browser/context/page lifecycle races
navigation/frame detachment races
visual snapshot instability
parallel/shared-state collisions
repeated failure fingerprints across retries/log files
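
To make the idea concrete, here is a minimal sketch of how a first-pass rule table and a retry fingerprint could work. It is not the toolkit's actual rule set or code: the `RULES` table, `classify_line`, `summarize`, and `fingerprint` are hypothetical names, and the patterns only cover a few wordings commonly seen in Playwright failure output.

```python
import hashlib
import re
from collections import Counter
from typing import Optional

# Hypothetical rules for illustration only, NOT the toolkit's real rule set.
# Order matters: specific wording is checked before the generic timeout rule,
# because many failures also mention a timeout somewhere in the message.
RULES = [
    ("ambiguous/brittle selectors",
     re.compile(r"strict mode violation", re.I)),
    ("navigation/frame detachment races",
     re.compile(r"frame was detached|execution context was destroyed", re.I)),
    ("browser/context/page lifecycle races",
     re.compile(r"target page, context or browser has been closed|target closed", re.I)),
    ("network/backend dependency flakes",
     re.compile(r"net::ERR_|ECONNREFUSED|ECONNRESET", re.I)),
    ("timeout or readiness instability",
     re.compile(r"timeout \d+ms exceeded|test timeout of \d+ms", re.I)),
]


def classify_line(line: str) -> Optional[str]:
    """Return the first matching category for a log line, or None."""
    for category, pattern in RULES:
        if pattern.search(line):
            return category
    return None


def summarize(log_text: str) -> Counter:
    """Count how many lines of a log fall into each category."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        category = classify_line(line)
        if category is not None:
            counts[category] += 1
    return counts


def fingerprint(message: str) -> str:
    """Hash a failure message with volatile numbers masked out.

    Timeouts, ports, and line numbers vary between retries, so digits are
    replaced before hashing to let the same failure collapse to one key.
    """
    normalized = re.sub(r"\d+", "<n>", message.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
```

Calling `summarize(open("ci.log").read())` on a log would return per-category counts, and grouping failures by `fingerprint(...)` shows which ones repeat across retries. A first-match rule list is deliberately crude: it is cheap to extend with new wording, but it will misfile messages that fit more than one category.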

Example

pip install playwright-flake-triage
pw-flake-triage playwright-report.json junit.xml ci.log

For CI usage, the tool can write a GitHub Actions step summary:

pw-flake-triage test-results/ --github-step-summary --fail-on-severity high

It is read-only and local: no service account, no token, no upload of private logs.
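
For readers unfamiliar with step summaries: GitHub Actions exposes a file path in the `GITHUB_STEP_SUMMARY` environment variable, and any Markdown appended to that file is rendered on the job's summary page. The sketch below shows that mechanism only. The function name `write_step_summary` and the table layout are assumptions for illustration, not the toolkit's output format.

```python
import os
from typing import Mapping


def write_step_summary(counts: Mapping[str, int]) -> str:
    """Render category counts as a Markdown table and append it to the
    GitHub Actions step summary file when that file is available."""
    lines = [
        "## Flake triage summary",  # hypothetical heading
        "",
        "| Category | Matches |",
        "| --- | --- |",
    ]
    # Most frequent categories first; ties are broken alphabetically.
    for category, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"| {category} | {n} |")
    markdown = "\n".join(lines) + "\n"

    # Outside GitHub Actions the variable is unset, so nothing is written.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(markdown)
    return markdown
```

Because the function only appends to a local file supplied by the runner, this approach needs no token or API call, which fits the read-only, local design described above.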

Why this is useful before deep debugging

The Playwright trace of a failed test is still the source of truth, but teams often lose time by treating every flake as a generic timeout. A first-pass classifier helps split failures into separate queues:

selector work: improve locators and avoid stale element handles;
product/test-state work: verify auth, seeded data, and permissions;
infrastructure work: separate backend/network failures from browser timing;
CI policy work: fail only on selected severity or known categories.
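
The queue split and the severity gate above can be sketched in a few lines. Everything here is an assumption made for illustration: the severity levels other than `high`, the category-to-queue mapping, and the names `should_fail` and `route` are hypothetical and do not describe how the toolkit implements `--fail-on-severity`.

```python
from typing import Dict, Iterable, List, Tuple

Finding = Tuple[str, str]  # (category, severity)

# Assumed ordering; only "high" appears in the CLI example above.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}

# Hypothetical mapping from category to the team queue that owns it.
QUEUES = {
    "ambiguous/brittle selectors": "selector work",
    "auth/session state mismatch": "product/test-state work",
    "network/backend dependency flakes": "infrastructure work",
}


def should_fail(findings: Iterable[Finding], threshold: str) -> bool:
    """True if any finding is at or above the threshold severity."""
    limit = SEVERITY_ORDER[threshold]
    return any(SEVERITY_ORDER[severity] >= limit for _, severity in findings)


def route(findings: Iterable[Finding]) -> Dict[str, List[str]]:
    """Group finding categories by queue; unmapped ones need a human."""
    queues: Dict[str, List[str]] = {}
    for category, _ in findings:
        queue = QUEUES.get(category, "manual triage")
        queues.setdefault(queue, []).append(category)
    return queues
```

In a CI script, the result of `should_fail(findings, "high")` would decide the exit code, so low-severity findings show up in the report without turning the job red.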

What the tool does not do

It does not claim to prove root cause. The output is a triage checklist, not an automatic fix. It also intentionally avoids uploading logs to a hosted service.

Feedback I am looking for

If you run Playwright in CI, the most useful feedback would be:

Which failure wording is missing from the current rules?
Which categories are too broad or too noisy?
Would a CI summary / fail-on-severity mode fit your workflow?

Issues on the GitHub repo are the best place to share examples. Please sanitize sensitive logs first.
