Flaky Playwright tests usually do not start as a clean debugging session. They start as a red CI job, a rerun button, and a long trace or log that someone has to interpret under time pressure.
I built Playwright Flake Triage Toolkit as a small local CLI for that first pass: it scans Playwright JSON reports, JUnit XML files, and CI logs, then produces a Markdown or JSON checklist of likely causes.
GitHub: https://github.com/vasiliy0/playwright-flake-triage
PyPI: https://pypi.org/project/playwright-flake-triage/
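To make the scanning step concrete, here is a minimal sketch of how failed attempts could be pulled out of a Playwright JSON report. It is not the toolkit's actual code. It assumes the nested `suites` / `specs` / `tests` / `results` layout that Playwright's JSON reporter writes, and the function names `iter_failed_results` and `load_failures` are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_failed_results(suite: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one record per failed or timed-out attempt, walking nested suites."""
    for spec in suite.get("specs", []):
        for test in spec.get("tests", []):
            for result in test.get("results", []):
                if result.get("status") in ("failed", "timedOut"):
                    # "error" may be missing, so fall back to an empty dict.
                    error = result.get("error") or {}
                    yield {
                        "title": spec.get("title", ""),
                        "file": spec.get("file", ""),
                        "retry": result.get("retry", 0),
                        "status": result["status"],
                        "message": error.get("message", ""),
                    }
    # Suites can contain child suites (files, describe blocks), so recurse.
    for child in suite.get("suites", []):
        yield from iter_failed_results(child)


def load_failures(report_text: str) -> list[dict[str, Any]]:
    """Parse report JSON text and collect every failed attempt."""
    report = json.loads(report_text)
    failures: list[dict[str, Any]] = []
    for suite in report.get("suites", []):
        failures.extend(iter_failed_results(suite))
    return failures
```

Keeping every attempt, retries included, is deliberate in this sketch: a test that fails on attempt 0 and passes on attempt 1 is exactly the case a flake triage pass cares about.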
What it tries to answer
Instead of replacing the Playwright trace viewer, the tool answers a narrower question:
What kind of flake is this likely to be, and what should I check first?
Current categories include:
- ambiguous/brittle selectors
- auth/session state mismatch
- timeout or readiness instability
- network/backend dependency flakes
- browser/context/page lifecycle races
- navigation/frame detachment races
- visual snapshot instability
- parallel/shared-state collisions
- repeated failure fingerprints across retries/log files
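As a rough illustration of how categories like these can be assigned, the sketch below matches error text against an ordered list of regular expressions and computes a fingerprint so that the same failure can be recognized across retries. The patterns, the `RULES` table, and the `classify` and `fingerprint` helpers are assumptions made for this example, not the toolkit's real rule set.

```python
import hashlib
import re

# Illustrative rules only: (category, pattern). Order matters, because the
# first match wins and the generic timeout rule must come last.
RULES = [
    ("ambiguous/brittle selectors",
     re.compile(r"strict mode violation|resolved to \d+ elements", re.I)),
    ("navigation/frame detachment races",
     re.compile(r"frame was detached|execution context was destroyed", re.I)),
    ("browser/context/page lifecycle races",
     re.compile(r"has been closed|target closed", re.I)),
    ("network/backend dependency flakes",
     re.compile(r"net::ERR_|ECONNRESET|ECONNREFUSED|socket hang up", re.I)),
    ("visual snapshot instability",
     re.compile(r"toHaveScreenshot|toMatchSnapshot|pixels .* are different", re.I)),
    ("timeout or readiness instability",
     re.compile(r"timeout .*exceeded|timed out", re.I)),
]

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def classify(message: str) -> str:
    """Return the first matching category, or 'unclassified'."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "unclassified"


def fingerprint(message: str) -> str:
    """Hash a normalized message so retries of the same failure collide."""
    text = ANSI_ESCAPE.sub("", message).lower()
    text = re.sub(r"\d+", "<n>", text)        # drop durations, ports, counts
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]
```

With these assumed rules, `classify("Test timeout of 30000ms exceeded.")` lands in the timeout category, and two timeouts that differ only in their millisecond value share a fingerprint. A match is a hint about where to look first, not a diagnosis.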
Example
pip install playwright-flake-triage
pw-flake-triage playwright-report.json junit.xml ci.log
For CI usage, the tool can write a GitHub Actions step summary:
pw-flake-triage test-results/ --github-step-summary --fail-on-severity high
It is read-only and local: no service account, no token, no upload of private logs.
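For readers curious how a step summary and a severity gate can work without any network access, here is a small sketch. GitHub Actions exposes the path of the summary file in the `GITHUB_STEP_SUMMARY` environment variable, and Markdown appended to that file shows up on the job page. Everything else here is an assumption for illustration: the `low` / `medium` / `high` levels, the finding fields, the "at or above the threshold" rule, and the function names are not taken from the toolkit.

```python
import os
from typing import Mapping, Optional

# Assumed severity levels; only "high" appears in the example command.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def render_summary(findings: list[dict[str, str]]) -> str:
    """Render findings as a Markdown table."""
    lines = [
        "## Flake triage",
        "",
        "| Severity | Category | Test |",
        "| --- | --- | --- |",
    ]
    for finding in findings:
        lines.append(
            f"| {finding['severity']} | {finding['category']} | {finding['test']} |"
        )
    return "\n".join(lines) + "\n"


def write_step_summary(markdown: str,
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Append Markdown to the step summary file; return False outside Actions."""
    env = os.environ if env is None else env
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    # Append rather than overwrite, since other steps may have written already.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True


def exit_code(findings: list[dict[str, str]], fail_on: str) -> int:
    """Return 1 if any finding is at or above the threshold (assumed semantics)."""
    threshold = SEVERITY_ORDER[fail_on]
    return 1 if any(
        SEVERITY_ORDER[f["severity"]] >= threshold for f in findings
    ) else 0
```

Because the summary is just a local file write, this pattern fits the read-only, no-token constraint: nothing leaves the runner.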
Why this is useful before deep debugging
The Playwright trace for a failed test is still the source of truth, but teams often lose time by treating every flake as a generic timeout. A first-pass classifier helps split failures into separate work queues:
- selector work: improve locators and avoid stale element handles;
- product/test-state work: verify auth, seeded data, and permissions;
- infrastructure work: separate backend/network failures from browser timing;
- CI policy work: fail only on selected severity or known categories.
What the tool does not do
It does not claim to prove root cause. The output is a triage checklist, not an automatic fix. It also intentionally avoids uploading logs to a hosted service.
Feedback I am looking for
If you run Playwright in CI, the most useful feedback would be:
- Which failure wording is missing from the current rules?
- Which categories are too broad or too noisy?
- Would a CI summary / fail-on-severity mode fit your workflow?
Repo issues are the best place for examples, with sensitive logs sanitized first.