DEV Community

vasiliy0


Triage Playwright flakes from CI logs before opening traces

Flaky Playwright tests usually do not start as a clean debugging session. They start as a red CI job, a rerun button, and a long trace or log that someone has to interpret under time pressure.

I built Playwright Flake Triage Toolkit as a small local CLI for that first pass. It scans Playwright JSON reports, JUnit XML, and CI logs, then produces a Markdown or JSON checklist of likely causes.

GitHub: https://github.com/vasiliy0/playwright-flake-triage

PyPI: https://pypi.org/project/playwright-flake-triage/
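To make the "scan Playwright JSON reports" step concrete, here is a minimal sketch of pulling failed attempts out of a report. It is not the toolkit's parser, which the post does not show. The field names (`suites`, `specs`, `tests`, `results`, `status`, `retry`, `error.message`) follow the Playwright JSON reporter layout as I understand it, and `SAMPLE_REPORT`, `iter_failures`, and `failures_from_report` are names made up for this example.

```python
from typing import Iterator

# Tiny hand-written report for illustration; a real one comes from
# `npx playwright test --reporter=json`.
SAMPLE_REPORT = {
    "suites": [{
        "title": "login.spec.ts",
        "specs": [{
            "title": "logs in",
            "tests": [{
                "results": [
                    {"status": "timedOut", "retry": 0,
                     "error": {"message": "Test timeout of 30000ms exceeded."}},
                    {"status": "passed", "retry": 1},
                ],
            }],
        }],
        "suites": [],
    }],
}


def iter_failures(suite: dict, path: tuple = ()) -> Iterator[dict]:
    """Walk one suite (and its nested suites) and yield failed attempts."""
    here = path + (suite.get("title", ""),)
    for spec in suite.get("specs", []):
        name = " > ".join(p for p in here + (spec.get("title", ""),) if p)
        for test in spec.get("tests", []):
            for result in test.get("results", []):
                if result.get("status") in ("failed", "timedOut"):
                    error = result.get("error") or {}
                    yield {
                        "test": name,
                        "retry": result.get("retry", 0),
                        "status": result["status"],
                        "message": error.get("message", ""),
                    }
    for child in suite.get("suites", []):
        yield from iter_failures(child, here)


def failures_from_report(report: dict) -> list:
    """Collect failed attempts from every top-level suite."""
    return [f for suite in report.get("suites", []) for f in iter_failures(suite)]


if __name__ == "__main__":
    for failure in failures_from_report(SAMPLE_REPORT):
        print(failure["test"], failure["status"], failure["message"])
```

Keeping the retry index on each record matters later: a test that times out on attempt 0 and passes on attempt 1 is the typical flake signature.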

What it tries to answer

Instead of replacing the Playwright trace viewer, the tool answers a narrower question:

What kind of flake is this likely to be, and what should I check first?

Current categories include:

  • ambiguous/brittle selectors
  • auth/session state mismatch
  • timeout or readiness instability
  • network/backend dependency flakes
  • browser/context/page lifecycle races
  • navigation/frame detachment races
  • visual snapshot instability
  • parallel/shared-state collisions
  • repeated failure fingerprints across retries/log files
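A rough sketch of how categories like these can be assigned with plain pattern matching, and how repeated failures can be fingerprinted across retries. The rules below are assumptions for illustration, keyed on error wording Playwright commonly emits; they are not the toolkit's actual rule set, and `RULES`, `classify`, and `fingerprint` are hypothetical names.

```python
import hashlib
import re

# Hypothetical rules: (category, pattern). Only a few categories are covered.
RULES = [
    ("ambiguous/brittle selectors",
     re.compile(r"strict mode violation", re.I)),
    ("navigation/frame detachment races",
     re.compile(r"frame was detached|execution context was destroyed", re.I)),
    ("browser/context/page lifecycle races",
     re.compile(r"target (page, context or browser has been )?closed", re.I)),
    ("network/backend dependency flakes",
     re.compile(r"net::ERR_|ECONNREFUSED|ECONNRESET", re.I)),
    ("timeout or readiness instability",
     re.compile(r"timeout (of )?\d+ ?ms exceeded", re.I)),
]


def classify(message: str) -> list:
    """Return every category whose pattern appears in the message."""
    return [category for category, pattern in RULES if pattern.search(message)]


def fingerprint(message: str) -> str:
    """Hash a normalized message so retries of the same failure collide."""
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)  # addresses, object ids
    text = re.sub(r"\d+", "<n>", text)            # durations, ports, counts
    text = re.sub(r"\s+", " ", text).strip()      # whitespace differences
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    first = "Timeout 5000ms exceeded."
    retry = "Timeout 7000ms exceeded."
    print(classify(first))
    print(fingerprint(first) == fingerprint(retry))
```

A message can match more than one rule, which is why `classify` returns a list: the output is a checklist of things to look at, not a single verdict.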

Example

pip install playwright-flake-triage
pw-flake-triage playwright-report.json junit.xml ci.log

For CI usage, the tool can write a GitHub Actions step summary:

pw-flake-triage test-results/ --github-step-summary --fail-on-severity high

It is read-only and local: no service account, no token, no upload of private logs.
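For readers wondering what a severity gate plus step summary amounts to, here is a minimal sketch of the idea behind the CI command above. It is not the toolkit's implementation. The severity levels other than `high`, the finding fields, and the function names are assumptions; `GITHUB_STEP_SUMMARY` is the environment variable GitHub Actions sets to a file that accepts appended Markdown.

```python
import os

# Assumed ordering; the post only mentions "high".
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def should_fail(findings: list, threshold: str) -> bool:
    """True if any finding is at or above the threshold severity."""
    limit = SEVERITY_ORDER[threshold]
    return any(SEVERITY_ORDER[f["severity"]] >= limit for f in findings)


def render_summary(findings: list) -> str:
    """Render findings as a Markdown task list."""
    lines = ["## Flake triage", ""]
    for f in findings:
        lines.append(f"- [ ] **{f['severity']}** {f['category']}: {f['check']}")
    return "\n".join(lines) + "\n"


def write_step_summary(markdown: str, env=os.environ) -> bool:
    """Append to the GitHub Actions step summary file, if one is configured."""
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # not running inside GitHub Actions
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True


if __name__ == "__main__":
    findings = [{
        "severity": "medium",
        "category": "timeout or readiness instability",
        "check": "wait on a visible state instead of a fixed delay",
    }]
    write_step_summary(render_summary(findings))
    raise SystemExit(1 if should_fail(findings, "high") else 0)
```

The gate only reads findings and sets an exit code, so it stays consistent with the read-only, no-upload design.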

Why this is useful before deep debugging

The Playwright trace of a failed run is still the source of truth, but teams often lose time by treating every flake as a generic timeout. A first-pass classifier helps split failures into different queues:

  • selector work: improve locators and avoid stale element handles;
  • product/test-state work: verify auth, seeded data, and permissions;
  • infrastructure work: separate backend/network failures from browser timing;
  • CI policy work: fail only on selected severity or known categories.
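The queue split above can be sketched as a simple lookup from category to owner queue. The mapping below is my own guess at a reasonable assignment, not something the toolkit prescribes, and `QUEUE_BY_CATEGORY` and `group_by_queue` are hypothetical names.

```python
from collections import defaultdict

# Assumed mapping from triage category to work queue.
QUEUE_BY_CATEGORY = {
    "ambiguous/brittle selectors": "selector work",
    "auth/session state mismatch": "product/test-state work",
    "parallel/shared-state collisions": "product/test-state work",
    "network/backend dependency flakes": "infrastructure work",
    "timeout or readiness instability": "infrastructure work",
}


def group_by_queue(findings: list) -> dict:
    """Group failing test names by the queue that should look at them."""
    queues = defaultdict(list)
    for finding in findings:
        # Unmapped categories fall back to a manual triage queue.
        queue = QUEUE_BY_CATEGORY.get(finding["category"], "manual triage")
        queues[queue].append(finding["test"])
    return dict(queues)


if __name__ == "__main__":
    sample = [
        {"category": "ambiguous/brittle selectors", "test": "cart > adds item"},
        {"category": "network/backend dependency flakes", "test": "login > logs in"},
    ]
    for queue, tests in group_by_queue(sample).items():
        print(queue, tests)
```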

What the tool does not do

It does not claim to prove root cause. The output is a triage checklist, not an automatic fix. It also intentionally avoids uploading logs to a hosted service.

Feedback I am looking for

If you run Playwright in CI, the most useful feedback would be:

  1. Which failure wording is missing from the current rules?
  2. Which categories are too broad or too noisy?
  3. Would a CI summary / fail-on-severity mode fit your workflow?

Repo issues are the best place for examples, with sensitive logs sanitized first.
