I got tired of downloading Playwright artifacts from CI — so I changed the workflow
Debugging Playwright failures in CI has always felt more manual than it should be.
Not because the data isn’t there — it is.
But because it’s scattered.
A typical failure for me looks like this:
- open CI job
- download artifacts
- open trace viewer locally
- check screenshots
- scroll logs
- try to line everything up
It works… but it’s slow. Especially when multiple tests fail at once.
The real problem
The issue isn’t lack of data.
It’s that there’s no single place to understand what happened.
Everything lives in separate files:
- traces
- screenshots
- logs
- CI output
So debugging turns into stitching together context manually.
It gets worse with:
- parallel runs
- flaky tests
- multiple failures triggered by the same root cause
At that point you’re not debugging — you’re reconstructing events.
What I tried instead
I wanted to answer one simple question faster:
“What actually happened in this run?”
So I changed the workflow.
Instead of downloading artifacts and inspecting things one by one,
I pushed everything from a run into a single view.
That view shows:
- all failed tests across jobs
- traces, screenshots, logs in one place
- failures grouped if they look related
- a short summary of what likely happened
The goal wasn’t to add more data — it was to remove the jumping between tools.
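To make the "all failed tests across jobs" idea concrete, here is a minimal sketch of flattening per-job results into one run-level list. The `JobReport`, `TestResult`, and `FailureEntry` shapes and the job names are hypothetical, made up for illustration. They are not Playwright's report format and not necessarily how the linked reporter stores things.

```typescript
// Hypothetical shapes for illustration only (not Playwright's report schema).
type Status = "passed" | "failed" | "timedOut" | "skipped";

interface TestResult {
  title: string;
  status: Status;
  error?: string;
  attachments: string[]; // paths or URLs to traces, screenshots, logs
}

interface JobReport {
  job: string; // e.g. a CI job or shard name
  results: TestResult[];
}

interface FailureEntry {
  job: string;
  title: string;
  error: string;
  attachments: string[];
}

// Walk every job's results and keep only the failures, tagged with their job.
function collectFailures(reports: JobReport[]): FailureEntry[] {
  const out: FailureEntry[] = [];
  for (const report of reports) {
    for (const r of report.results) {
      if (r.status === "failed" || r.status === "timedOut") {
        out.push({
          job: report.job,
          title: r.title,
          error: r.error || "no error message",
          attachments: r.attachments,
        });
      }
    }
  }
  return out;
}

// Made-up sample data: two jobs, one failure each.
const reports: JobReport[] = [
  {
    job: "shard-1",
    results: [
      { title: "login works", status: "passed", attachments: [] },
      { title: "checkout shows total", status: "failed", error: "Timeout exceeded", attachments: ["trace.zip"] },
    ],
  },
  {
    job: "shard-2",
    results: [
      { title: "search returns results", status: "timedOut", attachments: ["screenshot.png"] },
      { title: "profile renders name", status: "skipped", attachments: [] },
    ],
  },
];

const failures = collectFailures(reports);
for (const f of failures) {
  console.log("[" + f.job + "] " + f.title + ": " + f.error);
}
```

Keeping the job name on each entry is what lets a single view answer "where did this fail?" without going back to the CI job list.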
Example
Instead of this:
- open CI
- download artifacts
- open trace
- go back to logs
- repeat
You just open one link and see:
- which tests failed
- whether they failed for the same reason
- what the UI looked like at failure
- what the logs say
No downloading, no switching contexts.
What improved
Two things stood out immediately.
1. Faster triage
You can tell pretty quickly if:
- it’s one bug causing multiple failures
- or a bunch of unrelated issues
That alone saves a lot of time.
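One way to approximate the "one bug or many?" check is to group failures by a rough error signature. This is only a sketch of a possible heuristic, not the grouping logic the reporter actually uses. The `Failure` shape, the `signature` and `groupBySignature` helpers, and the sample error strings are all assumptions.

```typescript
// Hypothetical failure record: test title plus its raw error text.
interface Failure {
  test: string;
  error: string;
}

// Reduce an error to a rough signature: first line only, with quoted
// strings and numbers replaced by placeholders so similar messages match.
function signature(error: string): string {
  const firstLine = error.split("\n")[0];
  return firstLine
    .replace(/"[^"]*"|'[^']*'/g, "<str>")
    .replace(/\d+/g, "<n>")
    .trim();
}

// Bucket test titles under their error signature.
function groupBySignature(failures: Failure[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const f of failures) {
    const key = signature(f.error);
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(f.test);
  }
  return groups;
}

// Made-up sample errors: two timeouts that differ only in duration, one assertion.
const sample: Failure[] = [
  { test: "checkout shows total", error: "TimeoutError: locator.click: Timeout 30000ms exceeded.\n  waiting for locator" },
  { test: "cart updates badge", error: "TimeoutError: locator.click: Timeout 15000ms exceeded." },
  { test: "profile renders name", error: "Error: expect(received).toBe(expected)" },
];

const groups = groupBySignature(sample);
for (const key of Object.keys(groups)) {
  console.log(groups[key].length + " test(s): " + key);
}
```

One large group points to a shared root cause; many groups of one suggest unrelated issues. A text-based signature like this is crude, so treat the groups as hints rather than conclusions.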
2. Less noise from flakiness
Grouping similar failures makes it obvious whether:
- multiple tests broke for the same reason
- or the failures are unrelated, random flakes
Before that, everything just looked like chaos.
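Retry outcomes are another signal for separating flakes from real breakage. Playwright Test itself labels a test "flaky" when it fails on the first run and passes on a retry. The sketch below applies the same idea to a plain list of attempt statuses. The `classify` helper and its types are hypothetical and assume retries are enabled and at least one attempt was recorded.

```typescript
// Hypothetical types: one status per attempt, in execution order.
type AttemptStatus = "passed" | "failed";
type Verdict = "passed" | "flaky" | "failed";

// A test that eventually passes after at least one failed attempt is flaky;
// a test whose last attempt failed counts as a real failure.
function classify(attempts: AttemptStatus[]): Verdict {
  const last = attempts[attempts.length - 1];
  const anyFailed = attempts.indexOf("failed") !== -1;
  if (last === "passed") {
    return anyFailed ? "flaky" : "passed";
  }
  return "failed";
}

console.log(classify(["passed"]));
console.log(classify(["failed", "passed"]));
console.log(classify(["failed", "failed", "failed"]));
```

Combined with grouping, this helps separate "several tests consistently fail with the same error" from "one test failed once and recovered".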
What still isn’t great
This still feels like a workaround.
The ecosystem gives you all the pieces,
but not a clean way to reason about failures at the run level.
I’m curious how others are handling this today.
- Do you rely mostly on trace viewer?
- Do you download artifacts every time?
- Any workflows that actually reduce debugging time?
If you’re curious
I open-sourced what I’ve been using here:
👉 https://github.com/adnangradascevic/playwright-reporter
Would love feedback — especially if you’re dealing with a lot of CI failures.
One thing I didn’t expect: grouping failures ended up being more useful than the raw logs.
Curious if others see the same or if you prefer digging test by test.