Adnan G

Posted on Mar 20

I got tired of downloading Playwright artifacts from CI, so I changed the workflow

#playwright #testing #ci #devops

Debugging Playwright failures in CI has always felt more manual than it should be.

Not because the data isn’t there — it is.

But because it’s scattered.

A typical failure for me looks like this:

open CI job
download artifacts
open trace viewer locally
check screenshots
scroll logs
try to line everything up

It works… but it’s slow. Especially when multiple tests fail at once.

The real problem

The issue isn’t lack of data.

It’s that there’s no single place to understand what happened.

Everything lives in separate files:

traces
screenshots
logs
CI output

So debugging turns into stitching together context manually.

It gets worse with:

parallel runs
flaky tests
multiple failures triggered by the same root cause

At that point you’re not debugging — you’re reconstructing events.

What I tried instead

I wanted to answer one simple question faster:

“What actually happened in this run?”

So I changed the workflow.

Instead of downloading artifacts and inspecting them one by one, I pushed everything from a run into a single view.

That view shows:

all failed tests across jobs
traces, screenshots, and logs in one place
failures grouped if they look related
a short summary of what likely happened
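
To make the grouping idea concrete, here is a minimal sketch of one way to do it. It is an illustration under assumptions, not a description of any particular reporter: the `FailedTest` shape, the `signature` heuristic, and the function names are all hypothetical. The idea is to reduce each error message to a rough signature by masking numbers, bucket failures by that signature, and print one summary line per bucket.

```typescript
// Hypothetical shape for one failed test. This is an assumption for the
// sketch, not the schema of any real Playwright reporter.
interface FailedTest {
  job: string;   // CI job or shard the failure came from
  title: string; // full test title
  error: string; // error message as reported by the test runner
}

// Reduce an error message to a rough signature: keep the first line and
// mask digits, so "Timeout 30000ms" and "Timeout 15000ms" look the same.
function signature(error: string): string {
  return error.split("\n")[0].replace(/\d+/g, "<n>").trim();
}

// Bucket failures that share a signature.
function groupFailures(failures: FailedTest[]): Record<string, FailedTest[]> {
  const groups: Record<string, FailedTest[]> = {};
  for (const failure of failures) {
    const key = signature(failure.error);
    if (!groups[key]) groups[key] = [];
    groups[key].push(failure);
  }
  return groups;
}

// One line per group, largest group first.
function summarize(groups: Record<string, FailedTest[]>): string[] {
  return Object.keys(groups)
    .sort((a, b) => groups[b].length - groups[a].length)
    .map((key) => {
      const tests = groups[key];
      // Count distinct jobs that contributed to this group.
      const jobs = tests
        .map((t) => t.job)
        .filter((job, i, all) => all.indexOf(job) === i);
      return `${tests.length} failure(s) in ${jobs.length} job(s): ${key}`;
    });
}
```

Masking only digits is deliberately crude. It will miss failures that share a cause but differ in wording, and it can merge unrelated failures that happen to produce the same message, so treat the groups as hints for where to look first rather than as a diagnosis.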

The goal wasn’t to add more data — it was to remove the jumping between tools.

Example

Instead of this:

open CI
download artifacts
open trace
go back to logs
repeat

You just open one link and see:

which tests failed
whether they failed for the same reason
what the UI looked like at failure
what the logs say

No downloading, no switching contexts.

What improved

Two things stood out immediately.

1. Faster triage

You can tell pretty quickly if:

it’s one bug causing multiple failures
or a bunch of unrelated issues

That alone saves a lot of time.

2. Less noise from flakiness

Grouping similar failures makes it obvious when:

multiple tests break for the same reason
versus when they are just random flakes

Before that, everything just looked like chaos.
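
One way to separate flakes from real breakage before grouping is to look at retry attempts. The sketch below assumes retries are enabled and that you have the status of each attempt. The `TestAttempts` shape and the function names are hypothetical. The rule is the usual one: a test that fails and then passes on a retry is treated as flaky, and a test whose last attempt still fails is treated as a real failure.

```typescript
// Hypothetical record of one test's attempts within a single run.
interface TestAttempts {
  title: string;
  // Status of each attempt, in the order the attempts ran.
  statuses: Array<"passed" | "failed" | "timedOut">;
}

type Verdict = "passed" | "flaky" | "failed";

// Classify a test from its attempts:
// - last attempt did not pass (or no attempts recorded) -> "failed"
// - last attempt passed after an earlier non-pass       -> "flaky"
// - every attempt passed                                -> "passed"
function classify(test: TestAttempts): Verdict {
  const last = test.statuses[test.statuses.length - 1];
  if (last !== "passed") return "failed";
  return test.statuses.some((s) => s !== "passed") ? "flaky" : "passed";
}

// Split test titles by verdict so flakes can be reviewed separately
// from failures that persisted through every retry.
function partition(tests: TestAttempts[]): Record<Verdict, string[]> {
  const out: Record<Verdict, string[]> = { passed: [], flaky: [], failed: [] };
  for (const test of tests) {
    out[classify(test)].push(test.title);
  }
  return out;
}
```

Only the `failed` bucket then needs grouping by cause. The `flaky` bucket is still worth tracking over time, because a test that flakes in many runs is usually pointing at something real.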

What still isn’t great

This still feels like a workaround.

The ecosystem gives you all the pieces, but not a clean way to reason about failures at the run level.

I’m curious how others are handling this today.

Do you rely mostly on trace viewer?
Do you download artifacts every time?
Any workflows that actually reduce debugging time?

If you’re curious

I open-sourced what I’ve been using here:

👉 https://github.com/adnangradascevic/playwright-reporter

Would love feedback — especially if you’re dealing with a lot of CI failures.

Top comments (2)

Adnan G • Mar 20

One thing I didn’t expect: grouping failures ended up being more useful than the raw logs.

Curious if others see the same or if you prefer digging test by test.

Alex Serebriakov • Apr 8

good timing — also explored alternatives to self-hosted puppeteer recently

snapapi.pics is what we settled on. REST API for screenshots/PDF, no browser infra, handles scale