Adnan G

I got tired of downloading Playwright artifacts from CI — so I changed the workflow

Debugging Playwright failures in CI has always felt more manual than it should be.

Not because the data isn’t there — it is.

But because it’s scattered.

A typical failure for me looks like this:

  • open CI job
  • download artifacts
  • open trace viewer locally
  • check screenshots
  • scroll logs
  • try to line everything up

It works… but it’s slow. Especially when multiple tests fail at once.


The real problem

The issue isn’t lack of data.

It’s that there’s no single place to understand what happened.

Everything lives in separate files:

  • traces
  • screenshots
  • logs
  • CI output

So debugging turns into stitching together context manually.

It gets worse with:

  • parallel runs
  • flaky tests
  • multiple failures triggered by the same root cause

At that point you’re not debugging — you’re reconstructing events.


What I tried instead

I wanted to answer one simple question faster:

“What actually happened in this run?”

So I changed the workflow.

Instead of downloading artifacts and inspecting things one by one, I pushed everything from a run into a single view.

That view shows:

  • all failed tests across jobs
  • traces, screenshots, logs in one place
  • failures grouped if they look related
  • a short summary of what likely happened

The goal wasn’t to add more data — it was to remove the jumping between tools.
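Most of this change lives in reporter and artifact configuration rather than in the tests themselves. As a minimal sketch, a `playwright.config.ts` that keeps failure artifacts and adds a custom run-level reporter alongside the default one could look like this (the `./run-level-reporter.ts` path and the `groupSimilarFailures` option are illustrative placeholders, not the linked project's actual API):

```typescript
// playwright.config.ts — illustrative sketch, not the linked project's real options
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Keep traces and screenshots on failure so the run-level view
    // has something to show without re-running anything
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
  },
  reporter: [
    // Keep the normal terminal output for CI logs
    ['list'],
    // Hypothetical custom reporter that collects every failure in the
    // run into one view instead of scattering artifacts per job
    ['./run-level-reporter.ts', { groupSimilarFailures: true }],
  ],
});
```

The `trace` and `screenshot` values are standard Playwright options; the custom reporter entry uses Playwright's `[path, options]` reporter syntax.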


Example

Instead of this:

  • open CI
  • download artifacts
  • open trace
  • go back to logs
  • repeat

You just open one link and see:

  • which tests failed
  • whether they failed for the same reason
  • what the UI looked like at failure
  • what the logs say

No downloading, no switching contexts.


What improved

Two things stood out immediately.

1. Faster triage

You can tell pretty quickly if:

  • it’s one bug causing multiple failures
  • or a bunch of unrelated issues

That alone saves a lot of time.
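The "one bug or many bugs" call mostly comes down to grouping failures by a normalized error signature, so that the same underlying error from different tests lands in the same bucket. A self-contained sketch of that idea (the `Failure` shape and the normalization rules are assumptions for illustration, not Playwright's API):

```typescript
// Sketch: group failures by a normalized error signature so that one
// root cause shows up as one group instead of N separate failures.
interface Failure {
  test: string;
  error: string;
}

// Strip volatile details (timeout values, source locations, long hex ids)
// so "the same" error from different tests produces the same key.
function signature(error: string): string {
  return error
    .replace(/\d+ms/g, '<ms>')
    .replace(/:\d+:\d+/g, ':<loc>')
    .replace(/[0-9a-f]{8,}/gi, '<id>')
    .trim();
}

function groupFailures(failures: Failure[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const f of failures) {
    const key = signature(f.error);
    const tests = groups.get(key) ?? [];
    tests.push(f.test);
    groups.set(key, tests);
  }
  return groups;
}
```

A group containing many different tests points at one shared root cause; many single-test groups point at unrelated issues (or flakes).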


2. Less noise from flakiness

Grouping similar failures makes it obvious when:

  • multiple tests break for the same reason
  • vs random flakes

Before that, everything just looked like chaos.


What still isn’t great

This still feels like a workaround.

The ecosystem gives you all the pieces, but not a clean way to reason about failures at the run level.

I’m curious how others are handling this today.

  • Do you rely mostly on trace viewer?
  • Do you download artifacts every time?
  • Any workflows that actually reduce debugging time?

If you’re curious

I open-sourced what I’ve been using here:

👉 https://github.com/adnangradascevic/playwright-reporter

Would love feedback — especially if you’re dealing with a lot of CI failures.

Top comments (1)

Adnan G

One thing I didn’t expect: grouping failures ended up being more useful than the raw logs.

Curious if others see the same or if you prefer digging test by test.