TestDino

Posted on Jun 25

What to Check When Your Playwright Test Passes Locally But Fails in CI

#playwright #testing #ci #qa

You pushed the test. Passed locally 10 times clean. CI runs it. Fails.

You open the report. Screenshot is blank. Stack trace points to a selector you know exists.

You add a waitForTimeout(2000). Push again. Passes.
Next week, same test. Same blank screenshot. Different error message.

You are not fixing the test. You are guessing until it stops failing temporarily.

Here is what is actually happening and how to get out of it.

Why Does a Playwright Test Pass Locally But Fail in CI?

Your test is probably not broken. The CI environment is the variable nobody talks about.

On your machine:

One test runs. Full CPU. Full network. Nothing competing.
Page loads in under a second. Images render. API responds in under 200ms.
Screenshot captures a fully loaded state.

In CI with parallel tests running:

CPU is split across all parallel processes
Network bandwidth is shared across all parallel workers
API calls queue behind dozens of others hitting the same endpoint
Page starts loading, resources get starved halfway through, rendering stalls
Your selector checks for an element that never finished rendering
Screenshot captures a blank or half-loaded state

Your selector was never wrong. The environment stalled the render, and the timeout fired before the element existed.

But the report says element not found. So you go fix the selector.

That is the loop.

What Does a Blank Playwright Screenshot in CI Actually Mean?

It does not just mean "test failed." It usually means one of these four specific things:

1. Page timed out before full render

Network got thin mid-load. Playwright captured the loading state. The element you checked for was never painted onto the screen.

You see: blank screenshot
What happened: resource contention during page load, not a selector problem

2. Image format changed and the CDN did not catch up

Your main branch now serves WebP or AVIF optimized images. Staging CDN does not support that format yet. Image request returns 404 silently or hangs. Placeholder stays. Your selector looks for the real rendered image. Fails.

You see: element not found: img.product-image
What happened: image format change on main branch broke the CDN request in staging

3. API timeout cascade

Parallel tests hammer the same endpoint. Your call joins the queue. Times out. Page never finishes rendering. Screenshot captures whatever loaded before timeout fired.

You see: blank or partial screenshot
What happened: API queue filled up under parallel load, your request died in it

4. Resource starvation from parallel execution

The OS deprioritized your page load because 99 other processes were competing for CPU. A load that takes under a second locally can take 3 to 4 seconds in parallel CI. The Playwright timeout fires before the page finishes.

You see: blank screenshot
What happened: physics, not your test code

Why Can't You Tell Which One It Is From the Report?

Without a trace attached, the standard Playwright failure report shows you:

Assertion failed
Blank screenshot
Stack trace pointing at the selector line

It does not show you:

Which network requests failed or timed out
When your selector ran relative to when the API responded
How many tests were running at the same time
Whether the page was mid-render or fully loaded when failure fired

So you have a failure with no context. You guess.

Most QA engineers guess timing and add a wait. But often it was a CDN or environment issue. The wait sometimes helps by coincidence. The real problem stays. It comes back next week with a slightly different error, and you start the loop again.

How Do You Actually Debug a Playwright CI Failure?

Here is the step-by-step process to stop guessing and start knowing.

Step 1: Check the network tab before you touch your test

Before you change a selector or add a wait, get the network data from that CI run.

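First, make sure you are capturing traces. If tracing is off in your config, you have zero network data on failures. Note that trace: 'on-first-retry' only records a trace when a failed test is retried, so retries must be greater than 0 in CI (or use 'retain-on-failure' instead).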

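```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
  // 'on-first-retry' only records a trace when a failed test is retried
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```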


Once you have trace data, open it in Trace Viewer (from the HTML report, or with npx playwright show-trace path/to/trace.zip) and go to the Network tab. Look for:

Any request with a 404 or 5xx status
Any request that shows no response at all (started, then nothing)
Image requests specifically if the failure involves a missing image element

If a network request failed before your selector ran, your test is most likely innocent. Stop there.
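If you export the network entries from a failing run, the checklist above is mechanical enough to script. Here is a minimal sketch. It assumes a hypothetical `NetworkEntry` shape in which `status: null` means the request started but never got a response. That shape is an illustration, not Playwright's trace format.

```typescript
// Hypothetical shape for one request copied from a trace or HAR export.
interface NetworkEntry {
  url: string;
  startMs: number;       // when the request started, relative to test start
  status: number | null; // null = request started, but no response ever arrived
}

interface Finding {
  url: string;
  reason: 'http-error' | 'no-response';
  startMs: number;
}

// Return failed or unanswered requests that started before the selector check, earliest first.
function suspiciousRequests(entries: NetworkEntry[], selectorMs: number): Finding[] {
  const findings: Finding[] = [];
  for (const e of entries) {
    if (e.startMs > selectorMs) continue; // started after the check, so it cannot explain it
    if (e.status === null) {
      findings.push({ url: e.url, reason: 'no-response', startMs: e.startMs });
    } else if (e.status >= 400) {
      findings.push({ url: e.url, reason: 'http-error', startMs: e.startMs }); // 4xx or 5xx
    }
  }
  return findings.sort((a, b) => a.startMs - b.startMs);
}

const entries: NetworkEntry[] = [
  { url: '/index.html', startMs: 0, status: 200 },
  { url: '/images/product.avif', startMs: 400, status: null },
  { url: '/api/product', startMs: 500, status: 200 },
];
console.log(suspiciousRequests(entries, 3100)); // one finding: the AVIF request, 'no-response'
```

An empty result means every request that started before the check succeeded, so the next place to look is the timeline.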

Step 2: Check the timeline, not just the error message

The order of events matters more than what the error says.

You want to know:

When did the page load finish?
When did the API call complete?
When did your selector check run?

If your selector ran at 2100ms and the API responded at 2200ms, that is a timing issue. The data arrived 100ms after your check ran, so the element was not there yet.

If the API never responded at all, that is an environment issue. Not your test.
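The rule in Step 2 fits in a few lines. This is a sketch with hypothetical names. It takes the two timestamps you read off the trace timeline and uses `null` when the API never responded.

```typescript
type TimingVerdict = 'timing' | 'environment' | 'check-selector';

// selectorMs: when the check ran; apiResponseMs: when the API answered (null = never).
function timingVerdict(selectorMs: number, apiResponseMs: number | null): TimingVerdict {
  if (apiResponseMs === null) return 'environment'; // API never answered: not your test
  if (selectorMs < apiResponseMs) return 'timing';  // check ran before the data arrived
  return 'check-selector';                          // data arrived in time, so look at the selector
}

console.log(timingVerdict(2100, 2200)); // timing (the check ran 100ms early)
console.log(timingVerdict(2100, null)); // environment
```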

Step 3: Classify the failure type before writing any fix

There are three kinds of CI failures and each needs a different response:

Real test failure
Same error, fails consistently on every run, every environment.
What you do: fix your test code.

Environment failure
Passes when running alone, fails under parallel load. Network timeouts, resource starvation.
What you do: fix the environment config or escalate to ops.

Timing failure
Passes once a wait is added, because the check was running before the element existed.
What you do: replace waitForTimeout with waitForSelector or waitForResponse.

When you add waitForTimeout to everything without classifying first, you cover up environment failures instead of surfacing them. The suite fills up with arbitrary waits and inflated timeouts. It gets slower, more fragile, and the real problems stay invisible.
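The classification can be written down as a decision over what your reruns showed. This is a minimal sketch, and `RunHistory` and its fields are hypothetical. It checks parallel load before waits because, as noted above, a wait can make an environment failure pass by coincidence.

```typescript
type FailureKind = 'real' | 'environment' | 'timing' | 'unknown';

// Hypothetical summary of what reruns of one failing test showed.
interface RunHistory {
  failsEveryRun: boolean;              // same error on every run, in every environment
  failsOnlyUnderParallelLoad: boolean; // e.g. passes with --workers=1, fails with the full suite
  passesWithExplicitWait: boolean;     // passes once it waits for the element or response
}

function classifyFailure(h: RunHistory): FailureKind {
  if (h.failsEveryRun) return 'real';
  // Check load before waits: a wait can hide an environment failure by coincidence.
  if (h.failsOnlyUnderParallelLoad) return 'environment';
  if (h.passesWithExplicitWait) return 'timing';
  return 'unknown'; // not enough evidence yet; go back to the trace
}

console.log(
  classifyFailure({ failsEveryRun: false, failsOnlyUnderParallelLoad: true, passesWithExplicitWait: true }),
); // environment
```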

Step 4: Replace static waits with explicit waits

Once you confirm it is a timing issue and not a network issue, fix it properly.

Replace:

```typescript
await page.waitForTimeout(2000); // this is a guess
```

With:

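```typescript
// Start waiting before the action that triggers the request, so a fast response is not missed
const productResponse = page.waitForResponse(
  resp => resp.url().includes('/api/product') && resp.status() === 200
);
await page.goto('/product'); // hypothetical page that calls /api/product
await productResponse;
```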

Or for a specific element:

```typescript
await page.waitForSelector('img.product-image', { state: 'visible' });
```

Static waits hide the real problem. waitForResponse and waitForSelector make timing explicit and debuggable.

See the Playwright docs for page.waitForResponse and page.waitForSelector.
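If several tests wait on the same endpoint, you can build the `waitForResponse` predicate once. This is a sketch. `ResponseLike` is a hypothetical structural type that asks only for the `url()` and `status()` methods that Playwright's `Response` has, so the helper can be unit-tested without a browser. The `responseFor` name and the `/api/product` path are illustrative.

```typescript
// Only the two methods the predicate needs; Playwright's Response provides both.
interface ResponseLike {
  url(): string;
  status(): number;
}

// Build a predicate that matches a response from the given API path with the expected status.
function responseFor(path: string, expectedStatus = 200): (resp: ResponseLike) => boolean {
  return resp => resp.url().includes(path) && resp.status() === expectedStatus;
}

const isProduct = responseFor('/api/product');
console.log(isProduct({ url: () => 'https://staging.example.com/api/product', status: () => 200 })); // true
```

In a test, you would call `const done = page.waitForResponse(responseFor('/api/product'));` before the action that triggers the request, then `await done;`.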

How Does the Image Optimization Problem Break CI in 2026?

This one trips up QA teams specifically because it looks exactly like a selector problem and the fix looks obvious but is completely wrong.

Your main branch started serving WebP and AVIF images for performance optimization. Your staging CDN config has not been updated in months. It does not support AVIF. Image requests either return 404 silently or hang until timeout.

Here is the actual sequence when your test hits CI:

0ms    -- Page starts loading
400ms  -- Image request fires (AVIF format from main branch)
2000ms -- Main page HTML content loads fine
3000ms -- Image CDN timeout threshold hit. No response from CDN.
3100ms -- Your selector checks for img.product-image
3200ms -- Selector fails. Image element never rendered

Report says: element not found: img.product-image
Screenshot shows: page looks loaded, image area is a blank placeholder
You think: wrong selector, or needs a wait
What actually happened: CDN does not support AVIF in staging, element never existed

Quick check to confirm this in your CI environment:

```shell
curl -I https://your-staging-cdn.com/images/product.avif
```

If that returns 404, your missing image is explained. Fix the CDN config. Do not touch your test.
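To compare every format at once, you can run the same check from a script. This is a minimal sketch. It assumes one file per extension under a hypothetical base URL, although many CDNs negotiate the format from the `Accept` header on a single URL instead. The request function is injected so that the logic can be tested with a fake.

```typescript
// Injected so the same check can use fetch in CI and a fake in tests.
type HeadRequest = (url: string) => Promise<{ status: number }>;

interface FormatResult {
  format: string;
  status: number | 'no-response';
}

// Request each format variant of one image; a thrown error (network failure or timeout) counts as 'no-response'.
async function checkImageFormats(
  baseUrl: string, // hypothetical, e.g. https://your-staging-cdn.com/images/product (no extension)
  formats: string[],
  head: HeadRequest,
): Promise<FormatResult[]> {
  return Promise.all(
    formats.map(async (format): Promise<FormatResult> => {
      try {
        const res = await head(`${baseUrl}.${format}`);
        return { format, status: res.status };
      } catch {
        return { format, status: 'no-response' };
      }
    }),
  );
}
```

On Node 18+ you could pass `url => fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(5000) })` as `head`. If `jpg` returns 200 while `avif` or `webp` comes back 404 or `no-response`, the problem is the CDN, not your selector.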

Why Does Parallel Execution Make This Worse?

Because parallelism is the default now, and most reports do not show it.

Playwright runs test files in parallel across worker processes by default, and teams commonly split the suite into shards across several GitHub Actions, GitLab CI, or CircleCI jobs. Configure 100 parallel workers on a runner with 2 CPUs and you have already created resource starvation.

What this creates:

CPU contention: Your test alone runs in under a second. With 99 others competing on a 2-CPU runner, it can take 3 to 4 seconds. Same test. Different resources. Nothing wrong with your code.

API queue buildup: Your endpoint responds in 150ms under normal load. Under 100 parallel requests, it might respond in 1800ms or time out entirely. Same API. Different load.

CDN cache miss storm: Locally, your runs are spaced out, so the CDN edge has usually cached the images already. In parallel CI, 100 tests hit the CDN simultaneously, often against cold caches. Misses fall through to the origin. Some time out.

A test that passes running alone but fails with 99 others is not a flaky test. It is a resource allocation problem.

Check this in your CI config:

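```yaml
# In your GitHub Actions workflow: each shard runs as its own job,
# and --workers caps how many test processes run at once on that job's runner
run: npx playwright test --shard=1/4 --workers=2
```

On a 2-CPU runner, 1 or 2 workers per shard is reasonable. 100 is not.

More parallel workers than your runner can support is the problem. More waits will not fix that.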
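To put a number on "more than your runner can support", here is a small sketch with hypothetical helper names. The half-the-cores rule roughly matches what the Playwright docs describe as the default when `workers` is unset. The oversubscription check is deliberately blunt.

```typescript
// Half the logical cores, never below one: roughly Playwright's documented default when `workers` is unset.
function suggestedWorkers(logicalCpus: number): number {
  return Math.max(1, Math.floor(logicalCpus / 2));
}

// True when more test processes are requested than the runner has cores to run them.
function isOversubscribed(workers: number, logicalCpus: number): boolean {
  return workers > logicalCpus;
}

console.log(suggestedWorkers(2));      // 1
console.log(isOversubscribed(100, 2)); // true
```

In `playwright.config.ts` you could set `workers: process.env.CI ? suggestedWorkers(os.cpus().length) : undefined` (with `import os from 'node:os'`). Recent Playwright versions also accept a percentage such as `workers: '50%'`.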

What Should a CI Failure Report Actually Show You?

Here is what good failure visibility looks like:

Visual state at failure:

Screenshot at the exact moment of failure
What was rendered and what was not
Was it a blank page, a partial load, or a fully loaded page with one missing element

Network state at failure:

Which requests completed with 200
Which returned 4xx or 5xx
Which timed out and at exactly what timestamp
Full sequence so you can see what failed first

Timing:

When did page load complete
When did each API call complete
When did your selector check run
Gap between last network event and selector check

Environment context:

How many tests were running at the same time
Whether resources were under pressure

When you have all four, you go from blind guessing to "image CDN request timed out at 3000ms, selector ran at 3100ms, element never rendered, not a test issue" in under two minutes.
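The four kinds of context can be combined into a one-line summary. This is a sketch over a hypothetical `FailureContext` shape, not any reporter's real data model. The verdict rule, that a request failed before the check ran, follows the order this post recommends.

```typescript
// Hypothetical report shape covering the four kinds of context listed above.
interface FailureContext {
  visual: 'blank' | 'partial' | 'loaded-missing-element';
  firstFailedRequest: { url: string; atMs: number; kind: 'timeout' | 'http-error' } | null;
  selectorMs: number;      // when the failing check ran
  concurrentTests: number; // tests running at the same moment
}

// Turn the four pieces of context into a one-line verdict.
function summarize(c: FailureContext, selector: string): string {
  const parts: string[] = [];
  const req = c.firstFailedRequest;
  if (req) {
    parts.push(`${req.url} ${req.kind === 'timeout' ? 'timed out' : 'failed'} at ${req.atMs}ms`);
  }
  parts.push(`${selector} checked at ${c.selectorMs}ms`);
  parts.push(`page: ${c.visual}`);
  parts.push(`${c.concurrentTests} tests in parallel`);
  const verdict =
    req !== null && req.atMs < c.selectorMs
      ? 'likely not a test issue'
      : 'check timing, then the selector';
  return `${parts.join(', ')} -> ${verdict}`;
}

console.log(
  summarize(
    {
      visual: 'loaded-missing-element',
      firstFailedRequest: { url: '/images/product.avif', atMs: 3000, kind: 'timeout' },
      selectorMs: 3100,
      concurrentTests: 100,
    },
    'img.product-image',
  ),
); // ends with "-> likely not a test issue"
```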

Without all four, you are doing forensics manually every time.

What Is the Complete Process to Follow on Every CI Failure?

Follow these in order. Do not skip to your test code first.

1. Open trace. Go to the Network tab.

Found a 404 or 5xx? Image or API infrastructure issue. Your test is not the problem. Check CDN config or escalate to ops.
Found a request with no response at all? If it is an image request, check that your CDN supports current image formats. If it is an API request, you are likely hitting a rate limit or queue issue under parallel load. Not your test.
All requests returned 200? Move to step 2.

2. Check the timing in the trace timeline.

Selector ran before the API completed? Use waitForResponse instead of waitForTimeout.
Selector ran after everything completed and element still not found? Now check your selector. This is the one case where your test code is actually the issue.

3. Check whether the failure is consistent or only happens under parallel load.

Fails consistently on every run? Real test bug. Fix the selector or assertion.
Only fails under parallel load? Resource starvation. Check your worker count against the runner's CPU count. Reduce workers, spread shards across more runners, or increase runner size.

Network first. Timing second. Selector third. That order is everything.

How Does Getting the Right Visibility Change Everything?

The QA teams that stopped spending hours on every CI failure did not find better selectors or learn to write better waits. They got the right information in front of them when a failure happened.

Once you can see network requests, timing, and visual state together in one report:

You stop adjusting selectors that were never wrong
You stop adding waits for problems that were never timing issues
You surface CDN and infra problems to the right people instead of absorbing them as test flakiness
You spend 2 minutes on a failure instead of 45

Your tests do not need to get better first. Your visibility does.

TL;DR

Blank screenshot in CI = environment problem until the network tab proves otherwise
Turn on tracing in playwright.config.ts (trace: 'on-first-retry' with retries enabled) or you are debugging blind
Check network requests before touching your selector
Check timing before adding a wait
WebP or AVIF image optimization on main branch causes "element not found" that looks like a selector issue but is a CDN config problem
100 parallel workers on a 2-CPU runner is resource starvation, not flaky tests
The fix order is: network first, timing second, test code last

Your test is probably fine. Your visibility is the gap.

You do your job right. You catch things. You dig in when everyone else moves on.

The problem is that most reports stop at "it failed" and leave you doing the detective work alone.

You follow the process, you surface what broke, you fix the right thing. We are here to make sure that process takes two minutes instead of two hours. The hard work is yours. The visibility is on us.

DEV Community