<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adnan G</title>
    <description>The latest articles on DEV Community by Adnan G (@sentinelqa).</description>
    <link>https://dev.to/sentinelqa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3641884%2F3a07e6b8-afb1-459c-97b0-d3b153bc3de1.png</url>
      <title>DEV Community: Adnan G</title>
      <link>https://dev.to/sentinelqa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sentinelqa"/>
    <language>en</language>
    <item>
      <title>I got tired of downloading Playwright artifacts from CI, so I changed the workflow</title>
      <dc:creator>Adnan G</dc:creator>
      <pubDate>Fri, 20 Mar 2026 21:20:22 +0000</pubDate>
      <link>https://dev.to/sentinelqa/i-got-tired-of-downloading-playwright-artifacts-from-ci-so-i-changed-the-workflow-6gf</link>
      <guid>https://dev.to/sentinelqa/i-got-tired-of-downloading-playwright-artifacts-from-ci-so-i-changed-the-workflow-6gf</guid>
      <description>&lt;h1&gt;
  
  
  I got tired of downloading Playwright artifacts from CI — so I changed the workflow
&lt;/h1&gt;

&lt;p&gt;Debugging Playwright failures in CI has always felt more manual than it should be.&lt;/p&gt;

&lt;p&gt;Not because the data isn’t there — it is.&lt;br&gt;&lt;br&gt;
But because it’s scattered.&lt;/p&gt;

&lt;p&gt;A typical failure for me looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open CI job
&lt;/li&gt;
&lt;li&gt;download artifacts
&lt;/li&gt;
&lt;li&gt;open trace viewer locally
&lt;/li&gt;
&lt;li&gt;check screenshots
&lt;/li&gt;
&lt;li&gt;scroll logs
&lt;/li&gt;
&lt;li&gt;try to line everything up
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works… but it’s slow, especially when multiple tests fail at once.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem
&lt;/h2&gt;

&lt;p&gt;The issue isn’t lack of data.&lt;/p&gt;

&lt;p&gt;It’s that there’s no &lt;strong&gt;single place&lt;/strong&gt; to understand what happened.&lt;/p&gt;

&lt;p&gt;Everything lives in separate files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traces
&lt;/li&gt;
&lt;li&gt;screenshots
&lt;/li&gt;
&lt;li&gt;logs
&lt;/li&gt;
&lt;li&gt;CI output
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So debugging turns into stitching together context manually.&lt;/p&gt;

&lt;p&gt;It gets worse with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel runs
&lt;/li&gt;
&lt;li&gt;flaky tests
&lt;/li&gt;
&lt;li&gt;multiple failures triggered by the same root cause
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point you’re not debugging — you’re reconstructing events.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I tried instead
&lt;/h2&gt;

&lt;p&gt;I wanted to answer one simple question faster:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What actually happened in this run?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I changed the workflow.&lt;/p&gt;

&lt;p&gt;Instead of downloading artifacts and inspecting things one by one,&lt;br&gt;&lt;br&gt;
I pushed everything from a run into a single view.&lt;/p&gt;

&lt;p&gt;That view shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;all failed tests across jobs
&lt;/li&gt;
&lt;li&gt;traces, screenshots, logs in one place
&lt;/li&gt;
&lt;li&gt;failures grouped if they look related
&lt;/li&gt;
&lt;li&gt;a short summary of what likely happened
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn’t to add more data — it was to remove the jumping between tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Instead of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open CI
&lt;/li&gt;
&lt;li&gt;download artifacts
&lt;/li&gt;
&lt;li&gt;open trace
&lt;/li&gt;
&lt;li&gt;go back to logs
&lt;/li&gt;
&lt;li&gt;repeat
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You just open one link and see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tests failed
&lt;/li&gt;
&lt;li&gt;whether they failed for the same reason
&lt;/li&gt;
&lt;li&gt;what the UI looked like at failure
&lt;/li&gt;
&lt;li&gt;what the logs say
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No downloading, no switching contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What improved
&lt;/h2&gt;

&lt;p&gt;Two things stood out immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Faster triage
&lt;/h3&gt;

&lt;p&gt;You can tell pretty quickly if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it’s one bug causing multiple failures
&lt;/li&gt;
&lt;li&gt;or a bunch of unrelated issues
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That alone saves a lot of time.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Less noise from flakiness
&lt;/h3&gt;

&lt;p&gt;Grouping similar failures makes it obvious whether:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple tests broke for the same reason
&lt;/li&gt;
&lt;li&gt;or a failure was just a random flake
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before that, everything just looked like chaos.&lt;/p&gt;




&lt;h2&gt;
  
  
  What still isn’t great
&lt;/h2&gt;

&lt;p&gt;This still feels like a workaround.&lt;/p&gt;

&lt;p&gt;The ecosystem gives you all the pieces,&lt;br&gt;&lt;br&gt;
but not a clean way to reason about failures at the run level.&lt;/p&gt;

&lt;p&gt;I’m curious how others are handling this today.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you rely mostly on trace viewer?
&lt;/li&gt;
&lt;li&gt;Do you download artifacts every time?
&lt;/li&gt;
&lt;li&gt;Any workflows that actually reduce debugging time?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  If you’re curious
&lt;/h2&gt;

&lt;p&gt;I open-sourced what I’ve been using here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/adnangradascevic/playwright-reporter" rel="noopener noreferrer"&gt;https://github.com/adnangradascevic/playwright-reporter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love feedback — especially if you’re dealing with a lot of CI failures.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>ci</category>
      <category>devops</category>
    </item>
    <item>
      <title>Debugging Playwright Failures in CI Is Still Painful - I Tried to Fix It</title>
      <dc:creator>Adnan G</dc:creator>
      <pubDate>Tue, 17 Mar 2026 19:25:52 +0000</pubDate>
      <link>https://dev.to/sentinelqa/debugging-playwright-failures-in-ci-is-still-painful-i-tried-to-fix-it-40g0</link>
      <guid>https://dev.to/sentinelqa/debugging-playwright-failures-in-ci-is-still-painful-i-tried-to-fix-it-40g0</guid>
      <description>&lt;h1&gt;
  
  
  Debugging Playwright Failures in CI Is Still Painful — I Tried to Fix It
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Playwright gives you everything you need to debug a failed test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traces
&lt;/li&gt;
&lt;li&gt;screenshots
&lt;/li&gt;
&lt;li&gt;videos
&lt;/li&gt;
&lt;li&gt;logs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In theory, debugging should be easy.&lt;/p&gt;

&lt;p&gt;In practice, it’s not.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually happens in CI
&lt;/h2&gt;

&lt;p&gt;When a test fails in CI, the workflow usually looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;download the trace
&lt;/li&gt;
&lt;li&gt;open screenshots
&lt;/li&gt;
&lt;li&gt;watch the video
&lt;/li&gt;
&lt;li&gt;scroll through logs
&lt;/li&gt;
&lt;li&gt;try to reconstruct what happened
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All the data is there.&lt;/p&gt;

&lt;p&gt;It’s just… scattered.&lt;/p&gt;

&lt;p&gt;And the more tests you run, the worse this gets.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real bottleneck
&lt;/h2&gt;

&lt;p&gt;It’s not writing tests.&lt;br&gt;&lt;br&gt;
It’s not even flaky tests.&lt;/p&gt;

&lt;p&gt;It’s this:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;figuring out why a test failed takes too long&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Especially when you’re dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel runs
&lt;/li&gt;
&lt;li&gt;multiple environments
&lt;/li&gt;
&lt;li&gt;CI pipelines
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I wanted instead
&lt;/h2&gt;

&lt;p&gt;I didn’t want more data.&lt;/p&gt;

&lt;p&gt;I wanted:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;everything about a failed test in one place&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  So I built a small open-source Playwright reporter
&lt;/h2&gt;

&lt;p&gt;It collects everything from a test run and puts it into a single report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traces
&lt;/li&gt;
&lt;li&gt;screenshots
&lt;/li&gt;
&lt;li&gt;videos
&lt;/li&gt;
&lt;li&gt;logs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No downloading artifacts.&lt;br&gt;&lt;br&gt;
No jumping between tools.&lt;/p&gt;

&lt;p&gt;Just one place to understand what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1yicc407jbpctnqygpx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1yicc407jbpctnqygpx.png" alt=" " width="800" height="870"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How it fits into a workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run tests (locally or in CI)
&lt;/li&gt;
&lt;li&gt;Reporter collects artifacts
&lt;/li&gt;
&lt;li&gt;Open one report → see everything
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it.&lt;/p&gt;
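&lt;p&gt;For reference, a custom reporter usually plugs into &lt;code&gt;playwright.config.ts&lt;/code&gt; alongside the built-in ones. A sketch only; the reporter path below is a placeholder, so check the repo README for the real package name:&lt;/p&gt;

```typescript
// playwright.config.ts (sketch). './my-reporter' is a placeholder path,
// not the actual package; see the project README for the real name.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['list'],          // keep the usual console output in CI
    ['./my-reporter'], // the custom reporter that collects artifacts
  ],
});
```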




&lt;h2&gt;
  
  
  Optional: cloud debugging
&lt;/h2&gt;

&lt;p&gt;If you’re running tests in CI, there’s also an option to upload runs to a cloud dashboard (Sentinel) so you can inspect failures without downloading artifacts.&lt;/p&gt;

&lt;p&gt;But the reporter itself works fully on its own.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I’m sharing this
&lt;/h2&gt;

&lt;p&gt;I kept running into this problem over and over again, and I’m curious if others are dealing with the same thing.&lt;/p&gt;

&lt;p&gt;How are you debugging Playwright failures in CI today?&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/adnangradascevic/playwright-reporter" rel="noopener noreferrer"&gt;https://github.com/adnangradascevic/playwright-reporter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>devops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Why Playwright Tests Pass Locally but Fail in CI</title>
      <dc:creator>Adnan G</dc:creator>
      <pubDate>Fri, 06 Mar 2026 17:00:03 +0000</pubDate>
      <link>https://dev.to/sentinelqa/why-playwright-tests-pass-locally-but-fail-in-ci-4ph6</link>
      <guid>https://dev.to/sentinelqa/why-playwright-tests-pass-locally-but-fail-in-ci-4ph6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6pmrp6yazlwaeiprq15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6pmrp6yazlwaeiprq15.png" alt="Why Playwright Tests Pass Locally but Fail in CI" width="800" height="533"&gt;&lt;/a&gt;A Playwright test that passes on your laptop but fails in CI is not behaving randomly. It is exposing a dependency your test already had.&lt;/p&gt;

&lt;p&gt;Most teams call this “CI flakiness” too early. That label is usually too vague to be useful. What is really happening is a mix of environmental mismatch and hidden assumptions. Your laptop is already warmed up. You may already be logged in. Your machine may be faster in the ways that matter for your app. You are probably running fewer tests at once. CI removes a lot of that comfort.&lt;/p&gt;

&lt;p&gt;That is why a test can look healthy during development and still break the moment it runs inside a container, on a hosted runner, or across multiple shards.&lt;/p&gt;

&lt;p&gt;The important mindset shift is simple:&lt;/p&gt;

&lt;p&gt;CI failures are rarely random. They expose assumptions your tests were already making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Playwright Tests Pass Locally but Fail in CI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CI machines behave differently
&lt;/h3&gt;

&lt;p&gt;CI runners are usually more constrained than developer machines. They often have less CPU, less memory, noisier disk I/O, and more background contention. Browser startup can take longer. Rendering can lag. Network timing changes. Animations and layout shifts may occur at different moments.&lt;/p&gt;

&lt;p&gt;That matters because timing-sensitive tests usually do not fail where they were written. They fail where the environment stops hiding the race.&lt;/p&gt;

&lt;p&gt;A common example is clicking a button immediately after navigation because it always works locally. In CI, the page may still be loading data, the button may still be disabled, or a loading overlay may briefly cover the element.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel execution changes behavior
&lt;/h3&gt;

&lt;p&gt;Playwright runs tests in parallel workers by default, and CI usually pushes harder on parallelism than local development does.&lt;/p&gt;

&lt;p&gt;That changes the system around the tests. Shared accounts become a problem. Database fixtures collide. Two tests update the same entity. Temporary files get overwritten. API rate limits appear. Test order starts to matter when it should not.&lt;/p&gt;

&lt;p&gt;Locally, you might run one spec file. In CI, the full suite may run across workers and shards. Same code, very different pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local state hides dependencies
&lt;/h3&gt;

&lt;p&gt;Your laptop often has invisible advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cached authentication&lt;/li&gt;
&lt;li&gt;existing cookies or local storage&lt;/li&gt;
&lt;li&gt;seeded test data&lt;/li&gt;
&lt;li&gt;environment variables loaded in your shell&lt;/li&gt;
&lt;li&gt;already-installed browser dependencies&lt;/li&gt;
&lt;li&gt;slightly different Node, OS, or browser versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CI starts clean. That is not a drawback. It is often the first environment that tells the truth.&lt;/p&gt;
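&lt;p&gt;One cheap way to surface those hidden dependencies is to fail fast on missing configuration, so CI reports a clear setup error instead of a confusing timeout later. A minimal sketch; the variable name is just an example:&lt;/p&gt;

```typescript
// Fail fast when an env var the tests depend on is missing, so CI
// reports a clear configuration error instead of a flaky-looking timeout.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error('Missing required environment variable: ' + name);
  }
  return value;
}

// Example: resolve the app URL explicitly instead of assuming a local
// default is always there. BASE_URL is a hypothetical variable name.
const baseURL = process.env.BASE_URL ?? 'http://localhost:3000';
```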

&lt;h3&gt;
  
  
  Headless execution reveals weak synchronization
&lt;/h3&gt;

&lt;p&gt;Another common pattern is a test that passes when you watch it but fails when it runs unattended.&lt;/p&gt;

&lt;p&gt;That usually means the test is benefiting from the extra delay introduced by headed mode, debug mode, or step-by-step local investigation. The interaction only works because your local workflow slows the system down enough to avoid the race.&lt;/p&gt;

&lt;p&gt;CI runs headless and moves quickly. If your test clicks too early, asserts too early, or depends on a transient DOM state, CI is where that weakness shows up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recognizable Symptoms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The same test fails in different places
&lt;/h3&gt;

&lt;p&gt;This is one of the clearest signs of unstable synchronization.&lt;/p&gt;

&lt;p&gt;One run times out waiting for a click. Another fails on an assertion. Another says the element detached from the DOM. Another times out during navigation.&lt;/p&gt;

&lt;p&gt;Different symptoms, same root issue: the test is racing the application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failures appear only under full suite load
&lt;/h3&gt;

&lt;p&gt;If a test passes alone but fails during the full suite, look closely at concurrency and shared state.&lt;/p&gt;

&lt;p&gt;Typical patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it passes in local UI mode&lt;/li&gt;
&lt;li&gt;it passes when run alone&lt;/li&gt;
&lt;li&gt;it passes with &lt;code&gt;--workers=1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;it fails in CI under normal parallelism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is usually not mysterious flakiness. It is interference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retries make the pipeline green but confidence worse
&lt;/h3&gt;

&lt;p&gt;Retries are useful, but only if you treat them as a signal.&lt;/p&gt;

&lt;p&gt;A test that fails once and passes on retry is not healthy. It is unstable. Green builds created by retries can hide a growing reliability problem until the suite becomes noisy enough that engineers stop trusting it.&lt;/p&gt;

&lt;p&gt;Retries help classify failure patterns. They do not fix the underlying cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots are not enough
&lt;/h3&gt;

&lt;p&gt;A screenshot shows you one frame near the point of failure. That can be useful, but it rarely explains why the failure happened.&lt;/p&gt;

&lt;p&gt;For CI debugging, you usually need more than a still image. You need sequence and context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what action happened before the failure&lt;/li&gt;
&lt;li&gt;what the DOM looked like at that point&lt;/li&gt;
&lt;li&gt;whether the page was still loading&lt;/li&gt;
&lt;li&gt;whether a request failed&lt;/li&gt;
&lt;li&gt;whether a modal, toast, or overlay appeared&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why traces are usually more valuable than screenshots alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Debug the Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reproduce under CI-like conditions first
&lt;/h3&gt;

&lt;p&gt;The first mistake is often reproducing the issue in a slower, more forgiving local mode.&lt;/p&gt;

&lt;p&gt;Start by making the local run behave more like CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 npx playwright &lt;span class="nb"&gt;test &lt;/span&gt;tests/checkout.spec.ts &lt;span class="nt"&gt;--workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4 &lt;span class="nt"&gt;--retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reduce variables gradually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 npx playwright &lt;span class="nb"&gt;test &lt;/span&gt;tests/checkout.spec.ts &lt;span class="nt"&gt;--workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the failure disappears with one worker, investigate shared state, ordering assumptions, and data isolation.&lt;/p&gt;

&lt;p&gt;If it still fails, look at timing, locators, network dependencies, and environment drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Turn on traces for failing runs
&lt;/h3&gt;

&lt;p&gt;Traces are the fastest way to understand most CI failures because they preserve the timeline.&lt;/p&gt;

&lt;p&gt;They typically let you inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;each action the test took&lt;/li&gt;
&lt;li&gt;DOM snapshots around each step&lt;/li&gt;
&lt;li&gt;network activity&lt;/li&gt;
&lt;li&gt;console output&lt;/li&gt;
&lt;li&gt;screenshots captured through the run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical Playwright config looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;use&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;on-first-retry&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;only-on-failure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retain-on-failure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you useful evidence without tracing every single passing test.&lt;/p&gt;

&lt;p&gt;To inspect a saved trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright show-trace trace.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Compare the failing action with the previous stable action
&lt;/h3&gt;

&lt;p&gt;When debugging a trace, do not stare only at the final stack trace.&lt;/p&gt;

&lt;p&gt;Instead, reconstruct the sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What was the last clearly successful action?&lt;/li&gt;
&lt;li&gt;What changed between that step and the failure?&lt;/li&gt;
&lt;li&gt;Did the DOM update?&lt;/li&gt;
&lt;li&gt;Did the page navigate?&lt;/li&gt;
&lt;li&gt;Did a request arrive late or fail?&lt;/li&gt;
&lt;li&gt;Did some UI element block the next action?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is how reliable debugging works. You are building a timeline, not just reading an error string.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check environment drift explicitly
&lt;/h3&gt;

&lt;p&gt;A surprising number of CI issues come from local and CI environments not actually matching.&lt;/p&gt;

&lt;p&gt;Verify the basics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node version&lt;/li&gt;
&lt;li&gt;Playwright version&lt;/li&gt;
&lt;li&gt;browser version&lt;/li&gt;
&lt;li&gt;OS or container image&lt;/li&gt;
&lt;li&gt;timezone and locale&lt;/li&gt;
&lt;li&gt;environment variables&lt;/li&gt;
&lt;li&gt;backend endpoints&lt;/li&gt;
&lt;li&gt;test data setup&lt;/li&gt;
&lt;/ul&gt;
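<p>&lt;p&gt;A small script that prints the same snapshot locally and in CI turns drift from a guess into a diff. A sketch using only Node built-ins:&lt;/p&gt;</p>

```typescript
// Print an environment snapshot; run it locally and in CI, then diff.
import os from 'node:os';

function environmentSnapshot(): { [key: string]: string } {
  return {
    node: process.version,
    platform: os.platform() + ' ' + os.release(),
    arch: process.arch,
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    locale: Intl.DateTimeFormat().resolvedOptions().locale,
  };
}

console.log(JSON.stringify(environmentSnapshot(), null, 2));
```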

&lt;p&gt;Also make sure CI installs the browser dependencies correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--with-deps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small differences can create failures that look random until you line the environments up properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using fixed sleeps
&lt;/h3&gt;

&lt;p&gt;This is still one of the most common causes of weak Playwright tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[data-test=submit]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is fragile on slow runs and wasteful on fast ones. On a slow CI worker, two seconds may not be enough. On a fast run, it wastes time without increasing confidence.&lt;/p&gt;

&lt;p&gt;Prefer waiting for a meaningful condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Submit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nf"&gt;toBeEnabled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Submit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That ties the wait to application readiness instead of guessing at timing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing selectors that match by accident
&lt;/h3&gt;

&lt;p&gt;A selector can appear stable locally and still be fundamentally weak.&lt;/p&gt;

&lt;p&gt;That often happens when the selector depends on CSS structure, text that appears in multiple places, or elements that exist only briefly during loading.&lt;/p&gt;

&lt;p&gt;Prefer resilient locators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Checkout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByLabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByTestId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;submit-order&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stable locators reduce the chance that timing changes will cause the test to hit the wrong element.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharing accounts and mutable data across workers
&lt;/h3&gt;

&lt;p&gt;If multiple tests use the same login, mutate the same cart, or update the same record, parallel workers will eventually collide.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one test deletes data another test needs&lt;/li&gt;
&lt;li&gt;two workers update the same profile&lt;/li&gt;
&lt;li&gt;multiple tests create orders under one account&lt;/li&gt;
&lt;li&gt;shared setup leaves the system in an unexpected state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Isolation must include test data, not just browser context.&lt;/p&gt;
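&lt;p&gt;One common fix is to derive unique test data from the worker index, so parallel workers never touch the same records. A sketch; in Playwright the index is available as &lt;code&gt;testInfo.parallelIndex&lt;/code&gt;, here it is just a parameter:&lt;/p&gt;

```typescript
// Derive collision-free test data from the worker index plus a per-call
// counter, so parallel workers never share accounts or records.
let counter = 0;

function uniqueEmail(workerIndex: number): string {
  counter += 1;
  return 'user-w' + workerIndex + '-' + Date.now() + '-' + counter + '@example.test';
}
```

&lt;p&gt;Appending a timestamp and a counter keeps the data unique across retries of the same worker as well.&lt;/p&gt;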

&lt;h3&gt;
  
  
  Debugging only with video
&lt;/h3&gt;

&lt;p&gt;Video is useful for showing a flow. It is much less useful for explaining a failure.&lt;/p&gt;

&lt;p&gt;A video does not tell you the full DOM state at each action. It does not show the structured test timeline. It does not explain which network request failed or which locator matched.&lt;/p&gt;

&lt;p&gt;That is why video is helpful context, but trace is usually the stronger debugging artifact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Better Debugging Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Collect artifacts by default
&lt;/h3&gt;

&lt;p&gt;Do not wait for a severe incident before adding artifact retention to the pipeline.&lt;/p&gt;

&lt;p&gt;A strong CI workflow keeps the evidence needed to debug failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traces&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;videos&lt;/li&gt;
&lt;li&gt;console logs&lt;/li&gt;
&lt;li&gt;HTML reports&lt;/li&gt;
&lt;li&gt;any relevant backend or network logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a tool like &lt;a href="https://sentinelqa.com/" rel="noopener noreferrer"&gt;SentinelQA&lt;/a&gt; helps. Not because it magically fixes flakiness, but because it aggregates the artifacts engineers already need when Playwright failures happen in CI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Classify failures before trying to fix them
&lt;/h3&gt;

&lt;p&gt;Not every red build is the same kind of problem.&lt;/p&gt;

&lt;p&gt;A useful first pass is to classify each failure into one of these buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;synchronization bug&lt;/li&gt;
&lt;li&gt;selector bug&lt;/li&gt;
&lt;li&gt;shared state bug&lt;/li&gt;
&lt;li&gt;test data issue&lt;/li&gt;
&lt;li&gt;infrastructure or dependency issue&lt;/li&gt;
&lt;li&gt;genuine product regression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That prevents teams from using “flaky” as a catch-all label for everything.&lt;/p&gt;
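&lt;p&gt;Even a lightweight tally makes that first pass concrete. A sketch where the bucket names mirror the list above:&lt;/p&gt;

```typescript
// Tally failures by bucket so 'flaky' stops being the default label.
type Bucket =
  'synchronization' | 'selector' | 'shared-state' |
  'test-data' | 'infrastructure' | 'regression';

function tally(failures: { test: string; bucket: Bucket }[]): { [b: string]: number } {
  const counts: { [b: string]: number } = {};
  for (const f of failures) {
    counts[f.bucket] = (counts[f.bucket] ?? 0) + 1;
  }
  return counts;
}
```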

&lt;h3&gt;
  
  
  Reproduce with the smallest meaningful scope
&lt;/h3&gt;

&lt;p&gt;Rerunning the whole pipeline again and again usually wastes time.&lt;/p&gt;

&lt;p&gt;Instead, narrow the failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 npx playwright &lt;span class="nb"&gt;test &lt;/span&gt;tests/checkout.spec.ts &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;chromium &lt;span class="nt"&gt;--workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then inspect the report:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright show-report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And inspect the trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright show-trace trace.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tight loop makes debugging much faster than repeatedly waiting for full-suite reruns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Make CI behavior explicit in config
&lt;/h3&gt;

&lt;p&gt;Many teams get better reliability simply by making CI-specific behavior intentional instead of accidental.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;forbidOnly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reporter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;line&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
  &lt;span class="na"&gt;use&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;on-first-retry&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;only-on-failure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retain-on-failure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not that every suite should use these exact values. The point is that CI settings should be deliberate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test the real contract, not the animation
&lt;/h3&gt;

&lt;p&gt;A spinner disappearing does not always mean the page is ready. A navigation event finishing does not always mean the right data is rendered.&lt;/p&gt;

&lt;p&gt;Wait for what the user actually depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a button becomes enabled&lt;/li&gt;
&lt;li&gt;a row appears with expected data&lt;/li&gt;
&lt;li&gt;a confirmation message is visible&lt;/li&gt;
&lt;li&gt;the new URL is loaded and a key element is rendered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tests become more stable when they assert outcomes instead of transitions.&lt;/p&gt;
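&lt;p&gt;In Playwright terms, that means web-first assertions on user-visible outcomes. A sketch (the route, selectors, and copy are placeholders for your own app, not a real suite):&lt;/p&gt;

```typescript
import { test, expect } from '@playwright/test';

// Sketch only: '/checkout', the role names, and the test id below are
// placeholders for your own app.
test('order confirmation is actually rendered', async ({ page }) => {
  await page.goto('/checkout');

  // Outcome 1: the action is possible, not just that a spinner vanished.
  const placeOrder = page.getByRole('button', { name: 'Place order' });
  await expect(placeOrder).toBeEnabled();
  await placeOrder.click();

  // Outcome 2: the navigation landed AND a key element rendered with data.
  await expect(page).toHaveURL(/\/orders\/\w+/);
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
  await expect(page.getByTestId('order-total')).toContainText('$');
});
```

&lt;p&gt;Each &lt;code&gt;expect&lt;/code&gt; here auto-retries until its timeout, so the test waits exactly as long as the outcome needs and no longer.&lt;/p&gt;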

&lt;h3&gt;
  
  
  Compare single-worker and multi-worker behavior
&lt;/h3&gt;

&lt;p&gt;A fast way to identify concurrency issues is to run the same test under different worker settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright &lt;span class="nb"&gt;test &lt;/span&gt;tests/account.spec.ts &lt;span class="nt"&gt;--workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
npx playwright &lt;span class="nb"&gt;test &lt;/span&gt;tests/account.spec.ts &lt;span class="nt"&gt;--workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the failure appears only under higher concurrency, you have already learned something important: the issue is not random. Inspect test isolation, starting with shared accounts, shared fixtures, and shared backend state.&lt;/p&gt;
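&lt;p&gt;If isolation turns out to be the problem, the usual fix is to stop sharing mutable data between workers. One common pattern is to derive test data from the worker index (&lt;code&gt;test.info().workerIndex&lt;/code&gt; inside a spec); the helper below is plain TypeScript so the idea stands on its own:&lt;/p&gt;

```typescript
// Sketch: per-worker test data so parallel workers never fight over one account.
// In a Playwright spec you would pass test.info().workerIndex into this helper.
function uniqueEmail(base: string, workerIndex: number): string {
  const [local, domain] = base.split('@');
  // Plus-addressing keeps the mailbox real while making the value unique.
  return `${local}+w${workerIndex}@${domain}`;
}

console.log(uniqueEmail('qa@example.com', 3)); // qa+w3@example.com
```

&lt;p&gt;The same trick works for usernames, tenant names, or any record the test creates and mutates.&lt;/p&gt;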

&lt;h3&gt;
  
  
  Keep artifacts easy to access
&lt;/h3&gt;

&lt;p&gt;The biggest productivity gain in CI debugging is usually not more reruns. It is faster access to evidence.&lt;/p&gt;

&lt;p&gt;If engineers have to download separate files from different CI tabs and reconstruct the timeline manually, debugging stays slow. If traces, logs, screenshots, and reports are easy to inspect in one place, root cause analysis becomes much faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When Playwright tests pass locally but fail in CI, the problem is usually not that CI is unreliable.&lt;/p&gt;

&lt;p&gt;CI is stricter. It removes local conveniences. It starts from a cleaner environment. It adds concurrency. It exposes weak synchronization, hidden state, brittle selectors, and environment drift.&lt;/p&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;The wrong response is to add sleeps and hope retries keep the pipeline green. The right response is to reproduce under CI-like conditions, capture the right artifacts, inspect traces, and design tests that stay correct when the environment stops helping them.&lt;/p&gt;

&lt;p&gt;Once you do that, CI stops feeling random.&lt;/p&gt;

&lt;p&gt;It starts behaving like what it really is: the most honest test environment you have.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>devops</category>
      <category>automation</category>
    </item>
    <item>
      <title>Debugging Playwright failures in CI is harder than it should be</title>
      <dc:creator>Adnan G</dc:creator>
      <pubDate>Thu, 05 Mar 2026 05:58:19 +0000</pubDate>
      <link>https://dev.to/sentinelqa/debugging-playwright-failures-in-ci-is-harder-than-it-should-be-42bc</link>
      <guid>https://dev.to/sentinelqa/debugging-playwright-failures-in-ci-is-harder-than-it-should-be-42bc</guid>
      <description>&lt;p&gt;If you run Playwright tests locally, debugging failures is usually straightforward.&lt;/p&gt;

&lt;p&gt;You run the test, open the trace viewer, inspect the DOM, and quickly figure out what went wrong.&lt;/p&gt;

&lt;p&gt;But once Playwright runs inside CI, things start to get messy.&lt;/p&gt;

&lt;p&gt;A typical failure workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A test fails in GitHub Actions / GitLab CI&lt;/li&gt;
&lt;li&gt;You open the job logs&lt;/li&gt;
&lt;li&gt;You download the artifacts&lt;/li&gt;
&lt;li&gt;You unzip the HTML report&lt;/li&gt;
&lt;li&gt;You open the trace locally&lt;/li&gt;
&lt;li&gt;You repeat this for other jobs if the suite runs in parallel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point debugging the test sometimes takes longer than fixing the issue.&lt;/p&gt;

&lt;p&gt;The bigger the suite gets, the worse this becomes.&lt;/p&gt;

&lt;p&gt;If tests run across 10 to 20 CI jobs (or shards), understanding what actually happened requires digging through traces, logs and screenshots across multiple artifacts.&lt;/p&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;p&gt;The slow part of Playwright debugging in CI is not root cause analysis.&lt;/p&gt;

&lt;p&gt;It's reconstructing the failure context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What usually helps
&lt;/h2&gt;

&lt;p&gt;A few things make CI debugging easier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;code&gt;trace: "on-first-retry"&lt;/code&gt; or &lt;code&gt;trace: "retain-on-failure"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Save screenshots and videos on failure&lt;/li&gt;
&lt;li&gt;Upload artifacts from CI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are essential, but they still leave you with scattered artifacts that must be downloaded locally.&lt;/p&gt;
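&lt;p&gt;The upload step itself is mechanical. On GitHub Actions it is usually a variant of the step from the standard Playwright CI recipe (the path assumes the default &lt;code&gt;playwright-report&lt;/code&gt; output directory):&lt;/p&gt;

```yaml
- name: Upload Playwright report
  uses: actions/upload-artifact@v4
  # Upload on failure too, but skip cancelled runs.
  if: ${{ !cancelled() }}
  with:
    name: playwright-report
    path: playwright-report/
    retention-days: 30
```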

&lt;h2&gt;
  
  
  A different approach
&lt;/h2&gt;

&lt;p&gt;Instead of downloading artifacts, we started reconstructing the entire CI run into a single debugging view.&lt;/p&gt;

&lt;p&gt;This way you can open a failed test and immediately inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trace&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;video&lt;/li&gt;
&lt;li&gt;CI metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;without downloading anything.&lt;/p&gt;

&lt;p&gt;Here is a simple demo of what that looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sentinelqa.com/demo" rel="noopener noreferrer"&gt;https://sentinelqa.com/demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious how other teams debug Playwright failures in CI once their test suites start running across multiple jobs.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>devops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Introducing SentinelQA | AI-powered test intelligence for CI pipelines</title>
      <dc:creator>Adnan G</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:29:25 +0000</pubDate>
      <link>https://dev.to/sentinelqa/introducing-sentinelqa-ai-powered-test-intelligence-for-ci-pipelines-5acb</link>
      <guid>https://dev.to/sentinelqa/introducing-sentinelqa-ai-powered-test-intelligence-for-ci-pipelines-5acb</guid>
      <description>&lt;p&gt;CI failures are painful to debug. SentinelQA gives you run summaries, flaky test detection, regression analysis, visual diffs and AI-generated action items.&lt;/p&gt;

&lt;p&gt;Full launch details + free plan here:&lt;br&gt;
&lt;a href="https://sentinelqa.com/" rel="noopener noreferrer"&gt;https://sentinelqa.com/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>qa</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
