TestDino

Posted on Feb 24

Playwright CI/CD Integrations: GitHub Actions, Jenkins, and GitLab CI - What Actually Works

#playwright #cicd #github #jenkins

Running Playwright tests locally is easy. Running them automatically on every commit, across branches, with proper reporting? That's where most teams run into real problems.

This article walks through setting up Playwright in the three most popular CI systems: GitHub Actions, Jenkins, and GitLab CI, with working config examples, a side-by-side comparison, and practical tips for scaling Playwright test automation in CI.

Why CI/CD Matters for Playwright Teams

Without CI, someone has to manually run npx playwright test before every merge. That works with 10 tests. It falls apart completely at 200.
CI/CD solves this by giving your Playwright suite:

Automatic triggers - Tests run on every push or pull request without anyone remembering to run them
Consistent environments - No more "works on my machine." Same OS, same browser versions, every time
Fast feedback - Developers see pass/fail in the PR within minutes, not after merging
Artifact collection - Screenshots, traces, and videos saved automatically when tests fail
Parallel execution - Split your suite across machines to cut 30-minute runs down to under 5 minutes

The Three Platforms at a Glance

All three platforms run Playwright the same way under the hood: install Node, install browsers, run npx playwright test, collect artifacts. The difference is in configuration and infrastructure management.

1. GitHub Actions is the fastest to set up if your code lives on GitHub. YAML config, managed cloud runners, native PR check integration. Playwright's own documentation uses GitHub Actions as the primary CI example.

2. Jenkins gives you full infrastructure control, which is useful for regulated environments and teams behind corporate firewalls, but you maintain the server, agents, and browser dependencies yourself.

3. GitLab CI is the natural choice for GitLab teams. Native Docker support, a built-in parallel keyword that makes Playwright sharding dead simple, and artifacts tied directly to merge requests.

Core Setup: What Every Platform Needs

The steps are identical across all three:

Check out the code
Install Node.js and project dependencies
Install Playwright browsers and OS-level system libraries
Run npx playwright test
Upload artifacts: reports, screenshots, traces

The key insight for browser dependencies: Playwright needs system libraries like libgtk, NSS, and ALSA that most default CI images don't include. Using Playwright's official Docker image (mcr.microsoft.com/playwright:v1.52.0-noble) is the recommended approach regardless of which CI platform you use. It ships with all browsers and system dependencies pre-installed. Keep the image tag in step with the Playwright version in your package.json so the test runner finds the browser builds it expects.

Without Docker, you run npx playwright install --with-deps and manually manage system dependencies on each runner, which works but creates long-term maintenance headaches.

Playwright Config: Make It CI-Aware

Before writing any pipeline config, update your playwright.config.ts to detect CI automatically:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  forbidOnly: !!process.env.CI,   // blocks .only() from sneaking into CI
  retries: process.env.CI ? 2 : 0, // catches flaky failures without manual reruns
  workers: process.env.CI ? 1 : undefined,
  reporter: [['html'], ['junit', { outputFile: 'test-results/results.xml' }]],
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});

Start with workers: 1 in CI. Once your suite is stable, scale up. Going straight to high parallelism often introduces flakiness that wastes more time than it saves.
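
One way to scale up gradually is to read the worker count from an environment variable, so each pipeline can raise it without a code change. The sketch below assumes a hypothetical PW_WORKERS variable (an illustrative name, not a Playwright built-in) and falls back to a single worker in CI when it is unset or unusable.

```typescript
// Hypothetical helper: PW_WORKERS is an assumed variable name, not a Playwright setting.
function resolveWorkers(env: Record<string, string | undefined>): number | undefined {
  // Outside CI, return undefined so Playwright picks its default worker count.
  if (!env.CI) return undefined;

  const requested = Number.parseInt(env.PW_WORKERS ?? '', 10);
  // Fall back to a single worker when the value cannot be parsed as a positive integer.
  return Number.isInteger(requested) && requested > 0 ? requested : 1;
}

// In playwright.config.ts this would be used as: workers: resolveWorkers(process.env)
```

Keeping the default at 1 means a pipeline only gets more parallelism when someone opts in explicitly, which matches the "stabilize first, then scale" advice.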

Running Playwright Tests in Parallel Across CI

Serial execution works for small suites. Once you hit 50+ tests, it becomes a bottleneck.

Workers run multiple tests on the same machine. Sharding distributes tests across multiple CI machines entirely. For suites over 10 minutes, sharding is the right move.

GitHub Actions uses a matrix strategy where you define shards like 1/4, 2/4, 3/4, 4/4
GitLab CI uses the native parallel keyword, arguably the cleanest sharding implementation of the three
Jenkins uses parallel stages in the Jenkinsfile

After sharding, use Playwright's merge-reports command to combine blob reports from each shard into a single HTML report. Without this step, you end up with fragmented results across machines.
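
To make the sharding step concrete, here is a minimal sketch of turning a CI job's position into Playwright's --shard argument. It assumes the 1-based index and total arrive as CI_NODE_INDEX and CI_NODE_TOTAL, which is what GitLab CI sets for parallel jobs; on GitHub Actions or Jenkins you would pass equivalent values yourself. The shardArg helper name is hypothetical.

```typescript
// Hypothetical helper: builds the --shard flag from CI-provided values.
// CI_NODE_INDEX / CI_NODE_TOTAL are the variables GitLab CI sets for `parallel` jobs;
// other platforms would need to supply equivalents.
function shardArg(env: Record<string, string | undefined>): string | null {
  const index = Number.parseInt(env.CI_NODE_INDEX ?? '', 10);
  const total = Number.parseInt(env.CI_NODE_TOTAL ?? '', 10);

  // No usable shard info: run the whole suite on this machine.
  if (!Number.isInteger(index) || !Number.isInteger(total)) return null;

  // Reject impossible combinations such as 5/4 or 0/4.
  if (total < 1 || index < 1 || index > total) {
    throw new Error(`Invalid shard ${index}/${total}`);
  }
  return `--shard=${index}/${total}`;
}

// shardArg({ CI_NODE_INDEX: '2', CI_NODE_TOTAL: '4' }) returns '--shard=2/4'
```

Each shard would then run something like npx playwright test --shard=2/4 --reporter=blob, and a final job would run npx playwright merge-reports --reporter html ./all-blob-reports over the collected blob output (the directory name here is an assumption).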

Tracking Playwright CI Failures at Scale

Setting up CI is the easy part. The hard part is figuring out what went wrong when tests fail across hundreds of tests, multiple branches, and different environments.

Default Playwright CI output (pass/fail counts and error logs) is fine for 20 tests. As your suite grows, you hit real walls:

- Noisy logs: The actual failure is buried in hundreds of lines of output
- Disappearing artifacts: CI platforms expire them after a retention period, so historical comparison is gone
- Hidden flaky tests: A test failing 10% of the time doesn't look "broken." It just adds confusion
- No cross-run context: Is this failure new? Did it happen on main too? One CI run cannot tell you

Good CI reporting should instantly answer four questions: Which tests failed and why? Is this a new failure or a known flaky test? Which branch and commit introduced it? Is this failure environment-specific?
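
As an illustration of the cross-run context those questions require, the sketch below classifies a failing test by looking at its earlier runs. The RunRecord shape, the classifyFailure name, and the three labels are all hypothetical; a real setup would fill the history from stored JUnit or JSON reports.

```typescript
// Hypothetical record of one test's outcome in one earlier CI run.
interface RunRecord {
  branch: string;
  commit: string;
  passed: boolean;
}

type FailureKind = 'new-failure' | 'known-flaky' | 'consistently-failing';

// Classify the latest failure of a test based on its earlier runs.
function classifyFailure(history: RunRecord[]): FailureKind {
  const passes = history.filter((r) => r.passed).length;
  const failures = history.length - passes;

  // Never failed before (or no history at all): treat as newly introduced.
  if (failures === 0) return 'new-failure';
  // A mix of passes and failures in the past suggests flakiness.
  if (passes > 0) return 'known-flaky';
  // Every earlier run failed too.
  return 'consistently-failing';
}
```

This is deliberately crude: it ignores ordering, so a test that passed for weeks and then failed twice in a row still reads as flaky. Filtering the history by branch before calling it is one way to answer "did it happen on main too?".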

For teams running Playwright across multiple CI providers, centralized reporting with AI failure classification, marking failures as Bug, Flaky, or UI Change, means you know where to start before you even open a trace.

The Non-Negotiable Best Practices

For speed: Cache node_modules and Playwright browsers (browser downloads are 200–400MB). Use Playwright's Docker image to skip browser install entirely. Shard when serial time exceeds 10 minutes.

For stability: Pin browser versions with a specific Docker tag, not latest. Set forbidOnly: true in CI. Keep retries at 2; going above 3 hides real problems.

For debugging: Always upload artifacts with if: always() (GitHub Actions) or when: always (GitLab CI). Enable traces on first retry: on-first-retry records a full trace only when a failed test is retried for the first time, balancing data quality with storage costs.

For reporting: Track flaky test rates. Above 2% means trust in CI is eroding. Creating a Jira or Linear ticket from a CI failure should take one click, not five minutes of copy-pasting.
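
A flaky rate is simple to compute once you have per-attempt results. The sketch below assumes a hypothetical TestAttempts shape (one boolean per attempt) and counts a test as flaky when it failed at least once but passed on its final attempt, which mirrors how a retried-then-passing test is usually treated. The names are illustrative, not a Playwright API.

```typescript
// Hypothetical per-test summary: one boolean per attempt, true = passed.
interface TestAttempts {
  title: string;
  attempts: boolean[];
}

// A test counts as flaky here if it failed at least once but passed on its final attempt.
function isFlaky(t: TestAttempts): boolean {
  const last = t.attempts[t.attempts.length - 1];
  return last === true && t.attempts.some((passed) => !passed);
}

// Fraction of tests in a run that were flaky (0 when the run is empty).
function flakyRate(tests: TestAttempts[]): number {
  if (tests.length === 0) return 0;
  return tests.filter(isFlaky).length / tests.length;
}

// The default 2% budget comes from the guideline above.
function exceedsFlakyBudget(tests: TestAttempts[], budget = 0.02): boolean {
  return flakyRate(tests) > budget;
}
```

Tracking this number per run, rather than per test, keeps the signal cheap to collect; per-test history is what you reach for once the rate crosses the budget.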

Bottom Line

Pick the platform that matches your source control, add a config file, and your tests run on every push. The real challenge is at scale.

Get the pipeline right first: Docker images, proper artifact collection, smart Playwright test automation. Then invest in reporting that gives your team fast, clear answers instead of log scrolling.

Here is the full guide with working YAML for GitHub Actions, Jenkins, and GitLab CI.
