Everyone seems to have a love/hate relationship with Atlassian products. I've only really worked at "Atlassian shops" my entire career: Jira, Confluence, Bitbucket, StatusPage. It's nice to have everything in "one place", but it seems like someone is always "fighting" a limitation of their products. Can't get Jira to do the thing? I guess it's Excel again. Can't get Bitbucket to work with Playwright Test sharding? You've come to the right place.
So what is sharding? The concept is pretty simple and the execution even simpler. The command line pretty much looks like this:
npx playwright test --shard=1/3
Then you do the same for shard 2 of 3, and 3 of 3. Ideally, each command runs on its own machine/Docker container and is assigned its own little subset of tests.
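Spelled out, the full three-way split is just:

npx playwright test --shard=1/3
npx playwright test --shard=2/3
npx playwright test --shard=3/3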
And how does reporting work? If you are able to gather up all of the artifacts written (by default) to ./blob-report, then it's just this:
npx playwright merge-reports --reporter html ./blob-report
Sounds pretty sweet, right? A bunch of tests running in parallel across different pipeline jobs, then you merge a report and serve it up somewhere.
All of this is made super easy in GitHub Actions but is unfortunately non-existent in Bitbucket Pipelines. The idea of a "job triggering other jobs" is just not a thing.
So how can this be done? Everything is done through shell scripts and some imagination. First, let's take a look at the top-level pipelines we'll need:
pipelines:
  custom:
    execute-tests:
      - variables:
          - name: Environment
            default: dev
            allowed-values:
              - dev
              - stage
          - name: MaxNumberOfShards
            default: 1
      - step: *run-tests
    run-shard:
      - variables:
          - name: Environment
          - name: ShardNumber
          - name: MaxNumberOfShards
      - step: *run-shard
So what's happening here? The run-shard job is basically how our individual shards will be run. This is what it looks like from the Bitbucket Pipelines UI:

If you really wanted to, you could go into the Bitbucket Pipelines UI and resubmit this form for every shard you want to run. The idea here is to use our execute-tests pipeline job to automate all of that!
So what does our run-shard definition actually look like?
definitions:
  steps:
    - step: &run-shard
        name: Run shard for playwright tests
        image: mcr.microsoft.com/playwright:v1.37.0-jammy
        size: 2x
        caches:
          - node
        script:
          - echo "TEST_ENV=$Environment" > .env
          - export DEBIAN_FRONTEND=noninteractive # Interactive installation of aws-cli causes issues
          - apt-get update && apt-get install -y awscli
          - npm install
          - npx playwright test --shard="$ShardNumber"/"$MaxNumberOfShards" || true # Run test shard
          - aws s3 cp blob-report/ s3://my-bucket/blob-report --recursive # Copy blob report to S3
        artifacts:
          - playwright-report/**
          - test-results/**
          - blob-report/**
          - logs/**
          - .env
Looking a little nasty, isn't it? We have our Playwright Docker image executing what we want, namely the playwright test --shard CLI command that we needed. From there, we upload the blob report to S3, which means installing aws-cli during the pipeline. To me, this seemed a lot easier than trying to fetch artifacts from various pipeline jobs that can be fairly difficult to track down.
We now have an individual run-shard job that can run $ShardNumber out of $MaxNumberOfShards (i.e. 1/6, 2/6, etc.). I refer to these as "child pipelines". Take note that we've added || true to the playwright test step: honestly, we're not interested in seeing the individual test statuses for the child pipelines. We want to focus on examining test results from our "parent pipeline", and not have a bunch of failed child pipelines divert our attention.
And so what does our parent pipeline look like? Admittedly it's a mess of shell scripts designed to do a few different things.
    - step: &run-tests
        name: Run all UI tests
        image: mcr.microsoft.com/playwright:v1.37.0-jammy
        size: 2x
        caches:
          - node
        script:
          - echo "TEST_ENV=$Environment" > .env
          - export DEBIAN_FRONTEND=noninteractive # Interactive installation of aws-cli causes issues
          - apt-get update && apt-get install -y awscli
          - aws s3 rm s3://my-bucket/blob-report --recursive # Clear out old blob reports from previous test runs
          - npm install
          - /bin/bash ./scripts/start_playwright_shards.sh # Start child pipelines
          - /bin/bash ./scripts/monitor_shards.sh # Monitor child pipelines from parent pipeline
          - /bin/bash ./scripts/merge_reports_from_shards.sh # Download sharded blob reports from S3 and merge
          # Fail the parent pipeline if test failures are found across shards
          - |
            if grep -qE "[0-9]+ failed" ./logs/test-results.log; then
              echo "Failed tests found in log file"
              exit 1
            fi
        artifacts:
          - playwright-report/**
          - test-results/**
          - logs/**
          - .env
This parent pipeline, through some shell scripts, will accomplish the following:
- Iterate from 1 through $MaxNumberOfShards and send a POST to Bitbucket's API to start the run-shard pipeline job. The pipeline variables are sent as part of its payload.
- Poll for any IN_PROGRESS child pipeline jobs using the Bitbucket API. If the number of run-shard jobs is 0, that means we're all done and the parent pipeline can finish.
- Download the blob-report folder from S3 and execute merge-reports. Here, I opt to create an HTML report as well as a list report, which is the Playwright default. The former is found as an artifact in playwright-report, while the latter ends up in logs/test-results.log, a file that is normalized and parsed for results.
- If the generated log file contains "X failed", it means at least 1 test failed across all children. And if any of the individual children fail, then the parent is deemed a failure too (hey, just like in real life!)
I'll spare you the gory details of the bash scripts; for the most part, the work involved inspecting Bitbucket's network requests and mimicking those via curl, and rough sketches of the three scripts follow below. From there, it's also a good idea to make your test reporting shareable and easily accessible for your team.
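First, kicking off the shards. Here's a minimal sketch of the trigger script, built on Bitbucket's documented "run pipeline" endpoint rather than my script verbatim. BB_TOKEN (say, a repository access token), WORKSPACE, and REPO_SLUG are hypothetical variables you'd configure yourself, while Environment, MaxNumberOfShards, and BITBUCKET_BRANCH are already present in the parent step's environment.

#!/usr/bin/env bash
# start_playwright_shards.sh (sketch): trigger one run-shard pipeline per shard
set -euo pipefail

for shard in $(seq 1 "$MaxNumberOfShards"); do
  curl -sf -X POST \
    -H "Authorization: Bearer $BB_TOKEN" \
    -H "Content-Type: application/json" \
    "https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO_SLUG/pipelines/" \
    -d @- <<EOF
{
  "target": {
    "type": "pipeline_ref_target",
    "ref_type": "branch",
    "ref_name": "$BITBUCKET_BRANCH",
    "selector": { "type": "custom", "pattern": "run-shard" }
  },
  "variables": [
    { "key": "Environment", "value": "$Environment" },
    { "key": "ShardNumber", "value": "$shard" },
    { "key": "MaxNumberOfShards", "value": "$MaxNumberOfShards" }
  ]
}
EOF
done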
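Next, the polling loop. Same assumptions as above, plus one more: jq isn't in the Playwright image, so it would need installing alongside awscli. I also count PENDING pipelines here so the loop doesn't exit before a shard has actually started.

#!/usr/bin/env bash
# monitor_shards.sh (sketch): wait until no run-shard pipelines are still active
set -euo pipefail

while true; do
  active=$(curl -sf \
    -H "Authorization: Bearer $BB_TOKEN" \
    "https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO_SLUG/pipelines/?sort=-created_on&pagelen=50" \
    | jq '[.values[]
           | select(.target.selector.pattern? == "run-shard")
           | select(.state.name == "PENDING" or .state.name == "IN_PROGRESS")
          ] | length')

  echo "run-shard pipelines still active: $active"
  if [ "$active" -eq 0 ]; then
    break
  fi
  sleep 30
done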
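And finally the merge, which is close to what you've already seen: pull down whatever the shards uploaded, then produce both reports.

#!/usr/bin/env bash
# merge_reports_from_shards.sh (sketch): fetch blob reports from S3 and merge
set -euo pipefail

mkdir -p logs

# Pull down every shard's blob report from the shared bucket
aws s3 cp s3://my-bucket/blob-report ./blob-report --recursive

# The HTML report lands in ./playwright-report, which is kept as a pipeline artifact
npx playwright merge-reports --reporter html ./blob-report

# The default list reporter writes to stdout; capture it for the grep in the
# parent step (the real script also normalizes this file before parsing)
npx playwright merge-reports --reporter list ./blob-report | tee ./logs/test-results.log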
Well, that's all there is to it. I wish it were simpler in Bitbucket but... it's not. GitHub Actions lets a couple dozen lines of YAML do the same thing. But here we have another thing to deal with when it comes to Atlassian. Thanks for the blog idea, though.