<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Phil</title>
    <description>The latest articles on DEV Community by Phil (@philipfong).</description>
    <link>https://dev.to/philipfong</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F974318%2Fc3db4c90-e7cf-41e3-b491-48cc156e73dc.jpg</url>
      <title>DEV Community: Phil</title>
      <link>https://dev.to/philipfong</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/philipfong"/>
    <language>en</language>
    <item>
      <title>When Playwright’s Locator Tool Isn’t Enough</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 19 Jan 2026 21:47:09 +0000</pubDate>
      <link>https://dev.to/philipfong/when-playwrights-locator-tool-isnt-enough-561h</link>
      <guid>https://dev.to/philipfong/when-playwrights-locator-tool-isnt-enough-561h</guid>
      <description>&lt;p&gt;Playwright’s built-in locator tool works fine most of the time, but once you begin dealing with real-world component libraries it can start to miss. It often suggests &lt;code&gt;getByText&lt;/code&gt; or &lt;code&gt;getByRole&lt;/code&gt; against elements that are not truly semantic controls, which makes those locators flaky or unusable in practice.&lt;/p&gt;

&lt;p&gt;Checkboxes are a good example. In most modern apps, vanilla input &lt;code&gt;[type="checkbox"]&lt;/code&gt; elements are rarely seen. What usually exists are custom components built with divs, spans, hidden inputs, aria attributes, and classes that represent state. Because of that, Playwright assertions like &lt;code&gt;toBeChecked&lt;/code&gt; or &lt;code&gt;toBeDisabled&lt;/code&gt; often cannot be trusted. The checkbox looks checked in the UI, but the underlying HTML does not expose the state in the way those helpers expect.&lt;/p&gt;

&lt;p&gt;In these cases you need to own the locator yourself. The most reliable starting point is usually some stable text on the screen. Once you anchor to that text, you can walk up and down the DOM to find the real checkbox element.&lt;/p&gt;

&lt;p&gt;Playwright has a small but very useful API for this: &lt;code&gt;locator('..')&lt;/code&gt;. It moves one level up in the DOM, and you can chain it as many times as you need. It is much cleaner than &lt;code&gt;xpath=../..&lt;/code&gt; and a lot easier to remember. From there, you can navigate back down into the exact node that represents the state you care about.&lt;/p&gt;

&lt;p&gt;For example, a locator chain for a checkbox in our app might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const checkboxLocator = page
  .locator('p.user-select-none', { hasText: 'School-Pupil Activity Bus' })
  .locator('..')
  .locator('..')
  .locator('.special-requirement-checkbox')
  .locator('input[type="checkbox"][aria-checked="true"][disabled="disabled"]')

await expect(checkboxLocator).toBeVisible()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here the anchor is the visible label text. From there, the locator walks up to a shared container, then back down into a wrapper with a class you know is stable enough, and finally onto the actual input element. Because the checkbox is custom, you assert on &lt;code&gt;aria-checked&lt;/code&gt; and &lt;code&gt;disabled&lt;/code&gt; instead of relying on &lt;code&gt;toBeChecked&lt;/code&gt; or &lt;code&gt;toBeDisabled&lt;/code&gt;. A simple &lt;code&gt;toBeVisible&lt;/code&gt; plus the right attributes ends up being more concrete than the higher-level assertion API.&lt;/p&gt;

&lt;p&gt;All of this is also why I am a little skeptical of AI testing tools that promise automatic locators and assertions. Real applications rarely use simple, semantic HTML controls. There is a lot of custom markup, hidden inputs, and framework-specific structure that you have to understand and navigate. For now, a human who can anchor on the right text, walk the DOM, and assert on real attributes is still the most reliable way to write strong Playwright tests.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>automation</category>
      <category>testing</category>
    </item>
    <item>
      <title>Using AI in Playwright Tests</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 03 Nov 2025 21:20:13 +0000</pubDate>
      <link>https://dev.to/philipfong/using-ai-in-playwright-tests-35od</link>
      <guid>https://dev.to/philipfong/using-ai-in-playwright-tests-35od</guid>
      <description>&lt;p&gt;It's been a long while since I've posted anything. Excuse the clickbaity title, but rest assured that this is content that I think will deliver. And this post isn't written by AI! &lt;/p&gt;

&lt;p&gt;I'll keep this pretty short and sweet though. This idea was first inspired by the mobile automation framework &lt;a href="https://maestro.dev/" rel="noopener noreferrer"&gt;Maestro&lt;/a&gt;. Other than its solid capabilities as a mobile test framework (using YAML of all things), I was impressed to see an API named &lt;code&gt;assertWithAI&lt;/code&gt;. Imagine just prompting an LLM with "assert that a blue-colored Login button appears at the bottom". A real game changer in testing if you ask me!&lt;/p&gt;

&lt;p&gt;That API is gatekept behind a paid subscription, so I didn't experiment much from there. But I was curious to see if Playwright was anywhere close to implementing anything like that, since it is maintained by Microsoft, and Microsoft has its own large stake in OpenAI. At the time of this writing, it was not.&lt;/p&gt;

&lt;p&gt;I gained access to an Enterprise version of OpenAI so I decided to experiment with its API. Here is a helpful util that will integrate AI with visual testing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import OpenAI from 'openai'
import fs from 'fs'
import path from 'path'
import { Page } from '@playwright/test'
import { uniqueId } from './stringHelper'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY as string,
})

export const askAI = async (page: Page, question: string): Promise&amp;lt;string&amp;gt; =&amp;gt; {
  const fileName = `screenshot-ai-${uniqueId()}.png`
  const absPath = path.resolve(fileName)
  await page.screenshot({ path: absPath })
  const imageBase64 = fs.readFileSync(absPath, 'base64')
  const dataUrl = `data:image/png;base64,${imageBase64}`

  const instructions = [
    'You are helping test a web page from a screenshot.',
    'Please answer in this format:',
    'First line: YES or NO only.',
    'Second line: One short sentence explaining your reasoning.',
    'If you are not confident, please answer NO.'
  ].join('\n')

  try {
    const response = await openai.responses.create({
      model: 'gpt-4.1-mini', // Cheapest version
      instructions,
      input: [
        {
          role: 'user',
          content: [
            {
              type: 'input_text',
              text: `Question: ${question}`,
            },
            {
              type: 'input_image',
              image_url: dataUrl,
              detail: 'low', // Really skimping on the dollars
            },
          ],
        },
      ],
      temperature: 0, // Always provide a more deterministic answer each time
      max_output_tokens: 128, // Keep responses short &amp;amp; cheap
    })

    const answer = response.output_text?.trim() || ''

    console.log('\n[AI PAGE CHECK]')
    console.log('Question:', question)
    console.log('Full AI answer:\n', answer)
    console.log('Screenshot file:', absPath, '\n')

    return answer
  } catch (err: any) {
    // Check rate limit
    if (err.status === 429) {
      console.log(err)
      throw new Error('OpenAI rate limit reached.')
    }
    throw err
  }
}

export const checkAIResponse = (aiResponse: string, expected: boolean) =&amp;gt; {
  const firstLine = aiResponse.trim().split('\n')[0].toLowerCase()
  const aiSaidYes = firstLine.includes('yes')

  // Expected an answer of 'Yes' but AI said 'No'
  if (expected &amp;amp;&amp;amp; !aiSaidYes) {
    throw new Error(`Expected YES but got:\n${aiResponse}`)
  }

  // Expected an answer of 'No' but AI said 'Yes'
  if (!expected &amp;amp;&amp;amp; aiSaidYes) {
    throw new Error(`Expected NO but got:\n${aiResponse}`)
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then writing the tests is really straightforward, powerful, and actually pretty fun:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test('AI confirms a map is visible on the page', async ({ page }) =&amp;gt; {
  const mapLink = 'https://www.mysite.com/tracking/b3cd29b39b'
  await page.goto(mapLink)
  await expect(page.locator('[aria-label="Map"]')).toBeVisible()

  const aiAnswer = await askAI(page,
    'Is there a map and a visible plotted route that starts in Ann Arbor and ends in Detroit?'
  )

  await checkAIResponse(aiAnswer, true)
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sometimes Playwright snapshots can only take you so far, and they can become flaky if those snapshots are constantly changing. If there are assets in your app that frequently change and are difficult to automate (but easy to visually confirm), then these are the best spots for this kind of AI assist.&lt;/p&gt;
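&lt;p&gt;One caveat on the helper above: &lt;code&gt;firstLine.includes('yes')&lt;/code&gt; also matches a first line like "NO (yes, a map is there, but no route)". If that ever bites you, a stricter parse is easy. This is just a sketch, and the function name is my own:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Stricter parse: only an exact YES on the first line counts
export function aiSaidYesStrict(aiResponse: string): boolean {
  const lines = aiResponse.trim().split('\n')
  const firstLine = (lines[0] || '').trim().toUpperCase()
  return firstLine === 'YES' || firstLine === 'YES.'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;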

</description>
      <category>ai</category>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Sharding Jest tests. Harder than it should be?</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Thu, 19 Dec 2024 14:58:29 +0000</pubDate>
      <link>https://dev.to/philipfong/sharding-jest-tests-harder-than-it-should-be-4lhc</link>
      <guid>https://dev.to/philipfong/sharding-jest-tests-harder-than-it-should-be-4lhc</guid>
      <description>&lt;p&gt;I haven't posted here in quite a while. It's been a busy year!&lt;/p&gt;

&lt;p&gt;One of the more challenging problems I've run into was getting our API tests in a place where they can run more quickly. I wrote an old post about API testing &lt;a href="https://dev.to/philipfong/api-testing-for-the-win-1o2i"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In CI, our API tests run with one worker. A simple &lt;code&gt;npm test&lt;/code&gt; kicks things off and sends results to a log file and a nicely formatted HTML report thanks to &lt;a href="https://github.com/Hazyzh/jest-html-reporters" rel="noopener noreferrer"&gt;jest-html-reporters&lt;/a&gt; (it even has dark mode!). However, as the number of tests increases, so does execution time.&lt;/p&gt;

&lt;p&gt;In Playwright, I describe how we accomplished sharding in our test pipelines in this &lt;a href="https://dev.to/philipfong/playwright-sharding-with-bitbucket-pipelines-14hg"&gt;other post&lt;/a&gt;. Unfortunately, Jest does &lt;em&gt;not&lt;/em&gt; make this easy whatsoever.&lt;/p&gt;

&lt;p&gt;Even though Jest &lt;em&gt;does&lt;/em&gt; support a &lt;code&gt;--shard&lt;/code&gt; option, similar to Playwright, there doesn't appear to be &lt;em&gt;any&lt;/em&gt; out-of-the-box solution to merging reports. That means that test results live in isolation in their own shards and in their own pipelines. Playwright offers this feature in a &lt;code&gt;merge-reports&lt;/code&gt; option that wraps a nice bow on all of the reports generated by each shard.&lt;/p&gt;

&lt;p&gt;I won't post too much code for now. Below is the general gist of how we ended up sharding our Jest API tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Giving up on the report merge
&lt;/h2&gt;

&lt;p&gt;No feasible solutions exist to merge HTML reports, so we ended up creating an HTML "landing page" with clickable links to reports generated by each shard. At first, I began to experiment with &lt;a href="https://dkelosky.github.io/jest-stare/" rel="noopener noreferrer"&gt;jest-stare&lt;/a&gt;, but my dedication to dark mode in the reports was too much to overcome! At the same time, using this library also meant outputting test results to JSON, then merging those JSON results by hand, then converting those to an HTML report, and &lt;em&gt;then&lt;/em&gt; trying to produce some kind of custom dark mode styling. I felt like I was sinking too much time there, and since we were already producing reports with &lt;a href="https://github.com/Hazyzh/jest-html-reporters" rel="noopener noreferrer"&gt;jest-html-reporters&lt;/a&gt;, the quickest way out was to just serve up the individual reports.&lt;/p&gt;

&lt;p&gt;So this landing page ends up being uploaded in a step in our pipeline, looking something like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws s3 cp landing_page.html $S3_BUCKET/$BITBUCKET_BUILD_NUMBER/index.html&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The individual shard child pipelines are instructed to upload their own results, which can then be accessed via the index page when all is said and done.&lt;/p&gt;
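&lt;p&gt;The landing page itself is nothing fancy. Here is a rough sketch of how it could be generated before the upload step; the per-shard folder layout and the &lt;code&gt;MAX_SHARDS&lt;/code&gt; variable are illustrative, not our exact script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: build landing_page.html with one link per shard report
{
  echo '&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;h1&amp;gt;Jest shard reports&amp;lt;/h1&amp;gt;&amp;lt;ul&amp;gt;'
  for i in $(seq 1 "$MAX_SHARDS"); do
    echo "&amp;lt;li&amp;gt;&amp;lt;a href=\"./shard-$i/index.html\"&amp;gt;Shard $i of $MAX_SHARDS&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;"
  done
  echo '&amp;lt;/ul&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;'
} &amp;gt; landing_page.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;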

&lt;h2&gt;
  
  
  Aggregating test results: Bonkers use of Bash
&lt;/h2&gt;

&lt;p&gt;When pulling test results from a single Jest run, we were using regex to scrape the test results from a log file. For each individual shard, we would download each log file and &lt;code&gt;cat&lt;/code&gt; them together. There were some seriously awful-looking Bash scripts put together to find the lines prefixed with &lt;code&gt;Tests:&lt;/code&gt; in the log file and then aggregate the number of tests passed and failed. That was when I learned about &lt;code&gt;BASH_REMATCH&lt;/code&gt;, which sounds like an old NES game or something.&lt;/p&gt;

&lt;p&gt;Here's a little snippet of some of that nastiness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Match lines that include failed, passed, and total
if [[ "$line" =~ Tests:\ *([0-9]+)\ failed,\ *([0-9]+)\ passed,\ *([0-9]+)\ total ]]; then
  FAILED_TESTS_TOTAL=$((FAILED_TESTS_TOTAL + ${BASH_REMATCH[1]}))
  PASSED_TESTS_TOTAL=$((PASSED_TESTS_TOTAL + ${BASH_REMATCH[2]}))
  TOTAL_TESTS_TOTAL=$((TOTAL_TESTS_TOTAL + ${BASH_REMATCH[3]}))
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To my eyes, this looks pretty awful, but thankfully Jest is consistent in how it prints its test summary. Still, I cannot say with confidence that this will continue to work with newer Jest versions.&lt;/p&gt;
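&lt;p&gt;If the log scraping ever does break, a sturdier path would be to lean on Jest's own &lt;code&gt;--json&lt;/code&gt; output (&lt;code&gt;jest --json --outputFile=shard-results.json&lt;/code&gt;) and sum the summary fields instead of regexing a log. We didn't end up going this route, but a sketch of the aggregation might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Each shard writes its own JSON results file; the fields below come from
// Jest's aggregated results JSON, trimmed to just what we need to read.
interface JestSummary {
  numFailedTests: number
  numPassedTests: number
  numTotalTests: number
}

export function aggregateShardSummaries(summaries: JestSummary[]): JestSummary {
  const totals: JestSummary = { numFailedTests: 0, numPassedTests: 0, numTotalTests: 0 }
  for (const s of summaries) {
    totals.numFailedTests += s.numFailedTests
    totals.numPassedTests += s.numPassedTests
    totals.numTotalTests += s.numTotalTests
  }
  return totals
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;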

&lt;h2&gt;
  
  
  Final thoughts, and a small plea for help?
&lt;/h2&gt;

&lt;p&gt;I think it's amazing that Playwright had the foresight to put a lot of work towards reporting, which in my opinion is one of the more vital and underrated pieces of test engineering. But why is this so hard to do in Jest? I know that Jest was originally intended as a quick and lightweight test runner for frontend unit tests, but I feel confident that I'm not the only one using Jest as a test runner for API testing. If anyone out there is also running into scalability challenges in Jest and reporting, give a shout!&lt;/p&gt;

</description>
      <category>jest</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Adding standalone or "one off" scripts to your Playwright suite</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 08 Apr 2024 13:11:46 +0000</pubDate>
      <link>https://dev.to/philipfong/adding-standalone-or-one-off-scripts-in-your-playwright-suite-3kng</link>
      <guid>https://dev.to/philipfong/adding-standalone-or-one-off-scripts-in-your-playwright-suite-3kng</guid>
      <description>&lt;p&gt;There might be a time where you may be asked to automate some tasks that fall outside of what is considered a traditional test.&lt;/p&gt;

&lt;p&gt;What do I mean by a traditional test? An automated test typically accomplishes some key ideas:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Perform some actions in the UI&lt;/li&gt;
&lt;li&gt;Assert on the behavior resulting from those actions&lt;/li&gt;
&lt;li&gt;Detect changes in behavior and report back on why a change might have occurred&lt;/li&gt;
&lt;li&gt;Run as often as needed, sometimes multiple times a day&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now there are times where automating the UI is required, but the outcome of that automation isn't needed regularly. I consider these to be standalone or "one off" Playwright scripts. Some examples might be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scraping some data off of pages to be used later (analytics, manual error checking, cataloging)&lt;/li&gt;
&lt;li&gt;Inputting and setting (many) values in a form on a one-time basis&lt;/li&gt;
&lt;li&gt;Reproducing a bug that might require repeated interactions with the same component / api / UI workflow.&lt;/li&gt;
&lt;li&gt;Client requests such as "we have this spreadsheet data of configurations to create 1,000 widgets in your app, but we don't want to go through the UI manually."&lt;/li&gt;
&lt;li&gt;A small set of smoke tests designed for production systems only, that aren't applicable in your lower environments&lt;/li&gt;
&lt;li&gt;Cleanup scripts if tests aren't designed to do cleanup on their own&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can see that these obviously don't fit the bill of a "test suite", with interactions, assumptions, and assertions. So how do you include these types of files &lt;em&gt;without&lt;/em&gt; having them run in CI and causing a disaster?&lt;/p&gt;

&lt;p&gt;And here are some other requirements that were imposed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;We wanted devs to be able to run these scripts locally without any barriers. So a very simple &lt;code&gt;npx playwright test reproduceBug&lt;/code&gt; was the goal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As just mentioned, we cannot have these files run in CI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They still needed to maintain the &lt;code&gt;.spec.ts&lt;/code&gt; extension, otherwise Playwright will just spit out "No tests found"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The scripts should live in the same code repo we use for our other normal tests. An entirely separate repo just puts up more barriers for engaging with Playwright.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Playwright configs by default look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export default defineConfig({
  testDir: './tests',

... rest of config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you cannot place test files &lt;em&gt;outside&lt;/em&gt; of this directory, which was brought up as a &lt;a href="https://github.com/microsoft/playwright/issues/14039" rel="noopener noreferrer"&gt;question on Github&lt;/a&gt; some time ago. Initially, I thought it would be nice to add another folder in the repo called "scripts", but Playwright &lt;a href="https://github.com/microsoft/playwright/issues/7403" rel="noopener noreferrer"&gt;does not allow multiple &lt;code&gt;testDir&lt;/code&gt; values&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So the easiest solution was simply to add a subfolder called &lt;code&gt;tests-ignored&lt;/code&gt;, so the structure just looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── root
│   ├── tests
│   │   ├── tests-ignored
├── package.json
├── package-lock.json
└── .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So in CI, you run your full battery of tests like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx playwright test --grep-invert ignored&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And if you're running a test file locally, everything is as normal as can be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx playwright test widgetCreation.spec.ts&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And if you're running these other "one off" scripts, it's the same exact pattern:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx playwright test reproduceDeadlockBug.spec.ts&lt;/code&gt;&lt;/p&gt;
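&lt;p&gt;One variant worth mentioning (a sketch, not what we ended up shipping): if you'd rather not remember the &lt;code&gt;--grep-invert&lt;/code&gt; flag in CI, Playwright's &lt;code&gt;testIgnore&lt;/code&gt; config option can skip the folder automatically whenever a CI environment variable is set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { defineConfig } from '@playwright/test'

export default defineConfig({
  testDir: './tests',
  // Skip the one-off scripts only when running in CI
  testIgnore: process.env.CI ? '**/tests-ignored/**' : [],
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;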

&lt;p&gt;So there you have it. A nice simple solution that avoids creating extra projects in your Playwright config, and avoids having to know extra options to have to pass to your CLI. This setup helps us freely automate anything we need, without it having to fit any rigid test structures purposefully designed for our regression suite.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Playwright sharding with Bitbucket pipelines</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 20 Nov 2023 14:43:25 +0000</pubDate>
      <link>https://dev.to/philipfong/playwright-sharding-with-bitbucket-pipelines-14hg</link>
      <guid>https://dev.to/philipfong/playwright-sharding-with-bitbucket-pipelines-14hg</guid>
      <description>&lt;p&gt;Everyone seems to have a love/hate relationship with Atlassian products. I've only really worked at "Atlassian shops" my entire career. Jira, Confluence, Bitbucket, StatusPage. It's nice to have everything in "one place" but on occasion, it seems like so many people are always "fighting" with a limitation of their products. Can't get Jira to do the thing? I guess it's Excel again. Can't get Bitbucket to work with Playwright Test sharding? You've come to the right place.&lt;/p&gt;

&lt;p&gt;So what is sharding? The concept is pretty simple and the execution even simpler. The command line pretty much looks like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx playwright test --shard 1/3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then you do the same for shard 2 of 3, and 3 of 3. Ideally, each command runs in its own machine/Docker image, and it's assigned its own little subset of tests.&lt;/p&gt;

&lt;p&gt;And how does reporting work? If you are able to gather up all of the artifacts written (by default) to &lt;code&gt;./blob-report&lt;/code&gt;, then it's just this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx playwright merge-reports --reporter html ./blob-report&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Sounds pretty sweet right? Bunch of tests running in parallel, across different pipeline jobs, and you merge a report and serve it up somewhere.&lt;/p&gt;

&lt;p&gt;All of this is made super easy in &lt;a href="https://playwright.dev/docs/test-sharding#github-actions-example" rel="noopener noreferrer"&gt;Github Actions&lt;/a&gt; but unfortunately is absolutely non-existent in Bitbucket pipelines. The idea of a "job triggering other jobs" is just not a thing.&lt;/p&gt;

&lt;p&gt;So how can this be done? Everything is done through shell scripts and some imagination. Firstly, let's take a look at the top level pipelines that we'll need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipelines:
  custom:
    execute-tests:
      - variables:
          - name: Environment
            default: dev
            allowed-values:
              - dev
              - stage
          - name: MaxNumberOfShards
            default: 1
      - step: *run-tests
    run-shard:
      - variables:
          - name: Environment
          - name: ShardNumber
          - name: MaxNumberOfShards
      - step: *run-shard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So what's happening here? The job &lt;code&gt;run-shard&lt;/code&gt; is basically how our individual shards will be run. This is what it looks like from the Bitbucket Pipeline UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8srfm7i5tcnl0aujqput.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8srfm7i5tcnl0aujqput.png" alt=" " width="800" height="718"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you really wanted to, you could go into the Bitbucket pipeline UI, and resubmit this form for all of the shards you want to run. The idea here is to use our &lt;code&gt;execute-tests&lt;/code&gt; pipeline job to automate all of that!&lt;/p&gt;

&lt;p&gt;So what does our run-shard definition actually look like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;definitions:
  services:
    run-shard: &amp;amp;run-shard
      name: Run shard for playwright tests
      image: mcr.microsoft.com/playwright:v1.37.0-jammy
      size: 2x
      caches:
        - node
      script:
        - echo "TEST_ENV=$Environment" &amp;gt; .env
        - export DEBIAN_FRONTEND=noninteractive # Interactive installation of aws-cli causes issues
        - apt-get update &amp;amp;&amp;amp; apt-get install -y awscli
        - npm install
        - npx playwright test --shard="$ShardNumber"/"$MaxNumberOfShards" || true # Run test shard
        - aws s3 cp blob-report/ s3://my-bucket/blob-report --recursive # Copy blob report to s3
      artifacts:
        - playwright-report/**
        - test-results/**
        - blob-report/**
        - logs/**
        - .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking a little nasty, isn't it? We have our Playwright Docker image executing the &lt;code&gt;playwright test --shard&lt;/code&gt; CLI command we need. From there, we upload the blob report to S3, which means installing &lt;code&gt;aws-cli&lt;/code&gt; during our pipeline. To me, this seemed a lot easier than trying to fetch artifacts from various pipeline jobs that can be fairly difficult to track down.&lt;/p&gt;

&lt;p&gt;We have our individual &lt;code&gt;run-shard&lt;/code&gt; job that can run &lt;code&gt;shardNumber&lt;/code&gt; out of &lt;code&gt;maxNumberOfShards&lt;/code&gt; (i.e. 1/6, 2/6, etc). I refer to these as "child pipelines". Take note that we've added &lt;code&gt;|| true&lt;/code&gt; to the &lt;code&gt;playwright test&lt;/code&gt; step, as honestly we're not interested in seeing the individual test statuses for the child pipelines. Also, we want to focus on examining test results from our "parent pipeline" and not have a bunch of failed child pipelines divert our attention.&lt;/p&gt;

&lt;p&gt;And so what does our parent pipeline look like? Admittedly it's a mess of shell scripts designed to do a few different things.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    run-tests: &amp;amp;run-tests
      name: Run all UI tests
      image: mcr.microsoft.com/playwright:v1.37.0-jammy
      size: 2x
      caches:
        - node
      script:
        - echo "TEST_ENV=$Environment" &amp;gt; .env
        - export DEBIAN_FRONTEND=noninteractive # Interactive installation of aws-cli causes issues
        - apt-get update &amp;amp;&amp;amp; apt-get install -y awscli
        - aws s3 rm s3://my-bucket/blob-report --recursive # Clear out old blob reports from previous test runs
        - npm install
        - /bin/bash ./scripts/start_playwright_shards.sh # Start child pipelines
        - /bin/bash ./scripts/monitor_shards.sh # Monitor child pipelines from parent pipeline
        - /bin/bash ./scripts/merge_reports_from_shards.sh # Download sharded blob reports from S3 and merge
        # Fail the parent pipeline if test failures are found across shards
        - |
          if grep -qE "[0-9]+ failed" ./logs/test-results.log; then
            echo "Failed tests found in log file"
            exit 1
          fi
      artifacts:
        - playwright-report/**
        - test-results/**
        - logs/**
        - .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This parent pipeline, through some shell scripts, will accomplish the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Iterate from &lt;code&gt;1&lt;/code&gt; through &lt;code&gt;$MaxNumberOfShards&lt;/code&gt; and send a &lt;code&gt;POST&lt;/code&gt; to Bitbucket's API to start the &lt;code&gt;run-shard&lt;/code&gt; pipeline job. The &lt;a href="https://bitbucket.org/blog/predefine-values-of-custom-pipeline-variables" rel="noopener noreferrer"&gt;pipeline variables&lt;/a&gt; are sent as part of its payload. &lt;/li&gt;
&lt;li&gt;Poll for any &lt;code&gt;IN_PROGRESS&lt;/code&gt; child pipeline jobs using the Bitbucket API. If the number of &lt;code&gt;run-shard&lt;/code&gt; jobs is &lt;code&gt;0&lt;/code&gt;, that means we're all done and the parent pipeline can finish.&lt;/li&gt;
&lt;li&gt;Download the &lt;code&gt;blob-report&lt;/code&gt; folder from S3 and execute &lt;code&gt;merge-report&lt;/code&gt;. Here, I opt to create an html report as well as a &lt;code&gt;list&lt;/code&gt; report, which is the Playwright default. The former is found as an artifact in &lt;code&gt;playwright-report&lt;/code&gt;, while the latter is found in &lt;code&gt;logs/test-results.log&lt;/code&gt;, which is a file that is normalized and parsed for results.&lt;/li&gt;
&lt;li&gt;If the log file generated contains "X failed", it means at least 1 test failed across all children. And if any of the individual children fail, then the parent is deemed a failure too (hey, just like in real life!)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'll spare you the details on the bash scripts, but for the most part the work involved inspecting Bitbucket's network requests and mimicking those via &lt;code&gt;curl&lt;/code&gt;. From there, it's also a good idea to make your test reporting shareable and easily accessible for your team.&lt;/p&gt;
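&lt;p&gt;That said, the shard-launching half is short enough to sketch. The endpoint and payload shape below follow Bitbucket's run-pipeline API, but treat the exact fields (branch name, token, variable names) as placeholders for your own setup rather than our literal script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch of start_playwright_shards.sh: POST one run-shard pipeline per shard
build_shard_payload() {
  local shard_number="$1"
  printf '{"target":{"type":"pipeline_ref_target","ref_type":"branch","ref_name":"main","selector":{"type":"custom","pattern":"run-shard"}},"variables":[{"key":"Environment","value":"%s"},{"key":"ShardNumber","value":"%s"},{"key":"MaxNumberOfShards","value":"%s"}]}' \
    "$Environment" "$shard_number" "$MaxNumberOfShards"
}

start_shards() {
  for i in $(seq 1 "$MaxNumberOfShards"); do
    curl -s -X POST \
      -H "Authorization: Bearer $BITBUCKET_ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d "$(build_shard_payload "$i")" \
      "https://api.bitbucket.org/2.0/repositories/$BITBUCKET_WORKSPACE/$BITBUCKET_REPO_SLUG/pipelines/"
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;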

&lt;p&gt;Well, that's all there is to it. I wish it were simpler in Bitbucket but... it's not. Github Actions allows a couple dozen lines of YAML to do the same thing. But here we have another thing to deal with when it comes to Atlassian. Thanks for the blog idea, though.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>Using Playwright fixtures to skip login pages</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Tue, 26 Sep 2023 16:38:03 +0000</pubDate>
      <link>https://dev.to/philipfong/using-playwright-fixtures-to-skip-login-pages-3g01</link>
      <guid>https://dev.to/philipfong/using-playwright-fixtures-to-skip-login-pages-3g01</guid>
      <description>&lt;p&gt;One of the first things that you might do with Playwright when you start automating is writing some code to automate logging into your app. It's so incredibly easy to do with &lt;code&gt;npx playwright codegen&lt;/code&gt; as well, and I've demo'ed this to my team as a rudimentary example while introducing Playwright.&lt;/p&gt;

&lt;p&gt;As you begin to scale up your tests, you'll likely find that interacting with the login page results in a lot of wasted clicks and network requests. In fact, we found that hitting &lt;code&gt;/auth&lt;/code&gt; in every single one of our tests can cause some unintended side effects. Additionally, logging in repeatedly with the same user credentials doesn't coincide with real-world app usage, so why should we do the same in our tests?&lt;/p&gt;

&lt;p&gt;Fixtures are very useful in that they act as a hook into your original tests, similar to having &lt;code&gt;beforeEach&lt;/code&gt; or &lt;code&gt;afterEach&lt;/code&gt; hooks everywhere. The major upside of fixtures is that they make for much cleaner, more readable code and provide consistency across all tests.&lt;/p&gt;

&lt;p&gt;Here's what a fixture might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { test as baseTest, type Page } from '@playwright/test'
import { createAuthContext } from './authHelper'

type AuthFixtures = {
  mySiteAuth: Page
}

export const test = baseTest.extend&amp;lt;AuthFixtures&amp;gt;({
  mySiteAuth: async ({ browser }, use) =&amp;gt; {
    const { page: authorizedPage, context: authorizedContext } = await createAuthContext(browser)
    await use(authorizedPage)
    await authorizedContext.close()
  }
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty simple but very important stuff going on here: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We're retrieving a page and browser context.&lt;/li&gt;
&lt;li&gt;The call to &lt;code&gt;use&lt;/code&gt; yields back to Playwright &lt;code&gt;test&lt;/code&gt; where all your steps are written.&lt;/li&gt;
&lt;li&gt;Closing context is important so that browser windows and their pages are closed properly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You'll want to spend a decent chunk of time figuring out what your &lt;code&gt;createAuthContext&lt;/code&gt; function might look like, but in the end, you want it to return a new browser context along with a new page for that context. The browser context itself should take advantage of the &lt;code&gt;storageState&lt;/code&gt; option that Playwright offers. This &lt;code&gt;storageState&lt;/code&gt; is basically a JSON blob written to mimic the exact local storage state of your app. Finally, you'll probably want to add some actual UI login logic for when the storage JSON blobs don't exist.&lt;/p&gt;

&lt;p&gt;I'll provide a little bit of code below, but keep in mind that this is already well documented by the Playwright team:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { type Browser } from '@playwright/test'

export const MY_SITE_FILE = 'playwright/.auth/my_site.json'

export const createAuthContext = async (browser: Browser) =&amp;gt; {
  // Reuse the saved storage state so every test starts already logged in
  const context = await browser.newContext({ storageState: MY_SITE_FILE })
  const page = await context.newPage()

  return { page, context }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, all you'll have to do is change up your test signatures. Most of them probably looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { test, expect } from '@playwright/test'

test('do some things', async ({ page }) =&amp;gt; {
  page.doSomeThings()
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now they'll just look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { test } from './helpers/testFixtures'
import { expect } from '@playwright/test'

test('do some things', async ({ mySiteAuth: page }) =&amp;gt; {
  page.doTheSameThings()
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What's awesome here is that you don't need to go and refactor a bunch of individual tests. The most you'll have to do is remove any calls to your UI login functions, which hopefully are easy to find and remove if you've followed consistent patterns in your tests.&lt;/p&gt;

&lt;p&gt;What I find notable here is that &lt;code&gt;await use(authorizedPage)&lt;/code&gt; acts almost like &lt;code&gt;yield&lt;/code&gt; in Ruby, which I can definitely appreciate as someone who's loved that language for so long.&lt;/p&gt;

&lt;p&gt;Hopefully this helps explain how to have Playwright fixtures working alongside their authentication patterns. I did feel that Playwright's documentation was a teeny bit lacking in this area. I'll also mention that using this type of approach that decouples from any global setup also allows for extending &lt;code&gt;AuthFixtures&lt;/code&gt; to other sites within the same test suite.&lt;/p&gt;

&lt;p&gt;Compared to Selenium WebDriver, where you typically have to set local storage manually through traditional Web APIs, Playwright really does make it a breeze to have already-authorized pages at your disposal. Jumping straight into authenticated sessions allows for more focused testing.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Intercepting network requests in Playwright</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Thu, 20 Jul 2023 01:30:25 +0000</pubDate>
      <link>https://dev.to/philipfong/intercepting-network-requests-in-playwright-25op</link>
      <guid>https://dev.to/philipfong/intercepting-network-requests-in-playwright-25op</guid>
      <description>&lt;p&gt;I recall some time ago that the folks behind WebDriver made their intentions very clear: If an action can be taken in the browser viewport with a mouse and keyboard, they would provide an API for it. Anything else beyond that was merely an afterthought. As testing evolved into a more complex, technically challenging discipline in its own right, the tooling had to meet that demand.&lt;/p&gt;

&lt;p&gt;Fast forward to the present day and we now have tools like &lt;a href="https://dev.to/philipfong/using-selenium-webdriver-40-bidirectional-api-fp7"&gt;WebDriver's BiDi API&lt;/a&gt;, as well as Playwright's support for &lt;a href="https://playwright.dev/docs/network#network-events" rel="noopener noreferrer"&gt;network events&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I described previously how intercepting network requests contributes to an effective waiting strategy for the DOM, and it still does. Slow network responses combined with very fast interactions in the UI can result in some fairly nasty, difficult-to-troubleshoot race conditions.&lt;/p&gt;

&lt;p&gt;But one thing I failed to mention was this: is that button you're clicking actually sending the proper request? Playwright lets us check on that with ease, and adds a nice wrinkle of defensiveness to our tests.&lt;/p&gt;

&lt;p&gt;Playwright's documentation is pretty straightforward about this. Set up some promises, go about your &lt;a href="https://playwright.dev/docs/input" rel="noopener noreferrer"&gt;Playwright Actions&lt;/a&gt;, and then retrieve the result of those promises. Here, we've written a higher order function that can be used throughout our test suite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { type Page, type Response } from '@playwright/test'

export const expectRequest = async (page: Page, requests: {url: string, method: string}[], action: () =&amp;gt; Promise&amp;lt;void&amp;gt;) =&amp;gt; {
  // Register the waiters before the action fires, so no response is missed
  const promises = requests.map(({url, method}) =&amp;gt; {
    const predicate = (response: Response) =&amp;gt;
      response.url().includes(url) &amp;amp;&amp;amp;
      response.request().method() === method &amp;amp;&amp;amp;
      response.status() === 200
    return page.waitForResponse(predicate)
  })

  await action()

  return await Promise.all(promises)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's an example of how we would use this helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const loginRequest = { url: '/auth', method: 'POST' }
const profileRequest = { url: '/profile', method: 'GET' }
await expectRequest(page, [loginRequest, profileRequest], async () =&amp;gt; {
  await page.getByRole('button', { name: 'Log In' }).click()
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So there we have a pretty simple example of a 'Log In' button click where we expect multiple requests to succeed: a POST request to &lt;code&gt;/auth&lt;/code&gt;, and a GET request to &lt;code&gt;/profile&lt;/code&gt;. This is a fairly arbitrary example, but I have come across some areas in our app where this has made a nice positive impact in our tests and increased our confidence that the frontend and backend are in complete lockstep.&lt;/p&gt;

&lt;p&gt;Happy testing!&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
      <category>selenium</category>
    </item>
    <item>
      <title>Taking the plunge into Playwright!</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 15 May 2023 15:30:19 +0000</pubDate>
      <link>https://dev.to/philipfong/taking-the-plunge-into-playwright-2bd9</link>
      <guid>https://dev.to/philipfong/taking-the-plunge-into-playwright-2bd9</guid>
      <description>&lt;p&gt;I've finally got to a point where we decided to experiment with Playwright. There were some very good reasons for this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Our old WebDriver tests couldn't really be trusted anymore. This wasn't a problem specific to WebDriver; it's just what happens when tests aren't run regularly (daily/weekly), maintenance isn't prioritized, and tests were written poorly from the beginning. Just to note: we support a product where our WebDriver tests &lt;em&gt;are&lt;/em&gt; maintained and architected properly and executed on a weekly basis, and things are going swimmingly there. It's only when that cadence is abandoned that things can really go south.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running them in CI (even as simple as in a Bitbucket pipeline) was a huge pain, with a ton of custom work. This included containerization of our test code, pushing to ECR, starting an ECS instance, developing highly customized HTML reporting, and highly customized retry mechanisms. It was not the friendliest thing to deal with on a long-term basis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Writing tests in Ruby is great, but with our limited team size, moving to a language that aligned with the dev team had &lt;em&gt;some&lt;/em&gt; potential benefits. While expectations are low that the entire dev team will jump into writing e2e tests, I felt that breaking the language barrier was at least the bare minimum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finding opportunities for upskilling my team is always a priority. Always valuable to learn things that are on trend and desirable in the market.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'll share some more thoughts and quick takes in another post. So far, Playwright has made it very easy to get started. But as always, automation is easy to start and difficult to master.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>playwright</category>
      <category>selenium</category>
    </item>
    <item>
      <title>API testing for the win</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Tue, 04 Apr 2023 19:24:42 +0000</pubDate>
      <link>https://dev.to/philipfong/api-testing-for-the-win-1o2i</link>
      <guid>https://dev.to/philipfong/api-testing-for-the-win-1o2i</guid>
      <description>&lt;p&gt;Once in a while, depending on your product, you may want to steer towards API tests for a few reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you're working with a small test team, you'll get a pretty big win out of API tests. They're easier to write than UI tests, and you're only dealing with the backend.&lt;/li&gt;
&lt;li&gt;Execution times are faster (most of the time, anyway).&lt;/li&gt;
&lt;li&gt;Developers typically don't get involved with UI tests, but there may be a comfort level with API tests that might increase engagement when it comes to adding new tests, troubleshooting failures, and reviewing test results.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's get right to it. Overall, testing Restful services has always followed a pretty simple pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Depending on the type of request, there may be a payload to send.&lt;/li&gt;
&lt;li&gt;Send the request via rest library.&lt;/li&gt;
&lt;li&gt;Assert against the response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Jest works great as a test runner and assertion library. You can use axios/supertest/request/chai as the REST library (I'm personally partial to supertest). Tests are easy to write in TypeScript and are (in my opinion) less verbose than in Java. Parsing JSON responses in JS is also a dream. Finally, finding talent familiar with JS should be fairly easy.&lt;/p&gt;

&lt;p&gt;In breaking down the above 3 steps you might want to build out a set of utils/helpers/whatever to help you do them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You may want to follow a pattern where you build payloads, massage or transform them for your scenarios, and return them so they can be sent off to the backend.&lt;/li&gt;
&lt;li&gt;You'll want to abstract the GET/PATCH/POST/DELETE requests into their own functions so that you don't need to worry about auth and tokens.&lt;/li&gt;
&lt;li&gt;Some backend responses can unfortunately be massive, so asserting against dozens of keys and values can be counterproductive. Asserting on the status code, plus some important IDs and other keys that matter to business logic, is a good place to start.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then all of a sudden, tests become very, very simple to write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { createUserPayload } from '../helpers/userPayloadHelper'
import { post } from '../helpers/requestHelper'

describe('users API', () =&amp;gt; {
  test('create user', async () =&amp;gt; {
    const uri = '/users'
    const user = createUserPayload()
    const response = await post(uri, user)

    expect(response.statusCode).toBe(201)
    expect(response.body.name).toBe(user.name)
    expect(response.body.email).toBe(user.email)
  })
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll find a lot of teams that go the popular Postman/Newman route, but I have pretty strong opinions that maintaining code will always beat out maintaining a Postman collection in the long run. Plus, you'll be doing your test team a favor by getting them to learn to write code, and API tests follow some pretty simple patterns that are hard to screw up even if you try.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>jest</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Let's ask ChatGPT about Playwright versus Ruby's Capybara</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Tue, 28 Feb 2023 13:51:50 +0000</pubDate>
      <link>https://dev.to/philipfong/lets-ask-chatgpt-about-playwright-versus-rubys-capybara-3blh</link>
      <guid>https://dev.to/philipfong/lets-ask-chatgpt-about-playwright-versus-rubys-capybara-3blh</guid>
      <description>&lt;p&gt;I wanted to learn more about Playwright, so I simply asked ChatGPT a series of questions to find out more! See some of the below snapshots for my interesting conversation with the AI bot that is all the rage nowadays.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb1dvofdwnpj196yd9p5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb1dvofdwnpj196yd9p5.png" alt="ChatGPT writes a Playwright test" width="800" height="1075"&gt;&lt;/a&gt;Wow! ChatGPT puts together what looks to be a valid UI test for a Google Search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ik9zznqg952a9083s5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ik9zznqg952a9083s5o.png" alt="Asking about await" width="800" height="475"&gt;&lt;/a&gt;I have pretty basic knowledge of async and promises, but I wasn't super clear on why it might be so prevalent in JS-based UI testing. Maybe I've always oversimplified my take on the complexity of DOM events. Returning a promise object on a mouse click is a little foreign to me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo61jqpl821hovi4fc98i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo61jqpl821hovi4fc98i.png" alt="First attempt at refactor" width="800" height="1217"&gt;&lt;/a&gt;I have a tendency to want code to simply look cleaner. For example, I'll always try to refactor away string literals where I can. I thought maybe the await keyword could be abstracted away somehow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtdxfngccn5cc6rlmun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtdxfngccn5cc6rlmun.png" alt="Second attempt at refactor" width="800" height="1232"&gt;&lt;/a&gt;I still wasn't too satisfied so I asked ChatGPT to refactor again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyp4alevh648y1428pbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyp4alevh648y1428pbg.png" alt="ChatGPT writes a test in Capybara" width="800" height="774"&gt;&lt;/a&gt;I went ahead and asked ChatGPT to write the same test using Capybara. It produces exactly what I expect (and with MUCH fewer lines of code!)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yc2l27zqr5m1g5ynzaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yc2l27zqr5m1g5ynzaa.png" alt="ChatGPT gives a take on the two frameworks" width="800" height="532"&gt;&lt;/a&gt;And finally I ask for ChatGPT's take on what I perceived as a vast difference... verbosity?&lt;/p&gt;

&lt;p&gt;So there you have it! A quick exercise and comparison between Ruby's stack and Playwright, and an interesting assessment on both from ChatGPT.&lt;/p&gt;

</description>
      <category>crypto</category>
      <category>blockchain</category>
      <category>web3</category>
      <category>offers</category>
    </item>
    <item>
      <title>Using Selenium WebDriver 4.0 BiDirectional API</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Mon, 16 Jan 2023 15:37:07 +0000</pubDate>
      <link>https://dev.to/philipfong/using-selenium-webdriver-40-bidirectional-api-fp7</link>
      <guid>https://dev.to/philipfong/using-selenium-webdriver-40-bidirectional-api-fp7</guid>
      <description>&lt;p&gt;I had mentioned in an &lt;a href="https://dev.to/philipfong/working-with-throbbers-35km"&gt;earlier post&lt;/a&gt; that checking on throbbers is critical to ensuring that tests are stepping forward the right way, which avoids flaky tests and headaches.&lt;/p&gt;

&lt;p&gt;I ran into a recent case where I was not able to depend on a throbber, newly enabled element, or any other changes in the DOM that would let me know that it was safe to proceed in our test. I did open up a new Jira ticket to address this. After all, if a machine/bot/whatever is able to produce a bug from moving "too fast", then perhaps there are some UX updates that should be made. Ultimately I did expose a race condition that is reproduced under very poor latency conditions.&lt;/p&gt;

&lt;p&gt;I decided to look into &lt;a href="https://www.selenium.dev/documentation/webdriver/bidirectional/bidi_api/" rel="noopener noreferrer"&gt;Selenium's BiDi API&lt;/a&gt;. For one, it looks like testing stacks involving Cypress and Playwright are already trying to incorporate more support at the devtools level. So it's good to see that the dinosaur that is Selenium is keeping pace with those newer Node stacks.&lt;/p&gt;

&lt;p&gt;I'll be speaking specifically about the Network Interception functionality. To me, that is the one with the most utility in the context of "throbbers and waiters".&lt;/p&gt;

&lt;p&gt;From what I understood, it looks as though the BiDi API is designed in such a way that an async function / block is called. From there, any WebDriver actions allow for that function / block to intercept network requests and at a minimum, allow for the developer to output request URLs, responses, response statuses, and even potentially manipulate incoming responses for the browser to act upon. Super cool!&lt;/p&gt;

&lt;p&gt;In the end, I wrote a helper method to do what I needed it to do: do a thing in the UI, make sure that the frontend sends a specific network request, and make sure that the browser gets a 200 back. So without further ado:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  def wait_on_request
    @request_found = false
    page.driver.browser.intercept do |request, &amp;amp;continue|
      if request.method == 'GET' &amp;amp;&amp;amp; request.url.match?(/tables/)
        puts 'Checking for OK response code for %s' % request.url
        continue.call(request) do |response|
          response.code.should == 200
          @request_found = true
          puts 'OK response code found'
        end
      else
        continue.call(request)
      end
    end
    yield
    attempts = 0
    while !@request_found
      puts 'Waiting for request to be sent by frontend'
      sleep 3
      attempts += 1
      fail 'Request was not sent by the frontend within 30 seconds' if attempts == 10
    end
  end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this block of code, our app under test uses various REST endpoints that include the path &lt;code&gt;/tables&lt;/code&gt;, and the helper is hard-coded to account for those requests. The &lt;code&gt;yield&lt;/code&gt; keyword allows passing in another block that is expected to trigger the request. The &lt;code&gt;while&lt;/code&gt; loop then "monitors" whether the request was actually sent. Tests are designed to fail if the request is never sent by the frontend, or if the request returns anything other than a 200.&lt;/p&gt;

&lt;p&gt;So there you have it. It is very, very cool seeing the advancements in WebDriver over just the past year, especially since the release of Selenium 4.0. Historically, they were always sticklers for "we only support things that a user in a browser would do", but are now bending more towards "we support things that a developer in a browser would do". I'm excited to see what's in store for future releases.&lt;/p&gt;

&lt;p&gt;Something I found last minute: unfortunately this is not supported by &lt;a href="https://aws.amazon.com/device-farm/" rel="noopener noreferrer"&gt;AWS Device Farm&lt;/a&gt;, and it's not clear when it ever will be. This means it only works locally, or on any remote/grid servers that have built-in support for it. I happened to run into the error &lt;code&gt;DevTools is not supported by the Remote Server&lt;/code&gt;. If anyone has any solutions or workarounds for this, I'm all ears!&lt;/p&gt;

</description>
      <category>tooling</category>
    </item>
    <item>
      <title>Improve your testing resume with these tips (with real life examples)</title>
      <dc:creator>Phil</dc:creator>
      <pubDate>Wed, 21 Dec 2022 19:06:18 +0000</pubDate>
      <link>https://dev.to/philipfong/improve-your-testing-resume-with-these-tips-with-real-life-examples-41d1</link>
      <guid>https://dev.to/philipfong/improve-your-testing-resume-with-these-tips-with-real-life-examples-41d1</guid>
      <description>&lt;p&gt;If you survey anyone who spends time screening resumes and ask how long they spend reviewing each one, nearly every answer will be seconds, not minutes. Avoid having your resume stand out for the wrong reasons. I will illustrate a few tips, what to avoid, and why. Trigger warning for those who unknowingly commit some of these blunders (and stand in defense of them).&lt;/p&gt;

&lt;h3&gt;
  
  
  Resume length
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi5qvycns5y9wx6vxxzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi5qvycns5y9wx6vxxzi.png" alt=" " width="430" height="122"&gt;&lt;/a&gt;When your resume is like reading War and Peace&lt;/p&gt;

&lt;p&gt;If you have a multi-page resume with fewer than 8 years of experience, look to shorten it to one page. Stretching your career across multiple pages demonstrates an inability to succinctly communicate what you have to offer. More often than not, multi-page resumes are far too verbose, and much of the content ends up repetitive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Formatting and other problems unrelated to content
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34qblnvkfmxsb97jdp0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34qblnvkfmxsb97jdp0v.png" alt=" " width="800" height="106"&gt;&lt;/a&gt;How bold of you to bold every other word&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid constant bolding or italicizing of random terms throughout your resume.&lt;/li&gt;
&lt;li&gt;Avoid capitalizing words for no reason, like "Test Plan".&lt;/li&gt;
&lt;li&gt;Do capitalize things that make sense, like "Selenium WebDriver" or "Jira".&lt;/li&gt;
&lt;li&gt;Typos will show a true lack of attention to detail (probably the most important trait for a testing role) so inspect for those closely.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bad content
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqaphts0ygf3p6h8hebs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqaphts0ygf3p6h8hebs.png" alt=" " width="800" height="834"&gt;&lt;/a&gt;This is just painful to see&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Avoid describing experience with something like "Good experience with..." or "Good knowledge of...". There are so many other adjectives other than "good".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get into the job history as soon as possible. A very lengthy list of "skills" bullet points (or humongous table!) will always get skipped over. On the other hand, a quick, eye catching 2-3 liner summary (that avoids sounding too generic) is something I personally like reading if it describes their story in a unique way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Maybe this is a cultural thing, but it is not necessary to see what the candidate looks like. Avoid inserting a photograph on your resume.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And finally, if you don't know how to code, please don't lie about it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I promise you that if you follow some of these tips, your resume will be in the top 5% of resumes that get in front of hiring managers. The rest of 'em are just tossed into the virtual trash bin.&lt;/p&gt;

</description>
      <category>ui</category>
      <category>design</category>
    </item>
  </channel>
</rss>
