Using AI in Playwright Tests

#ai #playwright #testing #automation

It's been a long while since I've posted anything. Excuse the clickbaity title, but I think this content delivers. And no, this post wasn't written by AI!

I'll keep this pretty short and sweet though. The idea was inspired by the mobile automation framework Maestro. Beyond its solid capabilities as a mobile test framework (using YAML, of all things), I was impressed to see a command named assertWithAI. Imagine just prompting an LLM with "assert that a blue-colored Login button appears at the bottom". A real game changer in testing if you ask me!

That API is gatekept behind a paid subscription, so I didn't experiment much from there. But I was curious to see whether Playwright was anywhere close to implementing something like it, since it is maintained by Microsoft, and Microsoft has its own large stake in OpenAI. At the time of writing, it does not.

I gained access to an enterprise OpenAI account, so I decided to experiment with its API. Here is a helper util that screenshots the current page and asks the model a yes/no question about it, which brings AI into visual testing.

import OpenAI from 'openai'
import fs from 'fs'
import path from 'path'
import { Page } from '@playwright/test'
import { uniqueId } from './stringHelper'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY as string,
})

export const askAI = async (page: Page, question: string): Promise<string> => {
  const fileName = `screenshot-ai-${uniqueId()}.png`
  const absPath = path.resolve(fileName)
  await page.screenshot({ path: absPath })
  const imageBase64 = fs.readFileSync(absPath, 'base64')
  const dataUrl = `data:image/png;base64,${imageBase64}`

  const instructions = [
    'You are helping test a web page from a screenshot.',
    'Please answer in this format:',
    'First line: YES or NO only.',
    'Second line: One short sentence explaining your reasoning.',
    'If you are not confident, please answer NO.'
  ].join('\n')

  try {
    const response = await openai.responses.create({
      model: 'gpt-4.1-mini', // A lower-cost model; swap in whatever fits your budget
      instructions,
      input: [
        {
          role: 'user',
          content: [
            {
              type: 'input_text',
              text: `Question: ${question}`,
            },
            {
              type: 'input_image',
              image_url: dataUrl,
              detail: 'low', // Really skimping on the dollars
            },
          ],
        },
      ],
      temperature: 0, // Minimize randomness so answers are more repeatable
      max_output_tokens: 128, // Keep responses short & cheap
    })

    const answer = response.output_text?.trim() || ''

    console.log('\n[AI PAGE CHECK]')
    console.log('Question:', question)
    console.log('Full AI answer:\n', answer)
    console.log('Screenshot file:', absPath, '\n')

    return answer
  } catch (err: any) {
    // Check rate limit
    if (err.status === 429) {
      console.log(err)
      throw new Error('OpenAI rate limit reached.')
    }
    throw err
  }
}

export const checkAIResponse = (aiResponse: string, expected: boolean) => {
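  // Only the first line carries the verdict. Match "yes" as a whole word at the
  // start so a reply like "NO, the eyes icon is missing" is not read as YES.
  const firstLine = aiResponse.trim().split('\n')[0].toLowerCase()
  const aiSaidYes = /^\W*yes\b/.test(firstLine)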

  // Expected an answer of 'Yes' but AI said 'No'
  if (expected && !aiSaidYes) {
    throw new Error(`Expected YES but got:\n${aiResponse}`)
  }

  // Expected an answer of 'No' but AI said 'Yes'
  if (!expected && aiSaidYes) {
    throw new Error(`Expected NO but got:\n${aiResponse}`)
  }
}
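
Because the whole check hinges on the first line of the reply, one optional hardening is to parse it into three states, so that an empty or off-format reply is reported as unclear instead of being quietly counted as a NO. This is only a sketch: `Verdict` and `parseVerdict` are hypothetical names and are not part of the util above.

```typescript
// Hypothetical three-state parser for the "YES or NO only" first line.
type Verdict = 'YES' | 'NO' | 'UNCLEAR'

const parseVerdict = (aiResponse: string): Verdict => {
  // Take the first line only; the second line is the model's explanation
  const firstLine = (aiResponse.trim().split('\n')[0] ?? '').toUpperCase()
  // Drop punctuation and spaces so "YES." or "**NO**" still parse
  const token = firstLine.replace(/[^A-Z]/g, '')
  if (token === 'YES') return 'YES'
  if (token === 'NO') return 'NO'
  // Anything else (empty reply, extra words) is treated as off-format
  return 'UNCLEAR'
}
```

A test could then fail with a distinct message when the verdict is 'UNCLEAR', which makes a prompt or format problem easier to tell apart from a genuine visual mismatch.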

And then writing the tests is really straightforward, powerful, and actually pretty fun:

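import { test, expect } from '@playwright/test'
// Assumed path: adjust to wherever the helper above lives in your project
import { askAI, checkAIResponse } from './aiHelper'

test('AI confirms a map is visible on the page', async ({ page }) => {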
  const mapLink = 'https://www.mysite.com/tracking/b3cd29b39b'
  await page.goto(mapLink)
  await expect(page.locator('[aria-label="Map"]')).toBeVisible()

  const aiAnswer = await askAI(page,
    'Is there a map and a visible plotted route that starts in Ann Arbor and ends in Detroit?'
  )

  checkAIResponse(aiAnswer, true) // Synchronous, so no await is needed
})
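
Even at temperature 0, a model reply can occasionally differ between runs. If that turns into flakiness, one possible mitigation is to ask the same question a few times and go with the majority. The sketch below is an assumption on my part, not something the util above does, and `majoritySaysYes` is a hypothetical name. Keep in mind that every extra attempt is another paid API call.

```typescript
// Hypothetical helper (not part of the original util): given several raw
// replies to the same question, report whether a strict majority begin with YES.
const majoritySaysYes = (replies: string[]): boolean => {
  const yesCount = replies.filter((reply) => /^\s*yes\b/i.test(reply)).length
  // Strict majority: ties and empty input count as "not yes"
  return yesCount * 2 > replies.length
}

// Possible usage inside a test, asking sequentially to go easy on rate limits:
//   const replies: string[] = []
//   for (let i = 0; i < 3; i++) replies.push(await askAI(page, question))
//   expect(majoritySaysYes(replies)).toBe(true)
```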

Playwright snapshots can only take you so far, and they become flaky when the content they capture is constantly changing. If your app has assets that change frequently and are difficult to automate (but easy to confirm visually), those are the best spots for this kind of AI assist.
