Visual regression testing is a powerful technique to ensure our web application looks as expected, even as code changes over time. In this article, we walk through what a visual regression testing tool is, why it’s important, and how we implemented it in our CI/CD pipeline using Playwright, GitHub Actions, and Git LFS at subito.it, Italy’s leading online classifieds platform.
What Is a Visual Regression Testing Tool?
A visual regression testing tool automatically detects changes in the visual appearance of an application or website. It works by taking screenshots of pages or components and comparing them to previously approved "baseline" images. If any unexpected differences are found, the tool flags them for review.
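The comparison step at the heart of such a tool can be sketched in a few lines: given two same-sized RGBA buffers, count the differing pixels and flag the pair when the ratio exceeds a threshold. This is a simplified model of what a pixel-diffing library such as pixelmatch (which Playwright uses under the hood) does; the function names below are ours, not a real API.

```typescript
// Simplified model of the pixel-comparison step (not a real library API):
// count differing pixels between two same-sized RGBA screenshot buffers.
function diffRatio(baseline: Uint8Array, current: Uint8Array): number {
  if (baseline.length !== current.length) return 1; // size change: fully different
  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // 4 bytes per pixel (R, G, B, A); any channel mismatch marks the pixel
    if (
      baseline[i] !== current[i] ||
      baseline[i + 1] !== current[i + 1] ||
      baseline[i + 2] !== current[i + 2] ||
      baseline[i + 3] !== current[i + 3]
    ) {
      differing += 1;
    }
  }
  return differing / (baseline.length / 4);
}

// Flag a screenshot for review, mirroring Playwright's maxDiffPixelRatio option
function needsReview(
  baseline: Uint8Array,
  current: Uint8Array,
  maxDiffPixelRatio = 0.01
): boolean {
  return diffRatio(baseline, current) > maxDiffPixelRatio;
}
```

Real tools add anti-aliasing tolerance and per-channel thresholds on top of this, but the baseline-versus-current comparison is the core idea.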
Why Do You Need It?
CSS and layout changes can have unintended side effects, breaking parts of the UI in subtle ways. For example:
- A global CSS tweak makes buttons unreadable.
- A new font or asset isn’t loaded correctly.
- A component’s layout shifts, breaking alignment.
Visual regression testing helps us:
- Catch these issues early, before they reach production.
- Document intentional UI changes over time.
- Maintain a consistent user experience.
The theory sounds useful, but real examples make the value concrete. Here are a few from our own experience to help you understand what this additional tool catches.
Example 1: Login modal, Submit button not visible
Our "login in place" modal is designed to fit the viewport without showing a vertical scrollbar.
A recent change caused the submit button to be pushed off-screen, making it inaccessible to users (See the left image below).
This login modal is crucial for us at subito.it because it allows users to log in without leaving the current page after an action that requires authentication, such as adding an ad to favorites.
Thanks to our visual regression testing tool, we caught this error.
Both unit tests (using Jest) and integration tests (via Playwright) were still passing because they were both able to click the button programmatically.
Example 2: Forgotten font import in the Home Page
During a recent update to our Home Page, a font import was accidentally omitted from the CSS.
Thanks to visual regression testing, we caught the issue before it went live. The screenshots below show the difference:
You can see how the text looks different due to the missing font, which would have negatively impacted user experience.
Example 3: Bug caused by CSS import reordering in Next.js
Recently, we introduced an ESLint rule to enforce grouping and alphabetical ordering of imports in our files.
The related PR was quite large, and we overlooked the fact that CSS import order matters in Next.js (see docs).
As a result, two CSS rules with the same specificity ended up being swapped in the final generated CSS.
At first glance, the bug was tricky to understand; it even seemed flaky when reviewing the updated snapshot PR:

However, by inspecting the HTML diff in the Playwright report, we noticed that the footer layout had changed because Next.js bundled the CSS rules in a different order:

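The mechanism is easier to see in miniature: with equal specificity, the CSS cascade keeps whichever declaration comes last in the generated stylesheet, so flipping the bundle order flips the winner. The property and values below are invented for illustration.

```typescript
// Toy model of the cascade at equal specificity: for a given property,
// the last declaration in the stylesheet wins.
type Declaration = { property: string; value: string };

function winningValue(
  stylesheet: Declaration[],
  property: string
): string | undefined {
  let winner: string | undefined;
  for (const d of stylesheet) {
    if (d.property === property) winner = d.value; // later declarations override
  }
  return winner;
}

// Bundle order before the import reordering…
const before: Declaration[] = [
  { property: 'margin-top', value: '0' },
  { property: 'margin-top', value: '24px' }, // footer rule, bundled last: wins
];
// …and after alphabetizing the imports swapped the two CSS files.
const after = [...before].reverse();
```

Nothing in either stylesheet changed, only their order in the final bundle, which is why the bug looked flaky at first glance.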
Example 4: Updated carousel component for Mobile site causing an unintentional change on Desktop
In another case, we updated a carousel component to improve mobile usability. However, this change inadvertently affected the desktop carousel, changing the cards’ dimensions.
We noticed the issue because the PR created by the visual regression tool highlighted differences in the desktop view as well. We expected only one file to be changed (the mobile screen), but the PR showed two files changed.
Upon investigation, we found that the desktop carousel cards were now taller than before:
How Did We Implement Visual Regression Testing?
Using Playwright and its visual comparison feature, we implemented visual tests for our pages.
For example, here is a test for our Login page:
```ts
import { expect, test } from '@playwright/test';

test('@only-visual Login', async ({ page }) => {
  test.slow();
  await page.goto('/login_form');
  await page.waitForLoadState('networkidle');
  await page.getByText('Accedi con Google').waitFor({ state: 'visible' });

  // Take a screenshot of the page
  const screenshot = await page.screenshot();

  // Compare the screenshot with the baseline
  expect(screenshot).toMatchSnapshot('login.png', {
    maxDiffPixelRatio: 0.01,
  });
});
```
When implementing visual tests, you will likely need to fine-tune for variables that can invalidate the visual comparison but are outside your control, such as advertising banners, marketing promos, or client-side calls like "recommended ads".
We adopted two approaches:
If a component is outside of our control and not part of the core product experience (for example, a marketing banner), we chose to temporarily hide it during visual testing:
```ts
await page.addStyleTag({
  content: `
    #sticky-cta-container { display: none !important; }
    .sticky-cta-bottom-anchor { display: none !important; }
  `,
});
```
When the component’s size or layout was relevant, or when we wanted to display something in its place, we used Playwright's mask option.
For example, we masked the Google Maps widget and replaced it with a simple placeholder square.
```ts
const googleIframe = page.locator('iframe');
const yatmoMapIframe = page.locator('#map');

const screenshot = await page.locator('#layout').screenshot({
  animations: 'disabled',
  mask: [googleIframe, yatmoMapIframe],
});
```
We also discovered a few additional tips that helped improve the reliability of our visual tests:
Blocking Google Tag Manager (or similar scripts) prevents external resources from being fetched during tests, ensuring consistent screenshots across runs.
```ts
// Block Google Tag Manager to avoid loading external resources
await page.route(/\/gtm\.js/, (route) => route.abort());
```
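Blocking a single script generalizes naturally to a block list of third-party hosts. The matching logic can live in a plain function outside the Playwright callback, which also makes it unit-testable; the host names below are examples, not our actual list.

```typescript
// Example third-party hosts to block during visual tests (illustrative list)
const blockedHosts = ['googletagmanager.com', 'doubleclick.net'];

// True when the request's hostname is a blocked host or one of its subdomains
function shouldBlock(requestUrl: string): boolean {
  const { hostname } = new URL(requestUrl);
  return blockedHosts.some((h) => hostname === h || hostname.endsWith('.' + h));
}

// Wiring it into Playwright would look like this (sketch):
// await page.route('**/*', (route) =>
//   shouldBlock(route.request().url()) ? route.abort() : route.continue()
// );
```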
Another interesting case we encountered involves images using the "lazy" loading attribute.
Because these images load asynchronously, they can cause flaky results.
Here’s an example of what that looks like:

We implemented this helper to override the loading HTML attribute:
```ts
import { Page } from '@playwright/test';

export async function forceLoadLazyImages(page: Page): Promise<void> {
  return page.evaluate(() => {
    for (const image of document.querySelectorAll<HTMLImageElement>(
      'img[loading="lazy"]'
    )) {
      image.setAttribute('loading', 'eager');
    }
  });
}
```
The GitHub Action
To automate our visual regression testing workflow, we use a GitHub Action; it runs automatically whenever a pull request is merged into the main branch.
The workflow performs a full end-to-end process, made up of these key steps:
- Run visual regression tests: the action launches Playwright. If any snapshot doesn’t match the baseline, the job flags that an update is needed.
- Save test results for reporting: regardless of the outcome, all test reports are collected and stored as build artifacts. This allows merging results from all shards later into a single, comprehensive HTML report.
- Update snapshots when differences are detected: when visual mismatches are found, Playwright re-runs in update mode (`--update-snapshots`), refreshing only the changed images.
- Identify and upload modified snapshots: the Action inspects the Git diff to identify exactly which `.png` files changed.
- Merge all reports into a single HTML summary: thanks to Playwright's `merge-reports` command, all blob reports from multiple shards are combined into one HTML report. The final report can be downloaded directly from the workflow artifacts and provides a clear visual summary of all changes.
- Open a PR with updated snapshots: once all changes are ready, the workflow automatically creates a pull request containing only the modified snapshots.
- Notify the author via Slack: finally, the Action sends a Slack notification to the author of the merged PR.
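Identifying the changed snapshots boils down to a plain `git diff` once Playwright has rewritten the images. Here is a sketch of that step in a throwaway repository; the file name is an example, not our actual layout.

```shell
# Demo in a throwaway repo: commit a baseline image, overwrite it
# (as `--update-snapshots` would), then list only the changed .png files.
repo="$(mktemp -d)"
cd "$repo"
git init -q
printf 'baseline-bytes' > login.png
git add login.png
git -c user.email=ci@example.com -c user.name=ci commit -qm 'baseline snapshots'
printf 'updated-bytes' > login.png   # simulate Playwright updating the snapshot
git diff --name-only -- '*.png'      # -> login.png
```

The resulting file list is exactly what the workflow commits to the snapshot-only pull request.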
A note on Git LFS
Visual regression testing involves large binary files (mostly .png images). To keep the repository lightweight and fast to clone, we use Git LFS (Large File Storage), which handles these files efficiently without bloating the main Git history.
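Concretely, tracking the snapshots is a one-time `git lfs track "*.png"`, which writes a rule like the following into `.gitattributes` (the `*.png` pattern is an assumption on our part; scope it to your snapshot directory if other PNGs should stay in plain Git):

```
*.png filter=lfs diff=lfs merge=lfs -text
```

From then on, matching files are stored as lightweight pointers in Git history, with the binary content living in LFS storage.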
If you want to explore the complete YAML configuration, including all commands and conditions for each step, you can check it out here:
Full GitHub Action workflow on GitHub
Conclusions
At subito.it we have developed a robust testing strategy:
- Unit tests for our components using Testing Library and Jest.
- Integration tests for all main user flows, avoiding mocks for backend services except for external providers.
- Recently, we added visual regression tests to support style and color updates, focusing on the most important pages and cases like the in-place login.
Our internal process is simple: if an incident occurs, during the post-mortem we ask, "Could this have been prevented with a visual test?" If the answer is yes, we add a new visual test to our suite.
If you are wondering whether we also take snapshots for single components, for now the answer is no, we only do this for entire pages. "Component level" snapshots are on our backlog, and we will likely use another tool for that (spoiler: Storybook).