APIVerve

Posted on • Originally published at blog.apiverve.com
Automating Visual Testing With Screenshots

Your unit tests pass. Your integration tests pass. Your end-to-end tests pass. You deploy with confidence.

Then someone messages you: "Why is the checkout button invisible?" Or: "The hero image overlaps the navigation." Or: "Everything is in Times New Roman now."

Visual bugs slip through traditional testing because traditional tests don't look at your application — they interrogate it. They check that elements exist, that functions return values, that APIs respond correctly. They don't notice that your CSS changes made the submit button white on white.

Screenshot-based testing catches what code-based testing misses: the actual visual output that users see.

The Problem With Visual Bugs

Visual bugs are uniquely frustrating because they're obvious to humans and invisible to automated tests.

Consider this scenario: a developer updates a CSS file. The change looks fine on their machine, passes code review (CSS diffs are hard to evaluate), and deploys to production. Later, someone notices that on mobile devices, the "Buy Now" button is scrolled off-screen because a padding value broke responsive layout.

The functional tests all passed — the button existed, the click handler was attached, the checkout flow worked. But real users on real phones couldn't complete purchases because they couldn't see the button.

This isn't hypothetical. I've seen visual bugs:

  • Hide important CTAs
  • Make text unreadable (wrong color contrast)
  • Break layouts on specific browser versions
  • Display wrong images or no images
  • Overlap elements so that one covers another

Each of these passed functional tests. Each cost real money in lost conversions or support tickets.

Screenshot Testing: The Concept

The core idea is simple: take a picture of your application, compare it to a known-good picture, and flag differences.

Baseline screenshots capture your application in a known-good state. These are the reference point.

Test screenshots capture your application after changes — in a PR, after deployment, on a schedule.

Comparison overlays the two images pixel by pixel, highlighting differences. Humans review the differences to determine if they're intentional (you changed the design) or bugs (something broke).

This approach catches an entire category of bugs that code-based tests miss. If it's visible on screen, screenshot testing can detect changes to it.

Capturing Screenshots

The first step is reliably capturing screenshots of your pages.

```javascript
// Assumes Node 18+ (global fetch) and an APIVerve key in the environment.
const API_KEY = process.env.APIVERVE_KEY;

async function captureScreenshot(url, options = {}) {
  const response = await fetch('https://api.apiverve.com/v1/websitescreenshot', {
    method: 'POST',
    headers: {
      'x-api-key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: url,
      width: options.width || 1280,
      height: options.height || 800,
      fullpage: options.fullPage || false,
      type: 'png'
    })
  });

  const { data } = await response.json();
  return data.downloadURL; // URL to download the screenshot
}
```

This gives you a PNG image of the rendered page. But single screenshots aren't enough for comprehensive testing.

What to Screenshot

You can't screenshot everything. Pages have too many states, devices have too many sizes, and comparison has real costs. Focus on high-value targets.

Critical pages. Homepage, pricing, checkout, login, core dashboard views. Pages where visual bugs directly cost money or damage user experience.

Component states. Buttons in hover/active/disabled states. Forms with validation errors showing. Modals open. Dropdowns expanded. Toast notifications visible.

Responsive breakpoints. Desktop (1280px), tablet (768px), mobile (375px) at minimum. More if your design has additional breakpoints.

Browser variations. If you support multiple browsers, capture in each. Safari renders differently than Chrome. Firefox has its own quirks.

For a typical web application, this might mean 20-50 screenshot combinations. That sounds like a lot, but automation handles it.
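Those combinations are easy to enumerate programmatically. A minimal sketch, where the page list and breakpoints are illustrative assumptions:

```javascript
// Enumerate every page x viewport combination to capture.
// The pages and breakpoints below are illustrative, not prescriptive.
const pages = ['/', '/pricing', '/checkout', '/login'];
const viewports = [
  { name: 'desktop', width: 1280, height: 800 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 }
];

function screenshotMatrix(pages, viewports) {
  const combos = [];
  for (const page of pages) {
    for (const vp of viewports) {
      // A stable id doubles as the baseline filename key.
      combos.push({ page, ...vp, id: `${page}@${vp.name}` });
    }
  }
  return combos;
}
```

Each entry can feed straight into `captureScreenshot` with its `width` and `height`.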

The Comparison Process

Pixel-perfect comparison rarely works. Browsers render slightly differently between runs. Anti-aliasing varies. Dynamic content changes.

Instead, use fuzzy comparison that tolerates small differences and flags significant changes:

```javascript
const pixelmatch = require('pixelmatch');
const { PNG } = require('pngjs');

function compareScreenshots(baseline, current, threshold = 0.1) {
  const baselinePng = PNG.sync.read(Buffer.from(baseline, 'base64'));
  const currentPng = PNG.sync.read(Buffer.from(current, 'base64'));

  const { width, height } = baselinePng;

  // pixelmatch throws if the images differ in size, so treat a
  // dimension change as a full mismatch.
  if (currentPng.width !== width || currentPng.height !== height) {
    return { matched: false, mismatchPercent: '100.00', diffImage: null };
  }

  const diff = new PNG({ width, height });

  const mismatchedPixels = pixelmatch(
    baselinePng.data,
    currentPng.data,
    diff.data,
    width,
    height,
    { threshold }  // 0-1, higher = more tolerant
  );

  const mismatchPercent = (mismatchedPixels / (width * height)) * 100;

  return {
    matched: mismatchPercent < 0.1,  // Less than 0.1% difference
    mismatchPercent: mismatchPercent.toFixed(2),
    diffImage: diff  // Visual diff highlighting changes
  };
}
```

The diff image highlights changed pixels, making it easy for humans to see exactly what changed.
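Note that two different tolerances are at work here: pixelmatch's `threshold` controls how different a single pixel's color must be before it counts as mismatched, while the 0.1% figure is a budget for the whole image. Pulling the pass/fail decision into its own helper makes it easy to tune per page (a sketch; the default budget is an assumption):

```javascript
// Decide pass/fail from a raw mismatched-pixel count.
// maxMismatchPercent is the whole-image budget, distinct from
// pixelmatch's per-pixel color threshold.
function passesVisualCheck(mismatchedPixels, width, height, maxMismatchPercent = 0.1) {
  const mismatchPercent = (mismatchedPixels / (width * height)) * 100;
  return {
    passed: mismatchPercent <= maxMismatchPercent,
    mismatchPercent: Number(mismatchPercent.toFixed(2))
  };
}
```

A busy dashboard page might get a 1% budget while a static pricing page gets 0.1%.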

Handling False Positives

Screenshot testing has noise. Sources of false positives include:

Dynamic content. Timestamps, personalized greetings, "Last updated 3 minutes ago." These change between screenshots without representing bugs.

Advertisements. Third-party ad content varies by request. Comparing pages with ads produces constant diffs.

Animations. A spinning loader captured mid-spin looks different each time. Animated elements produce random diffs.

Font rendering. Different machines render fonts slightly differently. Anti-aliasing varies.

Third-party widgets. Chat widgets, analytics overlays, and embedded content may load differently.

Strategies for reducing noise:

Mock dynamic content. Freeze timestamps, use fixed "Hello, Test User" greetings, disable animations.

Exclude regions. Mask out areas with known dynamic content from comparison.

Wait for stability. Add delays to ensure fonts load, animations complete, and async content renders.

Control the environment. Capture in Docker containers with fixed fonts and configurations.
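Excluding regions can be as simple as overwriting the dynamic area with a flat color in both images before comparison. A sketch operating on the raw RGBA buffer that pngjs exposes (the fill color and region shape are assumptions):

```javascript
// Overwrite a rectangular region of raw RGBA pixel data (as produced
// by pngjs) with a flat color. Applying the same mask to baseline and
// current images makes the dynamic area compare as equal.
function maskRegion(data, imageWidth, region, fill = [255, 0, 255, 255]) {
  const { x, y, width, height } = region;
  for (let row = y; row < y + height; row++) {
    for (let col = x; col < x + width; col++) {
      const idx = (row * imageWidth + col) * 4;  // 4 bytes per pixel (RGBA)
      data[idx] = fill[0];
      data[idx + 1] = fill[1];
      data[idx + 2] = fill[2];
      data[idx + 3] = fill[3];
    }
  }
  return data;
}
```

Mask the same coordinates in both images, then hand the buffers to pixelmatch as usual.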

CI/CD Integration

The real value of screenshot testing comes from automation. Run visual tests on every PR:

```yaml
name: Visual Tests
on: [pull_request]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Start preview environment
        run: npm run preview &

      - name: Wait for server
        run: npx wait-on http://localhost:3000

      - name: Capture and compare screenshots
        run: node scripts/visual-test.js
        env:
          APIVERVE_KEY: ${{ secrets.APIVERVE_KEY }}

      - name: Upload diff images on failure
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: visual-diffs
          path: ./diffs/
```

When visual tests fail, the workflow uploads diff images as artifacts. Reviewers can see exactly what changed without running tests locally.

The Review Workflow

Screenshot testing produces diffs. Someone needs to decide if each diff is a bug or an intentional change.

On PR creation: Capture screenshots, compare to baselines, flag differences.

If differences found: PR is blocked (or flagged, depending on your process). Reviewer examines diffs.

If intentional: Reviewer approves the visual changes. Baselines are updated.

If unintentional: It's a bug. Developer fixes it, new screenshots are captured, cycle repeats.

Updating baselines is a deliberate action. Don't auto-update — that defeats the purpose. Someone should consciously decide "yes, this is how it should look now."

Baseline Management

Where do baseline screenshots live? Options include:

In the repository. Simple, versioned with code, but large images bloat repo size.

In cloud storage. S3, GCS, or similar. Keeps repo clean but adds infrastructure.

In a service. Visual testing services like Percy, Chromatic, or Applitools handle storage and comparison.

For most teams starting out, repository storage works. Git stores images as whole blobs rather than diffs, so history grows with every baseline update, but a modest screenshot set is manageable. If repository size becomes a problem, migrate to Git LFS or cloud storage later.

Baseline updates should go through code review. A PR that changes baselines is saying "this is the new expected appearance." Reviewers should see and approve those changes.

Responsive Testing

Different screen sizes reveal different bugs. A layout that works on desktop might collapse on mobile.

Test at least three widths:

  • Desktop: 1280px or 1440px
  • Tablet: 768px
  • Mobile: 375px (iPhone SE width)

Capture each critical page at each width. That multiplies your screenshot count but catches responsive bugs.

Browser Testing

Safari renders differently than Chrome. Firefox has its own quirks. Edge has gotten better but still has differences.

If you officially support multiple browsers, test in each. This typically means:

  • Chrome (most users)
  • Safari (critical for iOS/Mac users)
  • Firefox (smaller percentage but dedicated users)

Browser testing is expensive — it multiplies screenshots again. Prioritize based on your actual user browser distribution.

Performance Considerations

Screenshot testing is slower than unit tests. Capturing a page, comparing images, and storing results takes time.

Strategies for speed:

Parallel capture. Screenshots are independent; capture multiple pages simultaneously.

Incremental testing. On file changes, only test pages affected by those changes. A CSS file change might require full testing; a Python backend change might not need visual tests at all.

Selective testing. Not every PR needs full visual testing. Skip for documentation changes, backend-only changes, etc.

Cloud capture. APIs like ours capture in parallel across fast servers. Faster than spinning up local browsers.
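Parallel capture needs a concurrency cap so you don't flood the capture service or trip rate limits. A minimal limiter (the default of 5 is an arbitrary assumption):

```javascript
// Run async capture jobs at most `limit` at a time, preserving order.
// Each task is a zero-argument function returning a promise.
async function runWithLimit(tasks, limit = 5) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    // `next++` is synchronous, so workers never grab the same index.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

Wrap each page capture as `() => captureScreenshot(url, opts)` and pass the array in.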

Beyond Regression Testing

Screenshot automation has uses beyond testing:

Documentation. Auto-generate screenshots for help docs and README files. Screenshots stay current as the product evolves.

Marketing assets. Capture consistent product screenshots for website, presentations, and ads.

Compliance. Archive what users saw at specific points in time. Useful for regulatory requirements.

Monitoring. Periodically capture production pages and alert on unexpected changes. Detect third-party scripts breaking your layout.

PDF generation. Capture pages as images for PDF reports or printable versions.

When Screenshot Testing Isn't Worth It

Screenshot testing isn't always the right investment.

Early-stage products. If the UI changes daily, baselines are constantly outdated. Testing provides little value when everything is in flux.

Internal tools with few users. If three engineers use an internal dashboard, visual bugs are caught and fixed informally. The overhead isn't justified.

Content-heavy pages. Pages where content changes constantly (news feeds, dashboards) produce constant diffs. Comparison becomes noise.

Limited visual complexity. A command-line tool or API has no visual output to test. Screenshot testing doesn't apply.

Start screenshot testing when your UI has stabilized enough that "it looks like last time" is usually the correct expectation.

Building Confidence

Visual bugs erode user trust. A button that's invisible, text that overlaps, images that don't load — these feel broken in ways that functional bugs don't.

Screenshot testing catches these issues before users do. It's not a replacement for functional testing; it's a complement. Unit tests check logic. Integration tests check flow. Screenshot tests check that the result actually looks right.

Together, they give you confidence that what you deploy actually works — functionally and visually.


Capture any webpage with the Website Screenshot API. Convert HTML directly to images with the HTML to Image API. See your application the way users see it.

