
Thomas Rooney for AWS Community Builders


Make your end to end tests fast

Reflow v4.14.0

If your end-to-end tests are slow, you will either avoid running them, or waste time and delay your release cadence. A fast end-to-end test suite is a valuable asset to your team's productivity.

We released Reflow v4.14.0 today, which reduces our average end-to-end test sequence time by ~70%. This article explains how. Reflow is a low-code tool that helps your team develop and maintain resilient end-to-end tests. We use Playwright under the hood, so everything we've done can be applied whether or not you use Reflow.

Embarrassingly Parallel Testing

Your end-to-end test suite must run the user steps within a single flow sequentially, but should run multiple independent flows in parallel.

In Reflow, we embrace this with our distributed architecture. Test Composition with Pipelines allows N servers to be created to handle N user flows: you just need to design your test suites so they can run in parallel.
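For teams using Playwright directly, the same parallelism can be enabled in the test runner configuration. A minimal sketch (the worker count is illustrative, not a recommendation):

```typescript
// playwright.config.ts — run independent test files, and tests within them, in parallel.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                     // parallelise tests within a single file too
  workers: process.env.CI ? 4 : undefined, // cap concurrency on CI; use the default locally
});
```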

Despite this ethos, synchronous code can easily creep in.

For example, the following code handles a progressive test update that provides real-time user feedback: it takes an array of step updates and executes them sequentially.

for (const action of toUpdate) {
  await queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
    mutation: shallowUpdateActionInContext,
    variables: {
      input: action,
    },
  });
}

This can trivially be re-written to run in parallel.

await Promise.all(toUpdate.map((action) =>
  queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
    mutation: shallowUpdateActionInContext,
    variables: {
      input: action,
    },
  }))
);

Reflow uses AppSync Resolvers with serverless DynamoDB to power our APIs. These scale up and down as needed, so we see no negative impact from doing more in parallel. We fixed this in v4.12.1, and in v4.14.0 we're going further by introducing an additional caching layer to reduce S3 object pushes for screenshot images.
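If your backend doesn't scale as elastically, a fully unbounded Promise.all can overwhelm it. A middle ground is a small concurrency cap; here's a minimal sketch (the helper name is ours, not part of Reflow):

```typescript
// Run async tasks over a list with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "lane" repeatedly claims the next unprocessed index until none remain.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(lanes);
  return results;
}
```

Swapping `Promise.all(toUpdate.map(...))` for `mapWithConcurrency(toUpdate, 8, ...)` keeps the parallelism win while bounding pressure on the API.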

Conditional Stability

Reflow learns about your application and uses that knowledge to keep recorded tests stable. However, keeping recorded tests stable takes time; from v4.14.0, Reflow dynamically skips stability methods by introducing additional logic to determine whether they're necessary.

Prior to v4.14.0, this section of code would always wait for three events after every action: a load event, a networkidle event, and a screenshotStable event.

private async pageStable(baselineAction): Promise<void> {
  try {
    await this.page.waitForLoadState('load', { timeout: 30000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('load')", e);
  }
  try {
    await this.page.waitForLoadState('networkidle', { timeout: 5000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('networkidle')", e);
  }

  await this.screenshotStable(baselineAction?.preStepScreenshot?.image);
}
  1. load - waits for the load event to fire. This event fires once all markup, stylesheets, JavaScript, and static assets such as images and audio have loaded.
  2. networkidle - waits until there have been no network connections for at least 500 ms.
  3. screenshotStable - waits until the screenshot matches a given baseline screenshot, or until the page stops animating (two consecutive screenshots look the same).

Unfortunately, if the page is already stable but has background network requests, this introduces a 5s delay waiting for networkidle to time out. To avoid this, we now only wait for networkidle when the action has historically triggered a navigation, or when the run is executed with an optional flag that enforces all stability checks.

The lesson here is to use stability methods only when truly necessary. Don't add them arbitrarily to your code; add them only where test actions genuinely need stabilising.

Reflow dynamically learns about your application by hooking into DOM and network events. If you want to spend less effort maintaining your end-to-end tests, give it a try.

No more explicit waits

In many companies, when there's an issue with end-to-end stability, a common strategy is to add an await wait(1000) or equivalent simply to delay test action execution.

This is a far greater sin than arbitrarily using stability methods: stability methods exit early once their stability event fires, whereas a fixed wait always pays its full cost. If you're feeling lazy, at the very least try a waitForLoadState before reaching for a wait.

If you're up for writing a bit more code, try writing a waitUntil clause that explicitly waits for some page state to be set once the page is stable. Whilst Reflow supports a wait action, we advise our clients to always use a visual assertion instead. This waits (up to a configurable maximum) until a page element looks like a recorded baseline. If it doesn't, Reflow can be configured to continue or to fail the test.
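A minimal waitUntil sketch along those lines (the helper is illustrative, not a Reflow or Playwright API):

```typescript
// Poll a predicate until it holds or a deadline passes, instead of a fixed sleep.
async function waitUntil(
  predicate: () => boolean | Promise<boolean>,
  { timeoutMs = 10000, intervalMs = 100 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return; // exits as soon as the state is reached
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`waitUntil: condition not met within ${timeoutMs}ms`);
}
```

Unlike wait(1000), this costs only as long as the condition actually takes, and fails loudly when the page never reaches the expected state.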

Inspect Visual Assertions

A few wait(X) statements had crept into the Reflow codebase over time. It's tempting to add them to new features to lazily buy enough stability to ship. In v4.14.0, we've ruthlessly culled all explicit wait invocations from hot code paths, replacing them with dynamic wait times tuned to the application under test.

Move compute outside hot pathways

Reflow uses a visual comparison algorithm (SSIM weighted Pixel Diff) to compute visual changes. It captures full-height web pages, then compares these to baseline images to compute stability and inform the user of page changes.

This is done entirely on the CPU and is therefore relatively slow: for a page 1080px wide and 10,000px tall, we find it can block the main thread for 2-3 seconds. This has been a major performance bottleneck for large applications.

Some of these visual comparisons are needed to compute stability, but those that only provide visual feedback don't need to run on the main thread; instead, we now return a promise that is resolved lazily when needed.

private async screenshotStable(baselineScreenshot: S3ObjectInput | undefined): Promise<{
  diff: Promise<ComparisonModel | undefined>;
  current?: { image: S3ObjectInput };
}> {
/* ... */
}

Move compute into worker threads

By default, all compute in Node.js is single-threaded. It's usually not worth the effort to build multi-threaded applications: most Node.js workloads can be handled by distributing them amongst processes rather than dealing with threads.

In Reflow, because of the compute-heavy nature of image comparison, we've moved it into a worker thread. This ensures the application never halts during a visual comparison: all other asynchronous processes (such as real-time uploads of test progress) run in parallel with image comparison.

We did this via the threads npm package and esbuild. We first moved all of our compute code into a new file with minimal imports, called imageCompare.worker.js. We then added a pre-compilation step with esbuild to compile this file into a bundle. Finally, we spawn the worker from this generated bundle as a blob and interact with it via the threads promise interface.

import fs from 'fs';
import { expose } from 'threads/worker';
import { isMainThread } from 'worker_threads';

/* ... */

const workerExports = {
  configureWorker,
  compareFiles,
  compareScreenshots,
};

if (!isMainThread) {
  expose(workerExports);
}
export type ImageCompareWorkerExports = typeof workerExports;
The worker is then booted and consumed from the main thread:
import { spawn, BlobWorker } from 'threads';

import type { ImageCompareWorkerExports } from './imageCompare.worker';
import { source as workerBlob } from '../../generated/imageCompare.workerSource';
import logger, { getLevel } from '../logger';

let worker: ImageCompareWorkerExports;

export async function bootImageCompareWorker() {
  try {
    // Spawn the worker from the esbuild-generated bundle text.
    worker = await spawn<ImageCompareWorkerExports>(BlobWorker.fromText(workerBlob));
    return worker.configureWorker(getLevel(process?.env?.LOG_LEVEL));
  } catch (e) {
    logger.fatal('Error starting worker', e);
  }
}

export async function compareFiles(imageA: string, imageB: string, outFile: string): Promise<void> {
  return worker.compareFiles(imageA, imageB, outFile);
}

export async function compareScreenshots(preData: Buffer, postData: Buffer, options): Promise<ScreenshotCompareOutput> {
  return worker.compareScreenshots(preData, postData, options);
}

Track your releases, run end-to-end tests once per release

Your end-to-end tests should focus on catching regressions, not heisenbugs: if they pass once on a release, you shouldn't execute them again.

In v4.14.0 we took our first step towards helping QA teams do this: source-map-powered release tracking. Reflow now downloads and hashes all source maps associated with a given release (optionally filtered to application code by a regular expression) to create a version identifier. This means it can track when your application is released, and hence let you determine whether an end-to-end sequence has already executed successfully on that release.
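The core of such a version identifier can be sketched as a stable hash over the fetched source maps (the function and its shape are illustrative, not Reflow's implementation):

```typescript
import { createHash } from 'crypto';

// Derive a deterministic release id from downloaded source maps,
// optionally filtered to application code by a regular expression.
function releaseVersionId(
  sourceMaps: { url: string; content: string }[],
  appCodeFilter?: RegExp,
): string {
  const hash = createHash('sha256');
  sourceMaps
    .filter((m) => !appCodeFilter || appCodeFilter.test(m.url))
    .sort((a, b) => a.url.localeCompare(b.url)) // stable ordering => stable hash
    .forEach((m) => hash.update(m.url).update(m.content));
  return hash.digest('hex');
}
```

Two runs against the same deployment produce the same id, so a passing sequence can be recorded against it and skipped next time.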

This is currently opt-in, as we're still working out how to download source maps without affecting test execution performance when large numbers of them are exposed. We've built a source-map explorer and a simple UI to help our customers see what's currently running within an environment.

TL;DR

  1. Cull all sleep statements: always wait for specific events, and introduce stability methods only when necessary.
  2. Execute your tests in parallel.
  3. Cache test results intelligently; don't bother executing a suite more than once per release.

If you're feeling adventurous, try Reflow: a low-code record/replay test automation SaaS that will try to do all of this for you, fast.

Minor caveat: self-host Reflow for maximum speed. We use AWS Fargate to spin up per-user ephemeral browser instances, which have ~1 minute cold startup times on first use of a test recorder.
