From analyzing pixelmatch's bottlenecks to creating a faster algorithm with zero allocations and dynamic block sizing.
The Spark: Visual Testing Performance Pain
It started during a typical day of visual regression testing. I was watching a CI pipeline churn through hundreds of screenshot comparisons, each one taking precious seconds.
I was using pixelmatch: the gold standard for pixel-level image comparison in JavaScript. It's an excellent library that's served the community well for years. But as my test suite grew and image resolutions increased, those milliseconds started adding up to minutes. I thought: "There has to be a better way."
Diving Deep: Analyzing pixelmatch's Architecture
Before jumping into optimization, I needed to understand what pixelmatch was actually doing. I dove into the source code and found a beautifully simple algorithm:
```javascript
// Simplified pixelmatch flow
function pixelmatch(img1, img2, output, width, height, options) {
  // 1. Check if images are completely identical (fast path)
  const len = width * height;
  const a32 = new Uint32Array(img1.buffer, img1.byteOffset, len);
  const b32 = new Uint32Array(img2.buffer, img2.byteOffset, len);

  let identical = true;
  for (let i = 0; i < len; i++) {
    if (a32[i] !== b32[i]) {
      identical = false;
      break;
    }
  }
  if (identical) return 0; // Early exit

  // 2. Pixel-by-pixel comparison using YIQ color space
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Complex color difference calculation...
    }
  }
}
```
Pixelmatch had a clever optimization for completely identical images. It would quickly scan through 32-bit chunks and exit early. But what about partially identical images?
In real-world visual testing scenarios:
- Screenshots often have large unchanged regions (headers, sidebars, backgrounds)
- Only small portions typically change (content areas, buttons, text)
- We were still processing every single pixel even when 80% of the image was identical
The "Aha!" Moment: Coarse-to-Fine Processing
The breakthrough came when I realized we could apply the "identical check" concept at a block level, rather than just the entire image.
What if we could:
- Divide the image into blocks (16x16, 32x32, etc.)
- Quickly identify which blocks are identical using the same 32-bit comparison trick
- Skip pixel-level processing entirely for identical blocks
- Only do the expensive YIQ color analysis on blocks that actually changed
This is a classic coarse-to-fine approach used in computer vision, but I hadn't seen it applied to web-based image diffing libraries.
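The core trick can be sketched in a few lines. `blockIsIdentical` here is an illustrative helper of my own naming, not BlazeDiff's actual API; it applies the same 32-bit word comparison, but only over one block's rows:

```javascript
// Sketch of a block-level identical check. Both images are viewed as
// Uint32Arrays, so each RGBA pixel is compared in a single operation.
function blockIsIdentical(a32, b32, width, startX, startY, endX, endY) {
  for (let y = startY; y < endY; y++) {
    const rowStart = y * width;
    for (let x = startX; x < endX; x++) {
      if (a32[rowStart + x] !== b32[rowStart + x]) return false;
    }
  }
  return true;
}
```

A block that passes this check never reaches the expensive YIQ path at all.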
From Idea to Implementation
Challenge #1: Dynamic Block Sizing
Fixed block sizes would be suboptimal – small images need fine granularity, large images can use bigger blocks for better cache performance.
```typescript
function calculateOptimalBlockSize(width: number, height: number): number {
  const area = width * height;
  // Block size grows roughly with the square root of the image area
  const base = 16;
  const scale = Math.sqrt(area) / 100; // 100 is a tuning constant
  const rawSize = base * Math.sqrt(scale);
  // Snap to the nearest power-of-two block size
  return Math.pow(2, Math.round(Math.log2(rawSize)));
}
```
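As a quick sanity check, here is the same sizing formula restated in plain JavaScript (the function name is mine) with a few representative resolutions; the outputs follow directly from the formula:

```javascript
// Plain-JS restatement of the sizing formula above, for illustration only.
function optimalBlockSize(width, height) {
  const base = 16;
  const scale = Math.sqrt(width * height) / 100; // 100 is a tuning constant
  const rawSize = base * Math.sqrt(scale);
  return 2 ** Math.round(Math.log2(rawSize)); // snap to a power of two
}

console.log(optimalBlockSize(100, 100));   // → 16 (small image, fine granularity)
console.log(optimalBlockSize(800, 600));   // → 32
console.log(optimalBlockSize(1920, 1080)); // → 64 (large image, bigger blocks)
```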
Challenge #2: Efficient Memory Management
Initial prototypes were actually slower than pixelmatch because I was creating dynamic arrays and objects to store block information. The allocation overhead was killing performance.
The solution? Store changed coordinates only and preprocess blocks:
```javascript
// Instead of storing block info in dynamically grown arrays...
const changedBlocks = [];   // allocation overhead on every push
const identicalBlocks = [];

// ...preallocate one flat Int32Array, sized for the worst case:
const maxBlocks = Math.ceil(width / 8) * Math.ceil(height / 8);
const changedBlockCoords = new Int32Array(maxBlocks * 4); // x, y, endX, endY
let changedCount = 0;

// Pass 1: classify blocks
for (let by = 0; by < blocksY; by++) {
  for (let bx = 0; bx < blocksX; bx++) {
    if (blockIsIdentical(startX, startY, endX, endY)) {
      processIdenticalBlock(); // Draw gray pixels immediately
    } else {
      processChangedBlock(); // Record coords in changedBlockCoords
    }
  }
}

// Pass 2: run pixel-level comparison on changed blocks only
for (let blockIdx = 0; blockIdx < changedCount; blockIdx++) {
  // read x, y, endX, endY from changedBlockCoords[blockIdx * 4 ...]
}
```
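To make the pattern concrete, here is a self-contained sketch of the zero-allocation classification pass (all names are mine, not BlazeDiff's): it scans two images block by block, records changed-block coordinates in one preallocated flat Int32Array, and allocates nothing inside the loops:

```javascript
// Self-contained sketch of the two-pass, zero-allocation pattern.
// Images are Uint32Array views: one 32-bit word per RGBA pixel.
function findChangedBlocks(a32, b32, width, height, blockSize) {
  const blocksX = Math.ceil(width / blockSize);
  const blocksY = Math.ceil(height / blockSize);
  // Single up-front allocation sized for the worst case: every block changed.
  const coords = new Int32Array(blocksX * blocksY * 4);
  let count = 0;

  for (let by = 0; by < blocksY; by++) {
    for (let bx = 0; bx < blocksX; bx++) {
      const startX = bx * blockSize;
      const startY = by * blockSize;
      const endX = Math.min(startX + blockSize, width);
      const endY = Math.min(startY + blockSize, height);

      let identical = true;
      for (let y = startY; y < endY && identical; y++) {
        for (let x = startX; x < endX; x++) {
          if (a32[y * width + x] !== b32[y * width + x]) {
            identical = false;
            break;
          }
        }
      }
      if (!identical) {
        const o = count * 4;
        coords[o] = startX;
        coords[o + 1] = startY;
        coords[o + 2] = endX;
        coords[o + 3] = endY;
        count++;
      }
    }
  }
  return { coords, count };
}
```

Flat coordinates plus a count are all the second pass needs; no objects, no GC pressure.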
Challenge #3: Maintaining Pixel-Perfect Accuracy
The block optimization couldn't change the final result – it had to produce identical output to pixelmatch. This meant:
- Same YIQ color space calculations
- Same anti-aliasing detection
- Same output pixel colors
- Only the processing order could change
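For reference, the per-pixel metric both libraries share looks roughly like this: convert each RGB value to the YIQ color space and take a weighted squared distance. The coefficients below are taken from pixelmatch's source; alpha blending is omitted here for brevity:

```javascript
// Weighted YIQ color difference, approximately as in pixelmatch's source
// (alpha handling left out to keep the sketch short).
function colorDelta(r1, g1, b1, r2, g2, b2) {
  const y1 = r1 * 0.29889531 + g1 * 0.58662247 + b1 * 0.11448223;
  const y2 = r2 * 0.29889531 + g2 * 0.58662247 + b2 * 0.11448223;
  const i1 = r1 * 0.59597799 - g1 * 0.27417610 - b1 * 0.32180189;
  const i2 = r2 * 0.59597799 - g2 * 0.27417610 - b2 * 0.32180189;
  const q1 = r1 * 0.21147017 - g1 * 0.52261711 + b1 * 0.31114694;
  const q2 = r2 * 0.21147017 - g2 * 0.52261711 + b2 * 0.31114694;
  const dy = y1 - y2, di = i1 - i2, dq = q1 - q2;
  // Brightness (Y) differences are weighted most heavily
  return 0.5053 * dy * dy + 0.299 * di * di + 0.1957 * dq * dq;
}
```

Since identical-block skipping never touches this function's inputs or outputs, the diff image stays bit-for-bit the same.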
Going Live
After solving the core algorithmic challenges, I released BlazeDiff under the MIT license as a minimal ecosystem:
- `@blazediff/core` – the core algorithm
- `@blazediff/pngjs-transformer` – PNG support using pngjs
- `@blazediff/sharp-transformer` – high-performance image processing with Sharp
- `@blazediff/bin` – transformers + core algorithm via CLI or programmatically
This modular approach means you only install what you need, keeping bundle sizes minimal.
Built for Today's Ecosystem
While analyzing pixelmatch, I noticed it was built with the tooling standards of its era. Don't get me wrong – it works perfectly. But I saw an opportunity to leverage modern development practices that could improve both developer experience and performance.
pixelmatch approach:
- JavaScript with JSDoc comments
- npm for package management
- Basic build setup
- Single package architecture
BlazeDiff approach:
- TypeScript for type safety and better DX
- pnpm for efficient dependency management
- tsup for lightning-fast builds
- Monorepo architecture with multiple focused packages
Why does it matter?
- Crystal clear APIs and complete IntelliSense with TypeScript
- TypeScript's compile-time safety prevents common mistakes
- Bundle size wins with composable monorepo packages: `@blazediff/core` is only 8.59 kB, compared to pixelmatch's 19.4 kB
The Results: 20-60% Performance Improvement
After weeks of optimization and benchmarking:
- Around a 20-25% boost when using the `@blazediff/core` package
- Around a 60-99% boost when using `@blazediff/bin` with `@blazediff/sharp-transformer`
You can find benchmark results in the GitHub Action of the BlazeDiff repository.
Top comments (2)
Thanks for sharing—bookmarking this for later.
Hey great work, gonna try it out for my CI.
Do you plan on documenting how you envision integrating it into regression-testing environments?