DEV Community: Neetin Singh Negi

How I Built a High-Performance Browser Image Processing Pipeline with Web Workers and WebAssembly

Neetin Singh Negi — Sun, 05 Jul 2026 11:47:47 +0000

A deep dive into worker pools, zero-copy transfers, SharedArrayBuffer, scheduling, and the engineering decisions behind a browser-native image processing engine.

Introduction

In my previous article, I explained how I replaced an image-processing backend with WebAssembly and moved the entire optimization pipeline into the browser.

Many readers asked the same question afterward:

"How do you process dozens of large images in parallel without freezing the browser?"

The answer isn't WebAssembly.

It isn't libvips.

And surprisingly, it isn't image compression either.

The hardest part of the entire project wasn't image compression—it was building a worker pool that could process large batches efficiently while keeping memory usage under control.

A naïve implementation quickly runs into problems:

Too many workers compete for CPU.
Decoded images consume far more memory than their file size suggests.
Aggressive parallelism can make the browser unresponsive.

This article is a deep dive into how I designed a browser-native processing pipeline using Web Workers, SharedArrayBuffer, task scheduling, and zero-copy memory transfers.

The Problem: Browser-Native Processing Doesn't Scale Automatically

Processing a single image inside the browser is surprisingly straightforward.

Most modern browsers can easily decode an image, run it through a WebAssembly module, and return the optimized result.

The challenge begins when users stop uploading a single image.

Real-world image optimization tools are rarely used one file at a time. More often, users drag an entire folder into the browser and expect dozens of high-resolution images to begin processing immediately.

That's where browser-native processing becomes much more complicated.

Large images may occupy only a few megabytes on disk, but after decoding they can consume hundreds of megabytes of memory. At the same time, image encoding is computationally expensive, and users still expect the interface to remain responsive while progress updates, previews, and downloads continue to work smoothly.

The obvious solution might seem to be creating more Web Workers.

Unfortunately, that usually makes the problem worse.

More workers mean more decoded images in memory, higher CPU contention, additional garbage collection pressure, and an increased risk of exhausting the browser's available heap.

The challenge isn't simply processing images in parallel.

The real challenge is deciding how much work should run simultaneously, which images should run first, and how memory should be managed while everything is happening.

That realization completely changed the architecture of my application.

Instead of building "an image compressor," I ended up building a scheduling system.

Figure 1. Processing one image is easy. Processing large batches requires scheduling, controlled concurrency, and careful memory management.

Why a Single Worker Isn't Enough

Image optimization tools are rarely used on a single image. More often, users drag an entire folder of photos into the browser and expect everything to be processed at once.

At first glance, the solution seems obvious: either process images one by one, or spin up a worker for each image. In practice, neither approach works well in the browser.

Let's look at both extremes and why we need something smarter.

Figure 2. Neither sequential processing nor unlimited parallelism scales well. Efficient browser-native image processing requires controlled concurrency through a worker pool and intelligent task scheduling.

Overall Pipeline

Once I realized that browser-native image processing was really a scheduling problem rather than a compression problem, the overall architecture became much clearer.

Instead of sending every uploaded image directly to a worker, each image moves through a series of stages designed to maximize throughput while keeping memory usage predictable and the browser responsive.

The complete pipeline looks like this.

Figure 3. Every uploaded image passes through a scheduler before reaching the worker pool. This allows the application to control concurrency, minimize memory pressure, and process images efficiently without blocking the main thread.

Images

Every uploaded image is first converted into a task and placed into a processing queue. Rather than immediately assigning work to a Web Worker, the application waits until resources are available.

This small design decision gives the scheduler complete control over the workload.

Task Scheduler

The scheduler acts as the brain of the entire system.

Instead of simply processing images in the order they arrive, it decides:

Which image should run next.
Which worker should receive the task.
How many images can safely run in parallel.
Whether heavy images should wait while smaller images complete first.

This prevents a handful of very large images from blocking an entire batch.
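A minimal sketch of those scheduling decisions, assuming tasks have already been labelled light or heavy. The names `SchedulerState`, `pickNext`, and `maxHeavy` are hypothetical and not taken from the project:

```typescript
// Hypothetical sketch: choose the next task while capping heavy work.
type Weight = "light" | "heavy";

interface Task {
  id: string;
  weight: Weight;
}

interface SchedulerState {
  lightQueue: Task[];
  heavyQueue: Task[];
  runningHeavy: number; // heavy tasks currently on workers
  maxHeavy: number;     // assumed cap on concurrent heavy tasks
}

function pickNext(state: SchedulerState): Task | undefined {
  // Prefer light tasks so small images finish first.
  const light = state.lightQueue.shift();
  if (light) return light;

  // Start a heavy task only while under the cap.
  if (state.runningHeavy < state.maxHeavy) {
    const heavy = state.heavyQueue.shift();
    if (heavy) {
      state.runningHeavy += 1;
      return heavy;
    }
  }
  return undefined; // nothing runnable right now
}
```

In this sketch the caller is expected to decrement `runningHeavy` when a heavy task completes. Always preferring light tasks is the simplest policy. A real scheduler might also let a heavy task start alongside light ones so large images are not delayed until the end of the batch.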

Worker Pool

Once a task is selected, it is assigned to an available worker from a fixed-size worker pool.

Each worker runs independently on its own thread, allowing multiple images to be processed simultaneously without blocking the browser's main UI thread.

Because the workers are long-lived and reused, the expensive WebAssembly startup cost is paid once per worker instead of once per image. The scheduler can dispatch new tasks almost immediately, rather than repeatedly downloading, initializing, and configuring the WebAssembly runtime.

Rather than creating and destroying workers continuously, the scheduler simply assigns new tasks to workers that have become idle.

SharedArrayBuffer

The WebAssembly runtime is backed by shared memory, so large image buffers can be handed between JavaScript and WebAssembly without repeatedly allocating new ones.

Reducing unnecessary allocations keeps memory usage stable and significantly lowers garbage collection pressure during large batch operations.

WebAssembly + libvips

This is where the heavy work happens.

A WebAssembly build of libvips performs decoding, resizing, compression, format conversion, and encoding directly inside the browser.

The processing engine is the same class of native library commonly used on backend servers—except it's now running entirely on the client.

Output

Once processing finishes, the optimized image is returned to the React application, where users can preview, download, or optionally upload it to cloud storage.

At no point does the image need to pass through a backend server.

This architecture shifts the browser from being a simple user interface into a complete image-processing runtime.

Designing the Worker Pool

Building a worker pool sounds straightforward until you start thinking about everything that can go wrong.

Workers aren't simply "running" or "idle."

In production they constantly move between different states.

Figure 4. The scheduler continuously monitors worker availability and assigns new tasks only to idle workers, ensuring efficient resource utilization without oversubscribing the browser.

Each worker can be:

Idle – waiting for work.
Busy – actively processing an image.
Timed Out – taking longer than expected.
Failed – encountered an unexpected runtime error.
Restarting – being recreated after a failure.

Managing these state transitions turned out to be just as important as image compression itself.
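One way to make those states explicit is a small transition table. This is only a sketch: the five state names come from the list above, but the allowed transitions are an assumption about how such a pool might behave.

```typescript
// Worker lifecycle states from the list above.
type WorkerState = "idle" | "busy" | "timedOut" | "failed" | "restarting";

// Assumed transition rules (not confirmed by the project).
const allowed: Record<WorkerState, WorkerState[]> = {
  idle: ["busy"],
  busy: ["idle", "timedOut", "failed"],
  timedOut: ["restarting"],
  failed: ["restarting"],
  restarting: ["idle"],
};

// Returns the new state, or throws if the move is not permitted.
function transition(from: WorkerState, to: WorkerState): WorkerState {
  if (!allowed[from].includes(to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```

Rejecting invalid transitions makes bugs such as dispatching to a worker that is still restarting fail loudly instead of silently.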

Instead of creating a new Web Worker for every uploaded image, I initialize a fixed-size pool when the application starts.

Those workers stay alive for the lifetime of the session and continuously receive new tasks from the scheduler.

This approach has several advantages:

The WebAssembly runtime is loaded only once per worker.
Memory allocations are reused instead of recreated.
Worker startup overhead disappears after initialization.
Browser resources remain predictable even for very large batches.
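A fixed-size pool can be sketched as follows. The `WorkerSlot` fields mirror the ones used in the `assignTask` code later in this article. `WorkerLike`, `createPool`, and the `spawn` callback are hypothetical names introduced for illustration.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
interface WorkerLike {
  postMessage(message: unknown, transfer?: ArrayBuffer[]): void;
  terminate(): void;
}

interface WorkerSlot {
  worker: WorkerLike;
  currentTaskId: string | null;
  dead: boolean;
}

// Create every worker up front; slots are reused for the whole session.
function createPool(size: number, spawn: () => WorkerLike): WorkerSlot[] {
  return Array.from({ length: size }, () => ({
    worker: spawn(),
    currentTaskId: null,
    dead: false,
  }));
}
```

In a browser, `spawn` would typically wrap `new Worker(...)` pointing at the worker script. Passing it in as a callback also makes it easy to create a replacement worker after a failure.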

Assigning Work

Whenever a worker finishes processing an image, it immediately requests another task from the scheduler.

The scheduler simply finds the next available image and dispatches it to the newly idle worker.

That continuous cycle keeps every worker busy without overwhelming the browser.

Because only idle workers receive new work, concurrency always remains under control regardless of how many images users upload.
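The dispatch cycle described above can be sketched as a single function that is called whenever a task is enqueued or a worker finishes. `Slot`, `dispatch`, and `start` are hypothetical names, and this is an illustration rather than the project's actual scheduler:

```typescript
interface Slot {
  busy: boolean;
}

// Hand queued tasks to idle slots; returns how many tasks were started.
function dispatch<T>(
  queue: T[],
  slots: Slot[],
  start: (slot: Slot, task: T) => void,
): number {
  let started = 0;
  for (const slot of slots) {
    if (slot.busy) continue;          // only idle workers receive work
    const task = queue.shift();
    if (task === undefined) break;    // queue drained
    slot.busy = true;
    start(slot, task);
    started += 1;
  }
  return started;
}
```

Because at most one task is started per idle slot, the number of in-flight tasks can never exceed the pool size, no matter how long the queue is.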

Handling Failures

Production systems need to assume that failures will happen.

A corrupted image, an unexpected WebAssembly error, or a browser limitation should never stall the entire pipeline.

Each task is therefore assigned a timeout.

If a worker stops responding:

The task is marked as failed.
The worker is terminated.
A replacement worker is created.
Remaining tasks continue processing normally.

This fault-tolerant design prevents a single bad image from affecting the rest of the batch.
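The article's code calls a `failSlot` method without showing its body. The following is a guess at what the four steps above might look like as a standalone function. The parameter list, `spawn`, and `onTaskFailed` are assumptions made for this sketch.

```typescript
interface WorkerLike {
  terminate(): void;
}

interface WorkerSlot {
  worker: WorkerLike;
  currentTaskId: string | null;
  timeoutId: ReturnType<typeof setTimeout> | null;
  dead: boolean;
}

function failSlot(
  slots: WorkerSlot[],
  slot: WorkerSlot,
  error: Error,
  spawn: () => WorkerLike,
  onTaskFailed: (taskId: string, error: Error) => void,
): void {
  if (slot.dead) return; // already handled, avoid double recovery
  slot.dead = true;
  if (slot.timeoutId !== null) clearTimeout(slot.timeoutId);

  // 1. Mark the in-flight task as failed.
  if (slot.currentTaskId !== null) onTaskFailed(slot.currentTaskId, error);

  // 2. Terminate the unresponsive worker.
  slot.worker.terminate();

  // 3. Put a fresh worker in the same pool position.
  const fresh: WorkerSlot = {
    worker: spawn(),
    currentTaskId: null,
    timeoutId: null,
    dead: false,
  };
  const index = slots.indexOf(slot);
  if (index !== -1) slots[index] = fresh;
  // 4. The caller's normal dispatch loop continues with the remaining tasks.
}
```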

Zero-Copy Transfers

Once multiple workers were processing images in parallel, another performance problem became obvious.

Moving large image buffers between the main thread and workers wasn't free.

Every unnecessary memory copy increases allocation pressure, consumes additional RAM, and creates more work for the browser's garbage collector. For multi-megabyte images, those costs add up surprisingly quickly.

Instead of copying image data into a worker, I transfer ownership of the underlying ArrayBuffer.

private assignTask(slot: WorkerSlot, task: TaskRecord): void {
  if (slot.dead) {
    this.taskQueue.unshift(task);
    return;
  }

  slot.currentTaskId = task.id;

  if (slot.timeoutId) clearTimeout(slot.timeoutId);

  slot.timeoutId = setTimeout(() => {
    this.failSlot(
      slot,
      new Error("Local processing timed out. Try a smaller image or reload the page."),
    );
  }, task.timeoutMs);

  try {
    // Zero-copy transfer of ArrayBuffer into the worker.
    slot.worker.postMessage(task.request, [task.request.buffer]);
  } catch (error) {
    this.failSlot(slot, error instanceof Error ? error : new Error("Failed"));
  }
}

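The second argument to postMessage() is the transfer list: ownership of each listed buffer moves to the worker instead of a duplicate copy being created, and the buffer can no longer be used on the sending side.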

For large batches, this significantly reduces memory usage and improves responsiveness.
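The effect of a transfer can be observed without a worker. This small demonstration uses `structuredClone` with a transfer list, which follows the same transfer semantics as `postMessage`. The 8 MiB buffer is just a stand-in for image bytes.

```typescript
// Stand-in for an encoded image held on the main thread.
const buffer = new ArrayBuffer(8 * 1024 * 1024);

// Transfer instead of copy: the receiver gets the memory, the sender loses it.
const received = structuredClone(buffer, { transfer: [buffer] });

console.log(received.byteLength); // 8388608
console.log(buffer.byteLength);   // 0, the original buffer is now detached
```

One practical consequence is that the sender cannot reuse the buffer after dispatch, so retrying a failed task requires getting the bytes again, for example by re-reading the original file.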

Figure 5. Instead of copying image data between threads, ownership of the ArrayBuffer is transferred directly to the worker, eliminating unnecessary memory allocations.

Why SharedArrayBuffer Matters

Passing messages between workers is straightforward.

Sharing memory between workers is considerably more powerful.

Without shared memory, every worker maintains its own independent allocations, which quickly increases overall memory consumption during large batch processing.

By enabling SharedArrayBuffer, JavaScript and the WebAssembly runtime can coordinate through a shared memory region instead of constantly allocating new buffers.

This reduces allocation overhead and allows the WebAssembly runtime to reuse memory much more efficiently.

The trade-off is deployment complexity.

Browsers only expose SharedArrayBuffer when the page is running in a Cross-Origin Isolated environment.

That requires enabling both:

Cross-Origin-Opener-Policy (COOP)
Cross-Origin-Embedder-Policy (COEP)

Without those headers, shared memory is disabled entirely, regardless of how the application is written.
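Because shared memory can be unavailable at runtime, it helps to check for it before relying on it. A minimal sketch, where `canUseSharedMemory` is a hypothetical helper name and `crossOriginIsolated` is the standard global flag browsers expose:

```typescript
// True only when SharedArrayBuffer exists and the context is cross-origin isolated.
function canUseSharedMemory(): boolean {
  const scope = globalThis as unknown as { crossOriginIsolated?: boolean };
  return (
    typeof SharedArrayBuffer !== "undefined" &&
    scope.crossOriginIsolated === true
  );
}

if (!canUseSharedMemory()) {
  console.warn("Not cross-origin isolated: shared memory is unavailable.");
}
```

Checking this once at startup gives the application a chance to show a clear message or fall back to another code path instead of failing later during WebAssembly initialization.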

Loading WebAssembly Only Once

Initializing a WebAssembly runtime is surprisingly expensive. Before a single image can be processed, the browser needs to download the module, instantiate the runtime, configure memory, detect runtime capabilities, and initialize libvips.

If every worker repeated that process for every task, startup latency would quickly dominate the overall processing time.

Instead, I lazily initialize the runtime and cache the resulting promise. The first task performs the initialization, while every subsequent task simply waits for the same promise to resolve. This ensures that WebAssembly is loaded only once per worker, regardless of how many images are processed.

let vipsPromise: Promise<VipsRuntime> | undefined;

async function getVips(): Promise<VipsRuntime> {
  if (vipsPromise) return vipsPromise;

  vipsPromise = (async () => {
    const memory = getSharedWasmMemory();
    const vipsEs6Url = `${ORIGIN}/wasm-vips/vips-es6.js`;

    const mod = await import(
      /* webpackIgnore: true */
      vipsEs6Url
    );

    const factory =
      (mod as { default?: unknown }).default ?? mod;

    if (supportsSimd()) {
      console.info("SIMD supported.");
    } else {
      console.warn("SIMD unavailable.");
    }

    const vips = await factory({
      wasmMemory: memory,
      mainScriptUrlOrBlob: vipsEs6Url,
      locateFile: (file: string) => `${ORIGIN}/wasm-vips/${file}`,
    });

    vips.Cache?.maxMem?.(WASM_HEAP_MAX_BYTES);

    return vips;
  })();

  return vipsPromise;
}

Although the implementation is relatively small, it encapsulates several important performance optimizations:

Lazy initialization ensures the runtime is created only when it's actually needed.
Promise caching guarantees that multiple requests share the same initialization instead of creating duplicate WebAssembly instances.
SharedArrayBuffer-backed memory allows the runtime to work with shared memory instead of allocating separate heaps.
Dynamic imports keep the initial application bundle smaller by loading the WebAssembly runtime only when image processing begins.
SIMD detection enables browsers with SIMD support to automatically take advantage of additional CPU instructions for faster image processing.

This initialization happens only once per worker, but it has a significant impact on the overall user experience. By avoiding repeated runtime creation, the application can begin processing the next image immediately instead of paying the WebAssembly setup cost again.
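The promise-caching pattern can be isolated into a small generic helper. This sketch is not the project's code. `lazyOnce` is a hypothetical name, and it adds one behavior the `getVips` function above does not have: a failed initialization is not cached, so a later call can retry.

```typescript
// Wrap an async initializer so concurrent callers share one in-flight promise.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = init().catch((error: unknown) => {
        cached = undefined; // drop the failed attempt so the next call retries
        throw error;
      });
    }
    return cached;
  };
}

// Usage: three callers, one initialization.
let initCalls = 0;
const getRuntime = lazyOnce(async () => {
  initCalls += 1;
  return { ready: true };
});
const first = getRuntime();
const second = getRuntime();
const third = getRuntime();
```

Whether to cache a rejection is a design choice. Caching it, as a plain `if (vipsPromise) return vipsPromise` check does, means a transient failure such as a network error while fetching the module persists until the page is reloaded.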

The Real Bottleneck: Memory

When I started this project, I assumed CPU performance would be the biggest challenge.

Image compression is computationally expensive, so I expected most of my time would be spent optimizing encoder settings and reducing processing time.

I was wrong.

The real bottleneck wasn't CPU—it was memory.

A JPEG that occupies only 10 MB on disk may require 200–300 MB of memory once it's decoded for processing.

That changes the problem completely.

Processing one image is usually straightforward.

Processing ten large images simultaneously can consume several gigabytes of memory surprisingly quickly.

This is where many browser-native image processing experiments begin to fail.
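The arithmetic behind decoded size is simple: width times height times channels times bytes per sample. The dimensions below are illustrative, not measurements from the project.

```typescript
// Bytes needed to hold one fully decoded, uncompressed image.
function decodedBytes(
  width: number,
  height: number,
  channels = 4,        // RGBA
  bytesPerSample = 1,  // 8 bits per channel
): number {
  return width * height * channels * bytesPerSample;
}

// A hypothetical 48-megapixel photo (8000 x 6000).
const bytes = decodedBytes(8000, 6000);
console.log(bytes);                                    // 192000000
console.log((bytes / (1024 * 1024)).toFixed(0) + " MiB"); // 183 MiB
```

That figure is for a single copy of the pixels. Intermediate buffers created during resizing or format conversion can add to it, which is how ten such images in flight can approach several gigabytes.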

An aggressive worker pool might keep every CPU core busy, but it also increases:

Browser heap usage
Garbage collection pressure
Memory fragmentation
Risk of exhausting available memory

Eventually, the browser spends more time reclaiming memory than processing images.

Ironically, adding more workers can make the application slower instead of faster.

That realization completely changed my priorities.

Instead of maximizing throughput at all costs, I focused on keeping memory usage predictable.

Stable performance turned out to be far more valuable than maximum parallelism.

Figure 6. Compressed files are relatively small, but decoding them dramatically increases memory usage. Managing decoded images efficiently became the primary engineering challenge.

Why FIFO Scheduling Wasn't Good Enough

Once memory became the primary constraint, the scheduler became the most important component of the system.

A simple first-in, first-out queue sounds reasonable.

Until someone uploads twenty images where the first file is a 300 MB panorama.

Every smaller image waits behind that single task.

The batch appears frozen even though workers are available.

Instead, the scheduler estimates workload and separates tasks into different queues.

Small images finish quickly, giving users immediate feedback, while larger images continue processing in the background.

The scheduler also adjusts concurrency based on available hardware, ensuring that lower-powered devices aren't overwhelmed while more capable machines can process additional work in parallel.

This simple change dramatically improved perceived performance.

Users no longer had to wait for the largest image before seeing progress.

Instead, optimized images begin appearing almost immediately, making the application feel significantly faster even when total processing time remains similar.
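The hardware-based concurrency adjustment mentioned above could look something like this. It is a sketch under stated assumptions: the function name `chooseConcurrency`, the budget of roughly 2 GiB per worker, and the hard cap of 8 are all illustrative, not values from the project.

```typescript
// Pick a pool size from core count and device memory, both possibly unknown.
function chooseConcurrency(
  cores: number | undefined,
  memoryGiB: number | undefined,
): number {
  // Leave one core for the UI thread; assume 2 cores if unknown.
  const byCpu = Math.max(1, (cores ?? 2) - 1);
  // Assumed budget of about 2 GiB per worker; assume 4 GiB if unknown.
  const byMemory = Math.max(1, Math.floor((memoryGiB ?? 4) / 2));
  // Hard cap (also an assumption) so large machines are not oversubscribed.
  return Math.min(byCpu, byMemory, 8);
}

console.log(chooseConcurrency(8, 8));   // 4
console.log(chooseConcurrency(16, 64)); // 8
```

In a browser the first argument would typically come from `navigator.hardwareConcurrency`. `navigator.deviceMemory` is only available in Chromium-based browsers and reports a coarse, capped value, which is why both inputs are treated as optional here.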

Fault Recovery

Building a fast processing pipeline is only half the problem. It also needs to recover gracefully when something goes wrong.

In practice, workers don't always complete successfully. A corrupted image, an unexpected runtime error, or even a browser-specific issue can leave a worker stuck indefinitely. If a single worker hangs, it can stall the entire processing queue.

Failures are inevitable. Hanging forever isn't.

To prevent that, every task is assigned a timeout when it's dispatched to a worker.

If the timeout expires before the worker returns a result, the scheduler assumes the worker is no longer healthy. The task is marked as failed, the worker is recycled, and a fresh worker takes its place.

This ensures that one problematic image doesn't block every other image waiting in the queue.

slot.timeoutId = setTimeout(() => {
  this.failSlot(
    slot,
    new Error(
      "Local processing timed out. Try a smaller image or reload the page."
    )
  );
}, task.timeoutMs);

The implementation itself is straightforward, but the impact on reliability is significant. The recovery strategy follows four simple steps:

Detect stalled workers with configurable timeouts.
Remove unhealthy workers from the pool.
Create replacement workers automatically.
Continue processing the remaining tasks.

This approach favors reliability over maximum throughput. In a browser environment, keeping the application responsive is often more valuable than squeezing out a few extra milliseconds of performance.
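The same timeout idea can be expressed as a promise wrapper, which makes it easy to see that the timer must be cleared when the task settles first. `withTimeout` and `onTimeout` are hypothetical names for this sketch:

```typescript
// Reject if `work` does not settle within `ms`; always clean up the timer.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout: () => void,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout(); // e.g. terminate and replace the worker
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // finished in time, cancel the pending timeout
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

Rejecting the promise does not stop the work itself. A worker stuck in a long synchronous WebAssembly call cannot be interrupted from outside, so `onTimeout` is where the worker would be terminated and replaced.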

Figure 7. If a worker becomes unresponsive, the scheduler automatically recovers by recycling the worker and continuing with the remaining tasks.

Results

Instead of focusing on compression ratios—which I covered in my previous article—I wanted to evaluate how the architecture behaved under sustained workloads.

The goal wasn't simply to compress images faster. It was to determine whether the browser could remain responsive while processing large batches of high-resolution images in parallel.

After introducing worker pooling, zero-copy transfers, shared memory, and dynamic scheduling, the difference was immediately noticeable.

Users no longer have to wait for an entire batch to finish before seeing results. As workers complete individual tasks, optimized images begin appearing almost immediately, making the application feel significantly more responsive.

The architectural improvements can be summarized like this:

Before	After
Sequential processing	Parallel worker pool
Frequent memory copies	Zero-copy ArrayBuffer transfers
Main thread blocked	Responsive UI
Fixed execution order	Dynamic task scheduling
Workers could stall indefinitely	Automatic timeout & recovery
High memory pressure	Controlled concurrency

None of these improvements came from changing the compression algorithm itself. They came from treating the browser like a runtime rather than just a user interface, and the surrounding architecture is what improved throughput, responsiveness, and overall stability.

The browser now behaves much more like a dedicated processing engine than a traditional web page.

Lessons Learned

When I started this project, I thought performance meant making image compression faster.

By the end, I realized performance is mostly about architecture.

The compression library was already highly optimized. My job wasn't to make libvips faster—it was to build a system that could use it efficiently inside the constraints of a browser.

That meant thinking less about algorithms and more about how work flows through the system.

A few architectural decisions ended up having a far greater impact than any micro-optimization:

Reusing workers instead of constantly creating new ones.
Eliminating unnecessary memory copies with transferable ArrayBuffers.
Sharing memory efficiently with SharedArrayBuffer.
Scheduling work instead of processing images strictly in arrival order.
Limiting concurrency based on available resources instead of maximizing parallelism.
Recovering automatically from stalled workers without interrupting the user.

Individually, none of these techniques are groundbreaking.

Together, they transformed the browser into a runtime capable of handling workloads that I previously assumed required a backend.

That was probably the biggest lesson from the entire project.

Modern browsers aren't just rendering engines anymore. They're increasingly capable application platforms—but getting the best performance out of them requires thinking like a systems engineer rather than a frontend developer.

Conclusion

In my previous article, I showed that modern browsers are capable of replacing an image-processing backend.

This article explored what it actually takes to make that architecture reliable in production.

Moving image processing into the browser isn't simply a matter of compiling native code to WebAssembly. It requires careful attention to worker pools, concurrency, memory management, scheduling, and fault tolerance. WebAssembly makes browser-native image processing possible, but it's the surrounding architecture that makes it practical.

As browser APIs continue to evolve, I expect more traditionally server-side workloads to move to the client.

The interesting question is no longer:

Can the browser do this?

It's becoming:

Does this feature really need a backend anymore?

📚 Browser-Native Image Processing Series

If you're interested in browser-native image processing, this article is part of a two-part series:

Part 1: How I Replaced My Image Processing Backend with WebAssembly

Learn why I moved image processing entirely into the browser and how WebAssembly made it possible.

Part 2: How I Built a High-Performance Browser Image Processing Pipeline with Web Workers and WebAssembly (You're here)

A deep dive into the worker pool, task scheduler, SharedArrayBuffer, zero-copy transfers, and fault recovery that make the architecture production-ready.

How I Replaced My Image Processing Backend with WebAssembly: Building a Browser-Native Image Optimization Engine

Neetin Singh Negi — Fri, 03 Jul 2026 11:50:19 +0000

For years, I assumed every image optimizer needed a backend.

The workflow seemed obvious:

Upload an image.
Compress it on the server.
Download the optimized result.

That's how almost every online image optimization service works.

But while building my own browser-based image optimizer, I kept asking myself one simple question:

If the browser already has the image, why should it upload it just to compress it?

That question completely changed the architecture of the project.

Instead of scaling servers to process images, I moved the entire image-processing pipeline into the browser using WebAssembly, libvips, Web Workers, and SharedArrayBuffer.

What started as an experiment to reduce backend infrastructure quickly became a deep dive into browser performance, memory management, worker scheduling, and modern web platform capabilities.

In this article, I'll explain how I built a browser-native image optimization engine, the engineering challenges I encountered, the trade-offs I had to make, and why browser memory—not CPU—became the hardest problem to solve.

TL;DR

Instead of uploading images to a server, this project processes them entirely inside the browser using WebAssembly, libvips, Web Workers, and SharedArrayBuffer.

Images stay on the user's device, improving privacy while eliminating backend image-processing costs.

Try the Live Demo

Everything described in this article is already running in production.

🌐 https://www.imageoptimizer.org/

Upload a few images, experiment with different output formats, and inspect the processing times yourself.

All image processing happens locally in your browser. Your images remain on your device unless you explicitly choose to export them.

Architecture

Before diving into the implementation, let's look at how the system is structured at a high level and how the browser has become the new image-processing engine.

Traditional Architecture

Nearly every online image optimizer follows the same workflow:

Upload the image
Process it on the server
Store it temporarily
Return the optimized image

This approach is simple and proven, but it introduces several challenges:

Every image must leave the user's device.
CPU usage grows with traffic.
Storage and bandwidth costs increase over time.
Users must trust the service with their files.

I wanted to see if I could remove the most expensive part of the system entirely.

Instead of sending images to the backend, I moved the backend into the browser.

Browser-First Architecture

Modern browsers are capable of much more than many developers realize.

With WebAssembly, Web Workers, SharedArrayBuffer, and modern browser APIs, it's possible to move the entire image-processing pipeline into the client.

That led to the following architecture:

Every optimization happens locally inside the browser.

The backend no longer performs image processing.

Instead, it is responsible for:

Authentication
Presigned upload URLs
Optional cloud exports
Account management

Images are never uploaded for optimization.

Inside the Worker: Processing Pipeline

One challenge with browser-side image processing is keeping the interface responsive.

Running image compression on the main thread quickly causes the UI to freeze, especially when processing multiple large images.

To avoid that, every optimization task runs inside a dedicated Web Worker.

The worker receives image data, processes it using libvips running inside WebAssembly, and returns the optimized result back to the main thread.

SharedArrayBuffer enables efficient communication between workers and the WebAssembly runtime while reducing memory overhead.

Smart Batch Scheduling

The biggest challenge wasn't CPU performance.

It was memory.

A compressed image can expand dramatically once decoded, which means processing several large images simultaneously can exhaust browser memory.

To solve that problem, I implemented a scheduling strategy that separates work into lightweight and heavyweight queues.

Small images are processed aggressively in parallel, while large images are scheduled more conservatively.

The goal isn't maximum throughput.

The goal is maintaining a responsive application without exhausting memory.
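A minimal sketch of splitting a batch into the two queues. The article does not say how workload is estimated, so this example assumes file size as a crude proxy. `ImageTask`, `partition`, and the threshold are hypothetical.

```typescript
interface ImageTask {
  name: string;
  sizeBytes: number;
}

// Split a batch into light and heavy queues by a size threshold.
function partition(
  tasks: ImageTask[],
  heavyThresholdBytes: number,
): { light: ImageTask[]; heavy: ImageTask[] } {
  const light: ImageTask[] = [];
  const heavy: ImageTask[] = [];
  for (const task of tasks) {
    (task.sizeBytes >= heavyThresholdBytes ? heavy : light).push(task);
  }
  // Smallest first, so the first results appear as early as possible.
  light.sort((a, b) => a.sizeBytes - b.sizeBytes);
  return { light, heavy };
}
```

File size only approximates the real cost, since decoded memory depends on pixel dimensions rather than compressed bytes. Reading the image dimensions from the file header would give a better estimate.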

Engineering Highlights

The architecture evolved around several key design decisions:

WebAssembly + libvips for production-grade image processing
Dedicated Worker Pool to keep the UI responsive
Zero-Copy Transfers using transferable ArrayBuffers
Dynamic Concurrency based on device capabilities
Heavy/Light Scheduling to reduce memory pressure
Privacy-First Processing where images never leave the device

Building the Processing Pipeline

Moving image processing into the browser wasn't as simple as compiling libvips to WebAssembly.

The real challenge was building a pipeline that could process multiple large images efficiently without freezing the UI or exhausting browser memory.

That required solving three different problems:

Running native image processing inside the browser.
Communicating efficiently between the main thread and workers.
Enabling shared memory safely.

Running libvips in WebAssembly

The goal wasn't simply to make image processing possible in the browser—it was to bring a production-grade native image-processing pipeline to the web. I chose libvips, one of the fastest image-processing libraries available.

Compiling it to WebAssembly allowed me to reuse a mature native library while keeping all image processing inside the browser.

Initialization happens only once per worker, after which every task on that worker reuses the same runtime.

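let vipsPromise: Promise<VipsRuntime> | undefined;

async function getVips(): Promise<VipsRuntime> {
    // Reuse the in-flight or completed initialization.
    if (vipsPromise) return vipsPromise;

    vipsPromise = (async () => {
        const memory = getSharedWasmMemory();

        // Load the wasm-vips ES module at runtime. The factory is the module's
        // default export, falling back to the module itself.
        const mod = await import(/* webpackIgnore: true */ `${ORIGIN}/wasm-vips/vips-es6.js`);
        const factory = mod.default ?? mod;

        return factory({
            wasmMemory: memory,
            locateFile: (file: string) => `${ORIGIN}/wasm-vips/${file}`,
        });
    })();

    return vipsPromise;
}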

Instead of recreating the runtime for every image, the application lazily initializes it once and reuses the same instance throughout the session.

This significantly reduces startup overhead when processing batches of images.

Why Web Workers Matter

Image compression is computationally expensive.

Running libvips directly on the main thread caused the interface to freeze whenever multiple large images were processed.

The solution was to move every optimization task into a dedicated worker pool.

Each worker operates independently, allowing multiple images to be processed in parallel while the React interface remains responsive.

Zero-Copy Memory Transfers

One optimization that made a surprisingly large difference was avoiding unnecessary memory copies.

Large decoded images can easily occupy hundreds of megabytes in memory.

Copying those buffers between threads quickly becomes expensive.

Instead of copying image data, the application transfers ownership of the underlying ArrayBuffer.

slot.worker.postMessage(task.request, [task.request.buffer]);

The second argument tells the browser to transfer ownership of the buffer instead of cloning it.

That single line dramatically reduces memory pressure during large optimization batches.
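One detail worth knowing when transferring typed arrays: the transfer list takes the underlying `ArrayBuffer`, and transferring it moves the whole buffer, not just the range a view covers. A hedged sketch of a helper that returns a buffer safe to transfer (`toTransferable` is a hypothetical name):

```typescript
// Return an ArrayBuffer containing exactly the bytes of `view`.
function toTransferable(view: Uint8Array): ArrayBuffer {
  const coversWholeBuffer =
    view.byteOffset === 0 && view.byteLength === view.buffer.byteLength;

  // Fast path: the view spans a plain ArrayBuffer end to end, so no copy is needed.
  if (coversWholeBuffer && view.buffer instanceof ArrayBuffer) {
    return view.buffer;
  }

  // Otherwise copy only the viewed bytes into a standalone buffer.
  const out = new ArrayBuffer(view.byteLength);
  new Uint8Array(out).set(view);
  return out;
}
```

The copy in the slow path costs one allocation, but it avoids accidentally detaching a larger buffer that other code still depends on.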

The SharedArrayBuffer Problem

The biggest surprise wasn't image processing.

It was SharedArrayBuffer.

Modern browsers intentionally disable SharedArrayBuffer unless an application runs inside a Cross-Origin Isolated environment.

Without those security headers, workers cannot safely share memory with the WebAssembly runtime.

Initially this was frustrating because everything appeared to work—until shared memory was required.

The solution was enabling two HTTP response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Together, these create a Cross-Origin Isolated environment that allows the browser to expose SharedArrayBuffer safely.

Once those headers were configured correctly, workers could communicate with the WebAssembly runtime much more efficiently while keeping the processing pipeline entirely inside the browser.
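How the headers are set depends on the hosting setup, which the article does not describe. As a framework-agnostic sketch, the two values above can be applied to any response object that exposes a `setHeader` method, as Node's `http.ServerResponse` does. `HeaderSink` and `applyIsolationHeaders` are hypothetical names.

```typescript
// The two headers required for cross-origin isolation.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Minimal shape of a response object that can receive headers.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

function applyIsolationHeaders(res: HeaderSink): void {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
}
```

With `require-corp` in place, cross-origin subresources such as images, scripts, or fonts from a CDN only load if they are served with CORS or a suitable `Cross-Origin-Resource-Policy` header, so third-party assets are worth auditing when enabling isolation.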

Browser Security Shapes Architecture

One lesson I didn't expect was how much browser security policies influence software architecture.

On the backend, memory sharing is usually an implementation detail.

Inside the browser, it's part of the platform itself.

Design decisions such as enabling COOP/COEP, transferring ownership of ArrayBuffers instead of copying them, and initializing a shared WebAssembly runtime all became architectural decisions—not just implementation details.

In the end, the hardest part wasn't getting image compression to work.

It was designing a browser-native processing pipeline that remained fast, memory-efficient, and secure at the same time.

Real-World Benchmarks

Architecture diagrams and code are one thing, but performance is what ultimately determines whether a browser-native approach is practical.

To evaluate the pipeline, I optimized the same 9.4 MB JPEG image into four different output formats using the current WebAssembly implementation. Rather than relying on synthetic benchmarks, I used this real image and measured both compression ratio and processing time for each output format.

JPEG

Original Size: 9.4 MB
Optimized Size: 918 KB
Reduction: 91%
Processing Time: 4.75 seconds

JPEG delivered the best balance between compression ratio and processing time in this test. It reduced the image by 91% while completing in under five seconds, making it an excellent default choice for general photography.

WebP

Original Size: 9.4 MB
Optimized Size: 985 KB
Reduction: 90%
Processing Time: 13.49 seconds

WebP produced a file size comparable to JPEG but required significantly more processing time in my current implementation. This is likely influenced by the encoder configuration and quality settings rather than the format itself, making it an area I plan to continue optimizing.

AVIF

Original Size: 9.4 MB
Optimized Size: 1.2 MB
Reduction: 87%
Processing Time: 6.88 seconds

AVIF achieved an 87% reduction while completing in under seven seconds. In this test its output (1.2 MB) was larger than both the JPEG (918 KB) and WebP (985 KB) results, so the bandwidth savings the format is usually chosen for did not materialize here; that likely reflects my current encoder settings, which I plan to tune.

PNG

Original Size: 9.4 MB
Optimized Size: 6.4 MB
Reduction: 32%
Processing Time: 4.06 seconds

PNG behaved exactly as expected. Because it is a lossless format, dramatic size reductions are naturally more difficult to achieve. Even so, the optimizer reduced the file size by 32% while preserving image fidelity.

What These Results Tell Me

One of the biggest surprises wasn't the compression ratios—it was how different the encoding costs were.

JPEG delivered the best balance of speed and compression, WebP required substantially longer processing time in its current configuration, AVIF provided strong compression with moderate encoding time, and PNG demonstrated the trade-offs inherent to lossless compression.

More importantly, every optimization happened entirely inside the browser.

No image uploads.

No server-side processing.

No backend CPU costs.

For me, that validates the architecture more than any individual benchmark.

Lessons Learned

When I started this project, I assumed the hardest problem would be making WebAssembly fast enough for image processing.

It wasn't.

The biggest challenge turned out to be memory management.

A compressed image that occupies only a few megabytes on disk can expand to hundreds of megabytes once decoded in memory. Processing several large images simultaneously can quickly exhaust the browser's available heap, making scheduling decisions just as important as compression algorithms.
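The arithmetic behind that expansion is easy to check. The sketch below assumes an uncompressed 8-bit RGBA bitmap (4 bytes per pixel) and a hypothetical 8000 x 6000 photo; real decoders may hold extra intermediate buffers on top of this.

```typescript
// Bytes needed for a decoded bitmap, assuming 8-bit RGBA by default.
function decodedBytes(width: number, height: number, bytesPerPixel = 4): number {
  return width * height * bytesPerPixel;
}

// Converts a byte count to mebibytes.
const toMiB = (bytes: number): number => bytes / (1024 * 1024);

// A hypothetical 48-megapixel photo.
const rawBytes = decodedBytes(8000, 6000); // 192000000 bytes
const rawMiB = toMiB(rawBytes); // roughly 183 MiB
```

Four such images decoded at once would need more than 700 MiB before any encoder working memory is counted, which is why the number of in-flight images matters so much.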

That realization completely changed the way I designed the processing pipeline.

Instead of maximizing parallelism at all costs, I focused on building a scheduler that adapts to available hardware, balances lightweight and heavyweight tasks, and prioritizes stability over raw throughput.
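As an illustration of the hardware-adaptive part, concurrency can be capped by both CPU cores and a memory budget. This is a simplified sketch, not the scheduler itself: `PoolLimits`, `pickConcurrency`, and the budget figures are hypothetical, and the balancing of lightweight and heavyweight tasks is left out.

```typescript
interface PoolLimits {
  cores: number; // e.g. navigator.hardwareConcurrency in a browser
  memoryBudgetBytes: number; // hypothetical budget chosen by the app
  peakBytesPerTask: number; // estimated peak memory of one in-flight image
}

// Picks how many images to process at once: never more than the CPU
// allows, never more than the memory budget allows, and at least one.
function pickConcurrency(limits: PoolLimits): number {
  const byCpu = Math.max(1, limits.cores - 1); // leave a core for the UI thread
  const byMemory = Math.max(
    1,
    Math.floor(limits.memoryBudgetBytes / limits.peakBytesPerTask),
  );
  return Math.min(byCpu, byMemory);
}

const MiB = 1024 * 1024;

// Large images: memory is the limit, so only 2 run in parallel on 8 cores.
const largeBatch = pickConcurrency({
  cores: 8,
  memoryBudgetBytes: 1024 * MiB,
  peakBytesPerTask: 400 * MiB,
});

// Small images: the CPU is the limit, so 3 run in parallel on 4 cores.
const smallBatch = pickConcurrency({
  cores: 4,
  memoryBudgetBytes: 1024 * MiB,
  peakBytesPerTask: 50 * MiB,
});
```

Taking the minimum of the two limits is what trades raw throughput for stability: a many-core machine still processes only a couple of very large images at a time.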

Another lesson was how much browser security influences architecture.

Features like SharedArrayBuffer aren't simply APIs you can enable—they require understanding browser security models, Cross-Origin Isolation, and how workers communicate safely with WebAssembly.
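One practical habit that follows from this is checking for isolation at runtime before relying on shared memory. The sketch below is one way to do that; `canUseSharedMemory` and `IsolationGlobals` are names I made up for the example, while `crossOriginIsolated` is the standard global that browsers expose.

```typescript
// The two globals the check depends on. Passing them in keeps the
// function testable outside a browser.
interface IsolationGlobals {
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// True only when the constructor exists and the page is cross-origin isolated.
function canUseSharedMemory(
  scope: IsolationGlobals = globalThis as IsolationGlobals,
): boolean {
  return (
    typeof scope.SharedArrayBuffer === "function" &&
    scope.crossOriginIsolated === true
  );
}

// Simulated environments:
const isolatedPage = canUseSharedMemory({
  SharedArrayBuffer,
  crossOriginIsolated: true,
}); // true
const plainPage = canUseSharedMemory({ crossOriginIsolated: false }); // false
```

A pipeline can use a check like this to fall back to transferring or copying buffers when the required headers are missing, instead of failing at the moment shared memory is first needed.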

Those constraints ultimately shaped the design of the entire application.

Perhaps the most important takeaway, though, is that modern browsers are capable of far more than many developers realize.

With WebAssembly, Web Workers, SharedArrayBuffer, and mature native libraries like libvips, the browser is no longer just a rendering engine.

For workloads like image optimization, it can act as a high-performance application platform capable of replacing an entire image-processing backend.

There is still plenty to improve—better encoder tuning, broader browser compatibility, additional optimization strategies, and more performance benchmarking—but building this project completely changed how I think about browser engineering.

Sometimes the best backend is the one you don't need to build.

Future Work

Although the current implementation already performs all image processing inside the browser, there are several areas I'd like to improve:

SIMD optimization for additional operations
Better AVIF encoder tuning
Progressive decoding for extremely large images
Smarter scheduling based on memory pressure
Additional image formats
More comprehensive benchmarking across browsers and hardware

One advantage of this architecture is that improvements primarily happen inside the browser rather than requiring additional backend infrastructure.

Tech Stack

React
Next.js
TypeScript
WebAssembly
libvips
Web Workers
SharedArrayBuffer
Cross-Origin Isolation (COOP/COEP)

Conclusion

Building this project challenged many assumptions I had about what browsers are capable of.

What started as an experiment in reducing backend infrastructure turned into an exploration of WebAssembly, browser memory management, worker scheduling, and modern web platform capabilities.

If you're building applications that process user files, I'd encourage you to ask the same question that started this project:

Does this really need a backend?

If you've built something similar—or have ideas for improving the architecture—I'd genuinely enjoy hearing your thoughts.

Browser-native computing is evolving quickly, and I think we're only beginning to explore what's possible with WebAssembly and modern browser APIs.

Thanks for reading! If you found this useful, I'd appreciate your feedback.

Try It Yourself

Everything described in this article is already running in production.

🌐 Live Demo: https://www.imageoptimizer.org/

Upload a few images, experiment with different formats, and inspect the processing times yourself.

All image processing happens locally in your browser. Your files remain on your device unless you explicitly choose to export them.

If you've experimented with WebAssembly, Web Workers, or browser-native processing, I'd love to hear your thoughts in the comments.

📚 Browser-Native Image Processing Series

If you're interested in browser-native image processing, this article is part of a two-part series:

Part 1: How I Replaced My Image Processing Backend with WebAssembly

Learn why I moved image processing entirely into the browser, how WebAssembly made it possible, and why modern browsers are now capable of replacing traditional image-processing backends.

Part 2: How I Built a High-Performance Browser Image Processing Pipeline with Web Workers and WebAssembly (You're here)

A deep dive into the worker pool, task scheduler, SharedArrayBuffer, zero-copy transfers, memory management, and fault recovery that make the architecture production-ready.

I Built an Image Optimizer That Actually Feels Fast

Neetin Singh Negi — Mon, 30 Mar 2026 07:20:54 +0000

As developers, we deal with images all the time. Uploads, previews, performance optimization… and somehow it’s always a bit annoying.

So I decided to build something simple:

A tool that optimizes images instantly without friction.

💡 The Problem

Most image optimization tools:

Feel slow
Require multiple steps
Or destroy image quality

When you just want to:

“Reduce image size quickly and move on”

🛠️ The Solution

I built ImageOptimizer — a lightweight tool to:

Compress images in seconds
Maintain quality
Handle different sizes intelligently
Work directly in the browser

No unnecessary steps. No clutter.

⚙️ Tech Stack

Built using:

MERN stack principles
Next.js for performance
Optimized backend processing
Smart credit-based system for scaling

🧠 What I Learned

Building this taught me:

Performance matters more than features
UX simplicity beats complexity
Small tools can solve real problems
Distribution is harder than development

🔥 What’s Next

I’m planning to:

Improve compression strategies
Add batch processing
Optimize for mobile workflows
Explore AI-based optimization

🙌 Feedback Wanted

I’d love honest feedback from the community:

What would you improve?
What’s missing?
Would you actually use this?

If you want to try it out:

imageoptimizer.org

If you're interested in practical tools and building real-world products, I’ll be sharing more of these.

Let’s build things people actually use.