Neetin Singh Negi

Posted on Jul 3 • Edited on Jul 5

How I Replaced My Image Processing Backend with WebAssembly: Building a Browser-Native Image Optimization Engine

#webassembly #performance #javascript #webdev

For years, I assumed every image optimizer needed a backend.

The workflow seemed obvious:

Upload an image.
Compress it on the server.
Download the optimized result.

That's how almost every online image optimization service works.

But while building my own browser-based image optimizer, I kept asking myself one simple question:

If the browser already has the image, why should it upload it just to compress it?

That question completely changed the architecture of the project.

Instead of scaling servers to process images, I moved the entire image-processing pipeline into the browser using WebAssembly, libvips, Web Workers, and SharedArrayBuffer.

What started as an experiment to reduce backend infrastructure quickly became a deep dive into browser performance, memory management, worker scheduling, and modern web platform capabilities.

In this article, I'll explain how I built a browser-native image optimization engine, the engineering challenges I encountered, the trade-offs I had to make, and why browser memory—not CPU—became the hardest problem to solve.

TL;DR

Instead of uploading images to a server, this project processes them entirely inside the browser using WebAssembly, libvips, Web Workers, and SharedArrayBuffer.

Images stay on the user's device, improving privacy while eliminating backend image-processing costs.

Try the Live Demo

Everything described in this article is already running in production.

🌐 https://www.imageoptimizer.org/

Upload a few images, experiment with different output formats, and inspect the processing times yourself.

All image processing happens locally in your browser. Your images remain on your device unless you explicitly choose to export them.

Architecture

Before diving into the implementation, let's look at how the system is structured at a high level and how the browser has become the new image-processing engine.

Traditional Architecture

Nearly every online image optimizer follows the same workflow:

Upload the image
Process it on the server
Store it temporarily
Return the optimized image

This approach is simple and proven, but it introduces several challenges:

Every image must leave the user's device.
CPU usage grows with traffic.
Storage and bandwidth costs increase over time.
Users must trust the service with their files.

I wanted to see if I could remove the most expensive part of the system entirely.

Instead of sending images to the backend, I moved the backend into the browser.

Browser-First Architecture

Modern browsers are capable of much more than many developers realize.

With WebAssembly, Web Workers, SharedArrayBuffer, and modern browser APIs, it's possible to move the entire image-processing pipeline into the client.

That led to an architecture in which every optimization happens locally inside the browser.

The backend no longer performs image processing.

Instead, it is responsible for:

Authentication
Presigned upload URLs
Optional cloud exports
Account management

Images are never uploaded for optimization.

Inside the Worker: Processing Pipeline

One challenge with browser-side image processing is keeping the interface responsive.

Running image compression on the main thread quickly causes the UI to freeze, especially when processing multiple large images.

To avoid that, every optimization task runs inside a dedicated Web Worker.

The worker receives image data, processes it using libvips running inside WebAssembly, and returns the optimized result back to the main thread.

SharedArrayBuffer lets the workers and the WebAssembly runtime read and write the same memory instead of passing copies back and forth, which reduces memory overhead.
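The worker side of this flow can be sketched roughly as follows. The message shapes (`OptimizeRequest`, `OptimizeResponse`) and the injected `encode` function are hypothetical stand-ins chosen for illustration; the real message protocol and the libvips calls are not shown in this article.

```typescript
// Hypothetical message shapes, not the project's real protocol.
type OutputFormat = "jpeg" | "webp" | "avif" | "png";

interface OptimizeRequest {
  id: number;
  format: OutputFormat;
  quality: number;
  buffer: ArrayBuffer; // encoded source image, transferred from the main thread
}

type OptimizeResponse =
  | { id: number; ok: true; buffer: ArrayBuffer }
  | { id: number; ok: false; error: string };

// Stand-in for the libvips-backed encoder; injected so the handler can be tested.
type Encoder = (
  input: Uint8Array,
  format: OutputFormat,
  quality: number,
) => Promise<Uint8Array>;

async function handleRequest(
  req: OptimizeRequest,
  encode: Encoder,
): Promise<{ response: OptimizeResponse; transfer: ArrayBuffer[] }> {
  try {
    const output = await encode(new Uint8Array(req.buffer), req.format, req.quality);
    // Copy into a tightly sized, standalone buffer that is safe to transfer.
    const result = output.slice().buffer;
    return { response: { id: req.id, ok: true, buffer: result }, transfer: [result] };
  } catch (err) {
    // Report failures as data so the main thread can settle the matching task.
    const message = err instanceof Error ? err.message : String(err);
    return { response: { id: req.id, ok: false, error: message }, transfer: [] };
  }
}

// Inside a real worker, the wiring might look like this
// (encodeWithVips is a hypothetical name):
//   self.onmessage = async (event) => {
//     const { response, transfer } = await handleRequest(event.data, encodeWithVips);
//     self.postMessage(response, transfer);
//   };
```

Keeping the handler separate from `self.onmessage` makes it possible to exercise the request and response logic without spinning up a worker.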

Smart Batch Scheduling

The biggest challenge wasn't CPU performance.

It was memory.

A compressed image can expand dramatically once decoded, which means processing several large images simultaneously can exhaust browser memory.

To solve that problem, I implemented a scheduling strategy that separates work into lightweight and heavyweight queues.

Small images are processed aggressively in parallel, while large images are scheduled more conservatively.

The goal isn't maximum throughput.

The goal is maintaining a responsive application without exhausting memory.
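A minimal sketch of the heavy/light idea, under stated assumptions: the 64 MiB cutoff, the 8-bit RGBA size estimate, and the names `BatchScheduler`, `classify`, and `estimateDecodedBytes` are all illustrative rather than the values or code used in production. The estimate is a rough upper bound that assumes the whole image is held uncompressed at once.

```typescript
interface ImageJob {
  id: string;
  width: number;
  height: number;
}

type Lane = "light" | "heavy";

// Assumed cutoff between "light" and "heavy" work.
const HEAVY_THRESHOLD_BYTES = 64 * 1024 * 1024;

// Rough decoded size: width x height x bytes per pixel (4 for 8-bit RGBA).
function estimateDecodedBytes(job: ImageJob, bytesPerPixel = 4): number {
  return job.width * job.height * bytesPerPixel;
}

function classify(job: ImageJob): Lane {
  return estimateDecodedBytes(job) >= HEAVY_THRESHOLD_BYTES ? "heavy" : "light";
}

class BatchScheduler {
  private readonly queues: Record<Lane, ImageJob[]> = { light: [], heavy: [] };
  private readonly running: Record<Lane, number> = { light: 0, heavy: 0 };
  private readonly limits: Record<Lane, number>;
  private readonly run: (job: ImageJob) => Promise<void>;

  constructor(limits: Record<Lane, number>, run: (job: ImageJob) => Promise<void>) {
    this.limits = limits;
    this.run = run;
  }

  enqueue(job: ImageJob): void {
    this.queues[classify(job)].push(job);
    this.pump();
  }

  // Start queued jobs while each lane is below its own concurrency limit.
  private pump(): void {
    for (const lane of ["light", "heavy"] as const) {
      while (this.running[lane] < this.limits[lane] && this.queues[lane].length > 0) {
        const job = this.queues[lane].shift()!;
        this.running[lane]++;
        // Free the slot whether the job succeeded or failed.
        const release = () => {
          this.running[lane]--;
          this.pump();
        };
        this.run(job).then(release, release);
      }
    }
  }
}
```

With limits such as `{ light: 4, heavy: 1 }`, a 1000 x 1000 image (about 4 MB decoded) runs alongside three others, while an 8000 x 6000 image (about 192 MB decoded) waits for the single heavy slot.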

Engineering Highlights

The architecture evolved around several key design decisions:

WebAssembly + libvips for production-grade image processing
Dedicated Worker Pool to keep the UI responsive
Zero-Copy Transfers using transferable ArrayBuffers
Dynamic Concurrency based on device capabilities
Heavy/Light Scheduling to reduce memory pressure
Privacy-First Processing where images never leave the device
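To make "Dynamic Concurrency based on device capabilities" concrete, here is one way a pool size could be derived. This is a sketch under assumptions: the rule of one worker per GiB of reported memory, the fallback values, and the cap of eight workers are illustrative choices, not the heuristics the production scheduler uses.

```typescript
interface DeviceHints {
  hardwareConcurrency?: number; // logical cores, e.g. navigator.hardwareConcurrency
  deviceMemory?: number; // approximate RAM in GiB, e.g. navigator.deviceMemory
}

function pickWorkerCount(hints: DeviceHints, maxWorkers = 8): number {
  // Fall back to a conservative guess when the browser reports nothing.
  const cores = hints.hardwareConcurrency ?? 2;
  // Leave one core for the main thread and the UI.
  const byCpu = Math.max(1, cores - 1);
  // Assumed budget: roughly one worker per GiB of reported memory.
  const byMemory = Math.max(1, Math.floor(hints.deviceMemory ?? 4));
  return Math.min(byCpu, byMemory, maxWorkers);
}
```

In a browser the hints would come from `navigator.hardwareConcurrency` and, where available, `navigator.deviceMemory`. The latter is not exposed by every browser and reports only coarse values, so a fallback is needed either way.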

Building the Processing Pipeline

Moving image processing into the browser wasn't as simple as compiling libvips to WebAssembly.

The real challenge was building a pipeline that could process multiple large images efficiently without freezing the UI or exhausting browser memory.

That required solving three different problems:

Running native image processing inside the browser.
Communicating efficiently between the main thread and workers.
Enabling shared memory safely.

Running libvips in WebAssembly

The goal wasn't simply to make image processing possible in the browser—it was to bring a production-grade native image-processing pipeline to the web. I chose libvips, one of the fastest image-processing libraries available.

Compiling it to WebAssembly allowed me to reuse a mature native library while keeping all image processing inside the browser.

Initializing the runtime only happens once, after which every worker can reuse the same compiled module.

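// factory, getSharedWasmMemory, ORIGIN, and VipsRuntime are defined elsewhere in the project.
// The promise itself is cached, so concurrent callers share a single initialization.
let vipsPromise: Promise<VipsRuntime> | undefined;

async function getVips(): Promise<VipsRuntime> {
    // Reuse the in-flight or completed initialization if one exists.
    if (vipsPromise) return vipsPromise;

    vipsPromise = (async () => {
        // Memory that backs the WebAssembly instance.
        const memory = getSharedWasmMemory();

        return factory({
            wasmMemory: memory,
            // Resolve runtime files (such as the .wasm binary) from the app's own origin.
            locateFile: file => `${ORIGIN}/wasm-vips/${file}`,
        });
    })();

    return vipsPromise;
}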

Instead of recreating the runtime for every image, the application lazily initializes it once and reuses the same instance throughout the session.

This significantly reduces startup overhead when processing batches of images.
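One consequence of caching the promise is that a failed initialization (for example, a failed download of the WebAssembly binary) stays cached as a rejection. A possible variation, which is not part of the code above, clears the cache on failure so that a later call can retry. `lazyAsync` is a hypothetical helper name.

```typescript
// Wraps an async initializer so it runs at most once at a time,
// and forgets a failed attempt so the next caller can try again.
function lazyAsync<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;

  return () => {
    if (!cached) {
      cached = init().catch((err) => {
        cached = undefined; // drop the rejected promise
        throw err; // still surface the error to current callers
      });
    }
    return cached;
  };
}
```

Concurrent callers still receive the same promise, so the single-initialization behaviour is preserved for the success path.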

Why Web Workers Matter

Image compression is computationally expensive.

Running libvips directly on the main thread caused the interface to freeze whenever multiple large images were processed.

The solution was to move every optimization task into a dedicated worker pool.

Each worker operates independently, allowing multiple images to be processed in parallel while the React interface remains responsive.
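A stripped-down pool along these lines might look like the sketch below. `WorkerLike` is a minimal stand-in for the browser's `Worker` so the example can run outside a browser, and the `WorkerPool`, `Slot`, and `Task` shapes are assumptions made for illustration. Error handling and fault recovery are deliberately left out.

```typescript
interface PoolResponse {
  id: number;
  buffer: ArrayBuffer;
}

// Minimal stand-in for the parts of Worker this sketch relies on.
interface WorkerLike {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
  onmessage: ((event: { data: PoolResponse }) => void) | null;
}

interface Slot {
  worker: WorkerLike;
  busy: boolean;
}

interface Task {
  request: { id: number; buffer: ArrayBuffer };
  resolve: (result: ArrayBuffer) => void;
}

class WorkerPool {
  private readonly slots: Slot[];
  private readonly pending: Task[] = [];
  private readonly inFlight = new Map<number, Task>();
  private nextId = 1;

  constructor(workers: WorkerLike[]) {
    this.slots = workers.map((worker) => {
      const slot: Slot = { worker, busy: false };
      worker.onmessage = (event) => this.onResult(slot, event.data);
      return slot;
    });
  }

  // Queue an image and resolve once some worker has returned the result.
  submit(buffer: ArrayBuffer): Promise<ArrayBuffer> {
    return new Promise((resolve) => {
      this.pending.push({ request: { id: this.nextId++, buffer }, resolve });
      this.dispatch();
    });
  }

  // Hand pending tasks to idle slots, one task per slot.
  private dispatch(): void {
    for (const slot of this.slots) {
      if (slot.busy) continue;
      const task = this.pending.shift();
      if (!task) return;
      slot.busy = true;
      this.inFlight.set(task.request.id, task);
      slot.worker.postMessage(task.request, [task.request.buffer]);
    }
  }

  private onResult(slot: Slot, response: PoolResponse): void {
    const task = this.inFlight.get(response.id);
    this.inFlight.delete(response.id);
    slot.busy = false;
    task?.resolve(response.buffer);
    this.dispatch(); // the freed slot can pick up the next pending task
  }
}
```

Matching responses to requests by `id` keeps the main thread free of blocking waits: each `submit` call simply returns a promise that settles when its worker replies.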

Zero-Copy Memory Transfers

One optimization that made a surprisingly large difference was avoiding unnecessary memory copies.

Large decoded images can easily occupy hundreds of megabytes in memory.

Copying those buffers between threads quickly becomes expensive.

Instead of copying image data, the application transfers ownership of the underlying ArrayBuffer.

slot.worker.postMessage(task.request, [task.request.buffer]);

The second argument tells the browser to transfer ownership of the buffer instead of cloning it.

That single line dramatically reduces memory pressure during large optimization batches.
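One detail worth keeping in mind: a transfer always moves the entire underlying ArrayBuffer, even when a typed-array view covers only part of it, and the sender's copy is detached afterwards (its `byteLength` becomes 0). The helper below is a hypothetical sketch (`toTransferable` is not a name from the project) that returns a buffer containing exactly the viewed bytes before it is placed in the transfer list.

```typescript
// Returns an ArrayBuffer holding exactly the bytes visible through `bytes`.
function toTransferable(bytes: Uint8Array): ArrayBuffer {
  const spansWholeBuffer =
    bytes.byteOffset === 0 && bytes.byteLength === bytes.buffer.byteLength;

  if (spansWholeBuffer && bytes.buffer instanceof ArrayBuffer) {
    // The view already covers the full buffer: transfer it as-is, no copy.
    return bytes.buffer;
  }

  // Partial view (or shared memory): copy just the viewed bytes into a
  // new, tightly sized ArrayBuffer so unrelated data is not moved along.
  return bytes.slice().buffer;
}
```

A call such as `worker.postMessage(message, [toTransferable(bytes)])` then stays copy-free in the common case and pays for one copy only when the view is a slice of a larger buffer.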

The SharedArrayBuffer Problem

The biggest surprise wasn't image processing.

It was SharedArrayBuffer.

Modern browsers intentionally disable SharedArrayBuffer unless an application runs inside a Cross-Origin Isolated environment.

Without cross-origin isolation, workers cannot share memory with the WebAssembly runtime.

Initially this was frustrating because everything appeared to work—until shared memory was required.

The solution was enabling two HTTP response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Together, these create a Cross-Origin Isolated environment that allows the browser to expose SharedArrayBuffer safely.

Once those headers were configured correctly, workers could communicate with the WebAssembly runtime much more efficiently while keeping the processing pipeline entirely inside the browser.
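Since the stack includes Next.js, one place these headers could be set is the `headers` option of the Next.js config. The sketch below assumes the headers are applied to every route, which may be broader than what the production setup does. It also includes a small runtime guard; `sharedMemoryAvailable` and `IsolationScope` are hypothetical names.

```typescript
const isolationHeaders = [
  { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
  { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
];

// Shape accepted by the `headers` option in a Next.js config file.
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)", // assumption: isolate every route
        headers: isolationHeaders,
      },
    ];
  },
};

// In next.config.ts this object would be the default export:
//   export default nextConfig;

// Runtime guard: shared memory is usable only when the page (or worker)
// reports cross-origin isolation and SharedArrayBuffer is exposed.
interface IsolationScope {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

function sharedMemoryAvailable(scope: IsolationScope): boolean {
  return scope.crossOriginIsolated === true && typeof scope.SharedArrayBuffer === "function";
}
```

In a browser or worker, the guard would be called with the global scope, where `crossOriginIsolated` is exposed as a read-only boolean. Note that `require-corp` also means cross-origin subresources must opt in through CORS or a `Cross-Origin-Resource-Policy` header, which is one reason everything can appear to work until shared memory is needed.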

Browser Security Shapes Architecture

One lesson I didn't expect was how much browser security policies influence software architecture.

On the backend, memory sharing is usually an implementation detail.

Inside the browser, it's part of the platform itself.

Design decisions such as enabling COOP/COEP, transferring ownership of ArrayBuffers instead of copying them, and initializing a shared WebAssembly runtime all became architectural decisions—not just implementation details.

In the end, the hardest part wasn't getting image compression to work.

It was designing a browser-native processing pipeline that remained fast, memory-efficient, and secure at the same time.

Real-World Benchmarks

Architecture diagrams and code are one thing, but performance is what ultimately determines whether a browser-native approach is practical.

To evaluate the pipeline, I optimized the same 9.4 MB JPEG image into four different output formats using the current WebAssembly implementation.

Benchmark note: These are measurements of a single real-world image rather than a synthetic benchmark. For each output format, I recorded both the compression ratio and the processing time.
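For readers who want to collect comparable numbers, a measurement helper could look roughly like this. It is a sketch, not the harness used for the results in this article: `measure` and the injected `encode` function are hypothetical, and the clock is a parameter so a higher-resolution timer such as `performance.now` can be swapped in.

```typescript
interface Measurement {
  originalBytes: number;
  optimizedBytes: number;
  reductionPercent: number;
  elapsedMs: number;
}

async function measure(
  input: Uint8Array,
  encode: (input: Uint8Array) => Promise<Uint8Array>,
  now: () => number = Date.now,
): Promise<Measurement> {
  const start = now();
  const output = await encode(input);
  const elapsedMs = now() - start;

  return {
    originalBytes: input.byteLength,
    optimizedBytes: output.byteLength,
    // Reduction = share of the original size that was removed.
    reductionPercent: (1 - output.byteLength / input.byteLength) * 100,
    elapsedMs,
  };
}
```

Timing around the whole `encode` call captures decode, processing, and encode together, which matches what a user actually waits for.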

JPEG

Original Size: 9.4 MB
Optimized Size: 918 KB
Reduction: 91%
Processing Time: 4.75 seconds

JPEG delivered the best balance between compression ratio and processing time in this test. It reduced the image by 91% while completing in under five seconds, making it an excellent default choice for general photography.

WebP

Original Size: 9.4 MB
Optimized Size: 985 KB
Reduction: 90%
Processing Time: 13.49 seconds

WebP produced a file size comparable to JPEG but required significantly more processing time in my current implementation. This is likely influenced by the encoder configuration and quality settings rather than the format itself, making it an area I plan to continue optimizing.

AVIF

Original Size: 9.4 MB
Optimized Size: 1.2 MB
Reduction: 87%
Processing Time: 6.88 seconds

AVIF achieved an 87% reduction while completing in under seven seconds. In this test its output was larger than both the JPEG and WebP results, so its compression advantage did not show up with my current encoder settings. Encoder tuning is an area I plan to revisit.

PNG

Original Size: 9.4 MB
Optimized Size: 6.4 MB
Reduction: 32%
Processing Time: 4.06 seconds

PNG behaved exactly as expected. Because it is a lossless format, dramatic size reductions are naturally more difficult to achieve. Even so, the optimizer reduced the file size by 32% while preserving image fidelity.

What These Results Tell Me

One of the biggest surprises wasn't the compression ratios—it was how different the encoding costs were.

JPEG delivered the best balance of speed and compression, WebP required substantially longer processing time in its current configuration, AVIF provided strong compression with moderate encoding time, and PNG was the fastest to encode while demonstrating the trade-offs inherent to lossless compression.

More importantly, every optimization happened entirely inside the browser.

No image uploads.

No server-side processing.

No backend CPU costs.

For me, that validates the architecture more than any individual benchmark.

Lessons Learned

When I started this project, I assumed the hardest problem would be making WebAssembly fast enough for image processing.

It wasn't.

The biggest challenge turned out to be memory management.

A compressed image that occupies only a few megabytes on disk can expand to hundreds of megabytes once decoded in memory. Processing several large images simultaneously can quickly exhaust the browser's available heap, making scheduling decisions just as important as compression algorithms.

That realization completely changed the way I designed the processing pipeline.

Instead of maximizing parallelism at all costs, I focused on building a scheduler that adapts to available hardware, balances lightweight and heavyweight tasks, and prioritizes stability over raw throughput.

Another lesson was how much browser security influences architecture.

Features like SharedArrayBuffer aren't simply APIs you can enable—they require understanding browser security models, Cross-Origin Isolation, and how workers communicate safely with WebAssembly.

Those constraints ultimately shaped the design of the entire application.

Perhaps the most important takeaway, though, is that modern browsers are capable of far more than many developers realize.

With WebAssembly, Web Workers, SharedArrayBuffer, and mature native libraries like libvips, the browser is no longer just a rendering engine.

For workloads like image optimization, it can act as a high-performance application platform capable of replacing an entire image-processing backend.

There is still plenty to improve—better encoder tuning, broader browser compatibility, additional optimization strategies, and more performance benchmarking—but building this project completely changed how I think about browser engineering.

Sometimes the best backend is the one you don't need to build.

Future Work

Although the current implementation already performs all image processing inside the browser, there are several areas I'd like to improve:

SIMD optimization for additional operations
Better AVIF encoder tuning
Progressive decoding for extremely large images
Smarter scheduling based on memory pressure
Additional image formats
More comprehensive benchmarking across browsers and hardware

One advantage of this architecture is that improvements primarily happen inside the browser rather than requiring additional backend infrastructure.

Tech Stack

React
Next.js
TypeScript
WebAssembly
libvips
Web Workers
SharedArrayBuffer
Cross-Origin Isolation (COOP/COEP)

Conclusion

Building this project challenged many assumptions I had about what browsers are capable of.

What started as an experiment in reducing backend infrastructure turned into an exploration of WebAssembly, browser memory management, worker scheduling, and modern web platform capabilities.

If you're building applications that process user files, I'd encourage you to ask the same question that started this project:

Does this really need a backend?

If you've built something similar—or have ideas for improving the architecture—I'd genuinely enjoy hearing your thoughts.

Browser-native computing is evolving quickly, and I think we're only beginning to explore what's possible with WebAssembly and modern browser APIs.

Thanks for reading! If you found this useful, I'd appreciate your feedback. If you've built something similar with WebAssembly or browser-native processing, I'd love to hear about your approach in the comments.

Try It Yourself

Everything described in this article is already running in production.

🌐 Live Demo: https://www.imageoptimizer.org/

Upload a few images, experiment with different formats, and inspect the processing times yourself.

All image processing happens locally in your browser. Your files remain on your device unless you explicitly choose to export them.

📚 Browser-Native Image Processing Series

If you're interested in browser-native image processing, this article is part of a two-part series:

Part 1: How I Replaced My Image Processing Backend with WebAssembly (You're here)

Learn why I moved image processing entirely into the browser, how WebAssembly made it possible, and why modern browsers are now capable of replacing traditional image-processing backends.

Part 2: How I Built a High-Performance Browser Image Processing Pipeline with Web Workers and WebAssembly

A deep dive into the worker pool, task scheduler, SharedArrayBuffer, zero-copy transfers, memory management, and fault recovery that make the architecture production-ready.

DEV Community