Moving FFmpeg to the Browser: How I Saved 100% on Server Costs Using WebAssembly

#javascript #performance #privacy

By Baojian Yuan

I am a part-time indie developer and a dad based in Shanghai. When I’m not changing diapers for my 3-year-old daughter, I am usually building AI tools or optimizing workflows.

Recently, I needed a simple tool to convert massive audio files (WAV to MP3) for a local ASR (Automatic Speech Recognition) project. I looked at existing online converters and immediately hit two roadblocks:

Privacy: Uploading private meeting recordings to a random server feels wrong. I have no idea where that data goes or how long it stays there.
Speed: Uploading a 500MB file takes forever before the actual processing even starts.

I sat there looking at my laptop's specs—an Intel Core Ultra 9 with 32GB of RAM—and thought: "Why am I paying AWS for computing power when the user has a perfectly good CPU sitting idle?"

So, I decided to port FFmpeg to the browser using WebAssembly (WASM). The goal was simple: Zero server uploads, 100% privacy, and $0 server bills.

Here is how I built LocalAudioConvert.com, the technical hurdles I faced, and why I believe "Local First" is the future of utility apps.

The Challenge: The Browser is Not an OS

Running FFmpeg—a heavy, complex C library—inside Chrome isn't straightforward. While tools like ffmpeg.wasm (powered by Emscripten) exist, making them production-ready for large files requires solving several engineering nightmares.

1. The SharedArrayBuffer Headache

To make video/audio conversion bearable in a browser, you need multi-threading so FFmpeg can use multiple cores. WASM threads are built on SharedArrayBuffer (which allows threads to share memory), and modern browsers only expose it to pages that are cross-origin isolated, a restriction introduced to mitigate Spectre-style attacks.

If you just drop the WASM file in, the multi-threaded build won't work. You have to configure your static file server (Nginx, Vercel, or Netlify) to send specific response headers:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin

The Trade-off: With these headers in place, the page can only embed cross-origin resources that explicitly opt in, either through a Cross-Origin-Resource-Policy header or through CORS. This broke my external image loading for a while (e.g., loading avatars from a CDN). I had to proxy those resources or ensure they were served from the same origin. It’s a classic security vs. convenience trade-off, but necessary for performance.
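Here is a minimal sketch of how the header setup and a runtime check could fit together. The helper names (withIsolationHeaders, canUseThreads) and the IsolationScope shape are made up for illustration and are not part of ffmpeg.wasm or any server framework. The only real browser features involved are the crossOriginIsolated flag and the SharedArrayBuffer constructor.

```typescript
// The two response headers that opt a page into cross-origin isolation.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Hypothetical helper: merge the isolation headers into whatever headers
// a static file server is already sending for the page.
function withIsolationHeaders(
  existing: Record<string, string> = {},
): Record<string, string> {
  return { ...existing, ...ISOLATION_HEADERS };
}

// Minimal shape of the globals we inspect, declared here so the check
// can also be exercised outside a browser.
interface IsolationScope {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// Returns true only when the page is cross-origin isolated AND
// SharedArrayBuffer is actually exposed. If this returns false, an app
// could fall back to a single-threaded build instead of failing.
function canUseThreads(scope: IsolationScope): boolean {
  return (
    scope.crossOriginIsolated === true &&
    typeof scope.SharedArrayBuffer === "function"
  );
}
```

In a page, the check would be run against the global scope (window, or self inside a worker) before picking which FFmpeg build to load. That way a misconfigured host degrades to slower conversion rather than a broken page.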

2. The Memory OOM (Out of Memory)

Browsers are notoriously stingy with WebAssembly memory allocation. In my early tests, when I dragged in a 1GB WAV file, the WASM instance would immediately crash with an Out of Memory error. Chrome doesn't let you allocate unlimited heap: a 32-bit WASM module can address at most 4GB, and the input, the output, and FFmpeg's own buffers all have to fit inside it.

The Strategy: Instead of loading the entire file into the WASM virtual file system (MEMFS) at once, I implemented a chunking mechanism.

Read the file from the user's disk in small chunks.
Feed the buffer into the WASM heap.
Process and flush.

This keeps the memory footprint low, regardless of the input file size.
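The three steps above can be sketched roughly as follows. This is not the code behind the site. SliceableSource is a hypothetical stand-in for a browser File or Blob (both have size and slice()), and consume is a hypothetical callback standing in for whatever feeds bytes into the WASM side and flushes the result.

```typescript
// Structural stand-in for a browser File/Blob: anything with a size and
// a slice() whose result can be read as an ArrayBuffer.
interface SliceableSource {
  readonly size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Compute the [start, end) byte ranges that cover totalSize in steps of
// chunkSize. The final range may be shorter than chunkSize.
function chunkRanges(
  totalSize: number,
  chunkSize: number,
): Array<[number, number]> {
  if (!(chunkSize > 0) || Math.floor(chunkSize) !== chunkSize) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalSize)]);
  }
  return ranges;
}

// Read one chunk at a time and hand it to the consumer. Only one chunk
// is held by this loop at any moment, so memory use here tracks
// chunkSize rather than the size of the whole file.
async function processInChunks(
  source: SliceableSource,
  chunkSize: number,
  consume: (chunk: Uint8Array, offset: number) => Promise<void>,
): Promise<number> {
  let processed = 0;
  for (const [start, end] of chunkRanges(source.size, chunkSize)) {
    const buffer = await source.slice(start, end).arrayBuffer();
    await consume(new Uint8Array(buffer), start);
    processed += end - start;
  }
  return processed; // total bytes handed to the consumer
}
```

Awaiting consume before reading the next slice is the important design choice. It applies back-pressure, so a slow encoder never ends up with a backlog of chunks piling up in memory.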

Performance: Native vs. WASM vs. Cloud

The biggest question I get is: "Is it slower than native FFmpeg?"

The short answer is: Yes. WebAssembly is fast, but it still has overhead compared to native C code running directly on the OS.

However, the more important question is: "Is it slower than Cloud Converters?" The answer is: No.

Here is a rough benchmark for a 100MB WAV to MP3 conversion:

Native FFmpeg (M1 Mac): ~0.8 seconds
WASM (Browser): ~4.5 seconds
Traditional Online Tool: ~45 seconds (30s Upload + 5s Process + 10s Download)
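To make the arithmetic behind those numbers explicit, here is a tiny sketch that uses only the figures from the list above. The function and field names are made up for illustration.

```typescript
// Timings for a conversion that goes through a remote server.
interface CloudTimings {
  uploadSeconds: number;
  processSeconds: number;
  downloadSeconds: number;
}

// End-to-end time for the cloud route is the sum of all three phases.
function cloudTotalSeconds(t: CloudTimings): number {
  return t.uploadSeconds + t.processSeconds + t.downloadSeconds;
}

// How many times faster the candidate is than the baseline.
function speedup(baselineSeconds: number, candidateSeconds: number): number {
  return baselineSeconds / candidateSeconds;
}

// Figures taken from the rough benchmark above.
const cloudSeconds = cloudTotalSeconds({
  uploadSeconds: 30,
  processSeconds: 5,
  downloadSeconds: 10,
}); // 45 seconds in total
const wasmSeconds = 4.5;
const nativeSeconds = 0.8;

// WASM vs. cloud: 45 / 4.5, i.e. about 10x faster end to end.
const wasmVsCloud = speedup(cloudSeconds, wasmSeconds);
// Native vs. WASM: 4.5 / 0.8, i.e. native is between 5x and 6x faster.
const nativeVsWasm = speedup(wasmSeconds, nativeSeconds);
```

Note that the cloud tool's 5 seconds of actual processing is about the same as the WASM run. Almost all of the difference comes from the network transfer, not from the encoding itself.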

The User Experience Win: The "perceived latency" is significantly lower because processing starts instantly, with no upload progress bar to sit through. In the benchmark above, that makes the WASM route about ten times faster end to end, and the gap only widens for a user on slow coffee shop Wi-Fi.

The "Killer" Feature: Batch Processing

Most online converters limit you to 1 or 2 files at a time. This isn't a technical limitation; it's a financial one. They don't want you hogging their server CPU.

Since I am using your CPU, I don't care how many files you convert.

I built a queue system using Web Workers. This allows users to drop in 100+ files at once. The main UI thread remains responsive (no freezing) while the worker threads churn through the audio queue in the background.

This effectively turns your browser into a desktop-grade batch processor.
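A queue like this can be sketched as a small concurrency-limited runner. This is an illustration rather than the site's actual code. The names pickWorkerCount and runQueue are hypothetical, and so is the convert callback, which in a browser would post a job to a Web Worker and resolve when the worker replies. Leaving one core free for the UI is also an assumption of this sketch.

```typescript
// Decide how many conversions to run at once. In a browser the first
// argument would typically come from navigator.hardwareConcurrency.
function pickWorkerCount(
  hardwareConcurrency: number | undefined,
  jobCount: number,
): number {
  const cores =
    hardwareConcurrency && hardwareConcurrency > 0 ? hardwareConcurrency : 1;
  // Assumption: keep one core free so the UI thread stays responsive.
  const usable = Math.max(1, cores - 1);
  return Math.max(1, Math.min(usable, jobCount));
}

// Run every job through convert(), with at most `concurrency` jobs in
// flight at a time. Results come back in the same order as the jobs.
async function runQueue<T, R>(
  jobs: readonly T[],
  concurrency: number,
  convert: (job: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(jobs.length);
  let next = 0; // index of the next job that has not been claimed yet

  // Each lane keeps claiming the next unclaimed job until none remain.
  async function lane(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await convert(jobs[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, jobs.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

Dropping in 100 files then amounts to calling runQueue(files, pickWorkerCount(navigator.hardwareConcurrency, files.length), convertInWorker), where convertInWorker is whatever function wraps the worker messaging. In this sketch, a failed conversion rejects the whole batch. A real queue would more likely catch errors per file and report them individually.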

What's Next? (VAD & ASR)

Now that I have a stable audio processing pipeline running entirely client-side, I'm experimenting with more advanced AI features.

I am currently working on running VAD (Voice Activity Detection) and Whisper (ASR) directly in the browser. Imagine being able to transcribe sensitive legal or medical recordings without the audio data ever leaving your laptop. That is the future I want to build.

Follow me on X (@YuanAudio) to see the progress.