Khoa Nguyen

Posted on May 19

I replaced a $200/month audio processing server with 40 lines of browser JavaScript

#webdev #javascript #audio #performance

Last year I was paying $200/month for an EC2 instance that did one thing: accept audio file uploads, run FFmpeg, and return the converted file. FLAC to MP3. WAV to OGG. Bitrate changes. Speed adjustments.

The server handled maybe 2,000 conversions per day. Most files were under 20 MB. The actual FFmpeg processing took 2-4 seconds per file. The upload and download took longer than the conversion itself.

I replaced the entire thing with client-side JavaScript. Here's how, and where the approach breaks down.

The Web Audio API is more capable than you think

Most developers know the Web Audio API as "the thing that plays sounds in games." It's actually a full decode-and-transform pipeline, and combined with WebCodecs (covered below) it handles encoding too.

The key insight: if the browser can play a format, it can usually decode it to raw PCM. And once you have PCM, you can re-encode it into any format the browser supports for encoding.

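async function convertAudio(file, targetFormat = 'audio/mpeg', bitrate = 192000) {
  // decodeAudioData resamples to the context's rate, so 44100 fixes the output sample rate.
  // The length argument (1 frame) doesn't matter because nothing is rendered here.
  const audioContext = new OfflineAudioContext(2, 1, 44100);
  const arrayBuffer = await file.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

  // audioBuffer now holds raw Float32 PCM samples, one array per channel.
  // encodeAudio is assumed to dispatch on targetFormat (e.g. to encodeToMp3 for 'audio/mpeg').
  const encoded = await encodeAudio(audioBuffer, targetFormat, bitrate);
  return new Blob([encoded], { type: targetFormat });
}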

decodeAudioData handles WAV, MP3, AAC, and FLAC in current browsers. OGG and WebM audio support has historically been weaker in Safari, so test the formats you care about. You get back raw PCM samples regardless of the input format. From there, encoding is a separate step.
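To make "raw PCM" concrete, here is a minimal sketch that packs a decoded buffer into a 16-bit WAV file, the one output format that needs no codec at all. The name encodeWav is hypothetical and not part of the pipeline above. It only assumes an object with the AudioBuffer shape (numberOfChannels, sampleRate, length, getChannelData).

```javascript
// Hypothetical helper: wrap Float32 PCM in a 44-byte RIFF/WAVE header as 16-bit samples.
function encodeWav(audioBuffer) {
  const { numberOfChannels, sampleRate, length } = audioBuffer;
  const bytesPerSample = 2;
  const blockAlign = numberOfChannels * bytesPerSample; // bytes per sample frame
  const dataSize = length * blockAlign;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);            // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                      // fmt chunk size
  view.setUint16(20, 1, true);                       // format 1 = integer PCM
  view.setUint16(22, numberOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);                      // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  const channels = [];
  for (let ch = 0; ch < numberOfChannels; ch++) channels.push(audioBuffer.getChannelData(ch));

  // WAV stores samples interleaved: L0, R0, L1, R1, ...
  let pos = 44;
  for (let i = 0; i < length; i++) {
    for (let ch = 0; ch < numberOfChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][i])); // clamp to [-1, 1]
      view.setInt16(pos, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
      pos += 2;
    }
  }
  return view.buffer;
}
```

In a browser, new Blob([encodeWav(audioBuffer)], { type: 'audio/wav' }) would give a downloadable file. Lossy formats need a real encoder in place of the sample-packing loop.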

Encoding with AudioEncoder (WebCodecs API)

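The WebCodecs API landed in Chrome 94 and is now available in all Chromium-based browsers and Firefox. It gives you direct access to the browser's native audio encoders. Which codecs you can encode to varies by browser and platform, so check with AudioEncoder.isConfigSupported() before relying on one. MP3 encoding in particular is not guaranteed, and when it's missing the FFmpeg.wasm path described below takes over.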

async function encodeToMp3(audioBuffer, bitrate = 192000) {
  const numberOfChannels = audioBuffer.numberOfChannels;
  const sampleRate = audioBuffer.sampleRate;
  const frames = [];

  // Encoder availability varies by browser and platform, so probe before configuring
  const config = { codec: 'mp3', numberOfChannels, sampleRate, bitrate };
  const { supported } = await AudioEncoder.isConfigSupported(config);
  if (!supported) {
    throw new Error('AudioEncoder cannot encode MP3 here; use the FFmpeg.wasm fallback');
  }

  const encoder = new AudioEncoder({
    output: (chunk) => {
      // Copy each encoded chunk out of the encoder's memory
      const buffer = new ArrayBuffer(chunk.byteLength);
      chunk.copyTo(buffer);
      frames.push(buffer);
    },
    error: (e) => console.error('Encode error:', e),
  });

  encoder.configure(config);

  // Feed PCM data in chunks
  const chunkSize = 1152; // samples per channel in one MP3 frame
  const totalSamples = audioBuffer.length;

  // Fetch the channel arrays once instead of on every chunk
  const channels = [];
  for (let ch = 0; ch < numberOfChannels; ch++) {
    channels.push(audioBuffer.getChannelData(ch));
  }

  for (let offset = 0; offset < totalSamples; offset += chunkSize) {
    const frameCount = Math.min(chunkSize, totalSamples - offset);
    const data = new Float32Array(frameCount * numberOfChannels);

    // Interleave the channels: [L0, R0, L1, R1, ...]
    for (let ch = 0; ch < numberOfChannels; ch++) {
      for (let i = 0; i < frameCount; i++) {
        data[i * numberOfChannels + ch] = channels[ch][offset + i];
      }
    }

    const audioData = new AudioData({
      format: 'f32', // interleaved layout, matching how data was filled above
      sampleRate,
      numberOfFrames: frameCount,
      numberOfChannels,
      timestamp: Math.round((offset / sampleRate) * 1_000_000), // microseconds
      data,
    });

    encoder.encode(audioData);
    audioData.close();
  }

  await encoder.flush();
  encoder.close();

  return concatenateBuffers(frames);
}
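The function above calls concatenateBuffers, which isn't shown. A minimal sketch, assuming it only needs to join the collected ArrayBuffers in order:

```javascript
// Assumed helper: join an array of ArrayBuffers into one contiguous ArrayBuffer.
function concatenateBuffers(buffers) {
  const totalLength = buffers.reduce((sum, buffer) => sum + buffer.byteLength, 0);
  const result = new Uint8Array(totalLength);

  let offset = 0;
  for (const buffer of buffers) {
    result.set(new Uint8Array(buffer), offset);
    offset += buffer.byteLength;
  }
  return result.buffer;
}
```

Back-to-back joining is generally enough for MP3 frames. Codecs such as Opus or AAC normally need a container or extra framing around the chunks, which this sketch does not handle.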

This runs at near-native speed because the encoding happens in the browser's native codec implementation rather than in JavaScript. On my M2 MacBook, a 5-minute FLAC file (50 MB) converts to 192 kbps MP3 in about 1.8 seconds. The same file took 2.1 seconds on the EC2 instance, plus 8 seconds of upload time.
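Since encoder support differs between browsers, it can help to decide the path once, up front. The sketch below uses the real AudioEncoder.isConfigSupported() call; the function name chooseEncoderPath and the 'webcodecs' / 'ffmpeg-wasm' labels are made up for illustration.

```javascript
// Hypothetical router: returns 'webcodecs' when the browser can encode the
// requested config natively, otherwise 'ffmpeg-wasm'.
// The Encoder parameter exists only so the function can be exercised with a stub.
async function chooseEncoderPath(config, Encoder = globalThis.AudioEncoder) {
  if (typeof Encoder !== 'function' || typeof Encoder.isConfigSupported !== 'function') {
    return 'ffmpeg-wasm'; // WebCodecs audio encoding is not available at all
  }
  try {
    const { supported } = await Encoder.isConfigSupported(config);
    return supported ? 'webcodecs' : 'ffmpeg-wasm';
  } catch {
    return 'ffmpeg-wasm'; // a rejected probe is treated as "not supported"
  }
}
```

A call might look like chooseEncoderPath({ codec: 'mp3', sampleRate: 44100, numberOfChannels: 2, bitrate: 192000 }), with the result deciding whether to run encodeToMp3 or the fallback.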

The fallback: FFmpeg.wasm for Safari

Safari's WebCodecs support for audio encoding is still incomplete as of early 2026. For Safari users, FFmpeg.wasm is the fallback:

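async function convertWithFFmpeg(file, outputFormat = 'mp3', bitrate = '192k') {
  // Dynamic imports keep the FFmpeg.wasm code out of the main bundle
  const { FFmpeg } = await import('@ffmpeg/ffmpeg');
  const { toBlobURL } = await import('@ffmpeg/util');

  const ffmpeg = new FFmpeg();
  await ffmpeg.load({
    coreURL: await toBlobURL('/ffmpeg-core.js', 'text/javascript'),
    wasmURL: await toBlobURL('/ffmpeg-core.wasm', 'application/wasm'),
  });

  // Write the upload into FFmpeg's in-memory filesystem
  const inputName = 'input' + getExtension(file.name);
  const outputName = `output.${outputFormat}`;
  await ffmpeg.writeFile(inputName, new Uint8Array(await file.arrayBuffer()));

  const exitCode = await ffmpeg.exec([
    '-i', inputName,
    '-b:a', bitrate,
    '-map', '0:a', // keep only the audio streams of the first input (drops cover art)
    outputName,
  ]);
  if (exitCode !== 0) throw new Error(`FFmpeg exited with code ${exitCode}`);

  const data = await ffmpeg.readFile(outputName);
  // 'audio/mpeg' is the registered MIME type for MP3
  const mimeType = outputFormat === 'mp3' ? 'audio/mpeg' : `audio/${outputFormat}`;
  return new Blob([data], { type: mimeType });
}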

The initial FFmpeg.wasm load is ~30 MB (cached after first use). After that, conversions run at roughly 60-70% of native FFmpeg speed. Still faster than uploading to a server for files under 100 MB.
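The fallback above relies on a getExtension helper that isn't shown. A minimal sketch, assuming it should return the extension with its leading dot (so that 'input' + getExtension(name) forms a valid filename) and an empty string when there is none:

```javascript
// Assumed helper: "Track 01.FLAC" -> ".flac", "README" -> "".
function getExtension(fileName) {
  const lastDot = fileName.lastIndexOf('.');
  // No dot, a leading dot only (".hidden"), or a trailing dot means no usable extension
  if (lastDot <= 0 || lastDot === fileName.length - 1) return '';
  return fileName.slice(lastDot).toLowerCase();
}
```

FFmpeg generally identifies input formats by probing the file contents, so a missing extension on the input side is usually harmless. The output extension is what selects the target format.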

Audio compression without audible quality loss

Audio compression (reducing file size, not dynamic range) is mostly about bitrate selection. The perceptual quality curve for MP3 looks like this:

Bitrate	File size (5 min)	Perceptual quality
64 kbps	2.4 MB	Noticeable artifacts, AM radio quality
128 kbps	4.8 MB	Acceptable for speech, podcasts
192 kbps	7.2 MB	Transparent for most listeners
256 kbps	9.6 MB	Indistinguishable from source for 99% of people
320 kbps	12 MB	Placebo territory

For podcasts and voice content, 128 kbps mono is the sweet spot. For music, 192 kbps stereo is where most listeners cannot distinguish the MP3 from the FLAC source in blind tests. Above 256 kbps the file is still lossy and measurably different from the source, but the difference is inaudible on consumer equipment.

The practical implication: a 50 MB FLAC file becomes a 7 MB MP3 at 192 kbps with no difference most listeners can hear. That's a 7x reduction at essentially no perceptual cost.
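The file sizes in the table follow directly from bitrate times duration. A small sketch of that arithmetic, assuming constant bitrate and ignoring container and tag overhead (estimateSizeMB is an illustrative name):

```javascript
// size in MB = kilobits per second * 1000 * seconds / 8 bits per byte / 1,000,000
function estimateSizeMB(bitrateKbps, durationSeconds) {
  const bytes = (bitrateKbps * 1000 * durationSeconds) / 8;
  return bytes / 1_000_000;
}

// Reproduce the 5-minute column of the table
for (const kbps of [64, 128, 192, 256, 320]) {
  console.log(`${kbps} kbps -> ${estimateSizeMB(kbps, 300)} MB`);
}
```

The same function gives a quick size estimate to show users before they start a conversion.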

Speed change and pitch

Changing playback speed takes a one-node graph in the Web Audio API, with one catch: playbackRate on an AudioBufferSourceNode resamples the audio, so pitch moves together with speed.

function changeSpeed(audioBuffer, speedFactor) {
  // Faster playback yields proportionally fewer output frames
  const newLength = Math.round(audioBuffer.length / speedFactor);
  const offlineCtx = new OfflineAudioContext(
    audioBuffer.numberOfChannels,
    newLength,
    audioBuffer.sampleRate
  );

  const source = offlineCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.playbackRate.value = speedFactor; // resamples: changes speed and pitch together
  source.connect(offlineCtx.destination);
  source.start();

  // Resolves with a new AudioBuffer, rendered faster than real time
  return offlineCtx.startRendering();
}

Rendering through an OfflineAudioContext writes the speed-adjusted audio to a new buffer without playing it. Because this is resampling, 1.5x playback also raises the pitch by the same ratio (about seven semitones), like a tape running fast. Preserving pitch is a time-stretching problem that needs a separate algorithm: WSOLA or a phase vocoder from a library, or FFmpeg's atempo filter on the FFmpeg.wasm path.

This is how I built the audio speed changer on my site. Users upload a file, pick a speed (0.5x to 3x), and download the result. The entire operation runs in the browser in under 2 seconds for a typical podcast episode.
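For a stretch that explicitly keeps pitch constant, one option on the FFmpeg.wasm path is FFmpeg's atempo filter. The sketch below only builds the filter string. The name atempoFilter is hypothetical, and chaining stages to stay within 0.5 to 2.0 per stage is a conservative assumption, since older FFmpeg builds limited a single atempo to that range.

```javascript
// Hypothetical helper: build an FFmpeg audio filter string for a given speed factor.
// atempoFilter(3) -> "atempo=2,atempo=1.5"
function atempoFilter(speedFactor) {
  if (!Number.isFinite(speedFactor) || speedFactor <= 0) {
    throw new RangeError('speedFactor must be a positive, finite number');
  }
  const stages = [];
  let remaining = speedFactor;
  // Split large or small factors into stages that each stay within [0.5, 2.0]
  while (remaining > 2.0) {
    stages.push(2.0);
    remaining /= 2.0;
  }
  while (remaining < 0.5) {
    stages.push(0.5);
    remaining /= 0.5;
  }
  stages.push(remaining);
  return stages.map((stage) => `atempo=${stage}`).join(',');
}
```

The string would be passed as the value of -filter:a, for example ffmpeg.exec(['-i', inputName, '-filter:a', atempoFilter(1.5), 'output.mp3']).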

Where this breaks down

Files over 500 MB. The Web Audio API loads the entire decoded PCM into memory. A 60-minute file at 44.1 kHz stereo is ~635 MB as 16-bit PCM, and about 1.27 GB once decoded, because AudioBuffer stores samples as 32-bit floats. On mobile devices with 3-4 GB of RAM, this causes crashes. For large files, you need chunked processing with AudioDecoder (WebCodecs) instead of decodeAudioData.

Exotic formats. The browser can decode what it can play. Formats like APE, WV (WavPack), and DSD are not supported. For these, FFmpeg.wasm is the practical browser-side option.

Safari audio encoding. As mentioned, Safari's AudioEncoder support is incomplete. You need the FFmpeg.wasm fallback, which means accepting that ~15% of users get a slower path.

Batch processing. Converting 50 files sequentially in a browser tab is technically possible but UX-hostile. Users expect to be able to close the tab. For batch workloads, a server (or at minimum a service worker with Background Fetch) is still the right architecture.
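The memory limit in the first point can be estimated before decoding. A minimal sketch, assuming the duration is already known (for example from file metadata) and using the ~500 MB figure from this post as the budget; both function names are illustrative.

```javascript
// AudioBuffer holds one 32-bit float (4 bytes) per sample per channel.
function decodedSizeBytes(durationSeconds, sampleRate = 44100, channels = 2) {
  const BYTES_PER_FLOAT32 = 4;
  return durationSeconds * sampleRate * channels * BYTES_PER_FLOAT32;
}

// Hypothetical guard: true when the decoded audio would exceed the budget,
// i.e. when a chunked AudioDecoder path is the safer choice.
function shouldStreamDecode(durationSeconds, budgetBytes = 500_000_000) {
  return decodedSizeBytes(durationSeconds) > budgetBytes;
}

console.log(decodedSizeBytes(3600)); // 60 minutes of 44.1 kHz stereo: 1270080000 bytes
```

At 16 bits per sample the same hour would be half that, about 635 MB, which is why the on-disk WAV size understates what decodeAudioData actually allocates.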

The economics

My old setup:

t3.medium EC2: $30/month
EBS storage for temp files: $10/month
Data transfer (uploads + downloads): $80-150/month
CloudFront for delivery: $20/month
Total: ~$200/month for 2,000 conversions/day

My current setup:

Static hosting on Cloudflare Pages: $0
Total: $0/month for unlimited conversions

The conversion moved from my bill to the user's CPU. But the user's experience is better because there's no upload wait, no queue, and no "your file is being processed" spinner. The 8-second upload that used to precede every conversion is gone.

What I'd build differently today

Start with WebCodecs, fall back to FFmpeg.wasm. I did it the other way around and had to refactor when WebCodecs matured.
Use AudioDecoder for large files from day one. Streaming decode avoids the memory cliff that decodeAudioData hits at ~500 MB.
Show a progress bar based on samples processed. The Web Audio API doesn't give you progress events, but you can calculate it from the chunk offset in the encoding loop.
Test on Android Chrome early. Mobile Chromium has lower memory limits and some WebCodecs codecs are software-only (no hardware acceleration). Performance is 2-3x slower than desktop.
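The progress-bar point can be sketched as a generator that walks the buffer in encoder-sized chunks and reports the fraction completed. The name frameChunks is hypothetical; the 1152 default matches the chunk size used in the encoding loop earlier.

```javascript
// Yields { offset, frameCount, progress } for each chunk, with progress in (0, 1].
function* frameChunks(totalSamples, chunkSize = 1152) {
  for (let offset = 0; offset < totalSamples; offset += chunkSize) {
    const frameCount = Math.min(chunkSize, totalSamples - offset);
    yield { offset, frameCount, progress: (offset + frameCount) / totalSamples };
  }
}

// Example: a 2,500-sample buffer splits into chunks of 1152, 1152, and 196 frames
for (const chunk of frameChunks(2500)) {
  console.log(chunk.offset, chunk.frameCount, chunk.progress);
}
```

Inside an encoding loop, chunk.progress could feed a progress element directly. Updating the UI every few hundred chunks rather than on every one is probably enough.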

The browser audio stack in 2026 is production-ready for the vast majority of consumer audio processing tasks. If you're still routing audio files through a server for format conversion, compression, or speed adjustment, you're paying for infrastructure that delivers a worse user experience than the zero-infrastructure alternative.

I built all of this into a set of free browser-based audio tools — FLAC to MP3, audio compression, speed change, trimming, noise removal. No upload, no signup, no watermark. The patterns above are what's running in production.

Happy to answer questions about specific codec support or edge cases in the comments.

DEV Community