Server-side video rendering is expensive. FFmpeg on a Railway worker, Vercel Sandbox spin-up times, memory limits — it all adds up fast. But modern browsers now ship WebCodecs, which lets you decode, transform, and re-encode video frames entirely on the client.
This post walks through a complete client-side rendering pipeline: decode a video from a presigned URL, draw cropped frames + captions onto an OffscreenCanvas, re-encode to H.264 MP4, and upload the result — all inside a Web Worker, keeping the main thread completely free.
## Why Client-Side?

- No server compute cost for rendering
- No timeouts or out-of-memory failures
- Progress feedback is natural (frame-by-frame)
- Works offline once the video is fetched

The trade-off: WebCodecs only ships in Chromium-based browsers (Chrome/Edge 94+) and Firefox 130+. Detect it before starting:
```ts
export const WEBCODECS_SUPPORTED =
  typeof window !== 'undefined' &&
  typeof (window as any).VideoEncoder !== 'undefined' &&
  typeof (window as any).VideoDecoder !== 'undefined';
```
## Architecture Overview

```
Main thread
└─ Fetches presigned URLs from API
└─ Spawns Web Worker
└─ Listens for { type: 'progress' | 'done' | 'error' }

Web Worker (worker.ts)
└─ Fetches source video blob
└─ Input → VideoTrack (via mediabunny)
└─ For each frame:
     drawFrame() → OffscreenCanvas (crop + aspect ratio)
     drawCaption() → OffscreenCanvas (text overlay)
└─ Conversion.execute() → H.264 MP4 buffer
└─ postMessage({ type: 'done', blob })

Main thread
└─ Uploads blob to R2 via presigned PUT
```
We use mediabunny to handle demuxing, decoding, encoding, and muxing — it wraps the low-level WebCodecs API in a clean Conversion interface.
## The Worker

### 1. Fetch and demux
```ts
const response = await fetch(msg.videoUrl);
const videoBlob = await response.blob();

const input = new Input({ source: new BlobSource(videoBlob), formats: ALL_FORMATS });
const videoTrack = await input.getPrimaryVideoTrack();
```
### 2. Per-frame processing
This is the core — for every decoded frame, we get a VideoSample and draw it onto an OffscreenCanvas. The canvas handles cropping and caption overlays:
```ts
async function processFrame(sample: VideoSample): Promise<OffscreenCanvas> {
  const canvas = new OffscreenCanvas(outputWidth, outputHeight);
  const ctx = canvas.getContext('2d')!;

  // Draw cropped frame
  drawFrame(ctx, sample, frameOpts);

  // Draw caption if active at this timestamp
  if (withCaptions && captions.length > 0) {
    const relativeMs = sample.timestamp * 1000; // mediabunny timestamps are in seconds
    const active = captions.find(c => relativeMs >= c.startMs && relativeMs < c.endMs);
    if (active) drawCaption(ctx, { text: active.text, ...captionOpts });
  }

  sample.close(); // Must close to free GPU memory
  return canvas;
}
```
Important: always call `sample.close()` after drawing. `VideoSample`s hold GPU-backed memory, and leaking them will crash the worker.
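That discipline can be enforced with a small wrapper so `close()` runs even when drawing throws. This is a sketch, not part of mediabunny; `withSample` and `Closeable` are names introduced here:

```typescript
// Hypothetical helper: guarantees close() runs even if the callback
// throws, so a draw error can't leak GPU-backed frame memory.
interface Closeable {
  close(): void;
}

function withSample<T extends Closeable, R>(sample: T, fn: (s: T) => R): R {
  try {
    return fn(sample);
  } finally {
    sample.close();
  }
}
```

`processFrame` could then do its drawing inside `withSample(sample, ...)` instead of calling `close()` manually at the end.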
### 3. Crop modes
We support four crop modes — cover, contain, face (top-biased cover), and custom (manual zoom + pan):
```ts
function drawFrame(ctx, sample, { srcW, srcH, outW, outH, cropMode, cropX, cropY, cropZoom }) {
  // contain: letterbox inside the output
  if (cropMode === 'contain') {
    ctx.fillStyle = '#000';
    ctx.fillRect(0, 0, outW, outH);
    const scale = Math.min(outW / srcW, outH / srcH);
    sample.draw(ctx, 0, 0, srcW, srcH, (outW - srcW * scale) / 2, (outH - srcH * scale) / 2, srcW * scale, srcH * scale);
    return;
  }

  // cover / face: crop to fill; face reuses the cover crop but biases it
  // toward the top of the frame, where faces usually sit (heuristic)
  if (cropMode === 'cover' || cropMode === 'face') {
    const [sx, sy, sw, sh] = coverCrop(srcW, srcH, outW, outH);
    sample.draw(ctx, sx, cropMode === 'face' ? sy * 0.25 : sy, sw, sh, 0, 0, outW, outH);
    return;
  }

  // custom: zoom + user-defined center point
  if (cropMode === 'custom') {
    const zoom = Math.max(1, cropZoom);
    const sw = srcW / zoom;
    const sh = srcH / zoom;
    const cx = (cropX / 100) * srcW;
    const cy = (cropY / 100) * srcH;
    const sx = Math.max(0, Math.min(cx - sw / 2, srcW - sw));
    const sy = Math.max(0, Math.min(cy - sh / 2, srcH - sh));
    sample.draw(ctx, sx, sy, sw, sh, 0, 0, outW, outH);
  }
}
```
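The `coverCrop` helper isn't shown in the post; a plausible implementation picks the largest source rectangle matching the output aspect ratio, centered in the frame:

```typescript
// Returns [sx, sy, sw, sh]: the largest source rect with the output's
// aspect ratio, centered in the source frame.
function coverCrop(
  srcW: number,
  srcH: number,
  outW: number,
  outH: number,
): [number, number, number, number] {
  const outAspect = outW / outH;
  const srcAspect = srcW / srcH;
  let sw = srcW;
  let sh = srcH;
  if (srcAspect > outAspect) {
    sw = srcH * outAspect; // source is wider: crop the sides
  } else {
    sh = srcW / outAspect; // source is taller: crop top/bottom
  }
  return [(srcW - sw) / 2, (srcH - sh) / 2, sw, sh];
}
```

For a 1920×1080 landscape source cropped to a 9:16 output, this keeps the full height and cuts a 607.5px-wide centered column.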
### 4. Encode and mux
```ts
const output = new Output({ format: new Mp4OutputFormat(), target: new BufferTarget() });

const conversion = await Conversion.init({
  input, output,
  trim: { start: msg.startSec, end: msg.endSec },
  video: {
    codec: 'avc',
    bitrate: outputWidth >= 1080 ? 4_000_000 : 2_000_000,
    keyFrameInterval: 2,
    processedWidth: outputWidth,
    processedHeight: outputHeight,
    process: processFrame,
  },
});

conversion.onProgress = (p: number) => {
  self.postMessage({ type: 'progress', percent: Math.round(10 + p * 82) });
};

await conversion.execute();

const buffer = (output.target as BufferTarget).buffer;
self.postMessage({ type: 'done', blob: new Blob([buffer], { type: 'video/mp4' }) });
```
## Caption Rendering
Captions are drawn with pill-shaped backgrounds using OffscreenCanvas 2D. Key trick: measure text width first, then derive the pill size:
```ts
function drawCaption(ctx, { text, style, position, size, canvasW, canvasH }) {
  const fontSize = { sm: 24, md: 32, lg: 44 }[size];
  ctx.font = `900 ${fontSize}px Impact, sans-serif`;
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle'; // center text vertically on pillY

  const textW = Math.min(ctx.measureText(text).width, canvasW * 0.85);
  const pillW = textW + 40;
  const pillH = fontSize * 1.3 + 20;
  const pillY = position === 'bottom' ? canvasH * 0.9 - pillH / 2 : canvasH / 2;

  // Draw pill background
  ctx.fillStyle = 'rgba(0,0,0,0.55)';
  roundRect(ctx, (canvasW - pillW) / 2, pillY - pillH / 2, pillW, pillH, 12);
  ctx.fill();

  // Draw text with stroke for Hormozi style
  ctx.lineWidth = Math.max(2, fontSize / 8); // stroke defaults to 1px, too thin at these sizes
  ctx.strokeStyle = '#000';
  ctx.fillStyle = '#fff';
  ctx.strokeText(text.toUpperCase(), canvasW / 2, pillY);
  ctx.fillText(text.toUpperCase(), canvasW / 2, pillY);
}
```
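The `roundRect` call above is a small path helper (newer canvas contexts also ship a native `ctx.roundRect()` that could replace it). A sketch, typed against the minimal context surface it needs so the shape is clear:

```typescript
// Minimal slice of the 2D canvas API this helper touches.
interface PathContext {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  arcTo(x1: number, y1: number, x2: number, y2: number, r: number): void;
  closePath(): void;
}

// Traces a rounded-rectangle path; the caller then fill()s or stroke()s it.
function roundRect(ctx: PathContext, x: number, y: number, w: number, h: number, r: number): void {
  const radius = Math.min(r, w / 2, h / 2); // keep corners from overlapping
  ctx.beginPath();
  ctx.moveTo(x + radius, y);
  ctx.arcTo(x + w, y, x + w, y + h, radius); // top edge + top-right corner
  ctx.arcTo(x + w, y + h, x, y + h, radius); // right edge + bottom-right corner
  ctx.arcTo(x, y + h, x, y, radius);         // bottom edge + bottom-left corner
  ctx.arcTo(x, y, x + w, y, radius);         // left edge + top-left corner
  ctx.closePath();
}
```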
## Main Thread Interface
The main thread just needs to spawn the worker and listen for messages:
```ts
export async function renderClipInBrowser(options: ClientRenderOptions): Promise<string> {
  if (!WEBCODECS_SUPPORTED) throw new Error('WebCodecs not supported');

  const { outputWidth, outputHeight } = getOutputDimensions(options.aspectRatio);
  // e.g. '9:16' → { outputWidth: 1080, outputHeight: 1920 }

  const worker = new Worker(new URL('./worker.ts', import.meta.url));
  // Next.js/webpack will bundle worker.ts as a separate chunk automatically

  const blob = await new Promise<Blob>((resolve, reject) => {
    worker.onmessage = (e) => {
      if (e.data.type === 'progress') options.onProgress?.(e.data.percent * 0.95);
      else if (e.data.type === 'done') { worker.terminate(); resolve(e.data.blob); }
      else { worker.terminate(); reject(new Error(e.data.message)); }
    };
    // Strip the callback before posting: functions aren't structured-cloneable
    const { onProgress, ...message } = options;
    worker.postMessage({ type: 'render', ...message, outputWidth, outputHeight });
  });

  // Upload to R2 via presigned PUT
  await fetch(options.uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'video/mp4' },
    body: blob,
  });

  options.onProgress?.(100);
  return options.uploadUrl.split('?')[0]; // base R2 object URL
}
```
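`getOutputDimensions` isn't shown in the post; a minimal version consistent with the `'9:16' → 1080×1920` example above might look like this (the supported ratio set is an assumption):

```typescript
// Hypothetical helper: maps an aspect-ratio string to encode dimensions.
// The three ratios here are illustrative; the real app may support more.
type AspectRatio = '9:16' | '16:9' | '1:1';

function getOutputDimensions(aspectRatio: AspectRatio): { outputWidth: number; outputHeight: number } {
  switch (aspectRatio) {
    case '9:16': return { outputWidth: 1080, outputHeight: 1920 }; // vertical shorts
    case '16:9': return { outputWidth: 1920, outputHeight: 1080 }; // landscape
    case '1:1':  return { outputWidth: 1080, outputHeight: 1080 }; // square
  }
}
```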
## Tips and Gotchas
### 1. Always close VideoSamples

Forget `sample.close()` and you'll exhaust GPU memory within seconds. The worker will silently hang or crash.
### 2. Progress math

We split the 0–100% range across phases: 0–8% fetch, 8–10% demux, 10–92% encode, 92–95% buffer, 95–100% upload. The bar advances continuously instead of stalling between phases.
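That phase split reduces to one tiny helper (the name `phaseProgress` is introduced here for illustration):

```typescript
// Maps a phase-local fraction (0–1) into that phase's slice of the
// global 0–100% progress bar, clamping out-of-range fractions.
function phaseProgress(start: number, end: number, fraction: number): number {
  const f = Math.min(Math.max(fraction, 0), 1);
  return Math.round(start + (end - start) * f);
}

// The worker's onProgress line is equivalent to phaseProgress(10, 92, p),
// since the encode phase owns the 10–92% slice.
```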
### 3. Bitrate by resolution

4 Mbps for 1080p, 2 Mbps for anything smaller. H.264 at these rates gives good quality for short clips without huge file sizes.
### 4. Worker bundling in Next.js

`new Worker(new URL('./worker.ts', import.meta.url))` works out of the box with Next.js + webpack. No extra config needed.
### 5. Capability fallback

Not every user has Chrome 94+. Always check `WEBCODECS_SUPPORTED` and fall back to a server-side path if needed.
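The dispatch logic is worth catching runtime failures too, not just the capability check, since a codec error mid-render should also fall back. A generic sketch (function names here are illustrative, not from the post):

```typescript
// Hypothetical dispatcher: try the client path first, fall back to a
// server render on missing support OR on a mid-render failure.
async function renderWithFallback(
  supported: boolean,
  clientRender: () => Promise<string>,
  serverRender: () => Promise<string>,
): Promise<string> {
  if (supported) {
    try {
      return await clientRender();
    } catch {
      // Codec error, OOM, worker crash: fall through to the server path.
    }
  }
  return serverRender();
}
```

In the app this would wire up as `renderWithFallback(WEBCODECS_SUPPORTED, () => renderClipInBrowser(options), () => requestServerRender(options))`, where `requestServerRender` is whatever server-side job your backend exposes.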
## Browser Support
| Browser | Min version |
|---|---|
| Chrome | 94 |
| Edge | 94 |
| Firefox | 130 |
| Safari | ❌ (as of 2025) |
Safari still doesn't support `VideoEncoder`/`VideoDecoder`. Plan a server fallback for Safari users.
## Conclusion
WebCodecs + OffscreenCanvas + Web Workers is a genuinely powerful combo for client-side video processing. The main thread stays responsive, the worker handles the heavy lifting, and the result is a fully encoded MP4 that you can upload directly to R2 or S3.
The key abstractions that made this manageable:
- mediabunny for the codec/mux layer
- OffscreenCanvas for frame compositing without DOM access
- Presigned URLs for direct browser ↔ R2 transfers (no proxying through your server)
If you're building any kind of video editing tool, this stack is worth considering — it moves work to where compute is free.