nareshipme

Why We Switched to Streaming Frame Extraction for Mobile Video Editing

When we first started building ClipCrafter's browser-based video engine, our biggest enemy wasn't the complexity of FFmpeg. It was something far quieter and deadlier: Memory Exhaustion.

If you are working with high-resolution clips or long durations in a Web Worker, there is an invisible wall you will eventually hit. We call it "The OOM (Out-Of-Memory) Wall."

Recently, we noticed that while our desktop users were enjoying smooth multi-clip stitching, mobile Chrome and Safari users on mid-range Androids/iPhones were experiencing frequent tab crashes during the rendering process. The culprit? Our frame extraction logic was attempting to hold too much in memory at once.

The Problem: $O(n)$ Memory Complexity

In framewebworker's early architecture, we used a pattern that conceptually looked like this (simplified for clarity):

// ❌ THE OLD WAY: High risk of OOM crashes on mobile
async function extractFramesAllAtOnce(videoSource: Blob, duration: number) {
  const frames: Blob[] = []; // This array grows linearly with video length!
  const interval = 1 / 30; // target fps
  let currentTime = 0;

  while (currentTime < duration) {
    // We were grabbing a frame, converting it to an ImageBitmap/Blob...
    const frame = await captureFrameAt(videoSource, currentTime);
    frames.push(frame); // Every single frame stays in RAM until the end
    currentTime += interval;
  }

  return frames; // 500 frames of a long clip? Goodbye, memory!
}

In this "Load-All" approach, our memory complexity was $O(n)$, where $n$ is the total number of extracted frames. For an 8 MB video file, that might be fine. But for high-bitrate clips or longer sequences involving multiple stitches, the browser's heap would balloon until it hit its limit and killed the worker thread (and usually our entire app tab with it).

The Solution: Streaming Extraction

To fix this, we upgraded framewebworker to version 0.4.0 with a focus on Streaming Frame Extraction. Instead of accumulating an array of frames in memory before starting the stitch process, we moved toward $O(1)$ space complexity relative to total frame count: memory per step is bounded by a single frame, not the whole clip.

We refactored our pipeline so that each extracted frame is processed through its next stage immediately—whether it's being passed into a WebCodecs encoder or written as part of an intermediate stream.

Here’s how we restructured the logic:

// ✅ THE NEW WAY: Streaming approach (O(1) memory footprint per step)
type FrameCallback = (frame: Blob) => Promise<void>;

async function processFramesStreaming(
  videoSource: Blob,
  duration: number,
  onFrameReady: FrameCallback
) {
  let currentTime = 0;
  const interval = 1 / 30; // target fps

  while (currentTime < duration) {
    // Extract ONLY the current frame needed for this specific timestamp
    const singleFrameBuffer = await captureSingleFrame(videoSource, currentTime);

    /**
     * Instead of pushing to a massive global array 'frames',
     * we emit it immediately. The downstream consumer (WebCodecs)
     * processes the frame and then allows this buffer to be garbage collected.
     */
    await onFrameReady(singleFrameBuffer);

    currentTime += interval;
  }
}
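The same contract can also be expressed as an async generator, which makes "one frame alive at a time" explicit: the consumer pulls frame $i+1$ only after it has finished with frame $i$. This is a simplified, runnable sketch, so `captureSingleFrame` is faked with a tiny buffer; in the real worker that call would hit the decoder, and the loop body would feed a WebCodecs encoder.

```typescript
// Streaming extraction as an async generator. The for-await consumer
// provides natural backpressure: we never decode the next frame until
// the current one has been fully consumed.
// `captureSingleFrame` is a stand-in so this sketch is self-contained.
async function captureSingleFrame(timestamp: number): Promise<Uint8Array> {
  return new Uint8Array(4); // pretend this is a decoded frame buffer
}

async function* streamFrames(
  durationSec: number,
  fps = 30
): AsyncGenerator<Uint8Array> {
  const totalFrames = Math.floor(durationSec * fps);
  for (let i = 0; i < totalFrames; i++) {
    // The previous frame is already out of scope and eligible for GC here.
    yield await captureSingleFrame(i / fps);
  }
}

// Usage: consume a 1-second clip at 30 fps, one frame at a time.
async function run(): Promise<number> {
  let bytesSeen = 0;
  for await (const frame of streamFrames(1, 30)) {
    bytesSeen += frame.length; // stand-in for encode/mux work
  }
  return bytesSeen; // 30 frames * 4 bytes each
}
```

Note the integer frame index: stepping a float `currentTime` by `1 / 30` accumulates rounding error, while `i / fps` stays exact per frame.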

Why This Matters for Developers Building Heavy Web Apps

By moving from an "Accumulate-then-Process" model to a Streaming model, we achieved two critical wins:

  1. Memory Stability: The worker's memory footprint is now tied to the size of a single frame rather than the total video duration or frame count. This makes mobile rendering significantly more reliable, even on low-end devices with restricted heap sizes.
  2. Faster Time to First Frame: Because we start processing frames as soon as they are extracted, our downstream encoders can begin working immediately instead of waiting for the entire "extraction phase" to finish. Think of it as improving TTFB, but for rendering.

The Takeaway

When building resource-intensive features in JavaScript—whether it’s video editing with FFmpeg/WebCodecs or heavy data visualization—always ask yourself: Is my memory usage $O(n)$?

If your app's stability degrades as the input size increases, you don't have a logic bug; you have an architectural bottleneck. Switching to streaming-based processing is often more difficult than simple array manipulation (due to managing backpressure and state), but it’s what makes "pro" web tools possible on mobile browsers.
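"Backpressure" here simply means the producer must slow down when the consumer falls behind, rather than letting frames pile up in an unbounded queue. With WebCodecs you would typically watch `VideoEncoder.encodeQueueSize`; the sketch below shows the same idea in a library-agnostic way (the gate, its limit, and the `encodeAll` driver are our own illustrative constructions, not a framewebworker API).

```typescript
// A minimal backpressure gate: acquire() suspends the producer whenever
// `limit` items are already in flight, so the pipeline's queue stays bounded.
class BackpressureGate {
  private inFlight = 0;
  private waiters: Array<() => void> = [];
  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.inFlight >= this.limit) {
      // Park the producer until a slot frees up.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.inFlight++;
  }

  release(): void {
    this.inFlight--;
    this.waiters.shift()?.(); // wake one parked producer, if any
  }
}

// Usage sketch: 10 "frames", but never more than 2 being encoded at once.
async function encodeAll(frameCount: number): Promise<number> {
  const gate = new BackpressureGate(2);
  let current = 0;
  let peak = 0;

  await Promise.all(
    Array.from({ length: frameCount }, async () => {
      await gate.acquire();
      current++;
      peak = Math.max(peak, current);
      await new Promise((r) => setTimeout(r, 1)); // pretend to encode
      current--;
      gate.release();
    })
  );
  return peak; // highest number of frames in flight at any moment
}
```

Because JavaScript is single-threaded, this needs no locks: the only "state" to manage is the in-flight count and the list of parked producers, which is exactly the bookkeeping that makes streaming harder than pushing to an array.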

ClipCrafter continues to evolve! Check out our latest updates for even faster WebCodecs hardware acceleration.
