Alex Neamtu

Posted on Feb 10 • Originally published at sendrec.eu

How We Built Server-Side Video Trimming With ffmpeg


Every screen recording has dead time. The three seconds before you start talking. The fumbled tab switch at the end. The pause where you forgot what to say. Small things, but they make the difference between a recording someone watches and one they skip.

Most screen recording tools either don't offer trimming, or they do it in the browser. Both approaches have real problems. Here's what we built instead and why.

Why not trim in the browser?

The browser can decode and play video, so it feels natural to trim there too. The typical approach: load the video into a <video> element, let the user pick start and end points, then use the MediaRecorder API or WebCodecs to re-encode just the selected portion.

The problems start immediately:

Re-encoding is slow. WebCodecs can decode and encode video frames, but doing it in the browser means running on the user's machine. A 5-minute 1080p recording can take minutes to re-encode on a laptop. The user stares at a progress bar, can't close the tab, and if they do, the trim is lost.

Format support is limited. Browsers support a narrow set of codecs for encoding. Want to output VP9 in a WebM container? Chrome can do it, but Safari can't. Want H.264 in MP4? You'll need a WASM build of ffmpeg, which adds megabytes to your bundle and still runs slower than native.

Memory is a problem. Loading a full video into memory for frame-by-frame processing can easily exceed browser memory limits. A 10-minute 1080p recording at 30fps is 18,000 frames, and each decoded 1080p frame takes about 3 MB in the YUV 4:2:0 layout decoders typically produce (over 8 MB if converted to RGBA).
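A back-of-the-envelope check of that claim, assuming every frame is decoded and kept in memory at once, stored as I420 (YUV 4:2:0, 1.5 bytes per pixel). The function names are made up for this sketch:

```go
package main

import "fmt"

// i420FrameBytes is the size of one decoded frame in I420 (YUV 4:2:0):
// a full-resolution luma plane plus two quarter-resolution chroma planes.
func i420FrameBytes(width, height int) int {
	return width*height + 2*(width/2)*(height/2)
}

// rgbaFrameBytes is the size of one frame converted to RGBA (4 bytes per pixel).
func rgbaFrameBytes(width, height int) int {
	return width * height * 4
}

// frameCount is the number of frames in a recording of the given length.
func frameCount(seconds, fps int) int {
	return seconds * fps
}

// describe spells out the total for a 10-minute 1080p recording at 30fps
// if every decoded frame were held in memory.
func describe() string {
	frames := frameCount(10*60, 30)
	perFrame := i420FrameBytes(1920, 1080)
	total := frames * perFrame
	return fmt.Sprintf("%d frames x %d bytes = %.1f GB", frames, perFrame, float64(total)/1e9)
}
```

`describe()` returns `18000 frames x 3110400 bytes = 56.0 GB`; RGBA would be more than double that. Real decoders keep only a small window of frames alive, which is exactly why "load it all, then process" does not scale in a browser tab.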

MediaRecorder can't seek. You can't tell MediaRecorder to start recording from timestamp 5.0s. You'd have to play the video in real time from the start point to the end point while recording — meaning a 2-minute trim takes 2 minutes to produce. This isn't trimming, it's re-recording.

Server-side trimming avoids all of this. The video stays in object storage, ffmpeg runs natively on the server, and the user's browser only needs to show a UI.

The trim UI: a custom timeline

The first version of our trim UI used two HTML range sliders — one for the start point, one for the end point. It worked, but it was confusing. Two disconnected sliders don't give you a sense of the selected region, and there's no way to preview what you're cutting.

We replaced it with a custom timeline that shows the video alongside draggable handles:

+------------------------------------+
|                                    |
|         <video player>             |
|                                    |
+------------------------------------+

[===|████████████████|=====]
    ^start          ^end

Start: 0:05    Duration: 1:40    End: 1:45

                   [Cancel]  [Trim]

The video player loads the actual recording. Dragging either handle seeks the video to that timestamp, so you can see exactly what frame you're cutting to. Clicking anywhere on the track jumps the nearest handle to that position.
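The drag handler in this post does not show the click-to-jump behavior. Here is a minimal sketch of that logic, written in Go like the server code in this post and assuming ties go to the start handle. Handle, positionToSeconds, and jumpNearest are hypothetical names, not SendRec's code:

```go
package main

import "math"

// Handle identifies one of the two trim handles.
type Handle int

const (
	StartHandle Handle = iota
	EndHandle
)

// minSelection mirrors the 1-second minimum the timeline enforces.
const minSelection = 1.0

// clamp limits v to the range [lo, hi].
func clamp(v, lo, hi float64) float64 {
	return math.Max(lo, math.Min(v, hi))
}

// positionToSeconds maps an x offset on a track of the given pixel width
// to a timestamp, clamping clicks that fall outside the track.
func positionToSeconds(x, width, duration float64) float64 {
	if width <= 0 {
		return 0
	}
	return clamp(x, 0, width) / width * duration
}

// jumpNearest moves whichever handle is closer to the clicked timestamp and
// clamps it so the handles never cross and the selection stays at least
// minSelection long. Ties go to the start handle (an arbitrary choice).
func jumpNearest(clicked, start, end, duration float64) (Handle, float64, float64) {
	if math.Abs(clicked-start) <= math.Abs(clicked-end) {
		return StartHandle, clamp(clicked, 0, end-minSelection), end
	}
	return EndHandle, start, clamp(clicked, start+minSelection, duration)
}
```

For a 120-second video with the selection at 5 to 105 seconds, a click at 2 seconds moves the start handle to 2, and a click at 110 seconds moves the end handle to 110.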

The implementation uses vanilla React with mouse and touch event handling — no external dependencies:

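function handlePointerDown(handle: "start" | "end") {
  return (e: React.MouseEvent | React.TouchEvent) => {
    e.preventDefault();
    draggingRef.current = handle;

    function onMove(ev: MouseEvent | TouchEvent) {
      const track = trackRef.current;
      const video = videoRef.current;
      if (!track || !video) return;

      // Touch events carry coordinates in a touch list; mouse events carry them directly
      const point = "touches" in ev ? ev.touches[0] : ev;
      if (!point) return;

      // Convert the pointer's x position on the track into a timestamp
      const rect = track.getBoundingClientRect();
      const x = Math.max(0, Math.min(point.clientX - rect.left, rect.width));
      const seconds = (x / rect.width) * duration;

      // Clamp so the handles can't cross and the selection stays at least 1 second long
      let clamped: number;
      if (draggingRef.current === "start") {
        clamped = Math.max(0, Math.min(seconds, endSeconds - 1));
        setStartSeconds(clamped);
      } else {
        clamped = Math.min(duration, Math.max(seconds, startSeconds + 1));
        setEndSeconds(clamped);
      }

      // Seek during the drag, to the clamped time rather than the raw pointer time
      video.currentTime = clamped;
    }

    // Named so it can remove itself along with the move listeners
    function onUp() {
      draggingRef.current = null;
      document.removeEventListener("mousemove", onMove);
      document.removeEventListener("mouseup", onUp);
      document.removeEventListener("touchmove", onMove);
      document.removeEventListener("touchend", onUp);
    }

    document.addEventListener("mousemove", onMove);
    document.addEventListener("mouseup", onUp);
    document.addEventListener("touchmove", onMove);
    document.addEventListener("touchend", onUp);
  };
}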

Key details: the handles enforce a minimum 1-second selection so you can't create an empty trim. The video seeks during drag, not after, which makes it feel responsive. Touch events are handled alongside mouse events for mobile support.

Server-side: ffmpeg does the heavy lifting

When the user clicks Trim, the frontend sends a simple request:

POST /api/videos/{id}/trim
{ "startSeconds": 5.0, "endSeconds": 105.0 }

The server validates the input, sets the video status to "processing," and starts a background goroutine. It responds immediately with 202 Accepted, so the user doesn't wait for the trim to finish.
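A minimal sketch of the "reply 202, keep working" shape using only net/http. Validation and the status update are left out, and newTrimHandler, launch, and exampleTrimCall are hypothetical names rather than SendRec's handler:

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
)

// trimRequest mirrors the JSON body the frontend sends.
type trimRequest struct {
	StartSeconds float64 `json:"startSeconds"`
	EndSeconds   float64 `json:"endSeconds"`
}

// newTrimHandler decodes the request, starts the job in its own goroutine,
// and replies 202 Accepted without waiting for it. launch is a hypothetical
// hook for the real job; it must not use r.Context(), which is canceled as
// soon as this handler returns.
func newTrimHandler(launch func(req trimRequest)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req trimRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		go launch(req)
		w.WriteHeader(http.StatusAccepted)
	}
}

// exampleTrimCall drives the handler the way the frontend would.
func exampleTrimCall() (int, trimRequest) {
	done := make(chan trimRequest, 1)
	h := newTrimHandler(func(req trimRequest) { done <- req })

	rec := httptest.NewRecorder()
	body := strings.NewReader(`{"startSeconds": 5.0, "endSeconds": 105.0}`)
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/api/videos/abc/trim", body))

	// The response is already written; waiting here only lets the example
	// show what the background job received.
	return rec.Code, <-done
}
```

The detail that matters is the context: a goroutine that outlives the request needs its own context (for example one derived from context.Background() with a timeout), otherwise its database and S3 calls fail the moment the 202 is sent.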

The actual trimming is straightforward:

ffmpeg -i input.webm \
  -ss 5.000 -to 105.000 \
  -c:v libvpx-vp9 -c:a copy \
  -y output.webm

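The -ss and -to flags handle the time range. Because they come after -i, they act as output options: ffmpeg decodes the input from the beginning and discards everything before the start time, which is slower than input seeking but frame-accurate. The video track is re-encoded to VP9 because a stream copy can only begin on a keyframe, so a copied cut would snap to an earlier keyframe instead of the frame the user picked. The audio is copied through unchanged: audio packets are short (typically tens of milliseconds) and don't depend on keyframes, so the cut lands within a packet of the requested time.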
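To make the keyframe problem concrete, here is a simplified model, not ffmpeg's actual seeking logic: with stream copy, a cut has to begin on a keyframe, so it snaps back to the latest keyframe at or before the requested start. previousKeyframe is a hypothetical helper, and the keyframe timestamps in the example are made up:

```go
package main

import "sort"

// previousKeyframe returns the latest keyframe timestamp at or before start,
// which is where a stream-copied cut would really begin in this model.
// keyframes must be sorted in ascending order.
func previousKeyframe(keyframes []float64, start float64) float64 {
	// Index of the first keyframe strictly after start.
	i := sort.Search(len(keyframes), func(i int) bool { return keyframes[i] > start })
	if i == 0 {
		// No earlier keyframe: the copy starts at the beginning of the stream.
		return 0
	}
	return keyframes[i-1]
}
```

With keyframes at 0, 4, 8, and 12 seconds, `previousKeyframe(keyframes, 5)` returns 4, so a copied cut would keep a second of the dead time the user asked to remove. Re-encoding avoids this because the encoder can make the first output frame a new keyframe.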

In Go, this is a simple exec.Command:

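import (
    "fmt"
    "os/exec"
)

// trimVideo re-encodes the segment between startSeconds and endSeconds
// of inputPath into outputPath.
func trimVideo(inputPath, outputPath string, startSeconds, endSeconds float64) error {
    cmd := exec.Command("ffmpeg",
        "-i", inputPath,
        "-ss", fmt.Sprintf("%.3f", startSeconds), // output option: frame-accurate start
        "-to", fmt.Sprintf("%.3f", endSeconds),
        "-c:v", "libvpx-vp9", // re-encode video so the cut isn't limited to keyframes
        "-c:a", "copy", // pass audio packets through untouched
        "-y", // overwrite outputPath if it already exists
        outputPath,
    )
    // ffmpeg writes its diagnostics to stderr; CombinedOutput captures them for the error
    output, err := cmd.CombinedOutput()
    if err != nil {
        return fmt.Errorf("ffmpeg trim: %w: %s", err, string(output))
    }
    return nil
}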

The full async flow: download the original from S3, run ffmpeg, upload the trimmed version back to S3, update the database with the new duration, and regenerate the thumbnail. On our Hetzner CX33 (4 vCPU, 8GB RAM), a typical trim takes under 10 seconds.
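The post doesn't say how the new duration is measured. One common approach, assumed here, is to ask ffprobe for the container duration of the trimmed file. probeDuration and parseDuration are hypothetical helpers:

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// probeDuration asks ffprobe for the container duration of a file, in seconds.
func probeDuration(path string) (float64, error) {
	out, err := exec.Command("ffprobe",
		"-v", "error",
		"-show_entries", "format=duration",
		"-of", "default=noprint_wrappers=1:nokey=1",
		path,
	).Output()
	if err != nil {
		return 0, fmt.Errorf("ffprobe: %w", err)
	}
	return parseDuration(string(out))
}

// parseDuration turns ffprobe's output (for example "100.000000\n") into seconds.
// ffprobe prints "N/A" when the container has no duration; ParseFloat rejects it.
func parseDuration(s string) (float64, error) {
	d, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
	if err != nil {
		return 0, fmt.Errorf("parse duration %q: %w", s, err)
	}
	return d, nil
}
```

Keeping the parsing separate makes it testable without an ffprobe binary. It also matters in practice: WebM files recorded by Chrome's MediaRecorder often have no duration in the header, while a file written by ffmpeg normally does, so probing the trimmed output is the more reliable choice.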

Handling the async gap

Between clicking Trim and the server finishing, the video is in a "processing" state. The user needs to know what's happening.

When the trim starts, we update the video's status locally:

onTrimStarted={() => {
  setVideos((prev) =>
    prev.map((v) => (v.id === trimmingId ? { ...v, status: "processing" } : v))
  );
  setTrimmingId(null);
}}

The library shows "processing..." next to the video. But how does the UI know when trimming is done? We poll:

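// Depend on a boolean so the effect restarts only when polling should start or stop,
// not every time the list is replaced
const hasProcessing = videos.some((v) => v.status === "processing");

useEffect(() => {
  if (!hasProcessing) return;

  const interval = setInterval(async () => {
    try {
      const result = await apiFetch("/api/videos");
      setVideos(result ?? []);
    } catch {
      // Transient network error: keep the current list and retry on the next tick
    }
  }, 5000);

  return () => clearInterval(interval);
}, [hasProcessing]);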

Every 5 seconds, if any video is processing, the library refetches the video list. When the video's status changes back to "ready," polling stops automatically. No WebSockets, no server-sent events — just simple polling that starts and stops as needed.

Graceful degradation

Server-side processing introduces failure points. ffmpeg might crash. S3 might be unreachable. The server might run out of disk space. If any of these happen during a trim, the user should not lose their original video.

Every error path in the trim function falls back to restoring the video's "ready" status:

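setReadyFallback := func() {
    // Best effort: return the video to 'ready' so the user can retry.
    // Log the error instead of dropping it, since nobody is waiting on this goroutine.
    if _, err := db.Exec(ctx,
        `UPDATE videos SET status = 'ready', updated_at = now() WHERE id = $1`,
        videoID,
    ); err != nil {
        log.Printf("trim %v: failed to reset status to ready: %v", videoID, err)
    }
}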

If the trim fails at any step — download, ffmpeg, upload — the fallback runs. The original video file in S3 is untouched until the trimmed version is successfully uploaded and ready to replace it. The user sees the video return to "ready" and can try again.

Race conditions and concurrent trims

What happens if someone clicks Trim twice, or trims a video that's already being composited with a webcam overlay?

The handler checks the video's current status before proceeding:

if status != "ready" {
    httputil.WriteError(w, http.StatusConflict, "video is currently being processed")
    return
}

And the status update uses an atomic check:

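tag, err := h.db.Exec(r.Context(),
    `UPDATE videos SET status = 'processing'
     WHERE id = $1 AND user_id = $2 AND status = 'ready'`,
    videoID, userID,
)
if err != nil {
    httputil.WriteError(w, http.StatusInternalServerError, "failed to update video status")
    return
}
// Zero rows means the video is missing, not ours, or no longer 'ready'
if tag.RowsAffected() == 0 {
    httputil.WriteError(w, http.StatusConflict, "video is already being processed")
    return
}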

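The WHERE status = 'ready' clause turns the UPDATE into a compare-and-set, which acts as an optimistic lock. Even if two trim requests arrive simultaneously and both pass the initial check, PostgreSQL makes the second UPDATE wait for the row lock held by the first. Under the default READ COMMITTED isolation level, once the first commits, the second re-checks its WHERE clause against the updated row, finds status = 'processing', and affects zero rows. Only one request sets the status to "processing." The other gets a 409 Conflict.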
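The same compare-and-set idea can be shown without a database. This in-memory analogue uses a mutex where PostgreSQL uses its row lock; statusStore and raceTrims are hypothetical, and the sketch ignores user_id:

```go
package main

import "sync"

// statusStore is an in-memory stand-in for the videos table.
type statusStore struct {
	mu     sync.Mutex
	status map[string]string
}

// startProcessing is the analogue of
//   UPDATE videos SET status = 'processing' WHERE id = $1 AND status = 'ready'
// and reports whether a row was "affected".
func (s *statusStore) startProcessing(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != "ready" {
		return false
	}
	s.status[id] = "processing"
	return true
}

// raceTrims fires n concurrent trim attempts at one video and counts how many
// succeed in moving it to "processing".
func raceTrims(n int) int {
	s := &statusStore{status: map[string]string{"v1": "ready"}}
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.startProcessing("v1") {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}
```

However many goroutines race, `raceTrims` reports exactly one winner: the equivalent of one UPDATE affecting a row while every other request gets a 409.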

Validation at the boundary

The trim endpoint validates everything before starting work:

startSeconds must be non-negative
endSeconds must be greater than startSeconds
endSeconds can't exceed the video's actual duration
The trimmed segment must be at least 1 second long

All of these return 400 Bad Request with specific error messages. The frontend enforces these constraints too — the timeline handles can't cross each other and enforce the 1-second minimum — but the server doesn't trust the client.
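A sketch of those four checks as a single Go function. The name validateTrim and the error wording are illustrative; the caller would turn a non-nil error into a 400:

```go
package main

import (
	"errors"
	"fmt"
)

// minTrimSeconds is the 1-second minimum segment length.
const minTrimSeconds = 1.0

// validateTrim applies the rules in order; durationSeconds is the stored
// duration of the video. A non-nil error means 400 Bad Request.
func validateTrim(startSeconds, endSeconds, durationSeconds float64) error {
	switch {
	case startSeconds < 0:
		return errors.New("startSeconds must be non-negative")
	case endSeconds <= startSeconds:
		return errors.New("endSeconds must be greater than startSeconds")
	case endSeconds > durationSeconds:
		return fmt.Errorf("endSeconds (%.3f) exceeds video duration (%.3f)", endSeconds, durationSeconds)
	case endSeconds-startSeconds < minTrimSeconds:
		return errors.New("trimmed segment must be at least 1 second long")
	}
	return nil
}
```

Checking the duration against the value stored on the server, not one sent by the client, is what makes the third rule meaningful.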

The tradeoff

Client-side trimming would skip the server entirely: no background job, no status to track, no polling. But it would be limited to browsers that support the right codecs, it would consume the user's CPU and memory, and it would break if they closed the tab.

Server-side trimming takes a few seconds but works with any format ffmpeg can decode, runs on hardware we control, and degrades gracefully when things go wrong. For a tool where recordings are important enough to keep, reliability matters more than speed.

Try it

SendRec is open source (AGPL-3.0) and self-hostable. The video trimming feature is live at app.sendrec.eu. The implementation is in trim.go and TrimModal.tsx.
