Alberto Loddo

Posted on • Originally published at firstcutstudio.xyz

I built an AI that edits GoPro footage automatically. Here is how it works

Every action camera owner knows the feeling. You come back from a surf trip, a mountain bike ride, or a ski weekend with 200 clips on your SD card. You tell yourself you'll edit them this weekend. You never do.

I've been sitting on GoPro footage from trips I took years ago. The editing process is just too painful: you scrub through hours of shaky, boring footage to find the 30 seconds of gold, then figure out transitions, music timing, and pacing. Most people give up, and the footage sits on a hard drive forever.

So I built FirstCut Studio to fix this. You upload your raw clips, pick a vibe, and the AI does the rest. No timeline. No editing skills required. Just a highlight reel that actually looks good.

Here's how it works under the hood.

The Pipeline

FirstCut runs two separate pipelines: one for understanding your footage, one for creating the edit.

Import Pipeline (what happens when you upload clips):

  1. Ingest - We extract metadata, compute file hashes, probe video properties, and detect orientation. One important decision: we do zero video normalization at this stage. Raw files go straight through.

  2. Gemini Analyze - This is the core. We send each clip to Gemini 2.5 Flash with a structured prompt asking it to grade quality, identify scene boundaries, detect key moments (big air, crashes, scenic views, celebrations), and tag the emotional tone. Gemini returns JSON with timestamps and confidence scores.

  3. Music Analysis - We run librosa for beat tracking on the music track, then pass the track through Gemini for semantic understanding (is this a buildup? a drop? a chill section?).

  4. Segment - Scene detection and clip extraction using the boundaries Gemini identified.
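
To make steps 2 and 4 more concrete, here is a minimal sketch of turning an analysis response into cut-ready segments. It is not FirstCut's actual code: the JSON shape (`moments`, `label`, `start`, `end`, `confidence`), the 0.6 confidence threshold, and the half-second padding are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Moment:
    label: str         # e.g. "big_air", "crash", "scenic" (hypothetical labels)
    start: float       # seconds from the start of the clip
    end: float
    confidence: float  # 0.0 to 1.0


def parse_moments(raw_json: str, min_confidence: float = 0.6) -> list[Moment]:
    """Parse a hypothetical analysis payload, keeping confident, well-formed moments."""
    payload = json.loads(raw_json)
    moments = []
    for item in payload.get("moments", []):
        try:
            m = Moment(
                label=str(item["label"]),
                start=float(item["start"]),
                end=float(item["end"]),
                confidence=float(item["confidence"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed entries instead of failing the whole clip
        if m.end > m.start >= 0 and m.confidence >= min_confidence:
            moments.append(m)
    return sorted(moments, key=lambda m: m.start)


def to_segments(moments: list[Moment], clip_duration: float,
                padding: float = 0.5) -> list[tuple[float, float]]:
    """Pad each moment, clamp it to the clip bounds, and merge overlapping ranges."""
    segments: list[tuple[float, float]] = []
    for m in moments:  # assumes moments are sorted by start time
        start = max(0.0, m.start - padding)
        end = min(clip_duration, m.end + padding)
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


sample = json.dumps({"moments": [
    {"label": "big_air", "start": 12.0, "end": 14.5, "confidence": 0.92},
    {"label": "scenic", "start": 14.8, "end": 18.0, "confidence": 0.70},
    {"label": "crash", "start": 40.0, "end": 41.0, "confidence": 0.30},
]})
print(to_segments(parse_moments(sample), clip_duration=60.0))  # [(11.5, 18.5)]
```

The low-confidence crash is dropped, and the two nearby moments merge into one segment once padding is applied. Validating the model's output before segmenting means one malformed entry costs a single moment rather than the whole clip.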

Render Pipeline (what happens when you hit "Create Edit"):

  1. Narrative Planner - An LLM-driven composition engine that selects which clips to include, in what order, with what effects.

  2. Music Timing - Beat-aligns every cut so transitions land on the beat. This is what makes auto-edits feel professional instead of random.

  3. Render - A 3-pass memory-efficient FFmpeg render. We process segments sequentially with garbage collection between operations, keeping peak memory around 1.5 GB instead of 8 GB.

  4. QC - Automated quality check validating EDL integrity and beat alignment accuracy.
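
The beat-alignment idea in step 2, and the matching QC check in step 4, can be sketched in a few lines. This is an illustration under assumptions, not the production code: the function names and the `min_gap` rule are hypothetical, and `beats` stands in for a sorted list of beat times in seconds, such as you would get from `librosa.beat.beat_track` followed by `librosa.frames_to_time`.

```python
from bisect import bisect_left


def snap_to_beat(t, beats):
    """Return the beat time closest to t. `beats` must be sorted and non-empty."""
    i = bisect_left(beats, t)
    candidates = beats[max(0, i - 1): i + 1]  # the beat before t and the beat at or after t
    return min(candidates, key=lambda b: abs(b - t))


def align_cuts(cuts, beats, min_gap=0.5):
    """Snap each planned cut to its nearest beat.

    A cut that would land less than `min_gap` seconds after the
    previously kept cut is dropped (an assumed rule, for illustration).
    """
    aligned = []
    for t in sorted(cuts):
        snapped = snap_to_beat(t, beats)
        if not aligned or snapped - aligned[-1] >= min_gap:
            aligned.append(snapped)
    return aligned


def max_beat_offset(cuts, beats):
    """QC metric: the largest distance from any cut to its nearest beat."""
    return max((abs(t - snap_to_beat(t, beats)) for t in cuts), default=0.0)


beats = [i * 0.5 for i in range(9)]       # 120 BPM: 0.0, 0.5, ..., 4.0
planned = [0.9, 1.1, 2.2, 3.7]            # cut times proposed by the planner
aligned = align_cuts(planned, beats)
print(aligned)                            # [1.0, 2.0, 3.5]
print(max_beat_offset(aligned, beats))    # 0.0
```

The cuts at 0.9 s and 1.1 s both snap to the beat at 1.0 s, so the second one is dropped. A QC step could then fail the render if `max_beat_offset` exceeds some tolerance, for example one video frame.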

The Interesting Engineering Challenge

The CreativeExecutionEngine bridges the gap between "make it feel energetic" and actual FFmpeg filter parameters.

The LLM outputs creative intent like "speed ramp into the jump, hold the apex, snap cut to landing." The engine maps that to concrete VFX: a 2x speed ramp with ease-in curve, a 0.5x slow-motion hold, and a 3-frame hard cut. We enforce hard caps (max 2 split screens, 3 speed ramps, 5 text overlays per edit) to prevent the AI from going overboard.
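
As a rough sketch of how such caps might be enforced, assume the planner emits a list of effect dicts with a `type` field. The cap values come from the paragraph above; the type names, the dict shape, and both function names are hypothetical. The `setpts` helper shows only a constant speed change; an eased ramp would need a time-varying expression, which is left out here.

```python
from collections import Counter

# Cap values as stated in the article; the effect-type names are assumptions.
EFFECT_CAPS = {"split_screen": 2, "speed_ramp": 3, "text_overlay": 5}


def enforce_caps(effects, caps=EFFECT_CAPS):
    """Keep effects in planned order, dropping any that exceed their type's cap."""
    used = Counter()
    kept = []
    for effect in effects:
        kind = effect["type"]
        limit = caps.get(kind)  # None means this effect type is uncapped
        if limit is not None and used[kind] >= limit:
            continue
        used[kind] += 1
        kept.append(effect)
    return kept


def setpts_filter(speed):
    """Build a constant-speed FFmpeg video filter string.

    speed=2.0 plays twice as fast; speed=0.5 plays at half speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return f"setpts={1 / speed:.3f}*PTS"


plan = [{"type": "speed_ramp", "speed": 2.0} for _ in range(5)]
plan.append({"type": "hard_cut"})
kept = enforce_caps(plan)
print(len(kept))            # 4: three speed ramps plus the uncapped hard cut
print(setpts_filter(2.0))   # setpts=0.500*PTS
print(setpts_filter(0.5))   # setpts=2.000*PTS
```

Applying the caps in code, after the LLM has produced its plan, keeps the limits deterministic instead of relying on the prompt alone.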

The stack: Next.js frontend, Python/FastAPI backend, Gemini 2.5 Flash for all video understanding, FFmpeg for rendering, and Cloudflare R2 for storage.

One thing I learned: skip video normalization. Early versions re-encoded every upload to a standard format before analysis. This tripled storage costs and added minutes to processing time. When I tested sending raw files directly to Gemini, it handled them perfectly. Removing normalization was the single biggest infrastructure win.

What Actually Happens

A user recently uploaded around 200 clips from a multi-day trip. The import pipeline processed all of them, Gemini graded each one, and the system identified the strongest moments across the entire collection. When they hit render, the narrative planner pulled the best footage, beat-matched everything, and delivered the final edit in minutes.

Try It

FirstCut is live at firstcutstudio.xyz with a free tier. Upload your forgotten GoPro footage and see what comes out. If you're a developer curious about the video AI space, I'd love to hear what you think.

Building in public, so reach out with questions about the architecture or suggestions for what to build next.
