Every time I watch a stream, I think the same thing: there are 10 incredible moments buried in 3 hours of footage that no one will ever see.
Editing is the bottleneck. It takes skill, time, and software most people don't want to learn. So the best gaming moments just... disappear.
I built clipforge to fix that. Upload a gameplay video. Get three ready-to-post formats back — automatically.
## What it produces
From a single upload (mp4, mov, or mkv — up to 10 minutes):
- TikTok/Reels clip — 60 seconds max, vertical 9:16 crop, auto-captions via Whisper
- YouTube highlight reel — top moments sequenced, up to 10 minutes
- Cinematic trailer — 90 seconds, fast cuts + slow-mo climax on the best moment
All three come back as a single ZIP download.
## How the pipeline works
```text
Upload video
      ↓
Scene detection (PySceneDetect)
      ↓
Highlight scoring (librosa RMS energy)
      ↓
Clip selection (top N by score)
      ↓
Format assembly (moviepy)
  ├── TikTok: best moment, vertical crop, captions
  ├── YouTube: top moments concatenated
  └── Trailer: fast cuts + slow-mo climax
      ↓
ZIP download
```
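The stages above map naturally onto a small orchestrator. A minimal sketch, with each stage stubbed out; the function names and return shapes here are illustrative, not clipforge's actual internals:

```python
from pathlib import Path

# Illustrative stubs -- the real stages wrap PySceneDetect, librosa, and moviepy.
def detect_scenes(video: Path) -> list[tuple[float, float]]:
    """Pretend scene boundaries as (start, end) in seconds."""
    return [(0.0, 12.5), (12.5, 40.0), (40.0, 55.0)]

def score_scene(video: Path, scene: tuple[float, float]) -> float:
    """Stand-in for RMS-energy scoring; here just the scene length."""
    return scene[1] - scene[0]

def run_pipeline(video: Path, top_n: int = 10) -> list[tuple[float, float]]:
    """Detect, score, and return the top-N scenes for the assembler."""
    scenes = detect_scenes(video)
    ranked = sorted(scenes, key=lambda s: score_scene(video, s), reverse=True)
    return ranked[:top_n]

print(run_pipeline(Path("clip.mp4"), top_n=2))  # [(12.5, 40.0), (40.0, 55.0)]
```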
### Scene detection
PySceneDetect finds boundaries where the video content changes significantly — cut to a new location, a killcam, a respawn screen. Each boundary becomes a candidate scene.
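Under the hood, content-aware detection boils down to comparing consecutive frames and flagging big jumps. A toy numpy illustration of that intuition; this is not PySceneDetect's implementation, and the threshold is made up:

```python
import numpy as np

def toy_scene_cuts(frames: np.ndarray, threshold: float = 30.0) -> list[int]:
    """Return frame indices where mean absolute pixel change exceeds threshold."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)).mean(axis=(1, 2))
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Three flat "frames": two dark, then a hard cut to bright.
frames = np.stack([
    np.full((4, 4), 10),
    np.full((4, 4), 12),
    np.full((4, 4), 200),
])
print(toy_scene_cuts(frames))  # [2]
```

The real library layers a lot on top of this (fade handling, flash suppression, minimum scene length), which is exactly why it's worth using instead of rolling your own.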
### Highlight scoring
For each scene, I extract the audio with librosa and compute the mean RMS energy. This is a simple but effective proxy for excitement: explosions, clutch moments, commentary peaks all produce louder audio than menu screens or downtime.
```python
import librosa
import numpy as np

def _rms_score(y, sr, start, end):
    """Mean RMS energy of the audio between start and end (seconds)."""
    segment = y[int(start * sr):int(end * sr)]
    rms = librosa.feature.rms(y=segment, frame_length=2048, hop_length=512)
    return float(np.mean(rms))
```
Scenes are ranked by score. The top 10 go to the assembler.
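The ranking is easy to sanity-check with synthetic audio: a loud burst should outrank near-silence. A dependency-free sketch using a plain-numpy RMS in place of librosa's (the scene boundaries here are invented):

```python
import numpy as np

def rms_score(y: np.ndarray, sr: int, start: float, end: float) -> float:
    """Plain-numpy stand-in for librosa.feature.rms: root-mean-square energy."""
    segment = y[int(start * sr):int(end * sr)]
    return float(np.sqrt(np.mean(segment ** 2)))

sr = 22050
quiet = 0.01 * np.random.default_rng(0).standard_normal(sr)   # downtime
loud = 0.8 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)     # explosion-ish
y = np.concatenate([quiet, loud])

scenes = {"menu": (0.0, 1.0), "fight": (1.0, 2.0)}
ranked = sorted(scenes, key=lambda k: rms_score(y, sr, *scenes[k]), reverse=True)
print(ranked)  # ['fight', 'menu']
```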
### Format assembly
moviepy handles the actual video cutting. The TikTok path crops to 9:16 and adds caption overlays. The trailer path applies 0.5x slow-mo to the highest-scoring moment and stacks fast cuts before it.
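The 9:16 crop itself is just arithmetic: keep the full height and take a centered slice of the width. A small helper that computes the crop box (the function name is mine, not clipforge's):

```python
def vertical_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Centered 9:16 crop box as (x1, y1, x2, y2), keeping full height."""
    target_w = round(height * 9 / 16)
    x1 = (width - target_w) // 2
    return (x1, 0, x1 + target_w, height)

print(vertical_crop_box(1920, 1080))  # (656, 0, 1264, 1080)
```

The resulting box then feeds moviepy's crop effect to produce the vertical frame.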
Whisper runs locally on the TikTok segment to generate captions — no API key, no upload, no cost per use.
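Whisper's `transcribe()` returns timed segments of text; turning those into readable on-screen captions is mostly a chunking problem. A sketch that greedily packs each segment's words into short caption lines (the 30-character policy is an assumption of mine, not clipforge's actual logic):

```python
def caption_lines(segments: list[dict], max_chars: int = 30) -> list[str]:
    """Greedily pack each Whisper segment's words into short caption lines."""
    lines = []
    for seg in segments:
        current = ""
        for word in seg["text"].split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                lines.append(current)   # line full: flush and start over
                current = word
            else:
                current = candidate
        if current:
            lines.append(current)
    return lines

segments = [{"start": 0.0, "end": 2.4,
             "text": "that was an absolutely insane clutch play"}]
print(caption_lines(segments))  # ['that was an absolutely insane', 'clutch play']
```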
## The stack
| Layer | Tech |
|---|---|
| Backend | Python, FastAPI, BackgroundTasks |
| Scene detection | PySceneDetect |
| Audio analysis | librosa |
| Video editing | moviepy |
| Captions | OpenAI Whisper (local) |
| Frontend | Next.js 15, Tailwind CSS |
| Deploy | Railway (backend) + Vercel (frontend) |
## What I learned building it
RMS energy is a surprisingly good highlight detector. I expected to need something more sophisticated — computer vision, game event detection, kill feed parsing. But audio alone gets you 80% of the way there. Exciting moments in games are almost always loud moments.
PySceneDetect is fast and battle-tested. I considered writing my own frame differencing logic. I'm glad I didn't. The library handles edge cases (fades, flashes, black frames) that would have taken weeks to debug myself.
Whisper on a 60-second clip is fast enough. I expected local transcription to be a bottleneck. On a modern machine with the base model, a 60-second clip transcribes in under 10 seconds. Good enough for v1.
moviepy's resource management requires care. VideoFileClip objects need explicit .close() calls or you'll leak file handles and temp files across the process lifetime. I wrapped the source in a try/finally block after the first review caught a resource leak on exception paths.
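The same pattern generalizes: anything with a `.close()` can be guarded with `contextlib.closing`, which releases the resource even when an exception fires mid-pipeline. A dependency-free illustration, with `FakeClip` standing in for moviepy's `VideoFileClip`:

```python
from contextlib import closing

class FakeClip:
    """Stand-in for VideoFileClip: records whether close() was called."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

clip = FakeClip()
try:
    with closing(clip):
        raise RuntimeError("encode failed mid-pipeline")
except RuntimeError:
    pass

print(clip.closed)  # True -- closed despite the exception
```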
## Try it
Code: LakshmiSravyaVedantham/clipforge
```bash
git clone https://github.com/LakshmiSravyaVedantham/clipforge
cd clipforge/backend
pip install -r requirements.txt
uvicorn main:app --reload
```
Open the frontend separately:
```bash
cd frontend && npm install && npm run dev
```
The gaming moments that go unshared most often aren't the planned ones. They're the accidental clutches, the absurd bug moments, the 1-in-1000 shots nobody was recording for. clipforge is built for those.