I built an AI video clip finder that runs 100% in your browser — no uploads, no API, no GPU costs

Oleksandr — Tue, 16 Jun 2026 06:00:00 +0000

Every time I used Opus Clip or Vidyo.ai, the same thought hit me:
I’m paying $20/month to upload my video to someone else’s server,
wait in a queue, and hope their AI finds something useful.

So I built an alternative that runs entirely in the browser.
No file uploads. No subscriptions. No server costs on my end.
The result is ClipGG’s AI Video Highlights tool —
and in this post I’ll walk through exactly how it works technically.

The core problem I was solving

Finding highlights in a long video is genuinely hard to automate well.
The expensive approach: transcribe with Whisper, feed text to GPT-4,
profit. But that requires a backend, API costs, and user uploads.

I wanted zero server involvement.
That meant doing everything with browser APIs.

What actually runs in the browser

The pipeline has four stages:

1. File reading — no upload needed

const arrayBuffer = await file.arrayBuffer()
// The file never leaves the device.
// ArrayBuffer is passed directly to Web Audio API.

2. Audio analysis — Web Audio API + Web Worker

I use OfflineAudioContext to decode audio faster than real-time,
then downsample to 8000–11025 Hz before analysis.
This reduces RAM usage from ~115MB to ~19MB for a 10-minute video.
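
The memory figures follow from simple arithmetic: decoded audio is held as 32-bit floats, so mono audio costs 4 bytes per sample. Here is a rough check, assuming a 48 kHz mono source downsampled to 8 kHz. The post does not state the source rate, and `pcmBytes` is a name made up for this sketch.

```javascript
// Bytes needed to hold decoded PCM audio as Float32 samples.
function pcmBytes(durationSeconds, sampleRate, channels = 1) {
  return durationSeconds * sampleRate * channels * Float32Array.BYTES_PER_ELEMENT
}

const tenMinutes = 10 * 60 // seconds

// Assumed source rate of 48 kHz: 600 * 48000 * 4 bytes
const before = pcmBytes(tenMinutes, 48000)
// Analysis rate of 8 kHz: 600 * 8000 * 4 bytes
const after = pcmBytes(tenMinutes, 8000)

console.log(`${(before / 1e6).toFixed(1)} MB -> ${(after / 1e6).toFixed(1)} MB`)
// prints "115.2 MB -> 19.2 MB"
```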

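// decodeAudioData() is asynchronous, so decoding does not block the UI.
// Web Audio contexts only exist on the main thread, so the decode happens
// here and the heavy per-window analysis is handed to a Web Worker.
const tempCtx = new OfflineAudioContext(1, 44100, 44100)
const audioBuffer = await tempCtx.decodeAudioData(arrayBuffer)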

// Downsample manually to the analysis rate. Each output sample is the
// average of the source samples it covers (a simple box filter).
function downsample(channelData, originalRate, targetRate) {
  // Averaging only works when reducing the rate; otherwise return the input as-is
  if (targetRate >= originalRate) return channelData
  const ratio = originalRate / targetRate
  const output = new Float32Array(Math.floor(channelData.length / ratio))
  for (let i = 0; i < output.length; i++) {
    // Source range [start, end) that maps onto output sample i
    const start = Math.floor(i * ratio)
    const end = Math.min(Math.floor((i + 1) * ratio), channelData.length)
    let sum = 0
    for (let j = start; j < end; j++) sum += channelData[j]
    output[i] = sum / (end - start)
  }
  return output
}

3. Scoring — three audio signals

For each 500ms window I compute:

RMS (Root Mean Square) — average energy/loudness
ZCR (Zero Crossing Rate) — distinguishes speech from noise
Volume Peak — catches sudden loud moments
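
A minimal sketch of how these three signals could be computed per window. It assumes the input is a mono `Float32Array`; the function names `windowFeatures` and `analyze` and the field names are illustrative, not taken from the tool's source.

```javascript
// Compute RMS, zero-crossing rate, and absolute peak for one window of samples.
function windowFeatures(samples) {
  let sumSquares = 0
  let crossings = 0
  let peak = 0
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i]
    sumSquares += s * s
    const abs = Math.abs(s)
    if (abs > peak) peak = abs
    // Count a crossing whenever the sign flips between neighbouring samples
    if (i > 0 && (s >= 0) !== (samples[i - 1] >= 0)) crossings++
  }
  const n = samples.length || 1 // avoid dividing by zero on an empty window
  return {
    rms: Math.sqrt(sumSquares / n),
    zcr: crossings / n,
    volumePeak: peak,
  }
}

// Split a mono signal into consecutive windows (500 ms by default).
// A trailing partial window is dropped.
function analyze(channelData, sampleRate, windowSeconds = 0.5) {
  const size = Math.floor(sampleRate * windowSeconds)
  const segments = []
  for (let start = 0; start + size <= channelData.length; start += size) {
    const features = windowFeatures(channelData.subarray(start, start + size))
    segments.push({ time: start / sampleRate, ...features })
  }
  return segments
}
```

At 8000 Hz each window holds 4000 samples, so a 10-minute file yields 1200 segments.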

Then I do relative normalization so a quiet podcast
and a loud gaming stream are scored fairly against themselves:

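// Relative normalization: scale each segment against this video's own min and max
// Guard against a zero range (e.g. a silent or perfectly flat track)
const rmsRange = globalMaxRms - globalMinRms
const normalizedRms = rmsRange > 0 ? (seg.rms - globalMinRms) / rmsRange : 0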

Different content types use different weights:

Mode	RMS	ZCR	Peak
Gaming	0.20	0.35	0.20
Podcast	0.50	0.05	0.20
Funny	0.15	0.20	0.35
General	0.30	0.20	0.25
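
The table can be applied as a weighted sum over the normalized signals. The sketch below is one way to do that, not the tool's actual scoring code: `MODE_WEIGHTS`, `normalize`, and `scoreSegments` are hypothetical names, and only the three columns listed above are used. The weights in each row do not add up to 1, so the real scorer presumably mixes in other signals that are not described here.

```javascript
// Weights copied from the table above.
const MODE_WEIGHTS = {
  gaming:  { rms: 0.20, zcr: 0.35, peak: 0.20 },
  podcast: { rms: 0.50, zcr: 0.05, peak: 0.20 },
  funny:   { rms: 0.15, zcr: 0.20, peak: 0.35 },
  general: { rms: 0.30, zcr: 0.20, peak: 0.25 },
}

// Min-max normalize one feature across all segments.
// A flat feature (max === min) maps to 0 instead of dividing by zero.
function normalize(values) {
  let min = Infinity
  let max = -Infinity
  for (const v of values) {
    if (v < min) min = v
    if (v > max) max = v
  }
  const range = max - min
  return values.map(v => (range > 0 ? (v - min) / range : 0))
}

// Attach a weighted score to every segment for the chosen mode.
function scoreSegments(segments, mode = 'general') {
  const w = MODE_WEIGHTS[mode]
  if (!w) throw new Error(`Unknown mode: ${mode}`)
  const rms = normalize(segments.map(s => s.rms))
  const zcr = normalize(segments.map(s => s.zcr))
  const peak = normalize(segments.map(s => s.volumePeak))
  return segments.map((s, i) => ({
    ...s,
    score: w.rms * rms[i] + w.zcr * zcr[i] + w.peak * peak[i],
  }))
}
```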

4. Clip selection — diversity + peak centering

The selector groups high-scoring segments into zones,
finds the peak moment in each zone, and centers a 30–90 second
clip around it. A diversity radius of 12 seconds prevents
three clips from covering the same moment.

const combinedSignal =
  (seg.score ?? 0) +
  (seg.energyChange ?? 0) * 2.0 +
  (seg.volumePeak ?? 0) * 1.5

// Center the clip around the strongest combined signal,
// not just the loudest sustained section
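
To make the selection step concrete, here is a simplified greedy sketch. It skips the zone grouping and just takes the strongest segments that are at least a diversity radius apart, then centers a fixed-length clip on each. The `selectClips` name, its options, and the 60-second default are assumptions for illustration; only the combined-signal formula and the 12-second radius come from the text above.

```javascript
// Same combined signal as above.
function combinedSignal(seg) {
  return (seg.score ?? 0) +
    (seg.energyChange ?? 0) * 2.0 +
    (seg.volumePeak ?? 0) * 1.5
}

// Pick up to `count` peaks that are at least `diversityRadius` seconds apart,
// then center a clip of `clipLength` seconds on each peak.
function selectClips(segments, duration, options = {}) {
  const { count = 3, clipLength = 60, diversityRadius = 12 } = options
  const ranked = [...segments].sort((a, b) => combinedSignal(b) - combinedSignal(a))

  const peaks = []
  for (const seg of ranked) {
    if (peaks.length === count) break
    const tooClose = peaks.some(p => Math.abs(p.time - seg.time) < diversityRadius)
    if (!tooClose) peaks.push(seg)
  }

  const length = Math.min(clipLength, duration)
  return peaks.map(peak => {
    // Center on the peak, then clamp so the clip stays inside the video
    const start = Math.min(Math.max(peak.time - length / 2, 0), duration - length)
    return { start, end: start + length, peakTime: peak.time }
  })
}
```

Note that a 12-second radius only keeps the peaks apart. Clips of 30 to 90 seconds centered on peaks that close together can still overlap.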

The Safari problem I didn’t expect

Safari on iOS can’t decode video containers
via AudioContext.decodeAudioData().
It only accepts clean audio files.

The fix: detect iOS and pre-extract audio with FFmpeg.wasm
before passing it to the Web Audio API:

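// iPadOS 13+ sends a desktop "Macintosh" user agent, so also check for touch support
const isIOS = /iPhone|iPad|iPod/i.test(navigator.userAgent) ||
  (/Macintosh/i.test(navigator.userAgent) && navigator.maxTouchPoints > 1)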

if (isIOS) {
  await ffmpeg.exec([
    '-i', 'input_video',
    '-vn',
    '-acodec', 'pcm_s16le',  // WAV — guaranteed to work on all iOS versions
    '-ar', '44100',
    '-ac', '1',
    'audio.wav'
  ])
  // Pass audio.wav to Web Audio instead of the original video
}

WAV/PCM is uncompressed and decodes reliably on every iOS version.
AAC containers do not.

Export — FFmpeg.wasm with stream copy

Once highlights are found, FFmpeg.wasm cuts the clips:

// Fast path: H.264 + AAC + MP4 = stream copy, no re-encoding
// A 90-second clip exports in ~2–3 seconds
await ffmpeg.exec([
  '-ss', String(clip.start),
  '-i', 'input',
  '-t', String(clip.end - clip.start),
  '-c', 'copy',               // copy bytes, don't re-encode
  '-avoid_negative_ts', 'make_zero',
  '-movflags', '+faststart',
  outputName
])

Other inputs (MOV or MKV containers, AV1 video) are converted to MP4
before the analysis pipeline runs. This also fixed all the
“file won’t export” bugs from iPhone footage.

What I learned

Don’t assume OfflineAudioContext hands you the rate you asked for.
I assumed new OfflineAudioContext(1, length, 8000)
would give me 8 kHz audio. It didn’t.
Read audioBuffer.sampleRate after decoding and treat that as the real rate.
If you need something lower, downsample manually.

Transfer, don’t copy ArrayBuffers.
worker.postMessage({ arrayBuffer }, [arrayBuffer])
transfers ownership with zero memory copy.
Without the second argument you’re doubling RAM usage.
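
The difference is easy to see outside a worker too. The sketch below uses `structuredClone`, which accepts the same kind of transfer list as `postMessage`, so it runs as-is in a recent browser or in Node 17+. The 8 MiB size is arbitrary.

```javascript
const size = 8 * 1024 * 1024
const source = new ArrayBuffer(size)

// Without a transfer list the data is copied: two 8 MiB buffers now exist.
const copy = structuredClone(source)
const sizeAfterCopy = source.byteLength // still 8388608

// With a transfer list ownership moves: the source is detached and empty.
const moved = structuredClone(source, { transfer: [source] })
const sizeAfterTransfer = source.byteLength // 0

console.log(sizeAfterCopy, sizeAfterTransfer, moved.byteLength)
// prints "8388608 0 8388608"

// The worker equivalent from the post:
//   worker.postMessage({ arrayBuffer }, [arrayBuffer])
```

After a transfer the sending side can no longer read the buffer, so grab anything you still need from it first.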

-ss before -i for stream copy, after for re-encode.
This one cost me an hour. For -c copy, seek before input
for speed. For re-encoding, seek after input for frame accuracy.
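
One way to keep the two orderings straight is to build the argument list in a helper. This is a sketch only: `buildCutArgs` is a hypothetical function, and the re-encode branch assumes your FFmpeg.wasm build includes the `libx264` and `aac` encoders.

```javascript
// Build FFmpeg arguments for cutting one clip.
function buildCutArgs(clip, input, output, { reencode = false } = {}) {
  const start = String(clip.start)
  const length = String(clip.end - clip.start)

  if (!reencode) {
    // Input seeking: -ss before -i jumps to a keyframe near the start time.
    // Fast, but with -c copy the cut lands on a keyframe, not the exact frame.
    return [
      '-ss', start, '-i', input, '-t', length,
      '-c', 'copy',
      '-avoid_negative_ts', 'make_zero',
      '-movflags', '+faststart',
      output,
    ]
  }

  // Output seeking: -ss after -i decodes from the beginning and discards
  // frames until the start time. Slower, but frame-accurate.
  return [
    '-i', input, '-ss', start, '-t', length,
    '-c:v', 'libx264', '-c:a', 'aac',
    '-movflags', '+faststart',
    output,
  ]
}

// Usage with the exec() call shown earlier:
//   await ffmpeg.exec(buildCutArgs(clip, 'input', outputName))
```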

Try it

The tool is live and free at:
👉 https://clipgg.uk/en/ai-video-highlights

Drop a video, pick a mode (Gaming / Podcast / Funny / General),
and get three highlight clips with timestamps in about 30 seconds.

No account. No upload. Works on desktop Chrome, Firefox,
and now iOS Safari too.

Curious what others think about the audio scoring approach —
would love feedback on the algorithm in the comments.

I built 12 free browser-based tools for creators — here's what I learned

Oleksandr — Sun, 31 May 2026 19:12:10 +0000

A few weeks ago I launched ClipGG — a collection of 12 free
browser-based tools for content creators, video editors,
and writers. No signups, no file uploads, no subscriptions.
Everything runs locally in the browser.

Here is what I learned building it.

Why browser-based?

The obvious reason is privacy. When you upload a video to
an online tool, you have no idea where that file goes or
how long it sits on someone's server. With browser-based
processing, the file never leaves your device.

The less obvious reason is speed. No upload queue, no
server processing time, no waiting. The Web Audio API,
Canvas API, and MediaRecorder handle surprisingly heavy
tasks directly in the tab.

What the tools do

The suite covers the small repetitive tasks that slow
down a creator's workflow:

Word & Character Counter — real-time word count, reading time, speaking time, and keyword density
SRT Subtitle Cleaner — strips timecodes and tags from subtitle files, converts to plain text or article format
SRT ↔ VTT Converter — converts between subtitle formats for HTML5 video and YouTube
YouTube Title Validator — previews how your title looks in desktop and mobile search before publishing
Audio Extractor — pulls audio from MP4, WebM, MKV using the Web Audio API, no upload
Video Aspect Ratio Resizer — crops horizontal video to 9:16 for TikTok and Shorts with blur background fill
AI Video Hook Generator — generates scroll-stopping opening lines for short-form video
AI Freelance Email Generator — writes cold emails and Upwork proposals based on job description
AI Content Repurpose Machine — turns one piece of content into Twitter threads, LinkedIn posts, and more
Bulk Image Compressor — batch compresses JPEG, PNG, WebP locally using Canvas
YouTube Thumbnail Downloader — grabs HD thumbnails and extracts dominant color palettes
Video Teleprompter — smooth scrolling prompter with mirror mode and webcam overlay

One technical thing worth sharing

The Audio Extractor was the hardest to get right. True MP3
encoding needs an encoder that browsers don't include
natively. The output is WebM/Opus instead: smaller than
WAV, excellent quality, and playable in most modern browsers
and media players. Renaming the file to .mp3 works in players
that sniff the content rather than trust the extension, but
the data inside is still WebM/Opus.
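
If the recording goes through MediaRecorder (an assumption, the post does not say how the Audio Extractor encodes), it is worth checking which container the browser will actually produce before recording. `pickAudioMimeType` and the candidate list below are illustrative; `MediaRecorder.isTypeSupported` is the real browser API it is meant to wrap.

```javascript
// Return the first MIME type the environment reports as supported, or null.
// `isSupported` is injected so the function also works outside a browser.
function pickAudioMimeType(
  isSupported,
  candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4']
) {
  return candidates.find(type => isSupported(type)) ?? null
}

// In a browser:
//   const mimeType = pickAudioMimeType(t => MediaRecorder.isTypeSupported(t))
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined)
```

Naming the output file after the chosen type (.webm or .m4a) avoids the mismatch you get from a renamed .mp3.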

For the Video Aspect Ratio Resizer, the blur background
effect uses a second canvas layer running the same video
at low resolution with a CSS blur filter, composited
behind the main cropped layer. MediaRecorder then records
the combined output. It runs at full speed on any modern laptop.
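
The geometry behind a blur-fill layout is small enough to sketch. The version below assumes the background is scaled to cover the 9:16 frame and the foreground is scaled to fit inside it. The real tool may crop the foreground differently, and `layoutLayers` and the 1080x1920 default are made up for this example.

```javascript
// Compute where to draw the blurred background and the sharp foreground
// on an output canvas, given the source video dimensions.
function layoutLayers(srcW, srcH, outW = 1080, outH = 1920) {
  const cover = Math.max(outW / srcW, outH / srcH)   // fills the frame, overflow is cropped
  const contain = Math.min(outW / srcW, outH / srcH) // whole video stays visible

  // Centered rectangle for a given scale factor
  const rect = scale => {
    const w = srcW * scale
    const h = srcH * scale
    return { x: (outW - w) / 2, y: (outH - h) / 2, w, h }
  }

  return { background: rect(cover), foreground: rect(contain) }
}

// In a browser, each frame would then be drawn roughly like this:
//   const { background: bg, foreground: fg } = layoutLayers(video.videoWidth, video.videoHeight)
//   ctx.drawImage(video, bg.x, bg.y, bg.w, bg.h) // blurred layer
//   ctx.drawImage(video, fg.x, fg.y, fg.w, fg.h) // sharp layer on top
```

For a 1920x1080 source the foreground comes out at 1080x607.5, centered vertically, with the background scaled up to the full 1920 height.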

The stack

Next.js App Router, 16 languages via i18n, all processing
in the browser using native Web APIs. No backend for the
tools themselves.

Try it

Everything is free at clipgg.uk —
no account, no limits.

Happy to answer questions about any of the browser API
implementations.

DEV Community: Oleksandr
