DEV Community: Mason K

Build a video watch-time heatmap: interval tracker, beacon endpoint, canvas render

Mason K — Mon, 13 Jul 2026 09:17:47 +0000

TL;DR

We're building the "which parts of this video do people actually watch" chart from scratch: a client-side interval tracker (~80 lines, zero dependencies), a beacon endpoint with SQLite per-second counters, and a canvas renderer. No analytics vendor, no SDK, data stays yours.

Completion rate tells you how much of a video people watched. The heatmap tells you where: the intro everyone skips, the section a quarter of viewers rewatch, the cliff where they leave. Hosted platforms sell this chart as a premium analytics feature, but the mechanics are simple: watched intervals in, per-second counters out.

The one rule that makes it work: track segments of media time, not events. Events lie the moment someone seeks or rewatches. Intervals don't.

1. The interval tracker 🎯

A viewing session is a list of spans: {start, end} in media time. We open a span when playback advances and close it on anything that breaks continuity (pause, seek, ended, tab hidden). A rewatch just produces overlapping spans, which is exactly the signal we want.

// public/heatmap-tracker.js
export class WatchTracker {
  constructor(video, { videoId, endpoint }) {
    this.video = video;
    this.videoId = videoId;
    this.endpoint = endpoint;
    this.sessionId = crypto.randomUUID();
    this.spans = [];
    this.openStart = null;
    this.lastTick = null;      // last media time seen while playing

    const close = () => this.closeSpan();
    video.addEventListener("timeupdate", () => this.tick());
    video.addEventListener("pause", close);
    // by the time "seeking" fires, currentTime already holds the seek target,
    // so close at the last position we saw before the jump
    video.addEventListener("seeking", () => this.closeSpan(this.lastTick));
    video.addEventListener("ended", close);
    document.addEventListener("visibilitychange", () => {
      if (document.visibilityState === "hidden") { this.closeSpan(); this.flush(); }
    });
    window.addEventListener("pagehide", () => { this.closeSpan(); this.flush(); });
  }

  tick() {
    const t = this.video.currentTime;
    if (this.video.paused || this.video.seeking) return;
    if (this.openStart === null) { this.openStart = t; this.lastTick = t; return; }
    // guard: timeupdate cadence is load-dependent (spec: ~4Hz to 66Hz).
    // A backwards step or a jump far beyond one tick means we missed a seek;
    // close at the last known position and open a new span from here.
    if (t < this.lastTick || t - this.lastTick > 5) {
      this.closeSpan(this.lastTick);
      this.openStart = t;
    }
    this.lastTick = t;
  }

  closeSpan(end = this.video.currentTime) {
    if (this.openStart === null) return;
    if (end - this.openStart > 0.5) {           // ignore sub-500ms noise
      this.spans.push({ start: this.openStart, end });
    }
    this.openStart = null;
  }

  flush() {
    if (!this.spans.length) return;
    const payload = JSON.stringify({
      videoId: this.videoId,
      sessionId: this.sessionId,
      spans: this.spans,
    });
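    // queued beacons survive tab close; sendBeacon returns false if the
    // browser refused to queue the payload, so keep the spans for the next flush
    if (navigator.sendBeacon(this.endpoint, payload)) this.spans = [];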
  }
}

Wire it to any <video> (works the same under hls.js or video.js, since they drive a media element):

// public/app.js
import { WatchTracker } from "./heatmap-tracker.js";

const video = document.querySelector("video");
new WatchTracker(video, {
  videoId: "onboarding-demo-v3",
  endpoint: "/collect",
});

💡 Tip: sendBeacon is the whole reason this data survives. It's a fire-and-forget POST the browser completes even as the page unloads. fetch(..., { keepalive: true }) works too if you need headers.

2. The collect endpoint + per-second counters 🗄️

Server side, we slice each video into one-second bins and increment every bin a span covers. SQLite is plenty; one row per (video, second):

// server.js  (node 20.x, better-sqlite3 ^11)
import express from "express";
import Database from "better-sqlite3";

const db = new Database("heatmap.db");
db.exec(`CREATE TABLE IF NOT EXISTS bins (
  video_id TEXT NOT NULL,
  second   INTEGER NOT NULL,
  views    INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (video_id, second)
)`);

const bump = db.prepare(`
  INSERT INTO bins (video_id, second, views) VALUES (?, ?, 1)
  ON CONFLICT(video_id, second) DO UPDATE SET views = views + 1
`);

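const app = express();
app.use(express.text({ type: "*/*" }));       // beacons arrive as text/plain

const MAX_SECOND = 6 * 60 * 60;                // hypothetical cap: ignore bins past 6h

const insertAll = db.transaction((videoId, spans) => {
  for (const { start, end } of spans) {
    const a = Math.max(0, Math.floor(start));
    const b = Math.min(MAX_SECOND, Math.ceil(end)); // a bogus end can't loop forever
    for (let s = a; s < b; s++) bump.run(videoId, s);
  }
});

app.post("/collect", (req, res) => {
  let body;
  try { body = JSON.parse(req.body); } catch { return res.sendStatus(400); }
  const { videoId, spans } = body ?? {};
  if (typeof videoId !== "string" || !Array.isArray(spans)) return res.sendStatus(400);
  insertAll(videoId, spans);
  res.sendStatus(204);
});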

app.get("/heatmap/:videoId", (req, res) => {
  const rows = db.prepare(
    "SELECT second, views FROM bins WHERE video_id = ? ORDER BY second"
  ).all(req.params.videoId);
  res.json(rows);
});

app.listen(3000, () => console.log("collector on :3000"));

Sanity-check it before touching the UI:

curl -X POST localhost:3000/collect \
  -d '{"videoId":"demo","sessionId":"t1","spans":[{"start":0,"end":12.4},{"start":8,"end":12.4}]}'

curl -s localhost:3000/heatmap/demo | jq -c '.[0:4]'
# [{"second":0,"views":1},{"second":1,"views":1},...]
# seconds 8-12 should show views=2 (the rewatch overlap)
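You can also check the binning rule without SQLite. Below is a minimal Python sketch of the same floor/ceil loop, with a `Counter` standing in for the `bins` table. `bin_spans` is an illustrative helper, not part of the collector. Fed the spans from the curl payload above, seconds 0 to 7 come out at 1 and seconds 8 to 12 at 2.

```python
import math
from collections import Counter


def bin_spans(spans, counts=None):
    """Add one view to every whole second that each span touches.

    Same rule as the collector: a span covers bins floor(start) .. ceil(end) - 1,
    so a partially watched second still counts as watched.
    """
    counts = Counter() if counts is None else counts
    for span in spans:
        first = max(0, math.floor(span["start"]))  # clamp negative start times
        last = math.ceil(span["end"])              # exclusive upper bound
        for second in range(first, last):
            counts[second] += 1
    return counts


# the curl payload's spans: a full pass to 12.4s plus a rewatch from 8s
counts = bin_spans([{"start": 0, "end": 12.4}, {"start": 8, "end": 12.4}])
print([counts[s] for s in range(14)])
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0]
```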

⚠️ Note: don't store PII. A random sessionId per playback is enough for de-duping later, and the aggregate table contains no user data at all.

3. Draw it 🎨

One canvas, one bar per second, color by intensity:

// public/render.js
export async function drawHeatmap(canvas, videoId, duration) {
  const bins = await (await fetch(`/heatmap/${encodeURIComponent(videoId)}`)).json();
  const ctx = canvas.getContext("2d");
  const max = Math.max(...bins.map(b => b.views), 1);
  const w = canvas.width / duration;

  ctx.clearRect(0, 0, canvas.width, canvas.height);
  for (const { second, views } of bins) {
    const heat = views / max;                       // 0..1
    ctx.fillStyle = `hsl(${220 - heat * 190}, 85%, ${35 + heat * 20}%)`;
    const h = canvas.height * (0.15 + 0.85 * heat); // 15% floor: any watched second stays visible, so unwatched gaps stand out
    ctx.fillRect(second * w, canvas.height - h, Math.ceil(w), h);
  }
}

Drop it under the player and you'll quickly see the three signatures most heatmaps show: the intro cliff (cold bars after second ~10), the slow bleed, and (on tutorial content) hot rewatch spikes. A spike is ambiguous by nature. It marks either your best moment or your most confusing one, and only watching that section tells you which.

4. Production notes 📋

Bin size: per-second is fine up to feature length. For hour-plus content, 2 to 5 second bins keep the table and the chart sane.
Cohorts: add a source column (email, landing page, organic) and keep separate counters. Blended cohorts produce a heatmap of nobody.
Playback rate: spans are in media time, so 2x watchers are counted correctly by construction. That's the quiet advantage of intervals over "ping every N wall-clock seconds".
Total plays vs unique viewers: the views counter counts coverage, so one viewer looping a section three times adds three. That's the right default for a "most replayed" signal. If you also want "how many distinct sessions reached this second", keep a second counter and increment it at most once per sessionId per bin (dedupe the bins per beacon batch before writing).
Volume: one upsert per viewed second per span, so rewatches write again. For most product videos that's nothing; if you're at real scale, buffer beacons into a queue and batch the upserts.
Live streams: this design is VOD-shaped. For live you'd bin by stream clock instead; different article.
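Both the "unique viewers" counter and coarser bins come down to how one beacon batch turns into bin indices. Here is a minimal sketch, assuming one batch holds the spans of a single sessionId. `bin_size` and the helper names are illustrative. The coverage counter keeps every hit, while the unique counter collapses the hits to a set first. If a session flushes more than once (for example on visibilitychange and again on pagehide), this per-batch dedupe can still count that session twice. Fixing that needs server-side memory of which bins each session already reached, which this sketch doesn't have.

```python
import math
from collections import Counter


def batch_to_bins(spans, bin_size=1):
    """Yield the bin index of every bin each span touches (repeats allowed)."""
    for span in spans:
        first = max(0, math.floor(span["start"] / bin_size))
        last = math.ceil(span["end"] / bin_size)   # exclusive
        yield from range(first, last)


def apply_batch(spans, coverage, unique, bin_size=1):
    """Update both counters from one session's beacon batch."""
    hits = list(batch_to_bins(spans, bin_size))
    coverage.update(hits)        # "most replayed": every pass counts
    unique.update(set(hits))     # "reached this point": once per batch per bin


coverage, unique = Counter(), Counter()
# one viewer looping seconds 0-3 three times
apply_batch([{"start": 0, "end": 3}] * 3, coverage, unique)
print(coverage[0], unique[0])   # 3 1
```

In SQL terms, the second counter is just another column or table, and you upsert it from the deduplicated set instead of the raw hit list.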

What's next 🚀

Two upgrades fall out of owning this data. First, "most replayed": you already have the array, so rendering the peak like the big platforms do is pure UI. Second, feed the peaks into your pipeline: auto-pick thumbnails from the hottest second, or place chapter markers at attention spikes. And if you want the player-side signals to go deeper (startup time, rebuffering, bitrate switches), that's the QoE half of analytics; the interval model here composes cleanly with those event streams too.
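Picking the "most replayed" moment from the `/heatmap` rows takes only a few lines. Here is a sketch that assumes rows shaped like the endpoint's JSON (`second`, `views`). It uses a moving-window sum so a single noisy second can't win. The window length is a tuning guess, not a recommendation.

```python
def most_replayed(rows, window=5):
    """Return (start_second, avg_views) of the hottest `window`-second stretch.

    rows: [{"second": int, "views": int}, ...] as served by /heatmap/:videoId.
    Seconds with no row (nobody watched) count as zero.
    """
    if not rows:
        return None
    length = max(r["second"] for r in rows) + 1
    views = [0] * length
    for r in rows:
        views[r["second"]] = r["views"]

    window = min(window, length)
    running = sum(views[:window])
    best_start, best_sum = 0, running
    for start in range(1, length - window + 1):
        # slide the window one second: add the new right edge, drop the old left edge
        running += views[start + window - 1] - views[start - 1]
        if running > best_sum:
            best_start, best_sum = start, running
    return best_start, best_sum / window
```

For a thumbnail candidate, a frame near the middle of the returned window is a reasonable first pick. Still review it by eye before shipping.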

Turn on AV1 film grain synthesis and measure what it saves on your own footage

Mason K — Mon, 13 Jul 2026 09:15:51 +0000

TL;DR

Film grain is random, codecs can't predict randomness, so grainy sources cost a fortune in bitrate. AV1 can strip the grain, encode the clean image, and have the decoder synthesize matching grain at playback. We'll enable it with one FFmpeg parameter, build a small compare script, and check the results the only way that's valid: with your eyes.

Grain and sensor noise change every frame, everywhere, which means motion prediction fails on every block. The encoder either burns bits reproducing noise or smears it into that waxy denoised look. AV1's film grain synthesis (FGS) offers a third path: the grain is modeled at encode time, the parameters ride along in the bitstream (a few bytes), and the decoder repaints statistically matched grain as a post-processing step. dav1d, the AV1 decoder used by Chrome and Firefox, does this in production today.

Let's actually run it instead of reading about it.

1. Check your toolchain 🛠️

You need FFmpeg built with libsvtav1. FFmpeg 5.1+ passes SVT-AV1 parameters through; anything current (FFmpeg 7.x or the 8.x line, which is on 8.1 as of March 2026) is fine, ideally with a recent SVT-AV1 (4.x shipped early this year):

ffmpeg -version | head -1
# ffmpeg version 8.1 ...

ffmpeg -hide_banner -encoders | grep svtav1
#  V....D libsvtav1            SVT-AV1(Scalable Video Technology for AV1) encoder

No libsvtav1 line? Grab a static build or brew install ffmpeg / your distro's current package.

Pick a genuinely grainy test clip: a film scan, low-light footage, a concert recording. FGS on clean screen-capture content will show you nothing, because there's no grain to model. Cut a 60 second sample so iterations are fast:

ffmpeg -ss 00:05:00 -i source-master.mov -t 60 -c copy sample.mov

2. Encode the baseline 📼

First, a normal SVT-AV1 encode, no grain handling:

ffmpeg -i sample.mov -c:v libsvtav1 -preset 5 -crf 32 \
  -pix_fmt yuv420p10le \
  -svtav1-params tune=0 \
  -an baseline.mkv

Notes on the flags: preset 5 is a reasonable quality/speed midpoint, crf 32 is a sane starting quality target for 1080p, tune=0 selects the perceptual tuning, and 10-bit (yuv420p10le) is standard practice for AV1 even with 8-bit sources since it reduces banding.

3. Encode with film grain synthesis 🎞️

One parameter changes:

ffmpeg -i sample.mov -c:v libsvtav1 -preset 5 -crf 32 \
  -pix_fmt yuv420p10le \
  -svtav1-params tune=0:film-grain=8:film-grain-denoise=1 \
  -an grain-synth.mkv

What the two new knobs do:

film-grain=8 sets the grain modeling level (0 to 50). Community guidance that holds up in practice: around 8 for normally grainy live action, around 4 for animation or clean digital sources. Crank it only for genuinely noisy scans.
film-grain-denoise=1 tells the encoder to actually remove the modeled grain from the frames before encoding them. This is where large savings come from on noisy sources, because the encoder now sees a clean, predictable image. With it off, you're layering synthetic grain over grain that's still (expensively) in the encode.

⚠️ Note: FGS is off by default in SVT-AV1. If you didn't ask for it, you don't have it.

If the encode dies immediately with something like this:

# Svt[error]: Error parsing svtav1-params: unrecognized option film-grain-denoise
# Error setting option svtav1-params to value ...

your libsvtav1 is too old to recognize that option name. If instead FFmpeg itself rejects -svtav1-params as an unrecognized option, the FFmpeg build predates the 5.1 passthrough entirely. Either way, update the FFmpeg build rather than fighting it; current static builds bundle a recent SVT-AV1.

4. Compare the numbers 📊

#!/usr/bin/env bash
# compare.sh  (chmod +x compare.sh once, then run it as below)
for f in baseline.mkv grain-synth.mkv; do
  ffprobe -v error -select_streams v:0 \
    -show_entries format=size,bit_rate,duration \
    -of default=noprint_wrappers=1 "$f" | sed "s/^/$f  /"
done

./compare.sh
# baseline.mkv     duration=60.000000
# baseline.mkv     size=<yours here>
# baseline.mkv     bit_rate=<yours here>
# grain-synth.mkv  duration=60.000000
# grain-synth.mkv  size=<yours here>
# grain-synth.mkv  bit_rate=<yours here>

I'm deliberately not printing my numbers, and I'd side-eye any article that quotes one as if it were general. The delta depends entirely on how much of your bitrate was grain. Noisy film scans can shrink dramatically; lightly grainy content shifts modestly; clean content barely moves. Run it on your library sample, not mine.
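If you'd rather compute the delta than compare two byte counts by eye, here is a small Python sketch. It runs the same ffprobe query per file and reports the size change. It assumes ffprobe is on PATH and asks for `-of json` instead of the key=value format. `probe` and `savings_pct` are illustrative helpers.

```python
import json
import subprocess


def probe(path):
    """Return format-level size (bytes), bit_rate (bit/s) and duration (s) via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=size,bit_rate,duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    fmt = json.loads(out)["format"]          # ffprobe reports these as strings
    return {k: float(fmt[k]) for k in ("size", "bit_rate", "duration")}


def savings_pct(baseline_bytes, candidate_bytes):
    """Share of the baseline size saved by the candidate (negative means it grew)."""
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes
```

Usage: `savings_pct(probe("baseline.mkv")["size"], probe("grain-synth.mkv")["size"])` gives the saved share of the baseline as a percentage, for your footage only.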

5. Verify with your eyes, not a metric 👀

Here's the part most write-ups skip: full-reference quality metrics are structurally unreliable on FGS output. The synthesized grain is intentionally not the same pixels as the source grain. So a pixel-comparing metric can punish a frame that looks right and, when measured against a denoised reference, reward a smeared one. The community docs are blunt that heavy grain remains an AV1 weak point even with FGS. So we A-B it:

# play both in sync-ish loops, switch between windows
mpv --loop baseline.mkv &
mpv --loop grain-synth.mkv &

Watch for the three classic FGS failure smells:

Uniformity. Real grain varies with brightness; synthetic grain can look same-everywhere, especially in shadows.
Twinkle. Grain that reads as animated static rather than texture.
Plastic mid-tones. If film-grain-denoise was too aggressive for the source, faces go waxy underneath the synthetic layer.

If level 8 shows artifacts, step down to 6 or 4 and re-run; the sweep takes minutes on a 60 second sample. For archival or authorial grain (a director chose that film stock), keep a human review in the loop before committing a whole catalog.

6. Ladder notes 🪜

Two practical integration details before you wire this into a pipeline:

Apply FGS only to the renditions where grain actually survives. Your 240p rung has no grain worth preserving after downscaling; spend the analysis on the top rungs.
Decoder-side cost is real but small on modern devices (dav1d handles synthesis efficiently). If your audience includes very old or very cheap hardware, put one such device on the test matrix before rollout.

7. Automate the sweep 🔁

Once the two-encode version works, the per-title version is a loop. This emits a size table so the grain level becomes a data-driven, per-content decision:

# sweep.sh
set -e
for g in 0 4 8 12; do
  ffmpeg -y -v error -i sample.mov -c:v libsvtav1 -preset 5 -crf 32 \
    -pix_fmt yuv420p10le \
    -svtav1-params "tune=0:film-grain=${g}:film-grain-denoise=1" \
    -an "out-fg${g}.mkv"
  printf "film-grain=%-3s %s bytes\n" "$g" "$(wc -c < out-fg${g}.mkv)"
done

Run it overnight against one representative sample per title category (film scan, talking head, animation, screen capture) and you'll have an encoding policy instead of a default.
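The same sweep also works as a Python sketch, if you want the sizes as data (for example, to write a per-category policy file) rather than printed text. It assumes the same sample.mov and flags as sweep.sh. `encode_cmd` and `sweep` are illustrative names. Building the argv as a list sidesteps shell quoting of the colon-separated params, and the percentage column is relative to the first level in the list.

```python
import os
import subprocess


def encode_cmd(level, src="sample.mov", crf=32, preset=5):
    """argv for one SVT-AV1 encode at a given film-grain level (mirrors sweep.sh)."""
    params = f"tune=0:film-grain={level}:film-grain-denoise=1"
    return [
        "ffmpeg", "-y", "-v", "error", "-i", src,
        "-c:v", "libsvtav1", "-preset", str(preset), "-crf", str(crf),
        "-pix_fmt", "yuv420p10le",
        "-svtav1-params", params,
        "-an", f"out-fg{level}.mkv",
    ]


def sweep(levels=(0, 4, 8, 12), src="sample.mov"):
    """Encode each level, then print and return {level: size_in_bytes}."""
    sizes = {}
    for g in levels:
        subprocess.run(encode_cmd(g, src), check=True)
        sizes[g] = os.path.getsize(f"out-fg{g}.mkv")
    base = sizes[levels[0]]
    for g, size in sizes.items():
        print(f"film-grain={g:<3} {size:>12} bytes  {100 * size / base:6.1f}% of fg{levels[0]}")
    return sizes
```

Call `sweep()` from a small script or a REPL in the directory that holds sample.mov.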

What's next 🚀

Two directions from here. First, read the SVT-AV1 film grain appendix (in the repo's Docs/ folder) to understand the underlying noise model; it's short and it explains why the synthesis behaves the way it does. Second, if you're already doing per-title encoding, FGS slots naturally into the same "let the content decide" philosophy, and the sweep script above is the seed of exactly that pipeline.

Ship your first HLS interstitial: one DATERANGE tag, an asset list, and hls.js 1.6

Mason K — Mon, 13 Jul 2026 09:12:56 +0000

TL;DR

We're going to insert a bumper into an existing VOD stream without touching a single media segment. One EXT-X-DATERANGE tag in the playlist, one JSON endpoint for the asset list, and hls.js 1.6+ handles scheduling, playback, and resume. You'll also get the event wiring for a real "Ad playing" UI.

The old way to put an ad or slate inside an HLS stream was to splice its segments into the media playlist behind an EXT-X-DISCONTINUITY tag. It works, but your playlist stops being cacheable, your timeline math gets weird, and every player handles the seam slightly differently.

HLS interstitials flip the model: the primary playlist stays untouched, and a date-range tag tells the player "at this point, go play this other thing, then come back." Apple players have supported this natively for a while. On the open web it became practical when hls.js shipped interstitials support in v1.6.0. Let's wire it up end to end.

1. What you need 🛠️

Any working VOD HLS stream (a .m3u8 you control)
hls.js 1.6.0 or newer (we'll use the latest 1.6.x)
Node 20.x for the tiny asset-list server
A short bumper clip, already packaged as HLS (5 to 15 seconds is perfect)

Check your hls.js version first; interstitials do nothing on 1.5:

npm ls hls.js
# └── hls.js@1.6.7

2. Schedule the interstitial in the playlist 📼

Interstitials are scheduled with EXT-X-DATERANGE and keyed to wall-clock time, so the playlist needs a PROGRAM-DATE-TIME anchor. For VOD, you pick an arbitrary epoch and offset from it. Here's a primary media playlist with a bumper scheduled 10 seconds in:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-PROGRAM-DATE-TIME:2026-01-01T00:00:00.000Z
#EXT-X-DATERANGE:ID="mid-1",CLASS="com.apple.hls.interstitial",START-DATE="2026-01-01T00:00:10.000Z",X-ASSET-LIST="http://localhost:3000/asset-list?break=mid-1",X-RESUME-OFFSET=0,X-RESTRICT="SKIP,JUMP"
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
... rest of your segments unchanged ...
#EXT-X-ENDLIST

Three attributes do the work:

CLASS="com.apple.hls.interstitial" marks this DATERANGE as an interstitial.
X-ASSET-LIST points at a JSON endpoint (we build it next). For a fixed slate you could use X-ASSET-URI="https://cdn.example.com/bumper/index.m3u8" directly instead.
X-RESUME-OFFSET=0 means "resume the primary exactly where it paused", which is what a VOD ad break wants. Leave it out entirely and the player advances the primary by the interstitial's duration instead; that default is designed for live (keeps you at a constant delay from the edge) and for content replacement.

X-RESTRICT="SKIP,JUMP" stops viewers from skipping the break once it starts (SKIP) and from seeking past it without playing it (JUMP). Leave it off while debugging, add it back for ads.

💡 Tip: the primary playlist never changes per viewer. Ad decisioning happens at the asset-list URL, so the playlist stays a static, cacheable file.
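Because START-DATE is just the epoch plus an offset, generating the tag is easy to script once you have more than one break. Here is a Python sketch that assumes the same 2026-01-01 epoch as the playlist above. `interstitial_tag` is an illustrative helper, and it emits only the attributes this post uses.

```python
from datetime import datetime, timedelta, timezone

# must match the playlist's #EXT-X-PROGRAM-DATE-TIME anchor
EPOCH = datetime(2026, 1, 1, tzinfo=timezone.utc)


def iso_ms(dt):
    """ISO 8601 in UTC with millisecond precision and a Z suffix."""
    u = dt.astimezone(timezone.utc)
    return u.strftime("%Y-%m-%dT%H:%M:%S") + f".{u.microsecond // 1000:03d}Z"


def interstitial_tag(break_id, offset_s, asset_list_url,
                     resume_offset=0.0, restrict="SKIP,JUMP", epoch=EPOCH):
    """One interstitial DATERANGE line, `offset_s` seconds into the presentation."""
    attrs = [
        f'ID="{break_id}"',
        'CLASS="com.apple.hls.interstitial"',
        f'START-DATE="{iso_ms(epoch + timedelta(seconds=offset_s))}"',
        f'X-ASSET-LIST="{asset_list_url}"',
    ]
    if resume_offset is not None:     # None = omit it; player advances by the break's duration
        attrs.append(f"X-RESUME-OFFSET={resume_offset:g}")
    if restrict:
        attrs.append(f'X-RESTRICT="{restrict}"')
    return "#EXT-X-DATERANGE:" + ",".join(attrs)
```

For example, `interstitial_tag("mid-1", 10, "http://localhost:3000/asset-list?break=mid-1")` produces a tag with `START-DATE="2026-01-01T00:00:10.000Z"` and `X-RESUME-OFFSET=0`.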

3. Serve the asset list 🧾

The asset list is a JSON document with an ASSETS array. Each asset has a URI and a DURATION (seconds). This is where you'd call your ad decision server; we'll return a fixed bumper:

// server.js
import express from "express";

const app = express();

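app.get("/asset-list", (req, res) => {
  // the player fetches this cross-origin; without CORS the fetch fails
  res.set("Access-Control-Allow-Origin", "*");
  // real life: pick assets per user/break here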
  res.json({
    ASSETS: [
      {
        URI: "https://cdn.example.com/bumper/index.m3u8",
        DURATION: 6.0,
      },
    ],
  });
});

app.listen(3000, () => console.log("asset list on :3000"));

node server.js
# asset list on :3000
curl -s "http://localhost:3000/asset-list?break=mid-1" | jq .
# {
#   "ASSETS": [
#     { "URI": "https://cdn.example.com/bumper/index.m3u8", "DURATION": 6 }
#   ]
# }

Return two objects in ASSETS and you have an ad pod; the player plays them back to back before resuming.
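To make the pod concrete, here is a Python sketch of the same JSON shape with two assets. It also computes the total break length, which is handy for a countdown. The second asset URL is a hypothetical placeholder. The sketch only builds the document; serving it is still the Express route's job.

```python
def asset_list(assets):
    """Build an asset-list document from (uri, duration_seconds) pairs.

    More than one entry is a pod: the player plays them back to back,
    then resumes the primary.
    """
    doc = {"ASSETS": []}
    for uri, duration in assets:
        if duration <= 0:
            raise ValueError(f"non-positive DURATION for {uri}")
        doc["ASSETS"].append({"URI": uri, "DURATION": float(duration)})
    return doc


def pod_duration(doc):
    """Total seconds of interstitial content in the pod."""
    return sum(a["DURATION"] for a in doc["ASSETS"])


pod = asset_list([
    ("https://cdn.example.com/bumper/index.m3u8", 6),
    ("https://cdn.example.com/promo/index.m3u8", 15),   # hypothetical second asset
])
print(pod_duration(pod))   # 21.0
```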

Pre-rolls and post-rolls don't need a timeline position at all. Add CUE="PRE" (or "POST") to the DATERANGE and the break anchors to the start or end of the presentation:

#EXT-X-DATERANGE:ID="preroll",CLASS="com.apple.hls.interstitial",START-DATE="2026-01-01T00:00:00.000Z",CUE="PRE",X-ASSET-URI="https://cdn.example.com/preroll/index.m3u8"

CUE="ONCE" marks a break that plays a single time per session, which is also where the subtler player edge cases live (more on that in section 5).

⚠️ Note: serve this endpoint with CORS headers the player can use (Access-Control-Allow-Origin), same as your playlists, or the asset list fetch will fail silently in dev.

4. Wire up hls.js 🎬

Playback needs no special code; interstitials are on by default in 1.6 when the manifest contains them. The events are where you build UI:

// player.js
import Hls from "hls.js";

const video = document.querySelector("video");
const hls = new Hls();

hls.loadSource("https://localhost:8080/primary/index.m3u8");
hls.attachMedia(video);

// the full interstitial schedule (fires on manifest parse + updates)
hls.on(Hls.Events.INTERSTITIALS_UPDATED, (_, data) => {
  console.log("breaks:", data.schedule.map(item => item.start));
});

// asset list fetched for a break
hls.on(Hls.Events.ASSET_LIST_LOADED, (_, data) => {
  console.log("pod for", data.event.identifier, data.event.assetList);
});

// the break actually started (asset players can be created earlier, for preloading)
hls.on(Hls.Events.INTERSTITIAL_STARTED, () => {
  showAdBadge(true);          // your own UI helper: "Ad" chip, disable your seek bar
});

// primary content resumed after the break
hls.on(Hls.Events.INTERSTITIALS_PRIMARY_RESUMED, () => {
  showAdBadge(false);         // restore controls
});

Two debugging affordances worth knowing. The config exposes enableInterstitialPlayback; set it to false and the player ignores interstitial DATERANGEs entirely, which gives you a clean A/B while you bring the feature up (and a kill switch in production). There's also hls.interstitialsManager, which exposes the schedule and playback state so you can render "Ad 1 of 2" and a countdown. Asset list failures surface as non-fatal Hls.Events.ERROR with ASSET_LIST_LOAD_ERROR / ASSET_LIST_PARSING_ERROR details; handle them by letting content continue, which is exactly what the player does by default.

5. Test the seams, not the happy path ✅

The happy path will work on the first try, which is a trap. Interstitials moved the complexity into player state around the break boundaries, and that's where hls.js has been landing fixes through late 2025 and early 2026 (resume offsets with CUE="ONCE", seeking across a scheduled break, and similar edge cases in the issue tracker). Budget a real QA pass:

[ ] Pause during the interstitial, wait 30 seconds, resume
[ ] Seek from before the break to after it (should the break play? check your X-RESTRICT)
[ ] Background the tab mid-break, return after it should have ended
[ ] Kill the asset-list endpoint and confirm content plays through
[ ] Watch the break twice in one session with CUE="ONCE" set (it shouldn't replay)
[ ] Compare on Safari/AVPlayer if you ship to Apple devices; behavior is native there

Run through that list on desktop Chrome, one mobile browser, and Safari, and you've covered the paths that actually break.

What's next 🚀

Two follow-ups worth your time. First, integrated timelines: Apple's spec includes X-TIMELINE-OCCUPIES so scrubbers can render ad blocks as visible slots; if you build custom controls, that's your next feature. Second, live: schedule the DATERANGE ahead of the live edge, drop X-RESUME-OFFSET so viewers stay at a constant delay, and you have ad breaks in a live channel with the same three pieces you built today. The Apple "Getting Started with HLS Interstitials" PDF and the hls.js API docs cover both in depth.

Build an AI dubbing pipeline: faster-whisper + XTTS-v2 + FFmpeg

Mason K — Wed, 08 Jul 2026 06:29:58 +0000

TL;DR

We're building a script that takes a video in English and produces the same video narrated in Spanish, in a cloned version of the original speaker's voice. Stack: faster-whisper for timestamped transcription, an LLM (or any MT engine) for translation, XTTS-v2 for voice-cloned synthesis, FFmpeg for surgery. We'll also handle the problem every demo skips: translated audio that doesn't fit its time slot.

📦 Code: github.com/USER/repo (replace before publishing)

If you'd rather start from a finished system, Softcatala's open-dubbing and KrillinAI are full pipelines behind one CLI. This post builds the minimal version by hand so you understand what those tools are doing, and where they break.

0. Setup and a licensing warning ⚠️

Python 3.10–3.12. The original Coqui company shut down in early 2024; the maintained fork of their TTS library is published by Idiap as coqui-tts:

$ python -m venv dub && source dub/bin/activate
$ pip install faster-whisper coqui-tts
$ ffmpeg -version | head -1   # 6.0+ is fine, 8.x current

⚠️ Note: the XTTS-v2 model weights ship under the Coqui Public Model License, which restricts commercial use. Prototype freely, but before dubbed videos ship to paying customers, someone must read that license and possibly swap the synthesis step for a commercially licensed model or paid API. Voice cloning also requires the speaker's consent. Get it in writing.

1. Extract audio and transcribe with word timestamps 🎙️

# pull mono 16k audio for the ASR step
$ ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -y source.wav

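# dub/transcribe.py
import json

from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", compute_type="int8")
segments, info = model.transcribe("source.wav", word_timestamps=True)

lines = []
for seg in segments:          # a generator: transcription runs as you iterate
    lines.append({
        "start": seg.start,
        "end": seg.end,
        "text": seg.text.strip(),
    })
print(f"language={info.language} segments={len(lines)}")

# persist the skeleton; every later step reads these start/end values
with open("segments.json", "w", encoding="utf-8") as f:
    json.dump(lines, f, ensure_ascii=False, indent=2)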

The timestamps are the skeleton of the whole pipeline. Every downstream step preserves start/end per segment, because that's where the translated speech has to fit back.

2. Translate with a length budget 🌍

Per-segment MT gives you sentences that are individually fine and collectively wrong (inconsistent terminology, drifting register). Feed the whole transcript to your translation step with context, and, crucially, give it a length constraint per segment. This is the single biggest lever against sync drift:

# dub/translate.py (engine-agnostic sketch)
PROMPT = """Translate this video narration from English to Spanish.
Rules:
- Keep terminology consistent (glossary: {glossary})
- Each numbered line must be speakable within its duration.
  Line 3 has 2.8s. Line 7 has 4.1s. Prefer shorter phrasings.
- Return the same numbered lines, translated."""

Whether the engine is an LLM, a local NLLB/M2M model, or a cloud MT API matters less than the contract: same segments in, same segments out, lengths respected. Have a native speaker skim the output. One reviewer-hour here prevents most of the embarrassment this pipeline can produce.
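The prompt above hard-codes two durations for illustration. In practice you'd render every line with its slot length and parse the reply back by number. Here is a sketch that assumes the segment dicts from transcribe.py (`start`, `end`, `text`). The `N. (2.8s) text` line format is an arbitrary choice for this example, and the engine call itself is left out. The parser fails loudly if the reply drops or adds lines, which is the "same segments in, same segments out" contract.

```python
import re

# "3. (2.8s) text" or "3. text"; the duration echo is optional in replies
LINE = re.compile(r"^\s*(\d+)\.\s+(?:\(\d+(?:\.\d+)?s\)\s*)?(.*\S)\s*$")


def numbered_lines(segments, key="text"):
    """Render segments as numbered lines carrying each slot's duration."""
    out = []
    for n, seg in enumerate(segments, start=1):
        slot = seg["end"] - seg["start"]
        out.append(f"{n}. ({slot:.1f}s) {seg[key]}")
    return "\n".join(out)


def parse_numbered(reply, expected):
    """Map a numbered reply back onto segments; raise if the line count drifts."""
    texts = {}
    for line in reply.splitlines():
        m = LINE.match(line)
        if m:
            texts[int(m.group(1))] = m.group(2)
    if sorted(texts) != list(range(1, expected + 1)):
        raise ValueError(f"expected lines 1..{expected}, got {sorted(texts)}")
    return [texts[n] for n in range(1, expected + 1)]
```

Write the parsed texts back as `text_es` on the matching segments, and steps 3 and 4 can read them from one file.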

3. Clone the voice and synthesize 🗣️

XTTS-v2 supports 17 languages and clones a voice from a few seconds of clean reference audio. Cut a reference clip of the original narrator (no music, no crosstalk):

$ ffmpeg -i source.wav -ss 00:00:12 -t 8 -y reference.wav

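# dub/synthesize.py
import json
import os

from TTS.api import TTS

# assumed input: segments.json from step 1 with a "text_es" field added in step 2,
# saved as segments_es.json (hypothetical filename)
with open("segments_es.json", encoding="utf-8") as f:
    translated_segments = json.load(f)

os.makedirs("segments", exist_ok=True)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

for i, seg in enumerate(translated_segments):
    tts.tts_to_file(
        text=seg["text_es"],
        speaker_wav="reference.wav",
        language="es",
        file_path=f"segments/{i:04d}.wav",
    )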

First run downloads the weights; after that it's local. GPU strongly recommended; CPU works for short content if you're patient.

4. The boss fight: fitting audio back into time slots ⏱️

Spanish runs longer than English as a rule. Some synthesized segments will overflow their slots, and naive concatenation drifts out of sync within minutes. Measure first:

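# dub/align.py
import json

import soundfile as sf

with open("segments_es.json", encoding="utf-8") as f:   # same assumed file as synthesize.py
    translated_segments = json.load(f)

report = []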
report = []
for i, seg in enumerate(translated_segments):
    audio, sr = sf.read(f"segments/{i:04d}.wav")
    actual = len(audio) / sr
    slot = seg["end"] - seg["start"]
    report.append((i, slot, actual, actual / slot))

for i, slot, actual, ratio in report:
    flag = "⚠️ OVERFLOW" if ratio > 1.1 else "ok"
    print(f"seg {i:04d}  slot={slot:.2f}s  synth={actual:.2f}s  ratio={ratio:.2f}  {flag}")

seg 0007  slot=4.10s  synth=5.23s  ratio=1.28  ⚠️ OVERFLOW
seg 0012  slot=2.80s  synth=2.91s  ratio=1.04  ok

Then apply fixes in escalating order:

Absorb into silence. If the next segment starts late, let the audio spill into the gap. Free and inaudible.
Time-stretch gently. FFmpeg's atempo up to ~1.1 is usually imperceptible on speech; beyond that it sounds rushed:

$ ffmpeg -i segments/0007.wav -filter:a "atempo=1.12" -y segments/0007_fit.wav

Re-translate the outliers. Anything still over ratio ~1.2 goes back to step 2 with a tighter length budget. Retranslating five bad segments beats stretching fifty.
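The escalation can be applied per segment straight from the overflow report. This is a minimal sketch. `gap_after` (silence before the next segment starts) and the 1.1 stretch cap are assumptions taken from the guidance above. The sketch sends anything the cap can't fix to retranslation; the text above treats the band between 1.1 and 1.2 as a judgment call.

```python
def gaps(segments):
    """Silence after each segment, measured to the next segment's start (0 for the last)."""
    pairs = zip(segments, segments[1:])
    return [max(0.0, nxt["start"] - cur["end"]) for cur, nxt in pairs] + [0.0]


def plan_fix(slot, synth, gap_after=0.0, max_tempo=1.1):
    """Decide how to fit one synthesized segment back into its time slot.

    Returns ("ok", 1.0), ("absorb", 1.0), ("stretch", atempo) or ("retranslate", None).
    """
    if synth <= slot:
        return ("ok", 1.0)
    needed = synth / (slot + gap_after)   # atempo factor needed to fit the available room
    if needed <= 1.0:
        return ("absorb", 1.0)            # spills into silence, no processing
    if needed <= max_tempo:
        return ("stretch", round(needed, 3))
    return ("retranslate", None)
```

For a "stretch" result, feed the value into the atempo command above. Segment 0007 from the sample report (4.10s slot, 5.23s synth, no gap) comes back as "retranslate".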

Build the final track by placing each segment at its original start on a silent canvas, then remux against the untouched video stream:

# assemble placed segments into one track (adelay per segment, amix), then:
$ ffmpeg -i input.mp4 -i dubbed_es.wav \
    -map 0:v -map 1:a -c:v copy -shortest -y output_es.mp4

-c:v copy matters: the video stream is never re-encoded, so the dub costs nothing in visual quality.
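The "adelay per segment, amix" step is only described in a comment above. One way to build it is the Python sketch below, which generates the FFmpeg command rather than running it. It assumes the segment files from synthesize.py and delays each one by its original start in milliseconds. It mixes with `normalize=0` so amix sums the inputs instead of scaling each one down by the input count. `placement_cmd` is an illustrative helper; pass a different pattern if some segments were replaced by `_fit.wav` versions. With hundreds of segments the filter graph gets long, and assembling the track in numpy is a reasonable alternative.

```python
def placement_cmd(segments, wav_pattern="segments/{i:04d}.wav", out="dubbed_es.wav"):
    """Return an ffmpeg argv that places each segment at its start time on one track."""
    args = ["ffmpeg", "-y"]
    for i in range(len(segments)):
        args += ["-i", wav_pattern.format(i=i)]

    chains = []
    for i, seg in enumerate(segments):
        delay_ms = int(round(seg["start"] * 1000))
        # all=1 applies the same delay to every channel of this input
        chains.append(f"[{i}:a]adelay=delays={delay_ms}:all=1[d{i}]")
    labels = "".join(f"[d{i}]" for i in range(len(segments)))
    # normalize=0: plain summation, since placed segments shouldn't overlap
    chains.append(f"{labels}amix=inputs={len(segments)}:normalize=0[mix]")

    args += ["-filter_complex", ";".join(chains), "-map", "[mix]", out]
    return args
```

Run it with `subprocess.run(placement_cmd(translated_segments), check=True)`, then remux with the command above.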

5. Ship it as a track, not a fork 📦

Don't create tutorial_es_final_v2.mp4 files. Mux the dub as an additional audio track and let the player expose a language menu:

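$ ffmpeg -i input.mp4 -i dubbed_es.wav \
    -map 0 -map 1:a -c copy -c:a:1 aac \
    -metadata:s:a:0 language=eng -metadata:s:a:1 language=spa \
    -y output_multilang.mp4

Everything is stream-copied except the dub, which -c:a:1 aac encodes to AAC: the WAV carries raw PCM, and players expect AAC audio in MP4. This assumes input.mp4 has exactly one audio track, so the dub lands at a:1.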

For HLS delivery, each language becomes an audio rendition in the master playlist; one video ladder, N audio tracks, and the player switches without a second stream.
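Here is a sketch of what that master playlist can look like, generated from Python so that adding a language is one list entry. The rendition URIs, GROUP-ID, bandwidth and resolution values are placeholders; use whatever your packager actually emits. In production you'd also add a CODECS attribute to each variant, which is omitted here for brevity.

```python
def master_playlist(audio_tracks, variants, group="aud"):
    """HLS master playlist: one video ladder, one audio rendition per language.

    audio_tracks: [(name, language, uri), ...]; the first entry is the DEFAULT.
    variants:     [(bandwidth_bps, resolution, uri), ...] for the video ladder.
    """
    lines = ["#EXTM3U"]
    for n, (name, lang, uri) in enumerate(audio_tracks):
        default = "YES" if n == 0 else "NO"
        lines.append(
            f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group}",NAME="{name}",'
            f'LANGUAGE="{lang}",DEFAULT={default},AUTOSELECT=YES,URI="{uri}"'
        )
    for bandwidth, resolution, uri in variants:
        # every video variant points at the same audio group
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution},AUDIO="{group}"')
        lines.append(uri)
    return "\n".join(lines) + "\n"


print(master_playlist(
    [("English", "en", "audio/en/index.m3u8"), ("Español", "es", "audio/es/index.m3u8")],
    [(3000000, "1920x1080", "video/1080p/index.m3u8")],
))
```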

Things that will bite you 🧾

A short list from the failure modes this kind of pipeline reliably produces:

Numbers, units, and code. TTS models mangle "0.00096" and "av1_vulkan" in every language. Pre-process the script: expand numbers to words in the target language, and decide whether code identifiers stay English (they should).
Background music. If the source audio has music under the voice, your extracted track carries it into transcription fine, but your dubbed track loses it entirely. Either separate stems first (Demucs works) or accept music-free dubs; mixing the original music bed back under the synthesized voice is the professional-sounding middle path.
Speaker changes. One reference clip means one voice. Interviews and multi-presenter webinars need diarization (who speaks when) before synthesis, which is where the prebuilt pipelines earn their keep.
Acronyms and product names. Add them to the glossary with explicit pronunciation guidance, or enjoy hearing your product's name pronounced five different ways across one video.
Silence is load-bearing. Don't trim inter-segment gaps to make room; viewers use those pauses to process what's on screen.

What's next

Automate the pipeline against your upload flow and keep a per-language glossary file under version control; translation consistency is what makes a library feel professionally localized.
The overflow report from step 4 is your QA dashboard. Track the overflow rate per language over time; it tells you when your translation prompt or TTS pacing regressed.
When you outgrow the DIY version: open-dubbing and KrillinAI cover more edge cases (multi-speaker, subtitle sync), and lip-sync models exist as a heavier stage for talking-head content.

Probing FFmpeg's av1_vulkan encoder: does your GPU actually support it?

Mason K — Wed, 08 Jul 2026 06:29:44 +0000

TL;DR

FFmpeg 8.x includes av1_vulkan, the first cross-vendor GPU AV1 encoder in mainline FFmpeg. We'll probe whether your GPU + driver actually expose AV1 encode, run a first working encode, benchmark it against SVT-AV1 on your own content, and talk about which jobs deserve it.

📦 Code: github.com/USER/repo (replace before publishing)

Until FFmpeg 8.0 ("Huffman", released August 2025), GPU AV1 encoding meant picking a vendor: av1_nvenc for NVIDIA RTX 40+, av1_amf for AMD, av1_qsv for Intel Arc. Three code paths, three sets of flags, three driver stacks. The Vulkan Video encode work gives FFmpeg one encoder that reaches all three vendors through the standard VK_KHR_video_encode_av1 extension.

The catch: driver support is a lottery. Plenty of capable hardware sits behind drivers that don't expose the encode extension yet. So before any pipeline decisions, we probe.

1. Check what you're running

You want FFmpeg 8.x (8.1.2 is current as of late June 2026) built with Vulkan support, plus the vulkaninfo tool from the Vulkan SDK / vulkan-tools package.

$ ffmpeg -version | head -1
ffmpeg version 8.1.2 Copyright (c) 2000-2026 the FFmpeg developers

$ ffmpeg -hide_banner -encoders | grep vulkan
 V....D av1_vulkan           AV1 (Vulkan) (codec av1)

If av1_vulkan doesn't appear, your build wasn't compiled with --enable-vulkan (distro packages vary; the BtbN static builds and most 8.x distro packages include it).

2. Probe the driver for AV1 encode 🔍

The encoder existing in FFmpeg means nothing if the driver doesn't expose the extension. This is the step that separates "should work" from "works":

$ vulkaninfo | grep -iE "video_encode_(av1|queue)"
    VK_KHR_video_encode_av1    : extension revision 1
    VK_KHR_video_encode_queue  : extension revision 12

You see	Meaning
Both extensions listed	You can encode AV1 via Vulkan 🎉
Only `video_encode_queue`	Driver does Vulkan encode, but not AV1 (maybe H.264/H.265 only)
Neither	Driver too old, or GPU lacks an AV1-capable video engine
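If you probe a fleet rather than one box, it helps to turn that table into code. Here is a minimal sketch. It assumes the extension names appear verbatim in `vulkaninfo` output, as in the grep above. The return labels are made up for this example:

```javascript
// Map vulkaninfo output onto the three rows of the table above.
// Assumes extension names appear verbatim, as in the grep output.
function classifyVulkanEncode(vulkaninfoText) {
  const hasQueue = vulkaninfoText.includes("VK_KHR_video_encode_queue");
  const hasAv1 = vulkaninfoText.includes("VK_KHR_video_encode_av1");
  if (hasQueue && hasAv1) return "av1-encode"; // both listed
  if (hasQueue) return "encode-without-av1";   // e.g. H.264/H.265 only
  return "no-vulkan-encode";                   // old driver or no video engine
}

// Usage (needs vulkaninfo on PATH; its full output can be large):
//   const { execFileSync } = require("node:child_process");
//   const text = execFileSync("vulkaninfo", { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 });
//   console.log(classifyVulkanEncode(text));
```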

Rough hardware floor: the last couple of GPU generations from each vendor have AV1 encode silicon. On Linux, Mesa/RADV support for AMD has been the pacesetter; NVIDIA and Intel arrived on their own schedules, and Windows driver branches differ again. When in doubt, update the driver first and re-probe; that fixes more failures than any FFmpeg flag.

⚠️ Note: there are documented cases of reasonable hardware + wrong driver branch producing zero frames. If step 3 fails, the error message usually names the missing extension; in that case it's a driver problem, not your command line.

3. First encode: synthetic, then real

Start with a generated source so file quirks can't confuse the diagnosis:

$ ffmpeg -init_hw_device vulkan=gpu -filter_hw_device gpu \
    -f lavfi -i testsrc2=duration=10:size=1920x1080:rate=30 \
    -vf "format=nv12,hwupload" \
    -c:v av1_vulkan -y /tmp/probe.mp4

Expected output tail:

frame=  300 fps=142 q=-0.0 size=    1843KiB time=00:00:10.00 bitrate=1509.6kbits/s speed=4.7x
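You can turn this into a pass/fail check (say, in CI) by parsing that final line. The sketch below assumes the classic `key=value` progress layout shown above. The field names it returns are this example's own:

```javascript
// Parse one ffmpeg stderr progress line into numbers.
// Assumes the classic "key=value" layout shown above.
function parseProgress(line) {
  const fields = {};
  // Values may be padded ("frame=  300"), so allow whitespace after "=".
  for (const [, key, value] of line.matchAll(/(\w+)=\s*(\S+)/g)) {
    fields[key] = value;
  }
  return {
    frames: Number(fields.frame),
    fps: Number(fields.fps),
    speed: parseFloat(fields.speed), // "4.7x" -> 4.7
  };
}
```

ffmpeg redraws this line with carriage returns. Split stderr on `\r` and `\n`, then parse the last non-empty piece. If you control the command, `-progress pipe:1` gives you machine-readable `key=value` lines instead.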

If that works, swap in a real file. Keep the decode on the GPU too, so frames never cross the PCIe bus:

$ ffmpeg -init_hw_device vulkan=gpu -hwaccel vulkan -hwaccel_output_format vulkan \
    -i input_1080p.mp4 \
    -c:v av1_vulkan -b:v 2M -maxrate 2.5M -bufsize 4M \
    -c:a copy -y out_av1.mp4

That decode → encode residency is half the point of Vulkan here: FFmpeg 8's Vulkan work includes GPU-side processing, so a multi-rendition transcode can stay on-device instead of bouncing frames to system RAM for every filter.

4. Benchmark against SVT-AV1 on YOUR content 📊

Codec verdicts on someone else's clips are astrology. Take a representative sample of your real content and run a head-to-head bake-off at matched bitrates, then score both outputs:

# software baseline
$ time ffmpeg -i sample.mp4 -c:v libsvtav1 -preset 8 -b:v 2M -y svt.mp4

# vulkan hardware path
$ time ffmpeg -init_hw_device vulkan=gpu -hwaccel vulkan -hwaccel_output_format vulkan \
    -i sample.mp4 -c:v av1_vulkan -b:v 2M -y vk.mp4

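# score both against the source with VMAF (distorted input first, reference second)
$ ffmpeg -i svt.mp4 -i sample.mp4 -lavfi libvmaf -f null -
$ ffmpeg -i vk.mp4 -i sample.mp4 -lavfi libvmaf -f null -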

What you should expect, without me inventing numbers for hardware I'm not running: the software encoder wins quality-per-bit, the hardware path wins throughput and watts. That's been the hardware-encoder trade since forever, and Vulkan routes to the same silicon through a standard door. The decision input is your VMAF delta at your target bitrate versus your compute bill, so measure both on your content and your GPUs.
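The decision boils down to two numbers per clip. Below is a small sketch for computing them. It assumes you capture the `VMAF score:` line that FFmpeg's libvmaf filter logs (check your build's output) and the wall-clock seconds from `time`. `parseVmaf` and `compareRuns` are hypothetical helper names:

```javascript
// Pull the pooled score out of an ffmpeg libvmaf log.
function parseVmaf(log) {
  const m = log.match(/VMAF score:\s*([\d.]+)/);
  if (!m) throw new Error("no 'VMAF score:' line in log");
  return Number(m[1]);
}

// software/hardware: { vmaf, seconds } for the same clip at the same bitrate.
function compareRuns(software, hardware) {
  return {
    vmafDelta: software.vmaf - hardware.vmaf,     // > 0: software looks better
    speedup: software.seconds / hardware.seconds, // > 1: hardware finished faster
  };
}
```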

5. Which ladder rungs actually move

Workload	Recommended path	Why
Archival VOD top renditions	SVT-AV1 (CPU)	Every saved kilobit multiplies across storage + delivery
Live transcode ladders	`av1_vulkan` / hw	Realtime throughput is the constraint
UGC bulk ingest	`av1_vulkan` / hw	Most uploads get few views; economics favor throughput
Previews, proxies, scrub assets	`av1_vulkan` / hw	Nobody A/Bs the grain on a preview
Vendor-mixed fleet	`av1_vulkan`	One code path across NVIDIA/AMD/Intel is the whole pitch

The cross-vendor part is easy to underrate. If your pipeline speaks Vulkan instead of NVENC, the day cheaper spot capacity shows up on a different vendor's silicon, you follow the price with a config change instead of a rewrite.
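One way to make that "config change" literal is to keep the routing from the table in data. The probe result then decides at run time. This is a sketch; the workload keys and the CPU fallback are assumptions, not anything FFmpeg provides:

```javascript
// Encoder choice as data, mirroring the ladder table. The workload keys and
// the fallback policy are illustrative assumptions, not FFmpeg features.
const ENCODER_BY_WORKLOAD = {
  "archival-vod": "libsvtav1",
  "live-ladder": "av1_vulkan",
  "ugc-ingest": "av1_vulkan",
  "preview": "av1_vulkan",
};

// vulkanAv1Ok: whether the driver probe passed on the machine running the job.
function pickEncoder(workload, vulkanAv1Ok) {
  const wanted = ENCODER_BY_WORKLOAD[workload];
  if (!wanted) throw new Error(`unknown workload: ${workload}`);
  // Probe failed: fall back to the CPU encoder instead of failing the job.
  // (For live ladders you may prefer av1_nvenc/av1_amf/av1_qsv, since
  // software AV1 may not keep up in realtime.)
  if (wanted === "av1_vulkan" && !vulkanAv1Ok) return "libsvtav1";
  return wanted;
}
```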

Running it in containers 🐳

Your workstation probe passing means little if production is containerized. Vulkan in a container needs three things: the GPU device passed through, the vendor ICD (installable client driver) JSON present in the image, and a userspace driver that matches the host kernel driver.

# NVIDIA (with nvidia-container-toolkit)
$ docker run --rm --gpus all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    my-ffmpeg:8.1 vulkaninfo | grep -i encode_av1

# AMD/Intel (Mesa): pass the render node
$ docker run --rm --device /dev/dri \
    my-ffmpeg:8.1 vulkaninfo | grep -i encode_av1

💡 Tip: NVIDIA_DRIVER_CAPABILITIES=all (or at least video,compute,graphics) matters; the default capability set doesn't expose the video engines, and the resulting "No Vulkan device found" error looks identical to missing hardware.

Run the same probe inside the exact image your orchestrator schedules. Host/image driver mismatches produce failures that no amount of FFmpeg flag-tweaking fixes.

Common failures and fixes

# "Device does not support the requested video codec profile"
→ driver exposes encode_queue but not encode_av1; update driver or use av1_nvenc/amf/qsv

# "No Vulkan device found"
→ missing ICD loader or running in a container without --device/gpu passthrough

# encode runs but output is garbage/black
→ pixel format mismatch; force format=nv12 before hwupload

What's next

Wire the probe into CI: a nightly vulkaninfo | grep + testsrc encode across your fleet tells you when a driver update flips a machine from software-only to hardware-eligible.
Re-run your bake-off quarterly. The 8.1.x cycle has been landing driver-facing fixes steadily, and results from January are stale by summer.
If you want the decode-side story (Vulkan AV1/H.264/HEVC decode is much further along), that pairs well with this as the other half of a GPU-resident pipeline.

Ship a 'Go Live' button: OBS in, LL-HLS out, webhooks in between

Mason K — Wed, 08 Jul 2026 06:29:34 +0000

TL;DR

We're adding live streaming to a SaaS dashboard: a backend endpoint that creates a stream, OBS as the broadcaster over RTMPS, LL-HLS playback with hls.js, and a webhook handler that keeps the UI honest. Working "go live" flow in an afternoon.

📦 Code: github.com/USER/repo (replace before publishing)

Webinars, coaching sessions, company town halls: sooner or later your product gets the "can users go live?" ticket. The hard parts (ingest servers, transcoding, CDN delivery) are exactly the parts you should not build. We'll use FastPix as the managed layer here; the same flow works nearly line-for-line on Mux, Cloudflare Stream, or api.video.

What we're building:

A backend endpoint that creates a live stream and returns a stream key
An OBS setup broadcasters can follow in two minutes
A viewer page playing LL-HLS with hls.js
A webhook handler that flips the webinar between scheduled → live → ended

1. Create the stream server-side 🛠️

You need API credentials (Access Token ID + Secret Key). FastPix uses Basic auth on the server API. Node 20.x, plain fetch, no SDK required (though official Node.js/Python/Go/Ruby/PHP/Java/C# SDKs exist if you prefer).

// server/routes/streams.js
import { Router } from "express";
const router = Router();

const AUTH = "Basic " + Buffer.from(
  `${process.env.FP_TOKEN_ID}:${process.env.FP_SECRET}`
).toString("base64");

router.post("/webinars/:id/stream", async (req, res) => {
  const r = await fetch("https://api.fastpix.io/v1/live/streams", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: AUTH },
    body: JSON.stringify({
      playbackSettings: { accessPolicy: "public" },
    }),
  });
  if (!r.ok) return res.status(502).json({ error: "stream create failed" });

  const stream = await r.json();
  // persist against your webinar row (db is your own data layer, not shown):
  // streamId, streamKey (SECRET!), playbackId
  await db.webinar.update(req.params.id, {
    streamId: stream.streamId,
    streamKey: stream.streamKey,
    playbackId: stream.playbackIds?.[0]?.id,
    status: "scheduled",
  });
  res.json({ ok: true });
});

export default router;

⚠️ Note: the stream key is a credential. Anyone who has it can broadcast as your customer. Show it once in the UI, store it encrypted, and offer a "reset key" button.
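Here is a minimal sketch of "store it encrypted, show it once" using Node's built-in `crypto` (AES-256-GCM). It assumes you have a 32-byte key from a KMS or secret manager and a single base64 column to store the result. The helper names are hypothetical:

```javascript
const crypto = require("node:crypto");

// Encrypt with AES-256-GCM; pack iv | tag | ciphertext into one base64 string.
function encryptSecret(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  return Buffer.concat([iv, tag, body]).toString("base64");
}

function decryptSecret(packed, key) {
  const buf = Buffer.from(packed, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const body = buf.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the stored value was tampered with
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

// What the UI shows after the one-time reveal: only the last 4 characters.
function maskKey(streamKey) {
  return "•".repeat(Math.max(0, streamKey.length - 4)) + streamKey.slice(-4);
}
```

The "reset key" button then becomes: ask the platform for a new key, encrypt it, overwrite the column, and reveal it once.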

Two IDs come back and they have opposite audiences:

Value	Who gets it	Purpose
`streamKey`	The broadcaster only	Authenticates ingest
`playbackId`	Every viewer	Builds the playback URL

2. Point OBS at it 🎥

Your broadcasters will use OBS or something that behaves like it. RTMPS is the default ingest protocol (SRT is there too if your users broadcast from unreliable networks). The setup you'll paste into your help docs:

Settings → Stream
  Service:    Custom
  Server:     rtmps://<ingest host from the dashboard/API response>
  Stream Key: <streamKey from step 1>

Hit "Start Streaming" in OBS. Within a few seconds the platform starts transcoding into an adaptive ladder. You don't configure renditions; ABR generation is automatic.

3. Play it with hls.js 📺

Playback is a standard LL-HLS manifest:

https://stream.fastpix.io/<playbackId>.m3u8

Safari plays HLS natively. Everything else needs MSE, which is what hls.js (1.6.16 at the time of writing) is for:

// app/components/LivePlayer.jsx
import { useEffect, useRef } from "react";
import Hls from "hls.js";

export function LivePlayer({ playbackId }) {
  const videoRef = useRef(null);
  const src = `https://stream.fastpix.io/${playbackId}.m3u8`;

  useEffect(() => {
    const video = videoRef.current;
    if (video.canPlayType("application/vnd.apple.mpegurl")) {
      video.src = src; // Safari
      return;
    }
    if (!Hls.isSupported()) return; // no native HLS and no MSE: nothing to attach
    const hls = new Hls({ lowLatencyMode: true });
    hls.loadSource(src);
    hls.attachMedia(video);
    return () => hls.destroy();
  }, [src]);

  return <video ref={videoRef} controls playsInline muted autoPlay />;
}

FastPix also ships a prebuilt web/iOS/Android player if you'd rather not own the player surface. For paid content, switch accessPolicy off public and mint short-lived JWTs server-side; don't ship permanent public URLs for gated streams.

💡 Tip: LL-HLS gets you latency in the low seconds. If you need sub-second conversational latency (auctions, telehealth), that's WebRTC territory and a different architecture.

4. Webhooks drive the UI, not polling 🔔

The stream has a lifecycle your app needs to mirror: the encoder connects, the stream goes active, the encoder drops, maybe reconnects, the broadcast ends, the recording becomes ready. Every managed platform emits webhooks for these.

// server/routes/webhooks.js
import express, { Router } from "express";
const router = Router();
// db: your own data layer (not shown)

router.post("/webhooks/video", express.json(), async (req, res) => {
  const { type, data } = req.body;

  switch (type) {
    case "video.live_stream.active":
      await db.webinarByStream(data.streamId).update({ status: "live" });
      break;
    case "video.live_stream.disconnected":
      // encoder dropped; reconnect window keeps the session alive
      await db.webinarByStream(data.streamId).update({ status: "reconnecting" });
      break;
    case "video.live_stream.idle":
      await db.webinarByStream(data.streamId).update({ status: "ended" });
      break;
    case "video.asset.ready":
      // live-to-VOD recording is ready: attach the replay
      await db.webinarByStream(data.streamId).update({
        status: "replay_ready",
        replayPlaybackId: data.playbackId,
      });
      break;
  }
  res.sendStatus(200); // ack right after persisting; push heavy work to a queue
});

export default router;

Event names vary slightly per platform; check the webhook reference for exact types and payloads.

Two rules that save you from 2 AM pages:

Idempotency. Webhooks arrive duplicated and out of order. Make every transition safe to replay (e.g., ignore active if you're already ended).
Distinguish "disconnected" from "ended". Presenters drop off hotel Wi-Fi constantly. The platform's reconnect window keeps the stream session alive through short drops; your UI should show "reconnecting", not kill the room.
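Idempotency is easiest to enforce as an explicit transition table. The sketch below assumes the status names used in the handler above. The table itself is a judgment call, not something the platform defines:

```javascript
// Allowed status moves; anything else (duplicate, stale, out of order) is a no-op.
const ALLOWED = {
  scheduled: ["live", "ended"],
  live: ["reconnecting", "ended", "replay_ready"],
  reconnecting: ["live", "ended", "replay_ready"],
  ended: ["replay_ready"],
  replay_ready: [], // terminal: a late "idle" or "active" can't undo the replay
};

function nextStatus(current, proposed) {
  const allowed = ALLOWED[current] || [];
  return allowed.includes(proposed) ? proposed : current;
}
```

In the handler, read the row, compute `nextStatus(row.status, proposed)`, and write only when the result differs from the current status. Concurrent deliveries can race past a plain read-then-write. To prevent that, put the same rule in a conditional UPDATE (e.g. `WHERE status IN (...)`).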

The video.asset.ready branch is the sleeper feature: live-to-VOD recording means every broadcast automatically becomes an on-demand replay. Wire it to the same webinar page and your event content compounds instead of evaporating.

Test the whole loop end to end ✅

$ curl -s -X POST localhost:3000/webinars/42/stream | jq
{ "ok": true }

# start OBS → watch your webhook log
POST /webhooks/video  type=video.live_stream.active
# viewer page flips to LIVE

# stop OBS, wait out the reconnect window
POST /webhooks/video  type=video.live_stream.idle
POST /webhooks/video  type=video.asset.ready
# webinar page now shows the replay

If the viewer page never flips to LIVE, it's almost always one of: webhook endpoint not publicly reachable (use a tunnel in dev), wrong stream key in OBS, or your handler acking before persisting.

Production checklist before real customers 📋

[ ] Stream key shown once, stored encrypted, reset button wired
[ ] Webhook endpoint verified with the platform's signing secret (don't accept unsigned POSTs)
[ ] Idempotent transitions tested by replaying the same webhook twice
[ ] Kill-the-encoder drill: stop OBS mid-stream, confirm UI shows "reconnecting", restart OBS, confirm recovery
[ ] Replay page renders from video.asset.ready without manual steps
[ ] Signed playback (JWT) for anything gated; public URLs only for genuinely public streams
[ ] A "stream health" admin view (even just raw webhook log) so support can answer "is it us or them"
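Signature verification is usually an HMAC over the raw request body. This generic sketch assumes HMAC-SHA256 with a hex digest. Your platform's real header name and encoding are in its webhook docs, and `sign`/`isValidSignature` are hypothetical names:

```javascript
const crypto = require("node:crypto");

// HMAC-SHA256 of the raw body, hex-encoded (assumed scheme).
function sign(rawBody, secret) {
  return crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
}

function isValidSignature(rawBody, secret, receivedHex) {
  const expected = Buffer.from(sign(rawBody, secret), "hex");
  const received = Buffer.from(String(receivedHex || ""), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}
```

Verify against the raw bytes, not re-serialized JSON. On the webhook route, swap `express.json()` for `express.raw({ type: "application/json" })`, check the signature, and only then `JSON.parse` the body.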

The kill-the-encoder drill is the one teams skip and the one that matters most. Presenters lose connectivity during real events constantly; if your UI slams to "ended" on a blip, the audience leaves and doesn't come back.

What's next

Signed playback with JWTs for paid/gated streams, and simulcast if your users also want to hit YouTube/Twitch simultaneously; both are API-level features, no re-architecture.
The full API surface used here is in the FastPix API reference. New accounts get $25 in credits without a card, which is plenty for this build. And if you're on Mux, Cloudflare Stream, or api.video instead: the create-stream/webhook/playback pattern in this post maps over almost 1:1.

Disclosure: this tutorial was prepared for FastPix, whose team reviews and publishes it; the patterns shown transfer to any managed video platform.

Playing HLS through Managed Media Source on iPhone with hls.js

Mason K — Mon, 06 Jul 2026 13:21:27 +0000

TL;DR

iPhone Safari never supported Media Source Extensions, so JS-driven players fell back to native HLS. Since iOS 17.1, Apple ships Managed Media Source (MMS), an MSE variant where the browser controls when you fetch. We'll wire up hls.js to use it, then handle the one behavior that bites people: the browser can evict your buffer.

📦 Code: github.com/USER/mms-hls-demo (replace before publishing)

What we're building

A minimal HLS player page that works through Managed Media Source on iPhone (iOS 17.1+) and falls back cleanly everywhere else. We'll do it twice: first with hls.js doing the heavy lifting (what you'll actually ship), then a stripped-down raw MMS example so you understand what the library is doing under the hood.

Versions used: hls.js 1.6.13, tested on iOS 17.1+ Safari and desktop Chrome/Firefox.

1. Why the old approach fell short

On every platform except iPhone, you'd attach a MediaSource and feed it segments:

// classic MSE: works on desktop, NOT on iPhone Safari (even on iOS 17.1+, iPhone exposes only ManagedMediaSource)
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

On iPhone, window.MediaSource was simply undefined, so libraries detected that and fell back to native HLS:

// the fallback every player shipped
if (video.canPlayType('application/vnd.apple.mpegurl')) {
  video.src = 'https://example.com/playlist.m3u8'; // browser owns everything
}

That works, but you lose JS control over buffering, ABR, DVR UI, and quality menus. MMS is the way back in.

2. Feature-detect Managed Media Source

The new global is ManagedMediaSource. Detect it alongside the classic one:

// src/detect.js
export function getMediaSource() {
  return self.ManagedMediaSource || self.MediaSource || null;
}

export function hasManagedMediaSource() {
  return 'ManagedMediaSource' in self;
}

💡 Tip: self works in both window and worker contexts, which matters if you ever move media handling into a Web Worker.

3. The hls.js path (what you'll ship)

Here's the good news: hls.js 1.6.x already uses the ManagedMediaSource global by default when it's present. For most teams the "port to MMS" task is "upgrade hls.js and fix two attributes."

npm install hls.js@1.6.13

<!-- index.html -->
<video id="player" controls playsinline></video>

// src/player.js
import Hls from 'hls.js';

const video = document.getElementById('player');
const src = 'https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8';

// Two attributes matter on iOS with MMS: playsinline (set in the HTML above) and this one.
video.disableRemotePlayback = true; // keeps AirPlay from hijacking the managed buffer

if (Hls.isSupported()) {
  // Covers desktop MSE AND iPhone Managed Media Source, hls.js picks the right one.
  const hls = new Hls({ enableWorker: true });
  hls.loadSource(src);
  hls.attachMedia(video);
  hls.on(Hls.Events.MANIFEST_PARSED, () => video.play().catch(() => {}));
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  // Very old iOS or browsers without MSE/MMS: native HLS.
  video.src = src;
}

That's it for the common case. The reason it's anticlimactic is the whole point: hls.js subscribes to the MMS streaming events and handles buffer eviction so you don't have to. But you should understand what it's doing, because debugging it without that mental model is miserable.

4. The raw MMS example (so you understand the library)

If you're building DASH-on-iOS or your own player, here's the shape of MMS by hand. Three things differ from classic MSE.

Difference 1: attach the object URL through a child `<source>`, not `video.src`

// src/raw-mms.js
const MMS = self.ManagedMediaSource;
const mediaSource = new MMS();

video.disableRemotePlayback = true;

// iOS wants a child <source> element, not video.src = URL.createObjectURL(...)
const source = document.createElement('source');
source.type = 'video/mp4'; // your fMP4 mime
source.src = URL.createObjectURL(mediaSource);
video.appendChild(source);

Difference 2: only fetch between startstreaming and endstreaming

let sourceBuffer;

mediaSource.addEventListener('sourceopen', () => {
  URL.revokeObjectURL(source.src);
  sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.4d401f"');
});

// The browser tells YOU when it wants data. Respect it or you lose the battery win.
// moreSegments, fetchNextSegment and appendChunk are your own helpers (hypothetical here).
mediaSource.addEventListener('startstreaming', async () => {
  while (mediaSource.streaming && moreSegments()) {
    const chunk = await fetchNextSegment();
    await appendChunk(sourceBuffer, chunk);
  }
});

mediaSource.addEventListener('endstreaming', () => {
  // Stop fetching. The UA has enough buffered for now.
});
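`appendChunk` isn't a browser API; it's a helper the example assumes. `appendBuffer()` returns immediately and finishes asynchronously. One plausible version wraps it in a promise that settles on `updateend` or `error`:

```javascript
// Resolves once the SourceBuffer has finished applying `chunk`.
function appendChunk(sourceBuffer, chunk) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      sourceBuffer.removeEventListener("updateend", onEnd);
      sourceBuffer.removeEventListener("error", onError);
    };
    const onEnd = () => { cleanup(); resolve(); };
    const onError = () => { cleanup(); reject(new Error("appendBuffer failed")); };
    sourceBuffer.addEventListener("updateend", onEnd);
    sourceBuffer.addEventListener("error", onError);
    try {
      sourceBuffer.appendBuffer(chunk);
    } catch (err) {
      // e.g. QuotaExceededError when the UA's buffer is full
      cleanup();
      reject(err);
    }
  });
}
```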

Difference 3: the buffer can be evicted under you

This is the gotcha. With classic MSE, whatever you appended stayed until you removed it. With MMS, the browser can purge buffered ranges under memory pressure and fire bufferedchange:

sourceBuffer.addEventListener('bufferedchange', (e) => {
  // e.addedRanges and e.removedRanges are TimeRanges-like.
  for (let i = 0; i < e.removedRanges.length; i++) {
    const start = e.removedRanges.start(i);
    const end = e.removedRanges.end(i);
    console.log(`UA evicted buffered range ${start} to ${end}s`);
    // If the user seeks back into an evicted range, you must re-fetch it.
    markRangeForRefetch(start, end);
  }
});

⚠️ Note: if you assume appended data is permanent (the MSE habit), seeking backward into an evicted range will look like a frozen player on iPhone and work fine everywhere else. This is the single most common MMS bug. hls.js handles it; hand-rolled players must handle it themselves.
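`markRangeForRefetch` is likewise left to you. The sketch below assumes fixed-duration segments indexed from media time 0. Real HLS playlists carry per-segment durations, so there you'd map times through the playlist instead. The class name is hypothetical:

```javascript
// Tracks which fixed-length segments the UA evicted, so seeks can re-fetch them.
class RefetchTracker {
  constructor(segmentSeconds) {
    this.segmentSeconds = segmentSeconds;
    this.evicted = new Set(); // segment indices no longer buffered
  }

  // Rounds outward: a partially evicted segment is re-fetched whole.
  markRangeForRefetch(start, end) {
    const first = Math.floor(start / this.segmentSeconds);
    const last = Math.ceil(end / this.segmentSeconds) - 1; // end is exclusive
    for (let i = first; i <= last; i++) this.evicted.add(i);
  }

  // Call after a successful re-append so the segment isn't fetched twice.
  markAppended(index) {
    this.evicted.delete(index);
  }

  // Before resuming at a seek target, check whether its segment must be re-fetched.
  needsRefetch(time) {
    return this.evicted.has(Math.floor(time / this.segmentSeconds));
  }
}
```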

5. Test it on a real device

Simulators lie about media. Use a physical iPhone on iOS 17.1+.

# serve on your LAN (plain HTTP; add HTTPS or a tunnel if your stack needs a secure context)
npx vite --host
# then open the LAN URL on the iPhone (same Wi-Fi); to test over cellular, use a tunnel

Checklist:

[ ] Playback starts on the iPhone with the hls.js path (not the native fallback). Log which branch ran.
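[ ] AirPlay behaves the way you intend. With disableRemotePlayback = true the AirPlay button is disabled by design; if you need AirPlay, provide an AirPlay-compatible alternative <source> instead.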
[ ] Watch a few minutes, scrub backward a long way, confirm no freeze (the eviction case).
[ ] Throttle to a slow profile and confirm ABR still down-switches.

6. Confirm which path actually ran

The most confusing MMS bug is thinking you're on Managed Media Source when you quietly fell back to native HLS (or vice versa). Add an explicit log so you're never guessing:

// src/which-path.js
import Hls from 'hls.js';

export function reportPlaybackPath(video) {
  if ('ManagedMediaSource' in self && Hls.isSupported()) {
    console.log('[playback] hls.js via ManagedMediaSource');
  } else if (Hls.isSupported()) {
    console.log('[playback] hls.js via classic MediaSource');
  } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
    console.log('[playback] native HLS fallback');
  } else {
    console.warn('[playback] no supported path');
  }
}

On a real iPhone running iOS 17.1+, you want to see the first line, not the native fallback. If you see the fallback, check that your hls.js version is recent enough to use the ManagedMediaSource global and that you didn't accidentally short-circuit to video.src somewhere.

You can also inspect it live from Safari's Web Inspector: turn on Settings → Safari → Advanced → Web Inspector on the iPhone, connect it to a Mac, and pick the page from Safari's Develop menu. Then confirm that 'ManagedMediaSource' in window returns true:

# in the Web Inspector console, attached to the iPhone tab
> 'ManagedMediaSource' in window
true

💡 Tip: ship that reportPlaybackPath log behind a debug flag in production too. "Which playback path did this session use?" is the first question you'll ask when an iPhone-only bug report lands.

When you should NOT bother

Honest take: if you only target Apple devices and you're happy with native HLS, native HLS is still Apple's recommendation and it's less code. Reach for MMS when you need MSE-only powers on iPhone: MPEG-DASH on iOS, a custom DVR/quality UI, or one unified player codebase instead of a forked iOS path.

What's next

Move segment fetching into a Web Worker and drive it from the streaming events for smoother main-thread behavior.
If you use DASH, look at dash.js / Shaka MMS support to drop your separate iOS HLS packaging step.
Read the WebKit AirPlay-with-MSE post for the child-<source> handoff details.

If your player library is current, you may already be on MMS and not know it. Add a one-line log for which code path runs on iOS and go check.

Force keyframe-aligned GOPs across an ABR ladder with FFmpeg

Mason K — Mon, 06 Jul 2026 13:20:47 +0000

TL;DR

If your stream hitches on quality switches but shows zero rebuffering, your ABR renditions probably have keyframes in different places. We'll encode a ladder with identical, forced, closed GOPs in FFmpeg, then verify alignment with ffprobe and MP4Box so it can't regress.

📦 Code: github.com/USER/abr-keyframe-align (replace before publishing)

The bug

A clean ABR down-switch needs both renditions to have a keyframe (IDR) at the same media time. If 1080p has a keyframe at 4.0s and 720p's nearest is at 4.3s, the player can't splice cleanly and you get a micro-freeze. Your QoE dashboard says zero rebuffers because, technically, nothing rebuffered. The seam just collided.

Versions: ffmpeg 7.x, MP4Box (GPAC), tested with x264 and NVENC.

1. Pick one cadence for the whole ladder

Keyframe interval should equal framerate × segment_seconds. Lock it once and apply it to every rung.

# 30 fps source, 2-second segments => keyint = 60
FPS=30
SEG=2
KEYINT=$(( FPS * SEG ))   # 60
echo "keyint=$KEYINT"

The rungs vary in resolution and bitrate. The keyframe cadence does not vary. That's the whole trick.
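It's worth encoding the formula once so a fractional frame rate can't sneak through. The sketch below (the helper name is hypothetical) refuses cadences that aren't a whole number of frames. With NTSC rates like 30000/1001 fps, 2-second segments don't divide evenly, while 2.002-second segments give exactly 60 frames:

```javascript
// keyint = fps × segment seconds, refusing cadences that aren't whole frames.
// Assumes a constant frame rate; pass exact rates like 30000 / 1001.
function keyintFor(fps, segmentSeconds) {
  const frames = fps * segmentSeconds;
  const whole = Math.round(frames);
  // Allow only floating-point noise; 29.97 × 2 = 59.94 is a real remainder.
  if (Math.abs(frames - whole) > 1e-9) {
    throw new Error(`${fps} fps × ${segmentSeconds} s = ${frames} frames, not a whole GOP`);
  }
  return whole;
}
```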

2. The naive command that causes the bug

This looks fine and ships broken:

# DON'T do this for an ABR ladder
ffmpeg -i source.mov -c:v libx264 -b:v 5000k -hls_time 2 out.m3u8

Two problems:

Scene-cut detection inserts keyframes at content-driven moments, not your grid, and different rungs make different decisions.
min-keyint defaults low, so the encoder can drop in early keyframes whenever it likes.

3. Force fixed, closed GOPs (x264)

# encode-1080p.sh
ffmpeg -i source.mov \
  -c:v libx264 -profile:v high -b:v 5000k -maxrate 5350k -bufsize 7500k \
  -x264opts "keyint=60:min-keyint=60:no-scenecut" \
  -force_key_frames "expr:gte(t,n_forced*2)" \
  -c:a aac -b:a 128k \
  -hls_time 2 -hls_playlist_type vod -hls_segment_type fmp4 \
  -hls_segment_filename "1080p_%03d.m4s" 1080p.m3u8

What each part does:

Flag	Job
`keyint=60`	GOP size = 60 frames (2s at 30fps)
`min-keyint=60`	min == max => fixed-length GOPs, no early keyframes (x264 GOPs are already closed by default)
`no-scenecut`	stop the encoder adding content-driven keyframes that break alignment
`-force_key_frames expr:gte(t,n_forced*2)`	belt-and-suspenders: a keyframe every 2s by time, robust to VFR
`-hls_time 2`	segment duration matches the GOP boundary

Run the same keyframe flags for every rung, changing only resolution/bitrate:

# encode-720p.sh: identical keyframe settings, different bitrate/scale
ffmpeg -i source.mov \
  -vf "scale=-2:720" \
  -c:v libx264 -profile:v high -b:v 2800k -maxrate 3000k -bufsize 4200k \
  -x264opts "keyint=60:min-keyint=60:no-scenecut" \
  -force_key_frames "expr:gte(t,n_forced*2)" \
  -c:a aac -b:a 128k \
  -hls_time 2 -hls_playlist_type vod -hls_segment_type fmp4 \
  -hls_segment_filename "720p_%03d.m4s" 720p.m3u8

4. NVENC is different

The x264opts string doesn't apply to hardware encoders. Use -g and keep the force_key_frames expression:

# GPU path
ffmpeg -hwaccel cuda -i source.mov \
  -c:v h264_nvenc -preset p5 -b:v 5000k \
  -g 60 -forced-idr 1 -force_key_frames "expr:gte(t,n_forced*2)" \
  -hls_time 2 -hls_segment_type fmp4 1080p_gpu.m3u8

💡 Tip: NVENC honors -g for GOP size and -force_key_frames for exact placement; -forced-idr 1 makes those forced keyframes IDR frames rather than plain I-frames. NVENC's own -no-scenecut option only applies when lookahead is enabled, so the forced-keyframe expression does the heavy lifting.

5. Verify with ffprobe

Don't trust it, check it. List keyframe timestamps for each rendition:

# print PTS of every keyframe in a rendition
ffprobe -v error -select_streams v:0 \
  -show_entries packet=pts_time,flags \
  -of csv=print_section=0 1080p.m3u8 \
  | awk -F',' '$2 ~ /K/ { print $1 }'

Both 1080p and 720p should print the same list of timestamps: one keyframe every 2 seconds (0, 2, 4, 6, ...).

If 720p shows 0, 2, 3.4, 5.4, ... instead, scene-cut detection leaked in. Re-check your flags.
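If you'd rather gate in a Node test suite than in awk, the same check takes a few lines. The sketch below takes the ffprobe output before the awk filter (one `pts_time,flags` row per packet). The 1 ms tolerance is an assumption:

```javascript
// Keyframe timestamps from raw `ffprobe ... -show_entries packet=pts_time,flags
// -of csv=print_section=0` output (before the awk filter).
function keyframeTimes(ffprobeCsv) {
  return ffprobeCsv
    .split("\n")
    .map((row) => row.trim().split(","))
    .filter(([pts, flags]) => pts && flags && flags.includes("K"))
    .map(([pts]) => Number(pts))
    .filter(Number.isFinite); // drops "N/A" timestamps
}

// Returns null when aligned, else the first index where the lists disagree.
function firstMisalignment(a, b, toleranceSec = 0.001) {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] === undefined || b[i] === undefined || Math.abs(a[i] - b[i]) > toleranceSec) {
      return { index: i, a: a[i], b: b[i] };
    }
  }
  return null;
}
```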

6. Verify with MP4Box and wire it into CI

GPAC's MP4Box has a keyframe-alignment check across renditions. For a CI gate, the script below reuses the ffprobe extraction from step 5 and fails the build on misalignment:

#!/usr/bin/env bash
# ci/check-alignment.sh (bash, for the process substitution below)
set -euo pipefail

# extract keyframe times for two rungs, diff them
kf () {
  ffprobe -v error -select_streams v:0 \
    -show_entries packet=pts_time,flags -of csv=print_section=0 "$1" \
    | awk -F',' '$2 ~ /K/ { printf "%.3f\n", $1 }'
}

diff <(kf 1080p.m3u8) <(kf 720p.m3u8) && echo "Keyframes aligned ✅" \
  || { echo "Keyframes MISALIGNED ❌"; exit 1; }

$ ./ci/check-alignment.sh
Keyframes aligned ✅

⚠️ Note: an encoder upgrade can silently change a default. A CI check is the only thing that stops a future ladder change from quietly reintroducing the hitch.

7. Audio is the other half of alignment

Video keyframes are only half the story. If your segments are muxed (audio plus video together), the audio frame boundaries also have to fall on segment edges, or you reintroduce the same splice problem on the audio track. Two things bite here.

First, AAC has encoder priming (1024 samples of silence at the start with FFmpeg's native encoder, 2112 with Apple's) that shifts the audio timeline relative to video. Second, audio frames are a fixed sample count (1024 samples for AAC-LC), so a segment duration that isn't a clean multiple of the audio frame duration will never line up perfectly.
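The arithmetic makes the problem concrete. At 48 kHz one AAC-LC frame lasts 1024/48000 s (about 21.3 ms), so a 2-second segment holds 93.75 of them. The sketch below computes that. It also finds the shortest duration that holds whole numbers of both video and AAC frames. It assumes an integer frame rate, and the helper names are hypothetical:

```javascript
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// How many AAC frames one segment holds (fractional = misaligned).
function aacFramesPerSegment(segmentSeconds, sampleRate, samplesPerFrame = 1024) {
  return (segmentSeconds * sampleRate) / samplesPerFrame;
}

// Shortest duration (exact fraction num/den seconds) holding a whole number
// of video frames (1/fps s each) AND AAC frames (samplesPerFrame/sampleRate s).
function alignedUnit(fps, sampleRate, samplesPerFrame = 1024) {
  const g = gcd(samplesPerFrame, sampleRate);
  const aNum = samplesPerFrame / g; // AAC frame duration, reduced: aNum/aDen
  const aDen = sampleRate / g;
  // LCM of reduced fractions 1/fps and aNum/aDen = lcm(1, aNum) / gcd(fps, aDen)
  const den = gcd(fps, aDen);
  return { num: aNum, den, seconds: aNum / den };
}
```

At 30 fps and 48 kHz the unit is 8/15 s. The nearest aligned duration to 2 s is therefore 32/15 ≈ 2.133 s: 64 video frames and 100 AAC frames, which would also mean keyint=64. At 44.1 kHz the unit jumps to 256/15 ≈ 17.07 s, which is one more reason to demux. This only aligns frame counts; encoder priming still offsets the audio timeline.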

The robust fix is to keep audio in a separate rendition (demuxed) so the player can align it independently, which is also what most HLS ladders do anyway:

# audio-only rendition, separate from video
ffmpeg -i source.mov -vn \
  -c:a aac -b:a 128k -ar 48000 \
  -hls_time 2 -hls_playlist_type vod -hls_segment_type fmp4 \
  -hls_segment_filename "audio_%03d.m4s" audio.m3u8

# confirm the audio segment count matches the video segment count
$ grep -c '\.m4s' audio.m3u8 1080p.m3u8
audio.m3u8:31
1080p.m3u8:31

💡 Tip: matching segment counts across audio and every video rung is a fast sanity check. A mismatch means a duration or framerate assumption is off somewhere.

The cost (be honest)

Forcing keyframes and disabling scene-cut detection sacrifices a little compression efficiency, since you're inserting IDR frames the encoder would rather skip. For ABR delivery that trade is almost always worth it: a marginally bigger file that switches cleanly beats a smaller one that stutters every time the network dips.

What's next

Align audio segment duration with video so muxed segments don't drift; watch encoder/priming delay.
If you run per-title / content-aware encoding, keep the forced keyframe cadence even when bitrates vary per title.
Add the alignment check to your transcoding pipeline's test suite, not just local runs.

Next time a switch hitches and the dashboard says all-clear, dump the keyframe times for two rungs before you touch the CDN.

Build resumable video uploads with tus, Node, and tus-js-client

Mason K — Mon, 06 Jul 2026 13:20:27 +0000

TL;DR

A single multipart POST throws away the whole upload when the connection blinks. The tus protocol fixes that with HTTP-based resumable uploads. We'll stand up a Node @tus/server endpoint, wire a tus-js-client browser uploader, then kill the network mid-upload and watch it resume from the exact offset.

📦 Code: github.com/USER/tus-video-upload (replace before publishing)

What we're building

A tiny upload app: a Node server that accepts resumable uploads and stores them (disk first, then S3), and a browser page that uploads a large video file and survives an interrupted connection. By the end you'll understand the three HTTP requests that make tus work, and you'll have a demo that resumes instead of restarting.

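Versions: @tus/server 2.x, @tus/file-store 2.x, tus-js-client 4.x, Node 22.x. The two-argument onUploadCreate(req, upload) and onUploadFinish(req, upload) hooks below use the 2.x signatures; 1.x passed (req, res, upload). (tusd v2 exists in Go too; we'll use the Node server here.)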

1. The 30-second protocol primer

tus is plain HTTP (built on RFC 9110), stable since v1.0. Three requests do everything:

Request	Purpose
`POST`	Create an upload, declare `Upload-Length`, get back a URL
`PATCH`	Send a chunk at a given `Upload-Offset`
`HEAD`	Ask "how many bytes do you already have?" before resuming

That HEAD is the magic. After a drop, the client asks the server for the current offset and continues from there instead of guessing or restarting.
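Here are the three requests as plain header objects. They follow the tus 1.0.0 core protocol (`Tus-Resumable`, `Upload-Length`, `Upload-Offset`) plus the Creation extension's `Upload-Metadata`, which carries comma-separated `key base64value` pairs. The helper names are hypothetical; tus-js-client builds these headers for you:

```javascript
const TUS = { "Tus-Resumable": "1.0.0" };

// Upload-Metadata: comma-separated "key base64(value)" pairs.
function encodeMetadata(meta) {
  return Object.entries(meta)
    .map(([k, v]) => `${k} ${Buffer.from(String(v), "utf8").toString("base64")}`)
    .join(",");
}

// POST /files: declare the total size up front.
function createHeaders(size, meta) {
  return { ...TUS, "Upload-Length": String(size), "Upload-Metadata": encodeMetadata(meta) };
}

// PATCH <upload URL>: body is the raw chunk starting at `offset`.
function patchHeaders(offset) {
  return { ...TUS, "Upload-Offset": String(offset), "Content-Type": "application/offset+octet-stream" };
}

// HEAD <upload URL>: resume from the Upload-Offset the server reports.
// Expects lowercase header names (as in Node's http); with fetch, use headers.get().
function resumeOffset(headResponseHeaders) {
  const offset = Number(headResponseHeaders["upload-offset"]);
  if (!Number.isInteger(offset) || offset < 0) throw new Error("bad Upload-Offset");
  return offset;
}
```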

2. Stand up the Node server

mkdir tus-video-upload && cd tus-video-upload
npm init -y
npm pkg set type=module   # server.js uses ES module imports
npm install @tus/server @tus/file-store

// server.js
import http from 'node:http';
import { Server } from '@tus/server';
import { FileStore } from '@tus/file-store';

const tusServer = new Server({
  path: '/files',
  datastore: new FileStore({ directory: './uploads' }),
  // basic guardrails
  maxSize: 5 * 1024 * 1024 * 1024, // 5 GB cap
  async onUploadCreate(req, upload) {
    // hook for auth + validation; throw to reject
    const type = upload.metadata?.filetype ?? '';
    if (!type.startsWith('video/')) {
      throw { status_code: 415, body: 'Only video uploads allowed' };
    }
    return {};
  },
  async onUploadFinish(req, upload) {
    console.log(`✅ upload complete: ${upload.id} (${upload.size} bytes)`);
    return {};
  },
});

const server = http.createServer((req, res) => {
  if (req.url.startsWith('/files')) return tusServer.handle(req, res);
  res.writeHead(404).end();
});

server.listen(1080, () => console.log('tus server on http://localhost:1080/files'));

$ node server.js
tus server on http://localhost:1080/files

💡 Tip: onUploadCreate is where auth lives. Check a session token from req.headers and throw to reject before a single byte is stored.

3. Wire the browser client

npm install tus-js-client

<!-- index.html -->
<input type="file" id="file" accept="video/*" />
<progress id="bar" value="0" max="100"></progress>
<pre id="log"></pre>

// upload.js
import * as tus from 'tus-js-client';

const input = document.getElementById('file');
const bar = document.getElementById('bar');
const log = (m) => (document.getElementById('log').textContent += m + '\n');

input.addEventListener('change', () => {
  const file = input.files[0];
  if (!file) return;

  const upload = new tus.Upload(file, {
    endpoint: 'http://localhost:1080/files',
    retryDelays: [0, 1000, 3000, 5000, 10000], // auto-retry on failure
    metadata: { filename: file.name, filetype: file.type },
    onError: (err) => log('failed: ' + err),
    onProgress: (sent, total) => {
      const pct = ((sent / total) * 100).toFixed(1);
      bar.value = pct;
      log(`progress: ${pct}%`);
    },
    onSuccess: () => log('done: ' + upload.url),
  });

  // resume a previous upload of this file if one exists
  upload.findPreviousUploads().then((prev) => {
    if (prev.length) upload.resumeFromPreviousUpload(prev[0]);
    upload.start();
  });
});

The retryDelays array is what makes a transient drop invisible: tus-js-client backs off and retries automatically. findPreviousUploads is what lets a page reload pick up where it left off.

4. Prove it resumes

Start a big upload, then kill connectivity mid-flight and bring it back. You'll see it continue, not restart:

progress: 0.0%
progress: 41.7%
progress: 62.3%
# ... connection drops; retryDelays kicks in (retries are silent, onError only fires once they're all used up) ...
progress: 62.3%   <- HEAD asked the server; resumed from the stored offset
progress: 88.9%
progress: 100.0%
done: http://localhost:1080/files/9f3c...

You can watch the underlying requests in the network tab: a HEAD to the upload URL returns Upload-Offset: <bytes already stored>, then PATCH continues from there.

5. Move the store to S3 for production

Disk is fine for a demo. In production you want durable, multi-instance storage:

npm install @tus/s3-store

// swap FileStore for S3Store
import { S3Store } from '@tus/s3-store';

const datastore = new S3Store({
  partSize: 8 * 1024 * 1024, // 8 MB parts
  s3ClientConfig: {
    bucket: process.env.S3_BUCKET,
    region: process.env.AWS_REGION,
    credentials: {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    },
  },
});

⚠️ Note: resumable state must be durable. With FileStore, a server restart with ./uploads on ephemeral disk loses partial uploads. S3 (or a shared volume) fixes that and lets multiple server instances share upload state.

6. Lock it down: auth and expiry

An open upload endpoint is an open invitation to fill your storage with junk. The onUploadCreate hook runs before any bytes land, so it's the right place to check a token and reject early.

// in the Server({...}) config
async onUploadCreate(req, upload) {
  const auth = req.headers['authorization'] ?? '';
  const token = auth.replace(/^Bearer\s+/i, '');
  const user = await verifySession(token); // your logic
  if (!user) {
    throw { status_code: 401, body: 'Not authorized' };
  }
  // stamp the owner onto the upload metadata for later
  return { metadata: { ...upload.metadata, userId: user.id } };
}

Pass the token from the client through the headers option:

const upload = new tus.Upload(file, {
  endpoint: 'http://localhost:1080/files',
  headers: { Authorization: `Bearer ${sessionToken}` },
  // ...rest as before
});

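For cleanup, tus has an expiration extension: the server advertises an Upload-Expires time for each unfinished upload. Without it, abandoned partials (someone closed the tab at 40%) sit in your store forever. @tus/server supports the extension once the datastore has an expiration period configured, and it exposes a cleanUpExpiredUploads() call you can run on a schedule: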

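// cron.js: run periodically. Assumes server.js exports tusServer
// (export const tusServer = ...); importing it also starts the HTTP listener.
import { tusServer } from './server.js';
const removed = await tusServer.cleanUpExpiredUploads();
console.log(`cleaned ${removed} expired uploads`);
process.exit(0); // otherwise the imported listener keeps this process alive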

⚠️ Note: without expiry cleanup, a public upload form becomes a slow storage leak. Schedule the cleanup before you launch, not after the bill arrives.
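Under the hood, the expiration check is just date arithmetic on an HTTP-date. Here's a sketch with hypothetical helper names, assuming a fixed lifetime per upload. This is not @tus/server's internal code, only the rule it applies.

```javascript
// tus-expiry-sketch.js - the arithmetic behind the expiration extension.
// Upload-Expires is an HTTP-date, e.g. "Tue, 14 Jul 2026 00:00:00 GMT".

// The value a server could send for an upload created at createdAtMs with a fixed lifetime.
function uploadExpiresHeader(createdAtMs, lifetimeMs) {
  return new Date(createdAtMs + lifetimeMs).toUTCString();
}

// What a cleanup job checks: has this partial upload outlived its Upload-Expires?
function isExpired(uploadExpires, nowMs = Date.now()) {
  const expiresMs = Date.parse(uploadExpires);
  if (Number.isNaN(expiresMs)) throw new Error(`bad Upload-Expires: ${uploadExpires}`);
  return nowMs > expiresMs;
}
```

A client that sees Upload-Expires in the past should start a fresh upload instead of attempting a HEAD on a URL the server may already have deleted.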

7. Things to know before you ship

[ ] Set maxSize and validate filetype in onUploadCreate.
[ ] Handle Upload-Expires and garbage-collect abandoned partials.
[ ] Put the tus route behind auth (token in headers, checked in the create hook).
[ ] Pin versions: the Go reference server tusd moved to v2 with breaking changes vs 1.13.0; the Node packages version independently.
[ ] Several managed platforms (Cloudflare Stream, Vimeo, Supabase) expose tus endpoints, so this same client code largely ports to them.

What's next

Add Uppy on top of tus-js-client for a drag-and-drop UI with progress and retry built in.
Trigger your transcoding pipeline from onUploadFinish.
Read the tus 1.0 spec; the protocol is small enough to read in one sitting and it'll demystify the network tab.

The next time you build an upload box for anything large, don't reach for a single POST. Reach for the protocol that already solved the dropped-connection problem.

Ship multi-language audio in HLS: author the manifest, wire the hls.js switcher

Mason K — Mon, 06 Jul 2026 06:42:48 +0000


TL;DR

We'll add a working language picker to an HLS player. The hard part isn't the dropdown; it's the manifest. We'll author alternate audio with EXT-X-MEDIA audio groups, package it correctly, debug the classic "zero audio tracks" bug, and wire a switcher on hls.js v1.7.

Adaptive video, captions, the whole pipeline already works. Now someone wants an English/Spanish audio toggle. In HLS, "which audio can the viewer pick" is decided at packaging time and written into the master playlist. The player just displays it. Let's build it in that order.

1. Understand the structure (audio groups)

HLS decouples video variants from audio renditions:

Each audio rendition is an #EXT-X-MEDIA:TYPE=AUDIO entry pointing at its own media playlist.
Renditions are bundled into a named audio group via GROUP-ID.
Each video variant (#EXT-X-STREAM-INF) references a group with AUDIO="...".

A correct master playlist:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2",URI="audio/en.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Espanol",LANGUAGE="es",DEFAULT=NO,AUTOSELECT=YES,CHANNELS="2",URI="audio/es.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2128000,CODECS="avc1.640028,mp4a.40.2",AUDIO="aud"
video/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1128000,CODECS="avc1.640020,mp4a.40.2",AUDIO="aud"
video/480p.m3u8

Every attribute earns its place:

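LANGUAGE - BCP-47 code identifying the audio language; players match it against the viewer's language preference (NAME is the human-readable label).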
DEFAULT - plays when the viewer has no preference.
AUTOSELECT - may be auto-picked from the OS language.
CHANNELS - needed so the player can reason about stereo vs surround.
BANDWIDTH on each video variant must include the audio group's bitrate, or your ABR logic works from a wrong total.
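If you generate manifests yourself, build these lines from data so BANDWIDTH can't drift away from the real bitrates. Here's a sketch with illustrative helper names. The HLS spec defines BANDWIDTH as a peak rate, so summing nominal encoder bitrates gives you a floor, not necessarily the final number.

```javascript
// master-lines.js - generate master-playlist lines from data instead of by hand.
// Helper names and the input shape are illustrative, not from any packager.

function audioMediaTag({ groupId, name, language, isDefault, autoselect = true, channels, uri }) {
  const attrs = [
    "TYPE=AUDIO",
    `GROUP-ID="${groupId}"`,
    `NAME="${name}"`,
    `LANGUAGE="${language}"`,
    `DEFAULT=${isDefault ? "YES" : "NO"}`,
    `AUTOSELECT=${autoselect ? "YES" : "NO"}`,
    `CHANNELS="${channels}"`,
    `URI="${uri}"`,
  ];
  return `#EXT-X-MEDIA:${attrs.join(",")}`;
}

// BANDWIDTH = video bitrate + the audio group's bitrate, so ABR works from the real total.
function streamInf({ videoBps, audioBps, codecs, groupId, uri }) {
  const bandwidth = videoBps + audioBps;
  return `#EXT-X-STREAM-INF:BANDWIDTH=${bandwidth},CODECS="${codecs}",AUDIO="${groupId}"\n${uri}`;
}
```

Calling audioMediaTag with the English rendition's fields reproduces the first EXT-X-MEDIA line of the master playlist above. streamInf({ videoBps: 2000000, audioBps: 128000, ... }) yields the 2128000 variant.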

2. Author the renditions with FFmpeg

Extract/encode each language's audio, then package. First, encode video-only and audio-only renditions:

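# video only (no audio), two ladder rungs; a keyframe every 4 s so every rung can be cut at the same times
ffmpeg -y -i master_en.mov -an -c:v libx264 -preset veryfast -b:v 2000k -vf scale=-2:720 -force_key_frames "expr:gte(t,n_forced*4)" video_720.mp4
ffmpeg -y -i master_en.mov -an -c:v libx264 -preset veryfast -b:v 1000k -vf scale=-2:480 -force_key_frames "expr:gte(t,n_forced*4)" video_480.mp4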

# audio only, per language (AAC stereo, aligned settings)
ffmpeg -y -i master_en.mov -vn -c:a aac -b:a 128k -ac 2 audio_en.m4a
ffmpeg -y -i dub_es.mov    -vn -c:a aac -b:a 128k -ac 2 audio_es.m4a

⚠️ Note: keep the same segment duration across video and every audio rendition. Misaligned boundaries cause gaps and slow desync that won't show up in a 10-second test.

Then segment with a packager that emits grouped audio. Bento4's mp4hls (or mp4dash for DASH) and Shaka Packager handle this cleanly. Example with Shaka Packager:

# single quotes stop the shell from expanding $Number$ before Shaka Packager sees it
packager \
  'in=video_720.mp4,stream=video,init_segment=v720/init.mp4,segment_template=v720/$Number$.m4s,playlist_name=video/720p.m3u8' \
  'in=video_480.mp4,stream=video,init_segment=v480/init.mp4,segment_template=v480/$Number$.m4s,playlist_name=video/480p.m3u8' \
  'in=audio_en.m4a,stream=audio,hls_group_id=aud,hls_name=English,language=en,init_segment=aen/init.mp4,segment_template=aen/$Number$.m4s,playlist_name=audio/en.m3u8' \
  'in=audio_es.m4a,stream=audio,hls_group_id=aud,hls_name=Espanol,language=es,init_segment=aes/init.mp4,segment_template=aes/$Number$.m4s,playlist_name=audio/es.m3u8' \
  --hls_master_playlist_output master.m3u8 \
  --segment_duration 4
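Once the output is packaged, you can check the same-segment-duration rule mechanically instead of waiting for drift to show up. This sketch uses hypothetical helper names and plain Node. It compares cumulative segment start times from two media playlists, because per-segment jitter is harmless and accumulated offset is what you actually hear.

```javascript
// check-alignment sketch: compare segment boundaries across two media playlists.

// Segment start times (seconds) from a media playlist's #EXTINF tags.
function segmentStarts(playlistText) {
  const starts = [];
  let t = 0;
  for (const line of playlistText.split(/\r?\n/)) {
    if (!line.startsWith("#EXTINF:")) continue;
    starts.push(t);
    t += parseFloat(line.slice("#EXTINF:".length)); // "#EXTINF:4.000," -> 4
  }
  return starts;
}

// Indexes where two renditions' segment boundaries disagree by more than toleranceSec
// (or where one playlist has a segment the other lacks). Audio segments end on AAC
// frame boundaries, so they rarely land exactly on 4.000 s; the 0.05 s default is an
// arbitrary tolerance, so tune it for your content.
function misalignedSegments(playlistA, playlistB, toleranceSec = 0.05) {
  const a = segmentStarts(playlistA);
  const b = segmentStarts(playlistB);
  const bad = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] === undefined || b[i] === undefined || Math.abs(a[i] - b[i]) > toleranceSec) {
      bad.push(i);
    }
  }
  return bad;
}
```

For example, pass readFileSync("video/720p.m3u8", "utf8") and readFileSync("audio/es.m3u8", "utf8") from node:fs. An empty array means the boundaries line up within the tolerance.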

3. Debug the "zero audio tracks" bug

The most common failure: player loads, video plays with sound, but the switcher is empty.

hls.audioTracks → []   # 😞

This is almost always the manifest, not the player. Open master.m3u8 and check, in this order:

Are there #EXT-X-MEDIA:TYPE=AUDIO lines at all? (Packager misconfig drops them.)
Does every #EXT-X-STREAM-INF have AUDIO="aud"?
Do the GROUP-ID and the variant's AUDIO value match exactly, including case?
Is the version sane (#EXT-X-VERSION:6 or later once you ship fMP4 segments)?

Fix the manifest, and the tracks appear. You almost never fix this in JavaScript.

4. Wire the switcher (hls.js v1.7)

// player.js - hls.js 1.7.x
import Hls from "hls.js";

const video = document.querySelector("#video");
const select = document.querySelector("#audio-picker");

if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource("/stream/master.m3u8");
  hls.attachMedia(video);

  hls.on(Hls.Events.AUDIO_TRACKS_UPDATED, (_evt, data) => {
    select.innerHTML = "";
    data.audioTracks.forEach((track, i) => {
      const opt = document.createElement("option");
      opt.value = String(i);
      opt.textContent = track.name || track.lang || `Track ${i}`;
      if (track.default) opt.selected = true;
      select.appendChild(opt);
    });
  });

  hls.on(Hls.Events.AUDIO_TRACK_SWITCHED, (_evt, data) => {
    console.log("now playing audio track", data.id);
  });

  select.addEventListener("change", (e) => {
    hls.audioTrack = Number(e.target.value); // triggers the switch
  });
} else if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Safari / native HLS: the OS audio menu is driven by EXT-X-MEDIA. No JS needed.
  video.src = "/stream/master.m3u8";
}

That's the whole player side. In the native-HLS branch you write zero switching code: the browser's built-in audio menu reads your EXT-X-MEDIA tags directly.

Gotchas checklist

✅ Same segment duration across video and all audio renditions.
✅ One codec per audio group. Mixing AAC + EC-3? Use separate groups + matching variants.
✅ Video BANDWIDTH includes audio bitrate.
✅ LANGUAGE is BCP-47 (en, es, pt-BR), not freeform.
✅ Exactly one DEFAULT=YES per group.

5. Validate the manifest in CI

The "zero audio tracks" bug ships when a packaging change silently drops a group reference and no human reads the manifest. Catch it with a tiny parser in CI so a bad master playlist fails the build instead of the player.

// validate-audio-groups.mjs - node 20+
import { readFileSync } from "node:fs";

const master = readFileSync(process.argv[2], "utf8");
const lines = master.split(/\r?\n/);

const groups = new Set();
for (const l of lines) {
  if (l.startsWith("#EXT-X-MEDIA:") && l.includes("TYPE=AUDIO")) {
    const m = l.match(/GROUP-ID="([^"]+)"/);
    if (m) groups.add(m[1]);
  }
}

const errors = [];
for (let i = 0; i < lines.length; i++) {
  if (lines[i].startsWith("#EXT-X-STREAM-INF:")) {
    const m = lines[i].match(/AUDIO="([^"]+)"/);
    if (!m) errors.push(`variant missing AUDIO=: ${lines[i]}`);
    else if (!groups.has(m[1])) errors.push(`AUDIO="${m[1]}" has no matching group`);
  }
}

if (errors.length) {
  console.error("Audio group validation failed:\n" + errors.join("\n"));
  process.exit(1);
}
console.log(`OK: ${groups.size} audio group(s), all variants reference a real group`);

node validate-audio-groups.mjs dist/master.m3u8
# OK: 1 audio group(s), all variants reference a real group

Run it on every packaged output. It's a couple dozen lines, and it prevents the single most common multi-audio outage.
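The validator only checks group references. The checklist's "exactly one DEFAULT=YES per group" rule is just as easy to enforce. Here's a sketch of a second check (the function name is mine) that you could run on the same master.m3u8 and merge into the errors array:

```javascript
// Flag audio groups that don't have exactly one DEFAULT=YES rendition.
function defaultErrors(masterText) {
  const defaultsPerGroup = new Map(); // GROUP-ID -> count of DEFAULT=YES renditions
  for (const line of masterText.split(/\r?\n/)) {
    if (!line.startsWith("#EXT-X-MEDIA:") || !line.includes("TYPE=AUDIO")) continue;
    const group = line.match(/GROUP-ID="([^"]+)"/)?.[1];
    if (!group) continue;
    // Match the attribute itself, not some other attribute that happens to end in DEFAULT.
    const isDefault = /(?:^|[:,])DEFAULT=YES(?:,|$)/.test(line);
    defaultsPerGroup.set(group, (defaultsPerGroup.get(group) ?? 0) + (isDefault ? 1 : 0));
  }
  const errors = [];
  for (const [group, count] of defaultsPerGroup) {
    if (count !== 1) {
      errors.push(`group "${group}" has ${count} DEFAULT=YES renditions (want exactly 1)`);
    }
  }
  return errors;
}
```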

6. fMP4 vs TS, and default-track behavior

Two things that trip people once the basics work:

Container: prefer fMP4 (CMAF) audio segments over legacy MPEG-TS. fMP4 plays cleaner across MSE browsers, is required for some codecs, and lets you share segments with a DASH manifest later. The Shaka command above already emits fMP4.
Default selection across browsers: DEFAULT=YES plus AUTOSELECT=YES is what most players honor, but Safari weighs the OS language against AUTOSELECT tracks first. If a Spanish-locale device keeps starting on Spanish even when you expect English, that is AUTOSELECT doing its job. Set AUTOSELECT=NO on tracks you never want auto-picked.

💡 Tip: persist the viewer's last chosen track (in app state, not the manifest) and re-apply it on the next load by setting hls.audioTrack after MANIFEST_PARSED. HLS has no concept of "remember my language"; that is your job.
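That tip can be sketched like this, assuming you store the viewer's choice as a BCP-47 tag in localStorage. pickAudioTrack and the "audioLang" key are my names, not hls.js API. The commented wiring uses hls.audioTracks, hls.audioTrack, and the MANIFEST_PARSED / AUDIO_TRACK_SWITCHED events.

```javascript
// Pick which hls.audioTracks index to apply for a saved language such as "es" or "pt-BR".
// Exact tag match first, then primary subtag ("pt" matches "pt-BR"). Returns -1 if nothing fits.
function pickAudioTrack(audioTracks, savedLang) {
  if (!savedLang) return -1;
  const want = savedLang.toLowerCase();
  const exact = audioTracks.findIndex((t) => (t.lang ?? "").toLowerCase() === want);
  if (exact !== -1) return exact;
  const primary = want.split("-")[0];
  return audioTracks.findIndex((t) => (t.lang ?? "").toLowerCase().split("-")[0] === primary);
}

// Browser wiring (hls.js); the storage key "audioLang" is an arbitrary choice:
// hls.on(Hls.Events.MANIFEST_PARSED, () => {
//   const i = pickAudioTrack(hls.audioTracks, localStorage.getItem("audioLang"));
//   if (i !== -1) hls.audioTrack = i;
// });
// hls.on(Hls.Events.AUDIO_TRACK_SWITCHED, () => {
//   localStorage.setItem("audioLang", hls.audioTracks[hls.audioTrack]?.lang ?? "");
// });
```

Falling back to the primary subtag means a viewer who picked "pt" on one title still gets the pt-BR track on another.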

7. Troubleshooting table

Symptom	Likely cause	Fix
`audioTracks` is empty	Variant missing `AUDIO=` or group-id mismatch	Re-read master playlist; run the CI validator
Audio drifts out of sync after minutes	Audio/video segment durations differ	Re-segment all renditions with one `--segment_duration`
Switch works but audio cuts out briefly	Init segment reload on switch	Update to hls.js 1.7+ (smoother audio-track switching)
Track shows but won't play in one browser	Mixed codecs in one group	Split codecs into separate groups + matching variants
Wrong default language on some devices	`AUTOSELECT=YES` + OS locale	Set `AUTOSELECT=NO` on non-default tracks

What's next

Add a surround track as a second group (hls_group_id=aud-surround, CHANNELS="6") and reference it from higher-bitrate variants.
Extend the CI validator to check LANGUAGE codes and DEFAULT=YES counts too, so the whole gotchas checklist is enforced, not just group references.
Layer WebVTT subtitle groups the same way (TYPE=SUBTITLES); the grouping model is identical.
Bump #EXT-X-VERSION to at least 6 once you use fMP4 segments; older version numbers with modern features are a common source of "plays in Safari, breaks in hls.js" reports.
If you serve the same content as DASH too, generate the audio AdaptationSets from the same fMP4 segments so you maintain one set of media files for both manifests.

Benchmark NVENC vs CPU transcoding (and find your real break-even) with FFmpeg

Mason K — Mon, 06 Jul 2026 06:42:30 +0000


TL;DR

A GPU encodes faster than a CPU, but "faster" and "cheaper" are different claims. We'll build a small FFmpeg + VMAF harness that times software (libx264/SVT-AV1) against hardware (h264_nvenc/av1_nvenc), then plug the results into a dollars-per-encoded-minute formula so you find your break-even instead of trusting a benchmark blog.

We're using FFmpeg 7.1.x and an NVIDIA GPU with NVENC. The same approach works for Intel QSV (*_qsv) and AMD AMF (*_amf) if you swap the encoder names.

Why this isn't obvious

NVENC is a fixed-function hardware block, not "the GPU doing x264 in parallel." It's extremely fast and barely touches the CPU, but it exposes fewer rate-control knobs and gives up a little compression efficiency versus a slow software preset. The gap has narrowed a lot, but it's still there at the quality-obsessed end.

So the decision is per-job, and it comes down to one number: dollars per encoded minute = (instance $/hr) ÷ (minutes encoded/hr). GPU instances cost more per hour but encode many streams in parallel, so the answer depends on whether you can keep the encoder saturated.

Let's measure instead of argue.

1. Set up the encoders

Three contenders. One representative source file (use real footage, not a synthetic clip).

# software H.264, quality-leaning preset
ffmpeg -y -i source.mp4 -c:v libx264 -preset slow -crf 21 -an out_cpu.mp4

# NVENC H.264, quality-tuned
ffmpeg -y -hwaccel cuda -i source.mp4 -c:v h264_nvenc -preset p6 -tune hq \
  -rc vbr -cq 23 -an out_gpu.mp4

# AV1: software (SVT-AV1) vs hardware (needs Ada / RTX 40+)
ffmpeg -y -i source.mp4 -c:v libsvtav1 -preset 6 -crf 30 -an out_svtav1.mp4
ffmpeg -y -hwaccel cuda -i source.mp4 -c:v av1_nvenc -preset p5 -cq 30 -an out_av1nvenc.mp4

💡 Tip: NVENC presets run from -preset p1 (fastest) to -preset p7 (slowest, highest quality). p6/p7 is where NVENC competes on quality; p1-p3 is where it competes on raw throughput.

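If av1_nvenc fails, read the error. "Unknown encoder 'av1_nvenc'" means your FFmpeg build lacks it. A "Cannot load" error for the NVENC library (nvEncodeAPI on Windows) usually points at the driver. "No capable devices found" typically means your GPU is pre-Ada and has no AV1 encode block. If it's the hardware, that's fine: drop it from the comparison.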

2. Time each encode

#!/usr/bin/env bash
# bench.sh - time an encode and print seconds (needs GNU date for %N, plus bc)
set -euo pipefail
label="$1"; shift
start=$(date +%s.%N)
"$@" >/dev/null 2>&1
end=$(date +%s.%N)
printf "%s\t%.2fs\n" "$label" "$(echo "$end - $start" | bc)"

chmod +x bench.sh
./bench.sh cpu      ffmpeg -y -i source.mp4 -c:v libx264 -preset slow -crf 21 -an out_cpu.mp4
./bench.sh nvenc    ffmpeg -y -hwaccel cuda -i source.mp4 -c:v h264_nvenc -preset p6 -tune hq -rc vbr -cq 23 -an out_gpu.mp4

Example output (your numbers will differ):

cpu     63.40s
nvenc   18.10s

Faster, clearly. But don't migrate yet: we haven't checked quality or cost.

3. Measure quality with VMAF (not bitrate)

Bitrate alone tells you nothing about whether the picture held up. Use libvmaf:

ffmpeg -i out_gpu.mp4 -i source.mp4 \
  -lavfi "[0:v]setpts=PTS-STARTPTS[dist];[1:v]setpts=PTS-STARTPTS[ref];[dist][ref]libvmaf=log_fmt=json:log_path=vmaf_gpu.json" \
  -f null -

python3 -c "import json;print('VMAF', json.load(open('vmaf_gpu.json'))['pooled_metrics']['vmaf']['mean'])"

Run it for each output. You'll typically see software at a slow preset edge out NVENC by a few VMAF points at a similar file size, and the two converge as bitrate climbs. A few points of VMAF may not matter for streaming; it absolutely matters for an archival master.
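Running VMAF on every output leaves you with a pile of JSON files. A small helper that puts pooled VMAF next to the bitrate each encode actually cost makes the comparison readable. This sketch assumes the libvmaf JSON layout used above (pooled_metrics.vmaf.mean and .min), and that you supply the output file size and clip duration yourself.

```javascript
// vmaf-summary sketch: one row per encode, quality next to the bitrate it cost.
function summarizeEncode(label, vmafLog, fileBytes, durationSec) {
  const vmaf = vmafLog.pooled_metrics.vmaf;
  const kbps = (fileBytes * 8) / durationSec / 1000;
  return {
    label,
    vmafMean: Number(vmaf.mean.toFixed(2)),
    vmafMin: Number(vmaf.min.toFixed(2)), // worst frame: catches a bad scene the mean hides
    kbps: Math.round(kbps),
  };
}

// Usage in Node (file names from the commands above; 60 s is a placeholder duration):
// import { readFileSync, statSync } from "node:fs";
// console.table([
//   summarizeEncode("nvenc", JSON.parse(readFileSync("vmaf_gpu.json", "utf8")),
//                   statSync("out_gpu.mp4").size, 60),
// ]);
```

Comparing at equal kbps is what keeps this honest. A higher VMAF at twice the bitrate isn't a win.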

4. The break-even math

Now put it together. Say:

CPU instance: $0.20/hr, encodes 1 stream → ~57 min/hr of this content.
GPU instance: $0.55/hr, encodes ~8 of these streams in parallel → ~456 min/hr.

# breakeven.py
cpu = {"cost_per_hr": 0.20, "minutes_per_hr": 57}
gpu = {"cost_per_hr": 0.55, "minutes_per_hr": 456}

for name, m in {"cpu": cpu, "gpu": gpu}.items():
    print(name, "$/encoded-min =", round(m["cost_per_hr"] / m["minutes_per_hr"], 5))

cpu $/encoded-min = 0.00351
gpu $/encoded-min = 0.00121

The GPU wins here, but only because it's saturated (8 parallel streams). Drop the GPU's minutes_per_hr to 57 (a mostly idle GPU) and it costs about $0.00965 per encoded minute, nearly three times the CPU's $0.00351. That's the whole lesson: the GPU is cheaper per minute only when you actually feed it.

⚠️ Note: these instance numbers are placeholders. Plug in your cloud's real prices and your measured parallel-stream count. The break-even is yours, not mine.
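You can also run the formula backwards and ask at what throughput the GPU stops being the expensive option. Here's a sketch using the placeholder prices above. Swap in your own.

```javascript
// breakeven sketch: the $/encoded-minute formula, solved for the GPU's break-even throughput.
const costPerEncodedMinute = (costPerHr, minutesPerHr) => costPerHr / minutesPerHr;

// GPU minutes/hr at which it costs exactly what the CPU does per encoded minute:
//   gpuCost / m = cpuCost / cpuMinutes  =>  m = gpuCost * cpuMinutes / cpuCost
function gpuBreakEvenMinutesPerHr(cpu, gpu) {
  return (gpu.costPerHr * cpu.minutesPerHr) / cpu.costPerHr;
}

const cpu = { costPerHr: 0.20, minutesPerHr: 57 };  // placeholder prices from the example
const gpu = { costPerHr: 0.55, minutesPerHr: 456 };

console.log("cpu $/encoded-min", costPerEncodedMinute(cpu.costPerHr, cpu.minutesPerHr).toFixed(5));
console.log("gpu $/encoded-min", costPerEncodedMinute(gpu.costPerHr, gpu.minutesPerHr).toFixed(5));
console.log("gpu break-even min/hr", gpuBreakEvenMinutesPerHr(cpu, gpu).toFixed(2));
```

With those placeholders, break-even is 156.75 GPU-minutes per hour, 2.75 times what the single CPU stream delivers. Below that, the CPU is cheaper per encoded minute.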

5. Saturate the GPU (or the math lies)

The break-even above assumed 8 parallel streams. That number is not free; you have to actually run encodes in parallel to hit it, because one ffmpeg process will not max out a modern encoder block on its own. Measure your real parallel ceiling:

#!/usr/bin/env bash
# parallel-throughput.sh - how many concurrent NVENC encodes before it degrades
set -euo pipefail
N="${1:-8}"
pids=()
start=$(date +%s.%N)
for i in $(seq 1 "$N"); do
  ffmpeg -y -hwaccel cuda -i source.mp4 -c:v h264_nvenc -preset p5 \
    -rc vbr -cq 23 -an "out_$i.mp4" >/dev/null 2>&1 &
  pids+=("$!")
done
# wait on each PID so a failed encode (e.g. a driver session cap) fails the run
for pid in "${pids[@]}"; do wait "$pid"; done
end=$(date +%s.%N)
echo "encoded ${N} streams in $(echo "$end - $start" | bc)s"

./parallel-throughput.sh 8
# encoded 8 streams in 22.40s

Watch nvidia-smi dmon while it runs. Its enc column shows encoder utilization, which tells you whether you are actually feeding the block or just queuing. If throughput stops improving at, say, 6 streams, then 6, not 8, is your real divisor in the cost formula. Consumer GeForce cards also cap concurrent NVENC sessions in the driver, so reaching high parallelism at all may require a professional or datacenter card.
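To turn a parallel run into the minutes_per_hr the cost formula needs, scale the source duration by the stream count and the wall time. Here's a sketch, assuming you know source.mp4's duration in minutes (ffprobe will tell you) and you run the script at a few values of N:

```javascript
// throughput sketch: parallel-throughput.sh runs -> minutes_per_hr for the cost formula.
// sourceMinutes: duration of source.mp4; wallSeconds: the "encoded N streams in Xs" figure.
function minutesPerHour(streams, sourceMinutes, wallSeconds) {
  return (streams * sourceMinutes * 3600) / wallSeconds;
}

// Keep the N with the highest throughput; ties keep the smaller N (fewer sessions, same output).
function bestParallelism(runs /* [{ streams, wallSeconds }] */, sourceMinutes) {
  let best = null;
  for (const run of runs) {
    const mph = minutesPerHour(run.streams, sourceMinutes, run.wallSeconds);
    if (!best || mph > best.minutesPerHr) best = { streams: run.streams, minutesPerHr: mph };
  }
  return best;
}
```

Feed the winning minutesPerHr straight into the cost formula instead of the theoretical 8-stream figure.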

6. Rate control matters more than preset

For a transcoding pipeline you usually want consistent quality at a predictable size, which means the rate-control mode is a bigger lever than the preset:

Mode	NVENC flag	Use when
Constant quality	`-rc vbr -cq N`	VOD, you want stable quality, variable size
Capped CRF-like	`-rc vbr -cq N -maxrate M -bufsize B` (B ≈ 2×M)	ABR ladder rungs with a bitrate ceiling
Constant bitrate	`-rc cbr -b:v M`	Live, fixed-bandwidth delivery

# ABR rung: quality target with a hard ceiling so the rung stays in its lane
ffmpeg -y -hwaccel cuda -i source.mp4 -c:v h264_nvenc -preset p6 -tune hq \
  -rc vbr -cq 23 -maxrate 4M -bufsize 8M -an rung_1080.mp4

⚠️ Note: NVENC's -cq is not the same scale as x264's -crf. Do not copy a CRF value across; pick -cq by measuring VMAF on your own footage.
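In practice, that means a small sweep: encode at a few -cq values, measure VMAF for each, and keep the cheapest one that clears your bar. Here's a sketch of the selection step. It assumes you've already collected cq, VMAF, and kbps per test encode, and the VMAF target is your call.

```javascript
// pick-cq sketch: choose -cq by measurement instead of copying a CRF number.
function pickCq(results /* [{ cq, vmaf, kbps }] */, vmafTarget) {
  const passing = results.filter((r) => r.vmaf >= vmafTarget);
  if (passing.length === 0) return null; // nothing hit the bar: try lower -cq values
  // Cheapest passing encode wins; at equal bitrate prefer the higher VMAF.
  passing.sort((a, b) => a.kbps - b.kbps || b.vmaf - a.vmaf);
  return passing[0].cq;
}
```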

Two more flags claw back most of the remaining quality gap with software, at almost no throughput cost: enable B-frames and a lookahead window so the encoder can plan bit allocation across upcoming frames.

ffmpeg -y -hwaccel cuda -i source.mp4 -c:v h264_nvenc -preset p6 -tune hq \
  -rc vbr -cq 23 -bf 3 -rc-lookahead 20 -spatial-aq 1 -an out_tuned.mp4

Re-run your VMAF check after adding these. On a lot of content they lift the score a point or two at little cost, which can be the difference that makes NVENC "good enough" for your quality bar.

When to pick which

Workload	Pick	Why
High-volume / near-real-time / live	NVENC	Throughput dominates; keeps the block saturated
Large catalog, steady upload firehose	NVENC	Cost-per-minute wins when utilized
Archival / mastering ladders	Software (x264/x265)	Quality-per-bit compounds forever
AV1 for storage savings	SVT-AV1 (software)	Currently beats av1_nvenc on efficiency
Spiky, low-volume jobs	Software	Idle GPU is wasted money

What's next

Wire the timing + VMAF steps into CI so a preset change can't silently tank quality or throughput.
Try QSV (h264_qsv, av1_qsv on Arc) if you're on Intel; the same harness works.
Measure with footage that matches your real catalog; grain, motion, and resolution all move the numbers.

5 video APIs compared on what's included before you pay extra (2026)

Mason K — Mon, 06 Jul 2026 06:42:07 +0000


TL;DR

The per-minute delivery rate is the easiest number to compare and the least useful. The real cost lives in encoding, analytics, and the player. This post compares Mux, Cloudflare Stream, api.video, FastPix, and AWS on what each includes by default, then gives you a tiny script to benchmark upload and time-to-ready on your own files so you stop trusting marketing pages.

I have shipped video on four managed APIs across three jobs, and every single time the invoice surprised someone. Not because the delivery rate was wrong, but because encoding, analytics, and the player turned out to be separate line items on some platforms and free on others. Let's compare the parts that don't show up in the headline number.

⚠️ Note: pricing pages move. Everything here was checked in June 2026; verify the links before quoting numbers.

1. Encoding: free or metered?

This is the widest spread in the whole comparison.

Platform	Encoding	Delivery	Storage
Cloudflare Stream	Free	$1 / 1,000 min delivered	$5 / 1,000 min stored
api.video	Free (unlimited)	$0.0017 / min	$0.00285 / min
FastPix	Free on standard plan	~$0.00096 / min @1080p	Per-minute, tiered
Mux	Metered per minute	Per minute	Per minute
AWS (DIY)	Per minute (MediaConvert)	Per GB (CloudFront)	Per GB (S3)

If your catalog is upload-heavy (lots of assets encoded once, watched rarely), metered encoding is not a rounding error. It can flip which platform is cheapest, even when the delivery rates look identical.
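To see how metered encoding changes the ranking, put all three line items into one monthly number. Here's a sketch with a hypothetical upload-heavy month. The Cloudflare Stream rates come from the table, with storage treated as a monthly charge. For a platform with metered encoding, you'd fill encodePerMin from its pricing page.

```javascript
// monthly-cost sketch: encoding + storage + delivery as one number per platform.
// All rates are dollars per minute; pricing pages move, so re-check them.
function monthlyCost(usage, rates) {
  return (
    usage.encodedMinutes * rates.encodePerMin +
    usage.storedMinutes * rates.storePerMin +
    usage.deliveredMinutes * rates.deliverPerMin
  );
}

// Hypothetical upload-heavy month: lots encoded, comparatively little watched.
const usage = { encodedMinutes: 20_000, storedMinutes: 20_000, deliveredMinutes: 40_000 };

// Cloudflare Stream from the table: free encoding, $5 / 1,000 min stored, $1 / 1,000 min delivered.
const stream = { encodePerMin: 0, storePerMin: 5 / 1000, deliverPerMin: 1 / 1000 };

console.log("Cloudflare Stream, hypothetical month: $" + monthlyCost(usage, stream).toFixed(2));
```

Run the same usage object against each platform's rates. Because encodedMinutes is large in an upload-heavy month, even a small per-minute encoding rate can outweigh a cheaper delivery rate.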

2. Analytics: included or a $499 floor?

QoE analytics is the feature teams forget to price until playback breaks in production.

Platform	QoE analytics	Entry cost
FastPix (Video Data)	Session-level, 50+ signals/session	Free up to 100K views/month
Mux (Mux Data)	Mature, broad device SDKs	$499/month (Media plan, 1M views, +$0.50/1K)
Cloudflare Stream	Basic	Included, limited depth
api.video	Available	Usage-based
AWS	Build it yourself (CloudWatch + logs)	Engineering time

Honest call: Mux Data has broader data-SDK coverage (Roku, smart TVs, and more). If your playback lives on ten device types, that breadth is worth paying for. If it's web + mobile and you want diagnostics without a monthly floor, the free-up-to-100K option wins on cost.
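The Mux Data floor is easy to model from the numbers in the table. This sketch assumes overage is prorated per view rather than billed in whole 1,000-view blocks. Confirm that on the pricing page before you rely on it.

```javascript
// analytics-cost sketch: Mux Data Media plan shape from the table above.
// $499/month includes 1M views, then $0.50 per additional 1,000 views (prorated here).
function muxDataMonthly(views) {
  const includedViews = 1_000_000;
  const overageViews = Math.max(0, views - includedViews);
  return 499 + (overageViews / 1000) * 0.5;
}
```

At 250,000 views you pay the $499 floor; at 1.5M views it's $749. FastPix's free tier covers up to 100K views, but its price above that isn't in the table, so it isn't modeled here.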

3. The player

Not every "video API" ships a player. AWS does not. The rest mostly do.

FastPix: programmable player for web, iOS, Android, included, not separately licensed. Pipes telemetry into Video Data.
Mux: Mux Player included with streaming.
Cloudflare: Stream Player included.
AWS: bring your own, instrument it yourself.

The thing to check isn't whether a player exists. It's whether using it locks your analytics to that vendor.

4. Benchmark your own files (don't trust the table)

Speed depends on your media. Here's a minimal harness to measure upload + time-to-ready yourself instead of believing anyone's published numbers, including mine.

#!/usr/bin/env bash
# measure-upload.sh - time a direct upload to any signed URL (needs GNU date and bc)
set -euo pipefail
FILE="$1"          # e.g. sample.mp4
UPLOAD_URL="$2"    # signed/direct upload URL from your provider

start=$(date +%s.%N)
# -f makes curl exit non-zero on HTTP errors, so a rejected upload isn't timed as a success
curl -fsS -X PUT --upload-file "$FILE" "$UPLOAD_URL" \
  -H "Content-Type: video/mp4" -o /dev/null
end=$(date +%s.%N)

echo "upload_seconds=$(echo "$end - $start" | bc)"

Then poll for readiness so you get an apples-to-apples time-to-ready:

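// poll-ready.js - node 20+, measures time from "processing" to "ready"
// timeoutMs default (15 min) is arbitrary; raise it for long files
async function waitForReady(statusUrl, headers, { intervalMs = 1000, timeoutMs = 15 * 60 * 1000 } = {}) {
  const started = Date.now(); // start the clock when polling starts, not at module load
  while (Date.now() - started < timeoutMs) {
    const res = await fetch(statusUrl, { headers });
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
    const { status } = await res.json(); // normalize per provider's shape
    if (status === "ready") {
      const seconds = (Date.now() - started) / 1000;
      console.log(`time_to_ready_s=${seconds.toFixed(1)}`);
      return seconds;
    }
    if (status === "errored") throw new Error("processing failed");
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`not ready after ${timeoutMs / 1000}s`);
}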

Run the same file against two providers, throttle your network (Chrome DevTools or tc) to simulate real users, and you'll get numbers that mean something for your app.

For reference, a public benchmark suite measured a 177.2 MB file over 4G and FastPix uploaded in 15.2s vs Mux's 47.7s, with time-to-ready 29.4s vs 53.3s. But on a smaller 64.9 MB file, Cloudinary won the whole test and FastPix placed fifth. Different file sizes, different winners. That's exactly why you benchmark your own media.

5. Record results in a format you can defend

Run the harness, then drop the numbers into a table you can hand to whoever signs off. Measuring three things per provider, on the same file and the same throttled network, is enough to make a real decision:

Provider	Upload (s)	Time-to-ready (s)	Cold start (ms)	Notes
A
B

Fill it in yourself. The point of the table is not the absolute numbers, which depend on your network and file, but that every row used identical conditions, so the comparison is honest.
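Single runs are noisy, especially on a throttled network. One way to keep the table defensible is to repeat each measurement a few times and record the median. Here's a sketch that produces a tab-separated row in the table's column order. The input shape is an assumption, so adapt it to whatever your harness logs.

```javascript
// results-row sketch: repeat each measurement and record the median, not one lucky run.
function median(values) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Provider, Upload (s), Time-to-ready (s), Cold start (ms), Notes - tab-separated.
function resultsRow(provider, runs /* [{ uploadS, readyS, coldStartMs }] */, notes = "") {
  const cells = [
    provider,
    median(runs.map((r) => r.uploadS)).toFixed(1),
    median(runs.map((r) => r.readyS)).toFixed(1),
    Math.round(median(runs.map((r) => r.coldStartMs))),
    notes,
  ];
  return cells.join("\t");
}
```

The median ignores one outlier in either direction, which matters when a single retry or a slow CDN edge can double one run's number.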

6. The line item nobody benchmarks: getting out

Lock-in is a cost too, and it never shows up in a pricing table. Before you commit, check two things. First, can you export your originals, or only the transcoded renditions? If a provider only hands back the encoded outputs, re-platforming later means re-encoding from lossy sources. Second, is there a migration path in? Some platforms ship a batch migration tool that pulls a library in from another provider; others leave you to script transfers by hand, which is a project of its own.

The same coupling applies to analytics. If your playback telemetry is wired to a vendor's player SDK, switching players means losing historical QoE continuity. None of this should necessarily change your pick, but it should be a row in your spreadsheet next to price, because the cheapest platform to adopt is not always the cheapest one to leave.

What I'd pick

Early-stage, analytics matters, larger files: FastPix first. The Startup Program ($600 in credits for teams under 4 years and under $10M raised, more for YC/VC-backed) makes the trial nearly free.
Already on Cloudflare: Stream. Flat per-minute, free encoding, one less vendor.
Enterprise control, you have the DevOps: AWS.
Just want free encoding and clean PAYG: api.video.

One caveat across all of them: these are API-first, not no-code CMSes. If your team wants drag-and-drop with zero integration work, you'll be building a front-end on top of any of these.

What's next

Wire the benchmark harness into CI so a regression in upload time shows up before your users notice.
Read the pricing pages and the docs side by side: FastPix, Mux, Cloudflare Stream, api.video.
The same upload + poll-for-ready pattern works against every provider here, so your harness is portable if you switch.
Most of these offer free credits to start (FastPix gives $25 on signup, more through its Startup Program), so you can run the benchmark on a real account before you commit a card to anyone.


Build a video watch-time heatmap: interval tracker, beacon endpoint, canvas render

TL;DR

1. The interval tracker 🎯

2. The collect endpoint + per-second counters 🗄️

3. Draw it 🎨

4. Production notes 📋

What's next 🚀

Turn on AV1 film grain synthesis and measure what it saves on your own footage

TL;DR

1. Check your toolchain 🛠️

2. Encode the baseline 📼

3. Encode with film grain synthesis 🎞️

4. Compare the numbers 📊

5. Verify with your eyes, not a metric 👀

6. Ladder notes 🪜

7. Automate the sweep 🔁

What's next 🚀

Ship your first HLS interstitial: one DATERANGE tag, an asset list, and hls.js 1.6

TL;DR

1. What you need 🛠️

2. Schedule the interstitial in the playlist 📼

3. Serve the asset list 🧾

4. Wire up hls.js 🎬

5. Test the seams, not the happy path ✅

What's next 🚀

Build an AI dubbing pipeline: faster-whisper + XTTS-v2 + FFmpeg

TL;DR

0. Setup and a licensing warning ⚠️

1. Extract audio and transcribe with word timestamps 🎙️

2. Translate with a length budget 🌍

3. Clone the voice and synthesize 🗣️

4. The boss fight: fitting audio back into time slots ⏱️

5. Ship it as a track, not a fork 📦

Things that will bite you 🧾

What's next

Probing FFmpeg's av1_vulkan encoder: does your GPU actually support it?

TL;DR

1. Check what you're running

2. Probe the driver for AV1 encode 🔍

3. First encode: synthetic, then real

4. Benchmark against SVT-AV1 on YOUR content 📊

5. Which ladder rungs actually move

Running it in containers 🐳

Common failures and fixes

What's next

Ship a 'Go Live' button: OBS in, LL-HLS out, webhooks in between

TL;DR

1. Create the stream server-side 🛠️

2. Point OBS at it 🎥

3. Play it with hls.js 📺

4. Webhooks drive the UI, not polling 🔔

Test the whole loop end to end ✅

Production checklist before real customers 📋

What's next

Playing HLS through Managed Media Source on iPhone with hls.js

TL;DR

What we're building

1. Why the old approach fell short

2. Feature-detect Managed Media Source

3. The hls.js path (what you'll ship)

4. The raw MMS example (so you understand the library)

Difference 1: attach via a child <source>, not an object URL

Difference 2: only fetch between startstreaming and endstreaming

Difference 3: the buffer can be evicted under you

5. Test it on a real device

6. Confirm which path actually ran

When you should NOT bother

What's next

Force keyframe-aligned GOPs across an ABR ladder with FFmpeg

TL;DR

The bug

1. Pick one cadence for the whole ladder

2. The naive command that causes the bug

3. Force fixed, closed GOPs (x264)

4. NVENC is different

5. Verify with ffprobe

6. Verify with MP4Box and wire it into CI

7. Audio is the other half of alignment

The cost (be honest)
