DEV Community

Mason K

Shipping WebVTT subtitles in HLS that actually stay in sync (a hands-on guide for 2026)

📦 Code: github.com/USER/hls-webvtt-pipeline (replace before publishing)

TL;DR

We are going to build a real HLS subtitle pipeline: take an SRT file, convert it to WebVTT, segment it the way HLS expects, add the X-TIMESTAMP-MAP header to each segment, declare the track in the manifest, and verify that it plays in sync in HLS.js 1.6.16 and Shaka Player. By the end you will have a working video ladder plus a subtitle track, and the confidence that it will not drift on seek.

If you have shipped HLS captions as a single sidecar .vtt and called it done, this is the upgrade you have been putting off. The bug it fixes (caption drift after seek, especially on Chrome on Android) is the kind of bug that gets reported as "the subtitles are wrong", with no useful repro steps, by users who never come back. Let's build the version that does not break.

🛠️ 1. The pieces we need

  • FFmpeg 8.1.1 (webvtt muxer; older versions work but the muxer behavior is friendlier in 8.x).
  • A packager. We will use shaka-packager because it has the cleanest support for segmented WebVTT and fMP4-wrapped subtitles. mp4box works too.
  • HLS.js 1.6.16 (latest as of April 2026) and Shaka Player 4.x for verification.
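  • A short test video and an SRT file. The 10-second clip and three-cue SRT below are enough for a smoke test; for real seek testing, use 1 to 2 minutes of video and at least a dozen cues.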
# bash
mkdir hls-webvtt && cd hls-webvtt
mkdir source segments manifest

# A short test video
curl -L -o source/master.mp4 \
  https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/720/Big_Buck_Bunny_720_10s_5MB.mp4

# A test SRT
cat > source/captions.srt <<'SRT'
1
00:00:00,000 --> 00:00:03,500
Hello, and welcome to the show.

2
00:00:03,500 --> 00:00:07,000
Today we are talking about HLS captions.

3
00:00:07,000 --> 00:00:10,000
And why most pipelines get them wrong.
SRT

2. SRT to WebVTT (the easy part)

FFmpeg's webvtt muxer converts SRT for you. The conversion itself is mostly a header swap, but it also normalizes line breaks and rewrites timestamps (SRT separates milliseconds with a comma, WebVTT with a dot):

# bash
ffmpeg -i source/captions.srt -c:s webvtt source/captions.vtt
cat source/captions.vtt

# source/captions.vtt
WEBVTT

1
00:00:00.000 --> 00:00:03.500
Hello, and welcome to the show.

2
00:00:03.500 --> 00:00:07.000
Today we are talking about HLS captions.

3
00:00:07.000 --> 00:00:10.000
And why most pipelines get them wrong.

This file works as a sidecar in HLS. It is also the source of every cross-browser sync bug you are about to hit. We need to segment it and add the timestamp map.
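To make the "header swap" from this section concrete, here is a minimal sketch of the conversion in plain JavaScript. It is an illustration only, not how FFmpeg implements it; the function name srtToVtt is made up for this example, and it ignores SRT styling tags and positioning.

```javascript
// Minimal SRT -> WebVTT sketch (illustrative; not FFmpeg's implementation).
// srtToVtt is a hypothetical helper name used only in this example.
function srtToVtt(srt) {
  const lines = srt
    .replace(/^\uFEFF/, '')   // drop a byte order mark if present
    .replace(/\r\n?/g, '\n')  // normalize CRLF / CR line endings to LF
    .trim()
    .split('\n');

  const converted = lines.map((line) =>
    // Only touch timing lines: SRT uses a comma before the milliseconds,
    // WebVTT requires a dot.
    line.includes(' --> ')
      ? line.replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2')
      : line
  );

  // A WebVTT file starts with the WEBVTT signature followed by a blank line.
  return `WEBVTT\n\n${converted.join('\n')}\n`;
}

console.log(srtToVtt('1\r\n00:00:00,000 --> 00:00:03,500\r\nHello, and welcome to the show.\r\n'));
```

The timing-line check keeps commas in the cue text ("Hello, and welcome") untouched.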

✂️ 3. Segmenting WebVTT for HLS

HLS expects WebVTT split into segments that align with the media segments. With six-second video segments, the subtitles should be on six-second segments too. The packager handles this.

First, build the video ladder. Two renditions are enough for the demo:

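# bash
# -g 144 pins a keyframe every 6 s only for a 24 fps source (6 x 24 = 144).
# Scale it to your frame rate so segments can be cut on 6 s boundaries.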
ffmpeg -i source/master.mp4 -c:v libx264 -preset medium -crf 23 \
  -vf scale=1280:720 -g 144 -keyint_min 144 -sc_threshold 0 \
  -hls_time 6 -hls_playlist_type vod \
  -hls_segment_filename "segments/720p_%03d.ts" \
  manifest/720p.m3u8

ffmpeg -i source/master.mp4 -c:v libx264 -preset medium -crf 26 \
  -vf scale=854:480 -g 144 -keyint_min 144 -sc_threshold 0 \
  -hls_time 6 -hls_playlist_type vod \
  -hls_segment_filename "segments/480p_%03d.ts" \
  manifest/480p.m3u8

Now segment the subtitles. Shaka Packager does this with a single descriptor:

# bash
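# Single quotes stop the shell from expanding $Number$ in the descriptor.
# The image has no entrypoint, so name the packager binary explicitly.
# If en.m3u8 lands under manifest/manifest/, drop the manifest/ prefix from
# playlist_name (it is resolved relative to the master playlist directory).
docker run --rm -v "$PWD:/work" -w /work \
  google/shaka-packager packager \
  'in=source/captions.vtt,stream=text,language=en,segment_template=segments/subs/en_$Number$.vtt,playlist_name=manifest/subs/en.m3u8,hls_group_id=subs,hls_name=English' \
  --hls_master_playlist_output manifest/master.m3u8 \
  --segment_duration 6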

The result is a directory of per-six-second WebVTT files plus a media playlist (manifest/subs/en.m3u8) that references them.
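Once both playlists exist, the alignment requirement from the start of this section can be checked mechanically. The sketch below compares segment start times derived from the EXTINF durations of a video playlist and a subtitle playlist. The helper names and the 0.1 s tolerance are assumptions made for this example.

```javascript
// Extract the EXTINF durations (seconds) from the text of a media playlist.
function extinfDurations(playlistText) {
  const durations = [];
  for (const line of playlistText.split(/\r?\n/)) {
    const match = /^#EXTINF:([0-9.]+)/.exec(line.trim());
    if (match) durations.push(Number(match[1]));
  }
  return durations;
}

// Turn a list of durations into cumulative segment start times.
function segmentStarts(durations) {
  let elapsed = 0;
  return durations.map((duration) => {
    const start = elapsed;
    elapsed += duration;
    return start;
  });
}

// Report segments whose start times differ between the two playlists.
// toleranceSeconds is an arbitrary threshold chosen for this sketch.
function findMisalignedSegments(videoPlaylist, subsPlaylist, toleranceSeconds = 0.1) {
  const video = segmentStarts(extinfDurations(videoPlaylist));
  const subs = segmentStarts(extinfDurations(subsPlaylist));
  const problems = [];
  const count = Math.max(video.length, subs.length);
  for (let i = 0; i < count; i++) {
    if (video[i] === undefined || subs[i] === undefined) {
      problems.push({ index: i, reason: 'segment count differs' });
      break;
    }
    if (Math.abs(video[i] - subs[i]) > toleranceSeconds) {
      problems.push({ index: i, video: video[i], subs: subs[i], reason: 'start time differs' });
    }
  }
  return problems;
}
```

In Node you could feed it fs.readFileSync('manifest/720p.m3u8', 'utf8') and the subtitle playlist text; an empty array means the boundaries line up within the tolerance.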

Open one of the segment files:

# segments/subs/en_2.vtt
WEBVTT
X-TIMESTAMP-MAP=MPEGTS:540000,LOCAL:00:00:00.000

00:00:01.000 --> 00:00:04.000
And why most pipelines get them wrong.

That X-TIMESTAMP-MAP line is the whole point. Cue times inside the segment are on a local clock that starts at zero. MPEGTS:540000 tells the player that local time zero corresponds to MPEG-2 presentation timestamp (PTS) 540000, which is 6 seconds at the 90 kHz tick rate and is where this segment starts. The source cue that began at 00:00:07.000 therefore appears here at 00:00:01.000.
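The mapping is simple arithmetic, and writing it out helps when debugging. The sketch below parses an X-TIMESTAMP-MAP line and converts a cue's local time to a position on the media timeline. The helper names are invented for this example, and it ignores the 33-bit PTS wraparound that very long streams run into.

```javascript
const MPEGTS_HZ = 90000; // MPEG-2 TS presentation timestamps tick at 90 kHz

// Parse "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds.
function vttTimeToSeconds(timestamp) {
  return timestamp
    .split(':')
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Parse an X-TIMESTAMP-MAP header line into { mpegts, localSeconds }.
// The two fields may appear in either order.
function parseTimestampMap(line) {
  const mpegts = /MPEGTS:(\d+)/.exec(line);
  const local = /LOCAL:([\d:.]+)/.exec(line);
  if (!mpegts || !local) {
    throw new Error('not an X-TIMESTAMP-MAP line: ' + line);
  }
  return { mpegts: Number(mpegts[1]), localSeconds: vttTimeToSeconds(local[1]) };
}

// Map a cue time on the segment's local clock to seconds on the media timeline.
function cueToMediaSeconds(map, cueLocalSeconds) {
  return map.mpegts / MPEGTS_HZ + (cueLocalSeconds - map.localSeconds);
}

const exampleMap = parseTimestampMap('X-TIMESTAMP-MAP=MPEGTS:540000,LOCAL:00:00:00.000');
console.log(cueToMediaSeconds(exampleMap, 1.0)); // local 00:00:01.000 lands at 7 s
```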

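💡 Tip: open every segment file and confirm the X-TIMESTAMP-MAP is present and correct. Some packagers only emit it on the first segment. The HLS spec (RFC 8216) tolerates a missing header by assuming that cue time 0 maps to MPEG-2 timestamp 0, which is wrong for any later segment whose cues use a segment-local clock; player implementations are less forgiving.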
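Opening every file by hand gets old quickly. A small sketch like the one below can do the presence check for you. It assumes you have already read the segment files into an object keyed by file name; segmentsMissingTimestampMap is a hypothetical helper name.

```javascript
// Given { fileName: fileText }, list the segment files whose header block
// (everything before the first blank line) has no X-TIMESTAMP-MAP line.
function segmentsMissingTimestampMap(segments) {
  const missing = [];
  for (const [name, text] of Object.entries(segments)) {
    const header = text.replace(/\r\n?/g, '\n').split('\n\n')[0];
    const hasMap = header
      .split('\n')
      .some((line) => line.startsWith('X-TIMESTAMP-MAP='));
    if (!hasMap) missing.push(name);
  }
  return missing;
}

console.log(
  segmentsMissingTimestampMap({
    'en_1.vtt': 'WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:0,LOCAL:00:00:00.000\n\n00:00:00.000 --> 00:00:03.500\nHello.\n',
    'en_2.vtt': 'WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nNo map here.\n',
  })
);
```

Only the header block is inspected, so cue text that happens to mention the header name does not count.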

🧷 4. Wiring the manifest

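Shaka Packager wrote a master playlist for us, but it only knows about the streams it packaged, which here is the subtitle track. Add the two FFmpeg renditions so the finished file looks like this: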

# manifest/master.m3u8
#EXTM3U
#EXT-X-VERSION:6

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",AUTOSELECT=YES,DEFAULT=NO,FORCED=NO,URI="subs/en.m3u8"

#EXT-X-STREAM-INF:BANDWIDTH=3200000,RESOLUTION=1280x720,SUBTITLES="subs",CODECS="avc1.640028"
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1600000,RESOLUTION=854x480,SUBTITLES="subs",CODECS="avc1.4d401f"
480p.m3u8

A few things worth pointing out:

  • SUBTITLES="subs" on every EXT-X-STREAM-INF line. Forget this and the renditions never advertise the subs.
  • AUTOSELECT=YES lets the player pick this track when the OS language matches. Set to NO if you want subs strictly opt-in.
  • DEFAULT=NO keeps subs off by default. Most product teams want this; set to YES only for forced subtitles (translated dialogue, on-screen text translation), which also carry FORCED=YES.
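The first point in that list is easy to get wrong when a master playlist is edited by hand. Here is a rough lint for it: the sketch collects the subtitle GROUP-IDs declared by EXT-X-MEDIA and returns the variant URIs whose EXT-X-STREAM-INF line does not reference one of them. The function names are invented for this example, and it assumes each variant URI sits on the line directly after its tag.

```javascript
// Parse the attribute list of an HLS tag line into a plain object.
// Quoted values may contain commas, so a simple split(',') is not enough.
function parseAttributes(line) {
  const attrs = {};
  const list = line.slice(line.indexOf(':') + 1);
  const pattern = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let match;
  while ((match = pattern.exec(list)) !== null) {
    attrs[match[1]] = match[2].replace(/^"|"$/g, '');
  }
  return attrs;
}

// Return the URIs of variants that do not point at a declared subtitle group.
function variantsMissingSubtitles(masterText) {
  const lines = masterText.split(/\r?\n/).map((line) => line.trim());
  const groups = new Set();
  const variants = [];
  lines.forEach((line, i) => {
    if (line.startsWith('#EXT-X-MEDIA:')) {
      const attrs = parseAttributes(line);
      if (attrs.TYPE === 'SUBTITLES') groups.add(attrs['GROUP-ID']);
    } else if (line.startsWith('#EXT-X-STREAM-INF:')) {
      // Assumption: the variant URI is the very next line.
      variants.push({ uri: lines[i + 1], subtitles: parseAttributes(line).SUBTITLES });
    }
  });
  return variants
    .filter((variant) => !variant.subtitles || !groups.has(variant.subtitles))
    .map((variant) => variant.uri);
}
```

An empty array means every rendition advertises the subtitle group.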

5. Serving and verifying

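Serve the whole tree with any static server. The command below makes manifest/ the web root, so every URI in the playlists has to resolve beneath it; if your playlists point at ../segments, serve from the project root instead and adjust the URLs in the harness: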

# bash
cd manifest && python3 -m http.server 8080

A minimal HTML harness using HLS.js 1.6.16:

<!-- index.html -->
<!doctype html>
<html>
  <body>
    <video id="v" controls autoplay style="width: 100%; max-width: 720px;"></video>
    <script src="https://cdn.jsdelivr.net/npm/hls.js@1.6.16/dist/hls.min.js"></script>
    <script>
      const video = document.getElementById('v');
      if (Hls.isSupported()) {
        const hls = new Hls({ enableWebVTT: true });
        hls.loadSource('http://localhost:8080/master.m3u8');
        hls.attachMedia(video);
        hls.on(Hls.Events.SUBTITLE_TRACKS_UPDATED, (_, d) =>
          console.log('subtitle tracks:', d.subtitleTracks),
        );
      } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
        // Native HLS path (Safari, iOS)
        video.src = 'http://localhost:8080/master.m3u8';
      }
    </script>
  </body>
</html>

Open the page, turn on captions with the CC button in the player UI, and confirm each cue appears at the time given in the source file. Now seek to the middle of the timeline. The caption should jump to whichever cue is active at that point, with no drift.
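Eyeballing sync works, but a logged snapshot is easier to compare across browsers. The sketch below collects the active cues from a video element's showing text tracks, along with how far the playhead sits inside each cue. It assumes the player renders captions through native TextTrack objects; snapshotActiveCues is a made-up helper name.

```javascript
// Collect the cues currently active on every showing text track of a <video>.
function snapshotActiveCues(video) {
  const rows = [];
  for (const track of Array.from(video.textTracks)) {
    if (track.mode !== 'showing' || !track.activeCues) continue;
    for (const cue of Array.from(track.activeCues)) {
      rows.push({
        track: track.label || track.language,
        start: cue.startTime,
        end: cue.endTime,
        text: cue.text,
        // Seconds between the cue's start and the current playhead.
        offset: video.currentTime - cue.startTime,
      });
    }
  }
  return rows;
}

// Browser usage (assumes the harness's <video id="v"> element):
//   const video = document.getElementById('v');
//   video.addEventListener('seeked', () => console.table(snapshotActiveCues(video)));
```

Active cues can update a moment after the seeked event fires, so an empty snapshot right after a seek is not proof of a bug; logging on timeupdate as well gives a fuller picture.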

⚠️ Note: HLS.js's light build (hls.light.min.js) does not include WebVTT parsing. Use the standard build for subtitle work.

Repeat with Shaka Player to make sure both engines agree:

<script src="https://cdn.jsdelivr.net/npm/shaka-player@4/dist/shaka-player.compiled.min.js"></script>
<script>
  shaka.polyfill.installAll();
  const player = new shaka.Player(document.getElementById('v'));
  player.load('http://localhost:8080/master.m3u8');
  player.setTextTrackVisibility(true);
</script>

If you see drift in one player but not the other, it is almost always an X-TIMESTAMP-MAP problem in your packager output. Open the failing segment, check the MPEGTS value, recompute it (segment_start_seconds * 90000), and re-segment.
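The recomputation is a one-liner, but wrapping it in a check saves squinting at numbers. This sketch assumes the convention used in this guide (LOCAL at zero, MPEGTS equal to the segment's start on the media timeline). The firstPtsOffset parameter is a hypothetical knob for streams whose first video PTS is not zero, and the function names are invented for the example.

```javascript
const PTS_HZ = 90000;     // 90 kHz MPEG-2 TS clock
const PTS_WRAP = 2 ** 33; // PTS values are 33 bits wide and wrap around

// Expected MPEGTS value for a segment that starts at segmentStartSeconds.
function expectedMpegts(segmentStartSeconds, firstPtsOffset = 0) {
  return (Math.round(segmentStartSeconds * PTS_HZ) + firstPtsOffset) % PTS_WRAP;
}

// Compare the MPEGTS value in a segment's header with the expected one.
function checkSegmentMap(segmentText, segmentStartSeconds, firstPtsOffset = 0) {
  const match = /X-TIMESTAMP-MAP=.*MPEGTS:(\d+)/.exec(segmentText);
  if (!match) return { ok: false, reason: 'X-TIMESTAMP-MAP missing' };
  const actual = Number(match[1]);
  const expected = expectedMpegts(segmentStartSeconds, firstPtsOffset);
  return actual === expected
    ? { ok: true }
    : { ok: false, reason: 'MPEGTS mismatch', actual, expected };
}

console.log(
  checkSegmentMap('WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:540000,LOCAL:00:00:00.000\n\n', 6)
);
```

If your packager keeps absolute cue times and a constant MPEGTS value instead, this check does not apply as written; compare against that convention rather than the segment start.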

🚨 6. The bugs you will probably hit

A quick checklist of what trips teams up:

| Symptom | Likely cause |
| --- | --- |
| Captions perfect at start, drift after seek | X-TIMESTAMP-MAP missing or wrong on later segments |
| Captions show up but lag by a few seconds | Subtitle segments and media segments have different durations |
| Captions work on Safari, blank on Chrome | Using HLS.js light build (no WebVTT parser) |
| Captions appear but lose positioning | Converter flattened align, line, position cue settings |
| Captions show on 720p stream but not 480p | Missing SUBTITLES="subs" on one EXT-X-STREAM-INF line |
| Low-latency live captions arrive late or never | LL-HLS part loading bug (fixed in HLS.js 1.6.x; upgrade) |

🧪 7. Going further: fMP4-wrapped subtitles

For CMAF-aligned pipelines and modern players, you can package the same WebVTT inside fMP4 segments instead of standalone .vtt files. The shaka-packager invocation becomes:

# bash (fMP4-wrapped subtitles)
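# format=webvtt+mp4 keeps the cues as WebVTT inside fMP4 (format=ttml would emit TTML instead).
# Single quotes protect $Number$; packager is the binary inside the image.
docker run --rm -v "$PWD:/work" -w /work google/shaka-packager packager \
  'in=source/captions.vtt,stream=text,language=en,format=webvtt+mp4,init_segment=segments/subs/en_init.mp4,segment_template=segments/subs/en_$Number$.m4s,playlist_name=manifest/subs/en.m3u8,hls_group_id=subs,hls_name=English' \
  --hls_master_playlist_output manifest/master.m3u8 \
  --segment_duration 6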

Modern Safari, HLS.js 1.6.x, Shaka Player 4.x, and ExoPlayer all read fMP4 subtitle tracks. Smart TVs are still a mixed bag; if your product targets connected TV, run the test matrix before flipping the switch.

What's next

A few directions worth exploring once the basic pipeline is solid:

  • IMSC ingest. Premium content arrives as IMSC 1.1. Build a converter that preserves positioning and styling on the way to WebVTT; you will need it the first time a partner sends you a .dfxp.
  • Multi-language ladders. The same EXT-X-MEDIA pattern scales to N languages; the work is in the packager invocation, not the manifest.
  • Forced subtitles. Set FORCED=YES on subtitle tracks that translate on-screen text (foreign-language signs, locked subtitles in mixed-language scenes). The player surfaces them differently.
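For the multi-language and forced-subtitle directions, generating the EXT-X-MEDIA lines from data is less error-prone than copying them by hand. The sketch below is one way to do it; the function name and option names are invented for this example, and the attribute order simply mirrors the master playlist shown earlier.

```javascript
// Build one EXT-X-MEDIA line for a subtitle rendition.
// subtitleMediaTag and its option names are hypothetical, chosen for this sketch.
function subtitleMediaTag({
  groupId = 'subs',
  name,
  language,
  uri,
  isDefault = false,
  autoselect = true,
  forced = false,
}) {
  // RFC 8216: if DEFAULT=YES, AUTOSELECT (when present) must also be YES.
  if (isDefault && !autoselect) {
    throw new Error('DEFAULT=YES requires AUTOSELECT=YES');
  }
  const yesNo = (flag) => (flag ? 'YES' : 'NO');
  return '#EXT-X-MEDIA:' + [
    'TYPE=SUBTITLES',
    `GROUP-ID="${groupId}"`,
    `NAME="${name}"`,
    `LANGUAGE="${language}"`,
    `AUTOSELECT=${yesNo(autoselect)}`,
    `DEFAULT=${yesNo(isDefault)}`,
    `FORCED=${yesNo(forced)}`,
    `URI="${uri}"`,
  ].join(',');
}

const tracks = [
  { name: 'English', language: 'en', uri: 'subs/en.m3u8' },
  { name: 'English (forced)', language: 'en', uri: 'subs/en_forced.m3u8', forced: true },
  { name: 'Español', language: 'es', uri: 'subs/es.m3u8' },
];
console.log(tracks.map(subtitleMediaTag).join('\n'));
```

The file names under subs/ are placeholders; each one still needs its own packager run to produce the segments and media playlist.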

The shortcut version of HLS subtitles is fine for a demo. The version above is what stops being a support ticket.
