Mason K

Posted on May 19

Wiring up a hybrid WebRTC + LL-HLS live stack (the protocol decision tree that actually works)

#webdev #tutorial #video #webrtc

TL;DR

We're going to build a hybrid live stack: presenters connect over WebRTC for sub-second feedback, the SFU re-publishes to RTMP, and the audience plays it back over LL-HLS a few seconds behind live (2 to 3 seconds from ingest to player on a tuned CDN). By the end you'll have a decision tree, a working ingest config, an LL-HLS player setup, and a list of the failure modes that bit me in production.

📦 Code: github.com/USER/llhls-webrtc-hybrid, replace before publishing

The "WebRTC vs LL-HLS" debate is mostly fake. In production, you almost always want both. The presenter needs sub-second latency so they can talk over their co-host without that awful video-call lag, and the 5,000 people watching just want a stream that doesn't buffer on their phone. We're going to wire up that pattern, end to end.

💡 Tip: If your concurrent viewer count will stay under 500 and the experience is genuinely interactive, skip the LL-HLS half of this article. Pure WebRTC is fine for you. The hybrid pattern is for products where the audience grows past what your media server can hold.

1. The decision tree, codified

Before you write a line of code, ask two questions:

Q1: Does any viewer need sub-second latency (auctions, real-time interaction)?
    → YES: WebRTC is mandatory for those viewers.
    → NO:  LL-HLS only. You're done. Stop reading.

Q2: How many concurrent viewers in 18 months?
    → < 500:    All-WebRTC. Skip LL-HLS.
    → 500–10K:  Hybrid. WebRTC for presenters, LL-HLS for the rest.
    → 10K+:     Hybrid mandatory. LL-HLS is the only thing that scales here.

The math under that tree: a 4-core media server fits roughly 200 WebRTC viewers vs roughly 1,000 LL-HLS viewers. LL-HLS sits behind a CDN cleanly; WebRTC is per-session and can't.
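
To make the tree and the capacity math concrete, here is a minimal sketch. The function names are mine, and the thresholds and per-server figures are just the rough numbers quoted above, not measured limits.

```javascript
// Sketch of the decision tree above. Capacities are the article's rough
// per-4-core-server figures, not benchmarks.
const WEBRTC_VIEWERS_PER_SERVER = 200;
const LLHLS_VIEWERS_PER_SERVER = 1000; // before any CDN offload

function chooseDelivery({ needsSubSecond, peakViewers }) {
  if (!needsSubSecond) return 'll-hls-only';  // Q1: NO
  if (peakViewers < 500) return 'all-webrtc'; // Q2: under 500
  return 'hybrid';                            // Q2: 500 and up
}

function serversNeeded(viewers, viewersPerServer) {
  return Math.ceil(viewers / viewersPerServer);
}

console.log(chooseDelivery({ needsSubSecond: true, peakViewers: 5000 })); // hybrid
console.log(serversNeeded(5000, WEBRTC_VIEWERS_PER_SERVER)); // 25
console.log(serversNeeded(5000, LLHLS_VIEWERS_PER_SERVER));  // 5
```

For the 5,000-viewer example that works out to 25 media servers on pure WebRTC versus 5 origin boxes for LL-HLS, and the LL-HLS number drops further once a CDN is in front.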

2. The stack we're building

[Presenter]  --WebRTC-->  [SFU]  --RTMP-->  [Ingest]  --LL-HLS-->  [CDN]  -->  [Viewers]
                            |
                            +--WebRTC-->  [Co-presenter / Guest]

For the SFU, I'll use LiveKit because it has a clean cloud-or-self-hosted story and its egress feature pushes RTMP out of the box. For ingest + packaging, I'll use FFmpeg (free, hackable, the reference for everything). For playback, HLS.js with low-latency mode enabled.

⚠️ Note: This article isn't a LiveKit tutorial. Substitute any SFU that supports RTMP egress (Janus, mediasoup, Daily, etc.). The pattern is the same.

3. Wire up the WebRTC half

A minimal browser-side publish using livekit-client. This is the presenter joining the stage:

// app/stage.js
import { Room, RoomEvent, createLocalTracks } from 'livekit-client';

const room = new Room({
  adaptiveStream: true,
  dynacast: true,
  publishDefaults: { simulcast: true },
});

// Register listeners before connecting so the first state changes aren't missed.
room.on(RoomEvent.ConnectionStateChanged, (state) => {
  console.log('[stage] state =', state);
});

// fetchJoinToken() is your own helper; it calls the token endpoint shown below.
await room.connect(import.meta.env.VITE_LIVEKIT_URL, await fetchJoinToken());

const tracks = await createLocalTracks({ audio: true, video: { resolution: { width: 1280, height: 720, frameRate: 30 } } });
for (const track of tracks) await room.localParticipant.publishTrack(track);

The token is signed server-side. Here's the minimal Node endpoint:

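// server/token.js
import { AccessToken } from 'livekit-server-sdk';

// Signs a join token that lets `identity` publish and subscribe in `room`.
// toJwt() returns a Promise in livekit-server-sdk v2; awaiting it is also safe on v1, where it returned a string.
export async function joinToken(identity, room) {
  const at = new AccessToken(process.env.LIVEKIT_API_KEY, process.env.LIVEKIT_API_SECRET, { identity });
  at.addGrant({ roomJoin: true, room, canPublish: true, canSubscribe: true });
  return await at.toJwt();
}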

That gets the presenter and co-presenters on WebRTC. Their glass-to-glass latency is typically under 300 ms on a decent network.
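
"A decent network" is doing some work in that sentence, so it helps to know whether a presenter is on a direct path or being relayed through TURN. Here is a minimal sketch that inspects standard WebRTC stats entries. The function name is mine, and how you reach the underlying RTCPeerConnection depends on your SFU's SDK, so treat that part as an assumption.

```javascript
// Takes the entries of an RTCStatsReport (for example [...report.values()])
// and reports whether the selected ICE candidate pair goes through TURN.
// Returns true, false, or null when no active pair can be found yet.
function usesTurnRelay(stats) {
  const byId = new Map(stats.map((s) => [s.id, s]));

  // Prefer the pair named by the transport; fall back to a nominated,
  // succeeded pair for browsers that don't report transport stats.
  const transport = stats.find((s) => s.type === 'transport' && s.selectedCandidatePairId);
  const pair = transport
    ? byId.get(transport.selectedCandidatePairId)
    : stats.find((s) => s.type === 'candidate-pair' && s.nominated && s.state === 'succeeded');
  if (!pair) return null;

  const local = byId.get(pair.localCandidateId);
  const remote = byId.get(pair.remoteCandidateId);
  return [local, remote].some((c) => Boolean(c) && c.candidateType === 'relay');
}
```

In the browser you would call it as `usesTurnRelay([...(await pc.getStats()).values()])`, where `pc` is the presenter's peer connection.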

4. Re-publish the room to RTMP

LiveKit's egress service can record or relay a room composition out to an RTMP endpoint. We'll point it at our LL-HLS ingest.

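# scripts/start-egress.sh
# LIVEKIT_URL must be the https:// form of your server URL here (Twirp is plain HTTP, not WebSocket).
# LIVEKIT_API_TOKEN is a server-signed JWT whose video grant includes roomRecord.
curl -X POST "$LIVEKIT_URL/twirp/livekit.Egress/StartRoomCompositeEgress" \
  -H "Authorization: Bearer $LIVEKIT_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "room_name": "live-show",
    "layout": "speaker",
    "audio_only": false,
    "stream_outputs": [
      { "protocol": "RTMP", "urls": ["rtmp://ingest.example.com/live/show1"] }
    ]
  }'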

Now your SFU is pushing the composed stream into your ingest server as plain RTMP. Anything downstream that speaks RTMP can pick it up.
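
If you would rather start egress from your Node backend than from a shell script, the same request can be assembled like this. It is a sketch that mirrors the curl call above; the two helper names are mine, and the wss-to-https rewrite assumes you store the same LiveKit URL the browser client uses.

```javascript
// Twirp is plain HTTP, so a wss:// client URL has to become https://.
function twirpEndpoint(livekitUrl, method) {
  const base = livekitUrl.replace(/^ws/, 'http').replace(/\/+$/, '');
  return `${base}/twirp/livekit.Egress/${method}`;
}

// Same JSON body as the curl example: one room composite, relayed to one RTMP URL.
function roomCompositeRequest(roomName, rtmpUrl) {
  return {
    room_name: roomName,
    layout: 'speaker',
    audio_only: false,
    stream_outputs: [{ protocol: 'RTMP', urls: [rtmpUrl] }],
  };
}

const url = twirpEndpoint('wss://livekit.example.com', 'StartRoomCompositeEgress');
const body = roomCompositeRequest('live-show', 'rtmp://ingest.example.com/live/show1');
console.log(url);
console.log(JSON.stringify(body));
```

Send it with `fetch(url, { method: 'POST', headers: { Authorization: 'Bearer ' + token, 'Content-Type': 'application/json' }, body: JSON.stringify(body) })`, where `token` is the same server-signed token the script uses.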

5. Package to LL-HLS with FFmpeg

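This is the part most teams get wrong on the first try. The LL-HLS spec requires CMAF partial segments (typically ~200 ms, advertised as EXT-X-PART entries), preload hints, and an EXT-X-SERVER-CONTROL tag in the manifest. FFmpeg's hls muxer gets you the fMP4 segments and the playlist, but don't assume it gets you the low-latency tags. As far as I can tell, stock builds (6.x included) don't write EXT-X-PART, and several flags that show up in copy-pasted "LL-HLS" commands (-ldash, -streaming, -seg_duration, -frag_duration, -window_size, -extra_window_size) belong to the dash muxer, so they do nothing useful when the output is an .m3u8 from the hls muxer. Run ffmpeg -h muxer=hls on your build to see what it actually accepts. Here is the baseline command with those flags removed:

# scripts/package-llhls.sh
# Assumes an RTMP server is already accepting the egress push at this URL.
# -g 60 at 30 fps gives a 2-second GOP, which lines up with -hls_time 2.
ffmpeg -i rtmp://ingest.example.com/live/show1 \
  -c:v libx264 -preset veryfast -tune zerolatency -g 60 -keyint_min 60 -sc_threshold 0 \
  -c:a aac -ar 48000 -b:a 128k \
  -hls_time 2 \
  -hls_list_size 6 \
  -hls_segment_type fmp4 \
  -hls_fmp4_init_filename init.mp4 \
  -hls_segment_filename "/var/www/hls/seg_%05d.m4s" \
  -hls_flags independent_segments+program_date_time+append_list \
  -master_pl_name master.m3u8 \
  /var/www/hls/stream.m3u8

💡 Tip: Three more things were dropped from the usual copy-paste version. -hls_playlist_type event forces -hls_list_size to 0, so it fights the 6-segment window. -strftime 1 hands the segment filename to strftime(), which is not what a %05d counter pattern wants. -method PUT and -http_persistent 1 only matter when the output is an HTTP URL, not a local path. If the manifest check in section 7 shows no EXT-X-PART lines, you have plain HLS with short segments, and the fix is an LL-HLS-capable packager or origin in this slot, not more FFmpeg flags.

The output is a /var/www/hls/ directory with init.mp4, .m4s segments, and a stream.m3u8 manifest. Point your CDN at that directory and keep the playlist's cache TTL short so viewers are never handed a stale manifest.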
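
The numbers in the packaging step have to agree with each other: the GOP must fit the segment, and the segment must be a whole number of parts. Here is a small sketch of that arithmetic. The function name and the millisecond units are my choice; the 3x hold-back figure is the spec's recommended minimum for PART-HOLD-BACK relative to the part target.

```javascript
// Derives the encoder and manifest numbers that must line up for a given
// frame rate, segment length, and part length (all durations in ms).
function packagingPlan({ fps, segmentMs, partMs }) {
  if (segmentMs % partMs !== 0) {
    throw new RangeError('segment must be a whole number of parts');
  }
  const gopFrames = (fps * segmentMs) / 1000;
  if (!Number.isInteger(gopFrames)) {
    throw new RangeError('segment must be a whole number of frames');
  }
  return {
    gopFrames,                            // value for -g and -keyint_min
    partsPerSegment: segmentMs / partMs,  // parts advertised per segment
    recommendedPartHoldBackMs: 3 * partMs // PART-HOLD-BACK should be at least this
  };
}

console.log(packagingPlan({ fps: 30, segmentMs: 2000, partMs: 200 }));
// { gopFrames: 60, partsPerSegment: 10, recommendedPartHoldBackMs: 600 }
```

With a 30 fps source, 2-second segments, and 200 ms parts, that gives the -g 60 used above, 10 parts per segment, and a 0.6-second hold-back.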

6. Player setup

HLS.js with low-latency mode:

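// app/player.js
import Hls from 'hls.js';

const SRC = 'https://cdn.example.com/hls/stream.m3u8';
const video = document.querySelector('#viewer');

// Browsers can reject autoplay (no user gesture, unmuted video), so don't let it throw.
const play = () => video.play().catch((err) => console.warn('[player] autoplay blocked', err));

if (Hls.isSupported()) {
  const hls = new Hls({
    lowLatencyMode: true,
    backBufferLength: 4,
    maxLiveSyncPlaybackRate: 1.5,
    liveSyncDuration: 1.5,
    liveMaxLatencyDuration: 3.5,
  });

  hls.on(Hls.Events.MEDIA_ATTACHED, play);
  hls.on(Hls.Events.ERROR, (_evt, data) => {
    if (!data.fatal) return;
    console.error('[player] fatal', data.type, data.details);
    if (data.type === Hls.ErrorTypes.NETWORK_ERROR) hls.startLoad();            // retry loading
    else if (data.type === Hls.ErrorTypes.MEDIA_ERROR) hls.recoverMediaError(); // reset the decoder
    else hls.destroy();                                                          // unrecoverable
  });

  hls.loadSource(SRC);
  hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  // No MSE (older iOS Safari): hand the manifest to the native player.
  video.src = SRC;
  video.addEventListener('loadedmetadata', play, { once: true });
}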

The values for liveSyncDuration and liveMaxLatencyDuration are the ones I've landed on after testing. They tell the player to stay 1.5 seconds from live with a 3.5-second hard cap before it resyncs. Tighter values rebuffer too often; looser values defeat the point of LL-HLS.
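
To make those two thresholds concrete, here is a simplified model of what they mean for a given measured latency. It is a sketch of the behavior described above, not HLS.js internals, and the function and label names are mine.

```javascript
const syncConfig = { liveSyncDuration: 1.5, liveMaxLatencyDuration: 3.5 };

// Classifies a measured latency (seconds behind live) against the two thresholds.
function classifyLatency(latencySeconds, cfg = syncConfig) {
  if (latencySeconds > cfg.liveMaxLatencyDuration) return 'resync';   // past the hard cap: jump back toward live
  if (latencySeconds > cfg.liveSyncDuration) return 'catch-up';       // behind target: speed up playback
  return 'on-target';
}

console.log(classifyLatency(1.2)); // on-target
console.log(classifyLatency(2.4)); // catch-up
console.log(classifyLatency(4.0)); // resync
```

Recent HLS.js 1.x releases expose an estimated `hls.latency` value, so logging `classifyLatency(hls.latency)` every few seconds is a cheap way to see how often viewers sit in each state.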

⚠️ Note: Where Safari plays through its native HLS implementation instead of HLS.js (iOS in particular, where MSE has historically been unavailable), none of the config above applies. The native player takes its low-latency cues from the manifest, including the EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES tag. Don't assume your packager writes that tag; verify by curling the manifest and grepping for it.

7. Verifying it actually works

The first time you wire this up, the latency will not be what you expected. Tools to sanity-check:

# Check the manifest has LL-HLS annotations
curl -s https://cdn.example.com/hls/stream.m3u8 | grep -E '(SERVER-CONTROL|PART-INF|EXT-X-PRELOAD-HINT)'

# Should print something like:
# #EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=0.6
# #EXT-X-PART-INF:PART-TARGET=0.2
# #EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg_00123.m4s?byterange=..."

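# Measure end-to-end latency with a clock overlay on the source.
# drawtext defaults to black text, so set fontcolor or the clock is invisible on a black frame.
# r=30 matches the 30 fps the packaging step assumes; H.264 is what most RTMP ingests expect.
# Run on the presenter machine:
ffmpeg -re -f lavfi -i "color=c=black:s=640x360:r=30:d=600,drawtext=text='%{localtime}':fontcolor=white:fontsize=48" \
  -c:v libx264 -preset veryfast -tune zerolatency -pix_fmt yuv420p \
  -f flv rtmp://ingest.example.com/live/show1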

Open the playback URL on another device and compare the wall clock on the source with the clock rendered in the player. Both devices need NTP-synced clocks, and the overlay only has one-second resolution, so treat the result as accurate to about a second. Production LL-HLS on a tuned CDN gets you 2 to 3 seconds. WebRTC on the same path gets you under half a second.
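
If you want the manifest check in a CI job rather than a one-off grep, the same inspection can be sketched in JavaScript. The function name and the shape of the result are mine; the tag and attribute names are the ones from the grep above, plus EXT-X-PART for the parts themselves.

```javascript
// Inspects a media playlist's text for the LL-HLS tags discussed above.
function inspectManifest(text) {
  const num = (re) => {
    const m = text.match(re);
    return m ? Number(m[1]) : null;
  };
  const partTarget = num(/#EXT-X-PART-INF:.*?PART-TARGET=([\d.]+)/);
  const partHoldBack = num(/#EXT-X-SERVER-CONTROL:.*?PART-HOLD-BACK=([\d.]+)/);
  return {
    canBlockReload: /#EXT-X-SERVER-CONTROL:.*CAN-BLOCK-RELOAD=YES/.test(text),
    hasParts: /^#EXT-X-PART:/m.test(text),
    hasPreloadHint: /^#EXT-X-PRELOAD-HINT:/m.test(text),
    partTarget,
    partHoldBack,
    // The spec recommends a hold-back of at least 3x the part target;
    // the small epsilon absorbs floating-point rounding.
    holdBackOk:
      partTarget !== null && partHoldBack !== null && partHoldBack >= 3 * partTarget - 1e-9,
  };
}
```

On Node 18+ you can feed it with `inspectManifest(await (await fetch(manifestUrl)).text())`. If `hasParts` comes back false, the stream is plain HLS no matter what the player config says.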

8. The failure modes that bit me

A short list of things to check before you ship:

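CDN that can't do LL-HLS delivery. LL-HLS depends on blocking playlist reloads (the _HLS_msn and _HLS_part query parameters) and on requests for hinted parts being held until the part exists. A CDN that strips those query parameters from the cache key, or that waits for a full segment before answering, gives you plain HLS latency with all the LL-HLS code paths enabled. Grepping for Transfer-Encoding: chunked is not a reliable test, because HTTP/2 responses never carry that header. Instead, request the playlist with _HLS_msn set to the next media sequence number (curl -s "https://cdn.example.com/hls/stream.m3u8?_HLS_msn=<next>&_HLS_part=0") and check that the response is held until that part is published rather than returned immediately with a stale playlist.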
Player buffering "just in case". Default HLS.js buffers more than you'd expect. The liveSyncDuration and liveMaxLatencyDuration values above are deliberate.
TURN-only WebRTC sessions on the presenter side. Corporate firewalls often force TURN, which adds a relay hop. Mitigate by deploying TURN close to your SFU.
Egress lag from the SFU. RTMP relay from LiveKit egress has a built-in ~1-2 second buffer on top of LL-HLS latency. The hybrid pattern is fundamentally "WebRTC fast for presenters, LL-HLS slow-ish for audience". Don't try to make the audience side faster than the protocol allows.
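
Adding up the figures quoted in this article gives a rough audience-side budget. This is only a sketch: the stage names are mine, the numbers are the article's estimates rather than measurements, and the WebRTC leg only has an upper bound.

```javascript
// Rough audience-side latency budget, in milliseconds.
const stages = [
  { name: 'presenter to SFU (WebRTC)', minMs: 0, maxMs: 300 }, // "sub-300 ms": upper bound only
  { name: 'SFU egress to RTMP', minMs: 1000, maxMs: 2000 },
  { name: 'ingest to player (LL-HLS)', minMs: 2000, maxMs: 3000 },
];

function totalBudget(list) {
  return list.reduce(
    (acc, s) => ({ minMs: acc.minMs + s.minMs, maxMs: acc.maxMs + s.maxMs }),
    { minMs: 0, maxMs: 0 },
  );
}

console.log(totalBudget(stages)); // { minMs: 3000, maxMs: 5300 }
```

So an audience member is realistically 3 to 5 seconds behind the presenter's camera, which is the number to put in front of product people before they promise anything interactive on the LL-HLS side.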

What's next

If you want to push the audience-side latency lower, the next step is HESP or WebRTC-WHEP for everyone, both of which have rougher edges than LL-HLS today but are worth tracking. If you want to reduce egress cost, look at peer-assisted delivery (Peer5, CDNBye) which works cleanly on top of LL-HLS.

For player customization (DVR window, captions, multi-audio), the HLS.js docs are the right next read. For ingest at scale, you'll eventually outgrow single-box FFmpeg and want to look at managed packaging or a Kubernetes setup with horizontal ingest.

The protocol war is over. The interesting work in 2026 is building the right hybrid for your audience. Happy streaming.
