Duchan

We switched simulator streaming to H.264 and it felt worse. Here's how we fixed the latency.

In an earlier post I described how tapflow streams iOS simulators to the browser: pull frames off the simulator's IOSurface, JPEG-encode them on the Mac, push them over WebSocket at ~30fps.

JPEG has one great property for interactive streaming: every frame is independent and decodes instantly. There's no buffer, no inter-frame dependency. On localhost it feels like you're touching the simulator directly.

It also has one terrible property: size. A full-frame JPEG of a scrolling screen is ~590KB. On a LAN that's 12–16 MB/s, and our relay started dropping 16–27 frames a second under backpressure — visible tearing.

So we did the obvious thing and moved to H.264. Bandwidth dropped roughly 140× on a still screen and 5× while scrolling. Drops nearly vanished.

And the stream felt worse.

This post is about why, and the two fixes that got H.264 back to "feels like direct touch."


The bar: localhost JPEG

Before touching anything I needed a number, not a vibe. So I instrumented the pipeline end to end — a per-stage panel that reports decode→present and glass→glass (capture timestamp to on-screen) latencies live.

One caveat I'll repeat throughout: glass→glass absolute values are only valid on localhost, where capture and display share one clock. decode→present is a same-machine delta and valid anywhere, so I'll lean on it for the cross-environment claims.
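For readers who want to reproduce this kind of panel, here is a minimal sketch of how p50/p95 figures can be derived from per-frame samples. The `LatencyStats` class and the sample values are hypothetical, not tapflow's actual instrumentation, and the nearest-rank percentile method is an assumption about how such a panel might compute its numbers.

```typescript
// Hypothetical helper: collects per-frame latency samples and reports percentiles.
class LatencyStats {
  private samples: number[] = [];

  // Record one latency sample in milliseconds,
  // e.g. (time the frame was presented) - (time it was handed to the decoder).
  record(ms: number): void {
    this.samples.push(ms);
  }

  // Nearest-rank percentile: the smallest sample that has at least
  // p percent of all samples at or below it.
  percentile(p: number): number {
    if (this.samples.length === 0) return NaN;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }
}

// Made-up samples: mostly fast frames with one slow outlier.
const stats = new LatencyStats();
for (const ms of [2, 3, 2, 4, 3, 2, 9, 3, 2, 3]) stats.record(ms);
const p50 = stats.percentile(50);
const p95 = stats.percentile(95);
```

Reporting both percentiles matters here: p50 shows the typical frame, while p95 surfaces the occasional slow frame that a user perceives as a hitch.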

Here's the baseline that mattered, measured on localhost:

Path                      decode→present p50 / p95 (ms)
JPEG still                12.4 / 15.4
JPEG scroll               9.4 / 11.6
H.264 (WebCodecs) still   267 / 274

H.264 decode was ~20× slower than JPEG. On a hardware decoder. That made no sense — until I looked at what the decoder was actually doing.


Fix 1: the decoder was buffering 8 frames for no reason

The transport was clean (~1ms), the input queue was empty. The latency was entirely inside the decoder: it was holding ~8 frames before emitting the first one.

That's a DPB (decoded picture buffer). A decoder reorders frames when B-frames are present — it has to wait for future frames to arrive before it can output the current one in display order. So it buffers up to the level's maximum.

But our encoder is baseline H.264, B-frames off. There is no reordering. The actual reorder depth is zero. The decoder was buffering anyway because the bitstream never told it the reorder depth was zero.

The signal lives in the SPS (sequence parameter set), in the bitstream_restriction fields inside the VUI (video usability information). Our VideoToolbox encoder wasn't setting them, so the decoder fell back to the worst case for the level: a max_dec_frame_buffering of ~8 frames at Level 5.0.
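As a back-of-the-envelope check, the hold time of a buffering decoder is simply the number of frames held times the frame interval. The sketch below uses a hypothetical `dpbHoldLatencyMs` helper and assumes frames arrive at a steady 30fps, which is an idealization of the ~30fps stream described earlier.

```typescript
// Latency added by a decoder that holds `heldFrames` frames before emitting
// its first output, assuming frames arrive at a steady `fps` (an idealization).
function dpbHoldLatencyMs(heldFrames: number, fps: number): number {
  return (heldFrames * 1000) / fps;
}

const worstCaseMs = dpbHoldLatencyMs(8, 30);    // 8 frames held at 30fps
const declaredZeroMs = dpbHoldLatencyMs(0, 30); // reorder depth declared as zero
```

Eight frames at 30fps comes to roughly 267 ms, which lines up with the measured p50 of 267 ms for the WebCodecs path before the fix.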

The fix is to rewrite the SPS and inject the missing declaration:

max_num_reorder_frames = 0
max_dec_frame_buffering = num_ref_frames

We do this in the agent, on the keyframe SPS, before the frame ever leaves the Mac — so every decoder downstream benefits, not just one browser path:

// agent-core/utils/sps.ts — rewrite the SPS to declare zero reordering
function rewriteLowLatencySps(sps: Uint8Array): Uint8Array {
  const bits = new BitstreamWriter(parseSps(sps))
  bits.vui.bitstreamRestriction = true
  bits.vui.maxNumReorderFrames = 0
  bits.vui.maxDecFrameBuffering = bits.numRefFrames
  return serialize(bits)
}
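For context on what "injecting the declaration" involves at the bit level: in the H.264 VUI syntax, max_num_reorder_frames and max_dec_frame_buffering are coded as unsigned Exp-Golomb values, written ue(v). The sketch below shows only that encoding step. The `BitWriter` class is a hypothetical stand-in, not tapflow's BitstreamWriter, and a real rewrite would also have to emit the other bitstream_restriction fields, the RBSP trailing bits, and emulation-prevention bytes.

```typescript
// Minimal MSB-first bit writer with unsigned Exp-Golomb (ue(v)) support.
// Hypothetical helper for illustration; not tapflow's actual BitstreamWriter.
class BitWriter {
  private bits: number[] = [];

  // Append the low `count` bits of `value`, most significant bit first.
  writeBits(value: number, count: number): void {
    for (let i = count - 1; i >= 0; i--) {
      this.bits.push((value >>> i) & 1);
    }
  }

  // ue(v): N leading zero bits, then the (N + 1)-bit binary form of value + 1.
  writeUe(value: number): void {
    const codeNum = value + 1;
    const length = 32 - Math.clz32(codeNum); // bit length of codeNum
    this.writeBits(0, length - 1);
    this.writeBits(codeNum, length);
  }

  // Render the accumulated bits as a string of 0s and 1s for inspection.
  toBitString(): string {
    return this.bits.join("");
  }
}

const w = new BitWriter();
w.writeUe(0); // max_num_reorder_frames = 0
w.writeUe(1); // max_dec_frame_buffering = 1, assuming a single reference frame
const encoded = w.toBitString();
```

Because ue(v) codes are variable-length, adding these fields shifts every bit that follows them, which is why the SPS has to be parsed and re-serialized rather than patched in place.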

Result on localhost:

Path                              decode→present p50 / p95 (ms)
H.264 WebCodecs still (before)    267 / 274
H.264 WebCodecs still (after)     2.5 / 4
H.264 WebCodecs scroll (after)    2.1 / 3.9

267 → 2.5ms, roughly 100×. The encoder was lying to the decoder by omission, and the decoder defended itself by buffering. One declaration fixed it.

The browser confirms it's receiving the rewrite — the SPS now reports bitstreamRestriction: true, maxNumReorderFrames: 0.


Fix 2: MSE is a buffer you can't turn off

In the browser, Fix 1 only paid off on the WebCodecs path. And WebCodecs has a hard constraint: it only runs in a secure context, meaning HTTPS or localhost.

A team using tapflow over their LAN hits it at plain http://<mac-ip>:4000. That's a non-secure context, so the browser can't use WebCodecs. The fallback at the time was MSE (Media Source Extensions): feed the H.264 into a <video> element through a muxer.

The problem is that <video> is a buffer. It's designed for media playback, where a jitter buffer is a feature. For interactive streaming it's structural latency you can't remove. I measured it on localhost by forcing the MSE tier:

Path                decode→present p50 / p95 (ms)
H.264 MSE still     239 / 254
H.264 MSE scroll    229 / 244

~235ms, on the same reorder=0 stream that WebCodecs decoded in 2.5ms. The SPS fix can't reach this — it's the media-element buffer, not the decoder's DPB. I'd already set the muxer's flushingTime to 0. There was nothing left to shave.

So I stopped trying to make MSE fast and removed it.

The decoder layer is now two tiers, picked automatically per environment:

// pickDecoder — secure → WebCodecs, otherwise WASM
export function pickDecoder(): Decoder | null {
  if (isSecureContext && 'VideoDecoder' in window) {
    return new WebCodecsDecoder()      // HW, lowest latency
  }
  if (webgl2Available && wasmSupported) {
    return new WASMDecoder()           // tinyh264, zero-buffer
  }
  return null                          // → fall back to JPEG
}

On non-secure LAN-HTTP, we decode H.264 in WASM (tinyh264). It's a software decoder, so it costs CPU — but it has no media-element buffer at all. That's the whole point: it gives you JPEG's immediacy with H.264's bandwidth, on plain HTTP.

Measured on localhost (the worst case for CPU, since encoder and decoder share one Mac):

Path                 decode→present p50 / p95 (ms)
H.264 WASM still     8.7 / 30.4
H.264 WASM scroll    14.3 / 37.9

At p50 that's on par with the localhost-JPEG baseline (12.4 ms still, 9.4 ms scroll), the bar we set at the start, though the p95 tail is longer than JPEG's (15.4 / 11.6 ms). Removing MSE also let us drop the muxer dependency entirely.

One constraint this introduces: tinyh264 only decodes baseline H.264. iOS already encodes baseline. For Android we pin scrcpy to baseline (profile:int=1) so both platforms share the exact same HTTP→WASM path. High profile is still available on the WebCodecs (secure) tier.


One more thing: dropping H.264 isn't like dropping JPEG

There's a subtlety the switch exposed. With JPEG, every frame is a keyframe, so dropping a frame under backpressure is harmless — the next one stands alone. With H.264, if you drop a P-frame, every following P-frame references something the decoder never received. A zero-buffer decoder like WASM tinyh264 shears until the next IDR arrives.

So the relay had to become keyframe-aware: once it starts dropping under backpressure, it drops the whole GOP until the next keyframe, rather than handing the decoder a broken reference chain. The keyframe flag rides in our frame envelope, so this needs zero NAL parsing on the relay.

// relay: once dropping, keep dropping until the next keyframe
if (backpressured && !frame.isKeyframe) dropping = true  // start a drop run
if (dropping) {
  if (!frame.isKeyframe) return       // skip P-frames in a broken GOP
  dropping = false                    // keyframe resets the chain
}
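To see the rule on a concrete frame sequence, here is a self-contained sketch of the same idea as a small state machine. The `Frame` shape and the `KeyframeAwareDropper` class are hypothetical, and letting a keyframe through even while backpressured is an assumption made for this sketch, not a statement about the real relay.

```typescript
// Hypothetical frame envelope: only the keyframe flag matters to the relay.
interface Frame {
  seq: number;
  isKeyframe: boolean;
}

// Decides per frame whether to forward or drop, keeping reference chains intact.
class KeyframeAwareDropper {
  private dropping = false;

  // Returns true if the frame should be forwarded to the client.
  shouldForward(frame: Frame, backpressured: boolean): boolean {
    // A P-frame arriving under backpressure starts a drop run.
    if (backpressured && !frame.isKeyframe) this.dropping = true;
    if (this.dropping) {
      if (!frame.isKeyframe) return false; // rest of the broken GOP
      this.dropping = false;               // keyframe starts a clean chain
    }
    return true;
  }
}

const dropper = new KeyframeAwareDropper();
// Frames 0 and 4 are keyframes; backpressure hits only while frame 2 arrives.
const input: Array<[Frame, boolean]> = [
  [{ seq: 0, isKeyframe: true }, false],
  [{ seq: 1, isKeyframe: false }, false],
  [{ seq: 2, isKeyframe: false }, true],
  [{ seq: 3, isKeyframe: false }, false],
  [{ seq: 4, isKeyframe: true }, false],
  [{ seq: 5, isKeyframe: false }, false],
];
const forwarded = input
  .filter(([frame, bp]) => dropper.shouldForward(frame, bp))
  .map(([frame]) => frame.seq);
```

Frame 3 is the interesting case: backpressure has already cleared when it arrives, but it is still dropped because it references frame 2, which the decoder never saw.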

Honest limitations

  • WASM decode is CPU-bound. At high resolution × fps it hits a CPU ceiling. We mitigate by downscaling the encode resolution — the display is small, so it's a triple win on bandwidth, CPU, and latency.
  • The localhost numbers are best-case for latency and worst-case for CPU. On a real LAN the decoder runs on a separate machine. In our cross-machine measurements, scroll p95 climbs to ~50ms on both decoders — at that point the bottleneck is load/transport, not the codec. The decode→present deltas above hold; the glass→glass absolutes do not transfer across two clocks.
  • Still v0.x. The decoder tiers and SPS rewrite are in agent-core; expect them to keep moving.

Takeaway

Two bugs, same symptom ("H.264 feels laggy"), completely different causes:

  1. The decoder's DPB buffered 8 frames because the SPS didn't declare reorder=0. Fix: rewrite the SPS at the encoder.
  2. The media-element buffer in MSE added ~235ms that no encoder flag can reach. Fix: remove MSE, decode in WASM on non-secure contexts.

The lesson I keep relearning: when streaming feels slow, measure each stage before you change the codec. The codec usually isn't the problem — the buffer you didn't know you had is.


Try it

tapflow is MIT licensed.

npm install -g tapflow
tapflow start
