Mu-Tsun Tsai

Replacing Playwright's hardcoded VP8 encoder: a deep dive into page.screencast

If you've ever recorded a Playwright session of a text-heavy page — a code editor, a font preview, anything with crisp glyphs — and the output looked like it was filmed through a screen door, you've met Playwright's recordVideo. The artifacts are not a CDP problem. They're not a frame-rate problem. They are an encoder choice problem, and the encoder is hardcoded.

This post is about replacing it without patching playwright-core, using a public API that landed in Playwright 1.59 and that — as far as I can tell — almost nobody is using yet.

Here's what triggered this whole exercise. I was recording a tutorial video for FontFreeze — a small web app that bakes OpenType features and variable axes into static font files — using Playwright. The Preview panel of FontFreeze is essentially a wall of glyphs at small point sizes: exactly the workload that VP8 at 1 Mbps falls apart on.

Before — recorded with Playwright's built-in recordVideo (VP8 @ 1 Mbps), single frame extracted, 2× crop on the glyph table:

VP8 1 Mbps: mosquito noise around glyph edges, small punctuation barely legible

After — same page, same single frame, recorded with playwright-recorder-plus (libx264 CRF 18):

libx264 CRF 18 — clean glyph edges, punctuation crisp

Crops are pixel-aligned (same source coordinates) and 2× nearest-neighbor upscaled, so what you see is the codec, not the resampler. As an aside: the before crop is 73 KB as PNG, the after crop is 33 KB — the missing 40 KB is mosquito noise turned into entropy.

The problem, in one line of source

So why does the Before image look like that? It's tempting to blame the CDP screencast feed, or the JPEG step, or the resolution. None of those are the culprit. The culprit is a single line buried inside playwright-core:

// playwright-core/lib/server/videoRecorder.js
const args = ['-c:v', 'vp8', '-b:v', '1M', '-deadline', 'realtime', '-speed', '8', '-threads', '1'];

VP8, 1 megabit, realtime, single-threaded. There is no option to change any of those. People have asked — repeatedly:

  • #8683 Tuning video performance (closed 2021)
  • #12056 Configure video quality (closed 2022)
  • #17217 Specify video params (like fps) (open, no maintainer response)
  • #31424 Video recording quality control (open, no maintainer response)

The first two were closed as "won't do." The latter two are technically still open, but they've sat without a maintainer reply long enough that the message is the same. This isn't "nobody thought of it" — it's "this is not a direction the project wants to go."

I went looking for a workaround. The first thing I tried was the obvious one: pnpm patch playwright-core, swap the args. It doesn't work — Playwright ships its own ffmpeg binary, and that binary is built with libvpx only. You can ask it for libx264 all you want; the ffmpeg in the box doesn't have it.
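To check what a given ffmpeg binary can actually encode, you can list its encoders and look for the name you need. The sketch below is a hypothetical helper (hasEncoder is not part of any library) that scans the text printed by `ffmpeg -hide_banner -encoders`. The sample output is abbreviated and assumed for illustration, not copied from Playwright's bundled binary.

```typescript
// Hypothetical helper: checks the text printed by `ffmpeg -hide_banner -encoders`
// for a given encoder name. Each encoder line looks roughly like
// " V....D libx264   libx264 H.264 / AVC ..." (flags, name, description).
function hasEncoder(encodersOutput: string, name: string): boolean {
  return encodersOutput
    .split(/\r?\n/)
    .map((line) => line.trim().split(/\s+/))
    .some((tokens) => tokens.length >= 2 && tokens[1] === name);
}

// Sample output, abbreviated and assumed for illustration.
const sample = [
  "Encoders:",
  " V..... = Video",
  " ------",
  " V....D libvpx               libvpx VP8 (codec vp8)",
  " A....D libopus              libopus Opus (codec opus)",
].join("\n");

console.log(hasEncoder(sample, "libvpx"));  // true
console.log(hasEncoder(sample, "libx264")); // false
```

In practice the input string would come from running the binary, for example with execFileSync from node:child_process and `{ encoding: "utf8" }`.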

I needed a different layer.

The hook: page.screencast.start({ onFrame })

Playwright 1.59 (Nov 2024) added a public API that sits above the internal video recorder:

await page.screencast.start({
  size: { width: 1280, height: 720 },
  quality: 90,
  onFrame: async (jpeg) => {
    // jpeg is a raw JPEG Buffer, one per frame the page paints
  },
});

This is the same CDP screencast stream Playwright's built-in VideoRecorder consumes — except now you get the JPEGs first. Spawn your own ffmpeg, pipe them in, and you control the encoder completely. No internals patched, no version coupling beyond 1.59+.

A naive 30-line prototype looks like this:

import { spawn } from "node:child_process";
import ffmpegPath from "ffmpeg-static";

const ff = spawn(ffmpegPath!, [
  "-f", "image2pipe",
  "-r", "25",
  "-i", "-",
  "-c:v", "libx264",
  "-preset", "ultrafast",
  "-crf", "18",
  "out.mp4",
]);

await page.screencast.start({
  size: { width: 1280, height: 720 },
  quality: 90,
  onFrame: async (jpeg) => { ff.stdin.write(jpeg); },
});

This works for about thirty seconds before everything you assumed turns out to be wrong.

Non-obvious problem #1: frame numbering can't be left to wall-clock timers

Initial mistake: I drove the writer with setInterval(write, 1000/25), grabbing the latest JPEG every tick. Tested it on Windows: I'd record for 87 seconds of wall clock and the output was 65 seconds long. A 32% drift. setInterval on Windows is not a tight-loop CFR clock; it slips, and the slips compound.

Plotted out, the bug looks like this:

Frames written vs. wall-clock time: setInterval line slopes at ~19 fps and ends at frame 1625 after 87 s of wall clock; the wall-clock-anchored target line slopes at 25 fps and ends at frame 2175.

The upper line is what -r 25 ffmpeg expects: 25 frames per wall-clock second, ending at frame 2175 after 87 s. The lower line is what setInterval actually delivers on Windows — about 19 frames per second, ending at frame 1625. ffmpeg labels frame 1625 as "second 65" because it's still doing 25 fps math. The recording is 22 seconds short.

The fix is in Playwright's own videoRecorder.js, hidden in plain sight:

// per JPEG frame:
const frameNumber = Math.floor((nowMs - startMs) * fps / 1000);
const gap = frameNumber - lastFrameNumber;
for (let i = 1; i < gap; i++) ff.stdin.write(lastJpeg); // duplicate to fill
ff.stdin.write(jpeg);
lastFrameNumber = frameNumber;
lastJpeg = jpeg;

No timer. No setInterval. The frame number is computed from wall-clock time on every JPEG arrival, and any gap is back-filled by duplicating the last JPEG. CDP is variable-rate (it skips frames when the page doesn't repaint), so the duplication isn't waste — it's exactly what "page didn't change for 200 ms" should look like in a CFR file.
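To see why this keeps the output length tied to the wall clock, here is a small simulation of the same slot arithmetic with no Playwright or ffmpeg involved. planSlots and the arrival times are made up for illustration, and the sketch assumes the first frame arrives at t = 0.

```typescript
// Hypothetical helper (not part of Playwright): decides how many copies of the
// previous JPEG to write before the new one so the output stays constant-rate.
interface SlotPlan {
  duplicatesOfPrevious: number; // copies of the previous JPEG to write first
  frameNumber: number;          // slot the new JPEG lands in
}

function planSlots(elapsedMs: number, fps: number, lastFrameNumber: number): SlotPlan {
  const frameNumber = Math.floor((elapsedMs * fps) / 1000);
  // Slots strictly between the previous frame and this one were never painted,
  // so they are filled with the previous JPEG.
  const duplicatesOfPrevious = Math.max(0, frameNumber - lastFrameNumber - 1);
  return { duplicatesOfPrevious, frameNumber };
}

// Irregular arrival times in ms, as a variable-rate source might deliver them.
const arrivalsMs = [0, 40, 90, 600, 2000];
const fps = 25;

let lastFrameNumber = -1;
let framesWritten = 0;
for (const t of arrivalsMs) {
  const plan = planSlots(t, fps, lastFrameNumber);
  framesWritten += plan.duplicatesOfPrevious + 1; // padding plus the new frame
  lastFrameNumber = plan.frameNumber;
}

console.log(framesWritten, framesWritten / fps); // 51 frames, 2.04 s
```

Five frames arrived, but 51 were written, so the file is 2.04 s long for a 2.0 s recording. The error is bounded by one frame and does not grow with length.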

One extra knob: pass ffmpeg -fps_mode passthrough as an output option (the modern replacement for -vsync 0). Without it, libvpx-vp9 (and some libx264 builds) will silently drop "duplicate" frames as an optimization, undoing your padding.
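As a sketch of where that flag goes, here is the argument list from the naive prototype with -fps_mode passthrough added. liveEncoderArgs is a hypothetical helper name. As far as I know the flag needs ffmpeg 5.1 or later, so treat the version as an assumption and check your binary.

```typescript
// Hypothetical helper: builds the argument list for the live encoder.
function liveEncoderArgs(fps: number, outPath: string): string[] {
  return [
    // Input options (everything before -i applies to stdin).
    "-f", "image2pipe",
    "-r", String(fps),
    "-i", "-",
    // Output options (everything after -i applies to the output file).
    "-c:v", "libx264",
    "-preset", "ultrafast",
    "-crf", "18",
    "-fps_mode", "passthrough", // keep every frame, duplicates included
    outPath,
  ];
}

const args = liveEncoderArgs(25, "out.mp4");
console.log(args.join(" "));
```

The position matters: placed before -i, the option would be read as an input option rather than applying to the output stream.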

Non-obvious problem #2: t=0 cannot be the first CDP frame

This one bit me on a real page. FontFreeze loads Pyodide and a font file before it's interactive — about 7 seconds of nothing happening visually after recorder.start(). CDP, being variable-rate, sends zero frames during those 7 seconds. The page hasn't repainted, so there's nothing to send.

If you anchor t=0 to the first JPEG you receive, those 7 seconds vanish from the output. The video is too short. Anything you try to align to it later — narration, a Playwright trace overlay, click sounds — is off by 7 seconds and useless.

Fix:

// at recorder.start():
this._startWallMs = performance.now();
this._lastFrameNumber = -1;

// at first onFrame:
if (this._lastFrameNumber === -1) {
  // Backfill slot 0 .. frameNumber-1 with this JPEG.
  // Reasoning: CDP not sending frames means the page didn't change.
  // So the page looked like *this* the whole time.
  for (let i = 0; i < frameNumber; i++) ff.stdin.write(jpeg);
}

The first frame represents not just "now" but "everything since start()." That's the right approximation for a static warm-up. (For an animation that CDP somehow misses, it isn't. I haven't seen this case in practice.)
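Worked through with the numbers from the FontFreeze case, under the assumption of exactly 7 seconds of warm-up at 25 fps (firstFrameCopies is a hypothetical helper, not the package's code):

```typescript
// Hypothetical helper: how many copies of the first JPEG to write so that
// slot 0 of the output corresponds to recorder.start(), not to the first frame.
function firstFrameCopies(
  startWallMs: number,      // performance.now() captured at start()
  firstFrameWallMs: number, // performance.now() when the first JPEG arrived
  fps: number,
): number {
  const frameNumber = Math.floor(((firstFrameWallMs - startWallMs) * fps) / 1000);
  return frameNumber + 1; // slots 0..frameNumber-1 backfilled, plus the frame itself
}

// start() at 1000 ms, first JPEG at 8000 ms: 7 s of warm-up at 25 fps.
const copies = firstFrameCopies(1000, 8000, 25);
console.log(copies, copies / 25); // 176 copies, 7.04 s of video
```

The output starts with roughly 7 seconds of the first frame, so anything aligned to start() stays aligned.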

The same wall-clock anchor matters for audio. If you schedule a click sound at "frame 42 / 25 fps," you've baked in up-to-1/fps of error against real time, because frameCount lags wall-clock between onFrame calls. Use performance.now() - startWallMs and the click lands where the user clicked.
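A small numeric illustration of the difference, with invented timings and hypothetical helper names:

```typescript
// Hypothetical helpers comparing two ways to timestamp an audio cue.
function cueTimeFromFrames(framesWritten: number, fps: number): number {
  return framesWritten / fps; // quantized to the last delivered frame
}

function cueTimeFromWallClock(startWallMs: number, nowMs: number, offsetSec = 0): number {
  return (nowMs - startWallMs) / 1000 + offsetSec;
}

// Assumed timings: frame 42 was delivered at 1680 ms, the click happens at 1715 ms.
const byFrames = cueTimeFromFrames(42, 25);
const byWallClock = cueTimeFromWallClock(0, 1715);

console.log(byFrames, byWallClock); // 1.68 vs 1.715
```

The frame-based timestamp is 35 ms early here. With a variable-rate source that goes quiet between repaints, the gap can approach a full frame period.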

Non-obvious problem #3: encoder speed isn't a quality knob — it's a correctness knob

Once #1 and #2 are fixed, you get the next failure mode for free. Try VP9 CRF 24. Beautiful files, half the size of H.264 at the same visual quality. Run a 90-second recording.

It drifts. Not 32% — small, maybe 2 seconds over 90, but it grows with length.

Here's why. ff.stdin.write(jpeg) returns false once Node's internal write buffer passes its high-water mark, which happens when the kernel pipe buffer is full and ffmpeg can't drain it fast enough. If you honour backpressure, the next write awaits 'drain'. But that backpressure travels backwards through your onFrame queue. CDP keeps producing frames; they pile up in Node's microtask queue waiting for onFrame to return; the timestamp you read inside onFrame (performance.now()) drifts later than reality; your wall-clock-anchored frameNumber grows slower than wall time should make it grow; the video is short again.
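To make the mechanism visible without ffmpeg, the sketch below writes to a deliberately slow Writable standing in for ff.stdin. writeFrame and slowSink are illustrative names, and the tiny highWaterMark is chosen only to trigger backpressure on the first write.

```typescript
import { once } from "node:events";
import { Writable } from "node:stream";

// Writes one frame and, if the stream signals backpressure, waits for 'drain'.
// Returns false when the consumer was behind for this frame.
async function writeFrame(sink: Writable, jpeg: Uint8Array): Promise<boolean> {
  const accepted = sink.write(jpeg);
  if (!accepted) await once(sink, "drain");
  return accepted;
}

// Stand-in for a slow encoder: a 4-byte buffer, 5 ms to consume each chunk.
const slowSink = new Writable({
  highWaterMark: 4,
  write(_chunk, _encoding, done) {
    setTimeout(done, 5);
  },
});

const demo = writeFrame(slowSink, new Uint8Array(8)).then((accepted) => {
  console.log(accepted ? "sink kept up" : "sink fell behind; waited for 'drain'");
  return accepted;
});
```

Counting how often writeFrame returns false during a recording is a cheap way to tell whether the live encoder has enough headroom. Awaiting 'drain' keeps memory bounded, but it is also exactly the wait that delays the next onFrame.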

You can't fix this from the Node side. The ffmpeg process must, on average, encode faster than realtime, with enough headroom to absorb spikes. libx264 -preset ultrafast does 5–10× realtime on a modern laptop CPU. libvpx-vp9 does not, especially on text content that compresses poorly under realtime constraints. Neither does libsvtav1 at any preset that produces a small file.

This sounds like a quality-vs-speed tradeoff. It isn't, because there's a clean way out: two passes.

Two-pass pipeline: a blue first-pass box (live, libx264 ultrafast CRF 18, fixed) flows into a green second-pass box (background, intermediate.mp4 to whatever the user wants, audio mux). The arrow between them is labelled

The first pass exists to never block CDP. It's so wasteful, in bitrate terms, that backpressure is impossible — the file is large because the encoder is fast. The second pass reads a file, not a live stream, so backpressure is a non-issue: a slow encoder just means you wait longer for the final mp4. It can't reach back through time and shorten the recording.
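A rough sketch of what the second pass can look like when driven by hand. The helper names and file names are hypothetical, the VP9 settings are just one example of "whatever the user wants" rather than the package's actual presets, and runSecondPass assumes an ffmpeg binary that includes the requested encoder.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: arguments for transcoding the intermediate file offline.
function secondPassArgs(intermediate: string, output: string, encoderArgs: string[]): string[] {
  return ["-y", "-i", intermediate, ...encoderArgs, output];
}

// Hypothetical helper: runs ffmpeg to completion. A slow encoder only makes
// this promise resolve later; it cannot affect the already-recorded timeline.
function runSecondPass(ffmpegBinary: string, args: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(ffmpegBinary, args, { stdio: "ignore" });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}

// Example: constant-quality VP9 (-b:v 0 lets -crf control quality).
const vp9Args = secondPassArgs("intermediate.mp4", "out.webm", [
  "-c:v", "libvpx-vp9",
  "-crf", "24",
  "-b:v", "0",
]);
console.log(vp9Args.join(" "));
// Later, once the live recording has finished:
//   await runSecondPass("ffmpeg", vp9Args);
```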

This split also means the API can be opinionated where it has to be (the live encoder) and flexible where it can be (the final encoder). playwright-recorder-plus exposes the second pass as preset: "youtube" | "web" or, if you want full control, ffmpegArgs: string[]. The first pass is locked.

I tried not doing it this way. I tried letting users set the live encoder. The number of "my video is shorter than the wall clock" issues that would generate makes me certain it'd be the most-reported bug in the package.

Non-obvious problem #4: the JPEG size is locked by whoever asked first

A subtle one, mentioned briefly. Playwright's screencast server starts on the first client that calls addClient, and the size that first client requests is locked for the lifetime of the session. Subsequent clients silently get the locked size.

The most common way to hit this: context.tracing.start({ screenshots: true }) runs before attachRecorder(), because tracing is also a screencast client. Tracing's fallback size (often 800×450) wins, and the recorder quietly produces 800×450 video despite asking for 1280×720.

Fix: parse the JPEG SOF marker from the first frame, compare against the requested size, throw immediately on mismatch. ~30 lines, runs once per recording, sub-microsecond. Better than shipping an 800×450 file and finding out two days later.
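One way such a check could look is sketched below. This is an illustrative parser, not the package's code: readJpegSize and assertFrameSize are hypothetical names, and it only walks the marker segments up to the first start-of-frame.

```typescript
interface JpegSize { width: number; height: number }

// Returns the dimensions from the first start-of-frame (SOF) segment,
// or null if the data is not a JPEG or no SOF segment is found.
function readJpegSize(jpeg: Uint8Array): JpegSize | null {
  if (jpeg.length < 4 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) return null; // no SOI
  let i = 2;
  while (i + 3 < jpeg.length) {
    if (jpeg[i] !== 0xff) return null;        // lost sync with the marker structure
    const marker = jpeg[i + 1];
    if (marker === 0xff) { i++; continue; }   // fill byte before a marker
    if (marker === 0xda) return null;         // start of scan reached without SOF
    const segmentLength = (jpeg[i + 2] << 8) | jpeg[i + 3];
    // SOF markers are 0xC0..0xCF except DHT (C4), JPG (C8) and DAC (CC).
    const isSof = marker >= 0xc0 && marker <= 0xcf &&
      marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
    if (isSof) {
      if (i + 8 >= jpeg.length) return null;  // truncated segment
      // Layout: FF Cn | length (2) | precision (1) | height (2) | width (2)
      const height = (jpeg[i + 5] << 8) | jpeg[i + 6];
      const width = (jpeg[i + 7] << 8) | jpeg[i + 8];
      return { width, height };
    }
    i += 2 + segmentLength;                   // skip marker plus segment body
  }
  return null;
}

// Throws if the first frame does not match what the recorder asked for.
function assertFrameSize(jpeg: Uint8Array, requested: JpegSize): void {
  const actual = readJpegSize(jpeg);
  if (!actual) throw new Error("first frame is not a parseable JPEG");
  if (actual.width !== requested.width || actual.height !== requested.height) {
    throw new Error(
      `screencast is ${actual.width}x${actual.height}, ` +
      `requested ${requested.width}x${requested.height}`,
    );
  }
}

// Hand-built header: SOI, a 4-byte APP0 segment, then SOF0 declaring 1280x720.
const header = new Uint8Array([
  0xff, 0xd8,
  0xff, 0xe0, 0x00, 0x04, 0x00, 0x00,
  0xff, 0xc0, 0x00, 0x0b, 0x08, 0x02, 0xd0, 0x05, 0x00, 0x03,
]);
console.log(readJpegSize(header)); // { width: 1280, height: 720 }
```

Calling assertFrameSize(header, { width: 800, height: 450 }) throws, which is the early failure the paragraph above argues for.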

What the API ended up looking like

import { attachRecorder } from "playwright-recorder-plus";

const recorder = await attachRecorder(page, {
  path: "out.mp4",
  size: { width: 1280, height: 720 },
  fps: 25,
  preset: "youtube",          // or: ffmpegArgs: ["-c:v", "libsvtav1", ...]
  // autoStart: true (default) — start() called inside attachRecorder
});

// ... drive the page ...

await recorder.stop();
await recorder.finalized;       // wait for second-pass mp4

Pause/resume — needed for tutorial recording where you want to skip a long setup phase:

await recorder.pause();
await page.evaluate(() => waitForPyodide());
await recorder.resume();

Click sounds (or any audio cue), scheduled against wall clock, mixed in during the second pass:

await page.locator(".save-button").click();
recorder.audio("./assets/click.wav", { offset: 0.05 });  // 50 ms after now

For multi-page contexts (popups, target=_blank flows), there's attachRecorderForContext(context) that auto-attaches a recorder to every page that opens.

What I'm not solving

  • Page audio capture. CDP has no API for it. The package provides recorder.audio(path, opts?) for mixing in pre-recorded audio (TTS narration, click SFX), but it cannot capture what the page itself plays. The README points at getDisplayMedia + MediaRecorder injection if you really need it.
  • A unified video processing pipeline. v0.1.0 is "a better recordVideo," not a framework. If you need to transcode, you have ffmpeg.

Try it

pnpm add playwright-recorder-plus

Repo: github.com/MuTsunTsai/playwright-recorder-plus
npm: playwright-recorder-plus

If you've ever filed or thumbed-up Playwright #8683, #12056, #17217, or #31424 — this is built for you.