I built a YouTube downloader where the video bytes never touch my server

#javascript #webdev #performance #showdev

I've been running VidPickr for a few months and the thing that took me longest to get right wasn't the YouTube extraction part. It was deciding where to do the actual mux.

Every other YouTube downloader I looked at does it the same way:

[your browser]  →  "give me this video please"
                            ↓
                     [their server]
                            ↓
              downloads the YouTube streams
                            ↓
                 runs ffmpeg, muxes it
                            ↓
                  sends it back to you

This is fine when you have twelve users. Once you have any traffic, it gets brutal. You're paying bandwidth twice (YouTube → you → user), the server has to keep temp files around while it works, and the whole thing tips over the moment some popular video starts trending.

I tried it that way first, of course. Worked great on my laptop. Then I left it on a small VPS overnight and woke up to a full disk, because temp files weren't getting cleaned up fast enough. That was the moment I started looking for a different shape.

What if the browser just did it itself

YouTube serves videos as separate streams. Video on one URL, audio on another, sometimes a third video stream at a different resolution. The "mux" step is interleaving those into an MP4 container with proper headers.

ffmpeg does that. So does pretty much any decent JS muxer. And modern browsers have WebCodecs now.
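
To make "interleaving" concrete, here is a toy sketch of just the ordering part: merging two timestamp-sorted sample lists into one sequence. The `interleave` name and the `{ timestamp }` sample shape are made up for illustration. A real muxer also has to write the MP4 boxes around the samples, which this skips entirely.

```javascript
// Merge two timestamp-sorted sample lists into one interleaved sequence.
// Ties go to video; each output sample is tagged with the track it came from.
function interleave(videoSamples, audioSamples) {
  const out = [];
  let v = 0;
  let a = 0;
  while (v < videoSamples.length || a < audioSamples.length) {
    const takeVideo =
      a >= audioSamples.length ||
      (v < videoSamples.length &&
        videoSamples[v].timestamp <= audioSamples[a].timestamp);
    out.push(
      takeVideo
        ? { track: 'video', ...videoSamples[v++] }
        : { track: 'audio', ...audioSamples[a++] }
    );
  }
  return out;
}
```

Nothing in that loop needs a server or a native binary, which is the whole reason the browser is a viable place to do it.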

So the shape I ended up with looks like this:

[browser]  →  my API: "what are the stream URLs for this video id?"
[my API]   →  returns JSON: { video_url, audio_url, ...metadata }
[browser]  →  fetches BOTH streams directly from googlevideo.com
[browser]  →  muxes locally with a WebCodecs-based muxer
[browser]  →  saves the file

My server's job in the whole flow is returning around 5 KB of JSON. The actual video never gets close to my infra.
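
As a rough sketch, the first hop could look something like the following. The base URL, the `/streams` path, and the `getStreamUrls` name are placeholders, not the real VidPickr API. Only the `video_url` and `audio_url` fields come from the flow above.

```javascript
// Ask the API for stream URLs. Endpoint and base URL are hypothetical;
// fetchImpl is injectable so the function can be exercised without a network.
async function getStreamUrls(
  videoId,
  { apiBase = 'https://api.example.com', fetchImpl = fetch } = {}
) {
  const res = await fetchImpl(
    `${apiBase}/streams?id=${encodeURIComponent(videoId)}`
  );
  if (!res.ok) throw new Error(`stream lookup failed: HTTP ${res.status}`);

  // Split the two URLs off from whatever other metadata comes back.
  const { video_url, audio_url, ...metadata } = await res.json();
  if (!video_url || !audio_url) {
    throw new Error('response is missing a stream URL');
  }
  return { videoUrl: video_url, audioUrl: audio_url, metadata };
}
```

Everything after this call talks to the stream hosts directly, so this small JSON response is the only traffic the API has to serve per download.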

The pipeline, more or less

The download/mux loop ends up looking something like this:

// muxer and streamSegments are set up elsewhere;
// these callbacks feed chunks into the muxer as they arrive
function onVideoChunk(chunk, meta) {
  muxer.addVideoChunk(chunk, meta);
}
function onAudioChunk(chunk, meta) {
  muxer.addAudioChunk(chunk, meta);
}

// fetch both streams in parallel; resolves once both are fully consumed
await Promise.all([
  streamSegments(videoUrl, onVideoChunk),
  streamSegments(audioUrl, onAudioChunk),
]);

// close out the container, then hand the file to the user
muxer.finalize();
saveAs(muxer.output(), 'video.mp4');

Real code has range requests, progress tracking, error recovery on flaky chunks, all the boring stuff. But the shape is exactly that.
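
For a feel of the "boring stuff", here is a minimal sketch of range requests with retries and a progress callback. It assumes the total size is already known and that the host honors `Range` headers. The helper names, the 1 MiB chunk size, and the retry count are all arbitrary choices for the example.

```javascript
// Split a stream of known size into inclusive [start, end] byte ranges.
function* byteRanges(totalBytes, chunkSize) {
  for (let start = 0; start < totalBytes; start += chunkSize) {
    yield [start, Math.min(start + chunkSize, totalBytes) - 1];
  }
}

// Fetch one range; on a network error or non-2xx status, try again
// up to `retries` more times before giving up with the last error.
async function fetchRange(url, start, end, { retries = 3, fetchImpl = fetch } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(url, {
        headers: { Range: `bytes=${start}-${end}` },
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return new Uint8Array(await res.arrayBuffer());
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Walk the stream in order, handing each chunk plus a 0..1 progress
// fraction to the caller.
async function streamRanges(url, totalBytes, onChunk, opts = {}) {
  const chunkSize = opts.chunkSize ?? 1024 * 1024; // 1 MiB, arbitrary
  for (const [start, end] of byteRanges(totalBytes, chunkSize)) {
    const bytes = await fetchRange(url, start, end, opts);
    onChunk(bytes, (end + 1) / totalBytes);
  }
}
```

A production version would probably add backoff between retries and fetch a few ranges concurrently, but sequential ranges keep the chunks in order for free.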

The annoying bits

A few things bit me that I would not have guessed up front.

Init segments are separate from data segments. YouTube's segmented streams come with a tiny init segment that has the codec params, and you can't just concat it onto the front of the data. You have to feed it to the muxer first, then start streaming the data chunks in. I spent two evenings convinced the muxer was busted before I figured this out.
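
One way to make that ordering hard to get wrong is a small gate in front of the muxer that holds data chunks back until the init segment has gone through. This is only a sketch: `sink` stands in for whatever consumes the bytes, and its `init()` and `data()` methods are hypothetical names, not a real muxer API.

```javascript
// Guarantees the init segment reaches the sink before any data chunk,
// even if the data fetch happens to win the race.
function createOrderedFeed(sink) {
  let initDone = false;
  const pending = [];
  return {
    pushInit(bytes) {
      if (initDone) throw new Error('init segment already delivered');
      sink.init(bytes);
      initDone = true;
      // Flush anything that arrived early, in arrival order.
      for (const chunk of pending.splice(0)) sink.data(chunk);
    },
    pushData(bytes) {
      if (initDone) sink.data(bytes);
      else pending.push(bytes);
    },
  };
}
```

With one feed per stream, the init and data fetches can run in parallel without the consumer ever seeing them out of order.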

CORS. googlevideo.com does not return CORS headers for arbitrary origins. Your two options are to run a proxy (which defeats the whole point of the architecture) or to use a session-bound URL that the browser is pre-authorized for. I ended up doing the second thing.

WebCodecs support is uneven. Chrome and Edge are great. Safari has partial support that keeps improving. Firefox went without it for a long time and only shipped it on desktop in version 130. So I have a slower WASM-based fallback path for everything that isn't Chromium.
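
Choosing a path can be done with plain feature detection rather than user-agent sniffing. The sketch below is one possible check, not the actual VidPickr logic: which interfaces matter depends on the muxer, and testing for both `EncodedVideoChunk` and `EncodedAudioChunk` is an assumption made here so that video-only support counts as "not enough".

```javascript
// Pick a mux path based on which WebCodecs interfaces exist.
// `scope` defaults to globalThis so the check also runs outside a browser.
function pickMuxPath(scope = globalThis) {
  const hasWebCodecs =
    typeof scope.EncodedVideoChunk === 'function' &&
    typeof scope.EncodedAudioChunk === 'function';
  return hasWebCodecs ? 'webcodecs' : 'wasm';
}
```

If the pipeline also decodes, `VideoDecoder.isConfigSupported()` can confirm that a specific codec configuration works before committing to the fast path.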

Memory. A 4K hour-long video muxes out to something close to 10 GB. You cannot hold that in RAM and hope. The fix is streaming the muxer's output to disk as you go, using the File System Access API where it exists and chunked downloads where it doesn't.
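
The File System Access half of that might be sketched as below. `showSaveFilePicker()`, `createWritable()`, `write()`, and `close()` are the real browser calls; the `openDiskSink` wrapper and its `{ write, close }` shape are invented for the example. Returning `null` leaves the fallback strategy to the caller.

```javascript
// Returns a { write, close } sink backed by a file the user picks, or null
// when the File System Access API is unavailable.
// `scope` defaults to globalThis so the function can be tested with a stub.
async function openDiskSink(suggestedName, scope = globalThis) {
  if (typeof scope.showSaveFilePicker !== 'function') return null;

  // In a browser this must be triggered by a user gesture, e.g. a click.
  const handle = await scope.showSaveFilePicker({ suggestedName });
  const writable = await handle.createWritable();

  let bytesWritten = 0;
  return {
    // Each write lands at the stream's current position, so only one
    // chunk needs to be in memory at a time.
    async write(bytes) {
      await writable.write(bytes);
      bytesWritten += bytes.byteLength;
      return bytesWritten;
    },
    close: () => writable.close(),
  };
}
```

One caveat: some container layouts need to go back and patch sizes near the start of the file once the end is known. The writable stream supports positioned writes for that, but the sketch sticks to sequential output.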

What it gets you

Once this was working, the operational story flipped:

The server does basically nothing during a download. CPU is flat.
I cannot log what people downloaded even if I wanted to, because the URLs don't pass through me.
My bandwidth bill is for the API JSON, which is rounding-error money.
"Scale" is whatever the user's laptop can handle. One user, ten thousand users, my infra does not notice.

The tradeoff is that it's a worse experience on a phone or an old ThinkPad. I keep a server-side path around as a paid fallback for people who actually need that.

If you want to poke at it: vidpickr.com. There's also a REST API at api.vidpickr.com for the cases where you do actually want server-side extraction (cron jobs, agent pipelines, that kind of thing).

Happy to answer anything in the comments about the WebCodecs side or the YouTube extraction side. Curious what the worst gotcha is that anyone else has hit doing browser-native media work. Mine was definitely the init-segment thing.
