One thing I’ve always been curious about regarding YouTube Premium is how you can download videos and access them offline. But YouTube doesn’t just hand you an .mp4 or similar file. Instead, you need to log in again, head to your downloads section, and access them from there.
The big questions are: where are these videos stored? How does that storage actually work? And more interestingly, how can we implement a (much simpler) version of this ourselves?
Let’s break it down into a few simple steps:
- Splitting the video into smaller chunks for better storage and streaming
- Storing the chunks in IndexedDB — with some encoding and extras
- Retrieving and playing the video offline (after decryption and some more extras)
That’s it!
The Core Mechanism
YouTube is hands down one of the most badass services out there. Explaining what’s happening behind the scenes (not even counting the algorithm and monetization side) is really difficult. But here’s the simplified version:
A content creator uploads a video — let’s say in 4K quality. YouTube’s internal services automatically convert that video into multiple qualities and store them somewhere (we don’t know exactly where). So although the original was uploaded in high quality, lower-quality versions are also saved. That makes sense — different devices, bandwidths, and user conditions.
Once it’s stored, users can access it. When you play a video, you’ve probably noticed that a portion ahead of where you’re watching is already downloaded (or, more precisely, buffered).
YouTube follows a few rules here:
- Even if the video is ~54 minutes long, it doesn’t buffer the whole thing.
- If you’re watching at minute 2, buffering till minute 40 is unnecessary — it may only buffer up to minute 6.
- This prevents wasting resources and allows flexible streaming based on connection quality.
- This logic also works great for live streams.
To provide the best user experience, YouTube breaks videos into much smaller chunks. I don’t know the exact logic — maybe a 1-hour video becomes 60 one-minute chunks or 120 thirty-second chunks, or even depends on size/length. But this chunking definitely happens, and we’ll replicate that later.
Also, we can’t just pass an MP4 file to a <video> tag and call it a day. If we’re streaming in chunks, we need a way to feed these into the player. This is where the MediaSource API comes in (which YouTube uses too). It gives us full control over playback, allows us to dynamically push video chunks, adjust bitrate, etc.
So far, we’ve outlined a simplified view of YouTube’s logic. (These are my findings — happy to hear yours too!)
The vision
We’re not YouTube, but we’re curious enough to build something similar.
We won’t deal with video streaming here. We’ll focus solely on when a Premium user downloads a video completely for offline use. But as you know, YouTube doesn’t give you a downloadable MP4. Instead, it stores the video (encrypted) in something like IndexedDB.
Videos are chunked and stored in IndexedDB, which supports storing Blobs and ArrayBuffers — perfect for saving video data.
When it’s time to play the video, these chunks are aggregated using a reduce function and fed into a <video> tag or, for more advanced use, MediaSource.
const chunk = {
  videoId: "Prince Rouge",
  chunkIndex: 5,
  quality: "720p",
  data: Blob // or ArrayBuffer
}
Later, to replay the video, this ArrayBuffer or Blob will be reassembled and fed into the MediaSource. Note that here, chunkIndex is 5 — since indexes start at 0, that makes this the sixth of n total chunks of the full video.
Why these tools?
Why IndexedDB?
For browser-side storage, we have:
- localStorage — limited to ~5MB
- sessionStorage — also small, not shared across tabs
- cookies — meant for other use cases

That leaves IndexedDB as the only viable choice for large binary storage.
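As a quick sanity check, you can ask the browser how much quota that storage actually gets. A minimal sketch using the standard Storage API (run it in a module or the DevTools console; the numbers vary by browser and free disk space):

// Ask the browser for a storage estimate (supported in modern browsers)
if (navigator.storage?.estimate) {
  const { usage, quota } = await navigator.storage.estimate();
  console.log(`Used ${(usage / 1e6).toFixed(1)} MB of ~${(quota / 1e6).toFixed(0)} MB`);
}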
Why Blob & ArrayBuffer?
You need to deal with file objects — that’s where Blob comes in. ArrayBuffer acts as a bridge (not exactly, but close) between Blob and MediaSource. You convert a blob to a buffer, then feed it to MediaSource or <video>.
const res = await fetch('video.mp4');
const blob = await res.blob();

// And later: read the Blob back as an ArrayBuffer
const reader = new FileReader();
reader.onload = () => {
  const buffer = reader.result; // ArrayBuffer, ready to be fed onward
};
reader.readAsArrayBuffer(blob);
// (Modern shortcut: const buffer = await blob.arrayBuffer();)
Why MediaSource?
The basic <video> tag only works with a complete file URL. But with MediaSource, we can:
- Add video data chunk by chunk
- Buffer dynamically
- Load from memory (or disk), workers, databases
- Stream in real time
- Build fully custom video players
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  // Append chunks manually (fetchChunk stands in for your own chunk loader)
  fetchChunk().then((chunk) => {
    sourceBuffer.appendBuffer(chunk);
  });
  // Note: appendBuffer is async, so wait for the buffer's "updateend" event
  // before appending the next chunk
});
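For the demo below, though, we’ll take the simpler route and hand the <video> tag a Blob URL; MediaSource really pays off when you want progressive, chunk-by-chunk playback.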
That's it.
Orchestration — Putting It All Together
You can see a full sample app Here — it’s bootstrapped with Vite.
To keep it simple:
- The user selects a video (they don’t upload it, we just store it into the internal browser db)
- We split it into 1MB chunks
- Save each chunk in IndexedDB with a key
- Reconstruct and play it using <video>
Steps:
- Build UI (Tailwind-based, so skipping details)
- Chunk the video
- Save to IndexedDB
- Retrieve and feed to <video> on play
- (Optional) Remove from IndexedDB
There’s also a debug button to show database content using a <pre> tag (or check the Application tab in DevTools).
Chunking
We have a function that reads the file from the input, turns it into an ArrayBuffer, and slices it into 1MB chunks. A simple while loop handles the chunking.
const arrayBuffer = await videoFile.arrayBuffer();
const chunkSize = 1024 * 1024; // 1MB chunks
const chunks = [];
let offset = 0;
while (offset < arrayBuffer.byteLength) {
  const size = Math.min(chunkSize, arrayBuffer.byteLength - offset);
  const chunk = arrayBuffer.slice(offset, offset + size);
  chunks.push(chunk);
  offset += size;
}
Storing
We split the video into:
- Metadata (e.g. filename, size, total chunks)
- Actual chunks
Here's a sample metadata object:
const metadata = {
  id: videoId,
  title: videoFile.name,
  mimeType: videoFile.type,
  size: arrayBuffer.byteLength,
  chunkCount: chunks.length, // this is important
  dateAdded: new Date().toISOString()
};
Now we get to the part where we store data in IndexedDB. Here, you can either use an ORM-like library such as idb, or work with it directly. Since we’re not launching Apollo here, I chose not to use any library.
First, we need to create a database, and then reuse that same instance to run queries on it — whether it's saving data, reading, deleting, or anything else.
let dbInstance = null;

// Initialize the database once and store the connection
const initDB = () => {
  return new Promise((resolve, reject) => {
    if (dbInstance) {
      // Using existing database connection
      resolve(dbInstance);
      return;
    }
    const request = indexedDB.open('VideoStorageDB', 1);
    request.onerror = (event) => {
      console.error("IndexedDB error:", event.target.error);
      reject(event.target.error);
    };
    request.onupgradeneeded = (event) => {
      // Upgrading database schema
      const db = event.target.result;
      // Create the metadata store
      if (!db.objectStoreNames.contains('metadata')) {
        db.createObjectStore('metadata', { keyPath: 'id' });
      }
      // Create the chunks store
      if (!db.objectStoreNames.contains('chunks')) {
        db.createObjectStore('chunks', { keyPath: 'id' });
      }
    };
    request.onsuccess = (event) => {
      dbInstance = event.target.result;
      // Handle connection errors
      dbInstance.onerror = (event) => {
        console.error("Database error:", event.target.error);
      };
      resolve(dbInstance);
    };
  });
};
We use a simple singleton pattern to initialize and reuse the database instance.
const storeCompleteVideo = async (metadata, chunks) => {
  try {
    const db = await initDB();
    return new Promise((resolve, reject) => {
      const transaction = db.transaction(['metadata', 'chunks'], 'readwrite');
      transaction.onerror = (event) => {
        reject(event.target.error);
      };
      transaction.oncomplete = () => {
        console.log(`Video ${metadata.id} stored successfully with all ${chunks.length} chunks`);
        resolve(metadata);
      };
      const metadataStore = transaction.objectStore('metadata');
      const chunksStore = transaction.objectStore('chunks');
      // INJA 1
      metadataStore.put(metadata);
      // INJA 2
      for (let i = 0; i < chunks.length; i++) {
        const chunkData = {
          id: `${metadata.id}_chunk_${i}`,
          videoId: metadata.id,
          chunkIndex: i,
          data: chunks[i] // ArrayBuffer chunk
        };
        chunksStore.put(chunkData);
      }
    });
  } catch (error) {
    console.error('Error storing video:', error);
    throw error;
  }
};
Alright, in the code, I marked two spots with comments: INJA 1 and INJA 2 (I always debug using this word 😄). Section 1 is pretty straightforward — we’re just saving the metadata. Nothing fancy — it’s a simple object. The only golden point here is the chunkCount, which we’ll need later.
In section 2, things get a bit more interesting. Here, we loop through the video chunks we previously created and store each one as the object I showed earlier. The data is already in ArrayBuffer format — we just need to save it. Each chunk has its own chunkIndex, which we’ll need later when reconstructing the video for playback.
If everything goes smoothly, our data gets stored in IndexedDB.
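To make the flow concrete, here’s roughly how the pieces connect from the file input. This is just a sketch; handleFileSelect and the crypto.randomUUID() id choice are my own naming, not from the sample app:

// Hypothetical wiring of the steps above (the names here are mine)
const handleFileSelect = async (videoFile) => {
  const arrayBuffer = await videoFile.arrayBuffer();

  // 1. Chunk it (the while loop from the "Chunking" section)
  const chunkSize = 1024 * 1024;
  const chunks = [];
  for (let offset = 0; offset < arrayBuffer.byteLength; offset += chunkSize) {
    chunks.push(arrayBuffer.slice(offset, offset + chunkSize)); // slice clamps at the end
  }

  // 2. Describe it
  const metadata = {
    id: crypto.randomUUID(),
    title: videoFile.name,
    mimeType: videoFile.type,
    size: arrayBuffer.byteLength,
    chunkCount: chunks.length,
    dateAdded: new Date().toISOString()
  };

  // 3. Persist metadata + chunks in one transaction
  await storeCompleteVideo(metadata, chunks);
  return metadata.id;
};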
At this point, we’ve basically covered 90% of the journey. What’s left is retrieving that data and feeding it into the <video> tag. And again — our player is just a plain video tag that we feed data into. Nothing fancy or custom here.
Retrieving and Playing
We have a readVideoFromIndexedDB function:
// Get a specific chunk
const getVideoChunk = async (chunkId) => {
  try {
    const db = await initDB();
    return new Promise((resolve, reject) => {
      const transaction = db.transaction(['chunks'], 'readonly');
      const store = transaction.objectStore('chunks');
      const request = store.get(chunkId);
      request.onsuccess = () => {
        resolve(request.result);
      };
      request.onerror = (event) => {
        console.error(`Error getting chunk ${chunkId}:`, event.target.error);
        reject(event.target.error);
      };
    });
  } catch (error) {
    console.error("Error in getVideoChunk:", error);
    throw error;
  }
};

// Read all chunks for a video and combine them
const readVideoFromIndexedDB = async (videoId) => {
  try {
    // Get the metadata
    const metadata = await getVideoMetadata(videoId);
    if (!metadata) {
      throw new Error(`Video metadata not found for ID: ${videoId}`);
    }
    // Get all chunks in sequence to ensure correct order
    const chunks = [];
    for (let i = 0; i < metadata.chunkCount; i++) {
      const chunkId = `${videoId}_chunk_${i}`;
      const chunk = await getVideoChunk(chunkId);
      if (!chunk) {
        throw new Error(`Chunk ${i} missing for video ${videoId}`);
      }
      chunks.push(chunk.data);
    }
    // INJA 1
    const totalLength = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
    const combinedArray = new Uint8Array(totalLength);
    let offset = 0;
    // INJA 2
    for (const chunk of chunks) {
      combinedArray.set(new Uint8Array(chunk), offset);
      offset += chunk.byteLength;
    }
    return {
      data: combinedArray.buffer,
      type: metadata.mimeType || 'video/mp4'
    };
  } catch (error) {
    console.error("Error reading video:", error);
    throw error;
  }
};
Alright, the code's getting a bit long, but it's still the same stuff we've already gone over. First, we call the readVideoFromIndexedDB function — this is the one responsible for fetching the video, combining all the chunks in order, and passing the result to the <video> tag. And again, you’ll see two INJA comments here — those are explained in more detail below.
We start by reading the metadata to get general info about the video. We use the chunkCount property to know how many chunks the video has in total — we’ll use that in a loop to fetch them one by one.
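By the way, the getVideoMetadata helper used above isn’t shown in the listing; it’s presumably a symmetric read against the metadata store, something like this sketch:

// Sketch: mirrors getVideoChunk, but against the metadata store
const getVideoMetadata = async (videoId) => {
  const db = await initDB();
  return new Promise((resolve, reject) => {
    const transaction = db.transaction(['metadata'], 'readonly');
    const request = transaction.objectStore('metadata').get(videoId);
    request.onsuccess = () => resolve(request.result);
    request.onerror = (event) => reject(event.target.error);
  });
};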
Now, about ArrayBuffer: it’s a low-level binary data structure (according to MDN). When you create one, its size is fixed — and you can’t read or write its bytes directly. To work with it, you need a so-called “view”, which is exactly what Uint8Array provides. You can think of it as an interface that helps us handle the raw data more easily.
I hope I managed to explain that well — it was a bit tricky for me to grasp at first too 😅.
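A tiny standalone example of that buffer/view relationship:

// A fixed-size, 4-byte buffer; its bytes are only reachable through a view
const buf = new ArrayBuffer(4);
const view = new Uint8Array(buf);
view[0] = 255;               // writing through the view mutates the buffer
console.log(view);           // Uint8Array(4) [255, 0, 0, 0]
console.log(buf.byteLength); // 4, fixed at creation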
Once combined:
const { data, type } = await readVideoFromIndexedDB(videoId);
const blob = new Blob([data], { type });
const url = URL.createObjectURL(blob);
video.src = url;
Well, this is where everything comes together. We take the buffer we created earlier, turn it into a Blob, generate a URL from it, and then set that URL as the src for the <video> tag.
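One cleanup detail worth knowing: an object URL keeps its Blob alive until it’s revoked, so release it once the player no longer needs it:

// Later, when switching videos or tearing down the player:
URL.revokeObjectURL(url);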
If everything works — we now have an offline-capable video system using IndexedDB.
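And the optional “remove from IndexedDB” step from the list earlier is the same transaction pattern in reverse: a sketch, assuming the stores from initDB and the getVideoMetadata helper above:

// Delete a video's metadata and all of its chunks (sketch)
const deleteVideo = async (videoId) => {
  const db = await initDB();
  const metadata = await getVideoMetadata(videoId);
  if (!metadata) return;
  return new Promise((resolve, reject) => {
    const transaction = db.transaction(['metadata', 'chunks'], 'readwrite');
    transaction.oncomplete = () => resolve();
    transaction.onerror = (event) => reject(event.target.error);
    transaction.objectStore('metadata').delete(videoId);
    const chunksStore = transaction.objectStore('chunks');
    for (let i = 0; i < metadata.chunkCount; i++) {
      chunksStore.delete(`${videoId}_chunk_${i}`);
    }
  });
};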
Final Thoughts
This post took a lot of time to write — and probably a lot to read. I skipped several advanced topics that I don’t yet know how to handle (e.g. encryption is only mentioned, not implemented).
Areas for improvement:
- No thumbnail support yet
- YouTube stores audio and video separately; we keep them together
- We combine all data upfront — so there’s no progressive buffering/preloading
- Only works for user-uploaded files — could be extended to work with URLs
- UI needs work 😅

Last Words

It’s good to be curious. Lately, I’ve been poking around more, trying to understand how things really work under the hood.
This article idea had been in my head for over 3 years, but I kept putting it off. It was always fascinating to me. The actual implementation isn’t complex (it looks simple, at least) — but the potential is huge. YouTube uses it as part of their paid offering 😁
I hope this article was helpful and taught you something new.
Feel free to reach out on LinkedIn!