aykhlf yassir

URL to Pixel: How you got Rickrolled

A complete sequential trace of every system layer traversed when a user types http://www.youtube.com/watch?v=dQw4w9WgXcQ into Chrome and presses Enter — from the first kernel interrupt to the final GPU composited frame.


Layer ① — Client


Phase 1 — Input Processing

KEY_UP → URI parse

The browser's UI thread intercepts the KEY_UP event on Enter. The raw string http://www.youtube.com/watch?v=dQw4w9WgXcQ is passed to Chrome's embedded GURL parser.

1.1 Event dispatch (UI thread)

The OS delivers a WM_KEYUP (Windows) or KeyRelease (X11) event to the browser process's message pump, which dispatches it to the Browser's Main (UI) thread — the thread that owns the Omnibox.

1.2 URI structural parse (GURL)

GURL splits the string into:

  • scheme: http
  • host: www.youtube.com
  • path: /watch
  • query: v=dQw4w9WgXcQ

Because the string parses cleanly as a URI with a recognised scheme and host, it is not an Omnibox search query — routing to the search engine is bypassed.
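
The structural split can be sketched with Python's stdlib parser standing in for GURL — an illustration of the same decomposition, not Chrome's actual code:

```python
from urllib.parse import urlsplit

# Stdlib stand-in for Chrome's GURL structural parse.
parts = urlsplit("http://www.youtube.com/watch?v=dQw4w9WgXcQ")

assert parts.scheme == "http"
assert parts.hostname == "www.youtube.com"
assert parts.path == "/watch"
assert parts.query == "v=dQw4w9WgXcQ"
```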

1.3 Search vs. navigate decision (Omnibox)

If the parsed scheme is http, https, ftp, file, or a registered custom scheme, the input is treated as a navigation. Any ambiguous string (no TLD, embedded spaces) would instead be forwarded to the default search engine.


Phase 2 — HSTS Enforcement & Scheme Upgrade

307 internal redirect

Before any network I/O, Chrome consults its HSTS preload list (compiled into the Chromium binary, with dynamically learned HSTS entries cached locally alongside it). youtube.com is on this list.

2.1 Preload list lookup (HSTS)

Chrome walks a prefix tree over the eTLD+1 of the host (youtube.com). The entry signals includeSubDomains, so www.youtube.com is also covered. Stored policy fields: max-age, includeSubDomains, preload.

2.2 Scheme rewrite (307 Internal)

Chrome synthesises a 307 Internal Redirect entirely in-process — no bytes leave the machine. The URL is rewritten to https://www.youtube.com/watch?v=dQw4w9WgXcQ. This eliminates the plaintext HTTP request that a MITM could otherwise intercept before the server's own 301 redirect to HTTPS arrived.

2.3 MITM downgrade prevention

Without HSTS, an attacker on the local network could intercept the initial plaintext http:// request, impersonate the server, serve a spoofed page, and only later redirect to HTTPS — the user would never notice. HSTS eliminates the plaintext window entirely.
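
A toy model of the preload check and scheme rewrite. The real list is a compiled trie inside Chromium, and eTLD+1 extraction uses the full Public Suffix List; both are crudely approximated here for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical miniature preload list; the real one ships inside Chromium.
HSTS_PRELOAD = {"youtube.com": {"include_subdomains": True}}

def registrable_domain(host: str) -> str:
    # Crude eTLD+1 approximation — Chrome uses the Public Suffix List.
    return ".".join(host.split(".")[-2:])

def upgrade_scheme(url: str) -> str:
    parts = urlsplit(url)
    entry = HSTS_PRELOAD.get(registrable_domain(parts.hostname))
    covered = entry and (parts.hostname == registrable_domain(parts.hostname)
                         or entry["include_subdomains"])
    if parts.scheme == "http" and covered:
        # The synthesised "307 Internal Redirect": rewrite, no network I/O.
        return urlunsplit(("https",) + tuple(parts[1:]))
    return url

assert upgrade_scheme("http://www.youtube.com/watch?v=dQw4w9WgXcQ") == \
       "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```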


Phase 3 — Inter-Process Communication (IPC)

UI thread → Network thread

The UI thread cannot perform blocking network I/O directly. It packages the navigation intent into an IPC message and posts it asynchronously to the Network Service, which runs as a separate sandboxed process in modern Chrome.

3.1 Navigation request serialisation (Mojo IPC)

Chrome uses the Mojo IPC framework. The UI thread creates a network::ResourceRequest struct and serialises it over a Mojo message pipe to the Network Service. Bulk payloads travel over Mojo data pipes (shared-memory ring buffers), and platform handles such as file descriptors can be transferred directly, avoiding data copies.

3.2 Network Service sandbox

The Network Service runs in a restricted sandbox (no filesystem access, limited syscalls via seccomp-BPF). It owns all sockets, DNS resolution, and TLS. Compromise of the Renderer or UI process does not expose raw socket access — only the Network Service can read/write encrypted bytes.


Phase 4 — DNS Resolution — Cache Traversal

Browser → OS → hosts file

Chrome's Network Service walks a three-tier cache hierarchy before issuing any external DNS packet. Each tier is a TTL-governed LRU cache.

4.1 Browser DNS cache (Tier 1)

Chrome maintains its own in-process DNS cache, independent of the OS. Keyed by hostname + record type. chrome://net-internals/#dns exposes its contents. On a cache hit, the IP is returned immediately; the A/AAAA record TTL governs expiry.

4.2 OS stub resolver — getaddrinfo (Tier 2)

On a miss, Chrome calls getaddrinfo() (POSIX) or DnsQuery() (Win32). The OS checks its own DNS cache (systemd-resolved or the legacy nscd on Linux, the DNS Client service on Windows). It also reads /etc/hosts (POSIX) or C:\Windows\System32\drivers\etc\hosts for static overrides.

4.3 Hosts file override (Tier 3)

127.0.0.1 localhost is the canonical example. Entries here short-circuit all network lookups. Malware often poisons this file to redirect domains — it takes effect before any DNS packet is emitted.


Phase 5 — DNS Resolution — Recursive Hierarchy

Stub → Root → TLD → Authoritative

On a full cache miss, the OS stub resolver emits a UDP datagram to the configured recursive resolver (e.g., 8.8.8.8). If that resolver also lacks the record, it traverses the global DNS tree.

5.1 DNS query datagram (UDP)

The stub resolver constructs a 12-byte DNS header plus the QNAME wire format: \x03www\x07youtube\x03com\x00, QTYPE A (0x0001), QCLASS IN (0x0001). Sent as a single UDP datagram to port 53 of the configured resolver.
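
The wire format above can be reproduced with a few lines of stdlib struct packing. This builds the datagram only; actually sending it as UDP to port 53 is omitted:

```python
import struct

def build_dns_query(hostname: str, txid: int = 0x1234) -> bytes:
    """Build the 12-byte DNS header plus QNAME/QTYPE/QCLASS question."""
    # Header: ID, flags (RD=1 requests recursion), QDCOUNT=1, rest zero.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

pkt = build_dns_query("www.youtube.com")
assert pkt[12:29] == b"\x03www\x07youtube\x03com\x00"  # the QNAME wire format
```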

5.2 Root → TLD → Authoritative traversal (Recursive)

If the recursive resolver (e.g. Google Public DNS at 8.8.8.8) lacks a cached answer:

  1. It queries one of the 13 root server clusters (A–M.ROOT-SERVERS.NET) for a referral to the .com TLD servers.
  2. It queries a Verisign TLD server for the NS records of youtube.com.
  3. It queries Google's authoritative nameservers (ns1–ns4.google.com) for the A/AAAA record of www.youtube.com.

5.3 DNSSEC validation (optional)

Google's public resolvers support DNSSEC. The resolver validates the chain of trust from the root KSK → the .com ZSK → the zone's ZSK → the RRset signature. An invalid signature causes the resolver to return SERVFAIL rather than a spoofed IP. (As of writing, youtube.com does not publish DNSSEC records, so no validation actually occurs for this particular lookup.)

5.4 A/AAAA record returned

Google's authoritative servers return an Anycast VIP (Virtual IP) belonging to Google's global load-balancing fabric — a Google Front End (GFE). The TTL is typically 300 seconds. The recursive resolver caches the answer and returns it to the OS, which caches it and returns it to Chrome.


Layer ② — Network Transit


Phase 6 — Socket Allocation & Transport Handshake

Socket alloc, QUIC/TCP handshake

Chrome prefers HTTP/3 over QUIC (UDP-based) when connecting to Google services. It remembers QUIC availability via an Alt-Svc header from previous sessions. If QUIC is blocked, it falls back to TCP/TLS.

6.1 QUIC 0-RTT or 1-RTT handshake

QUIC multiplexes TLS 1.3 and transport into a single UDP-based protocol. On a known server (cached session ticket), Chrome attempts a 0-RTT handshake: it sends the ClientHello + HTTP request data in the very first UDP datagram. The server can respond immediately, saving one full round-trip versus TCP+TLS.

6.2 TCP three-way handshake (fallback)

If QUIC is unavailable (UDP blocked, no cached session): the OS allocates a socket FD, binds an ephemeral source port (e.g., 54321), and sends a TCP SYN to the GFE's Anycast IP on port 443. The server responds with SYN-ACK; the client sends ACK. The connection is established after one RTT.

6.3 TLS 1.3 handshake (TCP path)

Immediately after the TCP ACK, the client sends a ClientHello containing:

  • Supported cipher suites: TLS_AES_128_GCM_SHA256, TLS_CHACHA20_POLY1305_SHA256
  • An ECDH key share (X25519 curve)
  • SNI extension: www.youtube.com
  • ALPN extension: h2

The server responds with ServerHello + Certificate + CertificateVerify + Finished. The client verifies the cert chain against the Chrome Root Store (Chrome ships its own root store rather than relying on the OS or Mozilla's). A symmetric session key is derived via ECDHE (Elliptic Curve Diffie-Hellman Ephemeral). Total handshake overhead: 1 RTT.
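
The client-side knobs — TLS 1.3, ALPN, SNI, mandatory chain verification — map directly onto Python's stdlib ssl module. This sketch only configures a context; no connection is made:

```python
import ssl

# Configure the parameters the ClientHello advertises; pairing this
# context with socket.create_connection() would drive a real handshake.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3   # force the 1-RTT handshake
ctx.set_alpn_protocols(["h2", "http/1.1"])     # ALPN preference list

# SNI is supplied when wrapping the socket:
#   sock = ctx.wrap_socket(raw, server_hostname="www.youtube.com")
assert ctx.verify_mode == ssl.CERT_REQUIRED    # cert chain is always checked
```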

6.4 Cipher negotiation & symmetric key (AES-NI)

The agreed cipher is typically AES-128-GCM. On modern x86 CPUs, the AES-NI instruction set executes each AES round as a single opcode, achieving ~40 Gbps throughput per core. GCM mode provides both encryption and a Message Authentication Code (MAC) in one pass.


Phase 7 — HTTP Request Construction & Transmission

GET /watch HEADERS frame

The Network Service formats an HTTP/2 HEADERS frame (or HTTP/3 HEADERS frame over QUIC) containing the full request, then encrypts and transmits it.

7.1 Header compression — HPACK/QPACK

HTTP/2 uses HPACK to compress headers. Common headers like :method: GET, :scheme: https are represented as 1-byte indices into a static table. Custom headers like Cookie are Huffman-encoded and stored in a dynamic table for future requests on the same connection. This reduces header overhead from ~800 bytes to ~100 bytes.
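
The single-byte static-table encoding can be shown with a handful of RFC 7541 entries. The real table has 61 entries; dynamic-table and Huffman handling are omitted here:

```python
# A few entries of the RFC 7541 static table (indices 1-7).
STATIC_TABLE = {
    (":authority", ""): 1,
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
}

def encode_indexed(name: str, value: str) -> bytes:
    """Indexed Header Field: a leading 1 bit, then the 7-bit table index."""
    return bytes([0x80 | STATIC_TABLE[(name, value)]])

assert encode_indexed(":method", "GET") == b"\x82"    # one byte, not 13
assert encode_indexed(":scheme", "https") == b"\x87"
```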

7.2 HEADERS frame structure

The HEADERS frame contains:

  • :method GET
  • :path /watch?v=dQw4w9WgXcQ
  • :authority www.youtube.com
  • :scheme https
  • user-agent (Chrome version string)
  • accept-encoding: br, gzip
  • cookie: SID=...; HSID=...; SSID=... (YouTube session tokens)

Stream ID is 1 (first client-initiated stream).
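
The 9-byte frame header that wraps this payload packs as follows — a sketch of the wire layout, not a full HTTP/2 implementation:

```python
import struct

def frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    """Pack the HTTP/2 frame header: 24-bit length, 8-bit type,
    8-bit flags, 31-bit stream ID (top bit reserved)."""
    return (struct.pack("!I", length)[1:]          # 24-bit length
            + bytes([ftype, flags])
            + struct.pack("!I", stream_id & 0x7FFFFFFF))

# HEADERS frame (type 0x1) with END_HEADERS flag (0x4) on stream 1:
hdr = frame_header(length=64, ftype=0x1, flags=0x4, stream_id=1)
assert hdr == b"\x00\x00\x40\x01\x04\x00\x00\x00\x01"
assert len(hdr) == 9
```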

7.3 TLS record layer encryption

The HEADERS frame bytes are passed to BoringSSL's SSL_write(). They are chunked into TLS records (max 16 KB each), each prepended with a 5-byte header (content type, version, length) and appended with a 16-byte GCM authentication tag. The ciphertext is written to the kernel's TCP send buffer via write() / sendmsg().


Phase 8 — Network Transit & BGP Routing

BGP routing, PoP ingress

Encrypted IP packets traverse the public internet, routed by BGP (Border Gateway Protocol), before arriving at a Google Point of Presence (PoP).

8.1 Autonomous system routing (BGP)

Each internet router makes next-hop decisions by looking up the destination IP in its BGP routing table. The table is built by exchanging AS PATH advertisements between Autonomous Systems. Packets may traverse 10–20 router hops, each performing a longest-prefix match in nanoseconds via hardware TCAM (Ternary Content Addressable Memory).

8.2 Google Anycast ingress

Google's GFE VIPs are announced from hundreds of PoPs simultaneously via BGP Anycast. The internet's routing infrastructure naturally directs the packet to the topologically nearest Google PoP. A user in a given region (e.g. North Africa) reaches a Google edge in a nearby city (e.g. Paris or Frankfurt) rather than a US datacenter.

8.3 Maglev load balancer

At the PoP, Maglev — Google's software load balancer — receives the packet. It computes a consistent hash over the 5-tuple (src IP, src port, dst IP, dst port, protocol) to deterministically map the connection to a specific GFE server. The same 5-tuple always maps to the same GFE, preserving TCP state across server pool changes via consistent hashing.
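
The determinism property — same 5-tuple, same GFE — can be illustrated with a plain hash-mod sketch. Real Maglev instead builds a large permutation-based lookup table so that pool changes remap as few flows as possible; the backend names here are invented:

```python
import hashlib

BACKENDS = ["gfe-1", "gfe-2", "gfe-3"]  # hypothetical server pool

def pick_backend(five_tuple: tuple) -> str:
    """Deterministically map a connection 5-tuple to one backend.
    Simplified stand-in for Maglev's consistent-hash lookup table."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

conn = ("198.51.100.7", 54321, "142.250.0.1", 443, "TCP")
assert pick_backend(conn) == pick_backend(conn)  # same flow, same GFE
```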


Layer ③ — Server


Phase 9 — NIC Ingress & Kernel DMA

DMA → sk_buff → NAPI

The TCP SYN (or QUIC UDP datagram) physically arrives at the GFE server's Network Interface Controller (NIC).

9.1 Optical → electrical conversion (PHY)

The NIC's PHY transceiver converts arriving optical pulses (1310 nm or 1550 nm photons on the fibre) into differential electrical signals, then into digital logic. The MAC layer reads the Ethernet frame and verifies the Frame Check Sequence (FCS) — a CRC-32 over the entire frame. A bad FCS causes the frame to be silently dropped.

9.2 PCIe DMA → sk_buff

On a valid FCS, the NIC's PCIe DMA engine copies the packet payload directly into a pre-allocated kernel ring buffer (the RX descriptor ring) without CPU involvement. Linux represents each packet as an sk_buff (socket buffer) struct — a 256-byte metadata header pointing to the payload in kernel memory.

9.3 IRQ → NET_RX_SOFTIRQ → NAPI poll

The NIC asserts a hardware IRQ. The CPU services it, but immediately masks further NIC interrupts and schedules a NET_RX_SOFTIRQ. The kernel shifts to NAPI polling mode: it spins draining the RX ring in a tight loop, processing up to budget=64 packets per poll cycle. This prevents interrupt storms at high packet rates (10–100 Gbps).


Phase 10 — Kernel TCP State Machine & Accept Queue

SYN queue → Accept queue → epoll

The packet ascends the Linux network stack. The TCP layer manages the SYN queue and Accept queue, then notifies user space via epoll.

10.1 IP layer demultiplexing

The IPv4/IPv6 layer validates the header checksum and the destination IP against the server's interface addresses. It looks up the transport protocol (6 = TCP, 17 = UDP) in the protocol table and passes the payload upward.

10.2 SYN queue → Transmission Control Block (TCB)

The TCP layer hashes the 4-tuple to locate or create a Transmission Control Block (TCB). For a new SYN, it allocates a half-open TCB in the SYN queue and sends SYN-ACK. If the SYN queue overflows — the signature of a SYN flood attack — the kernel falls back to SYN cookies: the initial sequence number encodes a hash of the 4-tuple, so no per-connection state is kept until the final ACK arrives.

10.3 ACK → Accept queue → accept4()

When the client's final ACK arrives, the kernel moves the TCB to the socket's Accept queue. The GFE process is blocked on epoll_wait(). The kernel signals an EPOLLIN event on the listening socket FD. A GFE worker thread wakes, calls:

accept4(listen_fd, (struct sockaddr *)&client_addr, &addr_len, SOCK_NONBLOCK)

This dequeues the connection and returns a new non-blocking socket FD plus the client's sockaddr_in struct (IP + ephemeral port, big-endian network byte order):

struct sockaddr_in {
    sa_family_t    sin_family; /* AF_INET */
    in_port_t      sin_port;   /* client ephemeral port, big-endian */
    struct in_addr sin_addr;   /* client public IP address */
    char           sin_zero[8];/* padding */
};

Phase 11 — User-Space TLS Termination

BoringSSL, AES-NI, ECDHE

The GFE registers the new client FD with its epoll instance and performs TLS termination using BoringSSL, Google's internal fork of OpenSSL.

11.1 BoringSSL handshake

BoringSSL processes the ClientHello, selects a cipher suite, signs with its private key (ECDSA P-256 or RSA-2048), and derives symmetric session keys via ECDHE. The server's private key is stored in a hardware security module (HSM) or in Google's Titan security chips.

11.2 SSL_read() + AES-NI decryption

After the handshake, incoming TLS records trigger EPOLLIN. The GFE calls SSL_read(), which:

  1. Pulls ciphertext from the kernel socket buffer
  2. Invokes the EVP_AEAD_CTX_open() primitive
  3. Calls the AES-NI-accelerated aesni_gcm_decrypt routine

Each AES round executes as a single AESENC instruction (or its vectorised VAES form, which processes several blocks per instruction). Simultaneously, the GCM authentication tag is verified — a forged or bit-flipped byte causes immediate rejection and connection termination.
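
The stdlib has no AES-GCM primitive, so this encrypt-then-MAC sketch uses HMAC-SHA256 as a stand-in for the GCM tag, purely to show why a single flipped byte makes record verification fail:

```python
import hashlib, hmac, os

key = os.urandom(32)

def seal(plaintext: bytes) -> bytes:
    # Append a 16-byte tag, as GCM does. (No confidentiality here —
    # this demonstrates only the integrity check.)
    tag = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    return plaintext + tag

def open_record(record: bytes) -> bytes:
    body, tag = record[:-16], record[-16:]
    expected = hmac.new(key, body, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad authentication tag — record rejected")
    return body

record = seal(b"GET /watch HTTP/2")
assert open_record(record) == b"GET /watch HTTP/2"
tampered = bytes([record[0] ^ 0x01]) + record[1:]  # flip one bit
try:
    open_record(tampered)
    raise AssertionError("tampered record was accepted")
except ValueError:
    pass  # a single flipped bit fails tag verification
```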

11.3 Private key security (FIDO/Titan)

The TLS private key never exists in cleartext RAM on production GFE servers. Signing operations are delegated to a Titan coprocessor or an Asylo TEE (Trusted Execution Environment). This limits the blast radius of a memory-disclosure vulnerability such as Heartbleed.


Phase 12 — HTTP/2 Parsing, Header Decompression & RPC Dispatch

HPACK → gRPC dispatch

Decrypted bytes enter the GFE's HTTP/2 state machine. Headers are decompressed, the request is parsed, and an internal gRPC call is formed.

12.1 HPACK decompression & header extraction

The GFE's HTTP/2 parser consumes HEADERS frames. It runs the HPACK decoder — maintaining a dynamic table of past headers — to reconstruct the full header list: :method GET, :path /watch?v=dQw4w9WgXcQ, cookie: SID=.... The SID token is extracted for Gaia authentication.

12.2 Virtual host & path routing

The GFE evaluates the Host header (www.youtube.com) and path (/watch) against its routing table — a radix tree keyed on (host, path prefix). It resolves to the "Watch" backend cluster. A weighted round-robin or least-loaded selection picks a specific backend server.
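
A toy (host, path-prefix) table makes the longest-prefix-wins behaviour concrete. The backend names and routes are invented for illustration; a real GFE routing config is far richer:

```python
# Hypothetical routing table keyed on (host, path prefix).
ROUTES = {
    ("www.youtube.com", "/watch"): "watch-backend",
    ("www.youtube.com", "/feed"): "browse-backend",
    ("www.youtube.com", "/"): "default-backend",
}

def route(host: str, path: str) -> str:
    """Longest matching path prefix wins, as in a radix-tree lookup."""
    candidates = [(prefix, backend)
                  for (h, prefix), backend in ROUTES.items()
                  if h == host and path.startswith(prefix)]
    return max(candidates, key=lambda c: len(c[0]))[1]

assert route("www.youtube.com", "/watch") == "watch-backend"
assert route("www.youtube.com", "/about") == "default-backend"
```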

12.3 Protocol translation → protobuf / gRPC

The GFE strips the external TLS envelope, constructs an internal Protocol Buffer message encoding the request (method, path, headers, client IP from sockaddr_in.sin_addr), and transmits it as a gRPC call over Google's internal encrypted datacenter mesh (ALTS — Application Layer Transport Security).


Phase 13 — Backend Microservice Orchestration — Fan-out RPCs

Gaia, Spanner, TPU, Ads

The YouTube Watch service (C++ / Go) receives the gRPC call and immediately fans out parallel RPCs to specialised datastores and ML serving systems.

13.1 Identity / auth — Gaia service

An RPC to Gaia (Google's identity service) validates the SID session cookie. Gaia decrypts the token, checks its HMAC signature, looks up the user record in a distributed Bigtable, and returns the internal User ID (GAIA ID) and OAuth scopes within ~1 ms.

13.2 Video metadata — Cloud Spanner / Vitess

An RPC queries the video metadata shard keyed on dQw4w9WgXcQ. Cloud Spanner provides globally consistent reads via TrueTime timestamps (GPS + atomic clocks). The result includes title, description, channel ID, DRM flags, and publish date — retrieved in typically ~5 ms from a local replica.

13.3 Recommendations — TensorFlow / TPU

An async RPC to a TPU serving cluster running a TensorFlow ranking model. Input: user embedding vector + video embedding vector. The model scores hundreds of candidate videos via dot-product attention. Results (the "Up Next" list) arrive asynchronously while the rest of the page is assembled.

13.4 Monetisation — ad pre-roll check

An RPC to the ads engine checks the user's subscription status (YouTube Premium → no ads) and the video's monetisation flags. If an ad is to be served, an ad auction runs: DSPs submit bids in a real-time bidding (RTB) second-price auction within a ~100 ms deadline.

13.5 DASH manifest assembly

The video CDN registry is queried for all available encodings of dQw4w9WgXcQ: VP9 4K/1080p/720p/360p, AV1 1080p, H.264 1080p (for Safari/iOS). A DASH MPD manifest (XML) is generated listing segment URIs for each representation, enabling the client to perform adaptive bitrate streaming.


Phase 14 — Response Serialisation & Egress

Brotli, HTTP/2 DATA frames

The Watch service aggregates the fan-out responses and serialises the final HTML+JSON payload. The GFE compresses, frames, and encrypts it for transmission.

14.1 Brotli compression

The ~120 KB HTML+JSON payload is compressed with the Brotli algorithm (RFC 7932), which achieves ~20% better compression than gzip for text. Compression runs on a pre-built static Brotli dictionary tuned for web content. The compressed payload is typically ~25 KB.

14.2 HTTP/2 DATA frame chunking

The compressed bytes are split into HTTP/2 DATA frames (max frame size typically 16 KB). Each DATA frame carries a 9-byte header (length, type 0x0, flags, stream ID 1). Flow-control window management ensures the client's receive buffer is not overflowed.

14.3 SSL_write() → sendmsg() → NIC TX ring

Each DATA frame is passed to SSL_write() for TLS record encryption. The GFE calls sendmsg(), which copies ciphertext from user-space buffers into the kernel's TCP TX socket buffer. The TCP stack segments the data, appends IP/TCP headers (including TCP sequence numbers and ACK fields), and enqueues segments in the NIC's DMA TX ring. The NIC's DMA engine reads from the ring and drives the PHY to emit optical pulses back toward the client.


Layer ④ — Rendering & Media


Phase 15 — Renderer Process Allocation & Stream Handoff

MIME sniff, Site Isolation

As the first bytes of the HTTP response arrive at Chrome's Network Service, the browser decides which renderer process will own this navigation.

15.1 MIME sniff & response validation

The Network Service reads the Content-Type: text/html; charset=utf-8 header. It also performs MIME sniffing (reading the first 512 bytes) as a fallback. It confirms the response is navigable HTML, not a download or an opaque resource.

15.2 Site Isolation & renderer selection

Chrome's Site Isolation policy (enabled since Chrome 67, post-Spectre) requires that youtube.com content runs in a renderer process dedicated exclusively to the youtube.com origin. The Browser process either spawns a new sandboxed Renderer Process or reuses an existing one already dedicated to youtube.com. Each renderer is a separate OS process with a seccomp-BPF sandbox — a compromised renderer cannot directly access the filesystem, GPU, or network.

15.3 Byte stream handoff to renderer (Mojo data pipe)

The Network Service streams response bytes to the Renderer Process via a Mojo data pipe — a shared-memory ring buffer. The renderer reads from its end while the network thread writes, enabling true streaming parsing: the DOM can begin construction before the last byte of HTML arrives.


Phase 16 — DOM Construction — Tokenisation & Tree Building

Tokeniser → tree builder

The renderer's HTML parser (Blink's HTMLParser) processes the byte stream into the Document Object Model.

16.1 Byte stream → UTF-8 character stream

The TextResourceDecoder reads the charset=utf-8 from the Content-Type header and decodes the byte stream to Unicode code points. If charset is absent, the parser sniffs the BOM or uses the HTML5 encoding detection heuristics.

16.2 Tokenisation → StartTag / EndTag / Text

Blink's HTML tokeniser implements the HTML5 parsing algorithm — a state machine with ~80 states. It emits tokens: StartTag, EndTag, Character, Comment, DOCTYPE. The parser is speculative: it scans ahead for resource-loading tags (<img>, <script>, <link>) and issues preload requests before fully constructing the tree.

16.3 Token stream → DOM tree (tree builder)

The tree builder consumes tokens and constructs DOM nodes (HTMLElement, TextNode, etc.) using a stack-based insertion mode automaton (Initial → BeforeHtml → BeforeHead → InHead → InBody → etc.). The result is the document tree rooted at the Document node.
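
Python's stdlib HTMLParser can stand in for the tokeniser to show streaming token emission — events fire chunk by chunk, so tree building needn't wait for the last byte:

```python
from html.parser import HTMLParser

class TokenLogger(HTMLParser):
    """Records StartTag / EndTag / Character tokens as they are emitted."""
    def __init__(self):
        super().__init__()
        self.tokens = []
    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag))
    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))
    def handle_data(self, data):
        if data.strip():
            self.tokens.append(("Character", data.strip()))

p = TokenLogger()
p.feed("<html><body><h1>Never Gonna Give You Up</h1>")  # first network chunk
p.feed("</body></html>")                                # remainder arrives later
assert p.tokens[0] == ("StartTag", "html")
assert ("Character", "Never Gonna Give You Up") in p.tokens
```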


Phase 17 — CSSOM Construction

StyleSheets → CSSOM

Concurrently with DOM construction, the parser encounters <link rel="stylesheet"> tags and fetches CSS resources.

17.1 CSS parsing → StyleSheetContents

Each CSS file is parsed by Blink's CSSParser into a StyleSheetContents object — a sorted set of rules. Selectors are parsed into SelectorLists (trees of compound selectors). Property values are tokenised and validated against their grammar.

17.2 Rule matching & specificity → ComputedStyle

Blink builds the CSS Object Model (CSSOM): a mapping from DOM nodes to computed style objects. Rule matching traverses the style sheet tree; specificity (inline > ID > class > element) determines which conflicting declarations win. The result is stored in a ComputedStyle struct per element, containing the final resolved value for every CSS property.
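
Specificity comparison maps naturally onto tuple ordering. This crude approximation counts #, ., and bare type tokens, ignoring attribute selectors and pseudo-classes; inline style and !important sit above all of these:

```python
def specificity(selector: str) -> tuple:
    """(ids, classes, types) — Python's tuple comparison mirrors
    how the highest-specificity selector wins."""
    parts = selector.split()
    ids = sum(p.count("#") for p in parts)
    classes = sum(p.count(".") for p in parts)
    types = sum(1 for p in parts if not p.startswith(("#", ".")))
    return (ids, classes, types)

assert specificity("#player") > specificity(".video-title")
assert specificity(".video-title") > specificity("div span")
```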


Phase 18 — JavaScript Compilation & Execution (V8)

AST → Ignition → TurboFan

When the parser encounters a <script> tag (without async/defer), it pauses HTML parsing and hands the script source to the V8 engine.

18.1 Source → AST (Parser)

V8's Parser tokenises and parses the JS source into an Abstract Syntax Tree (AST). A pre-parser (scanner-only) performs a fast first pass to identify function boundaries, enabling the full parser to skip inner function bodies until they are called.

18.2 AST → bytecode (Ignition interpreter)

The Ignition interpreter walks the AST and generates register-machine bytecode (compact 1–3 bytes per instruction). Bytecode is immediately executed. Ignition profiles execution frequency and type feedback — recording, for example, that a particular addition always operates on SMI (small integer) operands.

18.3 Hot path → native code (TurboFan JIT)

Functions that exceed a call or loop-iteration threshold are submitted to the TurboFan optimising compiler. TurboFan builds a Sea of Nodes IR, applies type-specialised optimisations (function inlining, escape analysis, redundancy elimination), and emits native x86-64 / ARM64 machine code. If a type assumption is violated at runtime, a deoptimisation trap drops execution back to Ignition bytecode.

18.4 YouTube SPA bootstrap

YouTube's JS initialises a Web Components / Polymer-based Single Page Application. It registers custom elements (<ytd-app>, <ytd-video-primary-info-renderer>), sets up a client-side router listening to history.pushState, and issues Fetch API calls for the video metadata JSON (loaded separately from the initial HTML skeleton).


Phase 19 — Render Tree Formation & Layout (Reflow)

Layout tree, reflow

The DOM and CSSOM are merged into the Layout Tree. The rendering engine then computes the exact geometry of every visible element.

19.1 DOM + CSSOM → Layout tree

Blink traverses the DOM and, for each node with a ComputedStyle that has display ≠ none, creates a LayoutObject. Pseudo-elements (::before, ::after) become anonymous LayoutObjects. The Layout Tree is distinct from the DOM tree — some DOM nodes have no LayoutObject (e.g., <head>); others (anonymous block boxes) have no corresponding DOM node.

19.2 Block formatting context & reflow

Blink's layout engine runs the CSS block/inline formatting algorithm. For each LayoutObject, it computes (x, y, width, height) relative to the containing block. Flexbox and Grid elements run their respective constraint-solving algorithms. YouTube's responsive layout relies heavily on CSS Grid and CSS Custom Properties. Reflow is triggered top-down; dirty bits propagate to ensure only subtrees with changed input geometry are re-laid out (incremental reflow).


Phase 20 — Paint, Layer Compositing & GPU Rasterisation

Paint records, layers, GPU

Blink generates paint operations, divides the page into compositor layers, rasterises them (hardware-accelerated on the GPU), and hands the final frame to the OS.

20.1 Display list generation (PaintController)

The PaintController traverses the Layout Tree in paint order (stacking contexts, z-index) and records a display list of drawing commands: DrawRect, DrawTextBlob, DrawImage. This is a serialisable record, not a bitmap — it can be replayed at any resolution or scale factor.

20.2 Compositor layer promotion

Elements are promoted to their own compositor layer when they have:

  • CSS will-change: transform (or transform/opacity animations)
  • A <video> or <canvas> element
  • Heuristic prediction of animation

YouTube's video player is always a separate layer. Layer promotion trades VRAM (each layer is a GPU texture) for scroll/animation smoothness — the compositor can reposition a layer without re-running JavaScript or layout.

20.3 GPU rasterisation (OOP-R, Skia-Ganesh)

Blink sends display lists to the GPU process (out-of-process rasterisation, OOP-R). The GPU process submits them to the Skia-Ganesh rendering library, which issues OpenGL / Vulkan / Metal draw calls. The GPU rasterises each layer into a GPU texture stored in VRAM.

20.4 Compositor thread → final frame → vsync

The Compositor Thread (separate from the Main Thread) receives the list of layers and their transforms. It composites them into a final frame using the GPU's texture blending units — no CPU involvement. It submits a swap command to the OS graphics API (DirectX 12 / Metal / Vulkan). The display controller reads the framebuffer and drives the monitor's scan-out circuit at the refresh rate (60 or 120 Hz). The first pixels appear on screen.


Phase 21 — Media Source Extensions & DASH Adaptive Streaming

DASH manifest, SourceBuffer, hardware decode

The JS application initialises the <video> element using the Media Source Extensions (MSE) API to implement Dynamic Adaptive Streaming over HTTP (DASH).

21.1 MediaSource & SourceBuffer creation

JavaScript calls video.src = URL.createObjectURL(new MediaSource()). When the MediaSource sourceopen event fires, JS calls:

  • addSourceBuffer('video/webm; codecs="vp9"')
  • addSourceBuffer('audio/webm; codecs="opus"')

This creates separate demuxed buffers for the video and audio tracks.

21.2 MPD manifest fetch & ABR selection

JS fetches the DASH MPD manifest (XML) from YouTube's CDN. It parses available Representations (1080p VP9 @ 4 Mbps, 720p VP9 @ 2 Mbps, etc.). An ABR (Adaptive Bitrate) algorithm estimates available bandwidth using throughput history and buffer occupancy, then selects the highest quality that avoids rebuffering.
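
A minimal ABR pick under stated assumptions — an invented three-rung representation ladder and a throughput-with-safety-margin rule; production algorithms also weigh buffer occupancy and throughput variance:

```python
# Hypothetical ladder, sorted high → low (bitrates in bits/s).
REPRESENTATIONS = [
    ("1080p-vp9", 4_000_000),
    ("720p-vp9", 2_000_000),
    ("360p-vp9", 700_000),
]

def pick_representation(estimated_bps: float, safety: float = 0.8) -> str:
    """Highest bitrate that fits within a safety margin of the
    estimated throughput; fall back to the lowest rung."""
    budget = estimated_bps * safety
    for name, bps in REPRESENTATIONS:
        if bps <= budget:
            return name
    return REPRESENTATIONS[-1][0]

assert pick_representation(6_000_000) == "1080p-vp9"
assert pick_representation(3_000_000) == "720p-vp9"   # 4 Mbps won't fit
assert pick_representation(500_000) == "360p-vp9"     # fallback rung
```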

21.3 Segment fetch → SourceBuffer.appendBuffer()

JS issues fetch() calls for ~2–10 second video/audio segments. Received ArrayBuffers are passed to sourceBuffer.appendBuffer(data). The browser's media pipeline demultiplexes the WebM container (EBML format), extracts compressed frames, and maintains a coded frame buffer. As the video element's currentTime advances, the media pipeline dequeues the next coded frame.

21.4 Hardware media decoder (VP9 / AV1 / H.264)

Compressed video frames are submitted to the OS's hardware media decoder:

  • VAAPI (Linux / Intel/AMD GPU)
  • VideoToolbox (macOS / iOS)
  • DXVA2 / D3D11VA (Windows)
  • MediaCodec (Android)

The dedicated video decode ASIC on the GPU die processes frames at full resolution without CPU involvement. Output is planar YUV 4:2:0 video frames in VRAM.

21.5 A/V sync via presentation timestamps (PTS)

Each video frame and audio packet carries a Presentation Timestamp (PTS) in the container's timebase. The media pipeline maintains a media clock locked to the audio renderer's playback position (audio is the master clock). Video frames are held in a decoded frame queue and released to the compositor exactly when currentTime ≥ frame.PTS.
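
The release rule — hold each decoded frame until the audio-driven clock reaches its PTS — reduces to a queue drain. The PTS values here assume a hypothetical 30 fps stream:

```python
from collections import deque

# Decoded-frame queue holding PTS values for a hypothetical 30 fps stream.
frame_queue = deque([1/30 * i for i in range(5)])

def frames_to_release(current_time: float) -> list:
    """Release every queued frame whose PTS the media clock has reached."""
    released = []
    while frame_queue and frame_queue[0] <= current_time:
        released.append(frame_queue.popleft())
    return released

assert frames_to_release(0.05) == [0.0, 1/30]  # first two frames are due
assert frames_to_release(0.05) == []           # already released, clock unchanged
```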

21.6 YUV texture upload, GPU composite & audio DAC output

Decoded YUV frames are uploaded as GPU textures and composited into the video compositor layer by the Compositor Thread, with a YUV→RGB conversion shader applied on the GPU.

Audio PCM samples are written into the OS audio daemon's ring buffer:

  • CoreAudio (macOS / iOS)
  • PulseAudio / PipeWire (Linux)
  • WASAPI (Windows)

The DAC converts the digital PCM samples to analogue waveforms, drives the speaker amplifier — and Rick Astley begins to sing.


Summary of Layer Boundaries

  • ① Client — entry: KEY_UP event on Enter; exit: TLS-encrypted bytes leave the NIC
  • ② Network transit — entry: first router hop after the client NIC; exit: packet arrives at Google PoP NIC
  • ③ Server — entry: GFE NIC PHY receives optical pulse; exit: encrypted HTTP/2 DATA frames leave GFE NIC
  • ④ Rendering & media — entry: Chrome Network Service receives first response bytes; exit: GPU scans out final composited frame; DAC emits audio

Total elapsed wall-clock time (typical broadband, nearby PoP): ~200–400 ms to first meaningful paint; ~1–2 s to first video frame playback.
