A complete sequential trace of every system layer traversed when a user types
http://www.youtube.com/watch?v=dQw4w9WgXcQ into Chrome and presses Enter — from the first kernel interrupt to the final GPU-composited frame.
Layer ① — Client
Phase 1 — Input Processing
KEY_UP → URI parse
The browser's UI thread intercepts the KEY_UP event on Enter. The raw string http://www.youtube.com/watch?v=dQw4w9WgXcQ is passed to Chrome's embedded GURL parser.
1.1 Event dispatch (UI thread)
The OS delivers a WM_KEYUP (Windows) or KeyRelease (X11) event to the browser process's message pump, which dispatches it to the Browser's Main (UI) thread.
1.2 URI structural parse (GURL)
GURL splits the string into:
- scheme: http
- host: www.youtube.com
- path: /watch
- query: v=dQw4w9WgXcQ
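As an analogue of GURL's component split, Python's standard urllib.parse performs the same structural decomposition — an illustrative sketch, not Chrome's actual parser:

```python
from urllib.parse import urlsplit

# Split the typed string into the same structural components GURL extracts.
parts = urlsplit("http://www.youtube.com/watch?v=dQw4w9WgXcQ")

print(parts.scheme)    # http
print(parts.hostname)  # www.youtube.com
print(parts.path)      # /watch
print(parts.query)     # v=dQw4w9WgXcQ
```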
A syntactic check confirms the input is a well-formed URL rather than an Omnibox search query — routing to the default search engine is bypassed.
1.3 Search vs. navigate decision (Omnibox)
If the parsed scheme is http, https, ftp, file, or a registered custom scheme, the input is treated as a navigation. Any ambiguous string (no TLD, embedded spaces) would instead be forwarded to the default search engine.
Phase 2 — HSTS Enforcement & Scheme Upgrade
307 internal redirect
Before any network I/O, Chrome consults its preloaded HSTS list (sourced from the HSTS Preload List, baked into Chromium's binary and also cached locally). youtube.com is on this list.
2.1 Preload list lookup (HSTS)
Chrome walks a prefix tree over the eTLD+1 of the host (youtube.com). The entry signals includeSubDomains, so www.youtube.com is also covered. Stored policy fields: max-age, includeSubDomains, preload.
2.2 Scheme rewrite (307 Internal)
Chrome synthesises a 307 Internal Redirect entirely in-process — no bytes leave the machine. The URL is rewritten to https://www.youtube.com/watch?v=dQw4w9WgXcQ. This prevents any HTTP probe packet that could be intercepted by a MITM before a 301 redirect arrives from the server.
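The preload lookup and scheme rewrite can be sketched as follows — a minimal model assuming a tiny hand-written preload set; Chrome's real list is a compiled trie baked into the binary:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical miniature preload set (the real list has tens of thousands
# of entries, each with max-age / includeSubDomains / preload flags).
HSTS_PRELOAD = {
    "youtube.com": {"include_subdomains": True},
}

def covered(host: str) -> bool:
    # Exact match, or a parent-domain match when includeSubDomains is set.
    if host in HSTS_PRELOAD:
        return True
    labels = host.split(".")
    for i in range(1, len(labels) - 1):
        parent = ".".join(labels[i:])
        entry = HSTS_PRELOAD.get(parent)
        if entry and entry["include_subdomains"]:
            return True
    return False

def upgrade(url: str) -> str:
    # Synthesise the internal http -> https rewrite; no bytes leave the machine.
    s = urlsplit(url)
    if s.scheme == "http" and covered(s.hostname):
        return urlunsplit(("https", s.netloc, s.path, s.query, s.fragment))
    return url

print(upgrade("http://www.youtube.com/watch?v=dQw4w9WgXcQ"))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ
```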
2.3 MITM downgrade prevention
Without HSTS, an attacker on the local network could intercept the initial http:// SYN, impersonate the server, serve a spoof page, and later redirect to HTTPS — the user would never notice. HSTS eliminates the plaintext window entirely.
Phase 3 — Inter-Process Communication (IPC)
UI thread → Network thread
The UI thread cannot perform blocking network I/O directly. It packages the navigation intent into an IPC message and posts it asynchronously to the Network Service, which runs as a separate sandboxed process in modern Chrome.
3.1 Navigation request serialisation (Mojo IPC)
Chrome uses the Mojo IPC framework. The UI thread creates a network::ResourceRequest struct and serialises it over a Mojo pipe to the Network Service. The pipe is backed by shared memory, so large payloads are passed by handle rather than copied.
3.2 Network Service sandbox
The Network Service runs in a restricted sandbox (no filesystem access, limited syscalls via seccomp-BPF). It owns all sockets, DNS resolution, and TLS. Compromise of the Renderer or UI process does not expose raw socket access — only the Network Service can read/write encrypted bytes.
Phase 4 — DNS Resolution — Cache Traversal
Browser → OS → hosts file
Chrome's Network Service walks a three-tier cache hierarchy before issuing any external DNS packet. Each tier is a TTL-governed LRU cache.
4.1 Browser DNS cache (Tier 1)
Chrome maintains its own in-process DNS cache, independent of the OS. Keyed by hostname + record type. chrome://net-internals/#dns exposes its contents. On a cache hit, the IP is returned immediately; the A/AAAA record TTL governs expiry.
4.2 OS stub resolver — getaddrinfo (Tier 2)
On a miss, Chrome calls getaddrinfo() (POSIX) or DnsQuery() (Win32). The OS consults its own resolver cache (systemd-resolved or nscd on Linux, the DNS Client service on Windows). It also reads /etc/hosts (POSIX) or C:\Windows\System32\drivers\etc\hosts for static overrides.
4.3 Hosts file override (Tier 3)
127.0.0.1 localhost is the canonical example. Entries here short-circuit all network lookups. Malware often poisons this file to redirect domains — it takes effect before any DNS packet is emitted.
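Each TTL-governed tier behaves roughly like the sketch below — a toy model with illustrative names and an example IP, not Chrome's actual HostCache implementation:

```python
import time

class TtlDnsCache:
    """Toy model of one TTL-governed DNS cache tier."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (host, record_type) -> (expiry, addresses)

    def put(self, host, rtype, addresses, ttl):
        self._entries[(host, rtype)] = (self._clock() + ttl, addresses)

    def get(self, host, rtype):
        entry = self._entries.get((host, rtype))
        if entry is None:
            return None            # miss: fall through to the next tier
        expiry, addresses = entry
        if self._clock() >= expiry:
            del self._entries[(host, rtype)]
            return None            # TTL expired: treat as a miss
        return addresses

cache = TtlDnsCache()
cache.put("www.youtube.com", "A", ["142.250.74.78"], ttl=300)
print(cache.get("www.youtube.com", "A"))  # ['142.250.74.78']
```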
Phase 5 — DNS Resolution — Recursive Hierarchy
Stub → Root → TLD → Authoritative
On a full cache miss, the OS stub resolver emits a UDP datagram to the configured recursive resolver (e.g., 8.8.8.8). If that resolver also lacks the record, it traverses the global DNS tree.
5.1 DNS query datagram (UDP)
The stub resolver constructs a 12-byte DNS header plus the QNAME wire format: \x03www\x07youtube\x03com\x00, QTYPE A (0x0001), QCLASS IN (0x0001). Sent as a single UDP datagram to port 53 of the configured resolver.
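The header and QNAME encoding described above can be reproduced in a few lines of Python — a sketch of the wire format, not a full resolver:

```python
import struct

def encode_qname(host: str) -> bytes:
    # Each label is length-prefixed; the name is terminated by a zero byte.
    return b"".join(bytes([len(label)]) + label.encode("ascii")
                    for label in host.split(".")) + b"\x00"

def build_query(host: str, txid: int = 0x1234) -> bytes:
    # 12-byte header: ID, flags (RD=1 for recursion desired), QDCOUNT=1,
    # and zero answer/authority/additional counts.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question section: QNAME + QTYPE=A (0x0001) + QCLASS=IN (0x0001).
    question = encode_qname(host) + struct.pack(">HH", 0x0001, 0x0001)
    return header + question

assert encode_qname("www.youtube.com") == b"\x03www\x07youtube\x03com\x00"
```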
5.2 Root → TLD → Authoritative traversal (Recursive)
If the recursive resolver (e.g. Google Public DNS at 8.8.8.8) lacks a cached answer:
- It queries one of the 13 root server clusters (A–M.ROOT-SERVERS.NET) for a referral to the .com TLD servers.
- It queries a Verisign TLD server for the NS records of youtube.com.
- It queries Google's authoritative nameservers (ns1–ns4.google.com) for the A/AAAA record of www.youtube.com.
5.3 DNSSEC validation (optional)
Google's public resolvers support DNSSEC. The resolver validates the chain of trust from the root KSK → .com ZSK → youtube.com ZSK → the RRSet signature. An invalid signature causes the resolver to return SERVFAIL rather than a spoofed IP.
5.4 A/AAAA record returned
Google's authoritative servers return an Anycast VIP (Virtual IP) belonging to Google's global load-balancing fabric — a Google Front End (GFE). The TTL is typically 300 seconds. The recursive resolver caches the answer and returns it to the OS, which caches it and returns it to Chrome.
Layer ② — Network Transit
Phase 6 — Socket Allocation & Transport Handshake
Socket alloc, QUIC/TCP handshake
Chrome prefers HTTP/3 over QUIC (UDP-based) when connecting to Google services. It remembers QUIC availability via an Alt-Svc header from previous sessions. If QUIC is blocked, it falls back to TCP/TLS.
6.1 QUIC 0-RTT or 1-RTT handshake
QUIC multiplexes TLS 1.3 and transport into a single UDP-based protocol. On a known server (cached session ticket), Chrome attempts a 0-RTT handshake: it sends the ClientHello + HTTP request data in the very first UDP datagram. The server can respond immediately, saving one full round-trip versus TCP+TLS.
6.2 TCP three-way handshake (fallback)
If QUIC is unavailable (UDP blocked, no cached session): the OS allocates a socket FD, binds an ephemeral source port (e.g., 54321), and sends a TCP SYN to the GFE's Anycast IP on port 443. The server responds with SYN-ACK; the client sends ACK. The connection is established after one RTT.
6.3 TLS 1.3 handshake (TCP path)
Immediately after the TCP ACK, the client sends a ClientHello containing:
- Supported cipher suites: TLS_AES_128_GCM_SHA256, TLS_CHACHA20_POLY1305_SHA256
- An ECDH key share (X25519 curve)
- SNI extension: www.youtube.com
- ALPN extension: h2
The server responds with ServerHello + Certificate + CertificateVerify + Finished. The client verifies the cert chain against its trusted root store (the Chrome Root Store in current Chrome; historically the underlying OS store). A symmetric session key is derived via ECDHE (Elliptic Curve Diffie-Hellman Ephemeral). Total handshake overhead: 1 RTT.
6.4 Cipher negotiation & symmetric key (AES-NI)
The agreed cipher is typically AES-128-GCM. On modern x86 CPUs, the AES-NI instruction set executes each AES round as a single opcode, achieving ~40 Gbps throughput per core. GCM mode provides both encryption and a Message Authentication Code (MAC) in one pass.
Phase 7 — HTTP Request Construction & Transmission
GET /watch HEADERS frame
The Network Service formats an HTTP/2 HEADERS frame (or HTTP/3 HEADERS frame over QUIC) containing the full request, then encrypts and transmits it.
7.1 Header compression — HPACK/QPACK
HTTP/2 uses HPACK to compress headers. Common headers like :method: GET, :scheme: https are represented as 1-byte indices into a static table. Custom headers like Cookie are Huffman-encoded and stored in a dynamic table for future requests on the same connection. This reduces header overhead from ~800 bytes to ~100 bytes.
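The static-table indexing is the simplest HPACK representation: a single byte with the high bit set and the 7-bit table index below it. A sketch using a few real RFC 7541 static-table entries:

```python
# HPACK "indexed header field" representation (RFC 7541 section 6.1):
# one bit set to 1, followed by the 7-bit static-table index.
# These three (name, value) -> index pairs are taken from the real
# RFC 7541 static table.
STATIC_TABLE = {
    (":method", "GET"):   2,
    (":method", "POST"):  3,
    (":scheme", "https"): 7,
}

def encode_indexed(name: str, value: str) -> bytes:
    index = STATIC_TABLE[(name, value)]
    assert index < 127  # single-byte form only, for brevity
    return bytes([0x80 | index])

print(encode_indexed(":method", "GET").hex())  # 82
```

So :method: GET compresses to the single byte 0x82 — one byte instead of the ~14 bytes of "GET " plus the header name in HTTP/1.1.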
7.2 HEADERS frame structure
The HEADERS frame contains:
- :method: GET
- :path: /watch?v=dQw4w9WgXcQ
- :authority: www.youtube.com
- :scheme: https
- user-agent: (Chrome version string)
- accept-encoding: br, gzip
- cookie: SID=...; HSID=...; SSID=... (YouTube session tokens)
Stream ID is 1 (first client-initiated stream).
7.3 TLS record layer encryption
The HEADERS frame bytes are passed to BoringSSL's SSL_write(). They are chunked into TLS records (max 16 KB each), each prepended with a 5-byte header (content type, version, length) and appended with a 16-byte GCM authentication tag. The ciphertext is written to the kernel's TCP send buffer via write() / sendmsg().
Phase 8 — Network Transit & BGP Routing
BGP routing, PoP ingress
Encrypted IP packets traverse the public internet, routed by BGP (Border Gateway Protocol), before arriving at a Google Point of Presence (PoP).
8.1 Autonomous system routing (BGP)
Each internet router makes next-hop decisions by looking up the destination IP in its BGP routing table. The table is built by exchanging AS PATH advertisements between Autonomous Systems. Packets may traverse 10–20 router hops, each performing a longest-prefix match in nanoseconds via hardware TCAM (Ternary Content Addressable Memory).
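Longest-prefix match is easy to model with the standard library — a toy two-entry forwarding table with illustrative prefixes and next-hop names, nothing like a real router's TCAM:

```python
import ipaddress

# Toy forwarding table; prefixes and next-hop names are illustrative.
ROUTES = {
    "0.0.0.0/0":      "upstream-transit",   # default route
    "142.250.0.0/15": "google-peering",     # more-specific peering route
}

def next_hop(dst: str) -> str:
    # Longest-prefix match: among all prefixes containing the destination,
    # pick the one with the greatest prefix length.
    addr = ipaddress.ip_address(dst)
    candidates = [ipaddress.ip_network(p) for p in ROUTES]
    best = max((n for n in candidates if addr in n), key=lambda n: n.prefixlen)
    return ROUTES[str(best)]

print(next_hop("142.250.74.78"))  # google-peering
print(next_hop("1.1.1.1"))        # upstream-transit
```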
8.2 Google Anycast ingress
Google's GFE VIPs are announced from hundreds of PoPs simultaneously via BGP Anycast. The internet's routing infrastructure naturally directs the packet to the topologically nearest Google PoP. A user in a given region (e.g. North Africa) reaches a Google edge in a nearby city (e.g. Paris or Frankfurt) rather than a US datacenter.
8.3 Maglev load balancer
At the PoP, Maglev — Google's software load balancer — receives the packet. It computes a consistent hash over the 5-tuple (src IP, src port, dst IP, dst port, protocol) to deterministically map the connection to a specific GFE server. The same 5-tuple always maps to the same GFE, preserving TCP state across server pool changes via consistent hashing.
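The determinism property — same 5-tuple, same backend — can be shown with a plain hash-mod sketch. Note this is a simplification: real Maglev builds a permutation-based lookup table precisely so that pool changes remap as few connections as possible, which modulo does not achieve.

```python
import hashlib

GFE_POOL = ["gfe-a", "gfe-b", "gfe-c"]  # hypothetical backend names

def pick_gfe(src_ip, src_port, dst_ip, dst_port, proto):
    # Hash the connection 5-tuple; the same tuple always lands on the
    # same GFE, so mid-connection packets reach the server holding the
    # TCP/QUIC state.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return GFE_POOL[int.from_bytes(digest[:8], "big") % len(GFE_POOL)]

a = pick_gfe("203.0.113.7", 54321, "142.250.74.78", 443, "tcp")
b = pick_gfe("203.0.113.7", 54321, "142.250.74.78", 443, "tcp")
assert a == b  # deterministic per-connection mapping
```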
Layer ③ — Server
Phase 9 — NIC Ingress & Kernel DMA
DMA → sk_buff → NAPI
The TCP SYN (or QUIC UDP datagram) physically arrives at the GFE server's Network Interface Controller (NIC).
9.1 Optical → electrical conversion (PHY)
The NIC's PHY transceiver converts arriving optical pulses (1310 nm or 1550 nm photons on the fibre) into differential electrical signals, then into digital logic. The MAC layer reads the Ethernet frame and verifies the Frame Check Sequence (FCS) — a CRC-32 over the entire frame. A bad FCS causes the frame to be silently dropped.
9.2 PCIe DMA → sk_buff
On a valid FCS, the NIC's PCIe DMA engine copies the packet payload directly into a pre-allocated kernel ring buffer (the RX descriptor ring) without CPU involvement. Linux represents each packet as an sk_buff (socket buffer) struct — a 256-byte metadata header pointing to the payload in kernel memory.
9.3 IRQ → NET_RX_SOFTIRQ → NAPI poll
The NIC asserts a hardware IRQ. The CPU services it, but immediately masks further NIC interrupts and schedules a NET_RX_SOFTIRQ. The kernel shifts to NAPI polling mode: it spins draining the RX ring in a tight loop, processing up to budget=64 packets per poll cycle. This prevents interrupt storms at high packet rates (10–100 Gbps).
Phase 10 — Kernel TCP State Machine & Accept Queue
SYN queue → Accept queue → epoll
The packet ascends the Linux network stack. The TCP layer manages the SYN queue and Accept queue, then notifies user space via epoll.
10.1 IP layer demultiplexing
The IPv4 layer validates the header checksum (IPv6 headers carry none) and checks the destination IP against the server's interface addresses. It looks up the transport protocol number (6 = TCP, 17 = UDP) in the protocol table and passes the payload upward.
10.2 SYN queue → Transmission Control Block (TCB)
The TCP layer hashes the 4-tuple to locate or create a Transmission Control Block (TCB). For a new SYN, it allocates a half-open TCB in the SYN queue and sends SYN-ACK. The SYN queue is protected against SYN flood attacks by SYN cookies — the initial sequence number encodes a hash of the 4-tuple so no per-connection state is needed until the final ACK arrives.
10.3 ACK → Accept queue → accept4()
When the client's final ACK arrives, the kernel moves the TCB to the socket's Accept queue. The GFE process is blocked on epoll_wait(). The kernel signals an EPOLLIN event on the listening socket FD. A GFE worker thread wakes, calls:
accept4(listen_fd, (struct sockaddr *)&client_addr, &addr_len, SOCK_NONBLOCK)
This dequeues the connection and returns a new non-blocking socket FD plus the client's sockaddr_in struct (IP + ephemeral port, big-endian network byte order):
struct sockaddr_in {
sa_family_t sin_family; /* AF_INET */
in_port_t sin_port; /* client ephemeral port, big-endian */
struct in_addr sin_addr; /* client public IP address */
    char sin_zero[8];        /* padding */
};
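The byte-order details of that struct can be illustrated by packing and unpacking it by hand — a sketch with an example client address; sin_family is stored in host byte order while sin_port and sin_addr are big-endian (network order):

```python
import socket
import struct

# Pack a sockaddr_in the way the kernel would fill it in for accept4():
# sin_family in host byte order, sin_port and sin_addr in network
# (big-endian) byte order, then 8 bytes of sin_zero padding.
raw = (struct.pack("=H", socket.AF_INET)
       + struct.pack("!H", 54321)             # client ephemeral port
       + socket.inet_aton("203.0.113.7")      # client IP (already big-endian)
       + b"\x00" * 8)                         # sin_zero padding

family, = struct.unpack("=H", raw[:2])
port,   = struct.unpack("!H", raw[2:4])       # "!" = network byte order
ip = socket.inet_ntoa(raw[4:8])

print(family == socket.AF_INET, port, ip)  # True 54321 203.0.113.7
```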
Phase 11 — User-Space TLS Termination
BoringSSL, AES-NI, ECDHE
The GFE registers the new client FD with its epoll instance and performs TLS termination using BoringSSL, Google's internal fork of OpenSSL.
11.1 BoringSSL handshake
BoringSSL processes the ClientHello, selects a cipher suite, signs with its private key (ECDSA P-256 or RSA-2048), and derives symmetric session keys via ECDHE. The server's private key is stored in a hardware security module (HSM) or in Google's Titan security chips.
11.2 SSL_read() + AES-NI decryption
After the handshake, incoming TLS records trigger EPOLLIN. The GFE calls SSL_read(), which:
- Pulls ciphertext from the kernel socket buffer
- Invokes the EVP_AEAD_CTX_open() primitive
- Calls the AES-NI-accelerated aesni_gcm_decrypt routine

Each AES round is executed as a single AESENC instruction (the VAES extension widens this across SIMD registers). The GCM authentication tag is verified in the same pass — a forged or bit-flipped byte causes immediate rejection and connection termination.
11.3 Private key security (HSM/Titan)
The TLS private key never exists in cleartext RAM on production GFE servers. Signing operations are delegated to a Titan coprocessor or an Asylo TEE (Trusted Execution Environment). This limits the blast radius of a memory-disclosure vulnerability such as Heartbleed.
Phase 12 — HTTP/2 Parsing, Header Decompression & RPC Dispatch
HPACK → gRPC dispatch
Decrypted bytes enter the GFE's HTTP/2 state machine. Headers are decompressed, the request is parsed, and an internal gRPC call is formed.
12.1 HPACK decompression & header extraction
The GFE's HTTP/2 parser consumes HEADERS frames. It runs the HPACK decoder — maintaining a dynamic table of past headers — to reconstruct the full header list: :method GET, :path /watch?v=dQw4w9WgXcQ, cookie: SID=.... The SID token is extracted for Gaia authentication.
12.2 Virtual host & path routing
The GFE evaluates the Host header (www.youtube.com) and path (/watch) against its routing table — a radix tree keyed on (host, path prefix). It resolves to the "Watch" backend cluster. A weighted round-robin or least-loaded selection picks a specific backend server.
12.3 Protocol translation → protobuf / gRPC
The GFE strips the external TLS envelope, constructs an internal Protocol Buffer message encoding the request (method, path, headers, client IP from sockaddr_in.sin_addr), and transmits it as a gRPC call over Google's internal encrypted datacenter mesh (ALTS — Application Layer Transport Security).
Phase 13 — Backend Microservice Orchestration — Fan-out RPCs
Gaia, Spanner, TPU, Ads
The YouTube Watch service (C++ / Go) receives the gRPC call and immediately fans out parallel RPCs to specialised datastores and ML serving systems.
13.1 Identity / auth — Gaia service
An RPC to Gaia (Google's identity service) validates the SID session cookie. Gaia decrypts the token, checks its HMAC signature, looks up the user record in a distributed Bigtable, and returns the internal User ID (GAIA ID) and OAuth scopes within ~1 ms.
13.2 Video metadata — Cloud Spanner / Vitess
An RPC queries the video metadata shard keyed on dQw4w9WgXcQ. Cloud Spanner provides globally consistent reads via TrueTime timestamps (GPS + atomic clocks). The result includes title, description, channel ID, DRM flags, and publish date — retrieved in typically ~5 ms from a local replica.
13.3 Recommendations — TensorFlow / TPU
An async RPC to a TPU serving cluster running a TensorFlow ranking model. Input: user embedding vector + video embedding vector. The model scores hundreds of candidate videos via dot-product attention. Results (the "Up Next" list) arrive asynchronously while the rest of the page is assembled.
13.4 Monetisation — ad pre-roll check
An RPC to the ads engine checks the user's subscription status (YouTube Premium → no ads) and the video's monetisation flags. If an ad is to be served, an ad auction runs: DSPs submit bids in a real-time bidding (RTB) second-price auction within a ~100 ms deadline.
13.5 DASH manifest assembly
The video CDN registry is queried for all available encodings of dQw4w9WgXcQ: VP9 4K/1080p/720p/360p, AV1 1080p, H.264 1080p (for Safari/iOS). A DASH MPD manifest (XML) is generated listing segment URIs for each representation, enabling the client to perform adaptive bitrate streaming.
Phase 14 — Response Serialisation & Egress
Brotli, HTTP/2 DATA frames
The Watch service aggregates the fan-out responses and serialises the final HTML+JSON payload. The GFE compresses, frames, and encrypts it for transmission.
14.1 Brotli compression
The ~120 KB HTML+JSON payload is compressed with the Brotli algorithm (RFC 7932), which achieves ~20% better compression than gzip for text. Compression runs on a pre-built static Brotli dictionary tuned for web content. The compressed payload is typically ~25 KB.
14.2 HTTP/2 DATA frame chunking
The compressed bytes are split into HTTP/2 DATA frames (max frame size typically 16 KB). Each DATA frame carries a 9-byte header (length, type 0x0, flags, stream ID 1). Flow-control window management ensures the client's receive buffer is not overflowed.
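The 9-byte frame header layout described above (24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID) is straightforward to construct — a sketch of the wire format:

```python
import struct

def data_frame_header(length: int, flags: int = 0x0, stream_id: int = 1) -> bytes:
    # 9-byte HTTP/2 frame header: 24-bit payload length, type (0x0 = DATA),
    # 8-bit flags, then a 31-bit stream ID (high bit reserved, must be 0).
    assert length < 2**24
    return (struct.pack(">I", length)[1:]            # 3-byte length
            + bytes([0x0, flags])                    # type, flags
            + struct.pack(">I", stream_id & 0x7FFFFFFF))

hdr = data_frame_header(16384, flags=0x1, stream_id=1)  # END_STREAM set
print(len(hdr), hdr.hex())  # 9 004000000100000001
```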
14.3 SSL_write() → sendmsg() → NIC TX ring
Each DATA frame is passed to SSL_write() for TLS record encryption. The GFE calls sendmsg(), which copies ciphertext from user-space buffers into the kernel's TCP TX socket buffer. The TCP stack segments the data, appends IP/TCP headers (including TCP sequence numbers and ACK fields), and enqueues segments in the NIC's DMA TX ring. The NIC's DMA engine reads from the ring and drives the PHY to emit optical pulses back toward the client.
Layer ④ — Rendering & Media
Phase 15 — Renderer Process Allocation & Stream Handoff
MIME sniff, Site Isolation
As the first bytes of the HTTP response arrive at Chrome's Network Service, the browser decides which renderer process will own this navigation.
15.1 MIME sniff & response validation
The Network Service reads the Content-Type: text/html; charset=utf-8 header. It also performs MIME sniffing (reading the first 512 bytes) as a fallback. It confirms the response is navigable HTML, not a download or an opaque resource.
15.2 Site Isolation & renderer selection
Chrome's Site Isolation policy (enabled since Chrome 67, post-Spectre) requires that youtube.com content runs in a renderer process dedicated exclusively to the youtube.com origin. The Browser process either spawns a new sandboxed Renderer Process or reuses an existing one already dedicated to youtube.com. Each renderer is a separate OS process with a seccomp-BPF sandbox — a compromised renderer cannot directly access the filesystem, GPU, or network.
15.3 Byte stream handoff to renderer (Mojo data pipe)
The Network Service streams response bytes to the Renderer Process via a Mojo data pipe — a shared-memory ring buffer. The renderer reads from its end while the network thread writes, enabling true streaming parsing: the DOM can begin construction before the last byte of HTML arrives.
Phase 16 — DOM Construction — Tokenisation & Tree Building
Tokeniser → tree builder
The renderer's HTML parser (Blink's HTMLParser) processes the byte stream into the Document Object Model.
16.1 Byte stream → UTF-8 character stream
The TextResourceDecoder reads the charset=utf-8 from the Content-Type header and decodes the byte stream to Unicode code points. If charset is absent, the parser sniffs the BOM or uses the HTML5 encoding detection heuristics.
16.2 Tokenisation → StartTag / EndTag / Text
Blink's HTML tokeniser implements the HTML5 parsing algorithm — a state machine with ~80 states. It emits tokens: StartTag, EndTag, Character, Comment, DOCTYPE. The parser is speculative: it scans ahead for resource-loading tags (<img>, <script>, <link>) and issues preload requests before fully constructing the tree.
16.3 Token stream → DOM tree (tree builder)
The tree builder consumes tokens and constructs DOM nodes (HTMLElement, TextNode, etc.) using a stack-based insertion mode automaton (Initial → BeforeHtml → BeforeHead → InHead → InBody → etc.). The result is the document tree rooted at the Document node.
Phase 17 — CSSOM Construction
StyleSheets → CSSOM
Concurrently with DOM construction, the parser encounters <link rel="stylesheet"> tags and fetches CSS resources.
17.1 CSS parsing → StyleSheetContents
Each CSS file is parsed by Blink's CSSParser into a StyleSheetContents object — a sorted set of rules. Selectors are parsed into SelectorLists (trees of compound selectors). Property values are tokenised and validated against their grammar.
17.2 Rule matching & specificity → ComputedStyle
Blink builds the CSS Object Model (CSSOM): a mapping from DOM nodes to computed style objects. Rule matching traverses the style sheet tree; specificity (inline > ID > class > element) determines which conflicting declarations win. The result is stored in a ComputedStyle struct per element, containing the final resolved value for every CSS property.
Phase 18 — JavaScript Compilation & Execution (V8)
AST → Ignition → TurboFan
When the parser encounters a <script> tag (without async/defer), it pauses HTML parsing and hands the script source to the V8 engine.
18.1 Source → AST (Parser)
V8's Parser tokenises and parses the JS source into an Abstract Syntax Tree (AST). A pre-parser (scanner-only) performs a fast first pass to identify function boundaries, enabling the full parser to skip inner function bodies until they are called.
18.2 AST → bytecode (Ignition interpreter)
The Ignition interpreter walks the AST and generates register-machine bytecode (compact 1–3 bytes per instruction). Bytecode is immediately executed. Ignition profiles execution frequency and type feedback — recording, for example, that a particular addition always operates on SMI (small integer) operands.
18.3 Hot path → native code (TurboFan JIT)
Functions that exceed a call or loop-iteration threshold are submitted to the TurboFan optimising compiler. TurboFan builds a Sea of Nodes IR, applies type-specialised optimisations (function inlining, escape analysis, SIMD auto-vectorisation), and emits native x86-64 / ARM64 machine code. If a type assumption is violated at runtime, a deoptimisation trap drops execution back to Ignition bytecode.
18.4 YouTube SPA bootstrap
YouTube's JS initialises a Web Components / Polymer-based Single Page Application. It registers custom elements (<ytd-app>, <ytd-video-primary-info-renderer>), sets up a client-side router listening to history.pushState, and issues Fetch API calls for the video metadata JSON (loaded separately from the initial HTML skeleton).
Phase 19 — Render Tree Formation & Layout (Reflow)
Layout tree, reflow
The DOM and CSSOM are merged into the Layout Tree. The rendering engine then computes the exact geometry of every visible element.
19.1 DOM + CSSOM → Layout tree
Blink traverses the DOM and, for each node with a ComputedStyle that has display ≠ none, creates a LayoutObject. Pseudo-elements (::before, ::after) become anonymous LayoutObjects. The Layout Tree is distinct from the DOM tree — some DOM nodes have no LayoutObject (e.g., <head>); others (anonymous block boxes) have no corresponding DOM node.
19.2 Block formatting context & reflow
Blink's layout engine runs the CSS block/inline formatting algorithm. For each LayoutObject, it computes (x, y, width, height) relative to the containing block. Flexbox and Grid elements run their respective constraint-solving algorithms. YouTube's responsive layout relies heavily on CSS Grid and CSS Custom Properties. Reflow is triggered top-down; dirty bits propagate to ensure only subtrees with changed input geometry are re-laid out (incremental reflow).
Phase 20 — Paint, Layer Compositing & GPU Rasterisation
Paint records, layers, GPU
Blink generates paint operations, divides the page into compositor layers, rasterises them (hardware-accelerated on the GPU), and hands the final frame to the OS.
20.1 Display list generation (PaintController)
The PaintController traverses the Layout Tree in paint order (stacking contexts, z-index) and records a display list of drawing commands: DrawRect, DrawTextBlob, DrawImage. This is a serialisable record, not a bitmap — it can be replayed at any resolution or scale factor.
20.2 Compositor layer promotion
Elements are promoted to their own compositor layer when they have:
- CSS will-change: transform (or transform/opacity animations)
- A <video> or <canvas> element
- Heuristic prediction of animation
YouTube's video player is always a separate layer. Layer promotion trades VRAM (each layer is a GPU texture) for scroll/animation smoothness — the compositor can reposition a layer without re-running JavaScript or layout.
20.3 GPU rasterisation (OOP-R, Skia-Ganesh)
Blink sends display lists to the GPU process (out-of-process rasterisation, OOP-R). The GPU process submits them to the Skia-Ganesh rendering library, which issues OpenGL / Vulkan / Metal draw calls. The GPU rasterises each layer into a GPU texture stored in VRAM.
20.4 Compositor thread → final frame → vsync
The Compositor Thread (separate from the Main Thread) receives the list of layers and their transforms. It composites them into a final frame using the GPU's texture blending units — no CPU involvement. It submits a swap command to the OS graphics API (DirectX 12 / Metal / Vulkan). The display controller reads the framebuffer and drives the monitor's scan-out circuit at the refresh rate (60 or 120 Hz). The first pixels appear on screen.
Phase 21 — Media Source Extensions & DASH Adaptive Streaming
DASH manifest, SourceBuffer, hardware decode
The JS application initialises the <video> element using the Media Source Extensions (MSE) API to implement Dynamic Adaptive Streaming over HTTP (DASH).
21.1 MediaSource & SourceBuffer creation
JavaScript calls video.src = URL.createObjectURL(new MediaSource()). When the MediaSource sourceopen event fires, JS calls:
- addSourceBuffer('video/webm; codecs="vp9"')
- addSourceBuffer('audio/webm; codecs="opus"')

creating separate demuxed buffers for the video and audio tracks.
21.2 MPD manifest fetch & ABR selection
JS fetches the DASH MPD manifest (XML) from YouTube's CDN. It parses available Representations (1080p VP9 @ 4 Mbps, 720p VP9 @ 2 Mbps, etc.). An ABR (Adaptive Bitrate) algorithm estimates available bandwidth using throughput history and buffer occupancy, then selects the highest quality that avoids rebuffering.
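A minimal throughput-based ABR pick looks like this — an illustrative ladder and safety margin, far simpler than YouTube's actual algorithm, which also weighs buffer occupancy:

```python
# Illustrative representation ladder (bitrates in bits/second).
LADDER = [
    (4_000_000, "1080p VP9"),
    (2_000_000, "720p VP9"),
    (800_000,   "480p VP9"),
    (300_000,   "240p VP9"),
]

def pick_representation(estimated_bps: float, safety: float = 0.7) -> str:
    # Choose the highest bitrate that fits under a safety-margined
    # bandwidth estimate; fall back to the lowest rung to avoid stalling.
    budget = estimated_bps * safety
    for bitrate, name in LADDER:
        if bitrate <= budget:
            return name
    return LADDER[-1][1]

print(pick_representation(10_000_000))  # 1080p VP9
print(pick_representation(3_000_000))   # 720p VP9
```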
21.3 Segment fetch → SourceBuffer.appendBuffer()
JS issues fetch() calls for ~2–10 second video/audio segments. Received ArrayBuffers are passed to sourceBuffer.appendBuffer(data). The browser's media pipeline demultiplexes the WebM container (EBML format), extracts compressed frames, and maintains a coded frame buffer. As the video element's currentTime advances, the media pipeline dequeues the next coded frame.
21.4 Hardware media decoder (VP9 / AV1 / H.264)
Compressed video frames are submitted to the OS's hardware media decoder:
- VAAPI (Linux / Intel/AMD GPU)
- VideoToolbox (macOS / iOS)
- DXVA2 / D3D11VA (Windows)
- MediaCodec (Android)
The dedicated video decode ASIC on the GPU die processes frames at full resolution without CPU involvement. Output is planar YUV 4:2:0 video frames in VRAM.
21.5 A/V sync via presentation timestamps (PTS)
Each video frame and audio packet carries a Presentation Timestamp (PTS) in the container's timebase. The media pipeline maintains a media clock locked to the audio renderer's playback position (audio is the master clock). Video frames are held in a decoded frame queue and released to the compositor exactly when currentTime ≥ frame.PTS.
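The release rule — hold a decoded frame until the audio-locked clock reaches its PTS — can be sketched as a simple queue filter (hypothetical frame tuples, PTS in seconds):

```python
def frames_due(decoded_queue, media_clock):
    """Split the decoded-frame queue into frames whose PTS the
    audio-locked media clock has reached, and frames still held.

    decoded_queue: list of (pts_seconds, frame) in presentation order.
    """
    released, held = [], []
    for pts, frame in decoded_queue:
        (released if pts <= media_clock else held).append((pts, frame))
    return released, held

# Three frames at ~30 fps; the clock has just passed the second frame's PTS.
queue = [(0.000, "f0"), (0.033, "f1"), (0.066, "f2")]
out, remaining = frames_due(queue, media_clock=0.034)
print([f for _, f in out])  # ['f0', 'f1']
```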
21.6 YUV texture upload, GPU composite & audio DAC output
Decoded YUV frames are uploaded as GPU textures and composited into the video compositor layer by the Compositor Thread, with a YUV→RGB conversion shader applied on the GPU.
Audio PCM samples are written into the OS audio daemon's ring buffer:
- CoreAudio (macOS / iOS)
- PulseAudio / PipeWire (Linux)
- WASAPI (Windows)
The DAC converts the digital PCM samples to analogue waveforms, drives the speaker amplifier — and Rick Astley begins to sing.
Summary of Layer Boundaries
| Layer | Entry point | Exit point |
|---|---|---|
| ① Client | KEY_UP event on Enter | TLS-encrypted bytes leave the NIC |
| ② Network transit | First router hop after the client NIC | Packet arrives at Google PoP NIC |
| ③ Server | GFE NIC PHY receives optical pulse | Encrypted HTTP/2 DATA frames leave GFE NIC |
| ④ Rendering & media | Chrome Network Service receives first response bytes | GPU scans out final composited frame; DAC emits audio |
Total elapsed wall-clock time (typical broadband, nearby PoP): ~200–400 ms to first meaningful paint; ~1–2 s to first video frame playback.