Sunil Kumar

How the Web Actually Works: HTTP from the Ground Up

I've been going through Jim Kurose's networking lectures lately, and I kept finding myself pausing to re-read the same sections. Not because they were confusing - because things I'd been using for years were finally clicking into place.

This post is me writing down what I learned, in the order it started making sense.

Before HTTP, there's a webpage

A webpage isn't one file.

When you open a URL, your browser fetches a base HTML file - and that file references other objects. Images. Scripts. Stylesheets. Each one lives at its own URL. Each one has to be fetched separately. So loading a single "page" might mean firing off 20+ individual requests.
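To make "a page is many objects" concrete, here is a minimal sketch that lists what a base HTML file references. The `ObjectCollector` class, the `referenced_objects` helper, and the sample page are all made up for illustration; a real browser also follows fonts, iframes, CSS imports, and more.

```python
from html.parser import HTMLParser


class ObjectCollector(HTMLParser):
    """Collects the URLs of objects a base HTML file points to (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and scripts reference their object through `src`.
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        # Stylesheets are referenced through <link rel="stylesheet" href="...">.
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


def referenced_objects(html):
    collector = ObjectCollector()
    collector.feed(html)
    return collector.urls


# Invented sample page: one base file, four referenced objects.
SAMPLE_PAGE = """
<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body>
<img src="/logo.png">
<img src="/hero.jpg">
</body></html>
"""

if __name__ == "__main__":
    for url in referenced_objects(SAMPLE_PAGE):
        print("needs a separate fetch:", url)
```

Every URL this prints is one more request the browser has to make after the base file arrives.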

This detail matters because the entire evolution of HTTP - from 1.0 to 3 - is basically the story of making those 20 fetches faster.

HTTP runs on TCP. That has consequences.

HTTP doesn't manage its own connections. It hands that job to TCP. When your browser wants something, it first opens a TCP connection to the server (port 80 for HTTP, 443 for HTTPS), and then asks for the object.

Opening a TCP connection isn't free. It takes a round trip - your machine says "hello," the server says "hello back," and then you can actually talk. That's one RTT (round-trip time) just to shake hands, before a single byte of your webpage arrives.

So with a fresh connection per object, every HTTP request carries at least 2 RTTs of overhead: 1 to open the TCP connection, 1 for the actual request/response. Do that 20 times, one after another, and you've spent 40 RTTs before the page renders.

HTTP/1.0 vs HTTP/1.1: one change that mattered a lot

HTTP/1.0 (non-persistent): open a TCP connection, fetch one object, close the connection. Repeat for every object.

HTTP/1.1 (persistent): open a TCP connection, fetch as many objects as you need, then close. The server leaves the connection open after each response.

That one change cuts subsequent fetches from 2 RTTs to 1 RTT each. For a page with 20 objects, that's real time saved - not microseconds, but hundreds of milliseconds that users actually feel.
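The RTT arithmetic is simple enough to write down. This is a back-of-the-envelope sketch, not a performance model: it assumes objects are fetched one after another on a single connection, ignores transmission time and parallel connections, and the 50 ms RTT is a number I picked for illustration.

```python
def rtts_non_persistent(num_objects):
    """HTTP/1.0 style: each object pays 1 RTT for the handshake + 1 RTT for request/response."""
    return 2 * num_objects


def rtts_persistent(num_objects):
    """HTTP/1.1 style: one handshake up front, then 1 RTT per object on the same connection."""
    if num_objects == 0:
        return 0
    return 1 + num_objects


if __name__ == "__main__":
    objects = 20
    rtt_ms = 50  # assumed round-trip time, for illustration only
    for name, rtts in [
        ("HTTP/1.0", rtts_non_persistent(objects)),
        ("HTTP/1.1", rtts_persistent(objects)),
    ]:
        print(f"{name}: {rtts} RTTs, about {rtts * rtt_ms} ms of pure waiting")
```

With these assumptions, 20 objects cost 40 RTTs non-persistent versus 21 persistent, which is where the "hundreds of milliseconds" comes from.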

What an HTTP message looks like

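HTTP/1.x messages are plain text. You can read them. That's a deliberate design choice. (HTTP/2 and HTTP/3 carry the same methods, headers, and status codes, but in a binary framing.)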

A request looks like this:

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0
Connection: keep-alive

The first line is the request line: method, path, version. Then headers. For POST requests, there's a body - that's where form data lives.
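Because the format is just text, you can assemble a request by hand. Here is a minimal sketch; `build_request` is a hypothetical helper of mine, and real clients add more headers and handle bodies.

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request with no body, as raw bytes."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF, and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    raw = build_request(
        "GET",
        "/index.html",
        "www.example.com",
        {"User-Agent": "Mozilla/5.0", "Connection": "keep-alive"},
    )
    print(raw.decode("ascii"))
```

Those bytes are what would go down the TCP connection; nothing more is needed for a simple GET.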

A response looks like this:

HTTP/1.1 200 OK
Date: Sun, 14 Jun 2026 10:00:00 GMT
Content-Type: text/html
Content-Length: 4821

<html>...

Status line first, then headers, then the actual content. Status codes tell you what happened: 200 means it worked, 301 means it moved, 404 means it's not there, 500 means the server broke.
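Reading a response is the same idea in reverse. This sketch splits a raw HTTP/1.1 response into status, headers, and body. `parse_response` is a hypothetical helper, and it skips things a real parser must handle (chunked encoding, repeated headers, partial reads).

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into (status_code, reason, headers, body)."""
    # The first empty line separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version, three-digit code, then a free-text reason phrase.
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Date: Sun, 14 Jun 2026 10:00:00 GMT\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: 13\r\n"
        b"\r\n"
        b"<html></html>"
    )
    code, reason, headers, body = parse_response(raw)
    print(code, reason, headers["content-type"], len(body))
```

Header names are lowercased here because HTTP treats them as case-insensitive.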

Worth memorizing: HEAD returns the same status line and headers a GET would, but no body - useful when you just want to check metadata.

HTTP is stateless. Cookies exist because of that.

The server remembers nothing between requests. Each request arrives as if the client is brand new.

That's intentional. Stateless servers are simpler to build and easier to scale. But it creates an obvious problem: how does Amazon know you're still logged in between page loads?

Cookies solve this. When you first connect, the server generates a unique ID and sends it back:

Set-Cookie: user_id=8273645

Your browser stores that. On every future request to that domain, it sends:

Cookie: user_id=8273645

The server looks up that ID in its own database and now "remembers" you. The state lives server-side; the cookie is just the key.
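Here is a rough sketch of that "cookie is just the key" idea, using only Python's standard library. `SessionServer` and its `handle` method are invented for illustration, and the visit counter stands in for whatever state a real site keeps. I'm also using a random ID rather than a short number, on the assumption that you don't want session IDs to be guessable.

```python
import secrets
from http.cookies import SimpleCookie


class SessionServer:
    """Toy server-side session store (hypothetical): the cookie holds only an ID."""

    def __init__(self):
        self.sessions = {}  # user_id -> state kept on the server

    def handle(self, cookie_header=None):
        """Process one request; return (Set-Cookie value or None, visit count)."""
        user_id = None
        if cookie_header:
            cookie = SimpleCookie()
            cookie.load(cookie_header)
            if "user_id" in cookie:
                user_id = cookie["user_id"].value

        set_cookie = None
        if user_id not in self.sessions:
            # Unknown or missing ID: treat the client as brand new.
            user_id = secrets.token_hex(16)
            self.sessions[user_id] = {"visits": 0}
            set_cookie = f"user_id={user_id}"

        self.sessions[user_id]["visits"] += 1
        return set_cookie, self.sessions[user_id]["visits"]


if __name__ == "__main__":
    server = SessionServer()
    set_cookie, visits = server.handle()          # first request, no cookie
    print("Set-Cookie:", set_cookie, "| visits:", visits)
    _, visits = server.handle(set_cookie)         # browser echoes the cookie back
    print("visits:", visits)
```

The protocol stays stateless: each request is still handled on its own, and the only thing tying two requests together is the ID the browser sends back.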

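This is also part of why cookie-consent banners exist. Cookies let companies recognize you across sessions and build profiles, so EU law (the ePrivacy Directive, with GDPR defining what counts as consent) requires explicit consent before non-essential cookies are set.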

Web caches: the math is more interesting than the concept

Web caches (proxy servers) sit between your browser and the origin server. If the object you want is cached, you get it from the proxy instead of the origin. Faster, cheaper, less load on the server.

Kurose walks through a concrete example I found genuinely clarifying. Say an institution's access link runs at 1.54 Mbps and is at 97% utilization. That's not "nearly full" - at 97% utilization, queuing delays go into the minutes. The link becomes the bottleneck even if every other hop is fast.

You have two options:

  1. Upgrade to a faster link (expensive)
  2. Install a local cache (cheap)

Assume a 40% cache hit rate - 40% of objects are served locally without touching the access link at all. That drops link utilization from 97% to about 58%. At 58%, queuing delays collapse. Average end-to-end latency drops from minutes to ~1.2 seconds.

Same raw bandwidth. Radically different user experience. Just by not fetching the same things twice.
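The numbers above can be reproduced in a few lines. The link rate and hit rate come from the text; the request rate, object size, and delay figures are the values I recall from the textbook version of this example, so treat them as assumptions. The delay formula is the same simplification the example uses and only makes sense once the link is no longer saturated.

```python
ACCESS_LINK_BPS = 1.54e6   # access link rate, from the example above
REQUESTS_PER_SEC = 15      # assumed request rate
OBJECT_BITS = 100_000      # assumed average object size, in bits
INTERNET_DELAY_S = 2.0     # assumed delay from the institution out to origin servers
LOCAL_DELAY_S = 0.01       # assumed delay for anything that stays local


def utilization(hit_rate):
    """Fraction of the access link in use when `hit_rate` of requests are served by the cache."""
    traffic_bps = (1 - hit_rate) * REQUESTS_PER_SEC * OBJECT_BITS
    return traffic_bps / ACCESS_LINK_BPS


def average_delay(hit_rate):
    """Average delay per request, assuming the access link is not congested."""
    miss_delay = INTERNET_DELAY_S + LOCAL_DELAY_S
    hit_delay = LOCAL_DELAY_S
    return (1 - hit_rate) * miss_delay + hit_rate * hit_delay


if __name__ == "__main__":
    print(f"no cache:     utilization {utilization(0.0):.2f}")
    print(f"40% hit rate: utilization {utilization(0.4):.2f}")
    print(f"40% hit rate: average delay about {average_delay(0.4):.2f} s")
```

Under these assumptions it prints 0.97, 0.58, and about 1.21 seconds, matching the figures in the walkthrough.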

Conditional GET: caches without stale data

Caches raise an obvious question: what if the cached version is out of date?

HTTP solves this with the Conditional GET. When your browser (or a proxy) has a cached copy, it includes a header with the last-modified date:

GET /image.jpg HTTP/1.1
If-Modified-Since: Wed, 10 Jun 2026 08:00:00 GMT

The server checks. If the file hasn't changed since that date, it sends back:

HTTP/1.1 304 Not Modified

No body. Just the status. The client uses its cached copy.

If the file has changed, the server sends a normal 200 with the new content.

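The result: the cache can check freshness before reusing a copy, and you don't waste bandwidth re-downloading things that haven't changed. The check still costs a round trip, but a 304 is far smaller than the object itself.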
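The server's side of that decision fits in one small function. This is a sketch under my own assumptions: `respond` is a hypothetical name, the file's modification time is passed in directly, and real servers also handle `ETag`/`If-None-Match` and malformed dates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond(last_modified, body, if_modified_since=None):
    """Return (status_code, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        # HTTP dates look like "Wed, 10 Jun 2026 08:00:00 GMT".
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            return 304, b""  # unchanged: no body, client reuses its copy
    return 200, body         # changed, or the client had no cached copy


if __name__ == "__main__":
    modified = datetime(2026, 6, 10, 8, 0, 0, tzinfo=timezone.utc)
    header = "Wed, 10 Jun 2026 08:00:00 GMT"
    print(respond(modified, b"<jpeg bytes>", header))                  # (304, b'')
    print(respond(modified.replace(day=12), b"<jpeg bytes>", header))  # (200, b'<jpeg bytes>')
```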

HTTP/2: attacking head-of-line blocking

HTTP/1.1 is persistent, but it's still sequential within a connection. If the first response is a 50MB video, the three tiny images behind it wait in line.

This is called head-of-line (HOL) blocking. A big object blocks smaller objects that could have been delivered instantly.

HTTP/2 fixes this by breaking objects into frames and interleaving them on the same connection. The large video doesn't block the images - they take turns frame by frame. The client can also tell the server which objects matter more, so critical assets arrive first.
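A tiny simulation shows why interleaving helps. Sizes are in frames and time is counted in frames sent. Plain round-robin is my simplification here, not how any particular HTTP/2 server schedules, and both function names are invented.

```python
def sequential_finish_times(sizes):
    """HTTP/1.1 style: each object is sent whole, in request order."""
    finished, clock = [], 0
    for size in sizes:
        clock += size
        finished.append(clock)
    return finished


def interleaved_finish_times(sizes):
    """HTTP/2-style sketch: round-robin, one frame per unfinished object per turn."""
    remaining = list(sizes)
    finished = [None] * len(sizes)
    clock = 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                clock += 1            # one frame goes out on the wire
                remaining[i] -= 1
                if remaining[i] == 0:
                    finished[i] = clock
    return finished


if __name__ == "__main__":
    sizes = [10, 1, 1, 1]  # one big object followed by three small ones
    print("sequential: ", sequential_finish_times(sizes))   # [10, 11, 12, 13]
    print("interleaved:", interleaved_finish_times(sizes))  # [13, 2, 3, 4]
```

The big object finishes slightly later, but the three small ones arrive almost immediately instead of waiting behind it.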

HTTP/2 also introduced server push: the server can proactively send objects the client is likely to need next, without waiting to be asked. In practice it saw limited use, and Chrome has since removed support for it.

HTTP/3: fixing the layer underneath

HTTP/2 solved HOL blocking at the application layer, but there's still a problem underneath: TCP itself.

If a single TCP packet is lost, TCP holds back every byte that came after it until the retransmission arrives, because it guarantees in-order delivery of one byte stream. HTTP/2 has multiple streams, but they all share one TCP connection - so one lost packet stalls all of them.

HTTP/3 replaces TCP with QUIC - a transport protocol built on UDP. QUIC implements its own reliability and congestion control, and it tracks ordering independently per stream. A lost packet only stalls the streams whose data it carried. Everything else keeps moving.
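To see the difference, here is a toy model of what the receiver can hand to the application while one packet is still missing. Each packet is a `(stream, sequence)` pair carrying data for a single stream, which is a simplification; the function names and the packet list are invented for illustration.

```python
def deliverable_tcp(packets, lost_index):
    """One ordered byte stream: nothing after the gap can be delivered yet."""
    return [p for i, p in enumerate(packets) if i < lost_index]


def deliverable_quic(packets, lost_index):
    """Per-stream ordering: only later packets on the lost packet's stream wait."""
    lost_stream = packets[lost_index][0]
    return [
        p
        for i, p in enumerate(packets)
        if i != lost_index and not (p[0] == lost_stream and i > lost_index)
    ]


if __name__ == "__main__":
    # Three streams (A, B, C) multiplexed on one connection; the packet at index 1 is lost.
    packets = [("A", 0), ("B", 0), ("A", 1), ("C", 0), ("B", 1), ("C", 1)]
    print("TCP-like: ", deliverable_tcp(packets, lost_index=1))
    print("QUIC-like:", deliverable_quic(packets, lost_index=1))
```

With the TCP-like rule only the first packet gets through; with the QUIC-like rule streams A and C are delivered in full and only stream B waits for the retransmission.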

QUIC also bakes in encryption by default and reduces connection setup time (folding the transport handshake and the TLS handshake into fewer round trips). The web gets faster.

A rough mental model of the whole thing

HTTP/1.0  →  2 RTTs per object, new connection each time
HTTP/1.1  →  1 RTT for subsequent objects, persistent connections
HTTP/2    →  parallel streams, no HOL blocking at app layer
HTTP/3    →  QUIC replaces TCP, no HOL blocking at transport layer

Each version is a direct response to the bottleneck the previous version exposed. That's a pattern worth noticing: protocol design is mostly the history of people fixing the problems they created.

What I'd look at next

If this is interesting to you, the next concepts that make this all more concrete are:

  • TLS handshakes (what actually happens on port 443)
  • DNS (how example.com becomes an IP address before any of the above runs)
  • TCP congestion control (what QUIC had to reimplement on top of UDP, and why)

Kurose's lectures cover all of it. The textbook (Computer Networking: A Top-Down Approach) is the companion read.

Learning in public at @sunbuilds. Portfolio at sunilk02.vercel.app.
