You may have built some personal projects where you have a backend and a database, and everything just… works. You hit an API, the server does its thing, sends back a response, done. That's a perfectly fine way to build something for yourself or for a college project.
The problem starts when real users show up.
This is the first post in a series where I'm starting system design from absolute zero. No whiteboard interview tricks, no "design Twitter in 45 minutes." Just the handful of ideas that everything else in this field sits on top of: what system design actually means, what a server really is, why latency and throughput get mixed up all the time, and what actually happens in the few hundred milliseconds between you hitting Enter and a page showing up on your screen.
What Is System Design, Really?
Here's the thing nobody tells you when you start: system design isn't about knowing a long list of fancy tools. It's about tradeoffs. Every single decision you make (one big server or ten small ones, one database or five, add a cache or don't) gives you something and takes something away at the same time.
Take a food delivery app as an example. A single laptop sitting under your desk could technically run the whole thing for ten users. It would fall over instantly at ten thousand. System design is basically the set of decisions that gets you from "works fine on my laptop" to "works for ten thousand people ordering lunch at the exact same time": more servers, a load balancer to split the traffic between them, and a cache so the menu isn't hitting the database on every single request.
I used to think system design meant memorizing architecture diagrams: draw the boxes, draw the arrows, done. It doesn't. It means understanding why the diagram looks the way it does. Which is exactly why this post starts with servers and latency instead of jumping straight to Kafka and Kubernetes.
Whenever you're evaluating any system design choice, just ask yourself what it costs. More servers cost money. More caching costs you staleness (your data might be a little outdated). More replication costs you consistency (copies can briefly disagree with each other). There's no free option here, only the tradeoff you're willing to live with.
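To make the "caching costs you staleness" point concrete, here is a minimal sketch of a cache that trusts each entry for a fixed time. Everything in it is made up for illustration: the `TTLCache` class, the `menu_db` dict standing in for a database, and the hand-driven clock that keeps the demo deterministic.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny read-through cache: an entry is trusted for ttl_seconds."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.loads = 0  # how many times we had to go to the slow source

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fast, but possibly outdated
        self.loads += 1
        value = self._load(key)
        self._entries[key] = (now, value)
        return value


# Hypothetical "database": a dict we can change behind the cache's back.
menu_db = {"latte": "$4"}
fake_now = [0.0]  # manual clock instead of time.time()

cache = TTLCache(load=lambda k: menu_db[k], ttl_seconds=60,
                 clock=lambda: fake_now[0])

first = cache.get("latte")   # miss: loaded from the database
menu_db["latte"] = "$5"      # the price changes in the database
stale = cache.get("latte")   # hit: the cache still answers with the old price
fake_now[0] = 61.0           # the TTL runs out
fresh = cache.get("latte")   # miss again: now the new price shows up
print(first, stale, fresh, cache.loads)
```

The second read is the tradeoff in miniature: it skips the database entirely, and in exchange it returns a price that is already wrong.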
So What Is a Server, Actually?
Most of you already have a rough idea of what a server is, but I want to slow down here because a lot of the confusion later in this field traces back to a shaky understanding of this one word.
A server is nothing but a program, usually running on some dedicated hardware, that sits there listening on a network port, waits for requests, and sends back responses. That's it. "Server" describes a behavior, not a physical box sitting in a data center somewhere.
Your laptop can be a server. A $5-a-month virtual machine can be a server. A cluster of a thousand machines behind one domain name is also, collectively, "the server" as far as your browser cares. What actually makes something a server is that it sits idle, listening, until a client (a browser, a mobile app, another server) sends it something to do.
Think of a coffee shop. The barista stands behind the counter, listening for orders (listening on a port). A customer walks up and asks for a latte (sends a request). The barista makes it and hands it over (sends a response). The barista isn't walking around the street forcing coffee on random people; they wait to be asked first. That's the client-server model in one sentence: clients ask, servers answer.
| Term | What it actually means | Real-world version |
|---|---|---|
| Server | A program that listens for requests and responds | The barista behind the counter |
| Client | A program that sends requests | The customer ordering |
| Port | The specific "door" a server listens on (e.g. 443 for HTTPS) | The table number the order goes to |
| Request | The data a client sends, asking for something | "One latte, please" |
| Response | The data the server sends back | The coffee, handed over |
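Here is roughly what that table looks like as code: a bare-bones sketch using Python's standard `socket` module, with the server and the client running in the same process so it can be tried on one machine. The function names (`serve_one`, `order`) and the reply text are invented for the example, and a real server would loop forever instead of handling a single customer.

```python
import socket
import threading


def serve_one(listener: socket.socket) -> None:
    """The barista: wait for one client, read the request, send a response."""
    conn, _addr = listener.accept()      # blocks until a client connects
    with conn:
        request = conn.recv(1024)        # "One latte, please"
        conn.sendall(b"here is your " + request)


def order(host: str, port: int, item: bytes) -> bytes:
    """The customer: connect, send a request, read the response."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(item)
        return client.recv(1024)


# Port 0 asks the OS for any free port; a real service would use a fixed one.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
host, port = listener.getsockname()

barista = threading.Thread(target=serve_one, args=(listener,), daemon=True)
barista.start()

reply = order(host, port, b"latte")
barista.join(timeout=5)
listener.close()
print(reply)
```

Notice that nothing about `serve_one` depends on what machine it runs on. It listens, it answers, and that is all "server" means here.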
Note: When you type https://abc.com into your browser, your browser first asks DNS "what IP address does this domain point to?" Then it connects to that IP address directly. At the network level, requesting https://abc.com is the same thing as connecting to 35.154.33.64:443, where 443 is just the default port for HTTPS. Remembering IP addresses is a pain, which is exactly why people buy domain names and point them at their server's IP.
A mistake I made early on: I assumed bigger always meant better, so I'd instinctively reach for a beefier server instead of asking whether I actually needed more listeners, or just a smarter cache. Most of the slow systems I've had to debug weren't underpowered at all; they were just doing a bunch of unnecessary work on every single request.
Key takeaway: A server is defined by its behavior (listening on a port and responding to requests), not by its size, its location, or how impressive it sounds. And one "server" in production is usually many machines quietly working together behind the scenes.
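If you want to see the "domain in, IP and port out" step yourself, the standard library exposes the OS resolver through `socket.getaddrinfo`. The sketch below wraps it in a small helper (`resolve` is a name I made up) and feeds it a literal IP so it runs without network access.

```python
import socket
from typing import List, Tuple


def resolve(host: str, port: int) -> List[Tuple[str, int]]:
    """Return the (ip, port) pairs the OS resolver gives back for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr), and
    # sockaddr starts with (ip, port) for both IPv4 and IPv6.
    return [(info[4][0], info[4][1]) for info in infos]


# A literal IP needs no lookup, so this line works offline.
addresses = resolve("127.0.0.1", 443)
print(addresses)

# With network access, resolve("example.com", 443) would return whatever
# addresses that domain currently points at.
```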
Latency vs Throughput: Why People Keep Mixing These Up
These are two words you'll hear constantly, and they sound related because they're both kind of describing "speed." But you can absolutely improve one while making the other worse, and that trips people up all the time.
Latency is the time a single request takes, start to finish. Measured in milliseconds. If loading a webpage takes 200ms, that request has 200ms of latency. Simple as that: faster page, lower latency; slower page, higher latency.
Throughput is how many requests your system can handle per second. Measured in requests per second (RPS) or transactions per second (TPS). Every server has a ceiling: it can only handle so many requests per second before it starts choking or falling over entirely.
I ran into this exact confusion building a checkout flow once. Each individual request was taking 80ms, which looked totally fine on paper. Then, under Black-Friday-style load, checkouts started timing out even though no single request had gotten any slower. What was actually happening: the server could only process so many requests at once, so new requests were queueing up behind old ones. That queueing time doesn't show up as "latency of the operation" in your metrics; it shows up as your user staring at a loading spinner, wondering what's wrong. That's a throughput ceiling wearing a latency costume.
The classic analogy is a highway. Latency is how long it takes one car to drive from one end to the other. Throughput is how many cars pass a given point in an hour. You can widen the highway (add more lanes) to raise throughput without making any single car go a single mile-per-hour faster.
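The checkout story and the highway analogy can both be reproduced with a few lines of simulation. This is a toy model, not a benchmark: requests arrive at a fixed interval, every request needs exactly 80ms of work, and `workers` plays the role of lanes. The function name and all the numbers besides the 80ms are assumptions chosen for the demo.

```python
import heapq
from typing import List


def simulate(requests: int, arrival_gap_ms: float, service_ms: float,
             workers: int) -> List[float]:
    """Return the time each request spends in the system (waiting + work)."""
    free_at = [0.0] * workers  # the moment each worker is next idle
    heapq.heapify(free_at)
    totals = []
    for i in range(requests):
        arrived = float(i * arrival_gap_ms)
        # If every worker is busy, the request waits for the earliest free one.
        start = max(arrived, heapq.heappop(free_at))
        done = start + service_ms  # the work itself never gets slower
        heapq.heappush(free_at, done)
        totals.append(done - arrived)
    return totals


# One worker can finish a request every 80ms, i.e. 12.5 requests per second.
quiet = simulate(requests=100, arrival_gap_ms=100, service_ms=80, workers=1)
busy = simulate(requests=100, arrival_gap_ms=40, service_ms=80, workers=1)
wider = simulate(requests=100, arrival_gap_ms=40, service_ms=80, workers=2)

print(max(quiet), max(busy), max(wider))
```

In the quiet run every request takes 80ms. In the busy run the work is still 80ms per request, but the last one waits behind all the others and ends up taking about four seconds. Adding a second worker brings it back to 80ms without making any single request faster, which is the extra lane on the highway.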
| Metric | Measures | Typical unit | Improved by |
|---|---|---|---|
| Latency | Time for one request | milliseconds (ms) | Caching, shorter network path, faster disk/CPU |
| Throughput | Requests handled per second | requests/sec (RPS) | More servers, parallelism, batching, queues |
Batching is where this tradeoff shows up most clearly. Say you group 100 small database writes into one batch write. That raises your throughput because you're paying the per-write overhead only once instead of 100 times. But the first write in that batch is now sitting around waiting for the other 99 to show up before anything gets committed, so its individual latency just went up. You haven't made that write any faster; you've bought "system capacity" by adding to "the first user's wait."
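Putting rough numbers on that helps. The sketch below assumes a fixed 5ms overhead per commit, 0.1ms of real work per row, and one new write arriving every millisecond. All three figures are invented for illustration, as is the `batch_stats` helper.

```python
from typing import Tuple


def batch_stats(batch_size: int, overhead_ms: float, per_row_ms: float,
                arrival_gap_ms: float) -> Tuple[float, float]:
    """Return (max writes per second, latency in ms of a batch's first write)."""
    commit_ms = overhead_ms + batch_size * per_row_ms  # overhead paid once per batch
    throughput = batch_size / commit_ms * 1000
    fill_wait_ms = (batch_size - 1) * arrival_gap_ms   # first write waits for the rest
    return throughput, fill_wait_ms + commit_ms


single_tput, single_latency = batch_stats(1, overhead_ms=5, per_row_ms=0.1,
                                          arrival_gap_ms=1)
batch_tput, batch_latency = batch_stats(100, overhead_ms=5, per_row_ms=0.1,
                                        arrival_gap_ms=1)

print(f"one at a time: {single_tput:.0f} writes/s, {single_latency:.1f} ms")
print(f"batches of 100: {batch_tput:.0f} writes/s, {batch_latency:.1f} ms")
```

Under these assumptions, batching lifts the ceiling from about 196 to about 6,667 writes per second, while the first write in each batch goes from roughly 5ms to 114ms.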
Before you go optimizing anything, ask which one your users actually feel. A checkout button cares about latency: one person, one click, one wait. A nightly report job cares about throughput: total rows processed by 6 a.m., and nobody's staring at a spinner for that. Optimizing the wrong one just wastes your time.
In short: Latency measures how long one request takes. Throughput measures how many requests your system can chew through per second. In an ideal world you want both high throughput and low latency, but real systems almost always make you trade one for the other somewhere.
What Actually Happens Between Hitting Enter and Seeing a Page
Type a URL and hit Enter, and here's the honest sequence of events before a single pixel changes on your screen:
- DNS lookup. Your browser asks a DNS resolver, "what IP address does this domain point to?" On an uncached lookup, this typically adds 80–200ms of delay before anything even starts loading, though a fast anycast resolver can bring that down to roughly 10ms.
- TCP handshake. Browser and server exchange SYN, SYN-ACK, and ACK packets to open a reliable connection: three messages of "hello, hello back, confirmed" (about one full round trip) before any real data moves at all.
- TLS handshake. For HTTPS, both sides negotiate encryption and verify the server's certificate. TLS 1.3 typically adds 50–100ms (one extra round trip); older TLS 1.2 needs two round trips and can tack on 100–200ms.
- HTTP request and response. Only now does your browser send GET /, and only now does the server actually send back the page.
And none of that even counts the physical distance the signal has to travel. Light in fiber-optic cable moves slower than in a vacuum, roughly 207,000 kilometers per second, which works out to about a millisecond for every 207 kilometers of cable. A round trip from New York to Sydney, purely from geography, costs about 160ms, and that's before your server has done a single bit of actual work.
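The arithmetic behind that figure is short enough to check yourself. The sketch below uses the 207 km per millisecond number from above and assumes a straight-line distance of about 16,000 km between New York and Sydney. That distance is my approximation, and real cables take longer routes, so treat the result as a floor.

```python
FIBER_KM_PER_MS = 207.0  # light in fiber covers roughly 207 km per millisecond


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay only: there and back, with no processing at all."""
    return 2 * distance_km / FIBER_KM_PER_MS


NYC_TO_SYDNEY_KM = 16_000  # approximate great-circle distance (assumption)

delay = round_trip_ms(NYC_TO_SYDNEY_KM)
print(f"{delay:.0f} ms")
```

This comes out at around 155ms, in the same ballpark as the 160ms quoted above once you allow for cables not following a perfect straight line.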
Request timeline: cold connection, cross-region
| Phase | Typical cost | Notes |
|---|---|---|
| DNS lookup | 80–200ms | uncached, no anycast |
| TCP handshake | 20–50ms | 1 round trip |
| TLS handshake | 50–200ms | 1 round trip on TLS 1.3, 2 on TLS 1.2 |
| HTTP request + server processing | varies | 10ms cached, 500ms+ hitting a slow DB |
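Adding up the fixed phases gives the budget a cold connection burns before your code runs. The sketch below takes the DNS and TCP ranges from the timeline and the TLS range from the per-version figures in the list above (50ms for a good TLS 1.3 handshake, up to 200ms for a slow TLS 1.2 one). The dictionary layout is just one convenient way to write it down.

```python
# (best case, worst case) in milliseconds for each phase of a cold connection
phases = {
    "dns": (80, 200),
    "tcp": (20, 50),
    "tls": (50, 200),  # 50 = TLS 1.3 best case, 200 = TLS 1.2 worst case
}

best = sum(low for low, _ in phases.values())
worst = sum(high for _, high in phases.values())

print(f"{best}-{worst} ms before the server starts working")
```

That is 150 to 450ms of pure setup, which is why the server's own processing time is often the smaller part of a slow first page load.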
This is why every CDN pitch you've ever heard boils down to one idea: shorten the distance. A CDN puts a copy of your content on a server physically close to the user, so the DNS lookup, the TCP handshake, and the TLS handshake all happen against a nearby edge node instead of your origin server sitting on the other side of the planet.
People tend to assume a slow page load means a slow server. A lot of the time it isn't the server at all. It's an uncached DNS lookup, a cold TCP/TLS handshake, or just a user who's three continents away from your origin. Always check the network waterfall before you go blaming the backend.
I've personally kept connections alive using HTTP keep-alive just to avoid paying for a brand new TCP and TLS handshake on every single request. That alone shaved over 100ms off repeat requests in a project I benchmarked, without touching a single line of application logic.
Exercise for you: Open your browser's dev tools, go to the Network tab, and reload any website. Look at the waterfall for the very first request: you'll actually see the DNS, TCP, and TLS phases laid out as separate chunks of time before "Content Download" even starts.
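You can watch connection reuse happen with nothing but the standard library. This is a sketch, not the setup from that benchmark: a throwaway local HTTP server counts how many TCP connections it accepts, and a client makes five requests either on one kept-alive connection or on a fresh one each time. `CountingHandler` and `count_connections` are names invented for the demo, and there is no TLS involved, so it only shows the handshakes being skipped, not the time saved.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default
    connections = 0

    def setup(self) -> None:
        super().setup()
        CountingHandler.connections += 1  # runs once per accepted TCP connection

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep the demo output quiet


def count_connections(reuse: bool, requests: int = 5) -> int:
    """Make `requests` GETs and return how many connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port)
        for _ in range(requests):
            if not reuse:
                conn.close()  # throw the connection away before every request
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


with_keep_alive = count_connections(reuse=True)
without_keep_alive = count_connections(reuse=False)
print(with_keep_alive, without_keep_alive)
```

Five requests cost one connection with keep-alive and five without. On a real cross-region HTTPS endpoint, each of those extra connections would also mean another TCP and TLS handshake.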
Key Takeaways
- System design is about tradeoffs between speed, cost, and reliability, not about memorizing a fixed toolset.
- A server is defined by behavior (listening on a port and responding), and a production "server" is usually many machines, not one.
- Latency is the time cost of one request; throughput is the volume a system can sustain over time. Batching and queueing routinely trade one for the other.
- A single request pays for DNS, TCP, and TLS before your server even starts processing it, often 150–450ms combined on a cold, cross-region connection.
- CDNs and connection reuse are both attacking the exact same problem: shortening or avoiding the distance and handshakes a request has to pay for.
Next up in this series: how multiple servers actually share load, including load balancing strategies and the tradeoffs between scaling vertically versus horizontally. Everything in this post is the prerequisite for that, since every scaling decision changes where and how often these network costs get paid.