Subham

Posted on Jul 3

System Design Chapter 1: Fundamentals: Servers, Latency, Request Flow and Throughput

#interview #career #systemdesign #backend

You tap "order food" and just sit there staring at the screen.
Two seconds pass, then five, and you start refreshing like that'll help.
That miserable wait is basically the whole first chapter of system design, right there.

What Actually Happens When You Tap That Button

A server is just a computer sitting somewhere else, waiting for you to ask it something.
That's it, no magic, just a machine in a building somewhere, ready to answer. 🖥️

Think about ordering food through a delivery app for a second.
Before your food even starts cooking, four things quietly happen behind the scenes.

Your phone looks up which restaurant location can take the order — that's DNS, turning a name into an address.
A dispatcher decides which kitchen and driver have room to handle it — that's the load balancer.
The kitchen itself cooks your food — that's the server doing the actual work.
Your food travels back to your door — that's the response.

Feels like a lot just for one button tap, doesn't it? 😅

You → DNS Lookup → Load Balancer → Server (Kitchen) → Response (Your Food)
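
To make that flow concrete, here's a toy sketch in Python. Nothing in it touches a real network: the DNS table, the hostname, the IP address, the kitchen names, and the `handle_order` function are all made-up stand-ins, and round-robin is just one simple way a load balancer might pick a server.

```python
from itertools import cycle

# Hypothetical DNS table: name -> address of the load balancer.
DNS_TABLE = {"orders.example.com": "203.0.113.10"}

# Hypothetical pool of kitchen servers sitting behind that address.
SERVER_POOL = {"203.0.113.10": ["kitchen-a", "kitchen-b", "kitchen-c"]}


def dns_lookup(hostname: str) -> str:
    """Turn a name into an address (raises KeyError if the name is unknown)."""
    return DNS_TABLE[hostname]


class RoundRobinBalancer:
    """Hands each new request to the next server in the pool, in order."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self) -> str:
        return next(self._servers)


# One balancer per address, so its position in the rotation is remembered.
BALANCERS = {addr: RoundRobinBalancer(pool) for addr, pool in SERVER_POOL.items()}


def handle_order(server: str, order: str) -> str:
    """The server does the actual work and produces the response."""
    return f"{server} cooked {order}"


def place_order(hostname: str, order: str) -> str:
    address = dns_lookup(hostname)        # DNS: name -> address
    server = BALANCERS[address].pick()    # load balancer: choose a server
    return handle_order(server, order)    # server works, response comes back


responses = [place_order("orders.example.com", dish)
             for dish in ["pizza", "ramen", "tacos", "curry"]]
for line in responses:
    print(line)
```

With three kitchens and four orders, the fourth order wraps back around to the first kitchen, which is all "round-robin" means.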

Latency: The Wait You Feel

Latency is just the time between asking for something and actually getting it.
Same as waiting for those three typing dots to finally turn into your friend's reply.

Same request, three different distances, three totally different waits:

Same data center: under 1 millisecond
Across data centers, same region: 1 to 2 milliseconds
Across the globe: 50 to 150 milliseconds

That's the difference between a whisper across the room and a shout across the ocean.
Wild how much geography still matters online, huh? 🌊
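
A big part of that gap is plain physics. Here's a rough back-of-the-envelope sketch, assuming light in optical fiber travels at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and that the cable runs in a straight line. The distances are illustrative guesses, not measurements, and real requests add routing and processing time on top.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumption: ~2/3 the speed of light in a vacuum


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fiber, ignoring routing, queuing, and processing."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


# Illustrative distances (assumed, not measured).
for label, km in [("same building", 0.1),
                  ("same region", 100),
                  ("across the globe", 10_000)]:
    print(f"{label}: {min_round_trip_ms(km):.3f} ms")
```

Under those assumptions, 100 km costs about 1 ms round trip and 10,000 km costs about 100 ms, before any server has done a single thing.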

Throughput: The Capacity Nobody Notices Until It's Gone

Throughput doesn't care how long your one request took.
It only cares how many requests the whole system can finish in a given stretch of time.

Picture a single elevator in a busy office building.
One elevator ride floor to floor is latency; how many people the whole bank moves per hour is throughput.
Add three more elevators, and suddenly the wait disappears — nobody's ride got faster, though. 🛗

More throughput usually comes from one of these:

More servers running in parallel
Batching small requests into bigger ones
Multiple threads or workers per machine
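
Here's a small sketch of the "more workers" idea, using Python's `concurrent.futures.ThreadPoolExecutor`. The `handle_request` function is a hypothetical stand-in that just sleeps for 50 ms to imitate waiting on a database or network, and the exact numbers will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    """Hypothetical request handler: pretend to wait 50 ms on I/O."""
    time.sleep(0.05)
    return request_id


def run_batch(num_requests: int, workers: int) -> float:
    """Process all requests with a fixed-size worker pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle_request, range(num_requests)))
    return time.perf_counter() - start


one_worker = run_batch(num_requests=8, workers=1)
four_workers = run_batch(num_requests=8, workers=4)

print(f"1 worker:  {8 / one_worker:.0f} requests/sec")
print(f"4 workers: {8 / four_workers:.0f} requests/sec")
```

Every single request still takes about 50 ms in both runs. Only the number finished per second changes, which is the elevator story in code.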

Latency vs Throughput: The Tug of War

Here are the two side by side, since people mix them up constantly.

Aspect	Latency	Throughput
Measures	Time for one request	Requests handled per unit of time
Common units	ms, seconds	RPS, TPS, QPS
Improved by	Caching, CDNs, faster hardware	More servers, batching, more workers
Who feels it	The one user waiting	The whole system under heavy load
Matters most for	Trading platforms, chat apps	Batch analytics, log ingestion

Notice they don't move together — fixing one doesn't automatically fix the other.
Sometimes fixing one makes the other worse. 😬

I once spent a whole afternoon adding servers to a slow app.
Nothing got faster, because it turns out it was a latency problem, not a throughput one.
Lesson learned the hard way. 🙃

Picture it like a rubber band stretched between two hands — pull one side, the other side tightens too.

Doing the Math: Little's Law

There's actually a real formula tying latency and throughput together (stay with me, it's simple).

Throughput = Concurrency ÷ Latency

Say your delivery app wants to handle 500 orders per second, and each order takes 200 milliseconds to process.
That means roughly 100 orders need to be happening at the exact same moment.
Sounds like a lot of pans on the stove at once, right? 🍳
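
Here's the same arithmetic as a quick sketch. The first function rearranges the formula to Concurrency = Throughput × Latency, and the second is the formula as written. The function and parameter names are made up for illustration.

```python
def required_concurrency(throughput_per_s: float, latency_ms: float) -> float:
    """Little's Law rearranged: requests in flight = throughput x latency."""
    return throughput_per_s * latency_ms / 1000


def max_throughput(concurrency: float, latency_ms: float) -> float:
    """Little's Law as written: throughput = concurrency / latency."""
    return concurrency * 1000 / latency_ms


in_flight = required_concurrency(throughput_per_s=500, latency_ms=200)
print(in_flight)  # 100.0

# If latency doubles, the same 500 orders/sec needs twice as many in flight.
print(required_concurrency(throughput_per_s=500, latency_ms=400))  # 200.0
```

Read it the other way around too: if your system can only hold 100 orders in flight and latency creeps up to 400 ms, `max_throughput(100, 400)` says you're down to 250 orders per second.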

Those milliseconds matter in the real world too, not just on a whiteboard.
Amazon once found that an extra 100 milliseconds of latency cost them roughly 1% in sales.
Google saw an extra half a second of load time drop search traffic by 20%. 📉

When It All Falls Apart: A Real Outage Story

In October 2025, this exact stuff took down a huge chunk of the internet for hours.
One of Amazon's databases, DynamoDB, had a DNS problem: basically, the phonebook entry that tells everything else where to find DynamoDB went missing.

DNS lookup fails → servers can't find each other → requests time out → apps go down

Requests couldn't find the right server anymore, so app after app started timing out.
Downdetector logged over 6.5 million reports that day, and apps like Snapchat, Alexa, Coinbase, and Duolingo all went down.
The DNS itself got fixed in about 3 hours, but the cascading mess took roughly 12 more hours to fully clear. 😳

One tiny broken lookup, and half the internet forgot how to find itself. 🙃
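
Here's a toy model of that chain reaction, assuming a made-up DNS table and made-up service dependencies (none of these names or relationships come from AWS's actual architecture, and the sketch assumes the dependencies contain no cycles). It only shows the shape of the failure: once one name stops resolving, everything that depends on it, directly or indirectly, fails too.

```python
# Hypothetical DNS table; the entry for "database.internal" has gone missing.
DNS_TABLE = {
    "auth.internal": "10.0.0.2",
    "orders.internal": "10.0.0.3",
    "app.internal": "10.0.0.4",
    "status-page.internal": "10.0.0.5",
}

# Hypothetical dependencies: each service lists the hostnames it must reach.
DEPENDS_ON = {
    "database.internal": [],
    "auth.internal": ["database.internal"],
    "orders.internal": ["database.internal", "auth.internal"],
    "app.internal": ["orders.internal"],
    "status-page.internal": [],
}


def is_reachable(host: str) -> bool:
    """A host works only if its name resolves and everything it depends on works."""
    if host not in DNS_TABLE:
        return False
    return all(is_reachable(dep) for dep in DEPENDS_ON[host])


down = sorted(host for host in DEPENDS_ON if not is_reachable(host))
print("down:", down)
```

Only one DNS entry is missing, yet four of the five services end up in the `down` list. The status page survives simply because it depends on nothing else.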

TL;DR

Latency = how long one request takes. Throughput = how many requests get handled per unit of time.
A request travels You → DNS → Load Balancer → Server → Response.
Fixing latency and fixing throughput are different jobs — sometimes even opposite ones.
Little's Law: Throughput = Concurrency ÷ Latency.
Real numbers matter: under 1ms in one data center, 50–150ms across the globe.
Even giants like AWS go down when one tiny DNS lookup fails.

Up next: we go inside the server itself, covering threads, connections, and why "just add more RAM" doesn't always save you.

Try this yourself: next time an app feels slow, ask whether it's actually a latency problem or a throughput one.
