Latency vs Throughput

#systemdesign #backend #architecture #performance

The first time I heard "optimize for latency," I thought it meant "make it fast." So I turned off batching, flushed writes immediately, set Kafka's linger.ms to 0.

The system responded faster. And handled way less load.

Latency and throughput pull in opposite directions. Making your system faster for individual requests usually means it handles fewer of them per second. Handling more per second usually means individual requests wait longer. This tradeoff shows up everywhere and misidentifying which axis to optimize is one of the most common architecture mistakes.

DB Batch Writes

Your database is getting hammered with writes. Each write goes to disk immediately.

Latency per write: 10ms
One thread handles: 100 writes per second

Change the approach: collect 100 records, flush in one batch. One disk operation instead of 100.

Throughput went up — the disk does the same work with 100× fewer operations.

Latency went up — the first record now waits for 99 more before anything gets written.

You traded fast individual responses for higher total capacity.

Netflix's Loading Spinner

Netflix needs to send you a video. Two options:

Option A: stream one tiny chunk the moment it's ready. Low latency, you start watching fast. But thousands of tiny chunks = thousands of network trips per second = inefficient. Fewer users served.

Option B: buffer and send larger chunks. Fewer network trips, more users served per second. But you wait a few seconds before playback starts.

That loading spinner isn't a bug. Netflix deliberately accepts higher startup latency to serve more users efficiently. Same tradeoff, product-level decision.

Kafka's linger.ms

This one makes the tradeoff explicit. Kafka producers have a config called linger.ms — how long the producer waits before flushing a batch.

linger.ms = 0:
  Every event fires immediately — one network call per event
  At 100K events/sec = 100K network calls/sec
  Low latency per event, terrible network efficiency

linger.ms = 5:
  Producer accumulates events for 5ms, flushes together
  At 100K events/sec ≈ 20K batch calls/sec
  Slightly higher latency per event, 5× better throughput

linger.ms is a literal dial between the two extremes. Kafka doesn't choose for you — it expects you to understand the tradeoff and set it intentionally.

The Pattern

Every example is the same thing:

Make individuals wait → serve more of them overall.

Scenario	What you traded	What you got
DB batch writes	Per-record speed	Total capacity
Netflix buffering	Startup speed	User capacity
Kafka linger.ms > 0	Message delay	Network efficiency

The inverse is also true: processing immediately = low latency, lower throughput. You're always on this spectrum.

How to Decide

Ask: what does a bad experience look like for this system?

Chat message takes 5 seconds to send → users leave. Optimize for latency.
Analytics pipeline runs overnight → nobody cares if a log arrived 2 seconds late. Optimize for throughput.
Payment confirmation → user is staring at a loading screen. Optimize for latency.
Log processing → you're crunching billions of events in bulk. Optimize for throughput.

DEV Community