Latency is the delay between the moment you start a task and the moment it finishes. Every system has some latency; if it’s doing any work at all, latency can never be truly zero. In some systems, latency is just a fraction of a millisecond, and in those systems even an extra microsecond of delay can be a big problem.
Applications that must respond extremely quickly are called low-latency applications. For these, the goal is to return results as fast as possible. If latency rises, the system slows down, degrades, or even breaks completely.
But if it runs with the expected low latency, then depending on what the application does, it can:
- beat the competition
- run at maximum performance
- handle more work (higher throughput)
- increase productivity
- improve the user experience
Latency is both a quantitative and a qualitative concept.
- Quantitatively, it’s just time
- Qualitatively, it depends on the context: some systems can tolerate latency measured in seconds (like loading a webpage), but once a video starts playing, it can’t afford multi-second pauses; that would kill the entire experience.
And then there’s the extreme case: high-frequency trading (HFT). There, a few microseconds decide whether a company is profitable or irrelevant.
It’s a world where people optimize the network, software, CPU, RAM, and even the physical cables — all to shave off microseconds.
How Is Latency Actually Measured?
You can measure latency in multiple ways. The difference is mainly where you start measuring and where you stop. Here are the most common metrics.
1) Time To First Byte (TTFB)
The time between:
- T₁: when the sender transmits the first byte
- T₂: when the receiver reads that first byte
Useful for networks, APIs, or streaming. It shows how quickly the server starts responding, not how fast it completes the whole response.
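To make that concrete, here is a minimal sketch of measuring client-perceived TTFB over plain TCP on a POSIX system. One hedge: with a single clock you can only time “request sent → first response byte received,” which also includes the request’s travel time and the server’s think time; the host, port, and request line below are placeholders.

```cpp
// Minimal sketch: client-perceived TTFB for a plain-HTTP GET on a POSIX system.
// Host, port, and request line are placeholders for illustration.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstring>
#include <iostream>

int main() {
    addrinfo hints{}, *res = nullptr;
    hints.ai_socktype = SOCK_STREAM;            // TCP
    if (getaddrinfo("example.com", "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) return 1;

    const char* req =
        "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n";

    auto t_send = std::chrono::steady_clock::now();
    send(fd, req, std::strlen(req), 0);

    char byte;
    recv(fd, &byte, 1, 0);                      // block until the first byte arrives
    auto t_first = std::chrono::steady_clock::now();

    std::cout << "TTFB ≈ "
              << std::chrono::duration<double, std::milli>(t_first - t_send).count()
              << " ms\n";
    close(fd);
    freeaddrinfo(res);
}
```

Everyday tools report this same client-side quantity, e.g. curl’s `%{time_starttransfer}` write-out variable or the TTFB shown in browser devtools.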
2) Round-Trip Time (RTT)
The full “there and back” cycle:
Send → deliver → process → respond → deliver
RTT includes the time the other side spends processing the data. It’s the complete picture of real-world response time.
In trading, RTT consists of three parts:
- time for market data to reach the participant
- time for the system to make a decision
- time for the order to reach the exchange
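One nice property of RTT: because the measurement starts and stops on the same machine, a single monotonic clock is enough, with no clock synchronization between the two sides. A minimal sketch, where `process_remote()` is a hypothetical stand-in for the full send → deliver → process → respond → deliver cycle (simulated here with a sleep):

```cpp
// Minimal sketch: RTT needs only one clock, because start and stop both
// happen on the sending side. process_remote() is a hypothetical stand-in
// for the whole send -> deliver -> process -> respond -> deliver cycle.
#include <chrono>
#include <iostream>
#include <thread>

void process_remote() {
    // Simulated remote leg: network transit + remote processing.
    std::this_thread::sleep_for(std::chrono::microseconds(250));
}

int main() {
    auto t_send = std::chrono::steady_clock::now();
    process_remote();                            // there and back
    auto t_recv = std::chrono::steady_clock::now();

    std::cout << "RTT ≈ "
              << std::chrono::duration<double, std::micro>(t_recv - t_send).count()
              << " µs\n";                        // includes remote processing time
}
```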
3) Tick-to-Trade (TTT)
A special trading-specific metric.
It measures only what happens inside your infrastructure:
How long from receiving a packet until you send an order back
TTT = pure algorithm + infrastructure reaction time.
In practice → when you receive market data, how fast can you send a trade?
This is where microseconds matter.
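A minimal sketch of how tick-to-trade is typically instrumented: timestamp when the packet is handed to your code, timestamp again just before the order leaves, and record the difference. The handler name and the omitted decode/strategy steps are hypothetical, not any particular framework’s API.

```cpp
// Minimal sketch of tick-to-trade instrumentation. The handler name and
// the omitted decode/strategy steps are hypothetical, not a real API.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

void on_market_data() {                // called when a market-data packet arrives
    auto t_tick = Clock::now();        // packet handed to our code

    // ... decode the packet, run the strategy, build the order ...

    auto t_trade = Clock::now();       // just before the order hits the wire
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t_trade - t_tick);
    std::printf("tick-to-trade: %lld ns\n", static_cast<long long>(ns.count()));
}

int main() { on_market_data(); }
```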
4) CPU Clock Cycles
The lowest-level measurement — CPU cycles.
Useful for analyzing:
- how fast a specific instruction executes
- pipeline behavior, branch mispredictions, cache hits/misses
- how different compilers generate assembly
This is “high-end optimization” where you tune the last few percent of performance.
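A minimal sketch of cycle counting with the x86 time-stamp counter, assuming GCC or Clang on an x86-64 machine. Serious measurements also add serializing instructions (lfence/cpuid), core pinning, and warm-up runs; this only shows the basic idea.

```cpp
// Minimal sketch: counting CPU cycles with the x86 time-stamp counter.
// Assumes GCC/Clang on x86-64; real measurements also add serialization
// (lfence/cpuid), core pinning, and warm-up runs.
#include <x86intrin.h>
#include <cstdint>
#include <cstdio>

int main() {
    volatile std::uint64_t sink = 0;   // keep the loop from being optimized away

    std::uint64_t start = __rdtsc();
    for (int i = 0; i < 1000; ++i) sink += i;
    std::uint64_t end = __rdtsc();

    std::printf("~%llu cycles for the loop\n",
                static_cast<unsigned long long>(end - start));
}
```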
Latency vs Throughput: Don’t Mix Them Up
People often confuse latency and throughput, but they’re entirely different.
- Latency = how fast you finish one thing
- Throughput = how many things you finish per unit of time
If you want high throughput, you use parallelism — individual tasks might take longer but you complete more of them.
If you want low latency, every single operation must be fast.
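A toy calculation with made-up numbers shows how the two metrics can move in opposite directions: parallel workers finish more tasks per second even though each individual task gets slower.

```cpp
// Toy numbers (made up) showing latency and throughput moving apart:
// parallel workers finish more tasks per second even though each
// individual task takes longer.
#include <cstdio>

int main() {
    // Sequential: one task at a time, 10 ms each.
    double seq_latency_ms  = 10.0;
    double seq_throughput  = 1000.0 / seq_latency_ms;       // 100 tasks/s

    // Parallel: 8 workers, contention pushes each task to 16 ms (assumed).
    double par_latency_ms  = 16.0;
    double par_throughput  = 8.0 * 1000.0 / par_latency_ms; // 500 tasks/s

    std::printf("sequential: %.0f ms/task, %.0f tasks/s\n",
                seq_latency_ms, seq_throughput);
    std::printf("parallel:   %.0f ms/task, %.0f tasks/s\n",
                par_latency_ms, par_throughput);
}
```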
Key Latency Metrics
Mean latency
Average of all measurements.
Median latency
A better picture of typical behavior, because extreme outliers don’t skew it.
Peak latency
Worst-case scenario. Crucial in real-time systems.
Latency variance (jitter)
How much latency fluctuates.
Extremely important in HFT, robotics, and automotive — you need predictability.
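A minimal sketch computing all four metrics from a handful of latency samples. The numbers are invented, with one deliberate outlier, and standard deviation stands in as a simple jitter proxy:

```cpp
// Minimal sketch: the four metrics from a handful of latency samples (µs).
// The numbers are invented, with one deliberate outlier; standard deviation
// stands in as a simple jitter proxy.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> s{12.1, 11.8, 12.3, 11.9, 95.0, 12.0, 12.2};

    double mean = 0.0;
    for (double v : s) mean += v;
    mean /= s.size();

    std::sort(s.begin(), s.end());
    double median = s[s.size() / 2];            // fine for an odd sample count
    double peak   = s.back();                   // worst case

    double var = 0.0;
    for (double v : s) var += (v - mean) * (v - mean);
    double jitter = std::sqrt(var / s.size());  // population std deviation

    std::printf("mean=%.1f median=%.1f peak=%.1f jitter=%.1f (µs)\n",
                mean, median, peak, jitter);
}
```

Note how the single 95 µs outlier drags the mean to roughly 24 µs while the median stays at 12.1 µs, which is exactly why the median is often the better “typical” number.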
What Do Latency-Sensitive Applications Need?
Robustness and correctness
Low-latency systems process huge amounts of data. They must be stable.
Low mean and median latency
Both need to stay minimal.
Capped peak latency
You must guarantee the upper bound — the system must never “fall apart” and spike to seconds.
Low variance
Even if average latency is low, occasional millisecond-level spikes are bad.
High throughput
Sometimes handling a massive load is more important than the absolute fastest single operation.
Example — Server vs Client
T₁ — server sends the first byte
Server starts transmitting the packet.
T₂ — client receives the first byte
Client gets the first data from the server.
→ T₂ − T₁ = TTFB (server → client)
T₃ — client sends a response
Client processes the data and sends a packet back.
T₄ — server receives the first byte of the response
→ T₄ − T₃ = TTFB (client → server)
Network-only RTT
If you only want the transfer legs, add the two TTFBs:
RTT (network only) = (T₂ − T₁) + (T₄ − T₃)
In many cases, you add the client processing time (T₂ → T₃), so:
Real-world RTT = T₄ − T₁
The full cycle from sending to receiving.
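To check the arithmetic, here is a minimal simulation of that timeline on a single clock. All the delays are made up; in a real two-machine setup T₁/T₄ and T₂/T₃ live on different clocks, which is exactly why one-way TTFB is hard to measure and round-trip numbers are easy.

```cpp
// Minimal simulation of the T1..T4 timeline on a single clock, so the
// formulas are easy to check. All delays are made up.
#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

static double ms(Clock::time_point a, Clock::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

int main() {
    auto t1 = Clock::now();                                      // server sends first byte
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // network: server -> client
    auto t2 = Clock::now();                                      // client reads first byte
    std::this_thread::sleep_for(std::chrono::milliseconds(5));   // client processing
    auto t3 = Clock::now();                                      // client sends response
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // network: client -> server
    auto t4 = Clock::now();                                      // server reads first byte

    std::printf("TTFB server->client: %.1f ms\n", ms(t1, t2));
    std::printf("TTFB client->server: %.1f ms\n", ms(t3, t4));
    std::printf("network-only RTT:    %.1f ms\n", ms(t1, t2) + ms(t3, t4));
    std::printf("real-world RTT:      %.1f ms\n", ms(t1, t4));
}
```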
Why Should You Care?
Latency is not just a technical parameter.
It’s a competitive advantage, a quality-of-experience factor, and in some fields it decides far more: profitability in trading, safety in automotive and robotics.
And here’s the fascinating part:
Every field sees latency completely differently.
How React Thinks About Latency
In the React world, latency is not about microseconds.
React doesn’t care if a function takes 80 ns or 120 ns.
React focuses on:
- keeping the UI “responsive enough” for humans
- avoiding flicker, tearing, and unnecessary re-renders
- preventing users from feeling slowdowns
- scheduling work so the UI stays interactive
React concepts around latency:
- Concurrent rendering — pause work to keep the UI smooth
- Transitions — mark lower-priority updates
- React Compiler — eliminate unnecessary re-renders
Here, latency is measured in tens or hundreds of milliseconds: a 60 Hz display gives you a ~16 ms frame budget, and humans barely notice delays below roughly 100 ms.
- 400 ms? Fine.
- 1 second? Annoying.
- 2 seconds? Now it’s a problem.
In this world, latency is a psychological experience more than a technical one.
And Then There’s the Opposite Extreme: Trading, Robotics, Automotive
These fields don’t care about “user experience.”
A single microsecond can mean:
- your algorithm loses to competitors
- a robot doesn’t react in time
- a car fails to compute a braking trajectory
React couldn’t care less if something takes 0.1 ms instead of 1 ms.
Trading systems absolutely care.
They’d lose money — and relevance — over a 0.001 ms difference.
And There’s a “Third World” — Typical Backends and APIs
Here, latency is measured in milliseconds (e.g., 20–200 ms), and:
- the focus is throughput, scaling, caching, and parallelization
- as long as requests finish within a few hundred ms, everything is fine
This is the kind of system where “latency” means something completely different than in HFT or real-time control.
The Point
Understanding latency is all about understanding context:
- In React, you optimize for UI smoothness and user perception
- In backend systems, you optimize for scaling and throughput
- In robotics and trading, you optimize for microsecond reaction times
And yet all these fields use the exact same word: latency.
Just in completely different universes.
