Latency is the delay between the moment you start a task and the moment it finishes. Every system has some latency; if it’s doing any work at all, latency can never be truly zero. In some systems, latency is just a fraction of a millisecond, and in those systems even an extra microsecond of delay can be a big problem.
Applications that must respond extremely quickly are called low-latency applications. For these, the goal is to return results as fast as possible. If latency rises, the system slows down, degrades, or even breaks completely.
But if it runs with the expected low latency, then depending on what the application does, it can:
- beat the competition
- run at maximum performance
- handle more work (higher throughput)
- increase productivity
- improve the user experience
Latency is both a quantitative and a qualitative concept.
- Quantitatively, it’s just time
- Qualitatively, it depends on the context: some systems can tolerate latency measured in seconds (like loading a webpage), but once a video starts playing, it can’t afford multi-second pauses; that would kill the entire experience.
And then there’s the extreme case: high-frequency trading (HFT). There, a few microseconds decide whether a company is profitable or irrelevant.
It’s a world where people optimize the network, software, CPU, RAM, and even the physical cables — all to shave off microseconds.
How Is Latency Actually Measured?
You can measure latency in multiple ways. The difference is mainly where you start measuring and where you stop. Here are the most common metrics.
1) Time To First Byte (TTFB)
The time between:
- T₁: when the sender transmits the first byte
- T₂: when the receiver reads that first byte
Useful for networks, APIs, or streaming. It shows how quickly the server starts responding, not how fast it completes the whole response.
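To make that concrete, here is a minimal sketch of measuring client-perceived TTFB over plain TCP on a POSIX system. One hedge: with a single clock you can only time “request sent → first response byte received,” which also includes the request’s travel time and the server’s think time; the host, port, and request line below are placeholders.

```cpp
// Minimal sketch: client-perceived TTFB for a plain-HTTP GET on a POSIX system.
// Host, port, and request line are placeholders for illustration.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstring>
#include <iostream>

int main() {
    addrinfo hints{}, *res = nullptr;
    hints.ai_socktype = SOCK_STREAM;            // TCP
    if (getaddrinfo("example.com", "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) return 1;

    const char* req =
        "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n";

    auto t_send = std::chrono::steady_clock::now();
    send(fd, req, std::strlen(req), 0);

    char byte;
    recv(fd, &byte, 1, 0);                      // block until the first byte arrives
    auto t_first = std::chrono::steady_clock::now();

    std::cout << "TTFB ≈ "
              << std::chrono::duration<double, std::milli>(t_first - t_send).count()
              << " ms\n";
    close(fd);
    freeaddrinfo(res);
}
```

Everyday tools report this same client-side quantity, e.g. curl’s `%{time_starttransfer}` write-out variable or the TTFB shown in browser devtools.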
2) Round-Trip Time (RTT)
The full “there and back” cycle:
Send → deliver → process → respond → deliver
RTT includes the time the other side spends processing the data. It’s the complete picture of real-world response time.
In trading, RTT consists of three parts:
- time for market data to reach the participant
- time for the system to make a decision
- time for the order to reach the exchange
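One nice property of RTT: because the measurement starts and stops on the same machine, a single monotonic clock is enough, with no clock synchronization between the two sides. A minimal sketch, where `process_remote()` is a hypothetical stand-in for the full send → deliver → process → respond → deliver cycle (simulated here with a sleep):

```cpp
// Minimal sketch: RTT needs only one clock, because start and stop both
// happen on the sending side. process_remote() is a hypothetical stand-in
// for the whole send -> deliver -> process -> respond -> deliver cycle.
#include <chrono>
#include <iostream>
#include <thread>

void process_remote() {
    // Simulated remote leg: network transit + remote processing.
    std::this_thread::sleep_for(std::chrono::microseconds(250));
}

int main() {
    auto t_send = std::chrono::steady_clock::now();
    process_remote();                            // there and back
    auto t_recv = std::chrono::steady_clock::now();

    std::cout << "RTT ≈ "
              << std::chrono::duration<double, std::micro>(t_recv - t_send).count()
              << " µs\n";                        // includes remote processing time
}
```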
3) Tick-to-Trade (TTT)
A special trading-specific metric.
It measures only what happens inside your infrastructure:
How long from receiving a packet until you send an order back
TTT = pure algorithm + infrastructure reaction time.
In practice → when you receive market data, how fast can you send a trade?
This is where microseconds matter.
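A minimal sketch of how tick-to-trade is typically instrumented: timestamp when the packet is handed to your code, timestamp again just before the order leaves, and record the difference. The handler name and the omitted decode/strategy steps are hypothetical, not any particular framework’s API.

```cpp
// Minimal sketch of tick-to-trade instrumentation. The handler name and
// the omitted decode/strategy steps are hypothetical, not a real API.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

void on_market_data() {                // called when a market-data packet arrives
    auto t_tick = Clock::now();        // packet handed to our code

    // ... decode the packet, run the strategy, build the order ...

    auto t_trade = Clock::now();       // just before the order hits the wire
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t_trade - t_tick);
    std::printf("tick-to-trade: %lld ns\n", static_cast<long long>(ns.count()));
}

int main() { on_market_data(); }
```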
4) CPU Clock Cycles
The lowest-level measurement — CPU cycles.
Useful for analyzing:
- how fast a specific instruction executes
- pipeline behavior, branch mispredictions, cache hits/misses
- how different compilers generate assembly
This is “high-end optimization” where you tune the last few percent of performance.
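A minimal sketch of cycle counting with the x86 time-stamp counter, assuming GCC or Clang on an x86-64 machine. Serious measurements also add serializing instructions (lfence/cpuid), core pinning, and warm-up runs; this only shows the basic idea.

```cpp
// Minimal sketch: counting CPU cycles with the x86 time-stamp counter.
// Assumes GCC/Clang on x86-64; real measurements also add serialization
// (lfence/cpuid), core pinning, and warm-up runs.
#include <x86intrin.h>
#include <cstdint>
#include <cstdio>

int main() {
    volatile std::uint64_t sink = 0;   // keep the loop from being optimized away

    std::uint64_t start = __rdtsc();
    for (int i = 0; i < 1000; ++i) sink += i;
    std::uint64_t end = __rdtsc();

    std::printf("~%llu cycles for the loop\n",
                static_cast<unsigned long long>(end - start));
}
```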
Latency vs Throughput: Don’t Mix Them Up
People often confuse latency and throughput, but they’re entirely different.
- Latency = how fast you finish one thing
- Throughput = how many things you finish per unit of time
If you want high throughput, you use parallelism — individual tasks might take longer but you complete more of them.
If you want low latency, every single operation must be fast.
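A toy calculation with made-up numbers shows how the two metrics can move in opposite directions: parallel workers finish more tasks per second even though each individual task gets slower.

```cpp
// Toy numbers (made up) showing latency and throughput moving apart:
// parallel workers finish more tasks per second even though each
// individual task takes longer.
#include <cstdio>

int main() {
    // Sequential: one task at a time, 10 ms each.
    double seq_latency_ms  = 10.0;
    double seq_throughput  = 1000.0 / seq_latency_ms;       // 100 tasks/s

    // Parallel: 8 workers, contention pushes each task to 16 ms (assumed).
    double par_latency_ms  = 16.0;
    double par_throughput  = 8.0 * 1000.0 / par_latency_ms; // 500 tasks/s

    std::printf("sequential: %.0f ms/task, %.0f tasks/s\n",
                seq_latency_ms, seq_throughput);
    std::printf("parallel:   %.0f ms/task, %.0f tasks/s\n",
                par_latency_ms, par_throughput);
}
```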
Key Latency Metrics
Mean latency
Average of all measurements.
Median latency
A better picture of typical behavior, because extreme outliers don’t skew it.
Peak latency
Worst-case scenario. Crucial in real-time systems.
Latency variance (jitter)
How much latency fluctuates.
Extremely important in HFT, robotics, and automotive — you need predictability.
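A minimal sketch computing all four metrics from a handful of latency samples. The numbers are invented, with one deliberate outlier, and standard deviation stands in as a simple jitter proxy:

```cpp
// Minimal sketch: the four metrics from a handful of latency samples (µs).
// The numbers are invented, with one deliberate outlier; standard deviation
// stands in as a simple jitter proxy.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> s{12.1, 11.8, 12.3, 11.9, 95.0, 12.0, 12.2};

    double mean = 0.0;
    for (double v : s) mean += v;
    mean /= s.size();

    std::sort(s.begin(), s.end());
    double median = s[s.size() / 2];            // fine for an odd sample count
    double peak   = s.back();                   // worst case

    double var = 0.0;
    for (double v : s) var += (v - mean) * (v - mean);
    double jitter = std::sqrt(var / s.size());  // population std deviation

    std::printf("mean=%.1f median=%.1f peak=%.1f jitter=%.1f (µs)\n",
                mean, median, peak, jitter);
}
```

Note how the single 95 µs outlier drags the mean to roughly 24 µs while the median stays at 12.1 µs, which is exactly why the median is often the better “typical” number.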
What Do Latency-Sensitive Applications Need?
Robustness and correctness
Low-latency systems process huge amounts of data. They must be stable.
Low mean and median latency
Both need to stay minimal.
Capped peak latency
You must guarantee the upper bound — the system must never “fall apart” and spike to seconds.
Low variance
Even if average latency is low, occasional millisecond-level spikes are bad.
High throughput
Sometimes handling a massive load is more important than the absolute fastest single operation.
Example — Server vs Client
T₁ — server sends the first byte
Server starts transmitting the packet.
T₂ — client receives the first byte
Client gets the first data from the server.
→ T₂ − T₁ = TTFB (server → client)
T₃ — client sends a response
Client processes the data and sends a packet back.
T₄ — server receives the first byte of the response
→ T₄ − T₃ = TTFB (client → server)
Network-only RTT
If you only want the transfer legs, add the two TTFBs:
RTT (network only) = (T₂ − T₁) + (T₄ − T₃)
In many cases, you add the client processing time (T₂ → T₃), so:
Real-world RTT = T₄ − T₁
The full cycle from sending to receiving.
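To check the arithmetic, here is a minimal simulation of that timeline on a single clock. All the delays are made up; in a real two-machine setup T₁/T₄ and T₂/T₃ live on different clocks, which is exactly why one-way TTFB is hard to measure and round-trip numbers are easy.

```cpp
// Minimal simulation of the T1..T4 timeline on a single clock, so the
// formulas are easy to check. All delays are made up.
#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

static double ms(Clock::time_point a, Clock::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

int main() {
    auto t1 = Clock::now();                                      // server sends first byte
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // network: server -> client
    auto t2 = Clock::now();                                      // client reads first byte
    std::this_thread::sleep_for(std::chrono::milliseconds(5));   // client processing
    auto t3 = Clock::now();                                      // client sends response
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // network: client -> server
    auto t4 = Clock::now();                                      // server reads first byte

    std::printf("TTFB server->client: %.1f ms\n", ms(t1, t2));
    std::printf("TTFB client->server: %.1f ms\n", ms(t3, t4));
    std::printf("network-only RTT:    %.1f ms\n", ms(t1, t2) + ms(t3, t4));
    std::printf("real-world RTT:      %.1f ms\n", ms(t1, t4));
}
```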
Why Should You Care?
Latency is not just a technical parameter.
It’s a competitive advantage, a quality-of-experience factor, and in some fields it decides far more: profitability in trading, safety in automotive and robotics.
And here’s the fascinating part:
Every field sees latency completely differently.
How React Thinks About Latency
In the React world, latency is not about microseconds.
React doesn’t care if a function takes 80 ns or 120 ns.
React focuses on:
- keeping the UI “responsive enough” for humans
- avoiding flicker, tearing, and unnecessary re-renders
- preventing users from feeling slowdowns
- scheduling work so the UI stays interactive
React concepts around latency:
- Concurrent rendering — pause work to keep the UI smooth
- Transitions — mark lower-priority updates
- React Compiler — eliminate unnecessary re-renders
Here, latency is measured in tens or hundreds of milliseconds: a 60 Hz display gives you a ~16 ms frame budget, and humans barely notice delays below roughly 100 ms.
- 400 ms? Fine.
- 1 second? Annoying.
- 2 seconds? Now it’s a problem.
In this world, latency is a psychological experience more than a technical one.
And Then There’s the Opposite Extreme: Trading, Robotics, Automotive
These fields don’t care about “user experience.”
A single microsecond can mean:
- your algorithm loses to competitors
- a robot doesn’t react in time
- a car fails to compute a braking trajectory
React couldn’t care less if something takes 0.1 ms instead of 1 ms.
Trading systems absolutely care.
They’d lose money — and relevance — over a 0.001 ms difference.
And There’s a “Third World” — Typical Backends and APIs
Here, latency is measured in milliseconds (e.g., 20–200 ms), and:
- the focus is throughput, scaling, caching, and parallelization
- as long as requests finish within a few hundred ms, everything is fine
This is the kind of system where “latency” means something completely different than in HFT or real-time control.
The Point
Understanding latency is all about understanding context:
- In React, you optimize for UI smoothness and user perception
- In backend systems, you optimize for scaling and throughput
- In robotics and trading, you optimize for microsecond reaction times
And yet all these fields use the exact same word: latency.
Just in completely different universes.
