Vladyslav Hutov

Latency: 6 questions you were afraid to ask

Throughout my career, I’ve been collecting knowledge about performance piece by piece. It always seemed like some kind of magic available only to the chosen few. In this post, I want to share my findings about application latency.

Here I’ve collected the 6 most common questions you might have when struggling with application optimisation.

1. What is latency?

There is a lot of confusion about terminology in software engineering, and this term is no exception. Ask ten engineers, and you will hear ten different answers.

In this article, I’m going to talk about the term that Martin Fowler defined as “response time”:

Response time is the amount of time it takes for the system to process a request from the outside. This may be … a server API call. -- Martin Fowler

That is, the time from when a server receives a request to when it sends back a response.

Also, some may refer to latency as the total time it takes between a client making a request and receiving a response:

Total latency = Network latency (request) + Response time + Network latency (response)

(Diagram: the request/response route between client and server)
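To make the distinction concrete, here is a minimal client-side sketch (the URL is hypothetical, and the requests library is used purely for illustration). What the client measures includes both network legs plus the server’s response time; the response time alone has to be measured on the server.

```python
import time
import requests

start = time.perf_counter()
resp = requests.get("https://example.com/api")  # hypothetical endpoint
total_ms = (time.perf_counter() - start) * 1000

# This figure includes network latency in both directions plus the
# server's response time; server-side response time must be measured
# on the server itself (see question 3).
print(f"client-observed latency: {total_ms:.0f}ms")
```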

2. Should I care about latency?

Yes! If you want to create high-quality software, you absolutely should care about it. Users are affected by slow applications: according to this study, 1000ms is enough for a user to lose focus. But let me be clear: “care” ≠ “optimise”.

Premature optimization is the root of all evil -- Donald Knuth

Before jumping in and hunting for optimisations, you must understand that a good application is an equilibrium of characteristics: functional and non-functional. Foremost, it must satisfy user needs.

Ensure that you measure the app correctly: you can decide to optimise only once you have a good picture of its performance. Optimising is hard, and it doesn’t always make sense to spend weeks rewriting logic to go from 500ms to 400ms.

3. How to measure latency?

Web servers usually have built-in tools for monitoring your app’s performance, but they show only high-level metrics such as per-route response time. That’s enough to judge whether you need to work on optimisations, but not enough to understand where to begin.

Tracing libraries can help you with both: they let you pinpoint exactly where the poor performance occurs. An example of such a library is statsd, which provides a convenient API for timing functions and code blocks. Check out this example of a contrived Python app:

```python
import time

import statsd
from flask import Flask

app = Flask(__name__)

# StatsClient sends metrics over UDP to a statsd daemon
# (localhost:8125 by default)
client = statsd.StatsClient()


@app.route("/")
@client.timer("HomeService.home")  # time the whole request handler
def home():
    # Time both "database calls" as a single block
    with client.timer("HomeService.database_calls"):
        message = _get_message()
        name = _get_name()
    page = _render_page(message, name)
    return page


@client.timer("HomeService._render_page")
def _render_page(message: str, name: str) -> str:
    time.sleep(0.1)  # simulate 100ms of rendering work
    return f"<p>{message} {name}</p>"


@client.timer("HomeService._get_message")
def _get_message() -> str:
    time.sleep(0.4)  # simulate a 400ms database query
    return "Hello"


@client.timer("HomeService._get_name")
def _get_name() -> str:
    time.sleep(0.5)  # simulate a 500ms database query
    return "world"
```

If your RPS is high enough, you can also sample the measurements, e.g. record only every 100th request.
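If you are using the statsd client from the example above, sampling is built in. Assuming the pystatsd client shown earlier, the rate argument tells it what fraction of calls to actually send (the sample rate is reported along with the metric, so the server can account for the sampling). A minimal sketch:

```python
# Send timing data for roughly 1% of calls to this handler
@client.timer("HomeService.home", rate=0.01)
def home():
    ...
```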

When reading metrics, you need to decide which aggregation function to use. The choice often falls on the average, but measuring latency with an average is misleading.

Imagine that users open the home page with the following latencies: 100ms, 97ms, 312ms, 750ms, 60ms, 150ms, 809ms, and 121ms. The average latency is 300ms, which doesn’t reflect reality. For a better understanding, you can use percentiles (P50, P75, P95, P99, etc.).

For this example, the percentiles are:

| Percentile | Latency, ms |
| ---------- | ----------- |
| P50        | 136         |
| P75        | 422         |
| P95        | 788         |
| P99        | 805         |

A P95 of 788ms means that 95% of requests complete faster than 788ms. So when optimising your app, you mainly target the 5% of users who wait longer than 788ms for the request to complete.
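You can reproduce these numbers yourself. A minimal sketch using numpy, whose default linear interpolation matches the table above:

```python
import numpy as np

# Latencies from the example above, in milliseconds
latencies = [100, 97, 312, 750, 60, 150, 809, 121]

print(f"avg: {np.mean(latencies):.0f}ms")  # ~300ms, hides the slow tail
for p in (50, 75, 95, 99):
    print(f"P{p}: {np.percentile(latencies, p):.0f}ms")
```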

4. What causes latency?

You won’t be surprised if I tell you that many factors come into play. The list I provide is not a complete picture, but a good start.

  • Underprovisioned resources - the CPU, memory or network doesn’t meet your workload’s requirements, so requests sit waiting for resources.
  • I/O operations - calling external services, writing to files, logging, etc. are slow and make your request wait.
  • Noisy neighbours - other applications running on the same host are consuming its resources.
  • Inefficient code - when your application has complex business logic, ensure that you use optimal algorithms to solve the problem. Latency may also come from inefficient concurrent code (e.g. frequent context switches) or from instantiating heavy objects.
  • Garbage collection - some language runtimes, like the JVM, come with a garbage collector that makes our life much easier. However, it comes at a cost: the GC can pause the entire app or thread to do a memory cleanup, which means additional wait time for the request (see the sketch after this list).
  • Network distance - different zones can cause network latency between your client and the app, e.g. a user from Europe calling a service in the US.
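You can observe GC pauses directly. Here is a rough CPython-specific sketch using gc.callbacks, which the interpreter invokes at the start and stop of every collection (the JVM and other runtimes expose pause times through their own GC logs):

```python
import gc
import time

_start_time = 0.0


def _gc_timer(phase: str, info: dict) -> None:
    """Print how long each garbage-collection pass takes."""
    global _start_time
    if phase == "start":
        _start_time = time.perf_counter()
    else:  # phase == "stop"
        pause_ms = (time.perf_counter() - _start_time) * 1000
        print(f"GC gen {info['generation']}: paused {pause_ms:.2f}ms")


# CPython calls every registered callback on each collection
gc.callbacks.append(_gc_timer)
```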

5. Does programming language matter?

It only matters in rare use cases. Modern language runtimes have plenty of optimisations, such as JIT and AOT compilation, that make them run nearly as fast as machine code, and for a typical I/O-bound service the language is rarely the bottleneck.

6. How to improve latency?

  • Analyse DB access patterns, add indices and optimise queries. But remember, adding an index is a trade-off between read and write speed.
  • Cache data. Use local and/or remote in-memory caches: they reduce the load on external components like the database and let you serve responses much faster (see the caching sketch after this list).
  • Defer work. Instead of making users wait, consider executing logic asynchronously, for example by passing messages to a durable queue and processing them later.
  • Precompute. Consider whether you can efficiently precompute and store derived data. For example, generate analytics reports on a schedule instead of ad hoc, and your users will be surprised by the response time.
  • Prefetch. Strictly speaking, this doesn’t improve latency but “hides” it. Instead of waiting for the user’s action, prefetch the most likely results. For example, if you have a search bar with suggestions, you can predict the next letters and prefetch the results, so that clients don’t need to request them.
  • Scale. Improve resource utilisation by scaling instances of your app, database or other systems.
  • Use the right tools. Analyse the effectiveness of the existing app architecture and ensure that you use tools suited to the problem, e.g. columnar vs row databases, pull vs push queues, binary vs text protocols, stream vs batch processing, etc.
  • Monitor performance regressions as part of the CI/CD pipeline, or set up alerts on latency metrics.
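To illustrate the caching point, here is a minimal local-cache sketch using functools.lru_cache. get_message is a hypothetical stand-in for a slow database call, and a real service would likely also want an expiry policy (e.g. a TTL cache):

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1024)  # keep up to 1024 results in memory
def get_message(user_id: int) -> str:
    time.sleep(0.4)  # simulate a 400ms database query
    return f"Hello user {user_id}"


get_message(42)  # first call pays the full 400ms
get_message(42)  # repeated calls return instantly from the cache
```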

Let me know in the comments, if you have more questions.
