Why batching matters: a real-world example of the interaction between latency and throughput
Have you ever had to collect a lot of items or objects in your real life and move them from A to B? I'm pretty sure you have.
Be it putting dishes into the dishwasher, gathering clothes for the laundry, moving house, or even rearranging some firewood, as I did.
On every such task, every day, we naturally improve our performance, for example by taking more objects at a time (batching) or taking a shorter route (reducing latency), and thus reduce the total time the work takes. In this blog post, I want to bring you closer to the theory behind that.
In my work as a software engineer at Camunda, where I do a lot of benchmarking, we talk about performance frequently. Terms like latency and throughput are used all day. Sometimes it is not clear or tangible to everyone what they mean, how they interact with each other, and especially how important the right batch size is.
As I'm interested in such topics personally and professionally, I would like to share a real-world situation I was in and use it as an example to explain latency, throughput, and batching. My ambition is that, after reading this post, you have a better understanding of the interaction between latency and throughput and of why batching matters.
A real-world example
A while ago, I had a real-life challenge (or ambition?). I had the glorious idea to rearrange the firewood in my garden.
The wood was next to my shed (place A) and I wanted to move it next to the entrance of the property (place B).
When I started, I picked two or three logs at a time and walked from place A to place B. I quickly realized that this was quite inefficient and would take ages (at least it felt that way to me).
Then I remembered that I had a wheelbarrow, so I started to fill it and walk back and forth with it until I was done. It felt (and was) way more performant. While doing this exercise, I thought: this is actually a great example of batching (and of finding the right limits). The idea for this blog post was born.
Example in depth
Let’s unfold the scenario and explain the example in more detail.
We have place A, where the old woodpile is located, and we want to move all the logs to place B (the new place). The walk from A to B takes us around 20 seconds.
For simplicity, we treat this as constant. Imagine we are a robot that always walks at the same, fast speed :). In reality, this is not true (especially when you work with software and networks).
Latency
There are some good definitions of latency out there; I don't want to replace them, just bring you closer to the topic.
In our example, latency is the time it takes to pick up one log at place A, walk to place B, and put it down there. This will take us 20 seconds.
Latency: A -> B = 20 s
This means the latency for moving one log is 20 seconds.
It is important to note that for latency, lower values are better and higher values are worse. We always want to decrease the latency.
Throughput
Throughput is how many logs we can move during a certain unit of time, in our example one (or more) per 20 seconds. Throughput is normally measured per second, which in our case means:
Throughput = Amount of objects / latency
For our example, this means: 1/20 log/s = 0.05 log/s. In other words, we can move 0.05 logs per second from A to B.
Unlike latency, for throughput higher values are better; we want to increase it. If the latency goes down, the throughput goes up in return (as you can see in the formula above).
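If you prefer code to formulas, here is a minimal Python sketch of this relationship (the function and variable names are mine, purely for illustration):

```python
def throughput(objects: int, latency_s: float) -> float:
    # Objects moved per second: the amount divided by the time it takes.
    return objects / latency_s

print(throughput(1, 20.0))  # 0.05 log/s: one log per 20-second walk
```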
Batching
Based on the formula above, we can see that by increasing the number of logs we move at a time, we can increase the throughput. In other words: when we start batching, the throughput goes up.
This is what I naturally did in the described scenario: taking more than one log and collecting them in my arms (my arms being our batch).
If we take three logs at a time this would mean 3/20 log/s = 0.15 log/s. With that, we tripled the throughput! But this is only true if the batching itself is free.
I claim that you have been in this situation already, at least once, when collecting dishes, clothes, or whatever: it takes some time to collect the items and to carry and hold more of them (adding more to the batch). We call this the delay before you start your actual task of carrying/moving them over.
This delay is added to the latency of every item/log we collect. For simplicity, let us say every log added to our batch takes 1 second: they are heavy, you have to pick them up, put them onto your pile, etc.
This means latency is now a function:
latency(batch size) = 20 s + batch size * 1 s
If we come back to our example of three logs instead of one, our latency is now 23 seconds. It takes 23 seconds for a log to move from A to B, because it first needs to be put into the batch, the batch needs to be filled to its limit (three logs), and only then is it moved. This is the maximum latency in our case; logs added to the batch later might have a lower latency, but the maximum is 23 s.
As our latency and batch size have changed, our throughput changed as well and is now: 3/23 log/s ≈ 0.130 log/s.
We can see a significant throughput increase of 160% (0.130 log/s vs. 0.05 log/s before, 2.6× the baseline), while the latency increased by only 15% (23 s vs. 20 s before).
As I mentioned above I used a wheelbarrow, so we could increase the batch size even further, maybe to 10–15 logs.
Batch size 10 logs
- Latency: 20 s + 10 logs * 1 s = 30 s -> 50% increase
- Throughput: 10/30 ≈ 0.333 log/s -> 567% increase
Batch size 15 logs
- Latency: 20 s + 15 logs * 1 s = 35 s -> 75% increase
- Throughput: 15/35 = 3/7 ≈ 0.429 log/s -> 757% increase
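To double-check these numbers, here is a small Python sketch of the linear latency model above (the constants and function names are my own, not from any library):

```python
BASE_WALK_S = 20.0      # time to walk from A to B
PER_LOG_DELAY_S = 1.0   # time to add one log to the batch

def latency(batch_size: int) -> float:
    # Linear latency model: base walk plus per-log collection delay.
    return BASE_WALK_S + batch_size * PER_LOG_DELAY_S

def throughput(batch_size: int) -> float:
    # Logs moved per second when carrying a full batch.
    return batch_size / latency(batch_size)

for size in (3, 10, 15):
    print(f"batch={size:2d}  latency={latency(size):.0f} s  "
          f"throughput={throughput(size):.3f} log/s")
# batch= 3  latency=23 s  throughput=0.130 log/s
# batch=10  latency=30 s  throughput=0.333 log/s
# batch=15  latency=35 s  throughput=0.429 log/s
```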
Total Execution Time
Depending on your scenario/use case you might have to look at different metrics and tune them accordingly.
Sometimes the latency of a single object is more important than the throughput of many; sometimes it is the total execution time that matters.
In the software world, you often have a seemingly endless stream of data to process and work on. In the real world, this is different: the data or objects are limited. Like my woodpile (luckily).
To calculate the total execution time we need to move everything from A to B we can use the following formula:
total execution time = (total amount / batch size) * latency(batch size)
Taking one log
Let’s say we have 200 logs in the woodpile, which we want to move. If we took one log at a time, it would take us:
200 logs * 20 s = 4000 s ≈ 66.7 min
After a bit more than an hour, we would be done moving the woodpile.
Batch size three
If we increase the batch size to three instead, this means:
(200 logs / 3 logs per trip) * 23 s ≈ 1533.3 s ≈ 25.6 min
We would be done after around 25 minutes when taking three logs at a time.
Batch size ten
With our wheelbarrow and taking ten logs:
(200 logs / 10 logs per trip) * 30 s = 600 s = 10 min
We would be done after 10 minutes.
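As a sanity check, here is a short Python sketch that reproduces these total execution times (the single-log baseline above uses the plain 20 s walk, so I only verify the batched cases here):

```python
TOTAL_LOGS = 200

def latency(batch_size: int) -> float:
    # Linear latency model from the previous section.
    return 20.0 + batch_size * 1.0

def total_time_min(batch_size: int) -> float:
    # Number of trips times the latency per trip, converted to minutes.
    # This uses fractional trips, as in the text; wrap the division in
    # math.ceil(...) if a partially filled last batch counts as a full trip.
    trips = TOTAL_LOGS / batch_size
    return trips * latency(batch_size) / 60

for size in (3, 10):
    print(f"batch={size:2d}  total={total_time_min(size):.1f} min")
# batch= 3  total=25.6 min
# batch=10  total=10.0 min
```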
This is why batching matters
We learn this naturally as kids. If we take more at once and walk less, we are faster.
It is important to note that the total execution time behaves differently when the latency, as a function of the batch size, grows non-linearly. It is not uncommon for the latency to grow faster than the batch size, which leads to the so-called latency/throughput tradeoff.
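To make this tangible, here is a hypothetical Python sketch where the per-log cost grows with the square of the batch size (say, because a fuller wheelbarrow is harder to load and push); the numbers are invented purely for illustration:

```python
def latency_superlinear(batch_size: int) -> float:
    # Hypothetical model: a fixed 20 s walk plus a per-log cost that
    # grows quadratically with the batch size.
    return 20.0 + 0.1 * batch_size ** 2

for b in (5, 14, 30, 50):
    print(f"batch={b:2d}  throughput={b / latency_superlinear(b):.3f} log/s")
# batch= 5  throughput=0.222 log/s
# batch=14  throughput=0.354 log/s  <- the sweet spot
# batch=30  throughput=0.273 log/s
# batch=50  throughput=0.185 log/s
```

With such a latency function, throughput no longer grows with the batch size forever: it peaks (here around a batch size of 14) and then declines, which is exactly the tradeoff described above.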
Conclusion
We have seen that latency and batching influence the throughput and the total execution time; they all interact with each other.
*(Figure: interaction of latency and throughput)*
It is always a tradeoff, and it always depends on the situation. It is an art to find the right balance between batch size, throughput, and latency.
In software systems acting on requests, we need to keep the balance between being responsive (having an acceptable latency) and having a good throughput (reacting to and processing multiple requests at the same time). This means it doesn't make sense to batch all requests forever and send them at once. Luckily, we can parallelize more in software systems, which is one way to compensate for high latency.
This is not easily doable in a real-world scenario, unless you have a big family or a lot of friends to ask for help. We are limited by physical laws (or, specific to our example, there is no more room in our wheelbarrow).
As we have seen, batching can and will introduce delays and impact the latency of an individual object, such as the time to add a log to the batch. In our example this is not an issue, because the important metric is the total execution time and the latency grows linearly with the batch size.
There are situations where latency grows non-linearly with the batch size, which causes more trouble. In general, if we reduce the latency, or make sure that its growth with the batch size stays close to linear, we can improve the throughput.
I hope this gave you some insights into how latency and throughput interact with each other and why batching matters.
Thanks for reading so far. Let me know what you think and share your stories. Thank you. :)
Thanks to Lena Schoenburg and Carlo Sana for reviewing this post.