Concurrent programming has become one of the most common ways to structure applications since multicore processors appeared, and it is an effective way to speed up an application when the number of tasks is much greater than the number of CPUs, especially when coordinating those smaller tasks involves network communication. For instance, with concurrent programming a CPU can do other work after triggering an HTTP request instead of blocking until the response arrives. In this article, I want to explain the importance of putting limits on concurrent requests.
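As a rough sketch in Go (standard library only; https://example.com is just a placeholder URL), this is the kind of "trigger the request and keep working" pattern the paragraph describes:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Channel to receive the result of the request once it completes.
	result := make(chan string, 1)

	// Trigger the HTTP request in a separate goroutine; the main
	// goroutine is free to keep working while the request is in flight.
	go func() {
		resp, err := http.Get("https://example.com") // placeholder URL
		if err != nil {
			result <- "request failed: " + err.Error()
			return
		}
		defer resp.Body.Close()
		result <- "got response: " + resp.Status
	}()

	// Do other work while the request is in flight.
	fmt.Println("doing other work...")
	time.Sleep(100 * time.Millisecond)

	// Collect the response only when we actually need it.
	fmt.Println(<-result)
}
```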
The behavior of concurrent requests in a server can be analyzed with a well-known result from queueing theory, Little's Law: L = λW.
Let's adapt this idea to a typical client-server architecture, in which requests arrive at the server from a client, are processed for some time in the server, and are then returned to the client. In that setting, the terms of the formula are:
- L - the level of concurrency; the number of requests our server is currently processing.
- λ - throughput; the number of requests the server completes per second. Only completed requests count, because otherwise requests pile up in the server and do not contribute to throughput.
- W - average latency; the average time the server spends handling a single request.
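To make this concrete with made-up numbers: if the server keeps at most L = 100 requests in flight and the average latency is W = 0.2 seconds, then throughput can be at most λ = L / W = 100 / 0.2 = 500 requests per second.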
As you can imagine, a change in one of these characteristics influences the other two, and thus the entire system. So let's play around with the formula to see the effects. In terms of performance, we care about increasing throughput, that is, increasing the number of things a server can do per second. If we want to increase λ - throughput - while keeping the same concurrency limit, we might try to decrease latency. Beyond typical algorithm optimization, though, latency is often outside our control: while processing a request we might need to access a database or some other service over the network, so we usually cannot control how long that response takes.
The other way to increase throughput is to increase L - the level of concurrency. But how high can throughput go, given an average latency that is usually more or less constant per system? That latency stays constant only up to a point: pushing the concurrency of the system higher eventually overloads the actual server. If the server cannot handle that many requests at the same time and no specific mechanism is in place to handle the load, such as rejecting or queuing requests, connections start to drop because requests time out. Or worse, the machine running the server might crash or be throttled because it runs out of memory or hits its CPU limits.
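As a minimal sketch of what such a mechanism can look like on the server side (Go, standard library only; the limit of 100 and the handler body are placeholders), a buffered channel can act as a semaphore that rejects requests beyond the concurrency limit instead of letting them pile up:

```go
package main

import (
	"net/http"
)

// limitConcurrency wraps a handler and allows at most `limit` requests
// to be processed at the same time; extra requests are rejected with 503.
func limitConcurrency(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // acquired a slot
			defer func() { <-sem }() // release the slot when done
			next.ServeHTTP(w, r)
		default: // no slot available: shed load instead of piling up
			http.Error(w, "server is busy, try again later", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... real request handling goes here (placeholder) ...
		w.Write([]byte("ok"))
	})

	// Allow at most 100 concurrent requests (placeholder value).
	http.ListenAndServe(":8080", limitConcurrency(100, handler))
}
```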
In conclusion, it’s important to note that concurrency limits need to be enforced at the server level. If you don't have control over the client, the server/API documentation should state the concurrency limit at which it performs best. If you do have control over your client, the client should send requests with a sensible concurrency limit so that it does not overload the servers.
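On the client side, the same semaphore idea can cap how many requests are in flight at once. A minimal sketch, assuming a limit of 10 and placeholder URLs:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// Placeholder URLs for the requests we need to send.
	urls := []string{"https://example.com/a", "https://example.com/b"}

	const limit = 10 // at most 10 requests in flight at once (placeholder value)
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // block until a slot is free
		go func(url string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next request

			resp, err := http.Get(url)
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			resp.Body.Close()
			fmt.Println(url, "->", resp.Status)
		}(url)
	}

	wg.Wait()
}
```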