This is a very specific case, but it reduces resource consumption in both the microservice container and the database, and it makes data insertion much faster. For those who have been through this before and have a lot of experience, it may seem obvious, but it will not be obvious to every software development team.
Let's go. I'm starting from two assumptions:
1. Unlike Elixir, which handles thousands of simultaneous HTTP requests or WebSockets well as long as there are CPU threads available, Node.js is single-threaded, and HTTP requests ARE EXPENSIVE for Node.js to manage [1];
2. Batched transactions perform better in the database than many small ones.
An example problem
Imagine you have a microservice that must insert data arriving from an SQS queue. You create a POST endpoint that receives the information and performs an insert. Easy: it's the simple CRUD you learned in college. Some time later, though, the SQS queue is growing faster than you can consume it. You can increase the number of concurrent consumers of the queue, but that raises costs, and an increase in monetary costs is not something desirable, right? Another idea is to increase the Lambda's batchSize, have it iterate over the array of messages, build an array of promises, and wait for them all to complete. This way, a single Lambda execution makes 10 HTTP requests instead of 1, consuming the queue faster. OK, this should solve the problem.
However, imagine the queue keeps growing and you need to address the problem again. This time, since the Lambdas execute HTTP requests in parallel, increasing the number of executions when consuming messages from the queue also multiplies the number of HTTP requests, and the resulting high CPU consumption causes autoscaling to spin up more microservice containers.
This is just an example I came up with to illustrate the kind of problem my test addresses. Let's think about how to solve it architecturally, remembering that HTTP requests are expensive; that's where the trick comes in.
This can be solved by changing how the microservice processes requests: instead of receiving one item per HTTP request, it receives a list of items and inserts them into the database in batches.
The expectation is that this will greatly increase the system's insertion capacity, but let's run the tests to see whether it really makes a difference in practice or whether everything performs the same.
Test parameters
I wrote a simple Express.js application that uses a simple lib to connect to a PostgreSQL database, plus a docker-compose file to start the application container and the database, with an SQL script that creates the tables, the indexes, and a small number of seed rows.
The microservice container has 1 vCPU and 1 GB of RAM, while the PostgreSQL container has 2 vCPUs and 1 GB of RAM (perhaps similar to an RDS t4g.micro, although RDS would have more processing power and more disk write capacity).
Everything runs on a notebook with an Intel i7-8665U, 8 GB of RAM, and a LITEON CA5-8D256 SSD.
The tool used to make the HTTP requests is k6.
Tests
Two tests will be performed against the same endpoint, since it is already configured to receive an array:
- stress-test-one: a maximum of 5,000 VUs, each sending an array with a single item;
- stress-test-batch: a maximum of 100 VUs, each sending an array of 1,000 items.
Each test runs for 10 seconds.
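For reference, a k6 script for the batch scenario looks roughly like this (run with `k6 run script.js`); the VU count and duration match the parameters above, while the URL and payload shape are assumptions:

```javascript
// Sketch of stress-test-batch; for stress-test-one the payload would be
// an array with a single item and vus would be 5000.
import http from "k6/http";
import { check } from "k6";

export const options = {
  vus: 100,        // stress-test-batch: 100 virtual users
  duration: "10s", // same duration for both tests
};

export default function () {
  // Build an array of 1,000 items per iteration (payload shape assumed).
  const batch = Array.from({ length: 1000 }, (_, i) => ({ name: `item-${i}` }));
  const res = http.post("http://localhost:3000/items", JSON.stringify(batch), {
    headers: { "Content-Type": "application/json" },
  });
  check(res, { "status is 201": (r) => r.status === 201 });
}
```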
Execution
stress-test-one
The test went well: no HTTP request returned an error, and a total of 20,081 items were inserted, 1,481 items per second, which is also the number of HTTP requests per second.
Duration.............: avg=2.02s min=43.57ms max=9.14s p(95)=6.56s
http_reqs............: 20081 1481.148722/s
inserted_itens.......: 20081 1481.148722/s
successful_requests..: 100.00% 20081 out of 20081
Interestingly, requests took an average of 2 seconds, with a maximum of 9.14 seconds and a 95th percentile of 6.56 seconds.
Another observation was that the CPU usage of the Node container was quite high:
stress-test-batch
The test went well: no HTTP request returned an error, and a total of 756,000 items were inserted, 57,692 items per second, with 57 HTTP requests per second.
Duration.......................: avg=566.9ms min=14.87ms max=1.02s p(95)=888.86ms
http_reqs......................: 756 57.692505/s
inserted_itens.................: 756000 57692.505231/s
successful_requests............: 100.00% 756 out of 756
Requests took an average of 566.9 ms to insert 1,000 items each, with a maximum of 1.02 seconds and a 95th percentile of 888.86 ms.
The CPU usage of the Node container was low, but the CPU usage of the database container was high, at 200% (this is simply how Docker reports it: 100% per core, so 2 fully used vCPUs show as 200%).
Database behavior
Another interesting thing is the relationship between transactions and block I/O. In the one-item-per-request test, the number of transactions was very high with low block I/O; sending data in batches, the number of transactions was much lower while block I/O was higher.
I separated the first test from the second test with a black line:
Results and conclusions
By using batch insertion, which used about 96.2% fewer HTTP requests than the single-item method (756 vs. 20,081), I got roughly 37.6 times as many items into the database (about 3,665% more), with much lower CPU usage on the Node.js microservice and a lower average response time, at the cost of more CPU usage on the database side.
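Those percentages fall directly out of the reported k6 metrics:

```javascript
// Recomputing the comparison from the two test runs' reported metrics.
const single = { requests: 20081, items: 20081 };
const batch = { requests: 756, items: 756000 };

// HTTP requests saved by batching: 1 - 756/20081, about 96.2% fewer.
const fewerRequests = (1 - batch.requests / single.requests) * 100;

// Extra items inserted: 756000/20081 is about 37.6x, i.e. ~3,665% more.
const moreItems = (batch.items / single.items - 1) * 100;

console.log(fewerRequests.toFixed(1) + "% fewer requests");
console.log(moreItems.toFixed(0) + "% more items inserted");
```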
Running the numbers
All of the code used in the tests is available in the repository test-performance-inserting-strategies-nodejs, including README files explaining how to bring up the Docker containers and run the load tests.