Vitor Araujo

Posted on Dec 1, 2024

Improving the performance of a node http server when inserting large amounts of data

This is a very specific case, but it helps reduce the resource consumption of both the microservice container and the database, and it makes data insertion much faster. For those who have already been through this and have a lot of experience it may seem obvious, but it will not be obvious to every software development team.

Let's go. I'm going to start from two assumptions:

1. Unlike Elixir, which seems like a great language for handling thousands of simultaneous HTTP requests or websockets as long as there are CPU threads available, NodeJs is single-threaded, and HTTP requests ARE EXPENSIVE for NodeJs to manage [1];
2. Batch transactions perform better in the database.

Example problem

Imagine that you want to insert data that arrives from an SQS queue into a microservice. You create a POST endpoint that receives one item and performs an insert. Easy, it's the simple CRUD you learned in college. Some time later, though, the queue is growing faster than you can consume it. You can increase the number of concurrent executions of the lambda that consumes the queue, but that increases costs, and higher costs are not something desirable, right? Another idea is to increase the lambda's batchSize, loop over the array of messages, create an array of promises (one HTTP request per message), and wait for them all to complete. In effect, a single lambda execution now makes 10 HTTP requests instead of 1, so the queue is consumed faster. OK, this should solve the problem.
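The fan-out described above could look roughly like the sketch below. The names `processBatch` and `sendItem` are hypothetical; `sendItem` stands in for whatever HTTP client call posts one item to the microservice, and is passed in so the example stays self-contained. The only assumption about the input is that each SQS record carries its payload as a JSON string in `body`.

```javascript
// Sketch of the "array of promises" approach: one HTTP request per message.
// `sendItem` is a hypothetical async function that POSTs a single item.
async function processBatch(records, sendItem) {
  // Start every request at once; nothing is awaited inside the loop.
  const pending = records.map((record) => sendItem(JSON.parse(record.body)));

  // Wait for all of them. Promise.all rejects as soon as one request fails,
  // so the whole batch is treated as failed in that case.
  const results = await Promise.all(pending);
  return results.length; // number of HTTP requests made
}
```

With a batchSize of 10, each lambda execution calls `sendItem` 10 times, so the microservice still pays the per-request cost for every single item.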

However, imagine that the queue keeps growing and you need to address the problem again. This time, since each lambda executes its HTTP requests in parallel, increasing the number of concurrent executions multiplies the number of HTTP requests (executions times batch size), and more microservice containers are started because of the high CPU consumption (auto scaling).

This is just an example I came up with to illustrate a situation where the test I am running applies. Let's think about how to solve it in a more architectural way, remembering that HTTP requests are expensive. That's where the trick comes in.

This can be solved by changing the way the microservice processes requests. Instead of receiving one item per HTTP request, it can receive a list of items, inserting them into the database in batches.
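One way to insert a whole list in a single statement is a multi-row `INSERT` with numbered placeholders, which is what PostgreSQL clients such as node-postgres expect (`$1`, `$2`, ...). The sketch below only builds the query text and values; `buildBatchInsert`, the `items` table, and its columns are hypothetical names for illustration, and the repository may well do this differently.

```javascript
// Builds one parameterized multi-row INSERT for PostgreSQL.
// `table` and `columns` are interpolated into the SQL text, so they must
// come from trusted code, never from user input.
function buildBatchInsert(table, columns, rows) {
  if (rows.length === 0) {
    throw new Error('rows must not be empty');
  }

  const values = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((column) => {
      values.push(row[column]);
      return `$${values.length}`; // $1, $2, ... in insertion order
    });
    return `(${placeholders.join(', ')})`;
  });

  const text = `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`;
  return { text, values };
}

// Usage with hypothetical data:
const query = buildBatchInsert('items', ['name', 'quantity'], [
  { name: 'a', quantity: 1 },
  { name: 'b', quantity: 2 },
]);
console.log(query.text);
// INSERT INTO items (name, quantity) VALUES ($1, $2), ($3, $4)
```

The resulting `text` and `values` can be passed to a client's `query(text, values)` call. PostgreSQL limits how many bind parameters a single statement can carry, so very large lists would need to be split into chunks first.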

In theory, this should greatly increase the data insertion capacity of your system, but let's run the tests to see whether it really makes a difference in practice or whether everything stays the same.

Test parameters

I wrote a simple ExpressJs application that uses a simple lib to connect to a PostgreSQL database. I wrote a docker-compose file to start the container and the database, with a SQL script that creates the tables, indexes and a small number of items in the database.
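A minimal sketch of what an array-accepting endpoint might look like, assuming Express-style `req`/`res` objects. The names `createItemsHandler` and `insertMany` and the `/items` route are hypothetical, not taken from the repository; `insertMany` is injected so the handler does not depend on a particular database lib.

```javascript
// Express-style handler that accepts an array of items in the request body
// and hands the whole list to a single insert call.
// `insertMany` is a hypothetical async function returning the inserted count.
function createItemsHandler(insertMany) {
  return async function postItems(req, res) {
    const items = req.body;

    // Reject anything that is not a non-empty array.
    if (!Array.isArray(items) || items.length === 0) {
      return res.status(400).json({ error: 'expected a non-empty array of items' });
    }

    try {
      const inserted = await insertMany(items); // one database round trip
      return res.status(201).json({ inserted });
    } catch (err) {
      return res.status(500).json({ error: 'insert failed' });
    }
  };
}

// Wiring in an Express app would look something like:
//   app.use(express.json());
//   app.post('/items', createItemsHandler(insertMany));
```

Note that `express.json()` applies a default body size limit of 100kb, which may need to be raised for requests carrying a thousand items.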

The microservice container has 1 vCPU and 1GB RAM, while the PostgreSQL container has 2 vCPU and 1GB RAM (perhaps similar to an RDS t4g.micro, although RDS will of course have more processing power and more disk write capacity).

This is running on a notebook with an Intel Core i7-8665U, 8GB of RAM, and a LITEON CA5-8D256 SSD.

The software used to make the HTTP requests is k6.

Tests

Two tests will be performed with the same endpoint, since it has already been configured to receive an array.

stress-test-one: There will be a maximum of 5000 VUs each passing an array of a single item;
stress-test-batch: There will be a maximum of 100 VUs each passing an array of a thousand items.

The duration is 10 seconds.

Execution

stress-test-one

The test went well: no HTTP request returned an error, and a total of 20,081 items were inserted at 1,481 items per second, which is the same as the number of HTTP requests per second.

Duration.............: avg=2.02s    min=43.57ms  max=9.14s  p(95)=6.56s   
http_reqs............: 20081   1481.148722/s
inserted_itens.......: 20081   1481.148722/s  
successful_requests..: 100.00% 20081 out of 20081

However, one interesting thing is that requests took 2 seconds on average, with a maximum of 9 seconds and a 95th percentile of 6.56 seconds.

Another thing was that the CPU usage of the node container was quite high.

stress-test-batch

The test went well: no HTTP request returned an error, and a total of 756,000 items were inserted at 57,692 items per second, with about 57 HTTP requests per second.

Duration.......................: avg=566.9ms  min=14.87ms max=1.02s p(95)=888.86ms
http_reqs......................: 756     57.692505/s
inserted_itens.................: 756000  57692.505231/s
successful_requests............: 100.00% 756 out of 756

Each request of 1000 items took 566.9 ms on average, with a maximum of 1.02 seconds and a 95th percentile of 888.86 ms.

The CPU usage of the node container was low, but the CPU usage of the database container was high, at 200% (Docker reports CPU usage per core, so 100% on each of the 2 vCPUs adds up to 200%).

Database behavior

Another interesting thing is the relationship between transactions and block I/O. In the test sending one item per request, the number of transactions was very high and block I/O was low. When sending data in batches, the number of transactions was much lower, while block I/O was higher.

In the database monitoring chart, a black line separates the first test from the second.

Results and conclusions

By using batch insertion, which made about 96.2% fewer HTTP requests than the single-item method (756 versus 20,081), I got about 3,665% more items inserted into the database (756,000 versus 20,081, roughly 37.6 times as many), with much less CPU usage on the NodeJs microservice and a lower average response time, at the cost of more CPU usage on the database side.
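The comparison can be recomputed from the k6 counters of the two runs. The sketch below uses the raw totals (`http_reqs` and `inserted_itens`) rather than the per-second rates, since that appears to be how the item figure was derived; `percentChange` is just a helper name chosen for illustration.

```javascript
// Totals copied from the k6 summaries of the two runs.
const single = { requests: 20081, items: 20081 };
const batch = { requests: 756, items: 756000 };

// Relative change from `before` to `after`, in percent.
function percentChange(before, after) {
  return ((after - before) / before) * 100;
}

const requestChange = percentChange(single.requests, batch.requests);
const itemChange = percentChange(single.items, batch.items);

console.log(`HTTP requests: ${requestChange.toFixed(1)}%`); // about -96.2%
console.log(`Inserted items: +${itemChange.toFixed(1)}%`); // about +3664.8%
console.log(`Items ratio: ${(batch.items / single.items).toFixed(1)}x`); // about 37.6x
```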

Running the numbers

All code used in the tests is available in the repository test-performance-inserting-strategies-nodejs, which contains README files explaining how to start the Docker containers and run the load tests.

DEV Community
