<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vitor Araujo</title>
    <description>The latest articles on DEV Community by Vitor Araujo (@vitor_araujo1).</description>
    <link>https://dev.to/vitor_araujo1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2309274%2F96b94c57-d33b-4337-8abf-b87540080d4e.jpg</url>
      <title>DEV Community: Vitor Araujo</title>
      <link>https://dev.to/vitor_araujo1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vitor_araujo1"/>
    <language>en</language>
    <item>
      <title>Improving the performance of a node http server when inserting large amounts of data</title>
      <dc:creator>Vitor Araujo</dc:creator>
      <pubDate>Sun, 01 Dec 2024 19:17:03 +0000</pubDate>
      <link>https://dev.to/vitor_araujo1/improving-the-performance-of-a-node-http-server-when-inserting-large-amounts-of-data-5fob</link>
      <guid>https://dev.to/vitor_araujo1/improving-the-performance-of-a-node-http-server-when-inserting-large-amounts-of-data-5fob</guid>
      <description>&lt;p&gt;This is a very specific case, and it helps reduce resource consumption of the microservice container and the database, and allows data insertion much faster. Of course, for those who have already been through this and have a lot of experience, it seems obvious, but of course it will not be obvious to all software development teams.&lt;/p&gt;

&lt;p&gt;Let's go. I will start from two assumptions: 1. Unlike Elixir, which handles thousands of simultaneous HTTP requests or websockets very well as long as there are CPU threads available, Node.js is single-threaded, and HTTP requests ARE EXPENSIVE for Node.js to manage &lt;a href="https://www.yld.io/blog/exploring-how-nodejs-handles-http-connections" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;; 2. Batched transactions perform better in the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example problem
&lt;/h2&gt;

&lt;p&gt;Imagine that you have a microservice that needs to store data arriving from an SQS queue. You create a POST endpoint that receives one item and performs an insert: easy, the simple CRUD you learned in college. Some time later, however, the queue is growing faster than you consume it. You could increase the number of concurrent executions of the consumer, but that raises monetary costs, which is not desirable, right? Another idea is to increase the Lambda's batchSize: each execution walks through the array of messages, creates an array of promises (one HTTP request per message) and waits for them all to complete. Now one Lambda execution makes 10 HTTP requests instead of 1, consuming the queue faster - OK, this should solve the problem.&lt;/p&gt;
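&lt;p&gt;A minimal sketch of that Lambda-side batching (the names here are hypothetical, not from the original code): map the SQS records to one request each and wait for all of them with &lt;code&gt;Promise.all&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch of the idea above: instead of one HTTP request per invocation,
// fire one request per message in the batch, in parallel.
// "sendItem" stands in for whatever HTTP client posts a single item.
async function processBatch(records, sendItem) {
  const items = records.map((record) => JSON.parse(record.body));
  const results = await Promise.all(items.map((item) => sendItem(item)));
  return results.length; // number of requests fired in parallel
}
```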

&lt;p&gt;However, imagine that the queue keeps growing and you need to address the problem again. Since the Lambdas now execute HTTP requests in parallel, increasing the number of concurrent executions multiplies the number of HTTP requests, which drives up CPU consumption in the microservice and, through autoscaling, the number of containers running.&lt;/p&gt;

&lt;p&gt;This is just an example I came up with to illustrate the kind of problem situation my test fits. Let's think about how to solve it in a more architectural way, remembering that HTTP requests are expensive - that's where the trick comes in.&lt;/p&gt;

&lt;p&gt;This can be solved by changing the way the microservice processes requests. Instead of receiving one item per HTTP request, it can receive a list of items, inserting them into the database in batches.&lt;/p&gt;
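&lt;p&gt;On the database side, the batch endpoint can turn the array of items into a single parameterized multi-row INSERT. The sketch below assumes the &lt;code&gt;$1, $2, ...&lt;/code&gt; placeholder style used by node-postgres; the table and column names are illustrative, not from the original code.&lt;/p&gt;

```javascript
// Build one multi-row INSERT for an array of items, e.g.
// INSERT INTO items (name, value) VALUES ($1, $2), ($3, $4), ...
// Returns the query text and the flat values array the driver expects.
function buildBatchInsert(items) {
  const columns = ["name", "value"];
  const values = [];
  const rows = items.map((item, i) => {
    columns.forEach((col) => values.push(item[col]));
    const base = i * columns.length;
    const placeholders = columns.map((_, j) => "$" + (base + j + 1));
    return "(" + placeholders.join(", ") + ")";
  });
  const text = "INSERT INTO items (name, value) VALUES " + rows.join(", ");
  return { text, values };
}
```

&lt;p&gt;One round trip and one transaction then cover the whole batch, instead of one per item.&lt;/p&gt;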

&lt;p&gt;The expectation is that this greatly increases the system's insertion capacity, but let's run the tests to see whether it really makes a difference in practice or whether everything stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test parameters
&lt;/h2&gt;

&lt;p&gt;I wrote a simple Express.js application that uses a small library to connect to a PostgreSQL database, and a docker-compose file that starts the application container and the database, with an SQL script that creates the tables, the indexes and a small number of seed rows.&lt;/p&gt;

&lt;p&gt;The microservice container has 1 vCPU and 1 GB of RAM, while the PostgreSQL container has 2 vCPUs and 1 GB of RAM (perhaps similar to an RDS t4g.micro, although RDS will of course have more processing power and more disk write capacity).&lt;/p&gt;

&lt;p&gt;Everything runs on a notebook with an Intel Core i7-8665U, 8 GB of RAM and a LITEON CA5-8D256 SSD.&lt;/p&gt;

&lt;p&gt;The software used to make HTTP requests is &lt;a href="https://k6.io/" rel="noopener noreferrer"&gt;K6&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tests
&lt;/h2&gt;

&lt;p&gt;Two tests will be performed against the same endpoint, since it is already configured to receive an array.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;stress-test-one: There will be a maximum of 5000 VUs each passing an array of a single item;&lt;/li&gt;
&lt;li&gt;stress-test-batch: There will be a maximum of 100 VUs each passing an array of a thousand items.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The duration is 10 seconds.&lt;/p&gt;
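&lt;p&gt;The two scenarios differ only in the body each virtual user sends; a small helper (hypothetical, matching an endpoint that accepts an array of items) makes the difference explicit:&lt;/p&gt;

```javascript
// Build the JSON body for one request: an array of batchSize items.
// stress-test-one sends makeBatchBody(1) from up to 5000 VUs;
// stress-test-batch sends makeBatchBody(1000) from up to 100 VUs.
function makeBatchBody(batchSize) {
  const items = [];
  for (let i = 0; i !== batchSize; i++) {
    items.push({ name: "item-" + i, value: i });
  }
  return JSON.stringify(items);
}
```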

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;h4&gt;
  
  
  stress-test-one
&lt;/h4&gt;

&lt;p&gt;The test went well: no HTTP request returned an error, and a total of 20,081 items were inserted, 1,481 items per second - the same as the number of HTTP requests, since each request carries a single item.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Duration.............: avg=2.02s    min=43.57ms  max=9.14s  p(95)=6.56s   
http_reqs............: 20081   1481.148722/s
inserted_itens.......: 20081   1481.148722/s  
successful_requests..: 100.00% 20081 out of 20081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One interesting thing, however, is that the requests took an average of 2 seconds, with a maximum of 9 seconds and a 95th percentile of 6.56 seconds.&lt;/p&gt;

&lt;p&gt;Another point was that the CPU usage of the Node.js container was quite high:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4einlp39k7mf8v6yi8rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4einlp39k7mf8v6yi8rb.png" alt="Use o CPU of containers" width="800" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  stress-test-batch
&lt;/h4&gt;

&lt;p&gt;The test went well: no HTTP request returned an error, and a total of 756,000 items were inserted, 57,692 items per second, with only 57 HTTP requests per second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Duration.......................: avg=566.9ms  min=14.87ms max=1.02s p(95)=888.86ms
http_reqs......................: 756     57.692505/s
inserted_itens.................: 756000  57692.505231/s
successful_requests............: 100.00% 756 out of 756
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It took an average of 566.9 ms to insert 1,000 items, with a maximum of 1.02 seconds and a 95th percentile of 888.86 ms.&lt;/p&gt;

&lt;p&gt;The CPU usage of the Node.js container was low, but the CPU usage of the database container was high, at 200% (this is just how Docker reports it: 100% for each of the 2 vCPUs, adding up to 200%).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs96oo0p3jzrks1d99i8f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs96oo0p3jzrks1d99i8f.png" alt="Use of CPU in containers" width="800" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Database behavior
&lt;/h3&gt;

&lt;p&gt;Another interesting point is the relationship between transactions and block I/O. In the test that sent one item per request, the number of transactions was very high with low block I/O; sending data in batches, the number of transactions was much lower while the block I/O was higher.&lt;/p&gt;

&lt;p&gt;I separated the first test from the second test with a black line:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F762brpuf2jpg8bzanajo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F762brpuf2jpg8bzanajo.png" alt="Database transactions and block/io" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and conclusions
&lt;/h2&gt;

&lt;p&gt;By using batch insertion, which issued 96.2% fewer HTTP requests than the one-item-per-request method (756 versus 20,081), I got 3,664% more items inserted into the database, with much lower CPU usage on the Node.js microservice and a lower average response time, at the cost of more CPU usage on the database side.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk0uryqg6vlrwf0kf1k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk0uryqg6vlrwf0kf1k4.png" alt="Combination chart showing the average response time of HTTP requests and the number of items inserted at the end of the test execution." width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Running the numbers
&lt;/h2&gt;

&lt;p&gt;All the code used in the tests is available in the repository &lt;a href="https://github.com/vitorgamer58/test-performance-inserting-strategies-nodejs" rel="noopener noreferrer"&gt;test-performance-inserting-strategies-nodejs&lt;/a&gt;, including README files that show how to bring up the Docker containers and run the load tests.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How I sped up a JavaScript function by 97%</title>
      <dc:creator>Vitor Araujo</dc:creator>
      <pubDate>Wed, 30 Oct 2024 18:05:17 +0000</pubDate>
      <link>https://dev.to/vitor_araujo1/how-did-i-speed-up-a-javascript-function-by-97-4bk2</link>
      <guid>https://dev.to/vitor_araujo1/how-did-i-speed-up-a-javascript-function-by-97-4bk2</guid>
      <description>&lt;p&gt;In a recent project, I came across a function whose efficiency left something to be desired. This function performed two map loops, three filters (each accompanied by a &lt;code&gt;Array.includes&lt;/code&gt;) and an additional map with a  find built-in, totaling 12 iterations. Although some of these methods, such as filters, did not need to traverse the entire array, the operation was still quite costly, especially for large quantities of items.&lt;/p&gt;

&lt;p&gt;The complexity of this function was O(n * m), which could quickly become a problem as the project scaled.&lt;/p&gt;

&lt;p&gt;So I decided to optimize this function. The first step was to replace the two key arrays with a Set. In JavaScript, a Set is a structure that stores unique values and offers much faster membership checking than an array: while checking an array with &lt;code&gt;Array.includes&lt;/code&gt; has O(n) complexity, &lt;code&gt;Set.has&lt;/code&gt; is O(1). Furthermore, the performance of &lt;code&gt;Set.has&lt;/code&gt; does not degrade as the Set grows, unlike &lt;code&gt;Array.includes&lt;/code&gt;.&lt;/p&gt;
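&lt;p&gt;A minimal before/after of this change, with illustrative data:&lt;/p&gt;

```javascript
// Membership check with Set.has instead of Array.includes.
const allowedIds = [2, 4, 6, 8];
const items = [1, 2, 3, 4, 5, 6];

// O(n * m): every item scans the whole allowedIds array
const slow = items.filter((id) => allowedIds.includes(id));

// O(n + m): one pass to build the Set, then O(1) lookups per item
const allowedSet = new Set(allowedIds);
const fast = items.filter((id) => allowedSet.has(id));
// Both produce [2, 4, 6]; only the lookup cost changes.
```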

&lt;p&gt;This change alone provided a significant improvement in the filters that operated on the arrays. However, in one of the maps there was an &lt;code&gt;Array.find&lt;/code&gt; that could also be optimized, by changing &lt;code&gt;Array.find&lt;/code&gt; to &lt;code&gt;Map.get&lt;/code&gt;. In JavaScript, a Map is a keyed collection with constant-time lookup on average, while &lt;code&gt;Array.find&lt;/code&gt; performs a linear search, and &lt;a href="https://highperformanceletterbr.substack.com/p/mapget-vs-arrayfind/comments#comment-44680118" rel="noopener noreferrer"&gt;can be 2,100 to 12,000 times slower than Map&lt;/a&gt;, depending on the performance of the processor where the code runs.&lt;/p&gt;

&lt;p&gt;By replacing &lt;code&gt;Array.find&lt;/code&gt; with &lt;code&gt;Map.get&lt;/code&gt; in one of the loops, I was able to reduce the total number of iterations from 12 to 9. While a reduction of three loops may not seem significant, the complexity of the algorithm became O(n + m), and the function execution time was reduced by an impressive 96%!&lt;/p&gt;
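&lt;p&gt;The &lt;code&gt;Array.find&lt;/code&gt; to &lt;code&gt;Map.get&lt;/code&gt; swap looks like this in a join-style loop (illustrative data, not the original project's):&lt;/p&gt;

```javascript
const users = [
  { id: 1, name: "Ana" },
  { id: 2, name: "Bia" },
];
const orders = [
  { userId: 2, total: 10 },
  { userId: 1, total: 25 },
];

// Before: O(n * m), users.find runs a linear search once per order
const joinedSlow = orders.map((order) => ({
  total: order.total,
  name: users.find((u) => u.id === order.userId).name,
}));

// After: O(n + m), one pass to build the Map, then O(1) gets
const usersById = new Map(users.map((u) => [u.id, u]));
const joinedFast = orders.map((order) => ({
  total: order.total,
  name: usersById.get(order.userId).name,
}));
```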

&lt;p&gt;In tests carried out on an Intel Core i7-10510U with arrays of 5,000 items, executing the function with arrays took 28 times longer than executing it with Map and Set: 191.19 ms versus 6.80 ms.&lt;/p&gt;

&lt;p&gt;It is worth mentioning that, since the original algorithm had O(n * m) complexity, its execution time grows quadratically with the number of items. In software development it is crucial to consider business growth and the limitations of the machines the code will run on. For example, if the arrays grew to 50,000 items, the execution time of the original algorithm would reach an estimated 13,585 ms, while the optimized algorithm with Set and Map would take only 135 ms. In that case, the original algorithm would be 100 times slower, a 99% reduction in execution time with the optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Given how much faster Set and Map are than Array for lookups, the cost of the extra iteration needed to build a Set or a Map is justified whenever that information must be checked inside iterations such as &lt;code&gt;Array.filter&lt;/code&gt; or &lt;code&gt;Array.find&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, using a Set may not always be viable due to some limitations, such as the impossibility of accessing elements directly by index and the restriction of not storing duplicate elements.&lt;/p&gt;

&lt;p&gt;Despite these limitations, in many situations, replacing arrays with Set or Map can bring significant advantages in terms of performance and efficiency.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
