DEV Community: Max

What Exactly Is the Memory Limit of Node.js?

Max — Sat, 07 Dec 2024 14:18:29 +0000

Proficiency in the Node.js API can get you going fast, but a deep understanding of the memory footprint of Node.js programs can take you further.

Let's kick things off by taking a peek at our memory usage with process.memoryUsage(), updating every second:

setInterval(() => { console.log('Memory Usage:', process.memoryUsage()); }, 1000);

Since the output is in bytes, it's not user-friendly. Let's spruce it up by formatting the memory usage into MB:

function formatMemoryUsageInMB(memUsage) {
    return {
        rss: convertToMB(memUsage.rss),
        heapTotal: convertToMB(memUsage.heapTotal),
        heapUsed: convertToMB(memUsage.heapUsed),
        external: convertToMB(memUsage.external)
    };
}

const convertToMB = value => {
    return (value / 1024 / 1024).toFixed(2) + ' MB';
};

const logInterval = setInterval(() => {
    const memoryUsageMB = formatMemoryUsageInMB(process.memoryUsage());
    console.log(`Memory Usage (MB):`, memoryUsageMB);
}, 1000);

Now, we can get the following output every second:

Memory Usage (MB): {
  rss: '30.96 MB', // The actual OS memory used by the entire program, including code, data, shared libraries, etc.
  heapTotal: '6.13 MB', // The memory area occupied by JS objects, arrays, etc., dynamically allocated by Node.js
                      // V8 divides the heap into young and old generations for different garbage collection strategies
  heapUsed: '5.17 MB',
  external: '0.39 MB'
}

Memory Usage (MB): {
  rss: '31.36 MB',
  heapTotal: '6.13 MB',
  heapUsed: '5.23 MB',
  external: '0.41 MB'
}

We all know that the V8 engine's memory usage is limited, not only by the OS's memory management and resource allocation policies but also by its own settings.

Using os.freemem(), we can see how much free memory the OS has, but that doesn't mean it's all up for grabs by a Node.js program.

const os = require('os');
console.log('Free memory:', os.freemem());

For 64-bit systems, Node.js V8's default maximum old space size is around 1.4GB. This means that even if your OS has more memory available, V8 won't automatically use more than this limit.

Tip: This limit can be changed by setting environment variables or specifying parameters when starting Node.js. For example, if you want V8 to use a larger heap, you can use the --max-old-space-size option:

node --max-old-space-size=4096 your_script.js

This value should be set according to your actual scenario. A single machine with plenty of memory running one standalone instance calls for a different setting than many small-memory machines in a distributed deployment.

Let's run a test by stuffing an array with data indefinitely until memory overflows and see when it happens.

const array = [];
while (true) {
    for (let i = 0; i < 100000; i++) {
        array.push(i);
    }
    const memoryUsageMB = formatMemoryUsageInMB(process.memoryUsage());
    console.log(`Memory Usage (MB):`, memoryUsageMB);
}

This is what we get when we run the program directly. After adding data for a bit, the program crashes.

Memory Usage (MB): {
  rss: '2283.64 MB',
  heapTotal: '2279.48 MB',
  heapUsed: '2248.73 MB',
  external: '0.40 MB'
}
Memory Usage (MB): {
  rss: '2283.64 MB',
  heapTotal: '2279.48 MB',
  heapUsed: '2248.74 MB',
  external: '0.40 MB'
}


#
# Fatal error in , line 0
# Fatal JavaScript invalid size error 169220804
#
#
#
#FailureMessage Object: 0x7ff7b0ef8070

Confused? Isn't the limit 1.4GB? Why is it using over 2GB? The 1.4GB figure is a historical limit of the V8 engine that applied to early V8 versions and certain configurations. In modern Node.js, V8 derives its default heap limit from system resources, so a process may use considerably more than 1.4GB, especially when dealing with large data sets or running memory-intensive operations.
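To see which limit actually applies on your machine instead of relying on the 1.4GB figure, you can ask V8 directly. The following is a minimal sketch using the built-in v8 module; the helper name heapLimitInMB is made up for illustration.

```javascript
const v8 = require('v8');

// Hypothetical helper: reports the heap size limit V8 is currently running with.
function heapLimitInMB() {
    const { heap_size_limit } = v8.getHeapStatistics();
    return (heap_size_limit / 1024 / 1024).toFixed(2) + ' MB';
}

console.log('Heap size limit:', heapLimitInMB());
```

Running it with and without --max-old-space-size should print different values, which is a quick way to confirm that the flag took effect.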

When we set the memory limit to 512M, it overflows when rss hits around 996 MB.

Memory Usage (MB): {
  rss: '996.22 MB',
  heapTotal: '993.22 MB',
  heapUsed: '962.08 MB',
  external: '0.40 MB'
}
Memory Usage (MB): {
  rss: '996.23 MB',
  heapTotal: '993.22 MB',
  heapUsed: '962.09 MB',
  external: '0.40 MB'
}

<--- Last few GCs --->

[22540:0x7fd27684d000]     1680 ms: Mark-sweep 643.0 (674.4) -> 386.8 (419.4) MB, 172.2 / 0.0 ms  (average mu = 0.708, current mu = 0.668) allocation failure; scavenge might not succeed
[22540:0x7fd27684d000]     2448 ms: Mark-sweep 962.1 (993.2) -> 578.1 (610.7) MB, 240.7 / 0.0 ms  (average mu = 0.695, current mu = 0.687) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

In summary, to be more precise, Node.js's memory limit refers to the heap memory limit, which is the maximum memory that can be occupied by JS objects, arrays, etc., allocated by V8.

Does the size of the heap memory determine how much memory a Node.js process can occupy? No! Keep reading.

Can I Put a 3GB File into Node.js Memory?

We saw in the test that the array can only hold a bit over 2GB before the program crashes. So, if I have a 3GB file, can't I put it into Node.js memory all at once?

You can!

process.memoryUsage() also reports external memory, which is occupied by the Node.js process but not allocated on the V8 heap. If you put the 3GB file there, the V8 heap limit does not apply. How? You can use Buffer. Buffer is backed by memory that Node.js allocates in C++, not by JS objects and data on the heap.

Here's a demo:

setTimeout(()=>{
    let buffer = Buffer.alloc(1024 * 1024 * 3000);
}, 3000)

Even after allocating 3GB, our program keeps running smoothly, and the Node.js process now occupies over 5GB of memory. This external memory is not limited by the V8 heap limit but by how much memory the operating system will give the process. So you can't just go wild: even Buffer can run out of memory. The real solution for large data is to process it with Streams.
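Since the paragraph above recommends Streams for large data, here is a minimal sketch of reading a file chunk by chunk so that the whole file never sits in memory at once. The function name countBytes and the file path in the usage comment are placeholders, not part of the original demo.

```javascript
const fs = require('fs');

// Hypothetical helper: reads a file as a stream and counts its bytes.
// fs read streams deliver chunks of up to 64 KB by default.
function countBytes(filePath) {
    return new Promise((resolve, reject) => {
        let total = 0;
        fs.createReadStream(filePath)
            .on('data', chunk => { total += chunk.length; })
            .on('end', () => resolve(total))
            .on('error', reject);
    });
}

// Usage (the path is a placeholder):
// countBytes('./largefile').then(bytes => console.log('Bytes read:', bytes));
```

Replacing the byte counter with real per-chunk processing keeps memory usage roughly constant regardless of file size.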

In Node.js, the lifecycle of a Buffer object is tied to a JavaScript object. When the JavaScript reference to a Buffer object is removed, the V8 garbage collector marks the object as recyclable, but the underlying memory of the Buffer object is not immediately released. Typically, when the destructor of the C++ extension is called (for example, during the garbage collection process in Node.js), this part of the memory is released. However, this process may not be completely synchronized with V8's garbage collection.

Memory Usage (MB): {
  rss: '2392.73 MB',
  heapTotal: '2392.57 MB',
  heapUsed: '2359.93 MB',
  external: '3000.41 MB'
}
Memory Usage (MB): {
  rss: '2392.75 MB',
  heapTotal: '2392.57 MB',
  heapUsed: '2359.94 MB',
  external: '3000.41 MB'
}
Memory Usage (MB): {
  rss: '2392.75 MB',
  heapTotal: '2392.57 MB',
  heapUsed: '2359.94 MB',
  external: '3000.41 MB'
}

In summary: Node.js memory usage consists of JS heap memory (managed by V8's garbage collector) plus memory allocated by C++.
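A small experiment can make this split visible. The sketch below allocates a 50 MB Buffer (an arbitrary size chosen for illustration) and compares process.memoryUsage() before and after; the growth should show up under external rather than on the JS heap.

```javascript
// Measure how a Buffer allocation shows up in process.memoryUsage().
const before = process.memoryUsage();
const buffer = Buffer.alloc(50 * 1024 * 1024); // 50 MB, allocated outside the V8 heap
const after = process.memoryUsage();

const toMB = bytes => (bytes / 1024 / 1024).toFixed(2) + ' MB';
console.log('external grew by:', toMB(after.external - before.external));
console.log('heapUsed grew by:', toMB(after.heapUsed - before.heapUsed));
console.log('buffer length:', toMB(buffer.length)); // keeps the buffer referenced
```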

Why Is the Heap Memory Segregated into New and Old Generations?

Generational garbage collection is highly prevalent in the implementations of modern programming languages! Similar strategies can be found in Ruby, .NET, and Java. Garbage collection often leads to a "stop the world" pause, which inevitably impacts program performance, and the generational design is conceived precisely to reduce that cost.

Divergent Object Lifespans

During program development, a significant portion of variables are temporary, serving specific local computational tasks. Such variables are better suited for Minor GC, that is, new generation GC. Objects in the new generation memory are primarily collected via the Scavenge algorithm. The Scavenge algorithm splits the new generation memory into two halves, From and To (a classic space-for-time tradeoff; because these objects are short-lived, they don't consume a large amount of memory).

When memory is allocated, it takes place within From. During garbage collection, the live objects in From are inspected and copied to To, followed by the release of non-live objects. In the subsequent round of collection, the live objects in To are replicated to From, at which point To morphs into From and vice versa. With each garbage collection cycle, From and To are swapped. This algorithm replicates only live objects during the copying process and thereby averts the generation of memory fragments.

So, how is the liveness of a variable determined? Reachability analysis comes into play. Consider the following objects as an example:

globalObject: The global object.
obj1: An object directly referenced by globalObject.
obj2: An object referenced by obj1.
obj3: An isolated object without any references from other objects.

In the context of reachability analysis:

globalObject, being a root object, is inherently reachable.
obj1, due to being referenced by globalObject, is also reachable.
obj2, as it is referenced by obj1, is reachable as well.
In contrast, obj3, lacking any reference paths to the root object or other reachable objects, is adjudged unreachable and thus eligible for recycling.
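
The reachability walk above can be sketched in a few lines. This is only a toy model: the refs arrays stand in for real object references, and the function name markReachable is made up. V8's actual marking is far more involved.

```javascript
// Toy object graph mirroring the example above.
const obj2 = { name: 'obj2', refs: [] };
const obj1 = { name: 'obj1', refs: [obj2] };
const obj3 = { name: 'obj3', refs: [] }; // nothing points to it
const globalObject = { name: 'globalObject', refs: [obj1] };

// Marks everything reachable from the given roots (depth-first traversal).
function markReachable(roots) {
    const marked = new Set();
    const stack = [...roots];
    while (stack.length > 0) {
        const node = stack.pop();
        if (marked.has(node)) continue; // already visited; also makes cycles safe
        marked.add(node);
        stack.push(...node.refs);
    }
    return marked;
}

const reachable = markReachable([globalObject]);
for (const obj of [globalObject, obj1, obj2, obj3]) {
    console.log(obj.name, reachable.has(obj) ? 'reachable' : 'unreachable');
}
```

Because the traversal starts from the roots, two objects that only reference each other are never visited, which is exactly the circular-reference case that reference counting cannot handle.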

Admittedly, reference counting can serve as an auxiliary means. Nevertheless, in the presence of circular references, it fails to accurately ascertain the true liveness of objects.

In the old generation memory, objects are generally less active. However, when the old generation memory becomes full, it triggers the cleanup of the old generation memory (Major GC) through the Mark-Sweep algorithm.

The Mark-Sweep algorithm comprises two phases: marking and sweeping. In the marking phase, the V8 engine traverses all objects in the heap and tags the live ones. In the sweeping phase, only the unmarked objects are cleared. The merit of this algorithm is that the sweeping phase consumes relatively less time since the proportion of dead objects in the old generation is relatively small. However, its drawback is that it only clears without compacting, which may result in a discontinuous memory space, making it inconvenient to allocate memory for large objects.

This shortcoming gives rise to memory fragmentation, necessitating the employment of another algorithm, Mark-Compact. This algorithm shifts all live objects to one end and then eradicates the invalid memory space on the right side of the boundary in one fell swoop, thereby obtaining a complete and continuous available memory space. It resolves the memory fragmentation issue that might be caused by the Mark-Sweep algorithm, albeit at the cost of consuming more time in moving a large number of live objects.
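
The difference between the two algorithms is easy to see on a toy heap. In the sketch below, an array of slots stands in for old-generation memory and a Set stands in for the result of the marking phase; both are simplifications for illustration only.

```javascript
// A toy "heap": each slot holds an object name, or null once freed.
const heap = ['a', 'b', 'c', 'd', 'e', 'f'];
const live = new Set(['a', 'c', 'f']); // result of the marking phase

// Mark-Sweep: free dead slots in place; live objects do not move.
function sweep(slots, marked) {
    return slots.map(slot => (marked.has(slot) ? slot : null));
}

// Mark-Compact: slide live objects to one end, leaving one contiguous free region.
function compact(slots, marked) {
    const survivors = slots.filter(slot => marked.has(slot));
    const free = new Array(slots.length - survivors.length).fill(null);
    return survivors.concat(free);
}

console.log(sweep(heap, live));   // [ 'a', null, 'c', null, null, 'f' ]
console.log(compact(heap, live)); // [ 'a', 'c', 'f', null, null, null ]
```

After sweeping, the largest free run is two slots even though three are free in total; after compacting, all three free slots are contiguous.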

If you find this post useful, please give it a thumbs up. :D

🔥 Practical Concurrent Control for Node.js Servers: Keep Your Server from Being Overwhelmed by Traffic!

Max — Thu, 28 Nov 2024 04:36:53 +0000

💡 Why Do We Need to Limit Concurrent Connections?

First, we need to understand a harsh reality: server resources are limited! Just like a small restaurant, if too many customers rush in at the same time, the service quality will inevitably decline and may even lead to chaos. The same applies to web servers:

🎯 Limited memory resources
🎯 Limited CPU processing power
🎯 Limited network bandwidth
🎯 Limited database connections

If we don't limit the number of concurrent connections, it may lead to:

😱 Slower server response
😱 Memory overflow
😱 Complete service downtime
😱 Other users unable to access

Let's take an example of a service we wrote:

Allocated 2GB memory
No request limit
Each request consumes some memory

Combining these 3 conditions, when there are too many requests and memory exceeds the limit, the service crashes directly. Let's simulate this:

const http = require('http');
const { promisify } = require('util');
const fs = require('fs');

const readFileAsync = promisify(fs.readFile);

async function loadLargeFileIntoMemory() {
    try {
        const data = await readFileAsync('./largefile');
        return data;
    } catch (error) {
        console.error('Error reading large file:', error);
        return null;
    }
}

const server = http.createServer(async (req, res) => {
    const largeFileData = await loadLargeFileIntoMemory();
    if (largeFileData) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request processed');
    } else {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
        res.end('Internal Server Error');
    }
});

const PORT = process.env.PORT || 3000;
server.listen(PORT, () => {
    console.log(`Server listening on port ${PORT}`);
});

Before sending requests, our program only occupied 35.1 MB of memory.

Simulated 200 concurrent connections using ab

Now our Node.js program has reached its peak memory usage! 27GB!

Let's limit the memory and make it crash.

node --max-old-space-size=2048 server.js

Does this flag cap the memory usage of the whole Node.js process?
Let's briefly look at the difference between V8 heap memory and total process memory.
--max-old-space-size only limits the V8 engine's heap memory, but the total memory of a Node.js process also includes:

Off-heap memory (Buffer, thread pool, etc.)
System memory (system calls, file operations, etc.)
Native memory (C++ level memory allocation)

For example, if we use Buffer, Buffer is allocated outside the V8 heap, here's an example:

const crypto = require('crypto');
const randomString = crypto.randomBytes(1024).toString('hex');

Then the Buffer created by crypto.randomBytes() is not limited by --max-old-space-size.

Taking fs.readFile in our program as an example, it's used to asynchronously read the entire contents of a file. When reading a file, the content is stored in a Buffer object, so it won't be limited by heap memory allocation.

So how should we limit it?

System-level memory limits (ulimit or Docker)
Process management tools (PM2)
Code-level memory monitoring and control

Interested readers can try it themselves. The above example shows that not controlling request concurrency can have disastrous effects on our program, so after development, we need to estimate how many visitors we will have to prepare appropriate resources for deployment.
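
As a starting point for the "code-level memory monitoring and control" option, here is a minimal sketch of a guard that rejects new requests once the process's resident memory passes a budget. The 1.5 GB budget and the names isOverBudget and memoryGuard are assumptions chosen for illustration.

```javascript
// Hypothetical budget: refuse new work once the process RSS exceeds this many bytes.
const RSS_BUDGET_BYTES = 1.5 * 1024 * 1024 * 1024;

// Returns true when resident memory (heap, Buffers, and native allocations) is over budget.
function isOverBudget(budgetBytes, usage = process.memoryUsage()) {
    return usage.rss > budgetBytes;
}

// Plain (req, res, next) middleware; works with http.ServerResponse and Express.
function memoryGuard(req, res, next) {
    if (isOverBudget(RSS_BUDGET_BYTES)) {
        res.statusCode = 503;
        res.end('Server is low on memory. Please try again later.');
        return;
    }
    next();
}
```

Checking rss rather than heapUsed matters here, because the Buffers created by fs.readFile live outside the V8 heap.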

🛠️ Implementation Solutions

Let's first implement the basic functionality of limiting concurrency.

1. Using Queue to Control Concurrency

Let's look at a simple but practical implementation solution using a queue to control concurrent requests:

const express = require('express');
const app = express();

// Create a simple queue class
class RequestQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.currentRequests = 0;
    this.queue = [];
  }

  // Add request to queue
  enqueue(req, res, next) {
    if (this.currentRequests < this.maxConcurrent) {
      this.currentRequests++;
      next();
    } else {
      this.queue.push({ req, res, next });
    }
  }

  // Release resources after processing
  dequeue() {
    this.currentRequests--;
    if (this.queue.length > 0) {
      const { req, res, next } = this.queue.shift();
      this.currentRequests++;
      next();
    }
  }
}

// Create queue instance, max concurrency set to 10
const requestQueue = new RequestQueue(10);

// Middleware: limit concurrency
const limitConcurrent = (req, res, next) => {
  requestQueue.enqueue(req, res, next);
};

// Use middleware
app.use(limitConcurrent);

// Release resources when request ends
app.use((req, res, next) => {
  res.on('finish', () => {
    requestQueue.dequeue();
  });
  next();
});

// Example route
app.get('/api/test', async (req, res) => {
  // Simulate time-consuming operation
  await new Promise(resolve => setTimeout(resolve, 1000));
  res.json({ message: 'Request processed successfully!' });
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});

We limit concurrent request handling to 10. Let's test it with ab:

ab -n 100 -c 20  http://127.0.0.1:3000/api/test

The effect is very good: the results show that it can process 9 requests per second.

Let's explain the code briefly. First, we create a queue to store waiting requests. If the number of requests being processed has already reached the limit, say 10, we put new requests in the queue:

if (this.currentRequests < this.maxConcurrent) {
  this.currentRequests++;
  next();
} else {
  this.queue.push({ req, res, next });
}

When the previous requests have been processed, we take the queued requests out and continue processing them:

  res.on('finish', () => {
    requestQueue.dequeue();
  });
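
The same counter-plus-queue idea works outside Express as well. Below is a minimal, framework-free sketch that limits concurrent async tasks; the class name TaskLimiter and the 50 ms demo job are made up for illustration.

```javascript
// Runs async tasks with at most `maxConcurrent` in flight; the rest wait in a FIFO queue.
class TaskLimiter {
    constructor(maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.active = 0;
        this.waiting = [];
    }

    run(task) {
        return new Promise((resolve, reject) => {
            const start = () => {
                this.active++;
                Promise.resolve()
                    .then(task)
                    .then(resolve, reject)
                    .finally(() => {
                        // Free the slot and wake the next waiting task, if any.
                        this.active--;
                        const next = this.waiting.shift();
                        if (next) next();
                    });
            };
            if (this.active < this.maxConcurrent) start();
            else this.waiting.push(start);
        });
    }
}

// Demo: 5 jobs of 50 ms each with a limit of 2; `peak` records the highest concurrency seen.
const limiter = new TaskLimiter(2);
let running = 0;
let peak = 0;
const job = () => new Promise(resolve => {
    running++;
    peak = Math.max(peak, running);
    setTimeout(() => { running--; resolve(); }, 50);
});
const done = Promise.all(Array.from({ length: 5 }, () => limiter.run(job)))
    .then(() => { console.log('Peak concurrency:', peak); return peak; });
```

The slot is released in finally, so a task that throws or rejects still frees its place in line.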

2. Implementation Using Third-Party Libraries

With the basic principles covered above, we can take a look at mature libraries like bottleneck to see how they implement concurrency limiting. The implementation methods are quite similar.

const Bottleneck = require('bottleneck');

// Create a limiter instance
const limiter = new Bottleneck({
  maxConcurrent: 100,  // Maximum number of concurrent requests
  minTime: 100       // Minimum time between requests (ms)
});

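// Bottleneck is not an Express middleware, so wrap it in one:
// each request holds a limiter slot until its response is closed
app.use((req, res, next) => {
  limiter.schedule(() => new Promise(resolve => {
    res.on('close', resolve);
    next();
  })).catch(next);
});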

🎨 Optimizing Our Program's Concurrency Control

1. Implementing Graceful Degradation 🎯

Our current implementation makes users wait in line, consuming server resources while waiting. Another approach is to implement service degradation. Let requests return directly so that users won't be stuck waiting, which provides a better user experience and reduces the server load. We just need to return when the concurrency limit is reached.

// Return a friendly prompt when the concurrency limit is reached
if (requestQueue.currentRequests >= requestQueue.maxConcurrent) {
  return res.status(503).json({
    message: 'The server is busy. Please try again later.'
  });
}

After modification, we can see that the program's concurrent processing capacity has improved. It can handle 50 requests per second. Only the business logic of 10 requests will actually be processed, and other requests will return directly.
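
For completeness, here is one way the rejection check could be packaged as a standalone middleware that keeps its own counter instead of using the queue. This is a sketch under assumptions: the factory name createLoadShedder is made up, and it relies only on the standard http.ServerResponse methods, so it should work with or without Express.

```javascript
// Builds a (req, res, next) middleware that rejects instead of queueing.
function createLoadShedder(maxConcurrent) {
    let active = 0;
    return function shed(req, res, next) {
        if (active >= maxConcurrent) {
            res.statusCode = 503;
            res.setHeader('Content-Type', 'application/json');
            res.end(JSON.stringify({ message: 'The server is busy. Please try again later.' }));
            return;
        }
        active++;
        // 'close' fires when the response completes or the client disconnects early.
        res.once('close', () => { active--; });
        next();
    };
}
```

Listening for 'close' rather than 'finish' means a slot is also released when a client hangs up before the response is sent.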

2. Monitoring and Alert Mechanism 📊

After the program is deployed online, we need a complete monitoring and alert mechanism. For example, to achieve the following purposes:

I can have data to observe the concurrency situation in the past two days.
When the concurrency exceeds a certain value, I should be notified.

At this time, we need to:

Expose the current connection situation of the program.
Collect connection information and customize alert rules.

In the Node.js ecosystem, prom-client is a commonly used library for creating and exposing monitoring metrics. It works well with monitoring systems (such as Prometheus), making it convenient for us to collect and display various indicator data of the application.

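const prometheus = require('prom-client');
// A Gauge can go up and down; a Counter can only increase and has no dec()
const gauge = new prometheus.Gauge({
  name: 'concurrent_requests',
  help: 'Current number of concurrent requests'
});

// Record the number of concurrent requests
app.use((req, res, next) => {
  gauge.inc();
  res.on('finish', () => gauge.dec());
  next();
});

// Expose the metrics for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

Through this integration, we have exposed the number of in-flight requests of the program on the /metrics path. The next step is to configure Prometheus to scrape this data, then observe it and define alert rules on the Prometheus platform.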

3. Dynamically Adjusting Concurrency Limits 🔄

The concurrency limits we set earlier, such as 10 or 100, are not very intelligent. Although we can dynamically adjust them through environment variables and deployment resources to cope with unknown traffic volumes, is there a smarter way to help us determine what the limit should be set to? Yes, there is!

Dynamic concurrency limiting can intelligently adjust the maximum number of concurrent requests based on the real-time load of the system. When the system load is light, appropriately increase the concurrency limit to make full use of idle resources and improve the processing capacity of the application.

When the system load is heavy, reduce the concurrency limit promptly to avoid excessive competition for resources and their depletion, and to preserve the basic responsiveness and stability of the service.

To obtain the current load of the system, we use the built-in os module. The os.loadavg() method returns an array containing the load averages over the last 1, 5, and 15 minutes:

[ 3.521484375, 3.57373046875, 3.6845703125 ]

This value is related to the number of CPU cores. For example, in a single-core system, a return value of 1 indicates a fully loaded state. Taking a single-core system as an example, we can dynamically adjust the concurrency limit for our program.

const os = require('os');
const http = require('http');

// requestQueue is the RequestQueue instance created earlier
function startMonitoring() {
  setInterval(() => {
    const load = os.loadavg()[0]; // 1-minute load average (always 0 on Windows)
    if (load > 0.7) {
      requestQueue.maxConcurrent = Math.max(50, requestQueue.maxConcurrent - 10);
    } else if (load < 0.3) {
      requestQueue.maxConcurrent = Math.min(200, requestQueue.maxConcurrent + 10);
    }
  }, 60000);
}

// handleRequest stands for your application's own request handler
const server = http.createServer((req, res) => {
  handleRequest(req, res);
});

server.listen(3000, () => {
  startMonitoring();
  console.log('Server listening on port 3000');
});

By monitoring the system load in real time and adjusting the maximum number of concurrent requests flexibly according to the load situation, we can avoid resource waste and service failures. However, implementing this dynamic concurrency limiting mechanism requires comprehensive testing based on our actual scenario. In this example, loadavg provides a useful reference, but keep in mind that it does not measure CPU usage alone:

For example, a Node.js application may frequently perform disk I/O operations (such as reading or writing files) or network I/O operations (such as sending HTTP requests and waiting for responses). These I/O operations may cause the process to enter a waiting state, which will be counted towards the system load. Therefore, even if the CPU usage is not high, if there are a large number of processes waiting for I/O completion, the value of loadavg may still be high.
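
On multi-core machines the raw load average has to be divided by the number of cores before it can be compared with thresholds like 0.7. The sketch below shows one way to do that and pulls the adjustment rule into a pure function that is easy to test. The names loadPerCore and nextLimit are made up, and the thresholds, step, and bounds simply mirror the example above as tunable assumptions.

```javascript
const os = require('os');

// 1-minute load average divided by core count: about 1.0 means all cores are busy.
// Note: os.loadavg() always returns [0, 0, 0] on Windows.
function loadPerCore() {
    return os.loadavg()[0] / Math.max(1, os.cpus().length);
}

// Pure function: given the current limit and a per-core load, return the next limit.
function nextLimit(current, load, { min = 50, max = 200, step = 10 } = {}) {
    if (load > 0.7) return Math.max(min, current - step);
    if (load < 0.3) return Math.min(max, current + step);
    return current;
}

console.log('Load per core:', loadPerCore().toFixed(2));
console.log('Next limit:', nextLimit(100, loadPerCore()));
```

Keeping the rule separate from the timer makes it possible to replay recorded load values through nextLimit and check how the limit would have moved before deploying it.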

💝 Practical Suggestions

🎯 Set a reasonable concurrency limit based on the server configuration and have a clear understanding of your program's capabilities.
🎯 Consider the characteristics of the business. It may be necessary to set different limits for different APIs. This article only sets the maximum concurrency limit and does not consider fairness.
🎯 Regularly monitor and analyze performance data to avoid being unaware of the situation online.
🎯 Establish a comprehensive degradation plan.

🌈 Summary

Through reasonable concurrency control, we can:

🌟 Protect the server from being overloaded.
🌟 Provide more stable services.
🌟 Optimize resource utilization.
🌟 Improve the user experience.

Rather than waiting for the server to be overwhelmed and then trying to rescue it, it's better to take preventive measures in advance! I hope this article is helpful to everyone! If you find it useful, don't forget to like and follow me! 💖

The Big Reveal of Node.js Performance Optimization! 🚀 Part One: Profiling Node.js

Max — Fri, 22 Nov 2024 04:26:32 +0000

Have you ever encountered this dilemma: you thought that by leveraging Node.js's event-driven, asynchronous I/O model you could smoothly increase the throughput of your service, but actual tests showed that it could only handle 5 requests per second? When running Node.js in production, do you also have a big question mark in your mind about its performance bottlenecks? This article works through a case study to deepen your understanding of Node.js performance analysis: we profile a Node.js application, uncover the key bottleneck that affects performance, and implement improvements that significantly increase throughput.

Appetizer

When developing Node.js applications, using libraries is essential. Built-in modules like fs and http interact with the underlying operating system through C++ bindings. This ensures that operations such as file access and network communication execute efficiently.

However, we need to think deeply about a question: is it only built-in modules that use this efficient implementation method? Obviously not. Many third-party libraries are also implemented in C++ to improve performance, such as the password-hashing library bcrypt, whose work is typically CPU-intensive.

Such libraries share a typical characteristic: they need to be downloaded or compiled when you run npm i. For example, when you install canvas with the --verbose flag (npm i canvas --verbose), you will see the following information:

npm info run canvas@2.11.2 install node_modules/canvas node-pre-gyp install --fallback-to-build --update-binary
npm info run canvas@2.11.2 install { code: 0, signal: null }

When node-pre-gyp runs, it indicates that the library uses a C++ addon that calls into the underlying operating system. The addon must be provided as a binary that matches your operating system, so node-pre-gyp obtains or builds that binary during installation before it can be used in your Node.js application.

To improve installation speed, library authors generally provide pre-compiled binaries for mainstream operating systems, which node-pre-gyp downloads and installs directly. However, when no pre-compiled binary is found, it triggers the build process and compiles during installation, like the following:

    node-pre-gyp info it worked if it ends with ok
    node-pre-gyp info using node-pre-gyp@0.14.0
    node-pre-gyp info using node@14.17.0 | darwin | x64
    node-gyp info find Python using Python version 3.8.2 found at "/usr/local/bin/python"
    node-gyp info spawn /usr/local/bin/python
    node-gyp info spawn args [
      '/Users/user/.nvm/versions/node/v14.17.0/lib/node_modules/npm/node_modules/node-gyp/gyp/gyp_main.py',
      'binding.gyp',
      '-f',
      'make',
      '-I',
      '/Users/user/projects/my-project/node_modules/my-native-module/build/config.gyp.i',
      '-I',
      '/Users/user/.nvm/versions/node/v14.17.0/lib/node_modules/npm/node_modules/node-gyp/addon.gyp.i',
      '-I',
      '/Users/user/Library/Caches/node-gyp/14.17.0/include/node/common.gyp.i',
      '-Dlibrary=shared_library',
    ]
    node-gyp info spawn make
    node-gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
      CXX(target) Release/obj.target/my-native-module/my-native-module.o
      SOLINK_MODULE(target) Release/binding.node
    node-pre-gyp info ok

Through this example, we understand a principle: when you need to provide libraries for CPU-intensive operations, using C++ modules is a common practice.

The Service Suddenly Became as Slow as a PPT 😅

Here's what happened. On the monitoring dashboard, I saw that the throughput of a certain service was very low. So I analyzed the historical data of the service and found that it was previously high but had become very low recently. I guessed that it might be due to the implementation of a certain business requirement that led to the decline in program performance. How to troubleshoot it? There were hundreds of commit records across different teams, just like looking for a needle in a haystack.

First, I used ab to conduct a benchmark on the service in the test environment to confirm that it was not a problem with the monitoring data but that there was indeed a problem with the service performance.

ab -n 500 -c 100 https://HOST/endpoint

The results are as follows:

The Requests per second was only 272. With so much CPU and memory added, it could only handle 272 requests per second? It was confirmed that there was a problem with the service performance. With the intention of saving costs for the company, I began to use technical means to troubleshoot the performance bottlenecks of this service.

Identifying Performance Bottlenecks Through Profiling Node.js

When considering server performance, we are always striking a balance. On the one hand, there are the throughput requirements brought by business volume; on the other hand, we want to make better use of hardware resources to reduce hardware costs. Node.js applications are no exception. First, we use the profiling tool that comes with Node.js to investigate CPU usage.

Profiling is a technique for analyzing program performance. It helps us understand how the program behaves at runtime and find where the performance bottlenecks are.

Next, we modify the command that starts the program and add the --prof option:

node --prof app.js

This produces a file with a name like isolate-0x7fcef2c4d000-60450-v8.log in the project directory.

This file records various performance-related aspects of the running Node.js program, such as shared library usage, time consumption, memory allocation, and code creation. However, the raw file is not readable as it stands; its format is only "convenient for storage". We need another command, which also comes with Node.js, to process it:

node --prof-process isolate-0x7fcef2c4d000-60450-v8.log > processed.txt

By analyzing the generated processed.txt file, we can see how much CPU time each function accounts for and find out who is dragging down the performance.

The analysis results show:

    [Summary]:
       ticks  total  nonlib   name
       19557   95.7%   99.8%  JavaScript
         89    0.4%    0.5%  C++
          0     0.0%    0.0%  GC

Wow! The JavaScript part occupies 99.8% of the non-library CPU time. Could it be that someone is using Node.js to perform CPU-intensive tasks? Here, ticks are not a specific amount of time. You can think of them as sampled time slices: generally, the more ticks a function has, the more CPU time it occupies.

Continuing through the file, in the JavaScript part, we found the culprit: gaussianBlur. What the heck? Gaussian blur? This function takes up a large share of the CPU.

 [JavaScript]:
   ticks  total  nonlib   name
   12105  61.6%   61.8%  LazyCompile: *gaussianBlur /app/server.js:15:23
...

 [C++]:
   ticks  total  nonlib   name
     35    0.2%          node::Start(int, char**)
...

 [Bottom up (heavy) profile]:
   ticks parent  name
   12105   61.6%  LazyCompile: *gaussianBlur /app/server.js:15:23
    ├─8473   70.0%    Function: processPixel /app/server.js:35:20
    │  └─4892   57.7%      Function: exp native math.js:178:12
    └─2560   21.1%    LazyCompile: *calculateWeight /app/server.js:45:24

By looking up the commit history of this code, I quickly located the relevant developer. The developer said that there was a requirement to blur an image and had decided to implement it in the simplest way, without adding extra dependencies, which resulted in this performance problem. It seems simple, but actually it's not.

After communicating with the developer, we decided to use a third-party library to compute the blurred image. Remember the appetizer part earlier? That's right, we use the canvas library to implement this function. Why is the performance of the third-party library higher? It's not because the calculation logic is better, but because it is a C++ library and is more suitable for CPU-intensive operations.

After the developer modified the solution, a performance test was conducted on the program again. The performance improved by about 50%: it could handle 400 requests per second, and the response time dropped from over 400 ms to over 200 ms.

It's Time to Unleash a Big Move: C++ Plugins Are Coming! 💪

The previous problem happened to have an open-source solution, but what if no ready-made library exists for the problem you hit? We can write a C++ extension ourselves to take over the heavy computation.
First, write the Gaussian blur function in C++.

// gaussian_blur.cpp
#include <node.h>
#include <node_buffer.h>  // declares node::Buffer::Data used below
#include <cmath>

namespace gaussian_blur {

using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Object;
using v8::Number;
using v8::Value;

void GaussianBlur(const FunctionCallbackInfo<Value>& args) {
    Isolate* isolate = args.GetIsolate();

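    // Expected arguments: (Buffer pixels, width, height, radius).
    // This sketch does no type checking on them.
    Local<Object> buffer = args[0].As<Object>();
    int width = static_cast<int>(args[1].As<Number>()->Value());
    int height = static_cast<int>(args[2].As<Number>()->Value());
    int radius = static_cast<int>(args[3].As<Number>()->Value());

    // Raw pointer into the Buffer's memory
    unsigned char* data = reinterpret_cast<unsigned char*>(node::Buffer::Data(buffer));
    //... Implement the code written in JS here with C++...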
}

void Initialize(Local<Object> exports) {
    NODE_SET_METHOD(exports, "gaussianBlur", GaussianBlur);
}

NODE_MODULE(gaussian_blur, Initialize)
}

Node.js cannot load this file directly once it is written. As many of you know, Node.js is built on V8, though many readers may not have looked closely at what V8 is. I will write a separate article about V8 later; the key point here is that V8 executes JavaScript and cannot run C++ source code. So we need a bridge that lets JavaScript call the C++ code: the source has to be compiled into a native module that Node.js can load. The tool for this is node-gyp, a build tool that depends on Python and drives your platform's C++ compiler. It reads a binding.gyp file like the following:

{
  "targets": [{
    "target_name": "gaussian_blur",
    "sources": [ "gaussian_blur.cpp" ]
  }]
}

Why use node-gyp instead of invoking a C++ compiler directly? Because it does more than compile the C++ code into a binary for your operating system: it also builds that binary as a Node.js module (a .node file). The interfaces in this module tell V8 how to initialize the module, create objects, and set properties and methods, which exposes the functions implemented in C++ to JS. For example 🌰: when JS passes a number to a C++ function, the addon code converts it to a C++ numeric type such as int or double. Likewise, when C++ returns one of its own data types, it is converted into a type that V8 understands, which bridges the difference between the two languages.

For more details, see the later articles in this series; a rough idea is enough for now.
Finally, calling the module built by node-gyp from Node.js is very simple.

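// Load the addon built by node-gyp (require resolves the .node extension automatically)
const blurAddon = require('./build/Release/gaussian_blur');

app.post('/blur', (req, res) => {
    const image = req.body.image;
    // The addon reads raw bytes through node::Buffer::Data, so image.data must be a Buffer
    const result = blurAddon.gaussianBlur(image.data, image.width, image.height, 5);
    res.send(result);
});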
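The C++ sketch above casts its arguments without checking them, so a wrong type coming from JS could crash the process instead of throwing an error. One option is a thin JS wrapper that validates the arguments before crossing into native code. The following is only a sketch: makeSafeBlur is a hypothetical helper, and the 4 bytes per pixel (RGBA) layout is an assumption that the article does not state.

```javascript
// Hypothetical wrapper: validates arguments before calling a native blur function.
// bytesPerPixel = 4 assumes RGBA data (an assumption, not taken from the article).
function makeSafeBlur(nativeBlur, bytesPerPixel = 4) {
  return function safeBlur(buffer, width, height, radius) {
    if (!Buffer.isBuffer(buffer)) {
      throw new TypeError('buffer must be a Node.js Buffer');
    }
    for (const [name, value] of Object.entries({ width, height, radius })) {
      if (!Number.isInteger(value) || value <= 0) {
        throw new RangeError(`${name} must be a positive integer`);
      }
    }
    const expectedLength = width * height * bytesPerPixel;
    if (buffer.length !== expectedLength) {
      throw new RangeError(
        `buffer has ${buffer.length} bytes, expected ${expectedLength}`
      );
    }
    // Only now hand the arguments over to the native function.
    return nativeBlur(buffer, width, height, radius);
  };
}

// Usage with a stub standing in for the compiled addon function:
const fakeNativeBlur = (buffer) => buffer.length;
const safeBlur = makeSafeBlur(fakeNativeBlur);
console.log(safeBlur(Buffer.alloc(2 * 2 * 4), 2, 2, 5)); // prints 16
```

With the real addon, you would pass the exported gaussianBlur function to makeSafeBlur instead of the stub.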

Preview

This article showed how to identify CPU bottlenecks through profiling. In real production environments, there are also many cases where throughput drops because of memory issues. Leave a comment if you'd like to see that covered sooner.

Epilogue

The code examples in this article differ quite a bit from the real business code. Even so, the optimization significantly improved the business results 🎉.

Before optimization: It took 2000 ms to process a 1920x1080 image.
After optimization: It only takes 150 ms to process the same image.
Throughput improvement: Originally, only 0.5 images could be processed per second; now 6 to 7 images can be processed per second!
CPU usage: It has dropped from 100% of a single core to about 30%.
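
The throughput figures follow directly from the per-image times, assuming images are processed one at a time on a single core. A quick check of the arithmetic:

```javascript
// Images per second when each image takes `msPerImage` and work is sequential.
function imagesPerSecond(msPerImage) {
  return 1000 / msPerImage;
}

const before = imagesPerSecond(2000); // 2000 ms per image
const after = imagesPerSecond(150);   // 150 ms per image
const speedup = 2000 / 150;           // how many times faster each image is processed

console.log(before.toFixed(2), after.toFixed(2), speedup.toFixed(1));
// prints: 0.50 6.67 13.3
```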

I hope these experiences will be helpful to other developers in the Node.js stack. Here are some takeaways 🌟:

Don't blindly believe in JavaScript. It's not omnipotent. Consider using C++ for CPU-intensive tasks.
Performance optimization should be based on data rather than speculation.
The Node.js ecosystem provides powerful performance analysis tools, and we should make good use of them.
Reasonable use of C++ plugins can significantly improve performance.

Finally, a reminder: Performance optimization should be determined according to the actual situation. Sometimes, using Worker Threads may be enough, and it's not necessary to use a "heavy weapon" like C++. Choosing the appropriate optimization solution is more important than blindly pursuing extreme performance!
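
To make the Worker Threads option concrete, here is a minimal sketch that moves a CPU-heavy loop off the main thread with the built-in worker_threads module. The blurInWorker helper and the 3-tap horizontal box filter are hypothetical stand-ins, not the real Gaussian blur from this article, and the worker source is kept inline only so that the sketch fits in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source as a string; in a real project it would normally be its own file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { pixels, width, height } = workerData;
  const out = new Uint8Array(pixels.length);
  // Stand-in for the real blur: average each pixel with its left/right neighbours.
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      let count = 0;
      for (let dx = -1; dx <= 1; dx++) {
        const nx = x + dx;
        if (nx >= 0 && nx < width) {
          sum += pixels[y * width + nx];
          count++;
        }
      }
      out[y * width + x] = Math.round(sum / count);
    }
  }
  parentPort.postMessage(out);
`;

// Runs the filter in a worker so the event loop stays free for other requests.
function blurInWorker(pixels, width, height) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // treat the first argument as code, not as a file path
      workerData: { pixels, width, height },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error('Worker exited with code ' + code));
    });
  });
}

// Usage: a single row of four grayscale pixels.
blurInWorker(Uint8Array.from([0, 30, 60, 90]), 4, 1).then((out) => {
  console.log(Array.from(out)); // prints [ 15, 30, 60, 75 ]
});
```

The calculation itself is not any faster than before; it simply no longer blocks the main thread while it runs.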

If the article helps you, please give it a like ❤️