DEV Community

Chimzi Chiorlu

Concurrency in Node.js (Child Processes vs Worker Threads)

In this article, we'll explore how to leverage the power of child processes and worker threads to overcome the single-threaded limitation of Node.js, optimizing your applications for performance. Whether you're battling with heavy computations or managing numerous I/O operations, this article will provide you with practical insights and code examples to master concurrency in Node.js. Let’s get started!

As a JavaScript runtime, Node.js is single-threaded by nature. This means that, by default, all the JavaScript code in a Node.js application runs on a single thread. While this model simplifies many aspects of coding (for example, you don't need to worry about the complex synchronization issues common in multi-threaded environments), it also means that your application can execute only one piece of JavaScript at a time. This is generally not an issue for I/O-bound work, like network requests or file system tasks, because Node.js uses non-blocking I/O and hands these operations off to the operating system or an internal thread pool, leaving the single thread free to do other work until the I/O operation completes. However, for CPU-bound tasks, those involving heavy computation, the single-threaded nature can become a bottleneck: while the computation runs, nothing else can. This is where techniques like asynchronous programming, worker threads, and child processes come into play, allowing for more efficient utilization of system resources and enhanced application performance.
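To make the bottleneck concrete, here is a minimal sketch. The blockFor() and measureTimerDelay() helpers are hypothetical names made up for this illustration; blockFor() simply stands in for any heavy computation running on the main thread.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical stand-in for a heavy computation: spins for roughly `ms` milliseconds
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // busy-wait on purpose
  }
}

// Schedules a 0 ms timer, then blocks the thread.
// Resolves with the time that passed before the timer callback could run.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`A 0 ms timer fired after about ${Math.round(delay)} ms`);
});
```

The timer callback cannot run until the loop finishes, so the reported delay is at least as long as the blocking work.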

While worker threads and child processes are alike in function, both offering ways to handle tasks in parallel to the main execution thread, their applications vary significantly. Imagine a chocolate factory. The entire factory is a process, with a building, machinery, raw materials, and workers. In computer terms, a process has its own memory space and resources allocated by the operating system. Within this factory, there are several assembly lines, each managed by a worker. These assembly lines are like threads: they operate within the process (the factory) and share its resources (like raw materials and machinery), yet each thread (assembly line) can also work independently of the others.

Now, suppose the factory decides to make a new type of chocolate that requires a completely different set of machinery and raw materials. It might make sense to set up a new factory for this purpose. This new factory is like a child process. It's a completely separate process with dedicated resources, but it was spawned by the original process (the parent factory).

In computer science, threads and processes are fundamental units of execution. Using worker threads and spawning child processes can help speed up your code, but they also add complexity and can introduce new classes of bugs, so they should be used judiciously. In Node.js, the child_process and the worker_threads modules are used to spawn child processes and create worker threads respectively.

Let’s consider how these modules work.

child_process

The child_process module allows us access to operating system features by executing commands within a child process. The child_process module provides various methods such as fork(), spawn(), exec(), execFile(), and their synchronous counterparts for different use cases.
Note: The examples in this article are Linux-based. For Windows, please replace the commands with their Windows equivalents.

spawn()

The spawn() method is the foundation of the child_process module, and it is essential for creating new processes to execute system commands. It is asynchronous by design, allowing the primary program to run concurrently without waiting for the child process to complete. The spawnSync() method provides equivalent functionality in a synchronous manner that blocks the event loop until the spawned process either exits or is terminated.

const { spawn } = require('child_process');
const child = spawn('ls', ['-lh', '/usr']);

child.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});
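The example above only listens to stdout. A slightly fuller sketch might also collect stderr and wait for the process to finish. The run() helper below is a hypothetical wrapper written for this article, not part of Node.js; it relies only on the standard 'data', 'error', and 'close' events.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: runs a command and resolves with its exit code and output
function run(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    // 'error' fires when the process could not be started (e.g. command not found)
    child.on('error', reject);
    // 'close' fires once the process has ended and its stdio streams are closed
    child.on('close', (code) => resolve({ code, stdout, stderr }));
  });
}

run('ls', ['-lh', '/usr'])
  .then(({ code, stdout }) => console.log(`exit code ${code}\n${stdout}`))
  .catch((err) => console.error(`failed to start: ${err.message}`));
```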

Pipes for stdin, stdout, and stderr are established between the parent Node.js process and the spawned subprocess. These streams let us feed input to the underlying OS command and read its output and errors as they are produced. Because they are ordinary Node.js streams, the output of one process can also be piped into the input of another.

const { spawn } = require('child_process');
const grep = spawn('grep', ['ssh']);
const ps = spawn('ps', ['aux']);

ps.stdout.pipe(grep.stdin);
grep.stdout.on('data', (data) => {
  console.log(data.toString());
});
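The parent can also write to a child's stdin directly instead of piping another process into it. The sketch below assumes a hypothetical pipeThrough() helper and, like the other examples, a Linux system where grep is available.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: sends `input` to a command's stdin and resolves with what it prints
function pipeThrough(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let output = '';
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject);
    child.on('close', () => resolve(output));
    // Ignore write errors that occur if the child exits before reading all input
    child.stdin.on('error', () => {});
    // end() writes the data and then closes stdin, so the child sees end-of-input
    child.stdin.end(input);
  });
}

pipeThrough('grep', ['ssh'], 'sshd\ncron\nssh-agent\n')
  .then((matches) => console.log(matches))
  .catch((err) => console.error(`failed to start: ${err.message}`));
```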
exec() and execFile()

The exec() and execFile() methods allow for command execution with and without a shell, respectively. The exec() method spawns a shell and runs a command within that shell, buffering the output and passing stdout and stderr to a callback function when the command completes. Because the command string is interpreted by a shell, exec() is especially powerful, allowing pipes, redirects, and multiple commands in a single call, but it must be used with caution: building the command string from unsanitized user input opens the door to command injection attacks.

const { exec } = require('child_process');
exec('ls -lh /usr', (error, stdout, stderr) => {
  if (error) {
    console.error(`exec error: ${error.message}`);
    return;
  }
  console.log(`stdout: ${stdout}`);
});
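When part of the command comes from outside the program, execFile() is usually the safer choice, because its arguments are passed as an array and never parsed by a shell. A minimal sketch, where userInput is a made-up value standing in for untrusted input:

```javascript
const { execFile } = require('child_process');

// Hypothetical untrusted input, e.g. something a user typed into a form
const userInput = '/usr; echo injected';

// execFile() does not start a shell, so the whole string reaches `ls` as one
// argument and the "; echo injected" part is never run as a command.
execFile('ls', ['-lh', userInput], (error, stdout, stderr) => {
  if (error) {
    console.error(`ls failed: ${stderr || error.message}`);
    return;
  }
  console.log(stdout);
});
```

With exec(), the same string concatenated into the command would be interpreted by the shell as two separate commands.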

The synchronous counterparts of the exec() and execFile() methods are execSync() and execFileSync(), respectively.
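As a rough sketch of the synchronous style, the call below blocks until the command finishes and reports failures by throwing instead of calling a callback. The command and path are the same illustrative ones used earlier.

```javascript
const { execFileSync } = require('child_process');

try {
  // Blocks the event loop until `ls` exits; `encoding` makes the return value a string
  const output = execFileSync('ls', ['-lh', '/usr'], { encoding: 'utf8' });
  console.log(output);
} catch (error) {
  // Thrown if the command cannot be started or exits with a non-zero code
  console.error(`ls failed: ${error.message}`);
}
```

Because nothing else can run while the call is in progress, this style tends to suit short scripts and start-up code better than request handlers.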

fork()

The fork() method is used for spawning new Node.js processes. It creates a separate V8 instance, allowing JavaScript files to run in separate processes, and facilitating direct communication between the parent and child.

const { fork } = require('child_process');
const child = fork('child.js');

child.on('message', (message) => {
  console.log('Message from child:', message); // prints 'hello'
});

In child.js, messages can be sent to the parent process:

process.send('hello');
Communication and Buffer Limits in Child Processes

Communication between parent and child processes created with fork() is facilitated through an Inter-Process Communication (IPC) channel. This channel allows messages to be sent between the parent and child, enabling data exchange and coordination for various tasks. Buffer limits are a separate concern: exec() and execFile() collect the child's stdout and stderr in memory before calling the callback, and the maxBuffer option caps how many bytes they will hold. If the output exceeds this limit, the child process is terminated and the output is truncated. Adjusting maxBuffer lets you tailor this limit to the application's needs.

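const { exec } = require('child_process');
// Allow up to 1 MiB (1024 * 1024 bytes) of output on stdout or stderr
exec('ls -lh /usr', { maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => {
  console.log(`stdout: ${stdout}`);
});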
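To see what exceeding the limit looks like, the sketch below sets maxBuffer to a deliberately tiny value. The 16-byte figure is an arbitrary choice for the demonstration, not a recommended setting.

```javascript
const { exec } = require('child_process');

// Deliberately tiny limit (16 bytes), so a directory listing will exceed it
exec('ls -lh /usr', { maxBuffer: 16 }, (error, stdout) => {
  if (error) {
    // When maxBuffer is exceeded, the child is terminated and the output is cut short
    console.error(`exec failed: ${error.message}`);
    console.error(`partial output (${stdout.length} characters): ${stdout}`);
    return;
  }
  console.log(stdout);
});
```

For commands that may produce a lot of output, spawn() is often a better fit, since it streams data as it arrives instead of buffering it.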

worker_threads

The worker_threads module, introduced in Node.js v10.5.0, allows JavaScript code to be executed in parallel threads. This is especially useful for CPU-intensive tasks, as it improves program performance by allowing numerous threads to execute concurrently.

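Unlike child processes, worker threads can share memory with the main thread through SharedArrayBuffer objects, and can transfer ArrayBuffer instances without copying them. This allows for faster communication, as data can be written to and read from the shared buffer directly instead of being serialized into messages.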

Here’s how you can create a simple worker thread:

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.once('message', (message) => {
    console.log(message); // Prints 'Worker thread: Hello!'
  });
} else {
  parentPort.postMessage('Worker thread: Hello!');
}

In the above example, we're creating a new worker thread and setting up message passing between the main thread and the worker.

Worker Pools

Creating a worker incurs some overhead. For better efficiency, especially in high-load situations, using worker pools could prove useful. A worker pool pre-allocates worker threads, allowing them to be reused, avoiding the cost of recreating them and thus improving performance and resource utilization.

Imagine you have a large dataset that needs to be processed. To process each data point independently using workers, first create a worker file containing the code to be executed by each worker. Let’s call this file worker.js.

const { parentPort } = require('worker_threads');

parentPort.on('message', (data) => {
  // Simulating a CPU-intensive task
  let result = 0;
  for (let i = 0; i < 1e7; i++) {
    result += i;
  }

  // Sending the result back to the main thread
  parentPort.postMessage(result);
});

Worker pools can then be implemented in the main file:

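// main.js
const { Worker } = require('worker_threads');

const poolSize = 5;

// Create a worker pool
const workers = [];
for (let i = 0; i < poolSize; i++) {
  const worker = new Worker('./worker.js');
  worker.isIdle = true; // custom flag, not a built-in Worker property
  workers.push(worker);
}

// Function to get an idle worker
function getWorker() {
  return workers.find(worker => worker.isIdle);
}

Tasks can then be assigned to the workers and the results handled. Tasks that arrive while every worker is busy wait in a queue until a worker becomes free:

// main.js (continued)
const queue = []; // tasks waiting for an idle worker

function runTask(worker, data) {
  worker.isIdle = false;
  worker.once('message', (result) => {
    worker.isIdle = true;
    console.log(`Result: ${result}`);
    if (queue.length > 0) {
      // Pick up the next waiting task
      runTask(worker, queue.shift());
    } else if (workers.every(w => w.isIdle)) {
      // Nothing left to do: stop the workers so the process can exit
      workers.forEach(w => w.terminate());
    }
  });
  worker.postMessage(data);
}

function processDataset(dataset) {
  dataset.forEach(data => {
    const worker = getWorker();
    if (worker) {
      runTask(worker, data);
    } else {
      queue.push(data); // all workers busy: queue the task instead of dropping it
    }
  });
}

const dataset = new Array(100).fill(null);
processDataset(dataset);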

Now that you understand how these modules work, it's important to understand when NOT to use them. Despite their powerful capabilities, these modules should not be used for every task in your application. Worker threads, though ideal for CPU-bound tasks, can add unnecessary overhead for I/O-bound tasks, which are already efficiently handled by Node.js. Similarly, spawning child processes indiscriminately can lead to high memory consumption and system resource exhaustion, as each process carries a significant overhead in terms of system resources. In fact, for many web applications, the default single-threaded, event-driven model of Node.js is more than sufficient.

Furthermore, remember that worker threads can share memory, so you will have to manage data access carefully to avoid race conditions. Each worker runs in its own V8 instance with its own heap, so ordinary JavaScript objects are not shared, but any SharedArrayBuffer passed to a worker is visible to several threads at once. A race condition occurs when two or more threads can access shared data and try to change it at the same time. This can lead to unpredictable results, as one thread may be modifying the data while another is trying to read it. Similarly, while child processes don't share memory with the parent, they still communicate with it via Inter-Process Communication (IPC), and code that assumes a particular order or timing of messages can also suffer from data inconsistencies or race conditions. Therefore, when utilizing these parallelism tools in Node.js, always ensure that access to shared resources is properly synchronized, for example with Atomics operations on shared buffers, to prevent data corruption or unexpected application behavior.

That’s all for now! The article Node.js: Beyond the Basics provides a wealth of insights to help you understand the non-blocking, event-driven architecture on which Node.js is built.
