Evgenii Tkachenko
Node.js is Not Single-Threaded

Node.js is known as a blazingly fast server platform with its revolutionary single-thread architecture, utilizing server resources more efficiently. But is it actually possible to achieve that amazing performance using only one thread? The answer might surprise you.

In this article we will reveal all the secrets and magic behind Node.js in a very simple manner.

Process vs Thread ⚙️

Before we begin, we have to understand what a process and a thread are, and discover their differences and similarities.

A process is an instance of a program that is currently being executed. Each process runs independently of others. Processes have several substantial resources:

  • Execution code;
  • Data Segment - contains global and static variables that need to be accessible from any part of the program;
  • Heap - dynamic memory allocation;
  • Stack - local variables, function arguments and function calls;
  • Registers - small, fast storage locations within the CPU used to hold data temporarily during program execution (like the program counter and stack pointer).

A thread is a single unit of execution within a process. There may be multiple threads within a process, performing different operations simultaneously. Threads share the process's execution code, data, and heap, but each thread gets its own stack and registers.

Process and Thread diagram

JavaScript is Not Threaded ❗️

To avoid confusion over terms, it's important to note that JavaScript itself is neither single-threaded nor multi-threaded. The language has nothing to do with threading; it's just a set of instructions for the execution platform to handle. The platform handles these instructions in its own way - whether in a single-threaded or multi-threaded manner.

I/O operations 🧮

I/O (Input/Output) operations are generally considered slower than other computer operations. Here are some examples:

  • writing data to the disk;
  • reading data from the disk;
  • waiting for user input (like a mouse click);
  • sending an HTTP request;
  • performing a database operation.

I/O's are Slow 🐢

You might be wondering why reading data from disk is considered slow. The answer lies in the physical implementation of the hardware components.

Accessing RAM takes on the order of nanoseconds, while accessing data on the disk or over the network takes on the order of milliseconds.

The same applies to bandwidth. RAM has a transfer rate consistently in the order of GB/s, while disk and network rates vary from MB/s to, optimistically, GB/s.

On top of that, we have to consider the human factor. In many circumstances, the input of an application comes from a real person (like a key press). So the speed and frequency of I/O doesn't depend only on technical aspects.

I/O's Block the Thread 🚧

I/O's can significantly slow down a program. The thread remains blocked, and no further operations will be executed until the I/O is completed.

Blocking operations

Create More Threads! 🤪

Okay, why not just spawn more threads inside the program and handle each request separately? Well, it seems like a good idea. Now, each client request has its own thread, and the server can handle multiple requests simultaneously.

Multithreaded program

The program needs to allocate additional memory and CPU resources for each thread, which sounds reasonable. However, a significant issue arises when threads perform I/O operations: they become idle, spending most of their time at 0% resource usage while waiting for the operation to complete. The more threads there are, the more resources are inefficiently utilized.

On top of that, managing threads is a challenging task, leading to potential issues such as race conditions, deadlocks, and livelocks. The operating system also needs to switch between threads, which adds overhead and reduces the efficiency gains from multithreading.

What's the Solution? 🤔

Luckily, humanity has already invented smart mechanisms to perform these kinds of operations in an efficient manner.

Welcome the Event Demultiplexer. It relies on a process called multiplexing - a method by which multiple signals are combined into one over a shared resource. The aim is to share a scarce resource (in our case, CPU and RAM). For example, in telecommunications, several telephone calls may be carried over one wire.

Multiplexing Diagram

The responsibilities of the Event Demultiplexer are divided into the following steps:

  • Identify event Sources. Each source can generate events;
  • Register event Sources. The registration involves specifying which events to monitor for each source;
  • Wait for events;
  • Send event notification.

Important! The Event Demultiplexer is not a component or device that exists in the real world. It's more of a theoretical model used to explain how to handle numerous simultaneous events efficiently.

To understand this complex process, let's go back to the past. Imagine an old phone switchboard: it identifies and registers sources of events (phones) and waits for new events (calls). Once there is a new event (a phone call), the switchboard delivers a notification (lights up a bulb). Then, the switchboard operator reacts to the notification by checking the target phone number and forwarding the call to its desired destination.

Old Telephone Switch Board

For computers, the principle is the same. However, the role of sources is played by things such as file descriptors, network sockets, timers, or user input devices. Each source can generate events like data available to read, space available to write, or connection requests.

Each operating system has already implemented the Event Demultiplexer mechanism: epoll (Linux), kqueue (macOS), event ports (Solaris), IOCP (Windows).

But Node.js is cross-platform. To govern this entire process while supporting cross-platform I/O, there is an abstraction layer that encapsulates these inter-platform and intra-platform complexities and exposes a generalized API to the upper layers of Node.

Libuv the King 🏆

Libuv logo

Welcome libuv - a cross-platform library (written in C) originally developed for Node.js to provide a consistent interface for non-blocking I/O across various operating systems. Libuv not only interfaces with the system's Event Demultiplexer but also incorporates two important components: the Event Queue and the Event Loop. These components work together to efficiently handle concurrent non-blocking resources.

The Event Queue is a data structure where all events are placed by the Event Demultiplexer, ready to be enqueued and processed sequentially by the Event Loop until the queue is empty.

The Event Loop is a continuously running process that waits for messages in the Event Queue and then dispatches them to the appropriate handlers.
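Conceptually (and heavily simplified - this is not libuv's actual code), the two components work together like this:

```javascript
// A toy model of the Event Queue + Event Loop pair described above.
// Real libuv is far more involved; this only shows the dispatch idea.
const eventQueue = [];

function emit(event, callback) {
  // The Event Demultiplexer would push completed events here.
  eventQueue.push({ event, callback });
}

function runEventLoop(handled) {
  // Drain the queue, dispatching each event to its handler.
  while (eventQueue.length > 0) {
    const { event, callback } = eventQueue.shift();
    handled.push(callback(event));
  }
}

const handled = [];
emit('data-available', (e) => `handled: ${e}`);
emit('connection-request', (e) => `handled: ${e}`);
runEventLoop(handled);
console.log(handled); // → [ 'handled: data-available', 'handled: connection-request' ]
```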

Problem Solved? 🥳

This is what happens when we call an I/O operation:

  1. Libuv initializes the appropriate Event Demultiplexer depending on the operating system;
  2. The Node.js interpreter scans the code and puts every operation onto the call stack;
  3. Node.js sequentially executes the operations on the call stack. However, it sends I/O operations to the Event Demultiplexer in a non-blocking way, so the I/O operation does not block the thread and other operations can execute concurrently;
  4. The Event Demultiplexer identifies the source of the I/O operation and registers the operation using the OS's facilities;
  5. The Event Demultiplexer continuously monitors the source (e.g., network sockets) for events (e.g., data available to read). When an event occurs, the Event Demultiplexer signals and adds the event, with its associated callback, to the Event Queue;
  6. The Event Loop continuously checks the Event Queue and processes the event callbacks.

While one request is waiting, Node.js can handle another. It does not wait for a request to complete before processing other requests. By default, all requests you make in Node.js are concurrent - they do not wait for one another to finish before executing.

Libuv diagram

Hooray! It seems like the problem is solved. Node.js can run efficiently on a single thread since most of the complexities of blocking I/O operations have been solved by OS developers. Thank you!

Problem is NOT Solved 🫠

But if we take a closer look at the libuv structure, we find an interesting aspect:

Structure of libuv

Wait, a Thread Pool? What? Yes, now we've delved deep enough to answer the main question - why is Node.js not (entirely) single-threaded?

Unveiling The Secret 🤫

Okay, we have a powerful tool and OS utilities that allow us to run asynchronous code in a single thread.

But there is a problem with the Event Demultiplexer. Since its implementation differs on each OS, some I/O operations are not fully supported asynchronously. It is difficult to support all the different types of I/O across all the different OS platforms. These issues especially concern the file I/O implementations, and they also affect some of Node.js's DNS functions.

Not only that - there are other types of operations that cannot be completed in an asynchronous manner, such as:

  • DNS operations, like dns.lookup, which can block because they might need to query a remote server;
  • CPU-bound tasks, like cryptography;
  • ZIP compression.

For these kinds of cases, the Thread Pool is used to perform the operations in separate threads (there are typically 4 threads by default). So, the complete Node.js architecture diagram looks like this:

Libuv diagram

Yes, Node.js itself is single-threaded, but the libraries it uses internally, such as libuv with its thread pool for some I/O operations, are not.

The Thread Pool, in conjunction with the Tasks Queue, is used to handle blocking I/O operations. By default, the Thread Pool includes 4 threads, but this can be changed by setting an environment variable:

```shell
UV_THREADPOOL_SIZE=8 node my_script.js
```

The flow is similar when an I/O operation cannot be performed asynchronously, but with a few key differences:

  1. When the Event Demultiplexer identifies the source of the I/O operation, it registers the operation in the Tasks Queue;
  2. The Thread Pool continuously monitors the Tasks Queue for new tasks;
  3. When a new task is placed in the Tasks Queue, the Thread Pool handles it asynchronously with one of its pre-allocated threads;
  4. After finishing the operation, the thread signals and adds the event, with its associated callback, to the Event Queue.

There is no magic here. I/O cannot actually be non-blocking, and there is no way to achieve that (at least for now). Data cannot be transferred faster than physical constraints dictate. Nothing is perfect, so until we find ways to increase data transfer speeds at the hardware level, we use a set of optimized algorithms to perform asynchronous operations in the most efficient way possible.

Thank you for reading and have a wonderful day :)

Top comments (11)

Derek Murawsky

Don't forget that the main event loop in Node can still get backlogged. This can result in skipping garbage collection which can appear like a memory leak and even shut everything down if you're not careful. So while you can federate out the work, the central thread can still get bogged down.

Tony Miller

There’s no “main event loop” in Node. There’s a default event loop in libuv.

Derek Murawsky • Edited

I mean, there are many folks who would argue with that. Example
I will freely admit that I don't know these intricacies well, especially since this came from a debugging session for work many years ago for a crypto service written in JS/Node. It received so many requests that garbage collection apparently could not run reliably, and it looked like we had a memory leak when we did not. We had several devs look at it during this time as it was high profile, and that was the conclusion we came to; our monitoring supported it.

Tony Miller

“It received so many requests”.

This is the key: that means that the stage of the libuv event loop that calls handlers (C-level handlers, that is) was running all of them, and there were so many that the event loop didn’t progress to further stages.

I didn’t dig further into when node is triggering GC. Would be interesting to see if GC is tied to certain stages.

Tony Miller

Great article!
The only nuance is that diagrams look like libuv call into V8 directly, which is not the case. Node sets up the handlers and those handlers prepare stack and state for V8 to run and then run V8.

The reason this is important is that a lot of people have a notion of V8 running continuously, which is not the case. We enter and exit it through Node’s handlers for the various libuv stages.

Adaptive Shield Matrix

What about Bun?
It feels like Node.js is completely superseded by Bun,
at least it has been in all of my use cases.

Mykhailo Toporkov 🇺🇦

It would be worth mentioning that UV_THREADPOOL_SIZE is actually limited by the number of logical CPU threads.

Evgenii Tkachenko

That's a good one, thanks!

Nikola Perišić

Wow, great article. Thank you for sharing!

Aladinyo

Amazing article. Also, if we have CPU-intensive tasks to do, we can simply run them in C++, as Node.js is very well integrated with C++.

Martin Baun

It is a good thing we have threads now so that Node can be a little more versatile when CPU-intensive tasks are needed. Great writeup, Tkachenko :)