Caio Borghi
Inside Node.js: Exploring Asynchronous I/O

Introduction

Recently, I've been studying asynchronous code execution in Node.js.

I ended up learning (and writing) a lot, from an article about how the Event Loop works to a Twitter thread explaining who waits for the HTTP request to finish.

If you want, you can also access the mind map I created before writing this post by clicking here.

Now, let's get to the point!

How Node Handles Asynchronous Code

In Node:

  • All JavaScript code is executed in the main thread.
  • The libuv library is responsible for handling I/O (input/output) operations, i.e., asynchronous operations.
  • By default, libuv provides 4 worker threads for Node.js.
    • These threads will only be used when blocking asynchronous operations are performed, in which case they will block one of the libuv threads (which are OS threads) instead of the main Node execution thread.
  • There are both blocking and non-blocking operations, and most of the current asynchronous operations are non-blocking.

Asynchronous Operations: What Are They?

Generally, there's confusion when it comes to asynchronous operations.

Many believe it means something happens in the background, in parallel, at the same time, or in another thread.

In reality, an asynchronous operation is an operation that won't return now, but later.

They depend on communication with external agents, and these agents might not have an immediate response to your request.

We're talking about I/O (input/output) operations.

Examples:

  • Reading a file: data leaves the disk and enters the application.
  • Writing to a file: data leaves the application and enters the disk.
  • Network Operations
    • HTTP requests, for example.
    • The application sends an HTTP request to some server and receives data back.

(Diagram: Node calls libuv, libuv makes syscalls, and the event loop runs on the main thread.)

Blocking vs Non-Blocking Asynchronous Operations

In the modern world, most asynchronous operations are non-blocking.

But wait, does that mean:

  • libuv provides 4 threads (by default).
  • they "take care of" the blocking I/O operations.
  • the vast majority of operations are non-blocking.

Seems kind of useless, right?

(Diagram: libuv worker threads handling blocking asynchronous operations.)

With this question in mind, I decided to do some experiments.

Experiments with Blocking Functions

First, I tested an asynchronous CPU-intensive function, one of the rare asynchronous blocking functions in Node.

The code I used was as follows:

// index.js
import { pbkdf2 } from "crypto";

const TEN_MILLIONS = 1e7;

// CPU-intensive asynchronous function
// Goal: Block a worker thread
// Original goal: Generate a passphrase
// The third parameter is the number of iterations
// In this example, we are passing 10 million
function runSlowCryptoFunction(callback) {
  pbkdf2("secret", "salt", TEN_MILLIONS, 64, "sha512", callback);
}

// Here we want to know how many worker threads libuv will use
console.log(`Thread pool size is ${process.env.UV_THREADPOOL_SIZE}`);

const runAsyncBlockingOperations = () => {
  const startDate = new Date();
  const runAsyncBlockingOperation = (runIndex) => {
    runSlowCryptoFunction(() => {
      const ms = new Date() - startDate;
      console.log(`Finished run ${runIndex} in ${ms/1000}s`);
    });
  }
  runAsyncBlockingOperation(1);
  runAsyncBlockingOperation(2);
};

runAsyncBlockingOperations();

To validate the operation, I ran the command:

UV_THREADPOOL_SIZE=1 node index.js

IMPORTANT:

  • UV_THREADPOOL_SIZE: It's an environment variable that determines how many libuv worker threads Node will start.

The result was:

Thread pool size is 1
Finished run 1 in 3.063s
Finished run 2 in 6.094s

That is, with only one thread, each execution took ~3 seconds and they ran sequentially, one after the other.

Now, I decided to do the following test:

UV_THREADPOOL_SIZE=2 node index.js

And the result was:

Thread pool size is 2
Finished run 2 in 3.225s
Finished run 1 in 3.243s

With that, it's proven that libuv's worker threads in Node.js are the ones handling blocking asynchronous operations.

But what about the non-blocking ones? If no one waits for them, how do they work?

I decided to write another function to test it.

Experiments with Non-Blocking Functions

The fetch function (native to Node since v18) performs a non-blocking asynchronous network operation.

With the following code, I redid the test of the first experiment:

//non-blocking.js
// Here we want to know how many worker threads libuv will use
console.log(`Thread pool size is ${process.env.UV_THREADPOOL_SIZE}`);

const startDate = new Date();
fetch("https://www.google.com").then(() => {
  const ms = new Date() - startDate;
  console.log(`Fetch 1 returned in ${ms / 1000}s`);
});

fetch("https://www.google.com").then(() => {
  const ms = new Date() - startDate;
  console.log(`Fetch 2 returned in ${ms / 1000}s`);
});

And I executed the script with the following command:

UV_THREADPOOL_SIZE=1 node non-blocking.js

The result was:

Thread pool size is 1
Fetch 1 returned in 0.391s
Fetch 2 returned in 0.396s

So, I decided to test with two threads, to see if anything changed:

UV_THREADPOOL_SIZE=2 node non-blocking.js

And then:

Thread pool size is 2
Fetch 2 returned in 0.402s
Fetch 1 returned in 0.407s

With this, I observed that:

Having more libuv threads does not help the execution of non-blocking asynchronous operations.

But then, I questioned again, if no libuv thread is "waiting" for the request to return, how does this work?

My friend, that's when I fell down a gigantic rabbit hole of research into the workings of:

Non-Blocking Asynchronous Operations and OS

Operating systems have evolved quite a bit over the years to deal with non-blocking I/O operations. This is done through syscalls, such as:

  • select/poll: These are the traditional ways of dealing with non-blocking I/O and are generally considered less efficient.
  • IOCP: Used in Windows for asynchronous operations.
  • kqueue: A method for macOS and BSD.
  • epoll: Efficient and used in Linux. Unlike select, it is not limited by the number of FDs.
  • io_uring: An evolution of epoll, bringing performance improvements and a queue-based approach.

To understand better, we need to dive into the details of non-blocking I/O operations.

Understanding File Descriptors

To explain non-blocking I/O, I need to quickly explain the concept of File Descriptors (FDs).

What is a FD?

It's a numerical index into a table maintained by the kernel, where each entry has:

  • Resource type (such as file, socket, device).
  • Current position of the file pointer.
  • Permissions and flags, defining modes like read or write.
  • Reference to the resource's data structure in the kernel.

They are fundamental for I/O management.

FD and Non-Blocking I/O

When initiating a non-blocking I/O operation, Linux associates an FD with it without interrupting (blocking) the process's execution.

For example:

Imagine you want to read the contents of a very large file.

Blocking approach:

  • The process calls the read file function.
  • The process waits while the OS reads the file's content.
    • The process is blocked until the OS finishes.

Non-blocking approach:

  • The process requests asynchronous read.
  • The OS starts reading the content and returns an FD to the process.
  • The process isn't locked up and can do other things.
  • Periodically, the process calls a syscall to check if the reading is finished.

The process decides the mode of reading through the fcntl function with the O_NONBLOCK flag, but this is secondary at the moment.

Monitoring FDs with syscalls

To efficiently observe multiple FDs, OSs rely on some syscalls:

Understanding select:

  • Receives a list of FDs.
  • Blocks the process until one or more FDs are ready for the specified operation (read, write, exception).
  • After the syscall returns, the program can iterate over the FDs to identify those ready for I/O.
  • Uses a search algorithm that is O(n).
    • Inefficient and slow with many FDs.

Epoll

An evolution of select, it stores the FDs in a self-balancing tree (a red-black tree) and keeps a separate ready list, so retrieving the FDs with activity doesn't require scanning every registered one.

Pretty fancy!

How it works:

  • Create an epoll instance with epoll_create.
  • Associate the FDs with this instance using epoll_ctl.
  • Use epoll_wait to wait for activity on any of the FDs.
  • Has a timeout parameter.
    • Extremely important and well utilized by the libuv Event Loop!

(Chart: comparison of time between select and epoll.)

Io_uring

This is a game-changer.

While epoll significantly improved the performance of searching and handling FDs, io_uring rethinks the entire nature of I/O operations.

And after understanding how it works, I wondered why nobody had thought of this before!

Recapping:

  • select: Receives a list of FDs, stores them sequentially (like an array), and checks each one for changes or activity, with complexity O(n).
  • epoll: Registers the FDs once in a self-balancing tree and keeps a ready list, so it doesn't check each one individually; retrieving the FDs with activity is effectively O(1) in the number of watched FDs.

Historically, the process was responsible for iterating over the returned FDs to know which have finished or not.

  • io_uring: What? Return a list? Do polling? Are you kidding? Ever heard of queues?

It works using two main queues, in the form of rings (hence the name io_uring):

  • One for submitting operations (the submission queue).
  • One for completed operations (the completion queue).

Simple, right?

The process, when starting an I/O operation, queues the operation using the io_uring structure.

Then, instead of calling select or epoll and iterating over the returned FDs, the process can choose to be notified when an I/O operation is completed.

Polling? No. Queues!

Conclusion

With this knowledge, I now understand precisely the path Node takes to perform an asynchronous operation.

If it's blocking:

  • Executes the asynchronous operation using libuv.
  • Adds it to a libuv worker thread.
  • The worker thread is blocked, waiting for the operation to finish.
  • Once finished, the thread is responsible for placing the result in the Event Loop in the MacroTasks queue.
  • The callback is executed on the main thread.

If it's non-blocking:

  • Executes the asynchronous operation using libuv.
  • Libuv performs a non-blocking I/O syscall.
  • Performs polling with the FDs until they resolve (epoll).
  • From version 20.3.0, uses io_uring.
    • Queue-based approach for submission/completed operations.
  • Upon receiving the event of operation completion:
    • libuv takes care of executing the callback on the main thread.

Top comments (2)

Dusan Petkovic

Thanks for the great article, I feel like I need a few days to digest this :D also wonder how the whole async operations compare to the browser env

Caio Borghi

Thanks for the feedback Dusan, feel free to ask me any questions.

About how it compares to the browser: I'd need to study more deeply how the browser handles network requests under the hood. I know it uses the famous Web APIs, but I'm not sure how they bind to the OS.