DEV Community

Altamash Ali

Deep dive into Node.js Architecture

In this article, we take a deep dive into the Node.js architecture and try to understand the asynchronous nature of Node.js.

Let's dive into it.

Node.js is a single-threaded, asynchronous, event-driven runtime environment for running JavaScript code on the server.

Single-threaded means that the JavaScript runtime executes only one piece of code (or statement) at any instant, synchronously. It has a single call stack and a single memory heap. How, then, does the runtime handle multiple asynchronous operations efficiently? Node.js does so with its event-driven approach. Don't worry about that now. We will come back to it soon :).
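A minimal sketch of what "one call stack" means in practice: a timer callback scheduled with a delay of 0 still has to wait until the synchronous code that is currently running has finished. The function name demo and the labels in the array are made up for this illustration.

```javascript
// Returns the order in which the pieces of code actually ran.
function demo() {
  const order = [];
  return new Promise((resolve) => {
    order.push('start');

    // The callback is only queued here; it cannot run while the stack is busy.
    setTimeout(() => {
      order.push('timer callback');
      resolve(order);
    }, 0);

    // Synchronous work keeps the single thread occupied, so the timer waits.
    let sum = 0;
    for (let i = 0; i < 1e6; i++) sum += i;
    order.push('end of synchronous code');
  });
}

demo().then((order) => console.log(order.join(' -> ')));
// prints "start -> end of synchronous code -> timer callback"
```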

I/O (input/output) is the slowest of the fundamental operations of a computer. It covers accessing data on disk, reading and writing files, waiting for user input, making network calls, performing database operations, and so on. There is always a delay between the moment a request is sent to the device and the moment the operation completes.

In traditional blocking I/O programming, the function call corresponding to an I/O request blocks the execution of the thread until the operation completes. So a web server implemented with blocking I/O cannot handle multiple connections in the same thread. The usual solution is to use a separate thread (or process) for each concurrent connection.

Most modern operating systems support another mechanism for accessing resources, called non-blocking I/O, in which the system call always returns immediately without waiting for the I/O operation to complete. To handle many non-blocking resources efficiently, the application relies on a mechanism called the synchronous event demultiplexer, or event notification interface. The synchronous event demultiplexer watches multiple resources and returns a new event (or set of events) when a read or write operation on one of those resources completes. The advantage is that the demultiplexer itself is synchronous: it blocks until there are new events to process, so the thread never wastes CPU time polling resources that are not ready.
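From the caller's point of view, the difference between the two styles can be seen with Node's fs module. This is only a sketch of the programming model, not of the underlying system calls: the scratch file name and the compareReads helper are assumptions made for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

let fileCounter = 0;

function compareReads() {
  // Hypothetical scratch file, created only for this demo.
  const file = path.join(os.tmpdir(), `io-demo-${process.pid}-${fileCounter++}.txt`);
  fs.writeFileSync(file, 'hello');
  const steps = [];

  return new Promise((resolve, reject) => {
    // Blocking style: the thread waits here until the data is available.
    steps.push(`sync read: ${fs.readFileSync(file, 'utf8')}`);

    // Non-blocking style: the call returns at once; the callback runs later.
    fs.readFile(file, 'utf8', (err, text) => {
      fs.unlinkSync(file); // clean up the scratch file
      if (err) return reject(err);
      steps.push(`async read: ${text}`);
      resolve(steps);
    });

    steps.push('readFile returned before its data arrived');
  });
}

compareReads().then((steps) => console.log(steps.join('\n')));
```

The middle log entry is recorded between the call to fs.readFile and the arrival of its data, which is exactly the window in which a single thread can do other work.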

Pseudocode of an algorithm that uses a generic synchronous event demultiplexer to read from two different resources:

[Image: pseudocode of the event demultiplexer loop]

Here is what happens in the snippet above:

  1. The resources are added to a data structure (in our case watchedList), associating each one with a specific operation (e.g. read).

  2. The demultiplexer is set up with the group of resources to be watched. The call to demultiplexer.watch() is synchronous and blocks until any of the watched resources is ready to be read. When this occurs, the event demultiplexer returns from the call and a new set of events is available to be processed.

  3. Each event returned by the event demultiplexer is processed. At this point, the resource associated with each event is guaranteed to be ready to read and not to block during the operation. When all the events have been processed, the flow blocks again on the event demultiplexer until new events are available. This cycle is the so-called event loop.

Notice that with this pattern we can handle several I/O operations inside a single thread. That is why it is called demultiplexing: a single thread deals with multiple resources.
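The loop described in steps 1 to 3 can be sketched as a small simulation. The names watchedList and demultiplexer.watch() come from the text; FakeResource, FakeDemultiplexer, RESOURCE_CLOSED and runLoop are hypothetical, and the fake demultiplexer simply reports every watched resource as ready instead of blocking inside the operating system.

```javascript
const RESOURCE_CLOSED = Symbol('closed');

// Toy resource: hands out its chunks one by one, then reports "closed".
class FakeResource {
  constructor(name, chunks) {
    this.name = name;
    this.chunks = [...chunks];
  }
  read() {
    return this.chunks.length > 0 ? this.chunks.shift() : RESOURCE_CLOSED;
  }
}

// Toy demultiplexer: a real one would block in the OS until something is ready.
class FakeDemultiplexer {
  watch(watchedList) {
    const events = [...watchedList].map((resource) => ({ resource }));
    return events.length > 0 ? events : null; // null ends the loop
  }
}

function runLoop(resources) {
  const watchedList = new Set(resources); // step 1: register the resources
  const demultiplexer = new FakeDemultiplexer();
  const consumed = [];
  let events;
  // Step 2: wait for events. Step 3: process each one, then wait again.
  while ((events = demultiplexer.watch(watchedList))) {
    for (const event of events) {
      const data = event.resource.read(); // ready, so this never blocks
      if (data === RESOURCE_CLOSED) {
        watchedList.delete(event.resource); // stop watching closed resources
      } else {
        consumed.push(`${event.resource.name}:${data}`);
      }
    }
  }
  return consumed;
}

const result = runLoop([
  new FakeResource('socketA', ['a1', 'a2']),
  new FakeResource('fileB', ['b1']),
]);
console.log(result); // [ 'socketA:a1', 'fileB:b1', 'socketA:a2' ]
```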

Multithreaded network applications handle the network load like this:

request ---> spawn a thread
---> wait for database request
----> answer request

request ---> spawn a thread
---> wait for database request
----> answer request

request ---> spawn a thread
---> wait for database request
----> answer request

So the threads spend most of their time using 0% CPU, waiting for the database to return data. Meanwhile, memory has to be allocated for each thread, including a completely separate program stack. Each thread also has to be started, which is not as expensive as starting a full process but is still not cheap.

Since we spend most of our time using 0% CPU, why not run some code while we are not using the CPU? That way, each request still gets the same amount of CPU time as in a multithreaded application, but we don't need to start a thread. This is what happens in a single-threaded environment:

request -> make DB req
request -> make DB req
request -> make DB req
DB req complete -> send response
DB req complete -> send response
DB req complete -> send response

We can see that using only one thread doesn't impair our ability to run multiple I/O bound tasks concurrently. The tasks are spread over time, instead of being spread across multiple threads.
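The single-threaded timeline above can be reproduced with a short sketch. Here fakeDbQuery is a hypothetical stand-in for a database call (it just resolves after a delay), and handleRequests is a made-up driver; no real database is involved.

```javascript
// Hypothetical stand-in for a database call: resolves after `ms` milliseconds.
function fakeDbQuery(id, ms) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`row for request ${id}`), ms);
  });
}

async function handleRequests() {
  const log = [];

  const handle = async (id) => {
    log.push(`request ${id} -> make DB req`);
    const row = await fakeDbQuery(id, 50); // the thread is free while waiting
    log.push(`DB req ${id} complete -> send response`);
    return row;
  };

  // All three requests are started before any of them completes.
  await Promise.all([handle(1), handle(2), handle(3)]);
  return log;
}

handleRequests().then((log) => console.log(log.join('\n')));
```

All three "make DB req" lines are logged before the first "send response" line, and the whole run takes roughly one query's delay rather than three.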

Let me now introduce the reactor pattern, which is the heart of Node.js.

The main idea behind the reactor pattern is to have a handler associated with each I/O operation. A handler in Node.js is represented by a callback function. The handler is invoked as soon as an event is produced and processed by the event loop. So the reactor pattern handles I/O by blocking until new events are available from a set of observed resources, and then reacts by dispatching each event to its associated handler.

The structure of the reactor pattern is shown below:

[Image: structure of the reactor pattern]

  1. The application generates a new I/O operation by submitting a request to the Event Demultiplexer. The application also specifies a handler, which will be invoked when the operation completes. Submitting a new request to the Event Demultiplexer is a non-blocking operation: it returns control to the application immediately.

  2. When a set of I/O operations completes, the Event Demultiplexer pushes a set of corresponding events into the Event Queue.

  3. After receiving a set of events from the Event Demultiplexer, the Event Loop iterates over the items of the Event Queue.

  4. For each event, the associated handler is invoked.

  5. The handler, which is part of the application code, gives control back to the Event Loop when its execution completes (a).
    While the handler executes, it can request new asynchronous operations, which causes new items to be added to the Event Demultiplexer (b).

  6. When all the items in the Event Queue have been processed, the Event Loop blocks again on the Event Demultiplexer, which triggers another cycle when a new event is available.

A Node.js application will exit when there are no more pending operations in the event demultiplexer and no more events to be processed inside the event queue.
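The six steps, including the exit condition, can be condensed into a toy reactor. This is a simplified, fully synchronous sketch and not how libuv is implemented: ToyReactor, submit, demultiplex and run are hypothetical names, and every submitted operation "completes" as soon as the toy demultiplexer is asked.

```javascript
class ToyReactor {
  constructor() {
    this.pending = [];    // operations handed to the toy demultiplexer
    this.eventQueue = []; // completed operations waiting for their handler
  }

  // Step 1: submit an operation with its handler; returns immediately.
  submit(operation, handler) {
    this.pending.push({ operation, handler });
  }

  // Step 2: completed operations are pushed into the event queue.
  demultiplex() {
    const completed = this.pending.splice(0);
    for (const { operation, handler } of completed) {
      this.eventQueue.push({ result: operation(), handler });
    }
  }

  // Steps 3 to 6: drain the queue, then go back to the demultiplexer.
  run() {
    while (this.pending.length > 0) {
      this.demultiplex();
      while (this.eventQueue.length > 0) {
        const { result, handler } = this.eventQueue.shift();
        handler(result); // step 5b: a handler may call submit() again
      }
    }
    // Nothing pending and nothing queued: the loop (and the program) ends.
  }
}

const trace = [];
const reactor = new ToyReactor();
reactor.submit(() => 'config', (data) => {
  trace.push(`read ${data}`);
  reactor.submit(() => 'users', (rows) => trace.push(`queried ${rows}`));
});
reactor.submit(() => 'ping', (data) => trace.push(`got ${data}`));
reactor.run();
console.log(trace); // [ 'read config', 'got ping', 'queried users' ]
```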

Each OS has its own interface for the event demultiplexer and each I/O operation can behave quite differently depending on the type of resource, even within the same OS.

  • To handle these inconsistencies, the Node.js core team created a native library called libuv, which is written in C.
  • libuv is the low-level I/O engine of Node.js. It is a higher-level abstraction over the OS event demultiplexer, which makes Node.js compatible with all the major operating systems and normalises the non-blocking behaviour of the different types of resource.
  • It also implements the reactor pattern, providing an API for creating event loops, managing the event queue, running asynchronous I/O operations and queuing other types of tasks.
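  • Internally, libuv maintains a thread pool for operations that the OS cannot perform in a non-blocking way, such as file system access and DNS lookups, as well as CPU-intensive operations like crypto and zlib. Network I/O does not go through this pool; it uses the OS event demultiplexer directly. The pool has a finite size (four threads by default, configurable through the UV_THREADPOOL_SIZE environment variable), so with four threads only four files can be read at the same time.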
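As a rough illustration of work that goes through the libuv thread pool, the sketch below submits several crypto.pbkdf2 jobs at once. The helper names deriveKey and deriveAll, the salt and the iteration count are assumptions for the example; the pool size is read from UV_THREADPOOL_SIZE and falls back to the default of 4.

```javascript
const crypto = require('node:crypto');

// Wraps the callback-based crypto.pbkdf2 in a promise.
function deriveKey(password) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, 'demo-salt', 10000, 32, 'sha256', (err, key) => {
      if (err) return reject(err);
      resolve(key.toString('hex'));
    });
  });
}

async function deriveAll(passwords) {
  // All jobs are submitted at once; the pool works on at most `poolSize`
  // of them in parallel and the rest wait for a free thread.
  const poolSize = Number(process.env.UV_THREADPOOL_SIZE) || 4;
  const keys = await Promise.all(passwords.map((p) => deriveKey(p)));
  return { poolSize, keys };
}

deriveAll(['a', 'b', 'c', 'd', 'e', 'f']).then(({ poolSize, keys }) => {
  console.log(`derived ${keys.length} keys with a pool of ${poolSize} threads`);
});
```

While the keys are being derived on pool threads, the main thread stays free to run other callbacks.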

The final high-level architecture of Node.js includes:

[Image: high-level architecture of Node.js]

  • A set of bindings responsible for wrapping and exposing libuv and other low-level functionality to JavaScript.

  • V8, the JavaScript engine originally developed by Google for the Chrome browser. This is one of the reasons why Node.js is so fast and efficient.

  • A core JavaScript library that implements the high-level Node.js API.
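A quick way to see these building blocks in an installed runtime is to inspect process.versions, which reports the versions of the bundled components. The exact version strings depend on the Node.js release you run, so none are shown here.

```javascript
// process.versions lists the components bundled with the running Node.js.
const { node, v8, uv } = process.versions;

console.log(`Node.js ${node} bundles V8 ${v8} and libuv ${uv}`);
```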

Conclusion
Node.js architecture is a hot topic in backend interviews. A deep understanding of the asynchronous nature of Node.js is a must for every Node.js developer who wants to write efficient code. I really hope you have enjoyed reading this article. I strongly recommend the book Node.js Design Patterns if you want to learn more about Node.js. In the next article, we will talk more about the event loop.

References:

  1. Node.js Design Patterns by Mario Casciaro and Luciano Mammino

  2. Stack Overflow, definitely :)

See you guys. Bye :)

Top comments (7)

Gabriel Aquino Castelo Branco • Edited

Awesome, Altamash, thanks a lot. I'm learning NodeJS. I'm from Brazil and I'm wanting to write (on the future xD) a paper in Portuguese about NodeJS architecture. If I do it, could I mention you on and use pieces of your article? (My english is not good yet, sorry)

Altamash Ali

Thank you for appreciation. Definitely you can mention this article for reference 🙂.

Milad Tehrany

Awesome, thanks

Martin Nepita

Such of great article, thanks !

Altamash Ali

Thank you 🙂 I am glad you like it!

snigdha

Nice one Altamash!

Altamash Ali

Thank you very much Snigdha :)