Everything you need to know about Node.js

Jorge Ramón ・ 9 min read

Node.js is one of the most popular technologies nowadays for building scalable and efficient REST APIs. It is also used to build hybrid mobile applications, desktop applications and even Internet of Things projects.

I have been working with Node.js for about 6 years and I really love it. This post tries to be an ultimate guide to understanding how Node.js works.

Let's get started!!

The World Before Node.js

Multi Threaded Server

Web applications were written in a client/server model where the client would demand resources from the server and the server would respond with the resources. The server only responded when the client requested and would close the connection after each response.

This pattern is efficient because every request to the server takes time and resources (memory, CPU, etc.). To attend to the next request, the server must complete the previous one.

So, does the server attend to one request at a time? Well, not exactly: when the server gets a new request, the request will be processed by a thread.

A thread, in simple words, is the time and resources the CPU gives to execute a small unit of instructions. With that said, the server attends to multiple requests at once, one per thread (this is also called the thread-per-request model).

To attend to N requests at once, the server needs N threads. If the server gets request N+1, that request must wait until one of those N threads is available.

In the Multi Threaded Server example, the server allows up to 4 requests (threads) at once and when it receives the next 3 requests, those requests must wait until any of those 4 threads is available.
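The waiting described above can be sketched with a small simulation. This is only an illustration under simplifying assumptions: all requests arrive at time 0, every request takes the same amount of time, and `simulateThreadPool` is a hypothetical helper, not part of any server framework.

```javascript
// Simulates a thread-per-request server: each request occupies one thread
// for `duration` time units; requests beyond the pool size must wait.
function simulateThreadPool(durations, poolSize) {
  // freeAt[i] is the time at which thread i becomes available.
  const freeAt = new Array(poolSize).fill(0);

  return durations.map((duration, request) => {
    // Pick the thread that becomes free first.
    const thread = freeAt.indexOf(Math.min(...freeAt));
    const start = freeAt[thread];
    freeAt[thread] = start + duration;
    return { request, thread, waited: start, finished: start + duration };
  });
}

// 7 requests, 10 time units each, on a server with 4 threads.
const timeline = simulateThreadPool([10, 10, 10, 10, 10, 10, 10], 4);
console.log(timeline.map((r) => r.waited)); // [ 0, 0, 0, 0, 10, 10, 10 ]
```

The first 4 requests start immediately, while the last 3 wait until a thread is released.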

One way to solve this limitation is to add more resources (memory, CPU cores, etc.) to the server, but maybe that's not a good idea at all...

And of course, there will be technological limitations.

Blocking I/O

The number of threads in a server isn't the only problem here. Maybe you are wondering why a single thread can't attend to 2 or more requests at once. That's because of blocking Input/Output operations.

Suppose you are developing an online store and it needs a page where the user can view all your products.

The user goes to http://yourstore.com/products and the server renders an HTML file with all your products from the database. Pretty simple, right?

But what happens behind the scenes?

  1. When the user goes to /products, a specific method or function needs to be executed to attend to the request, so a little piece of code (maybe yours or the framework's) parses the requested URL and searches for the right method or function. The thread is working. ✔️

  2. The method or function is executed, as well as the first lines. The thread is working. ✔️

  3. Because you are a good developer, you save all system logs in a file. Of course, to be sure the route is executing the right method/function, you log a "Method X executing!!" string. That's a blocking I/O operation. The thread is waiting.

  4. The log is saved and the next lines are being executed. The thread is working again. ✔️

  5. It's time to go to the database and get all products. A simple query such as SELECT * FROM products does the job, but guess what? That's a blocking I/O operation. The thread is waiting.

  6. You get an array or list of all products but to be sure you log them. The thread is waiting.

  7. With those products it's time to render a template, but before rendering it you need to read it first. The thread is waiting.

  8. The template engine does its job and the response is sent to the client. The thread is working again. ✔️

  9. The thread is free, like a bird. 🕊️
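The steps above can be sketched with Node.js's synchronous file APIs, which block the calling thread just like the server described here. Everything specific in this sketch is an assumption made for illustration: the file names, the template placeholder, and `queryProductsSync`, which stands in for a blocking database driver and returns hard-coded data.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical files, created in the temp directory for this illustration.
const logFile = path.join(os.tmpdir(), 'store-example.log');
const templateFile = path.join(os.tmpdir(), 'products-example.html');
fs.writeFileSync(templateFile, '<ul>{{items}}</ul>');

// Stand-in for "SELECT * FROM products". A real blocking driver would make
// the thread wait on the network; this stub returns immediately.
function queryProductsSync() {
  return ['Keyboard', 'Mouse', 'Monitor'];
}

function handleProductsRequest() {
  // Step 3: the thread waits until the log line is written to disk.
  fs.appendFileSync(logFile, 'Method X executing!!\n');
  // Step 5: the thread would wait for the database here.
  const products = queryProductsSync();
  // Step 6: the thread waits on the disk again.
  fs.appendFileSync(logFile, JSON.stringify(products) + '\n');
  // Step 7: the thread waits while the template is read.
  const template = fs.readFileSync(templateFile, 'utf8');
  // Step 8: the thread is working again, rendering the response.
  const items = products.map((p) => `<li>${p}</li>`).join('');
  return template.replace('{{items}}', items);
}

console.log(handleProductsRequest());
// <ul><li>Keyboard</li><li>Mouse</li><li>Monitor</li></ul>
```

Every `...Sync` call keeps the thread idle until the operation finishes, which is exactly the waiting marked in the list.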

How slow are I/O operations? Well, it depends.
Let's check the table below:

| Operation     | Number of CPU ticks |
| ------------- | ------------------- |
| CPU Registers | 3 ticks             |
| L1 Cache      | 8 ticks             |
| L2 Cache      | 12 ticks            |
| RAM           | 150 ticks           |
| Disk          | 30,000,000 ticks    |
| Network       | 250,000,000 ticks   |

Disk and Network operations are too slow. How many queries or external API calls does your system make?

In summary, I/O operations make threads wait and waste resources.
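To put the numbers from the table in perspective, here is a small calculation of how much slower disk and network operations are than a RAM access. The tick counts are copied from the table; `timesSlowerThanRam` is just a helper name chosen for this sketch.

```javascript
// CPU ticks per operation, copied from the table above.
const ticks = {
  registers: 3,
  l1Cache: 8,
  l2Cache: 12,
  ram: 150,
  disk: 30000000,
  network: 250000000,
};

// How many RAM accesses fit in the time of one operation of the given kind?
function timesSlowerThanRam(operation) {
  return Math.round(ticks[operation] / ticks.ram);
}

console.log(timesSlowerThanRam('disk'));    // 200000
console.log(timesSlowerThanRam('network')); // 1666667
```

In other words, a thread that waits for one network call could have done more than a million RAM accesses in the meantime.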

The C10K Problem

The Problem

In the early 2000s, servers and client machines were slow. The problem was about concurrently handling 10,000 client connections on a single server machine.

But why couldn't our traditional thread-per-request model solve the problem? Well, let's do some math.

Native thread implementations allocate about 1 MB of memory per thread, so 10k threads require 10 GB of RAM just for the thread stacks, and remember we are in the early 2000s!!

Nowadays servers and client machines are better than that, and almost any programming language and/or framework solves the problem. Actually, the problem has been updated to handling 10 million client connections on a single server machine (also called the C10M Problem).
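The math can be written down as a one-line function. This is a rough estimate only: it assumes the 1 MB per thread figure from the text and uses decimal units (1 GB = 1000 MB) to match its round numbers.

```javascript
// Rough memory needed just for thread stacks in a thread-per-request server.
// Decimal units are assumed: 1 GB = 1000 MB.
function threadStackMemoryGB(connections, stackSizeMB) {
  return (connections * stackSizeMB) / 1000;
}

console.log(threadStackMemoryGB(1000, 1));  // 1
console.log(threadStackMemoryGB(10000, 1)); // 10
```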

Javascript to the rescue?

Spoiler alert 🚨🚨🚨!!
Node.js solves the C10K problem... but why?!

Server-side Javascript wasn't new in the early 2000s: there were a few implementations on top of the Java Virtual Machine, like RingoJS and AppEngineJS, based on the thread-per-request model.

But if those didn't solve the C10K problem, then why did Node.js?! Well, it's because Javascript is single threaded.

Node.js and the Event Loop

Node.js

Node.js is a server-side platform built on Google Chrome's Javascript Engine (V8 Engine) which compiles Javascript code into Machine code.

Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. It's not a Framework, it's not a Library, it's a runtime environment.

Let's write a quick example:

// Importing native http module
const http = require('http');

// Creating a server instance where every call
// the message 'Hello World' is responded to the client
const server = http.createServer(function(request, response) {
  response.write('Hello World');
  response.end();
});

// Listening port 8080
server.listen(8080);

Non-blocking I/O

Node.js uses non-blocking I/O, which means:

  1. The main thread won't be blocked in I/O operations.
  2. The server will keep attending requests.
  3. We will be working with asynchronous code.
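A minimal sketch of what these three points look like in practice: the file read is requested, the main thread keeps going, and the callback runs later. The temp file name is an assumption made so the example is self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file, created so the example has something to read.
const file = path.join(os.tmpdir(), 'non-blocking-example.txt');
fs.writeFileSync(file, 'some content');

const order = [];

order.push('read requested');
fs.readFile(file, 'utf8', function (err, content) {
  // Runs later, once the file has been read (or the read has failed).
  order.push('file read');
  console.log(order);
  // [ 'read requested', 'main thread keeps going', 'file read' ]
});
order.push('main thread keeps going');
```

The callback can only run after the current synchronous code has finished, so 'main thread keeps going' is always recorded before 'file read'.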

Let's write an example: on every /home request the server sends an HTML page; otherwise the server sends the text 'Hello World'. To send the HTML page, it is necessary to read the file first.

home.html

<html>
  <body>
    <h1>This is home page</h1>
  </body>
</html>

index.js

const http = require('http');
const fs = require('fs');

const server = http.createServer(function(request, response) {
  if (request.url === '/home') {
    fs.readFile(`${ __dirname }/home.html`, function (err, content) {
      if (!err) {
        response.setHeader('Content-Type', 'text/html');
        response.write(content);
      } else {
        response.statusCode = 500;
        response.write('An error has occurred');
      }

      response.end();
    });
  } else {
    response.write('Hello World');
    response.end();
  }
});

server.listen(8080);   

If the requested URL is /home, we read the home.html file using the native fs module.

The functions passed to http.createServer and fs.readFile are called callbacks. Those functions will execute sometime in the future (the first one when the server gets a request and the second one when the file has been read and the content is buffered).

While reading the file Node.js can still attend requests, even to read the file again, all at once in a single thread... but how?!

The Event Loop

The Event Loop is the magic behind Node.js. In short, the Event Loop is literally an infinite loop and is the only thread available.

libuv is a C library that implements this pattern, and it's one of the core dependencies of Node.js. You can read more about it in the libuv documentation.

The Event Loop has six phases, the execution of all phases is called a tick.

  • timers: this phase executes callbacks scheduled by setTimeout() and setInterval().
  • pending callbacks: executes I/O callbacks deferred to the next loop iteration.
  • idle, prepare: only used internally.
  • poll: retrieves new I/O events and executes I/O-related callbacks (almost all, with the exception of close callbacks, the ones scheduled by timers, and setImmediate()); node will block here when appropriate.
  • check: setImmediate() callbacks are invoked here.
  • close callbacks: some close callbacks, such as socket.on('close', ...).
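The order of the phases can be observed with a small experiment. When a timer and an immediate are both scheduled from inside an I/O callback, the loop reaches the check phase before it wraps around to timers, so the setImmediate() callback runs first. The use of `fs.stat` on the temp directory is only a convenient way to get into an I/O callback; any asynchronous I/O call would do.

```javascript
const fs = require('fs');
const os = require('os');

const order = [];

fs.stat(os.tmpdir(), function () {
  // We are now inside an I/O callback.
  setTimeout(() => order.push('timeout'), 0);  // runs in the timers phase
  setImmediate(() => order.push('immediate')); // runs in the check phase
});

// Print the recorded order once everything above has had time to run.
setTimeout(() => console.log(order), 100); // [ 'immediate', 'timeout' ]
```

Outside of an I/O callback (for example, at the top level of a script) the relative order of these two calls is not guaranteed.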

Okay, so there is only one thread and that thread is the Event Loop, but then who executes the I/O operations?

Pay attention 📢📢📢!!!
When the Event Loop needs to execute an I/O operation, it hands it off through the libuv library, which uses an OS thread from a pool (or the operating system's own asynchronous mechanisms, as for network sockets). When the job is done, the callback is queued to be executed by the Event Loop.

Isn't that awesome?

The Problem with CPU Intensive Tasks

Node.js seems to be perfect, you can build whatever you want.

Let's build an API to calculate prime numbers.

A prime number is a whole number greater than 1 whose only factors are 1 and itself.

Given a number N, the API must calculate and return the first N prime numbers in a list (or array).

primes.js

function isPrime(n) {
  for(let i = 2, s = Math.sqrt(n); i <= s; i++)
    if(n % i === 0) return false;
  return n > 1;
}

function nthPrime(n) {
  let counter = n;
  let iterator = 2;
  let result = [];

  while(counter > 0) {
    isPrime(iterator) && result.push(iterator) && counter--;
    iterator++;
  }

  return result;
}

module.exports = { isPrime, nthPrime };

index.js

const http = require('http');
const url = require('url');
const primes = require('./primes');

const server = http.createServer(function (request, response) {
  const { pathname, query } = url.parse(request.url, true);

  if (pathname === '/primes') {
    const result = primes.nthPrime(query.n || 0);
    response.setHeader('Content-Type', 'application/json');
    response.write(JSON.stringify(result));
    response.end();
  } else {
    response.statusCode = 404;
    response.write('Not Found');
    response.end();
  }
});

server.listen(8080);

primes.js is the prime numbers implementation: isPrime checks whether a given number N is prime, and nthPrime returns the first N prime numbers (despite its name).

index.js creates a server and uses the library in every call to /primes. The N number is passed through query string.

To get the first 20 prime numbers we make a request to http://localhost:8080/primes?n=20.

Suppose there are 3 clients trying to access this amazing non-blocking API:

  • The first one requests every second the first 5 prime numbers.
  • The second one requests every second the first 1,000 prime numbers.
  • The third one requests once the first 10,000,000,000 prime numbers, but...

When the third client sends the request the main thread gets blocked and that's because the prime numbers library is CPU intensive. The main thread is busy executing the intensive code and won't be able to do anything else.
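The blocking can be reproduced without a server. In this sketch a timer is scheduled to fire as soon as possible, but it cannot run until the prime calculation releases the main thread. The functions mirror the ones in primes.js; the value 20000 is an arbitrary choice that is heavy enough to notice yet finishes quickly.

```javascript
function isPrime(n) {
  for (let i = 2, s = Math.sqrt(n); i <= s; i++)
    if (n % i === 0) return false;
  return n > 1;
}

// Returns the first n prime numbers, like nthPrime in primes.js.
function nthPrime(n) {
  const result = [];
  for (let i = 2; result.length < n; i++)
    if (isPrime(i)) result.push(i);
  return result;
}

const order = [];

// Scheduled first, with no delay...
setTimeout(() => order.push('timer fired'), 0);

// ...but the main thread is busy with CPU intensive work.
const primes = nthPrime(20000);
order.push('primes computed');

setTimeout(() => console.log(order), 50); // [ 'primes computed', 'timer fired' ]
```

An incoming HTTP request is in the same position as that timer: its callback has to wait until the calculation is over.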

But what about libuv? If you remember, this library helps Node.js perform I/O operations with OS threads to avoid blocking the main thread, and you are right, that's the solution to our problem. But to use libuv's threads directly, our library would have to be written in C++.

Thankfully, Node.js v10.5 introduced Worker Threads.

Worker Threads

As the documentation says:

Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.

Fixing the code

It's time to fix our initial code:

primes-workerthreads.js

const { workerData, parentPort } = require('worker_threads');

function isPrime(n) {
  for(let i = 2, s = Math.sqrt(n); i <= s; i++)
    if(n % i === 0) return false;
  return n > 1;
}

function nthPrime(n) {
  let counter = n;
  let iterator = 2;
  let result = [];

  while(counter > 0) {
    isPrime(iterator) && result.push(iterator) && counter--;
    iterator++;
  }

  return result;
}

parentPort.postMessage(nthPrime(workerData.n));

index-workerthreads.js

const http = require('http');
const url = require('url');
const { Worker } = require('worker_threads');

const server = http.createServer(function (request, response) {
  const { pathname, query } = url.parse(request.url, true);

  if (pathname === '/primes') {
    const worker = new Worker('./primes-workerthreads.js', { workerData: { n: query.n || 0 } });

    worker.on('error', function () {
      response.statusCode = 500;
      response.write('Oops there was an error...');
      response.end();
    });

    let result;
    worker.on('message', function (message) {
      result = message;
    });

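    worker.on('exit', function () {
      // 'exit' also fires after 'error'; the 500 response was already sent then.
      if (response.headersSent) return;

      response.setHeader('Content-Type', 'application/json');
      response.write(JSON.stringify(result));
      response.end();
    });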
  } else {
    response.statusCode = 404;
    response.write('Not Found');
    response.end();
  }
});

server.listen(8080);

On every call, index-workerthreads.js creates a new instance of the Worker class (from the native worker_threads module) to load and execute the primes-workerthreads.js file in a worker thread. When the list of prime numbers has been calculated, the message event is fired, sending the result to the main thread. Because the job is done, the exit event is also fired, letting the main thread send the data to the client.

primes-workerthreads.js changes a little bit. It imports workerData (parameters passed from main thread) and parentPort which is the way we send messages to the main thread.

Now let's do the 3 clients example again to see what happens:

The main thread doesn't block anymore 🎉🎉🎉🎉🎉!!!!!

It worked as expected, but spawning worker threads like that isn't best practice: creating a new thread isn't cheap. Be sure to create a pool of threads beforehand.
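Here is a minimal sketch of what such a pool could look like, assuming a fixed number of workers created once and a simple queue of waiting jobs. `createPrimesPool` and its `run`/`close` methods are hypothetical names invented for this example, the worker code is inlined as a string (using the `eval: true` option of Worker) only to keep everything in one file, and error handling is left out for brevity.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined as a string so this sketch fits in one file.
// Each worker stays alive and answers every { id, n } message.
const workerSource = `
  const { parentPort } = require('worker_threads');

  function isPrime(n) {
    for (let i = 2, s = Math.sqrt(n); i <= s; i++)
      if (n % i === 0) return false;
    return n > 1;
  }

  function nthPrime(n) {
    const result = [];
    for (let i = 2; result.length < n; i++)
      if (isPrime(i)) result.push(i);
    return result;
  }

  parentPort.on('message', ({ id, n }) => {
    parentPort.postMessage({ id, primes: nthPrime(n) });
  });
`;

// Hypothetical helper: a fixed-size pool with a FIFO queue of waiting jobs.
function createPrimesPool(size) {
  const workers = [];
  const idle = [];           // workers ready to take a job
  const queue = [];          // jobs waiting for a free worker
  const pending = new Map(); // job id -> resolve function of its promise
  let nextId = 0;

  // Hand queued jobs to idle workers until one of the two runs out.
  function dispatch() {
    while (idle.length > 0 && queue.length > 0) {
      idle.pop().postMessage(queue.shift());
    }
  }

  for (let i = 0; i < size; i++) {
    const worker = new Worker(workerSource, { eval: true });
    worker.on('message', ({ id, primes }) => {
      pending.get(id)(primes); // resolve the promise returned by run()
      pending.delete(id);
      idle.push(worker);       // this worker is free again
      dispatch();
    });
    workers.push(worker);
    idle.push(worker);
  }

  return {
    // Queue a job; the promise resolves with the first n primes.
    run(n) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        queue.push({ id, n });
        dispatch();
      });
    },
    // Stop all workers so the process can exit.
    close() {
      return Promise.all(workers.map((worker) => worker.terminate()));
    },
  };
}

// Usage: two threads created once, shared by three jobs.
const pool = createPrimesPool(2);
Promise.all([pool.run(5), pool.run(3), pool.run(1)]).then((results) => {
  console.log(results); // [ [ 2, 3, 5, 7, 11 ], [ 2, 3, 5 ], [ 2 ] ]
  return pool.close();
});
```

In the server, the pool would be created once at startup and each /primes request would call `pool.run(n)` instead of constructing a new Worker.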

Conclusion

Node.js is a powerful technology, worth learning.
My recommendation is to always be curious: if you know how things work, you will make better decisions.

That's all for now, folks. I hope you learned something new about Node.js.
Thanks for reading and see you in the next post ❤️.

Posted on Sep 26 '18 by:

Jorge Ramón

@jorge_rockr

Full Stack Developer at @Globant. Javascript is all that you need.

Discussion

 

Thanks for the great post!

Let's say I need a worker to do heavy CPU calculations. How does the worker work under the hood? Does it start on a separate CPU core and use 100% of that core?
What if a CPU has only one core? Will a Node worker help in this case, or will it be useless to start a worker because there are no free resources to split between the main thread and the worker thread?

 

Thanks for commenting.

Node.js by default works in a single CPU core, so a worker thread will spawn and execute in the same CPU core as the main thread.

If you want to deploy a Node.js application across all CPU cores, you need to write some code using the native cluster module. Thankfully there is a library called PM2 which does the dirty work for you and deploys a Node.js application on all CPU cores with its built-in load balancer.

The real problem with worker threads is spawning them like crazy, since creating a subthread (worker thread) isn't cheap (in terms of CPU time and resources), but it's cheaper than forking the same process.

 

Well, if a separate worker and main thread are executed in one CPU core, then they share the same resource.
Is it correct that if we write heavy CPU code in "chunks", i.e. we will return execution context from a heavy function to the function which handles http requests with some interval, then we will emulate workers? But in this case it's not necessary for Node to allocate resources for a subthread, i.e. it's cheaper in terms of performance.

Remember, a Worker is actually an OS thread, so I don't think they will be emulated in a single CPU core case. Anyway, that's an interesting question, but unfortunately workers are pretty "new", so there isn't much information about them yet.

 

@jorge_rockr amazing article, i learned a lot about nodejs.
Question: I am building a mobile-responsive website, and my stack has Nuxt.js (vue.js framework) for the front end, talking to back-end APIs in Laravel (php framework). Nuxt.js is built on top of node (I think?) and I am using it for server-side-rendering. What are your thoughts on this architecture, versus using something purely in node.js?

 

Hi! Thank you so much for reading my article.

Yeah, Nuxt.js and Next.js (React) are built on top of Node.js and it's cool since you can use your Frontend knowledge adding some Backend code.

In fact, using pure Node.js could have a better performance but it could be more complex and hard to maintain.

 

@jorge_rockr it was a phenomenal article, thank you for writing it.

Thanks for the recommendation. Sounds like I should stick with back-end APIs in Laravel, and use Nuxt for my front end. Hopefully I should see a lot of the user-benefits one can get from using Node with server-side-rendering for SEO.

 

Thanks for this clear article on how the things are done in background using nodejs....I have some question : what is the difference between worker threads and os threads....in prime number example why os threads are not used by default by libuv when doing CPU heavy task...what is the limit for the os threads...

 

As said in the article, your code must be written in C++ in order to be able to use the OS threads pool.

Hence the introduction of worker threads which can be used directly in JS.

 

Yeah, he is right :D.

That's why DB drivers are written in C++.

 

Node.js is also a variety of ready-to-use packages written for the Node.js, thanks to which we can easily connect to almost any service or database. The most common Node itechcraft.com/node-js/ integration dialect, however, is JSON, which speaks well with the NoSQL databases.

 

Jorge Ramón, thank you for this awesome article!)

The company I am working at, in January-February 2020 starts the open-source project for Node.js developers (microservices)!
Warm welcome🥳
Spectrum: spectrum.chat/yap?tab=posts (community chat)
GitBook: manual.youngapp.co/community-edition/ (docs)
Twitter: twitter.com/youngapp_pf (news)
GitHub: github.com/youngapp/yap (docs)
(click🌟star to support us and stay connected🙌)

 

Thank you for sharing this awesome post! Node.js rocks🤟

My team just completed an open-sourced Content Moderation Service built Node.js, TensorFlowJS, and ReactJS that we have been working over the past weeks. We have now released the first part of a series of three tutorials - How to create an NSFW Image Classification REST API and we would love to hear your feedback. Any comments & suggestions are more than welcome. Thanks in advance! 😊

 

Wow! Amazing Article!
Thanks for sharing ❤️

 

Amazing! I'm surprised about how much you know about the internal process of Node.js, Can you suggest me some resources (aside of documentation) to learn advanced concepts?

 
 

Incredible explanation. Before this post I didn't have the slightest idea of how Node.js really worked. Excellent.

 
 

Yes, it's very powerful, but how do you build dynamic sites that use a MySQL db, like in PHP?

 

If you want to build a site in a PHP style you need a template engine.

There are many template engines out there. It's better to use the Express framework and any of the following template engines:

expressjs.com/en/guide/using-templ...

But if you really love PHP you can use it with Express Framework too:

npmjs.com/package/php-express

 
 

As someone who is new to coding and just beginning to learn Node.js, this article was super helpful to get an overview of how Node.js works and why it's beneficial. Thanks for sharing!

 

Welcome! I'm glad it was useful for you

 

You might also like to visit : learnnodeonline.blogspot.com/
More blogs are about to come.

 

Nice overview Jorge. Question for you - with the advent of asynch enabled REST endpoints in .NET, specifically Web API, does this mitigate the C10M problem you referred to somewhat?

 

According to Microsoft's official documentation:

For example, in Windows an OS thread makes a call to the network device driver and asks it to perform the networking operation via an Interrupt Request Packet (IRP) which represents the operation. The device driver receives the IRP, makes the call to the network, marks the IRP as "pending", and returns back to the OS. Because the OS thread now knows that the IRP is "pending", it doesn't have any more work to do for this job and "returns" back so that it can be used to perform other work.
When the request is fulfilled and data comes back through the device driver, it notifies the CPU of new data received via an interrupt. How this interrupt gets handled will vary depending on the OS, but eventually the data will be passed through the OS until it reaches a system interop call (for example, in Linux an interrupt handler will schedule the bottom half of the IRQ to pass the data up through the OS asynchronously). Note that this also happens asynchronously! The result is queued up until the next available thread is able to execute the async method and "unwrap" the result of the completed task.
Throughout this entire process, a key takeaway is that no thread is dedicated to running the task. Although work is executed in some context (that is, the OS does have to pass data to a device driver and respond to an interrupt), there is no thread dedicated to waiting for data from the request to come back. This allows the system to handle a much larger volume of work rather than waiting for some I/O call to finish.

Please note there is no thread dedicated to waiting for data from the request to come back, so that means they are not using Reactor Pattern (i.e. an Event Loop), they are really using purely OS resources (threads and interrupts) and that is awesome!!

I don't know if that's enough to solve C10M problem, maybe with big server resources such as 128 RAM, etc.

 
 

thanks for explaining worker threads more clearly!

 
 
 

Are there any resources you recommend for beginners?

 

There are many Youtube videos like this to get started with Node.js:

youtube.com/watch?v=RLtyhwFtXQA

The common pattern to learn is Node.js + Express + MongoDB.

 
