Lucas Santos

Posted on Aug 28, 2019 • Edited on Nov 13, 2023

Node.js Under The Hood #1 - Getting to know our tools

#node #javascript #cpp #advanced

I was recently invited to speak at a huge Brazilian conference called The Conf.

The whole point of the conference is to create content in English, so that everyone, not only Portuguese-speaking Brazilians, can benefit from it in the future by watching the recorded talks online.

You can also watch this series as a video.

I felt the content I delivered in my previous presentations was not as advanced and deep as I wanted it to be. So I decided to write a talk about how Node.js, JavaScript and the whole Node.js ecosystem actually work, because most programmers just use these things without ever really knowing what they do or how they work.

In our present world, this is "fine": we have plenty of libraries that spare us from reading book after book about our processor's architecture just so we can code a simple clock in assembly. However, this has made us lazy. Using things without understanding them created an atmosphere where everyone reads just enough to build what they need and forgets all the concepts that come with it. After all, copying and pasting Stack Overflow code is much easier.

So, with that in mind, I decided to deep-dive into Node.js internals, at least to show how things are glued together and how most of our code actually runs in the Node.js environment.

This is the first of several articles on this theme, which I compiled and studied in order to prepare my talk. I won't post all the references in this first article since there are a lot of them. Instead, I'll split the content into several articles, each covering a part of the study, and in the last article I'll post the references and the slides from my talk.

Hope you all like it :D

Goal

The goal of this series is to explain how Node.js works internally. Node.js and JavaScript are worldwide celebrities thanks to their libraries, but hardly anyone knows how they actually work under the hood. To do this, we'll try to cover several topics:

What is Node.js
1. Brief history
2. A brief history of JavaScript itself
3. Elements that are part of Node.js
Following through an I/O file read function call
JavaScript
1. How does it work under the hood?
  1. Callstack
2. Memory allocation
Libuv
1. What is libuv?
2. Why do we need it?
3. EventLoop
4. Microtasks and Macrotasks
V8
1. What is V8
2. Overview
  1. Abstract Syntax Tree using Esprima
3. Old compiling pipeline
  1. The full codegen
  2. Crankshaft
    1. Hydrogen
    2. Lithium
4. The new compiling pipeline
  1. Ignition
  2. TurboFan
    1. Hidden Classes and variable allocation
5. Garbage collection
Compiler optimizations
1. Constant Folding
2. Induction Variable Analysis
3. Rematerialization
4. Removing Recursion
5. Deforestation
6. Peephole Optimisations
7. Inline Expansion
8. Inline Caching
9. Dead Code Elimination
10. Code Block Reordering
11. Jump Threading
12. Trampolines
13. Common subexpression elimination

What is Node.js

Ryan Dahl (the original creator) defines Node.js as a "set of libraries that run on top of the V8 engine, allowing us to run JavaScript code on the server". Wikipedia defines it as "an open-source, cross-platform JavaScript runtime environment that executes code outside of a browser".

Essentially, Node.js is a runtime that allows us to execute JS outside the browser. However, it is not the first implementation of server-side JavaScript: in 1995, Netscape implemented Netscape Enterprise Server, which allowed users to run LiveScript (early JavaScript) on the server.

Brief History of Node.js

Node.js was written by Ryan Dahl and first released in 2009; its development was later sponsored by Joyent. The runtime was born out of the limited ability of the Apache HTTP Server, the most popular web server back then, to handle many concurrent connections. Dahl also criticized the prevailing sequential way of writing code, which either blocked the entire process or required multiple execution stacks when handling multiple simultaneous connections.

Node.js was first presented at JSConf EU on November 8th, 2009. It combined V8, an event loop and a low-level I/O API. (The event loop was originally built on libev; libuv, which provides it today, was written later, in 2011.)
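To make Dahl's criticism concrete, here's a minimal sketch (my own illustration, not taken from Dahl's talk or the Node.js docs) contrasting a blocking read with a non-blocking one. It writes a throwaway file to the OS temp directory (the file name is hypothetical) and records the order in which things happen.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical scratch file, created just for this demo
const file = path.join(os.tmpdir(), 'blocking-demo.txt')
fs.writeFileSync(file, 'hello')

const order = []

// Blocking: the whole process waits here until the read is done
const syncData = fs.readFileSync(file, 'utf8')
order.push(`sync read: ${syncData}`)

// Non-blocking: the read is handed off and execution continues immediately
const asyncDone = new Promise((resolve, reject) => {
  fs.readFile(file, 'utf8', (err, data) => {
    if (err) return reject(err)
    order.push(`async read: ${data}`)
    resolve(order)
  })
})
order.push('code after fs.readFile')

// Prints: [ 'sync read: hello', 'code after fs.readFile', 'async read: hello' ]
asyncDone.then((events) => console.log(events))
```

The callback of fs.readFile never runs before the code that follows it, which is what lets a single thread keep accepting work while I/O is in flight.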

Brief History of JavaScript itself

JavaScript is defined as a "high-level, interpreted scripting language" that conforms to the ECMAScript specification, which is maintained by TC39. JS was created in 1995 by Brendan Eich while he was working on a scripting language for the Netscape browser. JavaScript was created to fulfill Marc Andreessen's idea of a "glue language" between HTML and web designers, which should be easy to use to assemble components such as images and plug-ins, with the code written directly in the web page markup.

Brendan Eich was recruited to implement the Scheme language in Netscape, but, due to a partnership between Sun Microsystems and Netscape to include Java in Netscape Navigator, his focus switched to creating a language with a Java-like syntax. To defend the idea of JavaScript against other proposals, Eich wrote a working prototype in 10 days.

Standardization came a year later, when Netscape submitted JavaScript to ECMA International to carve out a standard specification that other browser vendors could implement based on Netscape's work. This led to the first ECMA-262 standard in 1997. ECMAScript 3 was released in December 1999 and became the baseline for modern JavaScript. ECMAScript 4 was mothballed because Microsoft had no intention of cooperating or implementing proper JavaScript in IE, even though they had no competing proposals and had a partial, but divergent, .NET implementation of the language on the server side.

In 2005, the open-source and developer communities set out to revolutionize what could be done with JavaScript. That year, Jesse James Garrett published the essay that coined the term Ajax, which sparked a renaissance of JavaScript usage led by open-source libraries like jQuery, Prototype, and MooTools. In 2008, with the whole community using JS again, ECMAScript 5 was announced, and it was released in 2009.

Elements that compose Node.js

Node.js is built on a few dependencies:

V8
Libuv
http-parser
c-ares
OpenSSL
zlib


With that said, we can split Node.js into two parts: V8 and Libuv. V8 is about 70% C++ and 30% JavaScript, while Libuv is almost completely written in C.
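All of these dependencies are bundled into the node binary, and `process.versions` reports which versions your installation ships with. A small sketch; the key names are real but vary between releases (for example, newer versions report `llhttp` instead of `http_parser`), so missing keys are handled rather than assumed:

```javascript
// process.versions lists the versions of Node.js and its bundled dependencies
const deps = ['node', 'v8', 'uv', 'zlib', 'ares', 'openssl', 'http_parser', 'llhttp']

for (const name of deps) {
  const version = process.versions[name]
  // Not every build reports every key (e.g. builds without OpenSSL)
  console.log(`${name.padEnd(12)} ${version ?? '(not reported by this build)'}`)
}
```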

Our example - I/O function call

In order to achieve our goal (and to have a clear roadmap of what we're going to do), we'll start by writing a simple program that reads a file and prints it to the screen. You'll see that this is not the optimal code a programmer can write, but it serves as an object of study for all the parts we are going to go through.

If you take a closer look at the Node.js source, you'll notice two main folders: lib and src. The lib folder contains the JavaScript definitions of all the built-in functions and modules we require in our projects. The src folder contains the C++ implementations that come along with them: this is where Node.js glues itself to libuv and V8 (whose own sources are vendored in the deps folder), and where the native parts of modules like fs, http, crypto and others reside.

Consider this simple program:



const fs = require('fs')
const path = require('path')
const filePath = path.resolve(`../myDir/myFile.md`)

// Parses the buffer into a string
function callback (data) {
  return data.toString()
}

// Transforms the function into a promise
const readFileAsync = (filePath) => {
  return new Promise((resolve, reject) => {
    fs.readFile(filePath, (err, data) => {
      if (err) return reject(err)
      return resolve(callback(data))
    })
  })
}

(() => {
  readFileAsync(filePath)
    .then(console.log)
    .catch(console.error)
})()

Yes, I know about util.promisify and fs.promises; however, I wanted to convert the callback into a promise manually so we could better understand how things actually work.

All the examples in this article will relate to this program. That's because fs.readFile is part of neither V8 nor JavaScript: its low-level work is implemented by Node.js as a C++ binding to the local OS, while the high-level fs.readFile(path, cb) API we use is implemented in JavaScript and calls those bindings. Here's the source code of this specific readFile function (the whole file is 1850 lines long, but it's in the references):
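For comparison, here's a sketch of the same read with those two built-in helpers. It writes its own temporary file (a hypothetical name) instead of ../myDir/myFile.md so it runs anywhere; both calls end up with the same string our manual readFileAsync produces.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const util = require('util')

// Hypothetical scratch file standing in for ../myDir/myFile.md
const filePath = path.join(os.tmpdir(), 'promisify-demo.md')
fs.writeFileSync(filePath, '# Hello')

// Option 1: let util.promisify wrap the (err, data) callback for us
const readFilePromisified = util.promisify(fs.readFile)

// Option 2: use the promise-based API that ships with fs
const readBoth = Promise.all([
  readFilePromisified(filePath).then((data) => data.toString()),
  fs.promises.readFile(filePath, 'utf8')
])

readBoth.then(([a, b]) => console.log(a, b)).catch(console.error)
```

util.promisify does essentially what our readFileAsync does by hand, while fs.promises skips the callback layer in JavaScript altogether.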



// https://github.com/nodejs/node/blob/0e03c449e35e4951e9e9c962ff279ec271e62010/lib/fs.js#L46
const binding = internalBinding('fs');
// https://github.com/nodejs/node/blob/0e03c449e35e4951e9e9c962ff279ec271e62010/lib/fs.js#L58
const { FSReqCallback, statValues } = binding;

// https://github.com/nodejs/node/blob/0e03c449e35e4951e9e9c962ff279ec271e62010/lib/fs.js#L283
function readFile(path, options, callback) {
  callback = maybeCallback(callback || options);
  options = getOptions(options, { flag: 'r' });
  if (!ReadFileContext)
    ReadFileContext = require('internal/fs/read_file_context');
  const context = new ReadFileContext(callback, options.encoding);
  context.isUserFd = isFd(path); // File descriptor ownership

  const req = new FSReqCallback();
  req.context = context;
  req.oncomplete = readFileAfterOpen;

  if (context.isUserFd) {
    process.nextTick(function tick() {
      req.oncomplete(null, path);
    });
    return;
  }

  path = getValidatedPath(path);
  binding.open(pathModule.toNamespacedPath(path),
               stringToFlags(options.flag || 'r'),
               0o666,
               req);
}

Disclaimer: the code references point to the GitHub source as of commit 0e03c449e35e4951e9e9c962ff279ec271e62010, the latest one at the time of writing, so this document will always point to the implementation as it was when I wrote it.

See the require call to internal/fs/read_file_context inside the if (!ReadFileContext) check? That's another JS file (which is in the references as well). At the end of the fs.readFile source code, we have a call to binding.open, which is a C++ call to open a file descriptor. It receives the path, the numeric open flags (stringToFlags converts the 'r' string into O_RDONLY), the file mode permissions in octal format (0o is the octal literal prefix introduced in ES6) and, lastly, the req object, which carries our file context and whose oncomplete handler will be called with the result.

Along with all that, we have internalBinding, the private internal C++ binding loader. It isn't accessible to end users (like us) because it's only available to internal modules loaded through NativeModule.require; it's the thing that actually loads C++ code. And this is where we depend on V8, A LOT.

So, basically, in the code above we're requiring the fs binding with internalBinding('fs'), which loads src/node_file.cc (the file whose code lives in the fs namespace). That file contains the C++ implementations behind FSReqCallback and statValues (the latter is a typed array used to pass stat results back to JavaScript, rather than a function).

FSReqCallback is the request object used for asynchronous calls such as fs.readFile (fs.readFileSync uses another class, FSReqWrapSync, instead). Its methods, and the code that exposes it as a binding, look like this:
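The same three ingredients (numeric flags, an octal mode and a completion callback) can be passed through the public fs.open API. This is a sketch using real fs calls on a hypothetical scratch file; the mode only matters when a call creates the file, so here it's passed purely to mirror binding.open.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical scratch file for the demo
const file = path.join(os.tmpdir(), 'open-flags-demo.txt')
fs.writeFileSync(file, 'flags demo')

// 0o is the octal literal prefix: 0o666 means rw-rw-rw-
console.log(0o666 === 438) // true

// The string flag 'r' corresponds to the numeric O_RDONLY constant
const { O_RDONLY } = fs.constants

const opened = new Promise((resolve, reject) => {
  // Same shape as binding.open(path, flags, mode, req), but via the public API
  fs.open(file, O_RDONLY, 0o666, (err, fd) => {
    if (err) return reject(err)
    const contents = fs.readFileSync(fd, 'utf8') // read through the descriptor
    fs.closeSync(fd)
    resolve(contents)
  })
})

opened.then(console.log, console.error)
```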
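You can check both sides of this split from user code. The sketch below only uses standard language features: internalBinding is simply not in scope for our modules, while fs.readFile, being JavaScript from lib/fs.js, still has its source text available to Function.prototype.toString, unlike a function implemented natively inside V8.

```javascript
const fs = require('fs')

// internalBinding is handed only to Node's own internal modules; it is not a
// global nor a CommonJS wrapper argument, so it isn't in scope here
console.log(typeof internalBinding) // 'undefined'

// The public fs.readFile is ordinary JavaScript, so its source text is printable
const readFileSource = fs.readFile.toString()
console.log(readFileSource.slice(0, 60))

// A function implemented natively inside V8 only shows a placeholder
console.log(Math.max.toString()) // function max() { [native code] }
```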



// https://github.com/nodejs/node/blob/0e03c449e35e4951e9e9c962ff279ec271e62010/src/node_file.cc

FileHandleReadWrap::FileHandleReadWrap(FileHandle* handle, Local<Object> obj)
  : ReqWrap(handle->env(), obj, AsyncWrap::PROVIDER_FSREQCALLBACK),
    file_handle_(handle) {}

void FSReqCallback::Reject(Local<Value> reject) {
  MakeCallback(env()->oncomplete_string(), 1, &reject);
}

void FSReqCallback::ResolveStat(const uv_stat_t* stat) {
  Resolve(FillGlobalStatsArray(env(), use_bigint(), stat));
}

void FSReqCallback::Resolve(Local<Value> value) {
  Local<Value> argv[2] {
    Null(env()->isolate()),
    value
  };
  MakeCallback(env()->oncomplete_string(),
               value->IsUndefined() ? 1 : arraysize(argv),
               argv);
}

void FSReqCallback::SetReturnValue(const FunctionCallbackInfo<Value>& args) {
  args.GetReturnValue().SetUndefined();
}

void NewFSReqCallback(const FunctionCallbackInfo<Value>& args) {
  CHECK(args.IsConstructCall());
  Environment* env = Environment::GetCurrent(args);
  new FSReqCallback(env, args.This(), args[0]->IsTrue());
}

// Create FunctionTemplate for FSReqCallback
Local<FunctionTemplate> fst = env->NewFunctionTemplate(NewFSReqCallback);
fst->InstanceTemplate()->SetInternalFieldCount(1);
fst->Inherit(AsyncWrap::GetConstructorTemplate(env));
Local<String> wrapString =
    FIXED_ONE_BYTE_STRING(isolate, "FSReqCallback");
fst->SetClassName(wrapString);
target
    ->Set(context, wrapString,
          fst->GetFunction(env->context()).ToLocalChecked())
    .Check();

In this last bit, there's a constructor definition: Local<FunctionTemplate> fst = env->NewFunctionTemplate(NewFSReqCallback). This basically says that when we call new FSReqCallback() in JavaScript, NewFSReqCallback will be called in C++. The target->Set(context, wrapString, fst->GetFunction(...)) call then exposes that constructor on the binding object under the name FSReqCallback; note that the context there is the V8 execution context, not the req.context property we set in JavaScript. Also see how oncomplete is looked up (through env()->oncomplete_string()) and called in ::Reject and ::Resolve.

It's also worth noting how req and context relate: req is created with new FSReqCallback(), and the result of new ReadFileContext (which holds our callback) is attached to it as req.context. This means that req is the JavaScript side of a C++ request object: when the operation finishes, the native code calls its oncomplete method (readFileAfterOpen in this case), which reaches our callback through req.context.
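To make this flow easier to follow, here's a rough plain-JavaScript model of the pattern. FakeReq and fakeNativeOpen are hypothetical names made up for this sketch; in Node.js the object comes from the C++ FSReqCallback and completion is driven by libuv, not setImmediate.

```javascript
// Hypothetical stand-in for the FSReqCallback object
class FakeReq {
  constructor () {
    this.context = null    // like req.context (the ReadFileContext)
    this.oncomplete = null // like req.oncomplete (readFileAfterOpen)
  }
}

// Hypothetical stand-in for binding.open: finishes "later", then calls
// oncomplete the way ::Resolve (null, value) and ::Reject (err) do
function fakeNativeOpen (path, req) {
  setImmediate(() => {
    if (path === '') req.oncomplete(new Error('ENOENT'))
    else req.oncomplete(null, 42) // pretend 42 is the file descriptor
  })
}

function openLikeReadFile (path, callback) {
  const req = new FakeReq()
  req.context = { callback } // carries the user's callback
  req.oncomplete = function (err, fd) {
    // Called as req.oncomplete(...), so `this` is req
    this.context.callback(err, fd)
  }
  fakeNativeOpen(path, req)
}

openLikeReadFile('/some/file', (err, fd) => console.log(err, fd)) // null 42
```

The key design point the model keeps is that the native side never sees our callback directly: it only knows about the request object and its oncomplete method.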

Conclusion

Right now we haven't seen much. However, in later articles, we'll dive deeper into how things actually work and how we can use our function to better understand our tooling!

See ya!

Top comments (29)

Lucas Santos • Aug 29 '19

Hey everyone! For those waiting for the next article in the series: it's already being written (about 60% done), so I guess it'll come out around November 10th :)

Thank you all for your support and feedback <3

Caio Borghi • Aug 30 '23

Outstanding!

You did all the work for me 😅

I started studying async/await and promises in JavaScript and ended up falling into a rabbit hole of learning that led me to:

Event Loop
Micro/MacroTasks
V8
LibUV

But it’s really laborious and the information is spread across the web.

This article is pure gold.

Thank you so much for writing it 😁

Lucas Santos • Oct 12 '23

Thanks a lot for the words! I hope I can keep up with the expectations and write more like this in the future haha

Nikolay Belichki • May 8 '23

As a C/C++ embedded software developer that recently got interested into cloud and web solutions - I would say your set of articles were a great help for me to understand the Node architecture and principles of operation. Great work!

Lucas Santos • May 10 '23

Thanks a lot! It's so nice to know that my content is still helping people even this many years later :D

K-Sato • Apr 21 '22 • Edited

A bit late to the party but this is definitely one of the most informative articles I've ever read on Dev.to!!!

Thank you!

Daniel Brum • Aug 28 '19

Great article, it definitely will help those who are looking for a deeper understanding of how Node works.

Lucas Santos • Aug 28 '19

Thanks man! I hope to help as many people as I can :D

Mary Krivokhat • Jan 6 '20

Lucas Santos, thank you for this awesome article!)

The company I am working at, in January-February 2020 starts the open-source project for Node.js developers (microservices)!
Warm welcome🥳
Spectrum: spectrum.chat/yap?tab=posts (community chat, to be launched soon)
GitBook: manual.youngapp.co/community-edition/ (docs)
Twitter: twitter.com/youngapp_pf (news)
GitHub: github.com/youngapp/yap (docs)
(click🌟star to support us and stay connected🙌)

Pablo Coronel • Sep 2 '19

Awesome post, Lucas!
It is very important to understand the reason of things.

Fortunately I came to your article just when I started studying Node.
Therefore, everything you explained serves as a map to start studying.

I await the next article.

Thank you very much from Argentina!

Lucas Santos • Sep 3 '19

Hey Pablo! Thank you so much for your words :D I really appreciate it! I'll work hard to get the next article up and running by the end of this week or maybe in the beginning of the next one!

Hope you like it!

Thanks a lot, from Brazil!

Noel Koutlis • Sep 3 '19

Ryan Dahl was already writing Nginx modules before he created nodejs. He even considered writing something on top of another language before choosing JavaScript. A major factor here was Google releasing V8 engine as open source
news.ycombinator.com/item?id=15140669

Lucas Santos • Sep 3 '19

Wow! Really cool! I didn't know about this! Thanks a lot for sharing :D

rmollel • Aug 30 '19

Great article. I'm looking forward to reading the next one. Can't wait!!!!

Victor Teodoro • Aug 31 '19

Great article! Very few people really know what goes on under node's hood until they have to actually implement some native modules for some specialized domain.

As a note, libuv is mostly written in C, not C++ (I think it was a typo tho :) ). There is a very didactic intro to it provided by the libuv team itself here: nikhilm.github.io/uvbook/An%20Intr.... You have to know some C first to use it but you don't need to be a specialist.

Lucas Santos • Sep 2 '19

Awesome man! Thanks for the correction, will fix it as soon as possible :D
