Node.js Under The Hood (7 Part Series)
Photo by Priscilla Du Preez on Unsplash
Starting with version 5.9, V8 replaced its old pipeline (composed of Full-Codegen and Crankshaft) with a new pipeline built on two brand-new components: the Ignition interpreter and the TurboFan compiler. This new pipeline is mostly why JS runs blazing fast nowadays.
Basically, the initial steps have not changed: we still need to parse all the JS code and generate an AST. However, Full-Codegen has been replaced by Ignition and Crankshaft has been replaced by TurboFan.
Ignition is a bytecode interpreter for V8, but why do we need an interpreter at all? Compiled code runs much faster than interpreted code. Ignition was created mainly to reduce memory usage. Since V8 had no interpreter, all code was compiled to machine code on the fly, and several parts of the code were actually compiled and recompiled more than once. This could lock up to 20% of the memory in V8's heap, which is especially bad for devices with little memory.
One thing to notice is that Ignition is not a parser; it is a bytecode interpreter, which means it works on bytecode. Basically, what Ignition does is take a bytecode source and optimise it into much smaller bytecode, removing unused code as well. This means that, instead of compiling the JS to machine code on the fly as before, Ignition parses the script and compiles it to bytecode, reducing compile time and producing a much smaller bytecode footprint.
So, in short, this old compiling pipeline:
Note that this is the intermediate step between the old compiling pipeline we just saw and the new compiling pipeline V8 uses now.
Has become this:
This means that the AST, which used to be the source of truth for the compilers, is now fed into Ignition, which walks all its nodes and generates bytecode that becomes the new source of truth for all compilers.
Essentially, what Ignition does is turn code into bytecodes, so it does things like this:
As you can see, this is a register-based interpreter, so you can see registers being explicitly manipulated throughout the function.
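For reference, here is a function whose body matches the bytecode discussed in this section. It is reconstructed from the instruction listing further down (a0, a1 and a2 hold the parameters a, b and c, and r0 holds a local d), so the exact source and the names f and d are assumptions, written in TypeScript for illustration:

```typescript
// Hypothetical source reconstructed from the bytecode listing:
// a, b and c arrive in the argument registers a0, a1 and a2,
// and the local d is kept in r0.
function f(a: number, b: number, c: number): number {
  const d = c - 100; // LdaSmi #100, Sub a2, Star r0
  return a + d * b; // Ldar a1, Mul r0, Add a0, Return
}

console.log(f(1, 2, 3)); // prints -193, i.e. 1 + (3 - 100) * 2
```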
r0 represents a local variable or a temporary expression that needs to be stored on the stack. The model to keep in mind is an infinite register file: since these are not machine registers, they are allocated in the stack frame when the function starts. In this specific function only one such register is used. Once the function starts, r0 is allocated on the stack as undefined. The other registers (a0 to a2) are the arguments of the function (a, b and c), which are passed in by the caller, so they are on the stack as well; this means we can operate on them as registers too.
There is also another implicit register called the accumulator, which is kept in a machine register. This is where all input and output goes, meaning the results of operations and variable loads.
Reading that bytecode, we get this set of instructions:
LdaSmi #100 -> Load the constant 100 into the accumulator (Smi stands for Small Integer)
Sub a2 -> Subtract the accumulator (100) from the a2 parameter (which is c) and store the result in the accumulator
Star r0 -> Store the value in the accumulator into r0
Ldar a1 -> Load the value of the a1 parameter (b) into the accumulator
Mul r0 -> Multiply r0 by the accumulator and store the result in the accumulator
Add a0 -> Add the first parameter a0 (a) to the accumulator and store the result in the accumulator
Return -> Return the value in the accumulator
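To make the accumulator model concrete, here is a toy interpreter in TypeScript that executes exactly the seven instructions above. It is a minimal sketch, not Ignition's implementation: the Instr type, the run function and the Map-based register file are hypothetical simplifications, and binary operations compute register <op> accumulator, matching the reading of Sub a2 as c - 100.

```typescript
// Toy accumulator-based register interpreter, loosely modelled on the
// Ignition bytecodes listed above. Names and types are hypothetical.
type Reg = string; // "a0", "a1", ... for arguments, "r0", ... for locals

type Instr =
  | { op: "LdaSmi"; value: number }
  | { op: "Ldar"; src: Reg }
  | { op: "Star"; dst: Reg }
  | { op: "Sub"; src: Reg }
  | { op: "Mul"; src: Reg }
  | { op: "Add"; src: Reg }
  | { op: "Return" };

function run(code: Instr[], args: number[]): number {
  // Register file: arguments are on the frame, locals start as undefined.
  const regs = new Map<Reg, number | undefined>();
  args.forEach((v, i) => regs.set(`a${i}`, v));
  regs.set("r0", undefined);

  let acc: number | undefined = undefined; // the implicit accumulator

  const read = (r: Reg): number => {
    const v = regs.get(r);
    if (v === undefined) throw new Error(`register ${r} holds undefined`);
    return v;
  };
  const accValue = (): number => {
    if (acc === undefined) throw new Error("accumulator holds undefined");
    return acc;
  };

  for (const ins of code) {
    switch (ins.op) {
      case "LdaSmi": acc = ins.value; break; // load small integer
      case "Ldar": acc = read(ins.src); break; // register -> accumulator
      case "Star": regs.set(ins.dst, acc); break; // accumulator -> register
      // Binary ops: result = register <op> accumulator, kept in the accumulator.
      case "Sub": acc = read(ins.src) - accValue(); break;
      case "Mul": acc = read(ins.src) * accValue(); break;
      case "Add": acc = read(ins.src) + accValue(); break;
      case "Return": return accValue();
    }
  }
  throw new Error("reached the end of the bytecode without Return");
}

const program: Instr[] = [
  { op: "LdaSmi", value: 100 },
  { op: "Sub", src: "a2" },
  { op: "Star", dst: "r0" },
  { op: "Ldar", src: "a1" },
  { op: "Mul", src: "r0" },
  { op: "Add", src: "a0" },
  { op: "Return" },
];

console.log(run(program, [1, 2, 3])); // prints -193
```

Running it with the arguments 1, 2 and 3 returns 1 + (3 - 100) * 2, the same value a plain function computing a + (c - 100) * b would return.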
We'll talk about bytecodes in depth in our next article.
After walking the AST, the generated bytecodes are fed, one at a time, into an optimisation pipeline. So, before Ignition interprets anything, optimisation techniques such as register optimisation, peephole optimisation and dead code removal are applied to the bytecode.
The optimisation pipeline is sequential, which makes it possible for Ignition to read smaller bytecode and interpret more optimised code.
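As an illustration of the kind of clean-up such a pipeline performs, here is a minimal TypeScript sketch of two passes over a toy, jump-free instruction list: a peephole pass that drops a Ldar reloading the register that the preceding Star just wrote, and a dead code pass that discards everything after the first Return. The instruction format and both function names are hypothetical; V8's real passes operate on its own bytecode format and have to deal with jumps.

```typescript
// Toy, jump-free instruction set (hypothetical; not V8's bytecode format).
type Instr =
  | { op: "LdaSmi"; value: number }
  | { op: "Ldar"; reg: string }
  | { op: "Star"; reg: string }
  | { op: "Add"; reg: string }
  | { op: "Return" };

// Peephole pass: "Star rX" leaves the value in the accumulator, so an
// immediately following "Ldar rX" reloads what is already there.
function peephole(code: Instr[]): Instr[] {
  const out: Instr[] = [];
  for (const ins of code) {
    const prev = out[out.length - 1];
    const redundantReload =
      ins.op === "Ldar" && prev !== undefined && prev.op === "Star" && prev.reg === ins.reg;
    if (!redundantReload) out.push(ins);
  }
  return out;
}

// Dead code removal: with no jumps, nothing after the first Return can run.
function removeDeadCode(code: Instr[]): Instr[] {
  const end = code.findIndex((ins) => ins.op === "Return");
  return end === -1 ? code : code.slice(0, end + 1);
}

const before: Instr[] = [
  { op: "LdaSmi", value: 1 },
  { op: "Star", reg: "r0" },
  { op: "Ldar", reg: "r0" }, // redundant: accumulator already holds r0
  { op: "Add", reg: "a0" },
  { op: "Return" },
  { op: "LdaSmi", value: 2 }, // unreachable
];

const after = removeDeadCode(peephole(before));
console.log(after.map((ins) => ins.op).join(" ")); // prints LdaSmi Star Add Return
```

The Star r0 survives because the peephole pass cannot tell whether r0 is read later; removing it would require the kind of liveness analysis that register optimisation performs.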
So this is the full pipeline, from the parser to Ignition:
The bytecode generator is, in fact, another compiler: one that compiles to bytecode, which the interpreter can execute, instead of to machine code.
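To show what a compiler that targets bytecode looks like, here is a minimal TypeScript sketch of a bytecode generator for a tiny expression language. The AST shape, the generate function and the register allocation strategy are assumptions for illustration. It leaves each expression's value in the accumulator and, when a binary operation's left operand is already a variable, uses that variable's register directly instead of spilling it to a temporary.

```typescript
// Tiny expression language (hypothetical AST shape, for illustration only).
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "bin"; op: "+" | "-" | "*"; left: Expr; right: Expr };

type Stmt =
  | { kind: "let"; name: string; init: Expr }
  | { kind: "return"; value: Expr };

const opcode = { "+": "Add", "-": "Sub", "*": "Mul" } as const;

// Compiles a function body to accumulator-style bytecode (as text).
function generate(params: string[], body: Stmt[]): string[] {
  const out: string[] = [];
  const regOf = new Map<string, string>();
  params.forEach((p, i) => regOf.set(p, `a${i}`)); // parameters live in a0, a1, ...
  let nextReg = 0;
  const newReg = (): string => `r${nextReg++}`; // locals and temporaries

  const lookup = (name: string): string => {
    const reg = regOf.get(name);
    if (reg === undefined) throw new Error(`unknown variable ${name}`);
    return reg;
  };

  // Emits code that leaves the value of e in the accumulator.
  const expr = (e: Expr): void => {
    switch (e.kind) {
      case "num":
        out.push(`LdaSmi #${e.value}`);
        return;
      case "var":
        out.push(`Ldar ${lookup(e.name)}`);
        return;
      case "bin": {
        // The opcode computes register <op> accumulator, so the left operand
        // must sit in a register: reuse a variable's register, or spill to a temp.
        let lhs: string;
        if (e.left.kind === "var") {
          lhs = lookup(e.left.name);
        } else {
          expr(e.left);
          lhs = newReg();
          out.push(`Star ${lhs}`);
        }
        expr(e.right);
        out.push(`${opcode[e.op]} ${lhs}`);
        return;
      }
    }
  };

  for (const s of body) {
    if (s.kind === "let") {
      expr(s.init);
      const reg = newReg();
      regOf.set(s.name, reg);
      out.push(`Star ${reg}`);
    } else {
      expr(s.value);
      out.push("Return");
    }
  }
  return out;
}

// d = c - 100; return a + d * b;
const v = (name: string): Expr => ({ kind: "var", name });
const body: Stmt[] = [
  { kind: "let", name: "d", init: { kind: "bin", op: "-", left: v("c"), right: { kind: "num", value: 100 } } },
  { kind: "return", value: { kind: "bin", op: "+", left: v("a"), right: { kind: "bin", op: "*", left: v("d"), right: v("b") } } },
];

console.log(generate(["a", "b", "c"], body).join("\n"));
// prints LdaSmi #100, Sub a2, Star r0, Ldar a1, Mul r0, Add a0, Return (one per line)
```

For this small example the strategy happens to emit the same seven instructions as the listing earlier in this article; a real generator also has to handle control flow, calls, property access and much more.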
Ignition is not written in C++, because that would require trampolines between interpreted and JIT-compiled functions, whose calling conventions are different.
It is also not written in hand-crafted assembly, like a lot of things in V8, because it would have to be ported to 9 different architectures, which is not practical.
Instead, Ignition is written using the backend of the TurboFan compiler, a write-once macro assembler that compiles to all architectures. As a bonus, Ignition gets the low-level optimisations that TurboFan generates for free.
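The "write once, compile to every architecture" idea can be sketched with an abstract assembler interface and one backend per target. Everything below is hypothetical: the MacroAssembler interface, the two backends and their register mappings are illustrative inventions (not V8's actual API or calling conventions), and the emitted assembly is simplified text rather than real machine code.

```typescript
// Hypothetical "write once" macro assembler: handlers are written against
// this interface, and each backend lowers it for one architecture.
// Register choices and mnemonics are simplified for illustration.
type VReg = 0 | 1; // virtual registers; 0 also holds the return value

interface MacroAssembler {
  add(dst: VReg, src: VReg): void; // dst = dst + src
  ret(): void; // return the value in virtual register 0
  listing(): string[];
}

class X64Backend implements MacroAssembler {
  private readonly regs = ["rax", "rbx"];
  private readonly out: string[] = [];
  add(dst: VReg, src: VReg): void {
    this.out.push(`add ${this.regs[dst]}, ${this.regs[src]}`); // two-operand form
  }
  ret(): void {
    this.out.push("ret");
  }
  listing(): string[] {
    return this.out;
  }
}

class Arm64Backend implements MacroAssembler {
  private readonly regs = ["x0", "x1"];
  private readonly out: string[] = [];
  add(dst: VReg, src: VReg): void {
    this.out.push(`add ${this.regs[dst]}, ${this.regs[dst]}, ${this.regs[src]}`); // three-operand form
  }
  ret(): void {
    this.out.push("ret");
  }
  listing(): string[] {
    return this.out;
  }
}

// A toy "handler", written once and reused for every backend.
function emitAddHandler(masm: MacroAssembler): string[] {
  masm.add(0, 1);
  masm.ret();
  return masm.listing();
}

for (const masm of [new X64Backend(), new Arm64Backend()]) {
  console.log(emitAddHandler(masm).join("; "));
}
// prints "add rax, rbx; ret" and then "add x0, x0, x1; ret"
```

The handler logic lives in one place; supporting another architecture means adding one more backend rather than rewriting every handler.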
The whole problem with this, besides the technical complexity, is that language features had to be implemented in different parts of the pipeline, and all of those parts had to be compatible with each other, including the code optimisations each of them generated. V8 used this compiling pipeline for a while, when TurboFan could not yet handle all the use cases, but it was eventually replaced by this other one:
As we saw in the previous chapter, Ignition came in to compile the parsed JS code into bytecode, which became the new source of truth for all compilers in the pipeline; the AST was no longer the single source of truth that every compiler relied on. This simple change made a number of optimisation techniques possible, such as faster removal of dead code, and also led to a much smaller memory and startup footprint.
Aside from that, TurboFan is clearly divided into 3 separate layers: the frontend, the optimizing layer and the backend.
The frontend layer is responsible for generating the bytecode that is run by the Ignition interpreter, while the optimizing layer is solely responsible for optimizing code using the TurboFan optimizing compiler. All other lower-level tasks, such as low-level optimisations, scheduling and generating machine code for the supported architectures, are handled by the backend layer; Ignition also relies on TurboFan's backend layer to generate its bytecode handlers. The separation of the layers alone led to 29% less machine-specific code than before.
The solution to this was outside the scope of TurboFan or Crankshaft: it was solved by creating Ignition. Optimizing the bytecode generated from the AST led to much smaller bytecode, which in turn led to a much smaller memory footprint, since further optimisations could be deferred to a later time. Executing code in the interpreter a while longer also gave the optimizing compiler more type feedback, which finally led to fewer deoptimisations caused by wrong type-feedback information.
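The link between running longer in the interpreter and fewer deoptimisations can be sketched with a single feedback slot. In this TypeScript sketch, which is a heavy simplification and not V8's feedback vector design, the AddSite class, its feedback states and the compile method are hypothetical: the interpreter path records which operand types an addition has seen, and compile() speculates on numbers only if that is all the feedback shows, deoptimising when the guess turns out to be wrong.

```typescript
// Hypothetical, heavily simplified model of type feedback for one "+" site.
type Value = number | string;
type Feedback = "none" | "number" | "any"; // only ever moves to the right

function combine(fb: Feedback, a: Value, b: Value): Feedback {
  const seen: Feedback = typeof a === "number" && typeof b === "number" ? "number" : "any";
  if (fb === "none") return seen;
  return fb === seen ? fb : "any";
}

class AddSite {
  feedback: Feedback = "none";
  deopts = 0;

  // Interpreter path: generic JS-style "+", recording the operand types seen.
  interpret(a: Value, b: Value): Value {
    this.feedback = combine(this.feedback, a, b);
    return typeof a === "number" && typeof b === "number" ? a + b : String(a) + String(b);
  }

  // "Optimising compiler": specialise on numbers only if that is all the feedback shows.
  compile(): (a: Value, b: Value) => Value {
    if (this.feedback !== "number") return (a, b) => this.interpret(a, b); // stay generic
    return (a, b) => {
      if (typeof a === "number" && typeof b === "number") return a + b; // speculative fast path
      this.deopts++; // guard failed: deoptimise...
      return this.interpret(a, b); // ...and fall back to the interpreter
    };
  }
}

// Compiled too early: only one number + number call has been observed.
const eager = new AddSite();
eager.interpret(1, 2);
const eagerCode = eager.compile();
eagerCode("a", "b");
console.log(eager.deopts); // prints 1

// Interpreted for longer: the string case shows up in the feedback first.
const patient = new AddSite();
patient.interpret(1, 2);
patient.interpret("a", "b");
const patientCode = patient.compile();
patientCode("a", "b");
console.log(patient.deopts); // prints 0
```

The eager site was compiled after a single call and guessed wrong; the patient one stayed in the interpreter long enough to see a string, so it never made the wrong speculation.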