Node.js Under the Hood #8 - Understanding Bytecodes

#node #javascript #v8 #assembly

We've been talking a lot about bytecodes lately. But what are bytecodes?

Bytecodes are abstractions of machine code. Think of them as something in between the code we can read and the code machines execute. Bytecodes are machine agnostic, which means they can be compiled for whatever machine architecture you're running on. That said, compiling bytecode to machine code is much easier if the bytecode was designed with the same computational model as the underlying CPU.

CPUs are Turing machines that are based on stacks, registers or states. V8's Ignition interpreter is register-based with an accumulator, just like most CPUs.

In the end, bytecode is translated into assembly/machine code which can be sent to the processor and executed.

You can think of JavaScript as a series of small building blocks. Each operator (or set of operators) has a bytecode notation in V8. So we have bytecodes for operators like typeof, add and sub, and we also have bytecodes for loads, like LdaSmi for small integers or LdaNamedProperty for property loads. The complete list can be found at the header file.

Registers

Ignition uses registers like r0, r1, r2 ... to store bytecode inputs or outputs, and each bytecode specifies which ones to use. Along with these registers, Ignition also has an accumulator register, which stores the results of operations; we'll call it acc. It's pretty much the same as the other registers, but operands do not name it at all: it is implicit. For instance, Sub r0 subtracts the value in the accumulator from the value in r0 and leaves the result in acc itself.

You'll see that many bytecodes start with Lda or Sta. The a stands for "accumulator", while Ld is "load" and St is "store". So, by intuition, LdaSmi [99] loads the small integer 99 into the accumulator, while Star r0 stores the value of the accumulator into the register r0.

The names are abbreviated because if we wrote "LoadSmallIntToAccumulator" instead of "LdaSmi", we'd have to allocate more memory just to store the bytecode name. The terse names are also why bytecodes scare a lot of people.

Shorter bytecodes = less memory
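To make the accumulator idea concrete, here is a minimal sketch of a toy accumulator machine in JavaScript. It is only an illustration, not how V8 implements Ignition: the run function, the array-based program format and the string register names are all assumptions made up for this example. Only the bytecode names come from V8.

```javascript
// Toy accumulator machine: an illustration only, not V8's implementation.
// `run` and the [op, operand] program format are hypothetical.
function run(program) {
  const registers = {}; // r0, r1, ... hold intermediate values
  let acc;              // the implicit accumulator register

  for (const [op, operand] of program) {
    switch (op) {
      case 'LdaSmi': acc = operand; break;             // load a small integer into acc
      case 'Star': registers[operand] = acc; break;    // store acc into a register
      case 'Ldar': acc = registers[operand]; break;    // load a register into acc
      case 'Add': acc = registers[operand] + acc; break; // result stays in acc
      case 'Mul': acc = registers[operand] * acc; break; // result stays in acc
      default: throw new Error(`Unknown bytecode: ${op}`);
    }
  }
  return acc;
}

// 40 + 2, using one register plus the accumulator
const result = run([
  ['LdaSmi', 40],
  ['Star', 'r0'],
  ['LdaSmi', 2],
  ['Add', 'r0'],
]);
console.log(result); // 42
```

Notice that Add and Mul only name one register. The other input and the output are always the accumulator, which is why the instructions can stay so short.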

Hands-on

Let's take a real bytecode from a real function in JavaScript. We're not using our readFile function since it'd be too complicated. Let's use this simple function:



function multiplyXByY (obj) {
  return obj.x * obj.y
}

multiplyXByY({ x: 1, y: 2 })

A small note: the V8 compiler is lazy, so if you don't run a function, it is not compiled, which means no bytecode is generated for it.

This will generate the following bytecode:



[generated bytecode for function: multiplyXByY]
Parameter count 2
Register count 1
Frame size 8
   22 E> 0x334a92de11fe @    0 : a5                StackCheck
   43 S> 0x334a92de11ff @    1 : 28 02 00 01       LdaNamedProperty a0, [0], [1]
         0x334a92de1203 @    5 : 26 fb             Star r0
   51 E> 0x334a92de1205 @    7 : 28 02 01 03       LdaNamedProperty a0, [1], [3]
   45 E> 0x334a92de1209 @   11 : 36 fb 00          Mul r0, [0]
   52 S> 0x334a92de120c @   14 : a9                Return
Constant pool (size = 2)
Handler Table (size = 0)

You can generate and print your bytecode using the command node --print-bytecode --print-bytecode-filter=functionName <file>

Let's ignore the header and the footer, since they're just metadata for the bytecodes.
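If the columns of that print look confusing, it can help to split one instruction line into its parts. The sketch below is a hypothetical helper (parseBytecodeLine and its field names are my own, not part of Node.js or V8), and the regular expression is only fitted to the output shown in this article, so other V8 versions may print something different. As far as I understand, S> marks a statement position and E> an expression position in the source.

```javascript
// Hypothetical helper: splits one line of --print-bytecode output into fields.
// The pattern is fitted to the dump shown in this article, nothing more.
const LINE = /^\s*(?:(\d+)\s+([SE])>\s+)?(0x[0-9a-f]+)\s+@\s+(\d+)\s+:\s+((?:[0-9a-f]{2}\s)+)\s*(\w+)\s*(.*)$/;

function parseBytecodeLine(line) {
  const match = LINE.exec(line);
  if (!match) return null; // header and footer lines, e.g. "Parameter count 2"

  const [, position, kind, address, offset, bytes, mnemonic, operands] = match;
  return {
    sourcePosition: position === undefined ? null : Number(position),
    positionKind: kind === undefined ? null : kind, // 'S' or 'E'
    address,                                        // where the instruction lives in memory
    offset: Number(offset),                         // byte offset inside the function
    bytes: bytes.trim().split(/\s+/),               // raw encoded bytes
    mnemonic,                                       // the bytecode name
    operands: operands.trim() === '' ? [] : operands.trim().split(/\s*,\s*/),
  };
}

const parsed = parseBytecodeLine(
  '   43 S> 0x334a92de11ff @    1 : 28 02 00 01       LdaNamedProperty a0, [0], [1]'
);
console.log(parsed.mnemonic, parsed.operands);
```

The raw bytes show how compact the encoding is: the whole LdaNamedProperty instruction takes four bytes, one for the bytecode itself and one per operand.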

LdaNamedProperty a0, [0], [1]

This bytecode loads a named property of a0 into the accumulator. Ignition identifies parameters as a0, a1, a2 ..., where the number is the index of the argument, so a0 is the first argument of the function (obj).

The name we're looking up is determined by the first bracketed operand: [0]. This constant is an index into a separate table, the constant pool, whose contents show up in the Constant Pool part of the print, but only in Node.js debug mode:



0x263ab302cf21: [FixedArray] in OldSpace
 - map = 0x2ddf8367abce <Map(HOLEY_ELEMENTS)>
 - length: 2
           0: 0x2ddf8db91611 <String[1]: x>
           1: 0x2ddf8db67544 <String[1]: y>

So we see that position 0 is x. The [1] is an index into what is called the "feedback vector", which contains runtime information that is used for optimizations.

Star r0

Star r0 stores the value that is currently in the accumulator, which is the value of the x property we just loaded, in the register r0.

LdaNamedProperty a0, [1], [3]

This is the same thing, but we're now loading the name at index 1, which is y, and using feedback vector index [3].

Mul r0, [0]

This operation multiplies the value in r0 (x) by the value that is currently in the accumulator (y) and stores the result in the accumulator. The [0] is, again, a feedback vector index.

Return

The return statement returns the value that is currently in the accumulator. It's also the end of the function. So the function caller will start with the result of our last bytecode operation – which is 2 – already in the accumulator.
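The whole walkthrough can be replayed with a small sketch that steps through the same instructions and records the accumulator after each one. Everything here is a simplified model built on assumptions: the execute function and the array format are made up for illustration, the constant pool is a plain array, and StackCheck and the feedback vector indexes are left out because they don't change the result.

```javascript
// Simplified, hypothetical model of the multiplyXByY bytecode.
// StackCheck and the feedback vector operands are deliberately omitted.
const constantPool = ['x', 'y'];

const bytecode = [
  ['LdaNamedProperty', 'a0', 0], // acc = a0[constantPool[0]]
  ['Star', 'r0'],                // r0 = acc
  ['LdaNamedProperty', 'a0', 1], // acc = a0[constantPool[1]]
  ['Mul', 'r0'],                 // acc = r0 * acc
  ['Return'],                    // hand acc back to the caller
];

function execute(bytecode, constantPool, args) {
  const registers = {};
  args.forEach((value, i) => { registers[`a${i}`] = value; }); // a0, a1, ...
  let acc;
  const trace = [];

  for (const [op, ...operands] of bytecode) {
    switch (op) {
      case 'LdaNamedProperty': {
        const [reg, nameIndex] = operands;
        acc = registers[reg][constantPool[nameIndex]];
        break;
      }
      case 'Star': registers[operands[0]] = acc; break;
      case 'Mul': acc = registers[operands[0]] * acc; break;
      case 'Return':
        trace.push(`Return: acc = ${acc}`);
        return { result: acc, trace };
      default: throw new Error(`Unknown bytecode: ${op}`);
    }
    trace.push(`${op} ${operands.join(', ')}: acc = ${acc}`);
  }
  throw new Error('Bytecode ended without Return');
}

const { result, trace } = execute(bytecode, constantPool, [{ x: 1, y: 2 }]);
console.log(trace.join('\n'));
console.log(result); // 2
```

Reading the trace top to bottom, the accumulator goes 1, 1, 2, 2, 2, which matches the steps above: x is loaded and parked in r0, y is loaded, and the product ends up in acc for the caller.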

Takeaways

Most bytecodes may seem like gibberish at first glance. But keep in mind that Ignition is a register machine with an accumulator, and they become much easier to read.

This would be the bytecode for our readFile function:



[generated bytecode for function: readFileAsync]
Parameter count 2
Register count 3
Frame size 24
         0x23e95d8a1ef6 @    0 : 84 00 01          CreateFunctionContext [0], [1]
         0x23e95d8a1ef9 @    3 : 16 fb             PushContext r0
         0x23e95d8a1efb @    5 : 25 02             Ldar a0
         0x23e95d8a1efd @    7 : 1d 04             StaCurrentContextSlot [4]
  261 E> 0x23e95d8a1eff @    9 : a5                StackCheck
  279 S> 0x23e95d8a1f00 @   10 : 13 01 00          LdaGlobal [1], [0]
         0x23e95d8a1f03 @   13 : 26 fa             Star r1
         0x23e95d8a1f05 @   15 : 81 02 00 02       CreateClosure [2], [0], #2
         0x23e95d8a1f09 @   19 : 26 f9             Star r2
         0x23e95d8a1f0b @   21 : 25 fa             Ldar r1
  286 E> 0x23e95d8a1f0d @   23 : 65 fa f9 01 02    Construct r1, r2-r2, [2]
  446 S> 0x23e95d8a1f12 @   28 : a9                Return
Constant pool (size = 3)
Handler Table (size = 0)

We can see it has a series of bytecodes specifically designed for several aspects of the language, such as closures, globals and so on. Can you read that bytecode? Leave it here in the comments :)

Thanks

A big thanks to Franziska Hinkelmann. Her articles and talks about V8 bytecodes are simply awesome and helped me a lot when I started studying this topic. Especially this article!