Mykola Musiienko

Compiling a small Scala subset into x86-64 assembly

Writing a toy compiler is a ton of fun!

That's probably why there are so many of them around. So what sets this one apart?

1) It compiles down to x86-64 assembly, then invokes GNU as and ld to produce a native executable. To me personally, this is much more rewarding than emitting MIPS assembly and running it in an emulator. I also prefer emitting assembly over emitting, say, C: at the very least, it forces the developer to figure out how to convert expressions into assembly. That really drives home how much work high-level languages do for us.

2) The language is simple but not too simple. Many mini-compilers implement languages with functions, primitive values, and little else. This project's language has classes, inheritance, virtual dispatch, and even a very simple form of pattern matching.

Scroll to the final sections for links to excellent compiler-learning resources. To cover every major topic with only one or two links, I had to make those sections opinionated. Give them a chance; they might click for you too.

Go to the project's repository to explore the source code.

Complexity of the toy compiler design

Contents

What are we going to compile?

Why a Scala subset?

How are we going to compile?

Lexing and parsing

Semantics

Assembly

Runtime library

Executable linking

Intermediate representations and optimizations

If you want to learn about compilers


What are we going to compile?

Let’s first get a quick taste of our programming language and dive into details a bit later. Here’s a small complete program in Cool 2020.

class QuickSort() extends IO() {
  def quicksort(array: ArrayAny, lo: Int, hi: Int): Unit = {
    if (lo < hi) {
      var p: Int = partition(array, lo, hi);
      quicksort(array, lo, p - 1);
      quicksort(array, p + 1, hi)
    } else ()
  };

  def partition(array: ArrayAny, lo: Int, hi: Int): Int = {
    var pivot: Int = array.get(lo) match { case i: Int => i };
    var p: Int = lo;
    var i: Int = lo + 1;
    while (i <= hi) {
      if (((array.get(i)) match { case i: Int => i }) <= pivot)
        array_swap(array, i, { p = p + 1; p })
      else
        ();
      i = i + 1
    };

    array_swap(array, p, lo);
    p
  };

  def array_swap(array: ArrayAny, p: Int, q: Int): Unit = {
    var tmp: Any = array.get(p);
    array.set(p, array.get(q));
    array.set(q, tmp)
  };

  def out_array(array: ArrayAny): Unit = {
    var i: Int = 0;
    while (i < array.length()) {
      array.get(i) match {
        case i: Int => out_int(i); out_string(" ")
      };

      i = i + 1
    };

    out_nl()
  };

  {
    var array: ArrayAny = new ArrayAny(5);
    array.set(0, 30);
    array.set(1, 20);
    array.set(2, 50);
    array.set(3, 40);
    array.set(4, 10);

    out_array(array);

    quicksort(array, 0, array.length() - 1);

    out_array(array)
  };
}

class Main() {
  { new QuickSort() };
}

Use this link to run the code.

Alex Aiken came up with the Classroom Object Oriented Language, or COOL, for his Compilers course. Our language's name is similar: Cool 2020. In spite of the similar names, the two languages are fairly different. Cool 2020 is a Scala subset (with a couple of minor incompatibilities) that John Boyland extracted and documented.

Why a Scala subset?

I wanted to focus on developing a compiler, so instead of coming up with my own language I looked for an existing educational one. Out of what I found on the Internet, Cool 2020 seemed to be the best fit.

We've already seen a sample program; you can also take a look at an implementation of Conway's Game of Life and at the grammar in the project's repository. Note that the grammar ignores operator precedence for the sake of simplicity; operator precedence is described here.

Going with a Scala subset has some advantages. One, since Scala is quite a modern language, Cool 2020 also has features typical of modern programming languages. Thanks to this, we get to think about and solve problems similar to those the developers of modern programming languages solve, though obviously on a much smaller scale.

An example of such a feature is the way we write constructors in Cool 2020. The constructor's parameters appear in parentheses right after the class name, and the constructor's code can be split across a number of code blocks in curly braces inside the class body.

Let’s take a look at a bit of code illustrating this feature. The constructor of the Greeter class below has one parameter — name. When the compiler sees this constructor parameter, it automatically adds an attribute name to the class, and emits the code to assign the parameter to the attribute. The method speak then uses the attribute name.

class Greeter(var name: String) {
  var io: IO = new IO();
  def speak(): Unit = {
    io.out_string("Hello, " + name + "!");
    io.out_nl()
  };
}

class Main() { { new Greeter("World").speak() }; }

Two, we can use Scala tooling. Syntax highlighting in editors and in this document is very nice to have. Scastie runs Scala code right in the browser; it's really useful for us too, as we can run Cool 2020 programs in Scastie.

Here are the bits of code we’ll need to run Cool 2020 in Scastie.

class ArrayAny(len: Int) {
  var _array = new Array[Any](len);
  def length(): Int = _array.length;
  def get(i: Int): Any = _array(i);
  def set(i: Int, value: Any): Unit = _array(i) = value;
}

class IO {
  def out_string(s: String): Unit = print(s);
  def out_int(i: Int): Unit = print(i);
  def out_nl(): Unit = println();
}

object Program {
  def main(args: Array[String]): Unit = {
    new Main()
  };
}

That’s it about Cool 2020 for the moment. If you happen to be looking for a language to implement in your own compiler, make sure to also take a look at ChocoPy. It’s very well documented, has static typing, classes, dynamic dispatch, nested functions, etc. At the same time ChocoPy manages to be concise and possible to implement in a limited time frame.

How are we going to compile?

Here’s how.

A compilation session demo

Well, to reproduce what happens in the gif, we’ll need to write a compiler.

I have wanted to practice F# for a long time, and what better way to practice a new programming language than a side project? A toy compiler is exactly that side project.

Now that we've decided on F#, let's get one thing out of the way: please don't expect to find purely functional code in the repository. While writing it, I quickly realised that trying to stick to functional programming concepts while learning about compilers was a little more than I could handle at the same time. So the code contains mutable state, loops, and classes.

Compile-time errors

Before we get down to coding, let’s do a bit of planning. One thing to decide on is what our compiler is supposed to do when it encounters an error in the user’s code.

The simplest thing to do is to output a message telling the user about the error and stop translation as soon as the compiler finds the first error. As a result, if there are multiple errors in the code, the user has to restart compilation multiple times to catch them all. It doesn't make for a good user experience.

A very different approach exists. When the compiler detects an error it informs the user, but instead of stopping, it keeps analysing the code. The goal is to find and report as many errors as possible in a single run. So that the user can fix all of them and then —  ideally — restart compilation only once.

Let's discuss one downside of this approach: it requires the compiler's code to be noticeably more complicated, since the compiler has to use heuristics to reduce cascading errors.

What’s a cascading error? The Scala code sample below contains one error. The closing parenthesis is missing in line 2.

/* 1 */ class CascadingError {
/* 2 */   def force(: Unit = {
/* 3 */   };  
/* 4 */ }

But the Scala compiler diagnoses two errors. It gets “confused” after seeing the first one, which is real. As a result, it reports a second error in line 3, where there is no actual problem with the code. This is a cascading error.

CascadingError.scala:2: error: identifier expected but ':' found.  
    def force(: Unit = {
              ^
CascadingError.scala:3: error: ':' expected but ';' found.  
    };  
     ^

In some cases, cascading errors make the diagnostic messages so noisy that users start looking for a way to make the compiler stop after the first encountered error. In such cases, figuring out which errors are real and which are merely cascading ones that will go away once the real ones are fixed is just not worth it.

To try to get the best of both worlds, some compilers combine the two error handling strategies we just discussed. A great example of going this way is the C# compiler. Here's a short description Eric Lippert gave on StackOverflow:

Briefly, the way the compiler works is it tries to get the program through a series of stages...

The idea is that if an early stage gets an error, we might not be able to successfully complete a later stage without (1) going into an infinite loop, (2) crashing, or (3) reporting crazy "cascading" errors . So what happens is, you get one error, you fix it, and then suddenly the next stage of compilation can run, and it finds a bunch more errors.

...

The up side of this design is that (1) you get the errors that are the most "fundamental" first, without a lot of noisy, crazy cascading errors , and (2) the compiler is more robust because it doesn't have to try to do analysis on programs where the basic invariants of the language are broken. The down side is of course your scenario: that you have fifty errors, you fix them all, and suddenly fifty more appear.

Let's see how this idea translates into the C# compiler's source code. The method CompileAndEmit of the class Microsoft.CodeAnalysis.CommonCompiler organizes the work related to compiling C# and VB.NET. Notice the invocations of compilation.GetDiagnostics; each one takes a different value of the enum CompilationStage as its argument. Every invocation is followed by a number of checks: did this translation stage complete successfully, should the translation stop or move on to the next stage?

Compilation stages

We’re going to follow the combined error handling strategy as best we can. Of course our code is going to be much simpler than the C# compiler’s. Let’s split compilation into the following stages.

  • Lexical analysis, or simply lexing;
  • Syntactic analysis, or simply parsing;
  • Semantic analysis;
  • Emitting x86-64 assembly.

Within one compilation stage, we're going to try to diagnose as many errors in the user's code as possible. But the next stage will only start if we don't detect any errors at the current one. Say the compiler finds syntactic errors at the parsing stage; in that case, semantic analysis won't even start.

The same idea applies to the lexing and parsing stages. If there are portions of the program's text we didn't manage to turn into tokens while lexing, parsing isn't going to start; the compiler will stop instead.

To see these ideas reflected in the project’s code, take a look at CompileToAsmStep.fs and the method Invoke of EmitExeHandler.

We also simplify things a lot here compared to the C# compiler, which performs lexing and parsing at the same time: if its lexer cannot recognize a token, the parser handles this scenario and keeps going anyway. But semantic analysis will not start if lexical or syntactic errors exist in the code; translation terminates.
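
Here's a minimal F# sketch of this stage gating idea, assuming each stage reports into a shared diagnostics collector. The names are illustrative and don't match the project's actual types; the real logic lives in CompileToAsmStep.fs and EmitExeHandler.Invoke.

// A sketch of stage gating: run a stage, then check the diagnostics
// collected so far; only continue to the next stage if there are no errors.
// Illustrative names; not the project's actual API.
type Diagnostics() =
    let mutable errorCount = 0
    member _.Error(message: string) =
        errorCount <- errorCount + 1
        eprintfn "error: %s" message
    member _.HasErrors = errorCount > 0

let runStages (lex: Diagnostics -> 'Tokens)
              (parse: 'Tokens -> Diagnostics -> 'Ast)
              (analyze: 'Ast -> Diagnostics -> 'Ast)
              (emit: 'Ast -> string)
              (diags: Diagnostics) : string option =
    let tokens = lex diags
    if diags.HasErrors then None
    else
        let ast = parse tokens diags
        if diags.HasErrors then None
        else
            let checkedAst = analyze ast diags
            if diags.HasErrors then None
            else Some (emit checkedAst)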

Lexing and parsing

First things first, we need a lexer and a parser.

The lexer’s goal is to take in source code as a sequence of characters and recognize tokens in it. A token can be a number literal, a string literal, an identifier, a language keyword, an operator, a punctuation mark, etc. In our specific case, we’ll throw away any whitespace and comments.

Next, the parser will try to convert its input sequence of tokens into an abstract syntax tree (AST).

An AST represents the structure of a program's text in the form of a tree. As far as compilation goes, it's hard to do anything useful with a raw character sequence, but a tree is a much more convenient data structure to work with. That's why we'll organize the next translation stages around the AST.

A program’s text

class Main() {
  var io: IO = new IO ();

  {
    io.out_string("Hello, World!" );
    io.out_nl()
  };
}

The AST, serialized to JSON

{ "classes": [
 { "name": "Main" ,
   "varformals": [],
   "extends": { "type": "Any" ,
                "actuals": [] },
   "body": [
    { "kind": "attribute" ,
      "name": "io" , "type": "IO" ,
      "value": { "kind": "new" ,
                 "type": "IO" ,
                 "actuals": [] } },
    { "kind": "block" ,
      "statements": [
       { "kind": "stmt" ,
         "expr":
         { "kind": "dispatch" ,
           "receiver": "io" ,
           "method": "out_string" ,
           "actuals": [ "\"Hello, World!\""  ] }
       },
       { "kind": "expr" ,
         "expr":
         { "kind": "dispatch" ,
           "receiver": "io" ,
           "method": "out_nl" ,
           "actuals": [] }
       }
     ]
    }
   ] } ] }
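
In F#, discriminated unions and records are a natural fit for modeling such a tree. Here's a simplified sketch of what AST types for this fragment could look like; the project's actual node types are richer (they carry source locations, for example), so treat this as an illustration only.

// A simplified AST sketch for the fragment above (illustrative only).
type Expr =
    | Int of int
    | Str of string
    | Id of string
    | New of typeName: string * actuals: Expr list
    | Dispatch of receiver: Expr * methodName: string * actuals: Expr list
    | Block of Expr list

type Feature =
    | Attribute of name: string * typeName: string * init: Expr
    | Method of name: string * formals: (string * string) list * returnType: string * body: Expr
    | BlockFeature of Expr list

type ClassDecl =
    { Name: string
      VarFormals: (string * string) list
      Extends: string * Expr list
      Body: Feature list }

type Program = { Classes: ClassDecl list }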

There are a number of ways to get an AST from source code: parser generators such as GNU Bison and ANTLR; parser combinator libraries such as Parsec; or writing a lexer and a parser from scratch. Which one should we go with? Well, if we have a chance to gain experience at least remotely similar to writing an enterprise-grade compiler, we should take that chance.

As far as I can tell, the teams of most contemporary enterprise-grade compilers built their lexers and recursive-descent parsers by hand. Some of them are: GCC, Clang, D, Rust, Swift, Go, Java, V8, TypeScript, C#, VB.NET.

Here’s how one of the C# compiler’s developers explained their decision to implement their lexer and parser manually.

Hello, I work on the C# compiler and we use a handwritten recursive-descent parser. Here are a few of the more important reasons for doing so:

  • Incremental re-parsing. If a user in the IDE changes the document, we need to reparse the file, but we want to do this while using as little memory as possible. To this end, we re-use AST nodes from previous parses.
  • Better error reporting. Parser generators are known for producing terrible errors. While you can hack around this, by using recursive-descent, you can get information from further "up" the tree to make it more relevant to the context in which the error occurred.
  • Resilient parsing. This is the big one! If you give our parser a string that is illegal according to the grammar, our parser will still give you a syntax tree! (We'll also spit errors out). But getting a syntax tree regardless of the actual validity of the program being passed in means that the IDE can give autocomplete and report type-checking error messages. As an example, the code var x = velocity.  is invalid C#. However, in order to give autocomplete on velocity, that code needs to be parsed into an AST, and then typechecked, and then we can extract the members on the type in order to provide a good user experience.

The story of GCC's parser is quite a telling example. GCC started out using the Bison parser generator, but in 2004 the developers decided to switch to a handwritten recursive-descent parser. Apparently, migrating turned out to be easier than trying to work around Bison's limitations. In the changelog, you can read the following.

A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantics analysis and parsing.

Now that we’ve considered all of the info above, the choice is clear: we’re going to write our lexer and parser by hand. In fact, it’s not difficult.

In his book Crafting Interpreters, Robert Nystrom explains writing lexers and recursive-descent parsers really, really well. Our parser is going to be recursive-descent too.

The lexer’s code is in Lexer.fs, if you feel like untangling it, consider starting from the method GetNext.
The parser’s code is in Parser.fs, and the method Parser.Parse is its entry point.
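
To give a feel for how recursive descent works, here's a tiny self-contained F# sketch that parses additive expressions such as a + b + c into a tree. It only illustrates the technique: one function per grammar rule, each consuming tokens and returning a node. The project's Parser.fs does the same thing for the full Cool 2020 grammar.

// A tiny recursive-descent sketch: additive := primary ("+" primary)*
// Purely illustrative; the real parser covers the whole Cool 2020 grammar.
type Token =
    | TId of string
    | TPlus
    | TEof

type ParsedExpr =
    | Name of string
    | Sum of ParsedExpr * ParsedExpr

let rec parseAdditive (tokens: Token list) : ParsedExpr * Token list =
    // Parse the first operand, then keep folding "+ primary" from the left.
    let left, rest = parsePrimary tokens
    parseAdditiveTail left rest

and parseAdditiveTail (left: ParsedExpr) (tokens: Token list) : ParsedExpr * Token list =
    match tokens with
    | TPlus :: rest ->
        let right, rest' = parsePrimary rest
        parseAdditiveTail (Sum (left, right)) rest'
    | _ -> left, tokens

and parsePrimary (tokens: Token list) : ParsedExpr * Token list =
    match tokens with
    | TId name :: rest -> Name name, rest
    | _ -> failwith "expected an identifier"

// parseAdditive [TId "a"; TPlus; TId "b"; TPlus; TId "c"; TEof]
// yields Sum (Sum (Name "a", Name "b"), Name "c"), a left-leaning tree.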

Semantics

Once parsing completes, the next compilation stage is semantic analysis.

Eric Lippert wrote a very interesting blog post, How many passes?. It lists some of the passes the C# compiler performs and gives a bit of detail on each. Spoiler ahead: there are more than twenty passes, and many of them are related to semantic analysis.

We’re going to perform semantic analysis over three passes. The method Translator.Translate is responsible for starting the analysis passes one after another and checking their results.

The first semantic pass is the simplest. The only thing it’s going to do is create a map, where the key is a class’ name and the value is the AST node corresponding to this class. We’ll need this map to look up class names during the second pass.

In case we meet two or more classes with the same name, we’ll report an error and stop compilation.

The first semantic pass is implemented by the function collect_class_nodes from Translator.fs.
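
The idea behind the first pass fits in a few lines. Here's a sketch under the assumption that each class node is reduced to a (name, node) pair and errors go through a reporting callback; the project's actual implementation is collect_class_nodes in Translator.fs.

// Sketch of the first semantic pass: build a map from class name to its AST
// node and report duplicate definitions. Illustrative; see collect_class_nodes.
open System.Collections.Generic

let collectClassNodes (classes: (string * 'Node) list)
                      (reportError: string -> unit) : Dictionary<string, 'Node> =
    let classNodes = Dictionary<string, 'Node>()
    for (name, node) in classes do
        if classNodes.ContainsKey name then
            reportError (sprintf "Class '%s' is defined more than once" name)
        else
            classNodes.Add(name, node)
    classNodes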

In the scope of the second semantic pass, we'll go over class definitions, their attributes, and their methods, but we will not "look" inside the methods' bodies. Along the way, we'll build a class metadata map, where the key is again a class' name, but the value this time is the list of the class' attributes and methods with all the associated types. This map will come in handy while performing the third semantic pass. The second pass also makes sure:

  • There are no loops in the inheritance hierarchy;
  • All classes referenced by name in the source code have definitions;
  • There are no methods which have two or more parameters with the same name;
  • The source code is free from a number of other similar problems.

If the second pass finds any semantic problems, the compiler informs the user about every one of them and stops.

The second semantic pass is implemented by the class ClassSymbolCollector. Translator.Translate sets up the second pass and starts it by a call to ClassSymbolCollector.Collect.
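
One of the checks listed above, detecting loops in the inheritance hierarchy, can be sketched as follows, assuming a map from a class name to its superclass name. The names and the representation are illustrative, not the project's actual code.

// Sketch of inheritance-cycle detection: starting from a class, repeatedly
// follow its superclass; if we revisit a class we've already seen, there's a loop.
let hasInheritanceCycle (parents: Map<string, string>) (className: string) : bool =
    let rec walk (current: string) (seen: Set<string>) =
        if Set.contains current seen then true
        else
            match Map.tryFind current parents with
            | Some parent -> walk parent (Set.add current seen)
            | None -> false   // reached a root class such as Any
    walk className Set.empty

// hasInheritanceCycle (Map.ofList [ "A", "B"; "B", "A" ]) "A" returns true.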

We're at a point where we have the class metadata map, so it's time to start the third semantic pass. What do we need the class metadata map for? Let's say the third pass is looking at a method invocation instance.some_method(). Thanks to the metadata map, we can make sure some_method is actually defined in the class of instance.

The third semantic pass analyses methods’ bodies. It makes sure the code doesn’t contain attempts to assign a String value to an Int variable. Or pass an Int argument to a method that expects a String. It also checks for other similar errors.

Another thing the third pass needs is a symbol table. A symbol table maps names from the current lexical scope to a collection of details on this name: does the name refer to a variable, a parameter, or an attribute; what’s its type, etc. If a name is not defined or not visible in the current scope, it won’t be present in the symbol table. Check out the class SymbolTable to see the code.
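
A symbol table with nested scopes can be sketched as a stack of maps: entering a scope pushes an empty map, and lookup searches from the innermost scope outwards. This is a simplification of the project's SymbolTable, which tracks more details per name.

// A simplified symbol table sketch: a stack of scopes, each mapping a name
// to some info about it (its kind, its type, and so on).
type SymbolTable<'Info>() =
    let mutable scopes : Map<string, 'Info> list = [ Map.empty ]

    member _.EnterScope() =
        scopes <- Map.empty :: scopes

    member _.LeaveScope() =
        scopes <- List.tail scopes

    member _.Add(name: string, info: 'Info) =
        // Shadowing: a name added in an inner scope hides the outer one.
        scopes <- Map.add name info (List.head scopes) :: List.tail scopes

    member _.TryResolve(name: string) : 'Info option =
        // Search from the innermost scope outwards; None means "not visible here".
        scopes |> List.tryPick (Map.tryFind name)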

Apart from diagnosing semantic errors, the third pass could build a new tree of the same shape as the AST but extended with additional info: the type of each node, the list of parameters and the return type for method nodes, and so on. To keep the amount of code down and its complexity in check, we're going to skip generating such an intermediate tree. Instead, we'll combine the third semantic pass with assembly generation.

The class ExprTranslator contains the third pass’ code. ExprTranslator.Translate is the entry point.

Assembly

OK, the parser built an AST for us. Along the way, the compiler made sure the code doesn’t contain syntactic or semantic errors. Now we’re going to walk the AST and emit assembly in the process. We’ll turn the assembly into an executable a bit later.

The Compilers course and many other learning materials have MIPS as their target for assembly generation. Surely, they have very good reasons for that and the fundamentals of generating assembly don’t depend on the particular instruction set architecture anyway. But in practice, I personally find generating x86-64 assembly much, much more motivating and rewarding. You need an emulator to run MIPS assembly — that is, unless you own an SGI workstation or a Tesla Model S. In contrast, executables produced from x86-64 assembly code can run natively on Windows or Linux right on a boring desktop PC.

Producing native executables has one more practical advantage: it makes it much simpler to automate compiling test programs, running them, and checking their results.

Here's an example of a test program. If the results of building and running it don't match the DIAG and OUT comments respectively, we consider the test failed.

class Main() {  
 { var io: IO = new IO();  
   io.out_string("Hello, Cool2020!" );  
   io.out_nl()  
 };  
}
// DIAG: Build succeeded: Errors: 0. Warnings: 0  
// OUT: Hello, Cool2020!
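
A test harness for this format can stay quite small. Here's an F# sketch that compiles a test program, runs the resulting executable, and compares its output against the OUT comments. The compiler's command name ("coolc" here) and the file names are hypothetical placeholders, and the real test code in the repository may look quite different.

// Sketch of a test runner for the OUT convention above.
// "coolc", the flags, and the output file name are hypothetical placeholders.
open System.Diagnostics
open System.IO

let run (fileName: string) (arguments: string) : string =
    let startInfo = ProcessStartInfo(fileName, arguments,
                                     RedirectStandardOutput = true,
                                     UseShellExecute = false)
    use proc = Process.Start(startInfo)
    let output = proc.StandardOutput.ReadToEnd()
    proc.WaitForExit()
    output

let expectedOutput (sourcePath: string) : string =
    // Collect the text after every "// OUT: " comment, in order.
    File.ReadAllLines sourcePath
    |> Seq.choose (fun line ->
        let marker = "// OUT: "
        let i = line.IndexOf(marker)
        if i >= 0 then Some (line.Substring(i + marker.Length)) else None)
    |> String.concat "\n"

let runTest (sourcePath: string) : bool =
    run "coolc" (sourcePath + " -o test_program") |> ignore
    let actual = (run "./test_program" "").TrimEnd()
    actual = (expectedOutput sourcePath).TrimEnd()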

Expressions

Emitting assembly corresponding to control flow statements is rather straightforward. For example, imagine we had to organize a loop using only if and goto statements. To a first approximation, we organize loops in assembly in the same way.
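
For instance, here's an F# sketch of lowering while (condition) body into labels and jumps. The emitters for the condition and the body, and the exact conditional jump instruction, are passed in as assumptions; this only illustrates the shape of the emitted code, not the project's actual emitter.

// Sketch of lowering `while (condition) body` into labels and jumps.
let mutable labelCounter = 0
let freshLabel (prefix: string) : string =
    labelCounter <- labelCounter + 1
    sprintf ".L_%s_%d" prefix labelCounter

let translateWhile (emitCondition: unit -> string list)  // e.g. emits a `cmp`
                   (jumpIfFalse: string)                 // e.g. "jge" for a `<` comparison
                   (emitBody: unit -> string list) : string list =
    let startLabel = freshLabel "while_start"
    let endLabel = freshLabel "while_end"
    [ sprintf "%s:" startLabel ]
    @ emitCondition ()
    @ [ sprintf "    %s %s" jumpIfFalse endLabel ]   // leave the loop when the condition is false
    @ emitBody ()
    @ [ sprintf "    jmp %s" startLabel              // jump back and re-check the condition
        sprintf "%s:" endLabel ]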

Translating expressions is more interesting. A program in assembly language is a sequence of instructions, each with zero to three operands. There is no assembly instruction that can evaluate even a simple expression like a + b + c + d. And expressions can be far more complex: they contain parentheses, different math operations, method invocations...

So, what can we do? The idea is to “pretend” that instead of one complex expression we have multiple simple expressions. An expression is simple if we can trivially translate it into assembly. Let’s take a look at a bit of pseudocode.

A complex expression
a + b + c + d

Multiple simple expressions

var tmp0: Int;
var tmp1: Int;
var result: Int;

tmp0 = a;
tmp1 = b;
tmp0 = tmp0 + tmp1;

// tmp0 = a + b
tmp1 = c;
tmp0 = tmp0 + tmp1;

// tmp0 = a + b + c
tmp1 = d;
tmp0 = tmp0 + tmp1;

// tmp0 = a + b + c + d
result = tmp0

// result = a + b + c + d

To better understand how this idea applies in practice, let’s follow a detailed example.

We’re translating a + b + c + d.

  • Adding parentheses to the expression, we get ([(a + b) + c] + d). (In this case, square brackets are equivalent to parentheses; we only use them to make reading easier.)
  • Now, we move the plus signs to the left, so that they're placed before the corresponding terms: (+ [+ (+ a b) c] d).
  • This last form of the expression represents a tree. Hopefully, the ASCII diagram below makes it obvious.
+  
├── +  
│   ├── +  
│   │   ├── a  
│   │   └── b  
│   └── c  
└── d

Let’s perform a post-order (LRN) traversal of this tree. When the post-order function finishes visiting both subtrees, it will print an assembly instruction corresponding to the current tree node. The post-order function will return the name of the register holding the result of the subtree’s expression evaluation.

If we trace the tree’s post-order traversal, we’ll see the following (using AT&T syntax).

# Entering the root node `+`
# The node has two subtrees: `[+ (+ a b) c]` and `d`
# Recursively invoking the post-order function for the left subtree
#
#   Entering the `+` node of the subtree `[+ (+ a b) c]`
#   The node has two subtrees: `(+ a b)` and `c`
#   Recursively invoking the post-order function for the left subtree
#
#     Entering the `+` node of the subtree `(+ a b)`
#     The node has two leaf nodes as its subtrees: `a` and `b`
#     Recursively invoking the post-order function for the left subtree
#
#       Entering the `a` node
#       Picking an unused register %rbx to load the value `a` into it
#       Printing the assembly instruction
mov    a,   %rbx
#       Returning the name of the register %rbx
#       %rbx contains the value at the address `a`
#       Leaving the `a` node
#
#     Recursively invoking the post-order function for the right subtree
#
#       Entering the `b` node
#       Picking an unused register %r10 to load the value `b` into it
#       Printing the assembly instruction
mov    b,   %r10
#       Returning the name of the register %r10
#       %r10 contains the value at the address `b`
#       Leaving the `b` node
#
#     Printing the assembly instruction for (+ a b)
#     %rbx = a
#     %r10 = b
add     %r10, %rbx # %rbx = %rbx + %r10
#     Marking %r10 as unused
#     Returning the name of the register %rbx
#     %rbx contains the value of `a + b`
#     Leaving the `+` node of the subtree `(+ a b)`
#
#   Recursively invoking the post-order function for the right subtree

#
#     Entering the `c` node of the subtree `[+ (+ a b) c]`
#     Picking an unused register %r10 to load the value `c` into it
#     Printing the assembly instruction
mov    c,    %r10
#     Returning the name of the register %r10
#     %r10 contains the value at the address `c`
#     Leaving the `c` node
#
#   Printing the assembly instruction for `[+ (+ a b) c]`
#   %rbx = a + b
#   %r10 = c
add     %r10, %rbx # %rbx = %rbx + %r10
#   Marking %r10 as unused
#   Returning the name of the register %rbx
#   %rbx contains the value of `(a + b) + c`
#   Leaving the `+` node of the subtree `[+ (+ a b) c]`
#
# Recursively invoking the post-order function for the right subtree

#
#   Entering the `d` node of the tree `(+ [+ (+ a b) c] d)`
#   Picking an unused register %r10 to load the value `d` into it
#   Printing the assembly instruction
mov    d,    %r10
#   Returning the name of the register %r10
#   %r10 contains the value at the address `d`
#   Leaving the `d` node
#
# Printing the assembly instruction for `(+ [+ (+ a b) c] d)`
# %rbx = (a + b) + c
# %r10 = d
add     %r10, %rbx
# Marking %r10 as unused
# Returning the name of the register %rbx
# %rbx contains the value of `((a + b) + c) + d`
# Leaving the root `+` node

Yippee, we translated an expression into assembly code! The %rbx register will contain the expression’s value at runtime. Let’s feast our eyes on the result one last time.

Expression
a + b + c + d

Assembly (using AT&T syntax)

mov     a,    %rbx
mov     b,    %r10
add     %r10, %rbx
mov     c,    %r10
add     %r10, %rbx
mov     d,    %r10
add     %r10, %rbx

The method Translator.Translate kicks off assembly generation by invoking ProgramTranslator.Translate. There are multiple classes responsible for emitting assembly: ProgramTranslator, ClassTranslator, ExprTranslator and AsmBuilder.
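
Here's the same post-order idea condensed into a small self-contained F# sketch: a toy expression type, a naive register pool, and a translate function that prints AT&T-syntax instructions and returns the register holding the result. It's a simplification of what ExprTranslator and AsmBuilder do; in particular, a real emitter has to spill to the stack when it runs out of registers.

// A condensed sketch of post-order expression translation (AT&T syntax).
type Expr =
    | Var of string
    | Add of Expr * Expr

// A naive register pool: grab a free register, release it when done.
let mutable freeRegisters = [ "%rbx"; "%r10"; "%r11"; "%r12" ]
let acquire () : string =
    match freeRegisters with
    | reg :: rest -> freeRegisters <- rest; reg
    | [] -> failwith "out of registers (a real emitter would spill to the stack)"
let release (reg: string) = freeRegisters <- reg :: freeRegisters

// Prints the instructions and returns the register holding the result.
let rec translate (expr: Expr) : string =
    match expr with
    | Var name ->
        let reg = acquire ()
        printfn "mov     %s, %s" name reg
        reg
    | Add (left, right) ->
        let leftReg = translate left
        let rightReg = translate right
        printfn "add     %s, %s" rightReg leftReg   // leftReg = leftReg + rightReg
        release rightReg
        leftReg

// translate (Add (Add (Add (Var "a", Var "b"), Var "c"), Var "d"))
// prints the same sequence of instructions as the listing above.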

Runtime library

A number of built-in classes are available in Cool 2020. Additionally, although they are not directly accessible from user code, there are procedures for allocating memory, copying objects, and several procedures for aborting a program in various invalid scenarios. Another important element of our runtime is the entry point that transfers control to user code; we'll discuss it in more detail later.

All the code described above lives in the runtime library (or simply runtime).

Stanford’s Compilers course gives students a pre-made runtime library. It targets the MIPS architecture, assumes a Unix-like operating system and implements the classes built into the Cool language.

At the same time, we decided to emit x86-64 assembly code, target Windows and Linux, and our language is a Scala subset — its built-in classes and the built-in classes of Stanford’s Cool are different.

By now, the attentive reader will have noticed that the runtime library from the Compilers course is not going to work for us as is. Instead, we can study its code and apply the insights gained to roll our own runtime library.

Compiler Explorer (aka godbolt.org) is a great help in writing relatively complex assembly code. For example, I used it to implement the string-to-number routine: I wrote the algorithm in C and used Compiler Explorer to translate it into assembly. It took a lot of cleaning up to arrive at the final version of the code, but writing it from scratch would've been much, much harder for me.

What can our runtime do?

The runtime is made up of three parts. One is rt_common.s. Just as the name suggests, it's common code that doesn't depend on the platform. rt_common.s contains the process entry point, the routine main. It is responsible for invoking the platform-dependent initialization procedure, creating an instance of the class Main, and invoking its constructor. Exiting the process is something main is responsible for too.

Here’s a look at the entry point’s code.

Assembly (using AT&T syntax)

   .global main
main:
  call   .Platform.init

  # A class 'Main' must be present
  # in every Cool2020 program.

  # Create a new instance of 'Main'.
  movq  $Main_proto_obj, %rdi
  call  .Runtime.copy_object

  # 'Main..ctor' is a Cool2020
  # program's entry point.

  # Pass a reference to the newly
  # created 'Main' instance in %rdi.
  # Invoke the constructor.
  movq   %rax, %rdi
  call   Main..ctor

  xorq  %rdi, %rdi
  jmp   .Platform.exit_process 

Pseudocode

def main() = {
  .Platform.init();

  new Main();

  .Platform.exit_process(0)
}

Also, rt_common.s contains the code of constructors and methods of the built-in classes.

  • IO includes input/output methods;
  • ArrayAny is a class representing arrays with the element type Any;
  • String is a class representing strings, in particular it has the concat and substring methods;
  • There are a couple of other classes as well, but we’ll discuss them another time;
  • As mentioned earlier, there are also the routine for copying objects and several procedures for the program's abnormal termination: accessing a null reference, going out of an array's bounds, etc.

The platform-specific rt_windows.s and rt_linux.s each contain five procedures.

  • .Platform.init is responsible for platform-specific initialization. At the moment, this is preparation for memory allocation during program execution;
  • And .Platform.alloc is the memory allocation procedure;
  • .Platform.out_string and .Platform.in_string write a string to stdout and read from stdin, respectively;
  • And, finally, the most important procedure .Platform.exit_process is responsible for ending the process.

What else is necessary to make the runtime cross-platform? Of course, the end-of-line constant. Both rt_windows.s and rt_linux.s define a label .Platform.ascii_new_line pointing at the platform's end-of-line character sequence.

rt_windows.s and rt_linux.s are fully self-contained: they don't depend on the standard C library or any other library. Instead, they work directly with the Windows API on Windows and with system calls on Linux.

So how do we make a Win API call from assembly code? First, we follow The Microsoft x64 Calling Convention. Second, it is very important to understand and keep in mind the concept of shadow space.

Things are a bit simpler on Linux: we don’t have to worry about the shadow space when performing a system call. We still need to stick to the platform’s calling conventions — for Linux it’s System V ABI. One place where you can find the list of system calls, their parameters and short descriptions is Linux System Call Table from Chromium OS Docs.

What can our runtime not do?

It cannot make coffee. And it cannot collect garbage. At the moment, Cool 2020 programs allocate memory but never free it up in the course of their execution. Eventually, we’ll add a generational garbage collector inspired by the one from Stanford’s Compilers course to the runtime. And for now, we’ll just pretend we already have a garbage collector — fake it till you make it!

There are officially no plans of adding coffee making routines.

Executable linking

Whew! We took a Cool 2020 program as the input and emitted assembly code as the output. The runtime code that the emitted assembly depends on is also in place. Not much is left to do to finally turn all of this into an executable.

For starters, we need an assembler that translates assembly code into machine code packaged into an object file. In other words, the assembler assembles assembly — programming is not without poetry, don’t you think?

Although object files contain machine code — i.e. the code that can be directly executed on the target CPU —  we cannot run them just yet. We’ll need to combine the object files generated from the user code and from the runtime code into an executable using a linker.

As we're aiming to make the compiler cross-platform, it makes sense to go for the assembler and linker from GNU Binutils: as and ld.

For Linux it’s a package basically every distro has. To give a specific example, on Ubuntu the package name is binutils.

Installing GNU Binutils on Windows is really simple too. MinGW includes the utilities in its distribution. If you don't know what MinGW is, don't worry about it right now. The important thing is that after installing it, we'll have as and ld. Item 3 on this page explains where to get a MinGW graphical installer and how to use it. (Advanced users can opt for an alternative way of obtaining MinGW: install MSYS2, which gives you bash and pacman, then run pacman -S mingw-w64-x86_64-toolchain to get the latest version of MinGW.)

On Windows, remember to add the path to as and ld to the PATH environment variable. Otherwise, our compiler won’t find them and will fail to generate executables.

EmitExeHandler.Invoke is the method that runs as and ld and checks the results of their work.
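
Conceptually, it does something along these lines. The file names and the exact as and ld flags below are placeholders, not the project's real invocation; the real method also captures the tools' standard error and checks their exit codes.

// Sketch of assembling with GNU as and linking with GNU ld (Linux file names
// used as an example; the actual invocation in the project may differ).
open System.Diagnostics

let runTool (fileName: string) (arguments: string) : int =
    use proc = Process.Start(ProcessStartInfo(fileName, arguments, UseShellExecute = false))
    proc.WaitForExit()
    proc.ExitCode

let emitExe () =
    // 1. Assemble the generated program and the runtime sources into object files.
    [ "program.s", "program.o"
      "rt_common.s", "rt_common.o"
      "rt_linux.s", "rt_linux.o" ]
    |> List.iter (fun (source, objFile) ->
        runTool "as" (sprintf "--64 %s -o %s" source objFile) |> ignore)
    // 2. Link the object files into an executable; -e selects the entry symbol,
    //    since our freestanding runtime provides `main` itself.
    runTool "ld" "-e main program.o rt_common.o rt_linux.o -o program" |> ignore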

And so, we have reached the point where our compiler can create executables from Cool 2020 programs! Now we have everything we need to record that GIF from "How are we going to compile?".

Intermediate representations and optimizations

Our compiler does not generate an intermediate representation (IR) of the program and does not perform any optimizations. Intermediate representations and code optimizations are extremely important, probably the most important, topics in compiler development, but working on them also takes a lot of time and effort. For this reason, I opted for taking them out of this project’s scope, at least for now. Hopefully, it doesn’t take away from studying the fundamentals of compilers. Better still, understanding IR and optimizations is much easier when you have a solid understanding of the basics.

Anyway, intermediate representations and optimizations are exceptionally important and very interesting topics in compiler development. That's why the teams behind enterprise-grade compilers, as well as computer scientists, work really hard on them: how fast compiled programs run and how much memory they consume directly depends on the optimizations compilers perform. There are a couple of links explaining these things in more detail in the "Advanced topics" section.

If you want to learn about compilers

I cannot recommend the book Crafting Interpreters by Robert Nystrom highly enough. Don't let the word Interpreters in the title discourage you: compilers and interpreters have a lot in common, and until you get to the topics specific to interpreters, you'll gain a ton of useful knowledge. The author has a real talent for explaining things in a very clear and precise manner.

Alex Aiken, Compilers. This is basically a classic on compilers, created by the Stanford professor Alex Aiken. As you would expect, compared to Crafting Interpreters, the course's lectures focus much more on the theoretical aspects. To give an example, where Crafting Interpreters only mentions that LR(1) parsing exists, the Compilers course goes into detail about it.

For some reason, the Compilers course, just like many other compiler-learning materials, discusses emitting assembly code targeting the MIPS architecture. But x86-64 assembly is a better fit for our specific project. For this reason, the freely available book Introduction to Compilers and Language Design by Douglas Thain is very useful to us: among other things, it explains how to work with x86-64 assembly and how to translate different constructs of high-level languages into it.

Who's gonna watch a Fortnite stream on Twitch when Immo Landwerth is writing a compiler live on YouTube?! Of course, recordings of the live streams are available as well, and you can explore the code in the project's GitHub repo.

Advanced topics

A talk on intermediate representations (IR). No previous knowledge is required. What's so cool about this talk is that it's built around one of the most widely used IRs, LLVM IR. 2019 EuroLLVM Developers' Meeting: V. Bridgers & F. Piovezan, "LLVM IR Tutorial - Phis, GEPs ...".

A free self-guided online course from Cornell University. It goes in depth on IR and optimizations but is very accessible at the same time. Again, I cannot recommend it highly enough. CS 6120: Advanced Compilers: The Self-Guided Online Course.
