
Discussion on: ELI5: How does someone write a new computer language?

edA‑qa mort‑ora‑y

There's no way to truly ELI5 this without obscuring details, so I'll simplify while trying to stay correct.

Vocabulary

A language starts with a human-level description of what the language can do. You decide on a vocabulary, and how that'll look inside a text file. That is, setting the hardware and software aside for a moment, you create an agreement on what the words in your language will accomplish.

Once you've done that, you'll write a compiler for the language. The compiler is what lets the computer understand your vocabulary.

The compiler consists of several parts.

Parsing

That text file may be readable by you, but the computer won't be able to make any sense of it. Parsing translates that text into a structured form that the compiler can work with. This is the abstract syntax tree (AST).

At this point the compiler has read your code, but does not yet understand it.
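To make the parsing step concrete, here is a minimal sketch of a parser for a made-up language that only has integers, `+ - * /`, and parentheses. The tuple-based tree shape and the names `tokenize` and `parse` are my own assumptions for illustration; real compilers use richer node types.

```python
import re

# A token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split source text into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Recursive-descent parser: returns the AST as nested tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # '*' and '/' bind tighter than '+' and '-'.
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = ("mul" if op == "*" else "div", node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = ("add" if op == "+" else "sub", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("1 + 2 * 3"))
```

The tree records that the multiplication happens before the addition, which is structure the flat text only implied.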

Semantics / Typing

Once the compiler has an AST, it needs to make sense of it. In this stage it goes through the tree and determines what the code does. It figures out what the function names are, and how to call them. It determines which names are variables, and how to store data in them. It figures out the conditions for if statements and loops.

At this point the compiler understands your code.
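As a rough sketch of one small piece of this stage, the checker below walks a tiny program and verifies that every variable is defined before it is used. The statement forms `("let", name, value)` and `("print", value)` are hypothetical, invented for this example, and a real semantic pass would also handle types, functions, and scopes.

```python
def check(program):
    """Return a list of error messages; an empty list means no problems found."""
    declared = set()   # the symbol table: names defined so far
    errors = []

    def check_expr(node):
        kind = node[0]
        if kind == "num":
            return
        if kind == "var":
            if node[1] not in declared:
                errors.append(f"undefined variable '{node[1]}'")
            return
        if kind in ("add", "sub", "mul", "div"):
            check_expr(node[1])
            check_expr(node[2])
            return
        errors.append(f"unknown expression kind '{kind}'")

    for stmt in program:
        if stmt[0] == "let":
            _, name, value = stmt
            check_expr(value)  # the value may only use earlier names
            if name in declared:
                errors.append(f"'{name}' is already defined")
            declared.add(name)
        elif stmt[0] == "print":
            check_expr(stmt[1])
        else:
            errors.append(f"unknown statement kind '{stmt[0]}'")
    return errors

program = [
    ("let", "x", ("num", 1)),
    ("print", ("add", ("var", "x"), ("var", "y"))),  # 'y' was never defined
]
print(check(program))
```

The parser accepted this program happily; only this later pass notices that `y` means nothing.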

Translation / Lowering

The compiler will now translate the code into a new form, a form that the computer understands. This is often called "lowering" as it takes your high-level language and brings it closer to the machine language the computer understands.

Compilers do this in a variety of stages, often lowering to one or more intermediate representations (IRs), and then finally to the machine code of the target machine.
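Here is a minimal sketch of lowering, assuming an expression tree of nested tuples such as `("add", left, right)` and an invented stack-machine IR. Neither is any real compiler's format; the point is only that the tree becomes a flat list of simple instructions.

```python
def lower(node, code=None):
    """Flatten an expression tree into a list of stack-machine instructions."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("push", node[1]))
    else:
        # Emit code for both operands first, then the operation itself.
        lower(node[1], code)
        lower(node[2], code)
        code.append((kind,))
    return code

def run(code):
    """A tiny interpreter for the IR, standing in for the real machine."""
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a // b,  # integer division, an arbitrary choice here
    }
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[instr[0]](left, right))
    return stack.pop()

tree = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
code = lower(tree)
print(code)
print(run(code))
```

Each instruction is now simple enough that mapping it onto real machine instructions is a mostly mechanical step.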

A simple compiler may choose to "lower" to another high-level language, such as C++. This provides a quick way to implement some new languages.
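A sketch of that shortcut, again assuming a tuple-based expression tree: instead of producing machine code, the compiler prints C++ source and leaves the rest to an existing C++ compiler. The function names `to_cpp` and `emit_program` are made up for this example.

```python
CPP_OPERATORS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

def to_cpp(node):
    """Render an expression tree as a fully parenthesized C++ expression."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    left, right = to_cpp(node[1]), to_cpp(node[2])
    return f"({left} {CPP_OPERATORS[kind]} {right})"

def emit_program(node):
    """Wrap the expression in a small C++ program that prints its value."""
    return (
        "#include <iostream>\n"
        "int main() {\n"
        f"    std::cout << {to_cpp(node)} << std::endl;\n"
        "    return 0;\n"
        "}\n"
    )

tree = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
print(emit_program(tree))
```

The explicit parentheses mean the output never depends on C++ operator precedence matching the source language's.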

Linking

Your code doesn't live in isolation; it needs to interact with the target machine. In addition to translating the code, the compiler will also set up tables of names and resources. These are how your code will attach to other components on the system.
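A very simplified sketch of what those name tables are for. The module layout here (a dict with `exports` and `code`) is entirely hypothetical and far simpler than a real object file, but the idea is the same: calls refer to names, and linking replaces the names with addresses.

```python
def link(modules):
    """Concatenate modules and replace symbolic call targets with addresses."""
    # First pass: assign every exported name its final address.
    symbols = {}
    base = 0
    for module in modules:
        for name, offset in module["exports"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol '{name}'")
            symbols[name] = base + offset
        base += len(module["code"])

    # Second pass: copy the code, patching each call with a resolved address.
    image = []
    for module in modules:
        for instr in module["code"]:
            if instr[0] == "call":
                target = instr[1]
                if target not in symbols:
                    raise ValueError(f"undefined symbol '{target}'")
                image.append(("call", symbols[target]))
            else:
                image.append(instr)
    return image, symbols

main_module = {"exports": {"main": 0}, "code": [("call", "square"), ("halt",)]}
lib_module = {"exports": {"square": 0}, "code": [("mul_self",), ("ret",)]}

image, symbols = link([main_module, lib_module])
print(symbols)
print(image)
```

Neither module knew where `square` would end up; only the combined table does.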

Host Language

You may be wondering what language you write the compiler in. It can be any language. The job of the compiler is to translate your high-level code into a known low-level language. It doesn't matter what language the compiler itself is written in.

It's relatively easy to write a compiler for a simple language. Complex languages take more effort.

I encourage everybody to write a language once. I've written several of them, from basic domain-specific languages to full modern languages like C++.

This description focuses on a compiled language, and is best understood with imperative or functional languages. The details and terminology tend to change as we get into other paradigms, like declarative languages, or interpreted languages. Most of the stages still exist though.

Eugene Karataev
  1. Do all compilers transform source code to AST? Are there any languages which omit this step?

  2. Let's take Python and JavaScript for example. Their syntax is different, but the abstractions (variables, functions, loops, conditions, etc.) are almost the same. Do they have similar ASTs? Can we take the AST generated by a Python compiler and convert it to source code in JavaScript?

edA‑qa mort‑ora‑y
  1. Yes. However, here is where we'd need to go into technical details to define what an AST truly is. There are some languages, potentially, where the parse directly results in a usable tree. There are also stream languages where no complete tree is ever created. But there's no avoiding the concept of the AST ever -- the compiler/VM/interpreter needs to understand the source.

  2. The ASTs have similar features, but differ depending on the compiler and language. When I talked about "lowering", it is possible, and quite common, to lower from the AST to another language. Not every target language supports enough of the source language's features to be a viable target. Others essentially require writing a complete VM in them to do the compilation.
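To illustrate the second answer for a very small subset, Python's standard `ast` module exposes the tree its own parser builds, and a visitor can print JavaScript from it. The class `ToJavaScript` and the function `python_to_js` are names I made up; this sketch handles only numeric constants, names, the four arithmetic operators, and simple assignment, and it ignores real semantic differences between the two languages.

```python
import ast

class ToJavaScript(ast.NodeVisitor):
    """Emit JavaScript source for a tiny subset of Python's AST."""

    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_Assign(self, node):
        target = node.targets[0]
        if len(node.targets) != 1 or not isinstance(target, ast.Name):
            raise NotImplementedError("only simple 'name = value' is handled")
        # Assumes each name is assigned once; a second 'let' would fail in JS.
        return f"let {target.id} = {self.visit(node.value)};"

    def visit_BinOp(self, node):
        op = self.OPS.get(type(node.op))
        if op is None:
            raise NotImplementedError(f"unsupported operator: {type(node.op).__name__}")
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise NotImplementedError("only numeric constants are handled")
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the subset is rejected rather than guessed at.
        raise NotImplementedError(f"unsupported node: {type(node).__name__}")

def python_to_js(source):
    return ToJavaScript().visit(ast.parse(source))

print(python_to_js("x = 1 + 2 * y"))
```

Even this toy has to refuse most of the language, which is the practical side of the viability problem: everything the target cannot express directly must be rejected or emulated.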
