Writing My First Compiler

Pablo Díaz Márquez on April 05, 2017

I wrote my first compiler for a university class project and really enjoyed it, so I want to share my experience. I'm a...

This was insightful! I don't understand a lot about compilers yet, but I have a question: is it also possible to detect syntax errors during lexical analysis? I am not sure I really understand the difference between lexical and syntax analysis.


They are two distinct phases.

The order used in a compiler is:

Lexical -> Syntactical -> Semantical -> Code generation.

Everything before the Code generation step only ensures that the code the developer wrote is "valid" according to your spec (or, to put it more technically: that the construction belongs to the language).

The Lexical analysis part only scans the text and converts it into a list of tokens.


int a = 1;
int b=2;a = b;

Would transform into a list containing:

<type, int>, <id, a>, <EQUAL>, <literal, 1>, <SEMICOLON>, <type, int>, <id, b>, <EQUAL>, <literal, 2>, <SEMICOLON>, <id, a>, <EQUAL>, <id, b>, <SEMICOLON>

As you can see, it is the Lexical analyzer's responsibility to deal with whitespace and line breaks: int b=2;a = b; yields the same kind of tokens as the neatly spaced first line.
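To make the tokenizing step concrete, here is a minimal sketch of a lexer for this toy language, written in Python. The token names mirror the list above, but TOKEN_SPEC, tokenize and LexicalError are names I made up for illustration, and the regex-based approach is just one common way to do it, not how any particular compiler works.

```python
import re

# Hypothetical token spec for the toy language in this comment.
# Order matters: the keyword rule must come before the generic id rule.
TOKEN_SPEC = [
    ("type", r"\bint\b"),
    ("literal", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),      # whitespace and line breaks are dropped here
    ("MISMATCH", r"."),    # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class LexicalError(Exception):
    pass


def tokenize(source):
    """Turn source text into a list of tuples such as ("id", "a") or ("EQUAL",)."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise LexicalError(f"unexpected character {text!r}")
        if kind in ("EQUAL", "SEMICOLON"):
            tokens.append((kind,))
        else:
            tokens.append((kind, text))
    return tokens


for token in tokenize("int a = 1;\nint b=2;a = b;"):
    print(token)
```

Note that the lexer has no idea whether the tokens appear in a sensible order; it only groups characters into words.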

This list is the input of the Syntactical analyzer, which uses a defined syntax/grammar (for example, you can see Python's official grammar: docs.python.org/3/reference/gramma...)

It will consume this list and check that the order of the tokens matches SYNTACTICALLY VALID language constructions (e.g. TYPE, ID, EQUAL, followed by one of [expression, literal, id]).

A lexical error could be "ID name too long" (if the language only allowed ids with fewer than 30 characters), while a syntactical error would be trying to write int 2 = 3, as the construction TYPE, literal, EQUAL, literal does not belong to the language.
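As a rough sketch of what the syntactical check could look like for this toy language (Python again; check_statement, check_program and SyntaxCheckError are hypothetical names, and the two-rule grammar in the comments is my own simplification where the right-hand side is only a literal or an id):

```python
class SyntaxCheckError(Exception):
    pass


def check_statement(tokens, pos):
    """Check one statement starting at tokens[pos]; return the index after it.

    Assumed grammar (a simplification for illustration):
        declaration: type id EQUAL (literal | id) SEMICOLON
        assignment:       id EQUAL (literal | id) SEMICOLON
    """
    def expect(kinds):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in kinds:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxCheckError(f"expected {' or '.join(kinds)}, found {found}")
        pos += 1

    if pos < len(tokens) and tokens[pos][0] == "type":
        expect(("type",))  # optional leading type marks a declaration
    expect(("id",))
    expect(("EQUAL",))
    expect(("literal", "id"))
    expect(("SEMICOLON",))
    return pos


def check_program(tokens):
    """Return True if the whole token list is a sequence of valid statements."""
    pos = 0
    while pos < len(tokens):
        pos = check_statement(tokens, pos)
    return True


# Tokens for: int 2 = 3;
bad = [("type", "int"), ("literal", "2"), ("EQUAL",), ("literal", "3"), ("SEMICOLON",)]
try:
    check_program(bad)
except SyntaxCheckError as error:
    print("syntax error:", error)
```

Every token in int 2 = 3 is lexically fine on its own; it is only the order (TYPE followed by a literal) that the checker rejects.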

(I will only make a brief point about the semantical analysis, as it's outside the scope of your question.)

The SEMANTICAL analysis takes care of how those constructions are used, for example, detecting that a variable is used before it's declared.

int a = 1;
a = b;
int b = 2;

Each statement here is a SYNTACTICALLY valid construction, but the program is SEMANTICALLY incorrect, as you are using the variable b before it has been declared.
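A declared-before-use check can be sketched in a few lines of Python. This assumes the token list has already passed the syntactical step, so every statement has the shape [type] id EQUAL value SEMICOLON; check_declared_before_use and SemanticError are hypothetical names, and a real compiler would normally do this over a syntax tree with a proper symbol table.

```python
class SemanticError(Exception):
    pass


def check_declared_before_use(tokens):
    """Walk syntactically valid statements and track which ids are declared."""
    declared = set()
    i = 0
    while i < len(tokens):
        is_declaration = tokens[i][0] == "type"
        if is_declaration:
            i += 1  # skip the type token
        target = tokens[i][1]
        value = tokens[i + 2]
        # The right-hand side is checked first: it must already exist.
        if value[0] == "id" and value[1] not in declared:
            raise SemanticError(f"variable {value[1]!r} used before declaration")
        if is_declaration:
            declared.add(target)
        elif target not in declared:
            raise SemanticError(f"variable {target!r} assigned before declaration")
        i += 4  # id, EQUAL, value, SEMICOLON
    return declared


# Tokens for: int a = 1; a = b; int b = 2;
program = [
    ("type", "int"), ("id", "a"), ("EQUAL",), ("literal", "1"), ("SEMICOLON",),
    ("id", "a"), ("EQUAL",), ("id", "b"), ("SEMICOLON",),
    ("type", "int"), ("id", "b"), ("EQUAL",), ("literal", "2"), ("SEMICOLON",),
]
try:
    check_declared_before_use(program)
except SemanticError as error:
    print("semantic error:", error)
```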


Thank you for taking the time to explain! Your explanation made perfect sense. Before, I didn't fully realize that the computer is given only characters and has to group them first. It's hard for humans to read the characters without reading the words, but the computer obviously has to figure out the words first! This helped me a lot.


I loved the post! I think it's easy for programmers to think we just found compilers on a mountain struck by lightning, but this was very insightful. Thanks for sharing.


Nice post! The lovely task of creating a compiler. It gives so much insight into how the programming languages I use work. For a compiler written in Java, I would recommend the GOLD Parser.


For people looking to understand the basics of translators: A beginner's primer on Assembler, compiler, interpreter.

