re: Writing My First Compiler

re: This was insightful! I dont understand a lot about compilers yet, but I have a question: Is it also possible to determine syntax errors during lexi...

Lexical analysis and syntax analysis are separate stages, and each one catches a different kind of error.

The order used in a compiler is:

Lexical analysis -> Syntax analysis -> Semantic analysis -> Code generation.

Everything before the code generation step only ensures that the code the developer wrote is "valid" according to your spec (or, to put it more technically, that each construction belongs to the language).

The lexical analysis part only scans the text and converts it into a list of tokens. For example:


int a = 1;
int b=2;a = b;

would be transformed into a list containing:

<type, int>, <id, a>, <EQUAL>, <literal, 1>, <SEMICOLON>, <type, int>, <id, b>, <EQUAL>, <literal, 2>, <SEMICOLON>, <id, a>, <EQUAL>, <id, b>, <SEMICOLON>

As you can see, it is the lexer's responsibility to deal with whitespace and line breaks: they are used to separate tokens, but they never reach the token list.
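To make the idea concrete, here is a minimal sketch of such a lexer in Python. It is only an illustration for the toy language above, not how any particular compiler does it: the token names, the TOKEN_SPEC table and the tokenize function are all assumptions of mine.

```python
import re

# Hypothetical token table for the toy language above.
# Order matters: "int" must be tried before the generic id pattern.
TOKEN_SPEC = [
    ("type", r"\bint\b"),
    ("literal", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),      # whitespace and line breaks
    ("MISMATCH", r"."),    # any character the language does not know
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn raw characters into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace is consumed here and never reaches the parser
        if kind == "MISMATCH":
            raise SyntaxError(f"lexical error: unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


for token in tokenize("int a = 1;\nint b=2;a = b;"):
    print(token)
```

Note that the lexer happily accepts something like int 2 = 3; because every individual word in it is a valid token. It has no idea about the order.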

This list is the input to the syntax analyzer, which uses a defined syntax/grammar (for an example, see Python's official grammar).

It consumes this list and checks that the order of the tokens matches SYNTACTICALLY VALID language constructions (e.g. TYPE, ID, EQUAL, followed by one of [expression, literal, id]).

A lexical error could be "ID name too long" (if the language only allowed ids shorter than 30 characters), and a syntactical error would be trying to write int 2 = 3, as the construction TYPE, literal, EQUAL, literal does not belong to the language.
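As a rough sketch of the syntax check, assume tokens are (kind, text) pairs and the whole grammar is just "optional TYPE, ID, EQUAL, literal or id, SEMICOLON". Both the grammar and the check_syntax function are simplifications I made up for this example; a real parser would handle full expressions and usually build a tree.

```python
def check_syntax(tokens):
    """Raise SyntaxError unless the tokens form a sequence of valid statements.

    Assumed toy grammar: [type] id EQUAL (literal | id) SEMICOLON
    """
    pos = 0

    def expect(*kinds):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in kinds:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {' or '.join(kinds)}, found {found}")
        pos += 1

    while pos < len(tokens):
        if tokens[pos][0] == "type":  # the type is optional: "a = b;" is fine too
            pos += 1
        expect("id")
        expect("EQUAL")
        expect("literal", "id")
        expect("SEMICOLON")


# int a = 1;  -> every token is in a valid position
valid = [("type", "int"), ("id", "a"), ("EQUAL", "="), ("literal", "1"), ("SEMICOLON", ";")]
# int 2 = 3;  -> every token is lexically fine, but the order is not
invalid = [("type", "int"), ("literal", "2"), ("EQUAL", "="), ("literal", "3"), ("SEMICOLON", ";")]

check_syntax(valid)  # passes silently
try:
    check_syntax(invalid)
except SyntaxError as err:
    print(err)  # expected id, found literal
```

The parser only ever looks at the kind of each token and where it sits, which is why this error cannot be caught during lexical analysis.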

(I will only make a brief point about semantic analysis, as it is outside the scope of your question.)

The SEMANTIC analysis checks how those constructions are used, for example, whether a variable is used before it is declared.

int a = 1;
a = b;
int b = 2;

This is a SYNTACTICALLY valid construction, but SEMANTICALLY incorrect, as you are using the variable b before it's been declared.
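A minimal sketch of that check, assuming the parser has already turned each statement into a (declared_type, target, value) tuple. That representation and the check_declarations function are hypothetical, picked only to keep the example short.

```python
def check_declarations(statements):
    """Return a list of error messages for variables used before declaration.

    Each statement is assumed to be (declared_type or None, target, value).
    """
    declared = set()
    errors = []
    for number, (declared_type, target, value) in enumerate(statements, start=1):
        # The right-hand side is checked first, so "int a = a;" is also an error.
        if value.isidentifier() and value not in declared:
            errors.append(f"statement {number}: '{value}' is used before it is declared")
        if declared_type is None:
            if target not in declared:
                errors.append(f"statement {number}: '{target}' is used before it is declared")
        else:
            declared.add(target)
    return errors


program = [
    ("int", "a", "1"),  # int a = 1;
    (None, "a", "b"),   # a = b;
    ("int", "b", "2"),  # int b = 2;
]
print(check_declarations(program))
# ["statement 2: 'b' is used before it is declared"]
```

Every statement here would pass the syntax check. The problem only shows up once you keep track of which names have been declared so far.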


Thank you for taking the time to explain! Your explanation made perfect sense. Before, I didn't fully realize that the computer is given only characters and has to group them first. It's hard for humans to read the characters without reading the words, but the computer obviously has to figure out the words first! This helped me a lot.
