Discussion on: Writing My First Compiler

The two are fundamentally different.

The order used in a compiler is:

Lexical -> Syntactic -> Semantic -> Code generation.

Until you reach the code generation step, all you have done is ensure that the code the developer wrote is "valid" according to your spec (or, more precisely, that the construction belongs to the language).

The lexical analysis step only scans the text and converts it into a list of tokens.

For example:

int a = 1;
int b=2;a = b;

is transformed into a list containing:

<type, int>, <id, a>, <EQUAL>, <literal, 1>, <SEMICOLON>, <type, int>, <id, b>, <EQUAL>, <literal, 2>, <SEMICOLON>, <id, a>, <EQUAL>, <id, b>, <SEMICOLON>

As you can see, it is the lexer's responsibility to deal with whitespace and line breaks: "int b=2;" and "int b = 2;" produce exactly the same tokens.
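To make the lexing step concrete, here is a minimal sketch in Python using the standard re module. It is only an illustration, not how any particular compiler does it. The token names mirror the list above, and the assumptions are mine: int is the only type keyword, and tokens are represented as tuples such as ("id", "a") or ("EQUAL",). Whitespace and line breaks are matched and then simply dropped.

```python
import re


class LexError(Exception):
    """Raised when the input contains a character no token rule accepts."""


# Hypothetical token rules for the tiny language in the example.
# Order matters: the `type` keyword is tried before generic identifiers.
TOKEN_SPEC = [
    ("WS", r"\s+"),            # whitespace and line breaks: matched, then discarded
    ("type", r"\bint\b"),      # assumption: `int` is the only type
    ("id", r"[A-Za-z_]\w*"),
    ("literal", r"\d+"),
    ("EQUAL", r"="),
    ("SEMICOLON", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert source text into a list of (kind,) or (kind, value) tuples."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise LexError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the rule that matched
        if kind in ("type", "id", "literal"):
            tokens.append((kind, match.group()))
        elif kind != "WS":
            tokens.append((kind,))
        pos = match.end()
    return tokens
```

Running tokenize("int a = 1;\nint b=2;a = b;") yields the same 14 tokens as the list above, and an unknown character such as "@" is reported as a lexical error.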

This list is the input to the syntactic analyzer (the parser), which uses a defined syntax/grammar (for an example, see Python's official grammar: docs.python.org/3/reference/gramma...).

The parser consumes this list and checks that the tokens appear in an order that forms SYNTACTICALLY VALID language constructions (e.g. TYPE, ID, EQUAL, followed by one of [expression, literal, id]).
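A small recursive-descent-style checker shows the idea. This is a sketch under my own assumptions, not a real language's grammar: tokens arrive as tuples like ("type", "int"), ("id", "a") or ("EQUAL",); a statement is either TYPE ID EQUAL operand SEMICOLON or ID EQUAL operand SEMICOLON; and the operand is restricted to a single literal or id (full expressions are left out to keep it short).

```python
class ParseError(Exception):
    """Raised when the token order is not a valid construction."""


def parse(tokens):
    """Check the token list against a tiny hypothetical grammar.

    Returns statement tuples such as ("decl", "a", ("literal", "1")).
    """
    pos = 0

    def kind_at(i):
        return tokens[i][0] if i < len(tokens) else "EOF"

    def expect(kind):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if kind_at(pos) != kind:
            raise ParseError(f"expected {kind}, got {kind_at(pos)} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        # Simplification: an operand is a single literal or identifier.
        nonlocal pos
        if kind_at(pos) not in ("literal", "id"):
            raise ParseError(f"expected literal or id, got {kind_at(pos)} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    statements = []
    while pos < len(tokens):
        is_decl = kind_at(pos) == "type"
        if is_decl:
            expect("type")
        name = expect("id")[1]
        expect("EQUAL")
        value = operand()
        expect("SEMICOLON")
        statements.append(("decl" if is_decl else "assign", name, value))
    return statements
```

With this checker, the tokens for int 2 = 3; (TYPE, literal, EQUAL, literal, SEMICOLON) are rejected as soon as the parser expects an id after the type and finds a literal.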

A lexical error could be "ID name too long" (if the language only allowed identifiers shorter than 30 characters), while a syntactic error would be writing int 2 = 3, since the construction TYPE, literal, EQUAL, literal does not belong to the language.

(I'll only touch briefly on semantic analysis, as it's outside the scope of your question.)

SEMANTIC analysis deals with how those constructions are used; for example, it catches a variable being used before it's declared.

int a = 1;
a = b;
int b = 2;

This is SYNTACTICALLY valid but SEMANTICALLY incorrect, because the variable b is used before it has been declared.
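The declare-before-use rule can be sketched as a single pass over the parsed statements, tracking which names have been declared so far. This is a hypothetical illustration covering only that one rule. It assumes the parser hands over tuples of the form (kind, name, operand), where kind is "decl" or "assign" and operand is ("literal", "1") or ("id", "b").

```python
class SemanticError(Exception):
    """Raised when a well-formed statement uses names incorrectly."""


def check_declared_before_use(statements):
    """Reject any use of a variable before its declaration.

    Returns the set of declared names if the program passes.
    """
    declared = set()
    for index, (kind, name, operand) in enumerate(statements):
        # The right-hand side is checked first, so `int a = a;` is also rejected.
        if operand[0] == "id" and operand[1] not in declared:
            raise SemanticError(f"statement {index}: '{operand[1]}' used before declaration")
        if kind == "decl":
            declared.add(name)
        elif name not in declared:
            raise SemanticError(f"statement {index}: '{name}' assigned before declaration")
    return declared
```

Fed the three statements from the example above (int a = 1; a = b; int b = 2;), the check fails on the second statement, even though all three passed the syntactic check.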