DEV Community

Eli
Eli

Posted on • Updated on

Intro To Parsing & Lexing

About

I decided one day, that I would learn C++ and make a programming language. It is PieLang. I have been working on it for 6 months, and hope it will come out swell. I dived into it, and found that the water was, way, way, way to deep. I decided to take a break, and learn how Lexing and Parsing work. 2 months later, I continued. It is an advanced topic, and I still don't know it all. But this is the basics to help you start creating your own programming language!

Lexing

Lexing or lex is the first step of making a programming language. I, if I need to provide code snippets, will write them in C++.

Well, Lexing basically takes the source code(file/input) and turns them into a stream of tokens. Take this code for example:

print(2 + 2);
Enter fullscreen mode Exit fullscreen mode

This would return a token stream, something like this:

IDENT: "print"
LPAREN: '('
NUMBER: '2'
OP: '+'
NUMBER: '2'
RPAREN: ')'
EOL: ';'
Enter fullscreen mode Exit fullscreen mode

EOL stands for end of line by the way

You see how that went character by character? It returned a token stream!

If you don't get what I mean, think of this:

std::string Lexer(){
  // ...
  return token_stream;
}
Enter fullscreen mode Exit fullscreen mode

Or something like that.

Lets do another tokenize...

int main(){
  std::cout << "Hello, world!";
}
Enter fullscreen mode Exit fullscreen mode

Let's see what the lexer spits out this time:

IDENT: "int"
IDENT: "main"
LPAREN: '('
RPAREN: ')'
LCURLY: '{'
IDENT: "std::cout"
LESS_THAN: '<'
LESS_THAN: '<'
STRING: "Hello, world!"
EOL: ';'
RCURLY: '}'
Enter fullscreen mode Exit fullscreen mode

Woah, that was much more! Imagine your biggest program, then imagine how many tokens there were!

Note: See how the double quotes 'disappear'? Thats because the lexer sees them, but it doesn't output them as tokens, it just tells the lexer: "Oh, hey, this is a string!"

Double-Note: std:: is a library so if the real lexer saw that it would be like: "Lets access the struct/namespace/class called std! Oh wait, that is a built in lib! Lets access that instead"

Lets imagine that we were checking if something was a print statement(this code is not real)!

if(type == "IDENT" && val == "std::cout"){
  //...
  std::cout << string;
}
Enter fullscreen mode Exit fullscreen mode

That is not real code, it just helps us understand more!

Now that you know some lexing, its parsing...

Parsing

The parser creates an AST(abstract syntax tree) and then either executes it or sends it to the compiler/interpreter to do the work.

Example of an AST:

     +
    / \
   1   2
Enter fullscreen mode Exit fullscreen mode

This basic representation of a AST helps us understand it better! Lets take a look here.

The + symbol is telling that we are adding these numbers.
Then the two numbers are below it(1, 2).

Lets take a more advanced equation.

1+2*3-4

Hmm, we couldn't do that in a if statement! That's why we have to send it to the parser!

       -
      /  \
     +    4
    / \
   1   *
       /\
      2  3
Enter fullscreen mode Exit fullscreen mode

Remember that little AST that we made before? Well that is this, just more of them packed together! Don't believe me? Lemme explain. We start of with - or minuce. (Remember how we started off with the + last time?) Then we went to + and 4. Lets think of the plus as a variable, which it holds this value:
+ = 1+(2*3)
I know that is against the rules of coding, but this is just to show you
You get it? Good. Cause next is the second stage.

The parser eventually solves all of the problems, like so:

# Second stage 
      - 
     / \
    +   4
   / \
  1   6 
Enter fullscreen mode Exit fullscreen mode

Notice: Notice how the (2*3) turned to six? Yeah. Cool right?

# Third stage
    -
   / \ 
  7   4
Enter fullscreen mode Exit fullscreen mode

Yay, we got it all the way down to 7-4! Now we can just do the equation...

# Final stage 
3
Enter fullscreen mode Exit fullscreen mode

Boom! We just solved 1+2*3-4!

After

It takes a strong mind to actually code these things. I hope you can do this, with the foundation I have created for you.

Links

Here are some useful links!

Craftinginterpreters.com This link helped me with the basic knowledge I know, so go check it out: https://craftinginterpreters.com

PieLang Want to see the progress that I am making with my language? Here is the github link: https://github.com/elipie/PieLang

Then, the most useful link I could give you is: https://google.com Google and any browser is a great source of ideas, and help.

Found something?

Found a bug, error, typo, or something thats not true? Put it in the comments

Happy coding, and stay safe.

Top comments (6)

Collapse
 
whippingdot profile image
Sanjaay R.

Hey, you should create a series, like @codemonkey51

Collapse
 
potentialstyx profile image
PotentialStyx

Lol, you know I didn't invent series's

Collapse
 
whippingdot profile image
Sanjaay R.

but still

Collapse
 
potentialstyx profile image
PotentialStyx

Cool!

Collapse
 
dynamicsquid profile image
DynamicSquid

PieLang!

Collapse
 
elipie profile image
Eli

PieLang!