DEV Community

LarmkaartDev


The lexer

The lexer is a surprisingly simple component of a compiler. It mainly consists of a bunch of if-statements. The lexer breaks the code up into separate lines and analyzes them one by one, turning each line into a list of tokens.

1. Tokens

Before looking at the lexer itself, let's take a closer look at tokens. A token is a very simple table made up of a Type and a Value:
{ Type = TYPE_NAME, Value = TOKEN_VALUE }

The Type property describes what kind of token it is, for example:

  • a statement
  • a number
  • a variable

The Value property can have different meanings depending on the type:

  • the statement type
  • the number value
  • the variable name

A token can have even more properties, depending on its type.
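In Lua, a token maps naturally onto a table with named fields. Here is a minimal sketch, assuming a hypothetical newToken helper and a hypothetical Line field as an example of an extra, type-specific property:

```lua
-- Hypothetical helper: builds a token table with the two required fields.
-- `extra` is an optional table of additional properties (e.g. a line number).
local function newToken(tokenType, value, extra)
  local token = { Type = tokenType, Value = value }
  if extra then
    for key, v in pairs(extra) do
      token[key] = v
    end
  end
  return token
end

local declaration = newToken("statement", "var")
local number = newToken("number", "2", { Line = 1 }) -- hypothetical extra property

print(declaration.Type, declaration.Value)
print(number.Value, number.Line)
```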

2. The lexer

Let's see how the lexer will break down this example line into tokens:
var y = x + 2
In the code examples, I will use the variable string to represent the part of the line we are currently looking at.
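The post doesn't show how the line is cut into parts. Assuming the parts are separated by whitespace (true for the example line, but not for input like x+2), string.gmatch with the %S+ pattern is one simple way to do it. The sketch names the loop variable part rather than string, because a local called string would hide Lua's built-in string library:

```lua
-- Split a line into whitespace-separated parts.
-- Assumption: real lexers usually also split on symbols such as "+"
-- even when there are no spaces around them.
local function splitLine(line)
  local parts = {}
  for part in line:gmatch("%S+") do
    parts[#parts + 1] = part
  end
  return parts
end

for index, part in ipairs(splitLine("var y = x + 2")) do
  print(index, part)
end
```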

The first part is var, so we insert a variable declaration token:

if string == "var" then
    pushTokens({Type = "statement", Value = "var"}) -- puts the token at the end of the tokens list
end

Now we have created our first token using the lexer!
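The snippets in this post call pushTokens without defining it. Assuming all tokens are kept in one list (called tokens here, a hypothetical name), it can be as small as a table.insert. A minimal sketch:

```lua
-- Hypothetical token list and helper matching the pushTokens calls in this post.
local tokens = {}

-- Appends a token to the end of the tokens list.
local function pushTokens(token)
  table.insert(tokens, token)
end

local part = "var" -- the part of the line currently being looked at
if part == "var" then
  pushTokens({ Type = "statement", Value = "var" })
end

print(#tokens, tokens[1].Type, tokens[1].Value)
```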

Next up is y. This is a variable, but the lexer doesn't know that yet. Luckily, it can look at the previous token, see that it's a declaration, and add y to the variable list.

if prevToken.Type == "statement" and prevToken.Value == "var" then -- the previous token declares a variable
    pushTokens({Type = "variable", Value = string}) -- put new token in the token list
    table.insert(localVariables, string) -- put new variable in the variable list
end
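prevToken isn't defined in the post either. Assuming the tokens live in a single list, the previous token is simply its last entry, tokens[#tokens]. That entry is nil when y is the very first part of the input, so a guard avoids indexing nil. A sketch with hypothetical helper names:

```lua
local tokens = { { Type = "statement", Value = "var" } }
local localVariables = {}

-- Returns the most recently pushed token, or nil if there is none yet.
local function previousToken()
  return tokens[#tokens]
end

-- True when the previous token is a `var` declaration.
local function followsDeclaration()
  local prev = previousToken()
  return prev ~= nil and prev.Type == "statement" and prev.Value == "var"
end

local part = "y"
if followsDeclaration() then
  table.insert(tokens, { Type = "variable", Value = part }) -- new variable token
  table.insert(localVariables, part)                       -- remember the name
end
```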

Next we have =. This is a simple case of adding a new token with Type "assigner" and Value "=":

pushTokens({Type = "assigner", Value = "="})
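Instead of writing one if-statement per symbol, single-character symbols such as = (and the + that comes later in the line) could be mapped to their token type with a lookup table. A minimal sketch; symbolTypes and symbolToken are hypothetical names:

```lua
-- Hypothetical lookup table from symbol text to token Type.
local symbolTypes = {
  ["="] = "assigner",
  ["+"] = "operator",
}

-- Returns a token for a known symbol, or nil if `part` isn't one.
local function symbolToken(part)
  local tokenType = symbolTypes[part]
  if tokenType then
    return { Type = tokenType, Value = part }
  end
  return nil
end

local token = symbolToken("=")
print(token.Type, token.Value)
```

Adding a new symbol then only means adding one entry to the table.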

Now for the variable x. Let's assume x was declared somewhere earlier in the code, so the lexer already knows it's a variable. If x had not been declared and the previous token were not a variable declaration token, the lexer should throw an error.

if table.find(localVariables, string) then -- Is there a variable named x? (table.find is a Luau function)
    pushTokens({Type = "variable", Value = string}) -- Let's add it!
elseif prevToken.Type == "statement" and prevToken.Value == "var" then
    pushTokens({Type = "variable", Value = string})
    table.insert(localVariables, string)
else
    error("Unknown variable " .. string)
end
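table.find is part of Luau (Roblox's Lua dialect); standard Lua 5.x has no table.find. Here is a sketch that runs in plain Lua, assuming the declared names are kept as keys of a set (declared[name] = true, a hypothetical structure) instead of a list, so each lookup is a single table index rather than a linear search. It also guards against prevToken being nil, which happens when the variable is the first part of the input:

```lua
-- Set of declared variable names: name -> true (hypothetical structure).
local declared = {}

-- Classifies a variable-like part, mirroring the if/elseif/else above.
-- `prevToken` may be nil when `part` is the first word of the input.
local function lexVariable(part, prevToken)
  if declared[part] then
    return { Type = "variable", Value = part }
  elseif prevToken and prevToken.Type == "statement" and prevToken.Value == "var" then
    declared[part] = true -- remember the newly declared variable
    return { Type = "variable", Value = part }
  else
    error("Unknown variable " .. part)
  end
end

declared["x"] = true -- pretend x was declared earlier in the code
local token = lexVariable("x", { Type = "assigner", Value = "=" })
print(token.Type, token.Value)
print(pcall(lexVariable, "z", nil)) -- prints false, followed by the error message
```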

All that's left are + and 2, which are converted into these simple tokens:

{Type = "operator", Value = "+"}
{Type = "number", Value = "2"}
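The post doesn't say how the lexer tells 2 apart from +. One simple assumption is to try tonumber on the part: if it converts, the part is a number. The Value keeps the original text, matching the tokens above. A sketch with hypothetical names:

```lua
local operators = { ["+"] = true } -- hypothetical set of operator symbols

-- Returns a number or operator token for `part`, or nil otherwise.
-- Assumption: tonumber's rules decide what counts as a number.
local function lexLiteral(part)
  if tonumber(part) then
    return { Type = "number", Value = part } -- keep the original text
  elseif operators[part] then
    return { Type = "operator", Value = part }
  end
  return nil
end

for _, part in ipairs({ "+", "2" }) do
  local token = lexLiteral(part)
  print(token.Type, token.Value)
end
```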

Now we have fully generated all of our tokens:

{Type = "statement", Value = "var"},
{Type = "variable", Value = "y"},
{Type = "assigner", Value = "="},
{Type = "variable", Value = "x"},
{Type = "operator", Value = "+"},
{Type = "number", Value = "2"}
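Putting all of the steps together, here is one way the whole walkthrough could fit into a single function that produces the token list above. It's a minimal sketch, assuming whitespace-separated parts and a set of already-declared names; lexLine, symbolTypes and declared are hypothetical names:

```lua
-- Maps single symbols to their token Type.
local symbolTypes = { ["="] = "assigner", ["+"] = "operator" }

-- Turns one line of code into a list of tokens.
-- `declared` is a set of variable names known from earlier lines (name -> true).
local function lexLine(line, declared)
  local tokens = {}
  for part in line:gmatch("%S+") do
    local prev = tokens[#tokens] -- nil for the first part
    local token
    if part == "var" then
      token = { Type = "statement", Value = "var" }
    elseif symbolTypes[part] then
      token = { Type = symbolTypes[part], Value = part }
    elseif tonumber(part) then
      token = { Type = "number", Value = part }
    elseif declared[part] then
      token = { Type = "variable", Value = part }
    elseif prev and prev.Type == "statement" and prev.Value == "var" then
      declared[part] = true -- a new variable is being declared
      token = { Type = "variable", Value = part }
    else
      error("Unknown variable " .. part)
    end
    table.insert(tokens, token)
  end
  return tokens
end

local declared = { x = true } -- x was declared earlier in the code
for _, token in ipairs(lexLine("var y = x + 2", declared)) do
  print(token.Type, token.Value)
end
```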

This is a basic overview of the lexer. The more features you add to your language, the more complex the lexer becomes, so be sure to keep your code nice and tidy!
