Resources
- Find the GitHub link here
Tokens
First, we need to define all the tokens we expect in a math input. Go ahead and add a class file, Tokens.cs. It will contain an enum listing every token type, defined above the Tokens class:
```csharp
public enum Token
{
    NUMBER = 0,
    ADD,        // +
    MINUS,      // -
    MULTIPLY,   // *
    DIVISION,   // /
    RBRACE,     // )
    LBRACE,     // (
    EOF         // end of input
}
```
We also need a way to store the value attached to a NUMBER token. Add the following class to Tokens.cs:
```csharp
public class Tokens
{
    public readonly Token _tokenType;
    public readonly object _value;

    public Tokens(Token tokenType, object value)
    {
        this._tokenType = tokenType;
        this._value = value;
    }

    // Prints " TYPE:value"; a null value prints as an empty string
    public override string ToString()
    {
        return " " + this._tokenType + ":" + this._value;
    }
}
```
Next, we need a way to transform our text input into tokens. Enter the lexer.
Lexer
We need a class that transforms text into tokens, so go ahead and create a Lexer.cs class. Let's define the basics of the class:
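```csharp
using System;
using System.Collections.Generic;
using System.Text; // StringBuilder, used by methods we add later

public class Lexer
{
    private readonly List<Tokens> tokens;
    private readonly string _input;
    private int pos = 0;
    private char curr_input;

    public Lexer(string input)
    {
        this._input = input;
        tokens = new List<Tokens>();
        // Set the first character, or '\0' if the input is empty
        this.curr_input = input.Length > 0 ? this._input[pos] : '\0';
    }
}
```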
The class holds a list of tokens, the input string, a position counter pos, and the current character curr_input.
NOTE: The constructor sets curr_input to the first character of the input; if the input is empty, it sets it to the null character '\0' instead.
Next, we need a method that moves to the next character of the input and updates the position:
```csharp
private void Get_Next()
{
    if (pos < this._input.Length - 1)
    {
        pos++;
        this.curr_input = this._input[pos];
    }
    else
    {
        curr_input = '\0';
    }
}
```
Define this method right after the constructor. It checks whether there is another character after pos (the position of the current character). If so, it increments pos and updates the current character. Otherwise, it sets the current character to the null character '\0' (ASCII 0), which marks the end of the input.
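To see the '\0' sentinel in isolation, here is a minimal stand-alone sketch of the same position logic. The CharCursor class and its Current, Advance and ReadAll members are hypothetical names for illustration only; Advance mirrors Get_Next() line for line.

```csharp
using System.Collections.Generic;

// Hypothetical stand-alone version of the Lexer's position logic:
// it walks a string one character at a time and uses '\0' as an
// end-of-input sentinel, just like curr_input and Get_Next().
public class CharCursor
{
    private readonly string _input;
    private int _pos = 0;

    public char Current { get; private set; }

    public CharCursor(string input)
    {
        _input = input;
        // First character, or '\0' when the input is empty
        Current = input.Length > 0 ? input[0] : '\0';
    }

    // Mirrors Get_Next(): advance if another character exists, else report '\0'
    public void Advance()
    {
        if (_pos < _input.Length - 1)
        {
            _pos++;
            Current = _input[_pos];
        }
        else
        {
            Current = '\0';
        }
    }

    // Collects every character until the sentinel is reached
    public static List<char> ReadAll(string input)
    {
        var cursor = new CharCursor(input);
        var chars = new List<char>();
        while (cursor.Current != '\0')
        {
            chars.Add(cursor.Current);
            cursor.Advance();
        }
        return chars;
    }
}
```

Once the end is reached, further calls keep returning '\0', so the caller only ever needs to test for one value. A side effect of this design is that a literal '\0' inside the input would also be read as end of input, which is harmless for console input.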
Now we need a method that iterates through the whole input and creates tokens:
```csharp
public List<Tokens> Get_Tokens()
{
    while (true)
    {
        if (curr_input == '\0')
        {
            Tokens eofToken = new Tokens(Token.EOF, null);
            tokens.Add(eofToken); // Add the end-of-file token
            break;
        }
        Get_Next();
    }
    return tokens;
}
```
Our Get_Tokens method iterates through the whole input and generates the tokens. For now it only checks for the null character: when it reaches it, it adds an EOF token and breaks out of the while loop.
While we're here, let's override ToString() so that we can see all the tokens:
```csharp
public override string ToString()
{
    StringBuilder sb = new StringBuilder();
    foreach (var token in tokens)
    {
        sb.Append(token.ToString());
    }
    return sb.ToString();
}
```
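Because the unit tests at the end compare against this exact string, it may help to see the format concretely. This stand-alone sketch copies Token and Tokens from above and adds a hypothetical TokenPrinter.Render helper that concatenates tokens the same way Lexer.ToString() does.

```csharp
using System.Collections.Generic;
using System.Text;

public enum Token { NUMBER = 0, ADD, MINUS, MULTIPLY, DIVISION, RBRACE, LBRACE, EOF }

public class Tokens
{
    public readonly Token _tokenType;
    public readonly object _value;

    public Tokens(Token tokenType, object value)
    {
        _tokenType = tokenType;
        _value = value;
    }

    // Same format as Tokens.ToString(): " TYPE:value" (null prints as nothing)
    public override string ToString() => " " + _tokenType + ":" + _value;
}

public static class TokenPrinter
{
    // Hypothetical helper that concatenates tokens exactly like Lexer.ToString()
    public static string Render(IEnumerable<Tokens> tokens)
    {
        var sb = new StringBuilder();
        foreach (var token in tokens)
        {
            sb.Append(token.ToString());
        }
        return sb.ToString();
    }
}
```

A token without a value, such as ADD, prints as " ADD:", a NUMBER holding 42 prints as " NUMBER:42", and the lone EOF token produced by the skeleton Get_Tokens prints as " EOF:".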
To view the output, update the last Console.WriteLine in Program.cs like so:
```csharp
// generate tokens
Lexer lexer = new Lexer(input);
List<Tokens> tokens = lexer.Get_Tokens();
Console.WriteLine(">> {0}", lexer.ToString());
```
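Now, when you run it and press ENTER, you'll see a single EOF token printed (>>  EOF:) whatever you typed, because the loop simply skips every character until it reaches the end of the input.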
Lexing Numbers
A basic number really only consists of the following:
- Digits from 0-9
- An optional decimal point, as in 0.455 or 9.345
With this in mind, let's create a method that generates a number token and stores its value as a decimal, a 128-bit type that is more than large enough for our tests.
Let's start by listing the characters we expect a number to contain; a list will do. Add the list below at the top of the class, alongside the other private fields:
```csharp
// LIST CHECKER
private List<char> NumberList = new List<char> { '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
```
We can now define our Generate_Number method:
```csharp
private Tokens Generate_Number()
{
    int decimal_count = 0;
    StringBuilder sb = new StringBuilder();
    while (NumberList.Contains(curr_input))
    {
        if (curr_input == '.')
        {
            decimal_count++;
        }
        if (sb.Length < 1 && decimal_count > 0)
        {
            // The number starts with a decimal point and no
            // preceding digit, i.e. .6767 becomes 0.6767
            sb.Append("0");
        }
        sb.Append(curr_input);
        Get_Next();
    }
    string str = sb.ToString();
    // Parse with the invariant culture so '.' is always the decimal point;
    // a malformed number such as 56.564.4657 throws a FormatException
    decimal val = Convert.ToDecimal(str, System.Globalization.CultureInfo.InvariantCulture);
    return new Tokens(Token.NUMBER, val);
}
```
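To try the number rules without the rest of the lexer, here is a minimal sketch that works on a plain string and index instead of curr_input. NumberReader and ReadNumber are hypothetical names; the logic follows Generate_Number(): consume number characters, prefix a leading '.' with '0', and leave malformed input for the parser to reject.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NumberReader
{
    // Characters a number may contain, matching NumberList
    private const string NumberChars = ".0123456789";

    // Hypothetical stand-alone version of Generate_Number(): starting at pos,
    // consume number characters, then parse them as a decimal.
    // pos is left on the first character that is not part of the number.
    public static decimal ReadNumber(string input, ref int pos)
    {
        var sb = new StringBuilder();
        while (pos < input.Length && NumberChars.IndexOf(input[pos]) >= 0)
        {
            // A leading '.' gets a '0' in front, e.g. .6767 -> 0.6767
            if (sb.Length == 0 && input[pos] == '.')
            {
                sb.Append('0');
            }
            sb.Append(input[pos]);
            pos++;
        }
        // Throws FormatException for lexemes such as 56.564.4657
        return Convert.ToDecimal(sb.ToString(), CultureInfo.InvariantCulture);
    }
}
```

For input 12.5+3 the reader returns 12.5 and stops at index 4, the '+', which is exactly where the lexer's while loop picks up again.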
Let's update the while loop in Get_Tokens like so:
```csharp
while (true)
{
    if (curr_input == ' ' || curr_input == '\t')
    {
        // Skip empty space
        Get_Next();
        continue;
    }
    else if (NumberList.Contains(curr_input))
    {
        // Generate_Number() advances past the whole number itself
        Tokens numberToken = Generate_Number();
        tokens.Add(numberToken);
    }
    else if (curr_input == '\0')
    {
        Tokens eofToken = new Tokens(Token.EOF, null);
        tokens.Add(eofToken);
        break;
    }
}
```
Run the application again and enter a number; for example, entering 42 prints >>  NUMBER:42 EOF:. At this stage only digits, decimal points and spaces are handled.
Lexing Operators
Let's now add tokens for + - * / ( ). They all follow the same pattern (add the token, then move to the next character), so we update the while loop in Get_Tokens:
```csharp
while (true)
{
    if (curr_input == ' ' || curr_input == '\t')
    {
        // Skip empty space
        Get_Next();
        continue;
    }
    else if (NumberList.Contains(curr_input))
    {
        Tokens numberToken = Generate_Number();
        tokens.Add(numberToken);
    }
    else if (curr_input == '+')
    {
        Tokens additionToken = new Tokens(Token.ADD, null);
        tokens.Add(additionToken);
        Get_Next();
    }
    else if (curr_input == '-')
    {
        Tokens minusToken = new Tokens(Token.MINUS, null);
        tokens.Add(minusToken);
        Get_Next();
    }
    else if (curr_input == '*')
    {
        Tokens multiplyToken = new Tokens(Token.MULTIPLY, null);
        tokens.Add(multiplyToken);
        Get_Next();
    }
    else if (curr_input == '/')
    {
        Tokens divideToken = new Tokens(Token.DIVISION, null);
        tokens.Add(divideToken);
        Get_Next();
    }
    else if (curr_input == '(')
    {
        Tokens lbraceToken = new Tokens(Token.LBRACE, null);
        tokens.Add(lbraceToken);
        Get_Next();
    }
    else if (curr_input == ')')
    {
        Tokens rbraceToken = new Tokens(Token.RBRACE, null);
        tokens.Add(rbraceToken);
        Get_Next();
    }
    else if (curr_input == '\0')
    {
        Tokens eofToken = new Tokens(Token.EOF, null);
        tokens.Add(eofToken);
        break;
    }
}
```
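Since every operator branch has the same shape, one alternative (not the approach used in this series) is a lookup table. This sketch uses a hypothetical OperatorTable class that maps each operator character to its token type with a Dictionary; it repeats the Token enum so it compiles on its own.

```csharp
using System.Collections.Generic;

public enum Token { NUMBER = 0, ADD, MINUS, MULTIPLY, DIVISION, RBRACE, LBRACE, EOF }

public static class OperatorTable
{
    // Maps each single-character operator to its token type
    private static readonly Dictionary<char, Token> Operators = new Dictionary<char, Token>
    {
        ['+'] = Token.ADD,
        ['-'] = Token.MINUS,
        ['*'] = Token.MULTIPLY,
        ['/'] = Token.DIVISION,
        ['('] = Token.LBRACE,
        [')'] = Token.RBRACE,
    };

    // Returns true and the token type when c is a known operator
    public static bool TryGetOperator(char c, out Token token) =>
        Operators.TryGetValue(c, out token);
}
```

Inside Get_Tokens, the six operator branches could then collapse into a single else if that calls OperatorTable.TryGetOperator(curr_input, out Token op), adds new Tokens(op, null), and calls Get_Next(). Adding a new operator then only means adding one dictionary entry.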
Now you can run the program and enter some operators; for example, ( 1 + 2 ) prints >>  LBRACE: NUMBER:1 ADD: NUMBER:2 RBRACE: EOF:.
Error Handling
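So far, we haven't handled some edge cases:
- What if the input contains unsupported characters, such as letters?
- What if a number has multiple decimal points?
For unknown characters, add the else branch below at the end of the while loop above. Without it, an unrecognized character matches no branch, so the position never advances and the loop runs forever.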
```csharp
else
{
    throw new InvalidOperationException($"{curr_input} is an unsupported type");
}
```
When a user enters an invalid number such as 56.564.4657, Convert.ToDecimal() throws a FormatException, so that case is already handled.
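That FormatException comes from the parse step alone, so it is easy to check in isolation. The hypothetical DecimalParsingDemo class below also shows a caveat worth knowing: Convert.ToDecimal(string) without a format provider parses using the current culture. In a culture where '.' is the thousands separator (modelled here with a hand-built NumberFormatInfo), 565.788 parses as 565788, and 56.564.4657 would not throw at all. Passing CultureInfo.InvariantCulture keeps '.' as the decimal point everywhere.

```csharp
using System;
using System.Globalization;

public static class DecimalParsingDemo
{
    // Parses with the invariant culture, where '.' is always the decimal point
    public static decimal ParseInvariant(string s) =>
        Convert.ToDecimal(s, CultureInfo.InvariantCulture);

    // Parses with a format where ',' is the decimal point and '.' is the
    // thousands separator, as in several European locales
    public static decimal ParseCommaDecimal(string s)
    {
        var format = new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "."
        };
        return Convert.ToDecimal(s, format);
    }

    // Returns true when invariant parsing throws the FormatException the lexer relies on
    public static bool ThrowsFormatException(string s)
    {
        try
        {
            ParseInvariant(s);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```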
Let's wrap the lexer code in Program.cs in a try-catch block like so:
```csharp
try
{
    // generate tokens
    Lexer lexer = new Lexer(input);
    List<Tokens> tokens = lexer.Get_Tokens();
    Console.WriteLine(">> {0}", lexer.ToString());
}
catch (Exception ex)
{
    // Print lexer errors (InvalidOperationException, FormatException) in red
    Console.ForegroundColor = ConsoleColor.Red;
    Console.Error.WriteLine(ex.Message);
    // Restore the console's default colors
    Console.ResetColor();
}
```
Now errors are printed in red instead of crashing the program. For example, entering a prints: a is an unsupported type
Unit Tests
- Right-click the solution > Add > New Project.
- Choose xUnit.
- Give it a project name such as calcy.test, then click Next until the project appears in your solution.
- Right-click Dependencies in the test project and select Add Project Reference.
- Check the project shown (it should be the math interpreter), then click OK.
- Rename UnitTest1 to LexerTest.
Then add the following tests:
```csharp
using System;
using System.Collections.Generic;
using Xunit;
// Also add a using for your interpreter project's namespace, if it has one

public class LexerTest
{
    [Fact]
    public void TestAllTokens()
    {
        string expected = " LBRACE: RBRACE: NUMBER:4646 ADD: MINUS: MULTIPLY: DIVISION: NUMBER:565.788 EOF:";
        Lexer lexer = new Lexer("( ) 4646 + - * / 565.788");
        List<Tokens> tokens = lexer.Get_Tokens();
        string actual = lexer.ToString();
        Assert.NotEmpty(tokens);
        Assert.Equal(expected, actual);
    }

    [Fact]
    public void TestInvalidCharacters()
    {
        Lexer lexer = new Lexer("Wabebe");
        Assert.Throws<InvalidOperationException>(() => lexer.Get_Tokens());
    }

    [Fact]
    public void TestInvalidDecimalNumber()
    {
        Lexer lexer = new Lexer("35.4533.4546");
        Assert.Throws<FormatException>(() => lexer.Get_Tokens());
    }
}
```
To run the tests in Visual Studio, right-click the test project and select Run Tests. Alternatively, from the dotnet CLI you can use:
```shell
dotnet test .\calcy.test
```
In part 3, we will:
- Create an AST