Let's Get Started
This is a simple lexer written in Rust that can tokenize arithmetic expressions containing numbers and the +, -, *, and / operators.
Token
The Token enum represents the different types of tokens that can be produced by the lexer. It has five variants: Number(i32), Plus, Minus, Multiply, and Divide.
#[derive(Debug, PartialEq)]
pub enum Token {
    Number(i32),
    Plus,
    Minus,
    Multiply,
    Divide,
}
The Number(i32) variant represents a number token and contains an integer value. The other variants represent the four arithmetic operators.
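The two derived traits matter in practice: Debug lets a token be printed with {:?}, and PartialEq lets tokens be compared with == or assert_eq!. A minimal sketch of both, plus matching on the variants, follows. The describe helper is hypothetical and is not part of the lexer in this article.

```rust
// Same enum as in the article.
#[derive(Debug, PartialEq)]
pub enum Token {
    Number(i32),
    Plus,
    Minus,
    Multiply,
    Divide,
}

// Hypothetical helper for illustration: turns a token back into
// the text it stands for by matching on each variant.
pub fn describe(token: &Token) -> String {
    match token {
        Token::Number(n) => n.to_string(),
        Token::Plus => "+".to_string(),
        Token::Minus => "-".to_string(),
        Token::Multiply => "*".to_string(),
        Token::Divide => "/".to_string(),
    }
}

// Hypothetical helper: uses the derived Debug implementation.
pub fn debug_text(token: &Token) -> String {
    format!("{:?}", token)
}
```

Without PartialEq, comparing two Token values with == would not compile, so the trait is required for any equality-based test of the lexer's output.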
Lexer
The Lexer struct represents the lexer itself. It has one field, chars, which is an iterator over the characters of the input string.
use std::str::Chars;

pub struct Lexer<'a> {
    chars: Chars<'a>,
}
The lifetime parameter 'a indicates that the lexer borrows its input string, so the string must stay alive for as long as the lexer is in use.
new
The new method creates a new instance of the lexer with a given input string.
impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Lexer { chars: input.chars() }
    }
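To see what the borrow means in practice, the sketch below builds a lexer over an owned String. The remaining helper and the demo function are hypothetical names added only for illustration; the struct and new are the ones shown above.

```rust
use std::str::Chars;

pub struct Lexer<'a> {
    chars: Chars<'a>,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Lexer { chars: input.chars() }
    }
}

// Hypothetical helper: counts the characters the lexer has not consumed yet.
// Cloning the iterator leaves the lexer's own position untouched.
pub fn remaining(lexer: &Lexer<'_>) -> usize {
    lexer.chars.clone().count()
}

// Hypothetical demo: the String owns the text, the lexer only borrows it.
pub fn demo() -> usize {
    let source = String::from("1+2");
    let lexer = Lexer::new(&source);
    // drop(source); // would not compile here: `source` is still borrowed by `lexer`
    remaining(&lexer)
}
```

Because the lexer holds only a borrowed iterator, creating it copies no text. The cost is that the compiler rejects any code that lets the input go away while the lexer is still used.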
tokenize
The tokenize method tokenizes the input string and returns a vector of tokens. It repeatedly calls the private method next_token to obtain each token until there are no more tokens left.
    pub fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while let Some(token) = self.next_token() {
            tokens.push(token);
        }
        tokens
    }
next_token
The private method next_token returns the next token from the input string or None if there are no more tokens left. It uses pattern matching on characters to determine which type of token to return.
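    fn next_token(&mut self) -> Option<Token> {
        // Skip whitespace between tokens, then take the first significant character.
        // Returns None when the input is exhausted.
        let next_char = self.chars.find(|c| !c.is_whitespace())?;
        match next_char {
            '+' => Some(Token::Plus),
            '-' => Some(Token::Minus),
            '*' => Some(Token::Multiply),
            '/' => Some(Token::Divide),
            '0'..='9' => {
                let mut number = next_char.to_digit(10)? as i32;
                // Peek at the next character by cloning the iterator,
                // and consume it only if it is another digit.
                while let Some(next_char) = self.chars.clone().next() {
                    if let Some(digit) = next_char.to_digit(10) {
                        number = number * 10 + digit as i32;
                        self.chars.next();
                    } else {
                        break;
                    }
                }
                Some(Token::Number(number))
            }
            // Any other character ends tokenization.
            _ => None,
        }
    }
}
The method first skips any whitespace, which is what lets expressions be written with spaces around the operators. If it then encounters a character representing one of the four arithmetic operators (+, -, *, /), it returns the corresponding Token variant. If it encounters a digit character (0 to 9), it reads all subsequent digit characters to form a number and returns a Number token with that value. If it encounters any other character, it returns None. Because tokenize stops at the first None, an unrecognized character silently ends tokenization instead of reporting an error.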
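The digit-reading step can also be tried on its own. The sketch below is a variation, not the article's code: it uses the standard library's Peekable adapter and its peek method to look at the next character, in place of cloning the iterator. The read_number function is a hypothetical standalone helper.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical helper: reads one run of decimal digits from the front of
// `chars` and returns its value. If the first character is not a digit,
// nothing is consumed and None is returned.
pub fn read_number(chars: &mut Peekable<Chars<'_>>) -> Option<i32> {
    // Look at the first character without consuming it.
    let mut number = chars.peek()?.to_digit(10)? as i32;
    chars.next();
    // Keep folding digits into the value: "123" becomes ((1 * 10) + 2) * 10 + 3.
    while let Some(digit) = chars.peek().and_then(|c| c.to_digit(10)) {
        number = number * 10 + digit as i32;
        chars.next();
    }
    Some(number)
}
```

As in the original loop, the first non-digit character is left in the iterator for the next call. Neither version guards against digit runs too long for an i32, so a production lexer would probably want checked arithmetic there.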
Example
Here's an example that shows how to use the lexer to tokenize an arithmetic expression:
let mut lexer = Lexer::new("1 + 2 * 3 - 4 / 5");
let tokens = lexer.tokenize();
assert_eq!(
    tokens,
    vec![
        Token::Number(1),
        Token::Plus,
        Token::Number(2),
        Token::Multiply,
        Token::Number(3),
        Token::Minus,
        Token::Number(4),
        Token::Divide,
        Token::Number(5)
    ]
);
This code creates a new instance of the Lexer with the input string "1 + 2 * 3 - 4 / 5", calls its tokenize method to obtain a vector of tokens, and then asserts that the resulting vector of tokens is equal to the expected value.
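It is also worth knowing what happens with input the lexer does not recognize. The sketch below repeats the lexer from this article in one piece, in a form that skips whitespace before each token (treat that as an assumption of the sketch), and adds a hypothetical lex helper to show that tokenization stops at the first unknown character.

```rust
use std::str::Chars;

#[derive(Debug, PartialEq)]
pub enum Token {
    Number(i32),
    Plus,
    Minus,
    Multiply,
    Divide,
}

pub struct Lexer<'a> {
    chars: Chars<'a>,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Lexer { chars: input.chars() }
    }

    pub fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while let Some(token) = self.next_token() {
            tokens.push(token);
        }
        tokens
    }

    fn next_token(&mut self) -> Option<Token> {
        // Assumption of this sketch: whitespace between tokens is skipped.
        let next_char = self.chars.find(|c| !c.is_whitespace())?;
        match next_char {
            '+' => Some(Token::Plus),
            '-' => Some(Token::Minus),
            '*' => Some(Token::Multiply),
            '/' => Some(Token::Divide),
            '0'..='9' => {
                let mut number = next_char.to_digit(10)? as i32;
                while let Some(digit) = self.chars.clone().next().and_then(|c| c.to_digit(10)) {
                    number = number * 10 + digit as i32;
                    self.chars.next();
                }
                Some(Token::Number(number))
            }
            // Unknown character: signal "no more tokens".
            _ => None,
        }
    }
}

// Hypothetical convenience wrapper for illustration.
pub fn lex(input: &str) -> Vec<Token> {
    Lexer::new(input).tokenize()
}
```

With this sketch, lex("12 + x3") yields only Number(12) and Plus: the x produces None, and tokenize treats that the same as the end of the input. Returning a Result from tokenize would be one way to report such characters instead of dropping them.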
If there is anything else you would like to know, contact me at @SensoryKopi.