Inspiration
I have always been fascinated by compilers and interpreters and their ability to convert human-readable text into something machines can understand. It always felt magical — almost like a black box.
I went through the Crafting Interpreters book and it was incredibly enlightening. But before implementing a full fledged programming language, I wanted to warm up with something smaller
A mathematical expression evaluator.
Turns out, even a tiny calculator contains many of the same ideas used in real language implementations.
TL;DR
In this article we build a small mathematical expression evaluator from scratch using:
- a Lexer to tokenize input
- a Recursive Descent Parser to generate an AST
- an Interpreter to walk the tree and evaluate expressions
Along the way we explore:
- grammars
- operator precedence
- recursive parsing
- abstract syntax trees
- tree-walking interpreters
The goal isn’t to build the “best” calculator parser, but to understand the core ideas that power real compilers and interpreters.
Why do this?
Even though we are not building a full programming language, the stages involved are surprisingly similar:
- Lexing
- Parsing
- AST generation
- Tree walking
- Interpretation
Building one yourself makes parsers stop feeling magical.
Why build an Expression Evaluator?
- Understand how interpreters work internally
- Learn recursive descent parsing
- Learn how ASTs are constructed
- Understand operator precedence naturally
NOTE: Recursive descent is not necessarily the "best" parser here, but it’s the best parser to teach parsing concepts.
The Input We Want to Understand
Say we have this expression:
1 + 2 * (3 + 4)
To evaluate it correctly, we need to understand:
- precedence
- nesting
- structure
Evaluating strictly left-to-right would produce the wrong result.
This is where parsing happens.
Stages of Parsing
The overall big picture looks like this

Each stage transforms the input into a more structured representation.
1. Lexer
What is a Lexer?
A lexer converts raw text into meaningful symbols called tokens.
For example, the expression
(40 + 2) / (3 - 1)
becomes
L_PAREN
NUMBER(40)
PLUS
NUMBER(2)
R_PAREN
SLASH
...
Instead of dealing with raw characters directly, the parser can now work with structured symbols.
A simplified definition of Token class can look like
class TokenType(Enum):
PLUS = '+'
MINUS = '-'
NUMBER = 'NUMBER'
# along with parenthesis, division, EOF etc.
class Token:
t_type: TokenType
value: str
NOTE:
EOForEnd of Filemarks the end of an expression
At the end of lexing we have a clean stream of tokens to work with
2. Parser
The lexer gives us tokens, but tokens alone do not describe relationships.
The parser is responsible for establishing the relationship and structure.
The Grammar
The tokens from the Lexer are just a representation of the source expression, but they lack rules and relationships that the expression needs to follow.
This is where the Grammar comes in. The Grammar defines said rules and allows us to represent the source expression in a Tree structure called the Abstract Syntax Tree (AST)
This is where:
- Precedence naturally emerges
- Rules like multiplication binds tighter than addition is logically represented
The grammar defines rules our expressions must follow.
expression ::= term (('+' | '-') term)*
term ::= unary (('*' | '/') unary)*
unary ::= ('+' | '-') unary | factor
factor ::= NUMBER | '(' expression ')'
NOTE: The rule names like
expression,term,unary, andfactorare not arbitrary. They are referenced from actual mathematical notations
This structure naturally captures operator precedence:
-
*and/bind tighter than+and- -
parenthesescreate nested expressions
The recursive nature of these rules is what gives recursive descent parsing its name.
Each grammar rule becomes a parsing function.
Building the AST?
The parser does not directly compute results.
Instead, it builds an Abstract Syntax Tree (AST).
For example:
1 + 2 * 3
It can be represented as
+
/ \
1 *
/ \
2 3
This tree correctly captures precedence.
The AST nodes themselves are simple:
class NumberNode(ASTNode):
value: float
class BinOpNode(ASTNode):
left: ASTNode
op: Token
right: ASTNode
class UnaryOpNode(ASTNode):
op_token: Token
expr: ASTNode
The idea here is that:
The tree stores structure not evaluation
Implementing the Parser
Now we finally start building the AST. The parser mirrors the grammar directly.
For example, the expression() rules can be written as:
def _expression(self) -> ASTNode:
node: ASTNode = self._term()
while self._current_token.t_type in (
TokenType.PLUS,
TokenType.MINUS
):
op: Token = self._current_token
self._eat(op.t_type)
# Create a tree branch: left is the previous node, right is the next term
node: ASTNode = BinOpNode(
left=node,
op_token=op,
right=self._term()
)
return node
The parser recursively descends through the grammar while constructing the AST.
This is the core idea behind recursive descent parsing.
3. Interpreting the AST
Now that we have a tree structure, evaluating the expression becomes much easier.
The interpreter recursively walks the AST and evaluates each node.
A simplified version looks like:
def evaluate(self, node: ASTNode):
# check for other types of nodes
if isinstance(node, BinOpNode):
left = self.evaluate(node.left)
right = self.evaluate(node.right)
match node.op_token:
case TokenType.PLUS:
return left + right
case TokenType.STAR:
return left * right
Evaluation happens bottom-up:
- evaluate child expressions first
- then combine the results
This traversal naturally respects operator precedence because the AST already encodes it.
Puting Everything Together
The final pipeline is an elegant sequence of functions calls like:
tokens = Lexer(expr).lex()
ast = Parser(tokens).parse()
result = Interpreter().evaluate(ast)
What started as raw text is now transformed into a structured representation that can be evaluated safely and correctly.
Why Recursive Descent?
For expression parsing specifically, recursive descent is not the only approach.
Pratt parsers and the Shunting Yard algorithm are often more elegant or flexible for handling precedence.
However, recursive descent makes the parsing process extremely explicit:
- grammar rules map directly to code
- recursion mirrors nested structure
- AST construction becomes intuitive
For learning how parsers and interpreters work internally, this tradeoff is usually worth it.
Closing Thoughts
A simple calculator turned out to contain many of the same ideas used in real programming language implementations:
- tokenization
- grammars
- tree structures
- recursive evaluation
Once I implemented these pieces myself, parsers stopped feeling magical and start feeling elegant.
The entire code for this article is present in my Github
Do let me know what are your thoughts on parsers and if you have implemented one yourself!
Top comments (0)