DEV Community

Pavel Kutáč

Posted on • Updated on

Recursive descent parser - expression calculator

Another way how to parse input tokens is to use a Recursive descent parser. Comparing to Shunting yard from previous articles, this parser can manage more complicated grammars, not only math operations.

🇨🇿 V češtině si lze článek přečíst na kutac.cz

By the name, it is quite clear that recursion is the key in this parser. Each function handles one grammar rule and functions are calling each other. To understand how to design a parser properly, you need to first understand the basics of context-free grammar.

🔗 See whole code on GitHub in arxeiss/go-expression-calculator repository, you can run REPL (interactive mode) too.
⚠️ Theory about grammars is boring, but the parser is based on context-free grammars. So at least the explanation below is required.

LL(k) grammar

A better explanation can be found on Wikipedia - LL parser. But for now, the basics are enough. Class of grammar marked as `LL(k)` means, the grammar does not contain left recursion. And parser needs to know at most `k` tokens to decide which way to go next. In calculator, it would be `LL(1)`. The issue is when there is variable token in the beginning. It can continue with numeric operator, or with assign operator `=`. In both cases, the following step is different. So my calculator is `LL(2)`.

No left recursion

`LL(k)` cannot contain left recursion. The exact definition can be found again on Wiki - Left recursion. An easier explanation is with sample code, which you cannot actually finish.

``````// S is input (non-terminal), can be rewritten with following rules
S -> S + N | N // S can be rewritten recursively to S + N or only to N
N -> number // N can be any valid number
``````

There is left recursion at non-terminal S. Let's try to write a parser for this grammar. The input can be something like `8 + 7 + 2`. The code is simplified into pseudo-code.

``````func (p Parser) parseNumber() {
return p.expect(Number) // Expect number or return error
}
func (p Parser) parseExpression() {
if p.has(???) { // This condition is impossible to finish
left, err := parserExpression() // The endless loop happening here
operator, err := p.parseOperator()
right, err := parserNumber()
} else {
return p.parseNumber()
}
}
``````

You can see in the code, where the endless loop starts. The reason is left recursion in the grammar. If you would put there `p.has(Number)`, it would be always true. There is no way how to know if you should continue with recursion, or parse number and ends. However, it is enough to rewrite the grammar to have only right recursion. For our example it is easy. But for complex grammars, it might be not so easy. See Wiki - Removing left recursion.

ℹ️ If someone thinks, it can be checked like `p.has(Number) && p.hasNext(operator)`, it is not possible. Because in the recursive call, the internal pointer would still point to the beginning of input. And if someone would parse number and operator and then do the recursive call, it would be right recursion. And that is described below.

``````// Recursion is now on the right side
S -> N + S | N // S can be rewritten to N + S or N
N -> number
``````
``````// Now, number is parsed first, then operator is checked. No endless loop here
// In case something cannot be parsed, error is returned
func (p Parser) parseExpression() {
left, err := p.parseNumber()
operator, err := p.parseOperator()
right, err := parserExpression()
}
}
``````

Parse with configurable precedence and associativity

Recursive descent parser is almost like mapping grammar to functions. A nice example is also in Wiki - Recursive descent parser. The implementation is not so hard. However, my calculator has configurable precedence and associativity of operators. And that is complicating the main idea of the parser.

The solution I took is to implement 1 main function. That one is called recursively like with basic approach. The first step is to go recursively so deep as possible. Then the operator priority is checked. Then it is either parsed, or parser steps out of recursion and handles operator with lower priorities. Below is a simplified implementation of this solution. Full code with tests can be found on github.com/arxeiss/go-expression-calculator.

``````func (p *parserInstance) parseExpression(currentPrecedence parser.TokenPrecedence) (ast.Node, error) {
var leftNode ast.Node
var err error

// Like in basic implementation, first step is to go to deepest iteration
// See example https://en.wikipedia.org/wiki/Recursive_descent_parser
if currentPrecedence < p.maxPrecedence {
leftNode, err = p.parseExpression(p.parser.priorities.NextPrecedence(currentPrecedence))
}

// If there is nothing from deeper iteration, we can expect '(' or number or identifier
if leftNode == nil {
switch {
case p.has(lexer.LPar, lexer.Identifier, lexer.Number):
leftNode, err = p.parseTerm()
// Here we should handle also unary operators
}
}
// Iterate over tokens with same priority
for p.getPrecedence(p.current().Type()) == currentPrecedence {
// Now, there must be operator
operatorToken, err := p.expect(binaryOperators...)
// Here we should handle also unary operators

// If operator is RightPrecedence, keep same precedence as right parts should be lower in AST
// So next iteration with same precedence will parse it first
nextPrecedence := currentPrecedence
if p.getAssociativity(operatorToken.Type()) == parser.LeftAssociativity {
nextPrecedence = p.parser.priorities.NextPrecedence(currentPrecedence)
}
rightNode, err := p.parseExpression(nextPrecedence)
return ast.NewBinaryNode(tokenTypeToOperation(operatorToken.Type()), leftNode, rightNode, operatorToken), nil
}
return leftNode, err
}
``````