There are many ways to parse input tokens in the correct order while respecting parentheses, operator precedence, and associativity. But the Shunting Yard algorithm is probably the best known, especially for mathematical expressions.

🇨🇿 You can read this article in Czech on kutac.cz

The output of the lexical analysis from the previous post is a **list of tokens**, which is basically the input in a different format. The parser's job is to build a data structure from the token list that will be easy to evaluate, and in the correct order, so parentheses and operators with higher priority are evaluated first.

The Shunting Yard algorithm is known for converting **infix notation** into Reverse Polish notation, also called **postfix notation**. However, RPN has several downsides. Precisely because RPN does not use parentheses, it cannot distinguish between unary and binary operators like `+` or `-`, nor handle functions with a variable number of arguments.
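For example, if the unary minus is emitted as the same `-` token as the binary one, a single RPN sequence can have two valid readings:

```
5 2 - -   →  -(5 - 2) = -3   (binary -, then unary -)
5 2 - -   →  5 - (-2) =  7   (unary - applied to 2, then binary -)
```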

So my implementation below is modified to produce an abstract syntax tree (AST) rather than RPN.
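To illustrate why an AST is easier to evaluate, here is a minimal sketch of what such a tree can look like. The type names (`Node`, `Num`, `Binary`) are simplified stand-ins for illustration, not the actual `ast` package from the repository:

```go
package main

import "fmt"

// Node is any expression that can be evaluated.
type Node interface{ Eval() float64 }

// Num is a leaf node holding a numeric literal.
type Num float64

func (n Num) Eval() float64 { return float64(n) }

// Binary is an inner node: an operator with two child subtrees.
type Binary struct {
	Op          rune
	Left, Right Node
}

func (b Binary) Eval() float64 {
	// Children are evaluated first, so deeper nodes
	// (higher-priority operators, parenthesized groups) run first.
	l, r := b.Left.Eval(), b.Right.Eval()
	switch b.Op {
	case '+':
		return l + r
	case '*':
		return l * r
	}
	return 0
}

func main() {
	// 2 + 3 * 4 parses into a tree where * sits deeper than +.
	tree := Binary{'+', Num(2), Binary{'*', Num(3), Num(4)}}
	fmt.Println(tree.Eval()) // 14
}
```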

# Algorithm

The Shunting Yard algorithm was named by E. W. Dijkstra after railway classification (marshalling) yards, because the whole process is about moving tokens between the **input queue**, the **operator stack**, and the **output queue/stack**. My implementation produces an AST, so in my case the output is an output stack.
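As a simplified illustration (with the output shown in RPN order for brevity), parsing `2 + 3 * 4` moves tokens like this; in the AST-producing variant, popping an operator combines the top output nodes into a new tree node instead of appending the operator:

```
Token   Operator stack   Output
2                        2
+       +                2
3       +                2 3
*       + *              2 3            (* has higher precedence than +, so it is pushed)
4       + *              2 3 4
end                      2 3 4 * +      (remaining operators are popped)
```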

🔗 ⚠️ Code below is simplified. See whole code on GitHub in arxeiss/go-expression-calculator repository.

```go
func (p *Parser) Parse(tokenList []*lexer.Token) (ast.Node, error) {
	output := make([]ast.Node, 0)
	opStack := make([]*lexer.Token, 0)
	var err error
	for i := 0; i < len(tokenList); i++ {
		curToken := tokenList[i]
		switch curToken.Type() {
		case lexer.Number:
			// if the token is a number, then: push it to the output queue
			output = append(output, ast.NumericNode(curToken.Value()))
		case lexer.Identifier:
			// if the token is a function, then: push it onto the operator stack
			if i+1 < len(tokenList) && tokenList[i+1].Type() == lexer.LPar {
				// Expecting the identifier to be a function name, because it is followed by (
				opStack = append(opStack, tokenList[i])
			} else {
				// if the token is a variable name, then: push it to the output queue
				output = append(output, ast.VariableNode(tokenList[i].Identifier()))
			}
		case lexer.Addition, lexer.Substraction, lexer.Exponent, lexer.Multiplication,
			lexer.Division, lexer.FloorDiv, lexer.Modulus:
			// if the token is an operator, then: while there is an operator at the top of the operator stack
			for len(opStack) > 0 {
				topStackEl := opStack[len(opStack)-1]
				if topStackEl.Type() == lexer.LPar {
					break
				}
				// If the operator at the top of the operator stack has greater precedence
				// OR
				// the operator at the top of the operator stack has equal precedence AND the token is left associative
				if p.priorities.GetPrecedence(topStackEl.Type()) > p.priorities.GetPrecedence(curToken.Type()) ||
					(p.priorities.GetPrecedence(topStackEl.Type()) == p.priorities.GetPrecedence(curToken.Type()) &&
						p.priorities.GetAssociativity(curToken.Type()) == parser.LeftAssociativity) {
					// pop operators from the operator stack onto the output queue
					output, err = p.addToOutput(output, topStackEl)
					opStack = opStack[:len(opStack)-1]
					continue
				}
				break
			}
			// push the token onto the operator stack
			opStack = append(opStack, curToken)
		case lexer.LPar:
			// if the token is a left parenthesis, then: push it onto the operator stack
			opStack = append(opStack, curToken)
		case lexer.RPar:
			// if the token is a right parenthesis, then:
			// while the operator at the top of the operator stack is not a left parenthesis:
			for len(opStack) > 0 {
				topStackEl := opStack[len(opStack)-1]
				if topStackEl.Type() == lexer.LPar {
					break
				}
				// pop the operator from the operator stack onto the output queue
				output, err = p.addToOutput(output, topStackEl)
				opStack = opStack[:len(opStack)-1]
			}
			// if there is a left parenthesis at the top of the operator stack, then:
			// pop the operator from the operator stack and discard it
			if len(opStack) > 0 {
				opStack = opStack[:len(opStack)-1]
			}
			// if there is a function token at the top of the operator stack, then:
			// pop the function from the operator stack onto the output queue
			if len(opStack) > 0 && opStack[len(opStack)-1].Type() == lexer.Identifier {
				output, err = p.addToOutput(output, opStack[len(opStack)-1])
				opStack = opStack[:len(opStack)-1]
			}
		}
		if err != nil {
			return nil, err
		}
	}
	// while there are still operator tokens on the stack
	for len(opStack) > 0 {
		// pop the operator from the operator stack onto the output queue
		topStackEl := opStack[len(opStack)-1]
		output, err = p.addToOutput(output, topStackEl)
		opStack = opStack[:len(opStack)-1]
	}
	// the single node remaining on the output stack is the root of the AST
	return output[0], err
}

// addToOutput pops operand nodes from the output stack and replaces them with a new AST node
func (p *Parser) addToOutput(output []ast.Node, token *lexer.Token) ([]ast.Node, error) {
	switch t := token.Type(); t {
	case lexer.UnaryAddition, lexer.UnarySubstraction:
		output[len(output)-1] = ast.NewUnaryNode(t, output[len(output)-1])
	case lexer.Addition, lexer.Substraction, lexer.Multiplication, lexer.Division, lexer.Exponent,
		lexer.FloorDiv, lexer.Modulus:
		r := output[len(output)-1]
		l := output[len(output)-2]
		output = output[:len(output)-1]
		output[len(output)-1] = ast.NewBinaryNode(t, l, r)
	case lexer.Identifier:
		// Current functions support only 1 parameter
		output[len(output)-1] = ast.NewFunctionNode(token.Identifier(), output[len(output)-1])
	}
	return output, nil
}
```

# Algorithm modifications and possible extensions

The algorithm above is based on the pseudo-code from the Wikipedia page. That pseudo-code does not fully handle invalid inputs like `5 44 90` or `5 +/*-//-* 20`: it does not check whether a number or variable token is followed by an operator and vice versa. It also does not handle unary operators or functions with a variable number of arguments.

The first two issues can be easily fixed with a state variable `expect`, which can be either **ExpectOperand** or **ExpectOperator**, and which checks that each incoming token is of the expected type. The extension is described in more detail on StackOverflow. The same approach can be used to handle unary operators.
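The `expect` check can be sketched as a standalone validator. This is a simplified, hypothetical version: tokens are plain strings and only a few binary operators are recognized, whereas the real lexer uses typed tokens and this check would live inside the `Parse` loop:

```go
package main

import "fmt"

type expectState int

const (
	expectOperand expectState = iota
	expectOperator
)

// validate walks the token list and checks that operands and
// operators strictly alternate, rejecting inputs like "5 44 90".
func validate(tokens []string) error {
	expect := expectOperand
	for _, tok := range tokens {
		isOperator := tok == "+" || tok == "-" || tok == "*" || tok == "/"
		switch {
		case isOperator && expect != expectOperator:
			return fmt.Errorf("unexpected operator %q, expected operand", tok)
		case !isOperator && expect != expectOperand:
			return fmt.Errorf("unexpected operand %q, expected operator", tok)
		}
		// After an operator we expect an operand, and vice versa.
		if isOperator {
			expect = expectOperand
		} else {
			expect = expectOperator
		}
	}
	if expect == expectOperand {
		return fmt.Errorf("expression ends with an operator")
	}
	return nil
}

func main() {
	fmt.Println(validate([]string{"5", "+", "20"}))  // <nil>
	fmt.Println(validate([]string{"5", "44", "90"})) // error: unexpected operand
}
```

Handling unary operators works the same way: a `+` or `-` seen while in the **ExpectOperand** state is reclassified as a unary operator instead of being rejected.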

Handling a variable number of function parameters will be covered in a future post, as it gets a little bit harder.
