Introduction
Writing an expression evaluator, as simple as it may be, has always been a long standing challenge for me, mostly because it feels so simple, and yet, gets tricky very fast.
Writing code to parse and evaluate a simple expression with few operands and/or operators with the same precedence, is simple. However, the generalization to higher complexities can be really tricky due to multiple factors: operator precedence, expression nesting and parenthesis.
Inspired by the first chapter of what promises to be an amazing read of the book: "Engineering a Compiler", 2nd edition, and, taking only their simple definitions of a parser and a scanner, I managed to put together a small example that seems good enough to share.
Devising a language to talk about mathematical expressions
The first thing we need to define when writing something like this, is a language, that translates well into code, that allows us to think about the problem at a higher level of abstraction.
We will say that an expression is composed of tokens, or, as I called it, TokenType
s.
public enum TokenType {
ADD,
SUB,
MUL,
DIV,
POW,
LPAR,
RPAR,
VALUE;
@Override
public String toString() {
switch (this.ordinal()){
case 0:
return "+";
case 1:
return "-";
case 2:
return "*";
case 3:
return "/";
case 4:
return "^";
case 5:
return "(";
case 6:
return ")";
case 7:
return this.name();
default:
return "null";
}
}
public static TokenType fromString(String s){
switch (s){
case "+":
return TokenType.ADD;
case "-":
return TokenType.SUB;
case "*":
return TokenType.MUL;
case "/":
return TokenType.DIV;
case "^":
return TokenType.POW;
case "(":
return TokenType.LPAR;
case ")":
return TokenType.RPAR;
default:
return TokenType.VALUE;
}
}
}
This is a simple Java enum with methods to write from/to Strings into Tokens and vice-versa.
We immediately notice that the .
for decimal values is encoded in the enum as being a "value" since it's not a special token. Obviously this can be improved further, but, seems good enough for now.
So, from now on, when we talk about a mathematical expression, we will talk about it in terms of its TokenType
s.
Writing and understanding the Scanner
The scanner is the piece in a compiler front-end that will read an expression and match its pieces to known tokens, so that the expression can have a real meaning in the context where we will be evaluating it.
Starting with a simple expression like 12+5
, after a Scanner pass, we could get output along the lines of:
(12, Token.NUM) , (+, Token.ADD), (5, Token.NUM)
This will enable us to look at the expression and parse it, evaluate it, reduce it to a simpler form, etc.
In the first Scanner pass, some things will need to be corrected, for instance, if we see the expression:
(-, Token.SUB), (3, Token.NUM)
, what we really want to have is: (-3, Token.NUM)
.
I chose to do this further down the pipeline, in the Parser stage.
The code for the Scanner is as follows:
public class ScannedToken {
private String expressionPiece;
private TokenType type;
public ScannedToken(String exp, TokenType type){
this.expressionPiece = exp;
this.type = type;
}
@Override
public String toString(){
return "(Expr:"+ expressionPiece+ ", Token:"+ type+")";
}
public TokenType type(){
return type;
}
public String expression(){
return expressionPiece;
}
}
A ScannedToken
is a token annotated with the tokenType, which we will use when performing an actual expression scan.
The code for the Scanner is as follows:
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class Scanner {
private final String expression;
public Scanner(String expr) {
this.expression = expr;
}
public List<ScannedToken> scan() {
StringBuilder value = new StringBuilder();
List<ScannedToken> scannedExpr = new ArrayList<>();
for (char c : expression.toCharArray()) {
TokenType type = TokenType.fromString(new String(new char[] {c}));
if (!type.equals(TokenType.VALUE)) {
if (value.length() > 0) {
//Add the full value TOKEN
ScannedToken st = new ScannedToken(value.toString(), TokenType.VALUE);
scannedExpr.add(st);
}
value = new StringBuilder(new String(new char[] {c}));
ScannedToken st = new ScannedToken(value.toString(), type);
scannedExpr.add(st);
value = new StringBuilder();
} else {
value.append(new String(new char[] {c}));
}
}
if (value.length() > 0) {
//Add the full value TOKEN
ScannedToken st = new ScannedToken(value.toString(), TokenType.VALUE);
scannedExpr.add(st);
}
return scannedExpr;
}
public double evaluate(List<ScannedToken> tokenizedExpression) {
if (tokenizedExpression.size() == 1) {
return Double.parseDouble(tokenizedExpression.get(0).expression());
}
//Eval order is PEMDAS - Parenthesis, exponents, multiply, divide, add, subtract
List<ScannedToken> simpleExpr = new ArrayList<>();
int idx =
tokenizedExpression.stream()
.map(ScannedToken::type)
.collect(Collectors.toList())
.lastIndexOf(TokenType.LPAR);
int matchingRPAR = -1;
if (idx >= 0) {
for (int i = idx + 1; i < tokenizedExpression.size(); i++) {
ScannedToken curr = tokenizedExpression.get(i);
if (curr.type() == TokenType.RPAR) {
matchingRPAR = i;
break;
} else {
simpleExpr.add(tokenizedExpression.get(i));
}
}
} else {
simpleExpr.addAll(tokenizedExpression);
return evaluateSimpleExpression(tokenizedExpression);
}
double value = evaluateSimpleExpression(simpleExpr);
// System.out.println("val is " + value);
List<ScannedToken> partiallyEvaluatedExpression = new ArrayList<>();
for (int i = 0; i < idx; i++) {
partiallyEvaluatedExpression.add(tokenizedExpression.get(i));
}
partiallyEvaluatedExpression.add(new ScannedToken(Double.toString(value), TokenType.VALUE));
for (int i = matchingRPAR + 1; i < tokenizedExpression.size(); i++) {
partiallyEvaluatedExpression.add(tokenizedExpression.get(i));
}
// from idx find first ), extract, evaluate, replace, call recursively
// System.out.println("Expr to eval indexes: " + idx + ", " + matchingRPAR);
System.out.println(partiallyEvaluatedExpression);
return evaluate(partiallyEvaluatedExpression);
}
//A simple expression won't contain parenthesis
public double evaluateSimpleExpression(List<ScannedToken> expression) {
if (expression.size() == 1) {
return Double.parseDouble(expression.get(0).expression());
} else {
List<ScannedToken> newExpression = new ArrayList<>();
int idx = expression.stream().map(ScannedToken::type).collect(Collectors.toList()).indexOf(TokenType.POW);
if (idx != -1) {
double base = Double.parseDouble(expression.get(idx - 1).expression());
double exp = Double.parseDouble(expression.get(idx + 1).expression());
DecimalFormat df = new DecimalFormat(".00");
double ans = Math.pow(base, exp);
for (int i = 0; i < idx - 1; i++) {
newExpression.add(expression.get(i));
}
newExpression.add(new ScannedToken(ans + "", TokenType.VALUE));
for (int i = idx + 2; i < expression.size(); i++) {
newExpression.add(expression.get(i));
}
return evaluateSimpleExpression(newExpression);
} else {
int mulIdx = expression.stream()
.map(ScannedToken::type)
.collect(Collectors.toList())
.indexOf(TokenType.MUL);
int divIdx = expression.stream()
.map(ScannedToken::type)
.collect(Collectors.toList())
.indexOf(TokenType.DIV);
int computationIdx = (mulIdx >= 0 && divIdx >= 0) ? Math.min(mulIdx, divIdx) : Math.max(mulIdx, divIdx);
if (computationIdx != -1) {
double left = Double.parseDouble(expression.get(computationIdx - 1).expression());
double right = Double.parseDouble(expression.get(computationIdx + 1).expression());
DecimalFormat df = new DecimalFormat(".00");
double ans = computationIdx == mulIdx ? left * right : left / right * 1.0;
for (int i = 0; i < computationIdx - 1; i++) {
newExpression.add(expression.get(i));
}
newExpression.add(new ScannedToken(ans + "", TokenType.VALUE));
for (int i = computationIdx + 2; i < expression.size(); i++) {
newExpression.add(expression.get(i));
}
return evaluateSimpleExpression(newExpression);
} else {
int addIdx = expression.stream()
.map(e -> e.type())
.collect(Collectors.toList())
.indexOf(TokenType.ADD);
int subIdx = expression.stream()
.map(e -> e.type())
.collect(Collectors.toList())
.indexOf(TokenType.SUB);
int computationIdx2 = (addIdx >= 0 && subIdx >= 0) ?
Math.min(addIdx, subIdx) :
Math.max(addIdx, subIdx);
if (computationIdx2 != -1) {
double left = Double.parseDouble(expression.get(computationIdx2 - 1).expression());
double right = Double.parseDouble(expression.get(computationIdx2 + 1).expression());
DecimalFormat df = new DecimalFormat(".00");
double ans = computationIdx2 == addIdx ? left + right : (left - right) * 1.0;
for (int i = 0; i < computationIdx2 - 1; i++) {
newExpression.add(expression.get(i));
}
newExpression.add(new ScannedToken(ans + "", TokenType.VALUE));
for (int i = computationIdx2 + 2; i < expression.size(); i++) {
newExpression.add(expression.get(i));
}
return evaluateSimpleExpression(newExpression);
}
}
}
}
return -1.0;
}
}
The code for the scanner is a bit too convoluted, but, the basic idea is to look at an expression and evaluate it like a human mathematician would:
We look for the innermost parenthesis, extract its inner expression and consider it a Simple expression
. So a simple expression is an expression without parenthesis.
A simple expression can be evaluated directly, via recursion, following the PEMDAS rule: Parenthesis first, then exponents, then multiplication and division in the order which they appear and then subtraction and additions in the order they appear.
Once simple expressions are evaluated, the resulting token is replaced in the original expression, and recursively passed to be evaluated again, until the end result is a single token with a single value.
And this is the basic idea.
Adding a Parser to avoid issues with "incomplete" tokens
Obviously, the way this is implemented now, it's still incomplete, both in terms of error handling and wrongly tokenized expressions. Error handling hasn't been done, so things like incomplete expressions or unbalanced parenthesis will either take the process into a loop or just crash. To do later :)
What we need to do now in the parser, to get a simple solution working, is to decide when can we create a negative number token from a pair of Token.SUB and Token.VALUE.
The edge cases here are two:
if Token.SUB and Token.VALUE are in the beginning of the expression, then we create a single token with a negative value;
if these two tokens are right after an opening parenthesis, they are part of the beginning of a complex expression on which case we need to unify them as well;
This is what we do in the parser:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class Parser {
private final List<ScannedToken> expression;
public Parser(List<ScannedToken> expression) {
this.expression = expression;
}
/*
We need a triple in a sliding window fashion where middle is current token so we can contextualize what
needs to be parsed into correct tokens
*/
public List<ScannedToken> parse() {
TokenType prev = null;
TokenType curr = null;
TokenType next = null;
List<ScannedToken> properlyParsedExpression = new ArrayList<>();
List<TokenType> types = expression.stream().map(ScannedToken::type).collect(Collectors.toList());
List<Integer> indexes = new ArrayList<>();
List<ScannedToken> negativeValues = new ArrayList<>();
for (int i = 0; i < types.size() - 1; i++) {
prev = i == 0 ? null : types.get(i - 1);
curr = types.get(i);
next = types.get(i + 1);
if (prev == null && curr == TokenType.SUB && next == TokenType.VALUE) {
ScannedToken negativeValue =
new ScannedToken("" + (-1 * Double.parseDouble(expression.get(i + 1).expression())),
TokenType.VALUE);
System.out.println("new token at index " + i);
indexes.add(i);
negativeValues.add(negativeValue);
} else if (prev == TokenType.LPAR && curr == TokenType.SUB && next == TokenType.VALUE) {
ScannedToken negativeValue =
new ScannedToken("" + (-1 * Double.parseDouble(expression.get(i + 1).expression())),
TokenType.VALUE);
System.out.println("new token at index " + i);
indexes.add(i);
negativeValues.add(negativeValue);
}
}
int maxIterations = expression.size();
int i = 0;
int j = 0;
while (i < maxIterations) {
if (indexes.contains(i) && j < negativeValues.size()) {
properlyParsedExpression.add(negativeValues.get(j));
j++;
i++;
}
else {
properlyParsedExpression.add(expression.get(i));
}
i++;
}
System.out.println(properlyParsedExpression);
return properlyParsedExpression;
}
}
And, finally the Main function of our code, where we execute what we did:
import java.util.List;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
class Main {
public static void main(String[] args) throws IOException{
System.out.println("Enter a mathematical expression");
//Enter data using BufferReader
while(true){
BufferedReader reader =
new BufferedReader(new InputStreamReader(System.in));
// Reading data using readLine
String name = reader.readLine();
Scanner sc = new Scanner(name);
//(12*5)+1*(8.6-3*2^2)-34
//(12*5)
//12*5
//12
//-5+12*(-3+2)
//-12
//12
//15/(3+2)
List<ScannedToken> scanExp = sc.scan();
Parser parser = new Parser(scanExp);
List<ScannedToken> parsed = parser.parse();
scanExp.forEach(e->System.out.println(e));
System.out.println(sc.evaluate(parsed));
}
}
}
So, we evaluate the parsed expression, with the corrected tokens, and we get the value we want.
Additional design considerations
There are many additional design considerations to take that can improve this:
Error handling is inexistent, so, if there is an invalid expression, the scanner will just die with a stackoverflow or outOfBounds exception;
The parser needs to be more robust to handle wrong inputs graciously;
Obviously, this is an approach that mimics how a human would evaluate an expression, and the grammar rules, which are in essence the precedence rules of operations are all encoded in the
Scanner
code by following the PEMDAS rule programatically; In a real implementation, we can assign for example precedence values to the operators and build a tree that takes these precedences into account, maybe delegating this step to the parser, and then evaluating the expression is as easy as doing an in-order tree traversal by priority values recursively evaluating sub-expressions;
Conclusion
This was a great exercise and I think it's a nice challenge that everyone can take on! It shows how something which is deceptively simple can actually be quite tricky and it's a humbling exercise!
As a recommended read from where I got the inspiration to write this, I recommend the 2nd edition of the book "Engineering a Compiler"
Top comments (1)
Interesting approach!
I've personally done something similiar in a project a few weeks ago.
First, I'm taking the input in infix notation (
1 + 2
) and convert it into RPN (reverse polish notation,1 2 +
) using Dijkstra's Shunting-yard algorithm.Having the expression in RPN format, you start solving it by pushing values and operators on a Stack and
pop()
-ing through them, until no more data to process is available.You can see my code over here on Github inside the project I used it for:
github.com/RicoBrase/ChatCalculato...
Asscociated unit test:
github.com/RicoBrase/ChatCalculato...
There are some code leftovers from experimenting with Optionals, just ignore that. 😉