SEN LLC

Posted on May 20

A Math-Expression Parser in 250 Lines of JavaScript — Recursive Descent, Right-Associative ^, and -2^2 = -4

#javascript #parsers #compilers #webdev

Every "math evaluator in N lines" article either uses shunting-yard (Dijkstra's stack-based RPN converter) or hand-waves over the hard cases (-2^2, right associativity, error positions). This one uses recursive descent because the grammar and the code map 1-to-1, which makes the whole thing easier to read and write about. ~250 lines of JS, 36 unit tests, supports functions, variables, constants, parentheses, and reports errors with a column number and a caret.

🌐 Demo: https://sen.ltd/portfolio/expression-eval/
📦 GitHub: https://github.com/sen-ltd/expression-eval

Why recursive descent (not shunting-yard)

Both work. Shunting-yard is more compact (one for loop, two stacks). But for explaining what's happening, recursive descent wins:

Grammar ↔ code is 1-to-1. You can read the BNF and the implementation in parallel and they look the same.
Stack traces show the AST shape. When you're debugging "why is -2^2 returning 4 instead of -4", the call stack literally is the parse tree.
Error positions are natural. The "current token" is always explicit, so reporting "unexpected ) at column 6" falls out for free.

The grammar:

expression     := additive
additive       := multiplicative (('+' | '-') multiplicative)*
multiplicative := unary (('*' | '/' | '%') unary)*
unary          := ('-' | '+')? power
power          := atom ('^' unary)?
atom           := number | identifier | identifier '(' args? ')' | '(' expression ')'
args           := expression (',' expression)*

Operator precedence is encoded in the nesting depth. That's the whole magic. The deeper a rule is, the tighter it binds.

Trap 1: making `^` right-associative

2^3^2 should be 2^(3^2) = 2^9 = 512, not (2^3)^2 = 64.

For left-associative operators (+ - * / %) the parser builds a left-leaning tree with a while loop:

parseAdditive() {
  let left = this.parseMultiplicative();
  while (current is '+' or '-') {
    const right = this.parseMultiplicative();
    left = { type: "binop", op, left, right };       // accumulate on the LEFT
  }
  return left;
}

For the right-associative ^, switch from loop to recursion:

parsePower() {
  const base = this.parseAtom();
  if (current is '^') {
    consume();
    const exponent = this.parseUnary();              // RECURSE
    return { type: "binop", op: "^", left: base, right: exponent };
  }
  return base;
}

Recursion (not iteration) means the AST for 2^3^2 looks like:

binop ^
  number 2
  binop ^
    number 3
    number 2

DFS evaluation walks the inner 3^2 = 9 first, then 2^9 = 512.

Pin both sides:

test("^ is right-associative", () => {
  assert.equal(evaluateExpression("2^3^2"), 512);          // not 64
});

test("parse produces right-associative AST for '2^3^2'", () => {
  const ast = parse("2^3^2");
  assert.equal(ast.op, "^");
  assert.equal(ast.left.value, 2);
  assert.equal(ast.right.op, "^");
  assert.equal(ast.right.left.value, 3);
  assert.equal(ast.right.right.value, 2);
});

Trap 2: `-2^2 = -4` (unary minus binds LOOSER than `^`)

This is where calculator implementations disagree:

Tool	`-2^2`	Convention
WolframAlpha	`-4`	`^` > unary `-`
Python (`**`)	`-4`	`**` > unary `-`
Mathematica	`-4`	`^` > unary `-`
Excel	`4`	unary `-` > `^` ⚠
Google Sheets	`4`	unary `-` > `^` ⚠
Mathematical convention	`-4`	`^` > unary `-`

The math convention is -2^2 = -(2^2) = -4. Excel is the outlier (probably for backward compatibility with VisiCalc-era spreadsheets).

To get this right, ^ must bind tighter than unary -. That's achieved by the grammar nesting:

unary  := ('-' | '+')? power      ← unary sees '-' first, then drops down to power
power  := atom ('^' unary)?       ← but power's right-hand side can recurse to unary

Walking -2^2 through the parser:

parseUnary() sees -, consumes it, calls parsePower()
parsePower() calls parseAtom(), reads 2
Next token is ^, consume, recurse to parseUnary() for the exponent
The recursion reads 2
parsePower() returns { ^, 2, 2 }
parseUnary() wraps: { unary -, { ^, 2, 2 } } → evaluates to -(2^2) = -4 ✓

The reason the right-hand side of ^ recurses to unary (instead of parsePower() or parseAtom()) is to also allow negative exponents: 2^-3 = 1/8. Both rules — -2^2 = -4 and 2^-3 = 0.125 — fall out of the same single grammar choice.

test("^ binds tighter than * but unary minus binds looser than ^", () => {
  assert.equal(evaluateExpression("-2^2"), -4);     // -(2^2)
  assert.equal(evaluateExpression("(-2)^2"), 4);    // explicit parens win
  assert.equal(evaluateExpression("2 * 3^2"), 18);  // 2 * (3^2)
});

test("^ accepts unary minus on the exponent side", () => {
  assert.equal(evaluateExpression("2^-3"), 0.125);  // 2^(-3)
});

If a user wants 4, they write (-2)^2. The parens are non-optional and that's the right design.

Trap 3: numeric literal tokenisation order matters

Numbers seem easy, but the edge cases bite:

42 — integer
3.14 — float
.5 — leading-dot float
1e3 — scientific (= 1000)
1.5e-2 — negative exponent (= 0.015)
1e — error: exponent has no digits
1e+ — error: sign with no digits

The naïve "scan while character matches [0-9.eE+-]" approach breaks on 1+2 because the + between numbers gets pulled into the first literal.

The fix is to scan state by state in fixed order:

// 1. integer part (optional if dot follows)
while (i < length && isDigit(source[i])) i++;

// 2. fractional part (optional)
if (source[i] === ".") {
  i++;
  while (i < length && isDigit(source[i])) i++;
}

// 3. exponent (optional)
if (source[i] === "e" || source[i] === "E") {
  i++;
  if (source[i] === "+" || source[i] === "-") i++;
  const expStart = i;
  while (i < length && isDigit(source[i])) i++;
  if (i === expStart) throw new ExpressionError("exponent has no digits", start);
}

The exponent can have a sign, but after the sign you need at least one digit — 1e and 1e+ are invalid literals, not "1 followed by an identifier e". Once you've committed to scanning a number, partial exponents are syntax errors:

test("tokenize rejects exponent without digits", () => {
  assert.throws(() => tokenize("1e"), ExpressionError);
  assert.throws(() => tokenize("1e+"), ExpressionError);
});

Trap 4: error positions with a caret

"Syntax error" is hostile. Give the column and a visual pointer:

export class ExpressionError extends Error {
  constructor(message, position) {
    super(message);
    this.name = "ExpressionError";
    this.position = position;       // 0-indexed column
  }
}

Inside parseAtom(), throw with the token's position:

throw new ExpressionError(`unexpected "${t.value}"`, t.position);

In the UI, render a caret:

const caret = " ".repeat(Math.max(0, position)) + "^";
els.error.textContent = `${msg}\n${source}\n${caret}`;

The user sees:

parse: unexpected ")"
1 + + )
      ^

Pin the position in tests:

test("error positions are reported", () => {
  try {
    parse("1 + + )");
  } catch (e) {
    assert.ok(e instanceof ExpressionError);
    assert.equal(e.position, 6);   // the ")" column
    return;
  }
  assert.fail("expected ExpressionError");
});

Function calls as part of `atom`

When the parser sees an identifier, it peeks at the next token. If it's (, it's a function call:

parseAtom() {
  // ...
  if (t.type === "ident") {
    this.next();
    if (peek is "(") {
      this.next();
      const args = [];
      if (peek is not ")") {
        args.push(this.parseAdditive());
        while (peek is ",") {
          this.next();
          args.push(this.parseAdditive());
        }
      }
      consume(")");
      return { type: "call", name: t.value, args };
    }
    return { type: "ident", name: t.value };
  }
  // ...
}

Variadic functions (min, max, hypot) fall out for free — the args loop already accepts any count. The evaluator just spreads:

const FUNCTIONS = {
  min: (...xs) => Math.min(...xs),
  max: (...xs) => Math.max(...xs),
  hypot: (...xs) => Math.hypot(...xs),
  // ...
};

Variables override constants

CONSTANTS = { pi, e, tau, ... } is hard-coded, but user-supplied variables win:

case "ident": {
  if (Object.prototype.hasOwnProperty.call(variables, node.name)) {
    return Number(variables[node.name]);             // variables first
  }
  if (Object.prototype.hasOwnProperty.call(CONSTANTS, node.name)) {
    return CONSTANTS[node.name];
  }
  throw new ExpressionError(`undefined identifier "${node.name}"`, 0);
}

Two design choices worth calling out:

hasOwnProperty.call(obj, key) instead of key in obj to avoid prototype pollution — without hasOwnProperty, a user variable named __proto__ would be intercepted by the prototype chain.
User variables override constants because the user's intent is more specific. e = 10 makes sense as an override (e.g., "the elasticity coefficient is named e in my problem").

test("variables override constants if same name", () => {
  assert.equal(evaluateExpression("e + 1", { e: 10 }), 11);
});

Edge cases: division by zero, sqrt of negative

I let JavaScript's defaults through:

1 / 0 → Infinity (no throw)
sqrt(-1) → NaN
min(3, 1, 2) → 1

The display layer formats them:

export function formatResult(value) {
  if (Number.isNaN(value)) return "NaN";
  if (!Number.isFinite(value)) return value > 0 ? "∞" : "−∞";
  if (Number.isInteger(value) && Math.abs(value) < 1e15) return String(value);
  return String(Number(value.toPrecision(12)));
}

Number(value.toPrecision(12)) is the trick to make 0.1 + 0.2 = 0.3 display cleanly. toPrecision(12) rounds floats to 12 significant digits then Number() strips trailing zeros — so 0.30000000000000004 shows as 0.3.

TL;DR

Recursive descent beats shunting-yard for readability. Grammar maps 1-to-1 to code.
Right-associative ^ is recursion (not loop) on the operator's RHS.
-2^2 = -4 is the math convention; it falls out naturally if ^ (power) sits a level below unary minus in the grammar.
Numeric literals need state-machine scanning (integer → fractional → exponent), not a regex.
Errors carry a column and the UI renders a caret. Pin positions in tests.
Function calls are atom-level; variadic functions work for free.
Variables override constants with hasOwnProperty.call to avoid proto pollution.

Source: https://github.com/sen-ltd/expression-eval — MIT, ~250 lines of JS + 36 unit tests, zero dependencies, no build step.

🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.

DEV Community

A Math-Expression Parser in 250 Lines of JavaScript — Recursive Descent, Right-Associative ^, and -2^2 = -4

Why recursive descent (not shunting-yard)

Trap 1: making `^` right-associative

Trap 2: `-2^2 = -4` (unary minus binds LOOSER than `^`)

Trap 3: numeric literal tokenisation order matters

Trap 4: error positions with a caret

Function calls as part of `atom`

Variables override constants

Edge cases: division by zero, sqrt of negative

TL;DR

Top comments (0)

Why recursive descent (not shunting-yard)

Trap 1: making ^ right-associative

Trap 2: -2^2 = -4 (unary minus binds LOOSER than ^)

Trap 3: numeric literal tokenisation order matters

Trap 4: error positions with a caret

Function calls as part of atom

Variables override constants

Edge cases: division by zero, sqrt of negative

TL;DR

Trap 1: making `^` right-associative

Trap 2: `-2^2 = -4` (unary minus binds LOOSER than `^`)

Function calls as part of `atom`