Ebendttl

Posted on Jun 25 • Originally published at akinseinde.netlify.app

Building an AST-Parsing JavaScript Interpreter from Scratch: Lexing, Parsing, and Evaluation

#javascript #webdev #computerscience #typescript

This is an excerpt. The full article includes a live interactive AST interpreter sandbox — select or edit JavaScript code loops, run the interpreter, and watch the lexer tokens, AST nodes, execution logs, and variable scope values update in real time. Read the full interactive version →

How Languages Run: Compilers vs. Interpreters

To run a program, source code must be translated into instructions a computer can execute. There are two primary approaches to doing this:

Compilers: Translate the entire source code into native machine code (or bytecode) all at once, producing a standalone executable file (e.g. C++, Rust, Go).
Interpreters: Read and run the source code instructions step-by-step, evaluating expressions in real time (e.g. standard Python, Ruby).

Modern JavaScript engines (like Google Chrome's V8) use a hybrid approach called Just-In-Time (JIT) Compilation. V8 runs code quickly using an interpreter, but monitors execution in the background. Frequently used "hot" functions are compiled directly into optimized machine code on the fly, combining the fast startup of an interpreter with the high execution speed of compiled code.

To understand how these runtimes work, we can build a custom tree-walk interpreter in three distinct stages: Lexical Analysis, Parsing, and Evaluation.

Phase 1: Lexical Analysis (The Lexer)

The first step of an interpreter is Lexical Analysis, performed by a Lexer or Tokenizer.

A lexer reads raw source code characters (e.g., let x = 10;) and splits them into a flat array of categorized units called Tokens. Each token represents a basic grammar symbol, such as keywords (let), identifiers (x), assignment operators (=), numbers (10), and punctuation (;).

Source Stream: "let sum = 0;"
                    │
                    ▼
Tokens: [
  { type: "KEYWORD",     value: "let" },
  { type: "IDENTIFIER",  value: "sum" },
  { type: "OPERATOR",    value: "=" },
  { type: "NUMBER",      value: "0" },
  { type: "PUNCTUATION", value: ";" }
]

During this step, the lexer strips out irrelevant characters like white spaces and code comments, and flags syntax errors like unclosed strings or unrecognized characters before the code is sent to the parser.

Phase 2: Syntactic Analysis (The Parser)

Once tokenized, the tokens are passed to the Parser for Syntactic Analysis.

The parser reads the flat token list and maps it into a hierarchical data structure called an Abstract Syntax Tree (AST). The AST represents the grammatical structure of the program.

To parse tokens, we use a technique called Recursive Descent Parsing. The parser matches the tokens against the language's grammar rules. It looks at the current token to decide which node type to parse, builds the node, and recursively parses its child expressions.

For example, the variable declaration let sum = 0; parses into a VariableDeclaration node:

{
  "type": "VariableDeclaration",
  "id": "sum",
  "init": {
    "type": "Literal",
    "value": 0
  }
}

Phase 3: Runtime Environments & Lexical Scopes

To evaluate the AST, the interpreter needs a way to track variables. This is managed by the Environment.

An environment is basically a key-value hash map mapping variable names to their active values. To support nested blocks (like functions or loop scopes), environments are chained together:

┌──────────────────────────────────────┐
│ GLOBAL ENVIRONMENT                   │
│   sum = 15                           │
└──────────────────────────────────────┘
         ▲
         │ (parent reference)
┌──────────────────────────────────────┐
│ LOOP BLOCK ENVIRONMENT               │
│   i = 5                              │
└──────────────────────────────────────┘

Each scope environment holds a reference to its outer "parent" environment. When looking up a variable, the engine checks the current local scope first. If not found, it traverses up the chain until it finds the variable or hits the global environment. If it isn't found anywhere, the engine throws a ReferenceError.

This chaining of environments is what enables lexical scoping in modern programming languages.

TypeScript AST Interpreter Implementation

Here is a complete, modular TypeScript implementation mapping the tokenizing, AST node building, and environment scope lookup:

export interface Token {
  type: "KEYWORD" | "IDENTIFIER" | "NUMBER" | "OPERATOR" | "PUNCTUATION";
  value: string;
}

export interface ASTNode {
  type: string;
  [key: string]: any;
}

export class Environment {
  private values: Map<string, any> = new Map();
  private parent: Environment | null;

  constructor(parent: Environment | null = null) {
    this.parent = parent;
  }

  // Define variable in current scope
  public define(name: string, value: any): void {
    this.values.set(name, value);
  }

  // Lookup variable traversing parent chain
  public get(name: string): any {
    if (this.values.has(name)) {
      return this.values.get(name);
    }
    if (this.parent) {
      return this.parent.get(name);
    }
    throw new Error(`ReferenceError: ${name} is not defined`);
  }

  // Assign variable in scope chain where declared
  public assign(name: string, value: any): void {
    if (this.values.has(name)) {
      this.values.set(name, value);
      return;
    }
    if (this.parent) {
      this.parent.assign(name, value);
      return;
    }
    throw new Error(`ReferenceError: Cannot assign to undefined variable ${name}`);
  }
}

export class ASTEvaluator {
  /**
   * Recursively evaluates AST nodes in the current Environment scope
   */
  public evaluate(node: ASTNode, env: Environment): any {
    switch (node.type) {
      case "Program":
        let lastValue = null;
        for (const statement of node.body) {
          lastValue = this.evaluate(statement, env);
        }
        return lastValue;

      case "VariableDeclaration":
        const initVal = node.init ? this.evaluate(node.init, env) : null;
        env.define(node.id, initVal);
        return initVal;

      case "AssignmentExpression":
        const rightVal = this.evaluate(node.value, env);
        env.assign(node.id, rightVal);
        return rightVal;

      case "Literal":
        return node.value;

      case "Identifier":
        return env.get(node.name);

      case "BinaryExpression":
        const leftVal = this.evaluate(node.left, env);
        const operand = this.evaluate(node.right, env);
        return this.applyOperator(node.operator, leftVal, operand);

      case "ForStatement":
        // Initialize loop variable in a nested loop scope environment
        const loopEnv = new Environment(env);
        this.evaluate(node.init, loopEnv);

        while (this.evaluate(node.test, loopEnv)) {
          this.evaluate(node.body, loopEnv);
          this.evaluate(node.update, loopEnv);
        }
        return null;

      default:
        throw new Error(`Unknown AST Node type: ${node.type}`);
    }
  }

  private applyOperator(op: string, left: any, right: any): any {
    switch (op) {
      case "+": return left + right;
      case "-": return left - right;
      case "*": return left * right;
      case "/": return left / right;
      case "<=": return left <= right;
      case "==": return left == right;
      default:
        throw new Error(`Unsupported operator: ${op}`);
    }
  }
}

Engineering Takeaways

ASTs are the common language of developer tools: Almost all developer tools—like Babel, ESLint, and Prettier—convert your code into an AST to analyze and transform it.
Environment chains enforce scope: Chaining local environments to parent scopes is the core mechanic enabling variables and functions to resolve correctly.
Understand language performance: Grasping compiler processes helps you write optimized code that is easy for engines like V8 to inline and optimize.

The full article features a live interactive JavaScript interpreter sandbox — input custom loops, step through lexer token streams, inspect the parsed AST tree, and watch variables update in the execution scope in real time.

Read the full interactive article →

Written by Ebenezer Akinseinde — Software Developer & AI Automations Engineer.

Portfolio · GitHub

Top comments (2)

Frank • Jun 26

How do you handle lexical scope resolution in the presence of nested functions, and do you implement a separate symbol table or rely on the AST itself? I'd love to swap ideas on this.

Ebendttl • Jul 4

@frank_signorini

Hey Frank! Great question.

I went with a separate runtime representation of scope - using parent-linked Environment instances - rather than walking the AST directly for variable lookups.

To handle lexical scope for nested functions:

At declaration time, the function object stores a reference to the active Environment (its closure).

At invocation time, the interpreter spins up a new execution Environment and points its parent to that stored closure environment (rather than the caller's environment). Variable lookups then recursively crawl the parent pointers. To make it robust against shadowing and avoid slow runtime crawls, you can also run a static resolver pass right after parsing to calculate the exact scope "depth" (in hops) for each identifier.

How did you go about implementing it? Did you go the static resolution route or stick to dynamic traversal?