What if you could design your own programming language? In this post, I walk through building TinyLang โ a complete interpreter from scratch in Python with lexer, parser, AST, closures, first-class functions, and more.
Why Build a Language Interpreter?
Building a programming language interpreter is one of the best exercises in computer science. It forces you to understand:
- How code is read โ lexical analysis and tokenization
- How structure is extracted โ parsing and abstract syntax trees
- How meaning is given โ evaluation and runtime semantics
- How state is managed โ scoping, closures, and environments
TinyLang is a dynamically-typed language with Python-like syntax and curly-brace blocks. It supports integers, floats, strings, booleans, arrays, dictionaries, first-class functions, closures, arrow functions, a pipeline operator, structured error handling, and 60+ built-in functions โ all in zero-dependency Python.
The Compiler Pipeline
Source Code โ Lexer โ Tokens โ Parser โ AST โ Interpreter โ Result
Every interpreter follows this pipeline. Let's walk through each stage.
Stage 1: The Lexer (Tokenizer)
The lexer breaks raw source code into a stream of tokens โ the smallest meaningful units of the language.
from dataclasses import dataclass
from enum import Enum, auto
class TokenType(Enum):
INTEGER = auto()
FLOAT = auto()
STRING = auto()
IDENTIFIER = auto()
PLUS = auto()
MINUS = auto()
FN = auto()
LET = auto()
IF = auto()
# ... 50+ token types
@dataclass
class Token:
type: TokenType
value: any
line: int
column: int
The lexer processes characters one at a time, handling multi-character operators (==, !=, |>, =>), string escape sequences (\n, \t, \uXXXX), numeric literals with underscores (1_000_000), and comments (single-line // and multi-line /* */).
Key insight: Handle operator ambiguity by checking the next character. When you see =, peek ahead โ is it == (equality) or => (arrow)?
def _read_operator(self):
ch = self.current
if ch == '=' and self.peek() == '>':
self.advance()
self.advance()
return Token(TokenType.ARROW, "=>", self.line, start)
elif ch == '=' and self.peek() == '=':
self.advance()
self.advance()
return Token(TokenType.EQ, "==", self.line, start)
elif ch == '=':
self.advance()
return Token(TokenType.ASSIGN, "=", self.line, start)
Stage 2: The Parser (Recursive Descent)
The parser takes tokens and builds an Abstract Syntax Tree (AST). I chose recursive descent because it's intuitive, debuggable, and maps cleanly to the grammar.
Operator Precedence
The key challenge is getting operator precedence right. Each precedence level gets its own parsing function:
def parse_expression(self):
return self.parse_assignment()
def parse_assignment(self):
left = self.parse_pipe() # |>
# ... handle = and +=/-=/*=
def parse_pipe(self):
left = self.parse_logical_or() # or
# ... handle |>
def parse_logical_or(self):
left = self.parse_logical_and() # and
# ...
def parse_addition(self):
left = self.parse_multiplication() # * / %
# ...
def parse_power(self):
base = self.parse_unary() # - not
# Right-associative: 2 ** 3 ** 2 = 2 ** 9
Each level calls the next-higher precedence level, creating a natural hierarchy where multiplication binds tighter than addition, which binds tighter than comparison, and so on.
AST Nodes
Every language construct becomes a dataclass node:
@dataclass
class BinaryOp:
op: str
left: any # AST node
right: any
line: int
@dataclass
class FunctionDeclaration:
name: str
params: list[str]
defaults: list # Default parameter values
body: list # List of AST statements
@dataclass
class IfStatement:
condition: any
then_body: list
else_body: list
Stage 3: The Interpreter (Tree-Walking)
The interpreter visits each AST node and evaluates it. I used the visitor pattern โ a dictionary mapping node types to handler methods:
class Interpreter:
def __init__(self):
self.visitors = {
IntegerLiteral: self.visit_IntegerLiteral,
BinaryOp: self.visit_BinaryOp,
FunctionDeclaration: self.visit_FunctionDeclaration,
CallExpression: self.visit_CallExpression,
# ... 30+ visitors
}
def execute(self, node, env):
visitor = self.visitors.get(type(node))
return visitor(node, env)
Closures: The Hardest Part
Closures are functions that "remember" their defining environment. This is the single most important concept in the interpreter:
class UserFunction:
def __init__(self, declaration, closure_env):
self.declaration = declaration
self.closure_env = closure_env # โ captures the defining scope
def visit_FunctionDeclaration(self, node, env):
# Capture the current environment as the closure
func = UserFunction(node, env)
env.set(node.name, func)
return func
def _call_user_function(self, func, args, node):
# Create new scope INSIDE the closure's environment
call_env = Environment(parent=func.closure_env)
for param, arg in zip(func.declaration.params, args):
call_env.set(param, arg)
# Execute function body
for stmt in func.declaration.body:
self.execute(stmt, call_env)
The key: when calling a function, the new scope's parent is the closure environment (where the function was defined), NOT the calling environment. This is what makes closures work:
fn make_counter() {
let count = 0 // lives in make_counter's scope
fn increment() {
count = count + 1 // accesses parent scope
return count
}
return increment // carries make_counter's scope with it
}
let c = make_counter()
print(c()) // 1
print(c()) // 2 โ count persists!
The Environment (Scope Chain)
The Environment class implements lexical scoping:
class Environment:
def __init__(self, parent=None):
self.variables = {}
self.constants = set()
self.parent = parent
def get(self, name):
if name in self.variables:
return self.variables[name]
if self.parent:
return self.parent.get(name)
raise NameError(f"Undefined variable: {name}")
def set(self, name, value):
if name in self.constants:
raise RuntimeError(f"Cannot reassign constant: {name}")
if name in self.variables:
self.variables[name] = value
elif self.parent:
self.parent.set(name, value)
Built-in Functions
TinyLang ships with 60+ built-in functions wrapped in a TinyLangCallable class:
class TinyLangCallable:
def __init__(self, name, func, min_args=0, max_args=None):
self.name = name
self.func = func
self.min_args = min_args
self.max_args = max_args
def __call__(self, *args):
# Validate argument count
return self.func(*args)
This includes map, filter, reduce, sorted, reversed, enumerate, zip, math functions, string operations, and type conversion. The interpreter wraps user functions in Python callables before passing them to builtins like map โ so map((x) => x * 2, [1, 2, 3]) works seamlessly.
The Pipeline Operator
One of TinyLang's nicest features โ the pipeline operator |> passes the left side as the first argument to the right side:
[1, 2, 3, 4, 5] |> sum |> str
// Equivalent to: str(sum([1, 2, 3, 4, 5]))
// Result: "15"
Implementation is straightforward:
def visit_PipeExpression(self, node, env):
value = self.execute(node.left, env)
func = self.execute(node.right, env)
# Call func with value as first argument
return self._call_function(func, [value], node)
Error Handling
TinyLang supports try/catch/throw with any value type:
try {
throw {"code": 404, "message": "Not found"}
} catch err {
print(err["code"]) // 404
print(err["message"]) // Not found
}
Under the hood, throw raises a Python ThrowSignal exception that carries the thrown value. try/catch catches it and binds the value to the catch variable.
Testing: 248 Tests
The test suite covers every language feature across four modules:
tests/
โโโ test_lexer.py # Token-level tests
โโโ test_parser.py # AST structure tests
โโโ test_interpreter.py # Execution tests
โโโ test_programs.py # Full program integration tests
Tests include edge cases like recursive closures, max recursion depth, division by zero, index out of bounds, and multi-line expressions.
What I Learned
Recursive descent is elegant. Each grammar rule maps to a function. The precedence hierarchy emerges naturally.
Closures are surprisingly simple once you understand that functions capture their defining environment, not the calling environment.
Error messages matter. Every error carries line and column numbers. Users need to know where things went wrong.
The visitor pattern scales. Adding new node types is just adding a new visitor method โ the interpreter stays clean.
Test-driven development works. Writing tests first caught bugs in operator precedence, closure semantics, and edge cases before they became hard to debug.
Try It Yourself
pip install git+https://github.com/hajirufai/tinylang.git
tinylang # Start the REPL
Or run an example:
tinylang examples/closures.tl
tinylang examples/fibonacci.tl
tinylang -e 'print(map((x) => x ** 2, range(10)))'
๐ฆ GitHub Repository โ 5,300+ lines, 248 tests, zero dependencies.
This is project #11 in my portfolio series. Previously: RaftKV, QueryForge, and more on my dev.to profile.
Top comments (0)