Matt I Michie

Posted on • Originally published at mattmichie.com

Building a Programming Language for Fun

I've been building a programming language. It started as an experiment and turned into sixty thousand lines of Go. That's what happens when you try to answer the question: what would a language look like if Lisp and Python had a kid?

The Premise

I've always liked two things about Lisp that most people hate: the parentheses and the homoiconicity. Code is data, data is code. It's a simple idea with deep consequences. Macros work because you're manipulating the same structures the interpreter reads. There's an elegance to it that never wore off for me.

But I also write a lot of Python. Python gets things right that Lisp never bothered with: readable syntax for common operations, a pragmatic standard library, data structures that do what you expect without ceremony. Python's dict comprehensions are nicer than anything Lisp has for the same task.

So I built a language that tries to have both. S-expressions when you want them, Pythonic syntax when you don't. The same interpreter handles both:

# Lisp style
(def factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))

# Pythonic style
def factorial(n): (if (<= n 1) 1 (* n (factorial (- n 1))))

You can mix them freely. The parser desugars the Pythonic forms into the same AST. Under the hood, it's all S-expressions.
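
To make the desugaring idea concrete, here is a minimal Go sketch. It is not the language's actual parser: the Node type and the ParseSexpr and DesugarDef functions are hypothetical names invented for illustration, and the code assumes well-formed input with no error handling. It only shows that a Pythonic def header can be rewritten into the same tree the S-expression form produces.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical S-expression node: an atom, or a list of child nodes.
type Node struct {
	Atom   string
	List   []*Node
	IsList bool
}

// String renders the node back into S-expression text.
func (n *Node) String() string {
	if !n.IsList {
		return n.Atom
	}
	parts := make([]string, len(n.List))
	for i, child := range n.List {
		parts[i] = child.String()
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// parse reads one expression starting at tokens[pos] and returns it together
// with the next unread position. It assumes balanced parentheses.
func parse(tokens []string, pos int) (*Node, int) {
	if tokens[pos] != "(" {
		return &Node{Atom: tokens[pos]}, pos + 1
	}
	list := &Node{IsList: true}
	pos++ // skip "("
	for tokens[pos] != ")" {
		var child *Node
		child, pos = parse(tokens, pos)
		list.List = append(list.List, child)
	}
	return list, pos + 1 // skip ")"
}

// ParseSexpr parses a single S-expression from source text.
func ParseSexpr(src string) *Node {
	spaced := strings.NewReplacer("(", " ( ", ")", " ) ").Replace(src)
	node, _ := parse(strings.Fields(spaced), 0)
	return node
}

// DesugarDef rewrites "def name(params): body" into (def name (params) body).
// It assumes well-formed input whose body is already an S-expression.
func DesugarDef(src string) *Node {
	rest := strings.TrimPrefix(strings.TrimSpace(src), "def ")
	open := strings.Index(rest, "(")
	end := strings.Index(rest, "):")
	params := &Node{IsList: true}
	for _, p := range strings.Split(rest[open+1:end], ",") {
		if p = strings.TrimSpace(p); p != "" {
			params.List = append(params.List, &Node{Atom: p})
		}
	}
	name := &Node{Atom: strings.TrimSpace(rest[:open])}
	body := ParseSexpr(rest[end+2:])
	return &Node{IsList: true, List: []*Node{{Atom: "def"}, name, params, body}}
}

func main() {
	lisp := ParseSexpr("(def factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))")
	py := DesugarDef("def factorial(n): (if (<= n 1) 1 (* n (factorial (- n 1))))")
	fmt.Println(lisp.String() == py.String(), py)
}
```

Because both entry points return the same Node shape, everything downstream of the parser only ever has to understand one representation.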

What I Actually Built

The interpreter is written in Go, which turned out to be a good choice for a language runtime. Go's interfaces map cleanly to the kind of type dispatch a dynamic language needs. The garbage collector handles memory management for hosted objects without me having to think about it.
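
As a rough illustration of why Go interfaces suit this job, the sketch below models runtime values as a single interface and dispatches on the dynamic type with a type switch. The Value interface, the Int, Str, and List types, and the Truthy function are assumptions made up for this example, not the interpreter's real types.

```go
package main

import "fmt"

// Value is a hypothetical interface that every runtime object satisfies.
type Value interface {
	TypeName() string
}

// Three illustrative value types backed by ordinary Go types.
type Int int64
type Str string
type List []Value

func (Int) TypeName() string  { return "int" }
func (Str) TypeName() string  { return "str" }
func (List) TypeName() string { return "list" }

// Truthy dispatches on the dynamic type with a type switch, following
// Python's convention that zero and empty containers are false.
func Truthy(v Value) bool {
	switch x := v.(type) {
	case Int:
		return x != 0
	case Str:
		return x != ""
	case List:
		return len(x) > 0
	default:
		return true // unknown types are treated as truthy in this sketch
	}
}

func main() {
	for _, v := range []Value{Int(0), Str("hi"), List{}} {
		fmt.Printf("%s truthy=%v\n", v.TypeName(), Truthy(v))
	}
}
```

The values themselves are plain Go data, so Go's garbage collector reclaims them without any extra bookkeeping in the interpreter.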

The feature list got long. Classes with inheritance and method resolution order. A protocol system with Python-style dunder methods so you can define __add__ and __getitem__ on your own types. Exception handling with proper tracebacks. Generators with yield. List comprehensions, context managers, f-strings. Most of the Python features I actually use day-to-day.

The protocol system was the most satisfying piece to design. Operator dispatch goes through three tiers: check for a dunder method first, then a protocol implementation, then fall back to type-based defaults. It means the language is extensible in the same way Python is. Define __iter__ and __next__ on your class and for-loops just work.
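
The three tiers can be sketched in Go roughly as follows. This is an illustration under assumptions, not the interpreter's real code: Object, Method, Adder, Vec, and Add are hypothetical names, and only the + operator is shown.

```go
package main

import (
	"errors"
	"fmt"
)

// Value stands in for any runtime value in this sketch.
type Value interface{}

// Method is a hypothetical signature for a user-defined binary dunder method.
type Method func(self *Object, other Value) (Value, error)

// Object is a hypothetical user-defined instance carrying its class's methods.
type Object struct {
	Methods map[string]Method
	Amount  int
}

// Adder is a hypothetical Go-side protocol that built-in types can implement.
type Adder interface {
	Add(other Value) (Value, error)
}

// Vec is a built-in type that supports + through the Adder protocol.
type Vec []int

func (v Vec) Add(other Value) (Value, error) {
	o, ok := other.(Vec)
	if !ok || len(o) != len(v) {
		return nil, errors.New("Vec can only be added to a Vec of equal length")
	}
	sum := make(Vec, len(v))
	for i := range v {
		sum[i] = v[i] + o[i]
	}
	return sum, nil
}

// Add resolves the + operator in three tiers.
func Add(a, b Value) (Value, error) {
	// Tier 1: a dunder method defined on a user class.
	if obj, ok := a.(*Object); ok {
		if m, found := obj.Methods["__add__"]; found {
			return m(obj, b)
		}
	}
	// Tier 2: a protocol implementation on the Go type.
	if p, ok := a.(Adder); ok {
		return p.Add(b)
	}
	// Tier 3: type-based defaults for primitive values.
	switch x := a.(type) {
	case int:
		if y, ok := b.(int); ok {
			return x + y, nil
		}
	case string:
		if y, ok := b.(string); ok {
			return x + y, nil
		}
	}
	return nil, fmt.Errorf("unsupported operand types for +: %T and %T", a, b)
}

func main() {
	money := &Object{Amount: 5, Methods: map[string]Method{
		"__add__": func(self *Object, other Value) (Value, error) {
			return self.Amount + other.(int), nil
		},
	}}
	fmt.Println(Add(money, 10))
	fmt.Println(Add(Vec{1, 2}, Vec{3, 4}))
	fmt.Println(Add("ab", "cd"))
}
```

Checking the dunder tier first is what lets a user-defined class override behavior that a built-in type would otherwise supply.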

I also spent time on Python standard library compatibility. Pure Python stdlib modules can run directly when possible. For C extension modules, I wrote Go stubs that provide the same interface. The goal isn't perfect compatibility. It's enough compatibility that useful code works without modification.
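
One way a Go stub for a C extension module might look is sketched below, using Python's math module as the example. Everything here is an assumption for illustration: the BuiltinFunc and Module types, the stubs registry, and the Import function are hypothetical, and the real stubs may be organized quite differently.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// BuiltinFunc is a hypothetical signature for a natively implemented function.
type BuiltinFunc func(args ...float64) (float64, error)

// Module maps attribute names to their native implementations.
type Module map[string]BuiltinFunc

// stubs is a hypothetical registry of Go replacements for C extension modules.
var stubs = map[string]Module{
	"math": {
		"sqrt": func(args ...float64) (float64, error) {
			if len(args) != 1 {
				return 0, fmt.Errorf("sqrt() takes exactly one argument (%d given)", len(args))
			}
			if args[0] < 0 {
				// Python raises an error here instead of returning NaN.
				return 0, errors.New("math domain error")
			}
			return math.Sqrt(args[0]), nil
		},
		"floor": func(args ...float64) (float64, error) {
			if len(args) != 1 {
				return 0, fmt.Errorf("floor() takes exactly one argument (%d given)", len(args))
			}
			return math.Floor(args[0]), nil
		},
	},
}

// Import looks up a stubbed module by name.
func Import(name string) (Module, error) {
	m, ok := stubs[name]
	if !ok {
		return nil, fmt.Errorf("no module named %q", name)
	}
	return m, nil
}

func main() {
	m, err := Import("math")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["sqrt"](9))
	fmt.Println(m["sqrt"](-1))
}
```

The stub only has to match the names and behavior that calling code relies on, which fits the stated goal of useful compatibility rather than perfect compatibility.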

The Architecture Rabbit Hole

At some point I started thinking about what this could become if the architecture were right. The idea is a multi-frontend, multi-backend design: multiple source languages are parsed into a shared intermediate representation, which multiple backends can then execute or compile.

Right now there's one frontend (the hybrid syntax of M28, which is what I call the language) and one backend (a tree-walking interpreter). But the architecture is designed so a Python frontend could parse into the same IR, and a bytecode VM or LLVM backend could execute it. Solve the problem once in the middle layer and every frontend/backend combination benefits.
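
The shape of that design can be sketched with two Go interfaces and a shared IR type. The names IR, Frontend, Backend, SexprFrontend, TreeWalker, PostfixPrinter, and Pipeline are all hypothetical, and the IR here covers only prefix arithmetic, far less than a real language needs. The point is only that any frontend can be paired with any backend.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IR is a hypothetical shared representation: a number, or an operator with arguments.
type IR struct {
	Op   string // empty for a numeric literal
	Num  float64
	Args []*IR
}

// Frontend turns source text into IR; Backend consumes IR.
type Frontend interface {
	Parse(src string) (*IR, error)
}
type Backend interface {
	Run(prog *IR) (string, error)
}

// SexprFrontend parses prefix arithmetic such as (+ 1 (* 2 3)).
type SexprFrontend struct{}

func (SexprFrontend) Parse(src string) (*IR, error) {
	toks := strings.Fields(strings.NewReplacer("(", " ( ", ")", " ) ").Replace(src))
	node, rest, err := parseExpr(toks)
	if err == nil && len(rest) != 0 {
		err = fmt.Errorf("unexpected trailing tokens: %v", rest)
	}
	return node, err
}

func parseExpr(toks []string) (*IR, []string, error) {
	if len(toks) == 0 {
		return nil, nil, fmt.Errorf("unexpected end of input")
	}
	if toks[0] != "(" {
		n, err := strconv.ParseFloat(toks[0], 64)
		return &IR{Num: n}, toks[1:], err
	}
	if len(toks) < 2 {
		return nil, nil, fmt.Errorf("missing operator after (")
	}
	node := &IR{Op: toks[1]}
	toks = toks[2:]
	for len(toks) > 0 && toks[0] != ")" {
		arg, rest, err := parseExpr(toks)
		if err != nil {
			return nil, nil, err
		}
		node.Args = append(node.Args, arg)
		toks = rest
	}
	if len(toks) == 0 {
		return nil, nil, fmt.Errorf("missing closing )")
	}
	return node, toks[1:], nil
}

var ops = map[string]func(a, b float64) float64{
	"+": func(a, b float64) float64 { return a + b },
	"-": func(a, b float64) float64 { return a - b },
	"*": func(a, b float64) float64 { return a * b },
}

// TreeWalker is a backend that evaluates the IR directly.
type TreeWalker struct{}

func (TreeWalker) Run(prog *IR) (string, error) {
	v, err := eval(prog)
	return strconv.FormatFloat(v, 'g', -1, 64), err
}

func eval(n *IR) (float64, error) {
	if n.Op == "" {
		return n.Num, nil
	}
	fn, ok := ops[n.Op]
	if !ok || len(n.Args) == 0 {
		return 0, fmt.Errorf("cannot evaluate operator %q", n.Op)
	}
	acc, err := eval(n.Args[0])
	for _, arg := range n.Args[1:] {
		if err != nil {
			break
		}
		var v float64
		v, err = eval(arg)
		acc = fn(acc, v)
	}
	return acc, err
}

// PostfixPrinter is a second backend that emits postfix text instead of evaluating.
type PostfixPrinter struct{}

func (PostfixPrinter) Run(prog *IR) (string, error) { return postfix(prog), nil }

func postfix(n *IR) string {
	if n.Op == "" {
		return strconv.FormatFloat(n.Num, 'g', -1, 64)
	}
	var parts []string
	for _, arg := range n.Args {
		parts = append(parts, postfix(arg))
	}
	return strings.Join(append(parts, n.Op), " ")
}

// Pipeline wires any frontend to any backend through the shared IR.
func Pipeline(f Frontend, b Backend, src string) (string, error) {
	prog, err := f.Parse(src)
	if err != nil {
		return "", err
	}
	return b.Run(prog)
}

func main() {
	for _, b := range []Backend{TreeWalker{}, PostfixPrinter{}} {
		fmt.Println(Pipeline(SexprFrontend{}, b, "(+ 1 (* 2 3))"))
	}
}
```

Adding a new frontend or backend means implementing one small interface, and neither side needs to know which implementation sits on the other end.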

I've designed the bytecode VM on paper. Register-based, targeting a 3-10x speedup over the tree walker. Haven't built it yet. The interpreter is fast enough for everything I use it for, and there's always another language feature that's more interesting to implement than making the existing ones faster.
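
For readers who haven't seen one, a register-based VM in miniature looks something like the Go sketch below. It is not the design described above, which exists only on paper. The opcode set, the Instr layout, and the Run and FactorialProgram functions are hypothetical, and the sketch does no bounds checking on register indexes.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a hypothetical opcode; operands A, B, and C are register indexes or immediates.
type Op byte

const (
	OpLoadK Op = iota // R[A] = B (B is an immediate constant)
	OpMul             // R[A] = R[B] * R[C]
	OpSub             // R[A] = R[B] - R[C]
	OpJmpLE           // if R[A] <= R[B], jump to instruction C
	OpJmp             // jump to instruction A
	OpRet             // return R[A]
)

// Instr is one fixed-width instruction.
type Instr struct {
	Op      Op
	A, B, C int
}

// Run executes code against a fresh register file of numRegs integers.
func Run(code []Instr, numRegs int) (int, error) {
	regs := make([]int, numRegs)
	pc := 0
	for pc >= 0 && pc < len(code) {
		in := code[pc]
		pc++
		switch in.Op {
		case OpLoadK:
			regs[in.A] = in.B
		case OpMul:
			regs[in.A] = regs[in.B] * regs[in.C]
		case OpSub:
			regs[in.A] = regs[in.B] - regs[in.C]
		case OpJmpLE:
			if regs[in.A] <= regs[in.B] {
				pc = in.C
			}
		case OpJmp:
			pc = in.A
		case OpRet:
			return regs[in.A], nil
		default:
			return 0, fmt.Errorf("unknown opcode %d at pc %d", in.Op, pc-1)
		}
	}
	return 0, errors.New("program ended without OpRet")
}

// FactorialProgram builds an iterative factorial that uses three registers.
func FactorialProgram(n int) []Instr {
	return []Instr{
		{OpLoadK, 0, n, 0}, // 0: r0 = n
		{OpLoadK, 1, 1, 0}, // 1: r1 = 1 (accumulator)
		{OpLoadK, 2, 1, 0}, // 2: r2 = 1 (constant)
		{OpJmpLE, 0, 2, 7}, // 3: if r0 <= r2, jump to 7
		{OpMul, 1, 1, 0},   // 4: r1 = r1 * r0
		{OpSub, 0, 0, 2},   // 5: r0 = r0 - r2
		{OpJmp, 3, 0, 0},   // 6: jump back to 3
		{OpRet, 1, 0, 0},   // 7: return r1
	}
}

func main() {
	fmt.Println(Run(FactorialProgram(5), 3))
}
```

Each instruction names its operands directly, so there is no operand stack to push and pop, and the dispatch loop is a flat switch rather than a recursive walk over tree nodes.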

Why Build a Language

People don't ask "why" as much as you'd expect. Maybe because the answer is obvious: because it's interesting. You learn things building an interpreter that you can't learn any other way. How scoping actually works. Why tail call optimization matters. What makes a type system feel good to use versus feel like it's fighting you.

Every language is a set of opinions about how programmers should think. Building one forces you to articulate your own opinions and then live with the consequences. I think operator overloading should use protocols. I think immutable data should be the default. I think S-expressions are underrated. Now I have a language that embodies those opinions, and I can see where they work and where they don't.

It's also the longest-running side project I have. Languages are never done. There's always another feature, another optimization, another edge case in the parser. It's the kind of project that rewards showing up for an hour on a Tuesday night, making one thing slightly better, and closing the laptop.

The code is on GitHub if you're curious. It's a toy in the sense that I wouldn't deploy it to production. But it's the most instructive toy I've ever built.
