Forem

Cover image for Create Your Own Programming Language (Part I)
Bas Steins
Bas Steins

Posted on • Originally published at bas.codes

Create Your Own Programming Language (Part I)

A computer works with electrical power. Millions of transistors form the Central Processing Unit. Programming such a machine is incredibly flexible yet unbearable complex. Think of pen and paper as an analogy: Being able to do calculations, store intermediate results, and repeat that cycle actually gives you the ability to solve any problem. Any logical problem, to be precise. Pen and paper might not help you to figure out the deeper struggles in life, though.

However, doing some calculations and playing around with some transistors is the “limitless” utilisation of computing power. It will get difficult to create 3D animations in a browser that way or to create a video player or any piece of modern software. This is where “abstractions” come into play.

Instead of breaking down every aspect of the program you want to create into the smallest “atoms”, you break it down to the lowest available abstraction. This might sound abstract in and of itself, but let me give you an example.

Say, you want an input field to turn red in case of a validation error. If it's a web app, you probably have some JavaScript that handles the validation and – if this fails – applies a CSS class to your input field. You are not dealing with some electrons moving from the CPU to the graphics unit and turning certain areas of a screen red. Instead, you rely on the abstractions (“input field”, “CSS class”, ...) provided in that specific environment. In the world of software, there is a tremendous amount of such abstractions piled on top of each other. Starting at the Operating System level to applications running inside that Operating System, such as the browser, network communication, graphical processing and so on and so forth (pun intended for computer language enthusiasts).

Programming languages themselves are abstractions: They limit what you can do with your pen and paper. You wouldn't code a device driver in JavaScript.

To give you an example, let's have a look at some languages that are so specific that they limit your ability as a programmer to use them out of their intended scope:

Prolog

Prolog is a language made for logical analysis. It came to life in 1972, and it is crystal clear that this language is designed for formal logic. In the example below, we create a simple family tree. The line starting with ? demonstrates the power of Prolog: It can answer the question if two individuals are siblings based on the explicitly stated parental relationship.

mother_child(trude, sally).

father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).

sibling(X, Y)      :- parent_child(Z, X), parent_child(Z, Y).

parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).

- sibling(sally, erica).
% Yes

Enter fullscreen mode Exit fullscreen mode

SQL

SQL – or “Structured Query Language”, is a language designed to interact with relational databases. It is just 2 years younger than Prolog, but it is still used today.

In the example below, we filter the table person so that we get only the individuals that have an above-average weight.

-- Persons who weigh more than the average of all persons

SELECT id, lastname, weight
FROM   person
WHERE  weight >= (SELECT avg(weight) FROM person)   -- another position for the subquery
ORDER BY lastname;
Enter fullscreen mode Exit fullscreen mode

Multi-Purpose Languages


public class HelloWorldApp {
    public static void main(String[] args) {
        System.out.println("Hello World!"); // Prints the string to the console.
    }
}

Enter fullscreen mode Exit fullscreen mode

But not only such specific use case languages pose restrictions on you as a programmer. All modern languages do. Object Oriented programming, Functional programming, and all the other “programming” paradigms shape the way of reasoning about a problem by imposing restrictions. There is a great video by Uncle Bob Martin if you want to dig deeper into the topic.

Java is seen as a general purpose programming language. It comes with a bunch of features compared to lower-level languages, such as C. It introduces classes and keywords, such as private or public, that control access to certain attributes or methods of such classes. This is called the object-oriented paradigm. If you think about it, a paradigm like this actually imposes limitations on how to use your pen and paper. For example, there is no goto. That means that in Java, it is impossible to just draw a line from one arbitrary corner of your paper to another. This is not a bad thing, however. The more complex our software projects get, the more we actually desire such limitations. It’s our job as programmers to pick the right tool and align our problem-solving skills with the mental model of the language we’re using.

The essence is, however, that programming languages are opinionated by design. As a programmer, you somehow need to adapt the mental model of the language creator. It is for that reason that many programmers seek to try new languages to see if there is any language that either is a better fit for a particular task or somehow better matches the programmer's mental model.

Some may even think about creating their own programming language. In this article, we’ll start by creating a parser for simple arithmetic expressions. In the next part, we use that knowledge to actually create a very rudimentary programming language. So stay tuned! There are a lot of Computer Science textbooks on this topic, such as “Crafting Interpreters”, “Compiler Construction”, so apparently, I'll cover only the basics in this post. If you're interested in this topic, there is a worthwhile and practical online course conducted by Roberto Ierusalimschy, one of the creators of Lua, that goes into the details.

Our first goal is to parse expressions such as 1 + 2 or 5 * 8 or 10 * 2 + 5.

We can do this by using the LPeg library for Lua. As it will turn out, some of its features that set it apart from other pattern-matching engines come in very handy when parsing such expressions.

So, take a simple expression apart:

 1+ 10
Enter fullscreen mode Exit fullscreen mode

As we can see, the general pattern for a sum is:

  • Any number of spaces (including 0)
  • A number
  • An operator
  • Another number

In Lua’s LPeg we could write it like so:


local lpeg = require lpeg
local space = lpeg.S( \n\t)^0
local number = space * lpeg.C(lpeg.R(09)^1) * space
local operator = + * space
local sum = number * operator * number

Enter fullscreen mode Exit fullscreen mode

Let’s go through our program line by line:

  • space = lpeg.S(" \n\t")^0: \ This is just our representation of any number of whitespace characters. We use the S operator (“Set”) here. This just matches any of the supplied characters. With ^0 we indicate that such a character can occur 0 or more times.
  • number = lpeg.C(lpeg.R("09")^1 ) / tonumber * space: We are using two LPeg operators here: R (“Range”) matches any character in a certain range (digits from 0 to 9 in our case). This is however, the crucial information that we will act on later. So, we need to “capture” this information which is exactly what LPeg’s C operator does. Next, we need to convert the result of the capture to a number, which is what we achieve by adding /tonumber. With the * operator we can concatenate LPeg patterns, so we make sure that a number is still valid if it has trailing white spaces.
  • operator = lpeg.C(lpeg.S"+-*/") * space: Another capture is the operator. We want to be able to provide basic arithmetic operations, i.e. +, -, /, and *.

The / tonumber is important here: Whatever is the result of the capture is sent to that number. Of course, our digits are string characters and hence, need to be converted to numbers. With the knowledge about this operator, we can think of a similar function that performs the basic math operations for us.

How does that look like?

Imagine a string like 1*2+3. So, it is clear it has to start with a number. After the first number, there are pairs of operators and other numbers.

Hence, we can build a function like this:

function fold (lst)
    local acc = lst[1]
    for i = 2, #lst, 2 do
        if lst[i] == "+" then
            acc = acc + lst[i + 1]
        elseif lst[i] == "*" then
            acc = acc * lst[i + 1]
        elseif lst[i] == "/" then
            acc = acc / lst[i + 1]
        else
            acc = acc - lst[i + 1]
        end
    end
    return acc
end
Enter fullscreen mode Exit fullscreen mode

This function expects a list, takes the first element and then iterates over the list with a step of 2. So, inside the for loop, our position i must refer to an operator, and i+1 to a number.

  • sum = space * lpeg.Ct(number * (operator * number)^0) / fold * -1:

Now, we introduce another capture variant: Ct which makes LPeg capture to a table. The same way we applied /tonumber for our number matcher, we can now pass the table capture to our fold function.

Let’s see if that works:

print(arithmetic:match(" 10*2 + 1 -1 / 2 "))
10.0
Enter fullscreen mode Exit fullscreen mode

So, yes, it does work. However, we already stepped into the first pitfal of designing a language and creating a parser for it: operator preference. We want multiplication and division to be executed before addition and subtraction.

Of course, we also did not account for parentheses in our arithmetic expressions.

💸 Note: This blog post contains sponsored content. See my policy on sponsored content

Top comments (1)

Collapse
 
jd2r profile image
DR

Really interesting. I'd love to see where you go with this series :)