Take this article as a sort of short introductory post. Recently I came up with the idea of creating a programming language. Seems easy, doesn't it? 😅 Is it possible? - Yes. Is it worth it? - Kind of. You see, by creating a language you can learn certain concepts that you wouldn't otherwise. It just makes you a better programmer. Also, it's fun. 😉 So, without further ado, let's create a programming language!
Photo by Glenn Carstens-Peters / Unsplash
Let's get prepared!
Well, I think it's gonna be an interesting (and quite long) series. Keep in mind that this will be a new experience even for me. Like many others, I'm just an ordinary guy without deeper knowledge of language creation. I just plan on creating something from nothing (or rather from other sources on the web). Now, some basic information. The language I'm going to make will be named AIM (because I'm aiming at my goal of creating a language, and it's nice and short). It's meant to be a multi-paradigm, statically typed, compiled language. For this, I'll be using LLVM (more on that later) and Node.js. As you might know, programming languages and similar lower-level stuff are usually written in C/C++, so I'm picking Node.js - an unusual project made with unusual tools. Also, I'm a bit more experienced with Node.js than with C/C++. 👍
So, if you're ready, let's first create a TODO list:
- Find a cool name ( done! ) 😀
- Create / imagine language's syntax.
- Create lexer.
- Create parser.
- Create compiler.
These are the most basic and definitely required steps. Further in the future, in no particular order:
- Standard library implementation
- Runtime library implementation
- Self-hosting/compiler bootstrapping
- Examples/other libraries/first app
Now, let's get back to the ordered list and discuss each point for a second or two. I'll be covering each of these points in the following posts, so I'll be brief here.
Photo by Fabian Grohs / Unsplash
Create language syntax
Syntax, and the general idea behind it, is one of the most important things when designing a language, if not the most important. Many programming languages have similar syntaxes. They build on what has already been proven to work, and that's fine. That's what you should consider when creating a programming language of your own. But, as I'm not really planning on creating this particular language for general-purpose use (but who knows 😂), I most likely won't follow this advice, in order to create something new and fresh. Of course, it won't be as radical as this, so don't panic, just a little bit different. I still don't know exactly what it will look like, but I'm sure it won't be so standard. 🤔
The lexer is nothing fancy. It's just a piece of software that takes your code and turns it into a series of tokens, each with some additional metadata. At least that's what I know for now. Of course, there's a plan for posts about each step of language development, so be patient.
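To make the idea of "tokens with metadata" a bit more concrete, here is a minimal sketch of a lexer in plain Node.js. AIM's syntax isn't defined yet, so the token rules, the `tokenize` name, and the token shape (`type`, `value`, `start`, `end`) are all assumptions made up for illustration.

```javascript
// Hypothetical token rules: AIM's real syntax is not decided yet.
// Each rule is [token type, regex anchored at the start of the remaining input].
const TOKEN_RULES = [
  ["whitespace", /^\s+/],
  ["number", /^\d+(\.\d+)?/],
  ["identifier", /^[A-Za-z_]\w*/],
  ["operator", /^[+\-*\/=]/],
  ["paren", /^[()]/],
];

// Turns a source string into a flat list of tokens.
// The metadata here is just the start/end offsets of every token.
function tokenize(source) {
  const tokens = [];
  let pos = 0;

  while (pos < source.length) {
    const rest = source.slice(pos);
    let matched = false;

    for (const [type, pattern] of TOKEN_RULES) {
      const match = pattern.exec(rest);
      if (!match) continue;

      const text = match[0];
      // Whitespace only separates tokens, so it is skipped.
      if (type !== "whitespace") {
        tokens.push({ type, value: text, start: pos, end: pos + text.length });
      }
      pos += text.length;
      matched = true;
      break;
    }

    if (!matched) {
      throw new SyntaxError(`Unexpected character "${source[pos]}" at offset ${pos}`);
    }
  }

  return tokens;
}

console.log(tokenize("x = 40 + 2"));
```

A real lexer would probably also track line and column numbers for error messages, but the overall loop would look similar.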
A parser is much more important than the lexer. From the list of tokens created by the lexer, the parser creates what's called an AST (Abstract Syntax Tree). It's basically a representation of the parsed code in the form of a data tree, which lets you interact with it programmatically. So it's important for the parser to be fast and well-designed, since it'll be used by things like linters, pretty-printers and so on - generally, by any software that's meant to interact with the syntax of your language directly. The AST is also what will be used at the stage of compiling the language to its machine code representation. Which brings us to the last, most important step...
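As a rough illustration of going from tokens to an AST, here is a small recursive-descent parser for arithmetic expressions. It's only a sketch: the token shape (`type`, `value`), the node names (`NumberLiteral`, `Identifier`, `BinaryExpression`) and the `parse` function are hypothetical, not AIM's actual design.

```javascript
// Parses a token list into an AST for expressions like "1 + 2 * 3".
// Assumed token shape: { type: "number" | "identifier" | "operator" | "paren", value: string }.
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const isOperator = (token, ops) =>
    token !== undefined && token.type === "operator" && ops.includes(token.value);

  // primary := number | identifier | "(" additive ")"
  function parsePrimary() {
    const token = next();
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token.type === "number") return { type: "NumberLiteral", value: Number(token.value) };
    if (token.type === "identifier") return { type: "Identifier", name: token.value };
    if (token.type === "paren" && token.value === "(") {
      const inner = parseAdditive();
      const close = next();
      if (!close || close.type !== "paren" || close.value !== ")") {
        throw new SyntaxError("Expected ')'");
      }
      return inner;
    }
    throw new SyntaxError(`Unexpected token "${token.value}"`);
  }

  // multiplicative := primary (("*" | "/") primary)*
  function parseMultiplicative() {
    let left = parsePrimary();
    while (isOperator(peek(), ["*", "/"])) {
      const operator = next().value;
      const right = parsePrimary();
      left = { type: "BinaryExpression", operator, left, right };
    }
    return left;
  }

  // additive := multiplicative (("+" | "-") multiplicative)*
  function parseAdditive() {
    let left = parseMultiplicative();
    while (isOperator(peek(), ["+", "-"])) {
      const operator = next().value;
      const right = parseMultiplicative();
      left = { type: "BinaryExpression", operator, left, right };
    }
    return left;
  }

  const ast = parseAdditive();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token "${peek().value}"`);
  return ast;
}

// Hand-written tokens for "1 + 2 * 3", as a lexer might produce them.
const exampleTokens = [
  { type: "number", value: "1" },
  { type: "operator", value: "+" },
  { type: "number", value: "2" },
  { type: "operator", value: "*" },
  { type: "number", value: "3" },
];

// The AST is plain data, so it can be printed or handed to other tools as JSON.
console.log(JSON.stringify(parse(exampleTokens), null, 2));
```

Splitting the grammar into one function per precedence level is what makes `2 * 3` end up nested under the `+` node, so multiplication binds tighter than addition.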
The compiler is what programming languages (the compiled ones, of course) are all about - being able to execute. The compiler takes your code and outputs (usually) machine code. I've decided to implement the compiler with the help of LLVM (the name originally stood for Low-Level Virtual Machine), a so-called compiler infrastructure library. It has been used to build languages like (most notably) Rust and Swift, as well as Clang, one of the most popular C/C++ compiler front-ends, so it must be good enough for this project. 😉 It's obviously much easier to generate machine code through LLVM's C API than to write assembly by hand. Still, LLVM is very big and complicated, so I'm going to spend a fair bit of time with its documentation, which is very well written. A Node.js binding to the C API may be necessary here.
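To give a feeling for what the compiler stage does with an AST, here is a tiny sketch that walks a tree and prints textual LLVM IR. Note that this is not the C API approach mentioned above: it simply builds the IR as a string, which avoids any Node.js binding. The AST node shapes and the `emitModule` function are assumptions made up for this example, and it only handles 32-bit integer arithmetic.

```javascript
// Emits textual LLVM IR for an integer expression AST.
// Assumed node shapes: { type: "NumberLiteral", value } and
// { type: "BinaryExpression", operator, left, right }.
function emitModule(ast) {
  const body = [];
  let tempCount = 0;
  // LLVM instruction for each supported operator (signed 32-bit integers only).
  const opcodes = { "+": "add", "-": "sub", "*": "mul", "/": "sdiv" };

  // Returns the IR operand (a constant or a %temporary) holding the node's value.
  function emit(node) {
    if (node.type === "NumberLiteral") {
      if (!Number.isInteger(node.value)) throw new Error("Only integers are supported");
      return String(node.value);
    }
    if (node.type === "BinaryExpression") {
      const opcode = opcodes[node.operator];
      if (!opcode) throw new Error(`Unsupported operator "${node.operator}"`);
      const left = emit(node.left);
      const right = emit(node.right);
      const name = `%t${tempCount++}`;
      body.push(`  ${name} = ${opcode} i32 ${left}, ${right}`);
      return name;
    }
    throw new Error(`Unsupported node type "${node.type}"`);
  }

  const result = emit(ast);
  // The expression's value becomes the return value of main.
  return ["define i32 @main() {", "entry:", ...body, `  ret i32 ${result}`, "}"].join("\n");
}

// AST for "1 + 2 * 3", written by hand.
const exampleAst = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "NumberLiteral", value: 1 },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "NumberLiteral", value: 2 },
    right: { type: "NumberLiteral", value: 3 },
  },
};

console.log(emitModule(exampleAst));
```

The printed text could be saved to a `.ll` file and passed to LLVM's own tools, which take care of optimization and machine code generation. Whether AIM ends up emitting text like this or calling the C API through a binding is still an open question.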
Photo by Vlad Bagacian / Unsplash
So that's what's coming next. I'm not covering the other points, as only time will tell what the future brings. After this little intro, I hope both you and I will enjoy this journey. Again, keep in mind that I'm not an expert and I'll just be sharing my own experience, so if you've got any tips for me, I'll be grateful. The next post will be about the syntax of this creation, so follow me on Twitter if you want any updates. Once more, thank you for reading this intro and let the journey begin...👍
Top comments (7)
Will that be what kind of language? Functional? Imperative? OO? maybe multi-paradigm?
America On-Line Instant Messenger language
As written in the post - most likely multi-paradigm.
An odd choice for Node.js. Will the development be on GitHub?
Why not Rust?
You can always export the AST as JSON for dev tools.
Do you expect to rely on new Function(string) while prototyping, before switching to LLVM?
In fact, yes - Rust is a good choice, but I wanted to create this language in JS. Of course, compiler performance may not be perfect, but even so, I may end up self-hosting the language later. I haven't thought much about prototyping so far, only about the parser and the LLVM docs (a lot of reading), but yeah, a small interpreter would be nice to have. For now, I'm focusing on syntax, with a new post coming soon 👍