As a programmer, I have always seen compilers as a million-line black box, out-daunted only by making an operating system. But hard challenges are the best challenges, so a while ago I set out to try and make one myself.
1) Lexing
The process of lexing, or lexical analysis, is actually very straightforward relative to the rest of this process. Consider the following code:
const hello = "Hello, " + "World!"; const sum = 4 + 5;
When lexing a piece of code, you go through the entire source and convert the string into a collection of Tokens. Tokens are simple structures that store information about a small sliver of the source code. For the lexer that I wrote, I use four main Token types: Keyword, Word, String, and Symbol. So the code above might look something like this after lexing:
Keyword<"const"> Word<"hello"> Symbol<"="> String<"Hello, "> Symbol<"+"> String<"World!"> Symbol<";"> Keyword<"const"> Word<"sum"> Symbol<"="> Word<"4"> Symbol<"+"> Word<"5"> Symbol<";">
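To make the idea concrete, here is a minimal stand-alone sketch of a lexer for the snippet above. It is not Mantle's API: the `lex` and `formatTokens` functions, the token shape, and the rule that a word character is a letter, digit, or underscore are all assumptions made for illustration, and it does not handle escape sequences or comments.

```javascript
// Hypothetical stand-alone lexer sketch; none of these names come from Mantle.
const KEYWORDS = new Set(["const"]);
const SYMBOLS = new Set(["=", "+", ";"]);
const STRING_DELIMS = new Set(['"']);

// Assumption: a "word" character is a letter, digit, or underscore.
const isWordChar = (ch) => /[A-Za-z0-9_]/.test(ch);

function lex(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      // Whitespace only separates tokens.
      i++;
    } else if (STRING_DELIMS.has(ch)) {
      // Read up to the matching delimiter (no escape handling in this sketch).
      const end = source.indexOf(ch, i + 1);
      if (end === -1) throw new SyntaxError(`Unterminated string at index ${i}`);
      tokens.push({ type: "String", value: source.slice(i + 1, end) });
      i = end + 1;
    } else if (SYMBOLS.has(ch)) {
      tokens.push({ type: "Symbol", value: ch });
      i++;
    } else if (isWordChar(ch)) {
      // Consume the longest run of word characters.
      let j = i;
      while (j < source.length && isWordChar(source[j])) j++;
      const value = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(value) ? "Keyword" : "Word", value });
      i = j;
    } else {
      throw new SyntaxError(`Unexpected character ${JSON.stringify(ch)} at index ${i}`);
    }
  }
  return tokens;
}

// Render tokens in the Type<"value"> notation used in this post.
const formatTokens = (tokens) =>
  tokens.map((t) => `${t.type}<${JSON.stringify(t.value)}>`).join(" ");

console.log(formatTokens(lex('const hello = "Hello, " + "World!"; const sum = 4 + 5;')));
```

Note that numbers come out as Word tokens here, which matches the four token types listed above; a lexer with a dedicated number type would need one more branch.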
If you've made it this far, then Awesome!
My project, Mantle, makes this super easy to do through an abstract class you can extend called mantle.lexer.Lexer. You simply define a list of keywords, symbols, and string delimiters, tell it whether to allow comments or not, and pass a function that defines whether a character can be used in a word. After that, creating the list above becomes as easy as calling Lexer.parse(). Moving on, though, you will almost never call it yourself.
More on Mantle can be found at https://github.com/Nektro/mantle.js
2) Parsing
This is the hard part.
Parsing requires you to figure out patterns of tokens that can compress the token list into a single node. This took a lot of trial and error to get right, and is the main reason why this project took so long.
For instance, for the code we had above, we might define the following rules:
Add <= String + String
Add <= Integer + Integer
AssignmentConst <= const Word = Add
StatementList <= Add Add
The more complex the language, the more complex the rules get, which I discovered very soon.
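As a rough sketch of how rules like these can collapse a token list into one node, the snippet below repeatedly replaces the first matching pattern with a parent node. This is not how Mantle implements it. The `reduceOnce` and `parse` helpers, the `{ kind, value }` node shape, and the slightly adjusted rule set (the assignment rule also consumes the `;`, and the statement list is built from two `AssignmentConst` nodes so the example reduces fully) are all assumptions for illustration.

```javascript
// Hypothetical rule table: each rule names the node kind it produces
// and the sequence of kinds it replaces. Adjusted from the rules in the text.
const RULES = [
  { kind: "Add", pattern: ["String", "+", "String"] },
  { kind: "Add", pattern: ["Integer", "+", "Integer"] },
  { kind: "AssignmentConst", pattern: ["const", "Word", "=", "Add", ";"] },
  { kind: "StatementList", pattern: ["AssignmentConst", "AssignmentConst"] },
];

// Apply the first rule that matches anywhere; return the new list, or null if none match.
function reduceOnce(nodes) {
  for (const rule of RULES) {
    const len = rule.pattern.length;
    for (let i = 0; i + len <= nodes.length; i++) {
      const matches = rule.pattern.every((kind, k) => nodes[i + k].kind === kind);
      if (matches) {
        const parent = { kind: rule.kind, children: nodes.slice(i, i + len) };
        return [...nodes.slice(0, i), parent, ...nodes.slice(i + len)];
      }
    }
  }
  return null;
}

// Keep reducing until no rule applies; every reduction shortens the list, so this terminates.
function parse(nodes) {
  let current = nodes;
  for (let next = reduceOnce(current); next !== null; next = reduceOnce(current)) {
    current = next;
  }
  if (current.length !== 1) {
    throw new SyntaxError(`Could not reduce to a single node (${current.length} left)`);
  }
  return current[0];
}

// Leaf nodes standing in for lexer output; keywords and symbols use their text as the kind.
const leaf = (kind, value = kind) => ({ kind, value });
const input = [
  leaf("const"), leaf("Word", "hello"), leaf("="),
  leaf("String", "Hello, "), leaf("+"), leaf("String", "World!"), leaf(";"),
  leaf("const"), leaf("Word", "sum"), leaf("="),
  leaf("Integer", "4"), leaf("+"), leaf("Integer", "5"), leaf(";"),
];

const tree = parse(input);
console.log(tree.kind, tree.children.map((child) => child.kind));
```

Trying rules in a fixed order and rescanning from the start is simple but slow and easy to get wrong once rules overlap, which is a large part of the trial and error mentioned above.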
The JSON example for mantle.parser.Parser can be found at https://github.com/Nektro/mantle.js/blob/master/langs/mantle-json.js
3) Code generation
This is the process of going through your final condensed node, also called an Abstract Syntax Tree, and calling toString() on every node in it until you get your new output.
Optimization of higher-level languages requires a lot more work than calling toString(), but that is way beyond my scope.
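A minimal sketch of this toString() approach might look like the following. The node classes here (`Literal`, `Add`, `AssignmentConst`, `StatementList`) are hypothetical names chosen to mirror the parsing rules in this post, not classes taken from Mantle, and the tree is built by hand rather than by a parser.

```javascript
// Hypothetical AST node classes; each one knows how to print itself.
class Literal {
  constructor(value) { this.value = value; }
  // JSON.stringify quotes strings and leaves numbers bare.
  toString() { return JSON.stringify(this.value); }
}

class Add {
  constructor(left, right) { this.left = left; this.right = right; }
  // Template literals call toString() on the child nodes.
  toString() { return `${this.left} + ${this.right}`; }
}

class AssignmentConst {
  constructor(name, value) { this.name = name; this.value = value; }
  toString() { return `const ${this.name} = ${this.value};`; }
}

class StatementList {
  constructor(statements) { this.statements = statements; }
  // Array.prototype.join converts each statement with toString().
  toString() { return this.statements.join("\n"); }
}

// A hand-built tree for the two statements used earlier in this post.
const ast = new StatementList([
  new AssignmentConst("hello", new Add(new Literal("Hello, "), new Literal("World!"))),
  new AssignmentConst("sum", new Add(new Literal(4), new Literal(5))),
]);

console.log(String(ast));
// const hello = "Hello, " + "World!";
// const sum = 4 + 5;
```

Because every node only prints itself and delegates to its children, emitting a different target language comes down to swapping the toString() bodies.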
4) Corgi - my new HTML Preprocessor
At this point I was ecstatic. I had successfully made a JSON parser. But I wanted to make something a little more complicated, so I moved on to HTML. The thing is, though, HTML isn't very well formed. So I thought I'd make a version that's a little easier for Mantle to parse. And that's how I came up with Corgi.
Corgi's syntax is inspired by Pug but isn't tab-based, so you can theoretically compress a file onto one line. I loved this because the forced tab structure made using cosmetic HTML tags in Pug really awkward. So Corgi makes HTML great for both structure and style.
An example Corgi document would look like:
doctype html
html(
    head(
        title("Corgi Example")
        meta[charset="UTF-8"]
        meta[name="viewport",content="width=device-width,initial-scale=1"]
    )
    body(
        h1("Corgi Example")
        p("This is an example HTML document written in "a[href="https://github.com/corgi-lang/corgi"]("Corgi")".")
        p("Follow Nektro on Twitter @Nektro")
    )
)
Making compilers is hard but has definitely been fun, and I hope this helps demystify them some.
And now I also have an HTML Preprocessor I'm going to use in as many projects as it makes sense to.
Top comments (11)
This is cool. A great extension would be one where you could mix-in standard Bootstrap markup through use of a keyword, eg.
p("This is an example HTML document written in "a[href="https://github.com/corgi-lang/corgi"]("Corgi")".")
p("Follow Nektro on Twitter @nektro ")
now THAT would save a ton of time methinks. Wonder if the pre-processor could load up that json while compiling and substitute it in (for readability and all.)
Totally. Sometime very soon I was going to add syntax for an import statement that I could work into a (gulp, etc) plugin that could reference other corgi documents.
As it stands right now, tags and attributes are allowed a ([a-z0-9-]+) range for the name, so custom elements and attributes are already possible. :)
What did you have in mind for the contents of
So if you look at the standard Bootstrap 3 accordion setup: getbootstrap.com/docs/3.3/javascri... that could be turned into a JSON object, I'm sure. Perhaps something like this:
This is potentially the content of my first tab",
This is potentially the content of my second tab",
This is potentially the content of my third tab"
Your parser could interpret a shortcode with a JSON parameter to build the entire structure of the accordion quickly and cleverly. Perhaps you could allow for passing in variables/settings like in the JSON above. It could potentially work for any of the preset Bootstrap components, I think?
Something like this?
Importing the code from
<bs4-accordion> becomes a tag available anywhere in the document. The slightly monotonous init JS is because of the specificity of Bootstrap syntax, but the Custom Element sets all that up for you.
Sounds ideal then :)
Have you looked at riot.js? Perhaps some knowledge to glean from there too.
Are you happy with it?
I like the way this gives a high level overview of the problem. It was helpful for me as I'm a CS student and have a compilers course atm.
I think .pug files already do the same thing. Do we need HTML pre-compilers?
We don't need preprocessors, especially since my files can't be natively interpreted by a browser, so using one does add a step to your development process, depending on how all-in you go on a preprocessor. Mileage may vary.
This project had been on my back burner for a while, and I decided to write this post because I had finally gotten it to the point where I had made something potentially useful. I was using Pug for another project of mine, but the other day I converted all of those pages to Corgi for the advantages I mentioned above, with the added bonus of the satisfaction that the code I publish for my next project is that little bit more mine.