loading...
Cover image for Application to the GSoC 2020

Application to the GSoC 2020

_gusgustavo profile image Gustavo Castellanos Updated on ・4 min read

gsoc2020 (2 Part Series)

1) My project for GSoC 2020: A Parser Generator with Automatic Error Recovery on LPeg(Label) 2) Application to the GSoC 2020

Background

Firstly, I am a fan of functional programming since I had to learn Haskell for an interpreter I had to build for a compiler design course at my university. I built that interpreter with Germán Robayo, who is also working on a project related to programming languages for the GSoC 2020. You can check his posts about his journey here.

After taking that course, I decided to take another course where instead of building an interpreter, we had to build a compiler from a programming language designed by us to MIPS32. Funny thing: Since the COVID-19 quarantine, my professor has not had the chance to evaluate our final compiler.

Preparation

So with that background, I decided I should participate on a project related to programming languages for the GSoC. As a first step, we the students had to reach out the organizations we would like to work with and choose one of their projects. My little knowledge about Elixir made me write to the Erlang Ecosystem Foundation for working on a syntax highlighter; obviously I also reached out to Haskell.org, but I had no answer. Then I stepped upon LabLua and their project about building a parser generator with automatic error recovery. Until then, the only parser generator I used was Happy, which I really loved to use for the projects I already mentioned. So it seemed like a challenge but an interesting one.

So I reached out to Sérgio Medeiros, the mentor of that project. He is a very nice guy. He told me that I should first get familiar with LPegLabel, an extended version of the library LPeg for building PEG parsers with labeled failures. I had to be familiar with Lua, of course.

My first PEG parser

To see if I am capable of working on the parser generator, Sérgio gave me the task of building a parser for the following expression grammar, using LPegLabel:

Program -> (Cmd | Exp)*
Cmd      -> var '=' Exp
Exp       -> Exp '+' Term | Exp '-' Term | Term
Term     -> Term '*' Factor | Term '/' Factor | Factor
Factor   -> num | var | '(' Exp ')'

He told me that the parser needed error labels to provide better error messages, so I may wanted to read the first 10 pages of his paper about annotating PEGs.

Before coding, I had to get familiar with Lua. Right then I forgot all the other projects. I started reading the chapters of Programming in Lua (first edition) that seemed essential to me for the project. Those were all the Part I, Part II excluding metatables (I still don't get them) and some chapters from Part III.

Then I read about Parsing Expression Grammars, but not about their implementations. Maybe I will in the future. And next, I read the documentation of LPeg and LPegLabel.

Cool, I was ready to start coding. Oh, wait! LPeg does not support left recursions, so first I had to change the grammar and remove any left recursion. This was the resulted grammar:

Program <- (Cmd | Exp)*
Cmd <- var = Exp
Exp <- Term ('+' Term | '-' Term)*
Term <- Factor ('*' Factor | '/' Factor)*
Factor <- var | num | '(' Exp ')'

Coding it with Lua and LPeg was fun! Lua, despite of being really simple, has many high level constructors. Functions are first class citizens, tables are like objects in JavaScript, LPeg defines its own algebraic operations for the patterns (using metatables!!), and many other things were very cool to me.

Then I inserted some error labels. The approach was the following: If a specific element in a pattern does not match and then the entire matching fails, the parser should throw an error label when that element does not match. Easy to understand, right? If you have any questions about that, let me know in the comments.

Considering {ErrLabel} as a syntax for "throwing ErrLabel", and p^ErrrLabel as syntax sugar for p | {ErrLabel}, here is the annotated grammar:

Program <- ( Cmd | Exp | &. {ErrStmt} )*
Cmd     <- var '=' Exp^ErrExp
Exp     <- Term ( ( '+' | '-' ) Term^ErrTerm )*
Term    <- Factor ( ( '*' | '/' ) Factor^ErrFactor )*
Factor  <- var | num | '(' Exp^ErrExp ')'^ErrClosePar

I wrote some tests using LuaUnit. Then I updated them using Busted, recommended by Sérgio.

I shared the parser to Sérgio, he gave me some observations and told me I "passed the test" (he did not say that literally).

Proposal

So now I had to write a proposal for the Parser Generator. I read again Sérgio's paper, made some notes, analyzed what I can and cannot do. The paper describes two possible algorithms, one that inserts many labels but may insert some labels wrongly, and another one that inserts few labels but only good ones. I decided to implement the second one.

I wrote the proposal. Submitted it to the GSoC 2020 site and waited for a response.

Approval

One day I went to sleep very late. I woke up in the afternoon. I had an unread message from my girlfriend saying "CONGRATULATIONS", and I was like "wut?". Then I read a group chat I have with some friends, where Germán was giving the news that he got approved and so did I! MY FRIENDS GET THE NEWS BEFORE ME! Anyway, it was a very happy moment. I wrote to Sérgio thanking him for this opportunity. And here I am, still working in the early morning with my parser generator, now called Caribay.

gsoc2020 (2 Part Series)

1) My project for GSoC 2020: A Parser Generator with Automatic Error Recovery on LPeg(Label) 2) Application to the GSoC 2020

Posted on by:

_gusgustavo profile

Gustavo Castellanos

@_gusgustavo

I'm a student of Computer Engineering in my last year.

Discussion

markdown guide