Dwayne Crooks

Posted on Apr 16

Writing more readable parsers in Haskell using keep and skip from elm/parser

#haskell #parser #elm

The CPS transformer from Chapter 6 of EOPL3 reuses the LETREC language from Chapter 3 with the addition of multiargument procedures and multideclaration letrec expressions. I already wrote a parser for it when I solved exercise 3.33 so I decided to start writing my implementation of the CPS transformer by basing my implementation on LETREC-3.33.

Before starting on the main task I decided to refactor its parser. I extracted a lexer, Lexer.hs, which simplified the parser alot but then I was left with parsing expressions that worked but felt a bit icky.

For e.g.

ifExpr :: Parser Expr
ifExpr = If <$> (rIf *> expr) <*> (rThen *> expr) <*> (rElse *> expr)

In particular, I didn't like the placement of the parentheses and how it grouped the terminals with the expression to the right of it. It didn't match how I thought of the parser doing a left-to-right parse of the string. This made me wonder if thinking in terms of keep and skip from elm/parser would improve the situation.

`keep` and `skip` from `elm/parser`

keep is implemented by the function (|=) : Parser (a -> b) -> Parser a -> Parser b and it is used to keep values in a parser pipeline.

skip is implemented by the function (|.) : Parser keep -> Parser ignore -> Parser keep and it is used to skip values in a parser pipeline.

You typically use them in a pipeline along with succeed as follows:

ifExpr : Parser Expr
ifExpr =
  P.succeed If
    |. rIf
    |. leftParen
    |= P.lazy (\_ -> expr)
    |. rightParen
    |= block
    |= optional
        ( P.succeed identity
            |. rElse
            |= block
        )

N.B. This example was taken from Monkey/Parser.elm.

I personally find that to be easier to read and understand. It's also easier to write. You just have to think about what you want to keep and what you want to skip. And, you don't have to worry about the proper placement of parentheses.

Finding equivalents in Haskell

pure :: a -> f a corresponds with succeed.

(<*>) :: f (a -> b) -> f a -> f b corresponds with (|=).

(<*) :: f a -> f b -> f a corresponds with (|.).

Hence,

If <$> (rIf *> expr) <*> (rThen *> expr) <*> (rElse *> expr)

can be rewritten as:

pure If <* rIf <*> expr <* rThen <*> expr <* rElse <*> expr

Since (<*) and (<*>) are both left associative and have the same precendence the above is interpreted as:

(((((((pure If) <* rIf) <*> expr) <* rThen) <*> expr) <* rElse) <*> expr)

Starting with the innermost parenthesized expression and moving outward, it says:

Lift If into the applicative
Parse "if" but skip its value
Parse expr and keep its value
Parse "then" but skip its value
Parse expr and keep its value
Parse "else" but skip its value
Parse expr and keep its value

The key takeaway for me is that you write, read, and understand it from left-to-right.

About `(<$)`

And then I learned about (<$) :: a -> f b -> f a and noticed that:

pure x <* y

is equivalent to

x <$ y

So now:

pure If <* rIf <*> expr <* rThen <*> expr <* rElse <*> expr

can be rewritten as:

If <$ rIf <*> expr <* rThen <*> expr <* rElse <*> expr

Conclusion

Thinking in terms of keep and skip helped me to improve the readability of all my parsers. An unexpected consequence is that they are also easier to write now. I simply look at the non-terminal for which I want to write the parser and determine what I want to keep and what I want to skip.

Grammar:

Program ::= Expr
Expr    ::= Const | Var | Diff | Zero | If | Let | Proc | Call | Letrec
Const   ::= Number
Var     ::= Id
Diff    ::= '-' '(' Expr ',' Expr ')'
Zero    ::= 'zero?' '(' Expr ')'
If      ::= 'if' Expr 'then' Expr 'else' Expr
Let     ::= 'let' Id '=' Expr 'in' Expr
Proc    ::= 'proc' '(' Id ')' Expr
Call    ::= '(' Expr Expr* ')'
Letrec  ::= 'letrec' (Id '(' (Id (',' Id)*)? ')' '=' Expr)* 'in' Expr
Number  ::= [0-9]+
Id      ::= [a-z]+

Parser:

program :: Parser Program
program = Program <$ whiteSpace <*> expr <* eof

expr :: Parser Expr
expr
  = constExpr
  <|> diffExpr
  <|> zeroExpr
  <|> ifExpr
  <|> letrecExpr
  <|> letExpr
  <|> procExpr
  <|> callExpr
  <|> varExpr

constExpr :: Parser Expr
constExpr = Const <$> number

diffExpr :: Parser Expr
diffExpr = hyphen *> parens (Diff <$> expr <* comma <*> expr)

zeroExpr :: Parser Expr
zeroExpr = Zero <$ rZero <*> parens expr

ifExpr :: Parser Expr
ifExpr = If <$ rIf <*> expr <* rThen <*> expr <* rElse <*> expr

letrecExpr :: Parser Expr
letrecExpr = Letrec <$ rLetrec <*> many recProc <* rIn <*> expr
  where
    recProc = (,,) <$> identifier <*> parens (commaSep identifier) <* equal <*> expr

letExpr :: Parser Expr
letExpr = Let <$ rLet <*> identifier <* equal <*> expr <* rIn <*> expr

procExpr :: Parser Expr
procExpr = Proc <$ rProc <*> parens identifier <*> expr

callExpr :: Parser Expr
callExpr = parens (Call <$> expr <*> many expr)

varExpr :: Parser Expr
varExpr = Var <$> identifier

N.B. See Letrec/Parser.hs for the full source code.

P.S. It seems I discovered this improvement 4 years ago, when I implemented Monkey in Haskell, but forgot about it. Hopefully, writing about it will help me remember to use it in the future.

Subscribe to my newsletter

If you're interested in improving your skills with Elm then I invite you to subscribe to my newsletter, Elm with Dwayne. To learn more about it, please read this announcement.

DEV Community

Writing more readable parsers in Haskell using keep and skip from elm/parser

`keep` and `skip` from `elm/parser`

Finding equivalents in Haskell

About `(<$)`

Conclusion

Subscribe to my newsletter

Top comments (0)

keep and skip from elm/parser

Finding equivalents in Haskell

About (<$)

Conclusion

Subscribe to my newsletter

`keep` and `skip` from `elm/parser`

About `(<$)`