Joona Piirainen

Posted on

# Implementing parser combinators pt. 1

One of the many strengths of haskell is the ability to easily embed other small languages (sometimes called EDSL, short for embedded domain specific language) inside it. One famous example of this is the family of libraries known as parser combinators. They aim to solve the task of parsing structured text into well-typed data-structures, and succeed in it quite beautifully. In this short series of blog posts I will show you how to implement a simple and usable parser combinator library. You can find the source code for the library we will be developing here. For real world use-cases you should propably just use megaparsec.

## MicroParser

So now let's build own own toy parser combinator library! We will call it MicroParser.

``````{-# LANGUAGE DeriveFunctor #-}
module MicroParser where

import Control.Applicative (Alternative (..))
import qualified Data.Char as Char
import Data.List (intercalate)
``````

Let's start by thinking what the type of a parser should be? Structurally a parser is a function which takes an input stream of characters and yields a parse tree by applying the parsing logic over the input stream of characters to build up a composite data structure.

``````data ParseResult
= ParseSuccess !a !Int String
| ParseFailure [(Int, String)]
deriving (Functor, Show)

newtype Parser a
= Parser {unParser
:: Int
-- ^ offset
-> String
-- ^ input stream of characters
-> ParseResult a
}
``````

Our parser is polymorphic over the thing our parser should produce when successful. Internally our parser function takes an `Int` as it's first argument, which represents the position of the character we are currently parsing. This is useful for reporting errors. Other than that the type is relatively straight forward. Let's define a function for running a parser.

``````runParser :: Parser a -> String -> Either String a
runParser (Parser p) ts = case p 0 ts of
ParseSuccess x _ _ -> Right x
ParseError es ->
Left \$
"Expecting "
<> intercalate
" OR "
[ e <> " at position " <> show i
| (i, e) <- es
]
``````

Running a parser yields either a parse result when successful and an error message in the case of an error, which we represent here as a (somewhat) formatted string. In proper parser combinator libraries error handling is handled in more elegant and structured ways.

Now we have defined the basic structure of our library, but it is not very usable at this point. We have no convenient way of constructing parsers nor do we have any way of combining them. So clearly we need to do some more work. The power of parser combinators mostly come from the typeclass instances we define for our `Parser` type. Especially `Applicative` and `Alternative` in our case. Some parser combinator libraries encourage the usage of the syntactic sugar haskell provides for monads. We are not going to define a monad instance for our parser since we can do plenty without it.

To define `Applicative` and `Alternative` instances we need to first define `Functor` for out `Parser` type. Here is the `Functor` instance definition for our `Parser` type.

``````instance Functor Parser where
fmap f (Parser p) = Parser (\i ts -> fmap f (p i ts))
``````

Now we can move on to the more interesting instances. Let's start with Applicative.

``````instance Applicative Parser where
pure x = Parser (ParseSuccess x)
Parser a <*> Parser b =
Parser
( \i ts ->
case f i ts of
ParseError es -> ParseError es
ParseSuccess x i' ts' -> case g i' ts' of
ParseError es' -> ParseError es'
ParseSuccess x' i'' ts'' -> ParseSuccess (x x') i'' ts''
)
``````

This might be a bit intimidating if you are unfamiliar with `Applicative`. I'll not go through explaining `Applicatives` in detail here, since it's out of the scope of this blog post, but I highly suggest learning what they are and to start building an intuition on when you should use them. Here is a good starting point.

### Alternatives

Next we will define `Alternative`. Alternative is not as well known as typeclasses such as `Functor`, `Applicative` and `Monad` but are super useful for our use-case, so I think it's worth looking at them in a bit more detail.
The `Alternative` typeclass has four members including `empty`, `some`, `many` and most importantly for us `<|>`. Let's explore what `<|>` does. The `Maybe` data type is well known so I will use it for demostration purposes since most people know what it represents. For me the mental model for `<|>` is the following. "Try executing the left hand side function and if successful, return the result, otherwise return the result of calling the right hand side function". (This mental model happens to work for `Maybe`, but is not always correct since the operator can have different effects on other data types.)

``````ghci> Just 4 <|> Just 5
Just 4
ghci> Just 4 <|> Nothing
Just 4
ghci> Nothing <|> Just 4
Just 4
ghci> Nothing <|> Nothing
Nothing
``````

But we are implementing parsers... how can we use this? Well we can express choice with it. Particulally we can think of it as saying "try parse this, then try parse this, then try parse this, and so on". This might sound a bit abstract at this point. But will become clear when we are finished!

``````instance Alternative Parser where
empty = Parser (\_ _ -> ParseError [])
Parser f <|> Parser g =
Parser
( \i ts ->
case f i ts of
success@ParseSuccess {} -> success
ParseError errs0 -> case g i ts of
success@ParseSuccess {} -> success
ParseError errs1 -> ParseError (errs0 <> errs1)
)
``````

This instance definition actually follows the mental model quite clearly. "Run the first parser, if successful, return the results, otherwise run the second parser".

We are getting closer. Almost all the hard work is now done and we can move on to the fun part.

### Combinators

To wrap up the first part of this series of posts, we will define a fundamental combinator found in almost all of the parser combinator libraries, `satisfy`, and show a couple of use-cases for it. We will write `satisfy` in terms of a more general function called `satisfyMaybe`, which takes a description (used for error messages), a function from `Char` to a `Maybe` value indicating weather the character satisfies some condition or not.

``````satisfyMaybe :: String -> (Char -> Maybe a) -> Parser a
satisfyMaybe descr p =
Parser
( \i ts -> case ts of
(t : ts') | Just x <- p t -> ParseSuccess x (i + 1) ts'
_ -> ParseError [(i, descr)]
)

satisfy :: String -> (Char -> Bool) -> Parser Char
satisfy descr p = satisfyMaybe descr (\t -> if p t then Just t else Nothing)
``````

Once we have `satisfy` we can define various combinators based on it. We will start with one that parses any character, one that parses a specific character, one for parsing whitespace and finally one that parses a specific sequence of characters.

``````anyChar :: Parser Char
anyChar = satisfy "any character" (const True)

char :: Char -> Parser ()
char c = void \$ satisfy (show c) (== c)

spaces :: Parser ()
spaces = void \$ many (satisfy "whitespace" Char.isSpace)

string :: String -> Parser ()
string [] = pure ()
string (x : xs) = char x *> string xs
``````

Here's an example of how one might use these. Suppose we want to parse a key value pair into a `KV` data type.

``````-- | example kaye value pair
text :: String
text = "key: value"

-- | Data type we would like parse the text into
newtype KV = KV (String, String) deriving (Show)
``````

We can do this via the following parser. Here we leverage some of the combinators we defined previously.

``````pKv :: Parser KV
pKv = KV <\$> parseKv
where
parseKv = (,) <\$> many alpha <* char ':' <* spaces <*> many alpha
``````

Admittedly this example is not the best one and doesn't show the full power of parser combinators. We need some additional combinators to gain more power to see what they are really capable of. But that is a challenge we will tackle in part 2 of this series of posts.

Thank you for reading, hope you enjoyed. Have a nice day, and until next time!

My other posts can be found here.

• Joona