DEV Community

Anay Nayak

Haskell : Parsing a log message

Week 2 of CIS194 has an interesting problem about parsing log messages. A set of types is provided, and we need to write a parseMessage function that turns a String into a LogMessage.

Provided types:
data MessageType = Info
                 | Warning
                 | Error Int
  deriving (Show, Eq)

type TimeStamp = Int

data LogMessage = LogMessage MessageType TimeStamp String
                | Unknown String
  deriving (Show, Eq)
Sample log file:
I 11 Initiating self-destruct sequence
E 70 3 Way too many pickles
E 65 8 Bad pickle-flange interaction detected
W 5 Flange is due for a check-up
I 7 Out for lunch, back in two time steps
Bad message

Every log line has the same structure, except that Error lines also carry an error code.

"W 5 Flange is due for a check-up" is parsed as a Warning message with timestamp 5 and the rest of the line as the message text.

"E 23 5 Flange is due for a check-up" is parsed as an Error with code 23, timestamp 5 and the rest of the line as the message text.

We parse the line as Unknown if any of the following holds:

  1. The timestamp is not an integer.
  2. The line starts with a symbol other than E, W or I.
  3. An E line is not followed by an integer error code.
  4. The line doesn't match the structure <type> <ts> <msg> (E <code> <ts> <msg> for errors).
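The examples and rules above can be collected into a small table of inputs and expected results, which makes it easy to check each attempt below. This is a sketch: expectations and failures are hypothetical helpers, the provided types are repeated so the block compiles on its own, and the Unknown inputs other than "Bad message" are made up to exercise rules 1 to 3.

```haskell
-- Provided types, repeated so this block compiles on its own.
data MessageType = Info | Warning | Error Int
  deriving (Show, Eq)

type TimeStamp = Int

data LogMessage = LogMessage MessageType TimeStamp String
                | Unknown String
  deriving (Show, Eq)

-- Hypothetical spec table: each input line paired with the value
-- parseMessage should return for it.
expectations :: [(String, LogMessage)]
expectations =
  [ ("I 11 Initiating self-destruct sequence",
       LogMessage Info 11 "Initiating self-destruct sequence")
  , ("E 70 3 Way too many pickles",
       LogMessage (Error 70) 3 "Way too many pickles")
  , ("W 5 Flange is due for a check-up",
       LogMessage Warning 5 "Flange is due for a check-up")
  , ("I x7 Out for lunch", Unknown "I x7 Out for lunch")  -- rule 1: bad timestamp
  , ("X 5 Hello", Unknown "X 5 Hello")                    -- rule 2: not E/W/I
  , ("E oops 3 Pickles", Unknown "E oops 3 Pickles")      -- rule 3: bad error code
  , ("Bad message", Unknown "Bad message")                -- rule 4: wrong structure
  ]

-- Hypothetical helper: the inputs on which a candidate parser disagrees
-- with the table (an empty list means every case passes).
failures :: (String -> LogMessage) -> [String]
failures parse =
  [input | (input, expected) <- expectations, parse input /= expected]
```

For a correct parser, failures parseMessage evaluates to []; any strings it returns point at the lines that need another look.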

Attempt #1

import Text.Read (readMaybe)

parseMessage :: String -> LogMessage
parseMessage msg = case parseCode $ words msg of
        (Just messageType, Just timestamp, rest)  -> LogMessage messageType timestamp rest
        _ -> Unknown msg

-- Split the words of a line into (type, timestamp, message text).
parseCode :: [String] -> (Maybe MessageType, Maybe TimeStamp, String)
parseCode ("E":code:ts:rest) =  (parseError code, toInt ts, unwords rest)
parseCode ("W":ts:rest) = (Just Warning, toInt ts, unwords rest)
parseCode ("I":ts:rest) = (Just Info, toInt ts, unwords rest)
parseCode msg = (Nothing, Nothing, unwords msg)

parseError :: String -> Maybe MessageType
parseError code = Error `fmap` toInt code

-- Nothing unless the whole word reads as an Int.
toInt :: String -> Maybe Int
toInt = readMaybe

Notes:

  1. Parsing the timestamp can fail, so we use Maybe Int to represent a missing or malformed timestamp. The same holds for the error code.
  2. We pattern match on the result of parseCode so that we handle the happy path explicitly and fall back to Unknown for everything else.
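Note 1 hinges on readMaybe (from Text.Read) returning Nothing for anything that isn't a complete Int literal. A quick check of what it accepts; the name examples is illustrative only.

```haskell
import Text.Read (readMaybe)

-- Same helper as in Attempt #1: Just n only if the whole word reads as an Int.
toInt :: String -> Maybe Int
toInt = readMaybe

-- What toInt makes of a few words a log line might contain.
examples :: [Maybe Int]
examples = map toInt ["70", "pickles", "7.5", "12abc", "-3"]
-- == [Just 70, Nothing, Nothing, Nothing, Just (-3)]
```

One consequence worth knowing: a negative number such as -3 reads successfully, so Attempt #1 accepts a line like "I -3 Out for lunch" as an Info message with a negative timestamp.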

Attempt #2

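import Control.Applicative (liftA3)
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

parseMessage :: String -> LogMessage
parseMessage s =
    let (maybeMessageType, s1) = parseType $ words s
        (maybeTs, s2) = parseTs s1
        lm = liftA3 LogMessage maybeMessageType maybeTs (Just $ unwords s2)
    in fromMaybe (Unknown s) lm

-- Consume the type marker (and, for errors, the code) from the front.
parseType :: [String] -> (Maybe MessageType, [String])
parseType ("I":xs) = (Just Info, xs)
parseType ("W":xs) = (Just Warning, xs)
parseType ("E":code:rest) = (Error <$> readMaybe code, rest)
parseType s = (Nothing, s)

-- Consume the timestamp. The [] case keeps input such as "I" from crashing.
parseTs :: [String] -> (Maybe TimeStamp, [String])
parseTs (x:xs) = (readMaybe x, xs)
parseTs [] = (Nothing, [])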

In Attempt #1, parseCode looked at the entire log line at once, so the whole structure was visible in one place. However, this doesn't scale well: every message type needs its own clause spelling out the complete line. Instead, we restructure the code so that each function handles only a prefix of the input words and returns a Maybe result along with the words it didn't consume.

Notes:

  1. parseType and parseTs only handle the parts of the input they understand. parseType returns Nothing together with the original words when it doesn't recognise the type marker; parseTs returns Nothing when the next word isn't a valid timestamp.
  2. parseMessage has to thread the leftover words from each parseXYZ function into the next one.
  3. liftA3 combines the three Maybe values into a Maybe LogMessage. If any of them is Nothing, the result is Nothing and fromMaybe falls back to Unknown s.
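Note 3 in isolation: liftA3 for Maybe produces Just only when all three arguments are Just, and fromMaybe supplies the fallback, which is how Attempt #2 ends up with Unknown s. A minimal sketch that uses tuples instead of LogMessage so it stands alone; the value names are illustrative.

```haskell
import Control.Applicative (liftA3)
import Data.Maybe (fromMaybe)

-- liftA3 applies a 3-argument function inside Maybe:
-- the result is Just only when all three arguments are Just.
allPresent :: Maybe (Char, Int, String)
allPresent = liftA3 (,,) (Just 'W') (Just 5) (Just "check-up")
-- == Just ('W', 5, "check-up")

oneMissing :: Maybe (Char, Int, String)
oneMissing = liftA3 (,,) (Just 'W') Nothing (Just "check-up")
-- == Nothing

-- fromMaybe supplies a default when the combination failed.
withFallback :: (Char, Int, String)
withFallback = fromMaybe ('?', 0, "unparsed") oneMissing
-- == ('?', 0, "unparsed")
```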

Week 10 introduces us to a Parser type:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

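The functions in the previous solution have a very similar shape to a Parser: parseType and parseTs return a possible result together with the unconsumed input. The differences are that runParser returns Nothing for the whole pair on failure, and it consumes a String character by character rather than a list of words.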
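To make the resemblance concrete: a function of Attempt #2's shape, [s] -> (Maybe a, [s]), can be turned into one of Parser's shape, [s] -> Maybe (a, [s]), by dropping the leftover input whenever the result is Nothing. With s = Char that is exactly String -> Maybe (a, String). A sketch; toParserShape is a hypothetical name, and parseTs is copied from Attempt #2 with an added case for empty input.

```haskell
import Text.Read (readMaybe)

-- Attempt #2's shape: a Maybe result plus the unconsumed words.
parseTs :: [String] -> (Maybe Int, [String])
parseTs (x:xs) = (readMaybe x, xs)
parseTs []     = (Nothing, [])

-- Hypothetical adapter to Parser's shape: on success, return the value and
-- the leftover input together; on failure, return nothing at all.
toParserShape :: ([s] -> (Maybe a, [s])) -> [s] -> Maybe (a, [s])
toParserShape f input = case f input of
  (Just a, rest) -> Just (a, rest)
  (Nothing, _)   -> Nothing
```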

The Parser type lets you define parsers such as:

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = Nothing
    f (x:xs)
        | p x       = Just (x, xs)
        | otherwise = Nothing


How do you use it? For example, char c = satisfy (== c) defines a parser that matches exactly the character c. In the same way, we can define parsers for other small units and compose them.
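Here are char and one more small unit written out so they can be run on their own. char follows the definition in the text; posInt is the name Attempt #3 uses for reading a non-negative number, and the body below is a sketch of how such a parser can be written, not necessarily the course's exact code.

```haskell
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- satisfy, as defined above.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = Nothing
    f (x:xs)
      | p x       = Just (x, xs)
      | otherwise = Nothing

-- Exact-match character parser.
char :: Char -> Parser Char
char c = satisfy (== c)

-- Sketch: a leading run of digits read as a non-negative Integer;
-- fails if the input doesn't start with a digit.
posInt :: Parser Integer
posInt = Parser f
  where
    f xs
      | null ns   = Nothing
      | otherwise = Just (read ns, rest)
      where (ns, rest) = span isDigit xs
```

For example, runParser (char 'E') "E 70 3" gives Just ('E', " 70 3"), while runParser (char 'E') "I 11" gives Nothing. Chaining these (as in char 'E' *> char ' ' *> posInt) additionally needs Functor and Applicative instances for Parser.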

Attempt #3

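import Control.Applicative (Alternative (..), liftA3)
import Data.Functor (($>))
import Data.Maybe (fromJust)
-- Assumed: AParser.hs from the week 10 homework, with the Functor,
-- Applicative and Alternative instances for Parser filled in.
import AParser (Parser (..), char, posInt, satisfy)

parseMessage :: String -> LogMessage
parseMessage str = fromJust  $ runLogMessageParser str where
    -- parseUnknown always succeeds, so fromJust is safe here
    runLogMessageParser s = fst <$> runParser logMessage s
    logMessage = parseError <|> parseInfo <|> parseWarn <|> parseUnknown
    parseUnknown = Unknown <$> parseRest
    parseError = liftA3 LogMessage parseECode parseTs parseMsg
    parseInfo = liftA3 LogMessage parseICode parseTs parseMsg
    parseWarn = liftA3 LogMessage parseWCode parseTs parseMsg
    parseECode = Error . fromInteger <$> (char 'E' *> char ' ' *> posInt)
    parseICode = char 'I' $> Info
    parseWCode = char 'W' $> Warning
    parseTs = fromInteger <$> (char ' ' *> posInt)
    -- skip the space before the message text, then take the rest of the line
    parseMsg = char ' ' *> parseRest
    parseRest = many (satisfy (const True))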

Notes:

  1. We use Alternative to make the possibilities explicit. <|> tries each parser on the full input in turn and keeps the first success, so logMessage = parseError <|> parseInfo <|> parseWarn <|> parseUnknown reads as: an error, an info message, a warning, or else an unknown line.
  2. Composing small, well-defined parsers makes the code much more readable. Unlike the previous attempt, we no longer pass the leftover string around explicitly; the Applicative instance for Parser threads it from one parser to the next.

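Some symbols from the previous block:

  (*>)   :: Applicative f => f a -> f b -> f b    -- run both, keep the second result
  ($>)   :: Functor f => f a -> b -> f b          -- keep the structure, replace the result
  (<|>)  :: Alternative f => f a -> f a -> f a    -- the first alternative that succeeds
  liftA3 :: Applicative f => (a -> b -> c -> d) -> f a -> f b -> f c -> f d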
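These operators are easiest to get a feel for on Maybe, which is also a Functor, Applicative and Alternative, with Nothing standing in for a failed parse. A small sketch; the value names are illustrative only.

```haskell
import Control.Applicative ((<|>))
import Data.Functor (($>))

-- (*>): run both, keep the second result.
keepSecond :: Maybe Int
keepSecond = Just 'E' *> Just 70              -- Just 70

-- If the first one fails, so does the whole thing.
firstFails :: Maybe Int
firstFails = Nothing *> Just 70               -- Nothing

-- ($>): keep the success, replace the value with a constant.
replaced :: Maybe String
replaced = Just 'I' $> "Info"                 -- Just "Info"

-- (<|>): the first alternative that succeeds wins.
firstSuccess :: Maybe Int
firstSuccess = Nothing <|> Just 5 <|> Just 7  -- Just 5
```

With Parser the same rules apply, except that (*>) also hands the leftover input of the first parser to the second, and (<|>) retries the second parser on the original input.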

Attempt #3 doesn't show all the supporting code needed to make it run: the Functor, Applicative and Alternative instances for Parser, and helpers such as posInt. See the full solution for those. Concepts like Functor, Applicative and Alternative are what bring clarity to the final solution.
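For readers who want to run Attempt #3 without the course files, here is a minimal self-contained sketch of the missing pieces: Functor, Applicative and Alternative instances for Parser, the small parsers it uses, and Attempt #3's parseMessage on top. The instance bodies and posInt are assumptions about how these pieces can be written, not a copy of the full solution. parseMsg here skips the single space that separates the timestamp from the message text.

```haskell
import Control.Applicative (Alternative (..), liftA3)
import Data.Char (isDigit)
import Data.Functor (($>))
import Data.Maybe (fromJust)

data MessageType = Info | Warning | Error Int
  deriving (Show, Eq)

type TimeStamp = Int

data LogMessage = LogMessage MessageType TimeStamp String
                | Unknown String
  deriving (Show, Eq)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Map over the result, leaving the leftover input untouched.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

-- pure consumes nothing; <*> runs the first parser, then feeds
-- its leftover input to the second.
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

-- empty always fails; <|> tries the second parser on the
-- original input when the first one fails.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p1 <|> Parser p2 = Parser $ \s -> case p1 s of
    Nothing -> p2 s
    result  -> result

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = Nothing
    f (x:xs)
      | p x       = Just (x, xs)
      | otherwise = Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- A leading run of digits as a non-negative Integer.
posInt :: Parser Integer
posInt = Parser f
  where
    f xs
      | null ns   = Nothing
      | otherwise = Just (read ns, rest)
      where (ns, rest) = span isDigit xs

parseMessage :: String -> LogMessage
parseMessage str = fromJust $ fst <$> runParser logMessage str
  where
    -- parseUnknown always succeeds, so fromJust cannot fail.
    logMessage   = parseError <|> parseInfo <|> parseWarn <|> parseUnknown
    parseUnknown = Unknown <$> parseRest
    parseError   = liftA3 LogMessage parseECode parseTs parseMsg
    parseInfo    = liftA3 LogMessage parseICode parseTs parseMsg
    parseWarn    = liftA3 LogMessage parseWCode parseTs parseMsg
    parseECode   = Error . fromInteger <$> (char 'E' *> char ' ' *> posInt)
    parseICode   = char 'I' $> Info
    parseWCode   = char 'W' $> Warning
    parseTs      = fromInteger <$> (char ' ' *> posInt)
    parseMsg     = char ' ' *> parseRest  -- skip the separating space
    parseRest    = many (satisfy (const True))
```

Because <|> restarts from the original input, a line like "I x7 lunch" that gets past char 'I' but fails at the timestamp still falls through to parseUnknown and becomes Unknown "I x7 lunch".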

You can also experiment with the various solutions using repl.it
