Week 2 of CIS194 has an interesting problem that deals with parsing log messages. A set of types is provided, and we need to write a parseMessage function that takes a String and returns a LogMessage.
Provided types:
data MessageType = Info
                 | Warning
                 | Error Int
  deriving (Show, Eq)

type TimeStamp = Int

data LogMessage = LogMessage MessageType TimeStamp String
                | Unknown String
  deriving (Show, Eq)
Sample log file:
I 11 Initiating self-destruct sequence
E 70 3 Way too many pickles
E 65 8 Bad pickle-flange interaction detected
W 5 Flange is due for a check-up
I 7 Out for lunch, back in two time steps
Bad message
The log structure is largely similar for all log lines except for Error where we also have an error code.
W 5 Flange is due for a check-up
is parsed as a Warning message with timestamp=5 and the rest as the message.
E 23 5 Flange is due for a check-up
is parsed as an Error with code 23, timestamp=5 and the rest as the message.
We parse the message as Unknown if any of the following hold true:
- The timestamp is not an integer.
- The message starts with a symbol other than E/W/I.
- An E message is not followed by an integer error code.
- The message structure doesn't match <type> <ts> <msg>.
Attempt #1
import Text.Read (readMaybe)

parseMessage :: String -> LogMessage
parseMessage msg = case parseCode $ words msg of
  (Just messageType, Just timestamp, rest) -> LogMessage messageType timestamp rest
  _                                        -> Unknown msg

-- Split the words into (message type, timestamp, remaining text).
parseCode :: [String] -> (Maybe MessageType, Maybe TimeStamp, String)
parseCode ("E":code:ts:rest) = (parseError code, toInt ts, unwords rest)
parseCode ("W":ts:rest)      = (Just Warning, toInt ts, unwords rest)
parseCode ("I":ts:rest)      = (Just Info, toInt ts, unwords rest)
parseCode msg                = (Nothing, Nothing, unwords msg)

parseError :: String -> Maybe MessageType
parseError code = Error `fmap` toInt code

toInt :: String -> Maybe Int
toInt = readMaybe
Notes:
- Parsing the timestamp can fail, so we use a Maybe Int to handle the absence of a meaningful timestamp. The same holds for the error code.
- We pattern match on the parseCode result so that we can handle the happy path and fall back to Unknown for everything else.
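To make the first note concrete, here is a minimal sketch of how readMaybe behaves on good and bad input, and how fmap carries that result into an Error. It repeats toInt and parseError from the attempt above; the sample strings are made up for illustration.

```haskell
import Text.Read (readMaybe)

data MessageType = Info | Warning | Error Int
  deriving (Show, Eq)

-- readMaybe returns Nothing instead of throwing when the parse fails.
toInt :: String -> Maybe Int
toInt = readMaybe

-- fmap only applies Error when there is an Int to wrap.
parseError :: String -> Maybe MessageType
parseError code = Error <$> toInt code

-- In GHCi:
--   toInt "70"        == Just 70
--   toInt "7x"        == Nothing
--   parseError "70"   == Just (Error 70)
--   parseError "abc"  == Nothing
```

Because both the message type and the timestamp arrive as Maybe values, a single case expression in parseMessage can separate the fully parsed lines from everything else.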
Attempt #2
import Control.Applicative (liftA3)
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

parseMessage :: String -> LogMessage
parseMessage s =
  let (maybeMessageType, s1) = parseType $ words s
      (maybeTs, s2)          = parseTs s1
      lm = liftA3 LogMessage maybeMessageType maybeTs (Just $ unwords s2)
  in fromMaybe (Unknown s) lm

parseType :: [String] -> (Maybe MessageType, [String])
parseType ("I":xs)        = (Just Info, xs)
parseType ("W":xs)        = (Just Warning, xs)
parseType ("E":code:rest) = (Error <$> readMaybe code, rest)
parseType s               = (Nothing, s)

parseTs :: [String] -> (Maybe TimeStamp, [String])
parseTs (x:xs) = (readMaybe x, xs)
parseTs []     = (Nothing, [])  -- no words left, so there is no timestamp
In Attempt #1, we looked at the entire log message within the parseCode function so that the structure was visible. However, this doesn't scale well, because every new message shape needs another pattern covering the whole line. Instead, we change the structure so that each function handles a part of the input and returns a Maybe along with the words that weren't consumed.
Notes:
- parseType and parseTs are only responsible for the bits that they understand. If one of them can't process its input, it returns Nothing as the first component of the tuple.
- The parseMessage function needs to pass the leftover state through to the subsequent parseXYZ function.
- We combine all the Just values into a LogMessage by using liftA3.
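The last note can be illustrated in isolation. The sketch below applies liftA3 to hand-written Maybe values (the names allPresent and badTimestamp are only for illustration) to show that the result is a Just only when all three inputs are.

```haskell
import Control.Applicative (liftA3)

data MessageType = Info | Warning | Error Int
  deriving (Show, Eq)

type TimeStamp = Int

data LogMessage = LogMessage MessageType TimeStamp String
                | Unknown String
  deriving (Show, Eq)

-- All three pieces are present, so the constructor is applied.
allPresent :: Maybe LogMessage
allPresent =
  liftA3 LogMessage (Just Warning) (Just 5) (Just "Flange is due for a check-up")
-- == Just (LogMessage Warning 5 "Flange is due for a check-up")

-- One piece is missing, so the whole result is Nothing.
badTimestamp :: Maybe LogMessage
badTimestamp =
  liftA3 LogMessage (Just Info) Nothing (Just "Out for lunch")
-- == Nothing
```

This is why parseMessage can finish with fromMaybe (Unknown s): any failed piece collapses the combined value to Nothing.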
Week 10 introduces us to a Parser type:
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
The function signatures in the previous solution look very similar to a Parser: each takes some input and returns a Maybe result together with the input that is left over.
The Parser type lets you define parsers such as:
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = Nothing
    f (x:xs)
      | p x       = Just (x, xs)
      | otherwise = Nothing
How is it used? For example, char c = satisfy (== c) defines a parser that matches exactly the character c. We can define parsers for other small units in the same way and compose them.
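As a small sketch of what running such a parser looks like, the block below repeats the Parser type, satisfy and char from above and applies them with runParser. The input strings are made up for illustration.

```haskell
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = Nothing
    f (x:xs)
      | p x       = Just (x, xs)
      | otherwise = Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- In GHCi, each call returns the parsed value and the unconsumed input:
--   runParser (char 'E') "E 70 3 pickles" == Just ('E', " 70 3 pickles")
--   runParser (char 'E') "W 5 Flange"     == Nothing
--   runParser (satisfy isDigit) "70 3"    == Just ('7', "0 3")
```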
Attempt #3
import Control.Applicative (liftA3, many, (<|>))
import Data.Functor (($>))
import Data.Maybe (fromJust)

-- Parser, satisfy, char and posInt are assumed to be in scope.
parseMessage :: String -> LogMessage
parseMessage str = fromJust $ runLogMessageParser str  -- parseUnknown always succeeds
  where
    runLogMessageParser s = fst <$> runParser logMessage s
    logMessage = parseError <|> parseInfo <|> parseWarn <|> parseUnknown
    parseUnknown = fmap Unknown parseMsg
    parseError = liftA3 LogMessage parseECode parseTs parseMsg
    parseInfo = liftA3 LogMessage parseICode parseTs parseMsg
    parseWarn = liftA3 LogMessage parseWCode parseTs parseMsg
    parseECode = Error . fromInteger <$> (char 'E' *> char ' ' *> posInt)
    parseICode = char 'I' $> Info
    parseWCode = char 'W' $> Warning
    parseTs = fromInteger <$> (char ' ' *> posInt)
    parseMsg = many (satisfy (const True))  -- consume the rest of the input
Notes:
- We use Alternative to make the various possibilities clear. logMessage = parseError <|> parseInfo <|> parseWarn <|> parseUnknown tells us that the parsers are tried in order and the first one to succeed determines the result.
- We can compose small, well-defined parsers, so the code is much more readable and expresses the intent clearly. Unlike the previous attempt, we no longer have to pass the leftover string explicitly.
Some symbols from the previous block:
(*>) :: Applicative f => f a -> f b -> f b  -- run both, keep the right result
($>) :: Functor f => f a -> b -> f b        -- run it, replace the result
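Both operators work for any Functor or Applicative, so their behaviour can be tried on plain Maybe values before using them with Parser. A minimal sketch (the names keepRight, failsLeft and replaced are only for illustration):

```haskell
import Data.Functor (($>))

-- Both sides succeed: the left result is discarded, the right one kept.
keepRight :: Maybe Int
keepRight = Just 'E' *> Just 70
-- == Just 70

-- The left side fails, so the whole expression fails.
failsLeft :: Maybe Int
failsLeft = (Nothing :: Maybe Char) *> Just 70
-- == Nothing

-- The structure is kept, but the value inside is replaced.
replaced :: Maybe String
replaced = Just 'I' $> "Info"
-- == Just "Info"
```

With Parser in place of Maybe, char 'E' *> char ' ' *> posInt reads as "match E, match a space, and keep only the number".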
Attempt #3 doesn't show all the underlying constructs required to get the solution working. See the full solution for all that. Concepts like Functor, Applicative and Alternative are what bring clarity to the final solution.
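For readers who want a rough idea of those constructs, here is one possible sketch of the instances for the Parser type shown earlier, together with a posInt parser for non-negative integers. This is an assumption about how they could be written, not necessarily how the full solution or the course material defines them.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Transform the parsed value, leave the leftover input alone.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

-- Run the first parser, then run the second on what is left.
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

-- Try the first parser; fall back to the second on the same input.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    ok      -> ok

-- Parse one or more leading digits as an Integer.
posInt :: Parser Integer
posInt = Parser f
  where
    f xs
      | null ns   = Nothing
      | otherwise = Just (read ns, rest)
      where (ns, rest) = span isDigit xs

-- In GHCi:
--   runParser posInt "70 3 pickles"     == Just (70, " 3 pickles")
--   runParser ((+ 1) <$> posInt) "41"   == Just (42, "")
--   runParser (posInt <|> pure 0) "abc" == Just (0, "abc")
```

The Applicative instance is what threads the leftover string from one parser to the next, which is the bookkeeping parseMessage did by hand in Attempt #2.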
You can also experiment with the various solutions using repl.it