DrBearhands

Posted on Nov 12, 2019 • Originally published at drbearhands.com

Haskell for madmen: Reflection and cleanup

#haskell #tutorial

Post from my blog. If you're enjoying the series, please consider buying me a coffee.

It is my personal belief that about 1/5 of the time, preferably the Monday of a full-time week of work, should be spent contemplating the work done so far and doing cleanup (refactoring, issues, etc.). Ignoring the setup chapter, this would be the fifth chapter and thus we shall do some retrospective and refactoring.

What have we done?

We've taken 5 chapters to make a basic server that just fetches data from a database and turns it into a simple HTML string. This sounds like little result for a lot of effort. However, we've also written an infinite number of tests.

In a purely functional, strongly typed language, the types correspond to proofs (we call this the Curry-Howard correspondence). A few examples:

myNumber :: Int
myNumber = 7

The type, myNumber :: Int, is saying "we have a proof that an Int exists, and that proof is called myNumber". We later give this proof in the term myNumber = 7. This proof doesn't seem really useful, and not something that would replace any tests for sure. So let's go further:

sum :: [Int] -> Int
sum = foldr (+) 0

This time sum :: [Int] -> Int tells us there is a proof called sum that can be used to get an Int from a list of Ints. We don't have to test that it returns an Int, we have proof, which is better! If we had to test this behavior, without types or inspecting the body / term of the function, we would theoretically have to test with every possible list of Ints, since one of them might yield a value of a different type. Far-fetched in this example, yes, but it becomes more relevant as functions become more complex.

We would also have to test how it works under incorrect input, since it might fail silently, crash the program or do something else unexpected.

mySumOfStrings = sum ["Foo", "Bar"] -- Bad! Won't compile

The above "proof" is invalid and won't compile, much less run. There is no need to write any tests for such cases.

There's another cool thing you can do with types using Abstract Data Types.

When writing a module, you can simply choose not to export a type's constructors. That way, any value of that type must come from functions you have exposed. We've seen this trick with the ResponseReceived type, where it ensures we don't accidentally forget to respond. We could use it in our tutorial project too, for instance by hiding the constructors for Task, so that all tasks must be retrieved from the database. Alternatively, we could define a SortedList which is implemented like a regular list, but with a hidden constructor, so we can only get one using sorting functions, and cannot use functions that would break sort-order, like map.

Basically, we can use Abstract Data Types for better encapsulation.

Because types put restrictions to what can go in and come out of a function, they fit together like Lego bricks. This makes working on projects that have grown beyond the mundane much more enjoyable.

Cleaning up our work so far

An often-cited advantage of purely functional programming is the ease of refactoring code. This makes sense if you realize function terms are proofs. A proof is valid wherever and whenever we use it! There is no context (global state) to our proofs that might change when refactoring.

Down to business!

Our code is a mess. We've put everything in the Lib module, are only exposing someFunc and we've changed none of the default names...

First, let's rename the Lib module. We'll need to change the file name, the module name declaration and any imports (in app/Main.hs in our case). We'll call it ServerMain. If we miss anything, compilation will fail, so the compiler has our back again. Let's also change someFunc to startServer.

app/Main.hs

module Main where

import ServerMain (startServer)

main :: IO ()
main = startServer

src/ServerMain.hs

{-# LANGUAGE OverloadedStrings #-}

module ServerMain
    ( startServer
    ) where

-- [Some import statements]

startServer :: IO ()
startServer = do
  run 8080 requestHandler

-- [rest of the module]

Do a stack build to check if you missed anything.

Second, we'll want to declare the HTML typeclass and any functions related to it in their own module, that way they can be used by other modules than ServerMain. Create a new file src/HTML.hs:

src/HTML.hs

module HTML where

class HTML a where
  toHTML :: a -> String

instance HTML a => HTML [a] where
  toHTML listOfElements =
    let
      elementToListElement :: HTML a => a -> String
      elementToListElement element = "<li>" ++ toHTML element ++ "</li>"
    in
      "<ul>" ++ (concat $ map elementToListElement listOfElements) ++ "</ul>"


toHTMLPage :: HTML a => a -> String
toHTMLPage a =
  "<!DOCTYPE html>\
  \<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"\" xml:lang=\"\">\
  \<head>\
  \  <meta charset=\"utf-8\" />\
  \  <title>TODO list </title>\
  \</head>\
  \<body>\
  \ " ++ toHTML a ++
  "</body>"

Notice that we didn't specify any exports, that means everything is exported. This is fine for certain kind of modules, like this one which only has utility functions.

We also didn't put the instance declaration for Task in there. We couldn't, because we don't know about Task in this module. Thanks to the open world assumption, we can declare Task to be an instance of the HTML class elsewhere.

We import the HTML module in ServerMain:

import HTML (HTML, toHTML, toHTMLPage)

And stack build to ensure we didn't forget anything.

Third, we should probably move the database operations to its own module. We'll also implement the suggestion from the last section; hiding the constructors of the Task types.

src/TaskDB.hs

{-# LANGUAGE OverloadedStrings #-}

module TaskDB
  ( TaskStatus(Done, NotDone)
  , Task
  , fetchFromDB
  ) where

import HTML (HTML, toHTML)

import qualified Hasql.Connection
import qualified Hasql.Session
import qualified Hasql.Statement
import qualified Hasql.Encoders
import qualified Hasql.Decoders
import qualified Data.Text
import qualified Data.ByteString.UTF8 as StrictUTF8 (fromString, toString)


data TaskStatus = Done | NotDone

data Task = Task String TaskStatus

instance HTML Task where
  toHTML task =
    case task of
      Task description status ->
        case status of
          NotDone -> "<p>" ++ description ++ "</p>"
          Done -> "<p><strike>" ++ description ++ "</strike></p>"


fetchFromDB :: IO [Task]
fetchFromDB =
  let
    connectionSettings :: Hasql.Connection.Settings
    connectionSettings =
      Hasql.Connection.settings
        "localhost"
        (fromInteger 5432)
        "Haskell-student"
        "Why-are-you-putting-credentials-in-code?-You-absolute-potato!"
        "todolists"
  in do
    connectionResult <- Hasql.Connection.acquire connectionSettings
    case connectionResult of
      Left (Just errMsg) -> error $ StrictUTF8.toString errMsg
      Left Nothing -> error "Unspecified connection error"
      Right connection -> do
        queryResult <- Hasql.Session.run selectTasksSession connection
        Hasql.Connection.release connection
        case queryResult of
          Right result -> return result
          Left err -> error $ show err

selectTasksSession :: Hasql.Session.Session [Task]
selectTasksSession = Hasql.Session.statement () selectTasksStatement

selectTasksStatement :: Hasql.Statement.Statement () [Task]
selectTasksStatement =
  Hasql.Statement.Statement
    "SELECT * FROM todolist_ch5"
    Hasql.Encoders.unit
    tasksDecoder
    True

tasksDecoder :: Hasql.Decoders.Result [Task]
tasksDecoder = Hasql.Decoders.rowList taskDecoder

boolToTaskStatus :: Bool -> TaskStatus
boolToTaskStatus True = Done
boolToTaskStatus False = NotDone

taskStatusDecoder :: Hasql.Decoders.Value TaskStatus
taskStatusDecoder = fmap boolToTaskStatus Hasql.Decoders.bool

stringDecoder :: Hasql.Decoders.Value String
stringDecoder = fmap Data.Text.unpack Hasql.Decoders.text

stringDecoder_row :: Hasql.Decoders.Row String
stringDecoder_row = Hasql.Decoders.column stringDecoder

taskStatusDecoder_row :: Hasql.Decoders.Row TaskStatus
taskStatusDecoder_row = Hasql.Decoders.column taskStatusDecoder

taskDecoder :: Hasql.Decoders.Row Task
taskDecoder = Task <$> stringDecoder_row <*> taskStatusDecoder_row

The module declaration is the only new thing here:

module TaskDB
  ( TaskStatus(Done, NotDone)
  , Task
  , fetchFromDB
  ) where

This specifies exactly what is exported: TaskStates with its constructors Done and NotDone, Task with no constructors, and fetchFromDB, the only way with which we can get a Task outside our module at all. Instance declarations of an exported type are exported automatically.

Optionally, we can rewrite taskDecoder, taskStatusDecoder_row and stringDecoder_row using do-notation for applicatives:

taskDecoder :: Hasql.Decoders.Row Task
taskDecoder = do
  taskDescription <- Hasql.Decoders.column stringDecoder
  taskStatus <- Hasql.Decoders.column taskStatusDecoder
  return $ Task taskDescription taskStatus

This is functionally the same as our previous code, it's up to you to decide which notation you prefer.

This leaves our ServerMain module short and sweet. But there's one more thing I'd like to change, I don't like having that naked run there, it's not descriptive enough. So we modify the Warp import declaration a bit:

src/ServerMain.hs

{-# LANGUAGE OverloadedStrings #-}

module ServerMain
    ( startServer
    ) where

import HTML (HTML, toHTML, toHTMLPage)
import TaskDB (Task)
import qualified TaskDB

import qualified Network.Wai.Handler.Warp as Warp (run)
import Network.Wai (Request, Response, ResponseReceived, responseLBS)
import Network.HTTP.Types.Status (status200)
import qualified Data.ByteString.Lazy.UTF8 as UTF8 (fromString)

startServer :: IO ()
startServer = do
  Warp.run 8080 requestHandler

requestHandler :: Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
requestHandler request respond =
  let
    htmlPage htmlAble = UTF8.fromString $ toHTMLPage htmlAble
    response tasks = responseLBS status200 [] $ htmlPage tasks
  in
    do
      taskList <- TaskDB.fetchFromDB
      respond $ response taskList

There's one more thing to do. Haskell has certain native typeclasses for which it can generate an instance automatically. Specifically, these are:

Eq defines equality of values (implementing ==)
Ord defines an order of values through (implementing <=)
Show turns a value to a string based on its constructors
Read, the inverse of Show
Enum, lets the type act as an enumeration of potential values
Bounded, which quite frankly I've never used
Functor, which derives the fmap :: (a -> b) -> f a -> f b function, provided you have the right language extension. Only works for types with kind * -> *.

You generally wouldn't want to write instances for these classes by hand, except for some niche cases, since there are obvious implementations for each. We can use the deriving functionality in Haskell:

data TaskStatus = Done | NotDone
  deriving (Eq, Ord, Show, Read, Enum)

data Task = Task String TaskStatus
  deriving (Eq, Ord, Show)

Note that we do not derive Read for Task, as we didn't want to export any way to create a Task beyond retrieving one from the database.

Deriving these instances allows us to compare values, print them and read them. I derive Eq, Ord and Show for almost all my types, the other typeclasses I almost never derive.

There is more cleanup we could do, like reading port number, DB connection settings, etc. from environment settings or the command line, but those are rather environment specific, will complicate local testing a bit, and there's some discussion to be had about it, so we won't do that now.

Limits of type-based proofs

We've seen how types can be used to build proofs and how that makes refactoring easy. While this is a hell of a lot better than what we can do in most other languages, there are limits to the proofs we can build in Haskell that you should be aware of.

Obviously, although we can prove that sum :: [Int] -> Int returns an Int, we can't prove that the result will actually be the sum of the input list. We could have defined sum as sum _ = 7! Dependently-typed languages can relate proofs to actual values, but even in that case, we would have to define what a sum is to verify that the result of sum is an actual sum. At some point we need axioms. This is a fundamental limitation of mathematics.

Then there's the IO monad. Once we allow effects in functions, proofs become a lot weaker, since we don't know what might be happening with context. If you have a lot of communication and little computation, Haskell tends to degrade to the level of type-safety offered by strongly-typed imperative languages.

Finally, and this is a missed opportunity IMHO, Haskell proofs have terms and conditions attached to them, stemming from Haskell being in the Hask category rather than in Set. In Haskell, a -> b does not mean that we will definitely get an inhabitant of b from an inhabitant of a, as we would in the Set category / total functional programming. Haskell's functions are partial, meaning they may not produce a b for every a.

In part, this is by necessity. Because of the halting problem. We cannot decide for every Turing complete program if it terminates (having a function that can do this would lead to a contradiction). That means a function f :: a -> b, when given an a, might just never terminate. In that case, we say the function returns a bottom, denoted as ⊥. Let's take a look at how this would look, first we make a valid proof:

slowMultiplication :: Int -> Int -> Int
slowMultiplication n 0 = 0
slowMultiplication n m = n + slowMultiplication n (m-1)

This kind of recursion is equivalent to a proof by induction. We prove something for the base cases (slowMultiplication x 0 = 0), then prove it for the general case (slowMultiplication n m) given that it's true for the predecessor (slowMultiplication n (m-1)). The same things that make proofs by induction invalid can also cause non-termination in recursion: no base case and not using the right predecessor.

slowMultiplication_bad1 :: Int -> Int -> Int
slowMultiplication_bad1 n m = n + slowMultiplication_bad1 n (m-1)

slowMultiplication_bad2 :: Int -> Int -> Int
slowMultiplication_bad2 n 0 = 0
slowMultiplication_bad2 n m = n + n + slowMultiplication_bad2 n (m-2)

On the other hand, this is also a useful feature under lazy evaluation to make infinite structures:

infiniteFooBar :: [String]
infiniteFooBar = "Foo" : infiniteBarFoo

infiniteBarFoo :: [String]
infiniteBarFoo = "Bar" : infiniteFooBar

Some people think it's better to be total than Turing-complete. Personally I can accept either, I've not had non-termination bite me in the ass often enough to care strongly, but it has happened.

Note: the halting problem

Given a function terminates :: Program -> Bool that returns whether a program terminates, we can construct a program that passes its own code to terminates, and purposefully exhibits the opposite behavior, entering an infinite loop if it is predicted to terminate. The potential existence of such a problematic program proves the terminate function cannot be total.

But there are other exceptions to totality, Haskell allows non-exhaustive patterns and run-time errors. Here's an example stolen from an answer on stackoverflow:

-- Valid Haskell that will compile, but pattern is non-exhaustive
fac :: Int -> Int
fac 0 = 1
fac n | n > 0 = n * fac (n-1)

-- Valid Haskell with exhaustive pattern, but can throw a custom run-time error
fac2 :: Int -> Int
fac2 0 = 1
fac2 n | n > 0 = n * fac2 (n-1)
fac2 n | otherwise = error "Cannot get factorial of negative number"

This is a consequence of being liberal with patterns. We've only used constructors so far, but as you can see Haskell also allows guards (such as | n > 0). Again, a practical trade-off that you may or may not agree with. I don't like this one, preferring to instead use more restricted types to clearly define e.g. that we require a natural (unsigned integer) rather than an integer.

Conclusion

We've seen how a strongly-types purely-functional programming language like Haskell use proof to let you write, refactor and expand software with confidence. We've also explored the limits of those proofs in Haskell. If you're interested in stronger proofs, look at Elm, which uses purity and exhaustive patterns to avoid run-time exceptions, or dependently typed languages like Agda or Idris. Do expect them to be somewhat less practical than Haskell though.

Final code for this chapter.