Haskell for madmen (5 Part Series)
In this chapter we're going to write a hello world program. Doing so in Haskell requires monads, a concept from category theory. You might think this is overly complicated for something as simple as printing some output, but you'd be wrong: printing output is not simple at all. Just think about buffers, encoding, concurrency... If you judge a language by how easily you can write hello world, you're going to pick a language that hides important "details".
We're first going to look at pure functions, i.e. functions without any side effects, and their types. Purity is the default in Haskell, but it does not allow writing output, which is a (side) effect.
We will then cover the IO monad, which is how Haskell handles necessary effects. This will finally allow us to write hello world.
This chapter is probably the hardest part of this tutorial, and it is what makes the tutorial "for madmen".
In essence, every function in Haskell has exactly 1 input and 1 output. We can still use "multiple" inputs by making that output itself a function. Let's look at how this works with addition:
(+) 3 7
Writing two values separated by a space is function application: f a applies argument a to function f.
Function application in Haskell is left-associative, so the code above is equivalent to:
((+) 3) 7
Here, we have a function (+), to which we apply 3. The result is a new function that takes a number and adds 3 to it. To this new function we then apply 7.
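To make the "one input, one output" idea concrete, here is a small sketch that spells out addition as nested one-argument functions using explicit lambdas. The names addCurried and addThree are hypothetical, introduced only for this illustration; it assumes nothing beyond the Prelude.

```haskell
-- addCurried takes ONE Int and returns a new function that takes ONE more Int.
addCurried :: Int -> (Int -> Int)
addCurried = \x -> (\y -> x + y)

-- Applying a single argument yields the intermediate "add 3" function.
addThree :: Int -> Int
addThree = addCurried 3

main :: IO ()
main = do
  print (addCurried 3 7)  -- left-associative: (addCurried 3) 7
  print (addThree 7)      -- same result via the intermediate function
  print ((+) 3 7)         -- the built-in (+) behaves the same way
```

All three lines print 10: writing addCurried 3 7 is just sugar for feeding the arguments in one at a time.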
Now it's time to talk about types. As mentioned, a function has only one input and one output, and we write its type using an arrow. A function that takes some type a as input and outputs some b has type a -> b. Consequently, a function that takes an a and produces a second function with type b -> c has type a -> (b -> c). For our function (+), specialised to Int, that means the type must be:
(+) :: Int -> (Int -> Int)
(Note: in Haskell, :: indicates a type declaration. The real type of (+) in the Prelude is more general, Num a => a -> a -> a, but Int keeps things simple here.)
Since arrows in type declarations are right-associative, the parentheses are superfluous in this case, and we can also write:
(+) :: Int -> Int -> Int
Because we know functions cannot have side effects, the type declaration tells us pretty much exactly what the function is going to do: it will compute an Int given two other Ints. Nothing else will happen, and it won't matter when we evaluate this function.
We don't have to apply all arguments at once. The following is perfectly valid:
add3 :: Int -> Int
add3 = (+) 3

ten :: Int
ten = add3 7
The (+) function is a bit special. A function whose name is made of symbols is an infix operator; wrapping it in parentheses, as in (+), lets us use it as an ordinary prefix function. We can therefore also write a much more natural-looking sum:
ten :: Int
ten = 3 + 7
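A sketch of the infix/prefix rule in both directions, using a hypothetical operator (.+.) defined only for this example. It also shows an operator section, (3 +), which is standard Haskell shorthand for partially applying an infix operator.

```haskell
-- A hypothetical operator: a name made of symbols is infix by default.
(.+.) :: Int -> Int -> Int
a .+. b = a + b

-- Infix use, exactly like 3 + 7.
tenInfix :: Int
tenInfix = 3 .+. 7

-- Wrapped in parentheses, the same operator becomes an ordinary prefix function.
tenPrefix :: Int
tenPrefix = (.+.) 3 7

-- An operator section: (3 +) means the same as (+) 3, i.e. add3 from above.
addThreeSection :: Int -> Int
addThreeSection = (3 +)

main :: IO ()
main = print (tenInfix, tenPrefix, addThreeSection 7)
```

Running it prints (10,10,10).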
Finally, a note about lazy evaluation. In the above examples, ten does not have the value 10; rather, it is an expression that will result in the value 10. Thanks to referential transparency, the compiler can decide not to compute the value of ten right away, but just pass (a reference to) its body (3 + 7) and postpone evaluation until it's actually needed. Not only does this let us avoid unnecessary computations, it also lets us use infinite data structures. An example:
infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s
(Here, : is the "prepend" operator, with type a -> [a] -> [a]; e.g. 1 : [2,3] evaluates to [1,2,3].)
This list is infinite! Because Haskell is non-strict, this is fine: thanks to lazy evaluation, only the part of the list we actually need is ever computed. So as long as we don't try to read the whole list, the program won't enter an infinite loop.
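To see that only the needed part is computed, here is a small sketch using take from the Prelude. The names firstFive and lazyPair are hypothetical; the second one shows that an argument that is never used is never evaluated either.

```haskell
infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s

-- take only forces the first n cells of the list, so this terminates.
firstFive :: [Int]
firstFive = take 5 infiniteListOf1s

-- 'undefined' would crash the program if it were ever forced,
-- but fst never looks at the second component.
lazyPair :: Int
lazyPair = fst (42, undefined)

main :: IO ()
main = do
  print firstFive  -- [1,1,1,1,1]
  print lazyPair   -- 42
  -- print (length infiniteListOf1s) would never finish
```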
Let us now, armed with the above knowledge, look at the project generated by the command … . In app/Main.hs we find the following:
module Main where

import Lib

main :: IO ()
main = someFunc
This should confuse you. So let's look at the individual parts.
module Main where declares the name of the module. As you probably expect, "Main" is a special name: it is the module that holds the program's entry point, main.
import Lib imports the Lib module and adds all the symbols Lib exports to the current namespace. This includes someFunc, but we can't tell that from the import statement; we will change this line to something more informative in a bit.
main :: IO () declares that the value called main has type IO (). This should quite rightly confuse you about now. Why isn't this a function? What are these mysterious parentheses? What is an IO?
Let's start with the first question: why isn't main a function? As mentioned, functions compute one value from another, but this isn't really the nature of computer programs. We want a program that does stuff, whereas a function is just an inert formula.
So main should rather be a list of actions / effects, such as writing to stdout or listening for HTTP requests.
Then what is ()? This is a special type called unit. The types we have seen so far have one or more inhabitants: a boolean has the inhabitants True and False, and an unsigned 8-bit integer is inhabited by the numbers 0 through 255 (2^8 - 1). Unit has only one inhabitant, also called unit. This entails that a value of type unit carries no information. A pure function that outputs a unit would be pointless, because we could simply substitute the answer without ever evaluating the function. However, the unit type acts a lot like the number 1 and has many uses when combined with other types. The one we see here is one of them. Both the type and the inhabitant of unit are written as () in Haskell.
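A quick way to check these inhabitant counts is to enumerate finite types through their Bounded and Enum instances, which Bool, () and Word8 (Haskell's unsigned 8-bit integer, from Data.Word) all provide. The names allBools, allUnits and word8Count are hypothetical.

```haskell
import Data.Word (Word8)

-- Every inhabitant of Bool, from minBound to maxBound.
allBools :: [Bool]
allBools = [minBound .. maxBound]

-- Unit has exactly one inhabitant.
allUnits :: [()]
allUnits = [minBound .. maxBound]

-- Word8 covers 0 through 255.
word8Count :: Int
word8Count = length [minBound .. maxBound :: Word8]

main :: IO ()
main = do
  print allBools    -- [False,True]
  print allUnits    -- [()]
  print word8Count  -- 256
```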
Now for IO. If you're particularly observant, you may have noticed that we seem to have applied it just like a function, but inside the type declaration. Just as values have types, types have kinds. The kind of Int, Char, etc. is *; the kind of IO is * -> *. Just as with function application, the kind of IO (), that is () applied to IO, is therefore *. As for the "meaning" of IO: it turns a type that is computed with pure functions into one that is computed using effects. We will see how to use impure functions in a bit.
Putting all that together, main, by its type IO (), is a unit computed using side effects. Since () contains no information, IO () is just a series of effects / instructions "without" output. That is pretty much what we generally look for in a program!
After all that about the type of main, we turn to the next line, the term of main: main = someFunc. This is rather disappointing, as it merely references a single other value.
someFunc is exported by the Lib module. We pretty much have to guess that, because the current import notation doesn't make it explicit. This becomes a problem once you have more modules, so let's get ahead of ourselves and change that import to the following:
import Lib (someFunc)
This brings only someFunc from Lib into scope, and tells us exactly where symbols are coming from. We can also make the import qualified, which forces us to mention the module name explicitly:
import qualified Lib

main = Lib.someFunc
Plus, it's totally valid to do both:
import qualified Lib
import Lib (someFunc)
This exposes someFunc unqualified, but still lets you access Lib's other symbols through the explicit Lib. notation from above.
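Lib and someFunc only exist in the generated project, so as a sketch here are the same import forms applied to the standard Data.Char module. The "as C" alias is an additional, standard feature of qualified imports; shout and isAllCaps are hypothetical names.

```haskell
import Data.Char (toUpper)      -- explicit list: only toUpper is in scope unqualified
import qualified Data.Char as C -- qualified import with a short alias

-- Uses the unqualified name from the explicit import list.
shout :: String -> String
shout = map toUpper

-- Reaches another Data.Char symbol through the qualified alias.
isAllCaps :: String -> Bool
isAllCaps = not . any C.isLower

main :: IO ()
main = do
  putStrLn (shout "hello")          -- HELLO
  print (isAllCaps (shout "hello")) -- True
```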
Now it's time to look at what is happening in someFunc. But where can we find the Lib module that defines it? If you haven't changed anything, it should be src/Lib.hs (module names must match relative paths). You can change which directories are part of your project in … .
The file should look like this:
module Lib
    ( someFunc
    ) where

someFunc :: IO ()
someFunc = putStrLn "someFunc"
We've seen the module declaration before. This one also specifies which symbols to export: someFunc in this case.
someFunc is defined as the function putStrLn applied to the string "someFunc". putStrLn is part of the standard Prelude, which is implicitly imported in every module. Its type is String -> IO () and, as you might expect, it writes a string, followed by a newline, to stdout (without explicitly flushing buffers).
But how can we have multiple effects? Is there some function with type IO () -> IO () -> IO () that combines the effects of both arguments, and that we have to tediously add everywhere? Well, such a function exists, but there are better ways to go about it.
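The Prelude operator (>>) is one such function: its general type is Monad m => m a -> m b -> m b, which, specialised to IO (), is exactly IO () -> IO () -> IO (). A sketch; andThen, twoLines and threeLines are hypothetical names, and mapM_ (also in the Prelude) is shown as one way to avoid chaining (>>) by hand.

```haskell
-- (>>) runs its first action, discards that action's result,
-- then runs its second action.
andThen :: IO () -> IO () -> IO ()
andThen = (>>)

twoLines :: IO ()
twoLines = andThen (putStrLn "first") (putStrLn "second")

-- mapM_ runs one action per list element, in order.
threeLines :: IO ()
threeLines = mapM_ putStrLn ["one", "two", "three"]

main :: IO ()
main = twoLines >> threeLines
```

Running main prints first, second, one, two and three on separate lines.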
As we've seen, IO has kind * -> *. When a type of kind * -> * follows certain rules, we say that it is a monad. In particular, for any monad m, the following functions must exist:
return :: a -> m a
fmap   :: (a -> b) -> m a -> m b
(>>=)  :: m a -> (a -> m b) -> m b
IO is a monad. Substituting IO for m, we know that we have at least the following functions:
return :: a -> IO a
fmap   :: (a -> b) -> IO a -> IO b
(>>=)  :: IO a -> (a -> IO b) -> IO b
return simply presents a pure value as if it were the result of an impure computation, without performing any effects. Similarly, fmap turns a function over pure values into one over impure values. If, for instance, we read some input string, that string has type IO String, but our pure functions only work on String. fmap remedies that problem by lifting functions into IO.
(>>=) is a bit more complicated. You can think of it a bit like unix pipes. It takes some impurely computed value, and feeds that value to the next impure computation.
Let's look at a few examples; take the time to let this sink in:
computeHelloWorld :: IO String
computeHelloWorld = return "Hello, World!"

computeHelloWorldLength :: IO Int
computeHelloWorldLength = fmap length computeHelloWorld

greetTheWorld :: IO ()
greetTheWorld = computeHelloWorld >>= putStrLn
(putStrLn is the output-writing function with type String -> IO ().)
For a slightly more useful example we'll use getLine from the Prelude. It has type IO String and reads a line from stdin. We can now do:
nameToGreeting :: String -> String
nameToGreeting name = "Hello " ++ name ++ ", I am monad."

greetPerson :: IO ()
greetPerson = fmap nameToGreeting getLine >>= putStrLn
This is not very easy to read, and it would get a lot worse if we were to also write to stdout before reading from stdin. Luckily, Haskell has some syntactic sugar for monads in the form of do-blocks.
greetPerson :: IO ()
greetPerson = do
  putStrLn "I am monad, what is your name?"
  personName <- getLine
  putStrLn ("Hello, " ++ personName ++ "! What shall we *do* together?")
The type of a do-block is the type of its last statement, which must be an expression such as putStrLn (...), not a binding like personName <- getLine.
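Do-blocks are only syntactic sugar. A sketch of the do-block greetPerson above translated by hand, following the do-notation translation in the Haskell Report: a statement followed by more statements becomes (>>), and a binding x <- action becomes action >>= \x -> .... The name greetPersonDesugared is hypothetical.

```haskell
-- Hand-desugared version of the do-block greetPerson.
greetPersonDesugared :: IO ()
greetPersonDesugared =
  putStrLn "I am monad, what is your name?"
    >> (getLine >>= \personName ->
          putStrLn ("Hello, " ++ personName ++ "! What shall we *do* together?"))

main :: IO ()
main = greetPersonDesugared
```

The parentheses mirror the translation exactly; since (>>) and (>>=) associate in a way the monad laws make equivalent, they could also be dropped.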
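Suppose we have the type IO (IO a): an action whose result is another action. Using (>>=), we can collapse it to IO a: for any foo :: IO (IO a) we can define bar = foo >>= id, resulting in bar :: IO a, which performs the effects of foo and then those of the action foo produced. In the other direction, return bar wraps bar back up as an IO (IO a). In other words, no matter how often you apply a monad to a type, you can always flatten it down to a single application of that monad. (Strictly speaking, return bar is not foo again, and IO (IO a) is not isomorphic to IO a; what does hold is that wrapping with return and then flattening gives back the original action: return bar >>= id is the same as bar.) This is what the famous phrase "monads are just monoids in the category of endofunctors" means (sort of): a monoid has an associative way to combine two things into one, here collapsing two layers of the monad into one, and a neutral element, here return.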
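The flattening operation foo >>= id is available in the standard library as join from Control.Monad. A sketch, where nested, flatA, flatB, roundTrip and maybeFlat are hypothetical names; the last one shows that join works for any monad, here Maybe.

```haskell
import Control.Monad (join)

-- An action whose result is itself an action.
nested :: IO (IO ())
nested = do
  putStrLn "outer effect"
  return (putStrLn "inner effect")

-- Flattening with (>>= id)...
flatA :: IO ()
flatA = nested >>= id

-- ...is what join does.
flatB :: IO ()
flatB = join nested

-- Wrapping with return and flattening again gives back the same action.
roundTrip :: IO ()
roundTrip = join (return flatA)

-- join is not specific to IO.
maybeFlat :: Maybe Int
maybeFlat = join (Just (Just 3))

main :: IO ()
main = flatA >> flatB >> roundTrip >> print maybeFlat
```

Running main prints "outer effect" followed by "inner effect" three times, then Just 3.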
You've learned how effects can be represented in a purely functional language by using monads. It is usually better to keep code out of the IO monad when possible. Trivially, we could write our entire program in IO, but we would lose many of the advantages of programming in Haskell.
I would personally argue that the IO monad is not a good way to represent effects, but it's the current standard for general-purpose purely functional programming languages. Some interesting alternatives include uniqueness types (used in Clean), free monads (Haskell) and model-update systems (Elm).
You've also been introduced to lazy evaluation. There are both advantages and disadvantages to it, but that is outside the scope of this chapter.