Jasper Woudenberg

Posted on Feb 27, 2020 • Edited on Nov 21, 2020

Shake rules as recipes

#build #shake #haskell

As a project grows it accumulates more and more supporting tooling. At some point this toolset can encounter growing pains familiar to any growing software project, related to UX, maintainability, and performance.

A build system can help address these pains. A build system allows us to organize our tooling code into small build rules with explicit dependencies, improving maintainability. Aware of these dependencies, the build system can choose to skip parts of the build whose dependencies didn't change, improving performance. And if our full build command is quick, developers don't need to know tons of smaller scripts, improving UX.

In this post I would like to look at one build system in particular. Shake is a build system written in Haskell, and it defines an API for writing build rules in Haskell too. Shake has a great API for calling external commands and shell scripts, so we can add it to an existing toolset bit by bit. Shake doesn't make assumptions about the technologies used, making it a great fit for projects containing a mix of languages that no language-specific build system covers well. Lastly, Shake supports a 'shared cache' which allows a build to skip steps if they have already run before, on the same machine or on another machine! Note that this cache option is off by default.

If you've always wanted to try out Haskell, Shake is a great opportunity to learn while building something useful. Its Manual/Tutorial assumes no prior Haskell knowledge and serves as a great introduction. In the rest of this post I'm going to assume familiarity with the material that tutorial covers.

A mental model for understanding Shake code

It's important we're able to trust our build system. If our build system skips a part of the build it shouldn't, and as a result passes a build that should have failed, the ultimate consequence of that could be that we deploy broken code into production. That's obviously not something we want and so we will need to understand our build system code well.

One way to read Shake code is to walk through the code in the same way that Shake will, following along with its evaluation model. Shake is a complicated bit of software, and when I try to run its evaluation model in my mind I get so confused I end up with less trust of my Shake code rather than more. That might sound pretty damning but I do believe it's possible to write and read Shake code with confidence, if we keep a particular mental model in mind.

The mental model I use likens a Shake codebase to a recipe book. Our Shake rules are the recipes that make up the book. Each rule contains steps to create a particular file, which in our analogy is the dish we're preparing. Recipes have ingredients and so do our Shake rules: the ingredients are the dependencies of a rule.

I claim that if the recipe analogy applies to all the rules in our build system, that build system is correct. The recipes need to be honest though: they cannot leave out ingredients, and they cannot produce different dishes than the ones they advertise. To understand why this honesty is important let's look at a couple of examples of rules that aren't honest and see what breaks.

The case of the missing dependency.

Imagine we have the following small Elm application.

ai-banana/
  elm.json
  src/
    Main.elm
    Banana.elm

We define the following rule to build this application.

rules :: Rules ()
rules =
  "index.js" %> \out -> do
    need ["elm.json", "src/Main.elm"]
    cmd_ "elm make src/Main.elm --output" [out]

This rule contains a bug. It doesn't define a dependency on the Banana.elm file.

The bug won't surface the first time we run shake index.js. If we then make a change to Banana.elm and run shake index.js again we'd want Shake to run the rule again and rebuild index.js, but instead Shake will skip doing this work. Shake thinks there's nothing to do because none of the dependencies of index.js it knows about have changed.
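One way to close this gap, sketched here under the assumption that every Elm module of the application lives under src/, is to ask Shake for the list of source files and need all of them. The `src//*.elm` pattern and the `src/Main.elm` entry point are assumptions about the project layout, not something the example above prescribes.

```haskell
import Development.Shake

rules :: Rules ()
rules =
  "index.js" %> \out -> do
    -- getDirectoryFiles is tracked: adding or removing an .elm file
    -- under src/ also causes this rule to run again.
    sources <- getDirectoryFiles "" ["src//*.elm"]
    need ("elm.json" : sources)
    -- Assumption: src/Main.elm is the application's entry point.
    cmd_ "elm make src/Main.elm --output" [out]
```

With this version a change to Banana.elm changes one of the rule's ingredients, so Shake rebuilds index.js.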

The case of the implicit build output

Imagine we have the following rules to produce a changelog for the fictitious ai-banana project.

rules :: Rules ()
rules = do
  "node_modules/.bin/*" %> \out -> do
    need ["package.json"]
    cmd_ "npm install"

  "CHANGELOG" %> \out -> do
    need ["node_modules/.bin/changelog"]
    cmd_ "node_modules/.bin/changelog ai-banana"

We have one rule that installs the changelog application from npm and another that uses it to produce a CHANGELOG file.

The rule for node_modules/.bin/* isn't honest though. It doesn't just produce executable files in node_modules/.bin/, but also essential backing modules in other node_modules/ subdirectories without which the executables won't run.

If we enable Shake's shared cache (which is off by default) the rule above will be buggy. The first time we run shake CHANGELOG everything will be fine. If afterwards we remove node_modules/ and run shake CHANGELOG we will see a failure. Shake will avoid doing unnecessary work and retrieve node_modules/.bin/changelog from its cache rather than running the build rule for it again. But node_modules/.bin/changelog cannot work on its own: it will attempt to require a bunch of other files that aren't around, and fail.

Don't define a rule that produces a tiger in a cage and call it "tiger". When requesting the tiger Shake might just give you what you asked for.

Tips for honest recipes

We've seen there are two primary ways of withholding information from our build system: we can omit dependencies from build rules and we can define build rules that produce more files than they advertise. In both cases we can end up with bugs.

Sometimes though, a small lie told to our build system can seem a necessary, or at least awfully tempting, part of addressing a tricky design challenge. In other cases the way we write our Shake rules might have made it easier to be dishonest by accident. Let's take a look at a couple of these traps and see how we can avoid them.

Avoid `needHasChanged` and `resultHasChanged`

Shake has two functions, needHasChanged and resultHasChanged. They allow you to query whether a particular file has changed since the last time you ran a rule. These functions are neither dependencies nor build rules and so don't fit the "rules as recipes" mental model. To assess whether rules using needHasChanged or resultHasChanged are correct it's not enough to check that the rule is complete in defining its dependencies and outputs. Instead we're back at stepping through the rule in our mind and attempting to follow Shake's evaluation model. I recommend you avoid these two functions.

Often we can redesign our rule to use addOracleCache instead of needHasChanged or resultHasChanged. addOracleCache comes with boilerplate and type magic but fits the "rules as recipes" mental model well.
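To give an idea of what that boilerplate looks like, here is a minimal sketch. The `VersionOf` query type, the `extractVersion` helper, and the VERSION rule are all hypothetical names made up for illustration. The oracle reads its input through the tracked readFile', and rules that ask the oracle depend on its answer rather than on the raw file.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import Data.List (isInfixOf)
import Development.Shake
import Development.Shake.Classes

-- Hypothetical question for the oracle: "which version lines does
-- this package file contain?"
newtype VersionOf = VersionOf FilePath
  deriving (Show, Typeable, Eq, Hashable, Binary, NFData)

-- The type magic: tie the question type to its answer type.
type instance RuleResult VersionOf = String

-- Hypothetical pure helper: keep only lines mentioning "version".
extractVersion :: String -> String
extractVersion = unlines . filter ("\"version\"" `isInfixOf`) . lines

rules :: Rules ()
rules = do
  -- readFile' is tracked, so the oracle reruns when the file changes.
  -- Rules asking the oracle rerun only if its answer changed.
  versionOf <- addOracleCache $ \(VersionOf file) ->
    extractVersion <$> readFile' file

  "VERSION" %> \out -> do
    version <- versionOf (VersionOf "package.json")
    writeFile' out version
```

Both the oracle and the VERSION rule list all their ingredients, so each can be reviewed as a recipe on its own.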

Comment upon and review every use of `liftIO`

Consider two ways in which a Shake rule can read a file.

parseConfiguration1 :: FilePath -> Action Config
parseConfiguration1 file = do
  contents <- liftIO $ readFile file
  parse contents

parseConfiguration2 :: FilePath -> Action Config
parseConfiguration2 file = do
  contents <- readFile' file
  parse contents

readFile comes from Haskell's Prelude. It returns an IO String which we can turn into an Action String using liftIO. Shake provides a readFile' that directly returns an Action String. The two parseConfiguration functions behave differently. Shake's readFile' will add the file read as a dependency to the rule calling this function, causing that rule to be rebuilt if the contents of the file changed. Haskell's default readFile function will do no such thing.

readFile' is one example of a broader pattern in the functions Shake makes available to us. Some return an Action, and these will automatically track any files they use. Others will return an IO and won't track dependencies, putting the onus on us to check we're not overlooking a dependency.

The pattern is worth extending to helper functions we define ourselves. In the example above parseConfiguration1 is not following the pattern because it returns an Action but doesn't add the files it reads as dependencies. Tracked and untracked versions of that function that do follow the pattern look like this.

parseConfiguration :: FilePath -> Action Config
parseConfiguration file = do
  -- Easier would be to use `readFile'` here, but for the
  -- purposes of this example we'll pretend it doesn't exist.
  need [file]
  contents <- liftIO $ readFile file
  parse contents

parseConfigurationIO :: FilePath -> IO Config
parseConfigurationIO file = do
  contents <- readFile file
  parse contents

We can't entirely avoid using liftIO because Shake doesn't come with helper functions for all possible kinds of IO, and because there are legitimate uses of untracked functions that don't break our honesty principles. For example, a rule might produce and then consume a helper file. Such a file is not a dependency of the rule that created it, and adding it as a dependency could lead to weird behavior.

When using IO we need to take extra care to ensure we're not missing a dependency. How convenient for review then that we signpost every use of IO with a call to liftIO. We can help review further by adding a comment to every use of liftIO, explaining why it's necessary and safe.

Lastly, the fewer lines of code we have running in IO, the fewer lines we have to check for missing dependencies. We can reduce the number of lines running in IO by pushing our liftIO calls as far down as we can. Consider the helper function below.

largeHelper :: IO ()
largeHelper = do
  ...
  file <- readFile "all-pianos.txt"
  ...

By putting the liftIO in front of the readFile call we could make largeHelper return an Action. Now a single line in largeHelper is suspect, rather than every line in the function.
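A sketch of what that could look like. The putInfo line is only a stand-in for the elided parts of the helper, and the comment above liftIO is where a real codebase would justify the untracked read.

```haskell
import Development.Shake

largeHelper :: Action ()
largeHelper = do
  -- Everything outside the liftIO call runs in Action, so file access
  -- there has to go through Shake's tracked functions.
  --
  -- liftIO: explain here why reading this file untracked is safe.
  file <- liftIO $ readFile "all-pianos.txt"
  -- Stand-in for the rest of the helper.
  putInfo ("Read " ++ show (length (lines file)) ++ " lines")
```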

Use `newCache` and `addOracleCache` to address performance problems

Sometimes we need to sacrifice the readability of our code to improve its performance. When writing Shake code too we might see opportunities to improve its performance by compromising on "rules as recipes" honesty. Should we take these opportunities? Performance is important in a build system; it's the reason we're using a build system in the first place!

Shake provides two great helpers that allow us to address performance concerns without compromising on "rules as recipes" honesty. They are newCache and addOracleCache. newCache prevents duplicate work within a single build, and addOracleCache across several builds.
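As a minimal sketch of newCache, assume an expensive parse of a file that several rules need. Here `countEntries` is a hypothetical stand-in for that parse, and the `*.count` rule exists only to show the cache being used.

```haskell
import Development.Shake

-- Hypothetical stand-in for an expensive parse: count non-empty lines.
countEntries :: String -> Int
countEntries = length . filter (not . null) . lines

rules :: Rules ()
rules = do
  -- newCache runs the action at most once per key within one build.
  -- readFile' is tracked, so every rule using the cache still depends
  -- on the file that was read.
  entries <- newCache $ \file -> countEntries <$> readFile' file

  "*.count" %> \out -> do
    n <- entries "all-pianos.txt"
    writeFile' out (show n)
```

However many `*.count` files a build requests, all-pianos.txt is read and parsed once, and no rule hides an ingredient to get that speedup.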

Don't pass function arguments to rules

Consider the following rule, which produces a localized index.html file based on some template.html file.

rules :: Locale -> Rules ()
rules locale =
  "index.html" %> \out -> do
    need ["template.html"]
    template <- readFile' "template.html"
    let localized = localize locale template
    writeFile' out localized

localize :: Locale -> String -> String
localize = ...

This rule has a dependency that Shake doesn't know about: the locale argument. Should that argument change Shake won't know the rule needs to be rebuilt. One way to fix the problem would be to write a rule which produces a different output file for each locale.

rules :: Rules ()
rules =
  "*/index.html" %> \out -> do
    let locale = localeFromString (takeDirectory out)
    need ["template.html"]
    template <- readFile' "template.html"
    let localized = localize locale template
    writeFile' out localized

localeFromString :: String -> Locale
localeFromString = ...

localize :: Locale -> String -> String
localize = ...

Use Shake as a build system, not a task runner

A build system like Shake is good at creating files. It can grow to become a central tool in our work because a lot of development tasks create files. For example: compiling source files into a binary or running tests to generate a report. If we make Shake responsible for all these tasks, a logical next step is to try to make it responsible for all other development tasks as well.

Some tasks are more about doing something than about building something, and they aren't a great fit for a build system. Examples are deploying, running code formatters, starting services like a database, or installing a development environment. These tasks don't fit the honesty criteria of our "rules as recipes" mental model.

You can embed such tasks in Shake using helpers like phony, alwaysRerun, and historyDisable, but you're probably better off building them as separate scripts. We can still write those scripts in Haskell and use Shake library functions like cmd_, but we stop pretending these scripts are recipes.

We can put our Shake build tasks and Haskell scripts back under one umbrella by creating a custom CLI tool that provides a UI to both. Shake's functionality is available as a library, so we can use it as the engine for the build-related tasks in a larger tool.
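A minimal sketch of such a tool, assuming just two subcommands. The subcommand names, the placeholder CHANGELOG rule, and the echo-based deploy script are all made up for illustration.

```haskell
import Development.Shake
import System.Environment (getArgs)

-- Build rules stay honest recipes. This rule is only a placeholder.
rules :: Rules ()
rules = do
  want ["CHANGELOG"]
  "CHANGELOG" %> \out ->
    writeFile' out "placeholder changelog\n"

-- A task, not a recipe: plain IO without dependency tracking.
deploy :: IO ()
deploy = cmd_ "echo" ["deploying (placeholder)"]

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["build"]  -> shake shakeOptions rules
    ["deploy"] -> deploy
    _          -> putStrLn "usage: tool (build|deploy)"
```

Only the build subcommand goes through Shake's engine. The deploy subcommand borrows cmd_ as a convenient way to run a process and nothing more.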
