Jasper Woudenberg

Posted on May 15, 2020 • Edited on Nov 21, 2020

Using Shake Oracles

#build #shake #haskell

In a previous blogpost I argued for the use of the Shake build system, and for writing Shake rules like they are recipes. I also recommended the use of newCache, addOracle, and addOracleCache in situations where it's tricky to do recipes. newCache, addOracle, and addOracleCache are super useful, but they have imposing types, not the most helpful names, and they look similar, making it hard to know which one we need. In this post we're going to look at each in a little more detail. Let's start by briefly introducing these functions.

Three functions with different uses

newCache allows us to memoize a function. A memoized function stores all the results it calculates. When a memoized function gets called with arguments it has seen before, it returns an earlier calculated result rather than repeating work. We can use newCache to prevent duplicate work during a build but not across builds, because Shake throws away memoized results after each run.

addOracle allows us to define dependencies that aren't files. Say that we want a rule to rebuild when the date changes. We can write some Haskell to get the current date and make it available as a dependency using addOracle. These oracles run on every build because Shake needs to know if the values they return have changed, in which case the rules that depend on them need to be re-evaluated.

addOracleCache, whatever its name implies, has a use case different from either newCache or addOracle. We can use it to create a rule that produces a Haskell value instead of a file. Suppose we have an expensive calculation, like recursively finding all the dependencies of a source file. We could define a regular build rule using %> that performs the calculation and stores the result in a sources.json file. Other rules could need ["sources.json"], decode its contents and use the result. addOracleCache allows us to do the same thing without the encoding and decoding steps. Like other build rules and unlike addOracle, a rule defined using addOracleCache reruns if any of its dependencies changes.

To summarize what these three functions do before looking at each in more detail:

newCache memoizes a function call within the current build.
addOracle defines dependencies that aren’t files.
addOracleCache creates a rule that produces a Haskell value instead of a file.

Now let's look at these in more detail.

How to use `newCache`

With newCache we can create a memoized function. This is a function that when called several times with the same argument will run just once. The memoized function stores the result of the first run for use in future calls.

newCache has the following type.

newCache :: (Eq k, Hashable k) =>
            (k -> Action v) -> Rules (k -> Action v)
             ^^^^^^^^^^^^^            ^^^^^^^^^^^^^
             the function             the memoized
             to memoize               function

If we skip over the type constraints before the => first (we'll get to them in a moment!), then we can see that newCache takes a function and then returns a memoized version of it. Wherever we were using the original function we can use the memoized version too, because they have the same type.

To add memoization behavior Shake needs to know whether we called the function with a particular argument before. That requires comparing the input argument with input arguments from previous calls, and so we need to insist the input argument is of a type that allows such comparisons. Shake does this by requiring the input type to meet the Eq and Hashable constraints.

One tricky situation where newCache can help us out is when writing a rule for a command that produces more than one file, but where we don't want to hardcode which files it produces. Suppose we have a script generate-schemas.sh that generates JSON schemas for common types.

rules :: Rules ()
rules = do
  "schemas/*.schema.json" %> \_ -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs

The rule above works but is pretty inefficient. generate-schemas.sh creates all the schema files in a single run, but the rule above will rerun it every time we need a different schema. Let's use newCache to remove this duplication of work.

rules :: Rules ()
rules = do
  generateSchemasCached <- newCache $ \() -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs

  "schemas/*.schema.json" %> \_ -> generateSchemasCached ()

Now if more than one schema file is needed during a run, generate-schemas.sh is run just once. Later runs might run generate-schemas.sh again, because newCache doesn't save results across builds. That's good though, because we might have deleted schema files in the interim.
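To see the run-once behaviour in isolation, here is a minimal sketch that counts how often the body of a memoized function executes. The names countCachedRuns and square, and the _cache-demo directory, are made up for this illustration, and the counter lives in an IORef purely so we can observe it from outside the build.

```haskell
module Main (main) where

import Data.IORef (modifyIORef', newIORef, readIORef)
import Development.Shake

-- Hypothetical helper: runs one build and reports how many times the
-- body of the memoized function actually executed.
countCachedRuns :: IO Int
countCachedRuns = do
  runs <- newIORef (0 :: Int)
  shake shakeOptions {shakeFiles = "_cache-demo"} $ do
    -- The key type here is Int, which satisfies Eq and Hashable.
    square <- newCache $ \n -> do
      liftIO (modifyIORef' runs (+ 1))
      pure (n * n :: Int)
    action $ do
      a <- square 12
      b <- square 12 -- same key as above: answered from the cache
      c <- square 5 -- new key: the body runs again
      liftIO (print (a, b, c))
  readIORef runs

main :: IO ()
main = countCachedRuns >>= print
```

Three calls with two distinct keys should leave the counter at 2. Running the program a second time starts from an empty cache again, which matches the point above that newCache does not persist results across builds.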

How to use `addOracle`

Rules can signal they have a dependency on one or more files using need. This causes the rule to rebuild if a file it depends on changes. But what if you'd like your rule to depend on the day of the week, or the version of a tool? Using addOracle you can define such dependencies.

Say we have a rule that creates a letter from a template.

rules :: Rules ()
rules = do
  "letter.txt" %> \out -> do
    date <- liftIO Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

Imagine we're working on this until late in the evening. The next day we want to post it, so we run Shake one more time to get today's date on there. Shake skips the update though because it doesn't know it should rebuild the letter when the date changes.

If we had a file somewhere on our computer that always contained the current date then we could need that. As it stands the current date is not a file, it's the result of a call to the (made up) Date.current function. We can use addOracle to turn it into a dependency.

Let's start again by looking at addOracle's type. Notice how apart from the functions' constraints (the part of the type before the =>) addOracle has the exact same type as newCache.

addOracle :: (RuleResult q ~ a, ShakeValue q, ShakeValue a) =>
             (q -> Action a) -> Rules (q -> Action a)
              ^^^^^^^^^^^^^            ^^^^^^^^^^^^^

type ShakeValue a = (Show a, Typeable a, Eq a, Hashable a, Binary a, NFData a)

Like newCache, addOracle takes a function and returns a function of the same type. This new function adds a dependency in any rule it gets called in. Shake requires a bunch of constraints on the function's argument type q and return type a to pull this off. Check out the ShakeValue documentation if you're interested in learning what these constraints are for. We'll see in a moment what RuleResult q ~ a is about.

We can use addOracle to fix our letter templating rule like so.

rules :: Rules ()
rules = do
  currentDate <- addOracle $ \CurrentDate -> liftIO Date.current

  "letter.txt" %> \out -> do
    date <- currentDate CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate

instance Binary CurrentDate

instance NFData CurrentDate

type instance RuleResult CurrentDate = String

We made a similar change when we introduced newCache: we extract into a function the lines we want to add special behavior to (memoization behavior in the case of newCache, dependency-tracking behavior now). We wrap our extracted function using the right helper, and use the function this returns in our rule.

What's different this time is that Shake requires each oracle created using addOracle or addOracleCache to have a unique type (we'll see why in a moment). We create one called CurrentDate and use Generic to generate all the instances Shake requires of the type. We also have to tell Shake that the oracle associated with the CurrentDate input type always returns a String result type (the return value of our imaginary Date.current function).
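The snippets in this post leave out language pragmas and imports. As a rough sketch of what a complete module might look like, here is the same pattern applied to the other example mentioned earlier, the version of a tool. The GhcVersion type and the version.txt target are invented for this illustration, and the code assumes a ghc executable is on the PATH.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)

-- Hypothetical oracle input type: "which GHC version is on the PATH?"
data GhcVersion = GhcVersion
  deriving (Show, Eq, Generic)

-- Empty instance bodies: the implementations come from Generic.
instance Hashable GhcVersion

instance Binary GhcVersion

instance NFData GhcVersion

-- The oracle for GhcVersion always answers with a String.
type instance RuleResult GhcVersion = String

main :: IO ()
main = shake shakeOptions $ do
  ghcVersion <- addOracle $ \GhcVersion -> do
    -- Assumes `ghc` is installed; keep only the first line of output.
    Stdout out <- cmd "ghc --numeric-version"
    pure (takeWhile (/= '\n') out)

  want ["version.txt"]

  "version.txt" %> \out -> do
    -- Calling the oracle records the dependency for this rule.
    version <- ghcVersion GhcVersion
    writeFile' out version
```

On later builds the oracle runs again, but version.txt should only be rewritten when the reported version differs from the one Shake stored.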

Intermezzo: What's up with these boilerplate types?

The reason oracle argument types need to be unique, and the reason for the RuleResult q ~ a constraint, is to support the askOracle function in Shake's API. Using it, our letter templating example looks like this:

rules :: Rules ()
rules = do
  void . addOracle $ \CurrentDate -> liftIO Date.current

  "letter.txt" %> \out -> do
    date <- askOracle CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate

instance Binary CurrentDate

instance NFData CurrentDate

type instance RuleResult CurrentDate = String

We can pass askOracle any of the types we have defined oracles for. Because all our oracles have unique input types, askOracle can figure out which one to call. And because we have explicitly defined the return type for each oracle input type using the RuleResult type family, the type checker knows the type of askOracle SomeOracleType.
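To make the dispatch concrete, here is a small sketch with two oracles that return different types. ProjectName, BuildNumber, releaseLabel, and the constant answers are all made up for this example; real oracles would compute something.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)

-- Two hypothetical oracle input types with different result types.
data ProjectName = ProjectName deriving (Show, Eq, Generic)

instance Hashable ProjectName

instance Binary ProjectName

instance NFData ProjectName

type instance RuleResult ProjectName = String

data BuildNumber = BuildNumber deriving (Show, Eq, Generic)

instance Hashable BuildNumber

instance Binary BuildNumber

instance NFData BuildNumber

type instance RuleResult BuildNumber = Int

-- The input type selects the oracle, and RuleResult fixes the result
-- type, so `name` is a String and `number` is an Int without annotations.
releaseLabel :: Action String
releaseLabel = do
  name <- askOracle ProjectName
  number <- askOracle BuildNumber
  pure (name ++ "-" ++ show number)

main :: IO ()
main = shake shakeOptions {shakeFiles = "_ask-demo"} $ do
  _ <- addOracle $ \ProjectName -> pure "letters"
  _ <- addOracle $ \BuildNumber -> pure (7 :: Int)

  want ["release.txt"]

  "release.txt" %> \out -> do
    label <- releaseLabel
    writeFile' out label
```

With these constant answers, release.txt should end up containing letters-7.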

The whole thing is pretty magical, so I like to wrap up oracles in an API that exposes regular functions. As an example, we could wrap up the date oracle like this:

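module Rules.Date (rules, current) where

import Data.Functor (void)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)
import qualified Date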

current :: Action String
current = askOracle Current

rules :: Rules ()
rules =
  void . addOracle $ \Current -> currentOracle

currentOracle :: Action String
currentOracle = liftIO Date.current

data Current = Current
  deriving (Show, Eq, Generic)

instance Hashable Current

instance Binary Current

instance NFData Current

type instance RuleResult Current = String

Our letter templating example could use this module like so.

rules :: Rules ()
rules = do
  Rules.Date.rules

  "letter.txt" %> \out -> do
    date <- Rules.Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

How to use `addOracleCache`

Suppose we have a project that contains several Elm applications. The Elm applications share some modules between them. If we change an Elm module we'd like Shake to recompile just those projects that use the module. We could write a rule like this:

rules :: Rules ()
rules = do
  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    srcFiles <- recursiveDependencies main
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: FilePath -> Action [FilePath]
recursiveDependencies src = do
  direct <- directDependencies src
  recursive <- traverse recursiveDependencies direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  need [src]
  contents <- liftIO (readFile src)
  pure $ Elm.imports (Elm.parse contents)

This works, but it's not super efficient. For each Elm entrypoint it recalculates the full dependency tree. It would be nice if after calculating the dependency for a particular Elm module we could reuse that result in future builds, until any of the recursive dependencies of a module change.

The function recursiveDependencies looks a lot like a rule. The result it produces is a list of file paths corresponding to Elm modules. Its dependencies are the contents of those Elm modules, because a change in an Elm module might mean that its imports have changed, requiring a recalculation of the dependency tree. Let's use addOracleCache to write it as a rule.

rules :: Rules ()
rules = do
  void $ addOracleCache recursiveDependencies

  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    srcFiles <- askOracle (RecursiveDependenciesFor main)
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: RecursiveDependencies -> Action [FilePath]
recursiveDependencies (RecursiveDependenciesFor src) = do
  direct <- directDependencies src
  recursive <- traverse (askOracle . RecursiveDependenciesFor) direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  contents <- readFile' src -- This takes a dependency on `src`.
  pure $ Elm.imports (Elm.parse contents)

newtype RecursiveDependencies
  = RecursiveDependenciesFor FilePath
  deriving (Show, Eq, Hashable, Binary, NFData)

type instance RuleResult RecursiveDependencies = [FilePath]

Done! Now we'll cache the calculation of module dependencies between builds. One further possible optimization would be to use addOracleCache to turn directDependencies into a build rule as well. That way changes to Elm modules that don't touch imports won't trigger recalculation of a module's dependencies. Give it a try!

It's worth emphasizing that although addOracleCache has an identical type to addOracle, it behaves quite differently. Remember that addOracle is for defining dependencies. Shake runs addOracle functions pre-emptively to check if their return values have changed. Had we used addOracle here performance would be worse than the non-oracle-based version of the code we started with, because Shake would rerun it even if none of the Elm source files in the entire project had changed.
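The difference is easy to observe with a counter. The sketch below registers one oracle of each kind, runs the same build twice against a fresh Shake database, and reports how often each oracle body executed. PlainAnswer, CachedAnswer, counted, runTwice, and the _oracle-demo directory are all names invented for this illustration.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

module Main (main) where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)

-- Hypothetical input types, one per oracle.
data PlainAnswer = PlainAnswer deriving (Show, Eq, Generic)

instance Hashable PlainAnswer

instance Binary PlainAnswer

instance NFData PlainAnswer

type instance RuleResult PlainAnswer = Int

data CachedAnswer = CachedAnswer deriving (Show, Eq, Generic)

instance Hashable CachedAnswer

instance Binary CachedAnswer

instance NFData CachedAnswer

type instance RuleResult CachedAnswer = Int

-- Record one execution of an oracle body, then return a constant.
counted :: IORef Int -> Action Int
counted ref = liftIO (modifyIORef' ref (+ 1)) >> pure 42

rules :: IORef Int -> IORef Int -> Rules ()
rules plainRuns cachedRuns = do
  _ <- addOracle $ \PlainAnswer -> counted plainRuns
  _ <- addOracleCache $ \CachedAnswer -> counted cachedRuns
  action $ do
    _ <- askOracle PlainAnswer
    _ <- askOracle CachedAnswer
    pure ()

-- Run the same build twice, starting from an empty Shake database.
runTwice :: IO (Int, Int)
runTwice = do
  let dir = "_oracle-demo"
  removeFiles dir ["//*"]
  plainRuns <- newIORef 0
  cachedRuns <- newIORef 0
  let build = shake shakeOptions {shakeFiles = dir} (rules plainRuns cachedRuns)
  build
  build
  (,) <$> readIORef plainRuns <*> readIORef cachedRuns

main :: IO ()
main = runTwice >>= print
```

The addOracle body should run in both builds, while the addOracleCache body, which has no dependencies that could have changed, should run only in the first. That would give the pair (2,1).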

Closing thoughts

I hope this post has been helpful in understanding when and how to use Shake's newCache, addOracle, and addOracleCache functions. One final tip: make your oracle types nice and verbose because Shake uses them in its logs. It will make debugging oracles easier.

That's it. Happy shaking!
