In a previous blogpost I argued for the use of the Shake build system, and for writing Shake rules like they are recipes. I also recommended the use of
newCache, addOracle, and addOracleCache in situations where it's tricky to do recipes.
newCache, addOracle, and addOracleCache are super useful, but they have imposing types, not the most helpful names, and they look similar, making it hard to know which one we need. In this post we're going to look at each in a little more detail. Let's start by briefly introducing these functions.
newCache allows us to memoize a function. A memoized function stores all the results it calculates. When a memoized function gets called with arguments it has seen before, it returns an earlier calculated result rather than repeating work. We can use
newCache to prevent duplicate work during a build but not across builds, because Shake throws away memoized results after each run.
addOracle allows us to define dependencies that aren't files. Say that we want a rule to rebuild when the date changes. We can write some Haskell to get the current date and make it available as a dependency using
addOracle. These oracles run on every build because Shake needs to know if the values they return have changed, in which case the rules that depend on them need to be re-evaluated.
addOracleCache, whatever its name implies, has a use case different from either newCache or
addOracle. We can use it to create a rule that produces a Haskell value instead of a file. Suppose we have an expensive calculation, like recursively finding all the dependencies of a source file. We could define a regular build rule using
%> that performs the calculation and stores the result in a
sources.json file. Other rules could
need ["sources.json"], decode its contents and use the result.
addOracleCache allows us to do the same thing without the encoding and decoding steps. Like other build rules and unlike
addOracle, a rule defined using
addOracleCache reruns if any of its dependencies changes.
To summarize what these three functions do before looking at each in more detail:
newCache memoizes a function call within the current build.
addOracle defines dependencies that aren’t files.
addOracleCache creates a rule that produces a Haskell value instead of a file.
Now let's look at these in more detail.
With newCache we can create a memoized function. This is a function that, when called several times with the same argument, will run just once. The memoized function stores the result of the first run for use in future calls.
newCache has the following type.
    newCache :: (Eq k, Hashable k) => (k -> Action v) -> Rules (k -> Action v)
                                       ^^^^^^^^^^^^^            ^^^^^^^^^^^^^
                                       the function             the memoized
                                       to memoize               function
If we skip over the type constraints before the
=> first (we'll get to them in a moment!), then we can see that
newCache takes a function and then returns a memoized version of it. Wherever we were using the original function we can use the memoized version too, because they have the same type.
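To make the memoization visible, here is a minimal sketch of a stand-alone program that counts how often the wrapped function body runs. The names `countCalls` and `shout` are made up for this illustration, and the `IORef` counter exists only to observe the behavior; it is not something `newCache` needs.

```haskell
module Main (main) where

import Data.IORef (modifyIORef', newIORef, readIORef)
import Development.Shake

-- | Run one build that asks for the same key twice and report how often
-- the body of the memoized function actually executed.
-- (`countCalls` and `shout` are hypothetical names for this sketch.)
countCalls :: IO Int
countCalls = do
  calls <- newIORef (0 :: Int)
  shake shakeOptions $ do
    -- The key type is String, which has the Eq and Hashable instances
    -- that newCache asks for.
    shout <- newCache $ \word -> do
      liftIO (modifyIORef' calls (+ 1))
      pure (word ++ "!")
    action $ do
      a <- shout "hello"
      b <- shout "hello" -- same key, so this call is answered from the cache
      liftIO (putStrLn (a ++ " " ++ b))
  readIORef calls

main :: IO ()
main = countCalls >>= print
```

Because both calls use the same key within one build, the counter should end up at 1. Running the program a second time starts from an empty cache again, so the body runs once more.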
To add memoization behavior Shake needs to know whether we called the function with a particular argument before. That requires comparing the input argument with input arguments from previous calls, and so we need to insist the input argument is of a type that allows such comparisons. Shake does this by requiring the input type to meet the Eq and Hashable constraints.
One tricky situation where
newCache can help us out is when writing a rule for a command that produces more than one file, but where we don't want to hardcode which files it produces. Suppose we have a script
generate-schemas.sh that generates JSON schemas for common types.
    rules :: Rules ()
    rules = do
      "schemas/*.schema.json" %> \_ -> do
        schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
        need ("generate-schemas.sh" : schemaSrcs)
        cmd_ "generate-schemas.sh" schemaSrcs
The rule above works but is pretty inefficient.
generate-schemas.sh creates all the schema files in a single run, but the rule above will rerun it every time we
need a different schema. Let's use
newCache to remove this duplication of work.
    rules :: Rules ()
    rules = do
      -- Memoized on the unit key, so the body runs at most once per build.
      generateSchemasCached <- newCache $ \() -> do
        schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
        need ("generate-schemas.sh" : schemaSrcs)
        cmd_ "generate-schemas.sh" schemaSrcs
      "schemas/*.schema.json" %> \_ -> generateSchemasCached ()
Now if more than one schema file is
needed during a run,
generate-schemas.sh is run just once. Later runs might run
generate-schemas.sh again, because
newCache doesn't save results across builds. That's good though, because we might have deleted schema files in the interim.
Rules can signal they have a dependency on one or more files using
need. This causes the rule to rebuild if the file they depend on changes. But what if you'd like your rule to depend on the day of the week, or the version of a tool? Using
addOracle you can define such dependencies.
Say we have a rule that creates a letter from a template.
    rules :: Rules ()
    rules = do
      "letter.txt" %> \out -> do
        date <- liftIO Date.current
        need ["letter.template"]
        cmd_ "templateer letter.template" "--out" [out] "--options" ["date=" <> date]
Imagine we're working on this until late in the evening. The next day we want to post it, so we run Shake one more time to get today's date on there. Shake skips the update though because it doesn't know it should rebuild the letter when the date changes.
If we had a file somewhere on our computer that always contained the current date then we could
need that. As it stands the current date is not a file, it's the result of a call to the (made up)
Date.current function. We can use
addOracle to turn it into a dependency.
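Since `Date.current` is made up, it may help to see what a stand-in could look like. The sketch below assumes the `time` package and a hypothetical name `currentDate`; it returns the current UTC date, which can differ from the local date around midnight.

```haskell
module Main (main) where

import Data.Time (getCurrentTime, utctDay)

-- | Hypothetical stand-in for the post's made-up `Date.current`.
-- `show` on a `Day` renders it as YYYY-MM-DD.
currentDate :: IO String
currentDate = show . utctDay <$> getCurrentTime

main :: IO ()
main = currentDate >>= putStrLn
```

Any `IO String` would do here. What matters for the rest of the post is that the value comes from running an action rather than from reading a file.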
Let's start again by looking at
addOracle's type. Notice how, apart from the functions' constraints (the part of the type before the =>),
addOracle has the exact same type as newCache.
    addOracle :: (RuleResult q ~ a, ShakeValue q, ShakeValue a) => (q -> Action a) -> Rules (q -> Action a)
                                                                    ^^^^^^^^^^^^^            ^^^^^^^^^^^^^

    type ShakeValue a = (Show a, Typeable a, Eq a, Hashable a, Binary a, NFData a)
addOracle takes a function and returns a function of the same type. This new function adds a dependency in any rule it gets called in. Shake requires a bunch of constraints on the function's argument type
q and return type
a to pull this off. Check out the
ShakeValue documentation if you're interested in learning what these constraints are for. We'll see in a moment what
RuleResult q ~ a is about.
We can use
addOracle to fix our letter templating rule like so.
    rules :: Rules ()
    rules = do
      currentDate <- addOracle $ \CurrentDate -> liftIO Date.current
      "letter.txt" %> \out -> do
        date <- currentDate CurrentDate
        need ["letter.template"]
        cmd_ "templateer letter.template" "--out" [out] "--options" ["date=" <> date]

    data CurrentDate = CurrentDate
      deriving (Show, Eq, Generic)

    instance Hashable CurrentDate
    instance Binary CurrentDate
    instance NFData CurrentDate

    type instance RuleResult CurrentDate = String
We made a similar change when we introduced
newCache: We extract into a function the lines we want to add special behavior to (memoization behavior in case of
newCache, dependency-tracking behavior now). We wrap our extracted function using the right helper, and use the function this returns in our rule.
What's different this time is that Shake requires each oracle created using
addOracle (or addOracleCache) to have a unique type (we'll see why in a moment). We create one called
CurrentDate and use
Generic to generate all the instances Shake requires of the type. We also have to tell Shake that the oracle associated with the
CurrentDate input type always returns a
String result type (the return value of our imaginary Date.current function).
The reason oracle argument types need to be unique, and the reason for the
RuleResult q ~ a constraint is to support the
askOracle function in Shake's APIs. Using it our letter templating example looks like this:
    rules :: Rules ()
    rules = do
      void . addOracle $ \CurrentDate -> liftIO Date.current
      "letter.txt" %> \out -> do
        -- askOracle finds the right oracle from the type of its argument.
        date <- askOracle CurrentDate
        need ["letter.template"]
        cmd_ "templateer letter.template" "--out" [out] "--options" ["date=" <> date]

    data CurrentDate = CurrentDate
      deriving (Show, Eq, Generic)

    instance Hashable CurrentDate
    instance Binary CurrentDate
    instance NFData CurrentDate

    type instance RuleResult CurrentDate = String
We can pass
askOracle any of the types we have defined oracles for. Because all our oracles have unique input types
askOracle can figure out which one to call. And because we have explicitly defined the return types for each oracle input type using the
RuleResult type family, the type checker knows the type of the value askOracle returns.
The whole thing is pretty magical, so I like to wrap up oracles in an API that exposes regular functions. As an example, we could wrap up the date oracle like this:
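    {-# LANGUAGE DeriveGeneric, TypeFamilies #-}
    module Rules.Date (rules, current) where

    import Control.Monad (void)
    import Development.Shake
    import Development.Shake.Classes (Binary, Hashable, NFData)
    import GHC.Generics (Generic)
    import qualified Date

    -- A regular function for callers; the oracle stays an implementation detail.
    current :: Action String
    current = askOracle Current

    rules :: Rules ()
    rules = void . addOracle $ \Current -> currentOracle

    currentOracle :: Action String
    currentOracle = liftIO Date.current

    data Current = Current
      deriving (Show, Eq, Generic)

    instance Hashable Current
    instance Binary Current
    instance NFData Current

    type instance RuleResult Current = String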
Our letter templating example could use this module like so.
    rules :: Rules ()
    rules = do
      Rules.Date.rules
      "letter.txt" %> \out -> do
        date <- Rules.Date.current
        need ["letter.template"]
        cmd_ "templateer letter.template" "--out" [out] "--options" ["date=" <> date]
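The same pattern covers the other dependency mentioned earlier, the version of a tool. The following is a sketch of a complete program along those lines. The `GhcVersion` type, the `ghc-version.txt` target, and the choice of `ghc` as the tool are assumptions made for the example, and it needs `ghc` on the PATH to run.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)

-- Question type for the oracle (hypothetical name for this sketch).
data GhcVersion = GhcVersion
  deriving (Show, Eq, Generic)

instance Hashable GhcVersion
instance Binary GhcVersion
instance NFData GhcVersion

-- The answer to a GhcVersion question is a String.
type instance RuleResult GhcVersion = String

main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["ghc-version.txt"]

  -- Shake reruns this on every build to see whether the answer changed.
  ghcVersion <- addOracle $ \GhcVersion -> do
    Stdout stdout <- cmd "ghc --numeric-version"
    pure (unwords (words stdout)) -- strip the trailing newline

  -- This rule reruns only when the reported version differs from last time.
  "ghc-version.txt" %> \out -> do
    version <- ghcVersion GhcVersion
    writeFile' out version
```

The oracle itself is cheap, which is what makes it a good fit for addOracle: it runs on every build, but the rule that depends on it only reruns after the tool has been upgraded.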
Suppose we have a project that contains several Elm applications. The Elm applications share some modules between them. If we change an Elm module we'd like Shake to recompile just those projects that use the module. We could write a rule like this:
    rules :: Rules ()
    rules = do
      "assets/*.elm.js" %> \out -> do
        let (Just [name]) = filePattern "assets/*.elm.js" out
        let main = name <.> "elm"
        srcFiles <- recursiveDependencies main
        need ("elm.json" : srcFiles)
        cmd_ "elm make --output" [out] [main]

    recursiveDependencies :: FilePath -> Action [FilePath]
    recursiveDependencies src = do
      direct <- directDependencies src
      recursive <- traverse recursiveDependencies direct
      pure (src : mconcat recursive)

    directDependencies :: FilePath -> Action [FilePath]
    directDependencies src = do
      need [src]
      contents <- liftIO (readFile src)
      pure $ Elm.imports (Elm.parse contents)
This works, but it's not super efficient. For each Elm entrypoint it recalculates the full dependency tree. It would be nice if after calculating the dependencies of a particular Elm module we could reuse that result in future builds, until any of the recursive dependencies of that module change.
recursiveDependencies looks a lot like a rule. The result it produces is a list of file paths corresponding to Elm modules. Its dependencies are the contents of those Elm modules, because a change in an Elm module might mean that its imports have changed, requiring a recalculation of the dependency tree. Let's use
addOracleCache to write it as a rule.
    rules :: Rules ()
    rules = do
      void $ addOracleCache recursiveDependencies
      "assets/*.elm.js" %> \out -> do
        let (Just [name]) = filePattern "assets/*.elm.js" out
        let main = name <.> "elm"
        srcFiles <- askOracle (RecursiveDependenciesFor main)
        need ("elm.json" : srcFiles)
        cmd_ "elm make --output" [out] [main]

    recursiveDependencies :: RecursiveDependencies -> Action [FilePath]
    recursiveDependencies (RecursiveDependenciesFor src) = do
      direct <- directDependencies src
      recursive <- traverse (askOracle . RecursiveDependenciesFor) direct
      pure (src : mconcat recursive)

    directDependencies :: FilePath -> Action [FilePath]
    directDependencies src = do
      contents <- readFile' src -- This takes a dependency on `src`.
      pure $ Elm.imports (Elm.parse contents)

    -- Deriving the classes below needs GeneralizedNewtypeDeriving.
    newtype RecursiveDependencies = RecursiveDependenciesFor FilePath
      deriving (Show, Eq, Hashable, Binary, NFData)

    -- The type family instance is keyed on the type, not the constructor.
    type instance RuleResult RecursiveDependencies = [FilePath]
Done! Now we'll cache the calculation of module dependencies between builds. One further possible optimization would be to use
addOracleCache to turn
directDependencies into a build rule as well. That way changes to Elm modules that don't touch imports won't trigger recalculation of a module's dependencies. Give it a try!
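For anyone who wants a starting point, here is one way that exercise might go, written as a small stand-alone program. Everything named here is an assumption for the sake of the sketch: `DirectDependencies` is a new oracle type, `importedModules` is a naive stand-in for the made-up `Elm.imports (Elm.parse contents)`, and modules are assumed to live under `src/`. It also expects a `src/Main.elm` file to exist.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Control.Monad (void)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)

-- Question type for the new oracle (hypothetical name).
newtype DirectDependencies = DirectDependenciesFor FilePath
  deriving (Show, Eq, Hashable, Binary, NFData)

type instance RuleResult DirectDependencies = [FilePath]

-- | Naive stand-in for a real Elm parser: turns each "import Foo.Bar ..."
-- line into "src/Foo/Bar.elm". It does not tell local modules apart from
-- modules that come from packages.
importedModules :: String -> [FilePath]
importedModules = mapMaybe toPath . lines
  where
    toPath line = do
      rest <- stripPrefix "import " line
      name <- listToMaybe (words rest)
      pure ("src/" ++ map dotToSlash name ++ ".elm")
    dotToSlash c = if c == '.' then '/' else c

main :: IO ()
main = shakeArgs shakeOptions $ do
  -- The cached oracle reruns when `src` changes, because readFile' takes a
  -- dependency on it.
  void $ addOracleCache $ \(DirectDependenciesFor src) ->
    importedModules <$> readFile' src

  action $ do
    deps <- askOracle (DirectDependenciesFor "src/Main.elm")
    liftIO (mapM_ putStrLn deps)
```

In the example above, `recursiveDependencies` would then call `askOracle (DirectDependenciesFor src)` instead of `directDependencies src`. An edit that leaves a module's imports untouched still reruns this small oracle, but its answer stays the same, so the more expensive recursive calculation can be skipped.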
It's worth emphasizing that although
addOracleCache has an identical type to
addOracle, it behaves quite differently. Remember that
addOracle is for defining dependencies. Shake runs
addOracle functions pre-emptively to check if their return values have changed. Had we used
addOracle here performance would be worse than the non-oracle-based version of the code we started with, because Shake would rerun it even if none of the Elm source files in the entire project had changed.
I hope this post has been helpful in understanding when and how to use Shake's
newCache, addOracle, and addOracleCache functions. One final tip: make your oracle types nice and verbose because Shake uses them in its logs. It will make debugging oracles easier.
That's it. Happy shaking!