DEV Community: Jasper Woudenberg

Haskell for the Elm Enthusiast

Jasper Woudenberg — Tue, 03 Aug 2021 15:40:01 +0000

This post was co-authored by Michael Glass, Stöffel, and myself. It first appeared on the NoRedInk blog.

Many years ago NRI adopted Elm as a frontend language. We started small with a disposable proof of concept, and as the engineering team increasingly bought into Elm being a much better developer experience than JavaScript, more and more of our frontend development happened in Elm. Today almost all of our frontend is written in Elm.

Meanwhile, on the backend, we use Ruby on Rails. Rails has served us well and has supported amazing growth of our website, both in terms of the features it supports, and the number of students and teachers who use it. But we’ve come to miss some of the tools that make us so productive in Elm: Tools like custom types for modeling data, or the type checker and its helpful error messages, or the ease of writing (fast) tests.

A couple of years ago we started looking into Haskell as an alternative backend language that could bring to our backend some of the benefits we experience writing Elm in the frontend. Today some key parts of our backend code are written in Haskell. Over the years we’ve developed our style of writing Haskell, which can be described as very Elm-like (it’s also still changing!).

🌳 Why be Like Elm?

Elm is a small language with great error messages, great documentation, and a great community. Together these make Elm one of the nicest programming languages to learn. Participants in an ElmBridge event will go from knowing nothing of the language to writing a real application using Elm in 5 hours.

We have a huge amount of Elm code at NoRedInk, and it supports some pretty tricky UI work. Elm scales well to a growing and increasingly complicated codebase. The compiler stays fast and we don’t lose confidence in our ability to make changes to our code. You can learn more about our Elm story here.

📦 Unboxing Haskell

Haskell shares a lot of the language features we like in Elm: Custom types to help us model our data. Pure functions and explicit side effects. Writing code without runtime exceptions (mostly).

When it comes to ease of learning, Haskell makes different trade-offs than Elm. The language is much bigger, especially when including the many optional language features that can be enabled. It’s entirely up to you whether you want to use these features in your code, but you’ll need to know about many of them if you want to make use of Haskell’s packages, documentation, and how-tos. Haskell’s compiler errors typically aren’t as helpful as Elm’s are. Finally, we’ve read many Haskell books and blog posts, but haven’t found anything that gets someone from knowing no Haskell to writing a real application and is anywhere near as small and effective as the Elm Guide.

🏟️ When in Rome, Act Like a Babylonian

Many of the niceties we’re used to in Elm we get in Haskell too. But Haskell has many additional features, and each one we use adds to the list of things that an Elm programmer will need to learn. So instead we took a path that many in the Haskell community took before us: limit ourselves to a subset of the language.

There are many styles of writing Haskell, each with its own trade-offs. Examples include Protolude, RIO, the lens ecosystem, and many more. Our approach differs in being strongly inspired by Elm. So what does our Elm-inspired style of writing Haskell look like?

🍇 Low hanging fruit: the Elm standard library

Our earliest effort in making our Haskell code more Elm-like was porting the Elm standard library to Haskell. We’ve open-sourced this port as a library named nri-prelude. It contains Haskell counterparts of the Elm modules for working with Strings, Lists, Dicts, and more.

nri-prelude also includes a port of elm-test. It provides everything you need for writing unit tests and basic property tests.

Finally, it includes a GHC plugin that makes Haskell’s default Prelude (basically its standard library) behave like Elm’s defaults. For example, it adds implicit qualified imports of some modules like List, similar to what Elm does.

🎚️ Effects and the Absence of The Elm Architecture

Elm is opinionated in supporting a single architecture for frontend applications, fittingly called The Elm Architecture. One of its nice qualities is that it forces a separation of application logic (all those conditionals and loops) and effects (things like talking to a database or getting the current time). We love using The Elm Architecture for writing frontend applications, but don’t see a way to apply it 1:1 to backend development. The F# community uses The Elm Architecture for some backend features (see: When to use Elmish Bridge), but it’s not generally applicable. We’d still like to encourage that separation between application logic and effects though, having seen the consequences of losing that distinction in our backend code. Read our other post Pufferfish, please scale the site! if you want to read more about this.

Out of many options, we’re currently using the handle pattern for managing effects. For each type of effect, we create a Handler type (the extra r started out as a typo long ago and has stuck around. Sorry). We use this pattern across our libraries for talking to outside systems: nri-postgresql, nri-http, nri-redis, and nri-kafka.

Without The Elm Architecture, we depend heavily on chaining operations through a stateful Task type. This feels similar to imperative coding: first do A, then B, then C. Hopefully, further along in our Haskell journey, we’ll discover a nice architecture to simplify our backend code.

🚚 Bringing Elm Values to Haskell

One way in which Haskell is different from both Elm and Rails is that it is not particularly opinionated. Often the Haskell ecosystem offers multiple different ways to do one particular thing. So whether it’s writing an HTTP server, logging, or talking with a database, the first time we do any of these things we’ll need to decide how.

When adopting a Haskell feature or library, we care about:

smallness, e.g. introducing new concepts only when necessary
how “magical” is it? That is, how surprising is it?
how easy is it to learn?
how easy is it to use?
how comprehensible is the documentation?
explicitness over terseness (but terseness isn’t inherently bad)
consistency & predictability
“safety” (no runtime exceptions)

Sometimes the Haskell ecosystem provides an option that fits our Elm values, like the handle pattern, and so we go with it. Other times a library has different values, and then the choice not to use it is easy as well. An example of this is the lens/prism ecosystem, which allows one to write super succinct code but is almost a language unto itself that one has to learn first.

The hardest decisions are the ones where an approach protects us against making mistakes in some way (which we like) but requires familiarity with more language features to use (which we prefer to avoid).

To help us make better decisions, we often try it both ways. That is, we’re willing to build a piece of software with & without a complex language feature to ensure the cost of the complexity is worth the benefit that the feature brings us.

Another approach we take is making decisions locally. A single team might evaluate a new feature, then demo and share it with other teams once they have a good sense the feature is worth it. Remember: a superpower of Haskell is easy refactorability. Unlike in our Ruby code, major rewrites in our Haskell codebase are often an hours-or-days-long (rather than weeks-or-months-long) endeavor. Adopting two different patterns simultaneously has a relatively small cost!

Case studies in feature adoption:

🐘 Type-Check All Elephants

One example where our approach is Elm-like in some ways but not in others is how we talk to the database. We’re using a GHC feature called quasiquoting for this, which allows us to embed SQL query strings directly into our Haskell code, like this:

{-# LANGUAGE QuasiQuotes #-}

module Animals (listAll) where

import Postgres (query, sql)
import qualified Postgres

listAll :: Postgres.Handler -> Task Text (List (Text, Text))
listAll postgres =
  query postgres [sql|SELECT species, genus FROM animals|]

A library called postgresql-typed can test these queries against a real Postgres database and show us an error at compile time if the query doesn’t fit the data. Such a compile-time error might happen if a table or column we reference in a query doesn’t exist in the database. This way we use static checks to eliminate a whole class of potential app/database compatibility problems!

The downside is that writing code like this requires everyone working with it to learn a bit about quasiquotes, and what return type to expect for different kinds of queries. That said, using some kind of querying library instead has a learning curve too, and query libraries tend to be pretty big to support all the different kinds of queries that can be made.

🔣 So Many Webserver Options

Another example where we traded additional safety against language complexity is in our choice of webserver library. We went with servant here, a library that lets you express REST APIs using types, like this:

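{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics (Generic)
import Servant
import Servant.API.Generic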

data Routes route = Routes
  { listTodos ::
      route
        :- "todos"
        :> Get '[JSON] [Todo],
    updateTodo ::
      route
        :- "todos"
        :> Capture "id" Int
        :> ReqBody '[JSON] Todo
        :> Put '[JSON] NoContent,
    deleteTodo ::
      route
        :- "todos"
        :> Capture "id" Int
        :> Delete '[JSON] NoContent
  }
  deriving (Generic)

Servant is a big library that makes use of a lot of type-level programming techniques, which are pretty uncommon in Elm, so there’s a steep learning curve to understanding how the type magic works. Using it without a deep understanding, however, is reasonably straightforward.

For us, the benefits gained from using Servant outweigh the cost of the added complexity. Based on a type like the one in the example above, the servant ecosystem can generate functions in other languages like Elm or Ruby. Using these functions saves us time with backend-to-frontend and service-to-service communication. If a Haskell type changes in a backward-incompatible way, we generate new Elm code, and this might introduce a compiler error on the Elm side.

So for now we’re using servant! It’s important to note that what we want is compile-time server/client compatibility checking, and that’s why we swallow Servant’s complexity. If we could get the same benefit without the type-level programming demonstrated above, we would prefer that. Hopefully, in the future, another library will offer the same benefits from a more Elm-like API.

😻 Like what you see?

We're running the libraries discussed above in production. Our most-used Haskell application receives hundreds of thousands of requests per minute without issue and produces hardly any errors.

Code can be found at NoRedInk/haskell-libraries. Libraries have been published to Hackage and Stackage. We'd love to know what you think!

☄️ Pufferfish, please scale the site!

Jasper Woudenberg — Thu, 22 Jul 2021 07:43:08 +0000

This post was co-authored by Michael Glass, Stöffel, and myself. It first appeared on the NoRedInk blog.

We created Team Pufferfish about a year ago with a specific goal: to avert the MySQL apocalypse! The MySQL apocalypse would occur when so many students would work on quizzes simultaneously that even the largest MySQL database AWS has on offer would not be able to cope with the load, bringing the site to a halt.

A little over a year ago, we forecasted our growth and load-tested MySQL to find out how much wiggle room we had. In the worst case (because we dislike apocalypses), or in the best case (because we like growing), we would have about a year’s time. This meant we needed to get going!

Looking back on our work now, the most important lesson we learned was the importance of timely and precise feedback at every step of the way. At times we built short-lived tooling and processes to support a particular step forward. This made us so much faster in the long run.

🏔 Climbing the Legacy Code Mountain

It was clear from the start that Team Pufferfish would need to make some pretty fundamental changes to the Quiz Engine, the component responsible for most of the MySQL load. Somehow the Quiz Engine would need to significantly reduce its load on MySQL.

Most of NoRedInk runs on a Rails monolith, including the Quiz Engine. The Quiz Engine is big! It’s got lots of features! It supports our teachers & students to do lots of great work together! Yay!

But the Quiz Engine has some problems, too. A mix of complexity and performance-sensitivity has made engineers afraid to touch it. Previous attempts at big structural change in the Quiz Engine failed and had to be rolled back. If Pufferfish was going to make significant structural changes, we would need to ensure our ability to be productive in the Quiz Engine codebase. Thinking we could just do it without a new approach would be foolhardy.

⚡️ The Vengeful God of Tests

We have mixed feelings about our test suite. It’s nice that it covers a lot of code. Less nice is that we don’t really know what each test is intended to check. These tests have evolved into complex bits of code by themselves with a lot of supporting logic, and in many cases, tight coupling to the implementation. Diving deep into some of these tests has uncovered tests no longer covering any production logic at all. The test suite is large and we didn’t have time to dive deep into each test, but we were also reluctant to delete test cases without being sure they weren’t adding value.

Our relationship with the Quiz Engine test suite was and still is a bit like one might have with an angry Greek god. We’re continuously investing effort to keep it happy (i.e. green), but we don’t always understand what we’re doing or why. Please don’t spoil our harvest and protect us from (production) fires, oh mighty RSpec!

The ultimate goal wasn’t to change Quiz Engine functionality, but rather to reduce its load on MySQL. This is the perfect scenario for tests to help us! The test suite we want is:

fast,
comprehensive,
independent of the implementation, and
inclusive of performance testing.

Unfortunately, that’s not the hand we were given:

The suite takes about 30 minutes to run in CI and even longer locally.
Our QA team finds bugs that sneaked past CI in PRs with Quiz Engine changes relatively frequently.
Many tests ensure that specific queries are performed in a specific order. Considering we might replace MySQL wholesale, these tests provide little value.
And because a lot of Quiz Engine code is extremely performance-sensitive, there’s an increased risk of performance regressions only surfacing with real production load.

Fighting with our tests meant that even small changes would take hours to verify in tests, and then, because of unforeseen regressions not covered by the tests, take multiple attempts to fix, resulting in multiple-day roll-outs for small changes.

Our clock was ticking! We needed to iterate faster than that if we were going to avert the apocalypse.

🐶 I have no idea what I’m doing 🧪

Reading complicated legacy Rails code often raises questions that take surprising amounts of effort to answer.

Is this method dead code? If not, who is calling this?
Are we ever entering this conditional? When?
Is this function talking to the database?
Is this function intentionally talking to the database?
Is this function only reading from the database or also writing to it?

It wasn’t even clear what code was running. A few features of Ruby (and Rails) optimize for writing code over reading it. We did our best to unwrap this type of code:

Rails provides devs the ability to wrap functionality in hooks. before_ and after_ hooks let devs write setup and tear-down code once, then forget it. However, the existence of these hooks means calling a method might also evaluate code defined in a different file, and you won’t know about it unless you explicitly look for it. Hard to read!

Complicating things further is Ruby’s dynamic dispatch based on subclassing and polymorphic associations. Which load_students am I calling? The one for Quiz or the one for Practice? They each implement the Assignment interface but have pretty different behavior! And: they each have their own set of hooks🤦. Maybe it’s something completely different!

And then there’s ActiveRecord. ActiveRecord makes it easy to write queries, a little too easy. It doesn’t make it easy to know where queries are happening. It’s ergonomic that we can tell ActiveRecord what we need and let it figure out how to fetch the data. It’s less nice when you’re trying to find out where in the code your queries are happening and the answer to that question is “absolutely anywhere”. We want to know exactly what queries are happening on these code paths. ActiveRecord doesn’t help.

🧵A rich history

A final factor that makes working in Quiz Engine code daunting is the sheer size of the beast. The Quiz Engine has grown organically over many years, so there’s a lot of functionality to be aware of.

Because the Quiz Engine itself has been hard to change for a while, APIs defined between bits of Quiz Engine code often haven’t evolved to match our latest understanding. This means understanding the Quiz Engine code requires not just understanding what it does today, but also how we thought about it in the past, and what (partial) attempts were made to change it. This increases the sum of Quiz Engine knowledge even further.

For example, we might try to refactor a bit of code, leading to tests failing. But is this conditional branch ever reached in production? 🤷

Enough complaining. What did we do about it?

We knew this was going to be a huge project, and huge projects, in the best case, are shipped late, and in the average case don’t ever ship. The only way we were going to have confidence that our work would ever see the light of day was by doing the riskiest, hardest, scariest stuff first. That way, if one approach wasn’t going to work, we would find out about it sooner and could try something new before we’d over-invested in a direction.

So: where is the risk? What’s the scariest problem we have to solve? History dictates: The more we change the legacy system, the more likely we’re going to cause regressions.

So our first task: cut away the part of the Quiz Engine that performs database queries and port this logic to a separate service. Henceforth when Rails needs to read or change Quiz Engine data, it will talk to the new service instead of going to the database directly.

Once the legacy-code risk had been minimized, we would be able to focus on the (still challenging) task of changing where we store Quiz Engine data from single-database MySQL to something horizontally scalable.

⛏️ Phase 1: Extracting queries from Rails

🔪 Finding out where to cut

Before extracting Quiz Engine MySQL queries from our Rails service, we first needed to know where those queries were being made. As we discussed above this wasn’t obvious from reading the code.

To find the MySQL queries themselves, we built some tooling: we monkey-patched ActiveRecord to warn whenever an unknown read or write was made against one of the tables containing Quiz Engine data. We ran our monkey-patched code first in CI and later in production, letting the warnings tell us where those queries were happening. Using this information we decorated our code by marking all the reads and writes. Once code was decorated, it would no longer emit warnings. As soon as all the reads and writes were decorated, we changed our monkey-patch to not just warn but fail when making a query against one of those tables, to ensure we wouldn’t accidentally introduce new queries touching Quiz Engine data.
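The warn-then-fail tooling described above can be sketched in plain Ruby. To keep the example self-contained, it patches a hypothetical `FakeConnection` class instead of ActiveRecord, whose internals we don't reproduce here. `QueryGuard`, its table list, and the `allow` block that stands in for "decorating" a call site are illustrative names, not our actual code.

```ruby
# Hypothetical stand-in for a database connection; the real tooling
# patched ActiveRecord instead.
class FakeConnection
  def execute(sql)
    "result of #{sql}"
  end
end

module QueryGuard
  # Hypothetical list of tables holding Quiz Engine data.
  QUIZ_ENGINE_TABLES = %w[quizzes quiz_answers].freeze

  class << self
    attr_accessor :mode # :warn while surveying, :raise once every call site is decorated
    attr_reader :warnings
  end
  self.mode = :warn
  @warnings = []

  # "Decorating" a call site: queries run inside this block are known and allowed.
  def self.allow
    previous = Thread.current[:quiz_engine_query_allowed]
    Thread.current[:quiz_engine_query_allowed] = true
    yield
  ensure
    Thread.current[:quiz_engine_query_allowed] = previous
  end

  def self.touches_quiz_engine?(sql)
    QUIZ_ENGINE_TABLES.any? { |table| sql.match?(/\b#{table}\b/i) }
  end

  # Prepended, so this runs before the connection's own #execute.
  def execute(sql)
    if QueryGuard.touches_quiz_engine?(sql) && !Thread.current[:quiz_engine_query_allowed]
      message = "undecorated Quiz Engine query: #{sql}"
      raise message if QueryGuard.mode == :raise

      QueryGuard.warnings << message
    end
    super
  end
end

FakeConnection.prepend(QueryGuard)

conn = FakeConnection.new
conn.execute("SELECT * FROM quizzes")                      # undecorated: recorded as a warning
QueryGuard.allow { conn.execute("SELECT * FROM quizzes") } # decorated: allowed silently
conn.execute("SELECT * FROM users")                        # not Quiz Engine data: ignored
```

Flipping `QueryGuard.mode` to `:raise` corresponds to the final step: any new undecorated query against those tables now fails instead of warning.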

🚛 Offloading logic: Our first approach

Now that we knew where to cut, we decided our place of greatest risk was moving a single MySQL query out of our Rails app. If we could move a single query, we could move all of them. There was one rub: if we did move all queries to our new app, we would add a lot of network latency because of the number of round trips needed for a single request. That gave us a constraint: move a single query into a new service, but with very little latency.

How did we reduce latency?

Get rid of network latency by getting rid of the network: we hosted the service on the same hardware as our Rails app.
Get rid of protocol latency by using a dead-simple protocol: socket communication.

We ended up building a socket server in Haskell that took data requests from Rails and transformed them into a series of MySQL queries, which Rails would use to fetch the data itself.
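Here is a minimal sketch of that request/response shape. Both ends are in Ruby for brevity, even though the real server was written in Haskell. It assumes a newline-delimited JSON protocol over a local socket pair; the message fields and `plan_queries` are hypothetical. The service only plans the SQL, and the caller still runs the queries itself.

```ruby
require "socket"
require "json"

# Hypothetical service side: turns a data request into the SQL the caller should run.
def plan_queries(request)
  case request["type"]
  when "load_quiz"
    quiz_id = Integer(request["quiz_id"]) # reject anything that isn't an integer
    ["SELECT * FROM quizzes WHERE id = #{quiz_id}",
     "SELECT * FROM quiz_answers WHERE quiz_id = #{quiz_id}"]
  else
    []
  end
end

# Answer a single request: one JSON object per line in, one per line out.
def serve_one(socket)
  request = JSON.parse(socket.gets)
  socket.puts(JSON.generate({ "queries" => plan_queries(request) }))
end

# Caller side (Rails in the real system): send a request, read back the queries to execute.
def request_queries(socket, request)
  socket.puts(JSON.generate(request))
  JSON.parse(socket.gets).fetch("queries")
end

client, server = UNIXSocket.pair
service = Thread.new { serve_one(server) }
queries = request_queries(client, { "type" => "load_quiz", "quiz_id" => 42 })
service.join
```

With one JSON object per line, neither side needs framing logic beyond `gets` and `puts`.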

🛸 Leaving the Mothership: Fewer Round Trips

Although co-locating our service with Rails got us off the ground, it required significant duct tape. We had invested a lot of work building nice deployment systems for HTTP services, and we didn’t want to re-invent that tooling for socket-based sidecar apps. What was preventing the migration was the large number of round-trip requests between the service and the Rails app. How could we reduce the number of round trips?

As we moved MySQL query generation to our new service, we started to see this pattern in our routes:

MySQL Read some data       ┐
Ruby  Do some processing   │ candidate 1 for
MySQL Read some more data  ┘ extraction
Ruby  More processing
MySQL Write some data      ┐
Ruby  Processing again!    │ candidate 2 for
MySQL Write more data      ┘ extraction

To reduce latency, we’d have to bundle reads and writes: in addition to porting the reads and writes to the new service, we’d have to port the Ruby logic between them, which would be a lot of work.

What if instead, we could change the order of operations and make it look like this?

MySQL Read some data       ┐ candidate 1 for
MySQL Read some more data  ┘ extraction
Ruby  Do some processing
Ruby  More processing
Ruby  Processing again!
MySQL Write some data      ┐ candidate 2 for
MySQL Write more data      ┘ extraction

Then we’d be able to extract batches of queries to Haskell and leave the logic behind in Rails.
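To see why the reordering pays off, consider a hypothetical store that counts round trips. In the interleaved version, every query is its own trip. Once all reads come first and all writes come last, each group can be sent as a single batch, and that batch is the part that can move to another service. `CountingStore`, `read_batch`, `write_batch`, and the arithmetic standing in for Ruby processing are illustrative placeholders.

```ruby
# Hypothetical store that records how many round trips the caller makes.
class CountingStore
  attr_reader :round_trips

  def initialize(data)
    @data = data
    @writes = {}
    @round_trips = 0
  end

  # One round trip, however many keys are requested.
  def read_batch(keys)
    @round_trips += 1
    keys.to_h { |key| [key, @data.fetch(key)] }
  end

  # One round trip, however many rows are written.
  def write_batch(rows)
    @round_trips += 1
    @writes.merge!(rows)
  end
end

# Interleaved: read, process, read, process, write, process, write.
def interleaved(store)
  quiz = store.read_batch([:quiz])[:quiz]
  doubled = quiz * 2
  answers = store.read_batch([:answers])[:answers]
  total = doubled + answers
  store.write_batch({ total: total })
  score = total * 10
  store.write_batch({ score: score })
  score
end

# Reordered: all reads first, then plain Ruby processing, then all writes.
def reordered(store)
  rows = store.read_batch([:quiz, :answers])
  total = rows[:quiz] * 2 + rows[:answers]
  score = total * 10
  store.write_batch({ total: total, score: score })
  score
end
```

With `{ quiz: 3, answers: 4 }` as data, both versions return 100. The interleaved one makes four round trips and the reordered one makes two.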

One concern we had with changing the order of operations like this was the possibility of a request handler first writing some data to the database, then reading it back again later. Changing the order of read and write queries would result in such code failing. However, since we now had a complete and accurate picture of all the queries the Rails code was making, we knew (luckily!) we didn’t need to worry about this.

Another concern was the risk that a large refactor like this would introduce regressions, causing long feedback cycles and breaking the Quiz Engine. To avoid this we tried to keep our refactors as dumb as possible; specifically, we mostly did a lot of inlining. We would start with something like this:

class QuizzesController < ApplicationController
  def show
    quiz = load_quiz! # here are queries sometimes
    quiz_type = which_quiz(quiz) # and here other times
  end
end

and we would aggressively inline functions to surface where and why we were querying

class QuizzesController < ApplicationController
  def show
    quiz = Quiz.find(quiz_id_param)
    quiz_type =
      if quiz.for_credit?
        :for_credit
      else
        load_practice_quiz_type
      end
  end
end

and again, and again

class QuizzesController < ApplicationController
  def show
    quiz = Quiz.find(quiz_id_param)
    quiz_type =
      if quiz.for_credit?
        :for_credit
      else
        how_much_fun = QuizForFun.find(quiz_id_param)
        if how_much_fun > 9000
          :super_saiyan
        else
          load_sub_syan_fun_type # TODO: inline me
        end
      end
  end
end

These are refactors with a relatively small chance of changing behavior or causing regressions.

Once the query was at the top level of the code it became clear when we needed data, and that understanding allowed us to push those queries to happen first.

For example, in the code above we could easily push the previously obscured QuizForFun query to the beginning:

class QuizzesController < ApplicationController
  def show
    quiz = Quiz.find(quiz_id_param)
    how_much_fun =
      if quiz.for_credit?
        nil
      else
        QuizForFun.find(quiz_id_param)
      end
    quiz_type = if quiz.for_credit?
        :for_credit
      elsif how_much_fun > 9000
        :super_saiyan
      else
        load_sub_syan_fun_type # TODO: inline me
      end
  end
end

You might expect our bout of inlining to introduce a ton of duplication in our code, but in practice, it surfaced a lot of dead code and made it clearer what the functions we left behind were doing. That wasn’t what we set out to do, but still, nice!

👛 Phase 2: Changing the Quiz Engine datastore

At this point all interactions with the Quiz Engine datastore were going through this new Quiz Engine service. Excellent! This means for the second part of this project, the part where we were actually going to avert the MySQL apocalypse, we wouldn’t need to worry about our legacy Rails code.

To facilitate easy refactoring, we built this new service in Haskell. The effect was immediately noticeable. As if an embargo had been lifted, from this point forward we saw a constant trickle of small productive refactors mixed into the work we were doing, slowly reshaping types to reflect our latest understanding. These were changes we wouldn’t have made on the Rails side unless we had set aside months of dedicated time. Haskell is a great tool for managing complexity!

The centerpiece of this phase was the architectural change we were planning to make: switching from MySQL to a horizontally scalable storage solution. But honestly, figuring out the architecture details wasn’t the most interesting or challenging part of the work, so we’re putting that aside for now. Maybe we’ll return to it in a future blog post (sneak peek: we ended up using Redis and Kafka). As in phase 1, the biggest question we had to solve was “how are we going to make it safe to move forward quickly?”

One challenge was that we had left most of our test suite behind in Rails in phase one, so we were not doing too well on that front. We added Haskell test coverage of course, including many golden result tests which are worth a post on their own. Together with our QA team we also invested in our Cypress integration test suite which runs tests from the browser, thus integration-testing the combination of our Rails and Haskell code.

Our most useful tool in making safe changes in this phase, however, was our production traffic. We started building up what was effectively a parallel Haskell service talking to Redis next to the existing one talking to MySQL. Both received production load from the start, but until the very end of the project only the MySQL code path’s response values were used. When the Redis code path didn’t match the MySQL one, we’d log a bug. Using these bug reports, we slowly massaged the Redis code path to return data identical to MySQL’s.
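Here is a minimal sketch of that shadowing setup. It assumes each request is served by both code paths. The trusted (MySQL) result is always returned, while a mismatch or crash in the shadow (Redis) path is only recorded. `ShadowCompare`, the two lambdas, and the in-memory `bug_reports` list are hypothetical; in the setup described above, the mismatches were logged as bugs.

```ruby
# Runs the trusted path and a shadow path side by side; only the trusted
# result is ever returned to the caller.
class ShadowCompare
  attr_reader :bug_reports

  def initialize(trusted:, shadow:)
    @trusted = trusted
    @shadow = shadow
    @bug_reports = []
  end

  def call(request)
    expected = @trusted.call(request)
    begin
      actual = @shadow.call(request)
      if actual != expected
        @bug_reports << { request: request, expected: expected, actual: actual }
      end
    rescue StandardError => e
      # A crash in the shadow path becomes a bug report, never a failed request.
      @bug_reports << { request: request, expected: expected, error: e.message }
    end
    expected
  end
end

# Hypothetical code paths: the "Redis" one still has a bug for quiz 2.
mysql_path = ->(quiz_id) { { quiz_id: quiz_id, score: quiz_id * 10 } }
redis_path = ->(quiz_id) { { quiz_id: quiz_id, score: quiz_id == 2 ? 0 : quiz_id * 10 } }

compare = ShadowCompare.new(trusted: mysql_path, shadow: redis_path)
results = [1, 2, 3].map { |id| compare.call(id) }
```

Because callers only ever see `expected`, the shadow path can be redeployed as often as needed while its bug reports drive fixes.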

Because we weren’t relying on the output of the Redis code path in production, we could deploy changes to it many times a day, without fear of breaking the site for students or teachers. These deploys provided frequent and fast feedback. Deploying frequently was made possible by the Haskell Quiz Engine code living in its own service, which meant deploys contained only changes by our team, without work from other teams with a different risk profile.

🥁 So, did it work?

It’s been about a month since we’ve switched entirely to the new architecture and it’s been humming along happily. By the time we did the official switch-over to the new datastore it had been running at full-load (but with bugs) for a couple of months already. Still, we were standing ready with buckets of water in case we overlooked something. Our anxiety was in vain: the roll-out was a non-event.

Architecture, plans, goals, were all important to making this a success. Still, we think the thing most crucial to our success was continuously improving our feedback loops. Fast feedback (lots of deploys), accurate feedback (knowing all the MySQL queries Rails is making), detailed feedback (lots of context in error reports), high signal/noise ratio (removing errors we were not planning to act on), lots of coverage (many students doing quizzes). Getting this feedback required us to constantly tweak and create tooling and new processes. But even when these processes were short-lived, they never felt like overhead; they allowed us to move so much faster.

Helper modules are okay

Jasper Woudenberg — Fri, 12 Mar 2021 23:41:04 +0000

A module containing a loose collection of mostly unrelated functions: that would be my definition of a helper module. The name "helpers" kind of gives it away; it shows we weren't able to find a terse expression of the module's purpose.

Maybe their haphazard construction makes helper modules a bit unlikeable. They don't have great designs. But I like them anyway, because I think they fill an important niche in the wider ecosystem that is a codebase. They allow us to remove duplication without doing any API design work, at least not yet.

API design is hard (I wrote about this before), and doing a good job requires plenty of example usages to test our design as we create it. When we first spot some duplication, it is unlikely to be so pervasive yet that enough data for a good design exists. Given insufficient data, duplication is better than a bad abstraction. And after duplication, adding a helper might be the next least-worst option.

What happens next depends on usage of the helper. If it grows we might at some point design that abstraction using all the example use cases we now have. If usages remain constant we'll likely never look at the helper again. And if usage drops to a single callsite we can move the helper back to the module where it is called. One fewer helper, and one fewer layer of indirection!

Informal work lists

Jasper Woudenberg — Tue, 09 Mar 2021 10:41:57 +0000

There are many ways I learn about new tasks. Email, Github, issue trackers, chat, and my memory all chip in. I have to copy these tasks over to a single place or I'll spend a lot of time worrying that I'm forgetting something.

I think engineering teams can face a similar problem, tracking tasks and work in progress in many different places. Here are some examples of todo lists teams keep, that maybe aren't immediately recognizable as such.

The collection of TODO comments in code pretty literally comprises a todo list.
Skipped tests represent work in progress. The work will be finished either when the test is fixed and un-skipped, or removed.
Each experimental feature flag represents some work in progress. Once the work is finished the flag can be removed again. (I realize there are other uses for feature flags that don't count as work in progress.)
Debug instrumentation running in production to help understand a specific issue in our code represents work in progress. Once the problem is understood, the instrumentation can be removed again.

The risk I see here is that these are basically backlogs with all the associated costs, but those costs aren't necessarily obvious. In fact, it's super easy to add an item of work in any of these categories. Removing an item of work can be much harder, especially if we are not, or are no longer, the person with context on why the task was added.

We can mitigate some of the risk by adding some enabling constraints:

CI can reject PRs that contain TODO comments. That way TODO comments can be used as a reminder system while working on a PR, but any that remain when the PR is done will have to be moved into the team's backlog proper. Assuming folks take more care writing a backlog issue than a TODO comment (this is true for me), this can force us to reflect on whether the thought represented by the TODO is really worth turning into a ticket. If you don't want to lose the ability to leave TODO comments in PRs but do like them being added to the team's tracker, an alternative is to enforce that TODO comments include a link or issue number.
Similar to TODO comments, consider failing CI on skipped tests, or fail CI when skipped tests aren't accompanied by a link to a ticket.
We can have a convention of adding comments to each feature flag, specifically explaining under what circumstances it will be flipped and when it can be removed. Answering these questions can help us figure out if adding a feature flag is the right move, and help future us or others remove the feature flag once it has served its purpose.
Similar to documenting feature flags, a convention of commenting on debug instrumentation that answers the same two questions: when is this going to be used and when can we remove it?
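As a minimal sketch of the "TODOs need a link or issue number" rule, here is a helper a CI step could run over changed files. It assumes Ruby-style `#` comments and treats a URL or an issue number such as `#123` as a ticket reference; both conventions, and the helper's name, are hypothetical and would need adjusting to your tracker. The same idea, with a different pattern, could flag skipped tests that lack a ticket link.

```ruby
# Hypothetical CI check: report TODO comments that don't reference a ticket.
# Assumes Ruby-style `#` comments; a URL or an issue number such as "#123"
# counts as a ticket reference. Both conventions are illustrative only.
TODO_COMMENT = /#.*\bTODO\b/
TICKET_REFERENCE = %r{https?://\S+|#\d+}

# Returns [line_number, line] pairs for TODO comments without a ticket.
def untracked_todos(source)
  source.each_line.with_index(1).filter_map do |line, number|
    next unless line.match?(TODO_COMMENT)
    next if line.match?(TICKET_REFERENCE)

    [number, line.chomp]
  end
end

source = <<~RUBY
  total = 0 # TODO tidy up
  count = 1 # TODO see #42
RUBY
untracked_todos(source) # => [[1, "total = 0 # TODO tidy up"]]
```

A CI job would then fail the build whenever this returns anything for a changed file.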

These small process changes can help us make more conscious decisions about whether to create future work for ourselves. And when we do decide to go forward, they set our future selves or others up better to complete the work, or to decide to remove the task.

Collaboration and Parallelization

Jasper Woudenberg — Fri, 26 Feb 2021 11:39:13 +0000

Collaborating on a task is great. Here are just some of the benefits:

Having more eyes on a problem helps us work faster. Another human might spot the problem causing a bug we missed, avoid unnecessary work, and help us avoid getting stuck in rabbit holes.
Hearing more voices when making a decision improves the quality of that decision. The earlier we get those voices in, the less rework we have to do.
By having more ears in the room we can bake context-sharing into the development process, and reduce the need for separate meetings, processes, and roles dedicated to alignment.
Some of our hard work can be draining. Sharing painful tasks can help keep us motivated.
Collaborating on tasks means we need fewer tasks in progress at the same time. This means our backlogs can be shorter, which has benefits of its own that I wrote about previously.

I think most people (including me) don't feel like collaborating on tasks 100% of the time. I need time for myself as well. Given the advantages above though, it seems worthwhile to continuously look for ways to collaborate as much as possible, as long as it's sustainable.

This goal of collaborating more can be at odds with another popular goal: to parallelize work as much as possible.

If we create a large backlog of tasks ready to pick up then we intentionally make the barrier to starting new work very low. Starting new work takes less energy than gaining context on an in-progress task and supporting a colleague.

When I created backlogs like that I think it was primarily out of fear that without sufficiently parallelized work someone on the team (myself included) might find themselves with nothing to do. Never mind that I couldn't even imagine what it would be like not to know a single useful thing to do. Never mind that having some slack in our schedules for small refactoring tasks and the like makes us more productive rather than less. Any stretch of time between an individual running out of work and the team deciding a next step seemed unacceptable.

There is a self-reinforcing feedback loop here: if we're used to picking up new work rather than helping finish in-progress work, then running out of new work is a big problem. By addressing that problem with a bigger backlog of ready work, we further encourage ourselves to start new tasks.

One way to break out of this loop is using a work in progress limit: a maximum number of tasks the team can have in progress at the same time. When at the limit, an in-progress task needs to be finished before a new one can be started. Use retros to find the biggest obstacles to setting the limit lower, and try to remove them. A work in progress limit will force us to work on fewer things at the same time without having to give up the curated backlog (which is scary!).
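To make the mechanics of a work in progress limit concrete, here is a small sketch of a task board that refuses to start new work while it's at its limit. The `Board` class and its method names are hypothetical; in practice the limit usually lives in a team agreement or a tracker setting rather than in code.

```ruby
# A minimal, hypothetical task board that enforces a work in progress limit.
class Board
  class WipLimitReached < StandardError; end

  attr_reader :in_progress, :done

  def initialize(wip_limit:)
    @wip_limit = wip_limit
    @in_progress = []
    @done = []
  end

  # Starting a task is refused while the board is at its limit.
  def start(task)
    if @in_progress.size >= @wip_limit
      raise WipLimitReached, "finish something before starting #{task.inspect}"
    end

    @in_progress << task
  end

  # Finishing a task moves it to done and frees a slot for new work.
  def finish(task)
    @done << @in_progress.delete(task) { raise ArgumentError, "#{task.inspect} is not in progress" }
  end
end
```

With `Board.new(wip_limit: 2)`, starting a third task raises until one of the first two is finished.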

Here's another self-reinforcing feedback loop incentivizing parallelization: as we practice collaborating on tasks, our toolbox of collaboration strategies gets fuller. Until we have a full toolbox, a lot of upcoming work can look unsuitable for collaboration, causing us to parallelize it preemptively and robbing ourselves of the opportunity to learn.

We can break this loop by committing not to parallelize preemptively. This means accepting some amount of pain and trying to find collaboration strategies to address that pain. Maybe it doesn't work and you'll fall back to parallelizing anyway. That's fine. The important thing is to try collaboration first.

How much do you collaborate on your team? How much do you parallelize work? Do you experience any tensions between the two?

Backlogs Aren't Free

Jasper Woudenberg — Thu, 21 Jan 2021 18:25:34 +0000

I don't know what I would do without some sort of work list. If I couldn't jot down the occasional "I need to do X" thought for later, I'm pretty sure I wouldn't get any work done! So whether you call it a todo-list, backlog, roadmap, queue, inbox, or rug (for sweeping things under), I need one.

Backlogs address a real need but also create a new issue: they grow. This might not seem a particularly big problem. Sure, it doesn't feel great to have this list of a hundred things we want to do that doubles in size every year, but does it really hurt anyone? Yes, it does, says Donald G. Reinertsen in his amazing book The Principles of Product Development Flow. He introduces yet another term for backlog, product development inventory, and claims its cost is massively underestimated.

Reinertsen identifies the following costs of product development inventory:

Longer cycle time. Cycle time is the time between a task's conception and completion. That includes the time we spend working on a task but also the time it sits in a backlog waiting. The longer the backlog, the longer the wait.
Increased risk. This follows from the previous point. Consider as an example the task of fixing a bug in a piece of software. Delaying the fix increases the damage the bug can do, gives the buggy code time to integrate with other code (making it harder to change and fix), and gives context about the buggy code time to seep away, increasing the cost of a fix.
More variability. When we're temporarily blocked on one task, a large backlog will always provide another one to work on instead. This creates a bias toward having lots of things going on at the same time, which makes it hard to bring focus to those few tasks which are a bit harder than the rest. Such tasks can drag on and result in sudden unexpected delays.
More overhead. Backlogs require prioritizing. Prioritizing tasks requires (re)gaining context on what these tasks are about. If someone else requested work they'll appreciate the occasional status update about how that work is going. The larger the backlog, the more of this type of work we have to do.
Lower quality. If a single step to improve a product takes more time (longer cycle time), then our feedback cycle will be longer too. Reducing the quality of feedback will hurt quality itself.
Less motivation. This one is most intuitive to me. Seeing my todo-list shrink makes me feel good about myself. Seeing it grow not so much.
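One way to put a number on "the longer the backlog, the longer the wait" is Little's law: on average, the time an item spends in a queue equals the average queue length divided by the rate at which items leave it. The sketch below assumes a steady throughput and first-in-first-out order, and the team size and numbers are made up for illustration.

```ruby
# Little's law: average wait = average queue length / throughput.
# Assumes a steady throughput and first-in-first-out processing.
def expected_wait_weeks(backlog_size, items_per_week)
  backlog_size.fdiv(items_per_week)
end

# A hypothetical team finishing 5 items a week:
expected_wait_weeks(20, 5)  # => 4.0
expected_wait_weeks(100, 5) # => 20.0
```

Nothing about the work itself changed between the two calls; only the queue in front of it grew.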

The upshot of all this: prefer shorter backlogs. Anyone in the market for a 'short backlog methodology' has their pick of gurus. Reinertsen presents a number of techniques in The Principles of Product Development Flow. But if you haven't chosen an approach yet, or if the approach you did pick isn't working for you, here's a backup option I quite like: throwing stuff away. Deleting tickets can be scary, but I find a long backlog scarier.

Don't DRY, CARE!

Jasper Woudenberg — Sun, 10 Jan 2021 14:10:29 +0000

If you had to describe how you try to write code in a single acronym, what would it be? I thought a bit about it, and the best I could come up with is CARE. It stands for: Code Aspires to Resemble Explanation. Whichever way I'd explain the purpose of a body of code, I want my code to look like that explanation.

I'm not talking about trying to make lines of code resemble grammatically correct English sentences. It's more about code and (hypothetical) explanation having a similar structure. For example, if your domain is writing utensils and your explanation clearly delineates different behaviors for pencils, ballpoints, and fineliners, do those categories enjoy high-level visibility in your code too? Or will you only come across these categories in the occasional conditional deep down in the implementation?
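A hypothetical Ruby sketch of what "high-level visibility" could look like for the writing utensils example: each category the explanation names becomes a class of its own, rather than a branch in a conditional deep inside some method. The behaviors (erasability, maintenance) are made up for illustration.

```ruby
# Each category from the (hypothetical) explanation gets its own class,
# so pencils, ballpoints, and fineliners are visible at the top level.
class Pencil
  def erasable?
    true
  end

  def maintenance
    "sharpen it"
  end
end

class Ballpoint
  def erasable?
    false
  end

  def maintenance
    "replace the refill"
  end
end

class Fineliner
  def erasable?
    false
  end

  def maintenance
    "replace the pen"
  end
end

# Code working with any utensil reads like the explanation:
# "for each utensil, tell me how to maintain it".
def maintenance_plan(utensils)
  utensils.map { |utensil| "#{utensil.class.name}: #{utensil.maintenance}" }
end
```

Here `maintenance_plan([Pencil.new, Fineliner.new])` returns `["Pencil: sharpen it", "Fineliner: replace the pen"]`, and someone reading the code meets the same three categories the explanation talks about.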

There's more than one way to explain most things, and so it follows that multiple people applying CARE to a problem might end up with very different results. I think that's fine.

Here are some smells that might indicate code could be more CAREful:

Some terminology appears only in explanation or only in code.
A single 'beat' in an explanation narrative requires coordination between multiple distinct bits of code.
The existence of comments explaining what is happening, suggesting the code doesn't explain itself well.

Inline & Extract

Jasper Woudenberg — Sat, 21 Nov 2020 08:48:49 +0000

                      extracting functions
                              -->

      MONOLITHIC                               FRAGMENTED
   small number of                           large number of
   large functions                           small functions

                              <--
                       inlining functions

No matter what you consider the sweet spot for function size, maintaining it requires extracting and inlining to take place with equal frequency.

In your day-to-day work, do you inline functions as often as you extract them?

Automatically OCR scanned PDFs in NixOS

Jasper Woudenberg — Sat, 14 Nov 2020 17:55:08 +0000

Luckily I'm receiving more and more letters by email these days, but I still get a fair amount of paper letters as well. These I scan and then throw away.

So that I can find these documents again when I need them, I run optical character recognition (OCR) on them after scanning. I can then use pdfgrep to search for a keyword in a directory of PDFs. That's so much easier than coming up with an organization scheme for these documents and applying it!

Here is how it works: My scanner is able to upload scanned files to a directory on a small server I'm running. When the server notices a new PDF in this directory it runs optical character recognition on it and then moves it to a different directory containing all the PDFs I ever scanned.

My scanner is a Brother ADS-1700w. The server is the smallest Hetzner Cloud instance (CX11) and costs me 3 euros a month. I use OCRmyPDF to run optical character recognition. The server is running NixOS and is deployed using morph. Finally, I'm using healthchecks.io to let me know when the setup breaks.

Below is the annotated Nix code that makes the whole thing work.

{ pkgs, ... }:

{
  # A systemd path unit. Path units can be used to start
  # other services when something happens on the file
  # system, like a file being created.
  systemd.paths.ocrmypdf = {
    enable = true;
    # Enable this unit automatically when the server starts.
    wantedBy = [ "multi-user.target" ];
    description = "Start ocr-ing when there's new work.";
    pathConfig = {
      # Activate when files appear in the /data/scans-to-ocr
      # directory. This is where our scanner should upload
      # scanned files!
      DirectoryNotEmpty = "/data/scans-to-ocr";
      # If /data/scans-to-ocr does not exist, create it.
      MakeDirectory = true;
    };
  };

  # The service that does the actual work of running OCR.
  systemd.services.ocrmypdf = {
    enable = true;
    description = "Run ocrmypdf in /data/scans-to-ocr.";
    serviceConfig = {
      # Explain to systemd that this service is a script,
      # not some long-running process it needs to keep
      # alive. If the script exits after it's done that's
      # fine, systemd will call it again if there's new
      # work!
      Type = "oneshot";
      # Now we define what to run when this service gets
      # activated.
      #
      # We can pass `ExecStart` a single command to execute,
      # but the work we want to do does not fit in a single
      # command. Instead we let Nix create a shell script,
      # then tell systemd to run that script.
      ExecStart = let
        script = pkgs.writeShellScriptBin "go-ocr" ''
          #!/usr/bin/env bash

          # Run our OCR logic in turn for each scanned file.
          for file in /data/scans-to-ocr/*; do
            # generates a standard file name containing the 
            # current date and some random characters.
            output="$(mktemp -u "/tmp/$(date +%Y%m%d)_XXX.pdf")"

            # Run ocrmypdf on the scanned file.
            # --output-type   generate regular PDFs rather
            #                 than PDF/A's, because PDF/A
            #                 conversion might fail, requiring
            #                 manual intervention.
            # --rotate-pages  puts pages right-side-up
            # --skip-text     makes it so ocrmypdf skips
            #                 pages in the PDF that already
            #                 have text content, instead of
            #                 failing.
            # --language      which languages OCR should
            #                 detect. A lot (all?) languages
            #                 seem to be available by
            #                 default. Run
            #                 `tesseract --list-langs`
            #                 to find out which.
            ${pkgs.ocrmypdf}/bin/ocrmypdf \
              --output-type pdf \
              --rotate-pages \
              --skip-text \
              --language nld+eng \
              "$file" \
              "$output" \
              && rm "$file" \
              && mv "$output" /docs

            # Let healthchecks.io know whether ocrmypdf
            # succeeded or failed.
            ${pkgs.curl}/bin/curl --retry 3 \
              https://hc-ping.com/YOUR_UUID/$?
          done
        '';
      in "${script}/bin/go-ocr";
    };
  };

  # For healthchecks.io to mark a check as healthy it needs
  # to receive a periodic update, but we might not scan any
  # documents for days on end. The cron job below will ping
  # healthchecks.io once an hour, but only if the
  # /data/scans-to-ocr directory is empty, indicating
  # ocrmypdf has processed every scan it was given.
  services.cron.enable = true;
  services.cron.systemCronJobs = [
    ("0 * * * *      root    "
      + "ls -1qA /data/scans-to-ocr/ | grep -q . "
      + "|| curl -fsS -m 10 --retry 5 "
      + "-o /dev/null https://hc-ping.com/YOUR_UUID")
  ];

}

API design for code quality

Jasper Woudenberg — Sat, 31 Oct 2020 11:03:34 +0000

I so appreciate a thoughtful API. Working with a great API feels like using a super power: I'm productive, feeling good about the quality of the code I write, and just happy.

Designing a good API is hard work. Above all it requires data about different things people want to do with the API. The more the better. Such data provides an objective measure of how useful the API is.

I'm often tempted to create internal APIs, hoping they will bring productivity and joy to working in part of the code. This is hard when the small group working with the code doesn't produce enough data to support good API design.

Which isn't to say we should give up on code quality, just that API design often might not be the right approach. When writing code used by a small group of people, techniques like data modeling can achieve better results.

Writing RSpec tests for great debugging experiences

Jasper Woudenberg — Sun, 20 Sep 2020 14:52:20 +0000

The past couple of months I've worked a lot in a legacy codebase. We are lucky to have an extensive test suite, which helps our efforts to make large changes immensely. At the same time, working with these tests has been frustrating. It's clear some failing tests provide better debugging experiences than others.

My team has been working with code that has seen little development in a couple of years. Now that we return to it we need to onboard ourselves. Consider this post an alternative RSpec style guide, containing practices I will argue are beneficial for these archeologist-developers.

There are other things you might optimize tests for, though, so you might make different decisions, and that's okay.

Do write the test description as a single string

Which of these styles of writing a test description do you prefer?

describe("a boat") do
  context("with a rudder") do
    it("can steer") do ... end
  end
end

it("a boat with a rudder can steer") do ... end

The test description will be super important to our future selves. We'll need it to understand how we broke the test, how we may change the test without defeating its purpose, or when we may delete the test. It's harder to read the description if it's split up, more so if there's other code between the segments.

RSpec has rules for writing test description segments. If we follow these rules RSpec can glue these sentence fragments into a full sentence nicely. But following these rules ensures the resulting test description is grammatically correct, not that the description is any good.

The whole-sentence approach has a larger chance of delivering a coherent test description to our future selves intact. For starters, it's easier to write a good test description if we're not at the same time tasked to figure out how to reuse bits of it between tests. And secondly, if we're tweaking an existing test description we can do a better job if we can read the sentence in its entirety.

Avoid `let` bindings

Can you tell whether this test will pass?

describe("taglines") do
  let(:sentence) { "Slugs: #{description}" }
  let(:description) { "the #{adjective} frontier" }
  let(:adjective) { "slimiest" }

  shared_examples_for("TNG intro") do
    let(:adjective) { "final" }
    it("introduces") do
      expect(sentence).to eq("Space, the final frontier.")
    end
  end

  context("sci-fi") do
    let(:sentence) { "Space, #{description}." }
    let(:adjective) { "quietest" }

    it_behaves_like "TNG intro"
  end
end

I thought up this not-so-great example to make a point, but it's not even that bad. A similar test in a real suite intermingles with code from other test cases and might span several files. That's way worse!

The problem is that let bindings are global variables. Global to a single test to be precise, but when we're debugging just the one test that's the same thing. I don't know any languages or frameworks that recommend extensive use of global variables, except for test frameworks.

I believe we should aspire for test code to have the same quality as production code and for that we need to apply the same practices. Most languages either disallow global variables entirely, warn you when you use them, or heavily discourage their use. Tests will be better for doing the same.

We can use regular Ruby variables instead of let bindings, except that we can't pass those between the test body and hooks like before and after. That brings us to the next practice.

Avoid `before` and `after` hooks

Let's look at a test using some hooks.

let(:door) { Door.new }

before(:each) do
  open_door door
end

after(:each) do
  close_door door
end

it("can go through a door") do
  move_through_door door
end

Suppose the "can go through a door" test fails and we're investigating. First we try to figure out what the test does. That's doable in the example above, but harder in a real test suite where the pieces that make up a single test are far apart, separated by code used by other tests. We'll be scrolling through the larger suite, figuring out which bits of code the failing test uses, and trying to assemble these pieces in our minds.

Often when I'm trying to assemble a mental model of a test this way my brain doesn't quite feel large enough to contain it all. I'm tempted to print the entire test suite, and use a marker on the lines that are relevant to the test I'm investigating. These would be the lines I'd mark in the example:

let(:door) { Door.new }
  open_door door
  close_door door
it("can go through a door") do
  move_through_door door

Hang on a moment, if we squint a bit that almost looks like a valid test. Let's clean that up.

it("can go through a door") do
  door = Door.new
  open_door door
  move_through_door door
  close_door door
end

To me, this is a huge improvement over the test we had before. When a test written in this style fails I can skip the puzzle-solving phase and go straight to debugging.

Suppose creating a door is a bit more involved, and we'd like to reuse the door creation logic in a couple of tests without repeating ourselves. In that case we can use a function:

it("can go through a door") do
  door = create_test_door
  open_door door
  move_through_door door
  close_door door
end

it("can knock on a door") do
  door = create_test_door
  knock_on door
end

def create_test_door
  Door.new(
    material: :wood,
    locked: false,
  )
end

But wait, this is splitting up the test code. Should we get our markers out again? I don't think so, for two reasons. First, create_test_door is explicitly called from the body of the test, so that test body still gives a good summary of everything the test does. Second, the function we created has a self-descriptive name, so we don't need to look at its implementation until we have a question related to door creation.

Do test whether your matchers are providing nice error messages

Ideally the test description and error message are all we need to understand what is broken in our code. A great error report allows us to move on to figuring out why the code is broken, and then fixing it.

In practice error messages can be cryptic, requiring us to interpret them. Interpretation can be quick if we're familiar with the failing test, but we can't count on our future selves having that familiarity.

In RSpec the choice of matcher has a big impact on error quality, and it's easy to make not-so-great choices. Take the following example:

it("George III and George IV are the same") do
  Monarch =
    Struct.new(
      :title,
      :first_name,
      :full_name,
      :number,
      :date_of_birth,
      :place_of_birth,
      :date_of_death,
      :place_of_death,
      :buried_at,
    )
  george3 =
    Monarch.new(
      "King of the United Kingdom of Great Britain and Ireland",
      "George",
      "George William Frederich",
      "III",
      "4 June 1738",
      "Norfolk House, St James's Square, London, England",
      "29 January 1820",
      "Windsor Castle, Windsor, Berkshire, England",
      "St George's Chapel, Windsor Castle"
    )
  george4 =
    Monarch.new(
      "King of the United Kingdom of Great Britain and Ireland",
      "George",
      "George Augustus Frederick",
      "IV",
      "12 August 1762",
      "St James's Palace, London, England",
      "26 June 1830",
      "Windsor Castle, Windsor, Berkshire, England",
      "St George's Chapel, Windsor Castle"
    )
  expect(george3).to eq(george4)
end

This test will fail with the following error.

  1) George III and George IV are the same
     Failure/Error: expect(george3).to eq(george4)

       expected: #<struct Monarch title="King of the United Kingdom of Great Britain and Ireland", first_name="George"...death="Windsor Castle, Windsor, Berkshire, England", buried_at="St George's Chapel, Windsor Castle">
            got: #<struct Monarch title="King of the United Kingdom of Great Britain and Ireland", first_name="George"...death="Windsor Castle, Windsor, Berkshire, England", buried_at="St George's Chapel, Windsor Castle">

       (compared using ==)

That's not great. The test fails because the expected and actual values are not the same, but the report makes it look like they are. This kind of error has sent me looking for the problem in entirely the wrong direction.

Fixing it isn't entirely trivial either. I had to try a couple of improvements before finding one that worked.

Using have_attributes instead of eq: doesn't work with Structs.
Calling .to_s on the monarchs before passing them to eq: no improvement.
Calling .to_h on the monarchs before passing them to eq: 🎉 a diff!
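Struct#to_h turns each monarch into a Hash, which gives RSpec's eq something it can diff key by key; in the test itself the change is just expect(george3.to_h).to eq(george4.to_h). As a plain-Ruby illustration of what such a diff surfaces, the sketch below uses a trimmed, hypothetical version of the Monarch struct and a hypothetical differing_fields helper that lists only the fields that differ.

```ruby
# Trimmed-down, hypothetical version of the Monarch struct from the test above.
Monarch = Struct.new(:first_name, :number, :buried_at)

george3 = Monarch.new("George", "III", "St George's Chapel, Windsor Castle")
george4 = Monarch.new("George", "IV", "St George's Chapel, Windsor Castle")

# Comparing the structs as hashes, field by field, shows what a diff highlights.
def differing_fields(expected, actual)
  expected.to_h.reject { |field, value| actual.to_h[field] == value }.keys
end

differing_fields(george3, george4) # => [:number]
```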

Let's look at another example. eq sometimes produces bad errors, but contain_exactly always produces bad errors. Take this test:

it("fruit salad contains the right ingredients") do
  Ingredient = Struct.new(:name, :grams)
  fruit_salad = [
    Ingredient.new("mango", 400),
    Ingredient.new("pineapple", 300),
    Ingredient.new("coconut flakes", 50),
  ]
  recipe = [
    Ingredient.new("mango", 400),
    Ingredient.new("pineapple", 200),
    Ingredient.new("coconut flakes", 50),
  ]
  expect(fruit_salad).to contain_exactly(*recipe)
end

The test fails with the error below:

1) fruit salad contains the right ingredients
   Failure/Error: expect(fruit_salad).to contain_exactly(*recipe)

     expected collection contained:  ["#<struct Ingredient
name=\"coconut flakes\", grams=50>", "#<struct Ingredient
name=\"mango\", grams=400>", "#<struct Ingredient
name=\"pineapple\", grams=200>"]
     actual collection contained:    [#<struct Ingredient
name="mango", grams=400>, #<struct Ingredient
name="pineapple", grams=300>, #<struct Ingredient
name="coconut flakes", grams=50>]
     the missing elements were:      ["#<struct Ingredient
name=\"pineapple\", grams=200>"]
     the extra elements were:        [#<struct Ingredient
name="pineapple", grams=300>]

The test is failing because we added the wrong amount of pineapple, but it takes some effort to parse that out of the error report. contain_exactly errors get worse as the items in the arrays we're comparing grow more complex.

Instead of using contain_exactly we might group the ingredients by name, check we have the right ingredients, then for each ingredient separately check we have the right amounts. That's more work up front for better error messages when tests fail, a trade-off.
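A rough sketch of that approach in plain Ruby, assuming each ingredient name appears at most once per list. The ingredient_problems helper and its message format are hypothetical. In a test one might write expect(ingredient_problems(fruit_salad, recipe)).to eq([]), so that a failure prints only the readable mismatches rather than three full structs.

```ruby
Ingredient = Struct.new(:name, :grams)

# Group both lists by name, then compare names first and amounts second.
# Returns human-readable problems; an empty array means the lists match.
# Assumes each ingredient name appears at most once per list.
def ingredient_problems(actual, expected)
  actual_grams = actual.to_h { |ingredient| [ingredient.name, ingredient.grams] }
  expected_grams = expected.to_h { |ingredient| [ingredient.name, ingredient.grams] }

  missing = (expected_grams.keys - actual_grams.keys).map { |name| "missing #{name}" }
  extra = (actual_grams.keys - expected_grams.keys).map { |name| "unexpected #{name}" }
  amounts = (expected_grams.keys & actual_grams.keys).filter_map do |name|
    next if actual_grams[name] == expected_grams[name]

    "#{name}: expected #{expected_grams[name]} g, got #{actual_grams[name]} g"
  end

  missing + extra + amounts
end
```

For the fruit salad above this returns ["pineapple: expected 200 g, got 300 g"].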

As things stand we have to fail our tests intentionally to learn what their error messages might look like, so writing RSpec tests with good errors takes commitment and experimentation. I don't have a style-guide-like tip that will help test authors prevent poor matcher usage, but I do think there are a couple of things RSpec could improve:

Improve errors generated by matchers. For example, let eq produce a diff when comparing Structs.
Remove matchers that cannot produce good error messages. For example: contain_exactly.
Warn when we use a matcher in a way that will lead to poor error messages. For example, warn when we pass eq values of types for which it cannot produce good diffs.

Conclusion

I've shown a couple of practices I believe improve the experience of debugging RSpec tests. It's interesting to note a lot of them come down to using plain Ruby language features over RSpec ones. What do you think of that? Would you miss these RSpec features? What do you like about them? I'd love to hear!

Using Shake Oracles

Jasper Woudenberg — Fri, 15 May 2020 10:18:49 +0000

In a previous blog post I argued for the use of the Shake build system, and for writing Shake rules like they are recipes. I also recommended the use of newCache, addOracle, and addOracleCache in situations where it's tricky to write rules as recipes. newCache, addOracle, and addOracleCache are super useful, but they have imposing types, not the most helpful names, and they look similar, making it hard to know which one we need. In this post we're going to look at each in a little more detail. Let's start by briefly introducing these functions.

Three functions with different uses

newCache allows us to memoize a function. A memoized function stores all the results it calculates. When a memoized function gets called with arguments it has seen before, it returns an earlier calculated result rather than repeating work. We can use newCache to prevent duplicate work during a build but not across builds, because Shake throws away memoized results after each run.

addOracle allows us to define dependencies that aren't files. Say that we want a rule to rebuild when the date changes. We can write some Haskell to get the current date and make it available as a dependency using addOracle. These oracles run on every build because Shake needs to know if the values they return have changed, in which case the rules that depend on them need to be re-evaluated.

addOracleCache, despite what its name implies, has a use case different from either newCache or addOracle. We can use it to create a rule that produces a Haskell value instead of a file. Suppose we have an expensive calculation, like recursively finding all the dependencies of a source file. We could define a regular build rule using %> that performs the calculation and stores the result in a sources.json file. Other rules could need ["sources.json"], decode its contents and use the result. addOracleCache allows us to do the same thing without the encoding and decoding steps. Like other build rules and unlike addOracle, a rule defined using addOracleCache reruns if any of its dependencies changes.

To summarize what these three functions do before looking at each in more detail:

newCache memoizes a function call within the current build.
addOracle defines dependencies that aren’t files.
addOracleCache creates a rule that produces a Haskell value instead of a file.

Now let's look at these in more detail.

How to use `newCache`

With newCache we can create a memoized function: a function that, when called several times with the same argument, runs just once. The memoized function stores the result of the first run for use in future calls.

newCache has the following type.

newCache :: (Eq k, Hashable k) =>
            (k -> Action v) -> Rules (k -> Action v)
             ^^^^^^^^^^^^^            ^^^^^^^^^^^^^
             the function             the memoized
             to memoize               function

If we skip over the type constraints before the => (we'll get to them in a moment!), then we can see that newCache takes a function and then returns a memoized version of it. Wherever we were using the original function we can use the memoized version too, because they have the same type.

To add memoization behavior Shake needs to know whether we called the function with a particular argument before. That requires comparing the input argument with input arguments from previous calls, and so we need to insist the input argument is of a type that allows such comparisons. Shake does this by requiring the input type to meet the Eq and Hashable constraints.
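To illustrate just the memoization idea (this is not how Shake implements newCache), here is a plain-Ruby sketch with a hypothetical memoize helper. Ruby's Hash finds keys using #hash and #eql?, which plays roughly the role that Hashable and Eq play for newCache's argument type. Like newCache's results, the cache only lives as long as the memoized function does.

```ruby
# Wrap a one-argument callable so each distinct argument is computed once.
# Results live in a Hash, which locates keys via #hash and #eql?.
def memoize(fn)
  cache = {}
  ->(arg) { cache.fetch(arg) { cache[arg] = fn.call(arg) } }
end

calls = 0
slow_square = ->(n) { calls += 1; n * n }
fast_square = memoize(slow_square)

fast_square.call(4) # => 16 (computed)
fast_square.call(4) # => 16 (served from the cache)
calls               # => 1
```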

One tricky situation where newCache can help us out is when writing a rule for a command that produces more than one file, but where we don't want to hardcode which files it produces. Suppose we have a script generate-schemas.sh that generates JSON schemas for common types.

rules :: Rules ()
rules = do
  "schemas/*.schema.json" %> \_ -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs

The rule above works but is pretty inefficient. generate-schemas.sh creates all the schema files in a single run, but the rule above will rerun it every time we need a different schema. Let's use newCache to remove this duplication of work.

rules :: Rules ()
rules = do
  generateSchemasCached <- newCache $ \() -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs

  "schemas/*.schema.json" %> \_ -> generateSchemasCached ()

Now if more than one schema file is needed during a run, generate-schemas.sh runs just once. Later runs might run generate-schemas.sh again, because newCache doesn't save results across builds. That's good though, because we might have deleted schema files in the interim.

How to use `addOracle`

Rules can signal they have a dependency on one or more files using need. This causes the rule to rebuild if the file they depend on changes. But what if you'd like your rule to depend on the day of the week, or the version of a tool? Using addOracle you can define such dependencies.

Say we have a rule that creates a letter from a template.

rules :: Rules ()
rules = do
  "letter.txt" %> \out -> do
    date <- liftIO Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

Imagine we're working on this until late in the evening. The next day we want to post it, so we run Shake one more time to get today's date on there. Shake skips the update though because it doesn't know it should rebuild the letter when the date changes.

If we had a file somewhere on our computer that always contained the current date, then we could need that. As it stands, the current date is not a file but the result of a call to the (made-up) Date.current function. We can use addOracle to turn it into a dependency.
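One way to picture what an oracle adds is a toy build database. It records the oracle's answer alongside each output and rebuilds only when that answer changes. This is a simplified model, not Shake's implementation. Built, renderLetter and buildLetter are hypothetical names, and renderLetter stands in for the templateer command.

```haskell
-- | What the toy build database stores for the letter target.
data Built = Built
  { builtWithDate :: String  -- oracle answer recorded at build time
  , letterText    :: String  -- the generated letter
  } deriving (Show, Eq)

-- | Hypothetical stand-in for running templateer on letter.template.
renderLetter :: String -> String
renderLetter date = "Dear reader, today is " ++ date ++ "."

-- | Returns whether the rule ran, together with the (possibly reused) result.
-- The rule is skipped only when the recorded answer equals today's answer.
buildLetter :: Maybe Built -> String -> (Bool, Built)
buildLetter (Just previous) today
  | builtWithDate previous == today = (False, previous)  -- answer unchanged: skip
buildLetter _ today = (True, Built today (renderLetter today))
```

Building twice on the same day reuses the stored letter. Building again the next day sees a different answer and reruns the rule. Without the oracle, the date never enters this comparison, which is exactly the problem with the original rule.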

Let's start again by looking at addOracle's type. Notice that, apart from the function's constraints (the part of the type before the =>), addOracle has the exact same type as newCache.

addOracle :: (RuleResult q ~ a, ShakeValue q, ShakeValue a) =>
             (q -> Action a) -> Rules (q -> Action a)
              ^^^^^^^^^^^^^            ^^^^^^^^^^^^^

type ShakeValue a = (Show a, Typeable a, Eq a, Hashable a, Binary a, NFData a)

Like newCache, addOracle takes a function and returns a function of the same type. This new function adds a dependency to any rule that calls it. Shake requires a bunch of constraints on the function's argument type q and return type a to pull this off. Check out the ShakeValue documentation if you're interested in learning what these constraints are for. We'll see in a moment what RuleResult q ~ a is about.
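ShakeValue is a constraint synonym. It is one name that stands for a tuple of class constraints, which GHC allows with the ConstraintKinds extension. Below is a minimal sketch of the same idea that uses only classes from base. LoggableKey and describeChange are made-up names.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- | A constraint synonym in the style of ShakeValue, trimmed to two classes
-- from base. Writing `LoggableKey a =>` means `(Show a, Eq a) =>`.
type LoggableKey a = (Show a, Eq a)

-- | Uses both bundled constraints: Eq to compare keys, Show to print them.
describeChange :: LoggableKey a => a -> a -> String
describeChange old new
  | old == new = "unchanged: " ++ show new
  | otherwise  = "changed: " ++ show old ++ " -> " ++ show new
```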

We can use addOracle to fix our letter templating rule like so.

rules :: Rules ()
rules = do
  currentDate <- addOracle $ \CurrentDate -> liftIO Date.current

  "letter.txt" %> \out -> do
    date <- currentDate CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate

instance Binary CurrentDate

instance NFData CurrentDate

type instance RuleResult CurrentDate = String

We made a similar change when we introduced newCache: We extract into a function the lines we want to add special behavior to (memoization behavior in case of newCache, dependency-tracking behavior now). We wrap our extracted function using the right helper, and use the function it returns in our rule.

What's different this time is that Shake requires each oracle created using addOracle or addOracleCache to have a unique input type (we'll see why in a moment). We create one called CurrentDate, deriving Show and Eq directly and deriving Generic so that the Hashable, Binary, and NFData instances Shake requires can use their generic default implementations. We also have to tell Shake that the oracle associated with the CurrentDate input type always returns a String result type (the return value of our imaginary Date.current function).

Intermezzo: What's up with these boilerplate types?

The reason oracles' argument types need to be unique, and the reason for the RuleResult q ~ a constraint, is to support the askOracle function in Shake's API. Using it, our letter templating example looks like this:

rules :: Rules ()
rules = do
  void . addOracle $ \CurrentDate -> liftIO Date.current

  "letter.txt" %> \out -> do
    date <- askOracle CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate

instance Binary CurrentDate

instance NFData CurrentDate

type instance RuleResult CurrentDate = String

We can pass askOracle a value of any type we have defined an oracle for. Because all our oracles have unique input types, askOracle can figure out which one to call. And because we have explicitly defined the return type for each oracle input type using the RuleResult type family, the type checker knows the type of askOracle SomeOracleType.
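We can reproduce the type-level part of this with base alone. In the sketch below, the open type family Answer plays the role of RuleResult. The equality constraint on ask has the same shape as askOracle's RuleResult q ~ a. The Question class is an assumption of this toy only. Shake looks up the oracle registered by addOracle at run time rather than going through a type class like this. So read the sketch as an illustration of how the result type is determined, not of how askOracle finds the right oracle.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- | Plays the role of RuleResult: maps each question type to its answer type.
type family Answer q

-- | Toy stand-in for registered oracles (not how Shake dispatches).
class Question q where
  answer :: q -> Answer q

data CurrentDate = CurrentDate deriving (Show, Eq)
type instance Answer CurrentDate = String
instance Question CurrentDate where
  answer CurrentDate = "2021-08-03"  -- hard-coded instead of reading a clock

newtype LineCount = LineCountOf String deriving (Show, Eq)
type instance Answer LineCount = Int
instance Question LineCount where
  answer (LineCountOf text) = length (lines text)

-- | Same shape as askOracle's `RuleResult q ~ a`: the result type a must be
-- whatever Answer q reduces to, so callers never annotate it.
ask :: (Question q, Answer q ~ a) => q -> a
ask = answer
```

With these definitions, ask CurrentDate has type String and ask (LineCountOf "a\nb") has type Int, with no annotations needed at the call site.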

The whole thing is pretty magical, so I like to wrap up oracles in an API that exposes regular functions. As an example, we could wrap up the date oracle like this:

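{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

module Rules.Date (rules, current) where

import Control.Monad (void)
import Control.Monad.IO.Class (liftIO)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)
import qualified Date -- our made-up date module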

current :: Action String
current = askOracle Current

rules :: Rules ()
rules =
  void . addOracle $ \Current -> currentOracle

currentOracle :: Action String
currentOracle = liftIO Date.current

data Current = Current
  deriving (Show, Eq, Generic)

instance Hashable Current

instance Binary Current

instance NFData Current

type instance RuleResult Current = String

Our letter templating example could use this module like so.

rules :: Rules ()
rules = do
  Rules.Date.rules

  "letter.txt" %> \out -> do
    date <- Rules.Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

How to use `addOracleCache`

Suppose we have a project that contains several Elm applications. The Elm applications share some modules between them. If we change an Elm module we'd like Shake to recompile just those projects that use the module. We could write a rule like this:

rules :: Rules ()
rules = do
  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    srcFiles <- recursiveDependencies main
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: FilePath -> Action [FilePath]
recursiveDependencies src = do
  direct <- directDependencies src
  recursive <- traverse recursiveDependencies direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  need [src]
  contents <- liftIO (readFile src)
  pure $ Elm.imports (Elm.parse contents)

This works, but it's not super efficient. For each Elm entrypoint it recalculates the full dependency tree. It would be nice if, after calculating the dependencies of a particular Elm module, we could reuse that result in future builds until any of the module's recursive dependencies change.
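To see the duplicated work, picture two entrypoints that both import a shared module. The toy below is pure Haskell. It uses a hypothetical import graph given as an association list, assumed acyclic as Elm requires. It threads a cache through the traversal so that shared modules are walked only once. Note that this only reuses work within a single run, which is roughly what newCache would give us. Keeping results between builds is what addOracleCache adds.

```haskell
import Data.List (nub)
import Data.Maybe (fromMaybe)

-- | Hypothetical import graph: each module with the modules it imports directly.
type ImportGraph = [(FilePath, [FilePath])]

-- | Modules whose recursive dependencies have already been computed.
type DepCache = [(FilePath, [FilePath])]

-- | Recursive dependencies of a module (the module itself first), reusing
-- and extending the cache so modules shared between entrypoints are walked
-- only once. Assumes the graph is acyclic, as Elm's import graph must be.
recursiveDeps :: ImportGraph -> DepCache -> FilePath -> ([FilePath], DepCache)
recursiveDeps graph cache src =
  case lookup src cache of
    Just deps -> (deps, cache)  -- computed earlier: reuse
    Nothing ->
      let direct = fromMaybe [] (lookup src graph)
          step (acc, c) m =
            let (ds, c') = recursiveDeps graph c m in (acc ++ ds, c')
          (transitive, cache') = foldl step ([], cache) direct
          deps = nub (src : transitive)  -- a module reached twice is listed once
      in (deps, (src, deps) : cache')
```

If we compute Student.elm after Admin.elm, the traversal reuses the cached entry for Shared.elm instead of walking it again.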

The function recursiveDependencies looks a lot like a rule. The result it produces is a list of file paths corresponding to Elm modules. Its dependencies are the contents of those Elm modules, because a change in an Elm module might mean that its imports have changed, requiring a recalculation of the dependency tree. Let's use addOracleCache to write it as a rule.

rules :: Rules ()
rules = do
  void $ addOracleCache recursiveDependencies

  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    srcFiles <- askOracle (RecursiveDependenciesFor main)
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: RecursiveDependencies -> Action [FilePath]
recursiveDependencies (RecursiveDependenciesFor src) = do
  direct <- directDependencies src
  recursive <- traverse (askOracle . RecursiveDependenciesFor) direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  contents <- readFile' src -- This takes a dependency on `src`.
  pure $ Elm.imports (Elm.parse contents)

newtype RecursiveDependencies
  = RecursiveDependenciesFor FilePath
  deriving (Show, Eq, Hashable, Binary, NFData) -- needs GeneralizedNewtypeDeriving

type instance RuleResult RecursiveDependencies = [FilePath]

Done! Now we'll cache the calculation of module dependencies between builds. One further possible optimization would be to use addOracleCache to turn directDependencies into a build rule as well. That way changes to Elm modules that don't touch imports won't trigger recalculation of a module's dependencies. Give it a try!
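The idea behind that suggestion is early cutoff. If an edit to a module leaves its import list unchanged, the recursive dependency results built on top of that list stay valid. Here is a rough sketch. importsOf is a hypothetical, much-simplified stand-in for the post's made-up Elm.imports (Elm.parse contents).

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical, much-simplified stand-in for Elm.imports (Elm.parse contents):
-- the module name after each line that starts with "import ".
importsOf :: String -> [String]
importsOf contents =
  [ takeWhile (/= ' ') (drop (length "import ") line)
  | line <- lines contents
  , "import " `isPrefixOf` line
  ]

-- | Early cutoff in miniature: an edit matters to the dependency calculation
-- only if it changes the list of imports.
importsChanged :: String -> String -> Bool
importsChanged before after = importsOf before /= importsOf after
```

Changing a function body leaves importsOf unchanged, so nothing that depends only on the import list would need recomputing. Adding an import does change it.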

It's worth emphasizing that although addOracleCache has an identical type to addOracle, it behaves quite differently. Remember that addOracle is for defining dependencies. Shake runs addOracle functions pre-emptively to check if their return values have changed. Had we used addOracle here, performance would be worse than the non-oracle-based version of the code we started with, because Shake would rerun it even if none of the Elm source files in the entire project had changed.
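We can model the difference in rerun policy in a few lines. This is a toy, not Shake's code. Helper, Recorded and mustRerun are made-up names. The recorded dependencies are simplified to the contents of each file the function read last time.

```haskell
-- | Which helper an oracle was defined with (toy model, not Shake code).
data Helper
  = Oracle       -- like addOracle: rerun every build, then compare answers
  | OracleCache  -- like addOracleCache: rerun only when a recorded dependency changed
  deriving (Show, Eq)

-- | What the toy build database remembers about a question: the contents of
-- each file its function read during the previous run.
newtype Recorded = Recorded [(FilePath, String)]

-- | Does the question's function have to run again in this build?
mustRerun :: Helper -> [(FilePath, String)] -> Maybe Recorded -> Bool
mustRerun _ _ Nothing = True        -- never computed before
mustRerun Oracle _ (Just _) = True  -- addOracle-style: always rerun
mustRerun OracleCache filesNow (Just (Recorded deps)) =
  any (\(path, old) -> lookup path filesNow /= Just old) deps
```

When the files are unchanged, the Oracle policy still reruns, while OracleCache does not. If we edit Main.elm, both policies rerun.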

Closing thoughts

I hope this post has been helpful in understanding when and how to use Shake's newCache, addOracle, and addOracleCache functions. One final tip: make your oracle types nice and verbose, because Shake uses them in its logs. That will make debugging oracles easier.
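Suppose keys show up in Shake's output via their Show instance (Show is one of the ShakeValue constraints). Then a derived Show already gives a descriptive log line, as long as the type and constructor names are descriptive. For comparison, Deps below is a hypothetical terse alternative.

```haskell
-- | Two hypothetical names for the same oracle question type.
newtype Deps = Deps FilePath
  deriving (Show, Eq)

newtype RecursiveDependencies = RecursiveDependenciesFor FilePath
  deriving (Show, Eq)
```

show (Deps "src/Main.elm") gives Deps "src/Main.elm". The verbose type shows as RecursiveDependenciesFor "src/Main.elm", which says much more about what is being computed.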

That's it. Happy shaking!