
Writing RSpec tests for great debugging experiences

Jasper Woudenberg · 7 min read

The past couple of months I've worked a lot in a legacy codebase. We are lucky to have an extensive test suite, which helps our efforts to make large changes immensely. At the same time, working with these tests has been frustrating. It's clear some failing tests provide better debugging experiences than others.

My team has been working with code that has seen little development in a couple of years. Now that we return to it we need to onboard ourselves. Consider this post an alternative RSpec style guide, containing practices I will argue are beneficial for these archeologist-developers.

There are other things you might optimize tests for, though, so you might make different decisions, and that's okay.

Do write the test description as a single string

Which of these styles of writing a test description do you prefer?

describe("a boat") do
  context("with a rudder") do
    it("can steer") do ... end
  end
end

it("a boat with a rudder can steer")

The test description will be super important to our future selves. We'll need it to understand how we broke the test, how we may change the test without defeating its purpose, or when we may delete the test. It's harder to read the description if it's split up, more so if there's other code between the segments.

RSpec has rules for writing test description segments. If we follow these rules RSpec can glue these sentence fragments into a full sentence nicely. But following these rules ensures the resulting test description is grammatically correct, not that the description is any good.

The whole-sentence approach has a larger chance of delivering a coherent test description to our future selves intact. For starters, it's easier to write a good test description if we're not at the same time tasked to figure out how to reuse bits of it between tests. And secondly, if we're tweaking an existing test description we can do a better job if we can read the sentence in its entirety.

Avoid let bindings

Can you tell whether this test will pass?

describe("taglines") do
  let(:sentence) { "Slugs: #{description}" }
  let(:description) { "the #{adjective} frontier" }
  let(:adjective) { "slimiest" }

  shared_examples_for("TNG intro") do
    let(:adjective) { "final" }
    it("introduces") do
      expect(sentence).to eq("Space, the final frontier.")
    end
  end

  context("sci-fi") do
    let(:sentence) { "Space, #{description}." }
    let(:adjective) { "quietest" }

    it_behaves_like "TNG intro"
  end
end

I thought up this not-so-great example to make a point, but it's not even that bad. A similar test in a real suite intermingles with code from other test cases and might cover several files. That's way worse!

The problem is that let bindings are global variables. Global to a single test to be precise, but when we're debugging just the one test that's the same thing. I don't know any languages or frameworks that recommend extensive use of global variables, except for test frameworks.

I believe we should aspire for test code to have the same quality as production code and for that we need to apply the same practices. Most languages either disallow global variables entirely, warn you when you use them, or heavily discourage their use. Tests will be better for doing the same.
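For contrast, here's the tagline logic from the example above written with plain local variables. This is a sketch in plain Ruby rather than a full spec, so it runs standalone; inside an it block the body would look the same, minus the final check being an expect.

```ruby
# The same tagline logic as the `let`-based example, but with plain local
# variables. The test body reads top to bottom, and nothing defined
# elsewhere in the suite can silently override a value.
adjective = "final"
description = "the #{adjective} frontier"
sentence = "Space, #{description}."

# In the spec this would be: expect(sentence).to eq("Space, the final frontier.")
puts sentence  # => Space, the final frontier.
```

Whether this test passes is now answerable by reading five consecutive lines.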

We can use regular Ruby variables instead of let bindings, except that we can't pass them between the test body and hooks like before and after. That brings us to the next practice.

Avoid before and after hooks

Let's look at a test using some hooks.

let(:door) { Door.new }

before(:each) do
  open_door door
end

after(:each) do
  close_door door
end

it("can go through a door") do
  move_through_door door
end

Suppose the can go through a door test fails and we're investigating. First we try to figure out what the test does. That's doable in the example above, but harder in a real test suite where the pieces that make up a single test are far apart, separated by code used by other tests. We'll be scrolling through the larger suite, figuring out which bits of code the failing test uses, and trying to assemble these pieces in our minds.

Often when I'm trying to assemble a mental model of a test this way my brain doesn't quite feel large enough to contain it all. I'm tempted to print the entire test suite, and use a marker on the lines that are relevant to the test I'm investigating. These would be the lines I'd mark in the example:

let(:door) { Door.new }
  open_door door
  close_door door
it("can go through a door") do
  move_through_door door

Hang on a moment, if we squint a bit that almost looks like a valid test. Let's clean that up.

it("can go through a door") do
  door = Door.new
  open_door door
  move_through_door door
  close_door door
end

To me, this is a huge improvement over the test we had before. When a test written in this style fails I can skip the puzzle-solving phase and go straight to debugging.

Suppose creating a door is a bit more involved, and we'd like to reuse the door creation logic in a couple of tests without repeating ourselves. In that case we can use a function:

it("can go through a door") do
  door = create_test_door
  open_door door
  move_through_door door
  close_door door
end

it("can knock on a door") do
  door = create_test_door
  knock_on door
end

def create_test_door
  Door.new(
    material: :wood,
    locked: false,
  )
end

But wait, this is splitting up the test code. Should we get our markers out again? I don't think so, for two reasons. First, create_test_door is explicitly called from the body of the test, so the test body still gives a good summary of everything the test does. Second, the function we created has a self-descriptive name, so we don't need to look at its implementation until we have a question related to door creation.

Do test whether your matchers are providing nice error messages

Ideally, the test description and error message are all we need to understand what is broken in our code. A great error report allows us to move on to figuring out why the code is broken, and then fixing it.

In practice error messages can be cryptic, requiring us to interpret them. Interpretation can be quick if we're familiar with the failing test, but we can't count on our future selves having that familiarity.

In RSpec the choice of matcher has a big impact on error quality, and it's easy to make not-so-great choices. Take the following example:

it("George III and George IV are the same") do
  Monarch =
    Struct.new(
      :title,
      :first_name,
      :full_name,
      :number,
      :date_of_birth,
      :place_of_birth,
      :date_of_death,
      :place_of_death,
      :buried_at,
    )
  george3 =
    Monarch.new(
      "King of the United Kingdom of Great Britain and Ireland",
      "George",
      "George William Frederick",
      "III",
      "4 June 1738",
      "Norfolk House, St James's Square, London, England",
      "29 January 1820",
      "Windsor Castle, Windsor, Berkshire, England",
      "St George's Chapel, Windsor Castle"
    )
  george4 =
    Monarch.new(
      "King of the United Kingdom of Great Britain and Ireland",
      "George",
      "George Augustus Frederick",
      "IV",
      "12 August 1762",
      "St James's Palace, London, England",
      "26 June 1830",
      "Windsor Castle, Windsor, Berkshire, England",
      "St George's Chapel, Windsor Castle"
    )
  expect(george3).to eq(george4)
end

This test will fail with the following error.

  1) George III and George IV are the same
     Failure/Error: expect(george3).to eq(george4)

       expected: #<struct Monarch title="King of the United Kingdom of Great Britain and Ireland", first_name="George"...death="Windsor Castle, Windsor, Berkshire, England", buried_at="St George's Chapel, Windsor Castle">
            got: #<struct Monarch title="King of the United Kingdom of Great Britain and Ireland", first_name="George"...death="Windsor Castle, Windsor, Berkshire, England", buried_at="St George's Chapel, Windsor Castle">

       (compared using ==)

That's not great. The test fails because the expected and asserted values are not the same but the report makes it look like they are. This kind of error has sent me looking for the problem in entirely the wrong direction.

Fixing it isn't entirely trivial either. I had to try a couple of improvements before finding one that worked.

  • Using have_attributes instead of eq: doesn't work with Structs.
  • Calling .to_s on the monarchs before passing them to eq: no improvement.
  • Calling .to_h on the monarchs before passing them to eq: 🎉 a diff!
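Here's a trimmed-down sketch of that last fix, with fewer Monarch fields than the real test so the shape is easy to see. Comparing hashes instead of Structs lets eq produce a key-by-key diff instead of two identical-looking, truncated inspect strings.

```ruby
# Two Structs whose inspect output is easy to confuse at a glance.
Monarch = Struct.new(:full_name, :number)
george3 = Monarch.new("George William Frederick", "III")
george4 = Monarch.new("George Augustus Frederick", "IV")

# In the spec: expect(george3.to_h).to eq(george4.to_h)
# RSpec diffs the two hashes and highlights :full_name and :number.
p george3.to_h == george4.to_h  # => false
```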

Let's look at another example. eq sometimes produces bad errors, but contain_exactly always produces bad errors. Take this test:

it("fruit salad contains the right ingredients") do
  Ingredient = Struct.new(:name, :grams)
  fruit_salad = [
    Ingredient.new("mango", 400),
    Ingredient.new("pineapple", 300),
    Ingredient.new("coconut flakes", 50),
  ]
  recipe = [
    Ingredient.new("mango", 400),
    Ingredient.new("pineapple", 200),
    Ingredient.new("coconut flakes", 50),
  ]
  expect(fruit_salad).to contain_exactly(*recipe)
end

The test fails with the error below:

1) fruit salad contains the right ingredients
   Failure/Error: expect(fruit_salad).to contain_exactly(*recipe)

     expected collection contained:  ["#<struct Ingredient
name=\"coconut flakes\", grams=50>", "#<struct Ingredient
name=\"mango\", grams=400>", "#<struct Ingredient
name=\"pineapple\", grams=200>"]
     actual collection contained:    [#<struct Ingredient
name="mango", grams=400>, #<struct Ingredient
name="pineapple", grams=300>, #<struct Ingredient
name="coconut flakes", grams=50>]
     the missing elements were:      ["#<struct Ingredient
name=\"pineapple\", grams=200>"]
     the extra elements were:        [#<struct Ingredient
name="pineapple", grams=300>]

The test is failing because we added the wrong amount of pineapple, but it takes some effort to parse that out of the error report. contain_exactly errors get worse as the complexity of the items in the arrays we're comparing grows.

Instead of using contain_exactly we might group the ingredients by name, check we have the right ingredients, then for each ingredient separately check we have the right amounts. That's more work up front for better error messages when tests fail, a trade-off.
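Here's a sketch of that grouping approach, with the data wrangling in plain Ruby and the expectations it would feed shown as comments:

```ruby
# Group both collections into name => grams hashes, then compare the
# ingredient names and the per-ingredient amounts separately. A failure
# now names one specific ingredient instead of dumping both arrays.
Ingredient = Struct.new(:name, :grams)
fruit_salad = [
  Ingredient.new("mango", 400),
  Ingredient.new("pineapple", 300),
  Ingredient.new("coconut flakes", 50),
]
recipe = [
  Ingredient.new("mango", 400),
  Ingredient.new("pineapple", 200),
  Ingredient.new("coconut flakes", 50),
]

salad_by_name  = fruit_salad.to_h { |i| [i.name, i.grams] }
recipe_by_name = recipe.to_h { |i| [i.name, i.grams] }

# In the spec:
#   expect(salad_by_name.keys).to match_array(recipe_by_name.keys)
#   recipe_by_name.each do |name, grams|
#     expect(salad_by_name[name]).to eq(grams)
#   end
# The failing expectation would report something like
# "expected: 200, got: 300" for the pineapple amount alone.
```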

As things stand we have to fail our tests intentionally to learn what their error messages might look like, so writing RSpec tests with good errors takes commitment and experimentation. I don't have a style-guide-like tip that will help test authors prevent poor matcher usage, but I do think there are a couple of things RSpec could improve:

  • Improve errors generated by matchers. For example, let eq produce a diff.
  • Remove matchers that cannot produce good error messages. For example: contain_exactly.
  • Warn when we use a matcher in a way that will lead to poor error messages. For example, warn when we pass eq values of types for which it cannot produce good diffs.

Conclusion

I've shown a couple of practices I believe improve the experience of debugging RSpec tests. It's interesting to note a lot of them come down to using plain Ruby language features over RSpec ones. What do you think of that? Would you miss these RSpec features? What do you like about them? I'd love to hear!


Discussion


I like these rules. However, the problem is rspec and the ruby community strongly prefer and encourage a lot of this, like peppering every test with 20 lets. It would be very hard to find a team that actually agrees and writes tests in this way, as you are fighting the norm. This is my main problem with rspec

 

We left RSpec for Minitest five years ago because we wound up spending 1/3 of our effort writing custom matchers to solve just the problems you highlight. That's much easier to do in Minitest::Spec than RSpec; at least it feels more like Plain Old Ruby and it's easier to onboard new people to the code and tests. I'm starting a small skunkworks project to prove out some new components and practices, and decided to give RSpec another try without writing custom matchers.

The code tools the project set out to evaluate and demonstrate are going gangbusters; RSpec, not so much. I'm not ready to agree with DHH's long-time stance that Minitest (::Unit) is all you need, but if I didn't know about Minitest::Spec, I probably would. Friends shouldn't let friends write RSpec, and yet, here we are.