Augusts Bautra

Posted on Dec 21, 2018 • Edited on Feb 8, 2024

Minimising spec fragility

#testing #rspec

As an application grows, so does the number of specs, and at some point it becomes useful to check on the health of those tests.

An unhealthy test comes in many forms: it can be slow, fragile (failing intermittently), hard to maintain, or, worst of all, it can fail to test the thing it is supposed to test.

Here I will discuss some strategies for dealing with test fragility. The examples and tools I will show will be for a Ruby on Rails application using RSpec, but the principles should apply to any other environment.

Sources of test fragility

Provided the application code and the programming language are stable and not to blame for the failures, there can only be one source of fragility: changes in the test environment.
These changes arise in several ways. Some elements of the environment are used implicitly and change constantly unless explicitly told not to, like the time. Other elements can change unexpectedly, like responses to web requests. Yet others change in response to what the application code is doing, like the database contents.
Being aware of these considerations is an important part of (web) development. Strategies for mitigating the adverse effects of a changing environment include writing tests that are isolated from the environment, and controlling or outright preventing unwanted changes.
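As a minimal sketch of the first strategy, code that accepts the current date as an argument can be tested without touching the system clock at all. The `TrialPeriod` class and its 14-day length are hypothetical, chosen only for illustration.

```ruby
require "date"

# Hypothetical example class: a trial that lasts 14 days from signup.
class TrialPeriod
  LENGTH_IN_DAYS = 14

  def initialize(started_on)
    @started_on = started_on
  end

  # `today` defaults to the real date in production,
  # but a test can pass a fixed date instead.
  def expired?(today = Date.today)
    today > @started_on + LENGTH_IN_DAYS
  end
end

trial = TrialPeriod.new(Date.new(2018, 12, 1))
trial.expired?(Date.new(2018, 12, 15)) # => false, day 14 is still inside the trial
trial.expired?(Date.new(2018, 12, 16)) # => true
```

The test never depends on when it runs, so there is nothing to freeze or stub.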

Controlling the test environment

Modern (web) applications interact with a myriad of other processes so it may be a fool's errand to control all of them. Luckily, however, the most common partner-systems can be easily identified and there usually are tools for controlling ties to them in tests.

A non-exhaustive list of popular partner-systems:

System time
System tools (ImageMagick, wkhtmltopdf, etc.)
The filesystem
The database
Web requests

Controlling time

Whenever the application code under test deals with timestamps, dates, periods and the like, it is probably a good idea to control the test time.
Use the Timecop gem to freeze time for these tests. Here's a useful snippet:

require "timecop"

RSpec.configure do |config|
  config.before do |example|
    # freeze time only for examples tagged with :freeze_at
    if example.metadata.key?(:freeze_at)
      Timecop.freeze(example.metadata[:freeze_at])
    end
  end

  config.after do
    # return after each test just to be sure
    Timecop.return
  end
end

# then in tests
it "returns correct timestamp", freeze_at: "2018-12-21" do
  # example
end

If there's reason to believe the suite contains time-fragile tests, it may be useful to investigate by simulating adverse conditions.
Use Timecop to travel to (not freeze at) a moment just before the day, month and year change, like New Year's Eve.
Scale time by a factor of 100, so that 10ms of real time counts as 1s.
The idea is to make execution span the change of day, month and year, and so expose fragile tests.

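before do
  # one second before midnight on New Year's Eve (needs ActiveSupport)
  Timecop.travel("9000-01-01 00:00:00".to_datetime.ago(1.second))
  # make the mocked clock run 100 times faster than real time
  Timecop.scale(100)
end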
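To illustrate the kind of bug this exposes, here is a sketch of code that reads the clock twice and breaks when midnight falls between the two reads. The `StepClock` helper is a hypothetical stand-in for the moving clock that `Timecop.travel` plus `Timecop.scale` produce in a real suite; it is not part of Timecop.

```ruby
# Hypothetical stand-in for a moving clock: each call to #now
# returns the next prepared moment, then keeps returning the last one.
class StepClock
  def initialize(*moments)
    @moments = moments
  end

  def now
    @moments.size > 1 ? @moments.shift : @moments.first
  end
end

# Fragile: reads the clock twice, so the date and time parts can disagree.
def fragile_label(clock)
  "#{clock.now.strftime('%Y-%m-%d')} #{clock.now.strftime('%H:%M:%S')}"
end

# Robust: reads the clock once and derives both parts from one value.
def robust_label(clock)
  clock.now.strftime("%Y-%m-%d %H:%M:%S")
end

before_midnight = Time.utc(2018, 12, 31, 23, 59, 59)
after_midnight  = Time.utc(2019, 1, 1, 0, 0, 0)

fragile_label(StepClock.new(before_midnight, after_midnight))
# => "2018-12-31 00:00:00", a moment that never happened
robust_label(StepClock.new(before_midnight, after_midnight))
# => "2018-12-31 23:59:59"
```

The same mismatch appears when a spec computes its expected value from the clock separately from the code under test.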

Controlling system tool calls

Just like other outside dependencies, system tool calls should be stubbed out, preferably globally. Normally a library wraps the system tool call, so it is the API of that library that should be stubbed. If a system tool gets called in raw form, like %x'git status', the call ought to be wrapped in a method and the method stubbed.
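A minimal sketch of that wrapping, assuming a hypothetical `GitWorkspace` class that is the only place in the app shelling out to git. The stub is shown in plain Ruby so the sketch runs on its own; the RSpec equivalent is given in a comment.

```ruby
require "open3"

# Hypothetical wrapper: the only place in the app that shells out to git.
class GitWorkspace
  def dirty?
    !porcelain_status.strip.empty?
  end

  # The single method a test needs to stub.
  def porcelain_status
    output, _process_status = Open3.capture2("git", "status", "--porcelain")
    output
  end
end

# In an RSpec example the stub would look like:
#   allow(workspace).to receive(:porcelain_status).and_return(" M app.rb\n")
# The plain-Ruby equivalent below overrides the method on one instance.
workspace = GitWorkspace.new
workspace.define_singleton_method(:porcelain_status) { " M app.rb\n" }
workspace.dirty? # => true, and git was never executed
```

Because the logic in `dirty?` only sees a string, the test runs the same on a machine without git installed.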

Controlling the filesystem

When dealing with files, temporary files should usually be used. However, sometimes application code does change the state of the filesystem, for example by creating new files and writing to them. These changes can be isolated in three ways:

Stub out the creating and writing, and test only that this (supposedly reliable) core language behavior gets called.
Use a test harness, like an in-memory filesystem (the fakefs gem), that cleans up after itself between examples.
Explicitly delete created files and revert changes in an after hook (error-prone!).
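For the common case where a temporary location is enough, Ruby's standard library can create a directory that is removed automatically when the block ends. This is a minimal sketch; `ReportWriter` is a hypothetical class that writes into whatever directory it is given.

```ruby
require "tmpdir"

# Hypothetical class under test: writes a report into a given directory.
class ReportWriter
  def initialize(directory)
    @directory = directory
  end

  def write(name, body)
    path = File.join(@directory, "#{name}.txt")
    File.write(path, body)
    path
  end
end

written_path = nil

# The block form of Dir.mktmpdir deletes the directory and its
# contents when the block exits, even if an error is raised.
Dir.mktmpdir("report-spec") do |dir|
  written_path = ReportWriter.new(dir).write("daily", "42 signups")
  File.read(written_path) # => "42 signups"
end

File.exist?(written_path) # => false, nothing leaks into the next example
```

In an RSpec suite the same block would fit naturally inside an around hook, so every example gets its own directory.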

Controlling the database

A topic fraught with much grief and confusion.

The basic idea is that each example should begin with a maximally empty DB, and the records created during the example should be deleted afterwards, returning the DB to a pristine state.
In practice, committing the writes and then deleting them is slow and unnecessary, because popular databases like Postgres and MySQL support transactions: changes such as record creations are made inside a transaction that is never committed and can be rolled back quickly.
With Rails, rspec-rails generates a configuration that uses these "transactional examples" (config.use_transactional_fixtures = true), and this should be sufficient for most test setups.

Unfortunately, some apps on older Rails ran into a problem with Capybara feature tests: the server thread that Capybara drives had no access to the records created in the setup transaction. Developers were forced to disable transactional examples, allow actual writes to the DB, and handle cleaning with a gem like database_cleaner, which tended to be significantly slower.
Luckily, recent Rails versions (5.1+) solve this problem natively by sharing one DB connection between the test and server threads, and for older versions there's the ActiveRecordSharedConnection backport.

Interestingly, the assumption that there are no other records in the DB can lead to fragile tests with these characteristics:

Hardcoded ids where a reference to the actual record's id would be correct.
Absolute count expectations where relative (change) expectations would be correct.
Exact-match expectations on scopes where checking that the correct record is present and the incorrect record is absent would be correct.

To identify tests that are fragile in this regard, try creating a record of a commonly used model in before(:suite). This will shift ids and counts, and expose incorrect expectations.
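The effect of such a seeded record can be sketched without Rails. `FakeUsersTable` below is a hypothetical in-memory stand-in for a model with an auto-incrementing id; in a real suite the robust checks would use RSpec's `change { ... }.by(...)` and `include` matchers.

```ruby
# Hypothetical in-memory stand-in for a database table, so the sketch
# runs without Rails. Ids auto-increment like a primary key.
class FakeUsersTable
  User = Struct.new(:id, :name, :admin)

  def initialize
    @rows = []
    @next_id = 1
  end

  def create(name:, admin: false)
    user = User.new(@next_id, name, admin)
    @next_id += 1
    @rows << user
    user
  end

  def count
    @rows.size
  end

  def admins
    @rows.select(&:admin)
  end
end

users = FakeUsersTable.new
users.create(name: "seeded before the suite", admin: true) # shifts ids and counts

count_before = users.count
alice = users.create(name: "Alice", admin: true)
bob   = users.create(name: "Bob")

# Fragile checks that assume an empty table now fail:
alice.id == 1                   # => false, hardcoded id
users.count == 2                # => false, absolute count
users.admins == [alice]         # => false, exact scope contents

# Robust checks survive the seeded record:
users.count - count_before == 2 # => true, relative change
users.admins.include?(alice)    # => true, correct record present
users.admins.include?(bob)      # => false, incorrect record absent
```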

Controlling web requests

Since Ruby has a number of gems for making web requests (RestClient ftw), stubbing out just one library may not be enough.
Luckily, WebMock stubs web requests at a deeper level and can block real outside web requests entirely during tests. Use this functionality!
With real requests denied, WebMock will raise an error if an unexpected request occurs during a test, and even suggest how to stub it.

To identify fragility associated with web requests, try telling WebMock to return a 500 response to all requests.

require "webmock/rspec"
WebMock.disable_net_connect!(allow_localhost: true)

RSpec.configure do |config|
  # answer every request, whatever the method or URL, with a server error
  config.before do
    stub_request(:any, %r'.*').to_return(
      body: "everything went wrong", status: 500
    )
  end
end

Conclusion

By being aware of the common sources of fragility and the common strategies for dealing with them, developers should be able to maintain test suites that work reliably.

Thank you for reading.

DEV Community