A multi-layer approach to test runs

#rails #rspec #testing #discuss

A Rails project I joined recently has 15k test examples. With a powerful CI machine running parallel_tests it still takes 30 minutes to run the suite.

This is a meditation on how to fix this.

TL;DR

Have three tiers of "all" spec runs:

Local rake rspec:core_units task that runs core unit tests and finishes in under 60s. For developers to run locally between steps while developing a feature.
CI "quick" run that skips non-units (feature specs etc.). Must finish in under 10min.
CI "full" run, scheduled and also callable manually, e.g. before merging into master.

Tag the slow, overly thorough specs that you write while shaping up a feature with :slow.
Configure RSpec to skip these slow specs by default with config.filter_run_excluding(:slow).
This way your full-suite CI runs stay snappy a year from now.

Figure out a way to differentiate "full" and "quick" CI runs. Normally do "quick". On "quick" runs, skip specs that are unlikely to have been affected by the changes.
Do "full" runs on request by checking the commit message for a "[full]" tag.

What are tests for?

Recent polls indicate that at least 85% of Ruby developers write tests, and even more developers agree that tests are good, but what are they for, exactly?

I would hope that tests have something to do with the three central pillars of good software - correctness, maintainability and performance!

I subscribe to the TDD philosophy, so tests for me are a way to formalize what I have to develop and to mentally structure and direct daily work. I write a failing test, then I write the code that makes the test pass, and repeat.

So one aspect is that tests are for writing correct code.

Another great thing about tests is that they are a record of how a feature should work in the face of changing code elsewhere. They allow you to identify unexpected effects your code has on other, existing parts of the project.

So another aspect is that tests are for maintainability over time.

Lastly, and in my experience least often, tests can be a way to specify how performant code should be.

So the third aspect is that tests can be for profiling, benchmarking and tracking performance.

Shifting priorities

Thinking in terms of these three aspects, it can be argued that all three matter over the whole life cycle of a project, but not equally at every stage: correctness is important all the time, while maintainability and performance considerations become more prominent only as the project grows in both line count and user base.

Correctness ... ... ... ... ... ... Maintainability ... ... Performance

How considerations interact

It can also be argued that something that is good for one of the three pillars may have a detrimental effect on the other two.

Focusing on correctness testing sometimes leads to overlapping test runs that share a lot of setup, but only test a small edge-case or branch in logic - this leaves a mark on maintainability and performance.

Focusing on maintainability testing can lead to excess time spent overdesigning the simplest of features.

Focusing on performance testing can leave a feature undertested, working not quite how it was intended, or so uniquely optimized that it is hard to use elsewhere and maintain.

In my experience, and in this project in particular, correctness testing gets far too much focus, which severely impacts maintainability and, in some cases, performance.

Being correct, but not too correct

There has to be a way to write correctness tests during the initial development of a feature and then keep a performant subset of those for maintainability.

Each project is unique, but for this one I have these ideas:

Have two layers of tests for scopes: one that creates records and runs a real query, to be run only on demand, and another that validates the generated SQL and is run in general rspec runs.
Do not run shared examples for every class that includes a module.

Once you have identified which tests need not be run past the initial development phase (but may be useful on select occasions), you can use RSpec's tagging feature to specify which tests to skip by default.

# in rails_helper.rb
RSpec.configure do |config|
  config.filter_run_excluding(:slow)
  # ...
end

# in spec/models/user_spec.rb
RSpec.describe User do
  describe "scopes" do
    describe ".admins" do
      subject(:collection) { described_class.admins }

      context "when only validating generated SQL" do
        subject { super().to_sql }

        # Placeholder: replace with the exact SQL the scope is expected to generate.
        let(:sql) { "some sql" }

        it { is_expected.to eq(sql) }
      end

      context "when creating actual records and making a query", :slow do
        let!(:admin) { create(:user, :admin) }
        let!(:regular) { create(:user) } # counter

        it "collects users with admin privileges" do
          expect(collection.pluck(:id)).to contain_exactly(admin.id)
        end
      end
    end
  end
end

Now if you run rspec spec/models/user_spec.rb, only the SQL example will run, but if you run rspec spec/models/user_spec.rb:2, both examples will run regardless of tags.

This way you can easily filter some tests from full-suite rspec runs.
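Sometimes you will want the whole suite, :slow examples included, without listing files and line numbers by hand. Below is a minimal sketch of one way to make the exclusion switchable. It assumes a hypothetical RUN_SLOW_SPECS environment variable, which is a name made up for this example and not something RSpec provides.

```ruby
# in rails_helper.rb
# RUN_SLOW_SPECS is a hypothetical variable name chosen for this sketch.

# Returns true when slow specs were explicitly requested, e.g.
#   RUN_SLOW_SPECS=1 bundle exec rspec
def run_slow_specs?(env = ENV)
  %w[1 true yes].include?(env.fetch("RUN_SLOW_SPECS", "").strip.downcase)
end

# The guard keeps the helper above loadable on its own, outside an RSpec process.
if defined?(RSpec)
  RSpec.configure do |config|
    # Skip :slow examples unless the developer or the CI job opted in.
    config.filter_run_excluding(:slow) unless run_slow_specs?
  end
end
```

With this in place a plain rspec run stays fast, and a job that needs everything can set the variable instead of changing the configuration.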

Running only the relevant set of specs

Ideally you would want CI to run the whole suite every time, but given that the suite runs for 30 minutes, CI can only complete 16 full runs per 8-hour workday.
With five developers pushing several commits a day, it is becoming unsustainable.

Knowing that pushing a trivial commit essentially robs 1/16 of CI time for the day, I always make sure I cancel the CI run. This is a manual process, which is slow and breaks my flow. So first I should implement "do not run CI for this commit" functionality. A switch based on the commit message contents comes to mind.
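A commit-message switch like this can be sketched in a few lines. The marker strings and the skip_ci? helper below are assumptions for illustration, not part of any existing setup. Many hosted CI services already honor a similar marker natively, so it is worth checking yours before scripting it.

```ruby
# Markers that ask CI to skip a commit. These exact strings are an
# assumption for this sketch; pick whatever your team agrees on.
SKIP_CI_MARKERS = ["[skip ci]", "[ci skip]"].freeze

# Returns true when the commit message contains one of the markers.
# Matching is case-insensitive, and a nil message counts as "do not skip".
def skip_ci?(commit_message)
  message = commit_message.to_s.downcase
  SKIP_CI_MARKERS.any? { |marker| message.include?(marker) }
end

# Possible use at the top of a CI script:
#   exit 0 if skip_ci?(`git log -1 --pretty=%B`)
```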

One big problem is that there is no way to get a glimpse of the suite status locally, since without parallelization the whole suite would run for hours.
To fix this I have to make a rake rspec:core_units task that runs a small, curated set of specs and quickly tells a developer whether their change has broken something fundamental, potentially sparing a push to CI.
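A minimal sketch of what such a task could look like follows. The file path, the CORE_UNIT_SPEC_PATHS list and the core_units_command helper are hypothetical. Which specs count as "core" is a per-project decision, and the 60s budget is a target rather than something this task enforces.

```ruby
# lib/tasks/rspec_core_units.rake (hypothetical location)
require "rake"

# Hypothetical curated list: replace with the directories or files
# that make up the core of your own project.
CORE_UNIT_SPEC_PATHS = %w[
  spec/models
  spec/lib
].freeze

# Builds the command as an argument list so no shell quoting is needed.
# --fail-fast stops at the first failure to keep feedback quick.
def core_units_command(paths = CORE_UNIT_SPEC_PATHS)
  ["bundle", "exec", "rspec", "--fail-fast", *paths]
end

namespace :rspec do
  desc "Run the curated core unit specs (target: under 60s)"
  task :core_units do
    # sh fails the task when rspec exits with a non-zero status.
    sh(*core_units_command)
  end
end
```

Developers would then run rake rspec:core_units before pushing.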

Unfortunately, even without unnecessary or trivial runs, the CI is being swamped. I should let CI do much less work per run while preserving useful feedback for developers. That means limiting which specs are executed most of the time.

In my opinion CI runs cover the maintainability aspect of tests, so the "quick" runs the CI will generally do can skip specs that are for code that is not coupled to anything and is unlikely to break. This means that service and controller specs can generally be skipped, while model and feature specs ought to be run every time.

Once I have identified which tests will be run on every CI call, and which only during the occasional "full" run, I have to figure out a way to differentiate the run mode. Reading the commit message should work again.
Normally do "quick".
Do "full" runs on request by checking the commit message for a "[full]" tag.
Also do scheduled "full" runs at night for master and other core branches.
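The mode selection above can be sketched as two small helpers. The "[full]" tag comes from the plan itself. The helper names, the scheduled flag and the directory lists are assumptions for this example, with the directories following a conventional Rails spec layout.

```ruby
# Tag that requests a full run, as described in the plan.
FULL_RUN_TAG = "[full]"

# Directories run on every "quick" CI call. The names assume a
# conventional Rails layout and are only an illustration.
QUICK_SPEC_DIRS = %w[spec/models spec/features].freeze

# Decides the run mode. Scheduled (e.g. nightly) runs are always full;
# otherwise the commit message has to ask for a full run explicitly.
def ci_run_mode(commit_message:, scheduled: false)
  return :full if scheduled

  commit_message.to_s.include?(FULL_RUN_TAG) ? :full : :quick
end

# Maps a run mode to the spec paths that should be handed to rspec.
def spec_dirs_for(mode)
  case mode
  when :full then %w[spec]
  when :quick then QUICK_SPEC_DIRS
  else raise ArgumentError, "unknown run mode: #{mode.inspect}"
  end
end
```

A CI script could then pass the result of spec_dirs_for(ci_run_mode(...)) to whatever command starts the parallel run.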

In the end, a three-tier spec run system has to be created:

Local rake rspec:core_units task that runs core unit tests and finishes in under 60s. For developers to run locally between steps while developing a feature.
CI "quick" run that covers central business logic and units. Must finish in under 10min.
CI "full" run, scheduled and also callable manually, e.g. before merging into master.

Conclusion

What do you guys think of my plan for optimizing the testing situation for this project?
I am also looking into TestProf and hope to drive the 30min full suite execution time down to 19min or less.
What has been the most impactful tweak for your test suites?