Chun Fei Lung

Posted on Aug 30, 2021 • Edited on Dec 4, 2021 • Originally published at chuniversiteit.nl

Does it matter if you write tests before or after you write your code?


Adding features during refactoring is counterproductive! It’s a fallacy that may blow up in your face.

I read and summarise software engineering papers for fun, and today we’re having a look at “A dissection of the test-driven development process: Does it really matter to test-first or to test-last?” (2017) by Fucci and others.

Test-driven development is a development practice that involves short, iterative cycles in which the programmer writes tests before adding new functionality or refactoring existing code. It’s commonly believed that writing tests first leads to higher-quality code and improved productivity. This study puts that belief to the test.

Why it matters

Test-driven development (TDD) has multiple characteristics that set it apart from “traditional” programming, but the “tests first, code later” aspect tends to be the thing that most people talk about (and remember).

There’s more to it than that, however, so let’s talk definitions first.

TDD is a programming technique which involves cyclic, iterative implementation of new features.

In each cycle a programmer carries out the following tasks:

Writing unit tests for the desired behaviour;
Writing code to make those tests pass;
Strictly refactoring code to improve its design, i.e. without modifying its behaviour (*).

(*) Modifying behaviour while refactoring could nullify or even reverse the benefits of refactoring.

A cycle is finished when all new and existing unit tests pass, and the programmer is content with the program’s design. Ideally, all cycles are short and roughly the same length: around 5 minutes, and never longer than 10 minutes.
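To make the three steps a bit more concrete, here is a minimal sketch of what a single cycle might leave behind in Python. The `slugify` function and its tests are hypothetical examples of mine and do not come from the study; the comments mark which step each part belongs to.

```python
import re
import unittest


# Step 1: write unit tests for the desired behaviour first.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(
            slugify("Tests first, code later!"), "tests-first-code-later"
        )


# Step 3 (refactoring): the pattern was pulled out into a named constant.
# This changes the design, not the behaviour, so the tests stay untouched.
NON_ALPHANUMERIC = re.compile(r"[^a-z0-9]+")


# Step 2: write just enough code to make the tests above pass.
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    return NON_ALPHANUMERIC.sub("-", title.lower()).strip("-")
```

Saved as a module, the tests can be run with `python -m unittest`. The cycle counts as finished once both tests pass and you are happy with the design.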

TDD advocates claim that adherence to these practices will lead to improved quality and productivity.

In a nutshell, TDD has four characteristics:

The sequence in which tests are written: before or after coding
The granularity (length) of cycles
The uniformity of cycle lengths
The amount of effort spent on refactoring

How do these four characteristics affect the external quality (**) of the produced software and the developer’s productivity?

(**) “Does the software do what it’s supposed to do?”

How the study was conducted

The authors held several five-day workshops about unit testing and TDD at two Nordic companies.

During each workshop, participants were asked to individually implement three tasks, of which two were greenfield and one was brownfield. Some participants made use of a test-first sequence, while others used a test-last sequence.

TDD dictates that development is done iteratively using many short cycles. To help participants work on their tasks in small steps, the researchers refined each task into clearly delineated stories and sub-stories. Tasks were then “graded” using acceptance test suites for each user story in order to determine the quality of submitted solutions.

All participants made use of a special Eclipse IDE that collected information about the actions performed in it, such as:

Code modification
Test modification
Code compilation
Test execution

This information was used to determine how participants applied TDD.
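The summary doesn’t go into how the logs were turned into cycles, so the following is only a rough sketch of the idea. The event format, the rule that a cycle ends at every test run, and the labels are all assumptions of mine, not the researchers’ actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One logged IDE action (hypothetical format)."""
    timestamp: float  # seconds since the start of the session
    kind: str         # "code_edit", "test_edit", "compile" or "test_run"


def split_into_cycles(events):
    """Group events into cycles, assuming each test run closes a cycle.

    Events after the last test run belong to an unfinished cycle
    and are left out.
    """
    cycles, current = [], []
    for event in events:
        current.append(event)
        if event.kind == "test_run":
            cycles.append(current)
            current = []
    return cycles


def classify(cycle):
    """Label a cycle by which kind of edit came first (a crude heuristic)."""
    edits = [e.kind for e in cycle if e.kind in ("code_edit", "test_edit")]
    if not edits:
        return "no-edits"
    if "test_edit" not in edits:
        return "code-only"
    if "code_edit" not in edits:
        return "test-only"
    return "test-first" if edits[0] == "test_edit" else "test-last"


# Made-up log: one test-first cycle followed by one test-last cycle.
log = [
    Event(0, "test_edit"), Event(40, "code_edit"),
    Event(70, "compile"), Event(90, "test_run"),
    Event(120, "code_edit"), Event(200, "test_edit"),
    Event(260, "test_run"),
]
print([classify(cycle) for cycle in split_into_cycles(log)])
# prints ['test-first', 'test-last']
```

A real classifier would need to be a lot smarter, for instance to tell refactoring apart from new production code, but the principle of reading the process off the sequence of actions stays the same.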

Combining timestamps from the IDE logs with the pass rate of the acceptance test suite allows one to calculate the productivity of each developer.
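The exact formula isn’t given here, so the sketch below uses one plausible definition: the percentage of acceptance tests passed, divided by the minutes spent on the task. Both the formula and the function name are assumptions of mine, meant only to show how the two data sources might be combined.

```python
def productivity(passed: int, total: int, start: float, end: float) -> float:
    """Percentage of acceptance tests passed per minute of work.

    `start` and `end` are timestamps in seconds, e.g. the first and
    last event in a participant's IDE log. This definition is an
    assumption for illustration, not the study's formula.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    minutes = (end - start) / 60
    if minutes <= 0:
        raise ValueError("end must be later than start")
    return 100 * passed / total / minutes


# Made-up example: 30 of 40 acceptance tests pass after 90 minutes.
print(productivity(passed=30, total=40, start=0, end=5400))
```

With this definition, someone who reaches the same pass rate in half the time ends up with double the productivity score.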

What discoveries were made

You have probably guessed by now that Betteridge’s law of headlines strikes again, but in what way?

Correlation

Granularity and uniformity are positively correlated, i.e. developers who use shorter cycles are able to keep them consistently short, while those who use larger cycles tend to have cycles of varying lengths. Both factors also appear to affect external quality: smaller cycles and cycles that have consistent lengths are associated with better external quality.

A small but statistically significant correlation exists between granularity and refactoring effort: developers who use coarser cycles spend less time on refactoring.
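If you want to see what such a correlation looks like in numbers, here is a small sketch that computes a Pearson correlation coefficient by hand. The data is made up purely for illustration, and I’m assuming that granularity and uniformity are both expressed in minutes (typical cycle length and spread of cycle lengths). It says nothing about statistical significance or about the coefficients found in the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Made-up numbers for five developers (not data from the study).
granularity = [3, 5, 8, 12, 20]  # typical cycle length in minutes
uniformity = [1, 2, 2, 5, 9]     # spread of cycle lengths in minutes

# A value above 0 means that longer cycles go together with a larger spread.
print(pearson(granularity, uniformity))
```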

Regression

To better understand the relation between TDD’s four characteristic factors and the two outcome variables (quality and productivity), the authors constructed two models.

The basic idea here is that each model should predict one of the outcome variables using information about the code-test sequence, cycle granularity and uniformity, and refactoring effort.

A good model is also simple, and should not include superfluous input variables. The process of trimming these variables, known as feature selection, is described in the original article.

I’ll simply list the most noteworthy discoveries here:

Code-test sequence is not part of either model, which suggests that – at least for external quality and developer productivity – it does not matter whether you write your tests before or after your “real” code (***);
Cycle granularity, uniformity, and refactoring effort are all negatively correlated with both quality and productivity;
The negative correlation between refactoring effort and the two outcome variables is likely due to floss refactoring (****).

(***) This study did not look at the effects on internal quality (i.e. maintainability), which is also pretty important.

(****) This is a form of refactoring that also includes other activities, like implementation of new features. These new features might not be covered by tests and are therefore more likely to introduce regression bugs.
