Ruairí O'Brien

Posted on Sep 23, 2018 • Updated on Sep 24, 2018

Does Test Driven Development Work?

#tdd #code

Introduction

Bertrand Russell was concerned that an education in mathematics required accepting certain axioms that had not been rigorously proven. He treats aspects of this in his book Introduction to Mathematical Philosophy. An example given in the In Our Time episode about Bertrand Russell is that for any two points in space there exists a line between those two points. This is intuitive, but no proof of it existed. One major problem with a lack of proof is that, even if something seems intuitive, someone can argue that it is not intuitive to them, or that some other answer is more intuitive, and there is no proof to appeal to that settles which argument is right.

A question that will be explored in this post is, do we have any proof that Test Driven Development (TDD) works?

What does ‘Work’ mean in this context? Two measurements often used are:

Productivity
Code Quality

Does practising TDD mean a real improvement in productivity or code quality? Just measuring productivity or code quality is difficult enough.

In this post, we will look at some of the discussion going on around TDD in the industry and some of the efforts made to evaluate TDD in a scientific way.

Opinions in Industry

Before we look at scientific studies and data, it's worth going over some well known and subjective discussions around TDD for context.

Kent Beck is credited with having developed or 'rediscovered' TDD somewhere around 1999. Since then there have been many proponents of TDD and a multitude of books on the subject. There is certainly a vocal group in the industry that claims TDD is critically important to writing good software.

David Heinemeier Hansson (often abbreviated to DHH), the creator of Ruby on Rails, expressed what seems to be a fairly widely held opinion against TDD in his blog post TDD is dead. Long live testing. An interesting aspect of that post is its premise that TDD has somehow won, becoming accepted as the right way to do development in the industry, and is even causing harm. The post was written a few years ago now but the arguments don't appear to have changed too much.

In my personal experience, TDD is not generally accepted as a necessary process in software development and is rarely mandated. There is a lot of debate and while being a practitioner of TDD is not looked down upon, it is not generally seen as being important or necessarily correct. This differs between software teams of course but in general, this is my experience. It is very common at standups I have attended to hear developers say things like, 'I am done but just need to add tests now'. In conversations like this, I have never heard anyone suggest the TDD process should have been used. It is left to the individual developers to decide that for the most part.

Bob Martin can be a polarizing individual but he is an important proponent of TDD. He attempts to refute DHH's points in his post When TDD doesn’t work.

A reference within the original 'TDD is dead' post is the article Why Most Unit Testing is Waste by Jim Coplien which is discussed in the video where Jim Coplien and Bob Martin Debate TDD.

The 'TDD is dead' post also led to a long discussion between Kent Beck, Martin Fowler and David Heinemeier Hansson, which is quite entertaining and worth watching if you are interested.

One interesting detail in those videos was the discussion around a pleasurable workflow and if there is a notable distinction in this area between a mind that prefers to write a test to begin solving a problem or a mind that prefers not to write a test until some form of solution to the problem has already been written.

All this covers the input of only a few people in the industry and there are many more examples. I am pointing these out here because, despite the rich debate that took place in all these blog posts and discussions, no conclusive answers can be arrived at. It seems rare that one side convinces the other. That may just be the nature of things but maybe we can explore it a little deeper.

In nearly all the discussions there was a consensus that automated testing is important. Most disagreements seemed to be around the perception of TDD as a process and perhaps the granularity of automated tests too.

As with so many subjects in software development, a lot of the information about TDD that we share and consume is based on opinions and anecdotes. We hear many stories from the perspective of the storyteller and we try to build up a picture of what is right so we can apply that to our own situations. We generally just try things to see if they work. Some data would be useful. Is it just that software development is very difficult to measure in this way? When it comes to TDD, at least, some efforts have been made to gather data.

Studies on TDD

In 2003 a study called An Initial Investigation of Test Driven Development in Industry was conducted with '24 professional pair programmers', where one group developed with TDD and another group used a more conventional (at the time) design-develop-test-debug waterfall approach. That study also references an earlier German study, run with 19 graduate students, that concluded that

test-first manner neither leads to quicker development nor provides an increase in quality. However, the understandability of the program increases, measured in terms of proper reuse of existing interfaces.

There were a lot of limitations to that study which the 2003 study was trying to address.

According to the George and Williams paper, the TDD group produced code that passed 18% more functional black box test cases but took 16% more time for development.

A hypothesis of this research was that the TDD approach would yield code with superior external code quality. Based on the data analysis conducted, the experimental findings are supportive that the TDD approach yields code with superior external code quality. However, the validity of the results must be considered within the context of the limitations discussed in external validity section.

There were also some interesting findings on the lack of testing done by the control group. Not practising TDD appeared to cause a leaning towards a lack of testing in general.

This study had many limitations but it was still a good effort to gather some real data on how TDD might work in the industry. In terms of proof that TDD works or not, this study is far from that.

An interesting paper to look at is Janzen, D. S., (2006). An Empirical Evaluation of the Impact of Test-Driven Development on Software Quality, which looks at other studies and analyses the data gathered by them. This is a long paper and I won't repeat much of it here but I will highlight some interesting data and points from it. The data is interesting but it is worth keeping in mind that it is from the early 2000s, with relatively small sample sizes.

Below are tables with summaries of findings from various papers referenced in Janzen, D. S., (2006)

Type (CE) is a controlled experiment and (CS) is a case study.

In Industry

Study	Type	No. of Companies	No. of Programmers	Quality Effects	Productivity Effects
George	CE	3	24	TDD passed 18% more tests	TDD took 16% longer
Maximilien	CS	1	9	50% reduction in defect density	minimal impact
Williams	CS	1	9	40% reduction in defect density	no change

In Academia

Study	Type	No. of Programmers	Quality Effects	Productivity Effects
Edwards	CE	59	54% fewer defects	n/a
Kaufmann	CE	8	improved information flow	50% improvement
Muller	CE	19	no change, but better reuse	no change
Pancur	CE	38	no change	no change
Erdogmus	CE	35	no change	improved productivity

The data here suggest mostly positive or neutral results from TDD. There is one exception (the previously mentioned George paper) where TDD took 16% longer than the control group but it is noted the control group wrote fewer tests in that study.

The essence of TDD as a design methodology is virtually unstudied, yet scattered early adoption has proceeded based solely on anecdotal evidence.

While empirical studies will rarely produce absolute repeatable results, such studies can provide evidence of causal relationships, implying results that will most likely occur in given contexts.

This to me is a great point. Measurement is difficult and getting fully conclusive results for something like this is almost impossible but we can still use the evidence to help make decisions.

Survey data reveals that developer opinions about the TDD process improve with TDD experience whereas opinions of test-last programming decrease.

Much of the data in that paper relies on surveys. These surveys indicated that the more knowledge and experience a developer had with testing and software development in general, the more likely the developer was to have a good experience with TDD as a process. Interesting but not a major revelation of course.

A conclusion the paper arrived at:

This research has demonstrated that TDD can and is likely to improve some software quality aspects at minimal cost over a comparable test-last approach. In particular it has shown statistically significant differences in the areas of code complexity, size, and testing.

This conclusion seems very positive in favour of TDD. A notable issue with these studies is that the sample sizes are quite small, which casts some doubt on the statistical significance of the results. That said, I would argue this work is better than nothing and gives us something to go on at least.

This research revealed a number of differences between TDD acceptance and efficacy in beginning and mature developers.

This is just an interesting observation to me. Why is there a correlation between experience and an acceptance of TDD?

A similar work, Overview of the Test Driven Development Research Projects and Experiments, also looks at varied research papers. This one is slightly newer (2012) and includes some more recent research. A significant addition is a study with IBM and Microsoft development teams.

The final conclusion reported for that study (Nagappan et al., 2008) was:

Reducing of defect density (IBM 40%, Microsoft 60% - 90%)
Increase of time taken to code feature (15% - 35%).
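As a rough way to read those two ranges together, the sketch below applies them to a made-up baseline. The baseline figures (100 hours, 20 defects) and the `tdd_tradeoff` helper are hypothetical and only illustrate the arithmetic. Pairing the low and high ends of each range is also an assumption, since the summary above does not say which time increase belongs to which team.

```python
def tdd_tradeoff(baseline_hours, baseline_defects,
                 time_increase_pct, defect_reduction_pct):
    """Apply a percentage time increase and defect reduction to a baseline.

    All inputs are illustrative; nothing here is measured data.
    """
    # Work in whole percentages to keep the arithmetic exact for round numbers.
    hours = baseline_hours * (100 + time_increase_pct) / 100
    defects = baseline_defects * (100 - defect_reduction_pct) / 100
    return hours, defects


# Hypothetical baseline: a feature that takes 100 hours and ships with 20 defects.
low_end = tdd_tradeoff(100, 20, time_increase_pct=15, defect_reduction_pct=40)
high_end = tdd_tradeoff(100, 20, time_increase_pct=35, defect_reduction_pct=90)

print("low end of both ranges :", low_end)
print("high end of both ranges:", high_end)
```

Under those assumptions the feature would cost somewhere between 115 and 135 hours and ship with somewhere between 12 and 2 defects. Whether that trade is worth it depends on what a defect costs a given team, which the figures above cannot tell us.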

Threats to the validity of the study were identified as (Nagappan et al., 2008):

Higher motivation of developers that were using TDD methodology.
The project developed by using TDD might be easier.

The findings of that paper were less positive than Janzen's, as the authors rightly noted that the results were too varied and the sample sizes too small to draw any definite conclusions.

Another controlled experiment conducted in 2012 once again concluded that TDD is probably a good thing but that more evidence is needed.

Another study from 2016 looked at the effects of TDD compared to Test Last Development (TLD).

In this paper we reported a replication of an experiment [13] in which TDD was compared to a test-last approach.

This study seemed to lean towards a verdict that TDD doesn't improve things over TLD:

Given the limitations presented in Section 5, it appears that TDD does not improve, nor deteriorate the participants’ performance with respect to an iterative development technique in which unit tests are written after production code

If you look at how the experiment was conducted, though, an interesting aspect is that programmers in both groups were asked to work iteratively. There was a fairly tight loop between test and production code for both TDD and TLD. Perhaps a tight iterative loop is more important than the order in which test and production code are written?
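To make the idea of a tight loop concrete, here is a minimal sketch in Python of one small increment. The `slugify` function and its tests are hypothetical illustrations, not material from the study. In TDD the test class would be written first and seen to fail; in an iterative test-last loop the function would come first. In both cases the two halves are finished within minutes of each other.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a post title into a URL slug (hypothetical example function)."""
    # Lowercase, then collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any hyphens left over at either end.
    return slug.strip("-")


class SlugifyTest(unittest.TestCase):
    # Each test covers one small behaviour added in one short iteration.

    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Does TDD Work"), "does-tdd-work")

    def test_trailing_punctuation_is_dropped(self):
        self.assertEqual(slugify("Does TDD Work?"), "does-tdd-work")
```

Saved to a file, this can be run with `python -m unittest` followed by the file name. The next increment (say, handling accented characters) would add one more test and a few more lines of code, whichever of the two is written first.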

In my opinion, this paper didn't provide enough data to give a conclusion one way or the other. It certainly didn't prove TDD was better or worse than writing all your tests after writing all your production code for a decent sized project.

The final study we will glance at is one I just recently came across called A dissection of the test-driven development process: does it really matter to test-first or to test-last? which is also by Fucci et al. This study actually appears to reach a similar conclusion to the previous study except that in this case, the programmers involved work in industry. Also, the comparisons were with iterative test last (ITL) development (which I think is a better term than TLD for this) and TDD.

Another interesting conclusion in that study is that shorter cycle times (time between production code and test), to a point, do appear to lead to better quality.

Some Conclusions

Looking at the data, it is fairly easy to say the answer to the question 'Does TDD Work?' is inconclusive.

If the question becomes more specific at least there are some answers.

The evidence appears to be heavily stacked in favour of short iterative test cycles, very similar to that prescribed in TDD literature, being significantly better for code quality with minimal impact on productivity. A process of small iterations with small testable blocks of code does appear to lead to more maintainable code.

Current evidence suggests that unit testing beats integration and higher level testing.

There is currently very little evidence that practising TDD is bad.

Whether writing a test first or last is better within a tight iterative cycle is inconclusive.

Testing is a discipline that takes time to be learned before seeing the real benefits of it. That is my own observation but is supported by observations in some of the studies we looked at.

Will we ever know for sure if TDD really works or not? I don't know. On a positive note, it does seem that questions like this are being asked more often these days, and more rigorous study is being done. Take the book Accelerate by Nicole Forsgren, Jez Humble and Gene Kim, for example. It's not about TDD specifically, but it's an amazing example of the work being done to figure out which techniques really work in software development, so we should at least have more accurate information to help us make decisions about subjects like this.

Personal Thoughts On All This

I spend a lot of time urging other developers to do TDD or at least to try it out. Usually, people will give it a go but have difficulty sticking with it. I try to explain that it is something you learn over time and when you get good at it, it’s great!

It occurred to me fairly often that I might be wrong. What if I am trying to get people to invest time in something that isn’t that great? What proof is there that TDD even works? Just because it seems to work for me, it does not automatically follow it will work for everybody.

That was the reason for this post. I wanted to look for proof that TDD actually works. I also wanted to read as much as I could from people who are against TDD or some aspect of it. Because I think TDD is good, I had tended to avoid them, and that’s not a good way to learn.

I personally do think varied approaches to TDD are OK. The definition of TDD shouldn't be like some ancient text to be taken absolutely literally and never altered.

To me, one of the biggest advantages (maybe the most under-appreciated advantage) of TDD is the design that it encourages. A complaint I always hear is that too many unit tests lead to code that's difficult to maintain and change. That's just a common pitfall in the learning process. Once you add TDD to your arsenal of software development techniques and learn it well, it should really help in achieving good software design. If good, clean, well-designed software is your goal, I believe TDD is something that will really help you to achieve that.
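One way to picture that design pressure is the small Python sketch below. The `greeting` function and its tests are hypothetical and written only for illustration. The point is that trying to test time-dependent behaviour pushes the clock out of the function body and into a parameter, which also makes the dependency visible to every caller.

```python
from datetime import datetime, timezone
from typing import Callable


def greeting(now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> str:
    """Return a greeting based on the hour reported by the injected clock."""
    # The clock is a parameter, so a test can supply a fixed time
    # instead of depending on when the test happens to run.
    hour = now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


def test_greeting_before_noon():
    fixed_clock = lambda: datetime(2018, 9, 23, 9, 0, tzinfo=timezone.utc)
    assert greeting(fixed_clock) == "Good morning"


def test_greeting_after_noon():
    fixed_clock = lambda: datetime(2018, 9, 23, 15, 0, tzinfo=timezone.utc)
    assert greeting(fixed_clock) == "Good afternoon"
```

A version that called `datetime.now()` directly inside the function would behave the same in production but could only be tested by running it at different times of day or by patching the standard library.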

It does sometimes feel like TDD is just a little too hard though. It reminds me of trying to do functional programming in Java. If everything is designed from the beginning to facilitate TDD, it's fairly easy. That is so rare still, and trying to do TDD all the time is really difficult in the systems we have to work with. Thanks to the tools and educational material out there today, things seem to be getting better though.

Despite the downsides, after everything I have read and experienced, I still consider TDD an excellent tool to have. I will keep using it and encouraging others to do the same.

References

George, B., Williams, L., (2003). An Initial Investigation of Test Driven Development in Industry

Janzen, D., Saiedian, H., (2005). Test-Driven Development: Concepts, Taxonomy, and Future Direction

Janzen, D. S., (2006). An Empirical Evaluation of the Impact of Test-Driven Development on Software Quality.

Sanchez, J. C., Williams, L., Maximilien, M., (2007). On the Sustained Use of a Test-Driven Development Practice at IBM

Bulajic, A., Sambasivam, S., Stojic, R., (2012). Overview of the Test Driven Development Research Projects and Experiments.

Causevic, A., Sundmark, D., Punnekkat, S., (2012). Impact of Test Design Technique Knowledge on Test Driven Development: A Controlled Experiment

Mäkinen, S., Münch, J., (2014). Effects of Test-Driven Development: A Comparative Analysis of Empirical Studies

Fucci, D., Scanniello, G., Romano, S., Shepperd, M., Sigweni, B., Uyaguari, F., Turhan, B., Juristo, N., Oivo, M., (2016). An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

Goto Fail, Heartbleed, and Unit Testing Culture

Test-driven development - Wikipedia

A dissection of the test-driven development process: does it really matter to test-first or to test-last? Fucci et al., ICSE (2017)

A dissection of the test-driven development process: does it really matter to test-first or test-last? | the morning paper

Bertrand Russell - Face to Face Interview (BBC, 1959) - YouTube

Bertrand Russell - In Our Time BBC Radio 4 - YouTube

Bertrand Russell (Part 1 of 6) Authority and the Individual: Social Cohesion and Human Nature - YouTube

Top comments (22)

Eljay-Adobe • Sep 24 '18

One more thing... here’s a gem that is worth sharing, which I believe is attributable to Chris Raser.

The Cycle of Misery
by Chris Raser

The code for the new feature is “too simple to test.”
The code grows.
Each change is “too simple to test.”

The changes start to interact.
And produce strange edge cases.
Some of these edge cases are intentional, some are accidental.
Other code is built that depends on the current, undocumented behavior.
The actual requirements are lost to time.
The code is now “not really testable.”

The code has become thousands of lines of if-else chains.
Tons of copy-and-pasted code.
Rife with side-effect booleans.
It has half a billion possible states.
And cannot be understood by mere mortals.
Nor refactored without risk to the business.

A new feature is needed.
A bright, shiny new bit of code is created.
Because adding anything more to the old code is madness.
The code for the new feature is “too simple to test.”

Jack Senechal • Mar 8 '20

Exactly this. The "Gilded Rose" kata is a great illustration of this principle in action. I've had the pleasure, on no small number of occasions, of diving into those files with many thousands of lines, some functions in the many hundreds of lines, and trying to sort them out so that you can confidently add new functionality. As Sandi Metz illustrates in her excellent talk on the Gilded Rose, sometimes the best (only?) way out of those messes is to rewrite from scratch. If you don't have good, intentional, design-oriented tests to tell you what the intent of each part of the code is, you may well be looking at several months of effort to get back to full functionality. The only way to prevent it is writing good tests at every step. Whether TDD or ITL, get those tests in early while you know what it is you've written and why! >.<

Gabriel • Sep 23 '18 • Edited

The problem of TDD is that it requires a lot of mental strength and a solid development process to keep alive.

Developers will throw it overboard as soon as their managers start breathing down their necks.

On top of this, it takes one dev getting away with poor testing for the whole team to cheat around the unit testing part as well, making it useless (actually, worse than useless, because they need to be maintained).

Edit - disclaimer: I do like TDD, although I think that the usefulness of it is not in having those unit tests in the first place but in forcing you to design your code better so it is testable, understandable, and with less redundant / complex parts.

Ruairí O'Brien • Sep 24 '18

That is all true. I find working in that scenario really taxing.

I have found developing a culture of automated testing can be really difficult but also really beneficial to a team though. I have seen teams that don't write tests at all, exhibiting many of the symptoms you mentioned, then transform into a team that really loves testing.

I absolutely agree on the point of TDD leading to better design. I found when a team embraces TDD or at least good automated testing, pull requests become a much more rewarding process and things like mob and pair programming become a lot more pleasurable and less exhausting. The conversation becomes about the best design and other real concerns.

It's a huge challenge to transform a team that does not embrace testing into one that does but if it can be done, it seems to be worth it.

Eljay-Adobe • Sep 24 '18 • Edited

In my experience — ergo anecdotal for everyone else — TDD works very well when you have the right language and tools to support it.

The tools I used were C#, Visual Studio, NCrunch, and NUnit. Not only did TDD work, it was also fun. Yes, fun. Not kidding. If you work in C# and use Visual Studio, try NCrunch and use either NUnit or xUnit.net and see for yourself.

If you have a language that supports contracts, TDD is superfluous. TDD is for languages that do not provide the ability to specify preconditions, postconditions and invariants. Contracts are something that has to be in the core language, not as a library bolt-on afterthought. Alas, there are not very many languages that provide that facility: Eiffel, D, Spec#, Midori M#. Some others I've heard support contracts, but I haven't played with them myself, such as Ada 2012, Clojure, Perl 6, Oxygene (Object Pascal dialect from RemObjects).

Full disclosure: I'm a D fanboy. Also a Midori M# fanboy, but alas that endeavor got crushed under the wheels of progress.

Some languages, like C++, are not amenable to TDD. The "one minute cycle" of make test, run test: fails, write code, run test: success, refactor, run test: success, check-in is nigh impossible when there is a lengthy build step for each test run. I hope C++20 will have contracts; fingers crossed. (Thanks to Gabriel Dos Reis, Garcia, Lakos, Meredith, Myers, and Stroustrup for working hard to make it happen!)

The other danger with TDD is putting in non-unit-tests in the mix with the TDD-style unit tests. When the unit tests get polluted with performance tests, integration tests, system tests, acceptance tests, security tests, functional tests, end-to-end tests, smoke tests, or behavior tests... then that's bad. Unit tests ought to be written by developers, and run in a debug build, and should take no more than a few seconds to run. The other kinds of tests ought to be written by the quality engineers (except for BDD tests in, say, Gherkin, which should be written by the PO perhaps assisted by BAs, within discussion by Devs & QE), should be run against an optimized release build, and may take hours or even days to run.

On my one project, the entire set of unit tests took seconds to run, and had about 70% code coverage. The integration tests took about 600 hours to run the full suite, if run in serial.

A couple videos on the topic:

I'm a fan of both James Coplien (author of Advanced C++; he's the second C++ programmer after Stroustrup), and Robert "Uncle Bob" Martin. I had a squeee of delight watching these two debate the merits of TDD:
Jim Coplien and Bob Martin Debate TDD

Another person I respect is Rainsberger. He's a strong advocate against using integration tests as a substitute for unit tests. He comes off as a bit abrasive (which I consider passionate), and his putdowns of integration tests are in the context of using them as a substitute for unit tests. He's not against integration tests used as integration tests where they are suited. The way I think of it, integration tests (written by QE) are at the scope of bricks and mortar; unit tests (written by Devs) are at the scope of electrons and quarks. Completely different domains, for different purposes.
Integrated Tests are a Scam

sweeneyb • Sep 23 '18

Great post - it highlights a lot of what I've been grappling with over the past year.

"To me one of the biggest advantages (maybe the most under-appreciated advantage) of TDD is the design that it encourages." -- This has been my biggest takeaway (and thanks for putting it so eloquently). It's not enough to just write a bunch of tests that, once passing, will assure your feature will work. It's taking the opportunity to think about consumer usage and adapting your design to be simple, consistent, and easy to use. I wish I knew how to measure "better design" or its productivity gain or bug density reduction.

TDD has also helped me realize when I write code that isn't quite clear as it becomes hard to test. There may be a valid reason, but it triggers me to consider "are there separable concerns here? Can this be simpler?"

Disclosure: I've adopted something of a hybrid to TDD and ITL. I want awesome regression coverage, but I'm very much used to thinking in terms of solutions. For now, my test code tends to grow with my production code, informing the design and validating I'm addressing edge cases. We'll see if this is just a stopover on my way to TDD.

Thanks again - good read and very fair treatment.

Zuodian Hu • Sep 24 '18

Best analytical article I've personally seen on here, awesome!

For me, having testability and testing early in development, and particularly having a tight feedback loop, is good for my morale and confidence as a developer. Maybe it's the way I was brought up as a developer, but I get really nervous when I write too many lines of code without holding it up to some standard at run-time.

Like others have mentioned, though, adding unit testing to a body of code not designed for it seems like an utter nightmare most companies will forbid.

Vasyl Boroviak • Sep 24 '18

I've just created an account in dev.to to tell you that this is the best article on TDD I've seen in my life. Great job!

Ruairí O'Brien • Sep 24 '18

Wow. Thank you!

PhatHoang21 • Jun 21 '19

Same to you.

Ruairí O'Brien • Sep 24 '18

Really good points. Fred George mentioned something similar in this talk vimeo.com/79866979

I find myself asking the same kinds of questions when I write things like cloud functions on AWS or really small microservices like you mentioned.

I do think there are definitely cases where TDD doesn't really help. Cloud functions, for example, allow you to set up automated tests that call the function and validate the result while it's deployed. I guess this is good enough.

There's an argument against integrated tests that I love here: vimeo.com/80533536

I still find myself unit testing very small services to make sure I cover possible edge cases etc but often find I delete some or all of the tests afterwards once they aren't useful any more.

I do end up working on a lot of different things like large web UI, desktop or fairly complex microservices too and getting to build very small services is still rare for me for now.

Ruairí O'Brien • Sep 24 '18

For sure. I would say though, I have been a developer joining teams that don't practice TDD or just good testing in any form and have succeeded in convincing the team to try it out. More than one team and almost without exception it has been worth it. It has just been really hard.

TDD is only a tiny aspect of the whole software life but it's one that we as developers have more power to influence if we wish to.

There are loads of scenarios where TDD doesn't make sense. I am doing work on a small startup right now, prototyping. I do TDD parts that are critical or I want to design well but don't bother with much of it since it's a prototype.

It is great when you arrive at a company that is doing TDD already but if you don't get that lucky, there is hope :) Really true comment you made though.

None None • Jan 21 '20 • Edited

Just Googling around the place it does seem as though TDD has advocates who are perhaps a little more extreme in their advocacy than would otherwise seem... well... sane. There does seem to be a religious aspect to advocates of TDD that one doesn't seem to find among test-later waterfall-type developers.

To be fair I'm new to TDD though I've heard of others using it and I've been asked about it over the years. I've been writing commercial software since 1978 and I've seen these concepts come and go, never with testable claims being made, always with generalities of "this will benefit you" either by claiming shorter time to product completion or better quality control or cost reduction and yet never have any such proposed concepts of software development in the past 40 years that have come and gone stood the test of time.

TDD looks to me to be just another unworkable, even untestable concept. It's like "Agile" which also carries a religious following with no actual science-based, proven benefits.

HOWEVER: I'm willing to see if TDD actually does hold benefits. :) It's worth at least taking a serious look, seeing if it helps quality, reduces time to develop, and produces better code. IF IT WORKS, then it's certainly something I will want to add to my bag of tricks.

Becoming a better programmer -- even at my advanced age -- is always something to work for.

Sandor Dargo • Oct 22 '18

Thanks for your article. You put in a lot of effort!

I personally like TDD, I think it's a very useful tool. And yes, it does require a lot of mental strength, but I think only because we don't accept to follow it all the time.

I mean there are certain things you simply don't do. You don't go and hit people who you don't agree with. You don't throw waste on the streets (some do...). And you don't have to make serious efforts to do so, because you learnt that's the way of doing things.

Could that be the case for TDD too?

Ruairí O'Brien • Oct 25 '18

Do you mean, TDD would be less effort if we accepted it as a good way of doing things and everybody learned it from early on?

If so, I think yes. I do believe if TDD was the norm, we might be a lot better off. I believe it intuitively but can't prove it :)

None None • Jan 21 '20

We also must consider the role that Jenkins plays performing regression testing of code bases, ensuring that everything compiles and ensuring that every automated test passes during the night when developers are presumably at home asleep.

I don't know anything about TDD, I'm only just now looking at it, however I'm looking at TDD with Jenkins firmly in mind. I'm not at all convinced that TDD holds realizable benefits yet simply having a regression test platform in place, working, and actively examined in the morning to detect faults promptly is a proven benefit.

How TDD fits in to a model of Jenkins regression development cycles is a mystery to me as yet.