The Fallacy of DRY

Jeroen De Dauw ・ Originally published at entropywins.wtf

Originally posted on my blog as The Fallacy of DRY.

DRY, standing for Don’t Repeat Yourself, is a well-known design principle in the software development world.

It is not uncommon for removal of duplication to take center stage via mantras such as “Repetition is the root of all evil”. Yet while duplication is often bad, the well-intended pursuit of DRY often leads people astray. To see why, let’s take a step back and look at what we want to achieve by removing duplication.

The Goal of Software

First and foremost, software exists to fulfill a purpose. Your client, which can be your employer, is paying money because they want the software to provide value. As a developer it is your job to provide this value as effectively as possible. This includes tasks beyond writing code to do whatever your client specifies, and might best be done by not writing any code. The creation of code is expensive. Maintenance of code and extension of legacy code is even more so.

Since creation and maintenance of software is expensive, the quality of a developer's work (when just looking at the code) can be measured by how quickly functionality is delivered in a satisfactory manner, and by how easy the system is to maintain and extend afterwards. Many design discussions arise about trade-offs between those two measures. The DRY principle mainly situates itself in the latter category: reducing maintenance costs. Unfortunately, applying DRY blindly often leads to increased maintenance costs.

The Good Side of DRY

So how does DRY help us reduce maintenance costs? If code is duplicated, and it needs to be changed, you will need to find all places where it is duplicated and apply the change. This is (obviously) more difficult than modifying one place, and more error prone. You can forget about one place where the change needs to be applied, you can accidentally apply it differently in one location, or you can modify code that happens to be the same at present but should nevertheless not be changed due to conceptual differences (more on this later). This is also known as Shotgun Surgery. Duplicated code also tends to obscure the structure and intent of your code, making it harder to understand and modify. And finally, it conveys a sense of carelessness and lack of responsibility, which begets more carelessness.
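As a minimal sketch of the kind of duplication that is worth removing, consider one business rule that two pieces of code need. The rule, the amounts and all names below are made up for illustration. Because both callers go through a single function, a change to the rule happens in one place and cannot be applied inconsistently.

```python
# Hypothetical business rule: orders of 5000 cents or more ship for free.
FREE_SHIPPING_THRESHOLD_CENTS = 5000
STANDARD_SHIPPING_CENTS = 495


def qualifies_for_free_shipping(order_total_cents: int) -> bool:
    """The single place where the rule lives."""
    return order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS


def shipping_cost_cents(order_total_cents: int) -> int:
    # First consumer of the rule: checkout.
    if qualifies_for_free_shipping(order_total_cents):
        return 0
    return STANDARD_SHIPPING_CENTS


def cart_banner(order_total_cents: int) -> str:
    # Second consumer of the rule: the message shown in the cart.
    if qualifies_for_free_shipping(order_total_cents):
        return "Free shipping!"
    missing = FREE_SHIPPING_THRESHOLD_CENTS - order_total_cents
    return f"Add {missing} cents for free shipping"
```

If the threshold comparison were copied into both consumers instead, raising the threshold would require finding and editing each copy, which is the Shotgun Surgery described above.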

Everyone who has been in the industry for a little while has come across horrid procedural code, or perhaps pretend-OO code, where copy-paste was apparently the favorite hammer of its creators. Such programmers should indeed heed DRY, because what they are producing suffers from the issues we just went over. So where is The Fallacy of DRY?

The Fallacy of DRY

Since removal of duplication is a means towards more maintainable code, we should only remove duplication if that removal makes the code more maintainable.

If you are reading this, presumably you are not a copy-and-paste programmer. Almost no one I have ever worked with is. Once you know how to create well-designed OO applications (i.e. by knowing the SOLID principles), are writing tests, etc., the code you create will be very different from the work of a copy-paste programmer. Even when adhering to the SOLID principles (to the extent that it makes sense) there might still be duplication that should be removed. The catch here is that this duplication will be mixed together with duplication that should stay, since removing it makes the code less maintainable. Hence trying to remove all duplication is likely to be counterproductive.

Costs of Unification

How can removing duplication make code less maintainable? If the costs of unification outweigh the costs of duplication, then we should stick with duplication. We’ve already gone over some of the costs of duplication, such as the need for Shotgun Surgery. So let’s now have a look at the costs of unification.

The first cost is added complexity. If you have two classes with a little bit of common code, you can extract this common code into a service, or if you are a masochist extract it into a base class. In both cases you got rid of the duplication by introducing a new class. While doing this you might reduce the total complexity by not having the duplication, and such extracting might make sense in the first place for instance to avoid a Single Responsibility Principle violation. Still, if the only reason for the extraction is reducing duplication, ask yourself if you are reducing the overall complexity or adding to it.

Another cost is coupling. If you have two classes with some common code, they can be fully independent. If you extract the common code into a service, both classes will now depend upon this service. This means that if you make a change to the service, you will need to pay attention to both classes using the service, and make sure they do not break. This is especially a problem if the service ends up being extended to do more things, though that is more of a SOLID issue. I’ll skip going over the results of code reuse via inheritance to avoid suicidal (or homicidal) thoughts in myself and my readers.
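A small Python sketch of what this coupling looks like, under the assumption of two otherwise unrelated classes that both need to turn a title into a slug. `SlugService`, `ArticleUrlBuilder` and `ExportFileNamer` are hypothetical names invented for this illustration.

```python
class SlugService:
    """Hypothetical service extracted purely to remove duplication."""

    def slugify(self, text: str) -> str:
        return text.strip().lower().replace(" ", "-")


class ArticleUrlBuilder:
    """Web-facing code: builds public URLs."""

    def __init__(self, slugs: SlugService) -> None:
        self._slugs = slugs

    def url_for(self, title: str) -> str:
        return "/articles/" + self._slugs.slugify(title)


class ExportFileNamer:
    """Back-office code: names export files."""

    def __init__(self, slugs: SlugService) -> None:
        self._slugs = slugs

    def file_name_for(self, title: str) -> str:
        return self._slugs.slugify(title) + ".csv"
```

Before the extraction, changing how export files are named could not affect public URLs. Now any edit to `slugify` has to be checked against both consumers, even though they have nothing else in common.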

DRY = Coupling
– A slide at DDDEU 2017

The coupling increases the need for communication. This is especially true in the large, when talking about unifying code between components or applications, and when different teams end up depending on the same shared code. In such a situation it becomes very important that it is clear to everyone what exactly is expected from a piece of code, and making changes is often slow and costly due to the communication needed to make sure they work for everyone.

Another result of unification is that code can no longer evolve separately. If we have our two classes with some common code, and in the first a small behavior change is needed in this code, this change is easy to make. If you are dealing with a common service, you might do something such as adding a flag. That might even be the best thing to do, though it is likely to be harmful design-wise. Either way, you start down the path of corrupting your service, which has now turned into a frog in a pot of water that is being heated. If you unified your code, this is another point at which to ask yourself if that is still the best trade-off, or if some duplication might be easier to maintain.
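To illustrate the flag scenario, here is a hedged Python sketch. The price formats and all function names are invented, and amounts are assumed to be non-negative cents. The first function is a shared helper after one caller needed slightly different output; the two functions after it are the duplicated alternative.

```python
def format_price(amount_cents: int, *, for_invoice: bool = False) -> str:
    """Hypothetical shared helper after a flag was bolted on for one caller."""
    units, cents = divmod(amount_cents, 100)
    if for_invoice:
        # Only the invoice code passes this flag; every other caller ignores it.
        return f"EUR {units}.{cents:02d}"
    return f"{units},{cents:02d} EUR"


# The duplicated alternative: two small functions that can evolve separately.
def format_price_for_webshop(amount_cents: int) -> str:
    units, cents = divmod(amount_cents, 100)
    return f"{units},{cents:02d} EUR"


def format_price_for_invoice(amount_cents: int) -> str:
    units, cents = divmod(amount_cents, 100)
    return f"EUR {units}.{cents:02d}"
```

One flag is harmless on its own. The concern is the next request, and the one after that, each adding another branch to a function that every caller has to understand.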

You might be able to represent two different concepts with the same bit of code. This is problematic not only because different concepts need to be able to evolve individually, it’s also misleading to have only a single representation in the code, which effectively hides that you are dealing with two different concepts. This is another point that gains importance the bigger the scope of reuse. Domain Driven Design has a strategic pattern called Bounded Contexts, which is about the separation of code that represents different (sub)domains. Generally speaking it is good to avoid sharing code between Bounded Contexts. You can find a concrete example of using the same code for two different concepts in my blog post on Implementing the Clean Architecture, in the section “Lesson learned: bounded contexts”.
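A tiny Python sketch of two concepts that happen to share an implementation today. Both rules and both names are hypothetical; the point is only that the code is identical while the concepts are not.

```python
import re


def is_valid_username(value: str) -> bool:
    # Accounts context: 3 to 20 letters or digits.
    return re.fullmatch(r"[A-Za-z0-9]{3,20}", value) is not None


def is_valid_coupon_code(value: str) -> bool:
    # Sales context: the same pattern today, for unrelated reasons.
    return re.fullmatch(r"[A-Za-z0-9]{3,20}", value) is not None
```

Merging these into one `is_valid_identifier` function would remove a line of duplication, but it would also hide the fact that usernames and coupon codes are governed by different people and are free to change independently.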

DRY is for one Bounded Context
– Eric Evans in Good Design is Imperfect Design

Conclusion

Duplication itself does not matter. We care about code being easy (cheap) to modify without introducing regressions. Therefore we want simple code that is easy to understand. Pursuing removal of duplication as an end-goal rather than looking at the costs and benefits tends to result in a more complex codebase, with higher coupling, higher communication needs, inferior design and misleading code.

Jeroen De Dauw

@jeroendedauw

I'm a Software Craftsmanship advocate best known for my contributions to Wikidata and Semantic MediaWiki, and my maintenance of various open source projects.

Discussion

I have often noticed that applying DRY to two things which are similar, but not the same, becomes a painful maintenance experience. Trying to maintain a unified abstraction between the two will often be harder due to coupling than duplication will. There is also the idea that we often refactor similar code into a single abstraction for DRY purposes too early, before the abstraction becomes clear. Then we fight with an ill-fitting abstraction as we maintain the code. I like how Sandi Metz put it: "Duplication is better than the wrong abstraction."

I tend to use DRY only when it is obviously applicable. E.g. I have the exact same function in two places. But if there are some variances, I go ahead with duplication until it becomes clear that they should stay different or that they are the same.

 

Good article.

Along similar lines, I was told "DRY is about knowledge, not code".
Duplicate code isn't a huge issue; repeating business logic in multiple places is more of a problem for maintainability.

 

+1 for pointing out the coupling and complexity issue that may come with 'DRY' abstraction. Some concrete examples of where you might refrain could help the discussion. Where do we draw the line?

 

Do you have any suggestions for good examples?

I did describe some scenarios using words rather than code. These are quite simple ones, yet the code would take up a bunch of space. Do you think it would be better to provide code examples for those anyway?

I suppose I could add this one somewhere:

for ($i = 1; $i < 4; ++$i) {
    doAction($i);
}

vs

doAction(1);
doAction(2);
doAction(3);
 

I like your example, it's very concise. I'd definitely prefer the latter version.

Now I owe you an example trying to get at 'where to draw the line'.

Let's say I have 2 types:

public class SupportFee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
}

public class HardwareFee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
}

If I were to write code to maintain these sets I would end up with for instance 2 controllers and a couple of views to support them. They'll be very similar, but it's okay. At this point it would not bother me.

But now we add a couple more types, for instance DeviceType, ConfigurationType and ImageType. They have the same properties and adhere to the following interface:

public interface IOptionType
{
    int Id { get; set; }
    string Name { get; set; }
    bool Active { get; set; }
}

The example is taken from an actual application I wrote. It has about 10 types that adhere to the IOptionType. Also, all of them are small sets. At this point it makes sense to have an abstract controller that I inject with mapping objects, providing default views. It's just cumbersome to write the same code over and over again.

Now time passes, feature requests come in... If over time any type deviates from the IOptionType or outgrows being a small set I can still decide to write dedicated code for it.

So there's an example where the fine line is around 3 or more similar types. The DRY abstraction helped me a lot.

I think that the 3 or more criterion is nice as a rule of thumb.
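For readers who want to see the shape of such a shared abstraction, here is a rough Python sketch. It is not the application described above: the in-memory storage, `OptionController` and its methods are assumptions made for illustration, and `OptionType` is a Python stand-in for the `IOptionType` interface.

```python
from dataclasses import dataclass
from typing import Dict, Generic, List, Protocol, TypeVar


class OptionType(Protocol):
    """Python stand-in for the IOptionType interface from the comment."""

    id: int
    name: str
    active: bool


T = TypeVar("T", bound=OptionType)


class OptionController(Generic[T]):
    """Hypothetical shared controller: one implementation for every small set."""

    def __init__(self) -> None:
        self._items: Dict[int, T] = {}

    def save(self, item: T) -> None:
        self._items[item.id] = item

    def active_names(self) -> List[str]:
        return sorted(item.name for item in self._items.values() if item.active)


@dataclass
class SupportFee:
    id: int
    name: str
    active: bool


@dataclass
class DeviceType:
    id: int
    name: str
    active: bool


fees = OptionController[SupportFee]()
fees.save(SupportFee(1, "Gold", True))
fees.save(SupportFee(2, "Legacy", False))
print(fees.active_names())
```

A type that later outgrows the shared shape can be given its own dedicated controller without touching the others.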

Thanks for the more verbose example.

Even with all this description I do not understand the situation well enough to say much about it, which makes me think this does not work as an illustration of how to decide if unification is desired or not.

One thing I'm wondering about is why not do something like

public class Thing
{
    public int Id { get; set; }
    public string Type { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
}

In other words, having a field that indicates the type, and then naming the class to something meaningful in your domain. Presumably also without the mutability part, so you have a nice Value Object (assuming no domain logic belongs on the thing itself).
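A minimal Python sketch of that immutable variant, using a frozen dataclass. `Option` is a placeholder name; the real name should come from the domain.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class Option:
    # Value Object: compared by value, and fields cannot be reassigned.
    id: int
    type: str
    name: str
    active: bool


gold = Option(1, "SupportFee", "Gold", True)

# "Changing" a value object means creating a new one.
deactivated = replace(gold, active=False)

try:
    gold.active = False
except FrozenInstanceError:
    print("Option instances are immutable")
```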

Generally I find it odd to have objects that just contain data have an interface for them, such objects being mutable, or interfaces containing an interface prefix or suffix.

Anyway, I will keep the lack of examples for this post in the back of my mind and hopefully come back in some time to mitigate it.

I think this is a fine example where unification makes sense. The abstraction saved me a lot of work, outweighing coupling or complexity issues, whereas with only 2 OptionTypes it would've been silly to go that route.

The Thing class wouldn't help our code much. Even though they're small sets, they have discrete types and tables. And over time some of these types did change and get extra properties. Also I prefer to have meaningful names, both in the database and the code. This also helps the guy that makes the occasional report using the database.

But that is another discussion really. Just like the I-prefix.

 

Good article! Four thoughts:

(1) Dry spaghetti is still spaghetti.

(2) Inheritance, abstract classes, and the like (as well as principles of generic programming) can all be used to create very clean and maintainable code. (Check out my company's PawLIB library for one such example...although I might be biased?)

However, you are correct - those same tools can just as easily be used to make spaghetti. It would be hard to write down all the rules about that here, but there are definitely more than just "DRY".

(3) The other issue that DRY can introduce into the code is a plethora of instruction-cache misses, which occur virtually every time you are calling the 'jump' instruction under the hood (function calls, etc.) We can't avoid jumps altogether - we'll need a lot of them in good code - but either having too many jumps or having them in the wrong places can tank your program's performance.

(4) At least in C and C++, macros are an underappreciated tool at our disposal. (Yes, I hear readers screaming in agony, but hear me out.) If a piece of code is heavily used, but either performance-critical, or otherwise not well suited to isolation into a separate function, macros work well for preventing unnecessary duplication, without introducing instruction cache misses (the cost is increased compile time).

Of course, again, macros can be abused to create spaghetti code.

 

I'm not convinced that using inheritance to deduplicate code is bad. I did this in the past and it worked quite fine for me. No need to demonize the tool, it has its use and "prefer composition over inheritance" points to that as well. Prefer, but not stick to it forever. Composition also has its toll. You might need to supply the service from the outside (dependency injection), store it as a singleton or use a service locator. If the duplicated code isn't that big, the increased complexity of the approach is obvious.

I would say, it requires experience. In time, creating your own programs, maintaining them, looking at someone else's code will bring you there. You'd be able to choose the right way without going to extremes like "no inheritance anywhere!" or "dependency injection everywhere!" or "I saw that line already, I'm moving it into a function! No, to a separate class!".

Every tool is good for its own job.

 

Fully agree that it takes experience to build good judgment and that there tend to be exceptions to rules.

I'm being a bit flippant about the code reuse via inheritance topic because I so often see people shoot themselves in the foot. There are times when it is a good choice though, but in my experience they do not occur often.

As to every tool being good for its own job: the thing with inheritance in most languages is that it does two completely different things: code reuse and sub-typing.
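A short Python sketch of that double effect, using the classic stack example. Both classes are illustrative. The first inherits from `list` only to reuse its storage code, and becomes a `list` subtype as a side effect; the second reuses the same code through composition.

```python
class Stack(list):
    """Inherits from list for code reuse, and becomes a list subtype too."""

    def push(self, item):
        self.append(item)


class ComposedStack:
    """Reuses list through composition, without becoming a list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


inherited = Stack()
inherited.push(1)
inherited.push(2)
# Allowed because Stack is a list: this bypasses the stack discipline.
inherited.insert(0, 99)

composed = ComposedStack()
composed.push(1)
composed.push(2)
```

`inherited` can be passed to any code that expects a list and exposes every list method, whether or not that was intended. `composed` only exposes `push` and `pop`.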

Anyway, that really is not what my post is about, and is a topic that takes more than a few short lines to do justice.