Duplicated code: should we always try to avoid it?

#dotnet #architecture #csharp

Introduction

Duplicated code and copy-paste are almost always the first things criticized in code. It can be said that they are often seen as a red traffic light signal, causing concern and a desire to eliminate it, often prioritizing this over other tasks. But is duplicated code always a cause for alarm?

I'll say unequivocally: not always!

Duplicated code is simply a signal to pay attention to that code fragment, carefully analyze the architecture and the tasks that the code performs in both (or more than both) cases, and only then, if necessary, refactor it!

Duplicated code in Anemic Models

Let's consider a trivial task: We have a relational database from which we select data using a service that transforms it into a DTO object and let it look like this:

    public class ClientDto
    {
        public string ClientName { get; set; }
        public string ClientDepartment { get; set; }
        public string CompanyName { get; set; }
        public string? Phone { get; set; }
        public string? Email { get; set; }
        public string? Website { get; set; }
        public string? MailingAddress { get; set; }
        public string? CityStateZip { get; set; }
        public decimal Balance { get; set; }
        public int DistanceMeters { get; set; }
    }

Let's imagine that we invoke this method from an ASP.NET application controller, and this controller method creates a View model and passes this model to the View itself. In most cases, this View model class will be remarkably similar to our DTO class or even identical to it, for example:

    public class ClientVm
    {
        public string ClientName { get; set; }
        public string ClientDepartment { get; set; }
        public string CompanyName { get; set; }
        public string? Phone { get; set; }
        public string? Email { get; set; }
        public string? Website { get; set; }
        public string? MailingAddress { get; set; }
        public string? CityStateZip { get; set; }
        public decimal Balance { get; set; }
        public int DistanceMeters { get; set; }
   }

Here we see all signs of code duplication: if a new field appears, we have to add it to both classes; if a field is renamed, we must remember to rename it for both classes (otherwise, we'll see unexpected results, especially if using tools like AutoMapper where we don't explicitly assign each field), and so on.

But don't rush to sound the alarm, blaming yourself for almost making a Junior’s mistake and urgently create a new abstract class (let's call it ClientAbstract) from which both classes would inherit. Because in this case, duplication of code is actually correct! Yes, you heard it right. Between the two evils—duplication and mixing responsibilities—choose the lesser one, which is duplication. Because duplication is visible here on the surface and doesn't lead to far-reaching consequences in architecture.

I could argue that creating an abstract class for these two would mix different architecture layers, but you could counter that we can place this abstract class in a separate assembly without any dependencies, and you would be partly right (only partly because the service returning ClientDtorepresents high-level business rules primarily dictated by the business itself and are immutable in the software context, defining how the application should behave rather than the other way around, whereas ClientVmis a low-level web implementation, merely a frontend that can change in any moment, not related to the business as such).

The Single Responsibility Principle (SRP) tells us that a class should have only one reason to change. For a business logic service, this reason is changes in business rules. For the web layer, it's about how this specific application should present the data. The web layer is likely to change much more frequently and unpredictably than the business logic. Each change in the web layer would likely require to change the foundational abstract class, ClientAbstract, which should be as stable as the business logic itself. The Stable Dependencies Principle (SDP) tells us that fewer stable components, which are more prone to change, should depend on more stable ones. In our case, the web layer should depend on the business logic, not the other way around.
Imagine that tomorrow clients decide that the variable Balance, currently displayed by you as a decimal number, doesn't look right for web pages (and I fully agree with them on this!) and needs to be displayed with a '$' sign, with separators, and up to two decimal places. Additionally, DistanceMetersshould be displayed either in meters or kilometers. Thus, the view model class takes the following form (with these transformations being performed at the controller level):

    public class ClientVm
    {
        public string ClientName { get; set; }
        ......
        public string BalanceDollars { get; set; }
        public string Distance { get; set; }
   }

And you will have to remove these 2 fields from the abstract ClientAbstract, add them directly to ClientDto (as they were declared in ClientAbstract), and then test everything to try to ensure that nothing is broken (but programming is not mathematics in this sense, in programming you cannot simply prove that everything is fine like a theorem in mathematics; we can minimize the probability of error, but we cannot eliminate it. Any changes still potentially carry the probability of error. The principle of "if it works, don't touch it" is exactly about this).

This simple example shows that sometimes there is no need to pathologically try to eliminate duplicated code; sometimes it is the norm.

Duplicated code in Logic and Algorithms

I encountered the viewpoint that the DRY principle applies to algorithms rather than to anemic models. However, let’s consider the following example:
Imagine a bacterium that splits into two every 1 minute. After 1 minute, there are 2 bacteria (the first generation); after 2 minutes, there are 4 bacteria (the second generation), and so on. We can write our genius algorithm like this:

   public int GetBacteriaCount(int generation)
        {
            return 2 ^ generation;
        }

Excellent. Also, in another part of our project, we need to calculate how many ancestors a human has in a given generation. In the first generation, there are 2 ancestors (father and mother); in the second generation, there are 4 (2 grandfathers and 2 grandmothers), and so on.

        public int GetAncestorCount(int generation)
        {
            return 2 ^ generation;
        }

The second algorithm is significantly similar to the first! We recall our magical “Don’t repeat yourself” and combine both into one. Everything works!
But what combines these two algorithms? Not in terms of mathematics and calculations, but with regard to business rules? Does the process of bacterial division have anything in common with our family tree? Are these two processes (one of which is even already completed, in the past) governed by the same rules? No!
When we try to calculate our ancestors in the 50th generation (which is only 1000-1500 years ago), we end up with a huge number, far exceeding the count of all people living today. Why is this? Because our ancestral lines overlap many times. We didn’t have billions of great-great-…-grandmothers and great-great-…-grandfathers 1000 years ago; instead, the actual number was much smaller (and many of these ancestors are common to multiple people! Love your neighbor, he/she is your relative anyway  ). Therefore, we need a coefficient to account for this overlap, such as:

        public int GetAncestorCount(int generation, double k)
        {
            return k * (2 ^ generation);
        }

Alternatively, it could be an external service injected into our class that returns the coefficient based on the generation, such as:

        public async Task<int> GetAncestorCountAsync(int generation)
        {
    var k = await _extService.GetCoefficientAsync(generation);
               return k * (2 ^ generation);
        }

Or it could be another method for calculating the correction.
What about bacterial division? Bacteria are limited by their environment, nutrients, and other time-dependent factors. So, we can represent it as follows:

        public int GetAncestorCount(int generation, int minutes)
        {
              .....
        }

Or again, with the injection of some external service. In any case, we see that the development paths of these algorithms have diverged, even though they were initially identical! This is because they are governed by entirely different business rules and have different reasons for changes. Therefore, this is also a case where it is better to keep duplicated code rather than trying to refactor this code.

What about DRY principle?

By the way, let me say a few words about the DRY principle: If you think about the meaning of 'Don’t Repeat Yourself' calls for 'not repeating,' which doesn’t explicitly imply 'duplicated code.' 'Not repeating' means not creating THE SAME code. This can refer to duplicated code where refactoring is possible and necessary, as well as completely different code that performs the same task. For example, two different algorithms that solve the same problem! This is a violation of the DRY principle, unlike the cases described earlier in this article

Conclusion

Again, when you see duplicated code, ask yourself first: "Will it really be changed for the same reason in both cases?" If yes, then it probably should be refactored to eliminate duplication. If not, perhaps those two similar algorithms that perform completely different tasks should remain similar? Perhaps if we need to add changes to one of them tomorrow, we should be able to do so without affecting the other?

P.S. In zoology, there is a concept known as "convergent evolution" - when completely different species of animals (plants/fungi) have external similarities or other similar features. But we cannot claim that snakes, for example, are close relatives of worms, or that whales are close relatives of fish, right? Snakes are much closer to turtles, even though their body shape resembles worms more. But this similarity is only due to a similar mode of movement and nothing more; in other aspects, they are quite different. The same applies to class inheritance. When you see common fields or behavior among several classes, ask yourself: "Is this similarity only superficial and only at this moment? How might both classes evolve under different external conditions, and are there common grounds for them to evolve together, meaning changes in one automatically necessitate similar changes in the other?