If you ask developers what their most productive days look like, you’ll likely hear about those mythical days when they contributed a negative number of lines of code to their projects.
People like to celebrate these days, and for good reasons: they probably eliminated some bugs, made the code simpler and cleaned up some technical debt. As a side effect, their project now compiles faster and may even run faster.
Where I work, we have a huge monolith that evolved organically and accumulated a lot of cruft. Last year we even set aside some time to reduce our technical debt. I spent that period almost exclusively deleting code, and since then my coworkers say they are “going Wario” when they delete old code.
Deleting code is easy; knowing what to delete is much harder. The last thing I want to do is reinstate something I removed because it turns out something was still relying on it. We need reliable ways to determine, with a high degree of confidence, which code is dead.
A deep knowledge of the codebase is crucial, and it is what will allow you to get the best results, but some tools can be extremely helpful in showing you which parts of your application are no longer useful.
Knowing when and why some code was written in the first place is invaluable and gives the best insight into whether you still need it.
As an example, think of code that reads older versions of a file format, or that serves older versions of an HTTP API. In both cases, you cannot guarantee that this code is unused just by looking at it. But you might know that all the files you care about have been migrated to the new format, or that all the clients are using the new version of the API. Congratulations, you’ve just identified good candidates for a nice cleanup.
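To make the file-format case concrete, here is a minimal Python sketch of a reader that dispatches on a version header. The format, the `VERSION n` header and all function names are hypothetical. As far as any tool can tell, both branches of `read_document` are reachable; only the knowledge that no version 1 files remain tells you that `parse_v1` and its branch can go.

```python
import json


def parse_v1(text: str) -> dict:
    # Hypothetical legacy format: one "key=value" pair per line.
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
    return result


def parse_v2(text: str) -> dict:
    # Hypothetical current format: a JSON object.
    return json.loads(text)


def read_document(text: str) -> dict:
    # The first line carries the version marker, e.g. "VERSION 2".
    header, _, body = text.partition("\n")
    version = int(header.split()[1])
    if version == 1:
        # Reachable for static analysis, dead in practice once every
        # file has been migrated: a candidate for deletion.
        return parse_v1(body)
    if version == 2:
        return parse_v2(body)
    raise ValueError(f"unsupported version: {version}")
```

Once every file has been migrated, deleting the `version == 1` branch leaves `parse_v1` unreferenced, and ordinary static analysis will then flag it.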
In a legacy codebase you’re likely to find many different ways of doing the same thing. It might be a good idea to standardize your application on the latest approach. This gives you access to all the latest features and also allows you to retire the code related to the old approaches. Unfortunately, knowing how to migrate to the new approach is usually not straightforward, otherwise someone would have done it already. In that case, you need to understand how the two approaches differ, what the use case still relying on the old code really requires, and how you can replicate that functionality using the new code. It is usually possible, but it requires a lot of understanding and a fair amount of effort.
At work we mainly use C#, so we can leverage static analysis, such as the analysis performed by the compiler itself, by ReSharper and by NDepend, to identify dead code. Cleaning it up is a good start, but it won’t get us very far: most of the unused code that can be removed is technically reachable.
For instance, static analysis tools would not flag any of the code from the previous examples. As far as they know, I might still have an old file around, so they must produce conservative results: they assume that if there is a code path leading to a line of code, that line will eventually be executed.
That’s a perfectly valid approach: I would be very worried if these tools gave me unsound results.
Still, static analysis is very useful and extremely effective once I have found a feature I can eliminate and want to get rid of all its related code. But first we need to find the entry point to that feature by other means. Once the entry point is deleted, the rest becomes unreachable and static analysis will flag it.
If we cannot tell what code is actually used just by analyzing it, can we learn more by instrumenting it to monitor how it is actually used?
The answer is yes, and there are several approaches that can give very good results, but you need to watch out for false positives. Dynamic analysis is not exhaustive, so you need to make sure you’ve waited long enough before declaring a piece of code dead.
Recording in a database every command-line argument passed to your application will tell you which options are actually in use. There is no point in supporting the functionality behind a command-line switch if nobody ever uses it.
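A minimal Python sketch of this idea, using the standard-library `sqlite3` module as a stand-in for whatever database you already have. The `cli_usage` table and the helper functions are hypothetical. The sketch stores only switch names, since values and positional arguments may contain file paths or other sensitive data.

```python
import sqlite3
from datetime import datetime, timezone


def _ensure_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cli_usage (switch TEXT NOT NULL, used_at TEXT NOT NULL)"
    )


def record_invocation(conn: sqlite3.Connection, argv: list[str]) -> None:
    """Store every switch in argv (skipping argv[0]) with a UTC timestamp."""
    _ensure_table(conn)
    now = datetime.now(timezone.utc).isoformat()
    # Keep only switch names: "--format=csv" is stored as "--format";
    # anything not starting with "-" (paths, separate values) is dropped.
    switches = [arg.split("=", 1)[0] for arg in argv[1:] if arg.startswith("-")]
    conn.executemany(
        "INSERT INTO cli_usage (switch, used_at) VALUES (?, ?)",
        [(switch, now) for switch in switches],
    )
    conn.commit()


def unused_switches(conn: sqlite3.Connection, known_switches: list[str]) -> list[str]:
    """Return the known switches that have never been recorded."""
    _ensure_table(conn)
    used = {row[0] for row in conn.execute("SELECT DISTINCT switch FROM cli_usage")}
    return sorted(set(known_switches) - used)
```

In the application’s entry point you would call something like `record_invocation(sqlite3.connect("usage.db"), sys.argv)`, where the file name is just a placeholder. After enough time has passed, `unused_switches` lists the options nobody has used.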
If you suspect that something is dead, you can use the tombstoning approach: place a “tombstone”, a call that records when it is executed, in the suspect code and see whether it ever fires. It is a bit more invasive, but it will give you important information at a finer level of detail.
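A minimal tombstone in Python might look like the sketch below. `tombstone`, `TOMBSTONE_HITS` and `export_legacy_report` are hypothetical names, and the in-memory dictionary stands in for the database or log aggregator a real deployment would need to collect hits from every machine. Putting the placement date in the tag makes it easy to tell how long a silent tombstone has been waiting.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("tombstones")

# Hypothetical in-memory sink: tag -> timestamps of every hit.
# A real system would persist these centrally.
TOMBSTONE_HITS: dict[str, list[datetime]] = {}


def tombstone(tag: str) -> None:
    """Record that a suspect code path has just been executed."""
    now = datetime.now(timezone.utc)
    TOMBSTONE_HITS.setdefault(tag, []).append(now)
    logger.warning("tombstone hit: %s at %s", tag, now.isoformat())


def export_legacy_report(rows: list[str]) -> str:
    # Hypothetical suspect function; the tag records where and when
    # the tombstone was placed.
    tombstone("export_legacy_report placed 2024-01-15")
    return ";".join(rows)
```

Because `tombstone` is a plain function call, it can also go inside a single suspect branch rather than at the top of a whole function.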
Once you’ve collected data for a while, you will know what is definitely used, what is likely to be unused and when something was last used. This is precious intelligence and it will be extremely helpful in identifying what you should try to delete next.
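A small report can turn collected tombstone data into those categories. This Python sketch assumes you know when each tombstone was placed and when it last fired. The function name and the 30-day and 180-day thresholds are arbitrary placeholders, not recommendations. The `min_silence` window guards against the false positives of dynamic analysis: a tombstone that has never fired is only reported as likely unused after it has been in place long enough.

```python
from datetime import datetime, timedelta


def classify(
    placed_at: dict[str, datetime],
    last_hit: dict[str, datetime],
    now: datetime,
    recent: timedelta = timedelta(days=30),
    min_silence: timedelta = timedelta(days=180),
) -> dict[str, str]:
    """Classify each tombstone tag from its placement and last-hit times."""
    report = {}
    for tag, placed in placed_at.items():
        seen = last_hit.get(tag)
        if seen is not None and now - seen <= recent:
            report[tag] = "used"
        elif seen is None and now - placed >= min_silence:
            # Silent for the whole observation window.
            report[tag] = "likely unused"
        elif seen is None:
            # Placed too recently to draw any conclusion.
            report[tag] = "not enough data yet"
        else:
            report[tag] = f"last used {seen.date().isoformat()}"
    return report
```

Sorting the “last used” entries by date is a simple way to pick what to try deleting next.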
Deleting code is not always straightforward, but with good knowledge of the application you’re working on and the help of some tooling, it is possible to achieve very good results.
If you’ve read this far, I recommend Greg Young’s The art of destroying software. It won’t teach you how to delete code, but it will give you some ideas on how to write new code optimized for deletability, so that deleting it once you no longer need it is as easy as possible.