Technical Debt (4 Part Series)
Technical debt: We all have it and every developer worth their salt wants to pay it down, but how do we actually manage this process?
In my article on communicating technical debt, I compared paying down technical debt to the importance of crop rotation in farming. If you keep working your field (the codebase) season after season to harvest a large load of crops (completing projects, adding features, etc.) and don't allow that field to recover for a season (paying down technical debt), it starts to lose its quality and overall yield over time.
This remains an apt metaphor for software development, and it hints at strategies we can consider when paying down technical debt.
Surprisingly, there's a wide variety of ways to pay down technical debt. This is very helpful, since it gives us a lot of options to consider for planning purposes.
For the purposes of this article, we're going to assume that you're working in an agile development environment, though with a little creative reinterpretation much of this is still relevant to those using other methodologies.
In the most literal interpretation of crop rotation, we could stop feature work for one sprint out of every four or so and only pay down technical debt in that sprint.
Pros:
- Developers all share the same morale boost of focusing only on paying down technical debt
- Developers can coordinate work to pay down larger portions of technical debt in tandem
- The business is encouraged to look at the results of the technical debt work, which exposes both its importance and how much work remains
Cons:
- Merge conflicts are going to abound as people make larger sets of architectural changes
- With so much chaos and instability, it may become difficult to determine when something broke unless good test coverage is in place
- Support incidents don't stop just because you're working on technical debt. Without people available to handle support escalations, the technical debt work is guaranteed to be disrupted by support interruptions.
In this model, the agile team reserves a fixed point count or percentage of overall capacity for the sprint in order to pay down debt on an ongoing basis. For example, every sprint a team might take 5 story points of assorted tech debt payoff work.
Pros:
- This makes paying down tech debt part of the ongoing culture of the organization
- Steady tech debt work keeps debt from piling up to the point where more drastic work is needed later
Cons:
- If the sprint needs to change, tech debt work is likely to be the first thing removed
- Capping tech debt work at a small portion of each sprint makes it harder to tackle the occasional larger piece of technical debt
This is a hybrid of the last two approaches. Each sprint, one developer is picked to work on paying down technical debt while everyone else works entirely on regular work.
Pros:
- You keep some of the same morale boost of a dedicated sprint for paying down technical debt
- Because everyone else is still working on standard items, product management doesn't need to put initiatives on hold
- Support interruptions don't necessarily have to interrupt the technical debt work
- Larger pieces of technical debt can be paid off in a single sprint
Cons:
- Larger amounts of focused work on technical debt can still result in significant merge conflicts, though the risk isn't as great
- The developer assigned to technical debt is still the most likely to be interrupted and pulled in if extra capacity is needed
Under this model, as developers estimate and plan work, they plan on cleaning up the surrounding code and paying off incidental technical debt already present in that area. This follows the Boy Scout rule of always leaving a campsite (the codebase) cleaner than you found it.
Put another way, this implies that as code is touched, it becomes better. The code that is touched frequently is where you're going to be paying the greatest interest on technical debt, so it makes sense to pay things down in areas that are being worked on.
Malcolm Gladwell discusses a similar concept in his book The Tipping Point, citing an example from the New York City subway system. The transit authority found that by isolating subway cars, cleaning them of graffiti, and then ensuring that they remained free of graffiti, they could counter the broken windows effect, in which people assume nobody cares and crime runs rampant. By extension, cutting down on graffiti and fare jumping also cut down on violent crime in the subway system.
Applied to our codebase, this means ensuring that areas of code are cleaned up and their technical debt repaid as they are touched.
From the paragraphs above you can probably guess that I'm a fan of this approach, but let's look at the pros and cons.
Pros:
- Technical debt is paid down in the areas that are naturally touched most frequently
- You no longer need to "make room for" paying off technical debt; it's just part of the process
- Merge conflicts are minimized, since changes occur only in the isolated areas already being worked on
Cons:
- Not well suited to larger, system-wide changes
- Inflates story points, since extra work is done for every ticket. This effectively reduces the amount of work that can be done in every sprint
The strategies above pay off technical debt by replacing the system part by part, as in the Ship of Theseus thought experiment. But what if that's not enough? What if you don't have time to replace your entire application bit by bit and need to make more drastic changes?
Here are a few ideas that might help.
Under this methodology, you split up a monolithic application into smaller applications. This is often paired with Domain Driven Design and/or microservices, but the core of the approach is that if an application is too large to replace, split it into smaller chunks until they can be feasibly replaced, then replace each part, chunk by chunk.
This can also be accomplished using Martin Fowler's Strangler Application Pattern, in which a new application receives the same requests the old one did and calls out to the legacy system for each one until that call has a modern replacement ready for it.
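To make the Strangler Application idea more concrete, here is a minimal Python sketch of the routing at its core. The names `StranglerFacade`, `legacy_app`, `migrate`, `handle`, and `remaining_legacy` are hypothetical and not from any library, and requests are simplified to a path plus a payload; a real system would more likely put this routing in a reverse proxy or API gateway in front of HTTP traffic.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical request shape for this sketch: a route path plus a payload dict.
# A real system would route actual HTTP traffic (e.g. via a reverse proxy).
LegacyApp = Callable[[str, dict], str]
Handler = Callable[[dict], str]


def legacy_app(path: str, payload: dict) -> str:
    """Stand-in for the old monolith, which can answer every route."""
    return f"legacy:{path}"


class StranglerFacade:
    """Receives every request that used to go to the legacy app.

    Routes with a registered modern handler are served by the new code;
    everything else is passed through to the legacy application unchanged.
    """

    def __init__(self, legacy: LegacyApp) -> None:
        self._legacy = legacy
        self._modern: Dict[str, Handler] = {}

    def migrate(self, path: str, handler: Handler) -> None:
        # Once registered, the legacy code path for this route is no longer called.
        self._modern[path] = handler

    def handle(self, path: str, payload: dict) -> str:
        handler = self._modern.get(path)
        if handler is not None:
            return handler(payload)
        return self._legacy(path, payload)

    def remaining_legacy(self, paths: Iterable[str]) -> List[str]:
        # Routes still served by the legacy app: the replacement work left to do.
        return [p for p in paths if p not in self._modern]


if __name__ == "__main__":
    facade = StranglerFacade(legacy_app)
    print(facade.handle("/orders", {}))    # legacy:/orders
    facade.migrate("/orders", lambda payload: "modern:/orders")
    print(facade.handle("/orders", {}))    # modern:/orders
    print(facade.handle("/invoices", {}))  # legacy:/invoices
    print(facade.remaining_legacy(["/orders", "/invoices"]))  # ['/invoices']
```

Each call to `migrate` moves one route off the legacy system, so callers never need to know which implementation answered them, and the old application can be retired once `remaining_legacy` comes back empty.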
Pros:
- Allows you to strategically handle the most important aspects of a rewrite without tackling the entire rewrite
- Smaller applications make full rewrites down the road far less likely
Cons:
- Initially this just adds complexity without paying down debt
- This introduces more points of failure and more varieties of problems
Under this model, developers use spare time or dedicated technical debt time to work on longer-term projects, such as replacing an entire application or a portion of one. Once enough progress has been made and work is ready to begin in earnest, the work can be rolled into a sprint or series of sprints for formal implementation and delivery.
Pros:
- You can discover and eliminate risk in potentially throwaway prototypes that aren't tied to formal QA and delivery cycles
- When work is ready for inclusion in a sprint, it's usually very focused, with most unknowns resolved
Cons:
- It can be difficult to find significant amounts of time for out-of-band prototyping unless your organization is willing to reduce resource allocations on other projects
In this model, all work on the old application stops, save for critical bug fixes, and work begins on a full replacement application. This is typically what people think of when they talk about rewriting an application.
Pros:
- Developers are able to focus on the new system without as many considerations for the existing system
- Overall time to completion may be shorter
Cons:
- With a long wait before anything is delivered, the business can feel that it is wasting money on the project
- Schedule overruns can prevent necessary work from being delivered
- Essentially becomes an "all or nothing" approach
- May not fully vet all risks before committing to the new platform project
Keep these options for continually renewing your applications in mind, along with the more radical ones. In my opinion, there is no single best option, only the ones that work best for your team, product, and organization. Look over these ideas and identify which ones work best for you as you set up a "crop rotation" to keep your codebase healthy over time.
Next time in this series I'll talk about ways of ensuring quality and correctness as you pay down technical debt.
Photo by Sveta Fedarava on Unsplash