Matt Eland

Posted on Dec 24, 2019 • Originally published at killalldefects.com on Dec 24, 2019

Technical Debt as Risk

#communication #management #codequality #agile

Let’s say you take the leap and talk to management about technical debt in your codebase and the need to pay it down. One of two things can happen: you get a “Yes, go ahead and tackle this” or you get the dreaded “No, not right now”. This article is my advice for how to handle the dreaded “No”.

It involves looking at technical debt as risks.

Understanding the “No”

First of all, I’m assuming you did your work and prioritized and analyzed the technical debt that affects your code before asking for time to pay down the most egregious debt.

If you did and you were still told “no” or “not right now”, it’s probably not because management didn’t trust you or didn’t think it was a concern. More likely, it came down to one or more of the following:

Urgent priorities have to be accomplished first before more strategic work can be considered
Management didn’t fully understand the risk or ongoing costs associated with existing technical debt
Management didn’t trust the time and resource impact estimates provided to them on the effort
Management didn’t believe that the work could be done without introducing defects to users / stakeholders

For the purposes of this article, I’m largely going to ignore the estimation and quality aspects of these concerns, but you may want to read my article on safely paying down technical debt for some additional thoughts on the topic.

The fear many seasoned developers have when they hear “we have too many urgent things to spend time paying down debt” is that they’ll keep hearing the same refrain for the next six months or longer.

This is a legitimate concern, and I think part of this stems from the fact that technical debt primarily hurts the development team, but its effects spill out and indirectly hurt users, external stakeholders, and management. These ancillary effects are not often understood to stem from technical debt and that’s something we need to fix.

Management and Technical Debt

Another reason that technical debt doesn’t get prioritized is that it’s not well understood by management and therefore not at the forefront of their minds in the prioritization process.

This isn’t to say that management is heartless and doesn’t care about technical debt. What I’m saying is that technical debt is not what keeps them up at night.

I’m generalizing, I know, but my view is that when non-technical management thinks about technical debt, it’s often as a way of “keeping developers happy” and “doing things that a responsible business person in technology should do”.

These are good reasons to pay down technical debt, but when schedules get tight, these reasons will always be less important than finishing work to meet critical business objectives.

And this is where “No” and “Not now” responses to paying down tech debt come from.

Business managers are often extremely hardworking individuals in a world that is completely different from the one a developer, tester, or development manager lives in.

That means they don’t understand the technology the way we do.

Management and Risk

Let me tell you what non-technical management does understand and care about: Risk.

Managers think about various types of risk all the time. Think about the following risks:

Customer retention numbers being lower than expected
The deal we’re working on falls through
Key employees leave us
A competitor unleashes a new capability that we don’t have
A startup challenges us in a way that revolutionizes our industry
A key competitive edge disappears
A key project is late, impacting other parties or breaching a contract
A new product is a flop

These are all significant risks that managers need to think about.

We can typically rephrase tech debt items as risks fairly easily (and if we can’t, are these items really important, or are they more stylistic preferences?). Let’s take a look at some examples of technical debt stated as risks:

We could fix a bug in 3 places but miss the 4th due to code duplication
Weaknesses in the design of the current system could lead to slow user experiences at higher scales of usage
Inadequate security practices could result in breaches and legal liabilities
We could easily introduce new defects into the application due to our lack of unit tests
Onboarding new team members could take much longer given the complexity of the codebase
We could hit a point of terminal velocity where we are unable to implement features in an amount of time the business feels is reasonable because technical debt is slowing down development efforts
The lack of flexibility and extensibility of the code can lead to us saying no to feature requests that we would otherwise want to implement

I don’t know about you, but these examples of technical debt smell a lot like risks to me.

Additionally, when you can simplify technical details and remove jargon, things get a lot easier for business stakeholders to understand.

Clearly, the business needs to understand the high-level risks we deal with as a technology team, because ignoring them is the equivalent of driving a car and treating refilling the gas tank as technical debt: if you ignore it, you’re eventually going to be stranded somewhere short of your destination.

Starting a Risk Management Meeting

What I propose is that development leadership needs to partner with business management stakeholders and invite them into the development organization.

Specifically, I advocate for holding monthly or quarterly risk management meetings in which business and development stakeholders gather to review a list of technical debt risks development has prepared.

The goal of this meeting is to make sure management is aware of new and existing risks, to discuss how these items have affected the team since the last meeting, and to talk about the relative priority and importance of each one of them.

This meeting should take somewhere between 30 and 60 minutes though it may take a little longer the first time.

In it, your agenda should be something like the following:

Go over the Risk Register (see next section for details)
1. Highlight any new Risks that were not being tracked last meeting
2. Highlight any risks that have changed in priority (probability or impact)
3. Highlight any risks that have been closed out
4. Highlight any risks that had tangible impacts since the last meeting
Discuss any new context either side needs to be aware of (e.g. new technology releases, changing project priorities, etc.)
Development will give recommendations on what risks should be prioritized in the product backlog
After the Meeting: Business should inform development of any changes made to the product backlog, or explicitly inform them that all priorities are remaining the same.
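If you keep a copy of the register from the previous meeting, the four highlight lists in the agenda can be pulled out mechanically. What follows is only a minimal sketch in Python: the snapshot layout (a dict keyed by risk ID), the field names `status`, `priority`, and `impacts_since_last`, and the sample risks are all hypothetical and would need to match however you actually store your register.

```python
def agenda_highlights(previous, current):
    """Compare two register snapshots (dicts keyed by risk ID).

    Assumed shape of each entry (hypothetical):
    {"status": "New" | "Open" | "Closed", "priority": int,
     "impacts_since_last": [str, ...]}
    """
    # 1. Risks that were not being tracked last meeting
    new = [rid for rid in current if rid not in previous]

    # 2. Risks still being tracked whose priority changed
    changed = [
        rid for rid in current
        if rid in previous
        and current[rid]["status"] != "Closed"
        and current[rid]["priority"] != previous[rid]["priority"]
    ]

    # 3. Risks closed out since the last meeting
    closed = [
        rid for rid in current
        if current[rid]["status"] == "Closed"
        and previous.get(rid, {}).get("status") != "Closed"
    ]

    # 4. Risks with tangible impacts recorded since the last meeting
    impacted = [rid for rid in current if current[rid].get("impacts_since_last")]

    return {
        "new": new,
        "changed_priority": changed,
        "closed": closed,
        "had_impact": impacted,
    }


# Hypothetical snapshots for illustration only
previous = {
    "R1": {"status": "Open", "priority": 6},
    "R2": {"status": "Open", "priority": 12},
    "R3": {"status": "Open", "priority": 4},
}
current = {
    "R1": {"status": "Open", "priority": 9,
           "impacts_since_last": ["Hotfix delayed by a day"]},
    "R2": {"status": "Closed", "priority": 12},
    "R3": {"status": "Open", "priority": 4},
    "R4": {"status": "New", "priority": 8},
}

highlights = agenda_highlights(previous, current)
print(highlights)
```

The output is only a starting point for the conversation; the discussion of context and recommendations still has to happen between people.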

Your goal is not to get technical debt moved to the top of the priority list. Instead, what you are after is:

Establishing a relationship of trust, respect, and openness between technical and business leadership
Helping business management understand obstacles to success and the highest risk factors development deals with
Helping development leadership understand key business constraints and needs
Tracking the impact of technical debt on an ongoing basis
Ensuring that technical debt is prioritized appropriately

This last point is key. At its core, what we really want is that the business has all of the information it needs to make important strategic decisions around its development portfolio and that development leadership is equipped to help individual contributors understand the constraints the business faces.

Risk Registers

I mentioned the term risk register in the previous section. Let’s talk more about what those are and how they’re organized.

According to the Project Management Institute (PMI), a risk register is some form of repository for storing information on risks.

A sample risk register containing issues regarding drone hardware and software

You can easily use a shared spreadsheet in Excel or Google Sheets as a Risk Register.

What’s important is the information you include in the risk register. Let’s talk about some key fields you should include:

ID – the ID of the risk inside your spreadsheet or an external work item tracking solution such as Jira
Title – A short name for referring to the risk
Description – A few sentences describing the nature of the risk
Identified – The date the risk was originally identified
Status – The current status of the risk. Identifies if the risk has been resolved (Closed), has not been reviewed by the group (New), or has been reviewed but not yet acted upon (Open)
Component – The area in your software that the risk exists in. This is a hierarchical field with room for sub-components to identify specific parts of the software.
Strategy – The strategy your organization has elected for dealing with this risk. We’ll talk more on this in a minute.
Comments – Any comments as far as current remediation efforts or plans.
Probability – The probability that the risk will affect future development work or production users.
Impact – The severity of the impact of the risk if it materializes. This ranges from a minor inconvenience to loss of data / production outages.
Priority – A priority for the risk based on its probability and impact.
Current Impact – The ways the risk has actually manifested itself since it was identified.

Feel free to include other fields that make sense to you – Risk Owner, Work Item ID, and Estimated Effort to Resolve are all fields you might want to consider adding, for example.
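A spreadsheet is plenty, but if it helps to see the fields as a data structure, here is a rough sketch of a single register entry in Python. The 1 to 5 scales for probability and impact, and the choice to compute priority as their product, are assumptions made for this example; the fields above only say that priority is based on the two. The sample risk is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class Status(Enum):
    NEW = "New"        # has not been reviewed by the group
    OPEN = "Open"      # reviewed but not yet acted upon
    CLOSED = "Closed"  # resolved


@dataclass
class Risk:
    id: str
    title: str
    description: str
    identified: date
    component: str      # hierarchical, e.g. "Orders/Validation"
    probability: int    # assumed scale: 1 (unlikely) to 5 (near certain)
    impact: int         # assumed scale: 1 (minor inconvenience) to 5 (data loss / outage)
    status: Status = Status.NEW
    strategy: str = ""  # filled in once the group has chosen one
    comments: str = ""
    current_impact: List[str] = field(default_factory=list)

    @property
    def priority(self) -> int:
        # Assumption: a simple probability x impact score (1 to 25)
        return self.probability * self.impact


# Hypothetical entry for illustration only
risk = Risk(
    id="RISK-1",
    title="Duplicated validation logic",
    description="The same validation rules are copied into several places.",
    identified=date(2019, 12, 24),
    component="Orders/Validation",
    probability=4,
    impact=3,
)
print(risk.title, risk.status.value, risk.priority)
```

Deriving priority from the other two fields, rather than typing it in by hand, keeps the three columns from drifting apart between meetings.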

Risk Mitigation Strategies

Just because a risk exists doesn’t mean that it should be acted upon.

I recommend using the PMI’s list of risk management strategies since business stakeholders are more likely to be familiar with these. These strategies are:

Reduce – Here we’re trying to do work to reduce the risk’s severity and potentially even eliminate the risk entirely. This can be things like eliminating duplication, adding unit tests, and generally paying down technical debt.
Avoid – This is an odd one. Typically it’s for risks that are so severe or so far out of our control that we want to take steps to prevent damage from them altogether. As far as technical debt goes, this could refer to changing frameworks entirely, rewriting an application, or stopping future development.
Transfer – Risk transference involves making another party responsible for the risk should it occur. Insurance is a key example of this. This is one you typically won’t use very often with technical debt, though outsourced work could potentially be viewed as transference (though I’d still put it in the reduce bucket).
Accept – This refers to debt that you acknowledge is an issue and have chosen to accept any negative consequences it brings along instead of actively taking another strategy. High effort, low impact risks will typically get accepted.

As you review risks with business stakeholders, you’ll be putting them into one of these four buckets. Keep in mind that risks can change strategies, so don’t be discouraged if a key technical debt risk is marked as accepted.
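To make the four buckets concrete, here is a small sketch in Python that groups risks by their chosen strategy so each bucket can be reviewed together. The `Strategy` enum mirrors the list above; the `group_by_strategy` helper and the sample risk titles are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Strategy(Enum):
    REDUCE = "Reduce"
    AVOID = "Avoid"
    TRANSFER = "Transfer"
    ACCEPT = "Accept"


def group_by_strategy(risks):
    """Bucket (title, strategy) pairs by strategy, preserving input order."""
    buckets = defaultdict(list)
    for title, strategy in risks:
        buckets[strategy].append(title)
    return dict(buckets)


# Hypothetical risks for illustration only
risks = [
    ("Duplicated validation logic", Strategy.REDUCE),
    ("No unit tests around billing", Strategy.REDUCE),
    ("Legacy report module is hard to change", Strategy.ACCEPT),
]

buckets = group_by_strategy(risks)
for strategy, titles in buckets.items():
    print(strategy.value, titles)
```

Because the strategy is just a field on each risk, moving an item from Accept to Reduce at a later meeting is a one-cell change in the register.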

Tracking the Impact of Technical Debt

I mentioned earlier how we should be tracking the impact of technical debt over time as part of these meetings.

This is where engineering leadership needs to do some heavy lifting.

While some software tools exist to identify and track code smells as likely sources of technical debt, no tool will ever be able to tell you how much time one of your developers lost last week due to a particular instance of technical debt.

This means that tracking technical debt is a human problem and it needs a human solution.

In order to track technical debt, you need to get the people it impacts the most, the developers, to track the way it impacts them in a reliable and repeatable way.

This is a subject I’m still exploring solutions for, but my best ideas center around:

Tagging time entries in time management systems with a Tech Debt tag and a comment linking back to an item on the risk register
Adding a custom field to your work item management tool to track the amount of time that has been lost to each type of technical debt
Developers emailing development leadership once a week with estimates of time lost to specific technical debt incidents
When defects arise as a result of technical debt, link those items to the technical debt item that helped make them possible

None of these answers is perfect, but if you’re looking for a reliable, low-accuracy way of seeing how technical debt is impacting you over time, these are all things to consider.
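As an illustration of the first idea, here is a minimal Python sketch that totals tagged time entries per risk register item. The entry format, the `tech-debt` tag name, and the `risk_id` field are hypothetical; a real time tracking system would have its own export format that you would need to adapt this to.

```python
from collections import defaultdict


def hours_lost_by_risk(time_entries):
    """Sum hours on entries tagged as tech debt, grouped by register ID.

    Returns a dict ordered from most to least hours lost.
    """
    totals = defaultdict(float)
    for entry in time_entries:
        # Only count entries that carry the tag and link back to a risk
        if "tech-debt" in entry["tags"] and entry.get("risk_id"):
            totals[entry["risk_id"]] += entry["hours"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical time entries for illustration only
entries = [
    {"hours": 2.5, "tags": ["tech-debt"], "risk_id": "R1"},
    {"hours": 6.0, "tags": ["feature"], "risk_id": None},
    {"hours": 1.5, "tags": ["tech-debt"], "risk_id": "R3"},
    {"hours": 3.0, "tags": ["tech-debt", "bug"], "risk_id": "R1"},
]

lost = hours_lost_by_risk(entries)
print(lost)  # {'R1': 5.5, 'R3': 1.5}
```

The totals are only as good as the tagging discipline behind them, so treat them as a rough signal for the Current Impact column rather than a precise measurement.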

Once business is able to see the way that technical debt is hindering the organization, it will start prioritizing and targeting the most dangerous pieces of technical debt, which is what you’re after to begin with.

Closing Thoughts

I do want to stress here that the desire to get business management involved in managing technical risks should never be seen as a ploy or Trojan horse. This should be a sincere and humble effort to help the business understand more of the dangers you are concerned with as technical leaders and a way of building respect, trust, and an open and honest partnership.

Put simply, this won’t happen if you are only looking for a “Yes”. You have to be willing to hear a “No” or “Not now” and understand it, and adjust your priorities to what is truly best for the business while helping them understand and prioritize the things they don’t see.

If we can do this as technical leaders, everyone will benefit.

The post Technical Debt As Risks appeared first on Kill All Defects.

Cover Photo by Lubo Minar on Unsplash

Top comments (1)

Daniel Veihelmann • Dec 24 '19

Thanks for this post!
I wanted to note that there are tools out there that help you managing technical debt (e.g. long methods, as mentioned in the risk management screenshot).

(I'm part of a company that develops such a tool, but I won't mention names unless asked ;-) )