DEV Community

What's your approach to fixing a "long-term" hairy bug?

Ben Halpern on June 14, 2018

You come across an issue that just ain't getting fixed in a short amount of time? What goes through your head, what do you begin to think about?

Paula George • Jun 14 '18

redo the whole project lol

charliedeveloper • Jun 14 '18

Not necessarily what exactly needs to happen to fix it, but how the whole team can be involved in fixing it so that it doesn't happen again. Getting everyone involved:

makes everyone realise how much effort it takes to fix it if they let it happen again.
prevents knowledge silos.
produces collaborative code, which on complicated legacy projects usually looks less like spaghetti than code written individually.

Paula George • Jun 15 '18

If the bug is as hairy as you say, I would start at the roots and rebuild the whole code path without rewriting it: just double-check everything. I hope you were using some kind of version control. And print everything.

Gunnar Gissel • Jun 14 '18

My thoughts turn to mitigation and isolation.

First things first, how can I make it livable for however long I have to live with it? Can I put a cron in to restart a troublesome service? Can I add more retries to talk to a box with a bad network? Can I wrap a bad library with something that makes it less bad?

Second, how can I contain the problem so when I want to fix it properly I can do it quickly and without hurting the rest of the system? Interfaces, facades, etc.

Finally, I like to document how much the issue hurts the team in terms of lost productivity and hours spent messing with it - never hurts to have ammo to convince stakeholders and supervisors that we should make a persistent problem a priority.
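The retry and wrapping ideas above can be sketched roughly as follows. This is only an illustration under assumptions: `call_with_retries`, `InventoryClient`, and `fetch_raw` are hypothetical names, not part of any real library, and the retry counts and delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on network-style errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller see the real error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class InventoryClient:
    """Hypothetical facade: the rest of the system talks only to this class,
    so the flaky dependency behind it can be replaced later in one place."""

    def __init__(self, fetch_raw: Callable[[str], str],
                 sleep: Callable[[float], None] = time.sleep):
        self._fetch_raw = fetch_raw  # the troublesome call being wrapped
        self._sleep = sleep

    def stock_level(self, sku: str) -> int:
        raw = call_with_retries(lambda: self._fetch_raw(sku), sleep=self._sleep)
        return int(raw)
```

Keeping the retry logic inside the facade means the mitigation and the eventual proper fix both live behind the same interface.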

Gokul Kathirvel • Jun 14 '18

The greatest part is the users might get used to it and fixing the bug may create churn 😁😆

Stephanie Handsteiner • Jun 14 '18

“It's not a bug, it's a feature (now)!” :D

Ben Halpern • Jun 14 '18

Given that I have the privilege of just hacking and making it up as I go, this is half my day I swear.

George Offley • Jun 14 '18

The first thing I usually think about is whether the code is designed correctly. There is usually a design flaw that causes those bugs, at least in my experience. Lay out the steps you need to do in the simplest form and see if there is a better way to do it.

Keith Showalter • Jun 14 '18

Project-stopping bug early in development? Start from nothing and see where things went wrong.

Project-stopping bug later in development? Start breaking the data flow apart to see where things go wrong.

Anything else gets prioritized based on how critical it is (is a workaround available?) and how burnt out I am from dealing with that particular issue.

Jibin Philipose • Jun 15 '18

I guess this is where IDE capabilities are the most helpful, because nothing debugs like the IDE debugger: you set breakpoints and narrow down the functions or variables that are responsible for doing something but are messing up big time.

Mitch Jackson • Jun 14 '18

How long has this bug existed, and is it generating issues for customers?

Can I present a work-around approach to the team to mitigate the bug's impact?

Can I make a business case to management/client to allocate resources to fix this bug? What are the costs to repair the bug? What are the costs to ignore the bug?

Paul Isaris • Jun 17 '18

First, I try to log every step of the buggy method into a log file (especially when it is on a production server).
Then I take an abstract look at the method design, to make sure I don't have any design gotchas.
Then, I retrace the call and check the platform/plugin details (if for example it is a plugin that is used in a mobile app).
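The first step above, logging every step of a suspect method to a file, might look something like this with Python's standard `logging` module. The helper names `make_file_logger` and `log_steps` are made up for illustration.

```python
import functools
import logging


def make_file_logger(path, name="bughunt"):
    """Return a logger that appends DEBUG-level records to the file at path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_steps(logger):
    """Decorator that logs entry, result, and any exception of a function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception records the full traceback in the log file
                logger.exception("error in %s", func.__name__)
                raise
            logger.debug("exit %s result=%r", func.__name__, result)
            return result
        return wrapper
    return decorator
```

On a production server it is worth checking that the logged arguments do not contain sensitive data before turning something like this on.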

Frank Carr • Jun 18 '18

I just work the problem in the Agile way with a combination of spikes and user stories, maybe contained within an epic if it's that big and hairy.

I've got one right now that I'm working on that's been a bit tough to define. I've been working this spike for the entirety of the current sprint. Users tell me it's a big problem but can't really give me a clear description of what is needed to correct the issue. It's probably part business process/policy with software changes to match the desired policy. Getting to the desired policy though is where there's a lack of agreement. Until product owners agree, it will be difficult to move forward.

In other cases, it's been a bit easier, but getting to the root cause (to use one of my least favorite buzzwords) of the problem is the first step. I've usually found that it goes back to data, so that tends to be the first place I look. For example, a column in a table can be null but the code expects it never to be null. If good tests are in place, this usually isn't too difficult. But when working with legacy code, there may not be tests available, so it requires really digging into code that might have been written 10+ years ago.
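The nullable-column case can be reproduced in a few lines. This is a hypothetical sketch using an in-memory SQLite database; the `customers` table, its columns, and the function names are invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table whose email column allows NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, email) VALUES (?, ?)",
        [(1, "Ada@Example.com"), (2, None)],  # row 2 has a NULL email
    )
    return conn


def email_domain(email):
    """Legacy-style code: silently assumes email is never None."""
    return email.split("@")[1].lower()


def count_null_emails(conn):
    """Check the data first: how many rows violate the code's assumption?"""
    query = "SELECT COUNT(*) FROM customers WHERE email IS NULL"
    return conn.execute(query).fetchone()[0]


def safe_email_domain(email):
    """Same logic, but the NULL case is handled explicitly."""
    if email is None:
        return None
    return email.split("@")[1].lower()
```

Calling `email_domain(None)` raises `AttributeError`, which is the kind of failure that only shows up once a real row with a NULL arrives. Counting the offending rows first shows whether the fix belongs in the code, in the schema, or in both.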

Omar Hussain • Jun 14 '18

Usually, if a bug is taking a long time to trace, it's better to sleep on it. Most of the time the problem is just that you are taking the wrong approach to tracing it. It's also better to leave such bugs until the end, if they aren't showstoppers, so you can spend your valuable time on being actually productive.

Albrecht Scheidig • Jun 15 '18

This is how we do it:
a) Try to find a workaround, and mark the workaround with "FIXME".
b) Invest one week per quarter to fix all the FIXMEs (whole team - fixmekathon).
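To prepare for a cleanup week like that, the marked workarounds first have to be collected. One rough way to do it is a small scanner. `find_fixmes` is a hypothetical helper, and the `.py` suffix filter is just an assumption about the codebase.

```python
import re
from pathlib import Path

# Matches the FIXME marker and captures whatever note follows it on the line.
FIXME_PATTERN = re.compile(r"\bFIXME\b[:\s]*(.*)")


def find_fixmes(root, suffixes=(".py",)):
    """Return (path, line number, note) for every FIXME under root."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = FIXME_PATTERN.search(line)
            if match:
                results.append((str(path), lineno, match.group(1).strip()))
    return results


if __name__ == "__main__":
    for file_path, lineno, note in find_fixmes("."):
        print(f"{file_path}:{lineno}: {note}")
```

Running it from the repository root prints one line per marker, which can serve as the starting backlog for the week.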

And a general thought: If anything (feature, issue, ...) seems to take "very long": invest 15% of "very long" into thinking about the problem and sketching solutions. This will mostly reduce "very long" and result in better solutions.

Ben Sinclair • Jun 15 '18

I document it and break it down into steps.

Sometimes there's a part of your code which works but nobody remembers how, something legacy that nobody wants to touch for fear of breaking it.
Fear of fragility is a major reason these bugs go on so long, so periodically revisiting your code or getting another set of eyes on it will go a long way to preventing these sorts of things in the first place.

I have just finished going through this process for an authentication system on a big, sprawling project which was tripping up in various ways but seemed too big to replace. It took a while, and it took me working on a separate branch and having maintenance tickets deflected to other developers for longer than the client wanted, but it's all better now. I can pat it on the head and give it a lollipop.

Incidentally, this is why my Mac is so out of date. It tells me it needs to reboot every day and I deny it because I have no confidence it'll come back up. macOS has tricked me too many times. At some point I'll wipe it and start from scratch, or get a replacement. But the OS is so fragile that I avoid going anywhere near updates and patches.

Alex Iskold 🗽 • Jun 16 '18

It's a little open-ended. There are different types of bugs. There are bugs where you see where the problem is but don't know why. There are bugs where you don't even know where the problem is. Assuming that your code is well designed, you will know where the problem is. Get a second pair of eyes on it; pair programming always helps. If the code is spaghetti, this is a perfect opportunity to refactor. If you are stuck for a while, take breaks and switch to doing something else. Get an ice cream :)