DEV Community

What's your approach to fixing a "long-term" hairy bug?

Ben Halpern on June 14, 2018

You come across an issue that just ain't getting fixed in a short amount of time? What goes through your head, what do you begin to think about?

Paula George

redo the whole project lol

charliedeveloper

I think less about what exactly needs to happen to fix it and more about how the whole team can be involved in fixing it so that it doesn't happen again. Getting everyone involved:

  • makes everyone realise what level of effort it takes to fix it if they let it happen again.
  • prevents knowledge silos.
  • tends to produce cleaner code: collaborative code on complicated legacy projects usually looks less like spaghetti than code written individually.
 
Paula George

If the bug is as hairy as you say, I would start at the roots and rebuild the whole code without rewriting it: just double-check everything. I hope you were using some kind of version control. Just print everything.

Gunnar Gissel

My thoughts turn to mitigation and isolation.

First things first, how can I make it livable for however long I have to live with it? Can I put a cron in to restart a troublesome service? Can I add more retries to talk to a box with a bad network? Can I wrap a bad library with something that makes it less bad?
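The "add more retries" idea could look something like the sketch below. It is only an illustration: `call_with_retries`, `flaky_fetch`, and the delay values are made-up names and numbers, and the simulated failure stands in for a real flaky network call.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError)):
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky call standing in for "a box with a bad network":
# it fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network hiccup")
    return "payload"


result = call_with_retries(flaky_fetch, attempts=5, base_delay=0.01)
```

Retrying only on the listed exception types keeps genuine bugs (a `ValueError`, say) from being silently retried.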

Second, how can I contain the problem so when I want to fix it properly I can do it quickly and without hurting the rest of the system? Interfaces, facades, etc.
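As a rough sketch of containment with an interface plus a facade: everything here (`Geocoder`, `LegacyGeoLib`, `geo_find_v2`, the return format) is invented for illustration and does not refer to any real library.

```python
from typing import Protocol


class Geocoder(Protocol):
    """The narrow interface the rest of the system depends on."""

    def lookup(self, address: str) -> tuple[float, float]: ...


class LegacyGeoLib:
    """Stand-in for the troublesome library (hypothetical)."""

    def geo_find_v2(self, addr, mode, strict):
        # Awkward API: returns "lat;lon" or the string "ERR".
        return "ERR" if not addr else "48.2;16.4"


class LegacyGeocoder:
    """Facade: all the quirks of LegacyGeoLib stay inside this class."""

    def __init__(self, lib: LegacyGeoLib) -> None:
        self._lib = lib

    def lookup(self, address: str) -> tuple[float, float]:
        raw = self._lib.geo_find_v2(address, mode=1, strict=False)
        if raw == "ERR":
            raise ValueError(f"could not geocode {address!r}")
        lat, lon = raw.split(";")
        return float(lat), float(lon)


def nearest_office(geocoder: Geocoder, address: str) -> str:
    # Callers only know about Geocoder, so replacing the library later
    # only changes the place where the facade is constructed.
    lat, lon = geocoder.lookup(address)
    return f"office near ({lat}, {lon})"


message = nearest_office(LegacyGeocoder(LegacyGeoLib()), "Some Street 1")
```

When the proper fix arrives, a new class with the same `lookup` method can replace `LegacyGeocoder` without touching the callers.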

Finally, I like to document how much the issue hurts the team in terms of lost productivity and hours spent messing with it - never hurts to have ammo to convince stakeholders and supervisors that we should make a persistent problem a priority.

Gokul Kathirvel

The greatest part is that users might get used to it, and fixing the bug may create churn 😆

Stephanie Handsteiner

"It's not a bug, it's a feature (now)!" :D

Ben Halpern

Given that I have the privilege of just hacking and making it up as I go, this is half my day I swear.

George Offley

The first thing I usually think about is whether the code is designed correctly. There is usually a design flaw that causes those bugs, at least in my experience. Lay out the steps you need to do in the simplest form and see if there is a better way to do it.

Keith Showalter

Project-stopping bug early in development? Start from nothing and see where things went wrong.

Project-stopping bug later in development? Start breaking the data flow apart to see where things go wrong.
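One way to picture "breaking the data flow apart" is to run each step on its own and check its output before moving on. This is a toy sketch: the stage names, the checks, and the planted bug are all hypothetical.

```python
def find_first_bad_stage(data, stages):
    """Run stages in order; return the first stage whose output fails its
    check, together with the data at that point."""
    for name, transform, is_valid in stages:
        data = transform(data)
        if not is_valid(data):
            return name, data
    return None, data


# Hypothetical three-step flow. The "normalize" step has the planted bug:
# it silently drops negative numbers.
stages = [
    ("parse", lambda rows: [int(r) for r in rows], lambda out: len(out) == 4),
    ("normalize", lambda nums: [n for n in nums if n > 0], lambda out: len(out) == 4),
    ("total", lambda nums: sum(nums), lambda out: isinstance(out, int)),
]

bad_stage, snapshot = find_first_bad_stage(["3", "-1", "4", "2"], stages)
```

Here `bad_stage` names the step where the data first stops matching expectations, and `snapshot` is what it produced.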

Anything else gets prioritized based on how critical it is (workaround available?) and how burnt out I am dealing with that particular issue.

Jibin Philipose

I guess this is where IDE capabilities are most helpful, because nothing debugs like the IDE debugger: set breakpoints and narrow down the functions or variables that are responsible for doing something but are messing up big time.

Mitch Jackson

How long has this bug existed, and is it generating issues for customers?

Can I present a work-around approach to the team to mitigate the bug's impact?

Can I make a business case to management/client to allocate resources to fix this bug? What are the costs to repair the bug? What are the costs to ignore the bug?

Paul Isaris

First, I try to log every step of the buggy method into a log file (especially when it is on a production server).
Then I take an abstract look at the method design, to make sure I don't have any design gotchas.
Then, I retrace the call and check the platform/plugin details (if for example it is a plugin that is used in a mobile app).
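A minimal sketch of the first step (logging every step of the suspect method to a file) using Python's standard `logging` module. The logger name, `apply_discount`, and the discount table are invented for the example.

```python
import logging

logger = logging.getLogger("checkout")  # hypothetical logger name


def enable_trace_log(path):
    """Send DEBUG output for this logger to a file chosen by the caller."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


def apply_discount(price, code):
    """Hypothetical 'buggy method' with a log line at every step."""
    logger.debug("start: price=%r code=%r", price, code)
    rate = {"SPRING": 0.10, "VIP": 0.25}.get(code, 0.0)
    logger.debug("looked up rate=%r", rate)
    discounted = round(price * (1 - rate), 2)
    logger.debug("result=%r", discounted)
    return discounted
```

Calling `enable_trace_log("debug_trace.log")` once at startup would be enough to start collecting traces; removing the handler turns the file output off again without touching the method.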

Frank Carr

I just work the problem in the Agile way with a combination of spikes and user stories, maybe contained within an epic if it's that big and hairy.

I've got one right now that I'm working on that's been a bit tough to define. I've been working this spike for the entirety of the current sprint. Users tell me it's a big problem but can't really give me a clear description of what is needed to correct the issue. It's probably part business process/policy with software changes to match the desired policy. Getting to the desired policy though is where there's a lack of agreement. Until product owners agree, it will be difficult to move forward.

In other cases, it's been a bit easier, but getting to the root cause (to use one of my least favorite buzzwords) of the problem is the first thing. I've usually found that it goes back to data, so that tends to be the first place I look. For example, a column in a table can be null but the code expects it never to be null. If good tests are in place, this usually isn't too difficult. But when working with legacy code, there may not be tests available, so it requires really digging into code that might have been written 10+ years ago.
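The nullable-column case can be sketched with an in-memory SQLite database. The `customers` table, its rows, and `email_domain` are made up for illustration; the point is to check the data against the assumption first, then make the code tolerate what the schema allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: `email` is nullable, but downstream code assumes it never is.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# Step 1: ask the data directly whether the assumption holds.
null_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customers WHERE email IS NULL ORDER BY id"
    )
]


def email_domain(email):
    """Defensive version of code that used to split the email unconditionally."""
    if email is None:
        return None
    return email.split("@", 1)[1]


# Step 2: the code now handles the rows the schema actually permits.
domains = [
    email_domain(email)
    for (email,) in conn.execute("SELECT email FROM customers ORDER BY id")
]
```

Whether the right fix is defensive code, a `NOT NULL` constraint, or a data cleanup depends on what the business rule is supposed to be.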

Omar Hussain

Usually, if a bug is taking a long time to trace, it's better to sleep on it. Most of the time, it's just that one is taking the wrong approach to tracing it. Also, it's better to tackle such bugs at the end, if they aren't showstoppers, so that valuable time is spent on being actually productive.

Albrecht Scheidig

This is how we do it:
a) Try to find a workaround, and mark the workaround with "FIXME".
b) Invest one week per quarter to fix all the FIXMEs (whole team - fixmekathon).
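To build the list for such a fix-it week, a small scanner along these lines might do. It is a sketch under assumptions: the `FIXME` comment convention, the `.py` default, and the function name `find_fixmes` are illustrative choices, not part of the process described above.

```python
import re
from pathlib import Path

# Matches the FIXME marker and captures the note that follows it.
FIXME_PATTERN = re.compile(r"\bFIXME\b[:\s]*(.*)")


def find_fixmes(root, suffixes=(".py",)):
    """Return (path, line_number, note) for every FIXME marker under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            match = FIXME_PATTERN.search(line)
            if match:
                hits.append((str(path), number, match.group(1).strip()))
    return hits
```

Something like `for path, line, note in find_fixmes("src"): print(path, line, note)` would then give the team a worklist, assuming the code lives under a `src` directory.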

And a general thought: If anything (feature, issue, ...) seems to take "very long": invest 15% of "very long" into thinking about the problem and sketching solutions. This will mostly reduce "very long" and result in better solutions.

Ben Sinclair

I document it and break it down into steps.

Sometimes there's a part of your code which works but nobody remembers how, something legacy that nobody wants to touch for fear of breaking it.
Fear of fragility is a major reason these bugs go on so long, so periodically revisiting your code or getting another set of eyes on it will go a long way to preventing these sorts of things in the first place.

I have just finished going through this process for an authentication system on a big, sprawling project which was tripping up in various ways but seemed too big to replace. It took a while, and it took me working on a separate branch and having maintenance tickets deflected to other developers for longer than the client wanted, but it's all better now. I can pat it on the head and give it a lollipop.

Incidentally, this is why my Mac is so out of date. It tells me it needs to reboot every day and I deny it because I have no confidence it'll come back up. MacOS has tricked me too many times. At some point I'll wipe it and start from scratch, or get a replacement. But the OS is so fragile that I avoid going anywhere near updates and patches.

Alex Iskold 🗽

It's a little open-ended. There are different types of bugs. There are bugs where you see where the problem is but don't know why. There are bugs where you don't even know where the problem is. Assuming that your code is well designed, you will know where the problem is. Get a second pair of eyes on it; pair programming always helps. If the code is spaghetti, this is a perfect opportunity to refactor. If you are stuck for a while, take breaks and switch to doing something else. Get an ice cream :)

Quentin Sonrel

Why fix it when I can make it a feature?

AndreKelvin

It's a feature, leave it like that 😂😂

Grover Sean Reyes

what do I begin to think about?

How did I just ____ myself?

Stargator

In a different environment!

Arik

I'll start suspecting a race condition or some other concurrency issue. These are usually the hardest to detect and kill.
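For anyone who has not met one, here is a small sketch of the classic lost-update race and its fix with a lock. The `Counter` class and `hammer` helper are invented for the example.

```python
import threading


class Counter:
    """Shared counter with an unsafe and a lock-protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read, add, write back: another thread can run between these
        # steps, so updates may be lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self):
        # The lock makes the read-modify-write step atomic.
        with self._lock:
            self.value += 1


def hammer(method, threads=8, per_thread=10_000):
    """Call `method` from several threads at once and wait for them all."""
    def work():
        for _ in range(per_thread):
            method()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


safe = Counter()
hammer(safe.increment_safe)
```

With the lock, `safe.value` always ends at `threads * per_thread`. Swapping in `increment_unsafe` may or may not lose updates on any given run, which is a big part of why these bugs are so hard to detect.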

Adrian B.G.

Try to convince the business side to promote it to a feature.

If they say no, I can drop the "refactoring" bomb.

Justin Lam

Can I add telemetry data or traces? Keep gathering data, which over time will give me enough information to formulate a hypothesis. Hopefully.
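A lightweight way to start gathering that data could be a decorator that records how long each call takes and how often it fails. This is a sketch only: `TIMINGS`, `ERRORS`, `traced`, and `sync_inventory` are hypothetical names, and the in-memory dictionaries stand in for whatever metrics backend is actually available.

```python
import functools
import time
from collections import defaultdict

# In-memory stores for illustration; a real setup would ship these to
# whatever metrics or tracing backend the team already uses.
TIMINGS = defaultdict(list)
ERRORS = defaultdict(int)


def traced(func):
    """Record duration and failure count for every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[func.__name__] += 1
            raise
        finally:
            # Runs on success and on failure, so every call is timed.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@traced
def sync_inventory(item_count):  # hypothetical operation under suspicion
    if item_count < 0:
        raise ValueError("negative item count")
    return item_count * 2
```

After a few days of traffic, comparing the slow or failing calls against the normal ones is often what turns a hunch into a testable hypothesis.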