DEV Community

Even the Big Ones Mess Up

Daniel Starner on December 27, 2018

This morning, my feed blew up due to my friends and family complaining about some new Instagram overhaul. Apparently, their feeds were scrolling ho...
 
Ben Halpern

Every time I encounter a 500 error in the wild I breathe a sigh of relief.

Kyle Pollich • Edited

No one is saying that mistakes are meaningless or can be swept away without concern. Mistakes are action items and opportunities to grow as professionals and as individuals.

Consider this excerpt from @dstarner's original post:

No matter how much we prepare, we will make mistakes, that's just a part of life. What matters is how we face those mistakes and issues, and the tenacity we bring to making software better.

Having lofty expectations for yourself is fine, but for a lot of people, failing to reach those expectations leads to severe stress and anxiety. Learning to cope with your mistakes and grow from them is an important personal development skill. Many people can be paralyzed by fear of making a mistake, and this post (as I understood it) seeks to alleviate some of that paralysis.

Daniel Starner

I agree with this completely. I'm pretty sure you just summed up my thoughts better than my post did 😂

Anthony Bouvier

Modern medicine is built upon trial and error. LOTS of mistakes made. People died because of those mistakes.

Think about what a doctor does and what a software developer does. It is very much a debugging exercise. "What hurts? When did it start? What factors surrounded this happening? Does it hurt when you do this? Or only this?"

They're debugging you. And when pressed, they'll admit: they're doing what they can with the info they have and there is no 100% solution most likely. They are taking guesses; educated guesses of course. But guesses.

Danila Petrova

I don't think I have seen a better explanation of what a doctor does. Entirely on point, and to be honest, it makes me even more grateful that people are willing to take on the role of debugging us humans.

The fact is that it is impossible not to make mistakes, whatever we set out to do. The world as we know it is built on mistakes and the takeaways from them. Without error, we would still be playing it safe in caves with flint fire.

We have to mention that the level of responsibility is vastly higher for doctors than for software developers, or any other profession, really.

morgana

Lots of great questions at the end there, and I think for a lot of cases the answer is "it depends", e.g. a mistake in software that puts someone on the moon vs. Instagram swiping issues.

In general, I think we ought to give each other a bit of a break and focus on collaborating more to make things better.

I have a good laugh every time I run into a bug or server issues out on the internet because I've been there. It happens.

 
Michael "notriddle" Howell

Then you will be eternally frustrated.

 
Michael "notriddle" Howell

Doctors were harder on themselves than patients were when it came to judging their ability to minimize the pain, discomfort, or disability caused by a condition. Only 37 percent of physicians thought they were "very" effective, though 60 percent more thought they were "somewhat" effective. But 79 percent of patients said their doctor helped to minimize their pain or discomfort. -- Consumer Reports

You're suffering from contempt born of familiarity. You know everything that's wrong in the software world, and nothing about how messed up the medical world is: how 30% of new doctors suffer from depression and, speaking from personal experience, how freaking elitist an MD can be. Nor have you conversed with people who have chronic illnesses and have taken tons of different drugs with various side effects while the doctors just move on to the next one. I'm not sure if they publicly exist, or are kept behind closed doors, but "everybody makes mistakes" is obviously the norm in that world.

"The only doctor who never loses a patient is one who doesn't try to heal." That said, I don't want to be overly critical of the medical field; I wouldn't even be remotely familiar enough with all the stuff going on to be able to give a truly informed critique beyond the surface level.

Nick Coats • Edited

I'd be interested to see how long it took them to recover. If we are truly in the blessed days of DevOps and Business Agility, then their Mean-Time-To-Recover is all that really matters, yeah?

That being said, as long as you figure out the root issue and put a test in place to make sure that doesn't happen again as part of the deployment pipeline, then it was actually time well spent.
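
To make the Mean-Time-To-Recover idea concrete, here is a minimal sketch of how it could be computed from an incident log. The incident timestamps and the `mean_time_to_recover` helper are invented for illustration; they are not real data from Instagram or anyone else.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected_at, recovered_at) pairs.
# These timestamps are made up purely for illustration.
incidents = [
    (datetime(2019, 1, 1, 9, 0), datetime(2019, 1, 1, 9, 45)),    # 45 minutes
    (datetime(2019, 1, 2, 14, 0), datetime(2019, 1, 2, 14, 15)),  # 15 minutes
]


def mean_time_to_recover(incidents):
    """Average of (recovered_at - detected_at) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


print(mean_time_to_recover(incidents))  # prints 0:30:00
```

Tracking this number over time, rather than only counting incidents, is one way to see whether recovery is actually getting faster.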

Michiel Hendriks

You should always make a big fuss about mistakes which happen in production. Of course mistakes happen, but why didn't this mistake happen in Test or Acceptance? What do you need to fix in the way you go to production so that this does not happen again?

Quite often production issues are the result of a "good enough" mentality. The quality should be good; it does not have to be perfect. Something is good when it works, and you have proof that it works. It might not be the best in performance or scalability, but you know where its limitations are. And here is where something that is good can still fail in production. For example, in the case of the Amazon issue they probably had an incorrect estimate of the surge of new devices. And that's fine. But if they did not even consider it and test for it, then they deserve all the fuss that should be made about it.

In production there should only be two cases of issues which are (kind of) acceptable.

  1. "Oh fuck!" Usually the result of somebody performing an explicit action, like deleting a file on the wrong server.
  2. "That's interesting." Something happens which defies the world as you have defined it. These are usually the result of a user performing a combination or series of actions which were not accounted for in the logic.

Neither kind of issue is really solvable. You can only reduce the number of occurrences. This is what defines your software/process maturity.
You can attempt to expose these problems by employing things like chaos engineering and fuzz testing, but that only gets you so far. In fuzz testing you generally only try to find the edge cases of a single unit, whereas for the "That's interesting" cases you probably need to invoke a whole series of edge cases.
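
The difference between fuzzing a single unit and fuzzing a series of actions can be sketched roughly as follows. The `Cart` class, its planted bug, and the `find_failing_sequence` helper are all hypothetical, made up only to illustrate the point: every action is harmless on its own, but one particular combination breaks the invariant that the total is never negative.

```python
import random


class Cart:
    """Toy unit under test (hypothetical), with a planted bug."""

    def __init__(self):
        self.items = 0
        self.coupon = False

    def add(self):
        self.items += 1

    def remove(self):
        if self.items > 0:
            self.items -= 1

    def apply_coupon(self):
        # A coupon can only be applied to a non-empty cart...
        if self.items > 0:
            self.coupon = True

    def total(self):
        # ...but it is never revoked when the cart becomes empty again.
        return self.items * 10 - (5 if self.coupon else 0)


ACTIONS = ("add", "remove", "apply_coupon")


def find_failing_sequence(max_len=6, tries=2000, seed=0):
    """Run random action sequences; return the first one that makes total() negative."""
    rng = random.Random(seed)
    for _ in range(tries):
        cart = Cart()
        sequence = [rng.choice(ACTIONS) for _ in range(rng.randint(1, max_len))]
        for step, name in enumerate(sequence, start=1):
            getattr(cart, name)()
            if cart.total() < 0:
                return sequence[:step]
    return None


print(find_failing_sequence())
```

Calling any single action on a fresh cart never trips the check; only a series such as add, apply_coupon, remove does. Property-based testing libraries offer more complete versions of this idea, including shrinking a failing sequence down to a minimal one.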

Al Romano

100% this. ^

I'm now changing incident report reasons to "Oh Fuck" and "That's interesting...".

So. Much. Yes.

STANLEY NGUMA

This is one of the best reads to close the year with.

Sergey Kislyakov

Are you sure Einstein didn't make mistakes?

Aschwin Wesselius

You, me, Amazon or Instagram...we will never write perfect software or always get things right, because there is no right way or perfect software.

I don't want to discourage you (actually, I want to encourage you), but there are people who build perfect software. And that's not just the tutorial "Hello world!", but enterprise-level software with zero defects, built on time and on budget.

They just made lots more mistakes and improved their approach faster and cut things short to just one methodology. And it works.

Even better, these people can teach you how to do it too. And its essence is very simple, but not easy.

But I agree, all people mess up. And it's good to realize this fact.