DEV Community

Bruno Vego

Dealing with broken prod

Breaking production is probably one of the biggest fears that we have as developers. It can be disastrous for your company, but most of the time it really isn't.

It will probably happen sooner or later in your career; some bug will make it past all your unit tests, e2e tests, and manual tests.

In my experience, some people have a really bad time dealing with the fact that they broke prod... but let me tell you a secret: it gets better with every prod you break.

The worst one is the first one. Please don't think that I am encouraging you to break prod all the time; I am just suggesting that you shouldn't feel terrified when it happens for the first time. 😄

Bad stuff happens

Here's a little side story: I remember the first time I got an F in school. It was fine arts, a subject that a ten-year-old me wasn't really fond of since I couldn't draw if my life depended on it. When the teacher read my name and said F, I remember bursting into tears. I always had all As in all my other subjects and worked hard for that, but my first F was in fine arts? Like, c'mon.

Anyways, when my mom picked me up I thought she was gonna yell at me (Balkan parents for y'all; it's similar to the Asian parent stereotype, but much more unpredictable), but to my surprise, she didn't. She actually took me for a pizza (which an overweight ten-year-old really enjoyed) and we talked about how it happened and how to prevent it in the future.

The point of the story is that when you screw up, chances are that you won't get yelled at, but helped to overcome it. If your manager yells at you for doing something bad and you've actually put legit effort into it, it's probably time to switch jobs.

I did break production a few times more than I'd like, but, as with everything, the more experience you have, the better you will handle it. (I also got some more Fs in elementary school later on, but didn't cry at any of the others lol)

When you break prod for the first time, you'll probably break into a sweat (don't burst into tears, you'll probably hate that later) and your mind will get clogged. Take a breath.

Also, if someone else breaks prod please don't go out looking for blood. Blaming people is bad etiquette and you certainly wouldn't want someone to go after you when you are the culprit. Help them instead. It's a team effort.

Most of the time the problem will be pretty easy to fix, but there are steps that we can take beforehand to prevent it.

Write postmortems

Postmortems are a great way for your team to see what has gone wrong and take the right steps to mitigate the problem in the future.

A postmortem usually contains the following information:

  • When the outage happened
  • How long it lasted
  • How large the impact was
  • What actually happened
  • Steps to mitigate it in the future (it's great to include a timeline when it will get fixed)
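If it helps to keep postmortems consistent, the fields above can be sketched as a small record type. This is only an illustration: the class name, the field names, and the example outage are all made up here, not a standard format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Postmortem:
    """One outage write-up. Field names are illustrative, not a standard."""
    started_at: datetime   # when the outage happened
    duration: timedelta    # how long it lasted
    impact: str            # how large the impact was
    summary: str           # what actually happened
    # (action, due date) pairs: steps to mitigate it in the future
    follow_ups: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        """Format the postmortem as plain text for a wiki page or email."""
        minutes = int(self.duration.total_seconds() // 60)
        lines = [
            f"When: {self.started_at:%Y-%m-%d %H:%M}",
            f"Duration: {minutes} min",
            f"Impact: {self.impact}",
            f"What happened: {self.summary}",
            "Follow-ups:",
        ]
        lines += [f"  - {action} (due {due})" for action, due in self.follow_ups]
        return "\n".join(lines)


# Invented example data, only to show the shape of a filled-in record.
example = Postmortem(
    started_at=datetime(2024, 1, 15, 14, 30),
    duration=timedelta(minutes=42),
    impact="Checkout failed for some logged-in users",
    summary="A missing null check made it past review and tests",
    follow_ups=[("Add a test for empty carts", "2024-01-22")],
)
print(example.render())
```

Notice that there is no "who did it" field. That is on purpose.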

Also, I'll state it again: don't blame anyone. It's not an individual's fault. Other people reviewed the broken code and let it through, and testers may have missed the bug too. The fault is always split across the team.

Set up a proper CI/CD environment

This part is especially important for developers.

If the problem is too big and the actual cause of the bug is unknown, waiting for the fix isn't a real option. This is where a proper CI/CD environment comes in really handy.

Just deploy the old version of the app. It should be as simple as clicking a button.

It really isn't that hard, and there are a bunch of articles online on how to do it. It may even come out of the box if you are deploying to Netlify, Vercel, or Heroku; in that case, you just push the old git tag.
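Whatever the "button" ends up being, it has to know which version to go back to. Here is a minimal sketch of that decision, assuming you keep an oldest-first list of the tags you have deployed. The function name and the tag format are hypothetical, and the actual redeploy step depends on your platform, so it is left out.

```python
def rollback_target(deploy_history: list[str], broken: str) -> str:
    """Return the release that was live right before `broken`.

    `deploy_history` is assumed to be ordered oldest first. Raises
    ValueError if `broken` was never deployed or nothing older exists.
    """
    # Find the most recent deploy of the broken release.
    last = len(deploy_history) - 1 - deploy_history[::-1].index(broken)
    # Walk backwards to the newest entry that is a different release.
    for tag in reversed(deploy_history[:last]):
        if tag != broken:
            return tag
    raise ValueError(f"no earlier release to roll back to from {broken}")


history = ["v1.0.0", "v1.1.0", "v1.2.0"]
print(rollback_target(history, "v1.2.0"))  # v1.1.0
```

Keeping this logic boring and automatic is the whole point: nobody should be guessing version numbers while prod is down.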

Set up monitoring and alarms

Monitoring and alarms are great because they allow you to notice that something is wrong before the users notice or report it.

I really like Sentry for error logging and Datadog for logs and metrics. They are really awesome tools and are not that complicated to work with.
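To give a feel for what an alarm does under the hood, here is a toy error-rate check over the most recent requests. It is not how Sentry or Datadog work internally, and it doesn't use their APIs; the class name, window size, and threshold are assumptions picked for the example.

```python
from collections import deque


class ErrorRateAlarm:
    """Fire when too many of the last `window` requests failed."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        # The deque drops the oldest outcome once it holds `window` items.
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True if the alarm should fire."""
        self.outcomes.append(ok)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.threshold


alarm = ErrorRateAlarm(window=10, threshold=0.2)
for ok in [True] * 10 + [False] * 3:
    if alarm.record(ok):
        print("error rate too high, page someone")
```

In real life you would let your monitoring tool do this for you, but the idea is the same: pick a window, pick a threshold, and get notified before the users do.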

Invest time into tests

Whether they are e2e or unit tests, they really come in handy. Tests essentially work both as QA and as a specification of how something should work. They are especially useful if you are rewriting stuff - it's nice to have example inputs with the results asserted against their expected outputs.
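Here's a small sketch of what "tests as a specification" can look like, using Python's built-in unittest. The `slugify` function is a made-up example, not something from a real project; the table of inputs and expected outputs is the part that matters.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a post title into a URL slug (hypothetical example function)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifySpec(unittest.TestCase):
    def test_examples(self):
        # Each pair documents one expected behaviour of slugify.
        cases = {
            "Dealing with broken prod": "dealing-with-broken-prod",
            "  Hello,   World!  ": "hello-world",
            "": "",
        }
        for title, expected in cases.items():
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)
```

Save it in a file and run it with `python -m unittest`. If you ever rewrite `slugify`, the same table tells you right away whether the new version still behaves like the old one.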

I won't go on about this too long since there are thousands of posts on unit tests.

Wrapping up

Developing isn't a breeze. Sometimes it's really hard to keep the focus on a certain problem, especially if you are working on it for a prolonged time.

Bugs will pass through, and mistakes will sometimes be visible. They will also get fixed; that's just how developing works (especially in these times where companies push you to release stuff as soon as possible).

Don't blame others when that happens, and ask for help when you have to fix it. It's alright.

Thank you for reading!
