In the modern era of information, it is generally very apparent when things don't go the way we intend them to. We have a habit of chalking these non-ideal outcomes up as failures. They include all sorts of symptoms: a system failing to do what was expected of it, a timeline not being met, the wrong expectation having been set.
From what I've seen, the product development world has largely accepted that these "failures" are a part of the general development life cycle. "We learn from our failures more than our successes" is a pretty common phrase I have heard in relation to development, even going back to high school.
So if we all agree that these "failures" are an important part of the process, it makes sense to embrace failure and do everything we can to maximize the benefits from it, right?
Unfortunately, we are surrounded by a culture that does not seem to want to embrace failure. The answer to failure is very often to find some sort of scapegoat. This is especially true when someone is reacting to (what they view as) a failure in a product or system that they have no immediate, direct control over.
The Blame Game
Let's say a piece of code gets deployed that drops the entirety of your users table given some special condition. Let's play the blame game.
It was QA's fault for not testing 100% of every single possible outcome. It was the developer's fault for writing bad SQL. It was the reviewer's fault for not catching the bad SQL. It was the project owner's fault for not properly scoping the ticket to include test cases that account for that edge case. It was the architect's fault for not having better, more testable design patterns implemented that could have caught this. It was the team lead's fault for not thoroughly checking every single one of the PRs that came their way. It was DevOps' fault for deploying the bad code.
Cool, great, we did it. We blamed all the things. Some of those probably sound pretty ridiculous, too. Now we can move on, we have fixed the problem. Wait, no, we didn't. We solved nothing. This thing is going to occur again in a week because SQL mistakes are just inherently difficult to catch.
Everything has a Risk
Nothing we ever do will be completely foolproof. There is no perfect state where I can click a button and be 100% confident that the button will do the exact same thing it has always done. In fact, Chaos Engineering is a really cool style of software design that actively introduces unpredictability to force us to design around it.
To complicate the problem even more, not every risk is equal. Using Chaos Engineering as a fairly meta example, the impact of an overlooked risk is often not high enough to warrant maintaining something like Chaos Monkey, which carries significant overhead and has implications for development life cycles.
Suffice it to say that if we can't be expected to account for every risk, we should not be held accountable when one slips through the cracks. This is not to say that we have no responsibility and shouldn't be held accountable for improving some portion of the system or process. I'm actually saying the opposite.
Empower Your Failures
So, stepping back, we have our failure. Something went wrong; expectations were not met. Do we roll everything back to when everything was failure-free and hide in a cave, never making a change again, because obviously the way we have it now is perfect?
No! We mitigate the damage and use the "failure" as leverage to implement change that improves the process or the system. One of the wisest people I know taught me that some of the greatest changes are born in chaos. No one wants to make process or design changes when everything is working.
If you have no means to provide an improvement, and this is super important, don't feel obligated to have an opinion. It's very easy to find where a piece of process could be improved, especially in hindsight. We have an entire internet full of people waiting to tell you what went wrong.
The valuable part, the reason people make tons of money doing jobs that require decades of experience learning from these failures time and time again, is improving the system or process to avoid repeating mistakes. Throwing out blame about what the actual problem was or who caused it does not help, and will often become a distraction to the people attempting to solve it.
It's About Trust
We love to categorize people. Stereotyping and generalizing are human tools for avoiding the fact that we can't actually comprehend everything that goes on in a person's mind without having lived their entire life. When someone makes a decision we don't like or breaks something, it's so easy to chalk it up to them being incompetent and move on with our lives.
I would challenge you to default to trusting everyone until they have really proven they don't deserve it. I'm sure you just had ten people pop into your head who aren't trustworthy. Flip that script: there are other people who probably had you pop into their heads as untrustworthy. They are wrong, right?
This is a hard exercise: actually believing that people are by default good and well-intentioned, and aren't intentionally causing problems just because they're some movie villain who likes disrupting your life.
With this trust, however, you will find that instead of blaming someone, you will start looking for context that you were missing: something that drove the decisions that were made and that might have been outside the scope of what you know. And with that knowledge comes not only the ability to improve the system or product, but the ability to improve yourself.