Are you tired of data engineering stories that only ever end in success? Well, get ready to dig deeper into the world of "Survivorship bias in data engineering".
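The idea of "Survivorship bias" is most often associated with the statistician Abraham Wald, who worked at Columbia University during World War II. He analyzed the damage on combat aircraft and recommended where to reinforce them against enemy attacks. His insight was that the damage data came only from planes that made it back. The planes hit in other places never returned, so those were the places that needed the reinforcement.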
First, let's look at the major problem here: data is generated every millisecond, from customer touchpoints and from whatever else happens worldwide. Every event leaves a digital footprint, probably including whatever you are doing right now. Oh yes, data engineering has made it possible to collect and move massive amounts of data through what we call "data pipelines," leading to remarkable findings and business success. But haven't you heard about the numerous failed projects, small-scale and large-scale, that never make it to the spotlight? They do not get talked about because, well, they are failures.
But there is more! Once we look at it through the lens of survivorship bias, the data ecosystem we work in and the way we train machine learning models come into the limelight. Mostly, we focus only on the data that leads to successful models and neglect the data that did not make the cut. So, the machine learning models basically learn from a biased dataset.
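To make that a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Project` record, the `used_tool` flag, and the counts are assumptions, not real project data. It only shows how a "survivors only" view can point in the opposite direction from the full dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    used_tool: bool   # hypothetical feature, e.g. "adopted the shiny new tool"
    succeeded: bool   # whether the project "survived" to become a success story


def build_projects():
    """Build a small, invented dataset (the counts are illustrative only)."""
    projects = []
    projects += [Project(True, True)] * 4     # used the tool, succeeded
    projects += [Project(True, False)] * 16   # used the tool, failed
    projects += [Project(False, True)] * 2    # no tool, succeeded
    projects += [Project(False, False)] * 3   # no tool, failed
    return projects


def share_of_survivors_using_tool(projects):
    """Biased view: look only at the projects that succeeded."""
    survivors = [p for p in projects if p.succeeded]
    return sum(p.used_tool for p in survivors) / len(survivors)


def success_rate(projects, used_tool):
    """Full view: success rate within a group, failures included."""
    group = [p for p in projects if p.used_tool == used_tool]
    return sum(p.succeeded for p in group) / len(group)


projects = build_projects()
survivor_share = share_of_survivors_using_tool(projects)
rate_with_tool = success_rate(projects, used_tool=True)
rate_without_tool = success_rate(projects, used_tool=False)

print(f"Share of success stories that used the tool: {survivor_share:.0%}")
print(f"Success rate with the tool:    {rate_with_tool:.0%}")
print(f"Success rate without the tool: {rate_without_tool:.0%}")
```

In this toy data, two out of three success stories used the tool, which sounds like a strong endorsement. Once the failures are counted, projects with the tool succeeded 20% of the time and projects without it 40% of the time. A model trained only on the survivors would never see that.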
Now, I know what you are thinking: "Is there a better way?" or "What can we do other than the usual process?" Well, first, stop dismissing the so-called "failed" projects. Give a little attention to where they went wrong and try to learn from them. Second, take a step back and observe before falling into the usual routine. Divergent thinking is more mindful, and it involves actively seeking out examples of failed projects.
We are currently living through the AI hype cycle, so it's essential to keep an eye on where we are heading. When it comes to countering this bias, we can welcome the chaos and unpredictability of data engineering. It's risky to depend only on the success stories, and you must be prepared to change course and cope when things go wrong. And hey, maybe we can even find some humor in the failures along the way.
Folks, I hope you got some sense of the not-so-glamorous side of data engineering. But don't let that discourage you. Always welcome failure; try to catch it early and learn from it. You never know, your failed project might lead to something amazing (or at least a good laugh).