DEV Community

Chameera De Silva

Survivorship Bias in Data Engineering

Are you tired of hearing data engineering stories that only ever end in success? Then get ready to dig deeper into the world of "survivorship bias in data engineering".

The classic illustration of survivorship bias comes from the statistician Abraham Wald, who worked with the Statistical Research Group at Columbia University during World War II. While analyzing damage on combat aircraft that returned from missions, he recommended reinforcing the areas that showed the fewest bullet holes, because the aircraft hit in those places were the ones that never made it back.

First, let's look at the bigger picture. Data is generated every millisecond: every customer touchpoint and every event around the world leaves a digital footprint. Data engineering, through what we call "data pipelines," has made it possible to collect and process this massive amount of data, leading to remarkable findings and business success. But what about the numerous failed projects, small and large, that never make it into the spotlight? Nobody talks about them because, well, they are failures.

But there is more. Survivorship bias also shapes the data ecosystem itself and the way we train machine learning models. We tend to focus only on the data behind successful models and neglect the data that didn't make the cut. As a result, the models end up learning from a biased dataset.
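To make the effect concrete, here is a minimal Python sketch built on a small, entirely hypothetical set of project records (the `Project` class, the `uses_streaming` feature, and all the numbers are made up for illustration). Looking only at the successful projects suggests that streaming is the recipe for success; once the failures are included, the success rate turns out to be the same with or without it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    uses_streaming: bool  # hypothetical feature, for illustration only
    succeeded: bool


# Hypothetical, hand-made records: 10 projects, 5 successes and 5 failures.
PROJECTS = (
    [Project(uses_streaming=True, succeeded=True)] * 4
    + [Project(uses_streaming=False, succeeded=True)] * 1
    + [Project(uses_streaming=True, succeeded=False)] * 4
    + [Project(uses_streaming=False, succeeded=False)] * 1
)


def share_using_streaming(projects):
    """Fraction of the given projects that used streaming."""
    return sum(p.uses_streaming for p in projects) / len(projects)


def success_rate(projects, uses_streaming):
    """Success rate among projects with the given streaming flag."""
    group = [p for p in projects if p.uses_streaming == uses_streaming]
    return sum(p.succeeded for p in group) / len(group)


# The "survivors": the only projects anyone writes success stories about.
survivors = [p for p in PROJECTS if p.succeeded]

# Survivors only: 4 of 5 used streaming, which looks like a recipe for success.
print(f"Survivors using streaming: {share_using_streaming(survivors):.0%}")

# All projects: 4 of 8 with streaming succeeded, and 1 of 2 without it did.
print(f"Success rate with streaming: {success_rate(PROJECTS, True):.0%}")
print(f"Success rate without streaming: {success_rate(PROJECTS, False):.0%}")
```

A model trained only on `survivors` would never see a single failure, so it would have no way to learn that, in this toy data, the feature makes no difference to the outcome.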

Now, I know what you are thinking: "Is there a better way?" or "What can we do beyond the usual process?" First, stop dismissing so-called "failed" projects; pay attention to where they went wrong and learn from them. Rather than repeating what we usually do, it pays to observe: divergent thinking is more mindful, and it involves actively seeking out examples of failed projects.

We are currently living through an AI hype cycle, so it's essential to keep an eye on where we are heading. To counter survivorship bias, we can embrace the chaos and unpredictability of data engineering. Relying only on success stories is risky; be prepared to change course and cope when things go wrong. And hey, maybe we can even find some humor in the failures along the way.

Folks, I hope this gave you a sense of the not-so-glamorous side of data engineering. But don't let it discourage you. Always welcome failure; try to catch it early and learn from it. Who knows? Your failed project might lead to something amazing (or at least a good laugh).