How many times have you seen a picture of a plane with red dots posted on the internet without context?
There's a famous story about a statistician named Abraham Wald and a bunch of WWII bombers. The military looked at the planes coming back from combat, mapped where they were riddled with bullet holes, and decided to add armor there. Wald, being the kind of person who ruins meetings by being right, pointed out the obvious thing nobody wanted to hear:
The planes they were looking at came back. The ones hit in the spots with no bullet holes, the engine, the cockpit, were at the bottom of the English Channel, not available for the survey.
Reinforce the parts that aren't shot up. That's where the dead planes got hit.
I think about this story a lot, mostly while reading those blog posts titled "X Habits That Made Me a 10x Engineer."
The entire industry is a returning-plane survey
Here is the uncomfortable thing about software engineering wisdom: almost all of it is collected from the planes that came back.
Successful companies write blog posts. Successful founders do podcast tours. Successful engineers give conference talks with titles like "Scaling to 100 Million Users with Three People and a Dream." The companies that did the exact same things and died do not have a booth at the conference. They are not on the panel. They are in the channel, with the engines.
And yet we keep doing the survey. We stare at the bullet holes on the survivors and go, "Ah, this is where we add armor."
"Netflix uses microservices, so we should too"
You have eleven users. Three of them are your co-founders, and one is your mom.
Netflix runs a globe-spanning streaming empire on hundreds of microservices because they have hundreds of teams, billions in revenue, and problems you will be lucky to have in a decade. You have a Postgres database that is doing just fine, thank you, and a monolith that boots in four seconds.
So naturally, you spend the next eight months splitting your perfectly functional app into 23 microservices, introducing a message queue you don't understand, adding a service mesh to manage the services you didn't need, and now your local dev environment requires 40GB of RAM and a small prayer.
You did what Netflix does! And you will also do what most companies that copied Netflix did, which is quietly migrate back to a monolith in two years and write a blog post about it called "Why We Moved Back to a Monolith." That post will be very popular. It will be read by people who are about to make the same mistake and will not be dissuaded.
The bullet holes are on the survivors. The armor goes where the holes aren't.
"All our best engineers skipped college"
Survivorship bias loves a hiring anecdote.
"Our best people are self-taught, never finished a CS degree, just shipped." Cool. You are describing the four self-taught people you hired and kept. You are not describing the enormous pool of self-taught applicants you rejected, or the ones who flamed out in month two, or the entire category of people who never applied because the job post wanted "10 years of Kubernetes experience" for a technology that is nine years old.
You hired survivors and then concluded that being a survivor is a great predictor of survival. Groundbreaking.
"They don't build software like they used to"
Ah, yes, the legendary robustness of old software. That COBOL system from 1987 that still keeps that one giant corporation afloat. They really knew how to build things back then.
Friend. That COBOL system is the survivor. It is one plane. The skies of 1987 were thick with garbage software that crashed, got cancelled, lost its company, and was mercifully deleted before you were born. Nobody's writing "we don't make 'em like we used to" think-pieces about PayrollSystem_FINAL_v3_actuallyfinal.exe because it was put out of its misery in 1991.
We remember old software fondly for the same reason we remember the planes that landed: the other ones aren't here to interview.
The advice that is secretly just luck wearing a hoodie
The most dangerous survivorship bias is the confident first-person retrospective. It goes like this:
"I never wrote tests. I never used a debugger. I just shipped fast and trusted my gut. And look where I am now."
Yes. Look where you are. You are one data point who happens to be holding a microphone. For every you, there is a graveyard of developers who also never wrote tests, also shipped on vibes, and are now explaining to a very calm room why prod has been down for six hours. They are not at this conference. They are in a Slack channel called #incident-2024 that will never be archived.
The gut-trusting works great right up until it doesn't, and the "doesn't" is invisible because it doesn't get a keynote.
How to actually not be the dumb general
The fix is the whole point of the Wald story, and it's deceptively simple: go looking for the missing planes.
When someone tells you, "Do X, it's why we won," ask the question nobody at the survey wanted to ask:
How many teams did X and lost?
If the trait shows up just as often in the wreckage as in the winners, it didn't cause the winning. Microservices, rewriting in Rust, a four-hour daily standup, unlimited PTO, "we don't do meetings" — run them all through the same filter. The survivors will tell you their habits caused their success. The dead can't tell you their identical habits caused nothing of the sort.
This is why post-mortems are worth more than success stories. A blameless write-up of how something failed is one of the rare documents written from inside the wreckage. It's a transmission from a plane that didn't come back, which makes it the only useful data on the whole map.
Please add the armor where the holes aren't
So go forth. Read the FAANG engineering blog. Admire the architecture diagram with 200 boxes. Feel the pull to adopt all of it.
Then remember that you are surveying returning planes, close the tab, and go ship the boring monolith that your eleven users will actually be served by.
The bullet holes are a trap. The survivors are a biased sample. And the next time someone says "we should do it like Google," you get to be the insufferable person in the meeting who is, annoyingly, correct.
Wald would be proud. Wald would also tell you to write tests.




Top comments (0)