Unless you were living on the Moon, you know what happened on July 19, 2024.
In a few words, a “defective” CrowdStrike content update for Windows hosts took down a huge number of machines. Blue screens of death were everywhere, affecting airports, hospitals, and more.
A big mess.
But as developers, what can we learn from it?
1. Be careful about what you give access to.
What I find difficult to understand is how CrowdStrike had such easy access to the core of Windows. It reminds me of this xkcd comic.
I understand that the Windows world is huge and that maintaining its security is a challenge bordering on the impossible. Windows also has a reputation for being insecure because of its past, and crackers want to reach as many users as possible, most of whom are on Windows.
So, when you download a third-party library, ask yourself 100 times whether you really need it and what is behind it. It's like having a guest in your home.
2. Testing is more important than the code you write.
Was it avoidable? Most probably, yes. Overconfidence, tight deadlines, and management pressure can lead developers to push untested code, and then disaster is just around the corner. Time for testing is a developer's right, and one that often needs to be explained multiple times to non-tech people.
3. Communication is important.
I appreciate CrowdStrike's honesty. They acknowledged the issue and blamed no one but themselves. The fix was late, but they didn't feed excuses to the non-tech people, many of whom immediately believed it was a cracker attack.
4. Make a postmortem so everyone can learn about it.
Having a postmortem is important. It's a document in which you analyze your mistakes and learn from them. It also shows the company's honesty and transparency.
And you? What are your thoughts about this?