All it takes is a single line of code to wreak havoc on the global economy in a single day. Here's the complete story of the Microsoft Windows outage.
On 19th July 2024, Windows machines around the world crashed with the Blue Screen Of Death (BSOD) error.
The cause was a bug in the code of a software called Falcon Sensor, provided by CrowdStrike, a cybersecurity organisation.
Many organisations rely on Falcon to secure their Windows machines. So, when a buggy update to the security software was loaded, the resulting fault forced Windows to halt as a protective measure, leading to the blue screen error.
Now, what was the actual technical issue? Let's do a deep dive!
The faulty code created a pointer that pointed to NULL when it should have pointed to a valid object (say, Object* obj). In other words, the pointer referred to an invalid memory location.
Now, when the object's properties had to be accessed through that pointer, it simply wasn't possible, as the pointer was pointing to nothing, i.e., NULL. A mess like this is usually manageable, as developers normally have a NULL check in place (like obj == NULL). But that check was missing as well.
Moving on, when the object's properties were accessed (obj->a, or obj.a in some languages), the program tried to read from an invalid memory location, which it is not allowed to do (if obj is NULL, then obj->a is invalid). It's like reading from an invisible book. If the book isn't visible, then the content is also missing (invalid, in our case). You can't read the content if you can't see the book.
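The missing NULL check described above can be sketched in C++ roughly as follows. This is only an illustration of the general pattern, not CrowdStrike's actual code: the Object struct, its field a, and both function names are made up for the example.

```cpp
#include <optional>

// Hypothetical stand-in for the object the article calls `obj`.
struct Object {
    int a;
};

// Unsafe: dereferences the pointer without checking it first.
// If obj is null, this is undefined behaviour. In user space it
// typically crashes the process; in kernel space it can bring
// down the whole machine.
int read_unchecked(const Object* obj) {
    return obj->a;
}

// Safer: check for null and report "no value" instead of dereferencing.
std::optional<int> read_checked(const Object* obj) {
    if (obj == nullptr) {
        return std::nullopt;  // caller decides how to handle the missing object
    }
    return obj->a;
}
```

With the checked version, a caller that receives an empty result can log the problem and skip the operation, rather than reading from memory it does not own.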
When the program tried to read from an invalid memory location (which it wasn't supposed to), Windows treated this as a threat to the integrity of the system and halted the entire machine in order to protect it. This resulted in the Blue Screen of Death (BSOD) and hence, the outage.
For people wondering how CrowdStrike could push faulty code into Microsoft's codebase: they did not. CrowdStrike released an updated .sys file (which itself wasn't a kernel-level driver) that was read by other components of the Falcon sensor, and those components run in the same space as the Windows kernel.
The Windows kernel is the most privileged part of the Windows operating system. Code running there can interact directly with memory and other hardware, so a fault there cannot be contained the way a crash in an ordinary application can.
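Because an update file like this ends up being consumed by highly privileged code, one common defensive pattern is to validate the whole file before any of it is put into use. The sketch below is a hypothetical illustration of that idea only. The "name=pattern" line format, the Rule struct, and the functions parse_rule and load_rules are all assumptions made for the example and do not reflect Falcon's real file format or code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule record; not CrowdStrike's real format.
struct Rule {
    std::string name;
    std::string pattern;
};

// Parse one "name=pattern" line. Malformed input yields an empty
// result instead of a half-built object.
std::optional<Rule> parse_rule(const std::string& line) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0 || eq + 1 >= line.size()) {
        return std::nullopt;  // no '=', empty name, or empty pattern
    }
    return Rule{line.substr(0, eq), line.substr(eq + 1)};
}

// All-or-nothing load: if any line is malformed, the rules that are
// currently active stay untouched and the update is rejected.
bool load_rules(const std::vector<std::string>& lines,
                std::vector<Rule>& active) {
    std::vector<Rule> staged;
    for (const std::string& line : lines) {
        std::optional<Rule> rule = parse_rule(line);
        if (!rule) {
            return false;
        }
        staged.push_back(*rule);
    }
    active = std::move(staged);
    return true;
}
```

The design choice here is staging: the new rules only replace the old ones after every line has been checked, so a bad update degrades to "keep running with yesterday's rules" instead of a crash.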
In the end, these three stakeholders each need to realize what needs to be done on their end:
DEVELOPERS
Firstly, consider all possible scenarios and try to write bug-free code. Secondly, reducing the number of lines of code doesn't necessarily reduce the negative impact created by the code. Lastly, thoroughly test your code for all possible cases (including edge cases).
ORGANIZATIONS
Regular maintenance of your codebase with newer technologies and/or standards is crucial. Never overlook code security. I repeat, NEVER! Code maintenance and security may cost you thousands now, but will save you millions (if not billions) in the future. Add to it the saved reputation of the company!
END CONSUMERS
Never rely on a single system or process, technical or managerial. Always have backup systems/processes that take a completely different approach. In this case, companies such as Zerodha weren't impacted as much because most, if not all, of their systems were running on the Linux operating system. Diversify your operational risk!