Shish Singh

The Great Fall: Decoding the CrowdStrike-Microsoft Outage of July 2024

On July 19, 2024, the digital world experienced a tremor. A global outage, impacting millions, brought critical services like Microsoft 365, Azure, and countless others to a grinding halt. The culprit? A seemingly innocuous software update from a leading cybersecurity company: CrowdStrike.

This blog delves into the technical intricacies of what went wrong, the mitigation plans put in place, and the global reach of this event.

A Programmer's Error Snowballs into a System Crash

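The outage traced back to a software defect in CrowdStrike's Falcon sensor. The explanation below is the null pointer account that circulated widely right after the incident. CrowdStrike's later root cause analysis described the fault as an out-of-bounds memory read triggered by a content update, but the failure is the same in kind: code reading memory it had no valid reference to. Here's a breakdown of the technical specifics: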

Programmer Error: During the development of a CrowdStrike Falcon sensor update, a C++ coding error was introduced.

Null Pointer Creation: The code created a pointer variable (Obj* obj) intended to reference a specific object in memory containing data. However, due to the error, the pointer remained NULL, meaning it didn't point to any valid memory location.

Missing Null Check: Ideally, programmers add checks to ensure a pointer isn't null before using it. This vital check was missing in the faulty code.

Attempting to Access "Nothing": With the null pointer essentially pointing to "nothing" in memory, the code tried to access information within the object it was supposed to represent (like obj->a or obj->b). This resulted in attempts to read data from an invalid memory address calculated based on the null pointer value (e.g., 0x0 + 4).

Imagine this scenario: you have a note to remind yourself to buy milk, but you forgot to write it down anywhere (the null pointer). Then, you try to read the imaginary note (accessing the null pointer) - it's bound to fail.

Memory Access Violation: The address was not mapped to valid memory, so the read triggered an access violation. Because the Falcon sensor runs as a kernel-mode driver, Windows could not simply terminate one misbehaving program. A fault in kernel mode forces the operating system to halt in order to protect system integrity, which produced the infamous Blue Screen of Death (BSOD) and the subsequent outage.
Essentially, the code tried to read data from nowhere in memory, triggering a system crash as a safety measure.
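
To make the steps above concrete, here is a minimal C++ sketch. The `Obj` struct and the helpers `address_of_b`, `read_b`, and `demo` are hypothetical names chosen for illustration; CrowdStrike's actual sensor code is not public. It shows how the address for `obj->b` comes from the pointer value plus the field's offset, and what the missing null check could look like.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>  // std::uintptr_t

// Hypothetical stand-in for the object described above.
struct Obj {
    int a;  // first field, at offset 0
    int b;  // second field, at offset 4 when int is 4 bytes
};

// Computes the address a read of obj->b would target, without
// dereferencing the pointer. For a null pointer this is 0x0 plus the
// field offset on mainstream platforms.
std::uintptr_t address_of_b(const Obj* obj) {
    return reinterpret_cast<std::uintptr_t>(obj) + offsetof(Obj, b);
}

// Reads obj->b only after confirming the pointer is not null.
// Returns false, leaving `out` untouched, when there is no object.
bool read_b(const Obj* obj, int& out) {
    if (obj == nullptr) {
        return false;
    }
    out = obj->b;
    return true;
}

// Usage sketch: a null pointer is rejected instead of being dereferenced.
inline void demo() {
    Obj o{1, 2};
    int value = 0;
    assert(read_b(&o, value) && value == 2);
    assert(!read_b(nullptr, value));
}
```

In user-mode code, skipping the check in `read_b` would crash only the offending process. The same mistake inside a kernel-mode driver takes the whole machine down.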

Mitigation Efforts and the Road to Recovery

While the cause may seem like a simple mistake, the impact was far-reaching. Thankfully, both CrowdStrike and Microsoft responded swiftly:

CrowdStrike: Acknowledged the issue, released a public statement, and published a workaround.

Microsoft: Communicated with CrowdStrike and external developers to expedite a solution. They also provided technical guidance and support to help customers recover safely.

CrowdStrike's fix addressed the null pointer issue and prevented further crashes. Microsoft also posted instructions on the Windows Message Center explaining how to remediate affected Windows endpoints.
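
The workaround CrowdStrike published, as widely reported at the time, was to boot an affected machine into Safe Mode or the Windows Recovery Environment and delete the file matching `C-00000291*.sys` from `%WINDIR%\System32\drivers\CrowdStrike`. The sketch below uses hypothetical function names and only illustrates the file-matching step on a list of names. It does not touch the file system, and it is not CrowdStrike's or Microsoft's tooling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True when `name` matches the pattern C-00000291*.sys.
// The comparison is case-sensitive for simplicity; Windows file names
// are not, so a real tool would normalize case first.
bool is_channel_file_291(const std::string& name) {
    const std::string prefix = "C-00000291";
    const std::string suffix = ".sys";
    if (name.size() < prefix.size() + suffix.size()) {
        return false;
    }
    return name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the subset of file names that the workaround would remove.
std::vector<std::string> select_files_to_remove(const std::vector<std::string>& names) {
    std::vector<std::string> matches;
    for (const std::string& name : names) {
        if (is_channel_file_291(name)) {
            matches.push_back(name);
        }
    }
    return matches;
}
```

Keeping the match as a pure function makes it easy to dry-run against a directory listing before anything is deleted.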

A Global Impact

The outage wasn't localized; it affected users across the globe. Critical business operations, healthcare services, airlines, stock exchanges, and countless individuals across various countries were impacted.

Microsoft estimated that the update affected about 8.5 million Windows devices, less than one percent of all Windows machines. Even so, reports showed the outage spanning continents and causing significant disruptions.

The CrowdStrike-Microsoft outage serves as a stark reminder of the domino effect a seemingly minor software bug can have. It highlights the importance of rigorous code reviews and the crucial role collaboration plays in mitigating widespread disruptions. As the digital world continues to evolve, so too must our efforts to ensure its stability and resilience.

Reference

Cover: https://timesofindia.indiatimes.com/technology/tech-news/microsoft-acknowledges-it-is-crowdstrike-behind-the-outage-read-what-the-company-said/articleshow/111865989.cms

Helping our customers through the CrowdStrike outage: https://www.nytimes.com/2024/07/19/business/microsoft-outage-cause-azure-crowdstrike.html

Microsoft outage cause explained: What is CrowdStrike and why users are getting Windows' blue screen of death?: https://www.livemint.com/technology/tech-news/blue-screen-of-death-windows-users-face-massive-outage-due-to-new-crowdstrike-update-11721370250881.html

What is CrowdStrike, the company at the heart of the global Microsoft outage?: https://indianexpress.com/article/technology/microsoft-global-outage-satya-nadella-crowdstrike-key-points-9464267/lite/
