DEV Community

SupermanSpace

The Great AWS Outage of October 2025: When the Internet's Backbone Buckled

October 20, 2025: In the early hours of Monday morning, millions of internet users worldwide woke up to find their favorite apps and services completely unavailable. Snapchat wouldn't load. Wordle was inaccessible. Medium wouldn't load, and Vercel was down. Ring doorbells went dark. Amazon's own shopping site displayed error pages featuring apologetic dog photos. The culprit? A massive outage at Amazon Web Services (AWS), the cloud computing giant that quietly powers much of the modern internet.

The Scale of the Disruption

The outage began at 12:11 a.m. PT (3:11 a.m. ET), when AWS reported an "operational issue" affecting 14 services in its US-EAST-1 Region in northern Virginia. What started as a technical fault in a single region quickly cascaded into one of the largest internet disruptions since the CrowdStrike outage of 2024.

Over 4 million users reported problems during the incident, which affected an astonishing array of services that people rely on daily. The impact was both widespread and indiscriminate, bringing down everything from entertainment platforms to critical business infrastructure.

Who Was Affected?

The list of affected services reads like a who's who of the internet:

Social Media & Communication:

  • Snapchat
  • Reddit
  • Signal (encrypted messaging)

Gaming:

  • Fortnite
  • Roblox
  • Pokémon GO

Financial Services:

  • Coinbase (cryptocurrency exchange)
  • Venmo
  • PayPal
  • Robinhood
  • Chime

Amazon's Own Services:

  • Amazon.com (shopping)
  • Prime Video
  • Alexa
  • Ring doorbells and security cameras

Airlines:

United Airlines experienced disruptions to its app and website, and some of its internal systems were also temporarily affected. Delta Air Lines reported a small number of minor flight delays.

Education:

  • Duolingo
  • Canvas (online teaching platform)

Other Major Services:

  • Canva (graphic design)
  • Perplexity (AI search)
  • Max (streaming)
  • Apple Music
  • Microsoft Teams (surprisingly affected despite Microsoft's own Azure cloud)

Support:

  • Intercom (down for the last 11 hours)
  • YourGPT Helpdesk (would not load for 15 minutes)
  • Ada (down for 2 hours)

In the United Kingdom, customers of banks including Lloyds, Bank of Scotland, and Halifax reported issues while attempting to log into their accounts. British government websites Gov.uk and HM Revenue and Customs also experienced disruptions, highlighting how dependent critical infrastructure has become on cloud services.

The Root Cause: A DNS and Database Perfect Storm

At 4:26 a.m. ET, Amazon flagged significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region. DynamoDB is AWS's managed NoSQL database service, which thousands of companies use to store and query their application data.

The issue appeared to be related to DNS resolution of the DynamoDB API endpoint. DNS (Domain Name System) is essentially the internet's phonebook, translating human-readable names into machine-readable IP addresses. When the name of the DynamoDB endpoint could not be resolved, applications could not find the database's address at all, so every service that depended on DynamoDB began failing, and the failures spread down the chain of services built on top of them.
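The following minimal Python sketch makes the DNS failure mode concrete. Before a client can send any request to DynamoDB, it has to resolve the endpoint name to IP addresses. `resolve_endpoint` and `endpoint_status` are hypothetical helpers and are not part of any AWS SDK. The resolver is injectable (defaulting to the operating system's `socket.getaddrinfo`), so the failure path can be simulated without a real outage.

```python
import socket
from typing import Callable, List, Tuple

# Hypothetical type for a resolver; the default is the OS resolver,
# socket.getaddrinfo. Injecting a fake lets us simulate a DNS failure.
Resolver = Callable[..., list]


def resolve_endpoint(hostname: str, port: int = 443,
                     resolver: Resolver = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IP addresses for hostname.

    Raises socket.gaierror when the name cannot be resolved.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def endpoint_status(hostname: str,
                    resolver: Resolver = socket.getaddrinfo) -> Tuple[bool, str]:
    """Report whether hostname resolves, without opening a connection."""
    try:
        addresses = resolve_endpoint(hostname, resolver=resolver)
    except socket.gaierror as exc:
        # No address means no connection: every request to this endpoint fails,
        # even if the servers behind the name are perfectly healthy.
        return False, f"DNS resolution failed for {hostname}: {exc}"
    return True, ", ".join(addresses)
```

On a working network, calling `endpoint_status("dynamodb.us-east-1.amazonaws.com")` should return `True` and a list of addresses. During a DNS fault like the one described above, it would instead return `False` and the resolver error. That is the same kind of error a client library would hit before it could send a single request.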

At 11:43 a.m. ET, AWS identified the root cause as "an underlying internal subsystem responsible for monitoring the health of our network load balancers". This fault in a monitoring system triggered a cascading failure that rippled across many AWS services.
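The coverage above does not describe how this AWS subsystem works internally. What follows is therefore only a generic Python sketch of how a health monitor decides which load balancer targets receive traffic, and why a faulty monitor is dangerous: if it wrongly reports healthy servers as failing, the load balancer pulls working capacity out of service. The `HealthMonitor` class, its thresholds, and its fail-open rule are all assumptions for illustration, not AWS's design. Failing open (keeping every target in service when most of them appear unhealthy at once) is one common safeguard against exactly this failure mode.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthMonitor:
    """Hypothetical load-balancer health monitor (an illustration, not AWS's design)."""
    unhealthy_threshold: int = 3      # consecutive failed checks before a target is pulled
    fail_open_fraction: float = 0.5   # above this unhealthy share, distrust the checks
    failures: Dict[str, int] = field(default_factory=dict)

    def record(self, target: str, check_passed: bool) -> None:
        """Record one health-check result: reset on success, count consecutive failures."""
        if check_passed:
            self.failures[target] = 0
        else:
            self.failures[target] = self.failures.get(target, 0) + 1

    def routable_targets(self, targets: List[str]) -> List[str]:
        """Return the targets the load balancer should send traffic to."""
        healthy = [t for t in targets
                   if self.failures.get(t, 0) < self.unhealthy_threshold]
        if not targets:
            return healthy
        unhealthy_share = 1 - len(healthy) / len(targets)
        if unhealthy_share > self.fail_open_fraction:
            # Most targets "failing" at once is more likely a broken monitor than
            # broken servers, so fail open and keep every target in service.
            return list(targets)
        return healthy
```

With these defaults, a single target that fails three checks in a row is removed from rotation. If more than half the fleet appears unhealthy simultaneously, the monitor stops trusting its own results rather than draining all capacity.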

A software engineer and cyber expert noted that the issue appeared to be with one of the networking systems AWS uses to control a database product, highlighting how a problem in one small component can bring down an entire ecosystem.

The Recovery: A Long and Bumpy Road

At 6:35 a.m. ET, AWS announced that the database problem was "fully mitigated" but warned there may still be delays. However, the recovery proved more complicated than initially hoped.

Many sites came back online within a few hours, although Downdetector showed another spike in user reports of outages at Amazon, AWS, and Alexa around noon ET. The initial fix didn't resolve every issue, and services continued to experience intermittent problems throughout the day.

Around 1:30 p.m. ET, AWS said it was starting to see "early signs" of EC2 recovery in some regions and was applying fixes to remaining areas. The company's EC2 (Elastic Compute Cloud) service provides virtual server capacity that companies rely on to run their applications.

Amazon.com itself wasn't spared. Downdetector received over 12,000 outage reports in the US, while Amazon showed frustrated shoppers "something went wrong" error pages featuring photos of dogs.

Real-World Impact: Beyond Inconvenience

While many users experienced mere inconvenience, the outage had more serious consequences for others:

Business Disruption: Warehouse and delivery employees, along with drivers for Amazon's Flex service, reported that internal systems were offline at many sites. Some warehouse workers were instructed to stand by in break rooms and loading areas during their shift.

Accessibility Concerns: One user shared a particularly poignant example of how cloud dependency affects vulnerable populations: "I use Alexa-enabled smart plugs to control the lamps in my room. I'm unable to walk without leaning on crutches so being able to turn lights and music on by voice is very helpful. During the outage my smart plugs became unresponsive".

Educational Impact: Educational publishing company Folens contacted teachers advising that services linked to its 'My Folens' library were being disrupted, affecting students' access to learning materials.

Security Concerns: Ring experienced outages affecting thousands of users, creating problems for those who rely on Ring doorbells and security cameras for safety.

The Fragility of Centralized Infrastructure

This outage underscores a fundamental vulnerability in how the modern internet is structured. It exposed how fragile companies that host their data on cloud servers can be, and how suddenly an unplanned outage can affect businesses across the globe.

AWS is the dominant player in cloud computing, generating $107 billion in revenue in the 2024 financial year, about 17% of Amazon's total revenue. This dominance means that when AWS experiences problems, the ripple effects are felt globally.

Cori Crider, executive director of the Future of Technology Institute, stated: "Europe's dependency on monopoly cloud companies like Amazon is a security vulnerability and an economic threat we can't ignore", calling for European governments to diversify their cloud providers and support local alternatives.

Charlotte Wilson, head of enterprise at Check Point Software Technologies, noted: "Today's outage is another reminder that the digital world doesn't stop at borders - a local fault can ripple worldwide in minutes".

Was It a Cyberattack?

With such widespread disruption, speculation naturally turned to the possibility of malicious activity. However, Rob Jardin, chief digital officer at cybersecurity company NymVPN, stated: "There's no sign that this AWS outage was caused by a cyberattack - it looks like a technical fault affecting one of Amazon's main data centres".

Rafe Pilling, director of threat intelligence at cybersecurity firm Sophos, acknowledged: "When anything like this happens, the concern that it's a cyber incident is understandable. AWS has a far-reaching and intricate footprint, so any issue can cause a major upset". However, the available evidence points to a technical failure rather than malicious intent.

Lessons Learned and Moving Forward

This incident raises critical questions about internet infrastructure resilience:

  1. Single Points of Failure: The concentration of so many services on a single cloud provider creates systemic risk. When AWS goes down, significant portions of the internet follow.

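  2. Redundancy vs. Cost: Running workloads across multiple regions or providers adds resilience, but it also adds cost and operational complexity, so many companies accept the risk of depending on a single region. And while AWS and other cloud providers generally maintain robust systems, the complexity of these networks means that unforeseen interactions can still cause cascading failures.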

  3. Transparency: AWS customers were unable to report the problem because AWS's automated support ticketing system was also offline, showing how even reporting mechanisms can be caught in the same failure.

  4. Geographic Concentration: Many of the reported outages appeared to be concentrated in the United States, particularly in Virginia, which is widely considered the global capital of data centers. This demonstrates the risks of geographic concentration.
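Points 1 and 4 are usually addressed by running in more than one region. This minimal Python sketch shows the basic client-side pattern: try a primary region first, then fall back to the others in order. `call_with_failover` and `AllRegionsFailed` are hypothetical names, not an AWS SDK feature, and `call` stands in for whatever request an application makes.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every configured region failed; carries the per-region errors."""

    def __init__(self, errors: Dict[str, Exception]):
        super().__init__("all regions failed: " + ", ".join(errors))
        self.errors = errors


def call_with_failover(regions: Sequence[str], call: Callable[[str], T]) -> T:
    """Try call(region) for each region in order and return the first success.

    Hypothetical helper: `call` represents any client request, for example
    a read against data that is replicated to every listed region.
    """
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return call(region)
        except Exception as exc:  # real code should catch only retryable errors
            errors[region] = exc
    raise AllRegionsFailed(errors)
```

Failover only helps if the data and the application are actually deployed, and kept in sync, in the secondary region. That requirement is where the cost side of the redundancy trade-off comes in.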

The Aftermath

As of late Monday afternoon, AWS continued working toward full resolution, with most services gradually returning to normal operation. However, the incident serves as a stark reminder of how dependent modern life has become on cloud infrastructure.

Charlotte Wilson recommended that people keep good backups, save important information offline, and know alternative ways to connect to the internet or pay if systems fail — practical advice for an increasingly cloud-dependent world.

The AWS outage of October 2025 will likely be studied for years to come as a case study in infrastructure fragility, centralization risk, and the need for more resilient digital systems. As we continue to move more of our lives online, building redundancy and diversification into our digital infrastructure isn't just good engineering — it's essential for maintaining the connectivity that modern society depends upon.

For now, the internet has largely recovered, but the question remains: how do we build a more resilient digital future that doesn't collapse when a single provider stumbles?


This is a developing story. AWS continues to monitor services and work toward complete restoration of all affected systems.
