On December 7, 2021, AWS suffered one of its most severe outages, centered on the US-East-1 region and lasting roughly five to seven hours. An automated scaling activity on AWS's internal network overloaded network devices, impairing control planes and disrupting core services such as EC2, S3, DynamoDB, and ELB. The result was widespread outages for Netflix, Disney+, Amazon's own e-commerce platform, Robinhood, Slack, and even government sites like the IRS, with economic losses estimated in the billions. The timing, during the pre-holiday peak, amplified the chaos, and the combination of duration, service breadth, and global ripple effects made it AWS's most impactful outage to date.
The incident exposed the risks of over-reliance on a single region, particularly US-East-1, AWS's busiest hub. AWS responded by improving network monitoring and traffic management and by publishing Post-Event Summaries faster. Key lessons include adopting multi-region architectures, using tools like AWS Fault Injection Simulator for resilience testing, and maintaining robust failover plans, as sketched below. As of September 2025, no outage has matched this scale, but the event underscores the need for businesses to diversify cloud dependencies and prepare for systemic risks.
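As a rough illustration of those lessons, here is a minimal boto3 (Python) sketch of an active-passive, multi-region failover: Route 53 DNS records that shift traffic to a standby region when a health check on the primary fails, followed by starting a pre-built Fault Injection Simulator experiment to rehearse that failure. The hosted zone ID, health check ID, domain names, and experiment template ID are placeholders for illustration, not values tied to the 2021 incident.

```python
import boto3

route53 = boto3.client("route53")
fis = boto3.client("fis")

# Placeholder identifiers; substitute your own hosted zone, health check,
# domain names, and FIS experiment template.
HOSTED_ZONE_ID = "Z0EXAMPLE"
PRIMARY_HEALTH_CHECK_ID = "hc-primary-example"

# Active-passive DNS failover: us-east-1 answers while healthy,
# us-west-2 takes over when the primary health check fails.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Failover between us-east-1 (primary) and us-west-2 (secondary)",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com.",
                    "Type": "CNAME",
                    "SetIdentifier": "primary-us-east-1",
                    "Failover": "PRIMARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "app-us-east-1.example.com"}],
                    "HealthCheckId": PRIMARY_HEALTH_CHECK_ID,
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com.",
                    "Type": "CNAME",
                    "SetIdentifier": "secondary-us-west-2",
                    "Failover": "SECONDARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "app-us-west-2.example.com"}],
                },
            },
        ],
    },
)

# Rehearse the failure path with an existing Fault Injection Simulator
# experiment template (e.g. one that disrupts network traffic in us-east-1).
experiment = fis.start_experiment(experimentTemplateId="EXT1A2B3C4D5EXAMPLE")
print("Started FIS experiment:", experiment["experiment"]["id"])
```

The low TTL keeps the DNS switchover quick, and running the fault-injection experiment on a schedule verifies the secondary region can actually absorb traffic before a real regional incident forces the question.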