Tarek CHEIKH

Posted on Jan 15 • Originally published at tarekcheikh.Medium on Jan 15

The Hidden Backbone of the Internet: Why S3 Security Should Keep You Up at Night

#security #awss3 #aws #infosec

Part 1 of 4 in the S3 Security Series

When you scroll through Instagram, stream a movie on Netflix, or check your bank balance, you’re probably not thinking about cloud storage. But behind almost every digital experience you have today, there’s a good chance Amazon S3 is involved. And that’s exactly why its security matters more than most people realize.

I’ve spent over 20 years in IT architecture and cloud computing. In the past decade working with AWS, I’ve seen organizations make the same security mistakes over and over again. Mistakes that have cost companies hundreds of millions of dollars and exposed the personal data of hundreds of millions of people.

This is the first in a series of four articles where I’ll share what I’ve learned about S3 security — not from textbooks, but from real-world experience building and securing cloud infrastructure.

S3: The Invisible Giant

Let me give you some numbers that put S3’s scale into perspective.

Amazon S3 currently stores over 350 trillion objects across exabytes of data. It handles over 100 million requests per second on average. To put that in human terms: every second, S3 processes more requests than many websites receive in a year.

S3 was launched on March 14, 2006, so it has been around for almost two decades. In that time, it has become foundational infrastructure for the internet. Netflix runs its streaming back end on AWS and uses S3 to store its video assets (the streams themselves are delivered through Netflix’s own Open Connect CDN). Airbnb uses S3 for backups and static files, and relied on it as it scaled from a small startup to hosting over 7 million accommodations globally. Even NASA relies on AWS infrastructure for its computing and storage needs.

When S3 has an outage, which is rare given that S3 Standard is designed for 99.99% availability (and, separately, for 99.999999999% durability, or eleven 9s), entire portions of the internet feel it. That’s how deeply embedded it has become in our digital infrastructure.
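Those availability and durability targets are easier to grasp as concrete numbers. The back-of-the-envelope sketch below treats each design target as a simple annual probability, which is a simplification of how AWS defines them; the function names are illustrative and not part of any AWS tool.

```python
# Average year length in minutes, including leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes per year a service may be unavailable and still meet the target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def years_per_object_lost(durability: float, object_count: int) -> float:
    """Average years between single-object losses, treating durability as an
    independent annual survival probability for each stored object."""
    expected_losses_per_year = object_count * (1.0 - durability)
    return 1.0 / expected_losses_per_year


print(f"99.99% availability: {allowed_downtime_minutes(0.9999):.1f} min/year")
print(f"Eleven 9s, 10M objects: one loss every "
      f"{years_per_object_lost(0.99999999999, 10_000_000):,.0f} years")
```

This works out to roughly 52.6 minutes of allowed downtime per year, and about one lost object every 10,000 years when storing 10 million objects, in line with the example AWS itself uses to explain eleven 9s.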

The Security Problem Nobody Talks About

Here’s the thing about S3: it’s incredibly easy to use. You can create a bucket, upload files, and share them in minutes. That simplicity is both its greatest strength and its biggest security risk.

According to widely cited research from the peak of S3 misconfiguration issues, as many as 7% of all S3 buckets were publicly accessible without any authentication, and 35% were unencrypted. We’re not talking about small test buckets; we’re talking about production systems holding customer data, financial records, and personal information.

The issue peaked around 2017, but it didn’t stop there. These incidents have continued for years, demonstrating how difficult it can be to properly secure cloud-based resources.
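What does “publicly accessible” look like in practice? One common form is a bucket policy statement that allows an action to the wildcard principal "*". The sketch below flags such statements. It is deliberately simplified: it skips any statement that has a Condition block, whereas AWS’s own public-access evaluation treats some conditions as still public, so treat its output as a first pass rather than a verdict. The function name and sample bucket are illustrative.

```python
import json


def _is_wildcard_principal(principal) -> bool:
    """True if a policy's Principal element matches everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        return "*" in (aws if isinstance(aws, list) else [aws])
    return False


def unconditional_public_statements(policy_json: str) -> list:
    """Return the Sid (or position) of each Allow statement that grants access
    to any principal without a Condition block."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a bare object
        statements = [statements]
    return [
        stmt.get("Sid", index)
        for index, stmt in enumerate(statements)
        if stmt.get("Effect") == "Allow"
        and _is_wildcard_principal(stmt.get("Principal"))
        and not stmt.get("Condition")
    ]


# Hypothetical policy that makes every object in the bucket world-readable.
policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})
print(unconditional_public_statements(policy))  # ['PublicRead']
```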

The Decade of S3 Breaches: A Timeline

Let me walk you through some of the most significant S3-related breaches. These aren’t theoretical scenarios — they’re documented incidents that resulted in regulatory fines, lawsuits, and real harm to millions of people.

2017: The Year Everything Went Public

Deep Root Analytics — 198 Million Voter Records

In June 2017, security researcher Chris Vickery discovered an unsecured Amazon S3 bucket containing personal information on approximately 198 million American voters — roughly three out of every five Americans. The data was collected by Deep Root Analytics, a data analytics firm working for the Republican National Committee.

The exposed data included dates of birth, mailing addresses, phone numbers, political inclinations, voter registration status, and algorithmic predictions across 48 different categories. About 1.1 terabytes of data could be downloaded without a password; anyone could access it simply by navigating to a six-character Amazon subdomain.

The data was exposed on June 1, 2017, when the firm updated security settings. It was discovered on June 12 and secured on June 14.

Verizon — 14 Million Customer Records

The same month, security researchers discovered an Amazon S3 bucket that was configured to allow public access and whose contents were fully downloadable. It belonged to NICE Systems, a third-party vendor handling Verizon’s back-office and call center operations.

The exposed data included customer names, phone numbers, and account PINs — enough information for anyone to access individual accounts, even those protected by two-factor authentication. Verizon was notified on June 13 about the exposure, but the bucket wasn’t locked down until June 22.

Accenture — Internal Credentials Exposed

Also in 2017, Accenture accidentally exposed sensitive data due to improperly secured AWS S3 buckets. The mistake could have allowed attackers to access internal credentials and encryption keys — essentially giving them the keys to the kingdom.

Dow Jones & Company — 2 Million+ Customer Records

The parent company of the Wall Street Journal exposed personal information about more than 2 million customers through misconfigured S3 permissions. The bucket was set to allow anyone with a free AWS account to read files containing millions of customer account details.
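The “anyone with a free AWS account” setting corresponds to S3’s predefined AuthenticatedUsers grantee group, which covers every AWS account, not just users in your own; the AllUsers group goes further and covers anyone on the internet. The sketch below scans the Grants list from a GetBucketAcl-style response for either group. The sample ACL and canonical ID are made up.

```python
# Predefined S3 grantee groups that indicate overly broad access.
RISKY_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers": "anyone on the internet",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "any AWS account",
}


def risky_acl_grants(grants: list) -> list:
    """Return (audience, permission) pairs for grants to broad S3 groups.

    `grants` follows the shape of the "Grants" list in a GetBucketAcl response.
    """
    findings = []
    for grant in grants:
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in RISKY_GROUPS:
            findings.append((RISKY_GROUPS[grantee["URI"]], grant.get("Permission")))
    return findings


# Hypothetical ACL: the owner has full control, but any AWS account can read.
sample_grants = [
    {"Grantee": {"Type": "CanonicalUser", "ID": "owner-canonical-id"},
     "Permission": "FULL_CONTROL"},
    {"Grantee": {"Type": "Group",
                 "URI": "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"},
     "Permission": "READ"},
]
print(risky_acl_grants(sample_grants))  # [('any AWS account', 'READ')]
```

Since April 2023, new buckets default to the “bucket owner enforced” Object Ownership setting, which disables ACLs entirely, but older buckets may still carry grants like these.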

WWE — Fan Database Exposed

World Wrestling Entertainment leaked information about wrestling fans, including addresses, birthdates, educational background, ethnicity, earnings, and children’s age ranges. The database was found in an S3 bucket that required no authentication.

Alteryx — 123 Million Households

Marketing and analytics company Alteryx put sensitive data about the majority of American households at risk. A database containing addresses, phone numbers, mortgage ownership, ethnicity, and personal interest information about 123 million households was exposed in a publicly accessible S3 bucket.

2019: The Capital One Breach

This is the one that changed everything.

On July 19, 2019, Capital One discovered that someone had accessed customer information stored in their AWS environment. The breach affected approximately 100 million individuals in the United States and about 6 million in Canada.

The exposed data included names, addresses, dates of birth, credit scores, transaction data, more than 100,000 Social Security numbers, and approximately 1 million Canadian Social Insurance Numbers. The breach represented one of the largest data breaches ever affecting the financial services industry.

How It Happened

The attacker, a former AWS engineer named Paige Thompson, exploited a server-side request forgery (SSRF) vulnerability. The attack path involved:

A misconfigured Web Application Firewall (WAF) that could be tricked into relaying requests to internal resources
Access to the EC2 instance metadata service (IMDSv1), which returned temporary credentials for the server’s IAM role
An over-provisioned IAM role that granted access to S3 storage buckets
The ability to list more than 700 S3 buckets and copy roughly 30 GB of customer data
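Step three is the one most teams can control directly. As a minimal sketch, the check below flags Allow statements in an IAM policy document that grant every S3 action (or every action at all) on every resource. The sample policy and bucket name are invented for illustration and are not Capital One’s actual role; a real review would also need to account for managed policies, permission boundaries, NotAction, and Condition blocks, which this ignores.

```python
# Action and Resource values that amount to "everything in S3".
BROAD_ACTIONS = {"*", "s3:*"}
BROAD_RESOURCES = {"*", "arn:aws:s3:::*"}


def _as_list(value):
    # IAM accepts either a single string or a list for these elements.
    return value if isinstance(value, list) else [value]


def broad_s3_grants(policy: dict) -> list:
    """Return Allow statements that grant all S3 actions on all buckets."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive, so compare in lowercase.
        actions = {a.lower() for a in _as_list(stmt.get("Action", []))}
        resources = set(_as_list(stmt.get("Resource", [])))
        if actions & BROAD_ACTIONS and resources & BROAD_RESOURCES:
            flagged.append(stmt)
    return flagged


role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::app-config-bucket/*"},  # hypothetical bucket
    ],
}
print(len(broad_s3_grants(role_policy)))  # 1
```

Scoping Action to the few calls an application actually makes, and Resource to specific bucket ARNs, shrinks what stolen role credentials can reach.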

The breach actually occurred between March 22–23, 2019, but wasn’t discovered until July — a four-month gap where the attacker had access to the data.

The Consequences

The financial impact was staggering:

$80 million fine from the Office of the Comptroller of the Currency (OCC)
$190 million settlement for customer lawsuits
Nearly $300 million in total losses including litigation, settlement fees, and remediation

Paige Thompson was convicted on seven federal charges related to the data theft.

2020–2021: The Breaches Continue

Prestige Software — 10 Million Hotel Booking Records (2020)

In November 2020, security researchers discovered that Prestige Software had exposed over 10 million records related to its Cloud Hospitality platform. This affected users of major travel websites including Booking.com, Expedia, and Hotels.com. The exposed data included customer names and credit card numbers — all because of a misconfigured S3 bucket.

Twitch — 125 GB Data Leak (2021)

In October 2021, Amazon-owned streaming platform Twitch suffered a massive data breach. An anonymous actor leaked a 125 GB torrent file containing the entirety of Twitch’s source code (6,000 internal Git repositories), creator payout reports from 2019, proprietary SDKs and internal AWS services, internal security tools, and an unreleased Steam competitor codenamed “Vapor.”

Security researchers found nearly 6,600 secrets inside the Twitch Git repositories, including 194 AWS keys, 69 Twilio keys, 68 Google API keys, hundreds of database connection strings, and 14 GitHub OAuth keys.

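The cause? Twitch attributed the leak to an error in a server configuration change that a malicious third party then exploited. One theory circulated by researchers, based on the leaked material, suggests a compromised S3 bucket was involved, though this has not been confirmed.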

2022: Aviation and Education Under Fire

Pegasus Airlines — 6.5 TB of Flight Data (February 2022)

In February 2022, SafetyDetectives researchers discovered an unprotected AWS S3 bucket belonging to Pegasus Airlines containing 6.5 terabytes of “Electronic Flight Bag” information. The exposed data included navigation information, proprietary software, and personal information pertaining to flight crew members. Once notified, Pegasus Airlines promptly secured the bucket — but the exposure demonstrated how even aviation companies weren’t immune to basic misconfigurations.

Securitas Airport Data — 3 TB Affecting Multiple Airports (2022)

A misconfigured Amazon S3 bucket left 3 TB of airport security data (more than 1.5 million files) publicly accessible without any authentication. The exposure affected at least four airports in Colombia and Peru. The leaked data included photos of airline employees, national ID cards, information about planes, fuel lines, and GPS map coordinates. Security researchers noted that this information “could present a serious threat if leveraged by terrorist groups or criminal organizations.”

McGraw Hill — 22 TB of Student Data (2022)

Educational publishing giant McGraw Hill exposed more than 100,000 students’ information through misconfigured S3 buckets. The buckets contained over 22 TB of data and 117 million files, including Excel sheets with student names, email addresses, and grades. The exposed data was linked to prestigious institutions including Johns Hopkins University, University of Michigan, UCLA, and Canada’s McGill University. The most alarming part? The misconfigured buckets could have been accessed by anyone with a web browser as far back as 2015 — potentially a seven-year exposure window.

2023: Global Organizations Fall Victim

Capita — UK Government Contractor Data Leak

Capita, a major UK government contractor, had sensitive data from councils, residents, and other sources leaked due to a misconfigured S3 bucket. The breach highlighted how third-party vendors remain a significant weak point in the supply chain.

MPD FM — UK Government Employee Data

MPD FM, a facility management and security company serving UK government departments, exposed passports, visas, national IDs, and employee data through a misconfigured S3 bucket.

Tata Motors — 70+ TB of Fleet Data

A breach at Tata Motors revealed multiple critical security oversights that allowed unauthorized access to customer databases, financial records, fleet tracking systems, and administrative dashboards. One exposed bucket contained over 70 terabytes of fleet data spanning back to 1996.

2024–2025: The Problem Persists

ESHYFT Healthcare Data Exposure (2025)

In early 2025, security researcher Jeremiah Fowler discovered a non-password-protected AWS S3 bucket belonging to ESHYFT, a healthcare staffing platform. The exposed database contained 86,341 records totaling 108.8 gigabytes of sensitive data belonging to US nurses, including profile images, professional licenses, certifications (CPR cards, BLS certifications), tax documents (W2 forms), partial Social Security numbers, and work schedule information. The bucket was secured on March 5, 2025, demonstrating that healthcare-adjacent organizations continue to struggle with basic cloud security.

Indian Bank Transfer Records — 273,000 Documents (2025)

In August 2025, UpGuard discovered a public Amazon S3 storage bucket containing over 273,000 PDF documents detailing bank transfers in India. Each file documented a single transaction, revealing unredacted bank account numbers, transaction amounts, names, phone numbers, and email addresses. The breach affected customers of at least 38 banks and financial institutions, including State Bank of India and Punjab National Bank. The bucket was indexed by GrayhatWarfare, a searchable database of publicly visible cloud storage, before being secured on September 4.

The Codefinger Ransomware Campaign (January 2025)

Perhaps most alarming is a new attack vector that emerged in early 2025. Security firm Halcyon identified a threat actor group called “Codefinger” conducting ransomware attacks that leverage AWS’s own encryption infrastructure against customers.

The attack works like this: with compromised AWS credentials, attackers rewrite objects in S3 using Server-Side Encryption with Customer-Provided Keys (SSE-C), supplying a key that only they hold. Because AWS does not store SSE-C keys, neither the victim nor AWS can decrypt the overwritten objects without the attacker’s key. The attackers then demand ransom payments for the decryption keys.

AWS’s Customer Incident Response Team confirmed they “detected a pattern where a large number of S3 CopyObject operations using SSE-C began to overwrite objects” and implemented automatic mitigations. But this attack represents a fundamental shift: instead of just exposing data, attackers can now hold it hostage using AWS’s own security features.
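One mitigation discussed in AWS’s guidance after these reports is a bucket policy that denies any write carrying the SSE-C algorithm header. The sketch below builds such a policy as JSON. The Sid and bucket name are placeholders, and because it blocks all SSE-C uploads, it only suits buckets whose applications do not rely on SSE-C. CopyObject is authorized as s3:PutObject on the destination, so the same statement also covers the overwrite-by-copy pattern AWS described.

```python
import json


def deny_sse_c_policy(bucket_name: str) -> str:
    """Build a bucket policy that rejects uploads and copies that specify SSE-C."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySSECUploads",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",  # also governs the CopyObject destination
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Null == "false" means: deny when this SSE-C header IS present.
                "Condition": {
                    "Null": {
                        "s3:x-amz-server-side-encryption-customer-algorithm": "false"
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_sse_c_policy("example-bucket"))  # hypothetical bucket name
```

The resulting JSON could then be applied with `aws s3api put-bucket-policy --bucket example-bucket --policy file://policy.json`, keeping in mind that put-bucket-policy replaces any existing bucket policy rather than merging with it.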

A 2024 Palo Alto Networks study found over 90,000 leaked .env files containing 1,185 AWS access keys — each one a potential entry point for this type of attack.
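Leaked keys like these are often easy to spot mechanically. AWS access key IDs for IAM users start with AKIA (temporary STS credentials use ASIA), followed by 16 uppercase letters or digits, so even a simple regular expression catches many of them in .env files before they are committed or deployed. This is a rough sketch: dedicated secret scanners also look for the secret access key, check entropy, and validate candidates, none of which this does. The sample values are AWS’s published documentation example keys.

```python
import re

# IAM user keys start with AKIA, temporary (STS) keys with ASIA; 20 characters total.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list:
    """Return AWS access key IDs that appear in the given text."""
    return ACCESS_KEY_ID.findall(text)


env_file = """\
APP_ENV=production
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""
print(find_access_key_ids(env_file))  # ['AKIAIOSFODNN7EXAMPLE']
```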

The Scale of the Problem Today

According to research by Lightspin (now part of Cisco), approximately 46% of S3 buckets are potentially misconfigured, with many still publicly accessible. The Fortinet 2025 Global Threat Landscape Report notes that “cloud environments remain a top target, with adversaries exploiting persistent weaknesses, such as open storage buckets, over-permissioned identities, and misconfigured services.”

We’re not talking about a problem that was solved years ago. This is an ongoing crisis.

Understanding the Root Cause: OWASP Top 10

If you’re familiar with the OWASP Top 10, these breaches won’t surprise you. Security Misconfiguration ranked #5 in the OWASP Top 10 2021, up from #6 in the 2017 edition.

According to OWASP, 90% of applications were tested for some form of misconfiguration, and the data set recorded over 208,000 occurrences of Common Weakness Enumeration (CWE) issues in this risk category.

OWASP specifically calls out cloud storage permissions: “Review cloud storage permissions (e.g., S3 bucket permissions)” as a key prevention measure.

The pattern across all these breaches is remarkably consistent:

Default configurations left unchanged: S3 buckets created under older defaults that could allow unintended access (since April 2023, new buckets have Block Public Access enabled and ACLs disabled by default)
Unnecessary features enabled: Permissions granted that weren’t actually needed
Missing security hardening: Basic protections like encryption and logging not configured
Improper access controls: Overly permissive policies that expose data to the internet
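Several of these patterns translate naturally into automated checks. The sketch below evaluates a bucket snapshot against the four Block Public Access settings (their names match the S3 API) plus default encryption and server access logging. The BucketSnapshot structure and the bucket name are hypothetical; in practice each field would be filled from separate S3 API calls, and a real audit would cover far more.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four Block Public Access settings, as named in the S3 API.
BPA_FLAGS = ("BlockPublicAcls", "IgnorePublicAcls",
             "BlockPublicPolicy", "RestrictPublicBuckets")


@dataclass
class BucketSnapshot:
    """Hypothetical summary of one bucket's settings, collected elsewhere."""
    name: str
    public_access_block: dict = field(default_factory=dict)
    default_encryption: Optional[str] = None  # e.g. "AES256" or "aws:kms"
    access_logging: bool = False


def audit(bucket: BucketSnapshot) -> list:
    """Return findings for one bucket; an empty list means nothing was flagged."""
    # Defaults and access controls: every public access guardrail should be on.
    findings = [
        f"{bucket.name}: Block Public Access setting {flag} is off"
        for flag in BPA_FLAGS
        if not bucket.public_access_block.get(flag, False)
    ]
    # Missing hardening: encryption and logging.
    if bucket.default_encryption is None:
        findings.append(f"{bucket.name}: no default encryption configured")
    if not bucket.access_logging:
        findings.append(f"{bucket.name}: server access logging is disabled")
    return findings


legacy = BucketSnapshot(
    name="legacy-test-bucket",  # hypothetical bucket
    public_access_block={"BlockPublicAcls": True, "IgnorePublicAcls": True},
    default_encryption="AES256",
)
for finding in audit(legacy):
    print(finding)
```

Since January 2023, S3 has applied SSE-S3 as the base level of encryption for every bucket, so the encryption check mainly catches older snapshots; objects uploaded before that change may still be unencrypted.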

Why Does This Keep Happening?

After years of working with organizations on their AWS security, I’ve identified three main reasons:

1. Speed Over Security

In the rush to deploy applications, security is often an afterthought. “We’ll lock it down later” becomes “We forgot to lock it down at all.” S3 makes it so easy to get things working that teams move on before implementing proper security controls.

2. Complexity of Cloud Security

AWS provides powerful security features, but understanding and correctly implementing them requires expertise. The shared responsibility model means AWS secures the infrastructure, but you’re responsible for securing your configurations. Many teams don’t fully understand where that line is.

3. Lack of Visibility

You can’t secure what you can’t see. Organizations often don’t have clear visibility into all their S3 buckets, their configurations, and who has access to them. Shadow IT and forgotten test buckets create blind spots.

The Stakes Have Never Been Higher

GDPR fines for the most serious infringements, which can include failing to secure personal data, reach up to 4% of annual global turnover. HIPAA civil penalties can exceed $50,000 per violation. PCI-DSS violations can result in loss of card processing privileges. Beyond regulatory fines, there’s the reputational damage, customer lawsuits, and the human cost of having your personal information exposed.
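The GDPR ceiling is worth spelling out because it scales with company size: under Article 83(5), fines for the most serious infringements can reach €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. A quick calculation, using made-up turnover figures:

```python
def gdpr_max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper bound for the most serious GDPR infringements (Article 83(5)):
    EUR 20 million or 4% of worldwide annual turnover, whichever is higher."""
    return max(20_000_000.0, 0.04 * annual_turnover_eur)


for turnover in (100_000_000, 2_000_000_000):  # illustrative turnovers only
    print(f"Turnover EUR {turnover:,}: max fine EUR {gdpr_max_fine_eur(turnover):,.0f}")
```

For a smaller company the €20 million floor dominates; for a company with €2 billion in turnover, the 4% rule raises the ceiling to €80 million.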

What’s Next

In the next article, I’ll dive deep into the specific security checks every S3 bucket needs and the compliance frameworks (CIS, AWS FSBP, PCI-DSS, HIPAA, SOC 2, ISO 27001/27017/27018, and GDPR) that drive these requirements. We’ll look at what each check protects against and why it matters.

In Part 3, I’ll introduce a tool I built to automate these security assessments — born out of my own frustration with manually checking bucket configurations across multiple accounts.

And in Part 4, we’ll cover step-by-step remediation for every security issue, with AWS Console, CLI, and Python examples you can use immediately.
