Bala Paranj

Posted on Apr 24

8.7 billion records leaked from one misconfigured cluster. Eight findings would have prevented it.

#aws #security #elasticsearch #cloud

A publicly exposed Elasticsearch cluster leaked billions of records including national IDs and plaintext passwords. No exploit. No zero-day. Just a database on the internet without authentication. Here's what prevention looks like for this class of incident.

No exploit required

In 2026, researchers discovered a massive Elasticsearch cluster exposed to the public internet. Billions of records — national IDs, addresses, phone numbers, plaintext passwords — accessible to anyone who knew the endpoint. No authentication. No VPC restriction. No encryption. The cluster sat open for weeks.

The breach wasn't sophisticated. Nobody exploited a vulnerability. Nobody bypassed security controls. There were no security controls to bypass. The cluster was deployed with default settings that allowed unauthenticated access from any IP address on the internet.

This is the most common class of data breach in cloud infrastructure: a managed data store left public because the access controls that should have been configured weren't. It is the same pattern that produces public S3 buckets, exposed RDS databases, and open Redis instances. Different service, identical root cause.

1. What went wrong

An Elasticsearch cluster (or AWS OpenSearch domain) has six security layers. Every layer was either disabled or left at its permissive default:

Public endpoint. The cluster was accessible from the internet. OpenSearch domains can be deployed inside a VPC (private, unreachable externally) or with a public endpoint. This one had a public endpoint.

No authentication. OpenSearch supports fine-grained access control — IAM-based authentication, SAML federation, internal user databases. None were enabled. Any HTTP request to the endpoint returned data.

No access policy restriction. OpenSearch resource-based policies can restrict which principals or IP ranges can access the domain. The policy either allowed Principal: "*" or had no restrictions.

No encryption in transit. HTTPS enforcement was not configured. Requests and responses, including the leaked records, traveled in plaintext.

No encryption at rest. Data on disk was unencrypted. Anyone with storage-level access could read the raw data.

No audit logging. OpenSearch audit logs track who accessed what. Without them, there's no forensic trail of how many parties accessed the exposed data or what they downloaded.

Six layers of security. All six absent. The data store was as exposed as a data store can be.
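The six layers above can be checked mechanically. The following is a minimal sketch, not the implementation of any particular scanner. It assumes the domain configuration is available as a dict shaped like the `DomainStatus` object returned by the AWS OpenSearch `DescribeDomain` API, and it treats fine-grained access control (`AdvancedSecurityOptions`) as the authentication layer. The function names `policy_is_open` and `missing_layers` are made up for illustration.

```python
import json


def policy_is_open(access_policies: str) -> bool:
    """True if an Allow statement names a wildcard principal with no Condition."""
    if not access_policies:
        return False  # simplification: an empty policy is not counted as open
    statements = json.loads(access_policies).get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal")
        wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and wildcard and "Condition" not in stmt:
            return True
    return False


def missing_layers(domain: dict) -> list[str]:
    """Return the names of the six layers that are absent from the domain."""
    checks = {
        "private endpoint (VPC)": bool(domain.get("VPCOptions", {}).get("VPCId")),
        "authentication": bool(
            domain.get("AdvancedSecurityOptions", {}).get("Enabled")
        ),
        "restrictive access policy": not policy_is_open(
            domain.get("AccessPolicies", "")
        ),
        "encryption in transit": bool(
            domain.get("DomainEndpointOptions", {}).get("EnforceHTTPS")
        ),
        "encryption at rest": bool(
            domain.get("EncryptionAtRestOptions", {}).get("Enabled")
        ),
        "audit logging": bool(
            domain.get("LogPublishingOptions", {}).get("AUDIT_LOGS", {}).get("Enabled")
        ),
    }
    return [layer for layer, present in checks.items() if not present]


# Illustrative inputs: one domain with every layer absent, one fully hardened.
exposed = {
    "AccessPolicies": json.dumps(
        {"Statement": [{"Effect": "Allow", "Principal": "*", "Action": "es:*"}]}
    ),
}
hardened = {
    "VPCOptions": {"VPCId": "vpc-0example"},
    "AdvancedSecurityOptions": {"Enabled": True},
    "AccessPolicies": json.dumps(
        {"Statement": [{"Effect": "Allow",
                        "Principal": {"AWS": "arn:aws:iam::111122223333:role/search"},
                        "Action": "es:ESHttpGet"}]}
    ),
    "DomainEndpointOptions": {"EnforceHTTPS": True},
    "EncryptionAtRestOptions": {"Enabled": True},
    "LogPublishingOptions": {"AUDIT_LOGS": {"Enabled": True}},
}

print(missing_layers(exposed))
print(missing_layers(hardened))
```

The policy check is deliberately narrow: it only flags a wildcard principal with no condition attached, so a policy that pairs a wildcard with an IP condition is not reported here.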

2. Why this keeps happening

Elasticsearch and OpenSearch are developer tools. Teams deploy them for search, analytics, and log aggregation. The deployment path optimizes for "get it working". Getting it working means the endpoint is reachable and data flows in.

Security configuration is a separate step that happens after deployment or doesn't happen at all. The defaults don't help:

Public endpoint is often the path of least resistance during development
Authentication adds complexity to the client integration
VPC deployment requires networking setup that developers may not have permissions for
Encryption has a perceived performance cost (negligible in practice, significant in perception)

Every unsecured Elasticsearch cluster follows the same lifecycle: deployed for development, promoted to production, never hardened. The security review that should happen between development and production either didn't happen, or happened and didn't check this cluster.

This isn't unique to Elasticsearch. The same lifecycle applies to every managed data store: RDS, DynamoDB, Redis, Redshift. The service provides security controls. The deployment doesn't enable them. The data sits exposed until someone — a researcher, a journalist, or an attacker — finds it.

3. Eight findings, three critical

Running a security evaluation against this cluster's configuration produces eight findings that together paint the complete picture of exposure:

Finding 1: CTL.OPENSEARCH.PUBLIC.001 [CRITICAL]

  DEFECT:
    The OpenSearch domain has a public endpoint
    accessible from the internet without VPC
    restriction.

  INFECTION:
    Anyone on the internet can reach the domain's
    API endpoint. Automated scanners continuously
    enumerate public OpenSearch domains by probing
    known endpoint patterns. Exposure is not
    hypothetical — it's actively discovered within
    hours of deployment.

  FAILURE:
    Direct data access from the internet. Every
    document in every index is reachable without
    network-level barriers.

  DELTA:
    Change: domain public access
    Current: true
    Fix: set to false (disable), deploy in VPC

Finding 2: CTL.OPENSEARCH.AUTH.001 [CRITICAL]

  DEFECT:
    The OpenSearch domain has no authentication
    enabled. Requests are accepted without
    credentials.

  INFECTION:
    Any HTTP request to the domain endpoint returns
    data. No IAM signature, no username/password,
    no SAML token required. Combined with a public
    endpoint, this means anyone on the internet can
    query, modify, or delete data.

  FAILURE:
    Unauthenticated data access. The entire contents
    of the cluster are readable by any party that
    discovers the endpoint.

  DELTA:
    Change: authentication enabled
    Current: false
    Fix: set to true (enable)

Finding 3: CTL.OPENSEARCH.VPC.001 [CRITICAL]

  DEFECT:
    The OpenSearch domain is not deployed in a VPC.
    Network access is controlled only by the domain's
    access policy, not by VPC security groups or
    network ACLs.

  DELTA:
    Change: VPC deployment
    Current: false
    Fix: set to true (enable), deploy in VPC

Plus five high-severity findings: missing fine-grained access control, permissive access policy, no HTTPS enforcement, no encryption at rest, and no node-to-node encryption. Missing audit logging is flagged separately at medium severity.

Three critical findings. Five high. Each with a specific DEFECT describing what's wrong, an INFECTION explaining how it enables attack, a FAILURE describing worst-case outcome, and a DELTA providing the exact configuration change that eliminates the finding.
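The four-part structure maps naturally onto a small record type. The sketch below is an assumption about how such a finding could be modeled, not the tool's actual data model; the class names `Finding` and `Delta` and the `render` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    change: str   # the setting to change
    current: str  # its value today
    fix: str      # the value that clears the finding


@dataclass(frozen=True)
class Finding:
    control_id: str
    severity: str
    defect: str     # what is wrong
    infection: str  # how it enables attack
    failure: str    # worst-case outcome
    delta: Delta    # the configuration change that eliminates the finding

    def render(self) -> str:
        """Format the finding with its four sections, one per line."""
        return "\n".join([
            f"{self.control_id} [{self.severity}]",
            f"  DEFECT: {self.defect}",
            f"  INFECTION: {self.infection}",
            f"  FAILURE: {self.failure}",
            f"  DELTA: change {self.delta.change!r} from "
            f"{self.delta.current} to {self.delta.fix}",
        ])


auth = Finding(
    control_id="CTL.OPENSEARCH.AUTH.001",
    severity="CRITICAL",
    defect="No authentication enabled; requests are accepted without credentials.",
    infection="Any HTTP request to the domain endpoint returns data.",
    failure="The entire cluster is readable by anyone who discovers the endpoint.",
    delta=Delta(change="authentication enabled", current="false", fix="true"),
)
print(auth.render())
```

Keeping the delta as its own structured field, rather than free text, is what lets a tool re-evaluate the control against the proposed value.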

The result is eight findings, each with triage context and a counterfactual fix, that you can act on directly.

4. Why individual findings aren't enough

A flat scanner produces the same eight findings as a list. The operator sees eight items and starts working through them. Which one first?

The critical findings are obvious: public endpoint, no auth, no VPC. But the relationship between them matters for triage. Public endpoint without authentication is catastrophic. Public endpoint with authentication is concerning but not an immediate breach. No VPC without a restrictive access policy is exposed. No VPC with a policy limiting to specific IPs is a calculated risk.

The findings compound. The risk isn't additive — it's multiplicative. Each missing security layer removes a barrier that could have compensated for another missing layer. When all six layers are absent simultaneously, the exposure is total.

Compound chain detection models this. When the public-endpoint, no-auth, and no-VPC findings fire on the same domain, a chain definition composes them into one compound finding representing total exposure. The compound finding scores higher than any individual finding — its ExposureScore reflects the multiplicative risk of all barriers being absent, not just the additive sum.

The operator sees "this domain has total exposure" as one triage unit, not three separate findings they mentally correlate.
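A chain definition of this kind can be sketched in a few lines. Everything below is illustrative: the chain ID, the per-finding factors, and the product-based scoring are assumptions chosen to show the idea of a multiplicative score, not the actual ExposureScore formula.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Chain:
    name: str
    requires: frozenset[str]  # control IDs that must all fire on one domain


TOTAL_EXPOSURE = Chain(
    name="CHAIN.OPENSEARCH.TOTAL_EXPOSURE",  # hypothetical chain ID
    requires=frozenset({
        "CTL.OPENSEARCH.PUBLIC.001",
        "CTL.OPENSEARCH.AUTH.001",
        "CTL.OPENSEARCH.VPC.001",
    }),
)

# Hypothetical per-finding factors; each missing barrier multiplies the score.
FACTORS = {
    "CTL.OPENSEARCH.PUBLIC.001": 3.0,
    "CTL.OPENSEARCH.AUTH.001": 3.0,
    "CTL.OPENSEARCH.VPC.001": 2.0,
}


def detect(chain: Chain, fired_by_domain: dict[str, set[str]]) -> dict[str, float]:
    """Return a compound score for every domain where the whole chain fired."""
    compound = {}
    for domain, fired in fired_by_domain.items():
        if chain.requires <= fired:  # subset test: all required findings present
            compound[domain] = prod(FACTORS[c] for c in chain.requires)
    return compound


fired = {
    "prod-search": {
        "CTL.OPENSEARCH.PUBLIC.001",
        "CTL.OPENSEARCH.AUTH.001",
        "CTL.OPENSEARCH.VPC.001",
    },
    "staging-search": {"CTL.OPENSEARCH.PUBLIC.001"},  # chain incomplete
}
print(detect(TOTAL_EXPOSURE, fired))
```

With these example factors the compound score (18.0) exceeds both the largest single factor and their sum (8.0), which is the property the chain is meant to express. A domain with only part of the chain produces no compound finding and keeps its individual findings.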

5. The data store pattern

The Chinese leak involved Elasticsearch, but the misconfiguration pattern is universal across managed data stores. Every service has the same security layers:

Layer	S3	RDS	OpenSearch	Others
Public access control	Block Public Access	PubliclyAccessible	VPC / public endpoint	Same pattern
Authentication	IAM policies	IAM DB auth	Fine-grained access	Same pattern
Access policy	Bucket policy	Security groups	Resource policy	Same pattern
Encryption in transit	TLS enforcement	rds.force_ssl	HTTPS enforcement	Same pattern
Encryption at rest	SSE-S3/KMS	StorageEncrypted	EncryptionAtRest	Same pattern
Audit logging	Access logging	Audit logs	Audit logs	Same pattern

The controls that prevent the Chinese leak on Elasticsearch are structurally identical to the controls that prevent public S3 buckets and exposed RDS instances. Different service names, different property paths, same security model.

This is why the detection isn't per-incident. It's per-pattern. The publicly accessible data store without authentication pattern applies to every managed data store service. Detection that covers the pattern covers every incident in the class — past, present, and future.
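One way to express per-pattern detection is to normalize each service's properties into a shared posture and write the rule once. The sketch below is a simplification under stated assumptions: it covers only two services, follows the table in treating IAM database authentication as the RDS authentication layer, and uses the field names from the AWS `DescribeDomain` and `DescribeDBInstances` responses. The names `Posture`, `ADAPTERS`, and `public_without_auth` are hypothetical.

```python
from typing import Callable, NamedTuple


class Posture(NamedTuple):
    public: bool
    authenticated: bool


def opensearch_posture(r: dict) -> Posture:
    # A domain without a VPC ID has a public endpoint.
    return Posture(
        public=not r.get("VPCOptions", {}).get("VPCId"),
        authenticated=bool(r.get("AdvancedSecurityOptions", {}).get("Enabled")),
    )


def rds_posture(r: dict) -> Posture:
    return Posture(
        public=bool(r.get("PubliclyAccessible")),
        authenticated=bool(r.get("IAMDatabaseAuthenticationEnabled")),
    )


# One adapter per service; the rule below never changes.
ADAPTERS: dict[str, Callable[[dict], Posture]] = {
    "opensearch": opensearch_posture,
    "rds": rds_posture,
}


def public_without_auth(service: str, resource: dict) -> bool:
    """The pattern itself: publicly reachable and not authenticated."""
    posture = ADAPTERS[service](resource)
    return posture.public and not posture.authenticated


print(public_without_auth("opensearch", {"AdvancedSecurityOptions": {"Enabled": False}}))
print(public_without_auth("rds", {"PubliclyAccessible": False}))
```

Adding a service means adding one adapter; the rule that defines the pattern stays untouched.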

6. What the operator does

The eight findings arrive with everything the operator needs:

DEFECT tells them where to look. "The domain has a public endpoint" — check the domain's endpoint configuration.

INFECTION tells them whether to care. "Automated scanners discover public OpenSearch domains within hours" — yes, this is urgent.

FAILURE tells them the consequence. "Every document in every index is reachable" — this is a data breach, not a theoretical risk.

DELTA tells them what to change. "Set public access to false, deploy in VPC" — the specific configuration change. Not generic advice. A verified counterfactual: make this change and this finding disappears.
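"Verified counterfactual" can be read as a simple procedure: apply the proposed change to a copy of the configuration, re-run the control, and confirm the finding no longer fires. The sketch below illustrates that idea with a flat, hypothetical config dict; the keys `public_access` and `authentication_enabled` mirror the DELTA wording and are not real API fields.

```python
# Each control is a predicate that returns True when the finding fires.
CONTROLS = {
    "CTL.OPENSEARCH.PUBLIC.001": lambda c: c.get("public_access") is True,
    "CTL.OPENSEARCH.AUTH.001": lambda c: c.get("authentication_enabled") is False,
}


def evaluate(config: dict) -> set[str]:
    """Return the IDs of all controls that fire on this configuration."""
    return {cid for cid, fires in CONTROLS.items() if fires(config)}


def delta_is_verified(config: dict, control_id: str, key: str, value) -> bool:
    """True if the finding fires today and disappears once the change is applied."""
    patched = {**config, key: value}  # copy; the original is left untouched
    return control_id in evaluate(config) and control_id not in evaluate(patched)


current = {"public_access": True, "authentication_enabled": False}

print(delta_is_verified(current, "CTL.OPENSEARCH.PUBLIC.001", "public_access", False))
print(delta_is_verified(current, "CTL.OPENSEARCH.AUTH.001", "public_access", False))
```

The second call shows why the check matters: changing the wrong setting leaves the authentication finding in place, so that delta is not a valid fix for it.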

The operator doesn't research the incident. Doesn't manually inspect the OpenSearch console. The finding carries the complete triage chain from "what's wrong" to "what to change." Prevention happens at the terminal.

Compare this to the alternative: the operator reads about the Chinese leak in the news, wonders "could this happen to us?", manually audits their OpenSearch domains, discovers the same misconfigurations, figures out the fixes, and applies them. Days of work. Or: they run a scan, get eight findings with full context, and fix the misconfigurations in an hour. Same outcome. Different timeline. The breach happens in the gap between those timelines.

7. Prevention, not forensics

The Chinese leak was discovered by researchers. Not by the organization's security team. Not by their monitoring. Not by their logging. By external researchers who found the open endpoint and reported it.

By the time the organization learned about the exposure, the data had been accessible for weeks. Any attacker who found it — and automated scanners find open Elasticsearch clusters within hours — had the data. The organization's incident response couldn't undo the exposure. They could only close the endpoint and assess the damage.

This is the forensics problem. Logging, monitoring, and alerting tell you about a breach after it happens. They're necessary for incident response. They don't prevent the incident.

Prevention means finding the public endpoint, the missing authentication, and the absent encryption before an attacker does. Before a researcher discovers it. Before the data is accessed. The scan runs continuously on every evaluation cycle. The misconfiguration is flagged the moment it appears, with the specific fix in the DELTA section.

The Chinese leak didn't require an exploit. It didn't require sophistication. It required a misconfigured deployment that nobody checked. The eight findings that would have caught it take seconds to evaluate. The fixes take minutes to apply. The breach took weeks to discover and affected billions of records.

Deployed Without Security Controls

The Chinese leak is not unusual. It's not even interesting from a technical perspective. A database was left on the internet without authentication. That's the whole story. No zero-day, no advanced persistent threat, no nation-state actor. Just a configuration that should have been set.

Public Elasticsearch clusters are discovered daily. Public S3 buckets are discovered daily. Public RDS instances, open Redis caches, exposed MongoDB databases — daily. Each one is the same pattern: a managed data store with security controls available, deployed without them.

The controls exist. The services provide them. The deployments don't enable them. The gap between security controls available and security controls enabled is where billions of records leak.

Eight findings close that gap before the breach.

Detection for publicly exposed data stores — including the OpenSearch pattern from the Chinese leak incident — is implemented in Stave, an open-source security CLI. Thirteen OpenSearch controls detect every layer of the exposure. Compound chains surface the multiplicative risk when multiple layers are absent simultaneously.
