03:14 AM. The PagerDuty alert hit my phone like a physical blow. "Critical: Data Integrity Failure – Account B ETL Job Failing."
Six months ago, our cross-account data sharing strategy was a nightmare of IAM resource-based policies. We had S3 bucket policies spanning three accounts, thousands of lines of JSON, and a collective prayer that no developer accidentally added s3:GetObject to a Principal: * block. It was a ticking time bomb of "Access Denied" errors and "Wait, why can the marketing team see HIPAA-regulated PII?" queries.
Today, those S3 policies are gone. We use AWS Lake Formation governed tables. The difference isn't just cosmetic; it’s the difference between auditing a 4,000-line JSON file and reading a single, declarative grant statement.
What we saw
The incident started when an ETL job in our Analytics account (Account B) stopped pulling data from our Production account (Account A). The error wasn't a clean "Access Denied." It was an AccessDeniedException coming from the AWS Glue Data Catalog, even though the IAM role had explicit glue:GetTable and s3:GetObject permissions.
Our first instinct—the "false lead"—was to assume the S3 bucket policy was malformed. We spent 45 minutes diffing the JSON in bucket-prod-data against our internal "Gold Standard" template. We checked the Principal ARN. We checked the Condition: StringEquals: aws:PrincipalOrgID. Everything looked perfect.
The symptoms were deceptive. The job could list the partitions, but as soon as the Spark executor tried to read the underlying Parquet files, the job would hang for 30 seconds and then crash. We were looking at a classic "permission mismatch" between the Data Catalog layer and the Data Storage layer.
Root cause
The root cause was the "confused deputy" problem masked by overly permissive IAM policies. We had been relying on s3:GetObject and a Glue resource policy that was essentially a catch-all.
When we migrated to Lake Formation, we didn't fully sever the tie to the underlying S3 policy. We had a hybrid mess where Lake Formation was managing the metadata, but the S3 bucket policy was still trying to enforce access. Because we hadn't set up the LF-Tag based policies correctly, the Glue Service Role was struggling to resolve the identity of the cross-account requester.
Specifically, we were violating the "Lake Formation-managed S3 locations" requirement. If you register a path in Lake Formation (e.g., s3://data-prod-finance/), you must ensure that the IAM role used by the remote account has the lakeformation:GetDataAccess permission. We hadn't included this. We were trying to authenticate using standard IAM, while the bucket was configured to only accept requests through the Lake Formation engine.
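As a rough illustration of the missing permission, the sketch below builds an identity policy for the consuming role and checks whether a policy document allows lakeformation:GetDataAccess. The helper names and the accompanying Glue actions are assumptions for illustration, not the exact policy we deployed, and the checker is deliberately naive: it ignores Deny statements, NotAction, and conditions.

```python
from fnmatch import fnmatchcase


def build_consumer_policy():
    """Identity policy for the Account B role (hypothetical minimal action set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "lakeformation:GetDataAccess",
                    "glue:GetTable",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            }
        ],
    }


def allows_action(policy, action):
    """Return True if any Allow statement lists the action.

    Supports * wildcards and case-insensitive action names.
    Does not evaluate Deny, NotAction, Resource, or Condition.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(fnmatchcase(action.lower(), p.lower()) for p in actions):
            return True
    return False
```

Running a check like this against the role's policies in CI would have flagged the gap before the job ever ran, since the old policy only listed glue:GetTable and s3:GetObject.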
The offending configuration was this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::ACCOUNT_B:role/AnalyticsRole"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::data-prod-finance/*"
    }
  ]
}
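This looks fine to a junior engineer. But once the S3 location is registered with Lake Formation and the table's permissions are managed there, integrated engines such as Glue stop reading the data with the caller's own S3 permissions. They ask Lake Formation for temporary credentials through lakeformation:GetDataAccess, so for those reads this statement is effectively sidelined in favor of the Lake Formation grant. The failure happened because the role had glue:GetTable but not lakeformation:GetDataAccess: the catalog call and the data-access call were out of sync.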
The fix
We nuked the bucket policies. Seriously. We stripped the cross-account S3 bucket policies down to only allow the Lake Formation service role to access the data.
Then, we implemented a proper Lake Formation Cross-Account Grant. Instead of sharing the S3 bucket, we shared the Data Catalog table.
- We registered the S3 location in Account A under Lake Formation, passing the bucket as an ARN rather than an s3:// URI:
  aws lakeformation register-resource --resource-arn arn:aws:s3:::data-prod-finance
- We granted the SELECT permission on the specific Glue table to the external Account B ID:
  aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=ACCOUNT_B --permissions SELECT --resource '{ "Table": { "DatabaseName": "finance", "Name": "transactions" } }'
- In Account B, we created a "Resource Link" pointing to the shared table in Account A.
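The three steps above can also be sketched with boto3. This is a minimal sketch, not our production tooling: the function names are hypothetical, using the service-linked role for registration is an assumption, and the account IDs, database names, and link name are parameters you would supply. The request builders are plain functions so they can be reviewed and tested without touching AWS.

```python
def register_request(bucket):
    """Arguments for lakeformation.register_resource (ARN form, not s3://)."""
    return {
        "ResourceArn": f"arn:aws:s3:::{bucket}",
        "UseServiceLinkedRole": True,  # assumption: no custom registration role
    }


def grant_request(consumer_account_id, database, table):
    """Arguments for lakeformation.grant_permissions: SELECT on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def resource_link_request(producer_account_id, database, table,
                          local_database, link_name):
    """Arguments for glue.create_table that create a resource link."""
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": producer_account_id,
                "DatabaseName": database,
                "Name": table,
            },
        },
    }


def apply_in_producer(bucket, consumer_account_id, database, table):
    """Run with Account A credentials: register the location, then grant."""
    import boto3  # imported lazily so the builders work without boto3 installed

    lf = boto3.client("lakeformation")
    lf.register_resource(**register_request(bucket))
    lf.grant_permissions(**grant_request(consumer_account_id, database, table))


def apply_in_consumer(producer_account_id, database, table,
                      local_database, link_name):
    """Run with Account B credentials: create the resource link."""
    import boto3

    glue = boto3.client("glue")
    glue.create_table(**resource_link_request(
        producer_account_id, database, table, local_database, link_name))
```

Keeping the two apply functions separate mirrors the fact that the grant and the resource link are created under different accounts' credentials.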
By moving the permission logic from the S3 bucket policy into the Lake Formation GRANT command, we moved from "who can touch these files?" to "who can run queries on this table?" The auditors loved it because the response from the ListPermissions API (aws lakeformation list-permissions) is structured, human-readable, and doesn't require a degree in IAM JSON parsing to verify.
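To give a sense of what an auditor reads, here is a small sketch that flattens a permissions listing into one line per grant. It assumes the response shape returned by boto3's lakeformation list_permissions call (a PrincipalResourcePermissions list); the function name is hypothetical, and it only formats table grants in detail.

```python
def summarize_grants(response):
    """Turn a list_permissions-style response into readable grant lines."""
    lines = []
    for entry in response.get("PrincipalResourcePermissions", []):
        principal = entry["Principal"]["DataLakePrincipalIdentifier"]
        resource = entry["Resource"]
        if "Table" in resource:
            table = resource["Table"]
            # Grants on every table in a database carry no Name; show "*".
            target = f'{table["DatabaseName"]}.{table.get("Name", "*")}'
        else:
            # Fall back to the resource type (Database, LFTagPolicy, ...).
            target = ", ".join(sorted(resource))
        permissions = ", ".join(entry.get("Permissions", []))
        lines.append(f"{principal} -> {permissions} on {target}")
    return lines
```

In practice the input would come from something like boto3.client("lakeformation").list_permissions(...), paginated with NextToken for larger catalogs.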
What we changed so it never happens again
We stopped treating security as a post-deployment checklist.
First, we implemented a strict "No Manual IAM" policy for data access. If a team needs cross-account access, they submit a PR to our Terraform repository that defines an aws_lakeformation_permissions resource. If it's not in the HCL, it doesn't exist. This prevents the "drift" that caused our 3:00 AM incident.
Second, we moved to an LF-Tag based access control (LF-TBAC) model. Instead of granting access to specific tables, we tag tables with Classification: PII or Environment: Prod. The cross-account role gets access to everything tagged Environment: Prod. When a new table is added, it’s automatically shared if it carries the right tag. This eliminated the manual work of adding new tables to the grant list, which was a constant source of "Access Denied" tickets.
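A tag-based grant can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the grant is built for boto3's lakeformation grant_permissions call using an LFTagPolicy resource, and the matcher is a local model of how an expression selects tables (every key must match, and any one of the listed values for a key is enough), not a call into Lake Formation itself.

```python
def tag_grant_request(principal, expression):
    """Arguments for grant_permissions: SELECT on all tables matching the tags.

    expression maps a tag key to the set of accepted values.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [
                    {"TagKey": key, "TagValues": sorted(values)}
                    for key, values in sorted(expression.items())
                ],
            }
        },
        "Permissions": ["SELECT"],
    }


def table_matches(table_tags, expression):
    """True when the table's tag for every key is one of the accepted values."""
    return all(
        table_tags.get(key) in values for key, values in expression.items()
    )
```

With an expression of {"Environment": {"Prod"}}, a newly created table tagged Environment: Prod matches without anyone editing the grant, which is the behavior described above.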
Finally, we use AWS CloudTrail to monitor GetDataAccess events. If an account attempts to query a table it doesn't have permission for, it fires an alarm in our SOC. Before, we were flying blind; now, we see the attempt in real time.
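The filtering logic behind that alarm can be sketched in a few lines. This assumes CloudTrail records already parsed into dictionaries with the standard eventSource, eventName, errorCode, and userIdentity fields; the function name and the shape of the alert dictionaries are hypothetical, and how the alert reaches the SOC is left out.

```python
def denied_data_access(records):
    """Pick out failed Lake Formation GetDataAccess calls from CloudTrail records."""
    alerts = []
    for record in records:
        is_data_access = (
            record.get("eventSource") == "lakeformation.amazonaws.com"
            and record.get("eventName") == "GetDataAccess"
        )
        # CloudTrail only sets errorCode when the call failed.
        if is_data_access and record.get("errorCode"):
            alerts.append({
                "principal": record.get("userIdentity", {}).get("arn", "unknown"),
                "error": record["errorCode"],
                "time": record.get("eventTime"),
            })
    return alerts
```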
The real lesson? Don't fight the platform. If you’re still trying to manage cross-account data sharing using S3 bucket policies and IAM roles, you’re just building technical debt that will eventually wake you up at 3:00 AM. Stop it. Move the governance to the layer that was built to handle it. Your auditors—and your sleep schedule—will thank you.