<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Soumyadeep Basu</title>
    <description>The latest articles on DEV Community by Soumyadeep Basu (@soumyadeep_basu_3101d3ac1).</description>
    <link>https://dev.to/soumyadeep_basu_3101d3ac1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870577%2F1faa3124-25b0-4484-b519-5d8789705689.jpg</url>
      <title>DEV Community: Soumyadeep Basu</title>
      <link>https://dev.to/soumyadeep_basu_3101d3ac1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/soumyadeep_basu_3101d3ac1"/>
    <language>en</language>
    <item>
      <title>AWS Lake Formation: Why Your Data Lake Permissions Are Probably a Mess (And How to Fix That)</title>
      <dc:creator>Soumyadeep Basu</dc:creator>
      <pubDate>Thu, 09 Apr 2026 21:34:51 +0000</pubDate>
      <link>https://dev.to/soumyadeep_basu_3101d3ac1/aws-lake-formation-why-your-data-lake-permissions-are-probably-a-mess-and-how-to-fix-that-1pl</link>
      <guid>https://dev.to/soumyadeep_basu_3101d3ac1/aws-lake-formation-why-your-data-lake-permissions-are-probably-a-mess-and-how-to-fix-that-1pl</guid>
      <description>&lt;p&gt;Week 1 of a series on AWS Lake Formation — from fundamentals to real-world implementation&lt;br&gt;
If you've been working with AWS data infrastructure for any length of time, you've probably set up an S3-based data lake. You created some buckets, wrote IAM policies, maybe added some bucket policies on top — and it worked.&lt;br&gt;
Until it didn't.&lt;br&gt;
Maybe your team grew. Maybe you added a second AWS account. Maybe someone asked "who exactly has access to this data?" and you spent two hours trying to piece together an answer from a maze of IAM policies.&lt;br&gt;
That's the moment AWS Lake Formation starts making sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Lake Formation Actually Is&lt;/strong&gt;&lt;br&gt;
Lake Formation is not a storage service. Your data still lives in S3. Lake Formation is a permissions and governance layer that sits on top of your data lake and gives you one centralized place to define, manage, and audit who can access what.&lt;br&gt;
Think of S3 + IAM as the raw infrastructure. Lake Formation is the control plane on top of it.&lt;br&gt;
With Lake Formation you can grant permissions at the:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Database level&lt;/strong&gt;&lt;br&gt;
 — this team can see this entire database&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Table level&lt;/strong&gt;&lt;br&gt;
 — this role can query this specific table&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Column level&lt;/strong&gt;&lt;br&gt;
 — this user can see everything except these sensitive columns&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Row level&lt;/strong&gt;&lt;br&gt;
 — this team can only see rows where region = 'US'&lt;br&gt;
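With pure IAM you can approximate the database and table levels by scoping policies to S3 prefixes, but that becomes unmaintainable fast. Column- and row-level restrictions cannot be expressed in IAM or bucket policies at all, because those policies only see whole objects; you would have to maintain separate filtered copies of the data.&lt;/p&gt;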
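
&lt;p&gt;To make the column and row levels above a little more concrete, here is a minimal sketch that builds the request payloads boto3's Lake Formation client takes for &lt;code&gt;grant_permissions&lt;/code&gt; and &lt;code&gt;create_data_cells_filter&lt;/code&gt;. The account ID, role ARN, database, table, column, and filter names are hypothetical placeholders, and the code only builds dictionaries; it does not call AWS.&lt;/p&gt;

```python
# Hypothetical identifiers, used only for illustration.
ACCOUNT_ID = "111122223333"
ANALYST_ROLE = "arn:aws:iam::111122223333:role/analyst"


def column_level_grant(principal_arn, database, table, excluded_columns):
    """Build kwargs for lakeformation.grant_permissions:
    SELECT on every column of the table except the excluded ones."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


def row_level_filter(account_id, database, table, filter_name, expression):
    """Build kwargs for lakeformation.create_data_cells_filter:
    all columns, but only the rows matching the filter expression."""
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            "RowFilter": {"FilterExpression": expression},
            "ColumnWildcard": {"ExcludedColumnNames": []},
        }
    }


grant = column_level_grant(ANALYST_ROLE, "sales", "orders", ["ssn"])
us_only = row_level_filter(ACCOUNT_ID, "sales", "orders", "us_only", "region = 'US'")
```

&lt;p&gt;With real credentials you would pass these as &lt;code&gt;client.grant_permissions(**grant)&lt;/code&gt; and &lt;code&gt;client.create_data_cells_filter(**us_only)&lt;/code&gt;, where &lt;code&gt;client = boto3.client("lakeformation")&lt;/code&gt;. Creating the filter grants nothing by itself; a principal still needs SELECT granted on the filter.&lt;/p&gt;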

&lt;p&gt;&lt;strong&gt;Why IAM Alone Breaks at Scale&lt;/strong&gt;&lt;br&gt;
Let me give you a concrete scenario.&lt;br&gt;
You have a data lake with 40 tables across 6 databases. You have 5 teams — data science, analytics, finance, marketing, and engineering. Each team needs different access to different tables, and some tables have columns that only certain roles should see.&lt;br&gt;
With pure IAM you're writing and maintaining dozens of policies, attaching them to roles, keeping bucket policies in sync, and hoping nothing drifts. When someone gets an access denied error at 9am on a Monday, good luck tracing exactly which policy is the problem.&lt;br&gt;
With Lake Formation, you open one console (or run one API call / Terraform resource) and say: the data science role has SELECT on these 12 tables, excluding this column. Done. Auditable. Revocable in seconds.&lt;br&gt;
The mental model shift is significant: you stop thinking about who can access S3 paths and start thinking about who can access data assets.&lt;/p&gt;
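
&lt;p&gt;As a rough sketch of the "one call" idea above, the helper below builds the &lt;code&gt;Entries&lt;/code&gt; list that boto3's &lt;code&gt;batch_grant_permissions&lt;/code&gt; accepts, with one SELECT grant per table for a single role. The role ARN, database name, and table names are made up for the example, and the code only builds the payload rather than calling AWS.&lt;/p&gt;

```python
# Hypothetical role and table names, used only for illustration.
DATA_SCIENCE_ROLE = "arn:aws:iam::111122223333:role/data-science"
TABLES = ["orders", "customers", "events"]


def select_grant_entries(principal_arn, database, tables):
    """Build the Entries list for lakeformation.batch_grant_permissions:
    one SELECT grant per table, all for the same principal."""
    return [
        {
            "Id": f"grant-{index}",  # caller-chosen ID to match results to entries
            "Principal": {"DataLakePrincipalIdentifier": principal_arn},
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "Permissions": ["SELECT"],
        }
        for index, table in enumerate(tables)
    ]


entries = select_grant_entries(DATA_SCIENCE_ROLE, "sales", TABLES)
```

&lt;p&gt;The same list can be handed to &lt;code&gt;batch_revoke_permissions(Entries=entries)&lt;/code&gt; to undo the grants, which is what makes the "revocable in seconds" part practical.&lt;/p&gt;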

&lt;p&gt;&lt;strong&gt;The Core Concepts You Need to Know&lt;/strong&gt;&lt;br&gt;
Before you touch anything in Lake Formation, get these four concepts clear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Catalog&lt;/strong&gt;&lt;br&gt;
 — the metadata layer. Lake Formation uses the AWS Glue Data Catalog to store database and table definitions. Those tables point at S3 locations, and once a location is registered with Lake Formation, it governs access to the data stored there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data Lake Administrator&lt;/strong&gt;&lt;br&gt;
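 — an IAM user or role that you designate in the Lake Formation settings and that has full control over Lake Formation, including granting permissions on any resource to any principal. You'll set this up first. Don't confuse it with an IAM admin: having the IAM AdministratorAccess policy does not by itself make a principal a data lake administrator.&lt;/p&gt;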

&lt;p&gt;&lt;strong&gt;3. Permissions&lt;/strong&gt;&lt;br&gt;
 — Lake Formation has its own permission model (DESCRIBE, SELECT, ALTER, DROP, etc.) that maps roughly to what you'd expect from a database. These are what you grant to IAM principals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Registered Locations&lt;/strong&gt;&lt;br&gt;
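 — before Lake Formation can govern an S3 path, you have to register it. This tells LF "I want you to manage access to data in this location." After that, integrated services such as Athena, Redshift Spectrum, and Glue read the data with credentials that Lake Formation vends, so a principal needs LF permissions in addition to IAM permissions to query it. Two caveats: catalog resources that still carry the default IAMAllowedPrincipals grant keep behaving as IAM-only until you revoke it, and registration does not block principals whose IAM policies allow direct S3 access to the bucket.&lt;/p&gt;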
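
&lt;p&gt;For the registered-locations concept above, here is a small sketch of the two requests that usually go together: registering an S3 path with &lt;code&gt;register_resource&lt;/code&gt;, and revoking the default &lt;code&gt;IAMAllowedPrincipals&lt;/code&gt; grant on a table with &lt;code&gt;revoke_permissions&lt;/code&gt; so that LF permissions actually apply to it. The bucket, prefix, database, and table names are hypothetical, using the service-linked role is an assumption (you can pass your own role instead), and the code only builds the keyword arguments.&lt;/p&gt;

```python
def s3_location_arn(bucket, prefix=""):
    """Return the S3 ARN for a bucket or a prefix inside it."""
    prefix = prefix.strip("/")
    return f"arn:aws:s3:::{bucket}/{prefix}" if prefix else f"arn:aws:s3:::{bucket}"


def register_location_request(bucket, prefix=""):
    """Build kwargs for lakeformation.register_resource.
    Assumes the Lake Formation service-linked role is acceptable."""
    return {
        "ResourceArn": s3_location_arn(bucket, prefix),
        "UseServiceLinkedRole": True,
    }


def revoke_iam_only_access_request(database, table):
    """Build kwargs for lakeformation.revoke_permissions that remove the
    default IAMAllowedPrincipals grant ("Super" in the console) from a table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["ALL"],
    }


# Hypothetical bucket, prefix, database, and table names.
register_kwargs = register_location_request("my-data-lake", "/raw/sales/")
revoke_kwargs = revoke_iam_only_access_request("sales", "orders")
```

&lt;p&gt;With a real client these become &lt;code&gt;client.register_resource(**register_kwargs)&lt;/code&gt; and &lt;code&gt;client.revoke_permissions(**revoke_kwargs)&lt;/code&gt;. Grant the LF permissions your teams need before running the revoke, otherwise queries that relied on IAM-only access will start failing.&lt;/p&gt;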

&lt;p&gt;&lt;strong&gt;What This Series Covers&lt;/strong&gt;&lt;br&gt;
This is week one of a practical series on Lake Formation. No fluff — just the things that actually matter when you're implementing this in a real environment.&lt;br&gt;
&lt;strong&gt;Here's where we're going:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Week 2 — TBAC vs NBAC&lt;/strong&gt;: the two permission models (tag-based access control with LF-Tags versus grants on named resources), when to use each, and why the choice matters more than AWS lets on&lt;br&gt;
&lt;strong&gt;Week 3 — Cross-account data sharing&lt;/strong&gt;: the gotchas, the traps, and a real bug that cost me hours&lt;br&gt;
&lt;strong&gt;Week 4 — Automating Lake Formation with Terraform&lt;/strong&gt;: what works, what doesn't, and how to structure your modules&lt;br&gt;
If you work with AWS data infrastructure — whether you're building a data lake from scratch or trying to bring governance to an existing one — this series is for you.&lt;br&gt;
&lt;em&gt;&lt;strong&gt;I'm a Data Platform Engineer working with AWS data infrastructure daily. Follow along if you want the practical version of Lake Formation — not the tutorial, the real thing.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>awsdatalake</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
