DEV Community

David💻

Implementing a Secure Data Governance Architecture on AWS with S3, Glue, Athena, and Lake Formation

This article explains how to build a secure and fully auditable data governance architecture using Amazon S3, AWS Glue, CloudTrail, Lake Formation, Athena, and Amazon Quick Suite.

This design provides data organization, encryption, version control, access restriction, and traceability, while enabling analytical queries and dashboards through Athena and Quick Suite.

Requirements

  • AWS account
  • CSV files with data

The proposed architecture

[Figure: proposed architecture diagram]

Walkthrough

The goal is to build a centralized S3-based data lake that manages raw, processed, and sensitive data securely.
Our main bucket is called company-data-governance-raw.

Inside our bucket we follow this folder structure:

company-data-governance-raw/
├── raw/
│   ├── clientes/
│   ├── transacciones/
│   └── productos/
│
├── processed/
│   ├── clientes/
│   └── transacciones/
│       └── sensible=no/
│
├── sensitive/
│
├── athena/
│
└── logs/
    └── s3-access/

This organization keeps every dataset in the folder that matches its lifecycle stage, from ingestion to analysis.
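
As a minimal sketch of how the layout above could be used from code, the helper below builds object keys for each lifecycle stage. The function name object_key and the file name clientes.csv are hypothetical and only serve as illustration; the stage and dataset names come from the folder structure.

```python
from pathlib import PurePosixPath

# Lifecycle stages taken from the folder structure above.
STAGES = ("raw", "processed", "sensitive")


def object_key(stage: str, dataset: str, filename: str) -> str:
    """Return the S3 object key for a file in a given lifecycle stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return str(PurePosixPath(stage) / dataset / filename)


print(object_key("raw", "clientes", "clientes.csv"))  # raw/clientes/clientes.csv
```

Rejecting unknown stages early keeps stray top-level folders from appearing in the bucket.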

S3 Configuration and Security Controls

When creating the bucket:

✅ Block all public access

✅ Enable versioning to preserve data integrity and restore older versions

✅ Enable MFA Delete to prevent accidental or unauthorized deletions (it can only be turned on by the bucket owner's root user through the CLI or API, not from the console)

✅ Enable encryption at rest (SSE-S3, or SSE-KMS with your own customer managed key); encryption in transit is enforced through the bucket policy in the next section

Using a customer managed KMS key keeps the symmetric encryption key under your control, which is essential for sensitive workloads.
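
The same settings can also be applied programmatically. The sketch below assumes boto3 and builds the keyword arguments for three S3 API calls (put_public_access_block, put_bucket_versioning, put_bucket_encryption). The helper names and the KMS key ARN are hypothetical placeholders, and MFA Delete is left out because it needs root credentials.

```python
def hardening_requests(bucket: str, kms_key_arn: str) -> dict:
    """Build keyword arguments for three S3 calls, keyed by boto3 method name."""
    return {
        "put_public_access_block": {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        "put_bucket_versioning": {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
        "put_bucket_encryption": {
            "Bucket": bucket,
            "ServerSideEncryptionConfiguration": {
                "Rules": [
                    {
                        "ApplyServerSideEncryptionByDefault": {
                            "SSEAlgorithm": "aws:kms",
                            "KMSMasterKeyID": kms_key_arn,  # hypothetical key ARN
                        },
                        "BucketKeyEnabled": True,
                    }
                ]
            },
        },
    }


def apply_hardening(s3_client, bucket: str, kms_key_arn: str) -> None:
    """Send each request through an S3 client, e.g. boto3.client("s3")."""
    for method, kwargs in hardening_requests(bucket, kms_key_arn).items():
        getattr(s3_client, method)(**kwargs)
```

Keeping the request building separate from the client calls makes it possible to review or unit test the configuration without touching an AWS account.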

S3 Bucket Policy – Enforcing Security

To harden the S3 layer, we can configure our bucket policy with these three key rules:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::company-data-governance-raw/*",
        "arn:aws:s3:::company-data-governance-raw"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    },
    {
      "Sid": "RestrictSensitiveData",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::company-data-governance-raw/sensitive/*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": [
            "arn:aws:iam::<account-id>:user/<user-or-role>",
            "arn:aws:iam::<account-id>:role/AWSGlueServiceRole-GobiernoDatos"
          ]
        }
      }
    },
    {
      "Sid": "S3PolicyStmt-DO-NOT-MODIFY-1760725108745",
      "Effect": "Allow",
      "Principal": { "Service": "logging.s3.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::company-data-governance-raw/*",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "<your-account-id>" }
      }
    }
  ]
}

What this policy does:

  • Denies plain HTTP requests, so only HTTPS traffic is allowed.
  • Restricts reads and writes under /sensitive/ to one specific IAM user and the Glue role.
  • Grants the S3 logging service permission to write access logs, but only on behalf of the same AWS account.

This combination provides encryption in transit, identity-based access control, and logging integrity.
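
To make the effect of the two Deny statements concrete, here is a deliberately simplified model in Python. It is not the real IAM evaluation engine, and the account ID and user names are hypothetical; it only mirrors the conditions written in the policy.

```python
SENSITIVE_ACTIONS = {"s3:GetObject", "s3:PutObject"}


def explicitly_denied(key, action, principal_arn, secure_transport, allowed_arns):
    """Simplified model of the two Deny statements in the bucket policy."""
    # DenyInsecureTransport: any request made over plain HTTP is rejected.
    if not secure_transport:
        return True
    # RestrictSensitiveData: reads and writes under sensitive/ are rejected
    # unless the caller's ARN is on the allow-list.
    if (
        key.startswith("sensitive/")
        and action in SENSITIVE_ACTIONS
        and principal_arn not in allowed_arns
    ):
        return True
    return False


# Hypothetical ARNs for illustration only.
ALLOWED = {
    "arn:aws:iam::111122223333:user/data-steward",
    "arn:aws:iam::111122223333:role/AWSGlueServiceRole-GobiernoDatos",
}
ANALYST = "arn:aws:iam::111122223333:user/analyst"

print(explicitly_denied("sensitive/cards.csv", "s3:GetObject", ANALYST, True, ALLOWED))   # True
print(explicitly_denied("raw/clientes/c.csv", "s3:GetObject", ANALYST, True, ALLOWED))    # False
print(explicitly_denied("raw/clientes/c.csv", "s3:GetObject", ANALYST, False, ALLOWED))   # True
```

A request that is not explicitly denied still needs an Allow from an identity policy or a resource policy before S3 will serve it.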

Enabling Access Logging

Enabling S3 server access logging is essential for auditing who accessed what. Logs for all requests (user or programmatic) are delivered under /logs/s3-access/ inside the same bucket. Delivery is best effort, so treat these logs as a complement to the CloudTrail data events configured later rather than a guaranteed complete record.
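
A minimal sketch of the same setting through the API, assuming boto3: the helper below (a hypothetical name) builds the arguments for put_bucket_logging, using the bucket itself as the target and logs/s3-access/ as the prefix, as described above.

```python
def access_logging_request(bucket: str, prefix: str = "logs/s3-access/") -> dict:
    """Build keyword arguments for the S3 put_bucket_logging call."""
    return {
        "Bucket": bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": bucket,  # logs land in the same bucket
                "TargetPrefix": prefix,
            }
        },
    }


request = access_logging_request("company-data-governance-raw")
print(request["BucketLoggingStatus"]["LoggingEnabled"]["TargetPrefix"])  # logs/s3-access/

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("s3").put_bucket_logging(**request)
```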

Building the Data Catalog in AWS Glue

After uploading your data into the appropriate folders, the next step is to catalog it. Go to AWS Glue → Data Catalog → Databases.

Create a new database named data_governance_catalog.

Then create a crawler that will scan your S3 bucket and automatically build table schemas.

Steps:

  • Crawler name: company-crawler-governance-data-raw
  • Create a new IAM role: DataGovernance
  • Choose the database data_governance_catalog as target
  • Run the crawler on demand

Once it finishes, you'll see three tables reflecting your S3 folder structure (clientes, transacciones, productos).
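
The console steps above can also be scripted. The sketch below assumes boto3 and builds the arguments for the Glue calls create_database, create_crawler, and start_crawler. The helper names are hypothetical, and two values are assumptions: the role name AWSGlueServiceRole-DataGovernance (the console normally adds that prefix to the name you type) and the crawl path under raw/.

```python
def crawler_requests(database: str, crawler: str, role: str, s3_path: str) -> dict:
    """Build keyword arguments for three Glue calls, keyed by boto3 method name."""
    return {
        "create_database": {"DatabaseInput": {"Name": database}},
        "create_crawler": {
            "Name": crawler,
            "Role": role,
            "DatabaseName": database,
            "Targets": {"S3Targets": [{"Path": s3_path}]},
        },
        "start_crawler": {"Name": crawler},
    }


def run_crawler(glue_client, **settings) -> None:
    """Create the database and crawler, then start an on-demand run."""
    for method, kwargs in crawler_requests(**settings).items():
        getattr(glue_client, method)(**kwargs)


SETTINGS = {
    "database": "data_governance_catalog",
    "crawler": "company-crawler-governance-data-raw",
    "role": "AWSGlueServiceRole-DataGovernance",         # assumed role name
    "s3_path": "s3://company-data-governance-raw/raw/",  # assumed crawl path
}

# With credentials configured:
#   import boto3
#   run_crawler(boto3.client("glue"), **SETTINGS)
```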

Adding Traceability with AWS CloudTrail

Security doesn't stop at encryption. We also need visibility into every read and write operation.

In CloudTrail → Trails, select your primary trail and enable Data Events for S3. This allows auditing of object-level operations such as GetObject and PutObject inside the bucket. CloudTrail logs will then show who accessed which file and when, which supports compliance and audit readiness.
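
For reference, this is roughly what enabling S3 data events looks like through the API. The sketch assumes boto3 and the CloudTrail put_event_selectors call; the helper name and the trail name primary-trail are hypothetical.

```python
def s3_data_event_selectors(trail_name: str, bucket: str) -> dict:
    """Build keyword arguments for the CloudTrail put_event_selectors call."""
    return {
        "TrailName": trail_name,
        "EventSelectors": [
            {
                "ReadWriteType": "All",  # log both reads and writes
                "IncludeManagementEvents": True,
                "DataResources": [
                    {
                        "Type": "AWS::S3::Object",
                        # The trailing slash covers every object in the bucket.
                        "Values": [f"arn:aws:s3:::{bucket}/"],
                    }
                ],
            }
        ],
    }


request = s3_data_event_selectors("primary-trail", "company-data-governance-raw")

# With credentials configured:
#   import boto3
#   boto3.client("cloudtrail").put_event_selectors(**request)
```

Note that put_event_selectors replaces the selectors already configured on the trail, so any existing ones need to be included in the request.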

Applying Data Access Controls with Lake Formation

For column-level and row-level permissions, we can use Lake Formation data filters. Go to Lake Formation → Data Filters → Create new filter

Filter name: data-governance-filters

  • Select your data catalog and focus on sensitive columns in the clientes table (e.g., credit card holder details)
  • Column filters: hide entire sensitive columns
  • Row filters: restrict specific data rows
  • Then, under Permissions, define grants to control who can see what.
  • For example, grant the Glue role and the current user access only through the filter, so the columns marked as sensitive stay excluded for them. Lake Formation has no explicit deny; a principal sees only what it has been granted.

After applying the filter, queries run through Athena by those principals no longer return the restricted columns or rows.
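
A hedged sketch of the same filter through the API, assuming boto3 and the Lake Formation calls create_data_cells_filter and grant_permissions. The helper names, the account ID, the column names (card_number, card_holder), and the row filter expression are all hypothetical; replace them with the real schema of the clientes table.

```python
def data_cells_filter_request(account_id, database, table, name,
                              excluded_columns, row_filter):
    """Build keyword arguments for lakeformation.create_data_cells_filter."""
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            "RowFilter": {"FilterExpression": row_filter},
            "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
        }
    }


def grant_select_request(principal_arn, account_id, database, table, name):
    """Build keyword arguments for lakeformation.grant_permissions on the filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": account_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": name,
            }
        },
        "Permissions": ["SELECT"],
    }


filter_request = data_cells_filter_request(
    "111122223333",                  # hypothetical account ID
    "data_governance_catalog",
    "clientes",
    "data-governance-filters",
    ["card_number", "card_holder"],  # hypothetical sensitive columns
    "region = 'LATAM'",              # hypothetical row filter
)
```

Granting SELECT on the filter rather than on the table is what keeps the excluded columns out of query results for that principal.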

Querying Data with Amazon Athena

Now we can query the cataloged data using Athena, which integrates with the Glue Data Catalog automatically. When executing a query, Athena respects Lake Formation permissions: sensitive columns are hidden for restricted users, and queries run against optimized Parquet data under /processed/.

Example:

SELECT * FROM data_governance_catalog.clientes;

Results are stored in the /athena/ folder, ready to visualize in Quick Suite.
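
The same query can be submitted from code. This sketch assumes boto3 and the Athena start_query_execution call; the helper name is hypothetical, and the output location simply points at the athena/ folder of the bucket.

```python
def athena_query_request(sql: str, database: str, bucket: str) -> dict:
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/athena/"},
    }


request = athena_query_request(
    "SELECT * FROM data_governance_catalog.clientes",
    "data_governance_catalog",
    "company-data-governance-raw",
)
print(request["ResultConfiguration"]["OutputLocation"])
# s3://company-data-governance-raw/athena/

# With credentials configured:
#   import boto3
#   response = boto3.client("athena").start_query_execution(**request)
#   query_id = response["QueryExecutionId"]
```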

Visualizing Insights with Amazon QuickSight

To visualize and share insights, open QuickSight → New Analysis:

  • Choose Athena as the data source
  • Select the data_governance_catalog
  • Build dashboards using your filtered datasets

Example visualizations:

  • Customers by region
  • Predicted transactions per segment
  • Sensitive data audit summaries

Quick Suite connects seamlessly with Athena, ensuring that governance rules continue to apply even at the visualization layer.

Cost Estimation

Here's an approximate monthly cost breakdown for a moderate workload:

| Component | Usage Assumption | Estimated / Month (USD) |
|---|---|---|
| S3 Storage | 500 GB (Standard) | $11.50 |
| S3 Requests | Moderate GET/PUT traffic | $8.00 |
| CloudTrail Data Events | 25 M events | $25.00 |
| Glue Data Catalog | 20k objects / 1 M requests | $1.20 |
| Glue Crawler | ~0.3 h/day (≈ 9 DPU-h) | $4.00 |
| Glue ETL (light) | ≈ 9 DPU-h/month | $4.00 |
| Athena Queries | 3 TB scanned/month | $15.00 |
| CloudWatch Alarms | 5 alarms | $0.50 |
| QuickSight (Enterprise) | 1 author + 1 reader + 5 GB SPICE | $30.90 |
| Total Estimated Cost | | ≈ $100.10 / month |

A compact and cost efficient solution for complete data governance.
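
As a quick check of the arithmetic, the snippet below sums the per-component estimates from the table. The figures are copied from the article; only the variable names are new.

```python
from decimal import Decimal

# Monthly estimates in USD, copied from the cost table.
MONTHLY_COSTS = {
    "S3 Storage": Decimal("11.50"),
    "S3 Requests": Decimal("8.00"),
    "CloudTrail Data Events": Decimal("25.00"),
    "Glue Data Catalog": Decimal("1.20"),
    "Glue Crawler": Decimal("4.00"),
    "Glue ETL (light)": Decimal("4.00"),
    "Athena Queries": Decimal("15.00"),
    "CloudWatch Alarms": Decimal("0.50"),
    "QuickSight (Enterprise)": Decimal("30.90"),
}

total = sum(MONTHLY_COSTS.values(), Decimal("0"))
print(f"Total: ${total} / month")  # Total: $100.10 / month
```

Using Decimal instead of float avoids rounding noise when adding currency amounts.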

Conclusion

This architecture provides a secure, auditable, and scalable data governance foundation built entirely with managed AWS services.
From S3 encryption to Lake Formation filters and QuickSight dashboards, every layer enforces security, traceability, and performance. We can easily extend this solution by:

  • Adding Glue ETL jobs for automated transformations
  • Integrating with Amazon Redshift for advanced analytics
  • Applying AWS Macie for sensitive data discovery

If you're building a data lake or starting a governance project, this structure provides a strong and repeatable foundation for compliance-ready analytics on AWS.
