This article explains how to build a secure and fully auditable data governance architecture using Amazon S3, AWS Glue, AWS CloudTrail, AWS Lake Formation, and Amazon Quick Suite.
The design covers data organization, encryption, version control, access restriction, and traceability, while enabling analytical queries and dashboards through Athena and Quick Suite.
Requirements
- AWS account
- CSV files with data
The proposed architecture
Walkthrough
The goal is to build a centralized S3-based data lake that manages raw, processed, and sensitive data securely.
We will call our main bucket company-data-governance-raw.
Inside our bucket we follow this folder structure:
company-data-governance-raw/
├── raw/
│   ├── clientes/
│   ├── transacciones/
│   └── productos/
│
├── processed/
│   ├── clientes/
│   ├── transacciones/
│   └── sensible=no/
│
├── sensitive/
│
├── athena/
│
└── logs/
    └── s3-access/
This organization keeps every dataset in the folder that matches its lifecycle stage, from ingestion to analysis.
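As a minimal sketch of how this layout could be used in code, the helpers below build an object key for a lifecycle stage and read the stage back from a key. The function names and the example file name are illustrative assumptions, not part of the original setup.

```python
from pathlib import PurePosixPath

# Top-level prefixes taken from the folder structure above.
STAGES = {"raw", "processed", "sensitive", "athena", "logs"}


def build_key(stage: str, dataset: str, filename: str) -> str:
    """Return the object key for a file stored under a lifecycle stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return str(PurePosixPath(stage) / dataset / filename)


def stage_of(key: str) -> str:
    """Return the lifecycle stage (first path segment) of an object key."""
    stage = key.split("/", 1)[0]
    if stage not in STAGES:
        raise ValueError(f"key is outside the known layout: {key!r}")
    return stage


if __name__ == "__main__":
    # "clientes_2024.csv" is a hypothetical file name.
    key = build_key("raw", "clientes", "clientes_2024.csv")
    print(key, "->", stage_of(key))
```

Rejecting unknown prefixes early makes it harder for a job to write data outside the agreed structure.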
S3 Configuration and Security Controls
When creating the bucket:
- Block all public access
- Enable versioning to preserve data integrity and restore older versions
- Enable MFA Delete to prevent accidental or unauthorized deletions
- Enable encryption at rest (SSE-S3, or SSE-KMS with your own customer managed key) and require encryption in transit
Using AWS KMS keeps the symmetric encryption key under your control, which is essential for sensitive workloads.
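The same settings can also be applied programmatically. The sketch below only builds the keyword arguments for three S3 API calls, keyed by the boto3 method name, so it runs without AWS credentials. The function name and the KMS key ARN are assumptions for illustration. MFA Delete is left out because enabling it requires an MFA device serial and code to be passed with the versioning request.

```python
def bucket_security_settings(bucket: str, kms_key_arn: str) -> dict:
    """Build keyword arguments for three S3 calls, keyed by boto3 method name."""
    return {
        # Block every form of public access on the bucket.
        "put_public_access_block": {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        # Keep older object versions so they can be restored.
        "put_bucket_versioning": {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
        # Default encryption at rest with a customer managed KMS key.
        "put_bucket_encryption": {
            "Bucket": bucket,
            "ServerSideEncryptionConfiguration": {
                "Rules": [
                    {
                        "ApplyServerSideEncryptionByDefault": {
                            "SSEAlgorithm": "aws:kms",
                            "KMSMasterKeyID": kms_key_arn,
                        }
                    }
                ]
            },
        },
    }
```

With boto3 installed and credentials configured, each entry could be sent with `getattr(boto3.client("s3"), method)(**kwargs)`.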
S3 Bucket Policy: Enforcing Security
To harden the S3 layer, we can configure our bucket policy with these three key rules:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::company-data-governance-raw/*",
        "arn:aws:s3:::company-data-governance-raw"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    },
    {
      "Sid": "RestrictSensitiveData",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::company-data-governance-raw/sensitive/*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": [
            "arn:aws:iam::<account-id>:user/<user-or-role>",
            "arn:aws:iam::<account-id>:role/AWSGlueServiceRole-GobiernoDatos"
          ]
        }
      }
    },
    {
      "Sid": "S3PolicyStmt-DO-NOT-MODIFY-1760725108745",
      "Effect": "Allow",
      "Principal": { "Service": "logging.s3.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::company-data-governance-raw/*",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "<your-account-id>" }
      }
    }
  ]
}
What this does:
- Denies plain HTTP requests, so only HTTPS traffic is allowed.
- Restricts reads and writes under /sensitive/ to a specific IAM user and the Glue role.
- Grants the S3 logging service permission to write access logs, but only on behalf of the same AWS account.
This combination provides encryption in transit, identity-based access control, and logging integrity.
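To make the effect of the two Deny statements concrete, here is a deliberately simplified model of them in Python. It is not IAM's actual evaluation logic, and the account ID and user name are placeholders chosen for illustration.

```python
# Placeholder ARNs: substitute the principals listed in your own policy.
ALLOWED_SENSITIVE_PRINCIPALS = {
    "arn:aws:iam::111122223333:user/data-steward",
    "arn:aws:iam::111122223333:role/AWSGlueServiceRole-GobiernoDatos",
}
RESTRICTED_ACTIONS = {"s3:GetObject", "s3:PutObject"}


def is_denied(principal_arn: str, action: str, key: str, secure_transport: bool) -> bool:
    """Return True if either Deny statement would match the request."""
    # DenyInsecureTransport: every S3 action sent over plain HTTP.
    if not secure_transport:
        return True
    # RestrictSensitiveData: Get/Put under sensitive/ by any other principal.
    if (
        action in RESTRICTED_ACTIONS
        and key.startswith("sensitive/")
        and principal_arn not in ALLOWED_SENSITIVE_PRINCIPALS
    ):
        return True
    return False
```

A result of `False` only means that no explicit Deny applies. The request still needs an Allow from an identity or resource policy to succeed.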
Enabling Access Logging
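Enabling S3 server access logging is essential for auditing who accessed what. Log records for requests to the bucket, whether made by users or programmatically, are stored under /logs/s3-access/ inside the same bucket. S3 delivers these logs on a best-effort basis, so treat them as one layer of the audit trail rather than a guaranteed complete record.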
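For reference, a small sketch of the request that turns on access logging with the prefix used here. The helper name is an assumption. The dictionary follows the shape expected by the S3 `put_bucket_logging` call in boto3.

```python
def access_logging_request(bucket: str, prefix: str = "logs/s3-access/") -> dict:
    """Build keyword arguments for S3 put_bucket_logging.

    The logs are written back into the same bucket under `prefix`,
    matching the folder structure used in this article.
    """
    return {
        "Bucket": bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": bucket,
                "TargetPrefix": prefix,
            }
        },
    }
```

Calling `boto3.client("s3").put_bucket_logging(**access_logging_request("company-data-governance-raw"))` would apply it, assuming the bucket policy already lets the logging service write.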
Building the Data Catalog in AWS Glue
After uploading your data into the appropriate folders, the next step is to catalog it. Go to AWS Glue → Data Catalog → Databases.
Create a new database named data_governance_catalog.
Then create a crawler that will scan your S3 bucket and automatically build table schemas.
Steps:
- Crawler name: company-crawler-governance-data-raw
- Create a new IAM role: DataGovernance
- Choose the database data_governance_catalog as target
- Run the crawler on demand
Once it finishes, you'll see three tables reflecting your S3 folder structure (clientes, transacciones, productos).
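The console steps above can be sketched as three Glue API requests. The helper name is hypothetical, and pointing the crawler at the raw/ prefix is an assumption based on the three tables it produces. The dictionaries follow the boto3 Glue shapes for `create_database`, `create_crawler`, and `start_crawler`.

```python
def glue_catalog_requests(bucket: str, database: str, crawler: str, role: str) -> dict:
    """Build keyword arguments for three Glue calls, keyed by boto3 method name."""
    return {
        "create_database": {"DatabaseInput": {"Name": database}},
        "create_crawler": {
            "Name": crawler,
            "Role": role,  # IAM role name or ARN the crawler assumes
            "DatabaseName": database,
            # Assumption: only the raw/ prefix is crawled.
            "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/raw/"}]},
        },
        # No schedule is set, so the crawler runs only when started on demand.
        "start_crawler": {"Name": crawler},
    }


if __name__ == "__main__":
    requests = glue_catalog_requests(
        bucket="company-data-governance-raw",
        database="data_governance_catalog",
        crawler="company-crawler-governance-data-raw",
        role="DataGovernance",
    )
    for method, kwargs in requests.items():
        print(method, kwargs)
```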
Adding Traceability with AWS CloudTrail
Security doesn't stop at encryption. We also need visibility into every read and write operation.
In CloudTrail → Trails, select your primary trail and enable Data Events for S3. This allows auditing of operations like GetObject and PutObject inside the bucket. CloudTrail logs will now show who accessed which file and when, ensuring compliance and audit readiness.
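As a rough equivalent of that console setting, the sketch below builds the request for the CloudTrail `put_event_selectors` call. The helper name and the trail name in the usage line are assumptions. The trailing slash on the bucket ARN selects all objects in the bucket.

```python
def s3_data_event_selectors(trail_name: str, bucket: str) -> dict:
    """Build keyword arguments for CloudTrail put_event_selectors."""
    return {
        "TrailName": trail_name,
        "EventSelectors": [
            {
                # Log both reads (GetObject) and writes (PutObject).
                "ReadWriteType": "All",
                "IncludeManagementEvents": True,
                "DataResources": [
                    {
                        "Type": "AWS::S3::Object",
                        # Trailing slash: every object in this bucket.
                        "Values": [f"arn:aws:s3:::{bucket}/"],
                    }
                ],
            }
        ],
    }
```

For example, `s3_data_event_selectors("primary-trail", "company-data-governance-raw")`, where `primary-trail` stands in for the name of your own trail.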
Applying Data Access Controls with Lake Formation
For column-level and row-level permissions, we can use AWS Lake Formation data filters. Go to Lake Formation → Data Filters → Create new filter.
- Filter name: data-governance-filters
- Select your data catalog and focus on sensitive columns in the clientes table (e.g., credit card holder details)
- Column filters: hide entire sensitive columns
- Row filters: restrict specific data rows
- Then, under Permissions, define grants to control who can see what.
- For example, grant the Glue role and the current user access only through the filter, so the columns marked as sensitive are excluded for them.
After applying the filter, queries run through Athena no longer return those restricted columns.
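A filter like this one can also be described as a request to the Lake Formation `create_data_cells_filter` call. In the sketch below the helper name, the column names, and the row expression are hypothetical, since the real schema of the clientes table is not shown here.

```python
from typing import Iterable, Optional


def data_cells_filter_request(
    account_id: str,
    excluded_columns: Iterable[str],
    row_expression: Optional[str] = None,
) -> dict:
    """Build keyword arguments for Lake Formation create_data_cells_filter."""
    # Without a row expression, the filter keeps every row.
    if row_expression:
        row_filter = {"FilterExpression": row_expression}
    else:
        row_filter = {"AllRowsWildcard": {}}
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": "data_governance_catalog",
            "TableName": "clientes",
            "Name": "data-governance-filters",
            "RowFilter": row_filter,
            # All columns except the excluded ones stay visible.
            "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
        }
    }


if __name__ == "__main__":
    # Hypothetical column names and row condition.
    request = data_cells_filter_request(
        "111122223333",
        excluded_columns=["card_number", "card_holder"],
        row_expression="country = 'ES'",
    )
    print(request)
```

Excluding columns by wildcard means that columns added to the table later stay visible by default, so an explicit include list may be the safer choice for highly sensitive tables.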
Querying Data with Amazon Athena
Now we can query the cataloged data using Athena, which automatically integrates with Glue. When executing a query, Athena respects Lake Formation permissions: sensitive columns are hidden for restricted users, and queries run against optimized Parquet data under /processed/.
Example:
SELECT * FROM data_governance_catalog.clientes;
Results are stored in the /athena/ folder, ready to visualize in Quick Suite.
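To run the same query outside the console, a request along these lines could be passed to the Athena `start_query_execution` call. The helper name is an assumption. The output location points at the /athena/ folder described above.

```python
def athena_query_request(sql: str, database: str, bucket: str) -> dict:
    """Build keyword arguments for Athena start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Query results land in the athena/ prefix of the governance bucket.
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/athena/"},
    }


if __name__ == "__main__":
    request = athena_query_request(
        "SELECT * FROM data_governance_catalog.clientes;",
        database="data_governance_catalog",
        bucket="company-data-governance-raw",
    )
    print(request)
```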
Visualizing Insights with Amazon QuickSight
To visualize and share insights, open QuickSight → New Analysis:
- Choose Athena as the data source
- Select the data_governance_catalog
- Build dashboards using your filtered datasets
Example visualizations:
- Customers by region
- Predicted transactions per segment
- Sensitive data audit summaries
Quick Suite connects seamlessly with Athena, ensuring that governance rules continue to apply even at the visualization layer.
Cost Estimation
Here's an approximate monthly cost breakdown for a moderate workload:
Component | Usage Assumption | Estimated / Month (USD) |
---|---|---|
S3 Storage | 500 GB (Standard) | $11.50 |
S3 Requests | Moderate GET/PUT traffic | $8.00 |
CloudTrail Data Events | 25 M events | $25.00 |
Glue Data Catalog | 20k objects / 1 M requests | $1.20 |
Glue Crawler | ~0.3 h/day (≈ 9 DPU-h) | $4.00 |
Glue ETL (light) | ≈ 9 DPU-h/month | $4.00 |
Athena Queries | 3 TB scanned/month | $15.00 |
CloudWatch Alarms | 5 alarms | $0.50 |
QuickSight (Enterprise) | 1 author + 1 reader + 5 GB SPICE | $30.90 |
Total Estimated Cost | | ≈ $100.10 / month |
A compact and cost-efficient solution for complete data governance.
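The total can be checked by summing the line items from the table. The sketch below copies the figures as given and uses `Decimal` to avoid floating-point rounding. It does not recompute them from AWS unit prices.

```python
from decimal import Decimal

# Monthly estimates in USD, copied from the table above.
MONTHLY_COSTS_USD = {
    "S3 Storage": Decimal("11.50"),
    "S3 Requests": Decimal("8.00"),
    "CloudTrail Data Events": Decimal("25.00"),
    "Glue Data Catalog": Decimal("1.20"),
    "Glue Crawler": Decimal("4.00"),
    "Glue ETL (light)": Decimal("4.00"),
    "Athena Queries": Decimal("15.00"),
    "CloudWatch Alarms": Decimal("0.50"),
    "QuickSight (Enterprise)": Decimal("30.90"),
}


def total_monthly_cost(costs: dict) -> Decimal:
    """Sum the per-component monthly estimates."""
    return sum(costs.values(), Decimal("0"))


if __name__ == "__main__":
    print(f"Total: ${total_monthly_cost(MONTHLY_COSTS_USD)}")
```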
Conclusion
This architecture provides a secure, auditable, and scalable data governance foundation built entirely with managed AWS services.
From S3 encryption to Lake Formation filters and QuickSight dashboards, every layer enforces security, traceability, and performance. We can easily extend this solution by:
- Adding Glue ETL jobs for automated transformations
- Integrating with Amazon Redshift for advanced analytics
- Applying Amazon Macie for sensitive data discovery
If you're building a data lake or starting a governance project, this structure provides a strong and repeatable foundation for compliance-ready analytics in AWS.