wilfried bako

Posted on May 19

Building Project Aegis: Designing a Serverless File Integrity Monitoring System on AWS

#terraform #devops #cloudsecurity #aws

One of the biggest challenges in cloud security is ensuring that critical files remain authentic, traceable, and tamper-free after being uploaded into cloud environments.

In industries such as legal systems, healthcare, forensic investigations, and compliance-driven environments, even a small unauthorized file modification can create serious operational and security risks.

That raised an important question for me:

How can you detect when a file has been silently modified while still maintaining a reliable audit trail and real-time operational visibility?

To explore that problem, I designed and deployed Project Aegis — a serverless file integrity monitoring platform built on AWS.

The platform automatically:

generates SHA-256 hashes for uploaded files
stores audit history in DynamoDB
detects file tampering
triggers real-time alerts using Amazon SNS
maintains an operational audit workflow using serverless architecture

Unlike traditional monolithic systems, the entire platform was designed around:

event-driven workflows
serverless scalability
operational automation
infrastructure reproducibility
observability and monitoring principles

Architecture Overview

The architecture follows a fully event-driven workflow:

Amazon S3 → AWS Lambda → DynamoDB → Amazon SNS

Workflow summary:

A user uploads a file into Amazon S3
S3 automatically triggers a Lambda function
Lambda generates a SHA-256 hash of the uploaded file
DynamoDB stores historical hash records and audit metadata
If the same filename is uploaded with different content:

the system detects tampering
triggers an SNS notification
updates audit history
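The first step of that workflow, reading the upload details out of the S3 event, can be sketched roughly as follows. This is an illustration rather than the project's actual handler: the function name `extract_uploads` and the bucket name are hypothetical, and the sample event is trimmed to the fields used here.

```python
from urllib.parse import unquote_plus


def extract_uploads(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
        key = unquote_plus(s3_info["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def lambda_handler(event, context):
    # A real handler would hash each object here; this sketch only lists them.
    return [f"{bucket}/{key}" for bucket, key in extract_uploads(event)]


# Trimmed-down sample event; the bucket name is hypothetical.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "aegis-uploads"},
                "object": {"key": "case+files/test.txt"}}}
    ]
}
print(lambda_handler(sample_event, None))  # ['aegis-uploads/case files/test.txt']
```

Decoding the key matters because a file uploaded as `case files/test.txt` would otherwise be looked up under the wrong name.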

Core AWS Services Used

Amazon S3

Amazon S3 acts as the ingestion layer for uploaded files and automatically triggers the processing pipeline using event notifications.

AWS Lambda

AWS Lambda handles the serverless processing logic:

retrieves uploaded files
generates SHA-256 hashes
compares historical audit records
updates DynamoDB
triggers SNS notifications when tampering is detected
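A minimal sketch of how those five steps might fit together is shown below. It is not the code from the repository. The function name `process_upload`, the `filename` key attribute, the item fields, and the topic ARN are all assumptions, and the AWS clients are replaced with small in-memory stand-ins so the example runs without credentials. The stand-ins mimic the boto3 call shapes `get_object`, `get_item`, `put_item`, and `publish`.

```python
import hashlib
import io
from datetime import datetime, timezone


def process_upload(s3, table, sns, bucket, key, topic_arn):
    """Hash one uploaded object, compare it with the stored hash, alert on change."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    new_hash = hashlib.sha256(body).hexdigest()

    # "filename" as the table's key attribute is an assumption.
    previous = table.get_item(Key={"filename": key}).get("Item")
    tampered = previous is not None and previous["sha256"] != new_hash

    # Simplification: this keeps only the latest record per filename.
    table.put_item(Item={
        "filename": key,
        "sha256": new_hash,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "tampered": tampered,
    })
    if tampered:
        sns.publish(
            TopicArn=topic_arn,
            Subject="File integrity alert",
            Message=f"{key} changed: {previous['sha256']} -> {new_hash}",
        )
    return tampered


# In-memory stand-ins so the sketch runs without AWS credentials.
class FakeS3:
    def __init__(self):
        self.objects = {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


class FakeTable:
    def __init__(self):
        self.items = {}

    def get_item(self, Key):
        item = self.items.get(Key["filename"])
        return {"Item": item} if item is not None else {}

    def put_item(self, Item):
        self.items[Item["filename"]] = Item


class FakeSNS:
    def __init__(self):
        self.published = []

    def publish(self, **kwargs):
        self.published.append(kwargs)


s3, table, sns = FakeS3(), FakeTable(), FakeSNS()
topic = "arn:aws:sns:us-east-1:123456789012:aegis-alerts"  # hypothetical ARN

s3.objects[("aegis-uploads", "test.txt")] = b"hello"
first = process_upload(s3, table, sns, "aegis-uploads", "test.txt", topic)

s3.objects[("aegis-uploads", "test.txt")] = b"HELLO WORLD 123"
second = process_upload(s3, table, sns, "aegis-uploads", "test.txt", topic)

print(first, second, len(sns.published))  # False True 1
```

Passing the clients in as arguments keeps the logic testable locally. In Lambda, the same function could be called with real boto3 objects.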

Amazon DynamoDB

DynamoDB stores:

file metadata
SHA-256 hashes
audit history
modification tracking records

This provides a scalable and serverless audit logging layer.
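To make the stored data more concrete, here is one possible shape for a single audit record. The attribute names, and the choice of `filename` as partition key with `uploaded_at` as sort key, are assumptions for illustration. The post does not specify the actual table schema.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    filename: str                   # partition key (assumed)
    uploaded_at: str                # sort key (assumed); ISO-8601 UTC sorts chronologically
    sha256: str
    size_bytes: int
    previous_sha256: Optional[str]  # None on the first upload of a filename
    tampered: bool


def build_record(filename, sha256, size_bytes, previous_sha256=None, now=None):
    """Create an audit record, flagging it when the hash differs from the previous one."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        filename=filename,
        uploaded_at=now.isoformat(),
        sha256=sha256,
        size_bytes=size_bytes,
        previous_sha256=previous_sha256,
        tampered=previous_sha256 is not None and previous_sha256 != sha256,
    )


record = build_record(
    "test.txt", "a" * 64, 5, previous_sha256="b" * 64,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),  # fixed time for a repeatable example
)
item = asdict(record)  # plain dict, suitable as the Item argument of a put_item call
print(item["uploaded_at"], item["tampered"])  # 2024-01-01T00:00:00+00:00 True
```

With a timestamp in the key, every upload adds a new row instead of replacing the last one, which is what turns the table into a history rather than a snapshot.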

Amazon SNS

Amazon SNS sends real-time operational alerts whenever suspicious file modifications are detected.
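As a rough sketch of what such an alert could contain, the helper below builds the subject and message for one detected change. The name `build_alert`, the subject wording, and the message fields are hypothetical. The subject is kept short and ASCII-only as a precaution, since SNS places restrictions on subject lines.

```python
import json


def build_alert(bucket, filename, old_hash, new_hash):
    """Build keyword arguments for an SNS publish call describing one change."""
    # Keep the subject short and ASCII-only; the details go in the message body.
    safe_name = filename.encode("ascii", "replace").decode("ascii")
    subject = f"[Aegis] File modified: {safe_name}"[:99]
    message = json.dumps(
        {
            "bucket": bucket,
            "filename": filename,
            "previous_sha256": old_hash,
            "current_sha256": new_hash,
        },
        indent=2,
    )
    return {"Subject": subject, "Message": message}


alert = build_alert("aegis-uploads", "test.txt", "a" * 64, "b" * 64)
print(alert["Subject"])  # [Aegis] File modified: test.txt
```

With boto3, the result could be sent as `sns_client.publish(TopicArn=topic_arn, **alert)`, where `sns_client` and `topic_arn` come from the deployment.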

Amazon CloudWatch

CloudWatch was heavily used for:

monitoring
troubleshooting
Lambda execution tracing
debugging operational failures

Terraform

After first building the project manually in the AWS Console, I automated the infrastructure with Terraform to improve:

reproducibility
scalability
deployment consistency
infrastructure management

File Integrity Logic

One of the most important parts of the project was designing reliable file tampering detection.

The system follows this logic:

Same filename + same content → No alert
Same filename + modified content → Trigger alert

Instead of relying only on filenames, the platform generates SHA-256 hashes and compares the actual file content.

This avoids false positives when an identical file is re-uploaded under the same name, so alerts fire only when the content has really changed.
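The comparison rule can be sketched in a few lines of Python. The helper names `sha256_stream` and `classify` are hypothetical, and hashing in chunks is a design choice added here so that large files do not need to fit in memory at once.

```python
import hashlib
import io


def sha256_stream(fileobj, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def classify(stored_hash, new_hash):
    """Apply the rule: alert only when a known filename arrives with new content."""
    if stored_hash is None:
        return "first_upload"
    return "unchanged" if stored_hash == new_hash else "modified"


original = sha256_stream(io.BytesIO(b"hello"))
same = sha256_stream(io.BytesIO(b"hello"))
changed = sha256_stream(io.BytesIO(b"HELLO WORLD 123"))

print(classify(None, original))     # first_upload
print(classify(original, same))     # unchanged
print(classify(original, changed))  # modified
```

Only the `modified` outcome would lead to an alert. A first upload simply establishes the baseline hash.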

Testing the System

Initial Upload

Example:

```text
test.txt → hello
```

Result:

hash stored in DynamoDB
no alert triggered

Modified Upload

```text
test.txt → HELLO WORLD 123
```

Result:

new hash generated
tampering detected
SNS email alert triggered
audit record updated

Challenges & Solutions

One of the most valuable parts of building Project Aegis was troubleshooting real operational and infrastructure problems while designing the platform.

Several engineering challenges forced me to think beyond simply connecting AWS services and instead focus on debugging, observability, system behavior, and operational reliability.

1. Duplicate File Detection

Problem

Uploading files with the same name triggered unnecessary alerts.

Root Cause

Initial logic compared filenames only instead of actual file content.

Solution

Implemented SHA-256 hashing to compare file content integrity directly.

Result

Alerts now trigger only when actual file content changes.

2. SNS Alerts Not Triggering

Problem

Real-time alerts were not being received after file modifications.

Root Cause

Missing SNS publish permissions and incomplete Lambda notification logic.

Solution

Added the missing IAM permission (sns:Publish) and wired the SNS publish step directly into the Lambda processing logic.

Result

Operational alerts now trigger successfully in real time.
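For reference, a least-privilege statement for this permission might look like the sketch below, built as a Python dictionary and printed as JSON. The statement ID and the topic ARN are hypothetical. Only the `sns:Publish` action comes from the fix described above.

```python
import json


def sns_publish_policy(topic_arn):
    """Return an IAM policy document allowing publish to a single SNS topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowIntegrityAlerts",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,  # scoped to one topic, not "*"
            }
        ],
    }


# Hypothetical ARN; substitute the topic created for the deployment.
topic_arn = "arn:aws:sns:us-east-1:123456789012:aegis-integrity-alerts"
policy = sns_publish_policy(topic_arn)
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to the one alert topic means the function cannot publish to any other topic in the account.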

3. Amazon S3 Overwrite Behavior

Problem

Uploading a file with the same name replaced the existing object unexpectedly.

Root Cause

By default (with bucket versioning disabled), Amazon S3 replaces an existing object when a new one is uploaded under the same key.

Solution

Shifted the detection logic to compare each new upload's hash against the previously recorded hash, rather than depending on the filename alone.

Result

The platform accurately detects modifications even when files are overwritten.

4. CloudWatch Debugging & Observability

Problem

It was initially difficult to verify whether Lambda executions completed successfully.

Solution

Used Amazon CloudWatch logs to trace execution flow, monitor failures, and debug event processing behavior.

Result

Lambda executions became much easier to verify, and failures much faster to diagnose.
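One way to make such logs easier to follow is to emit one structured line per processing stage. The sketch below uses only the standard library. The logger name, the `log_event` helper, and the field names are assumptions rather than the project's actual logging code. In Lambda's Python runtime, output written through `logging` ends up in the function's CloudWatch log group.

```python
import json
import logging

# Has an effect when run locally; Lambda configures a root handler itself.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("aegis")
logger.setLevel(logging.INFO)


def log_event(stage, **fields):
    """Log one JSON line per stage so executions can be traced step by step."""
    line = json.dumps({"stage": stage, **fields}, sort_keys=True)
    logger.info(line)
    return line


log_event("event_received", bucket="aegis-uploads", key="test.txt")
log_event("hash_computed", key="test.txt", sha256="a" * 64)
last = log_event("alert_sent", key="test.txt", tampered=True)
```

Because every line is valid JSON with a `stage` field, a missing stage in the log stream points directly at the step where an execution stopped.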

5. Infrastructure Deployment Consistency

Problem

Manual deployments introduced operational inconsistency and configuration drift.

Solution

Implemented Infrastructure as Code using Terraform to automate AWS resource provisioning.

Result

The infrastructure can now be recreated consistently using repeatable deployment workflows.

What I Learned

Building Project Aegis reinforced several important cloud engineering and operational concepts for me:

Designing event-driven serverless architectures for real-time processing
Applying SHA-256 hashing to enforce integrity validation and auditability
Orchestrating AWS services into a cohesive operational workflow
Using Infrastructure as Code with Terraform for scalable and reproducible deployments
Implementing least-privilege IAM permissions to secure service interactions
Leveraging CloudWatch for observability, debugging, and operational monitoring
Understanding how cloud systems behave under real operational conditions instead of only theoretical deployments

One of the biggest mindset shifts from this project was realizing that cloud engineering is not simply about deploying services.

It’s about understanding:

operational behavior
reliability
observability
troubleshooting
automation
security
repeatability at scale

Future Improvements

There are several areas I’d continue expanding in future iterations of the platform:

CI/CD automation using GitHub Actions or Jenkins
Multi-environment Terraform deployments (dev/staging/prod)
API Gateway + authentication integration
Advanced observability dashboards using CloudWatch Logs Insights or Grafana
Cross-region replication and disaster recovery workflows
Policy-as-code and automated security validation
Enhanced audit retention and compliance-focused storage strategies

Final Takeaway

Project Aegis started as a cloud security learning project, but it ultimately became an exercise in operational thinking, infrastructure automation, observability, and system reliability.

Building systems is important.

But building systems that are:

repeatable
observable
secure
resilient
operationally reliable

is what truly starts shifting cloud projects toward real engineering platforms.

GitHub Repository

https://github.com/wilfriedbako/Project-Aegis

If you’ve worked on similar cloud security or serverless engineering projects, I’d genuinely enjoy connecting and learning from other approaches and ideas.

#AWS #CloudSecurity #Terraform #InfrastructureAsCode #Serverless #DevOps #CloudEngineering #PlatformEngineering #Observability #CloudArchitecture #Lambda #DynamoDB #Automation #Python