🛡️ Building Project Aegis: Designing a Serverless File Integrity Monitoring System on AWS
One of the biggest challenges in cloud security is ensuring that critical files remain authentic, traceable, and tamper-free after being uploaded into cloud environments.
In industries such as legal systems, healthcare, forensic investigations, and compliance-driven environments, even a small unauthorized file modification can create serious operational and security risks.
That raised an important question for me:
How can you detect when a file has been silently modified while still maintaining a reliable audit trail and real-time operational visibility?
To explore that problem, I designed and deployed Project Aegis, a serverless file integrity monitoring platform built on AWS.
The platform automatically:
- generates SHA-256 hashes for uploaded files
- stores audit history in DynamoDB
- detects file tampering
- triggers real-time alerts using Amazon SNS
- maintains an operational audit workflow using serverless architecture
Unlike traditional monolithic systems, the entire platform was designed around:
- event-driven workflows
- serverless scalability
- operational automation
- infrastructure reproducibility
- observability and monitoring principles
Architecture Overview
The architecture follows a fully event-driven workflow:
Amazon S3 → AWS Lambda → DynamoDB → Amazon SNS
Workflow summary:
- A user uploads a file into Amazon S3
- S3 automatically triggers a Lambda function
- Lambda generates a SHA-256 hash of the uploaded file
- DynamoDB stores historical hash records and audit metadata
- If the same filename is uploaded with different content:
  - the system detects tampering
  - triggers an SNS notification
  - updates audit history
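The first two steps of this workflow depend on the event payload that S3 hands to Lambda. As a minimal sketch (the helper name `extract_uploads` and the sample bucket and key are hypothetical, not taken from the project), a handler might pull the bucket and object key out of each record as shown here. Object keys arrive URL-encoded in S3 event notifications, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def extract_uploads(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        raw_key = s3_info.get("object", {}).get("key")
        if bucket and raw_key:
            # Keys are URL-encoded in the event (a space arrives as '+').
            uploads.append((bucket, unquote_plus(raw_key)))
    return uploads


# Trimmed-down example event; real notifications carry more fields.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/case+file.txt"},
            }
        }
    ]
}
print(extract_uploads(sample_event))
```

Skipping malformed records instead of raising is a design choice for this sketch; a production handler may prefer to fail loudly so that the event is retried.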
Core AWS Services Used
Amazon S3
Amazon S3 acts as the ingestion layer for uploaded files and automatically triggers the processing pipeline using event notifications.
AWS Lambda
AWS Lambda handles the serverless processing logic:
- retrieves uploaded files
- generates SHA-256 hashes
- compares historical audit records
- updates DynamoDB
- triggers SNS notifications when tampering is detected
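The hashing step can be sketched with nothing but the standard library. The function below is an illustration, not the project's actual code: it reads any file-like object in chunks so that large uploads never have to sit fully in memory. In a Lambda function the stream would likely be the `Body` returned by boto3's `get_object`, which is an assumption here; `io.BytesIO` stands in for it.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a file-like object chunk by chunk and return the hex digest."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# io.BytesIO stands in for the uploaded object's body.
print(sha256_of_stream(io.BytesIO(b"hello")))
```

Because the digest is updated incrementally, the result is identical regardless of the chunk size chosen.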
Amazon DynamoDB
DynamoDB stores:
- file metadata
- SHA-256 hashes
- audit history
- modification tracking records
This provides a scalable and serverless audit logging layer.
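To make the stored data concrete, here is one possible shape for an audit record. The attribute names, the choice of the object key as partition key, and the timestamp as sort key are all assumptions made for illustration; the project's real table schema may differ.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    file_key: str     # hypothetical partition key: the S3 object key
    recorded_at: str  # hypothetical sort key: ISO 8601 UTC timestamp
    sha256: str       # hex digest of the file content
    bucket: str       # source bucket name
    status: str       # "NEW", "UNCHANGED", or "TAMPERED"


def make_record(bucket, file_key, sha256, status, now=None):
    """Create an audit record stamped with the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        file_key=file_key,
        recorded_at=now.isoformat(),
        sha256=sha256,
        bucket=bucket,
        status=status,
    )


record = make_record("example-bucket", "test.txt", "ab" * 32, "NEW")
print(asdict(record))
```

Keeping one item per upload, rather than overwriting a single item per file, is what would preserve the history needed for an audit trail.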
Amazon SNS
Amazon SNS sends real-time operational alerts whenever suspicious file modifications are detected.
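A sketch of how such an alert could be assembled follows. The helper name, the message fields, and the topic ARN are hypothetical. The returned dictionary uses the `TopicArn`, `Subject`, and `Message` parameters of the SNS publish operation; the subject is kept ASCII-only and truncated conservatively because SNS restricts subject content and length.

```python
import json

# SNS documents a subject limit of under 100 characters, so truncate conservatively.
MAX_SUBJECT_LENGTH = 99


def build_tamper_alert(topic_arn, bucket, file_key, old_hash, new_hash):
    """Build keyword arguments for an SNS publish call (hypothetical helper)."""
    # Subjects must be ASCII without line breaks; replace anything else.
    safe_key = file_key.encode("ascii", "replace").decode("ascii")
    safe_key = safe_key.replace("\n", " ").replace("\r", " ")
    subject = f"File integrity alert: {safe_key}"[:MAX_SUBJECT_LENGTH]
    message = json.dumps(
        {
            "event": "FILE_TAMPERING_DETECTED",
            "bucket": bucket,
            "file": file_key,
            "previous_sha256": old_hash,
            "current_sha256": new_hash,
        },
        indent=2,
    )
    return {"TopicArn": topic_arn, "Subject": subject, "Message": message}


alert = build_tamper_alert(
    "arn:aws:sns:us-east-1:123456789012:example-topic",  # placeholder ARN
    "example-bucket",
    "test.txt",
    "old-hash",
    "new-hash",
)
print(alert["Subject"])
```

With boto3, the result could be passed along as `sns_client.publish(**alert)`, assuming `sns_client` is an SNS client created elsewhere.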
Amazon CloudWatch
CloudWatch was heavily used for:
- monitoring
- troubleshooting
- Lambda execution tracing
- debugging operational failures
Terraform
After initially building the project manually inside the AWS Console, I later automated the infrastructure using Terraform to improve:
- reproducibility
- scalability
- deployment consistency
- infrastructure management
File Integrity Logic
One of the most important parts of the project was designing reliable file tampering detection.
The system follows this logic:
- Same filename + same content → No alert
- Same filename + modified content → Trigger alert
Instead of relying on filenames alone, the platform compares SHA-256 hashes of the actual file content.
This prevents false alerts when an unchanged file is re-uploaded under the same name, and it makes the audit trail more reliable.
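The decision rule itself is small enough to sketch in a few lines. The function name and the status labels are illustrative assumptions, and a plain dictionary stands in for the hash history that the platform keeps in DynamoDB.

```python
import hashlib
from typing import Optional


def classify_upload(previous_hash: Optional[str], new_hash: str) -> str:
    """Compare a new content hash against the last known hash for a filename."""
    if previous_hash is None:
        return "NEW"        # first time this filename is seen: store, no alert
    if previous_hash == new_hash:
        return "UNCHANGED"  # same filename, same content: no alert
    return "TAMPERED"       # same filename, different content: alert


# In-memory stand-in for the stored hash history.
history = {}
uploads = [
    ("test.txt", b"hello"),
    ("test.txt", b"hello"),
    ("test.txt", b"HELLO WORLD 123"),
]
for name, content in uploads:
    new_hash = hashlib.sha256(content).hexdigest()
    print(name, classify_upload(history.get(name), new_hash))
    history[name] = new_hash
```

Run as written, the three uploads are classified as NEW, UNCHANGED, and TAMPERED in that order.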
Testing the System
Initial Upload
Example:
```text
test.txt → hello
```
Result:
- hash stored in DynamoDB
- no alert triggered
Modified Upload
```text
test.txt → HELLO WORLD 123
```
Result:
- new hash generated
- tampering detected
- SNS email alert triggered
- audit record updated
Challenges & Solutions
One of the most valuable parts of building Project Aegis was troubleshooting real operational and infrastructure problems while designing the platform.
Several engineering challenges forced me to think beyond simply connecting AWS services and instead focus on debugging, observability, system behavior, and operational reliability.
1. Duplicate File Detection
Problem
Uploading files with the same name triggered unnecessary alerts.
Root Cause
Initial logic compared filenames only instead of actual file content.
Solution
Implemented SHA-256 hashing to compare file content integrity directly.
Result
Alerts now trigger only when actual file content changes.
2. SNS Alerts Not Triggering
Problem
Real-time alerts were not being received after file modifications.
Root Cause
Missing SNS publish permissions and incomplete Lambda notification logic.
Solution
Added proper IAM permissions (sns:Publish) and integrated SNS workflows directly into Lambda processing.
Result
Operational alerts now trigger successfully in real time.
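For reference, a least-privilege statement granting only `sns:Publish` on a single topic could look like the policy document built below. The topic ARN is a placeholder and the helper is hypothetical; the project's actual IAM configuration may be structured differently.

```python
import json


def sns_publish_policy(topic_arn):
    """Return an IAM policy document allowing sns:Publish on one topic only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
policy = sns_publish_policy("arn:aws:sns:us-east-1:123456789012:example-topic")
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to one topic ARN rather than `*` keeps the Lambda execution role from publishing to unrelated topics.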
3. Amazon S3 Overwrite Behavior
Problem
Uploading a file with the same name replaced the existing object unexpectedly.
Root Cause
By default (without versioning enabled), Amazon S3 replaces an existing object when a new one is uploaded under the same key.
Solution
Shifted detection logic toward hash comparison rather than filename dependency.
Result
The platform accurately detects modifications even when files are overwritten.
4. CloudWatch Debugging & Observability
Problem
It was initially difficult to verify whether Lambda executions completed successfully.
Solution
Used Amazon CloudWatch logs to trace execution flow, monitor failures, and debug event processing behavior.
Result
Operational visibility improved significantly, and failures became much easier to trace.
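One way to make execution flow easier to trace is to emit a structured log line at each processing stage. The sketch below is an assumption about how that could be done, not the project's logging code; the logger name and field names are hypothetical. In the Lambda Python runtime, records written through the standard `logging` module end up in CloudWatch Logs.

```python
import json
import logging

# Has an effect when run locally; in Lambda the root logger already has a
# handler, so this call does nothing there.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("aegis")  # hypothetical logger name
logger.setLevel(logging.INFO)


def log_event(stage, **fields):
    """Emit one JSON log line for a processing stage and return it."""
    line = json.dumps({"stage": stage, **fields}, sort_keys=True)
    logger.info(line)
    return line


log_event("hash_computed", key="test.txt", status="NEW")
log_event("alert_sent", key="test.txt", status="TAMPERED")
```

JSON lines with a consistent `stage` field make it straightforward to filter a single upload's path through the pipeline when searching the logs.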
5. Infrastructure Deployment Consistency
Problem
Manual deployments introduced operational inconsistency and configuration drift.
Solution
Implemented Infrastructure as Code using Terraform to automate AWS resource provisioning.
Result
The infrastructure can now be recreated consistently using repeatable deployment workflows.
What I Learned
Building Project Aegis reinforced several important cloud engineering and operational concepts for me:
- Designing event-driven serverless architectures for real-time processing
- Applying SHA-256 hashing to validate file integrity and support auditability
- Orchestrating AWS services into a cohesive operational workflow
- Using Infrastructure as Code with Terraform for scalable and reproducible deployments
- Implementing least-privilege IAM permissions to secure service interactions
- Leveraging CloudWatch for observability, debugging, and operational monitoring
- Understanding how cloud systems behave under real operational conditions instead of only theoretical deployments
One of the biggest mindset shifts from this project was realizing that cloud engineering is not simply about deploying services.
It's about understanding:
- operational behavior
- reliability
- observability
- troubleshooting
- automation
- security
- repeatability at scale
Future Improvements
There are several areas I'd continue expanding in future iterations of the platform:
- CI/CD automation using GitHub Actions or Jenkins
- Multi-environment Terraform deployments (dev/staging/prod)
- API Gateway + authentication integration
- Advanced observability dashboards using CloudWatch Logs Insights or Grafana
- Cross-region replication and disaster recovery workflows
- Policy-as-code and automated security validation
- Enhanced audit retention and compliance-focused storage strategies
Final Takeaway
Project Aegis started as a cloud security learning project, but it ultimately became an exercise in operational thinking, infrastructure automation, observability, and system reliability.
Building systems is important.
But building systems that are:
- repeatable
- observable
- secure
- resilient
- operationally reliable
is what truly starts shifting cloud projects toward real engineering platforms.
GitHub Repository
https://github.com/wilfriedbako/Project-Aegis
If you've worked on similar cloud security or serverless engineering projects, I'd genuinely enjoy connecting and learning from other approaches and ideas.

