wilfried bako

Building Project Aegis: Designing a Serverless File Integrity Monitoring System on AWS

One of the biggest challenges in cloud security is ensuring that critical files remain authentic, traceable, and tamper-free after being uploaded into cloud environments.

In industries such as legal systems, healthcare, forensic investigations, and compliance-driven environments, even a small unauthorized file modification can create serious operational and security risks.

That raised an important question for me:

How can you detect when a file has been silently modified while still maintaining a reliable audit trail and real-time operational visibility?

To explore that problem, I designed and deployed Project Aegis, a serverless file integrity monitoring platform built on AWS.

The platform automatically:

  • generates SHA-256 hashes for uploaded files
  • stores audit history in DynamoDB
  • detects file tampering
  • triggers real-time alerts using Amazon SNS
  • maintains an operational audit workflow using serverless architecture

Unlike traditional monolithic systems, the entire platform was designed around:

  • event-driven workflows
  • serverless scalability
  • operational automation
  • infrastructure reproducibility
  • observability and monitoring principles

Architecture Overview

The architecture follows a fully event-driven workflow:

Amazon S3 → AWS Lambda → DynamoDB → Amazon SNS

Workflow summary:

  1. A user uploads a file into Amazon S3
  2. S3 automatically triggers a Lambda function
  3. Lambda generates a SHA-256 hash of the uploaded file
  4. DynamoDB stores historical hash records and audit metadata
  5. If the same filename is uploaded with different content:
  • the system detects tampering
  • triggers an SNS notification
  • updates audit history
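
Step 2 of the workflow can be sketched as follows. This is a minimal illustration of reading an S3 event notification inside a Lambda handler, not the project's actual code: the function name `extract_uploaded_objects` and the bucket name `aegis-uploads` are assumptions made for the example. The event layout (`Records` → `s3` → `bucket` / `object`) follows the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploads = []
    for record in event.get("Records", []):
        # Ignore anything that is not an object-created event.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


# Hypothetical event, trimmed to the fields the function reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "aegis-uploads"},
                "object": {"key": "case+files/test.txt"},
            },
        }
    ]
}

print(extract_uploaded_objects(sample_event))
# [('aegis-uploads', 'case files/test.txt')]
```

Decoding the key matters here: if the encoded form were used as the lookup key, a file with a space in its name would never match its own history.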

Core AWS Services Used

Amazon S3

Amazon S3 acts as the ingestion layer for uploaded files and automatically triggers the processing pipeline using event notifications.

AWS Lambda

AWS Lambda handles the serverless processing logic:

  • retrieves uploaded files
  • generates SHA-256 hashes
  • compares historical audit records
  • updates DynamoDB
  • triggers SNS notifications when tampering is detected
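
The hashing step might look like the sketch below. It assumes the file is available as a readable byte stream (the `Body` returned by boto3's `get_object` behaves this way) and reads it in chunks so large files do not need to fit in memory. The helper name `sha256_of_stream` and the chunk size are illustrative choices, not taken from the project.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream incrementally and return the hex digest."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# io.BytesIO stands in for the S3 object body in this example.
print(sha256_of_stream(io.BytesIO(b"hello")))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```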

Amazon DynamoDB

DynamoDB stores:

  • file metadata
  • SHA-256 hashes
  • audit history
  • modification tracking records

This provides a scalable and serverless audit logging layer.
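
One possible shape for an audit record is sketched below. The attribute names (`file_key`, `recorded_at`, `sha256`, `size_bytes`, `status`) and the choice of file key plus timestamp as the table key are assumptions for illustration; the article does not specify the actual table schema.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    file_key: str      # hypothetical partition key
    recorded_at: str   # hypothetical sort key (UTC, ISO 8601)
    sha256: str
    size_bytes: int
    status: str        # e.g. "NEW", "UNCHANGED", "TAMPERED"


def build_audit_record(file_key, content, status, now=None):
    """Build one audit entry for an uploaded file's bytes."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        file_key=file_key,
        # Fixed-width timestamps sort chronologically as plain strings.
        recorded_at=now.isoformat(timespec="microseconds"),
        sha256=hashlib.sha256(content).hexdigest(),
        size_bytes=len(content),
        status=status,
    )


record = build_audit_record("test.txt", b"hello", "NEW")
item = asdict(record)  # a plain dict, usable as Item= in boto3's Table.put_item
print(item["file_key"], item["size_bytes"], item["status"])
```

Keeping one item per upload, rather than overwriting a single row per file, is what lets the table serve as a history instead of only a latest-state lookup.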

Amazon SNS

Amazon SNS sends real-time operational alerts whenever suspicious file modifications are detected.

Amazon CloudWatch

CloudWatch was heavily used for:

  • monitoring
  • troubleshooting
  • Lambda execution tracing
  • debugging operational failures

Terraform

After first building the project manually in the AWS Console, I automated the infrastructure with Terraform to improve:

  • reproducibility
  • scalability
  • deployment consistency
  • infrastructure management

File Integrity Logic

One of the most important parts of the project was designing reliable file tampering detection.

The system follows this logic:

  • Same filename + same content → No alert
  • Same filename + modified content → Trigger alert

Instead of relying only on filenames, the platform generates SHA-256 hashes to compare actual file content integrity.

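Because identical content always produces the same hash, re-uploading an unchanged file does not raise a false positive, and every alert corresponds to a real change in content.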
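
The comparison rule can be written as a small pure function. This is a sketch under stated assumptions: the function name `classify_upload` and the status labels are hypothetical, and a first-time upload is treated as its own case with no alert, matching the behavior described in the testing section.

```python
from typing import Optional


def classify_upload(previous_hash: Optional[str], new_hash: str) -> str:
    """Compare the stored hash for a filename with the hash of the new upload."""
    if previous_hash is None:
        return "NEW"          # first time this filename is seen: store, no alert
    if previous_hash == new_hash:
        return "UNCHANGED"    # same filename + same content: no alert
    return "TAMPERED"         # same filename + modified content: trigger alert


print(classify_upload(None, "abc"))    # NEW
print(classify_upload("abc", "abc"))   # UNCHANGED
print(classify_upload("abc", "def"))   # TAMPERED
```

Keeping the decision separate from the AWS calls makes it easy to unit test without any cloud resources.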


Testing the System

Initial Upload

Example:

```text
test.txt → hello
```

Result:

  • hash stored in DynamoDB
  • no alert triggered

Modified Upload

```text
test.txt → HELLO WORLD 123
```

Result:

  • new hash generated
  • tampering detected
  • SNS email alert triggered
  • audit record updated
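
The same scenario can be replayed locally without any AWS resources. In this sketch an in-memory dictionary stands in for the DynamoDB table; the class `InMemoryAuditLog` and its method are hypothetical names used only for the illustration.

```python
import hashlib


class InMemoryAuditLog:
    """Stand-in for the audit table: file name -> list of recorded hashes."""

    def __init__(self):
        self.history = {}

    def record_upload(self, name, content):
        """Record an upload and return True when an alert should fire."""
        new_hash = hashlib.sha256(content).hexdigest()
        previous = self.history.setdefault(name, [])
        tampered = bool(previous) and previous[-1] != new_hash
        # Store the hash on first sight or when the content changed.
        if not previous or tampered:
            previous.append(new_hash)
        return tampered


log = InMemoryAuditLog()
print(log.record_upload("test.txt", b"hello"))            # False (initial upload)
print(log.record_upload("test.txt", b"hello"))            # False (unchanged re-upload)
print(log.record_upload("test.txt", b"HELLO WORLD 123"))  # True  (modified upload)
print(len(log.history["test.txt"]))                       # 2
```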


Challenges & Solutions

One of the most valuable parts of building Project Aegis was troubleshooting real operational and infrastructure problems while designing the platform.

Several engineering challenges forced me to think beyond simply connecting AWS services and instead focus on debugging, observability, system behavior, and operational reliability.

1. Duplicate File Detection

Problem

Uploading files with the same name triggered unnecessary alerts.

Root Cause

Initial logic compared filenames only instead of actual file content.

Solution

Implemented SHA-256 hashing to compare file content integrity directly.

Result

Alerts now trigger only when actual file content changes.


2. SNS Alerts Not Triggering

Problem

Real-time alerts were not being received after file modifications.

Root Cause

Missing SNS publish permissions and incomplete Lambda notification logic.

Solution

Added proper IAM permissions (sns:Publish) and integrated SNS workflows directly into Lambda processing.

Result

Operational alerts now trigger successfully in real time.
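
A sketch of both halves of that fix is shown below: a policy document that grants only `sns:Publish` on a single topic, and a function that publishes the alert. The topic ARN, function names, and message wording are assumptions for illustration. The `publish(TopicArn=..., Subject=..., Message=...)` call matches boto3's SNS client, and a small recording stub is used here so the example runs without AWS credentials.

```python
def publish_only_policy(topic_arn):
    """IAM policy document allowing publish on one topic and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sns:Publish", "Resource": topic_arn}
        ],
    }


def send_tamper_alert(sns_client, topic_arn, file_key, old_hash, new_hash):
    """Publish a tamper alert through any client exposing boto3's publish()."""
    subject = f"Tampering detected: {file_key}"
    message = "\n".join(
        [
            f"File: {file_key}",
            f"Previous SHA-256: {old_hash}",
            f"Current SHA-256:  {new_hash}",
        ]
    )
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject[:100],  # SNS limits subjects to 100 ASCII characters
        Message=message,
    )


class RecordingClient:
    """Test double that records publish() calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def publish(self, **kwargs):
        self.calls.append(kwargs)
        return {"MessageId": "local-test"}


# Hypothetical topic ARN; in Lambda the client would be boto3.client("sns").
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:aegis-alerts"
client = RecordingClient()
send_tamper_alert(client, TOPIC_ARN, "test.txt", "aaa", "bbb")
print(client.calls[0]["Subject"])  # Tampering detected: test.txt
```

Passing the client in as a parameter keeps the alert logic testable, and scoping the policy to one topic ARN keeps the Lambda role close to least privilege.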


3. Amazon S3 Overwrite Behavior

Problem

Uploading a file with the same name replaced the existing object unexpectedly.

Root Cause

Amazon S3 treats an upload to an existing key as an overwrite: unless versioning is enabled on the bucket, the previous object is replaced.

Solution

Shifted detection logic toward hash comparison rather than filename dependency.

Result

The platform accurately detects modifications even when files are overwritten.


4. CloudWatch Debugging & Observability

Problem

It was initially difficult to verify whether Lambda executions completed successfully.

Solution

Used Amazon CloudWatch logs to trace execution flow, monitor failures, and debug event processing behavior.

Result

Lambda executions and failures could be traced in the logs, which made troubleshooting far more dependable.
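
One way to make execution flow easy to trace is to emit one JSON line per processing stage, which CloudWatch Logs can then filter and query by field. The sketch below is an assumption about how this could be done, not the project's logging code; the logger name `aegis`, the helper `log_event`, and the field names are hypothetical.

```python
import json
import logging

# Outside Lambda this sets up console output; inside Lambda the runtime has
# already attached a handler, so basicConfig does nothing there.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("aegis")
logger.setLevel(logging.INFO)


def log_event(stage, **fields):
    """Log one structured line per processing stage and return it."""
    line = json.dumps({"stage": stage, **fields}, sort_keys=True)
    logger.info(line)
    return line


log_event("object_received", key="test.txt", bucket="aegis-uploads")
log_event("hash_computed", key="test.txt", sha256_prefix="2cf24dba")
log_event("tamper_check", key="test.txt", result="UNCHANGED")
```

With a line like this at each stage, a missing `tamper_check` entry for a given key shows where an execution stopped.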


5. Infrastructure Deployment Consistency

Problem

Manual deployments introduced operational inconsistency and configuration drift.

Solution

Implemented Infrastructure as Code using Terraform to automate AWS resource provisioning.

Result

The infrastructure can now be recreated consistently using repeatable deployment workflows.


What I Learned

Building Project Aegis reinforced several important cloud engineering and operational concepts for me:

  • Designing event-driven serverless architectures for real-time processing
  • Applying SHA-256 hashing to enforce integrity validation and auditability
  • Orchestrating AWS services into a cohesive operational workflow
  • Using Infrastructure as Code with Terraform for scalable and reproducible deployments
  • Implementing least-privilege IAM permissions to secure service interactions
  • Leveraging CloudWatch for observability, debugging, and operational monitoring
  • Understanding how cloud systems behave under real operational conditions instead of only theoretical deployments

One of the biggest mindset shifts from this project was realizing that cloud engineering is not simply about deploying services.

It's about understanding:

  • operational behavior
  • reliability
  • observability
  • troubleshooting
  • automation
  • security
  • repeatability at scale

Future Improvements

There are several areas I'd continue expanding in future iterations of the platform:

  • CI/CD automation using GitHub Actions or Jenkins
  • Multi-environment Terraform deployments (dev/staging/prod)
  • API Gateway + authentication integration
  • Advanced observability dashboards using CloudWatch Insights or Grafana
  • Cross-region replication and disaster recovery workflows
  • Policy-as-code and automated security validation
  • Enhanced audit retention and compliance-focused storage strategies

Final Takeaway

Project Aegis started as a cloud security learning project, but it ultimately became an exercise in operational thinking, infrastructure automation, observability, and system reliability.

Building systems is important.

But building systems that are:

  • repeatable
  • observable
  • secure
  • resilient
  • operationally reliable

is what truly starts shifting cloud projects toward real engineering platforms.


GitHub Repository

https://github.com/wilfriedbako/Project-Aegis

If you've worked on similar cloud security or serverless engineering projects, I'd genuinely enjoy connecting and learning from other approaches and ideas.

#AWS #CloudSecurity #Terraform #InfrastructureAsCode #Serverless #DevOps #CloudEngineering #PlatformEngineering #Observability #CloudArchitecture #Lambda #DynamoDB #Automation #Python