With its ability to store virtually unlimited amounts of data, Amazon S3 is an extremely popular choice for customers looking to store data in the cloud. On a recent project, our client relied on S3 to store uploaded files. During the project, however, a key requirement emerged: every uploaded file had to be confirmed free of malware before being processed. To address this, I initially implemented a serverless scanning solution using ClamAV, an open-source antivirus engine, alongside AWS Lambda to scan files as they were uploaded to S3. While this setup worked as intended, it also introduced unexpected challenges, particularly during periods of high upload activity. Shortly after I completed the ClamAV solution, AWS introduced GuardDuty S3 Malware Protection, a managed service that simplifies malware scanning. In this article, I’ll share my experiences with both solutions and explain why GuardDuty S3 Malware Protection ultimately proved to be the superior choice.
Why Scanning S3 File Uploads Is Crucial
Attackers often exploit software vulnerabilities to upload files containing malware or other malicious code, potentially wreaking havoc on systems and compromising sensitive data. A notable example is the 2017 Equifax breach, one of the most infamous data breaches in history.
In that case, attackers exploited a vulnerability to infiltrate Equifax’s systems and upload malicious files, which let them dig deeper into those systems and steal sensitive information belonging to more than 147 million people. A single security gap that let attackers upload the wrong kind of file was enough.
This underscores the importance of implementing proactive measures to detect and mitigate threats before they escalate. Whether an organisation is managing sensitive customer data or simply hosting files, failing to account for malicious uploads can have devastating and far-reaching implications.
My Initial Approach: Lambda + ClamAV
My initial approach to identifying potentially malicious S3 uploads consisted of two Lambda functions:
- Scan-File Lambda:
  - An S3 event triggered this Lambda whenever a file was uploaded. Each file was then downloaded into the Lambda’s /tmp directory.
  - The Lambda scanned the file using ClamAV (a capability provided by a custom Lambda layer, the details of which can be found here) and deleted any file in which malware was detected.
- Update-Malware-Definitions Lambda:
  - Triggered by EventBridge Scheduler every 12 hours, this function downloaded the up-to-date malware definitions used by ClamAV and stored them in an S3 bucket.
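The Scan-File flow described above can be sketched roughly as follows. This is a minimal illustration, not the code used on the project: the function names, the `/tmp/defs` definitions directory, and the assumption that the layer puts `clamscan` on the PATH are all hypothetical, and fetching the definitions from S3 is left out. The exit-code mapping follows ClamAV's documented convention (0 for clean, 1 for a detection, anything else treated as an error).

```python
import subprocess
from urllib.parse import unquote_plus

CLEAN, INFECTED, ERROR = "CLEAN", "INFECTED", "ERROR"


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def verdict(exit_code):
    """Map a clamscan exit code to a verdict (0 clean, 1 infected, else error)."""
    return {0: CLEAN, 1: INFECTED}.get(exit_code, ERROR)


def handler(event, context, s3=None):
    if s3 is None:
        import boto3  # imported lazily so the helpers above can be tested offline
        s3 = boto3.client("s3")
    for bucket, key in uploaded_objects(event):
        local_path = "/tmp/" + key.replace("/", "_")
        s3.download_file(bucket, key, local_path)
        # Assumption: definitions were already placed in /tmp/defs (hypothetical path).
        result = subprocess.run(
            ["clamscan", "--database=/tmp/defs", local_path],
            capture_output=True,
        )
        if verdict(result.returncode) == INFECTED:
            s3.delete_object(Bucket=bucket, Key=key)
```

Keeping the event parsing and exit-code handling in small pure functions makes them easy to unit test without AWS credentials or a ClamAV install.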
Drawbacks: Bottlenecks During High Activity
While the solution worked well under low to moderate traffic, it struggled when the upload rate spiked. Each Lambda invocation spent around 30 seconds scanning a single file, which led to:
Lambda Concurrency Issues: Multiple files arriving simultaneously caused a surge in Lambda invocations, quickly hitting Lambda concurrency limits.
Delayed Processing: The long execution time per file created bottlenecks, resulting in slower handling of uploads.
These issues undermined the reliability and scalability of the solution, requiring me to pivot to something that could better handle large numbers of uploads.
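A rough back-of-the-envelope calculation shows why a 30-second scan hits the limits so quickly. The sketch below applies Little's law (in-flight invocations equal the arrival rate multiplied by the time per invocation). The 50 uploads per second and the concurrency limit of 1,000 are illustrative assumptions, not figures from the project; check the actual quota for your own account and region.

```python
def required_concurrency(uploads_per_second, scan_seconds):
    """Little's law: in-flight invocations = arrival rate x time per invocation."""
    return uploads_per_second * scan_seconds


def max_sustained_rate(concurrency_limit, scan_seconds):
    """Highest steady upload rate a given concurrency limit can absorb."""
    return concurrency_limit / scan_seconds


# Hypothetical spike: 50 uploads/second, each scan taking 30 seconds.
print(required_concurrency(50, 30))           # 1500
# Assumed concurrency limit of 1,000 with the same 30-second scan.
print(round(max_sustained_rate(1000, 30), 1))  # 33.3
```

Under these assumptions, anything above roughly 33 uploads per second sustained would eventually be throttled, regardless of how the function is tuned.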
Moving to GuardDuty Malware Protection
To address these challenges, I looked to a new managed service from AWS: GuardDuty Malware Protection.
GuardDuty scans new files as they’re uploaded to selected S3 buckets and removes the operational overhead traditionally associated with scanning for malware at scale. Once a file is scanned, GuardDuty can add a tag to it recording the outcome of the scan, which can then be used for post-scan processing.
With that in mind, I decided GuardDuty was the right service for the job. I repurposed the Scan-File Lambda, modifying the code to check the S3 object’s tag rather than perform a scan itself. If the tag indicated the presence of malware, the Lambda moved the file into a separate S3 bucket designated as the quarantine bucket.
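The tag check and quarantine step might look something like the sketch below. It assumes GuardDuty's object tagging is enabled and that the tag key is `GuardDutyMalwareScanStatus` with `THREATS_FOUND` marking a detection, which matches the AWS documentation as I understand it. The function names and the quarantine bucket parameter are hypothetical, and the S3 client is passed in so the logic can be exercised without AWS access.

```python
SCAN_TAG = "GuardDutyMalwareScanStatus"


def scan_status(tag_set):
    """Return the GuardDuty scan status from an S3 TagSet, or None if untagged."""
    for tag in tag_set:
        if tag["Key"] == SCAN_TAG:
            return tag["Value"]
    return None


def quarantine_if_infected(s3, bucket, key, quarantine_bucket):
    """Move the object to the quarantine bucket when GuardDuty flagged it.

    Returns True if the object was moved, False otherwise.
    """
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    if scan_status(tags) != "THREATS_FOUND":
        return False
    # S3 has no native "move": copy to the quarantine bucket, then delete the original.
    s3.copy_object(
        Bucket=quarantine_bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)
    return True
```

In a Lambda handler, `s3` would typically be `boto3.client("s3")`. An object with no scan tag yet is left alone here; whether to retry or treat that case as suspicious is a design decision for the surrounding system.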
This approach provided the following advantages:
Seamless Integration: GuardDuty integrates easily with S3 and EventBridge, allowing the completed malware scan to trigger downstream processing.
Real-Time Insights: The service provides clear visibility into the scan status of files.
By switching to GuardDuty, I eliminated the performance bottlenecks and concurrency issues of the previous Lambda-based setup while also simplifying the upfront work.
A diagram of my solution is below.
Lessons Learned
Reflecting on this journey, a few key insights stand out:
Understand Workload Scalability: While Lambda is a powerful tool, long-running operations such as ClamAV malware scans keep each invocation busy for longer, so bursts of uploads reach Lambda concurrency limits quickly.
Evaluate Managed Solutions: AWS’s managed services, such as GuardDuty, often provide specialised solutions with better performance and reliability than custom architectures.
Keep it Simple: GuardDuty significantly reduced the complexity of my setup, allowing me to focus on other critical aspects of the system.
Conclusion
Scanning files as they hit an S3 bucket remains a critical security practice. While my initial solution with ClamAV and Lambda demonstrated the feasibility of serverless malware scanning, it also highlighted the importance of accounting for spikes and maintaining operational simplicity.
For anyone considering file scanning in the cloud, I highly recommend exploring GuardDuty Malware Protection for S3. It not only enhances security but also removes the burden of maintaining a custom solution. However, there are some considerations to keep in mind.
GuardDuty doesn’t scan files immediately upon upload, so if you have an event-driven architecture that requires the files to be already scanned, you’ll need to implement a decoupling mechanism to ensure those scans have taken place.
Like other AWS services, GuardDuty has quotas, such as a maximum file size, which may require workarounds.
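One way to handle the first consideration is to trigger downstream processing from GuardDuty's scan-result event in EventBridge instead of from the S3 upload event. The sketch below is a hedged illustration: the event pattern and the field names (`s3ObjectDetails`, `scanResultDetails`, `scanResultStatus`) reflect the documented event shape as I understand it, so verify them against a real event before relying on them. The function names are hypothetical.

```python
# Assumed EventBridge rule pattern for GuardDuty S3 scan results.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
}


def parse_scan_result(event):
    """Extract bucket, key and status from a scan-result event (assumed shape)."""
    detail = event["detail"]
    obj = detail["s3ObjectDetails"]
    return {
        "bucket": obj["bucketName"],
        "key": obj["objectKey"],
        "status": detail["scanResultDetails"]["scanResultStatus"],
    }


def safe_to_process(event):
    """Only objects explicitly reported as clean are passed downstream."""
    return parse_scan_result(event)["status"] == "NO_THREATS_FOUND"
```

Treating every status other than `NO_THREATS_FOUND` as not safe means unsupported or failed scans are held back by default rather than silently processed.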
Thanks
Thanks for reading. If you have any questions, feel free to ask in the comments! I'd also love to hear from anyone who's experimented with similar setups. 🙂