Photo by Ed Hardie on Unsplash
Original Source: https://skildops.com/blog/stop-malware-at-the-door-automated-s3-file-scanning-with-aws-guardduty
Amazon S3 is widely used to store all kinds of files, yet most of the time we do not scan these files for malware, which exposes us and our clients to risk. That was the case for one of the projects we were working on. It was business as usual, and we were on a prep call before the official ISO 27001 audit kicked off. We had managed to tick every box except one: when asked whether we scan the files we upload to the S3 bucket, our answer was a straight no. I could see from the evaluator’s face that she wasn’t very happy with our answer, and she asked whether we could get it sorted.
The security audit was just a week away, and we knew that was not enough time to implement a solution. Luckily, a temporary exception was approved because all the files distributed to clients were generated internally by the data engineers. The exception bought us enough time to implement a solution that mitigates the risk of distributing malicious files to clients.
In this article, we will implement a simple serverless solution that automatically mitigates this risk in real time using AWS GuardDuty.
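Key Services Involved:
- Amazon S3: stores the uploaded files
- AWS GuardDuty: scans each uploaded object through a malware protection plan
- Amazon EventBridge: receives the scan results
- Amazon SNS: fans out the scan results and sends notifications
- Amazon SQS: queues the scan results for processing
- AWS Lambda: executes the configured action on the scanned file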
Let’s start by taking a look at the architecture diagram to understand how the solution functions.
Architecture Overview
It all starts with enabling the GuardDuty malware protection plan for the source S3 bucket, either via the console, API or CLI. Once the plan is enabled, S3 event notifications are configured to trigger a scan whenever an object is uploaded to the source bucket.
After the scan, GuardDuty optionally tags the object and publishes the result to EventBridge, where a rule picks it up for further processing.
The EventBridge rule forwards each scan result to an SNS topic, which allows fan-out to multiple subscribers. From there, the message is pushed to an SQS queue, which a Lambda function consumes for final processing.
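Because the scan result passes through SNS and SQS before it reaches Lambda, the function has to peel off those layers first. The snippet below is a minimal sketch of that step, not the code from the repo. It assumes the SNS subscription may or may not use raw message delivery, and the names `unwrap_record` and `handler` are illustrative.

```python
import json


def unwrap_record(record):
    """Return the EventBridge event carried by a single SQS record."""
    body = json.loads(record["body"])
    # Without raw message delivery, SNS wraps the event in a
    # "Notification" envelope and stores the payload as a JSON string.
    if body.get("Type") == "Notification" and "Message" in body:
        body = json.loads(body["Message"])
    return body


def handler(event, context):
    """Lambda entry point for an SQS event source (illustrative only)."""
    processed = 0
    for record in event.get("Records", []):
        scan_event = unwrap_record(record)
        # The business logic (pick an action, execute it, notify)
        # would operate on scan_event here.
        processed += 1
    return {"processed": processed}
```

Handling both the wrapped and the raw shape keeps the function working if the subscription setting is changed later.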
Note:
- At the time of writing, only one source bucket can be associated with a protection plan. Hence, you need to create a dedicated protection plan for every bucket you want to secure.
- The S3 bucket must be in the same region as GuardDuty.
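For readers who prefer the API over the console, here is a rough sketch of enabling a protection plan with boto3. The bucket name, role ARN and region in the usage line are placeholders, the helper names are made up for this example, and the IAM role must already grant GuardDuty the permissions it needs. The Terraform module in the repo is the supported way to deploy, so treat this only as an illustration of the one-plan-per-bucket rule.

```python
def build_plan_request(bucket_name, role_arn, tag_objects=True, prefixes=None):
    """Build keyword arguments for GuardDuty's CreateMalwareProtectionPlan."""
    s3_bucket = {"BucketName": bucket_name}
    if prefixes:
        # Optional: restrict scanning to specific key prefixes.
        s3_bucket["ObjectPrefixes"] = list(prefixes)
    return {
        "Role": role_arn,
        "ProtectedResource": {"S3Bucket": s3_bucket},
        "Actions": {
            "Tagging": {"Status": "ENABLED" if tag_objects else "DISABLED"}
        },
    }


def create_plan(bucket_name, role_arn, region):
    """Create one protection plan for one bucket and return the plan ID."""
    import boto3  # imported lazily so the builder above has no dependencies

    # The client region must match the bucket region.
    client = boto3.client("guardduty", region_name=region)
    response = client.create_malware_protection_plan(
        **build_plan_request(bucket_name, role_arn)
    )
    return response["MalwareProtectionPlanId"]
```

A call would look like `create_plan("my-upload-bucket", "arn:aws:iam::123456789012:role/gd-s3-scan", "eu-west-1")`, repeated once per bucket you want to protect.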
Business Logic
Once the Lambda function pulls a message from SQS, it extracts the scan result and file details from the event that GuardDuty generated and published to EventBridge, and uses them to identify the action it needs to perform. The action that is executed depends on the scan result and on the action configured for that result type.
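To make the extraction step concrete, here is a small sketch that pulls the bucket, key and scan status out of a GuardDuty scan result event and maps the status to one of the result types used in this article (threat, no_threat, failed). The field names follow the GuardDuty scan result event as we understand it. Treating every status other than the two explicit ones as "failed" is our assumption, and the function name is illustrative.

```python
# Map GuardDuty scan statuses to the result types used by the solution.
# Anything not listed here (e.g. UNSUPPORTED, ACCESS_DENIED, FAILED)
# is treated as "failed" -- an assumption made for this sketch.
RESULT_TYPES = {
    "THREATS_FOUND": "threat",
    "NO_THREATS_FOUND": "no_threat",
}


def extract_scan_summary(event):
    """Pull the file details and scan outcome out of a scan result event."""
    detail = event["detail"]
    obj = detail["s3ObjectDetails"]
    result = detail.get("scanResultDetails") or {}
    status = result.get("scanResultStatus", "FAILED")
    # "threats" can be missing or null when nothing was found.
    threats = [t.get("name") for t in (result.get("threats") or [])]
    return {
        "bucket": obj["bucketName"],
        "key": obj["objectKey"],
        "status": status,
        "result_type": RESULT_TYPES.get(status, "failed"),
        "threats": threats,
    }
```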
There are two ways to configure the action: either by setting the environment variable GD_SCANNED_FILE_ACTION for the Lambda function through the respective Terraform variable, or by adding a tag to the object uploaded to the bucket. Details on how to tag an S3 object can be found in the GitHub repo readme.
If an action is configured through both options, the tag attached to the object always takes priority. If no action is configured through either method, a default action is chosen based on the scan result.
If actions are only partially configured through either or both methods, a deep merge produces the final configuration: the S3 object tag has the highest priority, followed by the Lambda function environment variable, with the default action having the lowest priority.
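The priority order is easier to see in code. The sketch below deep-merges three layers, lowest priority first: the defaults, the overrides from the environment variable, and the overrides from the object tag. It assumes both overrides have already been parsed into dictionaries, since the exact encoding of the environment variable and the tag is described in the repo readme. `deep_merge` and `resolve_actions` are names chosen for this example.

```python
import copy

DEFAULT_ACTIONS = {
    "threat": {"action": "delete", "notify_user": True},
    "no_threat": {"action": "do_nothing", "notify_user": False},
    "failed": {"action": "do_nothing", "notify_user": True},
}


def deep_merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_actions(env_overrides=None, tag_overrides=None):
    """Merge defaults < environment variable < object tag."""
    config = copy.deepcopy(DEFAULT_ACTIONS)
    for layer in (env_overrides, tag_overrides):
        if layer:
            config = deep_merge(config, layer)
    return config
```

For example, if the environment variable sets only `{"threat": {"action": "do_nothing"}}` and the tag sets only `{"threat": {"notify_user": False}}`, the resolved "threat" entry takes the action from the environment variable and the notification flag from the tag, while "no_threat" and "failed" keep their defaults.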
After identifying and executing the action, the function can optionally send a notification via an AWS SNS topic.
Default action:
{
    "threat": {
        "action": "delete",
        "notify_user": True,
    },
    "no_threat": {
        "action": "do_nothing",
        "notify_user": False,
    },
    "failed": {
        "action": "do_nothing",
        "notify_user": True,
    },
}
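Given one entry from a configuration like the one above, executing it could look roughly like the following. This is a simplified sketch rather than the repo's implementation: it handles only the two actions shown in the defaults, the subject and message text are invented, and the clients are passed in as arguments so the function is easy to test. In the Lambda function they would be `boto3.client("s3")` and `boto3.client("sns")`.

```python
def apply_action(s3_client, sns_client, topic_arn, bucket, key, settings):
    """Execute the configured action for one object, then notify if asked."""
    action = settings["action"]
    if action == "delete":
        s3_client.delete_object(Bucket=bucket, Key=key)
    elif action != "do_nothing":
        # Only the two actions from the default configuration are covered.
        raise ValueError(f"unsupported action: {action}")

    # The notification is optional: skip it when disabled or when
    # no topic has been configured.
    if settings.get("notify_user") and topic_arn:
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="S3 malware scan result",
            Message=f"Action '{action}' applied to s3://{bucket}/{key}",
        )
    return action
```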
For a detailed guide on how to deploy and configure the solution, please refer to the GitHub repo, which contains the source code, a Terraform module and a detailed readme for deploying the solution on AWS.
