Amazon Macie to detect sensitive data from your S3 Buckets

#s3 #security #macie #machinelearning

Data leaks and exposure of sensitive information can harm your organization in many ways, including loss of reputation and customer trust as well as long-term financial losses. Security therefore deserves serious attention: prevention, detection guardrails, monitoring, remediation and governance all help you stay on top of the security of your business and its applications. To address these concerns, AWS provides a variety of security services that can be applied at different levels to safeguard your data and your customers' data while improving your organization's security posture.

Amazon Macie is a fully managed service that uses machine learning and pattern matching to help with data security and data privacy. Macie provides a detailed list of the sensitive information it finds in your S3 buckets so you can review it and take action. Actions can be taken manually or automated in response to events using services such as AWS Lambda and AWS Step Functions. Automation prevents delays and human error, allowing you to remediate or raise alerts on threats immediately. Your content is scanned using rules predefined by AWS (managed data identifiers) as well as any rules you define yourself (custom data identifiers). Macie also integrates natively with AWS Organizations so you can govern centrally and operate at scale across your organization.
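To illustrate the event-driven automation mentioned above, here is a minimal sketch of a Lambda handler that receives a Macie finding from EventBridge and decides whether to raise an alert. The `ALERT_SEVERITIES` threshold, the function names and the shape of the returned dictionary are assumptions made for illustration; the actual alerting step (SNS, ticketing, a Step Functions workflow) is left as a comment.

```python
# Sketch of a Lambda handler for Macie findings delivered by EventBridge.
# The severity threshold and the returned dict are illustrative assumptions.

ALERT_SEVERITIES = {"High"}  # assumption: only alert on High severity findings


def summarize_finding(event):
    """Pull the fields needed for alerting out of a Macie finding event."""
    detail = event.get("detail", {})
    resources = detail.get("resourcesAffected", {})
    return {
        "id": detail.get("id"),
        "type": detail.get("type"),
        "severity": detail.get("severity", {}).get("description"),
        "bucket": resources.get("s3Bucket", {}).get("name"),
        "object_key": resources.get("s3Object", {}).get("key"),
    }


def lambda_handler(event, context):
    """Entry point: ignore non-Macie events, flag findings that need action."""
    if event.get("source") != "aws.macie":
        return {"handled": False}
    summary = summarize_finding(event)
    summary["handled"] = True
    summary["alert"] = summary["severity"] in ALERT_SEVERITIES
    # A real handler would publish to SNS, open a ticket, or start a
    # Step Functions workflow here.
    return summary
```

An EventBridge rule that matches events with the source `aws.macie` would route findings to this function.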

Macie can find PII (Personally Identifiable Information) such as names, addresses and contact details; national identification data such as passport numbers, identity numbers, driver's license numbers and Social Security numbers; medical information such as medical and pharmacy data; and even credentials such as AWS secret access keys and private keys.

That's not all. Macie can also detect PFI (Personal Financial Information) such as credit card numbers and bank account details. Macie presents what it detects as findings, which are available through the Macie console, the Macie API, Amazon EventBridge and AWS Security Hub.

To scan the data stored in your S3 buckets, Macie uses a service-linked role to obtain the permissions it needs to build an inventory of all your S3 buckets, monitor them, collect statistics, analyze objects and detect sensitive information. Macie also creates metadata about all your S3 buckets. This metadata is usually refreshed every 24 hours as part of Macie's refresh cycle, and you can also trigger a refresh manually from the Macie console as often as once every 5 minutes. The captured metadata, examples of which are listed below, is used for ongoing and future threat detection.

Macie creates a finding for each threat it detects from the moment you enable it. For example, if someone disables default encryption for a bucket, Macie creates a finding for you to review.

Some of the captured metadata includes:

- Name
- ARN
- Creation date
- Account-level access/permissions
- Shared/cross-account access and replication settings
- Object counts etc.
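The bucket metadata listed above can also be read programmatically. The following is a minimal sketch, assuming a boto3 `macie2` client is passed in and that Macie is already enabled in the Region. The helper name `list_bucket_metadata` and the selection of fields in each row are illustrative choices, not a complete view of what Macie records.

```python
def list_bucket_metadata(macie_client):
    """Yield a small summary of each bucket in Macie's inventory.

    `macie_client` is expected to behave like boto3.client("macie2").
    """
    token = None
    while True:
        # DescribeBuckets is paginated; pass nextToken only after the first page.
        kwargs = {"nextToken": token} if token else {}
        page = macie_client.describe_buckets(**kwargs)
        for bucket in page.get("buckets", []):
            yield {
                "name": bucket.get("bucketName"),
                "arn": bucket.get("bucketArn"),
                "created": bucket.get("bucketCreatedAt"),
                "objects": bucket.get("objectCount"),
                "public": bucket.get("publicAccess", {}).get("effectivePermission"),
                "shared": bucket.get("sharedAccess"),
            }
        token = page.get("nextToken")
        if not token:
            break


# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   for row in list_bucket_metadata(boto3.client("macie2")):
#       print(row)
```

Taking the client as a parameter keeps the helper easy to exercise with a stub instead of a live account.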

During scanning and threat detection, Macie looks for unencrypted buckets, publicly accessible buckets and buckets shared with other accounts without being explicitly allowed, and it reports policy findings under the categories listed below.

- S3BlockPublicAccessDisabled
- S3BucketEncryptionDisabled
- S3BucketPublic
- S3BucketReplicatedExternally
- S3BucketSharedExternally
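As a rough sketch of how these policy findings could be retrieved through the Macie API, the helper below lists finding IDs filtered by type and then fetches the details in batches. It assumes a boto3 `macie2` client is passed in. The type strings use the fully qualified `Policy:IAMUser/...` form that Macie reports, and the helper name `fetch_policy_findings` is hypothetical.

```python
# Fully qualified names of the policy finding types discussed above.
POLICY_FINDING_TYPES = [
    "Policy:IAMUser/S3BlockPublicAccessDisabled",
    "Policy:IAMUser/S3BucketEncryptionDisabled",
    "Policy:IAMUser/S3BucketPublic",
    "Policy:IAMUser/S3BucketReplicatedExternally",
    "Policy:IAMUser/S3BucketSharedExternally",
]


def fetch_policy_findings(macie_client, finding_types=None):
    """Return full finding records for the given policy finding types."""
    types = list(finding_types or POLICY_FINDING_TYPES)
    criteria = {"criterion": {"type": {"eq": types}}}

    # Step 1: collect matching finding IDs, following pagination tokens.
    ids, token = [], None
    while True:
        kwargs = {"findingCriteria": criteria}
        if token:
            kwargs["nextToken"] = token
        page = macie_client.list_findings(**kwargs)
        ids.extend(page.get("findingIds", []))
        token = page.get("nextToken")
        if not token:
            break

    # Step 2: fetch details in batches (GetFindings accepts up to 50 IDs per call).
    findings = []
    for start in range(0, len(ids), 50):
        batch = ids[start:start + 50]
        findings.extend(macie_client.get_findings(findingIds=batch).get("findings", []))
    return findings
```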

Each finding has a severity and general information about the threat, including the bucket name and when and how Macie detected it. Findings are retained for 90 days from the date they are created and can be viewed and explored through the Macie console, the Macie API, EventBridge and AWS Security Hub so you can take the necessary steps to mitigate the detected issues. You can also suppress findings if you are sure they are acceptable under the compliance policies and regulations you have in place.
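Suppression can be expressed as a findings filter whose action is `ARCHIVE`. The sketch below only builds the request parameters; the helper name, the rule name and the choice to match on one finding type for one bucket are assumptions for illustration. The resulting dictionary could be passed to the boto3 `macie2` client's `create_findings_filter` call.

```python
def build_suppression_rule(name, bucket_name, finding_type):
    """Build parameters for a findings filter that archives matching findings."""
    return {
        "name": name,
        "action": "ARCHIVE",  # ARCHIVE makes the filter act as a suppression rule
        "description": "Suppress %s for bucket %s" % (finding_type, bucket_name),
        "findingCriteria": {
            "criterion": {
                "type": {"eq": [finding_type]},
                "resourcesAffected.s3Bucket.name": {"eq": [bucket_name]},
            }
        },
    }


# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   rule = build_suppression_rule(
#       "public-website-bucket", "my-public-site", "Policy:IAMUser/S3BucketPublic")
#   boto3.client("macie2").create_findings_filter(**rule)
```

Scoping the rule to a specific bucket and finding type keeps it from hiding unrelated findings.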

Important to Note:
Your S3 buckets could use different server-side or client-side encryption methods, and depending on the method configured for each bucket, there are some implications that can prevent Amazon Macie from analyzing objects and detecting threats in them.

For instance, if your buckets use server-side encryption with SSE-S3 or with SSE-KMS using an AWS managed key, there are no issues: Macie can scan, detect and report threats.

However, if you encrypt your S3 data with a customer managed KMS key (CMK), you have to explicitly allow Macie to use that key. Otherwise, the sensitive data discovery job, which can be configured to run once or on a daily, weekly or monthly schedule, cannot decrypt the objects and therefore cannot analyze them.
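The two pieces described above, allowing Macie to use the key and scheduling the discovery job, can be sketched as follows. This assumes the key and the buckets live in the same account as Macie, so the key policy grants `kms:Decrypt` to Macie's service-linked role. The helper names, the statement `Sid` and the default weekday and day of month are hypothetical values chosen for illustration. The job dictionary matches the parameters of the boto3 `macie2` client's `create_classification_job` call.

```python
import uuid


def macie_key_policy_statement(account_id):
    """Key policy statement letting Macie's service-linked role decrypt objects."""
    role_arn = (
        "arn:aws:iam::%s:role/aws-service-role/"
        "macie.amazonaws.com/AWSServiceRoleForAmazonMacie" % account_id
    )
    return {
        "Sid": "AllowMacieToDecrypt",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }


def build_job_request(name, account_id, buckets, frequency="ONE_TIME"):
    """Build parameters for a sensitive data discovery job.

    frequency is one of ONE_TIME, DAILY, WEEKLY or MONTHLY.
    """
    schedules = {
        "DAILY": {"dailySchedule": {}},
        "WEEKLY": {"weeklySchedule": {"dayOfWeek": "MONDAY"}},  # assumed weekday
        "MONTHLY": {"monthlySchedule": {"dayOfMonth": 1}},  # assumed day
    }
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "ONE_TIME" if frequency == "ONE_TIME" else "SCHEDULED",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}]
        },
    }
    if frequency != "ONE_TIME":
        request["scheduleFrequency"] = schedules[frequency]
    return request


# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   job = build_job_request("weekly-scan", "111122223333", ["my-bucket"], "WEEKLY")
#   boto3.client("macie2").create_classification_job(**job)
```

The policy statement would be added to the key's existing policy rather than replacing it.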

Similarly, with SSE-C (server-side encryption with customer-provided keys), Macie cannot decrypt and analyze the objects, so it reports only metadata about them. The same goes for objects that use client-side encryption.

Also note that Macie cannot analyze audio, video or image files. For those you may have to use another AWS service such as Amazon Rekognition.

Furthermore, keep in mind that an organization can have only one Macie administrator account at a time, and an account cannot be both a Macie administrator and a member account.

If you ever change the Macie administrator account, note that all member accounts are removed from the previous administrator. Macie is not disabled in those member accounts, however.

A member account can be associated with only one administrator at a time. Once it is associated through AWS Organizations, it cannot disassociate itself from that administrator account.

Thank you for your time...
