DEV Community

Jimmy Dahlqvist for AWS Heroes


Serverless AI powered content moderation service

Around one year ago I wrote a blog post about creating a File Manager service. In this post we will use that service as our base and extend it with content moderation, using GuardDuty and Rekognition to assist in the task. As usual, everything will be serverless and event-driven.

Recap

To refresh everyone's memory let's start with a short recap.

The service stores files in S3 and keeps a record of all the files in a DynamoDB table. The system overview looks like this: an API exposes functionality to the user, and the work is then carried out in a serverless and event-driven way.

Architecture overview for the original file manager

The upload flow is initiated by a client calling the API, where a Lambda function creates a pre-signed S3 URL that the client can use to upload the file. We don't upload files directly through the API, since Amazon API Gateway has a maximum payload size of 10 MB, which would become a limitation when supporting all kinds of files.
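A minimal sketch of such a Lambda handler follows. It is not the original implementation. It assumes a Python runtime with boto3, a bucket name passed in a hypothetical `UPLOAD_BUCKET_NAME` environment variable, a generated UUID as the object key, and a 15-minute expiry. The S3 client can be injected, so the logic can be exercised without AWS credentials.

```python
import json
import os
import uuid


def create_upload_url(s3_client, bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a pre-signed URL that allows an HTTP PUT of `key` into `bucket`."""
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def handler(event, context, s3_client=None):
    """API Gateway proxy handler; the request and response shapes are hypothetical."""
    if s3_client is None:
        import boto3  # only needed when running inside Lambda

        s3_client = boto3.client("s3")
    bucket = os.environ["UPLOAD_BUCKET_NAME"]  # hypothetical variable name
    file_id = str(uuid.uuid4())  # generated id used as the object key
    url = create_upload_url(s3_client, bucket, file_id)
    return {
        "statusCode": 200,
        "body": json.dumps({"id": file_id, "uploadUrl": url}),
    }
```

The client then sends the file bytes with an HTTP PUT to `uploadUrl`. This means large files never pass through API Gateway.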

When the client then uploads the file to S3, an event is generated (the part below the dashed line in the image). That event invokes a StepFunction, which updates the file inventory.

Upload flow for the original file manager

Extended architecture

In the extended architecture we add functionality that uses GuardDuty S3 malware scanning and Rekognition for image moderation. GuardDuty scans new files that arrive in the S3 bucket I call staging, adds a tag to the object, and posts the scan result to the default event-bus. If the result is OK, the next StepFunction is invoked, which uses Rekognition for image moderation. I have implemented the same logic in this StepFunction: it adds a tag to the object and posts an event onto an event-bus. Finally, files are moved to either a quarantine or a storage bucket.

Every part of the solution is decoupled and can run independently. A saga pattern moves the process on to the next phase: each step publishes an event that the next step reacts to.
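To make the choreography concrete, here is a small, hypothetical Python model of the hand-offs between phases. It only decides which phase an event leads to, and runs no AWS code. The source names and status strings come from this post. Routing every non-clean result to quarantine is my assumption, since only the permanent storage path is shown here.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    IMAGE_MODERATION = "image-moderation"
    PERMANENT_STORAGE = "permanent-storage"
    QUARANTINE = "quarantine"


def next_phase(source: str, status: str) -> Optional[Phase]:
    """Decide which phase an event hands over to.

    Each phase only reacts to events from the previous one, so there is
    no central orchestrator (a choreography-style saga).
    """
    if source == "aws.guardduty":
        # Only files without malware continue to Rekognition (assumed: the rest is quarantined).
        return Phase.IMAGE_MODERATION if status == "NO_THREATS_FOUND" else Phase.QUARANTINE
    if source == "ImageModeration":
        # Assumption: anything other than a clean moderation result is quarantined.
        return Phase.PERMANENT_STORAGE if status == "NO_THREATS_FOUND" else Phase.QUARANTINE
    return None  # not an event this saga reacts to
```

Because each phase only knows the event it consumes, a new phase can be added by subscribing to an existing event without changing the phases before it.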

Extended overview with content moderation

Now let's dig a bit deeper into each of the parts of this solution.

Malware scanning

The GuardDuty malware scanning doesn't require much setup. It is a fully managed feature in GuardDuty, and the only thing required is to configure it. GuardDuty then picks up new objects automatically.

Malware Scan Flow

To achieve this flow, the only thing we need to do is create an S3MalwareProtectionPlan (an AWS::GuardDuty::MalwareProtectionPlan resource) and give it an IAM role with appropriate permissions. One important thing to remember: if you encrypt your objects in S3 with a customer managed KMS key, don't forget to give GuardDuty permission to decrypt using this key.

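  S3MalwareProtectionPlan:
    Type: AWS::GuardDuty::MalwareProtectionPlan
    Properties:
      Actions:
        Tagging:
          Status: ENABLED # GuardDuty tags each scanned object with the scan result
      ProtectedResource:
        S3Bucket:
          BucketName: # the staging bucket that receives uploads
            Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
      Role: !GetAtt S3MalwareProtectionPlanRole.Arn # IAM role GuardDuty assumes (not shown here)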
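Because the plan has Tagging enabled, the verdict is also stored on the object itself. GuardDuty uses the tag key `GuardDutyMalwareScanStatus`, with values such as `NO_THREATS_FOUND` or `THREATS_FOUND`. Below is a small sketch of reading that verdict from the `TagSet` list that S3 `GetObjectTagging` returns. The helper name is hypothetical and is not part of the solution.

```python
from typing import Optional

GUARDDUTY_TAG_KEY = "GuardDutyMalwareScanStatus"


def guardduty_scan_status(tag_set: list) -> Optional[str]:
    """Return the GuardDuty verdict from an S3 TagSet, or None if the object has not been scanned yet."""
    for tag in tag_set:
        if tag.get("Key") == GUARDDUTY_TAG_KEY:
            return tag.get("Value")
    return None
```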

Image moderation

The moderation part with Rekognition involves a few more steps. A StepFunction is invoked by the result of the malware scan and calls Rekognition to moderate the image. The StepFunction then tags the object and posts the scan result onto a custom EventBridge event-bus. Again, if you encrypt your objects in S3 with a customer managed KMS key, don't forget to grant permission to decrypt using this key. Otherwise Rekognition fails with a strange "Unsupported" error that gives no clear indication of why it failed.
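The decision logic of the state machine shown below fits in a few lines of Python. This sketch only mirrors its Choice states and is not code that runs in the solution. `moderation_labels` is assumed to be the `ModerationLabels` list from a Rekognition `DetectModerationLabels` response.

```python
SUPPORTED_CONTENT_TYPES = {"image/png", "image/jpeg"}


def moderation_status(content_type: str, moderation_labels: list) -> str:
    """Mirror the 'Is File Supported?' and 'Was Threats Found?' Choice states."""
    if content_type not in SUPPORTED_CONTENT_TYPES:
        return "FILE_NOT_SUPPORTED"
    # The state machine counts labels with States.ArrayLength; any label counts as a threat.
    threats_found = len(moderation_labels)
    return "THREATS_FOUND" if threats_found > 0 else "NO_THREATS_FOUND"
```

Any returned label counts as a threat. Rekognition only returns labels at or above `MinConfidence`, which defaults to 50 percent when not set. Passing a higher `MinConfidence` in the Moderate Image task is therefore one way to tune how strict the moderation is.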

Image moderation Flow

To achieve this flow, all we need to do is create the StateMachine and set up the events it should be invoked on.

  ModerateImageStateMachineStandard:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: StateMachine/moderation.asl.yaml
      DefinitionSubstitutions:
        EventBusName: 
          Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
      Tracing:
        Enabled: true
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - logs:*
              Resource: "*"
        - S3FullAccessPolicy:
            BucketName: 
              Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
        - EventBridgePutEventsPolicy:
            EventBusName: 
              Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
        - RekognitionDetectOnlyPolicy: {}
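      Events:
        GuardDutyMalwareScanResult:
          Type: EventBridgeRule # no EventBusName, so the rule is on the default bus
          Properties:
            InputPath: $.detail # pass only the event detail to the state machine
            Pattern:
              source:
                - aws.guardduty
              detail-type:
                - GuardDuty Malware Protection Object Scan Result
              detail:
                scanResultDetails:
                  scanResultStatus:
                    - NO_THREATS_FOUND # only malware-free files continue to moderation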
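To show what the Pattern and InputPath do, here is a deliberately simplified Python model of EventBridge matching. It supports only the exact-value lists and nested objects used in this template. Real EventBridge patterns support much more, such as prefix and numeric matching. The sample event is hypothetical and trimmed to the fields the rule and state machine read; real GuardDuty events carry more.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: nested dicts, with leaf lists of allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
    "detail": {"scanResultDetails": {"scanResultStatus": ["NO_THREATS_FOUND"]}},
}

# Hypothetical, trimmed sample event.
SAMPLE_EVENT = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Malware Protection Object Scan Result",
    "detail": {
        "s3ObjectDetails": {"bucketName": "my-staging-bucket", "objectKey": "a1b2c3"},
        "scanResultDetails": {"scanResultStatus": "NO_THREATS_FOUND"},
    },
}

if matches(PATTERN, SAMPLE_EVENT):
    # InputPath: $.detail means the state machine only receives the detail object,
    # which is why its states read $.s3ObjectDetails.bucketName directly.
    state_machine_input = SAMPLE_EVENT["detail"]
    print(state_machine_input["s3ObjectDetails"]["objectKey"])  # prints a1b2c3
```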

The StateMachine definition is rather large and needs some magic. S3 has no way to append a single tag to an object, because PutObjectTagging replaces the entire tag set. We therefore first fetch all existing tags, append our new tag, and put the entire array of tags back on the object. This would probably be easier in a Lambda function, but where is the fun in that? Intrinsic functions for the win!

Comment: Moderate images using Rekognition
StartAt: Debug
States:
  Debug:
    Type: Pass
    Next: Get Object Metadata
  Get Object Metadata:
    Type: Task
    Parameters:
      Bucket.$: $.s3ObjectDetails.bucketName
      Key.$: $.s3ObjectDetails.objectKey
    Resource: arn:aws:states:::aws-sdk:s3:headObject
    Next: Get Object Tags
    ResultPath: $.S3MetaData
  Get Object Tags:
    Type: Task
    Parameters:
      Bucket.$: $.s3ObjectDetails.bucketName
      Key.$: $.s3ObjectDetails.objectKey
    Resource: arn:aws:states:::aws-sdk:s3:getObjectTagging
    Next: Is File Supported?
    ResultPath: $.s3Tags
  Is File Supported?:
    Type: Choice
    Choices:
      - Or:
          - Variable: $.S3MetaData.ContentType
            StringMatches: image/png
          - Variable: $.S3MetaData.ContentType
            StringMatches: image/jpeg
        Next: Moderate Image
    Default: File Not Supported
  File Not Supported:
    Type: Pass
    Next: Add FILE_NOT_SUPPORTED to Object Tags Array
    Parameters:
      ContentTypes: []
      ModerationLabels: []
      ModerationModelVersion: "7.0"
      ThreatsFound: "-1"
    ResultPath: $.RekognitionModeration
  Add FILE_NOT_SUPPORTED to Object Tags Array:
    Type: Pass
    ResultPath: $.scanResult
    Parameters:
      status: FILE_NOT_SUPPORTED
      newTagSet.$: >-
        States.StringToJson(States.Format('[{},{}]',
        States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
        '[]'),0), '{"Key":"ImageModerationStatus","Value":"FILE_NOT_SUPPORTED"}'))
    Next: Tag S3 Object
  Moderate Image:
    Type: Task
    Parameters:
      Image:
        S3Object:
          Bucket.$: $.s3ObjectDetails.bucketName
          Name.$: $.s3ObjectDetails.objectKey
    Resource: arn:aws:states:::aws-sdk:rekognition:detectModerationLabels
    Next: File Supported
    ResultPath: $.RekognitionModeration
  File Supported:
    Type: Pass
    Parameters:
      ThreatsFound.$: States.ArrayLength($.RekognitionModeration.ModerationLabels)
      ContentTypes.$: $.RekognitionModeration.ContentTypes
      ModerationLabels.$: $.RekognitionModeration.ModerationLabels
      ModerationModelVersion.$: $.RekognitionModeration.ModerationModelVersion
    ResultPath: $.RekognitionModeration
    Next: Was Threats Found?
  Was Threats Found?:
    Type: Choice
    Choices:
      - Variable: $.RekognitionModeration.ThreatsFound
        NumericGreaterThan: 0
        Next: Add THREATS_DETECTED to Object Tags Array
    Default: Add NO_THREATS to Object Tags Array
  Add THREATS_DETECTED to Object Tags Array:
    Type: Pass
    Parameters:
      status: THREATS_FOUND
      newTagSet.$: >-
        States.StringToJson(States.Format('[{},{}]',
        States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
        '[]'),0), '{"Key":"ImageModerationStatus","Value":"THREATS_FOUND"}'))
    ResultPath: $.scanResult
    Next: Tag S3 Object
  Add NO_THREATS to Object Tags Array:
    Type: Pass
    Next: Tag S3 Object
    Parameters:
      status: NO_THREATS_FOUND
      newTagSet.$: >-
        States.StringToJson(States.Format('[{},{}]',
        States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
        '[]'),0), '{"Key":"ImageModerationStatus","Value":"NO_THREATS_FOUND"}'))
    ResultPath: $.scanResult
  Tag S3 Object:
    Type: Task
    Parameters:
      Bucket.$: $.s3ObjectDetails.bucketName
      Key.$: $.s3ObjectDetails.objectKey
      Tagging:
        TagSet.$: $.scanResult.newTagSet
    Resource: arn:aws:states:::aws-sdk:s3:putObjectTagging
    ResultPath: null
    Next: Post Scan Result Event
  Post Scan Result Event:
    Type: Task
    Resource: arn:aws:states:::events:putEvents
    Parameters:
      Entries:
        - Detail:
            metadata: {}
            data:
              id.$: $.s3ObjectDetails.objectKey
              status.$: $.scanResult.status
              scanData.$: $.RekognitionModeration
          DetailType: Moderation Scan Completed
          EventBusName: ${EventBusName}
          Source: ImageModeration
    End: true
    ResultPath: null

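The intrinsic-function chain is easier to follow when mirrored step by step in Python. The sketch below does that and places it next to the plain list append you would write in a Lambda function. It rests on three assumptions:

- `States.StringSplit` treats every character in the delimiter string as a separator and drops empty pieces.
- Tag values contain no square brackets.
- The object already has at least one tag. That holds here because GuardDuty tags the object first.

```python
import json
import re


def append_tag_like_asl(tag_set: list, key: str, value: str) -> list:
    """Step-by-step mirror of the chain in the 'Add ... to Object Tags Array' states."""
    # States.JsonToString($.s3Tags.TagSet): compact JSON text, e.g. '[{"Key":...}]'
    as_text = json.dumps(tag_set, separators=(",", ":"))
    # States.StringSplit(..., '[]'): split on '[' and ']', dropping empty pieces (assumed)
    pieces = [p for p in re.split(r"[\[\]]", as_text) if p]
    # States.ArrayGetItem(..., 0): the existing tags without the surrounding brackets
    existing = pieces[0]
    new_tag = json.dumps({"Key": key, "Value": value}, separators=(",", ":"))
    # States.Format('[{},{}]', ...) followed by States.StringToJson(...)
    return json.loads("[{},{}]".format(existing, new_tag))


def append_tag_plain(tag_set: list, key: str, value: str) -> list:
    """The same step as it would look in a Lambda function."""
    return tag_set + [{"Key": key, "Value": value}]
```

With an empty tag set, `pieces` is empty and the sketch raises an IndexError, because there is nothing to pick at index 0. The state machine avoids this only because GuardDuty has already tagged the object. S3 also allows at most 10 tags per object, which is worth keeping in mind if more steps append tags.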

Finalize the upload

The last part of this solution is to react to the moderation result and place the content either in the quarantine bucket or in the long-term storage bucket. For this I use two different StepFunctions, which differ mainly in the event that invokes them.

Completed Flow

To achieve this flow a new StateMachine is created.

  MoveFilesToPermanentStorageStateMachineStandard:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: StateMachine/move-to-permanent-storage.asl.yaml
      DefinitionSubstitutions:
        EventBusName:
          Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
        StorageBucketName:
          Fn::ImportValue: !Sub ${CommonInfraStackName}:storage-bucket-name
        StagingBucketName:
          Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
      Tracing:
        Enabled: true
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - logs:*
              Resource: "*"
        - S3FullAccessPolicy:
            BucketName:
              Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
        - S3FullAccessPolicy:
            BucketName:
              Fn::ImportValue: !Sub ${CommonInfraStackName}:storage-bucket-name
        - EventBridgePutEventsPolicy:
            EventBusName:
              Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
      Events:
        NoModerationThreatsFoundEvent:
          Type: EventBridgeRule
          Properties:
            EventBusName: # the custom bus the moderation state machine posts to
              Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
            InputPath: $.detail
            Pattern:
              source:
                - ImageModeration
              detail-type:
                - Moderation Scan Completed
              detail:
                data:
                  status:
                    - NO_THREATS_FOUND # only clean images go to permanent storage

Its StateMachine definition is a bit easier to follow than the image moderation one.

Comment: Handle result and copy files to storage bucket
StartAt: Debug
States:
  Debug:
    Type: Pass
    Next: Get Object Metadata
  Get Object Metadata:
    Type: Task
    Parameters:
      Bucket: ${StagingBucketName}
      Key.$: $.data.id
    Resource: arn:aws:states:::aws-sdk:s3:headObject
    Next: CopyObject
    ResultPath: $.S3MetaData
  CopyObject:
    Type: Task
    Parameters:
      Bucket: ${StorageBucketName}
      CopySource.$: >-
        States.Format('${StagingBucketName}/{}',$.data.id)
      Key.$: $.data.id
    Resource: arn:aws:states:::aws-sdk:s3:copyObject
    ResultPath: null
    Next: DeleteObject
  DeleteObject:
    Type: Task
    Parameters:
      Bucket: ${StagingBucketName}
      Key.$: $.data.id
    Resource: arn:aws:states:::aws-sdk:s3:deleteObject
    ResultPath: null
    Next: Post Event File Moved
  Post Event File Moved:
    Type: Task
    Resource: arn:aws:states:::events:putEvents
    Parameters:
      Entries:
        - Detail:
            metadata: {}
            data:
              id.$: $.data.id
              status: STORED
              contentType.$: $.S3MetaData.ContentType
              fileSize.$: $.S3MetaData.ContentLength
          DetailType: Completed
          EventBusName: ${EventBusName}
          Source: ImageModeration
    End: true
    ResultPath: null
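S3 has no move operation, which is why the state machine copies the object and then deletes the original. The in-memory Python model below walks through the same four states and the event detail they produce. It makes no AWS calls, and the `FakeS3` class and bucket names are hypothetical.

The model helps when reasoning about failure points. If the flow stops after CopyObject, the file exists in both buckets, and retrying from the start simply repeats the copy onto the same key.

```python
from dataclasses import dataclass, field


@dataclass
class FakeS3:
    """Hypothetical in-memory stand-in for S3: bucket -> key -> (body, content type)."""

    buckets: dict = field(default_factory=dict)

    def head_object(self, bucket, key):
        body, content_type = self.buckets[bucket][key]  # KeyError plays the role of 404
        return {"ContentType": content_type, "ContentLength": len(body)}

    def copy_object(self, src_bucket, dst_bucket, key):
        self.buckets.setdefault(dst_bucket, {})[key] = self.buckets[src_bucket][key]

    def delete_object(self, bucket, key):
        self.buckets[bucket].pop(key, None)


def move_to_permanent_storage(s3, staging, storage, file_id):
    """Mirror of move-to-permanent-storage.asl.yaml; returns the event Detail it would post."""
    meta = s3.head_object(staging, file_id)  # Get Object Metadata
    s3.copy_object(staging, storage, file_id)  # CopyObject
    s3.delete_object(staging, file_id)  # DeleteObject
    return {  # Post Event File Moved
        "metadata": {},
        "data": {
            "id": file_id,
            "status": "STORED",
            "contentType": meta["ContentType"],
            "fileSize": meta["ContentLength"],
        },
    }


s3 = FakeS3({"staging": {"a1b2c3": (b"\x89PNG...", "image/png")}, "storage": {}})
detail = move_to_permanent_storage(s3, "staging", "storage", "a1b2c3")
print(detail["data"])
```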

Conclusion

This was a short post on how I extended my previously built file manager with malware scanning and image moderation. Using only managed services made this a fairly easy task.

To get the full source code and deploy it yourself, visit Serverless-Handbook Image Moderation.

Final Words

Don't forget to follow me on LinkedIn and X for more content, and read the rest of my blogs.

As Werner says! Now Go Build!
