DEV Community

Jehiel Martinez

Building a Scalable Serverless Image Processing Pipeline with AWS SQS and Lambda

Image-intensive applications need to process images as soon as they are uploaded. Doing that work synchronously on application servers can overload them and create bottlenecks, so a distributed, scalable approach is needed to absorb varying workloads without sacrificing performance.

Architecture Overview

The solution uses a serverless, event-driven architecture designed for scalability and resilience:

  1. Images are uploaded to the uploads/ prefix of an S3 bucket.
  2. S3 publishes an object-created event notification directly to an SQS queue.
  3. Lambda polls the queue and invokes the processing function with batches of messages.
  4. The function generates thumbnails and stores them back in S3.

This approach decouples the image upload and processing stages, allowing the system to handle spikes in uploads efficiently.

AWS Services Used

  • Amazon S3: Acts as both the source for original images and the destination for processed thumbnails.
  • Amazon SQS: Serves as a message broker to decouple upload and processing stages.
  • AWS Lambda: Provides serverless compute for image processing.
  • AWS IAM: Manages permissions and ensures secure access to AWS resources.

Technical Implementation

Image Upload Flow

  1. A user uploads an image under the uploads/ prefix of the S3 bucket.
  2. S3 detects the object-created event, filtered to that prefix.
  3. S3 builds an event notification containing the image's metadata (e.g., bucket name, object key, size).
  4. The notification is delivered directly to the SQS queue as a message; no intermediate Lambda function is involved.

Processing Flow

  1. Lambda's SQS event source mapping polls the queue and invokes the processing function with a batch of messages.
  2. For each message, the function:
    • Downloads the image from the bucket.
    • Processes the image with the Sharp library to generate a thumbnail.
    • Uploads the thumbnail back to S3.
  3. If the invocation succeeds, Lambda deletes the batch's messages from the queue; if it fails, the messages become visible again after the visibility timeout and are retried.
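By default, Lambda treats an SQS batch as all-or-nothing: if the function throws, every message in the batch returns to the queue, including images that were already processed. A common refinement, not part of the original stack, is a partial batch response. The sketch below assumes the event source is created with reportBatchItemFailures enabled; SqsRecordLike and SqsEventLike are minimal local stand-ins for the types in @types/aws-lambda, and processRecord is a hypothetical per-message worker.

```typescript
// Minimal local stand-ins for the SQSEvent/SQSRecord types in @types/aws-lambda.
interface SqsRecordLike {
  messageId: string;
  body: string;
}

interface SqsEventLike {
  Records: SqsRecordLike[];
}

// Response shape Lambda expects when ReportBatchItemFailures is enabled.
interface BatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Processes records one by one and reports only the failed message IDs, so
// Lambda deletes the successful messages and leaves only the failed ones on
// the queue to be retried.
async function handleBatch(
  event: SqsEventLike,
  processRecord: (record: SqsRecordLike) => Promise<void>, // hypothetical per-message worker
): Promise<BatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (err) {
      // Ends up in CloudWatch Logs when running in Lambda.
      console.error(`Failed to process message ${record.messageId}`, err);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

In the CDK stack this would correspond to generator.addEventSource(new SqsEventSource(queue, { reportBatchItemFailures: true })). Without that flag, Lambda treats any successful return as success for the whole batch and deletes every message, so the returned list would be ignored.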

Error Handling

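  • Dead Letter Queue (DLQ): Messages that still fail after three receives (maxReceiveCount: 3) are moved to a DLQ for further analysis.
  • CloudWatch Logs: The function's logs capture processing steps and error details for debugging.
  • Automatic Retries: A message that fails processing is not deleted, so it becomes visible again when its visibility timeout expires and is redelivered. This handles transient errors such as throttling or network timeouts without extra code.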
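The interplay between retries and the DLQ can be modelled in a few lines. This is a simplified sketch, not AWS code: it ignores timing (visibility timeout, polling) and only counts receives, with the hypothetical attempt callback standing in for one processing attempt that either succeeds or fails.

```typescript
interface DeliveryOutcome {
  status: "processed" | "dead-lettered";
  receiveCount: number; // how many times SQS delivered the message
}

// Simplified model of an SQS redrive policy: the message is redelivered until
// an attempt succeeds or it has been received maxReceiveCount times; after that
// SQS moves it to the dead-letter queue instead of delivering it again.
function simulateRedrive(
  attempt: (receiveCount: number) => boolean, // hypothetical: true = processed OK
  maxReceiveCount: number,
): DeliveryOutcome {
  for (let receiveCount = 1; receiveCount <= maxReceiveCount; receiveCount++) {
    if (attempt(receiveCount)) {
      // Success: Lambda deletes the message, so it is never seen again.
      return { status: "processed", receiveCount };
    }
    // Failure: the message is not deleted and becomes visible again
    // once its visibility timeout expires.
  }
  return { status: "dead-lettered", receiveCount: maxReceiveCount };
}
```

With maxReceiveCount set to 3, as in this stack's queue configuration, an image that fails once (for example, because of a throttled S3 read) is processed on its second receive, while a file that always fails ends up in the DLQ after three receives.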

Infrastructure as Code with AWS CDK

Let's break down the core infrastructure in our AWS CDK stack. Each snippet below runs inside the stack's constructor, which is why it refers to this:

S3 Bucket Setup

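import * as cdk from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';

const bucket = new Bucket(this, 'AwsSqsThumbnailGeneratorBucket', {
  bucketName: 'sqs-thumbnail-generator-bucket', // must be globally unique
  removalPolicy: cdk.RemovalPolicy.DESTROY,     // delete the bucket on `cdk destroy`
  autoDeleteObjects: true,                      // empty it first so deletion succeeds
  versioned: true
});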

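We create a single S3 bucket with versioning enabled to store both the original images and the generated thumbnails. RemovalPolicy.DESTROY combined with autoDeleteObjects: true lets cdk destroy empty and delete the bucket, which simplifies cleanup but is only suitable for non-production stacks. Because S3 bucket names are global, the bucketName may need to be changed if it is already taken.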

Queue Configuration

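import * as cdk from 'aws-cdk-lib';
import { Queue } from 'aws-cdk-lib/aws-sqs';

// Holds messages that repeatedly fail processing
const dlq = new Queue(this, 'AwsSqsThumbnailGeneratorDeadLetterQueue', {
  queueName: 'thumbnail-generator-dlq',
  retentionPeriod: cdk.Duration.days(10)
});

const queue = new Queue(this, 'AwsSqsThumbnailGeneratorQueue', {
  queueName: 'thumbnail-generator-queue',
  // Must be >= the consumer's timeout (60 s); AWS recommends 6x the function timeout
  visibilityTimeout: cdk.Duration.seconds(60),
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3 // move to the DLQ after 3 unsuccessful receives
  }
});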

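We set up two queues:

  • A main queue for processing images.
  • A Dead Letter Queue (DLQ) for handling failed processing attempts.

A message is moved to the DLQ after it has been received three times without being deleted (maxReceiveCount: 3), and the DLQ's retention period is set to 10 days.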
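The visibility timeout interacts with the Lambda function defined later. Lambda refuses to create the SQS event source mapping if the queue's visibility timeout is shorter than the function timeout, and the AWS Lambda documentation recommends at least six times the function timeout so that messages don't reappear while a throttled batch is still being retried. The helper below is a hypothetical check, not part of the stack:

```typescript
interface TimeoutCheck {
  valid: boolean;             // false: Lambda rejects the event source mapping
  recommended: boolean;       // meets the "at least 6x" guidance
  recommendedSeconds: number; // smallest visibility timeout that meets it
}

// Compares an SQS visibility timeout with the consuming function's timeout.
function checkVisibilityTimeout(visibilitySeconds: number, functionTimeoutSeconds: number): TimeoutCheck {
  const recommendedSeconds = functionTimeoutSeconds * 6;
  return {
    valid: visibilitySeconds >= functionTimeoutSeconds,
    recommended: visibilitySeconds >= recommendedSeconds,
    recommendedSeconds,
  };
}
```

For the values used in this stack (a 60-second visibility timeout and a 60-second function timeout), the configuration is valid but below the recommendation; cdk.Duration.seconds(360) would satisfy it.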

S3 to SQS Integration

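import { EventType } from 'aws-cdk-lib/aws-s3';
import { SqsDestination } from 'aws-cdk-lib/aws-s3-notifications';

// Send an ObjectCreated notification to the queue for keys under uploads/
bucket.addEventNotification(
  EventType.OBJECT_CREATED,
  new SqsDestination(queue),
  { prefix: 'uploads/' }
);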

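This configuration sends a message to our SQS queue whenever a new object is created under the 'uploads/' prefix of the bucket. S3 has no real directories; the filter matches any key that starts with uploads/, so objects written under other prefixes don't generate messages.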
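Because the notification goes straight from S3 to SQS, the processing function receives the S3 event as a JSON string in each SQS record's body. Two details are easy to miss: object keys arrive URL-encoded (a space becomes +), and S3 sends an s3:TestEvent message without Records when the notification is first configured. A minimal parser, assuming locally declared types that cover only the fields used here:

```typescript
// Only the fields of an S3 event notification record that the pipeline needs.
interface S3NotificationRecord {
  eventName: string;
  s3: { bucket: { name: string }; object: { key: string; size?: number } };
}

interface S3ObjectRef {
  bucket: string;
  key: string;
}

// Extracts bucket/key pairs from the body of one SQS message.
// Returns an empty array for the s3:TestEvent message, which has no Records.
function parseS3Notification(body: string): S3ObjectRef[] {
  const parsed = JSON.parse(body) as { Event?: string; Records?: S3NotificationRecord[] };
  if (parsed.Event === "s3:TestEvent" || !parsed.Records) {
    return [];
  }
  return parsed.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    // Keys are URL-encoded with '+' for spaces: restore spaces first, then decode.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  }));
}
```

The order of the two decoding steps matters: a literal + in a file name arrives as %2B, so replacing + before calling decodeURIComponent keeps it intact.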

Sharp Layer

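import * as path from 'path';
import { Architecture, Code, LayerVersion, Runtime } from 'aws-cdk-lib/aws-lambda';

// Prebuilt Sharp package for arm64, shipped as a Lambda layer
const sharpLayer = new LayerVersion(this, 'SharpLayer', {
  code: Code.fromAsset(path.join(__dirname, './layers/sharp.zip')),
  compatibleRuntimes: [Runtime.NODEJS_LATEST],
  compatibleArchitectures: [Architecture.ARM_64]
});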

We need the Sharp library to process images. We create a Lambda Layer with the library and attach it to our processing Lambda function. This ensures that the library is available to the function at runtime.

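This layer is built for the ARM64 architecture, which runs on AWS Graviton2 processors. Because Sharp includes native binaries, the layer's architecture must match the function's, which is why both specify Architecture.ARM_64. Arm-based Lambda functions are also billed at a lower price per GB-second than x86_64 functions.

Sharp Layer Source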

Thumbnail Generator Lambda

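import * as cdk from 'aws-cdk-lib';
import * as path from 'path';
import { Architecture, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

const generator = new NodejsFunction(this, 'AwsSqsThumbnailGeneratorProcessor', {
  functionName: 'thumbnail-generator-processor',
  runtime: Runtime.NODEJS_LATEST,
  handler: 'handler',
  entry: path.join(__dirname, './functions/thumbnail-generator.ts'),
  layers: [sharpLayer],
  architecture: Architecture.ARM_64, // must match the Sharp layer's architecture
  timeout: cdk.Duration.seconds(60),
  bundling: {
    // Sharp comes from the layer, so keep it (and the runtime's AWS SDK v3) out of the esbuild bundle
    externalModules: ['@aws-sdk/*', 'sharp']
  },
  environment: {
    QUEUE_URL: queue.queueUrl
  }
});

// Lambda polls the queue and invokes the function with batches of messages
generator.addEventSource(new SqsEventSource(queue));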

We define the Lambda function that generates thumbnails with the Sharp library. SqsEventSource creates an event source mapping, so the Lambda service polls the queue and invokes the function with batches of messages (up to 10 per batch by default).

The function code is defined in the thumbnail-generator.ts file, which is responsible for downloading, processing, and uploading images.
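The article doesn't reproduce thumbnail-generator.ts, so the following is only a sketch of the download, resize, upload sequence it describes, not the author's code. The S3 and Sharp calls are injected through a hypothetical ThumbnailDeps interface (in a real function they would wrap the AWS SDK's GetObjectCommand and PutObjectCommand and a sharp(...).resize(...) call), and the thumbnails/ output prefix and the width list are assumptions. Writing outside uploads/ matters here: the notification filter only matches that prefix, so the thumbnails don't trigger new messages.

```typescript
// Hypothetical dependency interface; real implementations would use the S3 client and Sharp.
interface ThumbnailDeps {
  download(bucket: string, key: string): Promise<Uint8Array>;
  resize(image: Uint8Array, width: number): Promise<Uint8Array>;
  upload(bucket: string, key: string, data: Uint8Array): Promise<void>;
}

// Assumed thumbnail widths and output prefix (not taken from the original repository).
const THUMBNAIL_WIDTHS = [128, 256];
const OUTPUT_PREFIX = "thumbnails/";

// Derives the output key, e.g. "uploads/cat.jpg" -> "thumbnails/128/cat.jpg".
function thumbnailKey(sourceKey: string, width: number): string {
  const fileName = sourceKey.slice(sourceKey.lastIndexOf("/") + 1);
  return `${OUTPUT_PREFIX}${width}/${fileName}`;
}

// Downloads the original once, then writes one thumbnail per configured width.
// Errors propagate, so the SQS message is retried and eventually dead-lettered.
async function generateThumbnails(deps: ThumbnailDeps, bucket: string, key: string): Promise<string[]> {
  const original = await deps.download(bucket, key);
  const written: string[] = [];
  for (const width of THUMBNAIL_WIDTHS) {
    const resized = await deps.resize(original, width);
    const outKey = thumbnailKey(key, width);
    await deps.upload(bucket, outKey, resized);
    written.push(outKey);
  }
  return written;
}
```

Injecting the I/O keeps the key-naming and sequencing logic testable without AWS credentials; the handler would build the real dependencies once, outside the per-message loop.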

IAM Policies

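import { Effect, PolicyStatement, ServicePrincipal } from 'aws-cdk-lib/aws-iam';

// Allow S3 to publish notifications to the queue, but only from this bucket
queue.addToResourcePolicy(new PolicyStatement({
  effect: Effect.ALLOW,
  principals: [new ServicePrincipal('s3.amazonaws.com')],
  actions: ['sqs:SendMessage'],
  resources: [queue.queueArn],
  conditions: {
    ArnLike: {
      'aws:SourceArn': bucket.bucketArn,
    },
  },
}));

// Read originals and write thumbnails; receive and delete queue messages
bucket.grantReadWrite(generator);
queue.grantConsumeMessages(generator);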

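The resource policy lets the S3 service send messages to the queue, restricted by the aws:SourceArn condition to notifications from our bucket. The two grants give the processing function read/write access to the bucket and permission to receive and delete messages from the queue. SqsDestination and SqsEventSource add equivalent permissions on their own, so these statements mainly make the access rules explicit in the stack.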

Using it

If you’d like to try this project, follow these steps:

  1. Clone the Repository: AWS SQS Thumbnail Generator.
  2. Update Configuration: Modify the account and region details in the /bin/aws-sqs-thumbnail-generator.ts file.
  3. Configure the Thumbnail Sizes: Adjust the thumbnail sizes in the /functions/thumbnail-generator.ts file.
  4. Deploy the Infrastructure: Run cdk deploy (after cdk bootstrap if the account and region have not been bootstrapped yet).
  5. Test the Workflow: Upload an image under the uploads/ prefix of the bucket and check that the generated thumbnails appear in the same bucket.
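When choosing thumbnail sizes in step 3, it helps to know what a bounding box does to images of different shapes. Sharp's resize with fit: 'inside' preserves the aspect ratio and makes the image fit within the given width and height, and withoutEnlargement: true leaves smaller images alone. The hypothetical helper below approximates that calculation for previewing output dimensions; it is not part of the project, and its rounding may differ from Sharp's by a pixel.

```typescript
interface Size {
  width: number;
  height: number;
}

// Approximates resize({ width, height, fit: "inside", withoutEnlargement: true }):
// scale uniformly so the image fits inside the box, but never scale up.
function fitInside(source: Size, box: Size): Size {
  const scale = Math.min(box.width / source.width, box.height / source.height, 1);
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

For example, a 4000×3000 photo in a 200×200 box becomes 200×150, a 3000×4000 portrait becomes 150×200, and a 100×50 icon stays at 100×50.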

Conclusion

The AWS SQS Thumbnail Generator shows how a serverless architecture can handle resource-intensive work such as image processing. Placing SQS between S3 and Lambda buffers bursts of uploads, retries failed messages before moving them to a DLQ, and lets Lambda scale the number of concurrent consumers with the queue backlog.

This project can be extended with features like image recognition, metadata extraction, or integration with other AWS services. Because uploads and processing are decoupled, new processing steps can be added without changing how images are uploaded.
