
Storage-First pattern in AWS with API Gateway, Part 1: using S3

Check out the original post on Medium here:
https://medium.com/@robertbulmer/storage-first-pattern-in-aws-with-api-gateway-part-1-using-s3-216e20b08353

Using S3 to capture each request

👋 Introduction

This post explains the Storage-First pattern for serverless architecture and where it can be useful for adding resiliency and high availability to your serverless workloads.

This example is useful in conjunction with serverless Event-Driven Architecture (EDA). The requester receives an immediate HTTP 200 response from API Gateway as soon as the request has been saved. If the requester requires confirmation of processing, consider raising an event to post back, polling, or even WebSockets.

đŸȘŁ Storage-First

When we say Storage-First, we mean reliably capturing the entire incoming request to your API using AWS managed services, without the need for your API to validate, parse or transform the request.

By storing the request, we have an exact copy in case the handler encounters an unexpected issue. Potential issues could be:

  • Failure to parse the request
  • Failure to save the processed request to a data store
  • Failure to pass on the request to another downstream or third party service

We can use a number of AWS managed services with the Storage-First pattern, which I will also cover in other posts. These include, but aren’t limited to, S3, SQS, DynamoDB and EventBridge. Each service has its own use cases, quotas and limitations, and those mentioned above all have direct integrations with AWS API Gateway.

🔎 Let’s look at a typical request

Standard API request through API Gateway

Typically, a simple API request is sent via a service to our API Gateway with an attached Lambda handler.

Let’s see what could go wrong:

Lambda failure with API Gateway

Although AWS Lambda has high availability built in, you’re not protected against runtime issues with downstream services, or problems like bugs introduced in the latest release.

If an unexpected problem occurs, you’ve now completely lost the request. You may have added appropriate logging to investigate and recover, but this doesn’t help in all situations.

Additionally, say your request came from a third-party vendor, or from a legacy system that ‘fires and forgets’ the request with no ability to retry; then you’ve lost the request forever.

Losing the request — Inability to resend if a runtime error occurs

So what can we do to recover from this scenario?

Let’s store the request in AWS S3; we can then re-drive the request if required, without extending functionality on the sender, which may not be possible if:

  • You don’t own the system, it belongs to a third-party vendor
  • The sender is a legacy application that cannot support retries or be extended to retry

Enabling re-drive with S3 Storage-First

Now in an error scenario we can pick up the request data from the S3 bucket and process it again, exactly as it came in.
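
To illustrate, here is a minimal sketch of what a re-drive could look like using the AWS SDK v3 S3 client. Note that processProduct is a hypothetical stand-in for your normal handler logic, not code from the repository:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Hypothetical stand-in for the normal parsing/processing logic
declare function processProduct(xml: string): Promise<void>;

const s3 = new S3Client({ region: "eu-west-1" });

// Fetch the stored request by its key (e.g. the API Gateway RequestId)
// and replay it through the normal processing logic
async function redriveRequest(bucket: string, key: string): Promise<void> {
  const response = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  const originalBody = await response.Body?.transformToString();

  if (!originalBody) {
    throw new Error(`No body found for s3://${bucket}/${key}`);
  }

  await processProduct(originalBody);
}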

🚀 Real Scenario

The Storage-First pattern comes into its own when dealing with requests that contain large data payloads from third-party or legacy applications that lack resiliency and the ability to protect against failures.

Take for instance an application that sends product XML data to our API.

Product XML Service

The product XML has the potential to be varied and vast. For this reason I have chosen AWS S3 as the Storage-First solution: S3 can store objects of practically any size (up to 5 TB each) without any issues. If we used SQS, we would be limited to a payload of 256 KB; read more on the SQS quotas in the AWS documentation.

Above we have two API endpoints:

1) PUT /product

2) PUT /product/{bucketName}/{objectKey}

1) PUT /product

This endpoint uploads XML data to a specific bucket created in our AWS CDK stack. The RequestId that is automatically generated by API Gateway is used as the object name inside the bucket.

Let’s see method 1 in action by submitting our raw XML data to the endpoint:

curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/' \
--header 'x-api-key: {apiKey}' \
--header 'Content-Type: application/xml' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'

Now let’s check our S3 Bucket:

Request saved directly in our bucket with the auto-generated RequestId

Let’s have a look at the CDK code:

// Create new Integration method
const putObjectIntegrationAutoName: AwsIntegration = new AwsIntegration({
  service: "s3",
  region: "eu-west-1",
  integrationHttpMethod: "PUT",
  path: "{bucket}/{object}",
  options: {
    credentialsRole: this.apiGatewayRole,
    // Passes the request body to S3 without transformation
    passthroughBehavior: PassthroughBehavior.WHEN_NO_MATCH,
    requestParameters: {
      // Specify the bucket name from the XML bucket we created above
      // (static values in the mapping must be wrapped in single quotes)
      "integration.request.path.bucket": `'${targetBucket.bucketName}'`,
      // Specify the object name using the APIG context requestId
      "integration.request.path.object": "context.requestId",
      "integration.request.header.Accept": "method.request.header.Accept",
    },
    // Return a 200 response after saving to S3
    integrationResponses: [
      {
        statusCode: "200",
        responseParameters: {
          "method.response.header.Content-Type":
            "integration.response.header.Content-Type",
        },
      },
    ],
  },
});

// Create the endpoint method options
const putObjectMethodOptionsAutoName: MethodOptions = {
  // Protected by API Key
  authorizationType: AuthorizationType.NONE,
  // Require the API Key on all requests
  apiKeyRequired: true,
  requestParameters: {
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};

// Assign the integration to the /product resource
productResource.addMethod(
  "PUT",
  putObjectIntegrationAutoName,
  putObjectMethodOptionsAutoName
);
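
The snippet above references a few constructs defined elsewhere in the stack: the bucket, the integration role and the /product resource. As a rough sketch of how they might be created (the construct IDs here are illustrative rather than taken from the actual repository):

import { RestApi } from "aws-cdk-lib/aws-apigateway";
import { Bucket } from "aws-cdk-lib/aws-s3";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";

// Bucket that will hold the raw XML requests
const targetBucket = new Bucket(this, "ProductXmlBucket");

// Role assumed by API Gateway for the S3 integration,
// allowed to write into our bucket only
this.apiGatewayRole = new Role(this, "ApiGatewayS3Role", {
  assumedBy: new ServicePrincipal("apigateway.amazonaws.com"),
});
targetBucket.grantPut(this.apiGatewayRole);

// REST API and the /product resource the method is attached to
const api = new RestApi(this, "ProductApi");
const productResource = api.root.addResource("product");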

2) PUT /product/{bucketName}/{objectKey}

This endpoint uploads XML data to a user-specified S3 bucket and object name. This allows the requester to choose where to put the data and what to call the file.

Note: in this example, our CDK stack has a single bucket, and write permissions are restricted to that bucket only.

Let’s see what happens when we post to method 2:

curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/{bucketName}/p1234' \
--header 'Content-Type: application/xml' \
--header 'x-api-key: {apikey}' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'

Now check the S3 bucket for our new file “p1234” that we specified in the above request:

Request saved directly in our bucket with the user specified object name

Now let’s see the differences in AWS CDK when using the request path parameters to decide where to store the object:

// Create the new integration method
const putObjectIntegrationUserSpecified: AwsIntegration =
  new AwsIntegration({
    ...
    options: {
      ...
      requestParameters: {
        // Use the bucket name from the request path
        "integration.request.path.bucket": "method.request.path.bucketName",
        // Use the object key from the request path
        "integration.request.path.object": "method.request.path.objectKey",
        "integration.request.header.Accept": "method.request.header.Accept",
      },
      ...
    },
  });
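
The method options for this endpoint must also declare the path parameters so that API Gateway will accept and map them, and the integration needs attaching to nested resources. A minimal sketch, with names inferred from the mappings above rather than taken from the repository:

// Declare the path parameters alongside the headers
const putObjectMethodOptionsUserSpecified: MethodOptions = {
  authorizationType: AuthorizationType.NONE,
  apiKeyRequired: true,
  requestParameters: {
    "method.request.path.bucketName": true,
    "method.request.path.objectKey": true,
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};

// Nested resource: /product/{bucketName}/{objectKey}
const bucketResource = productResource.addResource("{bucketName}");
const objectResource = bucketResource.addResource("{objectKey}");
objectResource.addMethod(
  "PUT",
  putObjectIntegrationUserSpecified,
  putObjectMethodOptionsUserSpecified
);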

For completeness, let’s check that our S3 trigger to the Parser Lambda function is working by looking in AWS CloudWatch. The Parser function reads the XML data and converts it to a JS Object.

XML being parsed into a JS Object in Lambda using S3 Triggers
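
For reference, here is a minimal sketch of what such a parser handler might look like, assuming the fast-xml-parser library and the AWS SDK v3 S3 client (the function in the repository may differ):

import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { XMLParser } from "fast-xml-parser";

const s3 = new S3Client({});
const parser = new XMLParser({ ignoreAttributes: false });

// Triggered by S3 object-created events: fetch the stored XML
// request and convert it into a plain JS object
export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    const response = await s3.send(
      new GetObjectCommand({ Bucket: bucket, Key: key })
    );
    const xml = await response.Body?.transformToString();

    const product = parser.parse(xml ?? "");
    console.log("Parsed product:", JSON.stringify(product));
  }
};

In CDK, the trigger itself can be wired up with targetBucket.addEventNotification(EventType.OBJECT_CREATED, new LambdaDestination(parserFn)), using the aws-s3 and aws-s3-notifications modules.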

☑ Summary

As we have seen, we can capture request data and store it directly in AWS S3 using direct service integrations with API Gateway. This provides a highly resilient approach to managing data requests and allows us to re-drive requests if there are any unexpected errors.

With a small amount of CDK code we can utilise AWS managed services to provide a robust solution, one that is especially valuable when requesters cannot resend requests, or when you have no access to, or support channels for, the sending system.

You can extend the above code into a full EDA solution that notifies the requester once the request has been fully processed. Alternatively, if you cannot use EDA within your end-to-end approach, consider using WebSockets or polling to notify the sender once processing is complete.

đŸ‘šâ€đŸ’» Code

All of the code featured in this post can be found here:
https://github.com/rbulmer55/Apigw-to-s3

📣 Getting in touch!

Happy building! 🚀

Thank you for reading,

Reach me on LinkedIn here:

https://www.linkedin.com/in/robertbulmer/
