Check out the original post on Medium here:
https://medium.com/@robertbulmer/storage-first-pattern-in-aws-with-api-gateway-part-1-using-s3-216e20b08353
👋 Introduction
This post explains the Storage-First pattern for serverless architecture and where it can be useful for adding resiliency and high availability to your serverless workloads.
This example works well in conjunction with serverless Event-Driven Architecture (EDA). The requester receives an immediate HTTP 200 response from API Gateway as soon as the request has been saved. If the requester requires confirmation that processing has finished, consider raising an event to post back, polling, or even WebSockets.
🪣 Storage-First
When we say Storage-First we mean reliably capturing the entire incoming request to your API using AWS managed services, without the need for your API to validate, parse or transform the request first.
By storing the request, we have an exact copy in case the handler encounters an unexpected issue. Potential issues could be:
- Failure to parse the request
- Failure to save the processed request to a data store
- Failure to pass on the request to another downstream or third party service
A number of AWS managed services can be used with the Storage-First pattern, and I will cover these in my other posts. They include, but are not limited to, S3, SQS, DynamoDB and EventBridge. Each service has its own use cases, quotas and limitations, and those mentioned above have direct integrations with AWS API Gateway.
🔎 Let’s look at a typical request
Typically, a simple API request is sent from a calling service to our API Gateway, which passes it to an attached Lambda handler.
Let’s see what could go wrong:
Although AWS Lambda has high availability built in, you’re not protected against runtime issues with downstream services, or problems such as bugs introduced in your latest release.
If an unexpected problem occurs, you’ve now completely lost the request. You may have added appropriate logging to investigate and recover, but that doesn’t help in all situations.
Additionally, if the request came from a third-party vendor, or from a legacy system that ‘fires and forgets’ with no ability to retry, then you’ve lost the request forever.
So what can we do to recover from this scenario?
Let’s store the request in AWS S3 so we can re-drive it if required, without extending the sender’s functionality, which may not be possible if:
- You don’t own the system; it belongs to a third-party vendor
- The sender is a legacy application that cannot support retries or cannot be extended to do so
Now in an error scenario we can pick up the request data from the S3 bucket and process it again, exactly as it came in.
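To give an idea of what re-driving could look like, here is a minimal sketch using the AWS SDK v3: it reads the stored object back out of the bucket and replays it through whatever processing step failed. The function names (redriveRequest, processRequest) are placeholders for illustration, not part of the original solution.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-west-1" });

// Placeholder for the failed processing step you want to retry,
// e.g. re-invoking the parser or calling the downstream service again
const processRequest = async (payload: string): Promise<void> => {
  console.log(`Re-processing ${payload.length} bytes`);
};

export const redriveRequest = async (bucket: string, key: string): Promise<void> => {
  // Fetch the stored request exactly as it was received by API Gateway
  const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const rawRequest = await Body?.transformToString();

  if (!rawRequest) {
    throw new Error(`No request body found at s3://${bucket}/${key}`);
  }

  // Hand the original payload back to the processing step that failed
  await processRequest(rawRequest);
};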
🚀 Real Scenario
The Storage-First pattern really comes into its own when dealing with requests containing large data payloads from third-party or legacy applications that lack resiliency and the ability to protect against failures.
Take for instance an application that sends product XML data to our API.
The product XML has the potential to be varied and vast. For this reason I have chosen AWS S3 as the Storage-First solution: we can take any amount of data and store it in S3 without any issues. If we used SQS, we would be limited to a payload of 256 KB; read more on SQS quotas here.
Above we have two API endpoints:
1) PUT /product
2) PUT /product/{bucket}/{key}
1) PUT /product
This endpoint uploads XML data to a specific bucket created in our AWS CDK stack. The RequestId automatically generated by API Gateway is used as the object name inside the bucket.
Let’s see method 1 in action by submitting our raw XML data to the endpoint:
curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/' \
--header 'x-api-key: {apiKey}' \
--header 'Content-Type: application/xml' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'
Now let’s check our S3 Bucket:
Let’s have a look at the CDK code:
// Create the new integration method
const putObjectIntegrationAutoName: AwsIntegration = new AwsIntegration({
  service: "s3",
  region: "eu-west-1",
  integrationHttpMethod: "PUT",
  path: "{bucket}/{object}",
  options: {
    credentialsRole: this.apiGatewayRole,
    // Passes the request body to S3 without transformation
    passthroughBehavior: PassthroughBehavior.WHEN_NO_MATCH,
    requestParameters: {
      // Specify the bucket name from the XML bucket we created above
      // (static values in a mapping must be wrapped in single quotes)
      "integration.request.path.bucket": `'${targetBucket.bucketName}'`,
      // Specify the object name using the API Gateway context requestId
      "integration.request.path.object": "context.requestId",
      "integration.request.header.Accept": "method.request.header.Accept",
    },
    // Return a 200 response after saving to S3
    integrationResponses: [
      {
        statusCode: "200",
        responseParameters: {
          "method.response.header.Content-Type":
            "integration.response.header.Content-Type",
        },
      },
    ],
  },
});
// Create the endpoint method options
const putObjectMethodOptionsAutoName: MethodOptions = {
  // Protected by API Key
  authorizationType: AuthorizationType.NONE,
  // Require the API Key on all requests
  apiKeyRequired: true,
  requestParameters: {
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};

// Assign the integration to the /product resource
productResource.addMethod(
  "PUT",
  putObjectIntegrationAutoName,
  putObjectMethodOptionsAutoName
);
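The integration above references a few constructs created elsewhere in the stack: the targetBucket that receives the XML, the apiGatewayRole that API Gateway assumes to write to S3, and the productResource on the REST API. A rough sketch of that supporting CDK code, with names assumed to mirror the snippet (the actual repo may differ), could look like this:
import { RestApi } from "aws-cdk-lib/aws-apigateway";
import { Bucket } from "aws-cdk-lib/aws-s3";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";

// Bucket that stores the raw XML requests
const targetBucket = new Bucket(this, "ProductXmlBucket");

// Role that API Gateway assumes for the S3 integration,
// with write access limited to the single bucket above
this.apiGatewayRole = new Role(this, "ApiGatewayS3Role", {
  assumedBy: new ServicePrincipal("apigateway.amazonaws.com"),
});
targetBucket.grantPut(this.apiGatewayRole);

// REST API with a /product resource for the integration to attach to
const api = new RestApi(this, "ProductApi");
const productResource = api.root.addResource("product");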
2) PUT /product/{bucket}/{key}
This endpoint uploads XML data to a user-specified S3 bucket and object name, allowing the requester to choose where to put the data and what to call the file.
Note: In this example, our CDK stack has a single bucket and the Lambda permissions are restricted to that bucket only.
Let’s see what happens when we post to method 2:
curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/{bucketName}/p1234' \
--header 'Content-Type: application/xml' \
--header 'x-api-key: {apikey}' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'
Now check the S3 bucket for our new file “p1234” that we specified in the above request:
Now let’s see the differences in AWS CDK when using the request path parameters to decide where to store the object:
// Create the new integration method
const putObjectIntegrationUserSpecified: AwsIntegration =
  new AwsIntegration({
    ...
    options: {
      ...
      requestParameters: {
        // Use the bucket name from the request path
        "integration.request.path.bucket": "method.request.path.bucketName",
        // Use the object key from the request path
        "integration.request.path.object": "method.request.path.objectKey",
        "integration.request.header.Accept": "method.request.header.Accept",
      },
      ...
    },
  });
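For the parameterized endpoint, the resource tree and method options also need the path parameters declared so the mapping above can read them. A minimal sketch is below; the path parameter names ({bucketName}, {objectKey}) are assumed to match the mapping in the snippet, and the repo may name them differently.
// Resource tree for PUT /product/{bucketName}/{objectKey}
const bucketResource = productResource.addResource("{bucketName}");
const objectResource = bucketResource.addResource("{objectKey}");

const putObjectMethodOptionsUserSpecified: MethodOptions = {
  authorizationType: AuthorizationType.NONE,
  apiKeyRequired: true,
  requestParameters: {
    // Declare the path parameters so the integration can map them
    "method.request.path.bucketName": true,
    "method.request.path.objectKey": true,
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};

objectResource.addMethod(
  "PUT",
  putObjectIntegrationUserSpecified,
  putObjectMethodOptionsUserSpecified
);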
For completeness, let’s check that our S3 trigger to the Parser Lambda function is working by looking at AWS CloudWatch. The Parser function reads the XML data and converts it to a JavaScript object.
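As a rough idea of what that Parser function might look like, here is a sketch of a handler triggered by the S3 event. The handler shape and the use of an XML library (fast-xml-parser here) are assumptions for illustration; the Parser function in the repo may differ.
import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { XMLParser } from "fast-xml-parser";

const s3 = new S3Client({});
const parser = new XMLParser({ ignoreAttributes: false });

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    // Read the stored request exactly as API Gateway wrote it
    const { Body } = await s3.send(
      new GetObjectCommand({
        Bucket: record.s3.bucket.name,
        Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
      })
    );
    const xml = await Body?.transformToString();
    if (!xml) continue;

    // Convert the XML payload into a plain JavaScript object
    const product = parser.parse(xml);
    console.log("Parsed product:", JSON.stringify(product));
  }
};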
☑ Summary
As we have seen, we can capture request data and store it directly in AWS S3 using API Gateway’s direct AWS service integrations. This provides a highly resilient way of managing data requests and allows them to be re-driven if there are any unexpected errors.
With a small amount of CDK code we can utilise AWS managed services to provide a robust solution, which is especially valuable for requesters that cannot resend requests or that you have no access to or support channels for.
You can extend the above code into a full EDA solution that notifies the requester once the request has been fully processed. Alternatively, if you cannot use EDA in your end-to-end approach, consider using WebSockets or polling to notify the sender once processing is complete.
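As one possible starting point for that notification step, the Parser function could publish an event to EventBridge once processing succeeds. The bus name, source and detail type below are illustrative assumptions, not taken from the original solution.
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const events = new EventBridgeClient({});

// Hypothetical event raised once a stored request has been processed
export const publishProcessedEvent = async (bucket: string, key: string): Promise<void> => {
  await events.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: "product-events",
          Source: "product.parser",
          DetailType: "ProductRequestProcessed",
          Detail: JSON.stringify({ bucket, key }),
        },
      ],
    })
  );
};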
👨💻 Code
All of the code featured in this post can be found here:
https://github.com/rbulmer55/Apigw-to-s3
📣 Getting in touch!
Happy building!… 🚀
Thank you for reading,
Reach me on LinkedIn here: