Prevent costly Amazon DynamoDB operations in AWS Amplify CLI projects

#amplify #dynamodb #aws #serverless

The serverless revolution gave birth to a new wave of developers: full-stack AWS serverless developers. Capable on both the frontend and the backend, these developers can be very productive, delivering business value in days, not months.

One tool in their toolbelt might be the AWS Amplify CLI, which lives under the AWS Amplify umbrella. While very powerful, when misused, it might lead to a large AWS bill at the end of the month.

This blog post will show you how to ensure that your AWS Amplify CLI bootstrapped application is not running costly Amazon DynamoDB operations like the Scan operation.

The problem with Amazon DynamoDB Scan operation

Before diving into the solution, let us first learn why you might not want to run Amazon DynamoDB Scan operations on your Amazon DynamoDB tables.

The main issue is that in most scenarios, the Amazon DynamoDB Scan is wildly inefficient and costly. The bigger your data set is, both in the number of items and in the size of each item, the more problematic the operation can be.

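The Scan operation reads every item in your table or secondary index, paginating through the data in chunks of up to 1 MB per request. This applies to every Scan request, even one with a filter expression! Amazon DynamoDB applies the filter after the items have been read, so you pay for everything that was scanned, not only for what was returned.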
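
To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes binary units (1 MB = 1,048,576 bytes, 4 KB = 4,096 bytes), eventually consistent reads by default, and it ignores per-request rounding, so treat the result as an approximation rather than a billing formula. The function name and the 10 GiB table size are made up for illustration.

```typescript
// Rough cost model for walking a whole table with Scan.
// Assumption: binary units; per-request rounding is ignored.
const SCAN_PAGE_BYTES = 1024 * 1024; // a single Scan request reads at most 1 MB
const READ_UNIT_BYTES = 4 * 1024; // one read unit covers up to 4 KB

interface ScanEstimate {
  requests: number; // paginated Scan calls needed to walk the table
  readUnits: number; // read units consumed across all pages
}

// Note that there is no "filter" parameter: a filter expression is applied
// after the data is read, so it does not change either number.
function estimateFullScan(tableBytes: number, stronglyConsistent = false): ScanEstimate {
  const requests = Math.max(1, Math.ceil(tableBytes / SCAN_PAGE_BYTES));
  const fourKbChunks = Math.ceil(tableBytes / READ_UNIT_BYTES);
  // Eventually consistent reads cost half as much as strongly consistent ones.
  const readUnits = stronglyConsistent ? fourKbChunks : fourKbChunks / 2;
  return { requests, readUnits };
}

const tenGiB = 10 * 1024 * 1024 * 1024; // hypothetical table size
console.log(estimateFullScan(tenGiB));
```

Under these assumptions, a single pass over a 10 GiB table takes 10,240 requests and about 1.3 million read units, no matter how few items the filter lets through.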

Of course, that in and of itself would not be a problem. It is entirely valid to use the Scan operation for data migrations and the like. The situation changes drastically when developers expose the Scan operation as an API endpoint, which is what the AWS Amplify CLI can do.

Unknowingly using Amazon DynamoDB `Scan` operations

How might one unknowingly expose an Amazon DynamoDB Scan operation as an endpoint while using the AWS Amplify CLI?

Let us take this seemingly harmless GraphQL schema as an example.

input AMPLIFY {
  globalAuthRule: AuthRule = { allow: public }
} # FOR TESTING ONLY!
type Post @model {
  id: ID!
  title: String!
}

When deployed according to the instructions in the documentation, various GraphQL queries, mutations, and subscriptions will be created in the AWS AppSync service.

To deploy an AWS Amplify application using the AWS Amplify CLI, consult the Getting started documentation.

One of the queries is the `listPosts` query. This query uses the Amazon DynamoDB Scan operation for data retrieval, which is not ideal in most circumstances.
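
For reference, this is roughly what the client-side query document for the Post model looks like. It is an approximation of what Amplify codegen emits, so the exact argument list may differ in your project, and the variables are made up for illustration.

```typescript
// Approximation of the generated query document for the Post model.
const listPosts = /* GraphQL */ `
  query ListPosts($filter: ModelPostFilterInput, $limit: Int, $nextToken: String) {
    listPosts(filter: $filter, limit: $limit, nextToken: $nextToken) {
      items {
        id
        title
      }
      nextToken
    }
  }
`;

// Hypothetical variables. The filter narrows what is returned, not what is read:
// behind the scenes the table is still scanned page by page.
const variables = {
  filter: { title: { contains: "DynamoDB" } },
  limit: 20,
};

console.log(listPosts, JSON.stringify(variables));
```

Nothing in this query hints at a Scan, which is exactly why it is so easy to ship. Also keep in mind that with a Scan, the limit caps the number of items evaluated rather than the number returned, so a filtered page can come back with fewer items than requested, or even empty, while a `nextToken` is still present.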

I have a lot of empathy for the AWS Amplify team. The idea is to hide as much complexity about AWS as possible while enabling the developers to do their job – this is a challenging problem.

In my humble opinion, while mostly doing great work, this is an area where they fall a bit short. To the best of my knowledge, there are no warnings in the documentation about this behavior.

The documentation around various data-modeling-related directives is excellent, though. Give it a read!

Custom AWS CDK overrides to the rescue

So how can we use the AWS Amplify CLI without worrying about using inefficient operations to access our data?

I was pleasantly surprised that the wave of announcements before AWS re:Invent 2021 included support for overriding Amplify-generated resources using CDK. This is just what we need!

After reading the excellent introductory blog post, it did not take me long to make sure all Amazon DynamoDB Scan operations are forbidden. And I was able to do it without changing my application code.

Two-step process

The first step is to run the AWS Amplify CLI command that sets up overrides for the GraphQL API resources. Consult the official documentation for more information.

amplify override api

The second step is to amend the override.ts file that the AWS Amplify CLI created for us. The following is the code I've added to the override.ts file.

import { AmplifyApiGraphQlResourceStackTemplate } from "@aws-amplify/cli-extensibility-helper";

export function override(resources: AmplifyApiGraphQlResourceStackTemplate) {
  for (const model in resources.models) {
    const modelDDBTable = resources.models[model].modelDDBTable;
    resources.models[model].modelIamRole.policies = [
      {
        policyName: "AmplifyDenyTest",
        policyDocument: {
          Version: "2012-10-17",
          Statement: [
            {
              Effect: "Deny",
              Action: ["dynamodb:Scan"],
              // Scan can also be performed on secondary indexes,
              // so the wildcard ARN covers `<table-arn>/index/*` as well.
              Resource: [modelDDBTable.attrArn, `${modelDDBTable.attrArn}/*`]
            }
          ]
        }
      }
    ];
  }
}

Even though we override the policies property of each model's AWS IAM role, the other Amazon DynamoDB operations keep working. That is because the modelIamRole has other policies attached to it by default, and our explicit Deny statement only covers the dynamodb:Scan action.
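
The reason this works is that in AWS IAM an explicit Deny always wins over any Allow. Below is a heavily simplified sketch of that evaluation rule. It is not how AWS IAM is implemented: it only understands `*` wildcards, matches case-sensitively, and ignores conditions and principals. The ARN, the index name, and the `dynamodb:*` Allow statement are stand-ins I made up for whatever the role is granted by default.

```typescript
type Effect = "Allow" | "Deny";

interface Statement {
  Effect: Effect;
  Action: string[];
  Resource: string[];
}

// Minimal wildcard match: "*" matches any run of characters.
function matches(pattern: string, value: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(value);
}

function isAllowed(action: string, resource: string, statements: Statement[]): boolean {
  const applicable = statements.filter(
    (s) =>
      s.Action.some((a) => matches(a, action)) &&
      s.Resource.some((r) => matches(r, resource))
  );
  // An explicit Deny always wins, regardless of how many Allows apply.
  if (applicable.some((s) => s.Effect === "Deny")) return false;
  // Otherwise at least one Allow is required (implicit deny by default).
  return applicable.some((s) => s.Effect === "Allow");
}

// Made-up ARN for illustration only.
const tableArn = "arn:aws:dynamodb:eu-west-1:111122223333:table/Post-example";

const statements: Statement[] = [
  // Stand-in for the default access (the exact actions are an assumption).
  { Effect: "Allow", Action: ["dynamodb:*"], Resource: [tableArn, `${tableArn}/*`] },
  // The statement added in override.ts.
  { Effect: "Deny", Action: ["dynamodb:Scan"], Resource: [tableArn, `${tableArn}/*`] },
];

console.log(isAllowed("dynamodb:Scan", tableArn, statements)); // false
console.log(isAllowed("dynamodb:Query", `${tableArn}/index/byTitle`, statements)); // true
console.log(isAllowed("dynamodb:GetItem", tableArn, statements)); // true
```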

Be careful when overriding resources in your applications. You might want to do a test run before deploying anything to production.
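
One cheap way to do such a test run, before touching any real environment, is to exercise the override logic against a stub. The sketch below makes a few assumptions: `buildDenyScanPolicy` and `applyDenyScan` are hypothetical helpers that mirror the loop in override.ts, the `FakeModel` interface only imitates the two properties the override touches, and the ARN is a plain made-up string rather than the value CDK resolves at deploy time.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface InlinePolicy {
  policyName: string;
  policyDocument: { Version: string; Statement: PolicyStatement[] };
}

// Hypothetical helper: builds the same policy object as the override.
function buildDenyScanPolicy(tableArn: string): InlinePolicy {
  return {
    policyName: "AmplifyDenyTest",
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        {
          Effect: "Deny",
          Action: ["dynamodb:Scan"],
          // The table itself plus everything under it (indexes included).
          Resource: [tableArn, `${tableArn}/*`],
        },
      ],
    },
  };
}

// Imitates only the parts of a model stack that the override reads or writes.
interface FakeModel {
  modelDDBTable: { attrArn: string };
  modelIamRole: { policies?: InlinePolicy[] };
}

// Hypothetical stand-in for the loop inside override().
function applyDenyScan(models: Record<string, FakeModel>): void {
  for (const model of Object.values(models)) {
    model.modelIamRole.policies = [buildDenyScanPolicy(model.modelDDBTable.attrArn)];
  }
}

const post: FakeModel = {
  modelDDBTable: { attrArn: "arn:aws:dynamodb:eu-west-1:111122223333:table/Post-example" },
  modelIamRole: {},
};

applyDenyScan({ Post: post });
console.log(JSON.stringify(post.modelIamRole.policies, null, 2));
```

A check like this only tells you that the policy document has the shape you expect. It does not replace deploying to a non-production environment and confirming that the Scan-backed queries are in fact rejected.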

Closing words

You might be wondering, why bother? I would argue that with such guardrails in place, you are more likely to fall into the pit of success when it comes to your Amazon DynamoDB access patterns.

Striving to hide complexity is an excellent mission, and the AWS Amplify CLI does an excellent job of it. But it can sometimes shoot us in the foot when we are not careful.

For more content like this, follow me on Twitter: @wm_matuszewski.

Thank you for your valuable time.