<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Emma Moinat</title>
    <description>The latest articles on DEV Community by Emma Moinat (@emmamoinat).</description>
    <link>https://dev.to/emmamoinat</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F956050%2Faf2d42ae-8b95-47b2-91f9-7e350c16cabf.jpeg</url>
      <title>DEV Community: Emma Moinat</title>
      <link>https://dev.to/emmamoinat</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/emmamoinat"/>
    <language>en</language>
    <item>
      <title>CSV Imports to DynamoDB at Scale</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Mon, 05 May 2025 07:20:25 +0000</pubDate>
      <link>https://dev.to/aws-builders/csv-imports-to-dynamodb-at-scale-4lki</link>
      <guid>https://dev.to/aws-builders/csv-imports-to-dynamodb-at-scale-4lki</guid>
      <description>&lt;p&gt;I recently had to populate a DynamoDB table with over &lt;strong&gt;740,000&lt;/strong&gt; items as part of a migration project. I tried three different approaches to see what would give me the best mix of speed, cost, and operational sanity. I’m sharing what I learned here in case you’re facing the same question: &lt;strong&gt;“What’s the best way to load a large amount of data into DynamoDB?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spoiler: Step Functions with batched Lambda invocations is the winner — but the path to that conclusion was full of interesting surprises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before You Go Too Far...
&lt;/h2&gt;

&lt;p&gt;If your data is stored in &lt;strong&gt;S3&lt;/strong&gt; as a &lt;strong&gt;CSV&lt;/strong&gt; or &lt;strong&gt;JSON&lt;/strong&gt; file, and you're looking for a &lt;strong&gt;simple, no-code solution&lt;/strong&gt; to load it directly into DynamoDB, AWS offers an &lt;strong&gt;out-of-the-box option&lt;/strong&gt;. You can use the &lt;strong&gt;DynamoDB Data Import&lt;/strong&gt; feature from the S3 console to create a table and populate it from your S3 bucket with minimal effort.&lt;/p&gt;

&lt;p&gt;This feature is ideal if you don't need custom pipelines or complex data transformations. You simply upload your data, configure the table, and let DynamoDB handle the rest.&lt;/p&gt;

&lt;p&gt;For more details on this feature, check out the official documentation: &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataImport.HowItWorks.html" rel="noopener noreferrer"&gt;DynamoDB S3 Data Import&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If that fits your use case, this post might be more than you need — feel free to stop here.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I needed to insert &lt;strong&gt;740,000 items&lt;/strong&gt; into DynamoDB from a &lt;strong&gt;CSV file&lt;/strong&gt; as part of a migration. Each item was under 1KB, which meant write capacity usage stayed predictable and efficient. I tested:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An &lt;strong&gt;AWS Glue Job&lt;/strong&gt; using Apache Spark.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Step Function&lt;/strong&gt; with distributed map writing items directly via &lt;code&gt;PutItem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Step Function&lt;/strong&gt; batching 1,000 items per state, passed to a Lambda using &lt;code&gt;BatchWriteItem&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Time to Load 740K Items&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Glue Job&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~12 minutes&lt;/td&gt;
&lt;td&gt;~$2.10&lt;/td&gt;
&lt;td&gt;Good for large files &amp;amp; data transformation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Step Function + Direct DynamoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~120 minutes&lt;/td&gt;
&lt;td&gt;~$18.50&lt;/td&gt;
&lt;td&gt;Every item is a state transition — ouch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Step Function + Lambda Batches&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5 minutes&lt;/td&gt;
&lt;td&gt;~$1.78&lt;/td&gt;
&lt;td&gt;Fastest and cheapest with high configurability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Option 1: Glue Job
&lt;/h2&gt;

&lt;p&gt;Glue is great when you’re dealing with S3-based batch inputs or doing big ETL-style transformations. I used 10 Data Processing Units (DPUs), and the job finished in about &lt;strong&gt;12 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⏩ Fast &amp;amp; Scalable &lt;/li&gt;
&lt;li&gt;⚙️ Great for data transformation&lt;/li&gt;
&lt;li&gt;💰 Charged for a minimum run time of 1 minute&lt;/li&gt;
&lt;li&gt;🤓 Some Apache Spark knowledge required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a solid option for large datasets, especially if you already have some experience with Glue. Note that for smaller datasets you'll still be charged for at least 1 minute of processing time, even if the job finishes sooner.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 2: Step Function with Direct DynamoDB Writes
&lt;/h2&gt;

&lt;p&gt;This was by far the simplest implementation — just feed the items into a distributed map and call &lt;code&gt;PutItem&lt;/code&gt; on each one. You can also easily configure the concurrency of this step, but be careful about your DynamoDB table's write capacity, or you could get throttled.&lt;/p&gt;
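For reference, the direct-write approach looks roughly like this in Amazon States Language. This is a hedged sketch, not the original workflow: the table name, attribute names, and omitted `ItemReader` configuration are illustrative.

```json
{
  "Comment": "Sketch: distributed map calling PutItem once per item",
  "StartAt": "WriteItems",
  "States": {
    "WriteItems": {
      "Type": "Map",
      "MaxConcurrency": 20,
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "DISTRIBUTED", "ExecutionType": "STANDARD" },
        "StartAt": "PutItem",
        "States": {
          "PutItem": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
              "TableName": "MyTable",
              "Item": { "pk": { "S.$": "$.pk" } }
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

The `ItemReader` block (not shown) would point the distributed map at the CSV file in S3.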

&lt;p&gt;Unfortunately, there’s no simple way to batch the items up and use &lt;code&gt;BatchWriteItem&lt;/code&gt; directly. Serverless Land has created a workflow to do it, which you can find &lt;a href="https://serverlessland.com/workflows/batch-write-to-dynamodb" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;I tested a similar approach, but it was &lt;strong&gt;slower&lt;/strong&gt; for large datasets due to more state transitions. I imagine for a small enough dataset, this could be a great solution.&lt;/p&gt;

&lt;p&gt;However, with the large dataset in my case, it was &lt;strong&gt;painfully slow&lt;/strong&gt; and &lt;strong&gt;surprisingly expensive&lt;/strong&gt;. Even just loading the CSV file from S3 took a long time, plus I hit issues with the state payload input/output being too large.&lt;/p&gt;

&lt;p&gt;Summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🕓 Took more than &lt;strong&gt;2 hours&lt;/strong&gt; to write the data&lt;/li&gt;
&lt;li&gt;💸 Cost almost &lt;strong&gt;$20&lt;/strong&gt; because each item is a Step Function state transition ($0.025 per 1,000 transitions adds up quickly)&lt;/li&gt;
&lt;/ul&gt;
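That per-transition price compounds quickly. A quick back-of-the-envelope check, assuming the $0.025 per 1,000 transitions figure above and one transition per item (real executions add a few more for the map states themselves), lines up with the cost in the table:

```typescript
// Step Functions standard workflows bill per state transition.
// pricePerThousand uses the $0.025 per 1,000 transitions figure quoted above.
function transitionCost(transitions: number, pricePerThousand: number): number {
  return (transitions / 1000) * pricePerThousand;
}

// One PutItem state per item across 740K items:
const cost = transitionCost(740_000, 0.025);
console.log(cost.toFixed(2)); // 18.50 — matching the ~$18.50 in the table
```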

&lt;p&gt;Simple? Yes. Scalable? Not really.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 3: Step Function with Lambda Batches
&lt;/h2&gt;

&lt;p&gt;Here’s where things got good. I batched the input into &lt;strong&gt;chunks of 1,000 items&lt;/strong&gt; using Step Functions' distributed map, then handed each batch off to a Lambda. That Lambda used &lt;code&gt;BatchWriteItem&lt;/code&gt; in a loop to write in chunks of 25 (the max per batch write). I ran the Lambda task with a concurrency of 20, but you can adjust this based on your table's write capacity units.&lt;/p&gt;

&lt;p&gt;With this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 Completed in &lt;strong&gt;~5 minutes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;💵 Cost only &lt;strong&gt;$1.78 total&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🔁 Batch size of 1,000 kept SFN transitions and Lambda invocations low&lt;/li&gt;
&lt;li&gt;🛠 Full control over retries, unprocessed items, and logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Lambda had &lt;strong&gt;2GB of memory&lt;/strong&gt; and finished each batch in &lt;strong&gt;~100ms&lt;/strong&gt;. Total compute cost was therefore very small.&lt;/p&gt;
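The core of that Lambda can be sketched as below. This is a hedged sketch rather than the original code: `writeBatch` is an injected stand-in for a call to DynamoDB's `BatchWriteItem` via the AWS SDK, and it is expected to return any unprocessed items so they can be retried.

```typescript
type Item = { [key: string]: unknown };

// Split the incoming batch into chunks of 25, the BatchWriteItem limit.
function chunk(items: Item[], size: number): Item[][] {
  const out: Item[][] = [];
  let rest = items;
  while (rest.length > 0) {
    out.push(rest.slice(0, size));
    rest = rest.slice(size);
  }
  return out;
}

// Write all items, re-submitting anything the batch write reports as
// unprocessed (e.g. due to throttling). `writeBatch` is injected so the
// logic stays testable without touching AWS.
async function writeAll(
  items: Item[],
  writeBatch: (batch: Item[]) => Promise<Item[]>
): Promise<void> {
  for (const batch of chunk(items, 25)) {
    let pending = batch;
    while (pending.length > 0) {
      pending = await writeBatch(pending);
    }
  }
}
```

In the real handler, `writeBatch` would wrap the SDK's batch-write call and return the `UnprocessedItems` from the response, ideally with backoff between retries.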

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3e6yygrqk1q3gypxw8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3e6yygrqk1q3gypxw8l.png" alt="cost summary" width="178" height="181"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Take
&lt;/h2&gt;

&lt;p&gt;If you’re bulk-loading a DynamoDB table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Step Functions + Lambda + batch writes&lt;/strong&gt; if you want the best combo of speed, cost, and control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Glue&lt;/strong&gt; if your data is already in S3 or you need transformations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid direct DynamoDB tasks in Step Functions&lt;/strong&gt; unless your item count is small.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sweet spot seems to be letting Step Functions handle parallelism and letting Lambda do the writing in batches.&lt;/p&gt;

&lt;p&gt;This setup scaled cleanly to over 740K items with no issues and minimal cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DynamoDB Write Capacity Considerations
&lt;/h3&gt;

&lt;p&gt;In terms of DynamoDB write capacity, I ended up using &lt;strong&gt;on-demand&lt;/strong&gt; mode and just made sure I retried enough times when being throttled — and I &lt;em&gt;was&lt;/em&gt; throttled. If you want, you can set the table to have appropriate &lt;strong&gt;provisioned write capacity&lt;/strong&gt;. Here's a rough way to calculate the units:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Required WCUs = (Items per second) × (Item size in KB, rounded up)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, in my case, I am running 20 Lambdas concurrently, each writing batches of 25 items. Each batch write takes around &lt;strong&gt;100ms&lt;/strong&gt;. This means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Items per second = (25 items per batch ÷ 0.1 s per batch) × 20 concurrent Lambdas = 5,000 items per second
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, applying the formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Required WCUs = 5000 (items per second) × 1 (item size in KB) = 5000 WCUs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Therefore, a &lt;strong&gt;write capacity of 5000 WCUs&lt;/strong&gt; would have been ideal to avoid throttling.&lt;/p&gt;
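The same calculation can be captured in a tiny helper — a sketch of the formula above, with item sizes rounded up to the next whole KB as DynamoDB does when metering writes:

```typescript
// Required write capacity units: items written per second times the
// item size in KB, rounded up (DynamoDB meters writes in 1KB units).
function requiredWcus(itemsPerSecond: number, itemSizeKb: number): number {
  return itemsPerSecond * Math.ceil(itemSizeKb);
}

// 20 concurrent Lambdas, each writing a 25-item batch every ~100ms
// (25 items / 0.1 s = 250 items per second per Lambda):
const itemsPerSecond = 250 * 20;
console.log(requiredWcus(itemsPerSecond, 1)); // 5000
```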

&lt;p&gt;In my case, using &lt;strong&gt;on-demand&lt;/strong&gt; mode, I was throttled quite a bit at first, but things settled as the capacity scaled up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1677p9k8isdpoozvvccy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1677p9k8isdpoozvvccy.png" alt="write units and throttling" width="800" height="193"&gt;&lt;/a&gt;&lt;br&gt;
Note that I ran this in two batches of ~300K items.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Don’t underestimate the per-transition cost of Step Functions.&lt;/li&gt;
&lt;li&gt;DynamoDB is a beast.&lt;/li&gt;
&lt;li&gt;Batch everything.&lt;/li&gt;
&lt;li&gt;Lambda + Step Functions = powerful combo when tuned right.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hope this helps you avoid a few hours of experimenting. 🙂&lt;/p&gt;

&lt;p&gt;Stay tuned for a follow-up post, which will include all the code I used to achieve this!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Lessons Learned from Migrating to Amazon Cognito</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Fri, 08 Nov 2024 08:00:04 +0000</pubDate>
      <link>https://dev.to/aws-builders/lessons-learned-from-migrating-to-amazon-cognito-1ilg</link>
      <guid>https://dev.to/aws-builders/lessons-learned-from-migrating-to-amazon-cognito-1ilg</guid>
      <description>&lt;p&gt;Migrating a user authentication system to Amazon Cognito (or any provider) is a complex process that requires meticulous planning and attention to detail. Throughout our journey, we uncovered critical insights into configuration, best practices, and potential challenges. In this guide, we share the most valuable lessons learnt to help you navigate your Cognito migration smoothly and avoid common pitfalls.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. &lt;strong&gt;Configuration is Key: Get It Right the First Time&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Some settings in Amazon Cognito’s user pool configuration are set in stone once selected. This includes critical options like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sign-in Options&lt;/strong&gt; (e.g., email, phone number)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Username Case Sensitivity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Required Attributes&lt;/strong&gt; (such as phone numbers, email, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Lesson:
&lt;/h4&gt;

&lt;p&gt;These settings can’t be modified post-creation, so you’ll have to create a new user pool if changes are needed later. This can be disruptive, especially if you’ve already started your migration or have active users.&lt;/p&gt;

&lt;p&gt;For example, making a phone number a required attribute may seem harmless—until you encounter partners who can’t provide one. This led to a complete reset of our migration. If you’re uncertain about making an attribute required, consider leaving it optional to maintain flexibility.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. &lt;strong&gt;Mind the Hosted UI Domain Switch&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When using Cognito’s Hosted UI, Cognito will assign a default domain, which you can replace with a custom one later. This sounds simple enough, but it’s essential to be aware of how this switch affects user login behaviours. Many users rely on password managers, and any shift in the domain will cause them to lose their saved login autofills, leading to login difficulties and frustrated users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cautionary Tale:
&lt;/h4&gt;

&lt;p&gt;A migration was performed without addressing this, resulting in significant login issues. Without observability or tracking in place, the problem wasn’t apparent until trading numbers dropped, causing a scramble to troubleshoot. To avoid this, ensure your new Hosted UI matches your previous login flow’s domain to keep user login habits uninterrupted. Also, ensure you have monitoring and observability in place early so you can catch issues quickly. AWS provides a metric for login success rate so this is a good place to start.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. &lt;strong&gt;The Password Policy Paradox&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cognito offers robust control over password policies, from length to special characters, allowing for a highly secure environment. However, this flexibility can sometimes become a limitation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Points:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Password Requirements&lt;/strong&gt;: If you’re migrating users with passwords from a legacy system, know that Cognito’s password policy won’t retroactively enforce stricter requirements. Users with passwords not meeting the new criteria will still be allowed, so setting &lt;code&gt;RESET_REQUIRED&lt;/code&gt; on their status might be necessary to prompt a password change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Special Character Compatibility&lt;/strong&gt;: Cognito’s allowed characters don’t include symbols like &lt;code&gt;£&lt;/code&gt;, which can be problematic if users in legacy systems used them. During migration, mismatches in allowed characters can lead to login failures, forcing preemptive password changes on your users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Advice:
&lt;/h4&gt;

&lt;p&gt;To prepare for migration, align your legacy system’s sign-up/registration password policy with the intended Cognito password requirements as early as possible. This ensures that new passwords are migration-ready. Avoid enforcing password requirements during login to prevent access issues. Instead, set up monitoring in your legacy system to detect any passwords containing special characters not supported by Cognito. If a significant percentage of passwords conflict, consider prompting users to update their passwords before migration. Alternatively, you may issue temporary passwords for affected accounts as outlined below.&lt;/p&gt;
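One way to implement that monitoring is a simple compatibility check in the legacy sign-up or login path. The sketch below assumes Cognito accepts only printable ASCII characters (which excludes symbols such as £); verify the exact allowed character set against the current Cognito documentation before relying on it.

```typescript
// Flag passwords that may be rejected after migration to Cognito.
// Assumption: only printable ASCII (0x20–0x7E) is accepted; symbols
// like £ fall outside that range. Check the Cognito docs for the
// authoritative list before using this in production.
function isCognitoCompatible(password: string): boolean {
  return /^[\x20-\x7E]+$/.test(password);
}
```

Rather than blocking logins, you would record the result as a metric, giving you the percentage of legacy passwords that will conflict before you commit to a migration strategy.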




&lt;h3&gt;
  
  
  4. &lt;strong&gt;User Migration Strategies: Choosing the Right Path&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Migrating users from an existing system into Cognito can follow various strategies, each with its trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda-Based User Migration Trigger&lt;/strong&gt;: One of the most effective approaches, this trigger allows you to migrate users only when they log in. This method is seamless for active users, while inactive accounts remain untouched, reducing unnecessary migration. You will likely need to keep the trigger active for several months to capture most of the active user base. This method does not check the user's password against your Cognito password policy, so it will allow all users to be migrated. As mentioned above, you might want to enforce a password reset for these users. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Be careful with this trigger as the event includes the plaintext password, so we don't want to see any &lt;code&gt;logger.info(event)&lt;/code&gt; in this Lambda, or ever for that matter 😉.&lt;/p&gt;
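A hedged sketch of such a handler is below. The event fields (`request.password`, `response.userAttributes`, `finalUserStatus`, `messageAction`) follow the documented `UserMigration_Authentication` flow, but `authenticateAgainstLegacy` is a hypothetical function you would implement against your own legacy system — and note the logging caution above.

```typescript
type MigrationEvent = {
  triggerSource: string;
  userName: string;
  request: { password: string };
  response: {
    userAttributes?: { [key: string]: string };
    finalUserStatus?: string;
    messageAction?: string;
  };
};

// `authenticateAgainstLegacy` is a hypothetical stand-in for a call to
// your existing auth system; it returns user details on success or null.
async function handleUserMigration(
  event: MigrationEvent,
  authenticateAgainstLegacy: (user: string, pass: string) => Promise<{ email: string } | null>
): Promise<MigrationEvent> {
  // Never log `event` here — it contains the plaintext password.
  const legacyUser = await authenticateAgainstLegacy(event.userName, event.request.password);
  if (!legacyUser) {
    throw new Error("Bad credentials");
  }
  event.response.userAttributes = { email: legacyUser.email, email_verified: "true" };
  event.response.finalUserStatus = "CONFIRMED"; // or "RESET_REQUIRED" to force a change
  event.response.messageAction = "SUPPRESS";    // don't send a welcome message
  return event;
}
```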

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Temporary Passwords for Unmigrated Users&lt;/strong&gt;: For users who don’t log in during the migration period, consider issuing temporary passwords. This is an effective fallback that ensures no user is left behind, though it requires sending reset instructions and user follow-up.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Passwordless Authentication Options&lt;/strong&gt;: To reduce dependency on legacy password compatibility, explore passwordless methods such as one-time codes or magic links. These can simplify the user experience and reduce migration hurdles by minimising password management complexities. We explored this approach thoroughly but ultimately decided against it, as it altered the user experience too much.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5. &lt;strong&gt;Addressing Mobile App Complexity and Version Control&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Mobile app releases add a unique layer of complexity during migrations, particularly if the app’s authentication flow relies on the legacy system. Unlike web applications, which can deploy updates quickly, mobile app changes depend on users updating their app versions. This makes it essential to plan for backward compatibility. &lt;/p&gt;

&lt;p&gt;If you change authentication flows or UI domains, coordinate with mobile release schedules to ensure users on older versions aren’t locked out. Consider keeping the legacy system active until the majority of mobile users have updated, or providing fallback authentication methods for those on previous versions. A smooth transition often requires close collaboration with your mobile development team to minimise user disruption.&lt;/p&gt;




&lt;h3&gt;
  
  
  6. &lt;strong&gt;User Communication and Support Planning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Considering the potential impacts on user behaviour and login experiences, a solid communication strategy can ease the transition for users. You may want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send emails or in-app notifications ahead of time, informing users of upcoming changes.&lt;/li&gt;
&lt;li&gt;Prepare FAQs or help guides on password resets, updating saved passwords, or handling two-factor authentication.&lt;/li&gt;
&lt;li&gt;Brief your support teams on possible issues and train them to handle potential login and password-reset queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Providing clear, proactive communication can minimise user frustration and reduce the volume of support tickets that may arise during the transition.&lt;/p&gt;




&lt;h3&gt;
  
  
  In Summary
&lt;/h3&gt;

&lt;p&gt;Migrating to Amazon Cognito requires a precise configuration approach and a close understanding of how each setting affects the user experience. The key lessons from our experience are to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set configurations thoughtfully, as they’re often permanent.&lt;/li&gt;
&lt;li&gt;Align Hosted UI domains to maintain password manager compatibility.&lt;/li&gt;
&lt;li&gt;Establish strong but migration-friendly password policies.&lt;/li&gt;
&lt;li&gt;Prepare for mobile complexities by planning around app update cycles.&lt;/li&gt;
&lt;li&gt;Develop a user communication and support strategy to ease the transition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By applying these insights and setting up vigilant monitoring, you can minimise disruptions and ensure a successful migration to Amazon Cognito.&lt;/p&gt;

&lt;p&gt;For a more in-depth guide on Cognito user migration approaches, see this post from &lt;a href="https://theburningmonk.com/2024/02/whats-the-best-way-to-migrate-cognito-users-to-a-new-user-pool/" rel="noopener noreferrer"&gt;The Burning Monk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For an honest—though perhaps slightly biased—review of Cognito and the migration process, check out this article from &lt;a href="https://fusionauth.io/blog/how-to-migrate-from-cognito" rel="noopener noreferrer"&gt;FusionAuth&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>migration</category>
      <category>programming</category>
      <category>learning</category>
    </item>
    <item>
      <title>Building Auth0 Actions in TypeScript</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Thu, 05 Sep 2024 06:16:24 +0000</pubDate>
      <link>https://dev.to/emmamoinat/building-auth0-actions-in-typescript-20p0</link>
      <guid>https://dev.to/emmamoinat/building-auth0-actions-in-typescript-20p0</guid>
      <description>&lt;h2&gt;
  
  
  Auth0 Actions
&lt;/h2&gt;

&lt;p&gt;In 2022 Auth0 announced &lt;a href="https://auth0.com/blog/introducing-auth0-actions/" rel="noopener noreferrer"&gt;Auth0 Actions&lt;/a&gt; as the successor to both Auth0 Rules and Hooks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Auth0 Actions provides a unified view across secure, tenant-specific, self-contained functions that allow you to customize the behavior of Auth0. Each action is bound to a specific triggering event on the Auth0 platform, which executes custom code when that event is produced at runtime.&lt;br&gt;
&lt;a href="https://auth0.com/blog/actions-now-generally-available/" rel="noopener noreferrer"&gt;Auth0 blog post about GA of Actions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you have used or are still using Rules or Hooks you will likely have experienced their limitations. Here is a quick comparison of Hooks and Rules with Actions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw3cn5yyr24v0ert95hd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw3cn5yyr24v0ert95hd.png" alt="Comparison between Rules, Hooks and Actions" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It feels like Actions has it all! Actions offers a smooth developer experience, thanks to its intuitive drag-and-drop flow editor, powerful Monaco code editor and handy version control. Well done Auth0!&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure as Code (IaC)
&lt;/h2&gt;

&lt;p&gt;While online code editors and drag-and-drop flow editors are convenient, they often fall short when it comes to complex deployments. We need the ability to deploy Auth0 configurations repeatedly, perhaps to different tenants or accounts. Enter IaC.&lt;/p&gt;

&lt;p&gt;As this blog post is mostly about Actions and how to deploy them from TypeScript, we won't delve too deep into the pros and cons of different IaC providers. However, to mention a few of your options: if you love CDK, there is a great construct &lt;a href="https://constructs.dev/packages/@flit/cdk-auth0" rel="noopener noreferrer"&gt;here&lt;/a&gt;; otherwise, there is a Terraform provider &lt;a href="https://registry.terraform.io/providers/auth0/auth0/latest/docs" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In our case we are using the Terraform provider, but most of the following steps also apply to the CDK construct.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing An Action
&lt;/h3&gt;

&lt;p&gt;Let's take an example of the Post Login action, and imagine we want to deny someone access if their email is not verified. &lt;/p&gt;

&lt;h4&gt;
  
  
  Approach 1: Auth0 UI
&lt;/h4&gt;

&lt;p&gt;Editing in the UI you would create something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faof3oov4fn6ebtnror7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faof3oov4fn6ebtnror7s.png" alt="Post Login Action Editor" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the drag-and-drop workflow looking something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupgek8dsz13syw4ti83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupgek8dsz13syw4ti83.png" alt="Drag and drop workflow" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Approach 2: Terraform Inline Code
&lt;/h4&gt;

&lt;p&gt;We want to reproduce this configuration in our IaC. Out of the box you can write inline code as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "auth0_action" "post_login_action" {
  name    = "PostLoginAction"
  runtime = "node18"
  deploy  = true
  code    = &amp;lt;&amp;lt;-EOT
  /**
   * Handler that will be called during the execution of a PostLogin flow.
   *
   * @param {Event} event - Details about the user and the context in which they are logging in.
   * @param {PostLoginAPI} api - Interface whose methods can be used to change the behavior of the login.
   */
   exports.onExecutePostLogin = async (event, api) =&amp;gt; {
     if (!event.user.email_verified) {
        api.access.deny('Please verify your email address to continue.');
     }
   };
  EOT

  supported_triggers {
    id      = "post-login"
    version = "v3"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, inline code is not ideal for many reasons, some of these are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintainability&lt;/strong&gt;: As the codebase grows, maintaining large blocks of inline code within Terraform configurations can become cumbersome. It makes the Terraform files longer and harder to read, increasing the risk of errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Version Control&lt;/strong&gt;: Managing code across different environments is easier when it's stored in separate files with proper version control. Inline code makes it difficult to track changes independently from the Terraform configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reusability&lt;/strong&gt;: Inline code is tied to a specific Terraform resource, limiting its reuse across multiple resources or projects. External files can be imported wherever needed, promoting code reuse.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;: Inline code is harder to test in isolation. By keeping the code in separate files, you can easily run unit tests and other checks before deploying it through Terraform.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Type Safety and Tooling&lt;/strong&gt;: When using TypeScript or other languages that compile to JavaScript, you benefit from type checking, better editor support, and more robust development tools. Inline JavaScript in Terraform doesn't allow for this enhanced development workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using external files for your code resolves these issues by separating concerns, improving maintainability, and allowing better integration with development tools.&lt;/p&gt;

&lt;p&gt;Thankfully, Terraform offers a &lt;a href="https://developer.hashicorp.com/terraform/language/functions/file" rel="noopener noreferrer"&gt;&lt;code&gt;file&lt;/code&gt; helper&lt;/a&gt;: pass it a path and it reads the contents of that file in as a string. With it you could target a JavaScript file as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "auth0_action" "post_login_action" {
  name    = "PostLoginAction"
  runtime = "node18"
  deploy  = true
  code    = file("${path.module}/path/to/post-login-action.js")

  supported_triggers {
    id      = "post-login"
    version = "v3"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also a valid approach, but what we really want is to be able to write our code in TypeScript and have it transpile into the JavaScript we need. Enter &lt;a href="https://rollupjs.org/" rel="noopener noreferrer"&gt;Rollup&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TypeScript to JavaScript
&lt;/h2&gt;

&lt;p&gt;Auth0 Actions are very specific about how the code needs to look, so finding the right Rollup config (&lt;code&gt;rollup.config.js&lt;/code&gt;) took some time, but with perseverance we got there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import typescript from '@rollup/plugin-typescript';

export default {
  input: ["src/post-login-action.ts"],
  output: {
    strict: false,
    format: "cjs",
    dir: "dist",
  },
  external: [], // here you can add any external dependencies
  plugins: [
    typescript({ module: 'es6' })
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that you will of course need to install both &lt;code&gt;rollup&lt;/code&gt; and &lt;code&gt;@rollup/plugin-typescript&lt;/code&gt; using your package manager.&lt;/p&gt;

&lt;p&gt;Thanks to a &lt;a href="https://community.auth0.com/t/compiling-actions-from-multiple-source-files/76224" rel="noopener noreferrer"&gt;thread in the Auth0 Community for this config&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This now enables us to write (and test 😍) the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type PostLoginAPI = {
  access: { deny: (message: string) =&amp;gt; void };
};

type Event = {
  user: { email_verified: boolean };
  client: { name: string };
};

export const onExecutePostLogin = async (event: Event, api: PostLoginAPI) =&amp;gt; {
  if (!event.user.email_verified) {
    api.access.deny('Please verify your email address to continue.');
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, Auth0 currently lacks public type definitions for &lt;code&gt;Event&lt;/code&gt; and &lt;code&gt;PostLoginAPI&lt;/code&gt;, so I've implemented custom types. I hope Auth0 will release official types in the future, as they would greatly simplify this code and improve its type safety.&lt;/p&gt;
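&lt;p&gt;One of the big wins here is testability. As a quick sketch (the hand-rolled mock and the &lt;code&gt;test-app&lt;/code&gt; client name are purely illustrative, not part of any Auth0 SDK), the Action above could be unit tested like this:&lt;/p&gt;

```typescript
// A minimal sketch of how the Action could be unit tested with a
// hand-rolled mock of PostLoginAPI (no test framework assumed).
type PostLoginAPI = {
  access: { deny: (message: string) => void };
};

type Event = {
  user: { email_verified: boolean };
  client: { name: string };
};

const onExecutePostLogin = async (event: Event, api: PostLoginAPI) => {
  if (!event.user.email_verified) {
    api.access.deny('Please verify your email address to continue.');
  }
};

// Record every deny call so we can assert on it afterwards
const denied: string[] = [];
const mockApi: PostLoginAPI = { access: { deny: (m) => denied.push(m) } };

void onExecutePostLogin(
  { user: { email_verified: false }, client: { name: 'test-app' } },
  mockApi
);

console.log(denied); // the unverified user was denied
```

&lt;p&gt;Because the Action body has no awaits, the deny call happens synchronously, which keeps assertions simple.&lt;/p&gt;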

&lt;p&gt;Once your code is ready you can run &lt;code&gt;rollup -c&lt;/code&gt; to transpile your TS code. Finally, your Terraform resource definition would point to the &lt;code&gt;dist&lt;/code&gt; folder where the JS code is output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "auth0_action" "post_login_action" {
  name    = "PostLoginAction"
  runtime = "node18"
  deploy  = true
  code    = file("${path.module}/path/to/dist/post-login-action.js")

  supported_triggers {
    id      = "post-login"
    version = "v3"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So that's it: your TypeScript code is ready to be deployed as an Auth0 Action. &lt;/p&gt;

&lt;h2&gt;
  
  
  Considerations
&lt;/h2&gt;

&lt;p&gt;You need to ensure your Action code is built before you attempt a Terraform plan or apply. In our case we are using &lt;a href="https://terragrunt.gruntwork.io/" rel="noopener noreferrer"&gt;terragrunt&lt;/a&gt;, which has a helpful &lt;code&gt;before_hook&lt;/code&gt;, set up in the &lt;code&gt;terragrunt.hcl&lt;/code&gt; file as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform {
  ...

  before_hook "before_hook" {
    commands     = ["apply", "plan"]
    execute      = ["bash", "../path/to/pre-build.sh"]
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;pre-build.sh&lt;/code&gt; is a simple script that runs our Action's build command, in our case &lt;code&gt;npm run build&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There are other options out there for pre-plan or pre-apply hooks; it is not essential that you use terragrunt, although I do recommend checking it out. &lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In summary, Auth0 Actions bring a powerful upgrade to the way we customise and extend Auth0, replacing the older Rules and Hooks with a more flexible and unified platform. &lt;/p&gt;

&lt;p&gt;By leveraging TypeScript and tools like Rollup, we can maintain type safety and modular code while deploying through Infrastructure as Code (IaC) solutions like Terraform. This approach not only enhances our ability to manage complex deployments but also improves the overall developer experience. &lt;/p&gt;

&lt;p&gt;With Actions, you can efficiently create, test, and deploy secure, tenant-specific functions, making it easier to tailor Auth0 to your specific needs. If you have any questions, please leave a comment. &lt;br&gt;
Thanks for following along!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>typescript</category>
      <category>terraform</category>
      <category>node</category>
    </item>
    <item>
      <title>Cognito Inception: How to add Cognito as OIDC Identity Provider in Cognito</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Sat, 15 Jun 2024 08:19:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/cognito-inception-how-to-add-cognito-as-oidc-identity-provider-in-cognito-1bk1</link>
      <guid>https://dev.to/aws-builders/cognito-inception-how-to-add-cognito-as-oidc-identity-provider-in-cognito-1bk1</guid>
      <description>&lt;h2&gt;
  
  
  What?
&lt;/h2&gt;

&lt;p&gt;Amazon Cognito is an identity platform for web and mobile apps. With Amazon Cognito, you can authenticate and authorise users from a built-in user directory, from your enterprise directory, or from consumer identity providers like Google and Facebook.&lt;/p&gt;

&lt;p&gt;This post will look at how to set up AWS Cognito to use another Cognito user pool as an OpenID Connect (OIDC) identity provider.&lt;/p&gt;

&lt;p&gt;OpenID Connect (OIDC) is an authentication protocol built on top of OAuth 2.0. It lets a third-party application verify the identity of an end user via an Identity Provider (IDP). It complements OAuth 2.0, which is an authorisation protocol.&lt;/p&gt;

&lt;p&gt;In this case we are using Cognito as the IDP, but you could replace this with many other providers like Salesforce, GitHub or Azure AD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;You might wonder why you would want to integrate two Cognito user pools. 🤔&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42vkagobkpf637rmrnmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42vkagobkpf637rmrnmh.png" alt="Login screen with multiple identity providers" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this Cognito hosted UI login screen you can see various authentication options are offered, including an alternative Cognito user pool. There's even an option to log directly into this Cognito user pool, which is all configurable.&lt;/p&gt;

&lt;p&gt;Integrating two Cognito user pools can be beneficial if you have a product linked to a Cognito user pool and a customer who has their own Cognito user pool with their user base. This setup allows the customer's user base to access your product without needing to migrate users to your product's user pool.&lt;/p&gt;

&lt;p&gt;These two Cognito user pools can exist in different accounts and regions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not?
&lt;/h2&gt;

&lt;p&gt;I feel obliged to mention before you go any further with this setup that it will cost you!&lt;/p&gt;

&lt;p&gt;Cognito is generally known to be an inexpensive alternative to many other auth providers, one of the major benefits being a free tier of 50,000 monthly active users (MAUs) per account or per AWS organisation. However, this only applies to users who sign in directly to the user pool or through a social identity provider. So what about users who log in through OIDC federation, as in this example? Well... &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For users federated through SAML 2.0 or an OpenID Connect (OIDC) identity provider, Amazon Cognito user pools has a free tier of 50 MAUs per account or per AWS organization.&lt;br&gt;
For users who sign in through SAML or OIDC federation, the price for MAUs above the 50 MAU free tier is $0.015.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/cognito/pricing/" rel="noopener noreferrer"&gt;Cognito pricing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I would recommend doing a quick estimate of the cost of this approach for your use case. Head over to &lt;a href="https://calculator.aws/#/createCalculator/Cognito" rel="noopener noreferrer"&gt;AWS' pricing calculator&lt;/a&gt; before you continue any further as you may be surprised by the price. And if you never come back to this blog I will understand why! 😂 &lt;/p&gt;
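&lt;p&gt;To make that concrete, here is a back-of-the-envelope estimate based on the figures quoted above (prices may change, so treat the constants as a snapshot):&lt;/p&gt;

```typescript
// Rough monthly cost for OIDC/SAML-federated users, using the quoted
// pricing: 50 free MAUs, then $0.015 per MAU above the free tier.
const FREE_TIER_MAUS = 50;
const PRICE_PER_FEDERATED_MAU_USD = 0.015;

const monthlyCostUsd = (maus: number): number =>
  Math.max(0, maus - FREE_TIER_MAUS) * PRICE_PER_FEDERATED_MAU_USD;

console.log(monthlyCostUsd(10_000)); // ≈ 149.25 USD per month
```

&lt;p&gt;So even a modest user base of 10,000 federated MAUs lands at roughly $150 a month, which is why the estimate is worth doing up front.&lt;/p&gt;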

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztnxdtl019m18ficscv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztnxdtl019m18ficscv7.png" alt="Auth Flow between 2 Cognito Pools" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here you can see each step in the authentication process. This is a very standard flow when using an external OIDC provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's set it up
&lt;/h2&gt;

&lt;p&gt;To keep things clear, we'll refer to the Cognito with the user base as the "Customer user pool" and the other one as the "Product user pool". Our product will first interact with its own user pool (Product user pool) before being redirected to the Customer user pool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer User Pool
&lt;/h3&gt;

&lt;p&gt;In this tutorial we will look at how to set this up from A to Z but in reality the Customer user pool may already exist with its user base. In that case you may just need to create a new client in your existing customer user pool so you can skip some of the following steps.&lt;/p&gt;

&lt;p&gt;Let's first set up the Cognito user pool with the user base (i.e. the customer's user pool). &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Head to AWS Cognito and click &lt;code&gt;Create user pool&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;Provider types&lt;/code&gt; to be only &lt;code&gt;Cognito user pool&lt;/code&gt; and sign-in options to be whatever suits your use case (I chose email):&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnas1fut08oe9e2oi0tt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnas1fut08oe9e2oi0tt.png" alt="Sign-in config screenshot" width="800" height="690"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow through the next steps setting up your password policy, MFA, User account recovery and Sign-up experience as you desire.&lt;/li&gt;
&lt;li&gt;On the &lt;code&gt;Integrate your app&lt;/code&gt; page enter your desired user pool name.&lt;/li&gt;
&lt;li&gt;Tick &lt;code&gt;Use the Cognito Hosted UI&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9gz2m5j81mdgqioal87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9gz2m5j81mdgqioal87.png" alt="user pool config" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the domain setup you want; using a Cognito domain is fine if you don't have a custom domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vnkiev23cr5pevh9woe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vnkiev23cr5pevh9woe.png" alt="Domain config" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up the client app as follows:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtc4rmmzcr8xdjxs8za3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtc4rmmzcr8xdjxs8za3.png" alt="client settings" width="800" height="786"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice here I have generated a &lt;strong&gt;client secret&lt;/strong&gt; - in this case we need a secret to use this client later as an identity provider. If you don't include it at setup time then you will have to create a new client as this cannot be changed after creation. &lt;/p&gt;

&lt;p&gt;Also, for now I have entered a placeholder allowed callback URL of &lt;code&gt;https://example.com&lt;/code&gt;, but we will come back to change this later.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;Advanced app client settings&lt;/code&gt; you can leave everything as it is except adjust the scope as follows:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cx4egtdvennfqb7l1xk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cx4egtdvennfqb7l1xk.png" alt="scopes screenshot" width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review and create your user pool!&lt;/li&gt;
&lt;li&gt;Let's add a user to this customer's user base while we are still in the area. Keep note of the user's details as you will need them later.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Product User Pool
&lt;/h3&gt;

&lt;p&gt;Let's set up the "Product" Cognito user pool, i.e. the instance that your product will interact directly with. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Head to AWS Cognito and click &lt;code&gt;Create user pool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;On the &lt;code&gt;Configure sign-in experience&lt;/code&gt; screen select &lt;code&gt;Federated identity providers&lt;/code&gt; as an option and the sign-in options whatever suits you:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwr86hjee3lplb62c7nf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwr86hjee3lplb62c7nf.png" alt="Configure sign-in experience config screenshot" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;code&gt;Federated sign-in options&lt;/code&gt; tick &lt;code&gt;OpenID Connect (OIDC)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihh33xcv3gfmld5yuz53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihh33xcv3gfmld5yuz53.png" alt="Federated sign-in options config screenshot" width="800" height="585"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Follow through the next steps setting up your password policy, MFA, User account recovery and Sign-up experience as you desire.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next you will be presented with a &lt;code&gt;Connect federated identity providers&lt;/code&gt; screen - this is where the magic happens. Here fill in the client id and client secret from your &lt;strong&gt;customer's&lt;/strong&gt; user pool's app client. (i.e. the client app we created in the steps above)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You'll find those details by heading to the &lt;code&gt;App Integration&lt;/code&gt; tab of your Customer user pool and selecting the client you created:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl8pjzfdyhhuuo031cqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl8pjzfdyhhuuo031cqg.png" alt="Customer a's user pool's details" width="533" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter them as follows (where the provider name will be what is displayed to the user in the hosted UI later):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i682ahn2caprdlrtzsy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i682ahn2caprdlrtzsy.png" alt="Adding customer client details to product cognito user pool" width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep &lt;code&gt;Attribute request method&lt;/code&gt; as &lt;code&gt;GET&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set up the issuer URL, which will be: &lt;code&gt;https://cognito-idp.{region}.amazonaws.com/{customerUserPoolId}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add the &lt;code&gt;email&lt;/code&gt; attribute and &lt;code&gt;email_verified&lt;/code&gt; as shown here:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogxcpexy1jqtgmydqbnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogxcpexy1jqtgmydqbnz.png" alt="Federated config" width="800" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can add as many other attributes as you want or need here. Each attribute in one user pool will, logically, map to the matching attribute in the other user pool.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name your user pool, for example, product-user-pool.&lt;/li&gt;
&lt;li&gt;Set up your app client as you require. It is not necessary at this point to generate a client secret for this user pool. You can add one if you want, but I wouldn't recommend it if you plan to use this user pool in a web or mobile app.&lt;/li&gt;
&lt;li&gt;In the advanced settings, ensure the following:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set the &lt;code&gt;Identity providers&lt;/code&gt; to include your newly created IDP:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh3q748590afnd2ld3lo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh3q748590afnd2ld3lo.png" alt="idp config" width="700" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you do not want the user to be able to log in directly to your product user pool via the hosted UI, here you can remove the option of Cognito user pool and have the IDP as the only option.&lt;/p&gt;

&lt;p&gt;Set the scopes to match what we set in the other user pool and in the Identity Provider:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc9o95au639y5teq4y5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc9o95au639y5teq4y5o.png" alt="scope config" width="797" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review and create your second user pool!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One last step: we need to go to the Customer user pool and adjust the allowed callbacks for the client.&lt;/li&gt;
&lt;li&gt;Head to the &lt;code&gt;App integration&lt;/code&gt; tab and then click into your client and go to the hosted UI settings.&lt;/li&gt;
&lt;li&gt;Set the allowed callbacks to be the following:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;https://{productCognitoDomain}/oauth2/idpresponse&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F919dwi4e6et0e5ai4kb6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F919dwi4e6et0e5ai4kb6.png" alt="callback url config" width="719" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;Now if you head to the Hosted UI of the Product user pool you will see this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw7b16degummw8owzih9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw7b16degummw8owzih9.png" alt="Product user pool" width="761" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you click on the button to login to the customer's user pool you will see this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvfvb2vkisq118hxbong.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvfvb2vkisq118hxbong.png" alt="customer's user pool" width="410" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And if you look at the url you can see you are on the customer's user pool hosted UI. &lt;/p&gt;

&lt;p&gt;You can now log in with the details you set up earlier in the customer's user pool. You are then redirected to the product user pool's redirect url, authenticated and all. Magic. 🪄&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Thanks to Daniel Kim and his original post which you can read here &lt;a href="https://dev.to/namuny/using-cognito-user-pool-as-an-openid-connect-provider-4n9a"&gt;Using Cognito User Pool as an OpenID Connect Provider&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>tutorial</category>
      <category>security</category>
    </item>
    <item>
      <title>Lambda Persistent Storage with EFS using CDK</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Fri, 22 Dec 2023 17:56:57 +0000</pubDate>
      <link>https://dev.to/aws-builders/lambda-persistent-storage-with-efs-using-cdk-48ie</link>
      <guid>https://dev.to/aws-builders/lambda-persistent-storage-with-efs-using-cdk-48ie</guid>
      <description>&lt;p&gt;This tutorial is a quick run through how to set up persistent storage for a lambdas using CDK. You might wonder why you would want to do that but I will show you some use cases below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Elastic File System
&lt;/h2&gt;

&lt;p&gt;The AWS service I will be using for this is the Elastic File System (EFS). When setting up EFS you will need to choose a throughput mode and a performance mode. Your choice will depend on your use case so please take some time to consider what is best for you. Find more details &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html" rel="noopener noreferrer"&gt;here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example I am using the recommended throughput mode of &lt;code&gt;Elastic&lt;/code&gt; and the performance mode of &lt;code&gt;General Purpose&lt;/code&gt;. &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/managing-throughput.html" rel="noopener noreferrer"&gt;You can change the throughput mode later if really needed&lt;/a&gt; but performance mode changes would require migration so let's try to avoid that!&lt;/p&gt;

&lt;p&gt;Here we have our file system, which we are deploying inside our Virtual Private Cloud (VPC):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fileSystem = new FileSystem(this, "FileSystem", {
  vpc: vpc,
  performanceMode: PerformanceMode.GENERAL_PURPOSE,
  throughputMode: ThroughputMode.ELASTIC
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also set properties like encryption or removal policy so take time to consider what setup is best for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Access Point
&lt;/h2&gt;

&lt;p&gt;What we need now is an access point for our lambda to mount to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const accessPoint = fileSystem.addAccessPoint("EfsAccessPoint", {
  createAcl: {
    ownerGid: "1001",
    ownerUid: "1001",
    permissions: "750"
  },
  path: "/lambda",
  posixUser: {
    gid: "1001",
    uid: "1001"
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;path&lt;/code&gt; property above sets the path on the EFS file system to expose as the root directory to the client using this access point. If not set, it will default to &lt;code&gt;/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For more details on the posix user setup check &lt;a href="https://repost.aws/knowledge-center/efs-mount-with-lambda-function" rel="noopener noreferrer"&gt;this&lt;/a&gt; out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda Function
&lt;/h2&gt;

&lt;p&gt;We now have all we need to hook up a lambda function to EFS, so here is how we do that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new Function(this, "EfsLambdaFunction", {
  runtime: Runtime.NODEJS_20_X,
  code: Code.fromAsset("lambda-code"),
  handler: "index.handler",
  vpc: vpc, // lambda must be in the same VPC as the file system
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, "/mnt/some-folder")
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course the important line here is:&lt;br&gt;
&lt;code&gt;filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, "/mnt/some-folder")&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;This is setting the lambda's access point and the mount path within that access point. &lt;/p&gt;

&lt;p&gt;The mount path must start with &lt;code&gt;/mnt&lt;/code&gt; followed by a subfolder, but the subfolder can be anything you wish. This value of &lt;code&gt;/mnt/some-folder&lt;/code&gt; is very important to your lambda, as it is the only folder it can access; if you try to access any file outside the folder &lt;code&gt;/mnt/some-folder/&lt;/code&gt; you will get a permission denied error.&lt;/p&gt;

&lt;p&gt;It can be worth passing this mount path in as an environment variable so you don't have to hard-code it into the lambda's code; if you ever change the value, you won't have to touch the lambda itself. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const mountPath = "/mnt/some-folder";

new Function(this, "EfsLambdaFunction", {
  runtime: Runtime.NODEJS_20_X,
  code: Code.fromAsset("lambda-code"),
  handler: "index.handler",
  vpc: vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, mountPath),
  environment: {
    EFS_PATH: mountPath
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way you can access files in your lambda using this &lt;code&gt;EFS_PATH&lt;/code&gt; variable.&lt;/p&gt;
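&lt;p&gt;For example, a handler might use it as follows (a hedged sketch: the &lt;code&gt;cache.json&lt;/code&gt; file name is illustrative, and the tmpdir fallback only exists so the code can run locally, outside Lambda):&lt;/p&gt;

```typescript
import { promises as fs } from "fs";
import * as path from "path";
import * as os from "os";

// Resolve the working directory from the EFS_PATH environment variable
// set in the CDK code above, rather than hard-coding /mnt/some-folder.
const mountPath = process.env.EFS_PATH ?? os.tmpdir();

export const handler = async () => {
  // Write then read back a file on the mounted file system
  const file = path.join(mountPath, "cache.json"); // illustrative file name
  await fs.writeFile(file, JSON.stringify({ cachedAt: Date.now() }));
  const contents = await fs.readFile(file, "utf-8");
  return { statusCode: 200, body: contents };
};
```

&lt;p&gt;Inside Lambda, anything written under &lt;code&gt;EFS_PATH&lt;/code&gt; survives across invocations, which is exactly the persistence we were after.&lt;/p&gt;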

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Generative AI (Large Language Models)
&lt;/h3&gt;

&lt;p&gt;In my case, the reason I even considered this approach to lambda storage was on a recent Generative AI project. &lt;/p&gt;

&lt;p&gt;We wanted to add some Large Language Model guardrails to our project. This required pulling in a dependency, namely, &lt;a href="https://llm-guard.com/" rel="noopener noreferrer"&gt;LLM Guard&lt;/a&gt;, which needs to pull in around 2.5GB of data in order to run some checks against a range of AI models. &lt;/p&gt;

&lt;p&gt;Of course lambdas can handle 2.5GB with their ephemeral (temporary) storage, which can now be set as high as 10GB. The issue for us was really the performance. If the lambda goes cold and then a request comes in, the user would have to wait for the lambda to pull in all the models before getting an answer. This was a terrible user experience, taking around 2 minutes to give a response. Switching to EFS got this down to around 20 seconds, which is still not ideal but is a step in the right direction. &lt;/p&gt;

&lt;h3&gt;
  
  
  Other Possible Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large dependencies&lt;/strong&gt; - Sometimes pulling in large dependencies can actually cause a timeout in your lambda's init phase. A workaround is to install the dependencies on EFS so then the lambda doesn't need to install it each time. Here are a few walkthroughs of this from AWS:&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/" rel="noopener noreferrer"&gt;Node example&lt;/a&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/" rel="noopener noreferrer"&gt;Python example&lt;/a&gt;&lt;br&gt;
I might look into this for my own use case too!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Processing images or videos&lt;/strong&gt; - A common use case for lambdas, using EFS provides an efficient option to perform these tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zipping and unzipping large files&lt;/strong&gt; - Some workflows require large zip files for initialisation. With EFS your files can remain unzipped, ready to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Machine Learning workloads&lt;/strong&gt; - AI was already mentioned above in my own use case, but ML is worth adding to the list anyway. Many machine learning workloads depend on large reference files such as models or libraries. Storing these in EFS will help these tasks be much more performant!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Alternatives
&lt;/h2&gt;

&lt;p&gt;I feel it is worth mentioning that there are more options for lambda storage than EFS. Here is a comparison provided by AWS:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h0qee0z22el6sr5g9ov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h0qee0z22el6sr5g9ov.png" alt="Lambda storage comparison" width="780" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see S3 and lambda layers also mentioned here!&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Again
&lt;/h2&gt;

&lt;p&gt;Just for clarity, here is all the code thrown together in a simple stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {FileSystem, PerformanceMode, ThroughputMode} from "aws-cdk-lib/aws-efs";
import {Vpc} from "aws-cdk-lib/aws-ec2";
import {Runtime, Function, FileSystem as LambdaFileSystem, Code} from "aws-cdk-lib/aws-lambda";
import {Stack} from "aws-cdk-lib";
import {Construct} from "constructs";

export class EfsLambdaStack extends Stack {
  constructor(scope: Construct) {
    super(scope, "EfsLambdaStack");

    const vpc = new Vpc(this, "Vpc");

    const fileSystem = new FileSystem(this, "FileSystem", {
      vpc: vpc,
      performanceMode: PerformanceMode.GENERAL_PURPOSE,
      throughputMode: ThroughputMode.ELASTIC
    });

    const accessPoint = fileSystem.addAccessPoint("EfsAccessPoint", {
      createAcl: {
        ownerGid: "1001",
        ownerUid: "1001",
        permissions: "750"
      },
      path: "/lambda",
      posixUser: {
        gid: "1001",
        uid: "1001"
      }
    });

    const mountPath = "/mnt/some-folder";

    new Function(this, "EfsLambdaFunction", {
      runtime: Runtime.NODEJS_20_X,
      code: Code.fromAsset("lambda-code"),
      handler: "index.handler",
      vpc: vpc,
      filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, mountPath),
      environment: {
        EFS_PATH: mountPath
      }
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
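
&lt;p&gt;And to pair with the stack above, here is a hypothetical &lt;code&gt;index.handler&lt;/code&gt; showing the &lt;code&gt;EFS_PATH&lt;/code&gt; environment variable and mount path in use; it is just a sketch with an assumed event shape, not the code from my project:&lt;/p&gt;

```typescript
import * as fs from "fs";
import * as path from "path";

// Sketch of lambda-code/index.ts for the stack above. EFS_PATH comes from the
// Function's environment and points at the mounted access point; the event
// shape here is an assumption for illustration.
export async function handler(event: { fileName: string; body: string }) {
  const mountPath = process.env.EFS_PATH || "/mnt/some-folder";
  const filePath = path.join(mountPath, event.fileName);
  // Anything written under the mount path persists across invocations and is
  // shared by every Lambda instance attached to the file system.
  fs.writeFileSync(filePath, event.body);
  return { stored: filePath, bytes: fs.statSync(filePath).size };
}
```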



&lt;p&gt;Thanks for stopping by! Let me know your use cases for persistent lambda storage in the comments! 💁🏻‍♀️&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>tutorial</category>
      <category>typescript</category>
    </item>
    <item>
      <title>AWS Custom Resource using CDK</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Wed, 04 Oct 2023 08:16:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-custom-resource-using-cdk-387k</link>
      <guid>https://dev.to/aws-builders/aws-custom-resource-using-cdk-387k</guid>
      <description>&lt;p&gt;If you are new to AWS' Cloud Development Kit (CDK), here's a quick explanation of what exactly it is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The AWS Cloud Development Kit (CDK) is a software development framework that allows you to define and provision cloud infrastructure using familiar programming languages such as TypeScript, Python, and Java.&lt;/p&gt;

&lt;p&gt;Traditionally, infrastructure provisioning has been done using templates or scripts that are difficult to read and understand. CDK simplifies this process by allowing you to define your infrastructure in code, using the same programming constructs that you use to build applications.&lt;/p&gt;

&lt;p&gt;With CDK, you can define your infrastructure as a series of reusable components called "constructs." These constructs can be shared across your organization and easily reused in multiple projects.&lt;/p&gt;

&lt;p&gt;Overall, AWS CDK makes it easier and faster to build and manage cloud infrastructure, with less room for error and greater potential for reuse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks ChatGPT for that great explanation! 😂&lt;/p&gt;

&lt;p&gt;So, now that we have the basics of what CDK is and what it does for us, I want to look at building up a Custom Resource.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why build a Custom Resource?
&lt;/h2&gt;

&lt;p&gt;Custom Resources are a very useful feature of CDK that allow you to define and manage resources that are not available as a &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html" rel="noopener noreferrer"&gt;resource type in AWS CloudFormation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are a few different use cases for such a thing; these include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provisioning resources not supported by CDK&lt;/li&gt;
&lt;li&gt;Implementing custom logic/configuration&lt;/li&gt;
&lt;li&gt;Integrating with third-party services&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Using resource types for third-party resources provides you a way to reliably manage these resources using a single tool, without having to resort to time-consuming and error-prone methods like manual configuration or custom scripts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cloudformation-cli/latest/userguide/resource-types.html" rel="noopener noreferrer"&gt;AWS Documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using a Custom Resource allows us to automate the creation, deletion and updating of these resources across multiple environments as and when we need.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to build a Custom Resource?
&lt;/h2&gt;

&lt;p&gt;To build a Custom Resource we need 3 things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Lambda function for &lt;code&gt;onEvent&lt;/code&gt; handling &lt;code&gt;create&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt; and &lt;code&gt;update&lt;/code&gt; events.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;Provider&lt;/code&gt; which points the Custom Resource to the lambda.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;CustomResource&lt;/code&gt; itself with any props you need for your third party resource or otherwise.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also provide an additional Lambda function to handle &lt;code&gt;isComplete&lt;/code&gt;. &lt;br&gt;
This is used when the lifecycle operation cannot be completed immediately. &lt;br&gt;
The &lt;code&gt;isComplete&lt;/code&gt; handler will be retried asynchronously after &lt;code&gt;onEvent&lt;/code&gt; until it returns &lt;code&gt;{ IsComplete: true }&lt;/code&gt;, or until it times out.&lt;/p&gt;
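
&lt;p&gt;A minimal &lt;code&gt;isComplete&lt;/code&gt; handler might look like the sketch below; the readiness check is a stand-in for a real call to your service:&lt;/p&gt;

```typescript
// Stand-in for a real readiness check against your service or third-party
// API; in a real handler this would be an SDK or HTTP call (an assumption
// for illustration).
async function checkResourceReady(id: string) {
  return id !== "";
}

// Invoked repeatedly by the Provider after onEvent until it reports
// { IsComplete: true } or the operation times out.
export async function isCompleteHandler(event: any) {
  const ready = await checkResourceReady(event.PhysicalResourceId);
  return { IsComplete: ready };
}
```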
&lt;h3&gt;
  
  
  onEvent Lambda
&lt;/h3&gt;

&lt;p&gt;This function will be invoked for all resource lifecycle operations (Create/Update/Delete). &lt;/p&gt;

&lt;p&gt;Here is how this handler might look:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {
  CloudFormationCustomResourceCreateEvent,
  CloudFormationCustomResourceDeleteEvent,
  CloudFormationCustomResourceEvent,
  CloudFormationCustomResourceResponse,
  CloudFormationCustomResourceUpdateEvent,
} from "aws-lambda";

export const handler = async (event: CloudFormationCustomResourceEvent): Promise&amp;lt;CloudFormationCustomResourceResponse&amp;gt; =&amp;gt; {
  switch (event.RequestType) {
    case "Create":
      return await createSomeResource(event as CloudFormationCustomResourceCreateEvent);
    case "Update":
      return await updateSomeResource(event as CloudFormationCustomResourceUpdateEvent);
    case "Delete":
      return await deleteSomeResource(event as CloudFormationCustomResourceDeleteEvent);
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each of these cases, we want to respond accordingly.&lt;/p&gt;

&lt;p&gt;In other words, for a &lt;code&gt;Create&lt;/code&gt; event we would want to, well, create something. For &lt;code&gt;Update&lt;/code&gt; we might want to delete the existing item and create a new one if our use case doesn't offer a way to update the resource directly. For &lt;code&gt;Delete&lt;/code&gt;, we should be deleting our resource. Pretty straightforward.&lt;/p&gt;

&lt;p&gt;If you were, for example, using this Custom Resource to integrate with a third party service, you may want to make some specific API calls for creating, updating and deleting.&lt;/p&gt;
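
&lt;p&gt;For example, a &lt;code&gt;createSomeResource&lt;/code&gt; for a third-party integration could be sketched like this; the API call is stubbed and all names are illustrative:&lt;/p&gt;

```typescript
// Stub standing in for a real third-party API call (an assumption for
// illustration; a real handler would use the service's SDK or HTTP API).
async function createThirdPartyThing(props: any) {
  return { id: "thing-" + props.name };
}

export async function createSomeResource(event: any) {
  const created = await createThirdPartyThing(event.ResourceProperties);
  return {
    // PhysicalResourceId identifies this resource in later Update/Delete events.
    PhysicalResourceId: created.id,
    // Data becomes available via getAtt on the CustomResource.
    Data: { ThingId: created.id },
  };
}
```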

&lt;h4&gt;
  
  
  Understanding Lifecycle Events
&lt;/h4&gt;

&lt;p&gt;It is important to understand how these events will be handled by CloudFormation.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;onEvent&lt;/code&gt; returns successfully, CloudFormation will show you the nice green tick of &lt;code&gt;CREATE_COMPLETE&lt;/code&gt; for the &lt;code&gt;CustomResource&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, if &lt;code&gt;onEvent&lt;/code&gt; throws an error, CloudFormation will let you know something went wrong and the CDK deploy will fail.&lt;/p&gt;

&lt;p&gt;There are some important cases to think about when errors occur:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you need to tidy up some other resources if some step fails in your create flow?&lt;/li&gt;
&lt;li&gt;What happens if you hit the delete event but there is nothing to delete?&lt;/li&gt;
&lt;/ul&gt;
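
&lt;p&gt;That second case can be handled with an idempotent delete. Here is a sketch; the 404-style error handling is an assumption about how your service reports a missing resource:&lt;/p&gt;

```typescript
// Stub for a third-party delete call: throws a 404-style error when the
// resource does not exist (an assumption for illustration).
async function deleteThing(id: string) {
  if (id === "already-gone") {
    const err: any = new Error("not found");
    err.statusCode = 404;
    throw err;
  }
}

export async function deleteSomeResource(event: any) {
  try {
    await deleteThing(event.PhysicalResourceId);
  } catch (err: any) {
    // Swallow "not found" so CloudFormation can finish deleting or rolling
    // back even when there is nothing left to delete; rethrow anything else.
    if (err.statusCode !== 404) throw err;
  }
  return { PhysicalResourceId: event.PhysicalResourceId };
}
```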

&lt;p&gt;It should be noted that if a &lt;code&gt;Delete&lt;/code&gt; event fails, CloudFormation will simply abandon that resource and move on.&lt;/p&gt;

&lt;p&gt;You can find more detail on &lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources-readme.html#important-cases-to-handle" rel="noopener noreferrer"&gt;these cases here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Lambda in CDK
&lt;/h4&gt;

&lt;p&gt;A super simple way to declare this Lambda in CDK is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readonly onEventHandlerFunction = new NodejsFunction(this, "CustomResourceOnEventHandlerFunction", {
  timeout: Duration.seconds(30),
  runtime: Runtime.NODEJS_18_X,
  entry: "/path/to/CustomResourceOnEventHandler.ts"
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-lambda-nodejs.NodejsFunction.html" rel="noopener noreferrer"&gt;NodejsFunction&lt;/a&gt; creates a Node.js Lambda function bundled using &lt;code&gt;esbuild&lt;/code&gt;. This means you can directly pass in your TypeScript file. Cool, right?&lt;/p&gt;

&lt;h3&gt;
  
  
  Provider
&lt;/h3&gt;

&lt;p&gt;We just need to create a &lt;code&gt;Provider&lt;/code&gt; (from &lt;code&gt;aws-cdk-lib/custom-resources&lt;/code&gt;) that points to the above function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readonly customResourceProvider = new Provider(this, "CustomResourceProvider", {
  onEventHandler: this.onEventHandlerFunction,
  logRetention: RetentionDays.ONE_DAY
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Custom Resource
&lt;/h3&gt;

&lt;p&gt;Finally, we just need to tell the Custom Resource who its provider is and give it a type starting with &lt;code&gt;Custom::&lt;/code&gt; :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readonly resource = new CustomResource(this, "YourCustomResource", {
  serviceToken: this.customResourceProvider.serviceToken,
  properties: {...this.props, id: this.id},
  resourceType: "Custom::YourCustomResource",
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;Bringing these 3 things together, you will create a class similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {Construct} from "constructs";
import {Provider} from "aws-cdk-lib/custom-resources";
import {RetentionDays} from "aws-cdk-lib/aws-logs";
import {CustomResource, Duration} from "aws-cdk-lib";
import {NodejsFunction} from "aws-cdk-lib/aws-lambda-nodejs";
import {SomeProps} from "../models/SomeProps";
import {Runtime} from "aws-cdk-lib/aws-lambda";

export class YourCustomResource extends Construct {
  constructor(private scope: Construct, private id: string, private props: Omit&amp;lt;SomeProps, "id"&amp;gt;) {
    super(scope, id);
  }

  readonly onEventHandlerFunction = new NodejsFunction(this, "CustomResourceOnEventHandlerFunction", {
    timeout: Duration.seconds(30),
    runtime: Runtime.NODEJS_18_X,
    entry: "/path/to/CustomResourceOnEventHandler.ts"
  });

  readonly customResourceProvider = new Provider(this, "CustomResourceProvider", {
    onEventHandler: this.onEventHandlerFunction,
    logRetention: RetentionDays.ONE_DAY
  });

  readonly resource = new CustomResource(this, "YourCustomResource", {
    serviceToken: this.customResourceProvider.serviceToken,
    properties: {...this.props, id: this.id},
    resourceType: "Custom::YourCustomResource",
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see here that we are passing all the props from &lt;code&gt;YourCustomResource&lt;/code&gt; through to the Custom Resource.&lt;/p&gt;

&lt;p&gt;In the example of making API calls using the Custom Resource we might need something like an API Key to be passed through from our CDK stack into the resource.&lt;/p&gt;

&lt;p&gt;Using this &lt;code&gt;YourCustomResource&lt;/code&gt; class you can now build up something like this in one of your CDK Stacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readonly exampleCustomResource = new YourCustomResource(this, "YourCustomResourceExample", {
    enabled: true,
    apiKey: "api-key", // This prop could differ per environment
    name: "Example Custom Resource"
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your CDK diff for this stack would look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmb78zdwaxy41vf7c8oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmb78zdwaxy41vf7c8oi.png" alt="CDK diff for Custom Resource" width="273" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we understand how to build up the infrastructure for this Custom Resource, we just need to determine how the &lt;code&gt;onEvent&lt;/code&gt; Lambda will handle each of the lifecycle events. &lt;/p&gt;

&lt;p&gt;This depends on your use case but the possibilities are endless really! &lt;/p&gt;

&lt;p&gt;This part I will leave up to you as it is very specific per scenario, but I hope this has helped you on your journey of getting a &lt;code&gt;CustomResource&lt;/code&gt; up and running.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>tutorial</category>
      <category>cloud</category>
      <category>typescript</category>
    </item>
    <item>
      <title>What's Next for CDK? 👀</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Mon, 02 Oct 2023 11:48:43 +0000</pubDate>
      <link>https://dev.to/aws-builders/whats-next-for-cdk-2580</link>
      <guid>https://dev.to/aws-builders/whats-next-for-cdk-2580</guid>
      <description>&lt;p&gt;Last week we had the yearly CDK Day where we saw many talks from all around the world. If you missed it you can find the schedule and links to all the talks &lt;a href="https://www.cdkday.com/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the talks was from the CDK team where they talked about the recent improvements of CDK and what is coming next. You can find the talk here:&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=qlUR5jVBC6c&amp;amp;t=9830s" rel="noopener noreferrer"&gt;Meet the CDK team&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Recent Improvements
&lt;/h2&gt;

&lt;p&gt;Some enhancements from this year were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Improved L2 Construct Coverage ⛱️&lt;/li&gt;
&lt;li&gt;Policy Validation at Synth Time 🫶&lt;/li&gt;
&lt;li&gt;Improved Permissions Boundaries via Bootstrap 🔐&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/app-staging-synthesizer-alpha-readme.html" rel="noopener noreferrer"&gt;App Staging Synthesizer&lt;/a&gt; 🎛️&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  New Channel
&lt;/h3&gt;

&lt;p&gt;The CDK team is launching a new YouTube channel called &lt;a href="https://youtube.com/@CDK-Live" rel="noopener noreferrer"&gt;CDK Live&lt;/a&gt;. This will not only be about hearing from AWS but also hearing from the community. &lt;/p&gt;

&lt;p&gt;This will become the go-to source for anything related to AWS CDK. The channel will be a blend of tutorials, deep dives and interviews with industry experts. 🆒&lt;/p&gt;

&lt;h3&gt;
  
  
  Improvements
&lt;/h3&gt;

&lt;p&gt;From listening to their users, the CDK team have flagged the areas they wish to improve by the end of this year and beyond.&lt;/p&gt;

&lt;h4&gt;
  
  
  Speed
&lt;/h4&gt;

&lt;p&gt;CDK is slow; on this we can all agree. The slowness is mostly caused by CloudFormation. &lt;/p&gt;

&lt;p&gt;You'll be glad to hear that the CloudFormation and the CDK teams are working together to improve this. &lt;/p&gt;

&lt;p&gt;According to the CDK team, we can expect some big performance improvements before the end of the year. 🎉&lt;/p&gt;

&lt;h4&gt;
  
  
  Migration
&lt;/h4&gt;

&lt;p&gt;If you have existing cloud resources it is currently quite challenging to shift these resources into CDK.&lt;/p&gt;

&lt;p&gt;The CDK team will be introducing a &lt;code&gt;cdk migrate&lt;/code&gt; capability.&lt;/p&gt;

&lt;p&gt;The basic principle here is to take a CloudFormation template and autogenerate some CDK code.&lt;/p&gt;

&lt;p&gt;This is actually &lt;a href="https://github.com/aws/aws-cdk/commit/3f1f974b1c17003e1cb8c7a39eb6ef64bfe9a06a" rel="noopener noreferrer"&gt;already available&lt;/a&gt; but is still an experimental feature so there are no guarantees about the outcome or stability of the functionality.&lt;/p&gt;

&lt;h4&gt;
  
  
  Refactoring
&lt;/h4&gt;

&lt;p&gt;This was the most voted &lt;a href="https://github.com/aws/aws-cdk-rfcs/issues/162" rel="noopener noreferrer"&gt;RFC&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;As developers, we want to be able to refactor the CDK code and for CDK to be smart enough to not just recreate resources from scratch but to relocate them, for example, into another stack.&lt;/p&gt;

&lt;p&gt;This will potentially be available officially at the start of next year. 🤞&lt;/p&gt;




&lt;p&gt;These are some of the main areas the CDK team will be focusing on in the upcoming months but of course, as always, there will be improvements for construct coverage, and many other enhancements.&lt;/p&gt;

&lt;p&gt;I hope you are also excited for what is to come next for CDK.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloud</category>
      <category>community</category>
    </item>
    <item>
      <title>AWS CDK: Principle of Least Privilege</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Fri, 29 Sep 2023 14:51:10 +0000</pubDate>
      <link>https://dev.to/aws-builders/cdk-principle-of-least-privilege-45i4</link>
      <guid>https://dev.to/aws-builders/cdk-principle-of-least-privilege-45i4</guid>
      <description>&lt;h2&gt;
  
  
  CDK Deploy
&lt;/h2&gt;

&lt;p&gt;AWS CDK creates multiple roles at bootstrap time which allow CDK to deploy infrastructure on your behalf using a &lt;strong&gt;CloudFormation&lt;/strong&gt; deployment. This can be kicked off by a developer or by an automated system like a pipeline. &lt;/p&gt;

&lt;p&gt;In order to deploy, the actor needs the permissions to assume the created CDK roles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/cdk-*"
            ]
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this permission you have everything you need to deploy your stacks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bootstrap
&lt;/h2&gt;

&lt;p&gt;This post is meant to address the permissions required for &lt;code&gt;cdk deploy&lt;/code&gt;; however, I think it is worth looking at the permissions required to bootstrap an environment. Of course, this step is only required once per account for each region, so once you have completed it, you won't need these permissions again.&lt;/p&gt;

&lt;p&gt;The permissions are as follows:&lt;/p&gt;

&lt;p&gt;Permissions to deploy the CDK bootstrap CloudFormation stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Effect": "Allow",
    "Action": [
        "cloudformation:CreateChangeSet",
        "cloudformation:DeleteChangeSet",
        "cloudformation:DeleteStack",
        "cloudformation:DescribeChangeSet",
        "cloudformation:DescribeStacks",
        "cloudformation:DescribeStackEvents",
        "cloudformation:ExecuteChangeSet",
        "cloudformation:GetTemplate"
    ],
    "Resource": [
        "arn:aws:cloudformation:eu-west-1:112233445566:stack/CDKToolkit/*"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permissions to create the CDK bootstrap roles we mentioned above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Effect": "Allow",
    "Action": [
        "iam:AttachRolePolicy",
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:DeleteRolePolicy",
        "iam:DetachRolePolicy",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:PutRolePolicy",
        "iam:TagRole"
    ],
    "Resource": [
        "arn:aws:iam::112233445566:role/cdk-hnb659fds-cfn-exec-role-*",
        "arn:aws:iam::112233445566:role/cdk-hnb659fds-file-publishing-role-*",
        "arn:aws:iam::112233445566:role/cdk-hnb659fds-image-publishing-role-*",
        "arn:aws:iam::112233445566:role/cdk-hnb659fds-lookup-role-*",
        "arn:aws:iam::112233445566:role/cdk-hnb659fds-deploy-role-*"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permissions to create the CDK bootstrap bucket. CDK uses this bucket to stage S3 assets in your application, such as Lambda function bundles and static assets in your frontend applications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Effect": "Allow",
    "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucketPolicy",
        "s3:GetEncryptionConfiguration",
        "s3:GetBucketPolicy",
        "s3:PutBucketPolicy",
        "s3:PutBucketVersioning",
        "s3:PutEncryptionConfiguration",
        "s3:PutLifecycleConfiguration",
        "s3:PutBucketPublicAccessBlock"
    ],
    "Resource": [
        "arn:aws:s3:::cdk-hnb659fds-assets-*"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permissions to create the CDK bootstrap ECR repository. CDK uses this repository to stage Docker images in your application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Effect": "Allow",
    "Action": [
        "ecr:CreateRepository",
        "ecr:DeleteRepository",
        "ecr:DescribeRepositories",
        "ecr:PutLifecyclePolicy",
        "ecr:SetRepositoryPolicy"
    ],
    "Resource": [
        "arn:aws:ecr:eu-west-1:112233445566:repository/cdk-hnb659fds-container-assets-*"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permissions to create the CDK bootstrap version SSM parameter. The parameter stores the version of the deployed CDK bootstrap stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Effect": "Allow",
    "Action": [
        "ssm:DeleteParameter",
        "ssm:GetParameters",
        "ssm:PutParameter"
    ],
    "Resource": [
        "arn:aws:ssm:eu-west-1:112233445566:parameter/cdk-bootstrap/hnb659fds/version"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Execution Role Issues
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;exec&lt;/code&gt; role that CDK creates has caused some problems for users, as this role defaults to &lt;code&gt;AdministratorAccess&lt;/code&gt;. For some users this is simply not acceptable for security reasons, understandably. &lt;/p&gt;

&lt;p&gt;There is an option at bootstrap time to instead pass the ARNs of &lt;strong&gt;managed&lt;/strong&gt; policies:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;--cloudformation-execution-policies specifies the ARNs of managed policies that should be attached to the deployment role assumed by AWS CloudFormation during deployment of your stacks. By default, stacks are deployed with full administrator permissions using the AdministratorAccess policy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--cloudformation-execution-policies "arn:aws:iam::aws:policy/AWSLambda_FullAccess,arn:aws:iam::aws:policy/AWSCodeDeployFullAccess"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;If you use your pipeline to also run some post-deployment tests, or maybe to inject some config into your deployed infrastructure, you will need to carefully consider all the permissions you need for those steps. &lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Experience
&lt;/h3&gt;

&lt;p&gt;As a developer, you do not want to be continually denied permission to do the things you need to do for your job. If your permissions are locked down too tightly, it can hinder your productivity. You may just need to be patient until you find the right balance between least privilege and productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Least Privilege Principle
&lt;/h2&gt;

&lt;p&gt;It can take time and patience to follow this principle, but for security reasons it is always worth sticking with it until you find that balance. If you use AWS Organizations within your company, you can save some time by finding the balance once for your developers and then applying it across the board.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>cloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>EC2 Charges — They are NAT a joke</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Wed, 29 Mar 2023 11:31:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/ec2-charges-they-are-nat-a-joke-7g7</link>
      <guid>https://dev.to/aws-builders/ec2-charges-they-are-nat-a-joke-7g7</guid>
      <description>&lt;p&gt;How often do you check your cloud provider bill and costs? Do you know what your biggest costs in the cloud are? Have you checked if all the costs are justified?&lt;/p&gt;

&lt;p&gt;This story begins with me doing just that...&lt;/p&gt;

&lt;p&gt;I was investigating a project that is running in AWS. One of the biggest costs is &lt;strong&gt;Elastic Compute Cloud&lt;/strong&gt; (EC2), which accounts for nearly 70% of the total bill:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhfiwvt3k0bkj723kfci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhfiwvt3k0bkj723kfci.png" alt="AWS bill" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;However, when I was having a look at these &lt;code&gt;EC2&lt;/code&gt; costs in more detail, there was an extra &lt;code&gt;EC2-Other&lt;/code&gt; service which accounted for a hefty amount of the total &lt;code&gt;EC2&lt;/code&gt; costs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jzjeea68h6jhhd8pe4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jzjeea68h6jhhd8pe4w.png" alt="EC2 bill" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  EC2-Other — sorry, what? 😐
&lt;/h2&gt;

&lt;p&gt;After some research into this &lt;code&gt;EC2-Other&lt;/code&gt; service, it sounded almost like an expected cost, just part and parcel of using &lt;code&gt;EC2&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The EC2-Other category includes multiple service-related usage types, tracking costs associated [with] Amazon EBS volumes and snapshots, elastic IP addresses, NAT gateways, data transfer, and more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;– &lt;a href="https://aws.amazon.com/blogs/aws-cloud-financial-management/tips-and-tricks-for-exploring-your-data-in-aws-cost-explorer-part-2/" rel="noopener noreferrer"&gt;EC2 Cost Explorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, for a moment I stopped looking into these costs and just assumed they were normal.&lt;/p&gt;

&lt;h2&gt;
  
  
  EC2-Other — sorry, no. ✋
&lt;/h2&gt;

&lt;p&gt;A few days had passed, and I was still investigating that same project. This time I was looking specifically at a monthly bill breakdown. This was when I noticed something odd:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5msetq3m9hyhx9fujf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5msetq3m9hyhx9fujf9.png" alt="NAT Gateway charges" width="800" height="203"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Did you spot it? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;52,664 GB&lt;/strong&gt; (~53 TB) of data being processed through our &lt;strong&gt;NAT gateway&lt;/strong&gt; — that is suspicious. We would expect to see some data being transferred through the NAT gateway but that amount is absurd.&lt;/p&gt;

&lt;p&gt;My investigation began, and, well, it did not take long to find an answer.&lt;/p&gt;

&lt;p&gt;I determined that every time a container in &lt;code&gt;EC2&lt;/code&gt; pulled an image from &lt;code&gt;ECR&lt;/code&gt; (Elastic Container Registry), it was transferred through our &lt;strong&gt;NAT gateway&lt;/strong&gt;. Every file transferred between &lt;code&gt;S3&lt;/code&gt; and a container was going through the gateway. Even our logs being sent from any container to &lt;code&gt;CloudWatch&lt;/code&gt; went through the gateway…&lt;/p&gt;

&lt;p&gt;Now the costs make a little more sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  EC2-Other — sorry, bye 👋
&lt;/h2&gt;

&lt;p&gt;Do not panic — there is a solution — enter &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/vpc-endpoints.html" rel="noopener noreferrer"&gt;VPC Endpoints&lt;/a&gt;, also known as &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/concepts.html" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can use AWS PrivateLink to connect the resources in your VPC to services using private IP addresses, as if those services were hosted directly in your VPC.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;– &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/concepts.html" rel="noopener noreferrer"&gt;AWS PrivateLink concepts — Amazon Virtual Private Cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you make infrastructure changes through the console, follow this tutorial on how to set these endpoints up for S3, ECR and logs: &lt;a href="https://www.easydeploy.io/blog/how-to-create-private-link-for-ecr-to-ecs-containers-to-save-nat-gatewayec2-other-charges/" rel="noopener noreferrer"&gt;Create Private Links via Console&lt;/a&gt;, or this one provided by AWS: &lt;a href="https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/" rel="noopener noreferrer"&gt;New VPC Endpoint for S3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, if you use &lt;code&gt;CDK&lt;/code&gt; to deploy your infrastructure, like a hero, then here is how to set up your three new VPC endpoints. I am using Kotlin, but if you use TypeScript or another language it shouldn’t be too hard to adjust.&lt;/p&gt;

&lt;h2&gt;
  
  
  CDK (In Kotlin)
&lt;/h2&gt;

&lt;p&gt;Here are your imports for this setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import software.amazon.awscdk.services.ec2.GatewayVpcEndpoint
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointProps
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointAwsService
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpoint
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointProps
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointService
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll need to have your VPC available in this stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gateway Endpoint
&lt;/h2&gt;

&lt;p&gt;Firstly here is your Gateway Endpoint for S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private val s3VpcEndpoint = GatewayVpcEndpoint(
  scope,
  "your-s3-endpoint",
  GatewayVpcEndpointProps.builder()
    .vpc(vpc)
    .service(GatewayVpcEndpointAwsService.S3)
    .build()
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the time of writing, &lt;strong&gt;gateway endpoints&lt;/strong&gt; are only available for S3 and DynamoDB — for any other service you must use an &lt;strong&gt;Interface Endpoint&lt;/strong&gt;. Gateway endpoints for S3 are offered at no cost, and the routes are managed through route tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interface Endpoint
&lt;/h2&gt;

&lt;p&gt;Interface endpoints are priced at around $0.01 per AZ per hour, and data processed through them is charged at around $0.01 per GB. Both rates depend on the Region, so check &lt;a href="https://aws.amazon.com/privatelink/pricing/" rel="noopener noreferrer"&gt;current pricing&lt;/a&gt;.&lt;/p&gt;
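
&lt;p&gt;To put that pricing in context, here is a rough monthly estimate. The rates are the illustrative $0.01 figures from above and the endpoint counts are assumptions, so check the pricing page for your own Region:&lt;/p&gt;

```typescript
// Rough monthly fixed-plus-data cost of interface endpoints.
// Rates are the illustrative $0.01 figures; they vary by Region.
function endpointMonthlyCost(endpoints: number, azs: number, gbProcessed: number): number {
  const hourlyRate = 0.01;     // per endpoint, per AZ, per hour
  const perGbRate = 0.01;      // data processing
  const hoursPerMonth = 730;   // approximate
  return endpoints * azs * hourlyRate * hoursPerMonth + gbProcessed * perGbRate;
}

// Two interface endpoints (ecr.dkr and logs) across 3 AZs, 500 GB processed:
console.log(endpointMonthlyCost(2, 3, 500)); // roughly 48.8
```

&lt;p&gt;Even the ~53 TB from the bill above would be roughly $527 in endpoint data processing, far less than pushing the same traffic through the NAT gateway.&lt;/p&gt;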

&lt;p&gt;For the Interface Endpoint for &lt;strong&gt;ECR&lt;/strong&gt; you need to have your security groups available in this stack.&lt;/p&gt;

&lt;p&gt;It is important to include the correct security groups here, or your containers may not be able to pull from ECR.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private val ecrVpcEndpoint = InterfaceVpcEndpoint(
  scope,
  "your-ecr-endpoint",
  InterfaceVpcEndpointProps.builder()
.service(InterfaceVpcEndpointService("com.amazonaws.$yourRegion.ecr.dkr"))
    .vpc(vpc)
    .privateDnsEnabled(true)
    .securityGroups(
      listOf(exampleSecurityGroup)
    )
    .build()
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
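
&lt;p&gt;One gotcha worth knowing: AWS documents that private image pulls use more than just &lt;code&gt;ecr.dkr&lt;/code&gt;. There is also an &lt;code&gt;ecr.api&lt;/code&gt; interface endpoint, and the image layers themselves come from S3, which the gateway endpoint above covers. The service names all follow a predictable pattern; here is a small sketch (the helper and Region are hypothetical):&lt;/p&gt;

```typescript
// Hypothetical helper: build the fully qualified VPC endpoint service
// name for a Region. Names follow the pattern com.amazonaws.REGION.SERVICE.
function endpointServiceName(region: string, service: string): string {
  return "com.amazonaws." + region + "." + service;
}

const region = "eu-west-1"; // example Region
const required = ["ecr.api", "ecr.dkr", "logs"].map(function (s) {
  return endpointServiceName(region, s);
});
console.log(required);
// ["com.amazonaws.eu-west-1.ecr.api", "com.amazonaws.eu-west-1.ecr.dkr", "com.amazonaws.eu-west-1.logs"]
```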



&lt;p&gt;Then for the Interface Endpoint for &lt;strong&gt;Logs&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private val logsVpcEndpoint = InterfaceVpcEndpoint(
  scope,
  "your-logs-endpoint",
  InterfaceVpcEndpointProps.builder()
    .service(InterfaceVpcEndpointService("com.amazonaws.$yourRegion.logs"))
    .vpc(vpc)
    .privateDnsEnabled(true)
    .securityGroups(
      listOf(exampleSecurityGroup)
    )
    .build()
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sadly with CDK you cannot add a name tag to these endpoints (&lt;a href="https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/196" rel="noopener noreferrer"&gt;see issue here&lt;/a&gt;) so when you deploy you will see something that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz01mhavkgce5txa0n9m2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz01mhavkgce5txa0n9m2.png" alt="Resulting endpoints" width="800" height="164"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I pushed these changes to our production environment halfway through the day on 23rd June — I think you can see exactly when, quite clearly, here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rp00oapujyulpqmksyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rp00oapujyulpqmksyu.png" alt="Resulting costs" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;After leaving our system to run for a little while with these endpoints in place, we started to see the &lt;code&gt;EC2-Other&lt;/code&gt; costs drop significantly. For us, that meant going from over $2000 per month to just over $200!&lt;/p&gt;

&lt;p&gt;I hope that you too can find a use for these VPC Endpoints to help lower your costs!&lt;/p&gt;

&lt;p&gt;There is an increase, of course, in the costs for your &lt;code&gt;VPC&lt;/code&gt; for these endpoints, but I am not too worried about the $1 increase there:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8dssbipkos6kbl6hxni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8dssbipkos6kbl6hxni.png" alt="Increased VPC costs" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Thanks for coming on this journey with me, and I hope this can help you save some precious dolla bills 💲&lt;/p&gt;

&lt;p&gt;Emma.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building an OpenSearch Index from DynamoDB with CDK</title>
      <dc:creator>Emma Moinat</dc:creator>
      <pubDate>Wed, 18 Jan 2023 16:06:19 +0000</pubDate>
      <link>https://dev.to/emmamoinat/building-an-opensearch-index-from-dynamodb-1a3e</link>
      <guid>https://dev.to/emmamoinat/building-an-opensearch-index-from-dynamodb-1a3e</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;We will be looking at how to set up an OpenSearch index from a DynamoDB table. We will assume you have some knowledge of DynamoDB and Lambda, and that you are familiar with using CDK to deploy infrastructure into AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB
&lt;/h2&gt;

&lt;p&gt;Firstly, let’s think about our DynamoDB table and how to set it up in a way that it is ready to be indexed. This is actually very straightforward and utilises a DynamoDB stream.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table." - AWS&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Using CDK your table might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const userTable = new dynamodb.Table(this, "UserTable", {
  tableName: "user-table",
  billingMode: BillingMode.PAY_PER_REQUEST,
  partitionKey: {name: "partitionKey", type: AttributeType.STRING},
  sortKey: {name: "sortKey", type: AttributeType.STRING},
  pointInTimeRecovery: true,
  stream: StreamViewType.NEW_IMAGE // This is the important line!
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using the console instead of CDK &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html#Streams.Enabling" rel="noopener noreferrer"&gt;see this&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are different types of streams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KEYS_ONLY - Only the key attributes of the modified item are written to the stream.
NEW_IMAGE - The entire item, as it appears after it was modified, is written to the stream.
OLD_IMAGE - The entire item, as it appeared before it was modified, is written to the stream.
NEW_AND_OLD_IMAGES - Both new and old item images of the item are written to the stream.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we have chosen &lt;code&gt;NEW_IMAGE&lt;/code&gt; because we only need to know the new item to index.&lt;/p&gt;

&lt;p&gt;This will create a table with a DynamoDB stream, which means any new, updated or deleted item events will be streamed into a place of your choosing; we have chosen a Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda
&lt;/h2&gt;

&lt;p&gt;So, next up, we must think about the indexing Lambda. There is currently no direct way to index your data from a stream to the OpenSearch domain, so we must add a middle man to do the work. More on this can be found &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/integrations.html#integrations-dynamodb" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code for this Lambda and a few other helpful Lambdas can be found &lt;a href="https://github.com/instil/building-opensearch-index-from-dynamo-db" rel="noopener noreferrer"&gt;here on Github&lt;/a&gt;. This Lambda lives in the &lt;code&gt;index-stream&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;Here is a code snippet of this Lambda’s handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export const handler = async (event: DynamoDBStreamEvent): Promise&amp;lt;void&amp;gt; =&amp;gt; {
  console.log("Received event from the user table");

  for (const record of event.Records) {
    if (!record.eventName || !record.dynamodb || !record.dynamodb.Keys) continue;

    const partitionKey = record.dynamodb.Keys.partitionKey.S;
    const sortKey = record.dynamodb.Keys.sortKey.S;
    // Note here that we are using a pk and sk 
    // but maybe you are using only an id, this would look like:
    // const id = record.dynamodb.Keys.id.S;

    try {
      if (record.eventName === "REMOVE") {
        // performing a DELETE request to your index
        await removeDocumentFromOpenSearch(partitionKey, sortKey);
      } else {
        // There are 2 types of events left to handle, INSERT and MODIFY,
        // which will both contain a NewImage
        if (!record.dynamodb.NewImage) continue;

        const userDocument = DynamoDB.Converter.unmarshall(record.dynamodb.NewImage) as User;
        // performing a PUT request to your index
        // (await rather than return, so every record in the batch is processed)
        await indexDocumentInOpenSearch(userDocument, partitionKey, sortKey);
      }
    } catch (error) {
      console.error("Error occurred updating OpenSearch domain", error);
      throw error;
    }
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
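
&lt;p&gt;The handler above leans on &lt;code&gt;DynamoDB.Converter.unmarshall&lt;/code&gt; to turn the stream’s attribute-value map into a plain object. If you have not met that format before, here is a simplified, hand-rolled sketch of what it does (only the &lt;code&gt;S&lt;/code&gt; and &lt;code&gt;N&lt;/code&gt; types; the real converter handles many more):&lt;/p&gt;

```typescript
// Simplified illustration of what DynamoDB.Converter.unmarshall does,
// covering only the S (string) and N (number) attribute types.
type AttributeValue = { S?: string; N?: string };
type Image = { [key: string]: AttributeValue };

function unmarshallSimple(image: Image): { [key: string]: string | number } {
  const result: { [key: string]: string | number } = {};
  for (const [name, value] of Object.entries(image)) {
    if (value.S !== undefined) result[name] = value.S;
    else if (value.N !== undefined) result[name] = Number(value.N); // numbers arrive as strings
  }
  return result;
}

// A NewImage from the stream looks like this:
const newImage: Image = {
  partitionKey: { S: "USER#123" },
  sortKey: { S: "PROFILE" },
  age: { N: "30" }
};
console.log(unmarshallSimple(newImage));
// returns { partitionKey: "USER#123", sortKey: "PROFILE", age: 30 }
```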



&lt;p&gt;In CDK you can create your Lambda as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const userTableIndexingFunction = new Function(this, "UserTableIndexingFunction", {
  functionName: "UserTableIndexingFunction",
  code: Code.fromAsset("user-table-indexing-lambda-dist-folder"),
  runtime: Runtime.NODEJS_16_X,
  handler: "index.handler"
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can add the DynamoDB stream as a source event to this Lambda.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;userTableIndexingFunction.addEventSource(new DynamoEventSource(userTable, {
  startingPosition: StartingPosition.TRIM_HORIZON,
  batchSize: 1, // Our lambda could handle this being more than 1 as well, because of the for loop
  retryAttempts: 3
}));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are 2 types of starting positions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TRIM_HORIZON - Start reading at the last untrimmed record in the shard in the system, 
               which is the oldest data record in the shard. 
               In other words, the stream will look at all the item events and 
               deal with them in chronological order (oldest event to most recent event)

      LATEST - Start reading just after the most recent record in the shard, 
               so that you always read the most recent data in the shard. 
               In other words, the stream will look at all the item events and 
               deal with the most recent first and work down until the oldest event.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this example, we therefore use &lt;code&gt;TRIM_HORIZON&lt;/code&gt; so that the index will reflect the data in its current state.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSearch
&lt;/h2&gt;

&lt;p&gt;Now, let’s look at the actual OpenSearch domain setup. AWS suggests some substantial power (and therefore money) for a production-ready domain. You can find the &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html" rel="noopener noreferrer"&gt;best practices here&lt;/a&gt;. For this example we will use a very small setup with no redundancy; however, feel free to scale this up based on your needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const openSearchDomain = new Domain(this, "OpenSearchDomain", {
  version: EngineVersion.OPENSEARCH_1_0,
  capacity: {
    dataNodeInstanceType: "t3.small.search",
    dataNodes: 1,
    masterNodes: 0
  },
  ebs: {
    enabled: true,
    volumeSize: 50,
    volumeType: EbsDeviceVolumeType.GENERAL_PURPOSE_SSD
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will deploy your OpenSearch domain; this can take some time, so be patient.&lt;/p&gt;

&lt;p&gt;One final thing to think about is granting your Lambda the rights to read and write to your domain. In your stack with your OpenSearch domain, add this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openSearchDomain.grantIndexReadWrite("user-index", userTableIndexingFunction);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will allow your Lambda to do its job.&lt;/p&gt;

&lt;p&gt;That's it, you are all set up and ready to index any new data into your OpenSearch index. &lt;/p&gt;

&lt;p&gt;For indexing existing data, you can find a helpful Lambda under the &lt;code&gt;index-data&lt;/code&gt; directory &lt;a href="https://github.com/instil/building-opensearch-index-from-dynamo-db" rel="noopener noreferrer"&gt;here on Github&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
