<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Bahr</title>
    <description>The latest articles on DEV Community by Michael Bahr (@michabahr).</description>
    <link>https://dev.to/michabahr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F262125%2F27393e45-5540-45fe-94c0-61daf0753dfc.jpg</url>
      <title>DEV Community: Michael Bahr</title>
      <link>https://dev.to/michabahr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michabahr"/>
    <language>en</language>
    <item>
      <title>How to Defend Against AWS Surprise Bills</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 14 Jan 2021 13:48:22 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-defend-against-aws-surprise-bills-c2a</link>
      <guid>https://dev.to/michabahr/how-to-defend-against-aws-surprise-bills-c2a</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;Subscribe to get new articles&lt;/a&gt; straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Short on time?&lt;/strong&gt; Set up Budget Alerts in less than 2 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Got a surprise bill?&lt;/strong&gt; Here’s how you can contact AWS Support.&lt;/p&gt;

&lt;p&gt;Imagine you’ve been running a hobby project in the cloud for the last 6 months. Every month you paid 20 cents. Not enough to really care about. However, one morning you notice a surprisingly large transaction of $2700.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good morning, $2700 AWS bill!  &lt;/p&gt;

&lt;p&gt;Holy shit...&lt;/p&gt;

&lt;p&gt;— Chris Short @ KubeCon (&lt;a class="mentioned-user" href="https://dev.to/chrisshort"&gt;@chrisshort&lt;/a&gt;) &lt;a href="https://twitter.com/ChrisShort/status/1279406322837082114?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;July 4, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cloud computing allows us to pay for storage, compute and other services as we use them. Instead of going to a computer shop and buying a server rack, we can use services and get a bill at the end of the month. The downside, however, is that we can use more than we have money for. This can be especially tricky with serverless solutions, which automatically scale up with incoming traffic.&lt;/p&gt;

&lt;p&gt;Accidentally leaving an expensive VM running, or having your Lambda functions spiral out of control, may lead to a dreaded surprise bill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://t.co/tAqUqCoV9R" rel="noopener noreferrer"&gt;pic.twitter.com/tAqUqCoV9R&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;— Fernando (@fmc_sea) &lt;a href="https://twitter.com/fmc_sea/status/1328510918855073793?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;November 17, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article we’ll take a look at how billing works, and what you can do to prevent surprise bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Focus On Small Bills
&lt;/h2&gt;

&lt;p&gt;This article &lt;strong&gt;focuses on personal or small company accounts&lt;/strong&gt; with relatively small bills. While a $3000 spike in cost might not be noticeable in a large corporate bill, it can be devastating for a personal account that you run hobby projects on.&lt;/p&gt;

&lt;h2&gt;
  
  
  There’s No Perfect Solution
&lt;/h2&gt;

&lt;p&gt;Unfortunately there’s no perfect solution to prevent surprise bills. As &lt;a href="https://www.lastweekinaws.com/podcast/aws-morning-brief/whiteboard-confessional-the-curious-case-of-the-9000-aws-bill-increase/" rel="noopener noreferrer"&gt;Corey Quinn explains on his podcast&lt;/a&gt;, the AWS billing system can take a couple of hours to receive all data, in some cases up to 24 or 48 hours. As a result, Budget Alerts might trigger hours or days after significant spending has happened. Budget Alerts are still a great tool for catching charges that take more than a day or two to accrue, e.g. an expensive EC2 instance that you forgot to stop after a machine learning workshop.&lt;/p&gt;

&lt;p&gt;It’s up to you how much time you want to invest to reduce the risk of surprise bills, but I highly recommend that you &lt;strong&gt;take 2 minutes to set up Budget Alerts&lt;/strong&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Defense Mechanisms
&lt;/h2&gt;

&lt;p&gt;There are multiple mechanisms that you can apply to defend against surprise bills. The ones we look into include security, alerting, remediation actions and improved visibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Secure Your Account With Multi Factor Authentication
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;first thing you should set up&lt;/strong&gt; when creating a new AWS account.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hey! You! 👋 Do you own an AWS account?  &lt;/p&gt;

&lt;p&gt;🚨STOP SCROLLING AND CHECK THIS NOW- is MFA enabled on your root account?   &lt;/p&gt;

&lt;p&gt;Yes? Cool, carry on 🙋🏻‍♂️  &lt;/p&gt;

&lt;p&gt;No? ENABLE IT NOW! PLEASE! 🙏🏽  &lt;/p&gt;

&lt;p&gt;This reminder brought to you by an SA who had two customers with theirs account compromised in a week 🙈&lt;/p&gt;

&lt;p&gt;— Karan (@somecloudguy) &lt;a href="https://twitter.com/somecloudguy/status/1331288928096309249?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;November 24, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa_enable_virtual.html" rel="noopener noreferrer"&gt;Follow this official guide&lt;/a&gt; from AWS to set up multi-factor authentication (MFA) for your account. Activating MFA adds another barrier for malicious attackers.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Budget Alerts
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;second thing you should set up&lt;/strong&gt; when creating a new AWS account.&lt;/p&gt;

&lt;p&gt;Budget Alerts are the most popular way to keep an eye on your spending. By creating a budget alert, you will get a notification, e.g. via e-mail, which tells you that a threshold has been exceeded. You can further customize notifications through &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/budgets-sns-policy.html" rel="noopener noreferrer"&gt;Amazon SNS&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/sns-alert-chime.html" rel="noopener noreferrer"&gt;AWS Chatbot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a &lt;a href="https://youtu.be/_YXSDIPFhTI" rel="noopener noreferrer"&gt;short video&lt;/a&gt; (52 seconds) that you can follow to create your first Budget Alert. &lt;a href="https://www.youtube.com/watch?v=MKNtSOQXFrY" rel="noopener noreferrer"&gt;Ryan H Lewis made a longer video&lt;/a&gt; with some more context around Budget Alerts, and the many ways you can configure them.&lt;/p&gt;

&lt;p&gt;If you’re already using the CDK then the package &lt;a href="https://awscdk.io/packages/@stefanfreitag/aws-budget-notifier@0.1.5/#/" rel="noopener noreferrer"&gt;aws-budget-notifier&lt;/a&gt; gets you started quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What amount should you start with?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with an amount that’s a bit above your current spending and that you’re comfortable with. If you’re just starting out, $10 is probably a good idea. If you already have workloads running for a few months, then take your average spending and add 50% on top.&lt;/p&gt;

&lt;p&gt;I also recommend setting up &lt;strong&gt;multiple billing alerts at various thresholds&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The comfortable alert: This is an amount that you’re comfortable spending, but you want to look into the bill over the next days.&lt;/li&gt;
&lt;li&gt;The dangerous alert: At this amount, you’re not comfortable anymore, and want to shut down a service as soon as possible. If your comfortable amount is $10, this one might be $100.&lt;/li&gt;
&lt;li&gt;The critical alert: At this amount, you want to nuke your account from orbit. With a comfortable amount of $10, this one might be $500. You can attach Budget Actions or pager alerts to this alarm to automatically stop EC2 instances or wake you up at night.&lt;/li&gt;
&lt;/ol&gt;
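
&lt;p&gt;If you prefer to manage these alerts as code, here’s a minimal sketch using boto3’s Budgets API that creates one budget per threshold. This is my own illustration, not from an official guide: the account ID and e-mail address are placeholders, and &lt;code&gt;create_budgets()&lt;/code&gt; must be called explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def budget_definition(name, amount_usd):
    # A monthly cost budget (pure data, no AWS call yet)
    return {
        'BudgetName': name,
        'BudgetLimit': {'Amount': str(amount_usd), 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST'
    }

def email_notification(address):
    # Notify once actual spending exceeds 100% of the budget
    return {
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 100.0,
            'ThresholdType': 'PERCENTAGE'
        },
        'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': address}]
    }

def create_budgets():
    # Call this explicitly with valid AWS credentials in the environment
    import boto3
    client = boto3.client('budgets')
    # One budget per threshold: comfortable, dangerous, critical
    for name, amount in [('comfortable', 10), ('dangerous', 100), ('critical', 500)]:
        client.create_budget(
            AccountId='123456789012',  # placeholder: your AWS account id
            Budget=budget_definition(name + '-budget', amount),
            NotificationsWithSubscribers=[email_notification('you@example.com')]
        )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;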

&lt;p&gt;As an addition to predefined thresholds, you can also try out &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-anomaly-detection/" rel="noopener noreferrer"&gt;AWS Cost Anomaly Detection&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DANGER - The Orbital Nuke Option&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As you can send notifications to SNS, you can trigger a Lambda function that runs &lt;a href="https://github.com/rebuy-de/aws-nuke" rel="noopener noreferrer"&gt;aws-nuke&lt;/a&gt; which will tear down all the infrastructure in your account. Do not use this on any account that you have production data in. If you want to learn more about this, &lt;a href="https://github.com/rebuy-de/aws-nuke" rel="noopener noreferrer"&gt;check out the GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Budget Actions
&lt;/h3&gt;

&lt;p&gt;AWS &lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/10/announcing-aws-budgets-actions/" rel="noopener noreferrer"&gt;recently announced Budget Actions&lt;/a&gt;. This is an extension to Budget Alerts, where you can trigger actions when a budget exceeds its threshold. In addition to sending e-mail notifications, you can now apply custom IAM policies like “Deny EC2 Run Instances” or let AWS shut down EC2 and RDS instances for you as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fbudget-action-shut-down-ec2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fbudget-action-shut-down-ec2.png" alt="Budget Action to Shut Down an EC2 Instance"&gt;&lt;/a&gt;&lt;/p&gt;
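
&lt;p&gt;If the managed actions don’t cover your use case, you can subscribe a Lambda function to the budget’s SNS topic and take action yourself. Here’s a minimal sketch of such a handler, not from the announcement above, that stops every running EC2 instance in a region. It ignores pagination of large responses, so treat it as a starting point and test it on a throwaway account first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def running_instance_ids(reservations):
    # Flatten a describe_instances response into a list of instance ids
    return [instance['InstanceId']
            for reservation in reservations
            for instance in reservation['Instances']]

def handler(event, context):
    # Entry point for a Lambda subscribed to the budget's SNS topic
    import boto3
    ec2 = boto3.client('ec2')
    response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    ids = running_instance_ids(response['Reservations'])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {'stopped': ids}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;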

&lt;h3&gt;
  
  
  4. Mobile App
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/console/mobile/" rel="noopener noreferrer"&gt;AWS Console Mobile Application&lt;/a&gt; puts the cost explorer just 3-5 taps away. This way you can check in on your spending with minimal effort.&lt;/p&gt;

&lt;p&gt;Below you can see two screens from the mobile app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fmobile-app.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fmobile-app.png" alt="Cost Explorer in Mobile App"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use the app you should &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html" rel="noopener noreferrer"&gt;set up a dedicated user&lt;/a&gt; that only &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html" rel="noopener noreferrer"&gt;gets the permissions&lt;/a&gt; that the app needs to display your spending.&lt;/p&gt;

&lt;p&gt;Here’s an IAM policy that grants read access to the Cost Explorer as well as CloudWatch alarms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ce:DescribeCostCategoryDefinition",
                "ce:GetRightsizingRecommendation",
                "ce:GetCostAndUsage",
                "ce:GetSavingsPlansUtilization",
                "ce:GetReservationPurchaseRecommendation",
                "ce:ListCostCategoryDefinitions",
                "ce:GetCostForecast",
                "ce:GetReservationUtilization",
                "ce:GetSavingsPlansPurchaseRecommendation",
                "ce:GetDimensionValues",
                "ce:GetSavingsPlansUtilizationDetails",
                "ce:GetCostAndUsageWithResources",
                "ce:GetReservationCoverage",
                "ce:GetSavingsPlansCoverage",
                "ce:GetTags",
                "ce:GetUsageForecast",
                "health:DescribeEventAggregates",
                "cloudwatch:DescribeAlarms",
                "aws-portal:ViewAccount",
                "aws-portal:ViewUsage",
                "aws-portal:ViewBilling"
            ],
            "Resource": "*"
        }
    ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can group the permissions into 3 sets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cost Explorer Read Access (everything that starts with &lt;code&gt;ce:&lt;/code&gt;). These let us get detailed information about our current and forecasted spending.&lt;/li&gt;
&lt;li&gt;CloudWatch Alarms Read Access (&lt;code&gt;cloudwatch:DescribeAlarms&lt;/code&gt;). This allows you to see if there are any alarms, but doesn’t let you get further than that.&lt;/li&gt;
&lt;li&gt;General Access (permissions starting with &lt;code&gt;aws-portal:&lt;/code&gt; and &lt;code&gt;health:&lt;/code&gt;). These allow the app to render the mobile dashboard properly. As far as I understand from my testing, they don’t grant access to the spending details themselves, but without them you can’t display the dashboards.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Please &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;let me know&lt;/a&gt; if any of these permissions can be removed.&lt;/p&gt;
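
&lt;p&gt;If you’d like to script this setup, here’s a hedged boto3 sketch that creates such a user and attaches the policy inline. The user and policy names are placeholders I made up, the action list is abbreviated to a few entries from the policy above, and &lt;code&gt;create_mobile_app_user()&lt;/code&gt; must be called explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

READ_ONLY_POLICY = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': [
            # abbreviated: copy the full action list from the policy above
            'ce:GetCostAndUsage',
            'ce:GetCostForecast',
            'cloudwatch:DescribeAlarms',
            'aws-portal:ViewBilling'
        ],
        'Resource': '*'
    }]
}

def create_mobile_app_user():
    # Call this explicitly with valid AWS credentials in the environment
    import boto3
    iam = boto3.client('iam')
    iam.create_user(UserName='mobile-app-readonly')  # placeholder name
    iam.put_user_policy(
        UserName='mobile-app-readonly',
        PolicyName='cost-explorer-read-only',  # placeholder name
        PolicyDocument=json.dumps(READ_ONLY_POLICY)
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;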

&lt;h3&gt;
  
  
  5. Secrets Manager
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/aws/comments/9i2zzh/huge_unexpected_45k_bill_for_ec2_instances/" rel="noopener noreferrer"&gt;If access keys get leaked through public repositories&lt;/a&gt;, malicious actors can start expensive EC2 instances in your account and use it, for example, to mine Bitcoin. There are also reports of instances hidden away in less frequently used regions, small enough that they don’t get noticed in the bill summary.&lt;/p&gt;

&lt;p&gt;To keep your code free from access keys or other secrets, you can use the &lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; to store the secrets which your code needs at runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;Follow this AWS tutorial&lt;/a&gt; to create your first secret. Once you’ve created one, remove the secret from your code base and use one of the official AWS clients (e.g. &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/secretsmanager.html" rel="noopener noreferrer"&gt;boto3 for Python&lt;/a&gt;) to retrieve it at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# The client picks up AWS credentials from the environment
client = boto3.client('secretsmanager')

# 'replace-me' is a placeholder for your secret's name or ARN
response = client.get_secret_value(SecretId='replace-me')

secret = response['SecretString']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please note that each secret will cost you $0.40 per month, as well as $0.05 per 10,000 API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contact Support
&lt;/h2&gt;

&lt;p&gt;If you experience a surprise bill, stop the apps that are causing the high spending, rotate your access keys if necessary, and contact AWS Support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/oKNAxfmQMZM" rel="noopener noreferrer"&gt;Here’s a 20-second video which guides you to the support case&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The steps to file a support ticket are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the top right click on Support and then select the Support Center&lt;/li&gt;
&lt;li&gt;Press the orange button that says Create case&lt;/li&gt;
&lt;li&gt;Select Account and billing support&lt;/li&gt;
&lt;li&gt;As type select “Billing” and as category select “Payment issue”&lt;/li&gt;
&lt;li&gt;Now fill out the details and submit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While some folks have had their surprise bill reimbursed, please don’t rely on this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The first thing you should do is set up MFA and Budget Alerts. After that you can look into more advanced operations like Budget Actions to lock down your account if spending spikes.&lt;/p&gt;

&lt;p&gt;If your applications use secrets or access keys, you can prevent them from accidentally ending up in your repositories by storing the secrets in the AWS Secrets Manager instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://chrisshort.net/the-aws-bill-heard-around-the-world/" rel="noopener noreferrer"&gt;The AWS bill heard around the world&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.lastweekinaws.com/podcast/aws-morning-brief/whiteboard-confessional-the-curious-case-of-the-9000-aws-bill-increase/" rel="noopener noreferrer"&gt;Last Week in AWS Podcast&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/checklistforunwantedcharges.html" rel="noopener noreferrer"&gt;AWS Checklist for avoiding unexpected charges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/budgets-best-practices.html" rel="noopener noreferrer"&gt;AWS’ best practices for budgets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/better-programming/how-to-protect-yourself-from-unexpectedly-high-aws-bills-4ec91bbe66f4" rel="noopener noreferrer"&gt;How to Protect Yourself From Unexpectedly High AWS Bills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ceoraford.com/posts/never-get-an-unexpected-aws-bill-again!/" rel="noopener noreferrer"&gt;Never Get an Unexpected AWS Bill Again!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=MKNtSOQXFrY" rel="noopener noreferrer"&gt;YouTube: How to avoid Huge AWS Bills with AWS Budgets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=FVwdlJ8lM0Q" rel="noopener noreferrer"&gt;YouTube: How to set up Budget Alerts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cost</category>
    </item>
    <item>
      <title>How To Get Random Records From A Serverless Application</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 07 Jan 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-get-random-records-from-a-serverless-application-25gd</link>
      <guid>https://dev.to/michabahr/how-to-get-random-records-from-a-serverless-application-25gd</guid>
<description>&lt;p&gt;Some applications need to get random data to provide their customers a good and diversified experience, e.g. a quiz app. In this article we take a look at three serverless approaches to getting random records from a large and changing set of data.&lt;/p&gt;

&lt;p&gt;A serverless mechanism for getting random records should be scalable, support a changing dataset and scale down to zero if not in use.&lt;/p&gt;

&lt;p&gt;A great quiz app lets us store millions of questions so that the game stays interesting to our customers. It also allows us to add more questions over time, and remove questions that are outdated.&lt;/p&gt;

&lt;p&gt;Keep in mind that true randomness is not always desirable, as it can lead to your user seeing the same record 5 times in a row. Keep track of what your user has already seen, and try again if you load a record that they’ve seen before.&lt;/p&gt;
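
&lt;p&gt;That retry idea can be sketched as a small helper. &lt;code&gt;load_random_record&lt;/code&gt; is a hypothetical stand-in for any of the approaches below, and the record shape with an &lt;code&gt;id&lt;/code&gt; field is my own assumption.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random

def pick_unseen(load_random_record, seen_ids, max_attempts=10):
    # Retry until we load a record the user hasn't seen yet
    for _ in range(max_attempts):
        record = load_random_record()
        if record['id'] not in seen_ids:
            seen_ids.add(record['id'])
            return record
    return None  # the user may have seen (almost) everything

# Usage with an in-memory stand-in for the data store:
questions = [{'id': i, 'text': 'question-' + str(i)} for i in range(10)]
seen = set()
first = pick_unseen(lambda: random.choice(questions), seen)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;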

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;Apart from a quiz app, you might need to get random records for&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a vocabulary app,&lt;/li&gt;
&lt;li&gt;a “wisdom of the day” Twitter bot,&lt;/li&gt;
&lt;li&gt;a “picture of the week” calendar,&lt;/li&gt;
&lt;li&gt;a Special Sales Deal suggestion,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and many more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You need an AWS account and credentials in the environment that you’re running the examples from. &lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html"&gt;You can use AWS CloudShell for this&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To get the most out of this article, you should be familiar with at least one of DynamoDB, S3 or Redis.&lt;/p&gt;

&lt;p&gt;Python knowledge, or the ability to translate the examples to other languages, is nice to have.&lt;/p&gt;

&lt;h2&gt;
  
  
  Offset
&lt;/h2&gt;

&lt;p&gt;In the sections on DynamoDB and S3 we use a random offset. The trick here is that this random offset does not need to exist as a record in the target service. S3 and DynamoDB will take it and scan until they find a record.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H5vnFqAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://bahr.dev/pictures/randomized-offset.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H5vnFqAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://bahr.dev/pictures/randomized-offset.png" alt="Randomized Offset Visualization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In plain English we tell DynamoDB and S3 to start at a certain point, and then keep looking until they find one record.&lt;/p&gt;
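
&lt;p&gt;You can model this locally to get a feeling for it. The toy version below picks the first key that sorts after a random offset; it roughly mirrors S3’s alphabetical listing, and deliberately ignores details like DynamoDB’s internal key hashing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bisect
from uuid import uuid4

def record_at_random_offset(sorted_keys):
    # The random offset doesn't need to exist among the keys;
    # we return the first key that sorts after it.
    offset = str(uuid4())
    index = bisect.bisect_right(sorted_keys, offset)
    if index == len(sorted_keys):
        return None  # started past the last record: caller should retry
    return sorted_keys[index]

keys = sorted(str(uuid4()) for _ in range(100))
print(record_at_random_offset(keys))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;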

&lt;h2&gt;
  
  
  DynamoDB
&lt;/h2&gt;

&lt;p&gt;DynamoDB is a serverless key-value database that is optimized for transactional access patterns. If the partition key of our table is random within a range (e.g. a UUID), we can combine a &lt;code&gt;Scan&lt;/code&gt; operation with a random offset to get a random record on each request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/metall0id"&gt;Tyrone Erasmus&lt;/a&gt; pointed me to a Stack Overflow answer, which we look at in more detail below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have used the most upvoted answer once or twice on dynamo: &lt;a href="https://t.co/OdIWdTWVzI"&gt;https://t.co/OdIWdTWVzI&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Wasn't intuitive at first but actually works really well (and only consumes 1 read capacity)&lt;/p&gt;

&lt;p&gt;— Tyrone Erasmus (@metall0id) &lt;a href="https://twitter.com/metall0id/status/1342518793084526596?ref_src=twsrc%5Etfw"&gt;December 25, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this example we’re using &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html"&gt;the Python library boto3 for DynamoDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First we insert some records that have a UUID as their partition key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in range(100):
  item = {'pk': str(uuid4()), 'text': f'What is {i}+{i}?'}
  table.put_item(Item=item)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are more exhaustive examples in the following sections.&lt;/p&gt;

&lt;p&gt;The second step is to run the &lt;code&gt;Scan&lt;/code&gt; operation with a random offset. We use the parameter &lt;code&gt;Limit&lt;/code&gt; so that the scan stops after it finds one entry, and we use &lt;code&gt;ExclusiveStartKey&lt;/code&gt; to pass in a random offset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;table.scan(
    Limit=1,
    ExclusiveStartKey={
        'pk': str(uuid4())
    }
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above example we have a table with a partition key called &lt;code&gt;pk&lt;/code&gt;. Every record in this table has a UUID as its partition key.&lt;/p&gt;

&lt;p&gt;By running this command, we read and retrieve exactly one record. While scans are usually considered expensive, this scan operation only consumes 0.5 read capacity units. That’s the same amount a &lt;code&gt;get_item&lt;/code&gt; operation consumes. You can verify this by adding the parameter &lt;code&gt;ReturnConsumedCapacity='TOTAL'&lt;/code&gt; to the scan operation.&lt;/p&gt;

&lt;p&gt;From my tests, DynamoDB offers the best price for datasets with heavy usage. If you store a lot of records but access them only rarely, then S3 offers better pricing. More on that in the cost comparison.&lt;/p&gt;

&lt;p&gt;Please note that DynamoDB has a size limit of 400 KB per record. If you exceed that, then consider using the S3 or Redis approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;Here’s a complete Python example to pick a random record from a table called &lt;code&gt;random-table&lt;/code&gt;. The example includes writing records and checking for an edge case where we start at the end of the table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('random-table')

# Create 100 records with a random partition key
for i in range(100):
    item = {'pk': str(uuid4()), 'text': f"question-{i}"}
    table.put_item(Item=item)
    print(f"Inserted {item}")

# Read 3 records and print them with the consumed capacity
for i in range(3):
    response = table.scan(
        Limit=1,
        ExclusiveStartKey={
            'pk': str(uuid4())
        },
        ReturnConsumedCapacity='TOTAL'
    )
    if response['Items']:
        print({
            "Item": response['Items'][0],
            "Capacity": response['ConsumedCapacity']['CapacityUnits'],
            "ScannedCount": response['ScannedCount']
        })
    else:
        print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;Many use cases are not fully random, but require some kind of categorization. An example for this is a quiz, where we have the three difficulties &lt;code&gt;['easy', 'medium', 'difficult']&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In this case, we don’t want to repeatedly query for a fully random record until we find one that matches the desired category. Instead, we want to achieve the same result with one request.&lt;/p&gt;

&lt;p&gt;To achieve this we need a different data model. Instead of putting the UUID into the partition key, we use the partition key for the category and add a sort key with the UUID. This may lead to a big partition, but there’s no limit on how many records you can store in a single DynamoDB partition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In a DynamoDB table, there is no upper limit on the number of distinct sort key values per partition key value. If you needed to store many billions of Dog items in the Pets table, DynamoDB would allocate enough storage to handle this requirement automatically. - &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html"&gt;DynamoDB documentation about partitions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s an example which expands on the fully random example with categories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('random-table-categorized')

categories = ['easy', 'medium', 'difficult']

for category in categories:
    # Create 50 records for each category with a random sort key
    for i in range(50):
        item = {'pk': category, 'sk': str(uuid4()), 'text': f"question-{category}-{i}"}
        table.put_item(Item=item)
        print(f"Inserted {item}")

for category in categories:
    # Read 3 records and print them with the consumed capacity
    for i in range(3):
        response = table.query(
            Limit=1,
            KeyConditionExpression=Key('pk').eq(category) &amp;amp; Key('sk').gt(str(uuid4())),
            ReturnConsumedCapacity='TOTAL'
        )
        if response['Items']:
            print({
                "Item": response['Items'][0],
                "Capacity": response['ConsumedCapacity']['CapacityUnits'],
                "ScannedCount": response['ScannedCount']
            })
        else:
            print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  S3
&lt;/h2&gt;

&lt;p&gt;S3 is a serverless object store. It allows you to store and retrieve any amount of data, and offers industry-leading scalability, availability and performance. It’s also cheaper than fully fledged databases for storage-heavy use cases. It does, however, offer less query flexibility than databases like DynamoDB.&lt;/p&gt;

&lt;p&gt;With S3 we take an approach similar to the one used with DynamoDB. It requires two API calls: one for finding the key of a random object, and one for retrieving that object’s content.&lt;/p&gt;

&lt;p&gt;Assuming that there’s a bucket called &lt;code&gt;my-bucket-name&lt;/code&gt; with files that each have a UUID as their name, we can use the following approach.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bucket_name = 'my-bucket-name'

list_response = s3client.list_objects_v2(
    Bucket=bucket_name,
    MaxKeys=1,
    StartAfter=str(uuid4()),
)
key = list_response['Contents'][0]['Key']
item_response = s3client.get_object(
    Bucket=bucket_name,
    Key=key
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the parameter &lt;code&gt;MaxKeys=1&lt;/code&gt; we tell &lt;code&gt;list_objects_v2&lt;/code&gt; to stop after it finds one file. &lt;code&gt;StartAfter&lt;/code&gt; is the equivalent of DynamoDB’s &lt;code&gt;ExclusiveStartKey&lt;/code&gt;, which allows us to pass a random offset. The result of &lt;code&gt;list_objects_v2&lt;/code&gt; is a list of object keys, from which we pick the first one and retrieve the object. &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html"&gt;The result is sorted alphabetically&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;Here’s a Python example to pick a random record from a bucket called &lt;code&gt;my-bucket-name&lt;/code&gt;. The example includes writing files and checking for an edge case, where we might have started at the end of the bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

client = boto3.client('s3')

bucket_name = 'my-bucket-name'

# Create 100 records with a random key
for i in range(100):
    key = str(uuid4())
    client.put_object(Body=f"question={i}".encode(),
                      Bucket=bucket_name,
                      Key=key)
    print(f"Inserted {key}")

# Read 3 records and print them
for i in range(3):
    list_response = client.list_objects_v2(
        Bucket=bucket_name,
        MaxKeys=1,
        StartAfter=str(uuid4()),
    )
    if 'Contents' in list_response:
        key = list_response['Contents'][0]['Key']
        item_response = client.get_object(
            Bucket=bucket_name,
            Key=key
        )
        print({
            'Key': key,
            'Content': item_response['Body'].read().decode('utf-8')
        })
    else:
        print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;Here’s an S3 example with categories, which we add as a key prefix. What was previously &lt;code&gt;key&lt;/code&gt; now becomes &lt;code&gt;category/key&lt;/code&gt;. For &lt;code&gt;list_objects_v2&lt;/code&gt; we need to consider the category in two places: the &lt;code&gt;Prefix&lt;/code&gt; parameter, and the &lt;code&gt;StartAfter&lt;/code&gt; parameter, which needs to include both the category and the key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

client = boto3.client('s3')

bucket_name = 'my-random-bucket'

categories = ['easy', 'medium', 'difficult']

for category in categories:
    # Create 100 records with a random key
    for i in range(100):
        key = str(uuid4())
        client.put_object(Body=f"question-{category}-{i}".encode(),
                          Bucket=bucket_name,
                          Key=f"{category}/{key}")
        print(f"Inserted {key} for category {category}")

for category in categories:
    # Read 3 records and print them
    for i in range(3):
        start_after = f"{category}/{uuid4()}"
        list_response = client.list_objects_v2(
            Bucket=bucket_name,
            MaxKeys=1,
            Prefix=category,
            StartAfter=start_after
        )
        if 'Contents' in list_response:
            key = list_response['Contents'][0]['Key']
            item_response = client.get_object(
                Bucket=bucket_name,
                Key=key
            )
            print({
                'Key': key,
                'Content': item_response['Body'].read().decode('utf-8'),
            })
        else:
            print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you omit the &lt;code&gt;Prefix&lt;/code&gt;, you might get objects that are outside the selected category. Assume the bucket holds the keys below. If the &lt;code&gt;StartAfter&lt;/code&gt; parameter is &lt;code&gt;categoryA/object2&lt;/code&gt; and we don’t provide a &lt;code&gt;Prefix&lt;/code&gt;, then our result would be &lt;code&gt;categoryB/object3&lt;/code&gt;. If we include &lt;code&gt;Prefix=categoryA&lt;/code&gt;, however, then &lt;code&gt;categoryB/object3&lt;/code&gt; doesn’t match, and we get an empty result instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;categoryA/object1
categoryA/object2
categoryB/object3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;list_objects_v2&lt;/code&gt; call &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html"&gt;always returns an ordered list&lt;/a&gt;.&lt;/p&gt;
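That alphabetical ordering is what makes the `StartAfter` trick work. Here's a small sketch in plain Python (no S3 involved; the bucket is simulated with a sorted list of keys) that mimics how `list_objects_v2` returns the first keys after a random offset, including the empty-result edge case handled above:

```python
import bisect
from uuid import uuid4

def list_after(sorted_keys, start_after, max_keys=1):
    """Mimic list_objects_v2: return up to max_keys keys that come
    strictly after start_after in the sorted key space."""
    index = bisect.bisect_right(sorted_keys, start_after)
    return sorted_keys[index:index + max_keys]

# Simulate a bucket with 100 random UUID keys
keys = sorted(str(uuid4()) for _ in range(100))

result = list_after(keys, str(uuid4()))
if result:
    print(f"Picked {result[0]}")
else:
    # The random offset landed after the last key -- retry
    print("Didn't find an item. Please try again.")
```

Because UUIDs spread roughly uniformly over the key space, each key is picked with approximately equal probability.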

&lt;h2&gt;
  
  
  Redis
&lt;/h2&gt;

&lt;p&gt;Redis is an in-memory data store that can be used, among other things, as a database. While Redis itself is not serverless, there are offerings like &lt;a href="https://lambda.store/"&gt;Lambda Store&lt;/a&gt; that you can use to keep your application fully serverless.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Have you tried redis? It has both RANDOMKEY as well as SRANDMEMBER commands that might be useful here&lt;/p&gt;

&lt;p&gt;— Yan Cui is making the AppSync Masterclass (&lt;a class="comment-mentioned-user" href="https://dev.to/theburningmonk"&gt;@theburningmonk&lt;/a&gt;
) &lt;a href="https://twitter.com/theburningmonk/status/1342516700764364801?ref_src=twsrc%5Etfw"&gt;December 25, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Redis approach suggested by &lt;a href="https://twitter.com/theburningmonk"&gt;Yan Cui&lt;/a&gt; is a lot simpler, because there are built-in commands to pick random entries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RANDOMKEY&lt;/code&gt; gets a random key from the currently selected database.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SRANDMEMBER&lt;/code&gt; lets us pick one or more random entries from a set, which lets us add categorization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis has a size limit of 512 MB per record.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;In the example below, we store unstructured records in our database. We retrieve a random key with the command &lt;code&gt;RANDOMKEY&lt;/code&gt;, and then get the value with the &lt;code&gt;GET {key}&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET firstKey "Hello world!"
SET secondKey "Panda"

RANDOMKEY
&amp;gt; "secondKey"

GET secondKey
&amp;gt; "Panda"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach requires two calls per random record.&lt;/p&gt;

&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;We can leverage sets to add categories to our data. This approach is very powerful, as the &lt;code&gt;SRANDMEMBER&lt;/code&gt; command has an optional count parameter with which we can specify how many records we want to retrieve. This comes in handy if our users should see multiple entries at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SADD easy 1 2 3 4 5 6 7 8 9
SADD medium 1 2 3 4 5 6 7 8 9
SADD difficult 1 2 3 4 5 6 7 8 9

SRANDMEMBER easy
&amp;gt; 5
SRANDMEMBER medium 2
&amp;gt; 3,6
SRANDMEMBER difficult 5
&amp;gt; 2,3,6,7,9

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach only needs one call per random record, or less if you retrieve multiple records at once.&lt;/p&gt;
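To make the semantics concrete, here's a plain-Python stand-in for `SRANDMEMBER` (this only mimics Redis locally, it does not talk to a Redis server): with a positive count, Redis returns up to that many distinct members of the set.

```python
import random

# Local stand-in for the Redis sets from the example above
sets = {
    'easy': {1, 2, 3, 4, 5, 6, 7, 8, 9},
    'medium': {1, 2, 3, 4, 5, 6, 7, 8, 9},
    'difficult': {1, 2, 3, 4, 5, 6, 7, 8, 9},
}

def srandmember(key, count=1):
    """Mimic SRANDMEMBER with a positive count: up to `count`
    distinct random members of the set."""
    members = list(sets[key])
    return random.sample(members, min(count, len(members)))

print(srandmember('easy'))       # e.g. [5]
print(srandmember('medium', 2))  # e.g. [3, 6]
```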

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;p&gt;In the cost comparison we’re looking at DynamoDB (On-Demand), S3 (Standard), and Lambda Store, because all three are serverless solutions. All of them have a free tier that lets you test these approaches for free.&lt;/p&gt;

&lt;p&gt;We’re reading from a dataset of one million records, each with a size of 1 KB. That should be enough for a question and some meta information. The total size of this dataset is 1 GB. Data transfer costs are excluded.&lt;/p&gt;

&lt;p&gt;In the first table you see the price per single random record, as well as per million.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Single Record&lt;/th&gt;
&lt;th&gt;One Million Records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;$0.000000125&lt;/td&gt;
&lt;td&gt;$0.125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;$0.0000054&lt;/td&gt;
&lt;td&gt;$5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Store&lt;/td&gt;
&lt;td&gt;$0.000004&lt;/td&gt;
&lt;td&gt;$4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With DynamoDB, we’re using eventually consistent reads, which are charged at half a read capacity unit. For S3 we need a &lt;code&gt;List&lt;/code&gt; and a &lt;code&gt;Get&lt;/code&gt; call, whose prices are added together in the table. Lambda Store has a flat price per 100,000 requests. We’re assuming the &lt;code&gt;SRANDMEMBER&lt;/code&gt; operation here, as it needs only one request.&lt;/p&gt;
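The per-record figures can be reproduced from list prices. The rates below are the assumptions behind the table (us-east-1 prices at the time of writing, not live quotes):

```python
# Assumed list prices (us-east-1 at the time of writing)
DDB_PRICE_PER_MILLION_RRU = 0.25   # on-demand read request units
S3_LIST_PER_1000 = 0.005           # LIST requests
S3_GET_PER_1000 = 0.0004           # GET requests
LAMBDA_STORE_PER_100K = 0.40       # flat price per 100,000 requests

# DynamoDB: an eventually consistent read of up to 4 KB costs 0.5 RRU
dynamodb = 0.5 * DDB_PRICE_PER_MILLION_RRU / 1_000_000

# S3: one List plus one Get per random record
s3 = (S3_LIST_PER_1000 + S3_GET_PER_1000) / 1000

# Lambda Store: a single SRANDMEMBER request
lambda_store = LAMBDA_STORE_PER_100K / 100_000

for name, price in [('DynamoDB', dynamodb), ('S3', s3),
                    ('Lambda Store', lambda_store)]:
    print(f"{name}: ${price:.9f} per record, "
          f"${price * 1_000_000:.3f} per million")
```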

&lt;p&gt;In the second table you see the price per 1 GB stored per month.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;GB-month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;$0.023&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Store&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;Lambda Store has a concurrency limit of 20 in the free tier, 1000 in the Standard tier and 5000 in the Premium tier. They do however offer reserved capacity for high throughput use cases. So once you hit these limits, you probably have a working business model that can pay for reserved capacity.&lt;/p&gt;

&lt;p&gt;With DynamoDB, you might hit a throughput limitation even in On-Demand mode. To work around this, you can have your app retry a few times, or switch to a high provisioned capacity. Another approach I’ve heard of but didn’t verify is to create the table with a very high provisioned capacity and then immediately switch back to On-Demand.&lt;/p&gt;
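Such a retry could be as simple as the sketch below: a generic exponential-backoff wrapper, not DynamoDB-specific (note that boto3 also ships configurable retry modes out of the box; the `table.get_item` call in the usage comment is a hypothetical example):

```python
import random
import time

def with_retries(operation, max_attempts=5, base_delay=0.1):
    """Retry `operation` with exponential backoff and full jitter,
    as you might around a throttled DynamoDB call."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with full jitter
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

# Usage sketch (hypothetical call):
# result = with_retries(lambda: table.get_item(Key={'id': '42'}))
```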

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serverless is a great fit for providing random records. The application scales with demand and has a very low price per access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB has the lowest cost and highest flexibility&lt;/strong&gt; for applications that are similar to a quiz app. The pricing is better for applications that have a few million records, and where those records are read frequently and repeatedly. Imagine 10 million records stored, with 100 million reads each month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 becomes interesting for applications that use significantly more storage&lt;/strong&gt; and read infrequently. Imagine 10 billion records stored, and 2 million reads a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda Store has the easiest learning curve at a reasonable price&lt;/strong&gt;. Its Redis database is also the only one of the three offerings that &lt;strong&gt;can return multiple random records in one request&lt;/strong&gt;. Last but not least, &lt;a href="https://docs.lambda.store/docs/overall/compare/#aws-dynamodb"&gt;Lambda Store says on their website&lt;/a&gt; that their latency is “submillisecond while the latency is up to 10 msec in DynamoDB”. &lt;a href="https://medium.com/lambda-store/swifter-than-dynamodb-lambda-store-serverless-redis-bfacfaf92c80"&gt;Check out their full benchmark article&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/27389403"&gt;Scan approach on Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.lambda.store"&gt;Lambda Store Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/lambda-store/swifter-than-dynamodb-lambda-store-serverless-redis-bfacfaf92c80"&gt;Benchmark from Lambda Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html"&gt;Boto3 documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html"&gt;DynamoDB documentation about partitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html"&gt;AWS CloudShell Manual&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>dynamodb</category>
      <category>s3</category>
    </item>
    <item>
      <title>Amazon Timestream vs DynamoDB for Timeseries Data</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 12 Nov 2020 08:40:57 +0000</pubDate>
      <link>https://dev.to/michabahr/amazon-timestream-vs-dynamodb-for-timeseries-data-3gil</link>
      <guid>https://dev.to/michabahr/amazon-timestream-vs-dynamodb-for-timeseries-data-3gil</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://subscribe.bahr.dev/now"&gt;Subscribe to get new articles&lt;/a&gt; straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS recently announced that their &lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-timestream-now-generally-available/"&gt;Timestream database is now generally available&lt;/a&gt;. I tried it out with an existing application that uses timeseries data. Based on my experimentation this article compares Amazon Timestream with DynamoDB and shows what I learned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.timescale.com/blog/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/"&gt;Timeseries data is a sequence of data points stored in time order&lt;/a&gt;. Each timestream record can be extended with dimensions that give more context on the measurement. One example are fuel measurements of trucks, with truck types and number plates as dimensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;As this article compares Timestream with DynamoDB, it's good for you to have some experience with the latter. But even if you don't, you can learn about both databases here.&lt;/p&gt;

&lt;p&gt;I will also mention Lambda and API Gateway. If you're not familiar with those two, just read them as "compute" and "api".&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case
&lt;/h2&gt;

&lt;p&gt;My application monitors markets to notify customers of trading opportunities and registers about 500,000 market changes each day. DynamoDB requires ~20 &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html"&gt;RCU/WCU&lt;/a&gt;s for this. While most of the system is event-driven and can complete eventually, there are also user-facing dashboards that need fast responses. &lt;/p&gt;

&lt;p&gt;Below you can see a picture of the current architecture, where one Lambda function pulls data into DynamoDB, another creates notifications when a trading opportunity appears, and an API Gateway serves data for the user dashboards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0cp5x-Q9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-use-case.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0cp5x-Q9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-use-case.png" alt="Architecture for Use Case"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each record in the database consists of two measurements (price and volume), has two dimensions (article number and location) and has a timestamp.&lt;/p&gt;

&lt;p&gt;Testing out Timestream required two changes: An additional Lambda function to replicate from DynamoDB to Timestream, and a new API that reads from Timestream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Format
&lt;/h2&gt;

&lt;p&gt;Let's start by comparing the data format of DynamoDB and Timestream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt; holds a flexible set of attributes, which are identified by a unique key. This means that you query for a key and get the corresponding record with multiple attributes. That's useful, for example, when you store meta information for movies or songs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5pZLnm18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-dynamo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5pZLnm18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-dynamo.png" alt="Data Format DynamoDB"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestream&lt;/strong&gt; instead is designed to store continuous measurements, for example from a temperature sensor. There are only inserts, no updates. Each measurement has a name, value, timestamp and dimensions. A dimension can be for example the city where the temperature sensor is, so that we can group results by city.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nRoaaayT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-timestream.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nRoaaayT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-timestream.png" alt="Data Format Timestream"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Write to Timestream
&lt;/h2&gt;

&lt;p&gt;Timestream shines when it comes to ingestion. The &lt;a href="https://docs.aws.amazon.com/timestream/latest/developerguide/API_WriteRecords.html"&gt;&lt;code&gt;WriteRecords&lt;/code&gt; API&lt;/a&gt; is designed with a focus on batch inserts, which allows you to insert up to 100 records per request. With DynamoDB my batch inserts were sometimes throttled with both provisioned and on-demand capacity, while I saw no throttling with Timestream.&lt;/p&gt;
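A record list therefore needs to be cut into batches of at most 100 before calling `WriteRecords`. A minimal helper could look like this (the 100-record limit comes from the API; the database and table names in the usage comment are this article's examples):

```python
def chunks(records, size=100):
    """Split `records` into lists of at most `size` items,
    matching the WriteRecords batch limit."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# Usage sketch with a boto3 client:
# timestream = boto3.client('timestream-write')
# for chunk in chunks(all_records):
#     timestream.write_records(DatabaseName='MarketWatch',
#                              TableName='Snapshots', Records=chunk)
```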

&lt;p&gt;Below you can see a snapshot from AWS Cost Explorer when I started ingesting data with a &lt;a href="https://aws.amazon.com/timestream/pricing"&gt;memory store&lt;/a&gt; retention of 7 days. Memory store is Timestream's fastest, but most expensive storage. It is required for ingestion but its retention can be reduced to one hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3CFHBKNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-write-storage-cost.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3CFHBKNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-write-storage-cost.png" alt="Timestream Write and Storage Cost"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;write operations are cheap&lt;/strong&gt; and can be neglected in comparison to cost for storage and reading. Inserting 515,000 records has cost me $0.20, while the in-memory storage cost for all of those records totalled $0.37 after 7 days. My spending matches &lt;a href="https://aws.amazon.com/timestream/pricing/"&gt;Timestream's official pricing&lt;/a&gt; of $0.50 per 1 million writes of 1KB size.&lt;/p&gt;

&lt;p&gt;As &lt;strong&gt;each Timestream record can only contain one measurement&lt;/strong&gt;, we need to split up the DynamoDB records which hold multiple measurements. Instead of writing one record with multiple attributes, we need to write one record per measure value.&lt;/p&gt;
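A sketch of that split for this article's use case (the helper name and the `DOUBLE` measure type are assumptions; the record shape follows the `WriteRecords` API, with the timestamp assumed to be supplied via `CommonAttributes`):

```python
def to_timestream_records(item, measures, dimension_names):
    """Turn one multi-attribute item into one Timestream
    record per measure value."""
    dimensions = [{'Name': name, 'Value': str(item[name])}
                  for name in dimension_names]
    return [{
        'Dimensions': dimensions,
        'MeasureName': measure,
        'MeasureValue': str(item[measure]),
        'MeasureValueType': 'DOUBLE',
    } for measure in measures]

item = {'article': 'A-100', 'location': 'Hamburg',
        'price': 9.99, 'volume': 120}
records = to_timestream_records(item, ['price', 'volume'],
                                ['article', 'location'])
# -> two records, one for 'price' and one for 'volume'
```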

&lt;p&gt;&lt;strong&gt;Backfilling old data&lt;/strong&gt; might not be possible if its age exceeds the maximum retention time of the memory store, which is 12 months. In October 2020 it was only possible to write to memory store, and if you tried to insert older records you would get an error. To backfill and optimize cost you can start with 12 months retention and then lower it once your backfilling is complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read from Timestream
&lt;/h2&gt;

&lt;p&gt;You can read data from Timestream with SQL queries and get charged per GB of scanned data. &lt;code&gt;WHERE&lt;/code&gt; clauses are key to limiting the amount of data that you scan because "data is pruned by Amazon Timestream’s query engine when evaluating query predicates" (&lt;a href="https://aws.amazon.com/timestream/pricing/"&gt;Timestream Pricing&lt;/a&gt;). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The less data makes it through your &lt;code&gt;WHERE&lt;/code&gt; clauses, the cheaper and faster your query.&lt;/strong&gt;&lt;/p&gt;
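For example, a query against this article's schema might look like the sketch below (table and column names are illustrative, not a definitive schema). The time range, measure name, and dimension predicates are what lets the engine prune:

```sql
-- Narrow predicates keep the scanned (and billed) data small
SELECT article, location, avg(measure_value::double) AS avg_price
FROM "MarketWatch"."Snapshots"
WHERE time BETWEEN ago(1h) AND now()
  AND measure_name = 'price'
  AND location = 'Hamburg'
GROUP BY article, location
```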

&lt;p&gt;I tested the read speed by running the same queries against two APIs that were backed by DynamoDB (blue) and Timestream (orange) respectively. Below you can see a chart where I mimicked user behavior over the span of an hour. The spikes where DynamoDB got slower than Timestream were requests where computing the result required more than 500 queries to DynamoDB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dAwMbJym--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/access-speed-comparison.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dAwMbJym--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/access-speed-comparison.png" alt="Access Speed Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DynamoDB is designed for blazing fast queries, but &lt;a href="https://bahr.dev/2020/02/02/aggregate-ddb/"&gt;doesn't support ad-hoc analytics&lt;/a&gt;. SQL queries can't compete when fetching individual records, but they get interesting once you have to access many different records and can't precompute data. My queries to Timestream usually took more than a second, and I decided to &lt;strong&gt;precompute user-facing data into DynamoDB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Dashboards that update every minute or so and can wait 10s for a query to complete are fine with reading from Timestream. Use the right tool for the right job.&lt;/p&gt;

&lt;p&gt;Timestream seems to have &lt;strong&gt;no limit on query length&lt;/strong&gt;. An SQL query with 1,000 items in an SQL IN clause works fine, while &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html"&gt;DynamoDB limits queries to 100 operands&lt;/a&gt;.&lt;/p&gt;
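Generating such a clause from Python is straightforward (a sketch; Timestream's query API takes the full SQL string, so quoting and validating the values is your responsibility):

```python
def in_clause(column, values):
    """Render an SQL IN clause for a list of string values.
    Quotes are doubled as a minimal escape (sketch only --
    validate inputs properly in real code)."""
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{column} IN ({quoted})"

clause = in_clause('article', [f'A-{i}' for i in range(1000)])
query = f'SELECT * FROM "MarketWatch"."Snapshots" WHERE {clause}'
```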

&lt;h2&gt;
  
  
  Timestream Pricing
&lt;/h2&gt;

&lt;p&gt;Timestream pricing mostly comes down to two questions: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you need &lt;strong&gt;memory store with long retention&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Do you &lt;strong&gt;read frequently&lt;/strong&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below you can see the cost per storage type calculated into hourly, daily and monthly cost. On the right hand side you can see the relative cost compared to memory store.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;My ingestion experiments with Timestream were quite cheap with 514,000 records inserted daily for a whole month and the cost ending up below $10. This is a low barrier to entry for you to make some experiments. I &lt;strong&gt;dropped the memory storage down to two hours&lt;/strong&gt;, because I only needed it &lt;strong&gt;for ingestion&lt;/strong&gt;. Magnetic store seemed fast enough for my queries.&lt;/p&gt;

&lt;p&gt;When I tried to read and precompute data into DynamoDB every few seconds, I noticed that &lt;strong&gt;frequent reads can become expensive&lt;/strong&gt;. Timestream requires you to pick an encryption key from the Key Management Service (KMS), which is then used to decrypt data when reading from Timestream. In my experiment decrypting with KMS accounted for about 30% of the actual cost.&lt;/p&gt;

&lt;p&gt;Below you can see a chart of my spending on Timestream and KMS with frequent reads on October 14th and 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j8poRZT6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-cost.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j8poRZT6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-cost.png" alt="Timestream in Cost Explorer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems and Limitations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/timestream/latest/developerguide/API_RejectedRecord.html"&gt;Records can get rejected for three reasons&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate values for the same dimensions, timestamps, and measure names&lt;/li&gt;
&lt;li&gt;Timestamps outside the memory store's retention period&lt;/li&gt;
&lt;li&gt;Dimensions or measures that exceed the Timestream limits (e.g. numbers that are bigger than a BigInt)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on &lt;a href="https://forums.aws.amazon.com/thread.jspa?messageID=960046%F3%AA%98%AE"&gt;my experience with these errors&lt;/a&gt; I suggest that you &lt;strong&gt;log the errors but don't let the exception bubble up&lt;/strong&gt;. If you're building historical charts, one or two missing values shouldn't be a problem.&lt;/p&gt;

&lt;p&gt;Below you can see an example of how I &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/timestream-write.html"&gt;write records to Timestream with the boto3 library for Python&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;timestream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'timestream-write'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;timestream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;DatabaseName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'MarketWatch'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;TableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'Snapshots'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;CommonAttributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;'Time'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())),&lt;/span&gt;
            &lt;span class="s"&gt;'TimeUnit'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'SECONDS'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# each chunk can hold up to 100 records
&lt;/span&gt;        &lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RejectedRecordsException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'exception'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'RejectedRecords'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="s"&gt;'reason'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Reason'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="s"&gt;'rejected_record'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'RecordIndex'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another perceived limitation is that each record can only hold one measurement (name and value). Assuming you have a vehicle with 200 sensors, you could write that into DynamoDB with one request, while Timestream already needs two. However, this is pretty easy to compensate for, and I couldn't come up with a good access pattern where you must combine different measurement types (e.g. temperature and voltage) in a single query.&lt;/p&gt;

&lt;p&gt;Last but not least, Timestream does not offer provisioned throughput yet. Especially when collecting data from a fleet of IoT sensors, it would be nice to cap ingestion so that a bug in the sensors can't cause a cost spike. In my tests the cost for writing records has been negligible though.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I moved my timeseries data to Timestream, but added another DynamoDB table for precomputing user facing data. While my cost stayed roughly the same, I now have &lt;strong&gt;cheap long term storage at 12% of the previous price&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;DynamoDB is faster for targeted queries, whereas &lt;strong&gt;Timestream is better for analytics&lt;/strong&gt; that include large amounts of data. You can &lt;strong&gt;combine both and precompute&lt;/strong&gt; data that needs fast access.&lt;/p&gt;

&lt;p&gt;Trying out queries is key to understanding whether Timestream fits your use case and its requirements. You can do that in the Timestream console with the AWS samples. Beware of frequent reads and monitor your spending.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out
&lt;/h2&gt;

&lt;p&gt;Try out one of the sample databases through the Timestream console or replicate some of the data you write to DynamoDB into Timestream. You can achieve the latter for example with &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html"&gt;DynamoDB streams&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For some more inspiration, check out the &lt;a href="https://github.com/awslabs/amazon-timestream-tools"&gt;timestream tools and samples by awslabs on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/store-and-access-time-series-data-at-any-scale-with-amazon-timestream-now-generally-available/"&gt;Timestream now Generally Available&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloudonaut.io/unboxing-amazon-timestream/"&gt;Unboxing Amazon Timestream&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/database/design-patterns-for-high-volume-time-series-data-in-amazon-dynamodb/"&gt;Design patterns for high-volume, time-series data in Amazon DynamoDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-hybrid.html"&gt;Best Practices for Implementing a Hybrid Database System&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Enjoyed this article? I publish a new article every month. &lt;a href="https://twitter.com/bahrdev"&gt;Connect with me on Twitter&lt;/a&gt; and &lt;a href="https://subscribe.bahr.dev/now"&gt;subscribe for new articles&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>database</category>
    </item>
    <item>
      <title>Validate Email Workflows with a Serverless Inbox API</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 15 Oct 2020 08:37:39 +0000</pubDate>
      <link>https://dev.to/michabahr/validate-email-workflows-with-a-serverless-inbox-api-ogc</link>
      <guid>https://dev.to/michabahr/validate-email-workflows-with-a-serverless-inbox-api-ogc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; to get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article you'll learn &lt;strong&gt;how to build a serverless API&lt;/strong&gt; that you can use to &lt;strong&gt;validate your email sending workflows&lt;/strong&gt;. You will have access to &lt;strong&gt;unlimited inboxes&lt;/strong&gt; for your domain, allowing you to &lt;strong&gt;use a new inbox for every test run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;The working code is ready for you to deploy on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With the AWS services Simple Email Service (SES) and API Gateway we can build a fully automated solution. Its pricing model fits most testing workloads into the free tier, and it can handle up to 10,000 mails per month for just $10. No maintenance or development required. It also allows you to stay in the &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html" rel="noopener noreferrer"&gt;SES sandbox&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To deploy this solution, you should have an &lt;strong&gt;AWS account&lt;/strong&gt; and some experience with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS CDK&lt;/strong&gt;&lt;/a&gt;. I'll be using the &lt;strong&gt;TypeScript&lt;/strong&gt; variant. This article uses CDK version 1.63.0. Let me know if anything breaks in newer versions!&lt;/p&gt;

&lt;p&gt;To receive mail with SES you need a domain or subdomain. You can &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/domain-register.html" rel="noopener noreferrer"&gt;register a domain with Route53&lt;/a&gt; or &lt;a href="https://bahr.dev/2020/09/01/multiple-frontends/" rel="noopener noreferrer"&gt;delegate from another provider&lt;/a&gt;. You can also use subdomains like &lt;code&gt;mail-test.bahr.dev&lt;/code&gt; to receive mail if you already connected your apex domain (e.g. &lt;code&gt;bahr.dev&lt;/code&gt;) with another mailserver.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Overview
&lt;/h2&gt;

&lt;p&gt;The solution consists of two parts: the email receiver and the API that lets you access the received mail. The former writes to the database, the latter reads from it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Ftempmail%2Finbox-api.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Ftempmail%2Finbox-api.png" alt="Architecture Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the email receiver we use &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/receiving-email.html" rel="noopener noreferrer"&gt;SES&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-readme.html#email-receiving" rel="noopener noreferrer"&gt;Receipt Rules&lt;/a&gt;. We use those rules to store the raw payload and attachments in an S3 bucket, and send a nicely formed payload to a Lambda function which creates an entry in the DynamoDB table.&lt;/p&gt;

&lt;p&gt;On the API side there's a single read operation which requires the recipient's email address. It can be parameterized to reduce the number of emails that will be returned. &lt;/p&gt;

&lt;p&gt;Old emails are automatically discarded with &lt;a href="https://bahr.dev/2019/05/29/scheduling-ddb/" rel="noopener noreferrer"&gt;DynamoDB's time to live (TTL) feature&lt;/a&gt;, keeping the database small without any maintenance work.&lt;/p&gt;
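&lt;p&gt;As a rough sketch of that read operation (the attribute and helper names here are illustrative, not taken from the repo), the query parameters for fetching a recipient's mail since a given timestamp could be built like this:&lt;/p&gt;

```typescript
// Sketch (illustrative names): a DynamoDB Query that fetches all mail
// for one recipient, optionally limited to mail received since a given
// ISO-8601 timestamp. pk holds the recipient address; sk starts with a
// timestamp, so a ">=" comparison filters by time lexicographically.
interface MailQueryInput {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeValues: Record<string, string>;
}

function buildMailQuery(table: string, recipient: string, since?: string): MailQueryInput {
  const input: MailQueryInput = {
    TableName: table,
    KeyConditionExpression: "pk = :recipient",
    ExpressionAttributeValues: { ":recipient": recipient },
  };
  if (since) {
    input.KeyConditionExpression += " AND sk >= :since";
    input.ExpressionAttributeValues[":since"] = since;
  }
  return input;
}

const query = buildMailQuery("mail-table", "test-1@mail-test.bahr.dev", "2020-10-15T00:00:00Z");
```

&lt;p&gt;The resulting object can then be passed to the DocumentClient's &lt;code&gt;query&lt;/code&gt; call.&lt;/p&gt;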

&lt;h2&gt;
  
  
  Verify Domain with SES
&lt;/h2&gt;

&lt;p&gt;To receive mail, you must be in control of a domain that you can register with SES. This can also be a subdomain, e.g. if you already use your apex domain (e.g. bahr.dev) with another mail service like Office 365.&lt;/p&gt;

&lt;p&gt;The integration with SES is easiest if you have a hosted zone for your domain in Route53. To use domains from another provider like GoDaddy, I suggest that you &lt;a href="https://bahr.dev/2020/09/01/multiple-frontends/" rel="noopener noreferrer"&gt;set up a nameserver delegation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you have a hosted zone for your domain, &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/verify-domain-procedure.html" rel="noopener noreferrer"&gt;go to the Domain Identity Management in SES and verify a new domain&lt;/a&gt;. There's also &lt;a href="https://youtu.be/3o-PcDozNkY" rel="noopener noreferrer"&gt;a short video where I verify a domain with SES&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Model
&lt;/h2&gt;

&lt;p&gt;We'll use DynamoDB's partition and sort keys to enable two major features: Receiving mail for many aliases and receiving more than one mail for each alias. An alias is the &lt;code&gt;front-part&lt;/code&gt; in &lt;code&gt;front-part@domain.com&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition_key: recipient@address.com
sort_key: timestamp#uuid
ttl: timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By combining a timestamp and a uuid we can sort and filter by the timestamp, while also guaranteeing that no two records will conflict with each other. The TTL helps us to keep the table small, by &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html" rel="noopener noreferrer"&gt;letting DynamoDB remove old records&lt;/a&gt;.&lt;/p&gt;
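&lt;p&gt;As a minimal sketch (the helper names here are my own, not from the repo), extracting an alias and composing such a sort key could look like this. Because ISO-8601 timestamps sort lexicographically in chronological order, DynamoDB returns mail in time order for free:&lt;/p&gt;

```typescript
// Sketch (hypothetical helper names): the alias is the part before the
// "@", and the sort key combines an ISO-8601 timestamp with a UUID so
// that records sort chronologically while staying unique.
function alias(address: string): string {
  return address.split("@")[0];
}

function sortKey(timestamp: string, id: string): string {
  return `${timestamp}#${id}`;
}

const earlier = sortKey("2020-10-15T08:00:00Z", "aaa-111");
const later = sortKey("2020-10-15T09:00:00Z", "bbb-222");
// lexicographic order equals chronological order for ISO-8601 strings
const ordered = [later, earlier].sort();
```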

&lt;p&gt;I'm using &lt;a href="https://github.com/jeremydaly/dynamodb-toolbox" rel="noopener noreferrer"&gt;Jeremy Daly's dynamodb-toolbox&lt;/a&gt; to model my database entities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Entity&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dynamodb-toolbox&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;v4&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;uuid&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;uuid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Require AWS SDK and instantiate DocumentClient&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;DynamoDB&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk/clients/dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DocumentClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;DynamoDB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DocumentClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Instantiate a table&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MailTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="c1"&gt;// Specify table name (used by DynamoDB)&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Define partition and sort keys&lt;/span&gt;
  &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Add the DocumentClient&lt;/span&gt;
  &lt;span class="nx"&gt;DocumentClient&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Entity&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Mail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="na"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// recipient address&lt;/span&gt;
      &lt;span class="na"&gt;sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="na"&gt;hidden&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; 
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MailTable&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Receiver
&lt;/h2&gt;

&lt;p&gt;SES allows us to set up &lt;code&gt;ReceiptRules&lt;/code&gt; which trigger actions when a new mail arrives. &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-actions-readme.html" rel="noopener noreferrer"&gt;There are multiple actions to choose from&lt;/a&gt;, but we are mostly interested in the Lambda and S3 actions. We use the Lambda action to store details like the recipient, the sender and the subject in a DynamoDB table. With the S3 action we get the raw email delivered as a file into a bucket. This will come in handy later to support more use cases like returning the mail's body and attachments.&lt;/p&gt;

&lt;p&gt;Below you can see the abbreviated CDK code to set up the &lt;code&gt;ReceiptRules&lt;/code&gt;. Please note that you have to activate the rule set in the AWS console. &lt;a href="https://github.com/aws/aws-cdk/issues/10321" rel="noopener noreferrer"&gt;There is currently no high level CDK construct for this&lt;/a&gt; and I don't want you to accidentally override an existing rule set. &lt;a href="https://youtu.be/00_sx_-SFc0" rel="noopener noreferrer"&gt;Here's a short video where I activate a rule set&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bucket&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-s3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;Function&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ReceiptRuleSet&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-ses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-ses-actions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InboxApiStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// your-domain.com&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;INBOX_DOMAIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawMailBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RawMail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TempMailMetadata&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postProcessFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PostProcessor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TABLE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantWriteData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;postProcessFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// after deploying the cdk stack you need to activate this ruleset&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReceiptRuleSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ReceiverRuleSet&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;recipients&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rawMailBucket&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lambda&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;postProcessFunction&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
          &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above CDK code in place, let's take a look at the Lambda function that is triggered when a new mail arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SESHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// the model uses dynamodb-toolbox&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SESHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commonHeaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// set the ttl as 7 days into the future and &lt;/span&gt;
        &lt;span class="c1"&gt;// strip milliseconds (ddb expects seconds for the ttl)&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function above maps the SES event into one record per recipient and stores them together with a TTL attribute in the database. &lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;You can find the full source code on GitHub&lt;/a&gt;.&lt;/p&gt;
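&lt;p&gt;The TTL computation is worth spelling out in isolation. DynamoDB expects the TTL attribute as a Unix timestamp in seconds, while JavaScript's &lt;code&gt;Date&lt;/code&gt; works in milliseconds. A minimal sketch (using a fixed offset instead of &lt;code&gt;setDate&lt;/code&gt;, and flooring to whole seconds) could look like this:&lt;/p&gt;

```typescript
// Sketch: compute a DynamoDB TTL 7 days in the future.
// DynamoDB expects the TTL attribute as a Unix timestamp in seconds;
// Date.getTime() returns milliseconds, so divide by 1000 and floor.
function ttlInSevenDays(from: Date): number {
  const sevenDaysMs = 7 * 24 * 60 * 60 * 1000;
  return Math.floor((from.getTime() + sevenDaysMs) / 1000);
}

const ttl = ttlInSevenDays(new Date("2020-10-15T00:00:00Z"));
```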

&lt;p&gt;Now that we receive mail directly into our database, let's build an API to access the mail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Read API
&lt;/h2&gt;

&lt;p&gt;The Read API consists of an API Gateway and a Lambda function with read access to the DynamoDB table. If you haven't built such an API before, &lt;a href="https://www.youtube.com/watch?v=XVHGq2uJu9s" rel="noopener noreferrer"&gt;I recommend that you check out Marcia's video on how to build serverless APIs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Below you can see the abbreviated CDK code to set up the API Gateway and Lambda function. &lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;You can find the full source code on GitHub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LambdaRestApi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-apigateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InboxApiStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TempMailMetadata&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ApiLambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TABLE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantReadData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LambdaRestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;InboxApi&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiFunction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API Gateway is able to directly integrate with DynamoDB, but to continue using the database model I built with &lt;a href="https://github.com/jeremydaly/dynamodb-toolbox" rel="noopener noreferrer"&gt;dynamodb-toolbox&lt;/a&gt; I have to go through a Lambda function. I also feel more comfortable writing TypeScript than &lt;a href="https://www.baeldung.com/apache-velocity" rel="noopener noreferrer"&gt;Apache Velocity Templates&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With the Lambda function below, we load mails for a particular recipient and can filter to only return mails that arrived after a given timestamp.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// the model uses dynamodb-toolbox&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queryStringParameters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing query parameter: recipient&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;beginsWith&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mails&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying the read API, you can run a GET request that includes the recipient's mail address as the &lt;code&gt;recipient&lt;/code&gt; query parameter. You can further tweak your calls by providing a &lt;code&gt;since&lt;/code&gt; timestamp or a &lt;code&gt;limit&lt;/code&gt; that is greater than the default of &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, if you send an order confirmation to &lt;code&gt;random-uuid@inbox-api.domain.com&lt;/code&gt;, then you run a GET request against &lt;code&gt;https://YOUR_API_ENDPOINT/?recipient=random-uuid@inbox-api.domain.com&lt;/code&gt;.&lt;/p&gt;
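&lt;p&gt;If you prefer to assemble the request URL in code, here is a small sketch. &lt;code&gt;buildInboxUrl&lt;/code&gt; is a hypothetical helper, and the endpoint is the placeholder that &lt;code&gt;cdk deploy&lt;/code&gt; prints for your stack.&lt;/p&gt;

```typescript
// Sketch: assemble the read API call with the supported query parameters.
// buildInboxUrl is a hypothetical helper, not part of the linked repository.
function buildInboxUrl(
  endpoint: string,
  recipient: string,
  options: { since?: string; limit?: number } = {}
): string {
  // URLSearchParams takes care of encoding, e.g. the @ in the address
  const params = new URLSearchParams({ recipient });
  if (options.since) params.set('since', options.since);
  if (options.limit) params.set('limit', String(options.limit));
  return `${endpoint}/?${params.toString()}`;
}

const url = buildInboxUrl(
  'https://YOUR_API_ENDPOINT',
  'random-uuid@inbox-api.domain.com',
  { limit: 5 }
);
```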

&lt;h2&gt;
  
  
  Limitations and Potential Improvements
&lt;/h2&gt;

&lt;p&gt;While the &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html" rel="noopener noreferrer"&gt;SES sandbox restricts how many emails you can send&lt;/a&gt;, there seems to be no limitation on receiving mail.&lt;/p&gt;

&lt;p&gt;Our solution is not yet capable of providing attachments or the mail body. The &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-ses-actions.S3.html" rel="noopener noreferrer"&gt;SES S3 action&lt;/a&gt; already stores those in a bucket which can be used for an improved read API function.&lt;/p&gt;

&lt;p&gt;We could also drop the Lambda function that ties together the API Gateway and DynamoDB, by replacing it with a direct integration between the two services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;Check out the source code on GitHub&lt;/a&gt;. There's a step-by-step guide for you to try out this solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-readme.html#email-receiving" rel="noopener noreferrer"&gt;Receipt Rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/send-email-simulator.html" rel="noopener noreferrer"&gt;Test email edge cases with the AWS mailbox simulator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html" rel="noopener noreferrer"&gt;DynamoDB TTL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Enjoyed this article? I publish a new article every month. Connect with me on &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; and &lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;subscribe&lt;/a&gt; for new articles!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>testing</category>
    </item>
    <item>
      <title>Point Multiple Subdomains To The Same Frontend</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 03 Sep 2020 07:44:11 +0000</pubDate>
      <link>https://dev.to/michabahr/point-multiple-subdomains-to-the-same-frontend-2p2d</link>
      <guid>https://dev.to/michabahr/point-multiple-subdomains-to-the-same-frontend-2p2d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;. &lt;br&gt;
&lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce" rel="noopener noreferrer"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Back in 2019 I built an online ticketshop for sports clubs. At its core, the shop was a webapp that processes payments and sends PDFs via email. When it came to customization, things got tricky: Each club had a different name, different pictures, and sometimes even different questions they wanted to ask their customers. To give each of the clubs a customized experience, we provided each of them with their own subdomain. Eventually there were six different frontend deployments and multiple branches, and the code bases started to diverge. Recently I learned that you can use DNS ARecords to route all requests under a certain domain to the same frontend. Thanks to &lt;a href="https://twitter.com/handk85" rel="noopener noreferrer"&gt;DongGyun&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;This article explains how you can point multiple subdomains to the same frontend deployment by creating DNS records and a static website with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt;. That will enable you to give each of your customers a customized experience, while having just one frontend deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia3.giphy.com%2Fmedia%2FkdLSRH6v4dSc5eNNQ4%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia3.giphy.com%2Fmedia%2FkdLSRH6v4dSc5eNNQ4%2Fgiphy.gif" alt="Wildcard Domains Demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shortcut&lt;/strong&gt;: If you don't need Infrastructure as Code (IaC), then an ARecord in Route 53 with &lt;code&gt;*.yourdomain.com&lt;/code&gt; that points to your existing CloudFront distribution gets you the same result.&lt;/p&gt;
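&lt;p&gt;For reference, a minimal CDK sketch of that shortcut, assuming a &lt;code&gt;hostedZone&lt;/code&gt; and an existing CloudFront &lt;code&gt;distribution&lt;/code&gt; are defined elsewhere in your stack:&lt;/p&gt;

```typescript
// CDK v1 sketch: one wildcard alias record pointing at an existing
// CloudFront distribution. hostedZone and distribution are assumed
// to exist elsewhere in your stack.
import { ARecord, RecordTarget } from '@aws-cdk/aws-route53';
import { CloudFrontTarget } from '@aws-cdk/aws-route53-targets';

new ARecord(this, 'WildcardRecord', {
  zone: hostedZone,
  // routes every subdomain, e.g. bear.yourdomain.com
  recordName: '*.yourdomain.com',
  target: RecordTarget.fromAlias(new CloudFrontTarget(distribution)),
});
```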

&lt;p&gt;The magic is in the chapter "Wildcard Routing". &lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;Check out the full source code on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To deploy the solution of this article, you should have an AWS account and some experience with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;. It's also good to have an unused domain registered in Amazon Route 53, but we will also cover other providers and domains that are already in use.&lt;/p&gt;

&lt;p&gt;This article uses CDK version 1.60.0. Let me know if anything breaks in newer versions! &lt;/p&gt;

&lt;p&gt;Please bootstrap your account for CDK by running &lt;code&gt;cdk bootstrap&lt;/code&gt;. We will need this for the &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Optional: &lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/" rel="noopener noreferrer"&gt;Understanding how DNS and especially nameservers work&lt;/a&gt; will help you a lot with troubleshooting potential routing issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;Let's find a solution by putting ourselves in the customers' shoes. As a customer I want to go to &lt;code&gt;bear.picture.bahr.dev&lt;/code&gt; or &lt;code&gt;forest.picture.bahr.dev&lt;/code&gt; or any other address in the format &lt;code&gt;*.picture.bahr.dev&lt;/code&gt;, and then see a picture for the word at the beginning of the address. As a developer I want the least amount of complexity possible. Multiple frontend deployments increase complexity.&lt;/p&gt;

&lt;p&gt;The request flow would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fuser-flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fuser-flow.png" alt="Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see above that only the domain changes, but nothing else. At &lt;strong&gt;the core of the solution&lt;/strong&gt; are &lt;strong&gt;wildcard ARecords&lt;/strong&gt; which let us route traffic for any subdomain to a particular target. The website can then take the URL, extract the subdomain and ask for the right picture. In the next chapter we will take a look at each part in detail.&lt;/p&gt;
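&lt;p&gt;Extracting the subdomain can be as simple as splitting the hostname. This is a sketch with a hypothetical helper; in the browser you would pass &lt;code&gt;window.location.hostname&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical helper: pull the wildcard label out of the hostname,
// e.g. "bear" from "bear.picture.bahr.dev". In the browser you would
// call it with window.location.hostname.
function extractSubdomain(hostname: string): string | undefined {
  const labels = hostname.split('.');
  // picture.bahr.dev has three labels; anything in front of
  // that is the wildcard label we want
  return labels.length > 3 ? labels[0] : undefined;
}

extractSubdomain('bear.picture.bahr.dev'); // 'bear'
```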

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Froute53-preview.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Froute53-preview.png" alt="Route 53 ARecords"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Create A Hosted Zone
&lt;/h2&gt;

&lt;p&gt;To register DNS records in AWS, we need to &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zones-working-with.html" rel="noopener noreferrer"&gt;create a Hosted Zone in Route 53&lt;/a&gt;. &lt;a href="https://aws.amazon.com/route53/pricing/" rel="noopener noreferrer"&gt;Each Hosted Zone costs $0.50 per month&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Hosted Zone is easiest to set up if you have a domain that is managed by Route 53 and that you don't use for anything else yet. &lt;/p&gt;

&lt;p&gt;We will also look at how you can set up your Hosted Zone if you are already using your Route 53 domain for another purpose (e.g. your blog) or if that domain is managed by a different provider than Route 53.&lt;/p&gt;

&lt;p&gt;Depending on who manages your domain (e.g. Route 53 or GoDaddy) and whether you already use your apex domain for other websites, you have to tweak the solution a bit. In my example, I already use my apex domain &lt;code&gt;bahr.dev&lt;/code&gt; for my blog, and have the domain managed by GoDaddy. We will see how to specify the right records there in the following chapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Before deleting hosted zones, please make sure you delete all related records in the root hosted zone or third party provider. Dangling CNAME and NS records might &lt;a href="https://searchsecurity.techtarget.com/answer/What-is-subdomain-takeover-and-why-does-it-matter" rel="noopener noreferrer"&gt;allow an attacker to serve content in your name&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1. Fresh Domain That Is Managed By Route 53
&lt;/h3&gt;

&lt;p&gt;This is the easiest path. All we need is a Hosted Zone for our domain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Route 53 can now serve DNS records for that domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2. Used Domain That Is Managed By Route 53
&lt;/h3&gt;

&lt;p&gt;This assumes that you already have a Hosted Zone for your apex domain, use your apex domain for something different and want to use a subdomain instead. An apex domain is your registered root domain, e.g. &lt;code&gt;bahr.dev&lt;/code&gt; or &lt;code&gt;google.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We need to tell the DNS servers that information about the subdomain lives in another Hosted Zone. We do this by creating a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;// bahr.dev is already in use, so we'll start &lt;/span&gt;
&lt;span class="c1"&gt;// at the subdomain picture.bahr.dev&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apexDomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bahr.dev&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;apexDomain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// as above we create a hostedzone for the subdomain&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// add a ZoneDelegationRecord so that requests for *.picture.bahr.dev &lt;/span&gt;
&lt;span class="c1"&gt;// and picture.bahr.dev are handled by our newly created HostedZone&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nameServers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostedZoneNameServers&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rootZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Zone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apexDomain&lt;/span&gt; 
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZoneDelegationRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Delegation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;nameServers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rootZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A low time to live (TTL) allows for faster trial and error, as DNS caches expire quicker. You should increase it as you get ready for production.&lt;/p&gt;

&lt;p&gt;We will later add ARecords, so that requests to &lt;code&gt;picture.bahr.dev&lt;/code&gt; and &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; go to the same CloudFront distribution. &lt;code&gt;bahr.dev&lt;/code&gt; will not be affected.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.3. Domain Is Managed By A Provider Other Than AWS
&lt;/h3&gt;

&lt;p&gt;Again we will create a Hosted Zone in Route 53, but this time we need to do some manual work to register the nameservers of our Hosted Zone with our DNS provider. To get started, first create a Hosted Zone through the AWS console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fcreate-hosted-zone.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fcreate-hosted-zone.png" alt="Create Hosted Zone"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will give us a Hosted Zone with two entries for Nameservers (NS) and Start Of Authority (SOA). We will copy the authoritative nameserver, and tell our DNS provider to delegate requests to our Hosted Zone in AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fhosted-zone-records.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fhosted-zone-records.png" alt="Hosted Zone Records"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the authoritative nameserver from the SOA record, go to your DNS provider and create a nameserver record, where you replace the values for &lt;code&gt;Name&lt;/code&gt; and &lt;code&gt;Value&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Type: NS
Name: picture
Value: ns-1332.awsdns-38.org
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use a specific value like &lt;code&gt;picture&lt;/code&gt; if you want to start at a subdomain like &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; or use &lt;code&gt;@&lt;/code&gt; if you want to use your apex domain like &lt;code&gt;*.bahr.dev&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fgodaddy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fgodaddy.png" alt="Nameserver Record GoDaddy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then use the following CDK snippet to import the Hosted Zone that you created manually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; 
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Certificate
&lt;/h2&gt;

&lt;p&gt;Now that we have DNS routing set up, we can request and validate a certificate. We need this certificate to serve our website over HTTPS.&lt;/p&gt;

&lt;p&gt;With the CDK we can create and validate a certificate with a single construct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-certificatemanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Certificate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;subjectAlternativeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;validationDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;validationMethod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DNS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a lot going on here, so let's break it down. &lt;/p&gt;

&lt;p&gt;First we set the region to &lt;code&gt;us-east-1&lt;/code&gt;, because &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/migrate-ssl-cert-us-east/" rel="noopener noreferrer"&gt;CloudFront requires certificates to be in &lt;code&gt;us-east-1&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We then use the CDK construct &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;, which creates a certificate request plus a Lambda function that registers the validation CNAME record in Route 53. That record proves that we actually own the domain. &lt;/p&gt;

&lt;p&gt;The parameter &lt;code&gt;hostedZone&lt;/code&gt; specifies the Hosted Zone into which the validation record is written. This is the Hosted Zone we created earlier.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;domainName&lt;/code&gt; and &lt;code&gt;subjectAlternativeNames&lt;/code&gt; specify which domains the certificate should be valid for. The remaining parameters configure the validation process.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Frontend Deployment
&lt;/h2&gt;

&lt;p&gt;With the certificate in place, we can create a Single Page Application (SPA) deployment via S3 and CloudFront. We're using the npm package &lt;a href="https://www.npmjs.com/package/cdk-spa-deploy" rel="noopener noreferrer"&gt;cdk-spa-deploy&lt;/a&gt; to reduce the amount of code required for configuring the S3 bucket and attaching a CloudFront distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SPADeploy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cdk-spa-deploy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SPADeploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaDeployment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSiteWithCloudfront&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;indexDoc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;websiteFolder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;certificateARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificateArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;cfAliases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;index.html&lt;/code&gt; can be an HTML file as short as &lt;code&gt;&amp;lt;p&amp;gt;Hello world!&amp;lt;/p&amp;gt;&lt;/code&gt; and should be stored in the folder &lt;code&gt;./website&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the browser we can use JavaScript to read the subdomain. The line of code below splits the host &lt;code&gt;ice.picture.bahr.dev&lt;/code&gt; into the array &lt;code&gt;['ice', 'picture', 'bahr', 'dev']&lt;/code&gt; and picks the first element, &lt;code&gt;'ice'&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that information, the website can then contact the CMS to get the right assets for your customer.&lt;/p&gt;
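&lt;p&gt;Note that the one-liner above also yields a value on the base domain itself (&lt;code&gt;picture.bahr.dev&lt;/code&gt; would yield &lt;code&gt;picture&lt;/code&gt;). As a sketch of a slightly more defensive helper (the function name and the null convention are my own assumptions, not part of the original setup):&lt;/p&gt;

```typescript
// Hypothetical helper: extract the customer subdomain from a host name.
// Returns null when the visitor is on the base domain itself, or on an
// unrelated host such as a raw CloudFront URL.
function getSubdomain(host: string, baseDomain: string): string | null {
  if (host === baseDomain) {
    return null; // no customer subdomain
  }
  if (!host.endsWith('.' + baseDomain)) {
    return null; // unknown host
  }
  // Keep everything before ".baseDomain", e.g. "ice" or "a.b".
  return host.slice(0, host.length - baseDomain.length - 1);
}
```

&lt;p&gt;With this, &lt;code&gt;getSubdomain('ice.picture.bahr.dev', 'picture.bahr.dev')&lt;/code&gt; returns &lt;code&gt;'ice'&lt;/code&gt;, while the base domain returns &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;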

&lt;h2&gt;
  
  
  4. Wildcard Routing
&lt;/h2&gt;

&lt;p&gt;And finally it's time for the wildcard routing. With the CDK code below, all requests to &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; and &lt;code&gt;picture.bahr.dev&lt;/code&gt; will be routed to the frontend deployment we set up above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CloudFrontTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-route53-targets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAlias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CloudFrontTarget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;distribution&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WildCardARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once all the DNS records have propagated, we can test our setup. Please note that deploying the whole solution sometimes takes 10 to 15 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Here's the full CDK code that you can copy into your existing CDK codebase. &lt;/p&gt;

&lt;p&gt;I suggest you &lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;start by checking out the source code&lt;/a&gt; and adjusting the domain and Hosted Zone to your needs. Add a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt; if you need it. Make sure to run &lt;code&gt;cdk bootstrap&lt;/code&gt; if you haven't done so yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SPADeploy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cdk-spa-deploy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-certificatemanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CloudFrontTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-route53-targets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WildcardSubdomainsStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Certificate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;subjectAlternativeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;validationDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;validationMethod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DNS&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SPADeploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaDeployment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSiteWithCloudfront&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;indexDoc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;websiteFolder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;certificateARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificateArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;cfAliases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAlias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CloudFrontTarget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;distribution&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WildCardARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run &lt;code&gt;AWS_PROFILE=myProfile npm run deploy&lt;/code&gt; to deploy the solution. Replace &lt;code&gt;myProfile&lt;/code&gt; with whatever profile you're using for AWS. &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html" rel="noopener noreferrer"&gt;Here's more about AWS profiles&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The deployment may take somewhere between 10 and 15 minutes. Grab a coffee and let CDK do its thing. If you run into problems, check the troubleshooting section below.&lt;/p&gt;

&lt;p&gt;Once the deployment is done, you should be able to visit any subdomain of the domain you specified (e.g. &lt;code&gt;bear.picture.bahr.dev&lt;/code&gt; for the domain &lt;code&gt;picture.bahr.dev&lt;/code&gt;) and see your website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The DNS routing doesn't work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A high time to live (TTL) on DNS records can make changes difficult to test. Lower the TTL as much as possible while you test.&lt;/p&gt;

&lt;p&gt;If your domain is not managed by Route 53, make sure that the DNS routing from your DNS provider is set up correctly.&lt;/p&gt;

&lt;p&gt;If you use your apex domain for something else, make sure to set up a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt; that delegates DNS queries for your subdomain to your new Hosted Zone.&lt;/p&gt;
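&lt;p&gt;For those who need it, here is a rough CDK sketch of such a delegation. Treat it as an illustration under assumptions: the construct IDs are made up, the apex zone is assumed to be managed in Route 53 as well, and &lt;code&gt;hostedZone&lt;/code&gt; refers to the zone for &lt;code&gt;picture.bahr.dev&lt;/code&gt; created earlier.&lt;/p&gt;

```typescript
import { HostedZone, ZoneDelegationRecord } from '@aws-cdk/aws-route53';

...

// Look up the apex zone (assumed to be managed in Route 53 too).
const apexZone = HostedZone.fromLookup(this, 'ApexZone', {
  domainName: 'bahr.dev',
});

// Delegate DNS queries for the subdomain to the new Hosted Zone.
new ZoneDelegationRecord(this, 'PictureDelegation', {
  zone: apexZone,
  recordName: 'picture.bahr.dev',
  nameServers: hostedZone.hostedZoneNameServers!,
});
```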

&lt;blockquote&gt;
&lt;p&gt;The deployment failed to clean up.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Depending on the step at which the deployment fails, not all resources may get cleaned up. The most likely culprit is the CNAME record created by the Lambda function of the &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;. Go to the Hosted Zone, remove the CNAME record, and delete the stack by running &lt;code&gt;cdk destroy&lt;/code&gt; or through the AWS console's CloudFormation service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Failed to create resource. Cannot read property 'Name' of undefined&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Clean up and remove the stack, then redeploy it. I'm not sure what causes this error, but retrying fixed it for me.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The certificate validation times out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Make sure that the CNAME record required for validation is actually visible to public DNS servers. If you've used your domain before, set up the right &lt;code&gt;ZoneDelegationRecord&lt;/code&gt;. This can be a bit tricky, so feel free to &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;reach out to me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;Check out the full source code&lt;/a&gt; and try it yourself! If you'd like to contribute, a PR to &lt;a href="https://github.com/cdk-patterns" rel="noopener noreferrer"&gt;cdk patterns&lt;/a&gt; is probably a good idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/" rel="noopener noreferrer"&gt;What is DNS?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freecodecamp.org/news/how-to-host-a-static-website-with-s3-cloudfront-and-route53-7cbb11d4aeea/" rel="noopener noreferrer"&gt;Host a static website with CloudFront and S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html" rel="noopener noreferrer"&gt;AWS profiles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/nideveloper/CDK-SPA-Deploy" rel="noopener noreferrer"&gt;cdk-spa-deploy on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cdk</category>
      <category>ux</category>
    </item>
    <item>
      <title>Archive your AWS data to reduce storage cost</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 10 Aug 2020 08:41:34 +0000</pubDate>
      <link>https://dev.to/michabahr/archive-your-aws-data-to-reduce-storage-cost-364c</link>
      <guid>https://dev.to/michabahr/archive-your-aws-data-to-reduce-storage-cost-364c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;br&gt;
&lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS offers a variety of general purpose storage solutions. While DynamoDB is the best option when latency and a variety of access patterns matter most, S3 allows for cost reduction when access patterns are less complex and latency is less critical.&lt;/p&gt;

&lt;p&gt;This article describes the available options for archiving data, how to prepare that data for long-term archival, and how to let S3 transition data between storage tiers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OLQjejbX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_storage_vs_latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OLQjejbX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_storage_vs_latency.png" alt="Storage Price vs Latency"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below you'll find a table comparing the prices and access latencies as of August 2020.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;All prices are for us-east-1. This article focuses on storage cost only.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You should have an &lt;strong&gt;AWS account&lt;/strong&gt; and gained &lt;strong&gt;first experience with DynamoDB or S3&lt;/strong&gt;. The code snippets are written in Python and are intended to run on &lt;strong&gt;AWS Lambda&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moving Data
&lt;/h2&gt;

&lt;p&gt;As you've seen in the previous table, you can achieve a significant storage cost reduction by moving your data to a cheaper storage solution.&lt;/p&gt;

&lt;p&gt;There are 3 major paths when archiving data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DynamoDB to S3&lt;/li&gt;
&lt;li&gt;S3 Storage Tiers&lt;/li&gt;
&lt;li&gt;Final Archival with S3 Glacier&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first path requires a Lambda function; the others can be achieved without additional glue code. We will, however, look at data aggregation for small objects, as infrequent-access tiers are less suitable for them.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB to S3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v4Ldux8F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v4Ldux8F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_1.png" alt="Badge for DynamoDB to S3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data from DynamoDB to S3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Moving data out of DynamoDB makes sense when that data is becoming stale but remains interesting for future use cases.&lt;/p&gt;

&lt;p&gt;Performance metrics are one example: we're most interested in the recent weeks and rarely look at data from months ago, but we still want to keep it around for later analysis or troubleshooting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data from DynamoDB to S3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To move data from DynamoDB to S3, we can use DynamoDB's Time to Live (TTL) feature in combination with event streams. This approach requires four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Specifying a TTL attribute on a DynamoDB table&lt;/li&gt;
&lt;li&gt;Adding an expiry timestamp to records that should expire&lt;/li&gt;
&lt;li&gt;Activating a stream that DynamoDB will emit deleted records to&lt;/li&gt;
&lt;li&gt;Attaching a lambda to this stream, which checks for &lt;code&gt;REMOVE&lt;/code&gt; events and writes the deleted records into an S3 bucket&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first three steps are covered in my article &lt;a href="https://bahr.dev/2020/02/02/aggregate-ddb/"&gt;How to analyse and aggregate data from DynamoDB&lt;/a&gt;.&lt;/p&gt;
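&lt;p&gt;As a quick sketch of steps 1 and 2, the TTL configuration and an expiring record can look like the following (the table name, key and attribute name are hypothetical; the parameters match boto3's &lt;code&gt;update_time_to_live&lt;/code&gt;):&lt;/p&gt;

```python
import time

# Step 1: parameters for dynamodb.update_time_to_live(**ttl_params),
# where dynamodb = boto3.client('dynamodb').
# Table and attribute names are hypothetical.
ttl_params = {
    'TableName': 'my-metrics-table',
    'TimeToLiveSpecification': {
        'Enabled': True,
        'AttributeName': 'ttl',
    },
}

# Step 2: give each record an epoch-seconds timestamp after which
# DynamoDB may delete it (here: roughly 90 days from now).
expiry = int(time.time()) + 90 * 24 * 60 * 60
item = {
    'Id': {'N': '42'},
    'ttl': {'N': str(expiry)},
}
```

&lt;p&gt;DynamoDB deletes expired items some time after the timestamp passes, and each deletion shows up as a &lt;code&gt;REMOVE&lt;/code&gt; event on the stream.&lt;/p&gt;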

&lt;p&gt;The lambda function to transition records to S3 can be as short as the following snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Records'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'eventName'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;'DELETE'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'dynamodb'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'NewImage'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# this assumes that there is a partition key called Id 
&lt;/span&gt;        &lt;span class="c1"&gt;# which is a number, and that there is no sort key
&lt;/span&gt;        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'dynamodb'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'Keys'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'Id'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'N'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;object_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"data-from-dynamodb/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'my_bucket'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will store deleted records in the S3 bucket &lt;code&gt;my_bucket&lt;/code&gt;. No data is lost, the DynamoDB table stays small, and you get an instant 90% cost reduction on storage.&lt;/p&gt;
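&lt;p&gt;The 90% figure follows directly from the storage prices (us-east-1, August 2020: roughly $0.25 per GB-month for DynamoDB versus $0.023 for S3 Standard):&lt;/p&gt;

```python
# Approximate us-east-1 storage prices in USD per GB-month (August 2020).
dynamodb_storage = 0.25
s3_standard = 0.023

savings = 1 - s3_standard / dynamodb_storage
print(f"storage savings: {savings:.0%}")  # about 91%
```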

&lt;h3&gt;
  
  
  S3 Storage Tiers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kv1nQYBR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kv1nQYBR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_2.png" alt="Badge for S3 Storage Tiers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your data is accessed infrequently&lt;/strong&gt; you can achieve further cost savings by picking the right storage tier.&lt;/p&gt;

&lt;p&gt;S3 Lifecycle Transitions allow us to move objects between storage tiers without them leaving the S3 bucket. We define rules where we specify which storage tier an object shall be moved to once it reaches a certain age.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7tNYbfQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://docs.aws.amazon.com/AmazonS3/latest/dev/images/lifecycle-transitions-v2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7tNYbfQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://docs.aws.amazon.com/AmazonS3/latest/dev/images/lifecycle-transitions-v2.png" alt="S3 Transition Paths"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can read all about the S3 lifecycle transitions on the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-transition-general-considerations.html"&gt;official AWS documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data between S3 tiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S3 Standard gives you the most reliability and fastest access speed. If you're okay with 99.9% availability or you only access your data rarely (e.g. for regulatory checks), then the non-standard tiers can give you a cost advantage. The durability of your data is not affected (unless you pick the One Zone-IA tier).&lt;/p&gt;

&lt;p&gt;You should also aggregate data before moving it to a storage tier other than S3 Standard or S3 Intelligent-Tiering, as there is a &lt;a href="https://aws.amazon.com/s3/storage-classes/"&gt;minimum capacity charge per object&lt;/a&gt;. As a rule of thumb, aggregate your objects until the result is at least 1MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data between S3 tiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Due to the minimum capacity charge, we will &lt;strong&gt;start by aggregating data&lt;/strong&gt;. If all of your objects in S3 are already 1MB or larger, you can skip directly to the lifecycle rules. To aggregate objects we can use any compute service (EC2, Fargate, Lambda) to load objects from S3, aggregate them and write the aggregated data back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;simplejson&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;one_mb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'my_bucket'&lt;/span&gt;
&lt;span class="n"&gt;date_prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'data-from-dynamodb/2020-07'&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Downloading data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;files_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Contents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Key'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Body'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Aggregating data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;aggregated_objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;aggregator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;aggregator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;one_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"aggregated/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;aggregator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"aggregated/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Deleting data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_prefix&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Contents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delete_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;obj_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Key'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Done"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code snippet uses &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html"&gt;boto3&lt;/a&gt; to load all records from the folder &lt;code&gt;data-from-dynamodb/2020-07&lt;/code&gt;, aggregate them, upload the result into the folder &lt;code&gt;aggregated&lt;/code&gt; and then delete the old objects. Note that &lt;code&gt;list_objects&lt;/code&gt; returns at most 1,000 keys per call; use a paginator for larger folders.&lt;/p&gt;

&lt;p&gt;Now that we've packaged our objects, let's continue with &lt;strong&gt;lifecycle transitions&lt;/strong&gt;. S3 can be configured to automatically move objects between storage tiers.&lt;/p&gt;

&lt;p&gt;In this article we will configure the lifecycle transitions through the AWS console. You can also use the CDK's &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.LifecycleRule.html"&gt;LifecycleRules&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.Transition.html"&gt;Transition&lt;/a&gt;s to build an Infrastructure as Code solution.&lt;/p&gt;
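&lt;p&gt;If you prefer scripting over clicking, the same rule can also be expressed as a boto3 lifecycle payload (the bucket name is hypothetical; the prefix matches the example above):&lt;/p&gt;

```python
# Lifecycle rule: move objects under aggregated/ to Standard-IA
# after 30 days. Apply it with a boto3 S3 client:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket='my_bucket', LifecycleConfiguration=lifecycle)
lifecycle = {
    'Rules': [{
        'ID': 'archive-aggregated-objects',
        'Filter': {'Prefix': 'aggregated/'},
        'Status': 'Enabled',
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},
        ],
    }]
}
```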

&lt;p&gt;To get started, open your S3 bucket in the AWS console and open the Management tab. Click on "Add lifecycle rule" to configure a lifecycle. By applying the lifecycle rule to the folder &lt;code&gt;aggregated&lt;/code&gt;, we only transition data which has been packaged for archival.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zE8zFWsl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zE8zFWsl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_1.png" alt="Lifecycle Rules Step 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specify a transition to Standard-IA (Infrequent Access) after 30 days. We're assuming here that the data will be archived and therefore infrequently accessed, but you can increase this number or pick another storage tier as you see fit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wpV8g4JL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wpV8g4JL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_2.png" alt="Lifecycle Rules Step 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Review and complete the lifecycle rule.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jQ2c0N9Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jQ2c0N9Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_3.png" alt="Lifecycle Rules Step 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After 30 days you should start to see in your bill that some objects are priced at a less expensive storage tier. If you picked S3 Infrequent Access, that's another 45% saved on storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data on Ice with S3 Glacier
&lt;/h3&gt;

&lt;p&gt;While we're only looking at Glacier here, you can apply the same principles to moving data to Intelligent-Tiering, One Zone-IA and Glacier Deep Archive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0lnPwRnT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0lnPwRnT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_3.png" alt="Badge for S3 Glacier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data to Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S3 Glacier and S3 Glacier Deep Archive become interesting options when you need to store data for a very long time (5+ years) and access it only rarely (once or twice a year, or less).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data to Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As we've previously aggregated our data, we can add additional lifecycle transitions to move the data from S3 Infrequent Access to S3 Glacier. Instead of the Infrequent Access tier, now pick a Glacier option and adjust the time before transition accordingly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Nra_tC7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_glacier.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Nra_tC7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_glacier.png" alt="Lifecycle Rule Glacier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it. Your data is now on ice, and you get an additional 68% cost reduction on storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to retrieve data from Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To access data that is stored in Glacier, &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/user-guide/restore-archived-objects.html"&gt;you have to restore a copy of the object&lt;/a&gt;. The copy will be available for as long as you specified. The retrieval, however, can take up to 12 hours.&lt;/p&gt;
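&lt;p&gt;With boto3, restoring works through the &lt;code&gt;restore_object&lt;/code&gt; call. A minimal sketch (bucket and key are hypothetical):&lt;/p&gt;

```python
# Keep the restored copy around for 7 days; the 'Standard' retrieval
# tier typically completes within hours ('Expedited' and 'Bulk'
# trade cost against speed).
restore_request = {
    'Days': 7,
    'GlacierJobParameters': {'Tier': 'Standard'},
}
# With a boto3 S3 client:
#   s3.restore_object(Bucket='my_bucket',
#                     Key='aggregated/2020-07/0',
#                     RestoreRequest=restore_request)
```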

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Do you have big DynamoDB tables? Figure out what data you can archive, and start moving it to S3 for a 90% cost reduction!&lt;/p&gt;

&lt;p&gt;Do you already have data in S3? Add a lifecycle transition to a lower tier and aggregate objects if needed.&lt;/p&gt;

&lt;p&gt;Did you enjoy this article? &lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Sign up for the mailing list&lt;/a&gt; and get new articles like this straight to your inbox!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing Pages for &lt;a href="https://aws.amazon.com/dynamodb/pricing/"&gt;DynamoDB&lt;/a&gt; and &lt;a href="https://aws.amazon.com/s3/pricing/"&gt;S3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/database/automatically-archive-items-to-s3-using-dynamodb-time-to-live-with-aws-lambda-and-amazon-kinesis-firehose/"&gt;Use Kinesis to move data from DynamoDB to S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/whitfin/s3-utils"&gt;A tool to concat files in S3&lt;/a&gt;, might be helpful to optimize aggregation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/33200748/1309035"&gt;Use S3's multipart upload to aggregate objects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>database</category>
      <category>lambda</category>
    </item>
    <item>
      <title>How to pick the right Compute Savings Plan for Serverless Workloads on AWS</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Tue, 30 Jun 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-pick-the-right-compute-savings-plan-for-serverless-workloads-on-aws-24n1</link>
      <guid>https://dev.to/michabahr/how-to-pick-the-right-compute-savings-plan-for-serverless-workloads-on-aws-24n1</guid>
      <description>&lt;p&gt;&lt;a href="https://aws.amazon.com/savingsplans/faq/"&gt;Compute Savings Plans&lt;/a&gt; are a flexible approach to lowering your AWS bill by committing to an hourly spending. Finding the right commitment can however be tricky when we consider free tiers, varying workloads, available budgets and your plans for the next years. This article describes the available options for serverless workloads, how to pick the right plan and how to improve on existing plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;This article applies to you if you will &lt;strong&gt;spend at least $30 per month on AWS Lambda or Fargate&lt;/strong&gt; for at least a year. Below this amount, it is likely that you will overpay. AWS starts to give recommendations only once you exceed $0.10 per hour (about $72 per month).&lt;/p&gt;

&lt;p&gt;To use this guide effectively, &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/ce-enable.html"&gt;you should have enabled the Cost Explorer for at least 2 months&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article focuses on a &lt;strong&gt;single account&lt;/strong&gt;. If you use AWS Organizations you can still apply what you learn here, but &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/what-is-savings-plans.html"&gt;check the docs regarding multi account setups&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Savings Plans will first apply to usage in the account that owns the plan, and then apply to usage in other accounts in the AWS Organization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To simplify this article, I will ignore EC2 and DynamoDB. Have a look at &lt;a href="https://aws.amazon.com/blogs/aws/dynamodb-price-reduction-and-new-reserved-capacity-model/"&gt;DynamoDB Reserved Capacity&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide"&gt;AWS’s Guide to Savings Plans&lt;/a&gt; if you’re curious about lower prices for DynamoDB and EC2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;The numbers in this article are based on my understanding of public documentation and might be wrong. Start with a small savings plan and add more later to avoid overpaying.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do Savings Plans save money?
&lt;/h2&gt;

&lt;p&gt;With a Savings Plan you pay for compute before you use it, and in exchange your &lt;a href="https://aws.amazon.com/savingsplans/pricing/"&gt;rates for that purchase are lowered&lt;/a&gt; by up to &lt;strong&gt;17% for Lambda&lt;/strong&gt; and up to &lt;strong&gt;52% for Fargate&lt;/strong&gt;. In this article we will refer to the hourly commitment as prepaid compute.&lt;/p&gt;

&lt;p&gt;Once purchased, a Savings Plan gives you hourly packages of prepaid compute that compute services such as Lambda and Fargate can use. Any additional compute above the prepaid amount is priced at regular On-Demand rates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PzmkPlSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplans_sp_flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PzmkPlSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplans_sp_flow.png" alt="Savings Plans generate prepaid compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the compute usage matches the prepaid amount, you don’t generate any additional spending beyond what you already paid for. Achieving this perfect fit is difficult as compute usage tends to vary in serverless environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gwqVUrle--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_full.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gwqVUrle--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_full.png" alt="100% usage of prepaid compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When your services use more compute than what you already prepaid for, additional compute will be charged at On-Demand rates. This happens automatically and there’s nothing more you need to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TY6xFG2V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_ondemand.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TY6xFG2V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_ondemand.png" alt="More usage than what's prepaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When your services use less compute than what you prepaid for, the remaining amount is not carried over into the next hour, but discarded. You pay the commitment whether you use it or not. This may happen if you have irregular workloads (e.g. nightly jobs) or if the free tier covers a lot of your spending.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X_MNXgFX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_overpay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X_MNXgFX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_overpay.png" alt="Less usage than what's prepaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Savings Plan will show up as a reduction of compute spending on your AWS bill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oUAaa6KT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_lambda_bill.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oUAaa6KT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_lambda_bill.png" alt="AWS bill with prepaid Lambda"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below you see a picture of my recent compute spending (filtered to Lambda). While the first few days don’t incur any spending thanks to the free tier, the following days show spending as well as savings (the red bar in the negatives). My Savings Plan covers roughly $0.60 each day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nkvylL8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_applied_savings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nkvylL8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_applied_savings.png" alt="Applied savings in cost explorer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this does not mean that I save $0.60 each day. Instead, I bought the compute in advance and received a 17% discount compared to On-Demand. That prepaid compute now lowers my daily spending.&lt;/p&gt;

&lt;h2&gt;
  
  
  What options are there?
&lt;/h2&gt;

&lt;p&gt;When picking a savings plan, there are two major questions to answer: For how long do you commit and when do you want to pay?&lt;/p&gt;

&lt;p&gt;You can pick a term length of 1 or 3 years. The longer the term, the better your rates for Fargate. For Lambda, the term length doesn’t appear to affect pricing. As for the payment options, you can choose between “No upfront”, “Partial upfront” and “All upfront”. The more you pay upfront, the better your rates.&lt;/p&gt;

&lt;p&gt;What do these upfront terms mean?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No upfront: Monthly charges&lt;/li&gt;
&lt;li&gt;Partial upfront: 50% when you buy the savings plan, the rest as monthly charges&lt;/li&gt;
&lt;li&gt;All upfront: One payment when you buy the plan&lt;/li&gt;
&lt;/ul&gt;
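&lt;p&gt;As a rough sketch (assuming the 30-day months used in this article’s examples, and assuming the non-upfront remainder is spread evenly over the term), the three options split a commitment like this:&lt;/p&gt;

```python
def payment_schedule(hourly_commitment, years, option):
    """Split a Savings Plan commitment into (upfront, monthly) payments.

    Illustrative only: assumes 30-day months and an even monthly
    spread of the non-upfront remainder.
    """
    months = years * 12
    total = hourly_commitment * 24 * 30 * months
    if option == "all_upfront":
        return total, 0.0
    if option == "partial_upfront":
        upfront = total / 2
        return upfront, (total - upfront) / months
    if option == "no_upfront":
        return 0.0, total / months
    raise ValueError(f"unknown payment option: {option}")
```

&lt;p&gt;For example, a $0.85 hourly commitment over 1 year with partial upfront comes out to $3,672 upfront and about $306 per month.&lt;/p&gt;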

&lt;p&gt;Here is an overview of &lt;a href="https://aws.amazon.com/savingsplans/pricing/"&gt;the possible savings rates for us-east-1 as of 2020-06-28&lt;/a&gt;. Please check the rates for your region before making any purchase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment Option&lt;/th&gt;
&lt;th&gt;Term Length&lt;/th&gt;
&lt;th&gt;Savings over On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;17%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Fargate&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment Option&lt;/th&gt;
&lt;th&gt;Term Length&lt;/th&gt;
&lt;th&gt;Savings over On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;52%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How do I pick the right plan?
&lt;/h2&gt;

&lt;p&gt;If your compute spending is above $0.10 per hour &lt;a href="https://console.aws.amazon.com/cost-management/home#/savings-plans/recommendations"&gt;AWS will give you recommendations for Savings Plans&lt;/a&gt; which also account for variable usage patterns. Have a look at those numbers before doing your own calculations. My spending was too low for any recommendations.&lt;/p&gt;

&lt;p&gt;The key to picking the right savings plan is to understand your recent and future spending. Once you know how much you regularly spend, I suggest that you pick a small amount to start with. As a rule of thumb you can pick 50% of your hourly spending. Only once you have data on how well the small savings plan works should you consider committing to more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understand your current spending
&lt;/h3&gt;

&lt;p&gt;Start by &lt;a href="https://console.aws.amazon.com/cost-management/home"&gt;opening the AWS Cost Explorer&lt;/a&gt;. If you haven’t used it before, &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/ce-enable.html"&gt;enable the Cost Explorer now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At the top of the graph select Daily as the granularity, and filter the services to only show “Lambda” and “EC2 Container Service” (= Fargate). If you can select Hourly that’s even better, but be aware that &lt;a href="https://aws.amazon.com/about-aws/whats-new/2019/11/aws-cost-explorer-supports-hourly-resource-level-granularity/"&gt;hourly granularity incurs additional charges&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ch_7IvFN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_cost_explorer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ch_7IvFN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_cost_explorer.png" alt="Cost Explorer for two months of compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;“The Hourly commitment is the Savings Plans rate, and not the On-demand spend” &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-purchase.html#purchase-sp-direct"&gt;(AWS Docs)&lt;/a&gt;. This means that when you see an hourly spending of $1, the corresponding hourly commitment at a savings rate of 17% is &lt;code&gt;$1 * (1-17%) = $0.83&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s assume a &lt;em&gt;minimum&lt;/em&gt; Lambda spending of $1 per hour or $720 per month. We’re looking for a 1 year commitment. All Upfront gives us the best rate of 17% for Lambda. With an hourly commitment of $0.83 this brings us to a &lt;strong&gt;single payment of $0.83 x 24 x 30 x 12 = $7,171&lt;/strong&gt;. As a result &lt;strong&gt;we save $1,469&lt;/strong&gt; compared to On-Demand rates ($8,640).&lt;/p&gt;

&lt;p&gt;When picking Partial Upfront, we commit to &lt;code&gt;$1 * (1-15%) = $0.85&lt;/code&gt; per hour, pay $3,672 upfront and then pay another $306 every month. With 15% savings on compute, we save $1,296 compared to On-Demand rates.&lt;/p&gt;
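&lt;p&gt;The arithmetic from the two examples above can be reproduced in a few lines (using the simplified year of 12 x 30-day months):&lt;/p&gt;

```python
# $1/hour of On-Demand Lambda spend, simplified 12 x 30-day year
on_demand_hourly = 1.00
hours_per_year = 24 * 30 * 12

def committed_cost(savings_rate):
    # The commitment is the discounted rate, not the On-Demand spend
    hourly_commitment = on_demand_hourly * (1 - savings_rate)
    return hourly_commitment * hours_per_year

on_demand_total = on_demand_hourly * hours_per_year       # $8,640
print(on_demand_total - committed_cost(0.17))  # ~$1,469 (All Upfront)
print(on_demand_total - committed_cost(0.15))  # ~$1,296 (Partial Upfront)
```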

&lt;p&gt;As you can see in the picture above, my spending for compute was roughly $1 per day (not hour), but I had 6 days that were covered by the free tier. A savings plan barely made sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  Savings Planner
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://savings.bahr.dev"&gt;The savings planner&lt;/a&gt; helps you find the right savings rate. You upload your recent cost reports, enter your savings rates as well as an hourly commitment. The website then tells you how much it expects you to save.&lt;/p&gt;

&lt;p&gt;I bought a savings plan, and when I compared the utilization report on AWS, the numbers were even better than the savings planner had predicted.&lt;/p&gt;

&lt;p&gt;The website is frontend only (no data sent to any server) and is &lt;a href="https://github.com/bahrmichael/savingsplanner"&gt;open source on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier consideration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If the free tier doesn’t last a single day of your cost, then you can skip this section.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With a compute spending of $30 per month, or $0.04 per hour, one might think that a savings plan of $0.04 per hour makes sense. If some days are covered by the free tier, however, we will overcommit by $0.04 per hour, or $0.96 on each of those days. From my understanding this means that on the first 6 days I would pay $0.04 x 24 x 6 = $5.76 more than with On-Demand pricing, and on the remaining 24 days I would save 24 x 24 x $0.04 x 12% = $2.76. In total that would put me at an overpay of about $3.&lt;/p&gt;

&lt;p&gt;You can use the following formula to determine if a certain commitment makes sense for you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;hourly_commitment * 24 * days_not_covered_by_free_tier * savings_rate - hourly_commitment * 24 * days_covered_by_free_tier = total_savings&lt;/p&gt;
&lt;/blockquote&gt;
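&lt;p&gt;Expressed in Python, the rule of thumb above looks like this, reproducing the $30-per-month example where 6 free-tier days turn the plan into a roughly $3 overpay:&lt;/p&gt;

```python
def total_savings(hourly_commitment, savings_rate,
                  days_covered_by_free_tier, days_in_month=30):
    """Savings on billable days minus wasted commitment on free-tier days.

    A negative result means the plan costs more than On-Demand.
    """
    billable_days = days_in_month - days_covered_by_free_tier
    saved = hourly_commitment * 24 * billable_days * savings_rate
    wasted = hourly_commitment * 24 * days_covered_by_free_tier
    return saved - wasted

print(round(total_savings(0.04, 0.12, 6), 2))  # roughly -3.0
```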

&lt;h3&gt;
  
  
  Start small and iterate
&lt;/h3&gt;

&lt;p&gt;Do not just average your spending down to an hourly level and pick that as your hourly commitment. If your spending isn’t the same every single hour, I suggest starting with a smaller savings plan instead and purchasing another one when you see that there’s still room for improvement.&lt;/p&gt;

&lt;p&gt;While we pick an hourly commitment based on our previous spending, we &lt;strong&gt;commit to future spending&lt;/strong&gt;. If you’re not sure that you’ll keep spending that amount for the next 1 or 3 years, be cautious when considering a savings plan.&lt;/p&gt;

&lt;p&gt;Assume you have the following spending for every month:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cls9SSs3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cls9SSs3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_1.png" alt="Savings Iteration 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the overly simplistic formula &lt;code&gt;hourly_commitment * 24 * days_with_spending_above_commitment * savings_rate - hourly_commitment * 24 * days_with_spending_below_commitment&lt;/code&gt; we could save $5 by committing to $0.04 per hour ($0.96 per day) with an All Upfront payment.&lt;/p&gt;

&lt;p&gt;Once we apply that savings plan, our remaining spending looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NHyEal1D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NHyEal1D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_2.png" alt="Savings Iteration 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding another $0.04 savings plan would now put us at an additional -$0.72 of savings. That’s not savings but overcommitment! Therefore we should either not buy a second savings plan, or consider a lower hourly commitment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://savings.bahr.dev"&gt;The savings planner can help you finding the right amount based on cost exports.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep in mind that you can always buy additional savings plans, but revoking existing ones is not possible. Start small and iterate!&lt;/p&gt;

&lt;h2&gt;
  
  
  How do I buy a savings plan?
&lt;/h2&gt;

&lt;p&gt;You understand how savings plans work and know what hourly commitment makes sense for you? Good! &lt;a href="https://console.aws.amazon.com/cost-management/home"&gt;Now go to Cost Management in the AWS Console&lt;/a&gt;. In the left-hand navbar, click on “Purchase Savings Plan”. Here you can pick a term length of 1 or 3 years, specify your hourly commitment, and decide whether you want to pay all upfront, partial upfront or no upfront.&lt;/p&gt;

&lt;p&gt;Add the savings plan to your cart and carefully review the order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did I pick the right term length?&lt;/li&gt;
&lt;li&gt;Will I use up the hourly commitment or will I overpay?&lt;/li&gt;
&lt;li&gt;Am I comfortable with an upfront payment?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you’re ready to purchase, click on Submit order.&lt;/p&gt;

&lt;p&gt;That’s it. You will now get reduced rates for your compute usage. Make sure to check the utilization report over the next few days.&lt;/p&gt;

&lt;p&gt;If you selected All or Partial Upfront, you will soon see a big spike in the Cost Explorer. Don’t worry, that’s just the upfront payment for the savings plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how much am I actually saving?
&lt;/h2&gt;

&lt;p&gt;After a couple days you can &lt;a href="https://console.aws.amazon.com/cost-management/home#/savings-plans/utilization"&gt;visit the Utilization report&lt;/a&gt;. This page will show you how much you save with your savings plan and if there is any overcommitment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xyhqWW-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_utliization.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xyhqWW-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_utliization.png" alt="Utilization Report"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, we’re always at 100% utilization, except for a few days at the beginning of each month where the free tier applies. Overall we’re still saving about 15% on our compute bill.&lt;/p&gt;

&lt;p&gt;Over the next few days you should also notice a small drop in the Cost Explorer for spending related to Lambda and Fargate, as prepaid compute is applied automatically. I suggest waiting a couple of weeks before purchasing another savings plan. Use the same approach that we followed above, but keep in mind that your spending is now lower: only factor in spending since your most recent savings plan purchase when analyzing the next one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-monitoring.html"&gt;Have a look at what the official documentation says about monitoring savings plans&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/new-savings-plans-for-aws-compute-services/"&gt;Savings Plans Launch Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.lastweekinaws.com/blog/aws-begins-sunsetting-ris-replaces-them-with-something-much-much-better/"&gt;Last Week In AWS about Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://savings.bahr.dev/"&gt;Savings Planner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide"&gt;AWS’s Guide to Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-monitoring.html"&gt;Monitoring Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/dynamodb-price-reduction-and-new-reserved-capacity-model/"&gt;DynamoDB Reserved Capacity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cost</category>
    </item>
    <item>
      <title>Measuring Performance with CloudWatch Custom Metrics and Insights</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 27 Apr 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/measuring-performance-with-cloudwatch-custom-metrics-and-insights-2g4e</link>
      <guid>https://dev.to/michabahr/measuring-performance-with-cloudwatch-custom-metrics-and-insights-2g4e</guid>
      <description>&lt;p&gt;This article focuses on serverless technologies such as AWS Lambda and CloudWatch.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;tl;dr: CloudWatch Insights is great if you can log JSON and only consider the last few weeks, otherwise I suggest asynchronous log analysis with a detached lambda function.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a previous article, I explained how you can use CloudWatch Custom Metrics to &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;monitor an application’s health&lt;/a&gt;. In this article we will look at the serverless scheduler, and use custom metrics to monitor the performance of its most critical component.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://bahr.dev/2019/10/11/serverless-scheduler/"&gt;serverless scheduler&lt;/a&gt; solves the problem of ad hoc scheduling with a serverless approach. This type of scheduling describes irregular point in time invocations, e.g. one in 32 hours and another one in 4 days. While scale is usually not a problem with serverless technologies, keeping the precision high can become a challenge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zgavl3-Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9dwvWJotSP9SEPp5TE-Lzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zgavl3-Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9dwvWJotSP9SEPp5TE-Lzw.png" alt="AdHoc Scheduling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The serverless scheduler accepts payloads with a &lt;code&gt;date&lt;/code&gt; about when they shall be sent back. It uses SQS to prepare events that are up to 15 minutes away from their target date and then publishes them with a &lt;a href="https://aws.amazon.com/lambda/"&gt;lambda&lt;/a&gt; function called &lt;code&gt;emitter&lt;/code&gt;. This function receives the events a second early and waits for the right moment to publish them. The reason for this is that cold starts can add a couple hundred milliseconds of delay.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;emitter&lt;/code&gt; function is also where we know how much delay there is between the expected date and the date that we managed to deliver the payload back. The lower this delay, the better our precision. We track the delay in milliseconds.&lt;/p&gt;
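&lt;p&gt;As a sketch, such a delay measurement could look like the following. The field name &lt;code&gt;date&lt;/code&gt; and the function name are assumptions for illustration; the real scheduler’s payload format may differ.&lt;/p&gt;

```python
from datetime import datetime, timezone

def get_delay(item):
    """Milliseconds between an event's target date and the actual
    publish time. Assumes an ISO 8601 'date' field with timezone
    (a hypothetical payload format for illustration)."""
    target = datetime.fromisoformat(item['date'])
    now = datetime.now(timezone.utc)
    return (now - target).total_seconds() * 1000
```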

&lt;p&gt;In previous tests I used Python’s &lt;a href="https://matplotlib.org/"&gt;matplotlib&lt;/a&gt; to build charts; now we’ll take a look at how CloudWatch can support us here. Bonus: we can register alarms in CloudWatch to notify us when things go south.&lt;/p&gt;

&lt;p&gt;But first, let’s report the performance data performantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  4 Ways You Can Report Metrics
&lt;/h2&gt;

&lt;p&gt;CloudWatch has an API to &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;upload metric data&lt;/a&gt;. We can use this API on (1) every event, (2) per execution of the &lt;code&gt;emitter&lt;/code&gt; function, (3) somewhen later by asynchronously processing the logs, or (4) not at all by using CloudWatch Insights instead.&lt;/p&gt;

&lt;p&gt;Please note that your function requires the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/permissions-reference-cw.html"&gt;permission&lt;/a&gt; &lt;code&gt;cloudwatch:PutMetricData&lt;/code&gt; to upload metric data.&lt;/p&gt;

&lt;h3&gt;
  
  
  On Every Event
&lt;/h3&gt;

&lt;p&gt;Reporting metrics is straightforward. Decide on a namespace to upload your metrics under, choose a metrics name and put in the delay. You can find the full details on the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;AWS Documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    for item in event['Records']:
        publish_event(item)
        delay = get_delay(item)
        put_metrics(delay)

def put_metrics(delay):
    cloudwatch.put_metric_data(
        Namespace='serverless-scheduler',
        MetricData=[
            {
                'MetricName': 'emitter-delay',
                'Value': delay, # e.g. 19
                'Unit': 'Milliseconds',
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is good if your function processes just one event at a time and you don’t hit &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html"&gt;Lambda’s concurrency limits&lt;/a&gt;. In other situations you may quickly notice a significant downside of this approach: we don’t make use of batching. In the worst case we establish a new connection for every event.&lt;/p&gt;

&lt;p&gt;Even if we don’t establish a new connection for each event, the code still waits for the network call to complete. To put the speed of network calls into perspective, have a look at &lt;a href="https://www.prowesscorp.com/computer-latency-at-a-human-scale/"&gt;Computer Latency at a Human Scale&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How about we report metrics just once?&lt;/p&gt;

&lt;h3&gt;
  
  
  Per Function Execution
&lt;/h3&gt;

&lt;p&gt;If our function processes multiple events per execution, we can report the performance metrics in one go. To do this, we first collect, then aggregate, and finally submit all the data to CloudWatch once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    delays = []
    for item in event['Records']:
        publish_event(item)
        delays.append(get_delay(item))

    values, counts = aggregate_delays(delays)
    put_metrics(values, counts)

def put_metrics(values, counts):
    cloudwatch.put_metric_data(
        Namespace='serverless-scheduler',
        MetricData=[
            {
                'MetricName': 'emitter-delay',
                'Values': values,
                'Counts': counts,
                'Unit': 'Milliseconds',
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To ensure that we publish all the events as quickly as possible, we only collect the delay values. After the important work is done, we start doing analytics.&lt;/p&gt;

&lt;p&gt;The batch parameters of the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData API&lt;/a&gt; expect an array of values and a corresponding array of counts, where each item in the counts array describes how often a given value has occurred.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;values[0] = 200
counts[0] = 10

==&amp;gt; The value 200 has occurred 10 times

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To prepare these two arrays, the function &lt;code&gt;aggregate_delays&lt;/code&gt; can be implemented in the following way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def aggregate_delays(delays):
    # group the delays in a map
    delay_map = {}
    for delay in delays:
        if delay not in delay_map:
            delay_map[delay] = 0
        delay_map[delay] += 1

    # break it apart into two arrays
    values = []
    counts = []
    for value, count in delay_map.items():
        values.append(value)
        counts.append(count)

    return values, counts

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this approach still takes up runtime of the &lt;code&gt;emitter&lt;/code&gt;, which could be needed for sending other events instead. The next approach will move the metrics reporting outside of the &lt;code&gt;emitter&lt;/code&gt; function.&lt;/p&gt;

&lt;h3&gt;
  
  
  Asynchronous Log Processing
&lt;/h3&gt;

&lt;p&gt;By detaching the analytics from the &lt;code&gt;emitter&lt;/code&gt;, we can make sure that the core application performs only the most important work. To do this, you can use the &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt; to attach a lambda function to another one’s log stream.&lt;/p&gt;

&lt;p&gt;In the following snippet of the &lt;code&gt;serverless.yml&lt;/code&gt;, we register a function called &lt;code&gt;analyzer&lt;/code&gt; to be invoked when new logs arrive at the log group &lt;code&gt;/aws/lambda/serverless-scheduler-emitter&lt;/code&gt;. We also add a filter so that only those logs make it to the function, where the field &lt;code&gt;log_type&lt;/code&gt; has the value &lt;code&gt;"emit_delay"&lt;/code&gt;. Learn more about the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html"&gt;Filter and Pattern Syntax&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  analyzer:
    handler: my_handler.handle
    events:
      - cloudwatchLog:
          logGroup: '/aws/lambda/serverless-scheduler-emitter'
          filter: '{$.log_type = "emit_delay"}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use these filters, the log event must be in JSON format. We let the emitter output the relevant data as a JSON string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# simplejson can handle decimals better
import simplejson as json

def handle(event, context):
    for item in event['Records']:
        publish_event(item)
        delay = get_delay(item)

        log_event = {
            'log_type': 'emit_delay',
            'delay': delay
        }
        print(json.dumps(log_event))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then implement the lambda function that is listening to the log stream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import gzip
from base64 import b64decode
import simplejson as json

import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    # log events are compressed
    # we have to decompress them first
    log_events = extract_log_events(event)
    delays = []
    for log_event in log_events:
        delays.append(int(log_event['delay']))

    # use the aggregation from the previous example
    # to reduce the number of api calls
    values, counts = aggregate_delays(delays)
    put_metrics(values, counts)

def extract_log_events(event):
    compressed_payload = b64decode(event['awslogs']['data'])
    uncompressed_payload = gzip.decompress(compressed_payload)
    payload = json.loads(uncompressed_payload)
    return payload['logEvents']

# ... functions from previous example ...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every time there are new logs, the analyzer function is invoked and reports the delay metrics.&lt;/p&gt;

&lt;p&gt;But wait, there’s another ad hoc approach which requires even less code.&lt;/p&gt;

&lt;h3&gt;
  
  
  CloudWatch Insights
&lt;/h3&gt;

&lt;p&gt;In the previous section we started logging JSON. These logs can be used by &lt;a href="https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/"&gt;CloudWatch Logs Insights&lt;/a&gt; to generate metrics from logs. All without building and deploying new analyzer functions!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://twitter.com/hashtag/AWSLambda?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#AWSLambda&lt;/a&gt; protip: Thou Shalt Log JSON! (and then you just use cloudwatch insights for searching across all the whole log group easily instead of fucking around with log streams)!&lt;/p&gt;

&lt;p&gt;— Gojko Adzic (@gojkoadzic) &lt;a href="https://twitter.com/gojkoadzic/status/1253246550672801793?ref_src=twsrc%5Etfw"&gt;April 23, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;code&gt;emitter&lt;/code&gt; now prints JSON logs like &lt;code&gt;{"log_type": "emit_delay", "delay": 156}&lt;/code&gt;. To visualise the delays we open &lt;a href="https://console.aws.amazon.com/cloudwatch/home?#logsV2:logs-insights"&gt;CloudWatch Logs Insights&lt;/a&gt; in the AWS console, select the right log group and use the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html"&gt;CloudWatch Logs Query Syntax&lt;/a&gt; to build a query which aggregates the delay data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xBfIkPf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-query.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xBfIkPf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-query.png" alt="Insights Query"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The query &lt;code&gt;stats max(delay) by bin(60s)&lt;/code&gt; builds an aggregate (&lt;code&gt;stats&lt;/code&gt;) of the maximum delay (&lt;code&gt;max(delay)&lt;/code&gt;) for every minute (&lt;code&gt;bin(60s)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;After running the query, we see a logs tab and a visualization tab. Here’s the visualization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EkCVxxlL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-visualization.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EkCVxxlL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-visualization.png" alt="Insight Visualization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a click on “Add to dashboard” we can build a widget out of this metric and add it to one of our existing dashboards. We’ll look more into graphing in the next section.&lt;/p&gt;

&lt;p&gt;Note that this approach is based on CloudWatch logs, where you pay &lt;a href="https://aws.amazon.com/cloudwatch/pricing"&gt;$0.03 per GB&lt;/a&gt; of storage. If you only need metrics for the last few weeks, then CloudWatch Insights with a 14 or 28 day log retention period is okay. Otherwise Custom Metrics are cheaper for long term storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph It
&lt;/h2&gt;

&lt;p&gt;In a &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;recent article&lt;/a&gt; I explained how to turn custom metrics into graphs and how to build a dashboard. This time we take a look at how we can make the most of the delay metrics and spice them up with a reference line.&lt;/p&gt;

&lt;p&gt;Looking at the maximum delay, we can quickly spot the worst case. Percentiles, however, let us understand how bad the situation really is. If your maximum delay is 14 seconds, but it occurred only once, then the situation isn’t too bad. If however the 90th percentile (p90) is at 10 seconds, then a significant number of customers are impacted. The p90 is the value below which 90% of all measurements fall.&lt;/p&gt;
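&lt;p&gt;To make the percentile idea concrete, here is a tiny standalone sketch using the nearest-rank method (CloudWatch may differ in interpolation details):&lt;br&gt;
&lt;/p&gt;

```python
import math

def percentile(data, p):
    # nearest-rank percentile: the value below which roughly p% of samples fall
    ranked = sorted(data)
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

delays = [1, 1, 2, 2, 3, 3, 4, 5, 8, 14]  # delays in seconds
worst = max(delays)            # 14: the single worst case
p90 = percentile(delays, 90)   # 8: closer to what most users experience
```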

&lt;p&gt;To better understand the various percentiles, you can use the following query to plot out the &lt;code&gt;max&lt;/code&gt;, &lt;code&gt;p99&lt;/code&gt;, &lt;code&gt;p95&lt;/code&gt; and &lt;code&gt;p90&lt;/code&gt;. I’ve increased the bin to 10 minutes so that the lines don’t overlap too much.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stats max(delay), percentile(delay, 99), percentile(delay, 95), percentile(delay, 90) by bin(10m) 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The visualization gives us four lines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RFFvtnLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-percentiles.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RFFvtnLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-percentiles.png" alt="Insights Percentiles"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reference Line
&lt;/h3&gt;

&lt;p&gt;If you’re building graphs from custom metrics, you can add a reference line to indicate a threshold. With the serverless scheduler my reference line is 1 second. I have not found out how I can add static values through CloudWatch Insights and will therefore use regular CloudWatch metrics instead. Let me know if you know more!&lt;/p&gt;

&lt;p&gt;Once you’ve selected a metric, you can add a reference line by adding the formula &lt;code&gt;IF(m1, 1000, 0)&lt;/code&gt;. Replace &lt;code&gt;1000&lt;/code&gt; with your reference value. This expression draws a reference line whenever the other data series &lt;code&gt;m1&lt;/code&gt; has a value.&lt;/p&gt;
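&lt;p&gt;If you manage dashboards as code, the same reference line can be expressed as a metric math entry in the dashboard body. This is a sketch with placeholder names (namespace, metric, region), not the exact widget from my dashboard:&lt;br&gt;
&lt;/p&gt;

```python
import json

# a widget with the delay metric (m1) and the reference line (e1);
# namespace, metric name and region are placeholders
widget = {
    "type": "metric",
    "properties": {
        "metrics": [
            ["scheduler", "emit-delay", {"id": "m1"}],
            [{"expression": "IF(m1, 1000, 0)", "label": "1s threshold", "id": "e1"}],
        ],
        "region": "us-east-1",
        "period": 60,
        "stat": "Maximum",
    },
}

def put_dashboard():
    import boto3  # imported lazily; only needed when actually deploying
    boto3.client('cloudwatch').put_dashboard(
        DashboardName='scheduler-delays',
        DashboardBody=json.dumps({"widgets": [widget]}),
    )
```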

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2KDuyLfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-reference-line.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2KDuyLfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-reference-line.jpg" alt="Reference Line"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Alarms
&lt;/h2&gt;

&lt;p&gt;If too many delays are above our reference line, we should investigate whether there’s a new bug or whether increased load is breaking the system. The quickest way to learn about that is to use &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html"&gt;CloudWatch Alarms&lt;/a&gt;. In a &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;previous article&lt;/a&gt; I explained how you can set up alarms that send you an email.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we learned four approaches to generating metric data, each with its own trade-offs. CloudWatch Insights is great if you only need metrics for the last few weeks; otherwise I suggest asynchronous log analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Studying
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=g1wxfYVjCPY"&gt;AWS re:Invent 2018: Introduction to Amazon CloudWatch Logs Insights (DEV375)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html"&gt;CloudWatch Alarms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html"&gt;CloudWatch Logs Insights Query Syntax&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=hF3NM9j-u7I"&gt;Amazon CloudWatch Synthetics&lt;/a&gt; for more complex testing and monitoring&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>metrics</category>
      <category>cloudwatch</category>
    </item>
    <item>
      <title>Monitoring an application's health with CloudWatch Custom Metrics</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 13 Apr 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/monitoring-an-application-s-health-with-cloudwatch-custom-metrics-4mkj</link>
      <guid>https://dev.to/michabahr/monitoring-an-application-s-health-with-cloudwatch-custom-metrics-4mkj</guid>
      <description>&lt;p&gt;Follow me on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt; and &lt;a href="https://twitter.com/michabahr"&gt;twitter&lt;/a&gt; so you are the first to see when I publish new articles!&lt;/p&gt;




&lt;p&gt;For most applications it makes sense to trigger CloudWatch alarms when lambda functions throw errors. Throwing errors on unwanted behavior is a best practice which also allows you to make use of standard metrics and redrive mechanisms. However, some applications have trade-offs between concurrency and blast radius which don’t allow them to rely solely on errors for the health of their application.&lt;/p&gt;

&lt;p&gt;In this article I will show you how I use custom metrics to verify that an application’s core process is healthy. We will also take a look at the operational cost of this solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;The application “Eve Market Watch” lets players of the MMORPG Eve Online define market thresholds for various items. When the available amount drops below that threshold, the user gets a notification so they can restock the market. In the picture below, a threshold of 100,000 would trigger an ingame mail while a threshold of 90,000 would not yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PG7LtciX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-market.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PG7LtciX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-market.png" alt="Market"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core process (market parser) takes all the user defined thresholds, pulls in market data from the game’s API and figures out which items are running low. If the number of processed items drops significantly, then something has happened that I should investigate, be it a market that’s not available anymore or a new bug.&lt;/p&gt;

&lt;p&gt;The application has a trade-off between concurrency and blast radius. The optimal blast radius would be one lambda per user and market, which keeps the application intact for all users while allowing quick isolation of the problematic ones. However, I’m using the free plan of Redis Labs to cache before writing to DynamoDB, and that plan has a limit of 30 connections while the application currently has 370 active users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goal
&lt;/h2&gt;

&lt;p&gt;If the market parser breaks, I want to know about that before my users do. There have been a couple of times when I repeatedly broke the core process without noticing it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0weIgTPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-chat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0weIgTPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-chat.png" alt="Chat complaint"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To achieve this, the market parser shall track the number of items being processed so that an alarm can fire if that number drops significantly.&lt;/p&gt;

&lt;p&gt;Here is where CloudWatch custom metrics and alarms come into play.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Metrics
&lt;/h2&gt;

&lt;p&gt;Custom metrics allow you to collect arbitrary time series data, graph it and trigger actions.&lt;/p&gt;

&lt;p&gt;To collect custom metrics you need at least a namespace, a metric name, a value and a unit. You can find the full details on the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;AWS Documentation&lt;/a&gt;. You may also define dimensions to increase the granularity of your data. The following examples use Python 3.7 with AWS’ &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html"&gt;boto3 client for CloudWatch&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='marketwatch',
    MetricData=[
        {
            'MetricName': 'my-metric-name',
            'Dimensions': [
                {
                    'Name': 'dimension-name',
                    'Value': 'dimension-value'
                }
            ],
            'Value': 123,
            'Unit': 'Count'
        },
    ]
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The namespace is a &lt;code&gt;string&lt;/code&gt; which lets you link multiple metrics to an application or domain. In my example I use &lt;code&gt;marketwatch&lt;/code&gt; as the namespace.&lt;/p&gt;

&lt;p&gt;By setting a good metric name, you can identify your new metric amongst others and understand what data it holds. In my example I use &lt;code&gt;snapshots-built&lt;/code&gt;, as this is the number of items that the market parser was able to get data for.&lt;/p&gt;

&lt;p&gt;As for the metric value I send the number of items that have been processed and use the unit &lt;code&gt;Count&lt;/code&gt;. See the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html"&gt;documentation&lt;/a&gt; for all available units.&lt;/p&gt;

&lt;p&gt;You may increase the metrics’ granularity with up to 10 &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension"&gt;dimensions&lt;/a&gt;. Beware that you can only define CloudWatch alarms on the highest granularity. In my example I add one dimension, which distinguishes between real data that I got from the markets, and zero values which are added when no data is available.&lt;/p&gt;

&lt;p&gt;All things together the function that sends the metrics looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def put_metrics(count, snapshot_type):
    cloudwatch.put_metric_data(
        Namespace='marketwatch',
        MetricData=[
            {
                'MetricName': 'snapshots-built',
                'Dimensions': [
                    {
                        'Name': 'type',
                        'Value': snapshot_type # can be 'real' or 'virtual'
                    }
                ],
                'Value': count,
                'Unit': 'Count'
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I intentionally don’t set a timestamp so that CloudWatch registers the event at the timestamp it is received. “Data points with time stamps from 24 hours ago or longer can take at least 48 hours to become available for GetMetricData or GetMetricStatistics from the time they are submitted.” - &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;API PutMetricData&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Permissions
&lt;/h2&gt;

&lt;p&gt;You have to grant your function the permission to submit metrics. If you’re using the &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt;, you can add the following permission to your &lt;code&gt;serverless.yml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider:
  ...
  iamRoleStatements:
    - Effect: Allow
      Action:
        - cloudwatch:PutMetricData
      Resource: "*"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details check the api documentation for &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;Once the code is deployed and running, we go to &lt;a href="https://console.aws.amazon.com/cloudwatch/home#metricsV2:graph=~()"&gt;CloudWatch Metrics&lt;/a&gt;, look up our metric and verify that our code is collecting data. Once your code has submitted its first metrics, you will see your new namespace under “Custom Namespaces”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A0smWE1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-metrics.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A0smWE1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-metrics.png" alt="Metrics with custom namespaces"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the dashboard, drill down into the right category and explore the available data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;Once you’ve found your data, continue by creating a graph which visualizes it. Select the data series you want to visualize and adjust the “Graphed metrics”. When you see many dots in your graph, you can increase the period so that the dots get merged into a line. You can also report metrics more frequently.&lt;/p&gt;

&lt;p&gt;As the core process of my application runs every 15 minutes, it makes sense to average over a period of 15 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mbS6QQPF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-graphed-metrics.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mbS6QQPF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-graphed-metrics.png" alt="Graphed metrics"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more info about graphing metrics, check out the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph_metrics.html"&gt;CloudWatch documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you’re happy with the data and graphs you’re seeing, head over to the &lt;a href="https://console.aws.amazon.com/cloudwatch/home#dashboards:"&gt;CloudWatch Dashboards&lt;/a&gt; where we will create our own dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;Go to the CloudWatch &lt;a href="https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:"&gt;dashboards&lt;/a&gt; and create a new one. Give it a name and add your first widget, where you recreate the graph from the last section. You will again see the screen from the last section where you select your custom namespace, your metrics and dimensions, and then build a graph with the appropriate settings. Give your graph a name and click on “Create widget”.&lt;/p&gt;

&lt;p&gt;Resize the widget and add more as you need. Here’s how my dashboard looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QmS1CNE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QmS1CNE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-dashboard.png" alt="Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the top-left graph around “04/06”, there is a gap in the data. When my code stops working and doesn’t collect data anymore, an alarm should be triggered.&lt;/p&gt;

&lt;p&gt;There is another drop after “04/08”. This one recovered itself within a reasonable time. I do not need an alarm for that situation, but should still analyze the problem later on.&lt;/p&gt;

&lt;p&gt;Let’s look at creating alarms next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alarms
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#CloudWatchAlarms"&gt;CloudWatch Alarms&lt;/a&gt; trigger an action when a given condition is met, i.e. response times exceeding 1000ms. In our example we want to fire an alarm, when the reported amount of processed items drops significantly for a prolonged time or is not reported at all.&lt;/p&gt;

&lt;p&gt;To create an alarm, head over to the &lt;a href="https://console.aws.amazon.com/cloudwatch/home#alarmsV2:"&gt;alarms section&lt;/a&gt; in CloudWatch and click on “Create alarm”. You will then be asked to select a metric, which you pick and plot as we’ve done in the previous sections. Note that you can only select a single data series; you can’t aggregate across dimensions here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EfrjvktP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-anomaly-band.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EfrjvktP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-anomaly-band.png" alt="Alarm configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the metric selected, you can define conditions. I decided to go with the “Anomaly detection” and picked “Lower than the band” as the threshold type. Play around with the anomaly detection threshold value to see what is best for your data. In the additional configuration I defined that 10 out of 10 datapoints need to breach the band before an alarm gets triggered. This way the app can recover itself in case an external API temporarily fails. I also decided to “Treat missing data as bad (breaching threshold)” as the alarm would otherwise not fire if my code breaks before the metrics are reported.&lt;/p&gt;
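&lt;p&gt;The same alarm can also be created programmatically. The sketch below mirrors the console settings described above; the namespace and metric come from this article, while the alarm name and the SNS topic ARN are placeholders for your own setup:&lt;br&gt;
&lt;/p&gt;

```python
alarm_params = {
    'AlarmName': 'snapshots-built-anomaly',   # placeholder name
    'ComparisonOperator': 'LessThanLowerThreshold',
    'EvaluationPeriods': 10,
    'DatapointsToAlarm': 10,          # 10 out of 10 datapoints must breach
    'TreatMissingData': 'breaching',  # treat missing data as bad
    'ThresholdMetricId': 'ad1',
    'Metrics': [
        {
            'Id': 'm1',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'marketwatch',
                    'MetricName': 'snapshots-built',
                    'Dimensions': [{'Name': 'type', 'Value': 'real'}],
                },
                'Period': 900,  # the parser runs every 15 minutes
                'Stat': 'Average',
            },
            'ReturnData': True,
        },
        {
            'Id': 'ad1',
            # band width of 2 standard deviations; tune for your data
            'Expression': 'ANOMALY_DETECTION_BAND(m1, 2)',
            'Label': 'Expected range',
            'ReturnData': True,
        },
    ],
    'AlarmActions': ['arn:aws:sns:us-east-1:123456789012:my-alarm-topic'],
}

def create_alarm():
    import boto3  # imported lazily; only needed when actually deploying
    boto3.client('cloudwatch').put_metric_alarm(**alarm_params)
```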

&lt;p&gt;In the picture below you see a preview of the anomaly detection against the metrics I’ve collected. We see a few red drops where the anomaly detection triggers, but as we’ve configured the alarm to only fire if 10 out of 10 data points are bad, we only get alarms when the market parser does not recover. If you look closely, you also see regular drops in the gray anomaly band which are caused by the game’s daily downtime. CloudWatch correctly understands that this is a recurring behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UKRU-bKF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-preview.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UKRU-bKF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-preview.png" alt="Anomaly band preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When my alarm fires, I want to receive an email. This is the easiest way to continue, but you may set up custom integrations through SNS topics, e.g. for &lt;a href="https://read.acloud.guru/slack-notification-with-cloudwatch-alarms-lambda-6f2cc77b463a"&gt;Slack&lt;/a&gt;. To send alarms to an email, choose to “Create a new topic”, enter a name for the new topic and enter an email address that will receive the alarm. Click on “Create Topic” below the email input and then click on next to continue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ucKAS5O4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-notification.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ucKAS5O4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-notification.png" alt="Creating a notification"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally give your alarm a name and finish the setup. To test your alarm you can update the trigger conditions or report metrics that will trigger the alarm. Make sure to check that you get an email as expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;The operational cost of using CloudWatch Custom Metrics and Alarms consists of two parts: ingestion and monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingestion
&lt;/h3&gt;

&lt;p&gt;Each custom metric that you submit data for costs $0.30 per month. Custom metrics are not covered by the &lt;a href="https://aws.amazon.com/cloudwatch/pricing/"&gt;free tier&lt;/a&gt;. “All custom metrics charges are prorated by the hour and metered only when you send metrics to CloudWatch.”&lt;/p&gt;

&lt;p&gt;You also pay for the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData&lt;/a&gt; calls. The first one million API requests are covered by the free tier; beyond that, they cost $10 per million requests. My application reports two metrics every 15 minutes, which is a total of 5,760 API requests per month.&lt;/p&gt;
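&lt;p&gt;That figure is quick to verify: two metrics, four runs per hour, 24 hours, 30 days.&lt;br&gt;
&lt;/p&gt;

```python
metrics_per_run = 2        # the app reports two metrics per run
runs_per_hour = 60 // 15   # the core process runs every 15 minutes
requests_per_month = metrics_per_run * runs_per_hour * 24 * 30
# requests_per_month == 5760, well within the one-million free tier
```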

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;p&gt;We’ve set up one dashboard and one alarm. Each dashboard costs $3 per month, but the free tier covers three dashboards for up to 50 metrics per month. Each anomaly detection alarm costs $0.30 at standard resolution as it is made up of three alarms: “one for the evaluated metric, and two for the upper and lower bound of expected behavior”. If you select high resolution, which is 10 seconds instead of 60 seconds, you pay three times as much. As we’re only reporting data every 15 minutes, high resolution doesn’t make sense. The free tier covers up to 10 alarm metrics (not applicable to high-resolution alarms).&lt;/p&gt;

&lt;h3&gt;
  
  
  Total
&lt;/h3&gt;

&lt;p&gt;For my application I expect to pay a total of $0.30 per month. Without the free tier I would still expect less than $5.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We saw how applications can collect custom metrics and how we can use CloudWatch to trigger alarms based on those metrics. I think the price is very fair, as small hobby projects with a few custom metrics can get away with a low price while medium sized enterprise software can remain under $100 per month.&lt;/p&gt;

&lt;p&gt;If you would like to define alarms as code, have a look at &lt;a href="https://github.com/awslabs/realworld-serverless-application/blob/master/ops/sam/app/alarm.template.yaml"&gt;this example&lt;/a&gt;. For all users of the serverless framework, &lt;a href="https://serverless.com/blog/serverless-ops-metrics/"&gt;this article&lt;/a&gt; explains how to add alerts.&lt;/p&gt;




&lt;p&gt;Did you like this article or do you know where it could be improved? Let me know on &lt;a href="https://twitter.com/michabahr"&gt;twitter&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>python</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Stock Sentiment Analysis - Part 2: Analysing the sentiment</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Fri, 27 Mar 2020 12:40:56 +0000</pubDate>
      <link>https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig</link>
      <guid>https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig</guid>
      <description>&lt;p&gt;In this two part article I will show you how to build an app, that collects people's opinions about companies and how to turn that into sentiments. Disclaimer: Trade at your own risk!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl"&gt;Part 1&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Only deploy this if you have set up budget alarms and understand the spending potential of this solution! Check the section Cost Analysis for more details.&lt;/p&gt;




&lt;p&gt;We left off with collecting raw tweets in our DynamoDB table. Now it's time to understand if the tweets about a given company are rather positive or negative. We will add another Lambda which is invoked when we find new tweets. This Lambda then asks AWS Comprehend for the tweet's sentiment, which is a score of how positive, neutral or negative the text was.&lt;/p&gt;

&lt;p&gt;Here's an example tweet I collected. Do you think the text is rather negative or positive?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I wish I were as positive about anything in life as Ross is about $TSLA.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Streams
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt; has a simple way to attach a lambda to a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html"&gt;DynamoDB stream&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  tweetAnalyzer:
    handler: tweetAnalyzer.handle
    events:
      - stream:
          arn: arn:aws:dynamodb:REGION:ACCOUNT_ID:table/Tweets/stream/DATE
          batchSize: 25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this snippet of code the function &lt;code&gt;tweetAnalyzer&lt;/code&gt; will be invoked when we add new tweets to the table. The &lt;code&gt;batchSize&lt;/code&gt; of 25 allows us to process multiple tweets at once. I chose 25 because that's the maximum amount of texts we can pass to AWS Comprehend per request (&lt;a href="https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectSentiment.html"&gt;BatchSizeLimitExceededException&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Our function is invoked when an entry in our DynamoDB table changes. As I've defined the stream to only contain the key of the entry, we first have to load the entry, then send it to AWS Comprehend and finally update the entry in our table. You can find the source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The event passed from the stream to our function contains a list of &lt;code&gt;Records&lt;/code&gt; which each hold the entry's key in &lt;code&gt;item['dynamodb']['Keys']['id']['N']&lt;/code&gt;. If your key is not a number, you have to adjust the &lt;code&gt;['id']['N']&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
table = boto3.resource('dynamodb').Table('Tweets')

tweets = []
for item in event.get('Records', []):
    item_id = int(item['dynamodb']['Keys']['id']['N'])
    tweet = table.get_item(Key={'id': item_id}).get('Item', None)
    if 'sentiment_score' not in tweet:
        tweets.append(tweet)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop loads all the entries which don't have a &lt;code&gt;sentiment_score&lt;/code&gt; yet. We need to filter because the events will fire when an entry in the table &lt;em&gt;changes&lt;/em&gt;, which is also the case when we add the sentiment score. Let's continue with adding that score.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sentiments
&lt;/h2&gt;

&lt;p&gt;The Comprehend API expects a list of texts. As we saved some more meta information along with the tweets' text, we have to extract the raw text first. Don't change the tweets' order here, as AWS Comprehend returns each result with its &lt;code&gt;Index&lt;/code&gt; from the input, which we will use to map the results back to the original tweets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_list = list(map(lambda tweet: tweet['text'], tweets))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next step we send the texts to Comprehend. Note that the &lt;code&gt;text_list&lt;/code&gt; must not have more than &lt;a href="https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectSentiment.html"&gt;25 entries&lt;/a&gt;. You must also specify a language. As we've previously told the Twitter API to only return English tweets (by best effort), we will use &lt;code&gt;en&lt;/code&gt; here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
comprehend = boto3.client('comprehend')

comprehend_response = comprehend.batch_detect_sentiment(
    TextList=text_list,
    LanguageCode='en'
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the results contains an &lt;code&gt;Index&lt;/code&gt; which is the index of the item in the &lt;code&gt;text_list&lt;/code&gt;. We use that information to map the result back to our DynamoDB entries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from decimal import Decimal

for entry in comprehend_response.get('ResultList', []):
    tweet = tweets[entry['Index']]
    tweet['sentiment_score'] = json.loads(json.dumps(entry['SentimentScore']), parse_float=Decimal)
    table.put_item(Item=tweet)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that because DynamoDB doesn't like float parameters, we have to convert the floats to Decimals.&lt;/p&gt;
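&lt;p&gt;The conversion used in the loop above can be isolated into a small helper:&lt;/p&gt;

```python
import json
from decimal import Decimal

def to_dynamodb_numbers(sentiment_score):
    """Round-trip through JSON so every float becomes a Decimal,
    which is what boto3's DynamoDB resource expects for numbers."""
    return json.loads(json.dumps(sentiment_score), parse_float=Decimal)
```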

&lt;p&gt;Let's see how Comprehend rated our example from above. Do you think that Comprehend caught the sarcasm?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "neutral": 0.07,
    "positive": 0.88,
    "negative": 0.03,
    "mixed": 0.00
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the score may be wrong for a couple of tweets, aggregating over thousands or even millions of tweets will average out the wrong ones.&lt;/p&gt;

&lt;p&gt;The app now collects sentiment insights for you. Don't stop reading here, as this might become expensive!&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;Each tweet will require one sentiment analysis and one DynamoDB write operation. Again the time span is a whole year of operation.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/"&gt;on-demand pricing&lt;/a&gt; DynamoDB charges $1.25 per million write request units and $0.25 per million read request units. With the worst case of 100 new tweets every 10 minutes, we're looking at a total of 100x6x24x365 = 5,256,000 WCUs or $6.57 per year.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/comprehend/pricing/"&gt;cost of AWS Comprehend&lt;/a&gt; is a bit more intense. &lt;/p&gt;

&lt;p&gt;Tweets vary in length, but for the worst case we assume that each tweet uses up the full 280 characters. If we collect 100 tweets every 10 minutes, we are going to collect 100x6x24x365 = 5,256,000 tweets per year. Over the last year, however, I only collected 1,500,000 tweets.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Amazon Comprehend requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request." &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sentiment analysis is priced at $0.0001 per unit. The price to analyze 5,256,000 tweets with 280 characters or 3 units each is 5,256,000 x $0.0001 x 3 = &lt;strong&gt;$1,576.8&lt;/strong&gt;. This is expensive for a hobby project, so please make sure you tag your resources appropriately and think twice before you run this as a hobby project.&lt;/p&gt;
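&lt;p&gt;In code, the pricing rule works out like this (prices as quoted above; the helper names are mine):&lt;/p&gt;

```python
import math

PRICE_PER_UNIT = 0.0001  # USD per unit for sentiment analysis

def comprehend_units(text):
    """A unit is 100 characters; every request is billed at least 3 units."""
    return max(3, math.ceil(len(text) / 100))

def yearly_sentiment_cost(tweets_per_year, chars_per_tweet=280):
    # Worst case: every tweet uses the full character budget.
    return tweets_per_year * comprehend_units("x" * chars_per_tweet) * PRICE_PER_UNIT
```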

&lt;p&gt;There is a free tier for Comprehend, which for sentiment analysis will "cover only the main analysis [...]. But after you analyze the text, the system automatically calls different APIs [...]. These automatic calls [...] are not covered by the Free Tier [...]" (Source: AWS Support). Because of this I started seeing Comprehend charges before the full 50k free units were used up. The Comprehend team has already received this feedback so they can improve the website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further development
&lt;/h2&gt;

&lt;p&gt;If you want to take this approach and push it further, I suggest building a container/VM based solution to pull tweets from Twitter. &lt;a href="https://aws.amazon.com/fargate/"&gt;AWS Fargate&lt;/a&gt; lets you run containers without managing the servers underneath.&lt;/p&gt;

&lt;p&gt;Know any other sentiment APIs? How do they compare in pricing? Try to attach one of them instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Analyzing the sentiment can be achieved with one function, but it only scales if your workload is small or your credit card big. Please make sure you have budget alarms set up if you want to deploy this yourself; this is not cheap!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>serverless</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Stock Sentiment Analysis - Part 1: Collecting opinions</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Fri, 27 Mar 2020 12:40:03 +0000</pubDate>
      <link>https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl</link>
      <guid>https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl</guid>
      <description>&lt;p&gt;In this two-part article I will show you how to build an app that collects people's opinions about companies and how to turn that into sentiments. Disclaimer: Trade at your own risk!&lt;/p&gt;

&lt;p&gt;As for technologies, my current go-to stack is AWS serverless tech with deployment through the Serverless Framework. This article assumes that you are familiar with both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Collecting opinions
&lt;/h2&gt;

&lt;p&gt;Many platforms have APIs that let us collect opinions. The prime example is probably Twitter where everyone screams into the forest. We will start by setting up an app and collecting the raw data.&lt;/p&gt;

&lt;p&gt;In order to collect data from Twitter, you have to create &lt;a href="https://developer.twitter.com/apps" rel="noopener noreferrer"&gt;a developer app&lt;/a&gt; and generate &lt;a href="https://developer.twitter.com/en/docs/basics/authentication/oauth-1-0a" rel="noopener noreferrer"&gt;oauth1 keys&lt;/a&gt;. You can do all of that through the browser. Store the details in the &lt;code&gt;config.&amp;lt;STAGE&amp;gt;.json&lt;/code&gt; file. The value for &lt;code&gt;&amp;lt;STAGE&amp;gt;&lt;/code&gt; is either &lt;code&gt;dev&lt;/code&gt; or whatever you provided with the &lt;code&gt;--stage&lt;/code&gt; parameter.&lt;/p&gt;
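&lt;p&gt;For illustration, a &lt;code&gt;config.dev.json&lt;/code&gt; could look like the following sketch. The key names here are an assumption mirroring the environment variables used later; the authoritative list is in the repository's readme.&lt;/p&gt;

```json
{
  "CONSUMER_KEY": "your-consumer-key",
  "CONSUMER_SECRET": "your-consumer-secret",
  "ACCESS_TOKEN": "your-access-token",
  "ACCESS_TOKEN_SECRET": "your-access-token-secret"
}
```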

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fekekl9klmpkiqt4ibcgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fekekl9klmpkiqt4ibcgh.png" alt="Twitter Access Keys"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next we set up a serverless app. With serverless technologies like AWS Lambda we don't need to worry about creating and maintaining servers, and it's also super cheap. You can find the source code &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt; or learn how to create a serverless app at &lt;a href="https://serverless.com" rel="noopener noreferrer"&gt;serverless.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our first function will run at a regular interval. I chose 10 minutes because that was usually enough time for 10-50 new tweets to appear. Learn more about the scheduling options by visiting the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html" rel="noopener noreferrer"&gt;CloudWatch docs&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  tweetCollector:
    handler: tweetCollector.handle
    events:
      - schedule: rate(10 minutes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function needs to authenticate with Twitter, load new tweets and then store these in our table for later processing.&lt;/p&gt;

&lt;p&gt;The first step is pretty simple with &lt;a href="https://tweepy.org" rel="noopener noreferrer"&gt;Tweepy&lt;/a&gt;. In the following snippet we load the oauth1 keys from environment variables and then authenticate. Our script will abort if the authentication fails. You can find the full source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tweepy

auth = tweepy.OAuthHandler(os.environ['CONSUMER_KEY'], os.environ['CONSUMER_SECRET'])
auth.set_access_token(os.environ['ACCESS_TOKEN'], os.environ['ACCESS_TOKEN_SECRET'])

api = tweepy.API(auth)
if not api:
    print("Can't Authenticate")
    return
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I'm running this on AWS Lambda, search queries are a better fit than the Twitter streaming API. The following search query allows us to define a search term, how many tweets we want to load per query as well as a max_id and since_id for pagination.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_tweets = api.search(q=search_query, lang='en', count=tweets_per_query, max_id=str(max_id - 1), since_id=since_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
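&lt;p&gt;The &lt;code&gt;max_id&lt;/code&gt;/&lt;code&gt;since_id&lt;/code&gt; bookkeeping can be sketched as follows, with a generic &lt;code&gt;fetch&lt;/code&gt; callable standing in for &lt;code&gt;api.search&lt;/code&gt;. This is an illustration of the pagination idea, not the project's actual loop:&lt;/p&gt;

```python
def collect_pages(fetch, since_id=None, tweets_per_query=100):
    """Page backwards through search results until a page comes back empty.
    `fetch` stands in for api.search and must accept count, max_id and
    since_id keyword arguments and return dicts with an 'id' field."""
    collected = []
    max_id = None
    while True:
        page = fetch(count=tweets_per_query, max_id=max_id, since_id=since_id)
        if not page:
            return collected
        collected.extend(page)
        # Next page: only tweets strictly older than the oldest one seen.
        max_id = min(tweet['id'] for tweet in page) - 1
```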



&lt;p&gt;To store the data I'm using a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.CreateTables.html" rel="noopener noreferrer"&gt;DynamoDB table&lt;/a&gt;. As this query may return tweets that we already have, we check if a tweet already exists before writing it. We could just overwrite the tweets, but this would lead to higher DynamoDB spending. As a rule of thumb, &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/" rel="noopener noreferrer"&gt;a read costs&lt;/a&gt; 1/5th of a write operation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for tweet in new_tweets:
    existing_tweet = table.get_item(Key={'id': tweet._json['id']}).get('Item', None)
    if existing_tweet is None:
        table.put_item(
            Item={
                    'id': tweet._json['id'],
                    'created_at': tweet._json['created_at'],
                    'text': tweet._json['text'],
                    'query': search_query
                }
            )
        count += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you deploy, make sure to fill in the configuration. The config is documented in the &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;readme&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally deploy the app with &lt;code&gt;sls deploy&lt;/code&gt; and let the tweet collection begin. Keeping in mind that the schedule only fires every 10 minutes, check the logs for errors if no tweets arrive. The most likely errors are missing AWS permissions or bad Twitter keys. The first new tweet in our table shows that our collector is up and running!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;START RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919 Version: $LATEST
Downloaded 100 tweets
Saved tweets: 17
END RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919
REPORT RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919  Duration: 1121.61 ms    Billed Duration: 1200 ms    Memory Size: 1024 MB    Max Memory Used: 98 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;To understand the &lt;strong&gt;yearly&lt;/strong&gt; cost of this stack, we will look at two parts: The Lambda function and the DynamoDB table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda
&lt;/h3&gt;

&lt;p&gt;The collector function runs once every 10 minutes. This results in 6x24x365 = 52,560 invocations per year. &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;Lambda charges&lt;/a&gt; $0.20 per 1M requests, so we're looking at about $0.01 per year. Additionally Lambda charges $0.0000166667 for every GB-second. A GB-second is a Lambda with 1024MB RAM running for one second. When our function runs for 2 seconds every 10 minutes, it will use 2x6x24x365 = 105,120 GB-seconds. That's another $1.75 per year.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/free/" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; is likely to cover all of that.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/" rel="noopener noreferrer"&gt;on-demand pricing&lt;/a&gt; DynamoDB charges $1.25 per million write request units and $0.25 per million read request units. This means that the worst case for spending is all new tweets and none that we can skip. With 100 new tweets every 10 minutes, we're looking at a total of 100x6x24x365 = 5,256,000 WCUs or $6.57 per year.&lt;/p&gt;

&lt;p&gt;If you exceed the free 25GB per month, then DynamoDB will charge you $0.25 for every additional GB. My table with 1.5m tweets weighs ~270MB.&lt;/p&gt;

&lt;p&gt;You can additionally lower the cost by switching to provisioned mode, where DynamoDB offers &lt;a href="https://aws.amazon.com/dynamodb/pricing/provisioned/" rel="noopener noreferrer"&gt;25 WCUs and 25 RCUs for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Total
&lt;/h3&gt;

&lt;p&gt;In total that's roughly $0.01 + $1.75 + $6.57 = $8.33 per year.&lt;/p&gt;

&lt;p&gt;Note that CloudWatch will charge you too, should you exceed the free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Collecting data is fairly simple and very cheap. But will it be the same if we monitor 100 companies? Feel free to test that by using the source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig"&gt;part 2&lt;/a&gt; of this article we will use sentiment analysis to understand if a tweet is positive, neutral or negative.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>python</category>
    </item>
    <item>
      <title>How to analyse and aggregate data from DynamoDB </title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Sun, 02 Feb 2020 20:56:24 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-analyse-and-aggregate-data-from-dynamodb-24p3</link>
      <guid>https://dev.to/michabahr/how-to-analyse-and-aggregate-data-from-dynamodb-24p3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DynamoDB is not a database designed to run analysis queries with. We can, however, use DynamoDB streams and Lambda functions to run these analyses each time the data changes.&lt;/p&gt;

&lt;p&gt;This article explains how to build an analysis pipeline and demonstrates it with two examples. You should be familiar with DynamoDB tables and AWS Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pipeline setup
&lt;/h2&gt;

&lt;p&gt;Assuming we already have a DynamoDB table, there are two more parts we need to set up: A DynamoDB stream and a Lambda function. The stream emits changes such as inserts, updates and deletes.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Stream
&lt;/h3&gt;

&lt;p&gt;To set up the DynamoDB stream, we'll go through the AWS management console. Open the settings of your table and click the button called "Manage Stream". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---KUiux0R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewrxc6sikqz79ehlcx0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---KUiux0R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewrxc6sikqz79ehlcx0f.png" alt="Stream details"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default you can go with "New and old images", which will give you the most data to work with. Once you've enabled the stream, you can copy its ARN, which we will use in the next step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attach a Lambda Function
&lt;/h3&gt;

&lt;p&gt;When you work with the &lt;a href="https://serverless.com/framework/docs/providers/aws/events/streams/"&gt;serverless framework&lt;/a&gt;, you can simply set the stream as an event source for your function by adding the ARN as a &lt;code&gt;stream&lt;/code&gt; in the &lt;code&gt;events&lt;/code&gt; section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  analysis:
    handler: analysis.handle
    events:
      - stream: arn:aws:dynamodb:us-east-1:xxxxxxx:table/my-table/stream/2020-02-02T20:20:02.002
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy the changes with &lt;code&gt;sls deploy&lt;/code&gt; and your function is ready to process the incoming events. It's a good idea to start by just printing the data from DynamoDB and then building your function around that input.&lt;/p&gt;
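&lt;p&gt;A minimal first iteration of that idea could look like this. The handler below is an illustrative sketch; the return value is just there to make the function easy to inspect:&lt;/p&gt;

```python
def handle(event, context):
    """First iteration: only log what the stream delivers, then build
    the real analysis around the printed structure."""
    records = event.get('Records', [])
    for record in records:
        print(record.get('eventName'), record.get('dynamodb', {}).get('Keys'))
    return {'processed': len(records)}
```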

&lt;h3&gt;
  
  
  Data Design
&lt;/h3&gt;

&lt;p&gt;With DynamoDB it's super important to think about your data access patterns first, or you'll have to rebuild your tables many more times than necessary. Also watch Rick Houlihan's fantastic talks on design patterns for DynamoDB from &lt;a href="https://www.youtube.com/watch?v=HaEPXoXVf2k"&gt;re:Invent 2018&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=6yqfmXiZTlM"&gt;re:Invent 2019&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 1: Price calculation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MG99-wYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://web.ccpgamescdn.com/newssystem/media/70713/1/ACSENSION_HEADER_B.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MG99-wYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://web.ccpgamescdn.com/newssystem/media/70713/1/ACSENSION_HEADER_B.jpg" alt="EVE Online"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In EVE Online's player-driven economy, items can be traded through contracts. A hobby project of mine uses &lt;a href="https://esi.evetech.net/"&gt;EVE Online's API&lt;/a&gt; to get information about item exchange contracts in order to calculate prices for these items. It collected more than 1.5 million contracts over the last year and derived prices for roughly 7000 items.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-Processing
&lt;/h3&gt;

&lt;p&gt;To build an average price, we need more than one price point. For this reason the single contract we receive is not enough; we need all the price points for an item.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  {
    "contract_id": 152838252,
    "date_issued": "2020-01-05T20:47:40Z",
    "issuer_id": 1273892852,
    "price": 69000000,
    "location_id": 60003760,
    "contract_items": [2047]
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the API's response is not in an optimal format, we have to do some pre-processing to eliminate unnecessary information and put key information into the table's primary and sort keys. Remember that table scans can get expensive and smaller entries mean more records per query result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|type_id (pk)|date (sk)           |price   |location|
|2047        |2020-01-05T20:47:40Z|69000000|60003760|

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case I decided to use the item's ID (e.g. 2047) as the primary key and the date as the sort key. That way my analyser can pick all the records for one item and limit them to the most recent entries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;The attached Lambda function receives an event from the stream. This event contains, among other things, the item's ID for which it should calculate a new price. Using this ID the function queries the pre-processed data and receives a list of items from which it can calculate averages and other valuable information.&lt;/p&gt;

&lt;p&gt;Attention: Don't do a scan here! It will get expensive quickly. Design your data so that you can use queries.&lt;/p&gt;

&lt;p&gt;The aggregated result is persisted in another table from which we can source a pricing API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;Without further adjustment the analysis function will get linearly slower the more price points are available. It can however limit the number of price points it loads. By scanning the date in the sort key backwards we load only the latest, most relevant entries. Based on our requirements we can then decide to load only one or two pages, or opt for the most recent 1000 entries. This way we can enforce an upper bound on the runtime per item.&lt;/p&gt;
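&lt;p&gt;As a sketch, the bounded price derivation boils down to a pure function; the newest-first ordering would come from querying the date sort key backwards, e.g. with &lt;code&gt;ScanIndexForward=False&lt;/code&gt;. The function name and shape are mine, not the project's actual code:&lt;/p&gt;

```python
def derive_price(price_points, limit=1000):
    """Average only the most recent price points. `price_points` is
    expected newest-first, as returned by a backwards query on the
    date sort key. Returns None when there is nothing to average."""
    recent = price_points[:limit]
    if not recent:
        return None
    return sum(point['price'] for point in recent) / len(recent)
```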

&lt;h2&gt;
  
  
  Example 2: Leaderboard
&lt;/h2&gt;

&lt;p&gt;Another example based on a &lt;a href="https://twitter.com/dm_macs/status/1223925884152950784"&gt;twitter discussion&lt;/a&gt; is about a leaderboard. In the German soccer league Bundesliga the club from Cologne won 4:0 against the club from Freiburg today. This means that Cologne gets three points while Freiburg gets zero. Loading all the matches and then calculating the ranking on the fly will lead to bad performance once we get deeper into the season. That's why we should again use streams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SOjvo-35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3cmosdm5iz0knxk44atx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SOjvo-35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3cmosdm5iz0knxk44atx.png" alt="Analysis Pipeline"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Design
&lt;/h3&gt;

&lt;p&gt;We will assume that our first table holds raw data in the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|match_id (sk)|first_party|second_party|score|
|Bundesliga |1            |Cologne    |Freiburg    |4:0  |
|Bundesliga |2            |Hoffenheim |Bayer       |2:1  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We design the leaderboard table in a format where we can store multiple leagues and paginate over the participants. We're going with a composite sort key, as we want the database to sort the leaderboard first by score, then by the number of goals scored and finally by the club's name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|score#goals#name (sk)|score|goals_shot|name (GSI)|
|Bundesliga |003#004#Cologne      |3    |4         |Cologne   |
|Bundesliga |003#002#Hoffenheim   |3    |2         |Hoffenheim|
|Bundesliga |000#001#Bayer        |0    |1         |Bayer     |
|Bundesliga |000#000#Freiburg     |0    |0         |Freiburg  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the sort key (&lt;code&gt;sk&lt;/code&gt;) is a string, we have to zero-pad the numbers. Sorting strings that contain numbers won't give the same result as sorting plain numbers. Choose the padding wisely and opt for a couple of orders of magnitude more than you expect the score to reach. Note that this approach won't work well if your scores can grow indefinitely. If you have a solution to that, please share it and I'll reference you here!&lt;/p&gt;
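&lt;p&gt;A small helper for building such keys, using the 3-digit padding from the example table:&lt;/p&gt;

```python
def leaderboard_sort_key(score, goals, name, width=3):
    """Zero-pad the numeric parts so that lexicographic string order
    matches numeric order. Without padding, '9' would sort after '10'."""
    return f"{score:0{width}d}#{goals:0{width}d}#{name}"
```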

&lt;p&gt;We're also adding a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html"&gt;GSI&lt;/a&gt; on the club's name to have better access to a single club's leaderboard entry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;Each time a match result is inserted into the first table, the stream will fire an event for the analysis function. This entry contains the match and its score, from which we can derive who gets how many points.&lt;/p&gt;

&lt;p&gt;Based on the clubs' names, we can load the old leaderboard entries. We use these entries to first delete the existing records, then take the existing scores and goals, add the new ones and write the new leaderboard records.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|score#goals#name (sk)    |score|goals_shot|name (GSI)|
|Bundesliga |006#005#Cologne          |6    |5         |Cologne   |
|Bundesliga |003#005#Bayer            |3    |5         |Bayer     |
|Bundesliga |003#002#Hoffenheim       |3    |2         |Hoffenheim|
|Bundesliga |000#000#Freiburg         |0    |0         |Freiburg  |
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
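&lt;p&gt;The read-delete-rewrite cycle can be sketched in memory like this, with a plain dict standing in for the leaderboard table and Bundesliga scoring assumed (3 points for a win, 1 each for a draw). This is an illustration of the update logic, not the actual table code:&lt;/p&gt;

```python
def apply_match(leaderboard, home, away, home_goals, away_goals):
    """Look up both clubs, drop their old records and write updated ones.
    `leaderboard` maps club name to {'score': ..., 'goals_shot': ...}."""
    if home_goals == away_goals:
        points = {home: 1, away: 1}
    elif home_goals > away_goals:
        points = {home: 3, away: 0}
    else:
        points = {home: 0, away: 3}
    for club, goals in ((home, home_goals), (away, away_goals)):
        # Delete the old record (pop), then write the updated one.
        old = leaderboard.pop(club, {'score': 0, 'goals_shot': 0})
        leaderboard[club] = {
            'score': old['score'] + points[club],
            'goals_shot': old['goals_shot'] + goals,
        }
    return leaderboard
```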



&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;As each match results in one or two queries and one or two updates, the time to update the score stays limited.&lt;/p&gt;

&lt;p&gt;When we display the leaderboard it is a good idea to use pagination. That way the user sees an appropriate amount of data and our requests have a limited runtime as well.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
