<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Bahr</title>
    <description>The latest articles on DEV Community by Michael Bahr (@michabahr).</description>
    <link>https://dev.to/michabahr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F262125%2F27393e45-5540-45fe-94c0-61daf0753dfc.jpg</url>
      <title>DEV Community: Michael Bahr</title>
      <link>https://dev.to/michabahr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michabahr"/>
    <language>en</language>
    <item>
      <title>How to Defend Against AWS Surprise Bills</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 14 Jan 2021 13:48:22 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-defend-against-aws-surprise-bills-c2a</link>
      <guid>https://dev.to/michabahr/how-to-defend-against-aws-surprise-bills-c2a</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;Subscribe to get new articles&lt;/a&gt; straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Short on time?&lt;/strong&gt; Set up Budget Alerts in less than 2 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Got a surprise bill?&lt;/strong&gt; Here’s how you can contact AWS Support.&lt;/p&gt;

&lt;p&gt;Imagine you’ve been running a hobby project in the cloud for the last 6 months. Every month you paid 20 cents. Not enough to really care about. However, one morning you notice a surprisingly large transaction of $2700.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good morning, $2700 AWS bill!  &lt;/p&gt;

&lt;p&gt;Holy shit...&lt;/p&gt;

&lt;p&gt;— Chris Short @ KubeCon (&lt;a class="mentioned-user" href="https://dev.to/chrisshort"&gt;@chrisshort&lt;/a&gt;) &lt;a href="https://twitter.com/ChrisShort/status/1279406322837082114?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;July 4, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cloud computing allows us to pay for storage, compute and other services as we use them. Instead of going to a computer shop and buying a server rack, we can use services and get a bill at the end of the month. The downside, however, is that we can use more than we have money for. This can be especially tricky with serverless solutions, which automatically scale up with incoming traffic.&lt;/p&gt;

&lt;p&gt;Accidentally leaving an expensive VM running, or having your Lambda functions spiral out of control, may lead to a dreaded surprise bill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://t.co/tAqUqCoV9R" rel="noopener noreferrer"&gt;pic.twitter.com/tAqUqCoV9R&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;— Fernando (@fmc_sea) &lt;a href="https://twitter.com/fmc_sea/status/1328510918855073793?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;November 17, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article we’ll take a look at how billing works, and what you can do to prevent surprise bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Focus On Small Bills
&lt;/h2&gt;

&lt;p&gt;This article &lt;strong&gt;focuses on personal or small company accounts&lt;/strong&gt; with relatively small bills. While a $3000 spike in cost might not be noticeable in a large corporate bill, it can be devastating for a personal account that you run hobby projects on.&lt;/p&gt;

&lt;h2&gt;
  
  
  There’s No Perfect Solution
&lt;/h2&gt;

&lt;p&gt;Unfortunately there’s no perfect solution to prevent surprise bills. As &lt;a href="https://www.lastweekinaws.com/podcast/aws-morning-brief/whiteboard-confessional-the-curious-case-of-the-9000-aws-bill-increase/" rel="noopener noreferrer"&gt;Corey Quinn explains on his podcast&lt;/a&gt;, the AWS billing system can take a couple of hours to receive all data, in some cases up to 24 or 48 hours. As a result, Budget Alerts might trigger hours or days after significant spending has happened. Budget Alerts are still a great tool for catching charges that take more than a day or two to accrue, e.g. an expensive EC2 instance that you forgot to stop after a machine learning workshop.&lt;/p&gt;

&lt;p&gt;It’s up to you how much time you want to invest to reduce the risk of surprise bills, but I highly recommend that you &lt;strong&gt;take 2 minutes to set up Budget Alerts&lt;/strong&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Defense Mechanisms
&lt;/h2&gt;

&lt;p&gt;There are multiple mechanisms that you can apply to defend against surprise bills. The ones we look into include security, alerting, remediation actions and improved visibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Secure Your Account With Multi Factor Authentication
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;first thing you should set up&lt;/strong&gt; when creating a new AWS account.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hey! You! 👋 Do you own an AWS account?  &lt;/p&gt;

&lt;p&gt;🚨STOP SCROLLING AND CHECK THIS NOW- is MFA enabled on your root account?   &lt;/p&gt;

&lt;p&gt;Yes? Cool, carry on 🙋🏻‍♂️  &lt;/p&gt;

&lt;p&gt;No? ENABLE IT NOW! PLEASE! 🙏🏽  &lt;/p&gt;

&lt;p&gt;This reminder brought to you by an SA who had two customers with theirs account compromised in a week 🙈&lt;/p&gt;

&lt;p&gt;— Karan (@somecloudguy) &lt;a href="https://twitter.com/somecloudguy/status/1331288928096309249?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;November 24, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa_enable_virtual.html" rel="noopener noreferrer"&gt;Follow this official guide&lt;/a&gt; from AWS to set up multi-factor authentication (MFA) for your account. Activating MFA adds another barrier for malicious attackers.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Budget Alerts
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;second thing you should set up&lt;/strong&gt; when creating a new AWS account.&lt;/p&gt;

&lt;p&gt;Budget Alerts are the most popular way to keep an eye on your spending. By creating a budget alert, you will get a notification, e.g. via e-mail, which tells you that a threshold has been exceeded. You can further customize notifications through &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/budgets-sns-policy.html" rel="noopener noreferrer"&gt;Amazon SNS&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/sns-alert-chime.html" rel="noopener noreferrer"&gt;AWS Chatbot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a &lt;a href="https://youtu.be/_YXSDIPFhTI" rel="noopener noreferrer"&gt;short video&lt;/a&gt; (52 seconds) that you can follow to create your first Budget Alert. &lt;a href="https://www.youtube.com/watch?v=MKNtSOQXFrY" rel="noopener noreferrer"&gt;Ryan H Lewis made a longer video&lt;/a&gt; with some more context around Budget Alerts, and the many ways you can configure them.&lt;/p&gt;

&lt;p&gt;If you’re already using the CDK then the package &lt;a href="https://awscdk.io/packages/@stefanfreitag/aws-budget-notifier@0.1.5/#/" rel="noopener noreferrer"&gt;aws-budget-notifier&lt;/a&gt; gets you started quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What amount should you start with?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with an amount that’s a bit above your current spending and that you’re comfortable with. If you’re just starting out, $10 is probably a good idea. If you already have workloads running for a few months, then take your average spending and add 50% on top.&lt;/p&gt;

&lt;p&gt;I also recommend setting up &lt;strong&gt;multiple billing alerts at various thresholds&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The comfortable alert: This is an amount that you’re comfortable spending, but you want to look into the bill over the next days.&lt;/li&gt;
&lt;li&gt;The dangerous alert: At this amount, you’re not comfortable anymore, and want to shut down a service as soon as possible. If your comfortable amount is $10, this one might be $100.&lt;/li&gt;
&lt;li&gt;The critical alert: At this amount, you want to nuke your account from orbit. With a comfortable amount of $10, this one might be $500. You can attach Budget Actions or pager alerts to this alarm to automatically stop EC2 instances or wake you up at night.&lt;/li&gt;
&lt;/ol&gt;
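
&lt;p&gt;If you prefer to manage these alerts as code, here’s a minimal sketch using boto3’s Budgets API that creates one budget per threshold. This is my own illustration, not from an official guide: the account ID and e-mail address are placeholders, and &lt;code&gt;create_budgets()&lt;/code&gt; must be called explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def budget_definition(name, amount_usd):
    # A monthly cost budget (pure data, no AWS call yet)
    return {
        'BudgetName': name,
        'BudgetLimit': {'Amount': str(amount_usd), 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST'
    }

def email_notification(address):
    # Notify once actual spending exceeds 100% of the budget
    return {
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 100.0,
            'ThresholdType': 'PERCENTAGE'
        },
        'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': address}]
    }

def create_budgets():
    # Call this explicitly with valid AWS credentials in the environment
    import boto3
    client = boto3.client('budgets')
    # One budget per threshold: comfortable, dangerous, critical
    for name, amount in [('comfortable', 10), ('dangerous', 100), ('critical', 500)]:
        client.create_budget(
            AccountId='123456789012',  # placeholder: your AWS account id
            Budget=budget_definition(name + '-budget', amount),
            NotificationsWithSubscribers=[email_notification('you@example.com')]
        )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;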

&lt;p&gt;As an addition to predefined thresholds, you can also try out &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-anomaly-detection/" rel="noopener noreferrer"&gt;AWS Cost Anomaly Detection&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DANGER - The Orbital Nuke Option&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As you can send notifications to SNS, you can trigger a Lambda function that runs &lt;a href="https://github.com/rebuy-de/aws-nuke" rel="noopener noreferrer"&gt;aws-nuke&lt;/a&gt; which will tear down all the infrastructure in your account. Do not use this on any account that you have production data in. If you want to learn more about this, &lt;a href="https://github.com/rebuy-de/aws-nuke" rel="noopener noreferrer"&gt;check out the GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Budget Actions
&lt;/h3&gt;

&lt;p&gt;AWS &lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/10/announcing-aws-budgets-actions/" rel="noopener noreferrer"&gt;recently announced Budget Actions&lt;/a&gt;. This is an extension to Budget Alerts, where you can trigger actions when a budget exceeds its threshold. In addition to sending e-mail notifications, you can now apply custom IAM policies like “Deny EC2 Run Instances” or let AWS shut down EC2 and RDS instances for you as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fbudget-action-shut-down-ec2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fbudget-action-shut-down-ec2.png" alt="Budget Action to Shut Down an EC2 Instance"&gt;&lt;/a&gt;&lt;/p&gt;
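
&lt;p&gt;If the managed actions don’t cover your use case, you can subscribe a Lambda function to the budget’s SNS topic and take action yourself. Here’s a minimal sketch of such a handler, not from the announcement above, that stops every running EC2 instance in a region. It ignores pagination of large responses, so treat it as a starting point and test it on a throwaway account first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def running_instance_ids(reservations):
    # Flatten a describe_instances response into a list of instance ids
    return [instance['InstanceId']
            for reservation in reservations
            for instance in reservation['Instances']]

def handler(event, context):
    # Entry point for a Lambda subscribed to the budget's SNS topic
    import boto3
    ec2 = boto3.client('ec2')
    response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    ids = running_instance_ids(response['Reservations'])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {'stopped': ids}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;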

&lt;h3&gt;
  
  
  4. Mobile App
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/console/mobile/" rel="noopener noreferrer"&gt;AWS Console Mobile Application&lt;/a&gt; puts the cost explorer just 3-5 taps away. This way you can check in on your spending with minimal effort.&lt;/p&gt;

&lt;p&gt;Below you can see two screens from the mobile app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fmobile-app.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbahr.dev%2Fpictures%2F2020%2Fsurprisebills%2Fmobile-app.png" alt="Cost Explorer in Mobile App"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use the app you should &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html" rel="noopener noreferrer"&gt;set up a dedicated user&lt;/a&gt; that only &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html" rel="noopener noreferrer"&gt;gets the permissions&lt;/a&gt; that the app needs to display your spending.&lt;/p&gt;

&lt;p&gt;Here’s an IAM policy that grants read access to the Cost Explorer as well as CloudWatch alarms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ce:DescribeCostCategoryDefinition",
                "ce:GetRightsizingRecommendation",
                "ce:GetCostAndUsage",
                "ce:GetSavingsPlansUtilization",
                "ce:GetReservationPurchaseRecommendation",
                "ce:ListCostCategoryDefinitions",
                "ce:GetCostForecast",
                "ce:GetReservationUtilization",
                "ce:GetSavingsPlansPurchaseRecommendation",
                "ce:GetDimensionValues",
                "ce:GetSavingsPlansUtilizationDetails",
                "ce:GetCostAndUsageWithResources",
                "ce:GetReservationCoverage",
                "ce:GetSavingsPlansCoverage",
                "ce:GetTags",
                "ce:GetUsageForecast",
                "health:DescribeEventAggregates",
                "cloudwatch:DescribeAlarms",
                "aws-portal:ViewAccount",
                "aws-portal:ViewUsage",
                "aws-portal:ViewBilling"
            ],
            "Resource": "*"
        }
    ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can group the permissions into 3 sets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cost Explorer Read Access (everything that starts with &lt;code&gt;ce:&lt;/code&gt;). These let us get detailed information about our current and forecasted spending.&lt;/li&gt;
&lt;li&gt;CloudWatch Alarms Read Access (&lt;code&gt;cloudwatch:DescribeAlarms&lt;/code&gt;). This allows you to see if there are any alarms, but doesn’t let you get further than that.&lt;/li&gt;
&lt;li&gt;General Access (permissions starting with &lt;code&gt;aws-portal:&lt;/code&gt; and &lt;code&gt;health:&lt;/code&gt;). These allow the app to render the mobile dashboard properly. As far as I understand from my testing, they don’t grant access to the spending details themselves, but without them you can’t display the dashboards.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Please &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;let me know&lt;/a&gt; if any of these permissions can be removed.&lt;/p&gt;
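
&lt;p&gt;If you’d like to script this setup, here’s a hedged boto3 sketch that creates such a user and attaches the policy inline. The user and policy names are placeholders I made up, the action list is abbreviated to a few entries from the policy above, and &lt;code&gt;create_mobile_app_user()&lt;/code&gt; must be called explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

READ_ONLY_POLICY = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': [
            # abbreviated: copy the full action list from the policy above
            'ce:GetCostAndUsage',
            'ce:GetCostForecast',
            'cloudwatch:DescribeAlarms',
            'aws-portal:ViewBilling'
        ],
        'Resource': '*'
    }]
}

def create_mobile_app_user():
    # Call this explicitly with valid AWS credentials in the environment
    import boto3
    iam = boto3.client('iam')
    iam.create_user(UserName='mobile-app-readonly')  # placeholder name
    iam.put_user_policy(
        UserName='mobile-app-readonly',
        PolicyName='cost-explorer-read-only',  # placeholder name
        PolicyDocument=json.dumps(READ_ONLY_POLICY)
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;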

&lt;h3&gt;
  
  
  5. Secrets Manager
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/aws/comments/9i2zzh/huge_unexpected_45k_bill_for_ec2_instances/" rel="noopener noreferrer"&gt;If access keys get leaked through public repositories&lt;/a&gt;, malicious actors can start expensive EC2 instances in your account and use it, for example, to mine Bitcoin. There are also reports of instances hidden away in less frequently used regions, small enough that they don’t get noticed in the bill summary.&lt;/p&gt;

&lt;p&gt;To keep your code free from access keys or other secrets, you can use the &lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; to store the secrets which your code needs at runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;Follow this AWS tutorial&lt;/a&gt; to create your first secret. Once you’ve created one, remove the secret from your code base and use one of the official AWS clients (e.g. &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/secretsmanager.html" rel="noopener noreferrer"&gt;boto3 for Python&lt;/a&gt;) to retrieve it at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# The client picks up AWS credentials from the environment
client = boto3.client('secretsmanager')

# 'replace-me' is a placeholder for your secret's name or ARN
response = client.get_secret_value(SecretId='replace-me')

secret = response['SecretString']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please note that each secret will cost you $0.40 per month, as well as $0.05 per 10,000 API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contact Support
&lt;/h2&gt;

&lt;p&gt;If you experience a surprise bill, stop the apps that are causing the high spending, rotate your access keys if necessary, and contact AWS Support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/oKNAxfmQMZM" rel="noopener noreferrer"&gt;Here’s a 20-second video which guides you to the support case&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The steps to file a support ticket are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the top right click on Support and then select the Support Center&lt;/li&gt;
&lt;li&gt;Press the orange button that says Create case&lt;/li&gt;
&lt;li&gt;Select Account and billing support&lt;/li&gt;
&lt;li&gt;As type select “Billing” and as category select “Payment issue”&lt;/li&gt;
&lt;li&gt;Now fill out the details and submit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While some folks have had their surprise bill reimbursed, please don’t rely on this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The first thing you should do is set up MFA and Budget Alerts. After that you can look into more advanced operations like Budget Actions to lock down your account if spending spikes.&lt;/p&gt;

&lt;p&gt;If your applications use secrets or access keys, you can prevent them from accidentally ending up in your repositories by storing the secrets in the AWS Secrets Manager instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://chrisshort.net/the-aws-bill-heard-around-the-world/" rel="noopener noreferrer"&gt;The AWS bill heard around the world&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.lastweekinaws.com/podcast/aws-morning-brief/whiteboard-confessional-the-curious-case-of-the-9000-aws-bill-increase/" rel="noopener noreferrer"&gt;Last Week in AWS Podcast&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/checklistforunwantedcharges.html" rel="noopener noreferrer"&gt;AWS Checklist for avoiding unexpected charges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/budgets-best-practices.html" rel="noopener noreferrer"&gt;AWS’ best practices for budgets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/better-programming/how-to-protect-yourself-from-unexpectedly-high-aws-bills-4ec91bbe66f4" rel="noopener noreferrer"&gt;How to Protect Yourself From Unexpectedly High AWS Bills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ceoraford.com/posts/never-get-an-unexpected-aws-bill-again!/" rel="noopener noreferrer"&gt;Never Get an Unexpected AWS Bill Again!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=MKNtSOQXFrY" rel="noopener noreferrer"&gt;YouTube: How to avoid Huge AWS Bills with AWS Budgets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=FVwdlJ8lM0Q" rel="noopener noreferrer"&gt;YouTube: How to set up Budget Alerts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cost</category>
    </item>
    <item>
      <title>How To Get Random Records From A Serverless Application</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 07 Jan 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-get-random-records-from-a-serverless-application-25gd</link>
      <guid>https://dev.to/michabahr/how-to-get-random-records-from-a-serverless-application-25gd</guid>
<description>&lt;p&gt;Some applications need to get random data to provide their customers a good and diversified experience, e.g. a quiz app. In this article we take a look at three serverless approaches to getting random records from a large and changing set of data.&lt;/p&gt;

&lt;p&gt;A serverless mechanism for getting random records should be scalable, support a changing dataset and scale down to zero if not in use.&lt;/p&gt;

&lt;p&gt;A great quiz app lets us store millions of questions so that the game stays interesting to our customers. It also allows us to add more questions over time, and remove questions that are outdated.&lt;/p&gt;

&lt;p&gt;Keep in mind that true randomness is not always desirable, as it can lead to your user seeing the same record 5 times in a row. Keep track of what your user has already seen, and try again if you load a record that they’ve seen before.&lt;/p&gt;
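
&lt;p&gt;That retry idea can be sketched as a small helper. &lt;code&gt;load_random_record&lt;/code&gt; is a hypothetical stand-in for any of the approaches below, and the record shape with an &lt;code&gt;id&lt;/code&gt; field is my own assumption.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random

def pick_unseen(load_random_record, seen_ids, max_attempts=10):
    # Retry until we load a record the user hasn't seen yet
    for _ in range(max_attempts):
        record = load_random_record()
        if record['id'] not in seen_ids:
            seen_ids.add(record['id'])
            return record
    return None  # the user may have seen (almost) everything

# Usage with an in-memory stand-in for the data store:
questions = [{'id': i, 'text': 'question-' + str(i)} for i in range(10)]
seen = set()
first = pick_unseen(lambda: random.choice(questions), seen)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;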

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;Apart from a quiz app, you might need to get random records for&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a vocabulary app,&lt;/li&gt;
&lt;li&gt;a “wisdom of the day” Twitter bot,&lt;/li&gt;
&lt;li&gt;a “picture of the week” calendar,&lt;/li&gt;
&lt;li&gt;a Special Sales Deal suggestion,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and many more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You need an AWS account and credentials in the environment that you’re running the examples from. &lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html"&gt;You can use AWS CloudShell for this&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To get the most out of this article, you should be familiar with at least one of DynamoDB, S3 or Redis.&lt;/p&gt;

&lt;p&gt;Python knowledge, or the ability to translate the examples to other languages, is nice to have.&lt;/p&gt;

&lt;h2&gt;
  
  
  Offset
&lt;/h2&gt;

&lt;p&gt;In the sections on DynamoDB and S3 we use a random offset. The trick here is that this random offset does not need to exist as a record in the target service. S3 and DynamoDB will take it and scan until they find a record.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H5vnFqAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://bahr.dev/pictures/randomized-offset.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H5vnFqAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://bahr.dev/pictures/randomized-offset.png" alt="Randomized Offset Visualization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In plain English we tell DynamoDB and S3 to start at a certain point, and then keep looking until they find one record.&lt;/p&gt;
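
&lt;p&gt;You can model this locally to get a feeling for it. The toy version below picks the first key that sorts after a random offset; it roughly mirrors S3’s alphabetical listing, and deliberately ignores details like DynamoDB’s internal key hashing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bisect
from uuid import uuid4

def record_at_random_offset(sorted_keys):
    # The random offset doesn't need to exist among the keys;
    # we return the first key that sorts after it.
    offset = str(uuid4())
    index = bisect.bisect_right(sorted_keys, offset)
    if index == len(sorted_keys):
        return None  # started past the last record: caller should retry
    return sorted_keys[index]

keys = sorted(str(uuid4()) for _ in range(100))
print(record_at_random_offset(keys))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;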

&lt;h2&gt;
  
  
  DynamoDB
&lt;/h2&gt;

&lt;p&gt;DynamoDB is a serverless key-value database that is optimized for transactional access patterns. If the partition key of our table is random within a range (e.g. a UUID), we can combine a &lt;code&gt;Scan&lt;/code&gt; operation with a random offset to get a random record on each request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/metall0id"&gt;Tyrone Erasmus&lt;/a&gt; pointed me to a Stack Overflow answer, which we look at in more detail below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have used the most upvoted answer once or twice on dynamo: &lt;a href="https://t.co/OdIWdTWVzI"&gt;https://t.co/OdIWdTWVzI&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Wasn't intuitive at first but actually works really well (and only consumes 1 read capacity)&lt;/p&gt;

&lt;p&gt;— Tyrone Erasmus (@metall0id) &lt;a href="https://twitter.com/metall0id/status/1342518793084526596?ref_src=twsrc%5Etfw"&gt;December 25, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this example we’re using &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html"&gt;the Python library boto3 for DynamoDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First we insert some records that have a UUID as their partition key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in range(100):
  item = {'pk': str(uuid4()), 'text': f'What is {i}+{i}?'}
  table.put_item(Item=item)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are more exhaustive examples in the following sections.&lt;/p&gt;

&lt;p&gt;The second step is to run the &lt;code&gt;Scan&lt;/code&gt; operation with a random offset. We use the parameter &lt;code&gt;Limit&lt;/code&gt; so that the scan stops after it finds one entry, and we use &lt;code&gt;ExclusiveStartKey&lt;/code&gt; to pass in a random offset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;table.scan(
    Limit=1,
    ExclusiveStartKey={
        'pk': str(uuid4())
    }
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above example we have a table with a partition key called &lt;code&gt;pk&lt;/code&gt;. Every record in this table has a UUID as its partition key.&lt;/p&gt;

&lt;p&gt;By running this command, we read and retrieve exactly one record. While scans are usually considered expensive, this scan operation only consumes 0.5 read capacity units. That’s the same amount a &lt;code&gt;get_item&lt;/code&gt; operation consumes. You can verify this by adding the parameter &lt;code&gt;ReturnConsumedCapacity='TOTAL'&lt;/code&gt; to the scan operation.&lt;/p&gt;

&lt;p&gt;From my tests, DynamoDB offers the best price for datasets with heavy usage. If you store a lot of records but access them only rarely, then S3 offers better pricing. More on that in the cost comparison.&lt;/p&gt;

&lt;p&gt;Please note that DynamoDB has a size limit of 400 KB per record. If you exceed that, then consider using the S3 or Redis approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;Here’s a complete Python example to pick a random record from a table called &lt;code&gt;random-table&lt;/code&gt;. The example includes writing records and checking for an edge case where we start at the end of the table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('random-table')

# Create 100 records with a random partition key
for i in range(100):
    item = {'pk': str(uuid4()), 'text': f"question-{i}"}
    table.put_item(Item=item)
    print(f"Inserted {item}")

# Read 3 records and print them with the consumed capacity
for i in range(3):
    response = table.scan(
        Limit=1,
        ExclusiveStartKey={
            'pk': str(uuid4())
        },
        ReturnConsumedCapacity='TOTAL'
    )
    if response['Items']:
        print({
            "Item": response['Items'][0],
            "Capacity": response['ConsumedCapacity']['CapacityUnits'],
            "ScannedCount": response['ScannedCount']
        })
    else:
        print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;Many use cases are not fully random, but require some kind of categorization. An example for this is a quiz, where we have the three difficulties &lt;code&gt;['easy', 'medium', 'difficult']&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In this case, we don’t want to repeatedly query for a fully random record until we find one that matches the desired category. Instead, we want to achieve the same result with one request.&lt;/p&gt;

&lt;p&gt;To achieve this we need a different data model. Instead of putting the UUID into the partition key, we use the partition key for the category and add a sort key with the UUID. This may lead to a big partition, but there’s no limit on how many records you can store in a single DynamoDB partition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In a DynamoDB table, there is no upper limit on the number of distinct sort key values per partition key value. If you needed to store many billions of Dog items in the Pets table, DynamoDB would allocate enough storage to handle this requirement automatically. - &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html"&gt;DynamoDB documentation about partitions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s an example which expands on the fully random example with categories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('random-table-categorized')

categories = ['easy', 'medium', 'difficult']

for category in categories:
    # Create 50 records for each category with a random sort key
    for i in range(50):
        item = {'pk': category, 'sk': str(uuid4()), 'text': f"question-{category}-{i}"}
        table.put_item(Item=item)
        print(f"Inserted {item}")

for category in categories:
    # Read 3 records and print them with the consumed capacity
    for i in range(3):
        response = table.query(
            Limit=1,
            KeyConditionExpression=Key('pk').eq(category) &amp;amp; Key('sk').gt(str(uuid4())),
            ReturnConsumedCapacity='TOTAL'
        )
        if response['Items']:
            print({
                "Item": response['Items'][0],
                "Capacity": response['ConsumedCapacity']['CapacityUnits'],
                "ScannedCount": response['ScannedCount']
            })
        else:
            print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  S3
&lt;/h2&gt;

&lt;p&gt;S3 is a serverless object store. It allows you to store and retrieve any amount of data, and offers industry-leading scalability, availability and performance. It’s also cheaper than fully fledged databases for storage-heavy use cases. It does, however, offer less query flexibility than databases like DynamoDB.&lt;/p&gt;

&lt;p&gt;With S3 we take an approach similar to the one used with DynamoDB. It requires two API calls: one for finding the key of a random object, and one for retrieving that object’s content.&lt;/p&gt;

&lt;p&gt;Assuming that there’s a bucket called &lt;code&gt;my-bucket-name&lt;/code&gt; with files that each have a UUID as their name, we can use the following approach.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bucket_name = 'my-bucket-name'

list_response = s3client.list_objects_v2(
    Bucket=bucket_name,
    MaxKeys=1,
    StartAfter=str(uuid4()),
)
key = list_response['Contents'][0]['Key']
item_response = s3client.get_object(
    Bucket=bucket_name,
    Key=key
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the parameter &lt;code&gt;MaxKeys=1&lt;/code&gt; we tell &lt;code&gt;list_objects_v2&lt;/code&gt; to stop after it finds one file. &lt;code&gt;StartAfter&lt;/code&gt; is the equivalent of DynamoDB’s &lt;code&gt;ExclusiveStartKey&lt;/code&gt;, which allows us to pass a random offset. The result of &lt;code&gt;list_objects_v2&lt;/code&gt; is a list of object keys, from which we pick the first one and retrieve the object. &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html"&gt;The result is sorted alphabetically&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;Here’s a Python example to pick a random record from a bucket called &lt;code&gt;my-bucket-name&lt;/code&gt;. The example includes writing files and checking for an edge case, where we might have started at the end of the bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

client = boto3.client('s3')

bucket_name = 'my-bucket-name'

# Create 100 records with a random key
for i in range(100):
    key = str(uuid4())
    client.put_object(Body=f"question={i}".encode(),
                      Bucket=bucket_name,
                      Key=key)
    print(f"Inserted {key}")

# Read 3 records and print them
for i in range(3):
    list_response = client.list_objects_v2(
        Bucket=bucket_name,
        MaxKeys=1,
        StartAfter=str(uuid4()),
    )
    if 'Contents' in list_response:
        key = list_response['Contents'][0]['Key']
        item_response = client.get_object(
            Bucket=bucket_name,
            Key=key
        )
        print({
            'Key': key,
            'Content': item_response['Body'].read().decode('utf-8')
        })
    else:
        print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;Here’s an S3 example with categories, which we add as a key prefix. What was previously &lt;code&gt;key&lt;/code&gt; now becomes &lt;code&gt;category/key&lt;/code&gt;. For &lt;code&gt;list_objects_v2&lt;/code&gt; we need to consider the category in two places: the &lt;code&gt;Prefix&lt;/code&gt; parameter, and the &lt;code&gt;StartAfter&lt;/code&gt; parameter, which needs to include both the category and the key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from uuid import uuid4

client = boto3.client('s3')

bucket_name = 'my-random-bucket'

categories = ['easy', 'medium', 'difficult']

for category in categories:
    # Create 100 records with a random key
    for i in range(100):
        key = str(uuid4())
        client.put_object(Body=f"question-{category}-{i}".encode(),
                          Bucket=bucket_name,
                          Key=f"{category}/{key}")
        print(f"Inserted {key} for category {category}")

for category in categories:
    # Read 3 records and print them
    for i in range(3):
        start_after = f"{category}/{uuid4()}"
        list_response = client.list_objects_v2(
            Bucket=bucket_name,
            MaxKeys=1,
            Prefix=category,
            StartAfter=start_after
        )
        if 'Contents' in list_response:
            key = list_response['Contents'][0]['Key']
            item_response = client.get_object(
                Bucket=bucket_name,
                Key=key
            )
            print({
                'Key': key,
                'Content': item_response['Body'].read().decode('utf-8'),
            })
        else:
            print("Didn't find an item. Please try again.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you omit the &lt;code&gt;Prefix&lt;/code&gt;, you might get objects that are outside the selected category. Assume the bucket holds the keys below. If the &lt;code&gt;StartAfter&lt;/code&gt; parameter is &lt;code&gt;categoryA/object2&lt;/code&gt; and we don’t provide a &lt;code&gt;Prefix&lt;/code&gt;, then our result would be &lt;code&gt;categoryB/object3&lt;/code&gt;. If we include &lt;code&gt;Prefix=categoryA&lt;/code&gt;, however, then &lt;code&gt;categoryB/object3&lt;/code&gt; doesn’t match, and we get an empty result instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;categoryA/object1
categoryA/object2
categoryB/object3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;list_objects_v2&lt;/code&gt; call &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html"&gt;always returns an ordered list&lt;/a&gt;.&lt;/p&gt;
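That alphabetical ordering is what makes the `StartAfter` trick work. Here's a small sketch in plain Python (no S3 involved; the bucket is simulated with a sorted list of keys) that mimics how `list_objects_v2` returns the first keys after a random offset, including the empty-result edge case handled above:

```python
import bisect
from uuid import uuid4

def list_after(sorted_keys, start_after, max_keys=1):
    """Mimic list_objects_v2: return up to max_keys keys that come
    strictly after start_after in the sorted key space."""
    index = bisect.bisect_right(sorted_keys, start_after)
    return sorted_keys[index:index + max_keys]

# Simulate a bucket with 100 random UUID keys
keys = sorted(str(uuid4()) for _ in range(100))

result = list_after(keys, str(uuid4()))
if result:
    print(f"Picked {result[0]}")
else:
    # The random offset landed after the last key -- retry
    print("Didn't find an item. Please try again.")
```

Because UUIDs spread roughly uniformly over the key space, each key is picked with approximately equal probability.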

&lt;h2&gt;
  
  
  Redis
&lt;/h2&gt;

&lt;p&gt;Redis is an in-memory data store that can be used, among other things, as a database. While Redis itself is not serverless, there are offerings like &lt;a href="https://lambda.store/"&gt;Lambda Store&lt;/a&gt; that you can use to keep your application fully serverless.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Have you tried redis? It has both RANDOMKEY as well as SRANDMEMBER commands that might be useful here&lt;/p&gt;

&lt;p&gt;— Yan Cui is making the AppSync Masterclass (&lt;a class="comment-mentioned-user" href="https://dev.to/theburningmonk"&gt;@theburningmonk&lt;/a&gt;
) &lt;a href="https://twitter.com/theburningmonk/status/1342516700764364801?ref_src=twsrc%5Etfw"&gt;December 25, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Redis approach suggested by &lt;a href="https://twitter.com/theburningmonk"&gt;Yan Cui&lt;/a&gt; is a lot simpler, because there are built-in commands to pick random entries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RANDOMKEY&lt;/code&gt; gets a random key from the currently selected database.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SRANDMEMBER&lt;/code&gt; lets us pick one or more random entries from a set, which lets us add categorization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis has a size limit of 512 MB per record.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Random
&lt;/h3&gt;

&lt;p&gt;In the example below, we store unstructured records in our database. We retrieve a random key with the command &lt;code&gt;RANDOMKEY&lt;/code&gt;, and then get the value with the &lt;code&gt;GET {key}&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET firstKey "Hello world!"
SET secondKey "Panda"

RANDOMKEY
&amp;gt; "secondKey"

GET secondKey
&amp;gt; "Panda"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach requires two calls per random record.&lt;/p&gt;

&lt;h3&gt;
  
  
  Categorized
&lt;/h3&gt;

&lt;p&gt;We can leverage sets to add categories to our data. This approach is very powerful, as the &lt;code&gt;SRANDMEMBER&lt;/code&gt; command has an optional count parameter with which we can specify how many records we want to retrieve. This comes in handy if our users should see multiple entries at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SADD easy 1 2 3 4 5 6 7 8 9
SADD medium 1 2 3 4 5 6 7 8 9
SADD difficult 1 2 3 4 5 6 7 8 9

SRANDMEMBER easy
&amp;gt; 5
SRANDMEMBER medium 2
&amp;gt; 3,6
SRANDMEMBER difficult 5
&amp;gt; 2,3,6,7,9

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach only needs one call per random record, or less if you retrieve multiple records at once.&lt;/p&gt;
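To make the semantics concrete, here's a plain-Python stand-in for `SRANDMEMBER` (this only mimics Redis locally, it does not talk to a Redis server): with a positive count, Redis returns up to that many distinct members of the set.

```python
import random

# Local stand-in for the Redis sets from the example above
sets = {
    'easy': {1, 2, 3, 4, 5, 6, 7, 8, 9},
    'medium': {1, 2, 3, 4, 5, 6, 7, 8, 9},
    'difficult': {1, 2, 3, 4, 5, 6, 7, 8, 9},
}

def srandmember(key, count=1):
    """Mimic SRANDMEMBER with a positive count: up to `count`
    distinct random members of the set."""
    members = list(sets[key])
    return random.sample(members, min(count, len(members)))

print(srandmember('easy'))       # e.g. [5]
print(srandmember('medium', 2))  # e.g. [3, 6]
```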

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;p&gt;In the cost comparison we’re looking at DynamoDB (On-Demand), S3 (Standard), and Lambda Store, because all three are serverless solutions. All of them have a free tier that lets you test these approaches for free.&lt;/p&gt;

&lt;p&gt;We’re reading from a dataset of one million records, each with a size of 1 KB. That should be enough for a question and some meta information. The total size of this dataset is 1 GB. Data transfer costs are excluded.&lt;/p&gt;

&lt;p&gt;In the first table you see the price per single random record, as well as per million.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Single Record&lt;/th&gt;
&lt;th&gt;One Million Records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;$0.000000125&lt;/td&gt;
&lt;td&gt;$0.125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;$0.0000054&lt;/td&gt;
&lt;td&gt;$5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Store&lt;/td&gt;
&lt;td&gt;$0.000004&lt;/td&gt;
&lt;td&gt;$4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With DynamoDB, we’re using eventually consistent reads, which are charged at half a read capacity unit. For S3 we need a &lt;code&gt;List&lt;/code&gt; and a &lt;code&gt;Get&lt;/code&gt; call, whose prices are added together in the table. Lambda Store has a flat price per 100,000 requests. We’re assuming the &lt;code&gt;SRANDMEMBER&lt;/code&gt; operation here, as it needs only one request.&lt;/p&gt;
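The per-record figures can be reproduced from list prices. The rates below are the assumptions behind the table (us-east-1 prices at the time of writing, not live quotes):

```python
# Assumed list prices (us-east-1 at the time of writing)
DDB_PRICE_PER_MILLION_RRU = 0.25   # on-demand read request units
S3_LIST_PER_1000 = 0.005           # LIST requests
S3_GET_PER_1000 = 0.0004           # GET requests
LAMBDA_STORE_PER_100K = 0.40       # flat price per 100,000 requests

# DynamoDB: an eventually consistent read of up to 4 KB costs 0.5 RRU
dynamodb = 0.5 * DDB_PRICE_PER_MILLION_RRU / 1_000_000

# S3: one List plus one Get per random record
s3 = (S3_LIST_PER_1000 + S3_GET_PER_1000) / 1000

# Lambda Store: a single SRANDMEMBER request
lambda_store = LAMBDA_STORE_PER_100K / 100_000

for name, price in [('DynamoDB', dynamodb), ('S3', s3),
                    ('Lambda Store', lambda_store)]:
    print(f"{name}: ${price:.9f} per record, "
          f"${price * 1_000_000:.3f} per million")
```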

&lt;p&gt;In the second table you see the price per 1 GB stored per month.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;GB-month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;$0.023&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Store&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;Lambda Store has a concurrency limit of 20 in the free tier, 1000 in the Standard tier and 5000 in the Premium tier. They do however offer reserved capacity for high throughput use cases. So once you hit these limits, you probably have a working business model that can pay for reserved capacity.&lt;/p&gt;

&lt;p&gt;With DynamoDB, you might hit a throughput limitation even in On-Demand mode. To work around this, you can have your app retry a few times, or switch to a high provisioned capacity. Another approach I’ve heard of but didn’t verify is to create the table with a very high provisioned capacity and then immediately switch back to On-Demand.&lt;/p&gt;
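Such a retry could be as simple as the sketch below: a generic exponential-backoff wrapper, not DynamoDB-specific (note that boto3 also ships configurable retry modes out of the box; the `table.get_item` call in the usage comment is a hypothetical example):

```python
import random
import time

def with_retries(operation, max_attempts=5, base_delay=0.1):
    """Retry `operation` with exponential backoff and full jitter,
    as you might around a throttled DynamoDB call."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with full jitter
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

# Usage sketch (hypothetical call):
# result = with_retries(lambda: table.get_item(Key={'id': '42'}))
```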

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serverless is a great fit for providing random records. The application scales with demand and has a very low price per access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB has the lowest cost and highest flexibility&lt;/strong&gt; for applications that are similar to a quiz app. The pricing is better for applications that have a few million records, and where those records are read frequently and repeatedly. Imagine 10 million records stored, with 100 million reads each month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 becomes interesting for applications that use significantly more storage&lt;/strong&gt; and read infrequently. Imagine 10 billion records stored, and 2 million reads a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda Store has the easiest learning curve at a reasonable price&lt;/strong&gt;. Its Redis database is also the only one of the three offerings that &lt;strong&gt;can return multiple random records in one request&lt;/strong&gt;. Last but not least, &lt;a href="https://docs.lambda.store/docs/overall/compare/#aws-dynamodb"&gt;Lambda Store says on their website&lt;/a&gt; that their latency is “submillisecond while the latency is up to 10 msec in DynamoDB”. &lt;a href="https://medium.com/lambda-store/swifter-than-dynamodb-lambda-store-serverless-redis-bfacfaf92c80"&gt;Check out their full benchmark article&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/27389403"&gt;Scan approach on Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.lambda.store"&gt;Lambda Store Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/lambda-store/swifter-than-dynamodb-lambda-store-serverless-redis-bfacfaf92c80"&gt;Benchmark from Lambda Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html"&gt;Boto3 documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html"&gt;DynamoDB documentation about partitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html"&gt;AWS CloudShell Manual&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>dynamodb</category>
      <category>s3</category>
    </item>
    <item>
      <title>Amazon Timestream vs DynamoDB for Timeseries Data</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 12 Nov 2020 08:40:57 +0000</pubDate>
      <link>https://dev.to/michabahr/amazon-timestream-vs-dynamodb-for-timeseries-data-3gil</link>
      <guid>https://dev.to/michabahr/amazon-timestream-vs-dynamodb-for-timeseries-data-3gil</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://subscribe.bahr.dev/now"&gt;Subscribe to get new articles&lt;/a&gt; straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS recently announced that their &lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-timestream-now-generally-available/"&gt;Timestream database is now generally available&lt;/a&gt;. I tried it out with an existing application that uses timeseries data. Based on my experimentation this article compares Amazon Timestream with DynamoDB and shows what I learned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.timescale.com/blog/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/"&gt;Timeseries data is a sequence of data points stored in time order&lt;/a&gt;. Each timestream record can be extended with dimensions that give more context on the measurement. One example are fuel measurements of trucks, with truck types and number plates as dimensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;As this article compares Timestream with DynamoDB, it's good for you to have some experience with the latter. But even if you don't, you can learn about both databases here.&lt;/p&gt;

&lt;p&gt;I will also mention Lambda and API Gateway. If you're not familiar with those two, just read them as "compute" and "api".&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case
&lt;/h2&gt;

&lt;p&gt;My application monitors markets to notify customers of trading opportunities and registers about 500,000 market changes each day. DynamoDB requires ~20 &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html"&gt;RCU/WCU&lt;/a&gt;s for this. While most of the system is event-driven and can complete eventually, there are also user-facing dashboards that need fast responses. &lt;/p&gt;

&lt;p&gt;Below you can see a picture of the current architecture, where one Lambda function pulls data into DynamoDB, another creates notifications when a trading opportunity appears, and an API Gateway serves data for the user dashboards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0cp5x-Q9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-use-case.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0cp5x-Q9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-use-case.png" alt="Architecture for Use Case"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each record in the database consists of two measurements (price and volume), has two dimensions (article number and location) and has a timestamp.&lt;/p&gt;

&lt;p&gt;Testing out Timestream required two changes: An additional Lambda function to replicate from DynamoDB to Timestream, and a new API that reads from Timestream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Format
&lt;/h2&gt;

&lt;p&gt;Let's start by comparing the data format of DynamoDB and Timestream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt; holds a flexible set of attributes, which are identified by a unique key. This means that you query for a key and get the corresponding record with multiple attributes. That's useful, for example, when you store meta information for movies or songs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5pZLnm18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-dynamo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5pZLnm18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-dynamo.png" alt="Data Format DynamoDB"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestream&lt;/strong&gt; instead is designed to store continuous measurements, for example from a temperature sensor. There are only inserts, no updates. Each measurement has a name, value, timestamp and dimensions. A dimension can be for example the city where the temperature sensor is, so that we can group results by city.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nRoaaayT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-timestream.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nRoaaayT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/data-format-timestream.png" alt="Data Format Timestream"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Write to Timestream
&lt;/h2&gt;

&lt;p&gt;Timestream shines when it comes to ingestion. The &lt;a href="https://docs.aws.amazon.com/timestream/latest/developerguide/API_WriteRecords.html"&gt;&lt;code&gt;WriteRecords&lt;/code&gt; API&lt;/a&gt; is designed with a focus on batch inserts, which allows you to insert up to 100 records per request. With DynamoDB my batch inserts were sometimes throttled with both provisioned and on-demand capacity, while I saw no throttling with Timestream.&lt;/p&gt;
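A record list therefore needs to be cut into batches of at most 100 before calling `WriteRecords`. A minimal helper could look like this (the 100-record limit comes from the API; the database and table names in the usage comment are this article's examples):

```python
def chunks(records, size=100):
    """Split `records` into lists of at most `size` items,
    matching the WriteRecords batch limit."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# Usage sketch with a boto3 client:
# timestream = boto3.client('timestream-write')
# for chunk in chunks(all_records):
#     timestream.write_records(DatabaseName='MarketWatch',
#                              TableName='Snapshots', Records=chunk)
```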

&lt;p&gt;Below you can see a snapshot from AWS Cost Explorer when I started ingesting data with a &lt;a href="https://aws.amazon.com/timestream/pricing"&gt;memory store&lt;/a&gt; retention of 7 days. Memory store is Timestream's fastest, but most expensive storage. It is required for ingestion but its retention can be reduced to one hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3CFHBKNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-write-storage-cost.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3CFHBKNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-write-storage-cost.png" alt="Timestream Write and Storage Cost"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;write operations are cheap&lt;/strong&gt; and can be neglected in comparison to cost for storage and reading. Inserting 515,000 records has cost me $0.20, while the in-memory storage cost for all of those records totalled $0.37 after 7 days. My spending matches &lt;a href="https://aws.amazon.com/timestream/pricing/"&gt;Timestream's official pricing&lt;/a&gt; of $0.50 per 1 million writes of 1KB size.&lt;/p&gt;

&lt;p&gt;As &lt;strong&gt;each Timestream record can only contain one measurement&lt;/strong&gt;, we need to split up the DynamoDB records which hold multiple measurements. Instead of writing one record with multiple attributes, we need to write one record per measure value.&lt;/p&gt;
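A sketch of that split for this article's use case (the helper name and the `DOUBLE` measure type are assumptions; the record shape follows the `WriteRecords` API, with the timestamp assumed to be supplied via `CommonAttributes`):

```python
def to_timestream_records(item, measures, dimension_names):
    """Turn one multi-attribute item into one Timestream
    record per measure value."""
    dimensions = [{'Name': name, 'Value': str(item[name])}
                  for name in dimension_names]
    return [{
        'Dimensions': dimensions,
        'MeasureName': measure,
        'MeasureValue': str(item[measure]),
        'MeasureValueType': 'DOUBLE',
    } for measure in measures]

item = {'article': 'A-100', 'location': 'Hamburg',
        'price': 9.99, 'volume': 120}
records = to_timestream_records(item, ['price', 'volume'],
                                ['article', 'location'])
# -> two records, one for 'price' and one for 'volume'
```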

&lt;p&gt;&lt;strong&gt;Backfilling old data&lt;/strong&gt; might not be possible if its age exceeds the maximum retention time of the memory store, which is 12 months. In October 2020 it was only possible to write to memory store, and if you tried to insert older records you would get an error. To backfill and optimize cost you can start with 12 months retention and then lower it once your backfilling is complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read from Timestream
&lt;/h2&gt;

&lt;p&gt;You can read data from Timestream with SQL queries and get charged per GB of scanned data. &lt;code&gt;WHERE&lt;/code&gt; clauses are key to limiting the amount of data that you scan because "data is pruned by Amazon Timestream’s query engine when evaluating query predicates" (&lt;a href="https://aws.amazon.com/timestream/pricing/"&gt;Timestream Pricing&lt;/a&gt;). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The less data makes it through your &lt;code&gt;WHERE&lt;/code&gt; clauses, the cheaper and faster your query.&lt;/strong&gt;&lt;/p&gt;
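For example, a query against this article's schema might look like the sketch below (table and column names are illustrative, not a definitive schema). The time range, measure name, and dimension predicates are what lets the engine prune:

```sql
-- Narrow predicates keep the scanned (and billed) data small
SELECT article, location, avg(measure_value::double) AS avg_price
FROM "MarketWatch"."Snapshots"
WHERE time BETWEEN ago(1h) AND now()
  AND measure_name = 'price'
  AND location = 'Hamburg'
GROUP BY article, location
```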

&lt;p&gt;I tested the read speed by running the same queries against two APIs that were backed by DynamoDB (blue) and Timestream (orange) respectively. Below you can see a chart where I mimicked user behavior over the span of an hour. The spikes where DynamoDB got slower than Timestream were requests where computing the result required more than 500 queries to DynamoDB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dAwMbJym--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/access-speed-comparison.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dAwMbJym--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/access-speed-comparison.png" alt="Access Speed Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DynamoDB is designed for blazing fast queries, but &lt;a href="https://bahr.dev/2020/02/02/aggregate-ddb/"&gt;doesn't support ad-hoc analytics&lt;/a&gt;. SQL queries can't compete when fetching individual records, but they get interesting once you have to access many different records and can't precompute data. My queries to Timestream usually took more than a second, and I decided to &lt;strong&gt;precompute user-facing data into DynamoDB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Dashboards that update every minute or so and can wait 10s for a query to complete are fine with reading from Timestream. Use the right tool for the right job.&lt;/p&gt;

&lt;p&gt;Timestream seems to have &lt;strong&gt;no limit on query length&lt;/strong&gt;. An SQL query with 1,000 items in an SQL IN clause works fine, while &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html"&gt;DynamoDB limits queries to 100 operands&lt;/a&gt;.&lt;/p&gt;
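Generating such a clause from Python is straightforward (a sketch; Timestream's query API takes the full SQL string, so quoting and validating the values is your responsibility):

```python
def in_clause(column, values):
    """Render an SQL IN clause for a list of string values.
    Quotes are doubled as a minimal escape (sketch only --
    validate inputs properly in real code)."""
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{column} IN ({quoted})"

clause = in_clause('article', [f'A-{i}' for i in range(1000)])
query = f'SELECT * FROM "MarketWatch"."Snapshots" WHERE {clause}'
```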

&lt;h2&gt;
  
  
  Timestream Pricing
&lt;/h2&gt;

&lt;p&gt;Timestream pricing mostly comes down to two questions: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you need &lt;strong&gt;memory store with long retention&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Do you &lt;strong&gt;read frequently&lt;/strong&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below you can see the cost per storage type calculated into hourly, daily and monthly cost. On the right hand side you can see the relative cost compared to memory store.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;My ingestion experiments with Timestream were quite cheap with 514,000 records inserted daily for a whole month and the cost ending up below $10. This is a low barrier to entry for you to make some experiments. I &lt;strong&gt;dropped the memory storage down to two hours&lt;/strong&gt;, because I only needed it &lt;strong&gt;for ingestion&lt;/strong&gt;. Magnetic store seemed fast enough for my queries.&lt;/p&gt;

&lt;p&gt;When I tried to read and precompute data into DynamoDB every few seconds, I noticed that &lt;strong&gt;frequent reads can become expensive&lt;/strong&gt;. Timestream requires you to pick an encryption key from the Key Management Service (KMS), which is then used to decrypt data when reading from Timestream. In my experiment decrypting with KMS accounted for about 30% of the actual cost.&lt;/p&gt;

&lt;p&gt;Below you can see a chart of my spending on Timestream and KMS with frequent reads on October 14th and 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j8poRZT6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-cost.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j8poRZT6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/2020/timestream/timestream-cost.png" alt="Timestream in Cost Explorer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems and Limitations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/timestream/latest/developerguide/API_RejectedRecord.html"&gt;Records can get rejected for three reasons&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate values for the same dimensions, timestamps, and measure names&lt;/li&gt;
&lt;li&gt;Timestamps outside the memory store's retention period&lt;/li&gt;
&lt;li&gt;Dimensions or measures that exceed the Timestream limits (e.g. numbers that are bigger than a BigInt)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on &lt;a href="https://forums.aws.amazon.com/thread.jspa?messageID=960046%F3%AA%98%AE"&gt;my experience with these errors&lt;/a&gt; I suggest that you &lt;strong&gt;log the errors but don't let the exception bubble up&lt;/strong&gt;. If you're building historical charts, one or two missing values shouldn't be a problem.&lt;/p&gt;

&lt;p&gt;Below you can see an example of how I &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/timestream-write.html"&gt;write records to Timestream with the boto3 library for Python&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;timestream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'timestream-write'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;timestream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;DatabaseName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'MarketWatch'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;TableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'Snapshots'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;CommonAttributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;'Time'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())),&lt;/span&gt;
            &lt;span class="s"&gt;'TimeUnit'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'SECONDS'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# each chunk can hold up to 100 records
&lt;/span&gt;        &lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RejectedRecordsException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'exception'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'RejectedRecords'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="s"&gt;'reason'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Reason'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="s"&gt;'rejected_record'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'RecordIndex'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another perceived limitation is that each record can only hold one measurement (name and value). Assuming you have a vehicle with 200 sensors, you could write that into DynamoDB with one request, while Timestream already needs two. However, this is pretty easy to compensate for, and I couldn't come up with a good access pattern where you must combine different measurement types (e.g. temperature and voltage) in a single query.&lt;/p&gt;

&lt;p&gt;Last but not least, Timestream does not offer provisioned throughput yet. Especially when collecting data from a fleet of IoT sensors, it would be nice to cap ingestion so that a bug in the sensors can't cause a cost spike. In my tests the cost for writing records has been negligible though.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I moved my timeseries data to Timestream, but added another DynamoDB table for precomputing user facing data. While my cost stayed roughly the same, I now have &lt;strong&gt;cheap long term storage at 12% of the previous price&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;DynamoDB is faster for targeted queries, whereas &lt;strong&gt;Timestream is better for analytics&lt;/strong&gt; that include large amounts of data. You can &lt;strong&gt;combine both and precompute&lt;/strong&gt; data that needs fast access.&lt;/p&gt;

&lt;p&gt;Trying out queries is key to understanding whether Timestream fits your use case and its requirements. You can do that in the Timestream console with the AWS samples. Beware of frequent reads and monitor your spending.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out
&lt;/h2&gt;

&lt;p&gt;Try out one of the sample databases through the Timestream console or replicate some of the data you write to DynamoDB into Timestream. You can achieve the latter for example with &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html"&gt;DynamoDB streams&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For some more inspiration, check out the &lt;a href="https://github.com/awslabs/amazon-timestream-tools"&gt;timestream tools and samples by awslabs on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/store-and-access-time-series-data-at-any-scale-with-amazon-timestream-now-generally-available/"&gt;Timestream now Generally Available&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloudonaut.io/unboxing-amazon-timestream/"&gt;Unboxing Amazon Timestream&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/database/design-patterns-for-high-volume-time-series-data-in-amazon-dynamodb/"&gt;Design patterns for high-volume, time-series data in Amazon DynamoDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-hybrid.html"&gt;Best Practices for Implementing a Hybrid Database System&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Enjoyed this article? I publish a new article every month. &lt;a href="https://twitter.com/bahrdev"&gt;Connect with me on Twitter&lt;/a&gt; and &lt;a href="https://subscribe.bahr.dev/now"&gt;subscribe for new articles&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>database</category>
    </item>
    <item>
      <title>Validate Email Workflows with a Serverless Inbox API</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 15 Oct 2020 08:37:39 +0000</pubDate>
      <link>https://dev.to/michabahr/validate-email-workflows-with-a-serverless-inbox-api-ogc</link>
      <guid>https://dev.to/michabahr/validate-email-workflows-with-a-serverless-inbox-api-ogc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; to get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article you'll learn &lt;strong&gt;how to build a serverless API&lt;/strong&gt; that you can use to &lt;strong&gt;validate your email sending workflows&lt;/strong&gt;. You will have access to &lt;strong&gt;unlimited inboxes&lt;/strong&gt; for your domain, allowing you to &lt;strong&gt;use a new inbox for every test run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;The working code is ready for you to deploy on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With the AWS services Simple Email Service (SES) and API Gateway we can build a fully automated solution. Its pricing model fits most testing workloads into the free tier, and it can handle up to 10,000 mails per month for just $10. No maintenance or development required. It also allows you to stay in the &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html" rel="noopener noreferrer"&gt;SES sandbox&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To deploy this solution, you should have an &lt;strong&gt;AWS account&lt;/strong&gt; and some experience with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS CDK&lt;/strong&gt;&lt;/a&gt;. I'll be using the &lt;strong&gt;TypeScript&lt;/strong&gt; variant. This article uses CDK version 1.63.0. Let me know if anything breaks in newer versions!&lt;/p&gt;

&lt;p&gt;To receive mail with SES you need a domain or subdomain. You can &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/domain-register.html" rel="noopener noreferrer"&gt;register a domain with Route53&lt;/a&gt; or &lt;a href="https://bahr.dev/2020/09/01/multiple-frontends/" rel="noopener noreferrer"&gt;delegate from another provider&lt;/a&gt;. You can also use subdomains like &lt;code&gt;mail-test.bahr.dev&lt;/code&gt; to receive mail if you already connected your apex domain (e.g. &lt;code&gt;bahr.dev&lt;/code&gt;) with another mailserver.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Overview
&lt;/h2&gt;

&lt;p&gt;The solution consists of two parts: the email receiver and the API that lets you access the received mail. The former writes to the database, the latter reads from it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Ftempmail%2Finbox-api.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Ftempmail%2Finbox-api.png" alt="Architecture Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the email receiver we use &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/receiving-email.html" rel="noopener noreferrer"&gt;SES&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-readme.html#email-receiving" rel="noopener noreferrer"&gt;Receipt Rules&lt;/a&gt;. We use those rules to store the raw payload and attachments in an S3 bucket, and send a nicely formed payload to a Lambda function which creates an entry in the DynamoDB table.&lt;/p&gt;

&lt;p&gt;On the API side there's a single read operation which requires the recipient's email address. It can be parameterized to reduce the number of emails that will be returned. &lt;/p&gt;

&lt;p&gt;Old emails are automatically discarded with &lt;a href="https://bahr.dev/2019/05/29/scheduling-ddb/" rel="noopener noreferrer"&gt;DynamoDB's time to live (TTL) feature&lt;/a&gt;, keeping the database small without any maintenance work.&lt;/p&gt;
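&lt;p&gt;As a rough sketch of that read operation (the attribute and helper names here are illustrative, not taken from the repo), the query parameters for fetching a recipient's mail since a given timestamp could be built like this:&lt;/p&gt;

```typescript
// Sketch (illustrative names): a DynamoDB Query that fetches all mail
// for one recipient, optionally limited to mail received since a given
// ISO-8601 timestamp. pk holds the recipient address; sk starts with a
// timestamp, so a ">=" comparison filters by time lexicographically.
interface MailQueryInput {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeValues: Record<string, string>;
}

function buildMailQuery(table: string, recipient: string, since?: string): MailQueryInput {
  const input: MailQueryInput = {
    TableName: table,
    KeyConditionExpression: "pk = :recipient",
    ExpressionAttributeValues: { ":recipient": recipient },
  };
  if (since) {
    input.KeyConditionExpression += " AND sk >= :since";
    input.ExpressionAttributeValues[":since"] = since;
  }
  return input;
}

const query = buildMailQuery("mail-table", "test-1@mail-test.bahr.dev", "2020-10-15T00:00:00Z");
```

&lt;p&gt;The resulting object can then be passed to the DocumentClient's &lt;code&gt;query&lt;/code&gt; call.&lt;/p&gt;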

&lt;h2&gt;
  
  
  Verify Domain with SES
&lt;/h2&gt;

&lt;p&gt;To receive mail, you must be in control of a domain that you can register with SES. This can also be a subdomain, e.g. if you already use your apex domain (e.g. bahr.dev) with another mail service like Office 365.&lt;/p&gt;

&lt;p&gt;The integration with SES is easiest if you have a hosted zone for your domain in Route53. To use domains from another provider like GoDaddy, I suggest that you &lt;a href="https://bahr.dev/2020/09/01/multiple-frontends/" rel="noopener noreferrer"&gt;set up a nameserver delegation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you have a hosted zone for your domain, &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/verify-domain-procedure.html" rel="noopener noreferrer"&gt;go to the Domain Identity Management in SES and verify a new domain&lt;/a&gt;. There's also &lt;a href="https://youtu.be/3o-PcDozNkY" rel="noopener noreferrer"&gt;a short video where I verify a domain with SES&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Model
&lt;/h2&gt;

&lt;p&gt;We'll use DynamoDB's partition and sort keys to enable two major features: Receiving mail for many aliases and receiving more than one mail for each alias. An alias is the &lt;code&gt;front-part&lt;/code&gt; in &lt;code&gt;front-part@domain.com&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition_key: recipient@address.com
sort_key: timestamp#uuid
ttl: timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By combining a timestamp and a uuid we can sort and filter by the timestamp, while also guaranteeing that no two records will conflict with each other. The TTL helps us to keep the table small, by &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html" rel="noopener noreferrer"&gt;letting DynamoDB remove old records&lt;/a&gt;.&lt;/p&gt;
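&lt;p&gt;As a minimal sketch (the helper names here are my own, not from the repo), extracting an alias and composing such a sort key could look like this. Because ISO-8601 timestamps sort lexicographically in chronological order, DynamoDB returns mail in time order for free:&lt;/p&gt;

```typescript
// Sketch (hypothetical helper names): the alias is the part before the
// "@", and the sort key combines an ISO-8601 timestamp with a UUID so
// that records sort chronologically while staying unique.
function alias(address: string): string {
  return address.split("@")[0];
}

function sortKey(timestamp: string, id: string): string {
  return `${timestamp}#${id}`;
}

const earlier = sortKey("2020-10-15T08:00:00Z", "aaa-111");
const later = sortKey("2020-10-15T09:00:00Z", "bbb-222");
// lexicographic order equals chronological order for ISO-8601 strings
const ordered = [later, earlier].sort();
```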

&lt;p&gt;I'm using &lt;a href="https://github.com/jeremydaly/dynamodb-toolbox" rel="noopener noreferrer"&gt;Jeremy Daly's dynamodb-toolbox&lt;/a&gt; to model my database entities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Entity&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dynamodb-toolbox&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;v4&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;uuid&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;uuid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Require AWS SDK and instantiate DocumentClient&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;DynamoDB&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk/clients/dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DocumentClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;DynamoDB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DocumentClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Instantiate a table&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MailTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="c1"&gt;// Specify table name (used by DynamoDB)&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Define partition and sort keys&lt;/span&gt;
  &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Add the DocumentClient&lt;/span&gt;
  &lt;span class="nx"&gt;DocumentClient&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Entity&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Mail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="na"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// recipient address&lt;/span&gt;
      &lt;span class="na"&gt;sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="na"&gt;hidden&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; 
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MailTable&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Receiver
&lt;/h2&gt;

&lt;p&gt;SES allows us to set up &lt;code&gt;ReceiptRules&lt;/code&gt; which trigger actions when a new mail arrives. &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-actions-readme.html" rel="noopener noreferrer"&gt;There are multiple actions to choose from&lt;/a&gt;, but we are mostly interested in the Lambda and S3 actions. We use the Lambda action to store details like the recipient, the sender and the subject in a DynamoDB table. With the S3 action we get the raw email delivered as a file into a bucket. This will come in handy later to support more use cases like returning the mail's body and attachments.&lt;/p&gt;

&lt;p&gt;Below you can see the abbreviated CDK code to set up the &lt;code&gt;ReceiptRules&lt;/code&gt;. Please note that you have to activate the rule set in the AWS console. &lt;a href="https://github.com/aws/aws-cdk/issues/10321" rel="noopener noreferrer"&gt;There is currently no high level CDK construct for this&lt;/a&gt; and I don't want you to accidentally override an existing rule set. &lt;a href="https://youtu.be/00_sx_-SFc0" rel="noopener noreferrer"&gt;Here's a short video where I activate a rule set&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bucket&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-s3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;Function&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ReceiptRuleSet&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-ses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-ses-actions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InboxApiStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// your-domain.com&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;INBOX_DOMAIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawMailBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RawMail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TempMailMetadata&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postProcessFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PostProcessor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TABLE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantWriteData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;postProcessFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// after deploying the cdk stack you need to activate this ruleset&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReceiptRuleSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ReceiverRuleSet&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;recipients&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rawMailBucket&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lambda&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;postProcessFunction&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
          &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above CDK code in place, let's take a look at the Lambda function that is triggered when a new mail arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SESHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// the model uses dynamodb-toolbox&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SESHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commonHeaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// set the ttl as 7 days into the future and &lt;/span&gt;
        &lt;span class="c1"&gt;// strip milliseconds (ddb expects seconds for the ttl)&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function above maps the SES event into one record per recipient and stores them together with a TTL attribute in the database. &lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;You can find the full source code on GitHub&lt;/a&gt;.&lt;/p&gt;
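&lt;p&gt;The TTL computation is worth spelling out in isolation. DynamoDB expects the TTL attribute as a Unix timestamp in seconds, while JavaScript's &lt;code&gt;Date&lt;/code&gt; works in milliseconds. A minimal sketch (using a fixed offset instead of &lt;code&gt;setDate&lt;/code&gt;, and flooring to whole seconds) could look like this:&lt;/p&gt;

```typescript
// Sketch: compute a DynamoDB TTL 7 days in the future.
// DynamoDB expects the TTL attribute as a Unix timestamp in seconds;
// Date.getTime() returns milliseconds, so divide by 1000 and floor.
function ttlInSevenDays(from: Date): number {
  const sevenDaysMs = 7 * 24 * 60 * 60 * 1000;
  return Math.floor((from.getTime() + sevenDaysMs) / 1000);
}

const ttl = ttlInSevenDays(new Date("2020-10-15T00:00:00Z"));
```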

&lt;p&gt;Now that we receive mail directly into our database, let's build an API to access the mail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Read API
&lt;/h2&gt;

&lt;p&gt;The Read API consists of an API Gateway and a Lambda function with read access to the DynamoDB table. If you haven't built such an API before, &lt;a href="https://www.youtube.com/watch?v=XVHGq2uJu9s" rel="noopener noreferrer"&gt;I recommend that you check out Marcia's video on how to build serverless APIs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Below you can see the abbreviated CDK code to set up the API Gateway and Lambda function. &lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;You can find the full source code on GitHub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LambdaRestApi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-apigateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Table&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InboxApiStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TempMailMetadata&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ApiLambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TABLE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantReadData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LambdaRestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;InboxApi&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiFunction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API Gateway is able to directly integrate with DynamoDB, but to continue using the database model I built with &lt;a href="https://github.com/jeremydaly/dynamodb-toolbox" rel="noopener noreferrer"&gt;dynamodb-toolbox&lt;/a&gt; I have to go through a Lambda function. I also feel more comfortable writing TypeScript than &lt;a href="https://www.baeldung.com/apache-velocity" rel="noopener noreferrer"&gt;Apache Velocity Templates&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With the Lambda function below, we load mails for a particular recipient and can filter to only return mails that arrived after a given timestamp.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// the model uses dynamodb-toolbox&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queryStringParameters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing query parameter: recipient&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;queryParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Mail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;beginsWith&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mails&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying the read API, you can run a GET request that includes the recipient's mail address as the &lt;code&gt;recipient&lt;/code&gt; query parameter. You can further tweak your calls by providing a &lt;code&gt;since&lt;/code&gt; timestamp or a &lt;code&gt;limit&lt;/code&gt; that is greater than the default of &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, if you send an order confirmation to &lt;code&gt;random-uuid@inbox-api.domain.com&lt;/code&gt;, then you run a GET request against &lt;code&gt;https://YOUR_API_ENDPOINT/?recipient=random-uuid@inbox-api.domain.com&lt;/code&gt;.&lt;/p&gt;
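&lt;p&gt;If you prefer to assemble the request URL in code, here is a small sketch. &lt;code&gt;buildInboxUrl&lt;/code&gt; is a hypothetical helper, and the endpoint is the placeholder that &lt;code&gt;cdk deploy&lt;/code&gt; prints for your stack.&lt;/p&gt;

```typescript
// Sketch: assemble the read API call with the supported query parameters.
// buildInboxUrl is a hypothetical helper, not part of the linked repository.
function buildInboxUrl(
  endpoint: string,
  recipient: string,
  options: { since?: string; limit?: number } = {}
): string {
  // URLSearchParams takes care of encoding, e.g. the @ in the address
  const params = new URLSearchParams({ recipient });
  if (options.since) params.set('since', options.since);
  if (options.limit) params.set('limit', String(options.limit));
  return `${endpoint}/?${params.toString()}`;
}

const url = buildInboxUrl(
  'https://YOUR_API_ENDPOINT',
  'random-uuid@inbox-api.domain.com',
  { limit: 5 }
);
```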

&lt;h2&gt;
  
  
  Limitations and Potential Improvements
&lt;/h2&gt;

&lt;p&gt;While the &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html" rel="noopener noreferrer"&gt;SES sandbox restricts how many emails you can send&lt;/a&gt;, there seems to be no limitation on receiving mail.&lt;/p&gt;

&lt;p&gt;Our solution is not yet capable of providing attachments or the mail body. The &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-ses-actions.S3.html" rel="noopener noreferrer"&gt;SES S3 action&lt;/a&gt; already stores those in a bucket which can be used for an improved read API function.&lt;/p&gt;

&lt;p&gt;We could also drop the Lambda function that ties together the API Gateway and DynamoDB, by replacing it with a direct integration between the two services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;Check out the source code on GitHub&lt;/a&gt;. There's a step-by-step guide for you to try out this solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/bahrmichael/inbox-api" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ses-readme.html#email-receiving" rel="noopener noreferrer"&gt;Receipt Rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/send-email-simulator.html" rel="noopener noreferrer"&gt;Test email edge cases with the AWS mailbox simulator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html" rel="noopener noreferrer"&gt;DynamoDB TTL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Enjoyed this article? I publish a new article every month. Connect with me on &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; and &lt;a href="https://subscribe.bahr.dev/now" rel="noopener noreferrer"&gt;subscribe&lt;/a&gt; for new articles!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>testing</category>
    </item>
    <item>
      <title>Point Multiple Subdomains To The Same Frontend</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Thu, 03 Sep 2020 07:44:11 +0000</pubDate>
      <link>https://dev.to/michabahr/point-multiple-subdomains-to-the-same-frontend-2p2d</link>
      <guid>https://dev.to/michabahr/point-multiple-subdomains-to-the-same-frontend-2p2d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev" rel="noopener noreferrer"&gt;bahr.dev&lt;/a&gt;. &lt;br&gt;
&lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce" rel="noopener noreferrer"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Back in 2019 I built an online ticketshop for sports clubs. At its core, the shop was a webapp that processes payments and sends PDFs via email. When it came to customization, things got tricky: Each club had a different name, different pictures, and sometimes even different questions they wanted to ask their customers. To give each of the clubs a customized experience, we provided each of them with their own subdomain. Eventually there were six different frontend deployments and multiple branches, and the code bases started to diverge. Recently I learned that you can use DNS ARecords to route all requests under a certain domain to the same frontend. Thanks to &lt;a href="https://twitter.com/handk85" rel="noopener noreferrer"&gt;DongGyun&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;This article explains how you can point multiple subdomains to the same frontend deployment by creating DNS records and a static website with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt;. That will enable you to give each of your customers a customized experience, while having just one frontend deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia3.giphy.com%2Fmedia%2FkdLSRH6v4dSc5eNNQ4%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia3.giphy.com%2Fmedia%2FkdLSRH6v4dSc5eNNQ4%2Fgiphy.gif" alt="Wildcard Domains Demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shortcut&lt;/strong&gt;: If you don't need Infrastructure as Code (IaC), then an ARecord in Route 53 with &lt;code&gt;*.yourdomain.com&lt;/code&gt; that points to your existing CloudFront distribution gets you the same result.&lt;/p&gt;
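&lt;p&gt;For reference, a minimal CDK sketch of that shortcut, assuming a &lt;code&gt;hostedZone&lt;/code&gt; and an existing CloudFront &lt;code&gt;distribution&lt;/code&gt; are defined elsewhere in your stack:&lt;/p&gt;

```typescript
// CDK v1 sketch: one wildcard alias record pointing at an existing
// CloudFront distribution. hostedZone and distribution are assumed
// to exist elsewhere in your stack.
import { ARecord, RecordTarget } from '@aws-cdk/aws-route53';
import { CloudFrontTarget } from '@aws-cdk/aws-route53-targets';

new ARecord(this, 'WildcardRecord', {
  zone: hostedZone,
  // routes every subdomain, e.g. bear.yourdomain.com
  recordName: '*.yourdomain.com',
  target: RecordTarget.fromAlias(new CloudFrontTarget(distribution)),
});
```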

&lt;p&gt;The magic is in the chapter "Wildcard Routing". &lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;Check out the full source code on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To deploy the solution of this article, you should have an AWS account and some experience with the &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;. It's also good to have an unused domain registered in Amazon Route 53, but we will also cover other providers and domains that are already in use.&lt;/p&gt;

&lt;p&gt;This article uses CDK version 1.60.0. Let me know if anything breaks in newer versions! &lt;/p&gt;

&lt;p&gt;Please bootstrap your account for CDK by running &lt;code&gt;cdk bootstrap&lt;/code&gt;. We will need this for the &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Optional: &lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/" rel="noopener noreferrer"&gt;Understanding how DNS and especially nameservers work&lt;/a&gt; will help you a lot with troubleshooting potential routing issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;Let's find a solution by putting ourselves in the customers' shoes. As a customer I want to go to &lt;code&gt;bear.picture.bahr.dev&lt;/code&gt; or &lt;code&gt;forest.picture.bahr.dev&lt;/code&gt; or any other address in the format &lt;code&gt;*.picture.bahr.dev&lt;/code&gt;, and then see a picture for the word at the beginning of the address. As a developer I want the least amount of complexity possible. Multiple frontend deployments increase complexity.&lt;/p&gt;

&lt;p&gt;The request flow would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fuser-flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fuser-flow.png" alt="Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see above that only the domain changes, but nothing else. At &lt;strong&gt;the core of the solution&lt;/strong&gt; are &lt;strong&gt;wildcard ARecords&lt;/strong&gt; which let us route traffic for any subdomain to a particular target. The website can then take the URL, extract the subdomain and ask for the right picture. In the next chapter we will take a look at each part in detail.&lt;/p&gt;
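&lt;p&gt;Extracting the subdomain can be as simple as splitting the hostname. This is a sketch with a hypothetical helper; in the browser you would pass &lt;code&gt;window.location.hostname&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical helper: pull the wildcard label out of the hostname,
// e.g. "bear" from "bear.picture.bahr.dev". In the browser you would
// call it with window.location.hostname.
function extractSubdomain(hostname: string): string | undefined {
  const labels = hostname.split('.');
  // picture.bahr.dev has three labels; anything in front of
  // that is the wildcard label we want
  return labels.length > 3 ? labels[0] : undefined;
}

extractSubdomain('bear.picture.bahr.dev'); // 'bear'
```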

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Froute53-preview.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Froute53-preview.png" alt="Route 53 ARecords"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Create A Hosted Zone
&lt;/h2&gt;

&lt;p&gt;To register DNS records in AWS, we need to &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zones-working-with.html" rel="noopener noreferrer"&gt;create a Hosted Zone in Route 53&lt;/a&gt;. &lt;a href="https://aws.amazon.com/route53/pricing/" rel="noopener noreferrer"&gt;Each Hosted Zone costs $0.50 per month&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Hosted Zone is easiest to set up if you have a domain that is managed by Route 53 and that you don't use for anything else yet. &lt;/p&gt;

&lt;p&gt;We will also look at how you can set up your Hosted Zone if you are already using your Route 53 domain for another purpose (e.g. your blog) or if that domain is managed by a different provider than Route 53.&lt;/p&gt;

&lt;p&gt;Depending on who manages your domain (e.g. Route 53 or GoDaddy) and whether you already use your apex domain for other websites, you have to tweak the solution a bit. In my example, I already use my apex domain &lt;code&gt;bahr.dev&lt;/code&gt; for my blog, and have the domain managed by GoDaddy. We will see how to specify the right records there in the following chapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Before deleting hosted zones, please make sure you delete all related records in the root hosted zone or third party provider. Dangling CNAME and NS records might &lt;a href="https://searchsecurity.techtarget.com/answer/What-is-subdomain-takeover-and-why-does-it-matter" rel="noopener noreferrer"&gt;allow an attacker to serve content in your name&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1. Fresh Domain That Is Managed By Route 53
&lt;/h3&gt;

&lt;p&gt;This is the easiest path. All we need is a Hosted Zone for our domain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Route 53 can now serve DNS records for that domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2. Used Domain That Is Managed By Route 53
&lt;/h3&gt;

&lt;p&gt;This assumes that you already have a Hosted Zone for your apex domain, use your apex domain for something different and want to use a subdomain instead. An apex domain is your registered root domain, e.g. &lt;code&gt;bahr.dev&lt;/code&gt; or &lt;code&gt;google.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We need to tell the DNS servers that information about the subdomain lives in another Hosted Zone. We do this by creating a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;// bahr.dev is already in use, so we'll start &lt;/span&gt;
&lt;span class="c1"&gt;// at the subdomain picture.bahr.dev&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apexDomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bahr.dev&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;apexDomain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// as above we create a hostedzone for the subdomain&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// add a ZoneDelegationRecord so that requests for *.picture.bahr.dev &lt;/span&gt;
&lt;span class="c1"&gt;// and picture.bahr.dev are handled by our newly created HostedZone&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nameServers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostedZoneNameServers&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rootZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Zone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apexDomain&lt;/span&gt; 
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZoneDelegationRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Delegation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;nameServers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rootZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A low time to live (TTL) allows for faster trial and error, as DNS caches expire quicker. You should increase it as you get ready for production.&lt;/p&gt;

&lt;p&gt;We will later add ARecords, so that requests to &lt;code&gt;picture.bahr.dev&lt;/code&gt; and &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; go to the same CloudFront distribution. &lt;code&gt;bahr.dev&lt;/code&gt; will not be affected.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.3. Domain Is Managed By A Provider Other Than AWS
&lt;/h3&gt;

&lt;p&gt;Again we will create a Hosted Zone in Route 53, but this time we need to do some manual work to register the nameservers of our Hosted Zone with our DNS provider. To get started, first create a Hosted Zone through the AWS console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fcreate-hosted-zone.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fcreate-hosted-zone.png" alt="Create Hosted Zone"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will give us a Hosted Zone with two entries for Nameservers (NS) and Start Of Authority (SOA). We will copy the authoritative nameserver, and tell our DNS provider to delegate requests to our Hosted Zone in AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fhosted-zone-records.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fhosted-zone-records.png" alt="Hosted Zone Records"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the authoritative nameserver from the SOA record, go to your DNS provider and create a nameserver record, where you replace the values for &lt;code&gt;Name&lt;/code&gt; and &lt;code&gt;Value&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Type: NS
Name: picture
Value: ns-1332.awsdns-38.org
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use a specific value like &lt;code&gt;picture&lt;/code&gt; if you want to start at a subdomain like &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; or use &lt;code&gt;@&lt;/code&gt; if you want to use your apex domain like &lt;code&gt;*.bahr.dev&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fgodaddy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbahrmichael%2Fbahrmichael.github.io%2Fraw%2Fmaster%2Fpictures%2F2020%2Fwildcarddomains%2Fgodaddy.png" alt="Nameserver Record GoDaddy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then use the following CDK snippet to import the Hosted Zone that you created manually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; 
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Certificate
&lt;/h2&gt;

&lt;p&gt;Now that we have DNS routing set up, we can request and validate a certificate. We need this certificate to serve our website over HTTPS.&lt;/p&gt;

&lt;p&gt;With the CDK we can create and validate a certificate with a single construct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-certificatemanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Certificate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;subjectAlternativeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;validationDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;validationMethod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DNS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a lot going on here, so let's break it down. &lt;/p&gt;

&lt;p&gt;First we set the region to &lt;code&gt;us-east-1&lt;/code&gt;, because &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/migrate-ssl-cert-us-east/" rel="noopener noreferrer"&gt;CloudFront requires certificates to be in &lt;code&gt;us-east-1&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We then use the CDK construct &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;, which creates a certificate request plus a Lambda function that registers the validation CNAME record in Route 53. That record proves that we actually own the domain. &lt;/p&gt;

&lt;p&gt;The parameter &lt;code&gt;hostedZone&lt;/code&gt; specifies the Hosted Zone into which the validation record is written. This is the Hosted Zone we created earlier.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;domainName&lt;/code&gt; and &lt;code&gt;subjectAlternativeNames&lt;/code&gt; specify which domains the certificate should be valid for. The remaining parameters configure the validation process.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Frontend Deployment
&lt;/h2&gt;

&lt;p&gt;With the certificate in place, we can create a Single Page Application (SPA) deployment via S3 and CloudFront. We're using the npm package &lt;a href="https://www.npmjs.com/package/cdk-spa-deploy" rel="noopener noreferrer"&gt;cdk-spa-deploy&lt;/a&gt; to reduce the amount of code required for configuring the S3 bucket and attaching a CloudFront distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SPADeploy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cdk-spa-deploy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SPADeploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaDeployment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSiteWithCloudfront&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;indexDoc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;websiteFolder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;certificateARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificateArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;cfAliases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;index.html&lt;/code&gt; can be an HTML file as short as &lt;code&gt;&amp;lt;p&amp;gt;Hello world!&amp;lt;/p&amp;gt;&lt;/code&gt; and should be stored in the folder &lt;code&gt;./website&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the browser we can use JavaScript to read the subdomain. The line of code below splits the host &lt;code&gt;ice.picture.bahr.dev&lt;/code&gt; into the array &lt;code&gt;['ice', 'picture', 'bahr', 'dev']&lt;/code&gt; and picks the first element, &lt;code&gt;'ice'&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that information, the website can then contact the CMS to get the right assets for your customer.&lt;/p&gt;
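&lt;p&gt;Note that the one-liner above also yields a value on the base domain itself (&lt;code&gt;picture.bahr.dev&lt;/code&gt; would yield &lt;code&gt;picture&lt;/code&gt;). As a sketch of a slightly more defensive helper (the function name and the null convention are my own assumptions, not part of the original setup):&lt;/p&gt;

```typescript
// Hypothetical helper: extract the customer subdomain from a host name.
// Returns null when the visitor is on the base domain itself, or on an
// unrelated host such as a raw CloudFront URL.
function getSubdomain(host: string, baseDomain: string): string | null {
  if (host === baseDomain) {
    return null; // no customer subdomain
  }
  if (!host.endsWith('.' + baseDomain)) {
    return null; // unknown host
  }
  // Keep everything before ".baseDomain", e.g. "ice" or "a.b".
  return host.slice(0, host.length - baseDomain.length - 1);
}
```

&lt;p&gt;With this, &lt;code&gt;getSubdomain('ice.picture.bahr.dev', 'picture.bahr.dev')&lt;/code&gt; returns &lt;code&gt;'ice'&lt;/code&gt;, while the base domain returns &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;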

&lt;h2&gt;
  
  
  4. Wildcard Routing
&lt;/h2&gt;

&lt;p&gt;And finally it's time for the wildcard routing. With the CDK code below, all requests to &lt;code&gt;*.picture.bahr.dev&lt;/code&gt; and &lt;code&gt;picture.bahr.dev&lt;/code&gt; will be routed to the frontend deployment we set up above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CloudFrontTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-route53-targets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAlias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CloudFrontTarget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;distribution&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WildCardARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once all the DNS records have propagated, we can test our setup. Please note that deploying the whole solution sometimes takes 10 to 15 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Here's the full CDK code that you can copy into your existing CDK codebase. &lt;/p&gt;

&lt;p&gt;I suggest you &lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;start by checking out the source code&lt;/a&gt; and adjusting the domain and Hosted Zone to your needs. Add a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt; if you need it. Make sure to run &lt;code&gt;cdk bootstrap&lt;/code&gt; if you haven't done so yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SPADeploy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cdk-spa-deploy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-certificatemanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CloudFrontTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-cdk/aws-route53-targets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-cdk/aws-route53&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WildcardSubdomainsStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`picture.bahr.dev`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HostedZone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HostedZone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zoneName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DnsValidatedCertificate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Certificate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;subjectAlternativeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;validationDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;validationMethod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ValidationMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DNS&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SPADeploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaDeployment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSiteWithCloudfront&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;indexDoc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;websiteFolder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;certificateARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificateArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="na"&gt;cfAliases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RecordTarget&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAlias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CloudFrontTarget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;distribution&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ARecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WildCardARecord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hostedZone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recordName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`*.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfrontTarget&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run &lt;code&gt;AWS_PROFILE=myProfile npm run deploy&lt;/code&gt; to deploy the solution. Replace &lt;code&gt;myProfile&lt;/code&gt; with whatever profile you're using for AWS. &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html" rel="noopener noreferrer"&gt;Here's more about AWS profiles&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The deployment may take somewhere between 10 and 15 minutes. Grab a coffee and let CDK do its thing. If you run into problems, check the troubleshooting section below.&lt;/p&gt;

&lt;p&gt;Once the deployment is done, you should be able to visit any subdomain of the domain you specified (e.g. &lt;code&gt;bear.picture.bahr.dev&lt;/code&gt; for the domain &lt;code&gt;picture.bahr.dev&lt;/code&gt;) and see your website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The DNS routing doesn't work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A high time to live (TTL) on DNS records can make changes difficult to test. Lower the TTL as much as possible while you test.&lt;/p&gt;

&lt;p&gt;If your domain is not managed by Route 53, make sure that the DNS routing from your DNS provider is set up correctly.&lt;/p&gt;

&lt;p&gt;If you use your apex domain for something else, make sure to set up a &lt;code&gt;ZoneDelegationRecord&lt;/code&gt; that delegates DNS queries for your subdomain to your new Hosted Zone.&lt;/p&gt;
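&lt;p&gt;For those who need it, here is a rough CDK sketch of such a delegation. Treat it as an illustration under assumptions: the construct IDs are made up, the apex zone is assumed to be managed in Route 53 as well, and &lt;code&gt;hostedZone&lt;/code&gt; refers to the zone for &lt;code&gt;picture.bahr.dev&lt;/code&gt; created earlier.&lt;/p&gt;

```typescript
import { HostedZone, ZoneDelegationRecord } from '@aws-cdk/aws-route53';

...

// Look up the apex zone (assumed to be managed in Route 53 too).
const apexZone = HostedZone.fromLookup(this, 'ApexZone', {
  domainName: 'bahr.dev',
});

// Delegate DNS queries for the subdomain to the new Hosted Zone.
new ZoneDelegationRecord(this, 'PictureDelegation', {
  zone: apexZone,
  recordName: 'picture.bahr.dev',
  nameServers: hostedZone.hostedZoneNameServers!,
});
```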

&lt;blockquote&gt;
&lt;p&gt;The deployment failed to clean up.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Depending on the step at which the deployment fails, not all resources may get cleaned up. The most likely culprit is the CNAME record created by the Lambda function of the &lt;code&gt;DnsValidatedCertificate&lt;/code&gt;. Go to the Hosted Zone, remove the CNAME record, and delete the stack by running &lt;code&gt;cdk destroy&lt;/code&gt; or through the AWS console's CloudFormation service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Failed to create resource. Cannot read property 'Name' of undefined&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Clean up and remove the stack, then redeploy it. I'm not sure what causes this error, but retrying fixed it for me.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The certificate validation times out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Make sure that the CNAME record required for validation is actually visible to public DNS servers. If you've used your domain before, set up the right &lt;code&gt;ZoneDelegationRecord&lt;/code&gt;. This can be a bit tricky, so feel free to &lt;a href="https://twitter.com/bahrdev" rel="noopener noreferrer"&gt;reach out to me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/bahrmichael/wildcard-subdomains" rel="noopener noreferrer"&gt;Check out the full source code&lt;/a&gt; and try it yourself! If you'd like to contribute, a PR to &lt;a href="https://github.com/cdk-patterns" rel="noopener noreferrer"&gt;cdk patterns&lt;/a&gt; is probably a good idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/" rel="noopener noreferrer"&gt;What is DNS?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freecodecamp.org/news/how-to-host-a-static-website-with-s3-cloudfront-and-route53-7cbb11d4aeea/" rel="noopener noreferrer"&gt;Host a static website with CloudFront and S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html" rel="noopener noreferrer"&gt;AWS profiles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/nideveloper/CDK-SPA-Deploy" rel="noopener noreferrer"&gt;cdk-spa-deploy on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cdk</category>
      <category>ux</category>
    </item>
    <item>
      <title>Archive your AWS data to reduce storage cost</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 10 Aug 2020 08:41:34 +0000</pubDate>
      <link>https://dev.to/michabahr/archive-your-aws-data-to-reduce-storage-cost-364c</link>
      <guid>https://dev.to/michabahr/archive-your-aws-data-to-reduce-storage-cost-364c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;br&gt;
&lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS offers a variety of general purpose storage solutions. While DynamoDB is the best option when latency and a variety of access patterns matter most, S3 allows for cost reduction when access patterns are less complex and latency is less critical.&lt;/p&gt;

&lt;p&gt;This article describes the available options for archiving data, how to prepare that data for long-term archival, and how to let S3 transition data between storage tiers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OLQjejbX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_storage_vs_latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OLQjejbX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_storage_vs_latency.png" alt="Storage Price vs Latency"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below you'll find a table comparing the prices and access latencies as of August 2020.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;All prices are for us-east-1. This article focuses on storage cost only.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You should have an &lt;strong&gt;AWS account&lt;/strong&gt; and gained &lt;strong&gt;first experience with DynamoDB or S3&lt;/strong&gt;. The code snippets are written in Python and are intended to run on &lt;strong&gt;AWS Lambda&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moving Data
&lt;/h2&gt;

&lt;p&gt;As you've seen in the previous table, you can achieve a significant storage cost reduction by moving your data to a cheaper storage solution.&lt;/p&gt;

&lt;p&gt;There are 3 major paths when archiving data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DynamoDB to S3&lt;/li&gt;
&lt;li&gt;S3 Storage Tiers&lt;/li&gt;
&lt;li&gt;Final Archival with S3 Glacier&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first path requires a Lambda function; the others can be achieved without additional glue code. We will, however, look at data aggregation for small objects, as infrequent-access tiers are less suitable for them.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB to S3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v4Ldux8F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v4Ldux8F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_1.png" alt="Badge for DynamoDB to S3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data from DynamoDB to S3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Moving data out of DynamoDB makes sense when that data is becoming stale but remains interesting for future use cases.&lt;/p&gt;

&lt;p&gt;Performance metrics are one example: we're most interested in the recent weeks and rarely look at data from months ago, but we still want to keep it around for later analysis or troubleshooting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data from DynamoDB to S3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To move data from DynamoDB to S3, we can use DynamoDB's Time to Live (TTL) feature in combination with event streams. This approach requires four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Specifying a TTL attribute on a DynamoDB table&lt;/li&gt;
&lt;li&gt;Adding an expiry timestamp to records that should expire&lt;/li&gt;
&lt;li&gt;Activating a stream that DynamoDB will emit deleted records to&lt;/li&gt;
&lt;li&gt;Attaching a lambda to this stream, which checks for &lt;code&gt;REMOVE&lt;/code&gt; events and writes the deleted records into an S3 bucket&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first three steps are covered in my article &lt;a href="https://bahr.dev/2020/02/02/aggregate-ddb/"&gt;How to analyse and aggregate data from DynamoDB&lt;/a&gt;.&lt;/p&gt;
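&lt;p&gt;As a quick sketch of steps 1 and 2, the TTL configuration and an expiring record can look like the following (the table name, key and attribute name are hypothetical; the parameters match boto3's &lt;code&gt;update_time_to_live&lt;/code&gt;):&lt;/p&gt;

```python
import time

# Step 1: parameters for dynamodb.update_time_to_live(**ttl_params),
# where dynamodb = boto3.client('dynamodb').
# Table and attribute names are hypothetical.
ttl_params = {
    'TableName': 'my-metrics-table',
    'TimeToLiveSpecification': {
        'Enabled': True,
        'AttributeName': 'ttl',
    },
}

# Step 2: give each record an epoch-seconds timestamp after which
# DynamoDB may delete it (here: roughly 90 days from now).
expiry = int(time.time()) + 90 * 24 * 60 * 60
item = {
    'Id': {'N': '42'},
    'ttl': {'N': str(expiry)},
}
```

&lt;p&gt;DynamoDB deletes expired items some time after the timestamp passes, and each deletion shows up as a &lt;code&gt;REMOVE&lt;/code&gt; event on the stream.&lt;/p&gt;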

&lt;p&gt;The lambda function to transition records to S3 can be as short as the following snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Records'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'eventName'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;'DELETE'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'dynamodb'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'NewImage'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# this assumes that there is a partition key called Id 
&lt;/span&gt;        &lt;span class="c1"&gt;# which is a number, and that there is no sort key
&lt;/span&gt;        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'dynamodb'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'Keys'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'Id'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'N'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;object_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"data-from-dynamodb/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'my_bucket'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will store deleted records in the S3 bucket &lt;code&gt;my_bucket&lt;/code&gt;. No data is lost, the DynamoDB table stays small, and you get an instant 90% cost reduction on storage.&lt;/p&gt;
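&lt;p&gt;The 90% figure follows directly from the storage prices (us-east-1, August 2020: roughly $0.25 per GB-month for DynamoDB versus $0.023 for S3 Standard):&lt;/p&gt;

```python
# Approximate us-east-1 storage prices in USD per GB-month (August 2020).
dynamodb_storage = 0.25
s3_standard = 0.023

savings = 1 - s3_standard / dynamodb_storage
print(f"storage savings: {savings:.0%}")  # about 91%
```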

&lt;h3&gt;
  
  
  S3 Storage Tiers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kv1nQYBR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kv1nQYBR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_2.png" alt="Badge for S3 Storage Tiers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your data is accessed infrequently&lt;/strong&gt; you can achieve further cost savings by picking the right storage tier.&lt;/p&gt;

&lt;p&gt;S3 Lifecycle Transitions allow us to move objects between storage tiers without them leaving the S3 bucket. We define rules where we specify which storage tier an object shall be moved to once it reaches a certain age.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7tNYbfQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://docs.aws.amazon.com/AmazonS3/latest/dev/images/lifecycle-transitions-v2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7tNYbfQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://docs.aws.amazon.com/AmazonS3/latest/dev/images/lifecycle-transitions-v2.png" alt="S3 Transition Paths"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can read all about the S3 lifecycle transitions on the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-transition-general-considerations.html"&gt;official AWS documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data between S3 tiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S3 Standard gives you the most reliability and fastest access speed. If you're okay with 99.9% availability or you only access your data rarely (e.g. for regulatory checks), then the non-standard tiers can give you a cost advantage. The durability of your data is not affected (unless you pick the One Zone-IA tier).&lt;/p&gt;

&lt;p&gt;You should also aggregate data before moving it to a storage tier other than S3 Standard or S3 Intelligent-Tiering, as there is a &lt;a href="https://aws.amazon.com/s3/storage-classes/"&gt;minimum capacity charge per object&lt;/a&gt;. As a rule of thumb, aggregate your objects until the result is at least 1MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data between S3 tiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Due to the minimum capacity charge, we will &lt;strong&gt;start by aggregating data&lt;/strong&gt;. If all of your objects in S3 are already 1MB or larger, you can skip directly to the lifecycle rules. To aggregate objects we can use any compute service (EC2, Fargate, Lambda) to load objects from S3, aggregate them and write the aggregated data back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;simplejson&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;one_mb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'my_bucket'&lt;/span&gt;
&lt;span class="n"&gt;date_prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'data-from-dynamodb/2020-07'&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Downloading data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;files_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Contents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Key'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Body'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Aggregating data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;aggregated_objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;aggregator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;aggregator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;one_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"aggregated/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;aggregator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'data'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"aggregated/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Deleting data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_prefix&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Contents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delete_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;obj_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Key'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Done"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code snippet uses &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html"&gt;boto3&lt;/a&gt; to load all records from the folder &lt;code&gt;data-from-dynamodb/2020-07&lt;/code&gt;, aggregate them, upload the result into the folder &lt;code&gt;aggregated&lt;/code&gt; and then delete the old objects. Note that &lt;code&gt;list_objects&lt;/code&gt; returns at most 1,000 keys per call; use a paginator for larger folders.&lt;/p&gt;

&lt;p&gt;Now that we've packaged our objects, let's continue with &lt;strong&gt;lifecycle transitions&lt;/strong&gt;. S3 can be configured to automatically move objects between storage tiers.&lt;/p&gt;

&lt;p&gt;In this article we will configure the lifecycle transitions through the AWS console. You can also use the CDK's &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.LifecycleRule.html"&gt;LifecycleRules&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.Transition.html"&gt;Transition&lt;/a&gt;s to build an Infrastructure as Code solution.&lt;/p&gt;
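&lt;p&gt;If you prefer scripting over clicking, the same rule can also be expressed as a boto3 lifecycle payload (the bucket name is hypothetical; the prefix matches the example above):&lt;/p&gt;

```python
# Lifecycle rule: move objects under aggregated/ to Standard-IA
# after 30 days. Apply it with a boto3 S3 client:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket='my_bucket', LifecycleConfiguration=lifecycle)
lifecycle = {
    'Rules': [{
        'ID': 'archive-aggregated-objects',
        'Filter': {'Prefix': 'aggregated/'},
        'Status': 'Enabled',
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},
        ],
    }]
}
```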

&lt;p&gt;To get started, open your S3 bucket in the AWS console and open the Management tab. Click on "Add lifecycle rule" to configure a lifecycle. By applying the lifecycle rule to the folder &lt;code&gt;aggregated&lt;/code&gt;, we only transition data which has been packaged for archival.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zE8zFWsl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zE8zFWsl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_1.png" alt="Lifecycle Rules Step 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specify a transition to Standard-IA (Infrequent Access) after 30 days. We're assuming here that the data will be archived and therefore infrequently accessed, but you can increase this number or pick another storage tier as you see fit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wpV8g4JL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wpV8g4JL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_2.png" alt="Lifecycle Rules Step 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Review and complete the lifecycle rule.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jQ2c0N9Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jQ2c0N9Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_step_3.png" alt="Lifecycle Rules Step 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After 30 days you should start to see in your bill that some objects are priced at a less expensive storage tier. If you picked S3 Infrequent Access, that's another 45% saved on storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data on Ice with S3 Glacier
&lt;/h3&gt;

&lt;p&gt;While we're only looking at Glacier here, you can apply the same principles to moving data to Intelligent-Tiering, One Zone-IA and Glacier Deep Archive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0lnPwRnT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0lnPwRnT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_badge_3.png" alt="Badge for S3 Glacier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to move data to Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S3 Glacier and S3 Glacier Deep Archive become interesting options when you need to store data for a very long time (5+ years) and access it only rarely (once or twice a year, or less).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to move data to Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As we've previously aggregated our data, we can add additional lifecycle transitions to move the data from S3 Infrequent Access to S3 Glacier. Instead of the Infrequent Access tier, now pick a Glacier option and adjust the time before transition accordingly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Nra_tC7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_glacier.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Nra_tC7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/data_archival_glacier.png" alt="Lifecycle Rule Glacier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it. Your data is now on ice, and you get an additional 68% cost reduction on storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to retrieve data from Glacier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To access data that is stored in Glacier, &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/user-guide/restore-archived-objects.html"&gt;you have to restore a copy of the object&lt;/a&gt;. The copy will be available for as long as you specified. The retrieval, however, can take up to 12 hours.&lt;/p&gt;
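&lt;p&gt;With boto3, restoring works through the &lt;code&gt;restore_object&lt;/code&gt; call. A minimal sketch (bucket and key are hypothetical):&lt;/p&gt;

```python
# Keep the restored copy around for 7 days; the 'Standard' retrieval
# tier typically completes within hours ('Expedited' and 'Bulk'
# trade cost against speed).
restore_request = {
    'Days': 7,
    'GlacierJobParameters': {'Tier': 'Standard'},
}
# With a boto3 S3 client:
#   s3.restore_object(Bucket='my_bucket',
#                     Key='aggregated/2020-07/0',
#                     RestoreRequest=restore_request)
```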

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Do you have big DynamoDB tables? Figure out what data you can archive, and start moving it to S3 for a 90% cost reduction!&lt;/p&gt;

&lt;p&gt;Do you already have data in S3? Add a lifecycle transition to a lower tier and aggregate objects if needed.&lt;/p&gt;

&lt;p&gt;Did you enjoy this article? &lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Sign up for the mailing list&lt;/a&gt; and get new articles like this straight to your inbox!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing Pages for &lt;a href="https://aws.amazon.com/dynamodb/pricing/"&gt;DynamoDB&lt;/a&gt; and &lt;a href="https://aws.amazon.com/s3/pricing/"&gt;S3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/database/automatically-archive-items-to-s3-using-dynamodb-time-to-live-with-aws-lambda-and-amazon-kinesis-firehose/"&gt;Use Kinesis to move data from DynamoDB to S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/whitfin/s3-utils"&gt;A tool to concat files in S3&lt;/a&gt;, might be helpful to optimize aggregation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/33200748/1309035"&gt;Use S3's multipart upload to aggregate objects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>database</category>
      <category>lambda</category>
    </item>
    <item>
      <title>How to pick the right Compute Savings Plan for Serverless Workloads on AWS</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Tue, 30 Jun 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-pick-the-right-compute-savings-plan-for-serverless-workloads-on-aws-24n1</link>
      <guid>https://dev.to/michabahr/how-to-pick-the-right-compute-savings-plan-for-serverless-workloads-on-aws-24n1</guid>
      <description>&lt;p&gt;&lt;a href="https://aws.amazon.com/savingsplans/faq/"&gt;Compute Savings Plans&lt;/a&gt; are a flexible approach to lowering your AWS bill by committing to an hourly spending. Finding the right commitment can however be tricky when we consider free tiers, varying workloads, available budgets and your plans for the next years. This article describes the available options for serverless workloads, how to pick the right plan and how to improve on existing plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;This article applies to you if you will &lt;strong&gt;spend at least $30 per month on AWS Lambda or Fargate&lt;/strong&gt; for at least a year. Below this amount, it is likely that you will overpay. AWS starts to give recommendations only once you exceed $0.10 per hour (about $72 per month).&lt;/p&gt;

&lt;p&gt;To use this guide effectively, &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/ce-enable.html"&gt;you should have enabled the Cost Explorer for at least 2 months&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article focuses on a &lt;strong&gt;single account&lt;/strong&gt;. If you use AWS Organizations you can still apply what you learn here, but &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/what-is-savings-plans.html"&gt;check the docs regarding multi account setups&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Savings Plans will first apply to usage in the account that owns the plan, and then apply to usage in other accounts in the AWS Organization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To simplify this article, I will ignore EC2 and DynamoDB. Have a look at &lt;a href="https://aws.amazon.com/blogs/aws/dynamodb-price-reduction-and-new-reserved-capacity-model/"&gt;DynamoDB Reserved Capacity&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide"&gt;AWS’s Guide to Savings Plans&lt;/a&gt; if you’re curious about lower prices for DynamoDB and EC2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;The numbers in this article are based on my understanding of public documentation and might be wrong. Start with a small savings plan and add more later to avoid overpaying.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do Savings Plans save money?
&lt;/h2&gt;

&lt;p&gt;With a Savings Plan you pay for compute before you use it, and in exchange your &lt;a href="https://aws.amazon.com/savingsplans/pricing/"&gt;rates for that purchase are lowered&lt;/a&gt; by up to &lt;strong&gt;17% for Lambda&lt;/strong&gt; and up to &lt;strong&gt;52% for Fargate&lt;/strong&gt;. In this article we will refer to the hourly commitment as prepaid compute.&lt;/p&gt;

&lt;p&gt;Once purchased, a Savings Plan gives you hourly packages of prepaid compute that compute services such as Lambda and Fargate can use. Any additional compute above the prepaid amount is priced at regular On-Demand rates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PzmkPlSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplans_sp_flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PzmkPlSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplans_sp_flow.png" alt="Savings Plans generate prepaid compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the compute usage matches the prepaid amount, you don’t generate any additional spending beyond what you already paid for. Achieving this perfect fit is difficult as compute usage tends to vary in serverless environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gwqVUrle--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_full.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gwqVUrle--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_full.png" alt="100% usage of prepaid compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When your services use more compute than what you already prepaid for, additional compute will be charged at On-Demand rates. This happens automatically and there’s nothing more you need to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TY6xFG2V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_ondemand.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TY6xFG2V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_ondemand.png" alt="More usage than what's prepaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When your services use less compute than what you prepaid for, the remaining amount is not carried over into the next hour, but discarded. You pay the commitment whether you use it or not. This may happen if you have irregular workloads (e.g. nightly jobs) or if the free tier covers a lot of your spending.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X_MNXgFX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_overpay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X_MNXgFX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_usage_overpay.png" alt="Less usage than what's prepaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Savings Plan will show up as a reduction of compute spending on your AWS bill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oUAaa6KT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_lambda_bill.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oUAaa6KT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_lambda_bill.png" alt="AWS bill with prepaid Lambda"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below you see a picture of my recent compute spending (filtered to Lambda). While the first few days don’t incur any spending thanks to the free tier, the following days show spending as well as savings (the red bar in the negatives). My Savings Plan covers roughly $0.60 each day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nkvylL8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_applied_savings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nkvylL8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_applied_savings.png" alt="Applied savings in cost explorer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this does not mean that I save $0.60 each day. Instead, I bought the compute in advance and received a 17% discount compared to On-Demand. That prepaid compute now lowers my daily spending.&lt;/p&gt;

&lt;h2&gt;
  
  
  What options are there?
&lt;/h2&gt;

&lt;p&gt;When picking a savings plan, there are two major questions to answer: For how long do you commit and when do you want to pay?&lt;/p&gt;

&lt;p&gt;You can pick a term length of 1 or 3 years. The longer the term, the better your rates for Fargate. For Lambda, the term length doesn’t appear to affect pricing. As for the payment options, you can choose between “No upfront”, “Partial upfront” and “All upfront”. The more you pay upfront, the better your rates.&lt;/p&gt;

&lt;p&gt;What do these upfront terms mean?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No upfront: Monthly charges&lt;/li&gt;
&lt;li&gt;Partial upfront: 50% when you buy the savings plan, the rest as monthly charges&lt;/li&gt;
&lt;li&gt;All upfront: One payment when you buy the plan&lt;/li&gt;
&lt;/ul&gt;
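&lt;p&gt;As a rough sketch (assuming the 30-day months used in this article’s examples, and assuming the non-upfront remainder is spread evenly over the term), the three options split a commitment like this:&lt;/p&gt;

```python
def payment_schedule(hourly_commitment, years, option):
    """Split a Savings Plan commitment into (upfront, monthly) payments.

    Illustrative only: assumes 30-day months and an even monthly
    spread of the non-upfront remainder.
    """
    months = years * 12
    total = hourly_commitment * 24 * 30 * months
    if option == "all_upfront":
        return total, 0.0
    if option == "partial_upfront":
        upfront = total / 2
        return upfront, (total - upfront) / months
    if option == "no_upfront":
        return 0.0, total / months
    raise ValueError(f"unknown payment option: {option}")
```

&lt;p&gt;For example, a $0.85 hourly commitment over 1 year with partial upfront comes out to $3,672 upfront and about $306 per month.&lt;/p&gt;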

&lt;p&gt;Here is an overview of &lt;a href="https://aws.amazon.com/savingsplans/pricing/"&gt;the possible savings rates for us-east-1 as of 2020-06-28&lt;/a&gt;. Please check the rates for your region before making any purchase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment Option&lt;/th&gt;
&lt;th&gt;Term Length&lt;/th&gt;
&lt;th&gt;Savings over On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;1/3 year(s)&lt;/td&gt;
&lt;td&gt;17%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Fargate&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment Option&lt;/th&gt;
&lt;th&gt;Term Length&lt;/th&gt;
&lt;th&gt;Savings over On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All upfront&lt;/td&gt;
&lt;td&gt;3 years&lt;/td&gt;
&lt;td&gt;52%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How do I pick the right plan?
&lt;/h2&gt;

&lt;p&gt;If your compute spending is above $0.10 per hour &lt;a href="https://console.aws.amazon.com/cost-management/home#/savings-plans/recommendations"&gt;AWS will give you recommendations for Savings Plans&lt;/a&gt; which also account for variable usage patterns. Have a look at those numbers before doing your own calculations. My spending was too low for any recommendations.&lt;/p&gt;

&lt;p&gt;The key to picking the right savings plan is to understand your recent and future spending. Once you know how much you regularly spend, I suggest that you pick a small amount to start with. As a rule of thumb you can pick 50% of your hourly spending. Only once you have data on how well the small savings plan works should you consider committing to more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understand your current spending
&lt;/h3&gt;

&lt;p&gt;Start by &lt;a href="https://console.aws.amazon.com/cost-management/home"&gt;opening the AWS Cost Explorer&lt;/a&gt;. If you haven’t used it before, &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/ce-enable.html"&gt;enable the Cost Explorer now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At the top of the graph select Daily as the granularity, and filter the services to only show “Lambda” and “EC2 Container Service” (= Fargate). If you can select Hourly that’s even better, but be aware that &lt;a href="https://aws.amazon.com/about-aws/whats-new/2019/11/aws-cost-explorer-supports-hourly-resource-level-granularity/"&gt;hourly granularity incurs additional charges&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ch_7IvFN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_cost_explorer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ch_7IvFN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_cost_explorer.png" alt="Cost Explorer for two months of compute"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;“The Hourly commitment is the Savings Plans rate, and not the On-demand spend” &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-purchase.html#purchase-sp-direct"&gt;(AWS Docs)&lt;/a&gt;. This means that when you see an hourly spending of $1, the corresponding hourly commitment at a savings rate of 17% is &lt;code&gt;$1 * (1-17%) = $0.83&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s assume a &lt;em&gt;minimum&lt;/em&gt; Lambda spending of $1 per hour or $720 per month. We’re looking for a 1 year commitment. All Upfront gives us the best rate of 17% for Lambda. With an hourly commitment of $0.83 this brings us to a &lt;strong&gt;single payment of $0.83 x 24 x 30 x 12 = $7,171&lt;/strong&gt;. As a result &lt;strong&gt;we save $1,469&lt;/strong&gt; compared to On-Demand rates ($8,640).&lt;/p&gt;

&lt;p&gt;When picking Partial Upfront, we commit to &lt;code&gt;$1 * (1-15%) = $0.85&lt;/code&gt; per hour, pay $3,672 upfront and then pay another $306 every month. With 15% savings on compute, we save $1,296 compared to On-Demand rates.&lt;/p&gt;
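&lt;p&gt;The arithmetic from the two examples above can be reproduced in a few lines (using the simplified year of 12 x 30-day months):&lt;/p&gt;

```python
# $1/hour of On-Demand Lambda spend, simplified 12 x 30-day year
on_demand_hourly = 1.00
hours_per_year = 24 * 30 * 12

def committed_cost(savings_rate):
    # The commitment is the discounted rate, not the On-Demand spend
    hourly_commitment = on_demand_hourly * (1 - savings_rate)
    return hourly_commitment * hours_per_year

on_demand_total = on_demand_hourly * hours_per_year       # $8,640
print(on_demand_total - committed_cost(0.17))  # ~$1,469 (All Upfront)
print(on_demand_total - committed_cost(0.15))  # ~$1,296 (Partial Upfront)
```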

&lt;p&gt;As you can see in the picture above, my spending for compute was roughly $1 per day (not hour), but I had 6 days that were covered by the free tier. A savings plan barely made sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  Savings Planner
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://savings.bahr.dev"&gt;The savings planner&lt;/a&gt; helps you find the right savings rate. You upload your recent cost reports, enter your savings rates as well as an hourly commitment. The website then tells you how much it expects you to save.&lt;/p&gt;

&lt;p&gt;I bought a savings plan, and when I compared the utilization report on AWS, the numbers were even better than the savings planner had predicted.&lt;/p&gt;

&lt;p&gt;The website is frontend only (no data sent to any server) and is &lt;a href="https://github.com/bahrmichael/savingsplanner"&gt;open source on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier consideration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If the free tier doesn’t last a single day of your cost, then you can skip this section.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With a compute spending of $30 per month, or $0.04 per hour, one might think that a savings plan of $0.04 per hour makes sense. If some days are covered by the free tier, however, we will overcommit by $0.04 per hour, or $0.96 on each of those days. From my understanding this means that on the first 6 days I would pay $0.04 x 24 x 6 = $5.76 more than with On-Demand pricing, and on the remaining 24 days I would save 24 x 24 x $0.04 x 12% = $2.76. In total that would put me at an overpay of about $3.&lt;/p&gt;

&lt;p&gt;You can use the following formula to determine if a certain commitment makes sense for you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;hourly_commitment * 24 * days_not_covered_by_free_tier * savings_rate - hourly_commitment * 24 * days_covered_by_free_tier = total_savings&lt;/p&gt;
&lt;/blockquote&gt;
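&lt;p&gt;Expressed in Python, the rule of thumb above looks like this, reproducing the $30-per-month example where 6 free-tier days turn the plan into a roughly $3 overpay:&lt;/p&gt;

```python
def total_savings(hourly_commitment, savings_rate,
                  days_covered_by_free_tier, days_in_month=30):
    """Savings on billable days minus wasted commitment on free-tier days.

    A negative result means the plan costs more than On-Demand.
    """
    billable_days = days_in_month - days_covered_by_free_tier
    saved = hourly_commitment * 24 * billable_days * savings_rate
    wasted = hourly_commitment * 24 * days_covered_by_free_tier
    return saved - wasted

print(round(total_savings(0.04, 0.12, 6), 2))  # roughly -3.0
```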

&lt;h3&gt;
  
  
  Start small and iterate
&lt;/h3&gt;

&lt;p&gt;Do not just average your spending down to an hourly level and pick that as your hourly commitment. If your spending isn’t the same every single hour, I suggest starting with a smaller savings plan instead and purchasing another one when you see that there’s still room for improvement.&lt;/p&gt;

&lt;p&gt;While we pick an hourly commitment based on our previous spending, we &lt;strong&gt;commit to future spending&lt;/strong&gt;. If you’re not sure that you’ll keep spending that amount for the next 1 or 3 years, be cautious when considering a savings plan.&lt;/p&gt;

&lt;p&gt;Assume you have the following spending for every month:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cls9SSs3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cls9SSs3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_1.png" alt="Savings Iteration 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the overly simplistic formula &lt;code&gt;hourly_commitment * 24 * days_with_spending_above_commitment * savings_rate - hourly_commitment * 24 * days_with_spending_below_commitment&lt;/code&gt; we could save $5 by committing to $0.04 per hour ($0.96 per day) with an All Upfront payment.&lt;/p&gt;

&lt;p&gt;Once we apply that savings plan, our remaining spending looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NHyEal1D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NHyEal1D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_iteration_2.png" alt="Savings Iteration 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding another $0.04 savings plan would now put us at an additional -$0.72 of savings. That’s not savings but overcommitment! Therefore we should either not buy a second savings plan, or consider a lower hourly commitment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://savings.bahr.dev"&gt;The savings planner can help you finding the right amount based on cost exports.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep in mind that you can always buy additional savings plans, but revoking existing ones is not possible. Start small and iterate!&lt;/p&gt;

&lt;h2&gt;
  
  
  How do I buy a savings plan?
&lt;/h2&gt;

&lt;p&gt;You understand how savings plans work and know what hourly commitment makes sense for you? Good! &lt;a href="https://console.aws.amazon.com/cost-management/home"&gt;Now go to Cost Management in the AWS Console&lt;/a&gt;. In the left-hand navbar, click on “Purchase Savings Plan”. Here you can pick a term length of 1 or 3 years, specify your hourly commitment, and decide whether you want to pay all upfront, partial upfront or no upfront.&lt;/p&gt;

&lt;p&gt;Add the savings plan to your cart and carefully review the order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did I pick the right term length?&lt;/li&gt;
&lt;li&gt;Will I use up the hourly commitment or will I overpay?&lt;/li&gt;
&lt;li&gt;Am I comfortable with an upfront payment?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you’re ready to purchase, click on Submit order.&lt;/p&gt;

&lt;p&gt;That’s it. You will now get reduced rates for your compute usage. Make sure to check the utilization report over the next few days.&lt;/p&gt;

&lt;p&gt;If you selected All or Partial Upfront, you will soon see a big spike in the Cost Explorer. Don’t worry, that’s just the upfront payment for the savings plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how much am I actually saving?
&lt;/h2&gt;

&lt;p&gt;After a couple days you can &lt;a href="https://console.aws.amazon.com/cost-management/home#/savings-plans/utilization"&gt;visit the Utilization report&lt;/a&gt;. This page will show you how much you save with your savings plan and if there is any overcommitment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xyhqWW-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_utliization.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xyhqWW-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/savingsplan_utliization.png" alt="Utilization Report"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, we’re always at 100% utilization, except for a few days at the beginning of each month where the free tier applies. Overall we’re still saving about 15% on our compute bill.&lt;/p&gt;

&lt;p&gt;Over the next few days you should also notice a small drop in the Cost Explorer for spending related to Lambda and Fargate, as prepaid compute is applied automatically. I suggest waiting a couple of weeks before purchasing another savings plan. Use the same approach that we followed above, but keep in mind that your spending is now lower: only factor in spending since your most recent savings plan purchase when analyzing the next one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-monitoring.html"&gt;Have a look at what the official documentation says about monitoring savings plans&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/new-savings-plans-for-aws-compute-services/"&gt;Savings Plans Launch Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.lastweekinaws.com/blog/aws-begins-sunsetting-ris-replaces-them-with-something-much-much-better/"&gt;Last Week In AWS about Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://savings.bahr.dev/"&gt;Savings Planner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide"&gt;AWS’s Guide to Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-monitoring.html"&gt;Monitoring Savings Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/dynamodb-price-reduction-and-new-reserved-capacity-model/"&gt;DynamoDB Reserved Capacity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cost</category>
    </item>
    <item>
      <title>Measuring Performance with CloudWatch Custom Metrics and Insights</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 27 Apr 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/measuring-performance-with-cloudwatch-custom-metrics-and-insights-2g4e</link>
      <guid>https://dev.to/michabahr/measuring-performance-with-cloudwatch-custom-metrics-and-insights-2g4e</guid>
      <description>&lt;p&gt;This article focuses on serverless technologies such as AWS Lambda and CloudWatch.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;tl;dr: CloudWatch Insights is great if you can log JSON and only consider the last few weeks, otherwise I suggest asynchronous log analysis with a detached lambda function.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a previous article, I explained how you can use CloudWatch Custom Metrics to &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;monitor an application’s health&lt;/a&gt;. In this article we will look at the serverless scheduler, and use custom metrics to monitor the performance of its most critical component.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://bahr.dev/2019/10/11/serverless-scheduler/"&gt;serverless scheduler&lt;/a&gt; solves the problem of ad hoc scheduling with a serverless approach. This type of scheduling describes irregular point in time invocations, e.g. one in 32 hours and another one in 4 days. While scale is usually not a problem with serverless technologies, keeping the precision high can become a challenge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zgavl3-Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9dwvWJotSP9SEPp5TE-Lzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zgavl3-Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9dwvWJotSP9SEPp5TE-Lzw.png" alt="AdHoc Scheduling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The serverless scheduler accepts payloads with a &lt;code&gt;date&lt;/code&gt; about when they shall be sent back. It uses SQS to prepare events that are up to 15 minutes away from their target date and then publishes them with a &lt;a href="https://aws.amazon.com/lambda/"&gt;lambda&lt;/a&gt; function called &lt;code&gt;emitter&lt;/code&gt;. This function receives the events a second early and waits for the right moment to publish them. The reason for this is that cold starts can add a couple hundred milliseconds of delay.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;emitter&lt;/code&gt; function is also where we know how much delay there is between the expected date and the date that we managed to deliver the payload back. The lower this delay, the better our precision. We track the delay in milliseconds.&lt;/p&gt;
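&lt;p&gt;As a sketch, such a delay measurement could look like the following. The field name &lt;code&gt;date&lt;/code&gt; and the function name are assumptions for illustration; the real scheduler’s payload format may differ.&lt;/p&gt;

```python
from datetime import datetime, timezone

def get_delay(item):
    """Milliseconds between an event's target date and the actual
    publish time. Assumes an ISO 8601 'date' field with timezone
    (a hypothetical payload format for illustration)."""
    target = datetime.fromisoformat(item['date'])
    now = datetime.now(timezone.utc)
    return (now - target).total_seconds() * 1000
```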

&lt;p&gt;In previous tests I used Python’s &lt;a href="https://matplotlib.org/"&gt;matplotlib&lt;/a&gt; to build charts; now we’ll take a look at how CloudWatch can support us here. Bonus: we can register alarms in CloudWatch to notify us when things go south.&lt;/p&gt;

&lt;p&gt;But first, let’s report the performance data performantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  4 Ways You Can Report Metrics
&lt;/h2&gt;

&lt;p&gt;CloudWatch has an API to &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;upload metric data&lt;/a&gt;. We can use this API on (1) every event, (2) per execution of the &lt;code&gt;emitter&lt;/code&gt; function, (3) somewhen later by asynchronously processing the logs, or (4) not at all by using CloudWatch Insights instead.&lt;/p&gt;

&lt;p&gt;Please note that your function requires the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/permissions-reference-cw.html"&gt;permission&lt;/a&gt; &lt;code&gt;cloudwatch:PutMetricData&lt;/code&gt; to upload metric data.&lt;/p&gt;

&lt;h3&gt;
  
  
  On Every Event
&lt;/h3&gt;

&lt;p&gt;Reporting metrics is straightforward. Decide on a namespace to upload your metrics under, choose a metrics name and put in the delay. You can find the full details on the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;AWS Documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    for item in event['Records']:
        publish_event(item)
        delay = get_delay(item)
        put_metrics(delay)

def put_metrics(delay):
    cloudwatch.put_metric_data(
        Namespace='serverless-scheduler',
        MetricData=[
            {
                'MetricName': 'emitter-delay',
                'Value': delay, # e.g. 19
                'Unit': 'Milliseconds',
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is good if your function processes just one event at a time and you don’t hit &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html"&gt;Lambda’s concurrency limits&lt;/a&gt;. In other situations you may quickly notice a significant downside of this approach: we don’t make use of batching. In the worst case we establish a new connection for every event.&lt;/p&gt;

&lt;p&gt;Even if we don’t establish a new connection for each event, the code still waits for the network call to complete. To put the speed of network calls into perspective, have a look at &lt;a href="https://www.prowesscorp.com/computer-latency-at-a-human-scale/"&gt;Computer Latency at a Human Scale&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How about we report metrics just once?&lt;/p&gt;

&lt;h3&gt;
  
  
  Per Function Execution
&lt;/h3&gt;

&lt;p&gt;If our function processes multiple events per execution, we can report the performance metrics in one go. To do this, we first collect, then aggregate, and finally submit all the data to CloudWatch once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    delays = []
    for item in event['Records']:
        publish_event(item)
        delays.append(get_delay(item))

    values, counts = aggregate_delays(delays)
    put_metrics(values, counts)

def put_metrics(values, counts):
    cloudwatch.put_metric_data(
        Namespace='serverless-scheduler',
        MetricData=[
            {
                'MetricName': 'emitter-delay',
                'Values': values,
                'Counts': counts,
                'Unit': 'Milliseconds',
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To ensure that we publish all the events as quickly as possible, we only collect the delay values. After the important work is done, we start doing analytics.&lt;/p&gt;

&lt;p&gt;The batch parameters of the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData API&lt;/a&gt; expect an array of values and a corresponding array of counts, where each item in the counts array describes how often a given value has occurred.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;values[0] = 200
counts[0] = 10

==&amp;gt; The value 200 has occurred 10 times

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To prepare these two arrays, the function &lt;code&gt;aggregate_delays&lt;/code&gt; can be implemented in the following way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def aggregate_delays(delays):
    # group the delays in a map
    delay_map = {}
    for delay in delays:
        if delay not in delay_map:
            delay_map[delay] = 0
        delay_map[delay] += 1

    # break it apart into two arrays
    values = []
    counts = []
    for value, count in delay_map.items():
        values.append(value)
        counts.append(count)

    return values, counts

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this approach still takes up runtime of the &lt;code&gt;emitter&lt;/code&gt;, which could be needed for sending other events instead. The next approach will move the metrics reporting outside of the &lt;code&gt;emitter&lt;/code&gt; function.&lt;/p&gt;

&lt;h3&gt;
  
  
  Asynchronous Log Processing
&lt;/h3&gt;

&lt;p&gt;By detaching the analytics from the &lt;code&gt;emitter&lt;/code&gt;, we can make sure that the core application performs only the most important work. To do this, you can use the &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt; to attach a lambda function to another one’s log stream.&lt;/p&gt;

&lt;p&gt;In the following snippet of the &lt;code&gt;serverless.yml&lt;/code&gt;, we register a function called &lt;code&gt;analyzer&lt;/code&gt; to be invoked when new logs arrive at the log group &lt;code&gt;/aws/lambda/serverless-scheduler-emitter&lt;/code&gt;. We also add a filter so that only those logs make it to the function, where the field &lt;code&gt;log_type&lt;/code&gt; has the value &lt;code&gt;"emit_delay"&lt;/code&gt;. Learn more about the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html"&gt;Filter and Pattern Syntax&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  analyzer:
    handler: my_handler.handle
    events:
      - cloudwatchLog:
          logGroup: '/aws/lambda/serverless-scheduler-emitter'
          filter: '{$.log_type = "emit_delay"}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use these filters, the log event must be in JSON format. We let the emitter output the relevant data as a JSON string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# simplejson can handle decimals better
import simplejson as json

def handle(event, context):
    for item in event['Records']:
        publish_event(item)
        delay = get_delay(item)

        log_event = {
            'log_type': 'emit_delay',
            'delay': delay
        }
        print(json.dumps(log_event))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then implement the lambda function that is listening to the log stream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import gzip
from base64 import b64decode
import simplejson as json

import boto3
cloudwatch = boto3.client('cloudwatch')

def handle(event, context):
    # log events are compressed
    # we have to decompress them first
    log_events = extract_log_events(event)
    delays = []
    for log_event in log_events:
        delays.append(int(log_event['delay']))

    # use the aggregation from the previous example
    # to reduce the number of api calls
    values, counts = aggregate_delays(delays)
    put_metrics(values, counts)

def extract_log_events(event):
    compressed_payload = b64decode(event['awslogs']['data'])
    uncompressed_payload = gzip.decompress(compressed_payload)
    payload = json.loads(uncompressed_payload)
    return payload['logEvents']

# ... functions from previous example ...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every time there are new logs, the analyzer function is invoked and reports the delay metrics.&lt;/p&gt;

&lt;p&gt;But wait, there’s another ad hoc approach which requires even less code.&lt;/p&gt;

&lt;h3&gt;
  
  
  CloudWatch Insights
&lt;/h3&gt;

&lt;p&gt;In the previous section we started logging JSON. These logs can be used by &lt;a href="https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/"&gt;CloudWatch Logs Insights&lt;/a&gt; to generate metrics from logs. All without building and deploying new analyzer functions!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://twitter.com/hashtag/AWSLambda?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#AWSLambda&lt;/a&gt; protip: Thou Shalt Log JSON! (and then you just use cloudwatch insights for searching across all the whole log group easily instead of fucking around with log streams)!&lt;/p&gt;

&lt;p&gt;— Gojko Adzic (@gojkoadzic) &lt;a href="https://twitter.com/gojkoadzic/status/1253246550672801793?ref_src=twsrc%5Etfw"&gt;April 23, 2020&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;code&gt;emitter&lt;/code&gt; now prints JSON logs like &lt;code&gt;{"log_type": "emit_delay", "delay": 156}&lt;/code&gt;. To visualise the delays we open &lt;a href="https://console.aws.amazon.com/cloudwatch/home?#logsV2:logs-insights"&gt;CloudWatch Logs Insights&lt;/a&gt; in the AWS console, select the right log group and use the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html"&gt;CloudWatch Logs Query Syntax&lt;/a&gt; to build a query which aggregates the delay data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xBfIkPf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-query.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xBfIkPf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-query.png" alt="Insights Query"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The query &lt;code&gt;stats max(delay) by bin(60s)&lt;/code&gt; builds an aggregate (&lt;code&gt;stats&lt;/code&gt;) of the maximum delay (&lt;code&gt;max(delay)&lt;/code&gt;) for every minute (&lt;code&gt;bin(60s)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;After running the query, we see a logs tab and a visualization tab. Here’s the visualization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EkCVxxlL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-visualization.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EkCVxxlL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-visualization.png" alt="Insight Visualization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a click on “Add to dashboard” we can build a widget out of this metric and add it to one of our existing dashboards. We’ll look more into graphing in the next section.&lt;/p&gt;

&lt;p&gt;Note that this approach is based on CloudWatch logs, where you pay &lt;a href="https://aws.amazon.com/cloudwatch/pricing"&gt;$0.03 per GB&lt;/a&gt; of storage. If you only need metrics for the last few weeks, then CloudWatch Insights with a 14 or 28 day log retention period is okay. Otherwise Custom Metrics are cheaper for long term storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph It
&lt;/h2&gt;

&lt;p&gt;In a &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;recent article&lt;/a&gt; I explained how to turn custom metrics into graphs and how to build a dashboard. This time we take a look at how we can make the most of the delay metrics and spice them up with a reference line.&lt;/p&gt;

&lt;p&gt;Looking at the maximum delay, we can quickly spot the worst case. Percentiles, however, let us understand how bad the situation really is. If your maximum delay is 14 seconds, but it occurred only once, then the situation isn’t too bad. If however the 90th percentile (p90) is at 10 seconds, then a significant number of customers are impacted. The p90 is the value below which 90% of all measurements fall.&lt;/p&gt;
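&lt;p&gt;To make the percentile idea concrete, here is a tiny standalone sketch using the nearest-rank method (CloudWatch may differ in interpolation details):&lt;br&gt;
&lt;/p&gt;

```python
import math

def percentile(data, p):
    # nearest-rank percentile: the value below which roughly p% of samples fall
    ranked = sorted(data)
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

delays = [1, 1, 2, 2, 3, 3, 4, 5, 8, 14]  # delays in seconds
worst = max(delays)            # 14: the single worst case
p90 = percentile(delays, 90)   # 8: closer to what most users experience
```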

&lt;p&gt;To better understand the various percentiles, you can use the following query to plot out the &lt;code&gt;max&lt;/code&gt;, &lt;code&gt;p99&lt;/code&gt;, &lt;code&gt;p95&lt;/code&gt; and &lt;code&gt;p90&lt;/code&gt;. I’ve increased the bin to 10 minutes so that the lines don’t overlap too much.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stats max(delay), percentile(delay, 99), percentile(delay, 95), percentile(delay, 90) by bin(10m) 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The visualization gives us four lines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RFFvtnLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-percentiles.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RFFvtnLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-percentiles.png" alt="Insights Percentiles"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reference Line
&lt;/h3&gt;

&lt;p&gt;If you’re building graphs from custom metrics, you can add a reference line to indicate a threshold. With the serverless scheduler my reference line is 1 second. I have not found out how I can add static values through CloudWatch Insights and will therefore use regular CloudWatch metrics instead. Let me know if you know more!&lt;/p&gt;

&lt;p&gt;Once you’ve selected a metric, you can add a reference line by adding the formula &lt;code&gt;IF(m1, 1000, 0)&lt;/code&gt;. Replace &lt;code&gt;1000&lt;/code&gt; with your reference value. This expression draws a reference line whenever the other data series &lt;code&gt;m1&lt;/code&gt; has a value.&lt;/p&gt;
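&lt;p&gt;If you manage dashboards as code, the same reference line can be expressed as a metric math entry in the dashboard body. This is a sketch with placeholder names (namespace, metric, region), not the exact widget from my dashboard:&lt;br&gt;
&lt;/p&gt;

```python
import json

# a widget with the delay metric (m1) and the reference line (e1);
# namespace, metric name and region are placeholders
widget = {
    "type": "metric",
    "properties": {
        "metrics": [
            ["scheduler", "emit-delay", {"id": "m1"}],
            [{"expression": "IF(m1, 1000, 0)", "label": "1s threshold", "id": "e1"}],
        ],
        "region": "us-east-1",
        "period": 60,
        "stat": "Maximum",
    },
}

def put_dashboard():
    import boto3  # imported lazily; only needed when actually deploying
    boto3.client('cloudwatch').put_dashboard(
        DashboardName='scheduler-delays',
        DashboardBody=json.dumps({"widgets": [widget]}),
    )
```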

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2KDuyLfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-reference-line.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2KDuyLfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/measuring-performance-reference-line.jpg" alt="Reference Line"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Alarms
&lt;/h2&gt;

&lt;p&gt;If too many delays are above our reference line, we should investigate whether there’s a new bug or whether increased load is breaking the system. The quickest way to learn about that is to use &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html"&gt;CloudWatch Alarms&lt;/a&gt;. In a &lt;a href="https://bahr.dev/2020/04/13/custom-metrics/"&gt;previous article&lt;/a&gt; I explained how you can set up alarms that send you an email.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we learned four approaches to generating metric data, each with its own trade-offs. CloudWatch Insights is great if you only need metrics for the last few weeks; otherwise I suggest asynchronous log analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Studying
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=g1wxfYVjCPY"&gt;AWS re:Invent 2018: Introduction to Amazon CloudWatch Logs Insights (DEV375)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html"&gt;CloudWatch Alarms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html"&gt;CloudWatch Logs Insights Query Syntax&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=hF3NM9j-u7I"&gt;Amazon CloudWatch Synthetics&lt;/a&gt; for more complex testing and monitoring&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>metrics</category>
      <category>cloudwatch</category>
    </item>
    <item>
      <title>Monitoring an application's health with CloudWatch Custom Metrics</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Mon, 13 Apr 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/michabahr/monitoring-an-application-s-health-with-cloudwatch-custom-metrics-4mkj</link>
      <guid>https://dev.to/michabahr/monitoring-an-application-s-health-with-cloudwatch-custom-metrics-4mkj</guid>
      <description>&lt;p&gt;Follow me on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt; and &lt;a href="https://twitter.com/michabahr"&gt;twitter&lt;/a&gt; so you are the first to see when I publish new articles!&lt;/p&gt;




&lt;p&gt;For most applications it makes sense to trigger CloudWatch alarms when lambda functions throw errors. Throwing errors on unwanted behavior is a best practice which also allows you to make use of standard metrics and redrive mechanisms. However, some applications have trade-offs between concurrency and blast radius which don’t allow them to rely solely on errors for the health of their application.&lt;/p&gt;

&lt;p&gt;In this article I will show you how I use custom metrics to verify that an application’s core process is healthy. We will also take a look at the operational cost of this solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;The application “Eve Market Watch” lets players of the MMORPG Eve Online define market thresholds for various items. When the available amount drops below that threshold, the user gets a notification so they can restock the market. In the picture below, a threshold of 100,000 would trigger an ingame mail while a threshold of 90,000 would not yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PG7LtciX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-market.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PG7LtciX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-market.png" alt="Market"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core process (market parser) takes all the user defined thresholds, pulls in market data from the game’s API and figures out which items are running low. If the number of processed items drops significantly, then something has happened that I should investigate, be it a market that’s not available anymore or a new bug.&lt;/p&gt;

&lt;p&gt;The application has a trade-off between concurrency and blast radius. The optimal blast radius would be one lambda per user and market, which keeps the application intact for all users while allowing quick isolation of the problematic ones. However, I’m using the free plan of Redis Labs to cache before writing to DynamoDB, and that plan has a limit of 30 connections while the application currently has 370 active users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goal
&lt;/h2&gt;

&lt;p&gt;If the market parser breaks, I want to know about that before my users do. There have been a couple of times when I repeatedly broke the core process without noticing it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0weIgTPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-chat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0weIgTPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-chat.png" alt="Chat complaint"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To achieve this, the market parser shall track the number of items being processed so that an alarm can fire if that number drops significantly.&lt;/p&gt;

&lt;p&gt;Here is where CloudWatch custom metrics and alarms come into play.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Metrics
&lt;/h2&gt;

&lt;p&gt;Custom metrics allow you to collect arbitrary time series data, graph it and trigger actions.&lt;/p&gt;

&lt;p&gt;To collect custom metrics you need at least a namespace, a metric name, a value and a unit. You can find the full details on the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;AWS Documentation&lt;/a&gt;. You may also define dimensions to increase the granularity of your data. The following examples use Python 3.7 with AWS’ &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html"&gt;boto3 client for CloudWatch&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='marketwatch',
    MetricData=[
        {
            'MetricName': 'my-metric-name',
            'Dimensions': [
                {
                    'Name': 'dimension-name',
                    'Value': 'dimension-value'
                }
            ],
            'Value': 123,
            'Unit': 'Count'
        },
    ]
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The namespace is a &lt;code&gt;string&lt;/code&gt; which lets you link multiple metrics to an application or domain. In my example I use &lt;code&gt;marketwatch&lt;/code&gt; as the namespace.&lt;/p&gt;

&lt;p&gt;By setting a good metric name, you can identify your new metric amongst others and understand what data it holds. In my example I use &lt;code&gt;snapshots-built&lt;/code&gt;, as this is the number of items that the market parser was able to get data for.&lt;/p&gt;

&lt;p&gt;As for the metric value I send the number of items that have been processed and use the unit &lt;code&gt;Count&lt;/code&gt;. See the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html"&gt;documentation&lt;/a&gt; for all available units.&lt;/p&gt;

&lt;p&gt;You may increase the metrics’ granularity with up to 10 &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension"&gt;dimensions&lt;/a&gt;. Beware that you can only define CloudWatch alarms on the highest granularity. In my example I add one dimension, which distinguishes between real data that I got from the markets, and zero values which are added when no data is available.&lt;/p&gt;

&lt;p&gt;All things together the function that sends the metrics looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def put_metrics(count, snapshot_type):
    cloudwatch.put_metric_data(
        Namespace='marketwatch',
        MetricData=[
            {
                'MetricName': 'snapshots-built',
                'Dimensions': [
                    {
                        'Name': 'type',
                        'Value': snapshot_type # can be 'real' or 'virtual'
                    }
                ],
                'Value': count,
                'Unit': 'Count'
            },
        ]
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I intentionally don’t set a timestamp so that CloudWatch registers the event at the timestamp it is received. “Data points with time stamps from 24 hours ago or longer can take at least 48 hours to become available for GetMetricData or GetMetricStatistics from the time they are submitted.” - &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;API PutMetricData&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Permissions
&lt;/h2&gt;

&lt;p&gt;You have to grant your function the permission to submit metrics. If you’re using the &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt;, you can add the following permission to your &lt;code&gt;serverless.yml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider:
  ...
  iamRoleStatements:
    - Effect: Allow
      Action:
        - cloudwatch:PutMetricData
      Resource: "*"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details check the api documentation for &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;Once the code is deployed and running, we go to &lt;a href="https://console.aws.amazon.com/cloudwatch/home#metricsV2:graph=~()"&gt;CloudWatch Metrics&lt;/a&gt;, look up our metric and verify that our code is collecting data. Once your code has submitted its first metrics, you will see your new namespace under “Custom Namespaces”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A0smWE1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-metrics.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A0smWE1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-metrics.png" alt="Metrics with custom namespaces"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the dashboard, drill down into the right category and explore the available data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;Once you’ve found your data, continue by creating a graph which visualizes it. Select the data series you want to visualize and adjust the “Graphed metrics”. When you see many dots in your graph, you can increase the period so that the dots get merged into a line. You can also report metrics more frequently.&lt;/p&gt;

&lt;p&gt;As the core process of my application runs every 15 minutes, it makes sense to average over a period of 15 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mbS6QQPF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-graphed-metrics.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mbS6QQPF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-graphed-metrics.png" alt="Graphed metrics"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more info about graphing metrics, check out the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph_metrics.html"&gt;CloudWatch documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you’re happy with the data and graphs you’re seeing, head over to the &lt;a href="https://console.aws.amazon.com/cloudwatch/home#dashboards:"&gt;CloudWatch Dashboards&lt;/a&gt; where we will create our own dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;Go to the CloudWatch &lt;a href="https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:"&gt;dashboards&lt;/a&gt; and create a new one. Give it a name and add your first widget, where you recreate the graph from the last section. You will again see the screen from the last section where you select your custom namespace, your metrics and dimensions, and then build a graph with the appropriate settings. Give your graph a name and click on “Create widget”.&lt;/p&gt;

&lt;p&gt;Resize the widget and add more as you need. Here’s how my dashboard looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QmS1CNE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QmS1CNE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-dashboard.png" alt="Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the top-left graph around “04/06”, there is a gap in the data. When my code stops working and doesn’t collect data anymore, an alarm should be triggered.&lt;/p&gt;

&lt;p&gt;There is another drop after “04/08”. This one recovered itself within a reasonable time. I do not need an alarm for that situation, but should still analyze the problem later on.&lt;/p&gt;

&lt;p&gt;Let’s look at creating alarms next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alarms
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#CloudWatchAlarms"&gt;CloudWatch Alarms&lt;/a&gt; trigger an action when a given condition is met, i.e. response times exceeding 1000ms. In our example we want to fire an alarm, when the reported amount of processed items drops significantly for a prolonged time or is not reported at all.&lt;/p&gt;

&lt;p&gt;To create an alarm, head over to the &lt;a href="https://console.aws.amazon.com/cloudwatch/home#alarmsV2:"&gt;alarms section&lt;/a&gt; in CloudWatch and click on “Create alarm”. You will then be asked to select a metric, which you pick and plot as we’ve done in the previous sections. Note that you can only select a single data series; you can’t aggregate across dimensions here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EfrjvktP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-anomaly-band.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EfrjvktP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-anomaly-band.png" alt="Alarm configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the metric selected, you can define conditions. I decided to go with the “Anomaly detection” and picked “Lower than the band” as the threshold type. Play around with the anomaly detection threshold value to see what is best for your data. In the additional configuration I defined that 10 out of 10 datapoints need to breach the band before an alarm gets triggered. This way the app can recover itself in case an external API temporarily fails. I also decided to “Treat missing data as bad (breaching threshold)” as the alarm would otherwise not fire if my code breaks before the metrics are reported.&lt;/p&gt;
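&lt;p&gt;The same alarm can also be created programmatically. The sketch below mirrors the console settings described above; the namespace and metric come from this article, while the alarm name and the SNS topic ARN are placeholders for your own setup:&lt;br&gt;
&lt;/p&gt;

```python
alarm_params = {
    'AlarmName': 'snapshots-built-anomaly',   # placeholder name
    'ComparisonOperator': 'LessThanLowerThreshold',
    'EvaluationPeriods': 10,
    'DatapointsToAlarm': 10,          # 10 out of 10 datapoints must breach
    'TreatMissingData': 'breaching',  # treat missing data as bad
    'ThresholdMetricId': 'ad1',
    'Metrics': [
        {
            'Id': 'm1',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'marketwatch',
                    'MetricName': 'snapshots-built',
                    'Dimensions': [{'Name': 'type', 'Value': 'real'}],
                },
                'Period': 900,  # the parser runs every 15 minutes
                'Stat': 'Average',
            },
            'ReturnData': True,
        },
        {
            'Id': 'ad1',
            # band width of 2 standard deviations; tune for your data
            'Expression': 'ANOMALY_DETECTION_BAND(m1, 2)',
            'Label': 'Expected range',
            'ReturnData': True,
        },
    ],
    'AlarmActions': ['arn:aws:sns:us-east-1:123456789012:my-alarm-topic'],
}

def create_alarm():
    import boto3  # imported lazily; only needed when actually deploying
    boto3.client('cloudwatch').put_metric_alarm(**alarm_params)
```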

&lt;p&gt;In the picture below you see a preview of the anomaly detection against the metrics I’ve collected. We see a few red drops where the anomaly detection triggers, but as we’ve configured the alarm to only fire if 10 out of 10 data points are bad, we only get alarms when the market parser does not recover. If you look closely, you also see regular drops in the gray anomaly band which are caused by the game’s daily downtime. CloudWatch correctly understands that this is a recurring behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UKRU-bKF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-preview.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UKRU-bKF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-preview.png" alt="Anomaly band preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When my alarm fires, I want to receive an email. This is the easiest way to continue, but you may set up custom integrations through SNS topics, e.g. for &lt;a href="https://read.acloud.guru/slack-notification-with-cloudwatch-alarms-lambda-6f2cc77b463a"&gt;Slack&lt;/a&gt;. To send alarms to an email, choose to “Create a new topic”, enter a name for the new topic and enter an email address that will receive the alarm. Click on “Create Topic” below the email input and then click on next to continue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ucKAS5O4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-notification.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ucKAS5O4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/bahrmichael/bahrmichael.github.io/raw/master/pictures/custom-metrics-notification.png" alt="Creating a notification"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally give your alarm a name and finish the setup. To test your alarm you can update the trigger conditions or report metrics that will trigger the alarm. Make sure to check that you get an email as expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;The operational cost of using CloudWatch Custom Metrics and Alarms consists of two parts: ingestion and monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingestion
&lt;/h3&gt;

&lt;p&gt;Each custom metric that you submit data for costs $0.30 per month. Custom metrics are not covered by the &lt;a href="https://aws.amazon.com/cloudwatch/pricing/"&gt;free tier&lt;/a&gt;. “All custom metrics charges are prorated by the hour and metered only when you send metrics to CloudWatch.”&lt;/p&gt;

&lt;p&gt;You also pay for the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html"&gt;PutMetricData&lt;/a&gt; calls. The first one million API requests are covered by the free tier; beyond that, they cost $10 per million requests. My application reports two metrics every 15 minutes, which is a total of 5,760 API requests per month.&lt;/p&gt;
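&lt;p&gt;That figure is quick to verify: two metrics, four runs per hour, 24 hours, 30 days.&lt;br&gt;
&lt;/p&gt;

```python
metrics_per_run = 2        # the app reports two metrics per run
runs_per_hour = 60 // 15   # the core process runs every 15 minutes
requests_per_month = metrics_per_run * runs_per_hour * 24 * 30
# requests_per_month == 5760, well within the one-million free tier
```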

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;p&gt;We’ve set up one dashboard and one alarm. Each dashboard costs $3 per month, but the free tier covers three dashboards for up to 50 metrics per month. Each anomaly detection alarm costs $0.30 at standard resolution as it is made up of three alarms: “one for the evaluated metric, and two for the upper and lower bound of expected behavior”. If you select high resolution, which is 10 seconds instead of 60 seconds, you pay three times as much. As we’re only reporting data every 15 minutes, high resolution doesn’t make sense. The free tier covers up to 10 alarm metrics (not applicable to high-resolution alarms).&lt;/p&gt;

&lt;h3&gt;
  
  
  Total
&lt;/h3&gt;

&lt;p&gt;For my application I expect to pay a total of $0.30 per month. Without the free tier I would still expect less than $5.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We saw how applications can collect custom metrics and how we can use CloudWatch to trigger alarms based on those metrics. I think the price is very fair, as small hobby projects with a few custom metrics can get away with a low price while medium sized enterprise software can remain under $100 per month.&lt;/p&gt;

&lt;p&gt;If you would like to define alarms as code, have a look at &lt;a href="https://github.com/awslabs/realworld-serverless-application/blob/master/ops/sam/app/alarm.template.yaml"&gt;this example&lt;/a&gt;. For all users of the serverless framework, &lt;a href="https://serverless.com/blog/serverless-ops-metrics/"&gt;this article&lt;/a&gt; explains how to add alerts.&lt;/p&gt;




&lt;p&gt;Did you like this article or do you know where it could be improved? Let me know on &lt;a href="https://twitter.com/michabahr"&gt;twitter&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>python</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Stock Sentiment Analysis - Part 2: Analysing the sentiment</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Fri, 27 Mar 2020 12:40:56 +0000</pubDate>
      <link>https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig</link>
      <guid>https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig</guid>
      <description>&lt;p&gt;In this two part article I will show you how to build an app, that collects people's opinions about companies and how to turn that into sentiments. Disclaimer: Trade at your own risk!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl"&gt;Part 1&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Only deploy this if you have set up budget alarms and understand the spending potential of this solution! Check the section Cost Analysis for more details.&lt;/p&gt;




&lt;p&gt;We left off with collecting raw tweets in our DynamoDB table. Now it's time to understand if the tweets about a given company are rather positive or negative. We will add another Lambda which is invoked when we find new tweets. This Lambda then asks AWS Comprehend for the tweet's sentiment, which is a score of how positive, neutral or negative the text was.&lt;/p&gt;

&lt;p&gt;Here's an example tweet I collected. Do you think the text is rather negative or positive?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I wish I were as positive about anything in life as Ross is about $TSLA.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Streams
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://serverless.com"&gt;serverless framework&lt;/a&gt; has a simple way to attach a lambda to a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html"&gt;DynamoDB stream&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  tweetAnalyzer:
    handler: tweetAnalyzer.handle
    events:
      - stream:
          arn: arn:aws:dynamodb:REGION:ACCOUNT_ID:table/Tweets/stream/DATE
          batchSize: 25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this snippet of code the function &lt;code&gt;tweetAnalyzer&lt;/code&gt; will be invoked when we add new tweets to the table. The &lt;code&gt;batchSize&lt;/code&gt; of 25 allows us to process multiple tweets at once. I chose 25 because that's the maximum amount of texts we can pass to AWS Comprehend per request (&lt;a href="https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectSentiment.html"&gt;BatchSizeLimitExceededException&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Our function is invoked when an entry in our DynamoDB table changes. As I've defined the stream to only contain the key of the entry, we first have to load the entry, then send it to AWS Comprehend and finally update the entry in our table. You can find the source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The event passed from the stream to our function contains a list of &lt;code&gt;Records&lt;/code&gt; which each hold the entry's key in &lt;code&gt;item['dynamodb']['Keys']['id']['N']&lt;/code&gt;. If your key is not a number, you have to adjust the &lt;code&gt;['id']['N']&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
table = boto3.resource('dynamodb').Table('Tweets')

tweets = []
for item in event.get('Records', []):
    item_id = int(item['dynamodb']['Keys']['id']['N'])
    tweet = table.get_item(Key={'id': item_id}).get('Item', None)
    if 'sentiment_score' not in tweet:
        tweets.append(tweet)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop loads all the entries which don't have a &lt;code&gt;sentiment_score&lt;/code&gt; yet. We need to filter because the events will fire when an entry in the table &lt;em&gt;changes&lt;/em&gt;, which is also the case when we add the sentiment score. Let's continue with adding that score.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sentiments
&lt;/h2&gt;

&lt;p&gt;The Comprehend API expects a list of texts. As we saved some more meta information along with the tweets' text, we have to extract the raw text first. Don't change the tweets' order here, as AWS Comprehend returns each result with its &lt;code&gt;Index&lt;/code&gt; from the input, which we will use to map the results back to the original tweets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_list = list(map(lambda tweet: tweet['text'], tweets))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next step we send the texts to Comprehend. Note that the &lt;code&gt;text_list&lt;/code&gt; must not have more than &lt;a href="https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectSentiment.html"&gt;25 entries&lt;/a&gt;. You must also specify a language. As we've previously told the Twitter API to only return English tweets (by best effort), we will use &lt;code&gt;en&lt;/code&gt; here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
comprehend = boto3.client('comprehend')

comprehend_response = comprehend.batch_detect_sentiment(
    TextList=text_list,
    LanguageCode='en'
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the results contains an &lt;code&gt;Index&lt;/code&gt; which is the index of the item in the &lt;code&gt;text_list&lt;/code&gt;. We use that information to map the result back to our DynamoDB entries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from decimal import Decimal

for entry in comprehend_response.get('ResultList', []):
    tweet = tweets[entry['Index']]
    tweet['sentiment_score'] = json.loads(json.dumps(entry['SentimentScore']), parse_float=Decimal)
    table.put_item(Item=tweet)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that because DynamoDB doesn't like float parameters, we have to convert the floats to Decimals.&lt;/p&gt;
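&lt;p&gt;The conversion used in the loop above can be isolated into a small helper:&lt;/p&gt;

```python
import json
from decimal import Decimal

def to_dynamodb_numbers(sentiment_score):
    """Round-trip through JSON so every float becomes a Decimal,
    which is what boto3's DynamoDB resource expects for numbers."""
    return json.loads(json.dumps(sentiment_score), parse_float=Decimal)
```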

&lt;p&gt;Let's see how Comprehend rated our example from above. Do you think that Comprehend caught the sarcasm?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "neutral": 0.07,
    "positive": 0.88,
    "negative": 0.03,
    "mixed": 0.00
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the score may be wrong for a couple of tweets, aggregating over thousands or even millions of tweets will average out the wrong ones.&lt;/p&gt;

&lt;p&gt;The app now collects sentiment insights for you. Don't stop reading here, as this might become expensive!&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;Each tweet will require one sentiment analysis and one DynamoDB write operation. Again the time span is a whole year of operation.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/"&gt;on-demand pricing&lt;/a&gt; DynamoDB charges $1.25 per million write request units and $0.25 per million read request units. With the worst case of 100 new tweets every 10 minutes, we're looking at a total of 100x6x24x365 = 5,256,000 WCUs or $6.57 per year.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/comprehend/pricing/"&gt;cost of AWS Comprehend&lt;/a&gt; is a bit more intense. &lt;/p&gt;

&lt;p&gt;Tweets vary in length, but for the worst case we assume that each tweet uses up the full 280 characters. If we collect 100 tweets every 10 minutes, we are going to collect 100x6x24x365 = 5,256,000 tweets per year. Over the last year, however, I only collected 1,500,000 tweets.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Amazon Comprehend requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request." &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sentiment analysis is priced at $0.0001 per unit. The price to analyze 5,256,000 tweets with 280 characters or 3 units each is 5,256,000 x $0.0001 x 3 = &lt;strong&gt;$1,576.8&lt;/strong&gt;. This is expensive for a hobby project, so please make sure you tag your resources appropriately and think twice before you run this as a hobby project.&lt;/p&gt;
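&lt;p&gt;In code, the pricing rule works out like this (prices as quoted above; the helper names are mine):&lt;/p&gt;

```python
import math

PRICE_PER_UNIT = 0.0001  # USD per unit for sentiment analysis

def comprehend_units(text):
    """A unit is 100 characters; every request is billed at least 3 units."""
    return max(3, math.ceil(len(text) / 100))

def yearly_sentiment_cost(tweets_per_year, chars_per_tweet=280):
    # Worst case: every tweet uses the full character budget.
    return tweets_per_year * comprehend_units("x" * chars_per_tweet) * PRICE_PER_UNIT
```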

&lt;p&gt;There is a free tier for Comprehend, which for sentiment analysis will "cover only the main analysis [...]. But after you analyze the text, the system automatically calls different APIs [...]. These automatic calls [...] are not covered by the Free Tier [...]" (Source: AWS Support). Because of this I started seeing Comprehend charges before the full 50k free units were used up. The Comprehend team has already received this feedback so they can improve the website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further development
&lt;/h2&gt;

&lt;p&gt;If you want to take this approach and push it further, I suggest building a container/VM based solution to pull tweets from Twitter. &lt;a href="https://aws.amazon.com/fargate/"&gt;AWS Fargate&lt;/a&gt; lets you run containers without managing the servers underneath.&lt;/p&gt;

&lt;p&gt;Know any other sentiment APIs? How do they compare in pricing? Try to attach one of them instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Analyzing the sentiment can be achieved with one function, but it only scales if your workload is small or your credit card big. Please make sure you have budget alarms set up if you want to deploy this yourself; this is not cheap!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>serverless</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Stock Sentiment Analysis - Part 1: Collecting opinions</title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Fri, 27 Mar 2020 12:40:03 +0000</pubDate>
      <link>https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl</link>
      <guid>https://dev.to/michabahr/stock-sentiment-analysis-part-1-collecting-opinions-gdl</guid>
      <description>&lt;p&gt;In this two-part article I will show you how to build an app that collects people's opinions about companies and how to turn that into sentiments. Disclaimer: Trade at your own risk!&lt;/p&gt;

&lt;p&gt;As for technologies, my current go-to stack is AWS serverless tech with deployment through the Serverless Framework. This article assumes that you are familiar with both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Collecting opinions
&lt;/h2&gt;

&lt;p&gt;Many platforms have APIs that let us collect opinions. The prime example is probably Twitter where everyone screams into the forest. We will start by setting up an app and collecting the raw data.&lt;/p&gt;

&lt;p&gt;In order to collect data from Twitter, you have to create &lt;a href="https://developer.twitter.com/apps" rel="noopener noreferrer"&gt;a developer app&lt;/a&gt; and generate &lt;a href="https://developer.twitter.com/en/docs/basics/authentication/oauth-1-0a" rel="noopener noreferrer"&gt;oauth1 keys&lt;/a&gt;. You can do all of that through the browser. Store the details in the &lt;code&gt;config.&amp;lt;STAGE&amp;gt;.json&lt;/code&gt; file. The value for &lt;code&gt;&amp;lt;STAGE&amp;gt;&lt;/code&gt; is either &lt;code&gt;dev&lt;/code&gt; or whatever you provided with the &lt;code&gt;--stage&lt;/code&gt; parameter.&lt;/p&gt;
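&lt;p&gt;For illustration, a &lt;code&gt;config.dev.json&lt;/code&gt; could look like the following sketch. The key names here are an assumption mirroring the environment variables used later; the authoritative list is in the repository's readme.&lt;/p&gt;

```json
{
  "CONSUMER_KEY": "your-consumer-key",
  "CONSUMER_SECRET": "your-consumer-secret",
  "ACCESS_TOKEN": "your-access-token",
  "ACCESS_TOKEN_SECRET": "your-access-token-secret"
}
```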

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fekekl9klmpkiqt4ibcgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fekekl9klmpkiqt4ibcgh.png" alt="Twitter Access Keys"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next we set up a serverless app. With serverless technologies like AWS Lambda we don't need to worry about creating and maintaining servers, and it's also super cheap. You can find the source code &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt; or learn how to create a serverless app at &lt;a href="https://serverless.com" rel="noopener noreferrer"&gt;serverless.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our first function will run at a regular interval. I chose 10 minutes because that was usually enough time for 10-50 new tweets to appear. Learn more about the scheduling options by visiting the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html" rel="noopener noreferrer"&gt;CloudWatch docs&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  tweetCollector:
    handler: tweetCollector.handle
    events:
      - schedule: rate(10 minutes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function needs to authenticate with Twitter, load new tweets and then store these in our table for later processing.&lt;/p&gt;

&lt;p&gt;The first step is pretty simple with &lt;a href="https://tweepy.org" rel="noopener noreferrer"&gt;Tweepy&lt;/a&gt;. In the following snippet we load the oauth1 keys from environment variables and then authenticate. Our script will abort if the authentication fails. You can find the full source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tweepy

auth = tweepy.OAuthHandler(os.environ['CONSUMER_KEY'], os.environ['CONSUMER_SECRET'])
auth.set_access_token(os.environ['ACCESS_TOKEN'], os.environ['ACCESS_TOKEN_SECRET'])

api = tweepy.API(auth)
if not api:
    print("Can't Authenticate")
    return
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I'm running this on AWS Lambda, search queries are a better fit than the Twitter streaming API. The following search query allows us to define a search term, how many tweets we want to load per query as well as a max_id and since_id for pagination.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_tweets = api.search(q=search_query, lang='en', count=tweets_per_query, max_id=str(max_id - 1), since_id=since_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
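&lt;p&gt;The &lt;code&gt;max_id&lt;/code&gt;/&lt;code&gt;since_id&lt;/code&gt; bookkeeping can be sketched as follows, with a generic &lt;code&gt;fetch&lt;/code&gt; callable standing in for &lt;code&gt;api.search&lt;/code&gt;. This is an illustration of the pagination idea, not the project's actual loop:&lt;/p&gt;

```python
def collect_pages(fetch, since_id=None, tweets_per_query=100):
    """Page backwards through search results until a page comes back empty.
    `fetch` stands in for api.search and must accept count, max_id and
    since_id keyword arguments and return dicts with an 'id' field."""
    collected = []
    max_id = None
    while True:
        page = fetch(count=tweets_per_query, max_id=max_id, since_id=since_id)
        if not page:
            return collected
        collected.extend(page)
        # Next page: only tweets strictly older than the oldest one seen.
        max_id = min(tweet['id'] for tweet in page) - 1
```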



&lt;p&gt;To store the data I'm using a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.CreateTables.html" rel="noopener noreferrer"&gt;DynamoDB table&lt;/a&gt;. As this query may return tweets that we already have, we check if a tweet already exists before writing it. We could just overwrite the tweets, but this would lead to higher DynamoDB spending. As a rule of thumb, &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/" rel="noopener noreferrer"&gt;a read costs&lt;/a&gt; 1/5th of a write operation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for tweet in new_tweets:
    existing_tweet = table.get_item(Key={'id': tweet._json['id']}).get('Item', None)
    if existing_tweet is None:
        table.put_item(
            Item={
                    'id': tweet._json['id'],
                    'created_at': tweet._json['created_at'],
                    'text': tweet._json['text'],
                    'query': search_query
                }
            )
        count += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you deploy, make sure to fill in the configuration. The config is documented in the &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;readme&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally deploy the app with &lt;code&gt;sls deploy&lt;/code&gt; and let the tweet collection begin. Keeping in mind that the schedule only fires every 10 minutes, check the logs for errors if no tweets arrive. The most likely errors are missing AWS permissions or bad Twitter keys. The first new tweet in our table shows that our collector is up and running!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;START RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919 Version: $LATEST
Downloaded 100 tweets
Saved tweets: 17
END RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919
REPORT RequestId: 6aad9d8e-aa2c-422c-a471-6c7f9254c919  Duration: 1121.61 ms    Billed Duration: 1200 ms    Memory Size: 1024 MB    Max Memory Used: 98 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;To understand the &lt;strong&gt;yearly&lt;/strong&gt; cost of this stack, we will look at two parts: The Lambda function and the DynamoDB table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda
&lt;/h3&gt;

&lt;p&gt;The collector function runs once every 10 minutes. This results in 6x24x365 = 52,560 invocations per year. &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;Lambda charges&lt;/a&gt; $0.20 per 1M requests, so we're looking at about $0.01 per year. Additionally Lambda charges $0.0000166667 for every GB-second. A GB-second is a Lambda with 1024MB RAM running for one second. When our function runs for 2 seconds every 10 minutes, it will use 2x6x24x365 = 105,120 GB-seconds. That's another $1.75 per year.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/free/" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; is likely to cover all of that.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://aws.amazon.com/dynamodb/pricing/on-demand/" rel="noopener noreferrer"&gt;on-demand pricing&lt;/a&gt; DynamoDB charges $1.25 per million write request units and $0.25 per million read request units. This means that the worst case for spending is all new tweets and none that we can skip. With 100 new tweets every 10 minutes, we're looking at a total of 100x6x24x365 = 5,256,000 WCUs or $6.57 per year.&lt;/p&gt;

&lt;p&gt;If you exceed the free 25GB per month, then DynamoDB will charge you $0.25 for every additional GB. My table with 1.5m tweets weighs ~270MB.&lt;/p&gt;

&lt;p&gt;You can additionally lower the cost by switching to provisioned mode, where DynamoDB offers &lt;a href="https://aws.amazon.com/dynamodb/pricing/provisioned/" rel="noopener noreferrer"&gt;25 WCUs and 25 RCUs for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Total
&lt;/h3&gt;

&lt;p&gt;In total that's roughly $0.01 + $1.75 + $6.57 = $8.33 per year.&lt;/p&gt;

&lt;p&gt;Note that CloudWatch will charge you too, should you exceed the free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Collecting data is fairly simple and very cheap. But will it be the same if we monitor 100 companies? Feel free to test that by using the source code on &lt;a href="https://github.com/bahrmichael/twitter-sentiment-analyzer" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/michabahr/stock-sentiment-analysis-part-2-analysing-the-sentiment-28ig"&gt;part 2&lt;/a&gt; of this article we will use sentiment analysis to understand if a tweet is positive, neutral or negative.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>python</category>
    </item>
    <item>
      <title>How to analyse and aggregate data from DynamoDB </title>
      <dc:creator>Michael Bahr</dc:creator>
      <pubDate>Sun, 02 Feb 2020 20:56:24 +0000</pubDate>
      <link>https://dev.to/michabahr/how-to-analyse-and-aggregate-data-from-dynamodb-24p3</link>
      <guid>https://dev.to/michabahr/how-to-analyse-and-aggregate-data-from-dynamodb-24p3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was first published on &lt;a href="https://bahr.dev"&gt;bahr.dev&lt;/a&gt;. &lt;a href="https://dev.us19.list-manage.com/subscribe/post?u=60149d3a4251e09f826818ef8&amp;amp;id=ad766562ce"&gt;Signup for the mailing list&lt;/a&gt; and get new articles straight to your inbox!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DynamoDB is not a database designed to run analysis queries with. We can, however, use DynamoDB streams and Lambda functions to run these analyses each time the data changes.&lt;/p&gt;

&lt;p&gt;This article explains how to build an analysis pipeline and demonstrates it with two examples. You should be familiar with DynamoDB tables and AWS Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pipeline setup
&lt;/h2&gt;

&lt;p&gt;Assuming we already have a DynamoDB table, there are two more parts we need to set up: A DynamoDB stream and a Lambda function. The stream emits changes such as inserts, updates and deletes.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Stream
&lt;/h3&gt;

&lt;p&gt;To set up the DynamoDB stream, we'll go through the AWS management console. Open the settings of your table and click the button called "Manage Stream". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---KUiux0R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewrxc6sikqz79ehlcx0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---KUiux0R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewrxc6sikqz79ehlcx0f.png" alt="Stream details"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default you can go with "New and old images", which will give you the most data to work with. Once you've enabled the stream, you can copy its ARN, which we will use in the next step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attach a Lambda Function
&lt;/h3&gt;

&lt;p&gt;When you work with the &lt;a href="https://serverless.com/framework/docs/providers/aws/events/streams/"&gt;serverless framework&lt;/a&gt;, you can simply set the stream as an event source for your function by adding the ARN as a &lt;code&gt;stream&lt;/code&gt; in the &lt;code&gt;events&lt;/code&gt; section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions:
  analysis:
    handler: analysis.handle
    events:
      - stream: arn:aws:dynamodb:us-east-1:xxxxxxx:table/my-table/stream/2020-02-02T20:20:02.002
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy the changes with &lt;code&gt;sls deploy&lt;/code&gt; and your function is ready to process the incoming events. It's a good idea to start by just printing the data from DynamoDB and then building your function around that input.&lt;/p&gt;
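&lt;p&gt;A minimal first iteration of that idea could look like this. The handler below is an illustrative sketch; the return value is just there to make the function easy to inspect:&lt;/p&gt;

```python
def handle(event, context):
    """First iteration: only log what the stream delivers, then build
    the real analysis around the printed structure."""
    records = event.get('Records', [])
    for record in records:
        print(record.get('eventName'), record.get('dynamodb', {}).get('Keys'))
    return {'processed': len(records)}
```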

&lt;h3&gt;
  
  
  Data Design
&lt;/h3&gt;

&lt;p&gt;With DynamoDB it's super important to think about your data access patterns first, or you'll have to rebuild your tables many more times than necessary. Also watch Rick Houlihan's fantastic talks on design patterns for DynamoDB from &lt;a href="https://www.youtube.com/watch?v=HaEPXoXVf2k"&gt;re:Invent 2018&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=6yqfmXiZTlM"&gt;re:Invent 2019&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 1: Price calculation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MG99-wYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://web.ccpgamescdn.com/newssystem/media/70713/1/ACSENSION_HEADER_B.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MG99-wYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://web.ccpgamescdn.com/newssystem/media/70713/1/ACSENSION_HEADER_B.jpg" alt="EVE Online"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In EVE Online's player-driven economy, items can be traded through contracts. A hobby project of mine uses &lt;a href="https://esi.evetech.net/"&gt;EVE Online's API&lt;/a&gt; to get information about item exchange contracts in order to calculate prices for these items. It collected more than 1.5 million contracts over the last year and derived prices for roughly 7000 items.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-Processing
&lt;/h3&gt;

&lt;p&gt;To build an average price, we need more than one price point. For this reason the single contract we receive is not enough; we need all the price points for an item.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  {
    "contract_id": 152838252,
    "date_issued": "2020-01-05T20:47:40Z",
    "issuer_id": 1273892852,
    "price": 69000000,
    "location_id": 60003760,
    "contract_items": [2047]
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the API's response is not in an optimal format, we have to do some pre-processing to eliminate unnecessary information and put key information into the table's primary and sort keys. Remember that table scans can get expensive and smaller entries mean more records per query result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|type_id (pk)|date (sk)           |price   |location|
|2047        |2020-01-05T20:47:40Z|69000000|60003760|

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case I decided to use the item's ID (e.g. 2047) as the primary key and the date as the sort key. That way my analyser can pick all the records for one item and limit them to the most recent entries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;The attached Lambda function receives an event from the stream. This event contains, among other things, the item's ID for which it should calculate a new price. Using this ID the function queries the pre-processed data and receives a list of items from which it can calculate averages and other valuable information.&lt;/p&gt;

&lt;p&gt;Attention: Don't do a scan here! It will get expensive quickly. Design your data so that you can use queries.&lt;/p&gt;

&lt;p&gt;The aggregated result is persisted in another table from which we can source a pricing API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;Without further adjustment the analysis function will get linearly slower the more price points are available. It can however limit the number of price points it loads. By scanning the date in the sort key backwards we load only the latest, most relevant entries. Based on our requirements we can then decide to load only one or two pages, or opt for the most recent 1000 entries. This way we can enforce an upper bound on the runtime per item.&lt;/p&gt;
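&lt;p&gt;As a sketch, the bounded price derivation boils down to a pure function; the newest-first ordering would come from querying the date sort key backwards, e.g. with &lt;code&gt;ScanIndexForward=False&lt;/code&gt;. The function name and shape are mine, not the project's actual code:&lt;/p&gt;

```python
def derive_price(price_points, limit=1000):
    """Average only the most recent price points. `price_points` is
    expected newest-first, as returned by a backwards query on the
    date sort key. Returns None when there is nothing to average."""
    recent = price_points[:limit]
    if not recent:
        return None
    return sum(point['price'] for point in recent) / len(recent)
```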

&lt;h2&gt;
  
  
  Example 2: Leaderboard
&lt;/h2&gt;

&lt;p&gt;Another example based on a &lt;a href="https://twitter.com/dm_macs/status/1223925884152950784"&gt;twitter discussion&lt;/a&gt; is about a leaderboard. In the German soccer league Bundesliga the club from Cologne won 4:0 against the club from Freiburg today. This means that Cologne gets three points while Freiburg gets zero. Loading all the matches and then calculating the ranking on the fly will lead to bad performance once we get deeper into the season. That's why we should again use streams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SOjvo-35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3cmosdm5iz0knxk44atx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SOjvo-35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3cmosdm5iz0knxk44atx.png" alt="Analysis Pipeline"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Design
&lt;/h3&gt;

&lt;p&gt;We will assume that our first table holds raw data in the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|match_id (sk)|first_party|second_party|score|
|Bundesliga |1            |Cologne    |Freiburg    |4:0  |
|Bundesliga |2            |Hoffenheim |Bayer       |2:1  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We design the leaderboard table in a format where we can store multiple leagues and paginate over the participants. We're going with a composite sort key, as we want the database to sort the leaderboard first by score, then by the number of goals scored and finally by the club's name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|score#goals#name (sk)|score|goals_shot|name (GSI)|
|Bundesliga |003#004#Cologne      |3    |4         |Cologne   |
|Bundesliga |003#002#Hoffenheim   |3    |2         |Hoffenheim|
|Bundesliga |000#001#Bayer        |0    |1         |Bayer     |
|Bundesliga |000#000#Freiburg     |0    |0         |Freiburg  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the sort key (&lt;code&gt;sk&lt;/code&gt;) is a string, we have to zero-pad the numbers. Sorting strings that contain numbers won't give the same result as sorting plain numbers. Choose the padding wisely and opt for a couple of orders of magnitude more than you expect the score to reach. Note that this approach won't work well if your scores can grow indefinitely. If you have a solution to that, please share it and I'll reference you here!&lt;/p&gt;
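&lt;p&gt;A small helper for building such keys, using the 3-digit padding from the example table:&lt;/p&gt;

```python
def leaderboard_sort_key(score, goals, name, width=3):
    """Zero-pad the numeric parts so that lexicographic string order
    matches numeric order. Without padding, '9' would sort after '10'."""
    return f"{score:0{width}d}#{goals:0{width}d}#{name}"
```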

&lt;p&gt;We're also adding a &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html"&gt;GSI&lt;/a&gt; on the club's name to have better access to a single club's leaderboard entry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;Each time a match result is inserted into the first table, the stream will fire an event for the analysis function. This entry contains the match and its score, from which we can derive who gets how many points.&lt;/p&gt;

&lt;p&gt;Based on the clubs' names, we can load the old leaderboard entries. We use these entries to first delete the existing records, then take the existing scores and goals, add the new ones and write the new leaderboard records.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|league (pk)|score#goals#name (sk)    |score|goals_shot|name (GSI)|
|Bundesliga |006#005#Cologne          |6    |5         |Cologne   |
|Bundesliga |003#005#Bayer            |3    |5         |Bayer     |
|Bundesliga |003#002#Hoffenheim       |3    |2         |Hoffenheim|
|Bundesliga |000#000#Freiburg         |0    |0         |Freiburg  |
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
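&lt;p&gt;The read-delete-rewrite cycle can be sketched in memory like this, with a plain dict standing in for the leaderboard table and Bundesliga scoring assumed (3 points for a win, 1 each for a draw). This is an illustration of the update logic, not the actual table code:&lt;/p&gt;

```python
def apply_match(leaderboard, home, away, home_goals, away_goals):
    """Look up both clubs, drop their old records and write updated ones.
    `leaderboard` maps club name to {'score': ..., 'goals_shot': ...}."""
    if home_goals == away_goals:
        points = {home: 1, away: 1}
    elif home_goals > away_goals:
        points = {home: 3, away: 0}
    else:
        points = {home: 0, away: 3}
    for club, goals in ((home, home_goals), (away, away_goals)):
        # Delete the old record (pop), then write the updated one.
        old = leaderboard.pop(club, {'score': 0, 'goals_shot': 0})
        leaderboard[club] = {
            'score': old['score'] + points[club],
            'goals_shot': old['goals_shot'] + goals,
        }
    return leaderboard
```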



&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;As each match results in one or two queries and one or two updates, the time to update the score stays limited.&lt;/p&gt;

&lt;p&gt;When we display the leaderboard it is a good idea to use pagination. That way the user sees an appropriate amount of data and our requests have a limited runtime as well.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
