<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tony Chan</title>
    <description>The latest articles on DEV Community by Tony Chan (@toeknee123).</description>
    <link>https://dev.to/toeknee123</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F105852%2Ffc5a7b1c-e118-4000-898c-88fc835722fc.jpeg</url>
      <title>DEV Community: Tony Chan</title>
      <link>https://dev.to/toeknee123</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/toeknee123"/>
    <language>en</language>
    <item>
      <title>Basics of AWS Tags &amp; Terraform with S3 - Part 1</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Fri, 11 Mar 2022 14:57:20 +0000</pubDate>
      <link>https://dev.to/cloudforecast/basics-of-aws-tags-terraform-with-s3-part-1-577i</link>
      <guid>https://dev.to/cloudforecast/basics-of-aws-tags-terraform-with-s3-part-1-577i</guid>
      <description>&lt;p&gt;Managing &lt;a href="https://aws.amazon.com/"&gt;AWS resources&lt;/a&gt; can be an extremely arduous process. AWS doesn't have logical resource groups and other niceties that Azure and GCP have. This nonwithstanding, AWS is still far and away the most popular cloud provider in the world. Therefore, it's still very important to find ways to organize your resources effectively.&lt;/p&gt;

&lt;p&gt;One of the most important ways to organize and filter your resources is by using &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html"&gt;AWS tags.&lt;/a&gt; While tagging can be a tedious process, Terraform can help ease the pain by providing &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/guides/resource-tagging"&gt;several ways to tag&lt;/a&gt; your AWS resources. In this blog and accompanying video series, we're going to take a look at various methods and strategies to tag your resources and keep them organized efficiently.&lt;/p&gt;

&lt;p&gt;These posts are written so that you can follow along. You will just need an environment that has access to the AWS API in your region. I typically use &lt;a href="https://aws.amazon.com/cloud9/"&gt;AWS Cloud9&lt;/a&gt; for this purpose, but any environment with access will do.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/-U6k0eQSVfc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub repo:&lt;/strong&gt; &lt;a href="https://github.com/CloudForecast/aws-tagging-with-terraform"&gt;https://github.com/CloudForecast/aws-tagging-with-terraform&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tag Blocks
&lt;/h2&gt;

&lt;p&gt;The first method we can use to tag resources is by using a basic tag block. Let's create a &lt;code&gt;main.tf&lt;/code&gt; file and configure an S3 bucket to take a look at this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure Terraform to use the AWS provider
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&amp;gt; 4.0"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Configure the AWS Provider
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region = "us-west-2"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Create a random ID to prevent bucket name clashes
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "random_id" "s3_id" {
  byte_length = 2
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We utilize the &lt;code&gt;random_id&lt;/code&gt; resource:&lt;br&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/id"&gt;https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/id&lt;/a&gt;&lt;br&gt;
to add the entropy our bucket names need, ensuring we do not collide with the name of another S3 bucket (bucket names must be globally unique).&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an S3 Bucket w/ Terraform and Tag It
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "devops_bucket" {
  bucket = "devops-bucket-${random_id.s3_id.dec}"

  tags = {
    Env     = "dev"
    Service = "s3"
    Team    = "devops"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, let's run &lt;code&gt;terraform apply -auto-approve&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once the apply is finished, let's run &lt;code&gt;terraform console&lt;/code&gt; and then run &lt;code&gt;aws_s3_bucket.devops_bucket.tags&lt;/code&gt; to verify the tags:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; aws_s3_bucket.devops_bucket.tags
tomap({
  "Env" = "dev"
  "Service" = "s3"
  "Team" = "devops"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To exit the console, run &lt;code&gt;exit&lt;/code&gt; or press &lt;code&gt;ctrl+c&lt;/code&gt;. You can also run &lt;code&gt;terraform state show aws_s3_bucket.devops_bucket&lt;/code&gt; (note that &lt;code&gt;terraform state show&lt;/code&gt; takes a resource address, not an attribute path), run &lt;code&gt;terraform show&lt;/code&gt;, or simply scroll up through the apply output to see the tags.&lt;/p&gt;

&lt;p&gt;As you can see, AWS tags can be specified on AWS resources by utilizing a &lt;code&gt;tags&lt;/code&gt; block within a resource. This is a simple way to ensure each S3 bucket has tags, but it is in no way efficient. Tagging every resource in AWS like this is not only tedious and the complete opposite of the DRY (Don't Repeat Yourself) principle, but it's also avoidable to an extent!&lt;/p&gt;

&lt;h2&gt;
  
  
  Default AWS Tags &amp;amp; Terraform
&lt;/h2&gt;

&lt;p&gt;In order to specify deployment-wide tags, you can specify a &lt;code&gt;default_tags&lt;/code&gt; block within the provider block. This will allow you to specify fallback tags for any resource that has no tags defined. If, however, you do specify tags on a specific resource, those tags will take precedence. Let's take a look:&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Terraform to Create a Second S3 bucket
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "finance_bucket" {
  bucket = "cloudforecast-finance-${random_id.s3_id.dec)"

  tags = {
    Env = "dev"
    Service = "s3"
    Team = "finance"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once you have added the second bucket definition and saved the file, go ahead and apply the configuration with &lt;code&gt;terraform apply -auto-approve&lt;/code&gt;.&lt;br&gt;
Once you have applied, you can run &lt;code&gt;terraform console&lt;/code&gt; and access both buckets by their resource name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; aws_s3_bucket.devops_bucket.tags
tomap({
  "Env" = "dev"
  "Service" = "s3"
  "Team" = "devops"
})
&amp;gt; aws_s3_bucket.finance_bucket.tags
tomap({
  "Env" = "dev"
  "Service" = "s3"
  "Team" = "finance"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If we were to deploy tens, hundreds, or even thousands of resources, this would not be very efficient. Let's add default tags to make this more efficient:&lt;/p&gt;

&lt;h3&gt;
  
  
  Add Default AWS Tags w/ Terraform
&lt;/h3&gt;

&lt;p&gt;Within the &lt;code&gt;provider&lt;/code&gt; block of our configuration, add the default tag in order to assign both resources the &lt;code&gt;Env&lt;/code&gt; tag:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region = "us-west-2"

  default_tags {
    tags = {
      Env = "dev"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Remove Env tags w/ Terraform
&lt;/h3&gt;

&lt;p&gt;Now that we've added the default tags, let's remove the &lt;code&gt;Env&lt;/code&gt; tag from the AWS S3 buckets:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "devops_bucket" {
    bucket = "devops-bucket-${random_id.s3_id.dec}"

    tags = {
        Service = "s3"
        Team = "devops"
    }
}

resource "aws_s3_bucket" "finance_bucket" {
    bucket = "finance-bucket-${random_id.s3_id.dec}"

    tags = {
        Service = "s3"
        Team = "finance"
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run &lt;code&gt;terraform apply -auto-approve&lt;/code&gt; again and, once it's finished deploying,&lt;br&gt;
run &lt;code&gt;terraform console&lt;/code&gt;. Within the console, type the resource address of each S3 bucket and view the output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; aws_s3_bucket.devops_bucket.tags
tomap({
  "Service" = "s3"
  "Team" = "devops"
})
&amp;gt; aws_s3_bucket.finance_bucket.tags
tomap({
  "Service" = "s3"
  "Team" = "finance"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Do you notice something missing? Default tags are not displayed within the &lt;code&gt;tags&lt;/code&gt; attribute. Default tags are found within the &lt;code&gt;tags_all&lt;/code&gt; attribute, so re-run the previous commands with &lt;code&gt;tags_all&lt;/code&gt; replacing &lt;code&gt;tags&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; aws_s3_bucket.devops_bucket.tags_all
tomap({
  "Env" = "dev"
  "Service" = "s3"
  "Team" = "devops"
})
&amp;gt; aws_s3_bucket.finance_bucket.tags_all
tomap({
  "Env" = "dev"
  "Service" = "s3"
  "Team" = "finance"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There they are! Keep this in mind: if you are querying the state to perform actions based on tags, you will want to use the &lt;code&gt;tags_all&lt;/code&gt; attribute instead of &lt;code&gt;tags&lt;/code&gt; by itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tag Precedence
&lt;/h2&gt;

&lt;p&gt;Now, for one last quick test to see the tag precedence in action, let's add the &lt;code&gt;Env&lt;/code&gt; tag back to our finance bucket, but define it as &lt;code&gt;prod&lt;/code&gt; instead of &lt;code&gt;dev&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "finance_bucket" {
  bucket = "finance-bucket-${random_id.s3_id.dec}"

  tags = {
    Env = "prod"
    Service = "s3"
    Team    = "finance"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run &lt;code&gt;terraform apply -auto-approve&lt;/code&gt; again:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # aws_s3_bucket.finance_bucket will be updated in-place
  ~ resource "aws_s3_bucket" "finance_bucket" {
        id                                   = "finance-bucket-52680"
      ~ tags                                 = {
          + "Env"     = "prod"
            # (2 unchanged elements hidden)
        }
      ~ tags_all                             = {
          ~ "Env"     = "dev" -&amp;gt; "prod"
            # (2 unchanged elements hidden)
        }
        # (17 unchanged attributes hidden)
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notice the changes made, then run &lt;code&gt;terraform console&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; aws_s3_bucket.finance_bucket.tags_all
tomap({
  "Env" = "prod"
  "Service" = "s3"
  "Team" = "finance"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notice the &lt;code&gt;Env&lt;/code&gt; tag has now been changed to &lt;code&gt;prod&lt;/code&gt;, our updated value, overriding the default tags.&lt;/p&gt;

&lt;h3&gt;
  
  
  Destroy Resources
&lt;/h3&gt;

&lt;p&gt;Now, if you're ready, go ahead and destroy your resources!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;terraform destroy -auto-approve&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Alright, now that we have an idea of how to assign custom tags and default tags, join me for the next part of this series, where we dive deeper!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cloudforecast.io/blog/terraform-s3-bucket-aws-tags/"&gt;Original Post&lt;/a&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS DynamoDB Pricing and Cost Optimization Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Tue, 11 Jan 2022 17:20:15 +0000</pubDate>
      <link>https://dev.to/cloudforecast/aws-dynamodb-pricing-and-cost-optimization-guide-3ljm</link>
      <guid>https://dev.to/cloudforecast/aws-dynamodb-pricing-and-cost-optimization-guide-3ljm</guid>
      <description>&lt;p&gt;Amazon Web Services &lt;a href="https://aws.amazon.com/dynamodb/"&gt;DynamoDB&lt;/a&gt;, a NoSQL database service, is excellent for applications that require low-latency data access, such as web, mobile, IoT, and gaming apps. It improves the durability of an application by handling large amounts of data quickly and efficiently. Among its features are built-in caching and security and backup support for web applications. Because AWS DynamoDB supports ACID transactions it can be scaled at the enterprise level, allowing for the development of business-critical applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--txYt9fUk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/HEfwEEL.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--txYt9fUk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/HEfwEEL.png" alt="DynamoDB home" width="880" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS DynamoDB does offer a free tier, but beyond it you're billed based on six factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The amount of data you store&lt;/li&gt;
&lt;li&gt;The amount of data you read and write&lt;/li&gt;
&lt;li&gt;The amount of data you transfer&lt;/li&gt;
&lt;li&gt;The backup and restore operations you perform&lt;/li&gt;
&lt;li&gt;DynamoDB Streams usage&lt;/li&gt;
&lt;li&gt;The number of write request units replicated when using global tables&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This article will thoroughly explain DynamoDB's pricing structure and how to reduce your DynamoDB expenses by taking the above factors into account, so you can get the best performance at the lowest cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB Pricing
&lt;/h2&gt;

&lt;p&gt;DynamoDB can be extremely expensive to use. There are two pricing structures to choose from: provisioned capacity and on-demand capacity.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Provisioned Capacity
&lt;/h3&gt;

&lt;p&gt;In this Amazon DynamoDB pricing plan, you’re billed hourly for the capacity you provision (read and write capacity units). You can control costs by specifying the maximum amount of resources needed by each database table being managed. Provisioned capacity can dynamically adapt to increases in traffic, but only if you enable the autoscaling feature; otherwise it will not react to sudden changes in data traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB On-demand Pricing
&lt;/h3&gt;

&lt;p&gt;This plan is billed per request units (or read and write request units). You’re only charged for the requests you make, making this a truly serverless choice. This choice can become expensive when handling large production workloads, though. The on-demand capacity method is perfect for autoscaling if you’re not sure how much traffic to expect.&lt;/p&gt;

&lt;p&gt;Knowing which capacity best suits your requirements is the first step in optimizing your costs with DynamoDB. Here are some factors to consider before making your choice.&lt;/p&gt;

&lt;p&gt;You should use provisioned capacity when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have an idea of the maximum workload your application will have&lt;/li&gt;
&lt;li&gt;Your application’s traffic is consistent and does not require scaling (unless you enable the autoscaling feature, which costs more)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should use on-demand capacity when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re not sure about the workload your application will have&lt;/li&gt;
&lt;li&gt;You don’t know how consistent your application’s data traffic will be&lt;/li&gt;
&lt;li&gt;You only want to pay for what you use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can learn more about how costs are calculated for both pricing structures &lt;a href="https://aws.amazon.com/dynamodb/pricing/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Pricing Calculator
&lt;/h3&gt;

&lt;p&gt;There are a few options available to help estimate what you might pay for AWS DynamoDB. The best we've found is AWS's own &lt;a href="https://calculator.aws/#/createCalculator/DynamoDB"&gt;DynamoDB Pricing Calculator&lt;/a&gt;. With it, you can pick and choose DynamoDB features, the provisioned capacity, and read/write settings, and then get a clear estimate:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8xzqpvRv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/dynamodb-aws-pricing-calculator.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8xzqpvRv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/dynamodb-aws-pricing-calculator.png" alt="DynamoDB AWS Pricing Calculator" width="880" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB’s Read and Write Capacity
&lt;/h2&gt;

&lt;p&gt;The read capacity of a DynamoDB table indicates how much you can read from it. Read capacity units (RCUs) are used to measure the read capacity of a table. For an item up to 4 KB in size, one RCU equals one strongly &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html"&gt;consistent read&lt;/a&gt; per second or two eventually consistent reads per second.&lt;/p&gt;

&lt;p&gt;The write capacity of a DynamoDB table indicates how much you can write into it. One write capacity unit (WCU) represents one write per second for an item up to 1 KB in size.&lt;/p&gt;
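&lt;p&gt;To make those definitions concrete, here is a small worked example (a sketch that applies the documented rounding, where item size is rounded up to the next 4 KB for reads and the next 1 KB for writes; the function names are mine):&lt;/p&gt;

```python
import math

def rcus_needed(reads_per_second, item_size_kb, strongly_consistent=True):
    """One RCU is one strongly consistent read per second (or two
    eventually consistent reads) of an item up to 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)
    if strongly_consistent:
        return reads_per_second * units_per_read
    # Eventually consistent reads cost half as much, rounded up.
    return math.ceil(reads_per_second * units_per_read / 2)

def wcus_needed(writes_per_second, item_size_kb):
    """One WCU is one write per second of an item up to 1 KB."""
    return writes_per_second * math.ceil(item_size_kb)

# 100 strongly consistent reads/sec of 6 KB items: 2 units each = 200 RCUs
print(rcus_needed(100, 6))                              # 200
# The same reads, eventually consistent: 100 RCUs
print(rcus_needed(100, 6, strongly_consistent=False))   # 100
# 50 writes/sec of 3 KB items: 3 units each = 150 WCUs
print(wcus_needed(50, 3))                               # 150
```

&lt;p&gt;Note how switching to eventually consistent reads halves the RCU bill; it is often the easiest capacity saving if your application tolerates slightly stale data.&lt;/p&gt;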

&lt;h2&gt;
  
  
  DynamoDB Autoscaling
&lt;/h2&gt;

&lt;p&gt;Database behaviors can be tricky to measure, which makes scaling problematic. Underscaling your database can lead to catastrophe, while overscaling can lead to a waste of resources. The DynamoDB autoscaling functionality configures suitable read and write throughput to meet the request rate of your application. This means that when your workload changes, DynamoDB automatically adjusts and dynamically redistributes your database partitions to better fit changes in read throughput, write throughput, and storage.&lt;/p&gt;

&lt;p&gt;Autoscaling is the default capacity setting when you create a DynamoDB table in the console, and you can activate it on any existing table. In DynamoDB, you define autoscaling by specifying the minimum and maximum levels of read and write capacity, as well as the target utilization percentage. When the amount of consumed reads or writes exceeds the target utilization for two minutes in a row, the upper threshold alarm is triggered. The lower threshold alarm is triggered when traffic falls below the target utilization minus twenty percent for fifteen minutes in a row. When either alarm is triggered, the autoscaling process begins.&lt;/p&gt;
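&lt;p&gt;The alarm logic above can be sketched as a toy decision function (an illustration of the described thresholds, not AWS's actual implementation; the function name and per-minute sampling scheme are assumptions):&lt;/p&gt;

```python
def scaling_signal(utilization_pct_per_minute, target_pct):
    """Toy model of the thresholds described above. The input is one
    consumed-capacity sample per minute (as a percent of provisioned
    capacity), newest last."""
    history = utilization_pct_per_minute
    # Upper alarm: consumption exceeds the target for 2 minutes in a row.
    if len(history) >= 2 and all(u > target_pct for u in history[-2:]):
        return "scale-up"
    # Lower alarm: consumption stays 20 points below the target for
    # 15 minutes in a row.
    if len(history) >= 15 and all(target_pct - u > 20 for u in history[-15:]):
        return "scale-down"
    return "hold"

print(scaling_signal([75, 82], 70))     # scale-up
print(scaling_signal([30] * 15, 70))    # scale-down
print(scaling_signal([60, 65], 70))     # hold
```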

&lt;h2&gt;
  
  
  Monitoring DynamoDB Resources
&lt;/h2&gt;

&lt;p&gt;Monitoring AWS resources such as DynamoDB for latency, traffic, errors, and saturation is called resource monitoring. It makes scaling your DynamoDB database easier, since it gives you the metrics you need, like network throughput, CPU utilization, or read/write operations. For example, suppose monitoring reveals that your database experienced a surge in traffic. This suggests that a large amount of data is being read from or written to the database, and you may decide to increase its read capacity to accommodate more read requests or its write capacity to accommodate more write requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB Cost Optimization
&lt;/h2&gt;

&lt;p&gt;Now that you know some of the factors behind how you are billed when using AWS DynamoDB, here are some suggestions for making these factors work in your favor and ensuring DynamoDB is cost-effective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Picking the Right Capacity
&lt;/h3&gt;

&lt;p&gt;You may already know which capacity structure you’d like to adopt, but keep these points in mind as you make your final choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB On-Demand Capacity Cost Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://hackernoon.com/understanding-the-scaling-behaviour-of-dynamodb-ondemand-tables-80d80734798f"&gt;Yan Cui's calculations&lt;/a&gt; suggest that on-demand tables are about five to six times more costly per request than provisioned tables. If your workload maintains consistent usage with no unexpected spikes, but you're unsure about future usage, consider using provisioned mode with autoscaling enabled.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Provisioned Capacity Cost Optimization
&lt;/h3&gt;

&lt;p&gt;If you use provisioned capacity and your capacity exceeds 100 units, consider purchasing &lt;a href="https://aws.amazon.com/blogs/aws/dynamodb-price-reduction-and-new-reserved-capacity-model/"&gt;reserved capacity&lt;/a&gt;. Compared to standard provisioned throughput pricing, reserved capacity delivers a seventy-six percent discount over a three-year term and a fifty-three percent discount over a one-year term.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding Unused DynamoDB Tables
&lt;/h3&gt;

&lt;p&gt;Unused DynamoDB tables are a waste of resources and unnecessarily raise your costs. You have two options for handling this. You can use the on-demand capacity mode to ensure you only pay for the tables you actually make read/write requests against. Alternatively, detect the unused tables and eliminate them. To do this, review the read/write operations on your tables: if a table has had no read/write activity in the last ninety days, it is an unused table.&lt;/p&gt;
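&lt;p&gt;Here is a hedged sketch of what that review might look like using CloudWatch's &lt;code&gt;ConsumedReadCapacityUnits&lt;/code&gt; and &lt;code&gt;ConsumedWriteCapacityUnits&lt;/code&gt; metrics (the helper names are mine, and the &lt;code&gt;boto3&lt;/code&gt; portion requires AWS credentials; treat it as a starting point, not production code):&lt;/p&gt;

```python
def is_unused(read_sums, write_sums):
    """A table whose consumed read and write capacity both sum to zero
    over the sampled window (e.g. the last ninety days) looks unused."""
    return sum(read_sums) == 0 and sum(write_sums) == 0

def find_unused_tables(days=90):
    """Query CloudWatch for each table's consumed capacity. Requires
    boto3 and AWS credentials; a sketch (it does not paginate
    list_tables, for example)."""
    from datetime import datetime, timedelta
    import boto3
    dynamodb = boto3.client("dynamodb")
    cloudwatch = boto3.client("cloudwatch")
    unused = []
    for table in dynamodb.list_tables()["TableNames"]:
        sums = {}
        for metric in ("ConsumedReadCapacityUnits",
                       "ConsumedWriteCapacityUnits"):
            resp = cloudwatch.get_metric_statistics(
                Namespace="AWS/DynamoDB",
                MetricName=metric,
                Dimensions=[{"Name": "TableName", "Value": table}],
                StartTime=datetime.utcnow() - timedelta(days=days),
                EndTime=datetime.utcnow(),
                Period=86400,          # one datapoint per day
                Statistics=["Sum"],
            )
            sums[metric] = [p["Sum"] for p in resp["Datapoints"]]
        if is_unused(sums["ConsumedReadCapacityUnits"],
                     sums["ConsumedWriteCapacityUnits"]):
            unused.append(table)
    return unused
```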

&lt;h3&gt;
  
  
  Reduce DynamoDB Backup Needs
&lt;/h3&gt;

&lt;p&gt;A custom backup pipeline that automatically copies and drops DynamoDB data can considerably raise your costs. This is due to the WCUs constantly expended on writing backup copies of the current DynamoDB tables and the RCUs expended when reading them back, while the backup tables themselves have no specified resource sizes and grow without bounds. Instead, use the native DynamoDB backup functionality, which does not consume your tables' read/write capacity, so you know how much backup storage you need and what it will cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using AWS Cheaper Regions
&lt;/h3&gt;

&lt;p&gt;Some &lt;a href="https://www.cloudsavvyit.com/2368/which-aws-region-should-you-choose/"&gt;AWS regions&lt;/a&gt; are more expensive than others. If you’re not concerned about your data location, choose the cheapest region you can get.&lt;/p&gt;

&lt;p&gt;The cheapest regions are us-east-1, us-east-2, and us-west-2, costing $0.25 per GB/month, $0.00065 per WCU/hour, and $0.00013 per RCU/hour.&lt;/p&gt;
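&lt;p&gt;Using those list prices, a quick back-of-the-envelope estimate for a provisioned-capacity table looks like this (a sketch; 730 hours approximates one month, and the helper name is mine):&lt;/p&gt;

```python
def monthly_cost_usd(storage_gb, wcus, rcus, hours=730):
    """Rough monthly cost in one of the cheapest regions, using the
    rates quoted above: $0.25 per GB-month, $0.00065 per WCU-hour,
    and $0.00013 per RCU-hour (provisioned capacity, before free tier)."""
    storage = storage_gb * 0.25
    writes = wcus * 0.00065 * hours
    reads = rcus * 0.00013 * hours
    return storage + writes + reads

# 25 GB stored with 10 WCUs and 10 RCUs provisioned:
print(round(monthly_cost_usd(25, 10, 10), 2))   # 11.94
```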

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Amazon DynamoDB can be an important tool for your software projects, but it can also be an expensive tool if you’re not careful. Following these tips to optimize your costs can help you keep your budget down and free you up to focus on other aspects of your business.&lt;/p&gt;

&lt;p&gt;This article was originally posted on the CloudForecast Blog: &lt;a href="https://www.cloudforecast.io/blog/dynamodb-pricing/"&gt;https://www.cloudforecast.io/blog/dynamodb-pricing/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>AWS EMR Cost Optimization Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Tue, 14 Dec 2021 17:19:06 +0000</pubDate>
      <link>https://dev.to/cloudforecast/aws-emr-cost-optimization-guide-3eba</link>
      <guid>https://dev.to/cloudforecast/aws-emr-cost-optimization-guide-3eba</guid>
      <description>&lt;p&gt;AWS EMR (&lt;a href="https://aws.amazon.com/emr/"&gt;Elastic MapReduce&lt;/a&gt;) is Amazon’s managed big data platform, which allows clients who need to process gigabytes or petabytes of data to create EC2 instances running the &lt;a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html"&gt;Hadoop Distributed File System (HDFS)&lt;/a&gt;. AWS generally bills storage and compute together inside instances, but AWS EMR allows you to scale them independently, so you can have huge amounts of data without necessarily requiring large amounts of compute. AWS EMR clusters integrate with a wide variety of &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html"&gt;storage options&lt;/a&gt;. The most common and cost-effective are Simple Storage Service (S3) buckets and HDFS. You can also integrate with dozens of other AWS services, including RDS, S3 Glacier, Redshift, and Data Pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g6K8iMjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/v2zzfi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g6K8iMjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/v2zzfi1.png" alt="EMR Data Store Options" width="512" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS EMR is powerful, but understanding pricing can be a challenge. Because the service has several unique features and extensively utilizes other AWS services, it’s easy to lose track of all the elements factored into your monthly spend. In this article, I’ll share an overview of AWS EMR’s pricing model, some tips for controlling your AWS EMR costs, and resources for monitoring your EMR spend. While it’s hard to generalize advice for EMR because each data warehouse is different, this article should give you a starting point for understanding how your use case will be priced by Amazon.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS EMR Cluster Pricing
&lt;/h2&gt;

&lt;p&gt;Most of the costs of running an AWS EMR cluster come from the utilization of other AWS resources, like EC2 instances and S3 storage. To run a job, an AWS EMR cluster must have at least one primary node and one core node. The EC2 instances in the cluster are &lt;a href="https://aws.amazon.com/emr/pricing/"&gt;charged by the minute&lt;/a&gt; based on instance size. Every instance in a cluster is created with an attached, ephemeral EBS volume with &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-custom-ami-boot-volume-size.html"&gt;10 GiB of provisioned space&lt;/a&gt; (instances without attached instance storage are &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html"&gt;given more&lt;/a&gt;) to hold HDFS data and any temporary data like caching or buffers. These volumes are &lt;a href="https://aws.amazon.com/ebs/pricing/"&gt;charged per GiB provisioned&lt;/a&gt; and prorated over the time the instance runs.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;data to process&lt;/strong&gt; and the &lt;strong&gt;data processing application&lt;/strong&gt; are stored in S3 buckets, where you're charged per gibibyte (GiB) per month. A job is &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/AddingStepstoaJobFlow.html"&gt;submitted&lt;/a&gt; to the EMR cluster via &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html"&gt;steps&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/interactive-jobs.html"&gt;a Hadoop job&lt;/a&gt;. You can also &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/making_api_requests.html"&gt;automate cluster launch&lt;/a&gt; using a service like &lt;a href="https://aws.amazon.com/blogs/big-data/automating-emr-workloads-using-aws-step-functions/"&gt;Step Functions&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-recurring.html"&gt;Data Pipeline&lt;/a&gt;, or &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/callback-task-sample-sqs.html"&gt;Lambda functions&lt;/a&gt;. To start the job, the &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html"&gt;EMR File System (EMRFS)&lt;/a&gt; retrieves data from S3 (adding GET request fees to the S3 bucket). Any buckets in a different region will also be charged per GiB for data transferred to the cluster.&lt;/p&gt;

&lt;p&gt;You can set the minimum and maximum number of EC2 instances your EMR cluster uses to help control your costs vs. availability. For example, this cluster uses &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html"&gt;managed scaling&lt;/a&gt;, and has the maximum number of non-primary &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html"&gt;nodes&lt;/a&gt; set to “3”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NKaoJh49--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/YLG7E5D.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NKaoJh49--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/YLG7E5D.png" alt="Cluster Scaling Options" width="880" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the job starts, EMR will monitor utilization and add nodes if needed. EMR managed scaling adds nodes in a specific order: &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-purchasing-options.html"&gt;On-Demand&lt;/a&gt; core nodes, On-Demand task nodes, &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html#emr-plan-spot-instances"&gt;Spot instance&lt;/a&gt; core nodes, and Spot instance task nodes. These additional instances also have costs and attached EBS volumes.&lt;/p&gt;

&lt;p&gt;Other data, like configurations for auto-scaling instances and log data can also be stored in S3 buckets. Once the job completes (or finishes a step), intermediate data can be stored in HDFS for more processing or written to an S3 bucket (with a PUT request fee, storage costs, and any cross regions data-transfer charges). At the end of the job, EMR will terminate idle instances (and attached EBS volumes) down to the minimum, while remaining instances wait for the next workload.&lt;/p&gt;

&lt;p&gt;In short, if your EMR cluster is sitting idle, waiting for data, it should scale down appropriately, but you'll still pay for storage costs during downtime. While putting a cap on the number of EC2 instances EMR uses helps you control your costs, your data warehouse might struggle if it's hit with a sudden spike.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS EMR Cost Optimization Tips
&lt;/h2&gt;

&lt;p&gt;Now that we've laid the groundwork for how pricing in EMR works, let's look at some of the levers you can pull to decrease your EMR costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prepare Your Data
&lt;/h3&gt;

&lt;p&gt;When you’re working on the petabyte scale, disorganized data can dramatically increase costs by increasing the amount of time it takes to find the data you intend to process. Good ways to improve the efficiency of your EMR cluster are data partitioning, compression, and formatting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data partitioning&lt;/strong&gt; is vital to ensure you’re not wading through an entire data lake to find the few lines of data you want to process, racking up bandwidth and compute costs in the process. You can partition data by carefully planning to &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html"&gt;use prefixes&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/selecting-content-from-objects.html"&gt;S3 Select&lt;/a&gt;. Or use a Hadoop tool like &lt;a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-metastore-external-hive.html"&gt;Hive&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html"&gt;Presto&lt;/a&gt;, or &lt;a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html"&gt;Spark&lt;/a&gt; in tandem with a metadata storage service like &lt;a href="https://aws.amazon.com/glue/pricing/"&gt;Glue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Partitioning by date is common and suits many tasks, but you can partition by any key. A daily partition could prevent an EMR cluster from requesting and scanning a week’s worth of data. Much like database indexing, some partitioning is extremely useful, but over-partitioning can hurt performance by forcing the primary node to track additional metadata and distribute many small files. When reading data, aim to keep partitions larger than 128 MB (the default HDFS &lt;a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hdfs-config.html"&gt;block size&lt;/a&gt;) to avoid the performance hit associated with loading many small files.&lt;/p&gt;
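&lt;p&gt;As a sketch of the idea, the snippet below builds Hive-style daily prefixes for a hypothetical &lt;code&gt;s3://my-data-lake/events&lt;/code&gt; bucket. A job that needs one day of data can then list a single prefix instead of scanning the whole week:&lt;/p&gt;

```python
from datetime import date, timedelta

def daily_prefixes(base, start, days):
    """Build Hive-style date partition prefixes (year=/month=/day=)."""
    out = []
    for i in range(days):
        d = start + timedelta(days=i)
        out.append(f"{base}/year={d.year}/month={d.month:02d}/day={d.day:02d}/")
    return out

# A job that needs one day of data lists exactly one prefix...
one_day = daily_prefixes("s3://my-data-lake/events", date(2021, 6, 1), 1)
print(one_day[0])  # s3://my-data-lake/events/year=2021/month=06/day=01/

# ...instead of scanning a week's worth of keys.
week = daily_prefixes("s3://my-data-lake/events", date(2021, 6, 1), 7)
print(len(week))  # 7
```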

&lt;p&gt;&lt;strong&gt;Data compression&lt;/strong&gt; has the obvious benefit of reducing storage space. It also saves on bandwidth for data passed in and out of your cluster. Hadoop can handle reading gzip, bzip2, LZO, and snappy compressed files without any &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/HowtoProcessGzippedFiles.html"&gt;additional configuration&lt;/a&gt;. Gzip is not splittable after compression, so it’s not as appealing as other compression formats. You can also configure EMR to &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-output-compression.html"&gt;compress the output&lt;/a&gt; of your job, saving bandwidth and storage in both directions.&lt;/p&gt;
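&lt;p&gt;You can get a feel for the savings with nothing but Python's standard library. The sample rows below are made up and highly repetitive, so the ratio is flattering; real ratios depend on your data:&lt;/p&gt;

```python
import gzip

# A repetitive, CSV-like payload (hypothetical sample data).
row = b"2021-06-01,user-123,page_view,/pricing\n"
raw = row * 10_000

compressed = gzip.compress(raw)
# Compressed size is a tiny fraction of the raw size for repetitive data.
print(len(raw), len(compressed))
```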

&lt;p&gt;&lt;strong&gt;Data formatting&lt;/strong&gt; is another place to make gains. When dealing with huge amounts of data, finding the data you need can take up a significant amount of your compute time. &lt;a href="https://parquet.apache.org/"&gt;Apache Parquet&lt;/a&gt; and &lt;a href="https://orc.apache.org/"&gt;Apache ORC&lt;/a&gt; are &lt;a href="https://aws.amazon.com/nosql/columnar/"&gt;columnar&lt;/a&gt; data formats optimized for analytics that pre-aggregate metadata about columns. If your EMR jobs run column-intensive aggregations like &lt;code&gt;sum&lt;/code&gt;, &lt;code&gt;max&lt;/code&gt;, or &lt;code&gt;count&lt;/code&gt;, you can see significant speed improvements by reformatting data &lt;a href="https://www.cloudforecast.io/blog/Athena-to-transform-CSV-to-Parquet/"&gt;like CSVs&lt;/a&gt; into one of these columnar formats.&lt;/p&gt;
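&lt;p&gt;Here's a toy illustration of why columnar formats help (plain Python, not Parquet itself): an aggregate over one column can skip the rest of each record, and formats like Parquet and ORC additionally store per-column statistics that can answer some queries from metadata alone:&lt;/p&gt;

```python
# Row-oriented: every whole record is read even when one field is needed.
rows = [
    {"user": "a", "amount": 10, "country": "US"},
    {"user": "b", "amount": 25, "country": "DE"},
    {"user": "c", "amount": 5,  "country": "US"},
]
total_row_oriented = sum(r["amount"] for r in rows)  # touches full records

# Column-oriented: each field is stored contiguously, and the format keeps
# Parquet/ORC-style statistics alongside the column.
columns = {
    "user": ["a", "b", "c"],
    "amount": [10, 25, 5],
    "country": ["US", "DE", "US"],
}
column_meta = {"amount": {"sum": 40, "max": 25, "count": 3}}

# An aggregate like sum can be answered from the metadata alone.
total_columnar = column_meta["amount"]["sum"]
assert total_row_oriented == total_columnar == 40
```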

&lt;h3&gt;
  
  
  Use the Right Instance Type
&lt;/h3&gt;

&lt;p&gt;Once your data is stored efficiently, you can turn your attention to optimizing how that data is processed. The EC2 instances EMR uses to process data and run the cluster are charged per second. The cost of EC2 instances scales with size, so doubling the size of an instance doubles the hourly cost, but the EMR overhead fee for a cluster sometimes remains fixed. For many instance families, the hourly EMR fee for an .8xlarge is the same as the hourly EMR fee for a .24xlarge machine. This means larger machines running many tasks are more cost-efficient, since they decrease the percentage of your budget spent on EMR overhead.&lt;/p&gt;
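&lt;p&gt;A quick back-of-the-envelope calculation makes the point. The prices and capacities below are illustrative placeholders, not current AWS rates:&lt;/p&gt;

```python
# Illustrative numbers only -- not current AWS prices.
emr_fee_per_hour = 0.27  # flat EMR fee charged per instance-hour
small = {"name": "8xlarge",  "ec2_per_hour": 2.0}
large = {"name": "24xlarge", "ec2_per_hour": 6.0}

def overhead_share(instance):
    """Fraction of the hourly bill that goes to the EMR fee."""
    total = instance["ec2_per_hour"] + emr_fee_per_hour
    return emr_fee_per_hour / total

# EC2 cost scales with size, but the EMR fee stays flat, so the
# overhead share of your bill shrinks on bigger machines.
print(f"{overhead_share(small):.1%}")  # 11.9%
print(f"{overhead_share(large):.1%}")  # 4.3%
```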

&lt;h3&gt;
  
  
  Choose Your AWS EC2 Pricing
&lt;/h3&gt;

&lt;p&gt;There are four options for purchasing EC2 instances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/ec2/pricing/on-demand/"&gt;&lt;strong&gt;On-Demand instances&lt;/strong&gt;&lt;/a&gt; can be started or shut down at any time with no commitment and are the most expensive. The upside is that they'll always be available and can't be taken away (like spot instances can).&lt;/li&gt;
&lt;li&gt;One- and three-year &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-reserved-instances.html"&gt;&lt;strong&gt;reserved instances&lt;/strong&gt;&lt;/a&gt; are On-Demand EC2 instances you reserve in exchange for discounts of 40% to 70%, but you're making a long-term commitment to a specific &lt;a href="https://aws.amazon.com/ec2/instance-types/"&gt;instance family&lt;/a&gt; within a specific region.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/savingsplans/"&gt;Savings Plans&lt;/a&gt; are a slightly more flexible version of reserved instances. You still commit to purchase a certain amount of computer for a one or three year term, but you can choose to change instance family and region. This contract for compute can also be applied to AWS Fargate and AWS Lambda usage.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/ec2/spot/use-case/emr/"&gt;&lt;strong&gt;Spot instances&lt;/strong&gt;&lt;/a&gt; allow clients to purchase unused EC2 capacity, with discounts that can reach 90% and are tied to demand over time. The downside is that spot instances could be claimed back at any time, so they aren't appropriate for most long-running jobs.&lt;/li&gt;
&lt;/ul&gt;
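&lt;p&gt;To put those discount ranges in monthly terms, here's a rough comparison using a hypothetical $1.00/hour instance (real rates vary by instance type and region):&lt;/p&gt;

```python
# Hypothetical on-demand rate; discounts reflect the ranges above.
on_demand = 1.00  # $/hour, illustrative only
options = {
    "on-demand": on_demand,
    "reserved (55% off, mid-range)": on_demand * (1 - 0.55),
    "spot (90% off, best case)": on_demand * (1 - 0.90),
}
hours = 24 * 30  # one month of continuous use

for name, rate in options.items():
    print(f"{name}: ${rate * hours:,.2f}/month")
# on-demand: $720.00/month, reserved: $324.00/month, spot: $72.00/month
```

The spread explains the common pattern of reserving a baseline of always-on capacity and filling spikes with Spot.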

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a3fXBTpc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/HXB20g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a3fXBTpc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/HXB20g5.png" alt="The Minimum Necessary Cluster" width="880" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best way to determine &lt;strong&gt;which instance type to use&lt;/strong&gt; is by testing your application in EMR while monitoring your cluster through the &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-clusters.html"&gt;EMR management console&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html"&gt;log files&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html"&gt;CloudWatch metrics&lt;/a&gt;. You want to be fully utilizing as much of your EMR system as possible, making sure you don’t have large amounts of compute idling while ensuring you can reliably hit your SLAs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oI_kifZz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/bxzFc4D.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oI_kifZz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/bxzFc4D.png" alt="Cluster Console Metrics" width="880" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use the Right Number of Primary, Core, and Task Nodes
&lt;/h3&gt;

&lt;p&gt;There are &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html"&gt;three types of nodes&lt;/a&gt; in an EMR cluster. It's important to understand what they do so you can devote the right number of instances to each of these types.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;primary node&lt;/strong&gt; (there can only be one or &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha-launch.html"&gt;three&lt;/a&gt; running) manages the cluster and tracks the health of nodes by running the YARN Resource Manager and the HDFS Name Node Service. These machines don’t run tasks and can be smaller than other nodes (the exception is if your cluster runs &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-concurrent-steps.html"&gt;multiple steps in parallel&lt;/a&gt;). Having three primary nodes gives you redundancy in case one goes down, but you will obviously pay three times as much for the peace of mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core nodes&lt;/strong&gt; run tasks and speak to HDFS by running the HDFS DataNode Daemon and the YARN Node Manager service. These are the workhorses of any EMR cluster and can be scaled up or down as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task nodes&lt;/strong&gt; don’t know about HDFS and only run the YARN Node Manager service. They are best suited for parallel computation, like Hadoop MapReduce tasks and Spark executors. Because they can be reclaimed without risking losing data stored in HDFS, they are ideal candidates to become Spot instances.&lt;/p&gt;

&lt;p&gt;While EMR handles the scaling up and down of core and task nodes, you have the ability to set minimums and maximums. If your maximum is too low, large jobs might back up and take a long time to run. If your minimum is too low, spikes in data take longer as more instances ramp up. On the flip side, if your maximum is too high, an error in your data pipeline could lead to huge cost increases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instance Configuration
&lt;/h3&gt;

&lt;p&gt;Once you've tested and selected the appropriate instance types, sizes, and number of nodes, you have to make a configuration decision. You can deploy an &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html"&gt;instance fleet&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-uniform-instance-group.html"&gt;uniform instance groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0i1fXTXP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/pqNkNQn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0i1fXTXP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/pqNkNQn.png" alt="EMR Groups or Fleets?" width="880" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instance fleets are flexible and designed to utilize Spot instances effectively. When creating an instance fleet, you specify up to five instance types, a range of availability zones (avoid saving a few cents on instances only to spend them transferring data between zones), a target for Spot and On-Demand instances, and a maximum price you’d pay for a Spot instance. When the fleet launches, EMR provisions instances until your targets are met.&lt;/p&gt;

&lt;p&gt;You can set a provisioning timeout, which allows you to terminate the cluster or switch to On-Demand instances if no Spot instances are available. Instance fleets also support &lt;a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#fixed-duration-spot-instances"&gt;Spot instances for predefined durations&lt;/a&gt;, allowing your cluster to confidently access a Spot instance for 1 to 6 hours.&lt;/p&gt;

&lt;p&gt;Running all your EMR clusters on Spot instances would be great for your budget but would leave you with a system you can’t always count on to process work promptly. Evaluate your requirements and plan for more expensive, more reliable long-running instances to ensure you meet your SLAs, while adding cheaper, less reliable Spot instances to handle spikes in demand.&lt;/p&gt;

&lt;p&gt;Uniform instance groups are more targeted, requiring you to specify a single instance size and decide between On-Demand and Spot instances before launching. Instance groups are perfect for &lt;strong&gt;tasks that are well understood and need a concrete, consistent amount of resources&lt;/strong&gt;. Instance fleets are great for grabbing Spot instances where possible while allowing the cluster to fall back to On-Demand instances if needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling AWS EMR Clusters
&lt;/h3&gt;

&lt;p&gt;AWS EMR clusters are big, powerful, and expensive. EMR utilization also often comes in peaks and valleys, making &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-scale-on-demand.html"&gt;scaling&lt;/a&gt; your cluster a good cost-saving option when handling usage spikes. Instance fleets and uniform instance groups can both use &lt;a href="https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/"&gt;&lt;strong&gt;EMR Managed Scaling&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This scaling service automatically adds nodes when utilization is high and removes them when it decreases. Unfortunately, it's only available for applications that use Yet Another Resource Negotiator (YARN) (sorry, &lt;a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto.html"&gt;Presto&lt;/a&gt;). If you're running an instance group, you can also &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html"&gt;specify your own scaling policy&lt;/a&gt; using a CloudWatch metric and other parameters you define. This gives you more fine-grained control over scaling, but it's obviously more complicated to set up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terminate AWS EMR clusters
&lt;/h3&gt;

&lt;p&gt;Another fundamental decision you'll have to make about every EMR cluster you spin up is whether it should &lt;strong&gt;terminate&lt;/strong&gt; after running its job or &lt;strong&gt;keep running&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DbQ9HMOK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/gQtfyEI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DbQ9HMOK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/gQtfyEI.png" alt="Whether 'tis nobler to suffer the EC2 instance costs" width="865" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Terminating clusters after running jobs is great for saving money - you no longer pay for an instance or its attached storage - but auto-terminating clusters also has drawbacks. Any data in the HDFS is &lt;strong&gt;lost forever upon cluster termination&lt;/strong&gt;, so you will have to write stateless jobs that rely on a metadata store in S3 or Glue.&lt;/p&gt;

&lt;p&gt;Auto-termination is also inefficient when running many small jobs. It generally takes less than 15 minutes for a cluster to get provisioned and start processing data, but if a job takes 5 minutes to run and you’re running 10 in a row, auto-termination quickly takes a toll.&lt;/p&gt;
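&lt;p&gt;A simplified model of that overhead, using the numbers from the scenario above (real billing also depends on idle time and per-second granularity):&lt;/p&gt;

```python
provision_minutes = 15  # typical cluster spin-up time
job_minutes = 5
jobs = 10

# Auto-terminating after each job: pay the provisioning tax every time.
auto_terminate = jobs * (provision_minutes + job_minutes)

# Long-running cluster: provision once, then run jobs back to back
# (idle time between jobs excluded for simplicity).
long_running = provision_minutes + jobs * job_minutes

print(auto_terminate, long_running)  # 200 vs 65 minutes of cluster time
```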

&lt;p&gt;To get the most out of long-running clusters, try to smooth out utilization over time. Scatter jobs throughout the day, and try to draw all the EMR users in your organization to the cluster to fill gaps in its schedule. Long-running, large clusters can be quite cost-effective: instance costs decrease relative to size, and the EMR fees on top often don’t increase with instance size, increasing the relative value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring AWS EMR Costs
&lt;/h2&gt;

&lt;p&gt;Now that your AWS EMR cluster has instances scaling smoothly while reading beautifully compressed and formatted data, check &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/"&gt;Cost Explorer&lt;/a&gt; to track your cost reduction progress.&lt;/p&gt;

&lt;p&gt;Cost Explorer gives you a helpful map of what’s running in your organization but requires some attention to yield the greatest gains. &lt;a href="https://www.cloudforecast.io/blog/aws-tagging-best-practices/"&gt;Tag your resources&lt;/a&gt;, giving each cluster an owner and business unit to attribute your costs to. Tagging is especially important for EMR because it relies on so many other Amazon services. It can be really hard to differentiate between the many resources used by your EMR cluster when they're in the same AWS account as your production application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other AWS EMR Resources
&lt;/h2&gt;

&lt;p&gt;If you're looking for more practical articles on this subject, another solid resource is the &lt;a href="https://medium.com/teads-engineering/reducing-aws-emr-data-processing-costs-7c12a8df6f2a"&gt;Teads engineering blog&lt;/a&gt;. In it, &lt;a href="https://medium.com/@wassimmaaoui"&gt;Wassim Almaaoui&lt;/a&gt; describes three measures that helped his team significantly lower their data processing costs on EMR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run workloads on Spot instances.&lt;/li&gt;
&lt;li&gt;Leverage the EMR pricing of bigger EC2 instances.&lt;/li&gt;
&lt;li&gt;Automatically detect idle clusters (and terminate them ASAP).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check this article out when you get a chance to supplement this one: &lt;a href="https://medium.com/teads-engineering/reducing-aws-emr-data-processing-costs-7c12a8df6f2a"&gt;Reducing AWS EMR data processing costs&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowing Your Requirements, Meeting Your Goals
&lt;/h2&gt;

&lt;p&gt;In this post, you've learned how EMR pricing works and what you can do to minimize and track your EMR costs. It's possible to grow your EMR infrastructure while controlling costs, but it will likely take a deep understanding of your data processing requirements and a little trial and error. EMR can eat up huge amounts of storage, so be sure you partition, compress, and format data to reduce your storage costs. Categorize jobs according to priority, schedule, and resource requirements, then nestle them into the right instance types. Consider cluster termination for jobs that run only occasionally and auto-scale long-running instances when appropriate.&lt;/p&gt;

&lt;p&gt;If this is overwhelming, &lt;a href="https://www.cloudforecast.io/"&gt;CloudForecast&lt;/a&gt; can help. Reach out to our CTO, &lt;a href="mailto:francois@cloudforecast.io"&gt;Francois (francois@cloudforecast.io)&lt;/a&gt;, if you’d like help implementing a long-term cost-reduction strategy for EMR.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on the CloudForecast Blog: &lt;a href="https://www.cloudforecast.io/blog/aws-emr-cost-optimization-guide/"&gt;AWS EMR Cost Optimization Guide&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Node Exporter and Kubernetes Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Wed, 10 Nov 2021 17:02:48 +0000</pubDate>
      <link>https://dev.to/cloudforecast/node-exporter-and-kubernetes-guide-1070</link>
      <guid>https://dev.to/cloudforecast/node-exporter-and-kubernetes-guide-1070</guid>
      <description>&lt;p&gt;Monitoring is essential to a reliable system. It helps keep your services consistent and available by preemptively alerting you to important issues. In legacy (non-Kubernetes) systems, monitoring is simple. You only need to set up dashboards and alerts on two components: the application and the host. But when it comes to Kubernetes, monitoring is significantly more challenging.&lt;/p&gt;

&lt;p&gt;In this guide, we explore the challenges associated with Kubernetes monitoring, show how to set up a monitoring and alert system for your Kubernetes clusters using Grafana and Prometheus, compare the pricing of four different hosted monitoring systems, and suggest some key metrics you can set up as alerts in your system.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Is Kubernetes Monitoring Different From Traditional Systems?
&lt;/h2&gt;

&lt;p&gt;Kubernetes is highly distributed, being composed of several different nested components. You need to monitor your application and hosts, as well as your containers and clusters. Kubernetes also has an additional layer of complexity—automated scheduling.&lt;/p&gt;

&lt;p&gt;The scheduler manages your workloads and resources optimally, creating a moving target. As the architect, you can’t be certain of the identity or number of pods running on your nodes. You can either manually schedule your pods (not recommended) or deploy a robust tagging system alongside logging. Tagging allows you to collect information from your clusters, which exposes the metrics to an endpoint for a service to scrape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Node Exporter by Prometheus
&lt;/h2&gt;

&lt;p&gt;The most popular service that tags and exports metrics in Kubernetes is Node Exporter by &lt;a href="https://prometheus.io/"&gt;Prometheus&lt;/a&gt;, an open-source service that installs through a single static binary. Node Exporter monitors a host by exposing its hardware and OS metrics, which Prometheus then pulls.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install Node Exporter
&lt;/h2&gt;

&lt;p&gt;To monitor your entire deployment, you’ll need a node exporter running on each node; this can be configured through a &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/"&gt;DaemonSet&lt;/a&gt;. Prometheus has a &lt;a href="https://github.com/prometheus-operator/kube-prometheus#quickstart"&gt;good quick start resource&lt;/a&gt; on this in their public repo.&lt;/p&gt;

&lt;p&gt;You can use Helm, the Kubernetes package manager, to install Prometheus in one line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install prometheus-operator stable/prometheus-operator --namespace monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Alternatively, you can &lt;code&gt;wget&lt;/code&gt; a &lt;code&gt;tar&lt;/code&gt; file through GitHub and unzip it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://github.com/prometheus/node_exporter/releases/download/v*/node_exporter-*.*-amd64.tar.gz
tar xvfz node_exporter-*.*-amd64.tar.gz
cd node_exporter-*.*-amd64
./node_exporter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After installation, verify that the monitors are running on the Pods:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should see Prometheus, &lt;a href="https://grafana.com/"&gt;Grafana&lt;/a&gt; (an included open-source analytics platform), node-exporter, and kube-state-metrics running as Pods.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use the Node Exporter and View Your Kubernetes Metrics
&lt;/h2&gt;

&lt;p&gt;The Prometheus Node Exporter exposes an endpoint, &lt;code&gt;/metrics&lt;/code&gt;, that you can query directly (by default at &lt;code&gt;http://localhost:9100/metrics&lt;/code&gt;). Prometheus scrapes this endpoint at a specified interval, attaches labels to the samples, and stores them; the endpoint itself serves metrics as plain text or a protocol buffer.&lt;/p&gt;
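&lt;p&gt;For a feel of what that text output looks like, here's a minimal sketch that parses a couple of exposition-format lines (the sample values below are made up, not real node_exporter output):&lt;/p&gt;

```python
# Hypothetical sample of the Prometheus text exposition format.
sample = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 362812.7
node_memory_MemAvailable_bytes 2.06504448e+09
"""

def parse_metrics(text):
    """Map each metric name (with labels) to its float value."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_labels, value = line.rsplit(" ", 1)
        metrics[name_labels] = float(value)
    return metrics

m = parse_metrics(sample)
print(m['node_cpu_seconds_total{cpu="0",mode="idle"}'])  # 362812.7
```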

&lt;p&gt;You can also explore these metrics through the Prometheus console dashboard to get specific information. The dashboard and Prometheus metrics can be seen through &lt;code&gt;http://localhost:8080/docker/prometheus&lt;/code&gt;. This is different from the &lt;a href="https://prometheus.io/docs/visualization/browser/"&gt;Prometheus web UI&lt;/a&gt;, where you can explore container metrics through expressions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mqQe1eKS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/IyGepHM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mqQe1eKS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/IyGepHM.png" alt="Prometheus metrics dashboard" width="880" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scraped metrics get saved to a database that you can &lt;a href="https://prometheus.io/docs/prometheus/latest/querying/basics/"&gt;query using PromQL&lt;/a&gt; through this web console. For example, a query to select all non-&lt;code&gt;GET&lt;/code&gt; HTTP requests received in your staging, testing, and development environments would be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http_requests_total{environment=~"staging|testing|development",method!="GET"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It is just as straightforward to query for other summary metrics such as average, max, min, or specific percentiles. Through Alertmanager, Prometheus also lets you set up alerts over email, Slack, and other supported channels that trigger when conditions are met. For example, you can send a high-priority Slack message when the P90 of account-creation failures crosses a threshold.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Set Up Alertmanager
&lt;/h2&gt;

&lt;p&gt;There are a couple of options to install Alertmanager. You can bootstrap a &lt;a href="https://artifacthub.io/packages/helm/prometheus-community/prometheus"&gt;Prometheus deployment through Helm Charts&lt;/a&gt; directly with a single command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Otherwise, you can download and extract the latest Alertmanager &lt;code&gt;tar&lt;/code&gt; from Prometheus's official &lt;a href="https://prometheus.io/download/#alertmanager"&gt;download page&lt;/a&gt;, which links to the latest release on GitHub. Fetch and unpack it with &lt;code&gt;wget&lt;/code&gt; and &lt;code&gt;tar&lt;/code&gt;, just as with Node Exporter above.&lt;/p&gt;

&lt;p&gt;After installation, you can start Alertmanager on localhost port 9093 and begin setting up alerts through the communication channels mentioned above.&lt;/p&gt;

&lt;p&gt;Note that &lt;a href="https://github.com/google/cadvisor"&gt;cAdvisor&lt;/a&gt;, Google’s solution for natively monitoring containers, can also be &lt;a href="https://prometheus.io/docs/guides/cadvisor/"&gt;used alongside Prometheus&lt;/a&gt; to view metrics out of the box. You can explore the metrics through the cAdvisor web UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed Hosting Through Prometheus, Grafana, New Relic, and Datadog
&lt;/h2&gt;

&lt;p&gt;Once you set up the metrics you're interested in, you can aggregate and display them through dashboards on a hosted backend such as Grafana Cloud, New Relic, or Datadog, or you can self-host through Prometheus as discussed earlier.&lt;/p&gt;

&lt;p&gt;There are some benefits to locating your metrics servers on premises, but it’s generally poor practice. Exceptions include very large entities with data servers distributed across the world, or those who have strong restrictions around highly sensitive data.&lt;/p&gt;

&lt;p&gt;Keeping metrics on-premises creates a single point of failure: if your metrics system goes down, you lose the very telemetry you’d need to diagnose the outage. The benefit is control over your security protocols and monitoring services, but these will be less robust than cloud platforms unless very well planned.&lt;/p&gt;

&lt;p&gt;Thankfully, there are several hosted options for Prometheus and metrics dashboarding. Let’s break down the differences between the most popular:&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosted Prometheus Pricing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing: Freemium. Can also be run as a managed service on cloud providers.

&lt;ul&gt;
&lt;li&gt;AWS example: up to 40 million samples and 10 GB of queries free; $0.90 per 10 million samples for the first 2 billion samples.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Type: Dashboard&lt;/li&gt;
&lt;li&gt;My Thoughts: Self-hosting your metrics and alerting instances creates a single point of failure that can be critical. Alternatively, you can host it on a cloud provider and replicate across multiple AZs/regions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Grafana Cloud
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing: Freemium. Free account with 10,000 series for Prometheus metrics, 50 GB of logs, 50 GB of traces, and 3 team members. 14-day free trial for Pro; $49/mo + usage afterward.&lt;/li&gt;
&lt;li&gt;Type: Dashboard&lt;/li&gt;
&lt;li&gt;My Thoughts: Grafana is a good tool that is natively bundled with Prometheus for dashboarding. Grafana Cloud maintains Grafana for you—including updates, support, and guaranteed uptime.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  New Relic
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing: Freemium. Free account for 1 admin user and unlimited viewing users with 100 GB/mo for ingestion and 8+ days retention, with unlimited querying, alerts, and anomaly detection.

&lt;ul&gt;
&lt;li&gt;$0.25/GB above 100 GB.&lt;/li&gt;
&lt;li&gt;$99/user above the first.&lt;/li&gt;
&lt;li&gt;$0.50/event above 1000 incidents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Type: Monitoring tool&lt;/li&gt;
&lt;li&gt;My Thoughts: New Relic can be cheaper with lots of hosts while providing a competitive feature set similar to Datadog's. This is because you pay for data ingestion rather than the number of hosts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  DataDog
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pricing: Freemium.

&lt;ul&gt;
&lt;li&gt;Free account up to 5 hosts.&lt;/li&gt;
&lt;li&gt;Infrastructure: $15/mo per host on the Pro plan; $23/mo per host on the Enterprise plan.&lt;/li&gt;
&lt;li&gt;Logging: $0.10/GB for ingestion; $1.70/MM events for 15-day retention.&lt;/li&gt;
&lt;li&gt;See the pricing page for other services.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Type: Monitoring tool&lt;/li&gt;
&lt;li&gt;My Thoughts: Datadog is a pricier option, as you need to opt into multiple services while the others are bundled.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Kubernetes Metrics to Monitor
&lt;/h2&gt;

&lt;p&gt;Once you've chosen a service provider, it’s time to set up a list of key metrics. Triggering alerts by severity standards is a good approach here. You can set up a range of alerts, from &lt;code&gt;Severity 1&lt;/code&gt; for individual-productivity problems to &lt;code&gt;Severity 5&lt;/code&gt; for breaking issues impacting customers worldwide. For these issues, consider metrics around system performance: CPU, memory, disk space, and network usage, plus their trends. Below are a few examples to start off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;100 - (avg(irate(node_cpu{mode="idle", instance=~"$instance"}[1m])) * 100)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Usage (10^9 refers to GB)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_memory_MemAvailable{instance="$instance"}/10^9&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_memory_MemTotal{instance="$instance"}/10^9&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disk Space Usage (free bytes, in GB)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_filesystem_free{mountpoint="/", instance="$instance"}/10^9&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network Ingress&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_network_receive_bytes_total{instance="$instance"}/10^9&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network Egress&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_network_transmit_bytes_total{instance="$instance"}/10^9&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster CPU Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pod CPU Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IO Usage by Container&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum(container_fs_io_time_seconds_total{name=~".+"}) by (name)&lt;/code&gt;&lt;/p&gt;
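&lt;p&gt;You can also run any of these queries ad hoc against Prometheus's HTTP API. A minimal sketch, assuming Prometheus is reachable at its default address of &lt;code&gt;localhost:9090&lt;/code&gt;:&lt;/p&gt;

```python
from urllib.parse import urlencode

def prom_query_url(base_url, promql):
    # Prometheus serves instant queries at /api/v1/query?query=...
    return base_url + "/api/v1/query?" + urlencode({"query": promql})

url = prom_query_url(
    "http://localhost:9090",
    'sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)',
)
print(url)  # fetch this with any HTTP client to get a JSON result
```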

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although Kubernetes monitoring is more complex, there are a number of viable options to facilitate the process. We discussed how Node Exporter can help you export metrics, compared different hosted monitoring service options, and explored some key metrics for monitoring memory, network, and CPU usage.&lt;/p&gt;

&lt;p&gt;Once you’ve decided to implement your monitoring stack, consider revisiting your Kubernetes administration and exploring cost reduction through CloudForecast's new k8s cost management tool, &lt;a href="https://cloudforecast.io/kubernetes-eks-and-ecs-cost-management.html"&gt;Barometer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on: &lt;a href="https://www.cloudforecast.io/blog/node-exporter-and-kubernetes/"&gt;https://www.cloudforecast.io/blog/node-exporter-and-kubernetes/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS NAT Gateway Pricing and Cost Reduction Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Wed, 03 Nov 2021 17:25:27 +0000</pubDate>
      <link>https://dev.to/cloudforecast/aws-nat-gateway-pricing-and-cost-reduction-guide-2lk9</link>
      <guid>https://dev.to/cloudforecast/aws-nat-gateway-pricing-and-cost-reduction-guide-2lk9</guid>
      <description>&lt;h2&gt;
  
  
  What Is a NAT Device?
&lt;/h2&gt;

&lt;p&gt;A NAT device is a server that relays packets between devices on a private subnet and the internet, relaying responses back to the server that sent the original request. Because it forwards only response packets into the private subnet, the subnet stays unreachable from the internet.&lt;/p&gt;

&lt;p&gt;The NAT works by replacing the source address of outbound packets with its own address and forwarding them to their destination on the internet. Similarly, when the NAT receives a response from the internet, it replaces the destination address with the address of the server on the private subnet that sent the initial request.&lt;/p&gt;
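&lt;p&gt;The mechanism can be sketched as a toy translation table (purely illustrative; this is not how AWS implements its NAT devices internally):&lt;/p&gt;

```python
class ToyNat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.table = {}  # maps a public port back to (private_ip, private_port)

    def outbound(self, src_ip, src_port):
        # Rewrite the source to the NAT's own address and remember the mapping.
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)
        return (self.public_ip, public_port)

    def inbound(self, dst_port):
        # Rewrite a response's destination back to the private host.
        # Unsolicited packets have no mapping and are dropped (None),
        # which is what keeps the private subnet unreachable.
        return self.table.get(dst_port)

nat = ToyNat("203.0.113.10")
src = nat.outbound("10.0.1.5", 51515)  # packet leaves as ("203.0.113.10", 40000)
print(nat.inbound(src[1]))             # routed back to ("10.0.1.5", 51515)
print(nat.inbound(12345))              # no mapping: None
```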

&lt;p&gt;The most common use case for a NAT device in AWS is to download updates on instances in a private subnet, but the NAT can be used any time you want to keep a subnet private and still allow it to talk to the internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS NAT Devices
&lt;/h2&gt;

&lt;p&gt;You can use two different types of NAT devices in your VPC. The older of the two is called a NAT Instance, and the newer one is called a NAT Gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an AWS NAT Instance?
&lt;/h3&gt;

&lt;p&gt;An AWS NAT Instance is really just an EC2 instance running NAT software in a public subnet.&lt;/p&gt;

&lt;p&gt;Amazon currently provides a NAT AMI. You can find them by searching for &lt;code&gt;amzn-ami-vpc-nat&lt;/code&gt; in the name. You may, however, need to build your own NAT images in the future since Amazon built the NAT AMIs on a version of Amazon Linux that is EOL, and they &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat.html"&gt;don't plan on updating the NAT images&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eLBHYET8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/KToUuZk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eLBHYET8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/KToUuZk.png" alt="Example NAT Instance setup" width="568" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the &lt;a href="https://aws.amazon.com/compliance/shared-responsibility-model/"&gt;AWS shared responsibility model&lt;/a&gt;, you will need to manage updating and scaling your NAT instance. As a tradeoff, you get more control over traffic routing, and you can run software on the instance beyond just a NAT service.&lt;/p&gt;

&lt;p&gt;Performance of your NAT Instance will be up to you, since it can vary based on the &lt;a href="https://aws.amazon.com/ec2/instance-types/"&gt;instance type that you choose&lt;/a&gt;. For example, a &lt;code&gt;t3.micro&lt;/code&gt; instance can have up to 5 Gbps, but a &lt;code&gt;m5n.12xlarge&lt;/code&gt; can get 50 Gbps of bandwidth.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is AWS NAT Gateway?
&lt;/h3&gt;

&lt;p&gt;AWS NAT Gateway is the newer, managed solution for setting up a NAT device in your VPC. Since it's a managed device, you can set it up once and forget about it. AWS will take care of automatically scaling and updating it as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iZ6dPJo4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/lhii3oW.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iZ6dPJo4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/lhii3oW.png" alt="Example NAT Gateway" width="568" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS NAT Gateway can scale to allow up to 45 Gbps through it. If you need more bandwidth, you can always create another one and send different subnet traffic through different gateways.&lt;/p&gt;

&lt;h2&gt;
  
  
  NAT Gateway vs. NAT Instance Pricing
&lt;/h2&gt;

&lt;p&gt;The cost of an AWS NAT instance is just like any other EC2 instance. It’s determined by the type of instance and the amount of data transferred out to the internet.&lt;/p&gt;

&lt;p&gt;When you use an AWS NAT Gateway, you're &lt;a href="https://aws.amazon.com/vpc/pricing/"&gt;charged for two things&lt;/a&gt;: a flat rate for every hour that it's running, and a fee for every GB that passes through it.&lt;/p&gt;

&lt;h3&gt;
  
  
  NAT Gateway Pricing
&lt;/h3&gt;

&lt;p&gt;You can use the &lt;a href="https://calculator.aws/"&gt;AWS Pricing Calculator&lt;/a&gt; to estimate the costs of VPC configurations. Using the example of the auto repair shop from the introduction, you can calculate some example costs. We'll assume that you'll be transferring 100 GB every month.&lt;/p&gt;

&lt;p&gt;You can use the &lt;code&gt;t3.micro&lt;/code&gt; and &lt;code&gt;m5n.12xlarge&lt;/code&gt; instance types from earlier to get an idea of the range of instance costs. Assuming that you want to run the instance all the time and use an EC2 Instance Savings Plan, you will get the following values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;t3.micro&lt;/td&gt;
&lt;td&gt;$7.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m5n.12xlarge&lt;/td&gt;
&lt;td&gt;$1,316.27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway&lt;/td&gt;
&lt;td&gt;$37.35&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
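&lt;p&gt;The NAT Gateway figure in the table can be reproduced from its two pricing components. The rates below are the us-east-1 rates at the time of writing; check the pricing page for your region:&lt;/p&gt;

```python
def nat_gateway_monthly_cost(hours, gb_processed,
                             hourly_rate=0.045, per_gb_rate=0.045):
    # Flat hourly charge plus a data processing charge per GB.
    return hours * hourly_rate + gb_processed * per_gb_rate

# 730 hours in an average month, 100 GB processed (the example above).
print(round(nat_gateway_monthly_cost(730, 100), 2))  # 37.35
```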

&lt;h3&gt;
  
  
  Ensuring High Availability
&lt;/h3&gt;

&lt;p&gt;If you follow AWS best practices in your VPC, you'll need to set up &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/use-fault-isolation-to-protect-your-workload.html"&gt;redundancy across multiple availability zones&lt;/a&gt; to ensure your application is highly available. This means you'll need to create a NAT Instance or Gateway for each availability zone (AZ). Depending on your availability requirements, this means you'll need to multiply each of the costs in the table by 2 at minimum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reduce AWS NAT Gateway Costs
&lt;/h2&gt;

&lt;p&gt;Now that you know how much a NAT device is going to cost, you may be wondering if there's a way to reduce your AWS bill. Read on to learn a few of my favorite methods.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use the Right Tool
&lt;/h3&gt;

&lt;p&gt;You can see from the comparison table above that the prices of NAT Instances can vary greatly. If you need bandwidth close to 45 Gbps, then you should definitely use the NAT Gateway. In the example above, you would save $1,278.92 and offload maintenance work onto Amazon.&lt;/p&gt;

&lt;p&gt;On the other hand, if you need to run a bastion server and 5 Gbps is enough bandwidth, the &lt;code&gt;t3.micro&lt;/code&gt; is plenty. This would save $29.60 every month. While it's not as big of a savings as switching from an &lt;code&gt;m5n&lt;/code&gt; instance to the NAT Gateway, you do gain the option of using it as a bastion server, too.&lt;/p&gt;

&lt;p&gt;Another consideration is maintenance time and costs. If you’re at a smaller company where everyone has multiple roles, offloading maintenance time to AWS can provide a substantial productivity boost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Advantage of Maintenance Windows
&lt;/h3&gt;

&lt;p&gt;In the maintenance shop example, you need to keep the NAT device running all the time if the service places orders for parts throughout the day. If you change the service to place vendor orders at a specific time every day, you could turn on a NAT instance on a schedule.&lt;/p&gt;

&lt;p&gt;You can create an EC2 Auto Scaling Group that spins up your NAT Instance &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/schedule_time.html"&gt;a few minutes before your maintenance window&lt;/a&gt; and scales down to zero when &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html"&gt;incoming network traffic dies off&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to use a NAT Gateway on a schedule, you certainly could do that. It'd be a bit more complicated, though, since you would need to create and destroy the gateway on a schedule. You could set up a CloudWatch Events rule that triggers a Lambda function to update your VPC infrastructure.&lt;/p&gt;
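&lt;p&gt;To estimate what a maintenance-window schedule could save, compare the always-on cost with the windowed cost. The &lt;code&gt;0.0104&lt;/code&gt; USD/hour rate below is the us-east-1 on-demand price for a &lt;code&gt;t3.micro&lt;/code&gt; at the time of writing, used here only as an illustrative assumption:&lt;/p&gt;

```python
def monthly_instance_cost(hours_per_day, hourly_rate=0.0104, days=30):
    # On-demand cost of running the NAT instance for part of each day.
    return hours_per_day * days * hourly_rate

always_on = monthly_instance_cost(24)  # roughly 7.49 USD
windowed = monthly_instance_cost(2)    # roughly 0.62 USD for a 2-hour daily window
print(round(always_on - windowed, 2))
```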

&lt;h3&gt;
  
  
  AWS Has a Gift for You
&lt;/h3&gt;

&lt;p&gt;The easiest way to save some money is to take advantage of AWS's &lt;a href="https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&amp;amp;all-free-tier.sort-order=asc&amp;amp;awsf.Free%20Tier%20Types=tier%23always-free&amp;amp;awsf.Free%20Tier%20Categories=*all"&gt;always free resources&lt;/a&gt;. You can run a single &lt;code&gt;t3.micro&lt;/code&gt; for 750 hours a month for free. So if you don't have a lot of traffic (less than 5 Gbps), throw up a free NAT instance and use that money on something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A NAT device acts as a secure bridge between your private subnet and the internet. AWS provides two NAT device types: a NAT instance that you manage yourself, and a NAT gateway. Since there are some tradeoffs on performance, cost, maintenance, and configurability, you'll need to evaluate both options for your project.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on: &lt;a href="https://www.cloudforecast.io/blog/aws-nat-gateway-pricing-and-cost/"&gt;https://www.cloudforecast.io/blog/aws-nat-gateway-pricing-and-cost/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Kubernetes Cost Management and Analysis Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Wed, 22 Sep 2021 17:13:04 +0000</pubDate>
      <link>https://dev.to/cloudforecast/kubernetes-cost-management-and-analysis-guide-1e1b</link>
      <guid>https://dev.to/cloudforecast/kubernetes-cost-management-and-analysis-guide-1e1b</guid>
      <description>&lt;p&gt;The popularity of Kubernetes is constantly increasing, with more and more companies moving their workload to this way of orchestration. Some organizations exclusively develop new applications on Kubernetes, taking advantage of the architecture designs it enables. Other organizations move their current infrastructure to Kubernetes in a lift-and-shift manner. While some tools offer native solutions to cost analysis, these can quickly become too simple of an overview.&lt;/p&gt;

&lt;p&gt;Having your workload running in Kubernetes can bring lots of benefits, but costs become difficult to manage and monitor. In this article, we’ll examine the key reasons why cost can be so difficult to manage in Kubernetes. Plus, you’ll gain insight into how you can improve your cost management significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional vs. Kubernetes Resource Management
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FNgZvnRf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FNgZvnRf.png" alt="Architecture Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before diving into cost management, it's important to first understand how the underlying resources differ. We’ll use the simple webshop above as an example. This webshop contains three distinct components: a frontend service, a cart service, and a product service. The frontend service is responsible for serving everything visually. The cart service is responsible for saving a customer's order in the database. Lastly, the product service is an API that other services, like the frontend, can query in order to get product information. An actual webshop will naturally be more complicated, but we'll stick with this as an example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional Architecture
&lt;/h3&gt;

&lt;p&gt;Traditionally, you would spin up each service on its own pool of VMs, sized appropriately. This makes it easy to see the cost of each service: you just need to look at the bill. For example, you can quickly spot that the product service is consuming a lot of resources and start investigating.&lt;/p&gt;

&lt;p&gt;Since traditional architecture has been around for so long, many tools—especially cloud providers—are used to reporting costs this way. This isn't the case for Kubernetes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes Architecture
&lt;/h3&gt;

&lt;p&gt;It's possible to re-create the traditional architecture in Kubernetes with a dedicated node pool for each service, but this isn’t the best practice. Ideally, you should use a single or a few pools to host your applications, meaning the three distinct services can run on the same set of nodes. Because of this, your bill can’t tell you what service is taking up what amount of resources.&lt;/p&gt;

&lt;p&gt;Kubernetes does provide you with standard metrics like CPU and RAM usage per application, but it’s still tough to decipher not only what is costing you a lot, but specifically how you can lower costs. Given Kubernetes’ various capabilities, many strategies can be implemented to lower costs.&lt;/p&gt;

&lt;p&gt;Strategies can involve rightsizing nodes, which isn't too different from a traditional architecture, but Kubernetes offers something new. Kubernetes lets you rightsize Pods. Using limits and requests, as well as specifying the right size of your nodes, you can make sure Pods are efficiently stacked on your nodes for optimal utilization.&lt;/p&gt;
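&lt;p&gt;The stacking intuition can be sketched with a back-of-the-envelope calculation: given a node's allocatable capacity and a Pod's requests, whichever resource runs out first limits how many Pods fit. (First-fit only and purely illustrative; the real scheduler weighs much more.)&lt;/p&gt;

```python
def pods_per_node(node_cpu_m, node_mem_mi, pod_cpu_m, pod_mem_mi):
    # A Pod fits while both its CPU and memory requests fit; whichever
    # resource runs out first is the binding constraint.
    return min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)

# A 4 vCPU (4000m), 16 Gi node with Pods requesting 500m CPU and 1 Gi memory:
print(pods_per_node(4000, 16384, 500, 1024))  # 8 (CPU-bound)
```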

&lt;h3&gt;
  
  
  Comparing Architectures
&lt;/h3&gt;

&lt;p&gt;While Kubernetes offers many advantages over a traditional architecture, moving your workload to this orchestrator does present challenges. Kubernetes requires extra focus on cost; it won’t be possible to simply look at the bill and know what resources are costing a lot.&lt;/p&gt;

&lt;p&gt;With Kubernetes, you should look into using specialized tools for cost reporting. Many of these tools include recommendations on how to lower your cost, which is especially useful. Let's take a deeper dive into how you would manage cost in a Kubernetes setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Kubernetes Costs
&lt;/h2&gt;

&lt;p&gt;Managing costs in Kubernetes is not a one-and-done process. There are a number of pitfalls that, if overlooked, could result in businesses experiencing higher costs than what they may have predicted. Let’s talk about some areas where you should be on the lookout for opportunities to mitigate costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes Workload Considerations
&lt;/h3&gt;

&lt;p&gt;First, understand the nature of your application and how it translates to a cluster environment. Does your application consist of long-lived services or batch operations that get triggered? Are its components stateful (i.e., databases) or stateless?&lt;/p&gt;

&lt;p&gt;The answers to these questions should inform the decision-making process around what Kubernetes objects need to be created. Ensuring that your environment only runs the necessary resources is a key step to cost optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes Workload Resource Management
&lt;/h3&gt;

&lt;p&gt;Once you have a clear picture of your resources, you can set some limits and configure features like Horizontal Pod Autoscaling (HPA) to scale pods up and down based on utilization. HPAs can be configured to operate based on metrics like CPU and memory out of the box, and can be additionally configured to operate on custom metrics. As you analyze your workload, you can further modify the settings that determine the behavior of your resources.&lt;/p&gt;
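&lt;p&gt;The core of the HPA's scaling decision is a documented formula: &lt;code&gt;desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)&lt;/code&gt;. A minimal sketch:&lt;/p&gt;

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_metric / target_metric)

# Four Pods averaging 80% CPU against a 50% target scale out to seven.
print(hpa_desired_replicas(4, 80, 50))  # 7
```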

&lt;h3&gt;
  
  
  Kubernetes Infrastructure Resource Management
&lt;/h3&gt;

&lt;p&gt;Managing Kubernetes costs around infrastructure can be especially tricky as you try to figure out the right type of nodes to support your workloads. Your node types will depend on the applications, their resource requirements, and factors related to scaling.&lt;/p&gt;

&lt;p&gt;Operators can configure monitoring and alerts to keep track of how nodes are coping and what occurrences in your workload may be triggering scaling events. These kinds of activities can help organizations save costs related to overprovisioning by leveraging scaling features and tools like &lt;a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt; to scale nodes when necessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Leveraging Observability
&lt;/h3&gt;

&lt;p&gt;In the same vein as the previous point, your organization can make more informed decisions regarding your Kubernetes cluster size and node types by monitoring custom application metrics (i.e., requests per second) along with CPU, memory, network, and storage utilization by Pods.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing Kubernetes Cost with Monitoring
&lt;/h2&gt;

&lt;p&gt;One of the main ways to optimize the costs associated with running Kubernetes clusters is to set up the correct tooling for monitoring. You’ll also need to know how to react to the information you receive, and make sure it’s given to you effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudforecast.io/kubernetes-eks-and-ecs-cost-management.html" rel="noopener noreferrer"&gt;Barometer&lt;/a&gt; is coming soon to CloudForecast, which will be helpful for use cases like this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring Kubernetes Cluster Cost
&lt;/h3&gt;

&lt;p&gt;The first things you need to monitor in your Kubernetes Cluster are CPU and memory usage. These metrics give you a quick overview of how many resources your Kubernetes cluster is using. By making sure resources in your Kubernetes cluster are correctly tagged using labels or namespaces, you’ll quickly learn what services are costing the most in your organization.&lt;/p&gt;

&lt;p&gt;The easiest way to monitor these metrics is via automated reporting. &lt;a href="https://cloudforecast.io/kubernetes-eks-and-ecs-cost-management.html" rel="noopener noreferrer"&gt;CloudForecast’s upcoming tool&lt;/a&gt; will be able to consolidate these reports and deliver them to your team by email or Slack. This ensures each team is aware of how their services are performing, and whether they're using up too many costly resources.&lt;/p&gt;

&lt;p&gt;Setting up a general overview is highly recommended. Additionally, you should also ensure you get notified if something out of the ordinary happens. For example, you’ll want to be notified if the product service suddenly starts costing a lot more; this allows you to troubleshoot why and work on fixes.&lt;/p&gt;

&lt;p&gt;Kubernetes exposes various metrics you can use to determine the cost of a specific service. Using the &lt;code&gt;/metrics&lt;/code&gt; endpoint provided by the Kubernetes API, you can get a view into &lt;code&gt;pod_cpu_utilization&lt;/code&gt; and &lt;code&gt;pod_memory_utilization&lt;/code&gt;. With these metrics, it becomes easier to see which workloads are drawing which costs. Tools like CloudForecast’s Barometer use these metrics to calculate how many dollars every pod is spending. Having this overview and getting a baseline cost of your Kubernetes cluster will help you know when costs are rising too rapidly, and exactly where it’s happening. Knowing how cAdvisor works with Prometheus, and the metrics they collectively expose, is incredibly valuable when you want to examine your clusters.&lt;/p&gt;

&lt;p&gt;While there are many metrics that can be analyzed, RAM and CPU are typically the ones you want to focus on, as these are the ones that drive your provider to allocate more resources. You can think of RAM and CPU metrics as the symptoms of your cost. With a proper overview they will allow you to know what workloads are costing you more than normal, and from there you start drilling into the service to figure out why it’s happening.&lt;/p&gt;
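&lt;p&gt;One simple way a tool can turn these utilization metrics into dollars is proportional allocation: split the cluster's spend across Pods by their share of usage. This is only an illustrative model, not necessarily how Barometer computes it:&lt;/p&gt;

```python
def pod_cost_share(pod_cpu, cluster_cpu, cluster_monthly_cost):
    # Allocate cluster spend to a Pod in proportion to its CPU usage.
    return cluster_monthly_cost * pod_cpu / cluster_cpu

# A Pod using 1.5 of the cluster's 12 busy cores on a 600 USD/month cluster:
print(pod_cost_share(1.5, 12, 600))  # 75.0
```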

&lt;h3&gt;
  
  
  Acting on Monitoring Data
&lt;/h3&gt;

&lt;p&gt;Once you've been notified of irregularities in your Kubernetes cluster, you need to act. There are many valid strategies for lowering k8s cluster cost. As mentioned earlier, a good first step in Kubernetes is to rightsize your nodes and Pods so they run efficiently. Whatever steps you take to optimize cost, doing it manually is tough.&lt;/p&gt;

&lt;p&gt;The right tools can automatically suggest why your cost is high and how to reduce it. This allows you to quickly implement cost optimizations, and also to uncover solutions that otherwise wouldn't have come to mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to Monitor
&lt;/h3&gt;

&lt;p&gt;Tools can help a lot, but they’re wasted without a good foundation. To set up a good foundation, you should determine a set of Key Performance Indicators (KPIs) to monitor. A great KPI example is the number of untagged resources. Having your resources tagged allows your tool reports to be more precise, delivering better optimizations.&lt;/p&gt;

&lt;p&gt;You could also monitor the total cost of your untagged resources. This can act as motivation for getting your resources tagged, and remind your team to maintain a good baseline approach when setting up new resources. Tracking your KPIs before and after the introduction of a tool is a great way to determine how much it actually helps. In any case, determining KPIs will make sure you’re on top of what's happening in your Kubernetes cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Develop a Unit Cost Calculator
&lt;/h2&gt;

&lt;p&gt;Crucial to understanding your cost is knowing how to use the &lt;a href="https://calculator.aws/" rel="noopener noreferrer"&gt;AWS Pricing Calculator&lt;/a&gt;. This helps you compare costs associated with running a self-hosted Kubernetes cluster versus with an Amazon EKS cluster.&lt;/p&gt;

&lt;p&gt;The CPU (vCPUs) and Memory (GiB) specified in the following example are just for demonstrative purposes and will vary depending on workload requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS EKS Cluster Cost and Pricing Estimation
&lt;/h3&gt;

&lt;p&gt;The following calculations are for a Highly Available (HA) Kubernetes cluster with a control plane managed by AWS (EKS) and three worker nodes with 4 vCPUs and 16 GiB of memory each. The instance type used in this case is a t4g.xlarge Reserved EC2 instance (1 year period). This instance type is automatically generated as a recommendation based on the CPU and memory requirements that are specified.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unit Calculations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;EC2 Instance Savings Plans rate for t4g.xlarge in the EU (Ireland) for 1 Year term and No Upfront is 0.0929 USD&lt;/li&gt;
&lt;li&gt;Hours in the commitment: 365 days x 24 hours x 1 year = 8760.0 hours&lt;/li&gt;
&lt;li&gt;Total Commitment: 0.0929 USD x 8760 hours = 813.8 USD&lt;/li&gt;
&lt;li&gt;Upfront: No Upfront (0% of 813.804) = 0 USD&lt;/li&gt;
&lt;li&gt;Hourly cost for EC2 Instance Savings Plans = (Total Commitment - Upfront cost)/Hours in the term: (813.804 - 0)/8760 = 0.0929 USD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Please note that you will pay an hourly commitment for the Savings Plans and your usage will be accrued at a discounted rate against this commitment.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing Calculations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;1 Cluster x 0.10 USD per hour x 730 hours per month = 73 USD&lt;/li&gt;
&lt;li&gt;[worker nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)&lt;/li&gt;
&lt;li&gt;30 GB x 0.11 USD x 3 instances = 9.90 USD (EBS Storage Cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monthly Cost: 286.35 USD&lt;br&gt;
Annual Cost: 3,436.20 USD&lt;/p&gt;
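&lt;p&gt;The totals above can be reproduced directly from the unit calculations:&lt;/p&gt;

```python
control_plane = 1 * 0.10 * 730  # EKS control plane at 0.10 USD per hour
workers = 3 * 0.0929 * 730      # Savings Plans rate from the unit calculations
ebs = 30 * 0.11 * 3             # 30 GB of EBS per worker node

monthly = control_plane + workers + ebs
print(round(monthly, 2))                 # 286.35
print(round(round(monthly, 2) * 12, 2))  # 3436.2
```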

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2F6ReUhLG.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2F6ReUhLG.png" alt="HA EKS with Reserved Instances"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Hosted Kubernetes Cluster Pricing and Cost Estimation
&lt;/h3&gt;

&lt;p&gt;The following calculations are for a custom Highly Available (HA) Kubernetes cluster that is self hosted in AWS, and also consists of three worker nodes with 4 vCPUs and 16 GiB of memory each. Similar to the previous analysis of EKS cluster cost estimations, this analysis will use the same instance type for the same reasons detailed above.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unit Conversions
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;EC2 Instance Savings Plans rate for t4g.xlarge in the EU (Ireland) for 1 Year term and No Upfront is 0.0929 USD&lt;/li&gt;
&lt;li&gt;Hours in the commitment: 365 days * 24 hours * 1 year = 8760.0 hours&lt;/li&gt;
&lt;li&gt;Total Commitment: 0.0929 USD * 8760 hours = 813.8 USD&lt;/li&gt;
&lt;li&gt;Upfront: No Upfront (0% of 813.804) = 0 USD&lt;/li&gt;
&lt;li&gt;Hourly cost for EC2 Instance Savings Plans = (Total Commitment - Upfront cost)/Hours in the term: (813.804 - 0)/8760 = 0.0929 USD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Please note that you will pay an hourly commitment for Savings Plans and your usage will be accrued at a discounted rate against this commitment.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing Calculations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;[control-plane nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)&lt;/li&gt;
&lt;li&gt;[worker nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)&lt;/li&gt;
&lt;li&gt;30 GB x 0.11 USD x 6 instances (control-plane and worker nodes) = 19.80 USD (EBS Storage Cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monthly Cost: 426.70 USD&lt;br&gt;
Annual Cost: 5,120.40 USD&lt;/p&gt;
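&lt;p&gt;Reproducing this total from the unit calculations shows where the extra cost comes from; note that reaching 426.70 USD requires counting EBS storage for the control-plane nodes as well as the workers:&lt;/p&gt;

```python
hourly = 0.0929                   # Savings Plans rate per t4g.xlarge instance
control_plane = 3 * hourly * 730  # self-hosting means paying for these nodes too
workers = 3 * hourly * 730
ebs = 30 * 0.11 * 6               # EBS on all six instances, not just the workers

monthly = control_plane + workers + ebs
print(round(monthly, 2))  # 426.7
```

&lt;p&gt;Against the 286.35 USD/month EKS estimate, self-hosting the control plane costs roughly 140 USD more per month in this example, before accounting for the operational effort of running it yourself.&lt;/p&gt;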

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FzbTlSQj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FzbTlSQj.png" alt="HA Custom Control Plane with Saving Plans"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By now you've learned how Kubernetes architecture differs from traditional architecture. You've learned what challenges arise once you start to manage costs in Kubernetes, and how to keep them under control. Features like labeling and namespacing can have a great impact on the traceability of your cost, allowing you to reap the full benefits of a Kubernetes architecture. Also, you’ve learned how using the AWS Pricing Calculator can help you estimate the costs associated with running your workloads on a custom Kubernetes cluster compared to running an EKS cluster.&lt;/p&gt;

&lt;p&gt;Using a tool like CloudForecast’s &lt;a href="https://cloudforecast.io/kubernetes-eks-and-ecs-cost-management.html" rel="noopener noreferrer"&gt;Barometer&lt;/a&gt; can greatly improve the tracking of cost in your cluster. Barometer not only offers you an effective general overview, it also gives you actionable cost optimization insights.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on: &lt;a href="https://www.cloudforecast.io/blog/kubernetes-cost-management-and-analysis/" rel="noopener noreferrer"&gt;https://www.cloudforecast.io/blog/kubernetes-cost-management-and-analysis/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>k8s</category>
      <category>devops</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS Data Transfer Costs and Optimization Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Tue, 09 Mar 2021 22:37:32 +0000</pubDate>
      <link>https://dev.to/toeknee123/aws-data-transfer-costs-and-optimization-guide-56jg</link>
      <guid>https://dev.to/toeknee123/aws-data-transfer-costs-and-optimization-guide-56jg</guid>
      <description>&lt;p&gt;While it's easy to quickly move data around in the cloud, each AWS data transfer costs money and these hidden fees and quickly add up over time if you're not careful. In this guide, we'll summarize AWS Data Transfer pricing and all the little cost associated with it, and show you how to reduce data transfer cost in AWS.&lt;/p&gt;


&lt;h2&gt;
  
  
  AWS Data Transfer Costs and Pricing
&lt;/h2&gt;

&lt;p&gt;To summarize AWS Data Transfer Pricing, Amazon charges for data transfers between AWS and the internet and between various AWS services, such as EC2 and S3. These AWS data transfer costs may be included in the cost of the service, assessed in only one direction, or assessed in both directions. It's important to understand these variables to manage costs.&lt;/p&gt;

&lt;p&gt;In general, AWS data transfer prices are highest for data transfers between regions, followed by data transfers between availability zones (AZs), and finally, within a single AZ. The &lt;a href="https://github.com/open-guides/og-aws/blob/master/figures/aws-data-transfer-costs.png"&gt;AWS Open Guide&lt;/a&gt; provides a helpful visualization showing the dynamic nature of these costs, although it hasn’t been updated since August 2017 and may be outdated in areas.&lt;/p&gt;

&lt;p&gt;Here is the full diagram, which can also be found here directly: &lt;a href="https://github.com/open-guides/og-aws#aws-data-transfer-costs"&gt;https://github.com/open-guides/og-aws#aws-data-transfer-costs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HuX5PjX6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/aws-data-transfer-costs-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HuX5PjX6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/aws-data-transfer-costs-1.png" alt="Data Transfer Diagram" width="880" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Reduce AWS Data Transfer Costs
&lt;/h2&gt;

&lt;p&gt;Reducing your AWS Data transfer cost is often a balance between stability, security, and cost. Resources split across regions, AZs, and VPCs certainly increase data transfer costs, but splitting resources also provides greater security and higher availability during disasters and outages.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Data Transfer Costs Between AZs
&lt;/h3&gt;

&lt;p&gt;Traffic between AZs costs $0.01 per GB in each direction—both in and out. When you transfer data from one region to another, you pay $0.02 per GB for outgoing data transfer. The only exception is from us-east-1 to us-east-2, where you pay $0.01 per GB. Inbound data transfer will also cost you money if you download from another instance in the same region over a public IP, regardless of whether the instance is in your account. (Thanks to /u/Burekitas for these added cost insights via /r/aws.)&lt;/p&gt;
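&lt;p&gt;As a quick back-of-the-envelope sketch using the rates above (with an illustrative 500 GB of monthly cross-AZ traffic), you can model the charge in a Terraform &lt;code&gt;locals&lt;/code&gt; block. The key detail: cross-AZ traffic is billed in both directions, so the effective rate is doubled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  monthly_cross_az_gb = 500   # illustrative traffic volume, not a real workload
  cross_az_rate       = 0.01  # $/GB, charged in each direction

  # billed both in and out, so multiply by 2
  monthly_cross_az_cost = local.monthly_cross_az_gb * local.cross_az_rate * 2  # = $10/month
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;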

&lt;h3&gt;
  
  
  Six Tips to Reduce AWS Data Transfer Costs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Routes&lt;/strong&gt;: Minimize traffic between regions and AZs and maximize the traffic that stays within AZs since the costs tend to be significantly lower. If you don't need high-cost regions, cut them to save on data transfer costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid Public IPs&lt;/strong&gt;: AWS Data transfer costs are higher with public IP or elastic IP addresses compared to private addresses. If you're only accessing data locally, use private addresses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Look for Free Transfers&lt;/strong&gt;: Some AWS services provide free cross-AZ data transfers, such as EFS, RDS, and MSK, which could make them a compelling option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use CloudFront&lt;/strong&gt;: Amazon CloudFront is a global content delivery network (CDN) with free data transfers to AWS for HEAD/GET requests. If you have high data volumes, consider using it to keep costs down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DIY NAT Instances&lt;/strong&gt;: The high cost of managed NAT gateways, in data transfer and processing fees, means that it might make sense to run your own NAT instances to save on cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment with Options&lt;/strong&gt;: The AWS Simple Monthly Calculator lets you experiment with different configurations and learn how to maximize your savings. The data transfer section provides insight into data cost management.&lt;/li&gt;
&lt;/ul&gt;
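&lt;p&gt;For the &lt;strong&gt;Avoid Public IPs&lt;/strong&gt; tip, if you manage instances with Terraform, one simple hedge is to not assign a public address at all to instances that only talk to local resources. A minimal, hypothetical sketch (the AMI ID and subnet name are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_instance" "internal_worker" {
  ami           = "ami-0123456789abcdef0"  # placeholder AMI
  instance_type = "t3.micro"

  # no public or elastic IP: traffic to peers in the VPC stays on private addresses
  associate_public_ip_address = false
  subnet_id                   = aws_subnet.private.id  # hypothetical private subnet
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;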

&lt;p&gt;By keeping these best practices in mind, you can minimize your AWS data transfer costs while ensuring that security and availability aren't adversely affected. If you’re using more than 300 TB per month, contact AWS Sales to discuss options.&lt;/p&gt;

&lt;h2&gt;
  
  
  EC2 Other Costs in Cost Explorer
&lt;/h2&gt;

&lt;p&gt;AWS Cost Explorer lets you analyze data transfer costs for each instance as long as you enable cost allocation tags. You can analyze everything from data transfer costs as an aggregate number for regions, AZs, and virtual private clouds (VPCs) to data transfer costs on a per-application or per-instance basis.&lt;/p&gt;

&lt;p&gt;There are also a few Cost Explorer Cost and Usage filters that we use frequently to monitor data transfer costs quickly. Here are a few filters you can use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jG0SEbeh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/awsdatatransfercostexplorer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jG0SEbeh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/awsdatatransfercostexplorer.png" alt="" width="638" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Cloud infrastructure providers may be cheaper than on-premise infrastructure on the surface, but there are several gotchas where costs can add up — including data transfer costs. By carefully monitoring your costs and adhering to some best practices, you can keep these costs well under control.&lt;/p&gt;

&lt;p&gt;This post was originally written on &lt;a href="https://www.cloudforecast.io/blog/understand-and-optimize-your-aws-data-transfer-costs/"&gt;CloudForecast Blog&lt;/a&gt; on 5/27/2020. &lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>AWS RDS Pricing and Optimization Guide</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Tue, 17 Nov 2020 16:33:40 +0000</pubDate>
      <link>https://dev.to/cloudforecast/aws-rds-pricing-and-optimization-guide-46g9</link>
      <guid>https://dev.to/cloudforecast/aws-rds-pricing-and-optimization-guide-46g9</guid>
      <description>&lt;p&gt;Amazon Web Services makes getting your data into their &lt;a href="https://aws.amazon.com/rds/" rel="noopener noreferrer"&gt;Relational Database Service&lt;/a&gt;(RDS) relatively easy. Import costs are free, and you can store up to &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Limits.html#RDS_Limits.Limits" rel="noopener noreferrer"&gt;100 terabytes across all your instances&lt;/a&gt;. AWS RDS hosts your relational databases in the cloud, and their engineers handle patching, monitoring, availability, and some security concerns.&lt;/p&gt;

&lt;p&gt;These factors make getting started with AWS RDS easy, but understanding and controlling your costs is another matter entirely. In this article, you’ll learn the following about AWS RDS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
AWS RDS Cost and Pricing

&lt;ul&gt;
&lt;li&gt;AWS RDS Database Engine&lt;/li&gt;
&lt;li&gt;RDS Instance Sizes&lt;/li&gt;
&lt;li&gt;Reserved Instances&lt;/li&gt;
&lt;li&gt;RDS Storage: Aurora and Autoscaling&lt;/li&gt;
&lt;li&gt;RDS Backups&lt;/li&gt;
&lt;li&gt;Regions&lt;/li&gt;
&lt;li&gt;Multi-AZ Deployments&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

RDS Cost Monitoring

&lt;ul&gt;
&lt;li&gt;AWS Cost Explorer&lt;/li&gt;
&lt;li&gt;RDS Management Console and Enhanced Monitoring&lt;/li&gt;
&lt;li&gt;CloudForecast&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

AWS RDS Cost Optimization

&lt;ul&gt;
&lt;li&gt;Right Sizing your Instances&lt;/li&gt;
&lt;li&gt;Database Hygiene&lt;/li&gt;
&lt;li&gt;RDS IOPS&lt;/li&gt;
&lt;li&gt;RDS CloudWatch Metrics&lt;/li&gt;
&lt;li&gt;RDS Data Transfer Cost&lt;/li&gt;
&lt;li&gt;RDS Snapshots&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  AWS RDS Cost and Pricing
&lt;/h2&gt;

&lt;p&gt;Instance usage, storage, I/O, backups, and data transfers drive the bulk of your AWS RDS costs. Instance usage and storage are unavoidable, but generally, you should minimize them while adequately addressing your needs. AWS RDS offers some I/O and backup capability bundled into the cost of storage, but you might need more. Moving data into RDS from the internet is free, but moving it out of RDS can get expensive.&lt;/p&gt;

&lt;p&gt;In this section, I’ll dive deeper into each of AWS RDS’s pricing factors to help you understand how your usage might affect your monthly bill.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Database Engine
&lt;/h3&gt;

&lt;p&gt;Amazon currently offers six database engines: &lt;a href="https://aws.amazon.com/rds/aurora" rel="noopener noreferrer"&gt;Amazon Aurora&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds/postgresql/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds/mysql/" rel="noopener noreferrer"&gt;MySQL&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds/mariadb/" rel="noopener noreferrer"&gt;MariaDB&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds/oracle/" rel="noopener noreferrer"&gt;Oracle&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/rds/sqlserver/" rel="noopener noreferrer"&gt;Microsoft SQL Server&lt;/a&gt;. Normally you won’t be able to change your database engine, but you can choose to optimize for memory, performance, or I/O.&lt;/p&gt;

&lt;p&gt;The three open-source databases (Postgres, MySQL, and MariaDB) are similar in price. Depending on the size, PostgreSQL instances are five to ten percent more expensive per hour, but PostgreSQL, MySQL, and MariaDB share pricing for storage, provisioned I/O, and data transfer.&lt;/p&gt;

&lt;p&gt;AWS Aurora is Amazon's proprietary database, so it gets special treatment. AWS offers a &lt;a href="https://aws.amazon.com/rds/aurora/serverless/" rel="noopener noreferrer"&gt;serverless option&lt;/a&gt;, making it ideal for applications that don’t need to be on all the time, like test environments. Aurora also has a &lt;a href="https://aws.amazon.com/rds/aurora/global-database/" rel="noopener noreferrer"&gt;multi-zone backup system&lt;/a&gt; that charges per million replicated I/O operations. Storage per gibibyte (GiB) is a few cents more expensive, but if you're dealing with intermittent usage or need fast failovers and many read replicas, Aurora can save money over implementing these features on other engines.&lt;/p&gt;

&lt;p&gt;Oracle and SQL Server aren't open-source or owned by Amazon. To accommodate licensing, hourly instances can cost nearly twice as much. You can self-license with Oracle, which brings the cost in line with open-source options. Other fees, like storage and data transfer, match their open-source counterparts.&lt;/p&gt;

&lt;p&gt;While this guide isn’t intended to help you choose the best database engine, it is important to note that pricing varies based on the engine you choose.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Instance Sizes
&lt;/h3&gt;

&lt;p&gt;Once you select an engine, you have to select an RDS &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html" rel="noopener noreferrer"&gt;instance size&lt;/a&gt; with the appropriate computational (vCPU), network (Mbps), and memory capacity (GiB RAM). RDS offers instances ranging from &lt;code&gt;db.t3.micro&lt;/code&gt; (2 vCPUs, 1 GiB RAM, 2085 Mbps) to &lt;code&gt;db.m5.24xlarge&lt;/code&gt; (96 vCPUs, 384 GiB RAM, 19,000 Mbps).&lt;/p&gt;

&lt;p&gt;Selecting the right RDS instance size can be challenging. To estimate the RDS instance size you’ll need, estimate or track the amount of data your queries need (called your &lt;a href="http://www.tocker.ca/2013/05/31/estimating-mysqls-working-set-with-information_schema.html" rel="noopener noreferrer"&gt;working set&lt;/a&gt;), then select an instance that can fit your working set into &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.Performance.RAM" rel="noopener noreferrer"&gt;memory&lt;/a&gt;. I’ll touch on RDS monitoring and right-sizing your RDS instances later in this guide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reserved Instances
&lt;/h3&gt;

&lt;p&gt;Without additional configuration, RDS instances are created on-demand. These instances are billed in one-second increments from the moment the instance starts to its termination. You can stop, start, or change an on-demand &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.Modifying.html" rel="noopener noreferrer"&gt;instance size&lt;/a&gt; at any time.&lt;/p&gt;

&lt;p&gt;The alternative to on-demand pricing is &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-optimization/reserved-instances/" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt; for RDS. You commit to lease an RDS instance for a set period (one or three years) in exchange for discounts of up to 60%. AWS offers sizing flexibility for all Reserved Instance engines except SQL Server and License Included Oracle, allowing administrators to freely change instance size within the same family. If you're able to commit to RDS for a year or three and have monitored your requirements enough to develop a solid performance baseline, you can save money by trading away the flexibility to turn off or downsize your databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Storage: Aurora and Autoscaling
&lt;/h3&gt;

&lt;p&gt;For most engines, you buy storage per GiB in advance. Aurora is the exception: you only pay for what you use. It's important to accurately predict your monthly storage needs as you &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/rds-db-storage-size/" rel="noopener noreferrer"&gt;cannot reduce storage&lt;/a&gt; on an instance (&lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-aurora-enables-dynamic-resizing-database-storage-space/" rel="noopener noreferrer"&gt;except Aurora&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;AWS can &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.Autoscaling" rel="noopener noreferrer"&gt;auto-scale&lt;/a&gt; your storage when an instance has under 10% space remaining for more than 5 minutes. This option has the benefit of keeping storage costs low, but can surprise you if something unexpected happens. To protect against auto-scaling to 65,536 GiB, set a maximum storage threshold for your instance.&lt;/p&gt;
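&lt;p&gt;If you provision RDS with Terraform, the cap described above maps to the &lt;code&gt;max_allocated_storage&lt;/code&gt; argument on &lt;code&gt;aws_db_instance&lt;/code&gt;. A minimal sketch with hypothetical names and sizes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_db_instance" "app" {
  engine         = "postgres"
  instance_class = "db.t3.micro"

  allocated_storage     = 100  # GiB provisioned up front
  max_allocated_storage = 500  # auto-scaling stops here instead of at 65,536 GiB
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;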

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Fautoscaling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Fautoscaling.png" title="RDS Aurora and Autoscaling Thresholds" alt="Enabling a storage threshold for auto-scaling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you can accurately predict your storage needs, manually provisioning is the cheapest option. If you're facing unpredictability or unused storage, consider auto-provisioning and focus on maintaining a reasonable storage maximum.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Backups
&lt;/h3&gt;

&lt;p&gt;AWS backs up 100% of the storage you've purchased in any zone for free. If you buy 20 GiB of storage across two instances, it includes 20 GiB of backup space.&lt;/p&gt;

&lt;p&gt;If you need more space for backups, you pay per GiB at a slightly lower rate than regular storage costs. RDS automatically backs up each storage volume every day. These backups are stored according to the backup retention period. Automated backups will not occur if the DB's state is not &lt;code&gt;AVAILABLE&lt;/code&gt; (for example, if the state is &lt;code&gt;STORAGE_FULL&lt;/code&gt;). Users can also create manual backups. These never expire and count against your backup storage total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Regions
&lt;/h3&gt;

&lt;p&gt;As with most AWS services, RDS costs are specific to a &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" rel="noopener noreferrer"&gt;region&lt;/a&gt;. Choose your region carefully: the most expensive regions double the hourly cost of instances, add a few cents per GiB to storage, and multiply inter-zone data transfer pricing by as much as five.&lt;/p&gt;

&lt;p&gt;On the other hand, if your database is located further from your application servers, you’ll add latency to every database call. If this latency degrades user experience, you probably don’t want to use a distant, slightly cheaper region.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-AZ Deployments
&lt;/h3&gt;

&lt;p&gt;If you need availability when an AWS regional data center encounters trouble, you can enable &lt;a href="https://aws.amazon.com/rds/features/multi-az/" rel="noopener noreferrer"&gt;Multi-AZ deployments&lt;/a&gt;. This creates a backup database instance and replicates your data to a second AWS data center. Be aware that this &lt;strong&gt;doubles your monthly instance and storage costs&lt;/strong&gt; but enhances the reliability of critical services. If you need a Multi-AZ deployment, focus on reducing your storage needs and instance size; those savings are won twice.&lt;/p&gt;
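&lt;p&gt;In Terraform, Multi-AZ and the backup retention period discussed earlier are each a single argument on &lt;code&gt;aws_db_instance&lt;/code&gt;. A hedged sketch with hypothetical values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_db_instance" "critical" {
  engine            = "mysql"
  instance_class    = "db.m5.large"
  allocated_storage = 100

  multi_az                = true  # standby replica in a second AZ; roughly doubles instance and storage costs
  backup_retention_period = 7     # days of automated backups to keep
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;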




&lt;h2&gt;
  
  
  RDS Cost Monitoring
&lt;/h2&gt;

&lt;p&gt;Once you understand the many options RDS offers and set up an instance, there are a few tools that will help you audit your current usage and predict future requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Cost Explorer
&lt;/h3&gt;

&lt;p&gt;The best way to audit your RDS spending is the &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/" rel="noopener noreferrer"&gt;AWS Cost Explorer&lt;/a&gt;. Activating and examining your daily or monthly spend is an excellent way to visualize your organization's priorities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frds_costexplorer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frds_costexplorer.png" title="AWS Cost Explorer" alt="Using the AWS Cost Explorer to see your RDS spend"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cloudforecast.io/blog/aws-tagging-best-practices/" rel="noopener noreferrer"&gt;Tagging your resources&lt;/a&gt; helps you understand which projects and teams are using which databases. Cost Explorer also offers &lt;a href="https://docs.aws.amazon.com/savingsplans/latest/userguide/sp-recommendations.html" rel="noopener noreferrer"&gt;suggestions&lt;/a&gt; for using reserved instances based on your past usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Management Console and Enhanced Monitoring
&lt;/h3&gt;

&lt;p&gt;AWS provides a “Monitoring” tab in the RDS Console that displays free-tier CloudWatch metrics like the number of connections and CPU utilization. Keeping an eye on your usage in the console can help you prepare to right-size your storage or purchase reserved instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frds_cloudwatch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frds_cloudwatch.png" title="CloudWatch" alt="Using the RDS Console to see your RDS utilization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS gives you the option to activate additional monitoring services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/rds/performance-insights/" rel="noopener noreferrer"&gt;&lt;strong&gt;Performance Insights&lt;/strong&gt;&lt;/a&gt; gathers data about the database load. This tool has its own &lt;a href="https://aws.amazon.com/rds/performance-insights/pricing" rel="noopener noreferrer"&gt;pricing model&lt;/a&gt; with a free tier that includes 7-day retention.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Enhanced Monitoring&lt;/strong&gt;&lt;/a&gt; is stored and priced as &lt;a href="https://aws.amazon.com/cloudwatch/pricing/" rel="noopener noreferrer"&gt;CloudWatch logs&lt;/a&gt;. It reports metrics from a user agent instead of the hypervisor, allowing you to examine running processes and the OS, which is useful for examining the resource usage of individual queries.&lt;/p&gt;

&lt;p&gt;Finally, you can enable and access &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.html" rel="noopener noreferrer"&gt;database logs&lt;/a&gt; directly for the price of storing the files.&lt;/p&gt;

&lt;h3&gt;
  
  
  CloudForecast
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.cloudforecast.io/" rel="noopener noreferrer"&gt;CloudForecast&lt;/a&gt;supplements AWS’s Cost Explorer through proactive monitoring and optimization reports that keeps your RDS cost in check.&lt;/p&gt;

&lt;p&gt;Through the &lt;a href="https://cloudforecast.io/aws-daily-cost-report-tool.html" rel="noopener noreferrer"&gt;daily cost reports&lt;/a&gt;, you'll receive a daily report via email or Slack that details your RDS costs in relation to your overall spend and alerts you to any cost anomalies with RDS.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.cloudforecast.io/aws-tagging-compliance-report.html" rel="noopener noreferrer"&gt;tagging compliance report&lt;/a&gt; helps make sure your RDS instances are properly tagged and lets you know exactly which RDS resources are out of compliance.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;a href="https://www.cloudforecast.io/kb/docs/general-info/whatis/#aws-cost-optimization" rel="noopener noreferrer"&gt;ZeroWaste Health Report&lt;/a&gt; (beta) alerts you to possible inefficiencies by identifying all your over-provisioned and unused RDS instances in a single report.&lt;/p&gt;




&lt;h2&gt;
  
  
  RDS Cost Optimization
&lt;/h2&gt;

&lt;p&gt;Armed with insights into your requirements and RDS's abilities, it's time to put cost-saving measures and AWS &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html" rel="noopener noreferrer"&gt;best practices&lt;/a&gt; to use. In the rest of this guide, I’ll offer some strategies for decreasing your RDS costs using the insights you gathered above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Right Sizing your Instances
&lt;/h3&gt;

&lt;p&gt;In short, turn off anything that's not being used.&lt;/p&gt;

&lt;p&gt;Every month, you pay for instances and storage and the infrastructure attached to them. You can check each database’s utilization using the connections metric in the RDS Console. On-demand RDS instances can be stopped for up to 7 days. When stopped, you aren't charged for DB Instance hours, but you are charged for storage. You can use an AWS Lambda &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents-tutorial.html" rel="noopener noreferrer"&gt;scheduled event&lt;/a&gt; and DB Instance API calls to programmatically stop and start instances.&lt;/p&gt;
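&lt;p&gt;The scheduled event itself can also be declared in Terraform. A minimal sketch, assuming you already have a Lambda function (a hypothetical &lt;code&gt;aws_lambda_function.stop_rds&lt;/code&gt;) that calls the RDS stop API, and omitting the Lambda invoke-permission wiring for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_cloudwatch_event_rule" "stop_rds_nightly" {
  name                = "stop-rds-nightly"
  schedule_expression = "cron(0 22 ? * MON-FRI *)"  # 22:00 UTC on weekdays
}

resource "aws_cloudwatch_event_target" "invoke_stop_rds" {
  rule = aws_cloudwatch_event_rule.stop_rds_nightly.name
  arn  = aws_lambda_function.stop_rds.arn  # hypothetical function that stops dev instances
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;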

&lt;p&gt;To practice &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-optimization/right-sizing/" rel="noopener noreferrer"&gt;right-sizing&lt;/a&gt;, act on your monitoring and purchase the machine with the minimum requirements to meet your needs. Multiple RDS instances can also be consolidated into a single instance to minimize costs. This is especially helpful for development environments where a low number of users can access several databases running on a single instance. &lt;/p&gt;

&lt;h3&gt;
  
  
  Database Hygiene
&lt;/h3&gt;

&lt;p&gt;Indexing and database sanitation are both important for controlling costs. Proper indexing is important for performance and I/O, as it allows your instance size to remain small and minimizes bottlenecks.&lt;/p&gt;

&lt;p&gt;Removing unused tables, columns, and indexes directly impacts your storage costs. &lt;a href="https://aws.amazon.com/caching/database-caching/" rel="noopener noreferrer"&gt;Caching&lt;/a&gt; and batching statements can improve performance. You can use &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.html" rel="noopener noreferrer"&gt;Enhanced Monitoring&lt;/a&gt; and database-specific tools like the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html#USER_LogAccess.MySQL.Generallog" rel="noopener noreferrer"&gt;MySQL slow query log&lt;/a&gt; to track and examine outliers that take lots of resources, then optimize them.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS IOPS
&lt;/h3&gt;

&lt;p&gt;Input/Output (I/O) operations are extremely important to databases. You use an input operation to write to the database and an output operation to read data. You can monitor read and write I/O per second (IOPS) from the RDS Console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frdsiops-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.cloudforecast.io%2Fblog%2Fassets%2Fmedia%2Frdsiops-1.png" alt="I/O Management in RDS"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#Concepts.Storage.GeneralSSD" rel="noopener noreferrer"&gt;General Purpose SSD&lt;/a&gt; instances start fully stocked with 5.4 million IOPS credits, enough to perform 3,000 operations a second for 30 minutes. They also generate I/O credits at a rate of &lt;em&gt;3 IOPS per GiB of storage&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In the first months of an RDS transition, keep &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html#USER_Monitoring" rel="noopener noreferrer"&gt;an eye&lt;/a&gt; on your IOPS credit balance (also reported in the RDS console) so you don't run out.&lt;/p&gt;

&lt;p&gt;The options for directly increasing I/O are purchasing more storage or switching to the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#USER_PIOPS" rel="noopener noreferrer"&gt;Provisioned IOPS storage type&lt;/a&gt;, where you pay a fixed price per thousand IOPS. Indexing and increasing the instance size can also increase I/O speed as each item in the queue is handled more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS CloudWatch Metrics
&lt;/h3&gt;

&lt;p&gt;CloudWatch is an excellent monitoring tool, but it incurs its own &lt;a href="https://aws.amazon.com/cloudwatch/pricing/" rel="noopener noreferrer"&gt;costs&lt;/a&gt;. It’s important to consider your monitoring frequency. Going from 5-minute monitoring to 1-minute monitoring will dramatically increase the size of your logs. If your CloudWatch budget is slowly expanding, consider modifying the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html#SettingLogRetention" rel="noopener noreferrer"&gt;log data retention period&lt;/a&gt; and auditing your &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html" rel="noopener noreferrer"&gt;alarms&lt;/a&gt; to be sure they're relevant.&lt;/p&gt;
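&lt;p&gt;With Terraform, the retention period is a single argument on the log group. A minimal sketch for the Enhanced Monitoring log group (the group name below is what Enhanced Monitoring writes to; the 30-day value is only an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_cloudwatch_log_group" "rds_enhanced_monitoring" {
  name              = "RDSOSMetrics"  # log group used by RDS Enhanced Monitoring
  retention_in_days = 30              # trim from the default of never expiring
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;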

&lt;h3&gt;
  
  
  RDS Data Transfer Cost
&lt;/h3&gt;

&lt;p&gt;Importing data to RDS from the internet &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/rds-import-data/" rel="noopener noreferrer"&gt;is free&lt;/a&gt;, but you often want to use your data elsewhere. Data transfer out to the internet costs between $0.09 and $0.13 per GB. Data transfer prices between zones are highly dependent on the chosen zone, ranging from $0.02 to $0.13 per GB.&lt;/p&gt;

&lt;p&gt;Be aware that you're actually charged twice: once when the data leaves a zone and again when it enters a target zone. To reduce these costs, minimize the amount of data you're sending. Limit queries, and don't re-run reports. Transferring data between Amazon RDS and EC2 Instances in the same Availability Zone is free, so you can sidestep some data fees by consolidating services into a single zone.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS Snapshots
&lt;/h3&gt;

&lt;p&gt;Database snapshots contribute to your storage costs as well. You can reduce backup storage by reducing your backup retention period and deleting manually created snapshots, which are never automatically removed. If you need to store snapshots, move them somewhere cheaper, &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html" rel="noopener noreferrer"&gt;like an S3 bucket&lt;/a&gt;. For MySQL, storing snapshots in S3 is about 1/5 as expensive as keeping them in RDS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;AWS RDS is extremely powerful and deeply customizable, but the complex pricing model means that your monthly spend can grow quickly and unexpectedly.&lt;/p&gt;

&lt;p&gt;RDS optimization starts with understanding your current requirements using monitoring tools like CostExplorer and CloudWatch. Once you know the resources you need, you can select the best region, instance engine, and size. Then, keep an eye on your IOPS credits, data transfers between zones and out to the internet, and backup requirements. Database hygiene matters even more when you’re paying per GiB, so optimize queries and use efficient indexes.&lt;/p&gt;

&lt;p&gt;If you’re struggling to understand your AWS spend, &lt;a href="https://www.cloudforecast.io/" rel="noopener noreferrer"&gt;CloudForecast&lt;/a&gt; can help. Reach out to our CTO, &lt;a href="mailto:francois@cloudforecast.io"&gt;francois@cloudforecast.io&lt;/a&gt;, if you’d like help tagging your resources or implementing a long-term cost-reduction strategy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This was originally posted on our blog on 11/17/2020: &lt;a href="https://www.cloudforecast.io/blog/aws-rds-pricing-and-optimization/" rel="noopener noreferrer"&gt;https://www.cloudforecast.io/blog/aws-rds-pricing-and-optimization/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Using Terraform and AWS CloudFormation to Enforce Your AWS Tags</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Thu, 27 Aug 2020 18:06:57 +0000</pubDate>
      <link>https://dev.to/toeknee123/using-terraform-and-aws-cloudformation-to-enforce-your-aws-tags-4ih2</link>
      <guid>https://dev.to/toeknee123/using-terraform-and-aws-cloudformation-to-enforce-your-aws-tags-4ih2</guid>
      <description>&lt;p&gt;Once &lt;a href="https://dev.to/cloudforecast/aws-tagging-best-practices-guide-part-1-of-3-3f85"&gt;you have adopted an AWS tagging strategy&lt;/a&gt;, you’ll need to make sure that all your existing AWS resources and any new ones you create abide by it. Consistency is the key - if you don’t proactively enforce your AWS tagging strategy, you’ll always be playing catch up and chasing down team members to make sure they add the right tags to their resources.&lt;/p&gt;

&lt;p&gt;While you can apply AWS tags to your resources manually using the &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/resourcegroupstaggingapi/tag-resources.html"&gt;AWS CLI&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/ARG/latest/userguide/tag-editor.html"&gt;AWS Tag Editor&lt;/a&gt;, you’ll probably find this cumbersome and error-prone at scale. A better approach is to automatically apply AWS tags to your resources and use rules to enforce their consistent usage.&lt;/p&gt;

&lt;p&gt;Depending on the tool you use to maintain your infrastructure on AWS, your method of proactively enforcing AWS tags on new resources may vary. In this guide, I’ll highlight two tools: Terraform and AWS CloudFormation. You’ll see how to use each to create and update AWS cost allocation tags on your resources and then enforce the proper use of specific tags for new resources. By proactively enforcing your AWS tagging strategy, you’ll minimize the time spent auditing and correcting improper AWS tags and help developers learn the AWS tagging best practices for your environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Terraform for AWS Tags
&lt;/h2&gt;

&lt;p&gt;The first infrastructure management tool I’ll cover is &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt;. Terraform works across a variety of cloud hosting providers to help you provision and maintain your AWS resources. With Terraform, you can define your servers, databases, and networks in code and apply your changes programmatically to your AWS account.&lt;/p&gt;

&lt;p&gt;If you’re new to Terraform, they have a well-documented &lt;a href="https://learn.hashicorp.com/terraform/getting-started/intro"&gt;Getting Started guide&lt;/a&gt; and several &lt;a href="https://github.com/terraform-providers/terraform-provider-aws/tree/master/examples"&gt;AWS template examples on GitHub&lt;/a&gt;. In this section, I’ll show you some snippets from a demo Terraform project and module that &lt;a href="https://github.com/CloudForecast/cf-terraform-demo"&gt;is available on GitHub&lt;/a&gt;. You'll learn the following in this Terraform AWS tags section:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tag a New AWS EC2 Instance with Terraform&lt;/li&gt;
&lt;li&gt;Using Terraform to Update Existing AWS Tags&lt;/li&gt;
&lt;li&gt;Enforce AWS Tags with Terraform&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Tag a New AWS EC2 Instance with Terraform
&lt;/h3&gt;

&lt;p&gt;If you want to deploy an EC2 instance with AWS Tags using Terraform, your configuration might include something like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_instance" "cart" {
  connection {
    type = "ssh"
    user = "ubuntu"
    host = self.public_ip
    private_key = file(var.private_key_path)
  }

  instance_type = "t2.micro"

  ami = var.aws_amis[var.aws_region]

  key_name = aws_key_pair.auth.key_name

  vpc_security_group_ids = [aws_security_group.default.id]

  subnet_id = aws_subnet.default.id

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get -y update",
      "sudo apt-get -y install nginx",
      "sudo service nginx start",
    ]
  }
  tags = {
    contact = "j-mark"
    env = "dev"
    service = "cart"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The above example includes three AWS cost allocation tags: &lt;code&gt;contact&lt;/code&gt;, &lt;code&gt;env&lt;/code&gt;, and &lt;code&gt;service&lt;/code&gt;, each with a string value. When you &lt;a href="https://www.terraform.io/docs/commands/apply.html"&gt;apply this configuration&lt;/a&gt;, Terraform will connect to AWS and deploy an EC2 instance with the AWS tags you specified.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kqydSigl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://paper-attachments.dropbox.com/s_11675783F6270CC3362B9903B770D88B278C2B2D779D11551760688B8EA1DFC4_1596383259429_cf-2020-08-02-a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kqydSigl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://paper-attachments.dropbox.com/s_11675783F6270CC3362B9903B770D88B278C2B2D779D11551760688B8EA1DFC4_1596383259429_cf-2020-08-02-a.png" alt="" width="880" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Terraform to Update Existing AWS Tags
&lt;/h3&gt;

&lt;p&gt;Terraform makes it easy to update AWS tags on existing resources in reversible and consistent ways. If you’re using AWS tags to keep track of a resource’s contact (e.g., &lt;code&gt;j-mark&lt;/code&gt; in the above example), you’ll likely need to update the AWS tag when that team member leaves or changes roles.&lt;/p&gt;

&lt;p&gt;To update the AWS tags on your resource, simply update the corresponding tags in your Terraform configuration. The new tags will overwrite any previous tags assigned to the resource, including tags added outside of Terraform.&lt;/p&gt;

&lt;p&gt;For example, to change the &lt;code&gt;contact&lt;/code&gt; cost allocation tag on the EC2 instance above, you might update the &lt;code&gt;tags&lt;/code&gt; block above with the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tags = {
  contact = "l-duke"
  env = "dev"
  service = "cart"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When you apply this configuration, the AWS tags will be automatically updated in the AWS console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g2ZBoEAF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://paper-attachments.dropbox.com/s_11675783F6270CC3362B9903B770D88B278C2B2D779D11551760688B8EA1DFC4_1596383277121_cf-2020-08-02-b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g2ZBoEAF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://paper-attachments.dropbox.com/s_11675783F6270CC3362B9903B770D88B278C2B2D779D11551760688B8EA1DFC4_1596383277121_cf-2020-08-02-b.png" alt="" width="880" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you keep your Terraform configuration files in version control - which is probably a good idea - you will be able to see how tags have changed over time. You can also review changes using the same code review process that your application code goes through to help you catch mistakes in the execution of your tagging strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enforce AWS Tags with Terraform
&lt;/h3&gt;

&lt;p&gt;As your infrastructure grows, a code review process likely won’t be enough to prevent improper AWS tagging. Fortunately, you can enforce AWS tag names and values using &lt;a href="https://www.terraform.io/docs/configuration/variables.html"&gt;variables&lt;/a&gt; and custom validation rules in Terraform.&lt;/p&gt;

&lt;p&gt;In the examples above, the &lt;code&gt;tags&lt;/code&gt; list was hard-coded into the EC2 instance definition. A more scalable pattern would be to break your EC2 instance template into its own &lt;a href="https://www.terraform.io/docs/configuration/modules.html"&gt;module&lt;/a&gt; and use a &lt;code&gt;tags&lt;/code&gt; variable. You can then write a &lt;a href="https://www.terraform.io/docs/configuration/variables.html#custom-validation-rules"&gt;custom validation rule&lt;/a&gt; to check that the tags comply with your strategy.&lt;/p&gt;

&lt;p&gt;For example, if you want to check that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user specifies at least one tag&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;contact&lt;/code&gt; tag is either &lt;code&gt;j-mark&lt;/code&gt; or &lt;code&gt;l-duke&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;env&lt;/code&gt; tag is set&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;service&lt;/code&gt; tag is either &lt;code&gt;cart&lt;/code&gt; or &lt;code&gt;search&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You might create a module with a variable specified like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;variable "tags" {
  description = "The tags for this resource."
  validation {
    condition = (
      length(var.tags) &amp;gt; 0 &amp;amp;&amp;amp;
      contains(["j-mark", "l-duke"], var.tags.contact) &amp;amp;&amp;amp;
      var.tags.env != null &amp;amp;&amp;amp;
      contains(["cart", "search", "cart:search"], var.tags.service)
    )
    error_message = "Invalid resource tags applied."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
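&lt;p&gt;With the &lt;code&gt;tags&lt;/code&gt; variable in place, a consuming configuration passes its tags to the module explicitly. The module name and source path below are illustrative, not from the demo repository:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative module call; "./modules/ec2-instance" is a placeholder path.
module "cart_instance" {
  source = "./modules/ec2-instance"

  tags = {
    contact = "j-mark"
    env     = "dev"
    service = "cart"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;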

&lt;p&gt;Now when you run &lt;code&gt;terraform plan&lt;/code&gt; with a missing or invalid tag, you’ll get an error:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Invalid value for variable
...
Invalid resource tags applied.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
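&lt;p&gt;Because validation conditions are ordinary Terraform expressions, you can tighten them further. For example, a hedged sketch (not part of the demo repository; &lt;code&gt;alltrue()&lt;/code&gt; requires a recent Terraform release) that requires every tag key to be lowercase and hyphen-delimited:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;variable "tags" {
  description = "The tags for this resource."
  validation {
    # Sketch: every key must be lowercase words separated by hyphens.
    condition     = alltrue([for k, v in var.tags : can(regex("^[a-z0-9]+(-[a-z0-9]+)*$", k))])
    error_message = "Tag keys must be lowercase and hyphen-delimited."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;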

&lt;p&gt;Your rules can be as complex as &lt;a href="https://www.terraform.io/docs/configuration/index.html"&gt;Terraform’s Configuration Language&lt;/a&gt; allows, so functions like &lt;code&gt;regex()&lt;/code&gt;, &lt;code&gt;substr()&lt;/code&gt;, and &lt;code&gt;distinct()&lt;/code&gt; are all available. That said, there are some caveats to this approach.&lt;/p&gt;

&lt;p&gt;First, custom variable validation is an experimental feature in Terraform. &lt;a href="https://www.terraform.io/docs/configuration/terraform.html#experimental-language-features"&gt;Experimental features&lt;/a&gt; are subject to change, meaning that you might need to pay closer attention to Terraform updates. To enable &lt;code&gt;variable_validation&lt;/code&gt;, add the following to your &lt;code&gt;terraform&lt;/code&gt; block:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform {
  experiments = [variable_validation]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Second, Terraform’s variable validation only happens during the &lt;code&gt;terraform plan&lt;/code&gt; phase of your infrastructure’s lifecycle. It can’t prevent users from accidentally changing your tags directly in the AWS console, and it’s only as good as the validation rules you write. If you start using a new resource but forget to add validation rules, you might end up with lots of resources that don’t adhere to your tagging strategy.&lt;/p&gt;

&lt;p&gt;Another option for paid Terraform Cloud customers is &lt;a href="https://www.terraform.io/docs/cloud/sentinel/index.html"&gt;Sentinel&lt;/a&gt;, which allows you to create custom policies for your resources. I won’t cover this method here, but Terraform has created an &lt;a href="https://www.terraform.io/docs/cloud/sentinel/examples.html"&gt;example policy&lt;/a&gt; to show you how to enforce mandatory AWS tags.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using AWS CloudFormation for Tags
&lt;/h2&gt;

&lt;p&gt;Similar to Terraform, AWS &lt;a href="https://aws.amazon.com/cloudformation/"&gt;CloudFormation&lt;/a&gt; lets you provision AWS resources based on configuration files. Unlike Terraform, CloudFormation is part of Amazon’s offerings, so it won’t necessarily help you if you want to use another infrastructure provider. The approach to tagging your resources in CloudFormation is similar to that used by Terraform, but as you’ll see, the configuration format is different.&lt;/p&gt;

&lt;p&gt;If you’re new to AWS CloudFormation, &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/GettingStarted.Walkthrough.html"&gt;Amazon’s official walkthrough&lt;/a&gt; will help you get started deploying some basic templates. In this section, I’ll show you some snippets from a demo AWS CloudFormation template which is &lt;a href="https://github.com/CloudForecast/cf-cloudformation-demo"&gt;also available on GitHub&lt;/a&gt;. You'll learn the following in this AWS CloudFormation tags section:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AWS CloudFormation Template to Deploy Tags&lt;/li&gt;
&lt;li&gt;Using CloudFormation to Update AWS Tags&lt;/li&gt;
&lt;li&gt;CloudFormation Template to Enforce AWS Tags&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  AWS CloudFormation Template to Deploy Tags
&lt;/h3&gt;

&lt;p&gt;AWS CloudFormation is designed to make it easy to create AWS resources with a single template file. Using a CloudFormation template, every resource that supports tagging can be deployed with AWS tags.&lt;/p&gt;

&lt;p&gt;For example, to create a new EC2 instance with the same three AWS tags used in the Terraform example above, add an array of &lt;code&gt;Tags&lt;/code&gt; to the resource’s &lt;code&gt;Properties&lt;/code&gt; block:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Resources" : {
  "WebServerInstance": {  
    "Type": "AWS::EC2::Instance",
    "Metadata" : {...},
    "Properties": {
      "Tags" : [
       {
          "Key" : "contact",
          "Value" : "j-mark"
       },
       {
          "Key" : "env",
          "Value" : "dev"
       },
       {
          "Key" : "service",
          "Value" : "cart"
       }
      ],
      ...       
    }
  },
  ...         
},
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;a href="https://aws.amazon.com/cli/"&gt;AWS CLI&lt;/a&gt;, you can &lt;a href="https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudformation/create-stack.html"&gt;deploy this CloudFormation template as a new stack&lt;/a&gt;. This will ensure your template is valid and create the specified resources with their tags on AWS:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation create-stack --template-body file://path/to/your/template.json --stack-name=&amp;lt;YOUR_STACK_NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you have lots of similar resources in your template, you can deploy AWS tags to all the resources in the stack at once using the &lt;code&gt;--tags&lt;/code&gt; flag with the &lt;code&gt;create-stack&lt;/code&gt; or &lt;code&gt;update-stack&lt;/code&gt; commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Creating a stack with tags
aws cloudformation create-stack --template-body file://path/to/your/template.json --stack-name=&amp;lt;YOUR_STACK_NAME&amp;gt; --tags="Key=env,Value=dev"

# Updating a stack with tags
aws cloudformation update-stack --template-body file://path/to/your/template.json --stack-name=&amp;lt;YOUR_STACK_NAME&amp;gt; --tags="Key=env,Value=dev"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Using CloudFormation to Update AWS Tags
&lt;/h3&gt;

&lt;p&gt;If you want to change the contact on your EC2 instance created above, simply change the &lt;code&gt;Tags&lt;/code&gt; section of your template file and use the &lt;a href="https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudformation/update-stack.html"&gt;&lt;code&gt;update-stack&lt;/code&gt; command&lt;/a&gt; to deploy your changes.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Tags" : [
 {
    "Key" : "contact",
    "Value" : "l-duke"
 },
  ...
],
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;AWS CloudFormation behaves the same way that Terraform does when you update tags outside your template file. Any tags set manually will be overridden by the &lt;code&gt;update-stack&lt;/code&gt; command, so be sure that everyone on your team deploys tags through CloudFormation.&lt;/p&gt;

&lt;h3&gt;
  
  
  CloudFormation Template to Enforce AWS Tags
&lt;/h3&gt;

&lt;p&gt;AWS provides &lt;a href="https://aws.amazon.com/about-aws/whats-new/2019/11/aws-launches-tag-policies/"&gt;Organization Tag Policies&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_use-managed-rules.html"&gt;Config Managed Rules&lt;/a&gt; to help you &lt;em&gt;find&lt;/em&gt; improperly tagged resources, but neither of these tools prevents you from &lt;em&gt;creating&lt;/em&gt; resources with missing or invalid tags. One way to proactively enforce your tagging strategy is by using the &lt;a href="https://github.com/aws-cloudformation/cfn-python-lint"&gt;CloudFormation linter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cfn-lint&lt;/code&gt; is a command-line tool that will make sure your AWS CloudFormation template is correctly formatted. It checks the formatting of your JSON or YAML file, proper typing of your inputs, and &lt;a href="https://github.com/aws-cloudformation/cfn-python-lint/blob/master/docs/rules.md"&gt;a few hundred other best practices&lt;/a&gt;. While the presence of specific tags isn’t checked by default, you can write a custom rule to do so in Python.&lt;/p&gt;

&lt;p&gt;For example, if you want to ensure that your CloudFormation web servers follow the same rules as the Terraform example above and have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;At least one AWS tag&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;contact&lt;/code&gt; tag set to either &lt;code&gt;j-mark&lt;/code&gt; or &lt;code&gt;l-duke&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;env&lt;/code&gt; tag set&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;service&lt;/code&gt; tag set to &lt;code&gt;cart&lt;/code&gt; or &lt;code&gt;search&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can create a new rule called &lt;code&gt;TagsRequired.py&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from cfnlint.rules import CloudFormationLintRule
from cfnlint.rules import RuleMatch


class TagsRequired(CloudFormationLintRule):
    id = 'E9000'
    shortdesc = 'Tags are properly set'
    description = 'Check all Tag rules for WebServerInstances'

    def match(self, cfn):
        matches = []
        approved_contacts = ['j-mark', 'l-duke']
        valid_services = ['cart', 'search']
        web_servers = [x for x in cfn.search_deep_keys('WebServerInstance') if x[0] == 'Resources']

        for web_server in web_servers:
            tags = web_server[-1]['Properties']['Tags']

            if not tags:
                message = "All resources must have at least one tag"
                matches.append(RuleMatch(web_server, message.format()))

            if not next((x for x in tags if x.get('Key') == 'env'), None):
                message = "All resources must have an 'env' tag"
                matches.append(RuleMatch(web_server, message.format()))

            for tag in tags:
                if tag.get('Key') == 'contact' and tag.get('Value') not in approved_contacts:
                    message = "The contact must be an approved contact"
                    matches.append(RuleMatch(web_server, message.format()))

                if tag.get('Key') == 'service' and tag.get('Value') not in valid_services:
                    message = "The service must be a valid service"
                    matches.append(RuleMatch(web_server, message.format()))

        return matches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When you run &lt;code&gt;cfn-lint&lt;/code&gt;, include your custom rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cfn-lint template.json -a ./path/to/custom/rules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If your CloudFormation template is missing any tags, you’ll see an error:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;E9000 Missing Tag contact at Resources/WebServerInstance/Properties/Tags
template.json:169:9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using linting to validate your AWS CloudFormation rules is a great way to enforce your AWS tags proactively. If you’re storing your CloudFormation templates in version control, you can run &lt;code&gt;cfn-lint&lt;/code&gt; using &lt;a href="https://github.com/aws-cloudformation/cfn-python-lint#pre-commit"&gt;pre-commit hooks&lt;/a&gt; or by making it part of your continuous integration workflow.&lt;/p&gt;

&lt;p&gt;Because these rules are written in Python, they can be as complex as you need them to be, but they have drawbacks as well. Like Terraform’s custom variable validation, linting rules won’t tell you about existing problems in resources that aren’t managed by CloudFormation, so they work best when combined with a reactive tag audit and adjustment strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Properly tagged resources will help you &lt;a href="https://www.cloudforecast.io/blog/how-tagging-resources-can-reduce-your-aws-bill/"&gt;predict and control your costs&lt;/a&gt;, but your tagging strategy can’t just be reactive. Having proactively enforced patterns will require an up-front investment, but will save you time and money in the long run.&lt;/p&gt;

&lt;p&gt;Once you’ve adopted a tagging strategy and proactive enforcement method, the last piece of the puzzle is catching up when you fall behind. In the final part of this guide, you’ll see how to audit and find mistagged resources to ensure your tagging strategy continues to succeed in the future.&lt;/p&gt;

&lt;p&gt;If you’re interested in getting help with your tagging, reach out to our CTO at &lt;a href="mailto:francois@cloudforecast.io"&gt;francois@cloudforecast.io&lt;/a&gt; to receive a free tagging compliance report.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This was originally posted on our blog on 8/27/2020: &lt;a href="https://www.cloudforecast.io/blog/aws-tagging-best-practices-guide-part-2/"&gt;https://www.cloudforecast.io/blog/aws-tagging-best-practices-guide-part-2/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>AWS Tagging Best Practices Guide: Part 1 of 3</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Tue, 18 Aug 2020 15:19:52 +0000</pubDate>
      <link>https://dev.to/cloudforecast/aws-tagging-best-practices-guide-part-1-of-3-3f85</link>
      <guid>https://dev.to/cloudforecast/aws-tagging-best-practices-guide-part-1-of-3-3f85</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on our blog on August 12, 2020: &lt;a href="https://www.cloudforecast.io/blog/aws-tagging-best-practices/"&gt;https://www.cloudforecast.io/blog/aws-tagging-best-practices/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Part 1: An Introduction to AWS Tagging Strategies
&lt;/h1&gt;

&lt;p&gt;If you've worked in Amazon Web Services for long, you've probably seen or used &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html"&gt;AWS cost allocation tags&lt;/a&gt; to organize your team's resources. AWS tags allow you to attach metadata to most resources in the form of key-value pairs called tags. In this guide (the first in a three-part series), we'll cover some of the most common use-cases for AWS tags and look at some AWS tagging best practices for selecting and organizing your AWS tags. Finally, we'll explore some examples of AWS resource tagging strategies used by real companies to improve visibility into their resource utilization in Amazon Web Services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use AWS Tags?
&lt;/h2&gt;

&lt;p&gt;AWS tags can help you &lt;a href="https://www.cloudforecast.io/blog/how-tagging-resources-can-reduce-your-aws-bill/"&gt;understand and control your AWS costs&lt;/a&gt;. &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/"&gt;AWS Cost Explorer&lt;/a&gt; allows you to use tags to break down your AWS resource usage over time, while tools like &lt;a href="https://www.cloudforecast.io/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=tagging_part1"&gt;CloudForecast&lt;/a&gt; keep you informed of your spending proactively.&lt;/p&gt;

&lt;p&gt;Understanding and controlling your costs isn’t the only reason to tag your resources with AWS tags. You can use AWS tags to answer a variety of questions, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which team member is the point of contact for this AWS resource?&lt;/li&gt;
&lt;li&gt;How many of our servers have been updated with the latest version of our operating system?&lt;/li&gt;
&lt;li&gt;How many of our services have alerting enabled?&lt;/li&gt;
&lt;li&gt;Which AWS resources are unnecessary at low-load hours?&lt;/li&gt;
&lt;li&gt;Who should have access to this resource?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before you start adding AWS tags to all of your AWS resources, it's essential to create a strategy that will help you sustainably manage your tags. AWS tags can be helpful, but without a consistently applied plan, they can become an unsustainable mess.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read also: &lt;a href="https://www.cloudforecast.io/blog/how-tagging-resources-can-reduce-your-aws-bill/"&gt;How Tagging AWS Resources can Reduce Your Bill&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Tagging Best Practices
&lt;/h2&gt;

&lt;p&gt;While there isn't a perfect AWS tagging strategy that works for every organization, there are a few AWS tagging best practices that you should be familiar with.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Know how each tag you create will be used
&lt;/h3&gt;

&lt;p&gt;AWS cites &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html#tag-categories"&gt;four categories for cost allocation tags&lt;/a&gt;: technical, business, security, and automation. Consider which of these categories you will need when creating your AWS tagging strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Technical Tags&lt;/strong&gt; help engineers identify and work with the resource. These might include an application or service name, an environment, or a version number.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Tags&lt;/strong&gt; allow stakeholders to analyze costs and the teams or business units responsible for each resource. For example, you might want to know what percentage of your AWS spend is going towards the new product you launched last year so you can determine the return on investment of that effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Tags&lt;/strong&gt; ensure compliance and security standards are met across the organization. These tags might be used to limit access or denote specific data security requirements for &lt;a href="https://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act"&gt;HIPAA&lt;/a&gt; or &lt;a href="https://aws.amazon.com/compliance/soc-faqs/"&gt;SOC&lt;/a&gt; compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation Tags&lt;/strong&gt; can be used to automate the cleanup, shutdown, or usage rules for each resource in your account. For example, you could tag sandbox servers and run a script to delete them after they're no longer in use.&lt;/li&gt;
&lt;/ul&gt;
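&lt;p&gt;As a concrete illustration (the key names and values here are hypothetical), a single resource's tags, shown in Terraform's HCL syntax, might draw from all four categories:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical tags spanning the four categories AWS describes.
tags = {
  service         = "cart"    # technical: identifies the workload
  "cost-center"   = "retail"  # business: maps spend to a business unit
  "data-class"    = "pii"     # security: flags data-handling requirements
  "auto-shutdown" = "true"    # automation: consumed by a cleanup script
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;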

&lt;h3&gt;
  
  
  2. Decide which AWS tags will be mandatory
&lt;/h3&gt;

&lt;p&gt;As you decide which AWS tags you need and how you will use them, set rules about their usage. Decide which AWS tags will be mandatory, what character should be used as a delimiter, and who will be responsible for creating them. If you already have many resources, you may have to delegate tag assignment to the teams who use them.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Develop a consistent AWS tag naming convention
&lt;/h3&gt;

&lt;p&gt;Choosing a consistent and scalable AWS tag naming convention for your AWS tag keys and values can be complicated. There are rules about which characters you can use and how long AWS tag keys and values can be. Be sure to &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions"&gt;read up on these tag restrictions&lt;/a&gt; before you select an AWS tag naming convention.&lt;/p&gt;

&lt;p&gt;A common AWS tag naming convention pattern is to use lowercase letters with hyphens between words and colons to namespace them. For example, you might use something like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag Key&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mifflin:eng:os-version&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Where &lt;code&gt;mifflin&lt;/code&gt; is the name of your company, &lt;code&gt;eng&lt;/code&gt; designates this tag as being relevant to the engineering team, &lt;code&gt;os-version&lt;/code&gt; indicates the purpose of the tag, and &lt;code&gt;1.0&lt;/code&gt; is the value.&lt;/p&gt;
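&lt;p&gt;To make a convention like this enforceable, you can encode it as a small check. The following Python sketch is illustrative only; the pattern mirrors the convention above, and the 128-character cap reflects AWS's documented tag-key limit:&lt;/p&gt;

```python
import re

# Illustrative check for a namespaced tag key such as "mifflin:eng:os-version":
# lowercase words separated by hyphens, segments separated by colons.
# AWS limits tag keys to 128 characters.
TAG_KEY_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*(?::[a-z0-9]+(?:-[a-z0-9]+)*)*$")

def is_valid_tag_key(key: str, max_length: int = 128) -> bool:
    """Return True if the key follows the lowercase, colon-namespaced convention."""
    if len(key) > max_length:
        return False
    return bool(TAG_KEY_PATTERN.match(key))
```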

&lt;h3&gt;
  
  
  4. Limit the number of AWS tags you adopt
&lt;/h3&gt;

&lt;p&gt;There are technical and practical limits to the number of tags you should use. First, AWS enforces a 50-tag limit on each resource. More importantly, engineers will have a hard time keeping track of and remembering how to properly use tags if you require too many.&lt;/p&gt;

&lt;p&gt;Fortunately, many tags can be avoided by relying on AWS's built-in resource metadata. For example, you don't have to store the creator of an EC2 instance because &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html"&gt;Amazon adds a &lt;code&gt;createdBy&lt;/code&gt; tag by default&lt;/a&gt;. Decide which tags you need and try to limit the creation of new tags.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Automate AWS tag management
&lt;/h3&gt;

&lt;p&gt;As the number of AWS resources in your account grows, keeping up with your AWS tags, enforcing conventions, and updating tags will get increasingly difficult. In Part 2 and 3 of this guide, we'll look at how you can use &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt;, &lt;a href="https://aws.amazon.com/cloudformation/"&gt;CloudFormation&lt;/a&gt;, and &lt;a href="https://cloudcustodian.io/"&gt;Cloud Custodian&lt;/a&gt; to manage tags across your resources.&lt;/p&gt;

&lt;p&gt;Amazon also offers &lt;a href="https://aws.amazon.com/blogs/aws/new-use-tag-policies-to-manage-tags-across-multiple-aws-accounts/"&gt;tag policies&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/aws/resource-groups-and-tagging/"&gt;tagging by resource group&lt;/a&gt;, and a &lt;a href="https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/"&gt;resource tagging API&lt;/a&gt; to help you govern and assign tags in bulk. Automating as much of the tag management process as possible will result in higher quality, more maintainable tags in the long run.&lt;/p&gt;
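&lt;p&gt;For example, the resource tagging API lets you apply the same tags to many resources in a single AWS CLI call (the ARN below is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Placeholder ARN; list the resources you want to tag in bulk.
aws resourcegroupstaggingapi tag-resources \
  --resource-arn-list arn:aws:s3:::example-bucket \
  --tags env=dev,service=cart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;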

&lt;h3&gt;
  
  
  6. Plan to audit and maintain AWS tags
&lt;/h3&gt;

&lt;p&gt;You will undoubtedly need to revisit your AWS tags periodically to make sure they're still useful and accurate. Depending on how many resources you deploy, this might mean setting a reminder to audit your tags every quarter, or it might mean creating a committee to review and update tags every month. We'll look at some tools and strategies for managing your tags in Part 3 of this guide.&lt;/p&gt;

&lt;p&gt;Amazon Web Services provides a &lt;a href="https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf"&gt;comprehensive document of their recommended practices&lt;/a&gt; for tagging resources. Be sure to review it if you're new to AWS tags and want to dive deeper into some of these AWS tagging best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example AWS Tagging Strategies
&lt;/h2&gt;

&lt;p&gt;Let's look at a few real-world tagging strategies. These are adapted from real companies that use AWS tags to organize their resources for various reasons. While they may differ from your use case, they'll offer you insight into how you might tag your resources in AWS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1: A Service-Based AWS Tagging Strategy
&lt;/h3&gt;

&lt;p&gt;A widespread pattern for tagging resources is by service and environment. For example, if an organization has two services (&lt;code&gt;cart&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt;) and two environments (&lt;code&gt;prod&lt;/code&gt; and &lt;code&gt;dev&lt;/code&gt;), they might set up the following tags:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;service&lt;/td&gt;
&lt;td&gt;cart or search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;contact&lt;/td&gt;
&lt;td&gt;Name of the engineer who maintains this resource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;env&lt;/td&gt;
&lt;td&gt;prod or dev&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If these two services share a single &lt;a href="https://aws.amazon.com/rds/"&gt;RDS&lt;/a&gt; instance, then the database can be tagged &lt;code&gt;service=cart|search&lt;/code&gt; (to indicate that this resource serves both services) and the architecture might look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d6F1_qje--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/nPLHmfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d6F1_qje--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/nPLHmfp.png" alt="A service-based tagging strategy in AWS" width="880" height="1264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you choose an AWS tagging strategy like the one above, you have to consider how tags will change over time. For example, if you add a new service that shares the same RDS instance, you’ll have to update the database’s tags to include the name of the new service. For this reason, some teams opt to use a single tag value to indicate that a resource may be used by all services (e.g., &lt;code&gt;service=common&lt;/code&gt;).&lt;/p&gt;
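&lt;p&gt;As a sketch of this convention, a small helper can expand a &lt;code&gt;service&lt;/code&gt; tag into the list of services a shared resource belongs to. Note that the &lt;code&gt;|&lt;/code&gt; delimiter and the &lt;code&gt;common&lt;/code&gt; value are team conventions, not AWS features, and the service inventory below is invented:&lt;/p&gt;

```python
# Hypothetical helper for the multi-service tag convention described above.
# The "|" delimiter and the "common" value are conventions, not AWS features.
ALL_SERVICES = ["cart", "search"]  # assumed service inventory

def services_for_resource(service_tag):
    """Expand a `service` tag value into the services it covers."""
    if service_tag == "common":
        return list(ALL_SERVICES)
    return service_tag.split("|")

print(services_for_resource("cart|search"))  # ['cart', 'search']
print(services_for_resource("common"))       # ['cart', 'search']
```

&lt;p&gt;With the &lt;code&gt;common&lt;/code&gt; approach, adding a new service means updating only the shared inventory rather than retagging every shared resource.&lt;/p&gt;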

&lt;p&gt;Service-based tagging strategies like this are usually a good starting point if you'd like to understand which services contribute the most to your AWS costs. The business team can use these tags to see how much they're paying for each service or environment and reach out to the appropriate contact if they have questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: A Compliance AWS Tagging Strategy
&lt;/h3&gt;

&lt;p&gt;AWS cost allocation tags may also help organizations manage governance or compliance. Tools like &lt;a href="//www.cloudforecast.io"&gt;CloudForecast&lt;/a&gt;, through their &lt;a href="https://www.cloudforecast.io/aws-tagging-compliance-report.html"&gt;AWS Tagging Compliance feature&lt;/a&gt;, can help you maintain compliance by verifying that your resources are tagged in specific ways. These AWS tags might be used to limit access or run extra security checks on particular resources.&lt;/p&gt;

&lt;p&gt;In this example, the company tags resources that contain user data with &lt;code&gt;user-data=true&lt;/code&gt; so that they can audit them more frequently and ensure they meet specific standards. All resources have a &lt;code&gt;contact&lt;/code&gt; and &lt;code&gt;env&lt;/code&gt; tag to designate the responsible team member and ensure someone is accountable for keeping them up to date.&lt;/p&gt;
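&lt;p&gt;A minimal sketch of such an audit check might look like the following. The tag keys mirror the example above, but the function itself is hypothetical, not a CloudForecast or AWS API:&lt;/p&gt;

```python
# Hypothetical audit helper; the tag keys mirror the compliance example above.
REQUIRED_KEYS = {"contact", "env"}  # every resource must carry these

def audit_tags(tags):
    """Report missing required keys and whether the resource holds
    user data (and so needs more frequent review)."""
    return {
        "missing": sorted(REQUIRED_KEYS - tags.keys()),
        "needs_extra_review": tags.get("user-data") == "true",
    }

print(audit_tags({"contact": "alice", "user-data": "true"}))
# {'missing': ['env'], 'needs_extra_review': True}
```

&lt;p&gt;Running a check like this on every resource's tag set makes it easy to produce a list of non-compliant resources and the owners to contact about them.&lt;/p&gt;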

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--shDPinIK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/VfWPQ6O.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--shDPinIK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/VfWPQ6O.png" alt="A compliance-based tagging strategy in AWS" width="880" height="959"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using a compliance tagging strategy does not preclude you from using other strategies as well. One of the advantages of AWS tags is that they let you segment your AWS resources in a nearly infinite number of ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 3: Account Segmented Environments
&lt;/h3&gt;

&lt;p&gt;The final example we'll look at is an account-segmented tagging strategy. While AWS's IAM permissions allow you to assign access to users, roles, and teams granularly, some organizations may want to go a step further.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When resources across heterogeneous logical environments are colocated, it is deceptively easy to accidentally use resources from another environment if you're not extraordinarily careful when provisioning resources and designing network/IAM policies" - Platform Engineer at &lt;a href="https://www.cars.com/"&gt;Cars.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this example, the organization assigned business-unit and team tags to each resource, with each environment living in a separate AWS account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DvIdZJtH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/Lg36Cug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DvIdZJtH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgur.com/Lg36Cug.png" alt="An account-segmented environment and tagging strategy in AWS" width="880" height="1528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows them to generate reports in each environment to see what their resource costs are for the marketing (&lt;code&gt;mktg&lt;/code&gt;) unit vs. the data warehousing (&lt;code&gt;data&lt;/code&gt;) unit. If the team uses this method of account-segmented tagging, they’ll need to &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html"&gt;use a master account&lt;/a&gt; to see resource usage across their entire organization. You can also use &lt;a href="https://www.cloudforecast.io"&gt;CloudForecast&lt;/a&gt; to generate regular cost reports and breakdowns across multiple AWS accounts.&lt;/p&gt;
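&lt;p&gt;Once each account's cost data is exported, rolling it up by business unit is a simple aggregation. The accounts, unit tags, and dollar figures below are invented for illustration:&lt;/p&gt;

```python
from collections import defaultdict

# Invented example rows: (account, business-unit tag, cost in USD).
rows = [
    ("prod-account", "mktg", 120.0),
    ("prod-account", "data", 340.0),
    ("dev-account", "mktg", 15.0),
]

def cost_by_unit(cost_rows):
    """Sum costs across accounts, grouped by the business-unit tag."""
    totals = defaultdict(float)
    for _account, unit, cost in cost_rows:
        totals[unit] += cost
    return dict(totals)

print(cost_by_unit(rows))  # {'mktg': 135.0, 'data': 340.0}
```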

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Any organization that uses AWS at scale will need to develop a tagging strategy that works for them. Consider the AWS tagging best practices and examples above, as well as your organization's goals.&lt;/p&gt;

&lt;p&gt;Once you decide on an AWS tagging strategy, you will need a plan for adding and maintaining AWS cost allocation tags. In the next part of this guide, we'll look at tools you can adopt to ensure your engineering teams are using AWS tags consistently across all your AWS resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cloudforecast.io/aws-tagging-compliance-report.html?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=tagging_part1"&gt;&lt;br&gt;
&lt;img alt="Need help with finding all your untagged AWS resources? Start a trial and discover all your resources that require AWS tags." src="https://res.cloudinary.com/practicaldev/image/fetch/s--usGvWsuK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.cloudforecast.io/blog/assets/media/tag_trial_revised.png" width="880" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>awstags</category>
    </item>
    <item>
      <title>How Tagging Resources Can Reduce Your AWS Bill</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Wed, 18 Mar 2020 16:22:40 +0000</pubDate>
      <link>https://dev.to/toeknee123/how-tagging-resources-can-reduce-your-aws-bill-50g6</link>
      <guid>https://dev.to/toeknee123/how-tagging-resources-can-reduce-your-aws-bill-50g6</guid>
      <description>&lt;p&gt;Amazon AWS has revolutionized everything from simple file storage to complex data processing tasks. From students to Fortune 100 companies, anyone can tap virtually limitless storage and processing power in just minutes.&lt;/p&gt;

&lt;p&gt;But with great power comes great responsibility: The pay-as-you-go pricing structure makes it just as easy to rack up enormous bills. It's easy for companies to 'set and forget' their AWS infrastructure until it grows out of control.&lt;/p&gt;

&lt;p&gt;Fortunately, there are simple tools and strategies that you can put in place to manage your costs. The key is recognizing the need to monitor costs early on and putting the right processes in place to ensure everyone is kept up-to-date.&lt;/p&gt;

&lt;p&gt;Let's take a look at why AWS tags are the best starting point for cost management and how you can get started with them.&lt;/p&gt;

&lt;p&gt;Amazon AWS has made it possible to provision resources in minutes — but it's just as easy to rack up large bills without the right cost monitoring in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are AWS Tags?
&lt;/h2&gt;

&lt;p&gt;Amazon AWS enables users to assign metadata to their resources in the form of &lt;em&gt;tags&lt;/em&gt; — or simple labels that consist of a key and optional value. You can use tags to manage, search for, and filter resources, which makes it much easier to manage them.&lt;/p&gt;

&lt;p&gt;There are two different types of cost allocation tags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User-defined Tags&lt;/strong&gt;: These tags can include anything that you’d like to track, ranging from projects to cost centers. Just don’t include any sensitive information in them! The easiest way to add them is by using the AWS Tag Editor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS-generated Tags&lt;/strong&gt;: These are “createdBy” tags that are automatically created by AWS. You can activate these tags in the Billing and Cost Management console under Cost Allocation Tags.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/"&gt;AWS Cost Explorer&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html"&gt;Cost and Usage Report&lt;/a&gt; (CUR) files can break down AWS costs using these tags. You can easily see what business units, projects, customers, or cost centers are generating the highest expenses and address them before they become a problem.&lt;/p&gt;

&lt;p&gt;For example, the following report shows tags in each column and costs in each row, demonstrating how you can drill down into resource usage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_TENZQjq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wipn30bwyir62r6ithrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_TENZQjq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wipn30bwyir62r6ithrb.png" alt="Alt Text" width="487" height="171"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Cost Allocation Report Example - Source: Amazon AWS Docs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Cost allocation reports are generated as a comma-separated value (CSV) file that can be easily analyzed by third-party reporting tools or turned into visualizations.&lt;/p&gt;
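&lt;p&gt;For example, summing costs per tag value from such a CSV takes only a few lines. The column names and figures below are an invented stand-in; real exports use their own column naming (user-defined tag columns typically carry a &lt;code&gt;user:&lt;/code&gt; prefix):&lt;/p&gt;

```python
import csv
import io
from collections import defaultdict

# Invented stand-in for a cost allocation CSV; real exports have more
# columns and their own naming for user-defined tags.
CSV_DATA = """user:project,UnblendedCost
alpha,10.50
beta,4.25
alpha,2.00
"""

def cost_per_tag(data, tag_col="user:project"):
    """Sum the cost column for each distinct value of a tag column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(data)):
        totals[row[tag_col]] += float(row["UnblendedCost"])
    return dict(totals)

print(cost_per_tag(CSV_DATA))  # {'alpha': 12.5, 'beta': 4.25}
```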

&lt;p&gt;It’s important to note that cost allocation tags can take up to 24 hours to appear in the Billing and Cost Management console, but you can speed up the process with a manual refresh.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Tags
&lt;/h2&gt;

&lt;p&gt;The best way to get started with tags is to come up with a standardized set of resource tags to use throughout your organization. For instance, do you want to track costs by project, department, or other metrics? What will these tags be named?&lt;/p&gt;

&lt;p&gt;If you have existing tags, you can identify and programmatically standardize them using tools like the Resource Groups Tagging API, AWS Config Rules, custom scripts, or by manually using the AWS Tag Editor and billing reports. This is known as &lt;em&gt;reactive governance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MOE5EBrz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iihj4gxnp9y78hqodbu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MOE5EBrz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iihj4gxnp9y78hqodbu9.png" alt="Alt Text" width="880" height="631"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Example of Rule Parameters - Source: Amazon AWS Docs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The example above shows a series of required tags specified in an AWS Config Rule. If the "Value" is left empty, any value is accepted as long as the tag key is present, while listing specific values means the tag must use one of those values in order to be valid.&lt;/p&gt;
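&lt;p&gt;The semantics of those rule parameters can be sketched as a small validation function. The keys and allowed values below are hypothetical rule parameters, not the actual AWS Config implementation:&lt;/p&gt;

```python
# Hypothetical mirror of required-tags rule parameters: a key mapped to
# None accepts any value; a key mapped to a list constrains the value.
RULE = {
    "CostCenter": None,
    "Environment": ["prod", "dev"],
}

def is_compliant(tags, rule=RULE):
    """Check a resource's tags against the required-tags parameters."""
    for key, allowed in rule.items():
        if key not in tags:
            return False  # required key missing
        if allowed is not None and tags[key] not in allowed:
            return False  # value not in the allowed list
    return True

print(is_compliant({"CostCenter": "123", "Environment": "prod"}))  # True
print(is_compliant({"CostCenter": "123", "Environment": "qa"}))    # False
```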

&lt;p&gt;You can check for invalid tags on resources by setting up custom rules. In the example below, the rule checks EC2 instances to ensure that they contain required tags and flags any instances that don’t adhere to the standard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t6iiD__e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/raucabylyee3v0mu05dq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t6iiD__e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/raucabylyee3v0mu05dq.png" alt="Alt Text" width="880" height="334"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Example of Required Tags on EC2 - Source: Amazon AWS Docs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Whether you already use standardized tags or are starting from scratch, you can move straight to tools like AWS CloudFormation, AWS Service Catalog, or IAM resource-level permissions to enforce standardized tags during resource creation. This is known as &lt;em&gt;proactive governance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--07KddfbQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jz1wdupugboh8p50ffsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--07KddfbQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jz1wdupugboh8p50ffsz.png" alt="Alt Text" width="651" height="357"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Tag Library Workflow - Source: Amazon AWS Docs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AWS Service Catalog TagOption Library is one of the easiest ways to create tag templates. When you associate TagOptions with your products, the tool aggregates the associated TagOptions and applies them to each newly provisioned product.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;It's easy for AWS costs to spin out of control over time, but fortunately, tagging can help you monitor and proactively address these rising costs. &lt;/p&gt;

&lt;p&gt;Tracking AWS costs can help you save thousands or tens of thousands of dollars over time by cutting waste and streamlining usage. It’s important to implement these efforts with the support of both engineers and management and ensure smooth communication.&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Interview with GitLab CEO, Sid Sijbrandij: Keys to Developer Marketing</title>
      <dc:creator>Tony Chan</dc:creator>
      <pubDate>Fri, 22 Mar 2019 20:38:26 +0000</pubDate>
      <link>https://dev.to/cloudforecast/interview-with-gitlab-ceo-sid-sijbrandij-keys-to-developer-marketing-1n61</link>
      <guid>https://dev.to/cloudforecast/interview-with-gitlab-ceo-sid-sijbrandij-keys-to-developer-marketing-1n61</guid>
      <description>&lt;p&gt;The &lt;a href="https://www.cloudforecast.io" rel="noopener noreferrer"&gt;CloudForecast&lt;/a&gt; team and I recently had a chance to do a “Pick Your Brain” session with the CEO of &lt;a href="https://www.gitlab.com" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;, Sid Sijbrandij. The range of topics included, keys to marketing to developers and growth strategies.&lt;/p&gt;

&lt;p&gt;The full interview can be found below this post. Here are a few highlights from the conversations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are some common early mistakes that you see software service companies make with marketing? Which one did you personally fall for? Are there any specific things you can think of with developer marketing that work really well?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You have to trust yourself that you understand your audience. For example, I post a lot on Hacker News, and I see that as a hobby of mine. I’m seeing the huge effect my posting on Hacker News has on people becoming fans of GitLab.&lt;/p&gt;

&lt;p&gt;Not measuring is a pitfall. If you take any marketing action, you want to make sure you measure its effect. It’s important to figure out ways to measure the effects of those marketing actions.&lt;/p&gt;

&lt;p&gt;Examples: What are your qualification criteria? What is a marketing-qualified lead? What is a sales-accepted opportunity?&lt;/p&gt;

&lt;p&gt;Another pitfall is only focusing on the number of leads. If you are only focused on the count, you’ll get a lot of small opportunities. It’s important to mix in the total size of pipeline opportunity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there certain themes that you see that resonates with developers with content marketing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’m a big fan of Hacker News, so I am biased towards that. If you read something on Hacker News, it’s because you want to learn something. There will be a lot of lessons and stories to tell as you build out your product. These lessons and stories are things you tell your co-founders: “Hey! This is interesting and this was the effect.” Writing and telling those stories in a public setting is an underserved market; I’ve found that they are mostly done by hobbyists. I find that too many companies want to present a company pitch in their posts.&lt;/p&gt;

&lt;p&gt;Writing about a hard technical problem that you solved, or something that totally didn’t work as you intended, tends to be more interesting for developers. It’s very relatable, and you feel like you’ve benefited from someone else making a mistake instead of making it yourself. That’s a gift that just keeps on giving. Try to keep it to topics that you’d discuss with developer friends when you grab a beer together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did you measure what was successful in your early days at GitLab with acquiring customers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We followed the revenue and tried different things really quickly.&lt;/p&gt;

&lt;p&gt;We tried many things at GitLab in the early days. We tried to sell support. Nobody wanted that. We tried donations. We got it to grow from seven dollars to less than a thousand dollars a month. We tried paid development, but it was super hard to coordinate. Then we tried licensed software, and that went bananas compared to the rest. We also tried different things with our pricing, testing a higher tier, and people bought that. Then we tested another tier that was even higher, and that worked well too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You all really tried a lot of things. How quickly did you all go from one idea to another?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We hopped from one idea to another pretty fast in the early days. For example, we tried donations for two months before moving on. What they advised us at YCombinator, an incubator program we went to, is to set a metric to destroy and try to prove it. Our metric at GitLab was to increase revenue by 20% every two weeks.&lt;/p&gt;

&lt;p&gt;That really keeps you honest. If you can’t do that, then you have to make a dramatic change with your product. It’s also great because you might think that any change should take at least three months to implement. If you are forced to execute a change in one week and need to have the revenue to show for it after the two weeks, you start taking all these shortcuts to try to make it happen. These are not bad shortcuts, but good shortcuts since you’re trying out things very fast and learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you paying the price for all the shortcuts?&lt;/strong&gt; &lt;em&gt;Example: Needing to hire more engineers or putting more engineer resources to fix tech debt , or working with clients to fix things due to the shortcuts that were made.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not really! You’d think that. We don’t take shortcuts with things like code quality. We do take shortcuts with the scope of a feature. It’s sometimes embarrassing how much we cut scope. For example, we now have a Jaeger integration, which is a cool new tracing methodology. The integration is that you can put in the URL and it will link to Jaeger. It’s a trivial feature, but it was planned as a bigger thing; we only had time to ship a basic version of it.&lt;/p&gt;

&lt;p&gt;Another good example is introducing a new pricing plan. You can make changes to your pricing page to see how it converts. You can also take out some ads and see what the click-through rate is with different pages, or do interviews with relevant people and ask them what it is worth to them. We’ve always had these huge two-month plans, but we cut them down in scope. This still gave us feedback, but in a much quicker way.&lt;/p&gt;

&lt;p&gt;We define scope based on how much time we have and not the dream vision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Did you feel like your growth strategy kind of evolved a bit to stay in line with your product and vision? Like how did that change a little bit from your early days to what it is today, over the last eight years?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Again in the beginning we always said, “Hey the majority of the world isn’t using GitLab yet. So, we’ll optimize for everyone not using GitLab.” That meant that we had to greatly expand the scope of our product in the beginning which is pretty controversial. That has worked out really well for us because now that the scope is so big, people now see the advantage of a single application for the whole devops lifecycle.&lt;/p&gt;

&lt;p&gt;Prioritization is getting harder now that we are bigger. In the early days, it was whatever features we thought were needed. Then you get your first customers and you focus on building features for them. We had one customer where we built everything they asked for. It turned out to be a really good thing because everyone else was asking for the same features later. Now we have this mixed bag of features that need to be refactored into new features, or new features that need to be extended. An extension of scope.&lt;/p&gt;

&lt;p&gt;The most useful thing I can tell you is to have a single person who determines prioritization of features. At GitLab, product managers make all final decisions on features. If you don’t agree, you can always talk to the product manager. We have certain guidelines for how they should make decisions. You don’t want the decision to be split between 10 different people.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any blogs or writers that you recommend that you personally enjoy reading and enjoy kind of looking over whenever you get a free chance?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The blog of Ben Thompson is great. If you’re starting a SaaS business, I would read Predictable Revenue by Aaron Ross.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any final advice to SaaS companies in the dev tool space?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I think VCs don’t like to invest in dev tools because it’s notoriously hard to make money. If you make a dev tool, make sure it’s a broad product. If it only has a single feature, then you’re just selling a CI tool and it will be very hard to make money.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you to Sid and the GitLab team for setting this up. GitLab is a single application for the entire software development lifecycle. From project planning and source code management to CI/CD, monitoring, and security. For more information on GitLab, please visit their website: &lt;a href="http://www.gitlab.com" rel="noopener noreferrer"&gt;www.gitlab.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.youtube.com/watch?v=qVVgKT1quMU" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fimg.youtube.com%2Fvi%2FqVVgKT1quMU%2F0.jpg" alt="IMAGE ALT TEXT HERE"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>culture</category>
      <category>leadership</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
