CAST AI

Posted on May 27, 2021 • Originally published at cast.ai

400 (!) EC2 instance types: the good, the bad, and the ugly

#aws #ec2 #devops #kubernetes

A DevOps life isn’t a piece of cake in AWS. How are you supposed to make sense of EC2 instance types when you’re looking at almost 400 different ones?

Picking the right VM type for the job that doesn’t burn a hole in your pocket is a challenge. But there are a few things you can do to make your life easier (and gain points with your financial department).

Careful choice of EC2 instances is definitely worth your time because compute is the biggest part of your cloud bill. If you manage to optimize it, you’ll open the doors to dramatic reductions of your cloud costs.

Before we get started: 5 basic facts about Amazon EC2 instances

Amazon Elastic Compute Cloud (EC2) is a service that delivers compute capacity in the cloud to help teams benefit from easy-to-scale cloud computing.
AWS currently offers nearly 400 different instances with choices across storage options, networking, and operating systems.
Users can choose from machines located in 24 regions and 77 availability zones all over the world.
EC2 instances run on two processor architectures: x86 (Intel Xeon and AMD EPYC) and Arm (AWS Graviton).
To match your use case, you can choose from 5 different EC2 instance families optimized for compute, memory, storage, accelerated computing or general purpose.

How to choose the EC2 instance types with cost optimization in mind

1. Identify your application’s requirements

Some teams make the mistake of choosing EC2 instances that are too large. They want to be on the safe side in case their application’s requirements increase. But why overprovision when you can use a burstable instance or delegate the task to incredibly cost-effective spot instances when needed?

Other teams are tempted to use more affordable instances. But what if they start running memory-intensive applications and encounter performance issues?

It all starts with knowing your workload requirements well. Make a deliberate effort to get only what your application really needs.

Identify the minimum requirements of your workload and pick EC2 instance types that meet them across these dimensions:

vCPU count
vCPU architecture
Memory
SSD storage
Network
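
As a rough illustration of the checklist above, here is a minimal Python sketch that filters a small catalog by minimum requirements. The `InstanceType` class, the `candidates` helper, and the hand-written `CATALOG` are hypothetical names made up for this example. A real catalog would come from the EC2 API, and network requirements are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    arch: str          # "x86_64" or "arm64"
    memory_gib: float
    local_ssd: bool    # True if the type ships with instance-store SSD


# Small hand-written sample for illustration only (hypothetical catalog).
CATALOG = [
    InstanceType("t3.medium", 2, "x86_64", 4, False),
    InstanceType("c5.large", 2, "x86_64", 4, False),
    InstanceType("m5.large", 2, "x86_64", 8, False),
    InstanceType("m5d.large", 2, "x86_64", 8, True),
    InstanceType("m6g.large", 2, "arm64", 8, False),
    InstanceType("r5.large", 2, "x86_64", 16, False),
]


def candidates(catalog, min_vcpus, min_memory_gib, archs, need_local_ssd=False):
    """Return the instance types that meet every minimum requirement."""
    return [
        t for t in catalog
        if t.vcpus >= min_vcpus
        and t.memory_gib >= min_memory_gib
        and t.arch in archs
        and (t.local_ssd or not need_local_ssd)
    ]


# Example: an x86 workload that needs at least 2 vCPUs and 8 GiB of memory.
for t in candidates(CATALOG, min_vcpus=2, min_memory_gib=8, archs={"x86_64"}):
    print(t.name)
```

Filtering on hard minimums first keeps the later price comparison limited to types that can actually run the workload.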

Let’s say that you’ve done your homework and come up with a set of targeted instance types.

CPU vs. GPU – which one should you pick?

If you’re looking for an instance to support a machine learning application, go for GPU instead of CPU. GPU-dense instance types train models much faster. Interestingly, the GPU wasn’t initially designed for machine learning; it was designed to display graphics.

What about running predictions? Is investing in specialized instance types worth it? AWS has introduced an instance type designed for inference, Amazon EC2 Inf1. According to AWS, it delivers up to 30% higher throughput and up to 45% lower cost per inference than EC2 G4 instances.

And what’s the hype around Arm all about? The EC2 A1 family is powered by the first-generation AWS Graviton Arm processor, and newer families such as M6g, C6g, and R6g run on Graviton2. Since Arm is less power-hungry, it’s also cheaper to run and cool. Cloud providers usually charge less for this type of processor.

But if you’d like to use it, you might have to re-architect your delivery pipeline to compile your application for Arm. On the other hand, if you’re already running an interpreted stack like Python, Ruby or NodeJS, your applications will likely run on Arm.

2. Shop around for EC2 instance types and families

3. Choose your instance size with cost savings in mind

EC2 instance types come in one or more sizes, so scaling resources to match your workload’s requirements is easy.

But size isn’t the only factor that determines the cost.

AWS rolls out different computers to provide compute capacity. And the chips in those computers have different performance characteristics.

You might get an instance running on an older-generation processor that is slightly slower or a new-generation one that is a bit faster. The instance type you pick might come with strong performance characteristics your application doesn’t really need. And you won’t even know it.

How to verify this? Benchmarking is the best approach. It means that you drop the same workload on every machine type you want to examine and check its performance characteristics.

Here’s an example of benchmarking:
To understand instance performance, we developed a metric called Endurance Coefficient. Here’s how we calculate it:

We measure how much work an instance type can carry out in 12 hours and how variable the CPU performance is.
A sustained base load needs stability. A workload that only sees occasional traffic spikes or batch jobs can get away with lower stability.
In our calculation, instances with stable performance score close to 100% and ones with random performance edge closer to 0%. We tested the DigitalOcean s1_1 machine and it achieved a pretty high endurance coefficient of 0.97107 (97%). The AWS t3_medium_st instance delivered a less stable result with an endurance coefficient of 0.43152 (43%).
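
The exact Endurance Coefficient formula isn’t given here, so the sketch below uses a stand-in: one minus the coefficient of variation of throughput samples, clamped to the 0 to 1 range. It only illustrates the idea of scoring stability between 0 and 1. The `stability_score` function and the sample readings are made up for this example and are not the CAST AI calculation.

```python
from statistics import mean, pstdev


def stability_score(samples):
    """Return 1 - coefficient of variation, clamped to [0, 1].

    `samples` holds work completed per measurement interval (any unit).
    This is a stand-in metric, NOT the Endurance Coefficient formula.
    """
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples)
    if avg <= 0:
        return 0.0
    cv = pstdev(samples) / avg   # relative spread of the readings
    return max(0.0, 1.0 - cv)


steady = [100, 101, 99, 100, 100, 98]   # made-up throughput readings
spiky = [100, 20, 95, 15, 100, 10]      # made-up throughput readings

print(f"steady: {stability_score(steady):.3f}")
print(f"spiky:  {stability_score(spiky):.3f}")
```

Whatever metric you choose, run the same workload for the same duration on every machine type so that the scores are comparable.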

4. Weigh the pros and cons of different pricing models

Next, you have to select an EC2 pricing model that matches your needs and budget. AWS offers the following models:

On-Demand instances

You pay only for the resources that you actually use. No need to worry about long-term binding contracts or upfront payments. Increase or reduce your usage just-in-time. But this flexibility comes with a high price tag. Workloads with fluctuating traffic spikes benefit the most from On-Demand instances.

Reserved Instances

Buy capacity upfront in a given availability zone with a large discount off the On-Demand price. The larger your upfront payment, the larger the discount. But if you go for it, you’re also committing to a specific instance or family. And you can’t change that later if your requirements change.

Savings Plans

Get the Reserved Instances discounts but commit to using a given amount of compute power per hour (not specific instance types and configurations). Anything extra will be billed at the high On-Demand rate.
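
To make the trade-off concrete, here is a small Python sketch of the arithmetic for one hour under a commitment-based plan. The `hourly_cost` function and all the numbers are assumptions for illustration. Real discount rates vary by plan, term, and instance family.

```python
def hourly_cost(on_demand_usage, commitment, discount):
    """Cost for one hour under a commitment-based plan (simplified model).

    on_demand_usage: what the hour's usage would cost at On-Demand rates
    commitment:      committed spend per hour, paid whether used or not
    discount:        fraction off On-Demand for covered usage (0 <= d < 1)
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    covered = commitment / (1 - discount)        # On-Demand value the commitment buys
    overflow = max(0.0, on_demand_usage - covered)  # billed at On-Demand rates
    return commitment + overflow


# Illustrative numbers: commit to 7.50 per hour at an assumed 25% discount.
for usage in (4.0, 10.0, 15.0):
    print(usage, hourly_cost(usage, commitment=7.5, discount=0.25))
```

In this model a quiet hour still costs the full commitment, which is why an oversized commitment can end up more expensive than plain On-Demand usage.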

But wait, didn’t you migrate to the cloud to avoid CAPEX in the first place? Reserved Instances and Savings Plans pose a risk of vendor lock-in. The resources you get today might make little sense for your company down the line. Three years is an eternity in cloud computing.

Spot instances

Bidding on spare compute is a smart move: you can save up to 90% off the On-Demand pricing. But AWS can pull the plug on your instance any time and give you just 2 minutes to prepare for it. You need to come up with a strategy to deal with that.
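
One building block of such a strategy is noticing the two-minute warning. The Python sketch below polls the EC2 instance metadata service, which exposes a `spot/instance-action` path once an interruption is scheduled. The function names are made up for this example, and it assumes it runs on the instance itself with IMDSv2 available. Treat it as a starting point rather than a complete handler.

```python
import json
import urllib.error
import urllib.request

# Instance metadata paths; only reachable from inside an EC2 instance.
TOKEN_URL = "http://169.254.169.254/latest/api/token"
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_notice(timeout=1.0):
    """Return the raw notice body, or None when no interruption is scheduled."""
    token_req = urllib.request.Request(
        TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        NOTICE_URL, headers={"X-aws-ec2-metadata-token": token}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:   # the path does not exist until a notice is issued
            return None
        raise


def check_interruption(fetch=fetch_notice):
    """Return (action, time) if an interruption is scheduled, else None."""
    body = fetch()
    if body is None:
        return None
    notice = json.loads(body)
    return notice["action"], notice["time"]
```

A worker could call `check_interruption()` every few seconds and, when it returns a value, stop accepting new work and checkpoint its state. The `fetch` parameter exists so the parsing logic can be tested off-instance with a fake fetcher.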

Learn more about spot instances here: Spot instances: How to reduce AWS, Azure, and GCP costs by 90%

Dedicated host

A physical server with instance capacity that is fully dedicated to you. You can reduce costs by using your own licenses and still get the resiliency and flexibility of the cloud. It’s pricey, but a good match for applications that have to achieve compliance and, for example, not share hardware with other tenants.

5. Slash costs with CPU bursting

Burstable performance instances were designed to give you a baseline level of CPU performance together with the possibility of bursting to a higher level when the need arises.

Burstable instances in families T2, T3, T3a, and T4g are a good fit for low-latency interactive applications, microservices, small/medium databases, and product prototypes.

Bursting can happen if you have credits. The number of accumulated CPU credits depends on your instance type. Generally, larger instances collect more credits per hour. But note that there’s a cutoff to the number of credits that can be collected (and naturally, it’s higher for larger instances).

Restarting instances leads to losing credits:

Restarting an instance in the T2 family means that you immediately lose all the accrued credits.
If you restart an instance in the T3, T3a, or T4g families, your credits will still be there for seven days (and then you’ll lose them).

We examined burstable instances AWS offers and discovered that if you load your instance for 4 hours or more per day (on average), you’re better off with a non-burstable instance. But if you run an e-commerce business and experience traffic spikes once in a while, a burstable instance is cost-effective.
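
The credit arithmetic behind this kind of break-even can be sketched in a few lines of Python. The `simulate_credits` function below is a simplified model written for this example, not the analysis described above. Its defaults use the published t3.medium figures (2 vCPUs, 24 credits earned per hour, a cap of 576 credits). It assumes standard (not unlimited) mode, idle hours at 0% CPU, and busy hours packed at the end of the day.

```python
def simulate_credits(hours_busy, vcpus=2, earn_per_hour=24, max_balance=576,
                     start_balance=0.0, busy_utilization=1.0):
    """Simulate one 24-hour day; return (end_balance, throttled_hours).

    One CPU credit = one vCPU at 100% for one minute.
    Simplifications: idle hours use no credits, and an hour without
    enough credits counts as fully throttled and empties the balance.
    """
    spend_per_busy_hour = busy_utilization * vcpus * 60
    balance = start_balance
    throttled = 0
    for hour in range(24):
        # Credits accrue every hour, up to the per-type cap.
        balance = min(max_balance, balance + earn_per_hour)
        if hour >= 24 - hours_busy:          # busy hours come last in the day
            if balance >= spend_per_busy_hour:
                balance -= spend_per_busy_hour
            else:
                throttled += 1               # not enough credits to burst fully
                balance = 0.0
    return balance, throttled


for busy in (2, 4, 6):
    print(busy, simulate_credits(busy))
```

With these defaults the instance earns 576 credits per day and a fully loaded hour costs 120, so roughly 4.8 hours of full load per day is the most the credits can cover in this model.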

Side note: vCPU capacity is limited

Our tests revealed that compute capacity tends to increase linearly during the first four hours. After that, the increase is limited and the amount of available compute goes down by nearly 90% by the end of the day.

6. Optimize storage choices for EC2 instance types

To maximize cloud cost savings, be careful about data storage:

Make sure that the EC2 instance types you choose have the storage throughput your application needs.
Avoid expensive products like premium SSD unless you plan to use them to the fullest.
Be careful about egress traffic. In a single-cloud scenario, you pay egress costs between various availability zones, which most often costs some $0.01/GB. But in a multi-cloud setup, you’ll be charged more – for example $0.02 for using direct fiber.

7. Use Spot Instances (even for production workloads)

Spot Instances are a great way to save up on your AWS bill. By bidding on instances AWS isn’t using, you can get up to a 90% discount on the On-Demand pricing.

The first step is qualifying your workload for Spot Instances. Is it spot-ready? Answer these questions to find out:

How much time does your workload need to finish the job?
Is it mission- and time-critical?
Can it tolerate interruptions gracefully?
Is it tightly coupled between nodes?
Do you have a strategy in place for moving your workload when AWS pulls the plug?

Once you determine that your workload is a good candidate for Spot Instances, here are a few helpful pointers:

Consider less popular Spot Instances as your chances of getting interrupted are lower.
Check an instance’s frequency of interruption (the rate at which AWS reclaimed capacity from this instance type during the trailing month). You can check it in the AWS Spot Instance Advisor.
Don’t be afraid of using Spot Instances for more important workloads. AWS offers special Spot Instances that guarantee uninterrupted operation for up to 6 hours. They’re a bit more expensive but you still achieve 30-50% cost savings.
When bidding your price on a Spot Instance, set the value equal to On-Demand pricing. Otherwise, you risk that your workload is interrupted when the price increases.
Set up groups called AWS Spot Fleets to boost your chances of snatching a Spot Instance. This is how you can request multiple instance types simultaneously. You can set the maximum price per hour for the entire fleet rather than for a specific spot pool (i.e. instances of the same type and with the same OS, availability zone, and network).

8. Automate it all

Luckily, you can use intelligent cloud optimization tools to get your hands on the best instances and avoid locking yourself into a long-term expensive commitment.

The CAST AI instance selection algorithm cherry-picks the most cost-effective EC2 instance types and sizes that meet your application’s requirements.

Free savings report for Kubernetes clusters

If you run Kubernetes on EKS, start by analyzing your clusters to identify potential savings.

It’s free of charge, you can do it here.
