Introduction to EC2 Auto Scaling


Picture source: AWS

This blog has been written for AWS UG Madurai - AWS Cloud Practitioner BOOT CAMP

Table of Contents

  1. Introduction
  2. What is EC2 Auto Scaling?
  3. EC2 Auto Scaling Lifecycle
  4. Types of Scaling
  5. Warm Pool
  6. Scaling Termination Policy
  7. Recommended Further Reading
  8. Referrals


AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. With AWS Auto Scaling, you can set up application scaling for multiple resources across multiple services in minutes.


  • Set up scaling quickly
  • Make smart scaling decisions
  • Automatically maintain performance
  • Pay only for what you need

Vertical vs Horizontal Scaling

  • Vertical Scaling: Upgrading the server hardware. For example, changing the instance type from t2.medium (2 vCPU, 4 GiB memory) to t2.2xlarge (8 vCPU, 32 GiB memory).

Use case: A data-processing job may need more vCPUs and memory to finish in a shorter time.


  • An instance restart is required, because the instance must be stopped before its type can be changed.
  • You can scale vertically only up to the largest instance type available, for example, m6a.48xlarge (192 vCPU, 768 GiB memory).
  • Horizontal Scaling: Adding more instances alongside the existing ones.


What is EC2 Auto Scaling?

Amazon EC2 Auto Scaling helps to maintain the number of instances in each Auto Scaling Group.

For example, suppose we define a minimum size of one instance, a desired capacity of two instances, and a maximum size of four instances.
The scaling policies will then keep the number of instances between the minimum and the maximum. Unless a scaling policy changes it, the group maintains the desired capacity, which is two instances in this scenario.
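
The bounds described above can be illustrated with a minimal sketch. It is a plain Python model, not the AWS API: the `GroupSize` class and the `clamp_desired` function are hypothetical names introduced here, and the sketch only assumes that a capacity requested by a scaling policy is kept between the group's minimum and maximum size.

```python
from dataclasses import dataclass


@dataclass
class GroupSize:
    """Hypothetical stand-in for the size settings of an Auto Scaling group."""
    min_size: int
    max_size: int
    desired_capacity: int


def clamp_desired(group: GroupSize, requested: int) -> int:
    """Keep a capacity requested by a scaling policy within min and max."""
    return max(group.min_size, min(group.max_size, requested))


# The example from the text: minimum 1, desired 2, maximum 4.
group = GroupSize(min_size=1, max_size=4, desired_capacity=2)

print(clamp_desired(group, 6))  # 4, capped at the maximum size
print(clamp_desired(group, 0))  # 1, raised to the minimum size
print(clamp_desired(group, group.desired_capacity))  # 2, already within bounds
```

With these settings, a policy that asks for six instances would still leave the group at four, and one that asks for zero would leave it at one.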


Picture source: AWS

EC2 Auto Scaling Lifecycle

The following illustration shows the lifecycle of an Auto Scaling instance.


Picture source: AWS

Types of Scaling

  • Manual Scaling: Change the size of Auto Scaling Group manually by updating the desired capacity.
  • Predictive Scaling: Increases the number of EC2 instances in your Auto Scaling group in advance of daily and weekly patterns in traffic flows.
  • Scheduled Scaling: Lets you set up your own scaling schedule for predictable load changes. For example, if traffic to the application increases every Wednesday and decreases every Friday, you can schedule scaling actions for those days.
  • Dynamic Scaling:

    • Simple Scaling: We choose scaling metrics and threshold values for the CloudWatch alarms that invoke the scaling process.
    • Step Scaling: We specify one or more step adjustments that automatically scale the number of instances dynamically based on the size of the alarm breach. Each step adjustment specifies the following:

      • A lower bound for the metric value
      • An upper bound for the metric value
      • The amount by which to scale, based on the scaling adjustment type

      For example:

      Step 1: Add 1 instance if CPU utilization is over 40%.

      Step 2: Add 3 instances if CPU utilization is over 60%. Scale in works the same way when CPU utilization drops.

    • Target Tracking: We select a scaling metric and set a target value. For example, we can keep the average aggregate CPU utilization of the Auto Scaling group at 50 percent.
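
The step scaling example above can be sketched in a few lines of Python. This is an illustration of the idea only, not how the service is implemented or configured: `StepAdjustment` and `instances_to_add` are hypothetical names, the bounds are treated as absolute CPU percentages, and "over 40%" is assumed to mean strictly greater than 40.

```python
from typing import NamedTuple, Optional, Sequence


class StepAdjustment(NamedTuple):
    """Hypothetical model of one step adjustment."""
    lower_bound: float            # metric must be strictly above this value
    upper_bound: Optional[float]  # metric must be at or below this; None = no limit
    adjustment: int               # number of instances to add


# Step 1: add 1 instance above 40% CPU. Step 2: add 3 instances above 60% CPU.
STEPS = [
    StepAdjustment(lower_bound=40.0, upper_bound=60.0, adjustment=1),
    StepAdjustment(lower_bound=60.0, upper_bound=None, adjustment=3),
]


def instances_to_add(cpu_percent: float,
                     steps: Sequence[StepAdjustment] = STEPS) -> int:
    """Return the adjustment of the step whose range contains the metric."""
    for step in steps:
        above_lower = cpu_percent > step.lower_bound
        within_upper = step.upper_bound is None or cpu_percent <= step.upper_bound
        if above_lower and within_upper:
            return step.adjustment
    return 0  # no step matched, so no scaling is needed


print(instances_to_add(30.0))  # 0
print(instances_to_add(50.0))  # 1
print(instances_to_add(75.0))  # 3
```

The larger the breach, the larger the adjustment, which is what distinguishes step scaling from simple scaling.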

The main issue with simple scaling is that once a scaling activity has started, the policy must wait for the scaling activity or health check replacement to complete, and for the cooldown period to expire, before it can respond to additional alarms. Cooldown periods help to prevent additional scaling activities from starting before the effects of previous activities are visible.
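
A small sketch can make the cooldown behaviour concrete. It is a simplified model under stated assumptions: `SimpleScalingPolicy` is a hypothetical class, time is passed in as plain seconds, and the cooldown is counted from the moment a scaling activity starts rather than from when it completes. The value of 300 seconds is only an example.

```python
from typing import Optional


class SimpleScalingPolicy:
    """Hypothetical model of a simple scaling policy with a cooldown."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.last_activity_at: Optional[float] = None

    def on_alarm(self, now: float) -> bool:
        """Return True if a scaling activity starts, False if the alarm is ignored."""
        in_cooldown = (
            self.last_activity_at is not None
            and now - self.last_activity_at < self.cooldown_seconds
        )
        if in_cooldown:
            return False  # still waiting for the previous activity to take effect
        self.last_activity_at = now
        return True


policy = SimpleScalingPolicy(cooldown_seconds=300)
print(policy.on_alarm(now=0))    # True, first alarm starts a scaling activity
print(policy.on_alarm(now=120))  # False, alarm arrives during the cooldown
print(policy.on_alarm(now=300))  # True, cooldown has expired
```

The second alarm is ignored even if the load is still rising, which is the limitation described above.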

Warm Pool

  • A warm pool is a pool of pre-initialized EC2 instances that sits alongside an Auto Scaling group. Whenever your application needs to scale out, the Auto Scaling group can draw on the warm pool to meet its new desired capacity. This helps you to ensure that instances are ready to quickly start serving application traffic, accelerating the response to a scale-out event. As instances leave the warm pool, they count toward the desired capacity of the group. This is known as a warm start.
  • By default, the size of the warm pool is calculated as the difference between the Auto Scaling group's maximum capacity and its desired capacity.
  • The instances in the warm pool can be kept in one of three states: Stopped, Running, or Hibernated. Keeping instances in the Stopped state is an effective way to minimize costs. With stopped instances, you pay only for the volumes that you use and the Elastic IP addresses attached to the instances, not for the instances themselves. You pay for instances only while they are running. Alternatively, you can keep instances in the Hibernated state to stop them without losing their memory contents (RAM). Hibernating an instance signals the operating system to save the contents of RAM to the Amazon EBS root volume. When the instance is started again, the root volume is restored to its previous state and the RAM contents are reloaded. While instances are hibernated, you pay only for the EBS volumes, including storage for the RAM contents, and the Elastic IP addresses attached to the instances.
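
As a quick worked example of the sizing rule above, the following sketch computes the warm pool size from the group's maximum and desired capacity. The function name `default_warm_pool_size` is hypothetical, and the sketch assumes the result never goes below zero.

```python
def default_warm_pool_size(max_capacity: int, desired_capacity: int) -> int:
    """Warm pool size as the gap between maximum and desired capacity."""
    return max(0, max_capacity - desired_capacity)


# Maximum capacity 4 and desired capacity 2 leave room for 2 warm instances.
print(default_warm_pool_size(max_capacity=4, desired_capacity=2))  # 2

# After a scale-out to a desired capacity of 3, one instance has left the
# warm pool and counts toward the desired capacity, so the pool shrinks to 1.
print(default_warm_pool_size(max_capacity=4, desired_capacity=3))  # 1
```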

Scaling Termination Policy

An Auto Scaling group (ASG) uses a default termination policy to decide which instances to terminate first during scale-in. You can change it or add other termination policies to the ASG.

Termination Policy

Recommended Further Reading

