Garrett Yan
Cutting AWS Auto Scaling Costs by 70% While Maintaining 99.99% Availability

Introduction

After successfully reducing our database costs by 40% (as covered in my previous post on Aurora Serverless v2 migration), our next target was the compute layer. Our EC2 costs were spiraling, with our Auto Scaling Groups (ASGs) running 24/7 at peak capacity "just to be safe."

This post details how we achieved a 70% cost reduction in our ASG infrastructure while actually improving our availability from 99.9% to 99.99%. The secret? A carefully orchestrated mix of On-Demand and Spot instances, combined with intelligent scaling strategies.

The Problem: Over-Provisioning for Peace of Mind

Our initial setup was typical of many AWS deployments:

  • 20 On-Demand instances running constantly
  • Scaling up to 50 instances during peak hours
  • Monthly cost: ~$7,200
  • Actual average utilization: 35%

We were essentially paying for insurance we rarely needed. Sound familiar?
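
If you suspect the same pattern in your environment, a quick CloudWatch query makes the over-provisioning visible before you change anything. Here's a minimal sketch, assuming the ASG name used throughout this post and a two-week lookback:

# utilization_check.py
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

def average_asg_cpu(asg_name, days=14):
    """Average CPU utilization across an ASG over the last `days` days."""
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': asg_name}],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=3600,  # hourly datapoints
        Statistics=['Average']
    )
    datapoints = response['Datapoints']
    return sum(d['Average'] for d in datapoints) / len(datapoints) if datapoints else None

print(average_asg_cpu('production-web-asg'))  # ours hovered around 35%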

The Solution: Mixed Instance Strategy + Intelligent Scaling

1. The Foundation: Fixed On-Demand + Flexible Spot Instances

The core strategy is simple but powerful:

  • Fixed base capacity: On-Demand instances for guaranteed availability
  • Variable capacity: Spot instances for cost-effective scaling
  • Intelligent distribution: 30% On-Demand, 70% Spot during normal operations

Here's our optimized ASG configuration:

# auto-scaling-group.yaml
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  OptimizedASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: production-web-asg
      MinSize: 6
      MaxSize: 50
      DesiredCapacity: 10
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300

      # Mixed Instances Policy - The Key to Cost Savings
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandAllocationStrategy: prioritized
          OnDemandBaseCapacity: 3  # Fixed On-Demand instances
          OnDemandPercentageAboveBaseCapacity: 20  # 20% of additional capacity
          SpotAllocationStrategy: capacity-optimized-prioritized
          # SpotInstancePools only applies to the lowest-price strategy;
          # here, diversification comes from the instance type Overrides below

        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref LaunchTemplate
            Version: !GetAtt LaunchTemplate.LatestVersionNumber

          Overrides:
            # Diversified instance types for better Spot availability
            - InstanceType: t3.medium
              WeightedCapacity: 1
            - InstanceType: t3a.medium
              WeightedCapacity: 1
            - InstanceType: t2.medium
              WeightedCapacity: 1
            - InstanceType: m5.large
              WeightedCapacity: 2
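
A quick way to sanity-check how OnDemandBaseCapacity and OnDemandPercentageAboveBaseCapacity translate into the actual On-Demand/Spot mix: the base is always On-Demand, and only capacity above the base is split by the percentage, so the blend drifts toward Spot as the group scales out. A rough sketch (exact rounding is handled by Auto Scaling):

# capacity_split_estimate.py
import math

def estimate_split(desired, on_demand_base=3, on_demand_pct_above_base=20):
    """Rough On-Demand vs. Spot estimate for a given desired capacity."""
    above_base = max(desired - on_demand_base, 0)
    on_demand = on_demand_base + math.ceil(above_base * on_demand_pct_above_base / 100)
    return on_demand, desired - on_demand

print(estimate_split(10))  # ~(5, 5): conservative at the normal desired capacity
print(estimate_split(30))  # ~(9, 21): close to the 30/70 target as the group scales out
print(estimate_split(50))  # ~(13, 37): roughly 25/75 at peak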

2. Implementing Predictive Scaling

Instead of reactive scaling, we implemented predictive scaling based on historical patterns:

# predictive_scaling_config.py
import boto3
from datetime import datetime, timedelta

autoscaling = boto3.client('autoscaling')

def configure_predictive_scaling(asg_name):
    """Configure predictive scaling policy for ASG"""

    response = autoscaling.put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName='predictive-scaling-policy',
        PolicyType='PredictiveScaling',
        PredictiveScalingConfiguration={
            'MetricSpecifications': [
                {
                    'TargetValue': 50.0,
                    'PredefinedMetricPairSpecification': {
                        'PredefinedMetricType': 'ASGCPUUtilization'  # metric *pair* type (not 'ASGAverageCPUUtilization')
                    }
                }
            ],
            'Mode': 'ForecastAndScale',
            'SchedulingBufferTime': 600,  # 10 minute buffer
            # Forecasts are built automatically from up to 14 days of metric history
            'MaxCapacityBreachBehavior': 'IncreaseMaxCapacity',
            'MaxCapacityBuffer': 10  # Allow 10% above max for unexpected spikes
        }
    )

    return response

# Enable predictive scaling
configure_predictive_scaling('production-web-asg')
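
Before trusting the policy to scale on its own, it's worth running it in 'ForecastOnly' mode for a few days and inspecting what it predicts. A sketch using the forecast API (the policy name matches the one created above; the 24-hour window is an assumption):

# inspect_forecast.py
import boto3
from datetime import datetime, timedelta

autoscaling = boto3.client('autoscaling')

def print_capacity_forecast(asg_name, policy_name='predictive-scaling-policy'):
    """Print the next 24 hours of predicted capacity for an ASG."""
    forecast = autoscaling.get_predictive_scaling_forecast(
        AutoScalingGroupName=asg_name,
        PolicyName=policy_name,
        StartTime=datetime.utcnow(),
        EndTime=datetime.utcnow() + timedelta(hours=24)
    )
    for ts, value in zip(forecast['CapacityForecast']['Timestamps'],
                         forecast['CapacityForecast']['Values']):
        print(f"{ts.isoformat()}  predicted capacity: {value:.0f}")

print_capacity_forecast('production-web-asg')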

3. Warm Pool Strategy for Instant Scaling

To maintain high availability while using Spot instances, we implemented a Warm Pool:

WarmPool:
  Type: AWS::AutoScaling::WarmPool
  Properties:
    AutoScalingGroupName: !Ref OptimizedASG
    MinSize: 5
    MaxGroupPreparedCapacity: 10
    PoolState: Stopped  # Save costs by keeping instances stopped
    InstanceReusePolicy:
      ReuseOnScaleIn: true  # Reuse instances to save on provisioning time

This reduced our scale-up time from 5 minutes to 30 seconds!
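
You can verify the pool is actually staying warm (and spot-check its hit rate during scale events) by polling its state. A small sketch, assuming the same ASG name:

# warm_pool_status.py
import boto3

autoscaling = boto3.client('autoscaling')

def warm_pool_summary(asg_name):
    """Summarize warm pool configuration and the lifecycle state of pooled instances."""
    response = autoscaling.describe_warm_pool(AutoScalingGroupName=asg_name)
    config = response['WarmPoolConfiguration']
    states = [instance['LifecycleState'] for instance in response['Instances']]
    return {
        'min_size': config.get('MinSize'),
        'pool_state': config.get('PoolState'),
        'pooled_instances': len(states),
        'states': states,
    }

print(warm_pool_summary('production-web-asg'))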

Advanced Cost Optimization Techniques

A. Scheduled Scaling for Predictable Patterns

# scheduled_scaling.py
import boto3

autoscaling = boto3.client('autoscaling')

def create_scheduled_actions(asg_name):
    """Create scheduled scaling actions for predictable traffic patterns"""

    # Scale up for business hours (Mon-Fri, 8 AM - 6 PM)
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='business-hours-scale-up',
        MinSize=10,
        DesiredCapacity=15,
        Recurrence='0 8 * * MON-FRI',  # 8 AM Mon-Fri
        TimeZone='America/New_York'
    )

    # Scale down for nights and weekends
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='off-hours-scale-down',
        MinSize=3,
        DesiredCapacity=5,
        Recurrence='0 18 * * MON-FRI',  # 6 PM Mon-Fri
        TimeZone='America/New_York'
    )

    # Weekend minimum capacity
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='weekend-minimum',
        MinSize=2,
        DesiredCapacity=3,
        Recurrence='0 0 * * SAT',  # Saturday midnight
        TimeZone='America/New_York'
    )

B. Spot Instance Interruption Handling

Here's our battle-tested Spot interruption handler:

# spot_interruption_handler.py
import requests
import time
import logging
import boto3
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)
ec2 = boto3.client('ec2')
elbv2 = boto3.client('elbv2')

class SpotInterruptionHandler:
    def __init__(self, instance_id, target_group_arn):
        self.instance_id = instance_id
        self.target_group_arn = target_group_arn
        self.metadata_url = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def check_spot_interruption(self):
        """Check for Spot instance interruption notices"""
        try:
            # IMDSv1 shown for brevity; if IMDSv2 is enforced, fetch a session token first
            response = requests.get(self.metadata_url, timeout=1)
            if response.status_code == 200:
                interruption_data = response.json()
                logger.warning(f"Spot interruption notice: {interruption_data}")

                # We have 2 minutes to act
                self.handle_interruption(interruption_data['time'])
                return True

        except requests.exceptions.RequestException:
            pass  # Metadata service unreachable; treat as no notice

        # A 404 from the endpoint means no interruption notice (normal operation)
        return False

    def handle_interruption(self, interruption_time):
        """Gracefully handle Spot interruption"""
        with ThreadPoolExecutor(max_workers=3) as executor:
            # Parallel execution for speed
            executor.submit(self.drain_connections)
            executor.submit(self.save_state)
            executor.submit(self.notify_monitoring)

        # Deregister from target group
        self.deregister_from_alb()

    def drain_connections(self):
        """Stop accepting new connections and drain existing ones"""
        # Application-specific implementation
        logger.info("Draining connections...")
        # Set the health check to fail so the ALB stops routing new traffic here
        with open('/var/www/health-check', 'w') as f:
            f.write('draining')

        # Wait for in-flight connections to drain (stay under the 2-minute notice)
        time.sleep(90)

    def save_state(self):
        """Persist any in-memory state before termination (application-specific)"""
        logger.info("Saving state...")

    def notify_monitoring(self):
        """Emit an event so dashboards reflect the interruption (application-specific)"""
        logger.info("Notifying monitoring...")

    def deregister_from_alb(self):
        """Deregister instance from ALB target group"""
        try:
            elbv2.deregister_targets(
                TargetGroupArn=self.target_group_arn,
                Targets=[{'Id': self.instance_id}]
            )
            logger.info(f"Deregistered {self.instance_id} from target group")
        except Exception as e:
            logger.error(f"Failed to deregister: {e}")

# Run this as a daemon on each instance
if __name__ == "__main__":
    handler = SpotInterruptionHandler(
        instance_id=requests.get('http://169.254.169.254/latest/meta-data/instance-id').text,
        target_group_arn='arn:aws:elasticloadbalancing:region:account:targetgroup/name/id'
    )

    while True:
        if handler.check_spot_interruption():
            break
        time.sleep(5)

C. Multi-AZ Spot Diversification

Spread your risk across availability zones and instance types:

# spot-diversification-template.yaml
LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: diverse-spot-template
    LaunchTemplateData:
      ImageId: !Ref BaseAmiId  # AMI parameter (assumed); required since the Overrides only set instance types
      IamInstanceProfile:
        Arn: !GetAtt InstanceProfile.Arn
      SecurityGroupIds:
        - !Ref WebSecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # Install Spot interruption handler
          curl -o /usr/local/bin/spot-handler.py ${SpotHandlerUrl}
          chmod +x /usr/local/bin/spot-handler.py

          # Run as systemd service
          cat > /etc/systemd/system/spot-handler.service << EOF
          [Unit]
          Description=Spot Instance Interruption Handler
          After=network.target

          [Service]
          Type=simple
          ExecStart=/usr/bin/python3 /usr/local/bin/spot-handler.py
          Restart=always

          [Install]
          WantedBy=multi-user.target
          EOF

          systemctl enable spot-handler
          systemctl start spot-handler

          # Your application startup script here
          /opt/app/start.sh
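
The instance-type overrides above handle type diversification; AZ diversification comes from the subnets attached to the ASG itself. A hedged boto3 sketch (the subnet IDs are placeholders for subnets in three different Availability Zones):

# spread_across_azs.py
import boto3

autoscaling = boto3.client('autoscaling')

# Placeholder subnet IDs, one per Availability Zone
SUBNETS_BY_AZ = ['subnet-aaaa1111', 'subnet-bbbb2222', 'subnet-cccc3333']

def spread_asg_across_azs(asg_name, subnet_ids):
    """Attach the ASG to subnets in multiple AZs so Spot capacity
    (and interruptions) are spread across zones."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        VPCZoneIdentifier=','.join(subnet_ids)  # comma-separated subnet IDs
    )

spread_asg_across_azs('production-web-asg', SUBNETS_BY_AZ)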

Monitoring and Alerting

Comprehensive monitoring is crucial for maintaining high availability:

# monitoring_setup.py
import boto3

cloudwatch = boto3.client('cloudwatch')

def create_comprehensive_monitoring(asg_name, sns_topic_arn):
    """Set up comprehensive monitoring for mixed instance ASG"""

    alarms = []

    # 1. On-Demand instance health check
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-on-demand-minimum',
        ComparisonOperator='LessThanThreshold',
        EvaluationPeriods=2,
        MetricName='GroupInServiceInstances',  # requires ASG group metrics collection to be enabled
        Namespace='AWS/AutoScaling',
        Period=60,
        Statistic='Minimum',
        Threshold=3.0,  # Minimum On-Demand instances
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='On-Demand instances below minimum threshold',
        Dimensions=[
            {
                'Name': 'AutoScalingGroupName',
                'Value': asg_name
            }
        ]
    ))

    # 2. Spot interruption rate monitoring
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-high-spot-interruption-rate',
        ComparisonOperator='GreaterThanThreshold',
        EvaluationPeriods=1,
        MetricName='SpotInstanceInterruptionWarnings',
        # Custom metric: EC2 doesn't publish this natively; see the
        # EventBridge-to-CloudWatch sketch after this code block
        Namespace='Custom/SpotInterruptions',
        Period=300,
        Statistic='Sum',
        Threshold=5.0,
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='High Spot instance interruption rate detected'
    ))

    # 3. Cost anomaly detection
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-cost-anomaly',
        ComparisonOperator='GreaterThanThreshold',
        EvaluationPeriods=4,
        MetricName='EstimatedCharges',
        Namespace='AWS/Billing',  # requires billing alerts enabled; billing metrics only exist in us-east-1
        Period=3600,  # 1 hour
        Statistic='Maximum',
        Threshold=100.0,  # Adjust based on your expected hourly cost
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='Unexpected cost spike detected',
        Dimensions=[
            {
                'Name': 'Currency',
                'Value': 'USD'
            }
        ]
    ))

    # 4. Application-level availability
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-application-availability',
        ComparisonOperator='LessThanThreshold',
        EvaluationPeriods=3,
        MetricName='HealthyHostCount',
        Namespace='AWS/ApplicationELB',
        Period=60,
        Statistic='Average',
        # HealthyHostCount is an absolute count, not a percentage:
        # alarm when fewer than 3 healthy targets remain
        Threshold=3.0,
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        TreatMissingData='breaching',
        # NOTE: also supply TargetGroup and LoadBalancer dimensions for your ALB,
        # otherwise this alarm never receives data
        AlarmDescription='Application availability below threshold'
    ))

    return alarms

# Create all monitoring alarms
create_comprehensive_monitoring(
    'production-web-asg',
    'arn:aws:sns:us-east-1:123456789012:asg-alerts'
)
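
One caveat on the second alarm: EC2 doesn't publish a Spot interruption-warning metric natively, so it assumes a custom metric. One way to produce it is a small Lambda function subscribed to the "EC2 Spot Instance Interruption Warning" EventBridge event; a sketch (the namespace and metric name must match what the alarm expects):

# spot_warning_metric_lambda.py
# Invoked by an EventBridge rule matching:
#   source = "aws.ec2", detail-type = "EC2 Spot Instance Interruption Warning"
import boto3

cloudwatch = boto3.client('cloudwatch')

def handler(event, context):
    """Publish one datapoint per interruption warning so CloudWatch can alarm on the rate."""
    cloudwatch.put_metric_data(
        Namespace='Custom/SpotInterruptions',  # must match the alarm's namespace
        MetricData=[{
            'MetricName': 'SpotInstanceInterruptionWarnings',
            'Value': 1.0,
            'Unit': 'Count'
        }]
    )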

Results: 70% Cost Reduction, Better Availability

After implementing these strategies, here are our results:

💰 Cost Breakdown:

  • Monthly cost: $7,200 → $2,160 (70% reduction)
  • Cost per million requests: $24 → $7.20 (70% reduction)
  • On-Demand instances: 20-50 → 3-10 (85% reduction)
  • Spot instance usage: 0% → 70% (new savings source)

📈 Availability Improvements:

  • Availability: 99.9% → 99.99% (10x less downtime)
  • Monthly downtime: 43 minutes → 4.3 minutes (90% reduction)
  • Scale-up time: 5 minutes → 30 seconds (10x faster)
  • Recovery time: 10 minutes → under 2 minutes (5x faster)

🚀 Performance Metrics:

  • Request latency: No change (same instance types)
  • Spot interruption impact: < 0.01% of requests affected
  • Warm Pool efficiency: 95% hit rate during scale events
  • Predictive scaling accuracy: 92% (reduced reactive scaling by 80%)

Key Lessons Learned

1. Start Conservative

Begin with 50% Spot instances and gradually increase as you gain confidence. We started at 50/50 and moved to 30/70 after 3 months.
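
Shifting the ratio later doesn't require rebuilding the group; the distribution can be updated in place. A sketch of how the split might be nudged over time (this assumes the ASG already has a Mixed Instances Policy attached, and that fields omitted from the update keep their existing values):

# adjust_spot_ratio.py
import boto3

autoscaling = boto3.client('autoscaling')

def set_on_demand_percentage(asg_name, percentage_above_base):
    """Update only the On-Demand/Spot distribution of an existing mixed-instances ASG."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MixedInstancesPolicy={
            'InstancesDistribution': {
                'OnDemandPercentageAboveBaseCapacity': percentage_above_base
            }
        }
    )

# Months 1-3: conservative 50/50 above the base capacity
set_on_demand_percentage('production-web-asg', 50)
# Later, once Spot interruption handling has proven itself:
# set_on_demand_percentage('production-web-asg', 20)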

2. Diversification is Critical

  • Use at least 4 different instance types
  • Spread across multiple AZs
  • Consider cross-region failover for critical apps

3. Warm Pools are Game-Changers

The slight additional cost (~$50/month) pays for itself in:

  • Improved user experience during scaling
  • Reduced Spot interruption impact
  • Better handling of traffic spikes

4. Monitor Everything

Set up alerts for:

  • Minimum On-Demand capacity
  • Spot interruption rates
  • Cost anomalies
  • Application-level metrics

5. Test, Test, Test

Regular chaos engineering exercises:

# Simulate Spot interruptions in staging
aws ec2 terminate-instances --instance-ids $(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=staging" \
  "Name=instance-lifecycle,Values=spot" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text | shuf -n 3)

Implementation Checklist

Ready to implement this in your environment? Here's your checklist:

  • [ ] Analyze current ASG utilization patterns (use CloudWatch metrics)
  • [ ] Calculate minimum On-Demand capacity needed (peak traffic / instance capacity; see the sizing sketch after this list)
  • [ ] Set up Mixed Instances Policy with 4+ instance types
  • [ ] Implement Warm Pool (start with 20% of peak capacity)
  • [ ] Configure predictive scaling based on 2 weeks of data
  • [ ] Set up scheduled scaling for known patterns
  • [ ] Deploy Spot interruption handlers on all instances
  • [ ] Create comprehensive CloudWatch alarms
  • [ ] Test Spot interruption handling in staging
  • [ ] Document runbooks for various failure scenarios
  • [ ] Set up cost allocation tags for tracking savings
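
For the capacity calculation item, here's the kind of back-of-the-envelope sizing we mean (the traffic numbers are illustrative, not ours):

# base_capacity_estimate.py
import math

def minimum_on_demand(peak_rps, rps_per_instance, critical_fraction=0.3):
    """Estimate the fixed On-Demand base: enough capacity to keep the
    'must not fail' fraction of peak traffic served even if every Spot
    instance disappears at once."""
    peak_instances = math.ceil(peak_rps / rps_per_instance)
    return max(1, math.ceil(peak_instances * critical_fraction))

# Illustrative: 2,000 req/s at peak, ~250 req/s per instance
print(minimum_on_demand(2000, 250))  # -> 3, matching OnDemandBaseCapacity above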

Conclusion

Optimizing Auto Scaling Groups doesn't require sacrificing availability for cost savings. By intelligently combining On-Demand and Spot instances with modern scaling strategies, we achieved dramatic cost reductions while actually improving our service reliability.

The key is starting with a solid foundation of On-Demand instances and gradually optimizing with Spot instances, predictive scaling, and warm pools. With proper monitoring and automation, you can maintain enterprise-grade availability at startup-friendly costs.

Next Steps

  1. Start with the Mixed Instances Policy - it's the quickest win
  2. Add Warm Pools once you're comfortable with Spot instances
  3. Implement predictive scaling after gathering 2 weeks of metrics
  4. Continuously monitor and optimize based on your specific patterns

What's your experience with ASG cost optimization? Have you tried mixing instance types? I'd love to hear about your strategies in the comments!


This is part 2 of my AWS Cost Optimization series. Check out Part 1: Zero-Downtime RDS to Aurora Serverless v2 Migration

Next in this series: "Zero-Downtime Blue-Green Deployments with 90% Less Infrastructure Cost"


Found this helpful? Follow me for more AWS cost optimization tips and real-world DevOps experiences!
