Garrett Yan
Cutting AWS Auto Scaling Costs by 70% While Maintaining 99.99% Availability

Introduction

After successfully reducing our database costs by 40% (as covered in my previous post on Aurora Serverless v2 migration), our next target was the compute layer. Our EC2 costs were spiraling, with our Auto Scaling Groups (ASGs) running 24/7 at peak capacity "just to be safe."

This post details how we achieved a 70% cost reduction in our ASG infrastructure while actually improving our availability from 99.9% to 99.99%. The secret? A carefully orchestrated mix of On-Demand and Spot instances, combined with intelligent scaling strategies.

The Problem: Over-Provisioning for Peace of Mind

Our initial setup was typical of many AWS deployments:

  • 20 On-Demand instances running constantly
  • Scaling up to 50 instances during peak hours
  • Monthly cost: ~$7,200
  • Actual average utilization: 35%

We were essentially paying for insurance we rarely needed. Sound familiar?
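
If you suspect the same pattern in your environment, a quick CloudWatch query makes the over-provisioning visible before you change anything. Here's a minimal sketch, assuming the ASG name used throughout this post and a two-week lookback:

# utilization_check.py
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

def average_asg_cpu(asg_name, days=14):
    """Average CPU utilization across an ASG over the last `days` days."""
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': asg_name}],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=3600,  # hourly datapoints
        Statistics=['Average']
    )
    datapoints = response['Datapoints']
    return sum(d['Average'] for d in datapoints) / len(datapoints) if datapoints else None

print(average_asg_cpu('production-web-asg'))  # ours hovered around 35%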

The Solution: Mixed Instance Strategy + Intelligent Scaling

1. The Foundation: Fixed On-Demand + Flexible Spot Instances

The core strategy is simple but powerful:

  • Fixed base capacity: On-Demand instances for guaranteed availability
  • Variable capacity: Spot instances for cost-effective scaling
  • Intelligent distribution: 30% On-Demand, 70% Spot during normal operations

Here's our optimized ASG configuration:

# auto-scaling-group.yaml
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  OptimizedASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: production-web-asg
      MinSize: 6
      MaxSize: 50
      DesiredCapacity: 10
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300

      # Mixed Instances Policy - The Key to Cost Savings
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandAllocationStrategy: prioritized
          OnDemandBaseCapacity: 3  # Fixed On-Demand instances
          OnDemandPercentageAboveBaseCapacity: 20  # 20% of additional capacity
          SpotAllocationStrategy: capacity-optimized-prioritized
          # SpotInstancePools only applies to the lowest-price strategy;
          # here, diversification comes from the instance type Overrides below

        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref LaunchTemplate
            Version: !GetAtt LaunchTemplate.LatestVersionNumber

          Overrides:
            # Diversified instance types for better Spot availability
            - InstanceType: t3.medium
              WeightedCapacity: 1
            - InstanceType: t3a.medium
              WeightedCapacity: 1
            - InstanceType: t2.medium
              WeightedCapacity: 1
            - InstanceType: m5.large
              WeightedCapacity: 2
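
A quick way to sanity-check how OnDemandBaseCapacity and OnDemandPercentageAboveBaseCapacity translate into the actual On-Demand/Spot mix: the base is always On-Demand, and only capacity above the base is split by the percentage, so the blend drifts toward Spot as the group scales out. A rough sketch (exact rounding is handled by Auto Scaling):

# capacity_split_estimate.py
import math

def estimate_split(desired, on_demand_base=3, on_demand_pct_above_base=20):
    """Rough On-Demand vs. Spot estimate for a given desired capacity."""
    above_base = max(desired - on_demand_base, 0)
    on_demand = on_demand_base + math.ceil(above_base * on_demand_pct_above_base / 100)
    return on_demand, desired - on_demand

print(estimate_split(10))  # ~(5, 5): conservative at the normal desired capacity
print(estimate_split(30))  # ~(9, 21): close to the 30/70 target as the group scales out
print(estimate_split(50))  # ~(13, 37): roughly 25/75 at peak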

2. Implementing Predictive Scaling

Instead of reactive scaling, we implemented predictive scaling based on historical patterns:

# predictive_scaling_config.py
import boto3
from datetime import datetime, timedelta

autoscaling = boto3.client('autoscaling')

def configure_predictive_scaling(asg_name):
    """Configure predictive scaling policy for ASG"""

    response = autoscaling.put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName='predictive-scaling-policy',
        PolicyType='PredictiveScaling',
        PredictiveScalingConfiguration={
            'MetricSpecifications': [
                {
                    'TargetValue': 50.0,
                    'PredefinedMetricPairSpecification': {
                        'PredefinedMetricType': 'ASGCPUUtilization'  # metric *pair* type (not 'ASGAverageCPUUtilization')
                    }
                }
            ],
            'Mode': 'ForecastAndScale',
            'SchedulingBufferTime': 600,  # 10 minute buffer
            # Forecasts are built automatically from up to 14 days of metric history
            'MaxCapacityBreachBehavior': 'IncreaseMaxCapacity',
            'MaxCapacityBuffer': 10  # Allow 10% above max for unexpected spikes
        }
    )

    return response

# Enable predictive scaling
configure_predictive_scaling('production-web-asg')
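
Before trusting the policy to scale on its own, it's worth running it in 'ForecastOnly' mode for a few days and inspecting what it predicts. A sketch using the forecast API (the policy name matches the one created above; the 24-hour window is an assumption):

# inspect_forecast.py
import boto3
from datetime import datetime, timedelta

autoscaling = boto3.client('autoscaling')

def print_capacity_forecast(asg_name, policy_name='predictive-scaling-policy'):
    """Print the next 24 hours of predicted capacity for an ASG."""
    forecast = autoscaling.get_predictive_scaling_forecast(
        AutoScalingGroupName=asg_name,
        PolicyName=policy_name,
        StartTime=datetime.utcnow(),
        EndTime=datetime.utcnow() + timedelta(hours=24)
    )
    for ts, value in zip(forecast['CapacityForecast']['Timestamps'],
                         forecast['CapacityForecast']['Values']):
        print(f"{ts.isoformat()}  predicted capacity: {value:.0f}")

print_capacity_forecast('production-web-asg')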

3. Warm Pool Strategy for Instant Scaling

To maintain high availability while using Spot instances, we implemented a Warm Pool:

WarmPool:
  Type: AWS::AutoScaling::WarmPool
  Properties:
    AutoScalingGroupName: !Ref OptimizedASG
    MinSize: 5
    MaxGroupPreparedCapacity: 10
    PoolState: Stopped  # Save costs by keeping instances stopped
    InstanceReusePolicy:
      ReuseOnScaleIn: true  # Reuse instances to save on provisioning time

This reduced our scale-up time from 5 minutes to 30 seconds!
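
You can verify the pool is actually staying warm (and spot-check its hit rate during scale events) by polling its state. A small sketch, assuming the same ASG name:

# warm_pool_status.py
import boto3

autoscaling = boto3.client('autoscaling')

def warm_pool_summary(asg_name):
    """Summarize warm pool configuration and the lifecycle state of pooled instances."""
    response = autoscaling.describe_warm_pool(AutoScalingGroupName=asg_name)
    config = response['WarmPoolConfiguration']
    states = [instance['LifecycleState'] for instance in response['Instances']]
    return {
        'min_size': config.get('MinSize'),
        'pool_state': config.get('PoolState'),
        'pooled_instances': len(states),
        'states': states,
    }

print(warm_pool_summary('production-web-asg'))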

Advanced Cost Optimization Techniques

A. Scheduled Scaling for Predictable Patterns

# scheduled_scaling.py
import boto3

autoscaling = boto3.client('autoscaling')

def create_scheduled_actions(asg_name):
    """Create scheduled scaling actions for predictable traffic patterns"""

    # Scale up for business hours (Mon-Fri, 8 AM - 6 PM)
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='business-hours-scale-up',
        MinSize=10,
        DesiredCapacity=15,
        Recurrence='0 8 * * MON-FRI',  # 8 AM Mon-Fri
        TimeZone='America/New_York'
    )

    # Scale down for nights and weekends
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='off-hours-scale-down',
        MinSize=3,
        DesiredCapacity=5,
        Recurrence='0 18 * * MON-FRI',  # 6 PM Mon-Fri
        TimeZone='America/New_York'
    )

    # Weekend minimum capacity
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName=asg_name,
        ScheduledActionName='weekend-minimum',
        MinSize=2,
        DesiredCapacity=3,
        Recurrence='0 0 * * SAT',  # Saturday midnight
        TimeZone='America/New_York'
    )

B. Spot Instance Interruption Handling

Here's our battle-tested Spot interruption handler:

# spot_interruption_handler.py
import requests
import time
import logging
import boto3
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)
ec2 = boto3.client('ec2')
elbv2 = boto3.client('elbv2')

class SpotInterruptionHandler:
    def __init__(self, instance_id, target_group_arn):
        self.instance_id = instance_id
        self.target_group_arn = target_group_arn
        self.metadata_url = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def check_spot_interruption(self):
        """Check for Spot instance interruption notices"""
        try:
            # IMDSv1 shown for brevity; if IMDSv2 is enforced, fetch a session token first
            response = requests.get(self.metadata_url, timeout=1)
            if response.status_code == 200:
                interruption_data = response.json()
                logger.warning(f"Spot interruption notice: {interruption_data}")

                # We have 2 minutes to act
                self.handle_interruption(interruption_data['time'])
                return True

        except requests.exceptions.RequestException:
            pass  # Metadata service unreachable; treat as no notice

        # A 404 from the endpoint means no interruption notice (normal operation)
        return False

    def handle_interruption(self, interruption_time):
        """Gracefully handle Spot interruption"""
        with ThreadPoolExecutor(max_workers=3) as executor:
            # Parallel execution for speed
            executor.submit(self.drain_connections)
            executor.submit(self.save_state)
            executor.submit(self.notify_monitoring)

        # Deregister from target group
        self.deregister_from_alb()

    def drain_connections(self):
        """Stop accepting new connections and drain existing ones"""
        # Application-specific implementation
        logger.info("Draining connections...")
        # Set the health check to fail so the ALB stops routing new traffic here
        with open('/var/www/health-check', 'w') as f:
            f.write('draining')

        # Wait for in-flight connections to drain (stay under the 2-minute notice)
        time.sleep(90)

    def save_state(self):
        """Persist any in-memory state before termination (application-specific)"""
        logger.info("Saving state...")

    def notify_monitoring(self):
        """Emit an event so dashboards reflect the interruption (application-specific)"""
        logger.info("Notifying monitoring...")

    def deregister_from_alb(self):
        """Deregister instance from ALB target group"""
        try:
            elbv2.deregister_targets(
                TargetGroupArn=self.target_group_arn,
                Targets=[{'Id': self.instance_id}]
            )
            logger.info(f"Deregistered {self.instance_id} from target group")
        except Exception as e:
            logger.error(f"Failed to deregister: {e}")

# Run this as a daemon on each instance
if __name__ == "__main__":
    handler = SpotInterruptionHandler(
        instance_id=requests.get('http://169.254.169.254/latest/meta-data/instance-id').text,
        target_group_arn='arn:aws:elasticloadbalancing:region:account:targetgroup/name/id'
    )

    while True:
        if handler.check_spot_interruption():
            break
        time.sleep(5)

C. Multi-AZ Spot Diversification

Spread your risk across availability zones and instance types:

# spot-diversification-template.yaml
LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: diverse-spot-template
    LaunchTemplateData:
      ImageId: !Ref BaseAmiId  # AMI parameter (assumed); required since the Overrides only set instance types
      IamInstanceProfile:
        Arn: !GetAtt InstanceProfile.Arn
      SecurityGroupIds:
        - !Ref WebSecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # Install Spot interruption handler
          curl -o /usr/local/bin/spot-handler.py ${SpotHandlerUrl}
          chmod +x /usr/local/bin/spot-handler.py

          # Run as systemd service
          cat > /etc/systemd/system/spot-handler.service << EOF
          [Unit]
          Description=Spot Instance Interruption Handler
          After=network.target

          [Service]
          Type=simple
          ExecStart=/usr/bin/python3 /usr/local/bin/spot-handler.py
          Restart=always

          [Install]
          WantedBy=multi-user.target
          EOF

          systemctl enable spot-handler
          systemctl start spot-handler

          # Your application startup script here
          /opt/app/start.sh
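
The instance-type overrides above handle type diversification; AZ diversification comes from the subnets attached to the ASG itself. A hedged boto3 sketch (the subnet IDs are placeholders for subnets in three different Availability Zones):

# spread_across_azs.py
import boto3

autoscaling = boto3.client('autoscaling')

# Placeholder subnet IDs, one per Availability Zone
SUBNETS_BY_AZ = ['subnet-aaaa1111', 'subnet-bbbb2222', 'subnet-cccc3333']

def spread_asg_across_azs(asg_name, subnet_ids):
    """Attach the ASG to subnets in multiple AZs so Spot capacity
    (and interruptions) are spread across zones."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        VPCZoneIdentifier=','.join(subnet_ids)  # comma-separated subnet IDs
    )

spread_asg_across_azs('production-web-asg', SUBNETS_BY_AZ)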

Monitoring and Alerting

Comprehensive monitoring is crucial for maintaining high availability:

# monitoring_setup.py
import boto3

cloudwatch = boto3.client('cloudwatch')

def create_comprehensive_monitoring(asg_name, sns_topic_arn):
    """Set up comprehensive monitoring for mixed instance ASG"""

    alarms = []

    # 1. On-Demand instance health check
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-on-demand-minimum',
        ComparisonOperator='LessThanThreshold',
        EvaluationPeriods=2,
        MetricName='GroupInServiceInstances',  # requires ASG group metrics collection to be enabled
        Namespace='AWS/AutoScaling',
        Period=60,
        Statistic='Minimum',
        Threshold=3.0,  # Minimum On-Demand instances
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='On-Demand instances below minimum threshold',
        Dimensions=[
            {
                'Name': 'AutoScalingGroupName',
                'Value': asg_name
            }
        ]
    ))

    # 2. Spot interruption rate monitoring
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-high-spot-interruption-rate',
        ComparisonOperator='GreaterThanThreshold',
        EvaluationPeriods=1,
        MetricName='SpotInstanceInterruptionWarnings',
        # Custom metric: EC2 doesn't publish this natively; see the
        # EventBridge-to-CloudWatch sketch after this code block
        Namespace='Custom/SpotInterruptions',
        Period=300,
        Statistic='Sum',
        Threshold=5.0,
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='High Spot instance interruption rate detected'
    ))

    # 3. Cost anomaly detection
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-cost-anomaly',
        ComparisonOperator='GreaterThanThreshold',
        EvaluationPeriods=4,
        MetricName='EstimatedCharges',
        Namespace='AWS/Billing',  # requires billing alerts enabled; billing metrics only exist in us-east-1
        Period=3600,  # 1 hour
        Statistic='Maximum',
        Threshold=100.0,  # Adjust based on your expected hourly cost
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        AlarmDescription='Unexpected cost spike detected',
        Dimensions=[
            {
                'Name': 'Currency',
                'Value': 'USD'
            }
        ]
    ))

    # 4. Application-level availability
    alarms.append(cloudwatch.put_metric_alarm(
        AlarmName=f'{asg_name}-application-availability',
        ComparisonOperator='LessThanThreshold',
        EvaluationPeriods=3,
        MetricName='HealthyHostCount',
        Namespace='AWS/ApplicationELB',
        Period=60,
        Statistic='Average',
        # HealthyHostCount is an absolute count, not a percentage:
        # alarm when fewer than 3 healthy targets remain
        Threshold=3.0,
        ActionsEnabled=True,
        AlarmActions=[sns_topic_arn],
        TreatMissingData='breaching',
        # NOTE: also supply TargetGroup and LoadBalancer dimensions for your ALB,
        # otherwise this alarm never receives data
        AlarmDescription='Application availability below threshold'
    ))

    return alarms

# Create all monitoring alarms
create_comprehensive_monitoring(
    'production-web-asg',
    'arn:aws:sns:us-east-1:123456789012:asg-alerts'
)
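
One caveat on the second alarm: EC2 doesn't publish a Spot interruption-warning metric natively, so it assumes a custom metric. One way to produce it is a small Lambda function subscribed to the "EC2 Spot Instance Interruption Warning" EventBridge event; a sketch (the namespace and metric name must match what the alarm expects):

# spot_warning_metric_lambda.py
# Invoked by an EventBridge rule matching:
#   source = "aws.ec2", detail-type = "EC2 Spot Instance Interruption Warning"
import boto3

cloudwatch = boto3.client('cloudwatch')

def handler(event, context):
    """Publish one datapoint per interruption warning so CloudWatch can alarm on the rate."""
    cloudwatch.put_metric_data(
        Namespace='Custom/SpotInterruptions',  # must match the alarm's namespace
        MetricData=[{
            'MetricName': 'SpotInstanceInterruptionWarnings',
            'Value': 1.0,
            'Unit': 'Count'
        }]
    )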

Results: 70% Cost Reduction, Better Availability

After implementing these strategies, here are our results:

💰 Cost Breakdown:

  • Monthly cost: $7,200 → $2,160 (70% reduction)
  • Cost per million requests: $24 → $7.20 (70% reduction)
  • On-Demand instances: 20-50 → 3-10 (85% reduction)
  • Spot instance usage: 0% → 70% (new savings source)

📈 Availability Improvements:

  • Availability: 99.9% → 99.99% (10x less downtime)
  • Monthly downtime: 43 minutes → 4.3 minutes (90% reduction)
  • Scale-up time: 5 minutes → 30 seconds (10x faster)
  • Recovery time: 10 minutes → under 2 minutes (5x faster)

🚀 Performance Metrics:

  • Request latency: No change (same instance types)
  • Spot interruption impact: < 0.01% of requests affected
  • Warm Pool efficiency: 95% hit rate during scale events
  • Predictive scaling accuracy: 92% (reduced reactive scaling by 80%)

Key Lessons Learned

1. Start Conservative

Begin with 50% Spot instances and gradually increase as you gain confidence. We started at 50/50 and moved to 30/70 after 3 months.
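
Shifting the ratio later doesn't require rebuilding the group; the distribution can be updated in place. A sketch of how the split might be nudged over time (this assumes the ASG already has a Mixed Instances Policy attached, and that fields omitted from the update keep their existing values):

# adjust_spot_ratio.py
import boto3

autoscaling = boto3.client('autoscaling')

def set_on_demand_percentage(asg_name, percentage_above_base):
    """Update only the On-Demand/Spot distribution of an existing mixed-instances ASG."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MixedInstancesPolicy={
            'InstancesDistribution': {
                'OnDemandPercentageAboveBaseCapacity': percentage_above_base
            }
        }
    )

# Months 1-3: conservative 50/50 above the base capacity
set_on_demand_percentage('production-web-asg', 50)
# Later, once Spot interruption handling has proven itself:
# set_on_demand_percentage('production-web-asg', 20)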

2. Diversification is Critical

  • Use at least 4 different instance types
  • Spread across multiple AZs
  • Consider cross-region failover for critical apps

3. Warm Pools are Game-Changers

The slight additional cost (~$50/month) pays for itself in:

  • Improved user experience during scaling
  • Reduced Spot interruption impact
  • Better handling of traffic spikes

4. Monitor Everything

Set up alerts for:

  • Minimum On-Demand capacity
  • Spot interruption rates
  • Cost anomalies
  • Application-level metrics

5. Test, Test, Test

Regular chaos engineering exercises:

# Simulate Spot interruptions in staging
aws ec2 terminate-instances --instance-ids $(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=staging" \
  "Name=instance-lifecycle,Values=spot" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text | shuf -n 3)

Implementation Checklist

Ready to implement this in your environment? Here's your checklist:

  • [ ] Analyze current ASG utilization patterns (use CloudWatch metrics)
  • [ ] Calculate minimum On-Demand capacity needed (peak traffic / instance capacity; see the sizing sketch after this list)
  • [ ] Set up Mixed Instances Policy with 4+ instance types
  • [ ] Implement Warm Pool (start with 20% of peak capacity)
  • [ ] Configure predictive scaling based on 2 weeks of data
  • [ ] Set up scheduled scaling for known patterns
  • [ ] Deploy Spot interruption handlers on all instances
  • [ ] Create comprehensive CloudWatch alarms
  • [ ] Test Spot interruption handling in staging
  • [ ] Document runbooks for various failure scenarios
  • [ ] Set up cost allocation tags for tracking savings
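
For the capacity calculation item, here's the kind of back-of-the-envelope sizing we mean (the traffic numbers are illustrative, not ours):

# base_capacity_estimate.py
import math

def minimum_on_demand(peak_rps, rps_per_instance, critical_fraction=0.3):
    """Estimate the fixed On-Demand base: enough capacity to keep the
    'must not fail' fraction of peak traffic served even if every Spot
    instance disappears at once."""
    peak_instances = math.ceil(peak_rps / rps_per_instance)
    return max(1, math.ceil(peak_instances * critical_fraction))

# Illustrative: 2,000 req/s at peak, ~250 req/s per instance
print(minimum_on_demand(2000, 250))  # -> 3, matching OnDemandBaseCapacity above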

Conclusion

Optimizing Auto Scaling Groups doesn't require sacrificing availability for cost savings. By intelligently combining On-Demand and Spot instances with modern scaling strategies, we achieved dramatic cost reductions while actually improving our service reliability.

The key is starting with a solid foundation of On-Demand instances and gradually optimizing with Spot instances, predictive scaling, and warm pools. With proper monitoring and automation, you can maintain enterprise-grade availability at startup-friendly costs.

Next Steps

  1. Start with the Mixed Instances Policy - it's the quickest win
  2. Add Warm Pools once you're comfortable with Spot instances
  3. Implement predictive scaling after gathering 2 weeks of metrics
  4. Continuously monitor and optimize based on your specific patterns

What's your experience with ASG cost optimization? Have you tried mixing instance types? I'd love to hear about your strategies in the comments!


This is part 2 of my AWS Cost Optimization series. Check out Part 1: Zero-Downtime RDS to Aurora Serverless v2 Migration

Next in this series: "Zero-Downtime Blue-Green Deployments with 90% Less Infrastructure Cost"


Found this helpful? Follow me for more AWS cost optimization tips and real-world DevOps experiences!
