DEV Community

Tejas Shinkar

AWS ELB & Auto Scaling — Building Self-Healing, Scalable Infrastructure

Elastic Load Balancing & Auto Scaling

Traffic Distribution · ASG · Scaling Policies · ALB vs NLB · Health Checks · Zero-Downtime Deployments


Part of my AWS learning journey — transitioning from Systems Engineer to Cloud/DevOps. This session covers the two services that make AWS infrastructure truly production-grade: ELB and Auto Scaling.


📋 Topics Covered

| # | Topic | Type |
|---|-------|------|
| 1 | Purpose of Load Balancing | Concept |
| 2 | How Load Balancing Works (Algorithms) | Concept + Interview |
| 3 | Horizontal vs Vertical Scaling | Concept + Interview |
| 4 | Scaling Policies: Dynamic, Scheduled, Predictive | Concept + Cert |
| 5 | Auto Scaling Group (ASG): Core Config | Concept + Lab |
| 6 | ASG Lifecycle & Key Metrics | Concept + Cert |
| 7 | Health Checks & Replacement Policies | Concept + Lab |
| 8 | Session Stickiness | Concept + Interview |
| 9 | Connection Draining (Deregistration Delay) | Concept + Interview |
| 10 | Instance Refresh: Zero-Downtime Deployments | Concept + DevOps |
| 11 | Load Balancer Types: ALB vs NLB vs GWLB | Concept + Cert |
| 12 | Target Groups & Health Check Config | Concept + Lab |
| 13 | ALB Deep Dive: Routing Rules | Concept + Cert |
| 14 | Architecture: ALB + ASG | Concept + DevOps |
| 15 | Monitoring & Troubleshooting | Concept + DevOps |
| 16 | Best Practices | DevOps |
| 17 | AWS CLI Commands | Practical |
| 18 | Lab: Create ELB + Auto Scaling Group (7 Steps) | Lab |
| 19 | Cleanup Checklist | Lab |
| 20 | Assignment | Practice |

Why Load Balancing Exists

Imagine a popular restaurant with one cashier — as soon as the lunch rush hits, the queue grows, the cashier gets overwhelmed, and eventually everything grinds to a halt. The fix? More cashiers, with a manager at the door directing customers to whoever is free.

A Load Balancer is that manager.

It sits in front of your EC2 instances and distributes incoming traffic across them — so no single server gets overwhelmed, and if one fails, the others keep running.

Without a Load Balancer:

  • Single point of failure — one server down means your app is down
  • No horizontal scaling — you're stuck with one server
  • No health checking — dead servers still receive traffic
  • Manual failover — someone has to intervene

With a Load Balancer:

  • High availability across multiple AZs
  • Horizontal scalability — add more servers on demand
  • Automatic health checks: unhealthy servers stop receiving traffic after a few consecutive failed checks
  • Automatic failover — zero manual intervention
  • SSL/TLS termination — offloads HTTPS decryption from backend servers

How Load Balancing Works

Request flow — what actually happens:

1. User's browser sends HTTP request to your domain
2. DNS resolves your domain → returns the IP addresses of the ALB's nodes
3. Request hits the Load Balancer
4. LB picks a healthy backend EC2 instance
5. Forwards request to that instance
6. Instance processes and responds
7. LB returns the response to the user

Load Distribution Algorithms:

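| Algorithm | How it works | AWS usage |
|-----------|--------------|-----------|
| Round Robin | Requests go in rotation: 1→2→3→1→2→3 | ALB default |
| Least Outstanding Requests | Route to the target with the fewest in-flight requests | Optional ALB algorithm; good for variable request lengths |
| IP Hash | Client IP always maps to the same backend | Source-IP stickiness (NLB); ALB stickiness uses cookies instead |
| Weighted | Target group A gets 70%, B gets 30% | Canary deployments (ALB weighted target groups) |
| Flow Hash | Hash of protocol, source/destination IP and port | NLB default for TCP/UDP |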

🎯 Cert Tip: ALB uses round robin by default; least outstanding requests is an alternative you can select per target group. NLB uses flow hash for TCP/UDP traffic.
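To make the difference between rotation and load-aware routing concrete, here is a minimal Python sketch of round robin and least outstanding requests. It is a toy model, not how the ALB is implemented; the instance IDs and in-flight counts are made up for illustration.

```python
from itertools import cycle


def round_robin(targets, n_requests):
    """Assign requests to targets in strict rotation."""
    rotation = cycle(targets)
    return [next(rotation) for _ in range(n_requests)]


def least_outstanding(targets, in_flight):
    """Pick the target with the fewest in-flight requests (ties: first listed)."""
    return min(targets, key=lambda t: in_flight[t])


targets = ["i-aaa", "i-bbb", "i-ccc"]  # hypothetical instance IDs
print(round_robin(targets, 5))
# i-aaa is stuck on slow requests, so load-aware routing avoids it
print(least_outstanding(targets, {"i-aaa": 7, "i-bbb": 2, "i-ccc": 4}))
```

Round robin ignores how busy each target is, which is why least outstanding requests tends to suit workloads where request durations vary widely.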


Horizontal vs Vertical Scaling

This is one of the most fundamental concepts in cloud architecture — and a guaranteed interview question.

Analogy: Your app is a delivery service. Vertical scaling = buying a bigger truck. Horizontal scaling = buying more trucks.

Vertical Scaling (Scale Up)

Upgrade the existing instance — stop it, change the type, restart.

t3.micro → t3.medium → t3.large → t3.xlarge
  • ✅ Simple — no distributed system complexity
  • ✅ Better for databases (data consistency is easier)
  • ❌ Has a hard ceiling — biggest instance type is the limit
  • ❌ Requires downtime during the type change
  • ❌ Cannot auto-scale

Horizontal Scaling (Scale Out)

Add more instances — distribute load across them.

1 × t3.micro → 2 × t3.micro → 4 × t3.micro → back to 2
  • ✅ Theoretically unlimited scalability
  • ✅ Fault tolerant — one fails, others keep running
  • ✅ Scales dynamically with Auto Scaling Groups
  • ✅ No downtime when scaling
  • ❌ Requires a Load Balancer to distribute traffic
  • ❌ Requires stateless app design (or sticky sessions for state)

🎯 Cloud-native best practice: always use horizontal scaling with ASG. Vertical scaling is for legacy systems or databases that can't easily distribute state.


Scaling Policies — Three Types

How does AWS know when to scale? Scaling Policies define the trigger.

1. Dynamic Scaling — Target Tracking (Most Common)

Monitor a metric and automatically adjust to keep it at a target value.

Metric: Average CPU Utilization
Target: 50%

CPU rises to 75% → ASG adds instances until CPU ≈ 50%
CPU drops to 20% → ASG removes instances until CPU ≈ 50%

AWS creates two CloudWatch alarms automatically — one for scale-out, one for scale-in. You don't configure alarms manually.

Best for: Most production workloads. Simplest to set up and maintain.
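The arithmetic behind target tracking can be sketched in a few lines of Python. This is a simplified proportional model that assumes load spreads evenly across instances; the real service adds alarms, warmup, and conservative scale-in, so treat the function name and formula as illustrative assumptions rather than the exact AWS algorithm.

```python
import math


def target_tracking_capacity(current, metric, target, min_size, max_size):
    """Capacity that would bring the average metric back to the target,
    assuming load spreads evenly across instances, clamped to Min/Max."""
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


# 2 instances at 75% CPU with a 50% target: scale out
print(target_tracking_capacity(2, 75.0, 50.0, 1, 4))
# 4 instances at 20% CPU with a 50% target: scale in
print(target_tracking_capacity(4, 20.0, 50.0, 1, 4))
```

The clamp at the end is the important part: no matter how high the metric goes, the result never leaves the Min/Max range.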

2. Scheduled Scaling

Scale at pre-defined times — for predictable, known traffic patterns.

Every weekday 8:30 AM  → set Desired = 6  (office hours ramp up)
Every weekday 7:00 PM  → set Desired = 2  (evening wind down)
Every Saturday 12:00 AM → set Desired = 8 (weekend traffic spike)

Best for: Business-hours traffic, batch jobs at fixed times, known seasonal spikes.

3. Predictive Scaling (ML-based)

AWS analyses historical metric data (it needs at least 24 hours and uses up to the last 14 days), forecasts future demand, and scales EC2 instances before traffic arrives rather than in reaction to it.

Historical pattern: Every Friday at 5PM, traffic doubles
Predictive: ASG pre-scales at 4:45PM — instances ready before surge hits

Best for: Recurring, predictable patterns where reactive scaling causes latency spikes.

🎯 Cert Tip: Predictive Scaling needs at least 24 hours of historical data before it can forecast, and it draws on up to 14 days of history; more history gives better forecasts.


Auto Scaling Group (ASG)

An ASG is the component that actually manages your EC2 fleet — launching, terminating, and replacing instances automatically.

Analogy: ASG is like a staffing agency for your servers. You tell it: "I need at least 2 people, ideally 4, maximum 10 — and if anyone quits or gets sick, hire a replacement immediately."

The Three Capacity Numbers (Core Concept)

MAX     [4] ──── Hard ceiling. ASG will NEVER exceed this, even under extreme load
DESIRED [2] ──── ASG always tries to maintain exactly this many healthy instances
MIN     [1] ──── Hard floor. ASG will NEVER go below this, even at zero traffic

ASG constantly compares running instances against Desired capacity. If an instance fails a health check, it terminates it and launches a replacement — no human intervention needed.
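That comparison loop can be pictured as a small reconciliation function. The Python sketch below is a hypothetical model (the function name and the healthy/unhealthy dictionary are assumptions for illustration), not the service's actual logic.

```python
def reconcile(instances, desired, min_size, max_size):
    """Return (to_terminate, launch_count) for one reconciliation pass.

    instances: dict of instance id -> True (healthy) / False (unhealthy)
    """
    # Desired is always kept inside the Min/Max boundaries
    target = max(min_size, min(max_size, desired))
    unhealthy = [i for i, ok in instances.items() if not ok]
    healthy = [i for i, ok in instances.items() if ok]
    surplus = healthy[target:]  # healthy instances beyond the target
    launch_count = max(0, target - len(healthy))
    return unhealthy + surplus, launch_count


# One of two instances failed its health check: replace it
print(reconcile({"i-aaa": True, "i-bbb": False}, desired=2, min_size=1, max_size=4))
```

Running the same pass repeatedly is what makes the group self-healing: any drift from the desired count produces a terminate or launch action on the next pass.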

ASG Configuration Components

| Setting | What it does |
|---------|--------------|
| Launch Template | Blueprint for every EC2 instance the ASG launches: AMI, instance type, SG, User Data |
| Min / Desired / Max | Capacity boundaries |
| Availability Zones | Which AZs to spread instances across (always 2+) |
| Load Balancer / Target Group | Where to register launched instances |
| Health Check Type | EC2 (VM-level) or ELB (app-level); always use ELB |
| Scaling Policies | Dynamic, scheduled, or predictive rules |
| Termination Policy | Which instances to remove during scale-in (default: keep AZs balanced, then remove instances using the oldest launch template or configuration) |

✅ Always use Launch Templates not Launch Configurations — templates support versioning, Instance Refresh, and newer EC2 features.


ASG Lifecycle — Instance States

Scale-out triggered
      ↓
  Pending ──── being launched by ASG
      ↓
  InService ── running, healthy, serving traffic ← normal operating state
      ↓
  Terminating ─ marked for removal; deregistered from the target group while existing connections drain
      ↓
  Terminated ─ deleted, no longer in ASG

  Unhealthy ── failed health check → immediately replaced with new instance

Health Checks & Replacement Policies

Two Types of Health Checks

EC2 Health Check (VM level)
Checks if the instance is running — system and instance status checks pass. Doesn't know if your application is actually responding.

ELB Health Check (App level)
The ALB sends a real HTTP request to your configured path (e.g., /health). If the app doesn't return HTTP 200, the instance is marked unhealthy.

Always use ELB health checks for web applications. An instance can be "running" while Apache has crashed inside it, and EC2 checks miss this entirely. ELB checks catch it in about 60 seconds with a 30-second interval and an unhealthy threshold of 2.

Health Check Configuration

| Setting | Default | What it means |
|---------|---------|---------------|
| Interval | 30 seconds | How often checks are sent |
| Timeout | 5 seconds | How long to wait for a response |
| Healthy Threshold | 5 consecutive successes (ALB default; the lab below sets 2) | Instance declared healthy |
| Unhealthy Threshold | 2 consecutive failures | Instance declared unhealthy |
| Grace Period | 300 seconds (console default) | Time after launch during which the ASG ignores failed health checks (allows app boot time) |
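The thresholds work on consecutive results, which is easy to get wrong when reasoning about flapping targets. The following Python sketch replays a sequence of check results through that rule and estimates detection time; it is an illustrative model with made-up function names, not the load balancer's implementation.

```python
def detection_seconds(interval, unhealthy_threshold):
    """Approximate time from failure to 'unhealthy': one failed check per interval."""
    return interval * unhealthy_threshold


def classify(results, healthy_threshold, unhealthy_threshold, healthy=True):
    """Replay check results (True = pass) and return the final health state."""
    streak_ok = streak_fail = 0
    for passed in results:
        if passed:
            streak_ok, streak_fail = streak_ok + 1, 0
            if streak_ok >= healthy_threshold:
                healthy = True
        else:
            streak_fail, streak_ok = streak_fail + 1, 0
            if streak_fail >= unhealthy_threshold:
                healthy = False
    return healthy


print(detection_seconds(30, 2))            # seconds until marked unhealthy
print(classify([True, False, False], 2, 2))  # two failures in a row
print(classify([False, True, False], 2, 2))  # failures never consecutive
```

A single failed check between successes resets the streak, so an occasional slow response does not take an instance out of rotation.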

Self-Healing Flow

Health check fails twice
      ↓
ALB marks instance "Unhealthy" → stops sending traffic to it
      ↓
ASG receives notification
      ↓
ASG terminates the unhealthy instance
      ↓
ASG launches a new replacement from Launch Template
      ↓
Replacement passes health checks → added to Target Group
      ↓
Total time: ~5-10 minutes, fully automated ✅

Session Stickiness (Sticky Sessions)

By default, ALB routes each request to any healthy instance via round robin — a user's second request might go to a different server than their first. For stateless apps that's fine. For stateful apps (shopping carts, auth sessions), the session exists only on one server.

Sticky Sessions fixes this: ALB sets a cookie in the browser that pins the user to one specific backend instance for the session duration.

User logs in → ALB routes to EC2-1 → sets cookie: "stick to EC2-1"
Next request → user sends cookie → ALB routes to EC2-1 again ✅
Session expires / cookie deleted → back to normal round robin

When to use: Shopping carts, user authentication, stateful apps with in-memory sessions.

Downsides:

  • Uneven load distribution
  • If the pinned instance fails, the session is lost anyway

🎯 Proper long-term fix: Externalize session state to ElastiCache (Redis) or DynamoDB so any instance can serve any user without needing stickiness.
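The cookie mechanism itself can be modelled in a few lines of Python. This sketch assumes the cookie value is simply the target's name; a real ALB cookie is opaque and encrypted, and the class and instance names here are hypothetical.

```python
from itertools import cycle


class StickyBalancer:
    """Toy balancer: round robin for new clients, cookie pinning for returning ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self._rotation = cycle(self.targets)

    def route(self, cookie=None):
        """Return (target, cookie). A valid cookie pins the client to its target."""
        if cookie in self.targets:
            return cookie, cookie
        # No cookie, or it points at a target that no longer exists
        target = next(self._rotation)
        return target, target


lb = StickyBalancer(["ec2-1", "ec2-2"])
first, cookie = lb.route()     # no cookie yet: normal rotation
again, _ = lb.route(cookie)    # cookie presented: same instance
print(first, again)
```

The fallback branch shows the weakness noted above: once the pinned instance is gone, the client is routed elsewhere and any in-memory session on the old instance is lost.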


Connection Draining (Deregistration Delay)

When an instance is removed (scale-in, replacement, deployment), what happens to users mid-request on that instance?

Without draining: requests are immediately dropped → users get errors.

Connection Draining:

Instance marked for removal
      ↓
ALB immediately stops sending NEW requests to this instance
      ↓
ALB waits for EXISTING in-flight requests to finish
      ↓
Default wait: up to 300 seconds
      ↓
After all connections finish (or timeout) → instance terminated
      ↓
Zero dropped requests ✅

This is what enables zero-downtime scaling and deployments.
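A small Python sketch shows why the delay should be at least as long as your slowest request. It is a simplified model: the function name is made up, and it assumes each in-flight request is described only by the seconds it still needs.

```python
def drain(in_flight_seconds, deregistration_delay=300):
    """Return (completed, cut_off, wait_seconds) once draining starts.

    in_flight_seconds: remaining time each in-flight request still needs.
    """
    completed = [s for s in in_flight_seconds if s <= deregistration_delay]
    cut_off = [s for s in in_flight_seconds if s > deregistration_delay]
    # Wait for the slowest request, but never longer than the delay
    wait = min(deregistration_delay, max(in_flight_seconds, default=0))
    return completed, cut_off, wait


# Two short requests finish; a 400-second upload exceeds the 300-second delay
print(drain([2, 45, 400]))
```

In this model only requests longer than the delay are interrupted, so long uploads or downloads are the usual reason to raise it.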


Instance Refresh — Zero-Downtime Deployments

When you update your application (new AMI, new config), you need to replace all running instances with the new version. Instance Refresh automates this within ASG.

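Deployment strategies:

| Strategy | Method | Speed | Risk |
|----------|--------|-------|------|
| Rolling | One instance at a time | Slowest | Lowest |
| Rolling in batches | A percentage replaced in parallel, optionally pausing at checkpoints (e.g., 25% → 50% → 75%) | Medium | Medium |
| Blue/Green | All new instances launched, traffic switched, old terminated | Fastest | Higher cost |

Instance Refresh itself performs the rolling variants: the minimum healthy percentage you set controls how many instances are replaced at once. Blue/green is a separate pattern, typically built with a second ASG or target group.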

Workflow:

Update Launch Template (new AMI version)
      ↓
Start Instance Refresh in ASG console
      ↓
ASG launches new instance from updated template
      ↓
Health checks pass on new instance
      ↓
Old instance: connection draining → terminated
      ↓
Repeat for all instances
      ↓
All instances running new AMI — zero downtime ✅

🎯 DevOps relevance: Instance Refresh is the ASG equivalent of a rolling update in Kubernetes. Same concept, different tool.
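How a minimum healthy percentage turns into replacement batches can be sketched as below. This is a simplified Python model with a hypothetical function name; the default of 90 is chosen for illustration, and the real service also applies warmup, checkpoints, and health checks between batches.

```python
import math


def refresh_batches(instance_ids, min_healthy_percent=90):
    """Group instances into replacement batches that keep at least
    min_healthy_percent of the group in service, except that at least
    one instance is always replaced per batch so the refresh can progress."""
    total = len(instance_ids)
    must_stay = math.ceil(total * min_healthy_percent / 100)
    batch_size = max(1, total - must_stay)
    return [instance_ids[i:i + batch_size] for i in range(0, total, batch_size)]


fleet = ["i-1", "i-2", "i-3", "i-4"]  # hypothetical instance IDs
print(refresh_batches(fleet, min_healthy_percent=50))  # two batches of two
print(refresh_batches(fleet, min_healthy_percent=90))  # one instance at a time
```

A lower percentage finishes the rollout faster but leaves less capacity serving traffic while each batch is replaced.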


Load Balancer Types — ALB vs NLB vs GWLB

| Feature | ALB | NLB | GWLB |
|---------|-----|-----|------|
| OSI Layer | Layer 7 (Application) | Layer 4 (Transport) | Layer 3 (Network) |
| Protocol | HTTP, HTTPS | TCP, UDP, TLS | All IP protocols |
| Routing basis | Path, host, header, query string | IP + Port (flow hash) | Transparent pass-through to appliances |
| Latency | Milliseconds (inspecting HTTP adds overhead) | Ultra-low | Very low |
| Static IP | ❌ No | ✅ Yes | ❌ No |
| SSL Termination | ✅ Yes | ✅ Yes | ❌ No |
| Use Case | Web apps, APIs, microservices | Gaming, IoT, VoIP, real-time | Firewalls, IDS/IPS appliances |

Decision guide:

Web app or REST API?                 → ALB (almost always)
Ultra-low latency TCP/UDP required?  → NLB
Traffic must pass through a firewall/security appliance? → GWLB

⚠️ Classic Load Balancer (CLB) exists but is legacy — never use for new deployments.

🎯 Cert scenarios: "Which LB supports path-based routing?" → ALB only. "Which provides a static IP?" → NLB. "Which for real-time gaming at massive scale?" → NLB.


ALB Deep Dive — Routing Rules

ALB operates at Layer 7 — it understands HTTP. This enables routing based on the actual content of each request.

Listener Rules (evaluated in priority order, lowest number first):

Rule 1: Host = api.example.com
        → Forward to: api-target-group (3x t3.medium)

Rule 2: Host = admin.example.com
        → Forward to: admin-target-group (restricted access)

Rule 3: Path = /images/*
        → Forward to: media-target-group

Rule 4: Path = /api/*
        → Forward to: api-target-group

HTTP listener (port 80), separate from the HTTPS listener: any path
        → Redirect: 301 to HTTPS

Default Rule: no other rule matched
        → Forward to: web-target-group (main homepage)

One ALB can route to multiple backend services — this is the foundation of microservices architecture on AWS.
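Rule evaluation is a first-match walk in priority order, which the Python sketch below imitates for the host and path rules listed above. The priority numbers are hypothetical, and `fnmatch` stands in for the ALB's own wildcard matching, so treat this as an approximation.

```python
from fnmatch import fnmatch

# (priority, host pattern or None, path pattern or None, target group)
RULES = [
    (10, "api.example.com", None, "api-target-group"),
    (20, "admin.example.com", None, "admin-target-group"),
    (30, None, "/images/*", "media-target-group"),
    (40, None, "/api/*", "api-target-group"),
]
DEFAULT_TARGET = "web-target-group"


def route(host, path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target group of the first matching rule, lowest priority first."""
    for _, host_pat, path_pat, target in sorted(rules, key=lambda r: r[0]):
        if host_pat is not None and not fnmatch(host, host_pat):
            continue
        if path_pat is not None and not fnmatch(path, path_pat):
            continue
        return target
    return default  # no rule matched: default action


print(route("www.example.com", "/images/logo.png"))
print(route("api.example.com", "/images/logo.png"))  # host rule wins on priority
print(route("www.example.com", "/"))
```

Because the first match wins, a broad rule with a low priority number can shadow more specific rules behind it, so order rules from most to least specific.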


Target Groups & Health Check Config

A Target Group is a named pool of EC2 instances (or IPs, or Lambdas) that the ALB routes traffic to. One ALB → multiple target groups via listener rules.

| Setting | Value |
|---------|-------|
| Target Type | Instances |
| Protocol | HTTP or HTTPS |
| Port | App's listening port (80, 8080, 3000, etc.) |
| VPC | Must match your EC2 instances |
| Health Check Path | /health, which must return a configured success code (default: 200) |

⚠️ When using ASG — never manually register instances to the Target Group. Let ASG handle registration/deregistration automatically as instances launch and terminate.


Architecture — ALB + ASG (Production Standard)

                     Internet Users
                           │
                       Route 53
                  (domain → ALB DNS)
                           │
               Application Load Balancer
             (Internet-facing, 2+ AZs, SG-ALB)
                           │
                     Target Group
               (health-checked instance pool)
                           │
            Auto Scaling Group (min=1, desired=2, max=4)
         ┌─────────────────┼──────────────────┐
         │                 │                  │
    EC2 (AZ-a)         EC2 (AZ-b)        EC2 (AZ-b)
    SG-EC2, :80        SG-EC2, :80       SG-EC2, :80
         │
     (shared state)
    RDS Database
    or ElastiCache

CloudWatch CPU metric → Scaling Policy → ASG scales in/out

Three architectural pillars achieved:

  • High Availability — multi-AZ ALB and instances
  • Fault Tolerance — ASG self-heals on any instance failure
  • Cost Efficiency — scale down at night, up during peaks

Monitoring & Troubleshooting

Key CloudWatch Metrics

ALB Metrics:

| Metric | Alert When |
|--------|------------|
| TargetResponseTime | > 500ms sustained |
| RequestCount | Sudden unexpected drop |
| HTTPCode_Target_5XX_Count | > 0 sustained |
| HealthyHostCount | Below Desired capacity |
| UnHealthyHostCount | Greater than 0 |

ASG Metrics:

| Metric | What it shows |
|--------|---------------|
| GroupDesiredCapacity | Target count |
| GroupInServiceInstances | Healthy, running count |
| GroupPendingInstances | Currently launching |
| GroupTerminatingInstances | Being removed |

Common Issues & Fixes

UnHealthyHostCount > 0

Check: SG-EC2 allows port 80 FROM SG-ALB (not from internet)
Check: Health check path actually returns 200 (try curl from EC2)
Check: App running → systemctl status httpd
Check: Grace period → did instance have 300s to boot before checks started?

ASG not scaling despite high CPU

Check: Scaling policy threshold — is CPU actually crossing it?
Check: Max capacity — already at the ceiling?
Check: Cooldown/warmup → is a previous scale action still in cooldown (simple scaling), or are new instances still warming up (target tracking)?
Check: CloudWatch alarm — is it actually in ALARM state?

Requests timing out after ALB

Check: EC2 instances responding? Try curl from inside the VPC
Check: Security group — SG-EC2 must allow HTTP FROM SG-ALB only
Check: App logs — any errors being thrown?

Session lost when instance replaced

Short-term: Enable sticky sessions on the Target Group
Long-term: Move session state to ElastiCache (Redis) or DynamoDB

Best Practices

ALB
  ✅ Always deploy across 2+ AZs
  ✅ Use path-based routing to consolidate microservices on one ALB
  ✅ Enable access logs for debugging and compliance
  ✅ Add HTTPS listener + redirect HTTP → HTTPS (use ACM for free cert)
  ✅ Use dedicated health check endpoint /health (not just /)

AUTO SCALING
  ✅ Use Target Tracking (simplest) over Step Scaling
  ✅ Set instance warmup (or cooldown, for simple scaling) to 300-600 seconds to avoid scaling oscillation
  ✅ Set Grace Period long enough for your app to boot (300s minimum)
  ✅ Use Launch Templates (not Launch Configurations)
  ✅ Test scaling with a load test before going to production
  ✅ Combine Scheduled + Target Tracking for predictable + reactive scaling

HEALTH CHECKS
  ✅ Always use ELB health checks (not EC2 only)
  ✅ Healthy threshold: 2, Unhealthy threshold: 2 (fast detection)
  ✅ Enable connection draining (default 300s)

SECURITY
  ✅ SG-EC2 inbound: HTTP from SG-ALB only (never 0.0.0.0/0)
  ✅ Users hit ALB only — EC2 instances never exposed to internet directly

AWS CLI Commands

# Create Application Load Balancer
aws elbv2 create-load-balancer \
  --name web-app-alb \
  --subnets subnet-aaa subnet-bbb \
  --security-groups sg-alb-xxx \
  --scheme internet-facing \
  --type application

# Create Target Group
aws elbv2 create-target-group \
  --name web-server-tg \
  --protocol HTTP --port 80 \
  --vpc-id vpc-xxx \
  --health-check-path /

# Create Auto Scaling Group (ELB health checks with a 300-second grace period)
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --launch-template LaunchTemplateName=web-server-lt,Version='$Latest' \
  --min-size 1 --max-size 4 --desired-capacity 2 \
  --health-check-type ELB --health-check-grace-period 300 \
  --availability-zones us-east-1a us-east-1b

# Attach Target Group to ASG
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name web-app-asg \
  --target-group-arns arn:aws:elasticloadbalancing:...

# Create Target Tracking Policy (CPU 50%)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration \
  '{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"}}'

# Describe ASG
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names web-app-asg

# Manually set desired capacity
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name web-app-asg \
  --desired-capacity 3

🧪 LAB — Create ELB + Auto Scaling Group

Security Groups (Create First)

SG-ALB:

Inbound:  HTTP (80) from 0.0.0.0/0
Outbound: All traffic

SG-EC2:

Inbound:  HTTP (80) from SG-ALB  ← reference the SG, not CIDR
Outbound: All traffic

⚠️ SG-EC2's inbound rule must reference SG-ALB as source — not 0.0.0.0/0. This blocks direct internet access to EC2. Only ALB can reach them.


Step 1 — Create Launch Template

EC2 → Launch Templates → Create launch template

Name:            web-server-lt
AMI:             Amazon Linux 2023 (64-bit x86)
Instance Type:   t2.micro / t3.micro (Free Tier)
Key Pair:        your-key-pair
Security Group:  SG-EC2
Storage:         8 GiB gp3

User Data (paste in Advanced Details):

#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)

cat > /var/www/html/index.html << EOF
<!DOCTYPE html>
<html>
<body style="font-family:Arial;text-align:center;padding:60px;background:#0A1628;color:#fff;">
  <h1 style="color:#FF9900;">Hello from AWS!</h1>
  <p>Instance ID: <b style="color:#FF9900;">$INSTANCE_ID</b></p>
  <p>Availability Zone: <b style="color:#FF9900;">$AZ</b></p>
  <p>Region: <b style="color:#FF9900;">$REGION</b></p>
</body>
</html>
EOF

✅ Checkpoint: web-server-lt listed in EC2 → Launch Templates


Step 2 — Create Target Group

EC2 → Load Balancing → Target Groups → Create target group

Target type:         Instances
Name:                web-server-tg
Protocol / Port:     HTTP / 80
VPC:                 Default VPC
Health check path:   /
Healthy threshold:   2
Unhealthy threshold: 2
Timeout:             5 seconds
Interval:            30 seconds

⚠️ On "Register targets" page — skip it entirely, click Create directly. ASG will register instances automatically.

✅ Checkpoint: web-server-tg shows status "unused" — this is correct, no instances yet.


Step 3 — Create Application Load Balancer

EC2 → Load Balancers → Create Load Balancer → Application Load Balancer

Name:           web-app-alb
Scheme:         Internet-facing
VPC:            Default VPC
Subnets:        Select 2 subnets in DIFFERENT AZs (mandatory — fails with one)
Security Group: SG-ALB (remove default SG)
Listener:       HTTP : 80 → Forward to web-server-tg

Wait ~2-3 minutes for state to become Active.

✅ Checkpoint: Copy ALB DNS name — you'll need it to test.


Step 4 — Create Auto Scaling Group

EC2 → Auto Scaling Groups → Create Auto Scaling group

Name:             web-app-asg
Launch template:  web-server-lt (Latest)
VPC + Subnets:    Same 2 subnets as ALB

Load balancing:   Attach to existing → web-server-tg | HTTP
Health check:     ELB, Grace period: 300 seconds

Desired: 2  |  Min: 1  |  Max: 4
Scaling: None for now (add in Step 5)

✅ Checkpoint: 2 EC2 instances launch within 1-2 min, both show Healthy in Target Group.


Step 5 — Configure Scaling Policy

ASG → web-app-asg → Automatic scaling → Create dynamic scaling policy

Policy type:   Target tracking scaling
Policy name:   cpu-target-tracking
Metric:        Average CPU Utilization
Target value:  50
Warmup:        300 seconds

AWS auto-creates two CloudWatch alarms (scale-out + scale-in).


Step 6 — Test & Verify

# Open ALB DNS in browser
http://<ALB-DNS-name>
# → "Hello from AWS!" + Instance ID + AZ

# Refresh 5-10 times
# → Instance ID changes = ALB routing to different instances ✅
# → AZ changes = multi-AZ distribution ✅

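# Trigger scale-out (SSH into one EC2)
# Note: SG-EC2 as defined above allows only HTTP from SG-ALB, so first add a
# temporary inbound SSH (22) rule from your IP, or connect with Session Manager
ssh -i your-key.pem ec2-user@<ec2-public-ip>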
sudo yum install -y stress
stress --cpu 4 --timeout 600

# Watch in console:
# EC2 → Auto Scaling Groups → Activity tab (new instances launching)
# CloudWatch → Alarms (CPU alarm goes to ALARM state)

Expected timeline after stress starts:

T+0:00   CPU spikes
T+~3:00  CloudWatch alarm: OK → ALARM (the scale-out alarm needs several consecutive high datapoints)
T+~4:00  New EC2 launches
T+~6:00  New instance healthy → ALB routes to 3 instances
T+10:00  Stress ends → CPU drops
T+~25:00 ASG scale-in begins (the scale-in alarm waits for about 15 minutes of low CPU); with idle instances it can continue down to Min=1

Step 7 — Monitor with CloudWatch

CloudWatch → Metrics → EC2 → By Auto Scaling Group
  → web-app-asg → CPUUtilization → Period: 1 min

CloudWatch → Metrics → ApplicationELB
  → Add: RequestCount, HealthyHostCount, TargetResponseTime

🧹 Cleanup — Delete in This Exact Order

1. Auto Scaling Group  → terminates all EC2 instances automatically
   Wait 2-3 min for all instances to fully terminate

2. Application Load Balancer

3. Target Group

4. Launch Template

5. Security Groups  → delete SG-EC2 first, then SG-ALB
   (SG-ALB referenced by SG-EC2 — must delete SG-EC2 first)

6. CloudWatch Alarms  → target tracking alarms are normally deleted along with the policy or ASG; confirm none remain

💰 ALB = ~$16/month + EC2 costs. Always clean up after labs.


📝 Assignment

Task 1 (Mandatory) — Prove Self-Healing:
Manually terminate one running EC2 instance. Watch ASG Activity tab — confirm it auto-launches a replacement. Verify ALB kept serving traffic with zero downtime.

Task 2 (Intermediate) — Step Scaling Policy:
Create a Step Scaling policy: add 2 instances if CPU > 70%, add 1 if CPU 50-70%, remove 1 if CPU < 20%.

Task 3 (Intermediate) — HTTPS:
Add HTTPS listener (port 443) to ALB. Use AWS Certificate Manager for a free certificate. Add listener rule: HTTP → HTTPS redirect.

Task 4 (Advanced) — Scheduled Scaling:
Scale up to 4 instances at 9AM weekdays, down to 1 at 6PM weekdays. Set times in UTC.

Task 5 (Advanced) — CloudWatch Dashboard:
Build a dashboard with 4 widgets: CPU%, RequestCount, HealthyHostCount, GroupInServiceInstances.


⚡ Quick Revision

LOAD BALANCING
  Manager directing traffic to healthy servers
  Eliminates single point of failure
  ALB algorithm: round robin (default)

SCALING
  Horizontal = more instances (cloud-native, preferred)
  Vertical   = bigger instance (simple, has ceiling, needs downtime)

SCALING POLICIES
  Target Tracking = keep metric at target, AWS manages alarms (use this)
  Scheduled       = scale at specific times
  Predictive      = ML-based, needs at least 24 hours of history (uses up to 14 days)

ASG CAPACITY
  MIN     = hard floor, never below
  DESIRED = ASG always works to maintain this count
  MAX     = hard ceiling, never exceed

HEALTH CHECKS
  EC2  = is instance running? (basic)
  ELB  = is app returning HTTP 200? (always use this)
  Grace Period = time after launch during which failed checks are ignored (300s console default)

LB TYPES
  ALB  = Layer 7, HTTP/HTTPS, path/host routing → web apps
  NLB  = Layer 4, TCP/UDP, microsecond latency → gaming/IoT
  GWLB = Layer 3, security appliances, firewalls

STICKY SESSIONS
  Cookie pins user to one EC2 for session duration
  Proper fix: external session store (Redis/ElastiCache)

CONNECTION DRAINING
  In-flight requests complete before instance terminates
  Default: 300 seconds | Enables zero-downtime scaling

INSTANCE REFRESH
  Rolling AMI replacement across entire ASG fleet
  Zero-downtime | Kubernetes rolling update equivalent

SECURITY GROUP PATTERN
  SG-ALB: HTTP 80 from 0.0.0.0/0
  SG-EC2: HTTP 80 FROM SG-ALB only
  Direct EC2 access blocked from internet

💼 Interview Questions

Q1: Horizontal vs Vertical scaling — what's the difference and which does AWS recommend?
Horizontal = add more instances, requires LB, unlimited scale, fault tolerant. Vertical = upgrade existing instance, simple, has a hard ceiling, requires downtime. AWS recommends horizontal scaling for cloud-native apps.

Q2: What are Min, Desired, and Max capacity in an ASG?
Min = hard floor, ASG never drops below. Max = hard ceiling, ASG never exceeds. Desired = the target count ASG constantly works to maintain. Scaling policies adjust Desired within Min-Max boundaries.

Q3: Why use ELB health checks instead of EC2 health checks?
EC2 checks only verify the VM is running. ELB checks send a real HTTP request and verify the app returns 200. An EC2 can be "running" while Apache is crashed — EC2 checks miss it, ELB checks catch it within 60 seconds.

Q4: What is Connection Draining?
It allows in-flight requests to complete before an instance is terminated. ALB immediately stops sending new requests but waits up to 300 seconds for active connections to finish — enabling zero-downtime scaling and deployments.

Q5: ALB vs NLB — when do you use each?
ALB for HTTP/HTTPS web apps and microservices — understands application content, supports path/host routing. NLB for ultra-low latency TCP/UDP workloads (gaming, IoT, VoIP) or when a static IP is required.

Q6: Why reference SG-ALB as source in SG-EC2 instead of 0.0.0.0/0?
Referencing SG-ALB as source means only traffic that came through the ALB can reach EC2. Direct internet access to the EC2 IP is blocked. More secure, no IP management needed as instances scale.

Q7: What is Instance Refresh?
ASG's native mechanism to replace all running instances with a new Launch Template version (new AMI or config). It handles draining, replacement, and health checks automatically. Equivalent to a Kubernetes rolling update.


AWS Session 6 — ELB & Auto Scaling | Cloud + DevOps learning journey — Systems Engineer → Cloud/DevOps Engineer
