Elastic Load Balancing & Auto Scaling
Traffic Distribution · ASG · Scaling Policies · ALB vs NLB · Health Checks · Zero-Downtime Deployments
Part of my AWS learning journey — transitioning from Systems Engineer to Cloud/DevOps. This session covers the two services that make AWS infrastructure truly production-grade: ELB and Auto Scaling.
📋 Topics Covered
| # | Topic | Type |
|---|---|---|
| 1 | Purpose of Load Balancing | Concept |
| 2 | How Load Balancing Works (Algorithms) | Concept + Interview |
| 3 | Horizontal vs Vertical Scaling | Concept + Interview |
| 4 | Scaling Policies — Dynamic, Scheduled, Predictive | Concept + Cert |
| 5 | Auto Scaling Group (ASG) — Core Config | Concept + Lab |
| 6 | ASG Lifecycle & Key Metrics | Concept + Cert |
| 7 | Health Checks & Replacement Policies | Concept + Lab |
| 8 | Session Stickiness | Concept + Interview |
| 9 | Connection Draining (Deregistration Delay) | Concept + Interview |
| 10 | Instance Refresh — Zero-Downtime Deployments | Concept + DevOps |
| 11 | Load Balancer Types — ALB vs NLB vs GWLB | Concept + Cert |
| 12 | Target Groups & Health Check Config | Concept + Lab |
| 13 | ALB Deep Dive — Routing Rules | Concept + Cert |
| 14 | Architecture — ALB + ASG | Concept + DevOps |
| 15 | Monitoring & Troubleshooting | Concept + DevOps |
| 16 | Best Practices | DevOps |
| 17 | AWS CLI Commands | Practical |
| 18 | Lab — Create ELB + Auto Scaling Group (7 Steps) | Lab |
| 19 | Cleanup Checklist | Lab |
| 20 | Assignment | Practice |
Why Load Balancing Exists
Imagine a popular restaurant with one cashier — as soon as the lunch rush hits, the queue grows, the cashier gets overwhelmed, and eventually everything grinds to a halt. The fix? More cashiers, with a manager at the door directing customers to whoever is free.
A Load Balancer is that manager.
It sits in front of your EC2 instances and distributes incoming traffic across them — so no single server gets overwhelmed, and if one fails, the others keep running.
Without a Load Balancer:
- Single point of failure — one server down means your app is down
- No horizontal scaling — you're stuck with one server
- No health checking — dead servers still receive traffic
- Manual failover — someone has to intervene
With a Load Balancer:
- High availability across multiple AZs
- Horizontal scalability — add more servers on demand
- Automatic health checks: failing servers are taken out of rotation after a few consecutive failed checks
- Automatic failover — zero manual intervention
- SSL/TLS termination — offloads HTTPS decryption from backend servers
How Load Balancing Works
Request flow — what actually happens:
1. User's browser sends HTTP request to your domain
2. DNS resolves your domain → returns one of the ALB's IP addresses
3. Request hits the Load Balancer
4. LB picks a healthy backend EC2 instance
5. Forwards request to that instance
6. Instance processes and responds
7. LB returns the response to the user
Load Distribution Algorithms:
| Algorithm | How it works | AWS Usage |
|---|---|---|
| Round Robin | Requests go in rotation: 1→2→3→1→2→3 | ALB default |
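| Least Connections | Route to instance with fewest active connections | ALB option (called "least outstanding requests"); good for variable request lengths |
| IP Hash | Client IP always maps to same backend | Session affinity without cookies (ALB stickiness is cookie-based instead) |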
| Weighted | Instance A gets 70%, B gets 30% | Canary deployments |
| Flow Hash | Hash of protocol+IP+port | NLB default for TCP/UDP |
🎯 Cert Tip: ALB uses round robin by default (least outstanding requests is the main alternative). NLB uses flow hash for TCP/UDP traffic.
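To make the first two rows of the table concrete, here is a minimal Python sketch of round robin and least-connections selection. The instance names and request counts are made up for illustration, and a real load balancer tracks this state internally; this is not how AWS implements it.

```python
from itertools import cycle

def round_robin(targets):
    """Return an endless iterator over targets: 1 -> 2 -> 3 -> 1 ..."""
    return cycle(targets)

def least_outstanding(active_requests):
    """Pick the target with the fewest in-flight requests.

    active_requests maps target name -> current in-flight request count.
    Ties go to whichever target appears first in the mapping.
    """
    return min(active_requests, key=active_requests.get)

# Hypothetical instance names, used only for this example.
rotation = round_robin(["ec2-1", "ec2-2", "ec2-3"])
first_four = [next(rotation) for _ in range(4)]

busy = {"ec2-1": 12, "ec2-2": 3, "ec2-3": 7}
choice = least_outstanding(busy)
```

Round robin ignores how busy each target is, which is why the second strategy tends to behave better when some requests take much longer than others.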
Horizontal vs Vertical Scaling
This is one of the most fundamental concepts in cloud architecture — and a guaranteed interview question.
Analogy: Your app is a delivery service. Vertical scaling = buying a bigger truck. Horizontal scaling = buying more trucks.
Vertical Scaling (Scale Up)
Upgrade the existing instance — stop it, change the type, restart.
t3.micro → t3.medium → t3.large → t3.xlarge
- ✅ Simple — no distributed system complexity
- ✅ Better for databases (data consistency is easier)
- ❌ Has a hard ceiling — biggest instance type is the limit
- ❌ Requires downtime during the type change
- ❌ Cannot auto-scale
Horizontal Scaling (Scale Out)
Add more instances — distribute load across them.
1 × t3.micro → 2 × t3.micro → 4 × t3.micro → back to 2
- ✅ Theoretically unlimited scalability
- ✅ Fault tolerant — one fails, others keep running
- ✅ Scales dynamically with Auto Scaling Groups
- ✅ No downtime when scaling
- ❌ Requires a Load Balancer to distribute traffic
- ❌ Requires stateless app design (or sticky sessions for state)
🎯 Cloud-native best practice: always use horizontal scaling with ASG. Vertical scaling is for legacy systems or databases that can't easily distribute state.
Scaling Policies — Three Types
How does AWS know when to scale? Scaling Policies define the trigger.
1. Dynamic Scaling — Target Tracking (Most Common)
Monitor a metric and automatically adjust to keep it at a target value.
Metric: Average CPU Utilization
Target: 50%
CPU rises to 75% → ASG adds instances until CPU ≈ 50%
CPU drops to 20% → ASG removes instances until CPU ≈ 50%
AWS creates two CloudWatch alarms automatically — one for scale-out, one for scale-in. You don't configure alarms manually.
Best for: Most production workloads. Simplest to set up and maintain.
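The arithmetic behind target tracking can be approximated with a proportional rule. This is a sketch under the assumption that load spreads evenly across instances; AWS does not publish its exact algorithm, and the function name and the Min=1 / Max=4 bounds are chosen here for illustration.

```python
import math

def target_tracking_desired(current, metric, target, min_size, max_size):
    """Approximate the capacity needed to bring `metric` back to `target`.

    Assumes per-instance load scales with 1/capacity. Rounds up so the
    group errs on the side of having enough capacity.
    """
    needed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, needed))

# 2 instances at 75% CPU with a 50% target need about 3 instances.
scale_out = target_tracking_desired(current=2, metric=75, target=50,
                                    min_size=1, max_size=4)
# 4 instances at 20% CPU with a 50% target need about 2 instances.
scale_in = target_tracking_desired(current=4, metric=20, target=50,
                                   min_size=1, max_size=4)
```

In practice scale-in is deliberately slower and more conservative than scale-out, so treat the second result as the eventual steady state rather than an immediate action.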
2. Scheduled Scaling
Scale at pre-defined times — for predictable, known traffic patterns.
Every weekday 8:30 AM → set Desired = 6 (office hours ramp up)
Every weekday 7:00 PM → set Desired = 2 (evening wind down)
Every Saturday 12:00 AM → set Desired = 8 (weekend traffic spike)
Best for: Business-hours traffic, batch jobs at fixed times, known seasonal spikes.
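The three example actions above can be modelled as a small lookup: at any moment, the desired capacity is whatever the most recent scheduled action set. The sketch below is illustrative only; real scheduled actions are defined with cron expressions (for example `30 8 * * 1-5` for the weekday morning action) and are evaluated by AWS, not by code like this.

```python
from datetime import datetime, time, timedelta

WEEKDAYS = {0, 1, 2, 3, 4}   # Monday=0 ... Friday=4 (datetime.weekday())
SATURDAY = {5}

# (days the action fires, time of day, desired capacity)
ACTIONS = [
    (WEEKDAYS, time(8, 30), 6),
    (WEEKDAYS, time(19, 0), 2),
    (SATURDAY, time(0, 0), 8),
]

def desired_at(now):
    """Return the desired capacity set by the most recent scheduled action."""
    latest = None
    for days_back in range(8):          # one full week covers every action
        day = (now - timedelta(days=days_back)).date()
        for days, at, desired in ACTIONS:
            if day.weekday() in days:
                fired = datetime.combine(day, at)
                if fired <= now and (latest is None or fired > latest[0]):
                    latest = (fired, desired)
    return latest[1]
```

One thing this makes visible: with only these three actions, the Saturday value of 8 stays in effect all weekend until Monday 8:30 AM, so a real schedule would probably want a fourth action to wind the weekend capacity down.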
3. Predictive Scaling (ML-based)
AWS analyses historical metric data (up to the previous 14 days, with at least 24 hours needed before the first forecast), forecasts future demand, and scales EC2 instances before traffic arrives rather than in reaction to it.
Historical pattern: Every Friday at 5PM, traffic doubles
Predictive: ASG pre-scales at 4:45PM — instances ready before surge hits
Best for: Recurring, predictable patterns where reactive scaling causes latency spikes.
🎯 Cert Tip: Predictive Scaling needs at least 24 hours of historical data before it can forecast, and it uses up to 14 days of history. Two weeks of data gives the most reliable forecast.
Auto Scaling Group (ASG)
An ASG is the component that actually manages your EC2 fleet — launching, terminating, and replacing instances automatically.
Analogy: ASG is like a staffing agency for your servers. You tell it: "I need at least 2 people, ideally 4, maximum 10 — and if anyone quits or gets sick, hire a replacement immediately."
The Three Capacity Numbers (Core Concept)
MAX [4] ──── Hard ceiling. ASG will NEVER exceed this, even under extreme load
DESIRED [2] ──── ASG always tries to maintain exactly this many healthy instances
MIN [1] ──── Hard floor. ASG will NEVER go below this, even at zero traffic
ASG constantly compares running instances against Desired capacity. If an instance fails a health check, it terminates it and launches a replacement — no human intervention needed.
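That comparison loop can be sketched as a single function: clamp Desired into the Min/Max range, then work out how many instances to launch or terminate. The function name is hypothetical, and in the real service a Desired value outside Min/Max is rejected or capped rather than silently clamped like this.

```python
def reconcile(healthy, desired, min_size, max_size):
    """Return how many instances to launch (+) or terminate (-)."""
    # Desired is always kept within the Min/Max boundaries.
    effective = max(min_size, min(max_size, desired))
    return effective - healthy

# One of two instances just failed its health check: launch 1 replacement.
replace_failed = reconcile(healthy=1, desired=2, min_size=1, max_size=4)
# A policy asks for 6, but Max caps the group at 4: launch only 2 more.
capped = reconcile(healthy=2, desired=6, min_size=1, max_size=4)
```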
ASG Configuration Components
| Setting | What it does |
|---|---|
| Launch Template | Blueprint for every EC2 ASG launches — AMI, instance type, SG, User Data |
| Min / Desired / Max | Capacity boundaries |
| Availability Zones | Which AZs to spread instances across (always 2+) |
| Load Balancer / Target Group | Where to register launched instances |
| Health Check Type | EC2 (VM-level) or ELB (app-level) — always use ELB |
| Scaling Policies | Dynamic, scheduled, or predictive rules |
| Termination Policy | Which instances to remove during scale-in (default: keep AZs balanced first, then remove instances using the oldest launch template or configuration) |
✅ Always use Launch Templates not Launch Configurations — templates support versioning, Instance Refresh, and newer EC2 features.
ASG Lifecycle — Instance States
Scale-out triggered
↓
Pending ──── being launched by ASG
↓
InService ── running, healthy, serving traffic ← normal operating state
↓
Terminating ─ marked for removal, target draining, existing connections completing
↓
Terminated ─ deleted, no longer in ASG
Unhealthy ── failed health check → immediately replaced with new instance
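The lifecycle can be read as a small state machine. The sketch below is a simplified model using the state names the Auto Scaling API reports (`Pending`, `InService`, `Terminating`, `Terminated`); the removal state is where connection draining happens. It leaves out the other states the service has, such as standby and lifecycle-hook wait states.

```python
from enum import Enum

class State(Enum):
    PENDING = "Pending"
    IN_SERVICE = "InService"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"

# Allowed moves in this simplified model.
TRANSITIONS = {
    State.PENDING: {State.IN_SERVICE, State.TERMINATING},
    State.IN_SERVICE: {State.TERMINATING},
    State.TERMINATING: {State.TERMINATED},
    State.TERMINATED: set(),
}

def advance(current, target):
    """Move to `target`, or raise ValueError if the lifecycle forbids it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

A terminated instance never returns to service; the group launches a fresh one that starts again at `Pending`.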
Health Checks & Replacement Policies
Two Types of Health Checks
EC2 Health Check (VM level)
Checks if the instance is running — system and instance status checks pass. Doesn't know if your application is actually responding.
ELB Health Check (App level)
The ALB sends a real HTTP request to your configured path (e.g., /health). If the app doesn't return HTTP 200, the instance is marked unhealthy.
✅ Always use ELB health checks for web applications. An instance can be "running" while Apache is crashed inside — EC2 checks miss this entirely. ELB checks catch it within 60 seconds.
Health Check Configuration
| Setting | Default | What it means |
|---|---|---|
| Interval | 30 seconds | How often checks are sent |
| Timeout | 5 seconds | How long to wait for response |
| Healthy Threshold | 5 consecutive successes (the lab below sets 2) | Instance declared healthy |
| Unhealthy Threshold | 2 consecutive failures | Instance declared unhealthy |
| Grace Period | 300 seconds (console default) | Time after launch during which ASG ignores failed health checks (allows app boot time) |
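The thresholds work as consecutive-result counters: a single failed check does not flip a target to unhealthy. Here is a minimal sketch of that logic, with class and function names invented for this example and the 2/2 thresholds taken from the lab settings later in this post.

```python
class HealthTracker:
    """Flip a target's state only after N consecutive contradicting results."""

    def __init__(self, healthy_threshold, unhealthy_threshold, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self.streak = 0  # consecutive results that contradict current state

    def record(self, check_passed):
        if check_passed == self.healthy:
            self.streak = 0              # result agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.healthy_threshold if check_passed
                  else self.unhealthy_threshold)
        if self.streak >= needed:
            self.healthy = check_passed
            self.streak = 0
        return self.healthy

def seconds_to_detect(interval, unhealthy_threshold):
    """Rough detection time: one full interval per failed check."""
    return interval * unhealthy_threshold

tracker = HealthTracker(healthy_threshold=2, unhealthy_threshold=2)
states = [tracker.record(r) for r in [True, False, True, False, False]]
detect = seconds_to_detect(interval=30, unhealthy_threshold=2)
```

The isolated failure in the second check is forgiven; only the two failures in a row at the end mark the target unhealthy. With a 30-second interval and a threshold of 2, that is where the roughly 60-second detection figure comes from.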
Self-Healing Flow
Health check fails twice
↓
ALB marks instance "Unhealthy" → stops sending traffic to it
↓
ASG receives notification
↓
ASG terminates the unhealthy instance
↓
ASG launches a new replacement from Launch Template
↓
Replacement passes health checks → added to Target Group
↓
Total time: ~5-10 minutes, fully automated ✅
Session Stickiness (Sticky Sessions)
By default, ALB routes each request to any healthy instance via round robin — a user's second request might go to a different server than their first. For stateless apps that's fine. For stateful apps (shopping carts, auth sessions), the session exists only on one server.
Sticky Sessions fixes this: ALB sets a cookie in the browser that pins the user to one specific backend instance for the session duration.
User logs in → ALB routes to EC2-1 → sets cookie: "stick to EC2-1"
Next request → user sends cookie → ALB routes to EC2-1 again ✅
Session expires / cookie deleted → back to normal round robin
When to use: Shopping carts, user authentication, stateful apps with in-memory sessions.
Downsides:
- Uneven load distribution
- If the pinned instance fails, the session is lost anyway
🎯 Proper long-term fix: Externalize session state to ElastiCache (Redis) or DynamoDB so any instance can serve any user without needing stickiness.
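The cookie flow, including the failure case listed under Downsides, can be sketched as follows. This is a toy model: the class name is hypothetical, and a real ALB stickiness cookie is an opaque encrypted value, not the target's name.

```python
from itertools import cycle

class StickyRouter:
    """Route by cookie when present and valid, otherwise round robin."""

    def __init__(self, targets):
        self.healthy = set(targets)
        self._rotation = cycle(targets)

    def _next_healthy(self):
        # Assumes at least one healthy target remains.
        while True:
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate

    def route(self, cookie=None):
        """Return (target, cookie_to_set)."""
        if cookie in self.healthy:
            return cookie, cookie          # pinned target is still healthy
        target = self._next_healthy()      # new session, or pinned target gone
        return target, target

router = StickyRouter(["ec2-1", "ec2-2"])
first, cookie = router.route()             # no cookie yet
second, _ = router.route(cookie)           # same target again
router.healthy.discard("ec2-1")            # pinned instance fails
third, new_cookie = router.route(cookie)   # re-pinned elsewhere
```

After the failure the user lands on a different instance, and any session data held only in the first instance's memory is gone, which is the argument for an external session store.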
Connection Draining (Deregistration Delay)
When an instance is removed (scale-in, replacement, deployment), what happens to users mid-request on that instance?
Without draining: requests are immediately dropped → users get errors.
Connection Draining:
Instance marked for removal
↓
ALB immediately stops sending NEW requests to this instance
↓
ALB waits for EXISTING in-flight requests to finish
↓
Default wait: up to 300 seconds
↓
After all connections finish (or timeout) → instance terminated
↓
No dropped requests, as long as they finish within the delay ✅
This is what enables zero-downtime scaling and deployments.
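A small simulation shows what the timeout means for individual requests. It follows the flow described above and is only a sketch: the function name is invented, and the request durations are made-up values.

```python
def drain(in_flight_seconds, deregistration_delay=300):
    """Split in-flight requests into those that finish and those cut off.

    in_flight_seconds: remaining time each active request still needs.
    Returns (completed, dropped, seconds_waited).
    """
    completed = [s for s in in_flight_seconds if s <= deregistration_delay]
    dropped = [s for s in in_flight_seconds if s > deregistration_delay]
    # Wait for the slowest request that can finish, or for the full delay
    # if something is still running when it expires.
    wait = deregistration_delay if dropped else max(completed, default=0)
    return completed, dropped, wait

completed, dropped, wait = drain([2, 45, 400])
```

Here the 2-second and 45-second requests complete, while the 400-second one is cut off at the 300-second limit. Long-running requests such as large uploads are the reason to raise the delay; short API calls can usually tolerate a lower one for faster scale-in.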
Instance Refresh — Zero-Downtime Deployments
When you update your application (new AMI, new config), you need to replace all running instances with the new version. Instance Refresh automates this within ASG.
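Deployment strategies:
| Strategy | Method | Speed | Risk |
|---|---|---|---|
| Rolling (Instance Refresh) | Replaces instances in batches while keeping a minimum healthy percentage in service | Slowest | Lowest |
| Rolling with checkpoints (Instance Refresh) | Pauses at set percentages (for example 25% → 50% → 75%) so you can verify before continuing | Medium | Medium |
| Blue/Green (outside Instance Refresh) | All new instances launched in a second fleet, traffic switched, old fleet terminated | Fastest | Higher cost |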
Workflow:
Update Launch Template (new AMI version)
↓
Start Instance Refresh in ASG console
↓
ASG launches new instance from updated template
↓
Health checks pass on new instance
↓
Old instance: connection draining → terminated
↓
Repeat for all instances
↓
All instances running new AMI — zero downtime ✅
🎯 DevOps relevance: Instance Refresh is the ASG equivalent of a rolling update in Kubernetes. Same concept, different tool.
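How many instances are replaced at once depends on the minimum healthy percentage you set for the refresh. The sketch below is a simplified model of that batching, not the service's actual scheduler; the function name is hypothetical.

```python
import math

def rolling_batches(total, min_healthy_percent):
    """Return batch sizes for replacing `total` instances.

    Simplified model: never take more instances out of service at once
    than the minimum healthy percentage allows, but always replace at
    least one so the refresh can make progress.
    """
    must_stay = math.ceil(total * min_healthy_percent / 100)
    batch = max(1, total - must_stay)
    batches = []
    remaining = total
    while remaining > 0:
        size = min(batch, remaining)
        batches.append(size)
        remaining -= size
    return batches

cautious = rolling_batches(total=4, min_healthy_percent=90)
faster = rolling_batches(total=4, min_healthy_percent=50)
```

A high percentage gives one-at-a-time replacement; lowering it trades safety margin for a quicker rollout.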
Load Balancer Types — ALB vs NLB vs GWLB
| | ALB | NLB | GWLB |
|---|---|---|---|
| OSI Layer | Layer 7 (Application) | Layer 4 (Transport) | Layer 3 (Network) |
| Protocol | HTTP, HTTPS | TCP, UDP, TLS | All IP protocols |
| Routing basis | Path, host, header, query string | IP + Port (flow hash) | Transparent proxy |
| Latency | Higher than NLB (HTTP parsing adds overhead) | Ultra-low | Very low |
| Static IP | ❌ No | ✅ Yes | ❌ No |
| SSL Termination | ✅ Yes | ✅ Yes | ❌ No |
| Use Case | Web apps, APIs, microservices | Gaming, IoT, VoIP, real-time | Firewalls, IDS/IPS appliances |
Decision guide:
Web app or REST API? → ALB (almost always)
Ultra-low latency TCP/UDP required? → NLB
Traffic must pass through a firewall/security appliance? → GWLB
⚠️ Classic Load Balancer (CLB) exists but is legacy — never use for new deployments.
🎯 Cert scenarios: "Which LB supports path-based routing?" → ALB only. "Which provides a static IP?" → NLB. "Which for real-time gaming at massive scale?" → NLB.
ALB Deep Dive — Routing Rules
ALB operates at Layer 7 — it understands HTTP. This enables routing based on the actual content of each request.
Listener Rules (evaluated in priority order, lowest number first):
Rule 1: Host = api.example.com
→ Forward to: api-target-group (3x t3.medium)
Rule 2: Host = admin.example.com
→ Forward to: admin-target-group (restricted access)
Rule 3: Path = /images/*
→ Forward to: media-target-group (static assets are often served from S3 via CloudFront instead)
Rule 4: Path = /api/*
→ Forward to: api-target-group
Rule 5 (on the separate HTTP :80 listener): any path
→ Redirect: 301 to HTTPS
Default Rule: no other rule matched
→ Forward to: web-target-group (main homepage)
One ALB can route to multiple backend services — this is the foundation of microservices architecture on AWS.
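Rule evaluation is a first-match lookup in priority order. The sketch below models rules 1 to 4 and the default rule from the example above; the HTTP to HTTPS redirect is left out because it lives on a different listener. The tuple layout and function name are invented for illustration.

```python
from fnmatch import fnmatchcase

# (priority, host pattern or None, path pattern or None, target group)
RULES = [
    (1, "api.example.com", None, "api-target-group"),
    (2, "admin.example.com", None, "admin-target-group"),
    (3, None, "/images/*", "media-target-group"),
    (4, None, "/api/*", "api-target-group"),
]
DEFAULT_TARGET = "web-target-group"

def route(host, path):
    """Return the target group of the first rule whose conditions match."""
    for _, host_pattern, path_pattern, target in sorted(
            RULES, key=lambda rule: rule[0]):
        if host_pattern is not None and not fnmatchcase(host, host_pattern):
            continue
        if path_pattern is not None and not fnmatchcase(path, path_pattern):
            continue
        return target
    return DEFAULT_TARGET
```

Because evaluation stops at the first match, `route("admin.example.com", "/images/x.png")` goes to the admin group rather than the media group. Ordering specific rules ahead of broad ones matters.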
Target Groups & Health Check Config
A Target Group is a named pool of EC2 instances (or IPs, or Lambdas) that the ALB routes traffic to. One ALB → multiple target groups via listener rules.
| Setting | Value |
|---|---|
| Target Type | Instances |
| Protocol | HTTP or HTTPS |
| Port | App's listening port (80, 8080, 3000, etc.) |
| VPC | Must match your EC2 instances |
| Health Check Path | /health (must return HTTP 200-299) |
⚠️ When using ASG — never manually register instances to the Target Group. Let ASG handle registration/deregistration automatically as instances launch and terminate.
Architecture — ALB + ASG (Production Standard)
Internet Users
│
Route 53
(domain → ALB DNS)
│
Application Load Balancer
(Internet-facing, 2+ AZs, SG-ALB)
│
Target Group
(health-checked instance pool)
│
Auto Scaling Group (min=1, desired=2, max=4)
┌─────────────────┼──────────────────┐
│ │ │
EC2 (AZ-a) EC2 (AZ-b) EC2 (AZ-b)
SG-EC2, :80 SG-EC2, :80 SG-EC2, :80
│
(shared state)
RDS Database
or ElastiCache
CloudWatch CPU metric → Scaling Policy → ASG scales in/out
Three architectural pillars achieved:
- High Availability — multi-AZ ALB and instances
- Fault Tolerance — ASG self-heals on any instance failure
- Cost Efficiency — scale down at night, up during peaks
Monitoring & Troubleshooting
Key CloudWatch Metrics
ALB Metrics:
| Metric | Alert When |
|---|---|
| TargetResponseTime | > 500ms sustained |
| RequestCount | Sudden unexpected drop |
| HTTPCode_Target_5XX_Count | > 0 sustained |
| HealthyHostCount | Below Desired capacity |
| UnHealthyHostCount | Greater than 0 |
ASG Metrics:
| Metric | What it shows |
|---|---|
| GroupDesiredCapacity | Target count |
| GroupInServiceInstances | Healthy, running count |
| GroupPendingInstances | Currently launching |
| GroupTerminatingInstances | Being removed |
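"Sustained" in the alert thresholds above usually means several consecutive datapoints breaching the threshold, not a single spike. A minimal sketch of that evaluation follows. It simplifies what CloudWatch does (which also supports "M out of N" datapoints and missing-data handling), and the sample values are made up.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM" if the last N datapoints all breach the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"

# TargetResponseTime samples in seconds, one per minute (made-up values).
latency = [0.21, 0.62, 0.48, 0.71, 0.66, 0.58]

now = alarm_state(latency, threshold=0.5, evaluation_periods=3)
earlier = alarm_state(latency[:4], threshold=0.5, evaluation_periods=3)
```

The single 0.62 spike early on does not fire the alarm; three breaching minutes in a row do.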
Common Issues & Fixes
UnHealthyHostCount > 0
Check: SG-EC2 allows port 80 FROM SG-ALB (not from internet)
Check: Health check path actually returns 200 (try curl from EC2)
Check: App running → systemctl status httpd
Check: Grace period → did instance have 300s to boot before checks started?
ASG not scaling despite high CPU
Check: Scaling policy threshold — is CPU actually crossing it?
Check: Max capacity — already at the ceiling?
Check: Cooldown / warmup: is a previous scale action still in cooldown (simple scaling) or instance warmup (target tracking, step scaling)?
Check: CloudWatch alarm — is it actually in ALARM state?
Requests timing out after ALB
Check: EC2 instances responding? Try curl from inside the VPC
Check: Security group — SG-EC2 must allow HTTP FROM SG-ALB only
Check: App logs — any errors being thrown?
Session lost when instance replaced
Short-term: Enable sticky sessions on the Target Group
Long-term: Move session state to ElastiCache (Redis) or DynamoDB
Best Practices
ALB
✅ Always deploy across 2+ AZs
✅ Use path-based routing to consolidate microservices on one ALB
✅ Enable access logs for debugging and compliance
✅ Add HTTPS listener + redirect HTTP → HTTPS (use ACM for free cert)
✅ Use dedicated health check endpoint /health (not just /)
AUTO SCALING
✅ Use Target Tracking (simplest) over Step Scaling
✅ Set instance warmup (or cooldown, for simple scaling) to 300-600 seconds to avoid scaling oscillation
✅ Set Grace Period long enough for your app to boot (300s minimum)
✅ Use Launch Templates (not Launch Configurations)
✅ Test scaling with a load test before going to production
✅ Combine Scheduled + Target Tracking for predictable + reactive scaling
HEALTH CHECKS
✅ Always use ELB health checks (not EC2 only)
✅ Healthy threshold: 2, Unhealthy threshold: 2 (fast detection)
✅ Enable connection draining (default 300s)
SECURITY
✅ SG-EC2 inbound: HTTP from SG-ALB only (never 0.0.0.0/0)
✅ Users hit ALB only — EC2 instances never exposed to internet directly
AWS CLI Commands
# Create Application Load Balancer
aws elbv2 create-load-balancer \
--name web-app-alb \
--subnets subnet-aaa subnet-bbb \
--security-groups sg-alb-xxx \
--scheme internet-facing \
--type application
# Create Target Group
aws elbv2 create-target-group \
--name web-server-tg \
--protocol HTTP --port 80 \
--vpc-id vpc-xxx \
--health-check-path /
# Create Auto Scaling Group (ELB health checks, 300s grace period)
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --launch-template LaunchTemplateName=web-server-lt,Version='$Latest' \
  --min-size 1 --max-size 4 --desired-capacity 2 \
  --health-check-type ELB --health-check-grace-period 300 \
  --availability-zones us-east-1a us-east-1b
# Attach Target Group to ASG
aws autoscaling attach-load-balancer-target-groups \
--auto-scaling-group-name web-app-asg \
--target-group-arns arn:aws:elasticloadbalancing:...
# Create Target Tracking Policy (CPU 50%)
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-app-asg \
--policy-name cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration \
'{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"}}'
# Describe ASG
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names web-app-asg
# Manually set desired capacity
aws autoscaling set-desired-capacity \
--auto-scaling-group-name web-app-asg \
--desired-capacity 3
🧪 LAB — Create ELB + Auto Scaling Group
Security Groups (Create First)
SG-ALB:
Inbound: HTTP (80) from 0.0.0.0/0
Outbound: All traffic
SG-EC2:
Inbound: HTTP (80) from SG-ALB ← reference the SG, not CIDR
Outbound: All traffic
⚠️ SG-EC2's inbound rule must reference SG-ALB as source, not 0.0.0.0/0. This blocks direct internet access to EC2. Only ALB can reach them.
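The effect of referencing a security group as the source can be shown with a toy rule checker. This is a deliberately simplified model of the two groups above (real security groups are stateful and match on CIDR ranges and protocols as well); the function and parameter names are invented for this sketch.

```python
# Inbound rules from the lab, expressed as data. A source is either a CIDR
# string or the name of another security group.
INBOUND = {
    "SG-ALB": [{"port": 80, "source": "0.0.0.0/0"}],
    "SG-EC2": [{"port": 80, "source": "SG-ALB"}],
}

def is_allowed(dest_sg, port, sender_sgs=(), from_internet=False):
    """Check whether traffic to `dest_sg` on `port` matches an inbound rule."""
    for rule in INBOUND[dest_sg]:
        if rule["port"] != port:
            continue
        if rule["source"] == "0.0.0.0/0" and from_internet:
            return True
        if rule["source"] in sender_sgs:
            return True
    return False

user_to_alb = is_allowed("SG-ALB", 80, from_internet=True)
user_to_ec2 = is_allowed("SG-EC2", 80, from_internet=True)
alb_to_ec2 = is_allowed("SG-EC2", 80, sender_sgs=("SG-ALB",))
```

The rule follows the ALB's group membership rather than its IP addresses, so it keeps working as the ALB's addresses change.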
Step 1 — Create Launch Template
EC2 → Launch Templates → Create launch template
Name: web-server-lt
AMI: Amazon Linux 2023 (64-bit x86)
Instance Type: t2.micro / t3.micro (Free Tier)
Key Pair: your-key-pair
Security Group: SG-EC2
Storage: 8 GiB gp3
User Data (paste in Advanced Details):
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/placement/region)
cat > /var/www/html/index.html << EOF
<!DOCTYPE html>
<html>
<body style="font-family:Arial;text-align:center;padding:60px;background:#0A1628;color:#fff;">
<h1 style="color:#FF9900;">Hello from AWS!</h1>
<p>Instance ID: <b style="color:#FF9900;">$INSTANCE_ID</b></p>
<p>Availability Zone: <b style="color:#FF9900;">$AZ</b></p>
<p>Region: <b style="color:#FF9900;">$REGION</b></p>
</body>
</html>
EOF
✅ Checkpoint: web-server-lt listed in EC2 → Launch Templates
Step 2 — Create Target Group
EC2 → Load Balancing → Target Groups → Create target group
Target type: Instances
Name: web-server-tg
Protocol / Port: HTTP / 80
VPC: Default VPC
Health check path: /
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 30 seconds
⚠️ On "Register targets" page — skip it entirely, click Create directly. ASG will register instances automatically.
✅ Checkpoint: web-server-tg shows status "unused" — this is correct, no instances yet.
Step 3 — Create Application Load Balancer
EC2 → Load Balancers → Create Load Balancer → Application Load Balancer
Name: web-app-alb
Scheme: Internet-facing
VPC: Default VPC
Subnets: Select 2 subnets in DIFFERENT AZs (mandatory — fails with one)
Security Group: SG-ALB (remove default SG)
Listener: HTTP : 80 → Forward to web-server-tg
Wait ~2-3 minutes for state to become Active.
✅ Checkpoint: Copy ALB DNS name — you'll need it to test.
Step 4 — Create Auto Scaling Group
EC2 → Auto Scaling Groups → Create Auto Scaling group
Name: web-app-asg
Launch template: web-server-lt (Latest)
VPC + Subnets: Same 2 subnets as ALB
Load balancing: Attach to existing → web-server-tg | HTTP
Health check: ELB, Grace period: 300 seconds
Desired: 2 | Min: 1 | Max: 4
Scaling: None for now (add in Step 5)
✅ Checkpoint: 2 EC2 instances launch within 1-2 min, both show Healthy in Target Group.
Step 5 — Configure Scaling Policy
ASG → web-app-asg → Automatic scaling → Create dynamic scaling policy
Policy type: Target tracking scaling
Policy name: cpu-target-tracking
Metric: Average CPU Utilization
Target value: 50
Warmup: 300 seconds
AWS auto-creates two CloudWatch alarms (scale-out + scale-in).
Step 6 — Test & Verify
# Open ALB DNS in browser
http://<ALB-DNS-name>
# → "Hello from AWS!" + Instance ID + AZ
# Refresh 5-10 times
# → Instance ID changes = ALB routing to different instances ✅
# → AZ changes = multi-AZ distribution ✅
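# Trigger scale-out (SSH into EACH instance: the policy tracks the group average,
# so loading only one of two instances lifts the average to about 50%)
# Note: SG-EC2 as created above has no SSH rule. Temporarily allow port 22
# from your IP, or connect with Session Manager instead.
ssh -i your-key.pem ec2-user@<ec2-public-ip>
sudo yum install -y stress
stress --cpu 4 --timeout 600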
# Watch in console:
# EC2 → Auto Scaling Groups → Activity tab (new instances launching)
# CloudWatch → Alarms (CPU alarm goes to ALARM state)
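Expected timeline after stress starts (approximate):
T+0:00 CPU spikes
T+3:00 CloudWatch alarm: OK → ALARM (needs about 3 consecutive 1-minute datapoints above target)
T+4:00 New EC2 launches
T+6:00 New instance healthy → ALB routes to 3 instances
T+10:00 Stress ends → CPU drops
T+25:00 ASG starts scaling in (the scale-in alarm waits for about 15 minutes of low CPU)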
Step 7 — Monitor with CloudWatch
CloudWatch → Metrics → EC2 → By Auto Scaling Group
→ web-app-asg → CPUUtilization → Period: 1 min
CloudWatch → Metrics → ApplicationELB
→ Add: RequestCount, HealthyHostCount, TargetResponseTime
🧹 Cleanup — Delete in This Exact Order
1. Auto Scaling Group → terminates all EC2 instances automatically
Wait 2-3 min for all instances to fully terminate
2. Application Load Balancer
3. Target Group
4. Launch Template
5. Security Groups → delete SG-EC2 first, then SG-ALB
(SG-ALB referenced by SG-EC2 — must delete SG-EC2 first)
6. CloudWatch Alarms → confirm the auto-created scaling alarms are gone (they are normally removed with the policy) and delete any leftovers
💰 ALB = ~$16/month + EC2 costs. Always clean up after labs.
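The deletion order above is a dependency ordering: a resource can only go once nothing still uses it. A small sketch with Python's standard `graphlib` module (Python 3.9+) derives a valid order from those constraints. The dependency map is this post's reading of the lab setup, not something queried from AWS.

```python
from graphlib import TopologicalSorter

# resource -> resources that must be deleted BEFORE it
DELETE_AFTER = {
    "Target Group": {"Application Load Balancer", "Auto Scaling Group"},
    "Launch Template": {"Auto Scaling Group"},
    "SG-EC2": {"Auto Scaling Group"},
    "SG-ALB": {"Application Load Balancer", "SG-EC2"},
}

# static_order() yields each resource only after everything it waits on.
deletion_order = list(TopologicalSorter(DELETE_AFTER).static_order())
```

Several orders satisfy the constraints, so the result may differ from the checklist in details such as whether the ALB or the ASG goes first, while still being safe to follow.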
📝 Assignment
Task 1 (Mandatory) — Prove Self-Healing:
Manually terminate one running EC2 instance. Watch ASG Activity tab — confirm it auto-launches a replacement. Verify ALB kept serving traffic with zero downtime.
Task 2 (Intermediate) — Step Scaling Policy:
Create a Step Scaling policy: add 2 instances if CPU > 70%, add 1 if CPU 50-70%, remove 1 if CPU < 20%.
Task 3 (Intermediate) — HTTPS:
Add HTTPS listener (port 443) to ALB. Use AWS Certificate Manager for a free certificate. Add listener rule: HTTP → HTTPS redirect.
Task 4 (Advanced) — Scheduled Scaling:
Scale up to 4 instances at 9AM weekdays, down to 1 at 6PM weekdays. Set times in UTC.
Task 5 (Advanced) — CloudWatch Dashboard:
Build a dashboard with 4 widgets: CPU%, RequestCount, HealthyHostCount, GroupInServiceInstances.
⚡ Quick Revision
LOAD BALANCING
Manager directing traffic to healthy servers
Eliminates single point of failure
ALB algorithm: round robin (default)
SCALING
Horizontal = more instances (cloud-native, preferred)
Vertical = bigger instance (simple, has ceiling, needs downtime)
SCALING POLICIES
Target Tracking = keep metric at target, AWS manages alarms (use this)
Scheduled = scale at specific times
Predictive = ML-based, needs at least 24 hours of history (uses up to 14 days)
ASG CAPACITY
MIN = hard floor, never below
DESIRED = ASG always works to maintain this count
MAX = hard ceiling, never exceed
HEALTH CHECKS
EC2 = is instance running? (basic)
ELB = is app returning HTTP 200? (always use this)
Grace Period = time after launch during which failed checks are ignored (300s console default)
LB TYPES
ALB = Layer 7, HTTP/HTTPS, path/host routing → web apps
NLB = Layer 4, TCP/UDP, microsecond latency → gaming/IoT
GWLB = Layer 3, security appliances, firewalls
STICKY SESSIONS
Cookie pins user to one EC2 for session duration
Proper fix: external session store (Redis/ElastiCache)
CONNECTION DRAINING
In-flight requests complete before instance terminates
Default: 300 seconds | Enables zero-downtime scaling
INSTANCE REFRESH
Rolling AMI replacement across entire ASG fleet
Zero-downtime | Kubernetes rolling update equivalent
SECURITY GROUP PATTERN
SG-ALB: HTTP 80 from 0.0.0.0/0
SG-EC2: HTTP 80 FROM SG-ALB only
Direct EC2 access blocked from internet
💼 Interview Questions
Q1: Horizontal vs Vertical scaling — what's the difference and which does AWS recommend?
Horizontal = add more instances, requires LB, unlimited scale, fault tolerant. Vertical = upgrade existing instance, simple, has a hard ceiling, requires downtime. AWS recommends horizontal scaling for cloud-native apps.
Q2: What are Min, Desired, and Max capacity in an ASG?
Min = hard floor, ASG never drops below. Max = hard ceiling, ASG never exceeds. Desired = the target count ASG constantly works to maintain. Scaling policies adjust Desired within Min-Max boundaries.
Q3: Why use ELB health checks instead of EC2 health checks?
EC2 checks only verify the VM is running. ELB checks send a real HTTP request and verify the app returns 200. An EC2 can be "running" while Apache is crashed — EC2 checks miss it, ELB checks catch it within 60 seconds.
Q4: What is Connection Draining?
It allows in-flight requests to complete before an instance is terminated. ALB immediately stops sending new requests but waits up to 300 seconds for active connections to finish — enabling zero-downtime scaling and deployments.
Q5: ALB vs NLB — when do you use each?
ALB for HTTP/HTTPS web apps and microservices — understands application content, supports path/host routing. NLB for ultra-low latency TCP/UDP workloads (gaming, IoT, VoIP) or when a static IP is required.
Q6: Why reference SG-ALB as source in SG-EC2 instead of 0.0.0.0/0?
Referencing SG-ALB as source means only traffic that came through the ALB can reach EC2. Direct internet access to the EC2 IP is blocked. More secure, no IP management needed as instances scale.
Q7: What is Instance Refresh?
ASG's native mechanism to replace all running instances with a new Launch Template version (new AMI or config). It handles draining, replacement, and health checks automatically. Equivalent to a Kubernetes rolling update.
AWS Session 6 — ELB & Auto Scaling | Cloud + DevOps learning journey — Systems Engineer → Cloud/DevOps Engineer