Tejas Shinkar

Posted on Jul 3

AWS ELB & Auto Scaling — Building Self-Healing, Scalable Infrastructure

#beginners #cloud #devops #aws

Elastic Load Balancing & Auto Scaling

Traffic Distribution · ASG · Scaling Policies · ALB vs NLB · Health Checks · Zero-Downtime Deployments

Part of my AWS learning journey — transitioning from Systems Engineer to Cloud/DevOps. This session covers the two services that make AWS infrastructure truly production-grade: ELB and Auto Scaling.

📋 Topics Covered

#	Topic	Type
1	Purpose of Load Balancing	Concept
2	How Load Balancing Works (Algorithms)	Concept + Interview
3	Horizontal vs Vertical Scaling	Concept + Interview
4	Scaling Policies — Dynamic, Scheduled, Predictive	Concept + Cert
5	Auto Scaling Group (ASG) — Core Config	Concept + Lab
6	ASG Lifecycle & Key Metrics	Concept + Cert
7	Health Checks & Replacement Policies	Concept + Lab
8	Session Stickiness	Concept + Interview
9	Connection Draining (Deregistration Delay)	Concept + Interview
10	Instance Refresh — Zero-Downtime Deployments	Concept + DevOps
11	Load Balancer Types — ALB vs NLB vs GWLB	Concept + Cert
12	Target Groups & Health Check Config	Concept + Lab
13	ALB Deep Dive — Routing Rules	Concept + Cert
14	Architecture — ALB + ASG	Concept + DevOps
15	Monitoring & Troubleshooting	Concept + DevOps
16	Best Practices	DevOps
17	AWS CLI Commands	Practical
18	Lab — Create ELB + Auto Scaling Group (7 Steps)	Lab
19	Cleanup Checklist	Lab
20	Assignment	Practice

Why Load Balancing Exists

Imagine a popular restaurant with one cashier — as soon as the lunch rush hits, the queue grows, the cashier gets overwhelmed, and eventually everything grinds to a halt. The fix? More cashiers, with a manager at the door directing customers to whoever is free.

A Load Balancer is that manager.

It sits in front of your EC2 instances and distributes incoming traffic across them — so no single server gets overwhelmed, and if one fails, the others keep running.

Without a Load Balancer:

Single point of failure — one server down means your app is down
No horizontal scaling — you're stuck with one server
No health checking — dead servers still receive traffic
Manual failover — someone has to intervene

With a Load Balancer:

High availability across multiple AZs
Horizontal scalability — add more servers on demand
Automatic health checks: failing servers are taken out of rotation after a few consecutive failed checks
Automatic failover — zero manual intervention
SSL/TLS termination — offloads HTTPS decryption from backend servers

How Load Balancing Works

Request flow — what actually happens:

1. User's browser sends HTTP request to your domain
2. DNS resolves your domain → returns one of the ALB's IP addresses (these can change, so always point at the ALB's DNS name)
3. Request hits the Load Balancer
4. LB picks a healthy backend EC2 instance
5. Forwards request to that instance
6. Instance processes and responds
7. LB returns the response to the user

Load Distribution Algorithms:

Algorithm	How it works	AWS Usage
Round Robin	Requests go in rotation: 1→2→3→1→2→3	ALB default
Least Connections	Route to instance with fewest active connections	ALB option ("least outstanding requests"); good for variable request lengths
IP Hash	Client IP always maps to same backend	Client-IP stickiness (NLB); ALB stickiness uses cookies instead
Weighted	Instance A gets 70%, B gets 30%	Canary deployments
Flow Hash	Hash of protocol+IP+port	NLB default for TCP/UDP

🎯 Cert Tip: ALB uses round robin by default (least outstanding requests can be selected per target group). NLB uses flow hash for TCP/UDP traffic.
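To make the first two rows of the table concrete, here is a toy bash simulation. It is only a sketch of the selection logic, not how AWS implements it; the target names and in-flight counts are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy simulation of two selection strategies; not how AWS implements them.

targets=(ec2-1 ec2-2 ec2-3)   # hypothetical healthy targets
rr_index=0                    # position of the next round-robin pick

# Round robin: sets $picked to the next target in rotation.
# A global is used instead of echo so the index survives between calls.
pick_round_robin() {
  picked=${targets[rr_index]}
  rr_index=$(( (rr_index + 1) % ${#targets[@]} ))
}

# Least outstanding requests: given one in-flight count per target
# (same order as $targets), sets $picked to the least-busy target.
pick_least_outstanding() {
  local counts=("$@") best=0 i
  for i in "${!counts[@]}"; do
    (( counts[i] < counts[best] )) && best=$i
  done
  picked=${targets[best]}
}

for _ in 1 2 3 4; do pick_round_robin; printf '%s ' "$picked"; done; echo
# prints: ec2-1 ec2-2 ec2-3 ec2-1

pick_least_outstanding 5 2 7; echo "$picked"
# prints: ec2-2
```

Round robin ignores how busy each target is, which is why least outstanding requests tends to behave better when request durations vary a lot.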

Horizontal vs Vertical Scaling

This is one of the most fundamental concepts in cloud architecture — and a guaranteed interview question.

Analogy: Your app is a delivery service. Vertical scaling = buying a bigger truck. Horizontal scaling = buying more trucks.

Vertical Scaling (Scale Up)

Upgrade the existing instance — stop it, change the type, restart.

t3.micro → t3.medium → t3.large → t3.xlarge

✅ Simple — no distributed system complexity
✅ Better for databases (data consistency is easier)
❌ Has a hard ceiling — biggest instance type is the limit
❌ Requires downtime during the type change
❌ Cannot auto-scale

Horizontal Scaling (Scale Out)

Add more instances — distribute load across them.

1 × t3.micro → 2 × t3.micro → 4 × t3.micro → back to 2

✅ Theoretically unlimited scalability
✅ Fault tolerant — one fails, others keep running
✅ Scales dynamically with Auto Scaling Groups
✅ No downtime when scaling
❌ Requires a Load Balancer to distribute traffic
❌ Requires stateless app design (or sticky sessions for state)

🎯 Cloud-native best practice: always use horizontal scaling with ASG. Vertical scaling is for legacy systems or databases that can't easily distribute state.

Scaling Policies — Three Types

How does AWS know when to scale? Scaling Policies define the trigger.

1. Dynamic Scaling — Target Tracking (Most Common)

Monitor a metric and automatically adjust to keep it at a target value.

Metric: Average CPU Utilization
Target: 50%

CPU rises to 75% → ASG adds instances until CPU ≈ 50%
CPU drops to 20% → ASG removes instances until CPU ≈ 50%

AWS creates two CloudWatch alarms automatically — one for scale-out, one for scale-in. You don't configure alarms manually.

Best for: Most production workloads. Simplest to set up and maintain.
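The arithmetic behind the CPU example can be approximated with a simple proportional estimate. This is an assumption for illustration only: AWS does not document target tracking as this exact formula, and the function name is hypothetical.

```shell
# Rough proportional estimate of the capacity needed to bring an
# averaged metric back to its target. Illustrative assumption only:
# AWS does not publish target tracking as this exact formula.
estimate_capacity() {
  current=$1   # instances currently in service
  metric=$2    # observed average metric, e.g. CPU percent
  target=$3    # target value for the metric
  # Integer ceiling of current * metric / target
  echo $(( (current * metric + target - 1) / target ))
}

estimate_capacity 2 75 50   # prints 3
estimate_capacity 4 20 50   # prints 2
```

So two instances averaging 75% CPU need roughly three instances to land near the 50% target, and four instances idling at 20% can shrink to about two.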

2. Scheduled Scaling

Scale at pre-defined times — for predictable, known traffic patterns.

Every weekday 8:30 AM  → set Desired = 6  (office hours ramp up)
Every weekday 7:00 PM  → set Desired = 2  (evening wind down)
Every Saturday 12:00 AM → set Desired = 8 (weekend traffic spike)

Best for: Business-hours traffic, batch jobs at fixed times, known seasonal spikes.
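The first two schedule entries above could be created from the CLI roughly like this. It is a sketch: the function name and action names are placeholders, and it assumes the group's Max capacity is at least 6, since a scheduled Desired value has to fit inside Min and Max.

```shell
# Sketch only: the function name and action names are placeholders.
# Recurrence uses cron syntax: minute hour day-of-month month day-of-week.
create_office_hours_schedule() {
  asg_name=$1
  time_zone=$2   # IANA zone name such as "Asia/Kolkata", or "UTC"

  # Weekdays 08:30: ramp up to 6 instances
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg_name" \
    --scheduled-action-name weekday-ramp-up \
    --recurrence "30 8 * * 1-5" \
    --time-zone "$time_zone" \
    --desired-capacity 6

  # Weekdays 19:00: wind down to 2 instances
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg_name" \
    --scheduled-action-name weekday-wind-down \
    --recurrence "0 19 * * 1-5" \
    --time-zone "$time_zone" \
    --desired-capacity 2
}
```

With credentials configured, it would be called as `create_office_hours_schedule web-app-asg UTC`.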

3. Predictive Scaling (ML-based)

AWS analyses historical metric data (up to 14 days; at least 24 hours is needed before the first forecast), forecasts future demand, and scales EC2 instances before traffic arrives, not in reaction to it.

Historical pattern: Every Friday at 5PM, traffic doubles
Predictive: ASG pre-scales at 4:45PM — instances ready before surge hits

Best for: Recurring, predictable patterns where reactive scaling causes latency spikes.

🎯 Cert Tip: Predictive Scaling needs at least 24 hours of historical data to start forecasting and looks back over up to 14 days, so forecasts improve as that two-week window fills.

Auto Scaling Group (ASG)

An ASG is the component that actually manages your EC2 fleet — launching, terminating, and replacing instances automatically.

Analogy: ASG is like a staffing agency for your servers. You tell it: "I need at least 2 people, ideally 4, maximum 10 — and if anyone quits or gets sick, hire a replacement immediately."

The Three Capacity Numbers (Core Concept)

MAX     [4] ──── Hard ceiling. ASG will NEVER exceed this, even under extreme load
DESIRED [2] ──── ASG always tries to maintain exactly this many healthy instances
MIN     [1] ──── Hard floor. ASG will NEVER go below this, even at zero traffic

ASG constantly compares running instances against Desired capacity. If an instance fails a health check, it terminates it and launches a replacement — no human intervention needed.
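A tiny sketch of how the three numbers interact: whatever capacity a scaling policy asks for ends up bounded by Min and Max. The function name is hypothetical, and the values mirror the 1/2/4 example above.

```shell
# Bound a requested capacity to the group's Min..Max range.
clamp_capacity() {
  requested=$1
  min=$2
  max=$3
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

clamp_capacity 7 1 4   # prints 4 (policy wants 7, Max is 4)
clamp_capacity 0 1 4   # prints 1 (policy wants 0, Min is 1)
clamp_capacity 2 1 4   # prints 2 (already inside the range)
```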

ASG Configuration Components

Setting	What it does
Launch Template	Blueprint for every EC2 ASG launches — AMI, instance type, SG, User Data
Min / Desired / Max	Capacity boundaries
Availability Zones	Which AZs to spread instances across (always 2+)
Load Balancer / Target Group	Where to register launched instances
Health Check Type	EC2 (VM-level) or ELB (app-level) — always use ELB
Scaling Policies	Dynamic, scheduled, or predictive rules
Termination Policy	Which instances to remove during scale-in (default: keep AZs balanced, then remove instances from the oldest launch template or configuration)

✅ Always use Launch Templates not Launch Configurations — templates support versioning, Instance Refresh, and newer EC2 features.

ASG Lifecycle — Instance States

Scale-out triggered
      ↓
  Pending ──── being launched by ASG
      ↓
  InService ── running, healthy, serving traffic ← normal operating state
      ↓
  Terminating ─ marked for removal; deregisters from the target group while existing connections drain
      ↓
  Terminated ─ deleted, no longer in ASG

  Unhealthy ── failed health check → immediately replaced with new instance

Health Checks & Replacement Policies

Two Types of Health Checks

EC2 Health Check (VM level)
Checks if the instance is running — system and instance status checks pass. Doesn't know if your application is actually responding.

ELB Health Check (App level)
The ALB sends a real HTTP request to your configured path (e.g., /health). If the app doesn't return HTTP 200, the instance is marked unhealthy.

✅ Always use ELB health checks for web applications. An instance can be "running" while Apache is crashed inside — EC2 checks miss this entirely. ELB checks catch it within 60 seconds.

Health Check Configuration

Setting	Default	What it means
Interval	30 seconds	How often checks are sent
Timeout	5 seconds	How long to wait for response
Healthy Threshold	5 consecutive successes (this lab sets 2)	Instance declared healthy
Unhealthy Threshold	2 consecutive failures	Instance declared unhealthy
Grace Period	300 seconds (console default)	Time after launch during which ASG ignores failed health checks (allows app boot time)

Self-Healing Flow

Health check fails twice
      ↓
ALB marks instance "Unhealthy" → stops sending traffic to it
      ↓
ASG receives notification
      ↓
ASG terminates the unhealthy instance
      ↓
ASG launches a new replacement from Launch Template
      ↓
Replacement passes health checks → added to Target Group
      ↓
Total time: ~5-10 minutes, fully automated ✅
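The "fails twice" step is a consecutive-count rule, which a toy model makes easier to see. This is a sketch and not the ELB implementation; it assumes thresholds of 2 and 2 (the values used in the lab) and a target that starts out healthy.

```shell
# Toy model of threshold-based health state; not the ELB implementation.
# Takes a sequence of probe results ("ok" or "fail") and prints the final state.
evaluate_health() {
  healthy_threshold=2     # consecutive successes needed to become healthy
  unhealthy_threshold=2   # consecutive failures needed to become unhealthy
  state=healthy           # assumption: the target starts out healthy
  ok_streak=0
  fail_streak=0
  for result in "$@"; do
    if [ "$result" = ok ]; then
      ok_streak=$((ok_streak + 1))
      fail_streak=0
      if [ "$ok_streak" -ge "$healthy_threshold" ]; then state=healthy; fi
    else
      fail_streak=$((fail_streak + 1))
      ok_streak=0
      if [ "$fail_streak" -ge "$unhealthy_threshold" ]; then state=unhealthy; fi
    fi
  done
  echo "$state"
}

evaluate_health ok fail ok fail   # prints healthy (never two failures in a row)
evaluate_health ok fail fail      # prints unhealthy
evaluate_health fail fail ok      # prints unhealthy (one success is not enough)
evaluate_health fail fail ok ok   # prints healthy
```

With a 30-second interval, two consecutive failures take about 60 seconds to observe, which is where the detection time quoted earlier comes from.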

Session Stickiness (Sticky Sessions)

By default, ALB routes each request to any healthy instance via round robin — a user's second request might go to a different server than their first. For stateless apps that's fine. For stateful apps (shopping carts, auth sessions), the session exists only on one server.

Sticky Sessions fixes this: ALB sets a cookie in the browser that pins the user to one specific backend instance for the session duration.

User logs in → ALB routes to EC2-1 → sets cookie: "stick to EC2-1"
Next request → user sends cookie → ALB routes to EC2-1 again ✅
Session expires / cookie deleted → back to normal round robin

When to use: Shopping carts, user authentication, stateful apps with in-memory sessions.

Downsides:

Uneven load distribution
If the pinned instance fails, the session is lost anyway

🎯 Proper long-term fix: Externalize session state to ElastiCache (Redis) or DynamoDB so any instance can serve any user without needing stickiness.
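If you do need stickiness as a stopgap, it is a target group attribute. A minimal sketch, assuming load-balancer-generated cookies and an example duration of one hour; the function name is hypothetical and the ARN is supplied by the caller.

```shell
# Sketch: enable load-balancer-generated cookie stickiness on a target group.
# The 3600-second duration is an example value, not a recommendation.
enable_stickiness() {
  target_group_arn=$1
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes \
      Key=stickiness.enabled,Value=true \
      Key=stickiness.type,Value=lb_cookie \
      Key=stickiness.lb_cookie.duration_seconds,Value=3600
}
```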

Connection Draining (Deregistration Delay)

When an instance is removed (scale-in, replacement, deployment), what happens to users mid-request on that instance?

Without draining: requests are immediately dropped → users get errors.

Connection Draining:

Instance marked for removal
      ↓
ALB immediately stops sending NEW requests to this instance
      ↓
ALB waits for EXISTING in-flight requests to finish
      ↓
Default wait: up to 300 seconds
      ↓
After all connections finish (or timeout) → instance terminated
      ↓
Zero dropped requests ✅

This is what enables zero-downtime scaling and deployments.
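The wait time is also a target group attribute, so it can be shortened for apps whose requests finish quickly. A small sketch; the function name is hypothetical and the value you pick depends on your longest expected request.

```shell
# Sketch: change the deregistration delay on a target group.
set_deregistration_delay() {
  target_group_arn=$1
  seconds=$2   # allowed range is 0-3600; 300 is the default
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes Key=deregistration_delay.timeout_seconds,Value="$seconds"
}
```

For example, `set_deregistration_delay <target-group-arn> 60` would speed up scale-in for a service whose requests complete within a few seconds.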

Instance Refresh — Zero-Downtime Deployments

When you update your application (new AMI, new config), you need to replace all running instances with the new version. Instance Refresh automates this within ASG.

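Replacement approaches (Instance Refresh itself always works as a rolling replacement; blue/green is built outside it, for example with a second ASG or target group):

Strategy	Method	Speed	Risk
Rolling, high minimum healthy %	A few instances at a time	Slowest	Lowest
Rolling with checkpoints	Replace in stages (for example 25% → 50% → 100%), pausing to verify	Medium	Medium
Blue/Green	All new instances launched, traffic switched, old terminated	Fastest	Higher cost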

Workflow:

Update Launch Template (new AMI version)
      ↓
Start Instance Refresh in ASG console
      ↓
ASG launches new instance from updated template
      ↓
Health checks pass on new instance
      ↓
Old instance: connection draining → terminated
      ↓
Repeat for all instances
      ↓
All instances running new AMI — zero downtime ✅

🎯 DevOps relevance: Instance Refresh is the ASG equivalent of a rolling update in Kubernetes. Same concept, different tool.
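The same workflow can be started from the CLI instead of the console. A minimal sketch, assuming the launch template has already been updated; the function name is hypothetical and the 90% / 300-second preferences are example values.

```shell
# Sketch: start a rolling Instance Refresh on an ASG.
# MinHealthyPercentage keeps at least that share of capacity in service;
# InstanceWarmup is how long a new instance gets before it counts as ready.
start_rolling_refresh() {
  asg_name=$1
  aws autoscaling start-instance-refresh \
    --auto-scaling-group-name "$asg_name" \
    --strategy Rolling \
    --preferences '{"MinHealthyPercentage": 90, "InstanceWarmup": 300}'
}

# Progress can be checked with:
#   aws autoscaling describe-instance-refreshes --auto-scaling-group-name <name>
```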

Load Balancer Types — ALB vs NLB vs GWLB

	ALB	NLB	GWLB
OSI Layer	Layer 7 (Application)	Layer 4 (Transport)	Layer 3 (Network)
Protocol	HTTP, HTTPS	TCP, UDP, TLS	All IP protocols
Routing basis	Path, host, header, query string	IP + Port (flow hash)	Transparent proxy
Latency	Milliseconds (Layer 7 processing)	Ultra-low (Layer 4 pass-through)	Very low
Static IP	❌ No	✅ Yes	✅ Yes
SSL Termination	✅ Yes	✅ Yes	❌ No
Use Case	Web apps, APIs, microservices	Gaming, IoT, VoIP, real-time	Firewalls, IDS/IPS appliances

Decision guide:

Web app or REST API?                 → ALB (almost always)
Ultra-low latency TCP/UDP required?  → NLB
Traffic must pass through a firewall/security appliance? → GWLB

⚠️ Classic Load Balancer (CLB) exists but is legacy — never use for new deployments.

🎯 Cert scenarios: "Which LB supports path-based routing?" → ALB only. "Which provides a static IP?" → NLB. "Which for real-time gaming at massive scale?" → NLB.

ALB Deep Dive — Routing Rules

ALB operates at Layer 7 — it understands HTTP. This enables routing based on the actual content of each request.

Listener Rules (evaluated in priority order, lowest number first; the first match wins):

Rule 1: Host = api.example.com
        → Forward to: api-target-group (3x t3.medium)

Rule 2: Host = admin.example.com
        → Forward to: admin-target-group (restricted access)

Rule 3: Path = /images/*
        → Forward to: media-target-group (an ALB cannot forward to S3 directly; static assets are usually served via CloudFront + S3)

Rule 4: Path = /api/*
        → Forward to: api-target-group

Port 80 listener (separate from the HTTPS listener above): any path
        → Redirect: 301 to HTTPS

Default Rule: no other rule matched
        → Forward to: web-target-group (main homepage)

One ALB can route to multiple backend services — this is the foundation of microservices architecture on AWS.
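First-match evaluation is easy to get wrong, so here is a toy model of the host and path rules above. It is a sketch of the matching order only, not the ALB implementation; the function name is hypothetical and the target group names come from the example rules.

```shell
# Toy model of first-match rule evaluation; not the ALB implementation.
# Prints the target group a request for host $1 and path $2 would reach.
route_request() {
  host=$1
  path=$2
  # Host rules come first in the example, so they win over path rules.
  case "$host" in
    api.example.com)   echo api-target-group;   return ;;
    admin.example.com) echo admin-target-group; return ;;
  esac
  case "$path" in
    /images/*) echo media-target-group ;;
    /api/*)    echo api-target-group ;;
    *)         echo web-target-group ;;   # default rule
  esac
}

route_request www.example.com /images/logo.png   # prints media-target-group
route_request api.example.com /images/logo.png   # prints api-target-group
route_request www.example.com /                  # prints web-target-group
```

The second call shows why ordering matters: the path matches the images rule, but the host rule is evaluated earlier and takes the request.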

Target Groups & Health Check Config

A Target Group is a named pool of EC2 instances (or IPs, or Lambdas) that the ALB routes traffic to. One ALB → multiple target groups via listener rules.

Setting	Value
Target Type	Instances
Protocol	HTTP or HTTPS
Port	App's listening port (80, 8080, 3000, etc.)
VPC	Must match your EC2 instances
Health Check Path	`/health`, which must return a success code (200 by default; the matcher can be widened, e.g. to 200-299)

⚠️ When using ASG — never manually register instances to the Target Group. Let ASG handle registration/deregistration automatically as instances launch and terminate.

Architecture — ALB + ASG (Production Standard)

                     Internet Users
                           │
                       Route 53
                  (domain → ALB DNS)
                           │
               Application Load Balancer
             (Internet-facing, 2+ AZs, SG-ALB)
                           │
                     Target Group
               (health-checked instance pool)
                           │
            Auto Scaling Group (min=1, desired=2, max=4)
         ┌─────────────────┼──────────────────┐
         │                 │                  │
    EC2 (AZ-a)         EC2 (AZ-b)        EC2 (AZ-b)
    SG-EC2, :80        SG-EC2, :80       SG-EC2, :80
         │
     (shared state)
    RDS Database
    or ElastiCache

CloudWatch CPU metric → Scaling Policy → ASG scales in/out

Three architectural pillars achieved:

High Availability — multi-AZ ALB and instances
Fault Tolerance — ASG self-heals on any instance failure
Cost Efficiency — scale down at night, up during peaks

Monitoring & Troubleshooting

Key CloudWatch Metrics

ALB Metrics:

Metric	Alert When
`TargetResponseTime`	> 500ms sustained
`RequestCount`	Sudden unexpected drop
`HTTPCode_Target_5XX_Count`	> 0 sustained
`HealthyHostCount`	Below Desired capacity
`UnHealthyHostCount`	Greater than 0

ASG Metrics:

Metric	What it shows
`GroupDesiredCapacity`	Target count
`GroupInServiceInstances`	Healthy, running count
`GroupPendingInstances`	Currently launching
`GroupTerminatingInstances`	Being removed

Common Issues & Fixes

UnHealthyHostCount > 0

Check: SG-EC2 allows port 80 FROM SG-ALB (not from internet)
Check: Health check path actually returns 200 (try curl from EC2)
Check: App running → systemctl status httpd
Check: Grace period → did the instance get enough time (300s in this lab) to boot before failed checks started to count?
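Before working through those checks, it helps to ask the load balancer why it considers a target unhealthy. A small sketch; the function name is hypothetical and the ARN is supplied by the caller.

```shell
# Sketch: list each target with its health state and the reason code.
show_target_health() {
  target_group_arn=$1
  aws elbv2 describe-target-health \
    --target-group-arn "$target_group_arn" \
    --query 'TargetHealthDescriptions[].[Target.Id, TargetHealth.State, TargetHealth.Reason]' \
    --output table
}
```

A reason such as `Target.Timeout` usually points at security groups or a stopped service, while `Target.ResponseCodeMismatch` points at the health check path returning an unexpected status code.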

ASG not scaling despite high CPU

Check: Scaling policy threshold — is CPU actually crossing it?
Check: Max capacity — already at the ceiling?
Check: Cooldown or instance warmup → is a previous scaling action still settling?
Check: CloudWatch alarm — is it actually in ALARM state?

Requests timing out after ALB

Check: EC2 instances responding? Try curl from inside the VPC
Check: Security group — SG-EC2 must allow HTTP FROM SG-ALB only
Check: App logs — any errors being thrown?

Session lost when instance replaced

Short-term: Enable sticky sessions on the Target Group
Long-term: Move session state to ElastiCache (Redis) or DynamoDB

Best Practices

ALB
  ✅ Always deploy across 2+ AZs
  ✅ Use path-based routing to consolidate microservices on one ALB
  ✅ Enable access logs for debugging and compliance
  ✅ Add HTTPS listener + redirect HTTP → HTTPS (use ACM for free cert)
  ✅ Use dedicated health check endpoint /health (not just /)

AUTO SCALING
  ✅ Use Target Tracking (simplest) over Step Scaling
  ✅ Set instance warmup (target tracking) or cooldown (simple scaling) to 300-600 seconds to avoid scaling oscillation
  ✅ Set Grace Period long enough for your app to boot (300s minimum)
  ✅ Use Launch Templates (not Launch Configurations)
  ✅ Test scaling with a load test before going to production
  ✅ Combine Scheduled + Target Tracking for predictable + reactive scaling

HEALTH CHECKS
  ✅ Always use ELB health checks (not EC2 only)
  ✅ Healthy threshold: 2, Unhealthy threshold: 2 (fast detection)
  ✅ Enable connection draining (default 300s)

SECURITY
  ✅ SG-EC2 inbound: HTTP from SG-ALB only (never 0.0.0.0/0)
  ✅ Users hit ALB only — EC2 instances never exposed to internet directly

AWS CLI Commands

# Create Application Load Balancer
aws elbv2 create-load-balancer \
  --name web-app-alb \
  --subnets subnet-aaa subnet-bbb \
  --security-groups sg-alb-xxx \
  --scheme internet-facing \
  --type application

# Create Target Group
aws elbv2 create-target-group \
  --name web-server-tg \
  --protocol HTTP --port 80 \
  --vpc-id vpc-xxx \
  --health-check-path /

# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --launch-template LaunchTemplateName=web-server-lt,Version='$Latest' \
  --min-size 1 --max-size 4 --desired-capacity 2 \
  --availability-zones us-east-1a us-east-1b

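# Attach Target Group to ASG
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name web-app-asg \
  --target-group-arns arn:aws:elasticloadbalancing:...

# Use ELB health checks (groups created from the CLI default to EC2 checks)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --health-check-type ELB \
  --health-check-grace-period 300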

# Create Target Tracking Policy (CPU 50%)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration \
  '{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"}}'

# Describe ASG
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names web-app-asg

# Manually set desired capacity
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name web-app-asg \
  --desired-capacity 3

🧪 LAB — Create ELB + Auto Scaling Group

Security Groups (Create First)

SG-ALB:

Inbound:  HTTP (80) from 0.0.0.0/0
Outbound: All traffic

SG-EC2:

Inbound:  HTTP (80) from SG-ALB  ← reference the SG, not CIDR
Outbound: All traffic

⚠️ SG-EC2's inbound rule must reference SG-ALB as source — not 0.0.0.0/0. This blocks direct internet access to EC2. Only ALB can reach them.
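For reference, the SG-EC2 rule looks like this from the CLI. A sketch only: the function name is hypothetical, and both security group IDs are passed in by the caller.

```shell
# Sketch: allow HTTP into SG-EC2 only from members of SG-ALB.
allow_alb_to_ec2() {
  ec2_sg_id=$1   # ID of SG-EC2
  alb_sg_id=$2   # ID of SG-ALB
  aws ec2 authorize-security-group-ingress \
    --group-id "$ec2_sg_id" \
    --protocol tcp \
    --port 80 \
    --source-group "$alb_sg_id"
}
```

Because the source is a security group rather than a CIDR, the rule keeps working as the ALB's underlying IP addresses change.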

Step 1 — Create Launch Template

EC2 → Launch Templates → Create launch template

Name:            web-server-lt
AMI:             Amazon Linux 2023 (64-bit x86)
Instance Type:   t2.micro / t3.micro (Free Tier)
Key Pair:        your-key-pair
Security Group:  SG-EC2
Storage:         8 GiB gp3

User Data (paste in Advanced Details):

#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)

cat > /var/www/html/index.html << EOF
<!DOCTYPE html>
<html>
<body style="font-family:Arial;text-align:center;padding:60px;background:#0A1628;color:#fff;">
  <h1 style="color:#FF9900;">Hello from AWS!</h1>
  <p>Instance ID: <b style="color:#FF9900;">$INSTANCE_ID</b></p>
  <p>Availability Zone: <b style="color:#FF9900;">$AZ</b></p>
  <p>Region: <b style="color:#FF9900;">$REGION</b></p>
</body>
</html>
EOF

✅ Checkpoint: web-server-lt listed in EC2 → Launch Templates

Step 2 — Create Target Group

EC2 → Load Balancing → Target Groups → Create target group

Target type:         Instances
Name:                web-server-tg
Protocol / Port:     HTTP / 80
VPC:                 Default VPC
Health check path:   /
Healthy threshold:   2
Unhealthy threshold: 2
Timeout:             5 seconds
Interval:            30 seconds

⚠️ On "Register targets" page — skip it entirely, click Create directly. ASG will register instances automatically.

✅ Checkpoint: web-server-tg shows status "unused" — this is correct, no instances yet.

Step 3 — Create Application Load Balancer

EC2 → Load Balancers → Create Load Balancer → Application Load Balancer

Name:           web-app-alb
Scheme:         Internet-facing
VPC:            Default VPC
Subnets:        Select 2 subnets in DIFFERENT AZs (mandatory — fails with one)
Security Group: SG-ALB (remove default SG)
Listener:       HTTP : 80 → Forward to web-server-tg

Wait ~2-3 minutes for state to become Active.

✅ Checkpoint: Copy ALB DNS name — you'll need it to test.

Step 4 — Create Auto Scaling Group

EC2 → Auto Scaling Groups → Create Auto Scaling group

Name:             web-app-asg
Launch template:  web-server-lt (Latest)
VPC + Subnets:    Same 2 subnets as ALB

Load balancing:   Attach to existing → web-server-tg | HTTP
Health check:     ELB, Grace period: 300 seconds

Desired: 2  |  Min: 1  |  Max: 4
Scaling: None for now (add in Step 5)

✅ Checkpoint: 2 EC2 instances launch within 1-2 min, both show Healthy in Target Group.

Step 5 — Configure Scaling Policy

ASG → web-app-asg → Automatic scaling → Create dynamic scaling policy

Policy type:   Target tracking scaling
Policy name:   cpu-target-tracking
Metric:        Average CPU Utilization
Target value:  50
Warmup:        300 seconds

AWS auto-creates two CloudWatch alarms (scale-out + scale-in).

Step 6 — Test & Verify

# Open ALB DNS in browser
http://<ALB-DNS-name>
# → "Hello from AWS!" + Instance ID + AZ

# Refresh 5-10 times
# → Instance ID changes = ALB routing to different instances ✅
# → AZ changes = multi-AZ distribution ✅

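# Trigger scale-out (SSH into the EC2 instances)
# SG-EC2 only allows HTTP from SG-ALB, so first add a temporary inbound
# SSH (22) rule from your own IP, or connect with Session Manager instead.
ssh -i your-key.pem ec2-user@<ec2-public-ip>
sudo yum install -y stress   # if the package is missing on your AMI, try stress-ng
stress --cpu 4 --timeout 600
# Run stress on every instance: the policy tracks the group's AVERAGE CPU,
# and one busy instance out of two only averages about 50%.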

# Watch in console:
# EC2 → Auto Scaling Groups → Activity tab (new instances launching)
# CloudWatch → Alarms (CPU alarm goes to ALARM state)

Expected timeline after stress starts:

T+0:00  CPU spikes
T+1:00  CloudWatch alarm: OK → ALARM
T+3:00  New EC2 launches
T+5:00  New instance healthy → ALB routes to 3 instances
T+10:00 Stress ends → CPU drops
T+25:00 ASG starts scaling in (the scale-in alarm needs about 15 minutes of low CPU); an idle group can shrink all the way to Min=1
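Instead of refreshing the browser by hand, a short loop can show how requests spread across instances while the group scales. This sketch assumes the lab's index.html, which embeds the instance ID in the page; the function name is hypothetical.

```shell
# Sketch: request the lab page several times and count responses per instance ID.
count_backends() {
  url=$1
  requests=${2:-10}   # number of requests, default 10
  i=0
  while [ "$i" -lt "$requests" ]; do
    # Pull the first "i-..." instance ID out of the returned HTML
    curl -s "$url" | grep -o 'i-[0-9a-f]\{8,\}' | head -n 1
    i=$((i + 1))
  done | sort | uniq -c
}
```

Running `count_backends http://<ALB-DNS-name> 20` before and after the scale-out should show a third instance ID appearing in the output.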

Step 7 — Monitor with CloudWatch

CloudWatch → Metrics → EC2 → By Auto Scaling Group
  → web-app-asg → CPUUtilization → Period: 1 min

CloudWatch → Metrics → ApplicationELB
  → Add: RequestCount, HealthyHostCount, TargetResponseTime

🧹 Cleanup — Delete in This Exact Order

1. Auto Scaling Group  → terminates all EC2 instances automatically
   Wait 2-3 min for all instances to fully terminate

2. Application Load Balancer

3. Target Group

4. Launch Template

5. Security Groups  → delete SG-EC2 first, then SG-ALB
   (SG-ALB referenced by SG-EC2 — must delete SG-EC2 first)

6. CloudWatch Alarms  → target tracking alarms are removed along with the policy/ASG; confirm none are left behind

💰 ALB = ~$16/month + EC2 costs. Always clean up after labs.

📝 Assignment

Task 1 (Mandatory) — Prove Self-Healing:
Manually terminate one running EC2 instance. Watch ASG Activity tab — confirm it auto-launches a replacement. Verify ALB kept serving traffic with zero downtime.

Task 2 (Intermediate) — Step Scaling Policy:
Create a Step Scaling policy: add 2 instances if CPU > 70%, add 1 if CPU 50-70%, remove 1 if CPU < 20%.

Task 3 (Intermediate) — HTTPS:
Add HTTPS listener (port 443) to ALB. Use AWS Certificate Manager for a free certificate. Add listener rule: HTTP → HTTPS redirect.

Task 4 (Advanced) — Scheduled Scaling:
Scale up to 4 instances at 9AM weekdays, down to 1 at 6PM weekdays. Set times in UTC.

Task 5 (Advanced) — CloudWatch Dashboard:
Build a dashboard with 4 widgets: CPU%, RequestCount, HealthyHostCount, GroupInServiceInstances.

⚡ Quick Revision

LOAD BALANCING
  Manager directing traffic to healthy servers
  Eliminates single point of failure
  ALB algorithm: round robin (default)

SCALING
  Horizontal = more instances (cloud-native, preferred)
  Vertical   = bigger instance (simple, has ceiling, needs downtime)

SCALING POLICIES
  Target Tracking = keep metric at target, AWS manages alarms (use this)
  Scheduled       = scale at specific times
  Predictive      = ML-based, needs 24h of history to start, uses up to 14 days

ASG CAPACITY
  MIN     = hard floor, never below
  DESIRED = ASG always works to maintain this count
  MAX     = hard ceiling, never exceed

HEALTH CHECKS
  EC2  = is instance running? (basic)
  ELB  = is app returning HTTP 200? (always use this)
  Grace Period = time after launch during which failed checks are ignored (300s console default)

LB TYPES
  ALB  = Layer 7, HTTP/HTTPS, path/host routing → web apps
  NLB  = Layer 4, TCP/UDP, microsecond latency → gaming/IoT
  GWLB = Layer 3, security appliances, firewalls

STICKY SESSIONS
  Cookie pins user to one EC2 for session duration
  Proper fix: external session store (Redis/ElastiCache)

CONNECTION DRAINING
  In-flight requests complete before instance terminates
  Default: 300 seconds | Enables zero-downtime scaling

INSTANCE REFRESH
  Rolling AMI replacement across entire ASG fleet
  Zero-downtime | Kubernetes rolling update equivalent

SECURITY GROUP PATTERN
  SG-ALB: HTTP 80 from 0.0.0.0/0
  SG-EC2: HTTP 80 FROM SG-ALB only
  Direct EC2 access blocked from internet

💼 Interview Questions

Q1: Horizontal vs Vertical scaling — what's the difference and which does AWS recommend?
Horizontal = add more instances, requires LB, unlimited scale, fault tolerant. Vertical = upgrade existing instance, simple, has a hard ceiling, requires downtime. AWS recommends horizontal scaling for cloud-native apps.

Q2: What are Min, Desired, and Max capacity in an ASG?
Min = hard floor, ASG never drops below. Max = hard ceiling, ASG never exceeds. Desired = the target count ASG constantly works to maintain. Scaling policies adjust Desired within Min-Max boundaries.

Q3: Why use ELB health checks instead of EC2 health checks?
EC2 checks only verify the VM is running. ELB checks send a real HTTP request and verify the app returns 200. An EC2 can be "running" while Apache is crashed — EC2 checks miss it, ELB checks catch it within 60 seconds.

Q4: What is Connection Draining?
It allows in-flight requests to complete before an instance is terminated. ALB immediately stops sending new requests but waits up to 300 seconds for active connections to finish — enabling zero-downtime scaling and deployments.

Q5: ALB vs NLB — when do you use each?
ALB for HTTP/HTTPS web apps and microservices — understands application content, supports path/host routing. NLB for ultra-low latency TCP/UDP workloads (gaming, IoT, VoIP) or when a static IP is required.

Q6: Why reference SG-ALB as source in SG-EC2 instead of 0.0.0.0/0?
Referencing SG-ALB as source means only traffic that came through the ALB can reach EC2. Direct internet access to the EC2 IP is blocked. More secure, no IP management needed as instances scale.

Q7: What is Instance Refresh?
ASG's native mechanism to replace all running instances with a new Launch Template version (new AMI or config). It handles draining, replacement, and health checks automatically. Equivalent to a Kubernetes rolling update.

AWS Session 6 — ELB & Auto Scaling | Cloud + DevOps learning journey — Systems Engineer → Cloud/DevOps Engineer