Emmanuel Chukwudi

Posted on Jun 14

Azure Virtual Machine Scale Sets (VMSS): A Complete Guide

#azure #devops #cloud #vmss

Learn how to deploy, configure, and auto-scale fleets of identical VMs on Azure, from zero to production-ready.

Introduction

Imagine your application suddenly gets a traffic spike: maybe a product launch, a viral post, or a scheduled batch job. Without automation, you're either over-provisioned (paying for idle VMs) or scrambling to manually spin up instances while users hit errors.

Azure Virtual Machine Scale Sets (VMSS) solve this. They let you deploy and manage a group of identical, load-balanced VMs that automatically scale in or out based on demand, all from a single configuration.

In this guide, we'll cover:

What VMSS is and how it works under the hood
Orchestration modes: Uniform vs Flexible
How to create a VMSS via the Portal and Azure CLI
Configuring autoscaling rules
Integrating with a Load Balancer
Updating your Scale Set (rolling upgrades)
Real-world best practices

Let's build.

What is a Virtual Machine Scale Set?

A Virtual Machine Scale Set is an Azure compute resource that lets you create and manage a group of load-balanced VMs. Key characteristics:

All VMs in a scale set are created from the same base image and configuration
The set can automatically increase or decrease the number of VM instances based on demand or a schedule
VMs are distributed across Availability Zones or Fault/Update Domains for high availability
Integrates natively with Azure Load Balancer, Application Gateway, and Azure Monitor

How It Works

             ┌─────────────────────────────────────┐
             │         Azure Load Balancer         │
             └──────────────────┬──────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           │                    │                    │
    ┌──────▼───────┐     ┌──────▼───────┐     ┌──────▼───────┐
    │    VM #1     │     │    VM #2     │     │    VM #3     │
    │ (instance 0) │     │ (instance 1) │     │ (instance 2) │
    └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
           │                    │                    │
           └────────────────────┼────────────────────┘
                                │
                     ┌──────────▼──────────┐
                     │  Autoscale Engine   │
                     │   (Azure Monitor)   │
                     └─────────────────────┘

When CPU crosses a threshold (or any metric you define), Azure's autoscale engine fires and adds or removes VM instances automatically. The Load Balancer redistributes traffic across the new fleet.

Orchestration Modes: Uniform vs Flexible

Before creating a VMSS, you need to choose an orchestration mode. This is one of the most important decisions and a common source of confusion.

Uniform Orchestration (Classic)

All VMs are identical: same size, same image, same config
Azure manages the VMs as a fleet; you interact with the scale set, not individual VMs
Best for stateless workloads: web servers, API backends, batch processing
Supports up to 1,000 VM instances (with platform images)
Built-in integration with autoscale

Flexible Orchestration (Modern — Recommended)

VMs can have different sizes and configurations within the same scale set
You get full VM-level control: SSH, unique managed identities, individual updates
Supports mixing spot and on-demand instances in the same set
Works across Availability Zones with zone balancing
Supports up to 1,000 instances
Microsoft's recommended mode for new workloads

Feature	Uniform	Flexible
VM customization	All identical	Individual VM control
Max instances	1,000	1,000
Autoscale	✅	✅
Spot + On-demand mix	❌	✅
Availability Zones	✅	✅
Use case	Stateless fleets	General purpose

For new projects, default to Flexible orchestration unless you have a specific reason for Uniform.

Prerequisites

Before creating a VMSS, make sure you have:

An Azure subscription
Azure CLI installed: verify with az --version (the Windows installer is at aka.ms/installazurecliwindows)
A Resource Group and Virtual Network ready (we'll create these below)

# Login to Azure
az login

# Set your subscription
az account set --subscription "your-subscription-id"

# Create a resource group
az group create \
  --name vmss-demo-rg \
  --location eastus

# Create a VNet and subnet
az network vnet create \
  --resource-group vmss-demo-rg \
  --name vmss-vnet \
  --address-prefix 10.0.0.0/16 \
  --subnet-name vmss-subnet \
  --subnet-prefix 10.0.1.0/24

Part 1: Create a VMSS via the Azure Portal

Step 1: Navigate to Virtual Machine Scale Sets

In the Azure Portal, search for "Virtual machine scale sets" in the top search bar
Click + Create

Step 2: Basics Tab

Fill in the following:

Field	Value
Subscription	Your subscription
Resource group	`vmss-demo-rg`
Virtual machine scale set name	`my-app-vmss`
Region	East US
Availability zone	Zones 1, 2, 3 (select all for HA)
Orchestration mode	Flexible (recommended)
Security type	Standard
Image	Ubuntu Server 22.04 LTS
VM architecture	x64
Size	Standard_B2s (2 vCPUs, 4 GB RAM)
Authentication type	SSH public key
Username	`azureuser`
SSH public key	Paste your public key

Tip on Availability Zones: Selecting all three zones means Azure will spread your VMs across three physically separate datacenters. If one zone goes down, the others keep serving traffic.

Step 3: Disks Tab

OS disk type: Premium SSD (for production) or Standard SSD (for dev/test)
Encryption: Platform-managed key (default) or Customer-managed key

Step 4: Networking Tab

Virtual network: Select vmss-vnet
Subnet: vmss-subnet
Load balancing: Select Azure load balancer
Click Create a load balancer:
- Name: my-lb
- Type: Public (for internet-facing) or Internal (for private)
- Protocol: TCP
- Frontend port: 80
- Backend port: 80
Public IP address: Create new → my-app-lb-pip
NIC network security group: Advanced → Create NSG
- Add inbound rule: Allow TCP 80 from Internet
- Add inbound rule: Allow TCP 22 from your IP (for SSH)

Step 5: Scaling Tab

This is where VMSS gets powerful.

Field	Value
Initial instance count	2
Scaling policy	Autoscale

Configure autoscale:

Click Configure
Set minimum instances: 2
Set maximum instances: 10
Set default instance count: 2

Add a scale-out rule:

Metric: Percentage CPU
Operator: Greater than
Threshold: 75%
Duration: 5 minutes
Action: Increase count by 2
Cool down: 5 minutes

Add a scale-in rule:

Metric: Percentage CPU
Operator: Less than
Threshold: 25%
Duration: 5 minutes
Action: Decrease count by 1
Cool down: 5 minutes

Cool down periods prevent flapping, where the scale set rapidly adds and removes instances in response to brief spikes. Always set a cool down of at least 5 minutes.
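To see how the two rules interact with the minimum and maximum, here is a simplified model in shell. It is only a sketch of the arithmetic, not Azure's actual autoscale engine: next_capacity is a hypothetical helper, it takes whole-number CPU percentages, and it ignores the 5-minute evaluation window and the cool down timers.

```shell
# Simplified model of the scale-out and scale-in rules (hypothetical helper, not an Azure API).
# Usage: next_capacity <avg_cpu_percent> <current_instance_count>
next_capacity() {
  local cpu="$1"
  local count="$2"
  local min=2     # minimum instances from the autoscale settings
  local max=10    # maximum instances from the autoscale settings

  if [ "$cpu" -gt 75 ]; then
    count=$(( count + 2 ))    # scale-out rule: add 2 instances
  elif [ "$cpu" -lt 25 ]; then
    count=$(( count - 1 ))    # scale-in rule: remove 1 instance
  fi

  # Clamp the result to the configured minimum and maximum
  if [ "$count" -lt "$min" ]; then count=$min; fi
  if [ "$count" -gt "$max" ]; then count=$max; fi
  echo "$count"
}

next_capacity 90 2    # prints 4
next_capacity 90 9    # prints 10 (capped at the maximum)
next_capacity 10 2    # prints 2 (already at the minimum)
next_capacity 50 6    # prints 6 (no rule fires)
```

Note the asymmetry: scaling out by 2 but in by 1 makes the fleet grow quickly under load and shrink cautiously afterwards.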

Step 6: Health Tab

Enable Application health monitoring:

Extension: Application Health Extension
Protocol: HTTP
Port: 80
Path: /health (or / if you don't have a health endpoint)

This lets Azure know whether individual VM instances are actually serving traffic successfully — not just whether the VM is running.

Step 7: Advanced Tab

Custom data (cloud-init): You can pass a startup script that runs on every new instance:

#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - echo "Hello from $(hostname)" > /var/www/html/index.html

Paste this in the Custom data field (base64 encoding is handled automatically by the portal).

Step 8: Review + Create

Review all settings, then click Create. Azure will:

Provision the Load Balancer and Public IP
Create the initial VM instances (2 in our case)
Register them with the load balancer backend pool
Apply the autoscale policy

The deployment typically takes 3–5 minutes.

Part 2: Create a VMSS via Azure CLI

The CLI approach is faster, scriptable, and version-controllable, which makes it ideal for CI/CD pipelines and IaC workflows.

Create the VMSS

az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image Ubuntu2204 \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --vnet-name vmss-vnet \
  --subnet vmss-subnet \
  --public-ip-address my-app-lb-pip \
  --lb my-app-lb \
  --backend-pool-name my-app-backend \
  --lb-sku Standard \
  --zones 1 2 3 \
  --orchestration-mode Flexible \
  --upgrade-policy-mode Rolling \
  --custom-data cloud-init.yaml

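The --lb flag automatically creates an Azure Load Balancer and wires the VMSS backend pool to it. Note that Rolling upgrade mode needs a health signal (a load balancer health probe or the Application Health extension). If the create call rejects the policy, start with --upgrade-policy-mode Manual and switch to Rolling in Part 3 once health monitoring is configured.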
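The --custom-data cloud-init.yaml argument reads a local file, so the file has to exist before az vmss create runs. A minimal way to create it is sketched here. It reuses the cloud-init from Step 7; the extra /health line is an assumption added for illustration, so that a health check against /health has something to answer.

```shell
# Write cloud-init.yaml to the current directory.
# The quoted 'EOF' stops the local shell from expanding $(hostname),
# so it is evaluated on each VM at first boot instead.
cat > cloud-init.yaml <<'EOF'
#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - echo "Hello from $(hostname)" > /var/www/html/index.html
  # Assumption (not in the portal example): a static file to answer /health checks
  - echo "OK" > /var/www/html/health
EOF
```

Keeping this file in version control next to your deployment script means every new instance boots with the same, reviewable configuration.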

Open Port 80 on the Load Balancer

# Create a health probe first, because the rule below references it
az network lb probe create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name healthProbe \
  --protocol http \
  --port 80 \
  --path /

# Create a load balancer rule for HTTP traffic
az network lb rule create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name http-rule \
  --protocol tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name loadBalancerFrontEnd \
  --backend-pool-name my-app-backend \
  --probe-name healthProbe

Configure Autoscale Rules

# Get the VMSS resource ID
VMSS_ID=$(az vmss show \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --query id \
  --output tsv)

# Create the autoscale profile
az monitor autoscale create \
  --resource-group vmss-demo-rg \
  --resource $VMSS_ID \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name my-app-autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Add scale-out rule (CPU > 75% → add 2 instances)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 2 \
  --cooldown 5

# Add scale-in rule (CPU < 25% → remove 1 instance)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU < 25 avg 5m" \
  --scale in 1 \
  --cooldown 5

Verify the Deployment

# List all instances in the scale set
az vmss list-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --output table

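# Check instance health
# Numeric IDs such as 0 are typical of Uniform mode. In Flexible mode, instance IDs
# are typically VM names, so copy one from the list-instances output above.
az vmss get-instance-view \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 0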

# Get the public IP of the load balancer
az network public-ip show \
  --resource-group vmss-demo-rg \
  --name my-app-lb-pip \
  --query ipAddress \
  --output tsv

Open a browser and navigate to the public IP — you should see your Nginx page.
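Because each instance writes its own hostname into the page, you can also check from a terminal that traffic is spread across instances. The helpers below are a hypothetical sketch (sample_backends and summarize_responses are names made up here) and assume curl is installed and port 80 is reachable.

```shell
# Count identical lines on stdin, most frequent first.
summarize_responses() {
  sort | uniq -c | sort -rn
}

# Request the page N times (default 20) and tally which instances answered.
# Each curl call opens a new connection, so the load balancer can pick a different backend.
sample_backends() {
  local ip="$1"
  local n="${2:-20}"
  local i=0
  while [ "$i" -lt "$n" ]; do
    curl -s --max-time 5 "http://$ip/"
    i=$(( i + 1 ))
  done | summarize_responses
}

# Usage (replace the placeholder with the address returned by the previous command):
# sample_backends <load-balancer-public-ip> 20
```

If only one hostname ever shows up, check that all instances are healthy in the backend pool before suspecting the load balancer.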

Part 3: Upgrade Policies and How to Update Your Fleet

One of the trickiest parts of managing a scale set is rolling out updates (new OS image, new app version) without downtime. VMSS supports three upgrade modes:

Automatic Upgrades

Azure automatically upgrades VM instances as soon as the scale set model is updated. No manual intervention needed, but instances may restart without warning.

az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Automatic

Best for: Dev/test environments.

Rolling Upgrades (Recommended for Production)

Azure upgrades VMs in batches, validating health before moving to the next batch. Requires health probes to be configured.

az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Rolling \
  --set upgradePolicy.rollingUpgradePolicy.maxBatchInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyUpgradedInstancePercent=5 \
  --set upgradePolicy.rollingUpgradePolicy.pauseTimeBetweenBatches=PT30S

With this config, Azure upgrades up to 20% of VMs at a time, waits 30 seconds between batches, and stops if more than 20% of all instances or more than 5% of upgraded instances become unhealthy.
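To get a feel for what these percentages mean for a given fleet size, here is a rough back-of-the-envelope helper. It is a hypothetical sketch, not an Azure API, and it assumes the batch size is the percentage rounded down with a floor of one instance; the platform's exact rounding may differ.

```shell
# Rough estimate of rolling-upgrade batches (hypothetical helper, not an Azure API).
# Assumption: batch size is the percentage rounded down, with at least 1 instance.
# Usage: estimate_rollout <instance_count> <max_batch_percent> <pause_seconds>
estimate_rollout() {
  local instances="$1"
  local percent="$2"
  local pause="$3"

  local batch=$(( instances * percent / 100 ))
  if [ "$batch" -lt 1 ]; then batch=1; fi

  # Ceiling division: number of batches needed to cover every instance
  local batches=$(( (instances + batch - 1) / batch ))
  # Pauses happen between batches, so there is one fewer pause than batches
  local total_pause=$(( (batches - 1) * pause ))

  echo "batch_size=$batch batches=$batches pause_total=${total_pause}s"
}

estimate_rollout 10 20 30   # prints: batch_size=2 batches=5 pause_total=120s
```

The pause total is only a lower bound on rollout time, since each batch also needs time to upgrade and report healthy.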

Manual Upgrades

The scale set model updates, but instances are only upgraded when you explicitly tell Azure to do so. Maximum control, but requires operator action.

# Switch the scale set to manual upgrade mode
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Manual

# After changing the model (e.g., a new image version), upgrade specific instances
# (use the instance IDs reported by az vmss list-instances)
az vmss update-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-ids 0 1 2

Part 4: Scaling Operations

Manual Scaling

# Scale to 5 instances manually
az vmss scale \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --new-capacity 5

Schedule-Based Scaling

Useful for predictable traffic patterns (e.g., scale up at 8am, scale down at 8pm).

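# Raise the baseline from 08:00 to 20:00 UTC on weekdays
# --copy-rules default reuses the CPU rules from the default profile;
# without it, the new profile has no scaling rules of its own
az monitor autoscale profile create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --name weekday-peak \
  --copy-rules default \
  --min-count 4 \
  --max-count 10 \
  --count 4 \
  --recurrence week mon tue wed thu fri \
  --timezone "UTC" \
  --start 08:00 \
  --end 20:00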

SSH Into a Specific Instance

# Get the NAT rules to find which port maps to which instance
az network lb inbound-nat-rule list \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --output table

# SSH using the NAT port (e.g., port 50000 maps to instance 0)
ssh -p 50000 azureuser@<load-balancer-public-ip>

Part 5: Monitoring Your Scale Set

View Autoscale Activity

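# The filter is case-sensitive; autoscale operations are logged under
# names such as Microsoft.Insights/AutoscaleSettings/...
az monitor activity-log list \
  --resource-group vmss-demo-rg \
  --max-events 20 \
  --query "[?contains(operationName.value, 'AutoscaleSettings')]" \
  --output table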

Key Metrics to Monitor in Azure Monitor

Metric	Description	Alert threshold
Percentage CPU	Average CPU across all instances	> 80% for 10 min
Network In/Out	Traffic volume	Spike detection
Disk Read/Write	Storage I/O	> 90% of provisioned IOPS
VmAvailabilityMetric	Instance health status	Any unhealthy
Autoscale Scale Actions	Scale in/out events	Alert on unexpected scale-in

# Create a CPU alert
az monitor metrics alert create \
  --resource-group vmss-demo-rg \
  --name high-cpu-alert \
  --scopes $VMSS_ID \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action my-action-group \
  --description "VMSS CPU exceeded 85% for 5 minutes"
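The alert above passes --action my-action-group, so an action group with that name has to exist first. A minimal sketch of creating one is shown here, wrapped in a helper function so it can be sourced without running anything. The short name, receiver name, and email address are placeholders, not values from this guide.

```shell
# Create the action group referenced by the alert.
# Assumptions: vmssalert, oncall and oncall@example.com are placeholders.
create_action_group() {
  az monitor action-group create \
    --resource-group vmss-demo-rg \
    --name my-action-group \
    --short-name vmssalert \
    --action email oncall oncall@example.com
}

# Usage: run once before creating the alert
# create_action_group
```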

Part 6: Using a Custom Image with VMSS

Remember the Azure Custom Image we built in the previous article? Here's how to plug it into a VMSS.

# Get your gallery image version ID
IMAGE_ID=$(az sig image-version show \
  --resource-group my-resource-group \
  --gallery-name MyAppGallery \
  --gallery-image-definition MyAppImage \
  --gallery-image-version 1.0.0 \
  --query id \
  --output tsv)

# Create VMSS using your custom image
az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image $IMAGE_ID \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --lb my-app-lb \
  --zones 1 2 3 \
  --orchestration-mode Flexible

This is the golden image pattern in action — every instance spins up from your pre-configured, pre-hardened image.

Architecture: Production-Ready VMSS Setup

Here's what a production VMSS deployment typically looks like:

                         Internet
                            │
                   ┌────────▼────────┐
                   │  Azure Front    │
                   │  Door / WAF     │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │  App Gateway /  │
                   │  Load Balancer  │
                   └────────┬────────┘
                            │
              ┌─────────────┼─────────────┐
              │ Zone 1      │ Zone 2      │ Zone 3
         ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
         │  VM #1  │   │  VM #2  │   │  VM #3  │
         │ (VMSS)  │   │ (VMSS)  │   │ (VMSS)  │
         └────┬────┘   └────┬────┘   └────┬────┘
              │             │             │
              └─────────────┼─────────────┘
                            │
                   ┌────────▼────────┐
                   │  Azure Monitor  │
                   │  + Autoscale    │
                   └────────┬────────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
        ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
        │ Azure DB  │ │ Key Vault │ │  Storage  │
        └───────────┘ └───────────┘ └───────────┘

Key components:

Azure Front Door or WAF: DDoS protection and global routing at the edge
Application Gateway or Load Balancer: Layer 7 or Layer 4 traffic distribution
VMSS across 3 Availability Zones: High availability against datacenter failures
Azure Monitor + Autoscale: Reactive and scheduled scaling
Azure Key Vault: Secrets injected at runtime, never baked into images
Managed Identity: VM instances authenticate to Azure services without credentials

Best Practices

Design for Statelessness

VMs in a scale set can be added or removed at any time. Your application should:

Store session data in Azure Cache for Redis, not in-memory
Write files to Azure Blob Storage or a shared file system, not local disk
Use Azure Service Bus or Event Hub for message queuing

Use Spot Instances for Cost Savings

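For fault-tolerant, interruptible workloads (batch jobs, rendering, CI runners), run on Spot instances, optionally mixed with on-demand capacity. The following command creates a Spot-only scale set that starts empty: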

az vmss create \
  --resource-group vmss-demo-rg \
  --name my-batch-vmss \
  --priority Spot \
  --eviction-policy Deallocate \
  --max-price 0.05 \
  --image Ubuntu2204 \
  --vm-sku Standard_D4s_v3 \
  --instance-count 0

Spot instances can save up to 90% compared to on-demand pricing — with the tradeoff that Azure can evict them when capacity is needed.
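If you want a guaranteed on-demand baseline with Spot capacity on top, the Azure CLI exposes Spot Priority Mix through --regular-priority-count and --regular-priority-percentage. The sketch below is illustrative only: the name my-mixed-vmss and the 2 / 25 split are assumptions, and it is worth checking the current az vmss create reference for the constraints that apply to your region and VM size.

```shell
# Sketch: mixed Spot / on-demand scale set using Spot Priority Mix.
# Assumptions: the name my-mixed-vmss and the 2 / 25 split are illustrative only.
create_mixed_vmss() {
  az vmss create \
    --resource-group vmss-demo-rg \
    --name my-mixed-vmss \
    --orchestration-mode Flexible \
    --image Ubuntu2204 \
    --vm-sku Standard_D4s_v3 \
    --instance-count 4 \
    --priority Spot \
    --eviction-policy Deallocate \
    --regular-priority-count 2 \
    --regular-priority-percentage 25 \
    --admin-username azureuser \
    --generate-ssh-keys
}

# Usage:
# create_mixed_vmss
```

Here the first 2 instances are intended to be regular priority, and roughly 25% of instances beyond that baseline stay regular while the rest run as Spot.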

Always Configure Health Probes

Without health probes, Azure doesn't know if your application is actually working. A VM could be running but serving 500 errors, and it would stay in the pool and keep receiving traffic.

az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set virtualMachineProfile.extensionProfile.extensions='[{
    "name": "HealthExtension",
    "properties": {
      "publisher": "Microsoft.ManagedServices",
      "type": "ApplicationHealthLinux",
      "typeHandlerVersion": "1.0",
      "settings": {
        "protocol": "http",
        "port": 80,
        "requestPath": "/health"
      }
    }
  }]'
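Setting the extensions property directly replaces the whole list, which can drop extensions that are already installed. As an alternative sketch, az vmss extension set adds a single extension to the scale set model. The helper name add_health_extension is made up here, and the settings mirror the JSON above.

```shell
# Alternative sketch: add the Application Health extension on its own,
# instead of overwriting the whole extensions list.
add_health_extension() {
  az vmss extension set \
    --resource-group vmss-demo-rg \
    --vmss-name my-app-vmss \
    --name ApplicationHealthLinux \
    --publisher Microsoft.ManagedServices \
    --version 1.0 \
    --settings '{"protocol": "http", "port": 80, "requestPath": "/health"}'
}

# Usage:
# add_health_extension
```

With a Manual upgrade policy, existing instances only pick up the extension after you run az vmss update-instances against them.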

Protect Against Accidental Scale-In

In production, you may want to prevent certain instances from being terminated during a scale-in event (e.g., an instance running a long job).

# Protect a specific instance from scale-in
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 2 \
  --protect-from-scale-in true

Quick Reference — Common CLI Commands

# Create VMSS
az vmss create --resource-group <rg> --name <name> --image <image> --instance-count <n>

# List instances
az vmss list-instances --resource-group <rg> --name <name> --output table

# Scale manually
az vmss scale --resource-group <rg> --name <name> --new-capacity <n>

# Update instances to latest model
az vmss update-instances --resource-group <rg> --name <name> --instance-ids "*"

# Reimage an instance (fresh OS disk)
az vmss reimage --resource-group <rg> --name <name> --instance-id <id>

# Delete a specific instance
az vmss delete-instances --resource-group <rg> --name <name> --instance-ids <id>

# Show autoscale settings
az monitor autoscale show --resource-group <rg> --name <autoscale-name>

# Delete the entire VMSS
az vmss delete --resource-group <rg> --name <name>
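When you are done experimenting, deleting the resource group is the simplest way to stop paying for the demo. The helper below is a hypothetical convenience wrapper: it defaults to the vmss-demo-rg group used in this guide and removes everything inside it, so double-check the name before running it.

```shell
# Delete the demo resource group and everything in it (scale set, load balancer, VNet).
# --yes skips the confirmation prompt; --no-wait returns without waiting for completion.
cleanup_demo() {
  az group delete --name "${1:-vmss-demo-rg}" --yes --no-wait
}

# Usage:
# cleanup_demo                # deletes vmss-demo-rg
# cleanup_demo other-rg-name  # deletes a different resource group
```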

Conclusion

Azure Virtual Machine Scale Sets are one of the most powerful tools in a cloud engineer's toolkit. Once you understand the orchestration modes, upgrade policies, and autoscale configuration, you can build infrastructure that handles anything from a quiet weekend to a viral traffic spike without manual intervention.

Recap of what we covered:

Uniform vs Flexible orchestration modes
Creating a VMSS via Portal and Azure CLI
Wiring up a Load Balancer with health probes
Configuring CPU-based and schedule-based autoscaling
Rolling upgrade strategies for zero-downtime deployments
Using custom images from Azure Compute Gallery
Production architecture patterns and best practices

Found this helpful? Drop a ❤️ and share it with your team. Questions or corrections? Leave a comment below.