Learn how to deploy, configure, and auto-scale fleets of identical VMs on Azure from zero to production-ready.
Introduction
Imagine your application suddenly gets a traffic spike: maybe a product launch, a viral post, or a scheduled batch job. Without automation, you're either over-provisioned (paying for idle VMs) or scrambling to spin up instances manually while users hit errors.
Azure Virtual Machine Scale Sets (VMSS) solve this. They let you deploy and manage a group of identical, load-balanced VMs that automatically scale in or out based on demand, all from a single configuration.
In this guide, we'll cover:
- What VMSS is and how it works under the hood
- Orchestration modes: Uniform vs Flexible
- How to create a VMSS via the Portal and Azure CLI
- Configuring autoscaling rules
- Integrating with a Load Balancer
- Updating your Scale Set (rolling upgrades)
- Real-world best practices
Let's build.
What is a Virtual Machine Scale Set?
A Virtual Machine Scale Set is an Azure compute resource that lets you create and manage a group of load-balanced VMs. Key characteristics:
- All VMs in a scale set are created from the same base image and configuration
- The set can automatically increase or decrease the number of VM instances based on demand or a schedule
- VMs are distributed across Availability Zones or Fault/Update Domains for high availability
- Integrates natively with Azure Load Balancer, Application Gateway, and Azure Monitor
How It Works
```text
            ┌─────────────────────────────┐
            │     Azure Load Balancer     │
            └──────────────┬──────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────▼───────┐    ┌──────▼───────┐    ┌──────▼───────┐
│    VM #1     │    │    VM #2     │    │    VM #3     │
│ (instance 0) │    │ (instance 1) │    │ (instance 2) │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                ┌──────────▼──────────┐
                │  Autoscale Engine   │
                │   (Azure Monitor)   │
                └─────────────────────┘
```
When CPU crosses a threshold (or any metric you define), Azure's autoscale engine fires and adds or removes VM instances automatically. The Load Balancer redistributes traffic across the new fleet.
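To make the scale-out/scale-in decision concrete, here is a deliberately simplified bash model of a single evaluation. It assumes the rules used later in this guide (average CPU above 75% adds 2 instances, below 25% removes 1, capacity clamped to 2 through 10). It is an illustration, not how Azure Monitor implements autoscale: it ignores cool downs, metric time windows, and multiple competing rules. `next_capacity` is a hypothetical helper.

```bash
# Simplified model of one autoscale evaluation (illustration only).
# Assumed rules, matching the examples later in this guide:
#   avg CPU > 75  -> scale out by 2
#   avg CPU < 25  -> scale in by 1
#   capacity is always clamped to [MIN_COUNT, MAX_COUNT]
MIN_COUNT=2
MAX_COUNT=10

next_capacity() {
  local current=$1 avg_cpu=$2 target
  target=$current
  if [ "$avg_cpu" -gt 75 ]; then
    target=$((current + 2))
  elif [ "$avg_cpu" -lt 25 ]; then
    target=$((current - 1))
  fi
  # Never go outside the configured instance limits
  [ "$target" -gt "$MAX_COUNT" ] && target=$MAX_COUNT
  [ "$target" -lt "$MIN_COUNT" ] && target=$MIN_COUNT
  echo "$target"
}

# Example evaluations (average CPU as an integer percentage)
echo "4 instances at 80% CPU -> $(next_capacity 4 80)"   # prints ... -> 6
echo "9 instances at 90% CPU -> $(next_capacity 9 90)"   # prints ... -> 10 (clamped)
echo "2 instances at 10% CPU -> $(next_capacity 2 10)"   # prints ... -> 2 (clamped)
```

The clamping step is why the minimum and maximum counts matter as much as the thresholds: they bound what any rule can do.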
Orchestration Modes: Uniform vs Flexible
Before creating a VMSS, you need to choose an orchestration mode. This is one of the most important decisions and a common source of confusion.
Uniform Orchestration (Classic)
- All VMs are identical: same size, same image, same config
- Azure manages the VMs as a fleet; you interact with the scale set, not individual VMs
- Best for stateless workloads: web servers, API backends, batch processing
- Supports up to 1,000 VM instances (with platform images)
- Built-in integration with autoscale
Flexible Orchestration (Modern, Recommended)
- VMs can have different sizes and configurations within the same scale set
- You get full VM-level control: SSH, unique managed identities, individual updates
- Supports mixing spot and on-demand instances in the same set
- Works across Availability Zones with zone balancing
- Supports up to 1,000 instances
- Microsoft's recommended mode for new workloads
| Feature | Uniform | Flexible |
|---|---|---|
| VM customization | All identical | Individual VM control |
| Max instances | 1,000 | 1,000 |
| Autoscale | ✅ | ✅ |
| Spot + On-demand mix | ❌ | ✅ |
| Availability Zones | ✅ | ✅ |
| Use case | Stateless fleets | General purpose |
For new projects, default to Flexible orchestration unless you have a specific reason for Uniform.
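If you are working with an existing scale set, you can check which mode it uses before planning changes. A minimal sketch, assuming the Azure CLI is installed and you are logged in with `az login`; `vmss_orchestration_mode` is a hypothetical helper name.

```bash
# Hypothetical helper: print the orchestration mode (Uniform or Flexible)
# of an existing scale set. Requires the Azure CLI and an active login.
vmss_orchestration_mode() {
  local resource_group=$1 vmss_name=$2
  az vmss show \
    --resource-group "$resource_group" \
    --name "$vmss_name" \
    --query orchestrationMode \
    --output tsv
}

# Usage: vmss_orchestration_mode vmss-demo-rg my-app-vmss
```

The mode is fixed when the scale set is created, so moving an existing Uniform scale set to Flexible means creating a new scale set.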
Prerequisites
Before creating a VMSS, make sure you have:
- An Azure subscription
- Azure CLI installed: verify with `az --version` (install from aka.ms/installazurecliwindows)
- A Resource Group and Virtual Network ready (we'll create these below)
```bash
# Login to Azure
az login

# Set your subscription
az account set --subscription "your-subscription-id"

# Create a resource group
az group create \
  --name vmss-demo-rg \
  --location eastus

# Create a VNet and subnet
az network vnet create \
  --resource-group vmss-demo-rg \
  --name vmss-vnet \
  --address-prefix 10.0.0.0/16 \
  --subnet-name vmss-subnet \
  --subnet-prefix 10.0.1.0/24
```
Part 1: Create a VMSS via the Azure Portal
Step 1: Navigate to Virtual Machine Scale Sets
- In the Azure Portal, search for "Virtual machine scale sets" in the top search bar
- Click + Create
Step 2: Basics Tab
Fill in the following:
| Field | Value |
|---|---|
| Subscription | Your subscription |
| Resource group | vmss-demo-rg |
| Virtual machine scale set name | my-app-vmss |
| Region | East US |
| Availability zone | Zones 1, 2, 3 (select all for HA) |
| Orchestration mode | Flexible (recommended) |
| Security type | Standard |
| Image | Ubuntu Server 22.04 LTS |
| VM architecture | x64 |
| Size | Standard_B2s (2 vCPUs, 4 GB RAM) |
| Authentication type | SSH public key |
| Username | azureuser |
| SSH public key | Paste your public key |
Tip on Availability Zones: Selecting all three zones means Azure will spread your VMs across three physically separate datacenters. If one zone goes down, the others keep serving traffic.
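Not every VM size is offered in every zone of every region. Before selecting zones, you can ask the CLI which zones a size supports. This sketch assumes the East US region and the Standard_B2s size from the table above, and reads the zones from each SKU's `locationInfo`; `zones_for_size` is a hypothetical helper.

```bash
# Hypothetical helper: list the availability zones in which a VM size is
# offered for a region, using the SKU's locationInfo[].zones field.
zones_for_size() {
  local location=$1 size=$2
  az vm list-skus \
    --location "$location" \
    --resource-type virtualMachines \
    --query "[?name=='$size'].locationInfo[0].zones[]" \
    --output tsv
}

# Usage: zones_for_size eastus Standard_B2s
```

A size can be listed and still be restricted for your subscription. `az vm list-skus --location eastus --size Standard_B2s --output table` also shows a Restrictions column worth checking.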
Step 3: Disks Tab
- OS disk type: Premium SSD (for production) or Standard SSD (for dev/test)
- Encryption: Platform-managed key (default) or Customer-managed key
Step 4: Networking Tab
- Virtual network: Select `vmss-vnet`
- Subnet: `vmss-subnet`
- Load balancing: Select Azure load balancer
- Click Create a load balancer:
  - Name: `my-lb`
  - Type: Public (for internet-facing) or Internal (for private)
  - Protocol: TCP
  - Frontend port: 80
  - Backend port: 80
- Public IP address: Create new → `my-app-lb-pip`
- NIC network security group: Advanced → Create NSG
  - Add inbound rule: Allow TCP 80 from Internet
  - Add inbound rule: Allow TCP 22 from your IP (for SSH)
Step 5: Scaling Tab
This is where VMSS gets powerful.
| Field | Value |
|---|---|
| Initial instance count | 2 |
| Scaling policy | Autoscale |
Configure autoscale:
- Click Configure
- Set minimum instances: 2
- Set maximum instances: 10
- Set default instance count: 2
Add a scale-out rule:
- Metric: Percentage CPU
- Operator: Greater than
- Threshold: 75%
- Duration: 5 minutes
- Action: Increase count by 2
- Cool down: 5 minutes
Add a scale-in rule:
- Metric: Percentage CPU
- Operator: Less than
- Threshold: 25%
- Duration: 5 minutes
- Action: Decrease count by 1
- Cool down: 5 minutes
Cool down periods prevent flapping, where the scale set rapidly adds and removes instances in response to brief spikes. A cool down of at least 5 minutes is a sensible starting point.
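Cool downs also limit how fast you can grow. For the rules above, a rough lower bound on the time to go from the minimum to the maximum is: one full 5-minute evaluation window before the first action, then one cool down before each further action. This back-of-the-envelope sketch ignores VM provisioning time and metric delay, so real scale-outs take longer. `minutes_to_max` is a hypothetical helper.

```bash
# Rough lower bound (in minutes) for scaling from min to max with a
# single scale-out rule. Ignores provisioning time and metric latency.
minutes_to_max() {
  local min=$1 max=$2 step=$3 window=$4 cooldown=$5
  # Number of scale-out actions needed, rounded up
  local actions=$(( (max - min + step - 1) / step ))
  if [ "$actions" -eq 0 ]; then
    echo 0
    return
  fi
  # First action after one evaluation window, then one per cool down
  echo $(( window + (actions - 1) * cooldown ))
}

# Values from this guide: 2 -> 10 instances, +2 per action,
# 5-minute window, 5-minute cool down
minutes_to_max 2 10 2 5 5   # prints 20
```

If 20 minutes is too slow for your expected spikes, a larger scale-out step or a schedule-based profile (Part 4) helps more than a shorter cool down.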
Step 6: Health Tab
Enable Application health monitoring:
- Extension: Application Health Extension
- Protocol: HTTP
- Port: 80
- Path: `/health` (or `/` if you don't have a health endpoint)
This lets Azure know whether individual VM instances are actually serving traffic successfully — not just whether the VM is running.
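In HTTP mode, the Application Health extension treats an HTTP 200 from the configured path as healthy. To check your endpoint the same way before relying on it, here is a sketch to run on an instance, assuming `curl` is installed and the app listens on port 80; `check_health` is a hypothetical helper. Note that the cloud-init example in Step 7 only writes `index.html`, so `/health` returns 404 unless your app serves that path.

```bash
# Approximate the HTTP check: a 200 response from the health path means
# healthy, anything else (including no response) does not.
check_health() {
  local url=${1:-http://localhost:80/health}
  local code
  # -s: quiet, -o /dev/null: discard body, -w: print only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
  if [ "$code" = "200" ]; then
    echo "healthy ($code)"
    return 0
  fi
  echo "unhealthy (${code:-no response})"
  return 1
}

# Usage on an instance: check_health http://localhost/health
```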
Step 7: Advanced Tab
Custom data (cloud-init): You can pass a startup script that runs on every new instance:
```yaml
#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - echo "Hello from $(hostname)" > /var/www/html/index.html
```
Paste this in the Custom data field (base64 encoding is handled automatically by the portal).
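The portal (and `az vmss create --custom-data` in Part 2) handles encoding for you. If you deploy through an ARM template or the REST API instead, the scale set's `osProfile.customData` property expects a base64 string. A minimal sketch, assuming the config above is saved as `cloud-init.yaml`; `encode_custom_data` is a hypothetical helper.

```bash
# Encode a cloud-init file as a single-line base64 string for the
# customData property in ARM templates or REST calls.
encode_custom_data() {
  local file=${1:-cloud-init.yaml}
  # GNU base64 wraps long output; strip newlines to get one line
  base64 < "$file" | tr -d '\n'
}

# Usage: encode_custom_data cloud-init.yaml > custom-data.b64
```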
Step 8: Review + Create
Review all settings, then click Create. Azure will:
- Provision the Load Balancer and Public IP
- Create the initial VM instances (2 in our case)
- Register them with the load balancer backend pool
- Apply the autoscale policy
The deployment typically takes 3–5 minutes.
Part 2: Create a VMSS via Azure CLI
The CLI approach is faster, scriptable, and version-controllable: ideal for CI/CD pipelines and IaC workflows.
Create the VMSS
```bash
# Rolling upgrades need a health probe or the Application Health extension,
# which don't exist yet, so start with Manual and switch to Rolling in Part 3
az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image Ubuntu2204 \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --vnet-name vmss-vnet \
  --subnet vmss-subnet \
  --public-ip-address my-app-lb-pip \
  --lb my-app-lb \
  --backend-pool-name my-app-backend \
  --lb-sku Standard \
  --zones 1 2 3 \
  --orchestration-mode Flexible \
  --upgrade-policy-mode Manual \
  --custom-data cloud-init.yaml
```
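The `--lb` flag creates an Azure Load Balancer with that name (or references an existing one) and wires the scale set's backend pool to it. `--custom-data cloud-init.yaml` expects the cloud-init config from Step 7 saved as `cloud-init.yaml` in your working directory.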
Open Port 80 on the Load Balancer
```bash
# Create the health probe first; the rule below references it by name
az network lb probe create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name healthProbe \
  --protocol http \
  --port 80 \
  --path /

# Create a load balancer rule for HTTP traffic
az network lb rule create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name http-rule \
  --protocol tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name loadBalancerFrontEnd \
  --backend-pool-name my-app-backend \
  --probe-name healthProbe
```
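A Standard SKU load balancer is secure by default: inbound traffic is dropped unless a network security group allows it. If requests to the public IP time out, check that an NSG allows port 80. A sketch that creates one and attaches it to the subnet; the NSG name `vmss-nsg`, the rule name `allow-http`, and the helper `allow_http_on_subnet` are hypothetical.

```bash
# Hypothetical helper: create an NSG that allows inbound HTTP (TCP 80)
# and attach it to the scale set's subnet. Names are illustrative.
allow_http_on_subnet() {
  local rg=$1 vnet=$2 subnet=$3 nsg=${4:-vmss-nsg}

  az network nsg create --resource-group "$rg" --name "$nsg"

  az network nsg rule create \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name allow-http \
    --priority 100 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges 80

  # Subnet-level NSG: applies to every instance in the subnet
  az network vnet subnet update \
    --resource-group "$rg" \
    --vnet-name "$vnet" \
    --name "$subnet" \
    --network-security-group "$nsg"
}

# Usage: allow_http_on_subnet vmss-demo-rg vmss-vnet vmss-subnet
```

If the instances' NICs also have an NSG, that one must allow port 80 too, since traffic has to pass both.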
Configure Autoscale Rules
```bash
# Get the VMSS resource ID
VMSS_ID=$(az vmss show \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --query id \
  --output tsv)

# Create the autoscale profile
az monitor autoscale create \
  --resource-group vmss-demo-rg \
  --resource "$VMSS_ID" \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name my-app-autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Add scale-out rule (CPU > 75% → add 2 instances)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 2 \
  --cooldown 5

# Add scale-in rule (CPU < 25% → remove 1 instance)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU < 25 avg 5m" \
  --scale in 1 \
  --cooldown 5
```
Verify the Deployment
```bash
# List all instances in the scale set
az vmss list-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --output table

# Check instance health (numeric instance IDs apply to Uniform mode;
# in Flexible mode, pass the VM name from list-instances to
# az vm get-instance-view --resource-group vmss-demo-rg --name <vm-name>)
az vmss get-instance-view \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 0

# Get the public IP of the load balancer
az network public-ip show \
  --resource-group vmss-demo-rg \
  --name my-app-lb-pip \
  --query ipAddress \
  --output tsv
```
Open a browser and navigate to the public IP; you should see the "Hello from <hostname>" page served by Nginx.
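Because the cloud-init script writes each instance's hostname into `index.html`, repeated requests show how the load balancer spreads traffic. A minimal sketch, assuming `curl` is available and you pass the public IP from the previous command; `sample_backends` is a hypothetical helper. A Standard Load Balancer picks a backend by hashing each connection's 5-tuple, so with only a few requests the split can be uneven.

```bash
# Send N requests to the load balancer and count responses per backend.
# Each response is "Hello from <hostname>" (written by cloud-init).
sample_backends() {
  local ip=$1 requests=${2:-20} i
  for i in $(seq 1 "$requests"); do
    curl -s --max-time 5 "http://$ip/"
    echo    # guarantee a newline after each response
  done | grep -v '^$' | sort | uniq -c
}

# Usage:
# LB_IP=$(az network public-ip show --resource-group vmss-demo-rg \
#   --name my-app-lb-pip --query ipAddress --output tsv)
# sample_backends "$LB_IP" 20
```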
Part 3: Upgrade Policies: How to Update Your Fleet
One of the trickiest parts of managing a scale set is rolling out updates (new OS image, new app version) without downtime. VMSS supports three upgrade modes:
Automatic Upgrades
Azure automatically upgrades VM instances as soon as the scale set model is updated. No manual intervention is needed, but Azure makes no guarantee about the order and may restart all instances at the same time, causing downtime.
```bash
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Automatic
```
Best for: Dev/test environments.
Rolling Upgrades (Recommended for Production)
Azure upgrades VMs in batches, validating health before moving to the next batch. This requires a load balancer health probe or the Application Health extension.
```bash
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Rolling \
  --set upgradePolicy.rollingUpgradePolicy.maxBatchInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyUpgradedInstancePercent=5 \
  --set upgradePolicy.rollingUpgradePolicy.pauseTimeBetweenBatches=PT30S
```
With this config, each batch upgrades at most 20% of the instances and Azure pauses 30 seconds between batches. The upgrade aborts if more than 20% of all instances are unhealthy, or if more than 5% of the already-upgraded instances become unhealthy.
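While a rolling upgrade runs, you can poll its progress from the CLI. A sketch using `az vmss rolling-upgrade get-latest`; the helper name is hypothetical, and the field names in the query reflect the rolling upgrade status object as we understand it, so inspect the full JSON once before scripting against it.

```bash
# Hypothetical helper: show the state of the most recent rolling upgrade
# (e.g. RollingForward, Completed, Cancelled, Faulted) and per-instance
# progress counts.
rolling_upgrade_status() {
  local rg=$1 vmss=$2
  az vmss rolling-upgrade get-latest \
    --resource-group "$rg" \
    --name "$vmss" \
    --query "{state: runningStatus.code, succeeded: progress.successfulInstanceCount, failed: progress.failedInstanceCount, pending: progress.pendingInstanceCount}" \
    --output table
}

# Usage: rolling_upgrade_status vmss-demo-rg my-app-vmss
```

If a batch goes wrong, `az vmss rolling-upgrade cancel` stops the upgrade before it reaches the remaining instances.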
Manual Upgrades
The scale set model updates, but instances are only upgraded when you explicitly tell Azure to do so. Maximum control, but requires operator action.
```bash
# Switch the upgrade policy to Manual
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Manual

# After changing the model (e.g., a new image version), upgrade
# specific instances explicitly (numeric IDs as in Uniform mode)
az vmss update-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-ids 0 1 2
```
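Switching the mode doesn't change the image by itself. To roll out a new image version manually, you update the scale set model first and then upgrade the instances. A sketch, assuming the scale set was created from an Azure Compute Gallery image (as in Part 6) and that `IMAGE_ID` holds the new version's resource ID; `rollout_image` is a hypothetical helper.

```bash
# Hypothetical helper: point the scale set model at a new image version,
# then (Manual mode) upgrade all existing instances to that model.
rollout_image() {
  local rg=$1 vmss=$2 image_id=$3

  # 1. Update the model; existing instances keep running the old image
  az vmss update \
    --resource-group "$rg" \
    --name "$vmss" \
    --set virtualMachineProfile.storageProfile.imageReference.id="$image_id"

  # 2. Apply the new model to every instance ("*" = all instance IDs)
  az vmss update-instances \
    --resource-group "$rg" \
    --name "$vmss" \
    --instance-ids "*"
}

# Usage: rollout_image vmss-demo-rg my-app-vmss "$IMAGE_ID"
```

Step 2 addresses instances by ID, as Uniform mode does; for a Flexible scale set, confirm in the current docs how model updates are applied before relying on it.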
Part 4: Scaling Operations
Manual Scaling
```bash
# Scale to 5 instances manually
# (if an autoscale setting is active, its rules may change this count again)
az vmss scale \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --new-capacity 5
```
Schedule-Based Scaling
Useful for predictable traffic patterns (e.g., scale up at 8am, scale down at 8pm).
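```bash
# Weekday peak profile: at least 4 instances from 08:00 to 20:00 UTC.
# --copy-rules default reuses the CPU rules from the default profile;
# without it, this profile would have no scaling rules of its own.
az monitor autoscale profile create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --name weekday-peak \
  --min-count 4 \
  --max-count 10 \
  --count 4 \
  --copy-rules default \
  --recurrence week mon tue wed thu fri \
  --timezone "UTC" \
  --start 08:00 \
  --end 20:00
```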
SSH Into a Specific Instance
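```bash
# Uniform mode: list the NAT rules to find which frontend port maps to which
# instance. Flexible scale sets don't support inbound NAT pools, so reach
# their VMs through Azure Bastion or a per-instance public IP instead.
az network lb inbound-nat-rule list \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --output table

# SSH using the NAT port (e.g., port 50000 maps to instance 0)
ssh -p 50000 azureuser@<load-balancer-public-ip>
```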
Part 5: Monitoring Your Scale Set
View Autoscale Activity
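```bash
# JMESPath contains() is case-sensitive; scale events are logged under
# Microsoft.Insights/AutoscaleSettings/..., so match that casing
az monitor activity-log list \
  --resource-group vmss-demo-rg \
  --max-events 20 \
  --query "[?contains(operationName.value, 'AutoscaleSettings') || contains(operationName.value, 'autoscalesettings')]" \
  --output table
```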
Key Metrics to Monitor in Azure Monitor
| Metric | Description | Alert threshold |
|---|---|---|
| Percentage CPU | Average CPU across all instances | > 80% for 10 min |
| Network In/Out | Traffic volume | Spike detection |
| Disk Read/Write | Storage I/O | > 90% of provisioned IOPS |
| VmAvailabilityMetric | Instance health status | Any unhealthy |
| Autoscale Scale Actions | Scale in/out events | Alert on unexpected scale-in |
```bash
# Create a CPU alert (uses $VMSS_ID from the autoscale step and assumes
# an action group named my-action-group already exists)
az monitor metrics alert create \
  --resource-group vmss-demo-rg \
  --name high-cpu-alert \
  --scopes "$VMSS_ID" \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action my-action-group \
  --description "VMSS average CPU above 85% over a 5-minute window"
```
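If you don't have an action group yet, the alert has nowhere to send notifications. A sketch that creates one with a single email receiver; the receiver name `oncall`, the address, and the helper `create_email_action_group` are placeholders, and `--short-name` must be 12 characters or fewer.

```bash
# Hypothetical helper: create an action group that emails one address
# when an alert fires. Receiver name and email are placeholders.
create_email_action_group() {
  local rg=$1 name=$2 email=$3
  az monitor action-group create \
    --resource-group "$rg" \
    --name "$name" \
    --short-name vmssalerts \
    --action email oncall "$email"
}

# Usage: create_email_action_group vmss-demo-rg my-action-group ops@example.com
```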
Part 6: Using a Custom Image with VMSS
Remember the Azure Custom Image we built in the previous article? Here's how to plug it into a VMSS.
```bash
# Get your gallery image version ID
IMAGE_ID=$(az sig image-version show \
  --resource-group my-resource-group \
  --gallery-name MyAppGallery \
  --gallery-image-definition MyAppImage \
  --gallery-image-version 1.0.0 \
  --query id \
  --output tsv)

# Create VMSS using your custom image
# (reuses the names from Part 2: delete that scale set first or pick a new --name)
az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image "$IMAGE_ID" \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --lb my-app-lb \
  --zones 1 2 3 \
  --orchestration-mode Flexible
```
This is the golden image pattern in action — every instance spins up from your pre-configured, pre-hardened image.
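Pinning `--gallery-image-version 1.0.0` is explicit, but it goes stale as you publish new builds. A sketch that finds the newest published version, assuming the same resource group, gallery, and definition names as above; `latest_image_version` is a hypothetical helper and relies on `sort -V` (GNU coreutils and recent BSD sort) so that 1.10.0 sorts after 1.2.0.

```bash
# Hypothetical helper: print the highest version published for a
# gallery image definition.
latest_image_version() {
  local rg=$1 gallery=$2 definition=$3
  az sig image-version list \
    --resource-group "$rg" \
    --gallery-name "$gallery" \
    --gallery-image-definition "$definition" \
    --query "[].name" \
    --output tsv | sort -V | tail -n 1
}

# Usage: latest_image_version my-resource-group MyAppGallery MyAppImage
```

Pass the result as `--gallery-image-version` when looking up `IMAGE_ID`.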
Architecture: Production-Ready VMSS Setup
Here's what a production VMSS deployment typically looks like:
```text
                       Internet
                           │
                  ┌────────▼────────┐
                  │   Azure Front   │
                  │   Door / WAF    │
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │  App Gateway /  │
                  │  Load Balancer  │
                  └────────┬────────┘
                           │
            ┌──────────────┼──────────────┐
            │ Zone 1       │ Zone 2       │ Zone 3
       ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
       │  VM #1  │    │  VM #2  │    │  VM #3  │
       │ (VMSS)  │    │ (VMSS)  │    │ (VMSS)  │
       └────┬────┘    └────┬────┘    └────┬────┘
            │              │              │
            └──────────────┼──────────────┘
                           │
                  ┌────────▼────────┐
                  │  Azure Monitor  │
                  │   + Autoscale   │
                  └────────┬────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
      ┌─────▼────┐   ┌─────▼────┐   ┌─────▼────┐
      │ Azure DB │   │Key Vault │   │ Storage  │
      └──────────┘   └──────────┘   └──────────┘
```
Key components:
- Azure Front Door or WAF: DDoS protection and global routing at the edge
- Application Gateway or Load Balancer: Layer 7 or Layer 4 traffic distribution
- VMSS across 3 Availability Zones: High availability against datacenter failures
- Azure Monitor + Autoscale: Reactive and scheduled scaling
- Azure Key Vault: Secrets injected at runtime, never baked into images
- Managed Identity: VM instances authenticate to Azure services without credentials
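Credential-free access works in two parts. First, assign a system-assigned identity to the scale set from your workstation. Second, from inside an instance, request a Key Vault access token from the Instance Metadata Service (IMDS). This sketch assumes you separately grant that identity access to the vault (for example, with an Azure RBAC role), and that `jq` is installed on the instance for the usage line. Both helper names are hypothetical.

```bash
# Part 1 (workstation): enable a system-assigned managed identity on the
# scale set. The output includes the identity's principal ID, which you
# then grant access to Key Vault.
enable_vmss_identity() {
  local rg=$1 vmss=$2
  az vmss identity assign --resource-group "$rg" --name "$vmss"
}

# Part 2 (inside an instance): fetch an access token for Key Vault from
# IMDS. No secret is stored on the VM; the platform issues the token.
get_keyvault_token() {
  curl -s --max-time 5 -H "Metadata: true" \
    "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net"
}

# Usage on an instance:
# TOKEN=$(get_keyvault_token | jq -r .access_token)
```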
Best Practices
Design for Statelessness
VMs in a scale set can be added or removed at any time. Your application should:
- Store session data in Azure Cache for Redis, not in-memory
- Write files to Azure Blob Storage or a shared file system, not local disk
- Use Azure Service Bus or Event Hubs for message queuing
Use Spot Instances for Cost Savings
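For fault-tolerant, interruptible workloads (batch jobs, rendering, CI runners), use Spot instances. The example below creates a dedicated Spot scale set; Flexible orchestration can also mix Spot and on-demand instances in one set.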
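```bash
# All-Spot scale set; --max-price is in US dollars per hour
# (-1 means pay up to the on-demand price and never evict on price)
az vmss create \
  --resource-group vmss-demo-rg \
  --name my-batch-vmss \
  --priority Spot \
  --eviction-policy Deallocate \
  --max-price 0.05 \
  --image Ubuntu2204 \
  --vm-sku Standard_D4s_v3 \
  --instance-count 0
```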
Spot instances can save up to 90% compared to on-demand pricing, with the tradeoff that Azure can evict them when it needs the capacity back.
Always Configure Health Probes
Without health probes, Azure doesn't know whether your application is actually working. A VM could be running but serving 500 errors, and the load balancer would keep sending it traffic.
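```bash
# Install (or update) the Application Health extension without
# overwriting any other extensions on the scale set
az vmss extension set \
  --resource-group vmss-demo-rg \
  --vmss-name my-app-vmss \
  --name ApplicationHealthLinux \
  --publisher Microsoft.ManagedServices \
  --version 1.0 \
  --settings '{"protocol": "http", "port": 80, "requestPath": "/health"}'

# With a Manual upgrade policy, apply the change to existing instances
az vmss update-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-ids "*"
```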
Protect Against Accidental Scale-In
In production, you may want to prevent certain instances from being terminated during a scale-in event (e.g., an instance running a long job).
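```bash
# Protect a specific instance from scale-in
# (instance protection is documented for Uniform scale sets; check
# support before relying on it in Flexible mode)
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 2 \
  --protect-from-scale-in true
```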
Quick Reference — Common CLI Commands
```bash
# Create VMSS
az vmss create --resource-group <rg> --name <name> --image <image> --instance-count <n>

# List instances
az vmss list-instances --resource-group <rg> --name <name> --output table

# Scale manually
az vmss scale --resource-group <rg> --name <name> --new-capacity <n>

# Update instances to latest model
az vmss update-instances --resource-group <rg> --name <name> --instance-ids "*"

# Reimage an instance (fresh OS disk)
az vmss reimage --resource-group <rg> --name <name> --instance-id <id>

# Delete a specific instance
az vmss delete-instances --resource-group <rg> --name <name> --instance-ids <id>

# Show autoscale settings
az monitor autoscale show --resource-group <rg> --name <autoscale-name>

# Delete the entire VMSS
az vmss delete --resource-group <rg> --name <name>
```
Conclusion
Azure Virtual Machine Scale Sets are one of the most powerful tools in a cloud engineer's toolkit. Once you understand the orchestration modes, upgrade policies, and autoscale configuration, you can build infrastructure that handles anything from a quiet weekend to a viral traffic spike without manual intervention.
Recap of what we covered:
- Uniform vs Flexible orchestration modes
- Creating a VMSS via Portal and Azure CLI
- Wiring up a Load Balancer with health probes
- Configuring CPU-based and schedule-based autoscaling
- Rolling upgrade strategies for zero-downtime deployments
- Using custom images from Azure Compute Gallery
- Production architecture patterns and best practices
Found this helpful? Drop a ❤️ and share it with your team. Questions or corrections? Leave a comment below.