DEV Community

Emmanuel Chukwudi

Azure Virtual Machine Scale Sets (VMSS): A Complete Guide

Learn how to deploy, configure, and auto-scale fleets of identical VMs on Azure, from zero to production-ready.


Introduction

Imagine your application suddenly gets a traffic spike: maybe a product launch, a viral post, or a scheduled batch job. Without automation, you're either over-provisioned (paying for idle VMs) or scrambling to spin up instances by hand while users hit errors.

Azure Virtual Machine Scale Sets (VMSS) solve this. They let you deploy and manage a group of identical, load-balanced VMs that scale in or out automatically based on demand, all from a single configuration.

In this guide, we'll cover:

  • What VMSS is and how it works under the hood
  • Orchestration modes: Uniform vs Flexible
  • How to create a VMSS via the Portal and Azure CLI
  • Configuring autoscaling rules
  • Integrating with a Load Balancer
  • Updating your Scale Set (rolling upgrades)
  • Real-world best practices

Let's build.


What is a Virtual Machine Scale Set?

A Virtual Machine Scale Set is an Azure compute resource that lets you create and manage a group of load-balanced VMs. Key characteristics:

  • All VMs in a scale set are created from the same base image and configuration
  • The set can automatically increase or decrease the number of VM instances based on demand or a schedule
  • VMs are distributed across Availability Zones or Fault/Update Domains for high availability
  • Integrates natively with Azure Load Balancer, Application Gateway, and Azure Monitor

How It Works

                        ┌─────────────────────────────────────┐
                        │         Azure Load Balancer          │
                        └────────────────┬────────────────────┘
                                         │
                   ┌─────────────────────┼─────────────────────┐
                   │                     │                       │
            ┌──────▼──────┐      ┌───────▼─────┐      ┌────────▼────┐
            │   VM #1      │      │    VM #2     │      │   VM #3     │
            │ (instance 0) │      │ (instance 1) │      │ (instance 2)│
            └─────────────┘      └─────────────┘      └────────────-┘
                   │                     │                       │
                   └─────────────────────┼─────────────────────┘
                                         │
                              ┌──────────▼──────────┐
                              │   Autoscale Engine   │
                              │  (Azure Monitor)     │
                              └─────────────────────┘

When CPU usage (or any other metric you configure) crosses a threshold, Azure's autoscale engine adds or removes VM instances automatically, and the load balancer spreads traffic across the resized fleet.


Orchestration Modes: Uniform vs Flexible

Before creating a VMSS, you need to choose an orchestration mode. This is one of the most important decisions and a common source of confusion.

Uniform Orchestration (Classic)

  • All VMs are identical: same size, same image, same config
  • Azure manages the VMs as a fleet; you interact with the scale set, not individual VMs
  • Best for stateless workloads: web servers, API backends, batch processing
  • Supports up to 1,000 VM instances (with platform images)
  • Built-in integration with autoscale

Flexible Orchestration (Modern — Recommended)

  • VMs can have different sizes and configurations within the same scale set
  • You get full VM-level control: SSH, unique managed identities, individual updates
  • Supports mixing spot and on-demand instances in the same set
  • Works across Availability Zones with zone balancing
  • Supports up to 1,000 instances
  • Microsoft's recommended mode for new workloads

| Feature | Uniform | Flexible |
|---|---|---|
| VM customization | All identical | Individual VM control |
| Max instances | 1,000 | 1,000 |
| Autoscale | Yes | Yes |
| Spot + On-demand mix | No | Yes |
| Availability Zones | Yes | Yes |
| Use case | Stateless fleets | General purpose |

For new projects, default to Flexible orchestration unless you have a specific reason for Uniform.


Prerequisites

Before creating a VMSS, make sure you have:

  • An Azure subscription
  • Azure CLI installed (verify with az --version; the Windows installer is at aka.ms/installazurecliwindows)
  • A Resource Group and Virtual Network ready (we'll create these below)
# Login to Azure
az login

# Set your subscription
az account set --subscription "your-subscription-id"

# Create a resource group
az group create \
  --name vmss-demo-rg \
  --location eastus

# Create a VNet and subnet
az network vnet create \
  --resource-group vmss-demo-rg \
  --name vmss-vnet \
  --address-prefix 10.0.0.0/16 \
  --subnet-name vmss-subnet \
  --subnet-prefix 10.0.1.0/24
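Subnet size puts a hard ceiling on the scale set. Every instance's NIC takes one private IP from vmss-subnet, and Azure reserves five addresses in each subnet. The small bash sketch below checks the headroom. The usable_ips function name is only for illustration and is not an Azure command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: usable addresses in an Azure subnet.
# Azure reserves 5 IPs per subnet (network address, default gateway,
# two for Azure DNS, and broadcast), so usable = 2^(32 - prefix) - 5.
usable_ips() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable_ips 24   # vmss-subnet (10.0.1.0/24): prints 251
usable_ips 28   # a /28 would leave only 11
```

With the maximum of 10 instances used later in this guide, a /24 leaves plenty of room. The point is to size the subnet for the maximum instance count, not the initial one.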

Part 1: Create a VMSS via the Azure Portal

Step 1: Navigate to Virtual Machine Scale Sets

  1. In the Azure Portal, search for "Virtual machine scale sets" in the top search bar
  2. Click + Create


Step 2: Basics Tab

Fill in the following:

| Field | Value |
|---|---|
| Subscription | Your subscription |
| Resource group | vmss-demo-rg |
| Virtual machine scale set name | my-app-vmss |
| Region | East US |
| Availability zone | Zones 1, 2, 3 (select all for HA) |
| Orchestration mode | Flexible (recommended) |
| Security type | Standard |
| Image | Ubuntu Server 22.04 LTS |
| VM architecture | x64 |
| Size | Standard_B2s (2 vCPUs, 4 GB RAM) |
| Authentication type | SSH public key |
| Username | azureuser |
| SSH public key | Paste your public key |

Tip on Availability Zones: Selecting all three zones spreads your VMs across three physically separate zones in the region, each with independent power, cooling, and networking. If one zone goes down, the instances in the other zones keep serving traffic.
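It's worth checking how your instance count divides across zones. The sketch below assumes an even, round-robin spread, which is what zone balancing aims for; Azure's real placement is best-effort and may differ. zone_spread is a hypothetical helper, not an Azure command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: even spread of N instances over Z zones.
# Assumption: round-robin placement; Azure's actual placement can differ.
# Usage: zone_spread <instances> <zones>
zone_spread() {
  local n=$1 zones=$2 z out=""
  for (( z = 1; z <= zones; z++ )); do
    # The first (n % zones) zones receive one extra instance.
    local count=$(( n / zones + (z <= n % zones ? 1 : 0) ))
    out+="zone$z=$count "
  done
  echo "${out% }"   # drop the trailing space
}

zone_spread 2 3    # prints zone1=1 zone2=1 zone3=0
zone_spread 10 3   # prints zone1=4 zone2=3 zone3=3
```

With the initial count of 2, one zone has no instance at all, so losing either populated zone removes half the fleet. At 10 instances, a single zone outage removes at most 4.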


Step 3: Disks Tab

  • OS disk type: Premium SSD (for production) or Standard SSD (for dev/test)
  • Encryption: Platform-managed key (default) or Customer-managed key


Step 4: Networking Tab

  1. Virtual network: Select vmss-vnet
  2. Subnet: vmss-subnet

  3. Load balancing: Select Azure load balancer

  4. Click Create a load balancer:

    • Name: my-lb
    • Type: Public (for internet-facing) or Internal (for private)
    • Protocol: TCP
    • Frontend port: 80
    • Backend port: 80
  5. Public IP address: Create new → my-app-lb-pip

  6. NIC network security group: Advanced → Create NSG

    • Add inbound rule: Allow TCP 80 from Internet
    • Add inbound rule: Allow TCP 22 from your IP (for SSH)

Step 5: Scaling Tab

This is where VMSS gets powerful.

| Field | Value |
|---|---|
| Initial instance count | 2 |
| Scaling policy | Autoscale |

Configure autoscale:

  1. Click Configure
  2. Set minimum instances: 2
  3. Set maximum instances: 10
  4. Set default instance count: 2

Add a scale-out rule:

  • Metric: Percentage CPU
  • Operator: Greater than
  • Threshold: 75%
  • Duration: 5 minutes
  • Action: Increase count by 2
  • Cool down: 5 minutes

Add a scale-in rule:

  • Metric: Percentage CPU
  • Operator: Less than
  • Threshold: 25%
  • Duration: 5 minutes
  • Action: Decrease count by 1
  • Cool down: 5 minutes

Cool-down periods prevent flapping, where the scale set rapidly adds and removes instances in response to brief spikes. A cool-down of at least 5 minutes is a sensible default, and the wide gap between the 75% and 25% thresholds helps as well.
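The simplified bash model below shows how the two rules, the 2 to 10 bounds, and the cool-down interact. It is an illustration under stated assumptions, not Azure's autoscale engine. It looks at a single average CPU value and ignores time grains, multiple profiles, and Azure's own flapping check before scale-in. next_capacity is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of the rules above (NOT Azure's engine).
MIN_COUNT=2
MAX_COUNT=10
COOLDOWN_MIN=5

# Usage: next_capacity <avg_cpu_percent> <current_count> <minutes_since_last_scale>
next_capacity() {
  local cpu=$1 current=$2 since=$3
  local target=$current

  # Inside the cool-down window: hold steady to avoid flapping.
  if (( since < COOLDOWN_MIN )); then
    echo "$current"
    return
  fi

  if (( cpu > 75 )); then
    target=$(( current + 2 ))   # scale-out rule: +2 instances
  elif (( cpu < 25 )); then
    target=$(( current - 1 ))   # scale-in rule: -1 instance
  fi

  # Clamp to the autoscale profile's minimum and maximum.
  if (( target > MAX_COUNT )); then target=$MAX_COUNT; fi
  if (( target < MIN_COUNT )); then target=$MIN_COUNT; fi
  echo "$target"
}

next_capacity 80 2 10   # busy, cool-down elapsed: prints 4
next_capacity 80 2 3    # busy, still cooling down: prints 2
next_capacity 20 2 10   # idle, already at minimum: prints 2
next_capacity 90 9 10   # busy, near the maximum: prints 10
```

The second call shows the cool-down at work. CPU is still high, but the set holds at 2 until five minutes have passed since the last scale action.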


Step 6: Health Tab

Enable Application health monitoring:

  • Extension: Application Health Extension
  • Protocol: HTTP
  • Port: 80
  • Path: /health (or / if you don't have a health endpoint)

This lets Azure know whether individual VM instances are actually serving traffic successfully — not just whether the VM is running.


Step 7: Advanced Tab

Custom data (cloud-init): You can pass a cloud-init configuration that runs on the first boot of every new instance:

#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - echo "Hello from $(hostname)" > /var/www/html/index.html

Paste this in the Custom data field (base64 encoding is handled automatically by the portal).


Step 8: Review + Create

Review all settings, then click Create. Azure will:

  1. Provision the Load Balancer and Public IP
  2. Create the initial VM instances (2 in our case)
  3. Register them with the load balancer backend pool
  4. Apply the autoscale policy

The deployment typically takes 3–5 minutes.


Part 2: Create a VMSS via Azure CLI

The CLI approach is faster, scriptable, and version-controllable, which makes it ideal for CI/CD pipelines and IaC workflows.

Create the VMSS
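The az vmss create command in this section passes --custom-data cloud-init.yaml, so that file has to exist in your working directory. The shell snippet below creates it using the portal example's config. The quoted 'EOF' keeps your local shell from expanding $(hostname), so each VM expands it at boot instead. The /health line is an assumption added here and is not in the original config. It gives the /health path used by this guide's health checks a static page to return.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the cloud-init config consumed by --custom-data.
# Quoting 'EOF' disables local expansion, so $(hostname) is
# evaluated on each VM at first boot instead of on your machine.
cat > cloud-init.yaml <<'EOF'
#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
  - echo "Hello from $(hostname)" > /var/www/html/index.html
  # Assumption (not in the portal example): static file served at /health
  - echo "OK" > /var/www/html/health
EOF

# cloud-init only treats the file as cloud-config if line 1 is #cloud-config.
head -n 1 cloud-init.yaml
```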

az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image Ubuntu2204 \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --vnet-name vmss-vnet \
  --subnet vmss-subnet \
  --public-ip-address my-app-lb-pip \
  --lb my-app-lb \
  --backend-pool-name my-app-backend \
  --lb-sku Standard \
  --zones 1 2 3 \
  --orchestration-mode Flexible \
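  --upgrade-policy-mode Manual \
  --custom-data cloud-init.yaml

The --lb flag automatically creates an Azure Load Balancer and wires the VMSS backend pool to it. The set starts in Manual upgrade mode because a Rolling policy requires a load balancer health probe or the Application Health extension, and neither exists yet. Switch to Rolling (see Part 3) once health checks are configured.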


Open Port 80 on the Load Balancer

# Create the health probe first, because the rule below references it by name
az network lb probe create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name healthProbe \
  --protocol http \
  --port 80 \
  --path /

# Create a load balancer rule for HTTP traffic
az network lb rule create \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --name http-rule \
  --protocol tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name loadBalancerFrontEnd \
  --backend-pool-name my-app-backend \
  --probe-name healthProbe

Configure Autoscale Rules

# Get the VMSS resource ID
VMSS_ID=$(az vmss show \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --query id \
  --output tsv)

# Create the autoscale profile
az monitor autoscale create \
  --resource-group vmss-demo-rg \
  --resource $VMSS_ID \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name my-app-autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Add scale-out rule (CPU > 75% → add 2 instances)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 2 \
  --cooldown 5

# Add scale-in rule (CPU < 25% → remove 1 instance)
az monitor autoscale rule create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --condition "Percentage CPU < 25 avg 5m" \
  --scale in 1 \
  --cooldown 5

Verify the Deployment

# List all instances in the scale set
az vmss list-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --output table

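# Check instance health (numeric instance IDs apply to Uniform; for Flexible, use az vm get-instance-view with the VM name)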
az vmss get-instance-view \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 0

# Get the public IP of the load balancer
az network public-ip show \
  --resource-group vmss-demo-rg \
  --name my-app-lb-pip \
  --query ipAddress \
  --output tsv

Open a browser and navigate to the public IP. You should see the page written by cloud-init: "Hello from" followed by the serving instance's hostname.


Part 3: Upgrade Policies: How to Update Your Fleet

One of the trickiest parts of managing a scale set is rolling out updates (new OS image, new app version) without downtime. VMSS supports three upgrade modes:

Automatic Upgrades

Azure automatically upgrades VM instances as soon as the scale set model is updated. No manual intervention needed, but instances may restart without warning.

az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Automatic

Best for: Dev/test environments.


Rolling Upgrades (Recommended for Production)

Azure upgrades VMs in batches, validating health before moving to the next batch. Requires health probes to be configured.

az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Rolling \
  --set upgradePolicy.rollingUpgradePolicy.maxBatchInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyInstancePercent=20 \
  --set upgradePolicy.rollingUpgradePolicy.maxUnhealthyUpgradedInstancePercent=5 \
  --set upgradePolicy.rollingUpgradePolicy.pauseTimeBetweenBatches=PT30S

With this config, Azure upgrades 20% of the instances per batch and pauses 30 seconds between batches. It stops the upgrade if more than 20% of all instances are unhealthy, or if more than 5% of the already-upgraded instances end up unhealthy.
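The rough bash sketch below estimates what those settings mean for a given fleet size. It assumes each batch is the batch percentage of the instance count rounded up, with at least one VM per batch; Azure's exact rounding may differ. Real upgrades also wait for each batch to report healthy, so wall-clock time is longer than the pauses alone. rolling_plan is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough rolling-upgrade plan.
# Assumes batch size = ceil(instances * batch_pct / 100), minimum 1.
# Usage: rolling_plan <instances (>0)> <maxBatchInstancePercent> <pause_seconds>
rolling_plan() {
  local instances=$1 batch_pct=$2 pause_s=$3
  local batch=$(( (instances * batch_pct + 99) / 100 ))   # integer ceiling
  if (( batch < 1 )); then
    batch=1
  fi
  local batches=$(( (instances + batch - 1) / batch ))
  # Pauses happen between batches, not after the last one.
  local total_pause=$(( (batches - 1) * pause_s ))
  echo "batch_size=$batch batches=$batches total_pause_s=$total_pause"
}

rolling_plan 10 20 30   # prints batch_size=2 batches=5 total_pause_s=120
rolling_plan 7 20 30    # prints batch_size=2 batches=4 total_pause_s=90
```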


Manual Upgrades

The scale set model updates, but instances are only upgraded when you explicitly tell Azure to do so. Maximum control, but requires operator action.

# Switch to Manual mode: later model changes (e.g., a new image version) wait until you upgrade instances
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --set upgradePolicy.mode=Manual

# Manually upgrade specific instances
az vmss update-instances \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-ids 0 1 2

Part 4: Scaling Operations

Manual Scaling

# Scale to 5 instances manually
az vmss scale \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --new-capacity 5

Schedule-Based Scaling

Useful for predictable traffic patterns (e.g., scale up at 8am, scale down at 8pm).

# Scale up at 8am UTC on weekdays
az monitor autoscale profile create \
  --resource-group vmss-demo-rg \
  --autoscale-name my-app-autoscale \
  --name weekday-peak \
  --min-count 4 \
  --max-count 10 \
  --count 4 \
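  --recurrence week mon tue wed thu fri \
  --timezone "UTC" \
  --start 08:00 \
  --end 20:00 \
  --copy-rules default
# --copy-rules reuses the CPU rules from the "default" profile created earlier;
# without it, this profile carries no scale rules of its own.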
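Outside the weekday-peak window, the default profile (2 to 10 instances with the CPU rules) applies again. The small bash sketch below shows which profile should be active at a given UTC weekday and hour. active_profile is a hypothetical helper that mirrors the schedule above; it is not something Azure exposes.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the weekday-peak profile:
# Monday-Friday, 08:00 <= hour < 20:00 UTC.
# Usage: active_profile <iso_weekday 1=Mon..7=Sun> <hour 00-23>
active_profile() {
  local weekday=$1
  local hour=$((10#$2))   # force base 10 so "08" and "09" aren't read as octal
  if (( weekday <= 5 && hour >= 8 && hour < 20 )); then
    echo "weekday-peak"
  else
    echo "default"
  fi
}

active_profile 3 09   # Wednesday 09:00 UTC: prints weekday-peak
active_profile 5 20   # Friday 20:00 UTC: prints default
active_profile "$(date -u +%u)" "$(date -u +%H)"   # right now
```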

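SSH Into a Specific Instance

This applies to Uniform scale sets, where az vmss create typically adds an inbound NAT pool that maps load balancer ports (starting at 50000) to individual instances. A Flexible scale set like the one created above generally has no NAT pool, so reach its VMs through Azure Bastion or a per-VM public IP instead.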

# Get the NAT rules to find which port maps to which instance
az network lb inbound-nat-rule list \
  --resource-group vmss-demo-rg \
  --lb-name my-app-lb \
  --output table

# SSH using the NAT port (e.g., port 50000 maps to instance 0)
ssh -p 50000 azureuser@<load-balancer-public-ip>

Part 5: Monitoring Your Scale Set

View Autoscale Activity

az monitor activity-log list \
  --resource-group vmss-demo-rg \
  --max-events 20 \
  --query "[?contains(operationName.value, 'autoscale')]" \
  --output table

Key Metrics to Monitor in Azure Monitor

| Metric | Description | Alert threshold |
|---|---|---|
| Percentage CPU | Average CPU across all instances | > 80% for 10 min |
| Network In/Out | Traffic volume | Spike detection |
| Disk Read/Write | Storage I/O | > 90% of provisioned IOPS |
| VmAvailabilityMetric | Instance health status | Any unhealthy |
| Autoscale Scale Actions | Scale in/out events | Alert on unexpected scale-in |

# Create a CPU alert (assumes an action group named my-action-group already exists)
az monitor metrics alert create \
  --resource-group vmss-demo-rg \
  --name high-cpu-alert \
  --scopes $VMSS_ID \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action my-action-group \
  --description "VMSS CPU exceeded 85% for 5 minutes"

Part 6: Using a Custom Image with VMSS

Remember the Azure Custom Image we built in the previous article? Here's how to plug it into a VMSS.

# Get your gallery image version ID
IMAGE_ID=$(az sig image-version show \
  --resource-group my-resource-group \
  --gallery-name MyAppGallery \
  --gallery-image-definition MyAppImage \
  --gallery-image-version 1.0.0 \
  --query id \
  --output tsv)

# Create VMSS using your custom image
az vmss create \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --image $IMAGE_ID \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --lb my-app-lb \
  --zones 1 2 3 \
  --orchestration-mode Flexible

This is the golden image pattern in action — every instance spins up from your pre-configured, pre-hardened image.


Architecture: Production-Ready VMSS Setup

Here's what a production VMSS deployment typically looks like:

                         Internet
                            │
                   ┌────────▼────────┐
                   │  Azure Front    │
                   │  Door / WAF     │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │  App Gateway /  │
                   │  Load Balancer  │
                   └────────┬────────┘
                            │
              ┌─────────────┼─────────────┐
              │ Zone 1      │ Zone 2       │ Zone 3
         ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
         │  VM #1  │   │  VM #2  │   │  VM #3  │
         │ (VMSS)  │   │ (VMSS)  │   │ (VMSS)  │
         └────┬────┘   └────┬────┘   └────┬────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                    ┌────────▼────────┐
                    │  Azure Monitor  │
                    │  + Autoscale    │
                    └────────┬────────┘
                             │
              ┌──────────────┼───────────────┐
              │              │               │
      ┌───────▼──┐   ┌───────▼──┐   ┌───────▼──┐
      │ Azure DB │   │Key Vault │   │  Storage  │
      └──────────┘   └──────────┘   └───────────┘

Key components:

  • Azure Front Door or WAF: DDoS protection and global routing at the edge
  • Application Gateway or Load Balancer: Layer 7 or Layer 4 traffic distribution
  • VMSS across 3 Availability Zones: High availability against datacenter failures
  • Azure Monitor + Autoscale: Reactive and scheduled scaling
  • Azure Key Vault: Secrets injected at runtime, never baked into images
  • Managed Identity: VM instances authenticate to Azure services without credentials

Best Practices

Design for Statelessness

VMs in a scale set can be added or removed at any time. Your application should:

  • Store session data in Azure Cache for Redis, not in-memory
  • Write files to Azure Blob Storage or a shared file system, not local disk
  • Use Azure Service Bus or Event Hub for message queuing

Use Spot Instances for Cost Savings

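For fault-tolerant, interruptible workloads (batch jobs, rendering, CI runners), Spot VMs can cut compute costs sharply. The command below creates an all-Spot scale set that starts empty (--instance-count 0) and caps the price at $0.05 per hour. Running Spot and regular VMs side by side in one set relies on the Spot Priority Mix feature of Flexible orchestration.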

az vmss create \
  --resource-group vmss-demo-rg \
  --name my-batch-vmss \
  --priority Spot \
  --eviction-policy Deallocate \
  --max-price 0.05 \
  --image Ubuntu2204 \
  --vm-sku Standard_D4s_v3 \
  --instance-count 0

Spot VMs can cost up to 90% less than pay-as-you-go pricing. The tradeoff is that Azure can evict them whenever it needs the capacity back or the Spot price rises above your --max-price.

Always Configure Health Probes

Without health probes, Azure can't tell whether your application is actually working. A VM could be running but serving 500 errors, and the load balancer would keep routing traffic to it.

# Add the Application Health extension to the scale set model.
# az vmss extension set adds or updates this one extension, instead of
# replacing the whole extensions list the way --set ...extensions='[...]' does.
az vmss extension set \
  --resource-group vmss-demo-rg \
  --vmss-name my-app-vmss \
  --name ApplicationHealthLinux \
  --publisher Microsoft.ManagedServices \
  --version 1.0 \
  --settings '{"protocol": "http", "port": 80, "requestPath": "/health"}'

Protect Against Accidental Scale-In

In production, you may want to prevent certain instances from being terminated during a scale-in event (e.g., an instance running a long job).

# Protect a specific instance from scale-in (an instance-level flag on az vmss update)
az vmss update \
  --resource-group vmss-demo-rg \
  --name my-app-vmss \
  --instance-id 2 \
  --protect-from-scale-in true

Quick Reference — Common CLI Commands

# Create VMSS
az vmss create --resource-group <rg> --name <name> --image <image> --instance-count <n>

# List instances
az vmss list-instances --resource-group <rg> --name <name> --output table

# Scale manually
az vmss scale --resource-group <rg> --name <name> --new-capacity <n>

# Update instances to latest model
az vmss update-instances --resource-group <rg> --name <name> --instance-ids "*"

# Reimage an instance (fresh OS disk)
az vmss reimage --resource-group <rg> --name <name> --instance-id <id>

# Delete a specific instance
az vmss delete-instances --resource-group <rg> --name <name> --instance-ids <id>

# Show autoscale settings
az monitor autoscale show --resource-group <rg> --name <autoscale-name>

# Delete the entire VMSS
az vmss delete --resource-group <rg> --name <name>

Conclusion

Azure Virtual Machine Scale Sets are one of the most powerful tools in a cloud engineer's toolkit. Once you understand the orchestration modes, upgrade policies, and autoscale configuration, you can build infrastructure that handles anything from a quiet weekend to a viral traffic spike without manual intervention.

Recap of what we covered:

  • Uniform vs Flexible orchestration modes
  • Creating a VMSS via Portal and Azure CLI
  • Wiring up a Load Balancer with health probes
  • Configuring CPU-based and schedule-based autoscaling
  • Rolling upgrade strategies for zero-downtime deployments
  • Using custom images from Azure Compute Gallery
  • Production architecture patterns and best practices

Found this helpful? Drop a ❤️ and share it with your team. Questions or corrections? Leave a comment below.
