Table of Contents
- Introduction
- VMSS and its components
- What is Terraform?
- Prerequisites
- Prepare environment & authenticate to Azure
- Create a local Terraform project with modular .tf files
- Initialize / Plan / Apply (local)
- Verify
- Conclusion
🚀 Introduction
Deploying an Azure Virtual Machine Scale Set Using Terraform
In today’s cloud-first world, scalability and automation are non-negotiable. This article explores how to deploy Azure Virtual Machine Scale Sets using Terraform, a powerful Infrastructure as Code (IaC) tool that enables consistent, repeatable, and efficient cloud provisioning. Whether building resilient applications or optimizing resource management, this guide walks you through the essentials of defining, deploying, and managing VM scale sets with Terraform, unlocking the full potential of Azure’s elastic compute capabilities.
🚀 VMSS and its components
Virtual Machine Scale Sets (VMSS) allow you to create and manage a group of identical,
load-balanced virtual machines (VMs). Think of them as a way to automatically increase or
decrease computing power based on demand—similar to how supermarkets bring in more cashiers
when there are many customers, and fewer when the store is quiet.
Example: Imagine a movie theater. On a normal weekday afternoon, only one or two
ticket counters may be open. But on Friday night, when crowds rush in, more counters open
automatically. VM Scale Sets work in the same way—automatically adding or removing 'counters'
(VMs) depending on the number of 'customers' (requests) arriving.
Key Benefits of VM Scale Sets:
- Automatic Scaling: Handles traffic spikes without manual intervention.
- Load Balancing: Ensures no single VM is overloaded.
- High Availability: Keeps your app running even if some VMs fail.
- Cost Efficiency: You only pay for what you use.
Another Analogy: Think of VM Scale Sets like ride-hailing services (e.g., Uber). During peak
hours, more drivers appear on the road (scaling out). Fewer drivers are available at night when demand is low (scaling in). This ensures efficiency and availability without wasting resources.
In summary, Virtual Machine Scale Sets are like having a flexible team that grows or shrinks
automatically, ensuring smooth operations and optimized costs—whether you're running a small
website or a large-scale application.
🚀 What is Terraform?
Terraform is an open-source Infrastructure as Code (IaC) tool developed by HashiCorp. It allows you to define and provision cloud infrastructure using declarative configuration files. Instead of manually creating resources through a cloud portal, you write code that describes the desired infrastructure state, and Terraform handles the deployment and updates.
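To make "declarative" concrete, here is a minimal sketch (the resource names are illustrative): you state that a resource group should exist, and Terraform figures out whether to create it, update it, or do nothing.

provider "azurerm" {
  features {}
}

# Desired state: a resource group called demo-rg exists in westus3.
# Terraform compares this description with what actually exists and reconciles the two.
resource "azurerm_resource_group" "demo" {
  name     = "demo-rg"
  location = "westus3"
}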
Why Terraform?
We use Terraform to deploy a Virtual Machine Scale Set (VMSS) in Azure because it offers:
- 🔁 Repeatability: You can deploy identical environments (e.g., dev, test, prod) using the same configuration.
- ⚙️ Automation: Terraform automates the creation, scaling, and management of VM instances.
- 📦 Version Control: Infrastructure changes are tracked in source control (e.g., GitHub), enabling collaboration and rollback.
- 📈 Scalability: VMSS can automatically adjust the number of VMs based on demand, and Terraform makes configuring autoscale rules seamless.
Let’s turn infrastructure into code—and scale like a pro.
🚀 Prerequisites
The following are the prerequisites needed for this task.
- VS Code (or another editor)
- Azure subscription and permissions to create resources (or ask an admin).
- Azure CLI installed (az) and logged in, or Service Principal credentials.
- Terraform CLI (>= 1.4 recommended).
- Git and (optionally) GitHub CLI (gh).
- An SSH key pair for VM admin access (~/.ssh/id_rsa.pub).
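A quick way to confirm the tooling is in place (the last line only matters if you don't already have a key pair):

az --version        # Azure CLI present?
terraform version   # should report v1.4.0 or later
git --version
test -f ~/.ssh/id_rsa.pub || ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa   # generate a key pair if missing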
🚀 Prepare environment & authenticate to Azure.
A. Authenticate to Azure
Interactive (dev machine)
Open VS Code and create the project directory:
mkdir vmss-terraform   # creates the project directory
cd vmss-terraform      # moves you into the project root
Then, log in to Azure:
az login   # opens a browser to sign in to your Azure account
B. Service Principal (recommended for CI)
Run the following, replacing <subscription-id> with your own subscription ID:
az ad sp create-for-rbac --name "tf-vmss-sp" \
  --role "Contributor" \
  --scopes "/subscriptions/<subscription-id>"
Save the output (appId, password, tenant).
{
"appId": "6d509bb1-981f-4193-ad42-05bf45e72f12",
"displayName": "http://tf-vmss-sp",
"password": "epe8Q~hRGUg7f5hrBK_Pl~4N3wml1PutgPSiqaMn",
"tenant": "01d4fe9c-2658-4141-be2f-b1f11eca9673"
}
Then export environment variables for Terraform, replacing the values below with your own, and run them together in the terminal:
export ARM_SUBSCRIPTION_ID="14e8ed5c-bc41-4d1b-b5b7-ff9419e2f0d6"
export ARM_CLIENT_ID="6d509bb1-981f-4193-ad42-05bf45e72f12"
export ARM_CLIENT_SECRET="epe8Q~hRGUg7f5hrBK_Pl~4N3wml1PutgPSiqaMn"
export ARM_TENANT_ID="01d4fe9c-2658-4141-be2f-b1f11eca9673"
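Before handing these to Terraform, you can verify the service principal actually authenticates (this login is just a check; Terraform itself reads the ARM_* variables directly):

env | grep '^ARM_'   # confirm all four variables are exported
az login --service-principal \
  --username "$ARM_CLIENT_ID" \
  --password "$ARM_CLIENT_SECRET" \
  --tenant "$ARM_TENANT_ID"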
🚀 Create a local Terraform project with modular .tf files
Create the project directory and layout:

vmss-terraform/
├─ setup-backend.sh
├─ backend.tf
├─ frontend.tf
├─ main.tf          # provider + top-level resources (or split)
├─ variables.tf
├─ network.tf
├─ compute.tf
├─ autoscale.tf
├─ outputs.tf
├─ user-data.sh     # cloud-init to install nginx (optional)
├─ versions.tf
└─ .gitignore
touch setup-backend.sh (paste the script below into it)
#!/bin/bash
# ------------------------------------------------------------------
# setup-backend.sh
# Creates an Azure Storage backend for Terraform remote state
# ------------------------------------------------------------------

# ====== CONFIGURABLE VARIABLES ======
RESOURCE_GROUP_NAME="terraform-state-rg"
LOCATION="westus3"
STORAGE_ACCOUNT_NAME="tfstate$(openssl rand -hex 3 | tr -d '\n' | tr '[:upper:]' '[:lower:]')" # ensures uniqueness
CONTAINER_NAME="tfstate"
KEY_NAME="vmss.terraform.tfstate"
# ====================================

echo "🔍 Checking Azure CLI login..."
if ! az account show >/dev/null 2>&1; then
  echo "❌ You are not logged in to Azure CLI. Run: az login"
  exit 1
fi

SUBSCRIPTION_ID=$(az account show --query id -o tsv)
echo "✅ Using Azure Subscription: $SUBSCRIPTION_ID"

echo "🚀 Creating resource group if not exists..."
az group create --name "$RESOURCE_GROUP_NAME" --location "$LOCATION" >/dev/null

echo "💾 Creating storage account: $STORAGE_ACCOUNT_NAME ..."
az storage account create \
  --name "$STORAGE_ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION" \
  --sku Standard_LRS \
  --encryption-services blob >/dev/null

echo "📦 Creating blob container: $CONTAINER_NAME ..."
az storage container create \
  --name "$CONTAINER_NAME" \
  --account-name "$STORAGE_ACCOUNT_NAME" >/dev/null

# Print summary
ACCOUNT_KEY=$(az storage account keys list \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --account-name "$STORAGE_ACCOUNT_NAME" \
  --query "[0].value" -o tsv)

echo ""
echo "✅ Terraform backend storage created successfully!"
echo ""
echo "🧩 Use the following values in your backend.tf file:"
echo "---------------------------------------------------"
echo "resource_group_name  = \"$RESOURCE_GROUP_NAME\""
echo "storage_account_name = \"$STORAGE_ACCOUNT_NAME\""
echo "container_name       = \"$CONTAINER_NAME\""
echo "key                  = \"$KEY_NAME\""
echo "---------------------------------------------------"
echo ""
echo "🔑 Storage Account Key (keep secret):"
echo "$ACCOUNT_KEY"
echo ""
echo "💡 Example: Run 'terraform init -reconfigure' after updating backend.tf."
Make it executable:
chmod +x setup-backend.sh
Then run it:
./setup-backend.sh
Update your backend.tf file with the actual values the script printed, for example:
resource_group_name  = terraform-state-rg
storage_account_name = tfstatee0133f
container_name       = tfstate
key                  = vmss.terraform.tfstate
touch backend.tf (paste the script below into it)
terraform {
backend "azurerm" {
resource_group_name = "terraform-state-rg"
storage_account_name = "tfstatee0133f"
container_name = "tfstate"
key = "vmss.terraform.tfstate"
}
}
Run 'terraform init -reconfigure' after updating backend.tf.
If the script did not complete the first time, re-run it:
chmod +x setup-backend.sh && ./setup-backend.sh
If the pasted script loses its line continuations, the multi-line Azure CLI commands will fail partway through. In that case, create the storage account manually using the name the script generated (here, tfstateb3ecc2):
az storage account create --name tfstateb3ecc2 --resource-group terraform-state-rg --location westus3 --sku Standard_LRS
Now, let's create the container:
az storage container create --name tfstate --account-name tfstateb3ecc2
Run the code below
terraform init -reconfigure
touch frontend.tf (paste the code below into it)
# frontend.tf
resource "azurerm_public_ip" "frontend_pip" {
name = "vmss-frontend-pip"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
allocation_method = "Static"
domain_name_label = "vmss-frontend-${random_string.fqdn.result}"
sku = "Standard"
tags = var.tags
}
resource "azurerm_lb" "frontend_lb" {
name = "vmss-frontend-lb"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
sku = "Standard"
frontend_ip_configuration {
name = "PublicFrontendIP"
public_ip_address_id = azurerm_public_ip.frontend_pip.id
}
tags = var.tags
}
resource "azurerm_lb_backend_address_pool" "frontend_bepool" {
loadbalancer_id = azurerm_lb.frontend_lb.id
name = "FrontendBackendPool"
}
resource "azurerm_lb_probe" "frontend_probe" {
loadbalancer_id = azurerm_lb.frontend_lb.id
name = "http-probe"
protocol = "Tcp"
port = var.application_port
}
resource "azurerm_lb_rule" "frontend_rule" {
loadbalancer_id = azurerm_lb.frontend_lb.id
name = "http-rule"
protocol = "Tcp"
frontend_port = var.application_port
backend_port = var.application_port
frontend_ip_configuration_name = "PublicFrontendIP"
backend_address_pool_ids = [azurerm_lb_backend_address_pool.frontend_bepool.id]
probe_id = azurerm_lb_probe.frontend_probe.id
}
touch main.tf (paste the code below into it)
provider "azurerm" {
features {}
}
resource "random_string" "fqdn" {
length = 6
upper = false
special = false
numeric = false
}
touch variables.tf (paste the code below into it)
variable "resource_group_name" {
type = string
default = "terraform-vmss-rg"
}
variable "location" {
type = string
default = "westus3"
}
variable "admin_user" {
type = string
default = "azureuser"
}
variable "ssh_public_key_path" {
type = string
default = "~/.ssh/id_rsa.pub"
}
variable "instances" {
type = number
default = 2
}
variable "vm_size" {
type = string
default = "Standard_DS1_v2"
}
variable "application_port" {
type = number
default = 80
}
variable "tags" {
type = map(string)
default = {
env = "dev"
project = "vmss-demo"
}
}
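The defaults above work for a demo, but you can override any of them without touching variables.tf by adding a terraform.tfvars file. The values below are purely illustrative (note that the .gitignore later in this guide excludes *.tfvars, so such overrides stay out of source control):

# terraform.tfvars: per-environment overrides
location  = "eastus2"
instances = 3
tags = {
  env     = "test"
  project = "vmss-demo"
}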
touch network.tf (paste the code below into it)
resource "azurerm_resource_group" "rg" {
name = var.resource_group_name
location = var.location
tags = var.tags
}
resource "azurerm_virtual_network" "vnet" {
name = "vmss-vnet"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
address_space = ["10.0.0.0/16"]
tags = var.tags
}
resource "azurerm_subnet" "subnet" {
name = "vmss-subnet"
resource_group_name = azurerm_resource_group.rg.name
virtual_network_name = azurerm_virtual_network.vnet.name
address_prefixes = ["10.0.2.0/24"]
}
resource "azurerm_public_ip" "vmss_pip" {
name = "vmss-public-ip"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
allocation_method = "Static"
domain_name_label = "vmss-demo-${random_string.fqdn.result}"
tags = var.tags
}
resource "azurerm_lb" "lb" {
name = "vmss-lb"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
frontend_ip_configuration {
name = "PublicIPAddress"
public_ip_address_id = azurerm_public_ip.vmss_pip.id
}
tags = var.tags
}
resource "azurerm_lb_backend_address_pool" "bpepool" {
loadbalancer_id = azurerm_lb.lb.id
name = "BackEndPool"
}
resource "azurerm_lb_probe" "http_probe" {
loadbalancer_id = azurerm_lb.lb.id
name = "http-probe"
port = var.application_port
}
resource "azurerm_lb_rule" "http" {
loadbalancer_id = azurerm_lb.lb.id
name = "http"
protocol = "Tcp"
frontend_port = var.application_port
backend_port = var.application_port
frontend_ip_configuration_name = "PublicIPAddress"
backend_address_pool_ids = [azurerm_lb_backend_address_pool.bpepool.id]
probe_id = azurerm_lb_probe.http_probe.id
}
touch compute.tf (paste the code below into it)
resource "azurerm_virtual_machine_scale_set" "vmss" {
name = "tf-vmss"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
upgrade_policy_mode = "Manual"
sku {
name = var.vm_size
tier = "Standard"
capacity = var.instances
}
storage_profile_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "22_04-lts"
version = "latest"
}
storage_profile_os_disk {
caching = "ReadWrite"
create_option = "FromImage"
managed_disk_type = "Standard_LRS"
}
os_profile {
computer_name_prefix = "vmss"
admin_username = var.admin_user
# NOTE: for demo simplicity MS sample uses a password. For production, prefer SSH keys or orchestrated VMSS with ssh keys.
admin_password = "ReplaceThisWithASecurePassword123!"
}
os_profile_linux_config {
disable_password_authentication = false
}
network_profile {
name = "terraformnetworkprofile"
primary = true
ip_configuration {
name = "IPConfiguration"
subnet_id = azurerm_subnet.subnet.id
load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.bpepool.id]
primary = true
}
}
tags = var.tags
}
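One gap worth noting: the layout at the top lists an optional user-data.sh to install nginx, but compute.tf never references it, so out of the box the instances serve nothing on port 80. A minimal sketch of wiring it in, assuming cloud-init on the Ubuntu image runs the script at first boot:

#!/bin/bash
# user-data.sh -- run by cloud-init on first boot
apt-get update
apt-get install -y nginx
systemctl enable --now nginx

Then reference it inside the os_profile block of compute.tf (for this legacy resource the provider accepts plain text here and base64-encodes it; the newer azurerm_linux_virtual_machine_scale_set expects filebase64() instead):

custom_data = file("${path.module}/user-data.sh")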
touch autoscale.tf (paste the code below into it)
resource "azurerm_monitor_autoscale_setting" "autoscale" {
name = "vmss-autoscale"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
target_resource_id = azurerm_virtual_machine_scale_set.vmss.id
enabled = true
profile {
name = "autoscale"
capacity {
default = var.instances
minimum = 1
maximum = 5
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_virtual_machine_scale_set.vmss.id
operator = "GreaterThan"
statistic = "Average"
time_aggregation = "Average"
time_window = "PT2M"
time_grain = "PT1M"
threshold = 75
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_virtual_machine_scale_set.vmss.id
operator = "LessThan"
statistic = "Average"
time_aggregation = "Average"
time_window = "PT2M"
time_grain = "PT1M"
threshold = 25
}
scale_action {
direction = "Decrease"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
}
}
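After the apply later in this guide, you can confirm the autoscale setting exists and watch the capacity move; these are standard Azure CLI commands using this guide's resource names:

az monitor autoscale list --resource-group terraform-vmss-rg --output table
# Scale-out requires average CPU above 75% over a 2-minute window,
# so capacity only changes under real load:
az vmss show --resource-group terraform-vmss-rg --name tf-vmss --query "sku.capacity"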
touch outputs.tf (paste the code below into it)
output "public_ip" {
description = "The public IP address of the load balancer"
value = azurerm_public_ip.vmss_pip.ip_address
}
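Once terraform apply has run, outputs can be read back at any time without replanning:

terraform output                  # list all outputs
terraform output -raw public_ip   # bare value, handy in scripts, e.g.:
curl "http://$(terraform output -raw public_ip)"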
touch .gitignore (paste the code below into the file)
# Terraform
.terraform/
*.tfstate
*.tfstate.backup
*.tfvars
crash.log
override.tf
override.tf.json
.terraform.lock.hcl

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# IDE files
.vscode/
.idea/
*.swp
*.swo
*~

# Environment variables
.env
.env.local
.env.*.local
touch versions.tf (paste the code below into it)
terraform {
required_version = ">= 1.4.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
"~> 3.0" means any 3.x version — 3.0 through <4.0
version = "~> 3.0"
}
random = {
source = "hashicorp/random"
version = "~> 3.0"
}
}
}
NB: Remove the duplicate public IP and load balancer blocks from network.tf (they're now in frontend.tf).
In compute.tf, change this reference line:
load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.bpepool.id]
to:
load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.frontend_bepool.id]
Finally, since azurerm_public_ip.vmss_pip is gone, point outputs.tf at the frontend IP instead, as shown below.
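A sketch of the updated outputs.tf, now pointing at the IP defined in frontend.tf:

output "public_ip" {
  description = "The public IP address of the load balancer"
  value       = azurerm_public_ip.frontend_pip.ip_address
}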
🚀 Initialize / Plan / Apply (local)
Initialize providers by running:
terraform init
Review the plan by running:
terraform plan -out main.tfplan
apply (use the plan you reviewed)
terraform apply main.tfplan
This apply returned an error: the "UbuntuServer" / "22_04-lts" image reference above is not a valid Azure Marketplace image.
In your compute.tf file, update the image reference to a valid Ubuntu 20.04 LTS (Focal) image:
storage_profile_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-focal"
sku = "20_04-lts-gen2"
version = "latest"
}
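If you hit a similar error with a different distro or version, you can query the marketplace yourself for valid publisher/offer/SKU combinations (the --all flag makes this slow but thorough):

az vm image list --publisher Canonical --offer 0001-com-ubuntu-server-focal --all --output table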
Run terraform apply
Copy and save your public IP
(public_ip = "4.227.9.177")
🚀 Verify
Azure CLI: run the command below, replacing <resource-group-name> with your resource group name:
az vm list --resource-group <resource-group-name> --output table
For example:
az vm list --resource-group terraform-state-rg --output table
The command returned no results, meaning no VMs exist in the terraform-state-rg resource group. This resource group likely only contains your Terraform state storage account.
Now, run:
az vm list --resource-group terraform-vmss-rg --output table
Note that instances of a classic (Uniform orchestration) scale set, which is what azurerm_virtual_machine_scale_set creates, do not appear in az vm list; use the az vmss commands below to see them. Either way, terraform-state-rg holds only your Terraform state storage account, while the scale set itself lives in terraform-vmss-rg.
List all resource groups to see where everything was deployed:
az group list --output table
List all Virtual Machine Scale Sets:
az vmss list --output table
Or scope the query to the resource group:
az vmss list --resource-group terraform-vmss-rg --output table
Now let's check the actual instances running in your scale set:
az vmss list-instances --resource-group terraform-vmss-rg --name tf-vmss --output table
Check instance connection info:
az vmss list-instance-connection-info --resource-group terraform-vmss-rg --name tf-vmss --output table
The command returned no output because no connection endpoints are configured: the load balancer has no inbound NAT rules mapping public ports to SSH on the instances. If you want SSH access, one option is sketched below.
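A hedged sketch of adding SSH endpoints via an inbound NAT pool (the pool name and port range are illustrative; this is optional and not part of the config deployed above):

resource "azurerm_lb_nat_pool" "ssh" {
  resource_group_name            = azurerm_resource_group.rg.name
  loadbalancer_id                = azurerm_lb.frontend_lb.id
  name                           = "ssh"
  protocol                       = "Tcp"
  frontend_port_start            = 50000
  frontend_port_end              = 50119
  backend_port                   = 22
  frontend_ip_configuration_name = "PublicFrontendIP"
}

You would then reference it from ip_configuration in compute.tf via load_balancer_inbound_nat_rules_ids = [azurerm_lb_nat_pool.ssh.id] and re-apply.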
Check if instances exist in the scale set:
az vmss list-instances --resource-group terraform-vmss-rg --name tf-vmss --output table
Great. We have one instance (tf-vmss_0) running successfully.
Use the public_ip output or curl the domain from azurerm_public_ip to see the nginx page. (public_ip = "4.227.9.177")
You can access the nginx page using the public IP address. Here are the ways to do it:
http://4.227.9.177 on a browser or curl --connect-timeout 15 -m 30 -v http://4.227.9.177 on your terminal.
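To confirm the load balancer answers consistently, a quick loop helps (substitute your own IP from the output):

for i in {1..5}; do
  curl -s -o /dev/null -w "%{http_code}\n" --connect-timeout 15 http://4.227.9.177
done
# Five lines of "200" means the LB rule, health probe, and nginx are all healthy.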
🚀 Conclusion
Deploying an Azure Virtual Machine Scale Set using Terraform and pushing it to GitHub exemplifies modern DevOps excellence—combining infrastructure as code with version control for scalability, repeatability, and team synergy. With Terraform’s declarative power and GitHub’s collaborative backbone, you've laid the foundation for resilient cloud architecture and agile development workflows. This approach not only accelerates deployment but also ensures traceability, security, and continuous improvement across environments.