Dev and test VMs running at full capacity around the clock cost you more than twice what they should. Here's how to auto-scale Azure VMs to a minimum size during off hours with Terraform, saving 50-65% without blocking anyone.
A 10-person dev team. 10 VMs running Standard_D4s_v5 at full power, 24/7. Monthly cost: $1,401. What if they auto-scaled to Standard_B2s during off hours? Monthly cost: $636. That's $765/month saved, and late-night devs can still work. 🌙
Here's the math:
Standard_D4s_v5 (4 vCPU, 16 GB) = $0.192/hour (business hours SKU)
Standard_B2s (2 vCPU, 4 GB) = $0.042/hour (off-hours minimum SKU)
Business hours (10hrs x 22 weekdays): $0.192 x 220 = $42.24/VM/month
Off hours (remaining 510 hours): $0.042 x 510 = $21.42/VM/month
Total per VM: $63.66/month
vs. 24/7 at full power: $0.192 x 730 = $140.16/VM/month
10 VMs: $636 vs $1,401 = $765/month saved
Annual savings: $9,180 🤯
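To re-run this arithmetic with your own rates or hours, here is a minimal shell sketch. The function name `monthly_cost` is hypothetical, and the rates are simply the figures quoted above, so check current Azure pricing for your region before relying on them.

```shell
#!/usr/bin/env bash
# Monthly cost per VM for a two-tier schedule.
# Usage: monthly_cost <full_rate> <min_rate> <full_hours> [total_hours]
monthly_cost() {
  awk -v full="$1" -v min="$2" -v fh="$3" -v total="${4:-730}" \
    'BEGIN { printf "%.2f\n", full * fh + min * (total - fh) }'
}

scheduled=$(monthly_cost 0.192 0.042 220)   # D4s_v5 for 220 h, B2s for the remaining 510 h
always_on=$(monthly_cost 0.192 0.192 220)   # D4s_v5 for all 730 h

echo "Scheduled: \$${scheduled}/VM/month"
echo "Always on: \$${always_on}/VM/month"
awk -v s="$scheduled" -v a="$always_on" \
  'BEGIN { printf "Saved across 10 VMs: $%.2f/month\n", (a - s) * 10 }'
```

This reproduces the $63.66 and $140.16 per-VM figures. Note that the schedules later in this post run from 8 AM to 7 PM, which is 11 hours per weekday; `monthly_cost 0.192 0.042 242` models that variant and returns 66.96.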
And the VMs never go offline. A developer working late or over a weekend still has access, just on a smaller instance. No tickets, no manual startups, no blocked work.
The key principle: never shut down to zero. Scale to minimum. Let's build it. ⚡
🎯 Why Scale Down Instead of Shut Down?
Full shutdown sounds great on paper. In practice, it creates problems:
❌ Full shutdown = Developer at 10 PM can't access their VM
❌ Full shutdown = Startup time of 2-5 minutes when VM restarts
❌ Full shutdown = Running processes, sessions, and state are lost
❌ Full shutdown = Teams in other timezones are blocked
✅ Scale to minimum = VM stays accessible 24/7
✅ Scale to minimum = No startup delay, it's already running
✅ Scale to minimum = Services come back on their own after a quick reboot at resize time
✅ Scale to minimum = Works for global teams across timezones
The trade-off? You still pay something during off hours. But a Standard_B2s at $0.042/hr is 78% cheaper than a Standard_D4s_v5 at $0.192/hr. For most dev/test workloads, that's the sweet spot.
🤖 Approach 1: Auto-Resize Individual VMs with Azure Automation
For standalone VMs (not in a Scale Set), use an Azure Automation Runbook that resizes VMs based on tags. The VM reboots briefly during resize, then comes back at the smaller size.
Step 1: Deploy the Automation Account
# schedules/vm-auto-resize/main.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

# azurerm 4.x needs a subscription ID: set ARM_SUBSCRIPTION_ID or add subscription_id here
provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

resource "azurerm_resource_group" "automation" {
  name     = "rg-vm-autoscale"
  location = "eastus"

  tags = {
    Environment = "shared"
    CostCenter  = "platform"
    Owner       = "team-platform"
    Project     = "cost-governance"
    ManagedBy   = "terraform"
  }
}

resource "azurerm_automation_account" "vm_scaler" {
  name                = "aa-vm-auto-resize"
  location            = azurerm_resource_group.automation.location
  resource_group_name = azurerm_resource_group.automation.name
  sku_name            = "Basic"

  identity {
    type = "SystemAssigned"
  }

  tags = azurerm_resource_group.automation.tags
}

# Least-privilege: can read, resize, start, stop, and deallocate VMs
resource "azurerm_role_definition" "vm_resize_operator" {
  name        = "VM Resize Operator"
  scope       = data.azurerm_subscription.current.id
  description = "Can read, resize, start, stop, and deallocate VMs. Nothing else."

  permissions {
    actions = [
      "Microsoft.Compute/virtualMachines/read",
      "Microsoft.Compute/virtualMachines/write",
      "Microsoft.Compute/virtualMachines/start/action",
      "Microsoft.Compute/virtualMachines/powerOff/action",
      "Microsoft.Compute/virtualMachines/deallocate/action",
      "Microsoft.Compute/virtualMachines/instanceView/read",
      "Microsoft.Resources/subscriptions/resourceGroups/read",
    ]
    not_actions = []
  }

  assignable_scopes = [data.azurerm_subscription.current.id]
}

resource "azurerm_role_assignment" "automation_vm_resize" {
  scope              = data.azurerm_subscription.current.id
  role_definition_id = azurerm_role_definition.vm_resize_operator.role_definition_resource_id
  principal_id       = azurerm_automation_account.vm_scaler.identity[0].principal_id
}
Step 2: Deploy the Resize Runbook
This runbook reads two tags from each VM: ScaleUpSize (business hours SKU) and ScaleDownSize (off-hours minimum SKU). It resizes VMs that have the AutoSchedule tag.
resource "azurerm_automation_runbook" "vm_resize" {
  name                    = "Resize-VMs-By-Tag"
  location                = azurerm_resource_group.automation.location
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  log_verbose             = false
  log_progress            = false
  runbook_type            = "PowerShell72"

  content = <<-POWERSHELL
    Param(
      [Parameter(Mandatory = $true)]
      [ValidateSet("ScaleUp", "ScaleDown")]
      [String] $Action
    )

    # Connect using Managed Identity
    Disable-AzContextAutosave -Scope Process
    $AzureContext = (Connect-AzAccount -Identity).context
    $AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext

    # Find all VMs that carry all three opt-in tags.
    # The null check skips untagged VMs; @() keeps .Count valid for 0 or 1 results.
    $vms = @(Get-AzVM | Where-Object {
      $_.Tags -and
      $_.Tags.ContainsKey("AutoSchedule") -and
      $_.Tags.ContainsKey("ScaleUpSize") -and
      $_.Tags.ContainsKey("ScaleDownSize")
    })

    Write-Output "Found $($vms.Count) VMs with AutoSchedule tags"

    foreach ($vm in $vms) {
      $vmName      = $vm.Name
      $rgName      = $vm.ResourceGroupName
      $currentSize = $vm.HardwareProfile.VmSize

      if ($Action -eq "ScaleDown") {
        $targetSize = $vm.Tags["ScaleDownSize"]
      } else {
        $targetSize = $vm.Tags["ScaleUpSize"]
      }

      if ($currentSize -eq $targetSize) {
        Write-Output "SKIP: $vmName is already $currentSize"
        continue
      }

      Write-Output "$Action : $vmName from $currentSize to $targetSize..."

      try {
        # -ErrorAction Stop turns failures into exceptions so the catch block actually runs
        $vm.HardwareProfile.VmSize = $targetSize
        Update-AzVM -ResourceGroupName $rgName -VM $vm -ErrorAction Stop
        Write-Output "SUCCESS: $vmName resized to $targetSize"
      }
      catch {
        Write-Output "ERROR: Failed to resize $vmName - $($_.Exception.Message)"

        # VM may need stop/start if sizes are in different families.
        # Stop-AzVM deallocates by default, which lets the VM land on other hardware.
        try {
          Write-Output "Attempting stop-resize-start for $vmName..."
          Stop-AzVM -ResourceGroupName $rgName -Name $vmName -Force -ErrorAction Stop
          $vm.HardwareProfile.VmSize = $targetSize
          Update-AzVM -ResourceGroupName $rgName -VM $vm -ErrorAction Stop
          Start-AzVM -ResourceGroupName $rgName -Name $vmName -ErrorAction Stop
          Write-Output "SUCCESS: $vmName resized to $targetSize (with restart)"
        }
        catch {
          Write-Output "FAILED: Could not resize $vmName - $($_.Exception.Message)"
        }
      }
    }

    Write-Output "Done. Processed $($vms.Count) VMs for $Action."
  POWERSHELL

  tags = azurerm_resource_group.automation.tags
}
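Before any schedule triggers the runbook, it can help to preview what it would do. The following is a read-only sketch, not part of the original setup: `plan_resize` and `preview` are hypothetical helper names, and the sketch assumes you are logged in with the Azure CLI. It mirrors the runbook's pick-a-target-then-skip-or-resize logic without changing any VM.

```shell
#!/usr/bin/env bash
# Decide what the runbook would do for one VM.
# Usage: plan_resize <ScaleUp|ScaleDown> <current_size> <scale_up_size> <scale_down_size>
plan_resize() {
  local action="$1" current="$2" up="$3" down="$4" target
  if [ "$action" = "ScaleDown" ]; then target="$down"; else target="$up"; fi
  if [ "$current" = "$target" ]; then
    echo "SKIP (already $current)"
  else
    echo "RESIZE $current -> $target"
  fi
}

# Print the plan for every VM that carries all three tags. Nothing is resized.
# Usage: preview <ScaleUp|ScaleDown>
preview() {
  local action="$1"
  az vm list \
    --query '[?tags.AutoSchedule && tags.ScaleUpSize && tags.ScaleDownSize].[name, hardwareProfile.vmSize, tags.ScaleUpSize, tags.ScaleDownSize]' \
    --output tsv |
  while IFS=$'\t' read -r name size up down; do
    printf '%s: %s\n' "$name" "$(plan_resize "$action" "$size" "$up" "$down")"
  done
}

# Example: preview ScaleDown
```

One difference to keep in mind: the shell comparison is case-sensitive while PowerShell's `-eq` is not, so tag values should match the SKU name's casing exactly.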
Step 3: Create the Schedules
# Scale UP at 8 AM on weekdays (full power for business hours)
# Note: start_time must be at least five minutes in the future at apply time,
# so move these example dates forward before deploying.
resource "azurerm_automation_schedule" "scale_up" {
  name                    = "weekday-vm-scale-up"
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  frequency               = "Week"
  interval                = 1
  timezone                = "Eastern Standard Time"
  start_time              = "2026-02-19T08:00:00-05:00"
  description             = "Scale up dev/test VMs to full power on weekday mornings"
  week_days               = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
}

# Scale DOWN at 7 PM on weekdays (minimum for off hours)
resource "azurerm_automation_schedule" "scale_down" {
  name                    = "weekday-vm-scale-down"
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  frequency               = "Week"
  interval                = 1
  timezone                = "Eastern Standard Time"
  start_time              = "2026-02-19T19:00:00-05:00"
  description             = "Scale down dev/test VMs to minimum for off hours"
  week_days               = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
}

# Scale DOWN on weekends too (stay at minimum all weekend)
resource "azurerm_automation_schedule" "weekend_scale_down" {
  name                    = "weekend-vm-scale-down"
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  frequency               = "Week"
  interval                = 1
  timezone                = "Eastern Standard Time"
  start_time              = "2026-02-21T08:00:00-05:00"
  description             = "Ensure VMs stay at minimum size on weekends"
  week_days               = ["Saturday", "Sunday"]
}

# Link schedules to runbook (parameter keys must be lowercase)
resource "azurerm_automation_job_schedule" "scale_up" {
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  schedule_name           = azurerm_automation_schedule.scale_up.name
  runbook_name            = azurerm_automation_runbook.vm_resize.name

  parameters = {
    action = "ScaleUp"
  }
}

resource "azurerm_automation_job_schedule" "scale_down" {
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  schedule_name           = azurerm_automation_schedule.scale_down.name
  runbook_name            = azurerm_automation_runbook.vm_resize.name

  parameters = {
    action = "ScaleDown"
  }
}

resource "azurerm_automation_job_schedule" "weekend_scale_down" {
  resource_group_name     = azurerm_resource_group.automation.name
  automation_account_name = azurerm_automation_account.vm_scaler.name
  schedule_name           = azurerm_automation_schedule.weekend_scale_down.name
  runbook_name            = azurerm_automation_runbook.vm_resize.name

  parameters = {
    action = "ScaleDown"
  }
}
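The three schedules together define a simple rule: full size on weekdays from 8 AM to 7 PM, minimum size otherwise. As a sanity check, the sketch below encodes that rule in a small shell function. `expected_tag` is a hypothetical helper, and `America/New_York` is assumed to be the IANA equivalent of the "Eastern Standard Time" zone used in the schedules.

```shell
#!/usr/bin/env bash
# Which size tag should be in effect at a given moment?
# Usage: expected_tag <day_of_week 1-7, Monday=1> <hour 0-23>
expected_tag() {
  local dow="$1" hour="$2"
  if [ "$dow" -le 5 ] && [ "$hour" -ge 8 ] && [ "$hour" -lt 19 ]; then
    echo "ScaleUpSize"
  else
    echo "ScaleDownSize"
  fi
}

# Evaluate for the current moment in the schedule's timezone.
now_dow=$(TZ="America/New_York" date +%u)   # 1 = Monday ... 7 = Sunday
now_hour=$(TZ="America/New_York" date +%H)  # 00-23
echo "VMs should currently be at their $(expected_tag "$now_dow" "$now_hour")"
```

The schedules only act at the transition times, so a VM that someone resized by hand in between stays at that size until the next scheduled run.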
Step 4: Tag Your VMs to Opt In
resource "azurerm_linux_virtual_machine" "dev_api" {
  name                = "vm-api-dev"
  resource_group_name = azurerm_resource_group.dev.name
  location            = azurerm_resource_group.dev.location
  size                = "Standard_D4s_v5" # Daytime size

  # ... other config ...

  tags = {
    Environment   = "dev"
    CostCenter    = "CC-1042"
    Owner         = "team-backend"
    Project       = "api-platform"
    AutoSchedule  = "business-hours"
    ScaleUpSize   = "Standard_D4s_v5" # Full power: 4 vCPU, 16 GB
    ScaleDownSize = "Standard_B2s"    # Minimum: 2 vCPU, 4 GB
    ManagedBy     = "terraform"
  }
}
At 7 PM: VM resizes from D4s_v5 to B2s (brief reboot, ~60 seconds). At 8 AM: VM resizes back to D4s_v5. A developer at midnight? Still has access on the B2s. 🌙
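For VMs that are not managed by Terraform, the same three tags can be merged in with the Azure CLI. This is a sketch under assumptions: `tag_for_autoscale` is a hypothetical wrapper, it only prints the command unless `DRY_RUN=0` is set, and the caller needs permission to write tags on the VM.

```shell
#!/usr/bin/env bash
# Build (and optionally run) the az command that merges the opt-in tags onto a VM.
# Usage: tag_for_autoscale <vm_resource_id> <scale_up_size> <scale_down_size>
tag_for_autoscale() {
  local vm_id="$1" up="$2" down="$3"
  # --operation Merge adds or updates these tags and leaves existing tags alone
  set -- az tag update --resource-id "$vm_id" --operation Merge \
    --tags "AutoSchedule=business-hours" "ScaleUpSize=$up" "ScaleDownSize=$down"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # default: print the command only
  else
    "$@"        # DRY_RUN=0: actually run it
  fi
}

# Example (prints the command; the resource ID is a placeholder):
# tag_for_autoscale "$(az vm show -g rg-dev -n vm-legacy --query id -o tsv)" Standard_D4s_v5 Standard_B2s
```

VMs that Terraform does manage should get the tags in their resource block as shown above; otherwise the next apply will see the CLI-added tags as drift.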
📈 Approach 2: VMSS Schedule-Based Autoscale Profiles
For workloads running on Virtual Machine Scale Sets, you don't resize individual VMs. Instead, you define autoscale profiles with different capacity settings for business hours vs off hours.
# schedules/vmss-autoscale/main.tf
resource "azurerm_monitor_autoscale_setting" "web_app" {
  name                = "autoscale-web-app"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.web_app.id
  enabled             = true

  # Business Hours Profile (weekdays 8 AM - 7 PM)
  profile {
    name = "business-hours"

    capacity {
      minimum = 3
      maximum = 10
      default = 3
    }

    recurrence {
      timezone = "Eastern Standard Time"
      days     = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      hours    = [8]
      minutes  = [0]
    }

    # Scale out when CPU > 70%
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.web_app.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "GreaterThan"
        threshold          = 70
      }

      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT5M"
      }
    }

    # Scale in when CPU < 25%
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.web_app.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "LessThan"
        threshold          = 25
      }

      scale_action {
        direction = "Decrease"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT5M"
      }
    }
  }

  # Off Hours Profile (weekday evenings)
  profile {
    name = "off-hours-minimum"

    capacity {
      minimum = 1 # Never zero! At least 1 instance always running
      maximum = 3 # Cap maximum to prevent accidental scale-out
      default = 1 # Instance count used when metrics are unavailable
    }

    recurrence {
      timezone = "Eastern Standard Time"
      days     = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      hours    = [19]
      minutes  = [0]
    }

    # Still allow scale-out if needed (late-night traffic spike)
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.web_app.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "GreaterThan"
        threshold          = 80
      }

      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT10M"
      }
    }

    # Scale in when CPU < 20%
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.web_app.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "LessThan"
        threshold          = 20
      }

      scale_action {
        direction = "Decrease"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT10M"
      }
    }
  }

  # Weekend Profile (no rules: capacity is simply held within 1-2)
  profile {
    name = "weekend-minimum"

    capacity {
      minimum = 1 # Never zero!
      maximum = 2
      default = 1
    }

    recurrence {
      timezone = "Eastern Standard Time"
      days     = ["Saturday", "Sunday"]
      hours    = [0]
      minutes  = [0]
    }
  }

  notification {
    email {
      send_to_subscription_administrator    = false
      send_to_subscription_co_administrator = false
      custom_emails                         = ["finops@company.com"]
    }
  }
}
Key design decisions:
- Business hours: min 3, max 10 (full autoscaling)
- Off hours: min 1, max 3 (reduced but never zero)
- Weekends: min 1, max 2 (skeleton crew)
- Off-hours still allow scale-out if CPU spikes (for that late-night deployment) 🎯
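What happens at the moment a profile switches? As far as I understand autoscale's behavior, the instance count is first pulled into the new profile's minimum/maximum range, and any further change comes from that profile's rules. The sketch below illustrates only the first part; `clamp_capacity` is a hypothetical helper, not an Azure API.

```shell
#!/usr/bin/env bash
# Clamp an instance count into a profile's [minimum, maximum] range.
# Usage: clamp_capacity <current> <minimum> <maximum>
clamp_capacity() {
  local current="$1" min="$2" max="$3"
  if [ "$current" -lt "$min" ]; then
    echo "$min"
  elif [ "$current" -gt "$max" ]; then
    echo "$max"
  else
    echo "$current"
  fi
}

echo "7 instances entering off-hours (1-3):      $(clamp_capacity 7 1 3)"
echo "3 instances entering the weekend (1-2):    $(clamp_capacity 3 1 2)"
echo "1 instance entering business hours (3-10): $(clamp_capacity 1 3 10)"
```

Under that assumption, a busy set drops to 3 instances at 7 PM and then relies on the CPU < 20% rule to drift down to 1. The weekend profile has no rules, so a set that enters Saturday with more than one instance would settle at 2; add a scale-in rule there if you want it to reach 1.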
⚡ Quick Audit: Find VMs Running at Full Power Without Schedules
# Find ALL running non-prod VMs without AutoSchedule tag
az vm list --show-details \
  --query "[?powerState=='VM running' && tags.Environment!='prod' && tags.AutoSchedule==null].{
    Name:name,
    Size:hardwareProfile.vmSize,
    RG:resourceGroup,
    Environment:tags.Environment
  }" --output table

# Check which VMSS have autoscale configured
az monitor autoscale list \
  --query "[].{Name:name, Enabled:enabled, ResourceGroup:resourceGroup}" \
  --output table
Every non-prod VM in that first list running at full size 24/7 is wasting 50-65% of its compute budget. 🔥
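To turn that list into a rough dollar figure, a small sketch follows. `potential_savings` and `count_unscheduled` are hypothetical helpers, and the default of $76.50 per VM is the D4s_v5-to-B2s difference from the math at the top of this post ($140.16 minus $63.66), so it only fits VMs of similar size.

```shell
#!/usr/bin/env bash
# Rough monthly savings if N audited VMs adopted the schedule.
# Usage: potential_savings <vm_count> [savings_per_vm]
potential_savings() {
  awk -v n="$1" -v per_vm="${2:-76.50}" 'BEGIN { printf "%.2f\n", n * per_vm }'
}

# Count running non-prod VMs without the AutoSchedule tag (same filter as the audit above).
count_unscheduled() {
  az vm list --show-details \
    --query "length([?powerState=='VM running' && tags.Environment!='prod' && tags.AutoSchedule==null])" \
    --output tsv
}

# Example: echo "Potential savings: \$$(potential_savings "$(count_unscheduled)")/month"
```

Pass a second argument to use your own per-VM estimate, for example `potential_savings 12 40` for smaller machines.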
💡 Architect Pro Tips
Stay within the same VM family for resizing. Resizing from Standard_D4s_v5 to Standard_B2s may require a full stop/start (deallocate) because they're different families and may not share a hardware cluster. Resizing within the same family (D4s_v5 to D2s_v5) usually needs only a quick restart. The runbook handles both cases.
B-series VMs are ideal for off-hours minimums. The B-series is "burstable," meaning the VMs accumulate CPU credits when idle. A dev who logs in at midnight gets burst performance from accumulated credits. Perfect for occasional off-hours use.
Never set VMSS minimum to 0. Even if you think nobody will use it, a minimum of 1 means there is always an instance ready to answer. Scaling from 0 to 1 leaves the first user waiting minutes for a VM to provision. Scaling from 1 to 3 happens behind an instance that is already serving. That one instance is your insurance policy.
Disk costs don't change with VM resize. Managed disks are billed regardless of VM size or state. This strategy saves on compute (CPU/RAM) costs, not storage. Combine with disk optimization for additional savings.
Azure Automation pricing is nearly free. You get 500 minutes of free runbook execution per month. A resize job running twice daily for 20 VMs uses around 30 minutes total. The cost of the automation itself is negligible.
Test your resize path first. Before enabling schedules, manually resize one VM from your full-power SKU to your minimum SKU and back. Confirm your application handles the transition gracefully. Some apps may need a service restart script in the runbook.
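One way to run that manual test with the Azure CLI is sketched below. `size_available` and `check_resize_path` are hypothetical helpers, and the resource group and VM names in the comments are placeholders. The check asks Azure which sizes the VM can move to on its current hardware; if the target is missing from that list, expect the slower deallocate path.

```shell
#!/usr/bin/env bash
# Succeeds if the size name in $1 appears on stdin (one size per line).
size_available() {
  grep -qxF -- "$1"
}

# Report which resize path a VM is likely to take for a target size.
# Usage: check_resize_path <resource_group> <vm_name> <target_size>
check_resize_path() {
  local rg="$1" vm="$2" target="$3"
  if az vm list-vm-resize-options --resource-group "$rg" --name "$vm" \
       --query "[].name" --output tsv | size_available "$target"; then
    echo "$target is available in place: expect a restart only"
  else
    echo "$target is not listed: expect a deallocate, resize, start cycle"
  fi
}

# Manual round trip once the check looks good:
# az vm resize --resource-group rg-dev --name vm-api-dev --size Standard_B2s
# az vm resize --resource-group rg-dev --name vm-api-dev --size Standard_D4s_v5
```

Time both directions of the round trip and confirm your services come back on their own; that tells you whether the runbook needs an extra restart step.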
📊 TL;DR
| Action | Savings | Availability Impact |
|---|---|---|
| Resize VMs to B2s off hours (Approach 1) | ~55% on non-prod compute | Brief reboot (~60s) during resize |
| VMSS schedule profiles (Approach 2) | ~50-65% on VMSS compute | Zero downtime (gradual scale) |
| Weekend minimum (both approaches) | Additional ~20% savings | Always-on at minimum capacity |
The savings math for a typical dev team:
| Schedule Strategy | Monthly Cost (10 VMs) | vs. 24/7 |
|---|---|---|
| 24/7 full power (no schedule) | $1,401 | Baseline |
| Scale to B2s off hours + weekends | $636 | Save $765/mo |
| Scale within same family (D2s_v5 at $0.096/hr) | $912 | Save $489/mo |
Bottom line: Scaling to minimum during off hours captures 70-80% of the savings you'd get from full shutdown, with none of the availability problems. Your late-night developers, your weekend deployers, and your distributed teams across timezones all keep working. Deploy this alongside the tagging (Part 1) and budget alerts (Part 2) for a complete cost governance stack. 🌙
Run that audit command. Count your non-prod VMs without an AutoSchedule tag. Multiply each one by $75/month in potential savings. That's money you're leaving on the table tonight. 😏
This is Part 3 of the "Save on Azure with Terraform" series. Next up: Your Cloud Bill Has Ghosts 👻. Finding and destroying orphaned Azure resources that are quietly billing you every month. 💬