That D8s_v5 running at 12% CPU is costing you 4x what you need. Here's how to use Azure Advisor data, build a workload-to-SKU mapping module in Terraform, and stop over-provisioning VMs across every environment.
An audit of 40 Azure VMs across three environments reveals: average CPU utilization is 11%, average memory usage is 23%. Half the fleet is running D4s_v5 (4 vCPU, 16 GB) when a B2s (2 vCPU, 4 GB) would handle the workload fine. The overspend: $2,100/month. Annual waste: $25,200 - from just 40 VMs. 📏
Here's the pricing reality for common Azure VM sizes (Linux, East US, pay-as-you-go):
| VM size | Specs | Hourly | Monthly |
|---|---|---|---|
| Standard_D8s_v5 | 8 vCPU, 32 GB | $0.384/hr | $280/month |
| Standard_D4s_v5 | 4 vCPU, 16 GB | $0.192/hr | $140/month |
| Standard_D2s_v5 | 2 vCPU, 8 GB | $0.096/hr | $70/month |
| Standard_B2s | 2 vCPU, 4 GB | $0.042/hr | $31/month |
| Standard_B2ms | 2 vCPU, 8 GB | $0.083/hr | $61/month |
A single VM running D4s_v5 at 12% CPU wastes roughly $79/month compared to a B2ms that could handle the same load, or $109/month compared to a B2s. Multiply that across a fleet and you're looking at serious money.
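To sanity-check those numbers yourself, here is a minimal sketch that turns hourly rates into monthly costs and per-VM savings. It assumes roughly 730 billable hours per month (which is what the table's monthly figures imply) and rounds to whole dollars; `monthly_cost` and `monthly_savings` are helper names made up for this example.

```bash
#!/usr/bin/env bash
HOURS_PER_MONTH=730   # assumption: ~730 hours in an average month

# Monthly cost in whole dollars for a given hourly rate.
monthly_cost() {
  awk -v rate="$1" -v hours="$HOURS_PER_MONTH" 'BEGIN { printf "%.0f\n", rate * hours }'
}

# Monthly savings when moving from one hourly rate to a cheaper one.
monthly_savings() {
  echo $(( $(monthly_cost "$1") - $(monthly_cost "$2") ))
}

monthly_cost 0.192            # D4s_v5 -> 140
monthly_savings 0.192 0.083   # D4s_v5 to B2ms -> 79
monthly_savings 0.192 0.042   # D4s_v5 to B2s  -> 109
```

Swap in the rates for your own region and SKUs; pay-as-you-go prices change over time.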
The problem? Most teams pick a VM size during initial deployment and never revisit it. "It works, don't touch it." Meanwhile, the workload settled at 10% CPU months ago and nobody noticed. Let's fix that with data and automation. ⚡
🎯 Step 1: Find the Over-Provisioned VMs
Azure Advisor: Your Free Right-Sizing Scout
Azure Advisor monitors VM utilization for 7 days (configurable) and flags underutilized VMs. The default CPU threshold is 5%; you can raise it to 10%, 15%, or 20%.
# Get Advisor right-sizing recommendations
az advisor recommendation list \
--category Cost \
--query "[?shortDescription.problem=='Right-size or shutdown underutilized virtual machines'].{
VM:resourceMetadata.resourceId,
CurrentSKU:extendedProperties.currentSku,
RecommendedSKU:extendedProperties.targetSku,
AnnualSavings:extendedProperties.annualSavingsAmount
}" --output table
This gives you an instant hit list of VMs that Advisor has identified as oversized, along with specific SKU recommendations and dollar savings.
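If the default 5% threshold produces too short a list, you can raise it. The sketch below wraps `az advisor configuration update --low-cpu-threshold` in a small guard function. The function name is hypothetical, and the accepted values simply mirror the thresholds listed above, so check `az advisor configuration update --help` for what your CLI version accepts.

```bash
# Set the Advisor low-CPU threshold for the current subscription.
# Rejects values outside the set described in this article before calling az.
set_advisor_cpu_threshold() {
  case "$1" in
    5|10|15|20)
      az advisor configuration update --low-cpu-threshold "$1"
      ;;
    *)
      echo "threshold must be 5, 10, 15, or 20" >&2
      return 1
      ;;
  esac
}

# Example (requires an authenticated az session):
#   set_advisor_cpu_threshold 10
```

Recommendations are not recomputed instantly after a threshold change, so expect a delay before the hit list grows.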
Azure Monitor: The 30-Day Deep Dive
Advisor only looks at 7 days by default. For a more accurate picture, check 30+ days of metrics to account for weekly cycles and monthly peaks:
# Check average CPU over last 30 days for a specific VM
az monitor metrics list \
--resource "/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.Compute/virtualMachines/<VM_NAME>" \
--metric "Percentage CPU" \
--interval PT1H \
--start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--aggregation Average Maximum \
--query "value[0].timeseries[0].data[].{Time:timeStamp, AvgCPU:average, MaxCPU:maximum}" \
--output table
# Quick scan: find all VMs and their sizes
az vm list \
--query "[].{Name:name, Size:hardwareProfile.vmSize, RG:resourceGroup}" \
--output table
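Thirty days of hourly samples is about 720 rows per VM, which is hard to eyeball. A minimal sketch for collapsing them into one average and one peak is below. It assumes each input line is `timestamp average maximum` separated by whitespace, and it skips rows where a value is missing (the CLI prints `None` for those in TSV output); `summarize_cpu` is a name invented here.

```bash
# Reduce "timestamp avg max" rows to a single average and peak.
summarize_cpu() {
  awk '
    BEGIN { peak = 0 }
    $2 ~ /^[0-9.]+$/ { sum += $2; n++ }                        # average of hourly averages
    $3 ~ /^[0-9.]+$/ { if ($3 + 0 > peak) peak = $3 + 0 }      # highest hourly maximum
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "avg=%.1f peak=%.1f samples=%d\n", sum / n, peak, n
    }'
}

# Example: reuse the az monitor command above, but change the query to a
# list so the TSV column order is fixed, then pipe it in:
#   --query "value[0].timeseries[0].data[].[timeStamp, average, maximum]" --output tsv | summarize_cpu
```

The two numbers it prints are the Avg CPU and Peak CPU inputs for the decision matrix.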
The right-sizing decision matrix:
| Avg CPU | Peak CPU | Avg Memory | Action |
|---|---|---|---|
| <5% | <20% | <20% | Downsize 2 tiers or switch to B-series |
| 5-15% | <40% | <40% | Downsize 1 tier |
| 15-40% | <70% | <70% | Likely right-sized, monitor |
| >40% | >80% | >80% | Consider upsizing |
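The matrix is easy to apply by script once you have the three numbers per VM. The sketch below is one possible encoding, not an official rule set. It assumes that exceeding any single upsizing limit is enough to flag a VM, that the downsizing rows require all three metrics to be under their limits, and that combinations the table does not cover fall through to a manual review. `rightsize_action` is a hypothetical helper name.

```bash
# Usage: rightsize_action <avg_cpu> <peak_cpu> <avg_mem>   (all in percent)
rightsize_action() {
  awk -v cpu="$1" -v peak="$2" -v mem="$3" 'BEGIN {
    cpu += 0; peak += 0; mem += 0   # force numeric comparison
    if (cpu > 40 || peak > 80 || mem > 80)
      print "Consider upsizing"
    else if (cpu < 5 && peak < 20 && mem < 20)
      print "Downsize 2 tiers or switch to B-series"
    else if (cpu <= 15 && peak < 40 && mem < 40)
      print "Downsize 1 tier"
    else if (cpu <= 40 && peak < 70 && mem < 70)
      print "Likely right-sized, monitor"
    else
      print "Review manually"      # assumption: not covered by the matrix
  }'
}

rightsize_action 11 30 23   # the fleet averages from the intro -> Downsize 1 tier
```

Memory metrics are not collected by default for Azure VMs, so the third argument usually comes from a monitoring agent.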
🏗️ Step 2: The Workload-to-SKU Mapping Module
Instead of letting every team pick VM sizes ad-hoc, build a Terraform module that maps workload types and environments to pre-approved, cost-optimized SKUs:
# modules/vm-rightsized/variables.tf
variable "workload_type" {
type = string
description = "Type of workload: web, api, worker, database, ci_runner, monitoring"
validation {
condition = contains(
["web", "api", "worker", "database", "ci_runner", "monitoring"],
var.workload_type
)
error_message = "Must be: web, api, worker, database, ci_runner, or monitoring."
}
}
variable "environment" {
type = string
description = "Environment: dev, staging, prod"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Must be: dev, staging, or prod."
}
}
variable "size_override" {
type = string
default = null
description = "Override the auto-selected SKU. Requires justification tag."
}
variable "name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "subnet_id" { type = string }
# Blocks with more than one argument must span multiple lines in HCL
variable "admin_username" {
  type    = string
  default = "azureadmin"
}
variable "admin_ssh_key" { type = string }
variable "tags" {
  type    = map(string)
  default = {}
}
# modules/vm-rightsized/sku_map.tf
locals {
# ────────────────────────────────────────────
# The SKU Decision Matrix
# Maps workload type + environment to optimal VM size
# ────────────────────────────────────────────
sku_map = {
web = {
dev = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
staging = "Standard_B2ms" # 2 vCPU, 8 GB - $61/mo
prod = "Standard_D2s_v5" # 2 vCPU, 8 GB - $70/mo
}
api = {
dev = "Standard_B2ms" # 2 vCPU, 8 GB - $61/mo
staging = "Standard_D2s_v5" # 2 vCPU, 8 GB - $70/mo
prod = "Standard_D4s_v5" # 4 vCPU, 16 GB - $140/mo
}
worker = {
dev = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
staging = "Standard_D2s_v5" # 2 vCPU, 8 GB - $70/mo
prod = "Standard_D4s_v5" # 4 vCPU, 16 GB - $140/mo
}
database = {
dev = "Standard_B2ms" # 2 vCPU, 8 GB - $61/mo
staging = "Standard_E2s_v5" # 2 vCPU, 16 GB - $126/mo
prod = "Standard_E4s_v5" # 4 vCPU, 32 GB - $252/mo
}
ci_runner = {
dev = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
staging = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
prod = "Standard_F4s_v2" # 4 vCPU, 8 GB - $170/mo
}
monitoring = {
dev = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
staging = "Standard_B2s" # 2 vCPU, 4 GB - $31/mo
prod = "Standard_D2s_v5" # 2 vCPU, 8 GB - $70/mo
}
}
# Override or use the mapped SKU
selected_sku = coalesce(var.size_override, local.sku_map[var.workload_type][var.environment])
}
# modules/vm-rightsized/main.tf
resource "azurerm_network_interface" "this" {
name = "${var.name}-nic"
location = var.location
resource_group_name = var.resource_group_name
ip_configuration {
name = "internal"
subnet_id = var.subnet_id
private_ip_address_allocation = "Dynamic"
}
tags = var.tags
}
resource "azurerm_linux_virtual_machine" "this" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
size = local.selected_sku
admin_username = var.admin_username
network_interface_ids = [azurerm_network_interface.this.id]
admin_ssh_key {
username = var.admin_username
public_key = var.admin_ssh_key
}
os_disk {
caching = "ReadWrite"
storage_account_type = var.environment == "prod" ? "Premium_LRS" : "StandardSSD_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts-gen2"
version = "latest"
}
tags = merge(var.tags, {
WorkloadType = var.workload_type
Environment = var.environment
VMSize = local.selected_sku
SizeOverride = var.size_override != null ? "true" : "false"
ManagedBy = "terraform"
})
}
output "vm_id" { value = azurerm_linux_virtual_machine.this.id }
output "selected_sku" { value = local.selected_sku }
output "private_ip" { value = azurerm_network_interface.this.private_ip_address }
Usage:
# Dev API server: automatically gets B2ms ($61/mo)
module "api_dev" {
source = "./modules/vm-rightsized"
name = "vm-api-dev-01"
resource_group_name = azurerm_resource_group.dev.name
location = azurerm_resource_group.dev.location
subnet_id = azurerm_subnet.dev.id
admin_ssh_key = file("~/.ssh/id_rsa.pub")
workload_type = "api"
environment = "dev"
tags = { CostCenter = "CC-1042", Team = "backend" }
}
# Prod database: automatically gets E4s_v5 ($252/mo)
module "db_prod" {
source = "./modules/vm-rightsized"
name = "vm-db-prod-01"
resource_group_name = azurerm_resource_group.prod.name
location = azurerm_resource_group.prod.location
subnet_id = azurerm_subnet.prod.id
admin_ssh_key = file("~/.ssh/id_rsa.pub")
workload_type = "database"
environment = "prod"
tags = { CostCenter = "CC-1042", Team = "data" }
}
# Need a bigger SKU? Override with justification
module "api_prod_heavy" {
source = "./modules/vm-rightsized"
name = "vm-api-prod-02"
resource_group_name = azurerm_resource_group.prod.name
location = azurerm_resource_group.prod.location
subnet_id = azurerm_subnet.prod.id
admin_ssh_key = file("~/.ssh/id_rsa.pub")
workload_type = "api"
environment = "prod"
size_override = "Standard_D8s_v5" # Override visible in tags
tags = {
CostCenter = "CC-1042"
Team = "backend"
OverrideReason = "Black Friday traffic spike handling"
}
}
What this gives you:
- Consistent sizing across environments, no more "I picked D8s_v5 because it was the default"
- Dev/staging always cheaper than prod by design
- Override path with visibility (tagged as SizeOverride = true for audit)
- B-series for dev saves 55-75% compared to D-series equivalents
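The override tag only pays off if someone actually looks at it. Here is a minimal sketch of that audit using `az vm list` with a JMESPath filter on the tag. It assumes teams record their reason in an `OverrideReason` tag as in the usage example; `list_size_overrides` is a made-up function name.

```bash
# List every VM tagged SizeOverride = "true", with its size and stated reason.
# VMs without an OverrideReason tag show an empty Reason column.
list_size_overrides() {
  az vm list \
    --query "[?tags.SizeOverride=='true'].{Name:name, Size:hardwareProfile.vmSize, RG:resourceGroup, Reason:tags.OverrideReason}" \
    --output "${1:-table}"
}

# Example (requires an authenticated az session):
#   list_size_overrides        # human-readable table
#   list_size_overrides tsv    # for piping into other tools
```

An override with an empty reason is a good first candidate for the next right-sizing review.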
🔍 Step 3: Azure Policy Guard Rails
Prevent expensive SKUs in non-production subscriptions with Azure Policy via Terraform:
# Deny large VM SKUs in dev/staging subscriptions
resource "azurerm_subscription_policy_assignment" "restrict_vm_sizes" {
name = "restrict-vm-sizes-nonprod"
subscription_id = data.azurerm_subscription.nonprod.id
policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/cccc23c7-8427-4f53-ad12-b6a63eb452b3"
display_name = "Restrict VM sizes in non-production"
parameters = jsonencode({
listOfAllowedSKUs = {
value = [
"Standard_B2s",
"Standard_B2ms",
"Standard_B4ms",
"Standard_D2s_v5",
"Standard_D2as_v5",
"Standard_E2s_v5"
]
}
})
}
Now if someone tries to deploy a D16s_v5 in the dev subscription, Azure Resource Manager denies the request and terraform apply fails with a policy violation instead of creating the VM. 🚫
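Because the policy only bites at apply time, a cheap pre-flight check in CI can surface the problem earlier. The sketch below hard-codes the same allowlist as the policy assignment above. That duplication is an assumption of this example, and both `ALLOWED_NONPROD_SKUS` and `is_allowed_nonprod_sku` are hypothetical names.

```bash
# Same SKUs as the listOfAllowedSKUs parameter in the policy assignment.
ALLOWED_NONPROD_SKUS="Standard_B2s Standard_B2ms Standard_B4ms Standard_D2s_v5 Standard_D2as_v5 Standard_E2s_v5"

# Returns 0 if the SKU is on the non-production allowlist, 1 otherwise.
is_allowed_nonprod_sku() {
  local sku
  for sku in $ALLOWED_NONPROD_SKUS; do
    [ "$sku" = "$1" ] && return 0
  done
  return 1
}

if ! is_allowed_nonprod_sku "Standard_D16s_v5"; then
  echo "Standard_D16s_v5 would be denied in non-production"
fi
```

In a real pipeline you would generate the list from the same source as the Terraform parameters so the two cannot drift apart.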
⚡ Quick Audit: Find Your Biggest Savings Right Now
# List all VMs sorted by SKU name (groups VMs by family and size; this is not a cost ranking)
az vm list -d \
--query "sort_by([].{
Name:name,
Size:hardwareProfile.vmSize,
RG:resourceGroup,
State:powerState
}, &Size)" \
--output table
# Get Advisor cost recommendations with savings amounts
az advisor recommendation list \
--category Cost \
--query "[?contains(shortDescription.problem, 'underutilized')].{
Resource:resourceMetadata.resourceId,
Solution:shortDescription.solution,
Savings:extendedProperties.annualSavingsAmount
}" --output table
Run these, sort by savings amount, and start with the top 5. That's your quick win list. 🎯
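Sorting by SKU name is alphabetical, so it won't put the most expensive VMs first. A rough way to rank by spend is to join the VM list against a price table. The sketch below uses the monthly figures from the pricing table earlier in this article, assigns a cost of 0 to any SKU it doesn't know, and expects `name size` pairs on stdin; `rank_by_cost` is a name invented for this example.

```bash
# Read "vm_name vm_size" lines and print "monthly_usd<TAB>name<TAB>size",
# most expensive first. Unknown SKUs get 0 and sink to the bottom.
rank_by_cost() {
  awk '
    BEGIN {
      price["Standard_D8s_v5"] = 280
      price["Standard_D4s_v5"] = 140
      price["Standard_D2s_v5"] = 70
      price["Standard_B2ms"]   = 61
      price["Standard_B2s"]    = 31
    }
    NF >= 2 {
      cost = ($2 in price) ? price[$2] : 0
      printf "%d\t%s\t%s\n", cost, $1, $2
    }
  ' | sort -k1,1nr
}

# Example (requires an authenticated az session):
#   az vm list --query "[].[name, hardwareProfile.vmSize]" --output tsv | rank_by_cost
```

Extend the price table with the SKUs you actually run before trusting the order.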
💡 Architect Pro Tips
B-series is your dev/staging workhorse. B-series VMs accumulate CPU credits when idle and spend them to burst when needed. A B2ms at $61/month handles most dev workloads that would otherwise run on a D4s_v5 at $140/month. That's a 56% savings per VM.
D-series for steady production, E-series for memory-hungry workloads. D-series gives you 4 GB per vCPU (balanced). E-series gives you 8 GB per vCPU (memory-optimized). Running a database on D-series when it needs memory? You're paying for extra CPU you don't use. Switch to E-series: fewer vCPUs, more RAM, often cheaper for memory workloads.
F-series for compute-only work. Think CI/CD runners, batch processing, and other compute-heavy tasks. F-series gives you a higher CPU-to-memory ratio (2 GB per vCPU), so you aren't paying for RAM that compute-bound jobs never touch. Compare current per-vCPU prices against D-series in your region before committing.
AMD (Das/Eas) variants are 5-10% cheaper. The a in D2as_v5 means AMD processor. For most workloads, AMD and Intel perform comparably, and the AMD variants are slightly cheaper. Easy savings if you're not locked to Intel.
Resizing restarts a running VM. If the target size is available on the VM's current hardware cluster, as is typical within a family (D4s_v5 to D2s_v5), the resize is a quick restart. If it isn't, which is more likely for cross-family changes (D-series to B-series), you have to deallocate the VM first. Plan accordingly.
Tag your overrides. The module above tags VMs with SizeOverride = true when someone overrides the recommended SKU. This makes it easy to audit which VMs are running larger than recommended and why.
Review quarterly. Workloads change. A VM that needed D4s_v5 six months ago might be running at 8% CPU today after an optimization. Make right-sizing a quarterly process, not a one-time event.
📊 TL;DR
| Workload | Dev SKU | Prod SKU | Dev Monthly | Prod Monthly |
|---|---|---|---|---|
| Web server | B2s | D2s_v5 | $31 | $70 |
| API server | B2ms | D4s_v5 | $61 | $140 |
| Worker | B2s | D4s_v5 | $31 | $140 |
| Database | B2ms | E4s_v5 | $61 | $252 |
| CI runner | B2s | F4s_v2 | $31 | $170 |
| Monitoring | B2s | D2s_v5 | $31 | $70 |
Before right-sizing (all D4s_v5 across 6 VMs per env):
Dev: 6 x $140 = $840/month | Prod: 6 x $140 = $840/month
After right-sizing (mapped SKUs):
Dev: $246/month | Prod: $842/month
Dev savings: $594/month = $7,128/year from just 6 VMs. Prod stays appropriately sized for each workload type, with databases getting memory-optimized E-series instead of overpaying for CPU on D-series. 💰
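The totals above are easy to reproduce. A minimal sketch, using the monthly prices from the TL;DR table in workload order (web, api, worker, database, ci_runner, monitoring); `total` is a helper name made up for this example:

```bash
# Sum a list of whole-dollar monthly prices.
total() {
  local t=0 n
  for n in "$@"; do t=$(( t + n )); done
  echo "$t"
}

before=$(total 140 140 140 140 140 140)      # six D4s_v5 per environment
dev_after=$(total 31 61 31 61 31 31)         # mapped dev SKUs
prod_after=$(total 70 140 140 252 170 70)    # mapped prod SKUs
dev_savings=$(( before - dev_after ))

echo "before=$before dev_after=$dev_after prod_after=$prod_after"
echo "dev savings: $dev_savings/month, $(( dev_savings * 12 ))/year"
```

Note that the prod total barely moves (840 to 842): the gain there is a better fit per workload, not a lower bill.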
Run the Advisor audit command. Find your VMs running at <15% CPU. Calculate the savings if you dropped them one tier. That number will get your manager's attention. 😀
This is Part 7 of the "Save on Azure with Terraform" series. Next up: Spot the Savings 🎯, running non-critical workloads on Azure Spot VMs with up to 90% savings.