Suhas Mallesh
Your Deleted VMs in GCP Left Behind 500GB of Orphaned Disks (They're Still on Your Bill) 👻

When you delete a VM in GCP, the boot disk gets deleted. But attached persistent disks? They stay behind. Silently billing you $0.040/GB/month for pd-standard or $0.170/GB/month for pd-ssd. A single forgotten 500GB SSD disk costs $85/month, $1,020/year, for storage nobody uses. Multiply that across your team and you're burning thousands on ghost disks.

Here's how orphaned disks happen:

  1. Developer creates a VM with an extra data disk
  2. Developer deletes the VM (maybe via console, maybe via Terraform)
  3. The data disk survives because auto_delete = false (the default for attached disks)
  4. Nobody remembers it exists
  5. It bills you every month forever

This is so common that studies show organizations waste 15-25% of their cloud storage spend on unattached disks alone. A medium-sized company can save $5,000-$15,000 annually just by cleaning them up.

Let's find them, clean them up, and prevent it from happening again.

📊 What Orphaned Resources Cost You

Resource Type        Pricing            500 GB Cost    10 x 500 GB
pd-standard (HDD)    $0.040/GB/month    $20/month      $200/month
pd-ssd               $0.170/GB/month    $85/month      $850/month
pd-balanced          $0.100/GB/month    $50/month      $500/month
Snapshot (standard)  $0.026/GB/month    $13/month      $130/month
Snapshot (archive)   $0.0026/GB/month   $1.30/month    $13/month

The disks are the expensive part. But old snapshots pile up too, especially if you have snapshot schedules that nobody monitors.
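Those rates map to a tiny lookup you can reuse in cleanup scripts. A minimal sketch (`monthly_cost` and `DISK_PRICES` are illustrative names, using the list prices quoted above; regional rates vary, so verify against your bill):

```python
# List prices from the table above ($/GB/month); regional rates vary.
DISK_PRICES = {"pd-standard": 0.040, "pd-balanced": 0.100, "pd-ssd": 0.170}

def monthly_cost(size_gb, disk_type):
    """Estimated monthly cost of one disk (falls back to the pd-standard rate)."""
    return round(size_gb * DISK_PRICES.get(disk_type, 0.040), 2)

print(monthly_cost(500, "pd-ssd"))       # 85.0
print(monthly_cost(500, "pd-standard"))  # 20.0
```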

๐Ÿ” Step 1: Find All Orphaned Disks (5-Minute Audit)

Run this gcloud command to list every unattached disk in your project:

# Find all unattached disks with their size and cost
gcloud compute disks list \
  --filter="NOT users:*" \
  --format="table(
    name,
    zone.basename(),
    sizeGb,
    type.basename(),
    status,
    lastDetachTimestamp
  )" \
  --project=YOUR_PROJECT_ID

This returns something like:

NAME                    ZONE            SIZE_GB  TYPE         STATUS  LAST_DETACH
old-api-data            us-central1-a   500      pd-ssd       READY   2025-08-15
staging-worker-disk     us-central1-b   200      pd-standard  READY   2025-11-02
experiment-ml-data      us-east1-b      1000     pd-ssd       READY   2025-06-20
temp-migration-disk     us-west1-a      100      pd-balanced  READY   (never)

That's 1,800 GB of orphaned storage costing roughly $273/month ($85 + $8 + $170 + $10) in this example.

For a quick cost estimate across your project:

# Calculate total cost of orphaned disks
gcloud compute disks list \
  --filter="NOT users:*" \
  --format="csv[no-heading](sizeGb,type.basename())" \
  --project=YOUR_PROJECT_ID | \
awk -F',' '{
  if ($2 == "pd-ssd") cost = $1 * 0.170;
  else if ($2 == "pd-balanced") cost = $1 * 0.100;
  else cost = $1 * 0.040;
  total += cost;
  printf "%-12s %6d GB  $%.2f/month\n", $2, $1, cost
} END { printf "\nTotal orphaned disk cost: $%.2f/month ($%.2f/year)\n", total, total*12 }'
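If awk isn't your thing, the same per-type totals can be computed in Python. A sketch (`total_orphan_cost` is a made-up helper that parses the `csv[no-heading]` rows from the gcloud command above, with the same list prices as the awk script):

```python
# Same list rates as the awk script above; pd-standard is the fallback price.
PRICES = {"pd-ssd": 0.170, "pd-balanced": 0.100, "pd-standard": 0.040}

def total_orphan_cost(csv_rows):
    """Sum monthly cost from 'sizeGb,type' rows (gcloud csv[no-heading] output)."""
    total = 0.0
    for row in csv_rows:
        row = row.strip()
        if not row:
            continue
        size_gb, disk_type = row.split(",")
        cost = int(size_gb) * PRICES.get(disk_type, 0.040)
        print(f"{disk_type:<12} {int(size_gb):>6} GB  ${cost:.2f}/month")
        total += cost
    return round(total, 2)

# The four disks from the Step 1 audit:
rows = ["500,pd-ssd", "200,pd-standard", "1000,pd-ssd", "100,pd-balanced"]
print(f"Total: ${total_orphan_cost(rows):.2f}/month")  # Total: $273.00/month
```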

🧹 Step 2: Find Old Snapshots

Snapshots are cheaper per GB but they accumulate fast, especially with automated schedules:

# Find snapshots older than 90 days
# (uses GNU date; on macOS swap in: date -u -v-90d +%Y-%m-%dT%H:%M:%SZ)
gcloud compute snapshots list \
  --filter="creationTimestamp < $(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --format="table(
    name,
    diskSizeGb,
    storageBytes.yesno(yes='Active', no=''),
    creationTimestamp.date(),
    sourceDisk.basename()
  )" \
  --project=YOUR_PROJECT_ID

โš ๏ธ Don't blindly delete snapshots. Some might be your only backup of critical data. Always verify the source disk still exists and has recent backups before deleting old snapshots.

🔧 Step 3: Prevent Orphaned Disks in Terraform

The root cause is auto_delete = false on attached disks. Fix this in your Terraform configs:

resource "google_compute_instance" "app" {
  name         = "app-server"
  machine_type = "e2-custom-2-5120"
  zone         = var.zone

  # Boot disk - auto_delete is true by default (good!)
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
      size  = 20
    }
  }

  # Attached disk - auto_delete can't be set here (see the gotcha below)
  attached_disk {
    source      = google_compute_disk.data.id
    device_name = "data-disk"
  }

  labels = local.common_labels

  # This lifecycle block prevents the VM from being recreated
  # when disk changes occur
  lifecycle {
    ignore_changes = [attached_disk]
  }
}

resource "google_compute_disk" "data" {
  name = "app-data-disk"
  type = "pd-ssd"
  zone = var.zone
  size = 50

  labels = merge(local.common_labels, {
    attached-to = "app-server"
    purpose     = "application-data"
  })
}

โš ๏ธ Gotcha: The auto_delete flag in the attached_disk block is NOT the same as in the boot_disk block. For attached_disk, you need to set it via the google_compute_attached_disk resource or through the GCP API. Terraform's attached_disk block doesn't directly support auto_delete. The safest approach is to manage disk lifecycle through Terraform state and labels.

The labeling trick: Always add an attached-to label to your disks. This makes it trivial to find orphans later, because any disk with an attached-to label that points to a non-existent VM is an orphan.
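With that convention, orphan detection becomes a set lookup. A minimal sketch (`label_orphans` is a hypothetical helper over plain dicts; in practice you'd build the inputs from the compute_v1 disk and instance listings, as in the Cloud Function in Step 4):

```python
def label_orphans(disks, live_vm_names):
    """Disks whose attached-to label points at a VM that no longer exists.

    `disks`: list of {'name': ..., 'labels': {...}} dicts;
    `live_vm_names`: set of current instance names in the project.
    """
    return [
        d["name"]
        for d in disks
        if "attached-to" in d.get("labels", {})
        and d["labels"]["attached-to"] not in live_vm_names
    ]

disks = [
    {"name": "app-data-disk", "labels": {"attached-to": "app-server"}},
    {"name": "old-api-data", "labels": {"attached-to": "api-server-v1"}},
    {"name": "unlabeled-disk", "labels": {}},
]
print(label_orphans(disks, {"app-server"}))  # ['old-api-data']
```

Disks with no label at all aren't flagged here, which is exactly why labeling everything up front pays off.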

🤖 Step 4: Automated Orphan Detection with Cloud Function

Deploy a Cloud Function that runs weekly, finds orphaned disks, and alerts your team on Slack:

# Cloud Scheduler triggers weekly orphan scan
resource "google_cloud_scheduler_job" "orphan_scan" {
  name     = "weekly-orphan-disk-scan"
  schedule = "0 9 * * MON"  # Every Monday at 9 AM
  time_zone = "America/Los_Angeles"

  http_target {
    http_method = "POST"
    uri         = google_cloudfunctions2_function.orphan_detector.url

    oidc_token {
      service_account_email = var.scheduler_sa_email
    }
  }
}

The Cloud Function logic:

from google.cloud import compute_v1

def find_orphaned_disks(project_id):
    """Find all unattached disks and calculate waste."""
    client = compute_v1.DisksClient()
    orphans = []
    total_cost = 0

    # List disks across all zones
    request = compute_v1.AggregatedListDisksRequest(project=project_id)
    for zone, response in client.aggregated_list(request=request):
        if response.disks:
            for disk in response.disks:
                # No users = unattached = orphan
                if not disk.users:
                    size_gb = disk.size_gb
                    disk_type = disk.type_.split("/")[-1]

                    if "pd-ssd" in disk_type:
                        monthly_cost = size_gb * 0.170
                    elif "pd-balanced" in disk_type:
                        monthly_cost = size_gb * 0.100
                    else:
                        monthly_cost = size_gb * 0.040

                    total_cost += monthly_cost
                    orphans.append({
                        "name": disk.name,
                        "zone": zone.split("/")[-1],
                        "size_gb": size_gb,
                        "type": disk_type,
                        "monthly_cost": round(monthly_cost, 2),
                        "last_detach": disk.last_detach_timestamp or "never"
                    })

    return orphans, round(total_cost, 2)

This posts a Slack message every Monday like:

🪦 Orphaned Disk Report - 2026-02-27
Found 8 unattached disks costing $340/month ($4,080/year)

Top offenders:
  experiment-ml-data    1000 GB pd-ssd     $170.00/mo (detached 8 months ago)
  old-api-data           500 GB pd-ssd      $85.00/mo (detached 6 months ago)
  staging-worker-disk    200 GB pd-standard  $8.00/mo (detached 4 months ago)

Action: Review and delete, or snapshot and delete.
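A small formatter can assemble that message from the find_orphaned_disks output; posting it is then just an HTTP POST to your Slack webhook URL. A sketch (`format_report` is an illustrative helper, not part of any SDK):

```python
def format_report(orphans, total_cost, date):
    """Build the Slack report text from a list of orphan dicts
    (name, size_gb, type, monthly_cost) like the scan above produces."""
    lines = [
        f"🪦 Orphaned Disk Report - {date}",
        f"Found {len(orphans)} unattached disks costing "
        f"${total_cost:.0f}/month (${total_cost * 12:.0f}/year)",
        "",
        "Top offenders:",
    ]
    for d in sorted(orphans, key=lambda o: o["monthly_cost"], reverse=True)[:3]:
        lines.append(
            f"  {d['name']:<22} {d['size_gb']:>5} GB {d['type']:<12}"
            f" ${d['monthly_cost']:.2f}/mo"
        )
    lines.append("")
    lines.append("Action: Review and delete, or snapshot and delete.")
    return "\n".join(lines)

report = format_report(
    [{"name": "old-api-data", "size_gb": 500, "type": "pd-ssd", "monthly_cost": 85.0}],
    85.0,
    "2026-02-27",
)
print(report)
```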

📸 Step 5: Snapshot Schedule with Automatic Cleanup

If you need snapshots for backups, set a schedule that automatically cleans up old ones:

resource "google_compute_resource_policy" "daily_backup" {
  name   = "daily-backup-7day-retention"
  region = var.region

  snapshot_schedule_policy {
    schedule {
      daily_schedule {
        days_in_cycle = 1
        start_time    = "04:00"  # 4 AM UTC
      }
    }

    retention_policy {
      max_retention_days    = 7      # Keep only 7 days of snapshots
      on_source_disk_delete = "KEEP_AUTO_SNAPSHOTS"
    }

    snapshot_properties {
      labels = merge(local.common_labels, {
        snapshot-type = "automated-daily"
      })
      storage_locations = [var.region]
    }
  }
}

# Attach policy to disks that need backups
resource "google_compute_disk_resource_policy_attachment" "backup" {
  for_each = toset(var.disks_to_backup)

  name = google_compute_resource_policy.daily_backup.name
  disk = each.value
  zone = var.zone
}

Without a retention policy, snapshot schedules create snapshots forever. A daily schedule on a 500 GB disk accumulates ~365 snapshots in a year. Snapshots are incremental, but in the worst case (every snapshot a full copy) that's 365 × 500 GB × $0.026 ≈ $4,745/month by year's end, and still growing. A 7-day retention policy caps it at 7 snapshots, roughly $91/month worst case. ✅
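The arithmetic, as a worst-case sketch (`worst_case_snapshot_cost` is illustrative; it assumes every retained snapshot is a full copy, so treat its results as upper bounds — real incremental snapshots cost less):

```python
SNAPSHOT_PRICE = 0.026  # standard snapshot $/GB/month, as quoted above

def worst_case_snapshot_cost(disk_gb, snapshots_retained):
    """Upper-bound monthly bill if every retained snapshot were a full copy."""
    return round(snapshots_retained * disk_gb * SNAPSHOT_PRICE, 2)

# Daily schedule, no retention: ~365 snapshots piled up after a year
print(worst_case_snapshot_cost(500, 365))  # 4745.0
# Daily schedule, 7-day retention: steady state of 7 snapshots
print(worst_case_snapshot_cost(500, 7))    # 91.0
```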

๐Ÿข Terraform State Audit: Find Forgotten Resources

Terraform state itself can reveal orphans. Resources in state but not in your .tf files, or resources that exist in GCP but not in state:

# List all disks in Terraform state
terraform state list | grep "google_compute_disk"

# Compare with actual GCP disks
gcloud compute disks list --format="value(name)" --project=YOUR_PROJECT_ID

# Find disks in GCP that aren't in Terraform state (potential orphans)
comm -23 \
  <(gcloud compute disks list --format="value(name)" --project=YOUR_PROJECT_ID | sort) \
  <(terraform state list | grep "google_compute_disk" | \
    xargs -I{} terraform state show {} 2>/dev/null | \
    grep -E '^\s*name\s*=' | awk -F'"' '{print $2}' | sort) \
  | head -20

Any disk that exists in GCP but NOT in Terraform state is either manually created or orphaned from a deleted Terraform resource. Both deserve investigation.
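If comm isn't handy, the same comparison is just a set difference. A trivial sketch (`unmanaged_disks` is illustrative; feed it the two name lists produced by the commands above):

```python
def unmanaged_disks(gcp_disk_names, tf_disk_names):
    """Disks that exist in GCP but are absent from Terraform state."""
    return sorted(set(gcp_disk_names) - set(tf_disk_names))

print(unmanaged_disks(
    ["app-data-disk", "old-api-data", "temp-migration-disk"],
    ["app-data-disk"],
))  # ['old-api-data', 'temp-migration-disk']
```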

💡 Quick Reference: What to Do First

Action                                      Effort   Savings
Run orphaned disk audit (gcloud)            5 min    Identifies all waste immediately
Delete confirmed orphaned disks             10 min   15-25% of disk storage costs
Add attached-to labels to all disks         15 min   Makes future orphans easy to find
Set up snapshot retention policies          10 min   Prevents snapshot cost explosion
Deploy weekly orphan scan Cloud Function    30 min   Catches new orphans automatically
Audit Terraform state vs GCP resources      15 min   Finds drift and forgotten resources

Start with the audit. Run that gcloud compute disks list --filter="NOT users:*" command right now. You will find orphans. Everyone does. 🎯

📊 TL;DR

Orphaned disks              = unattached disks billing you silently
Default behavior            = attached disks survive VM deletion
pd-ssd at 500 GB            = $85/month wasted per orphan
Snapshots without retention = accumulate forever, cost adds up fast
gcloud filter "NOT users:*" = finds all unattached disks instantly
attached-to label           = tracks which VM a disk belongs to
Snapshot retention policy   = auto-delete old snapshots (use 7-14 days)
Weekly Cloud Function scan  = catches new orphans before they age
Terraform state audit       = finds resources not managed by IaC
15-25% storage savings      = typical from cleaning up orphans

Bottom line: Every team that has ever deleted a VM has orphaned disks. They're invisible in daily operations but visible on every monthly bill. Five minutes of auditing finds them, and one gcloud compute disks delete cleans them up. Do it today. 💀


Run `gcloud compute disks list --filter="NOT users:*"` on your project right now. I guarantee you'll find at least one disk from a VM that was deleted months ago. It's been on your bill this whole time. 😀

Found this helpful? Follow for more GCP cost optimization with Terraform! 💬
