You're running 10 staging VMs around the clock. Your developers work 9-to-5, Monday to Friday. That's 128 hours every week at full capacity for nothing. But shutting down completely is risky since some devs work late or across timezones. The fix? Scale to minimum during off hours. Always on, never full price.
Let's do the math. A team running 10 x e2-standard-4 instances in staging pays $970/month. But only 2-3 developers might work outside business hours. You don't need 10 instances at 2 AM. You need one or two.
Scale from 10 instances to 2 during off hours and you save around $600/month on staging alone. Nobody gets locked out, the late-night coder still has a working environment, and your bill drops by over 60%.
The key principle: never scale to zero, always keep a minimum running.
📊 The Scaling Math
| Scenario | Peak Hours (40hr/wk) | Off Hours (128hr/wk) | Monthly Cost (e2-standard-4) | vs 24/7 |
|---|---|---|---|---|
| 24/7 full capacity (10 VMs) | 10 instances | 10 instances | ~$970 | baseline |
| Scale to min 2 off-hours | 10 instances | 2 instances | ~$380 | 61% saved |
| Scale to min 1 off-hours | 10 instances | 1 instance | ~$305 | 69% saved |
| Shut down (DON'T do this) | 10 instances | 0 instances | ~$230 | Devs locked out 🚫 |
Scaling to minimum gives you almost the same savings as a full shutdown without the risk of blocking anyone.
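A quick sanity check of that proration math, sketched in Python. This assumes a flat ~$97/month per e2-standard-4 (derived from the ~$970 figure for 10 VMs); real pricing varies by region and any committed-use or sustained-use discounts.

```python
# Hedged assumption: ~$97/month per e2-standard-4, from the ~$970-for-10 figure.
PRICE_PER_INSTANCE = 970 / 10

def blended_monthly_cost(peak_count, off_count, peak_hours=40, total_hours=168):
    """Prorate monthly cost by weekly instance-hours across peak and off-hours."""
    off_hours = total_hours - peak_hours
    instance_hours = peak_count * peak_hours + off_count * off_hours
    full_hours = peak_count * total_hours
    return PRICE_PER_INSTANCE * peak_count * instance_hours / full_hours

full = blended_monthly_cost(10, 10)  # 24/7 full capacity
min2 = blended_monthly_cost(10, 2)   # scale to 2 off-hours
print(f"min-2: ${min2:.0f}/mo, saves {1 - min2/full:.0%}")
```

Swap in your own instance price and headcounts; the proration logic stays the same.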
🔧 Step 1: MIG Autoscaling with Scheduled Scaling (Best Approach)
Managed Instance Groups (MIGs) support scheduled autoscaling natively. This is the cleanest solution: run more instances during business hours, scale to a minimum during off hours, never hit zero.
```hcl
# Instance template for your dev/staging workers
resource "google_compute_instance_template" "staging_worker" {
  name_prefix  = "staging-worker-"
  machine_type = "e2-standard-4"
  region       = var.region

  disk {
    source_image = "debian-cloud/debian-12"
    auto_delete  = true
    boot         = true
    disk_size_gb = 20
  }

  network_interface {
    network    = var.network_id
    subnetwork = var.subnet_id
  }

  labels = merge(local.common_labels, {
    schedule = "business-hours-scaling"
  })

  lifecycle {
    create_before_destroy = true
  }
}
```
```hcl
# Regional MIG
resource "google_compute_region_instance_group_manager" "staging" {
  name               = "staging-worker-mig"
  base_instance_name = "staging-worker"
  region             = var.region

  version {
    instance_template = google_compute_instance_template.staging_worker.id
  }

  # Sets the initial size only; the autoscaler manages it from there
  target_size = var.peak_instance_count

  # Stop Terraform from resetting the size the autoscaler has chosen
  lifecycle {
    ignore_changes = [target_size]
  }
}
```
Now add the autoscaler with time-based scaling:
```hcl
resource "google_compute_region_autoscaler" "staging" {
  name   = "staging-scheduled-scaler"
  region = var.region
  target = google_compute_region_instance_group_manager.staging.id

  autoscaling_policy {
    min_replicas    = var.off_hours_min_instances # e.g., 2
    max_replicas    = var.peak_instance_count     # e.g., 10
    cooldown_period = 60

    # CPU-based scaling during the day
    cpu_utilization {
      target = 0.6
    }

    # Scale UP for business hours
    scaling_schedules {
      name                  = "weekday-peak"
      min_required_replicas = var.peak_instance_count # e.g., 10
      schedule              = "0 8 * * MON-FRI"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 36000 # 10 hours (8am - 6pm)
      description           = "Full capacity during business hours"
    }

    # Weekend: keep minimum alive
    scaling_schedules {
      name                  = "weekend-minimum"
      min_required_replicas = var.off_hours_min_instances # e.g., 2
      schedule              = "0 0 * * SAT"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 172800 # 48 hours (all weekend)
      description           = "Minimum capacity on weekends"
    }
  }
}
```
```hcl
variable "peak_instance_count" {
  type        = number
  default     = 10
  description = "Number of instances during business hours"
}

variable "off_hours_min_instances" {
  type        = number
  default     = 2
  description = "Minimum instances during off hours (never 0!)"

  validation {
    condition     = var.off_hours_min_instances >= 1
    error_message = "Minimum instances must be at least 1. Never scale to zero."
  }
}
```
⚠️ Key detail: a `scaling_schedules` block sets a minimum floor for that time window. Outside scheduled windows, the autoscaler falls back to `min_replicas`. So your off-hours minimum is controlled by `min_replicas`, and your peak is controlled by the schedule's `min_required_replicas`. Both are always >= 1.
🌍 Step 2: Multi-Timezone Team Schedules
Your US team works 9-5 PST. Your India team works 9-5 IST. The overlap is small, so you need staggered peak windows instead of one flat schedule:
```hcl
resource "google_compute_region_autoscaler" "global_team" {
  name   = "global-team-scaler"
  region = var.region
  target = google_compute_region_instance_group_manager.staging.id

  autoscaling_policy {
    min_replicas    = 2 # Always keep 2 alive
    max_replicas    = 15
    cooldown_period = 60

    cpu_utilization {
      target = 0.6
    }

    # India team: 9am-9pm IST (Mon-Fri)
    scaling_schedules {
      name                  = "india-peak"
      min_required_replicas = 8
      schedule              = "0 9 * * MON-FRI"
      time_zone             = "Asia/Kolkata"
      duration_sec          = 43200 # 12 hours
      description           = "India team business hours"
    }

    # US team: 8am-6pm PST (Mon-Fri)
    scaling_schedules {
      name                  = "us-peak"
      min_required_replicas = 10
      schedule              = "0 8 * * MON-FRI"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 36000 # 10 hours
      description           = "US team business hours"
    }

    # Weekend skeleton crew
    scaling_schedules {
      name                  = "weekend-min"
      min_required_replicas = 2
      schedule              = "0 0 * * SAT"
      time_zone             = "UTC"
      duration_sec          = 172800
      description           = "Weekend minimum"
    }
  }
}
```
When schedules overlap, the autoscaler uses the highest `min_required_replicas` value. So during any US+India overlap window, you get the US peak of 10. Outside both windows, you fall back to the base `min_replicas` of 2.
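To see how the floors combine, here's a small Python model of that rule. This is a sketch of the documented behavior, not the autoscaler's actual code: the schedule tuples are hypothetical, and it ignores windows that cross midnight.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# (name, tz, start_hour, duration_hours, active weekdays 0=Mon..6=Sun, floor)
SCHEDULES = [
    ("india-peak", "Asia/Kolkata", 9, 12, range(0, 5), 8),
    ("us-peak", "America/Los_Angeles", 8, 10, range(0, 5), 10),
]
MIN_REPLICAS = 2  # base floor when no schedule is active

def effective_min(utc_dt):
    """Highest min_required_replicas among active schedules, else min_replicas."""
    floors = [MIN_REPLICAS]
    for name, tz, start, dur, days, floor in SCHEDULES:
        local = utc_dt.astimezone(ZoneInfo(tz))
        if local.weekday() in days and start <= local.hour < start + dur:
            floors.append(floor)
    return max(floors)

# Tuesday 18:00 UTC = 10:00 PST (US window active), 23:30 IST (inactive)
print(effective_min(datetime(2024, 1, 16, 18, 0, tzinfo=ZoneInfo("UTC"))))  # 10
```

The `max()` over active floors is the whole trick: each schedule only ever raises the minimum, never lowers it.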
🗄️ Step 3: Scale Down Cloud SQL (Without Stopping It)
Cloud SQL can't be added to a MIG, but you can downsize the tier during off hours instead of stopping the instance. This keeps it available while cutting costs:
```hcl
# Cloud Scheduler job to downsize Cloud SQL at night
resource "google_cloud_scheduler_job" "sql_downsize" {
  name      = "downsize-staging-sql"
  schedule  = "0 19 * * MON-FRI"
  time_zone = "America/Los_Angeles"

  http_target {
    http_method = "PATCH"
    uri         = "https://sqladmin.googleapis.com/v1/projects/${var.project_id}/instances/${var.sql_instance_name}"

    headers = {
      "Content-Type" = "application/json"
    }

    body = base64encode(jsonencode({
      settings = {
        tier = "db-custom-1-3840" # Scale down to 1 vCPU, 3.75 GB
      }
    }))

    oauth_token {
      service_account_email = var.scheduler_sa_email
    }
  }
}
```
```hcl
# Cloud Scheduler job to upsize Cloud SQL in the morning
resource "google_cloud_scheduler_job" "sql_upsize" {
  name      = "upsize-staging-sql"
  schedule  = "0 7 * * MON-FRI"
  time_zone = "America/Los_Angeles"

  http_target {
    http_method = "PATCH"
    uri         = "https://sqladmin.googleapis.com/v1/projects/${var.project_id}/instances/${var.sql_instance_name}"

    headers = {
      "Content-Type" = "application/json"
    }

    body = base64encode(jsonencode({
      settings = {
        tier = "db-custom-4-16384" # Scale up to 4 vCPU, 16 GB
      }
    }))

    oauth_token {
      service_account_email = var.scheduler_sa_email
    }
  }
}
```
⚠️ Gotcha: Cloud SQL tier changes cause a brief restart (1-3 minutes). Schedule the upsize comfortably before your team arrives (the example above runs at 7:00 AM for an 8:00 AM start) so it's ready by the time they log in. The downsize at night won't affect anyone.
Cost comparison for Cloud SQL:
| Tier | vCPUs | RAM | Monthly Cost |
|---|---|---|---|
| db-custom-4-16384 (peak) | 4 | 16 GB | ~$200 |
| db-custom-1-3840 (off-hours) | 1 | 3.75 GB | ~$50 |
| Blended (40hr peak + 128hr min) | | | ~$86 |
That's a 57% savings on Cloud SQL while keeping it available 24/7. ✅
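The blended figure is just a time-weighted average. A quick check, using the table's rough ~$200 and ~$50 list prices (actual Cloud SQL pricing varies by region and edition):

```python
# Hedged prices from the table above; substitute your region's actual rates.
PEAK_TIER_MONTHLY = 200  # db-custom-4-16384
OFF_TIER_MONTHLY = 50    # db-custom-1-3840

def blended(peak_price, off_price, peak_hours=40, total_hours=168):
    """Time-weighted monthly cost across peak and off-hours tiers."""
    off_hours = total_hours - peak_hours
    return (peak_price * peak_hours + off_price * off_hours) / total_hours

cost = blended(PEAK_TIER_MONTHLY, OFF_TIER_MONTHLY)
print(f"blended: ${cost:.0f}/mo, saves {1 - cost/PEAK_TIER_MONTHLY:.0%}")
```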
🏗️ Step 4: GKE Node Pool Scaling
For GKE clusters, use node pool autoscaling with a minimum that never hits zero:
```hcl
resource "google_container_node_pool" "staging_pool" {
  name     = "staging-pool"
  cluster  = google_container_cluster.staging.id
  location = var.region

  # Always keep at least 1 node per zone
  autoscaling {
    min_node_count = 1 # Never zero!
    max_node_count = 5
  }

  node_config {
    machine_type = "e2-standard-4"
    labels = merge(local.common_labels, {
      pool-type = "staging"
    })
  }
}
```
For more aggressive off-hours scaling, combine this with a CronJob inside the cluster that adjusts Deployment replica counts:
```yaml
# k8s CronJob to scale down staging Deployments at 7pm
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 19 * * 1-5" # 7pm Mon-Fri
  timeZone: "America/Los_Angeles" # requires Kubernetes 1.27+; otherwise UTC
  jobTemplate:
    spec:
      template:
        spec:
          # Hypothetical ServiceAccount; needs RBAC permission to scale Deployments
          serviceAccountName: deployment-scaler
          containers:
            - name: scaler
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=1 -n staging
          restartPolicy: OnFailure
```
With fewer pods running, the GKE cluster autoscaler will drain and remove excess nodes. But min_node_count = 1 ensures at least one node stays up.
💡 Quick Reference: What to Scale First
| Resource | Method | Off-Hours Strategy | Effort | Savings |
|---|---|---|---|---|
| VM fleet (MIG) | Scheduled autoscaler | 10 instances -> 2 | 15 min | 60-65% |
| Cloud SQL | Cloud Scheduler tier change | 4 vCPU -> 1 vCPU | 10 min | 50-57% |
| GKE node pool | Node pool autoscaling | 5 nodes -> 1 node | 10 min | 60-70% |
| GPU instances | Scheduled autoscaler | 4 GPUs -> 1 GPU | 10 min | 65-75% |
Start with your MIG. Scheduled autoscaling is native, requires no Cloud Functions, and delivers the biggest savings. 🎯
📊 TL;DR
- Never shut down dev = some devs work late, off-hours, weekends
- Never scale to zero = always keep minimum instances alive
- MIG scheduled scaling = native GCP, best for VM fleets
- `scaling_schedules` = sets a min floor per time window
- Off-hours `min_replicas` = 1-2 instances (validated >= 1 in Terraform)
- Cloud SQL tier downsizing = scale vCPUs down, don't stop the instance
- GKE `min_node_count` >= 1 = cluster stays available, excess nodes drain
- Multi-timezone = stagger schedules; highest min wins the overlap
- GPU instances = scale down FIRST (biggest $$$ impact)
Bottom line: You don't have to choose between saving money and being available. Scale to minimum during off hours, keep the lights on for the night owls, and stop paying full price for 128 empty hours every week. 💤
Check your staging MIG right now. If it's running the same instance count at 3 AM as it does at 3 PM, you're burning over 60% of that budget for nothing. One autoscaler with a scaling schedule fixes it today. 😀
Found this helpful? Follow for more GCP cost optimization with Terraform! 💬