You're running 10 staging VMs around the clock. Your developers work 9-to-5, Monday to Friday. That's 128 hours every week at full capacity for nothing. But shutting down completely is risky since some devs work late or across timezones. The fix? Scale to minimum during off hours. Always on, never full price.
Let's do the math. A team running 10 x e2-standard-4 instances in staging pays $970/month. But only 2-3 developers might work outside business hours. You don't need 10 instances at 2 AM. You need one or two.
Scale from 10 instances to 2 during off hours and you save around $600/month on staging alone. Nobody gets locked out, the late-night coder still has a working environment, and your bill drops by over 60%.
The key principle: never scale to zero, always keep a minimum running.
📊 The Scaling Math
| Scenario | Peak Hours (40hr/wk) | Off Hours (128hr/wk) | Monthly Cost (e2-standard-4) | vs 24/7 |
|---|---|---|---|---|
| 24/7 full capacity (10 VMs) | 10 instances | 10 instances | ~$970 | baseline |
| Scale to min 2 off-hours | 10 instances | 2 instances | ~$380 | 61% saved |
| Scale to min 1 off-hours | 10 instances | 1 instance | ~$305 | 69% saved |
| Shut down (DON'T do this) | 10 instances | 0 instances | ~$230 | Devs locked out 🚫 |
Scaling to minimum gives you almost the same savings as a full shutdown without the risk of blocking anyone.
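A quick sanity check of that proration math, sketched in Python. This assumes a flat ~$97/month per e2-standard-4 (derived from the ~$970 figure for 10 VMs); real pricing varies by region and any committed-use or sustained-use discounts.

```python
# Hedged assumption: ~$97/month per e2-standard-4, from the ~$970-for-10 figure.
PRICE_PER_INSTANCE = 970 / 10

def blended_monthly_cost(peak_count, off_count, peak_hours=40, total_hours=168):
    """Prorate monthly cost by weekly instance-hours across peak and off-hours."""
    off_hours = total_hours - peak_hours
    instance_hours = peak_count * peak_hours + off_count * off_hours
    full_hours = peak_count * total_hours
    return PRICE_PER_INSTANCE * peak_count * instance_hours / full_hours

full = blended_monthly_cost(10, 10)  # 24/7 full capacity
min2 = blended_monthly_cost(10, 2)   # scale to 2 off-hours
print(f"min-2: ${min2:.0f}/mo, saves {1 - min2/full:.0%}")
```

Swap in your own instance price and headcounts; the proration logic stays the same.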
🔧 Step 1: MIG Autoscaling with Scheduled Scaling (Best Approach)
Managed Instance Groups (MIGs) support scheduled autoscaling natively. This is the cleanest solution: run more instances during business hours, scale to a minimum during off hours, never hit zero.
```hcl
# Instance template for your dev/staging workers
resource "google_compute_instance_template" "staging_worker" {
  name_prefix  = "staging-worker-"
  machine_type = "e2-standard-4"
  region       = var.region

  disk {
    source_image = "debian-cloud/debian-12"
    auto_delete  = true
    boot         = true
    disk_size_gb = 20
  }

  network_interface {
    network    = var.network_id
    subnetwork = var.subnet_id
  }

  labels = merge(local.common_labels, {
    schedule = "business-hours-scaling"
  })

  lifecycle {
    create_before_destroy = true
  }
}
```
```hcl
# Regional MIG
resource "google_compute_region_instance_group_manager" "staging" {
  name               = "staging-worker-mig"
  base_instance_name = "staging-worker"
  region             = var.region

  version {
    instance_template = google_compute_instance_template.staging_worker.id
  }

  # Sets the initial size only; the autoscaler manages it from there
  target_size = var.peak_instance_count

  # Stop Terraform from resetting the size the autoscaler has chosen
  lifecycle {
    ignore_changes = [target_size]
  }
}
```
Now add the autoscaler with time-based scaling:
```hcl
resource "google_compute_region_autoscaler" "staging" {
  name   = "staging-scheduled-scaler"
  region = var.region
  target = google_compute_region_instance_group_manager.staging.id

  autoscaling_policy {
    min_replicas    = var.off_hours_min_instances # e.g., 2
    max_replicas    = var.peak_instance_count     # e.g., 10
    cooldown_period = 60

    # CPU-based scaling during the day
    cpu_utilization {
      target = 0.6
    }

    # Scale UP for business hours
    scaling_schedules {
      name                  = "weekday-peak"
      min_required_replicas = var.peak_instance_count # e.g., 10
      schedule              = "0 8 * * MON-FRI"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 36000 # 10 hours (8am - 6pm)
      description           = "Full capacity during business hours"
    }

    # Weekend: keep minimum alive
    scaling_schedules {
      name                  = "weekend-minimum"
      min_required_replicas = var.off_hours_min_instances # e.g., 2
      schedule              = "0 0 * * SAT"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 172800 # 48 hours (all weekend)
      description           = "Minimum capacity on weekends"
    }
  }
}
```
```hcl
variable "peak_instance_count" {
  type        = number
  default     = 10
  description = "Number of instances during business hours"
}

variable "off_hours_min_instances" {
  type        = number
  default     = 2
  description = "Minimum instances during off hours (never 0!)"

  validation {
    condition     = var.off_hours_min_instances >= 1
    error_message = "Minimum instances must be at least 1. Never scale to zero."
  }
}
```
⚠️ Key detail: a `scaling_schedules` block sets a minimum floor for that time window. Outside scheduled windows, the autoscaler falls back to `min_replicas`. So your off-hours minimum is controlled by `min_replicas`, and your peak is controlled by the schedule's `min_required_replicas`. Both are always >= 1.
🌍 Step 2: Multi-Timezone Team Schedules
Your US team works 9-5 PST. Your India team works 9-5 IST. The overlap is small, so you need staggered peak windows instead of one flat schedule:
```hcl
resource "google_compute_region_autoscaler" "global_team" {
  name   = "global-team-scaler"
  region = var.region
  target = google_compute_region_instance_group_manager.staging.id

  autoscaling_policy {
    min_replicas    = 2 # Always keep 2 alive
    max_replicas    = 15
    cooldown_period = 60

    cpu_utilization {
      target = 0.6
    }

    # India team: 9am-9pm IST (Mon-Fri)
    scaling_schedules {
      name                  = "india-peak"
      min_required_replicas = 8
      schedule              = "0 9 * * MON-FRI"
      time_zone             = "Asia/Kolkata"
      duration_sec          = 43200 # 12 hours
      description           = "India team business hours"
    }

    # US team: 8am-6pm PST (Mon-Fri)
    scaling_schedules {
      name                  = "us-peak"
      min_required_replicas = 10
      schedule              = "0 8 * * MON-FRI"
      time_zone             = "America/Los_Angeles"
      duration_sec          = 36000 # 10 hours
      description           = "US team business hours"
    }

    # Weekend skeleton crew
    scaling_schedules {
      name                  = "weekend-min"
      min_required_replicas = 2
      schedule              = "0 0 * * SAT"
      time_zone             = "UTC"
      duration_sec          = 172800
      description           = "Weekend minimum"
    }
  }
}
```
When schedules overlap, the autoscaler uses the highest `min_required_replicas` value. So during any US+India overlap window, you get the US peak of 10. Outside both windows, you fall back to the base `min_replicas` of 2.
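To see how the floors combine, here's a small Python model of that rule. This is a sketch of the documented behavior, not the autoscaler's actual code: the schedule tuples are hypothetical, and it ignores windows that cross midnight.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# (name, tz, start_hour, duration_hours, active weekdays 0=Mon..6=Sun, floor)
SCHEDULES = [
    ("india-peak", "Asia/Kolkata", 9, 12, range(0, 5), 8),
    ("us-peak", "America/Los_Angeles", 8, 10, range(0, 5), 10),
]
MIN_REPLICAS = 2  # base floor when no schedule is active

def effective_min(utc_dt):
    """Highest min_required_replicas among active schedules, else min_replicas."""
    floors = [MIN_REPLICAS]
    for name, tz, start, dur, days, floor in SCHEDULES:
        local = utc_dt.astimezone(ZoneInfo(tz))
        if local.weekday() in days and start <= local.hour < start + dur:
            floors.append(floor)
    return max(floors)

# Tuesday 18:00 UTC = 10:00 PST (US window active), 23:30 IST (inactive)
print(effective_min(datetime(2024, 1, 16, 18, 0, tzinfo=ZoneInfo("UTC"))))  # 10
```

The `max()` over active floors is the whole trick: each schedule only ever raises the minimum, never lowers it.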
🗄️ Step 3: Scale Down Cloud SQL (Without Stopping It)
Cloud SQL can't be added to a MIG, but you can downsize the tier during off hours instead of stopping the instance. This keeps it available while cutting costs:
```hcl
# Cloud Scheduler job to downsize Cloud SQL at night
resource "google_cloud_scheduler_job" "sql_downsize" {
  name      = "downsize-staging-sql"
  schedule  = "0 19 * * MON-FRI"
  time_zone = "America/Los_Angeles"

  http_target {
    http_method = "PATCH"
    uri         = "https://sqladmin.googleapis.com/v1/projects/${var.project_id}/instances/${var.sql_instance_name}"

    headers = {
      "Content-Type" = "application/json"
    }

    body = base64encode(jsonencode({
      settings = {
        tier = "db-custom-1-3840" # Scale down to 1 vCPU, 3.75 GB
      }
    }))

    oauth_token {
      service_account_email = var.scheduler_sa_email
    }
  }
}
```
```hcl
# Cloud Scheduler job to upsize Cloud SQL in the morning
resource "google_cloud_scheduler_job" "sql_upsize" {
  name      = "upsize-staging-sql"
  schedule  = "0 7 * * MON-FRI"
  time_zone = "America/Los_Angeles"

  http_target {
    http_method = "PATCH"
    uri         = "https://sqladmin.googleapis.com/v1/projects/${var.project_id}/instances/${var.sql_instance_name}"

    headers = {
      "Content-Type" = "application/json"
    }

    body = base64encode(jsonencode({
      settings = {
        tier = "db-custom-4-16384" # Scale up to 4 vCPU, 16 GB
      }
    }))

    oauth_token {
      service_account_email = var.scheduler_sa_email
    }
  }
}
```
⚠️ Gotcha: Cloud SQL tier changes cause a brief restart (1-3 minutes). Schedule the upsize comfortably before your team arrives (the example above runs at 7:00 AM for an 8:00 AM start) so it's ready by the time they log in. The downsize at night won't affect anyone.
Cost comparison for Cloud SQL:
| Tier | vCPUs | RAM | Monthly Cost |
|---|---|---|---|
| db-custom-4-16384 (peak) | 4 | 16 GB | ~$200 |
| db-custom-1-3840 (off-hours) | 1 | 3.75 GB | ~$50 |
| Blended (40hr peak + 128hr min) | | | ~$86 |
That's a 57% savings on Cloud SQL while keeping it available 24/7. ✅
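The blended figure is just a time-weighted average. A quick check, using the table's rough ~$200 and ~$50 list prices (actual Cloud SQL pricing varies by region and edition):

```python
# Hedged prices from the table above; substitute your region's actual rates.
PEAK_TIER_MONTHLY = 200  # db-custom-4-16384
OFF_TIER_MONTHLY = 50    # db-custom-1-3840

def blended(peak_price, off_price, peak_hours=40, total_hours=168):
    """Time-weighted monthly cost across peak and off-hours tiers."""
    off_hours = total_hours - peak_hours
    return (peak_price * peak_hours + off_price * off_hours) / total_hours

cost = blended(PEAK_TIER_MONTHLY, OFF_TIER_MONTHLY)
print(f"blended: ${cost:.0f}/mo, saves {1 - cost/PEAK_TIER_MONTHLY:.0%}")
```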
🏗️ Step 4: GKE Node Pool Scaling
For GKE clusters, use node pool autoscaling with a minimum that never hits zero:
```hcl
resource "google_container_node_pool" "staging_pool" {
  name     = "staging-pool"
  cluster  = google_container_cluster.staging.id
  location = var.region

  # Always keep at least 1 node per zone
  autoscaling {
    min_node_count = 1 # Never zero!
    max_node_count = 5
  }

  node_config {
    machine_type = "e2-standard-4"
    labels = merge(local.common_labels, {
      pool-type = "staging"
    })
  }
}
```
For more aggressive off-hours scaling, combine this with a CronJob inside the cluster that adjusts Deployment replica counts:
```yaml
# k8s CronJob to scale down staging Deployments at 7pm
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 19 * * 1-5" # 7pm Mon-Fri
  timeZone: "America/Los_Angeles" # requires Kubernetes 1.27+; otherwise UTC
  jobTemplate:
    spec:
      template:
        spec:
          # Hypothetical ServiceAccount; needs RBAC permission to scale Deployments
          serviceAccountName: deployment-scaler
          containers:
            - name: scaler
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=1 -n staging
          restartPolicy: OnFailure
```
With fewer pods running, the GKE cluster autoscaler will drain and remove excess nodes. But min_node_count = 1 ensures at least one node stays up.
💡 Quick Reference: What to Scale First
| Resource | Method | Off-Hours Strategy | Effort | Savings |
|---|---|---|---|---|
| VM fleet (MIG) | Scheduled autoscaler | 10 instances -> 2 | 15 min | 60-65% |
| Cloud SQL | Cloud Scheduler tier change | 4 vCPU -> 1 vCPU | 10 min | 50-57% |
| GKE node pool | Node pool autoscaling | 5 nodes -> 1 node | 10 min | 60-70% |
| GPU instances | Scheduled autoscaler | 4 GPUs -> 1 GPU | 10 min | 65-75% |
Start with your MIG. Scheduled autoscaling is native, requires no Cloud Functions, and delivers the biggest savings. 🎯
📊 TL;DR
- Never shut down dev = some devs work late, off-hours, weekends
- Never scale to zero = always keep minimum instances alive
- MIG scheduled scaling = native GCP, best for VM fleets
- `scaling_schedules` = sets a min floor per time window
- Off-hours `min_replicas` = 1-2 instances (validated >= 1 in Terraform)
- Cloud SQL tier downsizing = scale vCPUs down, don't stop the instance
- GKE `min_node_count` >= 1 = cluster stays available, excess nodes drain
- Multi-timezone = stagger schedules; highest min wins the overlap
- GPU instances = scale down FIRST (biggest $$$ impact)
Bottom line: You don't have to choose between saving money and being available. Scale to minimum during off hours, keep the lights on for the night owls, and stop paying full price for 128 empty hours every week. 💤
Check your staging MIG right now. If it's running the same instance count at 3 AM as it does at 3 PM, you're burning over 60% of that budget for nothing. One autoscaler with a scaling schedule fixes it today. 😀
Found this helpful? Follow for more GCP cost optimization with Terraform! 💬