Abraham Naiborhu
Terraforming a Production-Lite GCP Web Platform: MIG, Cloud NAT, Load Balancer, and Private Backends

Hi! After building my first Terraform artifact, The GCP Foundation Lite, I wanted to move one layer higher.

I wanted to answer a more complex question:

Can I provision infrastructure for an actual web platform?

So I built this project, "Production-Lite GCP Web Platform", with Terraform.

This project provisions:

  • custom VPC network
  • app subnet
  • reserved database subnet
  • map-based firewall rules
  • Cloud NAT
  • custom service account
  • regional Managed Instance Group
  • instance template
  • startup script
  • HTTP health check
  • backend service
  • external HTTP load balancer
  • remote Terraform state
  • reusable Terraform modules

The goal was not to build a full enterprise platform, but a small, production-shaped infrastructure pattern that can run an application.

Before I go further, the code is in my GitHub repository: terraform-gcp-production-lite-web-platform.

Why I Built This

In my first artifact, I focused on the foundation layer.

That project included:

  • remote Terraform state
  • versioned GCS state bucket
  • custom VPC
  • role-based subnets
  • firewall rules
  • service accounts
  • IAM bindings
  • reusable modules

That was useful because it helped me understand how to create the base layer of a Google Cloud environment.

But a foundation alone does not run an application.

So for this second artifact, I wanted to build something closer to a real web platform.

The objective was to move from this:

Terraform creates the foundation.

To this:

Terraform provisions infrastructure that can serve application traffic.

What This Project Builds

This project creates:

  • custom VPC network
  • application subnet
  • reserved database subnet
  • firewall rules
  • Cloud Router
  • Cloud NAT
  • custom application service account
  • instance template
  • regional Managed Instance Group
  • HTTP health check
  • backend service
  • external HTTP load balancer
  • global forwarding rule
  • startup script
  • simple web application endpoint
  • remote state in GCS

The application VMs are private.

They do not have external IP addresses.

Users access the application through the external HTTP load balancer.

Outbound internet access from the private VMs is handled through Cloud NAT.

High-Level Architecture

The high-level architecture is:

User
  ↓
External HTTP Load Balancer
  ↓
Backend Service
  ↓
Regional Managed Instance Group
  ↓
Private Application VM
  ↓
Application Endpoint

For outbound access:

Private Application VM
  ↓
Cloud NAT
  ↓
Internet

The important point is this:

Inbound traffic enters through the load balancer.
Outbound traffic leaves through Cloud NAT.
The backend VM does not need a public IP address.

Architecture Diagram

User / Browser
      |
      v
External HTTP Load Balancer
      |
      v
Target HTTP Proxy
      |
      v
URL Map
      |
      v
Backend Service
      |
      v
Regional Managed Instance Group
      |
      v
Private App VM(s)
      |
      v
Application running from startup script

Private App VM(s)
      |
      v
Cloud NAT
      |
      v
Outbound Internet

What Production-Lite Means

For this project, production-lite means the infrastructure follows production-style patterns without trying to become a full enterprise platform.

This project includes:

private backend instances
load balancer entry point
health checks
Managed Instance Group
Cloud NAT
service account separation
firewall rules
remote state
Terraform modules

But this version does not include:

HTTPS
custom domain
Cloud Armor
Cloud SQL
Secret Manager
CI/CD
multi-region deployment
blue-green deployment
Kubernetes

Those features are important, but I intentionally deferred them.

The goal of v1.0 is to keep the scope focused:

Can I provision a production-shaped HTTP web platform with private backends?

Why Private Backend Instances Matter

One of the most important design choices in this project is that the backend VMs do not have external IP addresses.

In the instance template, the network interface does not include an access_config block.

Conceptually, the pattern is:

network_interface {
  subnetwork = var.subnetwork_self_link

  # No access_config block.
  # This intentionally creates VMs without external IP addresses.
}

This means the VM is not directly exposed to the internet.

Instead, application traffic must go through the load balancer.

That is a better pattern than exposing a VM directly with a public IP address.

A direct public VM might be fine for a quick test, but for an application platform, I want a cleaner entry point:

Internet
  -> Load Balancer
  -> Backend Service
  -> Private VM

Why Cloud NAT Is Needed

Because the backend VMs are private, they do not have direct outbound internet access through an external IP.

However, the VM still needs outbound access for operational tasks such as:

  • running apt-get update
  • installing packages
  • downloading dependencies
  • calling external services
  • bootstrapping the application during startup

This is where Cloud NAT is useful.

Cloud NAT allows private resources to initiate outbound internet connections without assigning public IP addresses to those resources.

In this project, Cloud NAT is created using:

  • Cloud Router
  • Cloud NAT gateway
  • selected subnet configuration

The important improvement from my earlier lab is that I do not need to NAT every subnet.

For this artifact, the app subnet receives NAT.

The reserved database subnet does not receive NAT by default.

This is the intended posture:

app subnet -> Cloud NAT enabled
db subnet  -> Cloud NAT not enabled by default

That separation is small, but architecturally meaningful.

Why I Used a Managed Instance Group

A standalone VM is simpler.

But a standalone VM does not really show platform thinking.

For this artifact, I used a regional Managed Instance Group.

The MIG uses:

  • instance template
  • target size
  • named port
  • autohealing policy
  • health check

The difference is important.

A standalone VM says:

I created a server.

A Managed Instance Group says:

I defined how application instances should be created, managed, replaced, and checked.

That is a stronger infrastructure pattern.

Why I Used an External HTTP Load Balancer

The external HTTP load balancer acts as the public entry point.

The load balancer connects to the backend service, and the backend service connects to the Managed Instance Group.

The request flow is:

User
  -> Global forwarding rule
  -> Target HTTP proxy
  -> URL map
  -> Backend service
  -> Managed Instance Group
  -> Application VM

This is more realistic than opening port 80 directly on a public VM.

It also allows me to test important infrastructure concepts:

  • backend services
  • health checks
  • named ports
  • firewall rules for health check probes
  • load-balanced application access

Why Health Checks Matter

The load balancer needs to know whether the backend instances are healthy.

For that, I use an HTTP health check.

The application exposes:

GET /
GET /healthz
GET /metadata

The /healthz endpoint is used by the health check.

This is better than using / as the health check path because / is usually a user-facing route, while /healthz is explicitly meant for machine health checking.

The expected response is simple:

ok

A simple health endpoint is enough for this version.

The goal is not to build a complex application.

The goal is to prove that the infrastructure can route traffic to a healthy backend.
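
As a minimal sketch of the health check described above, assuming it is declared as a global HTTP health check: the resource name, the `var.app_port` input, and the interval and threshold values are illustrative assumptions, not values taken from the repository.

```hcl
# Sketch only: the name and timing values are assumptions for illustration.
resource "google_compute_health_check" "http" {
  name                = "${var.environment}-web-http-hc"
  check_interval_sec  = 10
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    # Probe the machine-facing endpoint rather than the user-facing root path.
    port         = var.app_port
    request_path = "/healthz"
  }
}
```

The same health check can be handed to both the MIG autohealing policy and the backend service, so the two agree on what "healthy" means.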

Application Endpoints

The sample application exposes three endpoints.

| Endpoint | Purpose |
| --- | --- |
| / | Root endpoint |
| /healthz | Health check endpoint |
| /metadata | Instance information endpoint |

The root endpoint returns:

Hi from Terraform GCP Production-Lite Platform

The health endpoint returns:

ok

The metadata endpoint returns information such as:

{
  "service": "terraform-gcp-production-lite-web-platform",
  "environment": "dev",
  "version": "1.0.0",
  "hostname": "dev-web-mig-xxxx"
}

The application is intentionally small.

Terraform and infrastructure design are the focus.

Repository Structure

The repository structure is:

terraform-gcp-production-lite-web-platform/
├── README.md
├── .gitignore
├── versions.tf
├── providers.tf
├── backend.tf.example
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars.example
├── locals.tf
├── docs/
│   ├── architecture.md
│   ├── deployment-runbook.md
│   ├── operations-runbook.md
│   ├── verification.md
│   ├── design-decisions.md
│   └── version-roadmap.md
├── scripts/
│   └── startup.sh
└── modules/
    ├── network/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── iam/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── nat/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── compute/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── load-balancer/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf

There are five main modules:

network
iam
nat
compute
load-balancer

Each module has a specific responsibility.

Module Responsibilities

| Module | Responsibility |
| --- | --- |
| network | VPC, subnets, firewall rules |
| iam | Service accounts and IAM role bindings |
| nat | Cloud Router and Cloud NAT |
| compute | Instance template and Managed Instance Group |
| load-balancer | HTTP load balancer resources |

The root module connects them together.

The root module should answer:

What components make up this platform?
How are those components connected?

The child modules should answer:

How is each infrastructure area implemented?

This separation makes the repository easier to read.

Root Module Composition

The root main.tf composes the platform.

At a high level, the order is:

network
iam
nat
health check
compute
load balancer
IAP IAM bindings

This order makes sense because:

Compute needs network, IAM, and the health check.
NAT needs network.
The load balancer needs the health check and the instance group.
IAP access needs IAM and firewall rules.

The root module should not contain all low-level resources.

It should orchestrate modules.
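
To make the orchestration idea concrete, here is a rough sketch of how the root main.tf could wire the modules together. The module input names follow the snippets shown elsewhere in this post where possible. Every module output name (`network_self_link`, `subnet_self_links`, `service_account_emails`, `instance_group`) is a hypothetical placeholder, and the sketch assumes a `google_compute_health_check.http` resource is declared in the root module.

```hcl
# Sketch only: output names are hypothetical, not taken from the repository.
module "network" {
  source         = "./modules/network"
  environment    = var.environment
  region         = var.region
  subnets        = var.subnets
  firewall_rules = var.firewall_rules
}

module "iam" {
  source           = "./modules/iam"
  project_id       = var.project_id
  service_accounts = var.service_accounts
}

module "nat" {
  source            = "./modules/nat"
  environment       = var.environment
  region            = var.region
  network_self_link = module.network.network_self_link
  subnet_self_links = module.network.subnet_self_links
  nat_subnet_keys   = var.nat_subnet_keys
}

module "compute" {
  source                 = "./modules/compute"
  environment            = var.environment
  region                 = var.region
  mig_name               = var.mig_name
  target_size            = var.target_size
  app_port               = var.app_port
  subnetwork_self_link   = module.network.subnet_self_links["app"]
  service_account_email  = module.iam.service_account_emails["app"]
  health_check_self_link = google_compute_health_check.http.self_link
}

module "load_balancer" {
  source                 = "./modules/load-balancer"
  environment            = var.environment
  lb_name                = var.lb_name
  backend_instance_group = module.compute.instance_group
  health_check_self_link = google_compute_health_check.http.self_link
}
```

Terraform derives the actual creation order from these references, so the listed order is a reading aid rather than something the root module has to enforce.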

Network Module

The network module creates:

  • VPC
  • subnets
  • firewall rules

The VPC is created as a custom mode VPC:

resource "google_compute_network" "this" {
  name                    = local.final_network_name
  auto_create_subnetworks = false
  routing_mode            = "REGIONAL"
}

I use:

auto_create_subnetworks = false

because I want explicit control over subnet ranges.

This is cleaner than relying on automatically created subnets.

Map-Based Subnets

Instead of hardcoding each subnet as a separate resource, I define subnets as a map.

Example:

subnets = {
  app = {
    cidr_range            = "10.80.1.0/24"
    private_google_access = true
    role                  = "application"
  }

  db = {
    cidr_range            = "10.80.2.0/24"
    private_google_access = true
    role                  = "database-reserved"
  }
}

The network module then creates subnets using for_each.

Conceptually:

resource "google_compute_subnetwork" "subnets" {
  for_each = var.subnets

  name                     = "${var.environment}-${each.key}-subnet"
  region                   = coalesce(each.value.region, var.region)
  network                  = google_compute_network.this.id
  ip_cidr_range            = each.value.cidr_range
  private_ip_google_access = each.value.private_google_access
}

This is more flexible than writing:

resource "google_compute_subnetwork" "app" {
  ...
}

resource "google_compute_subnetwork" "db" {
  ...
}

If I want to add another subnet later, I can add another map entry.

For example:

cache = {
  cidr_range = "10.80.3.0/24"
  role       = "cache"
}

The module does not need to change.
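
For the cache entry above to be valid without `private_google_access` or `region`, the module's input variable has to mark those attributes as optional. One way to do that, as a sketch that assumes Terraform 1.3 or later (where `optional()` with defaults is available) and guesses at the description and default values:

```hcl
# Sketch only: the description and the default for private_google_access are assumptions.
variable "subnets" {
  description = "Map of subnets to create, keyed by a short subnet name."

  type = map(object({
    cidr_range            = string
    role                  = string
    region                = optional(string)     # null falls back to var.region via coalesce()
    private_google_access = optional(bool, true) # defaults to enabled when omitted
  }))
}
```

With this shape, `coalesce(each.value.region, var.region)` in the subnet resource picks the module-wide region whenever an entry leaves `region` unset.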

Map-Based Firewall Rules

The network module also creates firewall rules from a map.

Example:

firewall_rules = {
  allow-lb-health-check = {
    description   = "Allow Google Cloud load balancer health checks and proxy traffic."
    source_ranges = ["35.191.0.0/16", "130.211.0.0/22"]
    target_tags   = ["web-backend"]

    allow = [
      {
        protocol = "tcp"
        ports    = ["80"]
      }
    ]
  }

  allow-iap-ssh = {
    description   = "Allow SSH to private backend instances through IAP."
    source_ranges = ["35.235.240.0/20"]
    target_tags   = ["web-backend"]

    allow = [
      {
        protocol = "tcp"
        ports    = ["22"]
      }
    ]
  }
}

The module creates firewall rules using for_each and dynamic allow blocks.

Conceptually:

resource "google_compute_firewall" "ingress_rules" {
  for_each = var.firewall_rules

  name          = "${var.environment}-${each.key}"
  network       = google_compute_network.this.name
  description   = each.value.description
  direction     = "INGRESS"
  source_ranges = each.value.source_ranges
  target_tags   = each.value.target_tags

  dynamic "allow" {
    for_each = each.value.allow

    content {
      protocol = allow.value.protocol
      ports    = allow.value.ports
    }
  }
}

I prefer this pattern because traffic policy becomes data-driven.

The module does not need to know every possible firewall rule.

It only needs to know how to create firewall rules from structured input.

Firewall Rules Used

For this version, I use three main firewall rules:

| Rule | Purpose |
| --- | --- |
| allow-lb-health-check | Allows Google load balancer and health check traffic to backend VMs |
| allow-iap-ssh | Allows SSH through IAP TCP forwarding |
| allow-internal | Allows internal traffic inside the platform CIDR |

The important security decision is that I do not open SSH to:

0.0.0.0/0

Instead, SSH access is designed around IAP.
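
The allow-internal rule is listed in the table but not shown in the earlier map example. A possible entry could look like the sketch below. The `10.80.0.0/16` range is an assumption chosen to cover the two example subnets, and the protocol list is illustrative. It also assumes the variable type lets `ports` be null for protocols such as ICMP that take no ports.

```hcl
# Sketch only: the CIDR and protocol list are assumptions, not the repository's values.
firewall_rules = {
  allow-internal = {
    description   = "Allow internal traffic between platform subnets."
    source_ranges = ["10.80.0.0/16"] # assumed platform CIDR covering 10.80.1.0/24 and 10.80.2.0/24
    target_tags   = ["web-backend"]

    allow = [
      {
        protocol = "tcp"
        ports    = ["0-65535"]
      },
      {
        protocol = "udp"
        ports    = ["0-65535"]
      },
      {
        protocol = "icmp"
        ports    = null # ICMP has no ports; null leaves the argument unset
      }
    ]
  }
}
```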

IAM Module

The IAM module creates service accounts.

For this artifact, the main service account is the application VM service account.

Example input:

service_accounts = {
  app = {
    account_id    = "dev-prod-lite-app-sa"
    display_name  = "Production Lite App Service Account"
    description   = "Service account used by private application VM instances."
    project_roles = [
      "roles/logging.logWriter",
      "roles/monitoring.metricWriter"
    ]
  }
}

The service account is attached to the application instances.

This is cleaner than using the default Compute Engine service account.

It also makes the identity of the workload explicit.
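
Inside the module, the `service_accounts` map can drive both the accounts and their project-level role bindings. The following is a sketch of one way to do it, not necessarily how the repository implements it. `var.project_id` and the `role_bindings` local are names introduced here for illustration.

```hcl
# Sketch only: resource and local names are illustrative.
resource "google_service_account" "this" {
  for_each = var.service_accounts

  project      = var.project_id
  account_id   = each.value.account_id
  display_name = each.value.display_name
  description  = each.value.description
}

locals {
  # Flatten { account => [roles] } into one map entry per (account, role) pair.
  role_bindings = merge([
    for key, sa in var.service_accounts : {
      for role in sa.project_roles : "${key}-${role}" => {
        sa_key = key
        role   = role
      }
    }
  ]...)
}

resource "google_project_iam_member" "this" {
  for_each = local.role_bindings

  project = var.project_id
  role    = each.value.role
  member  = "serviceAccount:${google_service_account.this[each.value.sa_key].email}"
}
```

Using `google_project_iam_member` keeps each grant additive, so it does not remove other members that already hold the same role.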

NAT Module

The NAT module creates:

  • Cloud Router
  • Cloud NAT

In my earlier lab, Cloud NAT was applied to all subnets.

For this artifact, I wanted a more intentional design.

So NAT is applied only to selected subnet keys.

Example:

nat_subnet_keys = ["app"]

That means:

app subnet gets outbound internet through Cloud NAT
db subnet does not get NAT by default

This is a small design improvement, but it shows better network intent.

The app tier needs outbound access for package installation and application operations.

The reserved database tier should be more restricted.
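
A sketch of how selective NAT can be expressed, assuming the module receives a map of subnet self links keyed by subnet name. The `var.subnet_self_links` and `var.network_self_link` inputs and the resource names are hypothetical.

```hcl
# Sketch only: input names and resource names are assumptions.
resource "google_compute_router" "this" {
  name    = "${var.environment}-nat-router"
  region  = var.region
  network = var.network_self_link
}

resource "google_compute_router_nat" "this" {
  name                   = "${var.environment}-nat"
  router                 = google_compute_router.this.name
  region                 = var.region
  nat_ip_allocate_option = "AUTO_ONLY"

  # NAT only the subnets that are listed explicitly below.
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  dynamic "subnetwork" {
    for_each = toset(var.nat_subnet_keys)

    content {
      name                    = var.subnet_self_links[subnetwork.value]
      source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
    }
  }
}
```

With `nat_subnet_keys = ["app"]`, only the app subnet gets a `subnetwork` block, so the db subnet has no NAT path unless its key is added later.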

Compute Module

The compute module creates:

  • instance template
  • regional Managed Instance Group
  • named port
  • autohealing policy

The instance template defines:

  • machine type
  • boot disk
  • network interface
  • startup script
  • service account
  • network tags

The important part is the network interface.

network_interface {
  subnetwork = var.subnetwork_self_link

  # No access_config block.
}

This keeps the VM private.
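
Putting the listed pieces together, a fuller instance template might look roughly like this. It is a sketch under assumptions: `var.machine_type`, `var.source_image`, `var.service_account_email`, and `var.startup_script` are hypothetical inputs, and the `web-backend` tag is taken from the firewall rule examples.

```hcl
# Sketch only: several input variable names are assumptions.
resource "google_compute_instance_template" "this" {
  name_prefix  = "${var.environment}-${var.mig_name}-"
  machine_type = var.machine_type
  region       = var.region
  tags         = ["web-backend"] # must match the firewall rules' target_tags

  disk {
    source_image = var.source_image
    auto_delete  = true
    boot         = true
  }

  network_interface {
    subnetwork = var.subnetwork_self_link
    # No access_config block, so instances get no external IP.
  }

  service_account {
    email  = var.service_account_email
    scopes = ["cloud-platform"]
  }

  metadata_startup_script = var.startup_script

  lifecycle {
    # Templates are immutable; create the replacement before destroying the old one.
    create_before_destroy = true
  }
}
```

Using `name_prefix` instead of a fixed `name` lets Terraform create a new template alongside the old one whenever the definition changes.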

The MIG then uses the instance template:

resource "google_compute_region_instance_group_manager" "this" {
  name               = "${var.environment}-${var.mig_name}"
  region             = var.region
  base_instance_name = "${var.environment}-${var.mig_name}"
  target_size        = var.target_size

  version {
    instance_template = google_compute_instance_template.this.self_link
  }

  named_port {
    name = "http"
    port = var.app_port
  }

  auto_healing_policies {
    health_check      = var.health_check_self_link
    initial_delay_sec = 120
  }
}

The named port matters because the backend service refers to it by name: its port_name ("http") must match the MIG's named port so the load balancer sends traffic to the right port on each instance.

Load Balancer Module

The load balancer module creates:

  • global IP address
  • backend service
  • URL map
  • target HTTP proxy
  • global forwarding rule

The backend service connects the load balancer to the MIG.

Conceptually:

resource "google_compute_backend_service" "this" {
  name                  = "${var.environment}-${var.lb_name}-backend"
  protocol              = "HTTP"
  port_name             = "http"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  timeout_sec           = 30

  health_checks = [
    var.health_check_self_link
  ]

  backend {
    group           = var.backend_instance_group
    balancing_mode  = "UTILIZATION"
    capacity_scaler = 1.0
  }
}

The load balancer then exposes the service through a global forwarding rule.

For v1.0, I use HTTP on port 80.

HTTPS will come later in v1.1.
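
The remaining frontend resources in the list above can be sketched as follows. The resource names and name suffixes are illustrative assumptions. The block references the `google_compute_backend_service.this` resource shown earlier in this section.

```hcl
# Sketch only: names and suffixes are assumptions for illustration.
resource "google_compute_global_address" "this" {
  name = "${var.environment}-${var.lb_name}-ip"
}

resource "google_compute_url_map" "this" {
  name            = "${var.environment}-${var.lb_name}-url-map"
  default_service = google_compute_backend_service.this.id
}

resource "google_compute_target_http_proxy" "this" {
  name    = "${var.environment}-${var.lb_name}-http-proxy"
  url_map = google_compute_url_map.this.id
}

resource "google_compute_global_forwarding_rule" "this" {
  name                  = "${var.environment}-${var.lb_name}-http-fr"
  ip_address            = google_compute_global_address.this.address
  ip_protocol           = "TCP"
  port_range            = "80"
  load_balancing_scheme = "EXTERNAL_MANAGED" # must match the backend service scheme
  target                = google_compute_target_http_proxy.this.id
}
```

Reserving the global address as its own resource keeps the public IP stable if the forwarding rule is ever recreated.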

Startup Script

The startup script bootstraps the application when the VM starts.

The script installs dependencies, creates the application files, and starts the service.

A simplified version of the application behavior is:

GET /         -> returns a simple message
GET /healthz -> returns ok
GET /metadata -> returns hostname and version information

The important operational improvement is that the application should run as a systemd service.

That is better than running a background process with &.

With systemd, I can check:

sudo systemctl status prod-lite-app

And view logs:

sudo journalctl -u prod-lite-app --no-pager -n 50

Remote State

Like the first artifact, this project uses a GCS backend for Terraform state.

Example backend:

terraform {
  backend "gcs" {
    bucket = "YOUR_TERRAFORM_STATE_BUCKET"
    prefix = "terraform-gcp-production-lite-web-platform/v1"
  }
}

The state path becomes something like:

gs://YOUR_TERRAFORM_STATE_BUCKET/terraform-gcp-production-lite-web-platform/v1/default.tfstate

Using remote state is important because this project is no longer a tiny one-file local experiment.

It has multiple modules and multiple cloud resources.

Remote state gives the project a more realistic workflow.

Git Safety

I do not commit real .tfvars files.

The repository includes:

terraform.tfvars.example

But ignores:

terraform.tfvars

The .gitignore includes:

.terraform/
*.tfstate
*.tfstate.*
*.tfvars
*.tfplan
crash.log
.DS_Store

This avoids committing local values, real project IDs, or state files.

Running the Project

The execution flow is:

1. Configure backend
2. Configure terraform.tfvars
3. Run terraform fmt
4. Run terraform init
5. Run terraform validate
6. Run terraform plan
7. Run terraform apply
8. Verify the load balancer and backend health

Configure Backend

cp backend.tf.example backend.tf

Then edit the bucket name.

Configure Variables

cp terraform.tfvars.example terraform.tfvars

Then edit:

project_id      = "your-gcp-project-id"
admin_principal = "user:your-email@example.com"

Run Terraform

terraform fmt -recursive
terraform init
terraform validate
terraform plan
terraform apply

Expected Output

After deployment, Terraform should output values such as:

network_name
subnets
firewall_rules
cloud_nat_name
cloud_router_name
health_check_name
mig_name
mig_instance_group
load_balancer_ip
load_balancer_url
curl_test_command
curl_health_check_command
platform_summary

The most important output is:

load_balancer_url

That URL is used to test the application.
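
A sketch of how a few of these root outputs could be derived from the load balancer module, assuming the module exposes its reserved IP through a hypothetical `ip_address` output:

```hcl
# Sketch only: module.load_balancer.ip_address is a hypothetical output name.
output "load_balancer_ip" {
  description = "Public IP address of the external HTTP load balancer."
  value       = module.load_balancer.ip_address
}

output "load_balancer_url" {
  description = "Base URL for testing the application."
  value       = "http://${module.load_balancer.ip_address}"
}

output "curl_health_check_command" {
  description = "Ready-to-run command for checking the health endpoint."
  value       = "curl -i http://${module.load_balancer.ip_address}/healthz"
}
```

Emitting ready-made commands as outputs means verification needs no manual copy and paste of the IP.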

Verification

After deployment, I can test the root endpoint:

curl -i http://LOAD_BALANCER_IP

Expected response:

Hi from Terraform GCP Production-Lite Platform

Test the health endpoint:

curl -i http://LOAD_BALANCER_IP/healthz

Expected response:

ok

Test the metadata endpoint:

curl -i http://LOAD_BALANCER_IP/metadata

Expected response:

{
  "service": "terraform-gcp-production-lite-web-platform",
  "environment": "dev",
  "version": "1.0.0",
  "hostname": "dev-web-mig-xxxx"
}

Verifying the Infrastructure

Verify the VPC:

gcloud compute networks list

Verify the subnets:

gcloud compute networks subnets list

Verify firewall rules:

gcloud compute firewall-rules list

Verify Cloud NAT:

gcloud compute routers nats list \
  --router=dev-nat-router \
  --region=asia-southeast2

Verify the Managed Instance Group:

gcloud compute instance-groups managed list

Verify backend health:

gcloud compute backend-services get-health BACKEND_SERVICE_NAME --global

Verify the application VM does not have an external IP:

gcloud compute instances list

This is one of the most important checks: the EXTERNAL_IP column should be empty for every backend instance.

The application should be reachable through the load balancer, not directly through a public VM IP.

Accessing the Private VM Through IAP

Since the VM has no external IP, direct SSH is not available.

Instead, I use IAP TCP forwarding.

Example:

gcloud compute ssh INSTANCE_NAME \
  --zone=asia-southeast2-a \
  --tunnel-through-iap

This requires:

  • IAM permission for IAP tunnel access
  • OS Login role
  • firewall rule allowing IAP source range
  • correct target tag on the VM

The firewall source range for IAP TCP forwarding is:

35.235.240.0/20

This is better than opening SSH to the public internet.
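
The IAM side of these requirements can be sketched in Terraform as project-level grants to the admin principal. This is an illustration under assumptions: the `iap_admin_roles` local and the resource name are made up here, and it assumes OS Login is enabled on the instances or the project.

```hcl
# Sketch only: the local and resource names are illustrative.
locals {
  iap_admin_roles = [
    "roles/iap.tunnelResourceAccessor", # permission to open IAP TCP tunnels
    "roles/compute.osLogin",            # permission to log in through OS Login
  ]
}

resource "google_project_iam_member" "iap_admin" {
  for_each = toset(local.iap_admin_roles)

  project = var.project_id
  role    = each.value
  member  = var.admin_principal # e.g. "user:your-email@example.com"
}
```

Because the instances run as a custom service account, the admin principal may also need `roles/iam.serviceAccountUser` on that account, which matches the "missing service account user role" cause listed under troubleshooting.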

Troubleshooting Notes

Load Balancer Returns 502

Possible causes:

  • application failed to start
  • health check path is wrong
  • firewall rule does not allow health check probes
  • target tag mismatch
  • named port mismatch
  • backend service points to the wrong instance group

Useful commands:

gcloud compute backend-services get-health BACKEND_SERVICE_NAME --global
sudo systemctl status prod-lite-app
sudo journalctl -u prod-lite-app --no-pager -n 100

VM Cannot Install Packages

Possible cause:

Cloud NAT is not configured correctly.

Useful check:

curl https://example.com

Run that from inside the VM.

IAP SSH Fails

Possible causes:

  • missing IAP role
  • missing OS Login role
  • missing service account user role
  • missing IAP firewall rule
  • wrong VM network tag

Check firewall rules:

gcloud compute firewall-rules list --filter="name~iap"

Important Design Decisions

1. Use private backend VMs

The backend application instances do not have external IP addresses.

This reduces direct exposure and forces traffic through the load balancer.

2. Use Cloud NAT

The private VMs still need outbound access.

Cloud NAT allows that without assigning public IPs to the VMs.

3. Use map-based subnet definitions

Subnets are defined through a map so the network module stays reusable.

Adding another subnet should not require another resource block.

4. Use map-based firewall rules

Firewall rules are also defined through a map.

This keeps ingress policy data-driven and easier to extend.

5. Use a Managed Instance Group

The application tier is managed as a group based on an instance template.

This is more production-shaped than a standalone VM.

6. Use HTTP only in v1.0

HTTPS is intentionally deferred to v1.1.

The v1.0 goal is to prove:

private backends
Cloud NAT
MIG
health checks
HTTP load balancing
firewall rules
remote state

7. Reserve a DB subnet without provisioning a database

The DB subnet demonstrates tiered network design.

A database is not provisioned in v1.0 because this artifact focuses on the web platform infrastructure.

What I Learned

This artifact helped me understand that application infrastructure is not only about creating a VM.

A proper platform needs several parts to work together:

networking
identity
firewall rules
NAT
compute lifecycle
health checks
load balancing
state management
documentation

The most interesting part for me was seeing how small configuration details affect the whole platform.

For example:

  • if the firewall source range is wrong, the load balancer cannot reach the backend
  • if the health check path is wrong, the backend becomes unhealthy
  • if Cloud NAT is missing, private VMs may fail during startup
  • if the MIG named port does not match the backend service, traffic may not route correctly
  • if SSH is opened to 0.0.0.0/0, the design becomes weaker

This project made me appreciate that infrastructure is a system.

Each component has to be designed with the others in mind.

What I Intentionally Did Not Add

I intentionally did not add HTTPS in this version.

I also did not add:

  • Cloud Armor
  • Cloud SQL
  • Secret Manager
  • CI/CD
  • custom domain
  • blue-green deployment
  • autoscaling policy
  • multi-region deployment

Those are useful, but I want this artifact to stay focused.

The purpose of v1.0 is to build the core web platform first.

Next Step

The next version will be:

v1.1 — HTTPS and Custom Domain

That version should add:

  • Google-managed SSL certificate
  • custom domain
  • HTTPS target proxy
  • global forwarding rule on port 443
  • optional HTTP-to-HTTPS redirect

After that, I want to continue with:

v1.2 — Security Hardening
v2.0 — Terraform CI/CD with GitHub Actions and Workload Identity Federation
v2.1 — Drift, Import, and State Recovery
v3.0 — Database and Secrets

References

Terraform GCS backend:
https://developer.hashicorp.com/terraform/language/backend/gcs

Google Cloud — Store Terraform state in Cloud Storage:
https://cloud.google.com/docs/terraform/resource-management/store-state

Google Cloud — External Application Load Balancer overview:
https://cloud.google.com/load-balancing/docs/https

Google Cloud — Terraform examples for external Application Load Balancers:
https://cloud.google.com/load-balancing/docs/https/ext-http-lb-tf-module-examples

Google Cloud — Cloud NAT overview:
https://cloud.google.com/nat/docs/overview

Google Cloud — IAP TCP forwarding:
https://cloud.google.com/iap/docs/using-tcp-forwarding