Chabane R. for Onepoint x Stack Labs

Posted on Jan 31, 2021 • Edited on Apr 9, 2021

Implementing step by step the hub and spoke network topology in Google Cloud

#googlecloud #terraform #network #security

Network topology is one of the most critical decisions in the life of an organization on Google Cloud.

If you implement the wrong topology for your business, fixing it later will cost you a lot of money for zero business value.

Let's start with a simple use case. A customer wanted to keep things simple (or to get a quick return on investment from a PoC 😨) and created a single Shared VPC for all workloads and environments. The Shared VPC is connected to an on-premise environment using Cloud VPN. The network architecture was built like this:

There is one subnet per environment. 30 critical microservices run in production and depend on sensitive data stored in a Cloud SQL database, as well as on stateful applications such as Elasticsearch.

For security reasons, the customer now wants to isolate the production workload in a separate VPC. An external service provider was engaged to perform the migration. No downtime is accepted.

The service provider analyzed the current network architecture and proposed this migration plan:

1. Audit the workloads to prioritize them and document the dependencies between them,
2. Create a new GCP network project and a new project for production v2,
3. Initialize data replication between Cloud SQL instance v1 and v2,
4. Initialize cross-cluster replication between Elasticsearch v1 and v2,
5. Deploy a microservice in prod v2, pointing it to database v1 and Elasticsearch v1,
6. Split the external traffic in Load Balancer v1 between microservice v1 and v2,
7. Repeat steps 5 and 6 for each microservice,
8. Deploy the stateful applications in prod v2,
9. Once all microservices are in prod v2, perform the final cutover:
   - Switch DNS,
   - Promote Cloud SQL instance v2 as the primary database,
   - Point all microservices to Cloud SQL instance v2 and Elasticsearch v2.

Depending on the dependencies between the microservices, such a migration may take several months and be very expensive.

No value for the business, no value for the end user. The customer will probably postpone the migration.

I met another customer with a more complex network topology: the development environment was placed in a European subnet and the production environment in a London subnet. Because of GDPR constraints, the customer wanted to migrate the workloads from London to a European subnet. As the migration brought no value to the business, the required budget could not be justified. But the migration will still have to be done later, whatever the price.

So the best network topology that you can implement in Google Cloud should be:

Scalable, secure and reliable,
As isolated as possible from the public internet,
Designed to be connectable to a private network, if you have one or plan to have one in the future,
Able to meet the unique requirements of your enterprise workloads [1],
Suited to the architecture patterns that you intend to apply.

In a hybrid cloud or multi-cloud architecture, the hub-and-spoke topology is the common network topology encouraged by cloud providers and the networking community.

What is the hub-and-spoke network topology?

The spoke-hub distribution paradigm is a form of transport topology optimization in which traffic planners organize routes as a series of "spokes" that connect outlying points to a central "hub". Wikipedia [2]

Star Network [3]

In a multi-cloud or hybrid cloud architecture, a set of spoke VPC networks communicate with the external environment through a hub VPC network. The relevant routes are exported from the hub VPC network into the spoke VPC networks. [4]

In this post, we will implement the following architecture in Google Cloud, a hub-and-spoke architecture with VPC peering and segmentation based on environments:

Each spoke represents a larger network segment.
Spokes are isolated from each other, as VPC peering is non-transitive.
Within each spoke, connectivity between workloads is segmented with VPC firewall rules.
The hub is a custom VPC peered to the spokes, which are Shared VPCs.
Using Shared VPC helps keep the design scalable and simple.
Spokes are connected to the hub with VPC peering to ensure low latency and minimal management overhead.
The hub VPC is connected to on-premise through a static VPN connection. It could be replaced by a dynamic VPN connection or an Interconnect.
The hub VPC is isolated from the public internet with explicit VPC firewall rules.
Network services are centrally administered for connectivity between spokes and on-premise.
To allow a path between spokes and on-premise, custom route export and import are configured between the hub and each spoke.

The following resource hierarchy is used in this example:

We use Terraform to build the infrastructure and Gitlab CI to deploy it. Let's start by creating our hub.

Hub Network Project

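Create a new project mycompany-network-hub, enable the Compute Engine API (billing must be linked to the project first) and remove the default network. The default firewall rules must be deleted before the network itself:

gcloud projects create mycompany-network-hub
gcloud services enable compute.googleapis.com --project mycompany-network-hub
gcloud compute firewall-rules delete default-allow-icmp default-allow-internal default-allow-rdp default-allow-ssh --project mycompany-network-hub
gcloud compute networks delete default --project mycompany-network-hub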

The following files create:

Custom VPC and a subnet.
VPN Tunnel.
Firewall rules that allow TCP traffic to and from the on-premise network and deny all other ingress and egress traffic.

repo-mycompany-network-hub/plan/project.tf

data "google_project" "hub" {
  project_id = "mycompany-network-hub"
}

repo-mycompany-network-hub/plan/vpc.tf

resource "google_compute_network" "hub" {
  name                    = "hub"
  auto_create_subnetworks = false
  project                 = data.google_project.hub.project_id
}

resource "google_compute_subnetwork" "hub-subnet" {
  name          = "hub-subnet"
  ip_cidr_range = var.hub_subnet_ip_range
  region        = var.region
  network       = google_compute_network.hub.id
  project       = data.google_project.hub.project_id
}

repo-mycompany-network-hub/plan/vpn.tf

resource "google_compute_vpn_tunnel" "tunnel" {
  name                    = "tunnel"
  peer_ip                 = var.on_premise_peer_ip
  shared_secret           = data.google_secret_manager_secret_version.vpn-shared-secret.secret_data
  project                 = data.google_project.hub.project_id
  ike_version             = 2
  remote_traffic_selector = [var.on_premise_network_ip_range]
  local_traffic_selector  = [var.hub_subnet_ip_range]
  target_vpn_gateway      = google_compute_vpn_gateway.target_gateway.id
  region                  = var.region

  depends_on = [
    google_compute_forwarding_rule.fr_esp,
    google_compute_forwarding_rule.fr_udp500,
    google_compute_forwarding_rule.fr_udp4500,
  ]
}

resource "google_compute_vpn_gateway" "target_gateway" {
  name    = "vpn"
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
  region  = var.region
}

resource "google_compute_forwarding_rule" "fr_esp" {
  name        = "fr-esp"
  ip_protocol = "ESP"
  ip_address  = data.google_compute_address.vpn-static-ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
  project     = data.google_project.hub.project_id
  region      = var.region
}

resource "google_compute_forwarding_rule" "fr_udp500" {
  name        = "fr-udp500"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = data.google_compute_address.vpn-static-ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
  project     = data.google_project.hub.project_id
  region      = var.region
}

resource "google_compute_forwarding_rule" "fr_udp4500" {
  name        = "fr-udp4500"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = data.google_compute_address.vpn-static-ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
  project     = data.google_project.hub.project_id
  region      = var.region
}

resource "google_compute_route" "route" {
  name       = "route"
  network    = google_compute_network.hub.name
  project    = data.google_project.hub.project_id
  dest_range = var.on_premise_network_ip_range
  priority   = 1000

  next_hop_vpn_tunnel = google_compute_vpn_tunnel.tunnel.id
}

data "google_secret_manager_secret_version" "vpn-shared-secret" {
  project = data.google_project.hub.project_id
  secret  = "vpn-shared-secret"
}

data "google_compute_address" "vpn-static-ip" {
  project = data.google_project.hub.project_id
  name    = "vpn-static-ip"
  region  = var.region
}

Note: Google Cloud is deprecating certain Classic VPN functionality on October 31, 2021. For more information, see the Classic VPN partial deprecation page.
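Since the architecture mentions that the static VPN could be replaced by a dynamic connection, here is a hedged sketch of an HA VPN with BGP in the hub. It assumes a 3.x google provider with HA VPN support and would replace the Classic VPN resources of vpn.tf (gateway, forwarding rules, tunnel and static route) while keeping the secret data source. The variables `on_premise_asn` and `spoke_ip_ranges`, the Cloud Router ASN and the 169.254.0.0/30 link addresses are hypothetical placeholders. Cloud Router does not advertise subnets of peered VPC networks by default, so the spoke ranges are added as custom advertisements.

```hcl
# Hypothetical dynamic (BGP) alternative to the Classic VPN in vpn.tf.
# One tunnel for brevity: HA VPN needs two tunnels to qualify for its SLA.

variable "on_premise_asn" {
  type = number # hypothetical: ASN of the on-premise BGP router
}

variable "spoke_ip_ranges" {
  type = list(string) # hypothetical: spoke subnet ranges to advertise
}

resource "google_compute_ha_vpn_gateway" "ha_gateway" {
  name    = "ha-vpn"
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
  region  = var.region
}

resource "google_compute_external_vpn_gateway" "on_premise" {
  name            = "on-premise"
  project         = data.google_project.hub.project_id
  redundancy_type = "SINGLE_IP_INTERNALLY_REDUNDANT"

  interface {
    id         = 0
    ip_address = var.on_premise_peer_ip
  }
}

resource "google_compute_router" "vpn_router" {
  name    = "vpn-router"
  network = google_compute_network.hub.id
  project = data.google_project.hub.project_id
  region  = var.region

  bgp {
    asn               = 64515 # hypothetical private ASN for Cloud Router
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"] # subnets of the hub VPC

    # Subnets of peered spokes are not advertised unless listed explicitly
    dynamic "advertised_ip_ranges" {
      for_each = var.spoke_ip_ranges
      content {
        range = advertised_ip_ranges.value
      }
    }
  }
}

resource "google_compute_vpn_tunnel" "ha_tunnel0" {
  name                            = "ha-tunnel0"
  project                         = data.google_project.hub.project_id
  region                          = var.region
  vpn_gateway                     = google_compute_ha_vpn_gateway.ha_gateway.id
  vpn_gateway_interface           = 0
  peer_external_gateway           = google_compute_external_vpn_gateway.on_premise.id
  peer_external_gateway_interface = 0
  shared_secret                   = data.google_secret_manager_secret_version.vpn-shared-secret.secret_data
  router                          = google_compute_router.vpn_router.id
  ike_version                     = 2
}

resource "google_compute_router_interface" "if_tunnel0" {
  name       = "if-tunnel0"
  router     = google_compute_router.vpn_router.name
  project    = data.google_project.hub.project_id
  region     = var.region
  ip_range   = "169.254.0.1/30" # hypothetical BGP link range
  vpn_tunnel = google_compute_vpn_tunnel.ha_tunnel0.name
}

resource "google_compute_router_peer" "peer_tunnel0" {
  name            = "peer-tunnel0"
  router          = google_compute_router.vpn_router.name
  project         = data.google_project.hub.project_id
  region          = var.region
  peer_ip_address = "169.254.0.2" # hypothetical on-premise BGP address
  peer_asn        = var.on_premise_asn
  interface       = google_compute_router_interface.if_tunnel0.name
}
```

HA VPN allocates its own external IP addresses to the gateway interfaces, so the manually reserved vpn-static-ip would no longer be used. The routes learned over BGP are dynamic custom routes, which the existing peerings already export to the spokes with export_custom_routes = true.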

repo-mycompany-network-hub/plan/firewall.tf

resource "google_compute_firewall" "allow-ingress-traffic-from-vpn" {
  name    = "allow-ingress-traffic-from-vpn"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id

  allow {
    protocol = "tcp"
  }

  source_ranges = [var.on_premise_network_ip_range]
  # Lower number wins: evaluated before the deny-all rules (priority 2000)
  priority      = 1000
  direction     = "INGRESS"
}

resource "google_compute_firewall" "allow-egress-traffic-to-vpn" {
  name    = "allow-egress-traffic-to-vpn"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id

  allow {
    protocol = "tcp"
  }

  destination_ranges = [var.on_premise_network_ip_range]
  priority           = 1000
  direction          = "EGRESS"
}

resource "google_compute_firewall" "deny-ingress-traffic-from-internet" {
  name    = "deny-all-ingress-traffic"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id

  deny {
    protocol = "all"
  }

  source_ranges = ["0.0.0.0/0"]
  priority      = 2000
  direction     = "INGRESS"
}

resource "google_compute_firewall" "deny-egress-traffic-to-internet" {
  name    = "deny-all-egress-traffic"
  network = google_compute_network.hub.name
  project = data.google_project.hub.project_id

  deny {
    protocol = "all"
  }

  destination_ranges = ["0.0.0.0/0"]
  priority           = 2000
  direction          = "EGRESS"
}

repo-mycompany-network-hub/plan/backend.tf

terraform {
  backend "gcs" {
  }
}

repo-mycompany-network-hub/plan/provider.tf

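terraform {
  # Terraform 0.13+ provider source syntax (the pipeline uses 0.14.7)
  required_version = ">= 0.13"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3.0"
    }
  }
}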

repo-mycompany-network-hub/plan/variables.tf

variable "hub_subnet_ip_range" {
  type    = string
} 

variable "region" {
  type = string
  default = "europe-west1"
}

variable "on_premise_network_ip_range" {
  type = string
}

variable "on_premise_peer_ip" {
  type = string
}
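Wrong CIDR values only surface at apply time, when the Google API rejects them. As a hedged sketch, a stricter declaration of `hub_subnet_ip_range` could use a custom validation rule (available since Terraform 0.13, so compatible with the 0.14.7 binary used in the pipeline); the same pattern applies to `on_premise_network_ip_range`.

```hcl
# Possible stricter declaration of hub_subnet_ip_range in variables.tf
# (custom validation requires Terraform >= 0.13).
variable "hub_subnet_ip_range" {
  type        = string
  description = "Primary IP range of the hub subnet, in CIDR notation."

  validation {
    # cidrhost() fails on anything that is not a CIDR block,
    # and can() turns that failure into false.
    condition     = can(cidrhost(var.hub_subnet_ip_range, 0))
    error_message = "The hub_subnet_ip_range value must be a valid CIDR block, such as 10.0.0.0/24."
  }
}
```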

repo-mycompany-network-hub/plan/terraform.tfvars

hub_subnet_ip_range            = "<HUB_SUBNET_IP_RANGE>"
on_premise_peer_ip             = "<ON_PREMISE_PEER_IP>"
on_premise_network_ip_range    = "<ON_PREMISE_NETWORK_IP_RANGE>"

Spoke Network Project

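Create a new project for each spoke, enable the Compute Engine API and remove the default network:

gcloud projects create mycompany-network-spoke-nonprod
gcloud services enable compute.googleapis.com --project mycompany-network-spoke-nonprod
gcloud compute firewall-rules delete default-allow-icmp default-allow-internal default-allow-rdp default-allow-ssh --project mycompany-network-spoke-nonprod
gcloud compute networks delete default --project mycompany-network-spoke-nonprod

gcloud projects create mycompany-network-spoke-prod
gcloud services enable compute.googleapis.com --project mycompany-network-spoke-prod
gcloud compute firewall-rules delete default-allow-icmp default-allow-internal default-allow-rdp default-allow-ssh --project mycompany-network-spoke-prod
gcloud compute networks delete default --project mycompany-network-spoke-prod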

The following files create:

Shared VPC host project, custom VPC with a subnet (including secondary ranges for GKE pods and services), and a peering with the hub network.
Cloud NAT.

repo-mycompany-network-spokes/plan/project.tf

data "google_project" "spoke" {
  project_id = "mycompany-network-spoke-${var.env}"
}

data "google_project" "hub" {
  project_id = "mycompany-network-hub"
}

resource "google_compute_shared_vpc_host_project" "host" {
  project = data.google_project.spoke.project_id
}

repo-mycompany-network-spokes/plan/vpc.tf

resource "google_compute_network" "spoke" {
  name                    = "spoke"
  auto_create_subnetworks = false
  project                 = data.google_project.spoke.project_id
}

resource "google_compute_subnetwork" "spoke-subnet" {
  name          = "spoke-subnet"
  ip_cidr_range = var.spoke_subnet_ip_range
  region        = var.region
  network       = google_compute_network.spoke.id
  project       = data.google_project.spoke.project_id

  # Secondary ranges used by GKE for pods and services
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = var.spoke_subnet_pods_ip_range
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = var.spoke_subnet_services_ip_range
  }
}

resource "google_compute_network_peering" "spoke-to-hub" {
  name         = "spoke-to-hub"
  network      = google_compute_network.spoke.id
  peer_network = data.google_compute_network.hub.self_link

  # Exchange custom routes (such as the route to on-premise) with the hub
  export_custom_routes = true
  import_custom_routes = true
}

# Could be moved to the network hub Terraform configuration.
# Peering names must be unique within the hub network, hence the env suffix.
resource "google_compute_network_peering" "hub-to-spoke" {
  name         = "hub-to-spoke-${var.env}"
  network      = data.google_compute_network.hub.self_link
  peer_network = google_compute_network.spoke.id

  export_custom_routes = true
  import_custom_routes = true
}

data "google_compute_network" "hub" {
  name    = "hub"
  project = data.google_project.hub.project_id
}
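The comment above suggests moving the hub side of the peering to the hub repository. A hedged sketch of what that could look like, assuming the spoke environments are `nonprod` and `prod` and that the spoke networks already exist when the hub pipeline runs (`for_each` on data sources requires Terraform 0.13 or later). The `hub-to-spoke` resource would then be removed from the spokes repository.

```hcl
# Hypothetical repo-mycompany-network-hub/plan/peering.tf

locals {
  spoke_envs = toset(["nonprod", "prod"]) # assumed list of spoke environments
}

# Look up each existing spoke network
data "google_compute_network" "spoke" {
  for_each = local.spoke_envs
  name     = "spoke"
  project  = "mycompany-network-spoke-${each.key}"
}

# One peering per spoke, with a name unique within the hub network
resource "google_compute_network_peering" "hub-to-spoke" {
  for_each     = data.google_compute_network.spoke
  name         = "hub-to-spoke-${each.key}"
  network      = google_compute_network.hub.self_link
  peer_network = each.value.self_link

  export_custom_routes = true
  import_custom_routes = true
}
```

With this layout, the spoke runner no longer needs to modify the hub network, and adding a spoke becomes a one-line change in the hub.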

repo-mycompany-network-spokes/plan/nat.tf


resource "google_compute_router" "router" {
  name    = "router"
  region  = google_compute_subnetwork.spoke-subnet.region
  network = google_compute_network.spoke.id
  project = data.google_project.spoke.project_id

  bgp {
    asn = 64514
  }
}

resource "google_compute_router_nat" "nat" {
  name    = "nat"
  router  = google_compute_router.router.name
  region  = google_compute_router.router.region
  project = data.google_project.spoke.project_id

  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips                = [data.google_compute_address.nat-static-ip1.self_link, data.google_compute_address.nat-static-ip2.self_link]

  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

data "google_compute_address" "nat-static-ip1" {
  project = data.google_project.spoke.project_id
  name    = "nat-static-ip1"
  region  = var.region
}

data "google_compute_address" "nat-static-ip2" {
  project = data.google_project.spoke.project_id
  name    = "nat-static-ip2"
  region  = var.region
}
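VPC firewall rules apply per network, so the hub rules do not filter traffic entering a spoke, and a spoke denies ingress by default through its implied rule. Segmenting workloads with firewall rules inside each spoke therefore means explicitly allowing what on-premise may reach. A hedged sketch, assuming a new `on_premise_network_ip_range` variable in the spokes repository, a hypothetical `allow-on-premise` network tag on target instances and port 443 as a placeholder:

```hcl
# Hypothetical repo-mycompany-network-spokes/plan/firewall.tf

variable "on_premise_network_ip_range" {
  type = string # new variable, same value as in the hub repository
}

resource "google_compute_firewall" "allow-ingress-from-on-premise" {
  name    = "allow-ingress-from-on-premise"
  network = google_compute_network.spoke.name
  project = data.google_project.spoke.project_id

  allow {
    protocol = "tcp"
    ports    = ["443"] # placeholder: restrict to the ports your workloads expose
  }

  source_ranges = [var.on_premise_network_ip_range]
  target_tags   = ["allow-on-premise"] # hypothetical network tag
  direction     = "INGRESS"
  priority      = 1000
}
```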

repo-mycompany-network-spokes/plan/backend.tf

terraform {
  backend "gcs" {
  }
}

repo-mycompany-network-spokes/plan/provider.tf

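terraform {
  # Terraform 0.13+ provider source syntax (the pipeline uses 0.14.7)
  required_version = ">= 0.13"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3.0"
    }
  }
}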

repo-mycompany-network-spokes/plan/variables.tf

variable "region" {
  type = string
  default = "europe-west1"
}

variable "spoke_subnet_ip_range" {
  type = string
} 

variable "spoke_subnet_pods_ip_range" {
  type = string
} 

variable "spoke_subnet_services_ip_range" {
  type = string
} 

variable "env" {
  type = string
}
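Because `env` is interpolated into the spoke project ID, a typo would make the `google_project` data source fail with a less obvious error. A hedged sketch of a stricter declaration, assuming `nonprod` and `prod` are the only environments:

```hcl
# Possible stricter declaration of env in variables.tf (Terraform >= 0.13).
variable "env" {
  type        = string
  description = "Spoke environment, used to build the spoke project ID."

  validation {
    # Assumed list of environments; extend it when adding a spoke
    condition     = contains(["nonprod", "prod"], var.env)
    error_message = "The env value must be either \"nonprod\" or \"prod\"."
  }
}
```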

repo-mycompany-network-spokes/envs/nonprod/terraform.tfvars

env                            = "<ENV>"
spoke_subnet_ip_range          = "<SPOKE_SUBNET_IP_RANGE>"
spoke_subnet_pods_ip_range     = "<SPOKE_SUBNET_PODS_IP_RANGE>"
spoke_subnet_services_ip_range = "<SPOKE_SUBNET_SERVICES_IP_RANGE>"

repo-mycompany-network-spokes/envs/prod/terraform.tfvars

env                            = "<ENV>"
spoke_subnet_ip_range          = "<SPOKE_SUBNET_IP_RANGE>"
spoke_subnet_pods_ip_range     = "<SPOKE_SUBNET_PODS_IP_RANGE>"
spoke_subnet_services_ip_range = "<SPOKE_SUBNET_SERVICES_IP_RANGE>"

Deployment

Before running our pipeline in Gitlab CI, we first need to create the following resources in the network hub project:

gcloud config set project mycompany-network-hub
gcloud compute addresses create vpn-static-ip --region europe-west1

gcloud services enable secretmanager.googleapis.com
gcloud beta secrets create vpn-shared-secret --locations europe-west1 --replication-policy user-managed
echo -n "<shared_key_here>" | gcloud beta secrets versions add vpn-shared-secret --data-file=-

Note: The static IP address used to establish a static VPN connection should always be created manually. If you ever need to recreate (or have unintentionally destroyed) your VPN tunnel, the on-premise side won't need to reconfigure its peer IP address.

And for each spoke project:

gcloud config set project mycompany-network-spoke-<env>
gcloud compute addresses create nat-static-ip1 --region europe-west1
gcloud compute addresses create nat-static-ip2 --region europe-west1

Note: Static IP addresses used by the NAT gateway should always be created manually. If you ever need to recreate (or have unintentionally destroyed) the NAT gateway, the tools and servers that allowlist those IP addresses won't need to update their allowlists.

We will also need a bucket to store our Terraform state.

## enable apis
gcloud config set project mycompany-secops
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable storage.googleapis.com

## create gcs bucket
export REGION_DEFAULT=europe-west1
export BUCKET_NAME=bucket-mycompany-terraform-backend
gsutil mb -c standard -l $REGION_DEFAULT gs://$BUCKET_NAME
gsutil versioning set on gs://$BUCKET_NAME

Note: I recommend that customers centralize the Terraform state bucket in a dedicated project.

Now we can create our pipelines. The Gitlab runner will need the following permissions:

roles/compute.networkAdmin at network folder level.
roles/compute.xpnAdmin at spoke folder level.
roles/storage.objectAdmin on mycompany-secops project.
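These bindings can also be kept as code. A hedged sketch of a bootstrap configuration, applied once by an administrator; the folder IDs and the runner's service account email are hypothetical placeholders:

```hcl
# Hypothetical bootstrap configuration granting the Gitlab runner its roles.

variable "runner_service_account" {
  type = string # hypothetical: email of the runner's service account
}

variable "network_folder_id" {
  type = string # hypothetical, format "folders/123456789"
}

variable "spoke_folder_id" {
  type = string # hypothetical, format "folders/987654321"
}

resource "google_folder_iam_member" "runner-network-admin" {
  folder = var.network_folder_id
  role   = "roles/compute.networkAdmin"
  member = "serviceAccount:${var.runner_service_account}"
}

resource "google_folder_iam_member" "runner-xpn-admin" {
  folder = var.spoke_folder_id
  role   = "roles/compute.xpnAdmin"
  member = "serviceAccount:${var.runner_service_account}"
}

# Access to the Terraform state bucket hosted in mycompany-secops
resource "google_project_iam_member" "runner-state-bucket" {
  project = "mycompany-secops"
  role    = "roles/storage.objectAdmin"
  member  = "serviceAccount:${var.runner_service_account}"
}
```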

Note: To assign permissions to a Gitlab runner, please check out my latest article on Securing Google Service Account from Gitlab CI.

Fill in plan/terraform.tfvars in the hub repository:

HUB_SUBNET_IP_RANGE=
ON_PREMISE_PEER_IP=
ON_PREMISE_NETWORK_IP_RANGE=

sed -i "s,<HUB_SUBNET_IP_RANGE>,${HUB_SUBNET_IP_RANGE},g;s,<ON_PREMISE_PEER_IP>,${ON_PREMISE_PEER_IP},g;s,<ON_PREMISE_NETWORK_IP_RANGE>,${ON_PREMISE_NETWORK_IP_RANGE},g" plan/terraform.tfvars

repo-mycompany-network-hub/.gitlab-ci.yml

stages:
  - init
  - deploy

# Install Terraform
.install:
  before_script:
      - apt-get update
      - apt-get install -y zip unzip
      - curl -sS "https://releases.hashicorp.com/terraform/0.14.7/terraform_0.14.7_linux_amd64.zip" > terraform.zip
      - unzip terraform.zip -d /usr/bin

init terraform:
  extends: .install
  stage: init
  image: 
    name: google/cloud-sdk
  script: 
    - cd plan
    - gcloud config set project mycompany-network-hub
    - terraform init -backend-config="bucket=bucket-mycompany-terraform-backend" -backend-config="prefix=network/hub/terraform/state"
  artifacts:
    paths:
      - plan/.terraform
      # Dependency lock file written by terraform init since 0.14
      - plan/.terraform.lock.hcl
  only:
    - master 
  tags:
    - k8s-network-runner

deploy terraform:
  extends: .install
  stage: deploy
  image: 
    name: google/cloud-sdk
  script: 
    - cd plan
    - gcloud config set project mycompany-network-hub
    - terraform apply -auto-approve
  only:
    - master 
  tags:
    - k8s-network-runner

Fill in envs/$ENV/terraform.tfvars in the spokes repository:

ENV=
SPOKE_SUBNET_IP_RANGE=
SPOKE_SUBNET_PODS_IP_RANGE=
SPOKE_SUBNET_SERVICES_IP_RANGE=

sed -i "s,<ENV>,${ENV},g;s,<SPOKE_SUBNET_IP_RANGE>,${SPOKE_SUBNET_IP_RANGE},g;s,<SPOKE_SUBNET_PODS_IP_RANGE>,${SPOKE_SUBNET_PODS_IP_RANGE},g;s,<SPOKE_SUBNET_SERVICES_IP_RANGE>,${SPOKE_SUBNET_SERVICES_IP_RANGE},g" envs/$ENV/terraform.tfvars

repo-mycompany-network-spokes/.gitlab-ci.yml

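stages:
  - init
  - deploy

# Install Terraform, as in the hub pipeline
.install:
  before_script:
      - apt-get update
      - apt-get install -y zip unzip
      - curl -sS "https://releases.hashicorp.com/terraform/0.14.7/terraform_0.14.7_linux_amd64.zip" > terraform.zip
      - unzip terraform.zip -d /usr/bin

init terraform:
  extends: .install
  stage: init
  image:
    name: google/cloud-sdk
  script:
    - cd envs/$ENV
    - gcloud config set project mycompany-network-spoke-$ENV
    - terraform init -backend-config="bucket=bucket-mycompany-terraform-backend" -backend-config="prefix=network/spoke/$ENV/terraform/state" ../../plan/
  artifacts:
    paths:
      - envs/$ENV/.terraform
      - envs/$ENV/.terraform.lock.hcl
  only:
    - master
  tags:
    - k8s-network-runner

deploy terraform:
  extends: .install
  stage: deploy
  image:
    name: google/cloud-sdk
  script:
    - cd envs/$ENV
    - gcloud config set project mycompany-network-spoke-$ENV
    - terraform apply -auto-approve ../../plan/
  only:
    - master
  tags:
    - k8s-network-runner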

Once the connection is established between the hub and the spokes, you can attach your service projects to the host projects:

gcloud config set project mycompany-business-$ENV
gcloud beta compute shared-vpc associated-projects add mycompany-business-$ENV --host-project mycompany-network-spoke-$ENV
PROJECT_NUMBER=$(gcloud projects describe mycompany-business-$ENV --format="value(projectNumber)")
gcloud projects add-iam-policy-binding mycompany-network-spoke-$ENV --member "serviceAccount:$PROJECT_NUMBER@cloudservices.gserviceaccount.com" --role "roles/compute.networkUser"
gcloud projects add-iam-policy-binding mycompany-network-spoke-$ENV --member "serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com" --role "roles/compute.networkUser"
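The same attachment could also be managed in Terraform, for example in the spokes repository. This hedged sketch reuses the service project naming of the gcloud commands above; as a design choice it grants `roles/compute.networkUser` on the spoke subnet only, rather than on the whole host project as the commands do. GKE clusters in service projects need additional bindings that are not shown here.

```hcl
# Hypothetical Terraform equivalent of the Shared VPC attachment commands.

data "google_project" "service" {
  project_id = "mycompany-business-${var.env}"
}

resource "google_compute_shared_vpc_service_project" "service" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = data.google_project.service.project_id
}

# Google APIs service agent of the service project
resource "google_compute_subnetwork_iam_member" "cloudservices" {
  project    = data.google_project.spoke.project_id
  region     = google_compute_subnetwork.spoke-subnet.region
  subnetwork = google_compute_subnetwork.spoke-subnet.name
  role       = "roles/compute.networkUser"
  member     = "serviceAccount:${data.google_project.service.number}@cloudservices.gserviceaccount.com"
}

# Compute Engine service agent of the service project
resource "google_compute_subnetwork_iam_member" "compute-system" {
  project    = data.google_project.spoke.project_id
  region     = google_compute_subnetwork.spoke-subnet.region
  subnetwork = google_compute_subnetwork.spoke-subnet.name
  role       = "roles/compute.networkUser"
  member     = "serviceAccount:service-${data.google_project.service.number}@compute-system.iam.gserviceaccount.com"
}
```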

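Compute resources running in the service projects can then reach on-premise workloads through the hub, and vice versa, as long as the spoke firewall rules, the VPN traffic selectors and the on-premise routing allow the traffic.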

Go further

To further secure connectivity, we can enforce some organization policy constraints:

compute.restrictVpnPeerIPs
compute.restrictDedicatedInterconnectUsage
compute.restrictPartnerInterconnectUsage
compute.restrictVpcPeering
compute.restrictCloudNATUsage
compute.restrictXpnProjectLienRemoval
compute.restrictSharedVpcHostProjects
compute.restrictSharedVpcSubnetworks
compute.skipDefaultNetworkCreation
compute.vmExternalIpAccess
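As a hedged illustration, two of these constraints could be enforced with Terraform on the folder holding the network projects: a boolean constraint and a list constraint. The folder ID is a hypothetical placeholder, and `on_premise_peer_ip` mirrors the hub variable of the same name.

```hcl
# Hypothetical configuration enforcing two constraints on the network folder.

variable "network_folder_id" {
  type = string # hypothetical, format "folders/123456789"
}

variable "on_premise_peer_ip" {
  type = string
}

# Boolean constraint: new projects are created without a default network
resource "google_folder_organization_policy" "skip-default-network" {
  folder     = var.network_folder_id
  constraint = "compute.skipDefaultNetworkCreation"

  boolean_policy {
    enforced = true
  }
}

# List constraint: only the on-premise gateway is accepted as a VPN peer
resource "google_folder_organization_policy" "restrict-vpn-peer-ips" {
  folder     = var.network_folder_id
  constraint = "compute.restrictVpnPeerIPs"

  list_policy {
    allow {
      values = [var.on_premise_peer_ip]
    }
  }
}
```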

Filter on-premise network traffic using a hierarchical firewall policy at the folder level of each spoke environment.
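A hedged sketch of such a hierarchical firewall policy, assuming a 3.x google provider recent enough to include the `google_compute_firewall_policy` resources. It allows HTTPS from on-premise and denies everything else from on-premise for every VPC under one spoke environment folder; the folder ID, port and priorities are hypothetical placeholders. Traffic that matches no rule falls through to the VPC firewall rules of the spoke.

```hcl
# Hypothetical hierarchical firewall policy for one spoke environment folder.

variable "spoke_env_folder_id" {
  type = string # hypothetical, format "folders/123456789"
}

variable "on_premise_network_ip_range" {
  type = string
}

resource "google_compute_firewall_policy" "on-premise" {
  parent     = var.spoke_env_folder_id
  short_name = "on-premise-filter"
}

# Allow HTTPS from on-premise...
resource "google_compute_firewall_policy_rule" "allow-https-from-on-premise" {
  firewall_policy = google_compute_firewall_policy.on-premise.id
  priority        = 1000
  direction       = "INGRESS"
  action          = "allow"

  match {
    src_ip_ranges = [var.on_premise_network_ip_range]
    layer4_configs {
      ip_protocol = "tcp"
      ports       = ["443"] # placeholder port
    }
  }
}

# ...and deny any other traffic coming from on-premise
resource "google_compute_firewall_policy_rule" "deny-other-from-on-premise" {
  firewall_policy = google_compute_firewall_policy.on-premise.id
  priority        = 2000
  direction       = "INGRESS"
  action          = "deny"

  match {
    src_ip_ranges = [var.on_premise_network_ip_range]
    layer4_configs {
      ip_protocol = "all"
    }
  }
}

resource "google_compute_firewall_policy_association" "spoke-folder" {
  name              = "on-premise-filter-association"
  firewall_policy   = google_compute_firewall_policy.on-premise.id
  attachment_target = var.spoke_env_folder_id
}
```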

If your business project has a private GKE cluster, pods will not be able to reach the on-premise network out of the box. You will need to force masquerading for all traffic originating from the pods [5].
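Masquerading is driven by the `ip-masq-agent` ConfigMap in `kube-system`. A hedged sketch using the Terraform Kubernetes provider (hashicorp/kubernetes), assuming it is configured against the private cluster and that `pods_ip_range` (hypothetical) holds the cluster's pod secondary range. Only pod-to-pod traffic keeps the pod IP; every other destination, on-premise included, sees the node IP from the primary subnet range. The ip-masq-agent DaemonSet must be running in the cluster; GKE deploys it automatically only in some configurations.

```hcl
# Hypothetical ip-masq-agent configuration for a private GKE cluster.

variable "pods_ip_range" {
  type = string # hypothetical: same value as spoke_subnet_pods_ip_range
}

resource "kubernetes_config_map" "ip_masq_agent" {
  metadata {
    name      = "ip-masq-agent"
    namespace = "kube-system"
  }

  data = {
    # Traffic to any range not listed here is masqueraded to the node IP
    config = yamlencode({
      nonMasqueradeCIDRs = [var.pods_ip_range]
      masqLinkLocal      = false
      resyncInterval     = "60s"
    })
  }
}
```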

If you need to resolve DNS between on-premise and your business projects, you can implement a DNS topology similar to the hub-and-spoke model.
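One way to sketch such a DNS topology, under stated assumptions: a private forwarding zone in the hub sends queries for the on-premise domain to an on-premise DNS server, and each spoke has a DNS peering zone that delegates the same domain to the hub network. The domain `corp.example.com.` and the `on_premise_dns_server_ip` variable are hypothetical; depending on the provider version, `peering_config` may require the google-beta provider. Creating a peering zone that targets the hub network also requires the DNS Peer role on the hub project, and on-premise must route the Cloud DNS forwarding range 35.199.192.0/19 back through the VPN.

```hcl
# Hypothetical hub-and-spoke DNS sketch (shown for the prod spoke only).

variable "on_premise_dns_server_ip" {
  type = string # hypothetical: IP address of an on-premise DNS server
}

# Hub: forward queries for the on-premise domain to on-premise
resource "google_dns_managed_zone" "hub-forwarding" {
  name       = "on-premise-forwarding"
  dns_name   = "corp.example.com."
  project    = "mycompany-network-hub"
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = "projects/mycompany-network-hub/global/networks/hub"
    }
  }

  forwarding_config {
    target_name_servers {
      ipv4_address = var.on_premise_dns_server_ip
    }
  }
}

# Spoke: delegate resolution of the same domain to the hub network
resource "google_dns_managed_zone" "spoke-peering" {
  name       = "on-premise-peering"
  dns_name   = "corp.example.com."
  project    = "mycompany-network-spoke-prod"
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = "projects/mycompany-network-spoke-prod/global/networks/spoke"
    }
  }

  peering_config {
    target_network {
      network_url = "projects/mycompany-network-hub/global/networks/hub"
    }
  }
}
```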