Robert Nemet

Posted on • Originally published at rnemet.dev

Exploring GCP With Terraform: VPC Firewall Rules, part 2

This post is the third part of the series about exploring GCP with Terraform. In the previous part, I created VPC networks, subnets, and a few firewall rules. In this part, I will explore more firewall rules and their parameters.

More precisely, I'll set up three VPCs: back-office, services, and storage. The VPC back-office will have two subnets; the others will have one each. For the sake of the exercise, imagine that VMs in back-office have to call VMs in services and storage. Also, direct access to the VMs from outside should not be allowed, except for a single VM that will serve as a maintenance entry point.

What I have done so far

I have two VPCs: back-office and services. The VPC back-office has a subnet named back-office, and the VPC services has a subnet named services. In each VPC, I added firewall rules to allow SSH access from outside through IAP. I also added a firewall rule to allow ICMP traffic from anywhere. Each VPC has a VM instance.

The Terraform project is growing. If you look at how I set up the base workflow in the first post of this series, I now have two more similar workflows: networks and vms. This is not the best structure, but I will address that later.

VPC Firewall rules

So, what is a firewall rule? A firewall rule is a set of conditions that defines which traffic is allowed to enter or leave a VPC network. Each rule is defined by a collection of parameters (a sketch combining several of them follows the list):

  • direction - ingress or egress (default: ingress)
  • priority - the priority of the rule. The lower the number, the higher the priority. The default value is 1000.
  • action - allow or deny
  • enforced - whether the rule is enforced. The default value is true (in Terraform this corresponds to disabled = false).
  • target - the instances the rule applies to. The target can be a network tag, a service account, or all instances in the network.
  • source - the source of the traffic (for ingress rules). The source can be an IP range, a network tag, or a service account.
  • protocol and ports - the protocol of the traffic (TCP, UDP, ICMP, etc.) and, optionally, the ports.
  • logs - whether to log connections that match the rule.
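
To make these parameters concrete, here is a minimal sketch of a rule that combines several of them. It is not part of the setup in this post; the rule name, priority, and ranges are purely illustrative:

# Illustrative only: an egress rule that denies all outgoing TCP traffic
# from the VPC and logs every connection that matches it.
resource "google_compute_firewall" "example_deny_egress" {
  name      = "example-deny-egress"
  network   = google_compute_network.back_office.self_link
  direction = "EGRESS"
  priority  = 900    # lower number = higher priority; the default is 1000
  disabled  = false  # "enforcement" in the console; false keeps the rule active

  destination_ranges = ["0.0.0.0/0"]

  deny {
    protocol = "tcp"
  }

  log_config {
    metadata = "INCLUDE_ALL_METADATA" # log matched connections
  }
}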

Setting up the stage

The VPC back-office will have two subnets: back-office and back-office-private. So far, I have three firewall rules: back-office-iap, back-office-icmp, and back-office-ssh, allowing ingress traffic from Google IAP, ICMP from anywhere, and SSH from anywhere. So, let's add the new subnet back-office-private to the VPC back-office:

# subnet for back office: private 
resource "google_compute_subnetwork" "back_office_private" {
  name          = "back-office-private"
  ip_cidr_range = "10.2.0.0/24"
  network       = google_compute_network.back_office.self_link
  region        = var.region
}

To create a VM, you can use this code as a template inside the vms workflow:

resource "google_compute_instance" "services_vm_test" {
  name         = "services-vm-test"
  machine_type = "f1-micro"
  zone         = var.zone

  scheduling {
    preemptible        = true
    automatic_restart  = false
    provisioning_model = "SPOT"
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    network    = data.terraform_remote_state.network.outputs.vpc_services.id
    subnetwork = data.terraform_remote_state.network.outputs.vpc_services_subnetwork.id
  }
}

You may wonder where this part comes from:

network_interface {
    network    = data.terraform_remote_state.network.outputs.vpc_services.id
    subnetwork = data.terraform_remote_state.network.outputs.vpc_services_subnetwork.id
  }

Where is that defined? Since the networks are created in a separate workflow, networks, I need to get the network and subnet IDs from that workflow. I can do that with the terraform_remote_state data source (a sketch of it follows the outputs below). But first, I need to export the values in the networks workflow, in its outputs.tf file:

output "vpc_services" {
  value = google_compute_network.services
}

output "vpc_services_subnetwork" {
  value = google_compute_subnetwork.services
}
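Then, in the vms workflow, the terraform_remote_state data source reads those outputs. A minimal sketch, assuming the networks state lives in a GCS backend; the bucket name and prefix are placeholders, so use whatever backend that workflow actually uses:

# Hypothetical backend settings: adjust bucket/prefix to match the networks workflow.
data "terraform_remote_state" "network" {
  backend = "gcs"

  config = {
    bucket = "my-terraform-state-bucket" # placeholder
    prefix = "networks"                  # placeholder
  }
}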

Using the VM template above and exposing all VPCs and subnets from networks, I can add two VMs to each subnet. The VMs in the back-office subnet will be named back-office-vm1 and back-office-vm2, and the VMs in the back-office-private subnet will be named back-office-private-vm1 and back-office-private-vm2. First, I apply the networks workflow and then the vms workflow. Then I can list the created instances:

$ gcloud compute instances list
NAME                     ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
back-office-private-vm1  us-central1-c  f1-micro      true         10.2.0.3                  RUNNING
back-office-private-vm2  us-central1-c  f1-micro      true         10.2.0.2                  RUNNING
back-office-vm1          us-central1-c  f1-micro      true         10.1.0.9                  RUNNING
back-office-vm2          us-central1-c  f1-micro      true         10.1.0.10                 RUNNING
services-vm-test         us-central1-c  f1-micro      true         10.1.0.5                  RUNNING

Now, let's connect to one of the VMs from the local machine:

$ gcloud compute ssh back-office-vm1 --zone us-central1-c --tunnel-through-iap

Then, from that VM, let's ping another VM:

$ ping -c 5 10.2.0.3
PING 10.2.0.3 (10.2.0.3) 56(84) bytes of data.
64 bytes from 10.2.0.3: icmp_seq=1 ttl=64 time=0.884 ms
64 bytes from 10.2.0.3: icmp_seq=2 ttl=64 time=0.165 ms
64 bytes from 10.2.0.3: icmp_seq=3 ttl=64 time=0.233 ms
64 bytes from 10.2.0.3: icmp_seq=4 ttl=64 time=0.159 ms
64 bytes from 10.2.0.3: icmp_seq=5 ttl=64 time=0.238 ms

--- 10.2.0.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4103ms
rtt min/avg/max/mdev = 0.159/0.335/0.884/0.276 ms

Yes, you can reach any VM inside the VPC. Why? Because we have a firewall rule that allows ICMP traffic from anywhere. Let's delete that rule (from the local machine) and try again:

$ gcloud compute firewall-rules delete back-office-icmp
The following firewalls will be deleted:
 - [back-office-icmp]

Do you want to continue (Y/n)?  y

Deleted [https://www.googleapis.com/compute/v1/projects/network-playground-382512/global/firewalls/back-office-icmp].

Repeat the ping command from the connected VM to any other VM:

ping -c 5 10.2.0.3
PING 10.2.0.3 (10.2.0.3) 56(84) bytes of data.

--- 10.2.0.3 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4074ms

Now the VMs can no longer ping each other. Note that the rule was deleted with gcloud, so it is still in the Terraform configuration; the next apply of the networks workflow would recreate it unless you remove the resource there as well.

Tightening access: Only one VM for SSH (bastion)

Let's say we want to allow SSH access from outside to only one VM. We can do that with a firewall rule that targets just that VM. First, let's remove the rule back-office-ssh, as it is not useful here: I'm accessing the VMs through the IAP tunnel, so I don't need it. It would only help if I could reach the VMs directly, which would require them to have external IPs, and I do not want that.

Next, let's modify the rule back-office-iap so it applies only to VMs tagged bastion:

resource "google_compute_firewall" "back_office_iap" {
  name    = "back-office-iap"
  network = google_compute_network.back_office.self_link

  source_ranges = ["35.235.240.0/20"]
  target_tags   = ["bastion"]
  direction     = "INGRESS"
  allow {
    protocol = "tcp"
  }
}

The firewall rule now applies only to VMs carrying the bastion tag. Since back-office-vm1 does not have that tag yet, trying to reach it from the local machine fails:

$ gcloud compute ssh back-office-vm1 --zone us-central1-c --tunnel-through-iap
ERROR: (gcloud.compute.start-iap-tunnel) Error while connecting [4003: 'failed to connect to backend']. (Failed to connect to port 22)
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

Recommendation: To check for possible causes of SSH connectivity issues and get
recommendations, rerun the ssh command with the --troubleshoot option.

gcloud compute ssh back-office-vm1 --project=network-playground-382512 --zone=us-central1-c --troubleshoot

Or, to investigate an IAP tunneling issue:

gcloud compute ssh back-office-vm1 --project=network-playground-382512 --zone=us-central1-c --troubleshoot --tunnel-through-iap

ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

Let's add the tag bastion to the VM back-office-vm1:

$ gcloud compute instances add-tags back-office-vm1 --zone=us-central1-c --tags=bastion
Updated [https://www.googleapis.com/compute/v1/projects/network-playground-382512/zones/us-central1-c/instances/back-office-vm1].

Now, connecting to back-office-vm1 works, but connecting to any other VM in the VPC back-office doesn't. Why? Because the firewall rule matches targets by tag, and only back-office-vm1 carries the bastion tag. If I tag any other VM in the VPC back-office with bastion, I can connect to it as well.
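Adding the tag with gcloud also means the instance now differs from its Terraform definition, so the tag may be reverted the next time the vms workflow is applied. A sketch of declaring it in Terraform instead; the resource and output names are assumptions based on the VM template above:

resource "google_compute_instance" "back_office_vm1" {
  name         = "back-office-vm1"
  machine_type = "f1-micro"
  zone         = var.zone

  # Network tag matched by target_tags on the back-office-iap rule
  tags = ["bastion"]

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    # Assumed output names, analogous to vpc_services above
    network    = data.terraform_remote_state.network.outputs.vpc_back_office.id
    subnetwork = data.terraform_remote_state.network.outputs.vpc_back_office_subnetwork.id
  }
}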

Well, I'm not happy that adding and removing tags is so easy. I want a more secure way to control SSH access to the VMs. I can use a service account as the firewall target instead:

resource "google_service_account" "back_office_fw_sa" {
  account_id   = "back-office"
  display_name = "back-office"
}

resource "google_compute_firewall" "back_office_iap" {
  name    = "back-office-iap"
  network = google_compute_network.back_office.self_link

  source_ranges           = ["35.235.240.0/20"]
  target_service_accounts = [google_service_account.back_office_fw_sa.email]
  direction               = "INGRESS"
  allow {
    protocol = "tcp"
  }
  depends_on = [google_service_account.back_office_fw_sa]
}

Connecting to the VM back-office-vm1 from the local machine now gives the same error as before. Why? Because the VM does not yet run as that service account, so the firewall rule doesn't apply to it. Let's attach the service account to the VM back-office-vm1:

resource "google_compute_instance" "back_office_vm1" {
  name         = "back-office-vm1"
...

  allow_stopping_for_update = true

  service_account {
    email  = data.terraform_remote_state.network.outputs.back_office_fw_sa
    scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }

...
}

I attached the service account to the VM back-office-vm1, and now I can connect to it from the local machine: anyone who passes IAP authentication can reach it. But I can't connect to any other VM in the VPC back-office, because those VMs don't run as that service account, and the firewall rule only allows traffic to instances that do.

The scopes are required to allow the service account to access Google Cloud resources. When attaching a service account to a VM, you must set allow_stopping_for_update = true and specify the scopes, because changing the service account on a VM requires stopping it.

This requirement makes a difference. You can add and remove tags on the fly but can't do that with service accounts. You must stop the VM to change the service account. That is why using service accounts is more secure than using tags.

Notice that I exported the service account email from the networks workflow because I need it in the vms workflow; a sketch of that output follows.
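
A minimal sketch of that output in the networks workflow's outputs.tf, matching the back_office_fw_sa value the VM reads above:

# Expose the firewall service account email so the vms workflow can attach it to VMs.
output "back_office_fw_sa" {
  value = google_service_account.back_office_fw_sa.email
}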

OK, now I can connect from the local machine, via the IAP tunnel, only to back-office-vm1 in the VPC back-office; from it, I can reach the other VMs in that VPC. But I can still connect directly to all VMs in the VPCs services and storage, because their IAP firewall rules are still in place. I'll remove the IAP firewall rules from those VPCs to fix this. But then I'll need access from the back-office VMs to the VMs in services and storage, so keep the ICMP rule for the VPCs services and storage.

If you have not added the VPC storage, its subnet, and its VMs yet, you can do it now (a sketch follows): add the VPC and subnet to the networks workflow and the VMs to the vms workflow. For the subnet storage in the VPC storage, I used the CIDR 10.120.0.0/24. As a reminder, the subnet services in the VPC services uses 10.1.0.0/24, the subnet back-office in the VPC back-office uses 10.1.0.0/24, and the subnet back-office-private uses 10.2.0.0/24.
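
A minimal sketch of the VPC storage and its subnet for the networks workflow; the subnet resource name and the custom-mode setting are assumptions that follow the same pattern as back-office:

resource "google_compute_network" "storage" {
  name                    = "storage"
  auto_create_subnetworks = false
}

# subnet for storage
resource "google_compute_subnetwork" "storage" {
  name          = "storage"
  ip_cidr_range = "10.120.0.0/24"
  network       = google_compute_network.storage.self_link
  region        = var.region
}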

Tightening access: Accessing other VPCs

OK, now what? I can't connect to any VM in the VPCs services and storage from the local machine because I just removed the IAP firewall rules for those VPCs. From the local machine I can connect only to back-office-vm1; from it, I can connect to any other VM in the VPC back-office, but not to VMs in the other VPCs. To be able to connect from the local machine to back-office-vm1 and from there to the VMs in services and storage, I first need to connect the VPC back-office to the VPCs services and storage. I can do that with VPC peering: one peering between back-office and services, and another between back-office and storage:

resource "google_compute_network_peering" "back_office_service_peering" {
  name         = "back-office-services-peering"
  network      = google_compute_network.back_office.self_link
  peer_network = google_compute_network.services.self_link
}

resource "google_compute_network_peering" "service_back_office_peering" {
  name         = "services-back-office-peering"
  network      = google_compute_network.services.self_link
  peer_network = google_compute_network.back_office.self_link
}

resource "google_compute_network_peering" "back_office_datastorage_peering" {
  name         = "back-office-storage-peering"
  network      = google_compute_network.back_office.self_link
  peer_network = google_compute_network.storage.self_link
}

resource "google_compute_network_peering" "datastorage_back_office_peering" {
  name         = "storage-back-office-peering"
  network      = google_compute_network.storage.self_link
  peer_network = google_compute_network.back_office.self_link
}

But we'll get an error when applying the changes:

 Error: Error waiting for Adding Network Peering: An IP range in the peer network (10.1.0.0/24) overlaps with an IP range in the local network (10.1.0.0/24) allocated by 
 resource (projects/network-playground-382512/regions/us-central1/subnetworks/back-office).

Why? Because the VPCs back-office and services have subnets with the same IP range. When setting up VPC peering, the subnet IP ranges of the peered networks must not overlap. Changing the IP range of the subnet in the VPC services will fix it, but it shows why you need to plan subnet IP ranges up front. With IPv6 this is less of a concern, but with IPv4 you need to plan the ranges carefully: to change the CIDR of a subnet, you first have to free all resources that use it, for example by deleting all its VMs.

I'll change the IP range of the subnet services to 10.3.0.0/24. To do that, I need to delete the VMs in the VPC services first, then fix the subnet range (see the sketch below), recreate the peering between the VPCs, and finally add the deleted VMs back to the VPC services.
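
A sketch of the subnet services after the change, so it no longer overlaps with back-office:

resource "google_compute_subnetwork" "services" {
  name          = "services"
  ip_cidr_range = "10.3.0.0/24" # was 10.1.0.0/24, which overlapped with back-office
  network       = google_compute_network.services.self_link
  region        = var.region
}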

While I'm at it, I'll restore SSH access to the VMs in the other VPCs, but this time I'll use CIDR ranges to allow access only from inside my networks:

resource "google_compute_firewall" "services_ssh" {
  name    = "services-ssh"
  network = google_compute_network.services.self_link

  source_ranges = ["10.1.0.0/24", "10.2.0.0/24"]
  direction     = "INGRESS"
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}

This limits SSH access to the VMs in the VPC services to traffic coming from the back-office and back-office-private subnets. I'll add the equivalent rule for the VPC storage; a sketch follows.
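
A sketch of the equivalent rule for the VPC storage; the resource and rule names are assumptions:

resource "google_compute_firewall" "storage_ssh" {
  name    = "storage-ssh"
  network = google_compute_network.storage.self_link

  # Allow SSH only from the back-office and back-office-private subnets
  source_ranges = ["10.1.0.0/24", "10.2.0.0/24"]
  direction     = "INGRESS"
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}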

On top of all this, you'll need to add public SSH keys to the VMs. You can do it per VM or at the project level; I'll do it at the project level for now:

resource "google_compute_project_metadata" "default" {
  metadata = {
    ssh-keys = <<EOF
      user:ssh-rsa public_key_goes_here user
    EOF
  }
}

When you add SSH public keys at the project level, they grant SSH access to all VMs in the project. For now, that is good enough. To generate an SSH key pair, you can use ssh-keygen:

$ ssh-keygen -f private-key -C rnemet -b 2048
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in private-key
Your public key has been saved in private-key.pub
...

The command above generates two files: private-key (the private key) and private-key.pub (the public key). The public key goes into the project metadata, and the private key is used to connect to the VMs, so keep it safe and put it only on the machines you connect from. After adding the public key to the project metadata, you can connect to the VMs in the VPCs services and storage from the back-office VMs. All you need is the private key on the bastion VM:

$ gcloud compute scp ./private-key rnemet@back-office-vm1:.ssh/private-key --zone=us-central1-c
External IP address was not found; defaulting to using IAP tunneling.
WARNING:

To increase the performance of the tunnel, consider installing NumPy. For instructions,
please see https://cloud.google.com/iap/docs/using-tcp-forwarding#increasing_the_tcp_upload_bandwidth

private-key

When connecting to the VMs:

$ ssh rnemet@10.120.0.3 -i .ssh/private-key

With this setup, I can connect to the VMs in the VPCs services and storage from the bastion VM in the VPC back-office. Direct access to the other VMs is not possible because of IAP and the firewall rules.

Conclusion

So far, I have created VPCs, subnets, and VMs. I added firewall rules so that SSH access to the VMs is possible only from one VM inside my networks, set up VPC peering between the VPCs, and started sharing resources between Terraform state files. However, the Terraform project is growing and needs a better structure, so I need to invest some time in improving it. At the same time, I would also like to add some services that are reachable from the outside.

What do you think? What would you do next? Let me know...
