🧭 Study how to deploy GKE private cluster using terraform and expose an echo server
🔗 Repo: https://github.com/pistocop/gke-basic-cluster-deployment
⚠️ For production deployments, use the Terraform Kubernetes Engine Module
📧 Found an error or have a question? Write to me
📢 Intro
Kubernetes (k8s), although it needs no introduction from me, is the most famous and widely adopted container orchestrator in the world. Hosting it yourself is undoubtedly a very advanced and expert-level topic, so the majority of companies choose a provider that offers it as a managed service.
There are multiple famous k8s hosting services (e.g. GKE, EKS, AKS, Okteto), but there is no doubt that one of the leading ones is Google Kubernetes Engine (GKE), a product provided by Google Cloud Platform (GCP).
So can we simply open GKE, start a cluster, and be ready to go? Well… yes and no: it may work until something more structured and production-ready is required. That's when the GKE settings and tweaks start to emerge and need to be addressed. Take for example the official https://github.com/terraform-google-modules/terraform-google-kubernetes-engine terraform module: there are a lot of parameters that can really change how GKE is deployed and used.
So, simple things first: we won't go through all the possible parameters; instead, we will look at a basic deployment of GKE with a description of the main settings that any deployment should address.
For better understanding - and code reuse - we deploy the system using [terraform](https://www.terraform.io/); this gives us the opportunity to explain all the components with the unique clarity that belongs to code.
🚀 Deploy
- Prerequisites:
  - [terraform](https://www.terraform.io/)
  - [kubectl](https://kubernetes.io/docs/tasks/tools/)
  - [gcloud](https://cloud.google.com/sdk/gcloud)
  - [tfswitch](https://tfswitch.warrensbox.com/) (optional)
- Download the repo https://github.com/pistocop/gke-basic-cluster-deployment
- Enable the following GCP APIs using the GCP console [1]
- Open GCP console -> APIs & Services -> enable APIs and services
- Enable:
- Compute Engine API
- Kubernetes Engine API
- Compile the input variables:

```
$ cp iac/variables.tfvars.example iac/variables.tfvars
$ vi iac/variables.tfvars # replace with your data
```
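For reference, the compiled file could look like the following; the exact variable names depend on the repo's `variables.tf`, so treat these as placeholders:

```hcl
# iac/variables.tfvars - placeholder values; the variable names are assumptions
project_id = "my-gcp-project" # your GCP project ID
region     = "europe-west1"   # region/zone where the cluster will live
```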
- Deploy the cluster (replace/set the variables with your data):

```
# configure gcloud to the desired project
$ gcloud config set project $PROJECT_ID

# configure terraform
$ cd iac
$ tfswitch
$ terraform init

# deploy the GKE pre-requisites
$ terraform plan -out out.plan -var-file="./variables.tfvars" -var deploy_cluster=false
$ terraform apply out.plan

# deploy GKE - can take more than 20 minutes
$ terraform plan -out out.plan -var-file="./variables.tfvars" -var deploy_cluster=true
$ terraform apply out.plan
```
- Deploy the services into k8s (replace/set the variables with your data):

```
# set kubectl context
$ gcloud container clusters get-credentials gkedeploy-cluster --zone $PROJECT_REGION --project $PROJECT_ID

# create common resources
$ kubectl apply -f k8s/common

# deploy the server
$ kubectl apply -f k8s/gechoserver/

# wait until the ADDRESS is displayed - can take more than 10 minutes
$ kubectl -n dev get ingress -o wide
NAME          CLASS    HOSTS   ADDRESS          PORTS   AGE
gechoserver   <none>   *       34.120.114.207   80      67s

# query the server from the internet - can take more than 10 minutes
# replace "34.120.114.207" with your address:
$ curl -XPOST -v http://34.120.114.207/ -d "foo=bar"

# ~ Congratulations, your server on GKE is up and running! ~
```
- Destroy the cluster:

```
$ cd iac
$ terraform destroy -auto-approve -var-file="./variables.tfvars" -var deploy_cluster=true
```
🏗️ Architecture
Terraform
- The k8s cluster provider is GKE from Google Cloud Platform (GCP)
- The terraform state is stored only locally (i.e. no backend on GCS; a sketch of the remote alternative is below)
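For reference, a remote backend (not used in this repo) would be declared roughly like this; the bucket name is a placeholder:

```hcl
# Hypothetical GCS backend - this repo keeps the terraform state local instead
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # placeholder bucket name
    prefix = "gke-basic-cluster"
  }
}
```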
GKE
- Deployed as a private cluster, so the nodes rely only on internal IPs
- Deployed in VPC-native mode:
  - Traffic to a specific pod is routed directly to the correct node thanks to container-native load balancing
  - We deploy the Ingress back-end as `ClusterIP` instead of `NodePort` to leverage container-native load balancing
- The terraform variable `deploy_cluster` steers the cluster creation; set it to `false` to create only the network components (see the sketch after this list)
- Uses a Service Account named `gkedeploy-sa`
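A minimal sketch of how the private-cluster settings and the `deploy_cluster` switch can be wired together in Terraform; the resource name, location, and node count are assumptions, and the repo's actual code may differ:

```hcl
variable "deploy_cluster" {
  type    = bool
  default = false
}

resource "google_container_cluster" "gke" {
  # create the cluster only when deploy_cluster=true
  count    = var.deploy_cluster ? 1 : 0
  name     = "gkedeploy-cluster"
  location = "europe-west1" # assumption: set your region/zone

  initial_node_count = 1            # assumption
  networking_mode    = "VPC_NATIVE" # pods and services get alias IPs

  # secondary ranges for pods and services (values from this article)
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "10.11.0.0/21" # pods
    services_ipv4_cidr_block = "10.12.0.0/21" # services
  }

  # private cluster: nodes get internal IPs only
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "10.13.0.0/28" # free range for the Google-managed masters
  }
}
```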
Network
- No NAT will be deployed
  - Therefore the system cannot pull images from public container registries like Docker Hub; read more under Tips and Takeaways
- The VPC is created in custom mode, so no subnet is automatically created in each region
- Two subnetworks are provided (see the sketch after this list):
  - `gkedeploy-subnet`: with the range `10.10.0.0/24`, the subnetwork where the GKE nodes will be deployed
    - Instances within this network can access Google APIs and services by using Private Google Access
  - `gkedeploy-lb-proxy-only-subnet`: with the range `10.14.0.0/23`, a proxy-only subnet required by GCP to reserve a range of IPs used to deploy the Load Balancers
- The VPC-native cluster alias IP ranges can be checked under Console -> VPC network details -> Secondary IPv4 ranges
  - Under that field we find both the `cluster_ipv4_cidr_block` (for pods - `10.11.0.0/21`) and `services_ipv4_cidr_block` (for services - `10.12.0.0/21`) values
- The GKE master nodes, hosted by Google, will use the `10.13.0.0/28` range; see the master_ipv4_cidr_block parameter
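A sketch of how these network pieces map to Terraform resources; the CIDRs come from the article, while the resource names and the region are assumptions:

```hcl
# custom-mode VPC: no subnets are created automatically
resource "google_compute_network" "vpc" {
  name                    = "gkedeploy-vpc" # assumption
  auto_create_subnetworks = false
}

# node subnet, with Private Google Access so instances can reach
# Google APIs without external IPs
resource "google_compute_subnetwork" "nodes" {
  name                     = "gkedeploy-subnet"
  ip_cidr_range            = "10.10.0.0/24"
  region                   = "europe-west1" # assumption
  network                  = google_compute_network.vpc.id
  private_ip_google_access = true
}

# proxy-only subnet, reserved for the load balancer proxies
resource "google_compute_subnetwork" "lb_proxy_only" {
  name          = "gkedeploy-lb-proxy-only-subnet"
  ip_cidr_range = "10.14.0.0/23"
  region        = "europe-west1" # assumption
  network       = google_compute_network.vpc.id
  purpose       = "REGIONAL_MANAGED_PROXY"
  role          = "ACTIVE"
}
```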
🥪 Tips and Takeaways
- Every component and setting described here is reported in the `terraform` code, along with more insights; read the code to grasp those concepts
- We deploy GKE after the other resources (this is why we had 2 `terraform` plans) because otherwise the GKE deployment sometimes remains stuck indefinitely during the health-check process, and terraform returns the error `Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but [...]`
  - The error may be caused by the SA not being deployed/up & running yet, so try to deploy all the other resources first using `-var deploy_cluster=false` and then deploy the cluster using terraform with `-var deploy_cluster=true`
- To pull docker images without adding the NAT component we have two choices (memo: we have Private Google Access enabled):
  - Enable the Artifact Registry service on your GCP project and upload/mirror the desired images
  - Choose the images to use from the public Google Container Registry, which is served from Google IPs reachable thanks to Private Google Access (e.g. the `k8s/gechoserver` deployment)
deployment) - By default, you cannot reach GCE vm using
-tunnel-through-iap
because the firewalls block that connection- We add
fw-iap
firewall rule to terraform in order to use this GCP functionality, named IAP for TCP forwarding
- We add
- [1] We could write Terraform code to enable the GCP APIs, but the opinionated choice here is not to
- On terraform, under the GKE section, why is `master_ipv4_cidr_block` required?
  - Because the k8s master(s) are managed by Google, and a peering connection is created from the Google network to the GKE network
  - Due to this connection, Google needs to know a free IP range it can use to assign IPs to the master's components
- When deploying a k8s Service object, pay attention when defining UDP/TCP ports: wrong usage fails silently
  - Example:
    - Start 2 pods (`A` and `B`) behind a ClusterIP Service whose ports are declared for TCP connections only
    - Run the following commands:

```
# ~ With TCP:
# Client A:
$ nc -l -p 8080

# Client B:
$ nc network-multitool 8080
hello in TCP

# Client A:
hello in TCP   # <-- msg received

# ~ With UDP:
# Client A:
$ nc -l -u -p 8081

# Client B:
$ nc -u network-multitool 8081
hello in UDP

# Client A:
# <nothing>
```
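Regarding the `fw-iap` rule mentioned above, here is a minimal sketch of what it can look like; `35.235.240.0/20` is the source range documented by Google for IAP TCP forwarding, while the rest of the values are assumptions:

```hcl
# allow IAP's TCP-forwarding proxies to reach the instances over SSH
resource "google_compute_firewall" "fw_iap" {
  name    = "fw-iap"
  network = google_compute_network.vpc.id # assumption: the cluster's VPC

  allow {
    protocol = "tcp"
    ports    = ["22"] # SSH; add any other ports you want to forward
  }

  # documented IP range used by IAP for TCP forwarding
  source_ranges = ["35.235.240.0/20"]
}
```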
🔗 Resources
- [1] Can I automatically enable APIs when using GCP cloud with terraform? - Stack Overflow
- [2] Best managed kubernetes platform - Reddit
- Learn Terraform - Provision a GKE Cluster - GitHub
- Official GCP Terraform provider - docs
- GKE Ingress for HTTP(S) Load Balancing - docs
- Network overview - docs
- VPC-native clusters - docs
- DNS on GKE: Everything you need to know - Medium
- A trip with Google Global Load Balancers - Medium