I recently went through the full process of setting up GKE Gateway API on a production-grade cluster — Fleet registration, controller enablement, Helm deployment, and a handful of painful debugging sessions. This is the cleaned-up runbook so you don't have to learn these lessons the hard way.
## Cluster Context
| Field | Value |
|---|---|
| Project | my-gcp-project |
| Cluster | my-gke-cluster |
| Region | us-east1 |
| Namespace | my-namespace |
Traffic flow:

```
Client → Cloud Load Balancer → GKE Gateway (L7) → HTTPRoute → Kubernetes Service → Pod
```
## 1. Prerequisites & Initial Setup
### Set project & cluster context

```shell
gcloud config set project my-gcp-project

gcloud container clusters get-credentials my-gke-cluster \
  --region us-east1
```
### Enable required GCP APIs

```shell
gcloud services enable container.googleapis.com
gcloud services enable gkehub.googleapis.com
gcloud services enable serviceusage.googleapis.com
gcloud services enable multiclusteringress.googleapis.com
gcloud services enable multiclusterservicediscovery.googleapis.com
```
### IAM permissions

Grant your service account the roles needed for Fleet and Ingress management:

```shell
SA=serviceAccount:my-service-account@my-gcp-project.iam.gserviceaccount.com
PROJECT=my-gcp-project

gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/container.admin"
gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/gkehub.admin"
gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/serviceusage.serviceUsageAdmin"
```
## 2. Fleet Registration & Gateway Enablement
### Enable Workload Identity

The Workload Identity pool must match your service project, not the host project.

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --workload-pool=my-service-project.svc.id.goog
```
### Register the cluster to Fleet

```shell
gcloud container fleet memberships register my-gke-cluster \
  --gke-cluster=us-east1/my-gke-cluster \
  --enable-workload-identity

# Verify
gcloud container fleet memberships list
```
### Enable Fleet Ingress (Gateway controller)

```shell
gcloud container fleet ingress enable

# Verify — expected: state: ACTIVE, membershipStates: OK
gcloud container fleet ingress describe
```
### Enable Gateway API at cluster level

This is a separate step from Fleet Ingress. Without it, `kubectl get gatewayclass` returns nothing and the controller will never attach.

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --gateway-api=standard
```
### Enable a release channel

The GKE-managed Gateway controller requires a release channel to attach to the cluster:

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --release-channel regular
```
### Install Gateway API CRDs

Note: on GKE, enabling the Gateway API (`--gateway-api=standard`) installs and manages these CRDs for you, so on current versions this apply is usually a no-op safeguard. Apply them manually only if they are missing:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml

# Verify
kubectl get crds | grep gateway
```
### Verify GatewayClasses

After the controller attaches, these should appear:

```shell
kubectl get gatewayclass
# NAME                              CONTROLLER
# gke-l7-rilb                       networking.gke.io/gateway
# gke-l7-global-external-managed    networking.gke.io/gateway
# gke-l7-gxlb                       networking.gke.io/gateway
# gke-l7-regional-external-managed  networking.gke.io/gateway
```
## 3. Helm Chart Deployment

```shell
helm upgrade --install my-gateway-chart ./gateway \
  -n my-namespace --create-namespace
```
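For orientation, here is a minimal sketch of the kind of resources such a chart renders. The resource names, hostname, and listener layout are illustrative assumptions, not the chart's actual output:

```yaml
# Hypothetical rendered output; adjust names to your values.yaml.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-https-gateway
  namespace: my-namespace
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: frontend-route
  namespace: my-namespace
spec:
  parentRefs:
    - name: my-https-gateway
  hostnames:
    - "myapp.dev.example.io"
  rules:
    - backendRefs:
        - name: frontend
          port: 80
```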
### Validate

```shell
kubectl get gateway -n my-namespace
kubectl get httproute -n my-namespace
kubectl get svc -n my-namespace
kubectl describe gateway -n my-namespace
```
## 4. Issues & Fixes
This is where things got interesting.
### Issue 1 — GatewayClass missing

**Symptom:** `kubectl get gatewayclass` returns no resources.

**Root cause:** Fleet Ingress and Gateway API at cluster level are two separate enablement steps. You need both.

**Fix:** Run the cluster update with `--gateway-api=standard` (step 2 above).
### Issue 2 — Gateway stuck "Waiting for controller"

**Symptom:** ArgoCD shows the Gateway as Progressing indefinitely.

**Root cause:** No release channel was set. The managed controller won't attach without one.

**Fix:** Set the release channel, then delete the Gateway resource and let ArgoCD re-sync:

```shell
kubectl delete gateway my-https-gateway -n my-namespace
# ArgoCD will recreate it against the now-attached controller
```
### Issue 3 — CertificateMap region mismatch

**Error:**

```
CertificateMap "my-cert-map-cert-map" must not be configured in a region other than global
```

Two things went wrong here simultaneously.
#### Problem A — Helm double-suffix bug

The Helm template was concatenating a suffix onto a value that already had it:

```yaml
# values.yaml
certMap: my-cert-map
```

```yaml
# template (broken)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}-cert-map
# renders as: my-cert-map-cert-map ❌

# template (correct)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}
# renders as: my-cert-map ✔
```
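A cheap way to catch this class of bug before it renders is a plain string guard in a pre-deploy script. This is a sketch of the idea, not part of the original chart; the function name and messages are made up:

```shell
#!/bin/sh
# Guard sketch: reject a value that already ends with the suffix a
# template is about to append. Pure string logic, so it can run in CI
# before `helm template`.
check_certmap() {
  value=$1
  suffix=$2
  case "$value" in
    *"$suffix")
      echo "FAIL: '$value' already ends with '$suffix'"
      return 1
      ;;
    *)
      echo "OK: '$value'"
      ;;
  esac
}

check_certmap "my-cert-map" "-cert-map"   # the double-suffix case from this post
check_certmap "my-map" "-cert-map"
```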
#### Problem B — Wrong GatewayClass

`gke-l7-rilb` is regional internal. CertificateMaps are global-only — they are incompatible:

```yaml
# WRONG
gatewayClassName: gke-l7-rilb  # regional internal, no certMap

# CORRECT
gatewayClassName: gke-l7-global-external-managed  # global, certMap supported
```

🚨 **Rule to remember:** CertificateMaps are GLOBAL ONLY. Never pair them with a regional GatewayClass.
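Combined, the two fixes give a Gateway roughly like this. The listener layout is a sketch, and it assumes the certificate itself is served from the Certificate Manager map that the annotation references:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-https-gateway
  namespace: my-namespace
  annotations:
    networking.gke.io/certmap: my-cert-map  # value used verbatim, no appended suffix
spec:
  gatewayClassName: gke-l7-global-external-managed  # global, certMap-compatible
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
```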
### Issue 4 — HTTPRoute BackendNotFound

**Error:**

```
BackendNotFound: services my-namespace/<name> not found
```

**Root cause:** The `backendRefs` in the HTTPRoute used Helm value names, not the actual Kubernetes Service names.
```yaml
# WRONG — these were Helm value names, not Service names
backendRefs:
  - name: frontend-service   # ❌
  - name: search-service     # ❌

# CORRECT — exact Kubernetes Service names
backendRefs:
  - name: frontend       # ✔
  - name: search         # ✔
  - name: dashboard      # ✔
  - name: analytics-web  # ✔
```
## 5. Validation & Testing

### Get the Gateway IP

```shell
kubectl get gateway -n my-namespace
# Look at the ADDRESS column
```
### Why a bare curl returns 404

```shell
curl -v http://<GATEWAY-IP>
# 404 fault filter abort
```

This is expected. The Gateway matched the request but no HTTPRoute matched the Host header. It's not an error — it confirms the Gateway is working.
### Test with the correct Host header

```shell
curl -H "Host: myapp.dev.example.io" http://<GATEWAY-IP>
# Should return your app's HTML
```
### DNS

Create an A record:

```
myapp.dev.example.io → <GATEWAY-IP>
```
### HTTPS

```shell
curl -vk https://myapp.dev.example.io
```
### Local pod test

```shell
kubectl port-forward svc/analytics-web 8080:3000 -n my-namespace
curl http://localhost:8080
```
## 6. ArgoCD Cleanup

Orphaned HTTPRoutes from previous deploys will cause ArgoCD to show Degraded even when the app is healthy. Clean them up:

```shell
kubectl delete httproute old-frontend-route -n my-namespace
kubectl delete httproute old-search-route -n my-namespace
```

Or enable automated pruning in your ArgoCD Application. Note that `PruneLast=true` only controls prune ordering; it is `automated.prune: true` that actually deletes orphaned resources:

```yaml
syncPolicy:
  automated:
    prune: true
  syncOptions:
    - PruneLast=true
```
## 7. Quick Debug Reference

```shell
# Gateway status
kubectl describe gateway -n my-namespace
kubectl get gateway -n my-namespace -w

# HTTPRoutes
kubectl describe httproute -n my-namespace

# Controller (no pods = managed mode, that's normal)
kubectl get pods -A | grep -i gateway
kubectl get gatewayclass
kubectl get crd | grep gateway

# Cluster config
gcloud container clusters describe my-gke-cluster --region us-east1

# Events
kubectl get events -n my-namespace --sort-by=.lastTimestamp
```
## 8. Final State
| Component | Status |
|---|---|
| GatewayClass | ✅ Accepted |
| Gateway | ✅ Programmed |
| HTTPRoute | ✅ Healthy |
| Services | ✅ Resolved |
| Load Balancer | ✅ Active |
| Pods | ✅ Running |
| ArgoCD | ✅ Synced |
## 9. Key Lessons

- **Fleet Ingress ≠ Gateway controller ready.** You need `--gateway-api=standard` separately on the cluster.
- **No release channel = no controller attachment.** Add the cluster to `regular` or `stable`.
- **CertificateMaps are global only.** Never use them with `gke-l7-rilb` or any regional GatewayClass.
- **Helm string concatenation silently breaks GCP resource names.** Always `helm template` and inspect the rendered output before applying.
- **HTTPRoute `backendRefs` need exact Service names.** Your Helm value names and your Kubernetes Service names are not the same thing.
- **404 fault filter abort = hostname mismatch, not a broken Gateway.** Check your Host header.
- **No gateway pods in GKE managed mode is normal.** The controller is cloud-managed.
- **Dataplane V2 can't be enabled on an existing cluster.** You'd need to recreate it — and it's not required for Gateway API anyway.
## 10. Improvements Worth Making

**Wildcard hostname** — Reduces per-service HTTPRoute config significantly:

```yaml
spec:
  hostnames:
    - "*.dev.example.io"
```
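A single wildcard route can then fan out by path instead of needing one route per host. The path-to-backend mapping below is illustrative, not this app's actual layout:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: wildcard-route
  namespace: my-namespace
spec:
  parentRefs:
    - name: my-https-gateway
  hostnames:
    - "*.dev.example.io"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /search
      backendRefs:
        - name: search
          port: 80
    - backendRefs:          # default rule for everything else
        - name: frontend
          port: 80
```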
**CI validation** — Add a pre-deploy check that verifies every `backendRef` name exists as a live Service in the target namespace. This eliminates the BackendNotFound class of errors before they hit the cluster.
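A minimal sketch of that check. The two name lists are passed in as plain strings so the comparison logic stands alone; in a real pipeline you would populate them from the rendered chart and from `kubectl get svc`:

```shell
#!/bin/sh
# CI sketch: report backendRef names with no matching Service.
# Both arguments are whitespace-separated name lists.
missing_backends() {
  refs=$1
  services=$2
  missing=""
  for ref in $refs; do
    found=no
    for svc in $services; do
      [ "$ref" = "$svc" ] && found=yes
    done
    [ "$found" = "no" ] && missing="$missing $ref"
  done
  # trim the leading space before printing
  echo "${missing# }"
}

missing_backends "frontend search frontend-service" "frontend search dashboard analytics-web"
```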
**Helm schema guards** — Use `values.schema.json` to validate that your `certMap` value doesn't already end with the suffix your template appends. Catches double-suffix bugs at `helm lint` time.
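A sketch of what that guard might look like, assuming the template still appends `-cert-map`. It uses `not` with `pattern` rather than a regex lookahead, since Helm's Go-based schema validation doesn't support lookaheads:

```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "gateway": {
      "type": "object",
      "properties": {
        "certMap": {
          "type": "string",
          "not": { "pattern": "-cert-map$" }
        }
      }
    }
  }
}
```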
**Unified naming** — Make your Helm release name, Kubernetes Service name, and HTTPRoute `backendRef` all derive from the same value. One source of truth, zero drift.
Hope this saves someone a few hours. The GKE docs cover each of these pieces individually but the interactions between them — especially Fleet Ingress vs cluster-level Gateway API enablement, and the CertificateMap GatewayClass constraint — aren't obvious until you hit them.