I recently went through the full process of setting up GKE Gateway API on a production-grade cluster — Fleet registration, controller enablement, Helm deployment, and a handful of painful debugging sessions. This is the cleaned-up runbook so you don't have to learn these lessons the hard way.
## Cluster Context
| Field | Value |
|---|---|
| Project | my-gcp-project |
| Cluster | my-gke-cluster |
| Region | us-east1 |
| Namespace | my-namespace |
Traffic flow:

```
Client → Cloud Load Balancer → GKE Gateway (L7) → HTTPRoute → Kubernetes Service → Pod
```
## 1. Prerequisites & Initial Setup
### Set project & cluster context

```shell
gcloud config set project my-gcp-project

gcloud container clusters get-credentials my-gke-cluster \
  --region us-east1
```
### Enable required GCP APIs

```shell
gcloud services enable container.googleapis.com
gcloud services enable gkehub.googleapis.com
gcloud services enable serviceusage.googleapis.com
gcloud services enable multiclusteringress.googleapis.com
gcloud services enable multiclusterservicediscovery.googleapis.com
```
### IAM permissions

Grant your service account the roles needed for Fleet and Ingress management:

```shell
SA=serviceAccount:my-service-account@my-gcp-project.iam.gserviceaccount.com
PROJECT=my-gcp-project

gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/container.admin"
gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/gkehub.admin"
gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/serviceusage.serviceUsageAdmin"
```
## 2. Fleet Registration & Gateway Enablement
### Enable Workload Identity

The Workload Identity pool must match your service project, not the host project.

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --workload-pool=my-service-project.svc.id.goog
```
### Register the cluster to Fleet

```shell
gcloud container fleet memberships register my-gke-cluster \
  --gke-cluster=us-east1/my-gke-cluster \
  --enable-workload-identity

# Verify
gcloud container fleet memberships list
```
### Enable Fleet Ingress (Gateway controller)

```shell
gcloud container fleet ingress enable

# Verify — expected: state: ACTIVE, membershipStates: OK
gcloud container fleet ingress describe
```
### Enable Gateway API at cluster level

This is a separate step from Fleet Ingress. Without it, `kubectl get gatewayclass` returns nothing and the controller will never attach.

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --gateway-api=standard
```
### Enable a release channel

The GKE-managed Gateway controller requires a release channel to attach to the cluster:

```shell
gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --release-channel regular
```
### Install Gateway API CRDs

Note: on GKE, enabling the Gateway API (`--gateway-api=standard`) installs and manages these CRDs for you, so on current versions this apply is usually a no-op safeguard. Apply them manually only if they are missing:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml

# Verify
kubectl get crds | grep gateway
```
### Verify GatewayClasses

After the controller attaches, these should appear:

```shell
kubectl get gatewayclass
# NAME                              CONTROLLER
# gke-l7-rilb                       networking.gke.io/gateway
# gke-l7-global-external-managed    networking.gke.io/gateway
# gke-l7-gxlb                       networking.gke.io/gateway
# gke-l7-regional-external-managed  networking.gke.io/gateway
```
## 3. Helm Chart Deployment

```shell
helm upgrade --install my-gateway-chart ./gateway \
  -n my-namespace --create-namespace
```
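For orientation, here is a minimal sketch of the kind of resources such a chart renders. The resource names, hostname, and listener layout are illustrative assumptions, not the chart's actual output:

```yaml
# Hypothetical rendered output; adjust names to your values.yaml.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-https-gateway
  namespace: my-namespace
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: frontend-route
  namespace: my-namespace
spec:
  parentRefs:
    - name: my-https-gateway
  hostnames:
    - "myapp.dev.example.io"
  rules:
    - backendRefs:
        - name: frontend
          port: 80
```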
### Validate

```shell
kubectl get gateway -n my-namespace
kubectl get httproute -n my-namespace
kubectl get svc -n my-namespace
kubectl describe gateway -n my-namespace
```
## 4. Issues & Fixes
This is where things got interesting.
### Issue 1 — GatewayClass missing

**Symptom:** `kubectl get gatewayclass` returns no resources.

**Root cause:** Fleet Ingress and Gateway API at cluster level are two separate enablement steps. You need both.

**Fix:** Run the cluster update with `--gateway-api=standard` (step 2 above).
### Issue 2 — Gateway stuck "Waiting for controller"

**Symptom:** ArgoCD shows the Gateway as Progressing indefinitely.

**Root cause:** No release channel was set. The managed controller won't attach without one.

**Fix:** Set the release channel, then delete the Gateway resource and let ArgoCD re-sync:

```shell
kubectl delete gateway my-https-gateway -n my-namespace
# ArgoCD will recreate it against the now-attached controller
```
### Issue 3 — CertificateMap region mismatch

**Error:**

```
CertificateMap "my-cert-map-cert-map" must not be configured in a region other than global
```

Two things went wrong here simultaneously.
#### Problem A — Helm double-suffix bug

The Helm template was concatenating a suffix onto a value that already had it:

```yaml
# values.yaml
certMap: my-cert-map
```

```yaml
# template (broken)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}-cert-map
# renders as: my-cert-map-cert-map ❌

# template (correct)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}
# renders as: my-cert-map ✔
```
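A cheap way to catch this class of bug before it renders is a plain string guard in a pre-deploy script. This is a sketch of the idea, not part of the original chart; the function name and messages are made up:

```shell
#!/bin/sh
# Guard sketch: reject a value that already ends with the suffix a
# template is about to append. Pure string logic, so it can run in CI
# before `helm template`.
check_certmap() {
  value=$1
  suffix=$2
  case "$value" in
    *"$suffix")
      echo "FAIL: '$value' already ends with '$suffix'"
      return 1
      ;;
    *)
      echo "OK: '$value'"
      ;;
  esac
}

check_certmap "my-cert-map" "-cert-map"   # the double-suffix case from this post
check_certmap "my-map" "-cert-map"
```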
#### Problem B — Wrong GatewayClass

`gke-l7-rilb` is regional internal. CertificateMaps are global-only — they are incompatible:

```yaml
# WRONG
gatewayClassName: gke-l7-rilb  # regional internal, no certMap

# CORRECT
gatewayClassName: gke-l7-global-external-managed  # global, certMap supported
```

🚨 **Rule to remember:** CertificateMaps are GLOBAL ONLY. Never pair them with a regional GatewayClass.
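Combined, the two fixes give a Gateway roughly like this. The listener layout is a sketch, and it assumes the certificate itself is served from the Certificate Manager map that the annotation references:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-https-gateway
  namespace: my-namespace
  annotations:
    networking.gke.io/certmap: my-cert-map  # value used verbatim, no appended suffix
spec:
  gatewayClassName: gke-l7-global-external-managed  # global, certMap-compatible
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
```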
### Issue 4 — HTTPRoute BackendNotFound

**Error:**

```
BackendNotFound: services my-namespace/<name> not found
```

**Root cause:** The `backendRefs` in the HTTPRoute used Helm value names, not the actual Kubernetes Service names.
```yaml
# WRONG — these were Helm value names, not Service names
backendRefs:
  - name: frontend-service   # ❌
  - name: search-service     # ❌

# CORRECT — exact Kubernetes Service names
backendRefs:
  - name: frontend       # ✔
  - name: search         # ✔
  - name: dashboard      # ✔
  - name: analytics-web  # ✔
```
## 5. Validation & Testing

### Get the Gateway IP

```shell
kubectl get gateway -n my-namespace
# Look at the ADDRESS column
```
### Why a bare curl returns 404

```shell
curl -v http://<GATEWAY-IP>
# 404 fault filter abort
```

This is expected. The Gateway matched the request but no HTTPRoute matched the Host header. It's not an error — it confirms the Gateway is working.
### Test with the correct Host header

```shell
curl -H "Host: myapp.dev.example.io" http://<GATEWAY-IP>
# Should return your app's HTML
```
### DNS

Create an A record:

```
myapp.dev.example.io → <GATEWAY-IP>
```
### HTTPS

```shell
curl -vk https://myapp.dev.example.io
```
### Local pod test

```shell
kubectl port-forward svc/analytics-web 8080:3000 -n my-namespace
curl http://localhost:8080
```
## 6. ArgoCD Cleanup

Orphaned HTTPRoutes from previous deploys will cause ArgoCD to show Degraded even when the app is healthy. Clean them up:

```shell
kubectl delete httproute old-frontend-route -n my-namespace
kubectl delete httproute old-search-route -n my-namespace
```

Or enable automated pruning in your ArgoCD Application. Note that `PruneLast=true` only controls prune ordering; it is `automated.prune: true` that actually deletes orphaned resources:

```yaml
syncPolicy:
  automated:
    prune: true
  syncOptions:
    - PruneLast=true
```
## 7. Quick Debug Reference

```shell
# Gateway status
kubectl describe gateway -n my-namespace
kubectl get gateway -n my-namespace -w

# HTTPRoutes
kubectl describe httproute -n my-namespace

# Controller (no pods = managed mode, that's normal)
kubectl get pods -A | grep -i gateway
kubectl get gatewayclass
kubectl get crd | grep gateway

# Cluster config
gcloud container clusters describe my-gke-cluster --region us-east1

# Events
kubectl get events -n my-namespace --sort-by=.lastTimestamp
```
## 8. Final State
| Component | Status |
|---|---|
| GatewayClass | ✅ Accepted |
| Gateway | ✅ Programmed |
| HTTPRoute | ✅ Healthy |
| Services | ✅ Resolved |
| Load Balancer | ✅ Active |
| Pods | ✅ Running |
| ArgoCD | ✅ Synced |
## 9. Key Lessons

- **Fleet Ingress ≠ Gateway controller ready.** You need `--gateway-api=standard` separately on the cluster.
- **No release channel = no controller attachment.** Add the cluster to `regular` or `stable`.
- **CertificateMaps are global only.** Never use them with `gke-l7-rilb` or any regional GatewayClass.
- **Helm string concatenation silently breaks GCP resource names.** Always `helm template` and inspect the rendered output before applying.
- **HTTPRoute `backendRefs` need exact Service names.** Your Helm value names and your Kubernetes Service names are not the same thing.
- **404 fault filter abort = hostname mismatch, not a broken Gateway.** Check your Host header.
- **No gateway pods in GKE managed mode is normal.** The controller is cloud-managed.
- **Dataplane V2 can't be enabled on an existing cluster.** You'd need to recreate it — and it's not required for Gateway API anyway.
## 10. Improvements Worth Making

**Wildcard hostname** — Reduces per-service HTTPRoute config significantly:

```yaml
spec:
  hostnames:
    - "*.dev.example.io"
```
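A single wildcard route can then fan out by path instead of needing one route per host. The path-to-backend mapping below is illustrative, not this app's actual layout:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: wildcard-route
  namespace: my-namespace
spec:
  parentRefs:
    - name: my-https-gateway
  hostnames:
    - "*.dev.example.io"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /search
      backendRefs:
        - name: search
          port: 80
    - backendRefs:          # default rule for everything else
        - name: frontend
          port: 80
```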
**CI validation** — Add a pre-deploy check that verifies every `backendRef` name exists as a live Service in the target namespace. This eliminates the BackendNotFound class of errors before they hit the cluster.
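A minimal sketch of that check. The two name lists are passed in as plain strings so the comparison logic stands alone; in a real pipeline you would populate them from the rendered chart and from `kubectl get svc`:

```shell
#!/bin/sh
# CI sketch: report backendRef names with no matching Service.
# Both arguments are whitespace-separated name lists.
missing_backends() {
  refs=$1
  services=$2
  missing=""
  for ref in $refs; do
    found=no
    for svc in $services; do
      [ "$ref" = "$svc" ] && found=yes
    done
    [ "$found" = "no" ] && missing="$missing $ref"
  done
  # trim the leading space before printing
  echo "${missing# }"
}

missing_backends "frontend search frontend-service" "frontend search dashboard analytics-web"
```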
**Helm schema guards** — Use `values.schema.json` to validate that your `certMap` value doesn't already end with the suffix your template appends. Catches double-suffix bugs at `helm lint` time.
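A sketch of what that guard might look like, assuming the template still appends `-cert-map`. It uses `not` with `pattern` rather than a regex lookahead, since Helm's Go-based schema validation doesn't support lookaheads:

```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "gateway": {
      "type": "object",
      "properties": {
        "certMap": {
          "type": "string",
          "not": { "pattern": "-cert-map$" }
        }
      }
    }
  }
}
```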
**Unified naming** — Make your Helm release name, Kubernetes Service name, and HTTPRoute `backendRef` all derive from the same value. One source of truth, zero drift.
Hope this saves someone a few hours. The GKE docs cover each of these pieces individually but the interactions between them — especially Fleet Ingress vs cluster-level Gateway API enablement, and the CertificateMap GatewayClass constraint — aren't obvious until you hit them.