DEV Community

Amit Malhotra

GKE Gateway API: Why Ingress Is Technical Debt in 2025

Kubernetes Ingress was a reasonable abstraction when it shipped. Simple HTTP routing, basic path matching, TLS termination. It solved the problem of getting traffic into a cluster without manual Service LoadBalancer management.

That was 2015. The problem is that most teams still treat Ingress as the default choice in 2025, even when building net-new GKE platforms. They're not making a deliberate architecture decision — they're following muscle memory. And that muscle memory is costing them.

The Annotation Sprawl Problem

I've reviewed production GKE clusters with 40+ annotations on a single Ingress resource. Teams trying to approximate canary deployments, header-based routing, connection draining behaviour, and custom health check configurations — all through annotations that vary by Ingress controller and break silently during upgrades.

The worst part isn't the complexity. It's the brittleness.

Last year I worked with a SaaS platform team in Toronto running the NGINX Ingress Controller on GKE. They had a canary deployment setup using weight annotations. During a routine controller upgrade, the weights reset. Not to 50/50 — to 100/0. All traffic shifted to the canary build. The incident took 40 minutes to detect because their monitoring was checking pod health, not traffic distribution.

This isn't an edge case. Ingress was designed for simple HTTP routing. Everything beyond that is controller-specific behaviour layered on through annotations with no guaranteed stability across versions.

Gateway API Is the Successor — And It's Production-Ready on GKE

The Gateway API isn't experimental anymore. On GKE, it's backed by Google Cloud's global external Application Load Balancer as the data plane. No nginx controller VMs. No HAProxy sidecars. Native GCP infrastructure with the reliability and scaling characteristics teams already trust for their other GCP workloads.

The architecture is role-oriented by design:

  • Gateway resources define infrastructure: which ports, which protocols, which TLS configuration. Infrastructure teams own these.
  • HTTPRoute resources define application routing: which paths, which headers, which backend services. Application teams own these.
  • ReferenceGrant resources control cross-namespace access: explicit permission for one namespace to reference resources in another.

This separation matters. In Ingress, the application team and the platform team both edit the same resource. That creates merge conflicts, permission sprawl, and change management overhead. Gateway API's role separation aligns with how mature platform teams actually operate.
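As a minimal sketch of the third piece (all namespace and resource names here are illustrative, not from a real deployment), a ReferenceGrant that lets HTTPRoutes in an application team's namespace reference Services owned by a shared backend namespace looks like this:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-backends
  namespace: shared-backends   # lives in the namespace that OWNS the referenced Services
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: app-team        # the namespace being granted access
  to:
  - group: ""                  # empty string = core API group
    kind: Service
```

The grant is created by the owning namespace, not the requesting one — which is exactly the auditable, explicit-consent model the role separation is meant to enforce.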

What Gateway API Handles Natively

Traffic splitting for canary deployments is built into HTTPRoute:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route              # illustrative name
spec:
  parentRefs:
  - name: external-gateway     # the Gateway this route attaches to
  rules:
  - backendRefs:
    - name: app-stable
      port: 80
      weight: 90
    - name: app-canary
      port: 80
      weight: 10

No annotations. No third-party tooling. The weights are explicit in the resource spec, validated by the API server, and implemented by the GCP load balancer.
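Header-based routing is just as declarative. A sketch (service and header names are illustrative) that sends requests carrying an `x-beta: "true"` header to the canary backend while everything else stays on stable:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: beta-route             # illustrative name
spec:
  parentRefs:
  - name: external-gateway
  rules:
  - matches:
    - headers:
      - name: x-beta           # match type defaults to Exact
        value: "true"
    backendRefs:
    - name: app-canary
      port: 80
  - backendRefs:               # catch-all rule for all other traffic
    - name: app-stable
      port: 80
```

With Ingress, this pattern typically requires controller-specific annotations or a second Ingress resource; here it's two rules in one validated spec.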

Certificate Manager integration is equally clean. You define a Gateway with a reference to a CertificateMap, and GKE handles the TLS termination at the load balancer level:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      options:
        networking.gke.io/cert-map: projects/PROJECT/locations/global/certificateMaps/cert-map

I've seen teams running Ingress with manual cert rotation scripts — cron jobs copying secrets between namespaces, custom operators watching for expiry. Certificate Manager with Gateway API eliminates that operational burden.

GKE also automatically creates Network Endpoint Groups (NEGs) for Gateway API backends. This enables pod-level health checking instead of node-level, which means faster failover and more accurate load balancing. With Ingress, NEG mode is possible but requires additional annotations and careful configuration.

The Business Case for Migration

Teams still running Ingress in production are carrying hidden costs:

Engineering velocity: Every routing change requires understanding controller-specific annotation behaviour. New engineers spend weeks learning the tribal knowledge of "which annotations actually work."

Operational risk: Ingress controller upgrades can silently change routing behaviour. I've seen weight annotations ignored, header matching break, and connection draining stop working — all without API validation errors.

Cloud cost: Running nginx or HAProxy ingress controllers on GKE means paying for controller pods that duplicate what GCP's load balancer already provides. On clusters with high traffic, this adds up.

Audit readiness: Gateway API's ReferenceGrant resources provide explicit, auditable cross-namespace permissions. With Ingress, cross-namespace routing often requires broad RBAC permissions that auditors question during SOC 2 reviews.

This is where the Automation and Lifecycle Operations principles from the SCALE framework apply directly. Infrastructure that requires manual intervention to change routing behaviour doesn't scale with the team. Gateway API's declarative model enables GitOps workflows where routing changes go through the same PR review process as application code.

Trade-offs to Consider

Gateway API isn't a drop-in replacement for Ingress. The migration requires planning:

Learning curve: The three-resource model (Gateway, HTTPRoute, ReferenceGrant) is more complex than a single Ingress resource. Teams unfamiliar with role-based separation need time to understand the boundaries.

GatewayClassName constraints: GKE-managed Gateway classes support specific load balancer types. If your architecture requires regional internal load balancing or TCP/UDP passthrough, verify gatewayClassName compatibility before designing.
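For instance, a regional internal Application Load Balancer uses a different GatewayClass than the global external one shown earlier. A sketch (verify the class name against the Gateway classes available in your GKE version with `kubectl get gatewayclass`):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: internal-gateway       # illustrative name
spec:
  gatewayClassName: gke-l7-rilb   # regional internal Application Load Balancer
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```

Note that internal Application Load Balancers also require a proxy-only subnet in the cluster's VPC region — an infrastructure prerequisite that lives outside the manifest, which is exactly the kind of compatibility check to do before designing.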

Traffic cutover: Migrating from Ingress to Gateway API means changing the load balancer. DNS cutover or traffic shifting is required — you can't run both on the same IP address.

Controller ecosystem: Some teams have invested in Ingress controller features that don't have Gateway API equivalents yet. Rate limiting, request transformation, and custom authentication plugins may require additional work.

For teams with stable Ingress deployments that aren't adding new routing complexity, the migration may not be urgent. But for teams adding canary deployments, multi-domain TLS, cross-namespace routing, or header-based traffic splitting, Gateway API is the cleaner path.

The Decision Point

If you're building a new GKE platform in 2025, start with Gateway API. There's no technical reason to choose Ingress for new deployments — you're just accumulating migration work for later.

If you're running Ingress in production, the question is timing. Every annotation you add is another integration point that makes migration harder. Every controller upgrade is a risk window for routing behaviour changes.

Ingress will get you to production-grade traffic management eventually, with enough annotations and operational discipline. Gateway API gets you there cleanly, with infrastructure that matches how GCP actually works.

The annotation sprawl isn't a feature. It's a warning sign that you've outgrown the abstraction.


Work with a GCP specialist — book a free discovery call

Amit Malhotra

Principal GCP Architect, Buoyant Cloud Inc


Work with a GCP specialist — book a free discovery call: https://buoyantcloudtech.com
