Some context first: we were running on GKE Autopilot, where the Gateway API just works out of the box. Google manages the CRDs and the underlying load balancer controller for you: create a Gateway, and it gets an external IP without you ever thinking about CRD lifecycle.
Moving that same ingress layer to EKS meant none of that was ready to use anymore. The first real decision wasn't about Envoy Gateway's configuration at all; it was about how to install its CRDs without them colliding with the Gateway API CRDs, or with each other, during future migrations.
Installing the Gateway API CRDs
We start by installing the Gateway API CRDs:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml
What this installs:
- GatewayClass CRD
- Gateway CRD
- HTTPRoute CRD
- ReferenceGrant CRD
Verify with:
kubectl get crd | grep gateway.networking.k8s.io
Installing Envoy Gateway's CRDs and Controller via ArgoCD
Next, install Envoy Gateway's own CRDs and the controller itself as two separate ArgoCD Applications, on two separate sync waves:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: envoy-gateway-crds
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  project: default
  source:
    repoURL: oci://docker.io/envoyproxy
    chart: gateway-crds-helm
    targetRevision: v1.8.0
    helm:
      values: |
        crds:
          gatewayAPI:
            enabled: false  # Gateway API CRDs managed separately
            channel: experimental
          envoyGateway:
            enabled: true  # Only Envoy-specific CRDs
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
      - CreateNamespace=false
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: envoy-gateway
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  project: default
  source:
    repoURL: oci://docker.io/envoyproxy
    chart: gateway-helm
    targetRevision: v1.8.0
    helm:
      skipCrds: true  # CRDs managed by envoy-gateway-crds app
  destination:
    server: https://kubernetes.default.svc
    namespace: envoy-gateway-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
      - CreateNamespace=true
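If you're not using ArgoCD, the equivalent Helm commands are below. Note that --server-side needs a Helm version with server-side apply support (Helm 4; Helm 3 does not have this flag):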
helm install envoy-gateway-crds oci://docker.io/envoyproxy/gateway-crds-helm \
  --version v1.8.0 \
  --namespace default \
  --server-side \
  --set crds.gatewayAPI.enabled=false \
  --set crds.gatewayAPI.channel=experimental \
  --set crds.envoyGateway.enabled=true

helm install envoy-gateway oci://docker.io/envoyproxy/gateway-helm \
  --version v1.8.0 \
  --namespace envoy-gateway-system \
  --create-namespace \
  --skip-crds \
  --server-side
A few things matter here:
- gatewayAPI.enabled: false: the shared Gateway API CRDs (GatewayClass, Gateway, HTTPRoute, ReferenceGrant) aren't installed by this chart. They're installed once, separately, by their own Application, independent of any controller.
- envoyGateway.enabled: true: this chart installs only Envoy Gateway's own CRDs, including EnvoyProxy, on sync-wave -1, before the controller exists.
- skipCrds: true on the envoy-gateway chart (wave 0): the controller deployment goes in after its CRDs already exist, and never touches CRD lifecycle itself.
- ServerSideApply=true on both: field-level ownership instead of whole-object ownership, so multiple Applications can touch overlapping CRDs without one overwriting the other.
Both Applications are templated as part of our cluster-bootstrap ApplicationSet, so every environment gets Envoy Gateway's CRDs-then-controller order automatically, with no manual sequencing per cluster for this part of the stack.
Architecture at a Glance
Internet
│
▼
AWS NLB (provisioned by AWS Load Balancer Controller)
│
▼
Envoy Proxy Pods (managed by Envoy Gateway, autoscaled by HPA)
│
▼
Application Services (via HTTPRoute rules)
The LoadBalancer Pending Trap
With the CRDs and controller in place, the next thing that breaks on a fresh EKS setup is the Service Envoy Gateway generates for its proxy deployment. By default, it's type LoadBalancer, and Kubernetes' in-tree cloud controller tries to provision a Classic Load Balancer for it.
On modern EKS clusters, that fails silently. No CLB gets created, no useful error appears in events, and the Service just sits at <pending> indefinitely.
The fix doesn't live on the Gateway object: Envoy Gateway generates its own Service internally, so there's nothing on Gateway.metadata to annotate. The fix has to go into an EnvoyProxy resource (its CRD is the one installed separately in the -1 sync wave above), under spec.provider.kubernetes.envoyService.annotations:
envoyService:
  annotations:
    # Stops the in-tree CLB provisioner - AWS Load Balancer Controller
    # takes over and creates an NLB instead.
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    # Public-facing NLB. Use "internal" for private traffic only.
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    # Route NLB traffic directly to Pod IPs via VPC CNI -
    # bypasses kube-proxy and NodePort.
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    # NLB health check hits Envoy's admin port. /healthz returns 200
    # only once Envoy is fully ready, so the NLB never routes to a
    # pod that's still starting or draining.
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "19002"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/healthz"
The single annotation that actually breaks the deadlock is aws-load-balancer-type: "external". It tells the in-tree controller to back off and hands the Service to the AWS Load Balancer Controller, which then provisions a real NLB; the NLB's hostname then surfaces in gateway.status.addresses. The rest of the block (scheme, target type, health check) is what makes that NLB actually production-ready rather than just "not pending."
Putting It Together: GatewayClass, Gateway, and EnvoyProxy
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
  namespace: gateway
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: external-proxy-config
  listeners:
    - name: http
      protocol: HTTP
      port: 80
And the EnvoyProxy resource that ties resources, autoscaling, and the LB fix together in one place:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: external-proxy-config
  namespace: gateway
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        patch:
          type: StrategicMerge
          value:
            spec:
              template:
                spec:
                  containers:
                    - name: shutdown-manager
                      lifecycle:
                        preStop:
                          exec:
                            command: ["/bin/sh", "-c", "sleep 120"]
        container:
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 1Gi
      envoyHpa:
        minReplicas: 1
        maxReplicas: 5
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60
      envoyService:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "external"
          service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "19002"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/healthz"
Applying the Gateway triggers the chain reaction: Envoy Gateway creates the proxy deployment, the LoadBalancer Service with the annotations above, and an HPA — then AWS LBC provisions the NLB. HTTPRoute objects attach to the Gateway afterward and define per-service routing, owned by app teams.
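As a minimal sketch of what an app team's route might look like, the HTTPRoute below attaches to the http listener of external-gateway. The route name, hostname, backend Service, and port are hypothetical placeholders, not taken from our setup. It sits in the gateway namespace because the listener above sets no allowedRoutes, and the Gateway API default only admits routes from the Gateway's own namespace.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo-app               # hypothetical route name
  namespace: gateway           # same namespace as the Gateway (default allowedRoutes: Same)
spec:
  parentRefs:
    - name: external-gateway   # the Gateway defined above
      sectionName: http        # the listener name on that Gateway
  hostnames:
    - "demo.example.com"       # hypothetical hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: demo-app       # hypothetical Service in the same namespace
          port: 8080           # hypothetical Service port
```

If routes need to live in the app teams' own namespaces, the listener would need an allowedRoutes setting that admits them, and a backendRef pointing at a Service in another namespace would additionally need a ReferenceGrant in that Service's namespace.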
Autoscaling Explained
The envoyHpa block creates a standard Kubernetes HPA against the proxy deployment. minReplicas: 1 keeps cost down during idle periods, but leaves no redundancy: if that pod dies, expect roughly 15-30s without a healthy proxy while a replacement starts. averageUtilization: 60 (150m of the 250m request) triggers scale-out early enough that new pods are healthy before latency degrades. For zero-downtime guarantees, minReplicas: 2 or 3 is the move.
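To make the numbers concrete, here is the arithmetic behind the 60% target, using the replica formula from the Kubernetes HPA documentation. The 90% utilization figure is an illustrative assumption, not a measurement from our cluster.

```latex
\begin{aligned}
\text{scale-out threshold} &= 0.60 \times 250\,\text{m} = 150\,\text{m CPU per pod} \\
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil \\
&= \left\lceil 1 \times \frac{90}{60} \right\rceil = \lceil 1.5 \rceil = 2
\end{aligned}
```

So a single pod averaging 90% of its CPU request would be scaled to two pods, and whatever the utilization, the result is capped by maxReplicas: 5.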
Final Thoughts
Running Envoy Gateway on Amazon EKS isn't just about deploying another ingress controller — it's about understanding where the responsibilities are split.
Unlike managed Kubernetes offerings where the Gateway API experience is largely invisible, EKS gives you the flexibility to control every layer. That also means you own the lifecycle of the Gateway API CRDs, the Envoy Gateway CRDs, the controller installation, and the integration with the AWS Load Balancer Controller.
Separating CRDs from the controller, using ArgoCD sync waves to guarantee deployment order, and configuring the EnvoyProxy resource as the single place for infrastructure concerns makes the setup predictable and GitOps-friendly. It also avoids one of the most common migration issues: LoadBalancer Services remaining in a perpetual Pending state because the wrong controller is trying to provision them.