Amazon EKS Hybrid Nodes Gateway Deep Dive — VXLAN, Cilium VTEP, and Lease-Based Leader Election Redefining Hybrid Kubernetes Networking in 2026
On April 28, 2026, alongside the OpenAI–Bedrock partnership announcement at the What's Next with AWS 2026 keynote, AWS quietly delivered one of the most operationally meaningful container updates of the year: the general availability of the Amazon EKS Hybrid Nodes gateway. Since EKS Hybrid Nodes first shipped at re:Invent 2024, the single biggest operational tax on adopters has been a deceptively simple question: "How do we make on-premises pod CIDRs routable from the VPC?" That tax disappears with one Helm install. This post walks through the four-axis architecture of the new gateway, the AWS-maintained Cilium build with the CiliumVTEPConfig CRD, the VXLAN tunnel (VNI 2 / UDP 8472), the Kubernetes Lease-based leader election (3–5s failover), the automated VPC route table synchronization, IAM and CIDR design rules, and the parallel ECS announcements (EC2 Capacity Reservations integration and NLB Canary deployments) that landed in the same week — all from a hybrid-cluster operator's point of view.
1. Why the Gateway Is an Inflection Point — From 2024 GA to 2026 Gateway
EKS Hybrid Nodes went GA at re:Invent 2024 with a clear promise: "Manage cloud EC2 workers and on-prem bare metal under a single EKS control plane." The first year of adoption was rougher than the marketing implied. The biggest hurdle was networking. The AWS VPC CNI is incompatible with hybrid nodes, so operators had to deploy Cilium or Calico — and to make control-plane-to-webhook, EC2-pod-to-on-prem-pod, and ALB/NLB-to-on-prem-pod traffic work, they had to explicitly register on-prem pod CIDRs in VPC route tables, Transit Gateways, and Virtual Private Gateways.
That work created three persistent operational debts. First, every change to on-prem pod CIDRs required a coordinated change across VPC route tables, TGW, and VGW. Second, in some enterprises (finance, public sector) the routing policy simply forbids exposing pod CIDRs externally — which blocked EKS Hybrid Nodes adoption entirely. Third, BGP-level pod traffic exposure added a new monitoring and operational surface for IDC network teams.
The EKS Hybrid Nodes gateway released on April 28, 2026 pays down all three debts at once. The core decision is to stop trying to make pod CIDRs routable; instead, the traffic is encapsulated with VXLAN inside the VPC and carried to the hybrid nodes as opaque payloads. The VPC no longer needs to know that on-prem pod CIDRs exist. It only needs to know one thing: "send traffic destined for these pod CIDRs to the gateway EC2 ENI."
| Aspect | 2024 EKS Hybrid Nodes (BGP model) | 2026 Hybrid Nodes Gateway (VXLAN model) |
|---|---|---|
| On-prem pod CIDR exposure | Must register in VPC, TGW, VGW | Not required (VXLAN encapsulation) |
| VPC route table management | Manual or IaC-driven changes | Auto-synced by gateway |
| On-prem ↔ AWS routing | Requires BGP peering | UDP 8472 inbound/outbound is sufficient |
| Control plane → webhook | Routed via VGW/TGW | Encapsulated via gateway ENI |
| EC2 pod ↔ on-prem pod | Depends on BGP routing | Direct VXLAN tunnel |
| ALB/NLB → on-prem pod | Previously unsupported | Native, at no additional charge |
| Failover time | BGP reconvergence (tens of seconds) | Lease-based leader election: 3–5 seconds |
| Pricing | EKS Hybrid Nodes per-vCPU-hour | Gateway itself: no extra charge (only standard EC2/EKS fees) |
One-line summary: hybrid Kubernetes is finally free of BGP governance.
2. The Four-Axis Architecture — VPC, Gateway, VXLAN, On-Prem
The gateway can be reasoned about as four axes. Splitting responsibilities along these lines makes security design, observability, and troubleshooting fall into place at once.
2.1 Axis 1: VPC Route Table — "Send pod-CIDR traffic to the gateway ENI"
At install time the operator provides a list of VPC route table IDs. The gateway controller pod automatically inserts entries that point on-prem pod CIDRs at the active gateway pod's ENI. The gateway pod writes these entries itself, calling the EC2 APIs directly under its own IAM role — no IaC is involved.
# Auto-inserted route table entries (example)
Destination Target Status
10.200.0.0/16 eni-0abc1234... (active gateway pod) active
10.201.0.0/16 eni-0abc1234... active
# On failover the ENI target is updated to the new active ENI.
# On clean shutdown the routes are removed automatically.
Thanks to those routes, EC2 pods, ALBs/NLBs, and the EKS control plane ENIs (used for webhook calls) inside the VPC all have a deterministic path to the gateway whenever they target an on-prem pod IP.
2.2 Axis 2: Gateway EC2 — Active/Standby with Lease-Based Leader Election
The gateway is a Deployment of two pods. Both land on EC2 nodes labeled for gateway use (a dedicated node pool, managed node group, or self-managed nodes). Leader election uses a Kubernetes Lease object to decide which pod actively forwards traffic. Both Active and Standby create the hybrid_vxlan0 VXLAN interface at startup and run a node reconciler that watches CiliumNode CRs. Because both the VXLAN interface and the reconciler are pre-warmed on Standby, failover completes within 3–5 seconds when the Active pod dies.
Two operational implications follow. First, place the two gateway EC2 instances in different AZs. Second, choose a network-bandwidth-rich instance family (m6i/m7i large or above) — the gateway is the single path for all VPC-to-on-prem pod traffic.
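For reference, the Lease behind this election is a plain coordination.k8s.io object; the sketch below shows what the holder record might look like. The object name matches the verification step in section 4.4, while the duration and timestamps are illustrative assumptions, not documented values.
# Hypothetical Lease held by the active gateway pod (field values are illustrative)
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: eks-hybrid-gateway
  namespace: kube-system
spec:
  holderIdentity: eks-hybrid-gateway-79bc6f5c5d-abcde   # current leader pod
  leaseDurationSeconds: 4                               # illustrative; a short TTL is what enables the 3–5s failover
  renewTime: "2026-05-06T08:30:02.000000Z"
  leaseTransitions: 1                                   # increments on every failover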
2.3 Axis 3: VXLAN (VNI 2 / UDP 8472) — Cilium-Compatible by Default
VXLAN is implemented by the hybrid_vxlan0 interface inside the gateway EC2 instance. VNI 2 / UDP 8472 matches the Cilium default, so the gateway shares a data plane with the on-prem nodes' Cilium agents. When a new hybrid node registers, the gateway adds it as a remote VTEP, programming FDB entries, ARP entries, and routes on the VXLAN interface to complete the tunnel. Dynamic registration relies on the CiliumVTEPConfig CRD bundled with the AWS-maintained Cilium build — not present in upstream Cilium — which the gateway uses to register itself as the remote VTEP.
# CiliumVTEPConfig CR — auto-generated and managed by the gateway
apiVersion: cilium.io/v2alpha1
kind: CiliumVTEPConfig
metadata:
  name: eks-hybrid-gateway
  namespace: kube-system
spec:
  vtepEndpoints:
    - ip: 10.10.1.42            # active gateway ENI primary IP
      mac: "0a:1b:2c:3d:4e:5f"
      cidr: "10.0.0.0/16"       # VPC CIDR — encapsulate traffic for these pods
      vni: 2
      port: 8472                # UDP 8472, Cilium default
      mtu: 1450                 # 1500 minus 50 bytes of VXLAN encapsulation overhead
status:
  active: true
  lastReconciledAt: "2026-05-06T08:30:00Z"
2.4 Axis 4: On-Prem Nodes (Cilium VTEP Decoder)
Each on-prem node's Cilium agent reads the CiliumVTEPConfig and registers the gateway IP as a remote VTEP. When an encapsulated packet arrives, it strips the VXLAN header and routes inline to the destination pod. The reverse direction (on-prem pod → VPC) uses the same tunnel. Cilium unifies both directions into one data plane, so policies (NetworkPolicy / CiliumNetworkPolicy) apply consistently across the cluster.
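If you need to confirm the decode path on an on-prem node, standard Linux tooling is enough. A minimal sketch, assuming Cilium's default VXLAN device name (cilium_vxlan); adjust the device name and gateway IP to your environment:
# Show the VXLAN device Cilium created (device name may differ per configuration)
ip -d link show cilium_vxlan
# Forwarding-database entries: the gateway should appear as a remote VTEP destination
bridge fdb show dev cilium_vxlan
# Neighbor entries programmed for the tunnel (the gateway's inner MAC)
ip neigh show dev cilium_vxlan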
3. Prerequisites — IAM, Security Groups, CIDR Design
Three pre-flight tasks must be completed before deployment. Skipping any one of them either blocks the gateway from updating the VPC routes or yields a cluster where packets reach the on-prem side but never come back.
3.1 Gateway IAM Permissions — Scoped via IRSA
The gateway pod must update its own node's ENI-pointing routes, which requires EC2 permissions. Bind the following policy via IRSA (IAM Roles for Service Accounts).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DescribeRouteTables",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeRouteTables",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ManagePodCidrRoutes",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateRoute",
        "ec2:ReplaceRoute",
        "ec2:DeleteRoute"
      ],
      "Resource": [
        "arn:aws:ec2:ap-northeast-2:123456789012:route-table/rtb-aaaa1111",
        "arn:aws:ec2:ap-northeast-2:123456789012:route-table/rtb-bbbb2222"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/eks-hybrid-nodes-gateway": "true"
        }
      }
    }
  ]
}
Design note: never use a wildcard for Resource. Pin the route-table ARNs the gateway is allowed to manage, and add a tag-based Condition so that only route tables explicitly tagged eks-hybrid-nodes-gateway=true are eligible. This contains the blast radius if anything goes wrong.
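One way to wire that policy up via IRSA is eksctl's iamserviceaccount helper. A sketch under the assumption that the policy above was saved as a customer-managed policy and that the service-account name matches the chart defaults used in section 4.3; the cluster name, policy ARN, and role name are placeholders:
# Create the IRSA role and bind it to the gateway service account
eksctl create iamserviceaccount \
  --cluster prod-hybrid \
  --namespace kube-system \
  --name eks-hybrid-gateway \
  --attach-policy-arn arn:aws:iam::123456789012:policy/eks-hybrid-gateway-routes \
  --role-name eks-hybrid-gateway \
  --approve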
3.2 Security Groups and On-Prem Firewalls
VXLAN needs a single bidirectional UDP port (8472) open in both places.
| Where | Direction | Protocol/Port | Source/Destination |
|---|---|---|---|
| Gateway EC2 SG | Inbound | UDP 8472 | On-prem node IP CIDR |
| Gateway EC2 SG | Outbound | UDP 8472 | On-prem node IP CIDR |
| On-prem firewall | Inbound/Outbound | UDP 8472 | Gateway EC2 ENI IP range |
| EKS cluster SG | Inbound | TCP 443 / 10250 | On-prem node IP CIDR (kubelet ↔ API) |
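Opening the VXLAN port on the gateway security group is a single CLI call; the mirror-image rule belongs on the on-prem firewall. The group ID and CIDR below are placeholders:
# Allow VXLAN (UDP 8472) from the on-prem node range into the gateway SG
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol udp \
  --port 8472 \
  --cidr 192.168.10.0/24
# Outbound UDP 8472 is covered by the default allow-all egress rule;
# add an explicit egress rule only if your SG restricts outbound traffic.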
3.3 CIDR Design — RFC-1918 / RFC-6598, Non-Overlapping
On-prem node and pod CIDRs must come from one of these ranges:
- RFC-1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- RFC-6598 (CGNAT): 100.64.0.0/10
And they must not overlap with any of:
- The VPC CIDR (e.g., 10.0.0.0/16)
- The Kubernetes service CIDR (e.g., 10.100.0.0/16)
- Each other (on-prem node CIDR ↔ on-prem pod CIDR)
- Any peered or TGW-routed CIDR
Practical recommendation: pick 100.64.0.0/16 or 100.65.0.0/16 from RFC-6598 for on-prem pod CIDRs. They almost never collide with internal RFC-1918 corporate ranges, and since the VPC never has to know about them, you get extra freedom.
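Before locking in a pod CIDR, it helps to dump every CIDR the VPC side already knows about and check it against the candidate range. A minimal sketch; the TGW route-table ID is a placeholder and the last step only applies if a Transit Gateway is attached:
# CIDRs attached to your VPCs
aws ec2 describe-vpcs --query "Vpcs[].CidrBlockAssociationSet[].CidrBlock"
# Every destination already present in the VPC route tables (peering, VGW, etc.)
aws ec2 describe-route-tables --query "RouteTables[].Routes[].DestinationCidrBlock"
# Active routes in a Transit Gateway route table, if one is attached
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-0abc1234 \
  --filters "Name=state,Values=active" \
  --query "Routes[].DestinationCidrBlock"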
4. Deployment — One Helm Install, 30-Second Failover Test
The gateway ships as a Helm chart alongside the AWS-maintained Cilium build. Deployment is four steps.
4.1 Provision the Gateway Node Pool
# eksctl: dedicated gateway node group across two AZs
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-hybrid
  region: ap-northeast-2
  version: "1.32"
managedNodeGroups:
  - name: gateway-pool
    instanceType: m7i.large        # generous network bandwidth
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    availabilityZones: ["ap-northeast-2a", "ap-northeast-2c"]
    labels:
      role: hybrid-gateway
    taints:
      - key: hybrid-gateway
        value: "true"
        effect: NoSchedule
    privateNetworking: true
4.2 Install the AWS-Maintained Cilium
helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install cilium eks/cilium-eks \
--namespace kube-system \
--version 1.16.7-eks.1 \
--set kubeProxyReplacement=true \
--set tunnelProtocol=vxlan \
--set ipv4NativeRoutingCIDR=100.64.0.0/16 \
--set hybridNodes.enabled=true
4.3 Install the Gateway Components + Bind IRSA
helm upgrade --install eks-hybrid-gateway eks/eks-hybrid-nodes-gateway \
--namespace kube-system \
--version 1.0.0 \
--set serviceAccount.create=true \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/eks-hybrid-gateway \
--set nodeSelector.role=hybrid-gateway \
--set tolerations[0].key=hybrid-gateway \
--set tolerations[0].operator=Equal \
--set tolerations[0].value=true \
--set tolerations[0].effect=NoSchedule \
--set vpcRouteTableIds="{rtb-aaaa1111,rtb-bbbb2222}" \
--set podAntiAffinity.topologyKey=topology.kubernetes.io/zone
4.4 Verify — 30-Second Failover Test
# 1) Identify the leader gateway pod
kubectl -n kube-system get lease eks-hybrid-gateway -o yaml | grep holderIdentity
# 2) Confirm the auto-inserted VPC route entries
aws ec2 describe-route-tables \
--route-table-ids rtb-aaaa1111 \
--query "RouteTables[].Routes[?DestinationCidrBlock=='100.64.0.0/16']"
# 3) Reach an on-prem pod from a VPC pod
kubectl run curl --rm -it --image=curlimages/curl -- \
curl -fsS http://100.64.0.50:8080/healthz
# 4) Failover test — kill the active pod
kubectl -n kube-system delete pod eks-hybrid-gateway-79bc6f5c5d-abcde
# Standby becomes leader within 3–5 seconds; ENI target on the VPC route flips automatically.
5. Traffic Matrix — Four Flows You Must Recognize
Once installed, the cluster carries traffic in four distinct patterns. Recognizing each pattern keeps policy, observability, and cost design coherent.
| Source → Destination | Path | Encapsulation | Use Case |
|---|---|---|---|
| EKS control plane → on-prem pod (Webhook) | EKS ENI → VPC route → gateway ENI → VXLAN → on-prem pod | VXLAN | Admission webhook, aggregated APIServer |
| VPC EC2 pod → on-prem pod | EC2 ENI → VPC route → gateway ENI → VXLAN → on-prem pod | VXLAN | Microservice-to-microservice calls |
| On-prem pod → VPC EC2 pod | On-prem Cilium → VXLAN → gateway ENI → EC2 pod | VXLAN (reverse) | IDC workload calling cache/services next to RDS |
| ALB/NLB → on-prem pod | ALB/NLB → gateway ENI → VXLAN → on-prem pod | VXLAN | External user traffic reaching IDC GPU/licensed workloads |
The fourth flow (ALB/NLB → on-prem pod) is the highest-impact change in 2026. Previously, sending external user traffic to an IDC workload meant placing a separate ALB/NLB inside the IDC or detouring via Direct Connect; now a single ALB can target both EC2 pods and IDC pods in the same target group. That unlocks two important workload classes — AI inference tied to GPU licenses anchored in the IDC, and compliance-bound data-processing that cannot leave the IDC — by letting them participate in the cloud-native ingress flow without extra infrastructure.
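In Kubernetes terms the fourth flow is just an ordinary Ingress with IP-mode targets. A minimal sketch, assuming the AWS Load Balancer Controller is installed and that, with the gateway in place, on-prem pod IPs backing the Service are registered in the target group the same way as EC2 pod IPs; the service name, namespace, and port are placeholders:
# Ingress sketch: ALB with pod-IP targets spanning EC2 and on-prem pods
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gpu-inference
  namespace: ml
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # register pod IPs directly in the target group
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /infer
            pathType: Prefix
            backend:
              service:
                name: gpu-inference
                port:
                  number: 8080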
6. Operations — Metrics, Failover Scenarios, and Five Common Pitfalls
6.1 Metrics and Logs
The gateway exposes Prometheus metrics. The standard pattern is to scrape them with the OpenTelemetry Collector and ship to CloudWatch or Grafana.
# Prometheus scrape example
- job_name: eks-hybrid-gateway
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names: [kube-system]
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: eks-hybrid-gateway
  metrics_path: /metrics
# Key metrics:
#   eks_hybrid_gateway_lease_holder{pod}               1=leader, 0=standby
#   eks_hybrid_gateway_vtep_count                       number of registered remote VTEPs
#   eks_hybrid_gateway_vxlan_packets_total{dir}         tx/rx packet counts
#   eks_hybrid_gateway_route_reconcile_errors_total     VPC route update failures
6.2 Four Failover Scenarios
| Failure | Recovery Time | Operator Action |
|---|---|---|
| Active gateway pod crash | 3–5s | None — Lease flips automatically |
| Active node OS crash | 5–10s (Lease TTL + route update) | None — ASG replaces the node |
| VPC route update failure | Alert only; previous routes remain | Verify IAM Condition and route-table ARN scoping |
| UDP 8472 blocked between sites | Total cutoff | Inspect firewalls / VPN / Direct Connect ACLs |
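The third row above ("alert only") is worth automating. A sketch of a PrometheusRule built on the metrics from section 6.1, assuming the Prometheus Operator is installed; the alert names and thresholds are illustrative:
# PrometheusRule sketch for the two highest-severity gateway conditions
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: eks-hybrid-gateway-alerts
  namespace: kube-system
spec:
  groups:
    - name: hybrid-gateway
      rules:
        - alert: HybridGatewayRouteReconcileFailing
          expr: increase(eks_hybrid_gateway_route_reconcile_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: critical
        - alert: HybridGatewayNoLeader
          expr: sum(eks_hybrid_gateway_lease_holder) == 0
          for: 1m
          labels:
            severity: critical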
6.3 Five Common Pitfalls
- Pod CIDR collision — even one bit of overlap with the VPC CIDR breaks the routes. Run aws ec2 describe-vpcs and review every peering and TGW entry.
- MTU 1450 missing — VXLAN encapsulation eats 50 bytes, so leaving the MTU at 1500 yields fragmentation errors. Set tunnelMTU=1450 in the Cilium chart (see the MTU check sketch after this list).
- Unidirectional UDP 8472 — both the security group and the on-prem firewall must allow traffic in both directions. Half-open kills the handshake.
- IAM Resource wildcard — letting the gateway touch arbitrary route tables is a foot-gun. Pin ARNs and use a tag Condition.
- Missing anti-affinity — if both gateway pods land on the same node or AZ, failover loses meaning. Enforce topologyKey=topology.kubernetes.io/zone.
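The MTU check referenced above takes one command. A sketch assuming iputils ping on the source host and a placeholder on-prem pod IP: a 1,422-byte ICMP payload plus 28 bytes of ICMP/IP headers exactly fills the 1,450-byte path, so any fragmentation error means an MTU is still wrong somewhere.
# 1422-byte payload + 8-byte ICMP header + 20-byte IP header = 1450 bytes on the wire
# -M do sets the Don't Fragment bit so an undersized hop fails loudly instead of fragmenting
ping -c 3 -M do -s 1422 100.64.0.50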
7. The Same Week: ECS Capacity Reservations and NLB Canary
The What's Next with AWS 2026 keynote also delivered two meaningful ECS updates that hybrid-cluster operators should review in the same quarter.
7.1 ECS Managed Instances + EC2 Capacity Reservations
ECS Managed Instance capacity providers gained a new option, capacityOptionType=reserved, allowing tasks to consume previously reserved EC2 capacity. Three strategies are available.
| Strategy | Meaning | Recommended Use |
|---|---|---|
| reservations-only | Launch only into reserved capacity | License/GPU/BYOL workloads |
| reservations-first | Prefer reservations, fall back to on-demand | Predictable baseline plus spike handling |
| reservations-excluded | Never use reservations | Spot-heavy cost-optimized workloads |
# Bind a Capacity Reservation to an ECS capacity provider
aws ecs create-capacity-provider \
--name reserved-baseline \
--auto-scaling-group-provider 'autoScalingGroupArn=arn:aws:autoscaling:...' \
--managed-instances-provider '{
"capacityOptionType": "reserved",
"capacityReservationGroupArn": "arn:aws:resource-groups:ap-northeast-2:123456789012:group/cr-baseline",
"reservationStrategy": "reservations-first"
}'
7.2 ECS Linear/Canary Deployments on NLB
ECS already supported linear/canary on ALB; NLB was the gap. With the new release, NLB-backed services can shift traffic in fractional steps, integrated with CloudWatch alarms for automatic rollback when P99 latency or 5xx error rates breach thresholds.
# Apply NLB canary deployment policy to an ECS service
aws ecs update-service \
--cluster prod \
--service api-grpc \
--deployment-configuration '{
"deploymentCircuitBreaker": {"enable": true, "rollback": true},
"strategy": "CANARY",
"stepWeights": [10, 25, 50, 100],
"stepDuration": 300
}'
Latency-critical workloads that require NLB — gRPC services, financial matching engines — finally get the same deployment freedom that ALB users have had for years.
8. A 4-Week Migration Checklist
Here is a 4-week plan a hybrid-cluster operator can use to migrate an existing EKS 1.32 + IDC GPU-node deployment to the new gateway:
| Week | Work | Done When |
|---|---|---|
| Week 1 | CIDR redesign, IAM role, security groups, Direct Connect ACL review | On-prem pod CIDR fixed in non-overlapping RFC-6598 range; IAM Resource ARN/tag scoped |
| Week 2 | Deploy gateway pool + AWS-maintained Cilium + gateway chart in Dev | Lease holder, VTEP count, 3–5s failover verified; P99 RTT measured |
| Week 3 | Run all four traffic flows end-to-end in Staging | Bidirectional reachability across all flows; consistent NetworkPolicy enforcement |
| Week 4 | Prod cutover + retire BGP routing + apply ECS Capacity Reservations / NLB Canary | Manually-registered pod CIDRs removed from VPC route tables; cost & latency SLOs healthy |
8.1 Cost Model
The gateway itself is free. Total cost is driven by three things: (1) the EC2 cost of two gateway instances (or managed node group equivalents); (2) data-transfer fees through the gateway ENIs; (3) the existing EKS Hybrid Nodes per-vCPU-hour fee. As long as you reuse an existing Direct Connect / VPN circuit, the gateway cuts operational burden while adding only the cost of two EC2 instances.
8.2 Recommended SLOs
- VTEP registration latency: new hybrid node visible to the VTEP within 30 seconds (P95)
- VXLAN packet drop rate: below 0.001% per minute
- Gateway failover time: under 5 seconds (P99)
- VPC route update failures: zero (critical alert)
- ALB → on-prem pod RTT overhead: within +1–2 ms over Direct Connect baseline
9. Conclusion — Hybrid Kubernetes Has a New Default
As of May 2026, EKS Hybrid Nodes gateway settles two things. First, the default network model for hybrid Kubernetes is encapsulation, not BGP. Not exposing pod CIDRs externally shortens compliance and security review, and reduces IaC churn around route tables to zero. Second, ALB and NLB can now treat EC2 pods and IDC pods as members of the same target group — letting "external user → cloud → IDC GPU" flow through a single ingress for the first time. That is the thread that pulls license-bound and data-sovereignty-bound workloads back into a cloud-native posture without separate infrastructure.
For us at ManoIT, the next-quarter roadmap has three legs. First, walk Dev → Staging → Prod over four weeks and pin gateway metrics into the front row of our SRE dashboard. Second, shift equivalent ECS workloads to NLB Canary, wired to P99-latency-driven automatic rollback. Third, anchor GPU and licensed baselines on ECS Managed Instances + Capacity Reservations while keeping a Spot fallback strategy. When those three land, our container infrastructure becomes one platform with three faces — EKS, ECS, and hybrid — visible from a single operational surface.
This article was produced with assistance from AI (Claude) and reviewed for technical accuracy.
© 2026 ManoIT — www.manoit.co.kr
Originally published at ManoIT Tech Blog.