As Kubernetes workloads scale on AWS across multiple Availability Zones (AZs), managing inter-AZ traffic efficiently is crucial for performance and cost savings. AWS charges for data transferred between AZs, and Kubernetes’ standard networking can inadvertently increase this cross-zone traffic. Cilium, a modern, eBPF-powered networking and security solution, offers unique capabilities to reduce these costs while improving network visibility and control. This post combines clear explanations with official resources to give a comprehensive overview of how Cilium helps optimize cross-AZ traffic on AWS.
The Challenge of Cross-AZ Traffic on AWS
AWS bills for data transfer whenever network traffic crosses AZ boundaries within a region (cross-region transfer is billed as well, but this post focuses on the intra-region, inter-AZ case). Kubernetes Service types such as LoadBalancer or NodePort may distribute traffic across nodes in different AZs, increasing cross-zone data flow and charges. This is especially impactful at scale, where pod-to-pod communication patterns cause costly inter-AZ hops.
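To put rough numbers on it: in most regions AWS charges on the order of $0.01 per GB in each direction for inter-AZ transfer (check current pricing for your region and account). A platform moving 50 TB per month across zones would therefore pay about 50,000 GB × $0.02/GB = $1,000 per month for traffic that locality-aware routing could largely keep inside a single AZ.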
How Cilium Limits Cross-AZ Transfer Costs
Cilium employs the Linux kernel's eBPF technology to transform Kubernetes networking with efficiency and deep visibility. Its key features for reducing cross-AZ traffic include:
- Topology-Aware Routing: Cilium supports Kubernetes topology-aware service routing, keeping traffic within the same AZ whenever possible to avoid cross-zone charges. The mechanism builds on the topology.kubernetes.io/zone node label and per-Service topology annotations to guide traffic locality.
- ENI Mode Integration: Cilium's ENI IP Address Management (IPAM) mode assigns pod IPs directly to AWS Elastic Network Interfaces (ENIs) attached to nodes within the same AZ. In this setup, pod traffic routes natively through AWS networking without encapsulation, reducing latency and avoiding unnecessary cross-AZ data transfers.
- Advanced IPAM: Cilium offers IPAM modes such as ENI and ClusterPool, providing granular control over IP assignment and routing. These modes improve traffic locality by aligning pod IPs with the underlying AWS subnet allocation per AZ.
- Policy-Driven Traffic Control: With Cilium’s rich layer 3 to layer 7 network policies, you can enforce strict AZ-local communication rules or selectively allow cross-AZ traffic only when needed.
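As an illustration of the policy capability, the following CiliumNetworkPolicy is a minimal sketch (the app labels, port, and path are hypothetical placeholders): it admits only HTTP GETs to /api/ paths from frontend pods, so any other flow to the backend is dropped regardless of which zone it originates in.

```yaml
# Minimal layer 7 policy sketch; labels, port, and path are placeholders.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend-http
spec:
  endpointSelector:
    matchLabels:
      app: backend          # pods this policy protects
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # only frontend pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"   # only GETs to API paths are allowed
```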
Practical Cilium Setup on AWS EKS
Implementing Cilium on EKS involves options ranging from fully replacing the AWS VPC CNI to running alongside it in CNI chaining mode. To optimize cross-AZ traffic:
- Enable Topology-Aware Routing: Use Kubernetes service annotations paired with Cilium’s kube-proxy replacement to route traffic preferentially within the same AZ; a hedged Service sketch follows the diagram below.
+-------------------------------------------------------------+
| AWS Region (Multiple AZs) |
| |
| +------------------------+ +------------------------+ |
| | Availability Zone A | | Availability Zone B | |
| | | | | |
| | +------------------+ | | +------------------+ | |
| | | Node A1 | | | | Node B1 | | |
| | | Pod(s) A | | | | Pod(s) B | | |
| | +------------------+ | | +------------------+ | |
| | | | | | | |
| | | Service Traffic | | | Service Traffic | |
| | | goes within AZ | | | goes within AZ | |
| | v | | v | |
| | +------------------+ | | +------------------+ | |
| | | Pod(s) A (target)|<--+ | | Pod(s) B (target)|<--+ |
| | +------------------+ | | +------------------+ | |
| | | | | |
| +------------------------+ +------------------------+ |
| |
| Kubernetes Service |
| - Annotated with topology.kubernetes.io/zone |
| - Cilium replaces kube-proxy, respecting topology hints |
| - Routes client traffic preferentially within same AZ |
+-------------------------------------------------------------+
Legend:
- Service traffic stays within the same AZ
- If Pod targets exist in the same AZ, no cross-AZ routing occurs
- Traffic flows across AZs only if necessary (failover or no local endpoints)
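As a concrete sketch of this step (the Service name, selector, and ports below are placeholders): on Kubernetes 1.27+ the annotation is service.kubernetes.io/topology-mode, while older releases used service.kubernetes.io/topology-aware-hints.

```yaml
# Service sketch for topology-aware routing; metadata values are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: backend
  annotations:
    service.kubernetes.io/topology-mode: "Auto"  # ask Kubernetes to add zone hints to EndpointSlices
spec:
  selector:
    app: backend
  ports:
    - port: 80
      targetPort: 8080
```

On the Cilium side, the Helm value loadBalancer.serviceTopology=true (alongside kube-proxy replacement) instructs the eBPF load balancer to honor those zone hints; the exact option name may vary between releases, so verify it against the Cilium documentation linked below.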
- Deploy Cilium ENI Mode: This maps pod IPs to ENIs tied to the same-AZ subnet as the hosting node, enabling native AWS routing and cutting down on costly inter-AZ traffic; a Helm values sketch follows the diagram below.
+-----------------------------------------------------------+
| AWS Availability Zone A (us-west-2a) |
| |
| +-----------------+ +-----------------+ |
| | EC2 Node #1 | | EC2 Node #2 | |
| | | | | |
| | +-----------+ | | +-----------+ | |
| | | ENI eth0 |---|------| | ENI eth0 |---|----+ |
| | +-----------+ | | +-----------+ | | |
| | | | | | | | |
| | +-----------+ | | +-----------+ | | |
| | | Pod A | | | | Pod B | | | |
| | +-----------+ | | +-----------+ | | |
| | | | | | |
| +-----------------+ +-----------------+ | |
| | |
| Native AWS Subnet & Route Table (local) | |
+---------------------------------------------------|-------+
|
minimal inter-AZ traffic |
|
+---------------------------------------------------|-------+
| AWS Availability Zone B (us-west-2b) | |
| | |
| +-----------------+ +-----------------+ | |
| | EC2 Node #3 | | EC2 Node #4 | | |
| | | | | | |
| | +-----------+ | | +-----------+ | | |
| | | ENI eth0 |---|------| | ENI eth0 |---|----+ |
| | +-----------+ | | +-----------+ | |
| | | | | | | |
| | +-----------+ | | +-----------+ | |
| | | Pod C | | | | Pod D | | |
| | +-----------+ | | +-----------+ | |
| | | | | |
| +-----------------+ +-----------------+ |
| |
+-----------------------------------------------------------+
Legend:
- ENI: AWS Elastic Network Interface
- Pod: Kubernetes Pod, with IP mapped to ENI in node's AZ subnet
- Native subnet & route: traffic is routed locally within AZ
- Inter-AZ traffic: minimized (only when necessary for HA or failover)
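A minimal Helm values sketch for an ENI-mode install follows; option names can shift between Cilium releases (for example, routingMode: native replaced the older tunnel: disabled setting), so confirm them against the ENI documentation linked below.

```yaml
# values.yaml sketch for the cilium Helm chart in AWS ENI mode.
ipam:
  mode: eni                         # allocate pod IPs from ENIs in the node's AZ subnet
eni:
  enabled: true                     # let the Cilium operator manage ENIs
routingMode: native                 # no overlay; pods route natively through the VPC
egressMasqueradeInterfaces: eth0    # masquerade only traffic leaving the primary interface
```

Because each pod IP now belongs to the VPC subnet of its node's AZ, intra-AZ pod traffic stays on local routes by construction.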
- Leverage Cluster Mesh: For multi-cluster or multi-region scenarios, Cluster Mesh manages service endpoints to prefer local pods and restrict unnecessary data flow across zones; a global-service sketch follows the diagram below.
+-----------------------------------------------------------------------------------+
| AWS Region (Multi-AZ) |
| |
| +-------------------------------+ +-------------------------------+ |
| | Availability Zone A (AZ-a) | | Availability Zone B (AZ-b) | |
| | | | | |
| | +-------------------------+ | | +-------------------------+ | |
| | | Cluster A (in AZ-a) | | | | Cluster B (in AZ-b) | | |
| | | | | | | | | |
| | | +------+ +------+ | | | | +------+ +------+ | | |
| | | | Pods | | Pods | | | | | | Pods | | Pods | | | |
| | | +------+ +------+ | | | | +------+ +------+ | | |
| | | | | | | | | |
| | | Traffic stays local | | | | Traffic stays local | | |
| | | within AZ and Cluster | | | | within AZ and Cluster | | |
| | +------------|------------+ | | +------------|------------+ | |
| +---------------|---------------+ +---------------|---------------+ |
| | | |
| Traffic to other clusters stays minimal | |
| for high availability & resiliency | |
| +------------------------------------------+ |
| | |
| +-------v-------+ |
| | Cluster Mesh | |
| | Synchronizes | |
| | Service & | |
| | Endpoint Info| |
| +---------------+ |
| |
| Resiliency: Failover / backup cluster routes traffic across AZs |
+-----------------------------------------------------------------------------------+
Legend:
- Pods communicate locally within their cluster and AZ.
- Traffic to other AZs only for resiliency or failover (Cluster Mesh).
- Cluster Mesh ensures clusters share service status across AZs without unnecessary cross-AZ pod traffic.
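Once clusters are connected, locality is controlled per Service with Cilium's global-service annotations. A sketch (the Service name and labels are placeholders): service.cilium.io/global shares the endpoints mesh-wide, and service.cilium.io/affinity: local tells Cilium to prefer same-cluster endpoints, spilling over to remote clusters only when no healthy local backend remains.

```yaml
# Global service sketch for Cluster Mesh; metadata values are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: checkout
  annotations:
    service.cilium.io/global: "true"     # expose this Service's endpoints across the mesh
    service.cilium.io/affinity: "local"  # prefer local endpoints; remote ones are failover only
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080
```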
Many users have reported notable savings on AWS data transfer costs by carefully tuning these settings in real deployments.
Benefits of Using Cilium for Cross-AZ Optimization
- Cost Reduction: Keeps data transfer local to the zone, cutting expensive AWS inter-AZ charges.
- Improved Availability: Maintains Kubernetes service resiliency by favoring local endpoints while still allowing cross-AZ failover when needed.
- Observability with Hubble: Deep, real-time visibility into pod-to-pod communication paths helps diagnose network flow and optimize topology (a Helm sketch for enabling it follows this list).
- Fine-Grained Security: Layer 7 network policies enable precise control over permissible traffic patterns in and across AZs.
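Turning on Hubble is itself a small Helm values change; a sketch, with option names to be verified against your Cilium release:

```yaml
# Helm values sketch to enable Hubble flow observability.
hubble:
  enabled: true
  relay:
    enabled: true   # cluster-wide flow API consumed by the hubble CLI and UI
  ui:
    enabled: true   # web UI for the service map and live flows
```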
Challenges to Consider
- Complex Configuration: Setting up advanced IPAM modes and topology-aware routing requires deeper networking knowledge.
- Learning Curve: Teams new to eBPF and Cilium’s enhanced policy model may face an adjustment period.
- AWS Resource Limits: AWS ENI attachment limits and subnet sizing must be carefully managed to avoid capacity bottlenecks.
- Kubernetes Version Dependency: Some features rely on newer Kubernetes releases supporting topology hints and service routing.
Conclusion
Optimizing cross-AZ traffic on AWS Kubernetes clusters is essential for both cost efficiency and application performance. Cilium’s eBPF-driven approach combined with AWS native networking integration offers a modern, powerful solution. While setup complexity exists, the tradeoff is significant savings and greater control. For technical teams ready to invest in advanced networking, Cilium is a compelling choice.
Further Reading and Official Resources
- Cilium Topology-Aware Routing: https://docs.cilium.io/en/stable/networking/topology-aware-routing/
- Cilium AWS ENI Mode Documentation: https://docs.cilium.io/en/stable/networking/aws-eni/
- Installing Cilium on EKS in ENI Mode: https://cilium.io/blog/2025/06/19/eks-eni-install/
- Kubernetes Topology-Aware Service Routing: https://kubernetes.io/docs/concepts/services-networking/service-topology/
- Observability with Hubble: https://docs.cilium.io/en/stable/operations/hubble/
- Cluster Mesh Overview: https://docs.cilium.io/en/stable/networking/clustermesh/
- Getting Started with Cilium on Amazon EKS: https://aws.amazon.com/blogs/opensource/getting-started-with-cilium-service-mesh-on-amazon-eks/
With these resources and a considered approach, teams can unlock the full potential of Cilium to streamline AWS Kubernetes networking and lower their cross-AZ bill. Happy networking!