AWS Cost Optimization: When Reducing Cross-Zone Data Transfer Actually Costs More
Introduction
When reviewing AWS costs, data transfer charges often stand out—especially for high-traffic organizations. Unlike compute or storage resources, you can't easily tag data transfer costs to identify which project or environment generated them. The bill just says "data transfer between zones" with a growing number.
Most AWS cost optimization guides focus on the obvious wins:
- Reduce internet egress traffic
- Minimize cross-region data transfer
- Eliminate unnecessary data movement
But cross-zone data transfer is different. AWS designed many services to span multiple availability zones for redundancy: ECS/EKS clusters, RDS, MSK (Managed Kafka), Application Load Balancers. You can't simply force everything into a single zone without sacrificing availability.
This article explores two specific scenarios where the "obvious" cost optimization—eliminating cross-zone traffic—can actually increase your total costs or break your architecture.
A Brief History of NAT in AWS
Let me set the stage with how we got here:
Before AWS (2000s)
Organizations ran bare-metal NAT boxes in colocation facilities for outbound traffic. Cost: $1,500+ for entry-level hardware or third-party appliances.
Early AWS Days (2010-2012)
We launched EC2 instances as NAT boxes. Cost: ~$50/month. Huge improvement.
AWS Managed NAT Gateway (2015+)
AWS launched NAT Gateway as a managed service. Cost: $32/month per gateway + $0.045/GB processed. Even better.
Following legacy convention, all internal resources needing outbound internet access were routed through this single NAT Gateway. Over the years, this created significant cross-zone traffic as instances in zones B and C sent traffic to the NAT in zone A.
No one noticed... until the organization decided to review every penny of AWS spending.
Optimization #1: NAT Gateway Per Zone
The "Obvious" Solution
The optimization seems straightforward:
- Create one NAT Gateway per availability zone
- Configure route tables so each zone uses its local NAT
- Eliminate cross-zone data transfer for outbound traffic
- Bonus: Better availability if one NAT fails
In Terraform, this looks cleaner and more organized. Excellent!
But Wait... Is That Actually Saving Money?
Here's the reality check most teams miss:
NAT Gateway costs: $32/month × 12 months = $384/year per gateway
Cross-zone data transfer: $0.01/GB
For this optimization to break even, each additional NAT Gateway must eliminate at least 38,400 GB (~38.4 TB) of cross-zone traffic per year.
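That break-even can be sanity-checked in a few lines. This is a sketch using the article's illustrative list prices; actual rates vary by region:

```python
# Break-even check for adding one NAT Gateway to avoid cross-zone transfer.
NAT_GATEWAY_MONTHLY_USD = 32.0   # fixed cost per gateway (illustrative)
CROSS_ZONE_USD_PER_GB = 0.01     # inter-AZ data transfer rate

def break_even_gb_per_year(gateway_monthly: float = NAT_GATEWAY_MONTHLY_USD,
                           transfer_per_gb: float = CROSS_ZONE_USD_PER_GB) -> float:
    """GB of cross-zone traffic one extra gateway must eliminate per year to pay for itself."""
    return gateway_monthly * 12 / transfer_per_gb

print(break_even_gb_per_year())  # ~38,400 GB, i.e. ~38.4 TB/year per gateway
```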
Real-World Math
Let's say you have:
- 20 production VPCs
- 3 availability zones per VPC (currently 1 NAT per VPC)
- 3 lower environments (staging, QA, dev) with similar setup
Adding NAT gateways:
- Prod: 20 VPCs × 2 additional NATs = 40 new NATs
- Lower envs: 3 envs × 20 VPCs × 2 additional NATs = 120 new NATs
- Total new NATs: 160
Annual cost of new NATs: 160 × $384 = $61,440/year
Required savings to break even: 6,144 TB of cross-zone traffic eliminated
Actual outbound traffic (check your NAT Gateway metrics): In most organizations, outbound traffic is relatively small—maybe 100-500 GB/month per VPC.
Even if 50% of production traffic crosses zones (lower environments typically carry far less), you're looking at:
- 20 VPCs × 500 GB × 50% × 12 months = 60 TB/year
- Cost at $0.01/GB = $600/year saved
Result: Spend $61,440 to save $600.
That's not optimization; that's spending roughly 100x more than you save.
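Putting the scenario's numbers together, with the same illustrative prices as above:

```python
# Worked version of the scenario: cost of new NAT Gateways vs. cross-zone savings.
NAT_ANNUAL_USD = 384.0        # $32/month x 12 (illustrative)
CROSS_ZONE_USD_PER_GB = 0.01

vpcs_per_env = 20
extra_nats_per_vpc = 2        # zones B and C each get their own gateway
environments = 4              # prod + staging + QA + dev

new_nats = vpcs_per_env * extra_nats_per_vpc * environments
annual_nat_cost = new_nats * NAT_ANNUAL_USD

# Production traffic estimate: 500 GB/month per VPC, half of it crossing zones.
saved_gb_per_year = vpcs_per_env * 500 * 0.5 * 12
annual_savings = saved_gb_per_year * CROSS_ZONE_USD_PER_GB

print(new_nats, annual_nat_cost, annual_savings)
# 160 gateways, ~$61,440/year spent, ~$600/year saved
```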
When It Actually Makes Sense
NAT per zone IS cost-effective when:
- Cross-zone outbound traffic exceeds roughly 3 TB/month per additional gateway
- High availability is critical (worth paying for redundancy)
- You have only a few VPCs (fixed NAT cost is manageable)
Takeaway: Check your actual traffic volumes in CloudWatch metrics before implementing this "optimization."
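One place to get those volumes is the `BytesOutToDestination` metric in the `AWS/NATGateway` CloudWatch namespace. A small helper can convert the `Sum` datapoints returned by `get_metric_statistics` into GB per month; the sample datapoints below are hypothetical, standing in for a live API call:

```python
def monthly_gb(datapoints):
    """Sum CloudWatch 'Sum' statistics (in bytes) and convert to GB."""
    total_bytes = sum(dp["Sum"] for dp in datapoints)
    return total_bytes / 1e9

# Hypothetical daily Sum datapoints, as returned by get_metric_statistics
# for AWS/NATGateway BytesOutToDestination over part of a month.
sample = [{"Sum": 12e9}, {"Sum": 9e9}, {"Sum": 15e9}]
print(monthly_gb(sample))  # 36.0 GB -- nowhere near the ~3 TB/month break-even
```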
A Similar Pattern: VPC Endpoints
VPC Endpoints follow the same pattern:
Historical context:
- VPC Endpoints didn't exist when AWS launched
- ALL traffic to S3, CloudWatch, SSM, Secrets Manager, etc. went through NAT to the internet and back
- Sounds crazy, but that was reality
Today's dilemma:
- VPC Interface Endpoint costs: ~$7.20/month per AZ + $0.01/GB processed
- AWS has hundreds of endpoint-supported services
- Should you create endpoints for all of them?
The answer: Calculate based on actual traffic to each service.
For example:
- S3 endpoint: Probably yes (high traffic volume, and S3's gateway endpoint is free)
- Secrets Manager endpoint: Maybe not (low traffic, API calls only)
- Athena endpoint: Depends on your query patterns
The decision is data-driven, not default.
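A rough break-even for a single interface endpoint, assuming the traffic would otherwise flow through a NAT Gateway at $0.045/GB processing. This is a sketch: prices vary by region, and the comparison ignores cross-zone hops:

```python
# When does an interface endpoint beat routing the same traffic through NAT?
NAT_PER_GB = 0.045               # NAT Gateway data processing (illustrative)
ENDPOINT_PER_GB = 0.01           # interface endpoint data processing
ENDPOINT_MONTHLY_PER_AZ = 7.20   # interface endpoint fixed cost per AZ

def endpoint_break_even_gb(azs: int = 3) -> float:
    """GB/month of service traffic above which the endpoint is cheaper than NAT."""
    fixed = ENDPOINT_MONTHLY_PER_AZ * azs
    return fixed / (NAT_PER_GB - ENDPOINT_PER_GB)

print(round(endpoint_break_even_gb(3), 1))  # ~617 GB/month for a 3-AZ endpoint
```

So a chatty, low-volume service like Secrets Manager rarely clears the bar, while a data-heavy service usually does.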
Optimization #2: Internal ALBs and Service Discovery
This one involves both cost AND architecture tradeoffs.
Another Brief History
Before cloud (2000s-2010s):
Many organizations developed in-house service discovery systems for internal microservices communication. (Different names at different companies, same concept.)
Early AWS days:
No equivalent service existed. Application Load Balancers became the industry standard for both external and internal microservices.
Today:
AWS offers Cloud Map (Service Discovery), which could replace internal ALBs in theory.
What's the Problem with Internal ALBs?
ALBs work great... until you calculate the costs at scale.
Fixed costs per ALB:
- ~$16.20/month ($0.0225/hour × 720 hours)
- Plus: $0.008 per LCU-hour (varies with traffic)
Data transfer costs:
By design, ECS/EKS clusters run instances across availability zones. When Service A calls Service B through an internal ALB:
- Service A → ALB: Cross-zone data transfer ($0.01/GB)
- ALB → Service B: Cross-zone data transfer ($0.01/GB)
Every internal API call potentially incurs 2× cross-zone charges.
Real-World Example
Organization with 80 internal services:
- Production: 80 ALBs
- Staging: 80 ALBs
- QA: 80 ALBs
- Dev: 80 ALBs
ALB fixed costs alone:
- 320 ALBs × $16.20 × 12 months = $62,208/year
Plus:
- LCU charges based on traffic
- Cross-zone data transfer (2× per request)
For high-traffic services, this becomes significant.
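Pulling the fixed costs and transfer charges together (the 10 TB/month internal traffic figure below is a hypothetical for illustration, not from the example):

```python
# Fixed ALB cost from the example, plus the 2x cross-zone transfer per call.
ALB_MONTHLY_USD = 16.20
CROSS_ZONE_USD_PER_GB = 0.01

albs = 80 * 4                          # 80 services x 4 environments
fixed_annual = albs * ALB_MONTHLY_USD * 12

# Hypothetical internal traffic: 10 TB/month, with both hops crossing zones.
monthly_internal_gb = 10_000
transfer_annual = monthly_internal_gb * 2 * CROSS_ZONE_USD_PER_GB * 12

print(fixed_annual, transfer_annual)
# ~$62,208/year fixed + ~$2,400/year transfer at 10 TB/month, before LCU charges
```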
The AWS Service Discovery Alternative
AWS Cloud Map (Service Discovery) eliminates:
- ALB fixed costs
- Cross-zone data transfer charges (direct service-to-service)
- LCU charges
Sounds perfect for cost optimization!
The Architectural Problem
Critical limitation: AWS Service Discovery only works within the same VPC.
This means: Services in different VPCs—even in the same region and same account—cannot communicate via Service Discovery. Cross-VPC service discovery does not work.
Why not? AWS Service Discovery uses DNS-based service discovery with Route 53 private hosted zones. The private hosted zone is associated with a specific VPC. When a service in VPC B tries to discover a service in VPC A:
- VPC B doesn't have access to VPC A's private hosted zone
- DNS resolution fails (can't resolve the service name)
- Even with VPC Peering or Transit Gateway providing network connectivity, DNS discovery still fails
You CAN manually associate the Route 53 hosted zone with multiple VPCs as a workaround, but this:
- Requires manual configuration for each VPC
- Adds operational complexity managing DNS associations
- Defeats the entire purpose of "simple service discovery"
At that point, you're just trading ALB complexity for DNS management complexity.
This breaks a fundamental architecture pattern from the multi-account article: Project Team Autonomy.
If you gave each team their own VPC (Strategy 1 or 3 from the previous article), switching to Service Discovery forces you to either:
- Consolidate teams into fewer VPCs (losing isolation)
- Manually associate Route 53 hosted zones across all VPCs (operational complexity, defeats simplicity)
- Keep ALBs for cross-VPC communication (partial benefit only, still paying for ALBs)
The Real Tradeoff
This isn't a pure cost optimization—it's an architectural tradeoff:
Option A: Keep internal ALBs
- ✅ Teams maintain VPC isolation
- ✅ Cross-VPC communication works seamlessly
- ❌ Higher costs ($62k+/year for 320 ALBs)
- ❌ Cross-zone data transfer overhead
Option B: Switch to Service Discovery
- ✅ Eliminate ALB fixed costs
- ✅ Reduce cross-zone data transfer
- ❌ Lose cross-VPC communication
- ❌ Break team autonomy architecture
- ❌ Requires VPC consolidation or Transit Gateway
For many organizations, the architectural cost of losing team autonomy outweighs the dollar savings.
Key Takeaways
1. "Obvious" Optimizations Aren't Always Savings
The pattern repeats:
- Adding NAT per zone sounds logical
- Creating VPC endpoints for everything seems safe
- Replacing ALBs with Service Discovery looks cost-effective
But each requires actual data to determine if savings exceed new costs.
2. AWS Architecture Forces Tradeoffs
Cross-zone data transfer costs exist because AWS resources are multi-zone by design:
- You can't eliminate the traffic without losing redundancy
- "Optimizations" often just shift costs to different line items
- Sometimes the shifted cost is higher than the original
3. Hidden Costs of "Cheaper" Solutions
When evaluating alternatives, consider:
- Fixed costs: NAT Gateways, VPC Endpoints, ALBs have monthly charges
- Architectural constraints: Service Discovery's VPC limitation
- Operational complexity: Managing more resources costs engineering time
- Availability implications: Single-zone NAT is a single point of failure
4. Calculate Before Implementing
Before any "optimization":
- Check CloudWatch metrics for actual traffic volumes
- Calculate break-even points
- Consider architectural implications
- Factor in operational overhead
$600 saved on data transfer isn't worth $60,000 spent on NAT Gateways.
5. Sometimes the "Expensive" Option is Correct
Paying for internal ALBs might be the right choice if:
- Team autonomy is architecturally important
- Cross-VPC communication is a core requirement
- The cost is acceptable for the business value provided
Cost optimization doesn't mean "minimize every line item." It means "maximize value per dollar spent."
Final Thought
Cross-zone data transfer optimization is like the multi-account architecture problem: AWS's design decisions create tradeoffs that seem fixable with "obvious" solutions, but the obvious solution often creates new problems or higher costs.
The goal isn't to eliminate cross-zone traffic at all costs. The goal is to understand:
- Where your actual costs come from (check the data!)
- What each optimization really costs (including hidden costs)
- Whether the architectural tradeoffs are acceptable
Sometimes the best optimization is accepting the cost and focusing on higher-value improvements elsewhere.
Building cost-effective AWS infrastructure? Optimizing data transfer costs? Share your experiences and lessons learned in the comments.
Connect with me on LinkedIn: https://www.linkedin.com/in/rex-zhen-b8b06632/
I share insights on cloud architecture, SRE practices, and cost optimization strategies. Let's connect and learn together!