Introduction
While maintaining an internal web application at a previous job, I was asked to investigate why our AWS bill was so high.
In this post, I’ll share the process from identifying the cause to implementing a solution.
In the end, we cut costs by about $10,000 per month. I hope this helps anyone running a similar architecture.
Identifying the Cause
The investigation revealed that the main culprit was the NAT Gateway.
Here’s what the relevant part of the system architecture looked like before:
- ECS tasks were running inside a private subnet on a schedule
- Each time they pulled Docker images from ECR, traffic went through the NAT Gateway
- Multiple tasks ran every 1–5 minutes, pushing a huge volume of image data through the NAT Gateway
As a result, NAT Gateway data processing charges reached about $5,000/month for a single environment.
Since the same setup was used in both the production and development environments, the total cost was about $10,000/month.
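To get a feel for how quickly frequent pulls add up, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions: every run pulls the full image, the workload numbers are hypothetical, and the $0.045/GB NAT data processing price is an assumed list price that varies by region.

```python
# Back-of-the-envelope estimate of NAT Gateway data processing charges caused by
# scheduled ECS tasks pulling images from ECR.
# Assumptions: every run pulls the full image; prices are assumed list prices.

NAT_PRICE_PER_GB = 0.045   # assumption: USD per GB processed, check your region
HOURS_PER_MONTH = 730      # common monthly approximation used in AWS pricing
MINUTES_PER_MONTH = HOURS_PER_MONTH * 60


def monthly_pull_gb(image_gb: float, interval_minutes: float, task_count: int) -> float:
    """GB pulled per month if each task pulls the whole image on every run."""
    runs_per_task = MINUTES_PER_MONTH / interval_minutes
    return image_gb * runs_per_task * task_count


def nat_processing_cost(gb: float, price_per_gb: float = NAT_PRICE_PER_GB) -> float:
    """NAT Gateway data processing charge only (the hourly charge is excluded)."""
    return gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 5 tasks with a 0.5 GB image, each running every 2 minutes.
    gb = monthly_pull_gb(image_gb=0.5, interval_minutes=2, task_count=5)
    print(f"{gb:,.0f} GB/month -> ${nat_processing_cost(gb):,.0f}/month")
```

Even this modest hypothetical setup pulls tens of terabytes a month; larger images or more tasks push the bill into the range we saw.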
The Solution: Introduce a VPC Endpoint
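We changed the setup so that image pulls from ECR no longer went through the NAT Gateway but through VPC endpoints instead. ECR needs two interface endpoints (ecr.api and ecr.dkr) plus an S3 gateway endpoint, because the image layers themselves are served from S3.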
Here’s the updated architecture:
For the exact configuration steps, refer to the official documentation:
AWS Docs: Using VPC Endpoints for ECR Access
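As a minimal sketch of what that configuration involves, the function below builds the three endpoint requests as keyword arguments for boto3's EC2 create_vpc_endpoint call. The function names and all IDs are hypothetical; it assumes the VPC has DNS support and DNS hostnames enabled (required for private DNS) and that the security group allows HTTPS (443) from the tasks.

```python
# Sketch: the VPC endpoints needed to pull ECR images from private subnets
# without a NAT Gateway. Function names and all IDs are hypothetical.


def ecr_endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Return kwargs for ec2.create_vpc_endpoint(), one dict per endpoint."""
    interface = {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS makes the default ECR hostnames resolve to the endpoint IPs.
        "PrivateDnsEnabled": True,
    }
    return [
        # ECR API (e.g. authorization tokens).
        {**interface, "ServiceName": f"com.amazonaws.{region}.ecr.api"},
        # Docker registry API used when the image is pulled.
        {**interface, "ServiceName": f"com.amazonaws.{region}.ecr.dkr"},
        # Image layers come from S3; a gateway endpoint is attached to route tables.
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "RouteTableIds": list(route_table_ids),
        },
    ]


def create_endpoints(region, requests):
    """Create the endpoints; needs boto3 and AWS credentials with EC2 permissions."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**req)["VpcEndpoint"]["VpcEndpointId"] for req in requests]
```

Usage would look like create_endpoints("ap-northeast-1", ecr_endpoint_requests("ap-northeast-1", "vpc-...", [...], [...], [...])). Other AWS services the tasks call, such as CloudWatch Logs, still go through the NAT Gateway unless they get their own endpoints.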
What is a VPC Endpoint?
A VPC Endpoint allows your VPC to connect privately to AWS services without going through the internet.
Key benefits include:
- No internet traffic: Access AWS services via private IP addresses
- No NAT Gateway required: Greatly reduces data transfer costs
- Better security: Keep traffic off the public internet
Cost Comparison: NAT Gateway vs VPC Endpoint
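| Feature | NAT Gateway | VPC Endpoint |
|---|---|---|
| Data charges | Per GB processed, plus an hourly charge | Gateway endpoints (S3): free; interface endpoints (ECR): hourly per AZ plus a lower per-GB fee |
| Internet route | Required | Not required |
| Security | Public IP connection | Private traffic |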
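If you frequently pull images from ECR, NAT Gateway costs can spike dramatically, because every gigabyte pulled is billed. With VPC endpoints, costs stay almost flat: the image layers arrive through the free S3 gateway endpoint, which leaves mostly the fixed hourly charge of the interface endpoints.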
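The comparison above can be put into numbers with a small model. This is a sketch, not a pricing calculator: the prices are assumed us-east-1 list prices that may differ in your region, the endpoint and AZ counts are hypothetical, and the share of traffic that goes through the interface endpoints (rather than the free S3 gateway endpoint) is a hypothetical input.

```python
# Rough monthly cost comparison for the same image-pull volume.
# Assumption: us-east-1 list prices in USD; check the VPC/PrivateLink pricing pages.

HOURS_PER_MONTH = 730
NAT_HOURLY = 0.045                 # per NAT Gateway hour (assumed)
NAT_PER_GB = 0.045                 # per GB processed (assumed)
INTERFACE_EP_HOURLY_PER_AZ = 0.01  # per interface endpoint, per AZ, per hour (assumed)
INTERFACE_EP_PER_GB = 0.01         # per GB processed by interface endpoints (assumed)
# S3 gateway endpoints have no hourly or data processing charge.


def nat_monthly(total_gb: float) -> float:
    """One NAT Gateway carrying all pull traffic."""
    return NAT_HOURLY * HOURS_PER_MONTH + NAT_PER_GB * total_gb


def endpoints_monthly(gb_via_interface: float, interface_endpoints: int = 2, azs: int = 2) -> float:
    """ecr.api + ecr.dkr interface endpoints; layer data via the free S3 gateway."""
    hourly = INTERFACE_EP_HOURLY_PER_AZ * interface_endpoints * azs * HOURS_PER_MONTH
    return hourly + INTERFACE_EP_PER_GB * gb_via_interface


if __name__ == "__main__":
    total_gb = 100_000          # hypothetical monthly pull volume
    interface_share = 0.01      # hypothetical: most bytes are layers served via S3
    print(f"NAT Gateway:   ${nat_monthly(total_gb):,.0f}/month")
    print(f"VPC endpoints: ${endpoints_monthly(total_gb * interface_share):,.0f}/month")
```

The design point is that the NAT cost scales linearly with volume, while the endpoint cost is dominated by a fixed hourly term.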
Cost Impact After the Change
- Before: about $5,000/month in NAT Gateway data processing charges
- After switching to VPC endpoints: under $100/month
Since both production and development environments had the same setup, we achieved a total savings of about $10,000/month.
Lessons Learned
From this experience, I took away several key points:
- Pulling from ECR through a NAT Gateway with frequent task runs can cause costs to skyrocket
- Using a VPC Endpoint can significantly cut costs
- Documentation and handover are critical
  - All of the original developers had left and there was little system documentation, which made root cause analysis harder
- Operational knowledge is as important as development skills
  - If I had been more familiar with Cost Explorer or CloudWatch Logs, I could have found the cause much faster
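As an example of the kind of Cost Explorer query that would have surfaced the problem early, here is a sketch that groups costs by usage type and picks out NAT Gateway data processing. It assumes the usage type names end in "NatGateway-Bytes" (they usually carry a region prefix such as "APN1-"), and it ignores pagination (NextPageToken) for brevity; the helper names are hypothetical.

```python
# Sketch: find NAT Gateway data processing costs with Cost Explorer.
# Assumption: NAT data usage types end with "NatGateway-Bytes" (region prefix varies).
from datetime import date


def fetch_costs_by_usage_type(start: date, end: date) -> dict:
    """Call Cost Explorer; needs boto3 and ce:GetCostAndUsage permission."""
    import boto3

    ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer API endpoint region
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )


def nat_bytes_cost(response: dict) -> dict:
    """Sum UnblendedCost per NAT Gateway data usage type across all periods."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            if usage_type.endswith("NatGateway-Bytes"):
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                totals[usage_type] = totals.get(usage_type, 0.0) + amount
    return totals
```

Running nat_bytes_cost(fetch_costs_by_usage_type(date(2024, 1, 1), date(2024, 4, 1))) would show, per region prefix, how much of the bill is NAT data processing.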
When optimizing AWS operational costs, especially for workloads that frequently use ECR or S3, VPC Endpoints can be a game-changer.
If you’re facing similar challenges, I highly recommend considering them.