DEV Community

ak0047


How We Cut AWS Costs by $10,000/Month: Switching from NAT Gateway to VPC Endpoint

Introduction

While maintaining an internal web application at a previous job, I was asked to investigate why our AWS bill was so high.
In this post, I’ll share the process from identifying the cause to implementing a solution.

In the end, we cut costs by about $10,000 per month. I hope this helps anyone running a similar architecture.


Identifying the Cause

The investigation revealed that the main culprit was the NAT Gateway.
Here’s what the relevant part of the system architecture looked like before:

[Figure: system architecture before the change]
Why was it costing so much?

  • ECS tasks were running inside a private subnet on a schedule
  • Each time they pulled Docker images from ECR, traffic went through the NAT Gateway
  • Multiple tasks ran every 1–5 minutes, so a huge volume of image-pull traffic passed through the NAT Gateway

As a result, NAT Gateway data processing charges reached about $5,000/month.

Since the same setup was used in both the production and development environments, the total cost was about $10,000/month.
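To see how a schedule like this adds up, here is a rough Python sketch that estimates the monthly pull volume and the matching NAT Gateway data processing charge. The image size, task count, and interval are hypothetical, not the figures from our system. It also assumes every run pulls the full image with no layer cache (typical for Fargate tasks) and uses the us-east-1 list price of $0.045 per GB, which varies by region.

```python
"""Back-of-the-envelope estimate of NAT Gateway data processing cost for
scheduled ECS tasks that pull their image from ECR on every run.

All workload numbers are hypothetical. The $0.045/GB rate is an assumed
us-east-1 list price; check current pricing for your region.
"""

MINUTES_PER_MONTH = 60 * 24 * 30  # assume a 30-day month


def monthly_pull_gb(image_gb: float, interval_minutes: float, task_count: int) -> float:
    """GB pulled per month if every task pulls the full image on every run."""
    runs_per_task = MINUTES_PER_MONTH / interval_minutes
    return image_gb * runs_per_task * task_count


def nat_processing_cost(gb: float, price_per_gb: float = 0.045) -> float:
    """NAT Gateway data processing charge for `gb` gigabytes."""
    return gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 5 tasks, 0.5 GB image, each running every minute.
    gb = monthly_pull_gb(image_gb=0.5, interval_minutes=1, task_count=5)
    print(f"{gb:,.0f} GB/month -> ${nat_processing_cost(gb):,.0f}/month")
    # prints: 108,000 GB/month -> $4,860/month
```

The point of the sketch is that cost scales linearly with image size, task count, and run frequency, so a small image pulled every minute can still produce a four-figure monthly bill.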


The Solution: Introduce a VPC Endpoint

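We changed the setup so that image pulls from ECR no longer went through the NAT Gateway and used VPC Endpoints instead. Per the AWS documentation, private ECR pulls typically need three endpoints: interface endpoints for the ECR API (ecr.api) and the Docker registry API (ecr.dkr), plus an S3 gateway endpoint, because ECR stores image layers in S3.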

Here’s the updated architecture:

[Figure: system architecture after the change]

For the exact configuration steps, refer to the official documentation:
AWS Docs: Using VPC Endpoints for ECR Access
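As a rough illustration of what that configuration involves, the Python sketch below builds request parameters for the three endpoints in the keyword-argument shape that boto3's EC2 `create_vpc_endpoint` accepts. It only builds dictionaries and makes no AWS calls; the region, VPC, subnet, security group, and route table IDs are hypothetical placeholders, and this is not the exact setup we deployed.

```python
"""Build request parameters for the VPC endpoints that private ECR image
pulls typically need (ecr.api, ecr.dkr, and an S3 gateway endpoint).

The dicts match the keyword arguments of boto3's EC2 create_vpc_endpoint.
All IDs used below are hypothetical placeholders.
"""
from __future__ import annotations


def ecr_endpoint_requests(
    region: str,
    vpc_id: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    route_table_ids: list[str],
) -> list[dict]:
    interface_common = {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Lets the default ECR hostnames resolve to the endpoints' private IPs.
        "PrivateDnsEnabled": True,
    }
    return [
        # ECR API calls, such as GetAuthorizationToken.
        {**interface_common, "ServiceName": f"com.amazonaws.{region}.ecr.api"},
        # Docker Registry API used when the image is pulled.
        {**interface_common, "ServiceName": f"com.amazonaws.{region}.ecr.dkr"},
        # Image layers are downloaded from S3, so a gateway endpoint is added
        # to the private subnets' route tables.
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "RouteTableIds": route_table_ids,
        },
    ]


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    for req in ecr_endpoint_requests(
        region="us-east-1",
        vpc_id="vpc-0example",
        subnet_ids=["subnet-0aaa", "subnet-0bbb"],
        security_group_ids=["sg-0example"],
        route_table_ids=["rtb-0example"],
    ):
        print(req["VpcEndpointType"], req["ServiceName"])
```

To actually create the endpoints, each dict could be passed as `ec2.create_vpc_endpoint(**req)` with `ec2 = boto3.client("ec2")`. The interface endpoints' security group also needs to allow inbound HTTPS (port 443) from the subnets where the tasks run.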


What is a VPC Endpoint?

A VPC Endpoint allows your VPC to connect privately to AWS services without going through the internet.

Key benefits include:

  • No internet traffic: Access AWS services via private IP addresses
  • No NAT Gateway required: Traffic to the service skips the NAT Gateway and its per-GB processing charge
  • Better security: Keep traffic off the public internet

Cost Comparison: NAT Gateway vs VPC Endpoint

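Feature        | NAT Gateway               | VPC Endpoint
Data charges   | Per-GB processing fee     | Gateway endpoint (S3): free; interface endpoint (ECR): hourly fee per AZ plus a lower per-GB fee
Internet route | Required                  | Not required
Security       | Outbound via a public IP  | Private traffic within the AWS network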

If you pull images from ECR frequently, NAT Gateway costs grow with every gigabyte pulled and can spike dramatically. With VPC Endpoints, the bill is dominated by a fixed hourly fee, so costs stay nearly flat.
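A small Python sketch makes the trade-off concrete. It compares the monthly cost of image-pull traffic through a NAT Gateway with the cost through VPC endpoints, and computes the volume above which endpoints win even in the worst case. The prices are assumed us-east-1 list prices and the pull volume is hypothetical; it also assumes the NAT Gateway stays in place for other traffic, so only its per-GB fee is counted.

```python
"""Compare monthly cost of image-pull traffic: NAT Gateway vs. VPC endpoints.

Assumed us-east-1 list prices (check current pricing for your region):
  NAT Gateway data processing:       $0.045 per GB
  Interface endpoint (ecr.api/dkr):  $0.01 per endpoint per AZ per hour
                                     + $0.01 per GB processed
  Gateway endpoint (S3):             no charge
"""

HOURS_PER_MONTH = 730

NAT_PER_GB = 0.045
IFACE_HOURLY = 0.01
IFACE_PER_GB = 0.01


def nat_cost(gb: float) -> float:
    # Only the per-GB fee: the NAT Gateway's hourly fee is assumed to be paid
    # either way because it stays in place for other outbound traffic.
    return gb * NAT_PER_GB


def endpoint_cost(interface_gb: float, endpoints: int = 2, azs: int = 2) -> float:
    # Fixed hourly fee per interface endpoint per AZ, plus per-GB processing.
    # Bytes served through the free S3 gateway endpoint are not counted.
    fixed = endpoints * azs * HOURS_PER_MONTH * IFACE_HOURLY
    return fixed + interface_gb * IFACE_PER_GB


def break_even_gb(endpoints: int = 2, azs: int = 2) -> float:
    # Worst case for endpoints: every byte goes through an interface endpoint.
    fixed = endpoints * azs * HOURS_PER_MONTH * IFACE_HOURLY
    return fixed / (NAT_PER_GB - IFACE_PER_GB)


if __name__ == "__main__":
    gb = 108_000  # hypothetical monthly pull volume
    print(f"NAT Gateway:                      ${nat_cost(gb):,.2f}")
    print(f"Endpoints, all via interface:     ${endpoint_cost(gb):,.2f}")
    # Layers come from S3, so interface traffic is approximated as 0 GB here.
    print(f"Endpoints, layers via S3 gateway: ${endpoint_cost(0):,.2f}")
    print(f"Break-even volume: {break_even_gb():,.0f} GB/month")
```

Under these assumptions, endpoints become cheaper once pulls exceed roughly 830 GB/month, even if every byte went through an interface endpoint. In practice most bytes are image layers served through the free S3 gateway endpoint, which is why the endpoint cost ends up close to the fixed hourly fee.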


Cost Impact After the Change

Before: about $5,000/month per environment in NAT Gateway data processing charges
After switching to VPC Endpoints: under $100/month per environment

Since both production and development environments had the same setup, we achieved a total savings of about $10,000/month.
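Spelling out the arithmetic behind the headline number, using the approximate per-environment figures above (both are rounded, so the result is approximate too):

```latex
\text{savings} \approx 2 \times (\$5{,}000 - \$100) = \$9{,}800 \approx \$10{,}000 \text{ per month}
```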


Lessons Learned

From this experience, I took away several key points:

  • Pulling from ECR through a NAT Gateway with frequent task runs can cause costs to skyrocket
  • Using a VPC Endpoint can significantly cut costs
  • Documentation and handover are critical
    • All original developers had left, and there was little system documentation, which made root cause analysis harder
  • Operational knowledge is as important as development skills
    • If I had been more familiar with Cost Explorer or CloudWatch Logs, I could have found the cause much faster

When optimizing AWS operational costs, especially for workloads that frequently use ECR or S3, VPC Endpoints can be a game-changer.
If you’re facing similar challenges, I highly recommend considering them.
