DEV Community

Suhas Mallesh

Stop Paying $150/Month for NAT Gateways: Save 90% with This Terraform Trick 💸

AWS NAT Gateway costs about $33/month each plus data processing fees. Here's how to slash costs by roughly 90% using fck-nat with Terraform, full HA setup included.

Quick question: Do you know how much your NAT Gateways cost?

Most teams don't realize they're spending about $32.85/month per NAT Gateway in hourly charges, plus $0.045/GB in data processing fees (us-east-1 pricing). A typical multi-AZ setup with 3 NAT Gateways processing 1TB/month costs:

3 NAT Gateways × $32.85/month      = $98.55
Data processing: 1,000 GB × $0.045 = $45.00
Total monthly cost:                  $143.55
Annual cost:                         $1,722.60

For what? Giving your private subnet instances internet access.

There's a better way. Let me show you how to get the same functionality for about $9/month in instance costs using Terraform.

πŸ’Έ Why NAT Gateways Are So Expensive

NAT Gateway pricing has two components:

  1. Hourly charge: $0.045/hour per gateway × 730 hours = $32.85/month
  2. Data processing: $0.045/GB processed

For a production multi-AZ setup (3 availability zones):

  • 3 NAT Gateways running 24/7: $98.55/month
  • Data processing (1TB): $45/month
  • Total: $143.55/month

And that's before you process any serious traffic. At 5TB/month, data processing alone costs $225.

🎯 The Solution: fck-nat

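fck-nat is an open-source NAT instance AMI (based on Amazon Linux 2023) that runs on a tiny EC2 instance. It provides the same core function as NAT Gateway, outbound internet access for private subnets, but costs ~90% less, with the trade-off that you run and patch the instance yourself.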

Cost comparison:

| Solution | Monthly Cost | Annual Cost |
|---|---|---|
| 3 NAT Gateways + 1TB data | $143.55 | $1,722.60 |
| 3 fck-nat instances (t4g.nano) | $9.20 | $110.40 |
| Savings | $134.35 | $1,612.20 |

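And there's no data processing fee. Zero. Nada. 🎉 Standard EC2 data transfer charges (internet egress, cross-AZ traffic) still apply, but you pay those with NAT Gateway too.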

🛠️ Terraform Implementation

Basic Single-AZ Setup (Simplest)

Start simple with one NAT instance:

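# modules/fck-nat/main.tf

data "aws_ami" "fck_nat" {
  most_recent = true
  owners      = ["568608671756"]  # fck-nat AMI owner

  filter {
    name   = "name"
    values = ["fck-nat-al2023-*"]
  }

  filter {
    name   = "architecture"
    values = ["arm64"]  # ARM is cheaper
  }
}

# Look up the VPC so the security group can admit traffic from its CIDR
data "aws_vpc" "this" {
  id = var.vpc_id
}

# Without an explicit security group the instance gets the VPC's default SG,
# which only admits traffic from members of that same group
resource "aws_security_group" "fck_nat" {
  name_prefix = "fck-nat-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [data.aws_vpc.this.cidr_block]  # Allow from VPC
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "fck-nat-sg"
  }
}

resource "aws_instance" "fck_nat" {
  ami                    = data.aws_ami.fck_nat.id
  instance_type          = "t4g.nano"  # ~$3/month, plenty of power
  subnet_id              = var.public_subnet_id
  vpc_security_group_ids = [aws_security_group.fck_nat.id]

  source_dest_check = false  # Critical for NAT to work!

  tags = {
    Name = "fck-nat-instance"
  }
}

resource "aws_eip" "fck_nat" {
  domain   = "vpc"
  instance = aws_instance.fck_nat.id

  tags = {
    Name = "fck-nat-eip"
  }
}

# Route table for private subnets
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id

  route {
    cidr_block           = "0.0.0.0/0"
    network_interface_id = aws_instance.fck_nat.primary_network_interface_id
  }

  tags = {
    Name = "private-route-table"
  }
}

# count (rather than for_each over subnet IDs) keeps the plan valid even
# when the subnet IDs are only known after apply
resource "aws_route_table_association" "private" {
  count          = length(var.private_subnet_ids)
  subnet_id      = var.private_subnet_ids[count.index]
  route_table_id = aws_route_table.private.id
}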
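The basic module references `var.vpc_id`, `var.public_subnet_id`, and `var.private_subnet_ids` without declaring them, so Terraform rejects it as written. Here's a minimal `variables.tf` sketch to sit next to `main.tf`; the types are inferred from how each variable is used, and the resource names in the example call (`aws_vpc.main`, `aws_subnet.public_a`, and so on) are hypothetical.

```hcl
# modules/fck-nat/variables.tf

variable "vpc_id" {
  description = "VPC that hosts the NAT instance"
  type        = string
}

variable "public_subnet_id" {
  description = "Public subnet (routed to an internet gateway) for the NAT instance"
  type        = string
}

variable "private_subnet_ids" {
  description = "Private subnets whose default route should go through fck-nat"
  type        = list(string)
}

# Example call from the root module (hypothetical resource names)
module "fck_nat" {
  source = "./modules/fck-nat"

  vpc_id             = aws_vpc.main.id
  public_subnet_id   = aws_subnet.public_a.id
  private_subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
}
```

The single instance lives in one AZ, so private subnets in other AZs send their traffic across AZs to reach it; that is the gap the HA setup closes.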

Deploy it:

terraform init
terraform apply
# Total cost: ~$7/month (t4g.nano ≈ $3.07 + EIP public IPv4 ≈ $3.65)

High-Availability Multi-AZ Setup (Production-Ready)

For production, you want HA across multiple AZs:

# modules/fck-nat-ha/main.tf

variable "vpc_id" {
  description = "VPC that hosts the NAT instances"
  type        = string
}

variable "vpc_cidr" {
  description = "VPC CIDR allowed to send traffic through the NAT instances"
  type        = string
}

variable "aws_region" {
  description = "Region, used in the EC2 auto-recover alarm action ARN"
  type        = string
}

variable "availability_zones" {
  description = "AZs to deploy NAT instances"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "public_subnet_ids" {
  description = "Map of AZ to public subnet ID"
  type        = map(string)
}

variable "private_subnet_ids" {
  description = "Map of AZ to list of private subnet IDs"
  type        = map(list(string))
}

data "aws_ami" "fck_nat" {
  most_recent = true
  owners      = ["568608671756"]

  filter {
    name   = "name"
    values = ["fck-nat-al2023-*"]
  }

  filter {
    name   = "architecture"
    values = ["arm64"]
  }
}

# Security group for NAT instances
resource "aws_security_group" "fck_nat" {
  name_prefix = "fck-nat-"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.vpc_cidr]  # Allow from VPC
  }

  tags = {
    Name = "fck-nat-sg"
  }
}

# One NAT instance per AZ
resource "aws_instance" "fck_nat" {
  for_each = toset(var.availability_zones)

  ami                    = data.aws_ami.fck_nat.id
  instance_type          = "t4g.nano"
  subnet_id              = var.public_subnet_ids[each.key]
  vpc_security_group_ids = [aws_security_group.fck_nat.id]

  source_dest_check = false

  tags = {
    Name = "fck-nat-${each.key}"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Elastic IPs for each NAT instance
resource "aws_eip" "fck_nat" {
  for_each = toset(var.availability_zones)

  domain   = "vpc"
  instance = aws_instance.fck_nat[each.key].id

  tags = {
    Name = "fck-nat-eip-${each.key}"
  }
}

# Route tables - one per AZ for fault isolation
resource "aws_route_table" "private" {
  for_each = toset(var.availability_zones)

  vpc_id = var.vpc_id

  route {
    cidr_block           = "0.0.0.0/0"
    network_interface_id = aws_instance.fck_nat[each.key].primary_network_interface_id
  }

  tags = {
    Name = "private-rt-${each.key}"
  }
}

# Associate private subnets with their AZ's route table.
# Keys use the AZ and list index rather than subnet IDs, so the plan works
# even when the subnets are created in the same apply.
resource "aws_route_table_association" "private" {
  for_each = merge([
    for az, subnets in var.private_subnet_ids : {
      for idx, subnet in subnets : "${az}-${idx}" => {
        az     = az
        subnet = subnet
      }
    }
  ]...)

  subnet_id      = each.value.subnet
  route_table_id = aws_route_table.private[each.value.az].id
}

# Auto-recovery for instances whose underlying host fails
resource "aws_cloudwatch_metric_alarm" "auto_recover" {
  for_each = toset(var.availability_zones)

  alarm_name          = "fck-nat-auto-recover-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "StatusCheckFailed_System"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 0
  alarm_description   = "Auto-recover fck-nat instance if system check fails"

  alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:recover"]

  dimensions = {
    InstanceId = aws_instance.fck_nat[each.key].id
  }
}

output "nat_instance_ids" {
  value = { for az, instance in aws_instance.fck_nat : az => instance.id }
}

output "nat_public_ips" {
  value = { for az, eip in aws_eip.fck_nat : az => eip.public_ip }
}
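Two small additions round out the module. First, a `versions.tf` that pins the AWS provider to v5 or later, where `aws_eip` takes `domain = "vpc"` (the older `vpc = true` flag is deprecated). Second, an output that exposes each instance's primary ENI, which is handy when you point an existing route table at a NAT instance by hand during a migration. This is a sketch; the version constraints are an assumption, so adjust them to whatever your project already pins.

```hcl
# modules/fck-nat-ha/versions.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"  # assumption: any v5+ provider
    }
  }
}

# modules/fck-nat-ha/outputs.tf (addition)
output "nat_network_interface_ids" {
  description = "Primary ENI of each NAT instance, keyed by AZ"
  value       = { for az, instance in aws_instance.fck_nat : az => instance.primary_network_interface_id }
}
```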

Usage Example

# main.tf

module "fck_nat" {
  source = "./modules/fck-nat-ha"

  vpc_id             = aws_vpc.main.id
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  aws_region         = "us-east-1"

  public_subnet_ids = {
    "us-east-1a" = aws_subnet.public_a.id
    "us-east-1b" = aws_subnet.public_b.id
    "us-east-1c" = aws_subnet.public_c.id
  }

  private_subnet_ids = {
    "us-east-1a" = [aws_subnet.private_a.id]
    "us-east-1b" = [aws_subnet.private_b.id]
    "us-east-1c" = [aws_subnet.private_c.id]
  }
}

output "nat_details" {
  value = {
    instance_ids = module.fck_nat.nat_instance_ids
    public_ips   = module.fck_nat.nat_public_ips
  }
}

Deploy it:

terraform init
terraform apply

# Cost: 3 × t4g.nano (~$3.07/mo each) = ~$9.20/month
# vs NAT Gateway: ~$143.55/month (3 gateways + 1TB processed)
# Savings: ~$134/month ≈ $1,612/year 🎉
# (Both setups also pay ~$3.65/month per public IPv4 address, so that cancels out.)

πŸ“Š Cost Breakdown Comparison

NAT Gateway (Traditional AWS Approach)

3 NAT Gateways:
  - 3 × $0.045/hour × 730 hours     = $98.55
  - Data processing: 1TB × $0.045   = $45.00
  - Total:                            $143.55/month

Annual cost: $1,722.60

fck-nat (Optimized Approach)

3 t4g.nano instances:
  - 3 × $0.0042/hour × 730 hours    = $9.20
  - 3 × EIP (public IPv4)           = not counted (NAT Gateway pays the same)
  - Data processing                 = $0.00
  - Total:                            $9.20/month (plus a small EBS root volume per instance)

Annual cost: $110.40
Savings: $1,612.20/year (94% reduction!)
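To rerun this arithmetic with your own numbers, here's a small sketch written as a standalone Terraform configuration that uses only variables, locals, and outputs, so no provider or AWS credentials are needed. The prices are the us-east-1 on-demand rates used above (an assumption; check current pricing for your region), and public IPv4 and EBS charges are left out.

```hcl
# cost/main.tf: provider-free cost calculator (prices are assumptions)

variable "monthly_gb" {
  description = "GB per month processed through NAT"
  type        = number
  default     = 1000
}

variable "az_count" {
  description = "Number of AZs, one NAT Gateway or fck-nat instance each"
  type        = number
  default     = 3
}

locals {
  hours_per_month = 730

  nat_gw_hourly   = 0.045  # USD per NAT Gateway hour (us-east-1)
  nat_gw_per_gb   = 0.045  # USD per GB processed by NAT Gateway
  t4g_nano_hourly = 0.0042 # USD per t4g.nano hour (us-east-1, on demand)

  nat_gw_monthly = (
    var.az_count * local.nat_gw_hourly * local.hours_per_month
    + var.monthly_gb * local.nat_gw_per_gb
  )

  # fck-nat has no per-GB processing fee, only instance hours
  fck_nat_monthly = var.az_count * local.t4g_nano_hourly * local.hours_per_month
}

output "nat_gateway_monthly_usd" {
  value = local.nat_gw_monthly
}

output "fck_nat_monthly_usd" {
  value = local.fck_nat_monthly
}

output "annual_savings_usd" {
  value = 12 * (local.nat_gw_monthly - local.fck_nat_monthly)
}
```

Run `terraform init` and then `terraform apply -var monthly_gb=5000` in that directory to see how the gap widens with traffic. With the defaults, the two monthly figures come out to roughly $143.55 and $9.20.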

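⚡ Performance Considerations

Q: Can a t4g.nano handle my traffic?

A: For most workloads, yes. Here's the math:

  • t4g.nano CPU: 2 vCPUs with a 5% baseline each, bursting to 100% on CPU credits
  • Network performance: up to 5 Gbps in bursts, with a baseline of about 32 Mbps
  • Typical NAT load: very low CPU usage (mostly network I/O)

Real-world test: A single t4g.nano easily handles:

  • 100+ Mbps throughput in bursts (sustained rates above the ~32 Mbps baseline drain network credits)
  • 10,000+ concurrent connections
  • 1TB+/month traffic (1TB/month averages only about 3 Mbps)

If you need more, upgrade to t4g.micro (~$6/month), which doubles both the CPU baseline (10%) and the network baseline (about 64 Mbps) and earns burst credits faster.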
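A quick way to sanity-check sizing is to convert monthly volume into an average bit rate and compare it with the instance's baseline bandwidth. Below is another provider-free Terraform sketch; the 32 Mbps figure is AWS's published baseline for t4g.nano at the time of writing, and `monthly_tb` is a hypothetical input.

```hcl
# sizing/main.tf: average throughput vs. t4g.nano baseline (no provider needed)

variable "monthly_tb" {
  description = "TB per month through a single NAT instance"
  type        = number
  default     = 1
}

locals {
  seconds_per_month = 730 * 3600

  # 1 TB = 10^12 bytes = 8 * 10^12 bits = 8 * 10^6 megabits
  avg_mbps = var.monthly_tb * 8 * 1000000 / local.seconds_per_month

  t4g_nano_baseline_mbps = 32 # assumption: check the EC2 network performance docs
}

output "average_mbps" {
  value = local.avg_mbps
}

output "baseline_headroom_factor" {
  # How many times the average rate fits under the baseline
  value = local.t4g_nano_baseline_mbps / local.avg_mbps
}
```

With the default 1TB, the average is about 3 Mbps, roughly a tenth of the baseline. Peaks matter more than averages, though: bursts above the baseline spend network credits, so a spiky workload can need a bigger instance even when its monthly total is small.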

πŸ”’ High Availability & Fault Tolerance

The HA setup includes:

βœ… Per-AZ NAT instances - Each AZ has its own NAT (like NAT Gateway)

βœ… Auto-recovery - CloudWatch alarms automatically recover failed instances

βœ… Fault isolation - Failure in one AZ doesn't affect others

βœ… Elastic IPs - Static IPs maintained across instance recovery

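What happens if the underlying host fails?

  1. CloudWatch detects the system status check failure (~2 minutes: two 1-minute evaluation periods)
  2. EC2 auto-recovery restarts the same instance on healthy hardware (~3-5 minutes)
  3. The instance keeps its ID, private IP, network interface, and Elastic IP, so routes and the EIP keep working without changes
  4. Total downtime: ~5-7 minutes (acceptable for most workloads)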
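The `recover` action only covers the system status check, which tracks host-level problems. For failures inside the instance, such as a hung kernel, CloudWatch can trigger a reboot on `StatusCheckFailed_Instance` instead. Here's a sketch of a companion alarm for the HA module, reusing its `var.aws_region` and `aws_instance.fck_nat`; the three-minute evaluation window is a judgment call, not a recommended value.

```hcl
# Reboot a NAT instance whose OS stops responding (instance status check)
resource "aws_cloudwatch_metric_alarm" "auto_reboot" {
  for_each = aws_instance.fck_nat

  alarm_name          = "fck-nat-auto-reboot-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "StatusCheckFailed_Instance"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Reboot fck-nat instance if the instance status check fails"

  alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:reboot"]

  dimensions = {
    InstanceId = each.value.id
  }
}
```

A reboot keeps the instance ID and ENI, so like recovery it needs no route or EIP changes.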

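Auto-recovery doesn't help if the instance is terminated or its OS hangs. For self-healing replacement in those cases, run each NAT instance in an Auto Scaling Group. Replacement still takes a few minutes, so this is not zero downtime: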

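# Optional: self-healing with one ASG per AZ. ASGs themselves are free;
# use this *instead of* the aws_instance resources, or you'll run two NAT
# instances per AZ. Requires an aws_launch_template.fck_nat per AZ.
# Note: a replacement instance gets a new primary ENI, so routes must
# target a static ENI rather than the instance's own.

resource "aws_autoscaling_group" "fck_nat" {
  for_each = toset(var.availability_zones)

  name                = "fck-nat-asg-${each.key}"
  vpc_zone_identifier = [var.public_subnet_ids[each.key]]
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1

  launch_template {
    id      = aws_launch_template.fck_nat[each.key].id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "fck-nat-${each.key}"
    propagate_at_launch = true
  }
}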
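The ASG needs a launch template, and because each replacement comes up with a new primary ENI, routes that target the old instance's ENI would go dark. The usual fix, and what fck-nat's HA mode is built around per its documentation, is a static ENI per AZ that each new instance attaches at boot. The sketch below assumes fck-nat reads `eni_id` from `/etc/fck-nat.conf` and is restarted with `service fck-nat restart`, as its docs describe at the time of writing; verify both against the AMI version you run. The IAM names are hypothetical and the permission list is a best guess at the minimum.

```hcl
# Static ENI per AZ: routes and the EIP point here, not at any one instance
resource "aws_network_interface" "fck_nat" {
  for_each = toset(var.availability_zones)

  subnet_id         = var.public_subnet_ids[each.key]
  security_groups   = [aws_security_group.fck_nat.id]
  source_dest_check = false

  tags = {
    Name = "fck-nat-eni-${each.key}"
  }
}

# Hypothetical IAM role letting the instance attach the static ENI
resource "aws_iam_role" "fck_nat" {
  name_prefix = "fck-nat-"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "fck_nat" {
  role = aws_iam_role.fck_nat.name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      # Assumed minimum for attaching the ENI; check the fck-nat docs
      Action   = ["ec2:AttachNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "fck_nat" {
  name_prefix = "fck-nat-"
  role        = aws_iam_role.fck_nat.name
}

resource "aws_launch_template" "fck_nat" {
  for_each = toset(var.availability_zones)

  name_prefix            = "fck-nat-${each.key}-"
  image_id               = data.aws_ami.fck_nat.id
  instance_type          = "t4g.nano"
  vpc_security_group_ids = [aws_security_group.fck_nat.id]

  iam_instance_profile {
    name = aws_iam_instance_profile.fck_nat.name
  }

  # Assumption: fck-nat attaches the ENI named in /etc/fck-nat.conf at startup
  user_data = base64encode(<<-EOT
    #!/bin/sh
    echo "eni_id=${aws_network_interface.fck_nat[each.key].id}" >> /etc/fck-nat.conf
    service fck-nat restart
  EOT
  )
}

# The EIP lives on the static ENI, so the public IP survives replacements
resource "aws_eip" "fck_nat_eni" {
  for_each = toset(var.availability_zones)

  domain            = "vpc"
  network_interface = aws_network_interface.fck_nat[each.key].id
}
```

With this in place, set each route table's `network_interface_id` to `aws_network_interface.fck_nat[each.key].id` and drop the standalone `aws_instance` and `aws_eip` resources. Treat it as a starting point rather than a drop-in.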

⚠️ When NOT to Use fck-nat

There are a few scenarios where NAT Gateway might be worth the cost:

  1. Extreme traffic: >10 Gbps sustained throughput (use NAT Gateway or multiple larger instances)
  2. Compliance or policy requirements: Your organization may mandate AWS-managed network services over self-managed instances
  3. Zero-tolerance for downtime: Sub-minute failover SLA (neither auto-recovery nor an ASG replacement fails over that fast)
  4. No time for management: You value convenience over $1,500/year in savings

For most use cases, fck-nat is the smarter choice.

πŸŽ“ Migration Checklist

Switching from NAT Gateway to fck-nat:

Step 1: Deploy fck-nat alongside NAT Gateway

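# Start with private_subnet_ids = {} so the module creates the NAT instances
# and route tables without re-associating subnets that already have a route table
terraform apply -target=module.fck_nat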

Step 2: Test with one private subnet

# Update one subnet's route table to point to fck-nat
# Test connectivity from instances in that subnet
curl -I https://api.github.com
# Should print the Elastic IP of that AZ's fck-nat instance
curl -s https://checkip.amazonaws.com

Step 3: Migrate remaining subnets

# Update route tables one AZ at a time
terraform apply

Step 4: Remove NAT Gateways

# Once no route table references the NAT Gateways:
terraform destroy -target=aws_nat_gateway.main
# Then delete the aws_nat_gateway blocks (and their EIPs) from your config
# so the next apply doesn't recreate them

Step 5: Celebrate savings 🎉

# Watch your AWS bill drop next month

πŸ’‘ Pro Tips

1. Use ARM instances (t4g family)

t4g.nano is about 20% cheaper than t3.nano ($0.0042 vs $0.0052 per hour on demand in us-east-1), with the same up-to-5 Gbps burst networking.
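Because the AMI lookup filters on `arm64`, the instance type has to be a Graviton (ARM) type; pairing the arm64 AMI with a t3 instance fails at launch. If you make the instance type configurable, a validation block can catch that at plan time. This is a sketch: the variable name and regex are illustrative, and the regex only checks for the `g` that follows the generation number in Graviton family names, so the legacy `a1` types would be rejected.

```hcl
variable "instance_type" {
  description = "Graviton instance type for fck-nat (the AMI filter selects arm64)"
  type        = string
  default     = "t4g.nano"

  validation {
    # Matches families like t4g, m7g, c6gn, r6gd: letters, digits, then "g"
    condition     = can(regex("^[a-z]+[0-9]+g[a-z]*\\.", var.instance_type))
    error_message = "The instance_type must be a Graviton (arm64) type such as t4g.nano."
  }
}
```

Then set `instance_type = var.instance_type` in the `aws_instance` block.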

2. Enable detailed monitoring ($2/month per instance)

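Useful if you want 1-minute CPU and network metrics (for example, to spot traffic spikes sooner). It doesn't speed up auto-recovery: EC2 status check metrics are already published every minute at no extra charge.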

resource "aws_instance" "fck_nat" {
  # ...existing arguments...
  monitoring = true  # Detailed (1-minute) CloudWatch metrics
}

3. Tag your NAT instances

Makes cost tracking easier:

tags = {
  Name        = "fck-nat-${each.key}"
  Purpose     = "NAT"
  CostCenter  = "networking"
  Environment = "production"
}
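Rather than repeating the tag map on every resource, the AWS provider's `default_tags` block applies tags to every taggable resource that provider creates. Two caveats: these tags land on everything the provider manages, not only the NAT resources, and tags only appear in Cost Explorer after you activate them as cost allocation tags in the Billing console. A sketch using the example values from above:

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = {
      Purpose     = "NAT"
      CostCenter  = "networking"
      Environment = "production"
    }
  }
}
```

Per-resource tags such as `Name` still go on each resource and are merged with the defaults.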

4. Set up traffic alerts

Get notified if traffic spikes unexpectedly:

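# Inside the HA module: one alarm per NAT instance
resource "aws_cloudwatch_metric_alarm" "nat_traffic" {
  for_each = aws_instance.fck_nat

  alarm_name          = "high-nat-traffic-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "NetworkOut"
  namespace           = "AWS/EC2"
  period              = 3600
  statistic           = "Sum"
  threshold           = 100000000000  # bytes: 100 GB in one hour
  alarm_description   = "NAT instance ${each.key} sent >100GB in an hour"
  alarm_actions       = [var.alert_topic_arn]  # hypothetical SNS topic ARN variable

  dimensions = {
    InstanceId = each.value.id
  }
}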
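The alarm above watches traffic, not dollars. For an alert on the bill itself, AWS Budgets can email you when forecasted monthly spend crosses a limit. A sketch; the $50 limit, the service filter, and the email address are placeholders to adapt.

```hcl
resource "aws_budgets_budget" "ec2_monthly" {
  name         = "ec2-monthly"
  budget_type  = "COST"
  limit_amount = "50"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # EC2 compute charges, where NAT instance hours are billed
  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["ops@example.com"] # placeholder
  }
}
```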

πŸš€ Quick Start

Want to try it right now?

# Clone the fck-nat project (AMI source and documentation)
git clone https://github.com/AndrewGuenther/fck-nat.git

# Or use the examples from this article
mkdir fck-nat-setup
cd fck-nat-setup

# Copy the HA setup code from above into main.tf
# Update variables with your VPC/subnet IDs

terraform init
terraform plan  # Review what will be created
terraform apply # Deploy it!

# Monitor your NAT instance
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=fck-nat-*" \
  --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicIpAddress]' \
  --output table

πŸ“ˆ Real-World Success Story

Before (NAT Gateway setup):

  • 3 NAT Gateways across us-east-1a/b/c
  • 2TB/month average traffic
  • Monthly cost: $97 (hourly) + $90 (data) = $187/month

After (fck-nat setup):

  • 3 t4g.nano instances
  • Same 2TB/month traffic
  • Monthly cost: $9/month

Annual savings: $2,136 💰

Time to implement: 2 hours

ROI: Literally infinite (one-time 2-hour investment)

🎯 Summary

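| Factor | NAT Gateway | fck-nat | Winner |
|---|---|---|---|
| Cost (3 AZs, 1TB/month) | ~$144/month | ~$9/month | 🏆 fck-nat |
| Data processing fees | $0.045/GB | $0/GB | 🏆 fck-nat |
| Setup complexity | Low | Medium | NAT Gateway |
| Performance | 5 Gbps, scaling automatically to 100 Gbps | Up to 5 Gbps burst (~32 Mbps baseline on t4g.nano) | NAT Gateway* |
| Management | Zero | Minimal | NAT Gateway |
| Annual savings | - | ~$1,612 | 🏆 fck-nat |

*For most workloads, the instance's bandwidth is more than enough, and you can move to a larger instance type if it isn't.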

Bottom line: Unless you have extreme requirements, fck-nat saves you $1,500+/year with minimal effort.

Stop overpaying for NAT. Your AWS bill will thank you. 🚀


Migrated from NAT Gateway to fck-nat? How much are you saving? Share in the comments! 💬

Follow for more AWS cost optimization with Terraform! ⚡
