AWS CDK 100 Drill Exercises #004: NAT Instance V2 — Cost-Effective NAT with Automated Scheduling and Patch Management

Level 300

Introduction

This is the fourth installment of "AWS CDK 100 Drill Exercises."

For more information about AWS CDK 100 Drill Exercises, please refer to this article.

In the previous article, we learned VPC basics and introduced a method to consolidate NAT Gateways into a single gateway for cost optimization in development environments. This time, we'll explain how to replace the NAT Gateway with an EC2-based NAT instance for even greater cost savings.

Why NAT Instance?

Development and test environments often don't need to run 24/7. However, a NAT Gateway is billed for every hour it is provisioned, so it incurs charges even during idle hours, and it cannot be stopped.

Cost Comparison (Tokyo Region):

| Resource | Monthly Cost | Annual Cost | Features |
| --- | --- | --- | --- |
| NAT Gateway (1 gateway) | ~$44.64 | ~$535.68 | 24/7 operation, data transfer charges separate |
| NAT Instance (t4g.nano) | ~$4.66 | ~$55.92 | Schedulable start/stop |
| Savings Rate | 89.6% reduction | 89.6% reduction | - |

Furthermore, when operating only during business hours (weekdays 9:00-18:00):

| Resource | Monthly Cost | Annual Cost | Features |
| --- | --- | --- | --- |
| NAT Instance (business hours only) | ~$1.84 | ~$22.08 | 9 hours on weekdays only |
| Savings Rate | 95.9% reduction | 95.9% reduction | - |

💡 In development environments, you can save about $513 annually ($535.68 − $22.08)!

What You'll Learn

  • NAT Instance v2 implementation method
  • Automated start/stop scheduling with EventBridge
  • Static Elastic IP assignment to NAT Instance
  • Monitoring NAT Instance state changes and SNS notifications
  • Automated patch application using Systems Manager Patch Manager
  • Maintenance window configuration and operations
  • Trade-offs between NAT Gateway and NAT Instance

📁 Code Repository: All code examples for this exercise are available on GitHub.

Architecture Overview

The basic VPC configuration is the same as vpc-basics, with the following differences.

Key Changes

  1. NAT Gateway → NAT Instance: Changed from managed NAT service to EC2-based NAT instance
  2. EventBridge Schedule: Automated start/stop schedule configuration
  3. Elastic IP Assignment: Assigned static IP address to NAT Instance
  4. SNS Notification: Monitoring NAT Instance state changes
  5. Patch Manager: Automated patch application via Systems Manager

Prerequisites

In addition to the prerequisites of vpc-basics, the following are required:

  • Basic understanding of EventBridge and SNS
  • Knowledge of EC2 instance types

NAT Instance v2 Implementation

1. Creating NAT Provider

In CDK v2, you can easily create a NAT instance using NatProvider.instanceV2().

import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Creating NAT Instance Provider
const natProvider = ec2.NatProvider.instanceV2({
  instanceType: ec2.InstanceType.of(
    ec2.InstanceClass.T4G,  // ARM-based Graviton2
    ec2.InstanceSize.NANO   // Smallest size for cost optimization
  ),
  machineImage: ec2.MachineImage.latestAmazonLinux2023({
    edition: ec2.AmazonLinuxEdition.STANDARD,
    cpuType: ec2.AmazonLinuxCpuType.ARM_64,  // For Graviton2
  }),
  defaultAllowedTraffic: ec2.NatTrafficDirection.OUTBOUND_ONLY,
});

// Applying NAT Provider to VPC
const vpc = new ec2.Vpc(this, 'VpcNatInstanceV2', {
  vpcName,
  ipAddresses: ec2.IpAddresses.cidr('10.1.0.0/16'),
  maxAzs: 3,
  natGateways: 3,  // One per AZ
  natGatewayProvider: natProvider,  // Using NAT Instance
  subnetConfiguration: [
    // Subnet configuration same as before
    // ...
  ],
});

Why t4g.nano?

※ Tokyo Region pricing

| Instance Type | vCPU | Memory | Price/hour (Tokyo) | Monthly | Use Case |
| --- | --- | --- | --- | --- | --- |
| t4g.nano | 2 | 0.5 GB | $0.0054 | ~$3.94 | Small traffic for development environments |
| t4g.micro | 2 | 1 GB | $0.0108 | ~$7.88 | Medium traffic |
| t3.nano | 2 | 0.5 GB | $0.0068 | ~$4.96 | When x86 is required |

💡 Benefits of Graviton2 (ARM):

  • Approximately 20% cheaper than equivalent x86 instances
  • Superior cost-performance ratio
  • Fully supported by Amazon Linux 2023

2. Allowing Traffic from VPC to NAT Instance

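Because the provider is created with defaultAllowedTraffic set to OUTBOUND_ONLY, inbound traffic is blocked by default. To forward packets for the private subnets, the NAT Instance needs to accept all traffic from within the VPC.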

// Allow all traffic from VPC CIDR
(natProvider as ec2.NatInstanceProviderV2).connections.allowFrom(
  ec2.Peer.ipv4(vpc.vpcCidrBlock),
  ec2.Port.allTraffic(),
  'Allow all traffic from VPC',
);

This allows resources in private subnets to access the internet via NAT Instance.
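
To make the rule above concrete, the following sketch checks whether a source address falls inside the VPC CIDR, which is the kind of match the security group applies for ec2.Peer.ipv4(vpc.vpcCidrBlock). The helper names ipv4ToInt and isInCidr are made up for illustration; the real check happens inside AWS, not in your CDK code.

```typescript
// Hypothetical helpers for illustration only; not part of aws-cdk-lib.

/** Convert a dotted-quad IPv4 address to an unsigned 32-bit number. */
function ipv4ToInt(ip: string): number {
  const parts = ip.split('.');
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Multiply instead of bit-shifting to stay clear of signed 32-bit overflow.
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

/** Return true when `ip` lies inside the CIDR block, e.g. 10.1.0.0/16. */
function isInCidr(ip: string, cidr: string): boolean {
  const [base = '', prefixText = ''] = cidr.split('/');
  const prefix = Number(prefixText);
  if (prefixText === '' || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid CIDR block: ${cidr}`);
  }
  const blockSize = 2 ** (32 - prefix);
  // Align the base address to the start of its block.
  const start = Math.floor(ipv4ToInt(base) / blockSize) * blockSize;
  const address = ipv4ToInt(ip);
  return address >= start && address < start + blockSize;
}

// 10.1.0.0/16 is the VPC CIDR used in this exercise.
console.log(isInCidr('10.1.42.7', '10.1.0.0/16')); // true: an address inside the VPC
console.log(isInCidr('10.2.0.1', '10.1.0.0/16'));  // false: outside the VPC
```

Anything that matches the VPC CIDR is accepted on all ports, which is why the rule is acceptable for a NAT instance but would be too broad for an application server.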

Automated Start/Stop with EventBridge

You can further reduce costs by stopping NAT Instance outside business hours.

1. Creating IAM Role

import * as iam from 'aws-cdk-lib/aws-iam';

const natInstanceScheduleRole = new iam.Role(this, 'NatInstanceScheduleRole', {
  roleName: [props.project, props.environment, 'NatInstanceSchedule'].join('-'),
  assumedBy: new iam.ServicePrincipal('events.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName(
      'service-role/AmazonSSMAutomationRole'
    ),
  ],
});

2. Creating Schedule Rules

import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';

const region = cdk.Stack.of(this).region;

// Schedule configuration (UTC time)
const startCronSchedule = 'cron(0 0 ? * * *)'; // 00:00 UTC (JST 09:00)
const stopCronSchedule = 'cron(0 9 ? * * *)';  // 09:00 UTC (JST 18:00)

const natInstanceIds: string[] = [];

natProvider.configuredGateways.forEach((nat, index) => {
  natInstanceIds.push(nat.gatewayId);

  // Start schedule
  new events.CfnRule(this, `EC2StartRule${index + 1}`, {
    name: [props.project, props.environment, 'NATStartRule', nat.gatewayId].join('-'),
    description: `${nat.gatewayId} ${startCronSchedule} Start`,
    scheduleExpression: startCronSchedule,
    targets: [{
      arn: `arn:aws:ssm:${region}::automation-definition/AWS-StartEC2Instance:$DEFAULT`,
      id: 'TargetEC2Instance1',
      input: `{"InstanceId": ["${nat.gatewayId}"]}`,
      roleArn: natInstanceScheduleRole.roleArn,
    }],
  });

  // Stop schedule
  new events.CfnRule(this, `EC2StopRule${index + 1}`, {
    name: [props.project, props.environment, 'NATStopRule', nat.gatewayId].join('-'),
    description: `${nat.gatewayId} ${stopCronSchedule} Stop`,
    scheduleExpression: stopCronSchedule,
    targets: [{
      arn: `arn:aws:ssm:${region}::automation-definition/AWS-StopEC2Instance:$DEFAULT`,
      id: 'TargetEC2Instance1',
      input: `{"InstanceId": ["${nat.gatewayId}"]}`,
      roleArn: natInstanceScheduleRole.roleArn,
    }],
  });
});
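
The start and stop rules differ only in the Automation document they invoke, so the target object can be produced by one small function. The sketch below is plain TypeScript with no CDK dependency; buildAutomationTarget and its parameter names are hypothetical, and it uses JSON.stringify rather than a hand-written template string so that the input is always valid JSON.

```typescript
// Hypothetical helper; the returned object has the same fields as the
// entries in the `targets` array passed to events.CfnRule above.
type AutomationDocument = 'AWS-StartEC2Instance' | 'AWS-StopEC2Instance';

interface AutomationTarget {
  arn: string;
  id: string;
  input: string;
  roleArn: string;
}

function buildAutomationTarget(
  region: string,
  documentName: AutomationDocument,
  instanceId: string,
  roleArn: string,
): AutomationTarget {
  return {
    // AWS-owned documents have an empty account field in the ARN.
    arn: `arn:aws:ssm:${region}::automation-definition/${documentName}:$DEFAULT`,
    id: 'TargetEC2Instance1',
    // The Automation documents take InstanceId as a list of strings.
    input: JSON.stringify({ InstanceId: [instanceId] }),
    roleArn,
  };
}

const stopTarget = buildAutomationTarget(
  'ap-northeast-1',
  'AWS-StopEC2Instance',
  'i-0123456789abcdef0', // placeholder instance ID
  'arn:aws:iam::123456789012:role/example-schedule-role', // placeholder role ARN
);
console.log(stopTarget.arn);
console.log(stopTarget.input);
```

Inside the forEach loop, instanceId would be nat.gatewayId and roleArn would be natInstanceScheduleRole.roleArn. CDK tokens are strings, so they should pass through JSON.stringify and be resolved at deploy time, but that is worth confirming in the synthesized template.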

Understanding Cron Expressions

cron(minute hour day-of-month month day-of-week year)

Examples:
cron(0 0 ? * * *)     # Every day at 00:00 UTC
cron(0 9 ? * * *)     # Every day at 09:00 UTC
cron(0 0 ? * MON-FRI *) # Weekdays only at 00:00 UTC
cron(0 0 1 * ? *)     # 1st day of every month at 00:00 UTC

💡 Time Zone Notes:

  • EventBridge cron uses UTC time
  • JST = UTC + 9 hours
  • JST 09:00 = UTC 00:00
  • JST 18:00 = UTC 09:00
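
The conversion in the notes above is easy to get wrong for times before 09:00 JST, because the UTC time then falls on the previous day and the day-of-week field has to shift with it. The following is a minimal sketch of that conversion; jstToUtcCron is a hypothetical helper, not an EventBridge or CDK API.

```typescript
// Hypothetical helper for illustration; EventBridge only sees the
// resulting cron string.
type Weekday = 'SUN' | 'MON' | 'TUE' | 'WED' | 'THU' | 'FRI' | 'SAT';
const WEEKDAYS: readonly Weekday[] = ['SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT'];
const JST_OFFSET_HOURS = 9; // JST = UTC + 9, with no daylight saving time

/** Build an EventBridge cron expression (UTC) from a JST wall-clock time. */
function jstToUtcCron(hourJst: number, minute: number, days: readonly Weekday[] = []): string {
  if (!Number.isInteger(hourJst) || hourJst < 0 || hourJst > 23) {
    throw new RangeError(`hourJst must be an integer from 0 to 23, got ${hourJst}`);
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError(`minute must be an integer from 0 to 59, got ${minute}`);
  }
  const shifted = hourJst - JST_OFFSET_HOURS;
  const crossesMidnight = shifted < 0; // the UTC time is on the previous day
  const hourUtc = (shifted + 24) % 24;
  const dayField =
    days.length === 0
      ? '*' // every day
      : days
          .map((day) => (crossesMidnight ? WEEKDAYS[(WEEKDAYS.indexOf(day) + 6) % 7] : day))
          .join(',');
  // Day-of-month is '?' because the day-of-week field is the one in use.
  return `cron(${minute} ${hourUtc} ? * ${dayField} *)`;
}

console.log(jstToUtcCron(9, 0));            // cron(0 0 ? * * *)
console.log(jstToUtcCron(8, 30, ['MON']));  // cron(30 23 ? * SUN *)
```

For the 09:00 and 18:00 JST times used in this article no day shift occurs, which is why MON-FRI can be used unchanged in the UTC expressions.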

Schedule examples by environment:

| Environment | Operating Hours (JST) | Start (UTC) | Stop (UTC) | Monthly Cost |
| --- | --- | --- | --- | --- |
| Development | Weekdays 9:00-18:00 | cron(0 0 ? * MON-FRI *) | cron(0 9 ? * MON-FRI *) | ~$1.07 (t4g.nano) |
| Test | Daily 9:00-18:00 | cron(0 0 ? * * *) | cron(0 9 ? * * *) | ~$1.46 (t4g.nano, 9 hours × 30 days) |
| Staging | 24/7 | Recommend NAT Gateway, same as production | Recommend NAT Gateway, same as production | ~$44.64 |
| Production | 24/7 | Recommend NAT Gateway | Recommend NAT Gateway | ~$44.64 |

Static Elastic IP Assignment

By assigning a static Elastic IP to NAT Instance, you can fix the source IP address for outbound communications.

const outboundEips: ec2.CfnEIP[] = [];

natProvider.configuredGateways.forEach((nat, index) => {
  // Creating Elastic IP
  const eip = new ec2.CfnEIP(this, `NatEip${index + 1}`, {
    tags: [{
      key: "Name",
      value: `${props.project}/${props.environment}/NatEIP${index + 1}`
    }],
  });
  eip.applyRemovalPolicy(cdk.RemovalPolicy.DESTROY);

  // Associate Elastic IP with NAT Instance
  new ec2.CfnEIPAssociation(this, `NatEipAssociation${index + 1}`, {
    allocationId: eip.attrAllocationId,
    instanceId: nat.gatewayId,
  });

  // Output as CloudFormation Output
  new cdk.CfnOutput(this, `NatInstance${index + 1}PublicIP`, {
    value: eip.ref,
    description: `Public IP address of NAT Instance ${index + 1}`,
  });

  outboundEips.push(eip);
});

Why Is a Static IP Needed?

  1. External Service Whitelisting

    • Many third-party APIs require IP whitelisting
    • Database access control via security groups
    • VPN connection configurations
  2. Log Analysis and Troubleshooting

    • Tracking outbound traffic with fixed source IP
    • Easier identification of access source
  3. Compliance Requirements

    • Some industries require tracking of outbound communication sources
    • Auditing and logging requirements

Monitoring NAT Instance State Changes

Set up SNS notifications to receive alerts when NAT Instance state changes.

1. Creating SNS Topic

import * as sns from 'aws-cdk-lib/aws-sns';
import * as subscriptions from 'aws-cdk-lib/aws-sns-subscriptions';

// Create SNS Topic
const natInstanceTopic = new sns.Topic(this, 'NatInstanceTopic', {
  topicName: [props.project, props.environment, 'NatInstance'].join('-'),
  displayName: 'NAT Instance State Change Notifications',
});

// Email subscription (optional)
if (props.notificationEmail) {
  natInstanceTopic.addSubscription(
    new subscriptions.EmailSubscription(props.notificationEmail)
  );
}

2. Creating EventBridge Rule

import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

// Rule for NAT Instance state change
const natInstanceStateRule = new events.Rule(this, 'NatInstanceStateRule', {
  ruleName: [props.project, props.environment, 'NatInstanceState'].join('-'),
  description: 'Notify when NAT Instance state changes',
  eventPattern: {
    source: ['aws.ec2'],
    detailType: ['EC2 Instance State-change Notification'],
    detail: {
      'instance-id': natInstanceIds,
      'state': ['pending', 'running', 'stopping', 'stopped', 'terminated'],
    },
  },
});

// Add SNS topic as target
natInstanceStateRule.addTarget(new targets.SnsTopic(natInstanceTopic, {
  message: events.RuleTargetInput.fromObject({
    subject: `[${props.project.toUpperCase()}-${props.environment.toUpperCase()}] NAT Instance State Change`,
    instanceId: events.EventField.fromPath('$.detail.instance-id'),
    state: events.EventField.fromPath('$.detail.state'),
    time: events.EventField.fromPath('$.time'),
  }),
}));

3. Setting Up Email Notifications (Optional)

When an email address is provided, you'll receive notifications like:

{
  "subject": "[YOURPROJECT-DEV] NAT Instance State Change",
  "instanceId": "i-0123456789abcdef0",
  "state": "running",
  "time": "2024-01-15T12:00:00Z"
}

Notified Events

  • pending: Instance is starting
  • running: Instance is running
  • stopping: Instance is stopping
  • stopped: Instance is stopped
  • terminated: Instance is terminated
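
To show how the event fields map onto the notification body, here is a small sketch that performs the same extraction in plain TypeScript. The interface covers only the fields the rule reads, and toNotification is a hypothetical function; in the deployed stack this mapping is done by the EventBridge input transformer, not by your own code.

```typescript
// Minimal shape of an "EC2 Instance State-change Notification" event,
// limited to the fields used by the rule above.
interface Ec2StateChangeEvent {
  source: string;
  'detail-type': string;
  time: string;
  detail: {
    'instance-id': string;
    state: string;
  };
}

interface NatNotification {
  subject: string;
  instanceId: string;
  state: string;
  time: string;
}

/** Hypothetical equivalent of the RuleTargetInput.fromObject(...) mapping. */
function toNotification(
  event: Ec2StateChangeEvent,
  project: string,
  environment: string,
): NatNotification {
  return {
    subject: `[${project.toUpperCase()}-${environment.toUpperCase()}] NAT Instance State Change`,
    instanceId: event.detail['instance-id'], // $.detail.instance-id
    state: event.detail.state,               // $.detail.state
    time: event.time,                        // $.time
  };
}

// Sample event with placeholder values.
const sampleEvent: Ec2StateChangeEvent = {
  source: 'aws.ec2',
  'detail-type': 'EC2 Instance State-change Notification',
  time: '2024-01-15T12:00:00Z',
  detail: { 'instance-id': 'i-0123456789abcdef0', state: 'running' },
};

console.log(JSON.stringify(toNotification(sampleEvent, 'YourProject', 'dev'), null, 2));
```

Note that subject here is only a field inside the JSON message body; it does not set the subject line of the SNS email.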

💡 Tips:

  • Set up Slack/Teams integration via SNS for team notifications
  • Log important events to CloudWatch Logs
  • Create dashboards combining CloudWatch metrics

NAT Gateway vs NAT Instance: Trade-offs

Feature Comparison Table

| Feature | NAT Gateway | NAT Instance |
| --- | --- | --- |
| Availability | ✅ Managed by AWS | Self-managed (depends on the instance staying healthy) |
| Performance | ✅ Up to 100 Gbps | Depends on instance type |
| Cost (24/7) | ~$44.64/month | ✅ ~$3.94/month (t4g.nano) |
| Scheduled Control | ❌ Not available | ✅ EventBridge schedule |
| Patch Management | ✅ Not required (Managed) | Required (SSM Patch Manager) |
| Scalability | ✅ Automatic | Manual (instance type change) |
| Monitoring | CloudWatch metrics | CloudWatch + OS metrics |
| Single Point of Failure | ✅ No | Yes (single instance) |

Recommended Use Cases

Use Cases for NAT Gateway

  1. Production Environments

    • 24/7 operation required
    • High availability is critical
    • Handling large traffic volume
  2. High Performance Requirements

    • Burst traffic handling needed
    • Bandwidth requirements over 5 Gbps
    • Multiple concurrent connections
  3. Zero Operational Overhead

    • No infrastructure management desired
    • Fully managed service preferred
    • No patching management needed

Use Cases for NAT Instance

  1. Development/Test Environments

    • Traffic only during business hours
    • Cost optimization prioritized
    • Downtime acceptable
  2. Cost Reduction Priority

    • Small-scale traffic
    • Scheduled operation possible
    • Operational overhead acceptable
  3. Custom Network Control

    • Custom security groups needed
    • Traffic filtering required
    • Specific logging requirements

Performance Comparison

| Metric | NAT Gateway | NAT Instance (t4g.nano) | NAT Instance (t4g.micro) |
| --- | --- | --- | --- |
| Bandwidth | Up to 100 Gbps | Up to 5 Gbps | Up to 5 Gbps |
| Max Connections | 55,000~440,000 | ~55,000 | ~55,000 |
| Latency | Low | Low | Low |

💡 Tips: For development environments with low traffic, t4g.nano is sufficient. For production use, consider t4g.medium or larger.

Cost Analysis

Detailed Cost Comparison (Tokyo Region)

1. NAT Gateway (Traditional)

Base Charge:
- $0.062/hour × 730 hours = $45.26/month

Data Processing:
- $0.062/GB
- Example: 100 GB/month = $6.20

Total: ~$51.46/month per AZ
3 AZs: ~$154.38/month

2. NAT Instance (t4g.nano) - 24/7 Operation

Instance Charge:
- $0.0054/hour × 730 hours = $3.94/month

Elastic IP (public IPv4 address):
- $0.005/hour × 730 hours = $3.65/month (billed whether the instance is running or stopped)

Data Processing:
- None: the $0.062/GB NAT Gateway processing fee does not apply to NAT instances
- Standard EC2 data transfer charges apply to both options and are excluded here

Total: ~$7.59/month per instance
3 instances: ~$22.77/month

Savings: $154.38 - $22.77 = $131.61/month (85% reduction)
Annual savings: ~$1,579

3. NAT Instance - Business Hours Only (Weekdays 9 hours)

Instance Charge:
- Operating hours: 9 hours × 5 days × 4.33 weeks = ~195 hours/month
- $0.0054/hour × 195 hours = $1.05/month

Elastic IP (public IPv4 address):
- $0.005/hour × 730 hours = $3.65/month (billed whether the instance is running or stopped)

Data Processing:
- None: the $0.062/GB NAT Gateway processing fee does not apply to NAT instances
- Standard EC2 data transfer charges apply to both options and are excluded here

Total: ~$4.70/month per instance
3 instances: ~$14.10/month
Savings: $154.38 - $14.10 = $140.28/month (91% reduction)
Annual savings: ~$1,683

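💡 Important Note: For scheduled operation, remember that the Elastic IP is billed for every hour of the month, including the hours when the instance is stopped. Scheduling therefore saves only instance-hours (about $2.89/month per t4g.nano), so for very small instances the benefit over 24/7 operation is modest.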
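
The individual line items above can be reproduced with a few lines of arithmetic. The sketch below uses only the Tokyo-region rates quoted in this article; hourlyCharge, savingsRate and cents are hypothetical helper names, and current prices should be checked against the AWS pricing pages before relying on the results.

```typescript
// Hypothetical helpers; all rates are the figures quoted in this article.
const HOURS_PER_MONTH = 730;

/** Round a dollar amount to whole cents. */
const cents = (amount: number): number => Math.round(amount * 100) / 100;

/** Monthly charge for something billed per hour. */
function hourlyCharge(ratePerHour: number, hours: number = HOURS_PER_MONTH): number {
  return cents(ratePerHour * hours);
}

/** Fraction saved by `alternative` relative to `baseline` (0.25 means 25%). */
function savingsRate(baseline: number, alternative: number): number {
  if (baseline <= 0) {
    throw new RangeError('baseline must be positive');
  }
  return (baseline - alternative) / baseline;
}

const natGatewayBase = hourlyCharge(0.062);     // NAT Gateway hourly charge, 24/7
const natGatewayData = cents(0.062 * 100);      // 100 GB processed by the gateway
const instanceAlwaysOn = hourlyCharge(0.0054);  // t4g.nano running 24/7
const businessHours = 9 * 5 * 4.33;             // about 195 hours per month
const instanceScheduled = hourlyCharge(0.0054, businessHours);
const elasticIp = hourlyCharge(0.005);          // public IPv4 address, whole month

console.log({ natGatewayBase, natGatewayData, instanceAlwaysOn, instanceScheduled, elasticIp });

const scheduleSaving = savingsRate(instanceAlwaysOn, instanceScheduled);
console.log(`Instance-hours saved by scheduling: ${(scheduleSaving * 100).toFixed(1)}%`);
```

Changing businessHours or the hourly rate is enough to test other schedules or instance sizes, for example a t4g.micro at $0.0108/hour.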

Recommended Configuration and Costs by Environment

| Environment | Configuration | Monthly Cost | Reasoning |
| --- | --- | --- | --- |
| Development | NAT Instance × 1 (t4g.nano, 9/7) | ~$1.84 | Cost priority, downtime acceptable |
| Test | NAT Instance × 1 (t4g.micro, 9/7) | ~$3.68 (twice the t4g.nano hourly rate) | Cost and performance balance |
| Staging | NAT Gateway × 1 | ~$44.64 | Production equivalent |
| Production | NAT Gateway × 3 | ~$154 | High availability critical |

Automating Patch Application

Use Systems Manager Patch Manager to automatically apply security patches to NAT Instance.

1. Granting SSM Permissions to NAT Instance

NAT Instance requires permissions to use Systems Manager.

import * as iam from 'aws-cdk-lib/aws-iam';

// NatInstanceProviderV2 exposes the EC2 instances it created through
// gatewayInstances (available once the provider is attached to a VPC)
const natInstances = (natProvider as ec2.NatInstanceProviderV2).gatewayInstances;

// Add the SSM managed policy to the role of every NAT Instance
natInstances.forEach((instance) => {
  instance.role.addManagedPolicy(
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonSSMManagedInstanceCore')
  );
});

// Role of the first NAT Instance, referenced later for iam:PassRole
const natInstanceRole = natInstances[0].role;

This allows NAT Instance to:

  • Report to Systems Manager Fleet Manager
  • Execute commands via Session Manager
  • Execute Patch Manager tasks

2. Creating Patch Baseline

import * as ssm from 'aws-cdk-lib/aws-ssm';

// Create patch baseline for Amazon Linux 2023
const patchBaseline = new ssm.CfnPatchBaseline(this, 'NatInstancePatchBaseline', {
  name: `${props.project}-${props.environment}-AL2023-PatchBaseline`,
  description: 'Patch baseline for NAT Instance (Amazon Linux 2023)',
  operatingSystem: 'AMAZON_LINUX_2023',
  approvalRules: {
    patchRules: [
      {
        // Security patches
        patchFilterGroup: {
          patchFilters: [
            {
              key: 'CLASSIFICATION',
              values: ['Security', 'Bugfix'],
            },
            {
              key: 'SEVERITY',
              values: ['Critical', 'Important'],
            },
          ],
        },
        approveAfterDays: 7,  // Apply 7 days after release
        complianceLevel: 'HIGH',
        enableNonSecurity: false,
      },
    ],
  },
  // Tag-based target specification
  patchGroups: [`/NatInstance/${props.project}/${props.environment}`],
});

💡 Key Points:

  • approveAfterDays: 7: Apply only tested patches
  • SEVERITY: Critical, Important: Apply high-priority patches first
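
The value in patchGroups has to match, character for character, the Patch Group tag on the instances and the value the maintenance window targets later on. One way to avoid drift is to derive all of them from a single helper. The names patchGroupName, patchGroupTag and patchGroupTarget below are hypothetical, not CDK APIs, and tagging the NAT instances is assumed to happen elsewhere in the stack, for example with cdk.Tags.of(instance).add(...).

```typescript
// Hypothetical helpers that keep the patch group string in one place.
const PATCH_GROUP_TAG_KEY = 'Patch Group';

/** Patch group name in the format used by this article's baseline. */
function patchGroupName(project: string, environment: string): string {
  if (project === '' || environment === '') {
    throw new Error('project and environment must be non-empty');
  }
  return `/NatInstance/${project}/${environment}`;
}

/** Tag to attach to each NAT instance so the baseline applies to it. */
function patchGroupTag(project: string, environment: string): { key: string; value: string } {
  return { key: PATCH_GROUP_TAG_KEY, value: patchGroupName(project, environment) };
}

/** Target entry in the shape used by a maintenance window target. */
function patchGroupTarget(project: string, environment: string): { key: string; values: string[] } {
  return { key: `tag:${PATCH_GROUP_TAG_KEY}`, values: [patchGroupName(project, environment)] };
}

console.log(patchGroupName('YourProject', 'dev'));
console.log(patchGroupTag('YourProject', 'dev'));
console.log(patchGroupTarget('YourProject', 'dev'));
```

If the tag is missing or spelled differently on an instance, the custom baseline is not matched to it, so the tag is worth checking first when patches do not behave as configured.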

3. Configuring Maintenance Window

3.1. IAM Role for Maintenance Window

const maintenanceWindowRole = new iam.Role(this, 'MaintenanceWindowRole', {
  roleName: `${props.project}-${props.environment}-MaintenanceWindowRole`,
  assumedBy: new iam.CompositePrincipal(
    new iam.ServicePrincipal('ssm.amazonaws.com'),
    new iam.ServicePrincipal('ec2.amazonaws.com'),
  ),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonSSMMaintenanceWindowRole'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonSSMManagedInstanceCore'),
  ],
});

// Add PassRole permission
maintenanceWindowRole.addToPolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: ['iam:PassRole'],
  resources: [natInstanceRole.roleArn],
  conditions: {
    StringEquals: {
      'iam:PassedToService': 'ssm.amazonaws.com',
    },
  },
}));

3.2. Creating Maintenance Window

// Maintenance window: Every Sunday 12:00 JST (03:00 UTC)
const maintenanceWindow = new ssm.CfnMaintenanceWindow(this, 'NatInstanceMaintenanceWindow', {
  name: `${props.project}-${props.environment}-NatInstance-PatchWindow`,
  description: 'Weekly maintenance window for NAT Instance patches',
  schedule: 'cron(0 3 ? * SUN *)',  // Every Sunday 03:00 UTC (12:00 JST)
  duration: 4,  // 4 hours
  cutoff: 1,    // Stop starting new tasks 1 hour before the window ends
  allowUnassociatedTargets: false,
  scheduleTimezone: 'UTC',
});

3.3. Configuring Patch Task

// Maintenance window target
const maintenanceWindowTarget = new ssm.CfnMaintenanceWindowTarget(this, 'NatInstanceTarget', {
  windowId: maintenanceWindow.ref,
  resourceType: 'INSTANCE',
  targets: [
    {
      key: 'tag:Patch Group',
      values: [`/NatInstance/${props.project}/${props.environment}`],
    },
  ],
});

// Patch task
new ssm.CfnMaintenanceWindowTask(this, 'PatchTask', {
  windowId: maintenanceWindow.ref,
  taskType: 'RUN_COMMAND',
  taskArn: 'AWS-RunPatchBaseline',
  targets: [
    {
      key: 'WindowTargetIds',
      values: [maintenanceWindowTarget.ref],
    },
  ],
  serviceRoleArn: maintenanceWindowRole.roleArn,
  priority: 1,
  maxConcurrency: '1',  // Sequential execution one at a time
  maxErrors: '1',
  taskInvocationParameters: {
    maintenanceWindowRunCommandParameters: {
      parameters: {
        Operation: ['Install'],
        RebootOption: ['RebootIfNeeded'],
      },
      timeoutSeconds: 3600,  // 1 hour timeout
      cloudWatchOutputConfig: {
        cloudWatchLogGroupName: `/aws/ssm/${props.project}/${props.environment}/patch`,
        cloudWatchOutputEnabled: true,
      },
    },
  },
});

💡 Important Parameters:

  • maxConcurrency: '1': Apply sequentially one instance at a time (maintain availability)
  • RebootOption: 'RebootIfNeeded': Auto-reboot when kernel patch requires it
  • timeoutSeconds: 3600: Allow sufficient time for patch application
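
Whether sequential patching fits into the window is simple arithmetic on the duration, the cutoff and the number of instances. The sketch below is a rough model, not how Systems Manager schedules work internally: it assumes batches run back to back, that every instance needs at most perInstanceMinutes (an estimate you supply), and that no new batch may start after the cutoff point. checkWindow and WindowPlan are hypothetical names.

```typescript
interface WindowPlan {
  instanceCount: number;
  maxConcurrency: number;     // instances patched at the same time
  perInstanceMinutes: number; // assumed worst-case patch and reboot time
  durationHours: number;      // maintenance window duration
  cutoffHours: number;        // hours before the end when new tasks stop starting
}

interface WindowCheck {
  batches: number;
  lastStartMinute: number;
  finishMinute: number;
  fits: boolean;
}

function checkWindow(plan: WindowPlan): WindowCheck {
  if (plan.instanceCount < 1 || plan.maxConcurrency < 1 || plan.perInstanceMinutes <= 0) {
    throw new RangeError('instanceCount, maxConcurrency and perInstanceMinutes must be positive');
  }
  if (plan.cutoffHours < 0 || plan.cutoffHours >= plan.durationHours) {
    throw new RangeError('cutoffHours must be at least 0 and less than durationHours');
  }
  const batches = Math.ceil(plan.instanceCount / plan.maxConcurrency);
  const lastStartMinute = (batches - 1) * plan.perInstanceMinutes;
  const finishMinute = batches * plan.perInstanceMinutes;
  const lastAllowedStart = (plan.durationHours - plan.cutoffHours) * 60;
  // The last batch must start before the cutoff and finish inside the window.
  const fits = lastStartMinute < lastAllowedStart && finishMinute <= plan.durationHours * 60;
  return { batches, lastStartMinute, finishMinute, fits };
}

// Settings from this article: 3 NAT instances, one at a time, 4-hour window, 1-hour cutoff.
// 60 minutes per instance is an assumed upper bound.
console.log(checkWindow({
  instanceCount: 3,
  maxConcurrency: 1,
  perInstanceMinutes: 60,
  durationHours: 4,
  cutoffHours: 1,
}));
```

With these inputs the third instance starts two hours into the window, ahead of the three-hour cutoff, so the configuration leaves some headroom. Adding instances or lengthening the per-instance estimate is a quick way to see when the window needs to grow.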

4. Monitoring Patch Application Status

import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cloudwatchActions from 'aws-cdk-lib/aws-cloudwatch-actions';

// SNS topic for patch notifications
const patchNotificationTopic = new sns.Topic(this, 'PatchNotificationTopic', {
  topicName: `${props.project}-${props.environment}-PatchNotification`,
  displayName: 'NAT Instance Patch Status Notifications',
});

// Email subscription (optional)
if (props.notificationEmail) {
  patchNotificationTopic.addSubscription(
    new subscriptions.EmailSubscription(props.notificationEmail)
  );
}

// EventBridge rule for patch completion
const patchCompletionRule = new events.Rule(this, 'PatchCompletionRule', {
  ruleName: `${props.project}-${props.environment}-NatInstancePatchCompletion`,
  description: 'Notify when NAT Instance patch completes',
  eventPattern: {
    source: ['aws.ssm'],
    // Invocation-level events carry the instance-id used in the message below
    detailType: ['EC2 Command Invocation Status-change Notification'],
    detail: {
      'status': ['Success', 'Failed', 'TimedOut'],
      'document-name': ['AWS-RunPatchBaseline'],
    },
  },
});

patchCompletionRule.addTarget(new targets.SnsTopic(patchNotificationTopic, {
  message: events.RuleTargetInput.fromObject({
    default: events.EventField.fromPath('$.detail'),
    subject: `[${props.project.toUpperCase()}-${props.environment.toUpperCase()}] NAT Instance Patch Status`,
    message: {
      summary: `Patch operation ${events.EventField.fromPath('$.detail.status')}`,
      details: {
        commandId: events.EventField.fromPath('$.detail.command-id'),
        instanceId: events.EventField.fromPath('$.detail.instance-id'),
        status: events.EventField.fromPath('$.detail.status'),
        documentName: events.EventField.fromPath('$.detail.document-name'),
      },
    },
  }),
}));

// Compliance violation alarm
const complianceMetric = new cloudwatch.Metric({
  namespace: 'AWS/SSM',
  metricName: 'PatchComplianceNonCompliantCount',
  dimensionsMap: {
    PatchGroup: `/NatInstance/${props.project}/${props.environment}`,
  },
  statistic: 'Average',
  period: cdk.Duration.hours(1),
});

new cloudwatch.Alarm(this, 'PatchComplianceAlarm', {
  alarmName: `${props.project}-${props.environment}-NatInstancePatchNonCompliant`,
  metric: complianceMetric,
  threshold: 0,
  evaluationPeriods: 2,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
}).addAlarmAction(new cloudwatchActions.SnsAction(patchNotificationTopic));

Key Points for Patch Management

| Item | Setting | Reason |
| --- | --- | --- |
| Execution Timing | Every Sunday 3:00 UTC (12:00 JST) | Low traffic period |
| Window Duration | 4 hours | Accommodate sequential application to multiple instances |
| Concurrency | 1 instance | Sequential execution to maintain availability |
| Reboot | As needed | For kernel patches etc. |
| Approval Period | 7 days | Apply only tested patches |
| Target Patches | Critical/Important security patches | Prioritize high-severity |

Patch Application Flow

1. Every Sunday 12:00 JST
   ↓
2. Maintenance window starts
   ↓
3. Apply patches to NAT Instance #1
   ↓ (Reboot if needed)
   ↓
4. Apply patches to NAT Instance #2
   ↓ (Reboot if needed)
   ↓
5. Apply patches to NAT Instance #3
   ↓ (Reboot if needed)
   ↓
6. Completion notification (via SNS)

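💡 For development environments: With a single NAT Instance configuration, internet connectivity is temporarily lost during patch application (especially during reboot). Schedule the maintenance window for a low-traffic period, but make sure it falls while the instance is running: a stopped instance cannot be patched, so the window has to sit inside the start/stop schedule.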

Best Practices

1. Security

// ✅ Disable source/destination check for NAT Instance
// (Automatically set by NatProvider.instanceV2())

// ✅ Restrict traffic with security groups
(natProvider as ec2.NatInstanceProviderV2).connections.allowFrom(
  ec2.Peer.ipv4(vpc.vpcCidrBlock),
  ec2.Port.allTraffic(),
  'Allow all traffic from VPC',
);

// ✅ Access via Systems Manager Session Manager
// (Disable public SSH access)

2. Monitoring and Alerts

// CloudWatch alarm configuration example
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as actions from 'aws-cdk-lib/aws-cloudwatch-actions';

// One CPU alarm per NAT Instance, reusing natInstanceIds and
// natInstanceTopic from the earlier steps
natInstanceIds.forEach((natInstanceId, index) => {
  const cpuAlarm = new cloudwatch.Alarm(this, `NatInstanceCpuAlarm${index + 1}`, {
    metric: new cloudwatch.Metric({
      namespace: 'AWS/EC2',
      metricName: 'CPUUtilization',
      dimensionsMap: {
        InstanceId: natInstanceId,
      },
      statistic: 'Average',
      period: cdk.Duration.minutes(5),
    }),
    threshold: 80,
    evaluationPeriods: 2,
    alarmDescription: 'NAT Instance CPU utilization is too high',
  });

  cpuAlarm.addAlarmAction(new actions.SnsAction(natInstanceTopic));
});

// Patch compliance alarm
// (Same alarm as in "Monitoring Patch Application Status"; define it only
// once per stack, because construct IDs must be unique within a scope)
const complianceAlarm = new cloudwatch.Alarm(this, 'PatchComplianceAlarm', {
  metric: complianceMetric,
  threshold: 0,
  evaluationPeriods: 2,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  alarmDescription: 'NAT Instance has missing security patches',
});

complianceAlarm.addAlarmAction(new actions.SnsAction(patchNotificationTopic));

3. Optimal Availability for Development Environments

In this architecture, we configured 3 NAT instances as a high-availability example, but for development environments, a single NAT instance is often sufficient. In that case, adjust the natGateways count.

// Deploy a single NAT Instance shared by all AZs
const vpc = new ec2.Vpc(this, 'VpcNatInstanceV2', {
  natGateways: 1,  // Only 1 instance
  natGatewayProvider: natProvider,
});

Cost becomes:

  • 3 instances: $9.30/month
  • 1 instance: $3.10/month

Troubleshooting

Common Issues and Solutions

1. Cannot Access Internet from Private Subnet After NAT Instance Stops

Cause: Stopped by schedule or manually stopped

Solution:

# Manually start NAT Instance
aws ec2 start-instances --instance-ids i-xxxxx

# Or temporarily disable schedule
aws events disable-rule --name YourProject-dev-NATStopRule-i-xxxxx

2. Instance Not Showing in Systems Manager

Cause: IAM role missing AmazonSSMManagedInstanceCore policy

Solution:

# Check in Fleet Manager
aws ssm describe-instance-information \
  --query 'InstanceInformationList[].[InstanceId,PingStatus,PlatformName]' \
  --output table

# If instance not shown, check IAM role
aws iam list-attached-role-policies --role-name <NAT-Instance-Role-Name>

💡 NAT Instance IAM role requires the following policies:

  • AmazonSSMManagedInstanceCore (For Systems Manager management)

3. Patch Application Fails

Cause: Maintenance window IAM role missing iam:PassRole permission

Solution:

Check error in CloudTrail logs.

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=SendCommand \
  --max-results 5

If the error is "InvalidDocument: document hash and hash type must both be present or none", remove the documentHashType parameter (it is not needed for AWS managed documents).

4. Poor Performance

Cause: Instance type is too small

Solution: Change to larger instance type

instanceType: ec2.InstanceType.of(
  ec2.InstanceClass.T4G,
  ec2.InstanceSize.MICRO,  // nano → micro
),

5. Unexpected Shutdown

Cause: Not known in advance; check the logs and the instance status to find out

Solution:

# Check logs in Systems Manager
aws logs filter-log-events \
  --log-group-name /aws/ssm/automation \
  --filter-pattern "i-xxxxx"

# Check current EC2 instance status (including stopped instances)
aws ec2 describe-instance-status \
  --instance-ids i-xxxxx \
  --include-all-instances

Troubleshooting Tips

  • Utilize CloudTrail: Analyze API call failure reasons in detail
  • Validate IAM permissions: Correctly configure conditional policies for iam:PassRole
  • Systems Manager logs: Check detailed execution logs for patch application
  • EventBridge metrics: Monitor schedule rule execution status

Deployment and Cleanup

Deployment

# Install dependencies
cd infrastructure/cdk-workspaces/workspaces/vpc-natinstance-v2
npm install

# Bootstrap CDK (first time only)
cdk bootstrap

# Deploy
cdk deploy "**" --project=YourProject --env=dev

Cleanup

# Delete stack
cdk destroy "**" --project=YourProject --env=dev

# Skip confirmation prompt
cdk destroy "**" --project=YourProject --env=dev --force

⚠️ Notes:

  • Elastic IPs are automatically released
  • Flow logs in S3 bucket are auto-deleted due to autoDeleteObjects: true
  • For production environments, use removalPolicy: cdk.RemovalPolicy.RETAIN

Summary

NAT Instance is an excellent alternative to NAT Gateway. Especially for development environments, it offers the following benefits:

Benefits:

  • Significant cost savings: Save over $400 annually
  • Schedule management: Auto-stop outside business hours
  • Flexible control: Security groups and Network ACLs
  • Monitoring and alerts: CloudWatch and EventBridge integration

Drawbacks (Considerations):

  • Single point of failure (for single instance)
  • Performance limitations (depends on instance type)
  • Operational management required (patching, monitoring)
  • Additional cost for high availability configuration

Recommendations

  1. Development/Test environments: Use NAT Instance for cost optimization
  2. Production environments: Use NAT Gateway for availability
  3. Hybrid configuration: Choose optimal configuration per environment


Let's continue learning practical AWS CDK patterns through the 100 drill exercises!
If you found this helpful, please ⭐ the repository!

📌 You can see the entire code in My GitHub Repository.
