Manish Kumar

Posted on Jan 19

AWS EC2 Deep Dive: Architecture, Operations, and Best Practices

#aws #ec2 #cloud #architecture

AWS EC2 Complete Working Reference Guide

Instance Types and Families

Instance Type Nomenclature

Format: [Family][Generation][Additional Capabilities].[Size]
Example: c7g.xlarge
- c = Compute optimized family
- 7 = 7th generation
- g = AWS Graviton processor
- xlarge = Size
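
The naming format above can be split mechanically. The following is a minimal sketch, assuming names follow the `[family][generation][capabilities].[size]` pattern shown here; the function name and regular expression are illustrative, and unusual names (for example the high-memory `u-` types) are not covered.

```python
import re

# Family letters, generation digits, optional capability letters, then ".size".
INSTANCE_TYPE_RE = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<capabilities>[a-z\-]*)\.(?P<size>[a-z0-9]+)$"
)


def parse_instance_type(name: str) -> dict:
    """Split an instance type name into its four components."""
    match = INSTANCE_TYPE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    parts = match.groupdict()
    parts["generation"] = int(parts["generation"])
    return parts


print(parse_instance_type("c7g.xlarge"))
# {'family': 'c', 'generation': 7, 'capabilities': 'g', 'size': 'xlarge'}
```

An empty `capabilities` string (as in `t3.medium`) means the type has no extra suffix letters.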

Instance Families Overview

Family	Category	Processor Options	Use Cases	Key Characteristics
T3, T3a, T4g	General Purpose	Intel, AMD, Graviton	Web servers, dev/test, microservices	Burstable CPU, cost-effective
M5, M6i, M7i	General Purpose	Intel, AMD, Graviton	Databases, application servers	Balanced CPU/memory/network
C5, C6i, C7g	Compute Optimized	Intel, AMD, Graviton	HPC, batch processing, gaming	High CPU-to-memory ratio
R5, R6i, R7g, X1, X2	Memory Optimized	Intel, AMD, Graviton	In-memory databases, big data	High memory-to-CPU ratio
I3, I4i, D2, D3	Storage Optimized	Intel, AMD	Data warehousing, NoSQL, distributed file systems	High IOPS, local NVMe storage
P4, P5, G5, Inf2, Trn1	Accelerated Computing	NVIDIA GPUs, AWS Trainium/Inferentia	ML training/inference, rendering	GPUs and purpose-built ML accelerators (Trainium, Inferentia)
Mac	General Purpose	Intel, Apple Silicon	iOS/macOS development	Dedicated Mac hardware
Hpc7g	HPC Optimized	Graviton	Molecular dynamics, CFD simulations	Optimized for tightly coupled workloads

Instance Sizes

nano, micro, small, medium
large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge, 32xlarge, 48xlarge, 56xlarge, 112xlarge
Within a family, each step up (large → xlarge → 2xlarge → 4xlarge) typically doubles vCPUs and memory; the largest sizes scale less uniformly
Metal (.metal) sizes run directly on the physical server, without a hypervisor

Processor Variants

Intel: Standard option (M5, C5, R5)
AMD: Cost-optimized (M5a, C5a, R5a - typically 10% cheaper)
AWS Graviton: ARM-based, up to 40% better price-performance (M7g, C7g, R7g)
g suffix: Graviton processor
a suffix: AMD processor
i suffix: Intel processor
n suffix: Network-optimized (higher network bandwidth)
d suffix: Instance store volumes included

Pricing Models Comparison

Model	Commitment	Savings	Flexibility	Best For	Interruption Risk
On-Demand	None	None (baseline)	Full	Spiky workloads, dev/test	None
Reserved Instances	1 or 3 years	Up to 72%	Instance family/region locked	Predictable, steady-state workloads	None
Savings Plans - Compute	1 or 3 years	Up to 66%	Any instance type/region	Flexible compute usage	None
Savings Plans - EC2	1 or 3 years	Up to 72%	Instance family locked, region locked	Predictable EC2 usage in specific family	None
Spot Instances	None	Up to 90%	Full	Fault-tolerant, batch jobs	Yes (2-minute warning)
Dedicated Hosts	On-Demand, or 1- or 3-year reservation	Additional RI discounts	Physical server control	BYOL, compliance	None
Capacity Reservations	On-demand	None (billed if unused)	AZ-specific capacity	Business-critical apps	None

Spot Instance Characteristics

Variable pricing based on supply/demand
2-minute interruption notification
Can be 85% cheaper than On-Demand during low demand periods
Example: c7i.2xlarge at $0.054/hour (Spot) vs $0.357/hour (On-Demand)
Best for: Stateless applications, CI/CD, data processing, containerized workloads
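
The discount in the example above can be checked with a few lines of arithmetic. This is a small sketch using the two hourly prices quoted in the list; the function name is illustrative and real Spot prices change over time.

```python
def spot_savings_percent(on_demand_hourly: float, spot_hourly: float) -> float:
    """Return how much cheaper Spot is than On-Demand, as a percentage."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return (on_demand_hourly - spot_hourly) / on_demand_hourly * 100


# Prices taken from the c7i.2xlarge example above.
savings = spot_savings_percent(on_demand_hourly=0.357, spot_hourly=0.054)
print(f"{savings:.1f}% cheaper than On-Demand")  # 84.9% cheaper than On-Demand
```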

Savings Plans Priority

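Applies only to eligible On-Demand usage; usage beyond the hourly commitment is billed at regular On-Demand rates
Does not apply to Spot Instances, which are always billed at the Spot price
Example: a $100/hour commitment covers On-Demand usage worth up to $100/hour at Savings Plan rates; Spot usage running alongside it is billed separately at Spot rates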

Reserved Instances Types

Standard RI: Maximum savings, least flexibility
Convertible RI: Can change instance family, lower discount
Scheduled RI: Reserved for specific time windows (deprecated)

Instance Launch Methods

Launch via AWS Console

Navigate to EC2 Dashboard → Launch Instance
Configure:
- Name and tags
- AMI selection (Amazon Linux, Ubuntu, Windows, etc.)
- Instance type
- Key pair (create or select existing)
- Network settings (VPC, subnet, security groups)
- Storage configuration
- Advanced details (user data, IAM role, metadata options)
Review and launch

Launch via AWS CLI

# Basic instance launch
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.medium \
  --key-name MyKeyPair \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0bb1c79de3EXAMPLE \
  --count 1 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=MyInstance}]'

# Launch with user data
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.medium \
  --key-name MyKeyPair \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0bb1c79de3EXAMPLE \
  --user-data file://user-data.sh \
  --iam-instance-profile Name=MyInstanceProfile

# Launch Spot Instance
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.medium \
  --instance-market-options '{"MarketType":"spot","SpotOptions":{"MaxPrice":"0.05","SpotInstanceType":"one-time"}}' \
  --key-name MyKeyPair \
  --security-group-ids sg-0123456789abcdef0

User Data Script Example

#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello from $(hostname -f)</h1>" > /var/www/html/index.html

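# Get instance metadata (IMDSv2: request a session token first)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
echo "<p>Instance ID: $INSTANCE_ID</p>" >> /var/www/html/index.html
echo "<p>Availability Zone: $AZ</p>" >> /var/www/html/index.html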

Launch with Terraform

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"
  key_name      = "MyKeyPair"

  vpc_security_group_ids = [aws_security_group.web.id]
  subnet_id              = aws_subnet.public.id

  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name

  user_data = <<-EOF
              #!/bin/bash
              yum update -y
              yum install -y httpd
              systemctl start httpd
              systemctl enable httpd
              EOF

  root_block_device {
    volume_type = "gp3"
    volume_size = 30
    encrypted   = true
  }

  tags = {
    Name        = "WebServer"
    Environment = "Production"
  }

  monitoring = true
}

Launch with CloudFormation

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c55b159cbfafe1f0
      InstanceType: t3.medium
      KeyName: MyKeyPair
      SecurityGroupIds:
        - !Ref WebSecurityGroup
      SubnetId: !Ref PublicSubnet
      IamInstanceProfile: !Ref EC2InstanceProfile
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeType: gp3
            VolumeSize: 30
            Encrypted: true
      Tags:
        - Key: Name
          Value: WebServer

Launch Templates

Create Launch Template via CLI

aws ec2 create-launch-template \
  --launch-template-name MyLaunchTemplate \
  --version-description "Version 1" \
  --launch-template-data '{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.medium",
    "KeyName": "MyKeyPair",
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "IamInstanceProfile": {
      "Name": "MyInstanceProfile"
    },
    "BlockDeviceMappings": [{
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 30,
        "VolumeType": "gp3",
        "DeleteOnTermination": true,
        "Encrypted": true
      }
    }],
    "Monitoring": {
      "Enabled": true
    },
    "UserData": "IyEvYmluL2Jhc2gKCnl1bSB1cGRhdGUgLXkKeXVtIGluc3RhbGwgLXkgaHR0cGQ="
  }'
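
The `UserData` field in launch template data must be base64-encoded. As a quick sketch of how the value above can be produced and checked, the snippet below encodes a short script with Python's standard library; the helper names are illustrative, and the script text is what the encoded string in the template decodes to.

```python
import base64

# The script that the UserData value in the launch template decodes to.
USER_DATA = "#!/bin/bash\n\nyum update -y\nyum install -y httpd"


def encode_user_data(script: str) -> str:
    """Base64-encode a user data script for use in launch template JSON."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def decode_user_data(encoded: str) -> str:
    """Decode a base64 UserData value back into readable text."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_user_data(USER_DATA)
print(encoded)
print(decode_user_data(encoded))
```

Decoding an existing template's `UserData` this way is a convenient check before launching from it.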

Launch Template with Systems Manager Parameter

# Create SSM parameter for AMI ID
aws ssm put-parameter \
  --name "/golden-ami/latest" \
  --value "ami-0c55b159cbfafe1f0" \
  --type "String"

# Create launch template referencing SSM parameter
aws ec2 create-launch-template \
  --launch-template-name MyTemplate \
  --launch-template-data '{
    "ImageId": "resolve:ssm:/golden-ami/latest",
    "InstanceType": "t3.medium"
  }'

Launch Template with Terraform

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"
  key_name      = "MyKeyPair"

  # Security groups are attached via network_interfaces below; vpc_security_group_ids cannot be combined with it

  iam_instance_profile {
    name = aws_iam_instance_profile.app.name
  }

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = 30
      volume_type           = "gp3"
      iops                  = 3000
      throughput            = 125
      delete_on_termination = true
      encrypted             = true
    }
  }

  network_interfaces {
    associate_public_ip_address = true
    delete_on_termination       = true
    security_groups             = [aws_security_group.app.id]
  }

  monitoring {
    enabled = true
  }

  user_data = base64encode(<<-EOF
              #!/bin/bash
              yum update -y
              yum install -y httpd
              systemctl start httpd
              EOF
  )

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "AppServer"
    }
  }
}

Launch Instance from Template

aws ec2 run-instances \
  --launch-template LaunchTemplateName=MyLaunchTemplate,Version=1 \
  --count 2 \
  --subnet-id subnet-0bb1c79de3EXAMPLE

Update Launch Template (Create New Version)

aws ec2 create-launch-template-version \
  --launch-template-id lt-0abcd290751193123 \
  --source-version 1 \
  --launch-template-data '{"InstanceType":"t3.large"}'

Storage Options

Storage Type Comparison

Type	Persistence	Performance	Use Case	Backup Method
EBS (gp3)	Yes (network-attached)	3,000-16,000 IOPS	General purpose, boot volumes	EBS Snapshots
EBS (gp2)	Yes (network-attached)	Up to 16,000 IOPS	Legacy general purpose	EBS Snapshots
EBS (io2)	Yes (network-attached)	Up to 256,000 IOPS (Block Express)	High-performance databases	EBS Snapshots
EBS (st1)	Yes (network-attached)	Throughput-optimized	Big data, data warehouses	EBS Snapshots
EBS (sc1)	Yes (network-attached)	Cold HDD, lowest cost	Infrequent access	EBS Snapshots
Instance Store	No (ephemeral)	Very high IOPS	Temporary data, caches	Must use application-level backup

EBS Volume Types Detailed

gp3 (General Purpose SSD)

3,000 IOPS baseline (configurable up to 16,000)
125 MB/s throughput baseline (configurable up to 1,000 MB/s)
Price: $0.08/GB-month
Independent IOPS and throughput configuration
Recommended for most workloads

gp2 (General Purpose SSD - Legacy)

IOPS scales with volume size (3 IOPS per GiB, minimum 100, maximum 16,000)
Volumes smaller than 1 TiB can burst up to 3,000 IOPS
Throughput: up to 250 MB/s
Use gp3 for new deployments (better value)
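
To make the gp2 size-to-IOPS relationship concrete, here is a rough sketch of the baseline calculation. It uses the 3 IOPS per GB rule and the 16,000 IOPS ceiling from this guide; the 100 IOPS floor is an assumption added here from general gp2 behavior, and the function name is illustrative.

```python
GP3_BASELINE_IOPS = 3_000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return min(max(100, 3 * size_gib), 16_000)


for size in (20, 100, 500, 1000, 6000):
    print(f"{size:>5} GiB: gp2 baseline {gp2_baseline_iops(size):>6} IOPS, "
          f"gp3 baseline {GP3_BASELINE_IOPS} IOPS")
```

Below 1,000 GiB the gp2 baseline stays under the 3,000 IOPS that gp3 provides at any size, which is one reason gp3 is usually the better default.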

io2 Block Express (Provisioned IOPS SSD)

Up to 256,000 IOPS per volume
99.999% durability
Up to 4,000 MB/s throughput
Sub-millisecond latency
Use for critical databases

EBS Volume Operations

# Create EBS volume
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --size 100 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125 \
  --encrypted \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=MyVolume}]'

# Attach volume to instance
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/sdf

# Modify volume (increase size and IOPS)
aws ec2 modify-volume \
  --volume-id vol-0123456789abcdef0 \
  --size 200 \
  --iops 5000

# Create snapshot
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Backup of MyVolume"

# Create volume from snapshot
aws ec2 create-volume \
  --snapshot-id snap-0123456789abcdef0 \
  --availability-zone us-east-1a \
  --volume-type gp3

# Detach volume
aws ec2 detach-volume \
  --volume-id vol-0123456789abcdef0

# Delete volume
aws ec2 delete-volume \
  --volume-id vol-0123456789abcdef0

EBS Snapshot Management

# Create multi-volume snapshot for entire instance
aws ec2 create-snapshots \
  --instance-specification InstanceId=i-0123456789abcdef0 \
  --description "Full instance backup"

# Copy snapshot to another region
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --destination-region us-west-2 \
  --description "DR copy"

# Create AMI from instance (includes all attached EBS volumes)
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "MyGoldenImage" \
  --description "Production baseline" \
  --no-reboot

# List snapshots
aws ec2 describe-snapshots \
  --owner-ids self \
  --filters "Name=status,Values=completed"

# Delete snapshot
aws ec2 delete-snapshot \
  --snapshot-id snap-0123456789abcdef0

Instance Store Characteristics

Physically attached to host server
Data is lost when the instance stops, hibernates, or terminates, or when the underlying hardware fails (it survives a reboot)
Included in instance price (no additional cost)
Very high IOPS (millions)
Available on specific instance types (c5d, m5d, r5d, i3, i4i)

AMI Management

Create Custom AMI

# Create AMI from running instance (with reboot)
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "MyCustomAMI-$(date +%Y%m%d)" \
  --description "Custom application image"

# Create AMI without rebooting
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "MyCustomAMI" \
  --no-reboot

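# Register AMI from snapshot
# HVM and ENA are set explicitly (virtualization type otherwise defaults to paravirtual);
# adjust --architecture to match the snapshot
aws ec2 register-image \
  --name "MyAMI" \
  --architecture x86_64 \
  --virtualization-type hvm \
  --ena-support \
  --root-device-name /dev/xvda \
  --block-device-mappings \
    "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0,VolumeType=gp3}"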

AMI Operations

# List AMIs owned by you
aws ec2 describe-images \
  --owners self \
  --filters "Name=state,Values=available"

# Copy AMI to another region
aws ec2 copy-image \
  --source-region us-east-1 \
  --source-image-id ami-0123456789abcdef0 \
  --region us-west-2 \
  --name "MyAMI-Copy"

# Share AMI with another account
aws ec2 modify-image-attribute \
  --image-id ami-0123456789abcdef0 \
  --launch-permission "Add=[{UserId=123456789012}]"

# Make AMI public
aws ec2 modify-image-attribute \
  --image-id ami-0123456789abcdef0 \
  --launch-permission "Add=[{Group=all}]"

# Deregister AMI
aws ec2 deregister-image \
  --image-id ami-0123456789abcdef0

AMI User Data

User data NOT stored in AMI
Must specify user data each time launching from AMI
User data embedded in launch templates persists across launches
AMI captures: OS, applications, configurations, attached EBS volume snapshots

Networking Configuration

Security Groups vs Network ACLs

Feature	Security Groups	Network ACLs
Scope	Instance-level	Subnet-level
State	Stateful (return traffic auto-allowed)	Stateless (must explicitly allow return)
Rules	Allow rules only	Allow and Deny rules
Rule Processing	All rules evaluated	Rules evaluated in order
Default Behavior	Deny all inbound, allow all outbound	Default NACL allows all
Assignment	Must be explicitly assigned	Automatically applied to subnet
Rule Limit	60 inbound + 60 outbound per group	20 inbound + 20 outbound per NACL
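
The difference in rule processing is easier to see in code. The sketch below is a simplified model, not AWS's implementation: it ignores protocols, direction, and connection tracking, and all class and function names are illustrative. It shows that a NACL stops at the first matching rule in ascending rule-number order, while a security group allows traffic if any rule matches.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    number: int      # evaluation order; only meaningful for NACLs
    action: str      # "allow" or "deny" (security groups only have "allow")
    cidr: str
    port_from: int
    port_to: int

    def matches(self, source_ip: str, port: int) -> bool:
        in_range = ip_address(source_ip) in ip_network(self.cidr)
        return in_range and self.port_from <= port <= self.port_to


def nacl_allows(rules, source_ip: str, port: int) -> bool:
    """First matching rule (lowest number) decides; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.matches(source_ip, port):
            return rule.action == "allow"
    return False


def security_group_allows(rules, source_ip: str, port: int) -> bool:
    """All rules are considered; any matching allow rule permits the traffic."""
    return any(rule.matches(source_ip, port) for rule in rules)


nacl_rules = [
    Rule(100, "allow", "0.0.0.0/0", 80, 80),
    Rule(99, "deny", "198.51.100.0/24", 80, 80),  # lower number, checked first
]
sg_rules = [Rule(0, "allow", "0.0.0.0/0", 80, 80)]

print(nacl_allows(nacl_rules, "198.51.100.7", 80))          # False
print(nacl_allows(nacl_rules, "203.0.113.5", 80))           # True
print(security_group_allows(sg_rules, "198.51.100.7", 80))  # True
```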

Security Group Configuration

# Create security group
aws ec2 create-security-group \
  --group-name WebServerSG \
  --description "Security group for web servers" \
  --vpc-id vpc-0123456789abcdef0

# Add inbound rules
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 22 \
  --cidr 203.0.113.0/24

# Allow traffic from another security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3306 \
  --source-group sg-9876543210abcdef0

# Remove rule (values must match an existing rule exactly)
aws ec2 revoke-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 22 \
  --cidr 203.0.113.0/24

# Add outbound rule
aws ec2 authorize-security-group-egress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0

Security Group with Terraform

resource "aws_security_group" "web" {
  name        = "web-server-sg"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description     = "SSH from bastion"
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "WebServerSG"
  }
}

Network ACL Configuration

# Create Network ACL
aws ec2 create-network-acl \
  --vpc-id vpc-0123456789abcdef0

# Add inbound rule (allow HTTP)
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress \
  --rule-number 100 \
  --protocol tcp \
  --port-range From=80,To=80 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow

# Add deny rule (lower rule number, so it is evaluated before rule 100)
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress \
  --rule-number 99 \
  --protocol icmp \
  --icmp-type-code Code=-1,Type=-1 \
  --cidr-block 0.0.0.0/0 \
  --rule-action deny

# Add outbound rule for ephemeral ports
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --egress \
  --rule-number 100 \
  --protocol tcp \
  --port-range From=1024,To=65535 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow

# Associate NACL with subnet
aws ec2 replace-network-acl-association \
  --association-id aclassoc-0123456789abcdef0 \
  --network-acl-id acl-0123456789abcdef0

Elastic Network Interface (ENI)

# Create ENI with static private IP
aws ec2 create-network-interface \
  --subnet-id subnet-0123456789abcdef0 \
  --description "Primary network interface" \
  --groups sg-0123456789abcdef0 \
  --private-ip-address 10.0.1.10

# Attach ENI to instance
aws ec2 attach-network-interface \
  --network-interface-id eni-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device-index 1

# Assign secondary private IP
aws ec2 assign-private-ip-addresses \
  --network-interface-id eni-0123456789abcdef0 \
  --private-ip-addresses 10.0.1.11 10.0.1.12

# Detach ENI
aws ec2 detach-network-interface \
  --attachment-id eni-attach-0123456789abcdef0

# Delete ENI
aws ec2 delete-network-interface \
  --network-interface-id eni-0123456789abcdef0

Elastic IP (EIP)

# Allocate Elastic IP
aws ec2 allocate-address --domain vpc

# Associate EIP with instance
aws ec2 associate-address \
  --instance-id i-0123456789abcdef0 \
  --allocation-id eipalloc-0123456789abcdef0

# Associate EIP with ENI
aws ec2 associate-address \
  --network-interface-id eni-0123456789abcdef0 \
  --allocation-id eipalloc-0123456789abcdef0

# Disassociate EIP
aws ec2 disassociate-address \
  --association-id eipassoc-0123456789abcdef0

# Release EIP
aws ec2 release-address \
  --allocation-id eipalloc-0123456789abcdef0

Enhanced Networking

SR-IOV (Single Root I/O Virtualization): Higher PPS, lower latency, lower jitter
ENA (Elastic Network Adapter): Up to 100 Gbps, required for current generation instances
Intel 82599 VF: Up to 10 Gbps, legacy instances
Placement Groups: Cluster, Partition, Spread

Placement Groups

# Create cluster placement group (low latency)
aws ec2 create-placement-group \
  --group-name HPC-Cluster \
  --strategy cluster

# Create partition placement group (distributed)
aws ec2 create-placement-group \
  --group-name BigData-Partition \
  --strategy partition \
  --partition-count 7

# Create spread placement group (high availability)
aws ec2 create-placement-group \
  --group-name Critical-Spread \
  --strategy spread

# Launch instance in placement group
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type c5n.18xlarge \
  --placement "GroupName=HPC-Cluster"

Placement Strategy	Max Instances	Use Case	Characteristics
Cluster	Thousands	HPC, low-latency apps	Single AZ, same hardware
Partition	7 partitions per AZ	Distributed systems (Hadoop, Cassandra)	Isolated hardware per partition
Spread	7 instances per AZ	Critical applications	Each instance on separate hardware

Instance Lifecycle Management

Instance States

State	Description	Billing	Operations Allowed
pending	Launching, preparing	Not billed	Wait
running	Instance is running	Billed	Stop, reboot, hibernate, terminate
stopping	Preparing to stop	Not billed (billed if hibernating)	Wait
stopped	Instance shutdown, can restart	Not billed (storage charges apply)	Start, terminate
shutting-down	Preparing to terminate	Not billed	Wait
terminated	Permanently deleted	Not billed	None (cannot restart)
stopped (hibernated)	RAM saved to EBS root volume, faster resume	Billed while stopping; storage charges apply once stopped	Start, terminate
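
The "Operations Allowed" column can be expressed as a small lookup, which is handy when scripting against instance states. This is only a sketch of the table above; the dictionary and function names are illustrative, and transitional states are modelled as allowing no operations.

```python
# Operations permitted in each state, following the table above.
ALLOWED_OPERATIONS = {
    "pending": set(),
    "running": {"stop", "reboot", "hibernate", "terminate"},
    "stopping": set(),
    "stopped": {"start", "terminate"},
    "shutting-down": set(),
    "terminated": set(),
}


def can_perform(state: str, operation: str) -> bool:
    """Return True if the operation is listed as allowed for the given state."""
    if state not in ALLOWED_OPERATIONS:
        raise ValueError(f"unknown instance state: {state!r}")
    return operation in ALLOWED_OPERATIONS[state]


print(can_perform("running", "hibernate"))  # True
print(can_perform("stopped", "reboot"))     # False
```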

Instance Operations

# Start instance
aws ec2 start-instances --instance-ids i-0123456789abcdef0

# Stop instance
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Reboot instance
aws ec2 reboot-instances --instance-ids i-0123456789abcdef0

# Terminate instance
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Enable termination protection
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --disable-api-termination

# Disable termination protection
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-disable-api-termination

# Change instance type (the instance must be fully stopped first)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --instance-type "{\"Value\": \"t3.large\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0

Hibernation

RAM contents saved to EBS root volume
Must be enabled at launch
Instance resumes with same instance ID and private IP
Faster startup than stop/start
Requirements:
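- Supported instance families include C3-C5, M3-M5, R3-R5, T2-T3 (newer generations have since been added; check the current EC2 documentation)
- RAM must be < 150 GB
- Root volume must be EBS, encrypted, and large enough to hold the RAM contents
- An instance cannot stay hibernated for more than 60 days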
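
The requirements above lend themselves to a pre-launch check. The following is a hedged sketch: the 150 GB limit and the EBS/encryption rules come from the list, while the rule that the root volume must be larger than RAM is an assumption made here so the saved memory image has room. The function name is illustrative.

```python
MAX_RAM_GIB = 150  # limit from the requirement list above


def hibernation_blockers(ram_gib: float, root_is_ebs: bool,
                         root_encrypted: bool, root_volume_gib: float) -> list:
    """Return the reasons an instance configuration cannot hibernate."""
    blockers = []
    if ram_gib >= MAX_RAM_GIB:
        blockers.append("RAM must be under 150 GB")
    if not root_is_ebs:
        blockers.append("root volume must be EBS")
    if not root_encrypted:
        blockers.append("root volume must be encrypted")
    if root_volume_gib <= ram_gib:
        # Assumption: the root volume needs room for the RAM image plus the OS.
        blockers.append("root volume is too small to hold RAM contents")
    return blockers


print(hibernation_blockers(ram_gib=8, root_is_ebs=True,
                           root_encrypted=False, root_volume_gib=30))
# ['root volume must be encrypted']
```

An empty list means none of these particular checks failed; it does not confirm that the instance family itself supports hibernation.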

# Launch instance with hibernation enabled
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type m5.large \
  --hibernation-options Configured=true \
  --block-device-mappings \
    "DeviceName=/dev/xvda,Ebs={VolumeSize=30,Encrypted=true}"

# Hibernate instance
aws ec2 stop-instances \
  --instance-ids i-0123456789abcdef0 \
  --hibernate

Instance Metadata Service (IMDS)

# IMDSv1 (legacy)
curl http://169.254.169.254/latest/meta-data/

# IMDSv2 (token-based, more secure)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

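# Common metadata endpoints (IMDSv2; reuse $TOKEN from above)
# Instance ID
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id

# Availability Zone
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone

# IAM role credentials
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE-NAME

# User data
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/user-data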

Enforce IMDSv2

aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1
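
Once IMDSv2 is enforced, application code has to follow the same two-step flow as the curl commands: obtain a session token with a PUT, then send it as a header on every metadata request. Below is a minimal standard-library sketch; the helper names are illustrative, and `fetch_metadata` only works when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """GET request for a metadata path, authenticated with the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value; requires network access to the IMDS endpoint."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()


# On an EC2 instance: print(fetch_metadata("instance-id"))
```

Keeping request construction separate from the network call makes the header handling easy to unit test off-instance.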

Auto Scaling

Auto Scaling Components

Launch Template: Defines instance configuration
Auto Scaling Group (ASG): Manages instance fleet
Scaling Policies: Define when to scale
Load Balancer: Distributes traffic

Create Auto Scaling Group

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name MyASG \
  --launch-template "LaunchTemplateName=MyLaunchTemplate,Version=1" \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 3 \
  --vpc-zone-identifier "subnet-0123,subnet-4567,subnet-89ab" \
  --target-group-arns "arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/abc123" \
  --health-check-type ELB \
  --health-check-grace-period 300 \
  --tags "Key=Name,Value=WebServer,PropagateAtLaunch=true"

Auto Scaling with Terraform

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 3
  health_check_type   = "ELB"
  health_check_grace_period = 300
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.web.arn]

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "WebServer"
    propagate_at_launch = true
  }

  enabled_metrics = [
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupMinSize",
    "GroupMaxSize"
  ]
}

Scaling Policies

Target Tracking Scaling

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name MyASG \
  --policy-name target-tracking-cpu \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }'
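
Target tracking keeps the metric near the target by resizing the group roughly in proportion to how far the metric is from the target. The sketch below illustrates that proportional idea only; it is an approximation assumed for this guide, not the service's exact algorithm (which, among other things, scales in more conservatively), and the function name is illustrative.

```python
import math


def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float, min_size: int, max_size: int) -> int:
    """Estimate the capacity that brings the average metric back to the target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    desired = math.ceil(current_capacity * metric_value / target_value)
    # The group never goes outside its configured min/max size.
    return max(min_size, min(max_size, desired))


# 3 instances at 90% average CPU with a 70% target, group bounds 2..10
print(target_tracking_capacity(3, 90.0, 70.0, min_size=2, max_size=10))  # 4
```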

Step Scaling

# Create the step scaling policy first and capture its ARN
POLICY_ARN=$(aws autoscaling put-scaling-policy \
  --auto-scaling-group-name MyASG \
  --policy-name scale-up-policy \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --step-adjustments \
    "MetricIntervalLowerBound=0,MetricIntervalUpperBound=10,ScalingAdjustment=1" \
    "MetricIntervalLowerBound=10,ScalingAdjustment=2" \
  --query PolicyARN \
  --output text)

# Create the CloudWatch alarm that triggers the policy
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --alarm-description "Scale up when CPU > 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=AutoScalingGroupName,Value=MyASG \
  --alarm-actions "$POLICY_ARN"

Scheduled Scaling

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name MyASG \
  --scheduled-action-name ScaleUpMorning \
  --start-time "2026-01-20T08:00:00Z" \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 5 \
  --max-size 20 \
  --desired-capacity 10

Lifecycle Hooks

# --role-arn (placeholder role name) lets Auto Scaling publish to the SNS topic
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name instance-launching-hook \
  --auto-scaling-group-name MyASG \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --default-result CONTINUE \
  --heartbeat-timeout 300 \
  --notification-target-arn arn:aws:sns:region:account:my-topic \
  --role-arn arn:aws:iam::account:role/my-lifecycle-hook-role

Load Balancing

Load Balancer Types

Type	OSI Layer	Protocol	Use Case	Key Features
Application Load Balancer (ALB)	Layer 7	HTTP/HTTPS	Web applications, microservices	Path/host routing, WebSocket, HTTP/2
Network Load Balancer (NLB)	Layer 4	TCP/UDP/TLS	High-performance, low latency	Static IP, millions RPS, preserve source IP
Gateway Load Balancer (GWLB)	Layer 3	IP	Third-party virtual appliances	Traffic inspection, firewall integration
Classic Load Balancer (CLB)	Layer 4/7	TCP/HTTP	Legacy applications	Deprecated for new deployments

Application Load Balancer with Auto Scaling

# Create target group
aws elbv2 create-target-group \
  --name web-tg \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-0123456789abcdef0 \
  --health-check-enabled \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3

# Create ALB
aws elbv2 create-load-balancer \
  --name web-alb \
  --subnets subnet-0123 subnet-4567 \
  --security-groups sg-0123456789abcdef0 \
  --scheme internet-facing \
  --type application

# Create listener
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/web-alb/abc123 \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/web-tg/xyz789

# Add HTTPS listener with SSL certificate
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/web-alb/abc123 \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=arn:aws:acm:region:account:certificate/cert-id \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/web-tg/xyz789

ALB with Terraform

resource "aws_lb" "web" {
  name               = "web-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = true
  enable_http2              = true
}

resource "aws_lb_target_group" "web" {
  name     = "web-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/health"
    protocol            = "HTTP"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.web.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.web.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

# Path-based routing
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }

  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}

ELB Health Checks

Auto Scaling uses health checks to replace unhealthy instances
Health check types:
- EC2: Instance status checks
- ELB: Load balancer health checks (recommended)
Grace period: Time after an instance enters service during which Auto Scaling does not act on failed health checks, giving the application time to start

Monitoring and CloudWatch

CloudWatch Metrics for EC2

Basic Monitoring (Free, 5-minute intervals)

CPUUtilization
DiskReadOps, DiskWriteOps (instance store volumes only)
DiskReadBytes, DiskWriteBytes (instance store volumes only)
NetworkIn, NetworkOut
NetworkPacketsIn, NetworkPacketsOut
StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System

Detailed Monitoring (Paid, 1-minute intervals)

# Enable detailed monitoring
aws ec2 monitor-instances --instance-ids i-0123456789abcdef0

# Disable detailed monitoring
aws ec2 unmonitor-instances --instance-ids i-0123456789abcdef0

Custom Metrics

Memory utilization (not included by default)
Disk space utilization
Application-specific metrics

CloudWatch Agent Installation

# Download and install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm

# Configure agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

# Start agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -s \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

CloudWatch Agent Configuration (JSON)

{
  "metrics": {
    "namespace": "CustomMetrics/EC2",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": [
          {"name": "mem_used_percent", "rename": "MemoryUtilization", "unit": "Percent"}
        ],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": [
          {"name": "used_percent", "rename": "DiskUtilization", "unit": "Percent"}
        ],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/httpd/access_log",
            "log_group_name": "/aws/ec2/httpd",
            "log_stream_name": "{instance_id}/access_log"
          }
        ]
      }
    }
  }
}

CloudWatch Alarms

# Create CPU alarm
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu-alarm \
  --alarm-description "Alert when CPU exceeds 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:sns:region:account:my-topic

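# Create disk space alarm (custom metric)
# Separate multiple dimensions with spaces, not commas. The dimension set must
# match the published metric exactly (agent disk metrics usually also carry
# device and fstype); check with: aws cloudwatch list-metrics --namespace CustomMetrics/EC2
aws cloudwatch put-metric-alarm \
  --alarm-name high-disk-usage \
  --alarm-description "Alert when disk usage > 80%" \
  --metric-name DiskUtilization \
  --namespace CustomMetrics/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 Name=path,Value=/ \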
  --alarm-actions arn:aws:sns:region:account:my-topic

# Create alarm with EC2 action (stop instance)
aws cloudwatch put-metric-alarm \
  --alarm-name stop-instance-on-high-cpu \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 95 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:automate:region:ec2:stop
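The `--period` and `--evaluation-periods` values together determine roughly how long a breach must persist before an alarm changes state (assuming the default where every evaluated period must breach). A quick worked calculation for the alarms above; the helper name `alarm_window_seconds` is an illustrative assumption:

```shell
# Minimum time a breach must persist before the alarm fires:
# period (seconds) x evaluation periods.
alarm_window_seconds() {
  echo $(( $1 * $2 ))
}

alarm_window_seconds 300 2   # high-cpu-alarm: prints 600 (10 minutes)
alarm_window_seconds 300 3   # stop-instance-on-high-cpu: prints 900 (15 minutes)
```

Shorter windows react faster but are more likely to trigger on brief spikes.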

Alarm Actions

SNS notification
EC2 action: stop, terminate, reboot, recover
Auto Scaling action
Systems Manager action
Lambda function invocation
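The recover action is typically paired with the system status check rather than CPU. The following is a sketch under that assumption; the function name `create_recover_alarm`, the alarm naming scheme, and the two-minute window are illustrative choices.

```shell
# Create an alarm that recovers the instance when the system status check fails.
# Usage: create_recover_alarm <instance-id> <region>
create_recover_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "recover-$1" \
    --metric-name StatusCheckFailed_System \
    --namespace AWS/EC2 \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --dimensions "Name=InstanceId,Value=$1" \
    --alarm-actions "arn:aws:automate:$2:ec2:recover"
}

# Example:
# create_recover_alarm i-0123456789abcdef0 us-east-1
```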

CloudWatch Logs

# Create log group
aws logs create-log-group --log-group-name /aws/ec2/application

# Set retention policy
aws logs put-retention-policy \
  --log-group-name /aws/ec2/application \
  --retention-in-days 7

# Create metric filter
aws logs put-metric-filter \
  --log-group-name /aws/ec2/application \
  --filter-name ErrorCount \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ApplicationErrors,metricNamespace=CustomApp,metricValue=1
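Before relying on a metric filter, it can help to preview which events a term-based pattern matches. A minimal sketch using `filter-log-events`; the function name `recent_errors` and the 15-minute default look-back are assumptions.

```shell
# Show recent log events containing the term ERROR in the application log group.
# Usage: recent_errors [minutes-back]
recent_errors() {
  # CloudWatch Logs expects --start-time in milliseconds since the epoch.
  start_ms=$(( ( $(date +%s) - ${1:-15} * 60 ) * 1000 ))
  aws logs filter-log-events \
    --log-group-name /aws/ec2/application \
    --filter-pattern "ERROR" \
    --start-time "$start_ms" \
    --query 'events[].message' \
    --output text
}

# Example: errors from the last 30 minutes
# recent_errors 30
```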

IAM Roles and Instance Profiles

Create IAM Role for EC2

# Create trust policy
cat > ec2-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create IAM role
aws iam create-role \
  --role-name EC2-S3-Access-Role \
  --assume-role-policy-document file://ec2-trust-policy.json

# Attach policy to role
aws iam attach-role-policy \
  --role-name EC2-S3-Access-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Create instance profile
aws iam create-instance-profile \
  --instance-profile-name EC2-S3-Access-Profile

# Add role to instance profile
aws iam add-role-to-instance-profile \
  --instance-profile-name EC2-S3-Access-Profile \
  --role-name EC2-S3-Access-Role

# Attach instance profile to running instance
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=EC2-S3-Access-Profile
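To confirm that the association took effect, the current profile for an instance can be listed. A small sketch (the wrapper name `show_instance_profile` is hypothetical):

```shell
# Show which instance profile is associated with an instance, and its state.
# Usage: show_instance_profile <instance-id>
show_instance_profile() {
  aws ec2 describe-iam-instance-profile-associations \
    --filters "Name=instance-id,Values=$1" \
    --query 'IamInstanceProfileAssociations[].[InstanceId,State,IamInstanceProfile.Arn]' \
    --output table
}

# Example:
# show_instance_profile i-0123456789abcdef0
```

From inside the instance, `aws sts get-caller-identity` should then report the role's assumed-role ARN.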

IAM Role with Terraform

resource "aws_iam_role" "ec2_role" {
  name = "ec2-app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "s3_access" {
  role       = aws_iam_role.ec2_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

resource "aws_iam_instance_profile" "ec2_profile" {
  name = "ec2-app-profile"
  role = aws_iam_role.ec2_role.name
}

resource "aws_instance" "app" {
  ami                  = "ami-0c55b159cbfafe1f0"
  instance_type        = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name
}

Instance Management Commands

List and Describe Instances

# List all instances
aws ec2 describe-instances

# List instances with specific state
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running"

# List instances with specific tag
aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=Production"

# Get instance details in table format
aws ec2 describe-instances \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,PrivateIpAddress,PublicIpAddress,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# Get specific instance details
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0

# Get instance status
aws ec2 describe-instance-status \
  --instance-ids i-0123456789abcdef0
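Rather than polling `describe-instance-status` by hand, scripts can block on the built-in waiter until both status checks pass. A minimal sketch; the wrapper name `wait_until_healthy` is an assumption.

```shell
# Block until the instance passes both system and instance status checks.
# Usage: wait_until_healthy <instance-id>
wait_until_healthy() {
  aws ec2 wait instance-status-ok --instance-ids "$1"
}

# Example:
# wait_until_healthy i-0123456789abcdef0 && echo "instance is ready"
```

The waiter exits with a non-zero status if the checks do not pass within its polling limit, so it composes well with `&&`.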

Tagging Operations

# Create tags
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=Name,Value=WebServer Key=Environment,Value=Production

# Delete tags
aws ec2 delete-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=OldTag

Console Access and Troubleshooting

# Get console output
aws ec2 get-console-output \
  --instance-id i-0123456789abcdef0

# Get console screenshot
aws ec2 get-console-screenshot \
  --instance-id i-0123456789abcdef0

# Get password data (Windows)
aws ec2 get-password-data \
  --instance-id i-0123456789abcdef0 \
  --priv-launch-key MyKeyPair.pem
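The console screenshot comes back as base64-encoded JPG data in the `ImageData` field, so it has to be decoded before viewing. One way to do that, assuming a `base64` utility that accepts `--decode` (the function name and output path are illustrative):

```shell
# Save an instance's console screenshot to a local JPG file.
# Usage: save_console_screenshot <instance-id> <output-file>
save_console_screenshot() {
  aws ec2 get-console-screenshot \
    --instance-id "$1" \
    --query ImageData \
    --output text | base64 --decode > "$2"
}

# Example:
# save_console_screenshot i-0123456789abcdef0 console.jpg
```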

Cost Optimization Best Practices

Right-Sizing

Use AWS Compute Optimizer for recommendations
Monitor CloudWatch metrics for actual utilization
Start with burstable instances (T3/T4g) for variable workloads
Use AWS Cost Explorer to identify underutilized instances
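To ground a right-sizing decision in actual utilization, daily average and peak CPU can be pulled from CloudWatch. This is a sketch only; the function name `cpu_stats` and the daily granularity are assumptions, and the caller supplies the time window.

```shell
# Daily average and peak CPU for one instance over a caller-supplied window.
# Usage: cpu_stats <instance-id> <start-iso8601> <end-iso8601>
cpu_stats() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$2" \
    --end-time "$3" \
    --period 86400 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average,Maximum]' \
    --output table
}

# Example (two-week window):
# cpu_stats i-0123456789abcdef0 2025-01-01T00:00:00Z 2025-01-15T00:00:00Z
```

CPU alone does not show memory pressure; combine it with the agent's memory metric before downsizing.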

Instance Selection

Prefer Graviton instances (T4g, M7g, C7g) for up to 40% better price-performance
Use AMD instances (T3a, M5a, C5a) for 10% cost savings
Consider Spot instances for fault-tolerant workloads (up to 90% savings)
Implement Savings Plans for committed usage (up to 72% savings)
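When evaluating Spot for a workload, recent prices per Availability Zone give a concrete comparison point against On-Demand. A minimal sketch; the function name `spot_prices`, the `Linux/UNIX` product, and the ten-row limit are assumptions.

```shell
# Show recent Spot prices per Availability Zone for one instance type.
# Usage: spot_prices <instance-type>
spot_prices() {
  aws ec2 describe-spot-price-history \
    --instance-types "$1" \
    --product-descriptions "Linux/UNIX" \
    --max-items 10 \
    --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice,Timestamp]' \
    --output table
}

# Example:
# spot_prices c7g.xlarge
```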

Storage Optimization

Use gp3 instead of gp2 (about 20% cheaper per GB, with a 3,000 IOPS and 125 MB/s baseline that does not depend on volume size)
Delete unused EBS volumes and snapshots
Implement lifecycle policies for snapshot retention
Use S3 for infrequently accessed data
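Two of the storage points above translate directly into CLI calls. The sketch below lists volumes in the `available` state (not attached to any instance) and converts a volume to gp3; the function names are hypothetical.

```shell
# List EBS volumes that are not attached to any instance.
unattached_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
    --output table
}

# Change an existing volume's type to gp3.
# Usage: migrate_to_gp3 <volume-id>
migrate_to_gp3() {
  aws ec2 modify-volume --volume-id "$1" --volume-type gp3
}

# Example:
# unattached_volumes
# migrate_to_gp3 vol-0123456789abcdef0
```

Snapshot a volume before deleting it if there is any chance the data is still needed.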

Auto Scaling Configuration

Set appropriate min/max/desired capacity
Use target tracking for dynamic scaling
Implement scheduled scaling for predictable patterns
Configure scale-in protection for long-running tasks
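Scale-in protection can be toggled per instance, for example around a long-running job. A sketch under the assumption that the job knows its own group name and instance ID (function names are illustrative):

```shell
# Protect an instance from being terminated by scale-in.
# Usage: protect_from_scale_in <asg-name> <instance-id>
protect_from_scale_in() {
  aws autoscaling set-instance-protection \
    --auto-scaling-group-name "$1" \
    --instance-ids "$2" \
    --protected-from-scale-in
}

# Remove the protection once the task has finished.
# Usage: release_scale_in_protection <asg-name> <instance-id>
release_scale_in_protection() {
  aws autoscaling set-instance-protection \
    --auto-scaling-group-name "$1" \
    --instance-ids "$2" \
    --no-protected-from-scale-in
}
```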

Monitoring and Cleanup

Tag all resources for cost allocation
Set up billing alerts
Regularly review and terminate unused instances
Use AWS Trusted Advisor for optimization recommendations
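A billing alert can be expressed as an ordinary CloudWatch alarm on the `EstimatedCharges` metric. The sketch below assumes billing alerts are already enabled in the account's billing preferences and that the SNS topic lives in us-east-1, where billing metrics are published; the function and alarm names are illustrative.

```shell
# Alarm when estimated month-to-date charges exceed a threshold in USD.
# Usage: create_billing_alarm <threshold-usd> <sns-topic-arn>
create_billing_alarm() {
  aws cloudwatch put-metric-alarm \
    --region us-east-1 \
    --alarm-name "estimated-charges-over-$1-usd" \
    --namespace AWS/Billing \
    --metric-name EstimatedCharges \
    --dimensions Name=Currency,Value=USD \
    --statistic Maximum \
    --period 21600 \
    --evaluation-periods 1 \
    --threshold "$1" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}

# Example (hypothetical topic ARN):
# create_billing_alarm 100 arn:aws:sns:us-east-1:123456789012:billing-alerts
```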

Security Best Practices

Network Security

Deploy instances in private subnets
Use security groups with least privilege
Implement Network ACLs for subnet-level filtering
Enable VPC Flow Logs for traffic analysis
Use AWS PrivateLink for service access
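One practical least-privilege check is to look for security groups that may expose SSH to the internet. The sketch below is a rough first pass: the filters are evaluated per group, so a group can match when different rules satisfy different filters. Treat the output as candidates to review, not a verdict. The function name `open_ssh_groups` is hypothetical.

```shell
# List security groups that reference 0.0.0.0/0 and port 22 in inbound rules.
open_ssh_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
              Name=ip-permission.from-port,Values=22 \
              Name=ip-permission.to-port,Values=22 \
    --query 'SecurityGroups[].[GroupId,GroupName,VpcId]' \
    --output table
}

# Example:
# open_ssh_groups
```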

Access Control

Use IAM roles instead of access keys
Implement least privilege IAM policies
Enable MFA for privileged operations
Use Systems Manager Session Manager instead of SSH (no key management)
Rotate SSH keys regularly
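Connecting through Session Manager looks like the sketch below. It assumes the Session Manager plugin is installed locally, the SSM Agent is running on the instance, and the instance role allows Systems Manager access; the function names are illustrative.

```shell
# Check whether the instance is registered with Systems Manager and reachable.
# Usage: ssm_ping_status <instance-id>
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[].PingStatus' \
    --output text
}

# Open an interactive shell without SSH keys or an open port 22.
# Usage: ssm_connect <instance-id>
ssm_connect() {
  aws ssm start-session --target "$1"
}
```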

Data Protection

Enable EBS encryption by default
Encrypt snapshots
Use encrypted AMIs
Implement backup strategies
Enable termination protection for critical instances
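Two of these protections are single CLI calls. A minimal sketch (function names are hypothetical); note that EBS encryption by default is a per-region setting, so it has to be enabled in each region you use.

```shell
# Turn on EBS encryption by default in one region, then confirm the setting.
# Usage: enable_default_ebs_encryption <region>
enable_default_ebs_encryption() {
  aws ec2 enable-ebs-encryption-by-default --region "$1" &&
  aws ec2 get-ebs-encryption-by-default --region "$1"
}

# Block API and console termination of a critical instance.
# Usage: enable_termination_protection <instance-id>
enable_termination_protection() {
  aws ec2 modify-instance-attribute \
    --instance-id "$1" \
    --disable-api-termination
}
```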

Instance Hardening

Keep OS and applications updated
Use AWS Systems Manager Patch Manager
Implement host-based firewalls
Disable unnecessary services
Use IMDSv2 for metadata access
Enable CloudWatch Logs for audit trails
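Enforcing IMDSv2 means requiring a session token for every metadata request. The sketch below shows both sides: the API call that enforces it, and a token-based read from inside the instance. Function names and the 21600-second token TTL are illustrative choices.

```shell
# Require IMDSv2 (session tokens) on an existing instance.
# Usage: require_imdsv2 <instance-id>
require_imdsv2() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-tokens required \
    --http-endpoint enabled
}

# From inside the instance: fetch a session token, then read metadata with it.
# Usage: imds_get <metadata-path>
imds_get() {
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Example (on the instance):
# imds_get instance-id
```

Check that SDKs and agents on the instance support IMDSv2 before setting tokens to required, since older clients that only speak IMDSv1 will lose metadata access.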

Monitoring and Compliance

Enable CloudTrail for API logging
Use AWS Config for compliance monitoring
Implement AWS Security Hub
Set up CloudWatch alarms for security events
Regular security assessments and penetration testing
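Once CloudTrail is recording, recent management events can be queried from the CLI, for example to see who terminated instances. A minimal sketch; the function name `recent_api_calls` and the default of ten results are assumptions.

```shell
# List recent CloudTrail management events for one API call name.
# Usage: recent_api_calls <event-name> [max-results]
recent_api_calls() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$1" \
    --max-results "${2:-10}" \
    --query 'Events[].[EventTime,Username,EventName]' \
    --output table
}

# Example:
# recent_api_calls TerminateInstances 20
```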

This reference covers the working details of day-to-day AWS EC2 operations in a structured, point-wise format for quick lookup.
