Amazon Elastic Compute Cloud (EC2) has more than 13 years of public history and is one of the oldest AWS services. EC2 is a mature service that reinvented itself many times:
- From EC2 classic to Amazon VPC.
- From SSH access to AWS SSM Session Manager.
- From self-managed backup solution to AWS Backup.
- More powerful instance families.
- New pricing options.
- And much more.
But there are still two approaches when it comes to managing EC2 instances: mutable and immutable.
A mutable EC2 instance is created once and then lives for many years. Humans log on to the machine (e.g., via SSH or RDP) and do their work. OS updates are applied to the running system; new packages are installed from time to time; configuration files are modified when needed. Deployments happen while the EC2 instance is running.
An immutable EC2 instance is never changed after creation. If you want to update the OS, you create a new EC2 instance that starts from a fresher image (AMI). If new packages are needed, a new AMI is created that contains those packages. If a new deployment is necessary, a new AMI is built and rolled out by replacing the EC2 instances. The EC2 instance is ephemeral and must not be used to persist data!
In this blog post, I will focus on the mutable approach and show you how to solve everyday challenges with the tools and features that AWS provides in 2020:
- Patching
- Backup and Restore
- Remote Access
- Software Deployments
- Monitoring
- Logs
- Single Point of Failure
If you prefer the immutable approach, Packer by HashiCorp is still the best tool for creating AMIs.
You can find a link to a CloudFormation template with the implementation of all best practices at the end of the article.
Patching
As soon as you launch an EC2 instance, you have to ask yourself one question: How can I keep this machine up-to-date? The best option today is provided by AWS Systems Manager (SSM). A combination of the following capabilities (aka Patch Manager) allows us to patch EC2 instances during a predefined window in a configurable way:
- Patch Baseline: Defines which patches are approved for installation on your instance (e.g., install critical patches 7 days after they are released).
- Document AWS-RunPatchBaseline: The script that installs the patches approved by the baseline.
- Maintenance Window: Executes the document on a set of EC2 instances within a recurring time window.
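To make the patch baseline idea concrete, here is a minimal Python sketch that builds the request parameters for a custom baseline approving critical patches seven days after release. The baseline name and the choice of a single SEVERITY filter are assumptions for illustration; this is not the AWS default baseline. Sending the request assumes boto3 is installed and AWS credentials are configured.

```python
def patch_baseline_params(name="example-al2-baseline", approve_after_days=7):
    """Build parameters for the SSM CreatePatchBaseline call.

    The name is hypothetical; adjust the filters to your own policy.
    """
    return {
        "Name": name,
        "OperatingSystem": "AMAZON_LINUX_2",
        "ApprovalRules": {
            "PatchRules": [
                {
                    "PatchFilterGroup": {
                        "PatchFilters": [
                            # Only patches rated Critical (assumed policy).
                            {"Key": "SEVERITY", "Values": ["Critical"]},
                        ]
                    },
                    # Auto-approve this many days after release.
                    "ApproveAfterDays": approve_after_days,
                }
            ]
        },
    }


def create_patch_baseline(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ssm").create_patch_baseline(**params)
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test the policy before anything is created in your account.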
The default patch baseline for Amazon Linux 2 looks like this:
The maintenance window is configured to run every day at 12:35 UTC. (This is one of the few places in AWS where you can set your time zone!)
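A daily 12:35 UTC schedule could be expressed roughly as follows. This is a sketch under assumptions: the window name, duration, and cutoff are made up, and the six-field cron form (minutes, hours, day-of-month, month, day-of-week, year) reflects my reading of the Systems Manager cron reference, so verify it before relying on it.

```python
def daily_cron(hour, minute):
    """Return a cron expression that fires once a day at hour:minute.

    Assumed field order: minutes hours day-of-month month day-of-week year.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour} ? * * *)"


def maintenance_window_params(name="example-daily-patching"):
    """Build parameters for the SSM CreateMaintenanceWindow call."""
    return {
        "Name": name,                      # hypothetical name
        "Schedule": daily_cron(12, 35),
        "ScheduleTimezone": "UTC",         # IANA time zone name
        "Duration": 2,                     # hours the window stays open (assumed)
        "Cutoff": 1,                       # stop starting tasks 1 hour before the end (assumed)
        "AllowUnassociatedTargets": False,
    }


def create_maintenance_window(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3

    return boto3.client("ssm").create_maintenance_window(**params)
```

The window alone does nothing; you would still register targets and a task that runs AWS-RunPatchBaseline.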
You also get full insights into the executions of the maintenance window.
Use CloudWatch Event Rules to subscribe to failures. Our Slack bot marbot can set up the CloudWatch Event Rules for you.
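If you want to create such a rule yourself, the event pattern might look like the sketch below. The rule name is hypothetical, and the detail-type string and status values are what I understand Systems Manager to emit for maintenance window executions; treat them as assumptions and compare them with a real event from your account.

```python
import json


def maintenance_window_failure_rule(name="example-patching-failed"):
    """Build parameters for the CloudWatch Events PutRule call."""
    pattern = {
        "source": ["aws.ssm"],
        # Assumed detail-type for maintenance window executions.
        "detail-type": ["Maintenance Window Execution State-change Notification"],
        # Assumed status values that indicate a failed run.
        "detail": {"status": ["FAILED", "TIMED_OUT"]},
    }
    return {
        "Name": name,
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }


def put_rule(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("events").put_rule(**params)
```

A rule without a target notifies nobody, so you would also attach a target such as an SNS topic.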
Backup and Restore
Mutable EC2 instances likely contain data that needs to be backed up. The best way to perform backups of EC2 instances is AWS Backup. AWS Backup allows us to back up EC2 instances during a predefined window and manages the lifecycle of a backup as well (e.g., delete backups after 30 days).
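A daily backup rule with a 30-day lifecycle could be sketched like this. The plan name, rule name, schedule, and window lengths are assumptions, and the vault name "Default" only works if such a vault exists in your account.

```python
def backup_plan(vault_name="Default", retention_days=30):
    """Build the BackupPlan document for the AWS Backup CreateBackupPlan call."""
    return {
        "BackupPlanName": "example-daily-ec2",        # hypothetical name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * * *)",  # daily at 05:00 UTC (assumed)
                "StartWindowMinutes": 60,             # job must start within 1 hour
                "CompletionWindowMinutes": 180,       # job must finish within 3 hours
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


def create_backup_plan(plan):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

The plan only defines when and how long; a separate backup selection assigns the EC2 instances to it.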
The following screenshot shows a list of daily backups. You can restore any of these backups right through AWS Backup.
You also get full insights into the backup jobs.
Use AWS Backup Events to subscribe to failures.
Keep in mind that EC2 backups performed by AWS Backup are "crash consistent". Writes not flushed to disk can cause data corruption.
Remote Access
To modify a mutable EC2 instance, you likely want to open an SSH/RDP connection to your instance. Remote access comes with several challenges:
- configuration of security groups
- distribution of credentials
- rotation of credentials
- SSH client needs to be installed and configured on your machine
The less painful approach is to use AWS SSM Session Manager. Session Manager is integrated into the AWS Management Console and can also be used in your terminal.
Keep in mind that your IAM permissions now also manage who can become root on any EC2 instance.
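Because IAM now decides who gets a shell, it can help to scope session access by instance tag. The following sketch generates such a policy document; the tag key and value are hypothetical, and the policy is deliberately incomplete (for example, it does not cover access to session documents), so treat it as a starting point only.

```python
import json


def start_session_policy(tag_key="Environment", tag_value="dev"):
    """Build an IAM policy that allows sessions only on tagged instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Only instances carrying this tag (hypothetical tag).
                    "StringLike": {f"ssm:resourceTag/{tag_key}": [tag_value]}
                },
            },
            {
                "Effect": "Allow",
                "Action": ["ssm:TerminateSession", "ssm:ResumeSession"],
                # Users may only end or resume their own sessions.
                "Resource": "arn:aws:ssm:*:*:session/${aws:username}-*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(start_session_policy(), indent=2))
```

With a policy like this attached, a user could open a shell on matching instances via `aws ssm start-session --target <instance-id>`, provided the Session Manager plugin is installed locally.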
Software Deployments
Deploying a new software release is a risky task. Instead of uploading a new release to the EC2 instance manually, I recommend using AWS CodeDeploy. CodeDeploy helps you to deploy your software in an automated way with automatic rollback if things go wrong.
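As a rough sketch, triggering a deployment with automatic rollback on failure could look like this. The application name, deployment group name, and S3 location are placeholders, and the sketch assumes the CodeDeploy application, the deployment group, and the agent on the instance already exist.

```python
def deployment_params(bucket, key):
    """Build parameters for the CodeDeploy CreateDeployment call."""
    return {
        "applicationName": "example-app",        # hypothetical name
        "deploymentGroupName": "example-group",  # hypothetical name
        "revision": {
            "revisionType": "S3",
            "s3Location": {"bucket": bucket, "key": key, "bundleType": "zip"},
        },
        # Roll back to the last good revision if the deployment fails.
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE"],
        },
    }


def create_deployment(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("codedeploy").create_deployment(**params)
```

The release bundle itself needs an appspec.yml that tells the CodeDeploy agent where to copy files and which lifecycle scripts to run.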
Monitoring
A lot of useful information is published to CloudWatch by default:
- CPU utilization
- Network IO
- Disk IO
What information is missing?
- Memory
- Disk usage
The missing metrics can be collected with the Unified CloudWatch Agent, which is best installed via SSM.
Create CloudWatch Alarms to monitor if a metric reaches a threshold.
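Here is a hedged sketch of both pieces: an agent configuration that collects memory and disk usage, and the parameters for an alarm on memory usage. The alarm name, threshold, and evaluation settings are assumptions. The alarm's single InstanceId dimension only matches if the agent is configured with the `append_dimensions` block shown here.

```python
import json


def agent_metrics_config():
    """Build a CloudWatch agent config that collects memory and disk usage."""
    return {
        "metrics": {
            # Tag every metric with the instance ID instead of the hostname.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        }
    }


def memory_alarm_params(instance_id, threshold_percent=90.0):
    """Build parameters for the CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": f"example-memory-{instance_id}",  # hypothetical name
        "Namespace": "CWAgent",                        # agent's default namespace
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per data point (assumed)
        "EvaluationPeriods": 2,     # two consecutive breaches (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def put_metric_alarm(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3

    return boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    print(json.dumps(agent_metrics_config(), indent=2))
```

An alarm on disk usage needs more care, because the agent publishes disk metrics with additional dimensions such as the mount path.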
Logs
Mutable EC2 instances are around for some time. You can search through the logs as usual: Open a remote session and open the log files in your editor of choice.
If you want to centralize your logs, I recommend shipping them to CloudWatch Logs. The Unified CloudWatch Agent that you learned about before can pipe the logs from the EC2 instance to CloudWatch Logs. With CloudWatch Logs Insights, you can search and visualize the logs with ease.
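The sketch below shows what the logs part of the agent configuration and a simple Logs Insights query could look like. The log file path, the log group name, and the search term are assumptions chosen for illustration.

```python
import time


def agent_logs_config(file_path="/var/log/messages", log_group="example-ec2-messages"):
    """Build a CloudWatch agent config that ships one log file."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,
                            "log_group_name": log_group,   # hypothetical name
                            # The agent replaces this placeholder at runtime.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        }
    }


def insights_query_params(log_group, minutes=60, now=None):
    """Build parameters for the CloudWatch Logs StartQuery call."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,   # epoch seconds
        "endTime": end,
        "queryString": (
            "fields @timestamp, @message "
            "| filter @message like /error/ "
            "| sort @timestamp desc "
            "| limit 20"
        ),
    }


def start_query(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3

    return boto3.client("logs").start_query(**params)
```

StartQuery is asynchronous: it returns a query ID, and you fetch the matching lines afterwards with a separate results call.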
Single Point of Failure
Remember that a single EC2 instance is always a single point of failure (SPOF). You can limit the risk of a failing hypervisor by configuring automatic instance recovery. However, instance recovery does not protect your instance from Availability Zone outages.
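One common way to configure automatic recovery is a CloudWatch alarm on the system status check that triggers the EC2 recover action. A minimal sketch follows, with the alarm name and evaluation settings as assumptions; not every instance type supports the recover action.

```python
def recovery_alarm_params(instance_id, region):
    """Build PutMetricAlarm parameters that recover an instance on system failure."""
    return {
        "AlarmName": f"example-recover-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,               # seconds per data point (assumed)
        "EvaluationPeriods": 5,     # five failed minutes in a row (assumed)
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in action that moves the instance to healthy hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def put_metric_alarm(params):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("cloudwatch").put_metric_alarm(**params)
```

A recovered instance keeps its instance ID, private IP address, and EBS volumes, but anything held only in memory is lost.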
Keep in mind that the EC2 SLA does not cover single instances.
Summary
Managing a mutable EC2 instance comes with many responsibilities. In this post, I showed you how to solve everyday challenges by leveraging the latest and greatest capabilities of the AWS platform.
Find a full implementation codified into two CloudFormation templates (al2-mutable-public.yaml and al2-mutable-private.yaml) on GitHub: https://github.com/widdix/aws-cf-templates/tree/master/ec2