Are you monitoring the network utilization of your EC2 instances? Why not? The network is one of the scarce resources that limit your workload's maximum throughput.
I've debugged performance problems in a lot of infrastructures during the last 12 months. In most of these scenarios, the network capabilities of EC2 or RDS instances were the bottleneck causing trouble. That is why I want to share with you how to monitor the network utilization of EC2 instances.
To monitor the network utilization of an EC2 instance, we need to solve two challenges.
To monitor the network utilization of your EC2 instance, you first need to answer the following question: what are the baseline and maximum network throughput of your EC2 instance? Unfortunately, AWS does not provide accurate information about the network performance of most instance types. For example, AWS promises "Moderate" network performance for a t2.xlarge instance or "Up to 10 Gbps" for other instance types.
This information is not satisfactory. That is why I ran a network performance benchmark and published the results at EC2 Network Performance Cheat Sheet. The results are astonishing.
An m5.large instance provides 10.04 Gbit/s for a few minutes only. Afterward, the baseline network performance for an m5.large instance is around 0.74 Gbit/s. The results for other instance types look similar.
The EC2 Network Performance Cheat Sheet gives you an estimation for the baseline and maximum network throughput of your EC2 instance which allows you to define a threshold for monitoring.
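As a rough sketch of how such a threshold could be derived, the following Python snippet applies the 80% rule of thumb suggested later in this post to the m5.large baseline mentioned above. The function name and the default fraction are illustrative assumptions, not part of any AWS API.

```python
def alarm_threshold_gbit(baseline_gbit_per_s: float, fraction: float = 0.8) -> float:
    """Return an alarm threshold in Gbit/s as a fraction of the baseline.

    Both the helper and the 0.8 default are assumptions for illustration.
    """
    if baseline_gbit_per_s <= 0:
        raise ValueError("baseline must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round to three decimals to keep the value readable in an alarm definition.
    return round(baseline_gbit_per_s * fraction, 3)


# Baseline of an m5.large according to the benchmark: about 0.74 Gbit/s.
m5_large_threshold = alarm_threshold_gbit(0.74)
print(m5_large_threshold)
```

Look up the baseline for your own instance type in the cheat sheet and pass it in instead of the m5.large value.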
Fine, we have solved challenge #1.
Each EC2 instance reports various metrics to CloudWatch. The metrics NetworkIn and NetworkOut collect the number of bytes received and sent out on all network interfaces by the instance. To calculate the total network utilization of your EC2 instance, you need to add up both metrics.
Pick one of the following options to create a CloudWatch alarm monitoring the total network utilization of your EC2 instance:
- Use the AWS Management Console to create the CloudWatch alarm manually.
- Use CloudFormation to create the CloudWatch alarm with Infrastructure as Code.
- Use marbot's Jump Start to create the CloudWatch alarm.
Log into the AWS Management Console and go to CloudWatch. Select Alarms from the sub-navigation and click the Create Alarm button. The wizard shown in the following screenshot appears. Click the Select metric button.
Search for the NetworkIn and NetworkOut metrics of your EC2 instance and select them both. After doing so, select the Graphed metrics tab.
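Add a math expression.
- Type in an id for the expression, for example total.
- Type in the expression that adds up both metrics and converts the result to Gbit/s, for example (m1+m2)/300/1000/1000/1000*8, where m1 and m2 stand for the ids the console assigned to the NetworkIn and NetworkOut metrics.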
Let me quickly explain the math expression:
- Add up NetworkIn and NetworkOut.
- Divide by 300 to convert from 5 minutes to 1 second.
- Divide by 1000 three times and multiply by 8 (/1000/1000/1000*8) to convert bytes to Gbit.
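To make the conversion concrete, here is a minimal Python sketch of the same arithmetic. The function name and the sample byte counts are made up for illustration. The 300-second period matches the 5-minute datapoints used in this post.

```python
def network_gbit_per_s(network_in_bytes: float, network_out_bytes: float,
                       period_seconds: int = 300) -> float:
    """Convert summed NetworkIn/NetworkOut bytes per period into Gbit/s."""
    total_bytes = network_in_bytes + network_out_bytes   # add up both metrics
    bytes_per_second = total_bytes / period_seconds      # 5 minutes -> 1 second
    return bytes_per_second / 1000 / 1000 / 1000 * 8     # bytes -> Gbit


# Hypothetical datapoint: 10 GB received and 5 GB sent within 5 minutes.
utilization = network_gbit_per_s(10_000_000_000, 5_000_000_000)
print(f"{utilization:.2f} Gbit/s")  # 0.40 Gbit/s
```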
Make sure you have selected only the math expression before you click the Select metric button.
Finally, set up the alarm.
- Type in a name and description.
- Define the threshold. For example, 80% of the baseline network performance listed in the EC2 Network Performance Cheat Sheet.
- To avoid alarms from short network utilization spikes, configure 8 out of 12 datapoints, which translates to 40 minutes within an hour.
- Click the Create Alarm button.
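The effect of the 8 out of 12 datapoints setting can be illustrated with a simplified Python model. This is only a sketch of the idea, not CloudWatch's actual evaluation logic. It ignores missing data, and the function name and sample values are assumptions.

```python
def breaches_alarm(datapoints, threshold, evaluation_periods=12, datapoints_to_alarm=8):
    """Simplified M-out-of-N check over the most recent datapoints."""
    window = list(datapoints)[-evaluation_periods:]
    # Mirrors GreaterThanThreshold: only values strictly above the threshold count.
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


threshold = 0.592  # Gbit/s, 80% of a 0.74 Gbit/s baseline

# Two 5-minute spikes within the last hour: no alarm.
short_spike = [0.1] * 10 + [0.9] * 2
# Eight of the last twelve datapoints (40 minutes) above the threshold: alarm.
sustained_load = [0.1] * 4 + [0.9] * 8

print(breaches_alarm(short_spike, threshold))     # False
print(breaches_alarm(sustained_load, threshold))  # True
```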
Fine, you have set up a CloudWatch alarm monitoring the network utilization of your EC2 instance.
Instead of going through this process manually, you can create CloudWatch alarms in an automated way with CloudFormation.
The following snippet shows a CloudFormation template setting up a CloudWatch alarm monitoring the network utilization of an EC2 instance.
You need to modify the Threshold. I suggest 80% of the network baseline performance as listed in the EC2 Network Performance Cheat Sheet.
    AWSTemplateFormatVersion: '2010-09-09'
    Parameters:
      Topic:
        Type: String
      InstanceId:
        Type: String
    Resources:
      NetworkUtilizationTooHighAlarm:
        Type: 'AWS::CloudWatch::Alarm'
        Properties:
          AlarmDescription: 'EC2 High Network Utilization'
          Metrics:
          - Id: in
            Label: NetworkIn
            MetricStat:
              Metric:
                Namespace: 'AWS/EC2'
                MetricName: NetworkIn
                Dimensions:
                - Name: InstanceId
                  Value: !Ref InstanceId
              Period: 300
              Stat: Sum
              Unit: Bytes
            ReturnData: false
          - Id: out
            Label: NetworkOut
            MetricStat:
              Metric:
                Namespace: 'AWS/EC2'
                MetricName: NetworkOut
                Dimensions:
                - Name: InstanceId
                  Value: !Ref InstanceId
              Period: 300
              Stat: Sum
              Unit: Bytes
            ReturnData: false
          - Id: total
            Label: 'NetworkTotal'
            Expression: '(in+out)/300/1000/1000/1000*8' # Gbit/s
            ReturnData: true
          ComparisonOperator: GreaterThanThreshold
          EvaluationPeriods: 12
          DatapointsToAlarm: 8
          Threshold: '0.048' # Gbit/s
          AlarmActions:
          - !Ref Topic
          OKActions:
          - !Ref Topic
          TreatMissingData: notBreaching
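If you prefer a script over a template, roughly the same alarm definition can be expressed as arguments for boto3's put_metric_alarm call. The sketch below only builds the arguments, so it runs without AWS credentials. The helper name, the alarm name, the instance ID, and the topic ARN are placeholders chosen for illustration.

```python
def build_network_alarm(instance_id: str, topic_arn: str, threshold_gbit: float) -> dict:
    """Build keyword arguments mirroring the CloudFormation template above."""

    def metric(metric_id: str, metric_name: str) -> dict:
        # One 5-minute Sum of NetworkIn or NetworkOut for the given instance.
        return {
            "Id": metric_id,
            "Label": metric_name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": 300,
                "Stat": "Sum",
                "Unit": "Bytes",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"network-utilization-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": "EC2 High Network Utilization",
        "Metrics": [
            metric("in", "NetworkIn"),
            metric("out", "NetworkOut"),
            {
                "Id": "total",
                "Label": "NetworkTotal",
                "Expression": "(in+out)/300/1000/1000/1000*8",  # Gbit/s
                "ReturnData": True,
            },
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 12,
        "DatapointsToAlarm": 8,
        "Threshold": threshold_gbit,
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
        "TreatMissingData": "notBreaching",
    }


# Placeholder instance ID and SNS topic ARN; replace them with your own values.
alarm = build_network_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:alerts",
    0.592,
)
print(alarm["Metrics"][2]["Expression"])

# With credentials configured, the alarm could then be created like this:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```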
Are you looking for an even simpler way to monitor the network utilization of your EC2 instance?
Our chatbot marbot escalates alarms among the members of your DevOps team. Luckily, marbot provides built-in Jump Starts which simplify creating CloudWatch alarms for your cloud resources. The Jump Start for EC2 instances sets up monitoring for network utilization as well.
- Add marbot to your Slack workspace.
- Invite marbot to a channel.
- Follow the installation instructions.
- Ask marbot for help monitoring your EC2 instance:
@marbot Help me to monitor my EC2 instance.
- Select EC2 instances as the monitoring goal and follow the Jump Start wizard as shown in the following screenshot.
It couldn't be easier!
Monitoring the network utilization of your EC2 instance is essential, as the network is a limited resource. The instance type affects maximum and baseline performance. Your EC2 instance might not be able to provide the maximum network performance for more than 5 to 30 minutes. Therefore, use the baseline performance to define the alarm threshold. Use the EC2 Network Performance Cheat Sheet to get an estimate of the network performance of your EC2 instance.