At TramitApp Control Horario, steady month-over-month growth has pushed us to move our platform from a hybrid cloud to AWS "all in" for its scalability benefits.
For example, you can set up alarms so that when an EC2 instance averages X% CPU for Y minutes, another EC2 instance is spun up to help cope with the load.
Setup
Create a script that installs the CloudWatch monitoring scripts and sets up a cron job that posts metrics every 5 minutes. In our case those metrics are memory used, memory utilization, and disk space utilization on two volumes, / and /data.
#!/bin/bash
sudo yum install -y perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https perl-Digest-SHA.x86_64
cd $HOME
wget https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip
unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip
(crontab -l 2>/dev/null; echo "*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-used --mem-util --disk-space-util --disk-path=/ --disk-path=/data --from-cron") | crontab -
AWS Configure
Create a script that writes a default AWS CLI configuration (the equivalent of running aws configure) for both ec2-user and root, so the alarms are created in the instance's own REGION.
#!/bin/sh
REGION=$(ec2-metadata -z | grep -Po "(us|sa|eu|ap)-(north|south|central)?(east|west)?-[0-9]+")
if [ ! -d /home/ec2-user/.aws/ ]; then
mkdir -p /home/ec2-user/.aws/
fi
if [ ! -d /root/.aws/ ]; then
mkdir -p /root/.aws/
fi
echo "[default]"> /home/ec2-user/.aws/config
echo "region = $REGION" >>/home/ec2-user/.aws/config
echo "[default]"> /root/.aws/config
echo "region = $REGION" >>/root/.aws/config
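The grep pattern in the script above only recognizes the us, sa, eu and ap prefixes, so an instance in a region such as ca-central-1 would end up with an empty REGION. A minimal sketch of an alternative, assuming the availability zone name is always the region name plus one trailing lowercase letter (true for standard zones, not for Local Zones); the function name region_from_az is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: derive the region from an availability zone name
# by dropping the single trailing zone letter (eu-west-1a -> eu-west-1).
region_from_az() {
  printf '%s\n' "${1%[a-z]}"
}

# On Amazon Linux, "ec2-metadata -z" prints "placement: eu-west-1a",
# so the zone is the second space-separated field:
# REGION=$(region_from_az "$(ec2-metadata -z | cut -d " " -f 2)")
```

This avoids having to extend the pattern every time a new region prefix is needed.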
Create alarms script
In my case I use Amazon Linux, which ships the ec2-metadata command. If you use another distro, you can always curl http://169.254.169.254/latest/dynamic/instance-identity/document from the EC2 instance and read the region, instance ID and private IP from the JSON it returns.
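For distros without ec2-metadata, here is a rough sketch of reading fields from the instance identity document with curl and sed. The function names identity_field and fetch_identity_document are hypothetical, and the token request assumes the instance uses IMDSv2 (with plain IMDSv1 the second curl alone is enough). The sed expression is a simple pattern match for flat string fields, not a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read the instance identity document (JSON) on stdin
# and print the value of the given top-level string field.
identity_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Fetch the document. IMDSv2 needs a session token first; this is an
# assumption about how the instance metadata service is configured.
fetch_identity_document() {
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/dynamic/instance-identity/document"
}

# Usage on an instance:
# DOC=$(fetch_identity_document)
# REGION=$(printf '%s\n' "$DOC" | identity_field region)
# INSTANCE_ID=$(printf '%s\n' "$DOC" | identity_field instanceId)
# INSTANCE_PRIVATE_IP=$(printf '%s\n' "$DOC" | identity_field privateIp)
```

If jq is available on the instance, it would be a sturdier choice than sed for the parsing step.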
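This example creates four alarms (CPU, root disk, data disk and memory) and sends both the alarm and the OK notifications to the SNS topic of the instance's region: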
#!/bin/sh
REGION=$(ec2-metadata -z | grep -Po "(us|sa|eu|ap)-(north|south|central)?(east|west)?-[0-9]+")
if [ "$REGION" = "eu-west-1" ]; then
SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION"
fi
if [ "$REGION" = "eu-west-2" ]; then
SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION"
fi
if [ "$REGION" = "eu-west-3" ]; then
SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION"
fi
INSTANCE_ID=$(ec2-metadata --instance-id | cut -d " " -f 2)
INSTANCE_PRIVATE_IP=$(ec2-metadata -o | cut -d " " -f 2)
PRIMARY_PUBLIC_IP_ADDRESS=$(ec2-metadata -v | cut -d " " -f 2)
ROOT_DISK_THRESHOLD=75
DATA_DISK_THRESHOLD=80
MEMORY_THRESHOLD=75
CPU_THRESHOLD=75
FIVE_MINUTES_PERIOD=300
FIFTEEN_MINUTES_PERIOD=900
ROOT_DEVICE=/dev/nvme0n1p1
DATA_DEVICE=/dev/nvme1n1
ROOT_PATH=/
DATA_PATH=/data
echo "Setting up ${INSTANCE_PRIVATE_IP}-cpu-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name "${INSTANCE_PRIVATE_IP}-cpu-utilization" \
--alarm-description "Alarm when CPU exceeds $CPU_THRESHOLD percent" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period ${FIFTEEN_MINUTES_PERIOD} \
--threshold ${CPU_THRESHOLD} \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=${INSTANCE_ID} \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent
echo "Setting up $INSTANCE_PRIVATE_IP-root-disk-space-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-root-disk-space-utilization \
--alarm-description "Alarm when root disk space exceeds $ROOT_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIVE_MINUTES_PERIOD \
--threshold $ROOT_DISK_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$ROOT_DEVICE Name=InstanceId,Value=$INSTANCE_ID Name=MountPath,Value=$ROOT_PATH \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent
echo "Setting up $INSTANCE_PRIVATE_IP-data-disk-space-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-data-disk-space-utilization \
--alarm-description "Alarm when data disk space exceeds $DATA_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIVE_MINUTES_PERIOD \
--threshold $DATA_DISK_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$DATA_DEVICE Name=InstanceId,Value=$INSTANCE_ID Name=MountPath,Value=$DATA_PATH \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent
echo "Setting up $INSTANCE_PRIVATE_IP-memory-usage-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-memory-usage-utilization \
--alarm-description "Alarm when memory exceeds $MEMORY_THRESHOLD percent" \
--metric-name MemoryUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIFTEEN_MINUTES_PERIOD \
--threshold $MEMORY_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=$INSTANCE_ID \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent
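The four alarms above differ only in name, description, metric, namespace, period, threshold and dimensions. One possible way to cut the repetition is a small wrapper function; this is a sketch, not part of the original script. The names put_percent_alarm and AWS_CMD are hypothetical, and the flags are exactly the ones already used above. Setting AWS_CMD=echo gives a dry run that prints the command instead of calling AWS.

```shell
#!/bin/sh
# Hypothetical wrapper around the flags shared by the alarms above.
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo for a dry run.
AWS_CMD=${AWS_CMD:-aws}

# put_percent_alarm NAME DESCRIPTION METRIC NAMESPACE PERIOD THRESHOLD DIMENSION...
put_percent_alarm() {
  name=$1 description=$2 metric=$3 namespace=$4 period=$5 threshold=$6
  shift 6
  # Whatever arguments remain are passed through as the dimensions.
  "$AWS_CMD" cloudwatch put-metric-alarm \
    --alarm-name "$name" \
    --alarm-description "$description" \
    --metric-name "$metric" \
    --namespace "$namespace" \
    --statistic Average \
    --period "$period" \
    --threshold "$threshold" \
    --treat-missing-data breaching \
    --comparison-operator GreaterThanThreshold \
    --dimensions "$@" \
    --evaluation-periods 1 \
    --alarm-actions "$SNS_TOPIC" \
    --ok-actions "$SNS_TOPIC" \
    --unit Percent
}

# Example call, reusing the variables defined in the script above:
# put_percent_alarm "${INSTANCE_PRIVATE_IP}-memory-usage-utilization" \
#   "Alarm when memory exceeds $MEMORY_THRESHOLD percent" \
#   MemoryUtilization System/Linux "$FIFTEEN_MINUTES_PERIOD" "$MEMORY_THRESHOLD" \
#   "Name=InstanceId,Value=$INSTANCE_ID"
```

Passing the threshold once and building the description from it at the call site also makes it harder for the description and the actual threshold to drift apart.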
Pro Tip
If you create an AMI from this instance and set up a boot service that runs these three scripts (just make sure the first one only runs once, otherwise the cron entry is added again on every boot), you will have the alarms without having to set them up manually.
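One way to make the setup script safe to run on every boot is to register the cron entry idempotently, so a second run changes nothing. A minimal sketch, assuming the entry is compared as an exact line; the function name merge_cron_entry is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read the current crontab on stdin and print it back,
# appending ENTRY only if an identical line is not already present.
merge_cron_entry() {
  entry=$1
  current=$(cat)
  if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$current"          # already registered: unchanged
  elif [ -z "$current" ]; then
    printf '%s\n' "$entry"            # empty crontab: entry only
  else
    printf '%s\n%s\n' "$current" "$entry"
  fi
}

# Usage in the setup script, replacing the plain append:
# ENTRY="*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-used --mem-util --disk-space-util --disk-path=/ --disk-path=/data --from-cron"
# crontab -l 2>/dev/null | merge_cron_entry "$ENTRY" | crontab -
```

The download and unzip steps would still need their own guard, for example skipping them when ~/aws-scripts-mon already exists.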