Hritik Raj

🚨 AWS 125: Guardian of the Cloud - Setting Up CloudWatch Alarms

AWS

📉 Proactive Monitoring: Catching CPU Spikes Before They Cause Downtime

Hey Cloud Builders 👋

Welcome to Day 25 of the #100DaysOfCloud Challenge!
Today, we are moving from infrastructure building to Infrastructure Observability. The Nautilus team needs to know the second our server is struggling. We are setting up a CloudWatch Alarm to monitor our CPU and alert us via SNS (Simple Notification Service) if things get too hot!

This task is part of my hands-on practice on the KodeKloud Engineer platform, which I highly recommend for anyone looking to master real-world DevOps scenarios.


🎯 Objective

  • Launch an Ubuntu-based t2.micro instance named devops-ec2.
  • Create a CloudWatch Alarm named devops-alarm.
  • Configure the alarm to trigger if CPU Utilization is >= 90% for a 5-minute window.
  • Link the alarm to the existing devops-sns-topic for instant notifications.

💡 Why Monitoring is Non-Negotiable

If a server crashes in the middle of the night and no one is alerted, your users are the ones who suffer.

🔹 Key Concepts

  • CloudWatch Metrics: AWS resources automatically send performance data (CPU, disk I/O, network) to CloudWatch at regular intervals. For EC2, that is every 5 minutes by default.

  • Thresholds & Periods: We don't want to be alerted for a 1-second spike. Averaging over a 5-minute period means the alarm fires only on sustained high CPU, not a momentary blip.

  • SNS Integration: CloudWatch finds the problem; SNS delivers the news. This combo allows for automated emails, SMS, or even Slack alerts.
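
To make the "period" idea concrete, here is a minimal sketch of the averaging logic. It is not how CloudWatch is implemented internally, just the arithmetic behind an Average statistic compared against a threshold. The one-minute readings are made-up numbers for illustration (per-minute datapoints would need detailed monitoring).

```python
from statistics import mean

THRESHOLD = 90.0  # percent, matching the objective for this task


def breaches(datapoints, threshold=THRESHOLD):
    """Return True if the average of one period's datapoints is >= threshold."""
    return mean(datapoints) >= threshold


# Hypothetical one-minute CPU readings (percent) inside a single 5-minute period.
brief_spike = [20.0, 25.0, 100.0, 22.0, 18.0]
sustained_load = [92.0, 95.0, 97.0, 91.0, 94.0]

print(breaches(brief_spike))     # average is 37.0, so no breach
print(breaches(sustained_load))  # average is 93.8, so the period breaches
```

A single 100% reading gets diluted by the quiet minutes around it, while a steady run above 90% does not.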


🛠️ Step-by-Step: The Monitoring Workflow

We’ll move logically from Provisioning → Metric Selection → Alarm Creation.


🔹 Phase A: Launch the Target Instance

  • Provision EC2: Launch a new Ubuntu instance.
  • Tagging: Name it devops-ec2.
  • The "Secret Sauce": Once the instance is launched, copy the Instance ID (e.g., i-0abcd1234ef567890).

⚠️ Lesson Learned: CloudWatch metrics are indexed by Instance ID, not by the Name tag. You’ll need this ID to find your metrics in the next step!
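
If you prefer scripting over clicking, Phase A could look roughly like this with boto3. This is a sketch under assumptions: the AMI ID is a placeholder you must replace with a real Ubuntu AMI for your region, and the helper names are my own. The boto3 import lives inside `launch_instance` so the parameter builder can be inspected without AWS credentials.

```python
def build_run_instances_params(ami_id, name="devops-ec2", instance_type="t2.micro"):
    """Build keyword arguments for the EC2 run_instances call."""
    return {
        "ImageId": ami_id,  # assumption: caller supplies a valid Ubuntu AMI ID
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # The Name tag is only a label; CloudWatch will key metrics on the ID.
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def launch_instance(ami_id):
    """Launch the instance and return its Instance ID (needs boto3 + credentials)."""
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_run_instances_params(ami_id))
    return response["Instances"][0]["InstanceId"]
```

The return value of `launch_instance` is the Instance ID you would otherwise copy from the console.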


🔹 Phase B: Configure the CloudWatch Metric

  • Navigate to CloudWatch: Open the CloudWatch console and go to Alarms > All alarms > Create alarm.

  • Select Metric: Click "Select metric" and navigate to EC2 > Per-Instance Metrics.

  • Filter by ID: Paste your Instance ID into the search bar to find the CPUUtilization metric for your specific server.
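
The same lookup can be sketched in code. The snippet below assumes boto3 and uses the CloudWatch `list_metrics` call filtered by the `InstanceId` dimension; the function names are hypothetical. Note that a freshly launched instance may need a few minutes before its metric shows up.

```python
def build_list_metrics_params(instance_id):
    """Build keyword arguments to find CPUUtilization for one instance."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
    }


def cpu_metric_exists(instance_id):
    """Return True once CloudWatch lists the metric (needs boto3 + credentials)."""
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.list_metrics(**build_list_metrics_params(instance_id))
    return len(response["Metrics"]) > 0
```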


🔹 Phase C: Define Alarm Conditions & Actions

  • Statistic: Set to Average.
  • Period: Set to 5 minutes.

  • Threshold Logic: Choose Static, then set "Whenever CPUUtilization is..." to Greater/Equal (>=) with a value of 90.

  • Configure Actions:
    • Under "Alarm state trigger," select In alarm.
    • Select an existing SNS topic: devops-sns-topic.
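
For reference, the console settings above map onto the boto3 `put_metric_alarm` call roughly as follows. Treat this as a sketch, not the lab's official solution: `EvaluationPeriods=1` is my assumption for "one 5-minute window", and the API wants the full topic ARN rather than just the name devops-sns-topic.

```python
def build_alarm_params(instance_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": "devops-alarm",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds, i.e. 5 minutes
        "EvaluationPeriods": 1,  # assumption: a single breaching period triggers
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # notified when state becomes ALARM
    }


def create_alarm(instance_id, sns_topic_arn):
    """Create or update the alarm (needs boto3 + credentials)."""
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_alarm_params(instance_id, sns_topic_arn))
```

A call might look like `create_alarm("i-0abcd1234ef567890", "arn:aws:sns:us-east-1:123456789012:devops-sns-topic")`, where both the instance ID and the account number in the ARN are placeholders.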



✅ Verify Success

  • Check the Dashboard: Once created, your alarm may briefly show "Insufficient data" until the first datapoints arrive, then settle on "OK" (assuming your CPU isn't already melting!).

  • Test the Flow: If this were a test environment, you could run a stress tool like stress-ng on the EC2 instance to force the CPU above 90% and watch the alarm turn 🔴 In Alarm.
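
If stress-ng isn't installed, a tiny busy loop can stand in for it. To be clear, this is a plain-Python substitute and not stress-ng itself; `burn_cpu` is a hypothetical helper that keeps one core busy for the requested time.

```python
import time


def burn_cpu(seconds):
    """Spin in a tight loop for roughly `seconds` seconds; return the loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pointless work whose only job is to occupy the CPU
    return iterations
```

On a single-vCPU t2.micro, something like `burn_cpu(600)` should keep the CPU pegged long enough to cover a full 5-minute period. Only do this on a throwaway test instance, and on multi-core machines you would need one process per core.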

📝 Key Takeaways

  • 🚀 Identification: Always have your Instance IDs ready when configuring monitoring.
  • 🕒 Basic vs. Detailed: Basic monitoring publishes metrics every 5 minutes (free). Detailed monitoring publishes them every 1 minute (paid).
  • 📣 Closed Loop: Monitoring is useless without an Action. Always ensure your SNS topic has active subscribers (emails confirmed).

🚫 Common Mistakes

  • Incorrect Metric: Selecting "Disk Read" instead of "CPU Utilization."
  • Threshold Mismatch: Setting a 1-minute period but expecting 5-minute averages.
  • Unconfirmed SNS: If a subscriber hasn't clicked the "Confirm subscription" link in the SNS confirmation email, they will never get the alert!
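
A quick way to catch that last mistake is to list the topic's subscriptions and look for pending ones. The sketch below assumes boto3 and relies on SNS reporting `PendingConfirmation` as the `SubscriptionArn` of an unconfirmed subscription; the function names are my own, and only the first page of results is checked.

```python
def pending_subscriptions(subscriptions):
    """Return the endpoints whose subscription is still awaiting confirmation."""
    return [
        sub["Endpoint"]
        for sub in subscriptions
        if sub["SubscriptionArn"] == "PendingConfirmation"
    ]


def find_unconfirmed(topic_arn):
    """Check one page of a topic's subscriptions (needs boto3 + credentials)."""
    import boto3

    sns = boto3.client("sns")
    response = sns.list_subscriptions_by_topic(TopicArn=topic_arn)
    return pending_subscriptions(response["Subscriptions"])
```

If `find_unconfirmed` returns any addresses, those inboxes still need to confirm before alerts will reach them.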

🌟 Final Thoughts

You’ve just set up a 24/7 digital security guard for your infrastructure. This is the first step toward Auto Scaling, where the alarm doesn't just send an email but actually tells AWS to launch more servers to handle the load!


🌟 Practice Like a Pro

If you want to try these tasks yourself in a real AWS environment, check out:
👉 KodeKloud Engineer - Practice Labs

It’s where I’ve been sharpening my skills daily!


🔗 Let’s Connect
