Subham Nandi

Understanding AWS CloudWatch: A Comprehensive Guide

1. What is CloudWatch?

AWS CloudWatch is the monitoring service for AWS resources and applications. It collects metrics, logs, and events from services such as EC2 and S3. In simple terms, CloudWatch monitors, alerts, logs, and reports on what is happening across your AWS environment.

You can think of CloudWatch as an observer that keeps a record of how your AWS resources are behaving. For example, it collects CPU utilization metrics from EC2 instances and stores logs that give you insight into your applications.

2. Core Concepts

  • Metrics: Metrics are key performance indicators such as CPU utilization, disk I/O, or network traffic. CloudWatch collects these metrics in near real time.

  • Alarms: Alarms allow you to set thresholds for your metrics. For example, you can configure an alarm that triggers if CPU utilization exceeds 80%. When the threshold is breached, CloudWatch can send notifications via email or SMS through Amazon SNS.

  • Logs: CloudWatch Logs stores log data sent by AWS services and by your own applications, such as build output from CodeBuild or application logs from an EC2 instance. These logs give you insight into what your applications are doing and when.

  • Custom Metrics: By default, CloudWatch provides built-in metrics like CPU utilization, but it doesn’t track everything (e.g., memory usage on EC2). For that, you need to publish custom metrics, for example with the CloudWatch agent or the PutMetricData API.

  • Cost Optimization: CloudWatch metrics can reveal idle or underused resources, and alarms can trigger automated actions (for example, through AWS Lambda) to stop or scale them down, which helps optimize costs.

  • Scaling: CloudWatch helps in scaling resources by integrating with Auto Scaling. For example, if CPU utilization exceeds 80%, it can trigger Auto Scaling to add more EC2 instances.
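To make the custom metrics idea concrete, here is a minimal sketch of how a memory-usage metric could be prepared for CloudWatch. The namespace `Custom/Demo`, the metric name `MemoryUsedPercent`, the instance ID, and the sample text are all assumptions made for illustration; the code only builds the request and does not call AWS.

```python
def memory_used_percent(meminfo_text):
    """Compute the used-memory percentage from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # values are reported in kB
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return round(100 * (total - available) / total, 2)


def build_memory_metric(instance_id, used_percent):
    """Build the keyword arguments for a PutMetricData request."""
    return {
        "Namespace": "Custom/Demo",  # hypothetical namespace
        "MetricData": [{
            "MetricName": "MemoryUsedPercent",  # hypothetical metric name
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": used_percent,
            "Unit": "Percent",
        }],
    }


# Made-up sample in the format of /proc/meminfo on Linux
sample = "MemTotal: 1000000 kB\nMemFree: 200000 kB\nMemAvailable: 250000 kB\n"
payload = build_memory_metric("i-0123456789abcdef0", memory_used_percent(sample))
print(payload)
```

On an instance with boto3 installed and credentials configured, the payload could be sent with `boto3.client("cloudwatch").put_metric_data(**payload)`. In practice the CloudWatch agent is often the simpler way to collect memory metrics.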


Step-by-Step Tutorial on AWS CloudWatch: Logs, Metrics, Alarms, and Dashboards

In this guide, we’ll walk through some of the key features of AWS CloudWatch, focusing on Logs, Metrics, and Alarms, while briefly mentioning Dashboards. We’ll ignore Service Lens for now. Let's begin by setting up and exploring the capabilities of CloudWatch in your AWS environment.


Step 1: Open AWS CloudWatch

  1. Log in to your AWS Console.
  2. Navigate to CloudWatch by searching for it in the Services tab.

On the left side of the screen, you will see a list of features supported by CloudWatch. We'll begin by exploring Logs, then move on to Metrics, Alarms, and Dashboards.


Step 2: Working with Log Groups

Log groups organize the logs generated by your services. For example, if you create a build project in CodeBuild, a log group is created automatically to store the logs for that specific project.

2.1 Explore Existing Log Groups

  • In CloudWatch, on the left panel, click on Log Groups.
  • Here, you will see a list of all log groups in your account, most of them created automatically for the AWS services you use.

2.2 View Log Streams

  • Click on any log group to view its log streams.
  • Each log stream contains records of actions, such as builds in CodeBuild or activity from other AWS services.
  • Click on a log stream to view the details of the logged activity.

Example:

If a CodeBuild build fails, its log stream still contains the entire build output, including both the steps that succeeded and the one that failed. For instance:

  • If your build failed because requirements.txt was missing, you can find the corresponding error in the log stream.
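The same lookup can be scripted. The sketch below builds a query for the CloudWatch Logs `filter_log_events` call and formats the events it returns. The log group name `/aws/codebuild/demo-project` and the sample event are made up for illustration, and the code itself does not contact AWS.

```python
from datetime import datetime, timezone


def build_log_query(log_group_name, pattern="ERROR", limit=50):
    """Build the keyword arguments for a filter_log_events request."""
    return {
        "logGroupName": log_group_name,
        "filterPattern": pattern,  # match events containing this term
        "limit": limit,
    }


def format_events(events):
    """Turn log events (timestamp in milliseconds plus message) into readable lines."""
    lines = []
    for event in events:
        ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        lines.append(f"{ts:%Y-%m-%d %H:%M:%S} {event['message'].rstrip()}")
    return lines


query = build_log_query("/aws/codebuild/demo-project")  # hypothetical log group
sample_events = [  # made-up event in the shape the API returns
    {"timestamp": 0, "message": "ERROR: Could not open requirements file\n"},
]
for line in format_events(sample_events):
    print(line)
```

With boto3 and credentials available, real events could be fetched with `boto3.client("logs").filter_log_events(**query)["events"]` and passed to `format_events`.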

Step 3: Setting Up and Using AWS CloudWatch Metrics

Metrics are vital for monitoring the performance of your AWS services.

3.1 View CloudWatch Metrics

  • On the left menu, click on Metrics.
  • CloudWatch tracks over 1000 default metrics across various AWS services. For example, EC2 instances have metrics like CPU utilization, network traffic, and disk I/O.

3.2 Navigating to EC2 Metrics

  1. In the Metrics section, navigate to EC2 service metrics.
  2. Select Per-Instance Metrics to monitor the performance of individual EC2 instances.
  3. Choose metrics like CPU Utilization, Network In/Out, or Disk Reads/Writes.

3.3 Understanding Metric Data

  • CloudWatch automatically collects data for your services at predefined intervals (e.g., every 5 minutes for EC2 with basic monitoring).
  • You can adjust the time period to view metrics over the last hour, day, or even month.

Example:

If your EC2 instance's CPU utilization is high, the spike shows up in the CPU Utilization graph in the Metrics console, where you can follow it in near real time or review it historically.
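The data behind the graph can also be read programmatically. As a rough sketch, the function below builds a `get_metric_statistics` request for an instance's CPU utilization over the last hour; the instance ID and the sample datapoints are placeholders, and no AWS call is made here.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, hours=1, period_seconds=300, now=None):
    """Build the keyword arguments for a get_metric_statistics request."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,  # 300 seconds matches basic monitoring
        "Statistics": ["Average", "Maximum"],
    }


def peak(datapoints):
    """Return the datapoint with the highest Maximum, or None if there are none."""
    return max(datapoints, key=lambda d: d["Maximum"], default=None)


query = build_cpu_query("i-0123456789abcdef0")  # placeholder instance ID
sample = [  # made-up datapoints in the shape the API returns
    {"Average": 3.1, "Maximum": 7.4},
    {"Average": 41.0, "Maximum": 82.3},
]
print(peak(sample))
```

With boto3 configured, `boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]` would return the real datapoints to pass to `peak`.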


Step 4: Enabling Detailed Monitoring

By default, EC2 instances report metrics every 5 minutes. However, to receive data more frequently (every minute), you can enable detailed monitoring.

4.1 Enable Detailed Monitoring

  1. Navigate to your EC2 Instances page.
  2. Select the instance you want to monitor.
  3. Go to the Monitoring tab and click Manage detailed monitoring.
  4. Enable detailed monitoring for 1-minute intervals.
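To see what the switch changes, the sketch below compares how many datapoints each mode produces per hour. It also wraps the EC2 `monitor_instances` call that enables detailed monitoring from code. The helper names are made up for this example, and the client is passed in as a parameter, so nothing is sent to AWS when the snippet runs.

```python
def datapoints_per_window(window_minutes, period_seconds):
    """Number of datapoints a metric produces in a time window."""
    return (window_minutes * 60) // period_seconds


def enable_detailed_monitoring(ec2_client, instance_id):
    """Turn on detailed (1-minute) monitoring for one instance."""
    return ec2_client.monitor_instances(InstanceIds=[instance_id])


print("basic:", datapoints_per_window(60, 300), "datapoints per hour")
print("detailed:", datapoints_per_window(60, 60), "datapoints per hour")
```

With boto3 configured, the helper could be called as `enable_detailed_monitoring(boto3.client("ec2"), "<your-instance-id>")`.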

Step 5: Simulate CPU Spikes and Track Metrics

Let’s create a sample Python script to simulate CPU spikes on your EC2 instance and observe how CloudWatch Metrics captures these changes.

5.1 Launch an EC2 Instance

  1. In the EC2 console, click Launch Instance.
  2. Name the instance (e.g., CloudWatch Demo).
  3. Choose Ubuntu as the operating system and t2.micro as the instance type.
  4. Assign a Key Pair for SSH access and ensure Public IP is enabled.
  5. Click Launch.

5.2 Log in to the EC2 Instance

  1. Once the instance is running, copy the public IP.
  2. SSH into the instance using the key pair.

    ssh -i "your-key.pem" ubuntu@your-instance-ip
    

5.3 Create a Python Script to Simulate CPU Usage

  1. On the EC2 instance, create a new file for the script.

    nano cpu_spike.py
    
  2. Paste the following Python code to simulate a CPU spike:

   import time

   def simulate_cpu_spike(duration=30, cpu_percent=80, slice_seconds=0.1):
       """Keep one CPU core busy for roughly cpu_percent of every time slice."""
       print(f"Simulating CPU spike at {cpu_percent}% for {duration} seconds...")
       busy_seconds = slice_seconds * cpu_percent / 100
       end_time = time.time() + duration

       while time.time() < end_time:
           slice_start = time.time()

           # Busy part of the slice: spin on simple arithmetic
           result = 0
           while time.time() - slice_start < busy_seconds:
               result += 1

           # Idle part of the slice: sleep for whatever time is left
           time.sleep(max(0, slice_seconds - (time.time() - slice_start)))

       print("CPU spike simulation completed.")

   if __name__ == '__main__':
       # Simulate a CPU spike for 30 seconds with about 80% utilization of one core.
       # Increase duration (e.g., to 300) if the spike is hard to see in 5-minute metrics.
       simulate_cpu_spike(duration=30, cpu_percent=80)
  3. Save the file and exit (Ctrl+X, then Y, then Enter).

  4. Run the script to simulate CPU load:

    python3 cpu_spike.py
    

5.4 Monitor the Metrics in CloudWatch

  • Return to the CloudWatch Metrics console.
  • Go to EC2 -> Per-Instance Metrics.
  • Select your instance and view the CPU Utilization metric.

You should start to see the CPU utilization graph spike as your Python script consumes CPU resources.
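The height of the spike on the graph depends on the metric period, because CloudWatch averages CPU usage over each period. The sketch below works through that arithmetic under simplifying assumptions: a single vCPU, a burst that falls entirely inside one period, and an idle baseline of 2% (a made-up figure).

```python
def average_over_period(burst_percent, burst_seconds, idle_percent, period_seconds):
    """Average CPU utilization over one period containing a single burst."""
    burst_seconds = min(burst_seconds, period_seconds)
    idle_seconds = period_seconds - burst_seconds
    total = burst_percent * burst_seconds + idle_percent * idle_seconds
    return total / period_seconds


# A 30-second burst at 80% seen through 1-minute and 5-minute periods
print(average_over_period(80, 30, 2, 60))   # 41.0
print(average_over_period(80, 30, 2, 300))  # 9.8
```

So a 30-second burst at 80% appears as roughly 41% with 1-minute datapoints and under 10% with 5-minute datapoints. If the spike looks smaller than expected, run the load for several minutes.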


Step 6: Dashboards Overview

Finally, let’s touch on Dashboards, which provide a centralized view of your CloudWatch metrics.

6.1 Create a Dashboard

  1. In the CloudWatch console, click Dashboards on the left panel.
  2. Click Create Dashboard.
  3. Name your dashboard (e.g., EC2-Monitoring). Dashboard names may contain only letters, digits, hyphens, and underscores.
  4. Add widgets to display metrics like CPU utilization, network traffic, or disk I/O in near real time.

This gives you a single place to monitor all key metrics related to your AWS resources.
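A dashboard can also be defined as a JSON document, which is convenient for keeping it under version control. The sketch below builds a body with a single CPU utilization widget; the instance ID, region, and widget title are placeholders chosen for illustration.

```python
import json


def build_dashboard_body(instance_id, region="us-east-1"):
    """Build a dashboard body with one CPU utilization graph widget."""
    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,  # position and size on the grid
        "properties": {
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "period": 300,
            "stat": "Average",
            "region": region,
            "title": "EC2 CPU Utilization",
        },
    }
    return json.dumps({"widgets": [widget]})


body = build_dashboard_body("i-0123456789abcdef0")  # placeholder instance ID
print(body)
```

With boto3 configured, the dashboard could be created with `boto3.client("cloudwatch").put_dashboard(DashboardName="EC2-Monitoring", DashboardBody=body)`.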


Step-by-Step Guide to Setting Up AWS CloudWatch Metrics and Alarms for EC2 CPU Utilization

1. Log in to the AWS Console

2. Navigate to CloudWatch

  • Once logged in, use the AWS search bar to find CloudWatch.
  • Click on CloudWatch under the list of services.

3. Access EC2 Metrics

  • In the CloudWatch dashboard, go to Metrics in the sidebar.
  • Select EC2 from the list of available services to view EC2-related metrics.

4. Choose a Metric (CPU Utilization)

  • In the EC2 section, click on Per-Instance Metrics.
  • Search for the CPUUtilization metric of the EC2 instance you wish to monitor.

5. Create a CloudWatch Alarm

  • After selecting the metric, click on the Create Alarm button at the top.
  • In the alarm configuration screen:
    • Statistic: Choose either Average or Maximum. For this tutorial, use Maximum.
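    • Period: Set this to 1 minute for a more frequent check. This works best with detailed monitoring enabled; with basic monitoring, EC2 publishes a datapoint only every 5 minutes.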
    • Threshold: Set a threshold like CPU Utilization >= 50%.

6. Set Up Actions for the Alarm

  • In the Actions section, configure how you want to be notified when the alarm is triggered. Notifications are delivered through SNS (Simple Notification Service), which sends an alert to every subscriber of a topic.
  • Select Create new SNS topic:

    • Topic Name: cloudwatch-topic
    • Email endpoint: Enter the email address where the notification will be sent.
  • Check your email inbox and confirm the subscription by clicking on the confirmation link in the email from AWS. Notifications are not delivered until the subscription is confirmed.
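The topic and email subscription can be created from code as well. This is a minimal sketch using the SNS `create_topic` and `subscribe` calls; the helper name is made up, and the client is passed in as a parameter, so the snippet makes no AWS call on its own.

```python
def create_email_topic(sns_client, topic_name, email):
    """Create an SNS topic and subscribe an email address to it."""
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The recipient still has to confirm the subscription from the email they receive
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


# Example call, assuming boto3 is installed and credentials are configured:
#   import boto3
#   arn = create_email_topic(boto3.client("sns"), "cloudwatch-topic", "you@example.com")
```

The returned topic ARN is what an alarm refers to in its list of actions.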

7. Configure Alarm Details

  • Provide a name for the alarm, such as High CPU Alarm.
  • In the description field, you can write something like:

     Hey team, 
     This is an automated notification from AWS CloudWatch to let you know that CPU Utilization has spiked to 50% or above. Please take the necessary actions.
    
  • Click Create Alarm.
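For reference, the console settings from the previous steps map onto the parameters of the CloudWatch `put_metric_alarm` call. The sketch below only builds that request. The alarm name `High-CPU-Alarm`, the instance ID, and the topic ARN are placeholders, and a single evaluation period is an assumption.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=50.0):
    """Build the keyword arguments for a put_metric_alarm request."""
    return {
        "AlarmName": "High-CPU-Alarm",  # placeholder name
        "AlarmDescription": f"CPU Utilization has spiked to {threshold:g}% or above.",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds
        "EvaluationPeriods": 1,  # assumption: alarm after one breaching period
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # SNS topic notified when the alarm fires
    }


alarm = build_cpu_alarm(
    "i-0123456789abcdef0",  # placeholder instance ID
    "arn:aws:sns:us-east-1:123456789012:cloudwatch-topic",  # placeholder ARN
)
print(alarm)
```

With boto3 configured, `boto3.client("cloudwatch").put_metric_alarm(**alarm)` would create the alarm.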

8. Verify Alarm Status

  • After creating the alarm, it will be in Insufficient Data status initially. This is because it requires time to gather sufficient metrics.
  • Check your email and confirm the subscription to SNS if you haven’t done so already.
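The three alarm states can be pictured with a toy evaluation function. This is a deliberately simplified sketch: it looks only at the most recent datapoint of a single period, while real alarms also support multiple evaluation periods and configurable handling of missing data.

```python
def alarm_state(datapoints, threshold=50.0):
    """Simplified alarm evaluation based on the latest datapoint."""
    if not datapoints:
        return "INSUFFICIENT_DATA"  # nothing has been collected yet
    latest = datapoints[-1]
    return "ALARM" if latest >= threshold else "OK"


# A new alarm has no data, then sees a quiet datapoint, then a spike
for history in ([], [12.0], [12.0, 73.5]):
    print(history, "->", alarm_state(history))
```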

9. Simulate CPU Spike (Optional)

  • To test the alarm, you can simulate a CPU utilization spike on your EC2 instance:

    • SSH into the EC2 instance.
    • Run a Python script to artificially increase CPU load. Here's an example script:
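       import multiprocessing

       def cpu_stress():
           # Busy loop that keeps one CPU core fully occupied
           while True:
               pass

       if __name__ == "__main__":
           # Start one worker per CPU core; press Ctrl+C to stop the load
           processes = [multiprocessing.Process(target=cpu_stress) for _ in range(multiprocessing.cpu_count())]
           for p in processes:
               p.start()
           for p in processes:
               p.join()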
    
    • This will increase the CPU utilization and should trigger the alarm if it crosses the 50% threshold.

10. Monitor Alarm & Notification

  • Once the metric breaches the threshold, you should receive a notification email within a few minutes, provided the SNS subscription has been confirmed.
  • You can also monitor the alarm status from the CloudWatch console by refreshing the Alarms section.

11. Review Metrics and Logs

  • Once the alarm is triggered, you can go back to CloudWatch and view the Alarms and Metrics sections for detailed information.
  • Check your inbox for the email notification sent through the SNS topic.

12. Cleanup

  • After testing, don't forget to delete the alarm and terminate the EC2 instance to avoid unnecessary charges. You can also delete the SNS topic and the dashboard if you no longer need them.
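Cleanup can be scripted too. The helper below is a sketch with a made-up name that takes the service clients as parameters, so it does nothing until you call it with real clients. Deleting the SNS topic is an optional extra that goes beyond the alarm and instance.

```python
def cleanup(cloudwatch_client, ec2_client, alarm_name, instance_id,
            sns_client=None, topic_arn=None):
    """Delete the demo alarm, terminate the instance, and optionally remove the topic."""
    cloudwatch_client.delete_alarms(AlarmNames=[alarm_name])
    # Termination cannot be undone, so double-check the instance ID first
    ec2_client.terminate_instances(InstanceIds=[instance_id])
    if sns_client is not None and topic_arn:
        sns_client.delete_topic(TopicArn=topic_arn)


# Example call, assuming boto3 is installed and credentials are configured:
#   import boto3
#   cleanup(boto3.client("cloudwatch"), boto3.client("ec2"),
#           "High-CPU-Alarm", "<your-instance-id>")
```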
