DEV Community

Prashant Lakhera

AWS Under the Hood - Day 6 - Why doesn't AWS EC2 CloudWatch collect metrics like memory and disk utilization by default?

AWS EC2 CloudWatch does not collect metrics such as memory and disk utilization by default, primarily due to how EC2 instances are designed and how CloudWatch collects metrics.

Here's how it works:
1: Nature of Metrics Collected by CloudWatch
CloudWatch collects its default EC2 metrics at the hypervisor level, not from within the operating system running on the instance. Metrics like CPU utilization, network in/out, and disk read/write operations are observable from the hypervisor, which runs on the physical host and manages the virtual machines on it.
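
As a rough illustration, these hypervisor-level metrics are published in the `AWS/EC2` namespace and can be read back with the `get_metric_statistics` API. The sketch below only builds the request parameters. The instance ID is a placeholder, and the commented-out call assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",            # default EC2 metrics live here
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,                     # basic monitoring reports every 5 minutes
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    # Placeholder instance ID, for illustration only.
    params = build_cpu_query("i-0123456789abcdef0")
    print(params["Namespace"], params["MetricName"])
    # Assumes boto3 and credentials are available:
    # import boto3
    # resp = boto3.client("cloudwatch").get_metric_statistics(**params)
    # for point in resp["Datapoints"]:
    #     print(point["Timestamp"], point["Average"])
```

Note that there is no memory or disk-space metric you could put in `MetricName` for this namespace, which is the gap the rest of this post explains.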
2: Memory and Disk Utilization Metrics
Metrics such as memory and disk utilization require visibility into the operating system's internal state, because that state lives inside the guest OS:
Memory utilization depends on how processes inside the operating system are using RAM, and that usage isn't visible to the hypervisor.
Disk utilization depends on how much space is used versus available on the file systems mounted by the operating system, which also isn't visible at the hypervisor level.
3: Operating System Privacy and Security
Giving CloudWatch direct access to these metrics could raise privacy and security issues, because it would require deeper access to the operating system's internals. Amazon maintains a boundary here so that customer data and operations inside EC2 instances remain secure and private.

Then how can we collect these metrics?
The answer is the CloudWatch Agent.
AWS provides the CloudWatch Agent, which you install inside your EC2 instances to capture these internal metrics. The agent can collect system and application-level metrics and logs and send them to CloudWatch. Here's what happens under the hood when you use the CloudWatch Agent:
Installation and Configuration: The CloudWatch Agent is installed on the EC2 instance. You configure the agent to specify what metrics to collect, including memory and disk metrics.
Data Collection: The agent collects the configured metrics from the operating system.
Data Transmission: It sends these metrics to CloudWatch using secure AWS APIs.
Visualization and Monitoring: Once in CloudWatch, these metrics can be visualized using dashboards, used for alarms, or analyzed with other AWS services.
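
The configuration step above can be sketched as follows. This is a minimal example, not a complete setup: it generates an agent configuration that asks for `mem_used_percent` and disk `used_percent`. The 60-second interval and the `/` mount point are assumptions chosen for illustration.

```python
import json


def build_agent_config(interval_seconds=60, mount_points=("/",)):
    """Return a minimal CloudWatch Agent config collecting memory and disk usage."""
    return {
        "metrics": {
            # Tag each datapoint with the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(mount_points),
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON; save it wherever your agent is set up to read its config.
    print(json.dumps(build_agent_config(), indent=2))
```

Metrics collected this way show up in the `CWAgent` namespace by default, separate from the `AWS/EC2` namespace that holds the hypervisor-level metrics.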

🧐 I am still confused about why it can collect CPU metrics but not memory or disk metrics. Let's break this down into simpler language.

CPU Utilization:
What It Is: This measures how much of the CPU's capacity is being used.
How It's Measured: The hypervisor (like the manager for virtual machines on a physical server) can see how much CPU time each virtual machine uses. This is because CPU tasks are scheduled and managed directly by the hypervisor, which allocates CPU time among all the virtual machines it controls.
Why It's Different: Since CPU usage directly involves the hypervisor's management of resources, it doesn't need to look inside the virtual machine's operating system to get this information.

Memory Utilization:
What It Is: This measures how much RAM (memory) is used by the machine's processes.
How It's Measured: Unlike CPU usage, memory is continuously used and released by various applications within the operating system. To know how much memory is being used, you need insight into the operating system itself, which the hypervisor does not have by default.
Why It's Different: The hypervisor doesn't manage memory allocation between processes within the virtual machines; this is managed internally by each operating system.
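
To make this concrete, here is a small sketch of how a process inside a Linux guest can work out memory utilization from `/proc/meminfo`. It assumes a Linux kernel that exposes `MemAvailable` (3.14 or later), and it only illustrates the idea; it is not how the CloudWatch Agent itself is implemented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(fields):
    """Share of RAM in use, treating MemAvailable as reclaimable."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    # Only readable from inside the guest OS, which is the whole point.
    with open("/proc/meminfo") as fh:
        print(f"{memory_used_percent(parse_meminfo(fh.read())):.1f}% used")
```

The file being read only exists inside the guest, so nothing outside the operating system can run this calculation on its behalf.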

Disk Utilization:
What It Is: This measures how much disk space is being used and how much is still free.
How It's Measured: Just like with memory, checking disk utilization requires looking into the file system within the operating system to see how much space files are taking up and how much is left.
Why It's Different: Disk space management is done by the operating system and is not visible to the hypervisor. A tool running inside the operating system, such as the CloudWatch Agent, has to read it and report it to CloudWatch.
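
As an illustration, disk utilization can be read from inside the guest with Python's standard `shutil.disk_usage`, which asks the operating system for file-system totals. The `/` default path is an assumption; pass whichever mount point you care about.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    print(f"{disk_used_percent('/'):.1f}% of / is used")
```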

Conclusion
AWS's design decision not to automatically include specific metrics like memory and disk utilization in CloudWatch's default metrics balances ease of use, security, and flexibility. By utilizing the CloudWatch Agent, users can tailor their monitoring to include these and other detailed metrics, providing a comprehensive view of their system's health and performance.
