Most AWS services send logs or metrics to CloudWatch, which you can use to monitor your system's health or identify errors. However, services outside of AWS don't get these features out of the box. Less well known (judging by the number of GitHub stars) is that AWS offers CloudWatch Agent, an OS service that collects system metrics and logs and exports them to CloudWatch, including from on-premises servers running outside of AWS.
The agent is open source and developed on GitHub in aws/amazon-cloudwatch-agent. It lets you collect and export host-level metrics and logs on instances running Linux or Windows Server. According to the project's README, the agent enables you to do the following:
- Collect more system-level metrics from Amazon EC2 instances across operating systems. The metrics can include in-guest metrics, in addition to the metrics for EC2 instances. The additional metrics that can be collected are listed in Metrics Collected by the CloudWatch Agent.
- Collect system-level metrics from on-premises servers. These can include servers in a hybrid environment as well as servers not managed by AWS.
- Retrieve custom metrics from your applications or services using the StatsD and collectd protocols. StatsD is supported on both Linux servers and servers running Windows Server. collectd is supported only on Linux servers.
- Collect logs from Amazon EC2 instances and on-premises servers, running either Linux or Windows Server.
- Collect OpenTelemetry and AWS X-Ray traces.
In this short post, I've compiled the most important information to help you quickly set up CloudWatch Agent on a server. All information comes from the official documentation, which I'll link for reference. If you prefer a comprehensive overview before starting, I recommend visiting the official docs.
Create IAM User
If your server doesn't already use AWS and has no IAM user of its own, you'll need to create one, because CloudWatch Agent needs permissions to send logs and metrics to CloudWatch. Follow these steps to create a new user. Then, assign one of the managed policies AWS provides for the agent: CloudWatchAgentServerPolicy or CloudWatchAgentAdminPolicy.
I'm using the CloudWatchAgentAdminPolicy, which is almost identical to the CloudWatchAgentServerPolicy. The main difference is that the CloudWatchAgentAdminPolicy allows creating and updating SSM parameters, while the CloudWatchAgentServerPolicy can only read from SSM. CloudWatch Agent can store and retrieve its config from SSM, so when we later initialize CloudWatch Agent, we can push the config directly to SSM.
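If you prefer the command line over the console, the user setup can be sketched with the AWS CLI. This assumes you run it from a machine whose CLI already has IAM admin permissions (not the new user's); the user name and the function names below are hypothetical. Setting DRY_RUN=1 prints the commands for review instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: create an IAM user for CloudWatch Agent and attach an AWS managed policy.
# Run with IAM admin credentials. Function and user names are hypothetical.
# DRY_RUN=1 prints each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

cw_iam_setup() {
  local user="$1"
  # CloudWatchAgentAdminPolicy can write SSM parameters; use
  # CloudWatchAgentServerPolicy if the agent only needs to read them.
  local policy="${2:-CloudWatchAgentAdminPolicy}"
  run aws iam create-user --user-name "$user" || return 1
  run aws iam attach-user-policy --user-name "$user" \
    --policy-arn "arn:aws:iam::aws:policy/${policy}" || return 1
  # Prints the new AccessKeyId and SecretAccessKey as JSON.
  run aws iam create-access-key --user-name "$user"
}

# Example: DRY_RUN=1 cw_iam_setup cloudwatch-agent-onprem
```

The last command is the only time AWS shows the secret access key, so keep its output for the next step.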
Configure AWS CLI
If you haven't installed the AWS CLI on your server, follow the instructions in the install guide. CloudWatch Agent uses the AWS CLI profile AmazonCloudWatchAgent, so we must configure that profile with the credentials.
Run the following command and enter the credentials from the previous step:
sudo aws configure --profile AmazonCloudWatchAgent
If you've configured the AWS CLI previously and can't remember the credentials for the IAM user, you can look them up by running aws configure get aws_access_key_id and aws configure get aws_secret_access_key (add --profile <profile-name> if they are stored under a named profile).
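To confirm the profile is in place without printing the secrets, sudo aws configure list --profile AmazonCloudWatchAgent shows the values masked. For scripts, here is a sketch of a plain-text check. It assumes the agent runs as root, so the profile lives in /root/.aws/credentials; that path is an assumption about where sudo aws configure wrote it on your system, and has_aws_profile is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the credentials file read from stdin contains a profile section.
# Defaults to the AmazonCloudWatchAgent profile used by CloudWatch Agent.
has_aws_profile() {
  local profile="${1:-AmazonCloudWatchAgent}"
  # Match a line like "[AmazonCloudWatchAgent]", allowing trailing whitespace.
  grep -qE "^\[${profile}\][[:space:]]*$"
}

# Usage (assumed path):
#   sudo cat /root/.aws/credentials | has_aws_profile && echo "profile found"
```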
Install CloudWatch Agent
CloudWatch Agent is available for multiple operating systems and architectures. You can find the appropriate download link for your system in this table.
Amazon provides RPM and DEB packages for Linux. My server runs Ubuntu, so I'll use the DEB package. It can be downloaded using wget:
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
After downloading, change to the download directory and install the package. The command depends on the package format:
# DEB package
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
# RPM package
sudo rpm -U ./amazon-cloudwatch-agent.rpm
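The wget URL above is for Ubuntu on x86-64. A sketch that picks the Ubuntu package for the current architecture and installs it in one go; I've only used the amd64 package myself, so double-check the arm64 path against the download table. The function names and the /tmp download location are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose the Ubuntu DEB URL for this machine's architecture and install it.
# Assumes the arm64 package follows the same URL pattern as the amd64 one.

cw_deb_url() {
  local machine="${1:-$(uname -m)}"
  local arch
  case "$machine" in
    x86_64|amd64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/${arch}/latest/amazon-cloudwatch-agent.deb"
}

install_cw_agent_deb() {
  local url
  url="$(cw_deb_url)" || return 1
  wget -q -O /tmp/amazon-cloudwatch-agent.deb "$url" &&
    sudo dpkg -i -E /tmp/amazon-cloudwatch-agent.deb
}
```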
Create Config with Wizard
Before running CloudWatch Agent on your server, you must create a configuration file. The configuration specifies the metrics and logs that the Agent collects from the server. You can find the supported metrics in this table. The logs to collect are specified as file paths like /var/log/auth.log, /var/log/auth.log*, or even /var/log/**.log.
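To see where these file paths end up, here is a sketch that writes a logs-only config by hand. The field names are copied from the wizard output further down in this post; the second entry's path (/var/log/myapp/**.log) and log group (myapp-logs) are hypothetical, as is the function name.

```shell
#!/usr/bin/env bash
# Sketch: write a logs-only CloudWatch Agent config to the given path.
# Field names mirror the wizard output in this post; the "myapp" entry is hypothetical.
write_logs_config() {
  local out="$1"
  cat > "$out" <<'EOF'
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/auth.log",
            "log_group_name": "auth-log",
            "log_stream_name": "{hostname}",
            "retention_in_days": 30
          },
          {
            "file_path": "/var/log/myapp/**.log",
            "log_group_name": "myapp-logs",
            "log_stream_name": "{hostname}",
            "retention_in_days": 30
          }
        ]
      }
    }
  }
}
EOF
}

# Example: write_logs_config /tmp/logs-config.json
```

You could then hand such a file to the agent with -c file:<path>, or merge it into an existing config with the append-config action; both appear in the ctl help output later in this post.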
The wizard takes you through a few questions to create a configuration. If you are uncertain about a question, just go with the default choice. You can always run the wizard again or edit the config manually afterward. If you want to store the config as a parameter in SSM, answer the corresponding question with yes and make sure your IAM user has the CloudWatchAgentAdminPolicy policy.
One question asks whether to run the Agent as the root user or as a different user. The docs mention that CloudWatch Agent runs as the root user by default, but the default choice in the wizard was to run it as the user cwagent.
If you decide to run it as a different user, be sure to create this user and configure the AWS CLI for it. I overrode the default choice and run it as the root user.
To start the wizard, run this command:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
If you want to double-check your answers, here is my complete wizard session:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
Trying to fetch the default region based on ec2 metadata...
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [2]:
Please make sure the credentials and region set correctly on your hosts.
Refer to http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
Which user are you planning to run the agent?
1. cwagent
2. root
3. others
default choice: [1]:
2
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
Which port do you want StatsD daemon to listen to?
default choice: [8125]
What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
3
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle"
],
"metrics_collection_interval": 60,
"totalcpu": true
},
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"net": {
"measurement": [
"bytes_sent",
"bytes_recv",
"packets_sent",
"packets_recv"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 60,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
Log file path:
/var/log/auth.log
Log group name:
default choice: [auth.log]
auth-log
Log group class:
1. STANDARD
2. INFREQUENT_ACCESS
default choice: [1]:
Log stream name:
default choice: [{hostname}]
{hostname}
Log Group Retention in days
1. -1
2. 1
3. 3
4. 5
5. 7
6. 14
7. 30
8. 60
9. 90
10. 120
11. 150
12. 180
13. 365
14. 400
15. 545
16. 731
17. 1096
18. 1827
19. 2192
20. 2557
21. 2922
22. 3288
23. 3653
default choice: [1]:
7
Do you want to specify any additional log files to monitor?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to: /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/auth.log",
"log_group_class": "STANDARD",
"log_group_name": "auth-log",
"log_stream_name": "{hostname}",
"retention_in_days": 30
}
]
}
}
},
"metrics": {
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle"
],
"metrics_collection_interval": 60,
"totalcpu": true
},
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"net": {
"measurement": [
"bytes_sent",
"bytes_recv",
"packets_sent",
"packets_recv"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 60,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
What parameter store name do you want to use to store your config? (Use 'AmazonCloudWatch-' prefix if you use our managed AWS policy)
default choice: [AmazonCloudWatch-linux]
AmazonCloudWatch-agent-config
Which region do you want to store the config in the parameter store?
default choice: [eu-west-1]
Which AWS credential should be used to send json config to parameter store?
1. XXXXXX... (From SDK)
2. Other
default choice: [1]:
Successfully put config to parameter store AmazonCloudWatch-agent-config.
Program exits now.
Installing CollectD
With the config above, CloudWatch Agent also collects metrics from the CollectD service, because I kept the wizard's default answer (yes) to the CollectD question. If CollectD is not installed on your system, the Agent will fail to start. If you are not sure whether it's installed, here is how you can check whether CollectD is installed and active:
sudo systemctl status collectd
This shows whether the service is active or inactive and whether there are any issues with it. Alternatively, you can look for CollectD in the running processes:
ps aux | grep collectd
If CollectD is not installed, you can install it from your package manager. On Ubuntu, you can install it using apt:
sudo apt update
sudo apt install collectd
Then, start and enable the CollectD service:
sudo systemctl start collectd
sudo systemctl enable collectd
Finally, verify that CollectD is active and running:
sudo systemctl status collectd
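The checks and install steps above can be rolled into one idempotent helper. A sketch for Ubuntu, assuming apt and systemd as in the commands above; the function names are hypothetical, and systemctl enable --now combines the separate start and enable steps.

```shell
#!/usr/bin/env bash
# Sketch: report CollectD's state and install/start it only when needed (Ubuntu).

collectd_state() {
  if ! command -v collectd >/dev/null 2>&1 && [ ! -x /usr/sbin/collectd ]; then
    echo "not-installed"
  elif systemctl is-active --quiet collectd 2>/dev/null; then
    echo "active"
  else
    echo "inactive"
  fi
}

ensure_collectd() {
  if [ "$(collectd_state)" = "not-installed" ]; then
    sudo apt update && sudo apt install -y collectd || return 1
  fi
  # Same effect as "systemctl start" followed by "systemctl enable".
  sudo systemctl enable --now collectd
}

# Example: ensure_collectd && collectd_state   # prints "active" on success
```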
Start CloudWatch Agent
The Agent is controlled by the script located at /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl. If you run it without any arguments, it shows the help:
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
usage: amazon-cloudwatch-agent-ctl -a
stop|start|status|fetch-config|append-config|remove-config|set-log-level
[-m ec2|onPremise|onPrem|auto]
[-c default|all|ssm:<parameter-store-name>|file:<file-path>]
[-s]
[-l INFO|DEBUG|WARN|ERROR|OFF]
e.g.
1. apply a SSM parameter store config on EC2 instance and restart the agent afterwards:
amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s
2. append a local json config file on onPremise host and restart the agent afterwards:
amazon-cloudwatch-agent-ctl -a append-config -m onPremise -c file:/tmp/config.json -s
3. query agent status:
amazon-cloudwatch-agent-ctl -a status
-a: action
stop: stop the agent process.
start: start the agent process.
status: get the status of the agent process.
fetch-config: apply config for agent, followed by -c. Target config can be based on location (ssm parameter store name, file name), or 'default'.
append-config: append json config with the existing json configs if any, followed by -c. Target config can be based on the location (ssm parameter store name, file name), or 'default'.
remove-config: remove config for agent, followed by -c. Target config can be based on the location (ssm parameter store name, file name), or 'all'.
set-log-level: sets the log level, followed by -l to provide the level in all caps.
-m: mode
ec2: indicate this is on ec2 host.
onPremise, onPrem: indicate this is on onPremise host.
auto: use ec2 metadata to determine the environment, may not be accurate if ec2 metadata is not available for some reason on EC2.
-c: amazon-cloudwatch-agent configuration
default: default configuration for quick trial.
ssm:<parameter-store-name>: ssm parameter store name.
file:<file-path>: file path on the host.
all: all existing configs. Only apply to remove-config action.
-s: optionally restart after configuring the agent configuration
this parameter is used for 'fetch-config', 'append-config', 'remove-config' action only.
-l: log level to set the agent to INFO, DEBUG, WARN, ERROR, or OFF
this parameter is used for 'set-log-level' only.
Start CloudWatch Agent with the config created by the wizard. If the config was saved to SSM, use -c ssm:configuration-parameter-store-name with the name of the parameter (not the ARN). Otherwise, use -c file:configuration-file-path with the path to the config file (the wizard saved it to /opt/aws/amazon-cloudwatch-agent/bin/config.json). To start CloudWatch Agent, run the following command:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m onPremise -s -c [ssm:configuration-parameter-store-name | file:configuration-file-path]
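The bracketed part of that command is a placeholder: you pick either the ssm: or the file: form. A small wrapper sketch that accepts either form, rejects an ARN, and otherwise runs the same fetch-config command; cw_fetch_config is a hypothetical name, and DRY_RUN=1 only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: apply a config from SSM or a local file and restart the agent (-s).
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

cw_fetch_config() {
  local source="$1"
  case "$source" in
    ssm:arn:*)
      echo "use the parameter name, not its ARN: $source" >&2; return 2 ;;
    ssm:?*|file:?*) ;;
    *)
      echo "expected ssm:<parameter-name> or file:<path>, got: $source" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo $CTL -a fetch-config -m onPremise -s -c $source"
  else
    sudo "$CTL" -a fetch-config -m onPremise -s -c "$source"
  fi
}

# Examples:
#   cw_fetch_config ssm:AmazonCloudWatch-agent-config
#   cw_fetch_config file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
```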
CloudWatch Agent fetches the config and validates all settings. If everything is fine, it enables itself as a system service. You can then verify that the Agent is running with this command:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m onPremise -a status
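In a script, you can test the status output instead of reading it. A sketch, assuming the status command prints the JSON document shown in the AWS docs, with a top-level "status" field such as "running" or "stopped"; cw_is_running is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read the agent's status JSON from stdin; succeed only if "status" is "running".
# The leading quote in the pattern avoids matching keys like "configstatus".
cw_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Usage:
#   sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m onPremise -a status \
#     | cw_is_running && echo "agent is running"
```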
Update Config and Restart
If you need to change the config, you can update the SSM parameter or edit the local config file. You can then restart the Agent with the same command as before:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m onPremise -s -c [ssm:configuration-parameter-store-name | file:configuration-file-path]
The -s option tells CloudWatch Agent to restart itself after the new config is fetched.
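If the config lives in SSM, you can also edit the local JSON and push it back before restarting. A sketch using aws ssm put-parameter; it assumes the AmazonCloudWatchAgent profile has the CloudWatchAgentAdminPolicy (needed to write the parameter) and defaults to the region from my wizard session (eu-west-1). The function name is hypothetical, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: overwrite the SSM parameter with a local config file, then reload the agent.
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

cw_update_ssm_config() {
  local name="$1" file="$2" region="${3:-eu-west-1}"
  # file:// makes the AWS CLI read the parameter value from the local file.
  run sudo aws ssm put-parameter --profile AmazonCloudWatchAgent --region "$region" \
    --name "$name" --type String --overwrite --value "file://$file" || return 1
  # Same fetch-config command as above; -s restarts the agent.
  run sudo "$CTL" -a fetch-config -m onPremise -s -c "ssm:$name"
}

# Example:
#   cw_update_ssm_config AmazonCloudWatch-agent-config /opt/aws/amazon-cloudwatch-agent/bin/config.json
```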
Troubleshooting
If CloudWatch Agent is not working as expected, take a look at the log files in /opt/aws/amazon-cloudwatch-agent/logs/.
If the Agent doesn't start at all, check the configuration-validation.log file for problems with the configuration:
nano /opt/aws/amazon-cloudwatch-agent/logs/configuration-validation.log
If the Agent is running but the logs and metrics don't show up in CloudWatch, check the amazon-cloudwatch-agent.log file for errors (E!) and warnings (W!):
nano /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
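To skim a long log for problems, a small filter helps. A sketch, assuming error and warning lines carry the E! and W! markers as separate words, as described above; cw_log_problems is a hypothetical name and defaults to the agent log path from this post.

```shell
#!/usr/bin/env bash
# Sketch: print only the error (E!) and warning (W!) lines of an agent log file.
cw_log_problems() {
  local log="${1:-/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log}"
  grep -E '(^|[[:space:]])(E|W)!' "$log"
}

# Example: cw_log_problems | tail -n 20
```

If your user can't read the log, the same pattern works as a one-off with sudo grep -E '(^|[[:space:]])(E|W)!' followed by the log path.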
Further information can be found in the troubleshooting section in the docs.