Learning DevOps Monitoring with DevOps Shack: A Hands-On Journey
In the ever-evolving world of DevOps, monitoring is a critical aspect of maintaining the health and performance of your infrastructure. Recently, I embarked on a hands-on learning journey, following along with a YouTuber named DevOps Shack. Through their comprehensive tutorials, I implemented a full-fledged monitoring solution using Prometheus, Node Exporter, Alertmanager, and Blackbox Exporter. This blog post shares my experience and the key takeaways from the project.
Project Overview
The project is designed to provide an end-to-end monitoring solution for your infrastructure. By following the guidance of DevOps Shack, I was able to set up a robust system that not only monitors the health of virtual machines but also sends alerts for critical issues and probes service availability.
Tools and Technologies
- Prometheus: Used for collecting and storing metrics.
- Node Exporter: Used to expose hardware and OS metrics to Prometheus.
- Alertmanager: Manages alerts generated by Prometheus.
- Blackbox Exporter: Probes endpoints to check their availability.
Prerequisites
Before getting started, I ensured that:
- Two virtual machines (VMs) were prepared.
- wget and tar were installed on both VMs.
- I had the necessary permissions to download, extract, and run the binaries.
Step-by-Step Setup
VM-1: Setting Up Node Exporter
1. Download Node Exporter:
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
2. Extract Node Exporter:
tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz
3. Start Node Exporter:
cd node_exporter-1.8.1.linux-amd64
./node_exporter &
VM-2: Setting Up Prometheus, Alertmanager, and Blackbox Exporter
Prometheus Setup
1. Download Prometheus:
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
2. Extract Prometheus:
tar xvfz prometheus-2.52.0.linux-amd64.tar.gz
3. Start Prometheus:
cd prometheus-2.52.0.linux-amd64
./prometheus --config.file=prometheus.yml &
Alertmanager Setup
1. Download Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
2. Extract Alertmanager:
tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz
3. Start Alertmanager:
cd alertmanager-0.27.0.linux-amd64
./alertmanager --config.file=alertmanager.yml &
Blackbox Exporter Setup
1. Download Blackbox Exporter:
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
2. Extract Blackbox Exporter:
tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz
3. Start Blackbox Exporter:
cd blackbox_exporter-0.25.0.linux-amd64
./blackbox_exporter &
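Once all four processes are running, a quick reachability check can confirm each one is listening before moving on to configuration. The following is a minimal sketch, not part of the original tutorial: the function name `check_endpoint` is hypothetical, and it assumes `curl` is installed and that each service is on its default port.

```shell
#!/usr/bin/env bash
# Report whether an HTTP endpoint answers with a success status.
# check_endpoint is a hypothetical helper name, not from the tutorial.
check_endpoint() {
  local url="$1"
  # -f: treat HTTP errors as failure, -sS: quiet but keep errors,
  # --max-time: give up after 5 seconds instead of hanging.
  if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    echo "OK   ${url}"
  else
    echo "FAIL ${url}"
    return 1
  fi
}

# Check every URL passed on the command line; keep going after a failure.
for url in "$@"; do
  check_endpoint "$url" || true
done
```

Saved as, say, `check.sh` and run from VM-2, it could be called with `http://3.110.195.114:9100/metrics` for Node Exporter, `http://localhost:9090/-/healthy` for Prometheus, `http://localhost:9093/-/healthy` for Alertmanager, and `http://localhost:9115/metrics` for Blackbox Exporter.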
Configuration Details
Prometheus Configuration (prometheus.yml)
- Global Configuration:
  - Scrape interval: 15s
  - Evaluation interval: 15s
- Scrape Configurations:
  - Prometheus itself:
    - Job name: prometheus
    - Target: localhost:9090
  - Node Exporter:
    - Job name: node_exporter
    - Target: 3.110.195.114:9100
  - Blackbox Exporter:
    - Job name: blackbox
    - Targets:
      - http://prometheus.io
      - https://prometheus.io
      - http://3.110.195.114:8080/
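The settings above map onto a prometheus.yml roughly like the sketch below, written out with a shell heredoc. Several parts are assumptions rather than details from the tutorial: that Alertmanager and Blackbox Exporter run on the same VM as Prometheus on their default ports (9093 and 9115), and that the `http_2xx` probe module is used. The `relabel_configs` block is the usual pattern for Blackbox Exporter: the listed URL becomes the `target` parameter, and the scrape itself goes to the exporter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run from inside the extracted Prometheus directory.
# Note: this overwrites the prometheus.yml that ships with the tarball.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Assumption: Alertmanager runs next to Prometheus on its default port.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - alert_rules.yml

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node_exporter
    static_configs:
      - targets: ['3.110.195.114:9100']

  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]   # assumption: default HTTP probe module
    static_configs:
      - targets:
          - http://prometheus.io
          - https://prometheus.io
          - http://3.110.195.114:8080/
    relabel_configs:
      # Pass the listed URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the exporter itself (assumed to be on localhost:9115).
      - target_label: __address__
        replacement: localhost:9115
EOF

echo "wrote prometheus.yml"
```

Once alert_rules.yml also exists in the same directory, `./promtool check config prometheus.yml` (promtool ships in the Prometheus tarball) can validate the file before Prometheus is restarted.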
Alertmanager Configuration (alertmanager.yml)
- Routing Configuration:
  - Group alerts by: alertname
  - Group wait: 30s
  - Group interval: 5m
  - Repeat interval: 1h
  - Default receiver: email-notifications
- Receiver Configuration:
  - Receiver name: email-notifications
  - Email recipient: email@gmail.com
  - SMTP server: smtp.gmail.com:587
  - Auth username and password (to be configured).
- Inhibition Rules:
  - Source match: severity: critical
  - Target match: severity: warning
  - Equal fields: alertname, dev, instance
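Put together, those settings could look something like the following alertmanager.yml, again written with a heredoc. This is a sketch under stated assumptions: the `from` address and both credential values are placeholders that are not in the original post, and Gmail typically expects an app password rather than the account password. The `source_match` and `target_match` keys mirror the wording above; newer Alertmanager releases prefer `source_matchers` and `target_matchers`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run from inside the extracted Alertmanager directory.
# Note: this overwrites the alertmanager.yml that ships with the tarball.
cat > alertmanager.yml <<'EOF'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: email-notifications

receivers:
  - name: email-notifications
    email_configs:
      - to: email@gmail.com
        from: email@gmail.com            # placeholder sender (assumption)
        smarthost: smtp.gmail.com:587
        auth_username: email@gmail.com   # placeholder (assumption)
        auth_password: REPLACE_ME        # placeholder, never commit a real one

inhibit_rules:
  # A firing critical alert mutes the matching warning alert.
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: ['alertname', 'dev', 'instance']
EOF

echo "wrote alertmanager.yml"
```

The `amtool` binary in the Alertmanager tarball can validate the result with `./amtool check-config alertmanager.yml`.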
Alert Rules Configuration (alert_rules.yml)
Some of the key alert rules I configured include:
- InstanceDown: Alerts if an instance is down for more than 1 minute.
- WebsiteDown: Alerts if a website probe fails.
- HostOutOfMemory: Alerts if available memory drops below 25%.
- HostOutOfDiskSpace: Alerts if free disk space drops below 50%.
- HostHighCpuLoad: Alerts if CPU load exceeds 80%.
- ServiceUnavailable: Alerts if a service is unavailable.
- HighMemoryUsage: Alerts if memory usage exceeds 90%.
- FileSystemFull: Alerts if file system free space drops below 10%.
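To make a few of these rules concrete, here is a sketch of how five of them might be expressed in alert_rules.yml. The PromQL expressions, the `for` durations other than InstanceDown's 1 minute, the severity labels, and the group name are my assumptions and may differ from the tutorial's exact rules; the metric names are the standard ones exposed by Node Exporter and Blackbox Exporter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Quoted 'EOF' keeps the shell from expanding {{ $labels.instance }}.
cat > alert_rules.yml <<'EOF'
groups:
  - name: alert_rules   # hypothetical group name
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"

      - alert: WebsiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"

      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Less than 25% memory available on {{ $labels.instance }}"

      - alert: HostOutOfDiskSpace
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Less than 50% disk space free on {{ $labels.instance }}"

      - alert: HostHighCpuLoad
        # Busy CPU = 100 minus the average idle percentage over 5 minutes.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU load above 80% on {{ $labels.instance }}"
EOF

echo "wrote alert_rules.yml"
```

The file can be checked with `./promtool check rules alert_rules.yml` before Prometheus loads it.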
Firewall and Security Settings
I had to configure the firewall to allow traffic on the necessary ports:
- Prometheus: 9090
- Alertmanager: 9093
- Blackbox Exporter: 9115
- Node Exporter: 9100
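The post does not say which firewall was involved, so the following is only a sketch assuming `ufw` on Ubuntu; on a cloud VM the equivalent would be inbound rules in the provider's security group. The `open_ports` function and the `vm1`/`vm2` role names are hypothetical. By default the script only prints the commands, and it runs them when `APPLY=1` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (or, with APPLY=1, run) the ufw rules for one VM.
# open_ports and the vm1/vm2 role names are hypothetical helpers.
open_ports() {
  local role="$1" ports port
  case "$role" in
    vm1) ports="9100" ;;             # Node Exporter
    vm2) ports="9090 9093 9115" ;;   # Prometheus, Alertmanager, Blackbox Exporter
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  for port in $ports; do
    if [ "${APPLY:-0}" = "1" ]; then
      sudo ufw allow "${port}/tcp"
    else
      echo "sudo ufw allow ${port}/tcp"
    fi
  done
}

# Usage: ./open_ports.sh vm1   (dry run)
#        APPLY=1 ./open_ports.sh vm2
if [ "$#" -gt 0 ]; then
  open_ports "$1"
fi
```

Splitting the ports by VM keeps each machine's exposure to what it actually serves: only 9100 needs to be reachable on VM-1, and only from the Prometheus host.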
Key Features and Functionalities
Monitoring: Using Node Exporter and Prometheus, I was able to monitor crucial system metrics such as CPU usage, memory availability, and disk space. Prometheus scrapes these metrics every 15 seconds, giving a near-real-time view of the system's performance.
Alerting: With Alertmanager, I set up email notifications for critical events such as an instance going down or high CPU load, so I was informed of issues shortly after they fired. The flexibility of Alertmanager's configuration allowed me to tailor alerts to meet specific needs.
Probing: The Blackbox Exporter allowed me to monitor the availability and response times of various endpoints, including web services, ensuring that they remained accessible and responsive.
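To see what a probe actually returns, the Blackbox Exporter can be queried by hand through its `/probe` endpoint, which is the same request Prometheus makes on each scrape. This is a minimal sketch: the `probe` function name is hypothetical, and it assumes `curl` is installed, the exporter is on `localhost:9115`, and the `http_2xx` module is configured.

```shell
#!/usr/bin/env bash
# Ask the Blackbox Exporter to probe one URL and show the key results.
# probe is a hypothetical helper; BLACKBOX_ADDR overrides the exporter address.
probe() {
  local target="$1"
  local module="${2:-http_2xx}"
  local exporter="${BLACKBOX_ADDR:-localhost:9115}"
  # -G sends the url-encoded values as query parameters on a GET request.
  curl -fsS --max-time 10 -G \
    --data-urlencode "target=${target}" \
    --data-urlencode "module=${module}" \
    "http://${exporter}/probe" 2>/dev/null \
    | grep -E '^probe_(success|duration_seconds|http_status_code) ' \
    || echo "no probe result for ${target} (is the exporter running?)"
}

# Probe every URL passed on the command line.
for t in "$@"; do
  probe "$t"
done
```

Running it with `https://prometheus.io` as the argument should print the `probe_success`, `probe_duration_seconds`, and `probe_http_status_code` lines for that target, which are the values the WebsiteDown rule and any response-time dashboards build on.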
Challenges and Solutions
One of the challenges I faced during this project was configuring the firewall and ensuring proper communication between the services across the two VMs. By carefully adjusting firewall rules and thoroughly reviewing the configuration files, I was able to overcome these hurdles and achieve a seamless setup.
Future Enhancements
Moving forward, I plan to enhance this setup by integrating Grafana for more sophisticated visualization of the metrics collected by Prometheus. Additionally, I am considering automating the entire deployment process using Ansible, making it easier to replicate the setup across different environments.
Conclusion
Following along with DevOps Shack's tutorials provided me with a solid foundation in setting up a comprehensive monitoring and alerting system using open-source tools. This project is a testament to the power of hands-on learning in mastering DevOps concepts. I encourage anyone interested in DevOps to explore these tools, experiment with configurations, and discover the power of effective monitoring in maintaining a healthy infrastructure.
You can explore more about the project on DevOps Shack’s YouTube channel. If you have any questions or feedback, feel free to reach out!