When it comes to IT infrastructure monitoring, Nagios Core is one of the most widely adopted open-source tools. It gives sysadmins the power to keep an eye on servers, networks, applications, and even devices like printers, all in one place.
In this post, I'll walk you through how Nagios works, its components, and a real-time example of monitoring a Linux server.
What is Nagios?
Nagios is an open-source monitoring engine that continuously tracks your IT environment. It detects issues (like high CPU usage, disk running out of space, or services going down) and alerts admins immediately, helping reduce downtime.
Nagios Core can scale to monitor thousands of hosts and services, making it a great fit for small setups as well as large enterprise infrastructures.
Nagios Architecture
A Nagios setup usually has:
- Nagios Server (central host): Runs Nagios Core, holds all configs, and processes results.
- Plugins: Small executables/scripts that check specific resources (disk, CPU, HTTP, FTP, etc.).
- NRPE (Nagios Remote Plugin Executor): Allows Nagios to run plugins on remote Linux/Unix hosts.
- NSClient++: Windows equivalent for running checks remotely.
How Nagios Works
- Nagios schedules checks using its monitoring engine.
- It executes plugins locally or remotely (via NRPE/NSClient++).
- Plugins return results like:
  - OK (all good)
  - WARNING (threshold breached)
  - CRITICAL (service down/major problem)
  - UNKNOWN (the plugin could not determine the state)
- Nagios updates its status data (status.dat) and triggers alerts based on configuration.
- Alerts can be sent via email, SMS, Slack, Telegram, or custom scripts.
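A plugin reports its result through its process exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. Here is a minimal Python sketch of how a scheduler could run a plugin and interpret that code. `run_check` is a hypothetical helper written for this post, not part of Nagios, and the "plugin" in the demo is just a Python one-liner standing in for a real one.

```python
import subprocess
import sys

# Standard plugin exit codes and the states they represent.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10):
    """Run a plugin command and return (state, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The plugin could not be run or took too long: treat as UNKNOWN.
        return "UNKNOWN", str(exc)
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    return state, lines[0] if lines else ""


if __name__ == "__main__":
    # Stand-in plugin: prints a status line and exits with code 1 (WARNING).
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('LOAD WARNING - load average: 4.2'); sys.exit(1)",
    ]
    print(run_check(fake_plugin))
```

Any exit code outside 0 to 3 is mapped to UNKNOWN here, which is a choice made for this sketch.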
Active vs Passive Checks
- Active Check: Nagios runs the check itself (e.g., check_http connects to port 80 of a web server).
- Passive Check: External apps or agents send results back to Nagios (e.g., SNMP traps).
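One common way to hand a passive result to Nagios is the external command file, using the `PROCESS_SERVICE_CHECK_RESULT` command. The Python sketch below builds such a line. The helper names are made up for this post, the command file path is an assumption that depends on your installation, and the host and service names are placeholders.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one passive service check result as an external command line."""
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the Nagios command file (path is an assumption)."""
    with open(command_file, "w") as handle:
        handle.write(line)


if __name__ == "__main__":
    # Only print the line here; submit() needs a running Nagios instance.
    print(passive_result_line("web-server-1", "Backup Job", 0, "OK - backup finished"), end="")
```

For this to work, external commands must be enabled in nagios.cfg and the target service must accept passive checks.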
Nagios Directory Structure
When you install Nagios Core, you'll see something like this:
    nagios/
    ├── bin      # Nagios main daemon
    ├── etc      # All configuration files
    ├── libexec  # Plugins (check_http, check_ping, etc.)
    ├── sbin     # CGI executables for web interface
    ├── share    # Web interface files (HTML/PHP)
    └── var      # Runtime data (logs, cache, status.dat)
Real-Time Example: Monitoring an HTTP Service
Let's say you have a server at 44.233.51.131 and you want to monitor if its web service is up.
Here's how a Nagios service definition might look:
    define service {
        use                     local-service
        host_name               web-server-1
        service_description     HTTP
        check_command           check_http!-H 44.233.51.131 -p 80
        check_interval          1
        retry_interval          1
        notifications_enabled   1
        contact_groups          admins
    }
What this does:
- Every 1 minute (with the default 60-second time unit), Nagios checks port 80 on 44.233.51.131.
- If the check keeps returning CRITICAL through the retries allowed by max_check_attempts (inherited here from the local-service template), Nagios alerts the admins group.
- Once it's back online (RECOVERY), Nagios sends another notification.
This lets your team hear about downtime within minutes, minimizing business impact.
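Nagios does not usually notify on the first failed check. A non-OK result puts the service in a SOFT state, and Nagios rechecks it every retry_interval until max_check_attempts is reached. Only then does the problem become HARD and trigger a notification. The sketch below is a simplified Python model of that counter, not Nagios's actual scheduler; `first_hard_problem` and the sample history are made up for illustration.

```python
def first_hard_problem(results, max_check_attempts):
    """Return the index of the check at which a problem turns HARD, or None.

    Simplified model: every non-OK result counts as one attempt, and any OK
    result resets the counter (a soft recovery).
    """
    attempt = 0
    for index, state in enumerate(results):
        if state == "OK":
            attempt = 0
            continue
        attempt += 1
        if attempt >= max_check_attempts:
            return index
    return None


if __name__ == "__main__":
    history = ["OK", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL"]
    # The single failure at index 1 recovers before reaching the limit;
    # the run of failures starting at index 3 is what reaches it.
    print(first_hard_problem(history, max_check_attempts=3))
```

With check_interval and retry_interval both set to 1, each step in this model corresponds to roughly one minute.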
Extend with Custom Plugins
Nagios comes with many plugins (check_disk, check_ping, check_http, etc.), but you can also write your own in Python, PHP, or Shell scripts. For example, you could monitor:
- Application log files for errors
- Database query response times
- API health endpoints
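As a rough idea of what the first item could look like, here is a minimal Python sketch of a log-error check. The function names, the "ERROR" keyword, and the thresholds are all assumptions for illustration. The output follows the usual plugin convention of a status line followed by optional performance data after a `|`.

```python
# Standard plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_errors(lines, keyword="ERROR"):
    """Count the lines that contain the keyword."""
    return sum(1 for line in lines if keyword in line)


def evaluate(count, warn, crit):
    """Map an error count to (exit code, plugin output line)."""
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the pipe is performance data: value;warn;crit
    return code, f"LOG {label} - {count} error lines | errors={count};{warn};{crit}"


if __name__ == "__main__":
    sample_log = [
        "2024-01-01 10:00:01 INFO service started",
        "2024-01-01 10:00:05 ERROR database timeout",
        "2024-01-01 10:00:09 ERROR database timeout",
    ]
    code, message = evaluate(count_errors(sample_log), warn=1, crit=3)
    print(message)
    # A real plugin would read the log file given on the command line and
    # finish with sys.exit(code) so Nagios sees the state.
```

Dropped into libexec and wired up through a command definition, a script like this behaves like any bundled plugin.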
Why Nagios?
- Scalable: Can monitor thousands of hosts and services
- Flexible: Active & passive checks, NRPE, SNMP, and custom plugins
- Reliable: Fast alerting system with escalation policies (e.g., L1 alert to on-call admin, L2 alert to ops team)
- Extensible: Works with other tools like Grafana, Prometheus, or ELK for visualization
Key Configuration Files in /etc/nagios
Nagios stores most of its configuration in the etc directory. Some of the most important files include:
- .htpasswd: Secures the web interface by storing GUI login passwords.
- cgi.cfg: Manages the web interface settings, including login permissions and who can access what.
- resource.cfg: Stores sensitive information such as credentials and common paths, keeping them out of command definitions.
- objects/: Contains definitions for what Nagios monitors, including hosts, services, contacts, and commands.
Important Nagios Config Files
File | Purpose |
---|---|
nagios.cfg | Main configuration file. Defines global settings, intervals for host/service checks, and points to other object configuration files. |
resource.cfg | Secure storage for sensitive data like usernames, passwords, and directory paths. |
contacts.cfg | Defines who gets notifications when hosts or services go down. Includes contacts (users) and contact groups. Typically contains a default contact like nagiosadmin. |
commands.cfg | Lists the commands Nagios uses for service checks, notifications, and event handlers. You can also add custom commands here. |
templates.cfg | Contains object templates (blueprints) for hosts, services, and contacts. Real objects can inherit defaults from these templates. |
timeperiods.cfg | Defines reusable schedules for monitoring and notifications (e.g., 24x7, workhours). Can be referenced by hosts, services, and contacts. |
How Nagios Works
Nagios monitors two main object types:
- Hosts: Any entity on your network, such as servers, printers, or switches.
- Services: Checks performed on hosts, like CPU load, disk space, web server status, or network connectivity.
Nagios uses commands to perform checks, send notifications, and handle events. You can also write custom plugins in Python, PHP, or Shell scripts and add them to libexec, then reference them in commands.cfg.
Common Nagios Commands
Here are some of the most widely used Nagios plugins:
Command | Purpose |
---|---|
check_http | Checks Apache/Nginx/web server status |
check_ping | Monitors network connectivity (latency, packet loss) |
check_ssh | Checks if SSH service is running |
check_disk | Checks disk usage on partitions |
check_load | Monitors CPU load average |
check_users | Counts logged-in users |
check_procs | Monitors number of processes or specific process status |
check_swap | Checks swap memory usage |
check_snmp | Monitors SNMP-enabled devices like printers, switches, routers |
check_tcp | Checks TCP port availability (e.g., DB, mail, custom services) |
check_dns | Checks DNS resolution status |
check_smtp | Monitors mail server (SMTP) status |
check_mysql | Checks MySQL/MariaDB availability (if installed) |
Example: Monitoring a Web Server
Here's a simple service definition to monitor an HTTP server at IP 44.233.51.131:
    define service {
        use                     local-service
        host_name               web-server-1
        service_description     HTTP
        check_command           check_http!-H 44.233.51.131 -p 80
        check_interval          1
        retry_interval          1
        notifications_enabled   1
        contact_groups          admins
    }
- Every 1 minute, Nagios checks port 80 on the server.
- Alerts are sent to the admins group if the service goes CRITICAL.
- Recovery notifications are sent once the server is back online.
Step-by-Step Guide: Monitoring a Remote Host with NRPE in Nagios
Nagios Core can monitor remote Linux hosts using NRPE (Nagios Remote Plugin Executor). This allows your Nagios master server to execute checks on a remote slave host. Let's go through the process step by step.
Step 1: Configure NRPE on the Slave Host
The NRPE daemon runs on the remote host you want to monitor. Its configuration file location depends on your installation:
- Common paths: /etc/nagios/nrpe.cfg or /usr/local/nagios/etc/nrpe.cfg
Open the file with your editor (I'll use nano here):
    sudo nano /usr/local/nagios/etc/nrpe.cfg
Locate the allowed_hosts directive and add your Nagios master IP:
    allowed_hosts=127.0.0.1,192.168.1.10
Replace 192.168.1.10 with your Nagios master server IP.
Save the file and restart the NRPE service to apply changes:
    sudo systemctl restart nrpe
    # or, depending on your OS:
    sudo service nrpe restart
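If checks from the master are refused later, a wrong allowed_hosts value is a likely suspect. Here is a small Python sketch that reads the directive from the text of an nrpe.cfg and tests whether a given IP would match. Both helper functions are hypothetical, and the sketch only understands plain IP addresses and CIDR ranges; hostname entries are skipped rather than resolved.

```python
import ipaddress


def allowed_hosts(cfg_text):
    """Return the entries of the allowed_hosts directive, or an empty list."""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "allowed_hosts":
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []


def is_allowed(ip, entries):
    """Check an IP against the entries (IP addresses and CIDR ranges only)."""
    address = ipaddress.ip_address(ip)
    for entry in entries:
        try:
            if address in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # not an IP or network, e.g. a hostname: skipped here
    return False


if __name__ == "__main__":
    sample = "# sample nrpe.cfg fragment\nallowed_hosts=127.0.0.1,192.168.1.10\n"
    entries = allowed_hosts(sample)
    print(entries, is_allowed("192.168.1.10", entries))
```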
Step 2: Verify Connection from the Master
On your Nagios master server, test if it can communicate with the slave host:
    /usr/local/nagios/libexec/check_nrpe -H <Slave_IP_Address>
- Replace <Slave_IP_Address> with the actual IP of your remote host.
- If configured correctly, NRPE should respond with its version number or a test message.
Step 3: Define the Slave Host in Nagios
Create a configuration file for the slave host, e.g., /usr/local/nagios/etc/objects/slave.cfg:
    define host{
        use                     linux-server
        host_name               Slave-Server-01
        alias                   My Remote Linux Host
        address                 <Slave_IP_Address>
        max_check_attempts      5
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        contact_groups          admins
        register                1
    }
- linux-server is a host template defined in templates.cfg
- register 1: This is a real host definition, not a template
- Replace <Slave_IP_Address> with the remote host IP
Step 4: Define Services (The Actual Checks)
You can define services in the same file (slave.cfg) or a separate services.cfg. Here are some common checks using check_nrpe:
    # Check CPU Load
    define service{
        use                     generic-service
        host_name               Slave-Server-01
        service_description     CPU Load
        check_command           check_nrpe!check_load
    }
    # Check Current Users
    define service{
        use                     generic-service
        host_name               Slave-Server-01
        service_description     Current Users
        check_command           check_nrpe!check_users
    }
    # Check Root Disk Space
    define service{
        use                     generic-service
        host_name               Slave-Server-01
        service_description     Root Disk Space
        check_command           check_nrpe!check_disk
    }
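How it works
- check_command = check_nrpe!check_load
  - check_nrpe: Nagios command defined on the master
  - !check_load: Argument passed to NRPE, which corresponds to the remote command defined in the slave's nrpe.cfg
- Every name passed this way (check_load, check_users, check_disk) needs a matching command[...] entry in the slave's nrpe.cfg; otherwise NRPE reports that the command is not defined.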
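To make the `!` syntax concrete: Nagios splits check_command on `!`, looks up the command named by the first part, and substitutes the remaining parts into its command line as $ARG1$, $ARG2$, and so on. The Python sketch below imitates that substitution. The command_line shown is a typical check_nrpe definition rather than one taken from this setup, and the address and $USER1$ path are assumed values.

```python
def expand_check_command(check_command, command_lines, macros):
    """Expand 'name!arg1!arg2' into a full command line, Nagios-style."""
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    # Arguments after each '!' become $ARG1$, $ARG2$, ...
    for position, arg in enumerate(args, start=1):
        values[f"$ARG{position}$"] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line


if __name__ == "__main__":
    # Assumed command definition on the master (commands.cfg).
    command_lines = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"}
    # Assumed macro values: $USER1$ comes from resource.cfg, $HOSTADDRESS$
    # from the host's address directive.
    macros = {"$USER1$": "/usr/local/nagios/libexec", "$HOSTADDRESS$": "192.168.1.20"}
    print(expand_check_command("check_nrpe!check_load", command_lines, macros))
```

On the remote side, NRPE then looks up check_load among its own command[...] entries and runs whatever plugin line is configured there.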
Step 5: Include the New Host Config and Restart Nagios
Ensure your main Nagios configuration (nagios.cfg) includes your new host configuration:
    cfg_file=/usr/local/nagios/etc/objects/slave.cfg
Finally, restart Nagios to apply the changes:
    sudo systemctl restart nagios
    # or
    sudo service nagios restart
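A new object file that nagios.cfg never loads is an easy mistake to make. The Python sketch below checks the text of a nagios.cfg for a matching cfg_file line, or for a cfg_dir that contains the file. `is_loaded` is a hypothetical helper, and the servers directory in the demo is an assumed path.

```python
def is_loaded(nagios_cfg_text, object_file):
    """Return True if nagios.cfg would load object_file via cfg_file or cfg_dir."""
    for raw in nagios_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "cfg_file" and value == object_file:
            return True
        # cfg_dir loads every .cfg file below the directory, recursively.
        if (
            key == "cfg_dir"
            and object_file.endswith(".cfg")
            and object_file.startswith(value.rstrip("/") + "/")
        ):
            return True
    return False


if __name__ == "__main__":
    sample = (
        "# sample nagios.cfg fragment\n"
        "cfg_file=/usr/local/nagios/etc/objects/commands.cfg\n"
        "cfg_dir=/usr/local/nagios/etc/servers\n"
    )
    print(is_loaded(sample, "/usr/local/nagios/etc/objects/slave.cfg"))
```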
That's it!
Your Nagios master can now monitor the remote slave host via NRPE. You can repeat this process for additional hosts and customize the services to monitor CPU, memory, disk, users, or any custom script.
How to Configure Email Notifications in Nagios Core
Nagios Core can alert administrators via email whenever a host or service goes down, recovers, or reaches a threshold. Setting up email notifications requires configuring contacts, contact groups, and ensuring the Nagios server can send emails. Here's a step-by-step guide.
Step 1: Ensure the Nagios Server Can Send Mail
Nagios itself does not send emails directly. Its notification commands call a command-line mail client (such as mail or mailx), which hands the message to a Mail Transfer Agent (MTA) like Sendmail or Postfix.
- Install and configure your preferred MTA. I prefer using Postfix.
- Test email sending from the command line:
    echo "Test email from Nagios" | mail -s "Nagios Test" user@example.com
If the email arrives successfully, your Nagios server can send notifications.
Step 2: Define the Contact in contacts.cfg
Open the contacts configuration file (usually /usr/local/nagios/etc/objects/contacts.cfg) and define a new contact:
    define contact {
        contact_name                    shanmugananthan
        use                             generic-contact
        alias                           Shan
        email                           shan@example.com
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
    }
Explanation:
- service_notification_options w,u,c,r,f,s: Notifications for Warning, Unknown, Critical, Recovery, Flapping, Scheduled Downtime
- host_notification_options d,u,r,f,s: Notifications for Down, Unreachable, Recovery, Flapping, Scheduled Downtime
- notify-service-by-email / notify-host-by-email: Commands that send email notifications
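The option letters are easy to mix up, since `u` means Unknown for services but Unreachable for hosts. A small Python lookup like the one below can spell out a given option string; the `describe` helper is just an illustration written for this post.

```python
# Letters accepted by service_notification_options.
SERVICE_OPTIONS = {
    "w": "Warning",
    "u": "Unknown",
    "c": "Critical",
    "r": "Recovery",
    "f": "Flapping",
    "s": "Scheduled downtime",
    "n": "None",
}

# Letters accepted by host_notification_options.
HOST_OPTIONS = {
    "d": "Down",
    "u": "Unreachable",
    "r": "Recovery",
    "f": "Flapping",
    "s": "Scheduled downtime",
    "n": "None",
}


def describe(options, table):
    """Translate a comma-separated option string into readable names."""
    names = []
    for letter in options.split(","):
        letter = letter.strip()
        if letter not in table:
            raise ValueError(f"unknown notification option: {letter!r}")
        names.append(table[letter])
    return names


if __name__ == "__main__":
    print(describe("w,u,c,r,f,s", SERVICE_OPTIONS))
    print(describe("d,u,r,f,s", HOST_OPTIONS))
```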
Step 3: Define or Update the Contact Group
If you use contact groups, make sure your new contact is included:
    define contactgroup{
        contactgroup_name   admins
        alias               Nagios Administrators
        members             nagiosadmin,shanmugananthan
    }
- You can assign multiple contacts to a group for centralized notifications.
Step 4: Apply the Contact / Contact Group to Hosts or Services
When defining hosts or services, reference the contact or contact group so they receive alerts.
Example Host Definition:
    define host {
        use             linux-server
        host_name       web-server-01
        alias           Web Server
        address         192.168.1.10
        contact_groups  admins
    }
Example Service Definition:
    define service {
        use                  generic-service
        host_name            web-server-01
        service_description  HTTP
        check_command        check_http
        contacts             shanmugananthan
        contact_groups       admins
    }
- contacts: individual notifications
- contact_groups: group notifications
Step 5: Verify and Restart Nagios
- Check the Nagios configuration for errors:
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
- Restart Nagios to apply changes:
    sudo systemctl restart nagios
If the verification shows no errors, your email notifications are now active.
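Since a restart with a broken configuration leaves monitoring down, it can help to chain the two commands so the restart only happens when verification passes. Here is a hedged Python sketch of that idea. The paths match the ones used in this post, `verify_then_restart` is a made-up helper, and it assumes the script runs with enough privileges to restart the service.

```python
import subprocess

# Paths as used in this post; adjust them to your installation.
NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
NAGIOS_CFG = "/usr/local/nagios/etc/nagios.cfg"


def verify_then_restart(run=subprocess.run):
    """Restart Nagios only if the configuration check exits with status 0.

    The run parameter exists so the function can be exercised with a
    stand-in instead of launching real commands.
    """
    check = run([NAGIOS_BIN, "-v", NAGIOS_CFG])
    if check.returncode != 0:
        return False  # verification failed: leave the running instance alone
    run(["systemctl", "restart", "nagios"], check=True)
    return True
```

Calling `verify_then_restart()` on the Nagios server returns False, without touching the running service, when the configuration check fails.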
Tips:
- Use a generic-contact template to avoid repeating notification options.
- Test notifications by manually stopping a service or host to ensure emails are sent correctly.
- Consider integrating Slack, Telegram, or SMS notifications for faster alerts.