DEV Community

Shanmugananthan A K

🚀 Getting Started with Nagios Core: Monitoring Made Simple

When it comes to IT infrastructure monitoring, Nagios Core is one of the most widely adopted open-source tools. It gives sysadmins the power to keep an eye on servers, networks, applications, and even devices like printers, all in one place.

In this post, I'll walk you through how Nagios works, its components, and hands-on examples: monitoring an HTTP service, monitoring a remote Linux host via NRPE, and setting up email notifications.


🔎 What is Nagios?

Nagios is an open-source monitoring engine that continuously tracks your IT environment. It detects issues (like high CPU usage, disk running out of space, or services going down) and alerts admins immediately, helping reduce downtime.

Nagios Core can scale to monitor thousands of hosts and services, making it a great fit for small setups as well as large enterprise infrastructures.


⚙️ Nagios Architecture

A Nagios setup usually has:

  • Nagios Server (Central host) → Runs Nagios Core, holds all configs, and processes results.
  • Plugins → Small executables/scripts that check specific resources (disk, CPU, HTTP, FTP, etc.).
  • NRPE (Nagios Remote Plugin Executor) → Allows Nagios to run plugins on remote Linux/Unix hosts.
  • NSClient++ → Windows equivalent for running checks remotely.

🔄 How Nagios Works

  1. Nagios schedules checks using its monitoring engine.
  2. It executes plugins locally or remotely (via NRPE/NSClient++).
  3. Plugins return results like:
    • OK (all good)
    • WARNING (threshold breached)
    • CRITICAL (service down/major problem)
    • UNKNOWN (the plugin could not determine the state)
  4. Nagios updates its status data and triggers alerts based on configuration.
  5. Alerts can be sent via email, SMS, Slack, Telegram, or custom scripts.
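
The contract between Nagios and a plugin is small enough to sketch. A check is just a process: its exit code carries the state and the first line of its output becomes the status text. Here's a rough Python illustration of that idea. It is not how Nagios is implemented internally, and `run_check` and the stand-in plugin command are names I made up for the example.

```python
import subprocess
import sys

# Standard plugin exit codes and the states they represent.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10):
    """Run a plugin command and return (state, first line of its output).

    Hypothetical helper for illustration; Nagios does this in its own engine.
    """
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Simplification: this sketch reports timeouts as UNKNOWN.
        return "UNKNOWN", "check timed out"
    lines = proc.stdout.strip().splitlines()
    output = lines[0] if lines else "(no output)"
    # Simplification: exit codes outside 0-3 are mapped to UNKNOWN here.
    return STATES.get(proc.returncode, "UNKNOWN"), output


# Stand-in for a real plugin: prints a status line and exits with code 1.
fake_plugin = [
    sys.executable,
    "-c",
    "print('DISK WARNING - 85% used'); raise SystemExit(1)",
]
state, output = run_check(fake_plugin)
print(state, "-", output)
```

Swapping `fake_plugin` for a real plugin path (for example something under libexec) gives the same kind of tuple back, which is all Nagios needs to decide whether to alert.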

📬 Active vs Passive Checks

  • Active Check → Nagios runs the check itself (e.g., check_http connects to port 80 of a web server).
  • Passive Check → External apps or agents send results back to Nagios (e.g., SNMP traps).
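
To make the passive side concrete: an external program submits a result by writing a PROCESS_SERVICE_CHECK_RESULT line to the Nagios external command file. The sketch below builds such a line in Python. The function names, the "Nightly Backup" service, and the command file path are assumptions for illustration; the real path is whatever command_file points to in your nagios.cfg.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the Nagios command file (a named pipe).

    The default path is an assumption based on a source install.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Example: a backup job reporting its own success (return code 0 = OK).
line = passive_service_result(
    "web-server-1", "Nightly Backup", 0, "Backup finished", now=1700000000
)
print(line, end="")
```

For Nagios to accept the result, a service with that exact host_name and service_description has to exist and have passive checks enabled.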

📦 Nagios Directory Structure

When you install Nagios Core, you'll see something like this:

nagios/
├── bin       # Nagios main daemon
├── etc       # All configuration files
├── libexec   # Plugins (check_http, check_ping, etc.)
├── sbin      # CGI executables for web interface
├── share     # Web interface files (HTML/PHP)
└── var       # Runtime data (logs, cache, status.dat)

🖥️ Real-Time Example: Monitoring an HTTP Service

Let's say you have a server at 44.233.51.131 and you want to monitor if its web service is up.

Here's how a Nagios service definition might look (it assumes a host named web-server-1 is already defined):

define service {
    use                   local-service
    host_name             web-server-1
    service_description   HTTP
    check_command         check_http!-H 44.233.51.131 -p 80
    check_interval        1
    retry_interval        1
    notifications_enabled 1
    contact_groups        admins
}

✅ What this does:

  • Every 1 minute, Nagios checks port 80 on 44.233.51.131.
  • If a check fails (CRITICAL), Nagios rechecks every minute (retry_interval) and alerts the admins group once the failure has persisted for max_check_attempts checks, a value inherited here from the local-service template.
  • Once it's back online (RECOVERY), Nagios sends another notification.

This ensures your team hears about downtime within minutes, minimizing business impact.
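
One detail is worth modelling: Nagios normally rechecks a failing service and only notifies once the problem has lasted for max_check_attempts consecutive checks, at which point the state becomes "hard". Here's a simplified Python model of that behaviour. It is my own sketch rather than Nagios code; `notifications` is a hypothetical helper and 3 is just an example value for max_check_attempts.

```python
def notifications(results, max_check_attempts=3):
    """Return (index, event) pairs for the checks that would notify.

    results: sequence of "OK" / "CRITICAL" strings, one per check.
    Simplified: ignores flapping, downtime, and repeat notifications.
    """
    events = []
    attempt = 0           # consecutive non-OK results so far
    hard_problem = False  # True once a problem notification went out
    for i, state in enumerate(results):
        if state == "OK":
            if hard_problem:
                events.append((i, "RECOVERY"))
            attempt, hard_problem = 0, False
        else:
            attempt += 1
            if attempt == max_check_attempts and not hard_problem:
                events.append((i, "PROBLEM"))
                hard_problem = True
    return events


# One transient blip (no alert), then a real outage and its recovery.
history = ["OK", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
print(notifications(history))  # [(5, 'PROBLEM'), (6, 'RECOVERY')]
```

The single failed check at index 1 never produces an alert, which is exactly why a value above 1 helps filter out short network hiccups.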


🔧 Extend with Custom Plugins

Nagios comes with many plugins (check_disk, check_ping, check_http, etc.), but you can also write your own in Python, PHP, or Shell scripts. For example, you could monitor:

  • Application log files for errors
  • Database query response times
  • API health endpoints
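
As an example of the first idea, here's a minimal sketch of a log-checking plugin in Python. The function names, the ERROR pattern, and the thresholds are all assumptions of mine; the parts that follow the plugin convention are the return codes (0 to 3) and the single status line with optional performance data after the pipe character.

```python
import re

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log(lines, pattern=r"\bERROR\b", warn=1, crit=5):
    """Count lines matching pattern and map the count to a plugin result."""
    count = sum(1 for line in lines if re.search(pattern, line))
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is performance data: label=value;warn;crit
    message = f"LOG {label} - {count} error lines | errors={count};{warn};{crit}"
    return code, message


def main(path):
    """Check one log file, print the status line, return the exit code."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            code, message = check_log(handle)
    except OSError as exc:
        code, message = UNKNOWN, f"LOG UNKNOWN - cannot read {path}: {exc}"
    print(message)
    return code


# Quick demonstration on in-memory lines instead of a real file.
sample = ["INFO start", "ERROR db timeout", "ERROR disk full"]
print(check_log(sample))
```

Installed as a real plugin, the script would finish by calling `sys.exit(main(sys.argv[1]))` so that the return code reaches Nagios as the process exit status.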

🎯 Why Nagios?

  • Scalable → Can monitor thousands of hosts and services
  • Flexible → Active & passive checks, NRPE, SNMP, and custom plugins
  • Reliable → Fast alerting system with escalation policies (e.g., L1 alert to on-call admin, L2 alert to ops team)
  • Extensible → Works with other tools like Grafana, Prometheus, or ELK for visualization

📂 Key Configuration Files in /etc/nagios

Nagios stores most of its configuration in its etc directory (/usr/local/nagios/etc on a source install, which is the layout the later examples use). Some of the most important files include:

  • .htpasswd → Secures the web interface by storing GUI login passwords (a default source install typically names this file htpasswd.users).
  • cgi.cfg → Manages the web interface settings, including login permissions and who can access what.
  • resource.cfg → Stores user-defined macros ($USER1$, $USER2$, ...) for sensitive values such as credentials and for common paths such as the plugin directory, keeping them out of command definitions.
  • objects/ → Contains definitions for what Nagios monitors, including hosts, services, contacts, and commands.
🔹 Important Nagios Config Files

| File | Purpose |
| --- | --- |
| nagios.cfg | Main configuration file. Defines global settings, intervals for host/service checks, and points to other object configuration files. |
| resource.cfg | Stores sensitive data like usernames, passwords, and directory paths as $USERn$ macros. |
| contacts.cfg | Defines who gets notifications when hosts or services go down. Includes contacts (users) and contact groups. Typically contains a default contact like nagiosadmin. |
| commands.cfg | Lists the commands Nagios uses for service checks, notifications, and event handlers. You can also add custom commands here. |
| templates.cfg | Contains object templates (blueprints) for hosts, services, and contacts. Real objects can inherit defaults from these templates. |
| timeperiods.cfg | Defines reusable schedules for monitoring and notifications (e.g., 24x7, workhours). Can be referenced by hosts, services, and contacts. |

⚙️ How Nagios Works

Nagios monitors two main object types:

  1. Hosts → Any entity on your network: servers, printers, switches, etc.
  2. Services → Checks performed on hosts, like CPU load, disk space, web server status, or network connectivity.

Nagios uses commands to perform checks, send notifications, and handle events. You can also write custom plugins in Python, PHP, or Shell scripts and add them to libexec, then reference them in commands.cfg.


🔹 Common Nagios Commands

Here are some of the most widely used Nagios plugins:

| Command | Purpose |
| --- | --- |
| check_http | Checks Apache/Nginx/web server status |
| check_ping | Monitors network connectivity (latency, packet loss) |
| check_ssh | Checks if SSH service is running |
| check_disk | Checks disk usage on partitions |
| check_load | Monitors CPU load average |
| check_users | Counts logged-in users |
| check_procs | Monitors number of processes or specific process status |
| check_swap | Checks swap memory usage |
| check_snmp | Monitors SNMP-enabled devices like printers, switches, routers |
| check_tcp | Checks TCP port availability (e.g., DB, mail, custom services) |
| check_dns | Checks DNS resolution status |
| check_smtp | Monitors mail server (SMTP) status |
| check_mysql | Checks MySQL/MariaDB availability (if installed) |

🔹 Step-by-Step Guide: Monitoring a Remote Host with NRPE in Nagios

Nagios Core can monitor remote Linux hosts using NRPE (Nagios Remote Plugin Executor). This allows your Nagios master server to execute checks on a remote slave host. Let's go through the process step by step.


Step 1: Configure NRPE on the Slave Host

The NRPE daemon runs on the remote host you want to monitor. Its configuration file location depends on your installation:

  • Common paths: /etc/nagios/nrpe.cfg or /usr/local/nagios/etc/nrpe.cfg

Open the file with your editor (I'll use nano here):

sudo nano /usr/local/nagios/etc/nrpe.cfg

Locate the allowed_hosts directive and add your Nagios master IP:

allowed_hosts=127.0.0.1,192.168.1.10

Replace 192.168.1.10 with your Nagios master server IP.

Save the file and restart the NRPE service to apply changes:

sudo systemctl restart nrpe
# or, depending on your OS:
sudo service nrpe restart

Step 2: Verify Connection from the Master

On your Nagios master server, test if it can communicate with the slave host:

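/usr/local/nagios/libexec/check_nrpe -H <Slave_IP_Address>

  • Replace <Slave_IP_Address> with the actual IP of your remote host.
  • If configured correctly, NRPE responds with its version string (NRPE v followed by the version number).
  • A refused connection or an SSL handshake error usually means the master's IP is missing from allowed_hosts or a firewall is blocking the NRPE port (5666 by default).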

Step 3: Define the Slave Host in Nagios

Create a configuration file for the slave host, e.g., /usr/local/nagios/etc/objects/slave.cfg:

define host{
    use                     linux-server
    host_name               Slave-Server-01
    alias                   My Remote Linux Host
    address                 <Slave_IP_Address>
    max_check_attempts      5
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    contact_groups          admins
    register                1
}

  • linux-server is a host template defined in templates.cfg
  • register 1 → This is a real host definition
  • Replace <Slave_IP_Address> with the remote host IP

Step 4: Define Services (The Actual Checks)

You can define services in the same file (slave.cfg) or a separate services.cfg. Here are some common checks using check_nrpe:

# Check CPU Load
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     CPU Load
    check_command           check_nrpe!check_load
}

# Check Current Users
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     Current Users
    check_command           check_nrpe!check_users
}

# Check Root Disk Space
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     Root Disk Space
    check_command           check_nrpe!check_disk
}

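🔹 How it works

  • check_command = check_nrpe!check_load

    • check_nrpe → Nagios command defined on the master. The sample commands.cfg may not include it; a typical definition runs $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$.
    • !check_load → Argument passed to NRPE, which must match a command[check_load]=... entry in the slave's nrpe.cfg. Every name used above (check_load, check_users, check_disk) needs such an entry, so add any that are missing and restart NRPE.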
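
If the exclamation-mark syntax feels opaque, this small Python sketch shows roughly what happens to a check_command value: the part before the first "!" selects a command definition, and the remaining parts fill $ARG1$, $ARG2$, and so on in its command line. It is a simplified model, not Nagios's real macro engine, and the command line, plugin path, and address below are assumed example values.

```python
def expand_check_command(check_command, command_lines, macros):
    """Turn 'name!arg1!arg2' into the command line that would be run.

    Simplified: real Nagios supports many more macros and escaping rules.
    """
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"$ARG{position}$"] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line


# Assumed definitions: a typical check_nrpe command and plugin path.
command_lines = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}
macros = {
    "$USER1$": "/usr/local/nagios/libexec",  # normally set in resource.cfg
    "$HOSTADDRESS$": "192.0.2.10",           # the host's address field
}

print(expand_check_command("check_nrpe!check_load", command_lines, macros))
# /usr/local/nagios/libexec/check_nrpe -H 192.0.2.10 -c check_load
```

On the remote side, NRPE then looks up check_load in its own nrpe.cfg and runs whatever command is mapped to that name.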

Step 5: Include the New Host Config and Restart Nagios

Ensure your main Nagios configuration (nagios.cfg) includes your new host configuration:

cfg_file=/usr/local/nagios/etc/objects/slave.cfg

Finally, restart Nagios to apply the changes:

sudo systemctl restart nagios
# or
sudo service nagios restart

✅ That's it!
Your Nagios master can now monitor the remote slave host via NRPE. You can repeat this process for additional hosts and customize the services to monitor CPU, memory, disk, users, or any custom script.


🔹 How to Configure Email Notifications in Nagios Core

Nagios Core can alert administrators via email whenever a host or service goes down, recovers, or reaches a threshold. Setting up email notifications requires configuring contacts, contact groups, and ensuring the Nagios server can send emails. Here's a step-by-step guide.


Step 1: Ensure the Nagios Server Can Send Mail

Nagios itself does not send emails directly. Its notification commands pipe the message to a command-line mail client such as mail, which hands it to a Mail Transfer Agent (MTA) like Postfix or Sendmail.

  1. Install and configure your preferred MTA. I prefer using Postfix.
  2. Test email sending from the command line:

echo "Test email from Nagios" | mail -s "Nagios Test" user@example.com

If the email arrives successfully, your Nagios server can send notifications.
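
If you'd rather test the MTA from a script, here's a rough Python equivalent of the command above. It assumes an MTA is listening for SMTP on localhost port 25, which may not match your setup, and the function names and the nagios@localhost sender address are placeholders of mine.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(recipient, sender="nagios@localhost"):
    """Build the same test mail as the shell one-liner."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Nagios Test"
    message.set_content("Test email from Nagios")
    return message


def send(message, host="localhost", port=25):
    """Hand the message to the local MTA over SMTP (assumed host/port)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)


# Building the message has no side effects; call send(msg) to deliver it.
msg = build_test_message("user@example.com")
print(msg["Subject"], "->", msg["To"])
```

A connection error from `send` points at the MTA rather than at Nagios, which narrows down where to look.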


Step 2: Define the Contact in contacts.cfg

Open the contacts configuration file (usually /usr/local/nagios/etc/objects/contacts.cfg) and define a new contact:

define contact {
    contact_name                    shanmugananthan
    use                             generic-contact
    alias                           Shan
    email                           shan@example.com
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r,f,s
    host_notification_options       d,u,r,f,s
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
}

Explanation:

  • service_notification_options w,u,c,r,f,s → Notifications for Warning, Unknown, Critical, Recovery, Flapping, Scheduled Downtime
  • host_notification_options d,u,r,f,s → Notifications for Down, Unreachable, Recovery, Flapping, Scheduled Downtime
  • notify-service-by-email / notify-host-by-email → Commands that send email notifications
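
The option letters are easy to mix up (u means Unknown for services but Unreachable for hosts), so here's a tiny Python lookup that spells them out. It's only a reading aid; `describe` is a hypothetical helper, and I've added n (none), which switches notifications off.

```python
SERVICE_OPTIONS = {
    "w": "WARNING",
    "u": "UNKNOWN",
    "c": "CRITICAL",
    "r": "RECOVERY",
    "f": "FLAPPING",
    "s": "SCHEDULED DOWNTIME",
    "n": "NONE",
}
HOST_OPTIONS = {
    "d": "DOWN",
    "u": "UNREACHABLE",
    "r": "RECOVERY",
    "f": "FLAPPING",
    "s": "SCHEDULED DOWNTIME",
    "n": "NONE",
}


def describe(options, table):
    """Expand a comma-separated option string into readable names."""
    return [table[letter.strip()] for letter in options.split(",")]


print(describe("w,u,c,r,f,s", SERVICE_OPTIONS))
print(describe("d,u,r,f,s", HOST_OPTIONS))
```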

Step 3: Define or Update the Contact Group

If you use contact groups, make sure your new contact is included:

define contactgroup{
    contactgroup_name admins
    alias             Nagios Administrators
    members           nagiosadmin,shanmugananthan
}

  • You can assign multiple contacts to a group for centralized notifications.

Step 4: Apply the Contact / Contact Group to Hosts or Services

When defining hosts or services, reference the contact or contact group so they receive alerts.

Example Host Definition:

define host {
    use                     linux-server
    host_name               web-server-01
    alias                   Web Server
    address                 192.168.1.10
    contact_groups          admins
}

Example Service Definition:

define service {
    use                     generic-service
    host_name               web-server-01
    service_description     HTTP
    check_command           check_http
    contacts                shanmugananthan
    contact_groups          admins
}

  • contacts → individual notifications
  • contact_groups → group notifications

Step 5: Verify and Restart Nagios

  1. Check the Nagios configuration for errors:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

  2. Restart Nagios to apply changes:

sudo systemctl restart nagios

If the verification shows no errors, your email notifications are now active.


✅ Tips:

  • Use a generic-contact template to avoid repeating notification options.
  • Test notifications by manually stopping a service or host to ensure emails are sent correctly.
  • Consider integrating Slack, Telegram, or SMS notifications for faster alerts.
