Shanmugananthan A K

Posted on Sep 25

🚀 Getting Started with Nagios Core: Monitoring Made Simple

#nagios #monitoring #devops

When it comes to IT infrastructure monitoring, Nagios Core is one of the most widely adopted open-source tools. It gives sysadmins the power to keep an eye on servers, networks, applications, and even devices like printers — all in one place.

In this post, I’ll walk you through how Nagios works, its components, and practical examples of monitoring a web service and a remote Linux server.

🔎 What is Nagios?

Nagios is an open-source monitoring engine that continuously tracks your IT environment. It detects issues (like high CPU usage, disk running out of space, or services going down) and alerts admins immediately, helping reduce downtime.

Nagios Core can scale to monitor thousands of hosts and services, making it a great fit for small setups as well as large enterprise infrastructures.

⚙️ Nagios Architecture

A Nagios setup usually has:

Nagios Server (Central host) → Runs Nagios Core, holds all configs, and processes results.
Plugins → Small executables/scripts that check specific resources (disk, CPU, HTTP, FTP, etc.).
NRPE (Nagios Remote Plugin Executor) → Allows Nagios to run plugins on remote Linux/Unix hosts.
NSClient++ → Windows equivalent for running checks remotely.

🔄 How Nagios Works

1. Nagios schedules checks using its monitoring engine.
2. It executes plugins locally or remotely (via NRPE/NSClient++).
3. Plugins report a state through their exit code:
   - OK (exit code 0: all good)
   - WARNING (exit code 1: warning threshold breached)
   - CRITICAL (exit code 2: service down or major problem)
   - UNKNOWN (exit code 3: the plugin could not determine the state)
4. Nagios updates its status data and triggers alerts based on configuration.
5. Alerts can be sent via email, SMS, Slack, Telegram, or custom scripts.
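
The exit-code contract between Nagios and its plugins can be sketched in a few lines of shell. This is only an illustration for testing plugins by hand: the helper names `state_name` and `run_check` are hypothetical and not part of Nagios, and labeling out-of-range exit codes as UNKNOWN is an assumption of this sketch.

```shell
# Map a plugin exit code to a state name.
# 0-3 follow the plugin convention; any other value is labeled UNKNOWN here
# (an assumption of this sketch, not necessarily what Nagios itself does).
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any plugin command line and summarize the result.
# Hypothetical helper for manual testing only.
run_check() {
    rc_output=$("$@" 2>&1)
    rc_status=$?
    # Keep only the first line of output as the short status text.
    rc_first_line=$(printf '%s\n' "$rc_output" | head -n 1)
    echo "$(state_name "$rc_status"): $rc_first_line"
}
```

For example, `run_check /usr/local/nagios/libexec/check_ping -H 127.0.0.1 -w 100,20% -c 500,60%` would print the state followed by the plugin's first output line, assuming the plugins live under that path.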

📬 Active vs Passive Checks

Active Check → Nagios runs the check itself (e.g., check_http connects to port 80 of a web server).
Passive Check → External apps or agents send results back to Nagios (e.g., SNMP traps).
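
To make passive checks more concrete, here is a small sketch of how an external script could hand a result to Nagios through the external command file. The `PROCESS_SERVICE_CHECK_RESULT` line format is the standard external command; the helper name, the command file path, and the "Nightly Backup" service are assumptions for illustration, and the service must be configured to accept passive checks.

```shell
# Build an external-command line that submits a passive service check result.
# Arguments: timestamp host service return_code plugin_output
passive_result_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Assumed location of the external command file (a named pipe); it varies by install.
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Example submission (commented out so the sketch has no side effects):
# passive_result_line "$(date +%s)" web-server-1 "Nightly Backup" 0 "Backup finished" > "$NAGIOS_CMD_FILE"
```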

📦 Nagios Directory Structure

When you install Nagios Core, you’ll see something like this:

nagios/
├── bin       # Nagios main daemon
├── etc       # All configuration files
├── libexec   # Plugins (check_http, check_ping, etc.)
├── sbin      # CGI executables for web interface
├── share     # Web interface files (HTML/PHP)
└── var       # Runtime data (logs, cache, status.dat)

🖥️ Real-Time Example: Monitoring an HTTP Service

Let’s say you have a server at 44.233.51.131 and you want to monitor if its web service is up.

Here’s how a Nagios service definition might look:

define service {
    use                   local-service
    host_name             web-server-1
    service_description   HTTP
    check_command         check_http!-H 44.233.51.131 -p 80
    check_interval        1
    retry_interval        1
    notifications_enabled 1
    contact_groups        admins
}

✅ What this does:

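Every minute (check_interval 1, with the default interval_length of 60 seconds), Nagios checks port 80 on 44.233.51.131.
If the check fails, Nagios retries every minute (retry_interval 1). Once the failure has been seen max_check_attempts times in a row (a value inherited from the template), the service enters a hard CRITICAL state and Nagios alerts the admins group.
Once it’s back online (RECOVERY), Nagios sends another notification.

This way your team hears about downtime within a few minutes, minimizing business impact.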
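
The delay between the first failed check and the first alert follows from the service settings, since Nagios normally notifies only after a problem has been confirmed by repeated checks. As a rough sketch (the function name is hypothetical, and it ignores check execution time and scheduling jitter), the arithmetic looks like this:

```shell
# Seconds between the first failed check and the hard state that triggers a notification.
# Arguments: max_check_attempts retry_interval interval_length
# (interval_length is the number of seconds per interval unit, 60 by default)
seconds_until_hard_state() {
    echo $(( ($1 - 1) * $2 * $3 ))
}
```

With retry_interval 1 and, say, max_check_attempts 4 (a value assumed here for illustration), `seconds_until_hard_state 4 1 60` gives 180, so the alert goes out roughly three minutes after the first failure.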

🔧 Extend with Custom Plugins

Nagios comes with many plugins (check_disk, check_ping, check_http, etc.), but you can also write your own in Python, PHP, or Shell scripts. For example, you could monitor:

Application log files for errors
Database query response times
API health endpoints
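
As an illustration of the first idea, here is a minimal sketch of a custom check that counts ERROR lines in a log file. The function name, the "ERROR" pattern, and the thresholds are assumptions made for this example; a production plugin would also need to remember how far it has already read so old errors do not alert forever.

```shell
# check_log_errors FILE WARN CRIT
# Counts lines containing "ERROR" and prints a Nagios-style status line.
# Returns 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).
check_log_errors() {
    cle_file=$1
    cle_warn=$2
    cle_crit=$3
    if [ ! -r "$cle_file" ]; then
        echo "LOG UNKNOWN - cannot read $cle_file"
        return 3
    fi
    # grep -c prints the number of matching lines (0 when none match).
    cle_count=$(grep -c 'ERROR' "$cle_file")
    if [ "$cle_count" -ge "$cle_crit" ]; then
        echo "LOG CRITICAL - $cle_count error lines | errors=$cle_count"
        return 2
    elif [ "$cle_count" -ge "$cle_warn" ]; then
        echo "LOG WARNING - $cle_count error lines | errors=$cle_count"
        return 1
    fi
    echo "LOG OK - $cle_count error lines | errors=$cle_count"
    return 0
}
```

To use it as a plugin, you could save it as a script in libexec that ends with `check_log_errors "$@"; exit $?`, so the function's return value becomes the plugin's exit code. The text after the `|` is performance data that Nagios can pass on to graphing tools.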

🎯 Why Nagios?

Scalable → Can monitor thousands of hosts and services from a single server, depending on hardware and check intervals
Flexible → Active & passive checks, NRPE, SNMP, and custom plugins
Reliable → Fast alerting system with escalation policies (e.g., L1 alert to on-call admin, L2 alert to ops team)
Extensible → Works with other tools like Grafana, Prometheus, or ELK for visualization

📂 Key Configuration Files in `/etc/nagios`

Nagios stores most of its configuration in the etc directory. Some of the most important files include:

htpasswd.users (named .htpasswd on some setups) → Secures the web interface by storing GUI login passwords.
cgi.cfg → Manages the web interface settings, including login permissions and who can access what.
resource.cfg → Stores sensitive information such as credentials and common paths, keeping them out of command definitions.
objects/ → Contains definitions for what Nagios monitors, including hosts, services, contacts, and commands.

🔹 Important Nagios Config Files

| File | Purpose |
| --- | --- |
| nagios.cfg | Main configuration file. Defines global settings, intervals for host/service checks, and points to other object configuration files. |
| resource.cfg | Secure storage for sensitive data like usernames, passwords, and directory paths. |
| contacts.cfg | Defines who gets notifications when hosts or services go down. Includes contacts (users) and contact groups. Typically contains a default contact named `nagiosadmin`. |
| commands.cfg | Lists the commands Nagios uses for service checks, notifications, and event handlers. You can also add custom commands here. |
| templates.cfg | Contains object templates (blueprints) for hosts, services, and contacts. Real objects can inherit defaults from these templates. |
| timeperiods.cfg | Defines reusable schedules for monitoring and notifications (e.g., `24x7`, `workhours`). Can be referenced by hosts, services, and contacts. |

⚙️ How Nagios Works

Nagios monitors two main object types:

Hosts → Any entity on your network: servers, printers, switches, etc.
Services → Checks performed on hosts, like CPU load, disk space, web server status, or network connectivity.

Nagios uses commands to perform checks, send notifications, and handle events. You can also write custom plugins in Python, PHP, or Shell scripts and add them to libexec, then reference them in commands.cfg.

🔹 Common Nagios Commands

Here are some of the most widely used Nagios plugins:

| Command | Purpose |
| --- | --- |
| check_http | Checks Apache/Nginx/web server status |
| check_ping | Monitors network connectivity (latency, packet loss) |
| check_ssh | Checks if SSH service is running |
| check_disk | Checks disk usage on partitions |
| check_load | Monitors CPU load average |
| check_users | Counts logged-in users |
| check_procs | Monitors number of processes or specific process status |
| check_swap | Checks swap memory usage |
| check_snmp | Monitors SNMP-enabled devices like printers, switches, routers |
| check_tcp | Checks TCP port availability (e.g., DB, mail, custom services) |
| check_dns | Checks DNS resolution status |
| check_smtp | Monitors mail server (SMTP) status |
| check_mysql | Checks MySQL/MariaDB availability (if installed) |

🔹 Example: Monitoring a Web Server

Here’s a simple service definition to monitor an HTTP server at IP 44.233.51.131:

define service {
    use                   local-service
    host_name             web-server-1
    service_description   HTTP
    check_command         check_http!-H 44.233.51.131 -p 80
    check_interval        1
    retry_interval        1
    notifications_enabled 1
    contact_groups        admins
}

Every 1 minute, Nagios checks port 80 on the server.
Alerts are sent to the admins group if the service goes CRITICAL.
Recovery notifications are sent once the server is back online.

🔹 Step-by-Step Guide: Monitoring a Remote Host with NRPE in Nagios

Nagios Core can monitor remote Linux hosts using NRPE (Nagios Remote Plugin Executor). This allows your Nagios master server to execute checks on a remote slave host. Let’s go through the process step by step.

Step 1: Configure NRPE on the Slave Host

The NRPE daemon runs on the remote host you want to monitor. Its configuration file location depends on your installation:

Common paths: /etc/nagios/nrpe.cfg or /usr/local/nagios/etc/nrpe.cfg

Open the file with your editor (I’ll use nano here):

sudo nano /usr/local/nagios/etc/nrpe.cfg

Locate the allowed_hosts directive and add your Nagios master IP:

allowed_hosts=127.0.0.1,192.168.1.10

Replace 192.168.1.10 with your Nagios master server IP.

Save the file and restart the NRPE service to apply changes:

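sudo systemctl restart nrpe
# or, on Debian/Ubuntu package installs, where the service is named differently:
sudo systemctl restart nagios-nrpe-server
# or, on systems without systemd:
sudo service nrpe restart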
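
If you manage several hosts, the allowed_hosts edit can be scripted. The following is a minimal sketch, assuming the file already contains an allowed_hosts= line; the function name `add_allowed_host` is hypothetical, and you would normally run it with sudo against your real nrpe.cfg after taking a backup.

```shell
# add_allowed_host CONFIG_FILE IP
# Appends IP to the allowed_hosts line unless it is already listed.
add_allowed_host() {
    aah_file=$1
    aah_ip=$2
    # Escape dots so the IP is matched literally by grep.
    aah_pattern=$(printf '%s' "$aah_ip" | sed 's/\./\\./g')
    # Already listed? Leave the file untouched.
    if grep -E "^allowed_hosts=(.*,)?${aah_pattern}(,|\$)" "$aah_file" >/dev/null; then
        return 0
    fi
    # Rewrite through a temp file instead of sed -i, whose syntax differs between GNU and BSD.
    aah_tmp=$(mktemp) || return 1
    sed "s/^allowed_hosts=.*/&,${aah_ip}/" "$aah_file" > "$aah_tmp" && cat "$aah_tmp" > "$aah_file"
    aah_status=$?
    rm -f "$aah_tmp"
    return "$aah_status"
}
```

Running it twice with the same IP leaves the file unchanged the second time, so it is safe to include in a provisioning script. Remember to restart NRPE afterwards.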

Step 2: Verify Connection from the Master

On your Nagios master server, test if it can communicate with the slave host:

/usr/local/nagios/libexec/check_nrpe -H <Slave_IP_Address>

Replace <Slave_IP_Address> with the actual IP of your remote host.
If configured correctly, NRPE should respond with its version number or a test message.

Step 3: Define the Slave Host in Nagios

Create a configuration file for the slave host, e.g., /usr/local/nagios/etc/objects/slave.cfg:

define host{
    use                     linux-server
    host_name               Slave-Server-01
    alias                   My Remote Linux Host
    address                 <Slave_IP_Address>
    max_check_attempts      5
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    contact_groups          admins
    register                1
}

linux-server is a host template defined in templates.cfg
register 1 → This is a real host definition
Replace <Slave_IP_Address> with the remote host IP

Step 4: Define Services (The Actual Checks)

You can define services in the same file (slave.cfg) or a separate services.cfg. Here are some common checks using check_nrpe:

# Check CPU Load
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     CPU Load
    check_command           check_nrpe!check_load
}

# Check Current Users
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     Current Users
    check_command           check_nrpe!check_users
}

# Check Root Disk Space
define service{
    use                     generic-service
    host_name               Slave-Server-01
    service_description     Root Disk Space
    check_command           check_nrpe!check_disk
}

🔹 How it works

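check_command = check_nrpe!check_load
- check_nrpe → Nagios command that must be defined on the master (for example in commands.cfg). It is usually not part of the sample configuration; a typical definition runs $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
- !check_load → Argument passed to NRPE, which must match a command[check_load]=... entry in the slave’s nrpe.cfg. The same applies to check_users and check_disk: add any entry that is missing and restart NRPE.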
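
Because every name after `check_nrpe!` has to exist in the remote nrpe.cfg, it helps to list what is actually defined there before adding services. Here is a small sketch for that, run on the remote host; the function names are hypothetical, and it only looks at `command[...]=` lines in the single file you pass in (not in any included directories).

```shell
# list_nrpe_commands NRPE_CFG
# Prints the command names defined as command[<name>]=... lines, skipping commented ones.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}

# nrpe_command_defined NRPE_CFG NAME
# Exit status 0 if NAME is defined in the file, non-zero otherwise.
nrpe_command_defined() {
    list_nrpe_commands "$1" | grep -Fx "$2" >/dev/null
}
```

For example, `nrpe_command_defined /usr/local/nagios/etc/nrpe.cfg check_disk || echo "check_disk is missing"` tells you whether the Root Disk Space service above has a matching remote command.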

Step 5: Include the New Host Config and Restart Nagios

Ensure your main Nagios configuration (nagios.cfg) includes your new host configuration:

cfg_file=/usr/local/nagios/etc/objects/slave.cfg

Finally, restart Nagios to apply the changes:

sudo systemctl restart nagios
# or
sudo service nagios restart

✅ That's it!
Your Nagios master can now monitor the remote slave host via NRPE. You can repeat this process for additional hosts and customize the services to monitor CPU, memory, disk, users, or any custom script.

🔹 How to Configure Email Notifications in Nagios Core

Nagios Core can alert administrators via email whenever a host or service goes down, recovers, or reaches a threshold. Setting up email notifications requires configuring contacts, contact groups, and ensuring the Nagios server can send emails. Here’s a step-by-step guide.

Step 1: Ensure the Nagios Server Can Send Mail

Nagios itself does not send emails directly. It relies on the server's mail tooling: a command-line mail client such as mail (mailx) and a Mail Transfer Agent (MTA) such as Postfix, Sendmail, or Exim.

Install and configure your preferred MTA. I prefer using Postfix.
Test email sending from the command line:

echo "Test email from Nagios" | mail -s "Nagios Test" user@example.com

If the email arrives successfully, your Nagios server can send notifications.

Step 2: Define the Contact in `contacts.cfg`

Open the contacts configuration file (usually /usr/local/nagios/etc/objects/contacts.cfg) and define a new contact:

define contact {
    contact_name                shanmugananthan
    use                         generic-contact
    alias                       Shan
    email                       shan@example.com
    service_notification_period 24x7
    host_notification_period    24x7
    service_notification_options w,u,c,r,f,s
    host_notification_options   d,u,r,f,s
    service_notification_commands notify-service-by-email
    host_notification_commands  notify-host-by-email
}

Explanation:

service_notification_options w,u,c,r,f,s → Notifications for Warning, Unknown, Critical, Recovery, Flapping, Scheduled Downtime
host_notification_options d,u,r,f,s → Notifications for Down, Unreachable, Recovery, Flapping, Scheduled Downtime
notify-service-by-email / notify-host-by-email → Commands that send email notifications
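
Under the hood, a notification command is just a command line that Nagios runs with the alert details filled in. The sketch below shows roughly what an email notification does, loosely modeled on the sample notify-service-by-email command. The function names and argument order are hypothetical, and the actual send is left commented out because it assumes a working mail client and MTA.

```shell
# service_alert_subject TYPE HOST SERVICE STATE
# Builds the email subject line for a service alert.
service_alert_subject() {
    printf '** %s Service Alert: %s/%s is %s **\n' "$1" "$2" "$3" "$4"
}

# service_alert_body TYPE HOST SERVICE STATE OUTPUT
# Builds a plain-text message body from the alert details.
service_alert_body() {
    printf 'Notification Type: %s\nHost: %s\nService: %s\nState: %s\n\nAdditional Info:\n%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Sending (commented out; requires a configured mail client and MTA):
# service_alert_body PROBLEM web-server-01 HTTP CRITICAL "Connection refused" \
#   | mail -s "$(service_alert_subject PROBLEM web-server-01 HTTP CRITICAL)" shan@example.com
```

In a real command definition, Nagios macros such as $NOTIFICATIONTYPE$, $HOSTALIAS$, $SERVICEDESC$, and $SERVICESTATE$ would supply these values.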

Step 3: Define or Update the Contact Group

If you use contact groups, make sure your new contact is included:

define contactgroup{
    contactgroup_name admins
    alias             Nagios Administrators
    members           nagiosadmin,shanmugananthan
}

You can assign multiple contacts to a group for centralized notifications.

Step 4: Apply the Contact / Contact Group to Hosts or Services

When defining hosts or services, reference the contact or contact group so they receive alerts.

Example Host Definition:

define host {
    use                     linux-server
    host_name               web-server-01
    alias                   Web Server
    address                 192.168.1.10
    contact_groups          admins
}

Example Service Definition:

define service {
    use                     generic-service
    host_name               web-server-01
    service_description     HTTP
    check_command           check_http
    contacts                shanmugananthan
    contact_groups          admins
}

contacts → individual notifications
contact_groups → group notifications

Step 5: Verify and Restart Nagios

Check the Nagios configuration for errors:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Restart Nagios to apply changes:

sudo systemctl restart nagios

If the verification reported no errors and Nagios restarted cleanly, your email notifications are now active.
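
Since a restart with a broken configuration can leave Nagios stopped, it is worth chaining the two commands so the restart only happens when verification passes. A minimal sketch, with a hypothetical helper name:

```shell
# verify_then_restart VERIFY_CMD RESTART_CMD
# Runs the verification command first and restarts only if it succeeds.
# Both arguments are plain command strings, split on whitespace when run.
verify_then_restart() {
    if $1 >/dev/null 2>&1; then
        $2
    else
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}

# Example (commented out; paths and service name as used in this post):
# verify_then_restart "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#                     "sudo systemctl restart nagios"
```

If the check fails, run the verification command by hand to see the full error output, fix the reported file and line, and try again.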

✅ Tips:

Use a generic-contact template to avoid repeating notification options.
Test notifications by manually stopping a service or host to ensure emails are sent correctly.
Consider integrating Slack, Telegram, or SMS notifications for faster alerts.
