DEV Community

Sajja Sudhakararao


Linux Monitoring & Alerting: Command-Line Mastery for DevOps

The Monitoring Gap Every DevOps Engineer Faces

Full monitoring stacks like Prometheus + Grafana are great, but they take time to set up. What about the servers you inherit? The staging environments? The emergency VM you spin up during an outage?

Command-line monitoring is your immediate, universal answer. The core tools (top, ps, free, df, ss) ship with virtually every Linux distribution, the rest are a package install away, and none of them need an agent. Better yet, they're fast enough to script into alerting workflows.

This post covers the essential Linux monitoring commands plus patterns for turning raw metrics into actionable alerts, a natural follow-up to our Bash scripting guide.

1. Real-Time Resource Dashboards

The top/htop Foundation
top gives you an instant system snapshot:

top - 11:26:45 up 5 days,  3:12,  2 users,  load average: 1.23, 1.45, 1.67
Tasks: 234 total,   2 running, 232 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.3 us,  8.7 sy,  0.0 ni, 78.9 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  7900.2 total,  1234.5 free,  4567.8 used,  2097.9 buff/cache
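The load average in top's header is easiest to judge relative to the number of CPUs. Below is a minimal sketch, assuming /proc/loadavg (whose first field is the 1-minute average) and nproc are available; the function names load_per_cpu and load_is_high and the default 1.0 threshold are illustrative choices, not standard tools.

```bash
# Hypothetical helpers (names are illustrative, not standard tools).
# load_per_cpu: divide the 1-minute load average (first field of
# /proc/loadavg-formatted text on stdin) by the CPU count passed as $1.
load_per_cpu() {
  local cpus=$1
  awk -v cpus="$cpus" '{ printf "%.2f\n", $1 / cpus }'
}

# load_is_high: succeed (exit 0) when the ratio in $1 exceeds the
# threshold in $2 (default 1.0); otherwise exit 1.
load_is_high() {
  local ratio=$1 threshold=${2:-1.0}
  awk -v r="$ratio" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }'
}
```

For example: load_is_high "$(load_per_cpu "$(nproc)" < /proc/loadavg)" && echo "load above 1.0 per CPU". On Linux the load average also counts tasks blocked on disk I/O (uninterruptible sleep), so a high ratio with low CPU usage usually points at storage rather than compute.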

Pro move: htop (install with apt install htop)

  • Mouse/keyboard navigation

  • Color-coded resource bars

  • Tree view of processes (F5)

Quick filters:

htop -p $(pgrep -d, nginx)  # Monitor nginx processes only

Memory Deep Dive: free -h

free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       4.2Gi       1.2Gi       128Mi       2.3Gi       3.1Gi 
Swap:          2.0Gi          0B       2.0Gi

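What matters: Focus on the available column, not free. Linux aggressively uses otherwise idle RAM to cache disk data (buff/cache), and the kernel reclaims most of that cache when applications need memory, so available is the realistic estimate of how much memory new work can get.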
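The available column comes from the kernel's MemAvailable estimate in /proc/meminfo (kernel 3.14 and later), so a script can read it directly instead of parsing free. A minimal sketch; the function name mem_available_pct is hypothetical.

```bash
# Hypothetical helper: print the percentage of RAM still available (rounded down),
# from /proc/meminfo-formatted text on stdin. MemTotal and MemAvailable are in kB.
# Exits 1 if MemTotal is missing.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total; else exit 1 }'
}
```

Usage: mem_available_pct < /proc/meminfo, then alert when the result drops below whatever floor suits the host (for example 10).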

2. CPU Analysis: Who's Eating Cycles?

Per-Process Breakdown

ps aux --sort=-%cpu | head -10
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mysql     1234 45.2 12.3 2.1g  980m ?        S    10:00   3:45 /usr/sbin/mysqld
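Keep in mind that ps computes %CPU as total CPU time divided by the process's lifetime, so it shows long-run averages rather than current load; top or pidstat -u 1 5 (also part of sysstat) give a live view. With that caveat, here is a minimal sketch of a threshold filter over ps output; the function name cpu_hogs and the pid,pcpu,comm column choice are illustrative assumptions.

```bash
# Hypothetical helper: report processes whose ps %CPU exceeds the threshold in $1.
# Expects "pid pcpu comm" rows on stdin, e.g. from: ps -eo pid,pcpu,comm --no-headers
# (comm values containing spaces are cut at the first space).
cpu_hogs() {
  local threshold=$1
  awk -v t="$threshold" '$2 + 0 > t + 0 { printf "PID %s (%s) at %s%% CPU\n", $1, $3, $2 }'
}
```

Usage: ps -eo pid,pcpu,comm --no-headers | cpu_hogs 50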

Historical CPU Trends: sar

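# Install: apt install sysstat (on Debian/Ubuntu, history collection may need ENABLED="true" in /etc/default/sysstat)
sar -u 1 5     # CPU every 1 sec, 5 samples
sar -u -f /var/log/sysstat/sa08  # Saved data for day 08 of the month (RHEL: /var/log/sa/sa08)

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     12.34      0.00      8.76      1.23      0.00     77.67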

Alert pattern:

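#!/bin/bash
# Alert when average CPU idle over 3 one-second samples drops below 70%.
# LC_ALL=C keeps the "Average:" label and decimal point predictable; $8 is %idle.
if LC_ALL=C sar -u 1 3 | awk '/^Average:/ { found = 1; busy = ($8 < 70) }
                              END { exit !(found && busy) }'; then
  echo "CPU idle <70% over 3s - investigate!"
fi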
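A stricter variation on the alert pattern: instead of the average, require every individual sample to breach the threshold, so a single burst inside the window cannot fire the alert on its own. A minimal sketch; the function name all_samples_busy is hypothetical, and it relies on %idle being the last column of plain sar -u output.

```bash
# Hypothetical helper: succeed only if EVERY per-sample row of `sar -u` output on
# stdin has %idle (last column) below the threshold in $1. Rows are identified by
# the CPU column "all"; the "Average:" summary is skipped. Exits 1 if no samples.
all_samples_busy() {
  local threshold=$1
  awk -v t="$threshold" '
    $1 == "Average:" { next }
    $0 ~ /(^|[ \t])all([ \t]|$)/ { n++; if ($NF + 0 >= t + 0) calm = 1 }
    END { exit (n == 0 || calm) }'
}
```

Usage: LC_ALL=C sar -u 1 5 | all_samples_busy 30 && echo "CPU idle <30% in all 5 samples"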

3. Disk I/O: The Silent Killer

Current I/O: iostat

iostat -x 1 5
Device            r/s     w/s     rkB/s    wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              23.4     1.2   234.5    12.3     0.0     10.2   0.00  89.12    0.1    2.3   0.45    10.0     6.2  1.23  45.2

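Red flags: %util sustained above 80%, or r_await/w_await above roughly 20 ms. For SSD, NVMe, and RAID devices, which serve requests in parallel, %util near 100% does not by itself mean saturation, so read it alongside the await columns.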
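To script the %util check, it helps to locate the column by its header name, since column order differs between sysstat versions. A minimal sketch; the function name busy_devices is hypothetical, and the suggested iostat -dxy 1 1 invocation (device report, extended stats, skip the since-boot report) is one reasonable way to feed it.

```bash
# Hypothetical helper: flag devices whose %util exceeds the threshold in $1,
# reading `iostat -x` output on stdin. The %util column is found from the
# "Device" header line rather than hard-coded.
busy_devices() {
  local threshold=$1
  awk -v t="$threshold" '
    /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && $col + 0 > t + 0 { printf "ALERT: %s %%util=%s\n", $1, $col }'
}
```

Usage: LC_ALL=C iostat -dxy 1 1 | busy_devices 80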

Disk Space Alerts: df

df -h --output=source,fstype,size,used,avail,pcent,target | grep -v tmpfs

Scriptable alert:

df -h | grep -E '(^|[^0-9])(8[0-9]|9[0-9]|100)%' || echo "Disk healthy"  # Show filesystems at 80-100% use

4. Network Troubleshooting Masters

Active Connections: ss

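# ss is the iproute2 replacement for netstat
ss -tuln          # Listening TCP/UDP
sudo ss -tnap 'sport = :80'   # Processes on local port 80 (exact match, unlike grep :80 which also hits :8080)
ss -Ht state established '( sport = :443 or dport = :443 )' | wc -l  # Established HTTPS connections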
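A per-state count of TCP sockets is a quick way to spot a pile-up of TIME-WAIT or CLOSE-WAIT connections. A minimal sketch; the function name tcp_state_counts is hypothetical, and it assumes ss -tan output, whose first column is the state.

```bash
# Hypothetical helper: count TCP sockets per state from `ss -tan` output on stdin
# (header line skipped), most frequent state first.
tcp_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -rn
}
```

Usage: ss -tan | tcp_state_counts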

Drop Counters: netstat -s or nstat

netstat -s | grep -E "errors|dropped|retrans"
Ip:
    1234 total packets received
    56 dropped because of memory problems
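The retransmission figures come from kernel counters that can also be read from /proc/net/snmp, where each protocol has a header line of counter names followed by a line of values. A minimal sketch computing retransmitted segments as a percentage of sent segments; the function name retrans_pct is hypothetical. These counters are cumulative since boot, so for a recent rate you would sample twice and subtract (nstat from iproute2 reports increments since its last run).

```bash
# Hypothetical helper: print RetransSegs / OutSegs * 100 from the "Tcp:" lines of
# /proc/net/snmp-formatted text on stdin. Exits 1 if no usable Tcp counters exist.
retrans_pct() {
  awk '$1 == "Tcp:" {
         if (!hdr) { for (i = 2; i <= NF; i++) name[i] = $i; hdr = 1; next }
         for (i = 2; i <= NF; i++) val[name[i]] = $i
       }
       END {
         if (val["OutSegs"] > 0) printf "%.2f\n", val["RetransSegs"] * 100 / val["OutSegs"]
         else exit 1
       }'
}
```

Usage: retrans_pct < /proc/net/snmp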

Live Packet Capture: tcpdump

# Capture 100 packets on interface eth0, port 80 (your NIC may be named ens3, enp0s3, ...; check ip link)
sudo tcpdump -i eth0 -c 100 -w capture.pcap port 80

# Read the capture back without resolving hosts or ports
tcpdump -nn -r capture.pcap

5. Log Monitoring: Beyond tail -f

Service Logs: journalctl

journalctl -u nginx -f           # Follow nginx logs
journalctl -u nginx --since "1h ago"  # Last hour
journalctl -p err -u nginx      # Only errors
journalctl --no-pager | grep -i panic  # System panics

Pattern Mining: grep + awk

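# Count 5xx responses per minute (default nginx "combined" log format:
# $4 = "[10/Oct/2000:13:55:36", $9 = status code)
awk '$9 ~ /^5[0-9][0-9]$/ { print substr($4, 2, 17) }' /var/log/nginx/access.log | sort | uniq -c

# Slow requests (>2s); assumes $request_time is the last field of your log_format
awk '$NF > 2 {print}' /var/log/nginx/access.log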
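Raw 5xx counts are hard to alert on because they scale with traffic; an error rate is more stable. A minimal sketch, assuming the default nginx "combined" format where the status code is field 9; the function name error_rate_5xx is hypothetical.

```bash
# Hypothetical helper: print the percentage of responses with a 5xx status among
# "combined"-format access log lines on stdin. Lines without a 3-digit status in
# field 9 are ignored; exits 1 if there are no valid lines.
error_rate_5xx() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 ~ /^5/) errors++ }
       END { if (total == 0) exit 1; printf "%.1f\n", errors * 100 / total }'
}
```

Usage: tail -n 10000 /var/log/nginx/access.log | error_rate_5xx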

6. Production Alerting Patterns

CPU/Memory Watchdog

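#!/bin/bash
set -euo pipefail

# Slack incoming webhooks expect a JSON payload with a "text" field
alert() {
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"CPU ${CPU}%, MEM ${MEM}%\"}" "${SLACK_WEBHOOK:?SLACK_WEBHOOK is not set}"
}

# CPU busy = 100 - idle ($15); vmstat's 2nd line is a fresh 1s sample (the 1st averages since boot)
CPU=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')
MEM=$(free | awk '/Mem:/ {printf "%.0f", $3/$2 * 100}')

# An if avoids `[[ ... ]] && alert` leaving exit status 1 when nothing is wrong
if (( CPU > 80 || MEM > 80 )); then alert; fi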

Disk Space Guardian

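#!/bin/bash
# Local filesystems only, skipping RAM-backed tmpfs/devtmpfs; alert above 85%
df --local -x tmpfs -x devtmpfs --output=pcent,target | tail -n +2 |
while read -r pcent target; do
  usage=${pcent%\%}
  [[ $usage =~ ^[0-9]+$ ]] || continue   # skip "-" (no size information)
  if (( usage > 85 )); then echo "ALERT: $target at ${usage}%"; fi
done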

Cron schedule:

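# Every 5 minutes. Cron starts jobs with a minimal environment, so set
# SLACK_WEBHOOK (and PATH, if needed) in the crontab or inside the script.
*/5 * * * * /usr/local/bin/check_resources.sh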
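With a 5-minute schedule, an unresolved problem would re-alert on every run. One common remedy is a cooldown stamp file per alert type. A minimal sketch; the function name should_alert, the STATE_DIR default, and the "<key>.last" file naming are all illustrative assumptions.

```bash
# Hypothetical helper: succeed (allow an alert) only if no alert for key $1 was
# sent within the last $2 seconds (default 3600), then record the send time.
STATE_DIR=${STATE_DIR:-/var/tmp/check_resources}

should_alert() {
  local key=$1 cooldown=${2:-3600} now last stamp
  stamp="$STATE_DIR/$key.last"
  now=$(date +%s)
  mkdir -p "$STATE_DIR"
  last=$(cat "$stamp" 2>/dev/null) || last=0
  [[ $last =~ ^[0-9]+$ ]] || last=0      # missing or corrupt stamp counts as "never"
  if (( now - last < cooldown )); then
    return 1                             # alerted recently: stay quiet
  fi
  echo "$now" > "$stamp"
  return 0
}
```

In the watchdog, the last line could become if (( CPU > 80 || MEM > 80 )) && should_alert cpu-mem 3600; then alert; fi, so a persistent problem alerts at most once an hour.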

7. One-Line Dashboards

Combine tools into instant observability:

# System overview (alias this to 'sys')
watch -n 2 'printf "\nCPU: "; sar -u 1 1 | tail -1; printf "MEM: "; free -h | grep Mem:; printf "DISK:\n"; df -h / /var | tail -2'
# Top resource hogs
watch -n 2 'ps aux --sort=-%cpu | head -8; echo "---"; ps aux --sort=-%mem | head -8'

Quick Reference Table

| Scenario    | Command                    | Pro Tip                                              |
| ----------- | -------------------------- | ---------------------------------------------------- |
| CPU trends  | sar -u 1 5                 | History in /var/log/sysstat/ (RHEL: /var/log/sa/)    |
| Memory      | free -h                    | Watch available, ignore free                         |
| Disk I/O    | iostat -x 1                | Sustained %util >80% = trouble (less so on SSD/NVMe) |
| Connections | ss -tuln                   | Modern netstat replacement                           |
| Logs        | journalctl -u nginx -f     | systemd's tail -f                                    |
| Processes   | htop -p $(pgrep -d, nginx) | Filter to specific app                               |
