Your Server Is at 97% CPU Right Now. Would You Know?

#bash #linux #sysadmin #monitoring

Here's how it usually goes:

You deploy something. Traffic is light. Server load sits at 15% and you move on to the next thing. Then traffic grows, or a cron job stacks on itself, or a memory leak slowly eats through your RAM over 72 hours. By the time you notice, the server is thrashing, responses take 8 seconds, and your app is effectively dead.

The frustrating part is that the tools to catch this have been on your server the entire time. top and free come preinstalled on practically every Linux distribution. Nobody installs them. They're just... there. Waiting for someone to actually ask.

So I wrote a script that asks every hour and logs a warning when the answer is bad.

The Script

#!/bin/bash

CHECK="✓"
CROSS="✗"

# --- Configuration ---
THRESHOLD=80                    # Alert when usage exceeds this %
LOG_FILE="/var/log/resource-monitor.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

# --- CPU Usage (user-space time only: the "us" field of top's Cpu(s) line) ---
CPU=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1 | cut -d',' -f1 | xargs printf "%.0f")

# --- RAM Usage ---
RAM=$(free | awk '/Mem:/ {printf "%.0f", $3/$2*100}')

echo "[$DATE] CPU: ${CPU}% | RAM: ${RAM}%"

# --- CPU Alert ---
if [ "$CPU" -gt "$THRESHOLD" ]; then
  echo "$CROSS [$DATE] WARNING: CPU at ${CPU}% (threshold: ${THRESHOLD}%)" | tee -a "$LOG_FILE"
else
  echo "$CHECK CPU OK: ${CPU}%"
fi

# --- RAM Alert ---
if [ "$RAM" -gt "$THRESHOLD" ]; then
  echo "$CROSS [$DATE] WARNING: RAM at ${RAM}% (threshold: ${THRESHOLD}%)" | tee -a "$LOG_FILE"
else
  echo "$CHECK RAM OK: ${RAM}%"
fi

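Runs in under a second. No dependencies beyond top and free, which Ubuntu, Debian, CentOS, RHEL, and Arch all include by default. Stripped-down container images sometimes omit them; installing the procps (or procps-ng) package brings them back.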

How the CPU Check Actually Works

The CPU line looks intimidating, so let me walk through it:

top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1 | cut -d',' -f1 | xargs printf "%.0f"

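top -bn1: runs top in batch mode (-b) for exactly one iteration (-n1). Batch mode dumps the full output to stdout instead of opening the interactive TUI, which is what makes top usable in a script.

grep "Cpu(s)": grabs the line that shows aggregate CPU stats.

awk '{print $2}': pulls the user CPU percentage (the second field). This is user-space time only. System, I/O wait, and steal time are separate fields on the same line and are not counted, so the number can sit below what you'd call "total CPU usage".

cut and xargs printf: the first cut strips a percent sign (older versions of top print 2.5%us), and the second strips a trailing comma or, in locales that use a decimal comma, the fractional part. printf "%.0f" then rounds to an integer, because bash's -gt can't compare 2.5; it needs a whole number like 2 or 3.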
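
Because field two is user time only, a box pinned by kernel work or I/O wait can stay under the threshold. Here is a minimal sketch of one alternative, assuming your top prints an id (idle) field on the Cpu(s) line: subtract idle from 100 to get everything that isn't idle. The function name cpu_busy_percent is hypothetical and not part of the script above.

```bash
# Hypothetical helper: reads top's "Cpu(s)" summary line on stdin and prints
# total non-idle CPU as an integer (100 minus the idle percentage).
cpu_busy_percent() {
  awk -F',' '/Cpu\(s\)/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /id/) {
        gsub(/[^0-9.]/, "", $i)     # keep only the number, e.g. "95.0"
        printf "%.0f\n", 100 - $i
        exit
      }
    }
  }'
}

# Usage (assumption: LC_ALL=C makes top print decimals with a dot, so the
# line splits cleanly on commas):
# CPU=$(LC_ALL=C top -bn1 | cpu_busy_percent)
```

Splitting on commas and searching for the id field avoids depending on field positions, which shift between top versions.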

The RAM check is simpler: free shows total and used memory, and awk divides used by total and multiplies by 100.
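
One caveat: what free counts as "used" has changed between procps versions, and on some older systems it includes buffers and page cache, which inflates the percentage. A sketch that sidesteps this, assuming a kernel new enough to expose MemAvailable in /proc/meminfo (3.14 or later), treats everything that isn't available as used. The function name ram_used_percent is hypothetical.

```bash
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the percentage of memory that is NOT available, as an integer.
ram_used_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         if (total > 0) printf "%.0f\n", (total - avail) / total * 100
       }'
}

# Usage:
# RAM=$(ram_used_percent < /proc/meminfo)
```

If MemTotal is missing from the input, the function prints nothing, so the caller should treat an empty result as a failed reading rather than 0%.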

The Thing About `tee -a`

You'll notice the script uses echo ... | tee -a "$LOG_FILE" for warnings but plain echo for healthy checks. This is intentional.

tee -a writes to the terminal AND appends to the log file simultaneously. When everything is fine, there's nothing to log — you don't want a log file full of "CPU OK" lines every hour for three years. You only want entries when something is actually wrong. So the log file becomes a clean history of every resource spike your server has had, with timestamps.

When something breaks at 2 AM and you're debugging at 9 AM, you can cat /var/log/resource-monitor.log and see exactly when usage first crossed the threshold.
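
The warn-to-log, OK-to-stdout split can be folded into one function so CPU, RAM, and anything else share the same logic. This is a rough sketch, not the script's actual structure: check_metric and its argument order are hypothetical. It also guards against an empty or non-integer reading, which would otherwise make -gt fail with an error.

```bash
# Hypothetical helper: check_metric NAME VALUE THRESHOLD LOG_FILE
# Returns 0 when healthy, 1 on a threshold breach, 2 on a bad reading.
check_metric() {
  local name="$1" value="$2" threshold="$3" log_file="$4"
  local now
  now=$(date '+%Y-%m-%d %H:%M:%S')

  # Guard: reject empty or non-integer input before using -gt.
  case "$value" in
    ''|*[!0-9]*)
      echo "[$now] ERROR: $name reading '$value' is not an integer" | tee -a "$log_file"
      return 2
      ;;
  esac

  if [ "$value" -gt "$threshold" ]; then
    # Breach: print it AND append it to the log.
    echo "[$now] WARNING: $name at ${value}% (threshold: ${threshold}%)" | tee -a "$log_file"
    return 1
  fi

  # Healthy: print only, so the log stays a history of problems.
  echo "$name OK: ${value}%"
}

# Usage:
# check_metric CPU "$CPU" "$THRESHOLD" "$LOG_FILE"
# check_metric RAM "$RAM" "$THRESHOLD" "$LOG_FILE"
```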

Schedule It

crontab -e

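Hourly checks (what I use for most servers). Make the script executable first (chmod +x /home/user/monitor.sh), and since it writes under /var/log, either add the entry to root's crontab (sudo crontab -e) or point the log paths somewhere your user can write: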

0 * * * * /home/user/monitor.sh >> /var/log/monitor-cron.log 2>&1

Every 5 minutes (for production servers where you need tighter visibility):

*/5 * * * * /home/user/monitor.sh >> /var/log/monitor-cron.log 2>&1

Not sure about the cron syntax? I have a cron job builder tool that generates the line visually.

Variations That Are Worth Adding

Add an email alert when thresholds are breached:

if [ "$CPU" -gt "$THRESHOLD" ]; then
  MSG="$CROSS WARNING: CPU at ${CPU}% on $(hostname) at $DATE"
  echo "$MSG" | tee -a "$LOG_FILE"
  echo "$MSG" | mail -s "[ALERT] High CPU on $(hostname)" you@example.com
fi

I have a full email alert script that covers the mail setup if you haven't configured it before.
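
On a 5-minute schedule, a sustained spike sends an email on every run. A rough sketch of a cooldown, assuming GNU stat (standard on Linux) and a stamp file at a path of your choosing: the function name should_alert and the /tmp path in the usage comment are both hypothetical.

```bash
# Hypothetical helper: should_alert STAMP_FILE COOLDOWN_SECONDS
# Succeeds (and refreshes the stamp) if no alert was sent within the cooldown.
should_alert() {
  local stamp="$1" cooldown="$2" now last
  now=$(date +%s)
  if [ -f "$stamp" ]; then
    last=$(stat -c %Y "$stamp")     # GNU stat: mtime in epoch seconds
    if [ $(( now - last )) -lt "$cooldown" ]; then
      return 1                      # still cooling down, stay quiet
    fi
  fi
  touch "$stamp"                    # record the time of this alert
  return 0
}

# Usage: at most one CPU email per hour.
# if should_alert /tmp/monitor-cpu.stamp 3600; then
#   echo "$MSG" | mail -s "[ALERT] High CPU on $(hostname)" you@example.com
# fi
```

The warning still goes to the log on every run; only the email is throttled.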

Check disk space in the same script:

# -P keeps each filesystem on a single line, so NR==2 is always the data row
DISK=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$DISK" -gt "$THRESHOLD" ]; then
  echo "$CROSS [$DATE] WARNING: Disk at ${DISK}%" | tee -a "$LOG_FILE"
fi

Now you've got CPU, RAM, and disk in one pass. I keep disk in a separate script because I use a different threshold for it (90% vs 80%), but combining them works fine if you want fewer cron entries.

Log to a CSV for trending:

echo "$DATE,$CPU,$RAM" >> /var/log/resource-history.csv

Run this for a week and you'll see patterns. Maybe your app spikes every day at 2 PM when a batch job runs. Maybe RAM creeps up 1% per day, which means you have a memory leak that'll hit the wall in three months. You can't see these patterns without historical data.
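
Once the CSV has a few days in it, a short awk pass can reduce it to one line per day. This is a minimal sketch, assuming the three-column format written above (timestamp, CPU, RAM) with no header row; the function name daily_peaks is hypothetical.

```bash
# Hypothetical helper: reads "YYYY-MM-DD HH:MM:SS,CPU,RAM" rows on stdin and
# prints "YYYY-MM-DD,peak_cpu,peak_ram" for each day, sorted by date.
daily_peaks() {
  awk -F',' '{
    day = substr($1, 1, 10)                     # date part of the timestamp
    if (!(day in cpu) || $2 + 0 > cpu[day]) cpu[day] = $2 + 0
    if (!(day in ram) || $3 + 0 > ram[day]) ram[day] = $3 + 0
  }
  END {
    for (d in cpu) printf "%s,%d,%d\n", d, cpu[d], ram[d]
  }' | sort
}

# Usage:
# daily_peaks < /var/log/resource-history.csv
```

A RAM peak that rises a little every day is the memory-leak pattern described above; a CPU peak that recurs at the same value daily usually points at a scheduled job.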

When This Isn't Enough

This script is a notification system, not a monitoring platform. It tells you "something is wrong right now" but doesn't give you graphs, dashboards, or historical trending out of the box.

If you need that level of visibility, tools like Netdata (free, runs locally) or Grafana + Prometheus are the next step. But for a single VPS or a handful of servers, a cron script that logs warnings and optionally emails you is 90% of what you need — and it takes 2 minutes to deploy instead of 2 hours.

Full script, line-by-line breakdown, cron setup, and more variations:

bashsnippets.xyz/snippets/monitor-cpu-ram-usage.html