DEV Community

Tanvir Rahman

🚨 Auto-Reboot Your Server on High CPU / Memory Load (With Safety Checks)

Sometimes a Linux server gets overwhelmed by sustained high CPU or memory load, whether from runaway processes, DDoS attacks or rogue scripts. Catching and fixing it manually in real time isn't always possible. This guide shows you how to automatically reboot your server when CPU load or memory usage stays critically high for several consecutive minutes, while making sure a single brief spike never triggers a reboot.

We’ll create a lightweight bash script, track load over time, and schedule it with cron to run every minute.

📜 Step 1: Create the Auto-Reboot Script
Let’s start by writing the script that checks CPU load and memory usage and initiates a reboot after 3 consecutive high readings.

✏️ Create or Edit the Script

sudo nano /usr/local/bin/reboot-on-high-load.sh

Paste this logic into the file:

#!/bin/bash

# Configuration
CPU_THRESHOLD=0.8                # Load average of 0.8 per CPU core (6.4 on an 8-core machine)
MEM_THRESHOLD=90                 # 90% memory usage threshold
MAX_RETRIES=3                    # Reboot if high load/memory persists for N consecutive checks
CHECK_FILE="/tmp/highload.counter"
LOG_FILE="/var/log/reboot-load.log"
MAX_LOG_SIZE=$((10 * 1024 * 1024))  # 10MB

# Auto-truncate log if too large (truncate first, then log the notice;
# otherwise the notice itself would be wiped out by the truncation)
if [ -f "$LOG_FILE" ] && [ "$(stat -c%s "$LOG_FILE")" -ge "$MAX_LOG_SIZE" ]; then
    truncate -s 0 "$LOG_FILE"
    echo "[!] Log file exceeded 10MB and was truncated at $(date)." >> "$LOG_FILE"
fi

# Detect CPU cores and compute the load threshold
# (awk does the floating-point math, so bc does not need to be installed)
CPU_CORES=$(nproc)
LOAD_THRESHOLD=$(awk -v c="$CPU_CORES" -v t="$CPU_THRESHOLD" 'BEGIN { printf("%.2f", c * t) }')

# CPU Load Check: 1-minute load average from /proc/loadavg (1 = below threshold)
LOAD_AVG=$(awk '{print $1}' /proc/loadavg)
LOAD_OK=$(awk -v l="$LOAD_AVG" -v t="$LOAD_THRESHOLD" 'BEGIN { ok = (l + 0 < t + 0) ? 1 : 0; print ok }')

# Memory Check: "used" column of free as a percentage of total (1 = below threshold)
MEM_USED_PERCENT=$(free | awk '/Mem:/ { printf("%.0f", $3/$2 * 100) }')
MEM_OK=$( [ "$MEM_USED_PERCENT" -lt "$MEM_THRESHOLD" ] && echo 1 || echo 0 )

# Read the counter from the previous run; treat a missing or corrupted file as 0
COUNTER=$(cat "$CHECK_FILE" 2>/dev/null)
case "$COUNTER" in
    ''|*[!0-9]*) COUNTER=0 ;;
esac

# Evaluate system status
if [ "$LOAD_OK" -eq 0 ] || [ "$MEM_OK" -eq 0 ]; then
    echo "[!] High resource usage detected at $(date):" >> "$LOG_FILE"
    [ "$LOAD_OK" -eq 0 ] && echo "    - CPU Load: $LOAD_AVG / Threshold: $LOAD_THRESHOLD" >> "$LOG_FILE"
    [ "$MEM_OK" -eq 0 ] && echo "    - Memory Usage: $MEM_USED_PERCENT% / Threshold: $MEM_THRESHOLD%" >> "$LOG_FILE"
    COUNTER=$((COUNTER + 1))
    echo "$COUNTER" > "$CHECK_FILE"
else
    if [ "$COUNTER" -ne 0 ]; then
        echo "[✓] Resources back to normal at $(date): Load = $LOAD_AVG, Mem = $MEM_USED_PERCENT%" >> "$LOG_FILE"
    fi
    echo 0 > "$CHECK_FILE"
fi

# Reboot if over threshold for too long
if [ "$COUNTER" -ge "$MAX_RETRIES" ]; then
    echo "[!!!] High resource usage sustained for $MAX_RETRIES checks. Rebooting at $(date)..." >> "$LOG_FILE"
    rm -f "$CHECK_FILE"
    /sbin/shutdown -r now
fi

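Because the script reads live values from /proc and ends in a real reboot, it is awkward to test directly. As a rough sketch, its decision logic can be pulled out into small functions and fed made-up readings. The function names load_threshold, is_high and next_counter are hypothetical (the script above inlines this logic), and the load values in the loop are invented inputs, not measurements:

```shell
#!/bin/sh
# Sketch: the script's decision logic as small, side-effect-free functions.
# Function names are hypothetical; the original script inlines this logic.

# load_threshold CORES FACTOR -> CORES * FACTOR, printed with two decimals
load_threshold() {
    awk -v c="$1" -v f="$2" 'BEGIN { printf("%.2f\n", c * f) }'
}

# is_high LOAD LOAD_THRESHOLD MEM_PERCENT MEM_THRESHOLD
# -> prints 1 if either resource is at or above its threshold, else 0
#    (the same condition as LOAD_OK=0 or MEM_OK=0 in the script)
is_high() {
    awk -v l="$1" -v t="$2" -v m="$3" -v mt="$4" \
        'BEGIN { high = (l + 0 >= t + 0 || m + 0 >= mt + 0) ? 1 : 0; print high }'
}

# next_counter COUNTER HIGH -> counter for the next run:
# incremented after a high reading, reset to 0 after a normal one
next_counter() {
    if [ "$2" -eq 1 ]; then
        echo $(( $1 + 1 ))
    else
        echo 0
    fi
}

# Replay made-up one-minute readings on an 8-core machine with MAX_RETRIES=3.
THRESHOLD=$(load_threshold 8 0.8)
COUNTER=0
for LOAD in 7.23 7.45 7.10 2.00; do
    HIGH=$(is_high "$LOAD" "$THRESHOLD" 40 90)
    COUNTER=$(next_counter "$COUNTER" "$HIGH")
    echo "load=$LOAD threshold=$THRESHOLD counter=$COUNTER"
    if [ "$COUNTER" -ge 3 ]; then
        echo "would reboot now"
        break
    fi
done
```

With CPU_THRESHOLD=0.8 on 8 cores the threshold works out to 6.40, and the third consecutive high reading is the one that would trigger the reboot; the final 2.00 reading is never reached.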

🔓 Step 2: Make the Script Executable

sudo chmod +x /usr/local/bin/reboot-on-high-load.sh

⏰ Step 3: Schedule It to Run Every Minute
Open the root crontab:

sudo crontab -e

Add this line to the bottom:

* * * * * /usr/local/bin/reboot-on-high-load.sh >> /var/log/reboot-load.log 2>&1

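This runs the script every minute. The script writes its own entries to /var/log/reboot-load.log, and the >> /var/log/reboot-load.log 2>&1 redirection additionally appends anything a run prints to stdout or stderr (such as error messages) to the same file.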

πŸ›‘οΈ Why This Is Safe
βœ… No reboots on single spikes β€” It only reboots if load stays high for 3 checks.

βœ… Self-resetting β€” If load normalizes, the counter resets.

βœ… Persistent state tracking β€” Uses /tmp/highload.counter.

βœ… Simple logging β€” Outputs to /var/log/reboot-load.log.

βœ… Cron-scheduled β€” Lightweight, runs every 60 seconds.
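One case these checks do not cover is a reboot loop: if whatever causes the load starts again at boot (a misbehaving service, for example), the machine could reboot every few minutes. A minimal sketch of an extra guard, assuming you add it yourself, is to skip the reboot while uptime is below some minimum. MIN_UPTIME, current_uptime and decide_action are hypothetical names, and the 15-minute value is an arbitrary example:

```shell
#!/bin/sh
# Hypothetical reboot-loop guard, not part of the original script.
# MIN_UPTIME, current_uptime and decide_action are illustrative names.

MAX_RETRIES=3     # same meaning as in the script
MIN_UPTIME=900    # assumption: no automatic reboot within 15 minutes of boot

# Seconds since boot: first field of /proc/uptime, fractional part dropped
current_uptime() {
    cut -d' ' -f1 /proc/uptime | cut -d. -f1
}

# decide_action COUNTER UPTIME_SECONDS -> prints "wait", "skip" or "reboot"
decide_action() {
    if [ "$1" -lt "$MAX_RETRIES" ]; then
        echo wait      # not enough consecutive high readings yet
    elif [ "$2" -lt "$MIN_UPTIME" ]; then
        echo skip      # limit reached, but the machine booted too recently
    else
        echo reboot    # limit reached and uptime is long enough
    fi
}

# In the real script, the final block could become something like:
#   if [ "$(decide_action "$COUNTER" "$(current_uptime)")" = reboot ]; then
#       /sbin/shutdown -r now
#   fi
echo "counter=3, uptime=300s  -> $(decide_action 3 300)"
echo "counter=3, uptime=3600s -> $(decide_action 3 3600)"
```

Because the counter keeps climbing while the reboot is skipped, the machine still reboots once uptime passes MIN_UPTIME if the load is still high, which limits automatic reboots to roughly one per MIN_UPTIME window instead of one every few minutes.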

🧪 Optional: Test the Script by Simulating High Load
🛠️ 1. Install a CPU Stress Tool
On Ubuntu/Debian:

sudo apt update && sudo apt install -y stress

On Alpine Linux:

apk add stress

If stress is not available, use yes as a simple CPU loader (see below).

🚀 2. Simulate High Load for About 5 Minutes
Option A: With stress (preferred)

stress --cpu $(nproc) --timeout 300

--cpu $(nproc) starts one CPU-bound worker process per core.

--timeout 300 runs for 300 seconds (5 minutes). The 1-minute load average climbs gradually rather than jumping instantly (with the default CPU_THRESHOLD=0.8 it needs roughly a minute and a half to cross the threshold), and 3 consecutive high checks must follow, so a run of only about 3 minutes can end before the third check.

Option B: With yes Command

for i in $(seq 1 $(nproc)); do yes > /dev/null & done

Let it run for about 5 minutes, then stop it:

killall yes
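Why does the test need several minutes? Linux's 1-minute load average is an exponentially damped moving average (updated about every 5 seconds with a 60-second time constant), so it approaches the number of busy workers gradually. A back-of-the-envelope estimate, assuming the stress workers are the only load on the machine and the cron checks run exactly one minute apart:

```shell
#!/bin/sh
# Back-of-the-envelope estimate, assuming the 1-minute load average behaves
# like an exponential moving average with a 60 s time constant and that the
# stress workers (one per core) are the only load on the machine.

# seconds_to_reach FRACTION -> seconds until the average reaches FRACTION of
# its steady-state value: t = 60 * ln(1 / (1 - FRACTION)), rounded
seconds_to_reach() {
    awk -v f="$1" 'BEGIN { printf("%.0f\n", 60 * log(1 / (1 - f))) }'
}

RAMP=$(seconds_to_reach 0.8)     # CPU_THRESHOLD=0.8 from the script
# The next cron run can come up to 60 s after the threshold is crossed,
# and two more one-minute runs are needed to reach MAX_RETRIES=3.
WORST=$(( RAMP + 60 + 2 * 60 ))
echo "first high reading after ~${RAMP}s, third by ~${WORST}s"
```

On an otherwise idle machine this gives about 97 seconds before the load average first crosses the threshold and up to about 277 seconds before the third high check, so the load should be held for comfortably longer than that.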

3. Monitor the Logs
In a second terminal, run:

tail -f /var/log/reboot-load.log
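If you only care about when the script actually decided to reboot, you can filter the log for the [!!!] prefix it writes just before calling shutdown. A small sketch; reboot_events is a hypothetical helper name, and the sample log lines are made up in the script's format purely to demonstrate the filter:

```shell
#!/bin/sh
# Hypothetical helper: print only the reboot decisions from the log,
# skipping the per-minute "[!]" warnings and the recovery lines.
reboot_events() {
    # -F treats '[!!!]' as a literal string instead of a regex bracket expression
    grep -F '[!!!]' "${1:-/var/log/reboot-load.log}"
}

# Demonstration on a throwaway file with made-up entries in the script's format.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
[!] High resource usage detected at <date>:
    - CPU Load: 7.23 / Threshold: 6.40
[!!!] High resource usage sustained for 3 checks. Rebooting at <date>...
EOF
EVENTS=$(reboot_events "$SAMPLE")
echo "$EVENTS"
rm -f "$SAMPLE"
```

Called with no argument, reboot_events reads /var/log/reboot-load.log directly.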

You’ll see entries like these (abbreviated; a threshold of 6.40 corresponds to an 8-core machine):

[!] High resource usage detected at <date>:
    - CPU Load: 7.23 / Threshold: 6.40
[!] High resource usage detected at <date>:
    - CPU Load: 7.45 / Threshold: 6.40
...
[!!!] High resource usage sustained for 3 checks. Rebooting at <date>...

Then the system will automatically reboot.

✅ Recap
By following this guide, you’ve set up a safe and efficient way to auto-reboot your Linux server during persistent high-load events:

✅ Script with thresholds and state tracking

✅ Cronjob for regular checks

✅ No false positives from single spikes

✅ Full log output for auditing
