Managing a handful of VPS instances is fine. Managing a dozen gets messy fast — especially when you want to know whether that provider's "1 Gbps" claim holds up after the honeymoon period ends, or whether CPU performance silently degrades on a shared hypervisor over time.
I got tired of SSHing into each box, running the same manual benchmarks, and then forgetting to compare the results to last month's numbers. So I built a small automation: each server runs a Bash script every six hours, logs results to a CSV, and pings me when something looks off. No Prometheus, no dashboards required — just Bash, Cron, and a text file.
Here's the full setup.
The Core Benchmark Script
Save this as /opt/bench/run_bench.sh. It covers the four things I track: network latency, download speed, CPU, and disk I/O.
#!/usr/bin/env bash
# /opt/bench/run_bench.sh
# Runs benchmarks and appends results to a monthly CSV log.
set -euo pipefail
LOG_DIR="/opt/bench/logs"
LOG_FILE="${LOG_DIR}/bench_$(date +%Y-%m).csv"
HOSTNAME=$(hostname -s)
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
mkdir -p "$LOG_DIR"
# Write CSV header if file is new
if [[ ! -f "$LOG_FILE" ]]; then
  echo "timestamp,host,ping_ms,dl_mbps,cpu_score,disk_write_mbps,disk_read_mbps" > "$LOG_FILE"
fi
# ---- 1. Ping latency (average across 3 public resolvers) ----
ping_avg() {
  local host="$1"
  # Mask ping's exit code and fall back to 0 on empty output, so one
  # unreachable resolver doesn't abort the whole run under set -euo pipefail
  { ping -c 10 -q "$host" 2>/dev/null || true; } \
    | awk -F'/' '/^rtt/{print $5}' \
    | grep . || echo 0
}
PING_GOOGLE=$(ping_avg "8.8.8.8")
PING_CF=$(ping_avg "1.1.1.1")
PING_QUAD9=$(ping_avg "9.9.9.9")
PING_AVG=$(awk "BEGIN {printf \"%.2f\", ($PING_GOOGLE + $PING_CF + $PING_QUAD9) / 3}")
# ---- 2. Download speed via curl ----
# 100 MB test download, 15s cap to avoid hanging on slow connections.
# curl exits 28 when --max-time fires but still emits %{speed_download}
# (bytes/sec), so mask the exit code — otherwise set -e kills the script
# on any link slower than ~7 MB/s — then convert to megabytes per second.
DL_BYTES_SEC=$(curl -s -o /dev/null -w "%{speed_download}" \
  --max-time 15 \
  "https://speed.cloudflare.com/__down?bytes=104857600" || true)
DL_MBPS=$(awk -v b="${DL_BYTES_SEC:-0}" 'BEGIN {printf "%.2f", b / 1048576}')
# ---- 3. CPU benchmark (pure Bash — no external dependencies) ----
cpu_bench() {
  local start end iterations=500000
  start=$(date +%s%N)   # nanoseconds — GNU coreutils date; %N is missing on BSD/macOS
  for ((i=0; i<iterations; i++)); do
    : $((i * i))
  done
  end=$(date +%s%N)
  # iterations per millisecond
  echo $(( (iterations * 1000) / ((end - start) / 1000000) ))
}
CPU_SCORE=$(cpu_bench)
# If sysbench is available, uncomment for more standard numbers:
# CPU_SCORE=$(sysbench cpu --cpu-max-prime=20000 run 2>/dev/null \
# | awk '/events per second/{printf "%.0f", $NF}')
# ---- 4. Disk I/O (dd write + read, 256 MB each) ----
TESTFILE="/tmp/bench_io_$$"
parse_dd_speed() {
  awk '/copied/{
    for(i=1;i<=NF;i++) {
      if($i=="MB/s" || $i=="GB/s") { val=$(i-1); unit=$i }
    }
    if(unit=="GB/s") val=val*1024
    printf "%.2f", val
  }'
}
# LC_ALL=C keeps dd's status line in English so parse_dd_speed can match it
DISK_WRITE=$(LC_ALL=C dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync 2>&1 \
  | parse_dd_speed)
# The read follows the write immediately, so much of the file is still in
# the page cache — treat this as a best-case number, useful for trends only
DISK_READ=$(LC_ALL=C dd if="$TESTFILE" of=/dev/null bs=1M count=256 2>&1 \
  | parse_dd_speed)
rm -f "$TESTFILE"
# ---- Append result row to CSV ----
echo "${TIMESTAMP},${HOSTNAME},${PING_AVG},${DL_MBPS},${CPU_SCORE},${DISK_WRITE},${DISK_READ}" \
>> "$LOG_FILE"
echo "[bench] ping=${PING_AVG}ms dl=${DL_MBPS}MB/s cpu=${CPU_SCORE} disk_w=${DISK_WRITE}MB/s disk_r=${DISK_READ}MB/s"
chmod +x /opt/bench/run_bench.sh
A note on the CPU score: it's "loop iterations per millisecond" — not a standard unit, but it's internally consistent on the same machine. For cross-server comparison, use sysbench. For trend tracking on one box, the Bash loop is perfectly fine.
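Because the CSV is append-only and regular, quick trend checks don't need anything heavier than awk. Here's a hypothetical helper — my own addition, not part of the scripts above — that prints min/avg/max for any numeric column (3 = ping_ms, 4 = dl_mbps, 5 = cpu_score, per the header):

```shell
# bench_col_summary FILE COL LABEL — min/avg/max of one numeric CSV column.
# Illustrative helper; column numbers follow the header written by run_bench.sh.
bench_col_summary() {
  local file="$1" col="$2" label="$3"
  awk -F',' -v col="$col" -v label="$label" '
    NR == 1 { next }                 # skip the CSV header
    { v = $col + 0; sum += v; n++ }
    n == 1 || v < min { min = v }
    v > max { max = v }
    END {
      if (n) printf "%s: min=%.2f avg=%.2f max=%.2f (n=%d)\n", label, min, sum / n, max, n
    }
  ' "$file"
}
```

Something like `bench_col_summary /opt/bench/logs/bench_$(date +%Y-%m).csv 3 ping_ms` gives you the month at a glance.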
Log Rotation
The script already separates logs by month (bench_2025-03.csv, etc.), giving you automatic monthly archives. To compress old files:
#!/usr/bin/env bash
# /opt/bench/rotate_logs.sh
LOG_DIR="/opt/bench/logs"
CURRENT_MONTH=$(date +%Y-%m)
find "$LOG_DIR" -name "bench_*.csv" | while read -r f; do
  month=$(basename "$f" .csv | sed 's/bench_//')
  if [[ "$month" != "$CURRENT_MONTH" ]]; then
    gzip -f "$f" && echo "Archived: ${f}"
  fi
done
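Archived months stay queryable without unpacking. A small wrapper — my own addition, not in the original script set — hides the .gz difference (using gzip -dc rather than zcat, which on some platforms only handles .Z files):

```shell
# bench_cat FILE — stream a monthly log whether or not it has been gzipped.
bench_cat() {
  local f="$1"
  case "$f" in
    *.gz) gzip -dc -- "$f" ;;
    *)    cat -- "$f" ;;
  esac
}
```

For example, `bench_cat /opt/bench/logs/bench_2025-01.csv.gz | awk -F',' 'NR > 1 {print $1, $3}'` pulls timestamp and ping out of an archived month (filename illustrative).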
Setting Up Cron
Here's where tutorials usually skip the annoying-but-critical details. Cron runs in a stripped environment: no .bashrc, potentially no /usr/local/bin in PATH, and output goes nowhere unless you redirect it.
Open crontab -e and add:
# Explicit PATH — don't assume Cron inherits yours
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Benchmark every 6 hours
0 */6 * * * /opt/bench/run_bench.sh >> /opt/bench/logs/cron.log 2>&1
# Rotate logs on the 1st of each month at 3 AM
0 3 1 * * /opt/bench/rotate_logs.sh >> /opt/bench/logs/cron.log 2>&1
Two things worth highlighting:
- Set PATH explicitly at the top of your crontab. On some minimal Debian/Ubuntu images, Cron's default PATH is just /usr/bin:/bin. If curl or ping live in /usr/local/bin, your job fails silently.
- Always redirect 2>&1. If the script errors out, you want the message captured in a log — not swallowed by the void. I learned this the hard way after a curl version mismatch went undetected for two weeks.
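You can catch the PATH problem before Cron does by resolving each dependency under a Cron-like minimal PATH. This is a diagnostic sketch (the /usr/bin:/bin default mirrors the minimal images mentioned above — adjust to whatever your cron actually sets):

```shell
# check_cron_deps [PATH] — report which benchmark dependencies resolve
# under a stripped, cron-like PATH. Diagnostic sketch, not a core script.
check_cron_deps() {
  local cron_path="${1:-/usr/bin:/bin}" missing=0 cmd
  for cmd in curl ping awk dd gzip; do
    # env -i gives the lookup a clean environment, like cron would
    if env -i PATH="$cron_path" /bin/sh -c 'command -v "$1" > /dev/null 2>&1' _ "$cmd"; then
      echo "ok      $cmd"
    else
      echo "MISSING $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it once per box; anything flagged MISSING either needs a package install or a wider PATH line in the crontab.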
Alerting When Things Go Wrong
Passive logging is fine; active alerts make it actionable. The key design decision here: use dynamic thresholds based on each server's own history instead of hardcoded values. A 50ms ping is fine for a server in Europe but alarming for one sitting in the same datacenter as you.
#!/usr/bin/env bash
# /opt/bench/check_alerts.sh
# Reads the current month's CSV and alerts if the latest reading
# is more than 3x (or less than 1/3) the historical average.
set -euo pipefail
LOG_DIR="/opt/bench/logs"
LOG_FILE="${LOG_DIR}/bench_$(date +%Y-%m).csv"
WEBHOOK_URL="${BENCH_WEBHOOK_URL:-}"
ALERT_EMAIL="${BENCH_ALERT_EMAIL:-}"
if [[ ! -f "$LOG_FILE" ]]; then
  exit 0
fi
# Args: column_number metric_label direction(high|low)
# Returns alert message on stdout, exits 1 if alert triggered
check_metric() {
  local col="$1" label="$2" direction="$3"
  awk -F',' -v col="$col" -v label="$label" -v dir="$direction" '
    NR == 1 { next }
    { rows[NR] = $col; total += $col; count++ }
    END {
      if (count < 3) exit 0
      last = rows[NR]
      hist_avg = (total - last) / (count - 1)
      if (hist_avg == 0) exit 0
      if (dir == "high" && last > hist_avg * 3) {
        printf "ALERT %s: %.2f (3x avg %.2f)\n", label, last, hist_avg
        exit 1
      } else if (dir == "low" && last < hist_avg / 3) {
        printf "ALERT %s: %.2f (< 1/3 avg %.2f)\n", label, last, hist_avg
        exit 1
      }
    }
  ' "$LOG_FILE"
}
PING_ALERT=$(check_metric 3 "ping_ms" "high" || true)
DL_ALERT=$(check_metric 4 "dl_mbps" "low" || true)
CPU_ALERT=$(check_metric 5 "cpu_score" "low" || true)
# Join with spaces so two simultaneous alerts don't fuse into one word;
# empty components just collapse into whitespace
ALERTS="${PING_ALERT} ${DL_ALERT} ${CPU_ALERT}"
if [[ "$ALERTS" == *ALERT* ]]; then
  HOST=$(hostname -s)
  MESSAGE="[bench] ${HOST}: ${ALERTS}"
  echo "$MESSAGE"
  # Slack / Discord / any JSON webhook. MESSAGE contains no quotes or
  # newlines, so inlining it into the JSON body is safe here.
  if [[ -n "$WEBHOOK_URL" ]]; then
    curl -s -X POST "$WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d "{\"text\": \"${MESSAGE}\"}" > /dev/null
  fi
  # Email via sendmail or msmtp
  if [[ -n "$ALERT_EMAIL" ]]; then
    printf 'Subject: %s\n\n%s\n' "$MESSAGE" "$MESSAGE" | sendmail "$ALERT_EMAIL"
  fi
fi
Set the variables in your crontab environment block and chain the scripts. One catch: crontab has no backslash line continuation, so the chained entry must stay on a single line:
BENCH_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/HERE
BENCH_ALERT_EMAIL=you@example.com
0 */6 * * * /opt/bench/run_bench.sh >> /opt/bench/logs/cron.log 2>&1 && /opt/bench/check_alerts.sh >> /opt/bench/logs/cron.log 2>&1
The dynamic threshold approach means no per-server tuning. After a week of baseline data, anomalies start surfacing on their own.
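If you want to see the threshold logic fire without waiting for a real regression, you can replay it against synthetic data. This sketch embeds the same awk program as check_alerts.sh and feeds it a made-up CSV whose last dl_mbps reading collapses to a tenth of baseline:

```shell
# Stand-alone replay of the check_metric logic against inline test data.
# The CSV rows below are fabricated purely to trigger the "low" branch.
demo_alert() {
  awk -F',' -v col=4 -v label="dl_mbps" -v dir="low" '
    NR == 1 { next }
    { rows[NR] = $col; total += $col; count++ }
    END {
      if (count < 3) exit 0
      last = rows[NR]
      hist_avg = (total - last) / (count - 1)
      if (hist_avg == 0) exit 0
      if (dir == "high" && last > hist_avg * 3) {
        printf "ALERT %s: %.2f (3x avg %.2f)\n", label, last, hist_avg
        exit 1
      } else if (dir == "low" && last < hist_avg / 3) {
        printf "ALERT %s: %.2f (< 1/3 avg %.2f)\n", label, last, hist_avg
        exit 1
      }
    }
  ' <<'CSV'
timestamp,host,ping_ms,dl_mbps,cpu_score,disk_write_mbps,disk_read_mbps
2025-03-01T00:00:00Z,web1,12.0,50.0,4000,300,900
2025-03-01T06:00:00Z,web1,11.5,52.0,4100,310,910
2025-03-01T12:00:00Z,web1,12.1,48.0,3900,305,905
2025-03-02T00:00:00Z,web1,11.9,5.0,4050,300,900
CSV
}
```

The historical average of the first three rows is 50.0, so the 5.0 reading trips the "below one-third" branch and the function exits nonzero, just as the real script would.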
Optional: Visualizing the Trends
If you want a quick trend chart without standing up a full monitoring stack, gnuplot handles it in a few lines:
#!/usr/bin/env bash
# /opt/bench/plot_trends.sh
# Generates PNG charts from the current month's data.
MONTH=$(date +%Y-%m)
LOG_FILE="/opt/bench/logs/bench_${MONTH}.csv"
OUT_DIR="/opt/bench/logs"
gnuplot <<EOF
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set format x "%m/%d %H:%M"
set xtics rotate by -45
set grid
set terminal png size 1200,400
set output "${OUT_DIR}/ping_trend.png"
set title "Ping Latency — ${MONTH}"
set ylabel "ms"
plot "${LOG_FILE}" skip 1 using 1:3 with linespoints lw 2 title "ping_ms"
set output "${OUT_DIR}/dl_trend.png"
set title "Download Speed — ${MONTH}"
set ylabel "MB/s"
plot "${LOG_FILE}" skip 1 using 1:4 with linespoints lw 2 lc rgb "green" title "dl_mbps"
EOF
echo "Charts written to ${OUT_DIR}/"
If you prefer the browser, this minimal HTML file loads and plots the CSV client-side with Chart.js. One caveat: browsers block fetch() over file://, so rather than opening the file directly, serve the log directory with something lightweight (python3 -m http.server, nginx, etc.):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Bench Trends</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js@4/dist/chart.umd.min.js"></script>
<style>body { font-family: sans-serif; max-width: 1100px; margin: 2rem auto; }</style>
</head>
<body>
<h2>Ping Latency</h2>
<canvas id="ping" height="120"></canvas>
<h2>Download Speed</h2>
<canvas id="dl" height="120"></canvas>
<script>
  // Update filename to current month before deploying
  fetch('bench_YYYY-MM.csv')
    .then(r => r.text())
    .then(csv => {
      const rows = csv.trim().split('\n').slice(1).map(r => r.split(','));
      const labels = rows.map(r => r[0].slice(0, 16));
      const ping = rows.map(r => parseFloat(r[2]));
      const dl = rows.map(r => parseFloat(r[3]));
      const mkChart = (id, label, data, color) => new Chart(
        document.getElementById(id),
        { type: 'line',
          data: { labels, datasets: [{ label, data, borderColor: color, tension: 0.3, pointRadius: 2 }] },
          options: { plugins: { legend: { display: false } }, scales: { x: { ticks: { maxTicksLimit: 12 } } } }
        }
      );
      mkChart('ping', 'Ping (ms)', ping, '#e74c3c');
      mkChart('dl', 'Speed (MB/s)', dl, '#2ecc71');
    });
</script>
</body>
</html>
After a Few Months of Running This
The biggest win isn't catching catastrophic failures — those are usually obvious. It's catching the slow drift: a server that was reliably hitting 800 MB/s disk writes in January that's now averaging 420 MB/s in March. Without the log, I'd never have noticed.
A few practical observations from running this on about eight servers:
- False positives are rare with the 3x threshold. In four months I've had three genuine alerts and zero false ones.
- Disk I/O is the noisiest metric. dd results vary a lot run to run, especially on NVMe. If you want something more stable, consider fio instead — but that's a separate rabbit hole.
- The CSV format pays off. I've imported the data into Excel, fed it to pandas, and piped it through awk one-liners. Structured plaintext ages well.
The natural next step is shipping these CSVs into something like Prometheus (via a custom exporter or node_exporter textfile collector) and viewing them in Grafana. But honestly, for most use cases, a few PNG charts and a Slack ping when things go sideways is enough. Sometimes the simple solution is the right one.
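For the textfile-collector route, the conversion is one awk away. This sketch turns the newest CSV row into Prometheus exposition format — the metric names (bench_ping_ms and friends) and the output path in the usage line are my own choices, not anything standard:

```shell
# bench_to_prom FILE — emit the latest bench CSV row as Prometheus metrics.
# Metric names are illustrative, chosen for this sketch.
bench_to_prom() {
  local csv="$1"
  awk -F',' '
    NR == 1 { next }
    { host=$2; ping=$3; dl=$4; cpu=$5; dw=$6; dr=$7 }   # last row wins
    END {
      if (host == "") exit 0
      printf "bench_ping_ms{host=\"%s\"} %s\n", host, ping
      printf "bench_dl_mbps{host=\"%s\"} %s\n", host, dl
      printf "bench_cpu_score{host=\"%s\"} %s\n", host, cpu
      printf "bench_disk_write_mbps{host=\"%s\"} %s\n", host, dw
      printf "bench_disk_read_mbps{host=\"%s\"} %s\n", host, dr
    }
  ' "$csv"
}
```

Pointed at whatever directory node_exporter's --collector.textfile.directory flag names, e.g. `bench_to_prom "$LOG_FILE" > bench.prom.tmp && mv bench.prom.tmp bench.prom` (write-then-rename keeps the scrape from seeing a half-written file).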
The full script set lives at /opt/bench/ — about 150 lines of Bash total. No containers, no daemons, no dependencies beyond curl, ping, dd, awk, and gzip. Runs on every Linux distro I've tried without modification.
If you extend it or hit edge cases, I'd be curious what you run into.