DEV Community

Guo
How I Monitor Small Websites with Lightweight Uptime + TTFB Checks

Why I Stopped Using UptimeRobot

Let me be upfront: UptimeRobot is fine. Pingdom is fine. Better Stack is fine. For most people, a free-tier SaaS monitor is the right call.

But I manage about a dozen small static and semi-static sites — mostly content sites behind Cloudflare. Here's what bugged me about the hosted monitoring route:

  • Free tiers cap at 5–10 monitors. I have 12+ sites and want to check multiple endpoints per site.
  • Check intervals are 5 minutes at best. That's an eternity if your origin goes down and the Cloudflare cache expires.
  • No TTFB tracking over time. Most free monitors tell you "up" or "down" — they don't track whether your Time to First Byte is slowly creeping from 200ms to 1.8s.
  • Alert fatigue from false positives. Hosted monitors ping from external IPs that occasionally get rate-limited or geo-blocked. I'd get 3am alerts for sites that were perfectly fine.
  • One more dashboard I never check. I already live in the terminal. Adding another browser tab felt wrong.

So I built something dumb and simple. It runs on the same box that serves the sites, costs nothing, and does exactly what I need.


What I Actually Monitor

For each site, I care about three things:

  1. Is it responding with HTTP 200? (uptime)
  2. What's the TTFB from the origin? (performance baseline)
  3. Is the TLS cert expiring soon? (because Let's Encrypt renewals fail silently more often than you'd think)

That's it. No waterfall charts, no RUM, no synthetic transactions. Just the basics.


The Core: curl Does Everything

Here's the thing most people don't realize: curl has a built-in timing breakdown that gives you more detail than most monitoring dashboards.

curl -o /dev/null -s -w "%{http_code} %{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n" https://example.com

This outputs something like:

200 0.012 0.045 0.132 0.247 0.253

Those numbers, in order:

Variable              Meaning
time_namelookup       DNS resolution
time_connect          TCP handshake complete
time_appconnect       TLS handshake complete
time_starttransfer    TTFB — first byte received
time_total            Full response downloaded

time_starttransfer is your TTFB. That single number tells you more about your server's health than a green/red dot on a dashboard.
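One detail worth knowing: these timings are cumulative from the start of the request, so per-phase durations are the differences between adjacent values. A quick awk sketch, fed the sample output from above:

```shell
# Per-phase durations from curl's cumulative timings
# (fields: http_code dns tcp tls ttfb total)
echo "200 0.012 0.045 0.132 0.247 0.253" | awk '{
  printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f download=%.3f\n",
    $2, $3 - $2, $4 - $3, $5 - $4, $6 - $5
}'
# → dns=0.012 tcp=0.033 tls=0.087 wait=0.115 download=0.006
```

The "wait" slice (TTFB minus TLS handshake) is the part your server is actually responsible for, which makes it the number to watch when a spike shows up.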


The Script

Here's the actual script I run. It's about 60 lines of bash. No dependencies beyond curl, bc, and date.

#!/usr/bin/env bash
# site-check.sh — lightweight uptime + TTFB monitor

set -euo pipefail

# === Config ===
LOG_DIR="/var/log/site-monitor"
ALERT_TTFB=2.0        # seconds — alert if TTFB exceeds this
ALERT_TOTAL=5.0        # seconds — alert if total time exceeds this
NOTIFY_CMD=""          # set to your notification command (see below)

# Sites to check: URL and a friendly name
SITES=(
  "https://www.example.com|example"
  "https://www.another-site.com|another"
  "https://blog.example.com|blog"
)

mkdir -p "$LOG_DIR"

TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
DATE_FILE=$(date -u +"%Y-%m-%d")

for entry in "${SITES[@]}"; do
  IFS='|' read -r url name <<< "$entry"

  # Perform the check
  result=$(curl -o /dev/null -s -w "%{http_code}|%{time_namelookup}|%{time_connect}|%{time_appconnect}|%{time_starttransfer}|%{time_total}" \
    --max-time 10 \
    --connect-timeout 5 \
    -H "Cache-Control: no-cache" \
    -H "User-Agent: SiteMonitor/1.0" \
    "$url" 2>/dev/null) || result="000|0|0|0|0|0"

  IFS='|' read -r code dns tcp tls ttfb total <<< "$result"

  # Log every check as CSV
  echo "$TIMESTAMP,$name,$url,$code,$dns,$tcp,$tls,$ttfb,$total" \
    >> "$LOG_DIR/${name}_${DATE_FILE}.csv"

  # Determine status
  status="ok"
  reason=""

  if [[ "$code" != "200" && "$code" != "301" && "$code" != "302" ]]; then
    status="down"
    reason="HTTP $code"
  elif (( $(echo "$ttfb > $ALERT_TTFB" | bc -l) )); then
    status="slow"
    reason="TTFB ${ttfb}s"
  elif (( $(echo "$total > $ALERT_TOTAL" | bc -l) )); then
    status="slow"
    reason="Total ${total}s"
  fi

  # Alert if not ok
  if [[ "$status" != "ok" ]]; then
    msg="[$status] $name ($url) — $reason at $TIMESTAMP"
    echo "$msg" >> "$LOG_DIR/alerts.log"

    if [[ -n "$NOTIFY_CMD" ]]; then
      eval "$NOTIFY_CMD '$msg'"
    fi
  fi
done

What's happening:

  • Each site gets checked with curl, sending a Cache-Control: no-cache request header to ask intermediate caches to revalidate. (Note that some CDNs, Cloudflare included, ignore this header on incoming requests, so a proxied check may still hit the edge cache.)
  • Results are logged as CSV — one file per site per day. Easy to grep, easy to chart later.
  • If the HTTP status isn't 200, 301, or 302, or if TTFB/total time exceeds thresholds, it logs an alert.
  • The NOTIFY_CMD hook is where you plug in whatever alerting you prefer.
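For reference, one logged row looks like this (values illustrative), and splits back into fields with the same IFS trick the script itself uses:

```shell
# Illustrative CSV row: timestamp,name,url,code,dns,tcp,tls,ttfb,total
line='2025-03-12T06:00:01Z,example,https://www.example.com,200,0.012,0.045,0.132,0.247,0.253'
IFS=',' read -r ts name url code dns tcp tls ttfb total <<< "$line"
echo "$name returned HTTP $code with TTFB ${ttfb}s"
# → example returned HTTP 200 with TTFB 0.247s
```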

TLS Certificate Expiry Check

This is the one that's bitten me the most. Certbot silently fails, the cert expires, and suddenly your site is showing browser warnings to every visitor.

#!/usr/bin/env bash
# cert-check.sh — TLS certificate expiry monitor

WARN_DAYS=14

DOMAINS=(
  "www.example.com"
  "www.another-site.com"
  "blog.example.com"
)

for domain in "${DOMAINS[@]}"; do
  expiry=$(echo | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null \
    | cut -d= -f2)

  if [[ -z "$expiry" ]]; then
    echo "[error] $domain — couldn't fetch cert"
    continue
  fi

  expiry_epoch=$(date -d "$expiry" +%s 2>/dev/null || date -jf "%b %d %H:%M:%S %Y %Z" "$expiry" +%s 2>/dev/null)
  now_epoch=$(date +%s)
  days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

  if (( days_left < WARN_DAYS )); then
    echo "[warn] $domain — cert expires in ${days_left} days ($expiry)"
  else
    echo "[ok] $domain — cert valid for ${days_left} days"
  fi
done

Note on portability: The date parsing differs between Linux (date -d) and macOS (date -jf). The script tries both. If you're only on Linux, simplify to just date -d.
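If that one-liner feels cryptic, the two attempts can live in a small helper (to_epoch is my name for it, not a standard command):

```shell
# to_epoch — convert openssl's enddate string to epoch seconds.
# Tries GNU date (-d) first, then BSD/macOS date (-jf); prints nothing on failure.
to_epoch() {
  date -d "$1" +%s 2>/dev/null \
    || date -jf "%b %d %H:%M:%S %Y %Z" "$1" +%s 2>/dev/null
}

to_epoch "Jan  1 00:00:00 2030 GMT"   # → 1893456000
```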


Scheduling with Cron

I run the uptime check every 2 minutes and the cert check once daily:

*/2 * * * * /opt/scripts/site-check.sh >> /var/log/site-monitor/cron.log 2>&1
0 6 * * * /opt/scripts/cert-check.sh >> /var/log/site-monitor/cert-cron.log 2>&1

Every 2 minutes might sound aggressive, but curl with a 10-second timeout across a dozen sites finishes in under 5 seconds total. The server doesn't even notice.
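One refinement worth considering: if a cycle ever hangs (several sites timing out at once, say), cron will happily start the next run on top of it. Wrapping the job in flock, which ships with util-linux, makes an overlapping run skip instead:

```shell
*/2 * * * * flock -n /tmp/site-check.lock /opt/scripts/site-check.sh >> /var/log/site-monitor/cron.log 2>&1
```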


Notifications: Keep It Stupid Simple

I've tried Slack webhooks, PagerDuty, custom Discord bots. For a solo operation, I came back to the simplest option: Telegram Bot API.

NOTIFY_CMD='send_telegram'

send_telegram() {
  local msg="$1"
  local bot_token="YOUR_BOT_TOKEN"
  local chat_id="YOUR_CHAT_ID"

  curl -s -X POST "https://api.telegram.org/bot${bot_token}/sendMessage" \
    -d chat_id="$chat_id" \
    -d text="$msg" \
    -d parse_mode="Markdown" > /dev/null
}

Why Telegram?

  • Free, no rate limits for low volume.
  • Push notifications on my phone.
  • No app to maintain, no webhook server to keep running.

If you prefer Discord, Slack, or even plain email via sendmail, swap in your 5-line function. The interface is just a string going in.
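For example, a Discord version comes out about the same size. This is a hypothetical sketch — YOUR_WEBHOOK_URL is a placeholder, and Discord's webhook endpoint expects a JSON body with a content field:

```shell
# send_discord — hypothetical drop-in replacement for send_telegram.
send_discord() {
  local msg="$1"
  local webhook_url="YOUR_WEBHOOK_URL"   # placeholder — your Discord webhook URL
  local payload
  payload=$(printf '{"content": "%s"}' "$msg")   # naive escaping; fine for plain alert text
  curl -s -X POST "$webhook_url" \
    -H "Content-Type: application/json" \
    -d "$payload" > /dev/null
}
```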


Analyzing the Data

The CSV logs pile up. Here's how I actually use them.

Quick TTFB average for a site today

awk -F',' '{sum+=$8; n++} END {printf "Avg TTFB: %.3fs (%d checks)\n", sum/n, n}' \
  /var/log/site-monitor/example_2025-03-12.csv

Find all checks where TTFB exceeded 1 second

awk -F',' '$8 > 1.0 {print $1, $2, $8"s"}' /var/log/site-monitor/example_*.csv
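95th-percentile TTFB for a site today

Averages hide spikes, so I also look at the tail. A sketch as a small function (p95_ttfb is my name for it; assumes sort supports -g for general numeric ordering, which GNU and BSD sort both do; column 8 is TTFB):

```shell
# p95_ttfb FILE — print the 95th-percentile TTFB from a day's CSV log
p95_ttfb() {
  sort -t',' -k8,8 -g "$1" \
    | awk -F',' '{v[NR] = $8}
                 END {i = int(NR * 0.95); if (i < 1) i = 1; print "p95 TTFB: " v[i] "s"}'
}
```

Called as p95_ttfb /var/log/site-monitor/example_2025-03-12.csv.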

Weekly TTFB trend (average per day)

for f in /var/log/site-monitor/example_2025-03-*.csv; do
  day=$(basename "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
  avg=$(awk -F',' '{sum+=$8; n++} END {printf "%.3f", sum/n}' "$f")
  echo "$day $avg"
done

Output:

2025-03-06 0.234
2025-03-07 0.241
2025-03-08 0.228
2025-03-09 0.512
2025-03-10 0.519
2025-03-11 0.245
2025-03-12 0.238

See that spike on March 9–10? That was a runaway log rotation filling the disk. TTFB doubled. The site never went "down" — a binary up/down monitor would've missed it entirely.

That's the whole point of tracking TTFB over time. Degradation is gradual. By the time a site is "down," you've already been serving slow pages for days.


Log Rotation

Don't let your monitoring logs become the disk problem. Simple logrotate config:

/var/log/site-monitor/*.csv {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
}

90 days of compressed CSVs for a dozen sites is a few megabytes. Not worth optimizing further.


What I Intentionally Left Out

  • A web dashboard. If I need to visualize trends, I pipe the CSV into gnuplot or import into a spreadsheet. Building a dashboard is a project I don't need.
  • Multi-region checks. I'm monitoring from the origin server. This tells me if my server is healthy. For CDN/edge monitoring, you'd need external vantage points — and at that point, a SaaS tool makes more sense.
  • Incident management. It's just me. An alert hits my phone, I SSH in, I fix it. No escalation policies needed.
  • Database storage. CSV files with awk and grep handle everything at this scale. If I ever hit 100+ sites, I'd pipe into SQLite. But flat files are debug-friendly and zero-maintenance.

The Full Setup at a Glance

/opt/scripts/
├── site-check.sh       # uptime + TTFB (runs every 2 min)
├── cert-check.sh       # TLS expiry (runs daily)
└── notify.sh           # shared notification function

/var/log/site-monitor/
├── example_2025-03-12.csv
├── another_2025-03-12.csv
├── blog_2025-03-12.csv
├── alerts.log
└── cert-cron.log

Total disk footprint of the scripts: under 4KB.
Total runtime per check cycle: under 5 seconds.
Total dependencies: curl, openssl, bc, bash, cron. That's it.


When This Isn't Enough

Be honest about the limits. This setup stops making sense when:

  • You need multi-region monitoring (latency from Tokyo vs. Frankfurt matters)
  • You need status pages for customers
  • You have SLA obligations that require third-party verification
  • Your team is more than 2–3 people and needs shared dashboards

At that point, look at Uptime Kuma (self-hosted, has a UI) or bite the bullet on a paid SaaS.

But for a solo developer running a handful of sites? 60 lines of bash and a cron job is all you need.


Key Takeaways

  1. curl -w is criminally underused. It gives you DNS, TCP, TLS, and TTFB timing in one call.
  2. TTFB trending matters more than uptime percentage. A site can be "up" and still painfully slow.
  3. CSV + cron + awk is a real monitoring stack at small scale. Don't over-engineer.
  4. Check your TLS certs. Certbot failures are silent. Don't learn this the hard way.
  5. Keep alerting simple. A Telegram/Discord message beats a dashboard you never open.

The scripts in this post are simplified versions of what I actually run. Feel free to adapt them. If you improve on them, I'd love to hear about it in the comments.
