Why I Stopped Using UptimeRobot
Let me be upfront: UptimeRobot is fine. Pingdom is fine. Better Stack is fine. For most people, a free-tier SaaS monitor is the right call.
But I manage about a dozen small static and semi-static sites — mostly content sites behind Cloudflare. Here's what bugged me about the hosted monitoring route:
- Free tiers cap at 5–10 monitors. I have 12+ sites and want to check multiple endpoints per site.
- Check intervals are 5 minutes at best. That's an eternity if your origin server goes down and CF cache expires.
- No TTFB tracking over time. Most free monitors tell you "up" or "down" — they don't track whether your Time to First Byte is slowly creeping from 200ms to 1.8s.
- Alert fatigue from false positives. Hosted monitors ping from external IPs that occasionally get rate-limited or geo-blocked. I'd get 3am alerts for sites that were perfectly fine.
- One more dashboard I never check. I already live in the terminal. Adding another browser tab felt wrong.
So I built something dumb and simple. It runs on the same box that serves the sites, costs nothing, and does exactly what I need.
What I Actually Monitor
For each site, I care about three things:
- Is it responding with HTTP 200? (uptime)
- What's the TTFB from the origin? (performance baseline)
- Is the TLS cert expiring soon? (because Let's Encrypt renewals fail silently more often than you'd think)
That's it. No waterfall charts, no RUM, no synthetic transactions. Just the basics.
The Core: curl Does Everything
Here's the thing most people don't realize: curl has a built-in timing breakdown that gives you more detail than most monitoring dashboards.
curl -o /dev/null -s -w "%{http_code} %{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n" https://example.com
This outputs something like:
200 0.012 0.045 0.132 0.247 0.253
Those numbers, in order:
| Variable | Meaning |
|---|---|
| time_namelookup | DNS resolution |
| time_connect | TCP handshake complete |
| time_appconnect | TLS handshake complete |
| time_starttransfer | TTFB — first byte received |
| time_total | Full response downloaded |
time_starttransfer is your TTFB. That single number tells you more about your server's health than a green/red dot on a dashboard.
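These timers are cumulative from the start of the request, so per-phase durations come from subtracting adjacent values. A quick awk sketch over the sample output above:

```shell
# Split cumulative curl timings into per-phase durations.
# Fields: http_code namelookup connect appconnect starttransfer total
echo "200 0.012 0.045 0.132 0.247 0.253" | awk '{
  printf "dns %.3f  tcp %.3f  tls %.3f  wait %.3f  transfer %.3f\n",
    $2, $3 - $2, $4 - $3, $5 - $4, $6 - $5
}'
# dns 0.012  tcp 0.033  tls 0.087  wait 0.115  transfer 0.006
```

The "wait" phase (starttransfer minus appconnect) is where a slow origin shows up.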
The Script
Here's the actual script I run. It's about 60 lines of bash. No dependencies beyond curl, date, and bc.
#!/usr/bin/env bash
# site-check.sh — lightweight uptime + TTFB monitor
set -euo pipefail
# === Config ===
LOG_DIR="/var/log/site-monitor"
ALERT_TTFB=2.0 # seconds — alert if TTFB exceeds this
ALERT_TOTAL=5.0 # seconds — alert if total time exceeds this
NOTIFY_CMD="" # set to your notification command (see below)
# Sites to check: URL and a friendly name
SITES=(
  "https://www.example.com|example"
  "https://www.another-site.com|another"
  "https://blog.example.com|blog"
)
mkdir -p "$LOG_DIR"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
DATE_FILE=$(date -u +"%Y-%m-%d")
for entry in "${SITES[@]}"; do
  IFS='|' read -r url name <<< "$entry"

  # Perform the check
  result=$(curl -o /dev/null -s -w "%{http_code}|%{time_namelookup}|%{time_connect}|%{time_appconnect}|%{time_starttransfer}|%{time_total}" \
    --max-time 10 \
    --connect-timeout 5 \
    -H "Cache-Control: no-cache" \
    -H "User-Agent: SiteMonitor/1.0" \
    "$url" 2>/dev/null) || result="000|0|0|0|0|0"

  IFS='|' read -r code dns tcp tls ttfb total <<< "$result"

  # Log every check as CSV
  echo "$TIMESTAMP,$name,$url,$code,$dns,$tcp,$tls,$ttfb,$total" \
    >> "$LOG_DIR/${name}_${DATE_FILE}.csv"

  # Determine status
  status="ok"
  reason=""
  if [[ "$code" != "200" && "$code" != "301" && "$code" != "302" ]]; then
    status="down"
    reason="HTTP $code"
  elif (( $(echo "$ttfb > $ALERT_TTFB" | bc -l) )); then
    status="slow"
    reason="TTFB ${ttfb}s"
  elif (( $(echo "$total > $ALERT_TOTAL" | bc -l) )); then
    status="slow"
    reason="Total ${total}s"
  fi

  # Alert if not ok
  if [[ "$status" != "ok" ]]; then
    msg="[$status] $name ($url) — $reason at $TIMESTAMP"
    echo "$msg" >> "$LOG_DIR/alerts.log"
    if [[ -n "$NOTIFY_CMD" ]]; then
      eval "$NOTIFY_CMD '$msg'"
    fi
  fi
done
What's happening:
- Each site gets checked with `curl`, bypassing cache with the `Cache-Control: no-cache` header.
- Results are logged as CSV — one file per site per day. Easy to grep, easy to chart later.
- If the HTTP status isn't 200/301/302, or if TTFB or total time exceeds its threshold, it logs an alert.
- The `NOTIFY_CMD` hook is where you plug in whatever alerting you prefer.
TLS Certificate Expiry Check
This is the one that's bitten me the most. Certbot silently fails, the cert expires, and suddenly your site is showing browser warnings to every visitor.
#!/usr/bin/env bash
# cert-check.sh — TLS certificate expiry monitor
WARN_DAYS=14
DOMAINS=(
  "www.example.com"
  "www.another-site.com"
  "blog.example.com"
)
for domain in "${DOMAINS[@]}"; do
  expiry=$(echo | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null \
    | cut -d= -f2)

  if [[ -z "$expiry" ]]; then
    echo "[error] $domain — couldn't fetch cert"
    continue
  fi

  expiry_epoch=$(date -d "$expiry" +%s 2>/dev/null || date -jf "%b %d %H:%M:%S %Y %Z" "$expiry" +%s 2>/dev/null)
  now_epoch=$(date +%s)
  days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

  if (( days_left < WARN_DAYS )); then
    echo "[warn] $domain — cert expires in ${days_left} days ($expiry)"
  else
    echo "[ok] $domain — cert valid for ${days_left} days"
  fi
done
Note on portability: The `date` parsing differs between Linux (`date -d`) and macOS (`date -jf`). The script tries both. If you're only on Linux, simplify to just `date -d`.
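If you'd rather factor that fallback out, a portable helper might look like this (sketch; `to_epoch` is a name I'm making up, not part of the script above):

```shell
# Hypothetical helper: parse openssl's notAfter date on GNU or BSD date.
# Tries GNU syntax first, falls back to the BSD strptime form.
to_epoch() {
  date -d "$1" +%s 2>/dev/null \
    || date -jf "%b %d %H:%M:%S %Y %Z" "$1" +%s
}

to_epoch "Jun  1 12:00:00 2030 GMT"
```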
Scheduling with Cron
I run the uptime check every 2 minutes and the cert check once daily:
*/2 * * * * /opt/scripts/site-check.sh >> /var/log/site-monitor/cron.log 2>&1
0 6 * * * /opt/scripts/cert-check.sh >> /var/log/site-monitor/cert-cron.log 2>&1
Every 2 minutes might sound aggressive, but curl with a 10-second timeout across a dozen sites finishes in under 5 seconds total. The server doesn't even notice.
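One caveat: if a check ever hangs near the timeout ceiling, back-to-back cron runs can overlap. Wrapping the entry in flock (part of util-linux on most distros; the lock path here is my choice) skips a run while the previous one is still going:

```shell
*/2 * * * * flock -n /tmp/site-check.lock /opt/scripts/site-check.sh >> /var/log/site-monitor/cron.log 2>&1
```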
Notifications: Keep It Stupid Simple
I've tried Slack webhooks, PagerDuty, custom Discord bots. For a solo operation, I came back to the simplest option: Telegram Bot API.
NOTIFY_CMD='send_telegram'
send_telegram() {
  local msg="$1"
  local bot_token="YOUR_BOT_TOKEN"
  local chat_id="YOUR_CHAT_ID"
  curl -s -X POST "https://api.telegram.org/bot${bot_token}/sendMessage" \
    -d chat_id="$chat_id" \
    -d text="$msg" \
    -d parse_mode="Markdown" > /dev/null
}
Why Telegram?
- Free, no rate limits for low volume.
- Push notifications on my phone.
- No app to maintain, no webhook server to keep running.
If you prefer Discord, Slack, or even plain email via sendmail, swap in your 5-line function. The interface is just a string going in.
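A Discord variant would look something like this (sketch; the webhook URL is a placeholder, and quotes in the message would need proper JSON escaping before production use):

```shell
# Hypothetical Discord equivalent of send_telegram.
# Discord webhooks take a JSON body with a "content" field.
send_discord() {
  local msg="$1"
  local webhook_url="https://discord.com/api/webhooks/YOUR_WEBHOOK"
  curl -s -X POST "$webhook_url" \
    -H "Content-Type: application/json" \
    -d "{\"content\": \"${msg}\"}" > /dev/null
}
```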
Analyzing the Data
The CSV logs pile up. Here's how I actually use them.
Quick TTFB average for a site today
awk -F',' '{sum+=$8; n++} END {printf "Avg TTFB: %.3fs (%d checks)\n", sum/n, n}' \
/var/log/site-monitor/example_2025-03-12.csv
Find all checks where TTFB exceeded 1 second
awk -F',' '$8 > 1.0 {print $1, $2, $8"s"}' /var/log/site-monitor/example_*.csv
Weekly TTFB trend (average per day)
for f in /var/log/site-monitor/example_2025-03-*.csv; do
  day=$(basename "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
  avg=$(awk -F',' '{sum+=$8; n++} END {printf "%.3f", sum/n}' "$f")
  echo "$day $avg"
done
Output:
2025-03-06 0.234
2025-03-07 0.241
2025-03-08 0.228
2025-03-09 0.512
2025-03-10 0.519
2025-03-11 0.245
2025-03-12 0.238
See that spike on March 9–10? That was a runaway log rotation filling the disk. TTFB doubled. The site never went "down" — a binary up/down monitor would've missed it entirely.
That's the whole point of tracking TTFB over time. Degradation is gradual. By the time a site is "down," you've already been serving slow pages for days.
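Averages can also hide tail latency, so a percentile over the day's checks is sometimes more telling. A small sketch (`p95_ttfb` is my name, not part of the script above):

```shell
# 95th-percentile TTFB from a day's CSV (ttfb is column 8).
p95_ttfb() {
  awk -F',' '{print $8}' "$1" \
    | sort -n \
    | awk '{v[NR] = $1} END {if (NR) print v[int(NR * 0.95)]}'
}
```

Run it as `p95_ttfb /var/log/site-monitor/example_2025-03-12.csv`.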
Log Rotation
Don't let your monitoring logs become the disk problem. Simple logrotate config:
/var/log/site-monitor/*.csv {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
}
90 days of compressed CSVs for a dozen sites is a few megabytes. Not worth optimizing further.
What I Intentionally Left Out
- A web dashboard. If I need to visualize trends, I pipe the CSV into `gnuplot` or import it into a spreadsheet. Building a dashboard is a project I don't need.
- Multi-region checks. I'm monitoring from the origin server. This tells me if my server is healthy. For CDN/edge monitoring, you'd need external vantage points — and at that point, a SaaS tool makes more sense.
- Incident management. It's just me. An alert hits my phone, I SSH in, I fix it. No escalation policies needed.
- Database storage. CSV files with `awk` and `grep` handle everything at this scale. If I ever hit 100+ sites, I'd pipe into SQLite. But flat files are debug-friendly and zero-maintenance.
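For reference, that SQLite handoff is only a few lines. A sketch (table and column names are my own, chosen to match the CSV column order the script writes; the sample row stands in for a real log file):

```shell
# Load a day's CSV into SQLite for ad-hoc SQL (sqlite3 CLI assumed).
printf '2025-03-12T00:00:00Z,example,https://www.example.com,200,0.012,0.045,0.132,0.247,0.253\n' > sample.csv

sqlite3 checks.db <<'SQL'
CREATE TABLE IF NOT EXISTS checks (
  ts TEXT, name TEXT, url TEXT, code INTEGER,
  dns REAL, tcp REAL, tls REAL, ttfb REAL, total REAL
);
.mode csv
.import sample.csv checks
SELECT name, round(avg(ttfb), 3) AS avg_ttfb FROM checks GROUP BY name;
SQL
```

In practice you'd point `.import` at the files under `/var/log/site-monitor/` instead of a sample row.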
The Full Setup at a Glance
/opt/scripts/
├── site-check.sh # uptime + TTFB (runs every 2 min)
├── cert-check.sh # TLS expiry (runs daily)
└── notify.sh # shared notification function
/var/log/site-monitor/
├── example_2025-03-12.csv
├── another_2025-03-12.csv
├── blog_2025-03-12.csv
├── alerts.log
└── cert-cron.log
Total disk footprint of the scripts: under 4KB.
Total runtime per check cycle: under 5 seconds.
Total dependencies: curl, openssl, bash, cron. That's it.
When This Isn't Enough
Be honest about the limits. This setup stops making sense when:
- You need multi-region monitoring (latency from Tokyo vs. Frankfurt matters)
- You need status pages for customers
- You have SLA obligations that require third-party verification
- Your team is more than 2–3 people and needs shared dashboards
At that point, look at Uptime Kuma (self-hosted, has a UI) or bite the bullet on a paid SaaS.
But for a solo developer running a handful of sites? 60 lines of bash and a cron job is all you need.
Key Takeaways
- `curl -w` is criminally underused. It gives you DNS, TCP, TLS, and TTFB timing in one call.
- TTFB trending matters more than uptime percentage. A site can be "up" and still painfully slow.
- CSV + cron + awk is a real monitoring stack at small scale. Don't over-engineer.
- Check your TLS certs. Certbot failures are silent. Don't learn this the hard way.
- Keep alerting simple. A Telegram/Discord message beats a dashboard you never open.
The scripts in this post are simplified versions of what I actually run. Feel free to adapt them. If you improve on them, I'd love to hear about it in the comments.