We've all been there. Your website goes down at 3 AM. MySQL crashed. NGINX stopped responding. And you're scrambling to SSH into the server while your phone buzzes with angry customer emails.
Then someone suggests: "You should use Prometheus + Grafana + Alertmanager + PagerDuty!"
Sure. Or... hear me out... you could just use a 100-line bash script that checks your sites every minute and restarts services automatically when they fail.
The Problem with Enterprise Monitoring
Don't get me wrong - tools like Datadog, New Relic, and Prometheus are amazing. But they're also:
- Overkill for small projects
- Expensive for startups
- Complex to set up and maintain
- Slow to deploy (days or weeks of configuration)
- Demanding to learn (new query languages, new dashboards)
Meanwhile, your website is still down.
Enter: The 100-Line Solution
What if monitoring could be this simple?
# 1. Add your websites
echo "https://example.com" >> sites.txt
# 2. Install
sudo ./install.sh
# 3. Done. Seriously.
That's it. Every minute, your server now:
- Checks whether your websites respond
- Detects if services are overwhelmed (not just down!)
- Automatically restarts MySQL, NGINX, or Apache
- Logs only failures (no wasted disk space)
- Tracks failure counts to avoid false positives
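The per-minute site check itself only needs curl. Here's a minimal sketch of what that check might look like (the function name and `TIMEOUT` default are illustrative, not necessarily what the actual script uses):

```shell
# Sketch of a per-site HTTP check (names are illustrative).
check_site() {
    local url="$1"
    local code
    # -s: silent, -o: discard the body, -w: print only the HTTP status code.
    # curl reports 000 when it cannot connect at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "${TIMEOUT:-10}" "$url" || true)
    [[ "$code" == "200" ]]
}
```

A site that times out, refuses connections, or returns a 5xx all fail the same test, which is exactly what you want for a "restart it" trigger.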
How It Works (The Smart Part)
Most monitoring tools just check if a service is "running." That's not enough.
Here's what makes this script intelligent:
1. Load-Based Detection
# Don't just check if MySQL is running...
# Check if it's actually RESPONSIVE
check_mysql_health() {
    # Try to ping MySQL
    if timeout 3 mysqladmin ping >/dev/null 2>&1; then
        # It's alive! But is it overwhelmed?
        current_connections=$(mysqladmin status | grep -oP 'Threads: \K\d+')
        if [[ "$current_connections" -gt 150 ]]; then
            # Too many connections - restart before it crashes
            return 1
        fi
        return 0  # Responsive and under the connection threshold
    fi
    return 1  # Not responding at all
}
Your site can be down even when services show as "running" - when they're overloaded with traffic or locked up processing queries.
2. Advanced Health Checks
# NGINX example: Test config + connectivity + load
check_nginx_health() {
    # 1. Validate config before trying to use it
    nginx -t >/dev/null 2>&1 || return 1
    # 2. Can it accept connections?
    timeout 2 bash -c "echo > /dev/tcp/localhost/80" || return 1
    # 3. Is it drowning in connections? (needs the stub_status endpoint)
    active_conn=$(curl -s http://localhost/nginx_status | grep -oP 'Active connections: \K\d+')
    [[ -n "$active_conn" && "$active_conn" -gt 1000 ]] && return 1
    return 0 # All good!
}
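The post only shows the MySQL and NGINX checks, but since the config exposes an `APACHE_MAX_WORKERS` threshold, an Apache check would follow the same pattern. Here's a hedged sketch (it assumes mod_status is enabled at `/server-status`; the helper name `busy_workers` is my own):

```shell
# Parse BusyWorkers from mod_status machine-readable (?auto) output.
busy_workers() {
    grep -oP 'BusyWorkers: \K\d+' <<< "$1"
}

check_apache_health() {
    # 1. Validate config first
    apachectl configtest >/dev/null 2>&1 || return 1
    # 2. Fetch the machine-readable status page (requires mod_status)
    local status busy
    status=$(curl -s --max-time 2 "http://localhost/server-status?auto") || return 1
    # 3. Too many busy workers? Restart before requests start queueing.
    busy=$(busy_workers "$status")
    [[ -n "$busy" && "$busy" -gt "${APACHE_MAX_WORKERS:-150}" ]] && return 1
    return 0
}
```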
3. Smart Recovery Logic
# Only restart after 3 consecutive failures (avoid false positives)
if [[ "$current_failures" -ge 3 ]]; then
    # Restart services in order: database first, then web server
    for service in "${SERVICES[@]}"; do
        systemctl restart "$service"
    done
fi
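"Consecutive failures" implies some per-site state surviving between cron runs. One minimal way to do that is with flat files keyed by a hash of the URL; the paths and function names below are my own sketch, not necessarily what the repo does:

```shell
# Per-site failure counters stored as tiny state files (illustrative paths).
STATE_DIR="${STATE_DIR:-/tmp/site-monitor-state}"
mkdir -p "$STATE_DIR"

# Hash the URL so it's safe to use as a filename.
state_key() { echo -n "$1" | md5sum | cut -d' ' -f1; }

failure_count() {
    cat "$STATE_DIR/$(state_key "$1")" 2>/dev/null || echo 0
}

record_failure() {
    local n; n=$(failure_count "$1")
    echo $((n + 1)) > "$STATE_DIR/$(state_key "$1")"
}

record_success() {
    # A single success resets the streak.
    rm -f "$STATE_DIR/$(state_key "$1")"
}
```

Because state lives in files, the counter survives between one-shot cron invocations with no daemon and no database.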
Real-World Example
Let's say your e-commerce site suddenly gets featured on Reddit (congrats!). Traffic spikes 10x:
Traditional Monitoring:
- Dashboards show high CPU/memory
- Alerts fire
- You get paged
- You wake up, investigate, manually restart services
- Lost sales during downtime
This Script:
- Detects MySQL has 200 active connections (threshold: 150)
- Automatically restarts MySQL in 3 seconds
- Logs: "MySQL OVERLOADED (200 connections) - restarted"
- You stay asleep
- Sales continue
Installation (Seriously, It's This Easy)
# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/site-monitor.git
cd site-monitor
# 2. Add your websites
cat > sites.txt << EOF
https://example.com
https://api.example.com
https://www.example.com
EOF
# 3. Optional: Customize thresholds
vim config.conf # Adjust MySQL/NGINX/Apache thresholds
# 4. Install (creates cron job, sets up logging)
sudo ./install.sh
# 5. Watch it work
sudo tail -f /var/log/site-monitor/monitor.log
Output:
[2025-10-20 14:23:45] FAILURE: https://example.com - HTTP 000 (1/3 failures)
[2025-10-20 14:24:45] FAILURE: https://example.com - HTTP 000 (2/3 failures)
[2025-10-20 14:25:45] FAILURE: https://example.com - HTTP 000 (3/3 failures)
[2025-10-20 14:25:46] RECOVERY: Starting recovery for https://example.com
[2025-10-20 14:25:47] RECOVERY: MySQL OVERLOADED (187 connections) - restarted
[2025-10-20 14:25:49] RECOVERY: NGINX responsive - no action needed
[2025-10-20 14:25:50] RECOVERY: Recovery completed
[2025-10-20 14:26:45] SUCCESS: https://example.com back online (HTTP 200)
Configuration Options
Everything is configurable in config.conf:
# HTTP Settings
TIMEOUT=10 # Request timeout in seconds
FAILURE_THRESHOLD=3 # Failures before recovery
# Services to manage (in order)
SERVICES=("mysql" "nginx") # Or: ("mysql" "apache2")
# Load Thresholds
MYSQL_MAX_CONNECTIONS=150 # Restart if connections exceed this
NGINX_MAX_CONNECTIONS=1000 # Restart if connections exceed this
APACHE_MAX_WORKERS=150 # Restart if busy workers exceed this
# Logging
LOG_SUCCESS=false # Only log failures (save disk space)
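Since config.conf is just bash variable assignments, the main script presumably sources it on startup. A sketch of loading it with safe defaults (the defaults mirror the values above; `CONFIG_FILE` is my naming):

```shell
# Defaults, mirroring the documented values; config.conf overrides them.
TIMEOUT=10
FAILURE_THRESHOLD=3
SERVICES=("mysql" "nginx")
MYSQL_MAX_CONNECTIONS=150
NGINX_MAX_CONNECTIONS=1000
APACHE_MAX_WORKERS=150
LOG_SUCCESS=false

# Pull in user overrides if the file exists.
CONFIG_FILE="${CONFIG_FILE:-./config.conf}"
if [[ -f "$CONFIG_FILE" ]]; then
    # shellcheck source=/dev/null
    source "$CONFIG_FILE"
fi
```

The upside of "config is code" here: no parser to write, and the script still runs with sensible defaults if the file is missing.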
When to Use This vs. Enterprise Tools
Use This Simple Script When:
- You have < 50 websites to monitor
- You're on a budget (it's free!)
- You need it deployed TODAY
- You manage your own Ubuntu servers
- You want to understand what's happening (no black box)
Use Enterprise Tools When:
- You need fancy dashboards and metrics
- You have distributed microservices
- You have a dedicated DevOps team
- You need compliance/audit trails
- You need integration with 50+ other tools
Performance & Resource Usage
This script is incredibly lightweight:
- CPU: Near zero (runs for ~1 second per minute)
- Memory: ~5MB
- Disk: <1MB logs per month (with default settings)
- Network: One HTTP GET per site per minute
Compare that to running Prometheus + Grafana (hundreds of MB of RAM).
Production-Ready Features
Don't let the simplicity fool you - this runs in production:
- State Tracking: Counts consecutive failures per site
- Log Rotation: Yearly rotation via logrotate
- Error Handling: Graceful failures, timeout protection
- No Dependencies: Just bash + curl + systemctl (already on Ubuntu)
- Tested: Works on Ubuntu 22.04 LTS
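For reference, "yearly rotation via logrotate" corresponds to a drop-in policy along these lines. The exact file install.sh writes may differ; this is a plausible sketch of the config fragment:

```shell
# Hypothetical logrotate policy written by install.sh (requires root).
cat > /etc/logrotate.d/site-monitor << 'EOF'
/var/log/site-monitor/monitor.log {
    yearly
    rotate 3
    compress
    missingok
    notifempty
}
EOF
```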
Advanced Use Cases
Multi-Server Deployment
Deploy to multiple servers with different site lists:
# Server 1: Monitor frontend sites
echo "https://app.example.com" > sites.txt
# Server 2: Monitor API endpoints
echo "https://api.example.com" > sites.txt
# Server 3: Monitor admin tools
echo "https://admin.example.com" > sites.txt
Custom Services
Not just MySQL/NGINX! Add any systemd service:
# Add Redis, PHP-FPM, whatever you need
SERVICES=("mysql" "nginx" "redis-server" "php8.1-fpm")
Integration with Existing Tools
Still want Slack notifications? Just add a webhook:
# In monitor.sh, inside the failure-handling branch:
curl -s -X POST "YOUR_SLACK_WEBHOOK" \
    -H 'Content-type: application/json' \
    -d "{\"text\":\"$url is down! Auto-recovering...\"}"
The Philosophy: Simple > Complex
This project follows the Unix philosophy:
- Do one thing well
- Use plain text for data
- Build small, composable tools
Your monitoring doesn't need to be fancy. It needs to:
- Detect failures
- Fix them automatically
- Tell you what happened
Mission accomplished in 100 lines of bash.
Try It Yourself
The code is open source (MIT License):
GitHub: https://github.com/sgumz/site-monitor
Installation takes 2 minutes. Give it a try!
Closing Thoughts
Sometimes the best solution isn't the one with the most features - it's the one that solves your problem today without creating new ones.
Could this bash script replace Datadog for a Fortune 500 company? No.
Could it save your small SaaS business from 3 AM wake-up calls? Absolutely.
What's your take? Do you prefer simple scripts or enterprise monitoring? Any horror stories about over-engineered solutions? Drop a comment below!