Tired of silent cron failures? Here's a lightweight Bash-based solution to monitor and alert on your scheduled tasks across multiple servers.
Why Your Cron Jobs Need Monitoring
We’ve all been there. You set up a "mission-critical" backup or data sync as a cron job, and then you forget about it. Six months later, you realize it hasn't run in weeks because of a silent failure, a disk space issue, or an SSH key change.
Cron is great for execution, but it's terrible at visibility.
In this post, I'll walk through a lightweight, Bash-powered monitoring solution I built to keep tabs on cron jobs across multiple servers without needing a heavy agent like Zabbix or Datadog.
🛠 The Architecture
The goal was simple:
- Discover jobs automatically from remote servers via SSH.
- Monitor their last execution time.
- Alert via Slack or Email if a job is "missed" or overdue.
- Test everything locally using Docker.
┌───────────────────────────────┐
│ Monitoring Host │
│ (Bash + cron_health_monitor) │
└───────────────┬───────────────┘
│
┌───────┴───────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ Server A │ │ Server B │
│ (Docker) │ │ (Docker) │
└──────────┘ └──────────┘
🚀 Key Features
1. Automated SSH Discovery
The script scans crontab -l on remote servers, parses the schedules, and automatically adds them to its tracking list. No more manual entry for every single task.
2. State-Based Tracking
Instead of checking logs (which can be messy), the monitor looks at "last run" timestamps. Jobs can report their own completion via a simple CLI command:
./cron_health_monitor.sh record backup_job server1
3. Dockerized Testing Environment
To ensure the monitor works before deploying it to production, I included a docker-compose.yml that spins up three Ubuntu servers. This allows you to simulate real-world cron failures in a safe sandbox.
🧩 The "Aha!" Moments (and Bugs)
While building this, I ran into a few classic engineering hurdles:
- The Date Dilemma: I initially used BSD-style
datecommands (macOS default), which broke completely on the Linux target servers. I had to switch to the more universal GNUdate -dsyntax. - Docker Networking: On macOS, connecting to
localhostinside a container can be tricky. Switching to127.0.0.1fixed several "Host not found" errors. - Silent Failures: I learned that if the SSH connection fails, the script should alert on the connection failure, not just the missing cron job.
🛠 Quick Start
If you want to try it out:
- Clone & Setup:
./setup.sh # Generates keys & starts Docker test servers
- Discover Jobs:
./cron_health_monitor.sh discover
- Run Health Check:
./cron_health_monitor.sh check
💡 Lessons Learned
Bash is incredibly powerful for infrastructure glue code. By combining standard tools like ssh, awk, and grep, you can build a monitoring system that is:
- Zero-agent: Nothing to install on target servers.
- Low-overhead: Runs in milliseconds.
- Portable: Works on almost any Linux distro.
Check out the full source code and my "bug log" in the repository!
https://github.com/alanvarghese-dev/Bash_Scripting/tree/main/cron_job_health_monitor
What are you using to monitor your legacy cron jobs? Let's discuss in the comments!
Top comments (0)