DEV Community

Alan Varghese
Alan Varghese

Posted on

How I Built a Lightweight Cron Job Health Monitor with Bash and Docker

Tired of silent cron failures? Here's a lightweight Bash-based solution to monitor and alert on your scheduled tasks across multiple servers.

Why Your Cron Jobs Need Monitoring

We’ve all been there. You set up a "mission-critical" backup or data sync as a cron job, and then you forget about it. Six months later, you realize it hasn't run in weeks because of a silent failure, a disk space issue, or an SSH key change.

Cron is great for execution, but it's terrible at visibility.

In this post, I'll walk through a lightweight, Bash-powered monitoring solution I built to keep tabs on cron jobs across multiple servers without needing a heavy agent like Zabbix or Datadog.

🛠 The Architecture

The goal was simple:

  1. Discover jobs automatically from remote servers via SSH.
  2. Monitor their last execution time.
  3. Alert via Slack or Email if a job is "missed" or overdue.
  4. Test everything locally using Docker.
┌───────────────────────────────┐
│        Monitoring Host        │
│  (Bash + cron_health_monitor) │
└───────────────┬───────────────┘
                │
        ┌───────┴───────┐
        ▼               ▼
  ┌──────────┐    ┌──────────┐
  │ Server A │    │ Server B │
  │ (Docker) │    │ (Docker) │
  └──────────┘    └──────────┘
Enter fullscreen mode Exit fullscreen mode

🚀 Key Features

1. Automated SSH Discovery

The script scans crontab -l on remote servers, parses the schedules, and automatically adds them to its tracking list. No more manual entry for every single task.

2. State-Based Tracking

Instead of checking logs (which can be messy), the monitor looks at "last run" timestamps. Jobs can report their own completion via a simple CLI command:

./cron_health_monitor.sh record backup_job server1
Enter fullscreen mode Exit fullscreen mode

3. Dockerized Testing Environment

To ensure the monitor works before deploying it to production, I included a docker-compose.yml that spins up three Ubuntu servers. This allows you to simulate real-world cron failures in a safe sandbox.

🧩 The "Aha!" Moments (and Bugs)

While building this, I ran into a few classic engineering hurdles:

  • The Date Dilemma: I initially used BSD-style date commands (macOS default), which broke completely on the Linux target servers. I had to switch to the more universal GNU date -d syntax.
  • Docker Networking: On macOS, connecting to localhost inside a container can be tricky. Switching to 127.0.0.1 fixed several "Host not found" errors.
  • Silent Failures: I learned that if the SSH connection fails, the script should alert on the connection failure, not just the missing cron job.

🛠 Quick Start

If you want to try it out:

  1. Clone & Setup:
   ./setup.sh # Generates keys & starts Docker test servers
Enter fullscreen mode Exit fullscreen mode
  1. Discover Jobs:
   ./cron_health_monitor.sh discover
Enter fullscreen mode Exit fullscreen mode
  1. Run Health Check:
   ./cron_health_monitor.sh check
Enter fullscreen mode Exit fullscreen mode

💡 Lessons Learned

Bash is incredibly powerful for infrastructure glue code. By combining standard tools like ssh, awk, and grep, you can build a monitoring system that is:

  • Zero-agent: Nothing to install on target servers.
  • Low-overhead: Runs in milliseconds.
  • Portable: Works on almost any Linux distro.

Check out the full source code and my "bug log" in the repository!

https://github.com/alanvarghese-dev/Bash_Scripting/tree/main/cron_job_health_monitor


What are you using to monitor your legacy cron jobs? Let's discuss in the comments!

Top comments (1)

Collapse
 
palmaguer profile image
Pablo Almaguer

This looks pretty awesome. I'll definitely check out your repo!