How I Built a Lightweight Cron Job Health Monitor with Bash and Docker

#devops #automation #bash #linux

Tired of silent cron failures? Here's a lightweight Bash-based solution to monitor and alert on your scheduled tasks across multiple servers.

Why Your Cron Jobs Need Monitoring

We’ve all been there. You set up a "mission-critical" backup or data sync as a cron job, and then you forget about it. Six months later, you realize it hasn't run in weeks because of a silent failure, a disk space issue, or an SSH key change.

Cron is great for execution, but it's terrible at visibility.

In this post, I'll walk through a lightweight, Bash-powered monitoring solution I built to keep tabs on cron jobs across multiple servers without needing a heavy agent like Zabbix or Datadog.

🛠 The Architecture

The goal was simple:

Discover jobs automatically from remote servers via SSH.
Monitor their last execution time.
Alert via Slack or Email if a job is "missed" or overdue.
Test everything locally using Docker.

┌───────────────────────────────┐
│        Monitoring Host        │
│  (Bash + cron_health_monitor) │
└───────────────┬───────────────┘
                │
        ┌───────┴───────┐
        ▼               ▼
  ┌──────────┐    ┌──────────┐
  │ Server A │    │ Server B │
  │ (Docker) │    │ (Docker) │
  └──────────┘    └──────────┘

🚀 Key Features

1. Automated SSH Discovery

The script scans crontab -l on remote servers, parses the schedules, and automatically adds them to its tracking list. No more manual entry for every single task.

2. State-Based Tracking

Instead of checking logs (which can be messy), the monitor looks at "last run" timestamps. Jobs can report their own completion via a simple CLI command:

./cron_health_monitor.sh record backup_job server1

3. Dockerized Testing Environment

To ensure the monitor works before deploying it to production, I included a docker-compose.yml that spins up three Ubuntu servers. This allows you to simulate real-world cron failures in a safe sandbox.

🧩 The "Aha!" Moments (and Bugs)

While building this, I ran into a few classic engineering hurdles:

The Date Dilemma: I initially used BSD-style date commands (macOS default), which broke completely on the Linux target servers. I had to switch to the more universal GNU date -d syntax.
Docker Networking: On macOS, connecting to localhost inside a container can be tricky. Switching to 127.0.0.1 fixed several "Host not found" errors.
Silent Failures: I learned that if the SSH connection fails, the script should alert on the connection failure, not just the missing cron job.

🛠 Quick Start

If you want to try it out:

Clone & Setup:

   ./setup.sh # Generates keys & starts Docker test servers

Discover Jobs:

   ./cron_health_monitor.sh discover

Run Health Check:

   ./cron_health_monitor.sh check

💡 Lessons Learned

Bash is incredibly powerful for infrastructure glue code. By combining standard tools like ssh, awk, and grep, you can build a monitoring system that is:

Zero-agent: Nothing to install on target servers.
Low-overhead: Runs in milliseconds.
Portable: Works on almost any Linux distro.

Check out the full source code and my "bug log" in the repository!

https://github.com/alanvarghese-dev/Bash_Scripting/tree/main/cron_job_health_monitor

What are you using to monitor your legacy cron jobs? Let's discuss in the comments!