DEV Community

SIGNAL


Automate Docker Container Health Checks With a 50-Line Bash Script

If you're running any self-hosted services — a media server, a VPN, a database, a reverse proxy — you already know the silent killer: a container that's technically "Up" but completely broken inside. docker ps shows green. Your users see errors. You find out an hour later.

This article shows you how to build a real health-check automation script that goes beyond docker ps and actually verifies your containers are working.


The Problem With docker ps

Docker's default status is based on process state, not service health. A container running Nginx can be "Up" while Nginx itself has crashed and a supervisor process keeps restarting it in a loop. A Postgres container can be "Up" while it is still initializing and refusing connections.

Docker has a built-in HEALTHCHECK instruction for Dockerfiles, but:

  • Most third-party images don't define one
  • It only checks per-container, not across your whole stack
  • It doesn't send you alerts

What we want: a script that polls all running containers, checks if they're healthy, and fires a notification if something's wrong.
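
For a quick look at the gap between process state and health, note that docker ps appends the health state in parentheses to its STATUS column, but only for containers whose image defines a HEALTHCHECK. The sketch below parses such a status string; health_from_status is a hypothetical helper name, and the sample strings are illustrative rather than taken from a real host.

```shell
#!/usr/bin/env bash
# Extract the health state that `docker ps` appends to its STATUS column.
# health_from_status is a hypothetical helper name used for illustration.
health_from_status() {
  local status="$1"
  case "$status" in
    *"(healthy)"*)          echo "healthy" ;;
    *"(unhealthy)"*)        echo "unhealthy" ;;
    *"(health: starting)"*) echo "starting" ;;
    *)                      echo "none" ;;  # no HEALTHCHECK defined for this container
  esac
}

# Illustrative inputs: the second one is "Up" but tells you nothing about health.
health_from_status "Up 3 days (healthy)"
health_from_status "Up 3 days"
```

A result of "none" is the case this article cares about most: the process is running, and Docker has no opinion on whether the service works.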


The Script

Save this as ~/bin/docker-health-check.sh:

#!/usr/bin/env bash
# docker-health-check.sh — Check running containers and alert on issues
# Usage: ./docker-health-check.sh [--notify]

set -euo pipefail

NOTIFY=${1:-""}
FAILED=()
WARNING=()

# --- Helpers ---
log() { echo "[$(date +%H:%M:%S)] $*"; }

alert() {
  local msg="$1"
  log "ALERT: $msg"
  if [[ "$NOTIFY" == "--notify" ]] && command -v notify-send &>/dev/null; then
    # "|| true": a failed desktop notification must not abort the script under set -e
    notify-send "Docker Health" "$msg" --urgency=critical || true
  fi
  # Optional: pipe to curl for webhook (Slack, Discord, ntfy.sh)
  # curl -s -X POST "$WEBHOOK_URL" -d "{\"text\": \"$msg\"}" > /dev/null
}

# --- Check Docker is running ---
if ! docker info &>/dev/null; then
  alert "Docker daemon is not running!"
  exit 1
fi

# --- Iterate running containers ---
while IFS=$'\t' read -r STATUS NAME; do
  # docker ps appends "(healthy)"-style suffixes to the status; health is queried separately
  STATUS=${STATUS%% (*}
  # If the container vanished since docker ps ran, don't let set -e abort the loop
  HEALTH=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$NAME" 2>/dev/null) || HEALTH="unknown"

  log "$NAME — status: $STATUS, health: $HEALTH"

  case "$HEALTH" in
    unhealthy)
      FAILED+=("$NAME (unhealthy)")
      alert "Container $NAME is UNHEALTHY"
      ;;
    starting)
      WARNING+=("$NAME (still starting)")
      ;;
    none)
      # No HEALTHCHECK defined: fall back to the process status (e.g. "Restarting")
      if [[ "$STATUS" != Up* ]]; then
        FAILED+=("$NAME (status: $STATUS)")
        alert "Container $NAME has unexpected status: $STATUS"
      fi
      ;;
  esac

# docker ps (without -a) already omits exited containers, so no grep filter is needed
done < <(docker ps --format '{{.Status}}\t{{.Names}}')

# --- Summary ---
echo ""
if [[ ${#FAILED[@]} -gt 0 ]]; then
  log "FAILED containers: ${FAILED[*]}"
  exit 2
elif [[ ${#WARNING[@]} -gt 0 ]]; then
  log "WARNING containers: ${WARNING[*]}"
  exit 1
else
  log "All containers healthy ✓"
  exit 0
fi

Make it executable:

chmod +x ~/bin/docker-health-check.sh
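
Before scheduling the script, it helps to run it once by hand and read its exit code. The sketch below maps the exit codes the script uses (0, 1, 2) to a short label; describe_exit is a hypothetical helper, not part of the script itself. Note that exit code 1 is used both for "still starting" warnings and for an unreachable Docker daemon.

```shell
#!/usr/bin/env bash
# Map the health-check script's exit code to a label.
# describe_exit is a hypothetical helper name used for illustration.
describe_exit() {
  case "$1" in
    0) echo "OK: all containers healthy" ;;
    1) echo "WARNING: a container is still starting, or the Docker daemon is down" ;;
    2) echo "FAILED: at least one container is unhealthy or not Up" ;;
    *) echo "UNKNOWN: unexpected exit code $1" ;;
  esac
}

# Usage (assumes the script lives at the path used above):
#   ~/bin/docker-health-check.sh; describe_exit "$?"
```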

Adding a HEALTHCHECK to Your Compose Services

For services you control, add a healthcheck block to your docker-compose.yml:

services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  postgres:
    image: postgres:16
    healthcheck:
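      # "$$" stops Compose from interpolating the variable on the host; the container's shell expands it
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER:-postgres}"]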
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

Once these are in place, docker inspect <container> returns a real health status, and our script above can act on it.
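
To confirm that a newly added healthcheck actually reaches "healthy", you can poll the same docker inspect template the script uses. This is a minimal sketch: wait_for_healthy, its return codes, and its default timeout are assumptions made for illustration, not Docker features.

```shell
#!/usr/bin/env bash
# Poll a container's health status until it is "healthy" or a timeout expires.
# wait_for_healthy is a hypothetical helper: returns 0 healthy, 1 timeout, 2 no health status.
wait_for_healthy() {
  local name="$1" timeout="${2:-60}" interval="${3:-2}" waited=0 status
  while (( waited <= timeout )); do
    status=$(docker inspect \
      --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
      "$name" 2>/dev/null) || status="missing"
    case "$status" in
      healthy) return 0 ;;
      none|missing)
        echo "$name: no health status available ($status)" >&2
        return 2
        ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name: still '$status' after ${timeout}s" >&2
  return 1
}

# Usage (the container name is a placeholder):
#   docker compose up -d && wait_for_healthy <container> 120
```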


Running It on a Schedule

Two options depending on your setup:

Option A: cron (classic)

crontab -e
# Check every 5 minutes, log to file (the log file must be writable by the crontab's owner)
*/5 * * * * /home/you/bin/docker-health-check.sh --notify >> /var/log/docker-health.log 2>&1

Option B: systemd timer (more control)

Create /etc/systemd/system/docker-health.service:

[Unit]
Description=Docker Container Health Check
After=docker.service

[Service]
Type=oneshot
User=your-user
ExecStart=/home/you/bin/docker-health-check.sh --notify
StandardOutput=journal
StandardError=journal

And /etc/systemd/system/docker-health.timer:

[Unit]
Description=Run Docker health check every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=docker-health.service

[Install]
WantedBy=timers.target

Enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now docker-health.timer

# Verify
systemctl list-timers docker-health.timer

Push Alerts With ntfy.sh

For real-time push notifications to your phone, swap the alert function body for an ntfy.sh call:

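alert() {
  local msg="$1"
  log "ALERT: $msg"
  # --max-time and "|| true" keep a network hiccup from aborting the script under set -e
  curl -s --max-time 10 \
    -H "Title: Docker Health Alert" \
    -H "Priority: urgent" \
    -H "Tags: warning,whale" \
    -d "$msg" \
    https://ntfy.sh/your-private-topic-name > /dev/null || true
}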

ntfy.sh is free, open-source, and you can self-host it too. Subscribe to your topic in the mobile app — done.
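
If you use a JSON webhook instead (the commented-out curl line in the script), be aware that interpolating $msg directly into the JSON body breaks as soon as the message contains a double quote or backslash. A minimal sketch of escaping the message first is shown below; json_text_payload is a hypothetical helper, and the "text" field name follows the script's comment. Other services may expect a different field, so check your webhook's documentation.

```shell
#!/usr/bin/env bash
# Build a {"text": "..."} JSON body with the message safely escaped.
# json_text_payload is a hypothetical helper name used for illustration.
json_text_payload() {
  local msg="$1"
  msg=${msg//\\/\\\\}      # escape backslashes first
  msg=${msg//\"/\\\"}      # then double quotes
  msg=${msg//$'\n'/\\n}    # and newlines
  printf '{"text": "%s"}' "$msg"
}

# Usage (WEBHOOK_URL is assumed to be set in your environment):
#   curl -s -X POST -H "Content-Type: application/json" \
#     -d "$(json_text_payload "$msg")" "$WEBHOOK_URL" > /dev/null || true
```

This only handles the characters most likely to appear in a container alert; for arbitrary input, a tool such as jq is the safer choice.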


Real-World Usage Example

Here's what the output looks like on a typical homelab, with seven containers listed:

[09:15:02] nginx-proxy — status: Up 3 days, health: healthy
[09:15:02] immich-server — status: Up 3 days, health: healthy
[09:15:03] immich-db — status: Up 3 days, health: healthy
[09:15:03] vaultwarden — status: Up 3 days, health: none
[09:15:03] uptime-kuma — status: Up 3 days, health: none
[09:15:04] jellyfin — status: Up 3 days, health: healthy
[09:15:04] paperless-ngx — status: Up 2 hours, health: starting
[09:15:04] WARNING containers: paperless-ngx (still starting)

The starting state on paperless-ngx is fine — it just restarted. If it's still starting 15 minutes later, that's when you care.
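
One way to turn "still starting 15 minutes later" into an automatic check is to remember when a container was first seen in the starting state and compare on later runs. The sketch below does that with a small state file per container. STATE_DIR, the file naming, and both function names are assumptions made for this example, not part of the script above.

```shell
#!/usr/bin/env bash
# Track how long a container has been reported as "starting" across runs.
# STATE_DIR and both function names are hypothetical, chosen for this sketch.
STATE_DIR="${STATE_DIR:-/tmp/docker-health-state}"

# starting_too_long NAME [MAX_SECONDS] [NOW_EPOCH]
# Returns 0 once NAME has been "starting" for at least MAX_SECONDS (default 900).
starting_too_long() {
  local name="$1" max="${2:-900}" now="${3:-$(date +%s)}" file first
  mkdir -p "$STATE_DIR"
  file="$STATE_DIR/$name.starting"
  # First sighting: record when this container was first seen starting
  [[ -f "$file" ]] || echo "$now" > "$file"
  first=$(<"$file")
  (( now - first >= max ))
}

# Call this when the container reports any state other than "starting"
clear_starting() {
  rm -f "$STATE_DIR/$1.starting"
}
```

In the script's starting) branch you could then call starting_too_long "$NAME" and raise an alert only when it returns success, and call clear_starting "$NAME" in the other branches.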


What to Do When Something Fails

If the script exits with code 2 (FAILED), here's a quick triage sequence:

# What is the container's last health check output?
docker inspect --format='{{json .State.Health}}' <name> | jq '.Log[-1]'

# Last 50 lines of container logs
docker logs --tail 50 <name>

# Restart and watch
docker restart <name> && docker logs -f <name>

Summary

This whole setup takes about 10 minutes to deploy. It saves you from learning about outages from angry users, and from never noticing that a database has been silently crashing and restarting all night.

The script is intentionally simple — under 60 lines, no dependencies beyond bash and Docker. Extend it as you need: add HTTP endpoint checks, disk space warnings, or container CPU/memory thresholds with docker stats.
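
As one example of such an extension, here is a minimal sketch of a disk space warning built on df. The function names and the 90% default threshold are assumptions for illustration. The path defaults to / here; on many hosts Docker stores its data under /var/lib/docker, so you may want to pass that path instead.

```shell
#!/usr/bin/env bash
# Report disk usage for a mount point and compare it to a threshold.
# Both function names and the 90% default are hypothetical choices for this sketch.

# disk_usage_pct [PATH]: print the used percentage (integer) of PATH's filesystem
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# disk_over_threshold [PATH] [PERCENT]: return 0 if usage is at or above PERCENT
disk_over_threshold() {
  local pct
  pct=$(disk_usage_pct "${1:-/}")
  (( pct >= ${2:-90} ))
}

# Usage inside the health-check script (alert is the function defined there):
#   if disk_over_threshold /var/lib/docker 90; then
#     alert "Disk usage on /var/lib/docker is at $(disk_usage_pct /var/lib/docker)%"
#   fi
```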

If you're self-hosting anything seriously, health-check automation is table stakes. Now you have no excuse.


SIGNAL is a weekly digest for builders. If you found this useful, check out the archive on Dev.to.
