SIGNAL

Posted on Mar 6

Automate Docker Container Health Checks With a 50-Line Bash Script

#selfhosted #automation #linux #docker

If you're running any self-hosted services — a media server, a VPN, a database, a reverse proxy — you already know the silent killer: a container that's technically "Up" but completely broken inside. docker ps shows green. Your users see errors. You find out an hour later.

This article shows you how to build a real health-check automation script that goes beyond docker ps and actually verifies your containers are working.

The Problem With `docker ps`

Docker's default status is based on process state, not service health. A container running Nginx can be "Up" while Nginx itself has crashed and the supervisor is spinning. A Postgres container can be "Up" while still initializing and refusing connections.

Docker has a built-in HEALTHCHECK instruction for Dockerfiles, but:

Most third-party images don't define one
It only checks per-container, not across your whole stack
It doesn't send you alerts

What we want: a script that polls all running containers, checks if they're healthy, and fires a notification if something's wrong.

The Script

Save this as ~/bin/docker-health-check.sh:

#!/usr/bin/env bash
# docker-health-check.sh — Check running containers and alert on issues
# Usage: ./docker-health-check.sh [--notify]

set -euo pipefail

NOTIFY=${1:-""}
FAILED=()
WARNING=()

# --- Helpers ---
log() { echo "[$(date +%H:%M:%S)] $*"; }

alert() {
  local msg="$1"
  log "ALERT: $msg"
  if [[ "$NOTIFY" == "--notify" ]] && command -v notify-send &>/dev/null; then
    notify-send "Docker Health" "$msg" --urgency=critical
  fi
  # Optional: pipe to curl for webhook (Slack, Discord, ntfy.sh)
  # curl -s -X POST "$WEBHOOK_URL" -d "{\"text\": \"$msg\"}" > /dev/null
}

# --- Check Docker is running ---
if ! docker info &>/dev/null; then
  alert "Docker daemon is not running!"
  exit 1
fi

# --- Iterate running containers ---
while IFS= read -r line; do
  NAME=$(echo "$line" | awk '{print $NF}')
  STATUS=$(echo "$line" | awk '{print $2}')
  HEALTH=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$NAME" 2>/dev/null)

  log "$NAME — status: $STATUS, health: $HEALTH"

  case "$HEALTH" in
    unhealthy)
      FAILED+=("$NAME (unhealthy)")
      alert "Container $NAME is UNHEALTHY"
      ;;
    starting)
      WARNING+=("$NAME (still starting)")
      ;;
    none)
      # No HEALTHCHECK defined — check if process is exited
      if [[ "$STATUS" != "Up" ]]; then
        FAILED+=("$NAME (status: $STATUS)")
        alert "Container $NAME has unexpected status: $STATUS"
      fi
      ;;
  esac

done < <(docker ps --format '{{.Status}}\t{{.Names}}' | grep -v 'Exited')

# --- Summary ---
echo ""
if [[ ${#FAILED[@]} -gt 0 ]]; then
  log "FAILED containers: ${FAILED[*]}"
  exit 2
elif [[ ${#WARNING[@]} -gt 0 ]]; then
  log "WARNING containers: ${WARNING[*]}"
  exit 1
else
  log "All containers healthy ✓"
  exit 0
fi

Make it executable:

chmod +x ~/bin/docker-health-check.sh

Adding a HEALTHCHECK to Your Compose Services

For services you control, add a healthcheck block to your docker-compose.yml:

services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

Once these are in place, docker inspect <container> returns a real health status, and our script above can act on it.

Running It on a Schedule

Two options depending on your setup:

Option A: cron (classic)

crontab -e
# Check every 5 minutes, log to file
*/5 * * * * /home/you/bin/docker-health-check.sh --notify >> /var/log/docker-health.log 2>&1

Option B: systemd timer (more control)

Create /etc/systemd/system/docker-health.service:

[Unit]
Description=Docker Container Health Check
After=docker.service

[Service]
Type=oneshot
User=your-user
ExecStart=/home/you/bin/docker-health-check.sh --notify
StandardOutput=journal
StandardError=journal

And /etc/systemd/system/docker-health.timer:

[Unit]
Description=Run Docker health check every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=docker-health.service

[Install]
WantedBy=timers.target

Enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now docker-health.timer

# Verify
systemctl list-timers docker-health.timer

Push Alerts With ntfy.sh

For real-time push notifications to your phone, swap the alert function body for an ntfy.sh call:

alert() {
  local msg="$1"
  curl -s \
    -H "Title: Docker Health Alert" \
    -H "Priority: urgent" \
    -H "Tags: warning,whale" \
    -d "$msg" \
    https://ntfy.sh/your-private-topic-name > /dev/null
}

ntfy.sh is free, open-source, and you can self-host it too. Subscribe to your topic in the mobile app — done.

Real-World Usage Example

Here's what the output looks like on a typical homelab running 8 containers:

[09:15:02] nginx-proxy — status: Up 3 days, health: healthy
[09:15:02] immich-server — status: Up 3 days, health: healthy
[09:15:03] immich-db — status: Up 3 days, health: healthy
[09:15:03] vaultwarden — status: Up 3 days, health: none
[09:15:03] uptime-kuma — status: Up 3 days, health: none
[09:15:04] jellyfin — status: Up 3 days, health: healthy
[09:15:04] paperless-ngx — status: Up 2 hours, health: starting
[09:15:04] WARNING containers: paperless-ngx (still starting)

The starting state on paperless-ngx is fine — it just restarted. If it's still starting 15 minutes later, that's when you care.

What to Do When Something Fails

If the script exits with code 2 (FAILED), here's a quick triage sequence:

# What is the container's last health check output?
docker inspect --format='{{json .State.Health}}' <name> | jq '.Log[-1]'

# Last 50 lines of container logs
docker logs --tail 50 <name>

# Restart and watch
docker restart <name> && docker logs -f <name>

Summary

This whole setup takes about 10 minutes to deploy and will save you from discovering outages via angry users or missing the fact that a database has been silently crashing and restarting all night.

The script is intentionally simple — under 60 lines, no dependencies beyond bash and Docker. Extend it as you need: add HTTP endpoint checks, disk space warnings, or container CPU/memory thresholds with docker stats.

If you're self-hosting anything seriously, health-check automation is table stakes. Now you have no excuse.

SIGNAL is a weekly digest for builders. If you found this useful, check out the archive on Dev.to.

DEV Community

Automate Docker Container Health Checks With a 50-Line Bash Script

The Problem With `docker ps`

The Script

Adding a HEALTHCHECK to Your Compose Services

Running It on a Schedule

Option A: cron (classic)

Option B: systemd timer (more control)

Push Alerts With ntfy.sh

Real-World Usage Example

What to Do When Something Fails

Summary

Top comments (0)

The Problem With docker ps

The Script

Adding a HEALTHCHECK to Your Compose Services

Running It on a Schedule

Option A: cron (classic)

Option B: systemd timer (more control)

Push Alerts With ntfy.sh

Real-World Usage Example

What to Do When Something Fails

Summary

The Problem With `docker ps`