If you're running any self-hosted services — a media server, a VPN, a database, a reverse proxy — you already know the silent killer: a container that's technically "Up" but completely broken inside. docker ps shows green. Your users see errors. You find out an hour later.
This article shows you how to build a real health-check automation script that goes beyond docker ps and actually verifies your containers are working.
The Problem With docker ps
Docker's default status reflects the state of the container's main process, not the health of the service. A container running Nginx can be "Up" while Nginx itself has crashed and a supervisor process keeps restarting it in a loop. A Postgres container can be "Up" while still initializing and refusing connections.
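One way to see the difference: when an image defines a health check, the Status column of docker ps carries a suffix such as (healthy), (unhealthy), or (health: starting). Without one there is no suffix at all, and "Up" is all you get. A minimal sketch that pulls the suffix out of a status string (health_from_status is a hypothetical helper name, not a Docker command):

```shell
#!/usr/bin/env bash
# health_from_status is a hypothetical helper (not part of Docker).
# It takes one Status string as printed by `docker ps` and prints the
# health state embedded in it, or "none" when no health check is reported.
health_from_status() {
    local status="$1"
    case "$status" in
        *"(healthy)"*)          echo "healthy" ;;
        *"(unhealthy)"*)        echo "unhealthy" ;;
        *"(health: starting)"*) echo "starting" ;;
        *)                      echo "none" ;;
    esac
}

# "Up" alone says nothing about whether the service inside works.
health_from_status "Up 3 days"               # prints: none
health_from_status "Up 3 days (unhealthy)"   # prints: unhealthy
```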
Docker has a built-in HEALTHCHECK instruction for Dockerfiles, but:
- Most third-party images don't define one
- It only checks per-container, not across your whole stack
- It doesn't send you alerts
What we want: a script that polls all running containers, checks if they're healthy, and fires a notification if something's wrong.
The Script
Save this as ~/bin/docker-health-check.sh:
#!/usr/bin/env bash
# docker-health-check.sh — Check running containers and alert on issues
# Usage: ./docker-health-check.sh [--notify]
set -euo pipefail
NOTIFY=${1:-""}
FAILED=()
WARNING=()
# --- Helpers ---
log() { echo "[$(date +%H:%M:%S)] $*"; }
alert() {
    local msg="$1"
    log "ALERT: $msg"
    if [[ "$NOTIFY" == "--notify" ]] && command -v notify-send &>/dev/null; then
        # "|| true": a failed desktop notification (for example, no session bus
        # when run from cron) must not abort the whole run under `set -e`
        notify-send "Docker Health" "$msg" --urgency=critical || true
    fi
    # Optional: pipe to curl for webhook (Slack, Discord, ntfy.sh)
    # curl -s -X POST "$WEBHOOK_URL" -d "{\"text\": \"$msg\"}" > /dev/null
}
# --- Check Docker is running ---
if ! docker info &>/dev/null; then
    alert "Docker daemon is not running!"
    exit 1
fi
# --- Iterate running containers ---
# docker ps prints one "<status><TAB><name>" line per running container
# (exited containers are not listed without -a), so split on the tab.
while IFS=$'\t' read -r STATUS NAME; do
    # Fall back to "unknown" if the container vanished between ps and inspect,
    # so a failed inspect does not abort the script under `set -e`.
    HEALTH=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$NAME" 2>/dev/null) || HEALTH="unknown"
    log "$NAME — status: $STATUS, health: $HEALTH"
    case "$HEALTH" in
        unhealthy)
            FAILED+=("$NAME (unhealthy)")
            alert "Container $NAME is UNHEALTHY"
            ;;
        starting)
            WARNING+=("$NAME (still starting)")
            ;;
        none)
            # No HEALTHCHECK defined: fall back to the status text.
            # Anything that does not start with "Up" (e.g. "Restarting ...") is a failure.
            if [[ "$STATUS" != Up* ]]; then
                FAILED+=("$NAME (status: $STATUS)")
                alert "Container $NAME has unexpected status: $STATUS"
            fi
            ;;
    esac
done < <(docker ps --format '{{.Status}}\t{{.Names}}')
# --- Summary ---
echo ""
if [[ ${#FAILED[@]} -gt 0 ]]; then
    log "FAILED containers: ${FAILED[*]}"
    exit 2
elif [[ ${#WARNING[@]} -gt 0 ]]; then
    log "WARNING containers: ${WARNING[*]}"
    exit 1
else
    log "All containers healthy ✓"
    exit 0
fi
Make it executable:
chmod +x ~/bin/docker-health-check.sh
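If you enable the commented-out webhook line in alert(), note that it drops $msg straight into a JSON string, so a message containing a double quote or backslash would produce invalid JSON. A minimal sketch of an escaping helper follows. json_escape is a hypothetical name, and it only covers backslashes, double quotes, newlines, and tabs, not every control character:

```shell
#!/usr/bin/env bash
# json_escape is a hypothetical helper: it escapes backslashes, double
# quotes, newlines, and tabs so a string can sit inside a JSON string literal.
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
    s=${s//\"/\\\"}      # double quote
    s=${s//$'\n'/\\n}    # newline
    s=${s//$'\t'/\\t}    # tab
    printf '%s' "$s"
}

msg='Container "api" is UNHEALTHY'
payload="{\"text\": \"$(json_escape "$msg")\"}"
echo "$payload"

# WEBHOOK_URL is an assumption, as in the commented-out line of the script:
# curl -s -X POST -H "Content-Type: application/json" -d "$payload" "$WEBHOOK_URL" > /dev/null
```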
Adding a HEALTHCHECK to Your Compose Services
For services you control, add a healthcheck block to your docker-compose.yml:
services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
  postgres:
    image: postgres:16
    healthcheck:
      # "$$" stops Compose from interpolating the variable on the host,
      # so the container's own POSTGRES_USER is used when the check runs
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
Once these are in place, docker inspect <container> returns a real health status, and our script above can act on it.
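With a health status available, you can also block until a service is actually ready, for example right after bringing a stack up. The following is a rough sketch, not part of the script above. wait_for_healthy and its return codes are hypothetical choices, and the container name in the usage line is a placeholder:

```shell
#!/usr/bin/env bash
# wait_for_healthy NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: poll a container's health status until it reports
# "healthy" or the timeout runs out.
# Returns 0 = healthy, 1 = unhealthy, 2 = no health check or no such
# container, 3 = still starting when the timeout expired.
wait_for_healthy() {
    local name="$1" timeout="${2:-60}" interval="${3:-2}"
    local waited=0 health
    while (( waited <= timeout )); do
        health=$(docker inspect \
            --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
            "$name" 2>/dev/null) || health="missing"
        case "$health" in
            healthy)      return 0 ;;
            unhealthy)    return 1 ;;  # the check has failed `retries` times in a row
            none|missing) return 2 ;;  # nothing to wait for
        esac
        sleep "$interval"
        (( waited += interval ))
    done
    return 3
}

# Usage sketch (my-postgres is a placeholder container name):
# wait_for_healthy my-postgres 120 && echo "database is ready"
```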
Running It on a Schedule
Two options depending on your setup:
Option A: cron (classic)
crontab -e
# Check every 5 minutes, log to file (the log file must be writable by your user)
*/5 * * * * /home/you/bin/docker-health-check.sh --notify >> /var/log/docker-health.log 2>&1
Option B: systemd timer (more control)
Create /etc/systemd/system/docker-health.service:
[Unit]
Description=Docker Container Health Check
After=docker.service
[Service]
Type=oneshot
User=your-user
ExecStart=/home/you/bin/docker-health-check.sh --notify
StandardOutput=journal
StandardError=journal
And /etc/systemd/system/docker-health.timer:
[Unit]
Description=Run Docker health check every 5 minutes
[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=docker-health.service
[Install]
WantedBy=timers.target
Enable it:
sudo systemctl daemon-reload
sudo systemctl enable --now docker-health.timer
# Verify
systemctl list-timers docker-health.timer
Push Alerts With ntfy.sh
For real-time push notifications to your phone, swap the alert function body for an ntfy.sh call:
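alert() {
    local msg="$1"
    # "|| true" keeps a failed push (for example, network down) from
    # aborting the script under `set -e`
    curl -s \
        -H "Title: Docker Health Alert" \
        -H "Priority: urgent" \
        -H "Tags: warning,whale" \
        -d "$msg" \
        https://ntfy.sh/your-private-topic-name > /dev/null || true
}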
ntfy.sh is free, open-source, and you can self-host it too. Subscribe to your topic in the mobile app — done.
Real-World Usage Example
Here's what the output looks like on a typical homelab (seven containers shown):
[09:15:02] nginx-proxy — status: Up 3 days, health: healthy
[09:15:02] immich-server — status: Up 3 days, health: healthy
[09:15:03] immich-db — status: Up 3 days, health: healthy
[09:15:03] vaultwarden — status: Up 3 days, health: none
[09:15:03] uptime-kuma — status: Up 3 days, health: none
[09:15:04] jellyfin — status: Up 3 days, health: healthy
[09:15:04] paperless-ngx — status: Up 2 hours, health: starting
[09:15:04] WARNING containers: paperless-ngx (still starting)
The starting state on paperless-ngx is fine — it just restarted. If it's still starting 15 minutes later, that's when you care.
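The "still starting 15 minutes later" rule can be automated by comparing the container's start time with the current time. The sketch below assumes GNU date (for date -d) and the UTC timestamps Docker reports in .State.StartedAt. The helper names and the 900-second default are hypothetical choices. It measures time since the container started, which only approximates time spent in the starting state:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes GNU date (for `date -d`) and UTC timestamps
# in the form Docker reports, e.g. 2024-05-01T09:15:02.123456789Z.

# seconds_since TIMESTAMP [NOW_EPOCH]: print the age of TIMESTAMP in seconds.
seconds_since() {
    local ts="${1%%.*}"                 # drop fractional seconds, if any
    [[ "$ts" == *Z ]] || ts="${ts}Z"    # restore the UTC marker removed with them
    local now="${2:-$(date +%s)}" then_epoch
    then_epoch=$(date -u -d "$ts" +%s) || return 1
    echo $(( now - then_epoch ))
}

# stuck_starting HEALTH STARTED_AT [LIMIT_SECONDS] [NOW_EPOCH]
# Succeeds only when HEALTH is "starting" and the container is older than the limit.
stuck_starting() {
    local health="$1" started="$2" limit="${3:-900}" now="${4:-$(date +%s)}" age
    [[ "$health" == "starting" ]] || return 1
    age=$(seconds_since "$started" "$now") || return 1
    (( age > limit ))
}

# Usage sketch inside the script's loop ($NAME and $HEALTH as in the script):
# STARTED=$(docker inspect --format '{{.State.StartedAt}}' "$NAME")
# stuck_starting "$HEALTH" "$STARTED" 900 && alert "$NAME still starting after 15 minutes"
```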
What to Do When Something Fails
If the script exits with code 2 (FAILED), here's a quick triage sequence:
# What is the container's last health check output?
docker inspect --format='{{json .State.Health}}' <name> | jq '.Log[-1]'
# Last 50 lines of container logs
docker logs --tail 50 <name>
# Restart and watch
docker restart <name> && docker logs -f <name>
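If you call the script from another script or a wrapper, it helps to translate the exit code before deciding whether to start triage. A small sketch, following the exit statements in the script above. describe_exit is a hypothetical helper, and note that the script uses code 1 both for warnings and for a stopped Docker daemon:

```shell
#!/usr/bin/env bash
# describe_exit CODE: hypothetical helper that maps the health-check
# script's exit code to a short label.
describe_exit() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "warning-or-daemon-down" ;;  # the script uses 1 for both cases
        2) echo "failed" ;;
        *) echo "unknown" ;;
    esac
}

# Usage sketch (path as used earlier in the article):
# rc=0; ~/bin/docker-health-check.sh || rc=$?
# echo "result: $(describe_exit "$rc")"
```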
Summary
This whole setup takes about 10 minutes to deploy and will save you from discovering outages via angry users or missing the fact that a database has been silently crashing and restarting all night.
The script is intentionally simple — under 60 lines, no dependencies beyond bash and Docker. Extend it as you need: add HTTP endpoint checks, disk space warnings, or container CPU/memory thresholds with docker stats.
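As one possible starting point for the HTTP endpoint checks mentioned above, here is a minimal sketch built on curl. http_check is a hypothetical helper, and treating only 2xx responses as success is an assumption you may want to loosen:

```shell
#!/usr/bin/env bash
# http_check URL [TIMEOUT_SECONDS]
# Hypothetical helper: succeed only if the URL answers with a 2xx status
# within the timeout.
http_check() {
    local url="$1" timeout="${2:-5}" code
    # -w '%{http_code}' prints just the status code; the body is discarded
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url") || return 1
    [[ "$code" == 2?? ]]
}

# Usage sketch (URL taken from the Compose example; alert is the script's function):
# http_check "http://localhost:3000/health" || alert "api endpoint check failed"
```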
If you're self-hosting anything seriously, health-check automation is table stakes. Now you have no excuse.
SIGNAL is a weekly digest for builders. If you found this useful, check out the archive on Dev.to.