Vigilmon

Posted on Jun 26

Monitor Docker containers with multi-region uptime checks - no false alerts

#docker #containers #monitoring #devops

Running Docker containers in production means your app is portable, reproducible, and (hopefully) always up. But "always up" requires monitoring — and most single-probe tools create more noise than signal.

This post shows how to add multi-region uptime monitoring to Dockerized apps so you get paged when your container is actually down, not when a single probe has a bad day.

The false alert problem with Docker monitoring

Single-probe tools work like this: one server pings your endpoint every minute. If it times out, you get an alert. If that one server has a routing issue? You get an alert. If your ISP hiccups for 10 seconds? Alert. On-call at 3 AM for a problem that fixed itself? Alert.

Multi-region monitoring solves this with consensus: multiple probes in different geographic regions must agree that your container is unreachable before firing an alert. One region's blip is ignored. Actual downtime triggers immediately.
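The consensus rule can be sketched in a few lines. This is a minimal illustration, not Vigilmon's actual implementation: the function name `should_alert`, the region names, and the default quorum of 2 are assumptions chosen for the example.

```python
from typing import Mapping


def should_alert(results: Mapping[str, bool], quorum: int = 2) -> bool:
    """Return True when at least `quorum` regions report the endpoint as down.

    `results` maps a (hypothetical) region name to True if that region's
    probe succeeded and False if it failed.
    """
    failures = sum(1 for ok in results.values() if not ok)
    return failures >= quorum


# One failing region is treated as a local blip.
print(should_alert({"us-east": True, "eu-west": False, "ap-south": True}))   # False
# Two failing regions reach the quorum.
print(should_alert({"us-east": False, "eu-west": False, "ap-south": True}))  # True
```

A higher quorum trades slower detection for fewer false positives, so the right value depends on how many regions are probing.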

Vigilmon uses this approach — free tier, no credit card.

Step 1: Add a health check to your Docker container

Docker has built-in health checks. Add one to your Dockerfile:

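FROM node:20-slim

# node:20-slim does not ship curl, so install it for the health check below
RUN apt-get update \
  && apt-get install -y --no-install-recommends curl \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev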
COPY . .
EXPOSE 3000

# Health check: curl the /health endpoint every 30s
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

CMD ["node", "server.js"]

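The HEALTHCHECK instruction tells Docker to test your container's health. After 3 consecutive failures, Docker marks the container as unhealthy. What happens next depends on the platform: Docker Swarm replaces unhealthy tasks automatically, plain Docker only reports the status, Kubernetes ignores HEALTHCHECK in favor of its own liveness and readiness probes, and ECS only acts on health checks declared in the task definition.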
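To see how the retries setting plays out, here is a simplified model of the status Docker reports after each check. It is a sketch under stated assumptions: the function name `health_states` is made up for illustration, and the model ignores the start-period grace handling.

```python
from typing import Iterable, List


def health_states(check_results: Iterable[bool], retries: int = 3) -> List[str]:
    """Return the modeled health status after each check.

    True means the check passed, False means it failed. The container
    becomes "unhealthy" only after `retries` consecutive failures, and
    any passing check resets the failure counter.
    """
    status = "starting"
    consecutive_failures = 0
    states = []
    for passed in check_results:
        if passed:
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
        states.append(status)
    return states


# Two failures followed by a success never reach the threshold;
# three failures in a row do.
print(health_states([True, False, False, True, False, False, False]))
# ['healthy', 'healthy', 'healthy', 'healthy', 'healthy', 'healthy', 'unhealthy']
```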

Step 2: Set up a health endpoint in your app

Your app needs a /health route that returns 200 when healthy:

// Node.js/Express example
app.get('/health', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

# Python/Flask example
@app.route('/health')
def health():
    return {"status": "ok"}, 200

// Go example
http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
})
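Once the route exists, it helps to verify it the same way a probe would. The sketch below uses only the Python standard library; the function name `check_health` and the 5-second default timeout are assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True only if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers connection errors, timeouts, HTTP error statuses,
        # malformed responses, and malformed URLs.
        return False
```

Calling `check_health("http://localhost:3000/health")` against a running instance should return True. In an image that already includes Python but not curl, a script ending in `sys.exit(0 if check_health(...) else 1)` could stand in for the curl command in a health check.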

Step 3: docker-compose health check example

For local dev and staging environments with docker-compose:

version: '3.8'
services:
  api:
    build: .
    ports:
      - "3000:3000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    restart: unless-stopped

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

To make the API start only after Postgres is ready, add depends_on with condition: service_healthy to the api service:

  api:
    depends_on:
      db:
        condition: service_healthy
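Outside Compose, for example in a CI script, the same "wait until healthy" idea can be approximated by polling the status that `docker inspect` reports. This is a rough sketch, not how Compose implements it; the function names and the container name in the usage note are hypothetical.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Read a container's health status with the Docker CLI (must be installed)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll `get_status` until it returns "healthy" or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would look like `wait_until_healthy(lambda: docker_health_status("myproject-db-1"))`, where `myproject-db-1` is a placeholder for whatever name Compose gave the database container. Passing the status reader as a callable keeps the polling loop testable without Docker.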

Step 4: Add external multi-region monitoring with Vigilmon

Docker's built-in health check runs inside the container, so it only proves the app answers on localhost. It won't catch:

Network routing issues between your users and the server
DNS failures
SSL certificate problems
The entire host going down

That's where external monitoring comes in.
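As one concrete example of a check that only makes sense from outside, the sketch below reads a server's TLS certificate and computes the days left before it expires. It uses only the Python standard library; the function names are assumptions for illustration, and this is not a description of how Vigilmon performs its checks.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to `host` over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days between `now` (epoch seconds, default: current time)
    and a certificate expiry string such as 'Jan  1 00:00:00 2030 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400
```

Something like `days_until_expiry(fetch_not_after("example.com"))` would give the remaining lifetime for a real host. An expired or untrusted certificate makes the TLS handshake itself raise `ssl.SSLError`, which is a failure the in-container health check over plain HTTP never sees.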

Go to vigilmon.online and sign up free
Click Add Monitor
Enter the public URL of your container's health endpoint (for example, https://example.com/health)
Set check interval to 1 minute
Vigilmon probes from multiple regions — if 2+ agree it's down, you get alerted

Step 5: Why multi-region matters for containers

Containers get restarted, redeployed, and migrated. During a rolling update, your container might be briefly unreachable to one region's probe but fine everywhere else. A single-probe tool calls that downtime. Multi-region consensus calls it a deploy.

Real downtime looks different: all regions see the same failure, consistently. That's what Vigilmon flags.
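The difference between a deploy blip and real downtime can be expressed as a check over several consecutive probe rounds. The sketch below is illustrative only and does not claim to be Vigilmon's algorithm; `is_sustained_outage`, the `min_rounds` parameter, and the region names are hypothetical.

```python
from typing import List, Mapping


def is_sustained_outage(rounds: List[Mapping[str, bool]], min_rounds: int = 2) -> bool:
    """Return True when every region failed in each of the last `min_rounds` rounds.

    Each round maps a region name to True (probe succeeded) or False (failed).
    """
    if min_rounds < 1 or len(rounds) < min_rounds:
        return False
    recent = rounds[-min_rounds:]
    return all(len(r) > 0 and not any(r.values()) for r in recent)


# Rolling update: one region misses one round, then recovers.
deploy = [{"us-east": True, "eu-west": False}, {"us-east": True, "eu-west": True}]
# Real downtime: every region fails, round after round.
outage = [{"us-east": False, "eu-west": False}, {"us-east": False, "eu-west": False}]

print(is_sustained_outage(deploy))  # False
print(is_sustained_outage(outage))  # True
```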

What the free tier includes

Unlimited monitors
1-minute check intervals
Multi-region probes
Slack + email alerts
90-day history charts

No credit card. No 14-day trial. Just sign up at vigilmon.online.

Summary

Production-ready Docker monitoring has two layers:

Internal: Docker HEALTHCHECK, so Docker, Compose, and Swarm know when a container is unhealthy
External: Vigilmon for multi-region uptime checks that catch what Docker can't

Add both and you'll stop chasing phantom alerts and start trusting your monitoring.

DEV Community