So you spun up a VPS, deployed your app, told everyone it was live — and then woke up to angry Slack messages because the whole thing went down at 3 AM. Welcome to the club.
Self-hosting production applications is one of those things that sounds straightforward until you actually do it. I've been running self-hosted services for about six years now, and the gap between "it works on my server" and "it works reliably in production" is where most of the pain lives. There's actually a massive free guide floating around (750+ pages) covering this exact territory, which reminded me that a lot of developers keep hitting the same walls.
Let me walk through the most common reasons self-hosted apps fail in production and how to actually fix them.
The Root Cause: You Deployed an App, Not a System
Here's the core issue. When you docker compose up -d and walk away, you've deployed an application. But production needs a system — monitoring, automatic restarts, log rotation, backups, resource limits, and reverse proxy configuration that doesn't fall over.
Most 3 AM crashes come down to one of three things:
- Memory exhaustion — your app (or its database) slowly ate all available RAM
- Disk full — logs or temp files filled the drive
- No automatic recovery — the process crashed and nothing restarted it
Let's fix all three.
Step 1: Set Resource Limits (Stop the OOM Killer)
If you're using Docker Compose, you need memory limits. Without them, a single misbehaving container can take down everything on the host.
# docker-compose.yml
services:
  app:
    image: your-app:latest
    deploy:
      resources:
        limits:
          memory: 512M   # hard ceiling — container gets killed past this
          cpus: '1.0'
        reservations:
          memory: 256M   # guaranteed minimum
    restart: unless-stopped  # this alone prevents most 3 AM incidents
  postgres:
    image: postgres:16
    deploy:
      resources:
        limits:
          memory: 1G
    # the official postgres image has no env var for shared_buffers;
    # pass it as a server flag instead, tuned to ~25% of the memory limit
    command: postgres -c shared_buffers=256MB
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
That restart: unless-stopped line is doing heavy lifting. It means Docker will automatically restart crashed containers unless you explicitly stopped them. I'm genuinely surprised how many production setups I've seen without it.
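One way to audit an existing host for this is to ask Docker directly. A rough sketch (container names and counts will obviously vary per host):

```shell
# List each running container's restart policy and how often Docker has restarted it
docker ps -q | xargs docker inspect --format \
  '{{.Name}} policy={{.HostConfig.RestartPolicy.Name}} restarts={{.RestartCount}}'
```

Any line showing `policy=no` is a container that will stay down after its next crash.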
Step 2: Fix the Silent Disk Killer
Docker logs will eat your disk alive if you don't configure rotation. By default, Docker just appends JSON logs forever. I learned this the hard way when a 40GB log file took down a production Postgres instance.
Add this to your Docker daemon config:
// /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
Restart Docker after changing this. Existing containers need to be recreated (not just restarted) to pick up the new logging config.
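In practice that looks something like this (the container name `app` is an example; substitute your own):

```shell
# Apply the new daemon config
sudo systemctl restart docker

# Recreate containers so they pick up the new log options (a plain restart won't)
docker compose up -d --force-recreate

# Verify the rotation settings actually landed
docker inspect --format '{{json .HostConfig.LogConfig}}' app
```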
While you're at it, set up a basic disk monitoring cron job:
#!/bin/bash
# /usr/local/bin/disk-check.sh
# Alert when disk usage crosses 85%

THRESHOLD=85
USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  # swap this for your preferred notification method
  curl -X POST "https://your-webhook-url" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"Disk usage at ${USAGE}% on $(hostname)\"}"
fi
Schedule it every 15 minutes with cron and you'll never be surprised by a full disk again.
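The crontab entry for that is a single line (edit with `crontab -e`; the path assumes the script location above):

```shell
*/15 * * * * /usr/local/bin/disk-check.sh
```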
Step 3: Add Health Checks That Actually Work
Docker health checks let you detect when your app is technically running but not actually working — like when your Node.js server is up but its event loop is blocked.
# docker-compose.yml
services:
  app:
    image: your-app:latest
    healthcheck:
      # note: curl must exist inside the image for this check to work
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s  # grace period for startup
    restart: unless-stopped
But here's the thing most guides skip: your /health endpoint needs to actually check dependencies. Don't just return 200.
// Express health check that actually means something
app.get('/health', async (req, res) => {
  try {
    // check database connection
    await db.query('SELECT 1');
    // check redis if you use it
    await redis.ping();
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    // a 503 makes the healthcheck's `curl -f` fail, so Docker
    // marks the container unhealthy after the configured retries
    res.status(503).json({ status: 'degraded', error: err.message });
  }
});
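Once the endpoint is wired up, it's worth checking both sides by hand. A quick sketch, assuming the container name `app` and port 3000 from the examples above:

```shell
# Hit the endpoint the same way the healthcheck does; -f makes curl exit non-zero on 503
curl -fsS http://localhost:3000/health

# Ask Docker what it currently thinks (healthy / unhealthy / starting)
docker inspect --format '{{.State.Health.Status}}' app
```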
Step 4: Reverse Proxy Configuration That Doesn't Suck
If you're exposing services to the internet, you need a reverse proxy. Caddy has become my go-to because it handles TLS certificates automatically and the config is minimal.
# Caddyfile
yourapp.example.com {
    reverse_proxy app:3000 {
        # active health checks — stop sending traffic to dead upstreams
        health_uri /health
        health_interval 30s
    }

    # basic rate limiting to prevent abuse
    # (requires the caddy-ratelimit plugin; it's not in stock Caddy builds)
    rate_limit {
        zone dynamic {
            key {remote_host}
            events 100
            window 1m
        }
    }

    encode gzip

    log {
        output file /var/log/caddy/access.log {
            roll_size 50mb
            roll_keep 5
        }
    }
}
Caddy handles HTTPS automatically through Let's Encrypt. No certbot cron jobs, no renewal scripts. It just works.
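Before touching a live config, validate it first. Note the `rate_limit` directive above needs a Caddy build that includes the rate-limit plugin (e.g. built with xcaddy); a stock binary will fail validation on that directive:

```shell
# Syntax-check the Caddyfile, then apply it without downtime
caddy validate --config /etc/caddy/Caddyfile
caddy reload --config /etc/caddy/Caddyfile
```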
Step 5: Backups (The Thing You'll Wish You Had)
I know. Backups are boring. But future-you will be incredibly grateful. Here's a minimal but functional approach for Postgres:
#!/bin/bash
# /usr/local/bin/backup-db.sh
set -euo pipefail

BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7

mkdir -p "$BACKUP_DIR"

# dump the database from the running container (custom format, restorable with pg_restore)
docker exec postgres pg_dump -U appuser -Fc appdb > "${BACKUP_DIR}/app_${TIMESTAMP}.dump"

# clean up old backups
find "$BACKUP_DIR" -name "*.dump" -mtime +"$RETENTION_DAYS" -delete

# optional: sync to remote storage
# rclone copy "$BACKUP_DIR" remote:backups/postgres --max-age 24h
Run this daily via cron. Uncomment the rclone line when you've set up remote storage — local-only backups on the same server are better than nothing, but not by much.
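As a sketch, the cron entry plus an occasional restore check (the paths follow the script above; the dump filename is a hypothetical example). A backup you've never test-restored is a hope, not a backup:

```shell
# Daily backup at 03:30
30 3 * * * /usr/local/bin/backup-db.sh

# Periodically verify a dump is actually restorable (lists its contents without restoring)
pg_restore --list /backups/postgres/app_20240101_033000.dump
```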
Prevention: The Checklist
Before you consider any self-hosted deployment "production ready," run through this:
- Resource limits set for every container
- Restart policies configured (unless-stopped at minimum)
- Log rotation enabled at the Docker daemon level
- Health checks that verify actual functionality, not just process liveness
- TLS termination via a reverse proxy with automatic cert renewal
- Automated backups with at least one off-server copy
- Disk and memory monitoring with alerts
- Firewall rules — only expose ports 80, 443, and your SSH port
- Unattended security updates enabled on the host OS
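The firewall item can be sketched with ufw, assuming SSH on the default port 22 (adjust before enabling if you've moved it, or you'll lock yourself out):

```shell
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH; change this if you run SSH on another port
sudo ufw allow 80/tcp    # HTTP (needed for Let's Encrypt HTTP-01 and redirects)
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
```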
You don't need Kubernetes for this. You don't need a managed platform. A single well-configured VPS with Docker Compose can reliably host a surprising amount of production traffic. The key word is well-configured.
The Bigger Picture
Self-hosting is making a comeback for good reasons — cost control, data sovereignty, and honestly just the satisfaction of running your own infrastructure. But the gap between tutorials and production-grade setups is real, and it's where most people get burned.
The pattern is almost always the same: the app itself is fine, but the operational wrapper around it is missing. Add restart policies, resource limits, health checks, log management, and backups, and you've eliminated probably 90% of the 3 AM pages.
Now go set up those log rotation limits before your disk fills up. Ask me how I know this is urgent.