FOLASAYO SAMUEL OLAYEMI

Posted on Jun 25

Deploying a Containerized Backend to a VPS with Docker Compose + GitHub Actions (A Beginner's Runbook)

#devops #programming #tutorial #automation

This is a complete, copy‑pasteable guide for shipping a backend app to a single Linux server using Docker Compose, with a GitHub Actions pipeline that builds the image, scans it, and deploys it over SSH.

It is written to be language- and framework-agnostic. The examples use a Node/TypeScript API with PostgreSQL, Redis, and a background worker, but the same shape works for Python/Django, Go, Java/Spring, Ruby, etc. Anywhere you see your-app, your-org, your-server-ip, or example.com, substitute your own values.

Every file is included in full, and every non-obvious line is explained. The last section — Common errors and how to fix them — is the part most guides skip, and it is the part that will actually save your afternoon. All of it comes from a real deployment, mistakes included.

1. The mental model (read this first)

Before any YAML, understand the shape of what we're building. There are only three places anything lives:

Your Git repository: the single source of truth. Your code, your Dockerfile, your docker-compose.prod.yml, and your CI/CD workflows all live here. You only ever edit things here.
A container registry (we use GHCR, GitHub's built-in registry): a warehouse for the built application image. CI builds the image and pushes it here.
Your server (a plain Linux VPS): it pulls the image from the registry and runs it. It holds exactly two files: the compose file (copied from your repo by the pipeline) and a secrets file (.env) that never leaves the server.

The flow, end to end:

You push to main
      │
      ▼
GitHub Actions: build image ──► push to registry ──► scan image
      │
      ▼
GitHub Actions: SSH to server ──► pull image ──► run migrations ──► start app ──► health-check

The single most important rule: the server is disposable. You never hand-edit files on the server, because the pipeline overwrites them from the repo on every deploy. If you fix something by editing on the server, the next deploy silently erases your fix. Edit in the repo, commit, push. (I learned this one the hard way; see the errors section.)

2. Architecture of the running stack

On the server, Docker Compose runs several containers on a private network. Only the api and worker ports are published, and even those only on loopback (a reverse proxy / ingress handles TLS in front).

Container	What it is	Exposed?
`postgres`	The database	No — internal only
`pgbouncer`	A connection pooler in front of Postgres	No — internal only
`redis`	Cache / job queue / session store	No — internal only
`migrate`	A one-shot container: runs DB migrations, then exits	No
`api`	Your web API process	`127.0.0.1` only
`worker`	Background job processor (same image as api)	`127.0.0.1` only

Two ideas worth internalizing:

One image, two roles. The api and worker are the same built image. They differ only by the command they run. This keeps builds simple and guarantees the API and worker are always the same version.

Boot order matters. Containers must start in dependency order, or you get race conditions: postgres becomes healthy → pgbouncer becomes healthy and migrate runs and exits cleanly (redis has no dependencies and becomes healthy on its own) → only then do api and worker start. Compose enforces this with depends_on + health conditions.

3. The Dockerfile

This is a multi-stage build. Each FROM starts a new stage; only the final stage becomes your shipped image. The point of multi-stage is that build tools (compilers, dev dependencies) stay out of the final image, making it smaller and safer.

# syntax=docker/dockerfile:1

# =========================================================
# Base — package manager + workdir, pinned for reproducibility
# =========================================================
FROM node:22-alpine AS base
RUN apk add --no-cache libc6-compat
RUN npm install -g corepack@latest && corepack enable && corepack prepare pnpm@10.16.1 --activate
WORKDIR /app

# =========================================================
# 1. Dependencies (including dev deps — needed to build)
# =========================================================
FROM base AS deps
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --frozen-lockfile

# =========================================================
# 2. Build — compile source to /dist
# =========================================================
FROM base AS build
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENV NODE_ENV=production
RUN pnpm build

# =========================================================
# 3. Production dependencies only (no dev deps)
# =========================================================
FROM base AS prod-deps
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --prod --frozen-lockfile

# =========================================================
# 4. Runner — the final, minimal image
# =========================================================
FROM node:22-alpine AS runner
# tini = correct PID 1 / signal handling; wget = used by container healthchecks.
RUN apk add --no-cache libc6-compat wget tini

# Remove package managers from the runtime image. Migrations call the migration
# CLI via `node` directly, so npm/pnpm aren't needed at runtime and removing
# them shrinks the attack surface (image scanners flag their bundled CVEs).
RUN rm -rf /usr/local/lib/node_modules/npm /usr/local/bin/npm /usr/local/bin/npx \
    /usr/local/bin/corepack /usr/local/lib/node_modules/corepack || true
WORKDIR /app

ENV NODE_ENV=production
ENV PORT=4000
ENV WORKER_PORT=4001

# Run as a NON-root user. Never run app containers as root.
RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 appuser

COPY --from=prod-deps --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=build     --chown=appuser:nodejs /app/dist         ./dist
COPY --chown=appuser:nodejs package.json ./

USER appuser

EXPOSE 4000 4001

# tini is the entrypoint so signals (Ctrl-C, container stop) are handled properly.
ENTRYPOINT ["/sbin/tini", "--"]
# Default command = API. The worker overrides this in the compose file.
CMD ["node", "dist/main"]

Why each stage exists, in plain terms:

base: the shared starting point. It holds the language runtime (pinned to a major version) and the package manager (pnpm pinned to an exact version) so builds are reproducible.
deps: installs all dependencies (including dev tools) because you need them to compile.
build: compiles your source into a dist/ folder.
prod-deps: installs only production dependencies into a clean folder — this is what ships.
runner: the final image. It copies in the compiled dist/ and the production-only node_modules, runs as a non-root user, and deliberately removes package managers to reduce CVEs.

Adapting to other stacks: Python would pip install into a venv in a build stage and copy the venv into a slim runtime; Go would compile a static binary in a build stage and copy just the binary into a scratch/distroless image. The pattern is identical: build fat, ship thin, run as non-root.

A small but important detail: the runtime image keeps wget because the container's own healthcheck uses it. If you strip it out, your healthchecks silently break.

4. docker-compose.prod.yml: the whole stack in one file

This is the file that runs on the server. It is self-contained: the only other file it needs is .env. No source code on the server, no separate init scripts; everything is inlined.

Requires Docker Compose v2.23.1+ (for the inline configs.content feature used below). Check with docker compose version.

name: your-app

# Shared application environment. Secrets are interpolated from .env.
# Defining them once here and reusing via a YAML anchor avoids copy-paste drift.
x-app-env: &app-env
  NODE_ENV: production
  PORT: "4000"
  WORKER_PORT: "4001"
  # The app connects through pgbouncer; the migrator connects to postgres directly.
  DATABASE_URL: postgresql://app_user:${APP_DB_PASSWORD}@pgbouncer:5432/appdb
  DATABASE_MIGRATOR_URL: postgresql://migrator_user:${MIGRATOR_DB_PASSWORD}@postgres:5432/appdb
  REDIS_URL: redis://redis:6379
  JWT_SECRET: ${JWT_SECRET}
  S3_ENDPOINT: ${S3_ENDPOINT}
  S3_BUCKET: ${S3_BUCKET}
  S3_ACCESS_KEY: ${S3_ACCESS_KEY}
  S3_SECRET_KEY: ${S3_SECRET_KEY}
  LOG_LEVEL: ${LOG_LEVEL:-info}

configs:
  # The database init script, inlined. It runs ONCE, only when the postgres
  # data volume is first created (i.e. an empty database). Passwords are
  # interpolated from .env, so the committed compose file contains no secrets.
  postgres_init:
    content: |
      CREATE ROLE app_user      WITH LOGIN PASSWORD '${APP_DB_PASSWORD}';
      CREATE ROLE migrator_user WITH LOGIN PASSWORD '${MIGRATOR_DB_PASSWORD}';

      -- Timeouts set at the ROLE level. Under pgbouncer transaction pooling,
      -- per-session SETs don't reliably stick, so role-level is the safe place.
      ALTER ROLE app_user SET statement_timeout = '15s';
      ALTER ROLE app_user SET idle_in_transaction_session_timeout = '15s';

      GRANT CONNECT ON DATABASE appdb TO app_user, migrator_user;

      -- The migrator needs to create schemas, so it needs CREATE on the database.
      -- Without this, the first migration fails: "permission denied for database".
      GRANT CREATE ON DATABASE appdb TO migrator_user;

      -- Many ORMs write a "migrations" bookkeeping table into a custom schema
      -- BEFORE running the migration that would create that schema a chicken
      -- and egg. Pre-create the schema here so the first run can't fail with
      -- "schema ... does not exist". (Use the schema name YOUR app expects.)
      CREATE SCHEMA IF NOT EXISTS platform AUTHORIZATION migrator_user;

      GRANT USAGE, CREATE ON SCHEMA public TO migrator_user;
      GRANT USAGE          ON SCHEMA public TO app_user;

      -- Tables the migrator creates later should be usable by the app user.
      ALTER DEFAULT PRIVILEGES FOR ROLE migrator_user IN SCHEMA public
        GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
      ALTER DEFAULT PRIVILEGES FOR ROLE migrator_user IN SCHEMA public
        GRANT USAGE, SELECT ON SEQUENCES TO app_user;

services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD in .env}
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    configs:
      - source: postgres_init
        target: /docker-entrypoint-initdb.d/01-init.sql
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d appdb']
      interval: 5s
      timeout: 5s
      retries: 10
    networks: [backend]
    # NOT published — the database must never be reachable from the internet.

  pgbouncer:
    image: edoburu/pgbouncer:latest
    restart: unless-stopped
    environment:
      DB_HOST: postgres
      DB_NAME: appdb
      DB_USER: app_user
      DB_PASSWORD: ${APP_DB_PASSWORD:?set APP_DB_PASSWORD in .env}
      AUTH_TYPE: scram-sha-256
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 200
      DEFAULT_POOL_SIZE: 20
      # DB drivers send these as connection "startup parameters". In transaction
      # pooling mode pgbouncer rejects unknown ones with "unsupported startup
      # parameter". List the ones your driver sends so pgbouncer tolerates them.
      IGNORE_STARTUP_PARAMETERS: extra_float_digits,statement_timeout,lock_timeout,idle_in_transaction_session_timeout
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'pg_isready', '-h', '127.0.0.1', '-p', '5432', '-U', 'app_user', '-d', 'appdb']
      interval: 5s
      timeout: 3s
      retries: 10
    networks: [backend]

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    # noeviction: this Redis holds real state (jobs, sessions), not just cache,
    # so fail loudly rather than silently dropping keys. AOF persists to disk.
    command: ['redis-server', '--maxmemory', '256mb', '--maxmemory-policy', 'noeviction', '--appendonly', 'yes']
    volumes:
      - redis_data:/data
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
      interval: 5s
      timeout: 3s
      retries: 10
    networks: [backend]

  # One-shot migrations. Must exit 0 before api/worker start.
  migrate:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: 'no'
    command: ['node', 'dist/migrate']   # however YOUR app runs migrations
    environment:
      <<: *app-env
    depends_on:
      postgres:
        condition: service_healthy
    networks: [backend]

  api:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: unless-stopped
    command: ['node', 'dist/main']
    environment:
      <<: *app-env
      PROCESS_ROLE: api
    ports:
      - '127.0.0.1:4000:4000'   # loopback only; reverse proxy sits in front
    depends_on:
      migrate:
        condition: service_completed_successfully
      pgbouncer:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:4000/api/health']
      interval: 15s
      timeout: 5s
      retries: 10
      start_period: 120s   # grace period for cold start before failures count
    networks: [backend]

  worker:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: unless-stopped
    command: ['node', 'dist/worker']
    environment:
      <<: *app-env
      PROCESS_ROLE: worker
    ports:
      - '127.0.0.1:4001:4001'
    depends_on:
      migrate:
        condition: service_completed_successfully
      pgbouncer:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:4001/health']
      interval: 15s
      timeout: 5s
      retries: 10
      start_period: 120s
    networks: [backend]

volumes:
  postgres_data:
  redis_data:

networks:
  backend:
    driver: bridge

The parts that trip people up, explained

name: your-app is the Compose project name. It is not cosmetic: Compose prefixes your volume names with it (e.g. your-app_postgres_data). If you change this name, Compose looks for differently named volumes and your database appears to vanish. The data is still on disk under the old name, but the stack now points at a new, empty volume. Pin this and never change it. This is the single most dangerous footgun in the whole file.
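
To make the naming rule concrete, here is a minimal shell sketch. The helper name compose_volume_name is hypothetical (it is not a Compose command); it only mirrors the default "project_volume" naming that applies when a volume has no explicit name: of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the volume name Compose uses by default for a given
# project name and volume key ("<project>_<volume>").
compose_volume_name() {
  printf '%s_%s\n' "$1" "$2"
}

compose_volume_name your-app postgres_data      # prints: your-app_postgres_data
compose_volume_name renamed-app postgres_data   # prints: renamed-app_postgres_data

# On the server (requires Docker), list the volumes that actually exist:
#   docker volume ls --filter name=postgres_data
```

If the second name is what the stack now expects while only the first exists on disk, the project name has changed and the data is sitting in the old volume.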

x-app-env: &app-env defines a YAML anchor (a reusable block). Each service then writes <<: *app-env to merge that block in (*app-env is a reference to the anchor). This is why all three app containers share identical env without copy-paste. If you delete the anchor line but leave the *app-env references, the file won't parse, because the references point at nothing.

${VAR:?error message}: fail fast. If VAR is unset or empty in .env, Compose refuses to start and prints your message instead of booting with a broken config.

${VAR:-default}: use default if VAR is unset or empty. Good for optional tuning values.
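
Compose borrows both spellings from POSIX shell parameter expansion, so their behaviour can be tried in a terminal without Docker. The sketch below is only an illustration in plain shell; the function names are hypothetical, and the variable names are the ones used in the compose file above.

```shell
#!/usr/bin/env bash
# Print LOG_LEVEL if it is set and non-empty, otherwise the default "info".
resolve_log_level() {
  printf '%s\n' "${LOG_LEVEL:-info}"
}

# Fail (non-zero status, message on stderr) when POSTGRES_PASSWORD is unset or
# empty. The body runs in a subshell so a failure does not end the caller.
require_postgres_password() (
  : "${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD in .env}"
)

unset LOG_LEVEL POSTGRES_PASSWORD
resolve_log_level                  # prints: info
LOG_LEVEL=debug resolve_log_level  # prints: debug
require_postgres_password 2>/dev/null || echo "refused to start: POSTGRES_PASSWORD missing"
```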

configs: with inline content: lets you ship the database init SQL inside the compose file, with no separate file to copy. It's mounted into Postgres's docker-entrypoint-initdb.d/, which Postgres runs only on first boot of an empty data volume. Remember that last part; it comes up again in the migration errors below.

depends_on with condition: this is what gives you correct boot order. service_healthy waits for a container's healthcheck to pass; service_completed_successfully waits for the one-shot migrate to exit 0.

start_period: 120s on healthchecks: during this window, failing health probes don't count against the container. Apps that map hundreds of routes or warm caches can take a while; without a grace period Docker marks them unhealthy before they finish booting.
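
As a rough worked example, the healthcheck numbers can be turned into a time budget. This is an estimate under stated assumptions (every failing probe takes its full timeout, and Docker waits one interval between probes), not Docker's exact scheduling; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough upper bound, in seconds, before a container that never becomes ready
# is marked "unhealthy": the grace period, then `retries` failing probes that
# each cost at most interval + timeout.
worst_case_unhealthy_seconds() {
  local start_period="$1" interval="$2" timeout="$3" retries="$4"
  echo $(( start_period + retries * (interval + timeout) ))
}

# Values from the api/worker healthchecks in the compose file above:
# start_period 120s, interval 15s, timeout 5s, retries 10.
worst_case_unhealthy_seconds 120 15 5 10   # prints: 320
```

Under those assumptions the estimate (320s) is slightly longer than the 5-minute (300s) wait in the deploy script in section 7, so for a container that never starts, the script's own deadline may fire first and report "not healthy in time" rather than "unhealthy". Either way the deploy fails and prints the logs.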

Why pgbouncer at all? A connection pooler sits between your app and Postgres so that many short app connections share a small number of real database connections. That keeps the number of backend connections Postgres has to hold open small and steady. The catch is that transaction pooling mode is stricter about connection "startup parameters", hence IGNORE_STARTUP_PARAMETERS (more in the errors section).

5. The secrets file: .env.example

Commit .env.example (a template with empty values). The real .env is created on the server by hand, once, and is never committed.

# Copy to ".env" (literal name) next to docker-compose.prod.yml ON THE SERVER.
# docker compose reads it automatically for ${...} interpolation.
# NEVER commit the real .env.

# --- Secrets (generate once; store in a password manager) ------------------
# IMPORTANT: these values go INTO connection URLs, so use URL-SAFE values.
# `openssl rand -base64` can emit + / = which break URL parsing — prefer hex:
#   openssl rand -hex 32
POSTGRES_PASSWORD=
APP_DB_PASSWORD=
MIGRATOR_DB_PASSWORD=
JWT_SECRET=                 # at least 32 characters

# --- External object storage (S3-compatible) -------------------------------
S3_ENDPOINT=
S3_BUCKET=
S3_ACCESS_KEY=
S3_SECRET_KEY=
S3_REGION=

# --- Optional overrides (sensible defaults applied in compose) -------------
# LOG_LEVEL=info

# --- Image (the deploy workflow sets this automatically; only set to pin) --
# BACKEND_IMAGE=ghcr.io/your-org/your-app:latest

Generate passwords with openssl rand -hex 32, not -base64. Base64 output can contain +, /, and =, which break when embedded in a postgresql://user:password@host/db URL. Hex is always URL-safe. This is a genuinely sneaky bug: the password "looks fine" but the connection string is silently malformed.
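
A quick way to check a value before it goes into .env is sketched below. The function is_url_safe is a hypothetical helper: it accepts only the characters RFC 3986 calls "unreserved" (letters, digits, and - . _ ~), which is stricter than a URL parser strictly requires but never needs escaping.

```shell
#!/usr/bin/env bash
# Return 0 when the value is non-empty and contains only URL "unreserved"
# characters; return 1 otherwise.
is_url_safe() {
  local LC_ALL=C   # make the A-Z / a-z ranges byte-based
  case "$1" in
    ''|*[!A-Za-z0-9._~-]*) return 1 ;;
    *) return 0 ;;
  esac
}

is_url_safe "9f2c4e0a1b" && echo "hex-style value: safe"
is_url_safe "k3+Zp/Q9==" || echo "base64-style value: needs escaping"

# Typical use when generating a secret (requires openssl):
#   pw="$(openssl rand -hex 32)"; is_url_safe "$pw" && echo "APP_DB_PASSWORD=$pw"
```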

6. CI part 1: code-quality.yml (runs first, on every push)

This workflow runs static analysis / a quality gate. The deploy workflow's jobs only run if this one succeeds, so it acts as a gate. (Swap SonarQube for whatever you use: ESLint, CodeQL, etc.)

name: CodeQuality Checks

on:
  push:
    branches:
      - main

jobs:
  code-quality:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
        with:
          fetch-depth: 0   # full history; some scanners need it for blame/new-code

      - name: Static analysis scan
        uses: sonarsource/sonarqube-scan-action@v5
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}

      - name: Quality Gate
        uses: sonarsource/sonarqube-quality-gate-action@v1
        timeout-minutes: 5
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          # For a SELF-HOSTED scanner you MUST pass the host URL here too, or the
          # gate action defaults to the cloud service, can't find your project,
          # and fails with a confusing HTTP 404.
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}

The one thing worth highlighting: a self-hosted quality scanner needs its SONAR_HOST_URL on both the scan step and the gate step. Miss it on the gate step and you get a 404 that looks like a credentials problem but isn't.

7. CI part 2: main.yml (build, scan, deploy)

This is the workhorse. It triggers after the quality workflow completes, and runs four jobs in sequence: dependency audit → build & push image → scan image → deploy.

name: Deploy

on:
  workflow_run:
    workflows: ["CodeQuality Checks"]   # only runs after the quality workflow
    types: [completed]
    branches: [main]

env:
  REGISTRY: ghcr.io

concurrency:
  group: deploy
  cancel-in-progress: false   # never interrupt an in-flight deploy

jobs:
  # 1) Block the deploy if a production dependency has a known high-severity CVE
  dependency-check:
    name: Dependency Vulnerability Check
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      - name: Audit production dependencies (blocking)
        run: pnpm audit --prod --audit-level=high
      - name: Audit everything (report only)
        run: pnpm audit --audit-level=high
        continue-on-error: true

  # 2) Build the image once, push to the registry
  build-and-push:
    name: Build & Push Image
    needs: dependency-check
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # needed to push to GHCR
    outputs:
      image: ${{ steps.image-name.outputs.image }}
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - name: Compute lowercase image name
        id: image-name
        run: echo "image=ghcr.io/$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]')" >> "$GITHUB_OUTPUT"
      - uses: docker/setup-buildx-action@v4
      - name: Log in to registry
        uses: docker/login-action@v4
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Image metadata (tags)
        id: meta
        uses: docker/metadata-action@v6
        with:
          images: ${{ steps.image-name.outputs.image }}
          tags: |
            type=sha,prefix=sha-
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push
        uses: docker/build-push-action@v7
        with:
          context: .
          file: ./Dockerfile
          target: runner
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # 3) Scan the built image for OS/package CVEs; fail on CRITICAL/HIGH
  image-scan:
    name: Container Security Scan
    needs: build-and-push
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: read
    steps:
      - name: Log in to registry
        uses: docker/login-action@v4
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Trivy scan
        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # pin actions by SHA
        with:
          image-ref: ${{ needs.build-and-push.outputs.image }}:latest
          severity: CRITICAL,HIGH
          exit-code: '1'
          ignore-unfixed: true   # don't fail on CVEs with no fix available yet

  # 4) Deploy: copy compose to server, pull image, migrate, start, health-check
  deploy:
    name: Deploy to Server
    needs: [build-and-push, image-scan]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}

      - name: Ensure deploy directory exists
        uses: appleboy/ssh-action@v1.2.5
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          script: mkdir -p "${{ secrets.DEPLOY_PATH }}"

      # The compose file is the source of truth in git and is shipped to the
      # server EVERY deploy (overwrite: true), so the server can never drift.
      - name: Copy compose file to server
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          source: docker-compose.prod.yml
          target: ${{ secrets.DEPLOY_PATH }}
          overwrite: true

      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1.2.5
        env:
          GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
          IMAGE: ${{ needs.build-and-push.outputs.image }}
          DEPLOY_PATH: ${{ secrets.DEPLOY_PATH }}
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          envs: GHCR_TOKEN,IMAGE,DEPLOY_PATH
          script: |
            set -euo pipefail
            echo "$GHCR_TOKEN" | docker login ghcr.io -u ${{ secrets.GHCR_USERNAME }} --password-stdin

            cd "$DEPLOY_PATH"

            # .env holds all secrets and is never in git — it must already exist.
            if [ ! -f .env ]; then
              echo "ERROR: $DEPLOY_PATH/.env is missing. Create it from .env.example first."
              exit 1
            fi

            export BACKEND_IMAGE="${IMAGE}:latest"
            docker pull "$BACKEND_IMAGE"

            # No `down`: named volumes are never touched, so zero data loss and
            # no DB downtime. The one-shot migrate runs forward-only migrations
            # and must exit 0; if it fails, `up` returns non-zero and we stop.
            docker compose -f docker-compose.prod.yml up -d --remove-orphans

            # Wait on the CONTAINER healthcheck (the single source of truth),
            # not a separate host-side probe. 5-minute budget for cold starts.
            echo "Waiting for services to become healthy (up to 5 min)..."
            deadline=$((SECONDS + 300))
            for svc in api worker; do
              cid="$(docker compose -f docker-compose.prod.yml ps -q "$svc")"
              if [ -z "$cid" ]; then
                echo "ERROR: $svc container not created."; docker compose -f docker-compose.prod.yml ps; exit 1
              fi
              while true; do
                status="$(docker inspect -f '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$cid" 2>/dev/null || echo missing)"
                case "$status" in
                  healthy)   echo "$svc: healthy"; break ;;
                  unhealthy) echo "ERROR: $svc unhealthy. Logs:"; docker compose -f docker-compose.prod.yml logs --tail=100 "$svc"; exit 1 ;;
                esac
                if [ "$SECONDS" -ge "$deadline" ]; then
                  echo "ERROR: $svc not healthy in time. Logs:"; docker compose -f docker-compose.prod.yml logs --tail=100 "$svc"; exit 1
                fi
                sleep 5
              done
            done
            echo "All services healthy."

            # Prune only AFTER success, so the previous image stays for rollback.
            docker image prune -f
            docker compose -f docker-compose.prod.yml ps
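
One step in the workflow above that is easy to overlook is "Compute lowercase image name". Registry image names must be lowercase, while github.repository keeps whatever casing the owner and repository use. The same transformation, as a standalone sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Build a GHCR image name from an "owner/repo" string, lowercased the same
# way the workflow step does it (tr on the whole string).
ghcr_image_name() {
  printf 'ghcr.io/%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

ghcr_image_name "Your-Org/Your-App"   # prints: ghcr.io/your-org/your-app
```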

Why the deploy job is shaped this way

It triggers off the quality workflow (workflow_run), and the first job checks that run's conclusion, so a bad commit that fails quality never reaches the server.
The image is built once in CI and pushed to the registry. The server only pulls it; it never builds. Builds are slow and resource-hungry; your small VPS shouldn't do them.
Actions are pinned: the Trivy action is pinned to a commit SHA, not a moving tag, so a compromised release can't silently change what runs in your pipeline. The other third-party actions here use version tags; pinning those by SHA as well is the stricter option.
No docker compose down: bringing the stack down removes containers and (with -v) volumes. We only ever up -d, which recreates only the containers whose image or configuration changed and leaves the database volume untouched. As long as the postgres service itself is unchanged, that means zero data-layer downtime.
The health gate waits on the container's own healthcheck via docker inspect, with a 5-minute budget. This is more reliable than a separate curl from the host, because it uses the exact probe defined in compose and accounts for slow cold starts.
Prune happens last, only after the new version is confirmed healthy, so if a deploy fails the previous image is still on the server for a fast manual rollback.
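
For a rollback after a deploy that passed its health gate (when the old image may already have been pruned), one option is to point the stack at an older sha- tag from the registry. The sketch below is an assumption about how that could look, not part of the workflow: the function names are hypothetical, and abc1234 stands in for a real short SHA from your registry's tag list. It only swaps the application image; it does not undo any database migration.

```shell
#!/usr/bin/env bash
# Hypothetical rollback helpers; names are illustrative.

# Build the image reference for a short commit SHA, matching the "sha-" prefix
# configured in the metadata step (type=sha,prefix=sha-).
rollback_image_ref() {
  local image="$1" short_sha="$2"
  printf '%s:sha-%s\n' "$image" "$short_sha"
}

# Re-point the stack at that image. BACKEND_IMAGE set in the shell overrides
# the default in the compose file; `up -d` pulls the tag if it is not local.
rollback_to() {
  local ref
  ref="$(rollback_image_ref "$1" "$2")"
  BACKEND_IMAGE="$ref" docker compose -f docker-compose.prod.yml up -d --remove-orphans
}

# Example (run in the deploy directory on the server):
#   rollback_to ghcr.io/your-org/your-app abc1234
```

The next push to main deploys :latest again, so a rollback like this is a stopgap until the fix lands in the repo.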

8. Keeping dependencies fresh: dependabot.yml

Drop this in .github/dependabot.yml. It opens grouped, scheduled PRs to bump dependencies, GitHub Actions versions, and your Docker base image.

version: 2
updates:
  - package-ecosystem: "npm"     # covers package-lock / pnpm-lock
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
    open-pull-requests-limit: 10
    groups:                       # group related bumps into ONE PR to review
      framework:
        patterns: ["@nestjs/*"]
      dev-tooling:
        dependency-type: "development"

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns: ["*"]

  - package-ecosystem: "docker"   # bumps your Dockerfile base image
    directory: "/"
    schedule:
      interval: "weekly"

Grouping is the feature that makes Dependabot bearable: instead of twenty separate PRs, you get a handful of grouped ones you can review and merge together.

9. Step-by-step: your first deploy

One-time setup

On GitHub — add these repository secrets (Settings → Secrets and variables → Actions):

Secret	What it is
`SERVER_IP`	Your server's IP, e.g. `your-server-ip`
`SERVER_USER`	SSH user, e.g. `deploy` or `root`
`SERVER_SSH_KEY`	The private SSH key for that user (full text)
`DEPLOY_PATH`	Where the app lives on the server, e.g. `/home/apps/your-app`
`GHCR_TOKEN`	A token that can read your registry images (used by the server to pull)
`GHCR_USERNAME`	The username/org for the registry login
`SONAR_TOKEN` / `SONAR_HOST_URL`	If you use a quality gate

On the server — install Docker + Compose, create the deploy directory and the secrets file:

# Install Docker (official convenience script) and verify Compose v2.23.1+
curl -fsSL https://get.docker.com | sh
docker compose version

mkdir -p /home/apps/your-app
cd /home/apps/your-app

# Create the real .env from your template, then fill in generated secrets.
nano .env
# POSTGRES_PASSWORD=...     (openssl rand -hex 32)
# APP_DB_PASSWORD=...       (openssl rand -hex 32)
# MIGRATOR_DB_PASSWORD=...  (openssl rand -hex 32)
# JWT_SECRET=...            (openssl rand -hex 32)
# S3_* = ...

That's it for the server. You will not edit anything else here.

Every deploy after that

# 1. Make your change IN THE REPO (code, or the compose file, or a workflow).
# 2. ALWAYS validate the compose file before committing:
docker compose -f docker-compose.prod.yml config >/dev/null && echo "compose OK"

# 3. Commit and push to main:
git add .
git commit -m "your change"
git push origin main

Then watch the Actions tab. The quality workflow runs, then the deploy workflow builds, scans, and ships. Done.

Make docker compose config a reflex. It parses and fully resolves the file (including anchors and .env interpolation) in about a second. It catches the entire class of "the deploy died instantly on a YAML typo" problems before you push. In my experience, most failed first deploys come down to a malformed compose file that this one command would have caught.
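
One way to make the check automatic is a small function called from a git hook. This is a sketch under assumptions: the function name validate_compose is hypothetical, and it expects an env file with placeholder values for the variables guarded by ${VAR:?...}, because config resolves those guards too and stops on the first one that is unset or empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-push check: return non-zero when the compose file is
# missing (status 2) or does not parse and resolve (docker's own status).
validate_compose() {
  local compose_file="${1:-docker-compose.prod.yml}" env_file="${2:-.env}"
  if [ ! -f "$compose_file" ]; then
    echo "missing $compose_file" >&2
    return 2
  fi
  # `config --quiet` validates and interpolates without printing the result.
  docker compose --env-file "$env_file" -f "$compose_file" config --quiet
}

# Example use from .git/hooks/pre-push (requires Docker locally):
#   validate_compose docker-compose.prod.yml .env || exit 1
```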

10. Common errors and how to fix them

This is the section I wish every tutorial had. Every one of these is real. They're roughly in the order you hit them as the pipeline gets further each time.

"My edits to the server file keep reverting!"

Cause: you edited docker-compose.prod.yml on the server, but the pipeline copies the repo's version over it (overwrite: true) on every deploy.
Fix: edit the file in the repo, not the server. The server copy is generated output. This is by design — it guarantees the server matches what's reviewed in git. Retrain the muscle memory: never edit on the box.

SSH step fails with "handshake failed" / "permission denied (publickey)"

Cause: the SERVER_SSH_KEY secret is wrong, or the matching public key isn't in the server's ~/.ssh/authorized_keys.
Fix: put the private key (the whole thing, including the BEGIN/END lines) in the secret. Add its public half to authorized_keys for SERVER_USER. Test locally first: ssh -i your_key user@your-server-ip.
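
A truncated paste is the most common way this secret goes wrong, so it can help to check the key file before copying it into GitHub. The helper below is a hypothetical sketch: it only checks that the first and last non-blank lines are the BEGIN and END markers of a private key, not that the key itself is valid.

```shell
#!/usr/bin/env bash
# Return 0 when the file's first non-blank line is a "BEGIN ... PRIVATE KEY"
# header and its last non-blank line is the matching "END ... PRIVATE KEY".
has_full_private_key() {
  local first last
  first="$(grep -v '^[[:space:]]*$' "$1" 2>/dev/null | head -n 1)"
  last="$(grep -v '^[[:space:]]*$' "$1" 2>/dev/null | tail -n 1)"
  case "$first" in "-----BEGIN "*"PRIVATE KEY-----") ;; *) return 1 ;; esac
  case "$last"  in "-----END "*"PRIVATE KEY-----")   ;; *) return 1 ;; esac
}

# Example (deploy_key is a placeholder file name):
#   ssh-keygen -t ed25519 -f deploy_key -N ""
#   has_full_private_key deploy_key && echo "key file looks complete"
```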

Registry login fails: "Error: Cannot perform an interactive login from a non TTY device" or empty password

Cause: the registry token secret is empty or unset, so the docker login gets no password and tries to go interactive.
Fix: set GHCR_TOKEN (and GHCR_USERNAME). Always pipe it: echo "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USERNAME" --password-stdin.

"stat /path/.env.docker: no such file or directory"

Cause: the compose file references an env file (env_file: .env.docker) that doesn't exist on the server.
Fix: either create that file, or — better — drop the separate env file and define config inline in the compose x-app-env block, reading secrets from the standard .env. One fewer file to manage.

`yaml: line 2: mapping values are not allowed in this context`

Cause: the compose file is malformed — almost always near the top. The classic version: the x-app-env: &app-env anchor line got deleted (often during hand-edits or a bad copy-paste), leaving the env keys with no parent, or a comment lost its leading #.
Fix: restore the structure. Confirm the anchor exists and the references match. Then run docker compose config to verify it parses before committing. If you copied the file from somewhere and it got mangled, download the raw file instead of pasting, because pasted text can drop indentation or whole lines.

Migration fails: `schema "..." does not exist` (Postgres code 3F000)

Cause: the ORM tries to create its migrations bookkeeping table inside a custom schema before the migration that would create that schema has run — a chicken-and-egg on a fresh database.
Fix: pre-create the schema in your DB init SQL: CREATE SCHEMA IF NOT EXISTS your_schema AUTHORIZATION migrator_user;. Important: init SQL only runs on a fresh, empty volume. If your volume already exists, also create the schema manually once:

docker compose exec postgres psql -U postgres -d appdb -c \
  "CREATE SCHEMA IF NOT EXISTS your_schema AUTHORIZATION migrator_user;"

Migration fails: `permission denied for database`

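Cause: the migrator role can create tables in schemas it already owns, but it wasn't granted CREATE on the database itself, which Postgres requires before a role can create a new schema.
Fix: in init SQL: GRANT CREATE ON DATABASE appdb TO migrator_user; (and run the same GRANT manually once if the volume already exists, since init SQL won't re-run).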

App can't connect: `unsupported startup parameter: statement_timeout` (then `lock_timeout`, etc.)

Cause: your DB driver sets session parameters as connection "startup parameters". PgBouncer rejects any startup parameter it doesn't track itself unless it is told to ignore it, and it surfaces them one at a time, so you fix one and hit the next.
Fix: list the whole set at once on the pgbouncer service so PgBouncer ignores them:

IGNORE_STARTUP_PARAMETERS: extra_float_digits,statement_timeout,lock_timeout,idle_in_transaction_session_timeout

Enforce the actual timeouts at the role level in init SQL (ALTER ROLE ... SET statement_timeout = ...), because under transaction pooling per-session SETs don't reliably stick.

Deploy reports "did not become healthy in time" but the app log says it started

Cause: the health gate is stricter or faster than the app's real startup, or the health-check path/port is wrong.
Fix: first confirm the app actually serves the health route from inside the container:

docker compose exec api wget -qO- http://localhost:4000/api/health

If that returns OK, it's a timing issue — raise start_period and the deploy's wait budget. If it 404s, fix the path in the healthcheck test:. If it says wget: not found, your runtime image lacks wget — install it or use a node/curl-based check.
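When it is a timing issue, the deploy's wait budget is just a retry loop around that same probe. The sketch below shows the general shape, assuming nothing about how the actual workflow implements its gate: wait_healthy and its arguments are hypothetical.

```shell
# wait_healthy ATTEMPTS DELAY_SECONDS COMMAND...
# Hypothetical helper: runs COMMAND up to ATTEMPTS times, sleeping
# DELAY_SECONDS between tries, and stops at the first success.
wait_healthy() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Only sleep if another attempt remains.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "did not become healthy in time" >&2
  return 1
}

# Example (service name, port, and path are the ones used above):
#   wait_healthy 30 2 docker compose exec -T api \
#     wget -qO- http://localhost:4000/api/health
```

Raising the budget means raising ATTEMPTS or DELAY_SECONDS; the total wait is roughly their product, which should comfortably exceed the app's real startup time.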

The database "disappeared" after I renamed something

Cause: you changed the compose name: (project name). Volumes are namespaced by project name, so the stack now points at a new, empty volume. Your old data is still on disk under the old name.
Fix: never change the project name. To find orphaned data: docker volume ls | grep postgres. This is why a pg_dump backup before any risky change is non-negotiable in production.
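A backup only helps if taking one is a single command. Here is a minimal sketch, assuming the service, user, and database names used elsewhere in this guide (postgres, postgres, appdb) and that you run it from the directory holding docker-compose.prod.yml; backup_db itself is a hypothetical helper.

```shell
# backup_db [OUTPUT_FILE]
# Hypothetical helper: dumps the database to a timestamped SQL file and
# fails loudly if the dump is empty or pg_dump errors out.
backup_db() {
  out="${1:-backup-$(date +%Y%m%d-%H%M%S).sql}"
  # -T disables the pseudo-TTY so the dump can be redirected cleanly.
  if docker compose -f docker-compose.prod.yml exec -T postgres \
       pg_dump -U postgres -d appdb > "$out" && [ -s "$out" ]; then
    echo "backup written to $out"
  else
    echo "backup failed; do not proceed with the risky change" >&2
    rm -f "$out"
    return 1
  fi
}
```

Copy the resulting file off the server before you touch anything; a backup that lives only on the box you are about to change is not much of a backup.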

Quality gate fails with HTTP 404 (self-hosted scanner)

Cause: the gate step didn't get the scanner host URL, so it defaulted to the cloud service and couldn't find your project.
Fix: pass SONAR_HOST_URL on both the scan and the gate steps.

Dependabot: "security update not possible"

Cause: a vulnerable transitive dependency has no version that satisfies everything else's constraints yet. Common for deep dev-only dependencies.
Fix: if it's dev-only and below your production audit threshold, it's safe to leave until the ecosystem catches up, or add a temporary override/resolution. Don't let a dev-only advisory block production.

A service crashes with an application error (e.g. `TypeError: ... is not iterable`)

Cause: this is not an infrastructure problem. The image built and deployed fine; the app code itself is throwing on boot.
Fix: read it as a signal your pipeline is working: it caught a real code bug before declaring success. This belongs to whoever owns that part of the application code, not to the deploy config. No amount of compose/workflow tweaking fixes a code bug. Hand it to the right developer with the exact stack trace.

11. A pre-deploy checklist

Pin this somewhere:

[ ] docker compose -f docker-compose.prod.yml config passes locally
[ ] All changes are in the repo, nothing edited directly on the server
[ ] .env exists on the server with every required key filled in (URL-safe secrets via openssl rand -hex 32)
[ ] The compose project name: is unchanged
[ ] All required GitHub secrets are set
[ ] Third-party actions are pinned to SHAs
[ ] Healthcheck path/port match what your app actually serves
[ ] You have a recent database backup (pg_dump) before any risky change
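The ".env has every required key" item is easy to automate by comparing the server's .env against .env.example. The following is a rough sketch, not a full dotenv parser: check_env_keys is a hypothetical helper, it only understands plain KEY=value lines, and a value written as empty quotes would still count as filled in.

```shell
# check_env_keys EXAMPLE_FILE ENV_FILE
# Hypothetical helper: every KEY= line in the example file must appear in
# the real env file with a non-empty value.
check_env_keys() {
  example="$1"; envfile="$2"
  if [ ! -f "$example" ] || [ ! -f "$envfile" ]; then
    echo "need both files: $example and $envfile" >&2
    return 1
  fi
  missing=0
  # Extract key names from KEY=... lines; comments and blanks are skipped.
  for key in $(sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$example"); do
    # Use the last assignment of this key, minus the "KEY=" prefix.
    value="$(sed -n "s/^${key}=//p" "$envfile" | tail -n 1)"
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "env OK"
}

# Example, run on the server next to the compose file:
#   check_env_keys .env.example .env
```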

12. Closing lessons

A few things that, in hindsight, mattered more than any single config line:

One source of truth. Edit in the repo; let the server be disposable. Half-adopting this (editing in both places) is worse than not adopting it at all.
Validate before you push. docker compose config turns a 3-minute failed pipeline into a 1-second local check.
Errors get deeper, which is progress. A YAML parse error → a migration error → a connection error → an app boot error is not "still broken"; it's each layer passing in turn. Read the new error as a checkpoint reached.
Know the boundary between infra and app. Connection params, schemas, health timing: infra. A TypeError in your own code: not infra. Recognizing which is which saves you from "fixing" the wrong file.
Protect your data. Pin the project name, never down -v casually, and back up before risky changes. Containers are disposable; your database is not.

Happy shipping.
