This is a complete, copy‑pasteable guide for shipping a backend app to a single Linux server using Docker Compose, with a GitHub Actions pipeline that builds the image, scans it, and deploys it over SSH.
It is written to be language- and framework-agnostic. The examples use a Node/TypeScript API with PostgreSQL, Redis, and a background worker, but the same shape works for Python/Django, Go, Java/Spring, Ruby, etc. Anywhere you see your-app, your-org, your-server-ip, or example.com, substitute your own values.
Every file is included in full, and every non-obvious line is explained. The last section — Common errors and how to fix them — is the part most guides skip, and it is the part that will actually save your afternoon. All of it comes from a real deployment, mistakes included.
1. The mental model (read this first)
Before any YAML, understand the shape of what we're building. There are only three places anything lives:
- Your Git repository: the single source of truth. Your code, your Dockerfile, your docker-compose.prod.yml, and your CI/CD workflows all live here. You only ever edit things here.
- A container registry (we use GHCR, GitHub's built-in registry): a warehouse for the built application image. CI builds the image and pushes it here.
- Your server (a plain Linux VPS): it pulls the image from the registry and runs it. It holds exactly two files: the compose file (copied from your repo by the pipeline) and a secrets file (.env) that never leaves the server.
The flow, end to end:
You push to main
│
▼
GitHub Actions: build image ──► push to registry ──► scan image
│
▼
GitHub Actions: SSH to server ──► pull image ──► run migrations ──► start app ──► health-check
The single most important rule: the server is disposable. You never hand-edit files on the server, because the pipeline overwrites them from the repo on every deploy. If you fix something by editing on the server, the next deploy silently erases your fix. Edit in the repo, commit, push. (I learned this one the hard way; see the errors section.)
2. Architecture of the running stack
On the server, Docker Compose runs several containers on a private network. Nothing is published to the outside world: the API's port (and the worker's health port) are bound to loopback only, and a reverse proxy / ingress in front handles TLS.
| Container | What it is | Exposed? |
|---|---|---|
| postgres | The database | No, internal only |
| pgbouncer | A connection pooler in front of Postgres | No, internal only |
| redis | Cache / job queue / session store | No, internal only |
| migrate | A one-shot container: runs DB migrations, then exits | No |
| api | Your web API process | 127.0.0.1 only |
| worker | Background job processor (same image as api) | 127.0.0.1 only |
Two ideas worth internalizing:
One image, two roles. The api and worker are the same built image. They differ only by the command they run. This keeps builds simple and guarantees the API and worker are always the same version.
Boot order matters. Containers must start in dependency order, or you get race conditions: postgres becomes healthy → pgbouncer and migrate (both wait on postgres) start, while redis comes up on its own → api and worker start only once pgbouncer and redis are healthy and migrate has exited cleanly. Compose enforces this with depends_on + health conditions.
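As a mental model only (this is not how Compose is implemented internally), here is a small TypeScript sketch that turns the depends_on edges of the stack below into start "waves" using Kahn's topological sort. The deps map is transcribed by hand from docker-compose.prod.yml; it ignores the difference between service_healthy and service_completed_successfully and only answers "what must come first".

```typescript
// service -> services it depends_on, transcribed from docker-compose.prod.yml
const deps: Record<string, string[]> = {
  postgres: [],
  redis: [],
  pgbouncer: ["postgres"],
  migrate: ["postgres"],
  api: ["migrate", "pgbouncer", "redis"],
  worker: ["migrate", "pgbouncer", "redis"],
};

// Kahn's algorithm, grouped into waves: every service whose dependencies have all
// started can start now, in parallel with the rest of its wave.
function startWaves(graph: Record<string, string[]>): string[][] {
  // Mutable copy: service -> dependencies that haven't started yet.
  const pending: Record<string, string[]> = {};
  for (const svc of Object.keys(graph)) pending[svc] = graph[svc].slice();

  const waves: string[][] = [];
  while (Object.keys(pending).length > 0) {
    const ready = Object.keys(pending)
      .filter((svc) => pending[svc].length === 0)
      .sort();
    if (ready.length === 0) {
      // Nothing can start: either a cycle or a dependency on an undefined service.
      throw new Error("cycle or unknown dependency among: " + Object.keys(pending).join(", "));
    }
    for (const svc of ready) delete pending[svc];
    for (const svc of Object.keys(pending)) {
      pending[svc] = pending[svc].filter((dep) => ready.indexOf(dep) === -1);
    }
    waves.push(ready);
  }
  return waves;
}

console.log(JSON.stringify(startWaves(deps)));
// → [["postgres","redis"],["migrate","pgbouncer"],["api","worker"]]
```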
3. The Dockerfile
This is a multi-stage build. Each FROM starts a new stage; only the final stage becomes your shipped image. The point of multi-stage is that build tools (compilers, dev dependencies) stay out of the final image, making it smaller and safer.
# syntax=docker/dockerfile:1
# =========================================================
# Base — package manager + workdir, pinned for reproducibility
# =========================================================
FROM node:22-alpine AS base
RUN apk add --no-cache libc6-compat
RUN npm install -g corepack@latest && corepack enable && corepack prepare pnpm@10.16.1 --activate
WORKDIR /app
# =========================================================
# 1. Dependencies (including dev deps — needed to build)
# =========================================================
FROM base AS deps
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --frozen-lockfile
# =========================================================
# 2. Build — compile source to /dist
# =========================================================
FROM base AS build
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENV NODE_ENV=production
RUN pnpm build
# =========================================================
# 3. Production dependencies only (no dev deps)
# =========================================================
FROM base AS prod-deps
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --prod --frozen-lockfile
# =========================================================
# 4. Runner — the final, minimal image
# =========================================================
FROM node:22-alpine AS runner
# tini = correct PID 1 / signal handling; wget = used by container healthchecks.
RUN apk add --no-cache libc6-compat wget tini
# Remove package managers from the runtime image. Migrations call the migration
# CLI via `node` directly, so npm/pnpm aren't needed at runtime and removing
# them shrinks the attack surface (image scanners flag their bundled CVEs).
RUN rm -rf /usr/local/lib/node_modules/npm /usr/local/bin/npm /usr/local/bin/npx \
/usr/local/bin/corepack /usr/local/lib/node_modules/corepack || true
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=4000
ENV WORKER_PORT=4001
# Run as a NON-root user. Never run app containers as root.
RUN addgroup --system --gid 1001 nodejs && \
adduser --system --uid 1001 appuser
COPY --from=prod-deps --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=build --chown=appuser:nodejs /app/dist ./dist
COPY --chown=appuser:nodejs package.json ./
USER appuser
EXPOSE 4000 4001
# tini is the entrypoint so signals (Ctrl-C, container stop) are handled properly.
ENTRYPOINT ["/sbin/tini", "--"]
# Default command = API. The worker overrides this in the compose file.
CMD ["node", "dist/main"]
Why each stage exists, in plain terms:
- base: the shared starting point (the language runtime and package manager), pinned (pnpm to an exact version, Node to a major line) so builds are reproducible.
- deps: installs all dependencies (including dev tools) because you need them to compile.
- build: compiles your source into a dist/ folder.
- prod-deps: installs only production dependencies into a clean folder; this is what ships.
- runner: the final image. It copies in the compiled dist/ and the production-only node_modules, runs as a non-root user, and deliberately removes package managers to reduce CVEs.
Adapting to other stacks: Python would pip install into a venv in a build stage and copy the venv into a slim runtime; Go would compile a static binary in a build stage and copy just the binary into a scratch/distroless image. The pattern is identical: build fat, ship thin, run as non-root.
A small but important detail: the runtime image keeps wget because the container's own healthcheck uses it. If you strip it out, your healthchecks silently break.
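The compose file below probes GET /api/health on port 4000 for the API and GET /health on port 4001 for the worker, and wget -qO- exits non-zero on a 4xx/5xx response. If your app doesn't already answer on those paths, here is a minimal sketch using only Node's built-in node:http. It fits the worker best, since it has no web framework of its own; for the API you would normally add the same route to your framework instead. The function names and the JSON body are assumptions, not anything Docker requires. The sketch also closes the server on SIGTERM, which tini forwards from docker stop; Docker sends SIGKILL if the process is still running after its default 10-second grace period.

```typescript
import { createServer, IncomingMessage, Server, ServerResponse } from "node:http";

type Role = "api" | "worker";

// The path each role's compose healthcheck probes.
const HEALTH_PATH: Record<Role, string> = { api: "/api/health", worker: "/health" };

// Pure routing decision, kept apart from the server so it can be unit-tested.
function healthResponse(role: Role, method: string, url: string): { status: number; body: string } {
  if (method === "GET" && url === HEALTH_PATH[role]) {
    return { status: 200, body: JSON.stringify({ status: "ok", role }) };
  }
  return { status: 404, body: JSON.stringify({ error: "not found" }) };
}

// Hypothetical wiring: serve the health route and shut down cleanly on SIGTERM.
function startHealthServer(role: Role, port: number): Server {
  const server = createServer((req: IncomingMessage, res: ServerResponse) => {
    const { status, body } = healthResponse(role, req.method ?? "", req.url ?? "");
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(body);
  });
  server.listen(port);
  // tini forwards SIGTERM from `docker stop`; stop accepting connections, let
  // in-flight requests finish, then exit before Docker escalates to SIGKILL.
  process.once("SIGTERM", () => server.close(() => process.exit(0)));
  return server;
}
```

In the worker's entry point this might be called as startHealthServer("worker", Number(process.env.WORKER_PORT ?? 4001)), next to whatever starts your job consumer.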
4. docker-compose.prod.yml: the whole stack in one file
This is the file that runs on the server. It is self-contained: the only other file it needs is .env. No source code on the server, no separate init scripts; everything is inlined.
Requires Docker Compose v2.23.1+ (for the inline configs.content feature used below). Check with docker compose version.
name: your-app
# Shared application environment. Secrets are interpolated from .env.
# Defining them once here and reusing via a YAML anchor avoids copy-paste drift.
x-app-env: &app-env
  NODE_ENV: production
  PORT: "4000"
  WORKER_PORT: "4001"
  # The app connects through pgbouncer; the migrator connects to postgres directly.
  DATABASE_URL: postgresql://app_user:${APP_DB_PASSWORD}@pgbouncer:5432/appdb
  DATABASE_MIGRATOR_URL: postgresql://migrator_user:${MIGRATOR_DB_PASSWORD}@postgres:5432/appdb
  REDIS_URL: redis://redis:6379
  JWT_SECRET: ${JWT_SECRET}
  S3_ENDPOINT: ${S3_ENDPOINT}
  S3_BUCKET: ${S3_BUCKET}
  S3_ACCESS_KEY: ${S3_ACCESS_KEY}
  S3_SECRET_KEY: ${S3_SECRET_KEY}
  S3_REGION: ${S3_REGION}
  LOG_LEVEL: ${LOG_LEVEL:-info}
configs:
  # The database init script, inlined. It runs ONCE, only when the postgres
  # data volume is first created (i.e. an empty database). Passwords are
  # interpolated from .env, so the committed compose file contains no secrets.
  postgres_init:
    content: |
      CREATE ROLE app_user WITH LOGIN PASSWORD '${APP_DB_PASSWORD}';
      CREATE ROLE migrator_user WITH LOGIN PASSWORD '${MIGRATOR_DB_PASSWORD}';
      -- Timeouts set at the ROLE level. Under pgbouncer transaction pooling,
      -- per-session SETs don't reliably stick, so role-level is the safe place.
      ALTER ROLE app_user SET statement_timeout = '15s';
      ALTER ROLE app_user SET idle_in_transaction_session_timeout = '15s';
      GRANT CONNECT ON DATABASE appdb TO app_user, migrator_user;
      -- The migrator needs to create schemas, so it needs CREATE on the database.
      -- Without this, the first migration fails: "permission denied for database".
      GRANT CREATE ON DATABASE appdb TO migrator_user;
      -- Many ORMs write a "migrations" bookkeeping table into a custom schema
      -- BEFORE running the migration that would create that schema: a
      -- chicken-and-egg problem. Pre-create the schema here so the first run
      -- can't fail with "schema ... does not exist". (Use the schema name YOUR app expects.)
      CREATE SCHEMA IF NOT EXISTS platform AUTHORIZATION migrator_user;
      GRANT USAGE, CREATE ON SCHEMA public TO migrator_user;
      GRANT USAGE ON SCHEMA public TO app_user;
      -- Tables the migrator creates later should be usable by the app user.
      ALTER DEFAULT PRIVILEGES FOR ROLE migrator_user IN SCHEMA public
        GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
      ALTER DEFAULT PRIVILEGES FOR ROLE migrator_user IN SCHEMA public
        GRANT USAGE, SELECT ON SEQUENCES TO app_user;
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD in .env}
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    configs:
      - source: postgres_init
        target: /docker-entrypoint-initdb.d/01-init.sql
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d appdb']
      interval: 5s
      timeout: 5s
      retries: 10
    networks: [backend]
    # NOT published: the database must never be reachable from the internet.
  pgbouncer:
    image: edoburu/pgbouncer:latest
    restart: unless-stopped
    environment:
      DB_HOST: postgres
      DB_NAME: appdb
      DB_USER: app_user
      DB_PASSWORD: ${APP_DB_PASSWORD:?set APP_DB_PASSWORD in .env}
      AUTH_TYPE: scram-sha-256
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 200
      DEFAULT_POOL_SIZE: 20
      # DB drivers send these as connection "startup parameters". PgBouncer
      # rejects any it doesn't track itself ("unsupported startup parameter"),
      # whatever the pool mode. List the ones your driver sends so pgbouncer
      # tolerates them.
      IGNORE_STARTUP_PARAMETERS: extra_float_digits,statement_timeout,lock_timeout,idle_in_transaction_session_timeout
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'pg_isready', '-h', '127.0.0.1', '-p', '5432', '-U', 'app_user', '-d', 'appdb']
      interval: 5s
      timeout: 3s
      retries: 10
    networks: [backend]
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    # noeviction: this Redis holds real state (jobs, sessions), not just cache,
    # so fail loudly rather than silently dropping keys. AOF persists to disk.
    command: ['redis-server', '--maxmemory', '256mb', '--maxmemory-policy', 'noeviction', '--appendonly', 'yes']
    volumes:
      - redis_data:/data
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
      interval: 5s
      timeout: 3s
      retries: 10
    networks: [backend]
  # One-shot migrations. Must exit 0 before api/worker start.
  migrate:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: 'no'
    command: ['node', 'dist/migrate'] # however YOUR app runs migrations
    environment:
      <<: *app-env
    depends_on:
      postgres:
        condition: service_healthy
    networks: [backend]
  api:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: unless-stopped
    command: ['node', 'dist/main']
    environment:
      <<: *app-env
      PROCESS_ROLE: api
    ports:
      - '127.0.0.1:4000:4000' # loopback only; reverse proxy sits in front
    depends_on:
      migrate:
        condition: service_completed_successfully
      pgbouncer:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:4000/api/health']
      interval: 15s
      timeout: 5s
      retries: 10
      start_period: 120s # grace period for cold start before failures count
    networks: [backend]
  worker:
    image: ${BACKEND_IMAGE:-ghcr.io/your-org/your-app:latest}
    init: true
    restart: unless-stopped
    command: ['node', 'dist/worker']
    environment:
      <<: *app-env
      PROCESS_ROLE: worker
    ports:
      - '127.0.0.1:4001:4001'
    depends_on:
      migrate:
        condition: service_completed_successfully
      pgbouncer:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:4001/health']
      interval: 15s
      timeout: 5s
      retries: 10
      start_period: 120s
    networks: [backend]
volumes:
  postgres_data:
  redis_data:
networks:
  backend:
    driver: bridge
The parts that trip people up, explained
name: your-app is the Compose project name. It is not cosmetic: Compose prefixes your volume names with it (e.g. your-app_postgres_data). If you change this name, Compose looks for differently-named volumes and your database appears to vanish: it's still on disk under the old name, but the stack now points at a new, empty volume. Pin this and never change it. This is the single most dangerous footgun in the whole file.
In x-app-env: &app-env, the &app-env defines a YAML anchor (a reusable block). Each service then writes <<: *app-env to merge that block in (*app-env is a reference to the anchor). This is why all three app containers (migrate, api, worker) share identical env without copy-paste. If you delete the anchor line but leave the *app-env references, the file won't parse, because the references point at nothing.
${VAR:?error message} fails fast. If VAR is unset or empty, Compose refuses to start and prints your message instead of booting with a broken config.
${VAR:-default} falls back to default if VAR is unset or empty. Good for optional tuning values.
configs: with inline content: lets you ship the database init SQL inside the compose file, with no separate file to copy. It's mounted into Postgres's docker-entrypoint-initdb.d/, which the official image runs only on first boot with an empty data volume. Remember that last part; see the migration errors below.
depends_on with condition: this is what gives you correct boot order. service_healthy waits for a container's healthcheck to pass; service_completed_successfully waits for the one-shot migrate to exit 0.
start_period: 120s on healthchecks is a grace period: probe failures during this window don't count toward retries, and the first successful probe ends the window early. Apps that map hundreds of routes or warm caches can take a while to boot; without a grace period Docker marks them unhealthy before they finish starting, and the deploy's health gate fails.
Why pgbouncer at all? A connection pooler sits between your app and Postgres so that many short app connections share a small number of real database connections. Each Postgres connection is a separate server process, so this keeps the database's memory and CPU overhead bounded as the app scales. The catch is that PgBouncer accepts only a small set of connection "startup parameters" out of the box, and under transaction pooling per-session settings don't persist between transactions; hence IGNORE_STARTUP_PARAMETERS and the role-level timeouts (more in the errors section).
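The ${VAR:?} guards only prove a value exists; Compose can't check that JWT_SECRET has the 32 characters the .env template asks for, or that a connection string actually parses. Here is a minimal TypeScript sketch of the same fail-fast idea inside the app, run once at boot. The variable names come from the x-app-env block; the loadConfig function and AppConfig shape are hypothetical. Collecting every problem before throwing means one failed boot reports all the missing keys at once, rather than one per deploy.

```typescript
interface AppConfig {
  databaseUrl: URL;
  redisUrl: URL;
  jwtSecret: string;
  port: number;
  logLevel: string;
}

// Hypothetical boot-time check, mirroring Compose's ${VAR:?} fail-fast idea.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const problems: string[] = [];

  // Like ${VAR:?}: unset and empty both count as missing.
  const read = (name: string): string => {
    const value = env[name] ?? "";
    if (value === "") problems.push(name + " is not set");
    return value;
  };

  // Parse connection strings now, so a malformed one fails at boot, not on the first query.
  const parseUrl = (name: string): URL | undefined => {
    const raw = read(name);
    if (raw === "") return undefined;
    try {
      return new URL(raw);
    } catch (e) {
      problems.push(name + " is not a valid URL");
      return undefined;
    }
  };

  const databaseUrl = parseUrl("DATABASE_URL");
  const redisUrl = parseUrl("REDIS_URL");
  const jwtSecret = read("JWT_SECRET");
  if (jwtSecret !== "" && jwtSecret.length < 32) {
    problems.push("JWT_SECRET must be at least 32 characters");
  }
  const port = Number(env.PORT ?? "4000");
  if (!(port > 0 && port % 1 === 0)) problems.push("PORT must be a positive integer");

  if (problems.length > 0 || !databaseUrl || !redisUrl) {
    throw new Error("Invalid configuration:\n  " + problems.join("\n  "));
  }
  return { databaseUrl, redisUrl, jwtSecret, port, logLevel: env.LOG_LEVEL ?? "info" };
}
```

Call it once at startup, for example const config = loadConfig(process.env), before opening any connections.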
5. The secrets file: .env.example
Commit .env.example (a template with empty values). The real .env is created on the server by hand, once, and is never committed.
# Copy to ".env" (literal name) next to docker-compose.prod.yml ON THE SERVER.
# docker compose reads it automatically for ${...} interpolation.
# NEVER commit the real .env.
# --- Secrets (generate once; store in a password manager) ------------------
# IMPORTANT: these values go INTO connection URLs, so use URL-SAFE values.
# `openssl rand -base64` can emit + / = which break URL parsing — prefer hex:
# openssl rand -hex 32
POSTGRES_PASSWORD=
APP_DB_PASSWORD=
MIGRATOR_DB_PASSWORD=
JWT_SECRET= # at least 32 characters
# --- External object storage (S3-compatible) -------------------------------
S3_ENDPOINT=
S3_BUCKET=
S3_ACCESS_KEY=
S3_SECRET_KEY=
S3_REGION=
# --- Optional overrides (sensible defaults applied in compose) -------------
# LOG_LEVEL=info
# --- Image (the deploy workflow sets this automatically; only set to pin) --
# BACKEND_IMAGE=ghcr.io/your-org/your-app:latest
Generate passwords with openssl rand -hex 32, not -base64. Base64 output can contain +, /, and =, which break when embedded in a postgresql://user:password@host/db URL. Hex is always URL-safe. This is a genuinely sneaky bug: the password "looks fine", but the connection string is silently malformed.
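The same point in TypeScript, using Node's built-in crypto module: randomBytes(32).toString("hex") yields 64 hex characters (the same shape as openssl rand -hex 32), which encodeURIComponent leaves untouched, while every +, / and = a base64 string might contain has to be percent-encoded. The buildDatabaseUrl helper is hypothetical; it shows the defensive option if you must use a password you didn't generate, since libpq-style connection URIs accept percent-encoded credentials. The compose file in this guide interpolates passwords raw (into both the URLs and the init SQL), so sticking to hex is still the simplest fix.

```typescript
import { randomBytes } from "node:crypto";

// Equivalent of `openssl rand -hex 32`: 32 random bytes -> 64 hex characters.
function generateSecret(): string {
  return randomBytes(32).toString("hex");
}

// True when a value can be dropped into a URL verbatim.
function isUrlSafe(value: string): boolean {
  return encodeURIComponent(value) === value;
}

// Hypothetical helper: percent-encode credentials so even +, / and = survive.
function buildDatabaseUrl(user: string, password: string, host: string, port: number, db: string): string {
  return `postgresql://${encodeURIComponent(user)}:${encodeURIComponent(password)}@${host}:${port}/${db}`;
}

const hex = generateSecret();
console.log(hex.length, isUrlSafe(hex)); // 64 true
console.log(isUrlSafe("ab+c/d==")); // false
console.log(buildDatabaseUrl("app_user", "ab+c/d==", "pgbouncer", 5432, "appdb"));
// postgresql://app_user:ab%2Bc%2Fd%3D%3D@pgbouncer:5432/appdb
```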
6. CI part 1: code-quality.yml (runs first, on every push to main)
This workflow runs static analysis / a quality gate. The deploy workflow only triggers if this one succeeds, so it acts as a gate. (Swap SonarQube for whatever you use — ESLint, CodeQL, etc.)
name: CodeQuality Checks
on:
  push:
    branches:
      - main
jobs:
  code-quality:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
        with:
          fetch-depth: 0 # full history; some scanners need it for blame/new-code
      - name: Static analysis scan
        uses: sonarsource/sonarqube-scan-action@v5
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
      - name: Quality Gate
        uses: sonarsource/sonarqube-quality-gate-action@v1
        timeout-minutes: 5
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          # For a SELF-HOSTED scanner you MUST pass the host URL here too, or the
          # gate action defaults to the cloud service, can't find your project,
          # and fails with a confusing HTTP 404.
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
The one thing worth highlighting: a self-hosted quality scanner needs its SONAR_HOST_URL on both the scan step and the gate step. Miss it on the gate step and you get a 404 that looks like a credentials problem but isn't.
7. CI part 2: main.yml (build, scan, deploy)
This is the workhorse. It triggers after the quality workflow completes, and runs four jobs in sequence: dependency audit → build & push image → scan image → deploy.
name: Deploy
on:
  workflow_run:
    workflows: ["CodeQuality Checks"] # only runs after the quality workflow
    types: [completed]
    branches: [main]
env:
  REGISTRY: ghcr.io
concurrency:
  group: deploy
  cancel-in-progress: false # never interrupt an in-flight deploy
jobs:
  # 1) Block the deploy if a production dependency has a known high-severity CVE
  dependency-check:
    name: Dependency Vulnerability Check
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      - name: Audit production dependencies (blocking)
        run: pnpm audit --prod --audit-level=high
      - name: Audit everything (report only)
        run: pnpm audit --audit-level=high
        continue-on-error: true
  # 2) Build the image once, push to the registry
  build-and-push:
    name: Build & Push Image
    needs: dependency-check
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write # needed to push to GHCR
    outputs:
      image: ${{ steps.image-name.outputs.image }}
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - name: Compute lowercase image name
        id: image-name
        run: echo "image=ghcr.io/$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
      - uses: docker/setup-buildx-action@v4
      - name: Log in to registry
        uses: docker/login-action@v4
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Image metadata (tags)
        id: meta
        uses: docker/metadata-action@v6
        with:
          images: ${{ steps.image-name.outputs.image }}
          tags: |
            type=sha,prefix=sha-
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push
        uses: docker/build-push-action@v7
        with:
          context: .
          file: ./Dockerfile
          target: runner
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  # 3) Scan the built image for OS/package CVEs; fail on CRITICAL/HIGH
  image-scan:
    name: Container Security Scan
    needs: build-and-push
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: read
    steps:
      - name: Log in to registry
        uses: docker/login-action@v4
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Trivy scan
        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # pin actions by SHA
        with:
          image-ref: ${{ needs.build-and-push.outputs.image }}:latest
          severity: CRITICAL,HIGH
          exit-code: '1'
          ignore-unfixed: true # don't fail on CVEs with no fix available yet
  # 4) Deploy: copy compose to server, pull image, migrate, start, health-check
  deploy:
    name: Deploy to Server
    needs: [build-and-push, image-scan]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - name: Ensure deploy directory exists
        uses: appleboy/ssh-action@v1.2.5
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          script: mkdir -p "${{ secrets.DEPLOY_PATH }}"
      # The compose file is the source of truth in git and is shipped to the
      # server EVERY deploy (overwrite: true), so the server can never drift.
      - name: Copy compose file to server
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          source: docker-compose.prod.yml
          target: ${{ secrets.DEPLOY_PATH }}
          overwrite: true
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1.2.5
        env:
          GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
          IMAGE: ${{ needs.build-and-push.outputs.image }}
          DEPLOY_PATH: ${{ secrets.DEPLOY_PATH }}
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SERVER_SSH_KEY }}
          port: 22
          envs: GHCR_TOKEN,IMAGE,DEPLOY_PATH
          script: |
            set -euo pipefail
            echo "$GHCR_TOKEN" | docker login ghcr.io -u ${{ secrets.GHCR_USERNAME }} --password-stdin
            cd "$DEPLOY_PATH"
            # .env holds all secrets and is never in git; it must already exist.
            if [ ! -f .env ]; then
              echo "ERROR: $DEPLOY_PATH/.env is missing. Create it from .env.example first."
              exit 1
            fi
            export BACKEND_IMAGE="${IMAGE}:latest"
            docker pull "$BACKEND_IMAGE"
            # No `down`: named volumes are never touched, so zero data loss and
            # no DB downtime. The one-shot migrate runs forward-only migrations
            # and must exit 0; if it fails, `up` returns non-zero and we stop.
            docker compose -f docker-compose.prod.yml up -d --remove-orphans
            # Wait on the CONTAINER healthcheck (the single source of truth),
            # not a separate host-side probe. 5-minute budget for cold starts.
            echo "Waiting for services to become healthy (up to 5 min)..."
            deadline=$((SECONDS + 300))
            for svc in api worker; do
              cid="$(docker compose -f docker-compose.prod.yml ps -q "$svc")"
              if [ -z "$cid" ]; then
                echo "ERROR: $svc container not created."; docker compose -f docker-compose.prod.yml ps; exit 1
              fi
              while true; do
                status="$(docker inspect -f '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$cid" 2>/dev/null || echo missing)"
                case "$status" in
                  healthy) echo "$svc: healthy"; break ;;
                  unhealthy) echo "ERROR: $svc unhealthy. Logs:"; docker compose -f docker-compose.prod.yml logs --tail=100 "$svc"; exit 1 ;;
                esac
                if [ "$SECONDS" -ge "$deadline" ]; then
                  echo "ERROR: $svc not healthy in time. Logs:"; docker compose -f docker-compose.prod.yml logs --tail=100 "$svc"; exit 1
                fi
                sleep 5
              done
            done
            echo "All services healthy."
            # Prune only AFTER success, so a failed deploy leaves the previous image on disk.
            # This removes the old, now-untagged image; to roll back later, pull its sha- tag.
            docker image prune -f
            docker compose -f docker-compose.prod.yml ps
Why the deploy job is shaped this way
- It triggers off the quality workflow (workflow_run), so a bad commit that fails quality never reaches the server.
- The image is built once in CI and pushed to the registry. The server only pulls it; it never builds. Builds are slow and resource-hungry, and your small VPS shouldn't do them.
- Actions are pinned: Trivy is pinned to a commit SHA, not a moving tag, so a compromised release can't silently change what runs in your pipeline. The appleboy SSH/SCP actions above are still on version tags and deserve the same treatment.
- No docker compose down: bringing the stack down removes containers and (with -v) volumes. We only ever run up -d, which recreates changed containers and leaves the database volume untouched. Zero data-layer downtime.
- The health gate waits on the container's own healthcheck via docker inspect, with a 5-minute budget. This is more reliable than a separate curl from the host, because it uses the exact probe defined in compose and accounts for slow cold starts.
- Prune happens last, only after the new version is confirmed healthy, so a failed deploy still has the previous image on disk for a fast manual rollback. After a successful deploy, docker image prune removes the old (now untagged) image, so rolling back further means pulling its sha- tag from the registry.
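If you'd rather unit-test the gate's decision logic than debug it during a live deploy, here is the same loop sketched in TypeScript. getStatus stands in for the docker inspect -f '{{.State.Health.Status}}' call, and the clock and sleep are injected so a test runs instantly. All names are hypothetical; the pipeline itself keeps using the bash version.

```typescript
type HealthStatus = "starting" | "healthy" | "unhealthy" | "none" | "missing";

interface GateOptions {
  deadlineMs: number; // total budget, e.g. 300000 for the 5-minute gate
  intervalMs: number; // pause between polls, e.g. 5000
  now: () => number; // injected clock
  sleep: (ms: number) => Promise<void>; // injected sleep
}

// Same decisions as the bash loop: healthy passes, unhealthy fails immediately,
// anything else ("starting", no healthcheck, container gone) keeps polling until the deadline.
async function waitForHealthy(
  getStatus: () => Promise<HealthStatus>,
  opts: GateOptions,
): Promise<"healthy" | "unhealthy" | "timeout"> {
  const deadline = opts.now() + opts.deadlineMs;
  for (;;) {
    const status = await getStatus();
    if (status === "healthy") return "healthy";
    if (status === "unhealthy") return "unhealthy";
    if (opts.now() >= deadline) return "timeout";
    await opts.sleep(opts.intervalMs);
  }
}

// A fake clock for tests: "sleeping" just advances time.
function fakeClock(): { now: () => number; sleep: (ms: number) => Promise<void> } {
  let t = 0;
  return {
    now: () => t,
    sleep: async (ms: number) => {
      t += ms;
    },
  };
}
```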
8. Keeping dependencies fresh: dependabot.yml
Drop this in .github/dependabot.yml. It opens grouped, scheduled PRs to bump dependencies, GitHub Actions versions, and your Docker base image.
version: 2
updates:
  - package-ecosystem: "npm" # covers package-lock / pnpm-lock
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
    open-pull-requests-limit: 10
    groups: # group related bumps into ONE PR to review
      framework:
        patterns: ["@nestjs/*"]
      dev-tooling:
        dependency-type: "development"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns: ["*"]
  - package-ecosystem: "docker" # bumps your Dockerfile base image
    directory: "/"
    schedule:
      interval: "weekly"
Grouping is the feature that makes Dependabot bearable: instead of twenty separate PRs, you get a handful of grouped ones you can review and merge together.
9. Step-by-step: your first deploy
One-time setup
On GitHub — add these repository secrets (Settings → Secrets and variables → Actions):
| Secret | What it is |
|---|---|
| SERVER_IP | Your server's IP, e.g. your-server-ip |
| SERVER_USER | SSH user, e.g. deploy or root |
| SERVER_SSH_KEY | The private SSH key for that user (full text) |
| DEPLOY_PATH | Where the app lives on the server, e.g. /home/apps/your-app |
| GHCR_TOKEN | A token that can read your registry images (used by the server to pull); for GHCR, a personal access token (classic) with the read:packages scope |
| GHCR_USERNAME | The username/org for the registry login |
| SONAR_TOKEN / SONAR_HOST_URL | If you use a quality gate |
On the server — install Docker + Compose, create the deploy directory and the secrets file:
# Install Docker (official convenience script) and verify Compose v2.23.1+
curl -fsSL https://get.docker.com | sh
docker compose version
mkdir -p /home/apps/your-app
cd /home/apps/your-app
# Create the real .env from your template, then fill in generated secrets.
nano .env
# POSTGRES_PASSWORD=... (openssl rand -hex 32)
# APP_DB_PASSWORD=... (openssl rand -hex 32)
# MIGRATOR_DB_PASSWORD=... (openssl rand -hex 32)
# JWT_SECRET=... (openssl rand -hex 32)
# S3_* = ...
That's it for the server. You will not edit anything else here.
Every deploy after that
# 1. Make your change IN THE REPO (code, or the compose file, or a workflow).
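# 2. ALWAYS validate the compose file before committing. Locally this needs a .env with
#    every required key set (dummy values are fine), or the ${VAR:?} guards fail; add
#    --no-interpolate to check only the structure.
docker compose -f docker-compose.prod.yml config >/dev/null && echo "compose OK"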
# 3. Commit and push to main:
git add .
git commit -m "your change"
git push origin main
Then watch the Actions tab. The quality workflow runs, then the deploy workflow builds, scans, and ships. Done.
Make docker compose config a reflex. It parses and fully resolves the file (including anchors and .env interpolation) in about a second. It catches the entire class of "the deploy died instantly on a YAML typo" problems before you push. In my experience, most failed first deploys come down to a malformed compose file that this one command would have caught.
10. Common errors and how to fix them
This is the section I wish every tutorial had. Every one of these is real. They're roughly in the order you hit them as the pipeline gets further each time.
"My edits to the server file keep reverting!"
Cause: you edited docker-compose.prod.yml on the server, but the pipeline copies the repo's version over it (overwrite: true) on every deploy.
Fix: edit the file in the repo, not the server. The server copy is generated output. This is by design — it guarantees the server matches what's reviewed in git. Retrain the muscle memory: never edit on the box.
SSH step fails with "handshake failed" / "permission denied (publickey)"
Cause: the SERVER_SSH_KEY secret is wrong, or the matching public key isn't in the server's ~/.ssh/authorized_keys.
Fix: put the private key (the whole thing, including the BEGIN/END lines) in the secret. Add its public half to authorized_keys for SERVER_USER. Test locally first: ssh -i your_key user@your-server-ip.
Registry login fails: "Error: Cannot perform an interactive login from a non TTY device" or empty password
Cause: the registry token secret is empty or unset, so the docker login gets no password and tries to go interactive.
Fix: set GHCR_TOKEN (and GHCR_USERNAME). Always pipe the token in: echo "$GHCR_TOKEN" | docker login ghcr.io -u your-github-username --password-stdin.
"stat /path/.env.docker: no such file or directory"
Cause: the compose file references an env file (env_file: .env.docker) that doesn't exist on the server.
Fix: either create that file, or — better — drop the separate env file and define config inline in the compose x-app-env block, reading secrets from the standard .env. One fewer file to manage.
yaml: line 2: mapping values are not allowed in this context
Cause: the compose file is malformed — almost always near the top. The classic version: the x-app-env: &app-env anchor line got deleted (often during hand-edits or a bad copy-paste), leaving the env keys with no parent, or a comment lost its leading #.
Fix: restore the structure. Confirm the anchor exists and the references match, then run docker compose config to verify it parses before committing. If you copied the file from somewhere and it got mangled, download the raw file instead of pasting; pasted text can drop indentation or lines.
Migration fails: schema "..." does not exist (Postgres code 3F000)
Cause: the ORM tries to create its migrations bookkeeping table inside a custom schema before the migration that would create that schema has run — a chicken-and-egg on a fresh database.
Fix: pre-create the schema in your DB init SQL: CREATE SCHEMA IF NOT EXISTS your_schema AUTHORIZATION migrator_user;. Important: init SQL only runs on a fresh, empty volume. If your volume already exists, also create the schema manually once:
docker compose -f docker-compose.prod.yml exec postgres psql -U postgres -d appdb -c \
"CREATE SCHEMA IF NOT EXISTS your_schema AUTHORIZATION migrator_user;"
Migration fails: permission denied for database
Cause: the migrator role can create schemas/tables but wasn't granted CREATE on the database itself.
Fix: in init SQL: GRANT CREATE ON DATABASE appdb TO migrator_user; (if the volume already exists, also run that GRANT once by hand, via psql as above).
App can't connect: unsupported startup parameter: statement_timeout (then lock_timeout, etc.)
Cause: your DB driver sets session parameters as connection "startup parameters". PgBouncer rejects any startup parameter it doesn't track itself, in every pool mode, unless it's told to ignore it, and it surfaces them one at a time, so you fix one and hit the next.
Fix: allow the whole set at once on the pgbouncer service:
IGNORE_STARTUP_PARAMETERS: extra_float_digits,statement_timeout,lock_timeout,idle_in_transaction_session_timeout
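Enforce the actual timeouts at the role level in init SQL (ALTER ROLE ... SET statement_timeout = ...). Ignored startup parameters are not applied at all, and under transaction pooling per-session SETs don't reliably stick, so the role is the one place the setting reliably holds.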
Deploy reports "did not become healthy in time" but the app log says it started
Cause: the health gate is stricter or faster than the app's real startup, or the health-check path/port is wrong.
Fix: first confirm the app actually serves the health route from inside the container:
docker compose -f docker-compose.prod.yml exec api wget -qO- http://localhost:4000/api/health
If that returns OK, it's a timing issue: raise start_period and the deploy's wait budget. If it 404s, fix the path in the healthcheck's test: line. If it says wget: not found, your runtime image lacks wget; install it or use a node/curl-based check.
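To reason about start_period and the deploy's wait budget together, a rough TypeScript model helps. It assumes probes run every interval seconds after start, that each failing probe fails instantly, and that failures inside start_period don't count (Docker's documented rule). In reality Docker waits interval seconds after each previous probe completes, so a probe that hangs until its timeout pushes the real figure later. The function name is hypothetical.

```typescript
interface HealthcheckConfig {
  intervalS: number;
  startPeriodS: number;
  retries: number;
}

// Rough model: probes at t = interval, 2*interval, ...; each fails instantly;
// failures at t <= start_period are not counted toward retries.
// Returns the earliest time (seconds) a never-healthy container is marked unhealthy.
function earliestUnhealthyAt(cfg: HealthcheckConfig): number {
  let counted = 0;
  for (let k = 1; ; k++) {
    const t = k * cfg.intervalS;
    if (t > cfg.startPeriodS) counted++;
    if (counted >= cfg.retries) return t;
  }
}

// The api/worker healthcheck from docker-compose.prod.yml.
const appCheck: HealthcheckConfig = { intervalS: 15, startPeriodS: 120, retries: 10 };
console.log(earliestUnhealthyAt(appCheck)); // 270
```

Under this model a container that never comes up is marked unhealthy at about 270 s (4.5 minutes), just inside the deploy step's 300 s budget. Raising start_period or retries without also raising the deploy's deadline can invert that, so the gate times out before Docker reaches a verdict.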
The database "disappeared" after I renamed something
Cause: you changed the compose name: (project name). Volumes are namespaced by project name, so the stack now points at a new, empty volume. Your old data is still on disk under the old name.
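Fix: never change the project name. To find orphaned data: docker volume ls | grep postgres. If it already happened, setting name: back to its old value points the stack at the old volume again. This is why a pg_dump backup before any risky change is non-negotiable in production.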
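Compose names each volume <project>_<key> unless the volume declares its own name:. A small TypeScript sketch of that rule, plus a filter over the output of docker volume ls --format '{{.Name}}' that lists volumes holding the same logical data under a different project name. Both function names are hypothetical.

```typescript
// Compose's default volume name: <project>_<volume key>.
function composeVolumeName(project: string, volumeKey: string): string {
  return project + "_" + volumeKey;
}

// From `docker volume ls --format '{{.Name}}'` output, find volumes with the same key
// but a different project prefix: candidates for "my database disappeared".
function orphanedCandidates(volumeNames: string[], currentProject: string, volumeKey: string): string[] {
  const expected = composeVolumeName(currentProject, volumeKey);
  const suffix = "_" + volumeKey;
  return volumeNames.filter(
    (name) => name !== expected && name.length > suffix.length && name.slice(-suffix.length) === suffix,
  );
}

const listed = ["your-app_postgres_data", "your-app_redis_data", "old-name_postgres_data"];
console.log(orphanedCandidates(listed, "your-app", "postgres_data")); // [ 'old-name_postgres_data' ]
```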
Quality gate fails with HTTP 404 (self-hosted scanner)
Cause: the gate step didn't get the scanner host URL, so it defaulted to the cloud service and couldn't find your project.
Fix: pass SONAR_HOST_URL on both the scan and the gate steps.
Dependabot: "security update not possible"
Cause: a vulnerable transitive dependency has no version that satisfies everything else's constraints yet. Common for deep dev-only dependencies.
Fix: if it's dev-only and below your production audit threshold, it's safe to leave until the ecosystem catches up, or add a temporary override/resolution. Don't let a dev-only advisory block production.
A service crashes with an application error (e.g. TypeError: ... is not iterable)
Cause: this is not an infrastructure problem. The image built and deployed fine; the app code itself is throwing on boot.
Fix: read it as a signal that your pipeline is working: it caught a real code bug before declaring success. This belongs to whoever owns that part of the application code, not to the deploy config. No amount of compose/workflow tweaking fixes a code bug. Hand it to the right developer with the exact stack trace.
11. A pre-deploy checklist
Pin this somewhere:
- [ ] docker compose -f docker-compose.prod.yml config passes locally
- [ ] All changes are in the repo, nothing edited directly on the server
- [ ] .env exists on the server with every required key filled in (URL-safe secrets via openssl rand -hex 32)
- [ ] The compose project name: is unchanged
- [ ] All required GitHub secrets are set
- [ ] Third-party actions are pinned to SHAs
- [ ] Healthcheck path/port match what your app actually serves
- [ ] You have a recent database backup (pg_dump) before any risky change
12. Closing lessons
A few things that, in hindsight, mattered more than any single config line:
- One source of truth. Edit in the repo; let the server be disposable. Half-adopting this (editing in both places) is worse than not adopting it at all.
- Validate before you push. docker compose config turns a 3-minute failed pipeline into a 1-second local check.
- Errors get deeper, which is progress. A YAML parse error → a migration error → a connection error → an app boot error is not "still broken"; it's each layer passing in turn. Read the new error as a checkpoint reached.
- Know the boundary between infra and app. Connection params, schemas, health timing: infra. A TypeError in your own code: not infra. Recognizing which is which saves you from "fixing" the wrong file.
- Protect your data. Pin the project name, never run down -v casually, and back up before risky changes. Containers are disposable; your database is not.
Happy shipping.