Omar Ahmed

Posted on Feb 6

Docker Production Best Practices - Complete Guide

#docker #devops

Part 1: Image Selection & Architecture

1. Minimal Base Images

Containers are not Virtual Machines. They share the host kernel.

# ❌ BAD - Larger attack surface (roughly 80MB, plus a package manager and a full userland)
FROM ubuntu:22.04

# ✅ GOOD - Minimal attack surface (5-50MB) Alpine / Distroless
FROM alpine:3.19
# OR
FROM gcr.io/distroless/static-debian12

Rule:

Every package you don't include is a package that cannot have vulnerabilities.
Compiled (Go, Rust): Use Distroless or scratch. Compile to a static binary; it needs no runtime dependencies, so the image only has to contain the binary file.
Interpreted (Node, Python): Use Alpine or Slim variants. You only need the language runtime, not the OS tools.
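As a rough illustration of the compiled case, here is a minimal sketch that builds a static Go binary and ships it on Distroless. It assumes the main package sits at the repository root, and it uses two stages, which the next section explains. The /out/app path is a name chosen for this example.

```dockerfile
# Stage 1: compile a statically linked binary.
# CGO_ENABLED=0 avoids linking against the C library, so no libc is needed at runtime.
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: the runtime image contains only the binary (no shell, no package manager).
FROM gcr.io/distroless/static-debian12
COPY --from=builder /out/app /app
# "nonroot" is an unprivileged user that the Distroless images ship with.
USER nonroot
ENTRYPOINT ["/app"]
```

Because the final image has no shell, the ENTRYPOINT has to use the JSON array form shown here.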

2. Separate the Builder from the Runner - Multi-Stage Builds

Production does not need the source code, compilers, build-time dependencies, or test files, so separate the build stage from the runtime stage. The final image keeps only what you copy into it; the build tools are discarded along with the builder stage.

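# Stage 1: build the static assets (pinned tag instead of a floating one)
FROM node:20-alpine AS builder
WORKDIR /app
# Copy the manifests first so the dependency layer is cached until they change
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: serve only the build output
FROM nginx:1.25-alpine
COPY --from=builder /app/dist /usr/share/nginx/html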

┌─────────────────┐                      ┌─────────────────┐
│                 │                      │                 │
│  ┌──────────┐   │                      │  ┌──────────┐   │
│  │  Source  │   │   COPY --from=       │  │   App    │   │
│  └──────────┘   │     builder          │  │  Binary  │   │
│                 │  ─────────────────>  │  └──────────┘   │
│  ┌──────────┐   │                      │                 │
│  │ Compiler │   │                      │                 │
│  └──────────┘   │                      │                 │
│                 │                      │                 │
└─────────────────┘                      └─────────────────┘
Stage 1: Builder                        Stage 2: Runner

3. Derive Versions, Don't Guess

Always match Dockerfile base image versions to your project configuration files, and use specific version tags, not latest.

Your project config is the Source of Truth. The Dockerfile must mirror it. Mismatched versions are how you end up with "It works on my machine" followed by failures in production.

  ┌─────────────┐
  │ package.json│
  │             │
  │  engines:   │
  │  node >= 20 │──────────────────────> FROM node:20-alpine
  │             │
  └─────────────┘


  ┌─────────────┐
  │   go.mod    │
  │             │
  │  go 1.21    │──────────────────────> FROM golang:1.21
  │             │
  └─────────────┘


  ┌─────────────┐
  │   Pipfile   │
  │             │
  │ python_     │
  │ version =   │──────────────────────> FROM python:3.11-slim
  │   3.11      │
  └─────────────┘
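One way to keep the two in sync is to make the version a build argument, so there is a single value to update or to pass in from CI. This is only a sketch: NODE_VERSION is a name chosen for this example, and its default has to be kept in step with the engines field by hand or by a script.

```dockerfile
# An ARG declared before FROM can be used in the FROM line itself.
# NODE_VERSION is a hypothetical argument name; its default mirrors "engines: node >= 20".
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine
WORKDIR /app
```

A CI job could read the version from the project config and override the default with docker build --build-arg NODE_VERSION=20 .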

Part 2: Optimization - Build Speed & Layer Caching

1. Combine Install and Clean Steps

# ❌ BAD - Cleanup happens in a separate layer
RUN npm install
RUN npm cache clean --force

This creates TWO separate layers in the Docker image:

The first RUN command installs packages and creates cache files in Layer 1
The second RUN command cleans the cache in Layer 2
However, the cache files from Layer 1 are only hidden, not deleted
They still exist in the image and contribute to the total image size
This wastes disk space and increases image size unnecessarily

The Solution (Production Way)

RUN npm ci --omit=dev && npm cache clean --force

Both commands execute in a single layer
The cache files are created and then immediately cleaned within the same layer
The temporary cache files are never committed to any layer of the image

Key Concept

Docker layers are additive. You cannot delete data from a previous layer, only mask it. When you delete files in a subsequent layer, they're hidden but still consume space in the image. To truly remove temporary files, you must clean them up in the same RUN command where they were created using && to chain commands together.

Another Example

# ❌ BAD - Two layers: the package lists written by the first RUN stay in its layer
RUN apt-get update
RUN apt-get install -y package && rm -rf /var/lib/apt/lists/*

# ✅ GOOD - Single layer
RUN apt-get update && \
    apt-get install -y package && \
    rm -rf /var/lib/apt/lists/*
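A different technique with the same goal, sketched here on the assumption that your builder is BuildKit: mount the package manager's cache directory as a build cache, so it is never written into a layer at all. The path /root/.npm is npm's default cache location when the build runs as root.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The npm cache lives in a BuildKit cache mount, not in the image layer,
# so there is nothing to clean up and later builds can reuse the downloads.
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
```

This swaps "create and delete in one RUN" for "never store it in the image", which also speeds up rebuilds.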

2. Be Explicit with Files and Dependencies

No Lazy Copying - Avoid COPY . . in the final stage.

Using COPY . . copies everything from your project directory into the container, including:

.env files (sensitive credentials)
.git directory (version control history)
Test files
Documentation
Development configs
Build artifacts you don't need

The Solution:

Copy ONLY what is needed:

# ❌ BAD - Copies everything
COPY . .

# ✅ GOOD - Explicit and selective
COPY ./dist ./dist

Best Practice:

Be explicit about which files and directories you copy
Only include runtime-necessary files
Use .dockerignore to exclude unwanted files
Copy built artifacts from builder stage, not source files
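The points above can be sketched together as follows. The .dockerignore patterns listed in the comments are an assumed example, not a complete list; adjust them to your project.

```dockerfile
# Sketch only. Assumes a .dockerignore file next to this Dockerfile containing,
# one pattern per line:
#   .git
#   .env
#   node_modules
#   coverage
#   *.md
# Paths matched there never enter the build context, so even the broad COPY
# in the builder stage cannot pick them up.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
# Explicit copy: only the built artifacts cross into the final image.
COPY --from=builder /app/dist ./dist
```

The .dockerignore file protects the builder stage, while the explicit COPY --from protects the final stage.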

3. Production Dependencies Only

Dev tools do not belong in production.

Use npm ci --omit=dev (or equivalent for your package manager)

# ❌ BAD - Installs all dependencies including dev
RUN npm install

# ✅ GOOD - Production dependencies only
RUN npm ci --omit=dev

Part 3: Security Hardening

1. Never Run as Root

By default, Docker containers run as the root user (UID 0), which poses significant security risks. Always create and switch to a non-root user in production containers.

# Alpine (BusyBox) syntax: -S creates a system group / system user
RUN addgroup -S appgroup && \
    adduser -S appuser -G appgroup
USER appuser
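The addgroup -S and adduser -S flags are BusyBox syntax, which is what Alpine provides. On Debian-based images, such as the slim variants mentioned in Part 1, a roughly equivalent sketch uses groupadd and useradd. The names appgroup and appuser are simply carried over from the example above.

```dockerfile
FROM python:3.11-slim
# Debian-based images use groupadd/useradd instead of addgroup -S / adduser -S.
RUN groupadd --system appgroup && \
    useradd --system --gid appgroup --no-create-home appuser
USER appuser
```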

Security Risk

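Root inside a container is the same UID 0 as root on the host unless user namespaces are remapped. If an attacker compromises a container running as root and finds a way out (a kernel bug, a mounted Docker socket, an over-permissive volume), they potentially gain root privileges over the host.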

This means:

Full control over the container
Possible escape to the host system
Access to sensitive host resources
Ability to compromise other containers

2. The Latest Tag is a Lie

latest is a moving pointer, not a version. Relying on it means:

No control over which version you get
No reproducibility
Debugging nightmares
Unexpected breaking changes
Security vulnerabilities from unknown versions

# ❌ BAD - Today you get Node 20, tomorrow you might get Node 21
FROM node:latest

# ✅ GOOD - Always Node 20.11.0 on Alpine 3.19
FROM node:20.11.0-alpine3.19

# ✅ ACCEPTABLE - Gets minor and patch updates but stays on 20.x
FROM node:20-alpine

3. Avoid Random Images from Docker Hub

Images from unknown sources can contain malware, crypto-miners, or backdoors. Prefer Docker Official Images or images from publishers you can verify.

4. Remove the Attacker's Toolkit

If an attacker gets in, don't hand them the tools to explore.

curl, wget, vim, netcat - These tools let an attacker explore the network and download malware. If a tool isn't there, their job becomes significantly harder.

Question: If I strip all tools from my production container, how do I debug a DNS issue?

Instead of bloating your production container with debugging tools, use an ephemeral sidecar: a temporary container that attaches to your production container's network namespace only when needed.

# Find your container ID or name
docker ps

# Run netshoot (debugging toolbox) in the same network namespace
docker run -it --rm \
  --network container:my-app-container \
  nicolaka/netshoot

# Now you have access to debugging tools:
# - curl http://api.example.com
# - dig google.com
# - nslookup database-service
# - ping redis
# - tcpdump

When you are done, simply exit the sidecar container. It is removed automatically (because of the --rm flag), leaving your production container pristine.

Other Examples

# network debugging
docker run -it --rm --network container:my-app nicolaka/netshoot

# basic Unix tools
docker run -it --rm --network container:my-app busybox

# alpine with tools
docker run -it --rm --network container:my-app \
  alpine sh -c "apk add --no-cache curl bind-tools && sh"

5. Root Owns, User Executes

This security pattern ensures that even if an attacker compromises your application, they cannot establish a permanent foothold by modifying the application binary itself.

If the application is compromised:

Attacker gains access to the running container as appuser
Attacker tries to inject backdoor into the application binary
Attack fails because appuser has no write permissions
Result: Attacker cannot establish persistence through binary modification

Entity	Permission	Impact
Owner (Root)	Read / Write	Can update app ✅ (during build)
User (AppUser)	Read / Execute	Can RUN app, cannot MODIFY app ❌

Implementation

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
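RUN npm run build
# Drop devDependencies so only production packages are copied to the final stage
RUN npm prune --omit=dev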

FROM node:20-alpine
# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app

# Copy files as root, set ownership to root
COPY --from=builder --chown=root:root /app/dist ./dist
COPY --from=builder --chown=root:root /app/node_modules ./node_modules

# Readable and executable by everyone, writable only by the owner (root)
RUN chmod -R 755 dist/

# Switch to non-root user for execution
USER appuser

CMD ["node", "dist/index.js"]

Attacker → Backdoor.sh → [×] App Binary (Read Only) → ✓ Protected

Part 4: Maintainability - Dev Experience & Documentation

1. Sort Arguments

Alphabetize multi-line installs for cleaner git diffs.

# ❌ BAD - Unsorted, harder to track changes
RUN apt-get update && apt-get install -y \
    zip \
    curl \
    python3 \
    git \
    jq

# ✅ GOOD - Alphabetically sorted (A→Z)
RUN apt-get update && apt-get install -y \
    curl \
    git \
    jq \
    python3 \
    zip

Why this matters:

Cleaner git diffs - Easy to see what was added/removed
Prevents duplicates - Sorted list makes duplicates obvious
Easier reviews - Reviewers can quickly scan alphabetically
Fewer merge conflicts - Reduces conflicts when multiple people edit the list

Example git diff with sorted packages:

RUN apt-get update && apt-get install -y \
    curl \
    git \
+   htop \
    jq \
    python3 \
    zip

2. Use WORKDIR

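Don't use cd to change directories. Each RUN starts a fresh shell, so a cd only lasts until the end of that one instruction (the context resets at every layer boundary). WORKDIR sets a persistent context: set it once and it applies to every instruction that follows.

# ❌ BAD - cd does not carry over to the next instruction
FROM node:20-alpine
RUN mkdir /app && cd /app
COPY package.json .
RUN npm install

Problem: The cd /app command only affects that single RUN layer. The next instruction (COPY) starts again from the image's working directory (the root directory here), so files go to the wrong location.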
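For comparison, a minimal sketch of the same steps using WORKDIR. It assumes a package-lock.json sits next to package.json, since npm ci requires one.

```dockerfile
FROM node:20-alpine
# ✅ GOOD - WORKDIR creates /app if it does not exist and stays in effect
# for every COPY, RUN, and CMD that follows.
WORKDIR /app
COPY package*.json ./
RUN npm ci
```

Here both the COPY and the npm ci step operate inside /app without any cd.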

3. OCI Labels

Tag images with metadata (source, version, description) for easier registry identification.

OCI labels are key-value pairs that provide metadata about your Docker image.

Implementation

FROM node:20-alpine

# Add OCI labels for metadata
LABEL org.opencontainers.image.title="My Application"
LABEL org.opencontainers.image.description="A production-ready Node.js application"
LABEL org.opencontainers.image.version="1.2.3"
LABEL org.opencontainers.image.authors="team@example.com"
LABEL org.opencontainers.image.source="https://github.com/myorg/myapp"
LABEL org.opencontainers.image.created="2026-02-06T12:00:00Z"
LABEL org.opencontainers.image.revision="abc123def"
LABEL org.opencontainers.image.licenses="MIT"

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
CMD ["node", "index.js"]

Automated Label Injection (CI/CD)

# Use ARG to accept build-time variables. ARG values:
# - exist only during the build
# - can be overridden with --build-arg
# - can have default values (so local builds don't fail)
ARG VERSION=dev
ARG GIT_COMMIT=unknown
ARG BUILD_DATE=unknown

LABEL org.opencontainers.image.version="${VERSION}"
LABEL org.opencontainers.image.revision="${GIT_COMMIT}"
LABEL org.opencontainers.image.created="${BUILD_DATE}"

# Build with metadata from CI/CD
docker build \
  --build-arg VERSION="$(git describe --tags)" \
  --build-arg GIT_COMMIT="$(git rev-parse HEAD)" \
  --build-arg BUILD_DATE="$(date -u +'%Y-%m-%dT%H:%M:%SZ')" \
  -t myapp:latest .

4. Handle Signals Gracefully

Always use JSON array syntax (exec form) for CMD and ENTRYPOINT in Dockerfiles or similar container configurations. This ensures your application receives system signals and can shut down gracefully, which is crucial for production reliability.

# ❌ BAD - Shell form
CMD npm start

The Problem: When you use shell form syntax like CMD npm start, Docker runs your command through a shell (/bin/sh -c). This creates a process hierarchy where:

The shell becomes PID 1 (the main process)
Your actual Node app runs as a child process of the shell

When a SIGTERM signal is sent (typically during shutdown, like docker stop), it goes to PID 1 (the shell). However, shells don't forward signals to their children by default, so:

The shell (PID 1) neither acts on SIGTERM nor passes it to its child process
The Node app never receives the signal
The app can't perform graceful shutdown (close connections, finish requests, cleanup)
Docker force-kills the container with SIGKILL once the stop timeout expires (10 seconds by default)
This can lead to data loss or corrupted state

The Solution

CMD ["npm", "start"]

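When you use exec form syntax like CMD ["npm", "start"], Docker runs your command directly without a shell wrapper. This means:

The command you list becomes PID 1 directly
SIGTERM is sent directly to that process
Your app can handle the signal properly (using event listeners like process.on('SIGTERM'))
The app can perform graceful shutdown operations
Clean, controlled shutdown is achieved

One caveat: with CMD ["npm", "start"], PID 1 is npm, which starts Node as a child process. To make the Node app itself PID 1, invoke node directly, as in CMD ["node", "dist/index.js"] from the Part 3 example.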
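A minimal sketch of a Dockerfile that puts the Node process itself at PID 1. It assumes the entry point is a file named index.js at the project root, which is a placeholder for your own entry point.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY index.js ./
USER node
# Exec form with node invoked directly: node is PID 1 and receives SIGTERM itself.
# Shell form (CMD node index.js) would put /bin/sh in front of it instead.
CMD ["node", "index.js"]
```

A process running as PID 1 gets no default signal handling from the kernel, so the app still has to register its own SIGTERM handler for docker stop to trigger a graceful shutdown.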

5. Secrets Do Not Belong in Dockerfiles

If a secret goes into a Docker image, it's already leaked.

✅ DO:

Inject secrets at runtime via environment variables

# Runtime injection
docker run \
  -e DATABASE_PASSWORD="$DB_PASS" \
  -e API_KEY="$API_KEY" \
  myapp:latest

Use secret management tools (AWS Secrets Manager, Vault, K8s Secrets)
Mount secrets as volumes from secure storage
Use Docker secrets in Swarm mode
List secret files in .gitignore and .dockerignore so they are never committed or sent to the build

❌ DON'T:

Hardcode secrets in Dockerfile
COPY secret files into the image
Use ENV or ARG for sensitive data (both end up in the image metadata or build history)
Commit .env files with real secrets to git
Assume deleting in a later layer removes the secret
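When a secret is needed during the build itself (for example, a token for a private package registry), BuildKit secret mounts are one option. The sketch below assumes BuildKit is your builder and that the credentials live in an .npmrc file on the build machine. The id npmrc is a name chosen for this example.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The secret file is mounted only for the duration of this RUN step
# and is not written to any image layer.
# Build with: docker build --secret id=npmrc,src=$HOME/.npmrc .
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci --omit=dev
```

Unlike ENV, ARG, or COPY, the mounted file does not show up in the final image or in docker history.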

Thanks for reading! If you found this helpful, please share it with your team.
