S, Sanjay
Docker Multi-Stage Builds: How I Reduced Image Sizes by 94% (With Real Examples)

Your Docker image is probably too big.

A default Node.js image with npm install is 1.2GB. A Python image with pip dependencies hits 900MB easily. A Java image with Maven can exceed 1.5GB.

These bloated images mean:

  • Slower deployments — pulling 1.2GB vs 48MB across your cluster
  • More CVEs — every extra package is an attack surface
  • Higher costs — storage, bandwidth, and registry fees add up
  • Longer CI pipelines — building, pushing, and scanning large images

Multi-stage builds solve all of this. One Dockerfile. Multiple stages. The final image contains only what your application needs to run — nothing else.


How Multi-Stage Builds Work

A multi-stage Dockerfile has multiple FROM statements. Each FROM creates a new stage. You can copy artifacts from one stage to another, leaving build dependencies behind.

# Stage 1: BUILD (contains compilers, dev tools, source code)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so the copied node_modules stays lean
RUN npm prune --omit=dev

# Stage 2: RUNTIME (contains only compiled app + runtime)
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
EXPOSE 3000
CMD ["node", "dist/index.js"]

What happens:

  • Stage 1 installs all dependencies (including devDependencies), compiles TypeScript, and builds the app
  • Stage 2 starts from a clean Alpine image and copies ONLY the compiled output
  • Build tools, source code, devDependencies — none of it exists in the final image
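A handy side effect of named stages: Docker's --target flag lets you build and inspect any stage on its own, which is useful for debugging the build environment. A sketch (myapp is a hypothetical image name):

```shell
# Build only the builder stage: full toolchain, source, devDependencies
docker build --target builder -t myapp:build .

# Build the whole file; the last stage becomes the final image
docker build -t myapp:latest .

# Compare the two sizes
docker images myapp
```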

Real Examples: Before vs After

Node.js (Express API)

# ❌ BEFORE: Single stage (1.1GB)
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "src/index.js"]
# ✅ AFTER: Multi-stage (148MB)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

FROM node:20-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/src ./src
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
CMD ["node", "src/index.js"]

Key optimizations:

  • node:20-alpine instead of node:20 — the Alpine base OS is roughly 5MB, versus well over 100MB for Debian
  • npm ci --only=production (or the newer --omit=dev) — no devDependencies in the final image
  • Non-root user — security best practice
  • Only src/, node_modules/, and package.json copied — no .git, tests, docs

Python (FastAPI)

# ❌ BEFORE: Single stage (920MB)
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0"]
# ✅ AFTER: Multi-stage (85MB)

# Stage 1: Build dependencies in a slim Python image
FROM python:3.12-slim AS builder
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime with only the virtualenv
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY app/ ./app/
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Why virtualenv in Docker? It creates a clean, isolated directory of all Python dependencies. You copy that single directory to the runtime stage — no pip, no build headers, no cache.

Go (API Server)

Go produces static binaries. This is where multi-stage builds really shine:

# ✅ Go multi-stage (12MB final image!)

# Stage 1: Build the binary
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/server

# Stage 2: Scratch image (literally empty)
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

12MB. The scratch image is completely empty — no shell, no OS, no package manager. Just your binary and TLS certificates. This is the smallest possible attack surface.
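If scratch is too bare for your needs (it has no user database and no timezone data), a commonly used middle ground is Google's distroless static image. A sketch of an alternative runtime stage, assuming the same builder stage as above:

```dockerfile
# Alternative Stage 2: distroless instead of scratch
# Ships CA certificates, tzdata, and a built-in nonroot user, but still no shell
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```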

Java (Spring Boot)

# ✅ Java multi-stage (180MB, down from 700MB)

# Stage 1: Build with Maven
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline    # Cache dependencies
COPY src/ ./src/
RUN mvn package -DskipTests -q

# Stage 2: Extract Spring Boot layers
FROM eclipse-temurin:21-jre-alpine AS extractor
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
RUN java -Djarmode=layertools -jar app.jar extract

# Stage 3: Runtime with layered JARs
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S app && adduser -S app -G app
WORKDIR /app
COPY --from=extractor /app/dependencies/ ./
COPY --from=extractor /app/spring-boot-loader/ ./
COPY --from=extractor /app/snapshot-dependencies/ ./
COPY --from=extractor /app/application/ ./
USER app
EXPOSE 8080
ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]

Three stages. The Spring Boot layer extraction (stage 2) separates dependencies from application code. Docker caches the dependency layer — so when only your code changes, the rebuild copies just the application layer (usually <1MB). This makes subsequent builds extremely fast.


Optimization Techniques

1. Order COPY statements by change frequency

Docker caches each layer. When a layer changes, all subsequent layers are invalidated. Put rarely-changing files first:

# ✅ Dependencies change rarely → cached
COPY package*.json ./
RUN npm ci

# Source code changes often → rebuilt
COPY src/ ./src/

2. Use .dockerignore

COPY . . sends your entire project to the Docker daemon as build context. Without .dockerignore, that includes .git/, node_modules/, test files, and docs.

# .dockerignore
.git
.gitignore
node_modules
npm-debug.log
Dockerfile
docker-compose.yml
.env
*.md
tests/
coverage/
.vscode/

3. Pin exact versions

# ❌ Breaks randomly when base image updates
FROM node:latest

# ❌ Breaks when minor version changes
FROM node:20

# ✅ Predictable, reproducible builds
FROM node:20.11.1-alpine3.19
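For the strongest reproducibility guarantee you can also pin the image digest: tags are mutable, digests are not. A sketch (the digest below is a placeholder, not a real value):

```dockerfile
# ✅ Immutable: a digest can never point at different content
FROM node:20.11.1-alpine3.19@sha256:<digest>
```

You can find the digest of a pulled image with docker images --digests.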

4. Minimize layers

Each RUN command creates a layer. Combine related commands:

# ❌ 3 layers
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# ✅ 1 layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

5. Security: Never run as root

# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

Every container should run as a non-root user. If an attacker exploits your application, they don't get root access to the container (or potentially the host).
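One gotcha with non-root users: files added with COPY are owned by root by default, so the app user may not be able to write where it needs to. Docker's --chown flag on COPY fixes this (appuser/appgroup match the user created above):

```dockerfile
# Copy artifacts already owned by the non-root user
COPY --chown=appuser:appgroup --from=builder /app/dist ./dist
USER appuser
```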


Scanning Your Images

A small image with known CVEs is still a vulnerable image. Scan after building:

# Trivy — free, fast, comprehensive
trivy image myapp:v1.0.0

# Docker Scout (built into Docker Desktop)
docker scout cves myapp:v1.0.0

# Grype by Anchore
grype myapp:v1.0.0

Integrate scanning into CI:

# GitHub Actions step
- name: Scan Docker image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
    exit-code: '1'    # Fail the build if critical CVEs found

Size Comparison Summary

Language      | Before          | After           | Reduction
-------------|-----------------|-----------------|----------
Node.js      | 1,100 MB        | 148 MB          | 87%
Python       | 920 MB          | 85 MB           | 91%
Go           | 850 MB          | 12 MB           | 99%
Java         | 700 MB          | 180 MB          | 74%

The effort: restructuring one Dockerfile. The payoff: faster deployments, fewer vulnerabilities, lower costs — permanently.


When NOT to Use Multi-Stage Builds

  • Development environments. In dev, you want hot-reload, debuggers, and full source code. Use a single-stage Dockerfile for development and multi-stage for production.
  • Debugging production issues. Sometimes you need curl, sh, or strace in the container. Use a debug image (alpine instead of scratch) when troubleshooting.
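That said, you don't strictly need two Dockerfiles: another option is a dedicated dev stage in the same multi-stage file, selected with --target. A sketch, with hypothetical stage names:

```dockerfile
FROM node:20-alpine AS base
WORKDIR /app
COPY package*.json ./

# Dev stage: all deps, hot-reload entrypoint (mount source as a volume)
FROM base AS dev
RUN npm ci
CMD ["npm", "run", "dev"]

# Prod stage: production deps only
FROM base AS prod
RUN npm ci --omit=dev
COPY src/ ./src/
CMD ["node", "src/index.js"]
```

Build with docker build --target dev . locally and --target prod for releases; with BuildKit, only the stages the chosen target depends on are actually built.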

Every Dockerfile in your production pipeline should be multi-stage. It's one of those rare optimizations that improves performance, security, and cost simultaneously.


What's the smallest Docker image you've built? Share your Dockerfile tricks in the comments.

Follow me for more container and DevOps content.
