When engineers say "Docker builds an image," they usually mean a single command.
In reality, `docker build` triggers a deterministic pipeline that transforms a text file into an OCI-compliant artifact, composed of immutable, content-addressed layers.
Understanding this pipeline explains why cache behaves the way it does, why instruction order matters, and why small Dockerfile changes can dramatically impact build time and image size.
## From Dockerfile to Build Graph
The build process starts long before any filesystem changes occur.
Docker first parses the Dockerfile into an internal instruction graph.
This phase validates syntax, resolves build stages, and prepares the build context after applying .dockerignore. No layers are created here. The output is a dependency-aware plan for how the image could be built.
Only after this plan is constructed does execution begin.
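The planning phase can be modeled in a few lines. The sketch below is a toy parser, not Docker's real one: it splits a Dockerfile into stages and records cross-stage dependencies introduced by `COPY --from`, which is exactly the information the planner needs before execution starts. Stage names like `stage-1` follow BuildKit's convention of numbering unnamed stages.

```python
import re

def parse_stages(dockerfile: str) -> dict:
    """Toy model of the planning phase: split a Dockerfile into build
    stages and record which stages depend on which (via COPY --from)."""
    stages, current = {}, None
    for line in dockerfile.splitlines():
        line = line.strip()
        named = re.match(r"FROM\s+\S+\s+AS\s+(\S+)", line, re.IGNORECASE)
        if named:
            current = named.group(1)
            stages[current] = {"instructions": [], "needs": set()}
            continue
        if line.upper().startswith("FROM"):
            current = f"stage-{len(stages)}"   # unnamed stages get an index
            stages[current] = {"instructions": [], "needs": set()}
            continue
        if current is None or not line or line.startswith("#"):
            continue
        dep = re.search(r"--from=(\S+)", line)
        if dep:
            stages[current]["needs"].add(dep.group(1))
        stages[current]["instructions"].append(line)
    return stages

plan = parse_stages("""
FROM node:18 AS builder
COPY . .
RUN npm ci && npm run build

FROM node:18-alpine
COPY --from=builder /app/dist ./dist
""")
print(plan["stage-1"]["needs"])  # the final stage depends on 'builder'
```

The output of this phase is a dependency graph, which is what later allows BuildKit to skip or parallelize stages that the requested target does not need.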
### Practical Impact: The .dockerignore Advantage

```text
# Without .dockerignore:
Sending build context to Docker daemon  1.2GB    # Slow transfer

# With a proper .dockerignore:
Sending build context to Docker daemon  12.3kB   # Fast transfer
```

Key files to exclude:

```text
node_modules/
.git/
*.log
.env
dist/           # For multi-stage builds
```
## Layer Creation Is Content, Not Commands
Each filesystem-changing instruction such as RUN, COPY, or ADD produces a new layer.
These layers are immutable and identified by a cryptographic hash derived from their content and their parent layer.
This is why Docker caching is reliable.
If the inputs are identical, the resulting layer hash is identical. The build system does not care why a command ran, only what it produced.
### Cache Key Composition

```text
Layer Hash = SHA256(
    Parent Layer Hash +
    Instruction Content +
    File Content (for COPY/ADD) +
    Build Arguments at this point
)
```
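This chaining is easy to demonstrate. The sketch below is illustrative only (Docker's real serialization format differs), but the core property is the same: identical inputs always produce identical hashes, and any change propagates to every descendant layer.

```python
import hashlib

def layer_hash(parent_hash: str, instruction: str, file_content: bytes = b"") -> str:
    """Toy model of content-addressed layer IDs: hash the parent layer's
    hash together with this instruction and its file inputs.
    (Docker's actual serialization differs; the chaining idea is the same.)"""
    h = hashlib.sha256()
    h.update(parent_hash.encode())
    h.update(instruction.encode())
    h.update(file_content)
    return h.hexdigest()

base = layer_hash("", "FROM node:18-alpine")
l1 = layer_hash(base, "COPY package*.json ./", b'{"name":"app"}')
l1_again = layer_hash(base, "COPY package*.json ./", b'{"name":"app"}')
l1_changed = layer_hash(base, "COPY package*.json ./", b'{"name":"app2"}')

assert l1 == l1_again     # identical inputs -> identical hash -> cache hit
assert l1 != l1_changed   # changed file content -> new hash -> cache miss
```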
Example cache behavior:

```dockerfile
# Layer 1: Always cached (base image)
FROM node:18-alpine

# Layer 2: Cached unless WORKDIR changes
WORKDIR /app

# Layer 3: Cache breaks if package.json changes
COPY package*.json ./

# Layer 4: Cache breaks if Layer 3 changes
RUN npm ci

# Layer 5: Cache breaks if ANY file in the context changes
COPY . .

# Layer 6: Always cached (metadata only)
CMD ["npm", "start"]
```
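The invalidation cascade follows a single rule: layers are reused up to the first input that changed, and everything after that point is rebuilt. A minimal simulation of that rule (the instruction lists are hypothetical):

```python
def cached_layers(previous: list, current: list) -> list:
    """Toy cache model: reuse layers until the first differing input;
    every layer after that point must be rebuilt."""
    reuse = []
    for prev, cur in zip(previous, current):
        if prev != cur:
            break
        reuse.append(cur)
    return reuse

previous_build = ["FROM node:18-alpine", "WORKDIR /app",
                  "COPY package.json v1", "RUN npm ci", "COPY src v1"]
current_build  = ["FROM node:18-alpine", "WORKDIR /app",
                  "COPY package.json v1", "RUN npm ci", "COPY src v2"]

# Only the app source changed, so the expensive `npm ci` layer is reused.
print(len(cached_layers(previous_build, current_build)))  # prints 4
```

This is why copying `package*.json` before the full source tree matters: it keeps the expensive dependency-install layer above the frequently changing inputs.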
This design is what allows Docker to reuse layers across images, hosts, and even registries.
## Why BuildKit Changed Everything
The classic Docker builder executed instructions sequentially, treating each step as an isolated operation.
BuildKit replaces this with a graph-based execution model.
With BuildKit, independent steps can execute in parallel, cache keys are more precise, and sensitive data such as credentials can be mounted at build time without ever becoming part of an image layer.
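The scheduling idea behind this is ordinary topological ordering. The sketch below (a simplified model, not BuildKit's solver) groups steps into "waves": every step in a wave has all of its dependencies satisfied, so a wave can run concurrently. The stage names are hypothetical.

```python
def execution_waves(deps: dict) -> list:
    """Group build steps into waves of parallel-executable work.
    deps maps each step name to the set of steps it depends on."""
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        ready = [step for step, needs in remaining.items() if needs <= done]
        if not ready:
            raise ValueError("dependency cycle in build graph")
        waves.append(sorted(ready))
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves

# Hypothetical build: two independent builder stages feed one final stage.
deps = {
    "frontend-build": set(),
    "backend-build": set(),
    "final": {"frontend-build", "backend-build"},
}
print(execution_waves(deps))
# [['backend-build', 'frontend-build'], ['final']] -- builders run concurrently
```

The classic builder effectively treats every step as depending on the one before it, collapsing this graph into a single chain.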
### BuildKit vs Classic: A Performance Comparison

```text
# Classic builder (sequential)
Step 1/8 : FROM alpine:latest
Step 2/8 : RUN apk add --no-cache python3
Step 3/8 : RUN pip install pandas
...                               # Each step waits for the previous one

# BuildKit (concurrency possible)
[+] Building 8.2s (15/15) FINISHED
 => CACHED [stage-1 2/6] ...
 => CACHED [stage-1 3/6] ...      # Independent steps run in parallel
 => CACHED [stage-1 4/6] ...
```
### Advanced BuildKit Features

**1. Build secrets (never stored in image layers)**

```dockerfile
RUN --mount=type=secret,id=npm_token \
    echo "//registry.npmjs.org/:_authToken=$(cat /run/secrets/npm_token)" > .npmrc && \
    npm ci && \
    rm .npmrc
```

The `rm .npmrc` in the same `RUN` matters: a file created and deleted within one instruction never appears in the layer diff, so the token is not persisted.

**2. Cache mounts (persistent between builds)**

```dockerfile
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y packages
```
This is not an optimization.
It is a fundamental shift in how image builds are modeled.
## Multi-Stage Builds as a Security Boundary
Multi-stage builds are often described as a size optimization.
More importantly, they create a clean separation between build-time and runtime concerns.
Compilers, package managers, and secrets exist only in intermediate stages.
The final image contains exactly what is required to run the application, and nothing else.
### Security Impact Analysis

```dockerfile
# Single-stage (vulnerable)
FROM node:18
WORKDIR /app
COPY . .
RUN npm ci        # 600+ dev dependencies land in the image
RUN npm run build
CMD ["node", "dist/app.js"]
# Result: ~1.2GB image with dev tools, compilers, and build-time files
```

```dockerfile
# Multi-stage (secure)
FROM node:18 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build       # Dev dependencies stay in this stage

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev             # Only the ~40 production dependencies
CMD ["node", "dist/app.js"]
# Result: ~180MB image, no dev tools, no build secrets
```
This reduces attack surface, simplifies vulnerability scanning, and makes image provenance easier to reason about.
## Debugging Builds Means Debugging Inputs
Most Docker build issues are not runtime problems.
They are cache invalidation problems.
Unexpected rebuilds almost always trace back to:

- Changing inputs in early layers
- Overly broad `COPY` instructions
- Uncontrolled build arguments
### Diagnostic Toolkit

**1. Layer inspection**

```shell
docker history myimage --no-trunc --format "{{.CreatedBy}}"
dive myimage        # Interactive layer explorer (third-party tool)
```

**2. Cache analysis**

```shell
# See why the cache was invalidated
docker build --progress=plain .

# List a built image's layer digests
docker inspect myimage --format='{{.RootFS.Layers}}'
```

**3. Context troubleshooting**

```shell
# See how much context is sent to the daemon
docker build --no-cache . 2>&1 | grep -i "build context"
```
Tools like `docker build --progress=plain`, `docker history`, and layer inspection utilities expose these relationships directly, turning "Docker magic" back into observable behavior.
## Production Patterns
**1. Deterministic builds**

```dockerfile
# Pin everything -- never :latest
FROM node:18.20.1-alpine3.19
# Use npm ci, not npm install: ci fails if the lockfile is out of sync
RUN npm ci
```

**2. Build-time optimization**

```dockerfile
# Order matters: stable inputs first, frequently changing inputs last
COPY package*.json ./
RUN npm ci
COPY . .
```

**3. Size optimization**

```dockerfile
# Clean up inside the same layer, or the removed files still ship
RUN apt-get update && \
    apt-get install -y build-essential && \
    # build steps go here, then remove the toolchain
    apt-get remove -y build-essential && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*
```
## The OCI Artifact: What Actually Gets Built
At the end of the pipeline, Docker produces:
- **Image Manifest**: metadata and layer references
- **Image Config**: environment, entrypoint, working directory
- **Layer Tarballs**: compressed filesystem diffs
- **Index (multi-arch)**: platform-specific manifests
A simplified manifest looks like this (digests and sizes are illustrative):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:def456...",
    "size": 7023
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:abc123...",
      "size": 1234567
    }
  ]
}
```

Note that runtime settings such as `Cmd` do not live in the manifest itself; they live in the config blob that `config.digest` points to.
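Content addressing at this level is nothing more than hashing the blob's exact bytes. A small sketch of how a descriptor is derived (the config content here is a hypothetical, simplified example):

```python
import hashlib
import json

def blob_digest(blob: bytes) -> str:
    """OCI descriptors reference blobs by the SHA-256 of their exact bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

# A simplified, hypothetical image config blob.
config_bytes = json.dumps({"Cmd": ["npm", "start"], "WorkingDir": "/app"}).encode()

descriptor = {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": blob_digest(config_bytes),
    "size": len(config_bytes),
}

# Any change to the blob changes its digest, so manifests pin exact content.
assert blob_digest(config_bytes) != blob_digest(config_bytes + b" ")
```

Because the digest covers every byte, a registry can verify a pulled blob independently, and two hosts that built identical content can deduplicate it.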
## Summary
The Docker build pipeline transforms human-readable instructions into a secure, efficient, distributable artifact through:
- **Graph-based planning**: not linear execution
- **Content-addressable storage**: deterministic layer creation
- **Stage isolation**: build/runtime separation
- **Observable behavior**: every layer is inspectable
Understanding these internals moves teams from "Docker builds" to "engineered artifact pipelines."