Originally published on devopsstart.com, this guide explores how to use Docker multi-stage builds to slash image sizes and harden your production security.
Deploying applications with Docker has become a standard practice in modern software development. However, simply containerizing your code isn't enough; you need to build Docker images that are lean, secure, and truly ready for production. This is where Docker multi-stage builds become not just a best practice, but a critical component of an efficient deployment strategy.
Many teams initially start with a single Dockerfile that bundles everything: compilers, development libraries, test frameworks, and the application itself. This often leads to bloated images that are slow to pull, consume excessive storage, and, critically, expose a significantly larger attack surface. If you've ever wondered why your 10MB Go binary might ultimately result in a 1GB Docker image, you're about to discover the underlying reasons and learn effective solutions for Docker multi-stage builds.
This guide will walk you through the core concepts of Docker multi-stage builds, highlight their tangible benefits for production environments—especially concerning security and deployment efficiency—and provide actionable, real-world examples to help you optimize your Docker builds effectively.
What Are Docker Multi-Stage Builds?
At its core, a Docker multi-stage build enables you to utilize multiple FROM instructions within a single Dockerfile. Each FROM instruction initiates a new stage of the build process. The power of this approach lies in your ability to selectively copy artifacts from an earlier stage to a subsequent stage. This ensures that only the absolutely necessary runtime components and application binaries are carried forward into the final image, explicitly leaving all build-time tools, development dependencies, and intermediate files behind.
Consider it akin to manufacturing: you wouldn't ship your entire factory, tools, and raw materials to a customer; you'd deliver only the finished product. Multi-stage builds apply this precise logic to Docker images, creating a clean, production-ready artifact.
Prior to the introduction of multi-stage builds in Docker 17.05, developers often had to manage two separate Dockerfiles: one for building the application and another for creating a slim runtime image. This parallel approach was cumbersome, challenging to maintain, and frequently led to inconsistencies. Docker multi-stage builds elegantly unify this process into a single, cohesive Dockerfile.
Here's a foundational example of the syntax for a Docker multi-stage build:
# Stage 1: The 'builder' stage for compiling the application
FROM golang:1.22.4-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 ensures a statically linked binary, GOOS=linux targets Linux environment
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o myapp .
# Stage 2: The 'final' runtime stage
FROM alpine:3.19.1 AS final
WORKDIR /root/
# Copy only the compiled binary from the 'builder' stage
COPY --from=builder /app/myapp .
CMD ["./myapp"]
In this concrete example:
- The first line, FROM golang:1.22.4-alpine AS builder, initiates a stage named builder. This stage includes the Go compiler, its SDK, and all dependencies required to build the application.
- The RUN go build ... command compiles our Go application binary.
- The second FROM instruction, FROM alpine:3.19.1 AS final, starts a new, clean stage aliased as final. This image is inherently much smaller and contains only the absolute essentials for runtime.
- COPY --from=builder /app/myapp . is the pivotal instruction. It directs Docker to copy only the compiled myapp binary from the builder stage into our final image. Crucially, nothing else from the builder stage (the Go SDK itself, build environment configuration, intermediate object files) is included in the final image.
The final image is the one ultimately pushed to your container registry and subsequently deployed to production using Docker multi-stage builds.
Why Use Docker Multi-Stage Builds for Production Applications?
The advantages of Docker multi-stage builds extend far beyond the superficial goal of a "smaller image." In production environments, these benefits directly translate into tangible cost savings, enhanced system reliability, and a significantly superior security posture. By adopting this approach, organizations can achieve more robust and efficient deployment pipelines using multi-stage builds.
1. Significant Image Size Reduction with Multi-Stage Builds
This is often the most immediate and visually striking benefit of using Docker multi-stage builds. By enforcing a clear separation between the build environment and the runtime environment, you effectively eliminate megabytes—and frequently gigabytes—of unnecessary data from your final production image.
Consider a typical Node.js application. Its development dependencies (transpilers, linters, testing frameworks) can easily add hundreds of megabytes. Your node_modules directory alone might exceed 500MB if not managed carefully. With Docker multi-stage builds, you install these development dependencies, build your front-end assets, potentially compress your code, and then discard all those temporary files and tools. Only the compiled, minified application and its runtime dependencies are copied into a slim base image. For instance, I've personally observed Node.js application images shrink from over 1.5GB to under 150MB after adopting multi-stage builds.
Tangible Benefits of Smaller Docker Images from Multi-Stage Builds:
- Faster Image Pulls: Smaller images download significantly quicker, directly reducing deployment times. This is especially impactful on new nodes or within autoscaling scenarios, potentially shaving minutes off your deployment pipeline and leading to much faster iteration cycles.
- Reduced Storage Costs: Less data means less space consumed in your Docker registry (such as AWS ECR, Azure Container Registry, or Google Container Registry) and on your hosts. Across hundreds or thousands of container images, this accumulates into substantial cost savings over time.
- Quicker Local Development: Developers also benefit, as pulling smaller images locally is faster, thereby improving local build and test cycles and reducing developer wait times.
- Lower Bandwidth Costs: For organizations frequently pulling images across different network zones or leveraging various cloud providers, reduced bandwidth usage directly lowers operational expenditure.
- Faster Container Start-up: While not always the primary driver for application initialization, a smaller image can, in specific scenarios, contribute to faster container startup times because fewer layers need to be loaded and processed by the container runtime.
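These savings are easy to verify with Docker's own tooling. For example (requires a local Docker daemon; the image name is illustrative):

```shell
# Overall image size
docker images my-app:latest

# Per-layer breakdown: unexpectedly large layers usually point at build tools
# or package caches that a multi-stage build would have left behind
docker history my-app:latest
```

Comparing docker history output before and after the conversion makes it obvious which layers the multi-stage build eliminated.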
2. Enhanced Security Through Reduced Attack Surface with Multi-Stage Builds
This is arguably the most critical benefit of Docker multi-stage builds for production systems, yet it's often undervalued beyond the simple mantra that "smaller is better." Every piece of software, library, or tool included in your Docker image introduces a potential vulnerability. Build tools, SDKs, compilers, test runners, and development packages are not required at runtime and frequently contain known security flaws or introduce unneeded complexity.
By implementing Docker multi-stage builds, you systematically remove these extraneous components from your final production image. This dramatically shrinks your application's attack surface. If an attacker manages to gain access to your running container, they will not find gcc, make, npm, git, or often even basic network utilities like curl installed. This makes it substantially more difficult for them to escalate privileges, download malicious payloads, or pivot to other systems within your environment.
Consider a scenario where a critical vulnerability is discovered in npm or one of its transitive dependencies. If your production image was built using a single-stage approach, it would likely contain npm, making the running container vulnerable. With a multi-stage Docker build, npm exists exclusively within the build stage, which is discarded. Your running container would remain unaffected by that particular vulnerability, illustrating a significant security advantage.
3. Faster Deployments and Improved Reliability
While closely related to image size, elaborating on the direct impact on your Continuous Integration/Continuous Deployment (CI/CD) pipelines is worthwhile. Docker multi-stage builds often lead to more efficient and reliable deployments.
- Optimized Caching: When Docker builds an image, it leverages layer caching. Multi-stage builds allow for more effective caching because individual stages can be rebuilt independently when only their specific inputs change. For example, if only your application's source code changes (e.g., main.go or index.js) but not its module definitions (e.g., go.mod or package.json), Docker can reuse the cached dependency download layers from a previous build of that stage, significantly speeding up subsequent builds.
- Reduced Network Latency Effects: Less data to transfer during image pushes and pulls means a lower chance of network timeouts or corrupted transfers when interacting with a registry. This improves the reliability and stability of your deployments, particularly in distributed environments or across less stable networks.
- Cleaner Separation of Concerns: The Dockerfile itself becomes much clearer and more readable, with distinct build logic cleanly separated from runtime configuration. This improves maintainability, makes onboarding new team members easier, and simplifies troubleshooting.
Basic Docker Multi-Stage Build Example: Go Application
Let's expand on the Go example to make it more concrete and demonstrate a full walk-through of a Docker multi-stage build. This particular pattern is highly adaptable and can be applied effectively to applications written in Node.js, Python, Java, and other compiled or transpiled languages.
Our simple Go application (main.go):
// main.go
package main
import (
"fmt"
"log"
"net/http"
"os"
)
func main() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
appVersion := os.Getenv("APP_VERSION")
if appVersion == "" {
appVersion = "unknown"
}
fmt.Fprintf(w, "Hello from Go, version %s!\n", appVersion)
})
port := os.Getenv("PORT")
if port == "" {
port = "8080"
}
log.Printf("Server starting on port %s...", port)
log.Fatal(http.ListenAndServe(":"+port, nil))
}
And our optimized Dockerfile utilizing Docker multi-stage builds:
# syntax=docker/dockerfile:1.4
# Define build argument for the application version, defaulting to 1.0.0
ARG APP_VERSION=1.0.0
# --- Build Stage: Compiles our Go application ---
FROM golang:1.22.4-alpine AS builder
# Re-declare the build argument inside this stage: ARGs declared before the
# first FROM are only in scope for FROM lines unless repeated here
ARG APP_VERSION
# Set build-time environment variables for static compilation
ENV CGO_ENABLED=0
ENV GOOS=linux
ENV GOARCH=amd64
WORKDIR /app
# Copy Go module files (go.mod, go.sum) first to leverage Docker layer caching.
# This ensures that the dependency download layer is only rebuilt if these files change.
COPY go.mod go.sum ./
RUN go mod download
# Copy the rest of the application source code
COPY . .
# Build the application
# -v for verbose output
# -o myapp specifies the output binary name
# -ldflags="-X main.version=${APP_VERSION}" embeds the version into a 'var version string'
# in package main; note that our sample main.go reads APP_VERSION from the environment instead
RUN go build -v -o myapp -ldflags="-X main.version=${APP_VERSION}" .
# --- Release/Runtime Stage: Creates a minimal image for production ---
FROM alpine:3.19.1 AS final
# Set runtime environment variables
# Pass the APP_VERSION build argument here to make it available at runtime
ARG APP_VERSION
ENV APP_VERSION=${APP_VERSION}
EXPOSE 8080
WORKDIR /usr/local/bin
# Copy only the compiled binary from the 'builder' stage to the 'final' stage.
# This keeps the final image lean by excluding all build tools and dependencies.
COPY --from=builder /app/myapp .
# COPY preserves the executable bit from the builder stage, so this chmod is
# defensive rather than strictly required
RUN chmod +x myapp
# Define the command to run the application when the container starts
CMD ["./myapp"]
To build and run this Docker multi-stage build example:
# Build the Docker image, passing the application version as a build argument
docker build -t go-app:1.0.0 --build-arg APP_VERSION=1.0.0 .
# List Docker images to see the size of our newly built image
docker images go-app:1.0.0
# Run the container, mapping port 8080 from the container to host
docker run -p 8080:8080 go-app:1.0.0
Example output from the above commands:
$ docker build -t go-app:1.0.0 --build-arg APP_VERSION=1.0.0 .
[+] Building 5.9s (14/14) FINISHED
... SNIP ...
 => [builder 6/6] RUN go build -v -o myapp -ldflags="-X main.version=1.0.0" .  0.9s
 => [final 3/4] COPY --from=builder /app/myapp .                               0.0s
 => [final 4/4] RUN chmod +x myapp                                             0.1s
 => exporting to image                                                         0.0s
 => => exporting layers                                                        0.0s
 => => writing image sha256:f1b2c3d4e5f6...                                    0.0s
 => => naming to docker.io/library/go-app:1.0.0                                0.0s
$ docker images go-app:1.0.0
REPOSITORY TAG IMAGE ID CREATED SIZE
go-app 1.0.0 f1b2c3d4e5f6 5 minutes ago 11.5MB # Observe the minimal image size!
$ docker run -p 8080:8080 go-app:1.0.0
2024/08/01 10:30:01 Server starting on port 8080...
Navigating to http://localhost:8080 in your web browser will display "Hello from Go, version 1.0.0!".
Notice the final image size: a mere 11.5MB. If we had built this with a single-stage golang:1.22.4-alpine image (which includes the entire Go SDK and build tools), it would easily be hundreds of megabytes. This stark difference unequivocally highlights the power and efficiency gained through Docker multi-stage builds.
Optimizing Docker Multi-Stage Builds: Best Practices
While the basic multi-stage pattern is powerful, you can further enhance the performance, security, and efficiency of your Docker images. By applying these advanced tips, you can squeeze even more value out of your Docker multi-stage build process.
1. Choose Minimal Base Images for the Final Stage
For your final runtime stage, always default to the smallest possible base image that can still reliably execute your application. This choice has a direct impact on image size and security.
- scratch: The ultimate minimal image. It contains no operating system at all, just your statically linked binary, and is perfect for Go applications compiled with CGO_ENABLED=0. An image built on scratch is typically only a few megabytes in size.
- alpine: A very popular, tiny Linux distribution (typically 5-7MB). It's an excellent choice for dynamically linked binaries when scratch isn't feasible. Note that Alpine uses musl libc, which can sometimes cause compatibility issues with binaries compiled against glibc, so test thoroughly.
- Distroless images: Provided by Google, these contain only your application and its direct runtime dependencies (e.g., glibc, ca-certificates), intentionally excluding shells, package managers, and other components found even in minimal Linux distros. They are generally considered more secure than alpine thanks to an even smaller footprint and attack surface, and are highly recommended for Java, Node.js, and Python applications.
Example with scratch for Go applications:
If your Go binary is truly statically compiled (as generated with CGO_ENABLED=0 in our previous example), you can leverage the scratch base image for an exceptionally small final image:
# ... (builder stage as before) ...
# --- Release/Runtime Stage with Scratch ---
FROM scratch AS final
# Add CA certificates for HTTPS communication. This is crucial for Go applications
# that need to make external HTTPS calls, as Go relies on the system's certificate store.
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
WORKDIR /usr/local/bin
COPY --from=builder /app/myapp .
CMD ["./myapp"]
When built, this might result in an image size of just 5-10MB, a testament to the power of scratch with Docker multi-stage builds.
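For runtimes that can't run on scratch, a distroless final stage delivers much of the same benefit. Here's a hedged sketch for a Node.js service (the paths and the dist/index.js entry point are illustrative assumptions; the distroless Node.js image already uses node as its entrypoint, so CMD lists only the script):

```dockerfile
# --- Build Stage ---
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime modules get copied forward
RUN npm run build && npm prune --omit=dev

# --- Runtime Stage: no shell, no package manager, no npm ---
FROM gcr.io/distroless/nodejs18-debian12 AS final
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]
```

Because the runtime image ships without a shell, debugging requires docker exec alternatives (e.g., ephemeral debug containers), which is a deliberate security trade-off.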
2. Leverage .dockerignore Effectively
Don't overlook the .dockerignore file. It functions identically to a .gitignore file but for Docker builds. Any files or directories listed in .dockerignore will not be copied to the Docker daemon's build context. This is crucial for keeping your build context small, preventing unnecessary files from being inadvertently included in any build stage, and speeding up the docker build command itself.
Example .dockerignore content:
# Git repository metadata
.git
.gitignore
# Locally installed dependencies and build artifacts
node_modules
build
dist
# Logs, editor swap files, and temporary directories
*.log
*.swp
tmp/
# Editor configuration
**/.vscode
3. Order Instructions for Effective Caching
Docker employs a layer-based caching mechanism. If a layer changes, all subsequent layers built upon it are invalidated and must be rebuilt. Therefore, strategically structure your Dockerfile so that instructions that change less frequently are placed earlier in the file.
For instance, copy go.mod and go.sum (or package.json and yarn.lock for Node.js) and execute dependency installation before copying your main application source code. This way, if only your application logic changes, Docker can efficiently use the cached dependency installation layer, significantly accelerating subsequent builds using Docker multi-stage builds.
# Good caching strategy for a Node.js application:
# --- Build Stage ---
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package.json and package-lock.json first
# This layer is cached unless these dependency files change
COPY package.json package-lock.json ./
RUN npm ci --loglevel=error --no-audit  # clean, reproducible install; log errors only; skip the audit for speed
# Copy the rest of the application source code.
# Only this layer and subsequent ones rebuild if only source code changes.
COPY . .
# Run build commands like transpiling or bundling frontend assets
RUN npm run build
# --- Final Stage ---
# Consider gcr.io/distroless/nodejs18-debian12 for an even smaller footprint
FROM node:18-alpine AS final
WORKDIR /app
# Copy only the compiled assets and runtime dependencies from the builder stage.
# Running 'npm prune --omit=dev' in the builder first keeps node_modules lean.
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/src ./src
# Adjust the entrypoint to match your build output
CMD ["node", "dist/index.js"]
4. Group Related RUN Commands for Smaller Layers
Multiple RUN instructions each create a new layer in your Docker image. While Docker is designed for layer optimization, it's generally more efficient to chain related commands together using && \ and to clean up any temporary files within the same RUN command. This practice reduces the total number of layers and helps minimize the final image size by discarding intermediate artifacts immediately.
# Bad practice: Creates three distinct layers, increasing image size
RUN apt-get update
RUN apt-get install -y some-package
RUN rm -rf /var/lib/apt/lists/*
# Good practice: Combines actions into a single layer, ensuring cleanup
RUN apt-get update && \
apt-get install -y --no-install-recommends some-package && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
The --no-install-recommends flag for apt-get install can further reduce the number of installed packages, making the image even smaller.
This approach is especially critical in build stages where you might install numerous development tools and then promptly remove their installation artifacts.
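If you're on BuildKit, heredocs offer a more readable alternative to long && \ chains while still producing a single layer. A sketch (requires the dockerfile:1 syntax directive; some-package is the same placeholder as in the example above):

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim
# Everything inside the heredoc runs as one RUN instruction, i.e., one layer
RUN <<EOF
set -e
apt-get update
apt-get install -y --no-install-recommends some-package
rm -rf /var/lib/apt/lists/*
apt-get clean
EOF
```

The set -e line preserves the fail-fast behavior that && provides in the chained form.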
Real-World Use Cases for Docker Multi-Stage Builds
The versatility of Docker multi-stage builds makes them incredibly beneficial for almost any application, regardless of its primary language or framework. They provide a standardized approach to image optimization and security.
- Go Applications: As demonstrated, Go applications often compile into single, statically linked binaries. These can be placed directly into a scratch or alpine base image, resulting in remarkably small and secure final container images, typically under 20MB.
- Node.js Applications: Multi-stage builds are crucial here. Build frontend code (e.g., React, Angular, Vue) in a builder stage that includes webpack, babel, typescript, and so on, then copy only the compiled static assets and the backend API code into a slim Node.js runtime image (e.g., node:18-alpine or a distroless Node.js image). This keeps all devDependencies, npm, and yarn out of the final production image.
- Java Applications: Compile your .jar or .war file in a dedicated Maven or Gradle builder stage, then copy only the resulting artifact into a JRE-only base image (e.g., openjdk:17-jre-slim-buster or gcr.io/distroless/java17). This keeps the JDK, Maven/Gradle, and other heavyweight build tools out of your production environment.
- Python Applications: Use a builder stage to install development dependencies, compile C extensions, or build Python wheels, then copy only the essential .py files and runtime dependencies into a minimal alpine:3.19.1 or python:3.10-slim-buster image. This avoids carrying over development headers and build tools.
- C/C++ Applications: Compile your C/C++ code with gcc/make in one stage, then copy only the compiled binary into an extremely minimal alpine or distroless/base image, reducing runtime dependencies to little more than the C standard library.
Essentially, any language or framework that distinguishes between its compilation/bundling dependencies and its runtime dependencies is an ideal candidate for leveraging Docker multi-stage builds.
When NOT to Use Docker Multi-Stage Builds
While Docker multi-stage builds offer widespread benefits, there are a few niche scenarios where the added complexity to a Dockerfile might not yield enough value to justify the effort.
- Extremely Simple Static Asset Serving: If your "application" consists purely of a few static HTML, CSS, and JavaScript files copied directly into a lightweight web server image like Nginx or Caddy, there may be no distinct build artifact to separate. In such cases, a single-stage Dockerfile is likely sufficient and simpler.
- Trivial Scripts or Executables: For very small, self-contained shell scripts or prebuilt utilities with no complex build dependencies, a two-stage build may not yield enough size or security benefit to justify the extra lines of Dockerfile. A simple alpine base image with the script copied in might be perfectly acceptable.
- Development-Only Images: If your explicit goal is a "developer workstation" image that needs all the tools (compiler, debugger, Git client, text editor) for interactive development, deliberately including everything may be the intended design. The critical caveat: ensure these comprehensive development images are never deployed to or used in production environments.
These scenarios represent rare exceptions. For the vast majority of applications intended for production deployment, Docker multi-stage builds are the correct and highly recommended approach. The gains in security, performance, and operational efficiency almost always substantially outweigh any minor increase in Dockerfile complexity.
Frequently Asked Questions About Docker Multi-Stage Builds
Q1: Is the builder stage pushed to my Docker registry?
No. By default, only the image produced by the last stage of your Dockerfile is tagged and pushed to the registry (you can build a different stage explicitly with docker build --target <stage>). Intermediate stages such as builder exist solely as local build cache on the machine that ran the build; their layers are never pushed unless you deliberately tag and push them, or export them for build cache retention.
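Intermediate stages can still be built on demand, which is handy for debugging. A quick illustration using the builder stage from the earlier Go example (requires a local Docker daemon; tags are illustrative):

```shell
# Build and tag only the 'builder' stage so you can inspect it interactively
docker build --target builder -t go-app:build-debug .
docker run --rm -it go-app:build-debug sh

# The default build (no --target) produces the image from the last stage
docker build -t go-app:1.0.0 .
```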
Q2: Can I have more than two stages in a Docker multi-stage build?
Absolutely! You can implement as many stages as required to logically separate your build process. This modularity can significantly enhance the clarity and efficiency of your Dockerfile. For example, you might define distinct stages for:
- A deps stage to download and cache external dependencies.
- A build stage to compile or bundle the application code, consuming dependencies from deps.
- A test stage to execute unit and integration tests, potentially copying artifacts from build.
- A release stage for the final production-ready image, copying only the necessary runtime artifacts from build.
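The four stages above can be sketched in a single Dockerfile. This is an illustrative skeleton for a Go service (stage names, paths, and the binary name are assumptions, not from the earlier examples); each later stage either builds FROM an earlier stage or copies artifacts out of it:

```dockerfile
# --- deps: download and cache module dependencies ---
FROM golang:1.22.4-alpine AS deps
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

# --- build: compile the application on top of the cached deps ---
FROM deps AS build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /out/myapp .

# --- test: run the test suite; fails the whole build on test failure ---
FROM build AS test
RUN go test ./...

# --- release: minimal production image containing only the binary ---
FROM alpine:3.19.1 AS release
COPY --from=build /out/myapp /usr/local/bin/myapp
CMD ["myapp"]
```

One caveat: BuildKit only builds the stages the target depends on, and release does not depend on test, so CI should run the test stage explicitly (e.g., docker build --target test .) before building the release image.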
Q3: What if my application needs build tools at runtime (e.g., Python pip install in a running container)?
This specific practice is generally considered an anti-pattern for production-hardened Docker images. Ideally, all application dependencies should be installed, and any external tools or compilation steps should be executed during the build stage, not within the running container in production. If your application genuinely requires a compiler or package manager to function periodically at runtime, it often indicates a design flaw in the application's packaging or architecture. For instance, a Python application should have all its pip dependencies installed and packaged within the builder stage, then copied to the final image, rather than attempting live installations in the running environment. If dynamic compilation is genuinely unavoidable for a specific use case, consider offloading that functionality to a dedicated, secure build service rather than bundling compilers into every production container.
Q4: How do I handle secrets securely during Docker multi-stage builds?
You should rigorously avoid embedding secrets directly in your Dockerfile or committing them to your source code repository. Be careful with --build-arg for anything sensitive: build argument values can be recovered from the image with docker history, so it is fine for non-secret values like APP_VERSION in the Go example, but not for credentials. For build-time secrets (e.g., credentials for a private package repository), use BuildKit secret mounts (RUN --mount=type=secret,...), which expose a secret to a single RUN instruction without ever writing it into an image layer. For runtime secrets, rely on dedicated secret management solutions such as Kubernetes Secrets, AWS Secrets Manager, or HashiCorp Vault, injected securely at deployment time. Avoid COPYing secret files into any stage: a secret copied into the final image remains recoverable from earlier layers even if deleted in a later instruction, and secrets copied into a build stage linger in the local build cache.
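As a concrete sketch of the BuildKit approach (the npmrc secret id and the .npmrc path are illustrative assumptions), the secret is mounted only for the duration of a single RUN instruction and never becomes part of any layer:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# Credentials for the private registry are visible to this RUN instruction only
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
```

The secret is supplied at build time, e.g.: docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .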
Q5: Will using Docker multi-stage builds necessarily increase my total build time?
Initially, yes, especially if you had a very simplistic single-stage build. You are now explicitly orchestrating two distinct environments and their respective operations. However, the gains derived from improved layer caching (due to the more granular, layered approach) and the significantly reduced final image size often lead to a faster overall CI/CD pipeline, particularly during crucial stages like image pushes/pulls and actual deployments. The potential slight increase in pure build time is generally a worthwhile trade-off for the substantial benefits in security, efficiency, and maintainability provided by Docker multi-stage builds.
Conclusion
Docker multi-stage builds are more than just a convenience; they are a fundamental and indispensable technique for constructing robust, secure, and performant Docker images explicitly designed for production environments. By consciously and thoroughly separating your build-time dependencies from your runtime components, you immediately unlock critical advantages: significantly smaller images, a drastically reduced attack surface, and faster, more reliable deployments.
It's highly recommended to allocate time to refactor your existing Dockerfiles to incorporate Docker multi-stage builds. Begin by selecting your most critical applications and diligently observe the measurable differences in image size, image pull times, and, most importantly, the tangible security benefits. This isn't merely about saving a few megabytes; it represents a commitment to adhering to modern security best practices and cultivating a more resilient, efficient software delivery pipeline. Your infrastructure team, your security team, and ultimately, your end-users will undoubtedly recognize and appreciate the improvements.
Next steps for you:
- Audit your current Dockerfiles: Systematically identify applications that are prime candidates for conversion to Docker multi-stage builds.
- Start simple: Pick the one application that would benefit most and implement a basic two-stage build (a builder stage and a final runtime stage).
- Experiment with base images: Explore different minimal base images for your final stage, such as alpine, scratch, or distroless variants, to find the optimal fit for your application.
- Measure the impact: Track and compare key metrics like image sizes, total build times, and deployment durations before and after implementing multi-stage builds. This data will quantify the improvements.