Alan Varghese

Posted on Jan 23

From Bloated to Streamlined: A Developer's Guide to Docker Image Optimization

#docker #webdev #python #javascript

Learn how to significantly reduce the size of your Docker images with practical, step-by-step techniques. We'll take a simple app and apply optimizations like multi-stage builds, choosing the right base image, and more.

Introduction

As developers, we love using Docker for its portability and consistency. However, it's easy to end up with large, bloated Docker images. Why does image size matter?

Faster CI/CD Pipelines: Smaller images are quicker to build, push, and pull, speeding up your development and deployment cycles.
Lower Costs: Reduced image size means less storage space used in your container registry and on your servers.
Improved Security: A smaller image has a smaller attack surface, with fewer packages and libraries for vulnerabilities to hide in.

In this guide, we'll walk through a hands-on lab to demonstrate how to slash your Docker image sizes. We'll start with simple "Hello World" applications in Python and Node.js and apply a series of powerful optimization techniques.

Our Sample Applications

For this lab, we'll use two basic applications.

Python (app.py):

print("Hello, from Python!")

With a requirements.txt that declares no dependencies:

# No dependencies for this simple app

Node.js (app.js):

console.log("Hello, from Node.js!");

And a standard package.json:

{
  "name": "docker-opt-lab",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "author": "",
  "license": "ISC"
}

The Starting Point: The Unoptimized Image

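Let's start with a basic, unoptimized Dockerfile for each application, using Debian-based images. Note that the Python example already uses the slim variant while the Node.js example uses the full image, which partly explains the large gap between the two baselines.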

Python (Dockerfile.python-debian):

FROM python:3.9-slim-buster

WORKDIR /app

COPY . .

CMD ["python", "app.py"]

Node.js (Dockerfile.node-debian):

FROM node:16-buster

WORKDIR /app

COPY . .

CMD ["node", "app.js"]

After building these, we get our baseline sizes. For example, the python-debian image might be around 115MB, and the node-debian image could be about 940MB. Let's see how we can shrink these.
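To check these numbers on your own machine, a small helper along these lines may be useful. It is a minimal sketch that assumes the images were built with the tags python-debian and node-debian (for example with docker build -f Dockerfile.python-debian -t python-debian .) and that the docker CLI is on your PATH. The function names are illustrative, not part of any library.

```python
import subprocess


def image_size_bytes(tag):
    """Ask the Docker CLI for an image's size in bytes.

    Assumes an image with this tag already exists locally.
    """
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", tag],
        capture_output=True,
        text=True,
        check=True,  # raise if the image does not exist or docker fails
    )
    return int(result.stdout.strip())


def format_mb(size_bytes):
    """Format a byte count in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return f"{size_bytes / 1_000_000:.1f} MB"
```

Calling format_mb(image_size_bytes("python-debian")) should then report a figure comparable to the SIZE column of docker image ls. Exact values will vary with the image version you pull.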

Technique 1: Choose Your Base Image Wisely (Alpine vs. Debian)

One of the easiest wins in image optimization is choosing a smaller base image. The alpine variants are significantly smaller than their Debian-based counterparts because they're based on Alpine Linux, a minimal Linux distribution.

Let's switch our base images to alpine.

Python (Dockerfile.python-alpine):

FROM python:3.9-alpine

WORKDIR /app

COPY . .

CMD ["python", "app.py"]

Node.js (Dockerfile.node-alpine):

FROM node:16-alpine

WORKDIR /app

COPY . .

CMD ["node", "app.js"]

Just by switching to alpine, the Python image might drop to around 48MB and the Node.js image to 115MB. That's a huge reduction!
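To put those figures in perspective, here is a quick calculation of the relative savings. It uses the approximate sizes quoted in this article; your own builds will likely differ by a few megabytes.

```python
def reduction_percent(before_mb, after_mb):
    """Return how much smaller the new image is, as a percentage of the old size."""
    return (before_mb - after_mb) / before_mb * 100


# Approximate sizes from this article: (Debian-based, Alpine-based), in MB.
sizes = {"Python": (115, 48), "Node.js": (940, 115)}

for name, (before, after) in sizes.items():
    saved = reduction_percent(before, after)
    print(f"{name}: {before} MB -> {after} MB ({saved:.1f}% smaller)")
```

With these inputs the Python image comes out roughly 58% smaller and the Node.js image roughly 88% smaller.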

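Note: While alpine images are small, they use musl libc instead of glibc. This can cause compatibility issues, especially with packages that ship precompiled binaries built against glibc; some of them may need to be compiled from source on Alpine, which slows builds and can require extra build tools. Always test your application thoroughly.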
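If you are unsure which C library a given Python image uses, one rough check is the standard library's platform.libc_ver(). As far as I know it only recognizes glibc, so the sketch below treats an empty result as "not glibc", which on an Alpine image most likely means musl. The function name libc_name is just for illustration.

```python
import platform


def libc_name():
    """Describe the C library the running interpreter is linked against.

    platform.libc_ver() returns a (name, version) tuple; both strings are
    empty when it cannot identify the library.
    """
    lib, version = platform.libc_ver()
    if lib:
        return f"{lib} {version}".strip()
    # Assumption: an unidentified libc inside an Alpine image is musl.
    return "unknown (not detected as glibc; likely musl on Alpine)"


print(libc_name())
```

Running this inside each container (for example by temporarily using it as app.py) gives a quick hint about which image variant you are actually testing against.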

Technique 2: Slim Down with Multi-Stage Builds

For applications that have a build step (like installing dependencies or compiling code), multi-stage builds are a game-changer. The idea is to use one stage (the "builder") to perform all the build-related tasks and then copy only the necessary artifacts into a clean, lightweight final stage.

Let's see this with our Node.js application.

Node.js (Dockerfile.node-multistage):

# --- Build Stage ---
FROM node:16-buster AS builder

WORKDIR /app

COPY package.json .
RUN npm install --production

# --- Production Stage ---
FROM node:16-slim

WORKDIR /app

COPY --from=builder /app/node_modules ./node_modules
COPY app.js .

CMD ["node", "app.js"]

How does this work?

The builder stage uses the full node:16-buster image, which includes npm along with the compilers and build tools that some dependencies need during installation.
We copy package.json and run npm install --production, which skips devDependencies.
The production stage starts from a fresh, slim base image (node:16-slim).
We use COPY --from=builder to copy only the node_modules folder from the builder stage into our final image. We also copy our application code.

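The final image contains only the installed production node_modules and app.js on top of the slim base, without the build tooling from the builder stage. This keeps the Node.js image well below the unoptimized baseline, and using node:16-alpine as the final stage would shrink it further.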

Technique 3: Don't Forget the .dockerignore File

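The .dockerignore file works much like .gitignore: it lists files and directories to exclude from the Docker build context, the set of files sent to the builder and visible to COPY. This keeps sensitive files and unnecessary clutter out of your image and makes builds faster. One difference from .gitignore is that patterns are matched relative to the context root, so *.md only matches Markdown files at the top level (use **/*.md to match at any depth).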

A good .dockerignore file might look like this:

.git
.gitignore
Dockerfile*
*.md

This ensures that your Git history, .md files, and the Dockerfiles themselves aren't copied into the image, keeping it clean and lean.
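To get a feel for which files these patterns would exclude, here is a deliberately simplified matcher. It is a sketch, not Docker's actual implementation: it compares patterns to paths one component at a time from the context root and ignores features such as ** and ! exceptions. The file names passed to it are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the .dockerignore example in this article.
PATTERNS = [".git", ".gitignore", "Dockerfile*", "*.md"]


def is_ignored(path, patterns=PATTERNS):
    """Return True if a context-relative path would be excluded.

    Simplification: a pattern matches when each of its components matches
    the corresponding leading component of the path, so excluding a
    directory also excludes everything beneath it.
    """
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        if len(pattern_parts) > len(parts):
            continue  # pattern is deeper than the path, cannot match
        if all(fnmatchcase(p, pat) for p, pat in zip(parts, pattern_parts)):
            return True
    return False


for candidate in ["app.py", "README.md", ".git/config", "docs/guide.md"]:
    print(candidate, "->", "ignored" if is_ignored(candidate) else "kept")
```

Note that docs/guide.md is kept: a pattern like *.md is anchored at the top of the build context, so nested Markdown files need a pattern such as **/*.md in a real .dockerignore.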

Technique 4: Layering and Cleanup

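Each RUN instruction in a Dockerfile creates a new layer, and a file written in one layer still takes up space in the image even if a later instruction deletes it. Chaining commands with && into a single RUN reduces the number of layers and, more importantly, lets you clean up temporary files in the same layer that created them, so they never end up in the image.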

For example, instead of this:

RUN apt-get update
RUN apt-get install -y curl

Do this:

RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

By running rm -rf /var/lib/apt/lists/* in the same RUN command, we remove the package index files that apt-get update downloaded before the layer is committed, which reduces the final image size.
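Why the cleanup has to happen in the same RUN can be seen with a toy model of layers. This is only an illustration: the sizes are made up, and each layer is represented as a dictionary of the files it writes. The model also adds a third, separate RUN that deletes the lists, to show that deleting in a later layer does not help.

```python
def image_size_mb(layers):
    """Total image size in this toy model.

    Each layer contributes the size of the files it writes. A deletion in a
    later layer only hides files; it never subtracts from earlier layers.
    """
    return sum(sum(files.values()) for files in layers)


# Three separate RUN instructions (hypothetical sizes in MB).
separate = [
    {"/var/lib/apt/lists": 20},  # RUN apt-get update
    {"/usr/bin/curl": 5},        # RUN apt-get install -y curl
    {},                          # RUN rm -rf /var/lib/apt/lists/* (writes nothing)
]

# One chained RUN: the lists are removed before the layer is committed.
chained = [
    {"/usr/bin/curl": 5},
]

print("separate RUNs:", image_size_mb(separate), "MB")
print("chained RUN:  ", image_size_mb(chained), "MB")
```

In the model, the separate version stays at 25 MB because the first layer still carries the package lists, while the chained version is only 5 MB.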

Conclusion

Let's look at the potential impact of these optimizations.

Unoptimized (Debian): ~115MB (Python), ~940MB (Node.js)
Alpine: ~48MB (Python), ~115MB (Node.js)
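Multi-stage (Node.js): Roughly the size of the node:16-slim base plus production dependencies, far below the unoptimized image. Switching the final stage to node:16-alpine brings it close to the Alpine figure.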
With .dockerignore and cleanup: Every little bit helps!

By applying these simple but effective techniques, you can drastically reduce the size of your Docker images. This leads to a more efficient, secure, and cost-effective development process.

Happy optimizing!