Docker images are the blueprint for your containers. Think of them as a snapshot of everything your application needs to run: the code, runtime, libraries, dependencies, and environment variables, all packaged together in a reproducible format. When you run a container, you're essentially spinning up an instance from that image. The beauty of this approach is that if it works on your machine, it'll work anywhere Docker runs.
How Images Actually Work
Images are built in layers, like a stack of transparent sheets. Each instruction in your Dockerfile creates a new layer that sits on top of the previous ones. This layering system is one of Docker's key innovations: it enables efficient storage, fast builds, and easy distribution.
When you change your code and rebuild, Docker is smart enough to reuse unchanged layers from its cache. This means if you're just updating application code but your dependencies haven't changed, Docker won't reinstall everything from scratch. It'll use the cached layer and only rebuild what's changed.
FROM node:18-alpine # Layer 1: Base image
WORKDIR /app # Layer 2: Set working directory
COPY package*.json ./ # Layer 3: Copy dependency files
RUN npm install # Layer 4: Install dependencies
COPY . . # Layer 5: Copy application code
CMD ["npm", "start"] # Layer 6: Define startup command
Order matters here, and it matters a lot. Put the stuff that changes least often at the top (like installing dependencies) and the stuff that changes most often at the bottom (like your application code). This way, you're not reinstalling all your packages every time you fix a typo in your source code.
Each layer is read-only and identified by a cryptographic hash of its contents. When you modify a layer, Docker creates a new layer rather than modifying the existing one. This immutability is what makes images so reliable and reproducible.
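You can see these content-addressed layers for yourself with the Docker CLI. A quick sketch, assuming the node:18-alpine image is already pulled locally:

```shell
# Print the sha256 digest of each layer in the image
docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' node:18-alpine

# Show which Dockerfile instruction produced each layer, and its size
docker history node:18-alpine
```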
Understanding Base Images
Every Dockerfile starts with a FROM instruction that specifies a base image. This is the foundation everything else builds on. You've got options here:
Full OS images like ubuntu:22.04 or debian:bookworm give you a complete Linux distribution. They're big (often 100MB+) but familiar, and include standard tools like bash, apt, and common utilities. Use these when you need maximum compatibility or you're just getting started.
Language runtime images like node:18, python:3.11, or openjdk:17 are maintained by the community and include the language runtime plus build tools. They're convenient but often bloated with stuff you don't need in production.
Alpine-based images use Alpine Linux, a security-oriented, lightweight distribution. node:18-alpine or python:3.11-alpine can be 5 to 10x smaller than their full counterparts. Alpine uses musl libc instead of glibc, which occasionally causes compatibility issues with compiled binaries, but for most use cases it's rock solid.
Slim variants like python:3.11-slim or node:18-slim are based on Debian but with most non-essential packages stripped out. They're a middle ground: smaller than full images but more compatible than Alpine.
Making Images Smaller and Faster
Image size matters more than you might think. Big images are slow to build, slow to push to registries, slow to pull in production, and take up storage on every node in your cluster. Plus, every package you include is another potential vulnerability.
Multi-Stage Builds
This is one of the most powerful optimization techniques. The idea is simple: use one stage to build your application with all the necessary build tools, then copy just the compiled artifacts to a minimal final image.
# Build stage
FROM golang:1.21 as builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o myapp .
# Final stage
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/myapp .
CMD ["./myapp"]
The final image contains only your binary and the minimal Alpine base. The Go compiler and all the intermediate build artifacts are left behind in the builder stage, which gets discarded. Your final image might be 10 to 20MB instead of 1GB+.
This pattern works brilliantly for any compiled language—Go, Rust, C++, even Java with GraalVM native image. For interpreted languages like Node or Python, you can still use multi-stage builds to separate dev dependencies from production dependencies.
# Dependencies stage
FROM node:18-alpine as deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Build stage
FROM node:18-alpine as builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package*.json ./
USER node
CMD ["node", "dist/index.js"]
Distroless Images
Google's distroless images take minimalism to another level. They contain only your application and its runtime dependencies—no shell, no package manager, no GNU utilities, nothing. Just enough to run your code.
FROM golang:1.21 as builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp
FROM gcr.io/distroless/static-debian11
COPY --from=builder /app/myapp /
CMD ["/myapp"]
The security benefits here are real. No shell means no shell exploits. No package manager means attackers can't install tools. The attack surface is minimal. Distroless images are available for common runtimes: Java, Python, Node, and static binaries.
The downside? Debugging is harder. You can't docker exec into a running container and poke around because there's no shell. But for production workloads, that's often a feature, not a bug. For debugging, use a debug variant during development or use ephemeral debug containers in Kubernetes.
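In Kubernetes, for instance, you can attach a throwaway debugging container to a running pod without touching its image. A sketch, where the pod and container names are placeholders:

```shell
# Attach an interactive busybox container that targets the app container's
# process namespace, giving you a shell even in a distroless pod
kubectl debug -it my-pod --image=busybox:1.36 --target=my-app -- sh
```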
Scratch Images
The ultimate minimal base is scratch, an empty image. It's literally nothing. This only works for static binaries that don't depend on any system libraries.
FROM golang:1.21 as builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o myapp
FROM scratch
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
Your final image will be just the size of your binary—could be single digit megabytes. Perfect for microservices and serverless deployments where cold start time matters.
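One caveat with scratch: there are no CA certificates and no timezone database, so outbound HTTPS calls will fail certificate verification. A common workaround, sketched here, is to copy those files in from the builder stage:

```dockerfile
FROM golang:1.21 as builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o myapp

FROM scratch
# CA bundle so TLS connections can be verified
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Timezone data, if your app needs local time handling
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
```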
Layer Optimization Techniques
Understanding layers means you can optimize them. Here are some patterns:
Combine RUN commands to reduce layers. Instead of:
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
Do this:
RUN apt-get update && apt-get install -y \
package1 \
package2 \
&& rm -rf /var/lib/apt/lists/*
This creates one layer instead of three, and cleaning up the apt cache in the same layer means the temporary files don't end up in your final image.
Split COPY operations strategically. Copy dependency files first, install them, then copy application code:
# Good: dependency layer is cached
COPY package*.json ./
RUN npm ci
COPY . .
# Bad: entire layer rebuilds on any code change
COPY . .
RUN npm ci
Use .dockerignore to keep unnecessary files out of your build context. This makes builds faster and prevents sensitive files from accidentally ending up in images:
node_modules
.git
.env
*.md
.vscode
coverage
Security and Hardening
Security starts with the base image. Use official images from trusted sources, and keep them updated. Run docker scout cves (the successor to the deprecated docker scan) or use tools like Trivy to check for known vulnerabilities:
trivy image myapp:latest
Run as non-root. By default, containers run as root, which is unnecessary and dangerous. Create a user and switch to it:
FROM node:18-alpine
# Create app directory and user
RUN mkdir -p /app && \
addgroup -g 1001 -S appuser && \
adduser -u 1001 -S appuser -G appuser && \
chown -R appuser:appuser /app
# Switch to non-root user
USER appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
RUN npm ci --omit=dev
CMD ["node", "index.js"]
Minimize attack surface. Every package you include is another potential vulnerability. Distroless and scratch images are inherently more secure because there's less code to exploit. No shell, no package manager, no problem.
Use specific image tags. Never use latest in production. Pin to specific versions so you know exactly what you're running:
# Bad
FROM node:18
# Good
FROM node:18.17.1-alpine3.18
Handle secrets properly. Never bake secrets into images. Use BuildKit's secret mounts for build time secrets:
# syntax=docker/dockerfile:1
FROM alpine
RUN --mount=type=secret,id=mysecret \
cat /run/secrets/mysecret
Then build with:
docker build --secret id=mysecret,src=./secret.txt .
The secret is never stored in any image layer.
BuildKit and Modern Build Features
BuildKit is Docker's next-gen build engine. It's enabled by default in recent versions and unlocks powerful features:
Parallel builds - BuildKit can build independent stages simultaneously, dramatically speeding up multi-stage builds.
Better caching - More intelligent cache invalidation and the ability to use remote caches.
Build secrets and SSH forwarding - Mount secrets without them ending up in your image history.
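SSH forwarding is handy when a build step needs to clone a private Git repository. A sketch, assuming keys are loaded in your host's SSH agent (the repo URL is a placeholder):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.18
RUN apk add --no-cache git openssh-client \
    # Trust github.com's host key so the clone doesn't prompt
    && mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# Mount the host's SSH agent for this command only; no keys end up in a layer
RUN --mount=type=ssh git clone git@github.com:example/private-repo.git
```

Then build with docker build --ssh default . so BuildKit forwards your agent into that one RUN step.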
Enable BuildKit explicitly with:
DOCKER_BUILDKIT=1 docker build .
Or set it permanently:
export DOCKER_BUILDKIT=1
Remote caching lets you share build caches across machines or CI pipelines:
docker build \
--cache-from type=registry,ref=myregistry/myapp:cache \
--cache-to type=registry,ref=myregistry/myapp:cache \
-t myapp:latest .
Practical Workflow Tips
Tag semantically. Use versioning that makes sense:
docker build -t myapp:1.2.3 .
docker tag myapp:1.2.3 myapp:1.2
docker tag myapp:1.2.3 myapp:1
docker tag myapp:1.2.3 myapp:latest
This gives you rollback options and clear version tracking.
Inspect your images. Use docker history to see what's taking up space:
docker history myapp:latest
Use dive (a third-party tool) for interactive exploration of layers and file changes.
Keep images focused. One service per container. Don't try to run your web server, database, and message queue in one image. That's not the Docker way. Each service gets its own container, and they communicate over networks.
Test your images locally. Before pushing to production, run them locally with production-like settings:
docker run --read-only --tmpfs /tmp --user 1001 myapp:latest
This verifies your app works without write access and as a non-root user.
The Build Process Under the Hood
When you run docker build, Docker (or BuildKit) does several things:
- It sends the build context (current directory by default) to the Docker daemon
- It processes each Dockerfile instruction sequentially
- For each instruction, it checks if a cached layer exists
- If cache hits, it reuses the layer; if not, it executes the instruction
- Each executed instruction creates a new intermediate container, runs the command, and commits the result as a new layer
- The final image is a stack of all these layers
Understanding this helps you optimize. Want faster builds? Reduce the build context size with .dockerignore. Want better caching? Order your instructions carefully.
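To watch this process in action, BuildKit can print every step unabridged, including which layers were served from cache:

```shell
# Full, non-collapsed output for each build step; reused layers show as CACHED
docker build --progress=plain -t myapp:latest .
```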
Common Pitfalls to Avoid
Running apt-get/yum in separate RUN commands. Always update and install in one command and clean up in the same layer.
Copying everything then ignoring. Use .dockerignore instead of copying everything and then trying to delete files.
Not leveraging build cache. Put expensive operations that change rarely at the top of your Dockerfile.
Forgetting health checks. Add a HEALTHCHECK instruction so orchestrators know if your app is actually working:
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost:8080/health || exit 1
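One caveat: curl isn't installed in many slim and alpine images, so that health check would always fail there. Either install it, or use a tool the image already ships; Alpine-based images include busybox wget. A sketch, where the /health endpoint is a placeholder:

```dockerfile
# busybox wget is present in Alpine by default; --spider requests without downloading
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -q --spider http://localhost:8080/health || exit 1
```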
Using ADD when you mean COPY. ADD has magic behavior (auto extraction, URL fetching). Use COPY for simple file copying—it's explicit and predictable.
The Bottom Line
Docker images are how you package applications for the modern world. Master the fundamentals—understand layers, optimize for size and security, use multi-stage builds, and leverage modern BuildKit features. Start with good base images, be deliberate about what you include, and always think about what actually needs to be in production.
A well-crafted Dockerfile is infrastructure as code at its best: reproducible, versionable, and reliable. Take the time to optimize your images. Your future self will thank you when deployments are fast, secure, and bulletproof.