Juan Torchia
Posted on • Originally published at juanchi.dev

I Optimized a Docker Image from 1.58GB to 186MB — And Silently Broke Hot Reload for Two Days

I spent three hours optimizing a Docker image for a client. Took it from 1.58GB to 186MB. Sent the PR with an immaculate description, metrics included, everything neat. I felt like a genius.

Two days later, one of the devs messages me: "Hey, hot reload hasn't been working since your change got merged."

Two days. 48 hours of a team grinding without hot reload, restarting the server by hand, probably silently hating me without even knowing why.

I'm not telling this story to seem humble. I'm telling it because the post that inspired this one — I Shrunk My Docker Image From 1.58GB to 186MB — ends exactly where the real problem begins. The second half of the title, "Then I Had to Explain What I Actually Broke", is the part nobody writes. And it's the most important part.

How to actually optimize Docker image size: what really works

Before I get to what I broke, the happy path. Because the optimization itself is legitimate and worth understanding properly.

The project was a Node.js/Express app with TypeScript. Official base image, everything in a single stage, node_modules included with devDependencies and all. Classic.

# ORIGINAL Dockerfile — the one weighing 1.58GB
FROM node:20

WORKDIR /app

# Copy everything without filtering anything
COPY package*.json ./
RUN npm install

COPY . .

# Build TS
RUN npm run build

EXPOSE 3000
CMD ["node", "dist/index.js"]

This Dockerfile has all the classic problems: full base image with compilers, devDependencies installed and present in the final image, no effective .dockerignore, no separation of concerns between build and runtime.

The fix was a multi-stage build with an Alpine image:

# OPTIMIZED Dockerfile — 186MB
# Stage 1: build
FROM node:20-alpine AS builder

WORKDIR /app

# Dependencies first to leverage layer caching
COPY package*.json ./
RUN npm ci --include=dev

# Copy source and compile
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# Stage 2: production — only what needs to run
FROM node:20-alpine AS production

WORKDIR /app

# Production dependencies only (--omit=dev supersedes the deprecated --only=production)
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Only the compiled code, not the source
COPY --from=builder /app/dist ./dist

EXPOSE 3000
CMD ["node", "dist/index.js"]

And the .dockerignore that matters just as much as the Dockerfile itself:

# .dockerignore — everything that should NOT go in
node_modules
dist
.git
.gitignore
*.md
.env*
.dockerignore
Dockerfile*
npm-debug.log*

Result: 1.58GB → 186MB. 88% reduction. Pull times in CI/CD dropped from 4 minutes to 40 seconds. Legit.

What I broke without realizing it

Here's the problem nobody mentions in optimization tutorials.

The project used a single Dockerfile for both dev and production. In development, they'd spin up the container with docker-compose and a volume mounted over /app, running ts-node-dev for hot reload. In production, they ran the final stage with the compiled code.

When I switched the Dockerfile to multi-stage, the production stage came out perfect. But docker-compose.dev.yml was still pointing to the same Dockerfile without specifying a target:

# docker-compose.dev.yml — BEFORE my change
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile  # No target specified
    volumes:
      - ./src:/app/src  # Hot reload via volume
    command: npm run dev  # ts-node-dev
    ports:
      - "3000:3000"

When Docker builds a multi-stage Dockerfile without a target, it uses the last stage. The last stage was production. The production stage doesn't have ts-node-dev installed. It doesn't have the source code. It only has the dist/ compiled at build time.
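You can see this default without building anything. Here's a quick sanity check using a throwaway two-stage Dockerfile (the `/tmp` path and stage names are just for the demo): the last `FROM ... AS <name>` line is the stage Docker builds when no target is given.

```shell
# Write a throwaway two-stage Dockerfile (demo only, never built)
cat > /tmp/demo.Dockerfile <<'EOF'
FROM node:20-alpine AS builder
FROM node:20-alpine AS production
EOF

# With no --target, Docker builds the LAST stage listed.
# The last "FROM ... AS <name>" line tells you which one that is:
default_stage=$(grep -E '^FROM .+ AS ' /tmp/demo.Dockerfile | tail -n 1 | awk '{print $NF}')
echo "$default_stage"   # → production
```

In dev, the fix is to name the stage explicitly: `docker build --target builder` on the CLI, or `target: builder` in compose.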

So the volume ./src:/app/src was mounting the source files... but there was nothing listening to them. The running process was node dist/index.js on static code. Changes to the source did absolutely nothing.

And the worst part: the container started without any errors. The app worked. Everything looked fine. It's just that code changes weren't reflected until someone manually rebuilt the image.

Two days of that.

# docker-compose.dev.yml — FIXED
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: builder  # Explicit: use the stage with devDependencies
    volumes:
      - ./src:/app/src
      - ./tsconfig.json:/app/tsconfig.json
    command: npm run dev
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development

With target: builder specified, compose uses the stage that has all devDependencies including ts-node-dev, and hot reload works again.

Alternatively — and this is the solution I ended up implementing to make it more explicit — separate the Dockerfiles:

# Dockerfile.dev — development only, no ambiguity
FROM node:20-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci  # All dependencies, including dev

# Source is mounted by the compose volume
# We don't copy anything else here

EXPOSE 3000
CMD ["npm", "run", "dev"]
# docker-compose.dev.yml — explicitly using Dockerfile.dev
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.dev  # Zero ambiguity possible
    volumes:
      - ./src:/app/src
      - ./tsconfig.json:/app/tsconfig.json
    ports:
      - "3000:3000"

More files, zero confusion.

The most common mistakes when optimizing Docker images

After this episode I started documenting the gotchas that don't show up in tutorials.

1. Alpine and native dependencies

Alpine uses musl libc instead of glibc. Some Node packages with native binaries (bcrypt, sharp, canvas) won't compile on Alpine or behave differently. If your app uses any of these, test the image before celebrating the size:

# If you're having trouble with native binaries on Alpine,
# use slim instead — less dramatic but safer
FROM node:20-slim AS production
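If you'd rather stay on Alpine, the usual alternative is installing the build toolchain in the builder stage so native modules can compile against musl from source. A sketch, assuming a builder stage like the one above — the exact apk packages depend on your modules (e.g. sharp also wants vips-dev):

```dockerfile
# Builder stage on Alpine with the toolchain node-gyp needs.
# These packages exist only in this stage; the production stage
# copies artifacts out and stays toolchain-free.
FROM node:20-alpine AS builder

RUN apk add --no-cache python3 make g++

WORKDIR /app
COPY package*.json ./
RUN npm ci
```

Note this fixes compilation, not behavior: a package that genuinely depends on glibc semantics still belongs on slim.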

2. Layer order matters for caching

I knew this one and I still see it broken constantly:

# BAD — invalidates the dependency cache with every code change
COPY . .
RUN npm install

# GOOD — the npm install cache survives source changes
COPY package*.json ./
RUN npm install
COPY . .

3. npm install vs npm ci

In Docker, always npm ci. No debate. npm install can resolve different versions on each run and will happily rewrite the lockfile. npm ci installs exactly what package-lock.json specifies, fails fast if the lockfile disagrees with package.json, and is reproducible.

4. Not cleaning the npm cache

# After installing, clean the cache — saves 50-100MB easily
RUN npm ci --omit=dev && npm cache clean --force

5. The .dockerignore people forget

Without .dockerignore, your local node_modules gets sucked into the build context and can overwrite what Docker installed. Always, always, .dockerignore before any other optimization.
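A rough way to gauge the dead weight before Docker even sees it — run from the project root (this is an approximation; BuildKit's own "transferring context" line during the build is authoritative):

```shell
# Sizes of the usual offenders, when they exist
du -sh node_modules .git dist 2>/dev/null || true

# Total size of what would ship as build context without a .dockerignore
total_kb=$(du -sk . 2>/dev/null | awk '{print $1}')
echo "build context: ${total_kb} KB before .dockerignore filtering"
```

On a typical Node project, node_modules alone dominates that number, which is why .dockerignore comes before any other optimization.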

FAQ: Docker image optimization

How much can I reduce a typical Node.js Docker image?

Depends on the starting point, but in real-world projects the typical range is 70-90% reduction. Going from node:20 (1.1GB base) to node:20-alpine (45MB base) is already dramatic. Add multi-stage to separate devDependencies from runtime and it's common to go from 1-2GB down to 150-300MB.

Should I always use Alpine?

No. Alpine is excellent for most cases but has incompatibilities with packages that use native binaries compiled against glibc. If you're using sharp, bcrypt, canvas or similar, validate on Alpine before deploying. If there are issues, node:20-slim is the middle ground: smaller than the full image, more compatible than Alpine.

What is multi-stage build and why does it reduce size?

Multi-stage build lets you have multiple FROM statements in a single Dockerfile. Each stage is a separate environment. You can do the build in one stage with all the tools you need and then copy only the final artifact into a clean stage. The resulting image only contains the last stage — no compilers, no devDependencies, no source code if you don't need it.

How do I know what's taking up space in my image?

Use docker image history image-name to see the size of each layer. For more detailed analysis, dive is an excellent tool: it shows you each layer with an interactive file explorer and how much space each file contributes.

# Install dive
brew install dive  # macOS
# or
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive image-name

Does image size affect runtime performance?

Image size mainly affects pull and push times — which directly impact CI/CD pipelines and cold start times on platforms like Railway or Fly.io. Once the container is running, image size doesn't affect performance. What does affect runtime is the number of processes, allocated memory, and Node configuration — not image size.

How do I avoid the hot reload problem described in this post?

The most robust solution is to have separate Dockerfiles for dev and production (Dockerfile and Dockerfile.dev). If you prefer a single multi-stage Dockerfile, always specify the target in docker-compose.dev.yml. Never let Docker assume which stage to use in a dev compose — the default assumption is the last stage, which is usually the production one.

The metric that's missing from every optimization post

The number of MB you shave off is the easiest metric to show and the least important one for the team.

The metric that matters is: did the development workflow stay intact? Can the team make changes and see them reflected immediately? Is the dev/prod parity good enough for bugs to surface before deployment?

I failed that metric. The image looked beautiful. The team lost two days.

If you're tackling an optimization like this, add this to your checklist before merging:

  1. Did you run docker-compose up and modify a file in /src? Did the change show up?
  2. Are there environment variables that the production stage doesn't have?
  3. Do the health checks work the same way?
  4. Are the static file paths the same?

Four questions, ten minutes. Would have saved two days of broken hot reload.

It's the same principle I apply to any infrastructure change — from the distributed systems stuff I talked about in the post on multi-agent development to working with custom runtimes like the Rust one for TypeScript: optimizing one dimension without measuring the impact on the others is the most elegant way to break things. I learned that at an internet café at 14, fixing dropped connections with the place packed — if your solution creates a new problem nobody can see, it's not a solution.

The 186MB looks great in the PR. The team that can hot reload feels great day to day. Optimize both.
