Hey folks! Let’s be honest — if you’ve ever dockerized a basic Python app, you’ve probably run into this: you build a tiny Flask “Hello World” project, and somehow the Docker image balloons to almost a gigabyte. Feels a bit like shipping a single pizza in a cargo freighter.
So, what’s happening? Why do these Python images blow up in size?
## Overview

### The "Fat" Image Problem
1) Invisible Bloat: pip install pulls in all sorts of stuff behind the scenes. If your app uses pandas or psycopg2, for example, you’ll need build tools like gcc just to get them installed. Once pip’s done, those tools stick around, eating up space for no good reason.
2) The Base Image Trap: If you start with something like FROM python:3.11, you’re basically dragging along a full-blown Debian OS — utilities, tools, and a bunch of extras that your app will never touch in production.
3) Hidden Leftovers: pip caches, temporary files, and build artifacts pile up in the image, taking up room and serving no purpose once the build is done.
### Why It Matters (The L3 Perspective)
I deal with this stuff every day as an L3 Support Engineer. And it’s not just about storage. Every unnecessary megabyte is another opportunity for attackers. If someone breaks into your container, all those extra tools (like gcc, curl, or git) give them a head start to compile exploits or move laterally through your network.
### The Plan
We want images that are lean, locked down, and fast. Here’s what we are going to achieve:
1) Shrink your image size by as much as 80%.
2) Strip out build tools before production for tighter security.
3) Speed up your CI/CD pipeline.
The trick? Multi-Stage Builds. Let’s get into it.
## Step 0: The Project Setup
Before we even touch Docker, let’s get our bearings with a simple Python app. The setup looks like what you’d see in most real-world projects.
Project Structure:

```text
my-python-app/
├── templates/
│   └── index.html
├── app.py
├── requirements.txt
└── Dockerfile
```
`templates/index.html`:

```html
<!DOCTYPE html>
<html>
<head>
    <title>Docker Slim App</title>
</head>
<body>
    <h1>Docker Multi-stage is working!</h1>
    <p>This image is slim, secure, and production-ready.</p>
</body>
</html>
```
`app.py` (a simple Flask app):

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def hello():
    return render_template("index.html")

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
`requirements.txt`:

```text
flask==3.0.0
```
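For reference, here is the kind of naive single-stage Dockerfile this guide is moving away from (a sketch for comparison only, not a recommendation):

```dockerfile
# Naive approach: one stage, full base image, dev server in production
FROM python:3.11
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
```

Build this and you get the fat-image problem described above: the full Debian userland, pip's cache, and every local file all ship to production.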
## Step 1: Use a Specific Base Image
Let’s talk about the FROM instruction. This line sets the foundation for your whole Dockerfile. Most folks just slap in FROM python:3.11 and call it a day. The problem? That image is a monster.
You actually have two lighter choices: Alpine and Slim.
Now, Alpine gets a lot of hype because it’s tiny: the base is only about 5 MB. Sounds great, right? But for Python workloads, Alpine often adds complexity instead of saving you anything.
Here’s the deal: Alpine uses musl libc, while most pre-built Python packages (manylinux wheels) target glibc, which is what Debian and Ubuntu ship. Unless a package also publishes musl-compatible (musllinux) wheels, pip on Alpine falls back to building it from source. That quick 30-second install can drag out for 20 minutes and then die with a confusing compiler error.
The Pro Move: Go with Python-Slim. It’s basically Debian, but trimmed down. You get glibc compatibility, so your packages install fast, and the base image is still only about 120MB. No headaches, no drama.
```dockerfile
# Start with the slim version for stability and speed
FROM python:3.11-slim
```
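If you’re ever unsure which libc a base image ships, the standard library can tell you. A small sketch (on glibc-based images this reports a version; on musl-based images like Alpine it typically reports an empty string):

```python
import platform

# platform.libc_ver() inspects the C library the interpreter is linked against.
# On Debian/Ubuntu-based images it reports ("glibc", "<version>");
# on musl-based images (Alpine) it typically reports ("", "").
lib, version = platform.libc_ver()

if lib == "glibc":
    print(f"glibc {version}: prebuilt manylinux wheels should install fine")
else:
    print("non-glibc libc: pip may fall back to building packages from source")
```

Run it inside the container (`docker run --rm <image> python -c ...`) to confirm what you’re dealing with before a long pip install surprises you.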
## Step 2: The "Builder" Stage (Compiling Dependencies)

```dockerfile
# Stage 1: The Builder
FROM python:3.11-slim AS builder

# Install system dependencies needed for building packages.
# Slim is Debian-based, so we use apt.
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    gcc \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Create the virtual environment
RUN python -m venv /opt/venv

# Ensure subsequent commands use the venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt \
    && pip install --no-cache-dir gunicorn
```
Why bother with a virtual environment inside Docker?
Yeah, Docker containers are already isolated. But when we use /opt/venv, we keep all our installed libraries tucked away in one spot. Later, instead of chasing files all over the place, we just grab that one folder and we’re good to go.
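One caveat worth knowing: a venv records absolute paths when it is created, which is why we copy it to the same location (`/opt/venv`) in the final stage instead of moving it somewhere else. You can see those baked-in paths yourself with a quick standard-library experiment:

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway venv (without pip, to keep it fast) and peek at
# the metadata it writes to disk.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-venv"
    venv.create(env_dir, with_pip=False)

    # pyvenv.cfg records the absolute path of the interpreter that created it.
    print((env_dir / "pyvenv.cfg").read_text())
```

Because the venv’s `bin/python` symlinks back to the interpreter that created it, both stages should also use the same base image, so that interpreter exists at the same path in the final image.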
## Step 3: The "Final" Stage (The Lean Runner)
Now for the fun part. We start fresh with a slim image—no compilers, no build tools, nothing extra.
```dockerfile
# Stage 2: The Production Image
FROM python:3.11-slim

WORKDIR /app

# Copy the virtual environment from the builder
COPY --from=builder /opt/venv /opt/venv

# Bring in the app code and templates
COPY . .

# Set some environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

EXPOSE 5000

CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]
```
So, what’s actually going on here?
1) COPY --from=builder: This is what makes multi-stage builds awesome. We tell Docker, “Hey, grab only the /opt/venv folder from our heavy builder image and drop it in here.”
2) PYTHONDONTWRITEBYTECODE=1: Stops Python from scattering .pyc files everywhere. Keeps things tidy.
3) PYTHONUNBUFFERED=1: Super important for Docker. It makes sure all your logs and print statements show up right away in the terminal, not stuck waiting somewhere.
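You can see the effect of `PYTHONUNBUFFERED` by asking a child interpreter whether its stdout is unbuffered. A small illustration using only the standard library (`write_through` on the stdout text layer is `True` when Python runs unbuffered):

```python
import os
import subprocess
import sys

# A child interpreter reports whether its stdout text layer is unbuffered.
probe = [sys.executable, "-c", "import sys; print(sys.stdout.write_through)"]

env = os.environ.copy()
env.pop("PYTHONUNBUFFERED", None)
default = subprocess.run(probe, capture_output=True, text=True, env=env)

env["PYTHONUNBUFFERED"] = "1"
unbuffered = subprocess.run(probe, capture_output=True, text=True, env=env)

print("default:   ", default.stdout.strip())
print("unbuffered:", unbuffered.stdout.strip())
```

With the pipe-connected stdout a container gives you, the default run stays buffered, so a crash can swallow your last log lines; the unbuffered run flushes every write immediately.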
## Step 4: Security Hardening (Non-Root User)
Never run your app as root in production. If someone finds a bug in your code and breaks in, you don’t want them running wild as the most powerful user in the system.
To lock things down, you need a dedicated user with limited access.
Updated Final Stage:
```dockerfile
# Stage 2: The Production Image
FROM python:3.11-slim

# Add a new group and user just for your app
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

# Bring in the virtual environment from the builder stage
COPY --from=builder /opt/venv /opt/venv

# Copy your code and hand it over to the new user
COPY --chown=appuser:appuser . .

# Set up environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Make sure everything below runs as the non-root user
USER appuser

EXPOSE 5000

CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]
```
Why bother with all this?
1) groupadd & useradd: You’re making a user called appuser. No password, no home directory, nothing fancy. It’s just there to run your app and nothing else.
2) --chown=appuser:appuser: As soon as your code lands in the image, it belongs to appuser. No root access, no extra powers.
3) USER appuser: This one’s huge. Now, everything after this — including your app itself — only runs with the limited permissions of appuser.
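If you want a belt-and-suspenders check, the app can verify this at startup. A small optional sketch (the function name and warning text are mine, not part of the app above; POSIX only):

```python
import os
import sys

def running_as_root() -> bool:
    """Return True if the current process has root privileges (POSIX only)."""
    # geteuid() returns the effective user id; root is always uid 0.
    return hasattr(os, "geteuid") and os.geteuid() == 0

if running_as_root():
    print("WARNING: container is running as root; check USER in the Dockerfile",
          file=sys.stderr)
```

Dropping this near the top of `app.py` costs nothing and catches the common mistake of a Dockerfile edit accidentally removing the `USER` line.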
## Step 5: Don't Forget the .dockerignore
Think of .dockerignore as Docker’s version of .gitignore. It tells Docker, “Hey, when I copy everything over, skip these files.”
If you miss this, you could end up copying your whole local venv into the image and clobbering the clean one you built earlier.
Just drop a .dockerignore file in your project’s root directory:
```text
.git
.gitignore
__pycache__/
*.pyc
venv/
.env
.vscode/
```
Why does this matter so much?
- Stops needless rebuilds: If you change a random local file (like a README or git log), Docker won’t waste time rebuilding your image layers from scratch.
- Protects sensitive stuff: You don’t want `.git` history or `.env` files, especially ones containing secrets or passwords, ending up in your production image.
- Speeds things up: Your build context shrinks. Less junk to copy means your build kicks off almost immediately.
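To get a feel for how much `.dockerignore` saves, you can roughly estimate your build-context size. This sketch uses fnmatch-style patterns, which only approximate Docker’s real matching rules (Docker follows Go’s `filepath.Match` semantics plus `.dockerignore` extensions), so treat the numbers as a ballpark:

```python
import fnmatch
import os

# Patterns mirroring the .dockerignore above (fnmatch-style approximation).
IGNORE = [".git", ".gitignore", "__pycache__", "*.pyc", "venv", ".env", ".vscode"]

def context_size(root: str) -> int:
    """Sum file sizes under root, skipping anything matching an ignore pattern."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories so we don't descend into them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, p) for p in IGNORE)]
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in IGNORE):
                continue
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip broken symlinks and the like
    return total

print(f"approx. build context: {context_size('.') / 1024:.1f} KiB")
```

Run it in your project root with and without a local `venv/` present and the difference is usually dramatic.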
## Conclusion
Let’s see what we started with and where we ended up. These changes took a clunky, insecure container and turned it into something you’d actually want running in production.
| Feature | Standard `FROM python:3.11` | Our Multi-Stage Image |
|---|---|---|
| Final Size | ~467 MB | ~154 MB |
| Security | Runs as root (Dangerous) | Non-root user (Secure) |
| Compilers | `gcc`, `build-essential` present | Removed |
| Cleanliness | Contains pip cache & junk | Only necessary files |
Shrinking the image by two-thirds or more isn’t just good for bragging rights. Now deployments go faster, you save on storage, and your security gets a serious upgrade. Take it from me—when your image is lean, you have fewer weird issues, and way fewer middle-of-the-night emergencies.
Before you push your next Python image to the registry, run through this quick checklist:
1. Use a `-slim` base image. Don’t pull in the whole kitchen sink unless you have to.
2. Build in stages. Compilers and headers stay in the builder; keep your final image tidy.
3. Bundle dependencies with a `venv`. Keep everything in a single, portable folder.
4. Never run as root. Spin up a dedicated `appuser` instead.
5. Check your `.dockerignore`. Don’t ship `.git`, secrets, or `__pycache__` by accident.
6. Set `PYTHONUNBUFFERED=1`. That way, your logs actually show up in real time.
Honestly, good DevOps isn’t just about getting things to work. It’s about doing it efficiently and securely. Multi-stage builds are an easy win if you want to step up your Docker game.
## Clean Up!
One last thing: multi-stage builds sometimes leave behind a `<none>` image (that’s just the builder stage hanging around). Tidy up your system with:

```bash
docker image prune
```
Happy hacking!