DEV Community

mk668a

Which package is bloating your Docker image?

layer-blame is git blame for image layers — it names the package responsible for every byte.

Here's what it prints when you point it at a stock Alpine image:

$ docker save alpine:3.20 -o alpine.tar
$ layer-blame alpine.tar

Image total: 8.4 MB across 1 layers  ·  package attribution: 100%

Layer 0  8.4 MB  ADD alpine-minirootfs-3.20.10-aarch64.tar.gz /
      4.8 MB  libcrypto3                      pkg   ← largest line highlighted
    911.7 KB  libssl3                         pkg
    906.0 KB  busybox                         pkg
    706.5 KB  musl                            pkg
    327.1 KB  apk-tools                       pkg

Every byte in that layer now has an owner, and libcrypto3 alone accounts for 4.8 of the 8.4 MB. That's the whole pitch: git blame, except the thing being blamed is the package responsible for a layer's size.

The gap it fills

You already have two tools for image size, and between them there's a hole.

docker history tells you that a layer is 95 MB. It will not tell you what's in it.

dive (~54k⭐) lets you browse what's in it — file by file, interactively. It answers what is in a layer.

Neither answers the question you actually walked up with: "which package put those bytes here?" So the standard ritual is to read docker history for the fat layer, open dive, eyeball the tree, and guess. People have asked dive for a package-level breakdown directly (see dive#291, open for years).

layer-blame JOINs each layer's added files against the image's own package databases — apk for Alpine, dpkg for Debian/Ubuntu — and attributes every byte to a package. It's not a browser. It's a non-interactive report you can paste into a PR or wire into CI.

dive for browsing, layer-blame for attribution. They're complementary.

Where do Python's 137 MB actually go?

This is the example that sold me on building it. python:3.12-slim is famously chunky. Here's the breakdown:

$ docker save python:3.12-slim -o py.tar
$ layer-blame --top 5 py.tar

Image total: 137.7 MB across 4 layers  ·  package attribution: 69%

Layer 0  95.8 MB  # debian.sh --arch 'arm64' out/ 'trixie' ...
     22.5 MB  libc6                           pkg
      9.1 MB  coreutils                       pkg
      7.4 MB  perl-base                       pkg
      7.3 MB  libssl3t64                      pkg
      6.6 MB  util-linux                      pkg

Layer 2  38.2 MB  RUN /bin/sh -c set -eux; savedAptMark="$(apt-mark showmanual)";
      6.3 MB  /usr/local/lib/libpython3.12.so.1.0                          file
      1.8 MB  /usr/local/lib/python3.12/ensurepip/_bundled/pip-...whl       file
      1.1 MB  /usr/local/lib/python3.12/lib-dynload/unicodedata...so        file
      ...

Look at Layer 2. The big contributors come back as files, not packages. That isn't a bug; it's the finding. This image compiles Python from source, so those bytes belong to no dpkg package. The layer's weight is build artifacts rather than installed packages, and the tool says so by falling back to file-level attribution instead of pretending. The "package attribution: 69%" header tells you exactly that: 69% of the image's bytes mapped cleanly to packages, and the remaining 31% are unowned bytes worth a second look.

Using it

docker save <image> -o image.tar
layer-blame [flags] image.tar

No Docker daemon needed at read time — it parses the docker save tarball (or a plain OCI layout) directly. That's what makes it CI-friendly: save the artifact in your build job, run layer-blame against the file.
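One way to wire the report into CI is to read the summary line and fail the job when too little of the image maps to packages. The sketch below is a minimal illustration in Python, not part of layer-blame: the regular expression is inferred only from the two sample reports in this post, and the function names and the 60% threshold are hypothetical.

```python
import re

# Pattern for the summary line, inferred from the sample reports in this post.
# The real output format may change between versions (assumption).
SUMMARY = re.compile(
    r"Image total: (?P<size>[\d.]+) (?P<unit>[KMG]?B) across (?P<layers>\d+) layers?"
    r".*package attribution: (?P<pct>\d+)%"
)

def parse_summary(report: str) -> dict:
    """Extract total size, layer count and attribution percentage from a report."""
    for line in report.splitlines():
        match = SUMMARY.search(line)
        if match:
            return {
                "size": float(match["size"]),
                "unit": match["unit"],
                "layers": int(match["layers"]),
                "attribution_pct": int(match["pct"]),
            }
    raise ValueError("no 'Image total' summary line found")

def attribution_ok(report: str, min_pct: int) -> bool:
    """True when at least min_pct percent of the image maps to packages."""
    return parse_summary(report)["attribution_pct"] >= min_pct

# Summary line copied from the python:3.12-slim report above.
SAMPLE = "Image total: 137.7 MB across 4 layers  ·  package attribution: 69%\n"

print(parse_summary(SAMPLE))
# The 60% threshold is a hypothetical budget, chosen only for the demo.
print("pass" if attribution_ok(SAMPLE, min_pct=60) else "fail")
```

In a real job the report text would come from the tool's captured output rather than a constant. Since the flag table says color is disabled on non-TTY output, piped output should arrive without ANSI codes.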

Flag         Default  Meaning
--top N      5        Top N contributors per layer
--no-color   false    Disable ANSI color (also honors NO_COLOR and non-TTY output)
--version    false    Print version, commit, build date

How it works

Five steps, all deterministic — no network, no daemon, no model:

  1. Load the tarball / OCI layout via go-containerregistry, which normalizes Docker-save, BuildKit, and containerd/OCI formats for you.
  2. Walk each layer's filesystem diff, recording every added file and its size. Whiteout (deletion) markers are skipped — you can't blame a layer for bytes it removed.
  3. Build a file→package index from the image's own package DBs: /lib/apk/db/installed (Alpine) and /var/lib/dpkg/info/*.list (Debian/Ubuntu).
  4. For each layer, group added bytes by owning package. Files with no owner are reported individually, so a large unattributed artifact still surfaces by name.
  5. Map each layer back to the Dockerfile instruction that created it (created_by from the image config) and print the table.
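Step 2 can be sketched in a few lines. This is a Python illustration of the idea, not layer-blame's Go code: it assumes the layer is available as tar bytes and that deletions use the OCI whiteout convention (a file named `.wh.<name>`). The helper names and the tiny in-memory layer are hypothetical.

```python
import io
import posixpath
import tarfile

WHITEOUT_PREFIX = ".wh."  # OCI convention: ".wh.<name>" marks <name> as deleted

def added_files(layer_tar: bytes) -> dict:
    """Return {absolute path: size} for the regular files a layer adds."""
    sizes = {}
    # Default mode "r" lets tarfile detect gzip compression transparently.
    with tarfile.open(fileobj=io.BytesIO(layer_tar)) as tar:
        for member in tar:
            path = posixpath.normpath("/" + member.name.lstrip("/"))
            if posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                continue  # deletion marker, contributes no bytes
            if member.isfile():
                sizes[path] = member.size
    return sizes

def make_layer(entries: dict) -> bytes:
    """Build a tiny in-memory tarball for the demo (not a real image layer)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

layer = make_layer({
    "usr/bin/tool": b"x" * 2048,
    "etc/.wh.old.conf": b"",        # whiteout: etc/old.conf was deleted
    "etc/new.conf": b"key=value\n",
})
print(added_files(layer))  # {'/usr/bin/tool': 2048, '/etc/new.conf': 10}
```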

The novel part is step 4 — the JOIN between added bytes and the package database. Everything else is plumbing.
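The JOIN in step 4 is small enough to show whole. The following is a minimal Python sketch under stated assumptions, not the tool's implementation: `added` stands for one layer's added files, `owner_of` for the file→package index, and all paths, sizes, and the rounding rule are made up for illustration. It also computes the percentage per layer, whereas the report header gives it for the whole image.

```python
from collections import defaultdict

def blame(added: dict, owner_of: dict, top: int = 5):
    """Group a layer's added bytes by owning package.

    Files with no owner are kept as individual rows, so a large
    unattributed artifact still surfaces by name.
    """
    totals = defaultdict(int)
    attributed = 0
    for path, size in added.items():
        pkg = owner_of.get(path)
        if pkg is not None:
            totals[(pkg, "pkg")] += size
            attributed += size
        else:
            totals[(path, "file")] += size
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows = [(name, kind, size) for (name, kind), size in ranked[:top]]
    total = sum(added.values())
    pct = round(100 * attributed / total) if total else 0
    return rows, pct

# Hypothetical layer contents and index, for illustration only.
added = {
    "/lib/libcrypto.so.3": 4000,
    "/usr/lib/engines/extra.so": 1000,
    "/lib/libssl.so.3": 900,
    "/bin/busybox": 800,
    "/opt/app/blob.bin": 3300,      # built from source, owned by no package
}
owner_of = {
    "/lib/libcrypto.so.3": "libcrypto3",
    "/usr/lib/engines/extra.so": "libcrypto3",
    "/lib/libssl.so.3": "libssl3",
    "/bin/busybox": "busybox",
}

rows, pct = blame(added, owner_of)
for name, kind, size in rows:
    print(f"{size:>6}  {name:<24} {kind}")
print(f"package attribution: {pct}%")  # 67% for this sample data
```

The two libcrypto3 files collapse into one 5000-byte row, while the unowned blob keeps its path and is tagged `file`. This is the same pattern as Layer 2 in the python:3.12-slim report.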

Known boundaries

I'd rather you hit these knowingly than be surprised:

  • scratch / distroless have no package database, so attribution falls back to file level. Still useful, just without package names.
  • Multi-stage COPY --from files are disconnected from their origin package, so they show as unattributed in the destination layer.
  • v1 handles apk (Alpine) and dpkg (Debian/Ubuntu) only. rpm and language package managers (npm, pip) aren't supported yet.
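Each package manager stores file ownership differently, which is presumably why support arrives one format at a time. Here is a rough Python sketch of what the step 3 index builders for the two supported formats might look like. It assumes the apk database marks package names with `P:`, directories with `F:`, and files with `R:`, and that each dpkg `.list` file holds one path per line. The function names and sample records are hypothetical, and layer-blame's actual parser may differ.

```python
def index_apk(installed_db: str) -> dict:
    """file -> package, from the text of /lib/apk/db/installed."""
    index, pkg, directory = {}, None, ""
    for line in installed_db.splitlines():
        if not line:                          # blank line ends a package record
            pkg, directory = None, ""
        elif line.startswith("P:"):           # package name
            pkg = line[2:]
        elif line.startswith("F:"):           # directory, relative to /
            directory = line[2:]
        elif line.startswith("R:") and pkg:   # file inside the last F: directory
            parts = [p for p in (directory, line[2:]) if p]
            index["/" + "/".join(parts)] = pkg
    return index

def index_dpkg(list_files: dict) -> dict:
    """file -> package, from {'<pkg>.list' or '<pkg>:<arch>.list': file text}."""
    index = {}
    for filename, text in list_files.items():
        if not filename.endswith(".list"):
            continue
        pkg = filename[: -len(".list")].split(":")[0]  # drop any ":arch" suffix
        for path in text.splitlines():
            # .list files also name directories; they are harmless here because
            # only regular files from a layer are ever looked up.
            if path and path != "/.":
                index[path] = pkg
    return index

# Trimmed, hypothetical records for illustration only.
APK_DB = """\
P:busybox
F:bin
R:busybox

P:musl
F:lib
R:ld-musl-aarch64.so.1
"""
DPKG_LISTS = {
    "libc6:arm64.list": "/.\n/usr\n/usr/lib\n/usr/lib/aarch64-linux-gnu/libc.so.6\n",
    "coreutils.list": "/usr/bin/ls\n",
}

print(index_apk(APK_DB))
print(index_dpkg(DPKG_LISTS)["/usr/bin/ls"])
```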

Install

It's a single Go binary, MIT-licensed.

# Prebuilt binary (Linux/macOS/Windows, amd64+arm64) from the latest release.
# Replace <version> with the release version before running.
curl -sSfL "https://github.com/mk668a/layer-blame/releases/latest/download/layer-blame_<version>_linux_amd64.tar.gz" \
  | tar -xzf - layer-blame

# or with Go
go install github.com/mk668a/layer-blame@latest

Repo: https://github.com/mk668a/layer-blame

Next time a PR balloons your image and the size-budget review asks "why is this 800 MB?", you'll have a one-command answer with a package name attached — instead of an afternoon in dive. If you try it on a weird image and the attribution surprises you, open an issue; the edge cases (rpm, cross-stage COPY) are exactly where it gets interesting.
