Daniel Westgaard

Posted on • Originally published at riftmap.dev

How to Find Every Consumer of Your Docker Base

You maintain a shared base image. A CVE drops. Which repos are affected? Here's why the answer is harder than it should be.


You maintain an internal Docker base image. Maybe it's platform/node-base or company/python-runtime. A dozen repos use it. Or thirty. Or you're not sure how many, because nobody's counted since the last team reorganisation.

Then a critical CVE hits the base OS layer, and you need to push a patched version. The question that matters is simple: which repos across our org pull this image, and at which version?

This should be easy to answer. It isn't.

The scenario

Here's what this looks like in practice. Your platform team builds and publishes registry.company.com/platform/base-image. Across the org, Dockerfiles reference it:

FROM registry.company.com/platform/base-image:v2.1

Some repos pin to a specific tag. Some use latest. Some have ARG-parameterized FROM statements where the tag is injected at build time. Some reference the image not in a Dockerfile but in a docker-compose.yml or a CI pipeline config.

You need to find all of them, check which version each one uses, and coordinate the update. Right now, most teams do this by grepping, asking on Slack, or checking CI logs. None of these give you a complete, current answer.

What existing tools give you (and where they stop)

Several tools address parts of the Docker base image problem. Each one is useful. None of them answer the full question.

Docker Scout

Docker Scout analyzes a built image and tells you what base image it uses, whether that base is outdated, and what vulnerabilities it contains. docker scout recommendations will suggest updated base images with fewer CVEs.

This is valuable, but it works per-image, not per-org. It answers "what base image does this image use?" It doesn't answer "which of my 200 repos use this base image?" You'd need to run Scout against every image in your registry and correlate the results back to source repos, which is a project in itself.

Renovate

Renovate detects FROM statements in Dockerfiles and can open pull requests when a newer tag is available. It implicitly knows who consumes what, because it's configured per-repo and parses the Dockerfiles.

But Renovate doesn't expose this as a queryable view. You can't ask Renovate "show me every repo that references platform/base-image." It reacts when a new version appears. It doesn't give you the pre-release blast radius: before you push the patched image, which repos and teams will need to update?

Backstage / Roadie Tech Insights

Roadie (a managed Backstage provider) has a Tech Insights feature that can parse Dockerfiles via regex, extract base image versions, and create scorecards tracking migration progress. This is the closest existing solution to the consumer-tracking problem.

The limitation is that it requires Backstage to be set up with catalog-info.yaml per repo. You're tracking Docker base images through a service catalog that depends on manual registration. If a repo isn't in the catalog, its Dockerfile is invisible.

Container registries

Your registry (Docker Hub, Harbor, ACR, ECR) knows which images have been pulled and how often. Some provide pull statistics by tag. But pull logs don't tell you which source repo initiated the pull. A CI pipeline pulling base-image:v2.1 shows up as a pull event, not as "repo frontend-api depends on this image via its Dockerfile on line 3."

Grep

The fallback everyone uses. Clone all the repos, run grep -r "platform/base-image", assemble the results. It works once. The results are stale immediately. It misses FROM statements where the image name or tag is parameterized, and it can't tell a Dockerfile base from a Compose or CI reference without manual triage. And at a hundred repos, it takes long enough that nobody does it proactively.

Why this is harder than it looks

The core difficulty is that Docker base image dependencies live in multiple places and multiple formats.

Dockerfiles are the obvious source, but FROM statements can use ARG substitution:

ARG BASE_TAG=v2.1
FROM registry.company.com/platform/base-image:${BASE_TAG}

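A simple grep for the image name finds the FROM line. A simple grep for the full reference, base-image:v2.1, doesn't, because the tag lives in a variable. And if the pipeline passes --build-arg at build time, the default in the Dockerfile isn't even the version that ships.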
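To make the ARG case concrete, here is a minimal Python sketch of resolving ARG defaults into FROM lines. The function name resolve_from_images is my own, purely for illustration. It assumes single-line instructions, only honours ARGs declared before the first FROM (the ones Docker allows in FROM lines), and knows nothing about --build-arg overrides, so treat the output as "the default reference", not "what was actually built".

```python
import re

ARG_RE = re.compile(r"^\s*ARG\s+(\w+)(?:=(\S+))?", re.IGNORECASE)
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)
VAR_RE = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}|\$(\w+)")


def resolve_from_images(dockerfile_text):
    """Return external image refs from FROM lines, with ARG defaults applied."""
    defaults = {}      # global ARG name -> default value
    aliases = set()    # stage names introduced by "FROM ... AS name"
    images = []
    seen_from = False
    for line in dockerfile_text.splitlines():
        arg = ARG_RE.match(line)
        if arg:
            # Only ARGs before the first FROM are usable in FROM lines.
            if not seen_from and arg.group(2) is not None:
                defaults[arg.group(1)] = arg.group(2).strip("\"'")
            continue
        frm = FROM_RE.match(line)
        if not frm:
            continue
        seen_from = True
        ref, alias = frm.group(1), frm.group(2)

        def substitute(match):
            name = match.group(1) or match.group(3)
            if name in defaults:
                return defaults[name]
            if match.group(2) is not None:   # ${NAME:-fallback}
                return match.group(2)
            return match.group(0)            # leave unresolved variables visible

        # "FROM build" pointing at an earlier stage is not an external image.
        if ref.lower() not in aliases:
            images.append(VAR_RE.sub(substitute, ref))
        if alias:
            aliases.add(alias.lower())
    return images
```

Unresolved variables are left in place on purpose: a reference like base-image:${BASE_TAG} with no default is still a consumer, and you want it flagged rather than silently dropped.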

Docker Compose files reference images differently:

services:
  api:
    image: registry.company.com/platform/base-image:v2.1

This is a dependency on the same image, declared in a completely different file format.

CI pipeline configs often pull images directly:

build:
  image: registry.company.com/platform/base-image:v2.1
  script:
    - make build

A GitLab CI image: directive or a GitHub Actions container: field is another consumption point for the same base image, and it's in yet another file.
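A rough sketch of catching these YAML-side references, again in Python. The helper find_image_refs is hypothetical, and it deliberately uses a line-based regex instead of a YAML parser to stay dependency-free. That means it only sees image: and container: values written as a scalar on the same line; the mapping forms (for example image: with a nested name: key) would need a real parser.

```python
import re

# Matches "image: ref" or "container: ref", optionally quoted or in a list item.
IMAGE_KEY_RE = re.compile(
    r"^\s*(?:-\s+)?(image|container):\s*[\"']?([^\s\"'#]+)"
)


def find_image_refs(yaml_text, image_name):
    """Return (line_number, key, reference) for lines that reference image_name."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = IMAGE_KEY_RE.match(line)
        if match and image_name in match.group(2):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


compose = """services:
  api:
    image: registry.company.com/platform/base-image:v2.1
"""
print(find_image_refs(compose, "platform/base-image"))
```

The same function can be pointed at a docker-compose.yml, a .gitlab-ci.yml, or a GitHub Actions workflow, which is the point: one image, three file types, one extraction pass.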

Then there's the producer-side problem. Knowing which repos consume the image is half the story. You also need to know which repo builds it. That's usually a CI pipeline with docker build and docker push commands, not something declared in a manifest. Connecting the consumer graph to the producer requires cross-referencing multiple file types within and across repos.
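For the producer side, a naive first pass could look for docker build -t and docker push commands in CI scripts. The sketch below is only that, a first pass: produced_images is a made-up name, it assumes each command sits on one line, and it leaves CI variables such as $CI_REGISTRY_IMAGE unresolved instead of guessing their values.

```python
import re

PUSH_RE = re.compile(r"docker\s+push\s+(\S+)")
BUILD_TAG_RE = re.compile(
    r"docker\s+(?:buildx\s+)?build\b.*?(?:-t|--tag)[ =](\S+)"
)


def produced_images(ci_text):
    """Return the set of image references a CI config appears to build or push."""
    images = set()
    for line in ci_text.splitlines():
        for pattern in (PUSH_RE, BUILD_TAG_RE):
            match = pattern.search(line)
            if match:
                images.add(match.group(1))
    return images


ci_config = """publish:
  script:
    - docker build -t registry.company.com/platform/base-image:v2.2 .
    - docker push registry.company.com/platform/base-image:v2.2
"""
print(produced_images(ci_config))
```

Pipelines that build through wrapper scripts, Makefiles, or dedicated build actions won't match these patterns, which is part of why the producer side is the harder half.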

No single tool today connects all of these surfaces into one view.

What the full answer requires

To reliably answer "who consumes this base image," you need a system that:

  1. Scans every repo in the org, not just the ones registered in a catalog
  2. Parses Dockerfiles, Compose files, and CI configs, because the same image can be referenced in all three
  3. Handles variable substitution in FROM statements by resolving ARG defaults
  4. Detects which repos produce which images by scanning CI configs for build and push commands
  5. Keeps the results current through scheduled or event-triggered rescans
  6. Makes the graph queryable: "show me every consumer of platform/base-image, grouped by version"
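The last requirement, the query itself, is the simple part once the references are collected. A minimal Python sketch, assuming the scan has already produced (repo, image reference) pairs; the repo names other than frontend-api and both function names are invented for the example, and digest-pinned references (name@sha256:...) are not handled.

```python
from collections import defaultdict


def split_ref(ref):
    """Split 'registry/name:tag' into (name, tag).

    A colon that appears before the last slash is a registry port, not a tag.
    An untagged reference defaults to 'latest', as Docker does.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


def consumers_by_version(records, image_name):
    """Group consuming repos by the tag of image_name they reference."""
    grouped = defaultdict(set)
    for repo, ref in records:
        name, tag = split_ref(ref)
        if name == image_name or name.endswith("/" + image_name):
            grouped[tag].add(repo)
    return {tag: sorted(repos) for tag, repos in grouped.items()}


records = [
    ("frontend-api", "registry.company.com/platform/base-image:v2.1"),
    ("billing", "registry.company.com/platform/base-image:v2.0"),
    ("search", "registry.company.com/platform/base-image"),
    ("docs", "registry.company.com/platform/other:v1"),
]
print(consumers_by_version(records, "platform/base-image"))
```

The hard work is everything that feeds the records list: scanning every repo, parsing every file type, and keeping the result fresh.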

This is one of the specific problems I'm building Riftmap to solve. It scans a GitLab or GitHub org, parses Dockerfiles (including multi-stage builds and ARG defaults) and Compose files, detects which repos produce which images via CI config analysis, and builds a cross-repo dependency graph you can query by artifact.

The result: when that CVE drops, you click on the base image in the graph and immediately see every repo that depends on it, which version each one uses, and who owns them. No grepping. No Slack archaeology. No stale spreadsheet from last quarter.


How is your team solving this today? I'd genuinely like to know. Drop a comment or find me at riftmap.dev.
