GPU container images are the softest target in your infrastructure. A typical vLLM image is 15GB with hundreds of packages, a CUDA runtime, Python dependencies, and model weights. Most teams build these images once, push them, and never scan them again. That's a problem.
I've been building GPU infrastructure tools on Docker and Kubernetes for the past year — keda-gpu-scaler for autoscaling, otel-gpu-receiver for observability, and GPU NUMA topology scheduling for Volcano. Every one of these ships as a Docker container. This post walks through the zero-trust pipeline I use to build, scan, sign, and deploy GPU containers — from docker build to production.
## The Attack Surface
A standard GPU inference image has five layers of dependencies, and every layer is a CVE vector:
| Layer | Example | Risk |
|---|---|---|
| OS packages | Ubuntu 22.04, libc, OpenSSL | OS-level CVEs, often unpatched in CUDA base images |
| CUDA toolkit | libcudart, libnvml, cuDNN, NCCL | NVIDIA releases on their own cycle, often behind on OS patches |
| Python runtime | CPython 3.11+ | Python CVEs, pip supply chain attacks |
| ML framework | PyTorch, TensorFlow, vLLM | Hundreds of transitive Python deps |
| Application code | Custom serving logic, prompt templates | Your code — hopefully the smallest attack surface |
The CUDA layer is the sneaky one. NVIDIA maintains their own base images (nvidia/cuda:12.4-base-ubuntu22.04) on their own release schedule. When Ubuntu patches a critical OpenSSL vulnerability, the NVIDIA base image might not pick it up for weeks. If you're building on top of nvidia/cuda, you inherit that lag.
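You don't have to take that lag on faith. A quick way to see it yourself is to point Docker Scout at the NVIDIA base image you're inheriting (the tag below is the one used throughout this post; swap in whatever you actually build on, and note this assumes the Scout CLI plugin and a Docker Hub login):

```bash
# Scan the NVIDIA base image directly (requires the Docker Scout CLI plugin and a Hub login).
# Tag matches the examples in this post; adjust to the base you actually use.
docker pull nvidia/cuda:12.4-base-ubuntu22.04
docker scout cves --only-severity critical,high nvidia/cuda:12.4-base-ubuntu22.04
```

If the critical/high count is non-zero on a freshly pulled base, that's the inherited lag.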
## Step 1: Docker Hardened Images as the Foundation
Docker Hardened Images are pre-vetted, continuously patched base images maintained by Docker. Instead of building on NVIDIA's base image directly, you can use a Docker Hardened base and selectively copy in only the CUDA libraries you need:
```dockerfile
# Instead of inheriting the full NVIDIA base:
# FROM nvidia/cuda:12.4-base-ubuntu22.04

# Use a Docker Hardened base + only the CUDA libs you need
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Copy ONLY the NVIDIA libraries required at runtime
COPY --from=nvidia/cuda:12.4-base-ubuntu22.04 \
    /usr/local/cuda/lib64/libcudart.so* \
    /usr/local/cuda/lib64/libnvml.so* \
    /usr/local/cuda/lib64/libcublas.so* \
    /usr/local/lib/

RUN ldconfig
```
What this gives you:
- Patched OS layer — Docker maintains the base, not NVIDIA. Patches ship within hours, not weeks.
- Smaller image — only the CUDA libraries your application actually links against, not the full 3GB toolkit (a quick way to find those is shown after this list).
- Scout-optimized — Docker Scout has first-party provenance data for Hardened Images, so scanning is faster and more accurate.
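How do you know which `.so` files to copy? One rough approach, assuming you already have a working image built on the full NVIDIA base, is to run `ldd` against your binary inside it. The binary name below is illustrative, and libraries loaded at runtime via `dlopen` (go-nvml loads `libnvml` that way, for example) won't show up here and have to be added by hand:

```bash
# List the CUDA/NVIDIA libraries a binary resolves at load time.
# Run inside a container built on the full NVIDIA base; "gpu-scaler" is a placeholder.
docker run --rm -v "$PWD/gpu-scaler:/gpu-scaler:ro" \
  nvidia/cuda:12.4-base-ubuntu22.04 \
  sh -c 'ldd /gpu-scaler | grep -Ei "cuda|cudnn|nccl|nvml|nvidia"'
```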
## Step 2: Multi-Stage Builds with Minimal Runtime
The goal is to ship the smallest possible runtime image. Everything that's only needed at build time — compilers, Go toolchain, npm, development headers — stays in the build stage.
Here's the pattern I use for keda-gpu-scaler:
```dockerfile
# === Build Stage ===
FROM golang:1.22-bookworm AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=1 go build -ldflags="-s -w" -o gpu-scaler ./cmd/scaler

# === Runtime Stage ===
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Only the NVML library — nothing else from CUDA
COPY --from=nvidia/cuda:12.4-base-ubuntu22.04 \
    /usr/local/cuda/lib64/libnvml.so* /usr/local/lib/
RUN ldconfig

# Non-root user
USER 65534:65534

# Copy in the single binary; the root filesystem stays read-only at runtime
COPY --from=builder /app/gpu-scaler /usr/local/bin/
ENTRYPOINT ["gpu-scaler"]
```
Result: the runtime image is ~80MB instead of 3.5GB, cutting image size (and with it the attack surface) by roughly 97%.
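If you want to sanity-check those numbers on your own build, `docker image inspect` reports both the size and the configured user (the tag below is illustrative):

```bash
# Print the image size in bytes and the user it runs as (should be 65534:65534)
docker image inspect --format '{{.Size}} bytes, user={{.Config.User}}' \
  pmady/keda-gpu-scaler:latest
```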
For Python-based inference images (vLLM, Triton), the same principle applies but with pip:
```dockerfile
# Build stage: install Python deps
FROM python:3.11-bookworm AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: minimal base + only installed packages
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Copy CUDA runtime libs and make sure the dynamic linker can find them
COPY --from=nvidia/cuda:12.4-runtime-ubuntu22.04 \
    /usr/local/cuda/lib64/ /usr/local/cuda/lib64/
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64

# Copy Python and installed packages
COPY --from=python:3.11-slim /usr/local/ /usr/local/
COPY --from=builder /install /usr/local

# Non-root
USER 65534:65534

COPY ./app /app
ENTRYPOINT ["python", "-m", "app.serve"]
```
## Step 3: Docker Scout in CI — Block on CVEs
Docker Scout integrates into your CI pipeline to catch vulnerabilities at build time, not after deployment:
```yaml
# .github/workflows/docker-security.yml
name: Docker Security Pipeline

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: my-gpu-image:${{ github.sha }}

      # Scan for CVEs — fail on critical/high
      - name: Docker Scout CVE scan
        uses: docker/scout-action@v1
        with:
          command: cves
          image: my-gpu-image:${{ github.sha }}
          sarif-file: scout-results.sarif
          only-severities: critical,high
          exit-code: true

      # Check against Docker Scout policies
      - name: Docker Scout policy evaluation
        uses: docker/scout-action@v1
        with:
          command: policy
          image: my-gpu-image:${{ github.sha }}

      # Upload SARIF to GitHub Security tab
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: scout-results.sarif

      # Only push if scan passes
      - name: Push image
        if: success()
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            pmady/keda-gpu-scaler:${{ github.sha }}
            pmady/keda-gpu-scaler:latest
```
Key configuration:
- `exit-code: true` — the build fails if Scout finds critical or high CVEs. No exceptions.
- `sarif-file` — results show up in GitHub's Security tab for tracking over time.
- Policy evaluation — Docker Scout policies can enforce rules like "no images older than 30 days" or "must use Docker Official Images as base."
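The same gate is easy to run locally before you ever open a pull request. This is a sketch with a placeholder tag; the policy check additionally assumes your image and organization are enrolled in Docker Scout:

```bash
# Fail (non-zero exit) on critical/high CVEs, mirroring the CI step
docker scout cves --only-severity critical,high --exit-code my-gpu-image:dev

# Evaluate Scout policies locally (requires a Scout-enabled org)
docker scout policy my-gpu-image:dev
```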
## Common GPU Image Findings
| Finding | Cause | Fix |
|---|---|---|
| Critical OpenSSL CVE | NVIDIA base image lags Ubuntu patches | Use Hardened Image as base |
| High-severity Python CVE | Transitive deps in PyTorch/vLLM | Pin versions, run `pip-audit` in CI |
| Medium glibc CVE | Base image outdated | Rebuild weekly with `--no-cache` |
| Outdated CUDA libraries | NVIDIA release cycle | Cherry-pick only needed .so files from latest CUDA image |
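For the Python CVE row, the audit can run before the image is even built. Here's a minimal sketch using pip-audit (a PyPA tool, separate from Docker Scout) against the same requirements.txt the Dockerfile installs:

```bash
# Audit pinned requirements for known vulnerabilities before building the image
pip install pip-audit
pip-audit -r requirements.txt
```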
## Step 4: Container Signing with Docker Content Trust
Sign your images so that Kubernetes admission controllers can verify provenance:
```bash
# Enable Docker Content Trust
export DOCKER_CONTENT_TRUST=1

# Push signs automatically
docker push pmady/keda-gpu-scaler:v1.0.0

# Verify a signed image
docker trust inspect pmady/keda-gpu-scaler:v1.0.0
```
For CI, use cosign (Sigstore) for keyless signing:
```yaml
# In your GitHub Actions workflow
# (keyless signing needs the job's OIDC token: permissions: id-token: write)
- name: Install cosign
  uses: sigstore/cosign-installer@v3

- name: Sign
  run: |
    cosign sign --yes \
      pmady/keda-gpu-scaler:${{ github.sha }}
  env:
    COSIGN_EXPERIMENTAL: 1  # only needed on cosign 1.x; harmless on 2.x
```
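You can verify the signature out-of-band with `cosign verify`. The identity and issuer must match however the image was actually signed: if the workflow above signs with the ambient GitHub Actions OIDC token, the issuer is https://token.actions.githubusercontent.com and the identity is the workflow ref, whereas the values below mirror the Google-issued identity used in the Kyverno policy that follows:

```bash
# Keyless verification: identity and issuer must match the signing certificate
cosign verify \
  --certificate-identity "pavan4devops@gmail.com" \
  --certificate-oidc-issuer "https://accounts.google.com" \
  pmady/keda-gpu-scaler:v1.0.0
```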
Then enforce in Kubernetes with Kyverno:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences: ["pmady/*"]
          attestors:
            - entries:
                - keyless:
                    subject: "pavan4devops@gmail.com"
                    issuer: "https://accounts.google.com"
```
Now unsigned or tampered images are rejected at admission. The chain is: Build → Scout scan → Sign → Push → Kubernetes admission verifies → Runtime policy enforces.
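To confirm the policy actually bites, apply it and try to run an image that was never signed. The filename and tag below are placeholders; Kyverno should deny the Pod at admission with a signature-verification error:

```bash
# Apply the ClusterPolicy above (saved locally as require-signed-images.yaml)
kubectl apply -f require-signed-images.yaml

# Attempt to run an unsigned tag; admission should reject it
kubectl run unsigned-test --image=pmady/keda-gpu-scaler:unsigned-dev --restart=Never
```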
## Step 5: Runtime Security for GPU Containers
GPU containers need device access (/dev/nvidia*) but they don't need anything else privileged. Lock them down:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    fsGroup: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: inference
      image: pmady/keda-gpu-scaler:v1.0.0
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          nvidia.com/gpu: 1
          memory: 8Gi
        requests:
          nvidia.com/gpu: 1
          memory: 4Gi
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: models
          mountPath: /models
          readOnly: true
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: models
      persistentVolumeClaim:
        claimName: model-weights
        readOnly: true
```
What this enforces:
- Non-root — NVML reads from sysfs, doesn't need root
- Read-only root filesystem — no writes except `/tmp` (ephemeral) and `/models` (read-only PVC)
- No privilege escalation — prevents container escape
- Drop all capabilities — the NVIDIA device plugin handles GPU device injection, the container doesn't need any Linux capabilities
- Seccomp — default syscall filter
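A quick way to spot-check these controls on the running Pod is sketched below; it assumes the image still ships a shell and coreutils, which a heavily stripped or distroless-style image may not:

```bash
# Should report uid=65534 (nobody), not root
kubectl exec gpu-inference -c inference -- id

# Should fail with a read-only filesystem error
kubectl exec gpu-inference -c inference -- touch /should-fail

# /tmp is backed by the emptyDir, so this should succeed
kubectl exec gpu-inference -c inference -- touch /tmp/ok
```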
## The Full Pipeline
```text
Source Code
     │
     ▼
docker build (multi-stage, Hardened Image base)
     │
     ▼
Docker Scout CVE scan ── FAIL? → Block merge
     │
     ▼
Docker Scout policy check ── FAIL? → Block merge
     │
     ▼
cosign sign (keyless, Sigstore)
     │
     ▼
docker push (to registry)
     │
     ▼
Kubernetes admission (Kyverno verifies signature + base image)
     │
     ▼
Runtime (non-root, read-only fs, drop caps, seccomp, resource limits)
```
Every stage has a gate. No image reaches production without being scanned, signed, and policy-checked. This is the same pipeline whether you're deploying a GPU inference service, a training job orchestrated by Volcano, or my keda-gpu-scaler DaemonSet.
## This Isn't Optional for GPU Workloads
GPU containers are high-value targets. They run on expensive hardware ($3-30/hour per instance), they often have network access to model registries (HuggingFace, S3), and they process potentially sensitive input data. A compromised inference container is a direct path to data exfiltration and compute theft (cryptomining on your A100s).
The zero-trust pipeline adds ~2 minutes to your CI build. The alternative is finding out about a critical CVE from your security team after it's been running in production for three weeks.
Docker Scout + Hardened Images + container signing. Use all three.
Pavan Madduri is a Senior Cloud Platform Engineer at W.W. Grainger, Inc., CNCF Golden Kubestronaut, and Oracle ACE Associate. He maintains keda-gpu-scaler and otel-gpu-receiver, and contributed GPU NUMA topology scheduling to Volcano.