GPU container images are the softest target in your infrastructure. A typical vLLM image is 15GB with hundreds of packages, a CUDA runtime, Python dependencies, and model weights. Most teams build these images once, push them, and never scan them again. That's a problem.
I've been building GPU infrastructure tools on Docker and Kubernetes for the past year — keda-gpu-scaler for autoscaling, otel-gpu-receiver for observability, and GPU NUMA topology scheduling for Volcano. Every one of these ships as a Docker container. This post walks through the zero-trust pipeline I use to build, scan, sign, and deploy GPU containers — from docker build to production.
## The Attack Surface
A standard GPU inference image has five layers of dependencies, and every layer is a CVE vector:
| Layer | Example | Risk |
|---|---|---|
| OS packages | Ubuntu 22.04, libc, OpenSSL | OS-level CVEs, often unpatched in CUDA base images |
| CUDA toolkit | libcudart, libnvml, cuDNN, NCCL | NVIDIA releases on their own cycle, often behind on OS patches |
| Python runtime | CPython 3.11+ | Python CVEs, pip supply chain attacks |
| ML framework | PyTorch, TensorFlow, vLLM | Hundreds of transitive Python deps |
| Application code | Custom serving logic, prompt templates | Your code — hopefully the smallest attack surface |
The CUDA layer is the sneaky one. NVIDIA maintains their own base images (nvidia/cuda:12.4-base-ubuntu22.04) on their own release schedule. When Ubuntu patches a critical OpenSSL vulnerability, the NVIDIA base image might not pick it up for weeks. If you're building on top of nvidia/cuda, you inherit that lag.
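You don't have to take that lag on faith. A quick way to see it yourself is to point Docker Scout at the NVIDIA base image you're inheriting (the tag below is the one used throughout this post; swap in whatever you actually build on, and note this assumes the Scout CLI plugin and a Docker Hub login):

```bash
# Scan the NVIDIA base image directly (requires the Docker Scout CLI plugin and a Hub login).
# Tag matches the examples in this post; adjust to the base you actually use.
docker pull nvidia/cuda:12.4-base-ubuntu22.04
docker scout cves --only-severity critical,high nvidia/cuda:12.4-base-ubuntu22.04
```

If the critical/high count is non-zero on a freshly pulled base, that's the inherited lag.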
## Step 1: Docker Hardened Images as the Foundation
Docker Hardened Images are pre-vetted, continuously patched base images maintained by Docker. Instead of building on NVIDIA's base image directly, you can use a Docker Hardened base and selectively copy in only the CUDA libraries you need:
```dockerfile
# Instead of inheriting the full NVIDIA base:
# FROM nvidia/cuda:12.4-base-ubuntu22.04

# Use a Docker Hardened base + only the CUDA libs you need
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Copy ONLY the NVIDIA libraries required at runtime
COPY --from=nvidia/cuda:12.4-base-ubuntu22.04 \
    /usr/local/cuda/lib64/libcudart.so* \
    /usr/local/cuda/lib64/libnvml.so* \
    /usr/local/cuda/lib64/libcublas.so* \
    /usr/local/lib/

RUN ldconfig
```
What this gives you:
- Patched OS layer — Docker maintains the base, not NVIDIA. Patches ship within hours, not weeks.
- Smaller image — only the CUDA libraries your application actually links against, not the full 3GB toolkit (a quick way to find those is shown after this list).
- Scout-optimized — Docker Scout has first-party provenance data for Hardened Images, so scanning is faster and more accurate.
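How do you know which `.so` files to copy? One rough approach, assuming you already have a working image built on the full NVIDIA base, is to run `ldd` against your binary inside it. The binary name below is illustrative, and libraries loaded at runtime via `dlopen` (go-nvml loads `libnvml` that way, for example) won't show up here and have to be added by hand:

```bash
# List the CUDA/NVIDIA libraries a binary resolves at load time.
# Run inside a container built on the full NVIDIA base; "gpu-scaler" is a placeholder.
docker run --rm -v "$PWD/gpu-scaler:/gpu-scaler:ro" \
  nvidia/cuda:12.4-base-ubuntu22.04 \
  sh -c 'ldd /gpu-scaler | grep -Ei "cuda|cudnn|nccl|nvml|nvidia"'
```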
## Step 2: Multi-Stage Builds with Minimal Runtime
The goal is to ship the smallest possible runtime image. Everything that's only needed at build time — compilers, Go toolchain, npm, development headers — stays in the build stage.
Here's the pattern I use for keda-gpu-scaler:
```dockerfile
# === Build Stage ===
FROM golang:1.22-bookworm AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=1 go build -ldflags="-s -w" -o gpu-scaler ./cmd/scaler

# === Runtime Stage ===
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Only the NVML library — nothing else from CUDA
COPY --from=nvidia/cuda:12.4-base-ubuntu22.04 \
    /usr/local/cuda/lib64/libnvml.so* /usr/local/lib/
RUN ldconfig

# Non-root user
USER 65534:65534

# Copy in the single binary; the root filesystem stays read-only at runtime
COPY --from=builder /app/gpu-scaler /usr/local/bin/
ENTRYPOINT ["gpu-scaler"]
```
Result: the runtime image is ~80MB instead of 3.5GB, cutting image size (and with it the attack surface) by roughly 97%.
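If you want to sanity-check those numbers on your own build, `docker image inspect` reports both the size and the configured user (the tag below is illustrative):

```bash
# Print the image size in bytes and the user it runs as (should be 65534:65534)
docker image inspect --format '{{.Size}} bytes, user={{.Config.User}}' \
  pmady/keda-gpu-scaler:latest
```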
For Python-based inference images (vLLM, Triton), the same principle applies but with pip:
```dockerfile
# Build stage: install Python deps
FROM python:3.11-bookworm AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: minimal base + only installed packages
FROM docker.io/docker/hardened-runtime:ubuntu-22.04

# Copy CUDA runtime libs and make sure the dynamic linker can find them
COPY --from=nvidia/cuda:12.4-runtime-ubuntu22.04 \
    /usr/local/cuda/lib64/ /usr/local/cuda/lib64/
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64

# Copy Python and installed packages
COPY --from=python:3.11-slim /usr/local/ /usr/local/
COPY --from=builder /install /usr/local

# Non-root
USER 65534:65534

COPY ./app /app
ENTRYPOINT ["python", "-m", "app.serve"]
```
## Step 3: Docker Scout in CI — Block on CVEs
Docker Scout integrates into your CI pipeline to catch vulnerabilities at build time, not after deployment:
```yaml
# .github/workflows/docker-security.yml
name: Docker Security Pipeline

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: my-gpu-image:${{ github.sha }}

      # Scan for CVEs — fail on critical/high
      - name: Docker Scout CVE scan
        uses: docker/scout-action@v1
        with:
          command: cves
          image: my-gpu-image:${{ github.sha }}
          sarif-file: scout-results.sarif
          only-severities: critical,high
          exit-code: true

      # Check against Docker Scout policies
      - name: Docker Scout policy evaluation
        uses: docker/scout-action@v1
        with:
          command: policy
          image: my-gpu-image:${{ github.sha }}

      # Upload SARIF to GitHub Security tab
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: scout-results.sarif

      # Only push if scan passes
      - name: Push image
        if: success()
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            pmady/keda-gpu-scaler:${{ github.sha }}
            pmady/keda-gpu-scaler:latest
```
Key configuration:
- `exit-code: true` — the build fails if Scout finds critical or high CVEs. No exceptions.
- `sarif-file` — results show up in GitHub's Security tab for tracking over time.
- Policy evaluation — Docker Scout policies can enforce rules like "no images older than 30 days" or "must use Docker Official Images as base."
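The same gate is easy to run locally before you ever open a pull request. This is a sketch with a placeholder tag; the policy check additionally assumes your image and organization are enrolled in Docker Scout:

```bash
# Fail (non-zero exit) on critical/high CVEs, mirroring the CI step
docker scout cves --only-severity critical,high --exit-code my-gpu-image:dev

# Evaluate Scout policies locally (requires a Scout-enabled org)
docker scout policy my-gpu-image:dev
```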
## Common GPU Image Findings
| Finding | Cause | Fix |
|---|---|---|
| Critical OpenSSL CVE | NVIDIA base image lags Ubuntu patches | Use Hardened Image as base |
| High-severity Python CVE | Transitive deps in PyTorch/vLLM | Pin versions, run `pip-audit` in CI |
| Medium glibc CVE | Base image outdated | Rebuild weekly with `--no-cache` |
| Outdated CUDA libraries | NVIDIA release cycle | Cherry-pick only needed .so files from latest CUDA image |
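For the Python CVE row, the audit can run before the image is even built. Here's a minimal sketch using pip-audit (a PyPA tool, separate from Docker Scout) against the same requirements.txt the Dockerfile installs:

```bash
# Audit pinned requirements for known vulnerabilities before building the image
pip install pip-audit
pip-audit -r requirements.txt
```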
## Step 4: Container Signing with Docker Content Trust
Sign your images so that Kubernetes admission controllers can verify provenance:
```bash
# Enable Docker Content Trust
export DOCKER_CONTENT_TRUST=1

# Push signs automatically
docker push pmady/keda-gpu-scaler:v1.0.0

# Verify a signed image
docker trust inspect pmady/keda-gpu-scaler:v1.0.0
```
For CI, use cosign (Sigstore) for keyless signing:
```yaml
# In your GitHub Actions workflow
# (keyless signing needs the job's OIDC token: permissions: id-token: write)
- name: Install cosign
  uses: sigstore/cosign-installer@v3

- name: Sign
  run: |
    cosign sign --yes \
      pmady/keda-gpu-scaler:${{ github.sha }}
  env:
    COSIGN_EXPERIMENTAL: 1  # only needed on cosign 1.x; harmless on 2.x
```
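You can verify the signature out-of-band with `cosign verify`. The identity and issuer must match however the image was actually signed: if the workflow above signs with the ambient GitHub Actions OIDC token, the issuer is https://token.actions.githubusercontent.com and the identity is the workflow ref, whereas the values below mirror the Google-issued identity used in the Kyverno policy that follows:

```bash
# Keyless verification: identity and issuer must match the signing certificate
cosign verify \
  --certificate-identity "pavan4devops@gmail.com" \
  --certificate-oidc-issuer "https://accounts.google.com" \
  pmady/keda-gpu-scaler:v1.0.0
```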
Then enforce in Kubernetes with Kyverno:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences: ["pmady/*"]
          attestors:
            - entries:
                - keyless:
                    subject: "pavan4devops@gmail.com"
                    issuer: "https://accounts.google.com"
```
Now unsigned or tampered images are rejected at admission. The chain is: Build → Scout scan → Sign → Push → Kubernetes admission verifies → Runtime policy enforces.
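To confirm the policy actually bites, apply it and try to run an image that was never signed. The filename and tag below are placeholders; Kyverno should deny the Pod at admission with a signature-verification error:

```bash
# Apply the ClusterPolicy above (saved locally as require-signed-images.yaml)
kubectl apply -f require-signed-images.yaml

# Attempt to run an unsigned tag; admission should reject it
kubectl run unsigned-test --image=pmady/keda-gpu-scaler:unsigned-dev --restart=Never
```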
## Step 5: Runtime Security for GPU Containers
GPU containers need device access (/dev/nvidia*) but they don't need anything else privileged. Lock them down:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    fsGroup: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: inference
      image: pmady/keda-gpu-scaler:v1.0.0
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          nvidia.com/gpu: 1
          memory: 8Gi
        requests:
          nvidia.com/gpu: 1
          memory: 4Gi
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: models
          mountPath: /models
          readOnly: true
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: models
      persistentVolumeClaim:
        claimName: model-weights
        readOnly: true
```
What this enforces:
- Non-root — NVML reads from sysfs, doesn't need root
- Read-only root filesystem — no writes except `/tmp` (ephemeral) and `/models` (read-only PVC)
- No privilege escalation — prevents container escape
- Drop all capabilities — the NVIDIA device plugin handles GPU device injection, the container doesn't need any Linux capabilities
- Seccomp — default syscall filter
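A quick way to spot-check these controls on the running Pod is sketched below; it assumes the image still ships a shell and coreutils, which a heavily stripped or distroless-style image may not:

```bash
# Should report uid=65534 (nobody), not root
kubectl exec gpu-inference -c inference -- id

# Should fail with a read-only filesystem error
kubectl exec gpu-inference -c inference -- touch /should-fail

# /tmp is backed by the emptyDir, so this should succeed
kubectl exec gpu-inference -c inference -- touch /tmp/ok
```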
## The Full Pipeline
```text
Source Code
     │
     ▼
docker build (multi-stage, Hardened Image base)
     │
     ▼
Docker Scout CVE scan ── FAIL? → Block merge
     │
     ▼
Docker Scout policy check ── FAIL? → Block merge
     │
     ▼
cosign sign (keyless, Sigstore)
     │
     ▼
docker push (to registry)
     │
     ▼
Kubernetes admission (Kyverno verifies signature + base image)
     │
     ▼
Runtime (non-root, read-only fs, drop caps, seccomp, resource limits)
```
Every stage has a gate. No image reaches production without being scanned, signed, and policy-checked. This is the same pipeline whether you're deploying a GPU inference service, a training job orchestrated by Volcano, or my keda-gpu-scaler DaemonSet.
## This Isn't Optional for GPU Workloads
GPU containers are high-value targets. They run on expensive hardware ($3-30/hour per instance), they often have network access to model registries (HuggingFace, S3), and they process potentially sensitive input data. A compromised inference container is a direct path to data exfiltration and compute theft (cryptomining on your A100s).
The zero-trust pipeline adds ~2 minutes to your CI build. The alternative is finding out about a critical CVE from your security team after it's been running in production for three weeks.
Docker Scout + Hardened Images + container signing. Use all three.
Pavan Madduri is a Senior Cloud Platform Engineer at W.W. Grainger, Inc., CNCF Golden Kubestronaut, and Oracle ACE Associate. He maintains keda-gpu-scaler and otel-gpu-receiver, and contributed GPU NUMA topology scheduling to Volcano.