GPU passthrough on Proxmox is one of those things that looks straightforward in the documentation and then systematically humbles you in practice. The official wiki covers the basic IOMMU setup, the hostpci line, and a note about VGA arbitration. What it doesn't cover is the three separate ways I managed to brick or confuse a GPU before I got Ollama running stably on a Kubernetes worker.
This post is the guide I wish I'd had. Not a rehash of the Proxmox wiki, but a rundown of the specific failure modes I hit — most of which are findable on forums in fragments, but rarely assembled in one place.
The Setup
A single server with a PCIe add-in card GPU — specifically, a data-center-class card with no display output (no VGA arbitration issues, at least). The goal was to pass it through to a VM running as a Kubernetes worker node, then use it for local LLM inference. Sounds simple. Was not.
Gotcha 1: Machine Type Must Be q35
If your VM was created with the default i440fx machine type and you try to add a PCIe device, Proxmox will happily let you set hostpci0. The VM will then either fail to boot or boot with the GPU invisible, with no clear error.
The fix is machine: q35. PCIe passthrough requires a q35 machine: the i440fx chipset emulation only supports legacy PCI (not PCIe) and doesn't expose the topology that GPU drivers expect.
Change it in the Proxmox UI under Hardware → Machine, or in the config:
```shell
qm set <vmid> --machine q35
```
If you're changing machine types on an existing VM, verify your VirtIO devices are still recognized post-change. Boot once with just the disk attached before adding the GPU.
Gotcha 2: PCI Address Instability Across Reboots
This one took me an embarrassingly long time to figure out because the symptom looks like corruption or a driver issue.
When a server has a PCIe switch (common in machines that fit a lot of slots), the PCI bus numbers assigned to downstream devices can change between reboots. The GPU might be 01:00.0 on one boot and 08:00.0 on the next. The hostpci0 line in the VM config points at a static address. After a reboot, you're pointing at empty space.
Symptoms: VM starts, guest OS sees no GPU. lspci in the guest shows nothing in the expected vendor range. No helpful error anywhere.
Fix: after every host reboot, SSH to the Proxmox node and check:
```shell
lspci | grep -i nvidia  # or amd, or whatever your GPU vendor is
```
Then update the VM config:
```shell
qm set <vmid> --hostpci0 <new-address>,pcie=1
```
There's no automatic solution to this short of pinning the hardware. If you're relying on this VM starting cleanly after a power event, that's a real operational problem. I haven't found a reliable software fix; the right answer is hardware: either a server that doesn't route GPU slots through a PCIe switch, or one where device enumeration is deterministic across boots.
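The post-reboot check can at least be scripted. Here's a sketch of a helper that pulls the current GPU address out of lspci output; the lspci line is a hardcoded sample so the sketch runs anywhere, and on the Proxmox node you'd pipe real `lspci` output in instead.

```shell
# Sketch: extract the GPU's current PCI address from lspci output,
# for rewriting hostpci0 after a host reboot.
find_gpu_addr() {
  # First field of the first matching line is the bus address.
  grep -i 'nvidia' | awk '{ print $1; exit }'
}

# Hardcoded sample line (illustrative hardware, not the author's).
sample='01:00.0 3D controller: NVIDIA Corporation Device [sample]'
addr="$(printf '%s\n' "$sample" | find_gpu_addr)"
echo "$addr"
# On the host, the follow-up would be (not run here):
#   qm set <vmid> --hostpci0 "$addr",pcie=1
```

On a real node you'd replace the sample with `lspci | find_gpu_addr` and feed the result to qm set, ideally from a script you run as part of your post-reboot checklist.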
Gotcha 3: D3cold Will Brick Your GPU (Until You Reboot the Host)
This is the one that stings because the root cause is non-obvious and the recovery requires a full host reboot.
Modern PCIe devices support D3cold — a low-power state where the device is essentially off. On desktop systems this is managed by the OS and BIOS cooperatively. When you're doing PCI passthrough, the host kernel can still manage D3cold for devices before they're claimed by the VM.
If a GPU enters D3cold and you then try to detach and re-attach it (e.g., echo 1 > /sys/bus/pci/devices/<addr>/remove followed by a rescan), it won't come back. The rescan finds an empty slot because the device is powered off and can't respond to config space reads.
Before passing through a GPU, check its D3cold state and disable it:
```shell
cat /sys/bus/pci/devices/<addr>/d3cold_allowed
# If 1, disable it:
echo 0 > /sys/bus/pci/devices/<addr>/d3cold_allowed
```
Do this before any qm start, before any PCI remove/rescan cycle, before anything. Make it part of your passthrough setup checklist.
If you've already hit this and the GPU is gone, there's no recovery path short of rebooting the host. rescan, probe, driver reloads — none of it matters when the device is in D3cold and unresponsive on the bus.
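To make the checklist step repeatable, the disable can be looped over every device that exposes the attribute. This sketch builds a scratch sysfs-like tree so it's runnable anywhere; on a real host you'd set SYS_ROOT=/sys/bus/pci/devices (as root) and drop the fake-device setup.

```shell
# Sketch: set d3cold_allowed=0 for every device under a sysfs-like tree.
# The scratch tree and fake GPU below exist only to make the demo
# self-contained; on a host, point SYS_ROOT at /sys/bus/pci/devices.
SYS_ROOT="${SYS_ROOT:-$(mktemp -d)}"
mkdir -p "$SYS_ROOT/0000:01:00.0"
echo 1 > "$SYS_ROOT/0000:01:00.0/d3cold_allowed"  # fake GPU for the demo

for dev in "$SYS_ROOT"/*; do
  f="$dev/d3cold_allowed"
  [ -f "$f" ] || continue          # not every device exposes this
  if [ "$(cat "$f")" = "1" ]; then
    echo 0 > "$f"
    echo "disabled d3cold on ${dev##*/}"
  fi
done
```

Running this from a boot-time unit, before any qm start, keeps the checklist step from being forgotten.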
Gotcha 4: The NVIDIA Container Toolkit Isn't Enough — You Need the Default Runtime
Getting NVIDIA's container stack working on a Kubernetes node is well-documented. You install the toolkit, configure containerd with nvidia-ctk runtime configure --runtime=containerd, and the nvidia runtime becomes available. GPU workloads with runtimeClassName: nvidia start working.
What nobody mentions clearly: the NVIDIA device plugin daemonset doesn't set runtimeClassName: nvidia. It runs with whatever the default containerd runtime is. If your default runtime is still runc, the device plugin can't find libnvidia-ml.so.1 at startup, silently fails to initialize, and never registers the nvidia.com/gpu resource on the node.
Your GPU pods will sit in Pending forever with:
```
0/N nodes are available: N Insufficient nvidia.com/gpu.
```
The device plugin pod itself will be Running (from Kubernetes' perspective), but internally it has failed to detect the GPU. Logs will show something like "failed to initialize NVML" or a library not found error.
Fix: set nvidia as the default runtime on GPU nodes:
```shell
nvidia-ctk runtime configure --runtime=containerd --set-as-default
systemctl restart containerd
```
Then delete the device plugin pod and let it restart. Within a few seconds, kubectl describe node <gpu-node> should show the GPU resource.
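You can sanity-check the result in /etc/containerd/config.toml. With the containerd 1.x CRI config schema it should end up containing something along these lines (exact section names vary by containerd version, so treat this as a reference shape rather than a paste-in):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
```

The key line is default_runtime_name = "nvidia": without it, the nvidia runtime exists but only pods that explicitly ask for it get it.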
This is a "works on my machine" failure mode because most tutorials test with explicit runtimeClassName: nvidia on workloads, and that works fine. The device plugin itself is the edge case.
Gotcha 5: Wildcard DNS + High ndots Breaks External Registry Pulls
This one is specific to Kubernetes clusters with a wildcard internal domain, but it's worth calling out because it's painful to debug.
If your cluster uses a wildcard DNS entry (e.g., *.yourdomain.com pointing at your ingress) and your Kubernetes pods have the default ndots: 5 resolver configuration, there's a class of DNS failures that only affects pods trying to pull from external container registries.
The issue: ndots: 5 means pods try to resolve unqualified hostnames by appending cluster search domains before going to the root. A hostname like registry.someregistry.io has only 2 dots, so Kubernetes will first try registry.someregistry.io.cluster.local, then registry.someregistry.io.yourdomain.com, and so on.
With a wildcard *.yourdomain.com, that second attempt resolves successfully — to your ingress IP. Your ingress then terminates TLS with a certificate for *.yourdomain.com, which doesn't match registry.someregistry.io. You get a TLS verification error or a certificate mismatch, and your pod fails to pull the image.
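The lookup ordering is easy to reproduce outside the cluster. This sketch mimics the resolver's search-list expansion (simplified: below the ndots threshold, each search domain is tried before the bare name; the real resolver can also fall back to the search list afterward). The search domains here are illustrative.

```shell
# Sketch of resolv.conf search-list expansion driven by ndots.
lookup_order() {
  name="$1"; ndots="$2"; shift 2
  # Count the dots in the name.
  dots="$(printf '%s' "$name" | tr -cd '.' | wc -c)"
  if [ "$dots" -lt "$ndots" ]; then
    # Below the threshold: search domains are tried first.
    for dom in "$@"; do
      printf '%s.%s\n' "$name" "$dom"
    done
  fi
  printf '%s\n' "$name"
}

# Default pod config (ndots: 5): the wildcard domain is queried
# before the real registry name ever is.
lookup_order registry.someregistry.io 5 cluster.local yourdomain.com

# With ndots: 2, the two-dot name is tried as-is immediately.
lookup_order registry.someregistry.io 2 cluster.local yourdomain.com
```

The first call shows registry.someregistry.io.yourdomain.com being attempted, which is exactly the query your wildcard record answers.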
The fix for GPU workloads (or anything pulling from external registries):
```yaml
spec:
  template:
    spec:
      dnsConfig:
        options:
        - name: ndots
          value: "2"
```
Setting ndots: 2 means any name with two or more dots is tried as an absolute name first, skipping the search-domain expansion. This fixes the pull failures without affecting internal service discovery: cluster-internal names like myservice or myservice.mynamespace still fall below the threshold and still go through the search list.
Gotcha 6: Use Recreate, Not RollingUpdate, for GPU Deployments
A smaller one, but it'll stop you cold during updates. If you only have one GPU and your deployment is using the default RollingUpdate strategy, Kubernetes will try to start the new pod before terminating the old one. The new pod can't schedule because there's no GPU available. The rollout hangs indefinitely.
Set your deployment strategy to Recreate for anything running on a single-GPU node:
```yaml
spec:
  strategy:
    type: Recreate
```
Yes, this means downtime during updates. With a single GPU, that's the tradeoff. If you need zero-downtime GPU deployments, you need more GPUs.
Lessons Learned
If I were doing this again, I'd approach it in a different order than I did:
Start with the GPU, not the workload. Verify passthrough is stable across reboots before building anything on top of it. Check the PCI address after several reboots. Make sure D3cold is disabled. Don't deploy Kubernetes, Ollama, and the container toolkit all at once and then try to figure out which layer broke.
Validate the device plugin first, not your workload. Run a simple test pod with runtimeClassName: nvidia and nvidia-smi as the command. If that works, the device plugin is healthy. If your actual workload fails, the problem is in the workload, not the GPU stack.
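A minimal smoke-test pod along those lines might look like this (image tag and pod name are illustrative; pick a CUDA base image matching your driver):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
  - name: smoke
    image: nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
```

If kubectl logs on this pod shows the nvidia-smi table, the whole stack under your workload (passthrough, driver, toolkit, device plugin) is healthy.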
PCIe switches are a hidden variable. Any machine that fits a lot of PCIe cards likely has a switch, and bus renumbering is a real risk. Before purchasing hardware for passthrough, look at the motherboard topology. Servers with direct CPU-to-slot connections (typically enterprise gear) are more predictable than consumer boards with PLX switches.
D3cold is not commonly documented in passthrough guides. Most guides are written for desktop GPUs in desktop OSes where D3cold behavior is different. Data center cards and server environments behave differently. If a GPU disappears after a detach and won't come back on rescan, assume D3cold before anything else.
The GPU is running well now. Inference is fast. The K8s node registers the resource correctly, pods schedule without drama, and image pulls work. Getting here required stepping through every one of these failure modes, often multiple times. Hopefully you can skip a few of them.