Quick story.
I run a small homelab — one box, an NVIDIA card, around ten Docker containers, and a couple of local model servers (Ollama mostly, vLLM when I'm playing around).
Every "why is this model OOM-ing?" question turned into the same five minutes of archaeology:
nvidia-smi → pick a PID
ps -o cgroup -p <PID> → find the container ID
docker ps → map ID to name
Just to answer: which container, which model, is eating my VRAM right now?
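For illustration, here is a rough Python sketch of how that three-step chain could be automated. It is not taken from the repo. It assumes it runs on the host (so the PIDs reported by nvidia-smi match /proc), that Docker puts the full 64-character container ID in the cgroup path, and that reading /proc/<PID>/cgroup is an acceptable stand-in for the ps call. All function names are made up for this example.

```python
import re
import subprocess

# Docker container IDs appear as 64 hex characters in the cgroup path
# (assumption: holds for both cgroup v1 and the systemd cgroup v2 layout).
HEX64 = re.compile(r"[0-9a-f]{64}")


def parse_gpu_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into {pid: MiB or None}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        pid, mem = (field.strip() for field in line.split(",")[:2])
        # Some driver setups report "[N/A]" instead of a number.
        usage[int(pid)] = int(mem) if mem.isdigit() else None
    return usage


def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex container ID in a cgroup file, or None."""
    match = HEX64.search(cgroup_text)
    return match.group(0) if match else None


def parse_docker_ps(ps_text):
    """Parse '<full id> <name>' lines into {container_id: name}."""
    names = {}
    for line in ps_text.strip().splitlines():
        cid, name = line.split(maxsplit=1)
        names[cid] = name
    return names


def vram_by_container(gpu_apps, cgroups, names):
    """Sum VRAM (MiB) per container name; non-container PIDs count as 'host'."""
    totals = {}
    for pid, mib in gpu_apps.items():
        cid = container_id_from_cgroup(cgroups.get(pid, ""))
        owner = names.get(cid, cid[:12]) if cid else "host"
        totals[owner] = totals.get(owner, 0) + (mib or 0)
    return totals


def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def collect():
    """Run the three lookups for real (needs nvidia-smi and docker on PATH)."""
    apps = parse_gpu_apps(run([
        "nvidia-smi",
        "--query-compute-apps=pid,used_memory",
        "--format=csv,noheader,nounits",
    ]))
    cgroups = {}
    for pid in apps:
        try:
            with open(f"/proc/{pid}/cgroup") as fh:
                cgroups[pid] = fh.read()
        except OSError:
            cgroups[pid] = ""  # process exited between the two calls
    names = parse_docker_ps(run(
        ["docker", "ps", "--no-trunc", "--format", "{{.ID}} {{.Names}}"]
    ))
    return vram_by_container(apps, cgroups, names)
```

Calling collect() on the host returns a dict of container name to MiB. The parsing is split from the subprocess calls so it can be checked against captured output without a GPU.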
I tried Prometheus + Grafana + node-exporter + dcgm-exporter. It works, but for one box it's a stack-on-a-stack to answer a single question.
So I built a third option: one container, one page. The GPU panel automatically maps each process using VRAM back to its Docker container. The AI Models panel queries each model server's own API (Ollama /api/ps, vLLM /v1/models, llama.cpp, TGI, A1111, ComfyUI) and shows which model is loaded.
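To make the model-server polling concrete, here is a small sketch of the idea for two of those back-ends. It is an illustration under assumptions, not the project's code: the base URLs use the servers' usual default ports (11434 for Ollama, 8000 for vLLM), the function names are invented, and only the name and size_vram fields of Ollama's /api/ps response and the id field of the OpenAI-style /v1/models response are read.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch_json(url, timeout=2.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ollama_loaded(payload):
    """Extract (model name, VRAM bytes) pairs from an Ollama /api/ps response."""
    return [
        (m.get("name", "?"), m.get("size_vram", 0))
        for m in payload.get("models", [])
    ]


def vllm_served(payload):
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m.get("id", "?") for m in payload.get("data", [])]


def poll(ollama_url="http://localhost:11434", vllm_url="http://localhost:8000"):
    """Ask each server what it has loaded; an unreachable server yields None."""
    result = {}
    for key, url, parse in (
        ("ollama", f"{ollama_url}/api/ps", ollama_loaded),
        ("vllm", f"{vllm_url}/v1/models", vllm_served),
    ):
        try:
            result[key] = parse(fetch_json(url))
        except (URLError, OSError, ValueError):
            result[key] = None  # server down, timed out, or sent non-JSON
    return result
```

A short timeout matters here: a model server that is busy loading weights should not stall the whole page.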
docker compose up -d --build and that's the whole setup.
History in SQLite, downsampled on read. No agents, no cloud, no Prometheus.
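"Downsampled on read" can be sketched like this: store every raw sample, and only average into time buckets when a chart asks for a range. The table layout, column names and the max_points parameter below are assumptions made for the example, not the project's actual schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a samples table keyed by metric and Unix timestamp."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "ts INTEGER NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_samples ON samples (metric, ts)")
    return db


def record(db, ts, metric, value):
    """Store one raw sample at full resolution."""
    db.execute("INSERT INTO samples VALUES (?, ?, ?)", (ts, metric, value))


def read_downsampled(db, metric, start, end, max_points=300):
    """Return about max_points (bucket_ts, avg) rows for start <= ts < end."""
    # Ceiling division picks a bucket width that keeps the row count bounded.
    bucket = max(1, -(-(end - start) // max_points))
    return db.execute(
        "SELECT (ts / ?) * ? AS bucket_ts, AVG(value) "
        "FROM samples WHERE metric = ? AND ts >= ? AND ts < ? "
        "GROUP BY bucket_ts ORDER BY bucket_ts",
        (bucket, bucket, metric, start, end),
    ).fetchall()


db = open_db()
for t in range(10):
    record(db, t, "vram_mib", float(t))
rows = read_downsampled(db, "vram_mib", 0, 10, max_points=2)
```

Because ts and the bucket width are both integers, ts / bucket is integer division in SQLite, so each sample lands in the bucket that starts at the nearest lower multiple of the width. Writes stay cheap and the raw data is never thrown away; the cost is a GROUP BY on every read, which is a reasonable trade on a single box.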
The repo, with the longer technical write-up and screenshots:
👉 github.com/SikamikanikoBG/homelab-monitor
MIT licensed. NVIDIA-only on the GPU panel for now — AMD/Intel back-ends are a good first issue if anyone wants to extend.
Curious how others here solve the "who holds my VRAM" problem. Different tool? Different stack? Or did you also build something tiny because the big stacks felt like too much for one box?