Panos S

You Don't Always Need Grafana for GPU Monitoring

My ML group has a few GPU servers. I wanted to check utilization without SSHing into each machine. The standard answer is Grafana + Prometheus + exporters, but that felt like overkill for checking if GPUs are busy.

I built GPU Hot as a simpler alternative. This post is about why that made sense.

The Grafana Problem

Grafana is excellent for production monitoring at scale. But for a small team with a few GPU boxes, you're looking at:

  • Installing Prometheus
  • Installing node exporters on each server
  • Installing GPU exporters
  • Writing Prometheus configs
  • Setting up Grafana dashboards
  • Maintaining all of this

For my use case (checking GPU utilization while walking to get coffee), that was too much infrastructure.

What I Actually Needed

A web page that shows: which GPUs are in use, temperature, memory usage, and what processes are running. It updates in real time so I can see when a training job finishes.

That's it. No alerting, no long-term storage, no complex queries.

The Setup

One Docker command per server:

docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest

Open http://localhost:1312 and you see your GPUs updating every 0.5 seconds.

For multiple servers, run the container on each GPU box, then start a hub:

# On each GPU server
docker run -d --gpus all -p 1312:1312 \
  -e NODE_NAME=$(hostname) \
  ghcr.io/psalias2006/gpu-hot:latest

# On your laptop (no GPU needed)
docker run -d -p 1312:1312 \
  -e GPU_HOT_MODE=hub \
  -e NODE_URLS=http://server1:1312,http://server2:1312 \
  ghcr.io/psalias2006/gpu-hot:latest

Open http://localhost:1312 and you see all GPUs from all servers in one dashboard. Total setup time: under 5 minutes.

How It Works

The core is straightforward:

NVML for metrics: Python's NVML bindings give direct access to GPU data. They're faster than parsing nvidia-smi output and return structured data.
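
For the curious, the general approach looks something like this with the nvidia-ml-py bindings (imported as pynvml). This is a minimal sketch of the idea, not GPU Hot's actual code:

# Minimal sketch of reading GPU metrics via NVML (nvidia-ml-py, imported as pynvml).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in %
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # .used / .total in bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"GPU {i}: {util.gpu}% util, "
              f"{mem.used / 2**20:.0f}/{mem.total / 2**20:.0f} MiB, "
              f"{temp}°C, {len(procs)} processes")
finally:
    pynvml.nvmlShutdown()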

FastAPI + WebSockets: Async WebSockets push metrics to the browser. No polling, sub-second updates. The server collects metrics and broadcasts them to all connected clients.
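
A minimal sketch of that pattern with FastAPI, with a hypothetical collect_metrics() helper standing in for the NVML sampling above. This is illustrative, not the project's real server code:

# Sketch of the push pattern: one background task samples metrics and
# broadcasts them to every connected WebSocket client.
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

def collect_metrics() -> dict:
    # Hypothetical stand-in for the NVML sampling above; returns JSON-able per-GPU stats.
    return {"gpu0": {"util": 0, "mem_used_mib": 0, "temp": 0}}

async def broadcaster():
    # Sample once per tick, push to all clients, regardless of how many are connected.
    while True:
        payload = collect_metrics()
        for ws in list(clients):
            try:
                await ws.send_json(payload)
            except Exception:
                clients.discard(ws)
        await asyncio.sleep(0.5)  # matches the 0.5 s dashboard refresh

@app.on_event("startup")
async def start_broadcaster():
    asyncio.create_task(broadcaster())

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # just keep the connection alive
    except WebSocketDisconnect:
        clients.discard(ws)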

Hub mode: Each node runs the same container and exposes metrics via WebSocket. The hub connects to all nodes, aggregates their data, and serves it through a single dashboard.
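
The hub side of that can be sketched with the websockets client library. The /ws path and message format here are my assumptions, not GPU Hot's documented protocol:

# Sketch of the hub idea: connect to each node's WebSocket feed, keep the
# latest snapshot per node, and serve the merged view from one place.
import asyncio
import json
import os

import websockets

NODE_URLS = os.environ.get("NODE_URLS", "").split(",")
latest: dict[str, dict] = {}  # node URL -> most recent metrics snapshot

async def watch_node(url: str):
    ws_url = url.replace("http://", "ws://") + "/ws"  # assumed endpoint path
    while True:
        try:
            async with websockets.connect(ws_url) as ws:
                async for message in ws:
                    latest[url] = json.loads(message)
        except Exception:
            await asyncio.sleep(2)  # node unreachable; retry shortly

async def main():
    await asyncio.gather(*(watch_node(u) for u in NODE_URLS if u))

if __name__ == "__main__":
    asyncio.run(main())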

Frontend: Vanilla JavaScript with Chart.js. No build step, no framework, just HTML/CSS/JS.

Docker: Packages everything. Users don't need to install Python, NVML bindings, or manage dependencies. The NVIDIA Container Toolkit handles GPU access.

When This Approach Works

This pattern works well when:

  • You have a small number of machines (1-20)
  • You need real-time visibility, not historical analysis
  • Your team is small enough that everyone can check one dashboard
  • You don't need alerting or complex queries

It doesn't replace proper monitoring for production services. But for development infrastructure in a small team, it's sufficient and much simpler to maintain.

Trade-offs

What you lose compared to Grafana:

  • No persistent storage (metrics are only kept in memory for the current session)
  • No alerting
  • No complex queries or correlations
  • No authentication (we run this on an internal network)

What you gain:

  • Zero configuration
  • Sub-second updates
  • No maintenance
  • One command deployment

For this use case, the trade-off made sense. This isn't for monitoring production services. It's for checking if GPUs are free before starting a training run.

Takeaway

Not every monitoring problem needs the full observability stack. For small teams with straightforward needs, a purpose-built tool can be simpler to deploy and maintain than configuring enterprise solutions.

Try the interactive demo to see it in action.
