Panos S

You Don't Always Need Grafana for GPU Monitoring

My ML group has a few GPU servers. I wanted to check utilization without SSHing into each machine. The standard answer is Grafana + Prometheus + exporters, but that felt like overkill for checking if GPUs are busy.

I built GPU Hot as a simpler alternative. This post is about why that made sense.

The Grafana Problem

Grafana is excellent for production monitoring at scale. But for a small team with a few GPU boxes, you're looking at:

  • Installing Prometheus
  • Installing node exporters on each server
  • Installing GPU exporters
  • Writing Prometheus configs
  • Setting up Grafana dashboards
  • Maintaining all of this

For my use case (checking GPU utilization while walking to get coffee), that was too much infrastructure.

What I Actually Needed

A web page that shows: which GPUs are in use, temperature, memory usage, and what processes are running. It updates in real time so I can see when a training job finishes.

That's it. No alerting, no long-term storage, no complex queries.

The Setup

One Docker command per server:

docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest

Open http://localhost:1312 and you see your GPUs updating every 0.5 seconds.

For multiple servers, run the container on each GPU box, then start a hub:

# On each GPU server
docker run -d --gpus all -p 1312:1312 \
  -e NODE_NAME=$(hostname) \
  ghcr.io/psalias2006/gpu-hot:latest

# On your laptop (no GPU needed)
docker run -d -p 1312:1312 \
  -e GPU_HOT_MODE=hub \
  -e NODE_URLS=http://server1:1312,http://server2:1312 \
  ghcr.io/psalias2006/gpu-hot:latest

Open http://localhost:1312 and you see all GPUs from all servers in one dashboard. Total setup time: under 5 minutes.

How It Works

The core is straightforward:

NVML for metrics: Python's NVML bindings give direct access to GPU data. They're faster than parsing nvidia-smi output and return structured data.
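
For the curious, the general approach looks something like this with the nvidia-ml-py bindings (imported as pynvml). This is a minimal sketch of the idea, not GPU Hot's actual code:

# Minimal sketch of reading GPU metrics via NVML (nvidia-ml-py, imported as pynvml).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in %
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # .used / .total in bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"GPU {i}: {util.gpu}% util, "
              f"{mem.used / 2**20:.0f}/{mem.total / 2**20:.0f} MiB, "
              f"{temp}°C, {len(procs)} processes")
finally:
    pynvml.nvmlShutdown()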

FastAPI + WebSockets: Async WebSockets push metrics to the browser. No polling, sub-second updates. The server collects metrics and broadcasts them to all connected clients.
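
A minimal sketch of that pattern with FastAPI, with a hypothetical collect_metrics() helper standing in for the NVML sampling above. This is illustrative, not the project's real server code:

# Sketch of the push pattern: one background task samples metrics and
# broadcasts them to every connected WebSocket client.
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

def collect_metrics() -> dict:
    # Hypothetical stand-in for the NVML sampling above; returns JSON-able per-GPU stats.
    return {"gpu0": {"util": 0, "mem_used_mib": 0, "temp": 0}}

async def broadcaster():
    # Sample once per tick, push to all clients, regardless of how many are connected.
    while True:
        payload = collect_metrics()
        for ws in list(clients):
            try:
                await ws.send_json(payload)
            except Exception:
                clients.discard(ws)
        await asyncio.sleep(0.5)  # matches the 0.5 s dashboard refresh

@app.on_event("startup")
async def start_broadcaster():
    asyncio.create_task(broadcaster())

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # just keep the connection alive
    except WebSocketDisconnect:
        clients.discard(ws)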

Hub mode: Each node runs the same container and exposes metrics via WebSocket. The hub connects to all nodes, aggregates their data, and serves it through a single dashboard.
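
The hub side of that can be sketched with the websockets client library. The /ws path and message format here are my assumptions, not GPU Hot's documented protocol:

# Sketch of the hub idea: connect to each node's WebSocket feed, keep the
# latest snapshot per node, and serve the merged view from one place.
import asyncio
import json
import os

import websockets

NODE_URLS = os.environ.get("NODE_URLS", "").split(",")
latest: dict[str, dict] = {}  # node URL -> most recent metrics snapshot

async def watch_node(url: str):
    ws_url = url.replace("http://", "ws://") + "/ws"  # assumed endpoint path
    while True:
        try:
            async with websockets.connect(ws_url) as ws:
                async for message in ws:
                    latest[url] = json.loads(message)
        except Exception:
            await asyncio.sleep(2)  # node unreachable; retry shortly

async def main():
    await asyncio.gather(*(watch_node(u) for u in NODE_URLS if u))

if __name__ == "__main__":
    asyncio.run(main())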

Frontend: Vanilla JavaScript with Chart.js. No build step, no framework, just HTML/CSS/JS.

Docker: Packages everything. Users don't need to install Python, NVML bindings, or manage dependencies. The NVIDIA Container Toolkit handles GPU access.

When This Approach Works

This pattern works well when:

  • You have a small number of machines (1-20)
  • You need real-time visibility, not historical analysis
  • Your team is small enough that everyone can check one dashboard
  • You don't need alerting or complex queries

It doesn't replace proper monitoring for production services. But for development infrastructure in a small team, it's sufficient and much simpler to maintain.

Trade-offs

What you lose compared to Grafana:

  • No persistent storage (metrics are only kept in memory for the current session)
  • No alerting
  • No complex queries or correlations
  • No authentication (we run this on an internal network)

What you gain:

  • Zero configuration
  • Sub-second updates
  • No maintenance
  • One command deployment

For this use case, the trade-off made sense. This isn't for monitoring production services. It's for checking if GPUs are free before starting a training run.

Takeaway

Not every monitoring problem needs the full observability stack. For small teams with straightforward needs, a purpose-built tool can be simpler to deploy and maintain than configuring enterprise solutions.

Try the interactive demo to see it in action.
