<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Panos S</title>
    <description>The latest articles on DEV Community by Panos S (@panos_s_38e7dbec806aeb7db).</description>
    <link>https://dev.to/panos_s_38e7dbec806aeb7db</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3592733%2F7129c616-36f2-4136-a7be-e4bc456c8223.png</url>
      <title>DEV Community: Panos S</title>
      <link>https://dev.to/panos_s_38e7dbec806aeb7db</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/panos_s_38e7dbec806aeb7db"/>
    <language>en</language>
    <item>
      <title>You Don't Always Need Grafana for GPU Monitoring</title>
      <dc:creator>Panos S</dc:creator>
      <pubDate>Sat, 01 Nov 2025 23:36:55 +0000</pubDate>
      <link>https://dev.to/panos_s_38e7dbec806aeb7db/you-probably-dont-need-grafana-for-gpu-monitoring-jc3</link>
      <guid>https://dev.to/panos_s_38e7dbec806aeb7db/you-probably-dont-need-grafana-for-gpu-monitoring-jc3</guid>
      <description>&lt;p&gt;My ML group has a few GPU servers. I wanted to check utilization without SSHing into each machine. The standard answer is Grafana + Prometheus + exporters, but that felt like overkill for checking if GPUs are busy.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/psalias2006/gpu-hot" rel="noopener noreferrer"&gt;GPU Hot&lt;/a&gt; as a simpler alternative. This post is about why that made sense.&lt;/p&gt;

&lt;h2&gt;The Grafana Problem&lt;/h2&gt;

&lt;p&gt;Grafana is excellent for production monitoring at scale. But for a small team with a few GPU boxes, you're looking at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing Prometheus&lt;/li&gt;
&lt;li&gt;Installing node exporters on each server&lt;/li&gt;
&lt;li&gt;Installing GPU exporters&lt;/li&gt;
&lt;li&gt;Writing Prometheus configs&lt;/li&gt;
&lt;li&gt;Setting up Grafana dashboards&lt;/li&gt;
&lt;li&gt;Maintaining all of this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this use case (checking GPU utilization while walking to get coffee), that was too much infrastructure.&lt;/p&gt;

&lt;h2&gt;What I Actually Needed&lt;/h2&gt;

&lt;p&gt;A web page that shows which GPUs are in use, their temperature, memory usage, and what processes are running, updating in real time so I can see when a training job finishes.&lt;/p&gt;

&lt;p&gt;That's it. No alerting, no long-term storage, no complex queries.&lt;/p&gt;

&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;One Docker command per server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="nt"&gt;-p&lt;/span&gt; 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:1312&lt;/code&gt; and you see your GPUs updating every 0.5 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwhsqfh37ba5l2jfi54i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwhsqfh37ba5l2jfi54i.png" alt="GPU Hot dashboard for a single server" width="800" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For multiple servers, run the container on each GPU box, then start a hub:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On each GPU server&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="nt"&gt;-p&lt;/span&gt; 1312:1312 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;NODE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;hostname&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/psalias2006/gpu-hot:latest

&lt;span class="c"&gt;# On your laptop (no GPU needed)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 1312:1312 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GPU_HOT_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;hub &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;NODE_URLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://server1:1312,http://server2:1312 &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/psalias2006/gpu-hot:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:1312&lt;/code&gt; and you see all GPUs from all servers in one dashboard. Total setup time: under 5 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvt4av8dutuhiwxk8q6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvt4av8dutuhiwxk8q6o.png" alt="GPU Hot hub dashboard showing GPUs from multiple servers" width="800" height="691"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How It Works&lt;/h2&gt;

&lt;p&gt;The core is straightforward:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NVML for metrics&lt;/strong&gt;: Python's NVML bindings give direct access to GPU data. They're faster than parsing &lt;code&gt;nvidia-smi&lt;/code&gt; output and return structured data directly.&lt;/p&gt;
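&lt;p&gt;As a rough sketch of what such a collection loop looks like (using the &lt;code&gt;pynvml&lt;/code&gt; package; the dict fields here are illustrative, not GPU Hot's actual payload):&lt;/p&gt;

```python
# Sketch of an NVML collection loop via the pynvml package.
# The metric dict shape is illustrative, not GPU Hot's actual schema.
try:
    import pynvml
except ImportError:  # pynvml not installed
    pynvml = None

def read_gpu_metrics():
    """Return a list of per-GPU metric dicts; [] if NVML is unavailable."""
    if pynvml is None:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:  # no NVIDIA driver on this machine
        return []
    metrics = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu percent busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used / .total bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        metrics.append({
            "index": i,
            "util_pct": util.gpu,
            "mem_used_mb": mem.used // (1024 * 1024),
            "mem_total_mb": mem.total // (1024 * 1024),
            "temp_c": temp,
        })
    pynvml.nvmlShutdown()
    return metrics
```

&lt;p&gt;Because NVML returns numbers rather than text, there's no regex parsing and no breakage when &lt;code&gt;nvidia-smi&lt;/code&gt;'s output format changes.&lt;/p&gt;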

&lt;p&gt;&lt;strong&gt;FastAPI + WebSockets&lt;/strong&gt;: Async WebSockets push metrics to the browser. No polling, sub-second updates. The server collects metrics and broadcasts them to all connected clients.&lt;/p&gt;
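&lt;p&gt;The broadcast side is framework-independent; here's a minimal sketch of the fan-out pattern in plain &lt;code&gt;asyncio&lt;/code&gt; (class and method names are mine, not GPU Hot's):&lt;/p&gt;

```python
import asyncio

class Broadcaster:
    """Fan each metrics snapshot out to every connected client's queue.
    In a real server, each queue would feed one WebSocket send loop."""

    def __init__(self):
        self.clients = set()

    def subscribe(self):
        q = asyncio.Queue()
        self.clients.add(q)
        return q

    def unsubscribe(self, q):
        self.clients.discard(q)

    async def publish(self, snapshot):
        # Copy the set so clients can (un)subscribe during iteration.
        for q in list(self.clients):
            await q.put(snapshot)

async def demo():
    bus = Broadcaster()
    a, b = bus.subscribe(), bus.subscribe()
    await bus.publish({"gpu0_util": 87})
    return await a.get(), await b.get()

print(asyncio.run(demo()))  # both clients receive the same snapshot
```

&lt;p&gt;One collector task calls &lt;code&gt;publish&lt;/code&gt; on a timer, so the metric-reading cost stays constant no matter how many browsers are watching.&lt;/p&gt;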

&lt;p&gt;&lt;strong&gt;Hub mode&lt;/strong&gt;: Each node runs the same container and exposes metrics via WebSocket. The hub connects to all nodes, aggregates their data, and serves it through a single dashboard.&lt;/p&gt;
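&lt;p&gt;The hub's aggregation step amounts to tagging each GPU record with its node and merging into one list (a sketch with illustrative field names, not GPU Hot's actual data model):&lt;/p&gt;

```python
# Sketch of hub-mode aggregation: merge per-node payloads into one
# flat view, tagging each GPU with the node it came from.
def aggregate(node_payloads):
    """node_payloads: dict mapping node name to a list of per-GPU metric dicts."""
    merged = []
    for node, gpus in sorted(node_payloads.items()):
        for gpu in gpus:
            merged.append({"node": node, **gpu})
    return merged

view = aggregate({
    "server2": [{"index": 0, "util_pct": 12}],
    "server1": [{"index": 0, "util_pct": 95}, {"index": 1, "util_pct": 3}],
})
# nodes are sorted by name, so server1's GPUs come first
```

&lt;p&gt;Since every node exposes the same WebSocket interface, the hub is just one more client of each node; no extra agent or config format is needed.&lt;/p&gt;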

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;: Vanilla JavaScript with Chart.js. No build step, no framework, just HTML/CSS/JS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker&lt;/strong&gt;: Packages everything. Users don't need to install Python, NVML bindings, or manage dependencies. The NVIDIA Container Toolkit handles GPU access.&lt;/p&gt;

&lt;h2&gt;When This Approach Works&lt;/h2&gt;

&lt;p&gt;This pattern works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a small number of machines (1-20)&lt;/li&gt;
&lt;li&gt;You need real-time visibility, not historical analysis&lt;/li&gt;
&lt;li&gt;Your team is small enough that everyone can check one dashboard&lt;/li&gt;
&lt;li&gt;You don't need alerting or complex queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It doesn't replace proper monitoring for production services. But for development infrastructure in a small team, it's sufficient and much simpler to maintain.&lt;/p&gt;

&lt;h2&gt;Trade-offs&lt;/h2&gt;

&lt;p&gt;What you lose compared to Grafana:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No persistent storage (metrics are only kept in memory for the current session)&lt;/li&gt;
&lt;li&gt;No alerting&lt;/li&gt;
&lt;li&gt;No complex queries or correlations&lt;/li&gt;
&lt;li&gt;No authentication (we run this on an internal network)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you gain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero configuration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sub-second&lt;/strong&gt; updates&lt;/li&gt;
&lt;li&gt;No maintenance&lt;/li&gt;
&lt;li&gt;One command deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this use case, the trade-off made sense. This isn't for monitoring production services. It's for checking if GPUs are free before starting a training run.&lt;/p&gt;

&lt;h2&gt;Takeaway&lt;/h2&gt;

&lt;p&gt;Not every monitoring problem needs the full observability stack. For small teams with straightforward needs, a purpose-built tool can be simpler to deploy and maintain than configuring enterprise solutions.&lt;/p&gt;

&lt;p&gt;Try the &lt;a href="https://psalias2006.github.io/gpu-hot" rel="noopener noreferrer"&gt;interactive demo&lt;/a&gt; to see it in action.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>monitoring</category>
      <category>docker</category>
      <category>nvidia</category>
    </item>
  </channel>
</rss>
