<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Deal Estate</title>
    <description>The latest articles on DEV Community by Deal Estate (@deal_estate_715bf4569d373).</description>
    <link>https://dev.to/deal_estate_715bf4569d373</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4011121%2F5ca35d75-3e96-47d4-8cec-f091d4520cff.png</url>
      <title>DEV Community: Deal Estate</title>
      <link>https://dev.to/deal_estate_715bf4569d373</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/deal_estate_715bf4569d373"/>
    <language>en</language>
    <item>
      <title>DGX Spark hitting 83 C under sustained Ollama load — solved by clock-locking via nvidia-smi -lgc</title>
      <dc:creator>Deal Estate</dc:creator>
      <pubDate>Wed, 01 Jul 2026 15:38:44 +0000</pubDate>
      <link>https://dev.to/deal_estate_715bf4569d373/dgx-spark-hitting-83degc-under-sustained-ollama-load-solved-by-clock-locking-via-nvidia-smi-lgc-1pn6</link>
      <guid>https://dev.to/deal_estate_715bf4569d373/dgx-spark-hitting-83degc-under-sustained-ollama-load-solved-by-clock-locking-via-nvidia-smi-lgc-1pn6</guid>
      <description>&lt;h1&gt;
  
  
  DGX Spark hitting 83°C under sustained Ollama load — solved by clock-locking via nvidia-smi -lgc
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The GB10 in the DGX Spark has no user-exposed power-limit or fan-curve control: &lt;code&gt;nvidia-smi&lt;/code&gt; returns &lt;code&gt;[N/A]&lt;/code&gt; for both because they are firmware-managed. But &lt;code&gt;nvidia-smi --lock-gpu-clocks&lt;/code&gt; DOES work. I wrote a tiny daemon that samples the GPU temperature every 30 s, steps the clock ceiling down 150 MHz whenever the temperature is in the hot band (78 °C and above), and relaxes it back up after 3 consecutive cool samples (72 °C or below). On an Ollama gpt-oss:120b + qwen2.5:72b workload, the sustained temperature dropped from 83 °C to 72 °C at the same utilization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;My DGX Spark serving Ollama (~40 GB VRAM across three model instances, sustained 94% util) sits at 82–84 °C indefinitely. No thermal-throttle events yet, but that's uncomfortably close to the SW-slowdown threshold. Standard cooling knobs are absent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;power.limit,power.max_limit,power.min_limit,fan.speed &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader
&lt;span class="go"&gt;[N/A], [N/A], [N/A], [N/A]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is firmware-managed. &lt;code&gt;nvidia-smi --help&lt;/code&gt; still lists &lt;code&gt;-lgc&lt;/code&gt; / &lt;code&gt;--lock-gpu-clocks&lt;/code&gt; though, and it works — GB10 accepts arbitrary integer MHz values within silicon range even though &lt;code&gt;--query-supported-clocks=graphics&lt;/code&gt; returns &lt;code&gt;[N/A]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nvidia-smi &lt;span class="nt"&gt;-lgc&lt;/span&gt; 1500,2000 &lt;span class="nt"&gt;-i&lt;/span&gt; 0
&lt;span class="go"&gt;GPU clocks set to "(gpuClkMin 1500, gpuClkMax 2000)" for GPU 0000000F:01:00.0
All done.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
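
&lt;p&gt;For a daemon, the two calls it needs can be wrapped in a few lines. This is a minimal Python sketch, not the author's Go code: the function names &lt;code&gt;lgc_command&lt;/code&gt;, &lt;code&gt;parse_temp&lt;/code&gt; and &lt;code&gt;read_gpu_temp&lt;/code&gt; are hypothetical, and it assumes &lt;code&gt;temperature.gpu&lt;/code&gt; is reported on your node (treat &lt;code&gt;[N/A]&lt;/code&gt; as "no reading").&lt;/p&gt;

```python
import subprocess


def lgc_command(min_mhz, max_mhz, gpu_index=0):
    """Build the argv for the clock lock shown above (hypothetical helper)."""
    if min_mhz > max_mhz:
        raise ValueError("min_mhz must not exceed max_mhz")
    return ["sudo", "nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}", "-i", str(gpu_index)]


def parse_temp(csv_text):
    """Parse one CSV field from nvidia-smi; return int degrees C, or None for [N/A]."""
    lines = csv_text.strip().splitlines()
    if not lines:
        return None
    field = lines[0].strip()
    if field == "[N/A]":
        return None
    return int(field)


def read_gpu_temp(gpu_index=0):
    """Query the GPU temperature once; requires nvidia-smi on PATH."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits", "-i", str(gpu_index)],
        check=True, capture_output=True, text=True,
    )
    return parse_temp(result.stdout)
```

&lt;p&gt;Building the argv as a list instead of a shell string keeps the values out of shell parsing, which matters when the command runs under a scoped sudo rule.&lt;/p&gt;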



&lt;h2&gt;
  
  
  The daemon
&lt;/h2&gt;

&lt;p&gt;Three-band hysteresis, one actuator. Pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;every 30s:
  read temp.gpu
  if temp &amp;gt;= 78 C:                         step_down(150 MHz), bounded by floor
  elif temp &amp;lt;= 72 C and cool_streak &amp;gt;= 3:  step_up(150 MHz), bounded by ceil
  else:                                    hold
  cool_streak = cool_streak + 1 if temp &amp;lt;= 72 C else 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setpoints: floor 1800 MHz, ceiling 3000 MHz (GB10 max is ~3003). At a sustained 83 °C the daemon walks the clock ceiling down in 150 MHz steps every 30 seconds until the temperature leaves the hot band, then holds. When load drops, it steps back up toward the ceiling once it has a 3-sample cool streak, so a temporary step-down doesn't leave the whole GPU clocked down for the next hour.&lt;/p&gt;
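
&lt;p&gt;The control step can be written as a pure function, which makes it easy to test without a GPU. This is a hedged Python sketch of the pseudocode with the setpoints listed here, not the shipped daemon. The names &lt;code&gt;State&lt;/code&gt; and &lt;code&gt;step&lt;/code&gt; and the &lt;code&gt;STEP_UP&lt;/code&gt; action label are assumptions (the log only shows &lt;code&gt;STEP_DOWN&lt;/code&gt; and &lt;code&gt;HOLD&lt;/code&gt;), and reporting &lt;code&gt;HOLD&lt;/code&gt; when the ceiling is already pinned at a bound is my choice.&lt;/p&gt;

```python
from dataclasses import dataclass

HOT_C = 78          # step down at or above this temperature
COOL_C = 72         # samples at or below this extend the cool streak
STEP_MHZ = 150      # size of each clock-ceiling adjustment
FLOOR_MHZ = 1800    # lowest ceiling the controller will set
CEIL_MHZ = 3000     # highest ceiling the controller will set
STREAK_NEEDED = 3   # cool samples required before stepping up


@dataclass
class State:
    max_mhz: int = CEIL_MHZ
    cool_streak: int = 0


def step(state, temp_c):
    """Apply one temperature sample; return (new_state, action)."""
    max_mhz = state.max_mhz
    if temp_c >= HOT_C:
        max_mhz = max(FLOOR_MHZ, max_mhz - STEP_MHZ)
    elif COOL_C >= temp_c and state.cool_streak >= STREAK_NEEDED:
        max_mhz = min(CEIL_MHZ, max_mhz + STEP_MHZ)

    # Name the action from what actually changed (assumption: HOLD at a bound).
    if state.max_mhz > max_mhz:
        action = "STEP_DOWN"
    elif max_mhz > state.max_mhz:
        action = "STEP_UP"
    else:
        action = "HOLD"

    # As in the pseudocode, the streak is updated after the decision.
    streak = state.cool_streak + 1 if COOL_C >= temp_c else 0
    return State(max_mhz, streak), action
```

&lt;p&gt;Because the streak is updated after the decision, this sketch first steps up on the fourth consecutive cool sample. Move the streak update above the &lt;code&gt;if&lt;/code&gt; if you want the step on the third.&lt;/p&gt;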

&lt;h2&gt;
  
  
  Log — spark-23, live production node
&lt;/h2&gt;

&lt;p&gt;Same Ollama workload throughout, no config changes to the models or the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;time      temp   clock  util   action
07:46:28  82 C   2463   94%    STEP_DOWN
07:47:28  83 C   2463   94%    STEP_DOWN
07:47:58  83 C   2463   94%    STEP_DOWN
07:56:29  76 C   1976   95%    HOLD
07:57:29  77 C   1976   96%    HOLD
08:13:44  72 C   2093   94%    HOLD (cool streak 1)
08:14:14  72 C   2093   94%    HOLD (cool streak 2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is 11 °C lower, sustained (83 °C to 72 °C), with no throttle events across the window. The latency impact is real but bounded: the 1800 MHz floor against the stock 2463 MHz is a worst-case clock reduction of about 27%, and in practice the daemon rides much higher than that.&lt;/p&gt;
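
&lt;p&gt;The percentages are simple to recompute for your own floor. A small Python check, using the stock clock and the clock values that appear in the log (the function name &lt;code&gt;clock_reduction_pct&lt;/code&gt; is hypothetical):&lt;/p&gt;

```python
def clock_reduction_pct(stock_mhz, capped_mhz):
    """Percentage reduction of a capped clock relative to the stock clock."""
    return 100.0 * (stock_mhz - capped_mhz) / stock_mhz


# Worst case: ceiling pinned at the 1800 MHz floor.
print(round(clock_reduction_pct(2463, 1800), 1))  # 26.9
# Clock values seen in the log above.
print(round(clock_reduction_pct(2463, 1976), 1))  # 19.8
print(round(clock_reduction_pct(2463, 2093), 1))  # 15.0
```

&lt;p&gt;Clock reduction is only an upper bound on the slowdown. How much of it shows up in tokens per second depends on how compute-bound the workload is.&lt;/p&gt;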

&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One actuator only. GB10 doesn't expose power-limit or fan PWM the way a 4090 does; I couldn't build a multi-objective controller even if I wanted to.&lt;/li&gt;
&lt;li&gt;Clock-locking slows inference. Whether the tradeoff is worth it depends on your workload — for me the 24/7 uptime and thermal headroom are worth the 5–15% median TPS hit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sudo nvidia-smi -lgc&lt;/code&gt; needs passwordless sudo for the daemon user. I scope it in &lt;code&gt;/etc/sudoers.d/&lt;/code&gt; to only &lt;code&gt;-lgc *&lt;/code&gt; and &lt;code&gt;-rgc&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If you want the binary
&lt;/h2&gt;

&lt;p&gt;I packaged it as a licensed install at &lt;a href="https://thermal.zctechnologies.org" rel="noopener noreferrer"&gt;https://thermal.zctechnologies.org&lt;/a&gt;: Go daemon, systemd unit, scoped sudoers, licensed per node, monthly. Comment or DM if you'd rather just have the shell recipe. The algorithm above is the whole thing, and I'm happy to answer questions about setpoints or the &lt;code&gt;ExecStopPost=nvidia-smi -rgc&lt;/code&gt; teardown, which returns your GPU to stock clocks on a graceful stop.&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>gpu</category>
      <category>llm</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
