<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yaroslav Pristupa</title>
    <description>The latest articles on DEV Community by Yaroslav Pristupa (@yaro_dev).</description>
    <link>https://dev.to/yaro_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3841466%2Ffdd736c5-f87b-4bc7-a2dd-052cab4c37c8.png</url>
      <title>DEV Community: Yaroslav Pristupa</title>
      <link>https://dev.to/yaro_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yaro_dev"/>
    <language>en</language>
    <item>
      <title>Task Manager is lying about your GPU temps. Here is how to read the real data in Python</title>
      <dc:creator>Yaroslav Pristupa</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:46:34 +0000</pubDate>
      <link>https://dev.to/yaro_dev/task-manager-is-lying-about-your-gpu-temps-here-is-how-to-read-the-real-data-in-python-d2p</link>
      <guid>https://dev.to/yaro_dev/task-manager-is-lying-about-your-gpu-temps-here-is-how-to-read-the-real-data-in-python-d2p</guid>
      <description>&lt;p&gt;As developers, we are used to trusting our system monitors. When you are pushing a high-end laptop GPU to its absolute limits – say, running a massive batch in Stable Diffusion or training a local LLM – you naturally keep an eye on Windows Task Manager. &lt;/p&gt;

&lt;p&gt;It tells you your GPU is sitting at 100% utilization and the temperature is a comfortable 75°C. You think everything is fine. But 30 minutes later, your generation speed drops by half, the system stutters, and your laptop feels like a hotplate. &lt;/p&gt;

&lt;p&gt;Task Manager isn't exactly lying, but it is omitting the most important variable: the Memory Junction (VRAM) temperature. &lt;/p&gt;

&lt;p&gt;Modern GDDR6X memory chips run incredibly hot. In a laptop with shared heat pipes, the GPU core can be perfectly cooled while the VRAM hits 105°C, triggering a massive hardware-level thermal throttle. &lt;/p&gt;

&lt;p&gt;When I set out to build a utility to fix this for my own AI workflows, my first hurdle was simply getting the data. Here is a look at how I approached accessing this hidden telemetry, and why I ended up using a sidecar pattern in Python instead of writing low-level C++.&lt;/p&gt;

&lt;h3&gt;The Telemetry Nightmare: WMI, NVAPI, and Ring-0&lt;/h3&gt;

&lt;p&gt;My first thought was to use Windows Management Instrumentation (WMI). It is built-in, easy to query with Python, and safe. Unfortunately, WMI is notoriously slow and, more importantly, it rarely exposes granular GPU sensor data like the Memory Junction temperature. It usually just gives you the core package temp.&lt;/p&gt;

&lt;p&gt;Next, I looked at NVIDIA's NVAPI. While it is the official route, NVAPI is a massive, complex C++ SDK. Wrapping it for a lightweight Python background script felt like serious overkill. Worse, its undocumented calls change between driver versions, making it a maintenance nightmare.&lt;/p&gt;

&lt;p&gt;The "hardcore" route would be writing a custom kernel-mode driver (Ring-0) to read the SMBus directly. But doing that in 2026 means dealing with strict Windows driver signature enforcement, triggering anti-cheat software in games, and risking blue screens. I wanted a lightweight utility, not a rootkit.&lt;/p&gt;

&lt;h3&gt;The Sidecar Pattern: LibreHardwareMonitor&lt;/h3&gt;

&lt;p&gt;Instead of fighting the OS, I looked at the open-source community. Tools like LibreHardwareMonitor (LHM) already do the heavy lifting. They have safe, signed drivers that know exactly how to talk to the thermal sensors across hundreds of different GPU architectures.&lt;/p&gt;

&lt;p&gt;Even better, LHM has a built-in local web server that exposes all of its sensor data as a clean JSON API. &lt;/p&gt;

&lt;p&gt;This led me to a sidecar architecture. I could run a headless instance of LHM alongside my Python application and simply poll &lt;code&gt;localhost&lt;/code&gt; for the exact metrics I needed. No kernel drivers, no C++ wrappers, just standard HTTP requests.&lt;/p&gt;

&lt;p&gt;Here is a simplified conceptual look at how you can grab the VRAM temperature using Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_vram_temp&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Polling the local LibreHardwareMonitor JSON API
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8085/data.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Traverse the JSON tree to find the GPU Memory Junction sensor
&lt;/span&gt;        &lt;span class="c1"&gt;# (The actual path depends on the specific hardware tree)
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hardware&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Children&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Children&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hardware&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hardware&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Children&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Temperatures&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Children&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPU Memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; °C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Telemetry error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Current VRAM Temp: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;get_vram_temp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;°C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is fast, it is reliable, and it builds on a tool the enthusiast community already trusts.&lt;/p&gt;
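&lt;p&gt;One caveat: the fixed &lt;em&gt;Children[0]&lt;/em&gt; indices in my snippet assume a particular hardware tree. A recursive search over the LHM JSON is more robust across machines. Here is a hedged sketch – the sensor labels and the miniature sample tree are illustrative, not guaranteed by LHM:&lt;/p&gt;

```python
def find_sensor(node, keyword):
    """Depth-first search for the first sensor whose label contains keyword."""
    if keyword in node.get('Text', '') and 'Value' in node:
        return node
    for child in node.get('Children', []):
        hit = find_sensor(child, keyword)
        if hit is not None:
            return hit
    return None

# A miniature stand-in for LibreHardwareMonitor's data.json tree.
sample_tree = {
    'Text': 'Sensor',
    'Children': [{
        'Text': 'NVIDIA GeForce RTX 4080 Laptop GPU',
        'Children': [{
            'Text': 'Temperatures',
            'Children': [
                {'Text': 'GPU Core', 'Value': '78.0 °C'},
                {'Text': 'GPU Memory Junction', 'Value': '101.0 °C'},
            ],
        }],
    }],
}

print(find_sensor(sample_tree, 'GPU Memory')['Value'])  # 101.0 °C
```

&lt;p&gt;Swapping this in for the nested loops keeps the polling logic working when the GPU isn't the first device in the tree.&lt;/p&gt;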

&lt;h3&gt;From Monitoring to Active Management&lt;/h3&gt;

&lt;p&gt;Once I had a reliable stream of real-time VRAM temperatures, I needed to act on it. If the memory hit 100°C, I needed to cool it down before the hardware firmware panicked at 105°C.&lt;/p&gt;

&lt;p&gt;Crucially, I wanted to avoid blunt global power limits. Instead, I wanted to pause the specific CUDA process that was causing the heat. In Windows, you can do this using the native &lt;em&gt;NtSuspendProcess&lt;/em&gt; and &lt;em&gt;NtResumeProcess&lt;/em&gt; functions from &lt;em&gt;ntdll.dll&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Using Python's &lt;em&gt;ctypes&lt;/em&gt; library, calling these low-level Windows APIs is surprisingly straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;

&lt;span class="c1"&gt;# Load the NTDLL library
&lt;/span&gt;&lt;span class="n"&gt;ntdll&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ntdll&lt;/span&gt;

&lt;span class="c1"&gt;# Define the required access rights
&lt;/span&gt;&lt;span class="n"&gt;PROCESS_SUSPEND_RESUME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x0800&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;suspend_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Open the process
&lt;/span&gt;    &lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PROCESS_SUSPEND_RESUME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Suspend the threads
&lt;/span&gt;        &lt;span class="n"&gt;ntdll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NtSuspendProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CloseHandle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;resume_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PROCESS_SUSPEND_RESUME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Resume the threads
&lt;/span&gt;        &lt;span class="n"&gt;ntdll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NtResumeProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CloseHandle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suspending the heavy AI process for just 100 to 200 milliseconds lets the OS scheduler drop the hardware load to zero. The CUDA context stays perfectly safe in VRAM – the model doesn't crash – but the shared heat pipes get a tiny window to dissipate the accumulated thermal soak. &lt;/p&gt;

&lt;h3&gt;Putting it all together&lt;/h3&gt;

&lt;p&gt;Of course, a simple &lt;em&gt;time.sleep()&lt;/em&gt; loop isn't enough for a production environment. If you pause the process for too long, the system lags. If you pause it too briefly, the VRAM still overheats. &lt;/p&gt;

&lt;p&gt;I eventually built a dynamic mathematical model that takes the telemetry from LHM and calculates a precise duty cycle for the &lt;em&gt;NtSuspendProcess&lt;/em&gt; calls on the fly. It acts like a software-based Pulse Width Modulation (PWM) for your GPU workload. &lt;/p&gt;
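&lt;p&gt;My production model is more involved, but the core idea can be sketched as a simple linear ramp – every threshold below is a hypothetical placeholder for illustration, not a value from VRAM Shield:&lt;/p&gt;

```python
def duty_cycle(vram_temp_c, safe_c=90.0, panic_c=103.0, max_duty=0.5):
    """Map a VRAM temperature onto a suspension duty cycle (0.0 to max_duty)."""
    if safe_c >= vram_temp_c:
        return 0.0                     # safe: run flat out, no pauses
    # Linear ramp between the safe threshold and the panic point
    overshoot = (vram_temp_c - safe_c) / (panic_c - safe_c)
    return round(min(max_duty, max_duty * overshoot), 3)

print(duty_cycle(85.0))   # 0.0  -- safe, no throttling
print(duty_cycle(96.5))   # 0.25 -- halfway to the panic point
print(duty_cycle(110.0))  # 0.5  -- clamped at the maximum
```

&lt;p&gt;The monitoring loop feeds the LHM reading into a function like this and splits each second into run/pause windows accordingly – the software-PWM part.&lt;/p&gt;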

&lt;p&gt;I packaged this Python logic, compiled it down with Nuitka, and wrapped it in a clean WebView2 UI. The result is &lt;a href="https://vramshield.com" rel="noopener noreferrer"&gt;VRAM Shield&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If you are building your own hardware management tools, don't feel pressured to write everything in C++ from scratch. Leveraging established open-source telemetry tools via local APIs and using Python's &lt;em&gt;ctypes&lt;/em&gt; for WinAPI calls is an incredibly powerful, safe, and fast way to build system utilities.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>hardware</category>
      <category>softwaredevelopment</category>
      <category>gpu</category>
    </item>
    <item>
      <title>I built a duty-cycle throttler for my RTX 4060 (because undervolting wasn't enough)</title>
      <dc:creator>Yaroslav Pristupa</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:44:49 +0000</pubDate>
      <link>https://dev.to/yaro_dev/i-built-a-duty-cycle-throttler-for-my-rtx-4060-because-undervolting-wasnt-enough-2onn</link>
      <guid>https://dev.to/yaro_dev/i-built-a-duty-cycle-throttler-for-my-rtx-4060-because-undervolting-wasnt-enough-2onn</guid>
      <description>&lt;p&gt;If you spend any time on Reddit or hardware forums complaining about your laptop overheating during local AI workloads, you will get the exact same advice within five minutes: &lt;em&gt;"Just undervolt it, bro"&lt;/em&gt; or &lt;em&gt;"Cap your power limit to 70% in MSI Afterburner."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For a long time, that was my default approach too. When I started running heavy generative models like Flux.1 and complex ComfyUI video pipelines on my RTX 4080 laptop, the heat was intense. The fans sounded like a jet engine, and the chassis was physically uncomfortable to touch. So, I opened Afterburner, dropped the global power limit by 30%, and called it a day.&lt;/p&gt;

&lt;p&gt;But after a few weeks of running long, unattended overnight batches, I realized something frustrating. Global power capping is a blunt instrument. It is the wrong tool for a very specific problem, and it was silently killing my iteration speeds. &lt;/p&gt;

&lt;p&gt;Here is why I completely abandoned global power limits for my AI workflows, and how I transitioned to a process-level duty-cycle approach instead.&lt;/p&gt;

&lt;h3&gt;The problem with global limits in AI workloads&lt;/h3&gt;

&lt;p&gt;To understand why power capping sucks for local AI, you have to look at how these models actually stress your hardware. &lt;/p&gt;

&lt;p&gt;Gaming is a dynamic workload. You have loading screens, inventory menus, and scenes with varying geometric complexity. The GPU gets micro-breaks. AI inference, on the other hand, is a flat, unrelenting 100% utilization of your CUDA cores and memory bandwidth. It is effectively a sustained stress test.&lt;/p&gt;

&lt;p&gt;When you apply a global power cap – say, restricting a 175W laptop GPU to 100W – that cap affects everything simultaneously. You are starving the core, the memory controller, and the auxiliary components. Yes, your total heat output drops. But you are also artificially limiting your hardware's compute capability from the very first second of generation, even when the silicon is still sitting at a cool 45°C. &lt;/p&gt;

&lt;p&gt;More importantly, global power capping completely ignores the actual bottleneck in modern laptops: the heat density of the VRAM. &lt;/p&gt;

&lt;p&gt;Because of the shared heat pipe designs in laptops like the Legion or Zephyrus, the GPU core might be well-ventilated and perfectly happy at 70°C. But the GDDR6X memory modules, packed tightly around that core, are absorbing all the thermal soak. &lt;/p&gt;

&lt;p&gt;Even with a global power cap, sustained AI workloads will eventually push that Memory Junction temperature to the critical 105°C limit. When that happens, the laptop's low-level firmware panics. It triggers an aggressive emergency throttle, slashing memory clocks by half. Your iterations-per-second (it/s) fall off a cliff. You end up with erratic, unpredictable generation times, and you are left wondering why your "cool" GPU is performing so poorly.&lt;/p&gt;

&lt;h3&gt;The duty-cycle alternative (Pulse Throttling)&lt;/h3&gt;

&lt;p&gt;I wanted a way to manage this specific VRAM thermal load without crippling my GPU's peak compute power. I started looking at duty cycles – specifically, modulating the workload of the single, intensive Python process running the AI.&lt;/p&gt;

&lt;p&gt;The logic was straightforward. If the VRAM is overheating because of a sustained, unbroken load, the most effective way to cool it down is to simply stop it from doing work for a fraction of a second. &lt;/p&gt;

&lt;p&gt;By utilizing the native Windows API – specifically the &lt;em&gt;NtSuspendProcess&lt;/em&gt; and &lt;em&gt;NtResumeProcess&lt;/em&gt; functions – I could introduce "micro-pauses" directly into the CUDA-heavy process. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3sk2qhj92dnx51304p.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3sk2qhj92dnx51304p.webp" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is essentially Pulse Throttling. Imagine applying a 15% suspension duty cycle. The process runs at absolute maximum performance for 850 milliseconds, and then it is completely suspended for 150 milliseconds. &lt;/p&gt;
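&lt;p&gt;Turning a duty cycle into concrete run/pause windows is just arithmetic – the one-second modulation period here is my own choice for illustration, not a required constant:&lt;/p&gt;

```python
def pulse_schedule(duty, period_ms=1000):
    """Split one modulation period into (run_ms, pause_ms) for a given duty."""
    pause_ms = int(period_ms * duty)
    return period_ms - pause_ms, pause_ms

run_ms, pause_ms = pulse_schedule(0.15)
print(run_ms, pause_ms)  # 850 150
```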

&lt;p&gt;From the OS perspective, the thread is just frozen. The CUDA context remains perfectly intact in the VRAM, the model doesn't crash, and no data is lost. But physically, those 150 milliseconds of zero load give the memory modules and the shared heat pipes just enough "breathing room" to dissipate the accumulated heat.&lt;/p&gt;

&lt;h3&gt;Granular management vs. blunt force&lt;/h3&gt;

&lt;p&gt;The results of this approach were incredibly eye-opening. &lt;/p&gt;

&lt;p&gt;On my test machine, applying a strict 100W global power cap reduced my Memory Junction temperature by about 6°C. However, it permanently slowed down every single step of the generation process. My baseline it/s dropped significantly, and the VRAM still eventually crept up to the throttle point during multi-hour runs.&lt;/p&gt;

&lt;p&gt;In contrast, when I removed the power cap and applied a dynamic duty-cycle suspension, the Memory Junction temperature dropped by 12°C. &lt;/p&gt;

&lt;p&gt;Because the suspension was only applied to the specific render process, the rest of my Windows environment remained perfectly responsive. I could browse the web and watch YouTube without the whole system lagging. I wasn't just blindly capping power; I was managing the heat density exactly at the source.&lt;/p&gt;

&lt;p&gt;Instead of my iteration speeds crashing unpredictably when the firmware panicked, they remained perfectly consistent for 12 hours straight. The "average" speed over a long run was actually higher than with a power cap, because the hardware never hit the 105°C emergency wall. &lt;/p&gt;

&lt;h3&gt;Making it smart&lt;/h3&gt;

&lt;p&gt;Of course, a static 15% pause is not ideal. You don't want to pause the process if the VRAM is only at 80°C. &lt;/p&gt;

&lt;p&gt;To solve this, I wrote a background service in Python that hooks into LibreHardwareMonitor to pull real-time telemetry from the Memory Junction sensors. Instead of a dumb on/off switch, I implemented an advanced mathematical model that calculates the required duty cycle on the fly. &lt;/p&gt;

&lt;p&gt;If the temperature is safe, the duty cycle is 0%. The GPU runs at full throttle. As the VRAM approaches the danger zone, the algorithm dynamically scales the micro-pauses – maybe 3% throttling at first, scaling up only if the heat continues to rise. It finds the exact equilibrium point where the heat dissipation matches the heat generation.&lt;/p&gt;
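&lt;p&gt;A minimal sketch of that feedback loop might look like this – the 3% step and the hysteresis band are assumptions for illustration, not the actual VRAM Shield constants:&lt;/p&gt;

```python
def adjust_duty(duty, temp_c, prev_temp_c, target_c=95.0,
                step=0.03, max_duty=0.5):
    """Nudge the suspension duty cycle toward thermal equilibrium."""
    if temp_c > target_c and temp_c >= prev_temp_c:
        duty += step        # above target and still heating: throttle harder
    elif target_c - 3.0 > temp_c:
        duty -= step        # comfortably cool again: hand performance back
    return round(max(0.0, min(max_duty, duty)), 3)

duty = 0.0
duty = adjust_duty(duty, 98.0, 96.0)   # heating past the target
print(duty)  # 0.03
duty = adjust_duty(duty, 88.0, 92.0)   # cooled well below the band
print(duty)  # 0.0
```

&lt;p&gt;Run once per polling interval, small increments like this settle at the lowest duty cycle that holds the temperature steady.&lt;/p&gt;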

&lt;p&gt;I eventually packaged this entire pulse-throttling engine into a standalone Windows utility called &lt;a href="https://vramshield.com" rel="noopener noreferrer"&gt;VRAM Shield&lt;/a&gt;. It runs quietly in the system tray, monitors the hardware, and applies these micro-suspensions automatically. &lt;/p&gt;

&lt;p&gt;If you are running local LLMs, generating huge batches in Stable Diffusion, or dealing with heavy 3D renders on a laptop, stop neutering your GPU with global power limits. Managing the duty cycle of the process itself is a much safer, more transparent, and significantly more effective way to keep your hardware alive without sacrificing its potential.&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>gpu</category>
      <category>vram</category>
      <category>hardware</category>
    </item>
    <item>
      <title>How I fixed the 30-minute performance drop in Cyberpunk 2077</title>
      <dc:creator>Yaroslav Pristupa</dc:creator>
      <pubDate>Tue, 24 Mar 2026 10:40:55 +0000</pubDate>
      <link>https://dev.to/yaro_dev/how-i-fixed-the-30-minute-performance-drop-in-cyberpunk-2077-4i1m</link>
      <guid>https://dev.to/yaro_dev/how-i-fixed-the-30-minute-performance-drop-in-cyberpunk-2077-4i1m</guid>
      <description>&lt;p&gt;Every laptop gamer knows this exact cycle. You finally have some free time, you launch a heavy title like Cyberpunk 2077 or Black Myth: Wukong, and for the first 15 to 20 minutes, your machine runs like an absolute dream. The frame rate is locked. The frame times are a flat line. Everything feels incredibly responsive.&lt;/p&gt;

&lt;p&gt;But then, right around the 30-minute mark, the game starts to feel slightly off. &lt;/p&gt;

&lt;p&gt;You notice micro-stutters during fast camera pans. Your average FPS suddenly drops by 20% or more. You alt-tab to check your telemetry in MSI Afterburner or Task Manager, expecting to see your hardware melting. Instead, your GPU core is sitting at a totally reasonable 75°C to 78°C. Your CPU is fine. &lt;/p&gt;

&lt;p&gt;So what exactly is happening? Why does the performance fall off a cliff when the core temperatures look perfectly safe?&lt;/p&gt;

&lt;p&gt;As someone who spends a lot of time profiling high-performance hardware and writing system utilities, I decided to dig into this "mystery slowdown." What I found is a massive hardware bottleneck that standard monitoring tools completely ignore. &lt;/p&gt;

&lt;p&gt;The culprit is the thermal density of your Memory Junction – specifically, the VRAM.&lt;/p&gt;

&lt;h3&gt;The shared heat pipe problem&lt;/h3&gt;

&lt;p&gt;To understand why this happens, we have to look at how modern gaming laptops are built. Whether you have a Lenovo Legion, an ASUS Zephyrus, or a Razer Blade, most high-end machines use a shared cooling assembly. The same copper heat pipes carry thermal energy away from both the GPU core and the surrounding components.&lt;/p&gt;

&lt;p&gt;This design is great for burst workloads. But during a sustained two-hour gaming session, it creates a severe "thermal soak" effect. &lt;/p&gt;

&lt;p&gt;The GPU core itself is usually fine. It has a large die surface area and gets priority contact with the best cooling zones. But the VRAM modules – especially the high-performance GDDR6 or GDDR6X chips on RTX 30- and 40-series laptops – are packed incredibly tight around that core. &lt;/p&gt;

&lt;p&gt;As you play, these memory chips generate a constant, intense amount of heat. During my tests with a mobile RTX 4080, I watched the telemetry closely. While the GPU core stabilized at a comfortable 78°C, the Memory Junction temperature just kept climbing. &lt;/p&gt;

&lt;p&gt;At the 20-minute mark, it hit 95°C. By minute 35, it hit the hard wall: 105°C.&lt;/p&gt;

&lt;h3&gt;The firmware's panic button&lt;/h3&gt;

&lt;p&gt;When your VRAM hits 105°C, the laptop's low-level firmware steps in to stop the silicon from physically degrading. It triggers an aggressive emergency throttle. &lt;/p&gt;

&lt;p&gt;The system instantly drops the memory clock speeds by nearly 50% to cut the heat generation. This is the exact moment you feel your game stutter and your FPS tank. &lt;/p&gt;

&lt;p&gt;The firmware keeps the memory choked until the sensors report a significant drop in temperature. Once it cools down a few degrees, the clocks boost back up to maximum. The memory rapidly overheats again, the throttle kicks back in, and you are stuck in a miserable "yo-yo" performance loop. &lt;/p&gt;
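&lt;p&gt;You can reproduce this yo-yo pattern with a toy hysteresis model – all of the rates and thresholds below are invented for illustration:&lt;/p&gt;

```python
def simulate_firmware(minutes, heat=1.0, cool=2.0,
                      panic_c=105.0, resume_c=101.0):
    """Toy per-minute model of the firmware's throttle hysteresis."""
    temp, throttled, trace = 95.0, False, []
    for _ in range(minutes):
        if throttled:
            temp -= cool               # memory clocks halved, VRAM cools fast
            if resume_c >= temp:
                throttled = False      # clocks boost back to maximum...
        else:
            temp += heat               # ...and the VRAM heats straight back up
            if temp >= panic_c:
                throttled = True       # 105°C: emergency throttle kicks in
        trace.append((temp, throttled))
    return trace

# Print the last few minutes: the temperature bounces between the two limits
for temp, throttled in simulate_firmware(13)[8:]:
    print(temp, throttled)
```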

&lt;p&gt;The most frustrating part is the blindness. Because basic overlays only report the GPU core temperature, users are left chasing ghosts. They roll back NVIDIA drivers, disable Windows background services, or blame the game developers for "memory leaks." In reality, they are just hitting a localized hardware thermal limit.&lt;/p&gt;

&lt;h3&gt;Sledgehammers vs. software&lt;/h3&gt;

&lt;p&gt;Once I identified the problem, I looked at the standard community fixes. They were all terrible. &lt;/p&gt;

&lt;p&gt;You can repaste the laptop and swap the VRAM thermal pads. This actually works well, but it voids your warranty and requires you to completely disassemble a $2,500 machine. &lt;/p&gt;

&lt;p&gt;You can use a global undervolt or strictly cap the GPU power limit. This lowers the overall heat, but it also leaves a ton of performance on the table. You end up nerfing your expensive GPU even during the times when it is running perfectly cool. &lt;/p&gt;

&lt;p&gt;I wanted a software solution. I wanted a way to manage this specific heat soak without crippling the laptop's peak performance. &lt;/p&gt;

&lt;h3&gt;Building a dynamic safety net&lt;/h3&gt;

&lt;p&gt;I started experimenting with process-level modulation using the Windows API. Specifically, I looked at the native &lt;em&gt;NtSuspendProcess&lt;/em&gt; and &lt;em&gt;NtResumeProcess&lt;/em&gt; functions. &lt;/p&gt;

&lt;p&gt;The theory was simple. If I could introduce microscopic pauses into the heavy GPU-bound game process, the Windows scheduler would momentarily drop the hardware load. If I gave the memory modules just a few milliseconds of "breathing room" every second, the heat pipes might have enough time to clear the thermal backlog before the firmware hit its 105°C panic button.&lt;/p&gt;

&lt;p&gt;I wrote a Python script to test this out. It ran as a background service, pulling real-time Memory Junction telemetry from LibreHardwareMonitor. &lt;/p&gt;

&lt;p&gt;Instead of just blindly pausing the game – which would look like a massive lag spike – I built a dynamic modulation engine. I implemented a rather complex mathematical model that calculates the exact duty cycle needed on the fly. It constantly evaluates how fast the VRAM is heating up and calculates the absolute minimum pause duration required to stabilize the temperature. &lt;/p&gt;

&lt;p&gt;We are talking about milliseconds. It is a pulse-throttling approach that happens so fast the human eye rarely catches it, but the thermal sensors absolutely do.&lt;/p&gt;
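&lt;p&gt;As a rough illustration of the rate-based idea (the cooling constant below is a made-up number, not a measured one), the minimum pause per second can be estimated from the current headroom and the observed heating rate:&lt;/p&gt;

```python
def min_pause_ms(temp_c, heat_rate, panic_c=105.0,
                 cool_per_ms=0.02, period_ms=1000):
    """Smallest pause per period that should hold temp below panic_c.

    heat_rate is the observed VRAM heating rate in °C per second;
    cool_per_ms is an assumed °C shed per millisecond of pause.
    """
    headroom = panic_c - temp_c
    if 0 >= heat_rate or headroom > heat_rate * 30:
        return 0  # cooling already, or 30+ seconds of headroom: don't pause
    heat_per_period = heat_rate * period_ms / 1000.0
    return min(period_ms, int(heat_per_period / cool_per_ms))

print(min_pause_ms(80.0, 0.5))   # 0  -- plenty of headroom
print(min_pause_ms(100.0, 0.5))  # 25 -- a 25 ms pause each second
```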

&lt;h3&gt;The results&lt;/h3&gt;

&lt;p&gt;The impact on my Cyberpunk 2077 sessions was immediate. &lt;/p&gt;

&lt;p&gt;With the script running, my Memory Junction temperature stabilized at a safe 92°C instead of slamming into the 105°C wall. I lost a tiny fraction of my absolute peak FPS, but the catastrophic 40% performance drops completely vanished. &lt;/p&gt;

&lt;p&gt;More importantly, the frame times became a flat, consistent line. Instead of the jagged, erratic performance of a hardware-throttled system, the game remained smooth and responsive for hours. I no longer had to sacrifice long-term stability for short-term benchmark numbers.&lt;/p&gt;

&lt;p&gt;I initially built this just to keep my own laptop from cooking itself. But after seeing how well the dynamic modulation worked for both gaming and heavy local AI workloads (like Stable Diffusion), I refined the code, added a proper UI, and packaged it into a utility called &lt;a href="https://vramshield.com" rel="noopener noreferrer"&gt;VRAM Shield&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If you are tired of your laptop silently throttling your games, stop messing with your drivers. Check your Memory Junction temps. Understanding the physical limits of your VRAM – and managing them proactively – is the only real way to get the sustained performance you paid for.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>gaming</category>
      <category>hardware</category>
      <category>windows</category>
    </item>
  </channel>
</rss>
