DEV Community

mikebains41-debug

I found 146W of "ghost power" on NVIDIA A100 at 0% utilization. Here's my API to detect it

I ran 35 controlled energy tests on NVIDIA A100 and H100 GPUs on RunPod.

Standard monitoring tools missed something critical.

When nvidia-smi reports 0% GPU utilization, data center operators assume the GPU is idle. Billing stops. Cost tracking stops.

The power draw does not stop.

On the A100 SXM, I consistently measured 67–146 W while reported utilization was 0%. I call this ghost power.
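To make the idea concrete, here is a minimal sketch of how such readings could be flagged. It is not the code behind the API in this post. It assumes you have collected CSV rows from `nvidia-smi --query-gpu=utilization.gpu,power.draw --format=csv,noheader,nounits`; the function names and the `min_watts` cutoff are hypothetical and chosen only for illustration.

```python
import csv
import io


def parse_samples(csv_text):
    """Parse 'utilization.gpu, power.draw' rows into (util_pct, watts) tuples.

    Rows that do not contain two numeric fields (for example a power
    reading of '[N/A]') are skipped.
    """
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue
        try:
            util = int(row[0].strip())
            watts = float(row[1].strip())
        except ValueError:
            continue
        samples.append((util, watts))
    return samples


def ghost_samples(samples, min_watts):
    """Return samples where utilization reads 0% but power is above min_watts.

    min_watts is a hypothetical cutoff picked by the caller; the post does
    not define a specific threshold.
    """
    return [(u, w) for u, w in samples if u == 0 and w > min_watts]


# Illustrative input only; these rows are made up to mirror the range in the post.
example = "0, 67.10\n35, 210.50\n0, 146.70\n"
flagged = ghost_samples(parse_samples(example), min_watts=100.0)
print(flagged)
```

Adding `-l 1` to the `nvidia-smi` command samples once per second, which is enough to see whether a 0% reading is paired with sustained power draw.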

Key findings

  • A100 idle floor: 67.1W — never drops lower
  • Peak ghost power: 146.7W at 0% utilization — sustained 11 minutes
  • H100 SXM: zero ghost power detected across 11 tests
  • FP16 draws 60% more power than FP32 at the same matrix size
  • Power capping blocked at hypervisor level on RunPod — tenants cannot fix this
  • $58.70 per GPU per year in pure idle waste
  • At 1 million GPUs globally: about $58.7M/year in invisible waste
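The cost figures above follow from simple arithmetic: idle watts times hours per year, converted to kWh, times an electricity price. The post does not state the price used, so the sketch below assumes $0.10 per kWh, which lands close to the $58.70 figure. Treat `ASSUMED_USD_PER_KWH` and the function name as illustrative assumptions.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years

# Assumption: the electricity price is not given in the post.
ASSUMED_USD_PER_KWH = 0.10


def annual_idle_cost_usd(idle_watts, usd_per_kwh):
    """Yearly cost of a constant power draw, in US dollars."""
    kwh_per_year = idle_watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * usd_per_kwh


# 67.1 W is the A100 idle floor reported in the key findings.
per_gpu = annual_idle_cost_usd(67.1, ASSUMED_USD_PER_KWH)
fleet = per_gpu * 1_000_000  # the 1 million GPU scenario from the post

print(f"per GPU: ${per_gpu:.2f}/year")
print(f"1M GPUs: ${fleet / 1e6:.1f}M/year")
```

Plug in your own rate to see how the number scales; the result is linear in both the idle wattage and the price per kWh.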

Live API & Interactive Docs

Base API URL: https://ai-gpu-brain-v3.onrender.com

Interactive Swagger docs (try it in your browser):

👉 https://ai-gpu-brain-v3.onrender.com/docs

Try it yourself (curl)


```bash
curl https://ai-gpu-brain-v3.onrender.com/detect/a100/13
```
