DEV Community

david dai

That AI Device Kept Rebooting – Turns Out It Was the Power Supply

Let me get straight to the point: don't assume that AI hardware only needs sufficient compute power. Pick the wrong power supply, and it won't matter how stable your model inference is.

I recently did a post-mortem on an edge AI inference device project. The device itself was well-designed, running a lightweight vision model, with over 200 units deployed in the field. Two months later, the failure rate hit nearly 15%. The symptoms were random: devices would reboot occasionally, inference results would be intermittently abnormal, and the logs showed no clear pattern.

Everyone started by checking the software: driver versions, model quantization, thermal strategies. All of it checked out fine. Then I spent three days with an oscilloscope and finally caught it: the switching power adapter feeding the 24V input sagged to 18V the moment the NPU kicked in, with nearly 200mV of high-frequency ripple riding on top.
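To put numbers like these on a scope capture, a small script helps. The sketch below assumes you have exported a steady-state segment of the rail voltage recorded while the NPU is active; the function name `rail_report` and the synthetic samples are illustrative, not data from the project.

```python
from statistics import mean

def rail_report(loaded_samples_v, nominal_v):
    """Summarize a steady-state capture taken while the load is active.

    loaded_samples_v: voltage samples (volts) from the loaded segment only.
    nominal_v: the rail's nominal voltage (24.0 for a 24V input).
    """
    avg = mean(loaded_samples_v)
    return {
        "mean_v": avg,
        "sag_v": nominal_v - avg,                          # DC droop below nominal
        "sag_pct": 100.0 * (nominal_v - avg) / nominal_v,  # droop as a percentage
        "ripple_pp_v": max(loaded_samples_v) - min(loaded_samples_v),  # peak-to-peak
    }

# Synthetic capture (assumption): an 18V average with 200mV peak-to-peak ripple.
capture = [18.1, 17.9] * 10
report = rail_report(capture, nominal_v=24.0)
print(f"sag: {report['sag_v']:.2f} V ({report['sag_pct']:.1f}%), "
      f"ripple: {report['ripple_pp_v'] * 1000:.0f} mV p-p")
```

Taking max minus min as ripple only makes sense on a segment that has already settled; if the capture includes the load step itself, the step edge will dominate the result.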

Cracked open that power supply: a classic "generic special." The label said "medical grade," but inside, the input filter was missing a common-mode choke stage, and the capacitor after the rectifier bridge was only rated for 85°C. The supply had never been through conducted-EMI pre-compliance testing, and noise was coupling directly onto the main I2C bus. The device that kept reporting "model load failed"? The power supply had pushed the ripple on the Flash supply rail past its threshold.

Swapped in a power supply with genuine medical EMC certification (30µA leakage current limit, 4kV reinforced insulation, two-stage EMI filtering), and all the anomalies disappeared. Then we redid the thermal derating: the sealed enclosure was hitting 65°C internally, so we derated the power supply from 60W to 45W and swapped the housing for an aluminum conduction-cooled case. Three more months in the field: zero failures.
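The derating step is simple arithmetic once you have the curve from the datasheet. Here is a minimal sketch assuming a linear curve: full power up to 50°C, falling to 50% at 80°C. Those curve points are hypothetical, picked only because they land on the 60W to 45W figure at 65°C; use the curve your supplier actually publishes.

```python
def derated_power_w(rated_w, ambient_c, knee_c=50.0, max_c=80.0, fraction_at_max=0.5):
    """Usable output power at a given ambient temperature, linear derating.

    knee_c, max_c and fraction_at_max are hypothetical curve points;
    replace them with the values from the real datasheet.
    """
    if ambient_c <= knee_c:
        return rated_w                    # below the knee: full rated power
    if ambient_c > max_c:
        return 0.0                        # outside the rated range: do not rely on it
    # Linear interpolation between 100% at the knee and fraction_at_max at max_c.
    progress = (ambient_c - knee_c) / (max_c - knee_c)
    fraction = 1.0 - progress * (1.0 - fraction_at_max)
    return rated_w * fraction

print(derated_power_w(60.0, 65.0))  # 45.0 with the assumed curve
```

If the load's peak draw sits above the derated figure, you either need a bigger supply or a cooler enclosure, which is why the housing change went along with the derating.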

The point is this: AI hardware keeps getting smaller, power density keeps climbing, and the power adapter has become the easiest thing to leave as an afterthought. But a lot of those mysterious field issues (ESD resets, intermittent lockups, communication errors) trace back to power supplies that fell apart under high-frequency switching noise and heat buildup.

When selecting a power supply, don't just look at voltage and current. Ask your supplier three questions:

Which standard was used to measure conducted and radiated EMI? Do you have an actual test report?

Was output ripple measured at no load or full load? What's the peak-to-peak value?

What's the temperature rating of the electrolytic capacitors? Do you have batch traceability?

These three questions will filter out 80% of the unreliable units. For the rest, put them on the bench. Use an electronic load to simulate dynamic loading, bring them up to the rated temperature limit with a heat gun, and run them for 24 hours. Pass that test, and you'll sleep better in the field.
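If you log the bench run, the pass/fail decision can be automated. The sketch below assumes a log of periodic samples with output voltage and measured ripple; the `Sample` fields, the 5% voltage tolerance, and the 100mV ripple limit are all placeholder assumptions, so substitute the limits your own design can tolerate.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t_s: float          # seconds since the start of the run
    v_out: float        # output voltage under dynamic load, volts
    ripple_pp_v: float  # peak-to-peak ripple at that moment, volts

def burn_in_failures(samples, nominal_v, tol_pct=5.0,
                     max_ripple_pp_v=0.1, min_duration_s=24 * 3600):
    """Return a list of failure messages; an empty list means the run passed."""
    if not samples:
        return ["no samples logged"]
    failures = []
    duration = samples[-1].t_s - samples[0].t_s
    if duration < min_duration_s:
        failures.append(f"run too short: {duration:.0f} s")
    lo = nominal_v * (1 - tol_pct / 100)
    hi = nominal_v * (1 + tol_pct / 100)
    for s in samples:
        if not lo <= s.v_out <= hi:
            failures.append(f"t={s.t_s:.0f}s: output {s.v_out:.2f} V out of range")
        if s.ripple_pp_v > max_ripple_pp_v:
            failures.append(f"t={s.t_s:.0f}s: ripple {s.ripple_pp_v * 1000:.0f} mV p-p too high")
    return failures

# Hourly samples over 24 hours, all within the assumed limits.
log = [Sample(h * 3600.0, 24.0, 0.05) for h in range(25)]
print(burn_in_failures(log, nominal_v=24.0))  # []
```

Returning the full list of failures rather than a single boolean makes it easier to see whether a unit failed once at the temperature limit or drifted out of range for hours.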

AI is a smart business, but what keeps it working reliably is often this unglamorous, unexciting grind.
