DEV Community

Cover image for Thermal Management PCB Design for AI Accelerator Cards
AtlasPCBEngineering
AtlasPCBEngineering

Posted on • Originally published at atlaspcb.com

Thermal Management PCB Design for AI Accelerator Cards

The Thermal Challenge: 700W Through a Single Package

Modern AI accelerators (GPUs, TPUs, custom ASICs) push 300-700W through a single BGA package. The PCB underneath must simultaneously:

  1. Conduct heat away from the die to bottom-side heatsinks
  2. Deliver 500-800A at sub-1V to the core
  3. Route 5,000+ signals at 112G PAM4
  4. Survive 5-10 years of thermal cycling (125C junction, 40C ambient)

These requirements conflict. Here's how to balance them.

Thermal Via Array Design

The primary heat path through the PCB is thermal vias under the GPU/ASIC exposed pad.

Recommended Geometry

Parameter Value Notes
Via diameter 0.3mm (12 mil) Balance thermal vs routing
Via pitch 1.0mm (40 mil) Matches BGA ball pitch
Copper fill Filled + planarized Mandatory for BGA soldering
Pattern Grid under thermal pad Cover 80%+ of pad area
Aspect ratio Keep below 8:1 Ensures reliable plating

How Many Vias Do You Need?

For a 50x50mm package at 500W:

  • Target: 400-600 thermal vias
  • Each via conducts approximately 0.5-1W (depends on copper fill % and board thickness)
  • 400 vias in parallel: thermal resistance approximately 0.2 C/W from via contribution alone

But vias alone are insufficient. You also need:

  • 2oz+ copper internal planes for lateral spreading
  • Bottom-side heatsink with quality TIM
  • Adequate airflow over board surface

Power Delivery Network for 500A+ Loads

An AI accelerator switching between idle (100W) and compute (700W) in microseconds demands:

  • Less than 3% voltage droop at 0.85V core (max 25mV allowed)
  • Sub-nanosecond response from decoupling
  • Loop inductance below 50 pH for bulk cap connections

Copper Weight Strategy

Layer Function Minimum Cu Purpose
Top (BGA breakout) 1oz Signal routing
Power planes (VCore) 2oz Low-resistance distribution
Ground planes 2oz Return current + thermal spreading
Inner signal 1oz High-speed routing
Bottom (heatsink) 2oz Thermal pad + VRM connection

Power Via Farm

VRM output to GPU needs 100+ power vias in parallel:

  • Each 0.3mm via with 25um wall: approximately 0.5 mohm
  • 100 vias parallel: 5 uohm total (acceptable)
  • Ground return needs equal via count

Material Selection

GPUs running at 95C junction create 60-80C thermal excursions every few minutes. Over 5 years: 100,000+ thermal cycles.

Requirements

Property Minimum Reason
Tg 170C+ Prevents CTE jump below operating temp
Td 340C+ Solder reflow margin
CTE z-axis < 50 ppm/C Minimizes via barrel stress
CAF resistance High 0.85V with 3mil spacing

Material Tiers

Tier 1 (Data Center HPC):

  • Megtron 6: Dk 3.71, Df 0.004, Tg 200C
  • Isola I-Speed: Dk 3.6, Df 0.004, Tg 200C

Tier 2 (Edge AI, Inference):

  • Isola 370HR: Dk 4.04, Df 0.009, Tg 180C
  • TUC TU-872 SLK: Dk 3.9, Df 0.008, Tg 200C

Tier 3 (Dev Boards):

  • High-Tg FR-4 (Tg 170): adequate for prototypes, not production data center

16-Layer Stackup Example

L1  — Top signal (1oz) — BGA breakout
L2  — GND (2oz) — Impedance reference
L3  — Signal (1oz)
L4  — VCore power (2oz)
L5  — GND (2oz)
L6  — Signal (1oz)
L7  — GND (2oz)
L8  — Signal (1oz)
L9  — Signal (1oz)
L10 — GND (2oz)
L11 — Signal (1oz)
L12 — GND (2oz)
L13 — VCore power (2oz)
L14 — Signal (1oz)
L15 — GND (2oz) — Impedance reference
L16 — Bottom signal (1oz) — Decap, VRM
Enter fullscreen mode Exit fullscreen mode

Key design choices:

  • Power planes sandwiched between GND for low-inductance PDN
  • 2oz on all GND/PWR for thermal + DC resistance
  • Symmetric construction prevents warping
  • Total thickness approximately 2.4mm

Via Reliability Under Thermal Stress

At 16 layers and 2.4mm thickness, via aspect ratio hits 8:1. Each thermal cycle stresses the copper barrel due to CTE mismatch (copper 17 ppm/C vs FR-4 z-axis 50-70 ppm/C).

Mitigation strategies:

  • IPC Class 3 plating: 25um average, 20um minimum
  • IST qualification to 1000+ cycles
  • Conductive via fill for additional thermal path
  • Backdrilling unused stubs (also helps signal integrity at 56+ Gbps)

Practical Considerations

Via-in-Pad for Decoupling

MLCC capacitors under the BGA field need via-in-pad connections. These vias must be:

  • Filled and planarized (same process as thermal vias)
  • Connected to nearest power/ground plane
  • Within 2mm of the BGA power ball they serve

Thermal Conductivity Limits

Even premium laminates only conduct 0.3-0.4 W/mK. The board is a thermal insulator. Your heat removal strategy must rely on copper (planes + vias) as the primary conduction medium through the board.

For extreme cases (700W+):

  • Embedded copper coin under die
  • Metal-core PCB for VRM section
  • Consider thermal paste-filled through-holes

Designing AI hardware? AtlasPCB fabricates GPU/ASIC boards with thermal via arrays, 2-5oz copper, backdrilling, and via-in-pad as standard processes. Up to 30 layers on high-performance laminates.

Related reading:

Top comments (0)