DEV Community

Zoey Lee


GB300 Cold Plate Liquid Cooling: An Advanced Full Liquid Cooling Architecture

As artificial intelligence workloads continue to drive exponential increases in server power density, effective thermal management has become a critical differentiator for next-generation data center infrastructure. The GB300 platform introduces a comprehensive cold plate liquid cooling solution optimized for a 1U chassis, delivering robust support for high-power compute nodes with a total liquid cooling power dissipation of 6.9 kW. Operating at approximately 8 liters per minute (LPM) coolant flow rate and accommodating supply liquid temperatures up to 45°C, this design sets a new benchmark for efficiency, serviceability, and scalability in AI server deployments.
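To put the headline figures in context, the coolant temperature rise across the node follows from the energy balance ΔT = P / (ṁ · c_p). The sketch below is a rough estimate only: it assumes plain water near 40°C (the article does not name the coolant, and glycol mixtures have a lower heat capacity), and the function name and property values are illustrative.

```python
# Rough energy-balance estimate of coolant temperature rise across one node.
# ASSUMPTION: plain water near 40 °C; the actual coolant is not stated in the article.
WATER_DENSITY_KG_PER_M3 = 992.0   # approximate density of water at ~40 °C
WATER_CP_J_PER_KG_K = 4180.0      # approximate specific heat of water


def coolant_temperature_rise(heat_w, flow_lpm,
                             density=WATER_DENSITY_KG_PER_M3,
                             cp=WATER_CP_J_PER_KG_K):
    """Return the coolant temperature rise in kelvin for a given heat load and flow."""
    flow_m3_per_s = flow_lpm / 1000.0 / 60.0   # L/min -> m^3/s
    mass_flow_kg_per_s = flow_m3_per_s * density
    return heat_w / (mass_flow_kg_per_s * cp)


# Figures quoted in the article: 6.9 kW, about 8 LPM, supply up to 45 °C.
rise_k = coolant_temperature_rise(heat_w=6900.0, flow_lpm=8.0)
outlet_c = 45.0 + rise_k
print(f"Temperature rise: {rise_k:.1f} K, outlet at 45 °C supply: {outlet_c:.1f} °C")
```

Under these assumptions the rise comes out at roughly 12 to 13 K, so a 45°C supply would return in the high 50s °C. A different coolant or flow rate shifts the result proportionally.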

Building upon proven liquid cooling principles while addressing prior limitations, the GB300 architecture offers valuable insights for future AI system designs. This article explores the evolution from the GB200 platform and highlights the key engineering advancements in the GB300 full cold plate solution.

Lessons from the GB200 Liquid Cooling Design
The GB200 established a solid foundation for modular liquid cooling by integrating two compute units into a single module with parallel cold plate configurations. Notable features included quick-disconnect fittings on each module for streamlined maintenance, two parallel GPU cold plates feeding in series into the CPU cold plate, and a large CPU cold plate that also managed secondary heat sources such as LPDDR5 memory and CX7 networking components.

Cold plates within each module were interconnected using rigid copper tubing or flexible corrugated hoses, creating a hybrid series-parallel flow path. While functional, this approach revealed several operational challenges:

Complex Maintenance: The integrated three-plate assembly (two GPUs and one CPU) increased the complexity of installation and removal.

Elevated Flow Resistance: Parallel GPU plates combined in series with the CPU plate resulted in higher overall pressure drop within the node.

Cost and Size Implications: The oversized CPU cold plate required to cover multiple components drove higher material costs and consumed valuable space.

Limited Expandability: Without a dedicated manifold, adding new cold plates for emerging components required extensive redesign of the flow network.

These insights directly informed the iterative improvements implemented in the GB300 platform.
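The flow-resistance point can be illustrated with a simple lumped model in which each cold plate obeys ΔP = k · Q². This is a minimal sketch, not the vendor's hydraulic model: the quadratic law, the helper names, and the coefficient values are all assumptions chosen for illustration.

```python
# Lumped hydraulic model: each cold plate follows dP = k * Q**2 (an assumed law).
# The k values below are hypothetical and carry arbitrary units.

def series(*ks):
    """Equivalent coefficient for plates in series: the same flow crosses each one."""
    return sum(ks)


def parallel(*ks):
    """Equivalent coefficient for plates in parallel: branches share one pressure drop.

    Each branch carries Q_i = sqrt(dP / k_i), so k_eq = (sum of k_i**-0.5) ** -2.
    """
    return sum(k ** -0.5 for k in ks) ** -2


K_GPU = 1.0  # hypothetical GPU cold plate coefficient
K_CPU = 1.0  # hypothetical CPU cold plate coefficient

# GB200-style module: two GPU plates in parallel, then the CPU plate in series.
k_hybrid = series(parallel(K_GPU, K_GPU), K_CPU)

# Manifold-style alternative: all three plates fed in parallel.
k_all_parallel = parallel(K_GPU, K_GPU, K_CPU)

print(f"hybrid series-parallel k = {k_hybrid:.3f}")
print(f"all-parallel k           = {k_all_parallel:.3f}")
```

With equal coefficients the hybrid path has a much higher equivalent resistance at the same total flow, because the CPU plate must carry the full flow. The comparison is deliberately simplified: in the all-parallel case each plate receives only a share of the total flow, so a real design also has to check per-plate flow against its thermal requirement.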

Key Innovations in the GB300 Cold Plate Architecture

The GB300 represents a significant leap forward, leveraging the SXM7 packaging for the B300 GPU to enable independent installation and removal from the baseboard. This modularity enhances flexibility and reduces downtime during upgrades or servicing.

Central to the new design is a liquid manifold integrated within the node. Equipped with 10 pairs of high-reliability NVQD03 quick-disconnect couplings for the inlet and outlet paths, the manifold delivers coolant to multiple components in parallel. This architecture simplifies servicing while improving flow distribution.

Additional enhancements include:

Comprehensive Component Coverage: Liquid cooling is extended beyond core processors to critical high-power elements, including the Power Distribution Board (PDB), OSFP optical modules, and BF3 network cards.

Innovative SSD Cooling: Eight M.2 SSDs benefit from dedicated cold plates featuring a pluggable design. Thermal Interface Material (TIM) ensures low thermal resistance between the cold plate and drives, while integrated wear-resistant structures maintain reliability under repeated insertion cycles.

Optimized GPU and CPU Flow Paths: The four B300 GPU cold plates operate in full parallel configuration for balanced cooling and minimal pressure drop. The CPU cold plate adopts a parallel design supported by heat pipes that efficiently transfer heat from LPDDR5 memory and CX8 networking chips. The flexible nature of these heat pipes allows for a more compact CPU cold plate design without sacrificing performance.
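The role of the TIM in the SSD cold plate can be made concrete with the one-dimensional conduction formula R = t / (k · A). The numbers below are hypothetical placeholders (the article gives no pad thickness, conductivity, drive size, or drive power), so the result only shows the order of magnitude and how the terms interact.

```python
# One-dimensional conduction through a TIM layer: R = thickness / (k * area).
# ALL input values are hypothetical; none come from the article.

def tim_resistance_k_per_w(thickness_m, conductivity_w_per_m_k, area_m2):
    """Thermal resistance of a flat TIM layer in K/W."""
    return thickness_m / (conductivity_w_per_m_k * area_m2)


PAD_THICKNESS_M = 0.5e-3          # assumed 0.5 mm gap pad
PAD_CONDUCTIVITY_W_PER_M_K = 3.0  # assumed pad conductivity
CONTACT_AREA_M2 = 22e-3 * 80e-3   # assumes a 2280-size drive fully covered
SSD_POWER_W = 8.0                 # assumed drive power

r_tim = tim_resistance_k_per_w(PAD_THICKNESS_M,
                               PAD_CONDUCTIVITY_W_PER_M_K,
                               CONTACT_AREA_M2)
delta_t_k = SSD_POWER_W * r_tim   # temperature drop across the TIM layer

print(f"TIM resistance: {r_tim:.3f} K/W, drop at {SSD_POWER_W:.0f} W: {delta_t_k:.2f} K")
```

The formula shows why a thin, well-compressed TIM layer matters for a pluggable design: resistance scales linearly with thickness and inversely with contact area, so any gap left to ease insertion costs thermal performance directly.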

Detailed Cooling Circuit Configuration

The GB300 node features multiple independent and intelligently grouped cooling loops:

GPU Loops: Four independent GPU cold plates with dedicated inlet/outlet paths for maximum parallelism and uniform temperature distribution.

CPU and Associated Components: CPU cold plate with heat pipe integration for memory and networking chips.

OSFP Modules: Upper OSFP cold plates connected in series via flexible hoses, allowing individual module float and optimal contact.

SSD Loop: Fully independent pluggable circuit optimized for hot-swap operations.

PDB Cooling: Dedicated loop for the power distribution board.

BF3 and Lower OSFP Integration: Shared loop where two OSFP plates connect in series to the BF3 cold plate. QSFP optical modules on the BF3 are cooled via embedded bent copper tubes within a conductive plate for precise thermal management.

This modular approach ensures each component receives tailored coolant flow while maintaining overall system hydraulic balance.
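How a manifold shares a fixed total flow among parallel loops can be sketched with the same kind of lumped model, where each loop obeys ΔP = k · Q² and all loops see the same pressure drop. The loop names follow the list above, but the coefficients are invented for illustration and the loop count is not meant to match the actual quick-disconnect layout.

```python
import math

# Split a total flow among parallel loops that share one manifold pressure drop.
# Assumed law per loop: dP = k * Q**2. All k values are hypothetical.

def split_flow(total_lpm, k_by_loop):
    """Return the flow per loop (LPM) so that every loop sees the same dP."""
    # With a common dP, Q_i is proportional to 1 / sqrt(k_i).
    weights = {name: 1.0 / math.sqrt(k) for name, k in k_by_loop.items()}
    weight_sum = sum(weights.values())
    return {name: total_lpm * w / weight_sum for name, w in weights.items()}


loops = {
    "gpu_0": 1.0, "gpu_1": 1.0, "gpu_2": 1.0, "gpu_3": 1.0,
    "cpu": 1.0,
    "osfp_upper": 4.0,       # series-connected plates: higher resistance assumed
    "ssd": 9.0,
    "pdb": 9.0,
    "bf3_osfp_lower": 4.0,
}

flows = split_flow(total_lpm=8.0, k_by_loop=loops)
for name, lpm in flows.items():
    print(f"{name:>15}: {lpm:.2f} LPM")
```

In this model, low-resistance loops automatically take the larger share of the flow, which is why branch resistances in a parallel manifold have to be tuned (for example with hose sizing or orifices) so that each component gets the flow its heat load requires.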

Performance, Reliability, and Future-Ready Benefits

By implementing a manifold-centric parallel architecture with strategically placed quick connects, the GB300 solution achieves lower flow resistance, faster node servicing, and greater operational efficiency compared to its predecessor. All major high-power dissipation elements — CPUs, GPUs, NICs, PDB, optical modules, and storage — are now fully liquid-cooled, contributing to enhanced system reliability, reduced fan noise, and higher overall power density within the compact 1U form factor.

The design also improves energy efficiency by supporting warmer coolant inlet temperatures (up to 45°C), which reduces chiller energy consumption and aligns with modern data center sustainability goals. Furthermore, the scalable manifold design provides headroom for future component additions without major architectural overhauls.

Conclusion
The GB300 full cold plate liquid cooling architecture marks a mature evolution in AI server thermal management. By addressing the practical limitations observed in the GB200 generation and introducing innovative features such as comprehensive manifolds, pluggable SSD cooling, and extensive component coverage, NVIDIA has delivered a robust, serviceable, and forward-looking solution.

As AI infrastructure demands continue to grow, the principles embodied in the GB300 — modularity, parallel flow optimization, and holistic thermal design — will serve as a valuable reference for next-generation liquid-cooled systems. This advancement not only meets today’s high-performance computing requirements but also paves the way for even denser and more efficient AI deployments in the years ahead.

Quick Tip:
Lian Li Liquid Cooling is a global leader in liquid cooling solutions. Backed by its own R&D team, the company has developed specialized liquid cooling solutions for the NVIDIA GB300 series. Its product lineup—ranging from cold plates and liquid-cooled server racks to CDUs and liquid-cooled data center containers—is deeply optimized for the GB300, achieving a PUE of below 1.15 and meeting authoritative international standards such as RoHS, CE, and UL.
