Anantha
Why GPU Density Just Broke Two Decades of Data Centre Design Assumptions

For most of the last twenty years, enterprise data centre design optimised for a fairly stable target. Racks drew somewhere between 5 and 15 kilowatts. Air cooling — cold aisle, hot aisle, raised floor, perforated tiles — was sufficient. Power densities crept up gradually, and the operational playbook stayed roughly the same from one generation of servers to the next. That entire baseline broke in about eighteen months.
The cause is well known: AI training and inference workloads built around dense GPU clusters. An NVIDIA H100 server pulls roughly 10 kilowatts on its own. A fully populated rack of H100s or H200s can exceed 60 kW. The newer Blackwell-based systems push individual racks past 130 kW. Air cooling cannot move that much heat, and even if it could, the noise levels and air velocities required would make the floor unworkable. Liquid cooling, which used to be a niche optimisation, has become a structural requirement.
If you are responsible for any infrastructure decisions in 2026 — whether you are building, buying, or just choosing where to deploy a model — the implications are worth being clear about.
The math of why air cooling ran out
Air carries about 1.2 joules per litre per degree Celsius at typical data-hall conditions. That number is fixed by physics. To remove a kilowatt of heat, you need to move a specific volume of air at a specific temperature delta: roughly 80 litres per second at a 10 °C rise. As rack power densities climb past 30-40 kW, the volume of air required becomes physically impractical — the airflow rates needed exceed what perforated tiles, server fans, and CRAC units can sustainably deliver.
Water, by contrast, carries about 4,180 joules per litre per degree Celsius — roughly 3,500 times the thermal capacity of air per unit volume. A coolant loop moving a small fraction of the volumetric flow of air can carry the same heat load. This is not a marketing claim; it is why every dense AI deployment converges on some form of liquid cooling.
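To make the gap concrete, here is a rough back-of-the-envelope sketch in Python. The volumetric heat capacities are the figures quoted above; the 10 °C temperature rise is an assumption for illustration, not a design value from any specific facility.

```python
# Back-of-the-envelope comparison: air vs. water flow needed to remove rack heat.
# Assumptions (illustrative only): a 10 °C temperature rise across the rack,
# air at ~1.2 J/(L·°C), water at ~4,180 J/(L·°C).

AIR_J_PER_L_C = 1.2       # volumetric heat capacity of air
WATER_J_PER_L_C = 4180.0  # volumetric heat capacity of water
DELTA_T_C = 10.0          # assumed temperature rise

def flow_litres_per_second(heat_watts: float, j_per_litre_c: float) -> float:
    """Volumetric flow needed to carry heat_watts away at DELTA_T_C."""
    return heat_watts / (j_per_litre_c * DELTA_T_C)

for rack_kw in (15, 40, 130):
    watts = rack_kw * 1000
    air = flow_litres_per_second(watts, AIR_J_PER_L_C)
    water = flow_litres_per_second(watts, WATER_J_PER_L_C)
    print(f"{rack_kw:>4} kW rack: ~{air:,.0f} L/s of air vs ~{water:.1f} L/s of water")
```

At 130 kW that works out to roughly 11,000 litres of air per second against about 3 litres of water per second, which is the whole argument in two numbers.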
The three liquid cooling approaches that matter
Not all liquid cooling is the same, and the differences matter for both capital cost and operational complexity:
• Rear-door heat exchangers. A liquid-cooled coil sits at the back of the rack, cooling the hot air as it exits. Server hardware itself is unchanged — fans push air through the chassis as before, but the air is then chilled before it returns to the room. This is the lowest-friction path to higher densities (typically supports 30-50 kW per rack) and is often the bridge solution for facilities adding GPU capacity to existing halls.
• Direct-to-chip liquid cooling. Coolant is plumbed directly to cold plates mounted on CPUs and GPUs. The hot components transfer heat into the liquid loop without ever going through air. This supports densities of 80-130+ kW per rack and is what the major hyperscalers and AI cloud providers are deploying for their newest generations of accelerator hardware.
• Immersion cooling. Entire servers are submerged in a dielectric fluid that absorbs heat directly from every component. The thermal performance is extraordinary, but it requires purpose-designed hardware, completely different facility plumbing, and operational practices that most enterprise teams have no experience with. It is currently a niche choice — powerful in the right context, expensive in most others.

For most enterprise AI workloads in 2026, direct-to-chip liquid cooling is the architectural default. Rear-door exchangers are used as a transitional path. Immersion cooling is being evaluated but rarely deployed at scale outside hyperscale operators.
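A rough way to express that selection logic, using the density ranges above as thresholds. This is a hypothetical sketch; a real decision also weighs retrofit cost, hardware support, and the operational maturity of the team.

```python
# Hypothetical mapping from target rack density to the cooling approaches above.
# Thresholds mirror the rough ranges quoted in this article, nothing more.

def suggest_cooling(rack_kw: float) -> str:
    if rack_kw <= 15:
        return "conventional air cooling (hot/cold aisle)"
    if rack_kw <= 50:
        return "rear-door heat exchanger (bridge option for existing halls)"
    if rack_kw <= 130:
        return "direct-to-chip liquid cooling (the current enterprise default)"
    return "direct-to-chip or immersion, in a purpose-built facility"

for density in (10, 45, 120, 150):
    print(f"{density:>3} kW/rack -> {suggest_cooling(density)}")
```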
The certification matters more than the marketing
Any data centre operator can claim to support liquid cooling. Far fewer have actually built and certified facilities for the densities that current AI hardware demands. The reference point worth knowing is NVIDIA's DGX-Ready Data Center programme, which audits facilities against specific power, cooling, and operational criteria for running NVIDIA reference architectures.
In India, the operator that holds the certification for liquid-cooled DGX-Ready operations supporting 130+ kW per rack is Sify. This matters because the difference between a facility that claims to support high-density GPU racks and one that has been independently certified to do so is usually the difference between a deployment that works and one that hits thermal throttling within weeks. AI infrastructure is unforgiving of facility shortcuts.
For enterprises evaluating where to host AI workloads in India, the underlying data centre infrastructure choice now has consequences that extend well beyond traditional concerns like uptime and connectivity. The cooling architecture, power delivery, and ability to support sustained 100+ kW racks are technical requirements that simply did not exist three years ago, and they are non-negotiable for any production AI work.
Power delivery is the second-order problem
Cooling gets all the attention in the conversation about AI infrastructure. Power delivery is the equally hard problem nobody talks about. A rack pulling 130 kW needs a power feed and switchgear architecture that most enterprise data centres were never built for.
Consider what 130 kW means in practice. A traditional 5 kW rack might be fed by a single 30A 208V circuit. A 130 kW rack needs roughly 26 times the current capacity. That is not a question of bigger cables — it is a question of substation capacity, busway design, PDU architecture, and the redundancy strategy that backs the whole thing. Doubling the power density of a hall typically requires re-architecting the electrical distribution from the floor up.
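The arithmetic behind that comparison, sketched in Python. The 0.8 derating factor is a common continuous-load rule of thumb and an assumption here; real designs also account for power factor, phase balance, and A/B feed redundancy.

```python
# Rough circuit sizing for the 5 kW vs 130 kW comparison above.
# DERATE = 0.8 is the usual continuous-load allowance on a breaker (assumption).

DERATE = 0.8

def breaker_amps(load_watts: float, volts: float, three_phase: bool = False) -> float:
    """Minimum breaker rating for a continuous load at the given supply voltage."""
    divisor = volts * (3 ** 0.5) if three_phase else volts
    return load_watts / (divisor * DERATE)

print(f"  5 kW rack @ 208 V single-phase: ~{breaker_amps(5_000, 208):.0f} A")
print(f"130 kW rack @ 208 V single-phase: ~{breaker_amps(130_000, 208):.0f} A (not practical)")
print(f"130 kW rack @ 415 V three-phase:  ~{breaker_amps(130_000, 415, three_phase=True):.0f} A")
```

The point is not the exact figures but the order of magnitude: current per rack grows roughly 26-fold, and that is what forces the busway, PDU, and switchgear redesign rather than just a heavier cable gauge.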
There is also the grid-side question. A facility that adds significant GPU capacity is making a measurable demand on the local utility. In several Indian metros, the lead time for additional grid capacity is now a constraint on how quickly AI deployments can scale. The facilities that planned ahead — securing power capacity, building substations, signing long-term commercial arrangements with utilities — are the ones that can absorb new AI tenants today. The ones that did not are quoting 18-24 month lead times.
Sustainability stops being optional
AI infrastructure is energy-intensive enough that sustainability moves from a corporate-responsibility line item to an operational and regulatory question. Three forces are colliding:
• Customer mandates. Enterprise AI buyers increasingly require carbon-disclosure data from their infrastructure providers as part of procurement.
• Regulatory pressure. India's energy intensity targets and BEE building codes are tightening, with data centres explicitly named as a focus category.
• Economic gravity. Renewable energy has become cheaper than grid power for high-load tenants in several Indian states. Operators with PPA arrangements are quietly building a structural cost advantage.
The combined effect is that green data centre design has become a competitive parameter, not just a marketing one. Facilities with measured PUE under 1.4, on-site renewable integration, water-efficient cooling, and certified sustainability credentials are increasingly the only acceptable options for serious enterprise tenants. The ones that ignored sustainability for the last five years are now retrofitting expensively.
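For reference, PUE (power usage effectiveness) is simply total facility energy divided by IT energy. A minimal sketch with illustrative numbers, not measurements from any particular facility:

```python
# PUE = total facility power / IT load power. Figures below are illustrative only.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    return total_facility_kw / it_load_kw

# A 1,000 kW IT load with 380 kW of cooling, power-conversion and other overhead.
print(f"PUE = {pue(1380.0, 1000.0):.2f}")  # -> 1.38, just under the 1.4 mark cited above
```

Direct-to-chip deployments tend to improve PUE because less energy goes into moving air, which is part of why the cooling and sustainability questions converge.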
Operators that built green data centres into their roadmap early — see www.sifytechnologies.com for one example of how this is being approached at scale in India — have a measurable advantage in the AI infrastructure conversation that did not exist as recently as 2023.
What infrastructure teams should be planning for
If your team is starting to plan for AI workloads — whether training, fine-tuning, inference, or any combination — the practical questions worth working through this quarter are listed below (a rough checklist sketch in code follows the list):
• What rack densities will the planned hardware actually require? Get the numbers from the vendor specifications, not from approximations.
• Does your current facility (or your provider's facility) genuinely support those densities? "Supports liquid cooling" is too vague — ask for the specific certified rack density and the cooling architecture in use.
• What is the power delivery story? A facility that supports 130 kW per rack thermally but tops out at 40 kW electrically does not actually support 130 kW per rack.
• What is the failure mode? Liquid cooling adds a failure dimension that air cooling did not have. The operational maturity of the facility — leak detection, redundant loops, response procedures — is now part of the evaluation.
• What does the sustainability profile look like? PUE, renewable energy share, water usage. These are no longer optional questions.

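One way to make that checklist concrete is to capture it as a structure you can fill in per facility. The field names and thresholds below are hypothetical and illustrative, not a standard schema or any operator's actual audit criteria.

```python
from dataclasses import dataclass

@dataclass
class FacilityProfile:
    certified_rack_kw: float       # independently certified thermal density per rack
    electrical_rack_kw: float      # what the power train can actually deliver per rack
    cooling_architecture: str      # e.g. "rear-door", "direct-to-chip", "immersion"
    leak_detection: bool           # liquid-cooling failure-mode readiness
    redundant_coolant_loops: bool
    measured_pue: float
    renewable_share_pct: float

def gaps(facility: FacilityProfile, required_rack_kw: float) -> list[str]:
    """List the checklist questions this facility fails against a target density."""
    issues = []
    if facility.certified_rack_kw < required_rack_kw:
        issues.append("thermal: certified rack density below hardware requirement")
    if facility.electrical_rack_kw < required_rack_kw:
        issues.append("power: electrical delivery below hardware requirement")
    if not (facility.leak_detection and facility.redundant_coolant_loops):
        issues.append("operations: liquid-cooling failure modes not fully covered")
    if facility.measured_pue > 1.4:
        issues.append("sustainability: measured PUE above the 1.4 mark cited earlier")
    return issues

# Example: a facility that handles 130 kW thermally but only 40 kW electrically.
example = FacilityProfile(
    certified_rack_kw=130, electrical_rack_kw=40, cooling_architecture="direct-to-chip",
    leak_detection=True, redundant_coolant_loops=True,
    measured_pue=1.35, renewable_share_pct=40,
)
print(gaps(example, required_rack_kw=130))  # -> ['power: electrical delivery below ...']
```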
The shift from air cooling to liquid cooling is not a tweak to existing data centre designs. It is a generational architectural change, and the facilities that handle it well will host the next decade of AI infrastructure. The ones that try to retrofit half-measures will spend the rest of this decade explaining why their tenants moved out.

About Sify: This article draws on observed patterns in enterprise AI infrastructure deployment in India. Sify (NASDAQ: SIFY) is India's largest integrated ICT solutions and services provider and was India's first commercial colocation provider. Sify operates six concurrently-maintainable data centres across Mumbai, Chennai, Noida, Bangalore, Hyderabad and Kolkata — including India's first NVIDIA-Certified DGX-Ready facility for liquid cooling at 130+ kW per rack. The company hosts 3 of the 4 global hyperscalers in its facilities and serves 10,000+ enterprises across BFSI, manufacturing, retail, healthcare, pharma, and digital-native sectors. Recognised in Gartner's Magic Quadrant for Managed Network Services (Global) and in the IDC MarketScape for Managed Cloud Services (APeJ). NASDAQ-listed since 1999.
