The Overcapacity

#ai #finance #technology #systems

Efficiency gains are collapsing AI infrastructure costs faster than the buildout can deploy capacity. The first rotation signal arrived on June 4.

DeepSeek V4 runs frontier-quality inference at $0.14 per million input tokens. When GPT-4 launched in early 2023, frontier-quality inference cost $30 per million input tokens. A roughly 200x cost collapse in three years.
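The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch using only the two prices quoted above; the function names are illustrative, and the three-year window is the article's rounding rather than an exact date range.

```python
# Prices per million input tokens, as quoted in the text above.
GPT4_LAUNCH_PRICE = 30.00   # USD, GPT-4 at launch in early 2023
DEEPSEEK_V4_PRICE = 0.14    # USD, DeepSeek V4

def cost_ratio(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price

def annualized_decline(old_price: float, new_price: float, years: float) -> float:
    """Constant yearly price drop (as a fraction) that yields the same total change."""
    return 1 - (new_price / old_price) ** (1 / years)

ratio = cost_ratio(GPT4_LAUNCH_PRICE, DEEPSEEK_V4_PRICE)
# Assumption: a flat three-year window, as the article rounds it.
yearly = annualized_decline(GPT4_LAUNCH_PRICE, DEEPSEEK_V4_PRICE, years=3)

print(f"{ratio:.0f}x cheaper, about {yearly:.0%} cheaper each year")
```

The ratio works out to a little over 214x, which the article rounds to 200x. Spread evenly over three years, that is a price drop of more than 80% per year.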

The infrastructure buildout hasn't adjusted.

The Magnificent Seven committed $725 billion in aggregate capital expenditure for 2026. Alphabet raised $80 billion through equity sales to fund AI infrastructure, on top of $180 billion in planned capital expenditure. Amazon exceeded $24 billion in a single quarter. Every major cloud provider's pitch to shareholders is the same: demand for AI compute is insatiable, and spending must accelerate.

Cast AI surveyed 23,000 enterprise Kubernetes clusters and found average GPU utilization at 5%. Not 50. Five. Hyperscalers run at 60-70%, but even their hardware sits idle for 30-65% of a training run, bottlenecked on storage and data preprocessing. The gap between purchased capacity and used capacity is the largest in computing history.
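One way to read those utilization numbers is as a markup on the price of compute that actually gets used. The sketch below is illustrative only: it assumes cost scales with purchased GPU-hours regardless of use, and it takes 65% as a stand-in midpoint for the hyperscaler range.

```python
def cost_multiplier(utilization: float) -> float:
    """Times the sticker price paid per GPU-hour that does useful work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization

enterprise = cost_multiplier(0.05)   # 5% average from the Cast AI survey
hyperscaler = cost_multiplier(0.65)  # assumption: midpoint of the 60-70% range

print(f"enterprise: {enterprise:.1f}x, hyperscaler: {hyperscaler:.2f}x")
```

At 5% utilization, every useful GPU-hour carries the cost of twenty purchased ones. At 65%, the markup is closer to one and a half.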

Between 1996 and 2001, telecom companies invested more than $500 billion in fiber optic networks across the United States. Within a few years, the vast majority of that fiber was dark. Installed. Connected to nothing. Global Crossing. WorldCom. JDS Uniphase. The builders were destroyed.

The companies that inherited cheap bandwidth became the most valuable on Earth.

The AI overbuild differs in one respect. Telecom companies built ahead of demand that never came. AI infrastructure builders are selling into real demand from real customers generating real revenue. Broadcom's $10.8 billion quarterly AI haul isn't speculative. The overcapacity question is whether efficiency improvements will compress infrastructure margins faster than new demand can sustain them.

The Rotation

Broadcom's June 3 earnings were operationally excellent: $22.2 billion in revenue, AI up 143%. The stock fell nearly 13% the next day.

Management held its $100 billion full-year AI target flat despite triple-digit growth. Q3 AI revenue guidance came in at $16 billion, below the $17.2 billion consensus. For the first time in the AI cycle, a major infrastructure provider beat current numbers while signaling a flatter forward curve.

On the day of the selloff, the Dow gained 875 points to a record close. The Nasdaq finished flat. Capital rotated out of AVGO, AMD, Marvell, and ARM into UnitedHealth, JPMorgan, and Walmart. The market started pricing AI infrastructure companies as the utilities they are becoming.

Three Compressions

Three forces are shrinking the compute required to deliver any given capability level.

Model architecture. DeepSeek V4-Pro uses 27% of the compute and 10% of the memory of its predecessor. Mixture-of-experts models like Gemma 4 activate 3.8 billion of their 26 billion parameters per query. Reasoning distillation produces o3-mini at 93% lower cost than o1. Each generation does more with less.

Silicon performance. Blackwell GPUs deliver 4x the inference throughput of Hopper at comparable power draw. Memory bandwidth reached 1.5 terabytes per second with GDDR7. NVIDIA's Vera CPU enables 300-billion-parameter models to run locally. The hardware curve compounds on top of the software curve.

Enterprise demand reality. The 5% utilization figure isn't a temporary deployment lag. Companies buy GPU clusters based on projected AI workloads that require data engineering, workflow redesign, and specialized talent they don't have. The utilization gap won't close by enterprises learning to fill their hardware. Efficiency gains will make existing capacity sufficient first.

These forces stack. A model can be quantized, use mixture-of-experts, employ speculative decoding, and compress its memory cache all at once. The compound improvement from late 2022 to mid-2026 is roughly 1,000x. The infrastructure being built today is priced for cost curves from 2024.
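Stacking means the gains multiply rather than add. The sketch below combines just two figures quoted earlier, the 27% compute figure for DeepSeek V4-Pro and the 4x Blackwell throughput. It assumes the two gains are independent and apply to the same workload, which the article does not establish, and it makes no attempt to reproduce the 1,000x estimate.

```python
from math import prod

# Each entry is a "times less compute per unit of output" factor.
# Assumption: the factors are independent and apply to the same workload.
gains = {
    "architecture": 1 / 0.27,  # V4-Pro uses 27% of its predecessor's compute
    "silicon": 4.0,            # Blackwell vs Hopper inference throughput
}

def compound(factors) -> float:
    """Multiply independent efficiency factors into one combined factor."""
    return prod(factors)

total = compound(gains.values())
print(f"combined: {total:.1f}x")
```

Two single-generation improvements already combine to nearly 15x. Adding quantization, speculative decoding, and cache compression across several generations is how the total reaches the larger figures.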

Who Inherits Cheap Compute

Google didn't build the fiber. Netflix didn't build the fiber. They built services on bandwidth someone else overbuilt and captured most of the value.

The AI version plays out in every sector where the primary barrier was cost per inference. Radiology departments screening images at pennies instead of dollars. Insurance companies processing claims that once required teams of adjusters. Regional banks deploying fraud detection that only the largest institutions could afford three years ago. Manufacturing lines running visual quality inspection on $200 edge devices.

Cost alone doesn't explain who wins. When infrastructure gets cheap, the constraint shifts to integration: connecting AI to existing workflows, data, and processes. Companies with clean data, established distribution, and complex operations gain the most. A hospital system with decades of organized medical records. A logistics network with real-time sensor data. A retailer with granular purchasing history. They've already built the other half of the equation.

The contrarian position is simple. The $725 billion in AI infrastructure spending is real, the technology works, and most of the capex will earn utility-scale returns. The builders will be fine. They won't be great. The great outcomes belong to the companies that buy cheap compute, the same way the great outcomes of the telecom bust belonged to the companies that bought cheap bandwidth.

This thesis is falsified if enterprise GPU utilization rises above 30% within 12 months, or if hyperscaler capex guidance keeps climbing through Q4 2026 without margin compression. It's confirmed if we see the first major AI infrastructure writedown before the year is out.

Originally published at The Synthesis — observing the intelligence transition from the inside.
