Why Have We Been Stuck With GPUs This Long?

#ai #llm #nvidia #startup

I'm probably not the only one who checks every few months whether a GPU alternative has finally shipped, mostly so I can cancel a few subscriptions.

Nobody doubts it's physically possible or that people have tried. The real question is why it hasn't actually happened, and the answer is economic and structural, not technical.

GPUs are not uniquely ideal. They're uniquely general

LLM workloads are dense matrix multiplication with high parallelism, and token-by-token decoding in particular is usually limited by memory bandwidth rather than raw compute. GPUs handle this well but weren't built for it specifically. An ASIC purpose-built for transformer inference should beat a GPU on perf-per-watt and perf-per-dollar, and in narrow slices it already does:

Groq's LPU beats GPUs on single-stream inference throughput for models that fit its architecture
Cerebras' WSE cuts interconnect overhead by putting the whole model on one wafer
Google TPUs have run production workloads for years and are offered to external customers through Google Cloud

So specialized hardware can win, sometimes even in production. What matters isn't whether something can beat a GPU; it's why none of these have dented Nvidia's share.
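
To put a rough number on "memory-bandwidth-bound", here is a minimal roofline-style sketch for a single decode step. It assumes each weight is read once per step and costs 2 FLOPs per sequence in the batch, and it ignores KV-cache and activation traffic. The function name and the accelerator figures are illustrative assumptions, not the specs of any real chip.

```python
def decode_roofline(params, bytes_per_param, peak_flops, mem_bandwidth, batch=1):
    """Roofline-style bound for one decode step of a dense model.

    Simplifying assumptions (not from the article): every weight is read once
    per step, each weight costs 2 FLOPs per sequence in the batch, and
    KV-cache and activation traffic are ignored.
    """
    flops_per_step = 2 * params * batch
    bytes_per_step = params * bytes_per_param
    intensity = flops_per_step / bytes_per_step   # FLOPs per byte the workload needs
    balance = peak_flops / mem_bandwidth          # FLOPs per byte the hardware can feed
    # The slower of the two limits (compute or memory traffic) sets the step time.
    step_time = max(flops_per_step / peak_flops, bytes_per_step / mem_bandwidth)
    return {
        "intensity": intensity,
        "balance": balance,
        "memory_bound": intensity < balance,
        "tokens_per_s": batch / step_time,
    }


# Hypothetical accelerator: 1 PFLOP/s peak, 3 TB/s memory bandwidth,
# serving a 70B-parameter model stored at 2 bytes per weight.
example = decode_roofline(params=70e9, bytes_per_param=2,
                          peak_flops=1e15, mem_bandwidth=3e12)
print(example)
```

With these made-up numbers the workload needs 1 FLOP per byte while the hardware could sustain roughly 333, so a single stream is capped by bandwidth at about 21 tokens per second. Raising `batch` increases intensity, which is one reason single-stream latency and aggregate throughput favor different designs.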

1. The capital barrier

Custom silicon needs hundreds of millions in NRE cost, access to TSMC's leading-edge nodes with multi-year allocation queues, and several iterations before a design is commercially viable. That caps the field to hyperscaler balance sheets or venture funding measured in billions.

The barrier isn't just the chip either. CUDA, the surrounding tooling, and production pipelines took a decade of capital and engineering to mature, and matching that means rebuilding all of it, not swapping a part. That's a second capital sink on top of the silicon itself.

There's also a timing risk specific to fixed-function silicon: if the underlying model architecture shifts significantly, an ASIC taped out for today's transformer variant can become dead weight, while a GPU just needs a software update to run whatever comes next reasonably well. That risk hasn't actually played out, at least not since the current hype cycle started. The field has stayed on transformers, and changes like MoE routing or new attention variants have been incremental enough for both GPUs and existing ASICs to keep up. But it's a standing risk premium priced into every ASIC investment decision regardless, which raises the bar for committing capital even further.
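
The capital argument in this section can be sketched as a break-even calculation. The model below is a deliberately crude assumption: the ASIC and the GPU are treated as equivalent per unit, and with probability `1 - p_stable` the architecture shifts and every ASIC has to be replaced by a GPU. Under that model the ASIC path costs `nre + V * asic_unit_cost + (1 - p_stable) * V * gpu_unit_cost` for `V` units, against `V * gpu_unit_cost` for the GPU path. All names and dollar figures are hypothetical.

```python
import math


def break_even_units(nre, asic_unit_cost, gpu_unit_cost, p_stable=1.0):
    """Smallest volume at which the ASIC path is expected to be cheaper.

    Solves nre + V*asic + (1 - p_stable)*V*gpu <= V*gpu for V.
    Returns None when the ASIC never pays off under these assumptions.
    """
    if not 0.0 <= p_stable <= 1.0:
        raise ValueError("p_stable must be a probability between 0 and 1")
    saving_per_unit = p_stable * gpu_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    return math.ceil(nre / saving_per_unit)


# Hypothetical figures: $500M NRE, $8k per ASIC, $25k per GPU.
print(break_even_units(500e6, 8_000, 25_000))                 # no architecture risk
print(break_even_units(500e6, 8_000, 25_000, p_stable=0.8))   # 20% chance of a shift
```

Even a modest chance of an architecture shift pushes the break-even volume up noticeably, which is the risk premium described above expressed as units that must be sold.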

2. The incentive and survivorship problem

Nvidia's own roadmap (Hopper, Blackwell, Rubin) keeps raising the bar for "worth switching." Hyperscaler chips (TPU, Trainium, Inferentia) mostly optimize internal cost structure rather than compete for the open market. And independent hardware startups face brutal survivorship odds: Graphcore ended its independent run with a sale to SoftBank in 2024; Cerebras and Groq survive, but "surviving" and "threatening Nvidia" are different bars. Winners tend to get absorbed into a niche rather than displace the stack.

3. Pricing lock-in

Less obvious than the ecosystem barrier is what a cheaper substrate would do to pricing. GPU cloud and inference API pricing are calibrated to current GPU cost structures. A substantially cheaper substrate doesn't just improve margins, it invites price wars that make the whole inference business less defensible, and it strands capital already committed to GPU fleets under long depreciation schedules. So the incentive across the stack, chipmaker to cloud to API provider, is to absorb efficiency gains rather than pass through the price collapse a truly disruptive substrate would cause.
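
The "stranded capital under long depreciation schedules" point can be made concrete with straight-line depreciation. This is a sketch under stated assumptions: the schedule is straight-line, and `value_retained` is a hypothetical fraction of book value the fleet can still justify once a cheaper substrate resets prices. None of the figures come from real financial statements.

```python
def book_value(capex, schedule_years, years_elapsed):
    """Remaining book value under straight-line depreciation."""
    if schedule_years <= 0:
        raise ValueError("schedule_years must be positive")
    remaining_fraction = max(0.0, 1.0 - years_elapsed / schedule_years)
    return capex * remaining_fraction


def write_down(capex, schedule_years, years_elapsed, value_retained):
    """Book value lost if the fleet keeps only `value_retained` of its worth."""
    if not 0.0 <= value_retained <= 1.0:
        raise ValueError("value_retained must be between 0 and 1")
    return book_value(capex, schedule_years, years_elapsed) * (1.0 - value_retained)


# Hypothetical fleet: $12B of accelerators on a 6-year schedule, 2 years in,
# retaining half its value after a cheaper substrate arrives.
print(book_value(12e9, 6, 2))
print(write_down(12e9, 6, 2, value_retained=0.5))
```

The longer the schedule and the younger the fleet, the larger the balance left to write down, which is why a price collapse hurts owners of recent hardware most.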

4. Stranded capital: the least motivated disruptors have the most capital

The numbers make this concrete: the four largest hyperscalers are on track to spend roughly $725B on AI infrastructure in 2026, up from about $410B in 2025 (an increase of roughly 77%), almost all of it GPUs, custom silicon, and power built around today's substrate.

That's an asymmetry. The entities with enough capital to fund a genuinely disruptive alternative are the same ones most exposed if it succeeds too fast. This is the innovator's dilemma in balance-sheet form: the companies best positioned to fund the successor technology are structurally the least motivated to ship it quickly, since doing so writes down their own assets. Custom silicon programs like TPU and Trainium read as hedges against Nvidia's pricing power, not attempts to strand their own fleets overnight. Hyperscalers are managing this transition, not racing to trigger it.

5. The consumer-device threat: disintermediation, not just competition

There's a more radical version of this than a better data-center chip: consumer silicon (phone NPUs, Apple's Neural Engine, the Hexagon NPU in Qualcomm's Snapdragon chips) getting good enough at running quantized, distilled models that people stop needing an API call at all. This is already happening at the edges: Apple Intelligence and on-device Llama variants handle a real slice of tasks locally. It won't touch frontier-scale training or the largest models anytime soon, since those still need data-center memory bandwidth and interconnect that no phone is close to matching.

But it doesn't need to replace the frontier to be dangerous. If "good enough" runs on hardware someone already owns, that reduces the number of inference transactions routed through centralized APIs, and erodes the subscription and per-token economy that most AI startups, plus a meaningful share of Nvidia's inference revenue, are priced on. That's arguably a bigger threat to the current business model than a cheaper data-center chip, because it doesn't compete on price per FLOP. It competes for the transaction itself.
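
Whether "good enough" fits on hardware someone already owns is mostly a memory question. The sketch below estimates the weight footprint of a quantized model and checks it against a device's RAM. It counts weights only (no KV cache, activations, or runtime overhead), and `usable_fraction` is a hypothetical allowance for how much RAM the OS will let one app hold. The device and model sizes are illustrative.

```python
def weight_bytes(params, bits_per_weight):
    """Bytes needed to store the weights alone at a given precision."""
    return params * bits_per_weight / 8


def fits_on_device(params, bits_per_weight, ram_bytes, usable_fraction=0.5):
    """True if the weights fit in the share of RAM assumed to be usable."""
    return weight_bytes(params, bits_per_weight) <= ram_bytes * usable_fraction


GB = 1e9
phone_ram = 8 * GB  # hypothetical phone with 8 GB of RAM

# A 3B-parameter model at 4 bits per weight versus an 8B model at 16 bits.
print(weight_bytes(3e9, 4) / GB, fits_on_device(3e9, 4, phone_ram))
print(weight_bytes(8e9, 16) / GB, fits_on_device(8e9, 16, phone_ram))
```

Quantization is what moves small models across this line: the same parameter count needs a quarter of the memory at 4 bits that it needs at 16.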

Is this the vacuum-tube moment?

The vacuum-tube-to-transistor analogy is tempting: massive capital-intensive infrastructure made obsolete once a cheaper substrate cleared a viability threshold. But the detail cuts both ways. Bell Labs, and later IBM with its transistor-based System/360, captured that transition because they invested in it early. The tube makers who sat it out disappeared. Capital lock-in didn't block the shift; it decided who survived it.

That's the more useful lens here. Nvidia and the hyperscalers pouring capital into custom silicon alongside their GPU fleets look less like denial and more like hedging both sides, exactly what the transistor-era survivors did.

The result: a stable equilibrium, not a technical ceiling

We're stuck with GPUs not because they're optimal, but because they're the only option that delivers all of these at once: adequate performance, a mature ecosystem, supply chain scale, economic viability, and no need to strand your own capital.

Where a workload is narrow and stable enough to justify walking away from CUDA, high-volume single-model inference being the clearest case, specialized silicon is already carving out real share. That's the boundary worth watching: not whether something replaces the GPU, but how much workload fragments off before the economics stop favoring generality.

If a real break does come, it's unlikely to come from the incumbents. Their capital is committed, and their incentive is to manage the transition, not trigger it. It'll more likely come from someone with nothing to strand. But the moment that threat looks credible and big enough, expect Nvidia and the hyperscalers to move fast and buy, fund, or out-build their way into it. They're not blind to the risk, they're just not going to be the ones who take the first swing at their own balance sheet.