Epoch AI published a component-cost breakdown of a frontier AI accelerator this week, and the headline number reframes much of the 2026 GPU conversation: memory now accounts for roughly two-thirds of the bill of materials. The logic die, the thing most people picture when they hear "AI chip", is no longer the dominant cost. The stacks of high-bandwidth memory bonded to it are.
That is a quiet inversion. A decade ago, on a server GPU built for general HPC, the logic die was the headline cost and memory was an accessory. Today, on an accelerator built for training and serving large models, the proportions have flipped. The same chip family that used to be priced mostly by its silicon foundry is increasingly priced by what its memory vendor charges per HBM stack.
Epoch AI's component-cost breakdown, the primary source here, puts a concrete figure on that share. It is exactly the kind of slow-moving structural number that gets ignored until a quarterly earnings call forces the room to look at it.
What this changes for anyone reading the AI-infrastructure trade press is the unit of analysis. The bottleneck story for the last two years has been told in GPU units: how many H100s a hyperscaler bought, how many B100s shipped, how many Blackwell racks Nvidia could deliver. That framing buried the real constraint. What has limited shipments is not the number of accelerators a fab could finish each quarter but the number of HBM stacks the memory vendors could deliver to bond onto those accelerators. SK Hynix, Samsung, and Micron are the three names that matter for that supply, and their allocation calendars, not Nvidia's yields, are what set training-cluster ship dates.
There is a second consequence that is easy to miss. When the logic die was dominant, a generational improvement in compute efficiency translated cleanly into a generational improvement in chip economics: a smaller die or a denser process node moved the cost curve in a predictable direction. With memory now the dominant share, the cost curve of "GPU compute" is increasingly a memory-pricing curve in disguise. The thing that lowers the cost of training a frontier model in 2027 may have less to do with what TSMC ships from its 2nm node and more to do with how aggressively the three HBM vendors compete on capacity. That is a different industry shape than the one the AI hardware narrative has been carrying.
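To make the arithmetic behind that claim concrete, here is a minimal Python sketch. It assumes the bill of materials splits into just two buckets, memory and everything else, and takes the roughly two-thirds memory share from the Epoch AI figure cited above. The 20% price cuts and the function name `bom_change` are hypothetical, chosen only for illustration.

```python
def bom_change(memory_share: float,
               memory_price_change: float,
               other_price_change: float) -> float:
    """Fractional change in total BOM cost for a two-bucket cost model.

    memory_share: memory's fraction of the BOM before the change (0..1).
    memory_price_change / other_price_change: fractional price changes,
    e.g. -0.20 for a 20% cut.
    """
    if not 0.0 <= memory_share <= 1.0:
        raise ValueError("memory_share must be between 0 and 1")
    other_share = 1.0 - memory_share
    return memory_share * memory_price_change + other_share * other_price_change


MEMORY_SHARE = 2 / 3  # "roughly two-thirds", per the article

# Hypothetical scenario A: everything except memory gets 20% cheaper.
logic_cut = bom_change(MEMORY_SHARE, 0.0, -0.20)

# Hypothetical scenario B: memory alone gets 20% cheaper.
memory_cut = bom_change(MEMORY_SHARE, -0.20, 0.0)

print(f"non-memory -20%: total BOM {logic_cut:.1%}")   # -6.7%
print(f"memory -20%:     total BOM {memory_cut:.1%}")  # -13.3%
```

Under these assumptions, the same percentage cut moves total cost twice as far when it lands on memory as when it lands on the rest of the chip.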
The thing to watch from here is the HBM4 ramp. Industry roadmaps have the next generation of high-bandwidth memory beginning meaningful shipments through 2026 and scaling into 2027, with all three vendors competing for design wins on the next round of training accelerators. HBM4 is faster per stack and denser per package than HBM3E, which means a single accelerator can carry more memory without a wider footprint — and that, mechanically, will move the cost share again. Whether memory keeps climbing toward three-quarters of the BOM or plateaus around the current level depends on two specific things: how quickly the HBM4 capacity expansions at SK Hynix and Micron actually come online, and whether Nvidia, AMD, and the in-house silicon teams at the hyperscalers spec larger or smaller HBM configurations on the next platform generation.
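How far memory spending would have to move to shift that share can be sketched with simple arithmetic. The Python below is an illustration rather than a forecast: it assumes a two-bucket BOM (memory and everything else), starts from the roughly two-thirds share cited above, and holds non-memory cost fixed unless told otherwise. The function names are hypothetical.

```python
def memory_share_after(baseline_share: float,
                       memory_cost_factor: float,
                       other_cost_factor: float = 1.0) -> float:
    """New memory share of BOM after scaling each bucket's cost.

    memory_cost_factor: multiplier on per-accelerator memory cost
    (more stacks, pricier stacks, or both). 1.0 means unchanged.
    """
    if not 0.0 < baseline_share < 1.0:
        raise ValueError("baseline_share must be strictly between 0 and 1")
    memory = baseline_share * memory_cost_factor
    other = (1.0 - baseline_share) * other_cost_factor
    return memory / (memory + other)


def memory_factor_needed(baseline_share: float, target_share: float) -> float:
    """Memory cost multiplier that reaches target_share, non-memory cost fixed.

    Solves s*k / (s*k + (1 - s)) = t for k.
    """
    if not (0.0 < baseline_share < 1.0 and 0.0 < target_share < 1.0):
        raise ValueError("shares must be strictly between 0 and 1")
    return (target_share * (1.0 - baseline_share)
            / (baseline_share * (1.0 - target_share)))


BASELINE = 2 / 3  # "roughly two-thirds", per the article
TARGET = 3 / 4    # the "three-quarters" level the article mentions

k = memory_factor_needed(BASELINE, TARGET)
print(f"memory cost multiplier needed: {k:.2f}")                   # 1.50
print(f"share at that multiplier: {memory_share_after(BASELINE, k):.1%}")  # 75.0%
```

In this simplified model, going from two-thirds to three-quarters takes a 50% rise in memory cost per accelerator with the rest of the BOM held fixed, whether that comes from more stacks, pricier stacks, or both.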
The cleanest framing for the practitioner reading this is that "compute scarcity" in 2026 is mostly a memory story. When the next round of pricing changes or supply announcements lands, the question worth asking is not what happened at the logic foundry but what happened at the HBM line — that is where the chip's cost is now living.