Google’s Virgo network interconnects 134K TPUv8t chips at 47 Pbps

Google’s Virgo network interconnects up to 134,400 TPUv8t chips at 47 Pbps. The scale-out architecture, disclosed by @SemiAnalysis_, targets training clusters for frontier models.

Key facts

Virgo interconnects up to 134,400 TPUv8t chips.
Non-blocking bisection bandwidth: 47 Pbps.
TPUv8t is a training-focused variant of Google's TPU.
47 Pbps equals 47,000 Tbps.
Competes with NVIDIA NVLink and InfiniBand fabrics.

Google has introduced a new training-focused TPU, the TPUv8t, alongside a scale-out network architecture called Virgo. According to @SemiAnalysis_, Virgo can interconnect up to 134,400 chips with up to 47 Pbps of non-blocking bisection bandwidth. For a sense of scale, a single InfiniBand NDR port runs at 400 Gbps, and NVIDIA's fourth-generation NVLink provides 900 GB/s of bidirectional bandwidth per GPU. Those are per-port and per-device figures, though, so they are not directly comparable to a fabric-wide aggregate.
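To put the aggregate figure on the same footing as per-port numbers, it helps to divide it back down to a per-chip share. The sketch below does that arithmetic in Python. Google has not disclosed per-chip or per-port bandwidth, so both results are derived estimates under stated assumptions, and the function names are illustrative only.

```python
# Back-of-envelope arithmetic only. The two inputs come from the article;
# every per-chip number below is a derived estimate, not a disclosed spec.
TOTAL_BISECTION_BPS = 47e15  # 47 Pbps
MAX_CHIPS = 134_400          # maximum chip count


def pbps_to_tbps(pbps: float) -> float:
    """Convert petabits per second to terabits per second."""
    return pbps * 1_000


def pbps_to_petabytes_per_s(pbps: float) -> float:
    """Convert petabits per second to petabytes per second (8 bits per byte)."""
    return pbps / 8


def per_chip_gbps(total_bps: float, chips_sharing: int) -> float:
    """Split an aggregate bandwidth evenly across chips, in Gbps."""
    return total_bps / chips_sharing / 1e9


# Assumption A: 47 Pbps is the capacity across a cut that splits the cluster
# in half, so the 67,200 chips on one side share it equally.
share_if_bisection = per_chip_gbps(TOTAL_BISECTION_BPS, MAX_CHIPS // 2)

# Assumption B: 47 Pbps is the sum of every chip's injection bandwidth.
share_if_aggregate = per_chip_gbps(TOTAL_BISECTION_BPS, MAX_CHIPS)

print(f"47 Pbps = {pbps_to_tbps(47):,.0f} Tbps = {pbps_to_petabytes_per_s(47)} PB/s")
print(f"Per chip, assumption A: {share_if_bisection:.1f} Gbps")
print(f"Per chip, assumption B: {share_if_aggregate:.1f} Gbps")
```

Under either reading, the per-chip share lands in the hundreds of Gbps, the same order of magnitude as a single 400 Gbps NDR port. The headline number is large mainly because of the chip count and the absence of oversubscription.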

Why 47 Pbps matters

47 Pbps is 47,000 Tbps, or about 5.9 petabytes per second across the fabric. For training runs on clusters exceeding 100,000 accelerators, the interconnect often becomes the bottleneck: all-reducing gradients across 100K+ chips can stall if bandwidth is insufficient. A non-blocking design means the fabric is not oversubscribed, so in principle every chip can communicate with any other chip at full line rate simultaneously. That avoids the congestion that oversubscribed tree topologies suffer when many chips send traffic across the fabric at once.
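A rough way to see why per-chip bandwidth governs gradient synchronization is the standard bandwidth-only cost model for a ring all-reduce, in which each chip sends about 2(N-1)/N times the payload. The sketch below applies that model. The 10 GB payload and the 400 and 100 Gbps link speeds are hypothetical values chosen for illustration, not Virgo or TPUv8t specifications, and the model ignores latency, compute overlap, and the hierarchical collectives that real clusters use.

```python
def ring_allreduce_seconds(payload_bytes: float, num_chips: int, per_chip_bps: float) -> float:
    """Bandwidth-only lower bound on ring all-reduce time.

    Each chip transmits 2 * (N - 1) / N times the payload
    (reduce-scatter followed by all-gather). Latency is ignored.
    """
    if num_chips < 2:
        return 0.0  # nothing to synchronize with a single chip
    bits_sent_per_chip = 2 * (num_chips - 1) / num_chips * payload_bytes * 8
    return bits_sent_per_chip / per_chip_bps


# Hypothetical inputs, for illustration only.
PAYLOAD_BYTES = 10e9   # 10 GB of gradients per chip (assumed)
NUM_CHIPS = 134_400    # maximum chip count from the article

full_rate = ring_allreduce_seconds(PAYLOAD_BYTES, NUM_CHIPS, 400e9)
# A 4:1 oversubscribed fabric would leave each chip a quarter of the bandwidth.
oversubscribed = ring_allreduce_seconds(PAYLOAD_BYTES, NUM_CHIPS, 100e9)

print(f"Full line rate: {full_rate:.2f} s per all-reduce")
print(f"4:1 oversubscribed: {oversubscribed:.2f} s per all-reduce")
```

In this model the synchronization time scales inversely with per-chip bandwidth and barely depends on chip count, which is why an oversubscribed fabric translates directly into longer training steps.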

TPUv8t: training-first design

Google did not disclose TPUv8t’s raw FLOPS or memory bandwidth, but the “t” suffix suggests a training-optimized variant of the eighth-generation TPU. Previous TPU generations (v4, v5p, v6) served both training and inference; the v8t appears to trade some inference efficiency for higher sustained throughput on large-batch training jobs, though that is an inference from the naming rather than a disclosed design goal. The 134,400-chip ceiling suggests Google is targeting clusters comparable to the 100,000-accelerator scale now being built for frontier-model training.

Competitive positioning

Virgo competes directly with NVIDIA’s NVLink and InfiniBand fabrics, and with AMD’s Infinity Fabric. No commercial interconnect currently advertises 47 Pbps of aggregate bandwidth at that scale. Google’s custom silicon approach allows tighter integration between the TPU and the network topology, potentially reducing latency and power consumption compared to third-party switches. The architecture also aligns with Google’s strategy of vertical integration for AI infrastructure, similar to how AWS builds its Trainium/Nitro combination.

Unknowns

Google has not published Virgo’s per-port bandwidth, latency, or power draw. Nor has it disclosed the TPUv8t’s training performance on standard benchmarks like MLPerf. The 134,400-chip figure is a theoretical maximum; real-world clusters may be smaller. @SemiAnalysis_ did not specify when Virgo or TPUv8t will enter production.

What to watch

Watch for TPUv8t MLPerf training submission results and Google Cloud’s pricing for Virgo-based TPU slices. Also track whether Google licenses Virgo to third-party hardware or keeps it proprietary for internal training runs.

Originally published on gentic.news