
NTCTech

Posted on • Originally published at rack2cloud.com

InfiniBand Is Losing the Fabric War. Here's What That Changes for Your Architecture.

The InfiniBand vs RoCEv2 decision has been settled at the hyperscaler level — and the answer is Ethernet. Broadcom's March 2026 earnings confirmed it: roughly 70% of new AI infrastructure deployments are now choosing Ethernet-based fabrics over InfiniBand. That didn't happen because Ethernet got faster. It happened because InfiniBand ran out of room.

InfiniBand Didn't Lose on Performance

Let's be precise about what the shift actually means. InfiniBand remains technically superior for a specific class of problem: tightly coupled, homogeneous, single-vendor GPU clusters running large-scale distributed training in a controlled environment. At that workload, InfiniBand's latency characteristics and RDMA implementation are still genuinely differentiated.

The shift isn't a performance verdict. It's an ecosystem verdict.

InfiniBand is losing because of operational isolation, vendor lock-in, and scaling friction in the environments where enterprise AI actually runs — not because RoCEv2 won a latency benchmark.

What's Actually Happening

*Figure: ecosystem divergence — InfiniBand vendor stack versus RoCEv2 open ecosystem alignment*

Three forces are converging to push the InfiniBand vs RoCEv2 decision toward Ethernet:

The hyperscalers moved first. AWS, Google, and Microsoft have all built or are building their AI backend fabrics on Ethernet-based architectures. When the largest AI training environments in the world converge on a fabric model, the tooling, operational expertise, and ecosystem compound. Teams building on-premises AI clusters after training on cloud infrastructure face a jarring operational discontinuity if they select InfiniBand for the private side.

The Ultra Ethernet Consortium formalized the direction. The UEC — backed by AMD, Broadcom, Cisco, HPE, Intel, Meta, and Microsoft — is building AI-optimized extensions to Ethernet to close the gap with InfiniBand for distributed training. Congestion control, in-sequence delivery, and multipath capabilities that InfiniBand had as native features are being engineered into Ethernet as open standards.

NVIDIA is pushing InfiniBand as a platform commitment, not just a networking choice. The tightly coupled NVIDIA InfiniBand stack — GPU, NIC, switch, software — delivers real performance and real lock-in. For organizations evaluating multi-vendor GPU procurement or heterogeneous inference environments, that's a platform commitment with long-term procurement consequences.

Why InfiniBand Is Losing in Practice

*Figure: InfiniBand scaling friction — where the architecture breaks in hybrid and multi-region environments*

Constraint 01 — Operational Isolation

InfiniBand requires a separate toolchain, separate skillset, and separate operational model from everything else in the stack. Your network engineers know Ethernet. Your cloud engineers know Ethernet. InfiniBand expertise is a specialized hire — in an environment where most organizations are already stretched thin.

Constraint 02 — Vendor Lock-In Architecture

InfiniBand is not a neutral standard. It's an NVIDIA/Mellanox ecosystem. Switches, NICs, cables, drivers, and management tooling are tightly coupled to a single vendor stack. Multi-vendor GPU environments, heterogeneous inference hardware, and future silicon decisions are all constrained by the fabric choice made today.

Constraint 03 — Scaling Friction at the Boundary

InfiniBand works exceptionally well inside its design boundary: a homogeneous, on-premises, single-vendor cluster. The moment the architecture extends to hybrid connectivity, multi-region inference serving, or heterogeneous environments mixing cloud and private GPU infrastructure, InfiniBand creates hard boundaries. Bridging InfiniBand to Ethernet at the hybrid edge adds latency, complexity, and cost that erodes the performance advantage it was selected for.

Why Ethernet Is Winning the InfiniBand vs RoCEv2 Decision

RoCEv2 isn't winning because it's technically superior in a controlled benchmark. It's winning because it removes the operational, ecosystem, and scaling constraints InfiniBand carries — at a cost point and an interoperability profile whose advantages compound over time.

Ecosystem gravity is the primary force. Ethernet is the fabric of cloud infrastructure, enterprise networking, and the operational knowledge base of virtually every network engineer. When you choose RoCEv2, you're choosing alignment with the tooling, talent, and integration patterns that the rest of your infrastructure already runs on.

Programmability is the second force. DPUs and SmartNICs — NVIDIA BlueField, AMD Pensando, Intel IPU — sit on top of Ethernet and offload networking functions, security processing, and storage I/O to dedicated silicon. This programmability layer is native to the Ethernet ecosystem. For architects building software-defined fabric policies, congestion control automation, or integrated security enforcement at the network layer, Ethernet provides the surface that InfiniBand does not.

Cloud alignment is the third force. If your AI workloads span cloud training bursts and on-premises inference, a consistent fabric model across both environments eliminates an entire class of integration friction.

The Real Shift: The Fabric Is Becoming Software

The deeper architectural change is not InfiniBand vs. RoCEv2. It's the transition of the fabric from a hardware-defined performance layer to a software-defined, policy-driven component of the infrastructure stack. That transition is native to Ethernet.

The deterministic networking architecture that AI training clusters require — symmetric leaf-spine topology, ECN over PFC for congestion signaling, adaptive routing for failure recovery — is increasingly implemented through programmable logic at the switch and NIC layer, not through hardware-enforced InfiniBand primitives.
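To make the ECN-over-PFC point concrete, here is a minimal sketch of the WRED-style marking decision an Ethernet switch applies to signal congestion to RoCEv2 senders. The threshold values are entirely hypothetical; real values are tuned per switch ASIC, buffer size, and workload, which is exactly why the article treats them as architecture decisions rather than defaults.

```python
def ecn_mark_probability(queue_kb: float,
                         min_kb: float = 150.0,
                         max_kb: float = 1500.0,
                         max_prob: float = 0.2) -> float:
    """Probability of marking a packet ECN-CE at a given egress queue depth.

    Below min_kb the switch never marks; above max_kb it always marks;
    in between, marking probability ramps linearly up to max_prob.
    All thresholds here are illustrative, not vendor defaults.
    """
    if queue_kb <= min_kb:
        return 0.0
    if queue_kb >= max_kb:
        return 1.0
    return max_prob * (queue_kb - min_kb) / (max_kb - min_kb)
```

The min/max thresholds and ramp slope govern how early senders are told to back off — set them too high and PFC pause frames become the primary congestion signal, which is the failure mode a well-engineered RoCEv2 fabric is designed to avoid.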

What this means operationally: fabric engineering is converging with platform engineering. Fabric policy — congestion thresholds, routing logic, QoS configuration — is increasingly expressed as code, version-controlled, and enforced through the same IaC pipelines that provision the rest of the AI infrastructure stack.
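A sketch of what "fabric policy as code" can look like in practice: a typed policy object that is version-controlled and validated in CI before any device is touched. The field names and checks are illustrative assumptions, not any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricPolicy:
    """Illustrative fabric policy record, suitable for version control."""
    ecn_min_threshold_kb: int   # queue depth where ECN marking begins
    ecn_max_threshold_kb: int   # queue depth where marking is certain
    pfc_enabled: bool           # lossless backstop for the RoCEv2 class
    lossless_priority: int      # 802.1p priority carrying RoCEv2 traffic


def validate(policy: FabricPolicy) -> list[str]:
    """Return policy violations; an empty list means the policy can ship."""
    errors = []
    if policy.ecn_min_threshold_kb >= policy.ecn_max_threshold_kb:
        errors.append("ECN min threshold must be below max threshold")
    if not 0 <= policy.lossless_priority <= 7:
        errors.append("802.1p priority must be in 0-7")
    return errors
```

The point is not the specific fields but the workflow: a bad congestion threshold is caught by a pipeline check, reviewed in a pull request, and rolled back with a revert, the same way the rest of the AI infrastructure stack is operated.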

What Most Teams Will Miss

The teams making the wrong fabric decision aren't the ones who don't understand InfiniBand's performance characteristics. They're benchmarking raw latency while ignoring the dimensions that actually govern lifecycle cost:

| What gets benchmarked | What governs lifecycle cost |
| --- | --- |
| Raw latency (µs) | Operability — can your team run it at 2am? |
| Peak bandwidth (Gbps) | Failure domain containment |
| RDMA throughput (ideal conditions) | Cost of complexity — tooling overhead |
| MPI all-reduce scores | Hybrid boundary friction |

A cluster that hits 95% of InfiniBand's throughput on RoCEv2 while being operable by the team that already runs the rest of the infrastructure is a better architecture outcome than 100% throughput with a dedicated fabric specialist keeping it alive.
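A back-of-envelope version of that tradeoff, with wholly invented numbers to show the shape of the calculation rather than any real cost data:

```python
def effective_annual_value(relative_throughput: float,
                           cluster_value_usd: float,
                           ops_cost_usd: float) -> float:
    """Value the fabric delivers, net of its operational overhead."""
    return relative_throughput * cluster_value_usd - ops_cost_usd


# Hypothetical inputs: a cluster worth $10M/year at full throughput,
# a dedicated InfiniBand specialist team vs. the existing Ethernet team.
ib = effective_annual_value(1.00, 10_000_000, 700_000)
roce = effective_annual_value(0.95, 10_000_000, 100_000)
```

With these assumed figures the 5% throughput edge does not cover the operational gap; with different figures it might. The useful exercise is running the arithmetic for your own environment instead of stopping at the benchmark.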

What This Means for Your Architecture

*Figure: InfiniBand vs RoCEv2 architect decision matrix for AI infrastructure workload selection*

The InfiniBand vs RoCEv2 decision in 2026 is not a binary verdict. It's a workload-specific evaluation:

| Scenario | InfiniBand | RoCEv2 / Ethernet |
| --- | --- | --- |
| Homogeneous NVIDIA cluster, isolated training | Strong fit | Strong fit — evaluate operational overhead |
| Heterogeneous GPU environment | Friction at boundaries | Natural fit |
| Hybrid cloud + on-prem AI | Hard boundary complexity | Consistent model |
| Inference-only cluster | Overcomplicated | Right-sized |
| Team with Ethernet expertise | Operational gap | No gap |
| Multi-region AI infrastructure | Not designed for this | Cloud-native alignment |

Three questions before you commit:

  • What is your workload type — training, inference, or both?
  • What is your scale model — isolated cluster, hybrid, or multi-region?
  • What is your team's operational capability?
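Those three questions map cleanly onto the decision matrix above. Here is a hypothetical helper encoding that mapping; the scenario outcomes are the article's, the function itself is a sketch, and a real evaluation would weigh far more inputs.

```python
def recommend_fabric(workload: str, scale: str, ethernet_team: bool) -> str:
    """Sketch of the decision matrix as code.

    workload: 'training' | 'inference' | 'both'
    scale:    'isolated' | 'hybrid' | 'multi-region'
    ethernet_team: whether the operating team's expertise is Ethernet
    """
    # InfiniBand's design boundary: bounded, homogeneous training cluster
    # run by a team prepared to staff the specialized skillset.
    if workload == "training" and scale == "isolated" and not ethernet_team:
        return "InfiniBand defensible: homogeneous, bounded cluster"
    # Any boundary-crossing architecture favors a consistent fabric model.
    if scale in ("hybrid", "multi-region"):
        return "RoCEv2/Ethernet: consistent fabric model across boundaries"
    if workload == "inference":
        return "RoCEv2/Ethernet: InfiniBand is overcomplicated here"
    return "RoCEv2/Ethernet: evaluate operational overhead"
```

Notice how narrow the InfiniBand branch is: one workload type, one scale model, one staffing assumption. Everything else falls through to Ethernet, which is the 70/30 split expressed as control flow.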

Architect's Verdict

The InfiniBand vs RoCEv2 question is settled at the ecosystem level — but not at the workload level. InfiniBand isn't disappearing. It remains the correct selection for specific, bounded, high-performance training environments committed to the NVIDIA full-stack model.

But it is no longer the presumptive default. The 70/30 Ethernet split reflects a market that has moved past the performance comparison phase and into the operational reality phase of AI infrastructure deployment at scale.

DO:

  • Evaluate fabric against workload type, scale model, and team capability — not benchmark scores
  • Model the operational cost of InfiniBand expertise — specialization has a real hiring and retention cost
  • Design the hybrid fabric boundary explicitly before committing
  • Treat ECN configuration as a first-class architecture decision on RoCEv2, not a default setting

DON'T:

  • Default to InfiniBand "because AI"
  • Treat RoCEv2 as a drop-in replacement without engineering the congestion control layer
  • Benchmark only peak throughput
  • Lock in fabric before modeling the training vs. inference infrastructure split

The fabric decision is the foundation of every AI infrastructure choice made above it. Getting it right means evaluating it as a systems decision, not a networking benchmark.


Cross-posted from Rack2Cloud — field-tested AI infrastructure architecture for engineers operating at enterprise scale.
