gentic news

Posted on • Originally published at gentic.news

Blackwell NVLink Breaks Confidential Compute, 61% Regression Reported

NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.

NVIDIA Blackwell's confidential computing disables NVLink multicast, causing a 61% performance regression on SGLang Qwen3.5 397B. The finding, from a GitHub ticket by @verdacloud, was amplified by @SemiAnalysis_.

Key facts

  • 61% performance regression on SGLang Qwen3.5 397B
  • NVLink multicast unsupported in Blackwell confidential computing
  • Hopper confidential computing had unencrypted NVLink
  • Finding from @verdacloud GitHub ticket, amplified by @SemiAnalysis_
  • Regression affects large-model inference in regulated environments

NVIDIA's Blackwell architecture suffers a critical flaw in its confidential computing implementation: NVLink multicast is not supported. According to a GitHub ticket from @verdacloud and reporting by @SemiAnalysis_ [@SemiAnalysis_], this leads to a 61% performance regression on SGLang Qwen3.5 397B. The regression is particularly severe for large-model inference, where NVLink multicast, which lets one GPU broadcast data to multiple GPUs simultaneously, is essential for reducing communication overhead.

The issue is compounded by NVIDIA's own documentation. The company's whitepaper, "NVIDIA Secure AI with Blackwell and Hopper GPUs," reveals that Hopper's confidential computing had fully unencrypted NVLink, meaning the previous generation's "secure" mode was incomplete [NVIDIA whitepaper]. This suggests NVIDIA's confidential computing story has been inconsistent across generations.

The 61% regression on SGLang Qwen3.5 397B is a worst-case scenario for large-model inference. SGLang, a popular inference engine for large language models, relies heavily on NVLink multicast for tensor parallelism across GPUs. Without multicast, data that one GPU could broadcast in a single operation has to be exchanged with each peer GPU separately, which increases latency and reduces throughput.
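
The difference between a multicast broadcast and per-peer transfers can be sketched with a toy cost model. This is only an illustration under simplifying assumptions: the function names are hypothetical, the model counts transfers issued by a single sending GPU, and it does not describe how SGLang or NVIDIA's collectives are actually implemented.

```python
# Toy model (an assumption for illustration, not NVIDIA's or SGLang's code):
# count what one sending GPU must issue to deliver a buffer to all its peers.

def sender_transfers(num_gpus: int, multicast: bool) -> int:
    """Number of transfers the sender issues to reach every other GPU."""
    if num_gpus < 2:
        return 0  # no peers, nothing to send
    peers = num_gpus - 1
    # With multicast, a single write is replicated to all peers.
    # Without it, the sender issues one unicast transfer per peer.
    return 1 if multicast else peers


def sender_bytes(num_gpus: int, payload_bytes: int, multicast: bool) -> int:
    """Bytes the sender pushes onto its links under this toy model."""
    return sender_transfers(num_gpus, multicast) * payload_bytes


if __name__ == "__main__":
    payload = 1 << 20  # 1 MiB, an arbitrary example size
    for n in (2, 4, 8):
        with_mc = sender_bytes(n, payload, multicast=True)
        without_mc = sender_bytes(n, payload, multicast=False)
        print(f"{n} GPUs: {with_mc} bytes with multicast, {without_mc} without")
```

In this model the sender's traffic grows linearly with the number of peers once multicast is unavailable. Real collectives use ring or tree schedules and overlap transfers, so the measured penalty depends on the workload and will not match this count directly.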

Why this matters more than the press release suggests

The NVLink multicast regression reveals a structural trade-off in NVIDIA's confidential computing design. To achieve memory encryption and isolation, NVIDIA must disable NVLink multicast, which is a hardware-level feature. This is not a software bug that can be patched. It is a design choice with permanent performance implications for any workload requiring confidential computing.

For enterprise customers deploying large models in regulated environments (finance, healthcare, government), this is a significant problem. They must choose between security (confidential computing) and performance (NVLink multicast). The 61% regression makes large-model inference under confidential computing nearly impractical for latency-sensitive applications.
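
To make the 61% figure concrete, the arithmetic can be worked through in a few lines. The source does not say whether the regression was measured as a drop in throughput or an increase in latency, so this sketch shows both readings. The function names and the baseline values are hypothetical, chosen only for illustration.

```python
# The 61% figure is from the article; which metric it applies to is an
# assumption here, so both readings are shown.
REGRESSION = 0.61


def remaining_throughput(baseline: float, regression: float) -> float:
    """Throughput left if the regression is a fractional throughput drop."""
    if not 0.0 <= regression < 1.0:
        raise ValueError("regression must be in [0, 1)")
    return baseline * (1.0 - regression)


def capacity_multiplier(regression: float) -> float:
    """Extra capacity needed to recover baseline throughput after the drop."""
    if not 0.0 <= regression < 1.0:
        raise ValueError("regression must be in [0, 1)")
    return 1.0 / (1.0 - regression)


def latency_after_increase(baseline_ms: float, regression: float) -> float:
    """Latency if the regression is instead a fractional latency increase."""
    return baseline_ms * (1.0 + regression)


if __name__ == "__main__":
    # Baselines of 100 are placeholders, not measured values.
    print(f"throughput reading: {remaining_throughput(100.0, REGRESSION):.1f}% of baseline")
    print(f"capacity needed:    {capacity_multiplier(REGRESSION):.2f}x")
    print(f"latency reading:    {latency_after_increase(100.0, REGRESSION):.1f} ms per 100 ms")
```

Under the throughput reading, a deployment would keep 39% of its baseline throughput and need roughly 2.56 times the capacity to serve the same load. Under the latency reading, each request would take 1.61 times as long. The two interpretations lead to quite different capacity plans, so it is worth checking the original ticket for the metric actually reported.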

Broader context

This is not an isolated incident. Earlier this year, NVIDIA's Grace Hopper superchip faced criticism for memory bandwidth limitations in confidential computing mode. The pattern suggests NVIDIA is prioritizing time-to-market over rigorous validation of security features. Competing accelerators such as AMD's MI300X, which supports confidential computing with full interconnect bandwidth, could capitalize on this weakness.

What to watch

Watch for NVIDIA's response: either a firmware update that mitigates the regression (unlikely, given that the limitation is in hardware) or a revised whitepaper acknowledging the limitation. Also monitor @verdacloud's GitHub ticket for updates, and look for benchmark comparisons from AMD or Intel showcasing their confidential computing performance on large models.
