
Nick Schmidt

Originally published at blog.engyak.co

Mellanox `nmlx5_core` driver `4.23` issues on ESXi 8.0 Update 1

Mellanox Driver Overview

Problem Inventory - Mellanox Driver Update on ESXi 8.0u1 causing network virtualization issues

After installing ESXi 8.0 Update 1, the following issues appear on hosts with adapters that use the nmlx5_core driver:

  • Delayed or failed IP discovery on VLAN-backed segments, even between workloads on the same host. Once an entry is in the ARP cache, no further issues occur.
  • Delayed or failed IP discovery and IP allocation failures on VLAN-trunked port groups, even between workloads on the same host. Issues persist even after IP discovery is established.
  • Overlay encapsulation offload failures:
    • ICMP with any payload size will function bidirectionally via Edge Transport Nodes / FRRLinux machines, but TCP and UDP will not
    • All overlay traffic encapsulated by a vSphere host flows correctly between workloads on the same NSX overlay segment
    • All overlay traffic encapsulated by a vSphere host flows correctly between segments on the same NSX distributed router
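
To reproduce the "ICMP works, TCP does not" symptom from a workload, a small TCP probe can be run alongside an ordinary ping. The following is a minimal sketch, not a tool used in the original investigation; the function name `tcp_reachable` and the example address and port are hypothetical and chosen only for illustration.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    Hypothetical helper: a False result while ping to the same host succeeds
    would be consistent with the overlay symptom described above.
    """
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder target (documentation address range); replace with a
    # workload that sits behind an Edge Transport Node.
    print(tcp_reachable("192.0.2.10", 22))
```

A refused connection and a timeout both return False here, so a packet capture is still needed to tell a closed port apart from dropped traffic.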

These issues are seen on the following hardware models:

  • MCX4121A-ACAT firmware revisions 14.25 and 14.32

These issues are experienced with the upgrade to vSphere 8.0 Update 1, which includes the following updated driver:

nmlx5-core 4.23.0.36-8vmw.800.1.0.20513097
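
When auditing several hosts, the check for this driver release can be scripted. The sketch below assumes the version string has already been collected from each host (for example from the output of `esxcli software vib list`) and only parses it. The helper name `is_affected_driver` is hypothetical, and treating every 4.23.x build as affected is an assumption: this post only confirms the issue on 4.23.0.36.

```python
# Assumption: any 4.23.x build is treated as affected. Only 4.23.0.36 is
# confirmed by this post.
AFFECTED_PREFIX = "4.23."


def is_affected_driver(vib_version: str) -> bool:
    """Return True if a VIB version string belongs to the 4.23 driver series.

    The string is expected in the form shown above, where the driver version
    precedes the first hyphen and the VMware build suffix follows it.
    """
    driver_version = vib_version.strip().split("-", 1)[0]
    return driver_version.startswith(AFFECTED_PREFIX)


if __name__ == "__main__":
    print(is_affected_driver("4.23.0.36-8vmw.800.1.0.20513097"))
```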

This driver from NVIDIA ships with support for both BlueField SmartNIC and ConnectX Generation 5 network adapters as one package. Rolling back to a previous release of ESXi 8 with the previous driver (nmlx5-core 4.22) immediately resolves all overlay issues.

No resolution has been found yet. This page will be updated when one is.

If anyone would like to contribute to this problem inventory, email me here
