<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fabricio</title>
    <description>The latest articles on DEV Community by Fabricio (@fpolica91).</description>
    <link>https://dev.to/fpolica91</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F554821%2Fca0f4225-7a98-448b-bf9b-2244ff3b2d52.png</url>
      <title>DEV Community: Fabricio</title>
      <link>https://dev.to/fpolica91</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fpolica91"/>
    <language>en</language>
    <item>
      <title>nvidia-peermem "Invalid argument" on Ubuntu — Fix GPUDirect RDMA with DMA-BUF</title>
      <dc:creator>Fabricio</dc:creator>
      <pubDate>Sun, 21 Jun 2026 18:24:22 +0000</pubDate>
      <link>https://dev.to/fpolica91/nvidia-peermem-invalid-argument-fix-2b3n</link>
      <guid>https://dev.to/fpolica91/nvidia-peermem-invalid-argument-fix-2b3n</guid>
      <description>&lt;h1&gt;
  
  
  nvidia-peermem "Invalid argument" on Ubuntu — Fix GPUDirect RDMA with DMA-BUF
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; If &lt;code&gt;modprobe nvidia-peermem&lt;/code&gt; fails with &lt;code&gt;Invalid argument&lt;/code&gt; (&lt;code&gt;-EINVAL&lt;/code&gt;) on a system using the &lt;strong&gt;inbox Ubuntu InfiniBand stack&lt;/strong&gt; (&lt;code&gt;rdma-core&lt;/code&gt;), the module is not broken and you do not need it. nvidia-peermem requires an API that only exists in MLNX_OFED. On Hopper/Blackwell GPUs with the NVIDIA &lt;strong&gt;open&lt;/strong&gt; driver, use &lt;strong&gt;DMA-BUF&lt;/strong&gt; instead — it does GPUDirect RDMA natively. The one gotcha: you must enable &lt;code&gt;nvidia-drm modeset=1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Applies to:&lt;/strong&gt; Ubuntu 22.04 / 24.04, inbox &lt;code&gt;rdma-core&lt;/code&gt; stack, NVIDIA open kernel driver, H100 / H200 / B200, ConnectX-6/7 (or any HCA with ODP support).&lt;/p&gt;




&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;modprobe nvidia-peermem
modprobe: ERROR: could not insert &lt;span class="s1"&gt;'nvidia_peermem'&lt;/span&gt;: Invalid argument
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;dmesg&lt;/code&gt;, nvidia-peermem either loads without registering anything or fails to load with &lt;code&gt;-EINVAL&lt;/code&gt;. Either way, GPUDirect RDMA appears to be unavailable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens (and why it is not a bug)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;nvidia-peermem&lt;/code&gt; is the &lt;strong&gt;legacy&lt;/strong&gt; path for GPUDirect RDMA. It registers GPU memory with the InfiniBand subsystem through a Mellanox-proprietary kernel API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;ib_register_peer_memory_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That symbol &lt;strong&gt;only exists in MLNX_OFED's build of &lt;code&gt;ib_core&lt;/code&gt;&lt;/strong&gt;. It is not in the mainline kernel, and it is not part of Ubuntu's inbox InfiniBand stack (the distribution kernel's &lt;code&gt;ib_core&lt;/code&gt; plus the &lt;code&gt;rdma-core&lt;/code&gt; userspace libraries).&lt;/p&gt;

&lt;p&gt;If you are on the inbox stack, nvidia-peermem was compiled without that API present, so it can never bind and always returns &lt;code&gt;Invalid argument&lt;/code&gt;. No module parameter or config change will fix it, because the thing it needs was never there.&lt;/p&gt;
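
&lt;p&gt;One way to confirm this on a given machine is to look for the symbol in the running kernel's symbol table. The sketch below is only an illustration: the function name &lt;code&gt;has_peer_memory_api&lt;/code&gt; and its optional file argument are made up for this example, and it assumes &lt;code&gt;ib_core&lt;/code&gt; is already loaded so that its symbols are listed in &lt;code&gt;/proc/kallsyms&lt;/code&gt;.&lt;/p&gt;

```shell
# Hypothetical helper: report whether ib_register_peer_memory_client is listed
# in a kallsyms-style file. Defaults to /proc/kallsyms; the argument exists
# only so the check can be exercised against a sample file.
has_peer_memory_api() {
    symbols_file="${1:-/proc/kallsyms}"
    if grep -qw 'ib_register_peer_memory_client' "$symbols_file"; then
        echo "present: ib_core exports ib_register_peer_memory_client"
    else
        # Also reached when ib_core is not loaded or the file is unreadable.
        echo "absent: ib_register_peer_memory_client not found"
        return 1
    fi
}
```

&lt;p&gt;Calling &lt;code&gt;has_peer_memory_api&lt;/code&gt; with no argument checks the running kernel. An "absent" result is consistent with the &lt;code&gt;Invalid argument&lt;/code&gt; failure described above.&lt;/p&gt;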

&lt;p&gt;&lt;strong&gt;Do not install MLNX_OFED just to make nvidia-peermem load.&lt;/strong&gt; That works, but it is the wrong fix — you would be adding a heavy proprietary stack to revive an obsolete module. There is a native path already in your kernel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: use DMA-BUF
&lt;/h2&gt;

&lt;p&gt;On Hopper and newer with the open driver, GPUDirect RDMA works through &lt;strong&gt;DMA-BUF&lt;/strong&gt;, a mainline Linux framework. No external module, no MLNX_OFED.&lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements (check these first)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NVIDIA &lt;strong&gt;open&lt;/strong&gt; kernel driver (not the proprietary build)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nvidia-drm modeset=1&lt;/code&gt; enabled ← most common missing piece&lt;/li&gt;
&lt;li&gt;Kernel built with:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CONFIG_DMA_SHARED_BUFFER=y&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CONFIG_HMM_MIRROR=y&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CONFIG_INFINIBAND_ON_DEMAND_PAGING=y&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ib_umem_dmabuf&lt;/code&gt; symbols present in &lt;code&gt;ib_uverbs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;HCA with ODP support (ConnectX-6/7 have it)&lt;/li&gt;
&lt;li&gt;Hopper or newer GPU (H100 / H200 / B200)&lt;/li&gt;
&lt;/ul&gt;
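
&lt;p&gt;The three kernel options in the list above can be checked in one pass. This is a minimal sketch, assuming the kernel config is available at &lt;code&gt;/boot/config-$(uname -r)&lt;/code&gt; as on a stock Ubuntu install; the function name &lt;code&gt;check_dmabuf_kernel_config&lt;/code&gt; and its optional path argument are hypothetical.&lt;/p&gt;

```shell
# Hypothetical helper: print one line per required kernel option and return
# non-zero if any of them is not set to =y in the given config file.
check_dmabuf_kernel_config() {
    # Assumed default location of the running kernel's config on Ubuntu.
    config_file="${1:-/boot/config-$(uname -r)}"
    missing=0
    for opt in CONFIG_DMA_SHARED_BUFFER CONFIG_HMM_MIRROR CONFIG_INFINIBAND_ON_DEMAND_PAGING; do
        # -x matches the whole line, so "# CONFIG_X is not set" does not count.
        if grep -qx "${opt}=y" "$config_file"; then
            echo "ok      ${opt}"
        else
            echo "MISSING ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

&lt;p&gt;Run &lt;code&gt;check_dmabuf_kernel_config&lt;/code&gt; with no argument to inspect the running kernel. Any &lt;code&gt;MISSING&lt;/code&gt; line means a requirement from the list is not met.&lt;/p&gt;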

&lt;h3&gt;
  
  
  Step 1 — Enable nvidia-drm modeset
&lt;/h3&gt;

&lt;p&gt;Check current state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/module/nvidia_drm/parameters/modeset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it returns &lt;code&gt;N&lt;/code&gt;, DMA-BUF export is inactive. Enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Runtime&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;modprobe &lt;span class="nt"&gt;-r&lt;/span&gt; nvidia_drm &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;modprobe nvidia_drm &lt;span class="nv"&gt;modeset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Persistent across reboots&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'options nvidia-drm modeset=1'&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/modprobe.d/nvidia-drm-modeset.conf
&lt;span class="nb"&gt;sudo &lt;/span&gt;update-initramfs &lt;span class="nt"&gt;-u&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Re-check that the parameter now reads &lt;code&gt;Y&lt;/code&gt;.&lt;/p&gt;
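
&lt;p&gt;If you want that re-check in a script, for example as a pre-flight step before launching a job, something like the following could work. It is a sketch only: &lt;code&gt;modeset_state&lt;/code&gt; is a hypothetical name, and the optional path argument exists just so the function can be tried against a sample file.&lt;/p&gt;

```shell
# Hypothetical helper: summarize the nvidia_drm modeset parameter.
# Returns 0 when enabled, 1 when disabled, 2 when the parameter file is missing.
modeset_state() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        # The sysfs entry only exists while the nvidia_drm module is loaded.
        echo "nvidia_drm not loaded"
        return 2
    fi
    case "$(cat "$param_file")" in
        Y) echo "modeset enabled" ;;
        *) echo "modeset disabled"; return 1 ;;
    esac
}
```

&lt;p&gt;Calling &lt;code&gt;modeset_state&lt;/code&gt; with no argument reads the live parameter, and the return code lets a wrapper script stop early when modeset is off.&lt;/p&gt;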

&lt;h3&gt;
  
  
  Step 2 — Verify GPUDirect RDMA actually works
&lt;/h3&gt;

&lt;p&gt;Do not trust "it should work now." Confirm the full path: allocate GPU memory, export it as a DMA-BUF file descriptor, register it with the HCA.&lt;/p&gt;

&lt;p&gt;The three calls that must succeed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;cudaMalloc()&lt;/code&gt; — allocate GPU memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cuMemGetHandleForAddressRange()&lt;/code&gt; with &lt;code&gt;CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD&lt;/code&gt; — export as a DMA-BUF fd&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ibv_reg_dmabuf_mr()&lt;/code&gt; — register that fd with the InfiniBand HCA&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If all three return success, GPU memory is directly addressable by the HCA over DMA-BUF and GPUDirect RDMA is working. nvidia-peermem is not needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Legacy (nvidia-peermem)&lt;/th&gt;
&lt;th&gt;Modern (DMA-BUF)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Requires MLNX_OFED&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External module&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works on inbox &lt;code&gt;rdma-core&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported GPUs&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;Hopper+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA recommendation&lt;/td&gt;
&lt;td&gt;Deprecated&lt;/td&gt;
&lt;td&gt;Preferred&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;nvidia-peermem&lt;/code&gt; fails with &lt;code&gt;Invalid argument&lt;/code&gt; on an inbox stack, that is expected. Enable &lt;code&gt;nvidia-drm modeset=1&lt;/code&gt;, use DMA-BUF, verify with the three-call test above.&lt;/p&gt;




&lt;h3&gt;
  
  
  Related symptoms worth checking on the same box
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All IB ports stuck in &lt;code&gt;INIT&lt;/code&gt;, LID 0&lt;/strong&gt; → no Subnet Manager on the fabric. Start one: &lt;code&gt;sudo apt install opensm &amp;amp;&amp;amp; sudo systemctl start opensm&lt;/code&gt;. Ports go Active within seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One port &lt;code&gt;Down/Polling&lt;/code&gt; at SDR while others are Active&lt;/strong&gt; → check the switch side by directed route. If both ends are polling, it is physical (cable / transceiver / seat), not software. Reseat or swap.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nvidia</category>
      <category>nvidiapeermem</category>
      <category>ai</category>
      <category>hpc</category>
    </item>
    <item>
      <title>How to fix 'zsh: command not found: python' when using pyenv on macOS</title>
      <dc:creator>Fabricio</dc:creator>
      <pubDate>Mon, 13 Mar 2023 18:19:01 +0000</pubDate>
      <link>https://dev.to/fpolica91/how-to-fix-zsh-command-not-found-python-when-using-pyenv-on-macos-5hh3</link>
      <guid>https://dev.to/fpolica91/how-to-fix-zsh-command-not-found-python-when-using-pyenv-on-macos-5hh3</guid>
      <description>&lt;ol&gt;
&lt;li&gt;Install pyenv with Homebrew: &lt;code&gt;brew install pyenv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install the desired version of Python with &lt;code&gt;pyenv install &amp;lt;version&amp;gt;&lt;/code&gt; and set it as the global version with &lt;code&gt;pyenv global &amp;lt;version&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add pyenv's &lt;code&gt;bin&lt;/code&gt; directory to your &lt;code&gt;PATH&lt;/code&gt; in &lt;code&gt;~/.zshrc&lt;/code&gt;: &lt;code&gt;echo 'export PATH="$HOME/.pyenv/bin:$PATH"' &amp;gt;&amp;gt; ~/.zshrc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Restart your terminal or run the following command: &lt;code&gt;source ~/.zshrc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Alias &lt;code&gt;python&lt;/code&gt; to the interpreter pyenv selected: &lt;code&gt;echo "alias python=$(pyenv which python)" &amp;gt;&amp;gt; ~/.zshrc&lt;/code&gt;. The &lt;code&gt;$(pyenv which python)&lt;/code&gt; part is expanded when you run &lt;code&gt;echo&lt;/code&gt;, so the alias stores the current interpreter path. Run the command again after changing the global version, then reload &lt;code&gt;~/.zshrc&lt;/code&gt;.
&lt;/li&gt;
&lt;/ol&gt;
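
&lt;p&gt;To confirm which interpreter &lt;code&gt;python&lt;/code&gt; resolves to after these steps, a small check like the one below may help. It is only a sketch: &lt;code&gt;classify_python_path&lt;/code&gt; is a hypothetical helper, and it assumes pyenv uses its default root directory, &lt;code&gt;~/.pyenv&lt;/code&gt;.&lt;/p&gt;

```shell
# Hypothetical helper: classify a resolved python path by where it lives.
# Assumes pyenv's default root (~/.pyenv).
classify_python_path() {
    case "$1" in
        "") echo "not found" ;;
        */.pyenv/shims/*) echo "pyenv shim" ;;
        */.pyenv/versions/*) echo "pyenv version" ;;
        *) echo "other python" ;;
    esac
}
```

&lt;p&gt;For example, &lt;code&gt;classify_python_path "$(command -v python)"&lt;/code&gt; should report "pyenv version" once the alias from step 5 is active, because the alias points into &lt;code&gt;~/.pyenv/versions&lt;/code&gt;.&lt;/p&gt;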

</description>
    </item>
  </channel>
</rss>
