<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tushar Thokdar</title>
    <description>The latest articles on DEV Community by Tushar Thokdar (@tushar365).</description>
    <link>https://dev.to/tushar365</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874701%2Ff1089530-bbf0-4ad0-842d-a716e438be0c.jpeg</url>
      <title>DEV Community: Tushar Thokdar</title>
      <link>https://dev.to/tushar365</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tushar365"/>
    <language>en</language>
    <item>
      <title>512MiB ≠ 512MB: the silent trtexec bug</title>
      <dc:creator>Tushar Thokdar</dc:creator>
      <pubDate>Sun, 12 Apr 2026 10:31:15 +0000</pubDate>
      <link>https://dev.to/tushar365/512mib-512mb-the-silent-trtexec-bug-4p1</link>
      <guid>https://dev.to/tushar365/512mib-512mb-the-silent-trtexec-bug-4p1</guid>
      <description>&lt;h1&gt;
  
  
  512MiB ≠ 512MB — the silent trtexec bug
&lt;/h1&gt;

&lt;p&gt;This is a short one. But it cost me a full day, so.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I was trying to do
&lt;/h2&gt;

&lt;p&gt;I was building a TensorRT FP16 engine on a Jetson Orin Nano. The device has a 512 MB NvMap CMA ceiling, which is a hard kernel memory limit. The FP16 build needed 606 MB, so it failed every time.&lt;/p&gt;

&lt;p&gt;After some research I found that limiting the TRT workspace might help. So I tried:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trtexec &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--onnx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;benchmarks/prithvi_4f.onnx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--saveEngine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;benchmarks/prithvi_fp16.trt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--fp16&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memPoolSize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;workspace:512MiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It built. No errors. Engine saved. 591 MB on disk. Inference ran.&lt;/p&gt;

&lt;p&gt;I thought I'd fixed it. I hadn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's actually happening
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;--memPoolSize&lt;/code&gt; flag has a strict set of valid suffixes: &lt;code&gt;B&lt;/code&gt;, &lt;code&gt;K&lt;/code&gt;, &lt;code&gt;M&lt;/code&gt;, &lt;code&gt;G&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MiB&lt;/code&gt; is not one of them.&lt;/p&gt;

&lt;p&gt;When trtexec sees an unrecognized suffix it doesn't error. It doesn't warn. It silently strips the &lt;code&gt;iB&lt;/code&gt; part and falls back to some default unit, parsing &lt;code&gt;512MiB&lt;/code&gt; as roughly &lt;strong&gt;1 KB&lt;/strong&gt; of workspace.&lt;/p&gt;

&lt;p&gt;So TRT built the engine with effectively no scratch memory. It compiled because TRT can always find &lt;em&gt;some&lt;/em&gt; kernel tactic — just not the good ones. The engine ran correctly, produced correct outputs, had reasonable latency. Nothing in the output told me anything was wrong.&lt;/p&gt;

&lt;p&gt;That's the bad part. A crash would have been better.&lt;/p&gt;
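
&lt;p&gt;Since nothing in the build complains about the value, one option is to check it yourself before the build starts. Here's a minimal sketch of that idea. &lt;code&gt;validate_pool_size&lt;/code&gt; is a hypothetical helper, not part of trtexec, and it is deliberately strict: whole numbers only, and a single-letter suffix is required.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of trtexec): accept only
# "<whole number><B|K|M|G>", the single-letter form this post describes.
validate_pool_size() {
  size="$1"
  digits="${size%[BKMG]}"   # drop exactly one trailing unit letter, if any
  if [ "$digits" = "$size" ]; then
    echo "memPoolSize '$size': missing or unknown suffix (use B, K, M or G)" > /dev/stderr
    return 1
  fi
  case "$digits" in
    ""|*[!0-9]*)
      # Something other than digits is left over, e.g. "512Mi" from "512MiB".
      echo "memPoolSize '$size': expected digits plus one of B, K, M, G" > /dev/stderr
      return 1
      ;;
  esac
  return 0
}
```

&lt;p&gt;Run it on the size before calling trtexec and stop if it returns non-zero. &lt;code&gt;512MiB&lt;/code&gt; and &lt;code&gt;512MB&lt;/code&gt; both get rejected, &lt;code&gt;512M&lt;/code&gt; passes.&lt;/p&gt;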




&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# this silently gives you ~1KB of workspace&lt;/span&gt;
&lt;span class="nt"&gt;--memPoolSize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;workspace:512MiB

&lt;span class="c"&gt;# this actually gives you 512MB&lt;/span&gt;
&lt;span class="nt"&gt;--memPoolSize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;workspace:512M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Valid suffixes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Suffix&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;B&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;K&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kibibytes (1,024 bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;M&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mebibytes (1,024 KiB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;G&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gibibytes (1,024 MiB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Single letter. No &lt;code&gt;MB&lt;/code&gt;, no &lt;code&gt;MiB&lt;/code&gt;, no &lt;code&gt;GB&lt;/code&gt;, no &lt;code&gt;GiB&lt;/code&gt;.&lt;/p&gt;
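
&lt;p&gt;To see what a given value should come out to in bytes, the arithmetic can be sketched like this. &lt;code&gt;pool_size_bytes&lt;/code&gt; is a hypothetical helper, and it assumes 1024-based multipliers, which is what the 536870912-byte figure for &lt;code&gt;512M&lt;/code&gt; in the build log implies.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: convert "<whole number><B|K|M|G>" to bytes,
# assuming 1024-based multipliers.
pool_size_bytes() {
  size="$1"
  number="${size%[BKMG]}"     # part before the unit letter
  unit="${size#"$number"}"    # whatever remains: one letter, or empty
  case "$number" in
    ""|*[!0-9]*)
      echo "unsupported size: $size" > /dev/stderr
      return 1
      ;;
  esac
  case "$unit" in
    B) echo "$number" ;;
    K) echo $(( number * 1024 )) ;;
    M) echo $(( number * 1024 * 1024 )) ;;
    G) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)
      echo "unsupported size: $size" > /dev/stderr
      return 1
      ;;
  esac
}

pool_size_bytes 512M   # prints 536870912
```

&lt;p&gt;That number is the one to compare against whatever the build log reports.&lt;/p&gt;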




&lt;h2&gt;
  
  
  How to check if this happened to you
&lt;/h2&gt;

&lt;p&gt;Look at the build log. There should be a line like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[I] [TRT] Maximum workspace size: 536870912 bytes (512 MB)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it says something like &lt;code&gt;1024 bytes&lt;/code&gt; or anything under a few MB, your suffix was wrong and you got a degraded engine. Rebuild with &lt;code&gt;M&lt;/code&gt;.&lt;/p&gt;
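
&lt;p&gt;If you'd rather not eyeball it, the check can be scripted. This is a rough sketch that assumes the log contains a line shaped like the example above (&lt;code&gt;workspace size: N bytes&lt;/code&gt;); the exact wording may differ between TensorRT versions, so adjust the pattern to match your own log. Both function names are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical check. Assumes the log has a line of the form
# "... workspace size: <N> bytes ..." like the example in this post.
workspace_bytes_from_log() {
  # Print the byte count from the first matching line, or nothing.
  sed -n 's/.*[Ww]orkspace size: \([0-9][0-9]*\) bytes.*/\1/p' "$1" | head -n 1
}

check_workspace() {
  log_file="$1"
  min_bytes="$2"
  found="$(workspace_bytes_from_log "$log_file")"
  if [ -z "$found" ]; then
    echo "no workspace size line found in $log_file" > /dev/stderr
    return 2
  fi
  if [ "$found" -lt "$min_bytes" ]; then
    echo "workspace is only $found bytes, expected at least $min_bytes" > /dev/stderr
    return 1
  fi
  echo "workspace OK: $found bytes"
}
```

&lt;p&gt;Save the trtexec output to a file, then run something like &lt;code&gt;check_workspace build.log 536870912&lt;/code&gt;. A non-zero exit means the workspace is smaller than you asked for, or the line wasn't found at all.&lt;/p&gt;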

&lt;p&gt;You can also verify the flag syntax first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trtexec &lt;span class="nt"&gt;--help&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;memPoolSize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Does this fix NvMap?
&lt;/h2&gt;

&lt;p&gt;No. Limiting workspace reduces scratch memory usage during optimization but doesn't change where TRT allocates — it still goes through NvMap. The actual fix for that is a custom &lt;code&gt;IGpuAllocator&lt;/code&gt; that calls &lt;code&gt;cudaMalloc&lt;/code&gt; via ctypes, bypassing NvMap entirely. I wrote that up separately.&lt;/p&gt;




&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;I checked the trtexec source after hitting this. The suffix parser doesn't validate — it looks for known letters, and if it doesn't recognize the pattern it just falls back silently. No warning, no error, nothing in the logs.&lt;/p&gt;

&lt;p&gt;It should be a hard error. It isn't. So now you know.&lt;/p&gt;

</description>
      <category>tensorrt</category>
      <category>jetson</category>
      <category>cuda</category>
      <category>debugging</category>
    </item>
  </channel>
</rss>
