
Tushar Thokdar



512MiB ≠ 512MB — the silent trtexec bug

This is a short one. But it cost me a full day, so.


What I was trying to do

I was building a TensorRT FP16 engine on a Jetson Orin Nano. The device has a 512 MB NvMap CMA ceiling, which is a hard kernel memory limit. The FP16 build needed 606 MB, so the build failed.

After some research, I found that limiting the TRT workspace might help. So I tried:

trtexec \
  --onnx=benchmarks/prithvi_4f.onnx \
  --saveEngine=benchmarks/prithvi_fp16.trt \
  --fp16 \
  --memPoolSize=workspace:512MiB

It built. No errors. Engine saved. 591 MB on disk. Inference ran.

I thought I'd fixed it. I hadn't.


What's actually happening

The --memPoolSize flag has a strict set of valid suffixes: B, K, M, G.

MiB is not one of them.

When trtexec sees an unrecognized suffix it doesn't error. It doesn't warn. It silently strips the iB part and falls back to some default unit, parsing 512MiB as roughly 1 KB of workspace.

So TRT built the engine with effectively no scratch memory. It compiled because TRT can always find some kernel tactic — just not the good ones. The engine ran correctly, produced correct outputs, had reasonable latency. Nothing in the output told me anything was wrong.

That's the bad part. A crash would have been better.


The fix

# this silently gives you ~1KB of workspace
--memPoolSize=workspace:512MiB

# this actually gives you 512MB
--memPoolSize=workspace:512M

Valid suffixes:

Suffix | Meaning
B | Bytes
K | Kilobytes (base-2: 1024 B)
M | Megabytes (base-2: 1024 K, so 512M = 536870912 bytes)
G | Gigabytes (base-2: 1024 M)

Single letter. No MB, no MiB, no GB, no GiB.
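
Since trtexec won't reject a bad suffix, one option is to check the value yourself before the build. Here is a minimal sketch in POSIX shell. The function name check_mempool_size is made up for this post, and it assumes a single pool:size pair with an integer size and an explicit suffix, so it is deliberately stricter than trtexec.

```shell
# Hypothetical guard (not part of trtexec): accept only values shaped like
# <pool>:<integer><B|K|M|G>, e.g. workspace:512M.
check_mempool_size() {
  if printf '%s\n' "$1" | grep -Eq '^[A-Za-z]+:[0-9]+[BKMG]$'; then
    return 0
  fi
  echo "invalid memPoolSize value: $1 (suffix must be B, K, M or G)" >&2
  return 1
}

# Example use (commented out so the snippet runs without trtexec installed):
# POOL="workspace:512M"
# check_mempool_size "$POOL" && trtexec --onnx=model.onnx --fp16 --memPoolSize="$POOL"
```

With this in place, workspace:512MiB fails loudly in your own script instead of silently inside the build.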


How to check if this happened to you

Look at the build log. There should be a line like:

[I] [TRT] Maximum workspace size: 536870912 bytes (512 MB)

If it says something like 1024 bytes or anything under a few MB, your suffix was wrong and you got a degraded engine. Rebuild with M.
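
If you build engines from a script, the same check can be automated. The sketch below is POSIX shell and assumes the build log contains a line shaped like the one quoted above; the exact wording may differ between TensorRT versions, so treat the pattern as an assumption. The function names workspace_bytes and check_workspace are hypothetical.

```shell
# Hypothetical helper: print the workspace size in bytes from a saved build log.
# Assumes a line containing "Maximum workspace size: <N> bytes".
workspace_bytes() {
  sed -n 's/.*Maximum workspace size: \([0-9][0-9]*\) bytes.*/\1/p' "$1" | head -n 1
}

# Fail if the logged workspace is missing or smaller than the minimum (in bytes).
check_workspace() {
  log_file=$1
  min_bytes=$2
  bytes=$(workspace_bytes "$log_file")
  if [ -z "$bytes" ]; then
    echo "no workspace size line found in $log_file" >&2
    return 2
  fi
  if [ "$bytes" -lt "$min_bytes" ]; then
    echo "workspace is only $bytes bytes, expected at least $min_bytes" >&2
    return 1
  fi
  echo "workspace OK: $bytes bytes"
}
```

To use it, save the build output to a file (for example by piping trtexec through tee build.log) and then run check_workspace build.log $((512 * 1024 * 1024)).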

You can also verify the flag syntax first:

trtexec --help | grep memPoolSize

Does this fix NvMap?

No. Limiting workspace reduces scratch memory usage during optimization but doesn't change where TRT allocates — it still goes through NvMap. The actual fix for that is a custom IGpuAllocator that calls cudaMalloc via ctypes, bypassing NvMap entirely. I wrote that up separately.


One more thing

I checked the trtexec source after hitting this. The suffix parser doesn't validate — it looks for known letters, and if it doesn't recognize the pattern it just falls back silently. No warning, no error, nothing in the logs.

It should be a hard error. It isn't. So now you know.
