When allocating GPU memory for TensorRT I/O tensors in Python, the most
natural code path leads to a hard process crash that leaves no traceback and
only appears at runtime inside a GStreamer pipeline. This article documents
the three conditions that must align for the crash to occur, explains why it
is invisible in unit tests, and provides a safe replacement for the broken
iteration pattern.
The problem
ICudaEngine.get_tensor_shape() returns a trt.Dims object.
Iterating over it with tuple() / list() works fine in a plain Python
script but crashes the process hard (exit 134 SIGABRT or exit 139 SIGSEGV)
when the same code runs inside a GStreamer plugin loaded through the
PyGObject / GLib Python wrapper. No Python traceback is produced because
the fault occurs in native code.
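Since the interpreter itself prints nothing when native code faults, one option worth trying is Python's standard `faulthandler` module, which installs handlers for fatal signals such as SIGSEGV and SIGABRT and dumps the Python-level stack when one arrives. The sketch below is an assumption about how it could be wired into a plugin module, not something the original investigation used; the helper name `enable_crash_traceback` and the log path are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path: str):
    """Ask faulthandler to dump Python tracebacks to log_path on fatal signals.

    Hypothetical helper: call it once, early in the plugin module, before
    the first buffer is processed.
    """
    # The file object must stay open for the lifetime of the process,
    # because faulthandler writes to its file descriptor from the signal handler.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Whether the dump points at the `tuple(dims)` line in this particular case is untested here; the dump only covers threads that have a Python thread state, so treat it as a diagnostic aid rather than a guarantee.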
Conditions required to reproduce
The crash only occurs when all three of the following are true at the same
time:
| # | Condition | Why it matters |
|---|---|---|
| 1 | The engine was pre-converted from ONNX to a serialised TRT .engine file and is loaded at runtime via Runtime.deserialize_cuda_engine | A freshly built engine (ONNX compiled in the same process) does not trigger the issue; the internal memory layout of the Dims object differs between the two paths |
| 2 | The code runs inside a GStreamer plugin loaded by the PyGObject / GLib Python wrapper (gi.repository) | PyGObject installs its own allocator hooks that alter how Pybind11 objects interact with the Python sequence protocol; the same code in a plain Python script or pytest session is unaffected |
| 3 | The tensor rank is 4-D or higher | Lower-rank tensors happen to stay within the bounds that __iter__ over-reads, so the crash does not occur for them |
In practice this means the bug is easy to miss: unit tests that load the engine
directly (without GStreamer) pass without error, and the crash only surfaces
at runtime inside the pipeline.
Context: how trt.Dims is used
The shape read from get_tensor_shape is used to pre-allocate CuPy GPU
buffers for each I/O tensor, whose device pointers are then registered with
the execution context before calling execute_async_v3.
Passing the trt.Dims object directly to cp.zeros() is the natural
approach — CuPy accepts any sequence as the shape argument:
buf = cp.zeros(dims, dtype=dtype) # CuPy internally calls tuple(dims) → crash
CuPy converts the shape argument to a tuple internally, which triggers
trt.Dims.__iter__ and causes the crash — even though dims is never
explicitly converted in user code.
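To make the implicit conversion visible, here is a small pure-Python sketch. `TracingDims` and `library_style_alloc` are hypothetical stand-ins (for `trt.Dims` and for a library function that normalises its shape argument); neither is real TensorRT or CuPy code. It shows that `tuple(obj)` goes through `__iter__` when the object defines one, while index-based access never touches it.

```python
class TracingDims:
    """Hypothetical stand-in for trt.Dims that records which methods are called."""

    def __init__(self, *values):
        self._values = values
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return len(self._values)

    def __getitem__(self, index):
        self.calls.append("__getitem__")
        return self._values[index]

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(self._values)


def library_style_alloc(shape):
    """Hypothetical stand-in for a library that calls tuple() on its shape argument."""
    return tuple(shape)


dims = TracingDims(1, 3, 256, 256)
library_style_alloc(dims)
iter_path_calls = list(dims.calls)   # includes "__iter__", never "__getitem__"

dims.calls.clear()
shape = tuple(dims[i] for i in range(len(dims)))
index_path_calls = list(dims.calls)  # only "__len__" and "__getitem__"
```

The user code in the first case never mentions iteration, yet `__iter__` is still the method that runs.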
Where the crash does NOT appear
Running the following in an ordinary Python interpreter or pytest session
completes without error:
import tensorrt as trt

trt_logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(trt_logger) as rt:
    engine = rt.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    dims = engine.get_tensor_shape(name)
    shape = tuple(dims)  # works here
    print(name, shape)
Where the crash DOES appear
The same iteration crashes when it runs inside a GStreamer BaseTransform
plugin written in Python and loaded by GStreamer via the PyGObject / GLib
introspection layer (gi.repository).
Minimal plugin skeleton that triggers the crash:
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstBase", "1.0")
from gi.repository import Gst, GstBase
import tensorrt as trt

Gst.init(None)

class MyTransform(GstBase.BaseTransform):
    __gstmetadata__ = ("Test", "Transform", "Test", "Author")
    __gsttemplates__ = ()

    def do_transform_ip(self, buffer):
        # engine loaded earlier via trt.Runtime.deserialize_cuda_engine
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            dims = self.engine.get_tensor_shape(name)
            shape = tuple(dims)  # ← SIGSEGV / exit 139 for 4-D tensors
        return Gst.FlowReturn.OK
Running a pipeline that processes a single buffer through MyTransform
terminates with:
Process finished with exit code 139 (SIGSEGV)
or occasionally:
Process finished with exit code 134 (SIGABRT)
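Because the failure is a signal rather than an exception, a test can only observe it from outside the crashing process. The following is a minimal sketch, assuming a POSIX system, of running a snippet in a child interpreter and decoding the result. The helper names `describe_returncode` and `run_isolated` are hypothetical. Shells report a signal death as 128 + N (so 134 is signal 6, SIGABRT, and 139 is signal 11, SIGSEGV), while Python's `subprocess` reports it as -N.

```python
import signal
import subprocess
import sys


def describe_returncode(code: int) -> str:
    """Translate a process return code into a readable description."""
    if code < 0:
        signum = -code          # subprocess convention: -N means killed by signal N
    elif code > 128:
        signum = code - 128     # shell convention: 128 + N means killed by signal N
    else:
        return f"exited with status {code}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        return f"killed by unknown signal {signum}"


def run_isolated(script: str, timeout: float = 60.0) -> str:
    """Run a Python snippet in a child interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", script], timeout=timeout)
    return describe_returncode(proc.returncode)
```

A pipeline-level test could pass a script that builds and runs the GStreamer pipeline and then assert that the description does not start with "killed by". That wiring is left out because it depends on the plugin's registration details.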
What differs in the GLib/PyGObject context
- PyGObject installs its own memory allocator hooks and modifies how Python objects manage reference counts across the GLib/GObject boundary.
- The trt.Dims object returned by get_tensor_shape is a thin Pybind11 wrapper around a C++ struct. Its __iter__ implementation appears to rely on Python's sequence protocol in a way that is sensitive to allocator or GIL state changes introduced by PyGObject.
- The same object's __len__ and __getitem__ (index access) remain functional in both contexts.
Additionally, dims.nbDims — the C++ API attribute that would give the rank —
is not exposed on the Python object, so it cannot be used as a guard:
dims.nbDims # AttributeError in both contexts
Fix
Avoid __iter__ entirely; use __len__ to get the rank and index access to
read each dimension:
dims = engine.get_tensor_shape(name)
shape = tuple(dims[i] for i in range(len(dims)))
This produces the correct result (e.g. (1, 3, 256, 256)) in both plain
Python and inside a PyGObject-loaded GStreamer plugin, and can be passed safely
to cp.zeros:
dims = engine.get_tensor_shape(name)
shape = tuple(dims[i] for i in range(len(dims)))
dtype = trt.nptype(engine.get_tensor_dtype(name))
buf = cp.zeros(shape, dtype=dtype)
context.set_tensor_address(name, buf.data.ptr)
# later, inside inference:
context.execute_async_v3(stream_handle=stream.ptr)
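One way to keep the workaround from being forgotten is to centralise it in a helper and exercise that helper in unit tests against a stand-in whose `__iter__` fails loudly. The sketch below does that. `dims_to_shape`, `collect_io_shapes`, `FakeDims` and `FakeEngine` are hypothetical names introduced for illustration, and the fakes only mimic the three engine members used in this article (`num_io_tensors`, `get_tensor_name`, `get_tensor_shape`).

```python
from typing import Dict, Tuple


def dims_to_shape(dims) -> Tuple[int, ...]:
    """Convert a Dims-like object to a tuple using only len() and indexing."""
    return tuple(int(dims[i]) for i in range(len(dims)))


def collect_io_shapes(engine) -> Dict[str, Tuple[int, ...]]:
    """Map each I/O tensor name to its shape without iterating over Dims."""
    shapes = {}
    for index in range(engine.num_io_tensors):
        name = engine.get_tensor_name(index)
        shapes[name] = dims_to_shape(engine.get_tensor_shape(name))
    return shapes


class FakeDims:
    """Test stand-in: indexing works, iteration raises instead of crashing."""

    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __iter__(self):
        raise RuntimeError("Dims was iterated; use dims_to_shape instead")


class FakeEngine:
    """Test stand-in exposing only the engine members used above."""

    def __init__(self, tensors: Dict[str, FakeDims]):
        self._tensors = tensors
        self._names = list(tensors)

    @property
    def num_io_tensors(self) -> int:
        return len(self._names)

    def get_tensor_name(self, index: int) -> str:
        return self._names[index]

    def get_tensor_shape(self, name: str) -> FakeDims:
        return self._tensors[name]
```

With the fake in place, any code path that falls back to `tuple(dims)` or `list(dims)` fails in an ordinary pytest run with a readable exception, instead of passing in tests and aborting later inside the pipeline.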
Environment
| Component | Details |
|---|---|
| OS | Ubuntu 24.04 |
| Python | 3.12 |
| GStreamer | 1.26.5 (PyGObject / gi.repository) |
| TensorRT package | tensorrt / tensorrt_bindings |
| Tensor rank where crash is observed | 4-D and above |