The 200ms Frame That Shouldn't Exist
My ROS2 depth estimation node was stuck at 5 FPS, which works out to 200ms per frame. The Jetson Orin was barely breaking a sweat: GPU utilization sat at 12%. Something was terribly wrong. I'd just migrated from a custom CUDA pipeline to "standard" ROS2 image processing, expecting cleaner code with similar performance. Instead, I got a 4x latency regression.
The culprit wasn't the algorithm. It was the data path.
Every frame was taking a round trip through CPU memory: CUDA tensor to CPU, CPU to ROS message, ROS message back to CPU, CPU back to CUDA for the next node. Each of those four hops cost 15-40ms depending on resolution. At 1080p, you're burning 80ms just shuffling pixels around before any actual computation happens.
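To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only reuses the figures quoted above (four hops, 15-40ms per hop, 80ms at 1080p, 5 FPS). The even split of the 80ms across hops is my own simplifying assumption, not a measurement, and `copy_budget` is a hypothetical helper name.

```python
# Back-of-the-envelope copy budget built from the figures quoted in the text.
HOPS = [
    "CUDA tensor -> CPU",
    "CPU -> ROS message",
    "ROS message -> CPU",
    "CPU -> CUDA (next node)",
]
HOP_COST_MS = (15.0, 40.0)    # per-hop range quoted in the text
FRAME_TIME_MS = 1000.0 / 5    # 5 FPS observed, so 200 ms per frame
COPY_1080P_MS = 80.0          # copy overhead quoted for 1080p


def copy_budget(n_hops, per_hop_range):
    """Return (best, worst) total copy time in ms, assuming hops run back to back."""
    best, worst = per_hop_range
    return n_hops * best, n_hops * worst


low_ms, high_ms = copy_budget(len(HOPS), HOP_COST_MS)

# Assumption: the 80 ms is spread evenly over the four hops.
per_hop_1080p = COPY_1080P_MS / len(HOPS)

# Fraction of each 200 ms frame spent purely on copies at 1080p.
copy_share = COPY_1080P_MS / FRAME_TIME_MS

print(f"copy overhead range: {low_ms:.0f}-{high_ms:.0f} ms per frame")
print(f"implied per-hop cost at 1080p: {per_hop_1080p:.0f} ms")
print(f"share of frame time spent copying: {copy_share:.0%}")
```

Under those assumptions, the copies alone land somewhere between 60 and 160ms per frame, and the 80ms quoted for 1080p eats about 40% of a 200ms frame. That goes a long way toward explaining how the GPU can sit nearly idle while the frame rate collapses.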
This is the problem NVIDIA's Isaac ROS solves with NITROS (NVIDIA Isaac Transport for ROS). But does it actually deliver? I ran the benchmarks so you don't have to.
What NITROS Actually Does Under the Hood
Continue reading the full article on TildAlice
