Jim Wang

What is an ML runtime?

Popular ML runtime frameworks include ExecuTorch, ONNX Runtime, and TensorRT. They bridge the gap between model training and model deployment.

ML models are usually trained with PyTorch on a GPU. But directly deploying them in PyTorch format is cumbersome: PyTorch's inference runtime, libtorch, is written in C++ and very heavy. Other ML runtimes, such as ExecuTorch (also from the PyTorch team), are built for embedded environments with a small footprint and efficiency as their primary goals.

An ML runtime is designed to run on different hardware backends: GPUs, NPUs, TPUs, CPUs, DSPs, and other accelerators. This lets models be delivered in a single format (a DAG that describes the relationships between tensors and operations, plus a binary of weights) and deployed on many different platforms.
