Setting Up and Using ONNX Runtime for C++ in Linux

If you want to run machine learning models in a native C++ application on Linux, ONNX Runtime is one of the most practical tools available. It gives you a fast inference engine for models stored in the ONNX format (.onnx files), which means you can train or export models elsewhere and then deploy them in a lightweight C++ program without having to bring along an entire Python environment.

That combination is especially useful when your application is already written in C++, whether that means a backend service, a robotics stack, a desktop application, or an embedded system. In those settings, Linux and CMake are likely already part of the workflow, so ONNX Runtime fits naturally into the existing build process.

In this post, I’ll walk through how to set up ONNX Runtime for C++ using CMake, and then show a simple image classification example to prove the setup works.

Setup

This is the project structure we'll follow:

onnx-classifier/
├── CMakeLists.txt
├── external/
│   └── onnxruntime/
├── models/
└── src/
    └── main.cpp

You should have a Linux system (duh) with GCC and CMake installed. You'll also need to download the correct ONNX Runtime build for your machine and an ONNX image classification model.

The Basics

Run the following to install the basic tools on Ubuntu or Debian:

sudo apt update
sudo apt install build-essential cmake

Or on Fedora, run:

sudo dnf install @development-tools cmake

The Runtime

Go to the ONNX Runtime Getting Started page, select your configuration, and follow the GitHub link to download the corresponding gzipped tarball.

  • For the configuration, select Linux, C++, X64, and Default CPU.

After downloading that, extract the contents.

tar xzf onnxruntime-linux-x64-<version>.tgz

Next, copy the include and lib directories from it into the folder external/onnxruntime. In your terminal, navigate to the folder where you extracted the runtime and run:

mkdir -p <path-to-project>/external/onnxruntime
cp -r include lib <path-to-project>/external/onnxruntime
  • The include directory contains the headers needed by your C++ source files. The lib directory contains the shared library.

The Model

Go to the ONNX Model Zoo and download the squeezenet1_1 ONNX model (it's really simple and will suffice for this guide). Then move or copy the model into the models folder:

cp squeezenet1_1_Opset18.onnx <path-to-project>/models

Writing the CMake Configuration

Here is a clean CMakeLists.txt that sets up the project and links it against ONNX Runtime:

cmake_minimum_required(VERSION 3.23)
project(onnx-classifier CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

set(ORT_ROOT ${CMAKE_SOURCE_DIR}/external/onnxruntime)

add_library(onnxruntime SHARED IMPORTED)
set_target_properties(onnxruntime PROPERTIES
    IMPORTED_LOCATION ${ORT_ROOT}/lib/libonnxruntime.so
    INTERFACE_INCLUDE_DIRECTORIES ${ORT_ROOT}/include
)

add_executable(onnx-classifier src/main.cpp)
target_link_libraries(onnx-classifier PRIVATE onnxruntime)

set_target_properties(onnx-classifier PROPERTIES
    BUILD_RPATH ${ORT_ROOT}/lib
)

This configuration does a few important things. It defines the project, enables C++17, points to the ONNX Runtime library and headers, links the executable against that library, and sets an rpath so the executable can find libonnxruntime.so during local development.

A lot of this is standard CMake boilerplate for any project. But in case you aren't familiar with some of the other commands, here's what they do:

  • add_library(onnxruntime SHARED IMPORTED) tells CMake to create a library target called onnxruntime without building it from source. Instead, it is an imported shared library: the compiled .so file already exists on disk within the project directory.
set_target_properties(onnxruntime PROPERTIES
    IMPORTED_LOCATION ${ORT_ROOT}/lib/libonnxruntime.so
    INTERFACE_INCLUDE_DIRECTORIES ${ORT_ROOT}/include
)
  • set_target_properties sets properties on the specified target. The full list of properties is in the CMake documentation.
  • The IMPORTED_LOCATION property tells CMake the exact path to the shared library file (libonnxruntime.so). When the executable is linked, this is the library file it will use.
  • The INTERFACE_INCLUDE_DIRECTORIES property tells CMake which header directory should be exposed to anything that links against the target. In practice, this means that when our executable links to onnxruntime, it automatically gets ${ORT_ROOT}/include added to its include path. That is why the compiler can find headers like onnxruntime_cxx_api.h without a manual include_directories() call.
set_target_properties(onnx-classifier PROPERTIES
    BUILD_RPATH ${ORT_ROOT}/lib
)
  • BUILD_RPATH embeds a runtime library search path into the executable during the build. In this case, it tells the dynamic loader to look in ${ORT_ROOT}/lib for shared libraries, so you don't have to set LD_LIBRARY_PATH manually during local development.

Using the Runtime

Now let’s write a small C++ program that loads the model, creates an input tensor, runs inference, and prints the predicted class index and its score. It uses a dummy input tensor rather than loading a real image from disk, to keep the code focused on the ONNX Runtime API. Even so, the program still performs actual inference through the model and produces a real output vector.

#include <algorithm>
#include <iostream>
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  try {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "classifier");
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);
    session_options.SetGraphOptimizationLevel(
        GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

    const char *model_path = "../models/squeezenet1_1_Opset18.onnx";
    Ort::Session session(env, model_path, session_options);

    Ort::AllocatorWithDefaultOptions allocator;

    auto input_name_allocated = session.GetInputNameAllocated(0, allocator);
    auto output_name_allocated = session.GetOutputNameAllocated(0, allocator);

    const char *input_name = input_name_allocated.get();
    const char *output_name = output_name_allocated.get();

    auto input_type_info = session.GetInputTypeInfo(0);
    auto input_tensor_info = input_type_info.GetTensorTypeAndShapeInfo();
    auto input_shape = input_tensor_info.GetShape();

    std::cout << "Model loaded successfully.\n";
    std::cout << "Input name: " << input_name << "\n";
    std::cout << "Output name: " << output_name << "\n";

    std::cout << "Input shape: [";
    for (size_t i = 0; i < input_shape.size(); ++i) {
      std::cout << input_shape[i];
      if (i + 1 < input_shape.size()) {
        std::cout << ", ";
      }
    }
    std::cout << "]\n";

    std::vector<int64_t> resolved_input_shape = input_shape;
    for (auto &dim : resolved_input_shape) {
      if (dim == -1) {
        dim = 1;
      }
    }

    size_t input_tensor_size = 1;
    for (auto dim : resolved_input_shape) {
      input_tensor_size *= static_cast<size_t>(dim);
    }

    std::vector<float> input_tensor_values(input_tensor_size, 0.5f);

    Ort::MemoryInfo memory_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        memory_info, input_tensor_values.data(), input_tensor_size,
        resolved_input_shape.data(), resolved_input_shape.size());

    const char *input_names[] = {input_name};
    const char *output_names[] = {output_name};

    auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_names,
                                      &input_tensor, 1, output_names, 1);

    if (output_tensors.empty() || !output_tensors[0].IsTensor()) {
      std::cerr << "Model did not return a valid output tensor.\n";
      return 1;
    }

    float *output_data = output_tensors[0].GetTensorMutableData<float>();
    auto output_info = output_tensors[0].GetTensorTypeAndShapeInfo();
    auto output_shape = output_info.GetShape();

    size_t output_size = 1;
    for (auto dim : output_shape) {
      output_size *= static_cast<size_t>(dim);
    }

    auto max_it = std::max_element(output_data, output_data + output_size);
    size_t predicted_class = std::distance(output_data, max_it);

    std::cout << "Output shape: [";
    for (size_t i = 0; i < output_shape.size(); ++i) {
      std::cout << output_shape[i];
      if (i + 1 < output_shape.size()) {
        std::cout << ", ";
      }
    }
    std::cout << "]\n";

    std::cout << "Predicted class index: " << predicted_class << "\n";
    std::cout << "Predicted score: " << *max_it << "\n";

  } catch (const Ort::Exception &e) {
    std::cerr << "ONNX Runtime error: " << e.what() << "\n";
    return 1;
  }

  return 0;
}


The program uses a decent number of things from the API. When looking through the documentation, start at the Ort namespace and search from there.
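As a small, standard-library-only extension of the argmax step in the program above, here is a sketch of pulling the top-k class indices from the output scores with std::partial_sort. The helper name top_k_indices is my own, not part of the ONNX Runtime API:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch: return the indices of the k highest-scoring classes,
// sorted from best to worst. A generalization of the single argmax.
std::vector<size_t> top_k_indices(const std::vector<float> &scores, size_t k) {
  std::vector<size_t> indices(scores.size());
  std::iota(indices.begin(), indices.end(), 0); // 0, 1, 2, ...
  k = std::min(k, indices.size());
  // Only the first k positions need to be fully sorted.
  std::partial_sort(indices.begin(), indices.begin() + k, indices.end(),
                    [&](size_t a, size_t b) { return scores[a] > scores[b]; });
  indices.resize(k);
  return indices;
}
```

In the program above you could call it as top_k_indices(std::vector<float>(output_data, output_data + output_size), 5) to print the five most likely classes instead of just one.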

Building and Running the Program

Now we have to compile everything together and run the executable. To do so, navigate back to the project root, create a build directory, cd into it, and then configure and build the project:

mkdir build
cd build
cmake ..
cmake --build .

And now we can run the executable!

./onnx-classifier
Model loaded successfully.
Input name: x
Output name: 117
Input shape: [1, 3, 224, 224]
Output shape: [1, 1000]
Predicted class index: 111
Predicted score: 5.98394
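A note on that score: SqueezeNet's output is a vector of raw logits, which is why the "score" is around 6 rather than a probability between 0 and 1. If you want probabilities, you can run the outputs through a numerically stable softmax. This is a standard-library-only sketch, not part of the program above:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch: convert raw logits into probabilities. Subtracting the max
// logit first keeps std::exp from overflowing (numerical stability).
std::vector<float> softmax(const std::vector<float> &logits) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    sum += probs[i];
  }
  for (auto &p : probs) {
    p /= sum; // normalize so the probabilities sum to 1
  }
  return probs;
}
```

Applied to the output above, softmax(...) at the predicted index would give you a confidence value you can threshold or display.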

Make sure you run it from the build directory; otherwise you'll get the following error saying that the ONNX model can't be found:

ONNX Runtime error: Load model from ../models/squeezenet1_1_Opset18.onnx failed:Load model ../models/squeezenet1_1_Opset18.onnx failed. File doesn't exist

Conclusion

At this point, you have completed a minimal ONNX Runtime setup for C++ on Linux using CMake. You downloaded the runtime, linked against its shared library, built a small native executable, and ran a real ONNX model from C++ code. Even though the example I gave used a dummy input tensor, the important part is that the full inference pipeline is now in place: load a model, inspect its inputs and outputs, create a tensor, run the session, and read the results.

This pattern is the foundation for just about everything else you might want to do with ONNX in C++. A more realistic application would replace the dummy tensor with actual preprocessed input data, like an image converted into the format the model expects. From there, you could move on to richer computer vision examples like image classification with real images, object detection, or segmentation.
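For a taste of what that preprocessing step typically looks like: ImageNet-style models such as SqueezeNet usually expect interleaved HWC uint8 pixels to be converted into a planar CHW float layout with per-channel mean/std normalization. This is a hedged sketch under that assumption; the helper name and the hard-coded 3-channel RGB layout are mine, and the mean/std constants are the common ImageNet values:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Sketch: convert an interleaved HWC uint8 RGB image into the
// normalized CHW float tensor layout that ImageNet-style models expect.
std::vector<float> hwc_to_chw_normalized(const std::vector<unsigned char> &hwc,
                                         size_t height, size_t width) {
  const std::array<float, 3> mean{0.485f, 0.456f, 0.406f};   // ImageNet means
  const std::array<float, 3> stddev{0.229f, 0.224f, 0.225f}; // ImageNet stds
  std::vector<float> chw(3 * height * width);
  for (size_t c = 0; c < 3; ++c) {
    for (size_t y = 0; y < height; ++y) {
      for (size_t x = 0; x < width; ++x) {
        // Scale to [0, 1], then normalize per channel.
        float pixel = hwc[(y * width + x) * 3 + c] / 255.0f;
        chw[c * height * width + y * width + x] = (pixel - mean[c]) / stddev[c];
      }
    }
  }
  return chw;
}
```

The resulting vector would replace input_tensor_values in the program above, with height and width matching the model's 224x224 input shape.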

Thanks for reading :)
