Background
Many scenarios inside the company use face detection and face recognition, and our image team independently developed the related face detection and face recognition models. Among them, face recognition and pose recognition run on the phone and perform on-device inference with the TensorFlow Lite engine; after a face is detected, the face data is sent to the server for face matching. Some special scenarios also rely on image technology, such as empty-room detection. Although HarmonyOS provides a face detection interface, to stay aligned with the results on Android and iOS we still need to run our self-developed models.
Our first thought for running the model locally was to compile the open-source TensorFlow Lite inference engine for HarmonyOS and then load the existing models for inference. This approach first requires porting the TFLite open-source C++ library to the HarmonyOS platform. Moreover, on Android, TFLite gets GPU acceleration because the engine's GPU delegate is adapted to the GPU inference APIs the platform provides; nobody has adapted it to HarmonyOS yet, and there is no adaptation plan in sight. Running on the CPU alone would likely give poor performance.
We then found that MindSpore Lite, the on-device AI framework provided by HarmonyOS, officially supports inference on converted TFLite models. MindSpore Lite is a lightweight, high-performance on-device AI engine. It provides standard model inference and training interfaces, ships with a high-performance operator library for general-purpose hardware, and natively supports the Neural Network Runtime Kit so that dedicated AI chips can accelerate inference, helping to build intelligent applications for all scenarios.
This article introduces our hands-on experience running inference on a TFLite model with the MindSpore Lite engine.
Model Conversion
MindSpore Lite performs inference on models in the .ms format. Models from third-party frameworks such as TensorFlow, TensorFlow Lite, Caffe, and ONNX can be converted into .ms models with the model conversion tool provided by MindSpore Lite, so the first step is to convert our TFLite model into an .ms model.
For demonstration purposes, we use the openly available face detection model from Google MediaPipe.
First, download the model conversion tool (it can also be built from source; here we use the prebuilt package):
| Component | Hardware Platform | Operating System | Link | SHA-256 |
| --- | --- | --- | --- | --- |
| On-device inference and training benchmark tool, converter tool, cropper tool | CPU | Linux-x86_64 | mindspore-lite-2.1.0-linux-x64.tar.gz | b267e5726720329200389e47a178c4f882bf526833b714ba6e630c8e2920fe89 |
The MindSpore Lite converter supports a number of parameters that we can use as needed; run ./converter_lite --help to print the full parameter list.
We convert the TFLite model into an .ms model with the following command:
./converter_lite --fmk=TFLITE --modelFile=facedetech.tflite --outputFile=facedetech
Model Deployment and Inference
The HarmonyOS MindSpore Lite inference engine provides two sets of APIs, C/C++ and ArkTS. Here we use the C/C++ interface as an example to show how to deploy the face detection model.
Creating the Context Environment
// Create and configure the context: set the runtime thread count to 2 and bind threads to big cores first
OH_AI_ContextHandle context = OH_AI_ContextCreate();
if (context == NULL) {
printf("OH_AI_ContextCreate failed.\n");
return OH_AI_STATUS_LITE_ERROR;
}
OH_AI_ContextSetThreadNum(context, 2);
OH_AI_ContextSetThreadAffinityMode(context, 1); // 1: prefer big cores
// Prioritize using NNRT for inference.
// Here, use the first NNRT hardware of the ACCELERATORS category found to create nnrt device information and set the hardware to use the high-performance mode for inference. You can also use interfaces such as OH_AI_GetAllNNRTDeviceDescs() to obtain the description information of all NNRT hardware in the current environment, search according to device names, types, and other information, and find a specific device as the NNRT inference hardware.
OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByType(OH_AI_NNRTDEVICE_ACCELERATOR);
if (nnrt_device_info == NULL) {
printf("OH_AI_DeviceInfoCreate failed.\n");
OH_AI_ContextDestroy(&context);
return OH_AI_STATUS_LITE_ERROR;
}
OH_AI_DeviceInfoSetPerformanceMode(nnrt_device_info, OH_AI_PERFORMANCE_HIGH);
OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);
// Then set CPU inference.
OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
if (cpu_device_info == NULL) {
printf("OH_AI_DeviceInfoCreate failed.\n");
OH_AI_ContextDestroy(&context);
return OH_AI_STATUS_LITE_ERROR;
}
OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
This creates a heterogeneous inference context combining NNRT (Neural Network Runtime) and the CPU. The acceleration hardware behind NNRT, such as an NPU, offers strong inference performance but supports a limited set of operators, while the general-purpose CPU is slower but covers a much wider operator set. MindSpore Lite supports heterogeneous inference between NNRT hardware and the CPU: model operators are scheduled to NNRT first, and any operator NNRT does not support falls back to the CPU.
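If selecting by type alone is not enough, the NNRT devices available on the current system can also be enumerated and chosen explicitly, as the comment above mentions. The following is a minimal sketch based on the OH_AI_GetAllNNRTDeviceDescs family of interfaces; treat it as illustrative, since the device names and ids reported depend entirely on the actual hardware.
// Enumerate all NNRT devices and create device info from a concrete device id
size_t desc_num = 0;
NNRTDeviceDesc *descs = OH_AI_GetAllNNRTDeviceDescs(&desc_num);
OH_AI_DeviceInfoHandle picked_device_info = NULL;
for (size_t i = 0; i < desc_num; ++i) {
    NNRTDeviceDesc *desc = OH_AI_GetElementOfNNRTDeviceDescs(descs, i);
    printf("NNRT device %zu: name=%s, type=%d\n", i,
           OH_AI_GetNameFromNNRTDeviceDesc(desc), (int)OH_AI_GetTypeFromNNRTDeviceDesc(desc));
    if (OH_AI_GetTypeFromNNRTDeviceDesc(desc) == OH_AI_NNRTDEVICE_ACCELERATOR) {
        picked_device_info = OH_AI_CreateNNRTDeviceInfoById(OH_AI_GetDeviceIdFromNNRTDeviceDesc(desc));
        break;
    }
}
OH_AI_DestroyAllNNRTDeviceDescs(&descs);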
Creating and Loading the Model
// Create the model
OH_AI_ModelHandle model = OH_AI_ModelCreate();
if (model == NULL) {
printf("OH_AI_ModelCreate failed.\n");
OH_AI_ContextDestroy(&context);
return OH_AI_STATUS_LITE_ERROR;
}
// Load and compile the model. The type of the model is OH_AI_MODELTYPE_MINDIR
int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);
if (ret != OH_AI_STATUS_SUCCESS) {
printf("OH_AI_ModelBuildFromFile failed, ret: %d.\n", ret);
OH_AI_ModelDestroy(&model);
return ret;
}
Use the OH_AI_ModelBuildFromFile function to load the model file from the local file path and create a Model.
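In a packaged application the .ms model usually ships in the HAP's rawfile directory rather than being passed in on the command line as argv[1] is here; in that case the model can also be built from a memory buffer. A minimal sketch, assuming model_buf and model_size have already been filled by your own resource-loading code:
// Build the model from an in-memory buffer (model_buf/model_size are assumed to be loaded elsewhere)
int build_ret = OH_AI_ModelBuild(model, model_buf, model_size, OH_AI_MODELTYPE_MINDIR, context);
if (build_ret != OH_AI_STATUS_SUCCESS) {
    printf("OH_AI_ModelBuild failed, ret: %d.\n", build_ret);
    OH_AI_ModelDestroy(&model);
    return build_ret;
}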
Feeding Data to the Model
// Obtain the input tensor
OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
if (inputs.handle_list == NULL) {
printf("OH_AI_ModelGetInputs failed, ret: %d.\n");
OH_AI_ModelDestroy(&model);
return ret;
}
//TODO Write camera data into inputs
Retrieve the memory address representing the tensor space from the model and write the camera data into this memory. The data here is obtained from the camera, and the specific acquisition method will be introduced later.
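As a rough sketch of what the TODO above involves: each camera frame is first decoded, resized to the model's input resolution, and normalized to floats; the resulting buffer is then copied into the first input tensor. The names preprocessed_pixels and preprocessed_size below are assumptions standing in for your own preprocessing output:
// Copy the preprocessed frame (float32, already resized and normalized) into the first input tensor
OH_AI_TensorHandle input_tensor = inputs.handle_list[0];
size_t tensor_size = OH_AI_TensorGetDataSize(input_tensor);
float *input_data = (float *)OH_AI_TensorGetMutableData(input_tensor);
if (input_data == NULL || preprocessed_size != tensor_size) {
    printf("input tensor is null or its size does not match the preprocessed data.\n");
    OH_AI_ModelDestroy(&model);
    return OH_AI_STATUS_LITE_ERROR;
}
memcpy(input_data, preprocessed_pixels, tensor_size);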
Performing Inference
// Execute model inference
OH_AI_TensorHandleArray outputs;
ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
if (ret != OH_AI_STATUS_SUCCESS) {
printf("OH_AI_ModelPredict failed, ret: %d.\n", ret);
OH_AI_ModelDestroy(&model);
return ret;
}
Use the OH_AI_ModelPredict function for inference here.
Obtaining the Inference Results
// Obtain the output tensors of the model and print their metadata
for (size_t i = 0; i < outputs.handle_num; ++i) {
    OH_AI_TensorHandle tensor = outputs.handle_list[i];
    int64_t element_num = OH_AI_TensorGetElementNum(tensor);
    printf("Tensor name: %s, tensor size is %zu, elements num: %lld.\n", OH_AI_TensorGetName(tensor),
           OH_AI_TensorGetDataSize(tensor), element_num);
}
// Post-process the detection outputs: handle_list[1] holds the anchor scores, handle_list[0] the box regressions
const float *out1 = (const float *)OH_AI_TensorGetData(outputs.handle_list[1]);
std::vector<float> scores(out1, out1 + 16 * 16);
const float *out2 = (const float *)OH_AI_TensorGetData(outputs.handle_list[0]);
std::vector<float> boxes(out2, out2 + 16 * 16);
std::vector<float> highBox;
for (size_t j = 0; j < scores.size(); j++) {
    // Apply sigmoid to turn the raw logit into a confidence score
    scores[j] = 1.0f / (1.0f + std::exp(-scores[j]));
    if (scores[j] > 0.9f) {
        // Each anchor regresses 16 values: box center/size plus 6 key points
        for (size_t k = 0; k < 16; k++) {
            highBox.push_back(boxes[j * 16 + k]);
        }
        float sx = highBox[0];
        float sy = highBox[1];
        float w = highBox[2];
        float h = highBox[3];
        std::cout << "MS_LITE_LOG_AI: score: sx = " << sx << ", sy = " << sy << ", w = " << w << ", h = " << h << std::endl;
        float cx = sx;
        float cy = sy;
        cx /= 100.0f; // Assume modelInputWidth is 100
        cy /= 100.0f; // Assume modelInputHeight is 100
        float topleftX = cx - w * 0.5f;
        float topleftY = cy - h * 0.5f;
        float btmrightX = cx + w * 0.5f;
        float btmrightY = cy + h * 0.5f;
        std::cout << "MS_LITE_LOG_AI: score: " << scores[j] << std::endl;
        std::cout << "MS_LITE_LOG_AI: topleft: " << topleftX << "," << topleftY << std::endl;
        std::cout << "MS_LITE_LOG_AI: btmright: " << btmrightX << "," << btmrightY << std::endl;
        // The remaining 12 values are 6 facial key points (x, y pairs)
        for (size_t k = 0; k < 6; k++) {
            float lx = highBox[4 + (2 * k) + 0];
            float ly = highBox[4 + (2 * k) + 1];
            lx /= 100.0f;
            ly /= 100.0f;
            std::cout << "MS_LITE_LOG_AI: key[" << k << "]: " << lx << "," << ly << std::endl;
        }
        break;
    }
}
// Note: the output tensor memory is managed by the framework and must not be freed manually
The inference results are read from the output tensors: the scores are passed through a sigmoid, and the first anchor whose score exceeds 0.9 is decoded into a bounding box and six facial key points.
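The box and key-point values above are normalized to the model input, so to draw them on the preview they still need to be scaled to the frame resolution, and the model handle should be released once inference is no longer needed. A minimal sketch, where frameWidth and frameHeight are assumed to be the preview resolution chosen below:
// Scale the normalized detection back to preview-frame pixels (frameWidth/frameHeight are assumptions)
int left = (int)(topleftX * frameWidth);
int top = (int)(topleftY * frameHeight);
int right = (int)(btmrightX * frameWidth);
int bottom = (int)(btmrightY * frameHeight);
printf("face rect in frame: [%d, %d, %d, %d]\n", left, top, right, bottom);
// Release the model when inference is no longer needed
OH_AI_ModelDestroy(&model);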
To achieve real-time face detection, we need to continuously read frames from the camera and feed them to the inference engine. The camera is opened and configured through the camera interfaces of the HarmonyOS media kit.
// Obtain the camera manager and pick a camera device (here simply the first one)
let cameraManager = camera.getCameraManager(context);
let camerasInfo = cameraManager.getSupportedCameras();
let cameraDevice = camerasInfo[0];
// Create and open the camera input
let cameraInput = cameraManager.createCameraInput(cameraDevice);
await cameraInput.open();
// Create a capture session and start configuring it
let captureSession = cameraManager.createCaptureSession();
captureSession.beginConfig();
captureSession.addInput(cameraInput);
// Preview output rendered to the XComponent surface (this.surfaceId comes from the XComponent)
let previewProfiles = cameraManager.getSupportedOutputCapability(cameraDevice).previewProfiles;
let previewProfile = previewProfiles[previewProfileIndex]; // previewProfileIndex selects the desired resolution
let componentSurfacePreviewOutput = cameraManager.createPreviewOutput(previewProfile, this.surfaceId);
captureSession.addOutput(componentSurfacePreviewOutput);
// Image receiver that delivers the preview frames used for inference
let mReceiver = image.createImageReceiver(
  previewProfile.size.width,
  previewProfile.size.height,
  image.ImageFormat.JPEG,
  8
);
let receivingSurfaceId = await mReceiver.getReceivingSurfaceId();
let imageReceiverPreviewOutput = cameraManager.createPreviewOutput(previewProfile, receivingSurfaceId);
captureSession.addOutput(imageReceiverPreviewOutput);
// Read each arriving frame and hand its data to the inference engine
mReceiver.on('imageArrival', async () => {
  let imageData = await mReceiver.readNextImage();
  let imageJPEGComponent = await imageData.getComponent(image.ComponentType.JPEG);
  // Hand over the imageJPEGComponent.byteBuffer data to the inference engine
  await imageData.release();
});
await captureSession.commitConfig();
await startPreview(); // start the preview stream, e.g. by calling captureSession.start()
References
- Using MindSpore Lite for Model Conversion
- Using the MindSpore Lite Engine for Model Inference (C/C++)
Summary
This article introduced the MindSpore Lite inference engine provided by HarmonyOS and how to convert an existing TFLite model into the .ms format. We then used the HarmonyOS camera interfaces to open and configure the camera and created a capture session to preview the video stream in real time, while an image receiver collects the camera frames and hands them to the inference engine, whose results are then post-processed. A complete implementation also needs to request the camera permission and handle a few other details, which are omitted here to keep the article concise. Once the code is cleaned up, the project will be open-sourced for everyone to learn from.