kouwei qing

HarmonyOS Native Intelligence: Face Detection Practice

Background

Many scenarios in our company require face detection and recognition. Our image team has developed in-house models for face detection and recognition: face recognition and pose recognition run on mobile devices, using the TensorFlow engine for inference, and once a face is detected the data is sent to the server for face matching. Special scenarios such as empty-room detection also rely on these image technologies. Although HarmonyOS provides its own face detection interfaces, matching the results we achieve on Android and iOS requires running our self-developed models.

Our initial approach to running local models was to compile the open-source TensorFlow Lite inference engine for HarmonyOS and load the existing models for inference. This requires building the TFLite open-source C++ library against the HarmonyOS toolchain. In addition, TFLite on Android supports GPU acceleration through its GPU delegate, but HarmonyOS currently lacks such an adaptation and has announced no plans for one; relying solely on the CPU may result in poor performance.

Later, we discovered that HarmonyOS's official on-device AI framework, MindSpore Lite, supports inference with converted TFLite models. MindSpore Lite is a lightweight, high-performance edge AI engine that provides standard model inference and training interfaces, built-in high-performance operator libraries for general-purpose hardware, and native support for the Neural Network Runtime Kit (NNRT), which enables inference acceleration on dedicated AI chips, facilitating full-scenario intelligent applications.

This article walks through face detection inference on a converted TFLite model using MindSpore Lite.

Model Conversion

MindSpore Lite uses .ms format models for inference. For third-party framework models like TensorFlow, TensorFlow Lite, Caffe, and ONNX, MindSpore Lite's model conversion tools can convert them to .ms models. Thus, we first convert the TFLite model to .ms. For demonstration, we use the open-source face detection model from Google MediaPipe.

First, install the model conversion tool (it can also be built from source; here we use the prebuilt package):

| Component | Hardware Platform | OS | Link | SHA-256 |
| --- | --- | --- | --- | --- |
| Edge Inference & Training Benchmark Tool, Converter Tool, Cropper Tool | CPU | Linux-x86_64 | mindspore-lite-2.1.0-linux-x64.tar.gz | b267e5726720329200389e47a178c4f882bf526833b714ba6e630c8e2920fe89 |

MindSpore Lite's model conversion tool offers various parameter settings. Run ./converter_lite --help to get command parameters. Convert the TFLite model to .ms using:

./converter_lite --fmk=TFLITE --modelFile=facedetech.tflite --outputFile=facedetech

Model Deployment and Inference

The HarmonyOS MindSpore Lite inference engine provides both C/C++ and ArkTS APIs. Here, we use the C/C++ interfaces to demonstrate model deployment.

First, create the context environment:

// Create and configure context, set runtime thread count to 2 with big-core priority binding
OH_AI_ContextHandle context = OH_AI_ContextCreate();
if (context == NULL) {
  printf("OH_AI_ContextCreate failed.\n");
  return OH_AI_STATUS_LITE_ERROR;
}
// Prioritize NNRT inference.
// Use the first found NNRT hardware of type ACCELERATORS to create device info with high-performance mode.
// Alternatively, use OH_AI_GetAllNNRTDeviceDescs() to get all NNRT hardware descriptions and select by name/type.
OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByType(OH_AI_NNRTDEVICE_ACCELERATOR);
if (nnrt_device_info == NULL) {
  printf("OH_AI_DeviceInfoCreate failed.\n");
  OH_AI_ContextDestroy(&context);
  return OH_AI_STATUS_LITE_ERROR;
}
OH_AI_DeviceInfoSetPerformanceMode(nnrt_device_info, OH_AI_PERFORMANCE_HIGH);
OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);

// Then set up CPU inference.
OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
if (cpu_device_info == NULL) {
  printf("OH_AI_DeviceInfoCreate failed.\n");
  OH_AI_ContextDestroy(&context);
  return OH_AI_STATUS_LITE_ERROR;
}
OH_AI_ContextAddDeviceInfo(context, cpu_device_info);

This creates a heterogeneous inference context spanning NNRT (Neural Network Runtime) and the CPU. NNRT interfaces with acceleration hardware such as NPUs, offering strong inference performance but limited operator support; the CPU is slower but covers far more operators. MindSpore Lite supports heterogeneous inference: operators are scheduled onto NNRT first, and unsupported ones fall back to the CPU.
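
The context code above selects NNRT hardware by type; the comment also mentions OH_AI_GetAllNNRTDeviceDescs for selecting by name or ID. A minimal sketch of that alternative, assuming the OH_AI_GetAllNNRTDeviceDescs family of functions from MindSpore Lite's context header:

// Enumerate all NNRT devices and bind one by ID instead of by type.
size_t num = 0;
NNRTDeviceDesc *descs = OH_AI_GetAllNNRTDeviceDescs(&num);
if (descs != NULL && num > 0) {
  for (size_t i = 0; i < num; ++i) {
    NNRTDeviceDesc *desc = OH_AI_GetElementOfNNRTDeviceDescs(descs, i);
    printf("NNRT device[%zu]: name=%s, type=%d\n", i,
           OH_AI_GetNameFromNNRTDeviceDesc(desc),
           (int)OH_AI_GetTypeFromNNRTDeviceDesc(desc));
  }
  // Bind the first device found by its ID.
  OH_AI_DeviceInfoHandle info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_NNRT);
  OH_AI_DeviceInfoSetDeviceId(info,
      OH_AI_GetDeviceIdFromNNRTDeviceDesc(OH_AI_GetElementOfNNRTDeviceDescs(descs, 0)));
  OH_AI_ContextAddDeviceInfo(context, info);
  OH_AI_DestroyAllNNRTDeviceDescs(&descs);
}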

Next, create and load the model:

// Create model
OH_AI_ModelHandle model = OH_AI_ModelCreate();
if (model == NULL) {
  printf("OH_AI_ModelCreate failed.\n");
  OH_AI_ContextDestroy(&context);
  return OH_AI_STATUS_LITE_ERROR;
}

// Load and compile model (model type: OH_AI_MODELTYPE_MINDIR)
int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);
if (ret != OH_AI_STATUS_SUCCESS) {
  printf("OH_AI_ModelBuildFromFile failed, ret: %d.\n", ret);
  OH_AI_ModelDestroy(&model);
  return ret;
}

Load the model file via OH_AI_ModelBuildFromFile to create the Model.
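
In a packaged app, the model usually ships inside the HAP's resources rather than at a plain file path. As a sketch, the model can instead be read from rawfile into memory and built with OH_AI_ModelBuild; nativeResMgr (a NativeResourceManager obtained from ArkTS) and the file name facedetech.ms are assumptions here:

// Read the model from resources/rawfile into a buffer, then build from memory.
// Requires <rawfile/raw_file.h> and a NativeResourceManager (nativeResMgr).
RawFile *rawFile = OH_ResourceManager_OpenRawFile(nativeResMgr, "facedetech.ms");
long modelSize = OH_ResourceManager_GetRawFileSize(rawFile);
std::vector<char> modelBuf(modelSize);
OH_ResourceManager_ReadRawFile(rawFile, modelBuf.data(), modelSize);
OH_ResourceManager_CloseRawFile(rawFile);

ret = OH_AI_ModelBuild(model, modelBuf.data(), modelBuf.size(),
                       OH_AI_MODELTYPE_MINDIR, context);
if (ret != OH_AI_STATUS_SUCCESS) {
  printf("OH_AI_ModelBuild failed, ret: %d.\n", ret);
  OH_AI_ModelDestroy(&model);
  return ret;
}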

Next, feed data to the model:

// Get input tensors
OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
if (inputs.handle_list == NULL) {
  printf("OH_AI_ModelGetInputs failed, ret: %d.\n", ret);
  OH_AI_ModelDestroy(&model);
  return ret;
}

//TODO Write camera data to inputs

Retrieve the memory address representing the tensor space and write camera data into it. Camera data acquisition is detailed below.
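
A minimal sketch of that TODO, assuming the camera frame has already been decoded, resized to the model's input resolution, and normalized into a hypothetical float buffer named preprocessed_frame:

// Write the preprocessed frame into the first input tensor.
OH_AI_TensorHandle input = inputs.handle_list[0];
size_t data_size = OH_AI_TensorGetDataSize(input);
float *input_data = (float *)OH_AI_TensorGetMutableData(input);
if (input_data == NULL) {
  printf("OH_AI_TensorGetMutableData failed.\n");
  OH_AI_ModelDestroy(&model);
  return OH_AI_STATUS_LITE_ERROR;
}
// preprocessed_frame is a hypothetical buffer holding data_size bytes of pixels.
memcpy(input_data, preprocessed_frame, data_size);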

Perform inference:

// Execute model inference
OH_AI_TensorHandleArray outputs;
ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
if (ret != OH_AI_STATUS_SUCCESS) {
  printf("OH_AI_ModelPredict failed, ret: %d.\n", ret);
  OH_AI_ModelDestroy(&model);
  return ret;
}

Use OH_AI_ModelPredict for inference.
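
To check whether NNRT acceleration actually takes effect, a simple approach is to time the call with standard C++ and compare a CPU-only context against the heterogeneous one; this sketch assumes no MindSpore-specific profiling API:

// Measure inference latency with <chrono>.
auto start = std::chrono::steady_clock::now();
ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
auto end = std::chrono::steady_clock::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
printf("Inference took %lld ms.\n", (long long)ms);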

Finally, get inference results:

// Get and print model output tensors
for (size_t i = 0; i < outputs.handle_num; ++i) {
  OH_AI_TensorHandle tensor = outputs.handle_list[i];
  int64_t element_num = OH_AI_TensorGetElementNum(tensor);
  printf("Tensor name: %s, tensor size is %zu, elements num: %lld.\n", OH_AI_TensorGetName(tensor),
         OH_AI_TensorGetDataSize(tensor), element_num);
}

// Parse the detection results. One output tensor holds the anchor scores,
// the other the box/keypoint regressions (16 values per anchor).
// The tensor memory is owned by the framework; do not free it manually.
const float *out1 = (const float *)OH_AI_TensorGetData(outputs.handle_list[1]);
std::vector<float> scores(out1, out1 + 16 * 16);
const float *out2 = (const float *)OH_AI_TensorGetData(outputs.handle_list[0]);
std::vector<float> boxes(out2, out2 + 16 * 16);

for (size_t j = 0; j < scores.size(); j++) {
    // Squash the raw logit into a probability with the sigmoid function.
    scores[j] = 1.0f / (1.0f + std::exp(-scores[j]));
    if (scores[j] > 0.9f) {
        // Collect the 16 regression values of anchor j:
        // box center and size first, then 6 keypoints as (x, y) pairs.
        std::vector<float> highBox;
        for (size_t k = 0; k < 16; k++) {
            highBox.push_back(boxes[j * 16 + k]);
        }
        float sx = highBox[0];
        float sy = highBox[1];
        float w = highBox[2];
        float h = highBox[3];
        std::cout << "MS_LITE_LOG_AI: raw: sx = " << sx << ", sy = " << sy << ", w = " << w << ", h = " << h << std::endl;

        float cx = sx / 100.0f; // Assume modelInputWidth is 100
        float cy = sy / 100.0f; // Assume modelInputHeight is 100

        float topleftX = cx - w * 0.5f;
        float topleftY = cy - h * 0.5f;
        float btmrightX = cx + w * 0.5f;
        float btmrightY = cy + h * 0.5f;

        std::cout << "MS_LITE_LOG_AI: score: " << scores[j] << std::endl;
        std::cout << "MS_LITE_LOG_AI: topleft: " << topleftX << "," << topleftY << std::endl;
        std::cout << "MS_LITE_LOG_AI: btmright: " << btmrightX << "," << btmrightY << std::endl;
        for (size_t k = 0; k < 6; k++) {
            float lx = highBox[4 + 2 * k] / 100.0f;
            float ly = highBox[4 + 2 * k + 1] / 100.0f;
            std::cout << "MS_LITE_LOG_AI: key[" << k << "]:" << lx << "," << ly << std::endl;
        }
        break;
    }
}

The output tensors hold the raw detection results: a sigmoid turns each anchor's score into a probability, a threshold filters out weak candidates, and the matching regression values yield the face box and keypoints.

For real-time face detection, continuously read frames from the camera and feed them to the inference engine. Use the camera APIs of HarmonyOS's Media Kit to open and configure the camera:

let cameraManager = camera.getCameraManager(context);
let camerasInfo = cameraManager.getSupportedCameras();
let cameraDevice = camerasInfo[0];

let cameraInput = cameraManager.createCameraInput(cameraDevice);
await cameraInput.open();
let captureSession = cameraManager.createCaptureSession();
captureSession.beginConfig();
captureSession.addInput(cameraInput);
let previewProfiles = cameraManager.getSupportedOutputCapability(cameraDevice).previewProfiles;
let previewProfile = previewProfiles[previewProfileIndex];
// Preview output rendered to the on-screen XComponent surface
let componentSurfacePreviewOutput = cameraManager.createPreviewOutput(previewProfile, this.surfaceId);
captureSession.addOutput(componentSurfacePreviewOutput);
// A second preview output feeds an ImageReceiver so frames can be read for inference
let mReceiver = image.createImageReceiver(
  previewProfile.size.width,
  previewProfile.size.height,
  image.ImageFormat.JPEG,
  8
);
let receivingSurfaceId = await mReceiver.getReceivingSurfaceId();
let imageReceiverPreviewOutput = cameraManager.createPreviewOutput(previewProfile, receivingSurfaceId);
captureSession.addOutput(imageReceiverPreviewOutput);
mReceiver.on('imageArrival', async () => {
  let imageData = await mReceiver.readNextImage();
  let imageJPEGComponent = await imageData.getComponent(image.ComponentType.JPEG);
  // Pass imageJPEGComponent.byteBuffer data to the inference engine
  await imageData.release();
});
await captureSession.commitConfig();
await captureSession.start();
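
The camera callback runs in ArkTS while the inference code is C++, so the frame buffer must cross the NAPI boundary. A minimal sketch of the native entry point; Detect and RunInference are hypothetical names, with RunInference standing in for the preprocessing plus OH_AI_ModelPredict flow shown earlier:

#include "napi/native_api.h"

// Hypothetical helper wrapping preprocessing and OH_AI_ModelPredict.
void RunInference(const void *frame, size_t length);

// Receives the ArrayBuffer (imageJPEGComponent.byteBuffer) passed from ArkTS.
static napi_value Detect(napi_env env, napi_callback_info info) {
    size_t argc = 1;
    napi_value args[1];
    napi_get_cb_info(env, info, &argc, args, NULL, NULL);

    void *data = NULL;
    size_t length = 0;
    napi_get_arraybuffer_info(env, args[0], &data, &length);

    RunInference(data, length);
    return NULL;
}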

Summary

This article introduced HarmonyOS's MindSpore Lite inference engine, showed how to convert an existing TFLite model to the .ms format, and used the HarmonyOS camera APIs to open and configure the camera and create a capture session for real-time preview streaming. An image receiver captures the camera frames, which are passed to the inference engine for processing. A complete implementation also requires steps such as applying for camera permission, omitted here for brevity. The project will be open-sourced once the code is cleaned up, for shared learning.
