kouwei qing

HarmonyOS Next IM in Practice: Handling Crashes in SO Dynamic Libraries

Background Introduction

After the IM SDK for HarmonyOS Next was integrated into the application and launched, many users reported significant app lag, particularly those with a large number of conversations and messages. Due to design flaws in the SDK's logic, receiving a message triggered a full refresh of the conversation list, which involved querying the entire database. This operation was time-consuming when there were many conversations and was originally performed on the main thread, causing the app to freeze.

To address this issue, we decided to leverage HarmonyOS's Worker threads. We introduced two worker threads during app initialization: one for fetching the conversation list and another for message retrieval. This approach aimed to offload database operations from the main thread. Workers were chosen because these operations are complex, involve multiple tasks, and require background thread support.

Problem Description

We created the worker thread following the standard procedure and started it during SDK initialization:

this.syncConfigWorkerInstance = new worker.ThreadWorker("../workers/SyncConfigAndConvWorker.ets");
this.syncConfigWorkerInstance.onmessage = async (event) => {
  // Event handler
};

The application crashed or froze immediately after starting the worker, even though the worker thread itself did nothing. The crash occurred in a C++ shared library (.so) unrelated to the worker's logic, which was puzzling:

Timestamp: 2025-05-07 17:12:54:794
Tid: 8025, Name: beike.imsdk.tob
#00 pc 00000000001b6344 /system/lib/ld-musl-aarch64.so.1(fcf57c313493609e8c78a5f07477c358)
#01 pc 00000000001b834c /system/lib/ld-musl-aarch64.so.1(pthread_cond_timedwait+188)(fcf57c313493609e8c78a5f07477c358)
#02 pc 00000000000c430c /data/storage/el1/bundle/libs/arm64/libc++_shared.so(std::__n1::condition_variable::wait(std::__n1::unique_lock<std::__n1::mutex>&)+20)(cdf97be9396a35e8f4806f252f90a11320d26ec6)
#03 pc 00000000000c4e9c /data/storage/el1/bundle/libs/arm64/libc++_shared.so(std::__n1::__assoc_sub_state::__sub_wait(std::__n1::unique_lock<std::__n1::mutex>&)+48)(cdf97be9396a35e8f4806f252f90a11320d26ec6)
#04 pc 00000000000447f4 /data/storage/el1/bundle/libs/arm64/libmarsstn.so(4ed6d779cb9585f42f228dfc8a706399ce60a56f)
#05 pc 0000000000044344 /data/storage/el1/bundle/libs/arm64/libmarsstn.so(4ed6d779cb9585f42f228dfc8a706399ce60a56f)
#06 pc 00000000000d7cf0 /data/storage/el1/bundle/libs/arm64/libmarsstn.so(4ed6d779cb9585f42f228dfc8a706399ce60a56f)
#07 pc 000000000006983c /data/storage/el1/bundle/libs/arm64/libmarsstn.so(4ed6d779cb9585f42f228dfc8a706399ce60a56f)

Problem Diagnosis

Initially, the crash was reproducible in a specific scenario but later became random. Disabling the dynamic library initialization or the worker thread resolved the issue. To identify the root cause, we enabled multithreading detection on a physical device:

hdc shell param set persist.ark.properties 0x107c

After rebooting the device and reproducing the crash, we obtained the following details:

Process name: com.beike.imsdk.tob
Process life time: 34s
Reason: Signal: SIGABRT(SI_TKILL)@0x01317b4e00004e2a from: 20010:20020046
LastFatalMessage: [(ark_native_reference.cpp:117)(Get)] param env is not equal to its owner
Fault thread info:
Tid: 20010, Name: kou.imsdk.tob
#00 pc 0000000000199e1c /system/lib/ld-musl-aarch64.so.1(raise+228)(6b9883f518515f73e093bce9a89a2548)
#01 pc 0000000000146f8c /system/lib/ld-musl-aarch64.so.1(abort+20)(6b9883f518515f73e093bce9a89a2548)
#02 pc 0000000000056fac /system/lib64/platformsdk/libace_napi.z.so(ArkNativeReference::Get(NativeEngine*)+476)(edf034e044dbf26f955142c343577527)
#03 pc 000000000005f0a0 /system/lib64/platformsdk/libace_napi.z.so(napi_get_reference_value+48)(edf034e044dbf26f955142c343577527)
#04 pc 00000000000ea354 /data/storage/el1/bundle/libs/arm64/libmarsstn.so(160beef25288e9539a33ed6f307aa362ceb17fc1)
#05 pc 0000000000070754 /system/lib64/platformsdk/libace_napi.z.so(NativeSafeAsyncWork::ProcessAsyncHandle()+596)

Using the addr2line tool, we traced the crash to the napi_get_reference_value call in libmarsstn.so, where the env parameter was invalid.
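
The fatal message "param env is not equal to its owner" suggests that the runtime checks which env created a reference before returning its value. The sketch below is a simplified C++ model of that kind of check, not the actual libace_napi implementation. FakeEnv, OwnedReference, and OwnerCheckRejectsForeignEnv are hypothetical names introduced for illustration, and the model returns false where the real runtime aborts.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-thread runtime environment (not the real napi_env).
struct FakeEnv {
    int id;
};

// Simplified model of a reference that remembers which environment created it.
class OwnedReference {
public:
    OwnedReference(const FakeEnv* owner, int value) : owner_(owner), value_(value) {}

    // Writes the value to *out and returns true only when the caller passes
    // the owning environment. On a mismatch this model returns false.
    bool Get(const FakeEnv* env, int* out) const {
        if (env != owner_) {
            return false;
        }
        *out = value_;
        return true;
    }

private:
    const FakeEnv* owner_;
    int value_;
};

// Looks up one reference twice: with its owner, then with a foreign environment.
inline bool OwnerCheckRejectsForeignEnv() {
    FakeEnv mainEnv{1};
    FakeEnv workerEnv{2};
    OwnedReference ref(&mainEnv, 42);

    int value = 0;
    const bool ownerOk = ref.Get(&mainEnv, &value);
    assert(!ownerOk || value == 42);
    const bool foreignOk = ref.Get(&workerEnv, &value);
    return ownerOk && !foreignOk;
}
```

In this model a reference created under the main thread's environment can only be read back with that same environment, which is why a cached env that has been swapped for another thread's env fails the lookup.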

Further investigation revealed that the library cached the main thread's env during SO initialization, which runs on the main thread. Adding logs confirmed that the crash occurred when the main thread used this cached env after a worker thread had overwritten it.

Logs showed that the napi_module_register function in the SO library was also being called from worker threads, overwriting the cached mainEnv_, a member of a C++ singleton.

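In HarmonyOS Next, each Worker thread runs its own ArkTS runtime instance with its own env, so ArkTS objects are isolated between threads much as they would be between processes. Native code is not isolated in the same way: all threads share the process address space, including the C++ singleton. Each Worker that loads the SO library runs the module initialization again with its own env, leading to multiple initializations that write to the same mainEnv_.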
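
The overwrite described above can be reproduced in miniature with plain C++ threads. This is a sketch under stated assumptions, not the SDK's code: FakeEnv, NativeContext, UnguardedInit, and CacheIsOverwrittenBySecondThread are hypothetical names, and a std::thread stands in for a Worker.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a per-thread runtime environment (not the real napi_env).
struct FakeEnv {
    int id;
};

// Process-wide singleton: every thread in the process sees the same instance.
class NativeContext {
public:
    static NativeContext& Instance() {
        static NativeContext instance;
        return instance;
    }
    const FakeEnv* cachedEnv = nullptr;
};

// Unguarded initialization: every caller overwrites the cached environment.
inline void UnguardedInit(const FakeEnv* env) {
    NativeContext::Instance().cachedEnv = env;
}

// Initializes on the calling thread, then again on a second thread, and
// reports whether the cache now points at the second thread's environment.
inline bool CacheIsOverwrittenBySecondThread() {
    FakeEnv mainEnv{1};
    FakeEnv workerEnv{2};

    UnguardedInit(&mainEnv);
    assert(NativeContext::Instance().cachedEnv == &mainEnv);

    std::thread worker([&workerEnv] { UnguardedInit(&workerEnv); });
    worker.join();  // join orders the worker's write before the read below

    const bool overwritten = NativeContext::Instance().cachedEnv == &workerEnv;
    NativeContext::Instance().cachedEnv = nullptr;  // do not keep a pointer to a local
    return overwritten;
}
```

After the second thread runs, any code on the first thread that still trusts the cached pointer is holding another thread's environment, which matches the failure seen in the crash log.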

Solution

To resolve the issue, we ensured that mainEnv_ is initialized only once, during the first call from the main thread. We used std::atomic for thread-safe initialization:

// File scope: shared by all threads, guards the one-time initialization.
std::atomic<bool> flag{false};

// Inside the module initialization function, where env and exports are available:
bool expected = false;
if (flag.compare_exchange_strong(expected, true)) {
    // Only the first caller reaches this branch; later calls take the else branch.
    auto context = MarsNapiManager::getInstance();
    if (context) {
        context->mainEnv_ = env;
        context->mainThreadId = std::this_thread::get_id();
        auto marsNapi = context->getMarsNapi();
        marsNapi->Export(env, exports);
    }
} else {
    xdebug2(TSF"napi-->initXComponent not first");
}
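
The same guard can be tried outside the SDK. The following is a minimal, self-contained sketch of the compare_exchange_strong pattern, assuming (as in this article) that the main thread is the first to initialize. FakeEnv, GuardedContext, InitOnce, and FirstCallerKeepsOwnership are hypothetical names, and std::thread stands in for Worker threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread runtime environment (not the real napi_env).
struct FakeEnv {
    int id;
};

class GuardedContext {
public:
    // Returns true only for the first caller, whose env and thread id are cached.
    // Later callers leave the cached values untouched and get false.
    bool InitOnce(const FakeEnv* env) {
        bool expected = false;
        if (!initialized_.compare_exchange_strong(expected, true)) {
            return false;
        }
        mainEnv_ = env;
        mainThreadId_ = std::this_thread::get_id();
        return true;
    }

    const FakeEnv* mainEnv() const { return mainEnv_; }

    // True when called on the thread that won the initialization.
    bool IsOwnerThread() const { return std::this_thread::get_id() == mainThreadId_; }

private:
    std::atomic<bool> initialized_{false};
    const FakeEnv* mainEnv_ = nullptr;
    std::thread::id mainThreadId_;
};

// Initializes on the calling thread first, then lets three other threads try.
inline bool FirstCallerKeepsOwnership() {
    GuardedContext context;
    FakeEnv mainEnv{1};
    FakeEnv workerEnvs[3] = {{2}, {3}, {4}};

    const bool first = context.InitOnce(&mainEnv);

    std::atomic<int> lateWins{0};
    std::vector<std::thread> workers;
    for (auto& env : workerEnvs) {
        workers.emplace_back([&context, &env, &lateWins] {
            if (context.InitOnce(&env)) {
                ++lateWins;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    return first && lateWins == 0 && context.mainEnv() == &mainEnv &&
           context.IsOwnerThread();
}
```

The flag only decides who initializes; it does not make the cached fields safe to read while the winner is still writing them. The sketch avoids that by finishing the first call before any other thread starts, which mirrors the article's assumption that SO initialization completes on the main thread before the Workers are created.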

Another Multithreading Issue

When passing SDK configuration parameters to the worker thread using HarmonyOS Next's Sendable mechanism, the application crashed with the error:

TypeError: Cannot set sendable property with mismatched type

The crash occurred during postMessage. As a first workaround, we serialized the object to JSON before sending it. Further investigation revealed the difference between the demo and the app: the demo passed class instances that implemented Sendable, while the app passed Record objects, which do not implement Sendable.

Conclusion

This case highlights a critical difference between HarmonyOS Next's Worker threads and traditional thread models: each Worker thread initializes the SO library again with its own env, so a main-thread env cached in a C++ singleton can be overwritten. Key findings:

  1. Root Cause: The main thread's cached env (mainEnv_) was overwritten by worker threads during napi_module_register, leading to invalid env usage in napi_get_reference_value.
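  2. Thread Characteristics: Each Worker thread has its own isolated ArkTS runtime and env, but native C++ state such as singletons is shared across the process, so the SO library's initialization runs once per Worker against the same singleton.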
  3. Solution: Use std::atomic to ensure mainEnv_ is initialized only once on the main thread, with compare_exchange_strong for thread safety.
  4. Derived Issue: Cross-thread data transfer must strictly follow the Sendable protocol; non-Sendable objects (e.g., Record) must be converted before sending, for example by serializing them to JSON.

This case underscores two critical considerations in HarmonyOS multithreading: thread safety during SO library initialization and strict adherence to cross-thread data transfer protocols.
