Getting the NPU to actually initialize was, by a wide margin, the hardest part of building Redacto. It was not the application logic; that part was straightforward. It was roughly eighty (dramatically speaking) APK rebuilds, each one earned by staring at logcat, reading vendor libraries and headers that ship on the device, and running adb shell find across the entire /vendor partition to chase down one more silent failure. The model was correct. The API was correct. The device was correct. And yet Engine.initialize() threw, silently, repeatedly, for reasons I could not find documented in the places I checked.
The goal was Redacto, a zero-trust, on-device PII redaction app that runs Gemma 4 E2B entirely on the Hexagon V79 NPU inside the Snapdragon 8 Elite. (It later went on to win the Qualcomm x Google LiteRT Developer Hackathon 2026, but that is not the part that was hard.) The hard part was the gap between "the pieces are all correct" and "the engine actually starts."
This post is the document I wish had existed. Six distinct failure modes, in the order I hit them, with exact log lines, root causes, and fixes. If you are trying to get LiteRT-LM running on a Qualcomm NPU, especially on a Samsung Galaxy S25 Ultra with Android 16, this will save you most of those eighty rebuilds.
Environment
| Item | Value |
|---|---|
| Device | Samsung Galaxy S25 Ultra |
| Chipset | Snapdragon 8 Elite for Galaxy (SM8750-AC) |
| NPU | Hexagon V79 |
| Android | 16 (API 36) |
| LiteRT-LM dep | com.google.ai.edge.litertlm:litertlm-android:0.11.0-rc1 |
| Model | gemma4_npu.litertlm (3.02 GB, NPU-compiled multimodal Gemma 4 E2B) |
All six failures below occurred on this exact configuration. Some are device-specific (Samsung), some are OS-version-specific (Android 16), and some are universal to anyone using QNN HTP on LiteRT-LM.
Failure 1: Dispatch Lib Symbol Mismatch
The log
E tflite : Encountered unresolved custom op: DISPATCH_OP.
E tflite : Node number 0 (DISPATCH_OP) failed to prepare.
Why it happens
When you use an NPU-compiled .litertlm model, the model's computation graph contains DISPATCH_OP custom ops. These are not standard TFLite operations: they are delegated operations that tell LiteRT "hand this subgraph to the QNN HTP backend." The library libLiteRtDispatch_Qualcomm.so registers these custom ops via C++ static-init constructors when it is dlopened by the runtime.
If dlopen fails, those constructors never run, and LiteRT sees DISPATCH_OP as an unresolved custom op. The critical detail: dlopen fails silently. There is no E dlopen: ... line in logcat. You just get the downstream symptom: the custom op registration never happened.
In our case, the dispatch lib version was mismatched against the AAR's libLiteRt.so. We had initially pulled libQnnHtp.so and related libs from the device's /vendor partition, thinking "same device, same chipset, should work." Wrong. The vendor-partition libs are built against a different ABI revision of QAIRT than what the LiteRT-LM AAR expects.
The fix
Pull the exact six native libraries from the official Qualcomm LLM sample's jniLibs/arm64-v8a/ directory. These are built against QAIRT 2.42 and are ABI-compatible with litertlm-android:0.11.0-rc1:
- libLiteRtDispatch_Qualcomm.so (478,912 bytes)
- libGemmaModelConstraintProvider.so (20,092,072 bytes)
- libQnnHtp.so (2,778,176 bytes)
- libQnnHtpV79Skel.so (10,975,268 bytes)
- libQnnHtpV79Stub.so (679,168 bytes)
- libQnnSystem.so (2,983,560 bytes)
Source: the litert-samples GitHub repository, under compiled_model_api/qualcomm/llm_chatbot_npu/app/src/main/jniLibs/arm64-v8a/.
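A quick way to catch a stale or vendor-pulled copy before rebuilding is to compare file sizes against the list above. The following is a minimal sketch in plain Kotlin, meant to run on the development machine against the jniLibs/arm64-v8a/ directory. The helper name verifyJniLibs is hypothetical, and a size match is only a weak signal (it does not prove the ABI revision is right), so treat it as a sanity check and not as verification.

```kotlin
import java.io.File

// Expected sizes in bytes, copied from the list in this post.
val expectedLibSizes: Map<String, Long> = mapOf(
    "libLiteRtDispatch_Qualcomm.so" to 478_912L,
    "libGemmaModelConstraintProvider.so" to 20_092_072L,
    "libQnnHtp.so" to 2_778_176L,
    "libQnnHtpV79Skel.so" to 10_975_268L,
    "libQnnHtpV79Stub.so" to 679_168L,
    "libQnnSystem.so" to 2_983_560L,
)

// Hypothetical helper (not part of any SDK): returns one readable problem
// per library that is missing from `dir` or has an unexpected size.
fun verifyJniLibs(dir: File, expected: Map<String, Long> = expectedLibSizes): List<String> =
    expected.mapNotNull { (name, size) ->
        val file = File(dir, name)
        when {
            !file.isFile -> "$name: missing"
            file.length() != size -> "$name: expected $size bytes, found ${file.length()}"
            else -> null
        }
    }
```

An empty result means all six files are present with the sizes listed here; anything else is worth fixing before the next rebuild.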
After replacing all six with the sample's QAIRT 2.42 set, logcat shows the dlopen succeeding:
I litert : [qnn_manager.cc:125] Loading qnn shared library from "libQnnHtp.so"
I litert : [qnn_manager.cc:134] Loaded qnn shared library
I tflite : Replacing 1 out of 1 node(s) with delegate (DispatchDelegate) for subgraph 0
Why it is subtle: The error message says "unresolved custom op." Your instinct is to look at the model file, or the op registration code, or the TFLite runtime version. None of those are the problem. The problem is a silent dlopen failure caused by an ABI mismatch in a native library you probably copied from a place that seemed authoritative (the device itself).
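Because the runtime's own dlopen failure never reaches logcat, one way to make it visible is to load the dispatch library yourself and capture the linker's complaint. This is a minimal sketch using only System.loadLibrary from the Java standard library; the helper name tryLoadNative is hypothetical. Two assumptions: the Android linker fails for the same reason the LiteRT runtime's dlopen does, and the call happens after ADSP_LIBRARY_PATH is set (see Failure 3, where an early System.loadLibrary call is exactly what caused trouble).

```kotlin
// Hypothetical helper: tries to load a native library by its short name
// (no "lib" prefix, no ".so" suffix). Returns null on success, or the
// linker's error message on failure.
fun tryLoadNative(shortName: String): String? =
    try {
        System.loadLibrary(shortName)
        null
    } catch (e: UnsatisfiedLinkError) {
        e.message ?: "UnsatisfiedLinkError with no message"
    }
```

Logging the result of tryLoadNative("LiteRtDispatch_Qualcomm") gives you a concrete message to search for. On Android that message typically includes the reason reported by dlopen, which is the piece the DISPATCH_OP error hides.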
Failure 2: pickFirsts Masking Nothing
The investigation
After Failure 1, we suspected the pickFirsts block in app/build.gradle.kts was silently selecting wrong-version libs during the build. The pickFirsts directive tells AGP to resolve duplicate native library conflicts by picking the first one found. If the AAR shipped its own libQnnHtp.so and our jniLibs/ directory also had one, pickFirsts could be silently choosing the wrong copy.
We spent hours on this theory. It was wrong.
Why it was wrong
Inspecting the AAR contents at ~/.gradle/caches/.../litertlm-android-0.11.0-rc1.aar, it ships exactly three native libraries:
- libLiteRt.so
- libLiteRtClGlAccelerator.so
- liblitertlm_jni.so
Zero overlap with our six QNN/dispatch libs. pickFirsts had nothing to deduplicate. It was a complete no-op.
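The same check can be scripted instead of done by hand. This is a minimal sketch in plain Kotlin that treats the AAR as a zip archive and lists the .so files under jni/<abi>/, which is where AARs store native libraries. The function names are hypothetical, and jniLibsDir is assumed to be the ABI-specific folder (for example app/src/main/jniLibs/arm64-v8a).

```kotlin
import java.io.File
import java.util.zip.ZipFile

// Lists the file names of native libraries packaged in an AAR for one ABI.
fun nativeLibsInAar(aar: File, abi: String = "arm64-v8a"): Set<String> =
    ZipFile(aar).use { zip ->
        zip.entries().asSequence()
            .filter { !it.isDirectory && it.name.startsWith("jni/$abi/") && it.name.endsWith(".so") }
            .map { it.name.substringAfterLast('/') }
            .toSet()
    }

// Library names present both in the AAR and in the app's own jniLibs folder.
// An empty result means pickFirsts has nothing to deduplicate.
fun overlappingLibs(aar: File, jniLibsDir: File, abi: String = "arm64-v8a"): Set<String> {
    val ours = jniLibsDir.listFiles()?.map { it.name }?.toSet() ?: emptySet()
    return nativeLibsInAar(aar, abi) intersect ours
}
```

Running overlappingLibs before theorizing about packaging order would have settled this in seconds.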
What actually mattered
Because there was nothing to deduplicate, leaving the pickFirsts block in changes nothing. We kept it as harmless defensive config in case a future dependency ever ships an overlapping lib.
The one real change in this area was removing libLiteRtCompilerPlugin_Qualcomm.so from jniLibs/: that library is for classical-model NPU JIT compilation, not LLM inference, and the official LLM sample does not bundle it.
Why I am including this: Because if you are debugging NPU init, you will probably go down this rabbit hole too. The pickFirsts block looks suspicious. It is not the problem. Save yourself the hours.
Failure 3: Hexagon DSP Cannot Find libQnnHtpV79Skel.so
This was the key fix. Understanding this failure requires understanding how the Hexagon DSP loads code.
The log
W apps_std_imp.c:1185: apps_std_fopen_with_env_fd failed with 0xd for /vendor/dsp/cdsp/./libQnnHtpV79Skel.so (No such file or directory)
E remote_handle_open_domain: dynamic loading failed for libQnnHtpV79Skel.so
E QnnDsp <E> Failed to find available PD for contextId 5 ... err: 1002
E litert: [qnn_manager.cc:556] Failed to create QNN context: 1002
E tflite : Failed to initialize kernel.
The architecture you need to understand
The Qualcomm AI Engine has a split-process architecture. Your Android app runs on the application processor (the ARM Cortex cores). But the actual NPU inference runs on the Hexagon DSP, a separate processor with its own firmware, its own address space, and its own filesystem view.
When the QNN HTP backend initializes, it needs to load a "skeleton" library (libQnnHtpV79Skel.so) onto the Hexagon DSP. This is done via FastRPC, Qualcomm's mechanism for remote procedure calls between the application processor and the DSP. The skeleton is the DSP-side implementation; the "stub" (libQnnHtpV79Stub.so) is the app-processor-side proxy.
Here is the critical detail: the DSP does not see your APK's lib/arm64-v8a/ directory. That directory lives in /data/app/, which is an app-processor filesystem path. The DSP loads the skeleton from a DSP-accessible path, and it searches a hardcoded fallback list:
- /vendor/dsp/cdsp/
- /vendor/lib/rfsa/adsp/
- A handful of other vendor-specific paths
Plus whatever is in the ADSP_LIBRARY_PATH environment variable.
The Samsung-specific wrinkle
On the Samsung Galaxy S25 Ultra, the actual location of libQnnHtpV79Skel.so is:
/vendor/lib64/rfs/dsp/snap/libQnnHtpV79Skel.so
We found this by running:
adb shell find /vendor /system /odm -name 'libQnnHtpV79Skel.so'
The path /vendor/lib64/rfs/dsp/snap/ is not in FastRPC's hardcoded fallback list. This is a Samsung-specific vendor path. Without ADSP_LIBRARY_PATH pointing at it, the DSP will never find the skeleton, and QNN context creation fails with error 1002.
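If you are on a different device, the directory will likely differ, so it helps to derive it from the find output instead of copying ours. This is a minimal sketch in plain Kotlin that takes the stdout of the find command above and returns the directories that are not among the fallback paths. The fallback set contains only the two paths named in this post (the real list has more entries), and the function name is hypothetical.

```kotlin
// FastRPC fallback directories named in this post; the real list is longer.
val knownFallbackDirs = setOf("/vendor/dsp/cdsp", "/vendor/lib/rfsa/adsp")

// Given the stdout of `adb shell find ... -name 'libQnnHtpV79Skel.so'`,
// returns directories that hold the skeleton but are not in the fallback
// set, i.e. candidates to add to ADSP_LIBRARY_PATH.
fun extraSkelDirs(findOutput: String, fallback: Set<String> = knownFallbackDirs): List<String> =
    findOutput.lineSequence()
        .map { it.trim() }
        .filter { it.startsWith("/") && it.endsWith(".so") } // skips "find: ... Permission denied" lines
        .map { it.substringBeforeLast('/') }
        .filter { it !in fallback }
        .distinct()
        .toList()
```

For the S25 Ultra output shown above, this returns the single directory /vendor/lib64/rfs/dsp/snap.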
The timing trap
We were setting ADSP_LIBRARY_PATH, but in the wrong place. The code was inside InferenceEngine.initialize():
    // WRONG: too late
    fun initialize(...) {
        android.system.Os.setenv("ADSP_LIBRARY_PATH", paths, true)
        // ... then load engine
    }
The problem: libQnnHtp.so reads the ADSP_LIBRARY_PATH environment variable once, when it is first dlopened. Our pre-init call System.loadLibrary("LiteRtDispatch_Qualcomm") was triggering that dlopen before we reached the setenv call. By the time we set the path, QnnHtp had already cached an empty value.
The fix
Create a custom Application subclass and set the environment variables in onCreate(), the earliest practical app hook that reliably runs before our library loading. (ContentProvider initialization and attachBaseContext() technically run earlier, but onCreate() is the simplest place that is still guaranteed to precede the first QNN dlopen.)
    import android.app.Application

    class RedactoApp : Application() {
        override fun onCreate() {
            super.onCreate()
            // Must run before anything dlopens libQnnHtp.so, which reads ADSP_LIBRARY_PATH once.
            val nativeLibDir = applicationInfo.nativeLibraryDir
            val paths = listOf(
                nativeLibDir,
                "/vendor/lib64/rfs/dsp/snap", // Samsung S25 Ultra V79 skel
                "/vendor/lib64/hw/audio",     // Samsung alternate
                "/vendor/dsp/cdsp",
                "/vendor/lib64",
                "/vendor/lib64/snap",
                "/system/lib64",
            ).joinToString(":")
            // Third argument `true` overwrites any existing value.
            android.system.Os.setenv("ADSP_LIBRARY_PATH", paths, true)
            android.system.Os.setenv("LD_LIBRARY_PATH", paths, true)
        }
    }
Wire it into AndroidManifest.xml:
<application android:name=".RedactoApp" ... >
After this fix, the DSP successfully finds the skeleton and creates a QNN Protection Domain.
Why it is subtle: Three things have to be correct simultaneously: (1) the path must include the Samsung-specific vendor directory, (2) the env var must be set before libQnnHtp.so is first dlopened, because the value is read once and cached, and (3) nothing in your own code may trigger that dlopen earlier, such as a pre-init System.loadLibrary call, which in practice means setting the variable in Application.onCreate() rather than in an Activity or a later initialization step. Get any one of these wrong and you get the same opaque "error 1002."
Failure 4: OpenGL CreateSharedMemoryManager Unimplemented on Android 16
This was the most deceptive failure. The NPU was working. We did not know the NPU was working.
The log
E delegate_opengl.cc:218: Failed to create DelegateKernelLiteRt: UNIMPLEMENTED: CreateSharedMemoryManager is not implemented.
=== Source Location Trace: ===
third_party/odml/litert/ml_drift/delegate/gpu_backend_opengl.cc:169
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:337
third_party/odml/litert/ml_drift/delegate/delegate_kernel_litert.cc:167
E tflite : Failed to initialize kernel.
The root cause
Look at the source location trace carefully. This error is from gpu_backend_opengl.cc. It is a GPU error, not an NPU error.
LiteRT-LM's multimodal Gemma 4 model has sub-backends for different modalities. When you configure the engine, you specify:
    val config = EngineConfig(
        backend = Backend.NPU(nativeLibraryDir = nativeLibDir),
        visionBackend = Backend.GPU(), // the dev guide example uses this
        audioBackend = Backend.CPU(),
    )
The developer guide example explicitly shows visionBackend = Backend.GPU(). The idea is that image preprocessing runs on the GPU while the language model runs on the NPU. Reasonable architecture.
On Android 16 (API 36), the GPU vision sub-backend tries to create an OpenGL shared memory manager and hits an unimplemented codepath in litert::ml_drift. The CreateSharedMemoryManager function is not yet implemented for Android 16's new graphics memory model. It throws.
And that throw kills the entire Engine.initialize() call. Not just the vision sub-backend. The whole engine. Even though the NPU backend had already successfully registered DispatchDelegate on multiple subgraphs and created QNN contexts totaling roughly 1.3 GB.
We saw Failed to initialize kernel and assumed the NPU was failing. We spent an entire day re-investigating Failures 1-3, thinking we had regressed. We had not. The NPU was fine. A completely unrelated GPU sub-backend was killing the process.
The fix
Set all sub-backends to CPU:
    PreferredBackend.NPU -> {
        val config = EngineConfig(
            modelPath = modelPath,
            backend = Backend.NPU(nativeLibraryDir = nativeLibDir),
            visionBackend = Backend.CPU(), // NOT Backend.GPU(): hits an unimplemented OpenGL path on Android 16
            audioBackend = Backend.CPU(),
            maxNumTokens = 4000,
            cacheDir = context.cacheDir.absolutePath,
        )
        engine = Engine(config).also { it.initialize() }
    }
Our use case is text-only PII redaction. We do not run vision inference through the LiteRT-LM engine (OCR is handled separately by ML Kit). Setting visionBackend = Backend.CPU() costs us nothing and avoids the Android 16 OpenGL crash.
Why it is subtle: The error message says Failed to initialize kernel, the same message as every other failure. The only clue is the source location trace pointing at gpu_backend_opengl.cc, which you might dismiss as irrelevant if you think you are running on the NPU. You are running on the NPU. The NPU is not the part that failed.
Failure 5: Constrained Decoding Error 12 on NPU
The symptom
This one surfaces at runtime rather than init, but it blocks NPU usage just as effectively: the runtime reports that constrained decoding is not supported on NPU, with error 12.
The root cause
In the code, we had written:
ExperimentalFlags.enableConversationConstrainedDecoding = isNpu
The intent was: "NPU is our best backend, so let's enable advanced features on it." The logic was backwards. Constrained decoding forces the model's output to conform to a schema (JSON, tool-call format, etc.) by constraining the sampling at each token. The NPU executor does not support this. When createConversation is called with the flag set and the backend is NPU, the runtime returns error 12: not supported.
The official gallery sample passes enableConversationConstrainedDecoding as an external opt-in parameter, not tied to the backend. Constrained decoding is for structured-output use cases (tool calls, JSON schemas). For free-form text generation, which is what PII redaction needs, it should be off.
The fix
ExperimentalFlags.enableConversationConstrainedDecoding = false
Force off for all backends. Additionally, samplerConfig = null is required for NPU (the NPU uses its own runtime-default sampler; passing a custom SamplerConfig is not supported).
Why it is subtle: The variable name says "constrained decoding," which sounds like a quality feature. The boolean was tied to isNpu, which sounds intentional. The error says "error 12," which is not self-explanatory. You need to know that constrained decoding is a sampling-time constraint, that it requires specific executor support, and that QNN HTP does not provide it.
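If you do need structured output on some backends, one alternative to hard-coding false is to make the flag a caller opt-in and mask it on NPU. This is a minimal sketch in plain Kotlin; BackendKind, ConversationSettings, and settingsFor are hypothetical names, not LiteRT-LM types. It assumes constrained decoding and custom samplers work on CPU and GPU, which this post did not verify. Redacto itself simply forces the flag off everywhere.

```kotlin
enum class BackendKind { NPU, GPU, CPU }

// Hypothetical holder for the two settings discussed above; not a LiteRT-LM type.
data class ConversationSettings(
    val constrainedDecoding: Boolean, // would feed enableConversationConstrainedDecoding
    val allowCustomSampler: Boolean,  // false means: pass samplerConfig = null
)

// Constrained decoding is an explicit opt-in by the caller and is dropped on
// NPU, where the runtime rejects it with error 12. Custom samplers are also
// disallowed on NPU, which uses its own runtime-default sampler.
fun settingsFor(backend: BackendKind, wantsStructuredOutput: Boolean): ConversationSettings =
    ConversationSettings(
        constrainedDecoding = wantsStructuredOutput && backend != BackendKind.NPU,
        allowCustomSampler = backend != BackendKind.NPU,
    )
```

The point of the design is that the backend can only switch the feature off, never on, which is the opposite of what our original isNpu assignment did.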
Failure 6: In-Process Re-Init (Known Constraint)
The log
After the NPU has successfully run in a process, switching to CPU or GPU and then back to NPU produces:
E QnnDsp: Failed to find available PD for contextId 5 ... err: 1002
E tflite : Encountered unresolved custom op: DISPATCH_OP.
The root cause
This is a QNN runtime constraint, not a bug in our code. When the QNN HTP backend initializes, it acquires a Protection Domain (PD) on the Hexagon DSP. A PD is an isolated execution environment on the DSP; think of it as a process-level sandbox. When you close the NPU Engine and create a non-NPU engine, the PD is released. But something in the QNN runtime's state, likely the FastRPC channel or the DSP driver's per-process tracking, is not fully cleaned up. When you try to acquire a new PD for the same process, the DSP refuses with error 1002.
The workaround
We did not fix this. It is not fixable from application code without restarting the process.
Our backend cascade handles it naturally: NPU is the default and is tried first on fresh launch. If it works, great. If the user manually switches to CPU/GPU and later wants NPU back, they need to kill the app and relaunch. Android will restart the Activity automatically.
A future enhancement could automate this by calling Process.killProcess(Process.myPid()) when the user selects NPU after a non-NPU session. We have not shipped that yet.
Why it matters: If you are building a backend-selection UI (as we were), you need to know that NPU is a "first or never" choice within a process lifetime. Design your UX accordingly.
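The "first or never" rule is easy to encode so the UI can warn before the user hits error 1002. This is a minimal sketch in plain Kotlin with hypothetical names throughout. It is deliberately conservative: once any non-NPU backend has been selected in this process, a later NPU request is answered with "restart required". We only observed the NPU, then CPU/GPU, then NPU sequence failing; treating a CPU-first session the same way is an assumption.

```kotlin
enum class BackendChoice { NPU, GPU, CPU }

sealed interface SwitchDecision {
    object Proceed : SwitchDecision
    object RestartRequired : SwitchDecision
}

// Tracks, for the current process only, whether a non-NPU backend has been used.
// Create one instance per process (for example, held by the Application object).
class BackendSwitchPolicy {
    private var nonNpuSessionSeen = false

    fun request(target: BackendChoice): SwitchDecision {
        // NPU after any non-NPU session: the PD cannot be re-acquired in-process.
        if (target == BackendChoice.NPU && nonNpuSessionSeen) return SwitchDecision.RestartRequired
        if (target != BackendChoice.NPU) nonNpuSessionSeen = true
        return SwitchDecision.Proceed
    }
}
```

A backend picker can call request() before touching the engine and show a "relaunch to use NPU" prompt on RestartRequired instead of attempting an init that is known to fail.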
What Successful NPU Init Looks Like
After all six issues are resolved, here is the log sequence for a clean NPU init. Annotated for clarity:
15:54:27.482 RedactoApp: Pre-init ADSP_LIBRARY_PATH=...
^^ Application.onCreate() seeds the env var
15:54:27.664 InferenceEngine: Attempting NPU backend (SDK_INT=36, Android 16)
15:54:27.681 litert: [qnn_manager.cc:401] Adding shared library dir to path
15:54:27.690 litert: [qnn_manager.cc:125] Loading qnn shared library from "libQnnHtp.so"
15:54:27.691 litert: [qnn_manager.cc:134] Loaded qnn shared library
^^ dlopen succeeds, DISPATCH_OP registered
15:54:27.788 tflite: Replacing 1/1 nodes with delegate (DispatchDelegate) for subgraph 0
15:54:28.361 tflite: DispatchDelegate for subgraph 1
15:54:28.740 tflite: DispatchDelegate for subgraph 1 (decoder)
15:54:28.744 tflite: DispatchDelegate for subgraph 4
^^ Four subgraphs delegated to Hexagon V79
15:54:30.171 InferenceEngine: NPU init succeeded
^^ Total init: ~2.5 seconds
The total wall-clock time from Engine(config) to initialize() returning is approximately 2.5 seconds. Most of that is the QNN backend loading context binaries (roughly 1.3 GB of pre-compiled Hexagon instructions) into DSP memory.
A caveat on these figures: the 2.5-second init and the context size come from a single directional session on one Galaxy S25 Ultra in May 2026. I did not save the raw logs and no longer have the device, so treat them as a snapshot of what a clean init looked like, not as rigorous, multi-run benchmarks.
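If you want to measure your own init time from logcat instead of trusting ours, the arithmetic is simple to script. This is a minimal sketch in plain Kotlin using java.time; it assumes each line starts with an HH:mm:ss.SSS timestamp as in the excerpt above and that both lines come from the same day. The function names are hypothetical.

```kotlin
import java.time.Duration
import java.time.LocalTime

// Parses the leading HH:mm:ss.SSS timestamp of a log line.
fun logTime(line: String): LocalTime = LocalTime.parse(line.trim().substringBefore(' '))

// Milliseconds elapsed between two log lines from the same day.
fun elapsedMillis(startLine: String, endLine: String): Long =
    Duration.between(logTime(startLine), logTime(endLine)).toMillis()
```

Applied to the "Attempting NPU backend" and "NPU init succeeded" lines in the excerpt, this gives 2507 ms, which is where the roughly 2.5 second figure comes from.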
Things That Did NOT Turn Out to Be the Problem
This section is for the next person who would otherwise invest time investigating these. We already did. They are not the issue.
Maven version (0.11.0-rc1). We initially suspected this was a pre-release bug. The exact same version is used by the official Qualcomm sample, which works. There is no newer published version on Maven Central or Google's Maven repository. Building from source via Bazel (as suggested in NPU_ISSUE_REPORT.md) is unnecessary.
useLegacyPackaging = true. This is required and should be kept. LiteRT's dispatch lookup does a filesystem readdir on applicationInfo.nativeLibraryDir. Without legacy packaging, that directory is empty because libraries stay compressed inside the APK. This is not a bug; it is intended behavior, but it looks suspicious when you are debugging library loading issues.
Bundling libQnnHtpV79Skel.so in jniLibs/. Helpful but not sufficient on its own. The skeleton runs on the DSP and can only be loaded by FastRPC from paths listed in ADSP_LIBRARY_PATH. We bundle our copy AND point the path at the Samsung vendor location, for redundancy.
NPU model file naming. The filenames (gemma4_npu.litertlm vs gemma4.litertlm) do not affect init. What matters is that the NPU variant uses the NPU-compiled model (which contains DISPATCH_OP nodes) and the CPU/GPU variants use the generic model (which contains standard TFLite ops). The actual filenames are arbitrary as long as the variant-to-file mapping is consistent.
The Debugging Stack in Retrospect
Looking back at all those rebuilds, these six failures form a stack. Each one was only visible after the previous one was fixed. Failure 1 (dlopen mismatch) masked Failure 3 (DSP path), which masked Failure 4 (OpenGL sub-backend), which masked Failure 5 (constrained decoding). Failure 2 was a red herring that cost hours. Failure 6 is a permanent constraint we designed around.
The experience taught us something about debugging opaque vendor stacks: the error message you see is almost never the error you have. Encountered unresolved custom op: DISPATCH_OP can mean the dispatch lib is not loaded (Failure 1), the DSP cannot find the skeleton (Failure 3), or the QNN PD cannot be re-acquired (Failure 6). Failed to initialize kernel can mean the NPU backend failed (Failures 1, 3) or that a completely unrelated GPU sub-backend failed (Failure 4).
The only reliable debugging strategy was: fix one layer, rebuild, and see what the next layer reveals.
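To shorten the "which layer am I on" step, the log signatures from this post can be turned into a small triage table. This is a minimal sketch in plain Kotlin; the enum and function names are hypothetical, the substrings are the log fragments quoted above, and the ordering (most specific first) is my own choice. It only knows about the failures described here and returns UNKNOWN for anything else, including a bare "Failed to initialize kernel."

```kotlin
enum class LikelyCause {
    GPU_SUB_BACKEND,          // Failure 4
    DSP_SKEL_NOT_FOUND,       // Failure 3
    QNN_CONTEXT_OR_PD,        // Failures 3 and 6
    DISPATCH_LIB_NOT_LOADED,  // Failure 1 (also the tail end of Failure 6)
    UNKNOWN,
}

// Ordered from most specific to least specific; the first match wins.
private val signatures: List<Pair<String, LikelyCause>> = listOf(
    "gpu_backend_opengl.cc" to LikelyCause.GPU_SUB_BACKEND,
    "dynamic loading failed for libQnnHtpV79Skel.so" to LikelyCause.DSP_SKEL_NOT_FOUND,
    "Failed to find available PD" to LikelyCause.QNN_CONTEXT_OR_PD,
    "unresolved custom op: DISPATCH_OP" to LikelyCause.DISPATCH_LIB_NOT_LOADED,
)

// Classifies a chunk of logcat text captured around a failed init.
fun classify(logcat: String): LikelyCause =
    signatures.firstOrNull { (needle, _) -> needle in logcat }?.second ?: LikelyCause.UNKNOWN
```

The ordering matters because the generic messages appear alongside the specific ones: a log containing both the PD error and the DISPATCH_OP error is classified by the PD line, not by the downstream symptom.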
If you are working on LiteRT-LM NPU enablement and you have hit one of these walls, I hope this saves you time. If you find a seventh failure mode, please document it. The next person will thank you.
Related posts in the "Edge AI from the Trenches" series
- What Is an NPU and Why Does It Exist?: foundational context on the hardware behind these failures
- What Is a Delegate in LiteRT?: how delegates route operations to the NPU and why separate model files are required
- What's Inside a .litertlm File?: why DISPATCH_OP custom ops exist and what they mean
Jaydeep Shah is a developer with roots in embedded systems, Android platform internals, and silicon-level AI optimization. He now explores on-device AI inference - bringing models from the cloud to phones and edge hardware. Along with his team Edge Artists, he builds applications using LiteRT-LM and Gemma models on mobile hardware, and writes about what works, what breaks, and what he learns along the way. This post is part of the Edge AI from the Trenches series.
Last updated: July 2026
12th of 23 posts in the "Edge AI from the Trenches" series
