Introducing momo-kiji: CUDA for Apple Neural Engine

Robert Reilly — Thu, 19 Mar 2026 08:42:26 +0000

Cross-posted to Medium and Hashnode

The Problem

Every Apple device has an Apple Neural Engine. Every single one. Billions of them.

Yet most ML developers ignore it completely.

Why? Because there's no good way to target it. Core ML is locked down. You can't bring your own models. You're stuck in Apple's walled garden.

Meanwhile, that ANE sits there doing nothing most of the time: a 10x efficiency boost, completely untapped.

Introducing momo-kiji

Today, we're releasing momo-kiji, an open-source, CUDA-like SDK for the Apple Neural Engine.

It's simple: compile your model once, target ANE directly, and get 10x better efficiency without rewriting anything.


```python
import momo_kiji as mk

# Load any model
model = mk.load("model.onnx")

# Compile for ANE
compiled = model.compile(target="ane")

# Deploy
compiled.save("model_ane.mlmodel")
```
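If you find yourself repeating those three steps for several models, they can be folded into one helper. The following is only a minimal sketch: the `load`, `compile(target="ane")`, and `save` calls are the ones shown above, while the `compile_for_ane` function, its parameters, and the `_ane.mlmodel` naming default are hypothetical and not part of momo-kiji. The module is passed in as an argument so the helper does not assume anything beyond those three calls.

```python
from pathlib import Path


def compile_for_ane(mk, source_path, output_path=None):
    """Run the load -> compile -> save steps from the post in one call.

    `mk` is the imported momo_kiji module (or any object exposing the same
    three calls). Only load(), compile(target="ane") and save() come from
    the post; this wrapper and its defaults are hypothetical.
    """
    source = Path(source_path)
    if output_path is None:
        # Assumed naming convention, mirroring "model.onnx" -> "model_ane.mlmodel".
        output_path = source.with_name(source.stem + "_ane.mlmodel")

    model = mk.load(str(source))            # step 1: load the model file
    compiled = model.compile(target="ane")  # step 2: compile for the ANE
    compiled.save(str(output_path))         # step 3: write the compiled model
    return Path(output_path)
```

Under these assumptions, `compile_for_ane(mk, "model.onnx")` after `import momo_kiji as mk` would write `model_ane.mlmodel` next to the input and return its path.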

DEV Community: Robert Reilly
