Cloud-based LLMs are powerful, but they’re not always the right tool for mobile apps.
They introduce:
• Network dependency
• Latency
• Usage-based costs
• Privacy concerns
As Android developers, we already ship complex logic on-device.
So the real question is:
Can we run LLMs fully offline on Android, using Kotlin?
Yes — and it’s surprisingly practical today.
In this article, I’ll show how to run LLMs locally on Android using Kotlin, powered by llama.cpp and a Kotlin-first library called Llamatik.
Why run LLMs offline on Android?
Offline LLMs unlock use cases that cloud APIs struggle with:
• 📴 Offline-first apps
• 🔐 Privacy-preserving AI
• 📱 Predictable performance & cost
• ⚡ Tight UI integration
Modern Android devices have:
• ARM CPUs with NEON
• Plenty of RAM (on mid/high-end devices)
• Fast local storage
The challenge isn’t hardware — it’s tooling.
llama.cpp: the engine behind on-device LLMs
llama.cpp is a high-performance C++ runtime designed to run LLMs efficiently on CPUs.
Why it’s ideal for Android:
• CPU-first (no GPU required)
• Supports quantized GGUF models
• Battle-tested across platforms
The downside?
It’s C++, and integrating it directly into Android apps is painful.
That’s where Llamatik comes in.
What is Llamatik?
Llamatik is a Kotlin-first library that wraps llama.cpp behind a clean Kotlin API.
It’s designed for:
• Android
• Kotlin Multiplatform (iOS & Desktop)
• Fully offline inference
Key features:
• No JNI in your app code
• GGUF model support
• Streaming & non-streaming generation
• Embeddings for offline RAG
• Kotlin Multiplatform–friendly API
You write Kotlin — native complexity stays inside the library.
Add Llamatik to your Android project
Llamatik is published on Maven Central.
dependencies {
    implementation("com.llamatik:library:0.12.0")
}
No custom Gradle plugins.
No manual NDK setup.
Add a GGUF model
Download a quantized GGUF model (Q4 or Q5 recommended) and place it in:
androidMain/assets/
└── phi-2.Q4_0.gguf
Quantized models are essential for mobile performance.
Load the model
val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
LlamaBridge.initGenerateModel(modelPath)
This copies the model from assets and loads it into native memory.
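Quantized model files are still large (hundreds of MB to a few GB), so loading is not instant. A minimal sketch of doing it once, off the main thread — the suspend wrapper here is my own, not part of Llamatik:
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

suspend fun loadModel() = withContext(Dispatchers.IO) {
    // Copying from assets and loading into native memory can take a while,
    // so keep it off the main thread and call it once (e.g. at app start).
    val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
    LlamaBridge.initGenerateModel(modelPath)
}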
Generate text (fully offline)
val response = LlamaBridge.generate(
"Explain Kotlin Multiplatform in one sentence."
)
No network.
No API keys.
No cloud calls.
Everything runs on-device.
Streaming generation (for chat UIs)
Streaming is critical for a good chat UX: users see tokens as they are generated instead of waiting for the full response.
LlamaBridge.generateStreamWithContext(
    system = "You are a concise assistant.",
    context = "",
    user = "List three benefits of offline LLMs.",
    onDelta = { token ->
        // Append the token to your UI as it arrives
    },
    onDone = { /* generation finished */ },
    onError = { error -> /* surface the error to the user */ }
)
This works naturally with:
• Jetpack Compose
• ViewModels
• StateFlow
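As a rough sketch of that wiring — the ViewModel, its state, and the error handling are illustrative; only the generateStreamWithContext call comes from Llamatik:
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.update

class ChatViewModel : ViewModel() {
    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply

    fun ask(question: String) {
        _reply.value = ""
        LlamaBridge.generateStreamWithContext(
            system = "You are a concise assistant.",
            context = "",
            user = question,
            onDelta = { token -> _reply.update { it + token } }, // UI recomposes as tokens arrive
            onDone = { /* e.g. re-enable the send button */ },
            onError = { error -> _reply.value = "Error: $error" }
        )
    }
}
A MutableStateFlow can be updated from any thread, so this keeps working regardless of which thread Llamatik delivers callbacks on, and a Composable can simply collect reply with collectAsState().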
Embeddings & offline RAG
Llamatik also supports embeddings, enabling offline search and RAG use cases.
LlamaBridge.initModel(modelPath)
val embedding = LlamaBridge.embed("On-device AI with Kotlin")
Store embeddings locally and build fully offline AI features.
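From there, a tiny offline retrieval step is just cosine similarity over locally stored vectors. A minimal sketch, assuming embed() returns a FloatArray (check the actual return type in the library); the helpers here are hypothetical:
import kotlin.math.sqrt

// Hypothetical helper: score two embedding vectors by cosine similarity.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Hypothetical helper: rank locally stored chunks against a query.
fun topMatches(query: String, chunks: Map<String, FloatArray>, k: Int = 3): List<String> {
    val q = LlamaBridge.embed(query)
    return chunks.entries
        .sortedByDescending { cosine(q, it.value) }
        .take(k)
        .map { it.key } // The best-matching chunks of text
}
The top-ranked chunks can then be passed as the context argument of generateStreamWithContext, which is the core of a simple offline RAG loop.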
Performance expectations
On-device LLMs have limits — let’s be honest:
• Use small, quantized models
• Expect slower responses than cloud GPUs
• Manage memory carefully
• Always call shutdown() when done (see the cleanup sketch below)
That said, for:
• Assistive features
• Short prompts
• Domain-specific tasks
The performance is absolutely usable on modern devices.
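For the cleanup point above, tying the release to a lifecycle owner is usually enough. A minimal sketch, assuming the shutdown() call lives on LlamaBridge (the ViewModel itself is illustrative):
import androidx.lifecycle.ViewModel

class AssistantViewModel : ViewModel() {
    // ... generation calls as shown earlier ...

    override fun onCleared() {
        // Free the native model memory when this screen's ViewModel goes away
        LlamaBridge.shutdown()
        super.onCleared()
    }
}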
When does this approach make sense?
Llamatik is a great fit when you need:
• Offline support
• Strong privacy guarantees
• Predictable costs
• Tight UI integration
It’s not meant to replace large cloud models — it’s edge AI done right.
⸻
Try it yourself
• GitHub: https://github.com/ferranpons/llamatik
• Website & demo app: https://llamatik.com
• llama.cpp: https://github.com/ggml-org/llama.cpp
Final thoughts
Running LLMs offline on Android using Kotlin is no longer experimental.
With the right abstractions, Kotlin developers can build private, offline, on-device AI — without touching C++.
If you’re curious about pushing AI closer to the device, this is a great place to start.