Ferran Pons

How to Run LLMs Offline on Android Using Kotlin

Cloud-based LLMs are powerful, but they’re not always the right tool for mobile apps.

They introduce:
• Network dependency
• Latency
• Usage-based costs
• Privacy concerns

As Android developers, we already ship complex logic on-device.
So the real question is:

Can we run LLMs fully offline on Android, using Kotlin?

Yes — and it’s surprisingly practical today.

In this article, I’ll show how to run LLMs locally on Android using Kotlin, powered by llama.cpp and a Kotlin-first library called Llamatik.

Why run LLMs offline on Android?

Offline LLMs unlock use cases that cloud APIs struggle with:
• 📴 Offline-first apps
• 🔐 Privacy-preserving AI
• 📱 Predictable performance & cost
• ⚡ Tight UI integration

Modern Android devices have:
• ARM CPUs with NEON
• Plenty of RAM (on mid/high-end devices)
• Fast local storage

The challenge isn’t hardware — it’s tooling.

llama.cpp: the engine behind on-device LLMs

llama.cpp is a high-performance C++ runtime designed to run LLMs efficiently on CPUs.

Why it’s ideal for Android:
• CPU-first (no GPU required)
• Supports quantized GGUF models
• Battle-tested across platforms

The downside?
It’s C++, and integrating it directly into Android apps is painful.

That’s where Llamatik comes in.

What is Llamatik?

Llamatik is a Kotlin-first library that wraps llama.cpp behind a clean Kotlin API.

It’s designed for:
• Android
• Kotlin Multiplatform (iOS & Desktop)
• Fully offline inference

Key features:
• No JNI in your app code
• GGUF model support
• Streaming & non-streaming generation
• Embeddings for offline RAG
• Kotlin Multiplatform–friendly API

You write Kotlin — native complexity stays inside the library.

Add Llamatik to your Android project

Llamatik is published on Maven Central.

dependencies {
    implementation("com.llamatik:library:0.12.0")
}

No custom Gradle plugins.
No manual NDK setup.

Add a GGUF model

Download a quantized GGUF model (Q4 or Q5 recommended) and place it in:

androidMain/assets/
└── phi-2.Q4_0.gguf

Quantized models are essential on mobile: quantization stores the weights at reduced precision (4-bit for Q4, 5-bit for Q5), which shrinks both the file size and the RAM needed to run the model.

Load the model

// Copy the model out of the app's assets to a readable file path
val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
// Load the GGUF weights into native memory for generation
LlamaBridge.initGenerateModel(modelPath)

This copies the model from assets and loads it into native memory.

Generate text (fully offline)

val response = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)

No network.
No API keys.
No cloud calls.

Everything runs on-device.
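
If you call this from a UI, wrap it in a coroutine. Here's a minimal sketch, assuming generate is a synchronous call (the suspend wrapper is mine, not part of Llamatik):

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Run inference off the main thread so the UI stays responsive
suspend fun askOffline(prompt: String): String =
    withContext(Dispatchers.Default) {
        LlamaBridge.generate(prompt)
    }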

Streaming generation (for chat UIs)

Streaming is critical for good UX.

LlamaBridge.generateStreamWithContext(
    system = "You are a concise assistant.",
    context = "",  // extra context for the prompt (empty here)
    user = "List three benefits of offline LLMs.",
    onDelta = { token ->
        // Append each token to your UI as it arrives
    },
    onDone = { /* generation finished */ },
    onError = { error -> /* surface the error to the user */ }
)

This works naturally with:
• Jetpack Compose
• ViewModels
• StateFlow
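
Here's a minimal sketch of that wiring. The ChatViewModel class and the threading choices are illustrative assumptions, not part of Llamatik's API:

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class ChatViewModel : ViewModel() {
    private val _output = MutableStateFlow("")
    val output: StateFlow<String> = _output  // collect this from Compose

    fun ask(prompt: String) {
        viewModelScope.launch(Dispatchers.Default) {
            LlamaBridge.generateStreamWithContext(
                system = "You are a concise assistant.",
                context = "",
                user = prompt,
                onDelta = { token -> _output.value += token },
                onDone = { },
                onError = { error -> /* surface to UI state */ }
            )
        }
    }
}

Collecting output with collectAsState() in a composable then recomposes the UI on each new token.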

Embeddings & offline RAG

Llamatik also supports embeddings, enabling offline search and RAG use cases.

// Load the model for embedding, then embed a piece of text
LlamaBridge.initModel(modelPath)
val embedding = LlamaBridge.embed("On-device AI with Kotlin")

Store embeddings locally and build fully offline AI features.
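
As a concrete sketch of the idea: embed your documents once, keep the vectors in local storage, and rank them by cosine similarity at query time. This assumes embed returns a FloatArray (check the actual return type); the helper functions are mine:

import kotlin.math.sqrt

// Cosine similarity between two embedding vectors
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Rank locally stored documents against a query, fully on-device
fun search(query: String, documents: Map<String, FloatArray>): List<String> {
    val queryEmbedding = LlamaBridge.embed(query)
    return documents.entries
        .sortedByDescending { cosineSimilarity(queryEmbedding, it.value) }
        .map { it.key }
}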

Performance expectations

On-device LLMs have limits — let’s be honest:
• Use small, quantized models
• Expect slower responses than cloud GPUs
• Manage memory carefully
• Always call shutdown() when done (see the cleanup sketch below)

That said, for:
• Assistive features
• Short prompts
• Domain-specific tasks

The performance is absolutely usable on modern devices.
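
For cleanup, one minimal sketch ties shutdown() to a screen's lifecycle. Placing it in onCleared() is my choice here, not a Llamatik requirement; any teardown point works:

import androidx.lifecycle.ViewModel

class LlmSessionViewModel : ViewModel() {
    override fun onCleared() {
        // Release the model's native memory when this screen goes away
        LlamaBridge.shutdown()
        super.onCleared()
    }
}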

When does this approach make sense?

Llamatik is a great fit when you need:
• Offline support
• Strong privacy guarantees
• Predictable costs
• Tight UI integration

It’s not meant to replace large cloud models — it’s edge AI done right.

Try it yourself

• GitHub: https://github.com/ferranpons/llamatik
• Website & demo app: https://llamatik.com
• llama.cpp: https://github.com/ggml-org/llama.cpp

Final thoughts

Running LLMs offline on Android using Kotlin is no longer experimental.

With the right abstractions, Kotlin developers can build private, offline, on-device AI — without touching C++.

If you’re curious about pushing AI closer to the device, this is a great place to start.
