If you're working with Ollama in a Kotlin project, you've probably wished for a client that feels truly native – one that embraces coroutines, sealed classes, and the type safety we all love.
That's exactly why I built KOllama – a Kotlin client for Ollama, powered by Ktor Client and designed to make local LLM interactions delightful.
In this post, I'll walk you through what makes KOllama different, show you how to get started, and share some of the design decisions behind it.
Why KOllama?
Ollama's REST API is simple, but using it from Kotlin can be messy:
- You have to manually map fields like `prompt_eval_count` to something readable.
- Streaming responses require hand-rolling the parsing of newline-delimited JSON chunks.
- There's no built‑in type safety – you're dealing with raw JSON or `Map`s.
KOllama changes that by providing:
✅ Full type safety – every request and response is a Kotlin data class.
✅ Kotlin‑first API – suspend functions, Flow for streaming, DSL builders, and sensible defaults.
✅ Semantic naming – fields like evaluatedInputTokens instead of prompt_eval_count.
✅ Built on Ktor – you get all the power of Ktor's HTTP client (engine switching, logging, timeouts) with zero extra bloat.
✅ Easy customisation – pass your own HttpClient or configure the engine directly.
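As a sketch of what that customisation might look like – the `KOllamaClient` constructor parameters shown here are illustrative assumptions, not a documented signature – you could hand the client a preconfigured Ktor `HttpClient` with logging and a generous timeout (local models can be slow to respond):

```kotlin
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.*
import io.ktor.client.plugins.logging.*

// Hypothetical sketch: assumes KOllamaClient accepts a preconfigured HttpClient.
val customHttp = HttpClient(CIO) {
    install(Logging) { level = LogLevel.INFO }
    install(HttpTimeout) {
        requestTimeoutMillis = 120_000 // generation can take a while locally
    }
}

val client = KOllamaClient(httpClient = customHttp)
```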
Quick Start
Add the JitPack repository and dependency:
```kotlin
// settings.gradle.kts or build.gradle.kts
repositories {
    mavenCentral()
    maven { url = uri("https://jitpack.io") }
}

dependencies {
    implementation("com.github.BlophyNova:kollama:main-SNAPSHOT") // or a specific commit hash
}
```
Now you can start chatting with your local models:
```kotlin
val client = KOllamaClient() // defaults to http://localhost:11434

suspend fun main() {
    // Generate text
    val response = client.generate(
        model = "llama3",
        prompt = "Why is Kotlin awesome?",
        system = "You are a helpful assistant."
    )
    println(response.response)

    // Stream tokens
    client.generateFlow(
        model = "llama3",
        prompt = "Tell me a story"
    ).collect { chunk ->
        print(chunk.response)
    }
}
```
A Peek Under the Hood
Type‑Safe Models, Clear Semantics
Ollama's API uses names like prompt_eval_count and num_ctx. These are fine if you already know the API inside out, but they're cryptic to everyone else.
In KOllama, fields are renamed to be both accurate and Kotlin‑idiomatic:
```kotlin
@Serializable
data class GenerateRequest(
    val model: String,
    val prompt: String,
    val images: List<String>? = null,
    @SerialName("system") val systemPrompt: String? = null,
    // ...
)

@Serializable
data class GenerateResponse(
    val model: String,
    val response: String,
    val done: Boolean,
    @SerialName("prompt_eval_count") val evaluatedInputTokens: Int,
    @SerialName("eval_count") val outputTokens: Int,
    // ...
)
```
The original JSON field names are preserved via @SerialName, so the wire format stays compatible. You get clean Kotlin code without sacrificing interoperability.
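The renaming mechanism is plain kotlinx.serialization, so it's easy to verify in isolation. Here is a minimal, self-contained demo with a trimmed-down stand-in type (not KOllama's actual class) showing that the Kotlin property name and the wire name stay decoupled:

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class TokenCounts(
    @SerialName("prompt_eval_count") val evaluatedInputTokens: Int,
    @SerialName("eval_count") val outputTokens: Int,
)

fun main() {
    val wire = """{"prompt_eval_count": 26, "eval_count": 298}"""
    val decoded = Json { ignoreUnknownKeys = true }.decodeFromString<TokenCounts>(wire)

    // Readable name in Kotlin code...
    println(decoded.evaluatedInputTokens) // 26

    // ...original name back on the wire
    println(Json.encodeToString(TokenCounts.serializer(), decoded))
}
```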
Streaming with Flow
Ollama's streaming endpoints are a perfect match for Kotlin's Flow. KOllama returns a Flow of response chunks, so you can easily combine, transform, or collect them:
```kotlin
client.generateFlow(
    model = "mistral",
    prompt = "Write a poem about coroutines"
).collectIndexed { index, chunk ->
    println("[${index + 1}] ${chunk.response}")
}
```
Rich Options DSL
Many models accept parameters like temperature, top_k, or num_ctx. Instead of forcing you to build a map, KOllama provides a type‑safe DSL:
```kotlin
val response = client.generate {
    model = "llama3"
    prompt = "What's the capital of France?"
    options {
        temperature = 0.8f
        topP = 0.9f
        contextSize = 2048
    }
}
```
Under the hood, this builds a GenerateOptions object that's serialised into the correct JSON fields (top_p, num_ctx, etc.). Your IDE gives you autocompletion and type checks – no magic strings involved.
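For the curious, the builder pattern behind such a DSL is straightforward. The following is a stripped-down sketch (type and property names are hypothetical, not KOllama's internals): a mutable builder is exposed to the lambda, then frozen into an immutable, serialisable options object.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

@Serializable
data class GenerateOptions(
    val temperature: Float? = null,
    @SerialName("top_p") val topP: Float? = null,
    @SerialName("num_ctx") val contextSize: Int? = null,
)

// Mutable builder the DSL lambda runs against.
class OptionsBuilder {
    var temperature: Float? = null
    var topP: Float? = null
    var contextSize: Int? = null
    fun build() = GenerateOptions(temperature, topP, contextSize)
}

fun options(block: OptionsBuilder.() -> Unit): GenerateOptions =
    OptionsBuilder().apply(block).build()

fun main() {
    val opts = options {
        temperature = 0.8f
        contextSize = 2048
    }
    println(opts) // GenerateOptions(temperature=0.8, topP=null, contextSize=2048)
}
```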
Multi‑Modal Support
KOllama natively supports sending images alongside text:
```kotlin
val response = client.generate {
    model = "llava"
    prompt = "What's in this picture?"
    images(encodeImageToBase64(myImageFile))
}
```
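`encodeImageToBase64` in the snippet above is not part of the library – it's whatever helper you bring yourself. Since Ollama expects images as base64-encoded strings, a minimal JVM implementation could look like this:

```kotlin
import java.io.File
import java.util.Base64

// Reads the image file and base64-encodes its bytes, as Ollama's
// `images` array expects.
fun encodeImageToBase64(file: File): String =
    Base64.getEncoder().encodeToString(file.readBytes())
```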
More Than Just Generate
KOllama covers the entire Ollama API:
```kotlin
// List models
val models = client.listModels()

// Pull a model
client.pullModel("mistral")

// Embed text
val embedding = client.embeddings(
    model = "nomic-embed-text",
    prompt = "Kotlin is fantastic"
)

// Chat with tool support
val chatResponse = client.chat {
    model = "llama3.1"
    message {
        role = ChatRole.User
        content = "What's the weather in Paris?"
    }
    tool {
        name = "get_weather"
        description = "Get the current weather for a location"
    }
}
```
Every endpoint gets the same type‑safe treatment. No guessing, no magic strings.
Why Not the Alternatives?
vs. Ollama4j
Ollama4j is the most popular JVM client for Ollama, but it's Java‑first – and it shows.
Options are passed as raw `Map`s, and streaming goes through a callback interface:
```java
// Ollama4j
OllamaAPI api = new OllamaAPI("http://localhost:11434");
OllamaStreamResult result = api.generateWithStreaming(
    "llama3", "Hello", "",
    new StreamHandler() {
        @Override
        public void handle(String token) {
            System.out.print(token); // callback hell
        }
    }
);
```
With KOllama:
```kotlin
client.generateFlow("llama3", "Hello").collect { print(it.response) }
```
Streaming in Ollama4j uses a callback interface – fine for Java, but not idiomatic in Kotlin. KOllama returns a Flow, so you can use all of Kotlin's flow operators:
```kotlin
client.generateFlow {
    model = "mistral"
    prompt = "Write a poem"
}.map { it.response }
    .filter { it.isNotBlank() }
    .collect { print(it) }
```
vs. nirmato-ollama
nirmato-ollama is a Kotlin Multiplatform library, so at first glance it looks like a natural fit. But the API design tells a different story – it's KMP, not Kotlin‑first:
```kotlin
// nirmato-ollama
val client = OllamaClient(CIO) {
    httpClient {
        defaultRequest {
            url("http://localhost:11434/api/") // manually appending /api/
        }
    }
}

val request = chatRequest {
    model("llama3") // function call, not property assignment
    messages(listOf(Message(role = USER, content = "Hello"))) // manual listOf wrapping
    options(Options(temperature = 0.7)) // separate Options object
    stream(true) // stream is a function too
}

client.chatStream(request).collect { chunk ->
    chunk.message?.content?.let { print(it) } // two levels of safe calls just to get text
}
```
With KOllama:
```kotlin
val client = KOllamaClient()

client.chatFlow {
    model = "llama3"
    message {
        role = ChatRole.User
        content = "Hello"
    }
    options { temperature = 0.7f }
}.collect { print(it.message?.content) }
```
The difference isn't just fewer lines – it's that KOllama's API reads like Kotlin, while nirmato's reads like a Java builder that learned Kotlin syntax last week.
Summary
| | Ollama4j | nirmato-ollama | KOllama |
|---|---|---|---|
| Language | Java | Kotlin (KMP) | Kotlin (JVM) |
| Streaming | Callback | `Flow` (awkward) | `Flow` (idiomatic) |
| DSL | ❌ | ⚠️ Half-baked | ✅ |
| Type safety | ⚠️ Map-based options | ⚠️ | ✅ |
| Kotlin code style | ❌ | ⚠️ | ✅ |
Design Philosophy
KOllama is built on three principles:
Kotlin first – not just "works in Kotlin", but designed for Kotlin. Suspend functions, Flow-based streaming, named parameters, default values, and DSL builders are first-class citizens, not afterthoughts.
Type safety over convenience – if the API allows multiple shapes for a field, it's modelled with sealed classes. You'll never get a ClassCastException at runtime.
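As an illustration of that principle (with hypothetical type names, not necessarily KOllama's actual model classes), a field the API can return in more than one shape – say, plain text or a structured tool call – maps naturally onto a sealed hierarchy, and `when` over a sealed class is exhaustive, so every shape is handled at compile time:

```kotlin
// Hypothetical sketch of modelling a multi-shape API field with a sealed class.
sealed class MessagePart {
    data class Text(val content: String) : MessagePart()
    data class ToolCall(val name: String, val arguments: Map<String, String>) : MessagePart()
}

// No casts, no instanceof checks: the compiler rejects this function
// if a MessagePart subtype is left unhandled.
fun render(part: MessagePart): String = when (part) {
    is MessagePart.Text -> part.content
    is MessagePart.ToolCall -> "call ${part.name}(${part.arguments})"
}

fun main() {
    println(render(MessagePart.Text("Hello")))
    println(render(MessagePart.ToolCall("get_weather", mapOf("city" to "Paris"))))
}
```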
Semantic clarity – field names should tell you what they mean. evaluatedInputTokens is much clearer than prompt_eval_count; contextSize is clearer than num_ctx. The original wire names are preserved via @SerialName, so you get clean code without losing compatibility.
What's Next?
KOllama is still in early development, but the core generate and chat APIs are already working. The immediate roadmap:
- Add comprehensive tests using Ktor's `MockEngine`
- Publish to Maven Central
- Write detailed documentation and more examples
Full documentation is available in the project repository, including the API reference, all DSL options, and more examples.
The project is open source on GitHub: BlophyNova/kollama. Feedback and contributions are very welcome.
- ⭐ Star the repo to show your interest
- 🐛 Try it out and report issues
- 💡 Suggest improvements or new features
- 🔧 Submit a PR – there's plenty of low‑hanging fruit