Mojave Hao
Introducing KOllama: A Kotlin-First Ollama Client with Full Type Safety

If you're working with Ollama in a Kotlin project, you've probably wished for a client that feels truly native – one that embraces coroutines, sealed classes, and the type safety we all love.

That's exactly why I built KOllama – a Kotlin client for Ollama, powered by Ktor Client and designed to make local LLM interactions delightful.

In this post, I'll walk you through what makes KOllama different, show you how to get started, and share some of the design decisions behind it.


Why KOllama?

Ollama's REST API is simple, but using it from Kotlin can be messy:

  • You have to manually map fields (like prompt_eval_count to something readable).
  • Streaming responses require manual SSE handling.
  • There's no built‑in type safety – you're dealing with raw JSON or Maps.
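To make the last point concrete, here's a sketch of what untyped handling looks like: a decoded response is just a `Map`, and every field access needs the exact wire name plus an unchecked cast (`rawResponse` is a stand-in for whatever your JSON library hands back):

```kotlin
// A decoded Ollama response without typed models: just a Map.
val rawResponse: Map<String, Any?> = mapOf(
    "model" to "llama3",
    "response" to "Kotlin is awesome because...",
    "done" to true,
    "prompt_eval_count" to 26,
    "eval_count" to 142,
)

// Every access repeats the cryptic wire name and an unchecked cast.
val text = rawResponse["response"] as? String ?: error("missing 'response'")
val inputTokens = rawResponse["prompt_eval_count"] as? Int ?: 0
val outputTokens = rawResponse["eval_count"] as? Int ?: 0

fun main() {
    println("$text ($inputTokens in / $outputTokens out)")
}
```

One typo in a string key and you silently get the fallback value instead of a compile error.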

KOllama changes that by providing:

Full type safety – every request and response is a Kotlin data class.

Kotlin‑first API – suspend functions, Flow for streaming, DSL builders, and sensible defaults.

Semantic naming – fields like evaluatedInputTokens instead of prompt_eval_count.

Built on Ktor – you get all the power of Ktor's HTTP client (engine switching, logging, timeouts) with zero extra bloat.

Easy customisation – pass your own HttpClient or configure the engine directly.


Quick Start

Add the JitPack repository and dependency:

// settings.gradle.kts or build.gradle.kts
repositories {
    mavenCentral()
    maven { url = uri("https://jitpack.io") }
}

dependencies {
    implementation("com.github.BlophyNova:kollama:main-SNAPSHOT") // or a specific commit hash
}

Now you can start chatting with your local models:

val client = KOllamaClient() // defaults to http://localhost:11434

suspend fun main() {
    // Generate text
    val response = client.generate(
        model = "llama3",
        prompt = "Why is Kotlin awesome?",
        system = "You are a helpful assistant."
    )
    println(response.response)

    // Stream tokens
    client.generateFlow(
        model = "llama3",
        prompt = "Tell me a story"
    ).collect { chunk ->
        print(chunk.response)
    }
}

A Peek Under the Hood

Type‑Safe Models, Clear Semantics

Ollama's API uses names like prompt_eval_count and num_ctx. These are fine if you already know the API inside out, but they're cryptic to everyone else.

In KOllama, fields are renamed to be both accurate and Kotlin‑idiomatic:

@Serializable
data class GenerateRequest(
    val model: String,
    val prompt: String,
    val images: List<String>? = null,
    @SerialName("system") val systemPrompt: String? = null,
    // ...
)

@Serializable
data class GenerateResponse(
    val model: String,
    val response: String,
    val done: Boolean,
    @SerialName("prompt_eval_count") val evaluatedInputTokens: Int,
    @SerialName("eval_count") val outputTokens: Int,
    // ...
)

The original JSON field names are preserved via @SerialName, so the wire format stays compatible. You get clean Kotlin code without sacrificing interoperability.

Streaming with Flow

Ollama's streaming endpoints are a perfect match for Kotlin's Flow. KOllama returns a Flow of response chunks, so you can easily combine, transform, or collect them:

client.generateFlow(
    model = "mistral",
    prompt = "Write a poem about coroutines"
).collectIndexed { index, chunk ->
    println("[${index + 1}] ${chunk.response}")
}

Rich Options DSL

Many models accept parameters like temperature, top_k, or num_ctx. Instead of forcing you to build a map, KOllama provides a type‑safe DSL:

val response = client.generate {
    model = "llama3"
    prompt = "What's the capital of France?"
    options {
        temperature = 0.8f
        topP = 0.9f
        contextSize = 2048
    }
}

Under the hood, this builds a GenerateOptions object that's serialised into the correct JSON fields (top_p, num_ctx, etc.). Your IDE gives you autocompletion and type checks – no magic strings involved.
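If you're curious how a builder like this works, here's a minimal, self-contained sketch of the pattern. The names (`GenerateOptionsBuilder`, the standalone `options` function) are illustrative, not KOllama's actual internals:

```kotlin
// Illustrative options holder; the real class would serialise these
// to wire names like top_p and num_ctx via @SerialName.
data class GenerateOptions(
    val temperature: Float? = null,
    val topP: Float? = null,
    val contextSize: Int? = null,
)

// Mutable builder exposed inside the options { } block.
class GenerateOptionsBuilder {
    var temperature: Float? = null
    var topP: Float? = null
    var contextSize: Int? = null
    fun build() = GenerateOptions(temperature, topP, contextSize)
}

// The DSL entry point: a lambda with receiver gives property-assignment syntax.
fun options(block: GenerateOptionsBuilder.() -> Unit): GenerateOptions =
    GenerateOptionsBuilder().apply(block).build()

fun main() {
    val opts = options {
        temperature = 0.8f
        topP = 0.9f
        contextSize = 2048
    }
    println(opts) // GenerateOptions(temperature=0.8, topP=0.9, contextSize=2048)
}
```

The lambda-with-receiver is what lets you write `temperature = 0.8f` as plain property assignment instead of chained method calls.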

Multi‑Modal Support

KOllama natively supports sending images alongside text:

val response = client.generate {
    model = "llava"
    prompt = "What's in this picture?"
    images(encodeImageToBase64(myImageFile))
}
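The `encodeImageToBase64` helper above isn't part of the library; on the JVM it's essentially a one-liner over `java.util.Base64`:

```kotlin
import java.io.File
import java.util.Base64

// Reads a file and returns its contents as a Base64 string,
// which is the format Ollama's multi-modal endpoints expect in `images`.
fun encodeImageToBase64(file: File): String =
    Base64.getEncoder().encodeToString(file.readBytes())

fun main() {
    val tmp = File.createTempFile("pixel", ".bin").apply { writeBytes(byteArrayOf(1, 2, 3)) }
    println(encodeImageToBase64(tmp)) // AQID
    tmp.delete()
}
```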

More Than Just Generate

KOllama covers the entire Ollama API:

// List models
val models = client.listModels()

// Pull a model
client.pullModel("mistral")

// Embed text
val embedding = client.embeddings(
    model = "nomic-embed-text",
    prompt = "Kotlin is fantastic"
)

// Chat with tool support
val chatResponse = client.chat {
    model = "llama3.1"
    message {
        role = ChatRole.User
        content = "What's the weather in Paris?"
    }
    tool {
        name = "get_weather"
        description = "Get the current weather for a location"
    }
}

Every endpoint gets the same type‑safe treatment. No guessing, no magic strings.
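Embeddings usually feed straight into similarity search, so here's what you'd typically do with the vector next. This cosine-similarity helper is plain Kotlin, not part of KOllama:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors:
// 1.0 means same direction, 0.0 means orthogonal (unrelated).
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "vectors must have the same dimension" }
    val dot = a.zip(b).sumOf { (x, y) -> x * y }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

fun main() {
    val v1 = listOf(1.0, 0.0, 1.0)
    val v2 = listOf(1.0, 0.0, 1.0)
    val v3 = listOf(0.0, 1.0, 0.0)
    println(cosineSimilarity(v1, v2)) // 1.0
    println(cosineSimilarity(v1, v3)) // 0.0
}
```

Rank your documents by similarity against a query embedding and you have the retrieval half of a local RAG pipeline.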


Why Not the Alternatives?

vs. Ollama4j

Ollama4j is the most popular JVM client for Ollama, but it's Java‑first – and it shows.

Options are passed as raw Maps, and streaming means wiring up a callback:

// Ollama4j
OllamaAPI api = new OllamaAPI("http://localhost:11434");

OllamaStreamResult result = api.generateWithStreaming(
    "llama3", "Hello", "",
    new StreamHandler() {
        @Override
        public void handle(String token) {
            System.out.print(token);  // callback hell
        }
    }
);

With KOllama:

client.generateFlow("llama3", "Hello").collect { print(it.response) }

Streaming in Ollama4j uses a callback interface – fine for Java, but not idiomatic in Kotlin. KOllama returns a Flow, so you can use all of Kotlin's flow operators:

client.generateFlow {
    model = "mistral"
    prompt = "Write a poem"
}.map { it.response }
 .filter { it.isNotBlank() }
 .collect { print(it) }

vs. nirmato-ollama

nirmato-ollama is a Kotlin Multiplatform library, so at first glance it looks like a natural fit. But the API design tells a different story – it's KMP, not Kotlin‑first:

// nirmato-ollama
val client = OllamaClient(CIO) {
    httpClient {
        defaultRequest {
            url("http://localhost:11434/api/")  // manually appending /api/
        }
    }
}
val request = chatRequest {
    model("llama3")                             // function call, not property assignment
    messages(listOf(Message(role = USER, content = "Hello")))  // manual listOf wrapping
    options(Options(temperature = 0.7))         // separate Options object
    stream(true)                                // stream is a function too
}
client.chatStream(request).collect { chunk ->
    chunk.message?.content?.let { print(it) }  // two levels of safe calls just to get text
}

With KOllama:

val client = KOllamaClient()
client.chatFlow {
    model = "llama3"
    message {
        role = ChatRole.User
        content = "Hello"
    }
    options { temperature = 0.7f }
}.collect { print(it.message?.content) }

The difference isn't just fewer lines – it's that KOllama's API reads like Kotlin, while nirmato's reads like a Java builder that learned Kotlin syntax last week.

Summary

|                   | Ollama4j             | nirmato-ollama   | KOllama          |
|-------------------|----------------------|------------------|------------------|
| Language          | Java                 | Kotlin (KMP)     | Kotlin (JVM)     |
| Streaming         | Callback             | Flow (awkward)   | Flow (idiomatic) |
| DSL               | ❌                   | ⚠️ Half-baked    | ✅               |
| Type safety       | ⚠️ Map-based options | ⚠️               | ✅               |
| Kotlin code style | ❌                   | ⚠️               | ✅               |

Design Philosophy

KOllama is built on three principles:

Kotlin first – not just "works in Kotlin", but designed for Kotlin. Suspend functions, Flow-based streaming, named parameters, default values, and DSL builders are first-class citizens, not afterthoughts.

Type safety over convenience – if the API allows multiple shapes for a field, it's modelled with sealed classes. You'll never get a ClassCastException at runtime.

Semantic clarity – field names should tell you what they mean. evaluatedInputTokens is much clearer than prompt_eval_count; contextSize is clearer than num_ctx. The original wire names are preserved via @SerialName, so you get clean code without losing compatibility.


What's Next?

KOllama is still in early development, but the core generate and chat APIs are already working. The immediate roadmap:

  • ✅ Add comprehensive tests using Ktor's MockEngine
  • ✅ Publish to Maven Central
  • ✅ Write detailed documentation and more examples

Full documentation is available in the repository, including the API reference, all DSL options, and more examples.

The project is open source on GitHub: BlophyNova/kollama. Feedback and contributions are very welcome.

  • ⭐ Star the repo to show your interest
  • 🐛 Try it out and report issues
  • 💡 Suggest improvements or new features
  • 🔧 Submit a PR – there's plenty of low‑hanging fruit
