Santosh Premi Adhikari

Building a private AI desktop app with Rust, Tauri, and llama.cpp

Most AI chat apps are either web-only, so data leaves the machine, or Electron-based, with a heavy Chromium bundle. KathaGPT takes a different approach: a desktop app built for fully local use, with a small footprint and a backend that stays on-device. The stack is Rust + Tauri + llama.cpp.

KathaGPT demo

What KathaGPT does

  • Download and run Llama, Mistral, Qwen, and DeepSeek models from inside the app, with no terminal and no Ollama install
  • Chat offline once a model is downloaded
  • Optionally connect cloud providers (OpenRouter, OpenAI, Anthropic, Gemini, Perplexity) with bring-your-own-key
  • Keep chats, keys, and settings on disk — no account, no telemetry

Website: https://santoshpremi.github.io/KathaGPT/

Repo: https://github.com/santoshpremi/KathaGPT (MIT)


The old Node.js server is gone. One Rust core handles the API, storage, streaming, and the desktop shell.


1. Axum embedded in Tauri

The HTTP API runs inside the Tauri process on 127.0.0.1:17890 — loopback only, not exposed to the LAN.
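
As a rough illustration of the loopback-only binding, the sketch below uses only the standard library rather than Axum. The names `bind_loopback` and `API_PORT` are hypothetical; only the port number comes from the text above.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Port mentioned in the article; the constant name is an assumption.
const API_PORT: u16 = 17890;

/// Bind a TCP listener to the IPv4 loopback interface only.
/// Binding to 127.0.0.1 (instead of 0.0.0.0) means other machines
/// on the LAN cannot connect to the socket.
fn bind_loopback(port: u16) -> io::Result<TcpListener> {
    let addr = SocketAddr::from((Ipv4Addr::LOCALHOST, port));
    TcpListener::bind(addr)
}
```

Calling `bind_loopback(API_PORT)` would return a listener that a server framework could then take over. The actual KathaGPT wiring into Axum and Tauri may look different.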

In development, Vite serves the React UI on :5173 and proxies /api/local to the Rust server. In production, the built dist/ folder is loaded by the WebView and the API stays in-process.

Why this matters:

  • No separate daemon to manage
  • No Electron-style Chromium bundle
  • Native window + system tray from Tauri
  • Smaller installers and lower RAM use than typical Electron apps

2. llama.cpp as a sidecar

The llama-server binary (~15 MB) is not shipped inside the installer. On first local model use, KathaGPT:

  1. Downloads the correct llama-server build from llama.cpp GitHub Releases
  2. Extracts it to the app data directory
  3. Reuses the cached binary on later launches
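
The three steps above amount to a check-then-fetch cache. Here is a minimal sketch of that pattern, assuming a hypothetical `ensure_binary` helper and file name; the `fetch` callback stands in for the download-and-extract logic, which is not shown in the article.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical file name; the real one depends on the platform build.
const BINARY_NAME: &str = "llama-server";

/// Return the path of the cached binary, calling `fetch` only if it is missing.
/// `fetch` receives the target path and is expected to write the binary there.
fn ensure_binary<F>(app_data_dir: &Path, fetch: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let target = app_data_dir.join(BINARY_NAME);
    if target.is_file() {
        // Step 3: a previous launch already cached the binary.
        return Ok(target);
    }
    // Steps 1 and 2: make sure the directory exists, then download and extract.
    fs::create_dir_all(app_data_dir)?;
    fetch(&target)?;
    if !target.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            "fetch did not produce the binary",
        ));
    }
    Ok(target)
}
```

Passing the download step as a closure keeps the caching decision testable without network access.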

The sidecar exposes an OpenAI-compatible API at 127.0.0.1:11435. That means local and cloud models share the same streaming code path in stream.rs — no duplicate inference logic.
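
OpenAI-style streaming responses arrive as `data:` lines and conventionally end with a `data: [DONE]` sentinel, which is part of why one parser can serve both local and cloud backends. The sketch below shows that line classification in isolation; `StreamLine` and `parse_stream_line` are hypothetical names, not taken from stream.rs.

```rust
/// One decoded line from an OpenAI-style streaming response.
#[derive(Debug, PartialEq)]
enum StreamLine<'a> {
    /// Payload of a `data:` line, still unparsed JSON.
    Data(&'a str),
    /// The `data: [DONE]` sentinel that ends the stream.
    Done,
}

/// Classify a single line of the response body.
/// Blank lines, comment lines, and other SSE fields yield `None`.
fn parse_stream_line(line: &str) -> Option<StreamLine<'_>> {
    let trimmed = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let payload = trimmed.strip_prefix("data:")?.trim_start();
    if payload == "[DONE]" {
        Some(StreamLine::Done)
    } else if payload.is_empty() {
        None
    } else {
        Some(StreamLine::Data(payload))
    }
}
```

A real client would pass each `Data` payload to a JSON parser to extract the token delta; that step is left out here.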

The model catalog lives in Rust (model_catalog.rs): 18 curated GGUF models with Hugging Face URLs, RAM requirements, and quantization levels. Download progress streams over SSE, so the UI can show a real progress bar.
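
A catalog entry and a progress event might look roughly like the sketch below. The struct, its field names, and the JSON shape of the event are all assumptions made for illustration; the actual definitions in model_catalog.rs are not shown in this post.

```rust
/// Hypothetical catalog entry; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct CatalogEntry {
    id: &'static str,
    url: &'static str,
    min_ram_gb: u32,
    quant: &'static str,
}

/// Keep only the models whose RAM requirement fits the machine.
fn models_that_fit(catalog: &[CatalogEntry], ram_gb: u32) -> Vec<&CatalogEntry> {
    catalog.iter().filter(|m| m.min_ram_gb <= ram_gb).collect()
}

/// Format one Server-Sent Events frame carrying download progress.
/// An SSE frame is a `data:` line followed by a blank line.
fn progress_event(downloaded: u64, total: u64) -> String {
    // Guard against division by zero when the total size is unknown.
    let percent = if total == 0 {
        0
    } else {
        downloaded.saturating_mul(100) / total
    };
    format!(
        "data: {{\"downloaded\":{downloaded},\"total\":{total},\"percent\":{percent}}}\n\n"
    )
}
```

The UI side would read each frame and update the progress bar from the `percent` field.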


3. Unified streaming for local + cloud


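```rust
// Fragment: `pool`, `model`, `key`, and `options` come from the
// surrounding code, which is not shown here.
match resolve_model_route(pool, model).await? {
    ModelRoute::Local { model } => {
        // Make sure the llama-server sidecar is up before streaming.
        sidecar::ensure_running(&model).await?;
        stream_openai_compatible(
            "http://127.0.0.1:11435/v1/chat/completions",
            "local", // takes the place of the API key used by cloud routes
            &model,
            options,
        ).await?
    }
    ModelRoute::OpenRouter { slug } => {
        stream_openai_compatible(
            "https://openrouter.ai/api/v1/chat/completions",
            &key,
            &slug,
            options,
        ).await?
    }
    // Direct: OpenAI, Anthropic, Gemini, Perplexity ...
}
```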
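
To see why one streaming function can serve both routes, here is a simplified, standalone sketch of the routing idea. The enum mirrors only the two variants shown in the snippet; `Target` and `target_for` are hypothetical names, and the direct-provider routes are left out.

```rust
/// Simplified stand-in for `ModelRoute`; direct providers are omitted.
#[derive(Debug, Clone, PartialEq)]
enum ModelRoute {
    Local { model: String },
    OpenRouter { slug: String },
}

/// Everything an OpenAI-compatible streaming call needs to know.
#[derive(Debug, PartialEq)]
struct Target {
    url: &'static str,
    api_key: String,
    model: String,
}

/// Hypothetical helper: map a route to its endpoint, key, and model id.
fn target_for(route: ModelRoute, openrouter_key: &str) -> Target {
    match route {
        ModelRoute::Local { model } => Target {
            url: "http://127.0.0.1:11435/v1/chat/completions",
            // The local sidecar gets a placeholder where cloud routes pass a key.
            api_key: "local".to_string(),
            model,
        },
        ModelRoute::OpenRouter { slug } => Target {
            url: "https://openrouter.ai/api/v1/chat/completions",
            api_key: openrouter_key.to_string(),
            model: slug,
        },
    }
}
```

Because every route reduces to the same three values, adding another OpenAI-compatible provider would only mean adding a match arm, not a new streaming implementation.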
