Building a private AI desktop app with Rust, Tauri, and llama.cpp

#rust #llm #ai #opensource

Most AI chat apps are either web-only — so data leaves the machine — or Electron-based, with a heavy Chromium bundle. KathaGPT is a different approach: a desktop app built for fully local use, a small footprint, and a backend that stays on-device. The stack is Rust + Tauri + llama.cpp.

What KathaGPT does

Download and run Llama, Mistral, Qwen, and DeepSeek models from inside the app, with no terminal and no Ollama install
Chat offline once a model is downloaded
Optionally connect cloud providers (OpenRouter, OpenAI, Anthropic, Gemini, Perplexity) with bring-your-own-key
Keep chats, keys, and settings on disk — no account, no telemetry

Website: https://santoshpremi.github.io/KathaGPT/

Repo: https://github.com/santoshpremi/KathaGPT (MIT)

The old Node.js server is gone. One Rust core handles the API, storage, streaming, and the desktop shell.

1. Axum embedded in Tauri

The HTTP API runs inside the Tauri process on 127.0.0.1:17890 — loopback only, not exposed to the LAN.
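The loopback-only binding can be illustrated with a minimal sketch. It uses only the standard library instead of Axum, and `bind_loopback` is a hypothetical helper name, not something taken from the KathaGPT source. Only the address and port come from the text above.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Port quoted in the article for the embedded API.
const API_PORT: u16 = 17890;

/// Bind a listener that only accepts connections from this machine.
/// Binding to 0.0.0.0 instead would make the port reachable from the LAN.
/// (Hypothetical helper; the real app hands its listener to Axum.)
fn bind_loopback(port: u16) -> io::Result<TcpListener> {
    let addr = SocketAddr::from((Ipv4Addr::LOCALHOST, port));
    TcpListener::bind(addr)
}
```

Calling `bind_loopback(API_PORT)` returns a listener whose local address is 127.0.0.1:17890, or an error if the port is already taken.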

In development, Vite serves the React UI on :5173 and proxies /api/local to the Rust server. In production, the built dist/ folder is loaded by the WebView and the API stays in-process.

Why this matters:

No separate daemon to manage
No Electron-style Chromium bundle
Native window + system tray from Tauri
Smaller installers and lower RAM use than Electron apps

2. llama.cpp as a sidecar

The llama-server binary (~15 MB) is not shipped inside the installer. On first local model use, KathaGPT:

Downloads the correct llama-server build from llama.cpp GitHub Releases
Extracts it to the app data directory
Reuses the cached binary on later launches
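The download-once, reuse-later steps above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `bin` subfolder, the `platform_tag` format, and the `SidecarAction` type are hypothetical, and real llama.cpp release asset names differ from this tag.

```rust
use std::path::{Path, PathBuf};

/// Where the cached binary would live under the app data directory.
/// The "bin" subfolder is an assumption for this sketch.
fn sidecar_path(app_data_dir: &Path) -> PathBuf {
    let file_name = if cfg!(windows) { "llama-server.exe" } else { "llama-server" };
    app_data_dir.join("bin").join(file_name)
}

/// Hypothetical platform tag used to choose a release asset.
/// Real llama.cpp asset names are different and change between releases.
fn platform_tag() -> String {
    format!("{}-{}", std::env::consts::OS, std::env::consts::ARCH)
}

/// Decision taken on first local model use.
#[derive(Debug, PartialEq)]
enum SidecarAction {
    /// A previously extracted binary exists and can be reused.
    UseCached(PathBuf),
    /// Nothing cached yet: download for `platform` and extract to `dest`.
    Download { platform: String, dest: PathBuf },
}

/// Check the cache and decide whether a download is needed.
fn plan_sidecar(app_data_dir: &Path) -> SidecarAction {
    let dest = sidecar_path(app_data_dir);
    if dest.is_file() {
        SidecarAction::UseCached(dest)
    } else {
        SidecarAction::Download { platform: platform_tag(), dest }
    }
}
```

Keeping the decision separate from the actual download makes the caching rule easy to test without network access.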

The sidecar exposes an OpenAI-compatible API at 127.0.0.1:11435. That means local and cloud models share the same streaming code path in stream.rs — no duplicate inference logic.
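To show why one code path can serve both cases, here is a minimal sketch of reading an OpenAI-style streaming response, where each chunk arrives as a `data: <json>` line and the stream ends with `data: [DONE]`. The names `SseItem`, `parse_sse_line`, and `collect_chunks` are hypothetical and not taken from stream.rs. JSON decoding is left out and would need a JSON library in practice.

```rust
/// One parsed line from an OpenAI-style streaming response.
#[derive(Debug, PartialEq)]
enum SseItem<'a> {
    /// JSON payload of a chunk, still undecoded.
    Chunk(&'a str),
    /// The `[DONE]` sentinel that ends the stream.
    Done,
}

/// Parse a single line of the response body.
/// Returns None for blank lines, comments, and non-data fields.
fn parse_sse_line(line: &str) -> Option<SseItem<'_>> {
    let payload = line
        .trim_end_matches(|c| c == '\r' || c == '\n')
        .strip_prefix("data:")?
        .trim_start();
    match payload {
        "" => None,
        "[DONE]" => Some(SseItem::Done),
        json => Some(SseItem::Chunk(json)),
    }
}

/// Collect chunk payloads until the stream signals completion.
fn collect_chunks(body: &str) -> Vec<&str> {
    let mut out = Vec::new();
    for line in body.lines() {
        match parse_sse_line(line) {
            Some(SseItem::Chunk(json)) => out.push(json),
            Some(SseItem::Done) => break,
            None => {}
        }
    }
    out
}
```

Because the sidecar and OpenRouter emit the same framing, a parser like this does not need to know which backend produced the bytes.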

The model catalog lives in Rust (model_catalog.rs): 18 curated GGUF models, each with a Hugging Face URL, a RAM requirement, and a quantization level. Download progress streams over SSE so the UI can show a real progress bar.
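As a rough illustration of what a catalog entry and a progress update might look like, consider the sketch below. The struct fields, the `progress` event name, and the JSON shape are all assumptions, not the actual contents of model_catalog.rs.

```rust
/// One entry in a curated catalog. Field names are assumptions.
#[derive(Debug, Clone)]
struct CatalogModel {
    id: &'static str,
    url: &'static str,
    quant: &'static str,
    min_ram_gb: u32,
}

/// Models whose stated RAM requirement fits the given machine.
fn models_that_fit(catalog: &[CatalogModel], ram_gb: u32) -> Vec<&CatalogModel> {
    catalog.iter().filter(|m| m.min_ram_gb <= ram_gb).collect()
}

/// Format one download-progress update as a Server-Sent Events frame.
/// An unknown total (0) is reported as 0 percent to avoid dividing by zero.
fn progress_event(downloaded: u64, total: u64) -> String {
    let percent = if total == 0 {
        0
    } else {
        downloaded.saturating_mul(100) / total
    };
    format!(
        "event: progress\ndata: {{\"downloaded\":{},\"total\":{},\"percent\":{}}}\n\n",
        downloaded, total, percent
    )
}
```

The blank line at the end of each frame is what tells an SSE client that the event is complete, so the UI can update its progress bar per frame.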

3. Unified streaming for local + cloud


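```rust
// Excerpt: `pool`, `model`, `options`, and `key` are defined earlier (not shown).
match resolve_model_route(pool, model).await? {
    // Local model: make sure the llama-server sidecar is up, then stream from it.
    ModelRoute::Local { model } => {
        sidecar::ensure_running(&model).await?;
        stream_openai_compatible(
            "http://127.0.0.1:11435/v1/chat/completions",
            "local", // placeholder in the API-key position
            &model,
            options,
        ).await?
    }
    // OpenRouter: same streaming function, different endpoint and a real key.
    ModelRoute::OpenRouter { slug } => {
        stream_openai_compatible(
            "https://openrouter.ai/api/v1/chat/completions",
            &key,
            &slug,
            options,
        ).await?
    }
    // Direct: OpenAI, Anthropic, Gemini, Perplexity ...
}
```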
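The excerpt above does not show `ModelRoute` or how a route is resolved. A simplified, synchronous sketch of the idea follows. It assumes model ids carry a `local/` or `openrouter/` prefix, which is a hypothetical convention. The real `resolve_model_route` takes a `pool` argument and may work quite differently.

```rust
/// Simplified version of the routing enum used in the excerpt.
#[derive(Debug, PartialEq)]
enum ModelRoute {
    Local { model: String },
    OpenRouter { slug: String },
}

/// Hypothetical resolver: assumes ids look like "local/<name>"
/// or "openrouter/<slug>". Unknown ids yield None.
fn resolve_route(model_id: &str) -> Option<ModelRoute> {
    if let Some(name) = model_id.strip_prefix("local/") {
        Some(ModelRoute::Local { model: name.to_string() })
    } else if let Some(slug) = model_id.strip_prefix("openrouter/") {
        Some(ModelRoute::OpenRouter { slug: slug.to_string() })
    } else {
        None
    }
}

/// Endpoint each route posts to; both URLs appear in the excerpt.
fn endpoint(route: &ModelRoute) -> &'static str {
    match route {
        ModelRoute::Local { .. } => "http://127.0.0.1:11435/v1/chat/completions",
        ModelRoute::OpenRouter { .. } => "https://openrouter.ai/api/v1/chat/completions",
    }
}
```

With routing reduced to "which URL and which key", the streaming function itself can stay identical for every OpenAI-compatible backend.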