Run AI Models On-Device — Zero Config, Five Minutes

Glenn Sonna — Mon, 06 Apr 2026 13:55:40 +0000

You already know why on-device AI matters. Privacy, latency, cost. You've read the guides.

Now you want to actually do it. Here's what that looks like with Xybrid — no tensor shapes, no preprocessing scripts, no ML expertise.

Install

# macOS / Linux
curl -sSL https://raw.githubusercontent.com/xybrid-ai/xybrid/master/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/xybrid-ai/xybrid/master/install.ps1 | iex
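
On macOS or Linux, a quick way to confirm the installer put the binary on your PATH is a small shell check. This is only a sketch: `require_cmd` is a hypothetical helper name, and it relies on the POSIX `command -v` builtin rather than on any Xybrid flag.

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME is on PATH, otherwise explain and fail.
# (Hypothetical helper; uses only the POSIX `command -v` builtin.)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found: $1 (open a new terminal or check your PATH)" >&2
        return 1
    fi
}

# Usage after installing:
#   require_cmd xybrid
```

If the check fails right after installing, the most common cause is a shell session that has not yet picked up the updated PATH.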

Text-to-Speech

xybrid run --model kokoro-82m --input "Hello from the edge" --output hello.wav

That's it. Xybrid resolved the model from the registry, downloaded it, ran inference, and saved a WAV file. You configured nothing.

Kokoro is an 82M-parameter TTS model with 24 voices. The first run downloads about 80MB and caches it locally, so subsequent runs skip the download.
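
Because the model is cached after the first run, it is cheap to call the same command in a loop. The sketch below turns each non-empty line of a text file into its own WAV file, using only the `--model`, `--input`, and `--output` flags shown above. The `tts_batch` function, the `XYBRID_BIN` variable, and the `line-N.wav` naming are illustrative assumptions, not part of Xybrid.

```shell
#!/bin/sh
# XYBRID_BIN lets you swap in another executable (e.g. `echo`) for a dry run.
# (Hypothetical variable, not something Xybrid itself reads.)
XYBRID_BIN="${XYBRID_BIN:-xybrid}"

# tts_batch FILE OUTDIR: synthesize each non-empty line of FILE
# to OUTDIR/line-N.wav, stopping at the first failure.
tts_batch() {
    mkdir -p "$2" || return 1
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        # </dev/null keeps the command from consuming the loop's input.
        "$XYBRID_BIN" run --model kokoro-82m --input "$line" \
            --output "$2/line-$n.wav" </dev/null || return 1
    done < "$1"
    echo "synthesized $n file(s) into $2"
}

# Usage:
#   tts_batch script.txt out/
```

Setting `XYBRID_BIN=echo` before calling `tts_batch` prints the commands instead of running them, which is handy for checking the file names first.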

Speech Recognition

xybrid run --model whisper-tiny --input recording.wav

Whisper Tiny transcribes audio in real time on any modern laptop and outputs plain text.
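
Since the result is plain text, transcribing a folder of recordings can be a short loop. This sketch assumes the transcript is printed to standard output, which the post does not state explicitly; `transcribe_dir` and `XYBRID_BIN` are hypothetical names used for illustration.

```shell
#!/bin/sh
# XYBRID_BIN is a hypothetical override, useful for dry runs.
XYBRID_BIN="${XYBRID_BIN:-xybrid}"

# transcribe_dir DIR: write DIR/NAME.txt next to each DIR/NAME.wav.
# Assumption: the transcript is printed to standard output.
transcribe_dir() {
    for wav in "$1"/*.wav; do
        [ -e "$wav" ] || continue   # no .wav files: the glob stays literal
        txt="${wav%.wav}.txt"
        "$XYBRID_BIN" run --model whisper-tiny --input "$wav" > "$txt" || return 1
        echo "wrote $txt"
    done
}

# Usage:
#   transcribe_dir recordings/
```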

Text Generation

xybrid run --model qwen3.5-0.8b --input "Explain quantum computing in one sentence"

Qwen 3.5 0.8B runs locally via llama.cpp. It covers 201 languages and fits in 500MB when quantized.
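
With all three tasks available from the same command, they can be chained into a rough voice-in, voice-out loop. The sketch below assumes that both transcription and generation print their text to standard output, which is not confirmed in this post; `voice_reply` and `XYBRID_BIN` are hypothetical names.

```shell
#!/bin/sh
# XYBRID_BIN is a hypothetical override, useful for testing with a stub.
XYBRID_BIN="${XYBRID_BIN:-xybrid}"

# voice_reply IN_WAV OUT_WAV: transcribe a spoken question,
# generate an answer, and speak the answer into OUT_WAV.
# Assumption: the first two steps print their text to stdout.
voice_reply() {
    question=$("$XYBRID_BIN" run --model whisper-tiny --input "$1") || return 1
    answer=$("$XYBRID_BIN" run --model qwen3.5-0.8b --input "$question") || return 1
    "$XYBRID_BIN" run --model kokoro-82m --input "$answer" --output "$2"
}

# Usage:
#   voice_reply question.wav answer.wav
```

Each step is a separate process here, so every model is loaded again on each call. That is fine for trying the idea out, but an app would more likely use one of the SDKs.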

Browse the Registry

xybrid models list

25+ models, all hosted on Hugging Face, downloaded on demand, and cached locally:

Model	Task	Parameters	Notes
kokoro-82m	Text-to-Speech	82M	24 voices, high quality
kitten-tts-nano-0.8	Text-to-Speech	15M	Ultra-lightweight
qwen3-tts-0.6b	Text-to-Speech	600M	Multilingual
whisper-tiny	Speech Recognition	39M	Real-time, multilingual
wav2vec2-base-960h	Speech Recognition	95M	CTC-based
lfm2.5-350m	Text Generation	354M	9 languages, edge-optimized
smollm2-360m	Text Generation	360M	Best tiny LLM
qwen3.5-0.8b	Text Generation	800M	201 languages
gemma-4-e2b	Text Generation	5.1B	Multimodal
mistral-7b	Text Generation	7B	Function calling
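
The numbers in the table are parameter counts, not download sizes. A back-of-the-envelope conversion is parameters times bits per weight, divided by 8. The sketch below applies it. The bit widths are assumptions for illustration (the post does not say which quantization each model ships with), and the estimate ignores metadata, tokenizer files, and runtime memory. `est_mb` is a hypothetical helper.

```shell
#!/bin/sh
# est_mb PARAMS_MILLIONS BITS_PER_WEIGHT: rough weight size in MB.
# size ~= parameters * bits / 8 (weights only, no overhead).
est_mb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%d\n", p * b / 8 }'
}

est_mb 800 4    # qwen3.5-0.8b, assuming 4-bit weights  -> 400
est_mb 82 8     # kokoro-82m, assuming 8-bit weights    -> 82
```

Under those assumed bit widths, the results line up with the "fits in 500MB quantized" and "~80MB" figures quoted earlier.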

Beyond the CLI

The CLI is the fastest way to evaluate. When you're ready to integrate into an app, Xybrid has SDKs for Flutter, Swift, Kotlin, Unity, and Rust — same models, same behavior, every platform.

Xybrid is in beta (v0.1.0-beta9) and is open source under Apache 2.0.

GitHub: github.com/xybrid-ai/xybrid

Questions? Drop them in the comments — happy to help you get running.