Taylor

Posted on May 31

Anti Refusal LLM Service

I Built a 12MB Desktop App for Running Uncensored AI Models Locally (Tauri + Rust + Ollama)

How I built Cerberus AI, a local-first desktop app that auto-detects your GPU, pulls the right model quantization, and gives you uncensored AI chat without sending a single prompt to the cloud.

Every major language model ships with an alignment layer that refuses certain prompts. Sometimes that's reasonable. Sometimes you're a security researcher, a creative writer, or just someone who doesn't want a corporation deciding what questions you're allowed to ask.

I built Cerberus AI to fix that — and to make the whole experience local-first, lightweight, and dead simple to install.

What Is Cerberus AI?
Cerberus AI is a platform for running open-weight, refusal-ablated language models on your own hardware. It has three parts:

A native desktop app (~12 MB) built with Tauri + Rust — not Electron
Open-weight GGUF models hosted on a public CDN
An OpenAI-compatible managed API for when you don't want to run locally

The desktop app integrates directly with Ollama, auto-detects your GPU VRAM, and recommends the right model quantization for your hardware. From 4 GB laptops to 24 GB workstations, it just works.
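To make the VRAM-based recommendation concrete, here is a minimal Python sketch of how a detected VRAM figure could map to a quantization. The tier thresholds, the `QUANT_TIERS` table, and the `recommend_quant` function are hypothetical illustrations, not the app's actual Rust logic; only the Q4_K_M and Q8_0 labels come from the download examples in this post.

```python
# Hypothetical tiers: (minimum VRAM in GB, quantization label).
# These thresholds are illustrative guesses, not Cerberus's real table.
QUANT_TIERS = [
    (16.0, "Q8_0"),
    (8.0, "Q6_K"),
    (6.0, "Q5_K_M"),
    (0.0, "Q4_K_M"),
]


def recommend_quant(vram_gb: float) -> str:
    """Return the highest-precision quantization whose VRAM floor is met."""
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    # Tiers are ordered from largest floor to smallest, so the first match wins.
    for min_vram, quant in QUANT_TIERS:
        if vram_gb >= min_vram:
            return quant
    return QUANT_TIERS[-1][1]


if __name__ == "__main__":
    for vram in (4, 8, 24):
        print(f"{vram} GB -> {recommend_quant(vram)}")
```

A real selector would also need to account for context length and for VRAM already in use by other processes, which this sketch ignores.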

Cerberus AI Desktop Chat

What Is Refusal Ablation?
This is the core technical innovation behind Cerberus models. Here's the short version:

Language models learn a refusal direction in their activation space during alignment training. When a prompt triggers this direction, the model produces refusal text ("I can't help with that") regardless of whether the underlying model actually lacks the knowledge.

Refusal ablation surgically removes this direction from the model weights. The technique:

Identifies the refusal direction vector in the model's residual stream
Projects it out of the weight matrices
Preserves all other reasoning capabilities
The result is a model that treats every prompt equally. No refusals. No moralizing. Just direct, unfiltered output from the model's actual knowledge.

We apply this to multiple base architectures:

| Model | Base | Parameters | Use Case |
| --- | --- | --- | --- |
| Cerberus 4B v2 | Qwen 3.5 | 4B | General purpose, fits on 4-8 GB GPU |
| Arbiter GL9b | GLM-4 | 9B | Heavier reasoning, needs 6+ GB |
| Gamma3 1B BDPO | Custom | 1B | Edge devices, CPU-only inference |

All models are distributed as GGUF files, the same format llama.cpp uses. Download once, run anywhere.

Why Tauri Instead of Electron?
Electron bundles an entire Chromium browser. That's 150+ MB just for the runtime. For a chat app that talks to a local Ollama instance, that's absurd.

Tauri uses your system's existing WebView (WebView2 on Windows) and writes the backend in Rust. The result:

~12 MB installer vs 150+ MB for Electron
Native performance — the Rust backend handles hardware detection, Ollama management, and streaming
Lower memory footprint — no spare Chrome process eating your RAM
System-level integration — direct access to GPU info, filesystem, and process management
The frontend is Vue 3 + Vite, but the heavy lifting (VRAM detection, model management, MCP plugin orchestration) all happens in Rust.

Install in 30 Seconds
On Windows, open PowerShell and run:

powershell

irm https://cerberusai.dev/get | iex

This single command:

Checks for (and installs) WebView2 if missing
Installs Ollama if missing
Detects your GPU VRAM
Downloads the recommended model quantization
Installs the Cerberus desktop app
Launches it

You paste your API key (free to generate at access.cerberusai.dev) and you're chatting locally.

The API — OpenAI-Compatible
Don't want to run local? Use the managed API. It follows the OpenAI chat completions shape, so any existing OpenAI SDK code works with a one-line change:

python

from openai import OpenAI

# Point the standard OpenAI client at the Cerberus endpoint.
client = OpenAI(
    base_url="https://api.cerberusai.dev/v1",
    api_key="YOUR_API_KEY",
)

# Request a streamed chat completion.
response = client.chat.completions.create(
    model="cerberus-4b-v2-abliterated",
    messages=[
        {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
        {"role": "user", "content": "Explain how refusal ablation works"},
    ],
    stream=True,
)

# Print each content delta as it arrives; skip chunks with no choices or no text.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

The API streams via SSE and returns standard error codes (401, 402, 429). The public model CDN at llm.cerberusai.dev is fully CORS-enabled, so you can even fetch model metadata from browser-based apps.
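If you are not using the SDK, the stream can be read by hand. The sketch below assumes the endpoint follows the OpenAI streaming convention exactly: each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. The `iter_sse_content` helper and the `SAMPLE` lines are made up for illustration and are not captured from the real API.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel (OpenAI convention)
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample events, not real API output.
SAMPLE = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

if __name__ == "__main__":
    print("".join(iter_sse_content(SAMPLE)))
```

In practice the `lines` argument would be the decoded line iterator of an HTTP response opened with `"stream": true`.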

curl Example
bash

curl -X POST https://api.cerberusai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerberus-4b-v2-abliterated",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
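The error codes mentioned earlier (401, 402, 429) each call for a different client reaction. Here is a small Python sketch of one way to handle them. The meanings in `ACTIONS` are assumptions: in particular, reading 402 as "out of credits" is a guess based on the credit system, and the backoff schedule is a generic pattern rather than documented API behavior.

```python
# Assumed meanings for the documented status codes (not confirmed by the API docs).
ACTIONS = {
    401: "fix_api_key",   # missing or invalid API key
    402: "add_credits",   # assumed: monthly credits exhausted
    429: "retry_later",   # rate limited
}


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse client action."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "fail")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


if __name__ == "__main__":
    for code in (200, 401, 402, 429, 500):
        print(code, classify_status(code))
    print(backoff_delays(6))
```

Only the `retry_later` case is worth retrying automatically; a 401 or 402 will keep failing until the key or the credit balance changes.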

Model Downloads — Public CDN
All GGUF model files are hosted on llm.cerberusai.dev with a public JSON API:

bash

# List all available models
curl https://llm.cerberusai.dev/api/models/

# List files for a specific model (with exact byte sizes)
curl https://llm.cerberusai.dev/api/models/cerberus-4b-v2-abliterated/

# Download a specific quantization
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf

# Resume interrupted downloads
wget -c https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf

Range requests are supported, CORS is enabled for all origins, and GGUF files are served with proper Content-Disposition: attachment headers.
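Because range requests are supported, a resume like `wget -c` can also be done from a script. The following Python sketch uses only the standard library; the `resume_headers`, `open_mode`, and `download` helpers are hypothetical names for illustration. It handles just the simple case: an already-complete file may get a 416 response, which `urlopen` raises as an `HTTPError`.

```python
import os
import shutil
import urllib.request


def resume_headers(part_path: str) -> dict[str, str]:
    """Build a Range header asking for the bytes not yet on disk."""
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def open_mode(status: int) -> str:
    """206 means the server honored the range, so append; otherwise start over."""
    return "ab" if status == 206 else "wb"


def download(url: str, dest: str) -> None:
    """Download `url` to `dest`, resuming a partial file when possible."""
    request = urllib.request.Request(url, headers=resume_headers(dest))
    with urllib.request.urlopen(request) as response:
        # A 200 here means the range was ignored, so the file is rewritten.
        with open(dest, open_mode(response.status)) as out:
            shutil.copyfileobj(response, out)
```

Calling `download(url, "model.gguf")` a second time after an interruption sends `Range: bytes=<size>-` and appends only the missing tail.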

Built-In Features
Beyond chat, the desktop app includes:

Model Manager: browse local Ollama models, pull from the Cerberus cloud catalog, import raw GGUF files, switch active models, see disk usage
MCP Plugin System: browse and install Model Context Protocol plugins from inside the app. There's also a public MCP Skills Server at api.cerberusai.dev/skills-sse
Hardware Monitoring: CPU, RAM, and VRAM activity displayed in the interface
Zero Telemetry: no prompts leave your machine during local inference. No analytics. No phone-home.

Pricing
Every account gets 50,000 free monthly credits. That's enough for casual use and testing.

If you need more:

| Plan | Price | Monthly Credits |
| --- | --- | --- |
| Free | $0 | 50,000 |
| Lite | $8/mo | 300,000 |
| Mid | $15/mo | 900,000 |
| Exp | $22/mo | 2,000,000 |

One-time top-ups start at $5 (125,000 credits). Stripe and PayPal supported. The free tier has no time limit; it refreshes every month.

Local inference through Ollama costs zero credits. Credits only apply to the managed API.
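To compare the paid options, it helps to normalize each one to a price per 1,000 credits. This short Python sketch does that arithmetic using only the figures listed above; the `PLANS` dictionary and `usd_per_1k_credits` function are illustrative names, not part of any Cerberus SDK.

```python
# (price in USD, credits) taken from the pricing figures in this post.
PLANS = {
    "Lite": (8, 300_000),
    "Mid": (15, 900_000),
    "Exp": (22, 2_000_000),
    "Top-up": (5, 125_000),
}


def usd_per_1k_credits(price_usd: float, credits: int) -> float:
    """Return the cost in USD of 1,000 credits."""
    return price_usd / credits * 1000


if __name__ == "__main__":
    for name, (price, credits) in PLANS.items():
        print(f"{name}: ${usd_per_1k_credits(price, credits):.4f} per 1,000 credits")
```

By this measure the per-credit price drops with each higher tier, and the one-time top-up is the most expensive way to buy credits.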

Try It
🌐 Website: cerberusai.dev
📦 GitHub: github.com/tjcrims0nx/CerberusAI-Desktop
🧠 Models: llm.cerberusai.dev
📖 API Docs: cerberusai.dev/docs/api
💬 Discord: discord.gg/YdVj7hEtv5
🔑 Get API Key: access.cerberusai.dev

If you've ever been frustrated by a language model refusing a perfectly reasonable prompt, or if you just want to run AI locally without cloud dependencies, give Cerberus a try. The install is one command, the free tier is permanent, and the weights are open.

I'd love to hear feedback. Drop into the Discord or open an issue on GitHub.

Cerberus AI is an open-weight project. The desktop app source is on GitHub. Models are distributed as GGUF under open licenses. The managed API is a pay-as-you-go service.
