noxlie

Posted on Jul 5 • Originally published at ai-privacy-tools.vercel.app

Setting Up SillyTavern With a Privacy-First AI API in 2026

#ai #privacy #tutorial #opensource

SillyTavern has become the standard frontend for AI roleplay and creative writing. But the default setup sends all your conversations to OpenAI — a company that logs everything for 30 days and uses your data for training.

If you care about what happens to your prompts, you need a different backend. Here's how I set up SillyTavern with a privacy-focused API that accepts crypto payments and doesn't require an account.

Why the Backend Matters

SillyTavern itself is just a frontend. It runs locally on your machine. But the moment you connect it to a cloud API, your conversation context — every message, every character card, every system prompt — gets sent to that provider's servers.

OpenAI stores this for 30 days. Anthropic has similar retention. Google's Gemini logs everything. If you're writing creative fiction, roleplaying, or discussing sensitive topics, that's a lot of personal data sitting on someone else's server.

The Setup (5 Minutes)

The API I'm using is OpenAI-compatible, which means SillyTavern supports it out of the box. No plugins, no hacks.

Get an API key from the provider's dashboard
In SillyTavern, go to API Connection → Chat Completion → Custom (OpenAI-compatible)
Enter the endpoint URL and your key
Select a model from the dropdown

That's it. The connection works because SillyTavern treats it like any OpenAI-compatible endpoint.

Model Selection for Creative Writing

Not all models are equal for creative tasks. Here's what I found after testing:

For prose quality: Claude 3.5 Sonnet is the best. It handles nuance, subtext, and character voice better than anything else. Costs about $0.003 to $0.01 per message.

For consistency: Llama 3 70B stays in character better than most proprietary models. It's also 3-10x cheaper.

For budget: DeepSeek V3 costs a fraction of a cent per message. Quality is lower, but for everyday use it's more than good enough.

For speed: Gemini 2.0 Flash responds in 1-3 seconds. Quality is lower, but for quick back-and-forth it's unbeatable.

The Privacy Angle

The provider I use claims a no-log policy. That's better than OpenAI (which logs everything), but "no logs" is a claim, not a guarantee. Here's what I do to layer privacy:

Pay with crypto (Monero for maximum privacy, Nano for speed)
Use a VPN when connecting to the API
Use a burner email for the account
For the most sensitive conversations, I run a local model through Ollama instead

No single solution is perfect. But combining these layers makes it significantly harder for anyone to connect your AI usage to your real identity.

What About Extensions?

SillyTavern extensions mostly work fine with OpenAI-compatible APIs. Text-to-speech, image generation, summarization, and vector storage all function normally. The only extensions that might not work are ones that rely on provider-specific features — but those are rare.

The Cost Reality

I use SillyTavern daily — about 30-50 messages. My monthly cost is around $3-5. The $8/month flat rate plan covers unlimited use of select models, which is the best deal if you're a heavy user.

Compare that to OpenAI's $20/month ChatGPT Plus subscription, which only gives you GPT-4. With a multi-model API, I get access to 400+ models for less money.

Presets That Work

Generation settings matter as much as model selection. Here are my tested starting points:

Claude: Temperature 0.8-1.0, Top-P 0.95, max tokens 800-1200
Llama 3: Temperature 0.7-0.9, Top-P 0.9, max tokens 600-1000, frequency penalty 0.1
DeepSeek: Temperature 0.8-1.0, Top-P 0.95, max tokens 500-800, frequency penalty 0.15

Temperature is the biggest quality lever — lower for consistency, higher for creativity.

I wrote a more detailed setup walkthrough with troubleshooting for common errors here: NanoGPT SillyTavern Setup Guide.

DEV Community