Every LLM gateway I evaluated had the same problem: they logged my prompts.
I'd spin up a proxy, route my team's requests through it, check the dashboard - and there they were. Full request bodies, full response bodies, sitting in someone's database. Sometimes on someone else's infrastructure.
For a lot of use cases that's fine. But when you're working with customer data, internal documents, or anything remotely sensitive, "we store everything by default" isn't a feature. It's a liability.
The problem
I needed something simple:
- Route LLM requests through a single endpoint
- Manage API keys so developers don't share raw provider keys
- Track who's using what, how much it costs
- Never touch the actual prompts
Sounds basic, right? But every solution I found either:
- Logged everything - full request/response bodies in their database
- Charged per-request - on top of what I'm already paying the provider
- Was way too complex - sprawling microservice architectures, dozens of config files, hours of setup for something that should take minutes
I just wanted a proxy. A dumb pipe with access control and a dashboard.
So I built one
VoidLLM is a single Go binary that sits between your apps and LLM providers. It's OpenAI-compatible - change your base URL, keep your SDK.
```shell
docker run -p 8080:8080 \
  -e VOIDLLM_ADMIN_KEY=$(openssl rand -hex 32) \
  -e VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32) \
  ghcr.io/voidmind-io/voidllm:latest
```
That's it. SQLite database, admin UI, API - all in a 63MB Docker image.
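"Change your base URL, keep your SDK" looks roughly like this in Go - a sketch that builds the same chat-completion request you'd send to OpenAI, just aimed at the proxy. The localhost URL and the key value are placeholders for illustration:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newChatRequest builds an OpenAI-style chat completion request aimed at the
// proxy instead of api.openai.com. Your app's only change is the base URL and
// the key it presents (a VoidLLM-issued key, not the raw provider key).
func newChatRequest(baseURL, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	body := []byte(`{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}`)
	req, err := newChatRequest("http://localhost:8080", "example-proxy-key", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Any SDK that lets you override the base URL gets the same effect without hand-rolled requests.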
What it does
Access control: Org → Team → Key hierarchy with 4 RBAC roles. Create API keys per user, per team, or per service account. Keys are HMAC-SHA256 hashed — we never store the raw key.
Usage tracking: Every request logs who made it, which model, how many tokens, how long it took, what it cost. No prompt content. No response content. Just metadata.
Rate limiting: Per org, per team, per key. Token budgets (daily/monthly) and request limits (per minute/per day). In-memory for single instance, Redis for distributed.
Provider adapters: OpenAI (passthrough), Anthropic (full message format translation), Azure OpenAI (URL mapping), Ollama, vLLM, or any OpenAI-compatible endpoint.
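Adapter selection boils down to mapping a model name to an upstream. A minimal sketch (the table here is hypothetical - a real gateway loads this from configuration, and the Anthropic path also translates the message format):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveUpstream picks an upstream base URL from the requested model name.
// Illustrative only: real routing would be config-driven, not hardcoded.
func resolveUpstream(model string) string {
	switch {
	case strings.HasPrefix(model, "claude"):
		return "https://api.anthropic.com" // needs message-format translation
	case strings.HasPrefix(model, "gpt"):
		return "https://api.openai.com" // passthrough
	default:
		return "http://localhost:11434" // e.g. a local Ollama instance
	}
}

func main() {
	fmt.Println(resolveUpstream("claude-sonnet-4"))
	fmt.Println(resolveUpstream("gpt-4o"))
	fmt.Println(resolveUpstream("llama3"))
}
```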
What it doesn't do
It never sees your prompts. This isn't a toggle. There's no "enable content logging" option. The proxy reads the request body only to extract the model field for routing, then passes it through. Request and response bodies exist in memory for the duration of the request - they're never written to disk, never logged, never stored anywhere.
This is a hard architectural constraint, not a policy. You can audit the code - there's no code path that persists content.
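"Reads the request body only to extract the model field" is easy to show in Go: decode into a struct that has exactly one field, and everything else in the body is never inspected. A sketch of that pattern (not the literal VoidLLM code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractModel pulls only the "model" field out of a request body.
// The struct has no field for messages, so the prompt content is skipped
// by the decoder and never lands in any named variable.
func extractModel(body []byte) (string, error) {
	var probe struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &probe); err != nil {
		return "", err
	}
	return probe.Model, nil
}

func main() {
	body := []byte(`{"model":"gpt-4o","messages":[{"role":"user","content":"sensitive"}]}`)
	model, err := extractModel(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(model) // routing needs this; the messages are passed through untouched
}
```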
The privacy angle
I keep emphasizing this because it matters more than people think.
If you're in the EU, GDPR applies to LLM prompts that contain personal data. If your proxy logs those prompts, congratulations - you now have a data processing operation that needs a legal basis, a retention policy, a deletion mechanism, and probably a DPIA.
Or you could just... not log them.
VoidLLM is GDPR-compliant by architecture. There's nothing to delete because there's nothing stored. The DPO's favorite proxy.
Tech decisions
A few choices that might be interesting if you're building something similar:
Go + Fiber v3: Fiber runs on fasthttp, not net/http. The proxy overhead is under 2ms. For a pass-through proxy where every millisecond counts (especially with streaming), this matters.
SQLite default: Zero dependencies to get started. modernc.org/sqlite is pure Go — no CGO, no shared libraries. For single-instance deployments (which is most people), it just works. PostgreSQL is there when you need to scale.
Embedded UI: The React admin dashboard is compiled into the Go binary via embed.FS. No separate frontend deployment, no CORS, no reverse proxy config. One binary, one port.
HMAC-SHA256 for API keys: Not bcrypt. The auth check is on the hot path of every proxy request. HMAC with a server-side secret gives O(1) lookup with constant-time comparison. Bcrypt would add 100ms+ per request.
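The pattern looks roughly like this - the key names are made up, but the mechanics (keyed hash for O(1) lookup, `hmac.Equal` for the constant-time comparison) are standard library:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKey derives the stored digest for an API key using HMAC-SHA256 with a
// server-side secret. The digest is what goes in the database; lookup is O(1)
// because you hash the presented key and fetch the row by digest.
func hashKey(secret, apiKey []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(apiKey)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify checks a presented key against a stored digest in constant time.
func verify(secret []byte, presented, storedDigest string) bool {
	got, err := hex.DecodeString(hashKey(secret, []byte(presented)))
	if err != nil {
		return false
	}
	want, err := hex.DecodeString(storedDigest)
	if err != nil {
		return false
	}
	return hmac.Equal(got, want) // constant-time comparison
}

func main() {
	secret := []byte("server-side-secret")
	stored := hashKey(secret, []byte("vk-live-abc123")) // key format is illustrative
	fmt.Println(verify(secret, "vk-live-abc123", stored))
	fmt.Println(verify(secret, "vk-live-wrong", stored))
}
```

Bcrypt's deliberate slowness is the right trade-off for passwords a human types once per session; it's the wrong one for a credential checked on every single proxied request.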
Ed25519 license keys: Enterprise features are gated by a signed JWT that's verified offline. No license server call on the hot path. Daily heartbeat refreshes the key in the background.
What's next
The project is still very early. Some things on the roadmap:
- Model routing / fallback chains
- More provider adapters
- Documentation site
If you're running LLMs (self-hosted or managed) and need access control without the prompt logging, give it a try:
github.com/voidmind-io/voidllm
I'm looking for early adopters. If you're willing to test it in your setup and share honest feedback, I'll give you a free Enterprise license.
Open an issue or reach out directly — I want to know what breaks, what's missing, and what you'd actually need before you'd trust this in production.
It's BSL 1.1 licensed — source available, self-hosting permitted. Converts to Apache 2.0 after 4 years.
This project was built with significant assistance from AI (Claude by Anthropic).