DEV Community

Sabahattin Kalkan

I built a self-hosted LLM proxy that supports 12 providers (Claude, GPT-4o, Gemini, Ollama...)

Every time a new LLM comes out, someone on your team adds a new SDK,
a new API key in .env, and another set of error-handling logic. Repeat
for OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama...

I got tired of this. So I built llm-gateway.

What it does

llm-gateway is a single Go binary that sits between your app and every
LLM provider. Your code sends one request format (OpenAI-compatible),
and the gateway routes it to the right provider based on the model name.

# One endpoint for all providers
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "messages": [{"role": "user", "content": "hello"}]}'

# Switch provider by changing the model name — zero code changes
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Local model — no API key needed
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "llama3.2", "messages": [...]}'

Model routing is automatic:

  • claude-* → Anthropic
  • gpt-*, o1, o3 → OpenAI
  • gemini-* → Google
  • llama* → Ollama or Groq
  • mistral-* → Mistral
  • grok-* → xAI
  • sonar-* → Perplexity
  • command-* → Cohere
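The routing above is essentially prefix matching on the model name. A minimal sketch of that dispatch in Go — the function name and provider strings are my own, not the gateway's internal code:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveProvider maps a model name to a provider via prefix rules,
// mirroring the routing table above. Illustrative only.
func resolveProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "claude-"):
		return "anthropic"
	case strings.HasPrefix(model, "gpt-"),
		strings.HasPrefix(model, "o1"),
		strings.HasPrefix(model, "o3"):
		return "openai"
	case strings.HasPrefix(model, "gemini-"):
		return "google"
	case strings.HasPrefix(model, "llama"):
		return "ollama" // or "groq", depending on configuration
	case strings.HasPrefix(model, "mistral-"):
		return "mistral"
	case strings.HasPrefix(model, "grok-"):
		return "xai"
	case strings.HasPrefix(model, "sonar-"):
		return "perplexity"
	case strings.HasPrefix(model, "command-"):
		return "cohere"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(resolveProvider("claude-sonnet-4-20250514")) // anthropic
	fmt.Println(resolveProvider("gpt-4o"))                   // openai
	fmt.Println(resolveProvider("llama3.2"))                 // ollama
}
```

The nice property of prefix routing is that new model versions (gpt-4o → gpt-5, say) route correctly without a code change.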

Install in 30 seconds

docker run -p 8080:8080 -v gateway-data:/data scutontech/llm-gateway

Open http://localhost:8080/admin → set your admin password → add API
keys from the Settings page. No .env files, no YAML editing.
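If you'd rather run it under Compose, here's a minimal sketch — the image, port, and volume come from the docker run command above; the service name is my own:

```yaml
services:
  llm-gateway:
    image: scutontech/llm-gateway
    ports:
      - "8080:8080"
    volumes:
      - gateway-data:/data
    restart: unless-stopped

volumes:
  gateway-data:
```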

Admin dashboard

The gateway ships with a full admin dashboard:

  • Real-time stats — requests, tokens, latency, error rate, estimated cost
  • Provider breakdown — which providers you're actually using
  • Cost analytics — daily/monthly spend by model, with CSV export
  • Request log — last 50 requests with model, provider, tokens, cost, latency
  • Dark mode — because of course

Streaming support

SSE streaming works for all providers. The gateway normalizes
Anthropic's stream format to OpenAI's SSE format transparently.

curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "claude-sonnet-4-20250514", "stream": true, "messages": [...]}'
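To give a feel for what that normalization involves: Anthropic streams `content_block_delta` events while OpenAI clients expect `chat.completion.chunk` objects. A hedged sketch of converting one delta event — the struct shapes are trimmed to the fields needed here, and this is my illustration, not the gateway's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicDelta models the payload of an Anthropic
// "content_block_delta" SSE event (only the fields used here).
type anthropicDelta struct {
	Type  string `json:"type"`
	Delta struct {
		Text string `json:"text"`
	} `json:"delta"`
}

// delta/choice/openAIChunk model an OpenAI-style streaming chunk.
type delta struct {
	Content string `json:"content"`
}

type choice struct {
	Delta delta `json:"delta"`
}

type openAIChunk struct {
	Object  string   `json:"object"`
	Choices []choice `json:"choices"`
}

// normalize converts one Anthropic delta event into an OpenAI chunk.
func normalize(raw []byte) ([]byte, error) {
	var in anthropicDelta
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, err
	}
	out := openAIChunk{
		Object:  "chat.completion.chunk",
		Choices: []choice{{Delta: delta{Content: in.Delta.Text}}},
	}
	return json.Marshal(out)
}

func main() {
	event := []byte(`{"type":"content_block_delta","delta":{"type":"text_delta","text":"Hel"}}`)
	chunk, _ := normalize(event)
	fmt.Println(string(chunk))
	// {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}
}
```

The real translation also has to handle message start/stop events, tool calls, and usage accounting, but text deltas are the hot path.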

Supported providers

Provider      Models
------------  ----------------------------------------------
Anthropic     claude-opus-4, claude-sonnet-4, claude-haiku-4
OpenAI        gpt-4o, gpt-4o-mini, o1, o3-mini
Google        gemini-2.0-flash, gemini-1.5-pro
Groq          llama-3.3-70b, mixtral-8x7b
Mistral       mistral-large, codestral
Cohere        command-r-plus, command-r
xAI           grok-2, grok-2-mini
Perplexity    sonar-large, sonar-small
Together AI   50+ open-source models
Ollama        any local model
LM Studio     any local model
vLLM          any hosted model

Why Go?

~15MB binary. Under 100ms cold start. ~20MB memory at idle.
Drop it on a $5 VPS and forget about it.

Try it

MIT licensed. PRs welcome.
