DEV Community

Sabahattin Kalkan

I built a self-hosted LLM proxy that supports 12 providers (Claude, GPT-4o, Gemini, Ollama...)

Every time a new LLM comes out, someone on your team adds a new SDK,
a new API key in .env, and another set of error-handling logic. Repeat
for OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama...

I got tired of this. So I built llm-gateway.

What it does

llm-gateway is a single Go binary that sits between your app and every
LLM provider. Your code sends one request format (OpenAI-compatible),
and the gateway routes it to the right provider based on the model name.

# One endpoint for all providers
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "messages": [{"role": "user", "content": "hello"}]}'

# Switch provider by changing the model name — zero code changes
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Local model — no API key needed
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "llama3.2", "messages": [...]}'

Model routing is automatic:

  • claude-* → Anthropic
  • gpt-*, o1, o3 → OpenAI
  • gemini-* → Google
  • llama* → Ollama or Groq
  • mistral-* → Mistral
  • grok-* → xAI
  • sonar-* → Perplexity
  • command-* → Cohere
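The routing above is essentially prefix matching on the model name. A minimal sketch of that dispatch in Go — the function name and provider strings are my own, not the gateway's internal code:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveProvider maps a model name to a provider via prefix rules,
// mirroring the routing table above. Illustrative only.
func resolveProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "claude-"):
		return "anthropic"
	case strings.HasPrefix(model, "gpt-"),
		strings.HasPrefix(model, "o1"),
		strings.HasPrefix(model, "o3"):
		return "openai"
	case strings.HasPrefix(model, "gemini-"):
		return "google"
	case strings.HasPrefix(model, "llama"):
		return "ollama" // or "groq", depending on configuration
	case strings.HasPrefix(model, "mistral-"):
		return "mistral"
	case strings.HasPrefix(model, "grok-"):
		return "xai"
	case strings.HasPrefix(model, "sonar-"):
		return "perplexity"
	case strings.HasPrefix(model, "command-"):
		return "cohere"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(resolveProvider("claude-sonnet-4-20250514")) // anthropic
	fmt.Println(resolveProvider("gpt-4o"))                   // openai
	fmt.Println(resolveProvider("llama3.2"))                 // ollama
}
```

The nice property of prefix routing is that new model versions (gpt-4o → gpt-5, say) route correctly without a code change.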

Install in 30 seconds

docker run -p 8080:8080 -v gateway-data:/data scutontech/llm-gateway

Open http://localhost:8080/admin → set your admin password → add API
keys from the Settings page. No .env files, no YAML editing.
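If you'd rather run it under Compose, here's a minimal sketch — the image, port, and volume come from the docker run command above; the service name is my own:

```yaml
services:
  llm-gateway:
    image: scutontech/llm-gateway
    ports:
      - "8080:8080"
    volumes:
      - gateway-data:/data
    restart: unless-stopped

volumes:
  gateway-data:
```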

Admin dashboard

The gateway ships with a full admin dashboard:

  • Real-time stats — requests, tokens, latency, error rate, estimated cost
  • Provider breakdown — which providers you're actually using
  • Cost analytics — daily/monthly spend by model, with CSV export
  • Request log — last 50 requests with model, provider, tokens, cost, latency
  • Dark mode — because of course

Streaming support

SSE streaming works for all providers. The gateway normalizes
Anthropic's stream format to OpenAI's SSE format transparently.

curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "claude-sonnet-4-20250514", "stream": true, "messages": [...]}'
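To give a feel for what that normalization involves: Anthropic streams `content_block_delta` events while OpenAI clients expect `chat.completion.chunk` objects. A hedged sketch of converting one delta event — the struct shapes are trimmed to the fields needed here, and this is my illustration, not the gateway's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicDelta models the payload of an Anthropic
// "content_block_delta" SSE event (only the fields used here).
type anthropicDelta struct {
	Type  string `json:"type"`
	Delta struct {
		Text string `json:"text"`
	} `json:"delta"`
}

// delta/choice/openAIChunk model an OpenAI-style streaming chunk.
type delta struct {
	Content string `json:"content"`
}

type choice struct {
	Delta delta `json:"delta"`
}

type openAIChunk struct {
	Object  string   `json:"object"`
	Choices []choice `json:"choices"`
}

// normalize converts one Anthropic delta event into an OpenAI chunk.
func normalize(raw []byte) ([]byte, error) {
	var in anthropicDelta
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, err
	}
	out := openAIChunk{
		Object:  "chat.completion.chunk",
		Choices: []choice{{Delta: delta{Content: in.Delta.Text}}},
	}
	return json.Marshal(out)
}

func main() {
	event := []byte(`{"type":"content_block_delta","delta":{"type":"text_delta","text":"Hel"}}`)
	chunk, _ := normalize(event)
	fmt.Println(string(chunk))
	// {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}
}
```

The real translation also has to handle message start/stop events, tool calls, and usage accounting, but text deltas are the hot path.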

Supported providers

Provider      Models
------------  ----------------------------------------------
Anthropic     claude-opus-4, claude-sonnet-4, claude-haiku-4
OpenAI        gpt-4o, gpt-4o-mini, o1, o3-mini
Google        gemini-2.0-flash, gemini-1.5-pro
Groq          llama-3.3-70b, mixtral-8x7b
Mistral       mistral-large, codestral
Cohere        command-r-plus, command-r
xAI           grok-2, grok-2-mini
Perplexity    sonar-large, sonar-small
Together AI   50+ open-source models
Ollama        any local model
LM Studio     any local model
vLLM          any hosted model

Why Go?

~15MB binary. Under 100ms cold start. ~20MB memory at idle.
Drop it on a $5 VPS and forget about it.

Try it

MIT licensed. PRs welcome.
