Every time a new LLM comes out, someone on your team adds a new SDK,
a new API key in .env, and a new set of error handling logic. Repeat
for OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama...
I got tired of this. So I built llm-gateway.
## What it does
llm-gateway is a single Go binary that sits between your app and every
LLM provider. Your code sends one request format (OpenAI-compatible),
and the gateway routes it to the right provider based on the model name.
```shell
# One endpoint for all providers
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "messages": [{"role": "user", "content": "hello"}]}'

# Switch provider by changing the model name; zero code changes
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Local model, no API key needed
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "llama3.2", "messages": [...]}'
```
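From application code, "one request format" means one struct you marshal no matter which provider you're targeting. Here's a minimal Go sketch; the struct and function names are mine, not the gateway's, but the JSON field names are the standard OpenAI ones:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message and ChatRequest mirror the OpenAI-compatible request body
// the gateway accepts.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream,omitempty"`
}

// buildRequest produces the JSON body for a single-turn chat request.
func buildRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(ChatRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: prompt}},
	})
}

func main() {
	// Swapping providers is just a different model string; the payload
	// shape never changes.
	for _, model := range []string{"claude-sonnet-4-20250514", "gpt-4o", "llama3.2"} {
		body, _ := buildRequest(model, "hello")
		fmt.Println(string(body))
	}
}
```

POST that body to `http://localhost:8080/v1/chat/completions` with `net/http` and you get the same response shape back regardless of which provider handled it.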
Model routing is automatic:
- `claude-*` → Anthropic
- `gpt-*`, `o1`, `o3` → OpenAI
- `gemini-*` → Google
- `llama*` → Ollama or Groq
- `mistral-*` → Mistral
- `grok-*` → xAI
- `sonar-*` → Perplexity
- `command-*` → Cohere
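The routing idea boils down to prefix matching on the model name. This is an illustrative Go sketch of the table above, not the gateway's actual code (the provider labels here are mine):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveProvider sketches prefix-based model routing: the first
// matching prefix decides which upstream provider gets the request.
func resolveProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "claude-"):
		return "anthropic"
	case strings.HasPrefix(model, "gpt-"),
		strings.HasPrefix(model, "o1"),
		strings.HasPrefix(model, "o3"):
		return "openai"
	case strings.HasPrefix(model, "gemini-"):
		return "google"
	case strings.HasPrefix(model, "llama"):
		return "ollama-or-groq"
	case strings.HasPrefix(model, "mistral-"):
		return "mistral"
	case strings.HasPrefix(model, "grok-"):
		return "xai"
	case strings.HasPrefix(model, "sonar-"):
		return "perplexity"
	case strings.HasPrefix(model, "command-"):
		return "cohere"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(resolveProvider("claude-sonnet-4-20250514")) // anthropic
	fmt.Println(resolveProvider("gpt-4o"))                   // openai
}
```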
## Install in 30 seconds
```shell
docker run -p 8080:8080 -v gateway-data:/data scutontech/llm-gateway
```
Open http://localhost:8080/admin → set your admin password → add API
keys from the Settings page. No .env files, no YAML editing.
## Admin dashboard
The gateway ships with a full admin dashboard:
- Real-time stats — requests, tokens, latency, error rate, estimated cost
- Provider breakdown — which providers you're actually using
- Cost analytics — daily/monthly spend by model, with CSV export
- Request log — last 50 requests with model, provider, tokens, cost, latency
- Dark mode — because of course
## Streaming support
SSE streaming works for all providers. The gateway normalizes
Anthropic's stream format to OpenAI's SSE format transparently.
```shell
curl http://localhost:8080/v1/chat/completions \
  -d '{"model": "claude-sonnet-4-20250514", "stream": true, "messages": [...]}'
```
## Supported providers
| Provider | Models |
|---|---|
| Anthropic | claude-opus-4, claude-sonnet-4, claude-haiku-4 |
| OpenAI | gpt-4o, gpt-4o-mini, o1, o3-mini |
| Google | gemini-2.0-flash, gemini-1.5-pro |
| Groq | llama-3.3-70b, mixtral-8x7b |
| Mistral | mistral-large, codestral |
| Cohere | command-r-plus, command-r |
| xAI | grok-2, grok-2-mini |
| Perplexity | sonar-large, sonar-small |
| Together AI | 50+ open source models |
| Ollama | any local model |
| LM Studio | any local model |
| vLLM | any hosted model |
## Why Go?
~15MB binary. Under 100ms cold start. ~20MB memory at idle.
Drop it on a $5 VPS and forget about it.
## Try it
- GitHub: https://github.com/scuton-technology/llm-gateway
- Docker: `docker pull scutontech/llm-gateway`
MIT licensed. PRs welcome.