You know that feeling when you've just finished migrating your entire codebase from GPT-4 to Claude — rewriting API calls, fixing response parsing, updating streaming logic — and then Google drops a new Gemini with benchmarks that make everything else look like a calculator from 1995?
Yeah. I've been there. Multiple times.
After the third migration in six months, I sat down and thought: why am I rewriting integration code when the only thing that changes is which HTTP endpoint I'm hitting? The request is JSON. The response is JSON. The models all take messages in, text out. Why does switching providers feel like changing the engine on a moving car?
That's how I ended up looking at LM-Proxy — and it changed the way I architect LLM-powered applications. Let me walk you through what it is, why it matters, and how to set it up in under 5 minutes.
The Problem (or: Why Your LLM Integration Code Is a Ticking Time Bomb)
Here's the typical evolution of an LLM-powered project:
Month 1: "We'll just use OpenAI. One provider, one SDK, simple."
Month 3: "Claude is actually better for our summarization task. Let's add Anthropic too." You now have two different SDKs, two authentication flows, two response formats, and an if/else that haunts your dreams.
Month 5: "Gemini Flash is way cheaper for simple queries. Let's route the easy stuff there." Your routing logic now lives in three different files, you have provider-specific error handling everywhere, and your new developer spent two days just understanding which API key goes where.
Month 7: Someone asks "can we add a local model for sensitive data?" and you start updating your LinkedIn.
The root cause is simple: every LLM provider decided to invent their own API format, even though they all do fundamentally the same thing. OpenAI's format became the de facto standard, but Anthropic, Google, and others each have their own quirks, authentication schemes, and streaming protocols.
What you actually need is a reverse proxy — a single endpoint that speaks OpenAI's API format on the outside, but can talk to any provider on the inside.
Enter LM-Proxy
LM-Proxy is a lightweight, OpenAI-compatible HTTP proxy/gateway built with Python and FastAPI. Here's the pitch in one sentence:
Your app talks to one API (OpenAI format). LM-Proxy talks to everyone else.
It supports OpenAI, Anthropic, Google (AI Studio and Vertex AI), local PyTorch models, and anything that speaks OpenAI's API format — all through a single /v1/chat/completions endpoint. The only hard requirement is Python 3.11+.
But "another LLM gateway" isn't what makes it interesting. Here's what does:
1. It's Truly Lightweight
No Kubernetes. No Redis. No Kafka. No 47-microservice architecture. It's a single Python package:
pip install lm-proxy
One TOML config file. One command to start:
lm-proxy
That's it. You can have it running on a $5/month VPS, inside a Docker container, or even embedded directly into your Python application as a library (since it's built on FastAPI, you can import and mount it as a sub-app). The entire core is minimal by design — no bloat, no enterprise-grade complexity for a problem that doesn't need it.
2. Configuration That Doesn't Require a PhD
Here's a complete, working config that routes GPT requests to OpenAI, Claude to Anthropic, and Gemini to Google:
host = "0.0.0.0"
port = 8000
[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"
[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"
[connections.google]
api_type = "google_ai_studio"
api_key = "env:GOOGLE_API_KEY"
[routing]
"gpt*" = "openai.*"
"claude*" = "anthropic.*"
"gemini*" = "google.*"
"*" = "openai.gpt-4o-mini" # fallback
[groups.default]
api_keys = ["my-team-api-key-1", "my-team-api-key-2"]
Read that config top to bottom. You understood it in 30 seconds, didn't you? No YAML indentation nightmares, no 200-line JSON blobs — just clean TOML with obvious semantics. (YAML, JSON, and Python config formats are also supported, by the way.)
The env: prefix pulls secrets from environment variables (or .env files), so your API keys never touch version control.
3. Pattern-Based Routing (The Killer Feature)
The [routing] section is where the magic happens. Keys are glob patterns that match against the model name your client sends. The .* suffix means "pass the model name as-is to the provider." So when your client asks for claude-sonnet-4-5-20250929, LM-Proxy forwards exactly that to Anthropic's API. No mapping tables, no model ID translation files — it just works.
You can also pin patterns to specific models:
[routing]
"custom*" = "local.llama-7b" # Any "custom*" request → local Llama
"gpt-3.5*" = "openai.gpt-3.5-turbo" # Pin to a specific model
"*" = "openai.gpt-4o-mini" # Everything else → cheap fallback
This means your client code never changes. Want to try a new model? Update one line in the config. Want to A/B test two providers? Add a routing rule. Want to deprecate a model? Redirect the pattern to something else. Zero code changes.
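To make that concrete, here is a minimal sketch of what a raw request to the proxy looks like. It assumes the proxy is running locally on port 8000 and uses one of the virtual keys from the [groups.default] section above; only the model field decides which provider handles the request.
import requests

# Plain HTTP against the proxy's OpenAI-compatible endpoint. Host, port, and the
# virtual key are assumptions taken from the example config above.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer my-team-api-key-1"},
    json={
        # "claude*" matches the anthropic.* rule, so this model name is forwarded as-is
        "model": "claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": "Hello from behind the proxy!"}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
Swap the model name for a gpt* or gemini* one and the same request lands on a different provider, with no other change on the client side.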
4. Virtual API Keys and Access Control
This is the feature that makes LM-Proxy production-ready rather than just a toy.
LM-Proxy maintains two layers of API keys:
- Virtual (Client) API Keys — what your users/services use to authenticate with the proxy
- Provider (Upstream) API Keys — the real API keys for OpenAI, Anthropic, etc., which stay hidden
# Premium users get everything
[groups.premium]
api_keys = ["premium-key-1", "premium-key-2"]
allowed_connections = "*"
# Free tier gets OpenAI only
[groups.free]
api_keys = ["free-key-1"]
allowed_connections = "openai"
# Internal tools get local models only
[groups.internal]
api_keys = ["internal-key-1"]
allowed_connections = "local"
Your upstream API keys are never exposed to clients. You can rotate them without updating any client configuration. You can create granular access tiers — premium users get Claude Opus, free users get GPT-4o-mini, internal tools use local models. All managed in one config file.
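Here is a client-side sketch of those tiers in action. The proxy address is assumed, the keys come from the group config above, the routing rules from the earlier config are assumed to still apply, and the exact status code and error body of a rejected request are up to LM-Proxy, so they are not shown.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local proxy
payload = {
    "model": "claude-sonnet-4-5-20250929",  # "claude*" routes to the anthropic connection
    "messages": [{"role": "user", "content": "ping"}],
}

# Premium key: allowed_connections = "*", so the Anthropic-bound request goes through.
allowed = requests.post(ENDPOINT, json=payload,
                        headers={"Authorization": "Bearer premium-key-1"}, timeout=60)

# Free key: only the "openai" connection is allowed, so the proxy should refuse this
# request (the exact status code and error format depend on LM-Proxy).
denied = requests.post(ENDPOINT, json=payload,
                       headers={"Authorization": "Bearer free-key-1"}, timeout=60)

print(allowed.status_code, denied.status_code)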
It even supports external authentication — you can validate virtual API keys against Keycloak, Auth0, or any OIDC provider:
[api_key_check]
class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
method = "POST"
url = "http://keycloak:8080/realms/master/protocol/openid-connect/userinfo"
response_as_user_info = true
use_cache = true
cache_ttl = 60
[api_key_check.headers]
Authorization = "Bearer {api_key}"
Your existing OAuth tokens become LLM API keys automatically. And if the built-in validators don't fit, you can write a custom one — just a Python function that takes an API key string and returns a group name.
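For illustration, a custom validator might look like the sketch below. The function shape (API key in, group name out) follows the description above; the key-to-group mapping is made up, and how you point LM-Proxy at the function, for example from a Python-format config, is an assumption rather than documented API.
import hmac

# Hypothetical mapping of virtual keys to group names; in practice this could be
# a database lookup or a call to your own auth service.
KNOWN_KEYS = {
    "premium-key-1": "premium",
    "free-key-1": "free",
}

def check_api_key(api_key: str) -> str | None:
    """Return the group name for a valid key, or None to reject the request."""
    for known_key, group in KNOWN_KEYS.items():
        # Constant-time comparison avoids leaking key prefixes via timing.
        if hmac.compare_digest(api_key, known_key):
            return group
    return None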
5. Full Streaming Support
SSE streaming works out of the box. Your clients get real-time token-by-token responses regardless of which provider is actually generating them. The proxy handles the format translation transparently — Anthropic's streaming format becomes OpenAI-compatible SSE events before they reach your client.
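From the client side, streaming is just the standard OpenAI SDK with stream=True pointed at the proxy; the base URL and virtual key below are assumptions carried over from the example config.
from openai import OpenAI

client = OpenAI(
    api_key="my-team-api-key-1",          # virtual key from the example config
    base_url="http://localhost:8000/v1",  # assumed local proxy address
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",   # served by Anthropic, streamed back as OpenAI-style SSE
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)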
6. Use It as a Library, Not Just a Server
LM-Proxy isn't just a standalone service — it's also an importable Python package. Since it's built on FastAPI, you can mount it inside your existing application, use it in integration tests, or compose it with other ASGI middleware. No need to run a separate process if you don't want to.
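As a rough sketch of the embedded use case: mounting an ASGI sub-app is plain FastAPI, but the import path and factory name for LM-Proxy below are hypothetical, so check the project docs for the real entry point.
from fastapi import FastAPI

# Hypothetical entry point; treat this as a placeholder for however LM-Proxy
# actually exposes its FastAPI app.
try:
    from lm_proxy import create_app  # hypothetical factory name
    proxy_app = create_app("config.toml")
except ImportError:
    proxy_app = FastAPI()  # stand-in so the sketch stays runnable without the real API

app = FastAPI(title="my-saas-backend")

# Standard FastAPI sub-app mounting: the proxy is now served under /llm,
# so clients would call http://your-host/llm/v1/chat/completions.
app.mount("/llm", proxy_app)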
Real-World Setup: 5 Minutes to Production
Let me walk through a concrete scenario. You're building a SaaS product that uses LLMs. You want GPT-4o for complex reasoning, Claude Sonnet for long-document processing, and Gemini Flash for cheap classification.
Step 1: Install
pip install lm-proxy
Step 2: Create .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AI...
Step 3: Create config.toml
host = "0.0.0.0"
port = 8000
[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"
[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"
[connections.google]
api_type = "google_ai_studio"
api_key = "env:GOOGLE_API_KEY"
[routing]
"gpt*" = "openai.*"
"claude*" = "anthropic.*"
"gemini*" = "google.*"
"*" = "google.gemini-2.0-flash"
[groups.backend]
api_keys = ["backend-service-key"]
allowed_connections = "*"
[groups.frontend]
api_keys = ["frontend-widget-key"]
allowed_connections = "google" # cheap models only
Step 4: Run
lm-proxy
Step 5: Use from your app
from openai import OpenAI

client = OpenAI(
    api_key="backend-service-key",
    base_url="http://localhost:8000/v1"
)

# Complex reasoning → GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this contract..."}]
)

# Long document → Claude Sonnet
response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Summarize this 100-page report..."}]
)

# Quick classification → Gemini Flash (cheap)
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Is this email spam? ..."}]
)
One client. One base URL. Three providers. And tomorrow, when someone releases a model that's 50% cheaper, you change one line in config.toml and restart the server. Your application code stays the same.
How It Compares to LiteLLM
The obvious question: "How is this different from LiteLLM?"
Let's be honest — LiteLLM is the 800-pound gorilla in this space. With 33k+ GitHub stars, 2000+ supported LLMs, an AWS Marketplace listing, and features like spend tracking dashboards, guardrails, caching, rate limiting, SSO, and MCP integration, it's a full-blown enterprise platform. It also offers both a Python SDK and an HTTP proxy server, so architecturally it covers similar ground.
So why would you choose LM-Proxy instead?
The same reason you'd choose Flask over Django, or SQLite over PostgreSQL. Not everything needs the full enterprise stack.
| | LM-Proxy | LiteLLM Proxy |
|---|---|---|
| Philosophy | Minimal core, extend as needed | Batteries included |
| Setup complexity | `pip install lm-proxy` + one TOML file | `pip install 'litellm[proxy]'` + YAML + optional DB |
| Dependencies | FastAPI + MicroCore (lightweight) | Heavier dependency tree |
| Config format | TOML / YAML / JSON / Python | YAML |
| Embeddable as library | First-class use case | Supported but discouraged by docs |
| Virtual keys + groups | Built-in | Built-in (more advanced) |
| OIDC / Keycloak auth | Built-in | Built-in (more advanced: JWT validation, role mapping, SSO UI) |
| Spend tracking, guardrails, caching | Not built-in (extensible via add-ons) | Built-in |
| Admin UI | No | Yes |
| Supported providers | All major providers + local models / embedded inference | All major providers |
Choose LM-Proxy when: you want a lightweight, easy-to-embed proxy with a tiny footprint, Python-config extensibility, and you don't need 90% of the enterprise features. It's ideal for small teams, personal projects, or when you want a gateway you can fully understand by reading the source in an afternoon.
Choose LiteLLM when: you need enterprise-grade spend tracking, a management UI, dozens of integrations, guardrails, caching, and support for 100+ providers and integrations out of the box.
Other alternatives like Portkey lean even more enterprise. LM-Proxy intentionally occupies the "just enough gateway" niche — powerful enough for production, simple enough that your config file is the documentation.
What's Already There (and What I'd Love to See Next)
Credit where it's due — LM-Proxy already covers more ground than you might expect from a "lightweight" tool:
- Structured logging — a pluggable logger system with `JsonLogWriter` and `LogEntryTransformer` (tracks tokens, duration, group, connection, remote address), plus the `lm-proxy-db-connector` add-on for writing logs to PostgreSQL, MySQL, SQLite, and other databases via SQLAlchemy
- Load balancing — there's an example configuration that distributes requests randomly across multiple LLM servers using the Python config format
- Request handlers — a middleware-like system for intercepting requests before they reach upstream providers, enabling cross-cutting concerns like auditing and header manipulation
- Vertex AI support — Google Cloud's Vertex AI is supported alongside the simpler AI Studio API, with a dedicated config example
That said, the project is still evolving (latest release: v3.0.0), and a few things remain on my personal wishlist:
- Usage analytics dashboard — the logging and DB infrastructure is solid, but a built-in UI for visualizing spend and usage would be the cherry on top
- Wildcard model expansion — the `expand_wildcards` mode for `/v1/models` is planned but not yet implemented, so for now you need to list models explicitly in the routing config
- Automatic provider failover — if OpenAI returns 5xx, automatically reroute to Anthropic. Load balancing across instances of the same provider is there, but cross-provider failover would complete the picture
The extensibility-by-design philosophy means most of these can be added as add-ons without touching the core — and the DB connector already demonstrates this pattern nicely. The codebase is MIT licensed and small enough to read end-to-end in an afternoon.
TL;DR
If you're working with multiple LLM providers (or think you might in the future), stop writing provider-specific integration code. Set up LM-Proxy as a gateway, point all your services at it, and never think about API format differences again.
pip install lm-proxy
Have you tried LM-Proxy or a similar LLM gateway? What's your approach to multi-provider LLM integration? Share your experience in the comments.