Every desktop AI tool I've used — Cursor, Claude Desktop, Windsurf, Continue — has the same limitation: one API endpoint, one provider. If that provider goes down, your tool stops.
This isn't a theoretical problem. In the past three months, I've seen DeepSeek go down twice, OpenAI have multi-hour outages, and various providers return 5xx errors during peak hours.
The standard advice is "use OpenRouter" or "deploy LiteLLM." But:
- OpenRouter means your API traffic goes through a third party
- LiteLLM requires Docker (200MB+), which is overkill for a desktop tool
- Manual proxy requires DevOps skills most desktop users don't have
So I built a 73-line solution. Here's the journey.
The Problem: Model Names Are Not Portable
My setup: DeepSeek as primary, KIMI as fallback. Simple, right?
# First attempt: just change the base URL
deepseek_url = "https://api.deepseek.com/v1/chat/completions"
kimi_url = "https://api.moonshot.cn/v1/chat/completions"
Nope. DeepSeek's model deepseek-chat doesn't exist on KIMI. KIMI calls its equivalent moonshot-v1-128k. The request returns 404.
This is the model name mapping problem — and it affects every multi-provider setup. GPT-4o on OpenAI becomes openai/gpt-4o on OpenRouter. Claude Sonnet doesn't exist on Groq. Every provider has its own naming scheme, and failover tools that ignore this will silently break.
The Solution: Sequential Failover With Model Mapping
The architecture is dead simple:
Client → POST /v1/chat/completions → local-gateway
├── try DeepSeek → success → forward response
└── DeepSeek fails → map model name → try KIMI → forward
Critical invariant: connect to the upstream before sending HTTP 200 to the client. If the first provider fails, try the next transparently. The client never sees a partial response.
# Pure urllib, zero dependencies
for provider in providers:
try:
upstream = urllib.request.urlopen(
urllib.request.Request(url, data=data, headers=headers)
)
# Connected! Now send 200 to client
self.send_response(200)
self.send_header("Content-Type", "text/event-stream")
self.end_headers()
# Forward SSE chunks
while True:
chunk = upstream.read(4096)
if not chunk: break
self.wfile.write(chunk)
return
except HTTPError:
continue # try next provider
# All failed → 502
That's the core. 73 lines total.
Why This Matters for Desktop AI Users
Desktop AI tools are becoming the standard way developers interact with LLMs. Cursor alone has millions of users. But these tools have a single-provider dependency that creates three risks:
- Availability: Your provider goes down → your tool stops working
- Rate limits: One account, one set of rate limits
- Cost optimization: No way to route cheap vs expensive models
A local gateway solves all three. And because it runs on 127.0.0.1 with zero external dependencies, there's no data leakage, no additional latency, no Docker overhead.
The Result
I packaged this into a pip-installable tool:
pip install local-gateway
export DEEPSEEK_API_KEY=sk-...
export KIMI_API_KEY=sk-...
local-gateway --providers deepseek,kimi
Then point any OpenAI-compatible client at http://127.0.0.1:18790/v1:
import openai
client = openai.OpenAI(
base_url="http://127.0.0.1:18790/v1",
api_key="not-needed",
)
The Node.js version is also available (npm install local-gateway) for Electron apps and VS Code extensions.
What's Next
-
Model mapping contributions: If you know the equivalent model names between providers, submit a PR. The mapping table is in
models.json. - Pro version: Dashboard, usage analytics, per-provider cost tracking
- Correctover integration: Verified failover — not just switching providers, but validating the response is correct
Try It
pip install local-gateway
local-gateway --providers deepseek,kimi
# Open http://127.0.0.1:18790/health
Or contribute model mappings at github.com/correctover/local-gateway.
Built by Correctover — verified failover for LLM APIs.
Top comments (0)