The Problem: API Key Fragmentation Is Real
If you're building AI applications in 2026, you know the pain: 6 different API keys, 6 different billing dashboards, 6 different SDKs. Every time a new model drops, you spend hours integrating it.
I found a solution that changed my workflow: New API — an open-source AI API gateway that routes to 100+ models through a single OpenAI-compatible endpoint.
What Is New API?
New API is an open-source (AGPLv3) gateway that sits between your application and AI model providers. Think of it as a universal translator for AI APIs.
Key Features
- Single Endpoint: One OpenAI-compatible API routes to GPT-4o, Claude, Gemini, DeepSeek, Qwen, Llama — and any custom model
- Zero Markup: The managed version (aipossword.cn) charges $0 on top of model pricing
- Self-Hostable: Docker, 2 minutes. Full control.
- Auto Failover: If a model goes down, requests auto-route to the next best option
- Team Ready: RBAC, per-member keys, usage quotas
Quick Start (30 Seconds)
# Your existing OpenAI code — just change the base URL and model
curl https://api.aipossword.cn/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4","messages":[{"role":"user","content":"Hello"}]}'
Switching Models: One Line of Code
This is where the magic happens. Want to compare GPT-4o vs Claude vs DeepSeek? Just change the model string:
from openai import OpenAI
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.aipossword.cn/v1"
)
# Try GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Now try Claude: same code, different model
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
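To compare several models side by side, the same call can be wrapped in a loop. This is a minimal sketch rather than anything New API ships: `compare_models` is a hypothetical helper name, and the model IDs in the usage comment are the ones from the snippet above, which may not match what your gateway instance exposes.

```python
def compare_models(client, models, prompt):
    """Send the same prompt to each model and collect the replies by model name.

    `client` is any object exposing the OpenAI SDK's
    `chat.completions.create(model=..., messages=...)` method.
    """
    results = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        # The first choice holds the assistant's reply text.
        results[model] = response.choices[0].message.content
    return results


# Usage with the `client` created above:
# replies = compare_models(client, ["gpt-4o", "claude-sonnet-4-20250514"], "Hello")
# for model, text in replies.items():
#     print(f"{model}: {text}")
```

Because the helper only depends on the `chat.completions.create` call shape, it can be exercised with a stub client before spending real tokens.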
Real-World Use Cases
- Cost Optimization: Route simple queries to cheap models (Qwen at $0.10/1M tokens) and complex ones to frontier models
- Multi-Provider Redundancy: Set up fallback chains — if OpenAI is down, auto-switch to Claude
- Team Billing: One invoice, per-member usage tracking, no more expense report nightmares
- Local + Cloud Hybrid: Route to your local Ollama instance for dev, fall back to cloud for production
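The first two use cases can also be approximated on the client side. The sketch below is an illustration under stated assumptions, not how New API implements routing or failover internally: the model IDs are placeholders, the "short prompt means simple query" heuristic is deliberately crude, and `pick_models` and `complete_with_fallback` are hypothetical helper names.

```python
# Placeholder model IDs (assumptions); substitute the names your gateway exposes.
CHEAP_MODEL = "qwen-turbo"
FRONTIER_MODEL = "gpt-4o"


def pick_models(prompt, max_cheap_chars=200):
    """Return an ordered fallback chain with the preferred model first.

    Crude heuristic for illustration only: short prompts are treated as
    "simple" and start on the cheap model; longer ones start on the
    frontier model.
    """
    if len(prompt) <= max_cheap_chars:
        return [CHEAP_MODEL, FRONTIER_MODEL]
    return [FRONTIER_MODEL, CHEAP_MODEL]


def complete_with_fallback(client, prompt, models):
    """Try each model in order; return (model, reply) from the first success."""
    last_error = None
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return model, response.choices[0].message.content
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = exc
    raise RuntimeError(f"all models failed: {models}") from last_error


# Usage with an OpenAI-compatible client pointed at the gateway:
# model, reply = complete_with_fallback(client, question, pick_models(question))
```

In practice you would likely replace the length check with something closer to your workload (a classifier, a token count, or an explicit flag from the caller) and catch narrower exception types.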
Self-Hosted vs Managed
| Feature | Self-Hosted | Managed (aipossword.cn) |
|---|---|---|
| Setup | Docker, 2 min | Instant |
| Models | Bring your keys | Pre-configured |
| Billing | DIY | USD, Stripe |
| Cost | Server costs | Model price + $0 |
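Since both options expose the same OpenAI-compatible endpoint, switching between them should mostly come down to the base URL and key. A minimal sketch of keeping that choice in configuration follows; `NEW_API_BASE_URL` and `NEW_API_KEY` are hypothetical environment variable names invented for this example, and the address of a self-hosted instance depends entirely on how you deploy it.

```python
import os

# Managed endpoint from the examples above; used when nothing else is configured.
MANAGED_BASE_URL = "https://api.aipossword.cn/v1"


def gateway_settings(env=None):
    """Read gateway connection settings from environment variables.

    NEW_API_BASE_URL and NEW_API_KEY are hypothetical names chosen for this
    sketch; New API itself does not define them.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("NEW_API_BASE_URL", MANAGED_BASE_URL),
        "api_key": env.get("NEW_API_KEY", "YOUR_KEY"),
    }


# Usage: the keys match the OpenAI client's keyword arguments.
# from openai import OpenAI
# client = OpenAI(**gateway_settings())
```

Pointing `NEW_API_BASE_URL` at a self-hosted instance then requires no code change in the application.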
Why I Recommend It
I've been using New API in production for a few weeks. The auto-failover has saved me twice when providers went down. The zero-markup pricing means I'm not paying extra for convenience — I pay exactly what the model costs.
The open-source nature (AGPLv3) gives me confidence. I can audit the code, self-host if I want, and never worry about vendor lock-in.
Get Started
- Self-host: docker run calciumion/new-api:latest
- Managed: aipossword.cn ($5 free credits)
- GitHub: github.com/QuantumNous/new-api (37k+ stars)
One endpoint. Every model. Zero friction.
Have you tried API gateways for AI models? What's your setup? Let me know in the comments!
Top comments (1)
This architecture solves a massive integration headache, though it brings a set of hidden observability challenges we’ve covered in earlier threads.
Exposing 100+ heterogeneous LLMs under one unified endpoint drastically simplifies client integration, but it amplifies cost and monitoring blind spots. Without strict per-model routing tagging, baseline tracking breaks entirely — config tweaks to routing weights, fallback logic or priority tiers will silently shift monthly spend, and most dashboards won’t trigger automatic recalibration for those changes.
Another critical pain point: blast-radius alert tiering becomes far harder to enforce at scale. Every team will argue their target model workload deserves an exception to standardized severity rules, gradually eroding your entire alert framework. Also, paired versioning between meta-evaluators and individual models becomes unwieldy when managing this many variants; unaligned evaluator drift will corrupt all your unified quality signals.
Curious how you handle two core pieces:
1. Automated baseline resets whenever routing configs are updated;
2. Mandatory evidence review for teams requesting custom alert exceptions for specific models.