The Problem: API Key Fragmentation Is Real
If you're building AI applications in 2026, you know the pain: 6 different API keys, 6 different billing dashboards, 6 different SDKs. Every time a new model drops, you spend hours integrating it.
I found a solution that changed my workflow: New API — an open-source AI API gateway that routes to 100+ models through a single OpenAI-compatible endpoint.
What Is New API?
New API is an open-source (AGPLv3) gateway that sits between your application and AI model providers. Think of it as a universal translator for AI APIs.
Key Features
- Single Endpoint: One OpenAI-compatible API routes to GPT-4o, Claude, Gemini, DeepSeek, Qwen, Llama — and any custom model
- Zero Markup: The managed version (aipossword.cn) charges $0 on top of model pricing
- Self-Hostable: Docker, 2 minutes. Full control.
- Auto Failover: If a model goes down, requests auto-route to the next best option
- Team Ready: RBAC, per-member keys, usage quotas
Quick Start (30 Seconds)
# Your existing OpenAI code: just change the base URL and model
curl https://api.aipossword.cn/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4","messages":[{"role":"user","content":"Hello"}]}'
Switching Models: One Line of Code
This is where the magic happens. Want to compare GPT-4o vs Claude vs DeepSeek? Just change the model string:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.aipossword.cn/v1",
)

# Try GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Now try Claude: same code, different model
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
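If you want the comparison side by side rather than one call at a time, a small loop does it. This is a minimal sketch, not something that ships with New API: compare_models is a hypothetical helper name, and it assumes the client object behaves like the OpenAI client created above.

```python
from typing import Any, Dict, Iterable


def compare_models(client: Any, models: Iterable[str], prompt: str) -> Dict[str, str]:
    """Send the same prompt to each model and collect the reply text by model name.

    `client` is assumed to expose chat.completions.create(...) like the
    OpenAI Python client; this helper is illustrative only.
    """
    replies: Dict[str, str] = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        # Keep only the text of the first choice for a quick comparison.
        replies[model] = response.choices[0].message.content
    return replies
```

With the client from the snippet above, compare_models(client, ["gpt-4o", "claude-sonnet-4-20250514"], "Hello") should return one reply per model, provided both model names are enabled on your gateway.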
Real-World Use Cases
- Cost Optimization: Route simple queries to cheap models (Qwen at $0.10/1M tokens) and complex ones to frontier models
- Multi-Provider Redundancy: Set up fallback chains — if OpenAI is down, auto-switch to Claude
- Team Billing: One invoice, per-member usage tracking, no more expense report nightmares
- Local + Cloud Hybrid: Route to your local Ollama instance for dev, fall back to cloud for production
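To make the first two use cases concrete, here is a rough client-side sketch of cost-based routing plus a fallback chain. Everything in it is an assumption for illustration: the model name "qwen-turbo", the 200-character threshold, and the helper names pick_model and complete_with_fallback are mine, and this is not how New API implements its own failover on the gateway side.

```python
from typing import Callable, Optional, Sequence

CHEAP_MODEL = "qwen-turbo"   # hypothetical name; use whatever cheap model your gateway exposes
FRONTIER_MODEL = "gpt-4o"


def pick_model(prompt: str, max_cheap_chars: int = 200) -> str:
    """Naive heuristic: short prompts go to the cheap model, longer ones to the frontier model."""
    return CHEAP_MODEL if len(prompt) <= max_cheap_chars else FRONTIER_MODEL


def complete_with_fallback(
    send: Callable[[str, str], str],
    models: Sequence[str],
    prompt: str,
) -> str:
    """Try each model in order and return the first successful reply.

    `send(model, prompt)` is a caller-supplied function that performs the
    request and returns the reply text, raising an exception on failure.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return send(model, prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = exc
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

In practice, send would wrap client.chat.completions.create(...) and return response.choices[0].message.content. Prompt length is a crude stand-in for complexity, so treat the threshold as a starting point rather than a recommendation.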
Self-Hosted vs Managed
| Feature | Self-Hosted | Managed (aipossword.cn) |
|---|---|---|
| Setup | Docker, 2 min | Instant |
| Models | Bring your keys | Pre-configured |
| Billing | DIY | USD, Stripe |
| Cost | Server costs | Model price + $0 |
Why I Recommend It
I've been using New API in production for a few weeks. The auto-failover has saved me twice when providers went down. The zero-markup pricing means I'm not paying extra for convenience — I pay exactly what the model costs.
The open-source nature (AGPLv3) gives me confidence. I can audit the code, self-host if I want, and never worry about vendor lock-in.
Get Started
- Self-host: docker run calciumion/new-api:latest
- Managed: aipossword.cn ($5 free credits)
- GitHub: github.com/QuantumNous/new-api (37k+ stars)
One endpoint. Every model. Zero friction.
Have you tried API gateways for AI models? What's your setup? Let me know in the comments!