How to Route to 100+ AI Models with a Single API Endpoint

#opensource #api #ai #tutorial

The Problem: API Key Fragmentation Is Real

If you're building AI applications in 2026, you know the pain: 6 different API keys, 6 different billing dashboards, 6 different SDKs. Every time a new model drops, you spend hours integrating it.

I found a solution that changed my workflow: New API — an open-source AI API gateway that routes to 100+ models through a single OpenAI-compatible endpoint.

What Is New API?

New API is an open-source (AGPLv3) gateway that sits between your application and AI model providers. Think of it as a universal translator for AI APIs.

Key Features

Single Endpoint: One OpenAI-compatible API routes to GPT-4o, Claude, Gemini, DeepSeek, Qwen, Llama — and any custom model
Zero Markup: The managed version (aipossword.cn) charges $0 on top of model pricing
Self-Hostable: Docker, 2 minutes. Full control.
Auto Failover: If a model goes down, requests auto-route to the next best option
Team Ready: RBAC, per-member keys, usage quotas

Quick Start (30 Seconds)

# Same request shape as the OpenAI API: only the base URL and model name change
curl https://api.aipossword.cn/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'

Switching Models: One Line of Code

This is where the magic happens. Want to compare GPT-4o vs Claude vs DeepSeek? Just change the model string:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.aipossword.cn/v1"
)

# Try GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Now try Claude: same code, different model
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
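
If you want to compare several models side by side rather than one at a time, a small loop does the job. The helper below is a minimal sketch, not part of New API or the OpenAI SDK: compare_models is a name I made up, and it only assumes the client exposes chat.completions.create like the one configured above.

```python
import time
from typing import Any, Dict, List


def compare_models(client: Any, models: List[str], prompt: str) -> Dict[str, dict]:
    """Send the same prompt to each model and collect the reply and latency.

    `client` is any object exposing `chat.completions.create(...)`, such as
    an OpenAI client pointed at the gateway's base URL.
    """
    results: Dict[str, dict] = {}
    for model in models:
        started = time.perf_counter()
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            results[model] = {
                "ok": True,
                "text": response.choices[0].message.content,
                "seconds": time.perf_counter() - started,
            }
        except Exception as exc:
            # One failing model should not stop the rest of the comparison.
            results[model] = {
                "ok": False,
                "error": str(exc),
                "seconds": time.perf_counter() - started,
            }
    return results
```

Call it as compare_models(client, ["gpt-4o", "claude-sonnet-4-20250514"], "Hello") and inspect the returned dict. Latency here is wall-clock time for the whole request, so treat it as a rough signal rather than a benchmark.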

Real-World Use Cases

Cost Optimization: Route simple queries to cheap models (Qwen at $0.10/1M tokens) and complex ones to frontier models
Multi-Provider Redundancy: Set up fallback chains — if OpenAI is down, auto-switch to Claude
Team Billing: One invoice, per-member usage tracking, no more expense report nightmares
Local + Cloud Hybrid: Route to your local Ollama instance for dev, fall back to cloud for production
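
The first two use cases can also be approximated on the client side. The sketch below is an illustration under assumptions, not how New API implements routing or failover internally: the model names, the length threshold, and the keyword hints are all hypothetical placeholders, and pick_models / complete_with_fallback are names I invented.

```python
from typing import Any, List, Tuple

# Hypothetical values: swap in the model names your gateway actually exposes.
CHEAP_MODEL = "qwen-turbo"
FRONTIER_MODEL = "gpt-4o"
BACKUP_MODEL = "claude-sonnet-4-20250514"
LONG_PROMPT_CHARS = 500
COMPLEX_HINTS = ("prove", "refactor", "analyze", "step by step")


def pick_models(prompt: str) -> List[str]:
    """Return an ordered fallback chain: first choice, then backup."""
    lowered = prompt.lower()
    looks_complex = len(prompt) > LONG_PROMPT_CHARS or any(
        hint in lowered for hint in COMPLEX_HINTS
    )
    if looks_complex:
        return [FRONTIER_MODEL, BACKUP_MODEL]
    return [CHEAP_MODEL, FRONTIER_MODEL]


def complete_with_fallback(client: Any, prompt: str) -> Tuple[str, str]:
    """Try each model in the chain; return (model_used, reply_text)."""
    last_error = None
    for model in pick_models(prompt):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return model, response.choices[0].message.content
        except Exception as exc:
            # Remember the failure and move on to the next model in the chain.
            last_error = exc
    raise RuntimeError("all models in the chain failed") from last_error
```

The keyword heuristic is deliberately crude. In practice you would tune it to your own traffic, and if the gateway's built-in failover already covers your providers you may only need the pick_models half.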

Self-Hosted vs Managed

Feature	Self-Hosted	Managed (aipossword.cn)
Setup	Docker, 2 min	Instant
Models	Bring your keys	Pre-configured
Billing	DIY	USD, Stripe
Cost	Server costs	Model price + $0
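
Because both options speak the same OpenAI-compatible API, switching between them on the client should come down to the base URL. Here is a minimal sketch of picking it from the environment. The variable names NEW_API_BASE_URL and NEW_API_MODE are my own invention, and the self-hosted URL assumes you published the gateway on local port 3000, so adjust it to your deployment.

```python
import os
from typing import Mapping, Optional

MANAGED_BASE_URL = "https://api.aipossword.cn/v1"  # the URL used in the examples in this post
SELF_HOSTED_BASE_URL = "http://localhost:3000/v1"  # assumption: gateway published on local port 3000


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the gateway base URL from environment-style settings.

    An explicit NEW_API_BASE_URL wins; otherwise NEW_API_MODE selects
    between the self-hosted and managed defaults.
    """
    env = os.environ if env is None else env
    explicit = env.get("NEW_API_BASE_URL")
    if explicit:
        return explicit.rstrip("/")
    if env.get("NEW_API_MODE", "managed").lower() == "self-hosted":
        return SELF_HOSTED_BASE_URL
    return MANAGED_BASE_URL
```

Pass the result as base_url when constructing the client, and the rest of your code stays the same whichever option you choose.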

Why I Recommend It

I've been using New API in production for a few weeks. The auto-failover has saved me twice when providers went down. The zero-markup pricing means I'm not paying extra for convenience — I pay exactly what the model costs.

The open-source nature (AGPLv3) gives me confidence. I can audit the code, self-host if I want, and never worry about vendor lock-in.

Get Started

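Self-host: docker run -d -p 3000:3000 calciumion/new-api:latest (publishes the web UI and API on port 3000; add a volume if you want data to persist)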
Managed: aipossword.cn — $5 free credits
GitHub: github.com/QuantumNous/new-api (37k+ stars)

One endpoint. Every model. Zero friction.

Have you tried API gateways for AI models? What's your setup? Let me know in the comments!
