DEV Community

LYX19951121


How to Route to 100+ AI Models with a Single API Endpoint

The Problem: API Key Fragmentation Is Real

If you're building AI applications in 2026, you know the pain: 6 different API keys, 6 different billing dashboards, 6 different SDKs. Every time a new model drops, you spend hours integrating it.

I found a solution that changed my workflow: New API — an open-source AI API gateway that routes to 100+ models through a single OpenAI-compatible endpoint.

What Is New API?

New API is an open-source (AGPLv3) gateway that sits between your application and AI model providers. Think of it as a universal translator for AI APIs.
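To make the "universal translator" idea concrete, here is a minimal sketch of the core routing step a gateway performs: look up which upstream provider serves the requested model name and send the request there. The `Channel` type, the `ROUTES` table and the `route` function are hypothetical illustrations, not New API's actual configuration format or internals. A real gateway also has to translate each request into the provider's native format, and this sketch leaves that step out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One upstream provider the gateway can forward requests to (hypothetical)."""
    name: str
    base_url: str


# Hypothetical routing table: model name -> upstream channel.
ROUTES = {
    "gpt-4o": Channel("openai", "https://api.openai.com/v1"),
    "claude-sonnet-4": Channel("anthropic", "https://api.anthropic.com/v1"),
    "deepseek-chat": Channel("deepseek", "https://api.deepseek.com/v1"),
}


def route(request_body: dict) -> Channel:
    """Pick the upstream channel for an OpenAI-style chat request body."""
    model = request_body.get("model")
    try:
        return ROUTES[model]
    except KeyError:
        raise ValueError(f"no channel configured for model {model!r}") from None


print(route({"model": "gpt-4o", "messages": []}).name)  # openai
```

Because clients only ever see the OpenAI-style request, adding a new model comes down to adding one entry to the routing table.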

Key Features

  • Single Endpoint: One OpenAI-compatible API routes to GPT-4o, Claude, Gemini, DeepSeek, Qwen, Llama — and any custom model
  • Zero Markup: The managed version (aipossword.cn) charges $0 on top of model pricing
  • Self-Hostable: Deploys with Docker in about 2 minutes, with full control over the instance.
  • Auto Failover: If a model goes down, requests auto-route to the next best option
  • Team Ready: RBAC, per-member keys, usage quotas
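Auto failover runs inside the gateway, but the underlying idea is simple: try candidates in priority order and move on when one fails. The sketch below shows that logic on the client side. `complete_with_fallback`, `flaky_send` and the model order are hypothetical names chosen for illustration. This is not New API's internal implementation.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    send: Callable[[str], str],
    models: Sequence[str],
) -> tuple[str, str]:
    """Try each model in order and return (model_used, reply) from the first success.

    `send` performs one request for the given model name and raises on failure.
    """
    errors: list[str] = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # e.g. timeouts or 5xx errors from a provider
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# With a real client (hypothetical wiring), `send` could be:
# send = lambda m: client.chat.completions.create(
#     model=m, messages=[{"role": "user", "content": "Hello"}]
# ).choices[0].message.content


def flaky_send(model: str) -> str:
    """Stand-in for a network call: pretend the first provider is down."""
    if model == "gpt-4o":
        raise ConnectionError("provider unavailable")
    return f"reply from {model}"


print(complete_with_fallback(flaky_send, ["gpt-4o", "claude-sonnet-4"]))
# ('claude-sonnet-4', 'reply from claude-sonnet-4')
```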

Quick Start (30 Seconds)

```bash
# Same request format as OpenAI's API: only the base URL and model name change
curl https://api.aipossword.cn/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4","messages":[{"role":"user","content":"Hello"}]}'
```
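Because the endpoint is OpenAI-compatible, the curl request is plain JSON over HTTPS, and no SDK is required. Here is a stdlib-only Python sketch of the same call. `build_chat_request` and `send_chat` are hypothetical helper names. The sketch assumes the response follows the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

BASE_URL = "https://api.aipossword.cn/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(req: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


req = build_chat_request("YOUR_KEY", "claude-sonnet-4", "Hello")
# print(send_chat(req))  # needs a real key and network access
```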

Switching Models: One Line of Code

This is where the magic happens. Want to compare GPT-4o vs Claude vs DeepSeek? Just change the model string:

```python
from openai import OpenAI

# Point the official OpenAI SDK at the gateway instead of api.openai.com
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.aipossword.cn/v1",
)

# Try GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

# Now try Claude: same code, different model string
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
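Because only the `model` string changes, comparing several models is just a loop. Below is a minimal sketch. `compare_models` is a hypothetical helper that accepts any OpenAI-SDK-style client, such as the `client` configured in the snippet above. The model names in the usage comment, including `deepseek-chat`, are assumptions and must match what your gateway instance actually exposes.

```python
def compare_models(client, models, prompt):
    """Send the same prompt to each model and collect the replies by model name.

    `client` is any OpenAI-SDK-style client, e.g. OpenAI(base_url=...).
    """
    results = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = response.choices[0].message.content
    return results


# Usage with the gateway client:
# replies = compare_models(
#     client, ["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"], "Hello"
# )
# for model, text in replies.items():
#     print(f"--- {model} ---\n{text}")
```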

Real-World Use Cases

  1. Cost Optimization: Route simple queries to cheap models (Qwen at $0.10/1M tokens) and complex ones to frontier models
  2. Multi-Provider Redundancy: Set up fallback chains — if OpenAI is down, auto-switch to Claude
  3. Team Billing: One invoice, per-member usage tracking, no more expense report nightmares
  4. Local + Cloud Hybrid: Route to your local Ollama instance for dev, fall back to cloud for production
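The cost-optimization pattern in item 1 can start as a small routing function in your own application that picks the model string for each request. The sketch below is rough. The length threshold, the keyword heuristic and both model names (`qwen-turbo`, `gpt-4o`) are hypothetical placeholders to tune for your workload. It runs on your side, not inside the gateway.

```python
CHEAP_MODEL = "qwen-turbo"   # hypothetical: a low-cost model exposed by your gateway
FRONTIER_MODEL = "gpt-4o"    # hypothetical: a frontier model for harder requests

# Hypothetical heuristic: words that suggest a request needs a stronger model.
HARD_HINTS = ("prove", "refactor", "step by step", "analyze")


def pick_model(prompt: str, max_cheap_chars: int = 500) -> str:
    """Route short, simple prompts to the cheap model and everything else to the frontier one."""
    text = prompt.lower()
    if len(prompt) > max_cheap_chars or any(hint in text for hint in HARD_HINTS):
        return FRONTIER_MODEL
    return CHEAP_MODEL


def estimate_cost_usd(tokens: int, price_per_million: float) -> float:
    """Model cost for a token count at a given per-1M-token price."""
    return tokens * price_per_million / 1_000_000


print(pick_model("What's the capital of France?"))       # qwen-turbo
print(pick_model("Refactor this module step by step"))   # gpt-4o
print(estimate_cost_usd(2_000_000, 0.10))                 # 0.2
```

At the $0.10/1M-token Qwen price quoted in item 1, 2M tokens would come to about $0.20.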

Self-Hosted vs Managed

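| Feature | Self-Hosted | Managed (aipossword.cn) |
|---|---|---|
| Setup | Docker, about 2 min | Instant |
| Models | Bring your own provider keys | Pre-configured |
| Billing | DIY | USD, via Stripe |
| Cost | Server costs (plus your own provider fees) | Model price + $0 |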
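Both options speak the same OpenAI-compatible API, so switching between them should only require a different base URL and key. The minimal sketch below reads both from environment variables. The variable names `AI_GATEWAY_MODE`, `AI_GATEWAY_BASE_URL` and `AI_GATEWAY_API_KEY` are hypothetical. `http://localhost:3000/v1` assumes a self-hosted instance published on port 3000, so adjust it to match your own deployment.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

MANAGED_BASE_URL = "https://api.aipossword.cn/v1"
SELF_HOSTED_BASE_URL = "http://localhost:3000/v1"  # assumption: your container's published port


@dataclass(frozen=True)
class GatewayConfig:
    base_url: str
    api_key: str


def load_gateway_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Choose managed or self-hosted gateway settings from (hypothetical) env vars."""
    mode = env.get("AI_GATEWAY_MODE", "managed")  # "managed" or "self-hosted"
    default_url = SELF_HOSTED_BASE_URL if mode == "self-hosted" else MANAGED_BASE_URL
    return GatewayConfig(
        base_url=env.get("AI_GATEWAY_BASE_URL", default_url),  # explicit URL wins
        api_key=env.get("AI_GATEWAY_API_KEY", ""),
    )


# The result plugs straight into the OpenAI SDK:
# cfg = load_gateway_config()
# client = OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)
```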

Why I Recommend It

I've been using New API in production for a few weeks. The auto-failover has saved me twice when providers went down. The zero-markup pricing means I'm not paying extra for convenience — I pay exactly what the model costs.

The open-source nature (AGPLv3) gives me confidence. I can audit the code, self-host if I want, and never worry about vendor lock-in.

Get Started

One endpoint. Every model. Zero friction.


Have you tried API gateways for AI models? What's your setup? Let me know in the comments!
