DEV Community

FastAnchor_io

Posted on

How to Route to 100+ AI Models with a Single API Endpoint

The Problem: API Key Fragmentation Is Real

If you're building AI applications in 2026, you know the pain: 6 different API keys, 6 different billing dashboards, 6 different SDKs. Every time a new model drops, you spend hours integrating it.

I found a solution that changed my workflow: New API — an open-source AI API gateway that routes to 100+ models through a single OpenAI-compatible endpoint.

What Is New API?

New API is an open-source (AGPLv3) gateway that sits between your application and AI model providers. Think of it as a universal translator for AI APIs.

Key Features

  • Single Endpoint: One OpenAI-compatible API routes to GPT-4o, Claude, Gemini, DeepSeek, Qwen, Llama — and any custom model
  • Zero Markup: The managed version (aipossword.cn) charges $0 on top of model pricing
  • Self-Hostable: Docker, 2 minutes. Full control.
  • Auto Failover: If a model goes down, requests auto-route to the next best option
  • Team Ready: RBAC, per-member keys, usage quotas
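Auto failover happens inside the gateway, but the pattern behind it is simple. Here is a minimal client-side sketch of a fallback chain. It assumes you supply a `complete(model, prompt)` callable that raises on failure, for example a thin wrapper around `client.chat.completions.create`. `complete_with_fallback` and `AllModelsFailed` are hypothetical names. This is an illustration of the idea, not New API's own implementation.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the fallback chain errors out."""


def complete_with_fallback(
    models: Sequence[str],
    prompt: str,
    complete: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model_used, reply) from the first success.

    `complete` is any function that sends `prompt` to `model` and returns the
    reply text, raising an exception on failure (timeout, 5xx, and so on).
    """
    errors: list[str] = []
    for model in models:
        try:
            return model, complete(model, prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


if __name__ == "__main__":
    # Offline stand-in for a real API call: pretend the first provider is down.
    def fake_complete(model: str, prompt: str) -> str:
        if model == "gpt-4o":
            raise TimeoutError("provider unavailable")
        return f"[{model}] echo: {prompt}"

    used, reply = complete_with_fallback(
        ["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"],
        "Hello",
        fake_complete,
    )
    print(used, reply)
```

The order of the `models` list is the priority order. Put your preferred model first and cheaper or more available ones after it.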

Quick Start (30 Seconds)

# Any OpenAI-style request works: point it at the gateway's base URL and pick a model
curl https://api.aipossword.cn/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'
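If you would rather not pull in an SDK, you can build the same request with Python's standard library. This sketch assumes the gateway accepts the standard OpenAI chat-completions JSON body used in the curl call above. `build_chat_request`, `send`, and the `NEWAPI_KEY` environment variable are hypothetical names chosen for illustration.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.aipossword.cn/v1"


def build_chat_request(api_key: str, model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    api_key = os.environ.get("NEWAPI_KEY")  # hypothetical variable name
    if api_key:
        print(send(build_chat_request(api_key, "claude-sonnet-4-20250514", "Hello")))
    else:
        print("Set NEWAPI_KEY to send a real request.")
```

Building and sending the request are kept separate, so you can inspect or log the exact payload before it goes over the wire.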

Switching Models: One Line of Code

This is where the magic happens. Want to compare GPT-4o vs Claude vs DeepSeek? Just change the model string:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.aipossword.cn/v1"
)

# Try GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Now try Claude: same code, different model
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
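To compare several models on one prompt, you can fold the two calls above into a loop. This is a sketch. `compare_models` is a hypothetical helper that accepts any object shaped like the `openai` client, meaning `client.chat.completions.create(...)` returns an object with `.choices[0].message.content`. The model IDs are examples, and which ones work depends on what your gateway exposes.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def compare_models(client: Any, models: Iterable[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to each model and collect the replies by model ID.

    `client` can be an `openai.OpenAI` instance pointed at the gateway; only the
    `model` argument changes between calls.
    """
    replies: dict[str, str] = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = response.choices[0].message.content
    return replies


if __name__ == "__main__":
    # Offline stand-in with the same shape as the OpenAI client, so the sketch
    # runs without network access or an API key.
    def fake_create(model, messages):
        text = f"{model} replies to: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    fake_client = SimpleNamespace(
        chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
    )
    results = compare_models(
        fake_client, ["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"], "Hello"
    )
    for model, reply in results.items():
        print(f"{model}: {reply}")
```

With the real SDK, pass the `client` built from `OpenAI(api_key=..., base_url=...)` in place of `fake_client`.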

Real-World Use Cases

  1. Cost Optimization: Route simple queries to cheap models (Qwen at $0.10/1M tokens) and complex ones to frontier models
  2. Multi-Provider Redundancy: Set up fallback chains — if OpenAI is down, auto-switch to Claude
  3. Team Billing: One invoice, per-member usage tracking, no more expense report nightmares
  4. Local + Cloud Hybrid: Route to your local Ollama instance for dev, fall back to cloud for production
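Use cases 1 and 4 come down to two client-side routing decisions: which model to call, and which base URL to call it on. The sketch below is illustrative only. The length threshold, keyword list, model IDs, and the `APP_ENV` variable are all hypothetical choices, not New API settings. The one concrete detail is that Ollama serves an OpenAI-compatible API at `http://localhost:11434/v1` by default, so the same client code can target it.

```python
import os
from typing import Optional

GATEWAY_URL = "https://api.aipossword.cn/v1"
OLLAMA_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

# Hypothetical model choices: swap in whatever IDs your gateway and Ollama expose.
CHEAP_MODEL = "qwen-turbo"
FRONTIER_MODEL = "claude-sonnet-4-20250514"
LOCAL_MODEL = "llama3.1"  # must already be pulled into the local Ollama server

# Hypothetical heuristic: long prompts or "hard task" keywords go to the frontier model.
HARD_KEYWORDS = ("refactor", "analyze", "debug", "step by step")
MAX_SIMPLE_CHARS = 500


def pick_route(prompt: str, env: Optional[str] = None) -> tuple[str, str]:
    """Return (base_url, model) for a prompt.

    Development traffic goes to a local Ollama server. Production traffic goes
    through the gateway, split between a cheap and a frontier model.
    """
    env = env if env is not None else os.environ.get("APP_ENV", "production")
    if env == "development":
        return OLLAMA_URL, LOCAL_MODEL
    text = prompt.lower()
    if len(prompt) > MAX_SIMPLE_CHARS or any(k in text for k in HARD_KEYWORDS):
        return GATEWAY_URL, FRONTIER_MODEL
    return GATEWAY_URL, CHEAP_MODEL


if __name__ == "__main__":
    for prompt in ("What is the capital of France?", "Refactor this module step by step"):
        print(pick_route(prompt, env="production"), "<-", prompt)
    print(pick_route("Hello", env="development"))
```

The returned pair maps directly onto `OpenAI(base_url=...)` and the `model=` argument. Ollama does not check the API key, but the OpenAI SDK requires one to be set, so a placeholder string is enough in development.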

Self-Hosted vs Managed

| Feature | Self-Hosted | Managed (aipossword.cn) |
|---|---|---|
| Setup | Docker, ~2 min | Instant |
| Models | Bring your own provider keys | Pre-configured |
| Billing | DIY | USD, via Stripe |
| Cost | Server costs + provider prices | Model price + $0 markup |

Why I Recommend It

I've been using New API in production for a few weeks. The auto-failover has saved me twice when providers went down. The zero-markup pricing means I'm not paying extra for convenience — I pay exactly what the model costs.

The open-source nature (AGPLv3) gives me confidence. I can audit the code, self-host if I want, and never worry about vendor lock-in.

Get Started

One endpoint. Every model. Zero friction.


Have you tried API gateways for AI models? What's your setup? Let me know in the comments!

Top comments (1)

FastAnchor_io

This architecture solves a massive integration headache, though it brings a set of hidden observability challenges we’ve covered in earlier threads.
Exposing 100+ heterogeneous LLMs under one unified endpoint drastically simplifies client integration, but it amplifies cost and monitoring blind spots. Without strict per-model routing tagging, baseline tracking breaks entirely — config tweaks to routing weights, fallback logic or priority tiers will silently shift monthly spend, and most dashboards won’t trigger automatic recalibration for those changes.
Another critical pain point: blast-radius alert tiering becomes far harder to enforce at scale. Every team will argue their target model workload deserves an exception to standardized severity rules, gradually eroding your entire alert framework. Also, paired versioning between meta-evaluators and individual models becomes unwieldy when managing this many variants; unaligned evaluator drift will corrupt all your unified quality signals.
Curious how you handle two core pieces:
1. Automated baseline resets whenever routing configs are updated;
2. Mandatory evidence review for teams requesting custom alert exceptions for specific models.