One OpenAI-compatible API that auto-picks the model — and tells you which one answered

#ai #llm #openai #api

I had three API keys in my .env — OpenAI, Anthropic, Google — plus an unwritten
rule in my head: "cheap model for easy stuff, big model for the hard stuff."

In practice I'd forget to switch, burn a frontier model on a one-line prompt, and
then watch a per-token bill swing 5x week to week with no real change in what I
was doing. So I built a small gateway to take that decision off my plate, and
wrote a cookbook of drop-in examples for it. Sharing in case the same thing bugs
you.

The idea

Modelis is an OpenAI-compatible /chat/completions API. You send one model
name — modelis-auto — and it routes each request to the right model (GPT,
Claude, Gemini, …). Two things make it more than "yet another gateway":

Flat per-call pricing. You pay a fixed rate per request, not per token, so the bill doesn't balloon when a model gets chatty. It's predictable.
It tells you who answered. The response's model field is the real model that handled the request. No black box.

It's the OpenAI API with a different base_url, so there's no new SDK to learn —
your existing code keeps working.

How the routing actually works

You pick a quality tier (basic / standard / premium). For each request,
Modelis classifies it (how hard is it? code? reasoning? a one-liner?) and routes
to the cheapest model that clears that tier's quality floor for that kind of
task. A trivial prompt might land on Gemini Flash; a gnarly reasoning task gets
bumped to Claude or GPT — but you pay the same flat per-call rate for the tier
either way. The model choice is the system's problem; your cost stays fixed.
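
To make "cheapest model that clears the tier's quality floor" concrete, here is a toy sketch of that selection rule. Everything in it is an assumption for illustration: the model names, costs, quality scores, floor values, and the fallback for when nothing clears the floor are all made up, and this is not Modelis's actual router.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # hypothetical model label
    cost: float     # hypothetical relative cost per call
    quality: dict   # hypothetical quality score (0..1) per task kind


# Hypothetical quality floors per tier (illustrative numbers only).
TIER_FLOOR = {"basic": 0.5, "standard": 0.7, "premium": 0.85}

CANDIDATES = [
    Candidate("small-model", 1.0, {"one_liner": 0.90, "code": 0.60, "reasoning": 0.50}),
    Candidate("mid-model", 4.0, {"one_liner": 0.95, "code": 0.80, "reasoning": 0.75}),
    Candidate("large-model", 10.0, {"one_liner": 0.97, "code": 0.92, "reasoning": 0.90}),
]


def route(task_kind: str, tier: str) -> str:
    """Return the cheapest candidate whose quality for this task meets the tier floor."""
    floor = TIER_FLOOR[tier]
    eligible = [c for c in CANDIDATES if c.quality[task_kind] >= floor]
    if not eligible:
        # Assumed fallback: nothing clears the floor, so take the best available.
        return max(CANDIDATES, key=lambda c: c.quality[task_kind]).name
    return min(eligible, key=lambda c: c.cost).name


if __name__ == "__main__":
    for kind in ("one_liner", "code", "reasoning"):
        print(kind, "->", route(kind, "standard"))
```

The point of the shape: the tier sets the floor, the classifier sets the task kind, and cost only breaks ties among models that are already good enough.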

30-second integration

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELIS_KEY",           # free key at https://modelishub.com
    base_url="https://modelishub.com/v1",  # <- the only change
)

resp = client.chat.completions.create(
    model="modelis-auto",
    messages=[{"role": "user", "content": "In one sentence, what is an LLM gateway?"}],
)

print(resp.choices[0].message.content)
print("answered by:", resp.model)

Output (note the last line):

An LLM gateway is a middleware service that routes, manages, and secures
requests to one or more large language models.

answered by: openai/gpt-4.1-2025-04-14

I sent modelis-auto; for that easy prompt the router picked gpt-4.1 and said
so. Ask something harder and it moves up; ask something trivial and it drops to a
cheaper model — the per-call price doesn't move.
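
If you want to sanity-check the routing over more than one prompt, a small tally of resp.model is enough. This is a sketch, not part of the cookbook: tally_models is a hypothetical helper name, and it only assumes the client exposes the same chat.completions.create call used above.

```python
from collections import Counter


def tally_models(client, prompts, model="modelis-auto"):
    """Send each prompt and count which underlying model answered it."""
    counts = Counter()
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        # resp.model is the model that actually handled the request.
        counts[resp.model] += 1
    return counts


# Usage with the client from the integration snippet:
# for name, n in tally_models(client, ["2+2?", "Prove sqrt(2) is irrational."]).items():
#     print(n, name)
```

Each prompt is one call, so with flat pricing the cost of running this is just the number of prompts times the per-call rate.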

Use it from the tools you already have

Since only the base_url changes, it drops into most stacks unchanged. Runnable
examples for each are in the cookbook:

curl — one request, see the model field
OpenAI Python / Node SDKs — change base_url, done
LangChain — ChatOpenAI(base_url=...)
Vercel AI SDK — createOpenAI({ baseURL })
Tool / function calling — works on the direct endpoint
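
For a feel of what the curl example boils down to, here is the same single request using only the Python standard library. It's a sketch under two assumptions: that the endpoint is base_url plus /chat/completions, and that the key goes in a Bearer Authorization header like other OpenAI-compatible APIs. The function names are mine, not from the cookbook.

```python
import json
import urllib.request

BASE_URL = "https://modelishub.com/v1"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed auth scheme: Bearer token, as with the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_key: str, prompt: str):
    """Send the request; return (answer text, model that answered)."""
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=30) as r:
        payload = json.load(r)
    return payload["choices"][0]["message"]["content"], payload["model"]


# text, answered_by = ask("YOUR_MODELIS_KEY", "What is an LLM gateway?")
```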

👉 Cookbook: https://github.com/chenxiao5580-cmd/modelis-cookbook

How it's different from per-token gateways

Most multi-model gateways still bill you per token and still expect you to name a
model. That's fine until your spend becomes a function of how verbose a model
decides to be, and until "which model for this call?" becomes a decision you make
hundreds of times. Modelis trades that for a flat per-call price and an automatic
choice — and then tells you what it chose so you can sanity-check it. Different
trade-off, not a silver bullet; whether it fits depends on your workload.
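
The arithmetic behind that trade-off is simple enough to sketch. The prices below are invented placeholders, not Modelis's or any provider's real rates; the only point is that one bill scales with completion length and the other doesn't.

```python
def per_token_bill(calls, price_in_per_1k, price_out_per_1k):
    """calls is a list of (prompt_tokens, completion_tokens) pairs."""
    return sum(
        p / 1000 * price_in_per_1k + c / 1000 * price_out_per_1k
        for p, c in calls
    )


def flat_bill(calls, price_per_call):
    """Flat pricing only counts requests, not tokens."""
    return len(calls) * price_per_call


if __name__ == "__main__":
    # Same ten prompts, answered tersely vs. verbosely (made-up token counts).
    terse = [(100, 50)] * 10
    verbose = [(100, 800)] * 10

    # Hypothetical prices, in arbitrary units.
    for label, calls in (("terse", terse), ("verbose", verbose)):
        print(
            label,
            "per-token:", round(per_token_bill(calls, 1.0, 2.0), 2),
            "flat:", round(flat_bill(calls, 0.5), 2),
        )
```

Whether flat comes out cheaper depends entirely on the real rates and on how long your typical completions are, which is why the fit is workload-specific.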

Honest caveats

Text chat is the first-class path today. Tool calling works on the direct endpoint; vision isn't the focus yet.
The free signup quota is small — it's for kicking the tires, not running production for free.
There's a managed pay-as-you-go option on RapidAPI, but that plan is basic text chat only (no tools/vision) — the auto-routing across premium models lives on the direct endpoint.
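
Since tool calling comes up twice above, here is roughly what it looks like on the direct endpoint, assuming it accepts the standard OpenAI tools format. The get_weather tool and its stub are hypothetical, made up for this sketch; only the request shape follows the OpenAI SDK.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    """Stub implementation; a real one would call a weather service."""
    return f"(stub) weather for {city}"


def run_tool_calls(message):
    """Run any get_weather calls on an assistant message; return tool-role replies."""
    replies = []
    for call in message.tool_calls or []:
        if call.function.name == "get_weather":
            args = json.loads(call.function.arguments)
            replies.append(
                {"role": "tool", "tool_call_id": call.id, "content": get_weather(**args)}
            )
    return replies


# Usage with the client from the integration snippet:
# resp = client.chat.completions.create(
#     model="modelis-auto",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=[WEATHER_TOOL],
# )
# tool_messages = run_tool_calls(resp.choices[0].message)
```

The tool replies then go back in a follow-up request alongside the assistant message, the same way they would against the OpenAI API.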

Why I bother

Per-token billing punishes you for verbose models and makes spend unpredictable;
hand-picking a model per request is busywork you forget to do. A flat per-call
price + automatic routing + a response that's honest about which model ran felt
like the gateway I actually wanted to use.

If you try it, I'd genuinely like feedback on the routing quality and whether flat
pricing matters for your workload — drop a comment.