I had three API keys in my .env — OpenAI, Anthropic, Google — plus an unwritten
rule in my head: "cheap model for easy stuff, big model for the hard stuff."
In practice I'd forget to switch, burn a frontier model on a one-line prompt, and
then watch a per-token bill swing 5x week to week with no real change in what I
was doing. So I built a small gateway to take that decision off my plate, and
wrote a cookbook of drop-in examples for it. Sharing in case the same thing bugs
you.
The idea
Modelis is an OpenAI-compatible /chat/completions API. You send one model
name — modelis-auto — and it routes each request to the right model (GPT,
Claude, Gemini, …). Two things make it more than "yet another gateway":
- Flat per-call pricing. You pay a fixed rate per request, not per token, so the bill doesn't balloon when a model gets chatty. It's predictable.
- It tells you who answered. The response's model field is the real model that handled the request. No black box.
It's the OpenAI API with a different base_url, so there's no new SDK to learn —
your existing code keeps working.
How the routing actually works
You pick a quality tier (basic / standard / premium). For each request,
Modelis classifies it (how hard is it? code? reasoning? a one-liner?) and routes
to the cheapest model that clears that tier's quality floor for that kind of
task. A trivial prompt might land on Gemini Flash; a gnarly reasoning task gets
bumped to Claude or GPT — but you pay the same flat per-call rate for the tier
either way. The model choice is the system's problem; your cost stays fixed.
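To make the "cheapest model that clears the tier's quality floor" idea concrete, here is a minimal sketch of that selection rule. It is not Modelis's actual router: the model names, costs, quality scores, tier floors, and the keyword-based classifier are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str                   # hypothetical model label
    cost: float                 # hypothetical relative cost per call
    quality: dict[str, float]   # hypothetical score per task kind, 0..1


# Every number below is made up for the example.
CATALOG = [
    Model("small-fast", 1.0, {"one_liner": 0.90, "code": 0.60, "reasoning": 0.50}),
    Model("mid-general", 4.0, {"one_liner": 0.95, "code": 0.80, "reasoning": 0.75}),
    Model("large-frontier", 10.0, {"one_liner": 0.97, "code": 0.92, "reasoning": 0.93}),
]

# Hypothetical quality floor for each tier.
TIER_FLOOR = {"basic": 0.55, "standard": 0.75, "premium": 0.90}


def classify(prompt: str) -> str:
    """Toy classifier based on keywords and length; a real one would be far richer."""
    text = prompt.lower()
    if "def " in text or "traceback" in text:
        return "code"
    if len(text.split()) > 40 or "prove" in text or "step by step" in text:
        return "reasoning"
    return "one_liner"


def route(prompt: str, tier: str) -> Model:
    """Return the cheapest model whose score for this task kind meets the tier floor."""
    kind = classify(prompt)
    floor = TIER_FLOOR[tier]
    eligible = [m for m in CATALOG if m.quality[kind] >= floor]
    if not eligible:
        # Nothing clears the floor: fall back to the best available model.
        return max(CATALOG, key=lambda m: m.quality[kind])
    return min(eligible, key=lambda m: m.cost)


if __name__ == "__main__":
    for prompt in ("What is an LLM gateway?", "Prove that 2 + 2 = 4, step by step."):
        print(prompt, "->", route(prompt, "standard").name)
```

The point of the sketch is the shape of the decision: the tier fixes the floor, the classifier picks which quality column to read, and cost only breaks ties among models that already qualify.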
30-second integration
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELIS_KEY",            # free key at https://modelishub.com
    base_url="https://modelishub.com/v1",  # <- the only change
)

resp = client.chat.completions.create(
    model="modelis-auto",
    messages=[{"role": "user", "content": "In one sentence, what is an LLM gateway?"}],
)
print(resp.choices[0].message.content)
print("answered by:", resp.model)
Output — note the second line:
An LLM gateway is a middleware service that routes, manages, and secures
requests to one or more large language models.
answered by: openai/gpt-4.1-2025-04-14
I sent modelis-auto; for that easy prompt the router picked gpt-4.1 and said
so. Ask something harder and it moves up; ask something trivial and it drops to a
cheaper model — the per-call price doesn't move.
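If you want to see how the router behaves over many calls rather than one, a small helper that tallies the model field is enough. This is a sketch, not part of the cookbook: tally_models is a hypothetical name, and the stand-in objects below only mimic the one attribute (model) that the helper reads. The second model name is invented.

```python
from collections import Counter
from types import SimpleNamespace


def tally_models(responses):
    """Count how many responses each backing model produced.

    Expects objects with a `model` attribute, such as the values returned
    by client.chat.completions.create(...).
    """
    return Counter(r.model for r in responses)


if __name__ == "__main__":
    # Stand-ins for real responses, so the example runs without a key.
    fake_responses = [
        SimpleNamespace(model="openai/gpt-4.1-2025-04-14"),
        SimpleNamespace(model="example/cheap-model"),  # hypothetical name
        SimpleNamespace(model="openai/gpt-4.1-2025-04-14"),
    ]
    for name, count in tally_models(fake_responses).most_common():
        print(f"{count:>3}  {name}")
```

In real use you would append each resp to a list as you make calls and pass that list in.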
Use it from the tools you already have
Because it's just the OpenAI API with a different base_url, it drops into most
stacks unchanged. Runnable examples for each are in the cookbook:
- curl: one request, see the model field
- OpenAI Python / Node SDKs: change base_url, done
- LangChain: ChatOpenAI(base_url=...)
- Vercel AI SDK: createOpenAI({ baseURL })
- Tool / function calling: works on the direct endpoint
👉 Cookbook: https://github.com/chenxiao5580-cmd/modelis-cookbook
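For stacks without any SDK, the same call can be made with the Python standard library alone. This is a rough equivalent of the curl case, not code from the cookbook. It assumes the endpoint accepts the usual OpenAI-style Bearer header, which is what the OpenAI SDK sends. build_request and ask are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://modelishub.com/v1"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_key: str, prompt: str):
    """Send the request; return (answer text, model that actually answered)."""
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"], data["model"]
```

Calling ask("YOUR_MODELIS_KEY", "...") needs a real key and network access. build_request can be inspected offline to check the URL, headers, and payload.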
How it's different from per-token gateways
Most multi-model gateways still bill you per token and still expect you to name a
model. That's fine until your spend becomes a function of how verbose a model
decides to be, and until "which model for this call?" becomes a decision you make
hundreds of times. Modelis trades that for a flat per-call price and an automatic
choice — and then tells you what it chose so you can sanity-check it. Different
trade-off, not a silver bullet; whether it fits depends on your workload.
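A bit of arithmetic shows the trade-off. Both prices below are invented for illustration and are not Modelis's or any provider's real rates. The example compares two weeks with the same number of calls, where the model simply writes longer answers in the second week.

```python
# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # USD, per-token gateway
FLAT_PRICE_PER_CALL = 0.004        # USD, flat per-call gateway


def per_token_cost(output_tokens_per_call):
    """Bill when you pay for every output token."""
    return sum(output_tokens_per_call) / 1000 * PRICE_PER_1K_OUTPUT_TOKENS


def flat_cost(n_calls):
    """Bill when every call costs the same, whatever its length."""
    return n_calls * FLAT_PRICE_PER_CALL


terse_week = [150] * 1000   # 1000 calls, short answers
chatty_week = [750] * 1000  # same 1000 calls, 5x longer answers

if __name__ == "__main__":
    for label, week in (("terse", terse_week), ("chatty", chatty_week)):
        print(
            f"{label:>6}: per-token ${per_token_cost(week):.2f}"
            f"  flat ${flat_cost(len(week)):.2f}"
        )
```

With these made-up numbers the per-token bill goes from $1.50 to $7.50 while the flat bill stays at $4.00. Note the flip side: in the terse week the flat price is the more expensive one, which is why the fit depends on your workload.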
Honest caveats
- Text chat is the first-class path today. Tool calling works on the direct endpoint; vision isn't the focus yet.
- The free signup quota is small — it's for kicking the tires, not running production for free.
- There's a managed pay-as-you-go option on RapidAPI, but that plan is basic text chat only (no tools/vision) — the auto-routing across premium models lives on the direct endpoint.
Why I bother
Per-token billing punishes you for verbose models and makes spend unpredictable;
hand-picking a model per request is busywork you forget to do. A flat per-call
price + automatic routing + a response that's honest about which model ran felt
like the gateway I actually wanted to use.
If you try it, I'd genuinely like feedback on the routing quality and whether flat
pricing matters for your workload — drop a comment.