DEV Community

chenxiao5580-cmd

One OpenAI-compatible API that auto-picks the model — and tells you which one answered

I had three API keys in my .env — OpenAI, Anthropic, Google — plus an unwritten
rule in my head: "cheap model for easy stuff, big model for the hard stuff."

In practice I'd forget to switch, burn a frontier model on a one-line prompt, and
then watch a per-token bill swing 5x week to week with no real change in what I
was doing. So I built a small gateway to take that decision off my plate, and
wrote a cookbook of drop-in examples for it. Sharing in case the same thing bugs
you.

The idea

Modelis is an OpenAI-compatible /chat/completions API. You send one model
name — modelis-auto — and it routes each request to the right model (GPT,
Claude, Gemini, …). Two things make it more than "yet another gateway":

  1. Flat per-call pricing. You pay a fixed rate per request, not per token, so the bill doesn't balloon when a model gets chatty. It's predictable.
  2. It tells you who answered. The response's model field is the real model that handled the request. No black box.

It's the OpenAI API with a different base_url, so there's no new SDK to learn —
your existing code keeps working.
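Because it is only the OpenAI wire format behind a different base URL, you don't strictly need any SDK at all. Here is a minimal sketch using Python's standard library. It assumes the usual OpenAI request/response JSON shape and Bearer-token auth. The OpenAI SDK uses both, so they should carry over to any endpoint the SDK works with. `build_request` and `send` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Base URL taken from the integration example in this post.
BASE_URL = "https://modelishub.com/v1"


def build_request(api_key: str, prompt: str, model: str = "modelis-auto") -> urllib.request.Request:
    """Build a standard OpenAI-style POST to /chat/completions (no SDK involved)."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer auth is what the OpenAI SDK sends; assumed to work here too.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> tuple[str, str]:
    """Perform the call and return (answer text, model that actually answered)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"], payload["model"]


# Usage (makes a real network call; MODELIS_API_KEY is a hypothetical env var name):
#   import os
#   text, model = send(build_request(os.environ["MODELIS_API_KEY"], "Hi"))
#   print(model, "->", text)
```

The second value returned by `send` is the response's `model` field, which is where the routed model shows up.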

How the routing actually works

You pick a quality tier (basic / standard / premium). For each request,
Modelis classifies it (how hard is it? code? reasoning? a one-liner?) and routes
to the cheapest model that clears that tier's quality floor for that kind of
task. A trivial prompt might land on Gemini Flash; a gnarly reasoning task gets
bumped to Claude or GPT — but you pay the same flat per-call rate for the tier
either way. The model choice is the system's problem; your cost stays fixed.
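Here is a toy sketch of that selection rule: pick the cheapest candidate whose quality for the task kind clears the tier's floor, and charge a price that depends only on the tier. This is not Modelis's actual code. The candidate names, quality scores, floors, prices, and the keyword classifier are all hypothetical stand-ins chosen to make the rule concrete.

```python
from dataclasses import dataclass

# Everything below is hypothetical: tier floors, prices, candidate names and
# quality scores are illustrative stand-ins, not Modelis's real configuration.
TIER_FLOOR = {"basic": 0.60, "standard": 0.75, "premium": 0.88}
TIER_PRICE_PER_CALL = {"basic": 0.001, "standard": 0.004, "premium": 0.010}


@dataclass
class Candidate:
    name: str
    internal_cost: float        # what the gateway pays per call, arbitrary units
    quality: dict[str, float]   # task kind -> quality score in [0, 1]


CATALOG = [
    Candidate("small-fast", 1.0, {"chat": 0.78, "code": 0.62, "reasoning": 0.55}),
    Candidate("mid-general", 4.0, {"chat": 0.86, "code": 0.80, "reasoning": 0.77}),
    Candidate("frontier", 10.0, {"chat": 0.92, "code": 0.93, "reasoning": 0.94}),
]


def classify(prompt: str) -> str:
    """Toy keyword classifier standing in for a real request classifier."""
    text = prompt.lower()
    if "def " in text or "```" in text or "stack trace" in text:
        return "code"
    if any(k in text for k in ("prove", "step by step", "derive")):
        return "reasoning"
    return "chat"


def route(prompt: str, tier: str) -> tuple[str, float]:
    """Return (chosen model, price). The price depends only on the tier."""
    kind = classify(prompt)
    floor = TIER_FLOOR[tier]
    eligible = [c for c in CATALOG if c.quality.get(kind, 0.0) >= floor]
    if eligible:
        # Cheapest model that is good enough for this kind of task.
        chosen = min(eligible, key=lambda c: c.internal_cost)
    else:
        # Nothing clears the floor: fall back to the strongest model for this kind.
        chosen = max(CATALOG, key=lambda c: c.quality.get(kind, 0.0))
    return chosen.name, TIER_PRICE_PER_CALL[tier]
```

With these made-up numbers, a one-line chat prompt on `standard` goes to `small-fast`. A "prove step by step" prompt goes to `mid-general` on `standard` and to `frontier` on `premium`. In both cases the caller pays the tier's flat price, not the model's cost.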

30-second integration

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELIS_KEY",           # free key at https://modelishub.com
    base_url="https://modelishub.com/v1",  # <- the only change
)

resp = client.chat.completions.create(
    model="modelis-auto",
    messages=[{"role": "user", "content": "In one sentence, what is an LLM gateway?"}],
)

print(resp.choices[0].message.content)
print("answered by:", resp.model)

Output — note the second line:

An LLM gateway is a middleware service that routes, manages, and secures
requests to one or more large language models.

answered by: openai/gpt-4.1-2025-04-14

I sent modelis-auto; for that easy prompt the router picked gpt-4.1 and said
so. Ask something harder and it moves up; ask something trivial and it drops to a
cheaper model — the per-call price doesn't move.
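The `model` field is the only routing signal you get back, so it is worth recording if you want to sanity-check the router over many calls. Below is a minimal sketch. It assumes an OpenAI SDK client configured as in the integration example. `ask` and `routing_shares` are hypothetical helpers.

```python
from collections import Counter
from typing import Any


def ask(client: Any, prompt: str, seen: Counter) -> str:
    """Send one prompt through modelis-auto and record which model answered."""
    resp = client.chat.completions.create(
        model="modelis-auto",
        messages=[{"role": "user", "content": prompt}],
    )
    seen[resp.model] += 1  # resp.model names the model that actually ran
    return resp.choices[0].message.content


def routing_shares(seen: Counter) -> dict[str, float]:
    """Fraction of requests each model handled, most frequent first."""
    total = sum(seen.values())
    return {model: n / total for model, n in seen.most_common()} if total else {}
```

After running a batch of real prompts through `ask(client, prompt, seen)`, `routing_shares(seen)` shows how often each model was picked. If trivial prompts keep landing on frontier models, that shows up here.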

Use it from the tools you already have

Because it's just the OpenAI API with a different base_url, it drops into most
stacks unchanged. Runnable examples for each are in the cookbook:

  • curl: one request, see the model field
  • OpenAI Python / Node SDKs: change base_url, done
  • LangChain: ChatOpenAI(base_url=...)
  • Vercel AI SDK: createOpenAI({ baseURL })
  • Tool / function calling: works on the direct endpoint
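Tool calling here means the standard OpenAI `tools` / `tool_calls` round trip. The sketch below shows that loop using the OpenAI Python SDK's response shape. It assumes the direct endpoint accepts `tools` together with `modelis-auto`, as the post says. `count_words` is a hypothetical local tool, and `run_tool_calls` and `chat_with_tools` are hypothetical helpers.

```python
import json
from typing import Any, Callable


def count_words(text: str) -> int:
    """Hypothetical local tool the model may ask us to run."""
    return len(text.split())


TOOLS = [{
    "type": "function",
    "function": {
        "name": "count_words",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

REGISTRY: dict[str, Callable[..., Any]] = {"count_words": count_words}


def run_tool_calls(tool_calls: Any) -> list[dict]:
    """Execute each requested tool and build the role='tool' reply messages."""
    replies = []
    for call in tool_calls:
        fn = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(fn(**args)),
        })
    return replies


def chat_with_tools(client: Any, prompt: str, model: str = "modelis-auto", max_rounds: int = 5) -> str:
    """Loop until the model answers without requesting any more tools."""
    messages: list = [{"role": "user", "content": prompt}]
    for _ in range(max_rounds):
        resp = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content
        messages.append(msg)  # keep the assistant turn that requested the tools
        messages.extend(run_tool_calls(msg.tool_calls))
    raise RuntimeError("model kept requesting tools; giving up")
```

`max_rounds` is a safety cap so a model that keeps requesting tools cannot loop forever.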

👉 Cookbook: https://github.com/chenxiao5580-cmd/modelis-cookbook

How it's different from per-token gateways

Most multi-model gateways still bill you per token and still expect you to name a
model. That's fine until your spend becomes a function of how verbose a model
decides to be, and until "which model for this call?" becomes a decision you make
hundreds of times. Modelis trades that for a flat per-call price and an automatic
choice — and then tells you what it chose so you can sanity-check it. Different
trade-off, not a silver bullet; whether it fits depends on your workload.
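A quick back-of-the-envelope comparison makes the trade-off concrete. All three rates below are made-up illustrative numbers, not Modelis's or any provider's real prices. The only thing that changes between the two scenarios is how long the answers are.

```python
# Illustrative prices only: not Modelis's or any provider's real rates.
IN_RATE_PER_M = 2.00    # $ per 1M input tokens (hypothetical)
OUT_RATE_PER_M = 8.00   # $ per 1M output tokens (hypothetical)
FLAT_PER_CALL = 0.005   # $ per request (hypothetical)


def per_token_bill(calls: int, in_tokens: int, out_tokens: int) -> float:
    """Per-token billing: spend scales with how much the model writes."""
    per_call = (in_tokens * IN_RATE_PER_M + out_tokens * OUT_RATE_PER_M) / 1_000_000
    return calls * per_call


def flat_bill(calls: int) -> float:
    """Flat per-call billing: spend depends only on the number of requests."""
    return calls * FLAT_PER_CALL


if __name__ == "__main__":
    # Same 1,000 requests with 200-token prompts; only the answer length changes.
    for label, out_tokens in (("terse, 150 tokens out", 150), ("chatty, 1,500 tokens out", 1_500)):
        print(f"{label}: per-token ${per_token_bill(1_000, 200, out_tokens):.2f}, "
              f"flat ${flat_bill(1_000):.2f}")
```

With these numbers, per-token spend goes from $1.60 to $12.40 (7.75x) as answers get chattier, while the flat bill stays at $5.00. Note that the terse workload is actually cheaper per token. That is exactly why the fit depends on your workload.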

Honest caveats

  • Text chat is the first-class path today. Tool calling works on the direct endpoint; vision isn't the focus yet.
  • The free signup quota is small — it's for kicking the tires, not running production for free.
  • There's a managed pay-as-you-go option on RapidAPI, but that plan is basic text chat only (no tools/vision) — the auto-routing across premium models lives on the direct endpoint.

Why I bother

Per-token billing punishes you for verbose models and makes spend unpredictable;
hand-picking a model per request is busywork you forget to do. A flat per-call
price + automatic routing + a response that's honest about which model ran felt
like the gateway I actually wanted to use.

If you try it, I'd genuinely like feedback on the routing quality and whether flat
pricing matters for your workload — drop a comment.
