DEV Community

chenxiao5580-cmd
chenxiao5580-cmd

Posted on

Use a flat-priced, auto-routing LLM API in Aider or Cline — one npx command

Coding assistants like Aider, Cline, and Continue all speak the OpenAI wire protocol — point them at a base_url, give them an API key, done. That makes swapping in a different LLM backend trivial... if that backend uses Authorization: Bearer.

The flat-priced, auto-routing API I'd been using doesn't. It's distributed through RapidAPI, which authenticates with an X-RapidAPI-Key header instead of Bearer. So I couldn't just drop it into Aider. The fix turned out to be ~120 lines, so I open-sourced it.

modelis-openai

A zero-dependency local proxy (MIT, Node 18+). It listens on 127.0.0.1, speaks plain OpenAI, rewrites the auth header, and forwards to the upstream gateway. Streaming (stream: true) is piped straight through, so token-by-token output works exactly as with the OpenAI API.

your tool ──OpenAI(Bearer)──▶ modelis-openai (localhost) ──X-RapidAPI-Key──▶ upstream ──▶ best model
Enter fullscreen mode Exit fullscreen mode

Quickstart

npx modelis-openai
Enter fullscreen mode Exit fullscreen mode

Then point any OpenAI-compatible tool at it:

Setting Value
Base URL http://127.0.0.1:8787/v1
API key your RapidAPI key
Model modelis-auto

Drop it into your tool

Aider

export OPENAI_API_BASE=http://127.0.0.1:8787/v1
export OPENAI_API_KEY=<your-rapidapi-key>
aider --model openai/modelis-auto
Enter fullscreen mode Exit fullscreen mode

Cline / Roo Code — API Provider OpenAI Compatible, Base URL http://127.0.0.1:8787/v1, Model ID modelis-auto.

Continue (~/.continue/config.yaml)

models:
  - name: Modelis
    provider: openai
    model: modelis-auto
    apiBase: http://127.0.0.1:8787/v1
    apiKey: <your-rapidapi-key>
Enter fullscreen mode Exit fullscreen mode

Any OpenAI SDK

from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8787/v1", api_key="<your-rapidapi-key>")
print(client.chat.completions.create(
    model="modelis-auto",
    messages=[{"role": "user", "content": "Hello"}],
).choices[0].message.content)
Enter fullscreen mode Exit fullscreen mode

How it works

  1. Reads the key from Authorization: Bearer <key> (or MODELIS_RAPIDAPI_KEY).
  2. Rewrites the request model to modelis-auto (configurable).
  3. Forwards to the RapidAPI gateway with X-RapidAPI-Key / X-RapidAPI-Host.
  4. Relays the response — including SSE streams and rate-limit headers — unchanged.

It also answers GET /v1/models and GET /health so tools that probe on startup don't choke.

Honest notes

  • It routes to a paid API (there's a free tier to start). The point of the proxy is to remove the integration friction, not to give anything away.
  • Cursor isn't supported — it sends requests from its own servers, so a localhost endpoint can't be reached. This is for tools that call the API from your machine.

Links

If you try it in a tool I didn't list, I'd love to hear how it goes.

Top comments (1)

Collapse
 
topstar_ai profile image
Luis

This setup is basically about routing Aider / Cline through a cheap or unified LLM gateway instead of calling OpenAI/Anthropic directly.

A clean, practical version of what that post is likely showing:

You can run Aider or Cline through an OpenAI-compatible proxy (like OpenRouter-style routing, LiteLLM, RelayPlane, or similar) by setting:

export OPENAI_API_BASE="localhost:PORT"
export OPENAI_API_KEY="your-key-or-proxy-key"

Then launch tools normally:

aider --model gpt-4o

or for Cline, just point the API settings to the same base URL in VS Code settings.

What this enables:

One endpoint → multiple models (Claude, GPT, Gemini, DeepSeek, etc.)
Automatic routing (cheap model for simple edits, stronger model for refactors)
Cost control per task instead of per tool
Works with CLI tools like Aider and IDE tools like Cline without changes

If you want a more “production-grade” version of this idea, people usually combine:

OpenRouter or LiteLLM proxy (model aggregation layer)
Aider / Cline (client tools)
optional routing rules (cost/quality-based)

So the real value here isn’t the npx command itself — it’s the abstraction layer that makes all coding agents model-agnostic and cheaper to run.

Good practical post overall—this is the direction most AI dev tooling is heading 🤝