A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations

#ai #api #llm #automation

If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these bounded-output jobs, a flat price per call fits better than per-token pricing on a frontier model. Here is how I wire it into n8n / Make, and when not to use it.

Why flat-per-call fits automation

Automation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through Modelis, an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a flat price per call, with output capped at ~1024 tokens. Because the price is per call rather than per token, every run costs the same, so the monthly total is just the number of calls times the per-call price, whatever the input size.
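To make the budgeting point concrete, here is a small sketch comparing the two billing models. Every price in it is a made-up placeholder, not a real Modelis or provider rate, and the 100-token output size is also an assumption; swap in your own numbers.

```python
# All prices are hypothetical placeholders; plug in your own rates.
FLAT_PRICE_PER_CALL = 0.002         # assumed flat price per call, USD
INPUT_PRICE_PER_1K_TOKENS = 0.003   # assumed per-token input rate, USD
OUTPUT_PRICE_PER_1K_TOKENS = 0.015  # assumed per-token output rate, USD


def flat_cost(num_calls: int) -> float:
    """Flat billing: cost depends only on how many calls were made."""
    return num_calls * FLAT_PRICE_PER_CALL


def per_token_cost(input_tokens_per_call: list[int], output_tokens: int = 100) -> float:
    """Per-token billing: cost grows with the size of every input."""
    total = 0.0
    for input_tokens in input_tokens_per_call:
        total += input_tokens / 1000 * INPUT_PRICE_PER_1K_TOKENS
        total += output_tokens / 1000 * OUTPUT_PRICE_PER_1K_TOKENS
    return total


short_batch = [500] * 1000     # 1000 calls with short inputs
long_batch = [20_000] * 1000   # 1000 calls with long inputs

for name, batch in (("short", short_batch), ("long", long_batch)):
    print(f"{name}: flat={flat_cost(len(batch)):.2f} per-token={per_token_cost(batch):.2f}")
```

The flat figure is identical for both batches, while the per-token figure moves with input length. That difference is the whole argument.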

Wiring it into n8n / Make

It is a standard OpenAI-compatible POST /v1/chat/completions. Use an HTTP Request node:

Method: POST
URL: https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions
Headers: x-rapidapi-host: modelis-auto-chat.p.rapidapi.com, x-rapidapi-key: YOUR_KEY, content-type: application/json
Body:

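{"model":"modelis-auto","messages":[{"role":"user","content":"Label sentiment (positive/negative/neutral): {{ $json.text }}"}]}

Note that {{ $json.text }} is n8n expression syntax, and the value is pasted into the JSON string as-is, so a double quote or newline in the input produces an invalid body. In n8n, one way around that is to build the whole value inside the expression, for example "content": {{ JSON.stringify("Label sentiment (positive/negative/neutral): " + $json.text) }}. In Make, map the equivalent field and escape it with Make's own JSON tooling rather than pasting raw text.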

The curl equivalent:

curl --request POST \
  --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \
  --header 'content-type: application/json' \
  --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_KEY' \
  --data '{"model":"modelis-auto","messages":[{"role":"user","content":"Summarize in 2 sentences: ..."}]}'
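If you call the endpoint from a code step instead of an HTTP node, the same request can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl call above; the function names are mine, and the response parsing assumes the gateway returns the usual OpenAI chat-completions shape (choices[0].message.content), which I am taking from its "OpenAI-compatible" description rather than from a documented guarantee.

```python
import json
import urllib.request

API_URL = "https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions"
API_HOST = "modelis-auto-chat.p.rapidapi.com"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; json.dumps escapes quotes and newlines in the prompt."""
    body = json.dumps({
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "content-type": "application/json",
        "x-rapidapi-host": API_HOST,
        "x-rapidapi-key": api_key,
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def complete(prompt: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the reply text (assumes OpenAI response shape)."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (needs a real key and network access):
# print(complete("Summarize in 2 sentences: ...", "YOUR_KEY"))
```

Building the body with json.dumps rather than string concatenation means input text containing quotes or line breaks cannot corrupt the payload.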

If you would rather use a built-in OpenAI node that expects an Authorization: Bearer key and a custom base URL, run the tiny open-source adapter next to your workflow runner:

npx modelis-openai      # local proxy on 127.0.0.1:8787, MIT, ~120 lines

Then point the node at http://127.0.0.1:8787/v1 with model modelis-auto.

Prompts that fit (short outputs)

Summarize: Summarize in 2 sentences: ...
Classify: Label sentiment (positive/negative/neutral): ...
Extract: Return JSON with {name, email, company} from: ...
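In a workflow you usually want to check the reply before passing it downstream, since a model can wrap a label in extra words or surround JSON with chatter. Here is a minimal sketch of the three prompts as templates plus defensive parsers. The helper names and the validation rules are my own illustration, not part of Modelis.

```python
import json

# The three prompt shapes from the list above; {text} is filled per item.
PROMPTS = {
    "summarize": "Summarize in 2 sentences: {text}",
    "classify": "Label sentiment (positive/negative/neutral): {text}",
    "extract": "Return JSON with {{name, email, company}} from: {text}",
}

ALLOWED_LABELS = {"positive", "negative", "neutral"}
EXTRACT_FIELDS = ("name", "email", "company")


def build_prompt(task: str, text: str) -> str:
    """Fill the template for the given task."""
    return PROMPTS[task].format(text=text)


def parse_label(reply: str):
    """Return a known label, or None if the reply is anything else."""
    normalized = reply.strip().lower().rstrip(".")
    return normalized if normalized in ALLOWED_LABELS else None


def parse_extraction(reply: str):
    """Pull the outermost {...} span from the reply and keep only the expected fields."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Missing fields become None so downstream nodes see a stable shape.
    return {field: data.get(field) for field in EXTRACT_FIELDS}
```

Returning None on anything unexpected lets the workflow branch to a retry or a manual-review path instead of passing a malformed value along.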

All produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.

When NOT to use it

Long-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated. Keep a high-output model for those. Use this for the short, structured outputs that automations actually need.
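One way to keep truncated replies from slipping through silently is to check for them in code. The sketch below assumes the gateway reports truncation the way the OpenAI chat-completions format does, with finish_reason set to "length" on the affected choice; I have not confirmed that Modelis does this, so treat it as an assumption to verify. The routing helper and its name are hypothetical.

```python
OUTPUT_CAP_TOKENS = 1024  # approximate cap stated above


def was_truncated(response: dict) -> bool:
    """True if any choice stopped because it hit the output limit.

    Assumes OpenAI-style responses, where finish_reason == "length"
    marks a reply that was cut off.
    """
    choices = response.get("choices") or []
    return any(choice.get("finish_reason") == "length" for choice in choices)


def needs_high_output_model(expected_output_tokens: int, cap: int = OUTPUT_CAP_TOKENS) -> bool:
    """Route a job elsewhere up front when its expected output will not fit under the cap."""
    return expected_output_tokens >= cap


# Example responses in the assumed shape.
complete_reply = {"choices": [{"finish_reason": "stop", "message": {"content": "neutral"}}]}
cut_off_reply = {"choices": [{"finish_reason": "length", "message": {"content": "Once upon"}}]}
print(was_truncated(complete_reply), was_truncated(cut_off_reply))
```

Checking before the call (by expected size) and after it (by finish reason) covers both the jobs you know are long and the ones that surprise you.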

Try it

Free tier: https://rapidapi.com/chenxiao5580/api/modelis-auto-chat
Adapter source (read it before you run it): https://github.com/modelishub/modelis-openai

I built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. If you point an automation at it, I would love to hear how it routed.