DEV Community

chenxiao5580-cmd

A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations

If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these bounded-output jobs, a flat price per call fits better than per-token pricing on a frontier model. Here is how I wire a flat-price endpoint into n8n and Make, and when not to use it.

Why flat-per-call fits automation

Automation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through Modelis, an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a flat price per call, with output capped at ~1024 tokens. Because the price is per call rather than per token, every run costs the same, and your monthly total is just the number of calls times that price, whatever the input size.
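To make the budgeting argument concrete, here is a small sketch comparing the two pricing models on a normal batch and a long-input batch. Every price below is a made-up placeholder, not the actual rate of Modelis or any other provider, and the function names are illustrative.

```python
# Hypothetical prices for illustration only; substitute your real rates.
FLAT_PRICE_PER_CALL = 0.002            # USD per call (assumed)
PER_TOKEN_INPUT_PRICE = 3.00 / 1e6     # USD per input token (assumed)
PER_TOKEN_OUTPUT_PRICE = 15.00 / 1e6   # USD per output token (assumed)


def flat_cost(calls: int) -> float:
    """Flat pricing: cost depends only on the number of calls."""
    return calls * FLAT_PRICE_PER_CALL


def per_token_cost(input_tokens: list[int], output_tokens: int) -> float:
    """Per-token pricing: cost grows with the size of every input and output."""
    return sum(
        n * PER_TOKEN_INPUT_PRICE + output_tokens * PER_TOKEN_OUTPUT_PRICE
        for n in input_tokens
    )


normal_batch = [500] * 1000      # 1,000 calls with short inputs
long_batch = [20_000] * 1000     # 1,000 calls with long inputs

for name, batch in [("normal", normal_batch), ("long", long_batch)]:
    print(
        name,
        "flat:", round(flat_cost(len(batch)), 2),
        "per-token:", round(per_token_cost(batch, output_tokens=200), 2),
    )
```

Under these assumed numbers the flat cost is identical for both batches because only the call count matters, while the per-token cost of the long batch is many times that of the normal one.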

Wiring it into n8n / Make

The endpoint is a standard OpenAI-compatible POST /v1/chat/completions. Call it from an HTTP Request node:

  • Method: POST
  • URL: https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions
  • Headers: x-rapidapi-host: modelis-auto-chat.p.rapidapi.com, x-rapidapi-key: YOUR_KEY, content-type: application/json
  • Body:
{"model":"modelis-auto","messages":[{"role":"user","content":"Label sentiment (positive/negative/neutral): {{ $json.text }}"}]}
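One thing to watch with a hand-templated body: if the incoming text contains double quotes or line breaks, pasting it straight into a JSON string can produce invalid JSON. A minimal sketch of the safer pattern, serializing the whole payload with a JSON encoder, is below. The `build_body` helper is a hypothetical name, and the same idea would need to be expressed in whatever expression or code step your workflow tool offers.

```python
import json


def build_body(text: str) -> str:
    """Serialize the chat request so quotes and newlines in `text` are escaped."""
    payload = {
        "model": "modelis-auto",
        "messages": [
            {
                "role": "user",
                "content": f"Label sentiment (positive/negative/neutral): {text}",
            }
        ],
    }
    return json.dumps(payload)


body = build_body('She said "great product",\nthen asked for a refund.')
# Round-tripping shows the body is valid JSON despite the quotes and newline.
assert json.loads(body)["messages"][0]["content"].endswith("refund.")
```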

The curl equivalent:

curl --request POST \
  --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \
  --header 'content-type: application/json' \
  --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_KEY' \
  --data '{"model":"modelis-auto","messages":[{"role":"user","content":"Summarize in 2 sentences: ..."}]}'
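For anyone calling the endpoint from a script instead of a workflow node, here is a rough Python equivalent of the curl command using only the standard library. The URL, headers, and model name come from the post; the helper names are made up, and `extract_text` assumes the gateway returns the usual OpenAI chat-completions response shape (`choices[0].message.content`), which I have not verified against the service.

```python
import json
import urllib.request

URL = "https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions"
HOST = "modelis-auto-chat.p.rapidapi.com"


def make_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST as the curl command, without sending it."""
    body = json.dumps(
        {"model": "modelis-auto", "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "x-rapidapi-host": HOST,
            "x-rapidapi-key": api_key,
        },
    )


def extract_text(response: dict) -> str:
    """Read the reply, assuming an OpenAI-style chat-completions response."""
    return response["choices"][0]["message"]["content"]


def summarize(text: str, api_key: str) -> str:
    """Send the request and return the model's reply (performs a network call)."""
    req = make_request(f"Summarize in 2 sentences: {text}", api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_text(json.load(resp))
```

Calling `summarize("some long text", "YOUR_KEY")` would perform the actual request; the builder and the parser are split out so each can be checked without network access.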

If you would rather use a built-in OpenAI node that expects an Authorization: Bearer key and a custom base URL, run the tiny open-source adapter next to your workflow runner:

npx modelis-openai      # local proxy on 127.0.0.1:8787, MIT, ~120 lines

Then point the node at http://127.0.0.1:8787/v1 with model modelis-auto.
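Before wiring the node, it can help to confirm that something is actually listening on the adapter's port. The sketch below is a plain TCP reachability check with the standard library; it only tells you the port accepts connections, not that the adapter behind it is healthy. The function name is hypothetical, and the host and port are the ones quoted above.

```python
import socket


def adapter_is_listening(
    host: str = "127.0.0.1", port: int = 8787, timeout: float = 1.0
) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("adapter reachable:", adapter_is_listening())
```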

Prompts that fit (short outputs)

  • Summarize: Summarize in 2 sentences: ...
  • Classify: Label sentiment (positive/negative/neutral): ...
  • Extract: Return JSON with {name, email, company} from: ...

All three produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.
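As a sketch of how these three prompts might be templated and how an extraction reply could be parsed defensively, consider the following. The `PROMPTS` table and both helper names are illustrative, and the parser assumes the model may surround the JSON object with extra text, so it slices from the first `{` to the last `}` before decoding. The braces in the extract template are doubled only because `str.format` treats single braces as placeholders.

```python
import json

PROMPTS = {
    "summarize": "Summarize in 2 sentences: {text}",
    "classify": "Label sentiment (positive/negative/neutral): {text}",
    "extract": "Return JSON with {{name, email, company}} from: {text}",
}

EXTRACT_KEYS = ("name", "email", "company")


def build_prompt(task: str, text: str) -> str:
    """Fill the template for one of the three short-output tasks."""
    return PROMPTS[task].format(text=text)


def parse_extraction(reply: str) -> dict:
    """Pull the JSON object out of the reply and keep only the expected keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start : end + 1])
    # Missing fields become None so downstream nodes see a stable shape.
    return {key: data.get(key) for key in EXTRACT_KEYS}
```

Returning a fixed set of keys means a later step in the automation can rely on the same fields being present even when the model omits one.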

When NOT to use it

Long-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated. Keep a high-output model for those. Use this for the short, structured outputs that automations actually need.
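If you want the automation to notice truncation instead of silently passing on a cut-off answer, one option is to inspect `finish_reason` in the response. In the OpenAI chat-completions format, a value of `"length"` means the output stopped at the token limit; whether this gateway forwards that field is an assumption here, and the helper names are hypothetical.

```python
def is_truncated(response: dict) -> bool:
    """True if the first choice stopped because it hit the output token limit."""
    return response["choices"][0].get("finish_reason") == "length"


def text_or_raise(response: dict) -> str:
    """Return the reply text, or raise so the workflow can reroute the task."""
    if is_truncated(response):
        raise ValueError(
            "output hit the token cap; send this task to a high-output model"
        )
    return response["choices"][0]["message"]["content"]
```

In a workflow, the raised error would correspond to a branch that sends the item to the high-output model instead.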

Try it

I built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. If you point an automation at it, I would love to hear how it routed.
