<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Scott Ellis</title>
    <description>The latest articles on DEV Community by Scott Ellis (@scott_ellis_a8a3a764b5893).</description>
    <link>https://dev.to/scott_ellis_a8a3a764b5893</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860726%2F55c8268d-7966-4f7c-b851-09064cf3a475.jpg</url>
      <title>DEV Community: Scott Ellis</title>
      <link>https://dev.to/scott_ellis_a8a3a764b5893</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/scott_ellis_a8a3a764b5893"/>
    <language>en</language>
    <item>
      <title>The `prefer` parameter: routing AI requests by intent instead of model name</title>
      <dc:creator>Scott Ellis</dc:creator>
      <pubDate>Sat, 04 Apr 2026 09:09:42 +0000</pubDate>
      <link>https://dev.to/scott_ellis_a8a3a764b5893/the-prefer-parameter-routing-ai-requests-by-intent-instead-of-model-name-h3n</link>
      <guid>https://dev.to/scott_ellis_a8a3a764b5893/the-prefer-parameter-routing-ai-requests-by-intent-instead-of-model-name-h3n</guid>
      <description>&lt;p&gt;Here's a pattern I see everywhere in LLM code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem isn't the code. The problem is that &lt;code&gt;"gpt-4o"&lt;/code&gt; is a hardcoded decision that will silently become wrong. Models update, get deprecated, get cheaper alternatives. Three months from now, whatever you've hardcoded might be the expensive choice when a better-for-your-use-case option exists.&lt;/p&gt;

&lt;p&gt;More importantly: what you actually wanted to say was &lt;strong&gt;"I want a high-quality model for this request"&lt;/strong&gt;. The specific model name is an implementation detail that got promoted to an interface.&lt;/p&gt;




&lt;h3&gt;
  
  
  The intent gap
&lt;/h3&gt;

&lt;p&gt;Consider these scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A summarization script that runs on a cron job. What you care about: cheap. You do not care which model.&lt;/li&gt;
&lt;li&gt;A code review tool where correctness matters. What you care about: quality reasoning. The model name is incidental.&lt;/li&gt;
&lt;li&gt;A real-time autocomplete feature. What you care about: latency. Cheapest fast model, updated automatically as the landscape changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In each case, you know your intent. You don't know (and shouldn't have to know) which specific model best serves that intent &lt;em&gt;right now&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Enter the &lt;code&gt;prefer&lt;/code&gt; parameter
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://lxg2it.com" rel="noopener noreferrer"&gt;Model Router&lt;/a&gt; is an OpenAI-compatible API that adds one non-standard parameter: &lt;code&gt;prefer&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prefer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cheap"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;prefer&lt;/code&gt; can be: &lt;code&gt;cheap&lt;/code&gt;, &lt;code&gt;fast&lt;/code&gt;, &lt;code&gt;balanced&lt;/code&gt;, &lt;code&gt;quality&lt;/code&gt;, or &lt;code&gt;coding&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;prefer: cheap&lt;/code&gt; → routes to Qwen or DeepSeek via AWS Bedrock. Surprisingly capable, very cheap.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefer: fast&lt;/code&gt; → routes to whichever model currently has the lowest observed latency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefer: quality&lt;/code&gt; → routes to Claude Sonnet or GPT-4o&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefer: coding&lt;/code&gt; → routes by SWE-bench-weighted composite score; highest coding benchmark wins&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefer: balanced&lt;/code&gt; → balanced cost/quality within the standard tier (default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The routing decision is made at request time, not at code-write time. When a better cheap model comes out, &lt;code&gt;prefer: cheap&lt;/code&gt; routes to it automatically. Your code doesn't change.&lt;/p&gt;




&lt;h3&gt;
  
  
  How it works under the hood
&lt;/h3&gt;

&lt;p&gt;There are two independent axes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier&lt;/strong&gt; — constrains the eligible model pool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;economy&lt;/code&gt;: open-weight models (Qwen, DeepSeek, Mistral, GLM via Bedrock). Some are free.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;standard&lt;/code&gt;: mid-tier commercial models (Claude Haiku, Gemini Flash, Llama)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;premium&lt;/code&gt;: top-tier models (Claude Sonnet, GPT-4o, Gemini Pro)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prefer&lt;/strong&gt; — selects within the pool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes the tier's eligible models and picks based on declared optimization target&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can use them independently or together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"economy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"prefer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"quality"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;best&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;quality&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;within&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;economy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tier&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"prefer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fast"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;fastest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;across&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tiers&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;default:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;standard&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tier,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;balanced&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;routing&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also pin a specific model the normal way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;routes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;directly,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tier/prefer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ignored&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  The Bedrock angle
&lt;/h3&gt;

&lt;p&gt;One thing that surprised me building this: AWS Bedrock has a genuinely interesting set of open-weight models that are quite cheap and reasonably capable for the right workloads. Qwen, DeepSeek, MiniMax, Kimi — models that most developers aren't routing to directly because the Bedrock setup is a bit of friction.&lt;/p&gt;

&lt;p&gt;Model Router handles that friction. The &lt;code&gt;tier: economy&lt;/code&gt; pool is essentially "give me the good stuff from Bedrock" as a single API call. Some of those models — Llama 3, some Qwen variants — have no usage cost at all.&lt;/p&gt;




&lt;h3&gt;
  
  
  Compatibility
&lt;/h3&gt;

&lt;p&gt;The API is OpenAI-compatible. Drop-in replacement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.lxg2it.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-model-router-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prefer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cheap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;prefer&lt;/code&gt; parameter is passed via &lt;code&gt;extra_body&lt;/code&gt; in the Python SDK (or as a top-level parameter in raw JSON).&lt;/p&gt;




&lt;h3&gt;
  
  
  Where it's at
&lt;/h3&gt;

&lt;p&gt;This is early but running. The routing works, the billing works, and real external users are building with it — including someone processing 80k-token documents through it as part of a pipeline. What it doesn't have yet is scale.&lt;/p&gt;

&lt;p&gt;If you have use cases where you're currently hardcoding model names and wishing you could express intent instead — I'd genuinely love to hear if this helps. $1 signup credit, no commitment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://api.lxg2it.com" rel="noopener noreferrer"&gt;api.lxg2it.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
