<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon Sharp</title>
    <description>The latest articles on DEV Community by Simon Sharp (@simonamsharp).</description>
    <link>https://dev.to/simonamsharp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871947%2Fbb380092-e611-4f5c-8a0a-f789904a4ed8.jpeg</url>
      <title>DEV Community: Simon Sharp</title>
      <link>https://dev.to/simonamsharp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simonamsharp"/>
    <language>en</language>
    <item>
      <title>Build a Model Router in 20 Lines with WhichModel</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:49:32 +0000</pubDate>
      <link>https://dev.to/simonamsharp/build-a-model-router-in-20-lines-with-whichmodel-49j</link>
      <guid>https://dev.to/simonamsharp/build-a-model-router-in-20-lines-with-whichmodel-49j</guid>
      <description>&lt;h1&gt;
  
  
  Build a Model Router in 20 Lines with WhichModel
&lt;/h1&gt;

&lt;p&gt;You have an AI agent that calls LLMs. It always uses the same model. You want it to pick the right model for each task — optimising for cost, capability, and quality — without maintaining a pricing database yourself.&lt;/p&gt;

&lt;p&gt;Here is how to build a model router in 20 lines using WhichModel and the MCP TypeScript SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StreamableHTTPClientTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/streamableHttp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;router&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamableHTTPClientTransport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://whichmodel.dev/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;recommend_model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;budget_per_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Use it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code_generation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommended&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// e.g. "anthropic/claude-sonnet-4"&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budget_option&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// e.g. "google/gemini-2.5-flash"&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;estimated_cost&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// e.g. "$0.0034"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Your agent now picks the optimal model for every call based on live pricing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Get Back
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;recommend_model&lt;/code&gt; tool returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommended"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0034"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Best quality-to-cost ratio for high-complexity code generation"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alternative"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0028"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget_option"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0004"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three options: best pick, alternative, and budget. Your agent decides which to use based on the task.&lt;/p&gt;
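One way an agent might make that call, as a minimal sketch: field names mirror the JSON response above, and the cost ceiling is an illustrative parameter, not part of WhichModel's API.

```typescript
// Choose between the recommended model and the budget option based on a
// per-call cost ceiling. Field names mirror the JSON response above.
type ModelOption = { model: string; estimated_cost: string };

function parseCost(s: string): number {
  return Number(s.replace("$", ""));
}

function choose(recommended: ModelOption, budget: ModelOption, ceiling: number): string {
  // Fall back to the budget option when the top pick would bust the ceiling.
  if (parseCost(recommended.estimated_cost) > ceiling) {
    return budget.model;
  }
  return recommended.model;
}
```

With the response above and a $0.002 ceiling, `choose` returns `google/gemini-2.5-flash`; raise the ceiling to $0.01 and it returns `anthropic/claude-sonnet-4`.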

&lt;h2&gt;
  
  
  Adding Budget Caps
&lt;/h2&gt;

&lt;p&gt;Want to enforce spending limits? Add a budget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Never spend more than $0.002 per call&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cheap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarisation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.002&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel finds the best model within your budget. If nothing fits, it tells you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing at Scale
&lt;/h2&gt;

&lt;p&gt;Before committing to a model for a high-volume pipeline, compare costs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;comparison&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;compare_models&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai/gpt-4.1-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;volume&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;calls_per_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avg_input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avg_output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you daily and monthly cost projections for each model — no spreadsheet required.&lt;/p&gt;
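The projection itself is simple arithmetic you can sanity-check locally; the per-million-token rates in the example below are placeholders, not live prices.

```typescript
// Project daily and monthly cost for one model at a given volume.
// Prices are per million tokens; the sample rates used below are placeholders.
function projectCost(
  callsPerDay: number,
  avgInputTokens: number,
  avgOutputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
) {
  const dailyCost =
    (callsPerDay * (avgInputTokens * inputPricePerM + avgOutputTokens * outputPricePerM)) /
    1_000_000;
  return { daily: dailyCost, monthly: dailyCost * 30 };
}
```

At 10,000 calls/day with 1,000 input and 500 output tokens, a model priced at $3/M input and $15/M output works out to $105/day, or $3,150 over a 30-day month.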

&lt;h2&gt;
  
  
  Why Not Just Hardcode?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prices change multiple times per week across providers&lt;/li&gt;
&lt;li&gt;New models launch constantly — last month alone saw 5 new models that are cheaper than existing options&lt;/li&gt;
&lt;li&gt;Different tasks need different models — a $15/M-token model is overkill for classification&lt;/li&gt;
&lt;li&gt;At 10K calls/day, model choice is a $6,000+/month decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WhichModel tracks all of this and updates every 4 hours. Your router stays current without code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT — free to use, no API key required&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;20 lines. Zero maintenance. Always current pricing.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>ai</category>
      <category>mcp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI Model Pricing Is a Mess — Here Is How We Track It</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:47:50 +0000</pubDate>
      <link>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-288a</link>
      <guid>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-288a</guid>
      <description>&lt;h1&gt;
  
  
  AI Model Pricing Is a Mess — Here Is How We Track It
&lt;/h1&gt;

&lt;p&gt;There are over 100 LLMs available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.&lt;/p&gt;

&lt;p&gt;If you are building with LLMs, you have probably experienced this: you pick a model, hardcode it, ship it, and three months later discover you are paying 10x what a newer model would cost for the same quality.&lt;/p&gt;

&lt;p&gt;We built WhichModel to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scale of the Problem
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10+ providers&lt;/strong&gt; with different pricing pages, formats, and update cadences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ models&lt;/strong&gt; with different input/output/cached token rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability matrices&lt;/strong&gt; that change with each model update (vision, tool calling, JSON mode, context windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality tiers&lt;/strong&gt; that do not map cleanly to price — a $0.60/M-token model can outperform a $15/M-token model on specific tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams handle this by not handling it. They pick a model, maybe two, and revisit the decision quarterly if ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Track It
&lt;/h2&gt;

&lt;p&gt;WhichModel scrapes, normalises, and cross-verifies pricing data from every major LLM provider every 4 hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Source Verification
&lt;/h3&gt;

&lt;p&gt;We do not trust a single source. Pricing data is cross-checked across provider APIs, documentation pages, and third-party aggregators. If sources disagree, we flag it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Capability Tracking
&lt;/h3&gt;

&lt;p&gt;For each model we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input, output, and cached token prices&lt;/li&gt;
&lt;li&gt;Context window size&lt;/li&gt;
&lt;li&gt;Supported features (tool calling, JSON output, streaming, vision)&lt;/li&gt;
&lt;li&gt;Provider and availability&lt;/li&gt;
&lt;/ul&gt;
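As a sketch, one record per tracked model might look like this in TypeScript; the field names and sample values are illustrative, not WhichModel's actual schema.

```typescript
// Illustrative shape for one tracked model; not WhichModel's actual schema.
interface ModelRecord {
  model: string;
  provider: string;
  pricePerMTokens: { input: number; output: number; cached: number };
  contextWindow: number; // in tokens
  features: {
    toolCalling: boolean;
    jsonOutput: boolean;
    streaming: boolean;
    vision: boolean;
  };
  available: boolean;
}

const example: ModelRecord = {
  model: "example/model-1", // placeholder identifier
  provider: "example",
  pricePerMTokens: { input: 3, output: 15, cached: 0.3 },
  contextWindow: 200_000,
  features: { toolCalling: true, jsonOutput: true, streaming: true, vision: true },
  available: true,
};
```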

&lt;h3&gt;
  
  
  MCP-Native Access
&lt;/h3&gt;

&lt;p&gt;The data is exposed as an MCP server — meaning any AI agent can query it natively. No REST API to learn, no SDK to install:&lt;/p&gt;
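Just add the endpoint to your MCP client config, the same snippet shown in the quickstart:

```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```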

&lt;p&gt;One line of config. No API key. Real-time pricing data.&lt;/p&gt;

&lt;p&gt;Your agent can then ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is the cheapest model that supports tool calling with at least 128K context?"&lt;/li&gt;
&lt;li&gt;"Compare Claude Sonnet 4 vs GPT-4.1 for code generation at 10K calls/day"&lt;/li&gt;
&lt;li&gt;"Recommend a model for data extraction under $0.002 per call"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Have Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Price is not correlated with quality for most tasks.&lt;/strong&gt;&lt;br&gt;
A $0.60/M-token model handles 80% of production tasks as well as a $15/M-token model. The gap matters for the remaining 20%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pricing changes more than you think.&lt;/strong&gt;&lt;br&gt;
We see meaningful pricing updates multiple times per week across the ecosystem. What was true last month may not be true today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The "just use the best model" approach is expensive at scale.&lt;/strong&gt;&lt;br&gt;
At 10K calls/day, the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Agents need this data in real time, not in a spreadsheet.&lt;/strong&gt;&lt;br&gt;
The whole point of autonomous agents is that they make decisions without human intervention — including which model to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;WhichModel is open source and free to use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Endpoint:&lt;/strong&gt; &lt;a href="https://whichmodel.dev/mcp" rel="noopener noreferrer"&gt;https://whichmodel.dev/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for agents. Updated every 4 hours. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Add Cost-Aware Model Selection to Your AI Agent</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:35:38 +0000</pubDate>
      <link>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-43mh</link>
      <guid>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-43mh</guid>
      <description>&lt;h1&gt;
  
  
  How to Add Cost-Aware Model Selection to Your AI Agent
&lt;/h1&gt;

&lt;p&gt;Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.&lt;/p&gt;

&lt;p&gt;This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing current prices across providers, which models support the capabilities you need, and how model quality maps to task complexity.&lt;/p&gt;

&lt;p&gt;Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Add WhichModel to your MCP client config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key. No installation. It is a remote MCP server — your agent connects directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using It: Three Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Task-Based Routing
&lt;/h3&gt;

&lt;p&gt;Ask WhichModel to recommend a model based on what you are doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"code_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;requirements: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tool_calling: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Budget Caps
&lt;/h3&gt;

&lt;p&gt;Set a per-call budget and let WhichModel find the best model within it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"summarisation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;budget_per_call: &lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 3: Volume Cost Projections
&lt;/h3&gt;

&lt;p&gt;Before committing to a model, compare costs at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;models: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"openai/gpt-4.1-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;volume: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;calls_per_day: &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;At 10,000 calls per day, the difference between a $15/M-token model and a $0.60/M-token model is &lt;strong&gt;$216/day&lt;/strong&gt; — over $6,000 per month. WhichModel helps your agent make that call automatically, with pricing data that updates every 4 hours.&lt;/p&gt;
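That figure is just the Pattern 3 volume run through the price gap (1,500 tokens per call, treating each rate as a blended per-token price):

```typescript
// Check the $216/day figure: 10,000 calls at 1,500 tokens each is 15M
// tokens/day; multiply by the gap between the two per-million-token rates.
function dailyDelta(
  callsPerDay: number,
  tokensPerCall: number,
  pricePerMHigh: number,
  pricePerMLow: number,
): number {
  const mTokensPerDay = (callsPerDay * tokensPerCall) / 1_000_000;
  return mTokensPerDay * (pricePerMHigh - pricePerMLow);
}
```

`dailyDelta(10_000, 1_500, 15, 0.6)` comes to about $216/day, roughly $6,480 over 30 days.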

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote endpoint:&lt;/strong&gt; &lt;a href="https://whichmodel.dev/mcp" rel="noopener noreferrer"&gt;https://whichmodel.dev/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;WhichModel is open source (MIT). No API key required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>llm</category>
      <category>agentdev</category>
    </item>
    <item>
      <title>AI Model Pricing Is a Mess — Here Is How We Track It</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:23:23 +0000</pubDate>
      <link>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-1f05</link>
      <guid>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-1f05</guid>
      <description>&lt;h1&gt;
  
  
  AI Model Pricing Is a Mess — Here Is How We Track It
&lt;/h1&gt;

&lt;p&gt;There are over 100 LLMs available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.&lt;/p&gt;

&lt;p&gt;If you are building with LLMs, you have probably experienced this: you pick a model, hardcode it, ship it, and three months later discover you are paying 10x what a newer model would cost for the same quality.&lt;/p&gt;

&lt;p&gt;We built WhichModel to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scale of the Problem
&lt;/h2&gt;

&lt;p&gt;Here is what tracking LLM pricing actually looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10+ providers&lt;/strong&gt; with different pricing pages, formats, and update cadences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ models&lt;/strong&gt; with different input/output/cached token rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability matrices&lt;/strong&gt; that change with each model update (vision support, tool calling, JSON mode, context windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality tiers&lt;/strong&gt; that do not map cleanly to price — a $0.60/M-token model can outperform a $15/M-token model on specific tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams handle this by... not handling it. They pick a model, maybe two, and revisit the decision quarterly, if ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Track It
&lt;/h2&gt;

&lt;p&gt;WhichModel scrapes, normalises, and cross-verifies pricing data from every major LLM provider, refreshing every 4 hours. Here is what that involves:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-Source Verification
&lt;/h3&gt;

&lt;p&gt;We do not trust a single source. Pricing data is cross-checked across provider APIs, documentation pages, and third-party aggregators. If sources disagree, we flag it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Structured Capability Tracking
&lt;/h3&gt;

&lt;p&gt;Pricing is useless without capability context. For each model we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input, output, and cached token prices&lt;/li&gt;
&lt;li&gt;Context window size&lt;/li&gt;
&lt;li&gt;Supported features (tool calling, JSON output, streaming, vision)&lt;/li&gt;
&lt;li&gt;Provider and availability&lt;/li&gt;
&lt;/ul&gt;
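&lt;p&gt;As a sketch, those tracked fields map naturally onto a record type. The shape below is illustrative — the field names and sample figures are assumptions, not WhichModel's actual schema:&lt;/p&gt;

```typescript
// Illustrative shape for a normalised model record; the field names
// and sample figures are assumptions, not WhichModel's real schema.
interface ModelRecord {
  id: string;                    // e.g. "anthropic/claude-sonnet-4"
  provider: string;
  pricing: {
    inputPerMTok: number;        // USD per million input tokens
    outputPerMTok: number;       // USD per million output tokens
    cachedInputPerMTok?: number; // USD per million cached input tokens
  };
  contextWindow: number;         // tokens
  features: {
    toolCalling: boolean;
    jsonOutput: boolean;
    streaming: boolean;
    vision: boolean;
  };
  available: boolean;
}

const example: ModelRecord = {
  id: "anthropic/claude-sonnet-4",
  provider: "anthropic",
  pricing: { inputPerMTok: 3, outputPerMTok: 15 },
  contextWindow: 200_000,
  features: { toolCalling: true, jsonOutput: true, streaming: true, vision: true },
  available: true,
};
```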

&lt;h3&gt;
  
  
  3. MCP-Native Access
&lt;/h3&gt;

&lt;p&gt;The data is exposed as an MCP server — meaning any AI agent can query it natively. No REST API to learn, no SDK to install. Just add the MCP endpoint and your agent can ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is the cheapest model that supports tool calling with at least 128K context?"&lt;/li&gt;
&lt;li&gt;"Compare Claude Sonnet 4 vs GPT-4.1 for code generation at 10K calls/day"&lt;/li&gt;
&lt;li&gt;"Recommend a model for data extraction under $0.002 per call"&lt;/li&gt;
&lt;/ul&gt;
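&lt;p&gt;The first of those questions boils down to a filter-and-sort over the catalogue. A minimal agent-side sketch, with made-up records rather than live data:&lt;/p&gt;

```typescript
// "Cheapest model that supports tool calling with at least 128K
// context" as a filter-and-sort. Records here are illustrative only.
interface CatalogueEntry {
  id: string;
  pricePerMTok: number;  // blended USD per million tokens
  contextWindow: number; // tokens
  toolCalling: boolean;
}

const catalogue: CatalogueEntry[] = [
  { id: "provider/frontier", pricePerMTok: 15, contextWindow: 200_000, toolCalling: true },
  { id: "provider/mid", pricePerMTok: 0.6, contextWindow: 128_000, toolCalling: true },
  { id: "provider/tiny", pricePerMTok: 0.1, contextWindow: 32_000, toolCalling: false },
];

const cheapest = catalogue
  .filter((m) => m.toolCalling && m.contextWindow >= 128_000)
  .sort((a, b) => a.pricePerMTok - b.pricePerMTok)[0];
```

&lt;p&gt;Here &lt;code&gt;cheapest.id&lt;/code&gt; is &lt;code&gt;provider/mid&lt;/code&gt;: the frontier model also qualifies, but at 25x the price.&lt;/p&gt;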

&lt;h2&gt;
  
  
  Why MCP?
&lt;/h2&gt;

&lt;p&gt;We chose MCP (Model Context Protocol) because the users of this data are AI agents, not humans browsing a dashboard. MCP is the standard protocol for giving AI agents access to tools and data. Because WhichModel is exposed as an MCP server, any agent that speaks MCP can use it out of the box.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line of config. No API key. Real-time pricing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Have Learned
&lt;/h2&gt;

&lt;p&gt;After building this, a few things surprised us:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price is not correlated with quality for most tasks.&lt;/strong&gt; A $0.60/M-token model handles 80% of production tasks as well as a $15/M-token model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing changes more than you think.&lt;/strong&gt; We see meaningful pricing updates multiple times per week across the ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "just use the best model" approach is expensive at scale.&lt;/strong&gt; At 10K calls/day, model choice is a $6,000+/month decision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents need this data in real time, not in a spreadsheet.&lt;/strong&gt; The whole point of autonomous agents is that they make decisions without human intervention — including which model to use.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;WhichModel is open source and free to use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Endpoint:&lt;/strong&gt; &lt;code&gt;https://whichmodel.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for agents. Updated every 4 hours. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Add Cost-Aware Model Selection to Your AI Agent</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:23:17 +0000</pubDate>
      <link>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-5a3l</link>
      <guid>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-5a3l</guid>
      <description>&lt;h1&gt;
  
  
  How to Add Cost-Aware Model Selection to Your AI Agent
&lt;/h1&gt;

&lt;p&gt;Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.&lt;/p&gt;

&lt;p&gt;This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current prices across providers&lt;/li&gt;
&lt;li&gt;Which models support the capabilities you need (tool calling, JSON output, vision)&lt;/li&gt;
&lt;li&gt;How model quality maps to task complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Add WhichModel to your MCP client config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key. No installation. It is a remote MCP server — your agent connects directly.&lt;/p&gt;

&lt;p&gt;For stdio-based clients (Claude Desktop, Cursor):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whichmodel-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using It: Three Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Task-Based Routing
&lt;/h3&gt;

&lt;p&gt;Ask WhichModel to recommend a model based on what you are doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"code_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;requirements: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tool_calling: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.&lt;/p&gt;
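&lt;p&gt;Routing on that response is then a few lines of agent code. The response shape below is an assumption for illustration, not WhichModel's documented schema:&lt;/p&gt;

```typescript
// Sketch: choose between the recommended model and its budget
// alternative. The Recommendation shape is assumed, not documented.
interface Recommendation {
  model: string;
  budgetAlternative?: string;
  estimatedCostPerCall: number; // USD
}

function pickModel(rec: Recommendation, maxCostPerCall: number): string {
  // Fall back to the budget alternative when the top pick busts the cap.
  if (rec.estimatedCostPerCall > maxCostPerCall && rec.budgetAlternative) {
    return rec.budgetAlternative;
  }
  return rec.model;
}

const rec: Recommendation = {
  model: "anthropic/claude-sonnet-4",
  budgetAlternative: "openai/gpt-4.1-mini",
  estimatedCostPerCall: 0.042,
};
const chosen = pickModel(rec, 0.01); // over budget, so the budget pick wins
```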

&lt;h3&gt;
  
  
  Pattern 2: Budget Caps
&lt;/h3&gt;

&lt;p&gt;Set a per-call budget and let WhichModel find the best model within it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"summarisation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;budget_per_call: &lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a simple summarisation task, you might be paying $0.01 per call with GPT-4 when a $0.0005 call to a smaller model would give you the same result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Volume Cost Projections
&lt;/h3&gt;

&lt;p&gt;Before committing to a model, compare costs at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;models: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"openai/gpt-4.1-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"data_extraction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;volume: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;calls_per_day: &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you daily and monthly cost projections for each model, so you can make informed decisions before scaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;At 10,000 calls per day, the difference between a $15/M-token model and a $0.60/M-token model is &lt;strong&gt;$216/day&lt;/strong&gt; — over $6,000 per month. For many tasks, the cheaper model produces equivalent results.&lt;/p&gt;
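&lt;p&gt;The arithmetic behind that figure, using a single blended per-million-token rate as above:&lt;/p&gt;

```typescript
// Worked check of the $216/day figure: 10,000 calls/day at roughly
// 1,500 tokens per call, comparing a $15/M rate against $0.60/M.
function dailyCost(ratePerMTok: number, callsPerDay: number, tokensPerCall: number): number {
  return (callsPerDay * tokensPerCall * ratePerMTok) / 1_000_000;
}

const calls = 10_000;
const tokens = 1_500; // ~1,000 input + 500 output per call

const expensive = dailyCost(15, calls, tokens); // $225/day
const cheap = dailyCost(0.6, calls, tokens);    // $9/day
const dailyDiff = expensive - cheap;            // $216/day
const monthlyDiff = dailyDiff * 30;             // $6,480/month
```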

&lt;p&gt;WhichModel helps your agent make that call automatically, every time, with pricing data that updates every 4 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote endpoint:&lt;/strong&gt; &lt;code&gt;https://whichmodel.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;WhichModel is open source (MIT). No API key required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
