<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon Sharp</title>
    <description>The latest articles on DEV Community by Simon Sharp (@simonamsharp).</description>
    <link>https://dev.to/simonamsharp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871947%2Fbb380092-e611-4f5c-8a0a-f789904a4ed8.jpeg</url>
      <title>DEV Community: Simon Sharp</title>
      <link>https://dev.to/simonamsharp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simonamsharp"/>
    <language>en</language>
    <item>
      <title>Build a Model Router in 20 Lines with WhichModel</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:49:32 +0000</pubDate>
      <link>https://dev.to/simonamsharp/build-a-model-router-in-20-lines-with-whichmodel-49j</link>
      <guid>https://dev.to/simonamsharp/build-a-model-router-in-20-lines-with-whichmodel-49j</guid>
      <description>&lt;h1&gt;
  
  
  Build a Model Router in 20 Lines with WhichModel
&lt;/h1&gt;

&lt;p&gt;You have an AI agent that calls LLMs. It always uses the same model. You want it to pick the right model for each task — optimising for cost, capability, and quality — without maintaining a pricing database yourself.&lt;/p&gt;

&lt;p&gt;Here is how to build a model router in 20 lines using WhichModel and the MCP TypeScript SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StreamableHTTPClientTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/streamableHttp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;router&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamableHTTPClientTransport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://whichmodel.dev/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;recommend_model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;budget_per_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Use it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code_generation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommended&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// e.g. "anthropic/claude-sonnet-4"&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budget_option&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// e.g. "google/gemini-2.5-flash"&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;estimated_cost&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// e.g. "$0.0034"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Your agent now picks the optimal model for every call based on live pricing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Get Back
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;recommend_model&lt;/code&gt; tool returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommended"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0034"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Best quality-to-cost ratio for high-complexity code generation"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alternative"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0028"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget_option"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0.0004"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three options: best pick, alternative, and budget. Your agent decides which to use based on the task.&lt;/p&gt;
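One way an agent might make that call, as a minimal sketch: field names mirror the JSON response above, and the cost ceiling is an illustrative parameter, not part of WhichModel's API.

```typescript
// Choose between the recommended model and the budget option based on a
// per-call cost ceiling. Field names mirror the JSON response above.
type ModelOption = { model: string; estimated_cost: string };

function parseCost(s: string): number {
  return Number(s.replace("$", ""));
}

function choose(recommended: ModelOption, budget: ModelOption, ceiling: number): string {
  // Fall back to the budget option when the top pick would bust the ceiling.
  if (parseCost(recommended.estimated_cost) > ceiling) {
    return budget.model;
  }
  return recommended.model;
}
```

With the response above and a $0.002 ceiling, `choose` returns `google/gemini-2.5-flash`; raise the ceiling to $0.01 and it returns `anthropic/claude-sonnet-4`.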

&lt;h2&gt;
  
  
  Adding Budget Caps
&lt;/h2&gt;

&lt;p&gt;Want to enforce spending limits? Add a budget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Never spend more than $0.002 per call&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cheap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarisation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.002&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel finds the best model within your budget. If nothing fits, it tells you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing at Scale
&lt;/h2&gt;

&lt;p&gt;Before committing to a model for a high-volume pipeline, compare costs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;comparison&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;compare_models&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai/gpt-4.1-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;volume&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;calls_per_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avg_input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avg_output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you daily and monthly cost projections for each model — no spreadsheet required.&lt;/p&gt;
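The projection itself is simple arithmetic you can sanity-check locally; the per-million-token rates in the example below are placeholders, not live prices.

```typescript
// Project daily and monthly cost for one model at a given volume.
// Prices are per million tokens; the sample rates used below are placeholders.
function projectCost(
  callsPerDay: number,
  avgInputTokens: number,
  avgOutputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
) {
  const dailyCost =
    (callsPerDay * (avgInputTokens * inputPricePerM + avgOutputTokens * outputPricePerM)) /
    1_000_000;
  return { daily: dailyCost, monthly: dailyCost * 30 };
}
```

At 10,000 calls/day with 1,000 input and 500 output tokens, a model priced at $3/M input and $15/M output works out to $105/day, or $3,150 over a 30-day month.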

&lt;h2&gt;
  
  
  Why Not Just Hardcode?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prices change multiple times per week across providers&lt;/li&gt;
&lt;li&gt;New models launch constantly — last month alone saw 5 new models that are cheaper than existing options&lt;/li&gt;
&lt;li&gt;Different tasks need different models — a $15/M-token model is overkill for classification&lt;/li&gt;
&lt;li&gt;At 10K calls/day, model choice is a $6,000+/month decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WhichModel tracks all of this and updates every 4 hours. Your router stays current without code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT — free to use, no API key required&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;20 lines. Zero maintenance. Always current pricing.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>ai</category>
      <category>mcp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI Model Pricing Is a Mess — Here Is How We Track It</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:47:50 +0000</pubDate>
      <link>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-288a</link>
      <guid>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-288a</guid>
      <description>&lt;h1&gt;
  
  
  AI Model Pricing Is a Mess — Here Is How We Track It
&lt;/h1&gt;

&lt;p&gt;There are over 100 LLMs available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.&lt;/p&gt;

&lt;p&gt;If you are building with LLMs, you have probably experienced this: you pick a model, hardcode it, ship it, and three months later discover you are paying 10x what a newer model would cost for the same quality.&lt;/p&gt;

&lt;p&gt;We built WhichModel to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scale of the Problem
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10+ providers&lt;/strong&gt; with different pricing pages, formats, and update cadences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ models&lt;/strong&gt; with different input/output/cached token rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability matrices&lt;/strong&gt; that change with each model update (vision, tool calling, JSON mode, context windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality tiers&lt;/strong&gt; that do not map cleanly to price — a $0.60/M-token model can outperform a $15/M-token model on specific tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams handle this by not handling it. They pick a model, maybe two, and revisit the decision quarterly if ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Track It
&lt;/h2&gt;

&lt;p&gt;WhichModel scrapes, normalises, and cross-verifies pricing data from every major LLM provider every 4 hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Source Verification
&lt;/h3&gt;

&lt;p&gt;We do not trust a single source. Pricing data is cross-checked across provider APIs, documentation pages, and third-party aggregators. If sources disagree, we flag it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Capability Tracking
&lt;/h3&gt;

&lt;p&gt;For each model we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input, output, and cached token prices&lt;/li&gt;
&lt;li&gt;Context window size&lt;/li&gt;
&lt;li&gt;Supported features (tool calling, JSON output, streaming, vision)&lt;/li&gt;
&lt;li&gt;Provider and availability&lt;/li&gt;
&lt;/ul&gt;
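As a sketch, one record per tracked model might look like this in TypeScript; the field names and sample values are illustrative, not WhichModel's actual schema.

```typescript
// Illustrative shape for one tracked model; not WhichModel's actual schema.
interface ModelRecord {
  model: string;
  provider: string;
  pricePerMTokens: { input: number; output: number; cached: number };
  contextWindow: number; // in tokens
  features: {
    toolCalling: boolean;
    jsonOutput: boolean;
    streaming: boolean;
    vision: boolean;
  };
  available: boolean;
}

const example: ModelRecord = {
  model: "example/model-1", // placeholder identifier
  provider: "example",
  pricePerMTokens: { input: 3, output: 15, cached: 0.3 },
  contextWindow: 200_000,
  features: { toolCalling: true, jsonOutput: true, streaming: true, vision: true },
  available: true,
};
```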

&lt;h3&gt;
  
  
  MCP-Native Access
&lt;/h3&gt;

&lt;p&gt;The data is exposed as an MCP server — meaning any AI agent can query it natively. No REST API to learn, no SDK to install:&lt;/p&gt;
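Just add the endpoint to your MCP client config, the same snippet shown in the quickstart:

```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```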

&lt;p&gt;One line of config. No API key. Real-time pricing data.&lt;/p&gt;

&lt;p&gt;Your agent can then ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is the cheapest model that supports tool calling with at least 128K context?"&lt;/li&gt;
&lt;li&gt;"Compare Claude Sonnet 4 vs GPT-4.1 for code generation at 10K calls/day"&lt;/li&gt;
&lt;li&gt;"Recommend a model for data extraction under $0.002 per call"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Have Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Price is not correlated with quality for most tasks.&lt;/strong&gt;&lt;br&gt;
A $0.60/M-token model handles 80% of production tasks as well as a $15/M-token model. The gap matters for the remaining 20%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pricing changes more than you think.&lt;/strong&gt;&lt;br&gt;
We see meaningful pricing updates multiple times per week across the ecosystem. What was true last month may not be true today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The "just use the best model" approach is expensive at scale.&lt;/strong&gt;&lt;br&gt;
At 10K calls/day, the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Agents need this data in real time, not in a spreadsheet.&lt;/strong&gt;&lt;br&gt;
The whole point of autonomous agents is that they make decisions without human intervention — including which model to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;WhichModel is open source and free to use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Endpoint:&lt;/strong&gt; &lt;a href="https://whichmodel.dev/mcp" rel="noopener noreferrer"&gt;https://whichmodel.dev/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for agents. Updated every 4 hours. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Add Cost-Aware Model Selection to Your AI Agent</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:35:38 +0000</pubDate>
      <link>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-43mh</link>
      <guid>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-43mh</guid>
      <description>&lt;h1&gt;
  
  
  How to Add Cost-Aware Model Selection to Your AI Agent
&lt;/h1&gt;

&lt;p&gt;Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.&lt;/p&gt;

&lt;p&gt;This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing current prices across providers, which models support the capabilities you need, and how model quality maps to task complexity.&lt;/p&gt;

&lt;p&gt;Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Add WhichModel to your MCP client config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key. No installation. It is a remote MCP server — your agent connects directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using It: Three Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Task-Based Routing
&lt;/h3&gt;

&lt;p&gt;Ask WhichModel to recommend a model based on what you are doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"code_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;requirements: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tool_calling: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Budget Caps
&lt;/h3&gt;

&lt;p&gt;Set a per-call budget and let WhichModel find the best model within it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"summarisation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;budget_per_call: &lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 3: Volume Cost Projections
&lt;/h3&gt;

&lt;p&gt;Before committing to a model, compare costs at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;models: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"openai/gpt-4.1-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;volume: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;calls_per_day: &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;At 10,000 calls per day, the difference between a $15/M-token model and a $0.60/M-token model is &lt;strong&gt;$216/day&lt;/strong&gt; — over $6,000 per month. WhichModel helps your agent make that call automatically, with pricing data that updates every 4 hours.&lt;/p&gt;
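That figure is just the Pattern 3 volume run through the price gap (1,500 tokens per call, treating each rate as a blended per-token price):

```typescript
// Check the $216/day figure: 10,000 calls at 1,500 tokens each is 15M
// tokens/day; multiply by the gap between the two per-million-token rates.
function dailyDelta(
  callsPerDay: number,
  tokensPerCall: number,
  pricePerMHigh: number,
  pricePerMLow: number,
): number {
  const mTokensPerDay = (callsPerDay * tokensPerCall) / 1_000_000;
  return mTokensPerDay * (pricePerMHigh - pricePerMLow);
}
```

`dailyDelta(10_000, 1_500, 15, 0.6)` comes to about $216/day, roughly $6,480 over 30 days.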

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote endpoint:&lt;/strong&gt; &lt;a href="https://whichmodel.dev/mcp" rel="noopener noreferrer"&gt;https://whichmodel.dev/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;WhichModel is open source (MIT). No API key required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>llm</category>
      <category>agentdev</category>
    </item>
    <item>
      <title>AI Model Pricing Is a Mess — Here Is How We Track It</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:23:23 +0000</pubDate>
      <link>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-1f05</link>
      <guid>https://dev.to/simonamsharp/ai-model-pricing-is-a-mess-here-is-how-we-track-it-1f05</guid>
      <description>&lt;h1&gt;
  
  
  AI Model Pricing Is a Mess — Here Is How We Track It
&lt;/h1&gt;

&lt;p&gt;There are over 100 LLMs available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.&lt;/p&gt;

&lt;p&gt;If you are building with LLMs, you have probably experienced this: you pick a model, hardcode it, ship it, and three months later discover you are paying 10x what a newer model would cost for the same quality.&lt;/p&gt;

&lt;p&gt;We built WhichModel to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scale of the Problem
&lt;/h2&gt;

&lt;p&gt;Here is what tracking LLM pricing actually looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10+ providers&lt;/strong&gt; with different pricing pages, formats, and update cadences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ models&lt;/strong&gt; with different input/output/cached token rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability matrices&lt;/strong&gt; that change with each model update (vision support, tool calling, JSON mode, context windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality tiers&lt;/strong&gt; that do not map cleanly to price — a $0.60/M-token model can outperform a $15/M-token model on specific tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams handle this by... not handling it. They pick a model, maybe two, and revisit the decision quarterly, if ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Track It
&lt;/h2&gt;

&lt;p&gt;WhichModel scrapes, normalises, and cross-verifies pricing data from every major LLM provider, refreshing every 4 hours. Here is what that involves:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-Source Verification
&lt;/h3&gt;

&lt;p&gt;We do not trust a single source. Pricing data is cross-checked across provider APIs, documentation pages, and third-party aggregators. If sources disagree, we flag it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Structured Capability Tracking
&lt;/h3&gt;

&lt;p&gt;Pricing is useless without capability context. For each model we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input, output, and cached token prices&lt;/li&gt;
&lt;li&gt;Context window size&lt;/li&gt;
&lt;li&gt;Supported features (tool calling, JSON output, streaming, vision)&lt;/li&gt;
&lt;li&gt;Provider and availability&lt;/li&gt;
&lt;/ul&gt;
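&lt;p&gt;As a sketch, those tracked fields map naturally onto a record type. The shape below is illustrative — the field names and sample figures are assumptions, not WhichModel's actual schema:&lt;/p&gt;

```typescript
// Illustrative shape for a normalised model record; the field names
// and sample figures are assumptions, not WhichModel's real schema.
interface ModelRecord {
  id: string;                    // e.g. "anthropic/claude-sonnet-4"
  provider: string;
  pricing: {
    inputPerMTok: number;        // USD per million input tokens
    outputPerMTok: number;       // USD per million output tokens
    cachedInputPerMTok?: number; // USD per million cached input tokens
  };
  contextWindow: number;         // tokens
  features: {
    toolCalling: boolean;
    jsonOutput: boolean;
    streaming: boolean;
    vision: boolean;
  };
  available: boolean;
}

const example: ModelRecord = {
  id: "anthropic/claude-sonnet-4",
  provider: "anthropic",
  pricing: { inputPerMTok: 3, outputPerMTok: 15 },
  contextWindow: 200_000,
  features: { toolCalling: true, jsonOutput: true, streaming: true, vision: true },
  available: true,
};
```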

&lt;h3&gt;
  
  
  3. MCP-Native Access
&lt;/h3&gt;

&lt;p&gt;The data is exposed as an MCP server — meaning any AI agent can query it natively. No REST API to learn, no SDK to install. Just add the MCP endpoint and your agent can ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is the cheapest model that supports tool calling with at least 128K context?"&lt;/li&gt;
&lt;li&gt;"Compare Claude Sonnet 4 vs GPT-4.1 for code generation at 10K calls/day"&lt;/li&gt;
&lt;li&gt;"Recommend a model for data extraction under $0.002 per call"&lt;/li&gt;
&lt;/ul&gt;
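&lt;p&gt;The first of those questions boils down to a filter-and-sort over the catalogue. A minimal agent-side sketch, with made-up records rather than live data:&lt;/p&gt;

```typescript
// "Cheapest model that supports tool calling with at least 128K
// context" as a filter-and-sort. Records here are illustrative only.
interface CatalogueEntry {
  id: string;
  pricePerMTok: number;  // blended USD per million tokens
  contextWindow: number; // tokens
  toolCalling: boolean;
}

const catalogue: CatalogueEntry[] = [
  { id: "provider/frontier", pricePerMTok: 15, contextWindow: 200_000, toolCalling: true },
  { id: "provider/mid", pricePerMTok: 0.6, contextWindow: 128_000, toolCalling: true },
  { id: "provider/tiny", pricePerMTok: 0.1, contextWindow: 32_000, toolCalling: false },
];

const cheapest = catalogue
  .filter((m) => m.toolCalling && m.contextWindow >= 128_000)
  .sort((a, b) => a.pricePerMTok - b.pricePerMTok)[0];
```

&lt;p&gt;Here &lt;code&gt;cheapest.id&lt;/code&gt; is &lt;code&gt;provider/mid&lt;/code&gt;: the frontier model also qualifies, but at 25x the price.&lt;/p&gt;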

&lt;h2&gt;
  
  
  Why MCP?
&lt;/h2&gt;

&lt;p&gt;We chose MCP (Model Context Protocol) because the users of this data are AI agents, not humans browsing a dashboard. MCP is the standard protocol for giving AI agents access to tools and data. Because WhichModel is exposed as an MCP server, any agent that speaks MCP can use it out of the box.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line of config. No API key. Real-time pricing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Have Learned
&lt;/h2&gt;

&lt;p&gt;After building this, a few things surprised us:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price is not correlated with quality for most tasks.&lt;/strong&gt; A $0.60/M-token model handles 80% of production tasks as well as a $15/M-token model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing changes more than you think.&lt;/strong&gt; We see meaningful pricing updates multiple times per week across the ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "just use the best model" approach is expensive at scale.&lt;/strong&gt; At 10K calls/day, model choice is a $6,000+/month decision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents need this data in real time, not in a spreadsheet.&lt;/strong&gt; The whole point of autonomous agents is that they make decisions without human intervention — including which model to use.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;WhichModel is open source and free to use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Endpoint:&lt;/strong&gt; &lt;code&gt;https://whichmodel.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for agents. Updated every 4 hours. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Add Cost-Aware Model Selection to Your AI Agent</title>
      <dc:creator>Simon Sharp</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:23:17 +0000</pubDate>
      <link>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-5a3l</link>
      <guid>https://dev.to/simonamsharp/how-to-add-cost-aware-model-selection-to-your-ai-agent-5a3l</guid>
      <description>&lt;h1&gt;
  
  
  How to Add Cost-Aware Model Selection to Your AI Agent
&lt;/h1&gt;

&lt;p&gt;Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.&lt;/p&gt;

&lt;p&gt;This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current prices across providers&lt;/li&gt;
&lt;li&gt;Which models support the capabilities you need (tool calling, JSON output, vision)&lt;/li&gt;
&lt;li&gt;How model quality maps to task complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Add WhichModel to your MCP client config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://whichmodel.dev/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key. No installation. It is a remote MCP server — your agent connects directly.&lt;/p&gt;

&lt;p&gt;For stdio-based clients (Claude Desktop, Cursor):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whichmodel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whichmodel-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using It: Three Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Task-Based Routing
&lt;/h3&gt;

&lt;p&gt;Ask WhichModel to recommend a model based on what you are doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"code_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;estimated_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;requirements: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tool_calling: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.&lt;/p&gt;
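&lt;p&gt;Routing on that response is then a few lines of agent code. The response shape below is an assumption for illustration, not WhichModel's documented schema:&lt;/p&gt;

```typescript
// Sketch: choose between the recommended model and its budget
// alternative. The Recommendation shape is assumed, not documented.
interface Recommendation {
  model: string;
  budgetAlternative?: string;
  estimatedCostPerCall: number; // USD
}

function pickModel(rec: Recommendation, maxCostPerCall: number): string {
  // Fall back to the budget alternative when the top pick busts the cap.
  if (rec.estimatedCostPerCall > maxCostPerCall && rec.budgetAlternative) {
    return rec.budgetAlternative;
  }
  return rec.model;
}

const rec: Recommendation = {
  model: "anthropic/claude-sonnet-4",
  budgetAlternative: "openai/gpt-4.1-mini",
  estimatedCostPerCall: 0.042,
};
const chosen = pickModel(rec, 0.01); // over budget, so the budget pick wins
```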

&lt;h3&gt;
  
  
  Pattern 2: Budget Caps
&lt;/h3&gt;

&lt;p&gt;Set a per-call budget and let WhichModel find the best model within it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;recommend_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"summarisation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;complexity: &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;budget_per_call: &lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a simple summarisation task, you might be paying $0.01 per call with GPT-4 when a $0.0005 call to a smaller model would give you the same result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Volume Cost Projections
&lt;/h3&gt;

&lt;p&gt;Before committing to a model, compare costs at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;models: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"openai/gpt-4.1-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;task_type: &lt;/span&gt;&lt;span class="s2"&gt;"data_extraction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;volume: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;calls_per_day: &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_input_tokens: &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;avg_output_tokens: &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you daily and monthly cost projections for each model, so you can make informed decisions before scaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;At 10,000 calls per day, the difference between a $15/M-token model and a $0.60/M-token model is &lt;strong&gt;$216/day&lt;/strong&gt; — over $6,000 per month. For many tasks, the cheaper model produces equivalent results.&lt;/p&gt;
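&lt;p&gt;The arithmetic behind that figure, using a single blended per-million-token rate as above:&lt;/p&gt;

```typescript
// Worked check of the $216/day figure: 10,000 calls/day at roughly
// 1,500 tokens per call, comparing a $15/M rate against $0.60/M.
function dailyCost(ratePerMTok: number, callsPerDay: number, tokensPerCall: number): number {
  return (callsPerDay * tokensPerCall * ratePerMTok) / 1_000_000;
}

const calls = 10_000;
const tokens = 1_500; // ~1,000 input + 500 output per call

const expensive = dailyCost(15, calls, tokens); // $225/day
const cheap = dailyCost(0.6, calls, tokens);    // $9/day
const dailyDiff = expensive - cheap;            // $216/day
const monthlyDiff = dailyDiff * 30;             // $6,480/month
```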

&lt;p&gt;WhichModel helps your agent make that call automatically, every time, with pricing data that updates every 4 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote endpoint:&lt;/strong&gt; &lt;code&gt;https://whichmodel.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Which-Model/whichmodel-mcp" rel="noopener noreferrer"&gt;Which-Model/whichmodel-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://whichmodel.dev" rel="noopener noreferrer"&gt;whichmodel.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;WhichModel is open source (MIT). No API key required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
