Build a Model Router in 20 Lines with WhichModel

#mcp #typescript #ai #tutorial

You have an AI agent that calls LLMs. It always uses the same model. You want it to pick the right model for each task — optimising for cost, capability, and quality — without maintaining a pricing database yourself.

Here is how to build a model router in 20 lines using WhichModel and the MCP TypeScript SDK.

The Code

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "router", version: "1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://whichmodel.dev/mcp"))
);

async function pickModel(taskType: string, complexity: string, budget?: number) {
  const result = await client.callTool({
    name: "recommend_model",
    arguments: {
      task_type: taskType,
      complexity,
      // Only send a budget when the caller supplied one.
      ...(budget !== undefined && { budget_per_call: budget }),
    },
  });
  return JSON.parse(result.content[0].text);
}

// Use it
const rec = await pickModel("code_generation", "high", 0.01);
console.log(rec.recommended.model); // e.g. "anthropic/claude-sonnet-4"
console.log(rec.budget_option.model); // e.g. "google/gemini-2.5-flash"
console.log(rec.recommended.estimated_cost); // e.g. "$0.0034"

That is it. Your agent now picks the optimal model for every call based on live pricing data.

What You Get Back

The recommend_model tool returns:

{
  "recommended": {
    "model": "anthropic/claude-sonnet-4",
    "provider": "anthropic",
    "estimated_cost": "$0.0034",
    "reasoning": "Best quality-to-cost ratio for high-complexity code generation"
  },
  "alternative": {
    "model": "openai/gpt-4.1",
    "estimated_cost": "$0.0028"
  },
  "budget_option": {
    "model": "google/gemini-2.5-flash",
    "estimated_cost": "$0.0004"
  }
}

Three options: best pick, alternative, and budget. Your agent decides which to use based on the task.
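One way an agent might choose between the three options is a simple cost ceiling. The sketch below is an illustration only: the `chooseOption` policy, the `maxCostPerCall` parameter, and the `parseCost` helper are hypothetical names, and it assumes every option carries an `estimated_cost` string in the "$0.0034" form shown above.

```typescript
// Shape of one option, mirroring the example response above.
interface ModelOption {
  model: string;
  estimated_cost: string; // e.g. "$0.0034"
}

interface Recommendation {
  recommended: ModelOption;
  alternative: ModelOption;
  budget_option: ModelOption;
}

// Convert a "$0.0034"-style string to a number of dollars.
function parseCost(cost: string): number {
  return Number.parseFloat(cost.replace("$", ""));
}

// Hypothetical policy: prefer the recommended model, then the alternative,
// and fall back to the budget option when both exceed the ceiling.
function chooseOption(rec: Recommendation, maxCostPerCall: number): ModelOption {
  const inPreferenceOrder = [rec.recommended, rec.alternative];
  const affordable = inPreferenceOrder.find(
    (option) => parseCost(option.estimated_cost) <= maxCostPerCall
  );
  return affordable ?? rec.budget_option;
}

const example: Recommendation = {
  recommended: { model: "anthropic/claude-sonnet-4", estimated_cost: "$0.0034" },
  alternative: { model: "openai/gpt-4.1", estimated_cost: "$0.0028" },
  budget_option: { model: "google/gemini-2.5-flash", estimated_cost: "$0.0004" },
};

console.log(chooseOption(example, 0.003).model); // prints "openai/gpt-4.1"
```

A real agent would likely weigh task type and quality as well; the ceiling is just the smallest policy that uses all three fields.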

Adding Budget Caps

Want to enforce spending limits? Add a budget:

// Never spend more than $0.002 per call
const cheap = await pickModel("summarisation", "low", 0.002);

WhichModel finds the best model within your budget. If nothing fits, it tells you.
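The article does not say how the "nothing fits" case is reported, so a router should probably not assume that the response always parses. The sketch below assumes the server either sets the standard MCP `isError` flag on the tool result or returns text that is not a JSON recommendation. `parseRecommendation`, `modelOrDefault`, and `DEFAULT_MODEL` are hypothetical names introduced for illustration.

```typescript
// Minimal subset of an MCP tool result; isError is part of the MCP spec.
interface ToolResultLike {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}

// Return the parsed payload, or null when the result is flagged as an error
// or its first content item is not JSON text describing an object.
function parseRecommendation(result: ToolResultLike): Record<string, unknown> | null {
  if (result.isError) return null;
  const first = result.content[0];
  if (!first || first.type !== "text" || typeof first.text !== "string") return null;
  try {
    const parsed: unknown = JSON.parse(first.text);
    return typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// Hypothetical fallback used when no recommendation comes back.
const DEFAULT_MODEL = "google/gemini-2.5-flash";

function modelOrDefault(result: ToolResultLike): string {
  const rec = parseRecommendation(result) as { recommended?: { model?: string } } | null;
  return rec?.recommended?.model ?? DEFAULT_MODEL;
}
```

Passing the raw `callTool` result through `modelOrDefault` instead of calling `JSON.parse` directly keeps the agent running on a known model when no recommendation is available.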

Comparing at Scale

Before committing to a model for a high-volume pipeline, compare costs:

const comparison = await client.callTool({
  name: "compare_models",
  arguments: {
    models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
    volume: { calls_per_day: 10000, avg_input_tokens: 1000, avg_output_tokens: 500 }
  }
});

This gives you daily and monthly cost projections for each model — no spreadsheet required.
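For intuition, the arithmetic behind such a projection can be sketched locally. This is not WhichModel's implementation: `projectCost` is a hypothetical helper, the 30-day month is an assumption, and the per-million-token prices are placeholder values rather than real provider pricing.

```typescript
// Same volume fields as the compare_models call above.
interface Volume {
  calls_per_day: number;
  avg_input_tokens: number;
  avg_output_tokens: number;
}

// Prices in dollars per million tokens. Placeholder values only.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost per call = input tokens * input price + output tokens * output price,
// scaled from per-million-token prices; then multiply out by call volume.
function projectCost(pricing: Pricing, volume: Volume, daysPerMonth = 30) {
  const perCall =
    (volume.avg_input_tokens * pricing.inputPerMillion +
      volume.avg_output_tokens * pricing.outputPerMillion) /
    1e6;
  const daily = perCall * volume.calls_per_day;
  return { perCall, daily, monthly: daily * daysPerMonth };
}

const volume: Volume = { calls_per_day: 10000, avg_input_tokens: 1000, avg_output_tokens: 500 };
const hypotheticalPricing: Pricing = { inputPerMillion: 2, outputPerMillion: 8 };

const projection = projectCost(hypotheticalPricing, volume);
console.log(projection.daily.toFixed(2), projection.monthly.toFixed(2)); // prints "60.00 1800.00"
```

With these placeholder prices, each call costs $0.006, which is $60 per day at 10,000 calls.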

Why Not Just Hardcode?

Prices change multiple times per week across providers
New models launch constantly — last month alone saw 5 new models that are cheaper than existing options
Different tasks need different models — a $15/M-token model is overkill for classification
At 10K calls/day, model choice is a $6,000+/month decision

WhichModel tracks all of this and updates every 4 hours. Your router stays current without code changes.
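Because the data only changes every few hours, a router does not necessarily need a network round trip for every call. A minimal sketch of a time-based cache follows. `withTtlCache` and `FOUR_HOURS_MS` are hypothetical names, and caching for the full four hours is an assumption; a shorter TTL picks up updates sooner.

```typescript
type Fetcher<T> = (taskType: string, complexity: string) => Promise<T>;

// Wrap a fetcher so repeated requests for the same task reuse the cached
// promise until ttlMs has elapsed. `now` is injectable to keep this testable.
function withTtlCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now
): Fetcher<T> {
  const cache = new Map<string, { promise: Promise<T>; expiresAt: number }>();
  return (taskType, complexity) => {
    const key = `${taskType}:${complexity}`;
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.promise;

    const promise = fetcher(taskType, complexity);
    cache.set(key, { promise, expiresAt: now() + ttlMs });
    // Do not keep failed lookups around; let the next call retry.
    promise.catch(() => {
      if (cache.get(key)?.promise === promise) cache.delete(key);
    });
    return promise;
  };
}

// Four hours, matching the update interval mentioned above.
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;
```

Wrapping the earlier function as `withTtlCache((t, c) => pickModel(t, c), FOUR_HOURS_MS)` would reuse one lookup per task type and complexity. A budget argument would need to be added to the cache key.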

Get Started

To use WhichModel from an MCP client that supports remote servers, add it to the client's configuration:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

GitHub: Which-Model/whichmodel-mcp
Website: whichmodel.dev
License: MIT — free to use, no API key required

20 lines. Zero maintenance. Always current pricing.
