Simon Sharp

Build a Model Router in 20 Lines with WhichModel

You have an AI agent that calls LLMs. It always uses the same model. You want it to pick the right model for each task — optimising for cost, capability, and quality — without maintaining a pricing database yourself.

Here is how to build a model router in 20 lines using WhichModel and the MCP TypeScript SDK.

The Code

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "router", version: "1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://whichmodel.dev/mcp"))
);

async function pickModel(taskType: string, complexity: string, budget?: number) {
  const result = await client.callTool({
    name: "recommend_model",
    arguments: {
      task_type: taskType,
      complexity,
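      // Only send a budget when one was given (an explicit check also keeps a budget of 0)
      ...(budget !== undefined && { budget_per_call: budget }),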
    },
  });
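  // The first content item is expected to be a text block holding the JSON payload
  const [first] = result.content as Array<{ type: string; text?: string }>;
  if (!first?.text) throw new Error("recommend_model returned no text content");
  return JSON.parse(first.text);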
}

// Use it
const rec = await pickModel("code_generation", "high", 0.01);
console.log(rec.recommended.model); // e.g. "anthropic/claude-sonnet-4"
console.log(rec.budget_option.model); // e.g. "google/gemini-2.5-flash"
console.log(rec.recommended.estimated_cost); // e.g. "$0.0034"

That is it. Your agent can now ask for a recommended model before each call, based on live pricing data.

What You Get Back

The recommend_model tool returns:

{
  "recommended": {
    "model": "anthropic/claude-sonnet-4",
    "provider": "anthropic",
    "estimated_cost": "$0.0034",
    "reasoning": "Best quality-to-cost ratio for high-complexity code generation"
  },
  "alternative": {
    "model": "openai/gpt-4.1",
    "estimated_cost": "$0.0028"
  },
  "budget_option": {
    "model": "google/gemini-2.5-flash",
    "estimated_cost": "$0.0004"
  }
}

Three options: best pick, alternative, and budget. Your agent decides which to use based on the task.
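
One way to act on the three options is a small selector that parses the cost strings and takes the highest-ranked option under a per-call cap. This is a minimal sketch, assuming the response always has the shape shown above and that costs arrive as dollar strings like "$0.0034". The names Recommendation, parseCost, and chooseOption are hypothetical and not part of WhichModel.

```typescript
// Hypothetical types mirroring the sample response above.
interface ModelOption {
  model: string;
  estimated_cost: string; // e.g. "$0.0034"
}

interface Recommendation {
  recommended: ModelOption;
  alternative: ModelOption;
  budget_option: ModelOption;
}

// Convert a dollar string such as "$0.0034" into a number.
function parseCost(cost: string): number {
  const value = Number(cost.replace("$", ""));
  if (Number.isNaN(value)) throw new Error(`Unparseable cost: ${cost}`);
  return value;
}

// Walk the options in preference order and return the first one under the cap.
// Falls back to the budget option if nothing fits.
function chooseOption(rec: Recommendation, maxCostPerCall: number): ModelOption {
  const ranked = [rec.recommended, rec.alternative, rec.budget_option];
  return (
    ranked.find((o) => parseCost(o.estimated_cost) <= maxCostPerCall) ??
    rec.budget_option
  );
}
```

With the sample response, a cap of 0.003 skips the recommended model ($0.0034) and lands on the alternative ($0.0028).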

Adding Budget Caps

Want to enforce spending limits? Add a budget:

// Never spend more than $0.002 per call
const cheap = await pickModel("summarisation", "low", 0.002);

WhichModel finds the best model within your budget. If nothing fits, it tells you.
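
The article does not show what a "nothing fits" response looks like, so the sketch below rests on an assumption: it treats a tool result with isError set (a standard field on MCP tool results), or a payload with no recommended entry, as "no model within budget" and falls back to a default. The resolveModel helper and the trimmed ToolResult shape are hypothetical.

```typescript
// Minimal shape of an MCP tool result as used here (an assumed subset).
interface ToolResult {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}

// Return the recommended model, or `fallback` when the result
// carries no usable recommendation.
function resolveModel(result: ToolResult, fallback: string): string {
  if (result.isError) return fallback;
  const text = result.content.find((c) => c.type === "text")?.text;
  if (!text) return fallback;
  try {
    const payload = JSON.parse(text);
    return payload?.recommended?.model ?? fallback;
  } catch {
    return fallback; // non-JSON text, e.g. a plain explanatory message
  }
}
```

Pinning the fallback to a model you already trust keeps the agent running even when the budget is too tight for any recommendation.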

Comparing at Scale

Before committing to a model for a high-volume pipeline, compare costs:

const comparison = await client.callTool({
  name: "compare_models",
  arguments: {
    models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
    volume: { calls_per_day: 10000, avg_input_tokens: 1000, avg_output_tokens: 500 }
  }
});

This gives you daily and monthly cost projections for each model — no spreadsheet required.
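
For intuition, the projection behind such a comparison is simple arithmetic: tokens per day times price per token. The sketch below assumes per-million-token pricing and a 30-day month. The prices in the usage lines are made-up placeholders, not real quotes, and projectCost is a hypothetical helper, not what compare_models actually runs.

```typescript
interface Volume {
  calls_per_day: number;
  avg_input_tokens: number;
  avg_output_tokens: number;
}

// Prices in USD per one million tokens (placeholder values supplied by the caller).
interface Price {
  input_per_million: number;
  output_per_million: number;
}

// Project daily and monthly spend for one model at a given call volume.
function projectCost(volume: Volume, price: Price, daysPerMonth = 30) {
  const inputTokens = volume.calls_per_day * volume.avg_input_tokens;
  const outputTokens = volume.calls_per_day * volume.avg_output_tokens;
  const daily =
    (inputTokens / 1_000_000) * price.input_per_million +
    (outputTokens / 1_000_000) * price.output_per_million;
  return { daily, monthly: daily * daysPerMonth };
}

// Same volume as the compare_models call above, with made-up prices.
const projection = projectCost(
  { calls_per_day: 10000, avg_input_tokens: 1000, avg_output_tokens: 500 },
  { input_per_million: 2, output_per_million: 8 }
);
console.log(projection); // { daily: 60, monthly: 1800 }
```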

Why Not Just Hardcode?

  • Prices change multiple times per week across providers
  • New models launch constantly — last month alone saw 5 new models that are cheaper than existing options
  • Different tasks need different models — a $15/M-token model is overkill for classification
  • At 10K calls/day, model choice can be a $6,000+/month decision

WhichModel tracks all of this and updates every 4 hours. Your router stays current without code changes.
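
Since the data refreshes every 4 hours, asking for a fresh recommendation on every single call is probably unnecessary. A minimal sketch of a time-based cache follows. The cached wrapper is a hypothetical helper, the 4-hour default simply mirrors the update interval stated above, and the now parameter exists only so the clock can be swapped out in tests.

```typescript
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

// Wrap an async lookup so repeated calls with the same key
// reuse the stored result until it expires.
function cached<T>(
  lookup: (key: string) => Promise<T>,
  ttlMs: number = FOUR_HOURS_MS,
  now: () => number = Date.now
): (key: string) => Promise<T> {
  const entries = new Map<string, { value: T; expires: number }>();
  return async (key) => {
    const hit = entries.get(key);
    if (hit && hit.expires > now()) return hit.value;
    const value = await lookup(key);
    entries.set(key, { value, expires: now() + ttlMs });
    return value;
  };
}
```

In the router, the key could combine task type, complexity, and budget, with the lookup delegating to pickModel.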

Get Started

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

20 lines. Zero maintenance. Always current pricing.
