Build a Model Router in 20 Lines with WhichModel
You have an AI agent that calls LLMs. It always uses the same model. You want it to pick the right model for each task — optimising for cost, capability, and quality — without maintaining a pricing database yourself.
Here is how to build a model router in 20 lines using WhichModel and the MCP TypeScript SDK.
The Code
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const client = new Client({ name: "router", version: "1.0" });
await client.connect(
new StreamableHTTPClientTransport(new URL("https://whichmodel.dev/mcp"))
);
async function pickModel(taskType: string, complexity: string, budget?: number) {
const result = await client.callTool({
name: "recommend_model",
arguments: {
task_type: taskType,
complexity,
...(budget !== undefined && { budget_per_call: budget }), // truthiness check would drop a budget of 0
},
});
return JSON.parse(result.content[0].text);
}
// Use it
const rec = await pickModel("code_generation", "high", 0.01);
console.log(rec.recommended.model); // e.g. "anthropic/claude-sonnet-4"
console.log(rec.budget_option.model); // e.g. "google/gemini-2.5-flash"
console.log(rec.estimated_cost); // e.g. "$0.0034"
That is it. Your agent now picks the optimal model for every call based on live pricing data.
What You Get Back
The recommend_model tool returns:
{
"recommended": {
"model": "anthropic/claude-sonnet-4",
"provider": "anthropic",
"estimated_cost": "$0.0034",
"reasoning": "Best quality-to-cost ratio for high-complexity code generation"
},
"alternative": {
"model": "openai/gpt-4.1",
"estimated_cost": "$0.0028"
},
"budget_option": {
"model": "google/gemini-2.5-flash",
"estimated_cost": "$0.0004"
}
}
Three options: best pick, alternative, and budget. Your agent decides which to use based on the task.
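With those three options in hand, a small helper can pick among them under a per-call cost cap. A minimal sketch, assuming the field names and "$0.0034"-style cost strings shown in the response above (chooseModel and parseCost are illustrative names, not part of the WhichModel API):

```typescript
type Option = { model: string; estimated_cost: string };
type Recommendation = {
  recommended: Option;
  alternative?: Option;
  budget_option?: Option;
};

// Assumes "$0.0034"-style cost strings, as in the sample response.
function parseCost(s: string): number {
  return parseFloat(s.replace("$", ""));
}

// Prefer the top recommendation, but fall through to cheaper options
// when its estimated cost exceeds the caller's per-call cap.
function chooseModel(rec: Recommendation, maxCostPerCall: number): string {
  const options = [rec.recommended, rec.alternative, rec.budget_option]
    .filter((o): o is Option => o !== undefined);
  const affordable = options.find(
    (o) => parseCost(o.estimated_cost) <= maxCostPerCall
  );
  // If nothing is affordable, fall back to the cheapest listed option.
  return (affordable ?? options[options.length - 1]).model;
}
```

With the sample response above, a $0.003 cap would select the alternative and a $0.001 cap would fall through to the budget option.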
Adding Budget Caps
Want to enforce spending limits? Add a budget:
// Never spend more than $0.002 per call
const cheap = await pickModel("summarisation", "low", 0.002);
WhichModel finds the best model within your budget. If nothing fits, it tells you.
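How "it tells you" is surfaced isn't shown here, so a defensive caller can treat a missing recommended entry as the no-fit signal and fall back to a known-cheap default. A hedged sketch: the guard condition and the modelOrFallback helper are assumptions, not the documented payload.

```typescript
// Assumption: when no model fits the budget, the parsed result lacks
// a `recommended` entry. Adjust the guard to match the real payload.
function modelOrFallback(
  rec: { recommended?: { model: string } },
  fallback: string
): string {
  return rec.recommended?.model ?? fallback;
}
```

Used with pickModel from the router code above, this might look like: `modelOrFallback(await pickModel("summarisation", "low", 0.002), "google/gemini-2.5-flash")`.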
Comparing at Scale
Before committing to a model for a high-volume pipeline, compare costs:
const comparison = await client.callTool({
name: "compare_models",
arguments: {
models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
volume: { calls_per_day: 10000, avg_input_tokens: 1000, avg_output_tokens: 500 }
}
});
This gives you daily and monthly cost projections for each model — no spreadsheet required.
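The underlying arithmetic is easy to sanity-check by hand. A sketch with hypothetical per-million-token prices (real prices come from the tool's live data; projectCost is an illustrative helper, not part of the API):

```typescript
type Price = { inputPerM: number; outputPerM: number }; // USD per 1M tokens
type Volume = {
  callsPerDay: number;
  avgInputTokens: number;
  avgOutputTokens: number;
};

// Per-call cost = tokens x per-token price; scale up to daily and monthly.
function projectCost(price: Price, v: Volume) {
  const perCall =
    (v.avgInputTokens * price.inputPerM +
      v.avgOutputTokens * price.outputPerM) /
    1_000_000;
  const daily = perCall * v.callsPerDay;
  return { perCall, daily, monthly: daily * 30 };
}
```

At a hypothetical $3/M input and $15/M output, the volume above works out to $0.0105 per call, $105/day, and $3,150 per 30-day month.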
Why Not Just Hardcode?
- Prices change multiple times per week across providers
- New models launch constantly — last month alone saw five new models priced below existing options
- Different tasks need different models — a $15/M-token model is overkill for classification
- At 10K calls/day, model choice is a $6,000+/month decision
WhichModel tracks all of this and updates every 4 hours. Your router stays current without code changes.
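Because pricing only updates every 4 hours, the router call itself can be cached rather than repeated on every LLM request. A sketch of a TTL memo keyed by task parameters (the cache helper is an assumption, not part of the SDK; pickModel is the function defined earlier):

```typescript
// TTL mirrors WhichModel's stated 4-hour pricing-update cadence.
const TTL_MS = 4 * 60 * 60 * 1000;
const cache = new Map<string, { value: unknown; at: number }>();

// Return the cached value while it is fresh; otherwise re-fetch and store.
async function cachedPick<T>(
  key: string,
  fetch: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.value as T;
  const value = await fetch();
  cache.set(key, { value, at: Date.now() });
  return value;
}
```

Usage: `const rec = await cachedPick("code_generation:high:0.01", () => pickModel("code_generation", "high", 0.01));` — identical task parameters within the TTL window reuse the stored recommendation.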
Get Started
{
"mcpServers": {
"whichmodel": {
"url": "https://whichmodel.dev/mcp"
}
}
}
- GitHub: Which-Model/whichmodel-mcp
- Website: whichmodel.dev
- License: MIT — free to use, no API key required
20 lines. Zero maintenance. Always current pricing.