# How to Add Cost-Aware Model Selection to Your AI Agent
Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.
This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLM models.
## The Problem
LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing:
- Current prices across providers
- Which models support the capabilities you need (tool calling, JSON output, vision)
- How model quality maps to task complexity
Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.
## Setup: 30 Seconds
Add WhichModel to your MCP client config:
```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```
No API key. No installation. It is a remote MCP server — your agent connects directly.
For stdio-based clients (Claude Desktop, Cursor):
```json
{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}
```
## Using It: Three Patterns
### Pattern 1: Task-Based Routing
Ask WhichModel to recommend a model based on what you are doing:
```
recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)
```
WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
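Inside an agent, the recommendation typically feeds a small routing decision. The sketch below assumes a hypothetical response shape (field names like `recommended`, `budget_alternative`, and `estimated_cost` are illustrative; the actual schema may differ):

```python
# Hypothetical recommend_model response -- field names are assumptions
# for this sketch, not the documented WhichModel schema.
sample_response = {
    "recommended": "anthropic/claude-sonnet-4",
    "budget_alternative": "openai/gpt-4.1-mini",
    "estimated_cost": 0.042,
    "reasoning": "High-complexity code generation with tool calling.",
}

def pick_model(response: dict, max_cost: float) -> str:
    """Use the top pick unless it blows the per-call cap, then fall back."""
    if response["estimated_cost"] <= max_cost:
        return response["recommended"]
    return response["budget_alternative"]

print(pick_model(sample_response, max_cost=0.05))  # -> anthropic/claude-sonnet-4
```

The fallback keeps the agent functional even when the preferred model is too expensive for the current call.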
### Pattern 2: Budget Caps
Set a per-call budget and let WhichModel find the best model within it:
```
recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)
```
For a simple summarisation task, you might be paying $0.01 per call with GPT-4 when a $0.0005 call to a smaller model would give you the same result.
### Pattern 3: Volume Cost Projections
Before committing to a model, compare costs at scale:
```
compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  task_type: "data_extraction",
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)
```
This gives you daily and monthly cost projections for each model, so you can make informed decisions before scaling.
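The underlying projection arithmetic is straightforward. A minimal sketch, assuming per-million-token input and output prices (the example prices here are illustrative, not real quotes):

```python
def project_costs(price_in_per_m: float, price_out_per_m: float,
                  calls_per_day: int, avg_in: int, avg_out: int):
    """Project daily and 30-day costs from per-million-token prices."""
    daily = calls_per_day * (
        avg_in * price_in_per_m + avg_out * price_out_per_m
    ) / 1_000_000
    return daily, daily * 30

# Illustrative pricing: $3/M input tokens, $15/M output tokens,
# at the volume used in the compare_models call above.
daily, monthly = project_costs(3.0, 15.0, 10_000, 1_000, 500)
print(daily, monthly)  # -> 105.0 3150.0
```

Running this per candidate model reproduces the kind of side-by-side projection `compare_models` returns.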
## Why This Matters
At 10,000 calls per day and roughly 1,500 tokens per call, the difference between a $15/M-token model and a $0.60/M-token model is about $216/day, or over $6,000 per month. For many tasks, the cheaper model produces equivalent results.
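The headline figure is easy to reproduce. This sketch assumes a flat per-token price and roughly 1,500 tokens per call (input and output combined):

```python
# Reproduce the savings estimate: 10,000 calls/day at ~1,500 tokens each.
calls_per_day = 10_000
tokens_per_call = 1_500  # assumed average, input + output combined
tokens_per_day = calls_per_day * tokens_per_call  # 15M tokens/day

daily_diff = tokens_per_day * (15.00 - 0.60) / 1_000_000
monthly_diff = daily_diff * 30
print(round(daily_diff, 2), round(monthly_diff, 2))  # -> 216.0 6480.0
```

At that volume, even a modest per-token price gap compounds into a budget-level decision.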
WhichModel helps your agent make that call automatically, every time, with pricing data that updates every 4 hours.
## Try It
- Remote endpoint: https://whichmodel.dev/mcp
- GitHub: Which-Model/whichmodel-mcp
- Website: whichmodel.dev
WhichModel is open source (MIT). No API key required.