How to Add Cost-Aware Model Selection to Your AI Agent
Every AI agent picks a model. Most pick the same one every time, usually the most expensive one. That is a fine default when you are prototyping, but in production a single fixed choice means you overpay for simple tasks or, if you standardize on a cheaper model instead, underpower complex ones.
This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.
The Problem
LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing current prices across providers, which models support the capabilities you need, and how model quality maps to task complexity.
Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.
Setup: 30 Seconds
Add WhichModel to your MCP client config:
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
No API key. No installation. It is a remote MCP server — your agent connects directly.
Using It: Three Patterns
Pattern 1: Task-Based Routing
Ask WhichModel to recommend a model based on what you are doing:
recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)
WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
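If you are curious what that call looks like on the wire, here is a minimal sketch. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`; the helper name `build_tool_call` is hypothetical, and the argument names are simply copied from the example above rather than checked against WhichModel's schema. In practice an MCP client library would also handle the initialization handshake and transport for you.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool.

    Hypothetical helper for illustration; a real MCP client does this internally.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Arguments mirror the Pattern 1 example in this article.
request = build_tool_call(
    1,
    "recommend_model",
    {
        "task_type": "code_generation",
        "complexity": "high",
        "estimated_input_tokens": 4000,
        "estimated_output_tokens": 2000,
        "requirements": {"tool_calling": True},
    },
)

print(json.dumps(request, indent=2))
```

The payload is plain JSON, so any language with an MCP client (or an HTTP client that speaks the MCP transport) can send the same request.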
Pattern 2: Budget Caps
Set a per-call budget and let WhichModel find the best model within it:
recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)
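To make the idea of a per-call budget concrete, here is a rough local sketch of the kind of filtering a budget cap implies. It is not WhichModel's actual algorithm. The model names, prices, and quality ranks are made-up placeholders, the token counts are assumed, and the budget is treated as US dollars, which the article does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    input_price_per_m: float   # USD per million input tokens (placeholder value)
    output_price_per_m: float  # USD per million output tokens (placeholder value)
    quality: int               # made-up rank, higher is better


def cost_per_call(c: Candidate, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one call at the candidate's per-million-token prices."""
    return (
        input_tokens * c.input_price_per_m + output_tokens * c.output_price_per_m
    ) / 1_000_000


def pick_within_budget(
    candidates: list[Candidate],
    input_tokens: int,
    output_tokens: int,
    budget_per_call: float,
) -> Optional[Candidate]:
    """Return the highest-quality candidate whose estimated cost fits the budget."""
    affordable = [
        c
        for c in candidates
        if cost_per_call(c, input_tokens, output_tokens) <= budget_per_call
    ]
    return max(affordable, key=lambda c: c.quality, default=None)


# Hypothetical catalogue; none of these are real models or real prices.
CANDIDATES = [
    Candidate("example/large", 3.00, 15.00, quality=3),
    Candidate("example/medium", 0.40, 1.60, quality=2),
    Candidate("example/small", 0.10, 0.40, quality=1),
]

# Assumed workload: 1,000 input and 500 output tokens per call.
choice = pick_within_budget(CANDIDATES, 1000, 500, budget_per_call=0.001)
print(choice.name if choice else "no model fits the budget")
```

Returning `None` when nothing fits keeps the failure explicit, so the caller can decide whether to raise the budget or shorten the prompt.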
Pattern 3: Volume Cost Projections
Before committing to a model, compare costs at scale:
compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)
Why This Matters
At 10,000 calls per day and roughly 1,500 tokens per call (the averages from Pattern 3), the difference between a $15/M-token model and a $0.60/M-token model is $216/day, or over $6,000 per month. WhichModel helps your agent make that call automatically, with pricing data that updates every 4 hours.
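The $216/day figure can be reproduced with a few lines of arithmetic. This sketch assumes each call uses about 1,500 tokens (the input and output averages from Pattern 3) and that each model has a single blended price per million tokens; the article does not state either assumption, and `daily_cost` is a hypothetical helper.

```python
def daily_cost(calls_per_day: int, tokens_per_call: int, price_per_million_usd: float) -> float:
    """Daily spend in USD at a single blended per-million-token price."""
    tokens_per_day = calls_per_day * tokens_per_call
    return tokens_per_day / 1_000_000 * price_per_million_usd


CALLS_PER_DAY = 10_000
TOKENS_PER_CALL = 1_000 + 500  # assumption: Pattern 3's average input + output tokens

expensive = daily_cost(CALLS_PER_DAY, TOKENS_PER_CALL, 15.00)
cheap = daily_cost(CALLS_PER_DAY, TOKENS_PER_CALL, 0.60)

daily_gap = expensive - cheap
monthly_gap = daily_gap * 30  # assumption: a 30-day month

print(f"Daily difference:   ${daily_gap:,.2f}")
print(f"Monthly difference: ${monthly_gap:,.2f}")
```

Real providers usually price input and output tokens differently, so treat the blended rate as a simplification for back-of-the-envelope estimates.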
Try It
- Remote endpoint: https://whichmodel.dev/mcp
- GitHub: Which-Model/whichmodel-mcp
- Website: whichmodel.dev
WhichModel is open source (MIT). No API key required.