Simon Sharp
How to Add Cost-Aware Model Selection to Your AI Agent

Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.

This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.

The Problem

LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing current prices across providers, which models support the capabilities you need, and how model quality maps to task complexity.

Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.

Setup: 30 Seconds

Add WhichModel to your MCP client config:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

No API key. No installation. It is a remote MCP server — your agent connects directly.

Using It: Three Patterns

Pattern 1: Task-Based Routing

Ask WhichModel to recommend a model based on what you are doing:

recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)

WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
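On the agent side, that response still needs routing logic. Here is a minimal sketch of what consuming it might look like — note the response shape and field names here are illustrative assumptions, not WhichModel's actual schema:

```python
# Hypothetical shape of a recommend_model response — field names
# are illustrative, not the server's real schema.
recommendation = {
    "model": "anthropic/claude-sonnet-4",
    "budget_alternative": "openai/gpt-4.1-mini",
    "estimated_cost_per_call": 0.042,
    "reasoning": "High-complexity code generation with tool calling.",
}

def pick_model(rec: dict, max_cost_per_call: float) -> str:
    """Use the top pick if it fits the budget, else the budget alternative."""
    if rec["estimated_cost_per_call"] <= max_cost_per_call:
        return rec["model"]
    return rec["budget_alternative"]

print(pick_model(recommendation, max_cost_per_call=0.05))  # top pick fits
print(pick_model(recommendation, max_cost_per_call=0.01))  # falls back
```

The point is that the agent keeps the final say: WhichModel supplies the data and reasoning, and a few lines of glue code enforce your own cost policy.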

Pattern 2: Budget Caps

Set a per-call budget and let WhichModel find the best model within it:

recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)
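To make "best model within a budget" concrete, here is a local sketch of the selection logic — the prices are made-up $/1M-token figures and the model names are placeholders, with price used as a crude capability proxy:

```python
# Made-up prices: (input_per_M, output_per_M) in USD per 1M tokens.
PRICES = {
    "big-model": (15.00, 75.00),
    "mid-model": (0.40, 1.60),
    "small-model": (0.10, 0.40),
}

def cost_per_call(model: str, in_tokens: int, out_tokens: int) -> float:
    """Estimated cost of one call at the given token counts."""
    in_p, out_p = PRICES[model]
    return (in_tokens * in_p + out_tokens * out_p) / 1_000_000

def best_within_budget(budget: float, in_tokens: int, out_tokens: int):
    """Most capable model whose per-call cost fits the budget.

    Price is used here as a rough stand-in for capability;
    returns None if nothing fits.
    """
    candidates = [m for m in PRICES
                  if cost_per_call(m, in_tokens, out_tokens) <= budget]
    return max(candidates, default=None,
               key=lambda m: cost_per_call(m, in_tokens, out_tokens))

print(best_within_budget(0.001, in_tokens=500, out_tokens=200))  # mid-model
```

WhichModel does this over live pricing for 100+ models; the sketch just shows the shape of the decision.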

Pattern 3: Volume Cost Projections

Before committing to a model, compare costs at scale:

compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)
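The arithmetic behind these projections is simple enough to sanity-check yourself. A back-of-envelope version, assuming a flat per-M-token price for simplicity (real pricing splits input and output rates):

```python
def daily_cost(price_per_M: float, calls_per_day: int = 10_000,
               in_tokens: int = 1_000, out_tokens: int = 500) -> float:
    """Daily spend in USD at a flat per-1M-token price."""
    tokens_per_day = calls_per_day * (in_tokens + out_tokens)  # 15M here
    return tokens_per_day * price_per_M / 1_000_000

expensive = daily_cost(15.00)  # $225.00/day
budget = daily_cost(0.60)      # $9.00/day
print(f"difference: ${expensive - budget:.2f}/day")  # difference: $216.00/day
```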

Why This Matters

At 10,000 calls per day with the token volumes above (15M tokens/day), the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month. WhichModel helps your agent make that call automatically, with pricing data that updates every 4 hours.

Try It

WhichModel is open source (MIT). No API key required.
