DEV Community

Simon Sharp

How to Add Cost-Aware Model Selection to Your AI Agent

Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.

This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities for more than 100 LLMs.

The Problem

LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing:

  • Current prices across providers
  • Which models support the capabilities you need (tool calling, JSON output, vision)
  • How model quality maps to task complexity

Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.
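For a sense of what "maintaining this yourself" involves, here is a minimal sketch of a hand-rolled router. Every model name and price in it is an illustrative placeholder that you would have to keep current by hand:

```python
# A hand-rolled router: a pricing table plus a routing rule, both of
# which you must update yourself as models and prices change.
# All model names and prices here are illustrative placeholders.
PRICING = {
    "big-model":   {"input_per_m": 15.00, "output_per_m": 75.00},
    "small-model": {"input_per_m": 0.60,  "output_per_m": 2.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one call, given per-million-token prices."""
    p = PRICING[model]
    return (input_tokens * p["input_per_m"]
            + output_tokens * p["output_per_m"]) / 1_000_000

def pick_model(complexity: str) -> str:
    """A crude routing rule: frontier model only for high complexity."""
    return "big-model" if complexity == "high" else "small-model"
```

Multiply that by every new model launch and every price change, and the appeal of asking a live service instead becomes clear.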

Setup: 30 Seconds

Add WhichModel to your MCP client config:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

No API key. No installation. It is a remote MCP server — your agent connects directly.

For stdio-based clients (Claude Desktop, Cursor):

{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}

Using It: Three Patterns

Pattern 1: Task-Based Routing

Ask WhichModel to recommend a model based on what you are doing:

recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)

WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
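In agent code, that response might be consumed along these lines. The field names below are a guess at the shape based on the description above, not the server's actual schema:

```python
# Hypothetical response shape; the real field names may differ.
recommendation = {
    "recommended": "anthropic/claude-sonnet-4",
    "budget_alternative": "openai/gpt-4.1-mini",
    "estimated_cost_usd": 0.042,
    "reasoning": "High-complexity code generation with tool calling required.",
}

def choose_model(rec: dict, budget_usd: float) -> str:
    """Take the top pick if it fits the budget, else the budget alternative."""
    if rec["estimated_cost_usd"] <= budget_usd:
        return rec["recommended"]
    return rec["budget_alternative"]
```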

Pattern 2: Budget Caps

Set a per-call budget and let WhichModel find the best model within it:

recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)

For a simple summarisation task, you might be paying $0.01 per call with GPT-4 when a $0.0005 call to a smaller model would give you the same result.
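That gap compounds quickly at volume. A quick sketch with those two per-call prices and an assumed 10,000 calls per day:

```python
expensive_call = 0.01   # per-call cost with a frontier model
cheap_call = 0.0005     # per-call cost with a smaller model
calls_per_day = 10_000  # assumed volume, for illustration

ratio = expensive_call / cheap_call
daily_saving = (expensive_call - cheap_call) * calls_per_day
print(f"{ratio:.0f}x cheaper, saving ${daily_saving:.0f}/day")
```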

Pattern 3: Volume Cost Projections

Before committing to a model, compare costs at scale:

compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  task_type: "data_extraction",
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)

This gives you daily and monthly cost projections for each model, so you can make informed decisions before scaling.

Why This Matters

At 10,000 calls per day, the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month. For many tasks, the cheaper model produces equivalent results.
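That figure checks out by hand, assuming a flat per-token price and the same volume as the comparison example earlier:

```python
calls_per_day = 10_000
tokens_per_call = 1_000 + 500  # avg input + avg output tokens
tokens_per_day = calls_per_day * tokens_per_call  # 15M tokens/day

price_gap_per_m = 15.00 - 0.60  # $/M-token gap between the two models
daily_difference = tokens_per_day / 1_000_000 * price_gap_per_m
monthly_difference = daily_difference * 30
print(f"${daily_difference:.0f}/day, ${monthly_difference:,.0f}/month")
```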

WhichModel helps your agent make that call automatically, every time, with pricing data that updates every 4 hours.

Try It

WhichModel is open source (MIT). No API key required.
