Simon Sharp
How to Add Cost-Aware Model Selection to Your AI Agent

Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.

This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLMs.

The Problem

LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing current prices across providers, which models support the capabilities you need, and how model quality maps to task complexity.

Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.

Setup: 30 Seconds

Add WhichModel to your MCP client config:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

No API key. No installation. It is a remote MCP server — your agent connects directly.

Using It: Three Patterns

Pattern 1: Task-Based Routing

Ask WhichModel to recommend a model based on what you are doing:

recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)

WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
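On the agent side, that response still needs routing logic. Here is a minimal sketch of what consuming it might look like — note the response shape and field names here are illustrative assumptions, not WhichModel's actual schema:

```python
# Hypothetical shape of a recommend_model response — field names
# are illustrative, not the server's real schema.
recommendation = {
    "model": "anthropic/claude-sonnet-4",
    "budget_alternative": "openai/gpt-4.1-mini",
    "estimated_cost_per_call": 0.042,
    "reasoning": "High-complexity code generation with tool calling.",
}

def pick_model(rec: dict, max_cost_per_call: float) -> str:
    """Use the top pick if it fits the budget, else the budget alternative."""
    if rec["estimated_cost_per_call"] <= max_cost_per_call:
        return rec["model"]
    return rec["budget_alternative"]

print(pick_model(recommendation, max_cost_per_call=0.05))  # top pick fits
print(pick_model(recommendation, max_cost_per_call=0.01))  # falls back
```

The point is that the agent keeps the final say: WhichModel supplies the data and reasoning, and a few lines of glue code enforce your own cost policy.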

Pattern 2: Budget Caps

Set a per-call budget and let WhichModel find the best model within it:

recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)
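To make "best model within a budget" concrete, here is a local sketch of the selection logic — the prices are made-up $/1M-token figures and the model names are placeholders, with price used as a crude capability proxy:

```python
# Made-up prices: (input_per_M, output_per_M) in USD per 1M tokens.
PRICES = {
    "big-model": (15.00, 75.00),
    "mid-model": (0.40, 1.60),
    "small-model": (0.10, 0.40),
}

def cost_per_call(model: str, in_tokens: int, out_tokens: int) -> float:
    """Estimated cost of one call at the given token counts."""
    in_p, out_p = PRICES[model]
    return (in_tokens * in_p + out_tokens * out_p) / 1_000_000

def best_within_budget(budget: float, in_tokens: int, out_tokens: int):
    """Most capable model whose per-call cost fits the budget.

    Price is used here as a rough stand-in for capability;
    returns None if nothing fits.
    """
    candidates = [m for m in PRICES
                  if cost_per_call(m, in_tokens, out_tokens) <= budget]
    return max(candidates, default=None,
               key=lambda m: cost_per_call(m, in_tokens, out_tokens))

print(best_within_budget(0.001, in_tokens=500, out_tokens=200))  # mid-model
```

WhichModel does this over live pricing for 100+ models; the sketch just shows the shape of the decision.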

Pattern 3: Volume Cost Projections

Before committing to a model, compare costs at scale:

compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)
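The arithmetic behind these projections is simple enough to sanity-check yourself. A back-of-envelope version, assuming a flat per-M-token price for simplicity (real pricing splits input and output rates):

```python
def daily_cost(price_per_M: float, calls_per_day: int = 10_000,
               in_tokens: int = 1_000, out_tokens: int = 500) -> float:
    """Daily spend in USD at a flat per-1M-token price."""
    tokens_per_day = calls_per_day * (in_tokens + out_tokens)  # 15M here
    return tokens_per_day * price_per_M / 1_000_000

expensive = daily_cost(15.00)  # $225.00/day
budget = daily_cost(0.60)      # $9.00/day
print(f"difference: ${expensive - budget:.2f}/day")  # difference: $216.00/day
```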

Why This Matters

At 10,000 calls per day with the token volumes above (15M tokens/day), the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month. WhichModel helps your agent make that call automatically, with pricing data that updates every 4 hours.

Try It

WhichModel is open source (MIT). No API key required.
