# How to Add Cost-Aware Model Selection to Your AI Agent
Every AI agent picks a model. Most pick the same one every time — usually the most expensive one. That is a fine default when you are prototyping, but in production it means you are overpaying for simple tasks and underpowering complex ones.
This tutorial shows how to add dynamic, cost-aware model selection to any AI agent using WhichModel, an open MCP server that tracks pricing and capabilities across 100+ LLM models.
## The Problem
LLM pricing changes constantly. New models launch weekly. Picking the right model for each task requires knowing:
- Current prices across providers
- Which models support the capabilities you need (tool calling, JSON output, vision)
- How model quality maps to task complexity
Maintaining this yourself means building a pricing database, keeping it updated, and writing routing logic. Or you can let your agent ask WhichModel.
## Setup: 30 Seconds
Add WhichModel to your MCP client config:
```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```
No API key. No installation. It is a remote MCP server — your agent connects directly.
For stdio-based clients (Claude Desktop, Cursor):
```json
{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}
```
## Using It: Three Patterns
### Pattern 1: Task-Based Routing
Ask WhichModel to recommend a model based on what you are doing:
```
recommend_model(
  task_type: "code_generation",
  complexity: "high",
  estimated_input_tokens: 4000,
  estimated_output_tokens: 2000,
  requirements: { tool_calling: true }
)
```
WhichModel returns a recommended model, a budget alternative, cost estimates, and reasoning for the pick.
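Inside an agent, the recommendation typically feeds a small routing decision. The sketch below assumes a hypothetical response shape (field names like `recommended`, `budget_alternative`, and `estimated_cost` are illustrative; the actual schema may differ):

```python
# Hypothetical recommend_model response -- field names are assumptions
# for this sketch, not the documented WhichModel schema.
sample_response = {
    "recommended": "anthropic/claude-sonnet-4",
    "budget_alternative": "openai/gpt-4.1-mini",
    "estimated_cost": 0.042,
    "reasoning": "High-complexity code generation with tool calling.",
}

def pick_model(response: dict, max_cost: float) -> str:
    """Use the top pick unless it blows the per-call cap, then fall back."""
    if response["estimated_cost"] <= max_cost:
        return response["recommended"]
    return response["budget_alternative"]

print(pick_model(sample_response, max_cost=0.05))  # -> anthropic/claude-sonnet-4
```

The fallback keeps the agent functional even when the preferred model is too expensive for the current call.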
### Pattern 2: Budget Caps
Set a per-call budget and let WhichModel find the best model within it:
```
recommend_model(
  task_type: "summarisation",
  complexity: "low",
  budget_per_call: 0.001
)
```
For a simple summarisation task, you might be paying $0.01 per call with GPT-4 when a $0.0005 call to a smaller model would give you the same result.
### Pattern 3: Volume Cost Projections
Before committing to a model, compare costs at scale:
```
compare_models(
  models: ["anthropic/claude-sonnet-4", "openai/gpt-4.1-mini", "google/gemini-2.5-flash"],
  task_type: "data_extraction",
  volume: {
    calls_per_day: 10000,
    avg_input_tokens: 1000,
    avg_output_tokens: 500
  }
)
```
This gives you daily and monthly cost projections for each model, so you can make informed decisions before scaling.
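The underlying projection arithmetic is straightforward. A minimal sketch, assuming per-million-token input and output prices (the example prices here are illustrative, not real quotes):

```python
def project_costs(price_in_per_m: float, price_out_per_m: float,
                  calls_per_day: int, avg_in: int, avg_out: int):
    """Project daily and 30-day costs from per-million-token prices."""
    daily = calls_per_day * (
        avg_in * price_in_per_m + avg_out * price_out_per_m
    ) / 1_000_000
    return daily, daily * 30

# Illustrative pricing: $3/M input tokens, $15/M output tokens,
# at the volume used in the compare_models call above.
daily, monthly = project_costs(3.0, 15.0, 10_000, 1_000, 500)
print(daily, monthly)  # -> 105.0 3150.0
```

Running this per candidate model reproduces the kind of side-by-side projection `compare_models` returns.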
## Why This Matters
At 10,000 calls per day and roughly 1,500 tokens per call, the difference between a $15/M-token model and a $0.60/M-token model is about $216/day, or over $6,000 per month. For many tasks, the cheaper model produces equivalent results.
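The headline figure is easy to reproduce. This sketch assumes a flat per-token price and roughly 1,500 tokens per call (input and output combined):

```python
# Reproduce the savings estimate: 10,000 calls/day at ~1,500 tokens each.
calls_per_day = 10_000
tokens_per_call = 1_500  # assumed average, input + output combined
tokens_per_day = calls_per_day * tokens_per_call  # 15M tokens/day

daily_diff = tokens_per_day * (15.00 - 0.60) / 1_000_000
monthly_diff = daily_diff * 30
print(round(daily_diff, 2), round(monthly_diff, 2))  # -> 216.0 6480.0
```

At that volume, even a modest per-token price gap compounds into a budget-level decision.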
WhichModel helps your agent make that call automatically, every time, with pricing data that updates every 4 hours.
## Try It
- Remote endpoint: https://whichmodel.dev/mcp
- GitHub: Which-Model/whichmodel-mcp
- Website: whichmodel.dev
WhichModel is open source (MIT). No API key required.