AI Model Pricing Is a Mess — Here Is How We Track It
There are over 100 LLMs available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.
If you are building with LLMs, you have probably experienced this: you pick a model, hardcode it, ship it, and three months later discover you are paying 10x what a newer model would cost for the same quality.
We built WhichModel to fix this.
The Scale of the Problem
- 10+ providers with different pricing pages, formats, and update cadences
- 100+ models with different input/output/cached token rates
- Capability matrices that change with each model update (vision, tool calling, JSON mode, context windows)
- Quality tiers that do not map cleanly to price — a $0.60/M-token model can outperform a $15/M-token model on specific tasks
Most teams handle this by not handling it. They pick a model, maybe two, and revisit the decision quarterly if ever.
How We Track It
WhichModel scrapes, normalises, and cross-verifies pricing data from every major LLM provider, refreshing every 4 hours.
Multi-Source Verification
We do not trust a single source. Pricing data is cross-checked across provider APIs, documentation pages, and third-party aggregators. If sources disagree, we flag it.
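The cross-check can be sketched as a simple consensus rule. This is an illustrative sketch, not WhichModel's actual implementation — the function name and the majority-vote heuristic are assumptions:

```python
from collections import Counter

def cross_check(prices):
    """Pick the majority price across sources; flag any disagreement.

    `prices` is a list of per-million-token quotes gathered from
    provider APIs, docs pages, and aggregators (hypothetical shape).
    """
    rounded = [round(p, 6) for p in prices]
    consensus, _ = Counter(rounded).most_common(1)[0]
    flagged = len(set(rounded)) > 1  # any source disagreeing triggers review
    return consensus, flagged
```

A real pipeline would also carry source provenance and timestamps, but the core idea is the same: agreement passes, disagreement gets flagged for a human.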
Structured Capability Tracking
For each model we track:
- Input, output, and cached token prices
- Context window size
- Supported features (tool calling, JSON output, streaming, vision)
- Provider and availability
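The fields above map naturally onto a flat record per model. A minimal sketch, assuming a schema like this (field names are illustrative, not WhichModel's real data model):

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    provider: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    cached_price: float  # USD per million cached input tokens
    context_window: int  # tokens
    features: set = field(default_factory=set)  # e.g. {"tools", "vision"}
    available: bool = True
```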
MCP-Native Access
The data is exposed as an MCP server — meaning any AI agent can query it natively. No REST API to learn, no SDK to install:
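The exact config syntax depends on your MCP client, but a typical remote-server entry points at the endpoint like this (key names follow the common `mcpServers` convention; check your client's docs):

```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```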
One line of config. No API key. Real-time pricing data.
Your agent can then ask:
- "What is the cheapest model that supports tool calling with at least 128K context?"
- "Compare Claude Sonnet 4 vs GPT-4.1 for code generation at 10K calls/day"
- "Recommend a model for data extraction under $0.002 per call"
What We Have Learned
1. Price is not correlated with quality for most tasks.
A $0.60/M-token model handles 80% of production tasks as well as a $15/M-token model. The gap matters for the remaining 20%.
2. Pricing changes more than you think.
We see meaningful pricing updates multiple times per week across the ecosystem. What was true last month may not be true today.
3. The "just use the best model" approach is expensive at scale.
At 10K calls/day and roughly 1,500 tokens per call, the difference between a $15/M-token model and a $0.60/M-token model is $216/day — over $6,000 per month.
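The arithmetic behind that figure, assuming an average of 1,500 tokens per call (the volume implied by the $216/day number):

```python
CALLS_PER_DAY = 10_000
TOKENS_PER_CALL = 1_500  # assumed average, input + output combined
tokens_per_day = CALLS_PER_DAY * TOKENS_PER_CALL  # 15M tokens/day

def cost_per_day(price_per_million: float) -> float:
    return tokens_per_day / 1_000_000 * price_per_million

daily_diff = cost_per_day(15.00) - cost_per_day(0.60)
monthly_diff = daily_diff * 30
```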
4. Agents need this data in real time, not in a spreadsheet.
The whole point of autonomous agents is that they make decisions without human intervention — including which model to use.
Try It
WhichModel is open source and free to use.
- MCP Endpoint: https://whichmodel.dev/mcp
- GitHub: Which-Model/whichmodel-mcp
- Website: whichmodel.dev
Built for agents. Updated every 4 hours. MIT licensed.