I Compared 4,000+ AI API Prices So You Don't Have To

If you're building with LLMs, API costs can quietly kill your margin. The official pricing pages look simple, but the cheapest option for a given model is rarely the official provider.

I spent the last few months building a scraper that tracks prices across 70+ providers for 4,000+ model-provider combinations, updated daily. Here's what I found.

Current Cheapest Prices Per Model Family

The lowest price available right now for each family's latest flagship model:

Model	Cheapest input price (USD per 1M tokens)
Llama 4 Maverick	$0.028
DeepSeek V4 Pro	$0.060
GLM-5.2	$0.126
Claude Opus 4.8	$0.138
Grok 4.3	$0.172
Qwen3.7 Max	$0.189
GPT-5.5	$0.207
Gemini 3.1 Pro	$0.276
Mistral Large 3	$0.276
Command A	$2.50
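
To put the per-token prices in context, here is a minimal sketch that turns a price per 1M input tokens into a monthly input bill. The prices are copied from the table above. The workload of 500M input tokens per month is a made-up assumption, and output-token pricing, which the table does not cover, is ignored.

```python
from decimal import Decimal

# Cheapest input prices in USD per 1M tokens, copied from the table above.
PRICE_PER_MILLION = {
    "Llama 4 Maverick": Decimal("0.028"),
    "DeepSeek V4 Pro": Decimal("0.060"),
    "GPT-5.5": Decimal("0.207"),
    "Command A": Decimal("2.50"),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def input_cost(model: str, input_tokens: int) -> Decimal:
    """Return the input-token cost in USD for one model."""
    return PRICE_PER_MILLION[model] * Decimal(input_tokens) / TOKENS_PER_MILLION


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload, not measured data
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${input_cost(model, monthly_tokens):.2f} per month")
```

Decimal is used instead of float so that small per-token prices do not pick up rounding noise when multiplied by large token counts.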

Why Prices Vary So Much

There's a whole category of providers called API relay services. They buy API access in bulk and pass the savings on to developers. These are especially common for:

OpenAI models: relay providers often undercut official pricing by 50–80%
Claude: relay providers can be 3–5x cheaper than Anthropic's official API (roughly 67–80% off)
Gemini: relay providers often offer better rate limits than the free tier

The catch: you're trusting a third party with your API calls. For production, always evaluate reliability. For development and testing, the cost savings are real.
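
The discounts above are quoted two ways, as a percentage off and as an "N times cheaper" multiple, which makes them hard to compare at a glance. A small sketch of the conversion between the two, with hypothetical helper names chosen just for this example:

```python
def multiple_to_discount(times_cheaper: float) -> float:
    """Convert 'N times cheaper' into a fractional discount off list price."""
    if times_cheaper < 1:
        raise ValueError("times_cheaper must be >= 1")
    return 1 - 1 / times_cheaper


def discount_to_multiple(discount: float) -> float:
    """Convert a fractional discount (0.5 means 50% off) into 'N times cheaper'."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 / (1 - discount)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n}x cheaper = {multiple_to_discount(n):.0%} off")
    for d in (0.5, 0.8):
        print(f"{d:.0%} off = {discount_to_multiple(d):.1f}x cheaper")
```

By this arithmetic, 50% off is the same as 2x cheaper, and 80% off is the same as 5x cheaper.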

Key Findings

1. The cheapest option is almost never the official provider
For every major model except open-source ones, at least one relay provider offers a discount of 50% or more versus official pricing.

2. DeepSeek is genuinely cheap everywhere
DeepSeek V4 Pro delivers strong performance at a fraction of GPT-4o pricing, regardless of which provider you use.

3. Qwen and GLM have the deepest discounts
These Alibaba and Zhipu models go below $0.20 per 1M input tokens on relay providers. If latency to Chinese servers is acceptable, the savings are significant.

4. Open-source models beat most official "lite" tiers
Llama 4 Maverick at $0.028 per 1M input tokens is cheaper than official mini/flash model pricing from most major labs.

5. "Flash" models aren't always the cheapest
Gemini Flash is cheap, but Pro-tier models on relay providers can sometimes undercut official Flash pricing.

How I Built the Tracker

The scraper hits each provider's pricing endpoint or page daily at 09:00 UTC. The main engineering challenges:

Canonical name matching: the same model is listed under dozens of different names across providers (claude-opus-4-8, claude/opus-4.8, anthropic.claude-opus-4-8, etc.). I built a normalization layer to map everything to a canonical name.
Automatic flagship selection: for each model family, I parse version numbers from canonical names and pick the highest-scoring model automatically. When GPT-6 ships and gets scraped, it surfaces without any code changes.
Deprecation handling: models get marked deprecated when providers drop them, so stale data doesn't pollute the comparison.
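
The three pieces above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tracker's actual code: the vendor prefix list, the `Listing` record, and the rule of using the parsed version tuple as the "score" are all hypothetical stand-ins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical vendor prefixes that some providers prepend to model IDs.
VENDOR_PREFIXES = ("anthropic", "openai", "google", "meta")


def canonical_name(raw: str) -> str:
    """Map a provider-specific model ID to one canonical slug."""
    name = raw.strip().lower()
    # Treat '/', '.', '_', ':' and whitespace as the same separator.
    name = re.sub(r"[/._:\s]+", "-", name)
    name = re.sub(r"-+", "-", name).strip("-")
    for prefix in VENDOR_PREFIXES:
        if name.startswith(prefix + "-"):
            name = name[len(prefix) + 1:]
            break
    return name


def version_key(canonical: str) -> tuple[int, ...]:
    """Parse every number in the slug, so 'gpt-6' sorts above 'gpt-5-5'."""
    return tuple(int(n) for n in re.findall(r"\d+", canonical))


@dataclass
class Listing:
    raw_name: str
    provider: str
    input_price: float  # USD per 1M input tokens
    deprecated: bool = False  # set when a provider drops the model


def flagship(listings: list[Listing], family: str) -> str | None:
    """Pick the highest-versioned, non-deprecated canonical name in a family."""
    names = {canonical_name(l.raw_name) for l in listings if not l.deprecated}
    candidates = [n for n in names if n.startswith(family + "-")]
    return max(candidates, key=version_key, default=None)


def cheapest(listings: list[Listing], canonical: str) -> Listing | None:
    """Return the lowest-priced live listing for one canonical model."""
    matches = [
        l for l in listings
        if not l.deprecated and canonical_name(l.raw_name) == canonical
    ]
    return min(matches, key=lambda l: l.input_price, default=None)


if __name__ == "__main__":
    # Made-up listings for illustration only; not scraped data.
    demo = [
        Listing("claude-opus-4-8", "provider-a", 0.30),
        Listing("claude/opus-4.8", "provider-b", 0.14),
        Listing("anthropic.claude-opus-4-8", "provider-c", 0.20),
        Listing("claude-opus-4-1", "provider-d", 0.05),
    ]
    top = flagship(demo, "claude-opus")
    print(top, cheapest(demo, top))
```

Because the flagship is derived from the parsed version rather than a hard-coded list, a newly scraped higher version wins the `max` without any code change. A real normalizer would likely need a per-provider alias table as well, since regex rules alone will not catch names that differ by more than separators and prefixes.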