DEV Community

Sage

Tracking LLM Pricing Monthly: An Open Dataset for 22 AI Models

I run a site that reviews AI tools for small business owners. One question keeps coming up: "How much does this actually cost?"

The problem is that AI model pricing changes constantly: OpenAI drops prices, Google launches new tiers, open-source models get cheaper on hosted APIs. There's no single place that tracks these changes month over month in a structured way.

So I built one.

What it is

An open dataset tracking pricing for 22 AI/LLM models across four categories:

  • Frontier (GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro)
  • Efficiency (GPT-4o Mini, Gemini 2.0 Flash, Mistral Small)
  • Reasoning (o3 Mini, DeepSeek R1, Gemini 2.5 Flash Thinking)
  • Open Source (Llama 4 Scout, Llama 3.3 70B, Qwen)

Every model includes price per 1M tokens (prompt, completion, and blended), context window size, and provider info.
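A record in roughly this shape would capture everything listed above. The field names and prices here are illustrative, not the repo's actual schema — check the data files for the real layout:

```json
{
  "model": "openai/gpt-4o",
  "category": "frontier",
  "provider": "OpenAI",
  "prompt_per_m_tokens": 2.50,
  "completion_per_m_tokens": 10.00,
  "blended_per_m_tokens": 4.38,
  "context_window": 128000
}
```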

Two composite indices

AI CPI (Cost Pressure Index): Weighted average cost across all tracked models. Shows whether the market is getting cheaper or more expensive overall.

Budget Index: Ratio of efficiency-tier pricing to frontier-tier pricing. The lower this number, the bigger the savings you get from choosing a smaller model.
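The two indices can be sketched in a few lines. This is a minimal illustration under my own assumptions — the 3:1 prompt:completion weighting for the blended price and the plain averages are mine, not necessarily what the repo uses:

```python
# Minimal sketch of the two indices. Assumptions (not from the repo):
# blended price weights prompt:completion tokens 3:1, and each index
# uses a plain average over the models in a tier.

def blended_price(prompt_per_m: float, completion_per_m: float) -> float:
    """Blended $/1M tokens, assuming a 3:1 prompt:completion token mix."""
    return 0.75 * prompt_per_m + 0.25 * completion_per_m

def ai_cpi(blended_prices: list[float]) -> float:
    """Cost Pressure Index: average blended cost across tracked models."""
    return sum(blended_prices) / len(blended_prices)

def budget_index(efficiency: list[float], frontier: list[float]) -> float:
    """Ratio of efficiency-tier to frontier-tier average blended pricing.
    Lower means bigger savings from choosing a smaller model."""
    return ai_cpi(efficiency) / ai_cpi(frontier)
```

For example, if the efficiency tier averages $0.30/1M blended and the frontier tier averages $6.00/1M, the Budget Index is 0.05 — efficiency models cost 5% of frontier models.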

How it works

Data is pulled from the OpenRouter API on the 1st of each month. OpenRouter aggregates pricing across providers, so it gives a standardized view of what each model actually costs through a single API.

The dataset auto-updates monthly via cron. Historical snapshots are preserved so you can track trends over time.
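The monthly pull can be sketched like this. It's not the repo's actual script: the tracked model IDs are placeholders, and while OpenRouter's `/api/v1/models` endpoint returns per-token prices as strings at the time of writing, the field names should be verified against its current docs:

```python
# Sketch of a monthly pricing pull from OpenRouter. Hypothetical model
# IDs; field names ("data", "pricing", "context_length") match the API
# at the time of writing but should be checked against its docs.
import json
import urllib.request

OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"
TRACKED = {"openai/gpt-4o", "anthropic/claude-sonnet-4"}  # placeholders

def to_per_million(payload: dict, tracked: set[str]) -> list[dict]:
    """Convert OpenRouter's $/token price strings into $/1M-token records."""
    rows = []
    for model in payload.get("data", []):
        if model.get("id") not in tracked:
            continue
        pricing = model.get("pricing", {})
        rows.append({
            "id": model["id"],
            "prompt_per_m": float(pricing["prompt"]) * 1_000_000,
            "completion_per_m": float(pricing["completion"]) * 1_000_000,
            "context_window": model.get("context_length"),
        })
    return rows

if __name__ == "__main__":
    with urllib.request.urlopen(OPENROUTER_MODELS_URL) as resp:
        print(json.dumps(to_per_million(json.load(resp), TRACKED), indent=2))
```

A cron entry like `0 6 1 * *` (06:00 on the 1st of each month) is one way to schedule a script like this.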

The data

Available as JSON and CSV:

  • data/pricing_current.json — latest month
  • data/pricing_history.json — all historical snapshots
  • data/models.csv — spreadsheet-friendly format
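Once a few snapshots accumulate, a month-over-month trend is a short script away. The snapshot structure assumed here (`{month: {model_id: blended $/1M}}`) is hypothetical — adapt the keys to the history file's actual layout:

```python
# Sketch of a trend calculation over pricing_history.json. The snapshot
# structure ({month: {model_id: blended $/1M}}) is a guess at the layout,
# not the file's documented schema.
import json

def cpi_trend(history: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average blended price per month, in chronological key order."""
    return {
        month: sum(prices.values()) / len(prices)
        for month, prices in sorted(history.items())
    }

# Usage against the published file:
# with open("data/pricing_history.json") as f:
#     print(cpi_trend(json.load(f)))
```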

Repo: github.com/AIscending/llm-pricing-index

Free to use with attribution. Only one month of data so far (April 2026), but it'll compound.

Why I built this

I was writing pricing breakdowns for my site and realized I was manually checking the same 20+ models every month. Automating the data collection was the obvious move. Publishing it openly seemed like the right thing to do — if I need this data, other people probably do too.

If you have suggestions for models to add or data points to track, I'm all ears.


I write practical AI guides at AIscending.com — built for people running small businesses, not ML engineers. But this dataset is for everyone.
