I'm a data science student learning how AI APIs are priced. I run agentic coding sessions through OpenRouter, where every code-generation loop pulls a fresh batch of tokens from a model I picked from a list of 300+ names I barely understood. Some loops cost a few cents. Some cost dollars. The total adds up faster than I expected.
So I built a calculator for it. The blog post below is what I learned along the way.
The cheapest paid models on OpenRouter in mid-2026 are clustered in the open-weights category. Llama 3.1 8B Instruct from Meta is $0.02 per 1M input tokens and $0.03 per 1M output tokens. Phi-4 from Microsoft is $0.07/$0.14. Llama 3.3 70B at $0.10/$0.32 is the cheapest 70B-class model. Mistral Small 3.1 24B at $0.35/$0.56 is the cheapest non-Meta option in the mid-tier band.
The per-call cost on these models is essentially free at low volume. A chat-shaped call (1,000 in + 500 out) on Llama 3.1 8B is $0.000035. At 1 million calls per month, that is $35. The same call on GPT-4o is $0.0075, which is $7,500 per month. The cheap model is roughly 200x cheaper per call, in exchange for some loss of quality on hard tasks.
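The arithmetic above is just tokens divided by one million, times the per-1M price, summed over input and output. Here is a minimal sketch that reproduces it. The Llama prices are the ones quoted above. The GPT-4o prices ($2.50 input / $10.00 output per 1M) are my assumption, chosen because they match the $0.0075 figure. Check the live OpenRouter listing before reusing any of these numbers.

```python
# USD per 1M tokens as (input, output). Llama figures are from the post;
# GPT-4o figures are an assumption consistent with the $0.0075 per-call number.
PRICES_PER_M = {
    "llama-3.1-8b": (0.02, 0.03),
    "gpt-4o": (2.50, 10.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: input and output tokens priced separately."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int, calls: int) -> float:
    """Per-call cost scaled by call volume."""
    return call_cost(model, input_tokens, output_tokens) * calls


cheap = call_cost("llama-3.1-8b", 1_000, 500)
flagship = call_cost("gpt-4o", 1_000, 500)
cheap_month = monthly_cost("llama-3.1-8b", 1_000, 500, 1_000_000)
flagship_month = monthly_cost("gpt-4o", 1_000, 500, 1_000_000)

print(f"per call: ${cheap:.6f} vs ${flagship:.4f}")
print(f"1M calls/month: ${cheap_month:,.0f} vs ${flagship_month:,.0f}")
print(f"ratio: {flagship / cheap:.0f}x")
```

The exact ratio works out to about 214x, which is where the rounded "200x" comes from.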
Therefore the cheap tier is the right starting point for any product where the per-call cost is a meaningful fraction of revenue.
The free tier
OpenRouter lists 26 models at $0 input and $0 output. The list includes Llama 3.2 3B Instruct, several Gemma 4 variants, Liquid's LFM 2.5 1.2B, and a handful of community-finetuned open-weights models. They are tagged with a :free suffix in the slug.
Free models are prototyping tools, not production tools. The rate limits are tighter than the paid tier (typically only a few requests per minute rather than a few hundred). Latency varies depending on the lab's GPU availability. The labs reserve the right to retire the model with little notice.
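If you do run an evaluation loop against a free model, it helps to pace requests on the client side rather than hammering the endpoint and collecting rate-limit errors. This is a minimal sketch of a fixed-interval throttle. The class name and the requests-per-minute value are hypothetical, so set the cap from the provider's current documentation. The clock and sleep functions are injectable so the pacing can be tested without actually waiting.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Spaces request starts evenly so a loop stays under a requests-per-minute cap."""

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / requests_per_minute  # seconds between request starts
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()  # the first request may go immediately

    def wait(self) -> None:
        """Block until the next slot opens, then reserve the slot after it."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval


# Usage: call throttle.wait() before each request in the evaluation loop.
throttle = MinIntervalThrottle(requests_per_minute=4)  # placeholder cap, not a published limit
```

Pacing also makes the time cost visible: at a hypothetical 4 requests per minute, 2,000 evaluation requests take about 500 minutes.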
A startup that launches on a free model and grows into real traffic needs a migration plan to a paid tier within a quarter or two. The free tier is great for evaluation. It is not great for paying customers.
If you are choosing between Llama 3.1 8B and Mistral Small for a real workload, first run a few thousand free requests against free models such as Llama 3.2 3B and Gemma 4 31B to see whether small open-weights models are competitive on your task at all. The free tier is also the right place for hackathon projects, demos, and any other non-production traffic.
A free model in production is a liability waiting to happen.
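One way to make that rule enforceable is a startup check that refuses free model slugs when the app runs in production. This is a sketch that relies only on the :free suffix convention described above. The function names and environment names are hypothetical, and the slugs in the usage lines are illustrative.

```python
def is_free_slug(slug: str) -> bool:
    """Free variants on OpenRouter carry a ':free' suffix in the model slug."""
    return slug.endswith(":free")


def check_model_for_env(slug: str, env: str) -> str:
    """Return the slug if allowed in this environment; reject free variants in production."""
    if env == "production" and is_free_slug(slug):
        raise ValueError(f"{slug!r} is a free-tier model; pin a paid slug for production")
    return slug


# Usage: validate the configured model once at startup.
dev_model = check_model_for_env("meta-llama/llama-3.2-3b-instruct:free", "development")
prod_model = check_model_for_env("meta-llama/llama-3.1-8b-instruct", "production")
```

Failing loudly at startup is deliberate: it turns "we forgot to migrate off the free tier" into a deploy error instead of a production outage.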
What the cheap models are good at
The cheap models are good at the same tasks the flagship models are good at, with a quality penalty on the hard end of the distribution. Specifically:
Classification. Sentiment analysis, topic labeling, intent detection, and any task where the output is one of N predefined categories. Llama 3.1 8B and Phi-4 are both competitive with the flagships on standard classification benchmarks.
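Part of what makes small models workable for classification is constraining and checking the output. This is a minimal sketch, not a specific library's API: the label set and function names are hypothetical. It builds a prompt that lists the allowed labels and maps the reply back onto that set, returning None when the reply does not fit so the caller can retry or escalate.

```python
from typing import Optional

LABELS = ("positive", "negative", "neutral")  # hypothetical label set


def build_prompt(text: str) -> str:
    """Ask for exactly one label from the allowed set."""
    return (
        "Classify the sentiment of the text as one of: "
        + ", ".join(LABELS)
        + ". Answer with the label only.\n\nText: "
        + text
    )


def parse_label(raw_output: str) -> Optional[str]:
    """Normalize whitespace, a trailing period, and case; None if not an allowed label."""
    cleaned = raw_output.strip().strip(".").lower()
    return cleaned if cleaned in LABELS else None
```

Treating an unparseable reply as None, rather than guessing, keeps bad outputs from silently entering downstream metrics.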
Extraction. Pulling structured data out of unstructured text: names, dates, amounts, addresses. In most production deployments the cheap models handle this workload nearly as well as the flagships.
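For extraction, the usual safeguard is asking for JSON and validating it before use. Here is a minimal sketch using only the standard library's json module. The required fields and their types are a hypothetical schema; swap in your own.

```python
import json
from typing import Any, Optional

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"name": str, "date": str, "amount": (int, float)}


def parse_extraction(raw_output: str) -> Optional[dict[str, Any]]:
    """Parse a model's JSON reply; return None if it is malformed or misses a field."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, expected):
            return None
    return data
```

A None result is a natural trigger for a retry or for escalating that one document to a stronger model.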
Short-form generation. Email subjects, ad copy, push notifications, tweet-sized completions. The cheap models are not bottlenecked on length, and the output is short enough that any quality difference is rarely visible.
Routing. Calling a cheap model to classify or extract, then escalating to a flagship only when the cheap model says the task is hard. This is the highest-ROI pattern I have found for cost reduction. Most calls don't need the flagship. The ones that do are usually obvious in advance.
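The routing pattern fits in a few lines. This is a minimal sketch under stated assumptions: the model slugs are illustrative OpenRouter-style names, and call_model is a hypothetical function you supply (for example, a thin wrapper around an OpenAI-compatible chat client pointed at OpenRouter) that takes a model slug and a prompt and returns the reply text. The cheap model triages first, and the flagship is used only when the cheap model answers HARD.

```python
from typing import Callable

CHEAP_MODEL = "meta-llama/llama-3.1-8b-instruct"  # illustrative slug
FLAGSHIP_MODEL = "openai/gpt-4o"                  # illustrative slug

# Hypothetical client signature: (model_slug, prompt) -> reply text.
CallModel = Callable[[str, str], str]

TRIAGE_PROMPT = (
    "Reply EASY if you can answer the following request reliably, "
    "otherwise reply HARD.\n\n"
)


def route(prompt: str, call_model: CallModel) -> tuple[str, str]:
    """Triage with the cheap model; escalate to the flagship only on a HARD verdict."""
    verdict = call_model(CHEAP_MODEL, TRIAGE_PROMPT + prompt).strip().upper()
    model = FLAGSHIP_MODEL if verdict.startswith("HARD") else CHEAP_MODEL
    return model, call_model(model, prompt)
```

Every request pays for the triage call, so the pattern only saves money when most traffic stays on the cheap model. Defaulting ambiguous verdicts to the cheap model follows the "escalate only when it says hard" rule; flipping that default is the conservative choice if misses are expensive.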
When to graduate
Graduate when one of three things is true:
- Quality on the hard end of the distribution is hurting retention. Users notice when the model misses edge cases.
- The flagship's reasoning tokens are earning their keep. Reasoning tokens (o1, o3, Claude with extended thinking, Gemini 2.5 Thinking) are billed at output rates. They are worth it when they actually solve the problem.
- Your volume is large enough that the cheap-vs-flagship cost difference is meaningful in absolute terms. At 1M calls/month the cheap tier saves thousands. At 100 calls/month the savings are noise.
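The first and third conditions can be checked with the same cost arithmetic as before, extended so that reasoning tokens are billed at the output rate. This is a sketch: the function name is hypothetical, the $2.50/$10.00 flagship prices are an assumption consistent with the GPT-4o figure earlier, and any reasoning-token count you plug in should come from your own logs.

```python
def monthly_cost_usd(
    calls: int,
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float,
    out_price_per_m: float,
    reasoning_tokens: int = 0,
) -> float:
    """Monthly spend in USD; reasoning tokens are charged at the output rate."""
    billed_output = output_tokens + reasoning_tokens
    per_call = (input_tokens * in_price_per_m + billed_output * out_price_per_m) / 1_000_000
    return per_call * calls


# Savings from staying on the cheap tier at two volumes (1,000 in + 500 out per call).
for calls in (100, 1_000_000):
    cheap = monthly_cost_usd(calls, 1_000, 500, 0.02, 0.03)
    flagship = monthly_cost_usd(calls, 1_000, 500, 2.50, 10.00)
    print(f"{calls:>9,} calls/month: savings ${flagship - cheap:,.2f}")

# A single flagship call with 2,000 hidden reasoning tokens (placeholder prices).
with_reasoning = monthly_cost_usd(1, 1_000, 500, 2.50, 10.00, reasoning_tokens=2_000)
```

At 100 calls per month the difference is under a dollar; at 1M calls it is about $7,465. The reasoning example shows why thinking models get expensive: 2,000 reasoning tokens cost more than the 500 visible output tokens.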
Do NOT graduate early. Cheap models have closed the gap on most production tasks. Reach for the flagship when you have evidence the cheap tier is the bottleneck, not before.
I built AI Cost Calculator to make this kind of comparison one click instead of ten browser tabs. Free, no signup, live OpenRouter prices for 336 models. I pulled it together from what I learned during my own cost debugging.