Darul

Posted on May 29

I compared AI costs across 22 models — here's what surprised me

#ai #webdev #productivity #llm

I was building an AI feature into a side project when I realised I had no idea what I was actually spending.

I knew the ballpark. GPT-4o is "expensive". Haiku is "cheap". But I didn't know the numbers for my specific prompts — and the gap between "expensive" and "cheap" turned out to be much bigger than I expected.

So I did the thing any developer would do when they can't find the tool they need: I built it.

The setup

I took five real prompts I use in my projects — a customer support classifier, a code review prompt, a JSON extraction task, a summarisation prompt, and a conversational reply — and priced each one across 22 models from 8 providers:

Anthropic (Claude Haiku 3.5, Claude Sonnet 3.5, Claude Opus 4)
OpenAI (GPT-4o mini, GPT-4o, o1, o3-mini)
Google (Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash)
Mistral, DeepSeek V3, DeepSeek R1, Grok, Perplexity, Groq Llama variants
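The pricing step itself is plain arithmetic: providers bill input and output tokens separately, usually quoted per million tokens. Here is a minimal sketch of that calculation. The names `ModelPrice`, `cost_per_call`, and `cost_per_1000_calls` are hypothetical, and every price and token count below is a made-up placeholder, not a real quote and not CheyX's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens. Values used below are placeholders."""
    name: str
    input_per_million: float
    output_per_million: float


def cost_per_call(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call; input and output are billed separately."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


def cost_per_1000_calls(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Same prompt shape repeated 1,000 times."""
    return 1000 * cost_per_call(price, input_tokens, output_tokens)


# Two hypothetical models with invented prices, purely to show the arithmetic.
model_a = ModelPrice("model-a", input_per_million=2.00, output_per_million=8.00)
model_b = ModelPrice("model-b", input_per_million=0.10, output_per_million=0.40)

# Assumed prompt shape: 500 input tokens, 20 output tokens (e.g. a short label).
for model in sorted((model_a, model_b), key=lambda m: cost_per_call(m, 500, 20)):
    print(f"{model.name}: ${cost_per_1000_calls(model, 500, 20):.4f} per 1,000 calls")
```

To price a real prompt, swap in the provider's published rates and token counts from the provider's own tokenizer, since the same text tokenizes differently across models.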

What I found

Finding #1: The gap is nearly 40x, not 2x

For a typical customer support classification prompt:

Model	Cost per 1,000 calls
GPT-4o	~$2.30
Claude Sonnet 3.5	~$0.90
Claude Haiku 3.5	~$0.21
DeepSeek V3	~$0.08
Groq Llama 3.1 8B	~$0.06

That's roughly a 38x difference between the most and least expensive models in the table ($2.30 vs. $0.06 per 1,000 calls).
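The multiplier can be checked straight from the table. This short sketch uses only the approximate figures listed above; the variable names are illustrative.

```python
# Approximate cost per 1,000 calls in USD, copied from the table above.
cost_per_1k = {
    "GPT-4o": 2.30,
    "Claude Sonnet 3.5": 0.90,
    "Claude Haiku 3.5": 0.21,
    "DeepSeek V3": 0.08,
    "Groq Llama 3.1 8B": 0.06,
}

cheapest = min(cost_per_1k, key=cost_per_1k.get)
priciest = max(cost_per_1k, key=cost_per_1k.get)
ratio = cost_per_1k[priciest] / cost_per_1k[cheapest]

print(f"{priciest} vs {cheapest}: {ratio:.1f}x")  # GPT-4o vs Groq Llama 3.1 8B: 38.3x

# How many times more each model costs than the cheapest one.
for name, cost in sorted(cost_per_1k.items(), key=lambda item: item[1]):
    print(f"{name}: {cost / cost_per_1k[cheapest]:.1f}x the cheapest")
```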

Finding #2: GPT-4o mini is not the budget option anymore

When you add Groq, DeepSeek V3, and Gemini Flash, GPT-4o mini sits in the middle of the pack, undercut 3–5x by the genuinely cheap options.

Finding #3: Reasoning models are shockingly expensive for simple tasks

For a simple JSON extraction task:

o1: $0.089 per call
DeepSeek V3: $0.0001 per call

That's 890x the cost, for the exact same output.
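The 890x figure follows directly from the two per-call prices. Scaling both to a batch makes the gap easier to feel; the batch size of 10,000 calls is an arbitrary assumption, not a number from my projects. One likely contributor to the gap is that reasoning models such as o1 bill their hidden reasoning tokens as output tokens, so a short JSON answer can still carry a long, paid-for chain of thought.

```python
# Per-call costs in USD, as quoted above for the JSON extraction task.
o1_per_call = 0.089
deepseek_v3_per_call = 0.0001

ratio = o1_per_call / deepseek_v3_per_call
print(f"o1 costs about {ratio:.0f}x as much per call")

# Arbitrary batch size, chosen only to illustrate the scale.
calls = 10_000
print(f"o1:          ${o1_per_call * calls:,.2f} for {calls:,} calls")
print(f"DeepSeek V3: ${deepseek_v3_per_call * calls:,.2f} for {calls:,} calls")
```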

Finding #4: Gemini Flash is criminally underrated

Gemini 1.5 Flash kept appearing near the bottom of the cost table — in a good way. Fast, capable, and cheaper than most realise.

The tool I built

After doing this manually with spreadsheets, I built CheyX.

Paste any prompt → see what it costs across 22 models → sorted cheapest to most expensive. No signup. No API key. Free.

Switching my customer support classifier to DeepSeek V3 would save me ~$180/month. That's a coffee every day for changing one line of config.
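A savings estimate like this is just call volume times the per-call price difference. The sketch below shows the arithmetic under a loud assumption: it treats GPT-4o (~$2.30 per 1,000 calls) as the current model, which is only one possible reading, and the function names are hypothetical. Under that assumption, a ~$180/month saving corresponds to roughly 81,000 classifier calls a month.

```python
def monthly_savings(calls_per_month: float, current_per_1k: float, candidate_per_1k: float) -> float:
    """USD saved per month by switching models, given costs per 1,000 calls."""
    return calls_per_month / 1000 * (current_per_1k - candidate_per_1k)


def calls_needed_for_savings(target_usd: float, current_per_1k: float, candidate_per_1k: float) -> float:
    """Monthly call volume at which the switch saves target_usd."""
    return target_usd / (current_per_1k - candidate_per_1k) * 1000


# ASSUMPTION: the current model is GPT-4o; per-1,000-call costs are the
# approximate figures from the classification table earlier in the post.
GPT_4O_PER_1K = 2.30
DEEPSEEK_V3_PER_1K = 0.08

volume = calls_needed_for_savings(180, GPT_4O_PER_1K, DEEPSEEK_V3_PER_1K)
print(f"Calls per month for ~$180 saved: {volume:,.0f}")
print(f"Savings at 81,000 calls/month: ${monthly_savings(81_000, GPT_4O_PER_1K, DEEPSEEK_V3_PER_1K):.2f}")
print(f"Per day over a 30-day month: ${180 / 30:.2f}")
```

If your current model is cheaper than GPT-4o, the same saving needs a proportionally higher call volume, so plug in your own numbers before deciding.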

The practical takeaway

Most developers pick a model once, never revisit it, and just pay whatever shows up on the invoice.

A few hours benchmarking your actual prompts against cheaper models is almost always worth it. Make the cost visible, then decide.

Built CheyX to solve this for myself. Feedback welcome in the comments.

→ ai-model-cost-calculator.vercel.app

DEV Community