I was building an AI feature into a side project when I realised I had no idea what I was actually spending.
I knew the ballpark. GPT-4o is "expensive". Haiku is "cheap". But I didn't know the numbers for my specific prompts — and the gap between "expensive" and "cheap" turned out to be much bigger than I expected.
So I did the thing any developer would do when they can't find the tool they need: I built it.
The setup
I took five real prompts I use in my projects — a customer support classifier, a code review prompt, a JSON extraction task, a summarisation prompt, and a conversational reply — and priced each one across 22 models from 8 providers:
Anthropic (Claude Haiku 3.5, Claude Sonnet 3.5, Claude Opus 4)
OpenAI (GPT-4o mini, GPT-4o, o1, o3-mini)
Google (Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash)
Mistral, DeepSeek V3, DeepSeek R1, Grok, Perplexity, Groq Llama variants
What I found
Finding #1: The gap is 40x, not 2x
For a typical customer support classification prompt:
| Model | Cost per 1,000 calls |
|---|---|
| GPT-4o | ~$2.30 |
| Claude Sonnet 3.5 | ~$0.90 |
| Claude Haiku 3.5 | ~$0.21 |
| DeepSeek V3 | ~$0.08 |
| Groq Llama 3.1 8B | ~$0.06 |
That's a 38x difference between the most expensive model in this table (GPT-4o) and the cheapest (Groq Llama 3.1 8B).
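For anyone who wants to reproduce a table like this by hand, here is a minimal sketch of the arithmetic, assuming the usual per-token billing where input and output tokens are priced separately per million tokens. The model names and prices in `PLACEHOLDER_PRICES` are made up for illustration, as are the token counts; swap in each provider's current rate card and your own prompt's token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million tokens, input and output billed separately."""
    input_per_million: float
    output_per_million: float


def cost_per_call(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call with the given token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def rank_models(
    prices: dict[str, Pricing], input_tokens: int, output_tokens: int
) -> list[tuple[str, float]]:
    """Return (model, cost per 1,000 calls) pairs, cheapest first."""
    per_thousand = {
        name: 1000 * cost_per_call(pricing, input_tokens, output_tokens)
        for name, pricing in prices.items()
    }
    return sorted(per_thousand.items(), key=lambda item: item[1])


# Hypothetical prices for illustration only, NOT real rate cards.
PLACEHOLDER_PRICES = {
    "model-a": Pricing(input_per_million=2.00, output_per_million=10.00),
    "model-b": Pricing(input_per_million=0.10, output_per_million=0.40),
}

if __name__ == "__main__":
    # Assumed token counts for a short classification prompt.
    for name, cost in rank_models(PLACEHOLDER_PRICES, input_tokens=500, output_tokens=100):
        print(f"{name}: ${cost:.4f} per 1,000 calls")
```

Token counts differ between tokenizers, so the same prompt will not have exactly the same `input_tokens` on every provider. Treat the result as an estimate.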
Finding #2: GPT-4o mini is not the budget option anymore
When you add Groq, DeepSeek V3, and Gemini Flash, GPT-4o mini sits in the middle of the pack, undercut 3–5x by the genuinely cheap options.
Finding #3: Reasoning models are shockingly expensive for simple tasks
For a simple JSON extraction task:
- o1: $0.089 per call
- DeepSeek V3: $0.0001 per call
That's 890x the price. For the exact same output.
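A likely contributor to that gap, beyond the higher per-token price: reasoning models such as o1 produce hidden reasoning tokens that are billed as output tokens on top of the visible answer. The sketch below shows how that changes the bill. The function name, token counts, and prices are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def reasoning_call_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Cost in USD of one call when hidden reasoning tokens are billed as output."""
    billed_output_tokens = visible_output_tokens + reasoning_tokens
    return (
        input_tokens * input_per_million
        + billed_output_tokens * output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers: same prompt and same visible answer,
    # with and without hidden reasoning tokens.
    without_reasoning = reasoning_call_cost(1000, 100, 0, 10.0, 50.0)
    with_reasoning = reasoning_call_cost(1000, 100, 900, 10.0, 50.0)
    print(f"without reasoning tokens: ${without_reasoning:.4f}")
    print(f"with reasoning tokens:    ${with_reasoning:.4f}")
```

For a short JSON extraction, the visible answer may be a small fraction of what you are billed for, which is why the per-call figure can look out of proportion to the task.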
Finding #4: Gemini Flash is criminally underrated
Gemini 1.5 Flash kept appearing near the bottom of the cost table — in a good way. Fast, capable, and cheaper than most realise.
The tool I built
After doing this manually with spreadsheets, I built CheyX.
Paste any prompt → see what it costs across 22 models → sorted cheapest to most expensive. No signup. No API key. Free.
Switching my customer support classifier to DeepSeek V3 would save me ~$180/month. That's a coffee every day for changing one line of config.
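To sanity-check a savings figure like this for your own workload, the arithmetic is simple. The sketch below assumes the classifier currently runs on GPT-4o at the table's ~$2.30 per 1,000 calls and moves to DeepSeek V3 at ~$0.08. The post does not say which model is being replaced or how many calls are made, so both the starting model and the call volume here are assumptions.

```python
def monthly_savings(
    calls_per_month: int,
    current_cost_per_1k: float,
    candidate_cost_per_1k: float,
) -> float:
    """USD saved per month by moving every call to the candidate model."""
    return calls_per_month / 1000 * (current_cost_per_1k - candidate_cost_per_1k)


def calls_needed_for_savings(
    target_savings: float,
    current_cost_per_1k: float,
    candidate_cost_per_1k: float,
) -> float:
    """Monthly call volume at which the switch saves `target_savings` USD."""
    return target_savings / (current_cost_per_1k - candidate_cost_per_1k) * 1000


if __name__ == "__main__":
    # Assumed: current model GPT-4o (~$2.30 / 1k), candidate DeepSeek V3 (~$0.08 / 1k).
    volume = calls_needed_for_savings(180.0, 2.30, 0.08)
    print(f"calls per month for $180 savings: {volume:,.0f}")
    print(f"savings at 81,000 calls: ${monthly_savings(81_000, 2.30, 0.08):.2f}")
```

Under those assumptions, a saving of about $180 corresponds to roughly 81,000 classification calls a month.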
The practical takeaway
Most developers pick a model once, never revisit it, and just pay whatever shows up on the invoice.
A few hours benchmarking your actual prompts against cheaper models is almost always worth it. Make the cost visible, then decide.
Built CheyX to solve this for myself. Feedback welcome in the comments.