Quick question for anyone building with LLM APIs.
The cost spread across current models is wild: GPT-4o vs Gemini 2.0 Flash is roughly a 25x difference per token at list prices. For most tasks, you could swap to a cheaper model and users wouldn't notice. But teams usually only realize this late in the project, after the architecture is already set.
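To make that concrete, here's a back-of-the-envelope comparison. The prices below are USD per million tokens from the public pricing pages at the time of writing (GPT-4o at $2.50 in / $10 out, Gemini 2.0 Flash at $0.10 in / $0.40 out); they change often, so verify before relying on them:

```ts
// Rough per-request cost comparison. Prices are USD per 1M tokens,
// copied from public pricing pages at the time of writing -- verify before use.
const PRICES = {
  "gpt-4o":           { input: 2.50, output: 10.00 },
  "gemini-2.0-flash": { input: 0.10, output: 0.40 },
};

function requestCostUSD(
  model: keyof typeof PRICES,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Example: a typical RAG-style call, 3k-token prompt and a 500-token answer.
const premium = requestCostUSD("gpt-4o", 3_000, 500);           // $0.0125
const budget  = requestCostUSD("gemini-2.0-flash", 3_000, 500); // $0.0005
console.log(`ratio: ${(premium / budget).toFixed(0)}x`);        // ratio: 25x
```

Multiply that per-request gap by millions of calls and the model choice dominates the bill.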
What's your process for thinking about costs before you start building?
I built a client-side token counter (llmtokens.vercel.app) that shows real-time cost breakdowns across 25+ current models as you type — GPT-4o, Claude 4 Sonnet/Opus, Gemini 2.5 Pro/Flash, o3, DeepSeek, Llama, etc. It runs entirely in the browser, no signup.
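For anyone curious how browser-side counting works in general, here's a minimal sketch (not the tool's actual implementation) assuming the js-tiktoken npm package, a pure-JS port of OpenAI's tokenizer, so nothing leaves the browser. Note that non-OpenAI models use their own tokenizers, so a count like this is only an approximation for Claude, Gemini, etc.:

```ts
// Minimal sketch of client-side token counting with js-tiktoken.
// Illustrative only -- the linked tool's internals may differ.
import { getEncoding } from "js-tiktoken";

// o200k_base is the encoding GPT-4o uses; for other vendors' models,
// treat the resulting count as a rough estimate.
const enc = getEncoding("o200k_base");

function estimateInputCostUSD(
  text: string,
  pricePerMillionTokens: number
): { tokens: number; usd: number } {
  const tokens = enc.encode(text).length;
  return { tokens, usd: (tokens * pricePerMillionTokens) / 1_000_000 };
}

// Example: estimate a prompt's input cost at $2.50 per 1M tokens.
console.log(estimateInputCostUSD("Quick question for anyone building with LLM APIs.", 2.50));
```

Wire that into an input's change handler and you get live per-model cost as the user types.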
The goal was to make the cost conversation happen at architecture time, not bill-shock time.
Curious what others do:
- Do you pick a model first and accept the cost, or estimate cost first and pick accordingly?
- Any heuristics for which tasks justify premium models vs. flash/mini tiers?
- Anything about cost estimation you wish you'd known earlier?