- **Retries** — Failed requests still count. A 10% retry rate adds roughly 10% to your bill.
- **System prompts** — That 2000-token system prompt gets sent with every request.
- **Streaming overhead** — Some providers charge slightly more for streamed responses.
- **Batch vs real-time** — OpenAI's batch API is 50% cheaper but has a 24h SLA.
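The factors above compound, so it's worth putting numbers on them. Here is a minimal sketch of a cost estimator that folds retry rate and the per-request system prompt into the monthly total; the function name, parameters, and all figures in the usage example are hypothetical.

```javascript
// Sketch: estimate effective monthly spend, accounting for retries and
// the system prompt that rides along with every request.
// All parameter names and example figures are hypothetical.
const estimateMonthlyCost = ({
  requestsPerMonth,
  retryRate,          // e.g. 0.10 → 10% of requests are re-sent and billed again
  systemPromptTokens, // sent with every request
  userTokens,         // average user/content input tokens per request
  outputTokens,       // average completion tokens per request
  inputPricePer1M,    // USD per 1M input tokens
  outputPricePer1M,   // USD per 1M output tokens
}) => {
  const effectiveRequests = requestsPerMonth * (1 + retryRate);
  const totalInputTokens = effectiveRequests * (systemPromptTokens + userTokens);
  const totalOutputTokens = effectiveRequests * outputTokens;
  return (
    (totalInputTokens / 1e6) * inputPricePer1M +
    (totalOutputTokens / 1e6) * outputPricePer1M
  );
};
```

At 1M requests/month with a 10% retry rate and a 2000-token system prompt, the system prompt alone can dominate input spend — which is why trimming it (or using prompt caching where a provider offers it) pays off quickly.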
## Automate Price Monitoring
```javascript
// Node.js: check for price changes
const checkPricing = async () => {
  const resp = await fetch('https://api.lazy-mac.com/ai-spend/pricing');
  const pricing = await resp.json();
  // Compare with your stored baseline
  for (const model of pricing.models) {
    console.log(`${model.name}: $${model.input_price}/1M in, $${model.output_price}/1M out`);
  }
};
```
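The "compare with your stored baseline" step could look like the sketch below: a pure function that diffs a fresh price list against a saved one and reports what moved. The function name and the shape of the model objects (`name`, `input_price`, `output_price`) are assumptions based on the snippet above, not a documented API.

```javascript
// Sketch: diff a freshly fetched price list against a stored baseline.
// Object shape (name / input_price / output_price) is assumed from the
// snippet above, not from documented API output.
const diffPricing = (baseline, latest) => {
  const changes = [];
  for (const model of latest) {
    const prev = baseline.find((m) => m.name === model.name);
    if (!prev) {
      changes.push({ name: model.name, change: 'new model' });
    } else if (
      prev.input_price !== model.input_price ||
      prev.output_price !== model.output_price
    ) {
      changes.push({
        name: model.name,
        change: `in $${prev.input_price} → $${model.input_price}, ` +
                `out $${prev.output_price} → $${model.output_price}`,
      });
    }
  }
  return changes;
};
```

Run it on a schedule (cron, GitHub Actions) and alert only when `diffPricing` returns a non-empty array, so price changes surface without daily noise.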
## The Bottom Line
- **Cheapest:** Gemini 2.0 Flash ($0.10/1M input)
- **Best value:** GPT-4o or Claude Sonnet (quality/price sweet spot)
- **Most capable:** Claude Opus 4 or o1 (for tasks that justify the cost)
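In practice, these three picks translate into a simple routing rule: send each request to the cheapest tier that can handle it. A minimal sketch, where the tier names and the model-ID strings are assumptions for illustration:

```javascript
// Sketch: route each request to the cheapest model that fits the task.
// Tier names and model-ID strings are illustrative assumptions.
const pickModel = (tier) => {
  const routes = {
    simple: 'gemini-2.0-flash', // cheapest: classification, extraction
    standard: 'gpt-4o',         // best value: most production work
    complex: 'claude-opus-4',   // most capable: hard multi-step reasoning
  };
  return routes[tier] ?? routes.standard; // default to the value tier
};
```

Even a crude router like this, applied to the easy majority of traffic, typically cuts spend far more than haggling over any single model's price.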
Bookmark the AI Spend API for live pricing data on 50+ models.