GPT-4o is OpenAI's flagship model. The pricing page shows $2.50/1M input tokens. What it doesn't show is everything else you'll actually pay for.
The Visible Cost
Input: $2.50/1M tokens. Output: $10/1M tokens. Easy.
```python
# What you think you're paying
input_cost = (1_000_000 / 1_000_000) * 2.50   # $2.50
output_cost = (500_000 / 1_000_000) * 10.00   # $5.00
# Total: $7.50
```
The Hidden Costs
1. System prompts are input tokens too
Most production apps have 500-2000 token system prompts. On every single request.
```python
# 1000 system prompt tokens × 1M requests/month
system_prompt_cost = (1000 * 1_000_000 / 1_000_000) * 2.50  # $2,500 extra/month
```
2. Retry costs
Network errors, rate limits, and malformed outputs that force a retry typically add 10-20% to your bill, because every retried request pays for its input tokens again.
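To see how retries compound, here's a quick sketch. The flat per-attempt failure probability and the three-retry cap are assumptions for illustration, not measured values:

```python
# Sketch: how a retry policy inflates expected spend.
# Assumes each attempt fails independently with probability retry_rate.

def expected_cost_with_retries(base_cost, retry_rate, max_retries=3):
    """Expected cost when failed attempts are retried up to max_retries times."""
    cost = 0.0
    for attempt in range(max_retries + 1):
        # Probability we reach attempt N: all prior attempts failed.
        cost += base_cost * (retry_rate ** attempt)
    return cost

# With the $7.50 request from above and a 10% failure rate,
# expected spend rises to about $8.33 (an ~11% markup).
print(expected_cost_with_retries(7.50, 0.10))
```

Note the markup is slightly more than the raw failure rate, because a retry can itself fail and be retried.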
3. Context window bloat
Chat history is resent as input on every turn, so cost grows quadratically over a conversation: turn 20 carries roughly 20 messages of history, and the cumulative input cost of a 20-turn conversation is roughly 10x what 20 independent messages would cost.
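A quick sketch of why resending history gets expensive. The 100-tokens-per-message figure is an assumed average:

```python
# Sketch: cumulative input tokens when full history is resent each turn.
# tokens_per_message is a hypothetical average, not a measured value.

def cumulative_history_tokens(turns, tokens_per_message=100):
    # Turn n resends all n messages (history plus the new one) as input.
    return sum(n * tokens_per_message for n in range(1, turns + 1))

flat = 20 * 100                               # 2,000 tokens with no history
with_history = cumulative_history_tokens(20)  # 21,000 tokens
print(with_history / flat)                    # 10.5x
```

Truncating or summarizing older turns is the usual mitigation; without it, long conversations dominate your input-token bill.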
4. Embeddings (separate billing)
If you use text-embedding-3-large for RAG, that's an additional $0.13/1M tokens. At scale: significant.
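A back-of-envelope sketch of what "significant at scale" means here. The document counts and chunk sizes are hypothetical; only the $0.13/1M price comes from the text above:

```python
# Sketch: monthly embedding spend for a RAG ingestion pipeline.
# Document volume, chunks per doc, and chunk size are assumed values.

EMBED_PRICE_PER_M = 0.13  # text-embedding-3-large, $ per 1M tokens

def monthly_embedding_cost(docs_per_month, chunks_per_doc, tokens_per_chunk):
    tokens = docs_per_month * chunks_per_doc * tokens_per_chunk
    return tokens / 1_000_000 * EMBED_PRICE_PER_M

# 100k docs x 20 chunks x 500 tokens = 1B tokens, roughly $130/month
print(monthly_embedding_cost(100_000, 20, 500))
```

Cheap per token, but it's a separate line item that never appears in your chat-completion spend.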
The Fix
```python
import requests

# Estimate the full request cost before sending, including the system prompt.
# The 1.3 multiplier is a rough words-to-tokens heuristic; use a real
# tokenizer (e.g. tiktoken) when you need accuracy.
def estimate_full_cost(system_prompt, user_message, expected_output_tokens=500):
    total_input = (len(system_prompt.split()) + len(user_message.split())) * 1.3
    resp = requests.get("https://api.lazy-mac.com/ai-spend/calculate", params={
        "model": "gpt-4o",
        "input_tokens": int(total_input),
        "output_tokens": expected_output_tokens,
    })
    resp.raise_for_status()
    return resp.json()["total_cost"]
```
Track actual spend (including all the hidden sources above) against your estimates with the AI Cost Calculator.