楊東霖

Posted on Mar 25 • Originally published at devplaybook.cc

OpenAI API Pricing Calculator: How to Estimate Your Monthly Costs

#ai #openai #api #programming

One of the most common developer surprises when building with OpenAI's API is the bill at the end of the month. Token-based pricing is not intuitive, and it's easy to underestimate costs when you're moving fast in development. A chatbot that seems cheap in testing can become expensive at scale if you haven't done the math.

This guide shows you how to estimate your OpenAI API costs before they surprise you — with real calculation examples, optimization strategies, and a cross-provider comparison so you can make informed model choices.

Understanding OpenAI's Pricing Model

OpenAI charges based on tokens — not requests, not words, not time. A token is roughly 4 characters or 0.75 words in English. Every API call consumes input tokens (your prompt) and produces output tokens (the response). You pay for both, at different rates.

Key insight: Input tokens are always cheaper than output tokens. For most models, output costs 3–5× more per token than input. This matters for how you architect your prompts.

Current OpenAI Model Pricing (2025)

Prices are per 1 million tokens unless otherwise noted:

Model	Input	Cached Input	Output
GPT-4o	$2.50	$1.25	$10.00
GPT-4o-mini	$0.15	$0.075	$0.60
o3	$10.00	$2.50	$40.00
o3-mini	$1.10	$0.55	$4.40
o1	$15.00	$7.50	$60.00

Notes:

Cached input pricing applies when the same prompt prefix is reused (see Prompt Caching below)
o3 and o1 are reasoning models — they think before responding, producing "reasoning tokens" that contribute to cost
Check platform.openai.com/docs/pricing for the latest figures — prices change

The Token Math: How to Calculate Costs

The formula is straightforward:

Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)

Where:

input_price_per_token = model_input_price / 1,000,000
output_price_per_token = model_output_price / 1,000,000

Quick Reference: Token Counting

Content	Approximate Token Count
Single word	~1 token
Average sentence	~15–20 tokens
1 page of text (~500 words)	~650–750 tokens
Short paragraph	~100 tokens
System prompt (200 words)	~260 tokens
Full book chapter	~4,000–8,000 tokens

You can count tokens precisely using OpenAI's tiktoken library:

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Example
system_prompt = "You are a helpful customer support assistant for an e-commerce store."
user_message = "I ordered 3 days ago and my package hasn't arrived. Order number is #78432."

system_tokens = count_tokens(system_prompt)
user_tokens = count_tokens(user_message)
total_input = system_tokens + user_tokens

print(f"System prompt: {system_tokens} tokens")
print(f"User message: {user_tokens} tokens")
print(f"Total input: {total_input} tokens")

Real Calculation Examples

Example 1: Customer Support Chatbot

Setup:

System prompt: 300 tokens (role definition, product info, guidelines)
Average user message: 80 tokens
Average conversation history passed: 400 tokens (last 3 turns)
Average assistant response: 200 tokens
Volume: 50,000 conversations/month

Per conversation cost (GPT-4o):

Input tokens:  300 (system) + 80 (user) + 400 (history) = 780 tokens
Output tokens: 200 tokens

Input cost:  780 × ($2.50 / 1,000,000)  = $0.00195
Output cost: 200 × ($10.00 / 1,000,000) = $0.00200
Cost per conversation: ~$0.004

Monthly cost at 50,000 conversations:

50,000 × $0.004 = $200/month (GPT-4o)
50,000 × $0.00024 = $12/month (GPT-4o-mini)

Recommendation: Use GPT-4o-mini for straightforward support queries and escalate to GPT-4o only for complex cases. Potential savings: 94%.

Example 2: Document Summarizer

Setup:

Average document length: 3,000 tokens
System prompt: 150 tokens
Average summary output: 400 tokens
Volume: 10,000 documents/month

Per document cost (GPT-4o):

Input tokens:  3,000 (doc) + 150 (system) = 3,150 tokens
Output tokens: 400 tokens

Input cost:  3,150 × ($2.50 / 1,000,000)  = $0.007875
Output cost: 400 × ($10.00 / 1,000,000)   = $0.004000
Cost per document: ~$0.012

Monthly cost at 10,000 documents:

GPT-4o:      10,000 × $0.012  = $120/month
GPT-4o-mini: 10,000 × $0.00072 = $7.20/month

For summarization tasks where you don't need frontier reasoning, GPT-4o-mini produces excellent results at a fraction of the cost.

Example 3: AI Coding Assistant

Setup:

System prompt: 500 tokens (coding guidelines, language constraints)
Average code context passed: 1,500 tokens
Average user question: 100 tokens
Average code response: 800 tokens
Volume: 5,000 requests/month

Per request cost (GPT-4o):

Input tokens:  500 + 1,500 + 100 = 2,100 tokens
Output tokens: 800 tokens

Input cost:  2,100 × ($2.50 / 1,000,000)  = $0.00525
Output cost: 800 × ($10.00 / 1,000,000)   = $0.00800
Cost per request: ~$0.013

Monthly cost:

GPT-4o:   5,000 × $0.013  = $65/month
o3-mini:  5,000 × ($2,100 × 0.0000011 + $800 × 0.0000044) = ~$23/month

For coding tasks requiring deeper reasoning, o3-mini can outperform GPT-4o at a lower cost — run your own evals to verify for your specific use case.

Building a Cost Estimator in Python

Here's a reusable calculator you can adapt for your own apps:

# OpenAI pricing per million tokens (update as needed)
PRICING = {
    "gpt-4o": {
        "input": 2.50,
        "cached_input": 1.25,
        "output": 10.00
    },
    "gpt-4o-mini": {
        "input": 0.15,
        "cached_input": 0.075,
        "output": 0.60
    },
    "o3": {
        "input": 10.00,
        "cached_input": 2.50,
        "output": 40.00
    },
    "o3-mini": {
        "input": 1.10,
        "cached_input": 0.55,
        "output": 4.40
    }
}

def estimate_monthly_cost(
    model: str,
    avg_input_tokens: int,
    avg_output_tokens: int,
    monthly_requests: int,
    cached_input_fraction: float = 0.0
) -> dict:
    """
    Estimate monthly API costs.

    Args:
        model: Model name (e.g., "gpt-4o")
        avg_input_tokens: Average input tokens per request
        avg_output_tokens: Average output tokens per request
        monthly_requests: Expected monthly request volume
        cached_input_fraction: Fraction of input that qualifies for cached pricing (0.0–1.0)
    """
    if model not in PRICING:
        raise ValueError(f"Unknown model: {model}. Available: {list(PRICING.keys())}")

    prices = PRICING[model]

    # Split input into cached vs non-cached
    cached_tokens = avg_input_tokens * cached_input_fraction
    fresh_tokens = avg_input_tokens * (1 - cached_input_fraction)

    cost_per_request = (
        (fresh_tokens * prices["input"] / 1_000_000) +
        (cached_tokens * prices["cached_input"] / 1_000_000) +
        (avg_output_tokens * prices["output"] / 1_000_000)
    )

    monthly_cost = cost_per_request * monthly_requests

    return {
        "model": model,
        "cost_per_request_usd": round(cost_per_request, 6),
        "monthly_cost_usd": round(monthly_cost, 2),
        "monthly_input_tokens": avg_input_tokens * monthly_requests,
        "monthly_output_tokens": avg_output_tokens * monthly_requests,
    }


def compare_models(avg_input_tokens, avg_output_tokens, monthly_requests):
    """Compare costs across all models for a given workload."""
    results = []
    for model in PRICING:
        result = estimate_monthly_cost(
            model, avg_input_tokens, avg_output_tokens, monthly_requests
        )
        results.append(result)
    return sorted(results, key=lambda x: x["monthly_cost_usd"])


# Example: Chatbot with 1,000 tokens input, 300 tokens output, 100k requests/month
comparison = compare_models(1000, 300, 100_000)
for r in comparison:
    print(f"{r['model']:20} ${r['monthly_cost_usd']:>10,.2f}/month")

Output for this example:

gpt-4o-mini          $    330.00/month
o3-mini              $  2,420.00/month
gpt-4o               $  5,500.00/month
o3                   $ 22,000.00/month

Paste the JSON output into DevPlaybook's JSON Formatter to inspect your cost breakdown in a readable format.

Cost Optimization Strategies

1. Prompt Caching

OpenAI's prompt caching automatically caches repeated prompt prefixes and charges 50% for cached tokens. This is highly effective when you have a large, static system prompt.

To maximize cache hits:

Keep your static content (system prompt, reference documents, examples) at the beginning of the prompt
Keep dynamic content (user message, conversation history) at the end
Cache hits apply when the same prefix appears in multiple requests within the caching window

Potential savings: 30–60% on input costs for applications with large, reused system prompts.

2. Model Selection Routing

Not every request needs your most expensive model. Route by complexity:

def select_model(task_type: str, complexity_score: int) -> str:
    """
    Select model based on task type and complexity.
    complexity_score: 1 (trivial) to 10 (highly complex)
    """
    if task_type in ["classification", "keyword_extraction", "sentiment"] or complexity_score <= 3:
        return "gpt-4o-mini"

    if task_type in ["code_reasoning", "math", "multi_step_analysis"] or complexity_score >= 8:
        return "o3-mini"

    return "gpt-4o"  # Default for mid-complexity tasks

3. Batch Processing

OpenAI's Batch API offers a 50% cost reduction for non-real-time workloads (24-hour turnaround):

from openai import OpenAI
import json

client = OpenAI()

# Prepare batch
requests = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarize: {document}"}],
            "max_tokens": 200
        }
    }
    for i, document in enumerate(documents)
]

# Write JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch submitted: {batch.id}")

When to use batching: Nightly report generation, bulk document processing, offline classification jobs. Not appropriate for user-facing real-time responses.

4. Trim Conversation History

Passing your full conversation history on every turn is one of the most common cost leaks:

def trim_history(messages: list, max_tokens: int = 2000) -> list:
    """Keep only the most recent messages that fit within token budget."""
    import tiktoken
    enc = tiktoken.encoding_for_model("gpt-4o")

    trimmed = []
    token_count = 0

    # Always keep the system message (index 0)
    # Iterate history in reverse to keep most recent
    for message in reversed(messages[1:]):
        msg_tokens = len(enc.encode(message["content"]))
        if token_count + msg_tokens > max_tokens:
            break
        trimmed.insert(0, message)
        token_count += msg_tokens

    return [messages[0]] + trimmed  # Prepend system message

5. Set `max_tokens` Appropriately

If you only need a 100-word answer, don't set max_tokens=4096. While you only pay for tokens actually generated, a tight max_tokens budget forces the model to be concise, which often improves quality for structured outputs.

OpenAI vs Anthropic vs Gemini: Pricing Comparison

As of early 2025, for the mid-tier "best value" models:

Provider	Model	Input (per 1M)	Output (per 1M)	Context Window
OpenAI	GPT-4o	$2.50	$10.00	128K
OpenAI	GPT-4o-mini	$0.15	$0.60	128K
Anthropic	Claude Sonnet 4.6	$3.00	$15.00	200K
Anthropic	Claude Haiku 4.5	$0.80	$4.00	200K
Google	Gemini 1.5 Pro	$1.25	$5.00	1M
Google	Gemini 1.5 Flash	$0.075	$0.30	1M

Key observations:

Cheapest for high-volume tasks: Gemini Flash is the most economical option for simple, high-volume tasks. GPT-4o-mini is close behind.
Best context window: Claude Sonnet 4.6's 200K context window is a practical advantage for document processing. Gemini 1.5 Pro's 1M context window is unmatched for extremely long documents.
Reasoning models: OpenAI's o3/o3-mini and Anthropic's Claude for extended thinking are premium offerings — use them only when reasoning depth meaningfully improves your output quality.
Price vs quality trade-offs differ by task type: Run your own benchmarks on your specific use case. General benchmarks rarely match real-world results for specialized applications.

Monitoring Costs in Production

Track token usage on every API call:

import json
from datetime import datetime, UTC
from openai import OpenAI

client = OpenAI()
PRICING = {"gpt-4o": {"input": 2.50, "output": 10.00}, "gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def tracked_completion(model, messages, **kwargs):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs
    )

    usage = response.usage
    prices = PRICING.get(model, {"input": 0, "output": 0})

    cost = (
        usage.prompt_tokens * prices["input"] / 1_000_000 +
        usage.completion_tokens * prices["output"] / 1_000_000
    )

    log_entry = {
        "ts": datetime.now(UTC).isoformat(),
        "model": model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": round(cost, 8)
    }

    with open("api_usage.jsonl", "a") as f:
        f.write(json.dumps(log_entry) + "\n")

    return response

# Usage
response = tracked_completion(
    "gpt-4o-mini",
    [{"role": "user", "content": "Summarize this article..."}]
)

Use DevPlaybook's API Tester to manually test your OpenAI API calls during development. It supports custom headers (for Authorization: Bearer ...), request bodies, and displays the full response including usage stats.

Monthly Cost Estimator: Quick Reference

Use this table to estimate costs for common workload sizes. Assumes 1,000 input + 300 output tokens per request.

Monthly Requests	GPT-4o-mini	GPT-4o	o3-mini
1,000	$0.33	$3.50	$2.42
10,000	$3.30	$35.00	$24.20
100,000	$33.00	$350.00	$242.00
1,000,000	$330.00	$3,500.00	$2,420.00

At 1M requests/month:

If GPT-4o-mini quality is sufficient → $330/month
If you need GPT-4o quality → $3,500/month
That's a 10× cost difference for the same workload

Model selection is the highest-leverage cost decision you'll make.

Common Cost Traps to Avoid

1. Not trimming conversation history. Every message in messages[] costs tokens. A long chat session can accumulate thousands of tokens of history per request.

2. Overly verbose system prompts. System prompts are paid for on every request. Audit yours for unnecessary verbiage. A 2,000-token system prompt at 100K requests/month costs an extra $500 on GPT-4o.

3. Using o3 where GPT-4o-mini would do. Reasoning models are 67× more expensive than GPT-4o-mini. Reserve them for tasks where deep reasoning genuinely changes the output quality.

4. Ignoring cached input pricing. If you have a large static system prompt, restructure your prompts to maximize cache hits. This alone can cut input costs by half.

5. Not setting usage alerts. Set budget alerts in the OpenAI console. A runaway loop in development can generate thousands of requests before you notice. Cap your development API key's usage limit.

Summary: A Decision Framework for Model Selection

Is the task simple? (classification, extraction, short summaries)
  → GPT-4o-mini (or Gemini Flash for maximum savings)

Does the task involve complex reasoning? (multi-step math, code debugging, analysis)
  → Test GPT-4o first; escalate to o3-mini only if quality is insufficient

Do you need maximum context window?
  → Claude Sonnet 4.6 (200K) or Gemini 1.5 Pro (1M)

Is this a non-real-time batch workload?
  → Use OpenAI Batch API for 50% discount

Are you serving users in real time with tight latency requirements?
  → GPT-4o-mini or Haiku 4.5 — both return responses in under 2 seconds

The most important thing you can do is instrument your app from day one. Log every token count, track cost per request, and set budget alerts. Surprises happen when you're not watching.

DevPlaybook offers free developer tools at devplaybook.cc, including a JSON Formatter and API Tester useful for working with LLM APIs. No affiliate relationship with OpenAI, Anthropic, or Google.

Level Up Your Dev Workflow

Found this useful? Explore DevPlaybook — cheat sheets, tool comparisons, and hands-on guides for modern developers.

🛒 Get the DevToolkit Starter Kit on Gumroad — 40+ browser-based dev tools, source code + deployment guide included.

Top comments (1)

Henry Godnick • Mar 29

Solid breakdown, especially the cost traps section. The "not trimming conversation history" one bites so many people.

One thing I'd add for the monitoring section — if you're on macOS and want ambient cost awareness without building your own logging, TokenBar (tokenbar.site) puts a live token counter in your menu bar. It tracks across OpenAI, Anthropic, and other providers in real time. $5 one-time.

Not a replacement for the production instrumentation you described (which is essential for deployed apps), but for development-time awareness it's been useful. Seeing tokens tick up in the menu bar while iterating on prompts naturally makes you more cost-conscious.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

Understanding OpenAI's Pricing Model

Current OpenAI Model Pricing (2025)

The Token Math: How to Calculate Costs

Quick Reference: Token Counting

Real Calculation Examples

Example 1: Customer Support Chatbot

Example 2: Document Summarizer

Example 3: AI Coding Assistant

Building a Cost Estimator in Python

Cost Optimization Strategies

1. Prompt Caching

2. Model Selection Routing

3. Batch Processing

4. Trim Conversation History

5. Set max_tokens Appropriately

OpenAI vs Anthropic vs Gemini: Pricing Comparison

Monitoring Costs in Production

Monthly Cost Estimator: Quick Reference

Common Cost Traps to Avoid

Summary: A Decision Framework for Model Selection

Level Up Your Dev Workflow

5. Set `max_tokens` Appropriately