DEV Community

楊東霖
楊東霖

Posted on • Originally published at devplaybook.cc

OpenAI API Pricing Calculator: How to Estimate Your Monthly Costs

One of the most common developer surprises when building with OpenAI's API is the bill at the end of the month. Token-based pricing is not intuitive, and it's easy to underestimate costs when you're moving fast in development. A chatbot that seems cheap in testing can become expensive at scale if you haven't done the math.

This guide shows you how to estimate your OpenAI API costs before they surprise you — with real calculation examples, optimization strategies, and a cross-provider comparison so you can make informed model choices.


Understanding OpenAI's Pricing Model

OpenAI charges based on tokens — not requests, not words, not time. A token is roughly 4 characters or 0.75 words in English. Every API call consumes input tokens (your prompt) and produces output tokens (the response). You pay for both, at different rates.

Key insight: Input tokens are always cheaper than output tokens. For most models, output costs 3–5× more per token than input. This matters for how you architect your prompts.


Current OpenAI Model Pricing (2025)

Prices are per 1 million tokens unless otherwise noted:

Model Input Cached Input Output
GPT-4o $2.50 $1.25 $10.00
GPT-4o-mini $0.15 $0.075 $0.60
o3 $10.00 $2.50 $40.00
o3-mini $1.10 $0.55 $4.40
o1 $15.00 $7.50 $60.00

Notes:

  • Cached input pricing applies when the same prompt prefix is reused (see Prompt Caching below)
  • o3 and o1 are reasoning models — they think before responding, producing "reasoning tokens" that contribute to cost
  • Check platform.openai.com/docs/pricing for the latest figures — prices change

The Token Math: How to Calculate Costs

The formula is straightforward:

Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)
Enter fullscreen mode Exit fullscreen mode

Where:

input_price_per_token = model_input_price / 1,000,000
output_price_per_token = model_output_price / 1,000,000
Enter fullscreen mode Exit fullscreen mode

Quick Reference: Token Counting

Content Approximate Token Count
Single word ~1 token
Average sentence ~15–20 tokens
1 page of text (~500 words) ~650–750 tokens
Short paragraph ~100 tokens
System prompt (200 words) ~260 tokens
Full book chapter ~4,000–8,000 tokens

You can count tokens precisely using OpenAI's tiktoken library:

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Example
system_prompt = "You are a helpful customer support assistant for an e-commerce store."
user_message = "I ordered 3 days ago and my package hasn't arrived. Order number is #78432."

system_tokens = count_tokens(system_prompt)
user_tokens = count_tokens(user_message)
total_input = system_tokens + user_tokens

print(f"System prompt: {system_tokens} tokens")
print(f"User message: {user_tokens} tokens")
print(f"Total input: {total_input} tokens")
Enter fullscreen mode Exit fullscreen mode

Real Calculation Examples

Example 1: Customer Support Chatbot

Setup:

  • System prompt: 300 tokens (role definition, product info, guidelines)
  • Average user message: 80 tokens
  • Average conversation history passed: 400 tokens (last 3 turns)
  • Average assistant response: 200 tokens
  • Volume: 50,000 conversations/month

Per conversation cost (GPT-4o):

Input tokens:  300 (system) + 80 (user) + 400 (history) = 780 tokens
Output tokens: 200 tokens

Input cost:  780 × ($2.50 / 1,000,000)  = $0.00195
Output cost: 200 × ($10.00 / 1,000,000) = $0.00200
Cost per conversation: ~$0.004
Enter fullscreen mode Exit fullscreen mode

Monthly cost at 50,000 conversations:

50,000 × $0.004 = $200/month (GPT-4o)
50,000 × $0.00024 = $12/month (GPT-4o-mini)
Enter fullscreen mode Exit fullscreen mode

Recommendation: Use GPT-4o-mini for straightforward support queries and escalate to GPT-4o only for complex cases. Potential savings: 94%.


Example 2: Document Summarizer

Setup:

  • Average document length: 3,000 tokens
  • System prompt: 150 tokens
  • Average summary output: 400 tokens
  • Volume: 10,000 documents/month

Per document cost (GPT-4o):

Input tokens:  3,000 (doc) + 150 (system) = 3,150 tokens
Output tokens: 400 tokens

Input cost:  3,150 × ($2.50 / 1,000,000)  = $0.007875
Output cost: 400 × ($10.00 / 1,000,000)   = $0.004000
Cost per document: ~$0.012
Enter fullscreen mode Exit fullscreen mode

Monthly cost at 10,000 documents:

GPT-4o:      10,000 × $0.012  = $120/month
GPT-4o-mini: 10,000 × $0.00072 = $7.20/month
Enter fullscreen mode Exit fullscreen mode

For summarization tasks where you don't need frontier reasoning, GPT-4o-mini produces excellent results at a fraction of the cost.


Example 3: AI Coding Assistant

Setup:

  • System prompt: 500 tokens (coding guidelines, language constraints)
  • Average code context passed: 1,500 tokens
  • Average user question: 100 tokens
  • Average code response: 800 tokens
  • Volume: 5,000 requests/month

Per request cost (GPT-4o):

Input tokens:  500 + 1,500 + 100 = 2,100 tokens
Output tokens: 800 tokens

Input cost:  2,100 × ($2.50 / 1,000,000)  = $0.00525
Output cost: 800 × ($10.00 / 1,000,000)   = $0.00800
Cost per request: ~$0.013
Enter fullscreen mode Exit fullscreen mode

Monthly cost:

GPT-4o:   5,000 × $0.013  = $65/month
o3-mini:  5,000 × ($2,100 × 0.0000011 + $800 × 0.0000044) = ~$23/month
Enter fullscreen mode Exit fullscreen mode

For coding tasks requiring deeper reasoning, o3-mini can outperform GPT-4o at a lower cost — run your own evals to verify for your specific use case.


Building a Cost Estimator in Python

Here's a reusable calculator you can adapt for your own apps:

# OpenAI pricing per million tokens (update as needed)
PRICING = {
    "gpt-4o": {
        "input": 2.50,
        "cached_input": 1.25,
        "output": 10.00
    },
    "gpt-4o-mini": {
        "input": 0.15,
        "cached_input": 0.075,
        "output": 0.60
    },
    "o3": {
        "input": 10.00,
        "cached_input": 2.50,
        "output": 40.00
    },
    "o3-mini": {
        "input": 1.10,
        "cached_input": 0.55,
        "output": 4.40
    }
}

def estimate_monthly_cost(
    model: str,
    avg_input_tokens: int,
    avg_output_tokens: int,
    monthly_requests: int,
    cached_input_fraction: float = 0.0
) -> dict:
    """
    Estimate monthly API costs.

    Args:
        model: Model name (e.g., "gpt-4o")
        avg_input_tokens: Average input tokens per request
        avg_output_tokens: Average output tokens per request
        monthly_requests: Expected monthly request volume
        cached_input_fraction: Fraction of input that qualifies for cached pricing (0.0–1.0)
    """
    if model not in PRICING:
        raise ValueError(f"Unknown model: {model}. Available: {list(PRICING.keys())}")

    prices = PRICING[model]

    # Split input into cached vs non-cached
    cached_tokens = avg_input_tokens * cached_input_fraction
    fresh_tokens = avg_input_tokens * (1 - cached_input_fraction)

    cost_per_request = (
        (fresh_tokens * prices["input"] / 1_000_000) +
        (cached_tokens * prices["cached_input"] / 1_000_000) +
        (avg_output_tokens * prices["output"] / 1_000_000)
    )

    monthly_cost = cost_per_request * monthly_requests

    return {
        "model": model,
        "cost_per_request_usd": round(cost_per_request, 6),
        "monthly_cost_usd": round(monthly_cost, 2),
        "monthly_input_tokens": avg_input_tokens * monthly_requests,
        "monthly_output_tokens": avg_output_tokens * monthly_requests,
    }


def compare_models(avg_input_tokens, avg_output_tokens, monthly_requests):
    """Compare costs across all models for a given workload."""
    results = []
    for model in PRICING:
        result = estimate_monthly_cost(
            model, avg_input_tokens, avg_output_tokens, monthly_requests
        )
        results.append(result)
    return sorted(results, key=lambda x: x["monthly_cost_usd"])


# Example: Chatbot with 1,000 tokens input, 300 tokens output, 100k requests/month
comparison = compare_models(1000, 300, 100_000)
for r in comparison:
    print(f"{r['model']:20} ${r['monthly_cost_usd']:>10,.2f}/month")
Enter fullscreen mode Exit fullscreen mode

Output for this example:

gpt-4o-mini          $    330.00/month
o3-mini              $  2,420.00/month
gpt-4o               $  5,500.00/month
o3                   $ 22,000.00/month
Enter fullscreen mode Exit fullscreen mode

Paste the JSON output into DevPlaybook's JSON Formatter to inspect your cost breakdown in a readable format.


Cost Optimization Strategies

1. Prompt Caching

OpenAI's prompt caching automatically caches repeated prompt prefixes and charges 50% for cached tokens. This is highly effective when you have a large, static system prompt.

To maximize cache hits:

  • Keep your static content (system prompt, reference documents, examples) at the beginning of the prompt
  • Keep dynamic content (user message, conversation history) at the end
  • Cache hits apply when the same prefix appears in multiple requests within the caching window

Potential savings: 30–60% on input costs for applications with large, reused system prompts.

2. Model Selection Routing

Not every request needs your most expensive model. Route by complexity:

def select_model(task_type: str, complexity_score: int) -> str:
    """
    Select model based on task type and complexity.
    complexity_score: 1 (trivial) to 10 (highly complex)
    """
    if task_type in ["classification", "keyword_extraction", "sentiment"] or complexity_score <= 3:
        return "gpt-4o-mini"

    if task_type in ["code_reasoning", "math", "multi_step_analysis"] or complexity_score >= 8:
        return "o3-mini"

    return "gpt-4o"  # Default for mid-complexity tasks
Enter fullscreen mode Exit fullscreen mode

3. Batch Processing

OpenAI's Batch API offers a 50% cost reduction for non-real-time workloads (24-hour turnaround):

from openai import OpenAI
import json

client = OpenAI()

# Prepare batch
requests = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarize: {document}"}],
            "max_tokens": 200
        }
    }
    for i, document in enumerate(documents)
]

# Write JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch submitted: {batch.id}")
Enter fullscreen mode Exit fullscreen mode

When to use batching: Nightly report generation, bulk document processing, offline classification jobs. Not appropriate for user-facing real-time responses.

4. Trim Conversation History

Passing your full conversation history on every turn is one of the most common cost leaks:

def trim_history(messages: list, max_tokens: int = 2000) -> list:
    """Keep only the most recent messages that fit within token budget."""
    import tiktoken
    enc = tiktoken.encoding_for_model("gpt-4o")

    trimmed = []
    token_count = 0

    # Always keep the system message (index 0)
    # Iterate history in reverse to keep most recent
    for message in reversed(messages[1:]):
        msg_tokens = len(enc.encode(message["content"]))
        if token_count + msg_tokens > max_tokens:
            break
        trimmed.insert(0, message)
        token_count += msg_tokens

    return [messages[0]] + trimmed  # Prepend system message
Enter fullscreen mode Exit fullscreen mode

5. Set max_tokens Appropriately

If you only need a 100-word answer, don't set max_tokens=4096. While you only pay for tokens actually generated, a tight max_tokens budget forces the model to be concise, which often improves quality for structured outputs.


OpenAI vs Anthropic vs Gemini: Pricing Comparison

As of early 2025, for the mid-tier "best value" models:

Provider Model Input (per 1M) Output (per 1M) Context Window
OpenAI GPT-4o $2.50 $10.00 128K
OpenAI GPT-4o-mini $0.15 $0.60 128K
Anthropic Claude Sonnet 4.6 $3.00 $15.00 200K
Anthropic Claude Haiku 4.5 $0.80 $4.00 200K
Google Gemini 1.5 Pro $1.25 $5.00 1M
Google Gemini 1.5 Flash $0.075 $0.30 1M

Key observations:

  1. Cheapest for high-volume tasks: Gemini Flash is the most economical option for simple, high-volume tasks. GPT-4o-mini is close behind.

  2. Best context window: Claude Sonnet 4.6's 200K context window is a practical advantage for document processing. Gemini 1.5 Pro's 1M context window is unmatched for extremely long documents.

  3. Reasoning models: OpenAI's o3/o3-mini and Anthropic's Claude for extended thinking are premium offerings — use them only when reasoning depth meaningfully improves your output quality.

  4. Price vs quality trade-offs differ by task type: Run your own benchmarks on your specific use case. General benchmarks rarely match real-world results for specialized applications.


Monitoring Costs in Production

Track token usage on every API call:

import json
from datetime import datetime, UTC
from openai import OpenAI

client = OpenAI()
PRICING = {"gpt-4o": {"input": 2.50, "output": 10.00}, "gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def tracked_completion(model, messages, **kwargs):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs
    )

    usage = response.usage
    prices = PRICING.get(model, {"input": 0, "output": 0})

    cost = (
        usage.prompt_tokens * prices["input"] / 1_000_000 +
        usage.completion_tokens * prices["output"] / 1_000_000
    )

    log_entry = {
        "ts": datetime.now(UTC).isoformat(),
        "model": model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": round(cost, 8)
    }

    with open("api_usage.jsonl", "a") as f:
        f.write(json.dumps(log_entry) + "\n")

    return response

# Usage
response = tracked_completion(
    "gpt-4o-mini",
    [{"role": "user", "content": "Summarize this article..."}]
)
Enter fullscreen mode Exit fullscreen mode

Use DevPlaybook's API Tester to manually test your OpenAI API calls during development. It supports custom headers (for Authorization: Bearer ...), request bodies, and displays the full response including usage stats.


Monthly Cost Estimator: Quick Reference

Use this table to estimate costs for common workload sizes. Assumes 1,000 input + 300 output tokens per request.

Monthly Requests GPT-4o-mini GPT-4o o3-mini
1,000 $0.33 $3.50 $2.42
10,000 $3.30 $35.00 $24.20
100,000 $33.00 $350.00 $242.00
1,000,000 $330.00 $3,500.00 $2,420.00

At 1M requests/month:

  • If GPT-4o-mini quality is sufficient → $330/month
  • If you need GPT-4o quality → $3,500/month
  • That's a 10× cost difference for the same workload

Model selection is the highest-leverage cost decision you'll make.


Common Cost Traps to Avoid

1. Not trimming conversation history. Every message in messages[] costs tokens. A long chat session can accumulate thousands of tokens of history per request.

2. Overly verbose system prompts. System prompts are paid for on every request. Audit yours for unnecessary verbiage. A 2,000-token system prompt at 100K requests/month costs an extra $500 on GPT-4o.

3. Using o3 where GPT-4o-mini would do. Reasoning models are 67× more expensive than GPT-4o-mini. Reserve them for tasks where deep reasoning genuinely changes the output quality.

4. Ignoring cached input pricing. If you have a large static system prompt, restructure your prompts to maximize cache hits. This alone can cut input costs by half.

5. Not setting usage alerts. Set budget alerts in the OpenAI console. A runaway loop in development can generate thousands of requests before you notice. Cap your development API key's usage limit.


Summary: A Decision Framework for Model Selection

Is the task simple? (classification, extraction, short summaries)
  → GPT-4o-mini (or Gemini Flash for maximum savings)

Does the task involve complex reasoning? (multi-step math, code debugging, analysis)
  → Test GPT-4o first; escalate to o3-mini only if quality is insufficient

Do you need maximum context window?
  → Claude Sonnet 4.6 (200K) or Gemini 1.5 Pro (1M)

Is this a non-real-time batch workload?
  → Use OpenAI Batch API for 50% discount

Are you serving users in real time with tight latency requirements?
  → GPT-4o-mini or Haiku 4.5 — both return responses in under 2 seconds
Enter fullscreen mode Exit fullscreen mode

The most important thing you can do is instrument your app from day one. Log every token count, track cost per request, and set budget alerts. Surprises happen when you're not watching.


DevPlaybook offers free developer tools at devplaybook.cc, including a JSON Formatter and API Tester useful for working with LLM APIs. No affiliate relationship with OpenAI, Anthropic, or Google.


Level Up Your Dev Workflow

Found this useful? Explore DevPlaybook — cheat sheets, tool comparisons, and hands-on guides for modern developers.

🛒 Get the DevToolkit Starter Kit on Gumroad — 40+ browser-based dev tools, source code + deployment guide included.

Top comments (0)