GPT-5 from a Developer's Perspective: API Changes, Costs, and When to Upgrade

#openai #ai #webdev #productivity

I have been running GPT-5 in production for about three months across two services. One is a documentation summarizer hitting roughly 40k requests per day, the other is a code review assistant for our internal PR workflow. This post is what I wish someone had written before I migrated, with actual numbers and the things that broke.

What Changed in the API

The endpoint shape is mostly backward compatible. If your code uses client.chat.completions.create(model="gpt-4o", ...) you can swap to model="gpt-5" and most things keep working. The differences show up in three places.

First, the reasoning parameters. GPT-5 exposes a reasoning_effort field that takes "minimal", "low", "medium", or "high". Setting it to "low" gives you something close to GPT-4o behavior at a similar cost. Setting it to "high" invokes the deeper reasoning path and roughly doubles your token cost on the output side. The default is "medium", which is fine for most use cases but worth knowing about if your bill suddenly jumps.

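from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": prompt}],  # prompt: your input string
    reasoning_effort="low",   # cheap, fast, GPT-4o-ish
    max_completion_tokens=2000,  # replaces max_tokens
)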

Second, max_tokens got renamed to max_completion_tokens. The old name still works but emits a deprecation warning. If you have CI that fails on warnings, this will surprise you.
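If you build request arguments in one place, a small shim can do the rename before the call goes out. This is a minimal sketch, not part of the OpenAI SDK; normalize_token_limit is a hypothetical helper name, and it assumes your request arguments live in a plain dict.

```python
def normalize_token_limit(kwargs: dict) -> dict:
    """Return a copy of kwargs with max_tokens renamed to max_completion_tokens."""
    out = dict(kwargs)  # copy so the caller's dict is left untouched
    if "max_tokens" in out:
        legacy = out.pop("max_tokens")
        # If the caller already set max_completion_tokens, keep that value.
        out.setdefault("max_completion_tokens", legacy)
    return out


request = normalize_token_limit({"model": "gpt-5", "max_tokens": 2000})
```

The result can then be passed as client.chat.completions.create(**request), so the old name never reaches the API.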

Third, function calling improved. Tool selection is more reliable, and the model is less likely to call a function with malformed JSON arguments. I used to wrap every tool call in a try-except for JSON parse errors. I still do, but I have not hit one in production for about six weeks.
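For reference, the guard I mean looks roughly like the sketch below. It assumes the tool call arguments arrive as a JSON string (in the chat completions response that is tool_call.function.arguments); parse_tool_arguments is my own helper name, not an SDK function.

```python
import json
from typing import Any, Optional


def parse_tool_arguments(raw: str) -> Optional[dict[str, Any]]:
    """Parse a tool call's argument string, returning None if it is unusable."""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or the field was not a string at all.
        return None
    # Tool arguments should be a JSON object; reject lists, numbers, and so on.
    return parsed if isinstance(parsed, dict) else None
```

Returning None instead of raising lets the caller decide whether to retry, fall back, or log and skip.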

Token Costs and the Actual Bill

Pricing at the time I migrated was roughly $1.25 per million input tokens and $10 per million output tokens for the standard tier, with the reasoning path costing more on output. GPT-4o was $2.50 per million input and $10 per million output. So on the input side, GPT-5 is actually cheaper. The output side depends on whether your workload triggers the reasoning path.
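To make the arithmetic concrete, here is a small sketch that turns those list prices into a dollar figure. The prices are the ones quoted above; the token counts in the usage line are made-up illustration values, and estimate_cost is a hypothetical helper.

```python
# USD per million tokens, taken from the list prices quoted above.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-5": {"input": 1.25, "output": 10.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate spend in USD for a given token volume at list price."""
    price = PRICES[model]
    input_cost = (input_tokens / 1_000_000) * price["input"]
    output_cost = (output_tokens / 1_000_000) * price["output"]
    return input_cost + output_cost


# Illustration only: 10M input tokens and 1M output tokens.
gpt4o_cost = estimate_cost("gpt-4o", 10_000_000, 1_000_000)
gpt5_cost = estimate_cost("gpt-5", 10_000_000, 1_000_000)
```

When you plug in your own numbers, count reasoning tokens as output tokens, since they are billed that way. List-price arithmetic on visible output alone will understate the GPT-5 side.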

For my documentation summarizer, which has a 50:1 input-to-output ratio, the total cost dropped about 30 percent. For the code review service, which has a tighter ratio and benefits from reasoning_effort="medium", the cost went up about 15 percent but the output quality jumped enough that we kept it. There is a thorough writeup comparing GPT-5 pricing and features that includes the reasoning effort cost curves, and the numbers match my observed spend within a couple of percent.

If you are doing high-volume cheap work, look at GPT-5 mini before defaulting to full GPT-5. It is roughly one-fifth the cost and good enough for classification, tagging, simple extraction, and the kind of structured output work where you do not need the deep reasoning path.

Migration Pain Points

The thing that bit me hardest was structured output validation. GPT-5 is better at following JSON schemas, which sounds good, except that my downstream code was tolerant of some weirdness GPT-4o used to produce. When GPT-5 started producing cleaner output, a parsing branch that handled malformed responses stopped firing, and a bug downstream that depended on that branch surfaced. Not GPT-5's fault. Mine for writing code that depended on bad upstream data. But worth flagging.
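What I should have had from the start is an explicit validation step, so that "malformed" is a decision my code makes rather than a side effect of a parser failing. A minimal sketch follows; the field names in REQUIRED_FIELDS are hypothetical and stand in for whatever your schema actually requires.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "summary": str}


def validate_payload(payload: Any) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems
```

Logging the returned problems per request also gives you a baseline to compare against after switching models.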

The second issue was latency. GPT-5 with default settings is slower than GPT-4o. My p50 latency went from 1.8 seconds to 3.1 seconds for a typical request. For batch work this does not matter. For anything user-facing, you need to either drop to reasoning_effort="low" or rethink the UX to handle the wait. I added a typing indicator and a "thinking" status message and users stopped complaining.
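If you want to measure this for your own traffic, something like the sketch below is enough. It uses a nearest-rank percentile, which is one of several common definitions, and timed_call wraps any callable, so your API call goes wherever the example passes a function. Both helper names are mine.

```python
import math
import time
from typing import Any, Callable, Sequence


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Collect the elapsed values from timed_call into a list per model and compare percentile(latencies, 50) and percentile(latencies, 95) before and after the switch.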

When You Should Migrate

Default to GPT-5 if your workload involves any of: multi-step reasoning, code analysis, ambiguous instructions, long context windows, or anything where GPT-4o has been giving you "almost right" outputs that need human cleanup. The cleanup time saved usually beats the latency cost.

Stay on GPT-4o (or move to GPT-5 mini) if your workload is high-volume, low-complexity, latency-sensitive, or already working well. There is no prize for being on the newest model.

Avoid GPT-5 entirely if you have not done a cost projection. The reasoning effort multiplier is real and your bill can move in directions you did not expect.

What I Wish I Had Known

Read your existing logs before migrating. The errors you currently silently tolerate from GPT-4o are the errors that will change shape under GPT-5, and you want to know what your downstream code is actually doing with bad input.

Run both models in parallel for a week, log the diffs, and eyeball a hundred examples. You will catch the cases where GPT-5 is worse for your specific use case (they exist), and you will not be caught by surprise on day one of full migration.
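The parallel run does not need much tooling. Here is a rough sketch of the comparison loop, assuming each model is wrapped in a function that takes a prompt and returns text. The primary and candidate callables are placeholders for your own wrappers, and the whitespace-insensitive comparison is a simplification you will probably want to replace.

```python
from typing import Callable, Iterable


def shadow_compare(
    prompts: Iterable[str],
    primary: Callable[[str], str],
    candidate: Callable[[str], str],
) -> list[dict[str, str]]:
    """Run both models on each prompt and collect the cases where outputs differ."""
    diffs = []
    for prompt in prompts:
        old = primary(prompt)
        new = candidate(prompt)
        # Ignore leading and trailing whitespace so trivial differences are not flagged.
        if old.strip() != new.strip():
            diffs.append({"prompt": prompt, "primary": old, "candidate": new})
    return diffs
```

Write the returned list to a file and review it by hand. For free-text output almost every pair will differ, so sample from the diffs rather than trying to read all of them.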

One pattern I now use everywhere is a routing layer that picks the model per request based on input characteristics. Short prompts and structured extraction go to GPT-5 mini. Long context and code-heavy work go to GPT-5 with medium reasoning effort. Anything where the user is waiting in real time goes to GPT-5 with low reasoning effort. The full implementation is about thirty lines of Python and saves me from picking a single default that is wrong for half my traffic. The core of it is this function:

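def route_model(prompt: str, has_code: bool, user_waiting: bool) -> tuple[str, str]:
    """Return (model, reasoning_effort) for a single request."""
    # Someone is waiting on the response: latency wins.
    if user_waiting:
        return ("gpt-5", "low")
    # len(prompt) counts characters, not tokens; 8000 is a rough cutoff.
    if has_code or len(prompt) > 8000:
        return ("gpt-5", "medium")
    # Everything else is short, simple work.
    return ("gpt-5-mini", "low")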