Om Shree
OpenAI Just Released GPT-5.5. Here's What It Actually Does (and What It Costs You)

GPT-5.4 shipped on March 5. Seven weeks later, on April 23, 2026, OpenAI released GPT-5.5 — and the pace alone tells you something about where this race is headed. This isn't iteration for iteration's sake. GPT-5.5 is a genuinely different model from the ground up, and if you're building on top of OpenAI's stack, the changes matter in ways that go beyond the benchmark table.

Here's everything developers need to know.


The Problem It's Solving

The core complaint with every prior GPT-5.x model was the same: impressive on individual tasks, but brittle on anything that required sustained, multi-step reasoning. You'd hand it a complex task, get a decent first pass, and then spend the next hour managing every subsequent step yourself.

GPT-5.5 is designed to handle messy, multi-part tasks where you can trust it to plan, use tools, check its own work, navigate ambiguity, and keep going without babysitting (OpenAI). That's the stated goal, and unlike most model launch claims, there's enough third-party benchmark data to take it seriously.


How GPT-5.5 Actually Works

The first thing to understand about GPT-5.5 is architectural. Every GPT model since GPT-5 — versions 5.1 through 5.4 — was built on the same base architecture. GPT-5.5 breaks that pattern entirely. It's a model trained from scratch (LushBinary). That's not a minor detail. Fresh base training means the model reasons differently at a fundamental level, particularly in how it maintains context across long, multi-file, multi-step tasks.

GPT-5.5 ships in two variants: the standard model (GPT-5.5 Thinking) and a higher-compute version called GPT-5.5 Pro. The model supports text and image input and has a context window of approximately 920K tokens (Artificial Analysis). In Codex specifically, GPT-5.5 can be accessed with a 400,000-token context window across Plus, Pro, Business, Enterprise, Edu, and Go plans (gHacks Tech News).
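A 400K-token window still needs client-side budgeting for long repo contexts. A minimal sketch of a guard, assuming the common rough heuristic of ~4 characters per token (swap in a real tokenizer such as tiktoken for production use):

```python
# Rough guard for the 400K-token Codex context window cited above.
# The 4-chars-per-token ratio is a heuristic, not an exact tokenizer.

CODEX_CONTEXT_TOKENS = 400_000

def estimate_tokens(text: str) -> int:
    """Approximate token count at ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 32_000) -> bool:
    """True if the prompt leaves room for the reserved output budget."""
    return estimate_tokens(prompt) + reserved_for_output <= CODEX_CONTEXT_TOKENS

print(fits_in_context("x" * 1_000_000))  # ~250K tokens + 32K reserve: True
print(fits_in_context("x" * 2_000_000))  # ~500K tokens: False
```

The `reserved_for_output` margin is an assumption; size it to your longest expected completions.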

GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a significantly higher level of intelligence. It also uses fewer tokens to complete the same Codex tasks (OpenAI). That last point matters for your cost model, which we'll get to.

On the research side, OpenAI has a concrete example worth noting. An internal version of GPT-5.5 with a custom harness helped discover a new proof about Ramsey numbers in combinatorics, later verified in Lean — a case of GPT-5.5 contributing not just code or explanation, but a mathematically novel argument in a core research area (OpenAI).


What Developers Are Actually Using It For

Agentic coding in Codex is the headline use case. The model is designed to handle engineering work such as implementation, refactoring, debugging, testing, and validation as a continuous loop (Developer Tech News).

Real-world signals from early testers are notably specific. Dan Shipper, CEO of Every, said GPT-5.5 reproduced the type of system rewrite one of his engineers had eventually chosen for a post-launch issue, while GPT-5.4 could not. Pietro Schirano, CEO of MagicPath, said the model merged a branch with hundreds of frontend and refactor changes into a main codebase that had also diverged, resolving the work in about 20 minutes. Cursor co-founder Michael Truell noted GPT-5.5 stayed on task longer and showed more reliable tool use than GPT-5.4 (Developer Tech News).

Computer use is meaningfully better. On OSWorld-Verified, which assesses a model's ability to operate in real-world computer environments autonomously, GPT-5.5 achieves 78.7%, up from GPT-5.4's 75.0% (gHacks Tech News).

Knowledge work across 44 occupations is tracked via GDPval. GPT-5.5 scores 84.9% on GDPval and 98.0% on Tau2-bench Telecom, which tests complex customer-service workflows, without prompt tuning (OpenAI).

OpenAI also shared internal use cases: the Finance team used Codex to review 24,771 K-1 tax forms across 71,637 pages, helping accelerate the task by two weeks compared to the prior year. A Go-to-Market employee automated weekly business reporting, saving 5–10 hours per week (OpenAI).

Codex + browser expansion is also new. With GPT-5.5, Codex can interact with web apps, test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task — expanding well beyond the terminal (9to5Mac).


What the Benchmarks Actually Show

OpenAI moved away from SWE-bench Verified as a primary eval, citing plateau concerns. The benchmarks now favored are more demanding and more representative of real work.

On Terminal-Bench 2.0, GPT-5.5 achieves 82.7%, up from GPT-5.4's 75.1%; Claude Opus 4.7 sits at 69.4% (gHacks Tech News). Terminal-Bench tests real command-line workflows: multi-step shell scripting, package management, build configuration, container orchestration. A single wrong flag breaks the chain. This is the benchmark where GPT-5.5's lead is most decisive.

On SWE-Bench Pro, GPT-5.5 scores 58.6%, while Claude Opus 4.7 scores higher at 64.3% (gHacks Tech News). That's an honest trade-off OpenAI included in its own launch materials; publishing a losing number signals confidence in the rest of the benchmark story.

On CyberGym, GPT-5.5 scores 81.8%, versus GPT-5.4's 79.0% and Claude Opus 4.7's 73.1% (gHacks Tech News).

On FrontierMath Tier 1–3, GPT-5.5 scores 51.7%, up from GPT-5.4's 47.6% (Skypage).

One important caveat from third-party testing: in many benchmarks, GPT-5.4 Pro still outperforms the default GPT-5.5 (The New Stack). The Pro tier of the older model remains competitive unless you're specifically targeting the areas where the new architecture shines.


Why This Is a Bigger Deal Than It Looks

Two things make this release significant beyond the spec sheet.

First, the architecture break. Every GPT-5.x model up to 5.4 was a refinement of the same base. GPT-5.5 is not. GPT-5.5 (codenamed "Spud") is the first fully retrained base model since GPT-4.5 (LushBinary). That changes what's possible downstream. The previous models delivered steady improvements to Codex, but each was constrained by the original GPT-5 architecture. GPT-5.5 doesn't have that ceiling.

Second, the super app strategy. Greg Brockman said GPT-5.5 is another step toward a "super app" — a unified service combining ChatGPT, Codex, and an AI browser — that Brockman and Sam Altman envision as the primary interface for enterprise work (TechCrunch). GPT-5.5 is both a model release and an infrastructure move. The cadence — GPT-5.4 on March 5, GPT-5.5 on April 23 — is deliberate. OpenAI is trying to establish category lock-in before enterprise procurement cycles close.

The NVIDIA integration is also notable. GPT-5.5 was co-designed, trained, and served on NVIDIA GB200 and GB300 NVL72 systems. Codex analyzed weeks of production traffic data and wrote custom heuristic algorithms for load balancing and partitioning, resulting in more than 20% faster token generation speeds (Developer Tech News). The model helped optimize its own serving stack. That feedback loop between the model and the infrastructure it runs on is new.


The Part That Should Actually Concern You: Pricing

This is where the release gets complicated for independent developers and smaller teams.

GPT-5.5 API pricing: $5.00 per million input tokens, $30.00 per million output tokens (Apidog). That's double GPT-5.4's input price of $2.50. GPT-5 launched in August 2025 at $0.63 per million input tokens. GPT-5.4 raised that to $2.50 in March 2026, and GPT-5.5 doubles it again to $5.00 — nearly an 8x increase in under a year (Skypage).

GPT-5.5 Pro pricing: $30 per million input tokens and $180 per million output tokens, with Priority processing at 2.5 times the standard rate (EdTech Innovation Hub).

OpenAI's defense of this is token efficiency — the model reaches the same output with fewer tokens, so your actual bill may not double even if the rate does. At 10 million output tokens per month, GPT-5.5 standard comes to $300 versus Claude Opus 4.7's $250. If GPT-5.5's agentic performance delivers 25% fewer task iterations, your effective spend drops to about $225, ahead of the competitor (Build Fast with AI). The math works — if the efficiency gains hold for your specific workload. Benchmark your actual tasks before assuming the sticker price reflects your real cost.
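Those numbers are easy to parameterize. A back-of-the-envelope sketch using the output rates quoted above ($30/M for GPT-5.5; $250 per 10M for Claude Opus 4.7, i.e. $25/M); the 25% efficiency figure is an assumption to vary against your own traces:

```python
def monthly_output_cost(tokens_millions: float, rate_per_million: float) -> float:
    """Output-token cost in dollars for a month of traffic."""
    return tokens_millions * rate_per_million

# Rates from the article: GPT-5.5 at $30/M output, Opus 4.7 at $25/M.
GPT55_RATE, OPUS_RATE = 30.0, 25.0

def break_even_efficiency(rate_a: float, rate_b: float) -> float:
    """Fraction of tokens model A must save to match model B's bill."""
    return 1.0 - rate_b / rate_a

baseline = monthly_output_cost(10, GPT55_RATE)           # $300
competitor = monthly_output_cost(10, OPUS_RATE)          # $250
efficient = monthly_output_cost(10 * 0.75, GPT55_RATE)   # $225 at 25% fewer tokens

print(baseline, competitor, efficient)
print(f"break-even savings: {break_even_efficiency(GPT55_RATE, OPUS_RATE):.1%}")
```

At these rates the true break-even is about 16.7% fewer output tokens; anything beyond that is margin in GPT-5.5's favor.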

One concrete optimization to implement immediately: cached input tokens on GPT-5.5 drop to $0.50 per million — a tenth of the standard rate (Skypage). Cache system prompts, tool schemas, and repo context on anything reused across requests.
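The savings scale with how much of each request is a repeated prefix. A quick sketch of the blended input rate at the prices above ($5/M standard, $0.50/M cached); the hit rates are illustrative:

```python
STANDARD_INPUT = 5.00  # $/M input tokens (GPT-5.5, per the article)
CACHED_INPUT = 0.50    # $/M cached input tokens

def blended_input_rate(cache_hit_fraction: float) -> float:
    """Effective $/M input rate when a fraction of tokens hit the cache."""
    return cache_hit_fraction * CACHED_INPUT + (1 - cache_hit_fraction) * STANDARD_INPUT

# A long system prompt plus tool schemas reused across requests can push
# the cached fraction high for agentic workloads.
for hit in (0.0, 0.5, 0.9):
    print(f"{hit:.0%} cached -> ${blended_input_rate(hit):.2f}/M")
```

At a 90% cache hit rate the effective input rate falls under a dollar per million, well below even GPT-5.4's $2.50 list price.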


Availability and Access

GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT. As of April 24, 2026, both GPT-5.5 and GPT-5.5 Pro are available in the API (OpenAI).

For API access, the model IDs are gpt-5.5 for standard and gpt-5.5-pro for the Pro tier. Both are available through the Chat Completions and Responses APIs.
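A minimal call sketch using the model IDs above. The payload is built as a plain dict so it runs without credentials; with the official `openai` client you would pass the same fields to `client.responses.create(...)`:

```python
import json

def build_responses_request(task: str, pro: bool = False) -> dict:
    """Assemble a Responses API payload for GPT-5.5 or GPT-5.5 Pro."""
    return {
        "model": "gpt-5.5-pro" if pro else "gpt-5.5",  # IDs from the article
        "input": task,
    }

payload = build_responses_request("Refactor the auth module and add tests.")
print(json.dumps(payload, indent=2))
# With the official client, roughly: client.responses.create(**payload)
```

Pin the model ID in config rather than scattering string literals, so a future tier change is a one-line edit.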

On the safety side, OpenAI has classified GPT-5.5's cybersecurity and biological capabilities as High under its Preparedness Framework, though below the Critical threshold. The company is also running a Trusted Access for Cyber program through Codex, allowing verified users expanded access to advanced security capabilities (EdTech Innovation Hub).

Quick cost controls worth building in on day one: route premium, long-horizon tasks to GPT-5.5 and standard queries to GPT-5.4 or GPT-5.4-mini. The per-token price jump makes tiered routing a budget necessity, not an optimization.
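That routing policy can start as a few lines of rules. A sketch of tiered routing; the task categories and the `gpt-5.4-mini` small-tier name are assumptions to adapt to your own taxonomy:

```python
# Route expensive long-horizon work to GPT-5.5, everything else down-tier.
PREMIUM_TASKS = {"agentic_coding", "multi_file_refactor", "long_horizon_plan"}
STANDARD_TASKS = {"summarize", "classify", "draft_reply"}

def pick_model(task_type: str, context_tokens: int) -> str:
    """Choose a model tier by task type and context size."""
    if task_type in PREMIUM_TASKS or context_tokens > 200_000:
        return "gpt-5.5"
    if task_type in STANDARD_TASKS and context_tokens < 8_000:
        return "gpt-5.4-mini"  # hypothetical small-tier model name
    return "gpt-5.4"

print(pick_model("multi_file_refactor", 50_000))  # gpt-5.5
print(pick_model("classify", 2_000))              # gpt-5.4-mini
print(pick_model("draft_reply", 20_000))          # gpt-5.4
```

Static rules like these are a starting point; log the routing decisions alongside per-task cost so you can tune the thresholds against real spend.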


The real story here isn't a single model release — it's the seven-week cadence that produced it. OpenAI is shipping at a pace that forces enterprise decisions before anyone has time to fully evaluate. Whether that serves developers or just locks them in faster is a question the next few months will answer.

Follow for more coverage on MCP, agentic AI, and AI infrastructure.
