OpenAI announced GPT-5.5 on April 23, 2026. The API rolled out one day later on April 24. Four days in, the marketing claims and benchmark hype are everywhere — but the picture is more nuanced than headlines suggest.
This post is a primary-source-first digest: every claim is checked against openai.com, developers.openai.com, and the OpenAI/Codex GitHub issue tracker, and cross-validated with coverage from CNBC, TechCrunch, Fortune, and Help Net Security.
## What Was Actually Announced
- Announcement: April 23, 2026 (presented by Brockman, Glaese, Chen, and Pachocki, not Sam Altman)
- API release: April 24, 2026 (one day later, after a separate safeguard process)
- Model IDs: gpt-5.5; snapshot gpt-5.5-2026-04-23
- Knowledge cutoff: December 1, 2025
- ChatGPT availability: Plus, Pro, Business, Enterprise (immediate)
- Codex availability: Plus, Pro, Business, Enterprise, Edu, Go
## Benchmarks: Where GPT-5.5 Is SOTA
All scores below are at reasoning effort xhigh per OpenAI's official tables.
| Benchmark | GPT-5.4 | GPT-5.5 | Delta / Notes |
|---|---|---|---|
| Terminal-Bench 2.0 | 75.1% | 82.7% | +7.6pp (SOTA) |
| Expert-SWE (internal) | 68.5% | 73.1% | +4.6pp |
| GDPval (knowledge work) | 83.0% | 84.9% | +1.9pp; beats Claude Opus 4.7 (80.3%) |
| OSWorld-Verified | 75.0% | 78.7% | +3.7pp; beats Claude (78.0%) |
| Tau2-bench Telecom | 92.8% | 98.0% | +5.2pp; no prompt tuning |
| FrontierMath Tier 4 | 27.1% | 35.4% | +8.3pp; beats Claude (22.9%) |
| ARC-AGI-2 | 73.3% | 85.0% | +11.7pp |
| MRCR v2 8-needle 512K-1M | 36.6% | 74.0% | +37.4pp (2x recovery) |
The MRCR v2 long-context recovery is particularly impressive. GPT-5.4 was losing more than half the needles in the 512K-1M range; GPT-5.5 retains roughly three-quarters.
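The "2x recovery" framing checks out arithmetically; a quick sanity check on the quoted MRCR v2 figures:

```python
# Sanity check on the MRCR v2 8-needle (512K-1M) numbers quoted above:
# GPT-5.4 recovered 36.6% of needles, GPT-5.5 recovers 74.0%.
gpt_5_4 = 36.6
gpt_5_5 = 74.0

delta_pp = gpt_5_5 - gpt_5_4  # percentage-point improvement
ratio = gpt_5_5 / gpt_5_4     # relative recovery multiplier

print(f"delta: +{delta_pp:.1f}pp")  # delta: +37.4pp
print(f"ratio: {ratio:.2f}x")       # ratio: 2.02x, i.e. roughly "2x recovery"
```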
## Where GPT-5.5 Is NOT #1
This is the part most marketing posts skip. Per OpenAI's own published comparison tables:
| Benchmark | GPT-5.5 | Leader | Lead Margin |
|---|---|---|---|
| SWE-Bench Pro | 58.6% | Claude Opus 4.7 (64.3%) | -5.7pp |
| GPQA Diamond | 93.6% | Claude Opus 4.7 (94.2%) | -0.6pp |
| Humanity's Last Exam (with tools) | 52.2% | Claude Opus 4.7 (54.7%) | -2.5pp |
| ARC-AGI-1 (Verified) | 95.0% | Gemini 3.1 Pro (98.0%) | -3.0pp |
| BrowseComp | 84.4% | Gemini 3.1 Pro (85.9%) | -1.5pp |
OpenAI itself notes in the announcement that SWE-Bench Pro has potential memorization concerns documented in the literature. Take any single benchmark with appropriate skepticism.
## The 1M Context Catch
OpenAI markets GPT-5.5 with a "1M context window." The exact number per developer docs is 1,050,000 tokens. But this number depends heavily on where you use it.
| Environment | Context | Source |
|---|---|---|
| API (gpt-5.5) | 1,050,000 tokens | developers.openai.com |
| Codex (official) | 400,000 tokens | OpenAI announcement |
| Codex (measured) | 258,400 tokens (bug report) | openai/codex#19319 |
| Max output | 128,000 tokens | developers.openai.com |
Users in the GitHub issue are reporting "exceeds the context window" errors at unexpectedly low input sizes. If you're building tooling that depends on the full 1M window, validate the actual environment, not the marketing claim.
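One way to validate is a conservative pre-flight check against the effective limit of each environment. The limits below come from the table above; the measured Codex figure is from an unverified bug report (openai/codex#19319) and may change:

```python
# Hedged sketch: pre-flight check a request against the *effective*
# context limit per environment, rather than the marketed 1M figure.
EFFECTIVE_CONTEXT = {
    "api": 1_050_000,           # developers.openai.com
    "codex_official": 400_000,  # OpenAI announcement
    "codex_measured": 258_400,  # GitHub issue report, unverified
}
MAX_OUTPUT = 128_000  # documented output cap

def fits(input_tokens: int, output_tokens: int, env: str = "codex_measured") -> bool:
    """Conservatively check whether a request fits the environment's window."""
    if output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + output_tokens <= EFFECTIVE_CONTEXT[env]

# A 300K-token prompt fits the official Codex limit but not the measured one:
print(fits(300_000, 8_000, "codex_official"))  # True
print(fits(300_000, 8_000, "codex_measured"))  # False
```

Defaulting to the measured (lower) figure is deliberate: tooling that plans against the official number is exactly what the bug report says is failing.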
## Pricing: 2-3x Increase + Long-Context Premium
The published API pricing:
gpt-5.5:
- Input: $5.00 / 1M tokens
- Output: $30.00 / 1M tokens
- Cached input: $0.50 / 1M tokens

gpt-5.5-pro (parallel test-time compute variant):
- Input: $30.00 / 1M tokens
- Output: $180.00 / 1M tokens
That is 2x GPT-5.4's input price ($2.50 → $5.00) and 3x its output price ($10.00 → $30.00).
The hidden premium: inputs over 272K tokens get 2x input cost and 1.5x output cost. So if you actually use the full 1M window, you're paying double on input. This makes "1M context is essentially priced twice" a fair characterization.
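A minimal cost sketch makes the premium concrete. One assumption to flag: whether the multipliers apply to the whole request or only the tokens above the threshold is not spelled out here, so this takes the conservative reading (whole request):

```python
# Sketch of the long-context premium described above: inputs over 272K
# tokens billed at 2x input / 1.5x output. Assumes the multiplier
# applies to the entire request (conservative reading; verify in docs).
INPUT_RATE = 5.00 / 1_000_000    # $/token, gpt-5.5 standard input
OUTPUT_RATE = 30.00 / 1_000_000  # $/token, gpt-5.5 standard output
LONG_CONTEXT_THRESHOLD = 272_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if over else (1.0, 1.0)
    return input_tokens * INPUT_RATE * in_mult + output_tokens * OUTPUT_RATE * out_mult

print(f"${request_cost(200_000, 10_000):.2f}")    # below threshold: $1.30
print(f"${request_cost(1_000_000, 10_000):.2f}")  # full window: $10.45
```

Note the full-window request costs roughly 8x the sub-threshold one despite only 5x the input tokens: the doubled input rate dominates.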
Other pricing modifiers:
- Batch / Flex: 50% of standard
- Priority processing: 250% of standard
- Regional processing (data residency): +10%
- Codex Fast mode: 1.5x speed at 2.5x cost
OpenAI argues that token efficiency improvements offset the price hike for many workloads. Your mileage will depend heavily on the work type.
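For estimating a workload, the modifiers above can be folded into a single effective-rate helper. How (or whether) these modifiers stack is an assumption here and should be checked against OpenAI's docs:

```python
# Hedged sketch combining the pricing modifiers listed above into an
# effective input rate ($/1M tokens) for gpt-5.5. Stacking behavior
# (e.g. batch + regional) is an assumption, not documented here.
BASE_INPUT = 5.00  # $/1M tokens, gpt-5.5 standard input

MODE_MULTIPLIERS = {
    "standard": 1.0,
    "batch": 0.5,     # Batch / Flex: 50% of standard
    "priority": 2.5,  # Priority processing: 250% of standard
    "cached": 0.1,    # Cached input: $0.50 / 1M
}
REGIONAL_SURCHARGE = 1.10  # data residency: +10%

def effective_input_rate(mode: str = "standard", regional: bool = False) -> float:
    rate = BASE_INPUT * MODE_MULTIPLIERS[mode]
    return rate * REGIONAL_SURCHARGE if regional else rate

print(effective_input_rate("batch"))   # 2.5 ($/1M)
print(effective_input_rate("cached"))  # 0.5 ($/1M)
print(effective_input_rate("standard", regional=True))
```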
## Safety: AISI Found a Universal Jailbreak in 6 Hours
The UK AI Security Institute (AISI) ran a 6-hour expert red team and found a universal jailbreak before launch. OpenAI says they fixed it before release. However, per Transformer News, AISI did not directly verify the fix in the final deployment configuration.
GPT-5.5 is rated "High" on OpenAI's Preparedness Framework for both cybersecurity and biology (below "Critical" but above prior models). OpenAI launched a Bio Bug Bounty program for finding biology safeguard bypasses.
For cybersecurity, OpenAI is positioning defensively via the Trusted Access for Cyber program — vetted defenders get expanded access to GPT-5.5's cyber capabilities. SecureBio evaluation reportedly found "wet-lab virology troubleshooting assistance above expert level," which is the basis for the High rating.
## Practical Guidance (Day 4)
Real-world feedback is still thin. Based on what OpenAI has published:
Use GPT-5.5 for:
- Agentic coding workflows (Terminal-Bench-style tasks)
- Computer use / OS automation (OSWorld-Verified)
- Long-context recall in the 512K-1M range
- Tier-4 frontier mathematics
- Knowledge work where GDPval is representative
Consider Claude Opus 4.7 for:
- Pure SWE-Bench Pro-style coding tasks
- Academic reasoning (GPQA Diamond)
- Humanity's Last Exam-style questions
Cost optimization:
- Stay below 272K input tokens to avoid the long context premium
- Use Batch/Flex modes for 50% off when latency is flexible
- Cached input drops cost to $0.50/1M (90% savings)
- In Codex, plan for ~258-400K context, not 1M
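The first tip above (stay below 272K input tokens) can be operationalized with a simple chunk planner. The overhead parameter is a hypothetical placeholder for system-prompt and instruction tokens, not anything from OpenAI's docs:

```python
# Sketch: plan chunk sizes for a long corpus so each request stays at or
# below the 272K long-context pricing threshold. `overhead` reserves
# room for the prompt wrapper (hypothetical figure; tune per workload).
THRESHOLD = 272_000

def chunk_token_counts(total_tokens: int, overhead: int = 2_000) -> list[int]:
    """Split a token total into per-request chunks under the threshold."""
    budget = THRESHOLD - overhead
    chunks = []
    remaining = total_tokens
    while remaining > 0:
        take = min(budget, remaining)
        chunks.append(take)
        remaining -= take
    return chunks

# An 800K-token corpus becomes 3 sub-threshold requests instead of one
# premium-priced call:
print(chunk_token_counts(800_000))  # [270000, 270000, 260000]
```

Whether three cheap calls beat one premium call depends on how much cross-chunk context the task needs, which is a quality trade-off, not just a pricing one.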
## Caveats Worth Repeating
- All benchmarks above are at reasoning effort xhigh. Default API settings will likely produce lower scores.
- We're 4 days post-launch. External reproduction and independent evaluation are pending.
- OpenAI's comparison tables have empty cells (-) for some Claude/Gemini entries, so "SOTA across the board" is an overstatement of what was actually published.
- Korean and other non-English language performance is not specifically benchmarked in the announcement.
## References
- Introducing GPT-5.5 - OpenAI
- GPT-5.5 System Card
- GPT-5.5 API Docs
- CNBC coverage (Ashley Capoot)
- TechCrunch coverage (Lucas Ropek)
- Fortune coverage (Sharon Goldman)
- Help Net Security - cybersecurity safeguards
- Transformer News - AISI red team
- GitHub openai/codex#19319 - context window bug
Disclaimer: AI-assisted research digest. Verify primary sources before making decisions. We're 4 days into the release; expect updates as third-party evaluations come in.