TL;DR / Quick Answer
GPT-5.4 is OpenAI's most advanced AI model for professional work, released March 5, 2026. It merges GPT-5.3-Codex’s coding strengths with improved reasoning, computer use, and tool integration. GPT-5.4 achieves an 83% win rate on knowledge work (GDPval) and 75% on computer use (OSWorld-Verified), and it uses fewer tokens than GPT-5.2. API pricing is $2.50/M input tokens and $15/M output tokens, with a Pro version ($30/$180) for complex workloads.
Introduction
OpenAI’s GPT-5.4, released March 5, 2026, sets a new bar for AI-powered professional workflows. It posts an 83% win rate against professionals on the GDPval knowledge-work benchmark, uses fewer tokens than GPT-5.2, and makes 33% fewer false claims. In one reported deployment, it also completed computer-use sessions about 3x faster than previous models.
💡 For developers integrating AI, robust API testing is essential. Apidog streamlines API design, debugging, testing, and mocking—ideal when connecting GPT-5.4 or your own services. Its unified platform accelerates team workflows for AI integration.
This guide covers GPT-5.4’s practical improvements, benchmarks, and actionable integration tips. You'll get:
- Direct comparisons with GPT-5.2 and GPT-5.3-Codex
- Benchmark scores: coding, computer use, and knowledge work
- Examples of new computer use and vision features
- Pricing breakdown: Pro vs Standard
- API integration advice
What Is GPT-5.4?
GPT-5.4 is OpenAI’s first general-purpose model with built-in computer use features. It fuses GPT-5.3-Codex’s code generation with improved reasoning, vision, and tool integration.
Professional scenarios targeted:
- Knowledge work: Spreadsheets, presentations, documents, analytics (83% win on GDPval, up from 70.9% for GPT-5.2).
- Computer use/agents: Mouse/keyboard automation, browser scripting, workflow execution (75% OSWorld-Verified, above human 72.4%).
- Coding/development: Writing/debugging code, SWE-Bench Pro 57.7% accuracy, up to 1M token context.
Variants:
- GPT-5.4: For most tasks.
- GPT-5.4 Pro: For complex reasoning ($30/M input, $180/M output).
Key Improvements Over GPT-5.2
GPT-5.4 brings major advances in four areas:
1. Factual Accuracy and Hallucination Reduction
- 33% fewer false claims
- 18% fewer overall errors
- Critical for legal, finance, and technical generation
2. Token Efficiency
- Uses up to 47% fewer tokens in tool-heavy workflows (MCP Atlas)
- Mitigates higher per-token cost for API users
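Whether fewer tokens actually offset the higher price is simple arithmetic. The sketch below uses the list prices from the pricing table in this article ($1.75/$14 for GPT-5.2 and $2.50/$15 for GPT-5.4, per million input/output tokens). The function name is illustrative, and the calculation assumes the workload is otherwise identical on both models.

```python
def break_even_reduction(old_price: float, new_price: float) -> float:
    """Fraction of tokens the new model must save to cost the same as the old one."""
    if old_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return 1.0 - old_price / new_price


# List prices per million tokens, taken from the pricing table in this article.
input_break_even = break_even_reduction(1.75, 2.50)   # GPT-5.2 -> GPT-5.4 input
output_break_even = break_even_reduction(14.0, 15.0)  # GPT-5.2 -> GPT-5.4 output

print(f"input break-even:  {input_break_even:.1%}")
print(f"output break-even: {output_break_even:.1%}")
```

Under these assumptions, GPT-5.4 needs roughly 30% fewer input tokens or about 7% fewer output tokens to break even. The up-to-47% reduction reported for tool-heavy workflows would clear both bars, though savings on other workloads may be smaller.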
3. Computer Use Capabilities
- Native screenshot-to-action automation:
- Mouse/keyboard commands
- Browser scripting via Playwright
- Desktop navigation
- Customizable safety and confirmation
4. Tool Search and Integration
- On-demand tool definition lookup
- No need to load all tool specs into every request
- Enables workflows with tens of thousands of tools
- 54.6% accuracy on Toolathlon (vs 45.7% for GPT-5.2), fewer tool yields needed
GPT-5.4 Performance Benchmarks
Where does GPT-5.4 lead? See results below.
Knowledge Work (GDPval)
| Model | Win Rate vs Professionals |
|---|---|
| GPT-5.4 | 83.0% |
| GPT-5.4 Pro | 82.0% |
| GPT-5.2 Pro | 74.1% |
| GPT-5.2 | 70.9% |
GDPval covers knowledge tasks in 44 job types: sales, accounting, urgent care, manufacturing, and more.
Spreadsheet and Document Creation
- GPT-5.4: 87.3% mean score
- GPT-5.2: 68.4% mean score
For presentations, human reviewers preferred GPT-5.4 outputs 68% of the time.
Coding Performance (SWE-Bench Pro)
| Model | Accuracy | Estimated Latency |
|---|---|---|
| GPT-5.4 | 57.7% | ~1000s |
| GPT-5.3-Codex | 56.8% | ~1200s |
| GPT-5.2 | 55.6% | ~1500s |
GPT-5.4 slightly exceeds GPT-5.3-Codex (57.7% vs 56.8%), delivers lower estimated latency, and supports /fast mode (1.5x token speed).
Computer Use (OSWorld-Verified)
- GPT-5.4: 75.0%
- GPT-5.3-Codex: 74.0% (maintaining full image resolution)
- GPT-5.2: 47.3%
- Human: 72.4%
Tasks: Email/calendar management, data entry, file operations, cross-app workflows.
Web Browsing (BrowseComp)
- GPT-5.4 Pro: 89.3%
- GPT-5.4: 82.7%
- GPT-5.2 Pro: 77.9%
- GPT-5.2: 65.8%
About 17 percentage points better than GPT-5.2 (82.7% vs 65.8%); much better at persistent multi-source research.
Visual Understanding
MMMU Pro (no tools):
- GPT-5.4: 81.2%
- GPT-5.2: 79.5%
OmniDocBench (document parsing, lower is better):
- GPT-5.4: 0.109
- GPT-5.2: 0.140
Computer Use and Vision Capabilities
GPT-5.4 is OpenAI’s first general-purpose model with native computer-use capabilities.
How Computer Use Works
GPT-5.4 interprets screenshots and generates:
- Coordinate-based clicks
- Keyboard input
- Playwright commands for browsers
- Mouse movement/drag
Configure system messages to tune safety/confirmation based on risk.
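To make the action types above concrete, here is a minimal harness-side sketch. The `Action` shape, the `HIGH_RISK_KEYWORDS` list, and `needs_confirmation` are hypothetical names invented for illustration. They are not OpenAI's computer-use schema, and the gate shown here runs in your own application as a complement to any confirmation behavior you configure through system messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Hypothetical action record; NOT the official computer-use schema."""
    kind: str                      # "click", "type", or "drag"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


# Assumed policy: pause for a human when typed text looks like a risky operation.
HIGH_RISK_KEYWORDS = ("delete", "purchase", "transfer", "password")


def needs_confirmation(action: Action, strict: bool = False) -> bool:
    """Return True when the harness should ask a human before executing."""
    if strict:
        return True  # strict mode confirms every action
    if action.kind == "type" and action.text:
        lowered = action.text.lower()
        return any(word in lowered for word in HIGH_RISK_KEYWORDS)
    return False


def describe(action: Action) -> str:
    """Render an action as a log line instead of touching a real desktop."""
    if action.kind == "click":
        return f"click at ({action.x}, {action.y})"
    if action.kind == "type":
        return f"type {action.text!r}"
    if action.kind == "drag":
        return f"drag to ({action.x}, {action.y})"
    raise ValueError(f"unknown action kind: {action.kind}")
```

Logging actions through something like `describe` before wiring up real mouse and keyboard control is a cheap way to review what the model intends to do at each step.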
Real-World Computer Use Example
Mainstay tested GPT-5.4 on 30,000+ property tax portals:
- GPT-5.4: 95% first-attempt success, 100% in three attempts
- Previous models: 73-79%
- Sessions: 3x faster, 70% fewer tokens/session
Handles navigation, data extraction, authentication, captchas, and multi-step forms.
Enhanced Visual Perception
Supports:
- Up to 10.24M pixels
- 6000-pixel max dimension
- High-fidelity for dense images
The high detail option supports up to 2.56M pixels with a 2048-pixel maximum dimension. API users report improved localization and click accuracy.
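If you pre-process screenshots yourself, it helps to know what size an image would need to be to respect both limits listed above. The sketch below is plain geometry: it assumes aspect-ratio-preserving downscaling, which this article does not confirm is how the API resizes images. `fit_within` and the two limit dictionaries are illustrative names.

```python
import math


def fit_within(width: int, height: int, max_pixels: int, max_dim: int) -> tuple[int, int]:
    """Downscale (width, height) to respect both limits, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(
        1.0,                                       # never upscale
        max_dim / max(width, height),              # longest-side limit
        math.sqrt(max_pixels / (width * height)),  # total-pixel limit
    )
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


# Limits quoted in this article.
MAX_LIMITS = {"max_pixels": 10_240_000, "max_dim": 6000}
HIGH_DETAIL_LIMITS = {"max_pixels": 2_560_000, "max_dim": 2048}

print(fit_within(8000, 6000, **MAX_LIMITS))
print(fit_within(8000, 6000, **HIGH_DETAIL_LIMITS))
```

Note that the pixel budget, not the longest side, is usually the binding constraint for roughly square screenshots.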
Document Parsing Improvements
Parses:
- Multi-page PDFs (tables, figures)
- Scanned docs (varied layouts)
- Screenshots with UI/text
- Technical diagrams
OmniDocBench error rate: 0.140 → 0.109 (22% improvement).
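The 22% figure is a relative reduction in error rate, not a difference in points. A quick check, using only the two scores quoted above (the helper name is illustrative):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`; positive means the error shrank."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before


omnidoc = relative_reduction(0.140, 0.109)
print(f"{omnidoc:.1%}")  # prints 22.1%
```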
Coding and Development Features
GPT-5.4 extends GPT-5.3-Codex’s code generation with new computer use support.
Frontend Development
Excels at complex UI generation and browser automation.
Example: Theme Park Simulation
One prompt creates an isometric theme park sim with:
- Tile-based path/ride/scenery placement
- Guest pathfinding/metrics
- Playwright-powered browser playtests
- Isometric asset generation
Model builds, tests, and verifies game mechanics and UI stability.
Fast Mode for Developers
Codex's /fast mode delivers up to 1.5x token speed for rapid iteration; API users get faster responses through priority processing.
Context Window Support
GPT-5.4 in Codex supports up to 1M tokens (experimental):
- Set the model_context_window and model_auto_compact_token_limit parameters
- >272K tokens: usage charged at 2x rate
Use for full codebase analysis, large doc sets, multi-file projects.
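Before sending a very large prompt, it is worth checking which band it falls into. The helper below is a rough sketch under two assumptions that this article does not settle: that the 2x rate applies to the whole request once it exceeds 272K tokens, and that "1M" means exactly 1,000,000 tokens. The function and constant names are illustrative.

```python
STANDARD_CONTEXT = 272_000          # standard window cited in this article
EXPERIMENTAL_CONTEXT = 1_000_000    # experimental upper bound (assumed exact value)


def usage_multiplier(total_tokens: int) -> float:
    """Return 1.0 inside the standard window, 2.0 in the experimental band.

    Assumption: the 2x rate applies to the whole request above 272K tokens.
    """
    if total_tokens < 0:
        raise ValueError("token count cannot be negative")
    if total_tokens > EXPERIMENTAL_CONTEXT:
        raise ValueError("request exceeds the 1M-token experimental window")
    return 2.0 if total_tokens > STANDARD_CONTEXT else 1.0


def weighted_usage(total_tokens: int) -> float:
    """Tokens as they would count toward usage under the assumption above."""
    return total_tokens * usage_multiplier(total_tokens)
```

If the 2x rate turns out to apply only to the tokens beyond 272K, `weighted_usage` would overestimate, so treat its output as an upper bound.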
Apidog for API Documentation:
Keep your API documentation in sync as you integrate GPT-5.4. Apidog imports OpenAPI/Swagger specs, generates interactive docs, and auto-syncs with code changes.
Tool Integration and Search
Tool search changes how the model loads tool definitions:
How Tool Search Works
- Before: All tool definitions loaded upfront (high token cost)
- Now: Model receives tool list, fetches definitions on-demand
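The difference between the two approaches can be simulated in a few lines. This is a toy model, not OpenAI's tool search API: the tool catalog is made up, the lookup is done by ordinary Python code rather than by the model, and serialized character count is only a crude stand-in for tokens.

```python
import json

# Hypothetical tool catalog: name -> full JSON-style definition.
TOOL_DEFINITIONS = {
    "send_email": {
        "description": "Send an email message",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "upload_file": {
        "description": "Upload a file to storage",
        "parameters": {"path": "string", "bucket": "string"},
    },
    "grade_submission": {
        "description": "Grade a submitted assignment",
        "parameters": {"submission_id": "string", "rubric": "string"},
    },
}


def payload_size(obj) -> int:
    """Serialized size in characters, used here as a crude stand-in for tokens."""
    return len(json.dumps(obj))


def upfront_cost() -> int:
    """Before: every full definition is sent with the request."""
    return payload_size(TOOL_DEFINITIONS)


def on_demand_cost(tools_used: list[str]) -> int:
    """Now: send only the tool names, then fetch the definitions actually needed."""
    listing = payload_size(sorted(TOOL_DEFINITIONS))
    fetched = sum(payload_size(TOOL_DEFINITIONS[name]) for name in tools_used)
    return listing + fetched


print(upfront_cost(), on_demand_cost(["send_email"]))
```

With three tools the gap is small. The savings grow with catalog size, because the upfront cost scales with every definition while the on-demand cost scales only with the tools a task actually touches.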
Token Savings Example
Scale’s MCP Atlas: 250 tasks, 36 MCP servers:
Without tool search:
- 65,320 upfront input tokens (definitions)
- More tokens for outputs
With tool search: Eliminates upfront token cost, retains cache efficiency.
MCP Atlas Performance
- GPT-5.4: 67.2% accuracy
- GPT-5.2: 60.6% accuracy
Handles larger tool ecosystems efficiently.
Agentic Tool Calling
Toolathlon evaluates multi-step tool workflows (email, attachments, upload, grading). GPT-5.4 scores 54.6% versus 45.7% for GPT-5.2 while needing fewer tool yields (rounds), which lowers latency.
GPT-5.4 vs GPT-5.3-Codex vs GPT-5.2
Choose your model based on workload:
When to Use GPT-5.4
- Need computer/browser automation
- Advanced knowledge work (spreadsheets, docs, presentations)
- Tool-heavy/agentic workflows (MCP servers, APIs)
- High-volume work where token savings offset higher cost
- Large context (up to 1M tokens)
When GPT-5.3-Codex Remains Competitive
- Pure code tasks (similar SWE-Bench Pro scores)
- Existing Codex integrations without computer use
- Lower per-token pricing (if available)
When GPT-5.2 Suffices
- Simple Q&A, summarization, basic generation
- Strict budget constraints
- Single-turn or non-agentic requests
Pricing Comparison
| Model | Input Price | Cached Input | Output Price |
|---|---|---|---|
| GPT-5.2 | $1.75/M | $0.175/M | $14/M |
| GPT-5.4 | $2.50/M | $0.25/M | $15/M |
| GPT-5.2 Pro | $21/M | - | $168/M |
| GPT-5.4 Pro | $30/M | - | $180/M |
Batch/Flex: 50% discount. Priority: 2x standard rate.
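To compare options for a specific workload, the table translates directly into a small estimator. The prices are copied from the table above. Keying by display name (rather than API model ID) and applying the tier multiplier uniformly to input, cached input, and output are assumptions made for this sketch.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "GPT-5.2": {"input": 1.75, "cached": 0.175, "output": 14.0},
    "GPT-5.4": {"input": 2.50, "cached": 0.25, "output": 15.0},
    "GPT-5.2 Pro": {"input": 21.0, "cached": None, "output": 168.0},
    "GPT-5.4 Pro": {"input": 30.0, "cached": None, "output": 180.0},
}

# Assumption: Batch/Flex and Priority scale every price component equally.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0, tier="standard"):
    """Estimated USD cost of a workload; cached_tokens is the cached share of input."""
    price = PRICES[model]
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and price["cached"] is None:
        raise ValueError(f"no cached-input price listed for {model}")
    fresh = input_tokens - cached_tokens
    cost = fresh * price["input"] + output_tokens * price["output"]
    if cached_tokens:
        cost += cached_tokens * price["cached"]
    return cost / 1_000_000 * TIER_MULTIPLIER[tier]


print(estimate_cost("GPT-5.4", 1_000_000, 1_000_000))
print(estimate_cost("GPT-5.4 Pro", 1_000_000, 1_000_000))
```

For one million input and one million output tokens, this gives $17.50 on standard GPT-5.4 and $210 on GPT-5.4 Pro, so Pro is best reserved for the requests that need it.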
Availability and Access Options
GPT-5.4 is rolling out across ChatGPT, Codex, and API.
ChatGPT Access
- GPT-5.4 Thinking: ChatGPT Plus, Team, Pro
- GPT-5.4 Pro: ChatGPT Pro, Enterprise
- Legacy: GPT-5.2 available until June 5, 2026
Early enterprise/education access via admin settings.
Codex Access
- Default model: GPT-5.4
- Supports 1M context (experimental)
- Playwright Interactive for browser testing
- /fast mode (1.5x token speed)
API Access
Model names:
- gpt-5.4 (standard)
- gpt-5.4-pro (pro)
Context:
- 272K standard
- Up to 1M (experimental, 2x rate)
Pricing:
- Standard: $2.50/M input, $0.25/M cached, $15/M output
- Pro: $30/M input, $180/M output
- Batch/Flex: 50% discount
- Priority: 2x rate
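As a starting point for integration, the sketch below builds a minimal request using only the Python standard library. It assumes the model is called through OpenAI's Responses API endpoint with a plain text `input` field and an `OPENAI_API_KEY` environment variable. The helper names are illustrative, and any GPT-5.4-specific options should be taken from the official API reference rather than from this example.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"  # assumed endpoint for this sketch


def build_request(prompt: str, model: str = "gpt-5.4", api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request for a single text prompt."""
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or mock the payload in tests before any tokens are billed. Switching to the Pro model is a matter of passing `model="gpt-5.4-pro"`.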
Deprecation Timeline
GPT-5.2 retires June 5, 2026. Migrate workflows before then.
Conclusion
GPT-5.4 offers significant advances in accuracy, efficiency, and capability for knowledge work, coding, and computer use. Its 83% GDPval win rate, 75% OSWorld-Verified, and 57.7% SWE-Bench Pro scores set a new standard for AI productivity.
For API integration, use robust tools like Apidog to design, debug, and document endpoints. This ensures reliable GPT-5.4-powered workflows from day one.
Key takeaways:
- 33% fewer false claims, 18% fewer response errors
- Up to 47% token reduction in tool-heavy workflows
- 75% computer use success, better than human baseline
- Native mouse/keyboard operation
- Tool search: scale to tens of thousands of tools
- 1M token context window (experimental)
- Standard: $2.50/$15 per million tokens
When to adopt:
- Need for computer use/browser automation
- High-volume/token-sensitive workflows
- Factual accuracy is critical
- Large tool ecosystem or MCP server integration
- Long-context code or document analysis
When to wait:
- Simple Q&A workflows
- Strict budget priorities
- Existing GPT-5.2/5.3-Codex workflows suffice
GPT-5.4 is OpenAI’s most efficient reasoning model yet, pairing improved accuracy and token savings with native computer use for next-generation professional automation.
FAQ
What is the difference between GPT-5.4 and GPT-5.2?
GPT-5.4 posts an 83% GDPval win rate (vs 70.9%), uses fewer tokens, adds native computer use, and makes 33% fewer false claims. It costs $2.50/$15 per million input/output tokens (vs $1.75/$14 for GPT-5.2), but better token efficiency can lower total spend.
How much does GPT-5.4 API cost?
$2.50/M input tokens, $0.25/M cached input, $15/M output. Pro: $30/M input, $180/M output. Batch/Flex: 50% off.
Does GPT-5.4 have a context window limit?
Yes: 272K tokens standard. Up to 1M tokens (experimental) is available by setting model_context_window and model_auto_compact_token_limit; usage above 272K tokens is charged at 2x the rate.
What is GPT-5.4 Pro used for?
Complex reasoning tasks. It posts higher scores on some benchmarks (e.g., BrowseComp 89.3%) but costs 12x more per token than standard GPT-5.4 ($30/$180 vs $2.50/$15).
When did GPT-5.4 release?
March 5, 2026. GPT-5.2 available until June 5, 2026 for migration.
Can GPT-5.4 use computers and browsers?
Yes. Issues mouse/keyboard commands, automates browsers (Playwright), and navigates desktops from screenshots.
What is tool search in GPT-5.4?
On-demand lookup of tool definitions. It reduces tokens by up to 47% in tool-heavy workflows and supports large tool ecosystems.
How does GPT-5.4 compare to GPT-5.3-Codex for coding?
Similar or better SWE-Bench Pro (57.7% vs 56.8%) with lower latency and built-in computer use. Best for new dev workflows.
Is GPT-5.4 available in ChatGPT?
Yes. GPT-5.4 Thinking is available on Plus, Team, and Pro; GPT-5.4 Pro is available on Pro and Enterprise. GPT-5.2 remains under Legacy until June 5, 2026.
What are the safety considerations for GPT-5.4?
OpenAI treats GPT-5.4 as High cyber capability under its Preparedness Framework, with expanded cyber safety measures, monitoring, access controls, and blocking of higher-risk requests. Some false positives are expected as classifiers improve.




