GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost
OpenAI shipped GPT-5.4 mini and nano on March 17. GitHub Copilot deployed mini within 24 hours. Here is what the benchmarks tell us.
The Numbers
GPT-5.4 mini scored 54.4% on SWE-Bench Pro. The full GPT-5.4? 57.7%. A 3.3-point gap, down from 12 points last generation.
On OSWorld-Verified, mini hit 72.1% against a human baseline of 72.4%. The small model matches human-level computer operation.
| Benchmark | GPT-5.4 | GPT-5.4 mini | GPT-5 mini |
|---|---|---|---|
| SWE-Bench Pro | 57.7% | 54.4% | 45.7% |
| OSWorld | 75.0% | 72.1% | 42.0% |
| GPQA Diamond | 93.0% | 88.0% | 81.6% |
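The mini-to-flagship ratios implied by the table can be checked in a few lines. This is just arithmetic on the scores above; the dictionary and loop are illustrative, not from the article:

```python
# Scores from the table above: (full GPT-5.4, GPT-5.4 mini).
scores = {
    "SWE-Bench Pro": (57.7, 54.4),
    "OSWorld": (75.0, 72.1),
    "GPQA Diamond": (93.0, 88.0),
}

for name, (full, mini) in scores.items():
    # What fraction of flagship performance does mini retain?
    print(f"{name}: mini retains {mini / full:.1%} of flagship")
# SWE-Bench Pro: mini retains 94.3% of flagship
# OSWorld: mini retains 96.1% of flagship
# GPQA Diamond: mini retains 94.6% of flagship
```

That ~94% retention on SWE-Bench Pro is where the pricing section's "94% performance" figure comes from.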
The Subagent Play
Designed for multi-agent systems. GPT-5.4 plans. Mini runs parallel subtasks. Nano handles classification at $0.20/M tokens.
Hebbia's CTO reported that mini outperformed the full GPT-5.4 on task-matched workloads.
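The planner/worker/classifier split can be sketched as a simple dispatcher. The model names are from the article, but `call_model`, the routing, and the prompts are illustrative assumptions, not a real API:

```python
# Hypothetical sketch of the subagent pattern described above.
# `call_model` is a stand-in for a real LLM API client.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Placeholder: in practice this would call an inference API.
    return f"[{model}] {prompt}"

def run_pipeline(task: str, subtasks: list[str]) -> list[str]:
    # The flagship model produces the plan.
    plan = call_model("gpt-5.4", f"Plan: {task}")
    # Mini executes the subtasks in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda s: call_model("gpt-5.4-mini", s), subtasks))
    # Nano does the cheap classification pass over each result.
    return [call_model("gpt-5.4-nano", f"Classify: {r}") for r in results]
```

The economics of the pattern: only the plan touches flagship pricing, while the bulk of the tokens flow through mini and nano rates.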
Pricing
Mini: $0.75/M input (3x over GPT-5 mini). A 50K-token code review costs $0.08 vs $0.60+ on the flagship. 7-10x cheaper for 94% of the performance.
The catch: small-model prices keep rising. GPT-4o mini was $0.15. GPT-5 mini: $0.25. Now $0.75.
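A minimal cost check, using only the input rate quoted above. The article's ~$0.08 figure for a 50K-token review presumably also includes output tokens, whose rate isn't given here, so this computes the input side only:

```python
# Input-token cost at a per-million-token rate.
def input_cost(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

review_tokens = 50_000
print(input_cost(review_tokens, 0.75))  # mini input side: 0.0375
print(input_cost(review_tokens, 0.25))  # at GPT-5 mini's old rate: 0.0125
```

The same helper makes the price-creep concrete: the $0.15 → $0.25 → $0.75 trajectory is a 5x increase in input cost over two small-model generations.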
Full analysis with cost calculations: Skila AI