DEV Community

Skila AI

Posted on • Originally published at news.skila.ai

GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost — Full Benchmark Breakdown


OpenAI shipped GPT-5.4 mini and nano on March 17. GitHub Copilot deployed mini within 24 hours. Here is what the benchmarks tell us.

The Numbers

GPT-5.4 mini scored 54.4% on SWE-Bench Pro. The full GPT-5.4? 57.7%. A 3.3-point gap, down from 12 points last generation.

On OSWorld-Verified: mini hit 72.1%. Human baseline: 72.4%. The small model matches human-level computer operation.

| Benchmark | GPT-5.4 | GPT-5.4 Mini | GPT-5 mini |
| --- | --- | --- | --- |
| SWE-Bench Pro | 57.7% | 54.4% | 45.7% |
| OSWorld | 75.0% | 72.1% | 42.0% |
| GPQA Diamond | 93.0% | 88.0% | 81.6% |

The Subagent Play

Designed for multi-agent systems. GPT-5.4 plans. Mini runs parallel subtasks. Nano handles classification at $0.20/M tokens.

Hebbia CTO reported mini outperformed full GPT-5.4 on task-matched workloads.
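In practice, that tiering looks like a cost-aware router: send planning to the flagship, bulk subtasks to mini, and cheap classification to nano. Here is a minimal sketch; the model identifiers and the routing policy are illustrative assumptions, not official API names.

```python
def pick_model(task_type: str) -> str:
    """Route a subtask to the cheapest tier that can handle it.

    Model names are illustrative placeholders, not official
    OpenAI API identifiers.
    """
    if task_type == "plan":
        return "gpt-5.4"        # full model decomposes the work
    if task_type == "classify":
        return "gpt-5.4-nano"   # nano: classification at $0.20/M input
    return "gpt-5.4-mini"       # mini: parallel execution subtasks


# A planner would fan subtasks out like this:
for task in ["plan", "code-review", "classify"]:
    print(task, "→", pick_model(task))
```

The point of the pattern is that the expensive model touches each request once, while the cheap tiers absorb the token volume.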

Pricing

Mini: $0.75/M input (3x over GPT-5 mini). A 50K-token code review costs $0.08 vs $0.60+ on flagship. 7-10x cheaper for 94% performance.
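The per-request math is simple enough to check yourself. A minimal sketch, using mini's $0.75/M input rate from above; note this counts input tokens only, so output tokens push the total toward the article's $0.08 figure.

```python
def prompt_cost(tokens: int, price_per_m: float) -> float:
    """Dollar cost of `tokens` input tokens at `price_per_m` USD per million."""
    return tokens / 1_000_000 * price_per_m


# 50K-token code review at mini's $0.75/M input rate:
print(round(prompt_cost(50_000, 0.75), 4))  # → 0.0375
```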

The catch: small-model prices keep rising. GPT-4o mini was $0.15. GPT-5 mini: $0.25. Now $0.75.


Full analysis with cost calculations: Skila AI
