DEV Community

Skila AI

Posted on • Originally published at news.skila.ai

GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost — Full Benchmark Breakdown


OpenAI shipped GPT-5.4 mini and nano on March 17. GitHub Copilot deployed mini within 24 hours. Here is what the benchmarks tell us.

The Numbers

GPT-5.4 mini scored 54.4% on SWE-Bench Pro. The full GPT-5.4? 57.7%. A 3.3-point gap, down from 12 points last generation.

On OSWorld-Verified: mini hit 72.1%. Human baseline: 72.4%. The small model matches human-level computer operation.

| Benchmark | GPT-5.4 | GPT-5.4 Mini | GPT-5 mini |
| --- | --- | --- | --- |
| SWE-Bench Pro | 57.7% | 54.4% | 45.7% |
| OSWorld | 75.0% | 72.1% | 42.0% |
| GPQA Diamond | 93.0% | 88.0% | 81.6% |

The Subagent Play

Designed for multi-agent systems. GPT-5.4 plans. Mini runs parallel subtasks. Nano handles classification at $0.20/M tokens.

Hebbia CTO reported mini outperformed full GPT-5.4 on task-matched workloads.
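In practice, that tiering looks like a cost-aware router: send planning to the flagship, bulk subtasks to mini, and cheap classification to nano. Here is a minimal sketch; the model identifiers and the routing policy are illustrative assumptions, not official API names.

```python
def pick_model(task_type: str) -> str:
    """Route a subtask to the cheapest tier that can handle it.

    Model names are illustrative placeholders, not official
    OpenAI API identifiers.
    """
    if task_type == "plan":
        return "gpt-5.4"        # full model decomposes the work
    if task_type == "classify":
        return "gpt-5.4-nano"   # nano: classification at $0.20/M input
    return "gpt-5.4-mini"       # mini: parallel execution subtasks


# A planner would fan subtasks out like this:
for task in ["plan", "code-review", "classify"]:
    print(task, "→", pick_model(task))
```

The point of the pattern is that the expensive model touches each request once, while the cheap tiers absorb the token volume.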

Pricing

Mini: $0.75/M input (3x over GPT-5 mini). A 50K-token code review costs $0.08 vs $0.60+ on flagship. 7-10x cheaper for 94% performance.
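The per-request math is simple enough to check yourself. A minimal sketch, using mini's $0.75/M input rate from above; note this counts input tokens only, so output tokens push the total toward the article's $0.08 figure.

```python
def prompt_cost(tokens: int, price_per_m: float) -> float:
    """Dollar cost of `tokens` input tokens at `price_per_m` USD per million."""
    return tokens / 1_000_000 * price_per_m


# 50K-token code review at mini's $0.75/M input rate:
print(round(prompt_cost(50_000, 0.75), 4))  # → 0.0375
```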

The catch: small-model prices keep rising. GPT-4o mini was $0.15. GPT-5 mini: $0.25. Now $0.75.


Full analysis with cost calculations: Skila AI
