1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4


By Vilius Vystartas | May 2026

I ran the same 10 agent coding tasks against 8 locally-running models on my Mac. No cloud, no API keys, no per-token billing. The results surprised me enough that I ran them twice.
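To make the scoring concrete: every model gets the same 10 tasks, and the Score column below is just the fraction it passes. A minimal sketch of that aggregation — the task names and pass/fail pattern here are illustrative, not my actual suite:

```python
def score(results: dict[str, bool]) -> int:
    """Percent of agent tasks passed, rounded to the nearest point."""
    return round(100 * sum(results.values()) / len(results))

# Illustrative run: 8 of 10 hypothetical tasks pass -> 80%
run = {f"task_{i}": (i < 8) for i in range(10)}
print(score(run))  # 80
```

Each model's wall-clock Time is simply the sum over the same 10 tasks, so the two columns come from one run.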

The leaderboard

| Model | Bits | Size | Score | Time |
| --- | --- | --- | --- | --- |
| Qwen 3.5 9B | 4-bit | ~5GB | 83% | 190s |
| AgenticQwen 8B | 4-bit | ~5GB | 82% | 189s |
| Bonsai 4B | 1-bit | 545MB | 80% | 18s |
| Ternary Bonsai 1.7B | 2-bit | 442MB | 80% | 10s |
| Bonsai 8B | 1-bit | 1.1GB | 80% | 15s |
| Ternary Bonsai 4B | 2-bit | 1.0GB | 80% | 20s |
| Ternary Bonsai 8B | 2-bit | 2.1GB | 78% | 22s |
| Bonsai 1.7B | 1-bit | 237MB | 73% | 8s |

A 545MB model beats GPT-5.4

Bonsai 4B at 1-bit quantization scores 80% on the same tasks where GPT-5.4 scored 75%. Half a gigabyte. No data center. Your laptop processes every request locally, with zero network latency. And at 18s versus 190s, it's roughly 10x faster than the Qwen models because there's far less to compute.
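The file sizes fall straight out of the arithmetic: 4 billion weights at 1 bit each is about 500MB of raw weight storage. (The extra ~45MB on disk is, I'd assume, higher-precision embeddings and metadata — that breakdown is my guess, not a published figure.)

```python
def quantized_weight_mb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in MB for a given parameter count and quantization level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e6

print(quantized_weight_mb(4, 1))  # 500.0 MB -- close to Bonsai 4B's 545MB on disk
print(quantized_weight_mb(9, 4))  # 4500.0 MB -- matches the ~5GB Qwen 3.5 9B
```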

4-bit controls tie Claude

The 4-bit Qwen models at ~5GB score 82-83% — matching Claude Sonnet 4's cloud performance. On a Mac. These aren't toys.

1-bit vs 2-bit (ternary): the extra bit is dead weight

At the 1.7B size, ternary helps — 80% vs 73%. But at 4B and 8B, 1-bit and 2-bit perform identically (80%). That extra bit costs double the disk (1.0GB vs 545MB, 2.1GB vs 1.1GB) for zero gain. At larger model sizes, 1-bit quantization has already captured everything the model can offer.
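There's an information-theoretic angle on that dead weight, too: a ternary value (-1, 0, +1) carries only log2(3) ≈ 1.58 bits of information, so a plain 2-bit field wastes about a fifth of its space before any accuracy argument. A quick check — this assumes unpacked 2-bit storage, which is my guess at these files' layout, not something the model cards state:

```python
import math

ternary_info_bits = math.log2(3)  # ~1.585 bits of information per ternary weight
stored_bits = 2.0                 # a plain 2-bit field per weight
waste = 1 - ternary_info_bits / stored_bits
print(f"{waste:.0%} of 2-bit storage is redundant")
```

Denser packings exist (e.g. five trits fit in one byte), which would narrow the 2-bit files' size gap without changing their scores.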

What this means

You can run an agent coding model that beats GPT-5.4 on a laptop with no internet. For regulated industries — healthcare, finance, government — this removes the compliance headache. No data leaves the device. No vendor API agreement to negotiate. No per-request billing to track.

The Bonsai findings are also published on benchmarks.workswithagents.dev, refreshed with each run, alongside the cloud models for direct comparison.

I didn't expect a 545MB quantized model to beat a cutting-edge cloud API. But here we are.
