Four AI models. One job: build a playable game. One laptop. I gave four different AI models the exact same prompt — build a complete, playable Asteroids game in a single HTML file — and then I actually played what each one built. Three ran entirely on my laptop. One was the cloud, as the benchmark to beat. Here is the whole thing in under a minute.
I have been doing a lot of these little head-to-heads lately, and I wanted one where the test was real — not a riddle or a trick question, but an actual build. Give every model the same spec, let it write the whole thing in one shot, then open the file and see if the game actually plays. No cherry-picking, no fixing it up afterward. Either it runs or it doesn't.
The setup
Everything ran on a MacBook Pro M5 Max with 128 GB of memory. Three of the four models ran 100% local — no internet, nothing leaving the machine — through MLX and llama.cpp. The fourth was Cloud Claude Opus, sitting in a data center, there to set the bar.
| Model | Where | Time | Speed |
|---|---|---|---|
| DeepSeek V4 Flash 284B | Local | 300s | 31 tok/s |
| Cloud Claude Opus | Cloud | 48s | 115 tok/s |
| Qwen3-Coder 30B | Local | 33s | 95 tok/s |
| Gemma 4 31B | Local | 145s | 26 tok/s |
The prompt was identical for all of them: a single self-contained HTML file, vanilla JavaScript, no libraries — a ship that rotates and thrusts, bullets, asteroids that split when you shoot them, screen wrap-around, score, lives, and a game-over screen. One shot.
What happened
All four wrote a real, working game. That part genuinely surprised me — a few years ago, "write me a complete arcade game in one file" was not something you handed to a model running on a laptop and expected to play afterward.
DeepSeek built the most polished one. A glowing ship with a thrust trail, a starfield background, asteroids that actually broke apart, hearts for lives. It took the longest, but it played beautifully. Cloud Claude was fast and clean — lives, a level counter, even on-screen control hints. Qwen3-Coder 30B was the speed demon, turning out a solid game in 33 seconds. And Gemma delivered a tidy, working game of its own.
The verdict
The winner, to my eye, was DeepSeek — and the part that sticks with me is that it was running locally, on my desk, and it built a better game than the cloud model. Cloud Claude was excellent and much faster, but on the actual finished product, the local 284B model edged it out.
The bigger point is the one I keep landing on in everything I write here. Three of these four ran completely offline, for free, on one machine. No subscription, no metered tokens, nothing leaving the laptop. The cloud is still faster, and for a lot of work that speed matters. But "you have to use the cloud or it won't be any good" is just not true anymore. Local AI is catching up, and on this particular test it pulled ahead.
Run local models yourself (free)
If you want to try this, the abliterated MLX models I have converted for Apple Silicon are all free to pull:
- My Hugging Face — abliterated MLX models for Apple Silicon
- claude-code-local — run Claude Code against a local model, no API key.
The narration in the video is local text-to-speech, and the whole comparison was made on one MacBook. The only thing that touched the cloud was the Cloud Claude baseline.
Originally published at Nice Dreamz Wholesale. I convert abliterated MLX models for Apple Silicon — all free at my Hugging Face.
Top comments (0)