TL;DR
Grok 4 = single brain, fast, cheap
Grok 4 Heavy = committee of brains, slower, $300 / mo, but record-breaking scores
1️⃣ 30-Second Visual Cheat-Sheet
| | Grok 4 | Grok 4 Heavy |
|---|---|---|
| Architecture | 1 agent | Multi-agent (committee of 4-6 copies) |
| Speed | ⚡️ 2–3 s / 1 k tokens | 🐌 15–25 s / 1 k tokens |
| Price | X Premium+ ($16 / mo) | SuperGrok Heavy ($300 / mo) |
| Humanity’s Last Exam | 38.6 % | 44.4 % |
| USAMO | 37.5 % | 61.9 % |
| AIME | 91.7 % | 100 % |
| Context Window | 128 k (app) / 256 k (API) | same |
| Best for | Daily dev work, chat | Research, theorem proving, PhD-level reasoning |
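
The "committee" row can be pictured with a small sketch. The article does not say how Grok 4 Heavy combines the outputs of its copies, so the majority vote below is an assumption made purely for illustration. `committee_answer` and the toy agents are hypothetical names, not part of any xAI API.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for a model call: takes a prompt, returns an answer.
Agent = Callable[[str], str]


def committee_answer(agents: Sequence[Agent], prompt: str) -> str:
    """Ask every agent independently and return the most common answer.

    Assumption: a plain majority vote. The real aggregation step used by
    Grok 4 Heavy is not described in the source.
    """
    if not agents:
        raise ValueError("need at least one agent")
    answers = [agent(prompt) for agent in agents]
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Four toy agents; one of them disagrees with the rest.
    toy_agents = [lambda p: "42", lambda p: "42", lambda p: "41", lambda p: "42"]
    print(committee_answer(toy_agents, "What is 6 * 7?"))
```

Every agent handles the full prompt, so a committee of N copies costs roughly N times the compute of a single call, which fits the speed and price gap in the table.
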
2️⃣ Benchmark Heat-Map
(Higher = greener)
| Benchmark | Grok 4 | Grok 4 Heavy |
|---|---|---|
| GPQA Science | 87.5 % | 88.4 % |
| LiveCodeBench | 79.0 % | 79.4 % |
| USAMO | 37.5 % | 🔥 61.9 % |
| AIME | 91.7 % | 🔥 100 % |
| ARC-AGI | 15.9 % | 15.9 % (same model core) |
| Humanity’s Last Exam | 38.6 % | 44.4 % |
Source: xAI livestream & independent evals
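
To see where Heavy actually pulls ahead, here is a minimal sketch that ranks the benchmarks by gain in percentage points. The scores are copied from the table above; `SCORES` and `gains` are illustrative names.

```python
# (Grok 4, Grok 4 Heavy) scores in percent, copied from the table above.
SCORES = {
    "GPQA Science": (87.5, 88.4),
    "LiveCodeBench": (79.0, 79.4),
    "USAMO": (37.5, 61.9),
    "AIME": (91.7, 100.0),
    "ARC-AGI": (15.9, 15.9),
    "Humanity's Last Exam": (38.6, 44.4),
}


def gains(scores):
    """Return (benchmark, gain in percentage points), largest gain first."""
    deltas = {
        name: round(heavy - base, 1)  # round away float noise like 24.3999...
        for name, (base, heavy) in scores.items()
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, delta in gains(SCORES):
        print(f"{name:<22} +{delta:.1f} pp")
```

The ranking shows the gap is concentrated in the math benchmarks (USAMO, AIME) and Humanity's Last Exam. On GPQA, LiveCodeBench and ARC-AGI the two models are within a point of each other.
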
4️⃣ When Should You Pick Which?
| Use-case | Pick | Reason |
|---|---|---|
| Casual chat / general code | Grok 4 | Fast & cheap |
| Deep math proofs, PhD questions | Grok 4 Heavy | Highest score on record |
| Large-scale document analysis | Grok 4 Heavy | Multi-agent cross-checking |
| API on a budget | Grok 4 | $0.15 vs $0.30 per 1 k tokens |
| Enterprise research labs | Grok 4 Heavy | Accuracy > cost |
5️⃣ Cost Reality Check
Monthly usage: 500 k tokens/day
├── Grok 4 (X Premium+) → $16
└── Grok 4 Heavy (SuperGrok) → $300
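
For comparison, here is a rough sketch of what the same 500 k tokens/day would cost if billed per token at the API rates quoted in the use-case table ($0.15 and $0.30 per 1 k tokens). The 30-day month is an assumption, and the function name is illustrative. Treat the result as back-of-the-envelope arithmetic on this article's own figures, not a price quote.

```python
DAYS_PER_MONTH = 30        # assumption: a 30-day billing month
TOKENS_PER_DAY = 500_000   # usage level from the scenario above

# Per-1k-token rates quoted in the use-case table, in cents.
API_CENTS_PER_1K = {"Grok 4": 15, "Grok 4 Heavy": 30}
# Flat subscription prices from the cheat-sheet, in dollars per month.
SUBSCRIPTION_USD = {"Grok 4": 16, "Grok 4 Heavy": 300}


def monthly_api_cost_usd(tokens_per_day, cents_per_1k, days=DAYS_PER_MONTH):
    """Metered monthly cost in dollars for a steady daily token volume."""
    monthly_tokens = tokens_per_day * days
    return monthly_tokens * cents_per_1k / 1000 / 100


if __name__ == "__main__":
    for model, cents in API_CENTS_PER_1K.items():
        metered = monthly_api_cost_usd(TOKENS_PER_DAY, cents)
        flat = SUBSCRIPTION_USD[model]
        print(f"{model}: metered ${metered:,.0f}/mo vs flat ${flat}/mo")
```

At this volume the metered totals come to $2,250 and $4,500 per month, so under these figures the flat subscriptions are far cheaper for heavy interactive use.
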
6️⃣ One-Sentence Summary
Grok 4 is your everyday sports-car; Grok 4 Heavy is the F-1 racer you rent when the podium is the only acceptable outcome.
Try them right now
• Grok 4: grok.com with any X Premium+ account
• Grok 4 Heavy: toggle “Heavy” after subscribing to the SuperGrok Heavy tier