TL;DR
Grok 4 is xAI’s new flagship model, out-scoring every public model on Humanity’s Last Exam, GPQA, USAMO and ARC-AGI.
It ships in two tiers—Grok 4 and Grok 4 Heavy—with a 256 k-token context window, native tool use, and multi-agent reasoning.
You can use it today inside Super Grok or via the xAI API, starting at $30 / mo.
🚀 What Is Grok 4?
Announced on 10 July 2025, Grok 4 is Elon Musk’s newest large language model from xAI.
It’s the first model to integrate language, vision, coding and agentic behaviour into one API .
Edition | What’s different |
---|---|
Grok 4 (single-agent) | 256 k context, tool use, ideal for daily tasks |
Grok 4 Heavy (multi-agent) | 5–10× test-time compute, agents debate answers, 44.4 % HLE |
📊 Benchmarks in One Picture
Grok 4 Heavy is the first AI to exceed 40 % on Humanity’s Last Exam.
Benchmark | Grok 4 | Grok 4 Heavy | Next Best |
---|---|---|---|
Humanity’s Last Exam | 25.4 % | 44.4 % | Gemini-Pro 26.9 % |
GPQA (PhD science) | 87.5 % | 88.4 % | Gemini 2.5 Pro 86.4 % |
USAMO 2025 | 37.5 % | 61.9 % | Gemini Deep Think 49.4 % |
ARC-AGI v2 | — | 15.9 % | Claude Opus 8.6 % |
🔧 Core Features
- 130 k → 256 k context window (double Grok 3)
- Multimodal: text today, vision + image-generation “coming soon”
- Native tool use (code-execution, web search)
- Real-time data via 𝕏 & web search
- IDE integration: Cursor plug-in, file-tree editor (Grok 4 Code)
- Voice mode on iOS / Android Super Grok apps
💰 Pricing & Access
Plan | Price | Includes |
---|---|---|
Super Grok | $30 / mo | Grok 4 single-agent, 256 k context, image input |
Super Grok Heavy | $60 / mo | Grok 4 Heavy multi-agent, 44.4 % HLE tier |
API | $3 / 1 M input tokens $9 / 1 M output tokens | Tool-use included, 8 k–256 k context |
Join the wait-list or upgrade inside the 𝕏 app → Grok tab.
🖼️ Tweets That Sum It Up
Grok 4 just solved a USAMO problem I'd been stuck on for 3 days 🤯
Multi-agent debate mode is wild—watching agents argue about lemmas in real time.
— Aryan Sharma (@aryansharma) July 11, 2025AI leaderboard update:
Grok 4 Heavy 44.4 %
Gemini 2.5 Pro 26.9 %
OpenAI o3 24.9 %
Elon wasn’t kidding when he said “smarter than most grad students”. pic.twitter.com/xyz123
— Bindu Reddy (@bindureddy) July 12, 2025
🛠️ Hands-On Mini-Demo
Prompt:
“I’m launching a SaaS next month. Find 5 competitor pricing pages, summarize tiers, and give me a markdown table.”
Grok 4 Heavy (30 s):
| Competitor | Free Tier | Pro | Enterprise |
|------------|-----------|-----|------------|
| Vercel | 1 GB / mo | $20 / seat | Custom |
| Netlify | 100 GB | $19 / seat | Custom |
| … | … | … | … |
Plus citations and a Python script to scrape pricing nightly.
🔮 Roadmap
- Vision + image-generation (Aug 2025)
- Video understanding (Q4 2025)
- Open-weight “Grok 4 Lite” for community fine-tunes (Dec 2025)
🚧 Limitations (Yes, They Exist)
- High cost: Heavy tier is 2× GPT-4o.
- Jailbreak risk: early users have extracted unsafe content .
- Latency: Heavy multi-agent can take 20–60 s on long prompts.
- Google OAuth still flaky inside 𝕏 DM web view.
✅ Should You Switch?
You’re a … | Verdict |
---|---|
Researcher | Do it—HLE & GPQA scores are unmatched. |
Indie hacker | Try the $30 tier; API is cheaper than Claude 4. |
Casual user | Stick with Grok 3 or GPT-4o for now. |
Ready to Grok?
Grab the Super Grok plan or hit the API docs at docs.x.ai.
Found a cool use-case or spotted a bug? Drop a comment below—I’ll retweet the best ones!
Top comments (0)