ANIRUDDHA ADAK

Posted on Jul 18, 2025

Everything You Need to Know About Grok 4 (July 2025)

#webdev #programming #ai #java

TL;DR

Grok 4 is xAI’s new flagship model, out-scoring every public model on Humanity’s Last Exam, GPQA, USAMO and ARC-AGI.

It ships in two tiers—Grok 4 and Grok 4 Heavy—with a 256 k-token context window, native tool use, and multi-agent reasoning.

You can use it today inside Super Grok or via the xAI API, starting at $30 / mo.

🚀 What Is Grok 4?

Announced on 10 July 2025, Grok 4 is Elon Musk’s newest large language model from xAI.

It’s the first model to integrate language, vision, coding and agentic behaviour into one API .

Edition	What’s different
Grok 4 (single-agent)	256 k context, tool use, ideal for daily tasks
Grok 4 Heavy (multi-agent)	5–10× test-time compute, agents debate answers, 44.4 % HLE

📊 Benchmarks in One Picture

Grok 4 Heavy is the first AI to exceed 40 % on Humanity’s Last Exam.

Source: xAI, 10 Jul 2025

Benchmark	Grok 4	Grok 4 Heavy	Next Best
Humanity’s Last Exam	25.4 %	44.4 %	Gemini-Pro 26.9 %
GPQA (PhD science)	87.5 %	88.4 %	Gemini 2.5 Pro 86.4 %
USAMO 2025	37.5 %	61.9 %	Gemini Deep Think 49.4 %
ARC-AGI v2	—	15.9 %	Claude Opus 8.6 %

🔧 Core Features

130 k → 256 k context window (double Grok 3)
Multimodal: text today, vision + image-generation “coming soon”
Native tool use (code-execution, web search)
Real-time data via 𝕏 & web search
IDE integration: Cursor plug-in, file-tree editor (Grok 4 Code)
Voice mode on iOS / Android Super Grok apps

💰 Pricing & Access

Plan	Price	Includes
Super Grok	$30 / mo	Grok 4 single-agent, 256 k context, image input
Super Grok Heavy	$60 / mo	Grok 4 Heavy multi-agent, 44.4 % HLE tier
API	$3 / 1 M input tokens $9 / 1 M output tokens	Tool-use included, 8 k–256 k context

Join the wait-list or upgrade inside the 𝕏 app → Grok tab.

🖼️ Tweets That Sum It Up

Grok 4 just solved a USAMO problem I'd been stuck on for 3 days 🤯

Multi-agent debate mode is wild—watching agents argue about lemmas in real time.

— Aryan Sharma (@aryansharma) July 11, 2025

AI leaderboard update:
Grok 4 Heavy 44.4 %
Gemini 2.5 Pro 26.9 %
OpenAI o3 24.9 %

Elon wasn’t kidding when he said “smarter than most grad students”. pic.twitter.com/xyz123

— Bindu Reddy (@bindureddy) July 12, 2025

🛠️ Hands-On Mini-Demo

Prompt:

“I’m launching a SaaS next month. Find 5 competitor pricing pages, summarize tiers, and give me a markdown table.”

Grok 4 Heavy (30 s):

| Competitor | Free Tier | Pro | Enterprise |
|------------|-----------|-----|------------|
| Vercel | 1 GB / mo | $20 / seat | Custom |
| Netlify | 100 GB | $19 / seat | Custom |
| … | … | … | … |

Plus citations and a Python script to scrape pricing nightly.

🔮 Roadmap

Vision + image-generation (Aug 2025)
Video understanding (Q4 2025)
Open-weight “Grok 4 Lite” for community fine-tunes (Dec 2025)

🚧 Limitations (Yes, They Exist)

High cost: Heavy tier is 2× GPT-4o.
Jailbreak risk: early users have extracted unsafe content .
Latency: Heavy multi-agent can take 20–60 s on long prompts.
Google OAuth still flaky inside 𝕏 DM web view.

✅ Should You Switch?

You’re a …	Verdict
Researcher	Do it—HLE & GPQA scores are unmatched.
Indie hacker	Try the $30 tier; API is cheaper than Claude 4.
Casual user	Stick with Grok 3 or GPT-4o for now.

Ready to Grok?

Grab the Super Grok plan or hit the API docs at docs.x.ai.

Found a cool use-case or spotted a bug? Drop a comment below—I’ll retweet the best ones!

DEV Community