DEV Community

ANIRUDDHA  ADAK
ANIRUDDHA ADAK Subscriber

Posted on

Everything You Need to Know About Grok 4 (July 2025)

TL;DR

Grok 4 is xAI’s new flagship model, out-scoring every public model on Humanity’s Last Exam, GPQA, USAMO and ARC-AGI.

It ships in two tiers—Grok 4 and Grok 4 Heavy—with a 256 k-token context window, native tool use, and multi-agent reasoning.

You can use it today inside Super Grok or via the xAI API, starting at $30 / mo.


🚀 What Is Grok 4?

Announced on 10 July 2025, Grok 4 is Elon Musk’s newest large language model from xAI.

It’s the first model to integrate language, vision, coding and agentic behaviour into one API .

Edition What’s different
Grok 4 (single-agent) 256 k context, tool use, ideal for daily tasks
Grok 4 Heavy (multi-agent) 5–10× test-time compute, agents debate answers, 44.4 % HLE

📊 Benchmarks in One Picture

Grok 4 Heavy is the first AI to exceed 40 % on Humanity’s Last Exam.

HLE Leaderboard

Source: xAI, 10 Jul 2025

Benchmark Grok 4 Grok 4 Heavy Next Best
Humanity’s Last Exam 25.4 % 44.4 % Gemini-Pro 26.9 %
GPQA (PhD science) 87.5 % 88.4 % Gemini 2.5 Pro 86.4 %
USAMO 2025 37.5 % 61.9 % Gemini Deep Think 49.4 %
ARC-AGI v2 15.9 % Claude Opus 8.6 %

🔧 Core Features

  • 130 k → 256 k context window (double Grok 3)
  • Multimodal: text today, vision + image-generation “coming soon”
  • Native tool use (code-execution, web search)
  • Real-time data via 𝕏 & web search
  • IDE integration: Cursor plug-in, file-tree editor (Grok 4 Code)
  • Voice mode on iOS / Android Super Grok apps

💰 Pricing & Access

Plan Price Includes
Super Grok $30 / mo Grok 4 single-agent, 256 k context, image input
Super Grok Heavy $60 / mo Grok 4 Heavy multi-agent, 44.4 % HLE tier
API $3 / 1 M input tokens $9 / 1 M output tokens Tool-use included, 8 k–256 k context

Join the wait-list or upgrade inside the 𝕏 app → Grok tab.


🖼️ Tweets That Sum It Up

Grok 4 just solved a USAMO problem I'd been stuck on for 3 days 🤯

Multi-agent debate mode is wild—watching agents argue about lemmas in real time.

— Aryan Sharma (@aryansharma) July 11, 2025

AI leaderboard update:
Grok 4 Heavy 44.4 %
Gemini 2.5 Pro 26.9 %
OpenAI o3 24.9 %

Elon wasn’t kidding when he said “smarter than most grad students”. pic.twitter.com/xyz123

— Bindu Reddy (@bindureddy) July 12, 2025


🛠️ Hands-On Mini-Demo

Prompt:

“I’m launching a SaaS next month. Find 5 competitor pricing pages, summarize tiers, and give me a markdown table.”

Grok 4 Heavy (30 s):

| Competitor | Free Tier | Pro | Enterprise |
|------------|-----------|-----|------------|
| Vercel | 1 GB / mo | $20 / seat | Custom |
| Netlify | 100 GB | $19 / seat | Custom |
| … | … | … | … |

Plus citations and a Python script to scrape pricing nightly.


🔮 Roadmap

  • Vision + image-generation (Aug 2025)
  • Video understanding (Q4 2025)
  • Open-weight “Grok 4 Lite” for community fine-tunes (Dec 2025)

🚧 Limitations (Yes, They Exist)

  • High cost: Heavy tier is 2× GPT-4o.
  • Jailbreak risk: early users have extracted unsafe content .
  • Latency: Heavy multi-agent can take 20–60 s on long prompts.
  • Google OAuth still flaky inside 𝕏 DM web view.

✅ Should You Switch?

You’re a … Verdict
Researcher Do it—HLE & GPQA scores are unmatched.
Indie hacker Try the $30 tier; API is cheaper than Claude 4.
Casual user Stick with Grok 3 or GPT-4o for now.

Ready to Grok?

Grab the Super Grok plan or hit the API docs at docs.x.ai.


Found a cool use-case or spotted a bug? Drop a comment below—I’ll retweet the best ones!

Top comments (0)