Three days ago, the Jerusalem Post asked four major AI models to do something uncomfortable: pick a single date for a hypothetical US military strike on Iran.
The results were revealing:
| Model | Answer | Behavior |
|---|---|---|
| Claude (Anthropic) | March 7-8 | Refused initially, then hedged |
| ChatGPT (OpenAI) | March 1, then revised to March 3 | Gave a date, then revised |
| Gemini (Google) | March 4-6 | Built a trigger calendar |
| Grok (xAI) | February 28 | Committed to a date. Twice. |
Today, February 28, the US and Israel launched coordinated strikes on Iran.
## What Actually Happened
Grok didn't "predict" anything in the traditional sense. It's trained on real-time X/Twitter data, meaning it aggregated millions of public signals: diplomatic timelines, troop movements, political rhetoric patterns.
It read the open-source intelligence room. And the room was right.
## Why This Matters for Developers
- **Training data matters more than architecture.** The model with real-time social data outperformed models with arguably stronger reasoning.
- **Model behavior under pressure varies wildly.** Claude refused, ChatGPT waffled, Gemini structured, Grok committed.
- **OSINT aggregation is becoming an AI capability.** Models that can approximate classified intelligence timelines from public data raise real questions.
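To make the aggregation idea concrete, here is a deliberately toy sketch. It does not reflect how Grok (or any model) actually works internally; the signal names and dates are invented placeholders. It only illustrates the core mechanism: many independent public signals, each implying a date, converging on a consensus.

```python
from collections import Counter

# Hypothetical public "signals", each tagged with the date it implies.
# A model trained on real-time social data does this implicitly across
# millions of posts; this toy makes the aggregation step explicit.
signals = [
    ("troop movement report", "Feb 28"),
    ("diplomatic deadline expires", "Feb 28"),
    ("carrier group repositioned", "Mar 1"),
    ("political rhetoric escalation", "Feb 28"),
    ("sanctions vote scheduled", "Mar 4"),
]

def consensus_date(signals):
    """Return the date implied by the largest number of signals."""
    counts = Counter(implied_date for _, implied_date in signals)
    best_date, _ = counts.most_common(1)[0]
    return best_date

print(consensus_date(signals))  # -> "Feb 28"
```

The point of the sketch: no single signal is a prediction, but the modal date across enough of them can track real timelines. That is aggregation, not foresight.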
Source: https://www.jpost.com/middle-east/iran-news/article-887917