Over the past week, I ran brand checks on 160 companies across ChatGPT, Gemini, Perplexity, and Claude as part of building GEO Brand Monitor. The results surprised me, not because AI search is powerful, but because of how differently the four engines score the same brand.
Here's the data.
## The scoring method
Each engine gets queried with prompts like "what can you tell me about [Brand]" and "is [Brand] trustworthy." The response is scored 0–100 based on sentiment, mention frequency, and recommendation likelihood. 0 = negative/absent, 100 = confidently recommended.
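To make the method concrete, here is a minimal sketch of how three signals could be combined into a single 0–100 score. The weights and function name are hypothetical for illustration, not GEO Brand Monitor's actual formula.

```python
def brand_score(sentiment: float, mention_rate: float, rec_likelihood: float) -> int:
    """Combine three 0-1 signals into a 0-100 brand score.

    sentiment       -- 0 = negative, 1 = positive tone of the response
    mention_rate    -- fraction of prompts where the brand is mentioned at all
    rec_likelihood  -- 0 = never recommended, 1 = confidently recommended
    """
    # Hypothetical weighting: recommendation likelihood matters most,
    # then sentiment, then raw mention frequency.
    weighted = 0.3 * sentiment + 0.2 * mention_rate + 0.5 * rec_likelihood
    return round(100 * weighted)

# A brand the engine recommends confidently lands near 100;
# an absent or negatively described brand lands near 0.
print(brand_score(sentiment=0.9, mention_rate=1.0, rec_likelihood=1.0))  # 97
print(brand_score(sentiment=0.1, mention_rate=0.2, rec_likelihood=0.0))  # 7
```

The key design point is that absence and negativity both drag the score toward 0, which is why a brand with no coverage can score below one with mixed coverage.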
You can run any brand yourself at geo.atlas1m.com — free, no signup.
## Key findings
### 1. Engine gaps are the real story
A brand can score 100 on ChatGPT and 12 on Claude at the same time. That's not noise — it reflects how differently each engine was trained and what data it prioritizes.
Top gaps found:
| Brand | ChatGPT | Gemini | Perplexity | Claude |
|---|---|---|---|---|
| Cash App | 88 | 100 | 100 | 12 |
| Tripadvisor | 100 | 17 | 100 | 100 |
| West Elm | 100 | 100 | 17 | 100 |
| Hims | 83 | 17 | 83 | 88 |
| ClickUp | 100 | 50 | 100 | 86 |
| Klaviyo | 100 | 100 | 50 | 100 |
| Ramp | 100 | 100 | 100 | 50 |
| Lululemon | 100 | 92 | 100 | 50 |
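The gap analysis behind the table is simple: for each brand, take the spread between its best and worst engine. A sketch, using a few rows from the table above (the 50-point threshold is an arbitrary cutoff I chose for illustration):

```python
# Per-engine scores for a few brands from the table above.
scores = {
    "Cash App":    {"ChatGPT": 88,  "Gemini": 100, "Perplexity": 100, "Claude": 12},
    "Tripadvisor": {"ChatGPT": 100, "Gemini": 17,  "Perplexity": 100, "Claude": 100},
    "Ramp":        {"ChatGPT": 100, "Gemini": 100, "Perplexity": 100, "Claude": 50},
}

def engine_gap(brand_scores: dict[str, int]) -> tuple[int, str, str]:
    """Return (gap, best engine, worst engine) for one brand."""
    best = max(brand_scores, key=brand_scores.get)
    worst = min(brand_scores, key=brand_scores.get)
    return brand_scores[best] - brand_scores[worst], best, worst

for brand, per_engine in scores.items():
    gap, best, worst = engine_gap(per_engine)
    if gap >= 50:  # arbitrary threshold for a gap worth flagging
        print(f"{brand}: {gap}-point gap "
              f"({best} {per_engine[best]} vs {worst} {per_engine[worst]})")
```

Cash App's 88-point spread is the kind of gap that an aggregate score averages away entirely.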
Cash App's Claude score of 12 is one of the lowest single-engine scores I've recorded. For context, Comcast, a brand synonymous with bad customer service, scores 17 on Claude. Cash App scores worse on Claude than Comcast does.
### 2. Each engine has a different "memory"
After running 160+ checks, a pattern emerged:
Claude heavily weights long-form investigative journalism. If a brand had a significant negative story in a major publication (NYT, The Verge, WSJ) in the last 3–7 years, that story often lingers in Claude's training data and depresses the score. Away luggage scores Claude:50 because of a 2019 Verge exposé about CEO Steph Korey. That story is still "alive" in Claude's weights.
Perplexity reflects live web sentiment most closely. It's a leading indicator — if your brand is getting bad press right now, Perplexity shows it within days. Klaviyo Perplexity:50 and Mercury Perplexity:50 suggest something in current web sentiment is pushing back.
ChatGPT is training-data-bound. It changes slowest. Good for understanding baseline historical reputation.
Gemini draws from Google's knowledge graph + recent web. Tends to correlate with traditional Google reputation signals.
### 3. Some brands with terrible reputations score surprisingly well
Comcast scores 42/100 overall, one of the lowest of any major brand I tested: ChatGPT 50 (neutral), Claude 17 (negative), Gemini 17, Perplexity 83. That Perplexity 83 is the surprise.
X/Twitter scored ChatGPT:0, the lowest possible single-engine score I found: asked about X/Twitter, ChatGPT returns responses negative enough to score zero.
VW scores ChatGPT:50. The 2015 Dieselgate scandal is still materially depressing its score a decade later.
### 4. Brands that "know" their space score better
OpenAI scores 88/100 on its own product (ChatGPT). Duolingo scores 96/100 with only a minor Claude dip from a 2024 contractor AI-replacement story.
Brands that generated significant authoritative coverage — even slightly negative — often outperform anonymous brands with no coverage at all. A brand with zero mentions scores lower than one with mixed mentions.
### 5. The "good" brands that aren't perfect
Many well-regarded SaaS brands have a single-engine outlier:
- Notion: Claude:50
- ActiveCampaign: Claude:50
- Monday.com: Perplexity:50
- Typeform: Claude:50
- Brex: ChatGPT:88 (minor)
For these brands, the overall score looks fine (85–95/100) but the outlier engine is a gap in their AI-driven discovery channel.
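This is why averages mislead here. A quick illustration with Ramp's numbers from the table: the aggregate looks healthy while one engine sits at neutral.

```python
# Why a healthy overall score can hide a single-engine gap (illustrative).
from statistics import mean

ramp = {"ChatGPT": 100, "Gemini": 100, "Perplexity": 100, "Claude": 50}

overall = mean(ramp.values())       # 87.5 -- looks fine in aggregate
weakest = min(ramp, key=ramp.get)   # "Claude", the outlier engine

print(f"overall {overall:.1f}, but {weakest} sits at {ramp[weakest]}")
```

Any monitoring that reports only the overall number would miss the Claude gap entirely, which is the argument for per-engine breakdowns.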
## Why this matters
ChatGPT processes 1.6 billion daily queries. When someone asks "what's the best corporate card for startups," Ramp's ChatGPT:100 vs Claude:50 is the difference between being confidently recommended in one engine's answer and getting a neutral mention in the other's.
For brands that sell online — where AI Overviews have already cut organic CTR by 61% according to Ahrefs — AI search visibility is becoming the new first page of Google. Except there's no rank tracker, no Search Console, and no established playbook.
That's what I'm building.
## Try it
Free brand check (no signup): geo.atlas1m.com
Run your brand, your clients' brands, or your competitors. The engine breakdown shows you exactly where the gaps are. If you're an SEO consultant or agency, this is the audit hook that starts client conversations about AI strategy.
Paid monitoring (weekly score tracking + change alerts): EUR 19/month.
What's the most surprising brand score you find? Drop it in the comments.