DEV Community

Marco Duval
I Blind-Tested 12 Brand Matchups Across 3 AI Models. Half Were Total Blowouts

I keep coming back to this stat: 37% of consumers now start their searches with AI instead of Google. Not for fun. For buying decisions. "What CRM should I use?" "Best running shoes for flat feet?" "Shopify or WooCommerce?"

And the AI just... answers. No ten blue links. No ads. One recommendation, maybe two, with an explanation.

So the obvious question: when AI recommends something, what is it actually basing that on? Is it brand recognition? Content quality? Something else entirely?

I wanted to find out. So I built a tool and tested it.

What I actually did

I built GEO-Compare, which runs a blind test between two brand websites. It deep-crawls both sites, feeds the content to three AI models (GPT-5 Mini, Gemini 2.5 Flash, and Claude 3.5 Haiku), and asks each one to evaluate both sites on content quality, authority, expertise, user experience, and brand trust. They pick a winner and explain why. The scores get aggregated into a percentage split.
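To make the scoring concrete, here's a minimal sketch of how per-model, per-dimension scores could be aggregated into a percentage split. The model names, dimension list, and 0-10 scoring scale here are my illustration, not GEO-Compare's actual implementation.

```python
# Sketch of aggregating blind-test scores into a percentage split.
# Hypothetical scoring scheme; not GEO-Compare's real code.

DIMENSIONS = ["content", "authority", "expertise", "ux", "trust"]

def percentage_split(evaluations):
    """evaluations: {model: {dimension: (score_a, score_b)}}, scores 0-10.
    Returns (pct_a, pct_b) as whole percentages."""
    total_a = total_b = 0
    for model_scores in evaluations.values():
        for dim in DIMENSIONS:
            a, b = model_scores[dim]
            total_a += a
            total_b += b
    grand = total_a + total_b
    return round(100 * total_a / grand), round(100 * total_b / grand)

# Example: two models both favor brand A, one more strongly.
evals = {
    "model-1": {d: (8, 6) for d in DIMENSIONS},
    "model-2": {d: (9, 5) for d in DIMENSIONS},
}
print(percentage_split(evals))  # (61, 39)
```

A 100-0 split under a scheme like this would mean every model scored one site at zero on every dimension, which is why shutouts like Stripe vs PayPal are so striking.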

I started with 12 head-to-head matchups across different industries: SaaS, e-commerce, finance, dev tools, cloud, design, health, education. A mix of obvious rivalries and interesting pairings.

Some results were what I expected. Others made no sense to me at first.

The blowouts

Half the matchups weren't even close. Five of them ended 100-0 or close to it.

Stripe 100% vs PayPal 0%. A shutout. Every model, every prompt, every dimension picked Stripe. Stripe's website is dense with clear product explanations, use cases broken down by business type, and well-structured documentation. PayPal's site leans heavily on marketing copy and promotions. When AI evaluates which payment platform to recommend, it goes with the one that gives it more substantive information to work with. (Full results)

Shopify 100% vs WooCommerce 0%. Another shutout. This one surprised me less. Shopify's site is a machine: structured feature pages, industry-specific guides, pricing breakdowns, app ecosystem documentation. WooCommerce leans on WordPress's open-source positioning, which is compelling to developers but doesn't give AI models much to evaluate for a general audience.

Coursera 100% vs Udemy 0%. Third shutout. Coursera has structured course descriptions, university partnerships, degree program details, career outcome data. Udemy is a marketplace with user-generated course listings. AI models overwhelmingly prefer the one with institutional backing and structured educational content.

AWS 91% vs Azure 9%. Not quite a shutout, but close. AWS's documentation is massive and well-organized. Azure has good content too, but AWS's sheer depth of product pages, case studies, and technical documentation gave it a decisive edge.

Squarespace 91% vs Wix 9%. This one I didn't see coming. Wix has more users globally, but Squarespace's website content is much more structured and design-focused. Clear product explanations, template galleries with context, feature comparisons. Wix's site is more scattered, pushing multiple products and promotions. AI preferred the focused approach.

The close calls

Not everything was a blowout. A few matchups were genuinely tight.

Perplexity 48% vs Google 52%. Basically a coin flip. The AI-native search engine and the traditional search giant came out nearly even. Both have strong brand content, both are clear about what they do. There's something funny about AI models being asked to pick between an AI search engine and Google and not really having a strong opinion. (Full results)

MyFitnessPal 57% vs Noom 43%. A close race in health and fitness. MyFitnessPal edged it out, probably because its site has more concrete feature descriptions and food database information. Noom's content is more psychology-focused and coaching-oriented, which is harder for AI to evaluate objectively.

Figma 66% vs Canva 34%. Figma won, but it wasn't the blowout I expected. Both have strong content strategies aimed at different audiences. Figma's documentation is more technical and structured. Canva's is more consumer-friendly. AI favored Figma, but gave Canva real credit.

The one that surprised me most

Booking.com 73% vs Airbnb 27%. I would have bet on Airbnb. It has the better brand story, the more interesting content (location guides, host stories, experience writeups). But Booking.com won convincingly. Looking at the actual scraped content, I think it comes down to structured product information. Booking.com has clear, parseable data about properties, amenities, pricing, reviews. Airbnb has great storytelling but less structured information. AI models, it turns out, care more about parseable data than good narratives. (Full results)

What seems to actually matter

After looking at these results, a few patterns:

Structured information beats marketing copy. Stripe over PayPal. Booking.com over Airbnb. The sites that present clear, organized product information won over the ones that lean on brand storytelling and promotional content. AI reads text and evaluates structure. It doesn't watch your video or feel your brand vibe.

Depth on a focused topic beats breadth. Squarespace over Wix. Shopify over WooCommerce. The brands with deep, focused content about what they do beat the ones that spread themselves across too many products and messages.

Institutional content beats marketplace content. Coursera over Udemy. Shopify over WooCommerce. When the content is curated and structured by the company itself, AI models trust it more than user-generated or marketplace-style listings.

Brand recognition isn't the deciding factor. Wix has more users than Squarespace. PayPal is a household name. Airbnb is iconic. They all lost. AI doesn't care about your brand equity. It evaluates what's actually on the page.

So what is GEO?

GEO (Generative Engine Optimization) is basically SEO's younger sibling. Instead of optimizing for Google's ranking algorithm, you're optimizing for AI's recommendation logic.

The GEO market was $886 million in 2024 and is projected to hit $7.3 billion by 2031. 85% of enterprises are already increasing investment in structured data to improve AI visibility. Gartner predicts traditional search volume will drop 25% by 2026.

A few things that seem to matter based on what I've seen:

Schema markup. AI models weight structured data (JSON-LD, Organization schema, Product schema, FAQ schema). If your site doesn't have it, you're harder to parse.
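As a concrete illustration, here's Organization schema expressed as JSON-LD, generated with Python's standard json module. The company details are made up; the fields shown are a small subset of what schema.org/Organization supports.

```python
import json

# Hypothetical Organization schema as JSON-LD.
# All company details below are invented for illustration.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Payments Inc.",
    "url": "https://example.com",
    "description": "Payment infrastructure for online businesses.",
    "sameAs": [
        "https://github.com/example",
        "https://twitter.com/example",
    ],
}

# Embed the output in your page's <head> inside a
# <script type="application/ld+json"> tag.
print(json.dumps(org, indent=2))
```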

robots.txt configuration. A lot of websites still block GPTBot, ClaudeBot, and PerplexityBot. If AI can't crawl your site, it can't recommend you. Worth checking.
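You can check your own policy offline with Python's built-in robots.txt parser. The robots.txt below is a made-up example that blocks GPTBot while allowing everything else:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one AI crawler but allows the rest.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/pricing"))         # False
print(rp.can_fetch("PerplexityBot", "https://example.com/pricing"))  # True
```

For a live site, `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` fetches and parses the real file.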

Clear product information over promotional copy. This came through strongly in the results. Tell AI what you do, who it's for, and why, in plain structured text. Save the clever marketing copy for humans.

FAQ sections. AI models are basically answering questions all day. Well-structured Q&A content maps directly to what AI assistants are trying to do.
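Q&A content can also be made machine-readable via schema.org's FAQPage type. Here's a sketch that converts plain question/answer pairs (both invented) into that structure:

```python
import json

# Hypothetical FAQ content converted to schema.org FAQPage JSON-LD.
faqs = [
    ("Who is this product for?", "Small online stores that need payments."),
    ("Does it support subscriptions?", "Yes, with monthly or annual billing."),
]

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faqs
    ],
}

print(json.dumps(faq_page, indent=2))
```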

Try it

GEO-Compare is free. Drop two URLs, get a blind test across three AI models in about 60 seconds. No signup.

There's also a GEO Audit that tests a single brand across 11 models with visibility testing and a PDF report, if you want to go deeper.

You can browse all 40+ matchups in the directory, sorted by industry.

I'm still running more tests and adding matchups. Some of these results feel right, some still confuse me (how does PayPal score literally zero?), and some make me question whether the current AI models are even good at this. But the shift from "rank on Google" to "get recommended by AI" is happening whether the models are ready or not.
