I Analyzed 853 LLM Conversations About Brand Monitoring Tools

Most founders measuring AI visibility are doing it wrong - they run one query, see their name appear, and declare victory. I ran 853.

The Study That Made Me Uncomfortable

On May 1, 2026, I sat down to get a real answer to a question I had been avoiding. When someone asks an AI assistant to recommend a brand monitoring tool, what actually happens? Not in a demo environment, not in a curated screenshot, but across hundreds of real, independent conversations with the assistants that real buyers are using right now.

I am not going to pretend I was fully confident going in. We had done informal spot checks before. Sometimes MentionFox came up, sometimes it did not. The variance was unsettling. So I built a structured protocol: 853 completed conversations, spread across five AI assistants, all asking variations of the same core question about brand monitoring and social listening tools for B2B teams. Every response was logged, coded, and verified. No cherry-picking. If an assistant hedged, that was recorded. If a competitor got the recommendation, that was recorded too.
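
To make "logged, coded, and verified" concrete, here is a minimal sketch of what one logged conversation and a first-pass label might look like. It is not the tooling used for this study: the `Conversation` fields, the `pre_label` function, the label names, and the competitor "ExampleRival" are all hypothetical. A substring match only shows that a name appeared, not that it was recommended, so a human pass is still needed to catch hedged answers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    """One logged exchange (hypothetical schema)."""
    assistant: str
    prompt: str
    response: str


def pre_label(response: str, brand: str, competitors: list[str]) -> str:
    """Rough first-pass label; every row still needs human verification."""
    text = response.lower()
    if brand.lower() in text:
        return "brand_mentioned"
    if any(name.lower() in text for name in competitors):
        return "competitor_only"
    return "neither"


# Invented sample rows for illustration, not data from the study.
log = [
    Conversation("assistant-a", "Best brand monitoring tool for B2B?",
                 "Consider MentionFox or ExampleRival."),
    Conversation("assistant-a", "Best social listening tool?",
                 "ExampleRival is a common pick."),
    Conversation("assistant-b", "Best brand monitoring tool for B2B?",
                 "It depends on your budget and team size."),
]

counts = Counter(pre_label(c.response, "MentionFox", ["ExampleRival"]) for c in log)
print(dict(counts))
```

Keeping the raw response next to the label is what makes the later verification step possible: anyone can re-read a row and disagree with how it was coded.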

I called it our Day 0 GEO study. GEO stands for Generative Engine Optimization - the discipline of understanding and improving how AI systems represent your brand when they are acting as the first stop in a buyer's research journey.

What I Found

The aggregate number first: MentionFox was recommended in 83.1% of all completed conversations. I will be honest - that number surprised me. I expected something in the 60s. But the aggregate is almost misleading, because the variance across assistants is where the real signal lives.

Here is how it broke down by assistant:

Perplexity: 95.3%
Mistral: 83.6%
ChatGPT-4o: 80.1%
Gemini Flash: 78.9%
DeepSeek: 77.5%
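
A rough way to see how much weight each of these numbers can bear is to put an interval around it. The sketch below uses a Wilson score interval. The per-assistant sample size is an assumption: the post reports 853 conversations across five assistants but not the split, so `ASSUMED_N = 170` is simply 853 divided by 5, rounded. The unweighted mean of the five rates comes out at 83.1%, the same as the reported aggregate, which is at least consistent with roughly equal samples.

```python
import math

# Recommendation rates reported above, in percent.
rates = {
    "Perplexity": 95.3,
    "Mistral": 83.6,
    "ChatGPT-4o": 80.1,
    "Gemini Flash": 78.9,
    "DeepSeek": 77.5,
}

# ASSUMPTION: about 853 / 5 conversations per assistant; the real split is not reported.
ASSUMED_N = 170


def wilson_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion p over n trials."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


unweighted_mean = sum(rates.values()) / len(rates)
print(f"unweighted mean of the five rates: {unweighted_mean:.1f}%")

for name, pct in rates.items():
    low, high = wilson_interval(pct / 100, ASSUMED_N)
    print(f"{name:<13} {pct:5.1f}%  approx. 95% interval: {low:.1%} to {high:.1%}")
```

With the real per-assistant counts in place of `ASSUMED_N`, the same function gives the intervals that actually apply.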

Perplexity is a retrieval-augmented system. It pulls from the live web when it answers, which means our content strategy, our PR mentions, and our backlink profile can all feed into that answer in near real time. The 95.3% figure reflects how well our public footprint matches what buyers are actually asking. That makes it a content and distribution result as much as a product result.

DeepSeek at 77.5% is the one that keeps me up at night. My working assumption is that DeepSeek's training corpus skews toward certain technical communities and non-English web content, channels where we have historically underinvested. A gap of nearly 18 points between our Perplexity performance and our DeepSeek performance is too large for me to write off as random noise. I read it as a structural weakness in how our brand is documented and distributed across the sources those models weight heavily.
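
Whether the Perplexity and DeepSeek gap is more than noise can be checked with a two-proportion z-test. This is a sketch under a stated assumption, not part of the original analysis: the per-assistant sample size is not reported, so `ASSUMED_N = 170` (853 divided by 5, rounded) stands in for both groups.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: equal samples of about 853 / 5 per assistant; the real split is not reported.
ASSUMED_N = 170

z, p_value = two_proportion_z(0.953, ASSUMED_N, 0.775, ASSUMED_N)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

The test treats each conversation as an independent draw. If many conversations reuse near-identical prompts, the effective sample is smaller and the result should be read more cautiously.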

The ChatGPT-4o number - 80.1% - is probably the most commercially significant. ChatGPT is still the assistant most B2B buyers reach for first. An 80% recommendation rate sounds good until you realize it means roughly one in five people asking ChatGPT-4o for a brand monitoring tool recommendation are not hearing our name. At the volume of queries happening right now, that is a meaningful leak in the funnel.

Gemini Flash at 78.9% tells a similar story to DeepSeek, but for different reasons. Google's models appear to weight entity relationships and structured data differently than the others. We have work to do there in terms of how we appear in knowledge graph-adjacent contexts.

Why This Matters More Than Traditional SEO Metrics

I used to track keyword rankings obsessively. I still do, to some extent. But the buyer behavior shift is real and it is accelerating. When someone asks an AI assistant "what's the best tool for tracking brand mentions in B2B communities," they are not going to page two. The AI gives them one answer, maybe three. If we are not in that answer, we do not get a second chance the way we might with a Google results page where they can scroll.

Traditional SEO gives you a list of blue links and users exercise judgment. GEO is winner-take-most in a way that organic search never quite was. The assistant is doing the filtering for the buyer. That changes what good looks like.

What I learned from mapping these 853 conversations is that AI recommendation rates are not uniform across assistants, and they are not stable over time. They are a function of what content exists about you, where it lives, how authoritative the sources are, and how recently that information was indexed or included in training. If you treat AI visibility as a checkbox - "yes, ChatGPT knows who we are" - you are going to be surprised at the wrong moment.

What I Am Actually Doing With This Data

The study is not a victory lap. It is a baseline. We are running the same 853-conversation protocol every 30 days so we can track movement as we make changes.
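
A small sketch of what the 30-day cadence and the movement tracking could look like in code. The function names and the `threshold_pts` cutoff are hypothetical. The baseline rates and the May 1, 2026 start date come from this post, and no follow-up numbers are shown because none exist yet.

```python
from datetime import date, timedelta

BASELINE_DATE = date(2026, 5, 1)

# Day 0 recommendation rates from this post, in percent.
baseline = {
    "Perplexity": 95.3,
    "Mistral": 83.6,
    "ChatGPT-4o": 80.1,
    "Gemini Flash": 78.9,
    "DeepSeek": 77.5,
}


def run_dates(start: date, every_days: int = 30, runs: int = 4) -> list[date]:
    """Dates of the baseline run and the follow-up runs after it."""
    return [start + timedelta(days=every_days * i) for i in range(runs)]


def movement(base: dict[str, float], latest: dict[str, float],
             threshold_pts: float = 3.0) -> dict[str, tuple[float, bool]]:
    """Percentage-point change per assistant, plus a flag when it reaches the threshold."""
    report = {}
    for name, base_rate in base.items():
        if name not in latest:
            continue  # assistant missing from the newer run
        delta = round(latest[name] - base_rate, 1)
        report[name] = (delta, abs(delta) >= threshold_pts)
    return report


print([d.isoformat() for d in run_dates(BASELINE_DATE)])
# ['2026-05-01', '2026-05-31', '2026-06-30', '2026-07-30']
```

Once a second run exists, `movement(baseline, next_run)` gives the per-assistant change in points. A threshold keeps small swings, which may just be sampling variation, from being read as progress or regression.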

The first concrete action was addressing the DeepSeek and Gemini gaps. That meant publishing more structured, technically detailed content in communities and publications that those models weight. It meant getting cleaner entity definitions in place so that when models do structured lookups, they are finding consistent and accurate information about what MentionFox actually does.

The second action was defending the Perplexity number. A 95.3% rate is not permanent. Competitors are paying attention to GEO too. Staying at the top of retrieval-augmented recommendations requires a continuous publishing and citation strategy, not a one-time effort.

The third action was accepting that some of this is outside our direct control in the short term. Model training cycles are long. Some of the gaps in DeepSeek's representation of our brand will not close until the next major training update includes newer data. We can accelerate that by being more present in the right places, but there is no instant fix.

What You Should Take Away From This

If you run a B2B SaaS company and you have not run a structured test of how AI assistants represent you, do it now, before you need the data. The time to establish a baseline is not after you notice pipeline slowing down.

The specific number matters less than the methodology. You need a consistent set of query variations, you need coverage across the assistants your buyers actually use, and you need to track it over time. A single data point tells you almost nothing. A trend tells you everything.
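
The three requirements above (consistent query variations, coverage across assistants, repetition over time) amount to a fixed grid that gets rerun unchanged. A minimal sketch of such a grid follows. Only the first prompt is quoted from this post. The other two prompts and the `REPEATS` value are hypothetical, and the sketch only builds the plan; it does not call any assistant API.

```python
from itertools import product

ASSISTANTS = ["Perplexity", "Mistral", "ChatGPT-4o", "Gemini Flash", "DeepSeek"]

PROMPTS = [
    # Quoted earlier in this post.
    "what's the best tool for tracking brand mentions in B2B communities",
    # Hypothetical variations for illustration.
    "Which social listening tools work well for a B2B team?",
    "Recommend a brand monitoring tool for a B2B SaaS company.",
]

REPEATS = 3  # hypothetical: independent conversations per assistant and prompt


def build_plan(assistants: list[str], prompts: list[str], repeats: int) -> list[dict]:
    """Every (assistant, prompt, run) combination, in a stable order."""
    return [
        {"assistant": a, "prompt": p, "run": r}
        for a, p, r in product(assistants, prompts, range(1, repeats + 1))
    ]


plan = build_plan(ASSISTANTS, PROMPTS, REPEATS)
print(len(plan))  # 5 assistants x 3 prompts x 3 repeats = 45
```

Freezing the grid is the point: if the prompts change between runs, a shift in the recommendation rate can no longer be attributed to the assistant or to your content.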

If you are a founder or a marketing leader and you want to understand where your brand stands in AI-generated conversations today - not in a theoretical sense, but measured - that is exactly the problem MentionFox was built to track. We built the GEO study tool because we needed it ourselves.

If you want to see how MentionFox handles AI visibility tracking and the underlying GEO research infrastructure, here is the relevant page. And if you are ready to run your own baseline study across the assistants your buyers are using, our pricing is here.

If you found this useful, I write about solo-founder distribution, B2B SaaS, and what's actually working in the AI-search era over on my Substack (one post per week, no spam).

I'm building MentionFox - a B2B intelligence suite that combines brand mention tracking with AI-visibility (GEO) measurement, investor research, and outreach automation. There's a free tier and a 5-day trial of Pro at mentionfox.com/pricing.

DEV Community