How LLMs Decide Which Brands to Mention: A Technical Look at GEO

#ai #seo #llm #webdev

When you ask ChatGPT "what's a good project management tool?", it doesn't randomly pick Asana or Linear. There's a pipeline behind every brand mention, and understanding it is the first step toward what the industry now calls GEO (Generative Engine Optimization).

I'm Jakub, builder at Inithouse. We run 14 products across different verticals, and one of them, Be Recommended, was born from trying to reverse-engineer exactly this: how do LLMs decide which brands to cite?

Here's what we learned, technically.

The RAG Pipeline: Where Brand Mentions Actually Come From

Most production LLM systems (Perplexity, ChatGPT with browsing, Gemini with grounding) don't rely purely on parametric knowledge, meaning what the model memorized during pretraining. They use Retrieval-Augmented Generation (RAG), which at its core is a two-stage architecture:

Retrieval: the system queries an index (web search, vector store, or both) using the user's prompt as input. This returns a set of candidate documents ranked by relevance.
Generation: the LLM reads the retrieved documents and synthesizes an answer, pulling facts, brand names, and citations from the retrieved context.

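This means brand visibility in AI answers is not just about what the model "knows" from pretraining. It's about what the retrieval layer finds and ranks highly enough to pass into the context window. In practice, production systems usually wrap those two stages with query expansion before retrieval and reranking after it: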

User prompt
    |
    v
+----------------+
|  Query         | <- reformulated search query
|  Expansion     |
+-------+--------+
        |
        v
+----------------+
|  Retrieval     | <- web search / vector DB / hybrid
|  (top-k)       |
+-------+--------+
        |
        v
+----------------+
|  Reranking     | <- cross-encoder or LLM-based reranking
+-------+--------+
        |
        v
+----------------+
|  Generation    | <- LLM synthesizes answer from context
|  + Citation    |
+----------------+
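
To make the flow above concrete, here is a toy sketch of the path from prompt to context window. Everything in it is an assumption for illustration: the function names, the stopword list, the example.com documents, and the term-overlap scoring that stands in for a real search index. The reranking step is left out for brevity. The point it shows is that only the top-k documents ever reach the generation step.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def expand_query(prompt):
    """Hypothetical query expansion: here we only drop filler words."""
    stopwords = {"what", "s", "a", "the", "is", "for", "good"}
    return tokenize(prompt) - stopwords

def retrieve(query_terms, documents, k):
    """Rank documents by term overlap (stand-in for web search or a vector DB)."""
    scored = [(len(query_terms & tokenize(doc["text"])), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matching docs
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_context(docs):
    """Only these snippets are visible to the generation step."""
    return "\n".join(f"[{i + 1}] {d['url']}: {d['text']}" for i, d in enumerate(docs))

# Made-up documents for illustration
documents = [
    {"url": "https://example.com/pm-tools",
     "text": "Asana is a project management tool with timelines."},
    {"url": "https://example.com/issue-trackers",
     "text": "Linear is a project management tool for software teams."},
    {"url": "https://example.com/recipes",
     "text": "A good recipe for sourdough bread."},
]

terms = expand_query("What's a good project management tool?")
context = build_context(retrieve(terms, documents, k=2))
print(context)
```

A brand that appears only in a document outside the top-k never enters the context, so the model can only mention it from parametric knowledge.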

Embeddings and Retrieval Ranking

The retrieval step often uses dense embeddings, sometimes alongside conventional keyword search. Your page content is split into chunks, each chunk is embedded into a vector, and the system ranks chunks by the cosine similarity between the query embedding and each chunk embedding.
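
As a minimal sketch of that scoring step, the snippet below ranks two pages by cosine similarity against a query. The three-dimensional vectors are made up for illustration; real embedding models produce hundreds or thousands of dimensions, and the page names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

# Made-up embeddings: the first dimension loosely means "AI visibility"
query_vec = [0.9, 0.1, 0.0]
pages = {
    "focused-page": [0.8, 0.2, 0.1],
    "generic-page": [0.1, 0.3, 0.9],
}

ranked = sorted(
    pages,
    key=lambda name: cosine_similarity(query_vec, pages[name]),
    reverse=True,
)
print(ranked)
```

The page whose vector points in nearly the same direction as the query ranks first, regardless of how long either piece of text is.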

What matters here:

Topical density beats keyword stuffing. Dense retrievers reward pages that semantically cluster around a topic. A page titled "AI Visibility Tools for Brands" that covers monitoring, scoring, and optimization will rank higher than a generic marketing page mentioning "AI" once in a list of features.

Structured data helps retrieval. Clean H2/H3 hierarchies and FAQ sections create clear semantic boundaries that chunking algorithms can split along, and Schema.org markup states explicitly what the page describes. When a retriever chunks your page, each chunk should be a self-contained answer to a plausible question.

Freshness signals exist. Perplexity in particular uses recency as a ranking signal. A blog post from this week about "best AI tools for X" will often outrank an older listicle with the same content. We've measured this across 50+ queries on Be Recommended: content published within the last 30 days gets retrieved 2.3x more often than identical content older than 90 days.
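
To illustrate the chunking point above, here is a small sketch of heading-based splitting. It assumes the page has already been converted to Markdown-like text with `##` and `###` headings; real retrievers use their own chunking strategies, which are not public, and `chunk_by_headings` is a name made up for this example.

```python
import re

def chunk_by_headings(markdown_text):
    """Split text into (heading, body) chunks at H2/H3 headings."""
    chunks = []
    heading, body = "Introduction", []
    for line in markdown_text.splitlines():
        match = re.match(r"^#{2,3}\s+(.*)", line)
        if match:
            # Close the previous chunk, skipping empty ones
            if any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks

# Made-up page content for illustration
page = """Intro paragraph about AI visibility.

## What is AI visibility monitoring?
It tracks how often AI assistants mention a brand.

### Which platforms are covered?
ChatGPT, Perplexity, Claude, and Gemini.
"""

for heading, body in chunk_by_headings(page):
    print(f"{heading!r} -> {body!r}")
```

If a heading reads like a question and the text under it answers that question on its own, each chunk stays useful even when it is retrieved without the rest of the page.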

Citation Extraction: How the LLM Decides What to Name

Once the retrieved documents land in the context window, the LLM has to decide which brands to mention by name. This is where it gets interesting, because the model isn't following a ranking algorithm anymore. It's doing language modeling.

From our testing across four major AI platforms (ChatGPT, Perplexity, Claude, Gemini), we've identified three patterns that drive explicit brand citations:

Pattern 1: Authority signals in retrieved text. If the retrieved document frames a brand as a category leader ("X is widely used for Y"), the model tends to propagate that framing. Third-party comparison pages, review aggregators, and "best of" listicles carry this signal strongly.

Pattern 2: Specificity over generality. The model prefers to cite brands that are described with specific capabilities. "Notion offers database views, kanban boards, and API access" gets cited; "Notion is a great tool" doesn't. Specificity gives the model something concrete to use in its synthesis.

Pattern 3: Source diversity. When a brand appears in multiple retrieved documents from different domains, the model treats it as more credible. One mention on your own site is weak. Mentions across Product Hunt, G2, a tech blog, and a Reddit thread create a reinforcement pattern the model picks up on.
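
Source diversity is easy to measure yourself. The sketch below counts how many distinct domains in a set of retrieved documents mention a brand. The brand "Acme", the documents, and the function name are hypothetical, and the plain substring match is a simplification.

```python
from urllib.parse import urlparse

def domains_mentioning(brand, retrieved_docs):
    """Return the set of distinct hostnames whose text mentions the brand."""
    brand_lower = brand.lower()
    domains = set()
    for doc in retrieved_docs:
        if brand_lower in doc["text"].lower():
            host = urlparse(doc["url"]).netloc.lower()
            if host.startswith("www."):
                host = host[4:]  # count www.example.com and example.com once
            domains.add(host)
    return domains

# Made-up retrieved documents for illustration
docs = [
    {"url": "https://www.example.com/best-tools", "text": "Acme is widely used for reporting."},
    {"url": "https://example.com/blog", "text": "Why we built Acme."},
    {"url": "https://example.org/reviews", "text": "Acme offers API access and kanban boards."},
    {"url": "https://example.net/thread", "text": "I switched to OtherTool last year."},
]

domains = domains_mentioning("Acme", docs)
print(len(domains), sorted(domains))
```

Two pages on the same domain count once here, which matches the idea that mentions from different domains matter more than repeated mentions on your own site.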

Building a Monitoring System: High-Level Architecture

If you want to track how AI systems mention your brand, the architecture is straightforward:

# Simplified monitoring loop. The helper functions (load_test_queries,
# query_engine, extract_mentions, analyze_mention, extract_citations,
# store_result) are placeholders for your own implementations.
brand_name = "YourBrand"  # the brand you are tracking
queries = load_test_queries()  # 50+ prompts per brand
engines = ["chatgpt", "perplexity", "claude", "gemini"]

for engine in engines:
    for query in queries:
        response = query_engine(engine, query)

        # Extract brand mentions
        mentions = extract_mentions(response, brand_name)

        # Score: sentiment, position, context
        score = analyze_mention(mentions)

        # Track citation sources
        sources = extract_citations(response)

        store_result(engine, query, score, sources)

The tricky parts:

Query design matters more than volume. You need queries that a real user would type, not keyword-stuffed test prompts. "What's the best tool for monitoring AI brand visibility?" is useful. "AI brand visibility monitoring tool list 2026" is not, because real users don't query like that.

Each engine behaves differently. Perplexity cites sources explicitly with URLs. ChatGPT mentions brands in prose but doesn't always link. Claude tends to be conservative with brand recommendations unless the retrieved context is strong. Gemini sometimes attributes products to specific people or companies, creating interesting cross-reference patterns.

Response parsing is non-trivial. ChatGPT's temporary chat mode sometimes returns just citation chips with no prose (especially for niche products). Perplexity's citation format changes between search modes. You need robust extraction that handles all these edge cases.
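
Here is one possible starting point for the extraction step, not the parser we run in production. It assumes each engine's response has already been reduced to plain text (or `None` when no prose came back). The regex patterns, the returned fields, and the sample response about a brand called "Acme" are all illustrative assumptions.

```python
import re

# Stop a URL at whitespace or a closing bracket
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")

def extract_mentions(response_text, brand_name):
    """Return character offsets of whole-word, case-insensitive brand mentions."""
    pattern = re.compile(rf"(?<!\w){re.escape(brand_name)}(?!\w)", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(response_text)]

def extract_citations(response_text):
    """Return unique URLs in order of first appearance."""
    urls = []
    for raw in URL_PATTERN.findall(response_text):
        url = raw.rstrip(".,;:")  # drop sentence punctuation stuck to the URL
        if url not in urls:
            urls.append(url)
    return urls

def parse_response(response_text, brand_name):
    """Parse one response, tolerating empty or citation-only output."""
    text = response_text or ""
    mentions = extract_mentions(text, brand_name)
    return {
        "mentioned": bool(mentions),
        "first_position": mentions[0] if mentions else None,
        "citations": extract_citations(text),
    }

# Made-up response text for illustration
sample = "Acme (https://example.com/acme) is one option. See also https://example.org/list."
result = parse_response(sample, "acme")
print(result)
```

Keeping mentions and citations separate matters for the citation-only case: a response can cite your URL without naming your brand in prose, and you want to record both facts.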

What We Learned Building Be Recommended

We built Be Recommended using exactly this approach. The tool runs 50+ real AI prompts against major platforms and produces a scored report (0 to 100) showing where your brand appears, where it doesn't, and what to do about it.

A few things that surprised us:

Content published on third-party platforms (Dev.to, Medium, Reddit, Product Hunt) consistently outperforms on-site blog content for driving AI citations. The retrieval layer treats these as independent authority signals.

Schema.org SoftwareApplication and Product markup had a measurable impact on Gemini's brand attribution specifically. Other engines showed less sensitivity to structured data.

The gap between "the AI knows about you" (parametric knowledge) and "the AI recommends you" (retrieval-driven) is where most brands lose visibility. Your company might exist in GPT-4's training data, but if current web content doesn't surface in retrieval, you won't get mentioned.

Getting Started

If you want to check your own brand's AI visibility, you can run a free analysis at berecommended.com. The free tier covers one brand across all major AI platforms.

For the technically inclined: start by manually querying ChatGPT, Perplexity, and Claude with 10 prompts your customers would actually use. Note which brands get mentioned. If yours isn't among them, the fix is almost always on the retrieval side, not the model side.

GEO is still early. The teams that instrument it now will have a significant head start when every marketing department starts asking "why doesn't ChatGPT recommend us?"

Jakub, builder at Inithouse. We build products that help brands navigate AI-driven discovery.