Alexandre Caramaschi

Posted on May 31 • Originally published at alexandrecaramaschi.com

RAG and Query Fan-Out: How the AI Assembles the Answer That Cites You

#seo #geo #ai #webdev

When you ask ChatGPT, Gemini, or Perplexity a question and get back a tidy paragraph with two or three sources cited, it is tempting to believe the machine "found the right page." It did not. The answer was assembled from fragments of several pages, retrieved in parallel, reordered by relevance, and stitched together by the model. If your content made it into that process, you got cited. If it did not, you simply do not exist for that question, no matter how good your article is.

I treat this as the central engineering problem of GEO (Generative Engine Optimization). Optimizing for generative AI is not "writing for a robot." It is understanding two concrete mechanics, RAG and query fan-out, and shaping your content's structure to survive them. The counterintuitive thesis is simple: what decides the citation is not your whole page, it is the isolated passage inside it. Most sites optimize the page and never the passage.

What RAG actually is

RAG stands for Retrieval-Augmented Generation. The idea is direct: the language model does not answer only from what it memorized during training. Before writing, it retrieves relevant documents from the web (or from an index), injects the retrieved snippets into its own context, and only then generates the answer, grounded in those snippets. That is why the system can cite a source: the sentence it wrote is, ideally, backed by a chunk of text it just read.
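
A minimal sketch of the "inject, then generate" step, assuming the retrieved snippets arrive as (URL, text) pairs. The function name `build_grounded_prompt` and the prompt wording are hypothetical; real assistants do not publish their prompt templates. It only illustrates why a citation can be tied to a specific chunk: each snippet gets a number the model can point back to.

```python
def build_grounded_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Assemble a prompt that injects retrieved snippets as numbered sources.

    `snippets` is a list of (source_url, passage_text) pairs. The numbering
    lets the model cite a source as [1], [2], ... next to each claim.
    """
    lines = [
        "Answer the question using only the sources below.",
        "Cite sources inline as [n].",
        "",
        "Sources:",
    ]
    for number, (url, text) in enumerate(snippets, start=1):
        lines.append(f"[{number}] ({url}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URLs and passages, for illustration only.
    print(build_grounded_prompt(
        "Which ERP features matter for fashion e-commerce?",
        [("https://example.com/erp-tax", "An ERP must issue invoices natively."),
         ("https://example.com/erp-marketplaces", "Marketplace sync avoids overselling.")],
    ))
```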

This pattern is now the common denominator of search-enabled assistants. ChatGPT in search mode runs primarily on the Bing index; Google AI Mode runs on Google's own index; Perplexity layers an online retrieval system on top of an LLM and ships every answer with inline citations. Different backends, same underlying architecture: retrieve, then generate.

The detail that changes everything for content producers: the model does not retrieve your site. It retrieves chunks, separately delimited and indexed pieces of text. The unit of competition is not the URL, it is the block. If your strongest argument is spread across three paragraphs that only make sense together, no single chunk wins the retrieval ranking.
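
To make "the unit of competition is the block" concrete, here is a toy sketch that splits a page into paragraph-level chunks and scores each chunk on its own. Both choices are assumptions for illustration: production engines use their own chunking rules and embedding-based retrieval, not blank-line splitting and word overlap.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(page: str) -> list[str]:
    """Treat each blank-line-separated paragraph as one chunk (an assumption)."""
    return [block.strip() for block in re.split(r"\n\s*\n", page) if block.strip()]


def overlap_score(query: str, chunk: str) -> float:
    """Fraction of distinct query terms that appear in the chunk."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(chunk))) / len(query_terms)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that score highest for the query, each judged alone."""
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]
```

Each chunk is scored without its neighbors, so an argument spread across three paragraphs is scored as three weak candidates rather than one strong one.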

Query fan-out: one question becomes many

The second mechanism is query fan-out. Instead of searching for exactly what you typed, the system decomposes your question into several subqueries and fires them all in parallel. A question like "best ERP for fashion e-commerce in Brazil" does not become one search. It becomes a fan: "ERP for e-commerce," "ERP marketplace integration," "ERP tax/invoice management," "ERP retail comparison Brazil," "ERP user reviews," and so on.

Each subquery retrieves its own set of passages. The system then merges everything, re-ranks, and synthesizes. This multi-step behavior is what the market calls "Deep Research" in ChatGPT and "Pro Search" in Perplexity: break the question into subtasks, fetch multiple sources for each, synthesize into a report. Google's AI Mode, rolled out broadly in early 2026, turns this into an interactive space where new queries are fired internally to refine the answer.
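
The fan-out and merge steps can be sketched as follows. Everything here is a stand-in: the subqueries are supplied by hand (a real engine would generate them), `search` is a toy word-match retriever, and the merge uses reciprocal rank fusion, a common technique for combining ranked lists that I am swapping in because the engines do not document their own merge logic.

```python
from concurrent.futures import ThreadPoolExecutor


def search(subquery: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Toy retriever: rank passage ids by how many subquery words they share."""
    words = set(subquery.lower().split())
    scored = []
    for passage_id, text in corpus.items():
        hits = len(words & set(text.lower().split()))
        if hits > 0:
            scored.append((hits, passage_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [passage_id for _, passage_id in scored[:k]]


def fan_out(subqueries: list[str], corpus: dict[str, str]) -> dict[str, list[str]]:
    """Run every subquery in parallel and keep one ranked list per subquery."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda q: search(q, corpus), subqueries))
    return dict(zip(subqueries, result_lists))


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each appearance adds 1 / (k + rank) to a passage."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, passage_id in enumerate(results, start=1):
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda pid: (-scores[pid], pid))
```

Under this scheme a passage retrieved by several subqueries accumulates score from each list, which is one way to see why covering the derived intents matters more than matching the umbrella query.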

The strategic implication is strong and almost nobody acts on it: you do not need to rank for the main question, you need to rank for the subqueries. A page that covers only the umbrella term loses to a page that clearly answers each derived intent. Subtopic coverage stopped being a long-tail SEO tactic and became a retrieval requirement. This is why I reorganize content by intents, not by keywords.

The full flow, from question to citation

It helps to see the whole path in order. The diagram below traces a query through a RAG + fan-out engine and marks the three points where your content is included or discarded.

USER QUESTION
   "best ERP for fashion e-commerce in Brazil?"
        |
        v
[1] QUERY FAN-OUT  (decomposition)
        |
        +--> subquery A: "ERP fashion e-commerce"
        +--> subquery B: "ERP marketplace integration"
        +--> subquery C: "ERP tax / e-invoice"
        +--> subquery D: "ERP retail comparison BR"
        |    (fired in PARALLEL)
        v
[2] RETRIEVAL  (per-chunk recall)
        |   index: Bing / Google / proprietary
        |   each subquery pulls N passages
        v
   POOL OF CANDIDATE PASSAGES
        |   <-- POINT 1: your chunk enters here or not
        v
[3] RE-RANKING  (reordered by relevance)
        |   relevance + authority + freshness + schema
        v
   TOP-K SELECTED PASSAGES
        |   <-- POINT 2: your chunk rises or drops
        v
[4] GENERATION  (LLM synthesis)
        |   model writes the answer grounded
        |   in the selected passages
        v
   ANSWER + CITATIONS
        |   <-- POINT 3: you become a cited source or not
        v
   USER READS THE ANSWER

Note the three decision points. At Point 1, your chunk must be retrieved by at least one subquery, which depends on textual relevance and on the content being crawlable and indexed. At Point 2, it must survive re-ranking, which weighs relevance, source authority, freshness, and structured signals. At Point 3, the model must actually use your passage in the synthesis and attribute the citation. Every optimization below targets one of these three bottlenecks, never "the site as a whole."
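
Point 2 can be pictured as a weighted score over the four signals named in the diagram. The sketch below is purely illustrative: the `Candidate` fields, the 0-to-1 scales, and every weight are hypothetical values I picked for the example. No engine publishes its re-ranking formula, and real re-rankers are typically learned models rather than a fixed sum.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    passage_id: str
    relevance: float   # 0..1, how well the passage answers the subquery
    authority: float   # 0..1, trust in the source
    freshness: float   # 0..1, how recent the content is
    has_schema: bool   # whether structured data is present


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"relevance": 0.6, "authority": 0.2, "freshness": 0.15, "schema": 0.05}


def rerank_score(candidate: Candidate) -> float:
    """Weighted sum of the four signals from the flow diagram."""
    return (
        WEIGHTS["relevance"] * candidate.relevance
        + WEIGHTS["authority"] * candidate.authority
        + WEIGHTS["freshness"] * candidate.freshness
        + WEIGHTS["schema"] * (1.0 if candidate.has_schema else 0.0)
    )


def top_k(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Keep the k highest-scoring passages for the generation step."""
    return sorted(candidates, key=rerank_score, reverse=True)[:k]
```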

Self-contained passages and semantic chunking

Because the engine indexes and retrieves by piece, your number-one task is to make each piece stand on its own. I call this a self-contained passage: a block of text that answers a specific question without depending on the previous paragraph to make sense. If a passage opens with "as we saw above," it has already lost, because "above" does not travel with the chunk when it is extracted.

In practice, write blocks that carry their own context. Repeat the subject instead of using a dangling pronoun. Open the section with the direct statement, then explain. Semantic chunking, from the content side, means helping the engine cut in the right places: one subtopic per section, headings that describe exactly what follows, and paragraphs that do not blend two ideas. The cleaner the semantic boundary, the better the retrieved chunk.

Rules I apply and enforce on teams:

One intent per block. Do not answer "what is it" and "how much does it cost" in the same paragraph. Those are two chunks, two subqueries, two citation chances.
Answer before explanation. The first sentence of the section delivers the conclusion. Detail comes after, for those who want depth.
Embedded context. Each passage names the entity, the number, and the relevant date instead of assuming the reader (or the extractor) already knows.
No orphan pronoun at the start. "This," "it," "that process" at the beginning of a block break self-containment.
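
Two of the rules above (no orphan pronoun at the start, no reference to earlier text) are mechanical enough to lint. A rough sketch, assuming English content; the word lists are my own short, hypothetical starting set and will need tuning per site. The other rules still need a human reader.

```python
import re

# Hypothetical starter lists; extend them for your own content.
ORPHAN_OPENERS = ("this", "it", "that", "these", "those", "they")
BACK_REFERENCES = ("as we saw above", "as mentioned", "as discussed", "see above")


def self_containment_issues(block: str) -> list[str]:
    """Return a list of reasons why a block may not survive extraction alone."""
    issues = []
    text = block.strip().lower()
    first_word = re.match(r"[a-z']+", text)
    if first_word and first_word.group(0) in ORPHAN_OPENERS:
        issues.append(f"opens with orphan pronoun '{first_word.group(0)}'")
    for phrase in BACK_REFERENCES:
        if phrase in text:
            issues.append(f"refers back to earlier text: '{phrase}'")
    return issues
```

An empty list means the block passed these two checks only. It does not mean the passage names its entity or answers anything.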

Direct answers up top and intent coverage

Re-ranking favors passages that answer directly and densely. A 2026 GEO study based on 145 real fashion and e-commerce queries found that assistants prioritize content with tables, FAQs, direct answers, and proprietary data, because that makes extraction and synthesis easier. The same study quantifies how many brands each engine shows per answer: Perplexity typically surfaces 3 to 6 brands with sources, Google AI Overviews 3 to 5, ChatGPT or Claude 2 to 4. Few slots. Whoever does not answer directly does not compete.

Intent coverage is the complement. Because fan-out decomposes the question, content that covers the entire fan of sub-intents is retrieved by more subqueries and therefore appears at more points in the synthesis. Do not confuse this with stuffing keywords. It means covering, with real depth, the adjacent questions a user would ask in sequence: definition, comparison, price, common mistake, next step. Each becomes a different entry door.

I build this from the subquery map, not from search volume. Take the main question, list the 8 to 12 likely fan-out sub-intents, and guarantee at least one self-contained passage for each. It is laborious, and it is what separates a cited article from an invisible one.
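
The subquery map lends itself to a simple coverage report. The sketch below assumes you maintain the sub-intents by hand as keyword lists and that a section "covers" an intent when its text contains every keyword. That is a crude proxy; the intent names and keywords in the usage example are made up.

```python
def coverage_report(
    sub_intents: dict[str, list[str]], sections: dict[str, str]
) -> dict[str, list[str]]:
    """Map each sub-intent to the headings of sections containing all its keywords."""
    report = {}
    for intent, keywords in sub_intents.items():
        report[intent] = [
            heading
            for heading, body in sections.items()
            if all(keyword.lower() in body.lower() for keyword in keywords)
        ]
    return report


def missing_intents(report: dict[str, list[str]]) -> list[str]:
    """Sub-intents with no covering section: the gaps to write next."""
    return [intent for intent, headings in report.items() if not headings]


if __name__ == "__main__":
    # Hypothetical sub-intents and sections, for illustration only.
    intents = {"definition": ["erp"], "price": ["price"], "tax": ["nf-e"]}
    article = {
        "What an ERP is": "An ERP is software that centralizes operations.",
        "Tax integration": "The ERP must issue NF-e natively.",
    }
    print(missing_intents(coverage_report(intents, article)))
```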

Structured data: the layer re-ranking reads

Schema.org is not "old SEO" in this context. It is the layer that helps the engine understand what each thing is before ranking the passage. Fabrice Canel, on Microsoft's Bing team, has stated that schema markup helps Microsoft's LLMs understand content and serves as an essential data source for AI-based search features. When ChatGPT's backend is Bing, that stops being theory.

What matters for RAG and fan-out specifically:

Organization and Person with sameAs anchor your brand to a disambiguated entity, reducing the risk of the engine attributing the passage to the wrong company.
FAQPage and Article make the question-answer pair explicit, which is exactly the format fan-out looks to retrieve per subquery.
Stable @id and about/mentions references create a coherent internal graph, which is associated with fewer wrong citations and better eligibility for AI Overviews.

An honest caveat, because schema gets sold as a silver bullet: structured data is a disambiguation and eligibility signal, not a guaranteed citation trigger. It improves Point 1 and Point 2 of the flow and helps the engine trust the source, but it does not replace having the best passage.
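
For reference, a sketch of generating that markup as JSON-LD. The schema.org types and properties used (Organization, FAQPage, Question, Answer, sameAs, mainEntity, acceptedAnswer, publisher) are standard vocabulary; the helper name `build_jsonld`, the `#organization` and `#faq` fragment identifiers, and all example.com URLs are placeholders I chose for illustration.

```python
import json


def build_jsonld(
    org_name: str, org_url: str, same_as: list[str], faqs: list[tuple[str, str]]
) -> str:
    """Build a JSON-LD graph with one Organization and one FAQPage.

    `faqs` is a list of (question, answer) pairs. The FAQPage points to the
    Organization through a stable @id instead of repeating its fields.
    """
    org_id = f"{org_url}#organization"
    graph = [
        {
            "@type": "Organization",
            "@id": org_id,
            "name": org_name,
            "url": org_url,
            "sameAs": same_as,
        },
        {
            "@type": "FAQPage",
            "@id": f"{org_url}#faq",
            "publisher": {"@id": org_id},
            "mainEntity": [
                {
                    "@type": "Question",
                    "name": question,
                    "acceptedAnswer": {"@type": "Answer", "text": answer},
                }
                for question, answer in faqs
            ],
        },
    ]
    document = {"@context": "https://schema.org", "@graph": graph}
    return json.dumps(document, indent=2, ensure_ascii=False)
```

The output belongs in a script tag of type application/ld+json on the page, and the answers should match the visible text of the page.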

Example: same information, two outcomes

To close the concept, compare two ways of writing the same thing. The first is how most people publish. The second is optimized for retrieval.

Version that is not retrieved (context locked to the previous paragraph, diluted answer):

As mentioned, it depends a lot on the size of the
operation. In their case, many factors come into
play and the ideal is always to analyze case by case
before deciding anything about tax integration.

This block answers no subquery. It names no entity, brings no number, gives no answer. Extracted alone, it is noise.

Retrievable version (self-contained, answer up top, data embedded):

A fashion e-commerce ERP in Brazil must issue NF-e
and NFC-e natively. In 2026, with the first phase of
split payment from the tax reform, automatic tax
integration stopped being optional: without it, the
merchant remits tax manually on every sale. Platforms
with a native tax module eliminate that work.

The second block names the entity (fashion e-commerce ERP), answers directly (must issue NF-e and NFC-e), carries a dated milestone (split payment in 2026), and ends with an actionable criterion. It is retrievable by the tax-management subquery, the integration subquery, and the comparison subquery. Same information, three entry doors instead of zero.

RAG-readiness checklist for engineers

I close any content audit with this checklist. It is not exhaustive, but it targets the three decision points of the flow.

Crawlable and indexable. Confirm the engine can read the rendered page, with no critical content hidden behind JavaScript the crawler does not execute. Without this, Point 1 never happens.
One subtopic per section, descriptive heading. Each heading should let you guess the passage from the title alone. This guides the semantic chunk cut.
Direct answer in the first sentence of each section. Conclusion before explanation.
Sub-intent map covered. List the 8 to 12 likely fan-out subqueries and verify a passage exists for each.
Self-contained passages. No block starts with an orphan pronoun or a reference to "above."
Entity and FAQ schema. Organization/Person with sameAs, and FAQPage wherever a question-answer pair exists.
Proprietary, dated data. At least one number, experiment, or date that does not exist identically elsewhere. Originality increases the odds the model prefers your passage in the synthesis.
Visible freshness. Explicit publish and update dates. Perplexity in particular values freshness and rewards visible updates.
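
The first item on the checklist can be approximated offline. This sketch assumes you already have the raw HTML as served (before any JavaScript runs) and checks whether your key sentences appear in its visible text. It uses only the standard library parser and ignores anything inside script or style tags; it does not emulate any specific crawler.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the page text a non-JavaScript reader would see, whitespace-normalized."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def missing_from_raw_html(html: str, key_phrases: list[str]) -> list[str]:
    """List the key phrases that only exist after JavaScript runs, or not at all."""
    text = visible_text(html).lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in text]
```

Any phrase this returns is a passage that a crawler which does not execute JavaScript would likely never see.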

This work is not glamorous and does not produce a pretty screenshot. But it is what puts your passage inside the candidate pool, at the top of the re-ranking, and finally into the sentence the user reads. The rest is consequence.

The practical next step

Google stopped treating this as an experiment. In May 2026 it published an official resource on optimizing for generative AI in Search, signaling that the topic had become product guidance, not community guesswork. At I/O 2026, the company declared that Search had entered the "agentic era," with AI Mode and information agents running 24/7. Those agents assemble answers from retrieved passages: the same mechanic as this guide, now on Search's primary surface.

Start small and measurable: take your most important article, run the RAG-readiness checklist, and rewrite three sections as self-contained passages with the answer up top. Then test the subqueries in Perplexity and AI Mode and see whether your passage shows up. The rule I never give up: optimize the passage, not the page. That is where the AI decides whether to cite you.

Written by Alexandre Caramaschi, CEO of Brasil GEO, ex-CMO of Semantix (Nasdaq), co-founder of AI Brasil. This is an English adaptation of the original Portuguese article: RAG e query fan-out: como a IA monta a resposta que cita você.

DEV Community