DEV Community

Watson Foglift

FAQ Schema Gets You 2.7x More AI Citations. But Not for the Reason You Think.

A 2025 Relixir study found that pages with FAQPage schema achieve a 41% AI citation rate versus 15% without — roughly 2.7x higher. That's a real number from a real study.

But here's the thing: AI models don't parse your JSON-LD as structured data. They tokenize it as raw text, the same way they'd read a paragraph.

We just added FAQ schema to 36 pages on our site. Before we did, we wanted to understand why it works — because the mechanism matters more than the correlation. Here's what we found.

The experiment that changed how I think about schema

In February 2026, SEO researcher Mark Williams-Cook ran a controlled experiment. He created a page for a fake company and embedded an address exclusively inside invalid, made-up JSON-LD schema — not in any visible page content. The schema type didn't even exist.

Both ChatGPT and Perplexity successfully extracted and returned the address.

That tells us two things:

  1. LLMs can read JSON-LD — they tokenize it like any other text on the page.
  2. LLMs don't parse the semantic structure of schema — they treated an invalid schema type identically to a valid one.

This is a crucial distinction. When Google processes your FAQPage schema, it parses the structure and feeds it into the Knowledge Graph. When ChatGPT reads your page, it just... reads all the text, including the JSON-LD block, as tokens.

So why does FAQ schema correlate with higher citation rates?

If LLMs don't understand schema structure, why the 2.7x difference? Four mechanisms are at play:

1. The visible Q&A content (the biggest factor)

Every good FAQ schema implementation includes a visible FAQ section on the page. That visible content — clear questions with concise answers — is exactly the format LLMs are optimized to extract. When ChatGPT is looking for "What is the difference between X and Y?", a visible FAQ section with that exact question is an easy win.

This is the mechanism that actually drives most of the citation lift. Not the JSON-LD — the content.

2. The JSON-LD as readable text

Since LLMs tokenize JSON-LD as text, your FAQPage schema becomes an additional, cleanly-formatted representation of your content. A well-structured JSON-LD block repeats your key Q&A pairs in a format that's easy for attention mechanisms to pick up on.

Think of it as giving the model a second, structured summary of your content on the same page.
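To make that concrete, here's a toy illustration (the page snippet is hypothetical, not from our site): strip the HTML tags the way a crude text-only extractor might, and the Q&A pair inside the JSON-LD block survives as ordinary text.

```javascript
// Hypothetical page source: one visible paragraph plus a JSON-LD block.
const pageSource = `
<p>Foglift monitors AI search visibility.</p>
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"FAQPage","mainEntity":[{"@type":"Question","name":"What is Foglift?","acceptedAnswer":{"@type":"Answer","text":"An AEO monitoring tool."}}]}
</script>`;

// Crudely drop the markup, as a structure-agnostic text reader might.
// The JSON-LD body is not markup, so it survives as plain text tokens.
const asText = pageSource.replace(/<[^>]*>/g, " ");

asText.includes('"name":"What is Foglift?"');
// → true: the question is still "visible" to a text-only reader
```

The tags disappear, but everything between them, including the JSON-LD payload, stays in the token stream.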

3. Google and Bing's Knowledge Graph pipeline

Fabrice Canel, Principal Product Manager at Bing, stated at SMX Munich 2025: "Schema markup helps Microsoft's LLMs understand your content." Google's Search Relations team made similar statements at Search Central Live Madrid (April 2025).

For AI Overviews and Bing Copilot specifically, schema is parsed structurally. These platforms have Knowledge Graph infrastructure that traditional LLMs don't. So FAQ schema has a direct effect on two of the six major AI answer surfaces.

4. Selection bias (the uncomfortable one)

Sites that implement FAQ schema tend to be sites that care about content quality, update frequently, and invest in SEO. The 2.7x correlation partially reflects the overall quality of sites that bother with schema — not just the schema itself. No study I've found controls for this.

What we actually built

We needed FAQ schema on 36 pages: 24 comparison pages and 12 blog posts. Here's the approach:

For comparison pages (dynamic template)

Our comparison pages use a shared template. We generate 5 FAQ items per page from the existing comparison data:

const faqs = [
  {
    question: `What is the main difference between Foglift and ${data.name}?`,
    answer: data.heroDescription,
  },
  {
    question: `How does ${data.name} pricing compare to Foglift?`,
    // join with + so template-literal indentation doesn't leak into the JSON-LD
    answer:
      `${data.name} starts at ${data.competitorStartPrice}. ` +
      `Foglift offers a free plan with full website audits, ` +
      `then paid monitoring from $49/month.`,
  },
  // ... 3 more questions generated from page data
];

const faqSchema = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: faqs.map((faq) => ({
    "@type": "Question",
    name: faq.question,
    acceptedAnswer: {
      "@type": "Answer",
      text: faq.answer,
    },
  })),
};

Key decisions:

  • Generate from existing data — no hardcoded FAQ text. If pricing changes, the FAQs update automatically.
  • 5 questions per page — enough for depth, not so many that it feels like keyword stuffing.
  • Plain text answers — strip HTML before injecting into JSON-LD.
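That stripping step can be a small helper. This is a sketch, not our production code; a real implementation should decode entities more thoroughly or use a proper sanitizer:

```javascript
// Sketch of an HTML-stripping helper for FAQ answers (hypothetical).
// Tags become spaces so "<p>a</p><p>b</p>" doesn't fuse into "ab",
// then runs of whitespace are collapsed.
function stripHtml(html) {
  return html
    .replace(/<[^>]*>/g, " ")  // replace every tag with a space
    .replace(/&amp;/g, "&")    // decode the most common entities
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/\s+/g, " ")      // collapse whitespace left over from markup
    .trim();
}

stripHtml("<p>Foglift offers a <strong>free</strong> plan.</p>");
// → "Foglift offers a free plan."
```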

For blog posts (static per-post)

Each blog post gets a hand-written faqJsonLd constant with 4 Q&As specific to the post's topic:

const faqJsonLd = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: [
    {
      "@type": "Question",
      name: "How do AI search engines decide which websites to cite?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "A 2025 SE Ranking study of 129,000 domains found that brand web mentions are the strongest predictor (35% weight), followed by referring domains, content freshness, and content depth."
      }
    },
    // ... 3 more with specific data
  ]
};

Key decisions:

  • Data-backed answers only — every FAQ answer cites a specific source with sample size and year.
  • 4 per post — we tried more, but after 4 the quality drops and answers start restating each other.

The visible section (this is the part that actually matters)

Both implementations render a visible accordion FAQ section that matches the schema:

<h2>Frequently Asked Questions</h2>
{faqJsonLd.mainEntity.map((faq, i) => (
  <details key={i} open={i === 0}>
    <summary><h3>{faq.name}</h3></summary>
    <p>{faq.acceptedAnswer.text}</p>
  </details>
))}

We use details/summary instead of custom accordion components:

  • Zero JavaScript — works with SSR/SSG
  • Semantic HTML — details has built-in accessibility
  • First item open by default — gives crawlers immediate visible content
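Because the visible section and the JSON-LD both come from the same faqJsonLd object, they can't drift apart. Here's a framework-free sketch of that single-source idea (plain string rendering, not our actual React component):

```javascript
// Render both the visible accordion and the JSON-LD from ONE source object,
// so the schema and the on-page FAQ can never disagree. Hypothetical sketch;
// the real site renders the visible half as JSX.
function renderFaqSection(faq) {
  const items = faq.mainEntity
    .map((q, i) =>
      `<details${i === 0 ? " open" : ""}>` +  // first item open by default
      `<summary><h3>${q.name}</h3></summary>` +
      `<p>${q.acceptedAnswer.text}</p>` +
      `</details>`)
    .join("\n");
  const jsonLd =
    `<script type="application/ld+json">${JSON.stringify(faq)}</script>`;
  return `<h2>Frequently Asked Questions</h2>\n${items}\n${jsonLd}`;
}
```

One object in, two representations out: the human-readable accordion and the machine-readable schema.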

What we measured

Before adding FAQ schema + visible FAQ sections, our AEO (Answer Engine Optimization) scores looked like this:

| Page type        | Count | AEO score before |
| ---------------- | ----- | ---------------- |
| Comparison pages | 24    | 41-61            |
| Blog posts       | 12    | 63-66            |
| Homepage         | 1     | 88               |

The homepage scored highest because it already had structured FAQ content. The comparison pages scored lowest because they had minimal structured data.

We've made the upgrade, but we're still waiting on a deploy before we can measure the after. Based on the research, here's what we expect:

  • AEO improvement: We expect comparison pages to jump from 41-61 to the 75-85 range.
  • AI citation probability: Too early to measure directly. Our AI Visibility Check baseline shows 0/35 engine checks mentioning us — so we'll know if it moves.
  • What we DON'T expect: A 44% citation lift from schema alone. (If you're curious why, I wrote about that.)

The takeaway

FAQ schema works. The 2.7x correlation is real. But the mechanism is:

  1. Visible Q&A content is what LLMs actually extract (biggest effect)
  2. JSON-LD gives LLMs a second text representation of your key Q&As (smaller but real effect)
  3. Google/Bing Knowledge Graph parses schema structurally for AI Overviews (platform-specific effect)
  4. Selection bias inflates the correlation (unmeasured confounder)

If you only add the JSON-LD without visible FAQ content, you're capturing effects #2 and #3 but missing #1 — which is the largest factor. If you only add visible FAQ content without schema, you get #1 but miss #2 and #3.

The move is both layers. That's what we built.


We built Foglift to measure exactly this kind of thing — AEO scores, AI visibility, and the gap between your SEO readiness and your AI search readiness. The free scan shows you where your FAQ, schema, and content depth stand.

Sources

  • Relixir (2025) — FAQPage schema citation rate study: 41% vs 15% citation rate
  • Mark Williams-Cook (February 2026) — Controlled experiment on LLM JSON-LD tokenization
  • Fabrice Canel, Bing Principal PM, SMX Munich 2025
  • Google Search Central Live Madrid, April 2025
  • Dunn et al., Nature Communications, February 2024
  • Aggarwal et al., "Generative Engine Optimization," KDD 2024
