Joseph Anady

Posted on May 24 • Originally published at thatdevpro.com

AI Citations: how ChatGPT, Claude, Gemini cite sources

#ai #seo #llms #chatgpt

Part of ThatDevPro's open SEO + AI framework library. ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio. Open-source AI citation toolkit: github.com/Janady13/aio-surfaces.

How LLMs Choose Sources to Cite — Optimizing for ChatGPT, Perplexity, Claude, Gemini, Copilot, and Grok

A comprehensive installation and audit reference for understanding how AI engines select and cite sources, structurally engineering content to be selected, monitoring AI citation status across engines, and building the kind of authority that compounds across the AI search ecosystem. This document is dual-purpose: installation manual and audit document.

Cross-stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client-rendered SPAs (no SSR/SSG) see framework-react.md. For Tailwind-specific concerns (purge, dynamic classes, dark-mode CLS, focus accessibility) see framework-tailwind.md.

1. Document Purpose & How to Use This Document

1.1 What This Document Is

This is the canonical reference for AI Citation optimization — the practice of structuring content, signals, and authority so that AI engines (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok, and Google's AI Overviews) select your content as a source when answering user questions. AI Citation is the new center of gravity in search visibility. Increasingly, users ask AI engines questions instead of search engines, and the question for content creators shifts from "do we rank?" to "do we get cited?"

The mechanics of AI Citation are not identical to traditional SEO. AI engines use Retrieval Augmented Generation (RAG), real-time search, embeddings-based similarity, freshness signals, and authority weightings to choose sources. Some signals overlap with SEO (E-E-A-T, entity authority, structured data); others are specific to how LLMs evaluate content (chunk-level coherence, factual density, citation-worthiness, embedding distinctiveness).

This framework specifies how each major AI engine selects sources, what signals to install on a website to improve selection probability, how to track AI citation status across engines, and how to maintain the authority that drives sustained citation over time. AI Citation works in concert with — but is structurally distinct from — the other frameworks in this library.

1.2 Three Operating Modes

Mode A — Install Mode: Building AI citation optimization infrastructure into a site. Follow Sections 2 → 14.

Mode B — Audit Mode: Evaluating current citation status across AI engines. Skip to Section 11.

Mode C — Hybrid Mode: Audit then install for failing items.

1.3 How Claude Code CLI Should Consume This Document

Read Section 2 — collect client variables, especially current AI engine citation status
Read Section 3 — understand AI Citation theory and how each engine works
Run Section 4 — assess current citation state across all major engines
Install Sections 5-9 — content patterns, technical infrastructure, llms.txt, RAG-friendly structure
Validate — Section 11 (test queries on each engine; document citation status)
Generate report — Section 14

1.4 Conflict Resolution Rules

Conflict	Rule
Existing content not appearing in AI citations despite ranking well in Google	Apply AI-specific signals (chunk structure, factual density, llms.txt, freshness markers). Traditional SEO doesn't fully translate.
AI engines citing outdated content over current content	Strengthen freshness signals; verify dateModified is genuine; reach out to engine via supported channels for re-indexing.
AI engines hallucinating about the entity	Reinforce Knowledge Graph signals (see `framework-knowledgegraph.md`); ensure structured facts on official pages.
Some AI engines citing, others not	Each engine has different mechanisms. Optimize per-engine where signals diverge.

1.5 Required Tools

AI engines themselves — ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok — primary testing environment
Profound, Athena HQ, BrightEdge AI Catalyst, Otterly.ai — AI citation tracking platforms
Google Search Console — for AI Overviews specifically (some impressions data is available)
Bing Webmaster Tools — Microsoft Copilot is integrated with Bing
Server logs — to identify and verify AI bot traffic (GPTBot, PerplexityBot, ClaudeBot, etc.)

2. Client Variables Intake

# ============================================
# AI CITATIONS FRAMEWORK CLIENT VARIABLES
# ============================================

# --- Business & Entity Identity (REQUIRED) ---
business_name: ""
primary_domain: ""
business_industry: ""
business_wikidata_qid: ""
business_in_knowledge_graph: false   # From framework-knowledgegraph.md audit

# --- Current AI Citation Status (REQUIRED for audit) ---
chatgpt_citation_status: ""          # "regularly_cited", "occasionally_cited", "rarely_cited", "never_cited", "unknown"
perplexity_citation_status: ""
claude_citation_status: ""
gemini_citation_status: ""
copilot_citation_status: ""
grok_citation_status: ""
google_ai_overview_citation_status: ""

# --- Citation-Worthy Topics (REQUIRED) ---
topics_where_we_should_be_cited: []  # Topics where the site has strong authority
topics_where_currently_cited: []     # Topics where AI engines do cite the site
topics_where_competitors_cited_instead: []  # Where the site should be cited but isn't

# --- Technical AI Access Status (REQUIRED) ---
robots_txt_blocks_ai_bots: false     # Critical — if true, no AI engine can cite this site
specific_ai_bots_blocked: []         # Which bots if any
llms_txt_exists: false               # /llms.txt file
llms_full_txt_exists: false          # /llms-full.txt file
ai_bot_access_verified: []           # Confirmed access for which bots via logs

# --- Content Patterns (REQUIRED) ---
articles_have_q_and_a_structure: false  # Q&A format aids AI extraction
articles_have_factual_summary_at_top: false  # Top-of-article TL;DR
articles_have_key_facts_callouts: false  # Highlighted key facts
content_uses_clear_definitional_statements: false
content_uses_first_paragraph_substantive_answers: false
content_has_high_factual_density: false  # Not opinion-padded

# --- Schema for AI Citation (REQUIRED) ---
has_organization_schema: false
has_article_schema_with_dateModified: false
has_qapage_schema_where_applicable: false
has_dataset_schema_for_research: false
has_clear_entity_declarations_per_page: false  # mainEntity + about

# --- Freshness Infrastructure (REQUIRED) ---
content_dateModified_kept_current: false
content_changelog_exposed: false     # Visible record of substantive updates
content_refresh_cadence: ""          # "monthly", "quarterly", "annually", "ad_hoc"
time_sensitive_content_specifically_managed: false

# --- Authority Signals AI Engines Weight (REQUIRED) ---
eeat_score: 0                        # From framework-eeat.md
ymyl_score: 0                        # From framework-ymyl.md
information_gain_score: 0            # From framework-infogain.md
external_citation_count: 0           # How often external sources cite this site
wikipedia_references_to_site: 0      # Most powerful AI citation signal
academic_citations: 0                # For research-bearing sites

# --- AI-Specific Authority Signals (RECOMMENDED) ---
mentioned_in_llm_training_documentation: false  # If site is documented as training source
official_partner_status_with_ai_companies: false
appears_in_stable_kg_databases: false  # CommonCrawl, etc.

# --- Tracking Infrastructure (REQUIRED) ---
has_ai_citation_tracking_setup: false
ai_citation_tracking_tool: ""        # Profound, Otterly, Athena, manual, etc.
tracked_query_set: []                # Specific queries tracked across engines
last_citation_audit_date: ""

3. What AI Citation Theory Is

AI Citation Theory describes how AI engines — large language models grounded with real-time retrieval — choose which web sources to cite when answering user questions. The mechanism is fundamentally different from traditional search ranking, though it shares some signals.

When a user asks an AI engine a question, the engine typically:

1. Interprets the query — uses the LLM to understand intent, identify entities, and formulate sub-queries

2. Retrieves candidate sources — searches the web (or a curated index) for documents potentially relevant to answering

3. Ranks and filters candidates — applies authority, freshness, factual density, and similarity scoring to select the strongest candidates

4. Reads and extracts — processes the candidate documents to extract answer-relevant content

5. Synthesizes the answer — composes a response drawing from extracted content

6. Selects citations — chooses which sources to display as citations alongside the response

The citation selection step is where "AI Citation Optimization" focuses. Even when a source is retrieved and read, it may not appear as a visible citation in the final response. Engines vary in how aggressively they cite — Perplexity displays many citations prominently; ChatGPT cites less prominently; Claude cites when retrieving but the citation may be brief.

The factors AI engines weight in source selection (synthesizing across engines):

Authority — Sources with established authority (recognized entities, credentialed authors, established publications) are preferred. Knowledge Graph presence is a particularly strong signal.

Factual density — Sources that pack more verifiable facts per word are preferred over opinion-heavy or padded content.

Definitional clarity — Sources that clearly define terms, concepts, and entities in straightforward language are easier for LLMs to extract from.

Freshness — For time-sensitive queries, newer content is preferred. For evergreen queries, freshness matters less but currency still helps.

Structural extractability — Content with clear Q&A structure, headed sections, factual summaries, and clean HTML is easier to extract and cite cleanly.

Distinctive content — Sources offering Information Gain (see framework-infogain.md) — original research, first-hand experience, contrarian analysis — are preferred over derivative content.

Trust signals — Sources with strong E-E-A-T (see framework-eeat.md) signals are preferred, especially for YMYL queries.

Bot accessibility — Sources that allow the engine's bot to crawl are required; sites blocking AI bots are excluded entirely.

The 2026 evolution of AI Citation:

AI Overviews now appear for the majority of US searches in Google — citation in AI Overviews is increasingly visible
ChatGPT's web search became available by default to free users, so its citations now reach a broad audience
Perplexity grew to substantial daily active user counts with prominent citation display
Claude's projects feature increased real-time browsing for retrieval
Microsoft Copilot deeply integrated with Bing's index
Grok added web grounding with X-source preference

Each engine evolves rapidly. The principles in this framework are stable; the specific implementation patterns adapt as engines change.

4. AI Citation Status Assessment

Before optimization, understand current citation status.

4.1 Per-Engine Citation Testing

Define a tracked query set — 10-30 queries the site should be cited for based on its topical authority. For each query, test on each engine:

ChatGPT (chatgpt.com):

Sign in with web search enabled
Ask the query
Document: did the response cite the site? What context?

Perplexity (perplexity.ai):

Ask the query
Document citations panel — is the site listed? What rank?

Claude (claude.ai):

Ask the query with web search enabled
Document citations in response

Gemini (gemini.google.com):

Ask the query
Document sources panel — is the site listed?

Microsoft Copilot (copilot.microsoft.com):

Ask the query
Document citations in response

Grok (x.com/i/grok):

Ask the query
Document citations

Google AI Overview:

Search the query in Google
If AI Overview appears, document if site is cited

Build a citation matrix:

query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes_rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes_inline,yes_rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes_rank5,no,no,no,no,no,2026-04-29

4.2 Citation Frequency Classification

Per topic, classify citation frequency:

Regularly cited: Cited in 4+ engines for queries in the topic area
Occasionally cited: Cited in 2-3 engines
Rarely cited: Cited in 1 engine
Never cited: Not appearing in any engine for queries the site should win
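
The thresholds above are simple enough to apply mechanically to the citation matrix from Section 4.1. The following is a minimal sketch, assuming the CSV columns shown in that matrix and treating any cell that starts with `yes` as a citation. The function names are illustrative, and counting Google AI Overview as one of the engines is an assumption of this sketch.

```python
import csv
import io

# Engine columns as they appear in the Section 4.1 matrix header.
ENGINES = ["chatgpt", "perplexity", "claude", "gemini", "copilot", "grok", "ai_overview"]


def cited_engines_by_query(csv_text):
    """Return {query: [engines whose cell starts with 'yes']}."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "yes", "yes_rank3", and "yes_inline" all count as a citation.
        result[row["query"]] = [
            engine for engine in ENGINES
            if row[engine].strip().lower().startswith("yes")
        ]
    return result


def classify(engine_count):
    """Map a count of citing engines to the Section 4.2 label."""
    if engine_count >= 4:
        return "regularly_cited"
    if engine_count >= 2:
        return "occasionally_cited"
    if engine_count == 1:
        return "rarely_cited"
    return "never_cited"


SAMPLE = '''query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes_rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes_inline,yes_rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes_rank5,no,no,no,no,no,2026-04-29
'''

if __name__ == "__main__":
    for query, engines in cited_engines_by_query(SAMPLE).items():
        print(f"{query}: {classify(len(engines))} ({', '.join(engines) or 'none'})")
```

The labels match the values expected by the `*_citation_status` variables in Section 2, so the output can be pasted into the intake file. To classify a whole topic rather than a single query, one option is to take the union of citing engines across that topic's queries before calling `classify`.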

4.3 Competitor Citation Analysis

For queries where the site is not cited, document who is:

Which competitors are cited?
What do those sources have that this site doesn't?
Are they Wikipedia entries? Major media? Industry publications? Smaller sites with distinctive content?

This reveals what types of authority each engine prefers for the topic.

4.4 Bot Access Verification

Verify AI engine bots can access the site:

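# Check robots.txt for rules that name AI bots explicitly.
# No output means no bot is named; a blanket "User-agent: *" rule may still apply to them.
curl -s https://{{domain}}/robots.txt | grep -iE "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|cohere-ai"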

Major AI bots in 2026:

GPTBot — OpenAI's training crawler
OAI-SearchBot — OpenAI's search retrieval bot
ChatGPT-User — User-triggered fetches
PerplexityBot — Perplexity's crawler
ClaudeBot / anthropic-ai — Anthropic's crawlers
Google-Extended — robots.txt control token for Google AI training use (not a separate crawler; pages are fetched by Google's existing user agents)
GoogleOther — Google's other crawlers
CCBot — Common Crawl (used by many LLMs for training)
Applebot-Extended — robots.txt control token for Apple Intelligence training use (pages are fetched by Applebot)
Bytespider — TikTok/ByteDance
Diffbot — knowledge graph crawler
FacebookBot / Meta-ExternalAgent — Meta AI

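Confirm in server logs that these crawlers are visiting and getting 200 responses. Google-Extended and Applebot-Extended are robots.txt tokens only, so they never appear as user agents in logs; verify them in robots.txt instead.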
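
A grep only shows which bots robots.txt names; it does not show whether a given URL is actually allowed. The sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt body against a list of bot tokens. The bot list and the function name `blocked_bots` are assumptions for illustration. Python's parser applies the first matching rule in file order, which can differ from how individual crawlers resolve conflicting rules, so treat the result as a first-pass check.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of the tokens listed above; extend as needed.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "Google-Extended", "CCBot",
]


def blocked_bots(robots_txt, url, bots=AI_BOTS):
    """Return the bots from `bots` that robots_txt disallows for `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_bots(sample, "https://example.com/page"))
```

In practice the robots.txt body would come from the same `curl` request shown above, and the URL would be a page from the tracked query set.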

4.5 Citation Status Summary

After assessment, classify the site's citation posture:

AI-authoritative — regularly cited across multiple engines on multiple topics
Citation-emerging — occasionally cited; clear pattern of growing recognition
Citation-minimal — rarely cited despite topical authority
Citation-absent — not cited anywhere despite strong organic search performance

The implementation path differs by status.

5. Per-Article AI Citation Implementation

Structural patterns that improve AI citation likelihood.

5.1 Top-of-Article Factual Summary

AI engines often extract from the first part of an article. Lead with substantive content, not throat-clearing.

<article class="ai-citation-optimized">
  <header>
    <h1>{{ENTITY_OR_TOPIC_AS_HEADLINE}}</h1>
    {{BYLINE_WITH_CREDENTIALS}}
    {{DATES_INCLUDING_dateModified_PROMINENTLY}}
  </header>

  <!-- Top-of-article factual summary — 2-4 sentences, definitional and substantive -->
  <section class="factual-summary" aria-label="Summary">
    <p>{{2_TO_4_SENTENCE_SUMMARY_THAT_DEFINES_THE_TOPIC_OR_ANSWERS_THE_QUERY_DIRECTLY}}</p>
  </section>

  <!-- Body content -->
  <section class="article-body">
    {{CONTENT}}
  </section>
</article>

The factual summary should:

Define the topic or directly answer the query
Use complete sentences (extractable)
Be factually dense (multiple specific claims)
Avoid filler ("In this article we'll explore...")
Match the page's primary entity (see framework-entitysalience.md)

5.2 Q&A Structure for Common Questions

For sections answering specific questions, use explicit Q&A structure:

<section class="article-faq">
  <h2>Common Questions</h2>

  <h3>What is entity salience?</h3>
  <p>Entity salience is a numerical score (0.0 to 1.0) calculated by Google's natural language processing systems that represents how central a specific entity is to a piece of content...</p>

  <h3>How is entity salience measured?</h3>
  <p>Entity salience is measured by analyzing position, frequency, grammatical role, and co-occurrence patterns within the content...</p>
</section>

Pair with FAQPage schema:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is entity salience?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Entity salience is a numerical score..."
      }
    },
    {
      "@type": "Question",
      "name": "How is entity salience measured?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Entity salience is measured by..."
      }
    }
  ]
}
</script>

Q&A structure aligns directly with how users phrase queries to AI engines, making extraction-and-citation easier.
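
Hand-maintaining the visible Q&A and the FAQPage JSON-LD separately invites drift between the two. One way to avoid that is to generate the schema from the same question and answer pairs used to render the page. A minimal sketch, assuming the pairs are available as plain text; `faq_jsonld` is a hypothetical helper name:

```python
import json


def faq_jsonld(pairs):
    """Build a FAQPage JSON-LD script element from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so answer text can never close the script element early.
    # "<\/" is a valid JSON escape and decodes back to "</".
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(faq_jsonld([
        ("What is entity salience?", "Entity salience is a numerical score..."),
        ("How is entity salience measured?", "Entity salience is measured by..."),
    ]))
```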

5.3 Key Facts Callouts

Highlight factual claims that AI engines might want to cite:

<aside class="key-fact" role="note">
  <p><strong>Key fact:</strong> Google's Information Gain patent (US 11,995,114 B2) was granted in May 2024 and describes a system for scoring documents based on novelty contribution.</p>
</aside>

Key facts should:

State a single specific fact
Be sourced (link to primary source)
Be self-contained (extractable without surrounding context)
Be visually distinguished

5.4 Definitional Clarity

When introducing terms or concepts, use clear definitional patterns AI engines extract well:

Strong: "Entity salience is a numerical score representing how central an entity is to content."

Weak: "When we talk about entity salience, what we're really getting at is how much a piece of content focuses on something specific."

The strong version is a clear "X is Y" definition. The weak version requires multiple inferences.
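
The "X is Y" pattern can be checked with a rough lint during editing. The sketch below is a heuristic only: the regular expression, the six-word subject limit, and the preamble list are assumptions chosen to separate the two examples above, and nothing here reflects how any AI engine parses text.

```python
import re

# A subject of up to six words, then a linking verb and an article: "X is a Y".
DIRECT_DEFINITION = re.compile(
    r"^(?:\S+\s+){1,6}?(?:is|are|refers to|means)\s+(?:a|an|the)\s+\w+",
    re.IGNORECASE,
)

# Openers that delay the definition (illustrative, not exhaustive).
PREAMBLES = ("when we talk about", "in this article", "what we're really")


def is_direct_definition(sentence):
    """Return True if the sentence opens with a plain 'X is a Y' definition."""
    text = sentence.strip().strip('"')
    if text.lower().startswith(PREAMBLES):
        return False
    return bool(DIRECT_DEFINITION.match(text))


if __name__ == "__main__":
    strong = "Entity salience is a numerical score representing how central an entity is to content."
    weak = ("When we talk about entity salience, what we're really getting at is "
            "how much a piece of content focuses on something specific.")
    print(is_direct_definition(strong), is_direct_definition(weak))
```

A check like this is best used to flag candidate sentences for a human editor, since plenty of good definitions use other constructions.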

5.5 First-Paragraph Substantive Answers

For articles answering specific queries, the first paragraph should answer the query substantively. Don't make readers (or AI engines) hunt:

Anti-pattern (delays answer):

"Entity salience has become an increasingly important topic in modern SEO. With the rise of AI engines and the changing nature of search, many practitioners are wondering what this means for their content strategy. In this article, we'll explore the concept of entity salience in depth..."

Better (delivers answer immediately):

"Entity salience is a numerical score (0.0-1.0) calculated by natural language processing systems to determine how central an entity is to a piece of content. Google uses entity salience scoring as a key signal for determining what queries a page should rank for, and AI engines use similar scoring to determine which sources to cite."

5.6 High Factual Density

Articles that pack many specific facts per paragraph are more cite-worthy than articles that pad with opinion or generality.

Lower density: "SEO has changed a lot in recent years. Things that used to work don't anymore. Smart marketers are adapting to the new landscape."

Higher density: "Google's March 2024 core update integrated the Helpful Content System into core ranking. The September 2025 Search Quality Rater Guidelines update added evaluation criteria for AI Overviews. The December 2025 core update specifically targeted mass-produced AI content, with content farms losing 40-80% of organic traffic."

The high-density version is full of specific, citable facts.
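
Factual density has no standard metric, but a crude proxy helps when comparing drafts. The sketch below counts tokens that contain digits plus capitalized words that do not start a sentence, per 100 words. Both the proxy and the names `specific_tokens` and `density_per_100_words` are assumptions of this sketch; it cannot tell whether a fact is true or sourced.

```python
import re


def specific_tokens(text):
    """Count tokens that look specific: numbers, dates, mid-sentence capitalized words."""
    count = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for index, word in enumerate(sentence.split()):
            token = word.strip(".,;:()\"'")
            if not token:
                continue
            if any(ch.isdigit() for ch in token):
                count += 1  # years, percentages, version numbers
            elif index > 0 and token[0].isupper():
                count += 1  # rough stand-in for a named entity
    return count


def density_per_100_words(text):
    """Specific tokens per 100 words; 0.0 for empty text."""
    words = text.split()
    return 100.0 * specific_tokens(text) / len(words) if words else 0.0


if __name__ == "__main__":
    low = ("SEO has changed a lot in recent years. Things that used to work "
           "don't anymore. Smart marketers are adapting to the new landscape.")
    high = ("Google's March 2024 core update integrated the Helpful Content "
            "System into core ranking.")
    print(round(density_per_100_words(low), 1), round(density_per_100_words(high), 1))
```

The absolute number means little; the useful signal is the difference between two versions of the same paragraph.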

5.7 Source Citations Within Content

When making factual claims, link to primary sources inline:

<p>Google's Information Gain patent (<a href="https://patents.google.com/patent/US11995114B2" rel="noopener">US 11,995,114 B2</a>) was granted in May 2024.</p>

This:

Substantiates the claim
Models good citation behavior
Provides AI engines a reference trail
Builds trust signals

5.8 Article Schema With Detailed Properties

Article schema with detailed properties gives AI engines structured metadata:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "{{PAGE_URL}}#article",
  "headline": "{{TITLE}}",
  "description": "{{2_SENTENCE_DESCRIPTION_FOR_CITATION_DISPLAY}}",
  "author": {"@id": "{{AUTHOR_PAGE_URL}}#person"},
  "datePublished": "{{ISO_PUBLISHED_DATE}}",
  "dateModified": "{{ISO_LAST_SUBSTANTIVE_UPDATE_DATE}}",
  "publisher": {"@id": "{{DOMAIN}}/#organization"},
  "mainEntity": {"@type": "Thing", "name": "{{PRIMARY_TOPIC}}", "sameAs": "{{WIKIDATA_URL}}"},
  "about": [
    {"@type": "Thing", "name": "{{TOPIC_1}}"},
    {"@type": "Thing", "name": "{{TOPIC_2}}"}
  ],
  "citation": [
    {
      "@type": "CreativeWork",
      "name": "{{REFERENCED_WORK_NAME}}",
      "url": "{{REFERENCED_URL}}"
    }
  ]
}
</script>

The citation property is particularly valuable — it tells AI engines what authoritative sources this content draws on, signaling rigor.
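
Because dateModified feeds the freshness signals discussed throughout this framework, it is worth validating the Article object before publishing. A minimal sketch, assuming the JSON-LD has already been loaded into a dict; the list of required keys is this sketch's assumption, not a schema.org requirement:

```python
from datetime import date

# Keys this sketch treats as mandatory for citation purposes (an assumption).
REQUIRED_KEYS = ("headline", "author", "publisher", "datePublished", "dateModified")


def article_schema_problems(article, today):
    """Return a list of human-readable problems found in an Article dict."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not article.get(key)]
    try:
        # Slice to the date part so full ISO 8601 datetimes are accepted too.
        published = date.fromisoformat(str(article.get("datePublished", ""))[:10])
        modified = date.fromisoformat(str(article.get("dateModified", ""))[:10])
    except ValueError:
        problems.append("dates are not ISO 8601")
        return problems
    if modified < published:
        problems.append("dateModified precedes datePublished")
    if modified > today:
        problems.append("dateModified is in the future")
    return problems


if __name__ == "__main__":
    article = {
        "headline": "Example",
        "author": {"@id": "https://example.com/about/#person"},
        "publisher": {"@id": "https://example.com/#organization"},
        "datePublished": "2025-01-10",
        "dateModified": "2024-12-01",
    }
    print(article_schema_problems(article, date(2025, 4, 1)))
```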

5.9 Visible Update Information

AI engines weight freshness, but only when freshness is genuine. Display update information visibly:

<div class="article-dates">
  <p>
    <time datetime="{{PUBLISHED}}">Published {{PUBLISHED_HUMAN}}</time>
    ·
    <time datetime="{{UPDATED}}">Last updated {{UPDATED_HUMAN}}</time>
  </p>
</div>

<details class="changelog">
  <summary>Article changelog</summary>
  <ul>
    <li>{{DATE}}: {{SUBSTANTIVE_CHANGE}}</li>
    <li>{{DATE}}: {{SUBSTANTIVE_CHANGE}}</li>
  </ul>
</details>

This signals genuine freshness and resists the "fake date refresh" pattern AI engines (and Google) increasingly detect.

6. Site-Wide AI Citation Infrastructure

Beyond per-article patterns, the site needs infrastructure-level AI signals.

6.1 Phase 1: Bot Access Configuration

6.1.1 robots.txt for AI bots

The robots.txt should explicitly allow AI bots the business wants citation from. The 2026 default for most businesses is allowing all major AI bots:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: CCBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: Diffbot
Allow: /

# Standard search engine bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
Sitemap: https://{{domain}}/sitemap.xml

If specific bots should be blocked (e.g., for content licensing reasons), configure explicitly:

# Example: block training crawlers, allow search/answer crawlers
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

Most businesses gain more from being cited than they lose from being trained on. The default should be permissive.
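
When the allow and block lists change often, generating robots.txt from two lists keeps the policy explicit and reviewable. A minimal sketch, assuming a site-wide allow or disallow per bot is enough; `build_robots_txt` is a hypothetical helper, and path-level rules are out of scope here.

```python
def build_robots_txt(allowed_bots, blocked_bots, sitemap_url):
    """Render a robots.txt body with one group per bot plus a permissive default."""
    overlap = set(allowed_bots) & set(blocked_bots)
    if overlap:
        raise ValueError(f"bots listed as both allowed and blocked: {sorted(overlap)}")
    lines = []
    for bot in allowed_bots:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    for bot in blocked_bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Default group for every other crawler, followed by the sitemap location.
    lines += ["User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        allowed_bots=["OAI-SearchBot", "PerplexityBot", "ClaudeBot"],
        blocked_bots=["GPTBot"],
        sitemap_url="https://example.com/sitemap.xml",
    ))
```

The example call reproduces the "block training crawlers, allow search/answer crawlers" policy shown above.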

6.1.2 Verify bot visits in logs

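# Check the current access log for AI bot visits; include rotated logs (access.log.*) to cover a full 30 days.
# Fields assume the default combined log format: $1 = client IP, $7 = path, $9 = status.
# Google-Extended is a robots.txt token, not a user agent, so it is not searched for here.
grep -E "GPTBot|OAI-SearchBot|ChatGPT-User|PerplexityBot|ClaudeBot|Bytespider" /var/log/nginx/access.log | \
  awk '{print $1, $7, $9}' | sort -u | head -50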

If no AI bot visits appear in 30 days for a published site, troubleshoot:

Confirm robots.txt isn't blocking
Confirm site is technically reachable
Confirm no firewall blocking specific bot user agents
Submit URLs to engine's submission endpoints if available
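
For troubleshooting, it helps to see not just whether a bot visited but which status codes it received. The sketch below parses nginx combined-format lines and counts hits per bot and status. The regular expression assumes the default combined format, and the sample lines (IPs, timestamps, user-agent strings) are made up for illustration. User agents can be spoofed, so a match here is a hint rather than proof of a genuine crawler.

```python
import re
from collections import Counter

# nginx "combined" format: ip - user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
        "ClaudeBot", "CCBot", "Bytespider"]


def bot_hits(lines):
    """Count requests per (bot, status) for lines whose user agent names a known bot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        for bot in BOTS:
            if bot in agent:
                counts[(bot, match.group("status"))] += 1
                break
    return counts


# Fabricated sample lines using documentation IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [29/Apr/2026:10:00:00 +0000] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.8 - - [29/Apr/2026:10:05:00 +0000] "GET /private/ HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.4 - - [29/Apr/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for (bot, status), hits in sorted(bot_hits(SAMPLE_LINES).items()):
        print(bot, status, hits)
```

A bot that appears only with 403 or 5xx statuses usually points to a firewall or WAF rule rather than a robots.txt problem.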

6.2 Phase 2: llms.txt File

The llms.txt standard (proposed by Jeremy Howard in late 2024, gaining adoption through 2025-2026) provides AI engines a curated map of a site's most useful content. Build at /llms.txt:

# {{BUSINESS_NAME}}

> {{ONE_PARAGRAPH_DESCRIPTION_OF_THE_BUSINESS_AND_WHAT_THE_SITE_COVERS}}

## Primary Documentation

- [About {{BUSINESS_NAME}}](https://{{domain}}/about/): Comprehensive description of the business, its founder, and its mission
- [Services Overview](https://{{domain}}/services/): What we offer and who we serve
- [Editorial Policy](https://{{domain}}/editorial-policy/): How we create and review content
- [Disclosures](https://{{domain}}/disclosure/): AI use, advertising, and conflict-of-interest disclosures

## Core Topics We Cover

- [Web Development](https://{{domain}}/topics/web-development/): {{ONE_LINE_TOPIC_DESCRIPTION}}
- [SEO and AI Search Optimization](https://{{domain}}/topics/seo/): {{DESCRIPTION}}
- [Computer Repair](https://{{domain}}/topics/computer-repair/): {{DESCRIPTION}}

## Foundational Frameworks

- [E-E-A-T Framework](https://{{domain}}/framework-eeat/): Comprehensive guide to demonstrating Experience, Expertise, Authoritativeness, and Trustworthiness
- [YMYL Framework](https://{{domain}}/framework-ymyl/): Standards for Your Money or Your Life content
- [Helpful Content System](https://{{domain}}/framework-hcs/): People-first content guidance
- [Information Gain Framework](https://{{domain}}/framework-infogain/): Original contribution principles
- [Knowledge Graph Framework](https://{{domain}}/framework-knowledgegraph/): Entity establishment and Wikidata strategy
- [AI Citations Framework](https://{{domain}}/framework-aicitations/): How to optimize for AI engine citations

## Author / Founder

- [Joseph Anady's Profile](https://{{domain}}/about/joseph-anady/): Founder background, credentials, and topical expertise

## Contact

For inquiries, contact joseph.w.anady@icloud.com or call 505-512-3662.

The llms.txt is a hand-curated guide. It tells AI engines: "If you want to understand our site, here are the canonical entry points."
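
Since llms.txt is hand-curated, the link list is the part that matters; the file layout itself can be rendered from data. A minimal sketch, assuming the structure shown above (title, blockquote summary, then sections of annotated links). `build_llms_txt` and the example URLs are hypothetical.

```python
def build_llms_txt(name, summary, sections):
    """Render llms.txt text from a site name, a summary, and {heading: [(title, url, note)]}."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{title}]({url}): {note}" for title, url, note in links]
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co builds and maintains websites for small businesses.",
        {
            "Primary Documentation": [
                ("About", "https://example.com/about/", "Who we are and what we do"),
                ("Services", "https://example.com/services/", "What we offer"),
            ],
        },
    ))
```

Keeping the sections in a data file under version control also gives a history of which pages were promoted to canonical entry points and when.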

6.3 Phase 3: llms-full.txt

For longer-form crawl assistance, build /llms-full.txt with the actual full text of the most important pages concatenated:

# {{BUSINESS_NAME}} — Comprehensive Documentation

[Full text of About page]

---

[Full text of Editorial Policy]

---

[Full text of primary service descriptions]

---

[Full text of foundational framework documents]

---

[Full text of author bios for primary authors]

This is a larger file but gives AI engines a clean text representation of the site's most important content without HTML/JS interference.
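
Producing that clean text means stripping markup and scripts from each page before concatenating. The sketch below does this with the standard library's `html.parser`; the class and function names are illustrative, and real pages will usually need extra handling for navigation, footers, and other repeated chrome.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script, style, and noscript content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML fragment, one text node per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.parts)


def build_llms_full(title, pages_html):
    """Concatenate page texts under a title, separated by '---' rules."""
    body = "\n\n---\n\n".join(html_to_text(page) for page in pages_html)
    return f"# {title}\n\n{body}\n"


if __name__ == "__main__":
    pages = [
        "<h1>About</h1><script>var x = 1;</script><p>We build sites.</p>",
        "<h1>Editorial Policy</h1><p>Every article is reviewed.</p>",
    ]
    print(build_llms_full("Example Co Documentation", pages))
```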

6.4 Phase 4: Authority Infrastructure for AI

Apply the foundational frameworks for AI authority:

E-E-A-T strong (framework-eeat.md) — credentialed authors, comprehensive bios, organizational trust signals
Knowledge Graph established (framework-knowledgegraph.md) — Wikidata entry, Wikipedia article if notable, claimed Knowledge Panel
Entity Salience strong (framework-entitysalience.md) — primary entities clearly marked on every page
Information Gain demonstrated (framework-infogain.md) — original research, first-hand experience, novel contributions
YMYL standards met if applicable (framework-ymyl.md)

These foundational frameworks compound — sites that score highly across multiple frameworks are dramatically more likely to be cited by AI engines than sites that excel in only one.

6.5 Phase 5: AI-Specific Schema Patterns

Beyond standard schema, AI-specific patterns help:

Definitional schema for primary terms:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Entity Salience",
  "description": "A numerical score representing how central a specific entity is to a piece of content...",
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "Search and AI Optimization Glossary",
    "url": "{{GLOSSARY_URL}}"
  }
}
</script>

Fact schema for specific facts:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Claim",
  "text": "Google's Information Gain patent was granted May 2024",
  "claimInterpreter": {"@type": "Organization", "name": "{{BUSINESS_NAME}}"},
  "appearance": "{{PAGE_URL}}#fact-information-gain-patent-date"
}
</script>

HowTo schema for procedural content:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "{{PROCEDURE_NAME}}",
  "step": [
    {"@type": "HowToStep", "name": "{{STEP_1_NAME}}", "text": "{{STEP_1_TEXT}}"},
    {"@type": "HowToStep", "name": "{{STEP_2_NAME}}", "text": "{{STEP_2_TEXT}}"}
  ]
}
</script>

6.6 Phase 6: Freshness Strategy

AI engines prefer fresh content for time-sensitive topics. Maintain freshness genuinely:

Time-sensitive content reviewed and refreshed on cadence (see framework-hcs.md Section 6.6 — refresh strategy)
dateModified accurately reflects substantive updates
Article changelog visible (Section 5.9 above)
Time-sensitive sections marked with specific dates: "As of {{MONTH_YEAR}}"

Don't fake freshness. AI engines (especially Perplexity and ChatGPT's web search) are increasingly detecting fake refresh patterns.

6.7 Phase 7: Engine-Specific Considerations

ChatGPT/OpenAI:

Prefers factually dense content
Surfaces sources via "[Search]" mode
Cites in different presentation styles depending on query type
Prefers content from established domains
No dedicated submission API; pages are discovered through web search

Perplexity:

Most aggressive about citing sources prominently
Uses real-time web retrieval
Heavy weight on freshness for time-sensitive queries
Submission via Perplexity for Publishers program (paid)
High volume of citations per response (often 5-10 sources)

Claude:

Citations tied to web search activations
Prefers structured, dense content
Heavy E-E-A-T weighting
Strong handling of YMYL — credentialed sources strongly preferred

Gemini:

Direct integration with Google's index
Strong Knowledge Graph weighting
Prefers entities recognized by Google KG
Sources panel less prominent than Perplexity

Microsoft Copilot:

Bing-indexed sources preferred
Bing Webmaster Tools submissions help
Heavy weighting toward Bing's authority signals

Grok:

Heavy preference for X (Twitter) sources
Real-time information weighted higher
Web sources cited but less prominently than X content
Authority signals less developed than other engines

Google AI Overviews:

Sources from Google search index
Strong correlation with featured snippet/top-ranked pages
Knowledge Graph entities preferred
Limited new mechanism — fundamentally an evolution of Google search

Optimize for the engines that drive your audience.

7. Tracking AI Citation Status

Continuous tracking is essential because AI engine behavior changes rapidly.

7.1 Manual Tracking Protocol

For solo practitioners or small teams without budget for tracking tools:

Maintain a query set document (10-30 queries the site should be cited for)
Quarterly: test each query in each engine
Document citation status in a tracking spreadsheet
Note changes from previous quarter
Investigate regressions
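
Spotting changes between quarters is a set comparison once each quarter's results are stored as a mapping. A minimal sketch, assuming each snapshot maps a (query, engine) pair to whether the site was cited; the function name and snapshot shape are assumptions, and missing pairs are treated as not cited.

```python
def citation_changes(previous, current):
    """Compare two {(query, engine): cited} snapshots; return (gains, regressions)."""
    gains, regressions = [], []
    for key in sorted(set(previous) | set(current)):
        before = previous.get(key, False)
        after = current.get(key, False)
        if after and not before:
            gains.append(key)
        elif before and not after:
            regressions.append(key)
    return gains, regressions


if __name__ == "__main__":
    q1 = {("E-E-A-T explained", "chatgpt"): True,
          ("E-E-A-T explained", "copilot"): False}
    q2 = {("E-E-A-T explained", "chatgpt"): False,
          ("E-E-A-T explained", "copilot"): True}
    gains, regressions = citation_changes(q1, q2)
    print("gains:", gains)
    print("regressions:", regressions)
```

The regressions list is the starting point for the investigation step; the gains list is worth recording alongside whatever changed on the site that quarter.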

7.2 Tool-Based Tracking

For ongoing tracking:

Profound (tryprofound.com) — comprehensive AI citation tracking
Otterly.ai — tracks AI engine mentions
Athena HQ — AI search analytics
BrightEdge AI Catalyst — enterprise AI search tracking
Semrush AI Toolkit — AI Overview tracking

These tools automate query testing across engines and report citation status over time.

7.3 Server Log Analysis

Server logs show AI bot activity. Track:

Which AI bots visit and how often
Which pages they visit most
Whether visits correlate with visible citation activity
Whether bot access patterns change over time

# Monthly AI bot activity summary: request count per bot (assumes access.log.1 holds the period being reviewed).
# The bot name is matched anywhere in the line, so the result does not depend on which field holds the user agent.
grep -oE "GPTBot|OAI-SearchBot|ChatGPT-User|PerplexityBot|ClaudeBot|anthropic-ai|CCBot|Bytespider" \
  /var/log/nginx/access.log.1 | \
  sort | uniq -c | sort -rn

7.4 Citation Change Investigation

When citation status changes:

Sudden gain: Document what changed. Was new content published? Was an external citation acquired? Did a Wikipedia article appear? Understanding the trigger informs strategy.

Sudden loss: Investigate. Was content removed or moved? Did robots.txt change? Did dateModified pattern change? Did competitors gain stronger authority?

Engine-specific change: Check engine's recent product announcements. Engine ranking algorithm updates affect citation patterns.

8. Common Mistakes & Anti-Patterns

8.1 Blocking AI Bots in robots.txt Without Strategy

Anti-pattern: Blanket-blocking all AI bots out of vague concern about content training.

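Why it fails: A blanket block also stops the crawlers that fetch pages for cited answers, so those engines cannot cite the site. The site loses visibility, traffic, and the compounding authority that citations build.

Fix: Default to allowing AI bots. Block a specific bot only when there is a concrete business reason (licensed content, regulatory requirement, etc.).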
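
A quick way to confirm that a robots.txt matches the intended policy is to test it against each bot token. The sketch below uses Python's standard `urllib.robotparser`. The token list, the `blocked_bots` helper, and the sample robots.txt are assumptions for illustration, so confirm current token names in each vendor's crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Example tokens only; vendors add and rename crawlers over time.
AI_TOKENS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]


def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the AI tokens that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if not parser.can_fetch(token, url)]


# Hypothetical robots.txt: permissive default with one deliberate block
robots = """\
User-agent: *
Allow: /

User-agent: CCBot
Disallow: /
"""
print(blocked_bots(robots))
```

Running this after each deploy catches an accidental blanket `Disallow: /` before it costs crawl access.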

8.2 No llms.txt File

Anti-pattern: Site has substantial content but no curated guide for AI engines.

Why it fails: Without it, AI engines must infer the site's structure from a raw crawl, and content prioritization is left to each engine's discretion.

Fix: Build a comprehensive llms.txt that directs engines to canonical content.
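
A minimal sketch of generating the file, assuming the Markdown layout described in the llms.txt proposal (an H1 title, a blockquote summary, and H2 sections containing link lists). The `build_llms_txt` function and the site data are hypothetical. A real build would read titles and URLs from the CMS or content collection.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (title, url, one-line description)


def build_llms_txt(site_name: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render an llms.txt document: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data for illustration
text = build_llms_txt(
    "Example Studio",
    "Web engineering studio; canonical guides are listed below.",
    {"Guides": [("AI Citations", "https://example.com/guides/ai-citations",
                 "How AI engines select sources")]},
)
print(text)
```

Writing the result to `/llms.txt` at build time keeps the file in step with published content.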

8.3 Padded Content with Low Factual Density

Anti-pattern: Articles padded with opinion, generality, throat-clearing — but few specific facts.

Why it fails: AI engines prefer factually dense content. Padded content gets passed over.

Fix: Cut padding. Pack specific, citable facts. If you don't have facts, you don't have an article.

8.4 Buried Answers

Anti-pattern: The answer to the page's primary question is in section 4 of an 8-section article.

Why it fails: AI engines tend to extract from content near the top of the page, so buried answers are less likely to be extracted.

Fix: Put the direct answer in the opening paragraphs and let the elaboration follow.

8.5 No Knowledge Graph Presence

Anti-pattern: Site has good content but no Wikidata entry, no Knowledge Panel, no entity authority infrastructure.

Why it fails: AI engines weight Knowledge Graph presence heavily. Sites without it are cited less.

Fix: Build Knowledge Graph foundation per framework-knowledgegraph.md.

8.6 Fake Date Refresh

Anti-pattern: Updating dateModified without substantive content updates to look fresh.

Why it fails: AI engines (and Google) increasingly detect fake refreshes, and a detected fake refresh damages trust.

Fix: Refresh only on substantive updates. Use a changelog to demonstrate a genuine refresh history.
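
One way to enforce this in a publishing pipeline is to bump dateModified only when the body text has changed by a meaningful amount. The sketch below compares word sequences with the standard `difflib` module. The `next_date_modified` name and the 5% threshold are arbitrary assumptions for illustration, not values taken from any engine's documentation.

```python
import difflib
from datetime import date


def next_date_modified(old_body: str, new_body: str, current: date, today: date,
                       min_change: float = 0.05) -> date:
    """Return `today` only if the body changed by at least `min_change`; else keep `current`."""
    old_words, new_words = old_body.split(), new_body.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    changed_fraction = 1.0 - matcher.ratio()
    return today if changed_fraction >= min_change else current


# Hypothetical article bodies for illustration
old = "alpha beta gamma delta epsilon zeta eta theta iota kappa"
whitespace_only = old.replace(" ", "  ")           # formatting change, same words
substantive = old + " one two three four five"     # new material added

last_modified, today = date(2026, 1, 1), date(2026, 4, 1)
kept = next_date_modified(old, whitespace_only, last_modified, today)
bumped = next_date_modified(old, substantive, last_modified, today)
print(kept, bumped)
```

A word-level ratio is a crude proxy for "substantive". An editorial sign-off recorded in the changelog remains the stronger signal.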

8.7 Generic Author Bylines

Anti-pattern: "By the editorial team" or "By staff writer" or no byline at all.

Why it fails: AI engines, particularly for YMYL content, weight credentialed authorship heavily. Anonymous content is cited less.

Fix: Real authors with real credentials and Person schema.

8.8 No Schema or Minimal Schema

Anti-pattern: Site relies entirely on HTML structure with no JSON-LD schema.

Why it fails: Schema gives AI engines structured metadata for extraction. Missing it means engines work harder to understand the page.

Fix: Comprehensive schema per page type per framework-eeat.md and framework-entitysalience.md.
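
As a rough illustration of what "schema per page type" can look like for an article with a named author, the sketch below builds a schema.org Article object with a Person author and serializes it as JSON-LD. The `article_jsonld` helper and every value passed to it are hypothetical. The property set shown is a small subset, not the full list the referenced frameworks call for.

```python
import json


def article_jsonld(headline, url, date_published, date_modified,
                   author_name, author_url, same_as):
    """Build a schema.org Article object with a Person author."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": date_published,
        "dateModified": date_modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "sameAs": same_as,  # profile URLs that identify the same person
        },
    }


# Hypothetical values for illustration
data = article_jsonld(
    "How AI Engines Select Sources",
    "https://example.com/guides/ai-citations",
    "2026-01-15",
    "2026-04-01",
    "Jane Example",
    "https://example.com/authors/jane-example",
    ["https://www.linkedin.com/in/jane-example"],
)
script_tag = '<script type="application/ld+json">' + json.dumps(data) + "</script>"
print(script_tag)
```

The resulting tag goes in the page's HTML so that it is present without JavaScript execution.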

8.9 Content Behind JavaScript Walls

Anti-pattern: Content rendered entirely by JavaScript without server-side rendering or pre-rendering.

Why it fails: Many AI bots don't execute JavaScript. Content that is not present in the delivered HTML is invisible to them.

Fix: Server-side rendering, static generation, or hybrid approaches that deliver content in HTML.
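
A simple check for this anti-pattern is to look for a key phrase in the raw HTML response, ignoring anything inside script or style elements. The sketch below uses the standard `html.parser` module. The helper names and the two sample pages are assumptions for illustration. In practice the HTML would be fetched without a headless browser, for example with `urllib.request`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def content_in_raw_html(raw_html: str, key_phrase: str) -> bool:
    """True if `key_phrase` is readable in the HTML without running JavaScript."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return key_phrase.lower() in " ".join(parser.chunks).lower()


# Hypothetical pages: one server-rendered, one client-rendered
ssr_page = "<html><body><h1>Guide</h1><p>llms.txt is a curated index.</p></body></html>"
spa_page = ('<html><body><div id="root"></div>'
            '<script>render("llms.txt is a curated index.")</script></body></html>')

print(content_in_raw_html(ssr_page, "curated index"))
print(content_in_raw_html(spa_page, "curated index"))
```

The client-rendered page fails the check because its only copy of the phrase lives inside a script.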

8.10 Information Gain Absent

Anti-pattern: Pages aggregate and rephrase existing content with no original contribution.

Why it fails: AI engines synthesizing from sources prefer sources adding novel information. Aggregator content gets passed over.

Fix: Ensure every article contributes Information Gain, per framework-infogain.md.

8.11 Engine-Agnostic Approach

Anti-pattern: Treating all AI engines the same. Optimizing identically for ChatGPT and Grok.

Why it fails: Engines weight signals differently. Maximum citation requires engine-aware optimization.

Fix: Track per-engine status. Optimize patterns where engine signals diverge.

8.12 No Tracking, No Learning

Anti-pattern: Hoping for AI citations without tracking whether they're happening.

Why it fails: Without data there is no improvement loop, and strategies that don't work persist.

Fix: Tracked query set. Quarterly assessment minimum. Tools where budget allows.

9. Stack-Specific Notes

9.1 WordPress

Yoast or Rank Math handles much of the schema
Custom fields for AI-specific metadata: factual summary, key facts, definitional content
WP plugin for llms.txt generation
Editorial workflow includes AI citation optimization checklist

9.2 Next.js / Astro / Hugo

Static generation ensures content is in HTML for bots
Structured frontmatter with required AI-relevant metadata fields
llms.txt generated at build time from content collections
Schema generators automate JSON-LD creation

9.3 Universal

Pre-publish checklist includes AI citation patterns
Top-of-article factual summary required
Schema validation in CI/CD
llms.txt regenerated on content changes
Bot access verified post-deploy
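
The "schema validation in CI/CD" item can start as a small script that extracts JSON-LD from built pages and fails the build when required properties are missing. This is a sketch under stated assumptions: the required-property list is a house rule invented for illustration (not a schema.org requirement), and `schema_errors` is a hypothetical helper that only checks top-level Article objects.

```python
import json
from html.parser import HTMLParser

# Assumed house rule for illustration, not a schema.org requirement
REQUIRED_ARTICLE_PROPS = ("headline", "author", "datePublished", "dateModified")


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def schema_errors(html: str) -> list:
    """Return a list of human-readable problems; an empty list means the page passes."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    if not collector.blocks:
        return ["no JSON-LD block found"]
    errors = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"invalid JSON-LD: {exc.msg}")
            continue
        if isinstance(data, dict) and data.get("@type") == "Article":
            errors += [f"Article missing {p}" for p in REQUIRED_ARTICLE_PROPS if p not in data]
    return errors


# Hypothetical built pages for illustration
good_page = ('<script type="application/ld+json">{"@type": "Article", "headline": "x", '
             '"author": {"@type": "Person", "name": "A"}, '
             '"datePublished": "2026-01-01", "dateModified": "2026-04-01"}</script>')
bad_page = ('<script type="application/ld+json">{"@type": "Article", "headline": "x", '
            '"author": {"@type": "Person", "name": "A"}, '
            '"datePublished": "2026-01-01"}</script>')

print(schema_errors(good_page))
print(schema_errors(bad_page))
```

In a pipeline, a non-empty result for any page would exit with a non-zero status to block the deploy.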

10. Cross-Reference to the 14-Tier Framework

AI Citation implementation touches:

Tier 3 LLMO — LLM Optimization is foundational AI citation work
Tier 3 SGA — SearchGPT Optimization
Tier 3 GEO — Generative Engine Optimization
Tier 3 AEO — Answer Engine Optimization
Tier 3 BLF — Bot/LLM File optimization (llms.txt)
Tier 3 AIO — Direct AI optimization
Tier 3 EEO — Entity Engine Optimization
Tier 3 KGO — Knowledge Graph (foundational for AI)

The Tier 3 (AI Domination) cluster of the 14-tier framework is essentially a tactical implementation of this framework's principles. AI Citation is the strategic frame; Tier 3 items are the specific deliverables.

11. Audit Mode

11.1 Per-Engine Citation Audit

For the tracked query set, score each engine:

Engine	Queries Cited For	% of Tracked Set	Citation Quality
ChatGPT	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
Perplexity	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
Claude	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
Gemini	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
Copilot	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
Grok	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
AI Overview	{{COUNT}}	{{%}}	{{HIGH/MEDIUM/LOW}}
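
The first two numeric columns of this table can be computed directly from tracked results. A minimal sketch, assuming results are stored per engine as a mapping of query to cited/not-cited. The `engine_summary` name and the sample data are illustrative. The Citation Quality column still needs a human judgment and is not computed here.

```python
from typing import Dict


def engine_summary(results: Dict[str, Dict[str, bool]]) -> Dict[str, Dict[str, float]]:
    """Map each engine to its cited-query count and percent of the tracked set."""
    summary = {}
    for engine, by_query in results.items():
        cited = sum(by_query.values())
        total = len(by_query)
        percent = round(100 * cited / total, 1) if total else 0.0
        summary[engine] = {"cited": cited, "percent": percent}
    return summary


# Hypothetical results for a four-query tracked set
results = {
    "ChatGPT": {"q1": True, "q2": True, "q3": True, "q4": False},
    "Grok": {"q1": False, "q2": False, "q3": False, "q4": False},
}
summary = engine_summary(results)
for engine, row in summary.items():
    print(engine, row["cited"], f'{row["percent"]}%')
```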

11.2 Per-Page AI Optimization Audit

For sample pages, score:

#	Criterion	Pass/Fail
AI1	Top-of-article factual summary present
AI2	First paragraph delivers substantive answer
AI3	High factual density throughout
AI4	Q&A structure used where applicable
AI5	Key facts callouts present
AI6	Definitional clarity for terms
AI7	Source citations linked inline
AI8	Article schema with all major properties
AI9	mainEntity declared with sameAs
AI10	dateModified is genuine (not faked)
AI11	Author with credentials (Person schema)
AI12	FAQPage schema if Q&A content

Maximum per-page score: 12. World-class AI citation page: 11+/12.
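
The pass/fail tally is simple enough to automate once each criterion has been assessed. A sketch, assuming results are recorded per criterion ID from the table above. The `score_page` function and its output shape are hypothetical.

```python
PAGE_CRITERIA = [f"AI{n}" for n in range(1, 13)]  # AI1..AI12 from the table above
WORLD_CLASS_THRESHOLD = 11                        # "11+/12" from the text


def score_page(results: dict) -> dict:
    """`results` maps criterion ID -> True (pass) or False (fail); missing IDs count as fail."""
    passed = [c for c in PAGE_CRITERIA if results.get(c, False)]
    failed = [c for c in PAGE_CRITERIA if c not in passed]
    return {
        "score": len(passed),
        "max": len(PAGE_CRITERIA),
        "world_class": len(passed) >= WORLD_CLASS_THRESHOLD,
        "failed": failed,
    }


# Hypothetical audit: everything passes except FAQPage schema (AI12)
audit = {criterion: True for criterion in PAGE_CRITERIA}
audit["AI12"] = False
report = score_page(audit)
print(f'{report["score"]}/{report["max"]}', report["world_class"], report["failed"])
```

Listing the failed IDs alongside the score makes the remediation order obvious in the audit report.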

11.3 Site-Wide AI Citation Audit

#	Criterion	Pass/Fail
AIS1	robots.txt allows all major AI bots
AIS2	AI bot visits verified in server logs
AIS3	llms.txt file present at /llms.txt
AIS4	llms-full.txt present (optional but valuable)
AIS5	Knowledge Graph foundation established
AIS6	E-E-A-T score high (110+/130)
AIS7	Entity Salience strong on primary topics
AIS8	Information Gain demonstrable
AIS9	YMYL standards met if applicable
AIS10	Tracking infrastructure in place
AIS11	Per-engine citation status known
AIS12	Quarterly audit cadence active

Maximum site score: 12. World-class AI citation site: 11+/12.

12. Maintenance Schedule

12.1 Weekly

Verify AI bot visits in server logs
Spot-check 1-2 tracked queries on primary engines
Monitor for AI engine product announcements affecting citation behavior

12.2 Monthly

Update llms.txt if site content has expanded
Review newly published content for AI citation patterns
Test 5 tracked queries across all engines

12.3 Quarterly

Full tracked query set test across all engines
Document citation status changes from previous quarter
Investigate gains and losses
Audit AI bot access in robots.txt
Refresh time-sensitive content
Check schema validation across primary pages

12.4 Annually

Comprehensive AI Citation framework audit
Strategic review of citation status across all engines
Update tracked query set to reflect current authority
Review engine-specific signal weights based on year's observed patterns
Update llms.txt comprehensively
Update Knowledge Graph entries with year's developments

12.5 On Major AI Engine Updates

When OpenAI, Perplexity, Anthropic, Google, etc. announce major updates:

Read the announcement in detail
Identify changes affecting citation behavior
Test citation status on affected query types
Adjust optimization patterns if engine signal weights shifted
Update this framework document if patterns suggest framework gaps

13. Implementation/Audit Report Templates

13.1 AI Citation Implementation Report Template

# AI Citation Framework Implementation Report

**Site**: {{BUSINESS_NAME}}
**Implementation Date**: {{TODAY}}

## Summary
- Pages with AI citation patterns installed: {{COUNT}}
- robots.txt configured for AI bots: {{STATUS}}
- llms.txt created: {{STATUS}}
- llms-full.txt created: {{STATUS}}
- Knowledge Graph foundation: {{STATUS}}

## AI Citation Patterns Installed
- Top-of-article factual summaries: {{COUNT}}
- Q&A structures: {{COUNT}}
- Key facts callouts: {{COUNT}}
- FAQPage schemas: {{COUNT}}
- Article schemas with citation properties: {{COUNT}}

## Bot Access Verification
- GPTBot visits in last 30 days: {{COUNT}}
- PerplexityBot visits: {{COUNT}}
- ClaudeBot visits: {{COUNT}}
- Google crawler visits (Google-Extended is a robots.txt token and does not appear in logs): {{COUNT}}
- CCBot visits: {{COUNT}}

## Tracking Infrastructure
- Tracked query set: {{COUNT}} queries
- Tool: {{TOOL_NAME}}
- Baseline citation status documented: {{YES/NO}}

## Sign-Off

13.2 AI Citation Audit Report Template

# AI Citation Framework Audit Report

**Site**: {{BUSINESS_NAME}}
**Audit Date**: {{TODAY}}

## Executive Summary
{{ONE_PARAGRAPH_ASSESSMENT}}

## Site-Wide AI Citation Score
{{X}}/12

## Per-Engine Citation Status
{{TABLE_OF_ENGINES_AND_CITATION_FREQUENCIES}}

## Per-Page Audit (Sample)
{{TABLE_OF_SAMPLED_PAGES_WITH_AI_CITATION_SCORES}}

## Foundation Framework Status
- E-E-A-T: {{SCORE}}/130
- Knowledge Graph: {{STATUS}}
- Entity Salience: {{STATUS}}
- Information Gain: {{STATUS}}
- YMYL: {{STATUS}}

## Bot Access Status
{{ROBOTS_TXT_AND_LOG_FINDINGS}}

## llms.txt Status
{{ASSESSMENT}}

## Critical Failures
{{LIST_WITH_REMEDIATION}}

## Engine-Specific Findings
- ChatGPT: {{FINDINGS}}
- Perplexity: {{FINDINGS}}
- Claude: {{FINDINGS}}
- Gemini: {{FINDINGS}}
- Copilot: {{FINDINGS}}
- Grok: {{FINDINGS}}
- AI Overview: {{FINDINGS}}

## Recommended Remediation Order
{{PRIORITIZED_LIST}}

## Tracked Query Status Trend
{{COMPARISON_TO_PREVIOUS_AUDITS}}

## Sign-Off

End of Framework Document

Document version: 1.0
Last updated: 2026-04-29
Maintained by: ThatDeveloperGuy

AI Citation is the new center of gravity for web visibility. Sites that earn citations across multiple AI engines compound authority across the entire AI search ecosystem. Sites that don't optimize for AI citation lose visibility as user behavior shifts toward AI-first information seeking.

The work is structural and methodical. Allow bots. Build the llms.txt. Engineer content for extractability. Establish the Knowledge Graph foundation. Demonstrate Information Gain. Maintain freshness. Track citation status. Iterate.

The frameworks in this library (E-E-A-T, YMYL, HCS, SQRG, Core Updates, Information Gain, Entity Salience, Knowledge Graph, AI Citations) converge on one operational truth: build a site that genuinely deserves to be cited as an authority on the topics it covers. Every framework approaches that truth from a different angle. Together they specify what "deserving" means in the 2026 search and AI ecosystem.

Companion documents:

framework-eeat.md — Foundational E-E-A-T
framework-ymyl.md — Your Money or Your Life elevated standards
framework-hcs.md — Helpful Content System
framework-sqrg.md — Search Quality Rater Guidelines
framework-coreupdates.md — Google Core Updates
framework-infogain.md — Information Gain
framework-entitysalience.md — Entity Salience
framework-knowledgegraph.md — Knowledge Graph

About this framework library

This article is the Dev.to republish of a framework reference document from ThatDevPro's SEO + AI engineering library. Canonical source: https://www.thatdevpro.com/insights/framework-aicitations/

ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio operating from Cassville, Missouri. The studio runs the full 14-tier Engine Optimization stack and ships open-source tooling for AI citation engineering.

Need this framework implemented on your site? See the Engine Optimization service or hire through ThatDevPro contact.

Top comments (1)

Harjot Singh • May 31

Understanding HOW each model picks sources is the actually-actionable layer under all the AEO hype - it's the difference between "make good content" (vague) and "structure content the way this model's retrieval actually rewards" (concrete). The fact that ChatGPT, Claude, and Perplexity weight and surface sources differently means there isn't one optimization; there's a per-engine game, the same way SEO once split across Google/Bing quirks.

The strategic implication for anyone building in this space: being the cited source is becoming a real acquisition channel, and it compounds - once a model reliably cites you for a topic, you're the default answer, which is far stickier than a search ranking that resets every algorithm update. It's a channel I'm actively betting on for Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - being the answer when someone asks an AI "how do I get from prompt to deployed app" beats any ad. Genuinely useful breakdown. Did you find the citation behavior differs enough between the three that you'd optimize separately, or is there a common core (clear structure, primary sourcing) that wins across all of them?