Originally published at thatdevpro.com. Part of ThatDevPro's open SEO + AI framework library. ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio. Open-source AI citation toolkit: github.com/Janady13/aio-surfaces.
How LLMs Choose Sources to Cite — Optimizing for ChatGPT, Perplexity, Claude, Gemini, Copilot, and Grok
A comprehensive installation and audit reference for understanding how AI engines select and cite sources, structurally engineering content to be selected, monitoring AI citation status across engines, and building the kind of authority that compounds across the AI search ecosystem. This document is dual-purpose: installation manual and audit document.
Cross-stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client-rendered SPAs (no SSR/SSG), see framework-react.md. For Tailwind-specific concerns (purge, dynamic classes, dark-mode CLS, focus accessibility), see framework-tailwind.md.
1. Document Purpose & How to Use This Document
1.1 What This Document Is
This is the canonical reference for AI Citation optimization — the practice of structuring content, signals, and authority so that AI engines (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok, and Google's AI Overviews) select your content as a source when answering user questions. AI Citation is the new center of gravity in search visibility. Increasingly, users ask AI engines questions instead of search engines, and the question for content creators shifts from "do we rank?" to "do we get cited?"
The mechanics of AI Citation are not identical to traditional SEO. AI engines use Retrieval Augmented Generation (RAG), real-time search, embeddings-based similarity, freshness signals, and authority weightings to choose sources. Some signals overlap with SEO (E-E-A-T, entity authority, structured data); others are specific to how LLMs evaluate content (chunk-level coherence, factual density, citation-worthiness, embedding distinctiveness).
This framework specifies how each major AI engine selects sources, what signals to install on a website to improve selection probability, how to track AI citation status across engines, and how to maintain the authority that drives sustained citation over time. AI Citation works in concert with — but is structurally distinct from — the other frameworks in this library.
1.2 Three Operating Modes
Mode A — Install Mode: Building AI citation optimization infrastructure into a site. Follow Sections 2 → 14.
Mode B — Audit Mode: Evaluating current citation status across AI engines. Skip to Section 11.
Mode C — Hybrid Mode: Audit then install for failing items.
1.3 How Claude Code CLI Should Consume This Document
- Read Section 2 — collect client variables, especially current AI engine citation status
- Read Section 3 — understand AI Citation theory and how each engine works
- Run Section 4 — assess current citation state across all major engines
- Install Sections 5-9 — content patterns, technical infrastructure, llms.txt, RAG-friendly structure
- Validate — Section 11 (test queries on each engine; document citation status)
- Generate report — Section 14
1.4 Conflict Resolution Rules
| Conflict | Rule |
|---|---|
| Existing content not appearing in AI citations despite ranking well in Google | Apply AI-specific signals (chunk structure, factual density, llms.txt, freshness markers). Traditional SEO doesn't fully translate. |
| AI engines citing outdated content over current content | Strengthen freshness signals; verify dateModified is genuine; reach out to engine via supported channels for re-indexing. |
| AI engines hallucinating about the entity | Reinforce Knowledge Graph signals (see framework-knowledgegraph.md); ensure structured facts on official pages. |
| Some AI engines citing, others not | Each engine has different mechanisms. Optimize per-engine where signals diverge. |
1.5 Required Tools
- AI engines themselves — ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok — primary testing environment
- Profound, Athena HQ, BrightEdge AI Catalyst, Otterly.ai: AI citation tracking platforms
- Google Search Console — for AI Overviews specifically (some impressions data is available)
- Bing Webmaster Tools — Microsoft Copilot is integrated with Bing
- Server logs — to identify and verify AI bot traffic (GPTBot, PerplexityBot, ClaudeBot, etc.)
2. Client Variables Intake
# ============================================
# AI CITATIONS FRAMEWORK CLIENT VARIABLES
# ============================================
# --- Business & Entity Identity (REQUIRED) ---
business_name: ""
primary_domain: ""
business_industry: ""
business_wikidata_qid: ""
business_in_knowledge_graph: false # From framework-knowledgegraph.md audit
# --- Current AI Citation Status (REQUIRED for audit) ---
chatgpt_citation_status: "" # "regularly_cited", "occasionally_cited", "rarely_cited", "never_cited", "unknown"
perplexity_citation_status: ""
claude_citation_status: ""
gemini_citation_status: ""
copilot_citation_status: ""
grok_citation_status: ""
google_ai_overview_citation_status: ""
# --- Citation-Worthy Topics (REQUIRED) ---
topics_where_we_should_be_cited: [] # Topics where the site has strong authority
topics_where_currently_cited: [] # Topics where AI engines do cite the site
topics_where_competitors_cited_instead: [] # Where the site should be cited but isn't
# --- Technical AI Access Status (REQUIRED) ---
robots_txt_blocks_ai_bots: false # Critical — if true, no AI engine can cite this site
specific_ai_bots_blocked: [] # Which bots if any
llms_txt_exists: false # /llms.txt file
llms_full_txt_exists: false # /llms-full.txt file
ai_bot_access_verified: [] # Confirmed access for which bots via logs
# --- Content Patterns (REQUIRED) ---
articles_have_q_and_a_structure: false # Q&A format aids AI extraction
articles_have_factual_summary_at_top: false # Top-of-article TL;DR
articles_have_key_facts_callouts: false # Highlighted key facts
content_uses_clear_definitional_statements: false
content_uses_first_paragraph_substantive_answers: false
content_has_high_factual_density: false # Not opinion-padded
# --- Schema for AI Citation (REQUIRED) ---
has_organization_schema: false
has_article_schema_with_dateModified: false
has_qapage_schema_where_applicable: false
has_dataset_schema_for_research: false
has_clear_entity_declarations_per_page: false # mainEntity + about
# --- Freshness Infrastructure (REQUIRED) ---
content_dateModified_kept_current: false
content_changelog_exposed: false # Visible record of substantive updates
content_refresh_cadence: "" # "monthly", "quarterly", "annually", "ad_hoc"
time_sensitive_content_specifically_managed: false
# --- Authority Signals AI Engines Weight (REQUIRED) ---
eeat_score: 0 # From framework-eeat.md
ymyl_score: 0 # From framework-ymyl.md
information_gain_score: 0 # From framework-infogain.md
external_citation_count: 0 # How often external sources cite this site
wikipedia_references_to_site: 0 # Most powerful AI citation signal
academic_citations: 0 # For research-bearing sites
# --- AI-Specific Authority Signals (RECOMMENDED) ---
mentioned_in_llm_training_documentation: false # If site is documented as training source
official_partner_status_with_ai_companies: false
appears_in_stable_kg_databases: false # CommonCrawl, etc.
# --- Tracking Infrastructure (REQUIRED) ---
has_ai_citation_tracking_setup: false
ai_citation_tracking_tool: "" # Profound, Otterly, Athena, manual, etc.
tracked_query_set: [] # Specific queries tracked across engines
last_citation_audit_date: ""
3. What AI Citation Theory Is
AI Citation Theory describes how AI engines — large language models grounded with real-time retrieval — choose which web sources to cite when answering user questions. The mechanism is fundamentally different from traditional search ranking, though it shares some signals.
When a user asks an AI engine a question, the engine typically:
1. Interprets the query — uses the LLM to understand intent, identify entities, and formulate sub-queries
2. Retrieves candidate sources — searches the web (or a curated index) for documents potentially relevant to answering
3. Ranks and filters candidates — applies authority, freshness, factual density, and similarity scoring to select the strongest candidates
4. Reads and extracts — processes the candidate documents to extract answer-relevant content
5. Synthesizes the answer — composes a response drawing from extracted content
6. Selects citations — chooses which sources to display as citations alongside the response
The citation selection step is where "AI Citation Optimization" focuses. Even when a source is retrieved and read, it may not appear as a visible citation in the final response. Engines vary in how aggressively they cite — Perplexity displays many citations prominently; ChatGPT cites less prominently; Claude cites when retrieving but the citation may be brief.
The factors AI engines weight in source selection (synthesizing across engines):
Authority — Sources with established authority (recognized entities, credentialed authors, established publications) are preferred. Knowledge Graph presence is a particularly strong signal.
Factual density — Sources that pack more verifiable facts per word are preferred over opinion-heavy or padded content.
Definitional clarity — Sources that clearly define terms, concepts, and entities in straightforward language are easier for LLMs to extract from.
Freshness — For time-sensitive queries, newer content is preferred. For evergreen queries, freshness matters less but currency still helps.
Structural extractability — Content with clear Q&A structure, headed sections, factual summaries, and clean HTML is easier to extract and cite cleanly.
Distinctive content — Sources offering Information Gain (see framework-infogain.md) — original research, first-hand experience, contrarian analysis — are preferred over derivative content.
Trust signals — Sources with strong E-E-A-T (see framework-eeat.md) signals are preferred, especially for YMYL queries.
Bot accessibility — Sources that allow the engine's bot to crawl are required; sites blocking AI bots are excluded entirely.
The 2026 evolution of AI Citation:
- AI Overviews now appear for the majority of US searches in Google — citation in AI Overviews is increasingly visible
- ChatGPT's web search rolled out to default for free users; citation visibility is meaningful
- Perplexity grew to substantial daily active user counts with prominent citation display
- Claude's projects feature increased real-time browsing for retrieval
- Microsoft Copilot deeply integrated with Bing's index
- Grok added web grounding with X-source preference
Each engine evolves rapidly. The principles in this framework are stable; the specific implementation patterns adapt as engines change.
4. AI Citation Status Assessment
Before optimization, understand current citation status.
4.1 Per-Engine Citation Testing
Define a tracked query set — 10-30 queries the site should be cited for based on its topical authority. For each query, test on each engine:
ChatGPT (chatgpt.com):
- Sign in with web search enabled
- Ask the query
- Document: did the response cite the site? What context?
Perplexity (perplexity.ai):
- Ask the query
- Document citations panel — is the site listed? What rank?
Claude (claude.ai):
- Ask the query with web search prompted
- Document citations in response
Gemini (gemini.google.com):
- Ask the query
- Document sources panel — is the site listed?
Microsoft Copilot (copilot.microsoft.com):
- Ask the query
- Document citations in response
Grok (x.com/i/grok):
- Ask the query
- Document citations
Google AI Overview:
- Search the query in Google
- If AI Overview appears, document if site is cited
Build a citation matrix:
query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes_rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes_inline,yes_rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes_rank5,no,no,no,no,no,2026-04-29
4.2 Citation Frequency Classification
Per topic, classify citation frequency:
- Regularly cited: Cited in 4+ engines for queries in the topic area
- Occasionally cited: Cited in 2-3 engines
- Rarely cited: Cited in 1 engine
- Never cited: Not appearing in any engine for queries the site should win
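A minimal sketch of how a citation matrix could be turned into these labels, assuming the CSV column names shown in Section 4.1 and treating any cell that starts with `yes` as a citation. The function names are hypothetical, and the sketch classifies each query separately; grouping queries into topics is left to the reader.

```python
import csv
import io

# Engine columns as they appear in the Section 4.1 citation matrix
ENGINES = ["chatgpt", "perplexity", "claude", "gemini", "copilot", "grok", "ai_overview"]


def count_citing_engines(row):
    """Count engines whose cell starts with 'yes' (e.g. 'yes', 'yes_rank3', 'yes_inline')."""
    return sum(1 for engine in ENGINES if row[engine].strip().lower().startswith("yes"))


def classify(cited_engines):
    """Map a count of citing engines to the frequency labels used in this section."""
    if cited_engines >= 4:
        return "regularly_cited"
    if cited_engines >= 2:
        return "occasionally_cited"
    if cited_engines == 1:
        return "rarely_cited"
    return "never_cited"


def classify_matrix(csv_text):
    """Return {query: label} for every row of a citation matrix in CSV form."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["query"]: classify(count_citing_engines(row)) for row in reader}


SAMPLE = '''query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes_rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes_inline,yes_rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes_rank5,no,no,no,no,no,2026-04-29
'''

if __name__ == "__main__":
    for query, label in classify_matrix(SAMPLE).items():
        print(f"{label:20} {query}")
```

The label strings match the values accepted by the `*_citation_status` variables in Section 2, so the output can be copied into the intake file.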
4.3 Competitor Citation Analysis
For queries where the site is not cited, document who is:
- Which competitors are cited?
- What do those sources have that this site doesn't?
- Are they Wikipedia entries? Major media? Industry publications? Smaller sites with distinctive content?
This reveals what types of authority each engine prefers for the topic.
4.4 Bot Access Verification
Verify AI engine bots can access the site:
# Check robots.txt for rules that name AI bots (-A 2 also prints the Allow/Disallow lines after each match)
curl -s https://{{domain}}/robots.txt | grep -iE -A 2 "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|cohere-ai"
Major AI bots in 2026:
- GPTBot — OpenAI's training crawler
- OAI-SearchBot — OpenAI's search retrieval bot
- ChatGPT-User — User-triggered fetches
- PerplexityBot — Perplexity's crawler
- ClaudeBot / anthropic-ai — Anthropic's crawlers
- Google-Extended — Google AI training (separate from Googlebot)
- GoogleOther — Google's other crawlers
- CCBot — Common Crawl (used by many LLMs for training)
- Applebot-Extended — Apple Intelligence
- Bytespider — TikTok/ByteDance
- Diffbot — knowledge graph crawler
- FacebookBot / Meta-ExternalAgent — Meta AI
Confirm in server logs that these bots are visiting and getting 200 responses.
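The robots.txt check can also be scripted. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been downloaded. `blocked_bots` and the bot subset are illustrative choices, not part of any engine's tooling. Python's parser applies the first matching rule in file order, which may differ from how a particular crawler resolves conflicting Allow and Disallow lines, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of the bot tokens listed in this section
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
           "ClaudeBot", "Google-Extended", "CCBot"]


def blocked_bots(robots_txt, url="https://example.com/", bots=AI_BOTS):
    """Return the bots that the given robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_bots(SAMPLE_ROBOTS))
```

An empty list means none of the checked bots are blocked for that URL. The result can be recorded in `specific_ai_bots_blocked` from Section 2.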
4.5 Citation Status Summary
After assessment, classify the site's citation posture:
- AI-authoritative — regularly cited across multiple engines on multiple topics
- Citation-emerging — occasionally cited; clear pattern of growing recognition
- Citation-minimal — rarely cited despite topical authority
- Citation-absent — not cited anywhere despite strong organic search performance
The implementation path differs by status.
5. Per-Article AI Citation Implementation
Structural patterns that improve AI citation likelihood.
5.1 Top-of-Article Factual Summary
AI engines often extract from the first part of an article. Lead with substantive content, not throat-clearing.
<article class="ai-citation-optimized">
<header>
<h1>{{ENTITY_OR_TOPIC_AS_HEADLINE}}</h1>
{{BYLINE_WITH_CREDENTIALS}}
{{DATES_INCLUDING_dateModified_PROMINENTLY}}
</header>
<!-- Top-of-article factual summary — 2-4 sentences, definitional and substantive -->
<section class="factual-summary" aria-label="Summary">
<p>{{2_TO_4_SENTENCE_SUMMARY_THAT_DEFINES_THE_TOPIC_OR_ANSWERS_THE_QUERY_DIRECTLY}}</p>
</section>
<!-- Body content -->
<section class="article-body">
{{CONTENT}}
</section>
</article>
The factual summary should:
- Define the topic or directly answer the query
- Use complete sentences (extractable)
- Be factually dense (multiple specific claims)
- Avoid filler ("In this article we'll explore...")
- Match the page's primary entity (see framework-entitysalience.md)
5.2 Q&A Structure for Common Questions
For sections answering specific questions, use explicit Q&A structure:
<section class="article-faq">
<h2>Common Questions</h2>
<h3>What is entity salience?</h3>
<p>Entity salience is a numerical score (0.0 to 1.0) calculated by Google's natural language processing systems that represents how central a specific entity is to a piece of content...</p>
<h3>How is entity salience measured?</h3>
<p>Entity salience is measured by analyzing position, frequency, grammatical role, and co-occurrence patterns within the content...</p>
</section>
Pair with FAQPage schema:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is entity salience?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Entity salience is a numerical score..."
}
},
{
"@type": "Question",
"name": "How is entity salience measured?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Entity salience is measured by..."
}
}
]
}
</script>
Q&A structure aligns directly with how users phrase queries to AI engines, making extraction-and-citation easier.
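One way to keep the visible Q&A and the FAQPage schema from drifting apart is to generate the JSON-LD from the same question and answer pairs that render the HTML. The sketch below assumes the pairs are available as plain-text tuples; `faq_jsonld` and `as_script_tag` are hypothetical helper names.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) tuples."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def as_script_tag(jsonld):
    """Wrap JSON-LD in a script element.

    "</" is escaped as "<\\/" (a valid JSON escape) so answer text
    cannot close the script element early.
    """
    return '<script type="application/ld+json">\n' + jsonld.replace("</", "<\\/") + "\n</script>"


if __name__ == "__main__":
    pairs = [
        ("What is entity salience?", "Entity salience is a numerical score..."),
        ("How is entity salience measured?", "Entity salience is measured by..."),
    ]
    print(as_script_tag(faq_jsonld(pairs)))
```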
5.3 Key Facts Callouts
Highlight factual claims that AI engines might want to cite:
<aside class="key-fact" role="note">
<p><strong>Key fact:</strong> Google's Information Gain patent (US 11,995,114 B2) was granted in May 2024 and describes a system for scoring documents based on novelty contribution.</p>
</aside>
Key facts should:
- State a single specific fact
- Be sourced (link to primary source)
- Be self-contained (extractable without surrounding context)
- Be visually distinguished
5.4 Definitional Clarity
When introducing terms or concepts, use clear definitional patterns AI engines extract well:
Strong: "Entity salience is a numerical score representing how central an entity is to content."
Weak: "When we talk about entity salience, what we're really getting at is how much a piece of content focuses on something specific."
The strong version is a clear "X is Y" definition. The weak version requires multiple inferences.
5.5 First-Paragraph Substantive Answers
For articles answering specific queries, the first paragraph should answer the query substantively. Don't make readers (or AI engines) hunt:
Anti-pattern (delays answer):
"Entity salience has become an increasingly important topic in modern SEO. With the rise of AI engines and the changing nature of search, many practitioners are wondering what this means for their content strategy. In this article, we'll explore the concept of entity salience in depth..."
Better (delivers answer immediately):
"Entity salience is a numerical score (0.0-1.0) calculated by natural language processing systems to determine how central an entity is to a piece of content. Google uses entity salience scoring as a key signal for determining what queries a page should rank for, and AI engines use similar scoring to determine which sources to cite."
5.6 High Factual Density
Articles that pack many specific facts per paragraph are more cite-worthy than articles that pad with opinion or generality.
Lower density: "SEO has changed a lot in recent years. Things that used to work don't anymore. Smart marketers are adapting to the new landscape."
Higher density: "Google's March 2024 core update integrated the Helpful Content System into core ranking. The September 2025 Search Quality Rater Guidelines update added evaluation criteria for AI Overviews. The December 2025 core update specifically targeted mass-produced AI content, with content farms losing 40-80% of organic traffic."
The high-density version is full of specific, citable facts.
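As a rough way to compare drafts, the share of words that carry a number (dates, percentages, counts) can serve as a crude proxy for factual density. This heuristic is an assumption made for illustration only. It is not a metric any AI engine is known to use, and it ignores named entities and whether the facts are true.

```python
import re


def specifics_per_100_words(text):
    """Return how many whitespace-separated words contain a digit, per 100 words."""
    words = re.findall(r"\S+", text)
    if not words:
        return 0.0
    numeric = [word for word in words if re.search(r"\d", word)]
    return 100.0 * len(numeric) / len(words)


LOWER = ("SEO has changed a lot in recent years. Things that used to work "
         "don't anymore. Smart marketers are adapting to the new landscape.")
HIGHER = ("Google's March 2024 core update integrated the Helpful Content "
          "System into core ranking.")

if __name__ == "__main__":
    print(specifics_per_100_words(LOWER))
    print(specifics_per_100_words(HIGHER))
```

A score of zero on a long paragraph is a hint to look for padding. A high score says nothing about accuracy, so it does not replace editorial review.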
5.7 Source Citations Within Content
When making factual claims, link to primary sources inline:
<p>Google's Information Gain patent (<a href="https://patents.google.com/patent/US11995114B2" rel="noopener">US 11,995,114 B2</a>) was granted in May 2024.</p>
This:
- Substantiates the claim
- Models good citation behavior
- Provides AI engines a reference trail
- Builds trust signals
5.8 Article Schema With Detailed Properties
Article schema with detailed properties gives AI engines structured metadata:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"@id": "{{PAGE_URL}}#article",
"headline": "{{TITLE}}",
"description": "{{2_SENTENCE_DESCRIPTION_FOR_CITATION_DISPLAY}}",
"author": {"@id": "{{AUTHOR_PAGE_URL}}#person"},
"datePublished": "{{ISO_PUBLISHED_DATE}}",
"dateModified": "{{ISO_LAST_SUBSTANTIVE_UPDATE_DATE}}",
"publisher": {"@id": "{{DOMAIN}}/#organization"},
"mainEntity": {"@type": "Thing", "name": "{{PRIMARY_TOPIC}}", "sameAs": "{{WIKIDATA_URL}}"},
"about": [
{"@type": "Thing", "name": "{{TOPIC_1}}"},
{"@type": "Thing", "name": "{{TOPIC_2}}"}
],
"citation": [
{
"@type": "CreativeWork",
"name": "{{REFERENCED_WORK_NAME}}",
"url": "{{REFERENCED_URL}}"
}
]
}
</script>
The citation property is particularly valuable — it tells AI engines what authoritative sources this content draws on, signaling rigor.
5.9 Visible Update Information
AI engines weight freshness, but only when freshness is genuine. Display update information visibly:
<div class="article-dates">
<p>
<time datetime="{{PUBLISHED}}">Published {{PUBLISHED_HUMAN}}</time>
·
<time datetime="{{UPDATED}}">Last updated {{UPDATED_HUMAN}}</time>
</p>
</div>
<details class="changelog">
<summary>Article changelog</summary>
<ul>
<li>{{DATE}}: {{SUBSTANTIVE_CHANGE}}</li>
<li>{{DATE}}: {{SUBSTANTIVE_CHANGE}}</li>
</ul>
</details>
This signals genuine freshness and resists the "fake date refresh" pattern AI engines (and Google) increasingly detect.
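A simple way to keep `dateModified` honest is to derive it from the changelog rather than set it by hand. The sketch below assumes the changelog is stored as a list of (date, description) pairs. The function name and data shape are hypothetical.

```python
from datetime import date


def date_modified(published, changelog):
    """Return the ISO dateModified: the latest changelog date, or the publish date if there are no updates."""
    latest = max((entry_date for entry_date, _ in changelog), default=published)
    if latest < published:
        raise ValueError("changelog entry predates publication")
    return latest.isoformat()


if __name__ == "__main__":
    changelog = [
        (date(2025, 2, 1), "Added section on bot access"),
        (date(2025, 3, 2), "Updated engine list"),
    ]
    print(date_modified(date(2025, 1, 10), changelog))
```

With this approach the date shown in the page, the value in the Article schema, and the changelog all come from the same record, so a date cannot move without a logged change.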
6. Site-Wide AI Citation Infrastructure
Beyond per-article patterns, the site needs infrastructure-level AI signals.
6.1 Phase 1: Bot Access Configuration
6.1.1 robots.txt for AI bots
The robots.txt should explicitly allow AI bots the business wants citation from. The 2026 default for most businesses is allowing all major AI bots:
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: GoogleOther
Allow: /
User-agent: CCBot
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Meta-ExternalAgent
Allow: /
User-agent: Diffbot
Allow: /
# Standard search engine bots
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /
Sitemap: https://{{domain}}/sitemap.xml
If specific bots should be blocked (e.g., for content licensing reasons), configure explicitly:
# Example: block training crawlers, allow search/answer crawlers
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Allow: /
Most businesses gain more from being cited than they lose from being trained on. The default should be permissive.
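For sites that manage the allow and block decisions as data, the robots.txt text can be generated instead of edited by hand. This is a minimal sketch. `build_robots_txt` is a hypothetical helper, and the bot names passed in are whatever the business has decided on.

```python
def build_robots_txt(allow, block, sitemap_url):
    """Render robots.txt text with one group per bot, blocked bots first."""
    overlap = set(allow) & set(block)
    if overlap:
        raise ValueError(f"bots listed as both allowed and blocked: {sorted(overlap)}")
    lines = []
    for bot in block:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allow:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Everything not named above falls through to the permissive default group
    lines += ["User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        allow=["OAI-SearchBot", "PerplexityBot", "ClaudeBot"],
        block=["GPTBot"],
        sitemap_url="https://example.com/sitemap.xml",
    ))
```

Rejecting a bot that appears in both lists avoids shipping a file with contradictory groups, which different crawlers may resolve differently.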
6.1.2 Verify bot visits in logs
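# List AI bot requests (client IP, path, status) from the current access log.
# This reads a single file; include rotated logs (access.log.1, ...) to cover a full 30 days.
grep -E "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider" /var/log/nginx/access.log | \
awk '{print $1, $7, $9}' | sort -u | head -50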
If no AI bot visits appear in 30 days for a published site, troubleshoot:
- Confirm robots.txt isn't blocking
- Confirm site is technically reachable
- Confirm no firewall blocking specific bot user agents
- Submit URLs to engine's submission endpoints if available
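To apply the 30-day window precisely and see response codes per bot, the log can be parsed rather than grepped. The sketch below assumes nginx's default "combined" log format and English month abbreviations. The sample lines are synthetic, and `summarize_ai_bot_hits` is a hypothetical name.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Matches nginx "combined" format: ip - user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "CCBot")


def summarize_ai_bot_hits(lines, now, days=30):
    """Count (bot, status) pairs for AI bot requests within the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if when < cutoff:
            continue
        for bot in AI_BOTS:
            if bot in match["agent"]:
                counts[(bot, match["status"])] += 1
                break
    return counts


SAMPLE_LOG = [
    '203.0.113.5 - - [20/Apr/2026:10:15:32 +0000] "GET /framework-eeat/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [21/Apr/2026:08:01:10 +0000] "GET /about/ HTTP/1.1" 403 162 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
]

if __name__ == "__main__":
    now = datetime(2026, 4, 29, tzinfo=timezone.utc)
    for (bot, status), hits in sorted(summarize_ai_bot_hits(SAMPLE_LOG, now).items()):
        print(bot, status, hits)
```

A bot that appears only with 403 or 5xx statuses is reaching the site but being turned away, which points to a firewall or server rule rather than robots.txt. User-agent strings can be spoofed, so a match is a lead to verify and not proof of a genuine crawler.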
6.2 Phase 2: llms.txt File
The llms.txt standard (proposed by Jeremy Howard in late 2024, gaining adoption through 2025-2026) provides AI engines a curated map of a site's most useful content. Build at /llms.txt:
# {{BUSINESS_NAME}}
> {{ONE_PARAGRAPH_DESCRIPTION_OF_THE_BUSINESS_AND_WHAT_THE_SITE_COVERS}}
## Primary Documentation
- [About {{BUSINESS_NAME}}](https://{{domain}}/about/): Comprehensive description of the business, its founder, and its mission
- [Services Overview](https://{{domain}}/services/): What we offer and who we serve
- [Editorial Policy](https://{{domain}}/editorial-policy/): How we create and review content
- [Disclosures](https://{{domain}}/disclosure/): AI use, advertising, and conflict-of-interest disclosures
## Core Topics We Cover
- [Web Development](https://{{domain}}/topics/web-development/): {{ONE_LINE_TOPIC_DESCRIPTION}}
- [SEO and AI Search Optimization](https://{{domain}}/topics/seo/): {{DESCRIPTION}}
- [Computer Repair](https://{{domain}}/topics/computer-repair/): {{DESCRIPTION}}
## Foundational Frameworks
- [E-E-A-T Framework](https://{{domain}}/framework-eeat/): Comprehensive guide to demonstrating Experience, Expertise, Authoritativeness, and Trustworthiness
- [YMYL Framework](https://{{domain}}/framework-ymyl/): Standards for Your Money or Your Life content
- [Helpful Content System](https://{{domain}}/framework-hcs/): People-first content guidance
- [Information Gain Framework](https://{{domain}}/framework-infogain/): Original contribution principles
- [Knowledge Graph Framework](https://{{domain}}/framework-knowledgegraph/): Entity establishment and Wikidata strategy
- [AI Citations Framework](https://{{domain}}/framework-aicitations/): How to optimize for AI engine citations
## Author / Founder
- [Joseph Anady's Profile](https://{{domain}}/about/joseph-anady/): Founder background, credentials, and topical expertise
## Contact
For inquiries, contact joseph.w.anady@icloud.com or call 505-512-3662.
The llms.txt is a hand-curated guide. It tells AI engines: "If you want to understand our site, here are the canonical entry points."
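If the curated entry points live in a CMS or data file, the llms.txt text can be rendered from them. This is a sketch under that assumption. `build_llms_txt` and the shape of `sections` are hypothetical, and the output follows the layout of the template above.

```python
def build_llms_txt(name, summary, sections):
    """Render llms.txt text.

    sections: {heading: [(title, url, description), ...]} in display order.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co builds websites and publishes SEO frameworks.",
        {
            "Primary Documentation": [
                ("About Example Co", "https://example.com/about/", "Who we are"),
                ("Services Overview", "https://example.com/services/", "What we offer"),
            ],
        },
    ))
```

Generating the file keeps it in step with the pages it points to, but the choice of which pages to list should stay a manual, editorial decision.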
6.3 Phase 3: llms-full.txt
For longer-form crawl assistance, build /llms-full.txt with the actual full text of the most important pages concatenated:
# {{BUSINESS_NAME}} — Comprehensive Documentation
[Full text of About page]
---
[Full text of Editorial Policy]
---
[Full text of primary service descriptions]
---
[Full text of foundational framework documents]
---
[Full text of author bios for primary authors]
This is a larger file but gives AI engines a clean text representation of the site's most important content without HTML/JS interference.
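A minimal sketch of assembling the file from rendered pages, using only the standard library. It assumes each page's HTML is already available as a string. The `## title` heading above each block is an addition made here for readability, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text outside script/style, with whitespace collapsed
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.parts)


def build_llms_full(name, pages):
    """pages: list of (title, html) tuples in the order they should appear."""
    blocks = [f"## {title}\n\n{html_to_text(html)}" for title, html in pages]
    return f"# {name}: Comprehensive Documentation\n\n" + "\n\n---\n\n".join(blocks) + "\n"
```

The extractor is deliberately simple. Navigation, footers, and other repeated page chrome would need to be removed before or after extraction for the output to stay clean.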
6.4 Phase 4: Authority Infrastructure for AI
Apply the foundational frameworks for AI authority:
- E-E-A-T strong (framework-eeat.md): credentialed authors, comprehensive bios, organizational trust signals
- Knowledge Graph established (framework-knowledgegraph.md): Wikidata entry, Wikipedia article if notable, claimed Knowledge Panel
- Entity Salience strong (framework-entitysalience.md): primary entities clearly marked on every page
- Information Gain demonstrated (framework-infogain.md): original research, first-hand experience, novel contributions
- YMYL standards met if applicable (framework-ymyl.md)
These foundational frameworks compound — sites that score highly across multiple frameworks are dramatically more likely to be cited by AI engines than sites that excel in only one.
6.5 Phase 5: AI-Specific Schema Patterns
Beyond standard schema, AI-specific patterns help:
Definitional schema for primary terms:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "DefinedTerm",
"name": "Entity Salience",
"description": "A numerical score representing how central a specific entity is to a piece of content...",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"name": "Search and AI Optimization Glossary",
"url": "{{GLOSSARY_URL}}"
}
}
</script>
Fact schema for specific facts:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Claim",
"text": "Google's Information Gain patent was granted May 2024",
"claimInterpreter": {"@type": "Organization", "name": "{{BUSINESS_NAME}}"},
"appearance": "{{PAGE_URL}}#fact-information-gain-patent-date"
}
</script>
HowTo schema for procedural content:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "{{PROCEDURE_NAME}}",
"step": [
{"@type": "HowToStep", "name": "{{STEP_1_NAME}}", "text": "{{STEP_1_TEXT}}"},
{"@type": "HowToStep", "name": "{{STEP_2_NAME}}", "text": "{{STEP_2_TEXT}}"}
]
}
</script>
6.6 Phase 6: Freshness Strategy
AI engines prefer fresh content for time-sensitive topics. Maintain freshness genuinely:
- Time-sensitive content reviewed and refreshed on cadence (see framework-hcs.md Section 6.6, refresh strategy)
- dateModified accurately reflects substantive updates
- Article changelog visible (Section 5.9 above)
- Time-sensitive sections marked with specific dates: "As of {{MONTH_YEAR}}"
Don't fake freshness. AI engines (especially Perplexity and ChatGPT's web search) are increasingly detecting fake refresh patterns.
6.7 Phase 7: Engine-Specific Considerations
ChatGPT/OpenAI:
- Prefers factually dense content
- Surfaces sources via "[Search]" mode
- Cites in different presentation styles depending on query type
- Prefers content from established domains
- Open to user submission via web search (no specific submission API)
Perplexity:
- Most aggressive about citing sources prominently
- Uses real-time web retrieval
- Heavy weight on freshness for time-sensitive queries
- Submission via Perplexity for Publishers program (paid)
- High volume of citations per response (often 5-10 sources)
Claude:
- Citations tied to web search activations
- Prefers structured, dense content
- Heavy E-E-A-T weighting
- Strong handling of YMYL — credentialed sources strongly preferred
Gemini:
- Direct integration with Google's index
- Strong Knowledge Graph weighting
- Prefers entities recognized by Google KG
- Sources panel less prominent than Perplexity
Microsoft Copilot:
- Bing-indexed sources preferred
- Bing Webmaster Tools submissions help
- Heavy weighting toward Bing's authority signals
Grok:
- Heavy preference for X (Twitter) sources
- Real-time information weighted higher
- Web sources cited but less prominently than X content
- Authority signals less developed than other engines
Google AI Overviews:
- Sources from Google search index
- Strong correlation with featured snippet/top-ranked pages
- Knowledge Graph entities preferred
- Limited new mechanism — fundamentally an evolution of Google search
Optimize for the engines that drive your audience.
7. Tracking AI Citation Status
Continuous tracking is essential because AI engine behavior changes rapidly.
7.1 Manual Tracking Protocol
For solo practitioners or small teams without budget for tracking tools:
- Maintain a query set document (10-30 queries the site should be cited for)
- Quarterly: test each query in each engine
- Document citation status in a tracking spreadsheet
- Note changes from previous quarter
- Investigate regressions
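The quarter-over-quarter comparison can be automated once each quarter's results are stored in a consistent shape. The sketch below assumes each snapshot is a dictionary of `{query: {engine: cited}}` with boolean values. `find_changes` is a hypothetical name.

```python
def find_changes(previous, current):
    """Compare two {query: {engine: cited_bool}} snapshots.

    Returns (regressions, gains) as sorted lists of (query, engine) pairs.
    Queries or engines missing from the previous snapshot count as not cited.
    """
    regressions, gains = [], []
    for query, engines in current.items():
        before = previous.get(query, {})
        for engine, cited in engines.items():
            was_cited = before.get(engine, False)
            if was_cited and not cited:
                regressions.append((query, engine))
            elif cited and not was_cited:
                gains.append((query, engine))
    return sorted(regressions), sorted(gains)


if __name__ == "__main__":
    q1 = {"E-E-A-T explained": {"chatgpt": True, "perplexity": True, "claude": False}}
    q2 = {"E-E-A-T explained": {"chatgpt": False, "perplexity": True, "claude": True}}
    regressions, gains = find_changes(q1, q2)
    print("Regressions:", regressions)
    print("Gains:", gains)
```

AI engine answers vary between runs, so a single lost citation may be noise. Re-testing a flagged query a few times before investigating is a reasonable precaution.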
7.2 Tool-Based Tracking
For ongoing tracking:
- Profound (tryprofound.com): comprehensive AI citation tracking
- Otterly.ai: tracks AI engine mentions
- Athena HQ: AI search analytics
- BrightEdge AI Catalyst: enterprise AI search tracking
- Semrush AI Toolkit: AI Overview tracking
These tools automate query testing across engines and report citation status over time.
7.3 Server Log Analysis
Server logs show AI bot activity. Track:
- Which AI bots visit and how often
- Which pages they visit most
- Whether visits correlate with visible citation activity
- Whether bot access patterns change over time
# AI bot activity summary: request count per bot in the previous rotated log
# (covers a full month only if logs rotate monthly)
grep -oE 'GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|CCBot|Applebot-Extended' \
/var/log/nginx/access.log.1 | \
sort | uniq -c | sort -rn
7.4 Citation Change Investigation
When citation status changes:
Sudden gain: Document what changed. Was new content published? Was an external citation acquired? Did a Wikipedia article appear? Understanding the trigger informs strategy.
Sudden loss: Investigate. Was content removed or moved? Did robots.txt change? Did dateModified pattern change? Did competitors gain stronger authority?
Engine-specific change: Check engine's recent product announcements. Engine ranking algorithm updates affect citation patterns.
8. Common Mistakes & Anti-Patterns
8.1 Blocking AI Bots in robots.txt Without Strategy
Anti-pattern: Blanket-blocking all AI bots out of vague concern about content training.
Why it fails: Eliminates citation possibility entirely. Lose visibility, lose traffic, lose authority compounding.
Fix: Permissive default. Block specifically only if business reason is concrete (licensed content, regulatory requirement, etc.).
8.2 No llms.txt File
Anti-pattern: Site has substantial content but no curated guide for AI engines.
Why it fails: AI engines have to figure out the site from raw crawl. Content prioritization left to engine's discretion.
Fix: Build comprehensive llms.txt directing engines to canonical content.
8.3 Padded Content with Low Factual Density
Anti-pattern: Articles padded with opinion, generality, throat-clearing — but few specific facts.
Why it fails: AI engines prefer factually dense content. Padded content gets passed over.
Fix: Cut padding. Pack specific, citable facts. If you don't have facts, you don't have an article.
8.4 Buried Answers
Anti-pattern: The answer to the page's primary question is in section 4 of an 8-section article.
Why it fails: AI engines tend to extract from early content. Buried answers are less likely to be extracted.
Fix: Direct answers in opening paragraphs. Elaboration follows.
8.5 No Knowledge Graph Presence
Anti-pattern: Site has good content but no Wikidata entry, no Knowledge Panel, no entity authority infrastructure.
Why it fails: AI engines weight Knowledge Graph presence heavily. Sites without it are cited less.
Fix: Build Knowledge Graph foundation per framework-knowledgegraph.md.
8.6 Fake Date Refresh
Anti-pattern: Updating dateModified without substantive content updates to look fresh.
Why it fails: AI engines (and Google) increasingly detect fake refreshes, and being caught damages trust.
Fix: Refresh only on substantive updates. Use changelog to demonstrate genuine refresh history.
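One way to enforce this in a publishing pipeline is to bump dateModified only when the text has changed by more than a small amount. The sketch below assumes a word-level diff and an arbitrary 5% threshold; both are illustrative choices, not values from the framework, and the helper names are hypothetical.

```python
import difflib
import re

def normalize(text):
    """Collapse whitespace and case so formatting-only edits do not count."""
    return re.sub(r"\s+", " ", text).strip().lower()

def is_substantive_update(old_text, new_text, min_changed_ratio=0.05):
    """True when at least min_changed_ratio of the word sequence differs (assumed threshold)."""
    old_words = normalize(old_text).split()
    new_words = normalize(new_text).split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return (1.0 - matcher.ratio()) >= min_changed_ratio

def next_date_modified(old_text, new_text, current_date, today):
    """Keep the existing dateModified unless the update is substantive."""
    return today if is_substantive_update(old_text, new_text) else current_date

old = "AI engines cite fresh content. Updated guidance follows."
reflowed = "AI engines cite   fresh content.\nUpdated guidance follows."
expanded = old + " In April 2026 the tracked query set grew from 40 to 65 queries."

print(next_date_modified(old, reflowed, "2026-01-10", "2026-04-29"))
print(next_date_modified(old, expanded, "2026-01-10", "2026-04-29"))
```

Logging each accepted bump with a one-line reason doubles as the changelog the fix calls for.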
8.7 Generic Author Bylines
Anti-pattern: "By the editorial team" or "By staff writer" or no byline at all.
Why it fails: AI engines, particularly for YMYL content, weight credentialed authorship heavily. Anonymous content is cited less.
Fix: Real authors with real credentials and Person schema.
8.8 No Schema or Minimal Schema
Anti-pattern: Site relies entirely on HTML structure with no JSON-LD schema.
Why it fails: Schema gives AI engines structured metadata for extraction. Missing it means engines work harder to understand the page.
Fix: Comprehensive schema per page type per framework-eeat.md and framework-entitysalience.md.
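As a rough illustration of the last two fixes together, the sketch below builds Article JSON-LD with a credentialed Person author using standard schema.org properties. The author, URLs, dates, and the `article_jsonld` helper are hypothetical; the full property set for each page type belongs to the referenced framework documents.

```python
import json

def article_jsonld(headline, author_name, job_title, author_profiles, published, modified):
    """Build Article JSON-LD with a Person author (schema.org vocabulary)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,
        "dateModified": modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": job_title,
            "sameAs": author_profiles,  # profile URLs that identify the same person
        },
    }

# Hypothetical values for illustration only.
schema = article_jsonld(
    headline="How LLMs Choose Sources to Cite",
    author_name="Jane Example",
    job_title="Senior SEO Engineer",
    author_profiles=["https://www.linkedin.com/in/jane-example"],
    published="2026-01-10",
    modified="2026-04-29",
)
# Escape "</" so a stray "</script>" inside a string cannot close the tag early.
payload = json.dumps(schema, indent=2).replace("</", "<\\/")
snippet = '<script type="application/ld+json">\n' + payload + "\n</script>"
print(snippet)
```

Generating the block from the same data that renders the visible byline keeps the schema and the page from drifting apart.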
8.9 Content Behind JavaScript Walls
Anti-pattern: Content rendered entirely by JavaScript without server-side rendering or pre-rendering.
Why it fails: Many AI bots don't execute JavaScript. Content that is not in the delivered HTML is invisible to them.
Fix: Server-side rendering, static generation, or hybrid approaches that deliver content in HTML.
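A quick way to test a page is to check whether a key sentence is present in the raw HTML with script and style content ignored, which approximates what a non-rendering bot sees. This is a sketch using only the standard library; the two sample documents and the `text_without_js` helper are illustrative.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def text_without_js(html):
    """Return the text a client sees when it does not execute JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

key_fact = "GPTBot is a crawler operated by OpenAI."
ssr_html = (
    "<html><head><title>AI bots</title><style>p{margin:0}</style></head>"
    "<body><h1>AI bots</h1><p>GPTBot is a crawler operated by OpenAI.</p></body></html>"
)
spa_html = (
    '<html><body><div id="root"></div>'
    '<script>render("GPTBot is a crawler operated by OpenAI.")</script></body></html>'
)
print(key_fact in text_without_js(ssr_html))  # server-rendered page
print(key_fact in text_without_js(spa_html))  # client-rendered shell
```

In a real check the HTML would come from a plain HTTP fetch of the deployed URL, with no headless browser involved.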
8.10 Information Gain Absent
Anti-pattern: Pages aggregate and rephrase existing content with no original contribution.
Why it fails: AI engines synthesizing from sources prefer sources adding novel information. Aggregator content gets passed over.
Fix: Add Information Gain to every article per framework-infogain.md.
8.11 Engine-Agnostic Approach
Anti-pattern: Treating all AI engines the same. Optimizing identically for ChatGPT and Grok.
Why it fails: Engines weight signals differently. Maximum citation requires engine-aware optimization.
Fix: Track per-engine status. Optimize patterns where engine signals diverge.
8.12 No Tracking, No Learning
Anti-pattern: Hoping for AI citations without tracking whether they're happening.
Why it fails: Without data there is no improvement loop, and strategies that don't work persist.
Fix: Tracked query set. Quarterly assessment minimum. Tools where budget allows.
9. Stack-Specific Notes
9.1 WordPress
- Yoast or Rank Math handles much of the schema
- Custom fields for AI-specific metadata: factual summary, key facts, definitional content
- WP plugin for llms.txt generation
- Editorial workflow includes AI citation optimization checklist
9.2 Next.js / Astro / Hugo
- Static generation ensures content is in HTML for bots
- Structured frontmatter carries the required AI-relevant metadata
- llms.txt generated at build time from content collections
- Schema generators automate JSON-LD creation
9.3 Universal
- Pre-publish checklist includes AI citation patterns
- Top-of-article factual summary required
- Schema validation in CI/CD
- llms.txt regenerated on content changes
- Bot access verified post-deploy
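The "schema validation in CI/CD" item can start as a small script that extracts JSON-LD from built pages and fails the build when required properties are missing. A minimal sketch follows; the `REQUIRED` policy and helper names are assumptions, and a real pipeline would likely add a fuller validator.

```python
import json
from html.parser import HTMLParser

# Hypothetical house policy: properties each schema type must carry.
REQUIRED = {"Article": ["headline", "author", "datePublished", "dateModified"]}

class JsonLdExtractor(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

def missing_schema_properties(html):
    """Return a list of problems; an empty list means the page passes."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems = []
    for raw in extractor.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            problems.append("invalid JSON-LD block")
            continue
        for item in data if isinstance(data, list) else [data]:
            schema_type = item.get("@type") if isinstance(item, dict) else None
            if not isinstance(schema_type, str):
                continue  # skip untyped or multi-typed items in this sketch
            for prop in REQUIRED.get(schema_type, []):
                if prop not in item:
                    problems.append(f"{schema_type} missing {prop}")
    return problems

page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "Demo", '
    '"author": {"@type": "Person", "name": "Jane Example"}, "datePublished": "2026-01-10"}'
    "</script></head><body></body></html>"
)
print(missing_schema_properties(page))
```

A CI step would run this over the build output and exit non-zero when any page returns problems.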
10. Cross-Reference to the 14-Tier Framework
AI Citation implementation touches:
- Tier 3 LLMO — LLM Optimization is foundational AI citation work
- Tier 3 SGA — SearchGPT Optimization
- Tier 3 GEO — Generative Engine Optimization
- Tier 3 AEO — Answer Engine Optimization
- Tier 3 BLF — Bot/LLM File optimization (llms.txt)
- Tier 3 AIO — Direct AI optimization
- Tier 3 EEO — Entity Engine Optimization
- Tier 3 KGO — Knowledge Graph (foundational for AI)
The Tier 3 (AI Domination) cluster of the 14-tier framework is essentially a tactical implementation of this framework's principles. AI Citation is the strategic frame; Tier 3 items are the specific deliverables.
11. Audit Mode
11.1 Per-Engine Citation Audit
For tracked query set, score per engine:
| Engine | Queries Cited For | % of Tracked Set | Citation Quality |
|---|---|---|---|
| ChatGPT | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Perplexity | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Claude | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Gemini | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Copilot | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Grok | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| AI Overview | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
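The count and percentage columns can be filled from raw test results. A small sketch, assuming each test is recorded as an (engine, query, cited) tuple; the sample results and the `citation_rates` helper are hypothetical.

```python
from collections import defaultdict

def citation_rates(results):
    """Map engine -> (queries cited for, percent of tracked set) from (engine, query, cited) rows."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for engine, _query, was_cited in results:
        total[engine] += 1
        cited[engine] += int(was_cited)
    return {engine: (cited[engine], round(100 * cited[engine] / total[engine], 1)) for engine in total}

# Hypothetical results for a three-query tracked set.
results = [
    ("ChatGPT", "what is llms.txt", True),
    ("ChatGPT", "ai citation audit", True),
    ("ChatGPT", "how llms choose sources", False),
    ("Perplexity", "what is llms.txt", False),
    ("Perplexity", "ai citation audit", True),
    ("Perplexity", "how llms choose sources", False),
]
rates = citation_rates(results)
for engine, (count, percent) in rates.items():
    print(f"| {engine} | {count} | {percent}% |")
```

Citation quality stays a manual judgment in this sketch, since it depends on how prominently and accurately the engine used the source.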
11.2 Per-Page AI Optimization Audit
For sample pages, score:
| # | Criterion | Pass/Fail |
|---|---|---|
| AI1 | Top-of-article factual summary present | |
| AI2 | First paragraph delivers substantive answer | |
| AI3 | High factual density throughout | |
| AI4 | Q&A structure used where applicable | |
| AI5 | Key facts callouts present | |
| AI6 | Definitional clarity for terms | |
| AI7 | Source citations linked inline | |
| AI8 | Article schema with all major properties | |
| AI9 | mainEntity declared with sameAs | |
| AI10 | dateModified is genuine (not faked) | |
| AI11 | Author with credentials (Person schema) | |
| AI12 | FAQPage schema if Q&A content | |
Maximum per-page score: 12. World-class AI citation page: 11+/12.
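The scoring rule can be applied mechanically once each criterion is recorded as pass or fail. A minimal sketch, with the criterion IDs taken from the table above and the `score_page` helper and sample results assumed for illustration:

```python
PAGE_CRITERIA = [f"AI{n}" for n in range(1, 13)]  # AI1 through AI12
WORLD_CLASS_THRESHOLD = 11

def score_page(results):
    """results maps criterion id -> True (pass) or False (fail); missing ids count as fail."""
    failed = [criterion for criterion in PAGE_CRITERIA if not results.get(criterion, False)]
    score = len(PAGE_CRITERIA) - len(failed)
    return score, score >= WORLD_CLASS_THRESHOLD, failed

# Hypothetical audit of one page: everything passes except AI9.
page_results = {criterion: True for criterion in PAGE_CRITERIA}
page_results["AI9"] = False

score, world_class, failed = score_page(page_results)
print(f"{score}/12, world-class: {world_class}, failed: {failed}")
```

Returning the failed IDs alongside the score gives the remediation list for the page without a second pass.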
11.3 Site-Wide AI Citation Audit
| # | Criterion | Pass/Fail |
|---|---|---|
| AIS1 | robots.txt allows all major AI bots | |
| AIS2 | AI bot visits verified in server logs | |
| AIS3 | llms.txt file present at /llms.txt | |
| AIS4 | llms-full.txt present (optional but valuable) | |
| AIS5 | Knowledge Graph foundation established | |
| AIS6 | E-E-A-T score high (110+/130) | |
| AIS7 | Entity Salience strong on primary topics | |
| AIS8 | Information Gain demonstrable | |
| AIS9 | YMYL standards met if applicable | |
| AIS10 | Tracking infrastructure in place | |
| AIS11 | Per-engine citation status known | |
| AIS12 | Quarterly audit cadence active | |
Maximum site score: 12. World-class AI citation site: 11+/12.
12. Maintenance Schedule
12.1 Weekly
- Verify AI bot visits in server logs
- Spot-check 1-2 tracked queries on primary engines
- Monitor for AI engine product announcements affecting citation behavior
12.2 Monthly
- Update llms.txt if site content has expanded
- Review newly published content for AI citation patterns
- Test 5 tracked queries across all engines
12.3 Quarterly
- Full tracked query set test across all engines
- Document citation status changes from previous quarter
- Investigate gains and losses
- Audit AI bot access in robots.txt
- Refresh time-sensitive content
- Check schema validation across primary pages
12.4 Annually
- Comprehensive AI Citation framework audit
- Strategic review of citation status across all engines
- Update tracked query set to reflect current authority
- Review engine-specific signal weights based on year's observed patterns
- Update llms.txt comprehensively
- Update Knowledge Graph entries with year's developments
12.5 On Major AI Engine Updates
When OpenAI, Perplexity, Anthropic, Google, etc. announce major updates:
- Read the announcement in detail
- Identify changes affecting citation behavior
- Test citation status on affected query types
- Adjust optimization patterns if engine signal weights shifted
- Update this framework document if patterns suggest framework gaps
13. Implementation/Audit Report Templates
13.1 AI Citation Implementation Report Template
# AI Citation Framework Implementation Report
**Site**: {{BUSINESS_NAME}}
**Implementation Date**: {{TODAY}}
## Summary
- Pages with AI citation patterns installed: {{COUNT}}
- robots.txt configured for AI bots: {{STATUS}}
- llms.txt created: {{STATUS}}
- llms-full.txt created: {{STATUS}}
- Knowledge Graph foundation: {{STATUS}}
## AI Citation Patterns Installed
- Top-of-article factual summaries: {{COUNT}}
- Q&A structures: {{COUNT}}
- Key facts callouts: {{COUNT}}
- FAQPage schemas: {{COUNT}}
- Article schemas with citation properties: {{COUNT}}
## Bot Access Verification
- GPTBot visits in last 30 days: {{COUNT}}
- PerplexityBot visits: {{COUNT}}
- ClaudeBot visits: {{COUNT}}
- Google-Extended allowed in robots.txt (control token, leaves no log entries): {{YES/NO}}
- CCBot visits: {{COUNT}}
## Tracking Infrastructure
- Tracked query set: {{COUNT}} queries
- Tool: {{TOOL_NAME}}
- Baseline citation status documented: {{YES/NO}}
## Sign-Off
13.2 AI Citation Audit Report Template
# AI Citation Framework Audit Report
**Site**: {{BUSINESS_NAME}}
**Audit Date**: {{TODAY}}
## Executive Summary
{{ONE_PARAGRAPH_ASSESSMENT}}
## Site-Wide AI Citation Score
{{X}}/12
## Per-Engine Citation Status
{{TABLE_OF_ENGINES_AND_CITATION_FREQUENCIES}}
## Per-Page Audit (Sample)
{{TABLE_OF_SAMPLED_PAGES_WITH_AI_CITATION_SCORES}}
## Foundation Framework Status
- E-E-A-T: {{SCORE}}/130
- Knowledge Graph: {{STATUS}}
- Entity Salience: {{STATUS}}
- Information Gain: {{STATUS}}
- YMYL: {{STATUS}}
## Bot Access Status
{{ROBOTS_TXT_AND_LOG_FINDINGS}}
## llms.txt Status
{{ASSESSMENT}}
## Critical Failures
{{LIST_WITH_REMEDIATION}}
## Engine-Specific Findings
- ChatGPT: {{FINDINGS}}
- Perplexity: {{FINDINGS}}
- Claude: {{FINDINGS}}
- Gemini: {{FINDINGS}}
- Copilot: {{FINDINGS}}
- Grok: {{FINDINGS}}
- AI Overview: {{FINDINGS}}
## Recommended Remediation Order
{{PRIORITIZED_LIST}}
## Tracked Query Status Trend
{{COMPARISON_TO_PREVIOUS_AUDITS}}
## Sign-Off
End of Framework Document
Document version: 1.0
Last updated: 2026-04-29
Maintained by: ThatDeveloperGuy
AI Citation is the new center of gravity for web visibility. Sites that earn citations across multiple AI engines compound authority across the entire AI search ecosystem. Sites that don't optimize for AI citation lose visibility as user behavior shifts toward AI-first information seeking.
The work is structural and methodical. Allow bots. Build the llms.txt. Engineer content for extractability. Establish the Knowledge Graph foundation. Demonstrate Information Gain. Maintain freshness. Track citation status. Iterate.
The frameworks in this library (E-E-A-T, YMYL, HCS, SQRG, Core Updates, Information Gain, Entity Salience, Knowledge Graph, AI Citations) converge on one operational truth: build a site that genuinely deserves to be cited as an authority on the topics it covers. Every framework approaches that truth from a different angle. Together they specify what "deserving" means in the 2026 search and AI ecosystem.
Companion documents:
- framework-eeat.md: Foundational E-E-A-T
- framework-ymyl.md: Your Money or Your Life elevated standards
- framework-hcs.md: Helpful Content System
- framework-sqrg.md: Search Quality Rater Guidelines
- framework-coreupdates.md: Google Core Updates
- framework-infogain.md: Information Gain
- framework-entitysalience.md: Entity Salience
- framework-knowledgegraph.md: Knowledge Graph
About this framework library
This article is the Dev.to republish of a framework reference document from ThatDevPro's SEO + AI engineering library. Canonical source: https://www.thatdevpro.com/insights/framework-aicitations/
ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio operating from Cassville, Missouri. The studio runs the full 14-tier Engine Optimization stack and ships open-source tooling for AI citation engineering.
Companion 14-tier Engine Optimization stack (each tier is its own article):
- Tier 1 — Foundation
- Tier 2 — Search Visibility
- Tier 3 — AI Domination
- Tier 4 — Entity and Authority
- Tier 5 — Local Domination
- Tier 6 — Content and Multimedia
- Tier 7 — Social and Community
- Tier 8 — Data, Analytics, Conversion
- Tier 9 — Monitoring and Intelligence
- Tier 10 — Workflow and Operations
- Tier 11 — Marketplace and Retail
- Tier 12 — International
- Tier 14 — Advanced and Immersive
Need this framework implemented on your site? See the Engine Optimization service or hire through ThatDevPro contact.