Chudi Nnorukam

Originally published at chudi.dev

Answer Engine Optimization (AEO): 6 Factors for AI Citations


My site ranked on Google. I had schema, fast load times, and decent backlinks. Then I searched my own main topic in Perplexity. My site wasn't cited once. My competitors, including one with a domain rating under 10, were showing up in every answer. I was invisible. That audit changed how I think about content structure entirely.

Answer Engine Optimization (AEO) is the practice of structuring content so AI answer engines can extract, trust, and cite it. It prioritizes crawl access, clear definitions, and machine-readable structure over classic link-based ranking signals. If SEO gets you into search results, AEO gets you into the answer itself.

TL;DR

AEO (Answer Engine Optimization) is optimizing your content to be found and cited by AI search engines like Perplexity, Claude, ChatGPT, and Google's AI Overview, not just ranked in traditional Google Search. The core insight: Google ranks pages by popularity; AI engines cite pages by extractability. Those are different problems requiring different solutions.

  • 60% of indie creators' sites are invisible to AI crawlers
  • robots.txt allows every crawler by default, including AI crawlers, yet many sites block them anyway, often without realizing it
  • Content that ranks on Google doesn't automatically appear in AI search results
  • AEO is simpler than SEO: fewer competitors, clearer rules, higher ROI

What Is AEO and Why Does It Matter?

Answer Engine Optimization is the practice of structuring content so AI systems like Perplexity, ChatGPT, and Claude can extract, trust, and cite it. It matters because ranking on Google no longer guarantees visibility in AI-generated answers. A separate optimization layer is now required for that.

The Problem: You're Invisible to AI

Ranking on Google does not mean you appear in AI-generated answers. Approximately 60% of indie creator sites are entirely blocked to AI crawlers, either by accident or outdated robots.txt defaults. Meanwhile, AI engines like Perplexity are already answering your audience's questions with your competitors' content.

You probably optimized your site for Google Search in 2024. Good job. But Perplexity, Claude, ChatGPT, and Microsoft Copilot are answering questions from your competitors' content instead of yours.

Here's why:

Google vs AI Search

Google Search: "Show me the 10 best pages matching my query"

  • Your meta description and title matter
  • Backlinks prove authority
  • Domain age signals trust

AI Search: "Synthesize an answer from multiple sources, cite them, move on"

  • Your content is extracted, not ranked
  • Meta descriptions are ignored (not shown to users)
  • Titles matter less than content quality
  • Backlinks don't matter at all

AI engines ask: "Is this content accurate, specific, and extractable?"

Google asks: "Is this content popular and authoritative?"

These are not the same thing.


2026 Platform Data: How AI Engines Actually Cite

Perplexity cites roughly 6.6 sources per answer; ChatGPT cites only 2.6. Only 12% of URLs cited by AI engines appear in Google's top 10. These numbers mean your Google ranking strategy and your AI citation strategy are nearly independent problems that require separate solutions.

Research published in early 2026 reveals specific citation behaviors across platforms:

Citation Volume by Platform

| Platform | Citations Per Answer | Primary Source Preference |
| --- | --- | --- |
| Perplexity | ~6.6 | Reddit (46.7% of top sources) |
| Google Gemini | ~6.1 | Google top 10 results (76% overlap) |
| ChatGPT | ~2.6 | Wikipedia (7.8% of all citations) |

Key Findings

  • Only 12% of URLs cited by LLMs appear in Google's top 10. AI citation and Google ranking are largely separate systems, except for Google AI Overviews (76% overlap with traditional results).
  • Pages with dateModified schema receive 1.8x more citations. Freshness signals matter, but only when backed by substantive content updates. Bumping dates without changing content triggers penalty signals.
  • Inline statistics increase citations by 40%+. Pages containing specific numbers, data tables, and original research are significantly more likely to be cited as sources.
  • 95% of ChatGPT citations come from recently published or updated content. Content older than 10 months without updates gets deprioritized.
  • Reddit dominates Perplexity's source pool. If you want Perplexity citations, genuine Reddit engagement with your expertise topics matters more than on-site optimization alone.

Original Benchmark Data: The Visibility-Citation Gap

We ran the AI Visibility Readiness framework on 7 websites and measured both visibility (brand mentioned) and citability (URL linked) across AI platforms. The results quantify the gap between being known and being cited:

| Site | DA | AI Visible | AI Cited | Gap |
| --- | --- | --- | --- | --- |
| ahrefs.com | 92 | 100% | 5% | 95 pts |
| citability.dev | Under 10 | 44% | 15% | 29 pts |
| chudi.dev | 28 | 25% | 0% | 25 pts |

The finding: in this sample, domain authority did not predict AI citation rates. citability.dev (DA under 10) achieved 3x the citation rate of Ahrefs (DA 92). The differentiator was original benchmark data and answer-first content structure, not backlinks. Full audit methodology and results in I Audited 7 Websites for AI Citability.

These numbers reframe the AEO strategy: platform-specific optimization outperforms universal approaches. You can now audit your own site's AI readiness with tools like citability.dev.


How Do AI Engines Decide What Content to Cite?

AI engines prioritize content that is crawlable, clearly structured, and directly answers a specific question. Unlike Google, which weighs backlinks and engagement, AI systems evaluate accuracy, extractability, and metadata completeness. Technical access and content format are the primary ranking levers.

The 6 AEO Factors

The six factors that determine whether AI engines cite your content are: crawler access, llms.txt, structured data schema, content extractability, metadata completeness, and answer-ready format. Unlike SEO's 200+ signals, these six cover the full decision stack AI engines use to select and extract sources.

If SEO has 200+ ranking factors, AEO has 6 critical ones:

1. AI Crawler Access

First, your site needs to be crawlable by AI bots. Google's robots.txt documentation covers the standard. AI crawlers follow the same protocol using their own user-agent strings. Check your robots.txt:

User-agent: *
Disallow: /admin
Disallow: /private

# AI Crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

The stat: 60% of sites do at least one of the following:

  • Block all crawlers with Disallow: /
  • Send X-Robots-Tag: noai headers
  • Have never reviewed these settings and rely on whatever their platform's robots.txt defaults to

If you block AI crawlers, you're invisible.
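If you'd rather check this programmatically than read the file by eye, here is a minimal sketch using Python's standard-library urllib.robotparser. The function name check_ai_access and the sample rules are my own illustration, and the result reflects that parser's matching logic, which individual crawlers may not follow exactly.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article; extend the list as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def check_ai_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {agent: allowed} for each AI crawler token, given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


# Hypothetical robots.txt: everything blocked by default, only GPTBot allowed.
sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

for agent, allowed in check_ai_access(sample).items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Paste your own robots.txt into sample to see which of the listed crawlers fall through to a blanket Disallow rule.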

2. llms.txt (Robots.txt for AI)

You know about robots.txt. Now there's llms.txt. I wrote a full implementation guide in llms.txt: Robots.txt for AI Crawlers.

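llms.txt is a proposed, human-readable text file that tells AI systems which of your content matters and how you'd like to be cited. It is not a formal standard yet, and crawler support varies. It should live at yoursite.com/llms.txt: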

# Our content policy for LLMs
All content on this site is available for training and search.
Please credit sources as: [Article Title] by [Author Name] (yoursite.com)

Sitemap: https://yoursite.com/sitemap.xml
RSS: https://yoursite.com/rss.xml

Why it matters: Without llms.txt, AI engines have to guess which pages matter and how to attribute them. With it, you're explicitly inviting them in and stating your preferred citation format.
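If your site has more than a handful of pages, you may want to generate the file rather than hand-write it. The sketch below assumes the Markdown layout described in the llmstxt.org proposal (an H1 title, a blockquote summary, then H2 sections of links), which is more structured than the short example above. The function name build_llms_txt, the site name, and the URLs are placeholders of mine.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, then H2 link sections.

    `sections` maps a section heading to (link title, URL, short note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; swap in your own pages.
print(build_llms_txt(
    "Example Site",
    "Guides on Answer Engine Optimization.",
    {"Guides": [("AEO guide", "https://yoursite.com/aeo", "The 6 AEO factors")]},
))
```

Write the returned string to llms.txt at your site root as part of your build step.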

3. Structured Data

AI engines parse JSON-LD schemas such as BlogPosting, HowTo, and FAQPage.

Content without schema is harder for AI to structure. <p>Learn how to build a SaaS</p> is vague. Schema says {"@type": "HowTo", "step": [...]}. That's machine-readable.
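To make the HowTo example concrete, here is a small sketch that builds that JSON-LD from a list of steps. The @type values and property names (step, HowToStep, position, text) come from schema.org; the function name howto_jsonld and the sample steps are illustrative only.

```python
import json


def howto_jsonld(name: str, steps: list[str]) -> str:
    """Return a schema.org HowTo object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "text": text}
            for i, text in enumerate(steps, start=1)
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical steps for the "build a SaaS" example.
markup = howto_jsonld("How to build a SaaS", ["Validate the idea", "Ship an MVP"])
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The printed script tag goes in the page's head or body, alongside the visible text of the same steps.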

4. Content Extractability

AI engines don't need your layout. They need your text. This means:

  • Semantic HTML: Use <article>, <section>, proper <h1><h2><h3> hierarchy
  • No text in images: AI can't read screenshots. Use actual text + <img alt="...">
  • Lists > paragraphs: Bullet points are easier to extract than walls of text
  • Scannable structure: Headers every 2-3 paragraphs

A 2,000-word article with 3 headers is harder to cite than a 2,000-word article with 15 headers. AI needs to find the specific section that answers the user's question.
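One way to audit this on your own pages is to pull out the heading outline and flag jumps in the hierarchy. This is a rough sketch using Python's built-in html.parser. The class and function names are mine, and "a heading more than one level deeper than the previous one" is my working definition of a broken hierarchy, not a rule any AI engine has published.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) for every h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def skipped_levels(headings: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return headings that sit more than one level below the heading before them.

    A page whose first heading is not an h1 is flagged too.
    """
    problems = []
    previous = 0
    for level, text in headings:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems


outline = HeadingOutline()
outline.feed("<h1>AEO</h1><p>Intro</p><h2>Factors</h2><h4>Schema</h4>")
print(len(outline.headings), "headings; skipped:", skipped_levels(outline.headings))
```

The heading count gives you the "3 headers vs 15 headers" number for a page, and the skipped list shows where the outline breaks.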

5. Metadata Completeness

Even though AI ignores <meta name="description">, it checks:

  • Canonical URL (to avoid duplicate content)
  • Open Graph image (for preview context)
  • Article datePublished (to know how old your content is)

Stale content (3+ years old) gets deprioritized. Fresh content gets cited more often.
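Here is a hedged sketch of checking a page for those three fields with Python's standard library. It assumes the canonical URL is in a link tag, the Open Graph image is in a meta tag with property og:image, and datePublished sits at the top level of a JSON-LD object. Real pages can nest these differently (for example inside @graph), so treat a "missing" result as a prompt to look, not a verdict.

```python
import json
from html.parser import HTMLParser


class MetadataAudit(HTMLParser):
    """Record the canonical URL, og:image, and JSON-LD datePublished if present."""

    def __init__(self):
        super().__init__()
        self.found = {"canonical": None, "og:image": None, "datePublished": None}
        self._in_jsonld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.found["canonical"] = a.get("href")
        elif tag == "meta" and a.get("property") == "og:image":
            self.found["og:image"] = a.get("content")
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD counts as missing
            if isinstance(doc, dict) and "datePublished" in doc:
                self.found["datePublished"] = doc["datePublished"]


def missing_metadata(html: str) -> list[str]:
    """Return the names of the checked fields that the page does not provide."""
    audit = MetadataAudit()
    audit.feed(html)
    audit.close()
    return [key for key, value in audit.found.items() if not value]
```

Feed it the HTML of a published page; an empty list means all three fields were found.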

6. Answer-Ready Format

The best AEO content directly answers common questions:

  • "What is X?" → Definition box at the top
  • "How do I X?" → Step-by-step with numbered lists
  • "Why does X matter?" → Clear benefits, quantified where possible

Content written as "walls of paragraphs" is less likely to be extracted. Content structured as "question → answer → proof" is gold. I built my autonomous blog agent around this principle. Every draft follows the question-answer-proof structure automatically.


AEO vs SEO vs GEO: Three Optimization Disciplines

The search landscape now has three distinct optimization disciplines. SEO targets traditional search rankings, AEO targets AI answer citations, and GEO (Generative Engine Optimization) targets how generative AI models represent your brand and content in synthesized responses. They overlap but reward different signals.

| Factor | SEO (Google) | AEO (AI Engines) | GEO (Generative AI) |
| --- | --- | --- | --- |
| Goal | Rank in search results | Get cited in AI answers | Shape AI's representation of your brand |
| Crawling | robots.txt | robots.txt + llms.txt | robots.txt + llms.txt + ai.txt |
| Ranking | Backlinks + engagement | Content accuracy + structure | Entity clarity + topical authority |
| Signal count | 200+ factors | ~6 critical factors | Emerging, entity-centric |
| Meta descriptions | Shown to users | Ignored | Ignored |
| Title tags | Shown to users | Used for context | Used for entity recognition |
| Content format | Long-form preferred | Any length, needs structure | Definition-dense, claim-backed |
| Freshness | Can rank for years | Deprioritized after ~10 months | Training data snapshots, slower refresh |
| Citations | Implied (link juice) | Explicit (source linked in answer) | Implicit (model "knows" you) |
| Key differentiator | Popularity | Extractability | Entity recognition |

When to prioritize which

  • SEO first if your revenue depends on organic click-through traffic today.
  • AEO first if you produce reference content (definitions, tutorials, comparisons) that AI engines synthesize into answers.
  • GEO first if brand perception matters more than individual page traffic. You want AI to describe your company correctly when users ask about your category.

Most sites benefit from all three, and the technical foundations overlap heavily. The good news: structured data, semantic HTML, clear headings, and answer-first formatting improve all three simultaneously. The differences are in emphasis, not in contradiction.

The opportunity: A well-structured blog post with schema and semantic HTML will rank on Google, appear in AI answers, and inform how generative models represent your expertise. The cost of doing all three is marginally higher than doing one.


How Do You Start Optimizing for Answer Engines?

Start by confirming AI crawlers can access your site via robots.txt, then create an llms.txt file at your site root. Audit your top five pages to add schema markup, restructure them with more headers, and rewrite openings to directly answer the question in the title. Those four actions deliver the most AEO impact.

How to Start with AEO

Four actions deliver 80% of your AEO impact: unblock AI crawlers in robots.txt, create a two-paragraph llms.txt at your site root, add FAQPage schema to your three best posts, and rewrite each post's opening paragraph to directly answer the title question. You can finish all four in under an hour.

Step 1: Check if AI Can Find You

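# Can GPTBot, ClaudeBot, PerplexityBot, etc. access your site?
# -s prints the file body quietly; -I alone would return only the headers
curl -s https://yoursite.com/robots.txt
# Look for GPTBot, ClaudeBot, PerplexityBot allow rules
# Then check the response headers for a blocking X-Robots-Tag
curl -sI https://yoursite.com/ | grep -i x-robots-tag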

Step 2: Create /llms.txt

Add this file to your site root:

# Content policy for LLMs
All content available for training and search.
Please attribute as: [Article] by [Author] (yoursite.com)

Sitemap: https://yoursite.com/sitemap.xml
RSS: https://yoursite.com/rss.xml

Step 3: Audit Your Best Content

Pick your 5 best-performing pages and:

  • Add schema (BlogPosting, HowTo, FAQ)
  • Restructure with more headers
  • Move key info to the top
  • Add a definition box for the main question

Step 4: Monitor in Perplexity

Search your main topics in Perplexity. Are you being cited? If not, your content isn't being discovered.


Platform-by-Platform AEO Strategy

Each AI engine has distinct citation behavior and source preferences. Perplexity favors Reddit and reference-style content; ChatGPT is highly selective and freshness-sensitive; Google AI Overview mirrors traditional search rankings; Claude rewards structured technical writing. A platform-specific approach compounds your overall citation rate significantly beyond what generic AEO delivers.

Perplexity

Perplexity cites the most sources per answer (~6.6) and favors Reddit heavily (46.7% of top sources). To optimize for Perplexity:

  • Participate in Reddit discussions about your expertise topics. Perplexity's retrieval pipeline weights Reddit as a high-trust source for niche questions. Genuine contributions with links to your detailed writeups outperform any on-site optimization alone.
  • Structure content as reference material. Perplexity favors pages that read like authoritative references: tables, definitions, numbered lists, and comparative data. Pure opinion pieces get cited less.
  • Keep content updated. Perplexity's real-time retrieval means recently updated pages get priority. Pages with dateModified schema that reflect genuine content updates outperform stale pages.
  • Include original data. Benchmark results, survey findings, and original research get cited at higher rates because they provide information Perplexity can't synthesize from other sources.

ChatGPT (SearchGPT / Browse)

ChatGPT is the most selective citer (~2.6 sources per answer) and leans heavily on Wikipedia (7.8% of all citations). To optimize:

  • Answer the exact question in paragraph one. ChatGPT extracts more aggressively than other platforms. If your answer is in paragraph four, it may not get extracted at all.
  • Target Wikipedia-adjacent queries. ChatGPT cites Wikipedia for broad topics but turns to specialized sources for niche questions. The sweet spot is questions too specific for Wikipedia but demanding more authority than a forum thread can offer.
  • Prioritize freshness. 95% of ChatGPT citations come from recently published or updated content. A page that hasn't been updated in 10+ months is effectively invisible to ChatGPT's citation pipeline.
  • Use canonical URLs consistently. ChatGPT deduplicates aggressively. If your content appears at multiple URLs (www vs non-www, trailing slashes, paginated versions), citations may be split or lost.

Google AI Overview

Google AI Overview draws 76% of its citations from the traditional top 10 results. This makes it the most SEO-correlated AI citation source.

  • Rank in Google first. Unlike other AI engines, Google AI Overview primarily cites pages that already rank well in traditional search. AEO-only optimization without SEO foundations won't work here.
  • Add FAQPage schema. Google AI Overview preferentially extracts from pages with structured FAQ markup. The question-answer format maps directly to how Overview constructs its responses.
  • Target featured snippet queries. Queries that currently trigger featured snippets are the most likely to trigger AI Overview. If you can win the snippet, you're positioned for the Overview citation.

Claude

Claude doesn't have a live search product with citations in the same way, but its training data preferences and retrieval behaviors are worth understanding:

  • Maintain an llms.txt file. It costs little to publish and states your content policy explicitly. Anthropic has not publicly documented how ClaudeBot treats llms.txt, so treat it as a low-cost signal rather than a guarantee.
  • Provide structured, well-organized technical content. Claude's training pipeline favors content with clear hierarchical structure, code examples with context, and explicit reasoning chains.
  • Avoid content that reads like AI-generated filler. Claude's quality filters are sensitive to low-information-density content. Dense, opinionated, experience-backed writing gets weighted higher than generic explainers.

Common AEO Mistakes

The six mistakes that most often keep well-written content invisible to AI engines are: blocked crawlers, buried answers, image-only key information, assuming Google rank equals AI visibility, bumping date metadata without changing content, and skipping FAQ schema. Each is fixable in under an hour once you know to look for it.

Mistake 1: Blocking AI crawlers unintentionally

The most common AEO failure is a robots.txt that blocks crawlers the site owner doesn't know about. A blanket Disallow: / for unlisted user agents, or a hosting platform that adds X-Robots-Tag: noai by default, silently makes your entire site invisible. Run curl -s https://yoursite.com/robots.txt to read the file and curl -sI https://yoursite.com/ to inspect the response headers, then check what they actually say. Don't assume.

Mistake 2: Burying the answer

SEO rewards content that keeps users scrolling: long intros, context-setting, narrative buildup. AEO rewards the opposite. If your page title is "What is Answer Engine Optimization?" and the definition doesn't appear until paragraph three, AI engines may extract from a competitor who puts it in paragraph one. Move the answer up. Context can follow.

Mistake 3: Using images instead of text for key information

Infographics, diagrams, and screenshots are invisible to AI extraction. If your comparison table is an image, AI engines can't read it. If your step-by-step tutorial is a screenshot of a terminal, AI can't extract the commands. Use real HTML tables, real code blocks, and real text. Add alt attributes to images, but don't rely on alt for primary content.

Mistake 4: Assuming Google ranking equals AI visibility

This is the most expensive mistake. A page ranking #1 on Google may not be cited by any AI engine. Google and AI engines evaluate different signals: Google weighs backlinks and click-through rate; AI engines weigh extractability and structural clarity. The only AI engine with strong Google correlation is Google AI Overview (76% overlap). For Perplexity, ChatGPT, and Claude, your Google rank is largely irrelevant.

Mistake 5: Updating dateModified without changing content

Pages with dateModified schema get 1.8x more AI citations, but only if the content actually changed. Bumping the date on unchanged content is detectable (AI engines can diff cached versions) and triggers a trust penalty. When you update dateModified, make substantive changes: new data, expanded sections, corrected claims, or added examples.
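One guard against accidental date bumps is to fingerprint the article body and only move dateModified when the fingerprint changes. The sketch below is my own illustration, not a description of how any AI engine detects changes. It only catches the no-change case; whether an edit is substantive is still your call.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash the body text with whitespace collapsed, so reformatting alone changes nothing."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_date_modified(
    old_fingerprint: str, old_date: str, new_text: str, today: str
) -> tuple[str, str]:
    """Return (fingerprint, dateModified); the date moves only if the text changed."""
    new_fingerprint = content_fingerprint(new_text)
    if new_fingerprint == old_fingerprint:
        return old_fingerprint, old_date
    return new_fingerprint, today


# Hypothetical usage: store the fingerprint next to the post's front matter.
stored = content_fingerprint("AEO has six factors.")
print(next_date_modified(stored, "2025-01-10", "AEO has  six factors.\n", "2025-06-01")[1])
print(next_date_modified(stored, "2025-01-10", "AEO has seven factors.", "2025-06-01")[1])
```

The first call keeps the old date because only whitespace changed; the second returns the new date because the wording changed.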

Mistake 6: Ignoring FAQ schema

FAQPage schema is the single highest-ROI structured data for AEO. It maps directly to how AI engines construct answers: question in, answer out. A page with five well-structured FAQ entries gives AI engines five separate extraction opportunities. A page without it gives AI engines one: the entire article body, which they then have to parse themselves with lower accuracy.
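For reference, here is a minimal sketch that turns question-answer pairs into FAQPage JSON-LD. The structure (mainEntity, Question, acceptedAnswer, Answer) follows schema.org; the function name faq_jsonld and the sample pair are placeholders.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Return a schema.org FAQPage object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder entry; the answer text should match what is visible on the page.
print(faq_jsonld([
    ("What is AEO?", "Structuring content so AI answer engines can extract, trust, and cite it."),
]))
```

Each pair becomes one Question entry, so five pairs give you the five extraction opportunities described above.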


How Do You Measure AEO Performance Without a Dashboard?

Measure AEO manually each month by querying ChatGPT, Perplexity, Claude, and Gemini with the exact questions your content answers, then checking whether you are cited. Track AI referral traffic in analytics using domain segments, and monitor Google's AI Overview for your target queries to identify citation gaps.

Measuring AEO Progress

Track AEO performance with four monthly checks: a manual citation audit across ChatGPT, Perplexity, Claude, and Gemini; AI referral traffic segments in analytics; an automated infrastructure scan via citability.dev; and a Google AI Overview review for your target queries. No single dashboard does this yet.

The hardest part of AEO is knowing if it's working. Traditional SEO has rankings and impressions in Search Console. AEO doesn't have a dashboard yet.

Manual citation audit (monthly)

Open ChatGPT, Perplexity, Claude, and Gemini. Ask the exact questions your content answers:

  • "What is AEO?"
  • "How do I optimize for Perplexity?"
  • "What is llms.txt?"

Are you cited? If not, who is? Read the content that does get cited and compare it to yours. The differences are usually structural: their definition is in the first paragraph, yours is in paragraph four. Their page has 12 question-format headers, yours has three. These gaps are fixable.

AI referral traffic in analytics

Create a segment for sessions from AI engine domains: chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, copilot.microsoft.com. Track this monthly. Growth here is a leading indicator of citation growth. Direct AI traffic often comes before organic traffic from AI-influenced searches.
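If your analytics tool can export raw referrer URLs, the same segment can be sketched in a few lines of Python. The domain list is the one above; the function names and the sample referrers are hypothetical.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# AI engine domains listed in this article.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the AI engine name for a referrer URL, or None if it isn't one."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRERS.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def ai_sessions(referrers: list[str]) -> Counter:
    """Count sessions per AI engine, ignoring every other referrer."""
    return Counter(e for e in map(classify_referrer, referrers) if e)


print(ai_sessions([
    "https://www.perplexity.ai/search?q=aeo",
    "https://chatgpt.com/",
    "https://www.google.com/",
]))
```

Run it on each month's export and compare the totals to see the trend.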

Automated infrastructure auditing

Manual citation checks tell you the outcome but not the cause. Infrastructure audits identify the specific technical gaps preventing citations. Tools like citability.dev run 10 automated checks against your site: robots.txt permissions, sitemap completeness, answer-first content structure, content freshness via dateModified schema, JSON-LD coverage, meta descriptions, canonical URLs, HTTPS, heading hierarchy, and social sharing tags. The scan takes under 30 seconds and shows exactly which signals pass and which need fixing.

Google AI Overview tracking

Search your target queries in Chrome incognito. Does Google's AI Overview cite you? If it does, you're in the 2–7% of pages that get sourced for that query. If it doesn't but competitors are cited, run their pages through the AEO checklist. FAQPage schema and answer-first formatting are usually the gap.

The minimum viable AEO setup

If you want to start today with 30 minutes of work:

  1. Add User-agent: GPTBot / Allow: / and similar for ClaudeBot, PerplexityBot to your robots.txt
  2. Create a 200-word llms.txt at your root with your sitemap URL and preferred attribution format
  3. Add FAQPage schema to your three best posts
  4. Rewrite the opening paragraph of each post to directly answer the question in the title

That's it. Nothing else in the AEO checklist will have as much impact as those four actions. Do them before optimizing for any specific engine.

Advanced AEO: Beyond the 6 Factors

After the six core factors are in place, three techniques compound your AI visibility further: publishing an ai.txt policy file, implementing WebMCP so AI agents can query your site programmatically, and stacking entity mentions across Wikidata, GitHub, LinkedIn, and Crunchbase. These are differentiation plays, not baseline requirements.

ai.txt: Declaring your AI interaction policy

While llms.txt tells AI crawlers what content to index and how to cite it, ai.txt goes further. It declares your site's full AI interaction policy including tool use, embedding permissions, and content licensing terms. Place it at yoursite.com/ai.txt alongside your robots.txt and llms.txt.

An ai.txt file signals to AI systems that your site is intentionally designed for AI interaction, not just passively crawlable. This is a differentiator: most sites are accidentally AI-visible (or accidentally invisible). A site with ai.txt is declaring intent, which AI systems can use as a trust signal.

WebMCP: Making your site callable by AI agents

AEO gets your content cited in AI answers. WebMCP gets your content used by AI agents. WebMCP is a browser-side protocol that registers tools AI agents can call (search your site, query your data, interact with your product) directly from the browser context.

This is the frontier of AI-visible web architecture: a site that isn't just readable by AI, but usable by it. When an AI agent needs to find information in your domain, WebMCP lets it query your site programmatically rather than scraping HTML.

Entity stacking: Building your knowledge graph footprint

AI engines identify entities (people, companies, products) by cross-referencing structured data across the web. The more high-authority platforms that contain consistent information about your entity, the more confidently AI engines will cite you.

Key platforms for entity stacking:

  • Wikidata: Add a structured entry for yourself or your company. Wikidata is a primary knowledge source for most AI systems.
  • Crunchbase: For companies and products. Crunchbase data feeds into multiple AI training pipelines.
  • GitHub: Active repositories with clear README files and consistent author attribution.
  • LinkedIn: Detailed profile matching your site's author schema exactly (same name, same title, same description).
  • Product Hunt: For SaaS products. Product Hunt pages are high-authority and frequently cited by AI when discussing tools.

The key is consistency: the same name, same description, same URL across all platforms. Inconsistent entity data confuses AI systems and splits your citation authority across multiple "versions" of you.

Topic hubs: Building topical authority for AI

AI engines evaluate topical authority differently than Google. Google uses backlinks as authority proxies. AI engines use content depth and internal linking density. A site with one article about AEO is a mention. A site with ten interconnected articles about AEO (definitions, tutorials, case studies, comparisons, tooling) is a topical authority.

Build topic hubs by:

  1. Writing a pillar article (like this one) that defines the topic comprehensively.
  2. Creating supporting articles that go deep on subtopics: each of the 6 factors, platform-specific guides, case studies, tool comparisons.
  3. Internal linking densely. Every supporting article links to the pillar. The pillar links to every supporting article. AI engines follow these internal link graphs to gauge content depth.
  4. Using consistent terminology. If your pillar calls it "Answer Engine Optimization," every supporting article should use the same phrase, not alternate between "AEO," "AI SEO," and "answer engine marketing."

The Future is Plural Search

Over 30% of searchers now use AI answer engines for complex queries, and that share is growing monthly. AEO is not a replacement for SEO. It is a second optimization layer for a second type of search engine, one that extracts passages directly rather than ranking pages for users to click.

Google is no longer the only search engine that matters. You need to be visible in all of them.

AEO isn't replacing SEO. It's extending your reach to a new kind of search engine that's growing fast and still underserved.

The technical foundation is the same as good SEO: well-structured, authoritative content with clear headings and direct answers. What changes is the mental model. SEO rewards findability: rank high enough and users click through. AEO rewards extractability: your H2 sections get lifted verbatim into AI responses. A page that ranks #3 on Google but buries its main answer in paragraph five won't get cited by AI even if Google loves it.

Write for systems extracting specific passages, not just readers scanning for reasons to click. Each H2 should be a complete, self-contained answer to the question it poses. Enough context to stand alone if extracted.

The easiest time to optimize for AEO was 2024. The second easiest time is today.

Next: Check out the optimization checklist for AI search.

Top comments (2)

Bhavin Sheth

This is a really solid breakdown.

I’ve been noticing the same shift — content that ranks well on Google doesn’t automatically get cited in AI answers. Structure matters way more than backlinks now.

The extractability point is key. Adding clear definitions, proper headers, and schema made a visible difference for some of my own pages.

More builders should start thinking beyond just “ranking” and focus on “being quoted.” This is a helpful wake-up call.
