DEV Community

Chudi Nnorukam

Posted on • Edited on • Originally published at chudi.dev

8 Steps to Get Your Content Cited by Perplexity, ChatGPT, and AI Search



To optimize for Perplexity, ChatGPT, and Claude, make your site crawlable and format content for extraction. That means allowing AI crawlers in robots.txt, adding llms.txt, and using structured headings plus schema on priority posts. This guide walks through the exact steps.

Quick Wins (Do These Today)

  1. Update robots.txt to allow AI crawlers
  2. Create /llms.txt at site root
  3. Add BlogPosting schema to 10 top articles
  4. Structure one article with 15+ headers instead of 3

That's a 30-minute investment that unlocks visibility in Perplexity, ChatGPT, and Claude.


Step 1: Update robots.txt for AI Crawlers

Your robots.txt is the bouncer's list for search crawlers. Most sites have something like:

User-agent: *
Disallow: /admin
Disallow: /api

Sitemap: https://yoursite.com/sitemap.xml

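This already admits every crawler that honors robots.txt, AI bots included, because `User-agent: *` applies to any bot without its own group. Naming the AI crawlers explicitly is still worth doing: it documents your intent and keeps them allowed if you later tighten the wildcard rules. One catch: a bot with its own group ignores the wildcard group, so repeat any Disallow lines you want it to honor. Update it: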

# Allow all standard crawlers
User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /private

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

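User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Why each one matters:

| Bot | Owner | Used By |
|---|---|---|
| GPTBot | OpenAI | Training data for OpenAI models (ChatGPT search uses a separate crawler, OAI-SearchBot) |
| ClaudeBot | Anthropic | Training data for Claude models |
| PerplexityBot | Perplexity | Perplexity search |
| Google-Extended | Google | Gemini training and grounding. It is a robots.txt control token, not a separate crawler, and it does not affect Google Search; AI Overviews follow the regular Googlebot rules |
| CCBot | Common Crawl | Common Crawl dataset, used to train many open-source models |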

Test it: confirm the file is being served (look for a 200 status; drop -I to print the rules themselves):

curl -I https://yoursite.com/robots.txt
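curl only tells you the file is reachable. To check what the rules actually permit, you can parse them with Python's standard `urllib.robotparser`. This is a minimal sketch: the `check_ai_access` helper and the `SAMPLE` rules are made up for illustration, and in practice you'd paste in your live robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens discussed in this step.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def check_ai_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return whether each AI crawler may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical rules that accidentally block GPTBot.
SAMPLE = """\
User-agent: *
Disallow: /admin

User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    results = check_ai_access(SAMPLE, "https://yoursite.com/blog/post")
    for bot, allowed in results.items():
        print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

With these sample rules, GPTBot is reported as blocked while the other bots fall through to the wildcard group and are allowed.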

Step 2: Create /llms.txt (New File)

While robots.txt controls access, /llms.txt is a proposed convention (not yet a ratified standard) for telling LLMs what your site contains and how you'd like it credited. Create the file at your site root. For a deeper dive, see my guide on llms.txt and robots.txt for AI crawlers:

https://yoursite.com/llms.txt

# LLM Content Policy

All articles on this site are available for training, search indexing, and answer generation by LLMs.

## How to attribute our content:

For articles: [Article Title] by [Author Name] (sitename.com)
For data: Link to the specific section
For code: Preserve license headers

## How we'd like to be credited:

If citing multiple articles, link to: https://yoursite.com

## Content discovery:

- Sitemap: https://yoursite.com/sitemap.xml
- RSS: https://yoursite.com/rss.xml
- Blog archive: https://yoursite.com/blog

## Content we don't want indexed:

- Drafts (marked `draft: true`)
- Private tools or dashboards
- Archived content older than 5 years

## Preferred citation format:

[Article Title] — Author Name on yoursite.com

---

Last updated: January 2025

Why it matters: llms.txt gives AI engines a single place to read your content policy. Support varies by engine and none is obliged to honor it, so treat it as a low-cost signal rather than a guarantee. With it, you're explicitly inviting them in.


Step 3: Add Schema.org Structured Data

AI engines parse JSON-LD to understand content structure. This structured approach to content is what enables the extraction and synthesis that powers answer engines. Add this to your blog post template:

For Articles (BlogPosting)

Add this in your page's <head>:
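One way to produce that markup is to generate it from your post metadata. Here's a minimal Python sketch; the `blog_posting_jsonld` helper and every field value are placeholders I'm assuming for illustration, so swap in your own template variables.

```python
import json


def blog_posting_jsonld(headline, description, author, url, published, modified):
    """Build a schema.org BlogPosting and wrap it in a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "description": description,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,  # ISO 8601 date string
        "dateModified": modified,
        "mainEntityOfPage": {"@type": "WebPage", "@id": url},
    }
    # Escape "</" so a stray "</script>" in a field cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(blog_posting_jsonld(
        headline="Example post title",
        description="One-sentence summary of the post.",
        author="Author Name",
        url="https://yoursite.com/blog/example-post",
        published="2025-01-15",
        modified="2025-01-20",
    ))
```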



For How-To Content (HowTo)

If you're teaching a process:
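A rough sketch of the shape, built in Python from a list of steps. The `howto_jsonld` helper is a hypothetical name, and the step text is just this guide's own quick wins used as sample data.

```python
import json


def howto_jsonld(name, steps):
    """Build a schema.org HowTo from (step_name, step_text) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": title, "text": text}
            for i, (title, text) in enumerate(steps, start=1)
        ],
    }


if __name__ == "__main__":
    howto = howto_jsonld(
        "Optimize a site for AI search",
        [
            ("Update robots.txt", "Allow AI crawlers in robots.txt."),
            ("Create llms.txt", "Add /llms.txt at the site root."),
        ],
    )
    # Embed the output in a <script type="application/ld+json"> tag.
    print(json.dumps(howto, indent=2))
```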



For FAQ Content (FAQPage)
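FAQPage markup is a list of Question objects, each with an accepted Answer. A minimal Python sketch follows; `faq_jsonld` is a name I made up, and the questions should mirror what's actually visible on the page.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    faq = faq_jsonld([
        (
            "What is AEO?",
            "AEO (Answer Engine Optimization) is optimizing your content to be "
            "found, extracted, and cited by AI search engines.",
        ),
    ])
    # Embed the output in a <script type="application/ld+json"> tag.
    print(json.dumps(faq, indent=2))
```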



Verify schema: Use Google's Rich Results Test to validate.


Step 4: Structure Content for Extraction

AI engines need to find the answer within your content. This means:

Use Headers to Break Up Content

Bad structure:

# Article Title
<p>Long paragraph explaining the concept...</p>
<p>More context...</p>
<p>Finally, the key insight...</p>

Good structure:

# Article Title

## What is X?
<p>Clear definition here.</p>

## Why does X matter?
<p>Benefits and context.</p>

## How to implement X
<p>Steps...</p>

## Common mistakes
<p>What to avoid...</p>

## FAQ
- Q1: Answer
- Q2: Answer

Every 2-3 paragraphs, add a header. This makes it easier for AI to:

  1. Find the specific section answering a user's question
  2. Extract just that section (not the whole article)
  3. Cite the correct part of your content
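If your posts live in Markdown, you can audit the header rule automatically. The sketch below counts paragraphs under each heading and flags sections that run past a limit. The function names and the default limit of 3 are my own assumptions, and the parsing is deliberately simple: it treats blank-line-separated blocks as paragraphs and does not skip `#` lines inside fenced code.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)")


def paragraphs_per_section(markdown: str) -> list[tuple[str, int]]:
    """Count paragraphs under each heading in a Markdown document."""
    sections = [("(intro)", 0)]  # text before the first heading
    in_paragraph = False
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), 0))
            in_paragraph = False
        elif not line.strip():
            in_paragraph = False  # a blank line ends the current paragraph
        elif not in_paragraph:
            title, count = sections[-1]
            sections[-1] = (title, count + 1)
            in_paragraph = True
    return sections


def long_sections(markdown: str, limit: int = 3) -> list[str]:
    """Return headings whose section has more than `limit` paragraphs."""
    return [t for t, count in paragraphs_per_section(markdown) if count > limit]


if __name__ == "__main__":
    sample = "# Title\n\nIntro.\n\n## What is X?\n\nOne.\n\nTwo.\n\nThree.\n\nFour.\n"
    print(long_sections(sample))
```

Any heading it prints is a candidate for splitting into two sections.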

Use Lists for Dense Information

Instead of:

"To optimize your site, you need to update your robots.txt file, create an llms.txt file, add schema to your articles, and structure your content with headers."

Write:

"To optimize your site:

  1. Update robots.txt for AI crawlers
  2. Create llms.txt at site root
  3. Add schema to articles
  4. Structure with semantic headers"

AI engines can extract lists more reliably than paragraph prose.

Put the Answer at the Top

Don't make readers scroll for the punchline. If your headline is "Why AEO Matters," answer it in the first paragraph:

"AEO matters because 30% of searches now go through AI engines instead of Google. If your content isn't optimized for Perplexity, ChatGPT, and Claude, you're invisible to an entire audience."

Then expand with context, examples, and proof.

Use Definition Boxes

For key concepts, use a highlighted box:

> **Definition:** AEO (Answer Engine Optimization) is optimizing your content to be found, extracted, and cited by AI search engines.

This signals to AI engines: "This is important context."

Include Tables and Structured Data

Tabular data is easier for AI to extract:

| Factor | SEO | AEO |
|---|---|---|
| Ranking | Backlinks | Content structure |
| Speed | Important | Less important |

Don't just describe in paragraphs. Use tables.


Step 5: Monitor Visibility in AI Engines

Search Your Topics in Perplexity

Go to perplexity.ai and search your main keyword. Do you see your content cited?

If yes ✅ → Your content is discoverable
If no ❌ → You need to audit (usually a robots.txt or freshness issue)

Check ChatGPT Search

OpenAI's ChatGPT now searches the web. Search your site name + keyword. Does your content appear?

Use Perplexity Citation Tracking

When your content is cited, you'll see referral traffic from perplexity.ai in your analytics. Track this growth.


Complete Optimization Checklist

  • [ ] robots.txt allows GPTBot, ClaudeBot, PerplexityBot
  • [ ] /llms.txt exists at site root
  • [ ] Top 10 articles have BlogPosting schema
  • [ ] How-to content has HowTo schema
  • [ ] FAQ content has FAQPage schema
  • [ ] Content has 10+ headers (not 3)
  • [ ] Key answers in first paragraph
  • [ ] Important data in tables, not paragraphs
  • [ ] Meta descriptions under 160 chars (Google habit)
  • [ ] Images have descriptive alt text
  • [ ] No critical content in images only
  • [ ] Content updated in last 12 months (fresh signal)

When Your Optimization Isn't Working

If you've done the five steps and still don't see citations in Perplexity or ChatGPT after 4-6 weeks, run this diagnostic.

Check if AI bots are reaching your pages. Look in your server logs or analytics for user-agent strings containing GPTBot, ClaudeBot, or PerplexityBot. If you see no traffic from these bots, there's a crawl access issue: either robots.txt is still blocking them, or your hosting provider's firewall or bot protection is rejecting them before they reach your site.
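A quick way to run that log check is a few lines of Python. This is a sketch under assumptions: `count_ai_bot_hits` is a hypothetical helper, the sample lines are invented, and real user-agent strings are longer, so it only looks for the bot name as a substring.

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_bot_hits(log_lines):
    """Count access-log lines that mention each AI crawler's name."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


# Invented log lines; in practice, iterate over open("access.log").
SAMPLE_LOG = [
    '203.0.113.5 - - "GET /blog/post HTTP/1.1" 200 "-" "compatible; GPTBot/1.0"',
    '203.0.113.9 - - "GET /blog/post HTTP/1.1" 200 "-" "Mozilla/5.0 Firefox"',
    '203.0.113.7 - - "GET /llms.txt HTTP/1.1" 200 "-" "compatible; ClaudeBot/1.0"',
]

if __name__ == "__main__":
    hits = count_ai_bot_hits(SAMPLE_LOG)
    for bot in AI_BOTS:
        print(f"{bot}: {hits[bot]} requests")
```

A bot that shows zero requests over several weeks is the one to investigate first.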

Verify your canonical is consistent. Conflicting canonical signals make it unclear which URL is the original, so AI systems may cite the cross-post or skip the content altogether. If you're cross-posting to Dev.to, Medium, or LinkedIn, make sure each cross-post has your original URL in the canonical field, not the platform's default.
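To spot-check a cross-post, save its HTML and extract the canonical link with the standard-library parser. A minimal sketch: `CanonicalFinder` and `canonical_matches` are names I'm assuming, and the comparison is a strict string match, so a trailing-slash difference counts as a mismatch.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_matches(html: str, expected_url: str) -> bool:
    """True only if the page declares exactly one canonical, equal to expected_url."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals == [expected_url]


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="https://yoursite.com/blog/post"></head>'
    print(canonical_matches(page, "https://yoursite.com/blog/post"))
```

Requiring exactly one canonical also catches pages where a theme and a plugin each emit their own tag.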

Check for duplicate content. AI engines, like Google, downweight duplicate content. If you have two posts targeting the same query, consolidate them. Two 800-word posts on the same topic compete with each other and dilute citation chances compared to one 1,600-word authoritative post.

Test your schema. Run your top 3 posts through Google's Rich Results Test. If schema errors appear, fix them—AI engines use the same structured data signals to understand content type.

Most optimization failures trace back to one of these four issues. The good news: all four are diagnosable and fixable in an afternoon.

The Advantage

Here's the thing: most creators haven't even heard of AEO yet.

You just did these 5 steps. Your competitors haven't. That could give you a 6-12 month window where your content is cited more often in AI answers, driving visibility and traffic.

This is the opposite of SEO, where the first-mover advantage is gone. AEO is still day one. If you want to go deeper into answer engine optimization as a comprehensive strategy, check out my detailed AEO guide.

Next: Set up an audit of your top 20 pages using SEOAuditLite to see your AEO readiness score.
