<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nikhil Goyal</title>
    <description>The latest articles on DEV Community by Nikhil Goyal (@nikhilgoyal).</description>
    <link>https://dev.to/nikhilgoyal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F361448%2F880dc0bb-5b1e-4df7-b7dd-f1bbd934ddae.jpeg</url>
      <title>DEV Community: Nikhil Goyal</title>
      <link>https://dev.to/nikhilgoyal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nikhilgoyal"/>
    <language>en</language>
    <item>
      <title>The robots.txt Mistake That's Killing Your AI Search Visibility</title>
      <dc:creator>Nikhil Goyal</dc:creator>
      <pubDate>Tue, 31 Mar 2026 11:45:34 +0000</pubDate>
      <link>https://dev.to/nikhilgoyal/the-robotstxt-mistake-thats-killing-your-ai-search-visibility-1dlc</link>
      <guid>https://dev.to/nikhilgoyal/the-robotstxt-mistake-thats-killing-your-ai-search-visibility-1dlc</guid>
      <description>&lt;p&gt;There's a good chance your website is invisible to ChatGPT, Perplexity, and every other AI search engine — and the fix takes about 2 minutes.&lt;/p&gt;

&lt;p&gt;I've been auditing sites for AI readability for the past year, and the single most common issue I find isn't bad content or missing schema. It's &lt;code&gt;robots.txt&lt;/code&gt; blocking AI crawlers entirely. The site owner has no idea. They're optimizing content, writing FAQ pages, adding structured data — and none of it matters because the front door is locked.&lt;/p&gt;

&lt;p&gt;Here's how to check yours and fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 30-second check
&lt;/h2&gt;

&lt;p&gt;Run this right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://yoursite.com/robots.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now look for any of these bot names in &lt;code&gt;Disallow&lt;/code&gt; rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GPTBot&lt;/code&gt; — OpenAI's training crawler (collects content that may be used to train its models)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OAI-SearchBot&lt;/code&gt; — OpenAI's search indexer (powers ChatGPT search)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatGPT-User&lt;/code&gt; — fetches pages when a ChatGPT user asks for live info&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ClaudeBot&lt;/code&gt; — Anthropic's training crawler&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Claude-SearchBot&lt;/code&gt; — Anthropic's search indexer (powers Claude's web search)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PerplexityBot&lt;/code&gt; — Perplexity's search crawler&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Google-Extended&lt;/code&gt; — Google's opt-out token for Gemini training and grounding (not a separate crawler, and it has no effect on Google Search)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Applebot-Extended&lt;/code&gt; — Apple's opt-out token for training its foundation models (it doesn't crawl; Applebot does the fetching)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or just grep for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://yoursite.com/robots.txt | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-iE&lt;/span&gt; &lt;span class="nt"&gt;-A2&lt;/span&gt; &lt;span class="s2"&gt;"gptbot|oai-searchbot|chatgpt-user|claudebot|claude-searchbot|perplexitybot|google-extended|applebot-extended"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



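&lt;p&gt;If you see &lt;code&gt;Disallow: /&lt;/code&gt; in the group under any of those names, that crawler can't see your site. The same goes for a &lt;code&gt;User-agent: *&lt;/code&gt; group with &lt;code&gt;Disallow: /&lt;/code&gt;, which covers every bot that has no group of its own.&lt;/p&gt;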
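
&lt;p&gt;The same check can be scripted. Here is a minimal Python sketch, assuming the standard library's &lt;code&gt;urllib.robotparser&lt;/code&gt; reads the rules closely enough for an audit (real crawlers may differ in edge cases). The &lt;code&gt;blocked_bots&lt;/code&gt; helper and the sample file are illustrative only.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from the list above.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended",
    "Applebot-Extended",
]


def blocked_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI bots that may not fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt of the "copied from an old blog post" kind.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))  # ['GPTBot', 'ClaudeBot']
```

&lt;p&gt;To audit a live site, you could feed it the text returned by the &lt;code&gt;curl&lt;/code&gt; command instead of &lt;code&gt;SAMPLE&lt;/code&gt;.&lt;/p&gt;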




&lt;h2&gt;
  
  
  Why this happens more than you'd think
&lt;/h2&gt;

&lt;p&gt;In about 4 out of 10 sites I audit, at least one major AI crawler is blocked. Here's how it happens:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The wildcard block
&lt;/h3&gt;

&lt;p&gt;The most common culprit. Someone added this years ago and forgot about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: *
&lt;span class="n"&gt;Disallow&lt;/span&gt;: /
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This blocks everything — Googlebot, AI crawlers, all of it. Sometimes it was intentional for a staging site and got copied to production. Sometimes it's a CMS default that nobody changed.&lt;/p&gt;
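
&lt;p&gt;One detail worth knowing: a crawler only falls back to the &lt;code&gt;*&lt;/code&gt; group when no group names it specifically. The sketch below illustrates that with Python's &lt;code&gt;urllib.robotparser&lt;/code&gt;; the &lt;code&gt;can_crawl&lt;/code&gt; helper and both sample files are made up for the demonstration, and the parser is assumed to be a reasonable stand-in for how real crawlers read the file.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, bot: str, path: str = "/") -> bool:
    """True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


WILDCARD_BLOCK = """\
User-agent: *
Disallow: /
"""

# Same file with one named group added for a search bot.
WITH_NAMED_GROUP = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /
"""

print(can_crawl(WILDCARD_BLOCK, "Googlebot"))        # False: wildcard applies
print(can_crawl(WILDCARD_BLOCK, "OAI-SearchBot"))    # False: wildcard applies
print(can_crawl(WITH_NAMED_GROUP, "OAI-SearchBot"))  # True: named group wins
print(can_crawl(WITH_NAMED_GROUP, "PerplexityBot"))  # False: falls back to *
```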

&lt;h3&gt;
  
  
  2. WordPress security plugins
&lt;/h3&gt;

&lt;p&gt;Plugins like Wordfence, Sucuri, and All In One Security sometimes add bot-blocking rules automatically. I've seen configs that specifically block &lt;code&gt;GPTBot&lt;/code&gt; and &lt;code&gt;ClaudeBot&lt;/code&gt; because they were categorized as "scrapers" in early 2024 when AI crawling was more controversial.&lt;/p&gt;

&lt;p&gt;Check your security plugin settings — some have an "AI bot blocking" toggle that's enabled by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The copy-paste robots.txt
&lt;/h3&gt;

&lt;p&gt;A lot of robots.txt files in the wild were copied from blog posts written in 2023-2024, when the default recommendation was to block AI crawlers to "protect your content." The landscape has shifted. If your goal is visibility, those rules are now working against you.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. CDN or hosting-level blocks
&lt;/h3&gt;

&lt;p&gt;Cloudflare, Vercel, and other platforms offer bot management settings. Some templates or one-click security configs block AI user agents at the infrastructure level, before &lt;code&gt;robots.txt&lt;/code&gt; even gets read. If your robots.txt looks clean but AI crawlers still aren't hitting your server logs, check your CDN or hosting settings.&lt;/p&gt;




&lt;h2&gt;
  
  
  The distinction I wish someone had explained to me earlier
&lt;/h2&gt;

&lt;p&gt;When I first started looking into this, I treated all AI crawlers the same. That was a mistake. They fall into two very different categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training bots&lt;/strong&gt; scrape your content to train AI models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GPTBot&lt;/code&gt; (OpenAI)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ClaudeBot&lt;/code&gt; (Anthropic)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Google-Extended&lt;/code&gt; (Google)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Applebot-Extended&lt;/code&gt; (Apple)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Bytespider&lt;/code&gt; (ByteDance)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CCBot&lt;/code&gt; (Common Crawl)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Search bots&lt;/strong&gt; fetch your pages in real time to answer user queries and cite you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;OAI-SearchBot&lt;/code&gt; (OpenAI — powers ChatGPT search results)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatGPT-User&lt;/code&gt; (OpenAI — fetches pages during live conversations)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Claude-SearchBot&lt;/code&gt; (Anthropic — powers Claude's web search)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PerplexityBot&lt;/code&gt; (Perplexity — indexes for AI search)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distinction matters. If you block the search bots, you won't get cited when someone asks ChatGPT or Perplexity for a recommendation in your space. That's live traffic you're turning away.&lt;/p&gt;

&lt;p&gt;Training bots are a different calculation. Some site owners are comfortable contributing to model training; others aren't. That's a legitimate choice. But blocking training bots doesn't necessarily remove you from AI answers — models are already trained on historical data, and search bots work independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  A robots.txt that works for AI visibility
&lt;/h2&gt;

&lt;p&gt;Here's what I recommend as a starting point. It allows all search-related AI bots while giving you explicit control over training bots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# Search engines
&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Googlebot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Bingbot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="c"&gt;# AI search bots — allow these for AI citation visibility
&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;OAI&lt;/span&gt;-&lt;span class="n"&gt;SearchBot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;ChatGPT&lt;/span&gt;-&lt;span class="n"&gt;User&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Claude&lt;/span&gt;-&lt;span class="n"&gt;SearchBot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;PerplexityBot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="c"&gt;# AI training bots — your call on these
&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;GPTBot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;ClaudeBot&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Google&lt;/span&gt;-&lt;span class="n"&gt;Extended&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Applebot&lt;/span&gt;-&lt;span class="n"&gt;Extended&lt;/span&gt;
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="c"&gt;# Block training-only bots you're less comfortable with
&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;Bytespider&lt;/span&gt;
&lt;span class="n"&gt;Disallow&lt;/span&gt;: /

&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: &lt;span class="n"&gt;CCBot&lt;/span&gt;
&lt;span class="n"&gt;Disallow&lt;/span&gt;: /

&lt;span class="c"&gt;# Default: allow everything else
&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;-&lt;span class="n"&gt;agent&lt;/span&gt;: *
&lt;span class="n"&gt;Allow&lt;/span&gt;: /

&lt;span class="c"&gt;# Sitemap
&lt;/span&gt;&lt;span class="n"&gt;Sitemap&lt;/span&gt;: &lt;span class="n"&gt;https&lt;/span&gt;://&lt;span class="n"&gt;yoursite&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;/&lt;span class="n"&gt;sitemap&lt;/span&gt;.&lt;span class="n"&gt;xml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want AI visibility but don't want to contribute training data, you can &lt;code&gt;Disallow&lt;/code&gt; the training bots while keeping the search bots open. Just know that the line between training and search is blurry and getting blurrier — OpenAI's &lt;code&gt;GPTBot&lt;/code&gt; description says it's for "improving AI models," but model improvements directly affect how well ChatGPT cites you in the future.&lt;/p&gt;
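
&lt;p&gt;As a rough sketch of that variant, the Python below generates a &lt;code&gt;robots.txt&lt;/code&gt; that keeps the search bots open and flips every training bot with one switch. &lt;code&gt;build_robots_txt&lt;/code&gt; is a hypothetical helper, and unlike the starter file above it treats all six training bots alike rather than blocking only Bytespider and CCBot.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Groupings follow the two categories described earlier in this post.
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended",
                 "Bytespider", "CCBot"]


def build_robots_txt(block_training: bool, sitemap_url: str) -> str:
    """Build a robots.txt that always allows the search bots.

    Training bots get `Disallow: /` when block_training is True.
    """
    groups = []
    for bot in SEARCH_BOTS:
        groups.append(f"User-agent: {bot}\nAllow: /")
    rule = "Disallow: /" if block_training else "Allow: /"
    for bot in TRAINING_BOTS:
        groups.append(f"User-agent: {bot}\n{rule}")
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


# yoursite.com is the placeholder domain used throughout this post.
text = build_robots_txt(block_training=True,
                        sitemap_url="https://yoursite.com/sitemap.xml")

# Sanity-check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("OAI-SearchBot", "/"))  # True: search bot stays open
print(parser.can_fetch("GPTBot", "/"))         # False: training bot is blocked
```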

&lt;p&gt;My take: unless you have a specific reason to block training bots, allow them all. In my experience, sites that allow both training and search bots tend to get cited more consistently than sites that only allow search bots — though I'll admit the sample size is small and I'm still tracking this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Verifying it's actually working
&lt;/h2&gt;

&lt;p&gt;This bit tripped me up at first — I updated a client's &lt;code&gt;robots.txt&lt;/code&gt; and assumed we were done. Took me a week to realize the CDN was still blocking at the edge. Always verify with server logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show the 20 most recent AI crawler hits in the current access log&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-iE&lt;/span&gt; &lt;span class="s2"&gt;"gptbot|oai-searchbot|chatgpt-user|claudebot|claude-searchbot|perplexitybot"&lt;/span&gt; /var/log/nginx/access.log | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
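
&lt;p&gt;If you'd rather see a tally per bot than raw lines, a small script can do the counting. This is a sketch under a few assumptions: the log is plain text with the user agent somewhere on each line, and the sample lines are invented placeholders (the user-agent strings are simplified, not the crawlers' real ones).&lt;/p&gt;

```python
from collections import Counter

# Lowercase substrings to look for on each log line.
AI_BOTS = ["gptbot", "oai-searchbot", "chatgpt-user",
           "claudebot", "claude-searchbot", "perplexitybot"]


def count_ai_hits(log_lines) -> Counter:
    """Count log lines per AI bot, matching case-insensitively."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot in lowered:
                hits[bot] += 1
                break  # count each request line once
    return hits


# Made-up lines in a combined-log-like layout, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "compatible; OAI-SearchBot"',
    '203.0.113.9 - - [01/Apr/2026:10:05:00 +0000] "GET /pricing HTTP/1.1" 200 900 "-" "compatible; GPTBot"',
    '198.51.100.7 - - [01/Apr/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Apr/2026:10:09:00 +0000] "GET /faq HTTP/1.1" 200 700 "-" "compatible; OAI-SearchBot"',
]

print(count_ai_hits(SAMPLE_LOG))
```

&lt;p&gt;Passing an open file handle for &lt;code&gt;/var/log/nginx/access.log&lt;/code&gt; instead of &lt;code&gt;SAMPLE_LOG&lt;/code&gt; should work the same way, since the function only iterates over lines.&lt;/p&gt;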



&lt;p&gt;If you're on a managed hosting platform without raw log access, check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare&lt;/strong&gt;: Security → Bots → look for verified bot traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel&lt;/strong&gt;: Analytics → check for known bot user agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GA4&lt;/strong&gt;: Won't show bot traffic directly, but watch for referrals from &lt;code&gt;chatgpt.com&lt;/code&gt;, &lt;code&gt;perplexity.ai&lt;/code&gt;, &lt;code&gt;gemini.google.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
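
&lt;p&gt;For the referral side, here is a minimal sketch of how you might flag AI-sourced visits in exported analytics data. It assumes you have full referrer URLs to work with; &lt;code&gt;is_ai_referral&lt;/code&gt; is a hypothetical helper, and the host list is just the three domains named above.&lt;/p&gt;

```python
from urllib.parse import urlparse

# Referrer hosts named in the GA4 note above.
AI_REFERRER_HOSTS = {"chatgpt.com", "perplexity.ai", "gemini.google.com"}


def is_ai_referral(referrer_url: str) -> bool:
    """True if the referrer's host is, or is a subdomain of, a known AI host."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == known or host.endswith("." + known)
               for known in AI_REFERRER_HOSTS)


print(is_ai_referral("https://chatgpt.com/"))              # True
print(is_ai_referral("https://www.perplexity.ai/search"))  # True
print(is_ai_referral("https://www.google.com/search"))     # False
```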

&lt;p&gt;A few things I've noticed in the logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI crawlers hit fewer pages than Googlebot, but spend more time per page&lt;/li&gt;
&lt;li&gt;They tend to favor pages with structured data and clean HTML&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatGPT-User&lt;/code&gt; shows up in bursts — someone is asking ChatGPT about your topic and it's fetching your page live&lt;/li&gt;
&lt;li&gt;If you see &lt;code&gt;OAI-SearchBot&lt;/code&gt; hitting your site regularly, that's a good sign — you're being indexed for ChatGPT search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't panic if you don't see activity immediately. AI crawlers don't re-index on a fixed schedule. Give it 2-4 weeks after opening up your &lt;code&gt;robots.txt&lt;/code&gt; before expecting consistent crawler traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I've seen happen after unblocking
&lt;/h2&gt;

&lt;p&gt;One thing I didn't expect: the effects aren't instant, but they compound. After unblocking AI crawlers on a few client sites, we noticed &lt;code&gt;OAI-SearchBot&lt;/code&gt; started hitting pages within 1-2 weeks. Actual citations in ChatGPT responses took another 2-4 weeks after that.&lt;/p&gt;

&lt;p&gt;But the interesting part was what happened to sites that stayed blocked. We ran the same queries monthly, and sites that were blocked for 6+ months essentially didn't exist in AI answers — even when their content was objectively better than what was getting cited. The crawlers had built indexing patterns around the sites that were consistently accessible, and the blocked sites had no history to draw on.&lt;/p&gt;

&lt;p&gt;It's similar to how Googlebot works — if your site has been returning 403s for months, you don't just flip a switch and rank tomorrow. There's a trust ramp.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick note: robots.txt is a request, not a wall
&lt;/h2&gt;

&lt;p&gt;Well-behaved crawlers (GPTBot, ClaudeBot, PerplexityBot) respect &lt;code&gt;robots.txt&lt;/code&gt;, but it's not a security mechanism: nothing stops a crawler from ignoring it. If you need granular control over AI training specifically, look into &lt;code&gt;X-Robots-Tag: noai, noimageai&lt;/code&gt; headers or &lt;code&gt;&amp;lt;meta name="robots" content="noai"&amp;gt;&lt;/code&gt; for page-level opt-out. Both are non-standard signals, so check whether the crawlers you care about actually honor them.&lt;/p&gt;




&lt;p&gt;I've been digging into this stuff while building &lt;a href="https://pagex.to" rel="noopener noreferrer"&gt;PageX&lt;/a&gt;, and &lt;code&gt;robots.txt&lt;/code&gt; misconfiguration is genuinely the most common issue we see — more than bad schema, more than thin content, more than any of the fancy optimization stuff. The boring infrastructure problem is usually the one that matters most.&lt;/p&gt;

&lt;p&gt;Has anyone here found surprising blocks in their robots.txt? Or noticed AI crawler activity change after opening things up? Curious what patterns others are seeing in their logs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>seo</category>
      <category>aeo</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Make Your Website Easier for ChatGPT and Perplexity to Cite</title>
      <dc:creator>Nikhil Goyal</dc:creator>
      <pubDate>Mon, 30 Mar 2026 15:07:07 +0000</pubDate>
      <link>https://dev.to/nikhilgoyal/how-to-make-your-website-easier-for-chatgpt-and-perplexity-to-cite-3hle</link>
      <guid>https://dev.to/nikhilgoyal/how-to-make-your-website-easier-for-chatgpt-and-perplexity-to-cite-3hle</guid>
      <description>&lt;p&gt;I've spent the last year building an AI visibility tool, and in the process I've had to figure out what actually makes AI search engines cite one website over another.&lt;/p&gt;

&lt;p&gt;Most of what I assumed turned out to be wrong. Here's what I've learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  The basics: AI search is a different game
&lt;/h2&gt;

&lt;p&gt;If you're already ranking well on Google, you might assume AI engines will find you too. The data says otherwise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ahrefs.com/blog/ai-search-optimization/" rel="noopener noreferrer"&gt;Ahrefs found&lt;/a&gt; that around 80% of URLs cited by ChatGPT, Perplexity, Copilot, and Google AI Mode don't rank in Google's top 100 for the original query. These engines aren't just repackaging Google results — they're making independent decisions about what to cite.&lt;/p&gt;

&lt;p&gt;What seems to matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content structure and extractability&lt;/strong&gt; — can the AI pull a clean answer from your page?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freshness&lt;/strong&gt; — &lt;a href="https://www.brightedge.com/" rel="noopener noreferrer"&gt;BrightEdge research&lt;/a&gt; shows pages updated within 60 days are 1.9x more likely to appear in AI answers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured data&lt;/strong&gt; — sites implementing schema markup and FAQ blocks saw a &lt;a href="https://www.brightedge.com/" rel="noopener noreferrer"&gt;44% increase in AI citations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth over keyword density&lt;/strong&gt; — comprehensive coverage of a topic beats keyword stuffing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional SEO signals like backlink count and domain rating still matter, but they're not sufficient on their own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I found in my server logs
&lt;/h2&gt;

&lt;p&gt;When I started checking server logs for AI crawler activity, a few things stood out.&lt;/p&gt;

&lt;p&gt;First, check whether you're even allowing AI crawlers in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://yoursite.com/robots.txt | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-iE&lt;/span&gt; &lt;span class="s2"&gt;"gptbot|claudebot|perplexitybot|anthropic|chatgpt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A surprising number of sites — including ones that want AI visibility — have blanket blocks on these bots. Sometimes it's an overzealous security plugin, sometimes it's a robots.txt that hasn't been revisited since 2023.&lt;/p&gt;

&lt;p&gt;Second, AI crawlers behave differently from Googlebot. They tend to hit fewer pages but spend more time parsing each one. They care a lot about whether the content is directly accessible in the HTML versus buried behind client-side JavaScript rendering.&lt;/p&gt;

&lt;p&gt;If your site is a heavy SPA with most content rendered client-side, AI crawlers may be seeing an empty shell.&lt;/p&gt;




&lt;h2&gt;
  
  
  The content structure that gets cited
&lt;/h2&gt;

&lt;p&gt;After analyzing which pages on client sites get cited versus which get ignored, a clear pattern emerged. It comes down to &lt;strong&gt;answer-first content architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what I mean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- This gets ignored by AI engines --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;When it comes to understanding the complexities of emergency 
plumbing services, there are many factors that homeowners should 
consider before making a decision about which provider to call...&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- This gets cited --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;h2&amp;gt;&lt;/span&gt;How much does emergency plumbing cost?&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;Emergency plumbing typically costs $150–$500 for common issues 
like burst pipes or severe leaks. After-hours calls usually add 
a $75–$150 surcharge.&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't surprising when you look at the data: &lt;a href="https://www.wix.com/seo/learn/resource/how-to-get-cited-by-llms" rel="noopener noreferrer"&gt;a Wix study&lt;/a&gt; found that 44.2% of all LLM citations come from the first 30% of a page's text. If your answer is in paragraph 7, the AI has already moved on.&lt;/p&gt;
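
&lt;p&gt;A quick way to sanity-check your own pages against that figure is to measure where the answer starts. The sketch below is an approximation: it works on plain text you have already extracted from the page, measures by character offset, and &lt;code&gt;answer_position&lt;/code&gt; plus the sample strings are hypothetical.&lt;/p&gt;

```python
from typing import Optional


def answer_position(page_text: str, answer: str) -> Optional[float]:
    """Return where `answer` starts as a fraction of the page text (0.0 to 1.0).

    Returns None when the answer is not found.
    """
    index = page_text.lower().find(answer.lower())
    if index == -1 or not page_text:
        return None
    return index / len(page_text)


# Invented page text: the same answer placed first, then buried after filler.
filler = "Plumbing is a broad topic with many factors to consider. " * 9
answer = "Emergency plumbing typically costs $150 to $500."
answer_first = answer + " " + filler
buried = filler + answer

print(answer_position(answer_first, answer))   # 0.0
print(answer_position(buried, answer) > 0.30)  # True: past the first 30%
```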

&lt;p&gt;The pattern that works best is what I've been calling "answer packs" — structured content blocks with a specific format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## [Question in natural language]&lt;/span&gt;

[Direct answer in 2-4 sentences]

&lt;span class="gs"&gt;**Key details:**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Specific fact or data point
&lt;span class="p"&gt;-&lt;/span&gt; Another relevant detail
&lt;span class="p"&gt;-&lt;/span&gt; Context that helps the reader decide

&lt;span class="ge"&gt;*Last updated: March 2026*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "last updated" line matters. AI engines have a measurable recency bias — &lt;a href="https://www.position.digital/blog/ai-seo-statistics/" rel="noopener noreferrer"&gt;one study&lt;/a&gt; found that artificially refreshing publication dates alone can shift AI ranking positions by up to 95 places.&lt;/p&gt;




&lt;h2&gt;
  
  
  Schema markup: the low-hanging fruit most devs skip
&lt;/h2&gt;

&lt;p&gt;If you're a developer reading this, structured data is probably the highest-ROI thing you can implement today. Websites with author schema are &lt;a href="https://www.brightedge.com/" rel="noopener noreferrer"&gt;3x more likely to appear in AI answers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's a minimal FAQPage implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://schema.org"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FAQPage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mainEntity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Question"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What should I do if a pipe bursts?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"acceptedAnswer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Answer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Immediately shut off the main water valve, then call an emergency plumber. While waiting, open faucets to drain remaining water and move valuables away from the affected area."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
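
&lt;p&gt;If your FAQ content lives in a CMS or a data file, you probably don't want to hand-write that JSON. Here is a minimal Python sketch that builds the same structure from question/answer pairs; &lt;code&gt;faq_jsonld&lt;/code&gt; is a hypothetical helper, and the output still needs to go inside a &lt;code&gt;&amp;lt;script type="application/ld+json"&amp;gt;&lt;/code&gt; tag in your template.&lt;/p&gt;

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


print(faq_jsonld([
    ("What should I do if a pipe bursts?",
     "Immediately shut off the main water valve, then call an emergency plumber."),
]))
```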



&lt;p&gt;And if you're running a service business or local operation, &lt;code&gt;LocalBusiness&lt;/code&gt; schema is equally important:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://schema.org"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LocalBusiness"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your Business Name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PostalAddress"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"addressLocality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Austin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"addressRegion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TX"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"telephone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"+1-512-555-0100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priceRange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$$"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"openingHoursSpecification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpeningHoursSpecification"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dayOfWeek"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Monday"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Tuesday"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Wednesday"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Thursday"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Friday"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"opens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"08:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"closes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"18:00"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The combination of semantic HTML (&lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt;, proper heading hierarchy) plus JSON-LD schema gives AI crawlers a machine-readable map of your content. Without it, they're guessing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The HTML quality checklist
&lt;/h2&gt;

&lt;p&gt;Here's the quick checklist I run on every page I want AI engines to cite:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Uses semantic HTML5 elements (&lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;main&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Proper heading hierarchy (h1 → h2 → h3, no skipped levels)&lt;/li&gt;
&lt;li&gt;[ ] Key content is in the HTML source, not only rendered via JS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Content:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Page leads with a direct answer to the primary query&lt;/li&gt;
&lt;li&gt;[ ] Specific data points (prices, timelines, specs) are in plain text, not images&lt;/li&gt;
&lt;li&gt;[ ] Content was updated within the last 60 days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Machine readability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] JSON-LD schema on the page (FAQPage, LocalBusiness, Product, HowTo — whatever fits)&lt;/li&gt;
&lt;li&gt;[ ] Author information present (name, credentials, schema)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;robots.txt&lt;/code&gt; allows GPTBot, ClaudeBot, PerplexityBot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Meta:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Clean, descriptive URLs (not &lt;code&gt;/page?id=4827&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] OpenGraph and meta description present&lt;/li&gt;
&lt;li&gt;[ ] Sitemap includes the page and is submitted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is exotic. It's mostly just good web development hygiene. But it's surprising how many production sites fail 3-4 of these checks.&lt;/p&gt;
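
&lt;p&gt;The &lt;code&gt;robots.txt&lt;/code&gt; item on the checklist is simple to script. The sketch below uses Python's standard &lt;code&gt;urllib.robotparser&lt;/code&gt; to test a &lt;code&gt;robots.txt&lt;/code&gt; body against the three crawler names from the checklist. The &lt;code&gt;blocked_crawlers&lt;/code&gt; helper and the sample file are hypothetical, and the standard-library parser's matching rules may differ from what each crawler actually implements, so read the result as a first pass.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the checklist above.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_crawlers(robots_txt, url, crawlers=AI_CRAWLERS):
    """Return the crawlers that the given robots.txt text bars from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


# Hypothetical robots.txt, used only for illustration.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_crawlers(SAMPLE, "https://example.com/pricing"))
```

&lt;p&gt;With the sample file, only GPTBot is reported as blocked, because the other two crawlers fall through to the wildcard rule. To check a live site, download its &lt;code&gt;robots.txt&lt;/code&gt; and pass the text in.&lt;/p&gt;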




&lt;h2&gt;
  
  
  Some counterintuitive findings
&lt;/h2&gt;

&lt;p&gt;A few things that went against my assumptions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question-style headings underperform.&lt;/strong&gt; I expected "How much does X cost?" headings to get cited more, but &lt;a href="https://www.position.digital/blog/ai-seo-statistics/" rel="noopener noreferrer"&gt;research from multiple sources&lt;/a&gt; shows that straightforward headings get more citations than question-format ones (an average of 4.3 citations vs. 3.4).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FAQ sections don't always help.&lt;/strong&gt; In one study, pages with dedicated FAQ sections showed slightly fewer citations than pages without them, but this likely reflects that FAQs tend to appear on simpler support pages with less depth overall. The format works; it's the content quality that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different AI engines prefer different content types.&lt;/strong&gt; ChatGPT disproportionately cites product and service pages directly. Perplexity leans toward listicles and comparison articles. Google AI Overviews pull from pages that Google already ranks highly. There's no single format that wins everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brand mentions correlate more strongly with AI visibility than backlinks.&lt;/strong&gt; &lt;a href="https://ahrefs.com/blog/ai-search-optimization/" rel="noopener noreferrer"&gt;Ahrefs data&lt;/a&gt; shows brand mention correlation at r = 0.664, higher than traditional link signals. Being talked about on Reddit, forums, and review sites seems to matter more for AI citation than having a strong backlink profile.&lt;/p&gt;




&lt;h2&gt;
  
  
  Measuring whether any of this works
&lt;/h2&gt;

&lt;p&gt;The tricky part: &lt;a href="https://upgrowth.in/ai-traffic-share-report-2026/" rel="noopener noreferrer"&gt;an estimated 25-35% of AI-influenced traffic is misattributed&lt;/a&gt; in standard analytics setups.&lt;/p&gt;

&lt;p&gt;What I do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Manual prompting&lt;/strong&gt; — every week, I run 20-30 relevant queries through ChatGPT, Perplexity, and Google AI. I note whether my pages are cited, who else is cited, and what format the cited content uses. Low-tech, high-signal.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GA4 referral sources&lt;/strong&gt; — filter for &lt;code&gt;chatgpt.com&lt;/code&gt;, &lt;code&gt;perplexity.ai&lt;/code&gt;, &lt;code&gt;gemini.google.com&lt;/code&gt;. The numbers will be small but growing fast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server logs&lt;/strong&gt; — grep for GPTBot, ClaudeBot, PerplexityBot user agents. Track which pages they're hitting and how often.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Correlation tracking&lt;/strong&gt; — watch for direct traffic spikes that line up with when your site starts appearing in AI answers. This catches the unattributed portion.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
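
&lt;p&gt;For step 2, if you export referrer URLs from your analytics or pull them from your own logs, a small helper can tag the ones that come from the three hosts listed above. This is a rough sketch: &lt;code&gt;ai_referrer&lt;/code&gt; is a hypothetical name, and the host list is only the one from this post, so extend it as needed.&lt;/p&gt;

```python
from urllib.parse import urlparse

# Referrer hosts named in the list above.
AI_REFERRER_HOSTS = ("chatgpt.com", "perplexity.ai", "gemini.google.com")


def ai_referrer(referrer_url):
    """Return the matching AI host for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for known in AI_REFERRER_HOSTS:
        # Match the host itself or any subdomain of it (for example www.perplexity.ai).
        if host == known or host.endswith("." + known):
            return known
    return None
```

&lt;p&gt;Matching on the parsed hostname rather than a plain substring avoids counting lookalike domains that merely contain one of these names.&lt;/p&gt;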

&lt;p&gt;The manual prompting step sounds tedious, but it's by far the most useful. You'll learn more in 30 minutes of querying AI engines than from any dashboard.&lt;/p&gt;
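
&lt;p&gt;The server log step can be scripted too. Here's a minimal sketch that counts hits per crawler and path. It assumes the common/combined access log format, and the log lines and user-agent strings in it are made up and simplified for illustration, so adjust the pattern to whatever your server actually writes.&lt;/p&gt;

```python
import re
from collections import Counter

# Crawler names from the list above; matched as plain substrings of the log line.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Assumes the common/combined access log format, where the request appears
# as a quoted string such as "GET /pricing HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*"')


def count_bot_hits(lines):
    """Count requests per (bot, path) for lines that mention a known AI crawler."""
    hits = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        for bot in AI_BOTS:
            if bot in line:
                hits[(bot, match.group(1))] += 1
                break
    return hits


# Made-up log lines with simplified user-agent strings, for illustration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [31/Mar/2026:11:45:34 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [31/Mar/2026:11:46:02 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [31/Mar/2026:11:47:10 +0000] "GET /faq HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [31/Mar/2026:11:48:55 +0000] "GET /faq HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for (bot, path), count in count_bot_hits(SAMPLE_LINES).most_common():
        print(bot, path, count)
```

&lt;p&gt;To run it on real data, pass an open log file instead of &lt;code&gt;SAMPLE_LINES&lt;/code&gt;. User-agent strings can be spoofed, so treat the counts as an approximation.&lt;/p&gt;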




&lt;h2&gt;
  
  
  The opportunity most people are missing
&lt;/h2&gt;

&lt;p&gt;One data point that's been stuck in my head: &lt;a href="https://upgrowth.in/ai-traffic-share-report-2026/" rel="noopener noreferrer"&gt;according to an upGrowth report&lt;/a&gt;, technology and SaaS companies already see 18-25% of their traffic from AI referrals, but local service businesses sit at just 3-7%.&lt;/p&gt;

&lt;p&gt;That gap is enormous. And it exists mostly because local businesses haven't structured their content for AI readability yet. The first-mover advantage in local AI search is wide open.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm still figuring out
&lt;/h2&gt;

&lt;p&gt;I don't want to pretend this is all solved. Here's what's still genuinely hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Volatility is real.&lt;/strong&gt; AI Overview content changes roughly 70% of the time for the same query. Only about 30% of brands remain visible in back-to-back AI responses. Consistency is hard to achieve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attribution is messy.&lt;/strong&gt; Even with careful tracking, connecting AI citations to actual conversions requires a lot of inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform fragmentation.&lt;/strong&gt; Optimizing for ChatGPT, Perplexity, and Google AI simultaneously sometimes requires conflicting approaches. There's no universal playbook yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The goalposts move.&lt;/strong&gt; AI engines update their models and citation patterns regularly. What works today might not work in 3 months.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;We've been testing all of this while building &lt;a href="https://pagex.to" rel="noopener noreferrer"&gt;PageX&lt;/a&gt;, and the biggest lesson so far is that AI systems reward clarity and structure far more than traditional SEO signals. Clean HTML, direct answers at the top of the page, fresh content, and proper schema — it's not glamorous, but it works.&lt;/p&gt;

&lt;p&gt;The other lesson: this stuff compounds. Sites that start optimizing for AI readability now are building a citation history that competitors will find hard to match later. AI engines learn which sources consistently give reliable, extractable answers, and they keep going back.&lt;/p&gt;

&lt;p&gt;Curious whether others here are seeing the same patterns in their logs, referrals, or citation data. If you've run the &lt;code&gt;robots.txt&lt;/code&gt; check and found something surprising, or if you've noticed AI referral traffic showing up in your analytics, I'd be interested to hear what you're seeing.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
