William Wang
5 robots.txt Mistakes That Make Your Site Invisible to AI Search

AI search engines like ChatGPT, Perplexity, and Claude now answer millions of queries daily. But they can only cite your site if their crawlers can actually reach it.

I've scanned thousands of sites, and these are the 5 most common robots.txt mistakes I see:

1. Blanket Disallow for unknown user agents

Many sites default to blocking all unrecognized bots. Problem: GPTBot, ClaudeBot, and PerplexityBot are relatively new. If your robots.txt has a catch-all block, you're invisible to AI search.
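The pattern usually looks something like this (the allowed bots are illustrative): a few known crawlers are whitelisted, and a catch-all rule shuts out everyone else, including every AI crawler.

```
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Catch-all: any bot not listed above, including GPTBot,
# ClaudeBot, and PerplexityBot, is blocked entirely
User-agent: *
Disallow: /
```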

2. Blocking /api/ paths that contain public content

Some sites serve blog content through API routes. Blocking /api/ might seem like good security practice, but if your content lives there, AI crawlers can't see it.
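If moving the content isn't an option, you can carve out the public routes while keeping the rest of /api/ blocked. A sketch, assuming a hypothetical /api/posts/ route that serves public articles — under the robots exclusion standard (RFC 9309), the most specific (longest) matching rule wins, so the Allow overrides the broader Disallow:

```
User-agent: *
# Keep private API endpoints blocked
Disallow: /api/
# But expose the route that serves public content
Allow: /api/posts/
```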

3. No explicit Allow for AI crawlers

Even if you don't block them, explicitly allowing AI bots signals that you welcome their indexing. Add these to your robots.txt:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

4. Aggressive Crawl-delay values

Setting Crawl-delay: 30 tells a bot to wait 30 seconds between requests. For a large site, that throttles crawling so severely that AI crawlers may index only a fraction of your content before exhausting their crawl budget.
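Note that Crawl-delay isn't part of the robots exclusion standard (RFC 9309), and support varies by crawler. If you use it at all, keep the value small:

```
User-agent: *
# 30 seconds per request caps a crawler at roughly 2,880 pages a day
# Crawl-delay: 30

# A small delay (or none at all) lets crawlers cover the whole site
Crawl-delay: 2
```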

5. Forgetting to update after CMS migrations

Moved from WordPress to Next.js? Your old robots.txt rules might still be blocking paths that no longer exist — while failing to protect paths that should be private.
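A carried-over file often looks something like this after the migration (paths are illustrative):

```
User-agent: *
# Leftover WordPress rules: these paths no longer exist on the new stack
Disallow: /wp-admin/
Disallow: /wp-includes/

# Meanwhile, the new stack's private routes (e.g. /api/admin/)
# have no rules at all and are fully crawlable
```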

How to check

You can manually review your robots.txt, or run a free scan at geoscoreai.com — it checks all 9 AI search readiness signals including crawler access, structured data, and content structure. Takes about 60 seconds.
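For a quick local check, Python's standard-library urllib.robotparser can tell you whether a given user agent may fetch a URL. A minimal sketch using an inline robots.txt — in practice, point it at your live file with set_url() and read():

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks everyone except GPTBot
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

def can_crawl(agent: str, url: str) -> bool:
    """Return True if `agent` is allowed to fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(can_crawl("GPTBot", "https://example.com/blog/post"))        # True
print(can_crawl("PerplexityBot", "https://example.com/blog/post")) # False
```

Run it against each AI user agent you care about to confirm none of them are caught by a catch-all block.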


What robots.txt mistakes have you encountered? Drop them in the comments.
