Here's a sobering stat: 78% of websites are completely invisible to AI search platforms. Not because their content is bad — because they're technically blocked, poorly structured, or missing the signals AI systems look for.
I've been reverse-engineering how ChatGPT, Claude, and Gemini discover and evaluate websites. Based on that research, here's a 15-minute audit you can run right now to find out if your site is visible to AI — and what to fix if it's not.
Minute 1-3: Check Your robots.txt
This is where most sites fail before the game even starts.
Open yoursite.com/robots.txt and look for these user agents:
GPTBot → ChatGPT Search
ClaudeBot → Claude (Anthropic)
PerplexityBot → Perplexity AI
Google-Extended → Gemini
Bytespider → TikTok AI
If you see Disallow: / next to any of these, that AI platform cannot see your site at all. Many CMS platforms and security plugins block AI crawlers by default.
The fix (2 minutes):
# Allow AI crawlers
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
Add these lines to your robots.txt. This alone can take you from invisible to discoverable.
Important nuance: Allowing these crawlers means they can read your content. If you have privacy concerns about AI training, you can allow GPTBot (which powers ChatGPT Search) while blocking CCBot (which is used for training data). They're separate.
Minute 3-5: Test Your Bing Index
ChatGPT uses Bing as its search backend. If your site isn't indexed in Bing, ChatGPT literally cannot find you — no matter how good your Google rankings are.
Quick test: Go to bing.com and search site:yoursite.com. If you see zero results, you have a problem.
The fix:
- Create a Bing Webmaster Tools account (free)
- Submit your sitemap
- Request indexing for your key pages
Many developers focus exclusively on Google Search Console and forget that Bing is the gateway to ChatGPT visibility. This is a common blind spot.
Minute 5-8: Evaluate Content Extractability
AI platforms don't read your page like a human. They parse HTML structure and extract specific answers. Let's test how well your content works for this.
Open your most important page and ask: Can I extract a specific, quotable answer from the first 300 words?
<!-- LOW extractability (common on corporate sites) -->
<p>We are a leading provider of innovative solutions that help
businesses transform their digital presence. Our team of experts
brings decades of experience to every project.</p>
<!-- HIGH extractability -->
<p>Our deployment platform reduces CI/CD pipeline time by 73%,
from an average of 12 minutes to 3.2 minutes. Over 4,200 teams
use it to ship code to production, handling 890,000 deployments
per month.</p>
AI platforms need concrete facts to cite. If your content is full of vague marketing language, you'll get consulted (the AI reads you) but never cited (the AI doesn't reference you).
What to look for:
- Specific numbers and metrics
- Clear definitions and explanations
- Comparison data
- Step-by-step instructions with concrete outcomes
- Original data or research findings
Minute 8-10: Check Schema.org Markup
Run your URL through Google's Rich Results Test (search.google.com/test/rich-results) or check your page source for <script type="application/ld+json">.
Our research shows sites with Schema.org markup receive 30-40% more AI citations. The most impactful Schema types for AI visibility:
| Schema Type | Best For | Citation Impact |
|---|---|---|
TechArticle |
Developer content | High |
HowTo |
Tutorials, guides | Very High |
FAQPage |
Q&A content | Very High |
SoftwareApplication |
Tool/product pages | High |
Article |
Blog posts, news | Medium |
Dataset |
Research, studies | Very High |
If you have zero structured data, adding FAQPage schema to your key pages is the single fastest win. AI platforms parse FAQ schema to directly answer questions, and they cite the source.
Quick implementation:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "How do I optimize my site for AI search?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Start by allowing AI crawlers in robots.txt, add Schema.org markup, ensure your content contains specific extractable claims, and submit your site to Bing Webmaster Tools."
}
}]
}
</script>
Minute 10-12: Review Your llms.txt
This is the newest standard, and most sites don't have it yet — which means implementing it puts you ahead.
llms.txt is a file at your site root (like robots.txt) that tells AI systems what your site is about and where to find your best content. It's designed specifically for AI crawlers.
Check: Does yoursite.com/llms.txt return a 404? If yes, you're missing an easy win.
Template:
# Your Site Name
> One-line description of what you do.
## What We Cover
Brief description of your expertise and content focus.
## Key Resources
- Documentation: https://yoursite.com/docs
- Blog: https://yoursite.com/blog
- API Reference: https://yoursite.com/api
## Popular Content
- Guide to X: https://yoursite.com/guide-x
- Tutorial on Y: https://yoursite.com/tutorial-y
Think of it as an executive summary for AI. When a crawler visits your site, this file tells it exactly what's important and where to look.
Minute 12-14: Check Page Speed for AI Crawlers
AI crawlers have timeouts. If your page takes more than 5 seconds to return meaningful content, the crawler may give up. This is different from user-facing Core Web Vitals — AI crawlers care about time-to-first-byte and HTML response completeness.
The problems:
- Client-side rendering (React/Vue SPAs) → crawler sees an empty
<div id="root"> - Heavy JavaScript dependencies → content loads after JS execution
- Slow API calls blocking page render
Quick test: Use curl -s yoursite.com | head -50 and check if you see actual content or just a JavaScript shell. If it's a JS shell, AI crawlers probably can't see your content.
Fixes by priority:
- Server-side rendering (SSR) or static generation (SSG)
- Pre-rendering for bot user agents
- Ensure critical content is in the initial HTML response
Minute 14-15: The AI Citation Test
The ultimate test: ask AI about your topic and see if you appear.
Open ChatGPT, Claude, and Gemini. Ask a question that your content should answer. Look for:
- Are you cited as a source?
- Are your competitors cited instead?
- Is your content summarized without citation (consulted but not cited)?
- Are you completely absent?
This gives you a baseline. If you're absent from all three platforms, work through the fixes above. If you're consulted but not cited, focus on extractability and Schema markup.
The Scorecard
Here's a quick scoring system:
| Check | Points |
|---|---|
| robots.txt allows AI crawlers | +20 |
| Indexed in Bing | +15 |
| Schema.org markup present | +20 |
| Content has extractable claims | +15 |
| llms.txt exists | +10 |
| SSR or pre-rendered | +10 |
| Cited by at least 1 AI platform | +10 |
- 80-100: Excellent AI visibility
- 50-79: Good foundation, room to improve
- 20-49: Significant gaps, needs work
- 0-19: Essentially invisible to AI
Automate the Audit
If you want to go deeper than this manual checklist, AI Query Revealer includes an AI SEO Technical Scanner that automates most of these checks. It audits your robots.txt, Schema.org markup, HTML structure, meta tags, and crawler accessibility in about 15 seconds and generates a GEO Score from 0 to 100.
But honestly, this manual checklist covers the fundamentals. Start here, fix the gaps, and you'll already be ahead of 78% of websites.
Run the audit on your site and drop your score in the comments. Curious to see where everyone lands — and what the most common blocking issue is.
Top comments (0)