With the rise of AI crawling, many websites now block AI bots in their robots.txt. Here's how to check.
## The Quick Way
Fetch any site's robots.txt and look for these user-agents:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /
```
If you see `Disallow: /` under any of these user-agents, that bot is blocked from the entire site.
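The check above can be scripted with Python's stdlib `urllib.robotparser`. This is a minimal sketch: the sample rules are hard-coded for illustration, and in practice you would fetch `https://<domain>/robots.txt` first.

```python
from urllib.robotparser import RobotFileParser

# AI crawler user-agents to check (same list as above)
AI_BOTS = ["GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "ClaudeBot"]

# Illustrative robots.txt content; replace with a fetched file in real use
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

for bot in AI_BOTS:
    # can_fetch() returns False when the bot is disallowed from the path
    blocked = not rp.can_fetch(bot, "/")
    print(f"{bot}: {'blocked' if blocked else 'allowed'}")
```

Note that `robotparser` matches user-agent tokens by substring, which is close to (but not identical to) how real crawlers interpret the file.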
## Examples
- NYTimes — blocks GPTBot, CCBot ✅
- OpenAI.com — does NOT block AI bots (interesting!) ❌
- Google.com — selective blocking
## Automated Check
I built a Robots.txt Analyzer that automatically detects AI bot blocking:
```json
{
  "domain": "nytimes.com",
  "aiBotsBlocked": true,
  "blockedBots": {
    "GPTBot": true,
    "ClaudeBot": false,
    "CCBot": true,
    "Googlebot": false
  }
}
```
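A report in this shape can be assembled from raw robots.txt text. The sketch below is mine, not the actor's code; the `analyze` helper and sample data are illustrative assumptions.

```python
import json
from urllib.robotparser import RobotFileParser


def analyze(domain: str, robots_txt: str, bots: list[str]) -> dict:
    """Build a report dict (illustrative shape) from robots.txt text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # A bot counts as blocked if it cannot fetch the site root
    blocked = {bot: not rp.can_fetch(bot, "/") for bot in bots}
    return {
        "domain": domain,
        "aiBotsBlocked": any(blocked.values()),
        "blockedBots": blocked,
    }


# Sample rules standing in for a fetched nytimes.com robots.txt
sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: CCBot\nDisallow: /\n"
report = analyze("nytimes.com", sample, ["GPTBot", "ClaudeBot", "CCBot", "Googlebot"])
print(json.dumps(report, indent=2))
```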
## Why This Matters
- For AI companies: see which sites ask crawlers to stay out (note that robots.txt is a request, not a legal barrier)
- For publishers: verify your robots.txt actually blocks unwanted crawlers
- For researchers: study the AI crawling landscape
Free tool on Apify Store: search knotless_cadence robots-txt.

---

*More tools:* 60+ free scrapers | Reports | MCP Servers
Need data scraped or market research done? I offer web scraping ($20), market research reports ($20), and custom automation ($50+). 77 production scrapers. Hire me → or email Spinov001@gmail.com
Order custom data via Payoneer ($20)
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs
Also: Neon Free Postgres | Vercel Free API | Hetzner 4x More Server
Need data from the web without writing scrapers? Check my **Apify actors**: ready-made scrapers for HN, Reddit, LinkedIn, and 75+ more sites. Or email me: spinov001@gmail.com