robots.txt unreachable and technical SEO debugging

"robots.txt unreachable" is not usually a content problem.

It is usually a fetch, routing, DNS, CDN, middleware, firewall, redirect, or cache problem.

That distinction matters because teams often waste time editing pages when Google is really saying:

"I could not reliably fetch the file that tells me what I am allowed to crawl."

Here is the debugging order I use.

1. Confirm the file exists at the root

Open:

https://example.com/robots.txt

It should return a plain-text response from the same public host Google crawls.

2. Check the HTTP status

Use:

curl -I https://example.com/robots.txt

You want a stable 200 OK.

Watch for:

  • 403 from bot protection
  • 404 from routing
  • 5xx from hosting or edge functions
  • long redirect chains
  • HTML being returned instead of plain text
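
One way to surface the last two, redirect chains and a wrong content type, in a single pass is to follow redirects and print the headers for every hop (example.com stands in for your own domain):

curl -sIL https://example.com/robots.txt

Each hop's status line and headers print in order, so a long redirect chain or a text/html Content-Type on the final response stands out immediately. Some servers treat HEAD differently from GET, so if the output looks odd, drop -I and inspect the body as well.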

3. Check middleware and auth rules

This is especially easy to miss in modern app routers.

Make sure these paths are not behind auth, redirects, or app-level rewrites:

/robots.txt
/sitemap.xml
/llms.txt

If your middleware protects everything by default, explicitly bypass these files.
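
A minimal outside-in sketch of that check, assuming example.com is your production host, is to request each file and print its status code. It does not inspect the middleware itself; it only confirms what the public internet actually gets back:

# Each crawl-critical file should return 200 from the public host.
for path in /robots.txt /sitemap.xml /llms.txt; do
  printf '%s -> ' "$path"
  curl -s -o /dev/null -w '%{http_code}\n' "https://example.com$path"
done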

4. Check CDN and bot rules

A site can work perfectly in your browser and still fail for Googlebot-like requests.

Look for:

  • managed challenge pages
  • country-level blocking
  • user-agent blocks
  • rate-limit rules
  • WAF rules applied to static text files
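
A rough signal, not proof, since Google verifies real Googlebot by IP rather than by user agent, is to compare a plain request against one that sends a Googlebot-style user-agent string:

curl -sI -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/robots.txt

If this request gets a 403 or a challenge page while a plain curl succeeds, a user-agent or bot-protection rule is the likely culprit.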

5. Do not overcomplicate robots.txt

For many public sites, simple is safer:

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

Complex rules create more places for accidental blocking; a single stray Disallow: / is enough to block the entire site.

6. Retest after the fix

After deployment, retest the live file and then use Google Search Console's robots.txt report or URL Inspection again.
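
A quick scripted check of the live file, again using example.com as a placeholder, prints the status code and content type on one line:

curl -s -o /dev/null -w '%{http_code} %{content_type}\n' https://example.com/robots.txt

You are looking for 200 and a text/plain content type.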

If the issue was temporary, Search Console may need time to refresh its cached state.

Why this matters

If Google cannot fetch robots.txt, it may pause crawling because it cannot confirm crawl permissions.

That can make indexing problems look like content problems, even when the real issue is infrastructure.

I wrote a fuller breakdown here:

https://visrank.org/blog/why-google-search-console-says-robots-txt-unreachable

And if you want a broader launch checklist for crawlability, canonicals, schema, speed, security, and mobile basics:

https://visrank.org/blog/technical-seo-checklist-2026

The short version:

Before rewriting content, prove Google can fetch the boring files.

Those boring files decide whether the rest of the site can even enter the conversation.
