robots.txt unreachable and technical SEO debugging

"robots.txt unreachable" is not usually a content problem.

It is usually a fetch, routing, DNS, CDN, middleware, firewall, redirect, or cache problem.

That distinction matters because teams often waste time editing pages when Google is really saying:

"I could not reliably fetch the file that tells me what I am allowed to crawl."

Here is the debugging order I use.

1. Confirm the file exists at the root

Open:

https://example.com/robots.txt

It should return a plain-text response from the same public host Google crawls.

2. Check the HTTP status

Use:

curl -I https://example.com/robots.txt

You want a stable 200 OK.

Watch for:

  • 403 from bot protection
  • 404 from routing
  • 5xx from hosting or edge functions
  • long redirect chains
  • HTML being returned instead of plain text
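
One way to surface the last two, redirect chains and a wrong content type, in a single pass is to follow redirects and print the headers for every hop (example.com stands in for your own domain):

curl -sIL https://example.com/robots.txt

Each hop's status line and headers print in order, so a long redirect chain or a text/html Content-Type on the final response stands out immediately. Some servers treat HEAD differently from GET, so if the output looks odd, drop -I and inspect the body as well.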

3. Check middleware and auth rules

This is especially easy to miss in modern app routers.

Make sure these paths are not behind auth, redirects, or app-level rewrites:

/robots.txt
/sitemap.xml
/llms.txt

If your middleware protects everything by default, explicitly bypass these files.
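
A minimal outside-in sketch of that check, assuming example.com is your production host, is to request each file and print its status code. It does not inspect the middleware itself; it only confirms what the public internet actually gets back:

# Each crawl-critical file should return 200 from the public host.
for path in /robots.txt /sitemap.xml /llms.txt; do
  printf '%s -> ' "$path"
  curl -s -o /dev/null -w '%{http_code}\n' "https://example.com$path"
done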

4. Check CDN and bot rules

A site can work perfectly in your browser and still fail for Googlebot-like requests.

Look for:

  • managed challenge pages
  • country-level blocking
  • user-agent blocks
  • rate-limit rules
  • WAF rules applied to static text files
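
A rough signal, not proof, since Google verifies real Googlebot by IP rather than by user agent, is to compare a plain request against one that sends a Googlebot-style user-agent string:

curl -sI -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/robots.txt

If this request gets a 403 or a challenge page while a plain curl succeeds, a user-agent or bot-protection rule is the likely culprit.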

5. Do not overcomplicate robots.txt

For many public sites, simple is safer:

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

Complex rules create more places for accidental blocking; a single stray Disallow: / is enough to block the entire site.

6. Retest after the fix

After deployment, retest the live file and then use Google Search Console's robots.txt report or URL Inspection again.
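
A quick scripted check of the live file, again using example.com as a placeholder, prints the status code and content type on one line:

curl -s -o /dev/null -w '%{http_code} %{content_type}\n' https://example.com/robots.txt

You are looking for 200 and a text/plain content type.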

If the issue was temporary, Search Console may need time to refresh its cached state.

Why this matters

If Google cannot fetch robots.txt, it may pause crawling because it cannot confirm crawl permissions.

That can make indexing problems look like content problems, even when the real issue is infrastructure.

I wrote a fuller breakdown here:

https://visrank.org/blog/why-google-search-console-says-robots-txt-unreachable

And if you want a broader launch checklist for crawlability, canonicals, schema, speed, security, and mobile basics:

https://visrank.org/blog/technical-seo-checklist-2026

The short version:

Before rewriting content, prove Google can fetch the boring files.

Those boring files decide whether the rest of the site can even enter the conversation.
