Quick Summary: If Google Search Console (GSC) reports a cryptic Could not fetch error on a technically valid XML sitemap, do not rush to rewrite the XML. In this case, I controlled variables across Cloudflare Pages, Vercel, Cloudflare Workers/OpenNext, and a domain change, and the results pointed to a production-domain or GSC-state anomaly. The practical SEO fallback was to add a crawlable HTML sitemap.
I recently spent a lot of time debugging a sitemap problem that looked simple at first.
Google Search Console kept saying:
Could not fetch
The strange part was that the sitemap worked everywhere else.
It returned 200 OK. It had the correct application/xml content type. The XML was valid. robots.txt pointed to it. Browser access worked. Command-line checks worked.
But Google Search Console still refused to read it.
This post is a write-up of the debugging process: what I checked, what I ruled out, why the hosting platform was probably not the final root cause, and why I eventually added an HTML sitemap as a crawl-discovery fallback.
The Site Setup
The project is a Next.js tool site for browser-side video and image processing.
It has:
- mostly static SEO/tool pages
- localized routes
- no login
- no database
- no payment system
- client-side media processing
- some heavier runtime assets served through the Cloudflare ecosystem
The production deployment was on Cloudflare Pages.
The XML sitemap was generated by the app and included localized URLs across supported languages.
At a high level, this should have been a boring sitemap setup.
It was not.
The Initial Symptoms
In Google Search Console, both the dynamic XML sitemap and a static XML sitemap showed the same failure:
Could not fetch
The discovered page count stayed at zero.
The frustrating part was that direct HTTP checks looked normal:
/sitemap.xml -> 200 application/xml
/sitemap-static.xml -> 200 application/xml
/robots.txt -> 200 text/plain
Both XML files passed validation.
The static sitemap was especially important. It was a plain file under public/, not a Next.js metadata route. If both the generated sitemap and a static XML file failed in GSC, then the issue was probably not limited to the Next.js sitemap route.
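A minimal sketch of how the direct HTTP checks above could be scripted, assuming only the Python standard library. The function names and the sitemap-check/0.1 user agent are illustrative, not part of the original setup.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Media types commonly accepted for XML sitemaps.
XML_TYPES = {"application/xml", "text/xml"}


def check_response(status: int, content_type: str, expect_xml: bool = True) -> list[str]:
    """Return a list of problems; an empty list means the response looks normal."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if expect_xml and media_type not in XML_TYPES:
        problems.append(f"unexpected content type {media_type!r}")
    return problems


def fetch_and_check(url: str, expect_xml: bool = True) -> list[str]:
    """Fetch a URL and run check_response on it (needs network access)."""
    request = Request(url, headers={"User-Agent": "sitemap-check/0.1"})  # hypothetical UA
    try:
        with urlopen(request, timeout=10) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
    except HTTPError as error:
        # urlopen raises on 4xx/5xx; the error still carries status and headers.
        status = error.code
        content_type = error.headers.get("Content-Type", "")
    return check_response(status, content_type, expect_xml)
```

For robots.txt, the same helper can be called with expect_xml=False so that text/plain is not flagged.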
Checking robots.txt
The robots.txt file was simple:
User-Agent: *
Allow: /
Sitemap: /sitemap.xml
There was no disallow rule blocking the sitemap or the main pages.
So the obvious robots explanation did not fit.
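One detail that can be checked mechanically: the sitemaps.org protocol defines the Sitemap directive with a full URL, so a relative value like the one shown above may be ignored by some parsers. The sketch below extracts Sitemap lines from a robots.txt body and flags any that are not absolute URLs. The helper names and the example.com host are assumptions for illustration.

```python
from urllib.parse import urlparse


def sitemap_directives(robots_txt: str) -> list[str]:
    """Return the values of all Sitemap: lines in a robots.txt body."""
    values = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            values.append(value.strip())
    return values


def is_absolute_url(value: str) -> bool:
    """True when the value has an http(s) scheme and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


sample = (
    "User-Agent: *\n"
    "Allow: /\n"
    "Sitemap: /sitemap.xml\n"
    "Sitemap: https://example.com/sitemap.xml\n"  # hypothetical host
)
for value in sitemap_directives(sample):
    print(value, "absolute" if is_absolute_url(value) else "relative")
```

A relative value is not necessarily the cause of a GSC fetch failure, especially when the sitemap is also submitted directly in GSC, but it is a cheap variable to eliminate.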
Checking Cloudflare DNS and Custom Domains
Because the site was on Cloudflare Pages, I spent a lot of time checking Cloudflare configuration.
The production domain and www domain were active in Cloudflare Pages. SSL was enabled. The DNS records pointed to the Pages deployment and were proxied through Cloudflare.
I also checked old verification records, email records, custom domain status, and the basic DNS setup.
Nothing obvious looked broken.
The domain resolved. The site loaded. The sitemap returned 200. The custom domains were active.
So the basic Cloudflare Pages domain setup did not explain why GSC could not fetch the sitemap.
Checking Cloudflare Bot and Security Rules
The next suspicion was that Googlebot might be getting challenged or blocked by Cloudflare.
I checked Cloudflare AI Crawl Control and security events.
Googlebot was recognized as a search engine crawler. The relevant controls were not blocking it.
Cloudflare Security Events also showed requests to the sitemap path from verified crawlers, including Google-related user agents.
There was a custom rule for verified bots:
cf.client.bot
The rule skipped security products for verified bots.
The important part: I did not find evidence of:
- WAF block
- Managed Challenge
- JS Challenge
- Interactive Challenge
- Googlebot being denied access
Googlebot-style requests could receive 200 responses.
That made a simple Cloudflare security-block explanation unlikely.
Checking HTTP Details
I also checked the sitemap response in several ways.
The sitemap returned:
200
application/xml
I checked normal requests, Googlebot-style user agents, HTTP/1.1 behavior, compressed responses, and XML validation.
The XML remained valid.
The dynamic sitemap had Next.js route headers, as expected. But the static sitemap also failed in GSC, which again suggested that the issue was not just the Next.js metadata route.
At this stage, the sitemap looked technically valid from the outside.
Google Search Console still disagreed.
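The XML validation step can be approximated locally. This is a minimal sketch, assuming a urlset-style sitemap (not a sitemap index) and using only the standard library; it checks well-formedness, the root element, and that every loc is an absolute URL. It is not a full schema validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_problems(xml_bytes: bytes) -> list[str]:
    """Return a list of problems found in a urlset sitemap body."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element {root.tag!r}")

    # Collect every <loc> value in the sitemap namespace.
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    if not locs:
        problems.append("no <loc> entries found")
    for loc in locs:
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"<loc> is not an absolute URL: {loc}")
    return problems
```

Running the same function on the bodies returned for different user agents or compression settings is one way to confirm that every variant serves the same valid XML.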
Fixing Build and Deployment Noise
During the investigation, I also found unrelated build noise.
The Cloudflare build could fail because next/font/google tried to fetch fonts during the build. That was not directly the sitemap bug, but it made deployment verification less stable.
I removed the Google Fonts dependency and switched to a system font stack.
After that, both the normal Next build and the Cloudflare build completed successfully.
This mattered because I needed a stable deployment baseline before blaming Google, Cloudflare, or the domain.
The Vercel Diagnostic Test
At one point, I deployed the same project to Vercel as a diagnostic comparison.
The goal was not to move production to Vercel.
The goal was to answer a narrower question:
Is the sitemap XML/project output itself fundamentally broken?
The deployment itself worked.
But the key detail is this: when the production domain was used, Google Search Console still could not fetch the sitemap.
That meant simply changing the hosting platform was probably not enough.
Later, I tested the same project with a different temporary domain. Under that different domain, Google Search Console could fetch the sitemap successfully.
That changed the interpretation.
The issue was probably not just:
Cloudflare Pages vs Vercel
The stronger signal was:
The failure was likely tied to the production domain or Google/GSC state associated with that domain.
This distinction matters. Without it, it is easy to draw the wrong conclusion and think that a hosting migration alone would fix everything.
Here is the simplified control-variable table:
| Deployment path | Domain used | GSC status | What it suggested |
|---|---|---|---|
| Cloudflare Pages | Production domain | Failed: Could not fetch | Not explained by basic Cloudflare DNS, SSL, WAF, or robots settings |
| Vercel diagnostic deployment | Production domain | Failed: Could not fetch | Moving the same project to Vercel did not fix the production-domain problem |
| Same project on a different temporary domain | Temporary domain | Success | The sitemap output was likely valid; the production domain or GSC state became the stronger suspect |
| Cloudflare Workers + OpenNext | Production domain | Failed: Could not fetch | Swapping backend infrastructure did not fix the production-domain problem |
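The reasoning in the table can be written down as a small tally. This sketch transcribes the four observations and groups the GSC outcome by domain; the hosting path for the temporary-domain test is not stated in the table, so it is labeled as unspecified here.

```python
from collections import defaultdict

# (hosting path, domain, did GSC fetch the sitemap?), transcribed from the table.
OBSERVATIONS = [
    ("Cloudflare Pages", "production", False),
    ("Vercel", "production", False),
    ("not specified", "temporary", True),  # hosting not stated in the table
    ("Cloudflare Workers + OpenNext", "production", False),
]


def outcomes_by_domain(observations):
    """Map each domain to the set of GSC outcomes observed on it."""
    grouped = defaultdict(set)
    for _hosting, domain, fetched in observations:
        grouped[domain].add(fetched)
    return dict(grouped)


def hosting_paths_tried(observations, domain):
    """List the distinct hosting paths tested on a given domain."""
    return sorted({hosting for hosting, d, _ in observations if d == domain})


print(outcomes_by_domain(OBSERVATIONS))
print(hosting_paths_tried(OBSERVATIONS, "production"))
```

Three hosting paths on the production domain all produce the same failure, while the only success is on a different domain. That is why the domain, not the host, becomes the stronger suspect.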
Trying Cloudflare Workers and OpenNext
Because the old Cloudflare Pages build chain used @cloudflare/next-on-pages, and that adapter is deprecated, I also tested a Workers/OpenNext path.
This was not a casual check. I actually went through the deployment path:
- added OpenNext/Workers configuration
- configured wrangler
- tested a Workers custom domain
- confirmed the Worker was serving traffic
- saw the x-opennext response header
- tested the homepage
- tested robots.txt
- tested the sitemap
- tested the runtime routes needed by the app
At first, the Worker test domain worked.
Then I tried switching the production domain from Pages to Workers.
That required removing the production custom domains from Pages and adding them to the Worker, because Cloudflare would not allow the same hostname to be managed by both at once.
After the switch, the production domain did serve through Workers/OpenNext.
The key routes still returned valid responses:
/ -> 200
/sitemap.xml -> 200 application/xml
/robots.txt -> 200 text/plain
The response headers confirmed traffic was going through OpenNext Workers.
Then I submitted the sitemap again in Google Search Console.
It still failed.
I also tried a cache-busting sitemap URL with a query string.
That URL returned valid XML outside GSC.
GSC still said it could not fetch it.
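For reference, a cache-busting variant of a sitemap URL can be built as below. This is a minimal sketch; the parameter name v and the token value are arbitrary choices for illustration, not the values used in the original test.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, token: str, param: str = "v") -> str:
    """Append a query parameter so the URL looks new to any intermediate cache."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)  # keep existing params
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_cache_buster("https://example.com/sitemap.xml", "20240101"))
```

If a fresh URL fails in the same way, a stale cached response for the original URL becomes a less likely explanation.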
That was an important result.
It meant the original hypothesis, that this was simply a Cloudflare Pages or next-on-pages problem, was not supported.
The same production domain still had the issue even after moving the delivery path to Workers/OpenNext.
The Strongest Conclusion
After all of these tests, the most likely problem area was not the XML file itself.
It was also not clearly one hosting provider.
The strongest clue was the domain test:
- same project
- same kind of sitemap
- different domain
- GSC could fetch it
That points toward a domain-level or Google-side state issue.
Possible explanations include:
- historical crawl state for the production domain
- Google Search Console state for the domain property
- DNS or routing history associated with the domain
- Google-side host classification or cache
I cannot prove exactly which one it is.
But the evidence pointed away from endlessly rewriting the sitemap XML.
The Practical Problem
Even if the XML sitemap is valid, it is not very helpful if Google Search Console refuses to process it.
The site still needs its important pages discovered.
For a multilingual tools site, that matters.
So I stopped treating the XML sitemap as the only discovery mechanism.
I added an HTML sitemap.
Why an HTML Sitemap Helps
An HTML sitemap is just a normal page with internal links.
Googlebot can crawl it like any other page.
That gives the site another discovery path:
normal page -> footer link -> HTML sitemap -> localized tool pages
This does not fix the XML sitemap failure directly.
It reduces the risk of relying on only one discovery mechanism.
That was the practical goal.
How I Designed the HTML Sitemap
I kept the page intentionally simple.
The HTML sitemap:
- returns plain HTML
- is linked from the footer
- uses index, follow
- has a canonical URL
- groups links by language
- lists only the core tool pages
- excludes privacy policy and terms pages
- avoids duplicate homepage entries
The page is not trying to be a fancy user interface.
It is a reliable crawl hub.
What the HTML Sitemap Contains
The site has multiple languages and a fixed set of core tools.
The HTML sitemap lists the localized version of each core tool page.
In the current setup:
9 languages x 11 core tools = 99 tool links
For the homepage converter, the link is represented by each language's localized tool title, instead of repeating a generic brand link.
That keeps the sitemap focused and avoids unnecessary duplicates.
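The structure described above can be sketched as a small generator. Everything concrete here is a placeholder: the locale codes, the tool slugs, the /lang/slug URL pattern, and the title lookup are assumptions for illustration, not the site's real routes. The empty slug stands for the homepage converter, which gets a localized title instead of a generic brand link.

```python
import re
from html import escape

# Hypothetical values: 9 locale codes and 11 core tools ("" = homepage converter).
LANGUAGES = ["en", "es", "fr", "de", "pt", "ja", "ko", "zh", "it"]
TOOLS = [""] + [f"tool-{n}" for n in range(1, 11)]


def tool_path(lang: str, slug: str) -> str:
    """Assumed URL pattern: /<lang> for the homepage tool, /<lang>/<slug> otherwise."""
    return f"/{lang}/{slug}" if slug else f"/{lang}"


def render_html_sitemap(base_url, languages, tools, title_for) -> str:
    """Render one section per language, each with one plain link per core tool."""
    sections = []
    for lang in languages:
        items = "\n".join(
            f'    <li><a href="{escape(base_url + tool_path(lang, slug))}">'
            f"{escape(title_for(lang, slug))}</a></li>"
            for slug in tools
        )
        sections.append(
            f'<section lang="{escape(lang)}">\n'
            f"  <h2>{escape(lang)}</h2>\n  <ul>\n{items}\n  </ul>\n</section>"
        )
    return "\n".join(sections)


# Placeholder title lookup; a real page would use the localized tool titles.
html = render_html_sitemap(
    "https://example.com", LANGUAGES, TOOLS,
    lambda lang, slug: f"{slug or 'converter'} ({lang})",
)
hrefs = re.findall(r'<a href="([^"]+)"', html)
print(len(hrefs), len(set(hrefs)))
```

Counting unique href values is a quick guard against the duplicate homepage entries the design is meant to avoid.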
Why I Did Not Just Move Hosting
Moving hosting would have been the wrong lesson.
The production domain still failed even when the project was served through a different deployment path.
A different temporary domain worked.
That means the domain/GSC state was the more important signal.
Also, the product uses browser-side media processing and heavier runtime assets that fit well with the existing Cloudflare setup.
So the better solution was not to migrate everything just because one diagnostic deployment behaved differently.
The better solution was to add a second crawl-discovery path.
What I Learned
Sitemap debugging is not always about the sitemap file.
Sometimes:
- the XML is valid
- the headers are correct
- the route is public
- bots are not blocked
- multiple hosting paths work technically
and Google Search Console still reports a fetch failure.
At that point, adding another discovery mechanism can be more useful than continuing to tweak a valid XML file.
For this case, the final strategy was:
- keep the XML sitemap
- keep monitoring GSC
- add an HTML sitemap
- link it from the footer
- make the important localized pages discoverable through ordinary internal links
Final Setup
The live site discussed in this post is:
- Main site: https://videosnap.cc/
- HTML sitemap fallback: https://videosnap.cc/html-sitemap
The HTML sitemap is not a replacement for the XML sitemap.
It is a crawl-discovery fallback.
And in this case, that was the most practical solution.