<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bob</title>
    <description>The latest articles on DEV Community by Bob (@jsxyzb).</description>
    <link>https://dev.to/jsxyzb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869111%2Feac87454-3c4f-4595-922c-669cd0221fbc.png</url>
      <title>DEV Community: Bob</title>
      <link>https://dev.to/jsxyzb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jsxyzb"/>
    <language>en</language>
    <item>
      <title>When Google Search Console Couldn’t Fetch My Sitemap, I Added an HTML Sitemap Fallback</title>
      <dc:creator>Bob</dc:creator>
      <pubDate>Wed, 03 Jun 2026 09:35:08 +0000</pubDate>
      <link>https://dev.to/jsxyzb/when-google-search-console-couldnt-fetch-my-sitemap-i-added-an-html-sitemap-fallback-2d0l</link>
      <guid>https://dev.to/jsxyzb/when-google-search-console-couldnt-fetch-my-sitemap-i-added-an-html-sitemap-fallback-2d0l</guid>
      <description>&lt;p&gt;&lt;strong&gt;Quick Summary:&lt;/strong&gt; If GSC reports a cryptic &lt;code&gt;Could not fetch&lt;/code&gt; error on a technically valid XML sitemap, do not rush to rewrite the XML. In this case, I varied the hosting (Cloudflare Pages, Vercel, Cloudflare Workers/OpenNext) and the domain one at a time, and the results pointed to an anomaly tied to the production domain or its GSC state. The practical SEO fallback was to add a crawlable HTML sitemap.&lt;/p&gt;

&lt;p&gt;I recently spent a lot of time debugging a sitemap problem that looked simple at first.&lt;/p&gt;

&lt;p&gt;Google Search Console kept saying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Could not fetch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The strange part was that the sitemap worked everywhere else.&lt;/p&gt;

&lt;p&gt;It returned &lt;code&gt;200 OK&lt;/code&gt;. It had the correct &lt;code&gt;application/xml&lt;/code&gt; content type. The XML was valid. &lt;code&gt;robots.txt&lt;/code&gt; pointed to it. Browser access worked. Command-line checks worked.&lt;/p&gt;

&lt;p&gt;But Google Search Console still refused to read it.&lt;/p&gt;

&lt;p&gt;This post is a write-up of the debugging process: what I checked, what I ruled out, why the hosting platform was probably not the final root cause, and why I eventually added an HTML sitemap as a crawl-discovery fallback.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Site Setup
&lt;/h2&gt;

&lt;p&gt;The project is a Next.js tool site for browser-side video and image processing.&lt;/p&gt;

&lt;p&gt;It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mostly static SEO/tool pages&lt;/li&gt;
&lt;li&gt;localized routes&lt;/li&gt;
&lt;li&gt;no login&lt;/li&gt;
&lt;li&gt;no database&lt;/li&gt;
&lt;li&gt;no payment system&lt;/li&gt;
&lt;li&gt;client-side media processing&lt;/li&gt;
&lt;li&gt;some heavier runtime assets served through the Cloudflare ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The production deployment was on Cloudflare Pages.&lt;/p&gt;

&lt;p&gt;The XML sitemap was generated by the app and included localized URLs across supported languages.&lt;/p&gt;

&lt;p&gt;At a high level, this should have been a boring sitemap setup.&lt;/p&gt;

&lt;p&gt;It was not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Initial Symptoms
&lt;/h2&gt;

&lt;p&gt;In Google Search Console, both the dynamic XML sitemap and a static XML sitemap showed the same failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Could not fetch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The discovered page count stayed at zero.&lt;/p&gt;

&lt;p&gt;The frustrating part was that direct HTTP checks looked normal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sitemap.xml        -&amp;gt; 200 application/xml
/sitemap-static.xml -&amp;gt; 200 application/xml
/robots.txt         -&amp;gt; 200 text/plain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both XML files passed validation.&lt;/p&gt;

&lt;p&gt;The static sitemap was especially important. It was a plain file under &lt;code&gt;public/&lt;/code&gt;, not a Next.js metadata route. If both the generated sitemap and a static XML file failed in GSC, then the issue was probably not limited to the Next.js sitemap route.&lt;/p&gt;
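
&lt;p&gt;The direct HTTP checks above can be scripted. The following is a minimal sketch, assuming Node 18+ (for the global &lt;code&gt;fetch&lt;/code&gt;); the names &lt;code&gt;EXPECTED&lt;/code&gt;, &lt;code&gt;matchesExpectation&lt;/code&gt;, and &lt;code&gt;checkOrigin&lt;/code&gt; are illustrative and not taken from the project. Passing these checks only shows that the files are reachable from your machine; it says nothing about what GSC sees.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Expected status and media type per path, mirroring the checks listed above.
const EXPECTED = [
  { path: "/sitemap.xml", mediaType: "application/xml" },
  { path: "/sitemap-static.xml", mediaType: "application/xml" },
  { path: "/robots.txt", mediaType: "text/plain" },
];

// Pure helper: compare one response against its expectation.
function matchesExpectation(expected, status, contentTypeHeader) {
  // The header may carry parameters, e.g. "application/xml; charset=utf-8".
  const mediaType = String(contentTypeHeader || "").split(";")[0].trim().toLowerCase();
  if (status !== 200) return false;
  return mediaType === expected.mediaType;
}

// Network wrapper. fetchImpl is injectable so the logic can be exercised offline;
// it defaults to the global fetch (available in Node 18+ and browsers).
async function checkOrigin(origin, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  const report = [];
  for (const expected of EXPECTED) {
    const response = await doFetch(origin + expected.path);
    const contentType = response.headers.get("content-type");
    report.push({
      path: expected.path,
      status: response.status,
      contentType: contentType,
      ok: matchesExpectation(expected, response.status, contentType),
    });
  }
  return report;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;checkOrigin("https://example.com").then(console.table)&lt;/code&gt; prints one row per path, with &lt;code&gt;ok&lt;/code&gt; set to &lt;code&gt;true&lt;/code&gt; only when both the status and the media type match.&lt;/p&gt;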

&lt;h2&gt;
  
  
  Checking robots.txt
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;robots.txt&lt;/code&gt; file was simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User-Agent: *
Allow: /

Sitemap: /sitemap.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There was no disallow rule blocking the sitemap or the main pages.&lt;/p&gt;

&lt;p&gt;So the obvious robots explanation did not fit.&lt;/p&gt;
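
&lt;p&gt;To make the "no disallow rule" check repeatable, here is a small sketch of a robots.txt reader. It is deliberately simplified and rests on stated assumptions: it ignores &lt;code&gt;User-Agent&lt;/code&gt; grouping, wildcards, and &lt;code&gt;Allow&lt;/code&gt; precedence, and the helper names are hypothetical. One side note: the sitemaps protocol specifies a full URL for the &lt;code&gt;Sitemap&lt;/code&gt; line, so if the path shown above is literal rather than abbreviated, it is worth writing out in full.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Collect Allow, Disallow, and Sitemap values from a robots.txt body.
// Simplification: every rule is treated as if it applied to all crawlers.
function parseRobots(text) {
  const rules = { allow: [], disallow: [], sitemaps: [] };
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (value === "") continue; // an empty Disallow blocks nothing
    if (field === "allow") rules.allow.push(value);
    if (field === "disallow") rules.disallow.push(value);
    if (field === "sitemap") rules.sitemaps.push(value);
  }
  return rules;
}

// Plain prefix match only; real crawlers also apply wildcard and precedence rules.
function isPathDisallowed(rules, path) {
  return rules.disallow.some(function (prefix) {
    return path.startsWith(prefix);
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the file shown above, &lt;code&gt;parseRobots&lt;/code&gt; finds no &lt;code&gt;Disallow&lt;/code&gt; values, so &lt;code&gt;isPathDisallowed(rules, "/sitemap.xml")&lt;/code&gt; returns &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;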

&lt;h2&gt;
  
  
  Checking Cloudflare DNS and Custom Domains
&lt;/h2&gt;

&lt;p&gt;Because the site was on Cloudflare Pages, I spent a lot of time checking Cloudflare configuration.&lt;/p&gt;

&lt;p&gt;The production domain and &lt;code&gt;www&lt;/code&gt; domain were active in Cloudflare Pages. SSL was enabled. The DNS records pointed to the Pages deployment and were proxied through Cloudflare.&lt;/p&gt;

&lt;p&gt;I also checked old verification records, email records, custom domain status, and the basic DNS setup.&lt;/p&gt;

&lt;p&gt;Nothing obvious looked broken.&lt;/p&gt;

&lt;p&gt;The domain resolved. The site loaded. The sitemap returned &lt;code&gt;200&lt;/code&gt;. The custom domains were active.&lt;/p&gt;

&lt;p&gt;So the basic Cloudflare Pages domain setup did not explain why GSC could not fetch the sitemap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checking Cloudflare Bot and Security Rules
&lt;/h2&gt;

&lt;p&gt;The next suspicion was that Googlebot might be getting challenged or blocked by Cloudflare.&lt;/p&gt;

&lt;p&gt;I checked Cloudflare AI Crawl Control and security events.&lt;/p&gt;

&lt;p&gt;Googlebot was recognized as a search engine crawler. The relevant controls were not blocking it.&lt;/p&gt;

&lt;p&gt;Cloudflare Security Events also showed requests to the sitemap path from verified crawlers, including Google-related user agents.&lt;/p&gt;

&lt;p&gt;There was a custom rule for verified bots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cf.client.bot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule skipped security products for verified bots.&lt;/p&gt;

&lt;p&gt;The important part: I did not find evidence of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WAF block&lt;/li&gt;
&lt;li&gt;Managed Challenge&lt;/li&gt;
&lt;li&gt;JS Challenge&lt;/li&gt;
&lt;li&gt;Interactive Challenge&lt;/li&gt;
&lt;li&gt;Googlebot being denied access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Googlebot-style requests could receive &lt;code&gt;200&lt;/code&gt; responses.&lt;/p&gt;

&lt;p&gt;That made a simple Cloudflare security-block explanation unlikely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checking HTTP Details
&lt;/h2&gt;

&lt;p&gt;I also checked the sitemap response in several ways.&lt;/p&gt;

&lt;p&gt;The sitemap returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;200
application/xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I checked normal requests, Googlebot-style user agents, HTTP/1.1 behavior, compressed responses, and XML validation.&lt;/p&gt;

&lt;p&gt;The XML remained valid.&lt;/p&gt;

&lt;p&gt;The dynamic sitemap had Next.js route headers, as expected. But the static sitemap also failed in GSC, which again suggested that the issue was not just the Next.js metadata route.&lt;/p&gt;

&lt;p&gt;At this stage, the sitemap looked technically valid from the outside.&lt;/p&gt;

&lt;p&gt;Google Search Console still disagreed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing Build and Deployment Noise
&lt;/h2&gt;

&lt;p&gt;During the investigation, I also found unrelated build noise.&lt;/p&gt;

&lt;p&gt;The Cloudflare build could fail because &lt;code&gt;next/font/google&lt;/code&gt; tried to fetch fonts during the build. That was not directly the sitemap bug, but it made deployment verification less stable.&lt;/p&gt;

&lt;p&gt;I removed the Google Fonts dependency and switched to a system font stack.&lt;/p&gt;

&lt;p&gt;After that, both the normal Next build and the Cloudflare build completed successfully.&lt;/p&gt;

&lt;p&gt;This mattered because I needed a stable deployment baseline before blaming Google, Cloudflare, or the domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vercel Diagnostic Test
&lt;/h2&gt;

&lt;p&gt;At one point, I deployed the same project to Vercel as a diagnostic comparison.&lt;/p&gt;

&lt;p&gt;The goal was not to move production to Vercel.&lt;/p&gt;

&lt;p&gt;The goal was to answer a narrower question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is the sitemap XML/project output itself fundamentally broken?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deployment itself worked.&lt;/p&gt;

&lt;p&gt;But the key detail is this: with the production domain pointed at the Vercel deployment, Google Search Console still could not fetch the sitemap.&lt;/p&gt;

&lt;p&gt;That meant simply changing the hosting platform was probably not enough.&lt;/p&gt;

&lt;p&gt;Later, I tested the same project with a different temporary domain. Under that different domain, Google Search Console could fetch the sitemap successfully.&lt;/p&gt;

&lt;p&gt;That changed the interpretation.&lt;/p&gt;

&lt;p&gt;The issue was probably not just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cloudflare Pages vs Vercel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stronger signal was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The failure was likely tied to the production domain or Google/GSC state associated with that domain.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This distinction matters. Without it, it is easy to draw the wrong conclusion and think that a hosting migration alone would fix everything.&lt;/p&gt;

&lt;p&gt;Here is the simplified control-variable table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment path&lt;/th&gt;
&lt;th&gt;Domain used&lt;/th&gt;
&lt;th&gt;GSC status&lt;/th&gt;
&lt;th&gt;What it suggested&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Pages&lt;/td&gt;
&lt;td&gt;Production domain&lt;/td&gt;
&lt;td&gt;Failed: &lt;code&gt;Could not fetch&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Not explained by basic Cloudflare DNS, SSL, WAF, or robots settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel diagnostic deployment&lt;/td&gt;
&lt;td&gt;Production domain&lt;/td&gt;
&lt;td&gt;Failed: &lt;code&gt;Could not fetch&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Moving the same project to Vercel did not fix the production-domain problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Same project on a different temporary domain&lt;/td&gt;
&lt;td&gt;Temporary domain&lt;/td&gt;
&lt;td&gt;Success&lt;/td&gt;
&lt;td&gt;The sitemap output was likely valid; the production domain or GSC state became the stronger suspect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers + OpenNext&lt;/td&gt;
&lt;td&gt;Production domain&lt;/td&gt;
&lt;td&gt;Failed: &lt;code&gt;Could not fetch&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Swapping backend infrastructure did not fix the production-domain problem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
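
&lt;p&gt;The reasoning behind the table can be written down as data. This is only an illustrative sketch: the rows restate the table, and &lt;code&gt;outcomesWhere&lt;/code&gt; and &lt;code&gt;deploymentsWhere&lt;/code&gt; are hypothetical helper names. The idea is to hold one variable fixed and look at how many distinct outcomes remain.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// The four rows of the table above, as data. "fetched" is the GSC result.
const observations = [
  { deployment: "Cloudflare Pages", domain: "production", fetched: false },
  { deployment: "Vercel diagnostic", domain: "production", fetched: false },
  { deployment: "Same project, temporary domain", domain: "temporary", fetched: true },
  { deployment: "Cloudflare Workers + OpenNext", domain: "production", fetched: false },
];

// Distinct outcomes seen while one variable is held at a fixed value.
function outcomesWhere(rows, key, value) {
  const outcomes = new Set();
  for (const row of rows) {
    if (row[key] === value) outcomes.add(row.fetched);
  }
  return Array.from(outcomes);
}

// Deployments that were tried while the variable was held fixed.
function deploymentsWhere(rows, key, value) {
  return rows
    .filter(function (row) { return row[key] === value; })
    .map(function (row) { return row.deployment; });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the domain held at &lt;code&gt;"production"&lt;/code&gt;, three different deployments produce a single outcome (&lt;code&gt;false&lt;/code&gt;), while the only row that changes the domain is also the only one that flips the result. Four observations do not prove a cause, but they show which variable the outcome tracks.&lt;/p&gt;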

&lt;h2&gt;
  
  
  Trying Cloudflare Workers and OpenNext
&lt;/h2&gt;

&lt;p&gt;Because the old Cloudflare Pages build chain used &lt;code&gt;@cloudflare/next-on-pages&lt;/code&gt;, and that adapter is deprecated, I also tested a Workers/OpenNext path.&lt;/p&gt;

&lt;p&gt;This was not a casual check. I actually went through the deployment path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;added OpenNext/Workers configuration&lt;/li&gt;
&lt;li&gt;configured &lt;code&gt;wrangler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;tested a Workers custom domain&lt;/li&gt;
&lt;li&gt;confirmed the Worker was serving traffic&lt;/li&gt;
&lt;li&gt;saw the &lt;code&gt;x-opennext&lt;/code&gt; response header&lt;/li&gt;
&lt;li&gt;tested the homepage&lt;/li&gt;
&lt;li&gt;tested &lt;code&gt;robots.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;tested the sitemap&lt;/li&gt;
&lt;li&gt;tested the runtime routes needed by the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, the Worker test domain worked.&lt;/p&gt;

&lt;p&gt;Then I tried switching the production domain from Pages to Workers.&lt;/p&gt;

&lt;p&gt;That required removing the production custom domains from Pages and adding them to the Worker, because Cloudflare would not allow the same hostname to be managed by both at once.&lt;/p&gt;

&lt;p&gt;After the switch, the production domain did serve through Workers/OpenNext.&lt;/p&gt;

&lt;p&gt;The key routes still returned valid responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/              -&amp;gt; 200
/sitemap.xml   -&amp;gt; 200 application/xml
/robots.txt    -&amp;gt; 200 text/plain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response headers confirmed traffic was going through OpenNext Workers.&lt;/p&gt;

&lt;p&gt;Then I submitted the sitemap again in Google Search Console.&lt;/p&gt;

&lt;p&gt;It still failed.&lt;/p&gt;

&lt;p&gt;I also tried a cache-busting sitemap URL with a query string.&lt;/p&gt;

&lt;p&gt;That URL returned valid XML outside GSC.&lt;/p&gt;

&lt;p&gt;GSC still said it could not fetch it.&lt;/p&gt;

&lt;p&gt;That was an important result.&lt;/p&gt;

&lt;p&gt;It meant the original hypothesis was not supported:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This was not simply a Cloudflare Pages or next-on-pages problem.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same production domain still had the issue even after moving the delivery path to Workers/OpenNext.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Strongest Conclusion
&lt;/h2&gt;

&lt;p&gt;After all of these tests, the most likely root area was not the XML file itself.&lt;/p&gt;

&lt;p&gt;It was also not clearly one hosting provider.&lt;/p&gt;

&lt;p&gt;The strongest clue was the domain test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;same project&lt;/li&gt;
&lt;li&gt;same kind of sitemap&lt;/li&gt;
&lt;li&gt;different domain&lt;/li&gt;
&lt;li&gt;GSC could fetch it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That points toward a domain-level or Google-side state issue.&lt;/p&gt;

&lt;p&gt;Possible explanations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;historical crawl state for the production domain&lt;/li&gt;
&lt;li&gt;Google Search Console state for the domain property&lt;/li&gt;
&lt;li&gt;DNS or routing history associated with the domain&lt;/li&gt;
&lt;li&gt;Google-side host classification or cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I cannot prove exactly which one it is.&lt;/p&gt;

&lt;p&gt;But the evidence pointed away from endlessly rewriting the sitemap XML.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Problem
&lt;/h2&gt;

&lt;p&gt;Even if the XML sitemap is valid, it is not very helpful if Google Search Console refuses to process it.&lt;/p&gt;

&lt;p&gt;The site still needs its important pages discovered.&lt;/p&gt;

&lt;p&gt;For a multilingual tools site, that matters.&lt;/p&gt;

&lt;p&gt;So I stopped treating the XML sitemap as the only discovery mechanism.&lt;/p&gt;

&lt;p&gt;I added an HTML sitemap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an HTML Sitemap Helps
&lt;/h2&gt;

&lt;p&gt;An HTML sitemap is just a normal page with internal links.&lt;/p&gt;

&lt;p&gt;Googlebot can crawl it like any other page.&lt;/p&gt;

&lt;p&gt;That gives the site another discovery path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;normal page -&amp;gt; footer link -&amp;gt; HTML sitemap -&amp;gt; localized tool pages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not fix the XML sitemap failure directly.&lt;/p&gt;

&lt;p&gt;It reduces the risk of relying on only one discovery mechanism.&lt;/p&gt;

&lt;p&gt;That was the practical goal.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Designed the HTML Sitemap
&lt;/h2&gt;

&lt;p&gt;I kept the page intentionally simple.&lt;/p&gt;

&lt;p&gt;The HTML sitemap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;returns plain HTML&lt;/li&gt;
&lt;li&gt;is linked from the footer&lt;/li&gt;
&lt;li&gt;uses &lt;code&gt;index, follow&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;has a canonical URL&lt;/li&gt;
&lt;li&gt;groups links by language&lt;/li&gt;
&lt;li&gt;lists only the core tool pages&lt;/li&gt;
&lt;li&gt;excludes privacy policy and terms pages&lt;/li&gt;
&lt;li&gt;avoids duplicate homepage entries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The page is not trying to be a fancy user interface.&lt;/p&gt;

&lt;p&gt;It is a reliable crawl hub.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the HTML Sitemap Contains
&lt;/h2&gt;

&lt;p&gt;The site has multiple languages and a fixed set of core tools.&lt;/p&gt;

&lt;p&gt;The HTML sitemap lists the localized version of each core tool page.&lt;/p&gt;

&lt;p&gt;In the current setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;9 languages x 11 core tools = 99 tool links
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the homepage converter, the link is represented by each language's localized tool title, instead of repeating a generic brand link.&lt;/p&gt;

&lt;p&gt;That keeps the sitemap focused and avoids unnecessary duplicates.&lt;/p&gt;
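
&lt;p&gt;As a rough sketch of the data behind such a page, the link list can be generated from two arrays. The locale codes, tool slugs, and URL pattern below are placeholders, since this post does not list the real ones, and &lt;code&gt;buildSitemapGroups&lt;/code&gt; is a hypothetical helper. Labels are left out; in practice each link would carry the localized tool title.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical locale codes and tool slugs; the post does not list the real ones.
const LOCALES = ["en", "es", "fr", "de", "pt", "it", "ja", "ko", "zh"];
// "" stands for the homepage converter; the other ten slugs are placeholders.
const TOOL_SLUGS = [""].concat(
  Array.from({ length: 10 }, function (_, i) { return "tool-" + (i + 1); })
);

// Build one group of links per language, skipping any URL already emitted.
function buildSitemapGroups(origin, locales, slugs) {
  const seen = new Set();
  return locales.map(function (locale) {
    const links = [];
    for (const slug of slugs) {
      const href = origin + "/" + locale + (slug === "" ? "" : "/" + slug);
      if (seen.has(href)) continue; // avoid duplicate entries
      seen.add(href);
      links.push({ locale: locale, href: href });
    }
    return { locale: locale, links: links };
  });
}

// Total number of links across all language groups.
function countLinks(groups) {
  return groups.reduce(function (total, group) {
    return total + group.links.length;
  }, 0);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 9 locales and 11 slugs, &lt;code&gt;countLinks(buildSitemapGroups("https://example.com", LOCALES, TOOL_SLUGS))&lt;/code&gt; returns 99, matching the count above.&lt;/p&gt;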

&lt;h2&gt;
  
  
  Why I Did Not Just Move Hosting
&lt;/h2&gt;

&lt;p&gt;Moving hosting would have been the wrong lesson.&lt;/p&gt;

&lt;p&gt;The production domain still failed even when the project was served through a different deployment path.&lt;/p&gt;

&lt;p&gt;A different temporary domain worked.&lt;/p&gt;

&lt;p&gt;That means the domain/GSC state was the more important signal.&lt;/p&gt;

&lt;p&gt;Also, the product uses browser-side media processing and heavier runtime assets that fit well with the existing Cloudflare setup.&lt;/p&gt;

&lt;p&gt;So the better solution was not to migrate everything just because a diagnostic test on a different domain behaved differently.&lt;/p&gt;

&lt;p&gt;The better solution was to add a second crawl-discovery path.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Sitemap debugging is not always about the sitemap file.&lt;/p&gt;

&lt;p&gt;Sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML is valid&lt;/li&gt;
&lt;li&gt;the headers are correct&lt;/li&gt;
&lt;li&gt;the route is public&lt;/li&gt;
&lt;li&gt;bots are not blocked&lt;/li&gt;
&lt;li&gt;multiple hosting paths work technically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and Google Search Console still reports a fetch failure.&lt;/p&gt;

&lt;p&gt;At that point, adding another discovery mechanism can be more useful than continuing to tweak a valid XML file.&lt;/p&gt;

&lt;p&gt;For this case, the final strategy was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the XML sitemap&lt;/li&gt;
&lt;li&gt;keep monitoring GSC&lt;/li&gt;
&lt;li&gt;add an HTML sitemap&lt;/li&gt;
&lt;li&gt;link it from the footer&lt;/li&gt;
&lt;li&gt;make the important localized pages discoverable through ordinary internal links&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Setup
&lt;/h2&gt;

&lt;p&gt;The live site discussed in this post is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Main site: &lt;a href="https://videosnap.cc/" rel="noopener noreferrer"&gt;https://videosnap.cc/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HTML sitemap fallback: &lt;a href="https://videosnap.cc/html-sitemap" rel="noopener noreferrer"&gt;https://videosnap.cc/html-sitemap&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The HTML sitemap is not a replacement for the XML sitemap.&lt;/p&gt;

&lt;p&gt;It is a crawl-discovery fallback.&lt;/p&gt;

&lt;p&gt;And in this case, that was the most practical solution.&lt;/p&gt;

</description>
      <category>seo</category>
      <category>nextjs</category>
      <category>webdev</category>
      <category>cloudflarechallenge</category>
    </item>
    <item>
      <title>[Fixed] How to Solve the 99% Hang in ffmpeg.wasm Apps</title>
      <dc:creator>Bob</dc:creator>
      <pubDate>Fri, 08 May 2026 10:29:30 +0000</pubDate>
      <link>https://dev.to/jsxyzb/the-99-mystery-why-my-ffmpegwasm-app-stalls-at-the-finish-line-4fla</link>
      <guid>https://dev.to/jsxyzb/the-99-mystery-why-my-ffmpegwasm-app-stalls-at-the-finish-line-4fla</guid>
      <description>&lt;p&gt;TL;DR: The hang is caused by &lt;strong&gt;memory overlap&lt;/strong&gt;. &lt;strong&gt;Delete your input file&lt;/strong&gt; before reading the output.&lt;/p&gt;

&lt;p&gt;I’ve been building VideoSnap, a tool that processes video entirely in the browser using &lt;code&gt;ffmpeg.wasm&lt;/code&gt;. For a long time, I was haunted by a specific, frustrating bug: the "99% Trap."&lt;/p&gt;

&lt;p&gt;A user uploads a file, the progress bar climbs smoothly, hits 99%... and then everything just stops. &lt;/p&gt;

&lt;p&gt;The UI becomes unresponsive. The fan starts spinning. It doesn't crash with an "Aw, Snap!" error, but it hangs there, sometimes for minutes. Then, suddenly, the download pops up as if nothing happened.&lt;/p&gt;

&lt;p&gt;I realized that the 99% mark isn't where FFmpeg is working—it's where the browser is fighting for its life.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 99% isn't FFmpeg—it's the Handover
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;ffmpeg.wasm&lt;/code&gt;, the progress bar tracks the FFmpeg execution. When it hits 99%, the heavy lifting of transcoding is actually done. &lt;/p&gt;

&lt;p&gt;The "hang" happens during the handover: when you call &lt;code&gt;engine.readFile()&lt;/code&gt; to pull the processed video out of the WebAssembly virtual memory (MEMFS) and into the JavaScript heap.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Memory Overlap" Problem
&lt;/h3&gt;

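&lt;p&gt;WebAssembly (currently) uses 32-bit memory addressing, which caps a module's memory at 4GB on paper and, depending on the browser and the build, often closer to ~2GB in practice. Imagine you are converting a 500MB video:&lt;/p&gt;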

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Peak:&lt;/strong&gt; At 99%, the WASM memory is holding your 500MB input file PLUS the newly generated 500MB output file. That’s &lt;strong&gt;1GB&lt;/strong&gt; of WASM memory occupied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Request:&lt;/strong&gt; You call &lt;code&gt;engine.readFile()&lt;/code&gt;. JavaScript now tries to allocate a &lt;strong&gt;new 500MB&lt;/strong&gt; &lt;code&gt;Uint8Array&lt;/code&gt; to copy that data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The GC Storm:&lt;/strong&gt; Your browser is now trying to manage roughly &lt;strong&gt;1.5GB&lt;/strong&gt; (the 1GB above plus the new 500MB copy), and possibly closer to 2GB once overhead is included, all in massive, contiguous memory blocks. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This triggers a "Stop-The-World" Garbage Collection (GC) event. The browser's Main Thread locks up completely. It is desperately trying to defragment memory to find a 500MB hole. This intense "GC thrashing" is why the UI freezes before the file finally breaks free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Surgical" Fix: Breaking the Overlap
&lt;/h2&gt;

&lt;p&gt;Once I understood that the stall was caused by the &lt;strong&gt;simultaneous existence&lt;/strong&gt; of the input and output files in MEMFS, the fix became obvious. &lt;/p&gt;

&lt;p&gt;I needed to clear the desk before trying to move the big box.&lt;/p&gt;

&lt;p&gt;I implemented what I call &lt;strong&gt;Surgical Memory Management&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The optimized handover logic:&lt;/span&gt;

&lt;span class="c1"&gt;// 1. FFmpeg is done. Before we even THINK about reading the output, &lt;/span&gt;
&lt;span class="c1"&gt;// we must kill the input file to free up hundreds of MBs in WASM.&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input.mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 

&lt;span class="c1"&gt;// 2. Now that the WASM memory has breathing room, we read the result.&lt;/span&gt;
&lt;span class="c1"&gt;// The browser can allocate the JS buffer without a massive GC fight.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output.mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. The millisecond we have the data in JS, we nuke the WASM output copy.&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output.mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Now WASM is empty, and we only hold the file in the JS heap.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;video/mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By reordering these deletions, I eliminated the massive memory overlap at the exact moment the browser needs memory the most. The 99% hang doesn't magically vanish, because the browser still needs time to allocate large JS buffers, but this surgical cleanup shaves off crucial seconds of GC thrashing. More importantly, it keeps the browser tab from quietly suffocating under heavy files.&lt;/p&gt;
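
&lt;p&gt;The effect of the reordering can be illustrated with a toy accounting model. This is a sketch under stated assumptions, not a measurement: it does not call &lt;code&gt;ffmpeg.wasm&lt;/code&gt;, it assumes space freed by &lt;code&gt;deleteFile&lt;/code&gt; is immediately reusable, and &lt;code&gt;peakDuringHandover&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Toy accounting model of the handover described above; it is not ffmpeg.wasm itself.
// Sizes are in MB. Assumption: space freed by deleteFile is immediately reusable.
function peakDuringHandover(inputMB, outputMB, deleteInputFirst) {
  let memfsMB = inputMB + outputMB; // both files exist when FFmpeg finishes
  let jsCopyMB = 0;
  let peakMB = 0;
  function record() {
    peakMB = Math.max(peakMB, memfsMB + jsCopyMB);
  }

  if (deleteInputFirst) memfsMB -= inputMB; // deleteFile('input.mp4') before reading
  jsCopyMB += outputMB; // readFile('output.mp4') copies the result into JS
  record();
  memfsMB -= outputMB; // deleteFile('output.mp4')
  if (!deleteInputFirst) memfsMB -= inputMB; // naive order: input deleted last
  record();
  return peakMB;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the 500MB example, &lt;code&gt;peakDuringHandover(500, 500, false)&lt;/code&gt; returns 1500, while &lt;code&gt;peakDuringHandover(500, 500, true)&lt;/code&gt; returns 1000. The model only captures the overlap; real allocators and garbage collectors add overhead on top.&lt;/p&gt;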

&lt;h2&gt;
  
  
  Why I didn't use WORKERFS or OPFS
&lt;/h2&gt;

&lt;p&gt;I explored other options, but each came with a serious trade-off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WORKERFS:&lt;/strong&gt; It mounts files without copying them, which sounds perfect. But it uses a synchronous I/O bridge that makes FFmpeg run significantly slower. That trades lower memory use for a massive speed penalty. Not worth it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPFS (Origin Private File System):&lt;/strong&gt; This is the future. It streams data directly to disk. But it requires a custom-built FFmpeg core with &lt;code&gt;WASMFS&lt;/code&gt; support, which is a massive engineering undertaking that the official &lt;code&gt;@ffmpeg/ffmpeg&lt;/code&gt; doesn't support out-of-the-box yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Takeaway: Know Your Handovers
&lt;/h2&gt;

&lt;p&gt;If you are building high-performance WebAssembly apps, remember: &lt;strong&gt;the most dangerous part of the pipeline is the data handover.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you move large amounts of data between the WASM "world" and the JS "world," the browser doesn't see a file—it sees a massive, contiguous memory allocation request. If you don't clean up your internal state &lt;em&gt;before&lt;/em&gt; you make that request, you're asking for a GC storm.&lt;/p&gt;

&lt;p&gt;VideoSnap is now significantly more stable, not because I made the math faster, but because I managed the memory lifecycle with more precision.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I’m the builder of &lt;a href="https://videosnap.cc" rel="noopener noreferrer"&gt;VideoSnap&lt;/a&gt;. I write about the messy reality of building high-performance tools in the browser. Follow for more deep dives.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>webassembly</category>
      <category>javascript</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Why Prompt-Only Moderation Failed in My AI Generation App</title>
      <dc:creator>Bob</dc:creator>
      <pubDate>Fri, 10 Apr 2026 06:50:29 +0000</pubDate>
      <link>https://dev.to/jsxyzb/why-prompt-only-moderation-failed-in-my-ai-generation-app-1m11</link>
      <guid>https://dev.to/jsxyzb/why-prompt-only-moderation-failed-in-my-ai-generation-app-1m11</guid>
      <description>&lt;p&gt;When I first added moderation to my AI generation app, I treated it as a text problem.&lt;/p&gt;

&lt;p&gt;That seemed reasonable at the time. A user sends a prompt, I check the prompt, and if it looks unsafe, I block the request before it reaches the model.&lt;/p&gt;

&lt;p&gt;That approach worked for a very short time.&lt;/p&gt;

&lt;p&gt;It stopped working the moment I supported image inputs, reference images, and multiple generation flows. At that point, I realized something important: prompt-only moderation is not really moderation. It is just one partial check inside a much larger pipeline.&lt;/p&gt;

&lt;p&gt;This post is about what changed in my backend once I accepted that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake: treating moderation as a wrapper
&lt;/h2&gt;

&lt;p&gt;A lot of AI products start with moderation as a thin wrapper around generation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;receive a prompt&lt;/li&gt;
&lt;li&gt;run a text safety check&lt;/li&gt;
&lt;li&gt;call the model provider&lt;/li&gt;
&lt;li&gt;return the result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem is that real generation workflows are rarely that simple.&lt;/p&gt;

&lt;p&gt;Once users can upload source images, provide reference images, or switch between text-to-image and image-to-image generation flows, the prompt becomes just one component of the overall request. A completely harmless prompt can still be paired with problematic input images. If the backend only inspects the text, the system will inevitably have a blind spot.&lt;/p&gt;

&lt;p&gt;That was the first issue I had to fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moderation belongs inside the generation pipeline
&lt;/h2&gt;

&lt;p&gt;I ended up moving moderation into the backend generation workflow itself instead of treating it as a separate utility.&lt;/p&gt;

&lt;p&gt;Conceptually, the flow became:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;validate the request&lt;/li&gt;
&lt;li&gt;load the selected provider and model&lt;/li&gt;
&lt;li&gt;inspect both prompt text and image inputs&lt;/li&gt;
&lt;li&gt;block flagged requests before spending credits&lt;/li&gt;
&lt;li&gt;create the generation task only if moderation passes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That decision helped for two reasons.&lt;/p&gt;

&lt;p&gt;First, it kept moderation close to the actual business rules. I did not want unsafe requests to consume credits, create external jobs, or leave behind half-failed task records.&lt;/p&gt;

&lt;p&gt;Second, it forced me to normalize the input shape. Instead of thinking only in terms of the prompt, I had to define a moderation input that could include prompt text, image URLs, model context, and the generation scene (for example, text-to-image or image-to-image).&lt;/p&gt;

&lt;p&gt;That made the system much easier to reason about.&lt;/p&gt;
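
&lt;p&gt;One way such a normalized input could look is sketched here. The field names in &lt;code&gt;IMAGE_FIELDS&lt;/code&gt; and the payload keys (&lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;scene&lt;/code&gt;) are assumptions made for illustration, not the actual request schema.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any

# Assumed request fields that may carry image URLs (hypothetical names).
IMAGE_FIELDS = ("source_image", "reference_images", "mask_image")


@dataclass(frozen=True)
class ModerationInput:
    """Everything the moderation step needs, independent of the route."""
    prompt: str
    image_urls: tuple[str, ...]
    model: str
    scene: str


def build_moderation_input(payload: dict[str, Any]) -> ModerationInput:
    urls: list[str] = []
    for name in IMAGE_FIELDS:
        value = payload.get(name)
        if isinstance(value, str):
            # Single-image field.
            urls.append(value)
        elif isinstance(value, (list, tuple)):
            # Multi-image field; ignore anything that is not a string.
            urls.extend(item for item in value if isinstance(item, str))

    # Drop duplicate URLs while keeping their first-seen order.
    unique_urls = tuple(dict.fromkeys(urls))

    return ModerationInput(
        prompt=(payload.get("prompt") or "").strip(),
        image_urls=unique_urls,
        model=payload.get("model") or "",
        scene=payload.get("scene") or "",
    )
```

&lt;p&gt;With a shape like this, every generation flow hands the moderation step the same object, whichever request fields the images arrived in.&lt;/p&gt;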

&lt;h2&gt;
  
  
  Prompt checks are useful, but incomplete
&lt;/h2&gt;

&lt;p&gt;Text moderation is still valuable. It catches a lot of obvious cases early, and it is usually cheaper and faster than processing images.&lt;/p&gt;

&lt;p&gt;But text-only checks have two major limitations.&lt;/p&gt;

&lt;p&gt;The first is obvious: users can submit problematic visual input even if the prompt itself looks harmless.&lt;/p&gt;

&lt;p&gt;The second is less obvious: language coverage is uneven. Depending on the moderation provider, some languages are better supported than others. That means your confidence level should not be the same across all prompts.&lt;/p&gt;

&lt;p&gt;In my case, that pushed me toward a more defensive design: if text checks are incomplete, the rest of the safety system has to acknowledge that limitation instead of pretending the problem is solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Images changed the design
&lt;/h2&gt;

&lt;p&gt;The biggest improvement came from treating image inputs as first-class moderation targets.&lt;/p&gt;

&lt;p&gt;That sounds straightforward, but it changed several implementation details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the moderation step now had to collect image URLs from different request fields&lt;/li&gt;
&lt;li&gt;the backend needed one normalized moderation interface, even if the underlying provider had different APIs for text and image checks&lt;/li&gt;
&lt;li&gt;moderation results had to return structured categories and scores, not just a single boolean&lt;/li&gt;
&lt;li&gt;failure behavior had to be explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than it seems.&lt;/p&gt;

&lt;p&gt;If a moderation provider fails, what should happen?&lt;/p&gt;

&lt;p&gt;You have to choose between two imperfect options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fail-open: allow the request and accept some risk&lt;/li&gt;
&lt;li&gt;fail-closed: block the request and accept that some legitimate requests are rejected and UX degrades while the provider is unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no universal correct answer. It depends on the kind of product you are building, your abuse tolerance, and how costly a bad generation is for you. But the important part is to make the decision deliberately. Silent fallback logic is where safety systems get weak.&lt;/p&gt;
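
&lt;p&gt;A small sketch of making that choice explicit, and of returning structured scores instead of a bare boolean. The &lt;code&gt;FailurePolicy&lt;/code&gt; enum, the &lt;code&gt;Decision&lt;/code&gt; type, the &lt;code&gt;decide&lt;/code&gt; function, and the 0.5 threshold are all illustrative assumptions. The category names depend on whichever provider is in use.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class FailurePolicy(Enum):
    FAIL_OPEN = "fail_open"      # provider error: allow the request
    FAIL_CLOSED = "fail_closed"  # provider error: block the request


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str
    # Category name mapped to a score in [0, 1].
    scores: dict[str, float] = field(default_factory=dict)


def decide(
    check: Callable[[], dict[str, float]],
    policy: FailurePolicy,
    threshold: float = 0.5,
) -> Decision:
    try:
        scores = check()
    except Exception:
        # The outcome of a provider failure comes from configuration,
        # not from whatever the surrounding code happens to do.
        allowed = policy is FailurePolicy.FAIL_OPEN
        return Decision(allowed=allowed, reason=f"provider_error:{policy.value}")

    flagged = sorted(name for name, score in scores.items() if score >= threshold)
    if flagged:
        return Decision(False, "flagged:" + ",".join(flagged), scores)
    return Decision(True, "passed", scores)
```

&lt;p&gt;Recording the reason alongside the decision also makes it possible to tell later whether a request passed moderation or was let through because the provider was down.&lt;/p&gt;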

&lt;h2&gt;
  
  
  Provider-specific APIs should not leak everywhere
&lt;/h2&gt;

&lt;p&gt;Another lesson was that moderation providers should be isolated behind a small internal interface.&lt;/p&gt;

&lt;p&gt;Not because provider abstraction is fashionable, but because safety logic tends to spread if you let it.&lt;/p&gt;

&lt;p&gt;If one route handler knows how text moderation works, another knows how image moderation works, and a third knows how to interpret provider-specific category names, you do not have a moderation layer anymore. You have moderation fragments.&lt;/p&gt;

&lt;p&gt;I found it much cleaner to keep a moderation manager in the backend and let the generation route ask one question: “Is this request safe enough to proceed?”&lt;/p&gt;

&lt;p&gt;That does not remove complexity. It contains it.&lt;/p&gt;
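
&lt;p&gt;As a rough illustration of that containment, here is a sketch in which the route only ever calls &lt;code&gt;is_safe&lt;/code&gt;. The &lt;code&gt;ModerationProvider&lt;/code&gt; protocol, the &lt;code&gt;ModerationManager&lt;/code&gt; class, and the toy &lt;code&gt;KeywordProvider&lt;/code&gt; are hypothetical names. A real provider adapter would translate an external API response into the same score dictionary.&lt;/p&gt;

```python
from typing import Protocol


class ModerationProvider(Protocol):
    """Internal interface; each provider adapter implements these two calls."""

    def check_text(self, text: str) -> dict[str, float]: ...

    def check_image(self, url: str) -> dict[str, float]: ...


class KeywordProvider:
    """Toy stand-in for the example; a real adapter would call an API."""

    def check_text(self, text: str) -> dict[str, float]:
        return {"unsafe": 1.0 if "forbidden" in text.lower() else 0.0}

    def check_image(self, url: str) -> dict[str, float]:
        return {"unsafe": 1.0 if url.endswith("/blocked.png") else 0.0}


class ModerationManager:
    def __init__(self, provider: ModerationProvider, threshold: float = 0.5) -> None:
        self._provider = provider
        self._threshold = threshold

    def is_safe(self, prompt: str, image_urls: list[str]) -> bool:
        results: list[dict[str, float]] = []
        if prompt.strip():
            results.append(self._provider.check_text(prompt))
        # Every image input is checked, not just the prompt.
        results.extend(self._provider.check_image(url) for url in image_urls)
        # Safe only if no category from any check reaches the threshold.
        return not any(
            score >= self._threshold
            for result in results
            for score in result.values()
        )
```

&lt;p&gt;A route handler would then call something like &lt;code&gt;manager.is_safe(prompt, image_urls)&lt;/code&gt; and never see provider category names or response formats.&lt;/p&gt;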

&lt;h2&gt;
  
  
  The practical takeaway
&lt;/h2&gt;

&lt;p&gt;The most useful shift in my thinking was this:&lt;/p&gt;

&lt;p&gt;Moderation is not a feature attached to generation. It is part of generation.&lt;/p&gt;

&lt;p&gt;Once I started treating it that way, the backend became easier to evolve. I could add checks for both prompt text and image inputs, make blocking decisions before credits were consumed, and keep provider-specific moderation details out of the rest of the app.&lt;/p&gt;

&lt;p&gt;I am using this approach while building &lt;a href="https://videoflux.video" rel="noopener noreferrer"&gt;videoflux.video&lt;/a&gt;, where one workflow needs to support AI image and video generation without assuming that a prompt alone tells the full safety story.&lt;/p&gt;

&lt;p&gt;Disclosure: I’m the builder of videoflux.video.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>security</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
