DEV Community

MORINAGA

5 things I noticed this week while shipping three programmatic SEO sites

Week 2 of running three AI-curated directory sites. Five things that caught me off-guard between Monday and Friday.

1. @astrojs/sitemap generates /sitemap-0.xml, not /sitemap-index.xml

I spent most of Tuesday assuming Google Search Console was silently ignoring my sitemap submission. It wasn't. On my smaller Astro sites, the file I found from the @astrojs/sitemap plugin was /sitemap-0.xml, the leaf file, not a sitemap index. GSC only fetches the exact URL you submit, so submitting the index URL gets you nothing unless that URL resolves; submitting the leaf directly works fine.

The fix is a single Cloudflare redirect rule (/sitemap-index.xml → /sitemap-0.xml), but you'll also want to re-submit the correct URL in GSC and re-ping IndexNow. The plugin behavior is correct by spec; it just doesn't match the pattern assumed in most Astro tutorials.
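
Before re-submitting, it helps to check which sitemap file the build actually produced instead of trusting a tutorial. A minimal sketch, assuming you can list the file names in the build output; `pickSitemapUrl` is a hypothetical helper, not part of Astro or the plugin:

```typescript
// Hypothetical helper: choose the sitemap URL to submit based on which
// files are really present in the build output directory.
function pickSitemapUrl(site: string, builtFiles: string[]): string | null {
  // Normalise so "https://example.com/" and "/sitemap-0.xml" both work.
  const base = site.replace(/\/+$/, "");
  const names = builtFiles.map((f) => f.replace(/^\/+/, ""));
  const has = (name: string): boolean => names.some((n) => n === name);

  // Prefer the index when it exists, otherwise fall back to the leaf file.
  if (has("sitemap-index.xml")) return `${base}/sitemap-index.xml`;
  if (has("sitemap-0.xml")) return `${base}/sitemap-0.xml`;
  return null; // no sitemap was generated at all
}
```

Feeding it the contents of the output directory gives you the URL to paste into GSC, and a `null` tells you the problem is upstream of Search Console.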

2. Cloudflare Pages returns HTTP 500 on pages with certain URL substrings

One of my sites stores open-source alternative listings that include DigitalOcean referral links. Every page containing the string m.do.co threw a silent HTTP 500 from Cloudflare's edge — no error logs, no detail, just 500. Local dev preview and Vercel were both fine.

The workaround was encoding the problematic substring in stored content before it reaches the HTML output. Isolating it took longer than it should have because the behavior only triggers on-edge. I've since migrated one site back off Cloudflare Pages to reduce the unknown footgun surface.
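
The post does not say which encoding was used, so the following is only one possible shape of that workaround. It assumes the trigger is a literal substring and that the content is emitted as raw HTML, where a numeric character reference for each dot still resolves to the same link in the browser. `encodeTrigger` and the example referral path are hypothetical:

```typescript
// Assumption: replacing "." with the HTML numeric character reference "&#46;"
// is enough to stop the literal substring from appearing in the response body.
function encodeTrigger(content: string, trigger: string): string {
  // Build the encoded form of the trigger itself, e.g. "m&#46;do&#46;co".
  const encoded = trigger.split(".").join("&#46;");
  // split/join replaces every occurrence without regex escaping concerns.
  return content.split(trigger).join(encoded);
}
```

This only holds if nothing downstream escapes the ampersand again; a templating layer that HTML-escapes the string would turn the reference into visible text and break the link.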

3. IndexNow + Wayback Machine is a free two-line workflow addition

After the sitemap gap, I wired two extra steps into the article publish GitHub Actions workflow:

  1. Ping IndexNow immediately on merge (covers Bing, Yandex, Naver)
  2. POST the article URL to the Wayback Machine public save API

The Wayback step creates a timestamped public record of when the content existed. Whether it influences Google's crawl schedule I genuinely don't know yet — I'll publish crawl data at the 30-day mark. Both steps are free and add about 3 seconds to total workflow time. The upside is a paper trail that proves publication date if a scraper syndicates the content first.
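
The two steps above could be sketched as a pair of request builders. The IndexNow payload follows the public protocol (a JSON POST with `host`, `key`, and `urlList`). The Wayback endpoint and form field are my assumption about the save API, which may also require authentication or apply rate limits, so check the Internet Archive docs before relying on it. `PlannedRequest` and both function names are hypothetical:

```typescript
// Plain description of an HTTP call, so the builders stay testable offline.
interface PlannedRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body: string;
}

// IndexNow: one JSON POST listing the changed URLs for a single host.
// The key must also be served as a text file on that host.
function buildIndexNowRequest(host: string, key: string, urls: string[]): PlannedRequest {
  return {
    url: "https://api.indexnow.org/indexnow",
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify({ host, key, urlList: urls }),
  };
}

// Wayback Machine: form-encoded POST with the article URL.
// Assumption: endpoint path and "url" field name; verify before use.
function buildWaybackSaveRequest(articleUrl: string): PlannedRequest {
  return {
    url: "https://web.archive.org/save",
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: "url=" + encodeURIComponent(articleUrl),
  };
}
```

In a workflow script each result can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping the builders pure means a failed ping never blocks the publish step unless you choose to make it fatal.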

4. You can add a second content source to a single-source directory without schema forks

My indie game directory was built entirely around the Steam API: appId as primary key, all metadata derived from Steam's schema. Adding itch.io games meant introducing a source that has no equivalent of appId.

The solution was two nullable additions: a source column (steam | itchio) and a landing_url column that stays empty for Steam entries, where the URL is computed from appId at render time, and is stored directly for itch.io. No ETL fork, no separate table. Week 2 isn't the right moment to build a multi-source abstraction layer: one table with two nullable paths works until it stops working, and that inflection point is later than you think.
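
A minimal sketch of what the render-time branch might look like. The column names `source` and `landing_url` come from the description above; the `GameRow` type, the `landingUrl` function, and the choice to treat a missing `source` as Steam (for rows that predate the column) are my assumptions:

```typescript
type Source = "steam" | "itchio";

// One row of the single games table, with the two nullable paths.
interface GameRow {
  source: Source | null;      // assumption: null on legacy rows means Steam
  appId: number | null;       // present for Steam rows only
  landing_url: string | null; // stored for itch.io rows only
}

function landingUrl(row: GameRow): string {
  const source: Source = row.source ?? "steam";
  if (source === "steam") {
    // Steam: derive the store page from appId at render time.
    if (row.appId === null) throw new Error("Steam row is missing appId");
    return `https://store.steampowered.com/app/${row.appId}/`;
  }
  // itch.io: there is no appId equivalent, so the URL is read as stored.
  if (row.landing_url === null) throw new Error("itch.io row is missing landing_url");
  return row.landing_url;
}
```

Throwing on a missing field keeps bad rows from silently rendering a broken link, which matters more once two sources share one table.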

5. Hashnode sets itself as canonical if you post there without originalArticleURL

When cross-posting to Dev.to and Hashnode, order and explicit fields matter. If you post to Hashnode first, or post without providing originalArticleURL, Hashnode treats its own URL as the canonical. Google may then index the Hashnode version as authoritative, which is not ideal if Dev.to is your intended primary channel.

Fix: in packages/publish/src/hashnode.ts, wait for the Dev.to publish response, extract its canonical_url, then pass it as originalArticleURL in the Hashnode GraphQL mutation. One extra await, one extra field. The Dev.to URL was already available in the pipeline — it just wasn't being passed downstream.
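
Roughly, the fix amounts to threading one field through. This is a sketch and not the actual contents of hashnode.ts: `canonical_url` and `originalArticleURL` are the names used above, while the `Draft` type, the other input fields, the function name, and the mutation text are assumptions to verify against Hashnode's current schema:

```typescript
// Subset of the Dev.to publish response that this step cares about.
interface DevtoArticleResponse {
  canonical_url: string;
}

// Hypothetical draft shape used by the publish pipeline.
interface Draft {
  title: string;
  contentMarkdown: string;
  publicationId: string;
}

// Assumed mutation shape; check it against the Hashnode GraphQL schema.
const PUBLISH_POST_MUTATION =
  "mutation PublishPost($input: PublishPostInput!) { publishPost(input: $input) { post { url } } }";

function buildHashnodeVariables(draft: Draft, devto: DevtoArticleResponse) {
  // Refuse to publish without a canonical, since that is the failure mode
  // that lets Hashnode claim the canonical for itself.
  if (!devto.canonical_url) {
    throw new Error("Dev.to response has no canonical_url; not publishing to Hashnode");
  }
  return {
    input: {
      title: draft.title,
      contentMarkdown: draft.contentMarkdown,
      publicationId: draft.publicationId,
      originalArticleURL: devto.canonical_url,
    },
  };
}
```

The caller awaits the Dev.to publish call first, then sends `PUBLISH_POST_MUTATION` with these variables. Failing loudly on an empty canonical is a design choice: a skipped cross-post is easier to recover from than a wrong canonical that Google has already indexed.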


Three more weeks before GSC data becomes meaningful. Next week I want to look at structured data validation — I suspect a few pages have malformed FAQ JSON-LD that's quietly excluded from rich result eligibility.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.
