I run three directory sites that display affiliate links, AdSense slots, and Amazon blocks — but only when the corresponding environment variables are set in Cloudflare Pages. When the variables aren't deployed, those sections simply don't render. No error. No broken layout. Just missing revenue, invisible unless you check.
This happened twice in the first month. A redeploy would go out without the affiliate env vars being re-applied. The site looked identical to a working version at a glance. Clicking around would eventually reveal the missing CTA, but only if you happened to land on the right page type.
I wrote scripts/check-affiliates.mjs to make that check fast and explicit.
What the script does
The script checks three sites in sequence. For each site, it:
- Fetches /ads.txt and checks that an AdSense publisher ID is present
- Fetches /sitemap-index.xml to find the sub-sitemap for detail pages
- Picks one detail URL from that sitemap
- Fetches that page's raw HTML
- Checks for specific strings that indicate each CTA is rendered
The output is a plain pass/fail report per site:
```text
→ aiappdex.com
  ads.txt: ✓ pub ID set
  sample: https://aiappdex.com/models/qwen2-7b/
  affiliate CTA "Run this model on": ✓ rendered
  AdSense slot: ✓ in HTML
  Amazon block: ✗ hidden (PUBLIC_AMAZON_TAG not deployed)
```
One line per check, one pass per site, one run to confirm three deployments.
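A report like the one above is easy to build once each site's checks are collected into a plain object. Here is a minimal sketch of such a formatter. The field names (`hasAdsensePub`, `sampleUrl`, `hasSection`, and so on) and the failure wording are hypothetical, not taken from the actual script. Only the line layout and the `PUBLIC_AMAZON_TAG` message come from the sample output.

```javascript
// Sketch of a per-site report formatter. Field names are hypothetical;
// only the line layout mirrors the sample output in this article.
function line(label, ok, passText, failText) {
  return `  ${label}: ${ok ? "✓" : "✗"} ${ok ? passText : failText}`;
}

function formatReport(r) {
  const lines = [`→ ${r.site}`];
  lines.push(line("ads.txt", r.hasAdsensePub, "pub ID set", "no pub ID found"));
  if (!r.sampleUrl) {
    // Without a detail URL the HTML checks cannot run at all.
    lines.push("  sample: ✗ no detail URL found in sitemap");
    return lines.join("\n");
  }
  lines.push(`  sample: ${r.sampleUrl}`);
  lines.push(line(`affiliate CTA "${r.section}"`, r.hasSection, "rendered", "hidden"));
  lines.push(line("AdSense slot", r.hasAdsense, "in HTML", "missing"));
  lines.push(line("Amazon block", r.hasAmazon, "rendered", "hidden (PUBLIC_AMAZON_TAG not deployed)"));
  return lines.join("\n");
}
```

Treating a missing sample URL as its own failure keeps a broken sitemap from reading as "all CTAs hidden".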
The sitemap crawl
Hardcoding a URL would make the check brittle — detail URLs change when slugs are regenerated, and I'd rather not maintain a separate list. Using the sitemap is more robust: it's the canonical URL source the site already generates.
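```javascript
// Find one detail-page URL under `prefix` (e.g. "/models/") via the site's sitemap.
async function pickFirstSlug(site, prefix) {
  // The sitemap index lists sub-sitemaps; this takes the first one listed.
  const sitemapRes = await fetch(`https://${site}/sitemap-index.xml`);
  if (!sitemapRes.ok) return null;
  const idx = await sitemapRes.text();
  const subSitemap = (idx.match(/<loc>([^<]+)<\/loc>/) ?? [])[1];
  if (!subSitemap) return null;

  // Collect every URL listed in that sub-sitemap.
  const subRes = await fetch(subSitemap);
  if (!subRes.ok) return null;
  const sub = await subRes.text();
  const urls = [...sub.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);

  // Keep a URL under the prefix, but not the prefix's own index page.
  const detail = urls.find((u) => u.includes(prefix) && !u.endsWith(prefix));
  return detail ?? null;
}
```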
It parses <loc> tags from the sitemap XML with a regex rather than a full XML parser. That's intentional — DOMParser isn't available in Node by default, and adding a dependency for sitemap parsing felt disproportionate. The regex works because sitemap XML format is structurally consistent; a more complex format would warrant a parser.
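One caveat with the regex approach: the sitemap protocol requires URLs to be entity-escaped, so a URL containing `&` appears inside `<loc>` as `&amp;`, and the raw capture keeps the escaped form. Here is a minimal sketch of a `<loc>` extractor that still avoids a parser but decodes the five predefined XML entities. `extractLocs` and `decodeXmlEntities` are hypothetical helper names, not part of the original script.

```javascript
// The five predefined XML entities. Sitemap URLs must be entity-escaped,
// so "&amp;" inside <loc> stands for a literal "&".
const XML_ENTITIES = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&apos;": "'" };

// Single-pass replace, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
function decodeXmlEntities(s) {
  return s.replace(/&(amp|lt|gt|quot|apos);/g, (m) => XML_ENTITIES[m]);
}

// Return every <loc> value in document order, trimmed and decoded.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => decodeXmlEntities(m[1].trim()));
}
```

Numeric character references such as `&#38;` are not handled here. Generated sitemaps rarely use them, but that is the point where a real XML parser starts to earn its keep.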
One thing to note: the function takes a prefix argument (like /models/ or /games/). That's how I distinguish detail pages from the index pages that also appear in the sitemap. I want a URL like /models/qwen2-7b/, not /models/.
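The `includes(prefix)` test matches the prefix anywhere in the URL string. A slightly stricter variant, sketched below, compares against the URL's pathname instead. With this check, `/models/` only matches paths that begin with it, and the bare index path is still skipped. `pickDetailUrl` is a hypothetical name. Apart from anchoring the match to the start of the path, it mirrors the original filter.

```javascript
// Pick the first URL whose path sits strictly below `prefix`:
// "/models/qwen2-7b/" qualifies for "/models/", but "/models/" itself does not.
function pickDetailUrl(urls, prefix) {
  for (const u of urls) {
    let path;
    try {
      path = new URL(u).pathname;
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    if (path.startsWith(prefix) && path.length > prefix.length) return u;
  }
  return null;
}
```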
Checking ads.txt and affiliate strings
The ads.txt check is a separate fetch, independent of the page HTML. It looks for the AdSense publisher ID pattern:
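```javascript
const adsTxtRes = await fetch(`https://${site}/ads.txt`);
// A non-2xx response counts as "no publisher ID", even if the error page mentions one.
const adsTxt = adsTxtRes.ok ? await adsTxtRes.text() : "";
// AdSense publisher IDs look like "pub-" followed by a long run of digits.
const hasAdsensePub = /pub-\d{10,}/.test(adsTxt);
```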
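The regex only confirms that some publisher ID appears somewhere in the file. For a stricter check, here is a minimal sketch based on the IAB ads.txt record format (`domain, publisher ID, relationship[, certification authority ID]`, where `#` starts a comment). It looks specifically for a `google.com` record marked `DIRECT`. `findAdsensePubIds` is a hypothetical helper, not part of the original script.

```javascript
// Return the publisher IDs of google.com DIRECT records in an ads.txt body.
// Record format: <domain>, <publisher ID>, <DIRECT|RESELLER>[, <cert authority ID>]
function findAdsensePubIds(adsTxt) {
  const ids = [];
  for (const rawLine of adsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments and surrounding whitespace
    if (!line) continue;
    const fields = line.split(",").map((f) => f.trim());
    if (fields.length < 3) continue; // not a data record (e.g. a "contact=" variable line)
    const [domain, pubId, relationship] = fields;
    if (
      domain.toLowerCase() === "google.com" &&
      relationship.toUpperCase() === "DIRECT" &&
      /^pub-\d+$/.test(pubId)
    ) {
      ids.push(pubId);
    }
  }
  return ids;
}
```

Comparing the returned ID against the one expected for the site would also catch an ads.txt copied from the wrong project.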
The HTML checks are string presence checks against the fetched page:
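```javascript
// `html` is the fetched detail page; `section` is the site's affiliate heading.
const hasSection = html.includes(section);
// AdSense: the ad-unit class and the publisher attribute must both be present.
const hasAdsense = html.includes("adsbygoogle") && html.includes("data-ad-client");
// Amazon: either the block's heading or an Amazon search link counts as rendered.
const hasAmazon = html.includes("Gear up on Amazon") || html.includes("amazon.com/s?k=");
```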
The section variable is site-specific. For aiappdex.com it's "Run this model on"; for findindiegame.com it's "Find on other stores"; for ossfind.com it's "Self-host on". These are heading strings that only appear in the rendered HTML when the relevant env var is set.
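Keeping the site-specific heading strings in one table next to the shared checks puts all the per-site differences in a single place. Here is a minimal sketch. The `SITES` table shape and the `checkHtml` name are hypothetical. The heading strings are the three named above, and the string tests are the ones from the previous snippet.

```javascript
// Site-specific affiliate headings (from the article); the table shape is hypothetical.
const SITES = [
  { site: "aiappdex.com", section: "Run this model on" },
  { site: "findindiegame.com", section: "Find on other stores" },
  { site: "ossfind.com", section: "Self-host on" },
];

// Apply the same presence checks as the article's snippet to one page's HTML.
function checkHtml(html, section) {
  return {
    hasSection: html.includes(section),
    hasAdsense: html.includes("adsbygoogle") && html.includes("data-ad-client"),
    hasAmazon: html.includes("Gear up on Amazon") || html.includes("amazon.com/s?k="),
  };
}
```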
For the affiliate CTAs, I deliberately check human-readable heading text rather than env var names or data attributes. If the heading renders, the CTA is live. If it's absent, something between the Cloudflare env var and the rendered page didn't connect, and the failure message tells me exactly which env var to check in Cloudflare.
Why this is better than a visual check
The pattern I was relying on before — opening a few pages after a deploy and eyeballing them — has two problems. First, it's slow across three sites with multiple CTA types. Second, it's unreliable at catching conditional rendering: an affiliate block that's absent looks the same as a block I intentionally disabled or that I haven't scrolled to yet.
A script that fetches programmatically, checks presence by string match, and reports pass/fail for each CTA type takes about two seconds and catches the failure unambiguously. The output is readable in a terminal and doesn't require loading a browser.
This connects to the same principle behind the three-tier content quality ladder: checks at different stages catch different things. Post-deploy verification catches deploy-time configuration problems. Pre-commit linting catches content problems. Neither replaces the other.
What I'd add
Right now the script only checks one sample page per site. A more thorough version would check one page from each content type per site — a model page, a compare page, an alternatives page — since some CTAs only render on specific page types. That would require more sitemap traversal but would catch more edge cases.
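Extending the sample to one page per content type mostly means running the prefix filter once per prefix over the same URL list. Here is a minimal sketch. It assumes the URLs from all sub-sitemaps have already been collected into one array. `/models/` comes from the article, while `/compare/` and `/alternatives/` are hypothetical prefixes that stand in for whatever paths those page types actually use.

```javascript
// For each prefix, pick the first URL whose path sits strictly below it.
// Returns a Map of prefix -> URL, or prefix -> null when no such page exists.
function pickOnePerPrefix(urls, prefixes) {
  const picks = new Map(prefixes.map((p) => [p, null]));
  for (const u of urls) {
    const path = new URL(u).pathname;
    for (const p of prefixes) {
      if (picks.get(p) === null && path.startsWith(p) && path.length > p.length) {
        picks.set(p, u);
      }
    }
  }
  return picks;
}
```

A `null` entry is worth reporting on its own, since it means the sitemap lists no page of that type.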
The output format is human-readable but not machine-parseable. If I wanted to hook this into CI and fail a deploy when a CTA is missing, I'd add a JSON output mode and return a non-zero exit code on any ✗. For now I run it manually after deploys — it takes less than ten seconds and the terminal output is enough.
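Here is a minimal sketch of that CI mode. It assumes the per-site results are collected as objects holding named boolean checks (the shape is hypothetical). `summarize` lists the failing checks. `report` prints JSON when a hypothetical `--json` flag is passed and sets `process.exitCode`, so Node exits non-zero once pending work finishes instead of being cut off by `process.exit()`.

```javascript
// Collect failing check names per site; any failure means the run should fail.
function summarize(results) {
  const failures = results.flatMap((r) =>
    Object.entries(r.checks)
      .filter(([, ok]) => !ok)
      .map(([name]) => `${r.site}: ${name}`)
  );
  return { ok: failures.length === 0, failures, results };
}

// Hypothetical CLI tail: emit JSON on --json and signal failure through the exit code.
function report(results, argv = process.argv) {
  const summary = summarize(results);
  if (argv.includes("--json")) {
    console.log(JSON.stringify(summary, null, 2));
  }
  if (!summary.ok) process.exitCode = 1;
  return summary;
}
```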
Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.