<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MORINAGA</title>
    <description>The latest articles on DEV Community by MORINAGA (@morinaga).</description>
    <link>https://dev.to/morinaga</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907455%2F8e6a4a13-bec8-4ec0-bc2d-ec192b7880f8.png</url>
      <title>DEV Community: MORINAGA</title>
      <link>https://dev.to/morinaga</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/morinaga"/>
    <language>en</language>
    <item>
      <title>How I kept 62 of 80 programmatic pages alive while hiding them from Google</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:17:34 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-kept-62-of-80-programmatic-pages-alive-while-hiding-them-from-google-1ao9</link>
      <guid>https://dev.to/morinaga/how-i-kept-62-of-80-programmatic-pages-alive-while-hiding-them-from-google-1ao9</guid>
      <description>&lt;p&gt;After my second AdSense rejection for scaled content, I had two options for the thin pages on &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt;: delete them and accept 404s on any inbound links, or keep them alive while hiding them from Google's quality evaluation. I chose the second.&lt;/p&gt;

&lt;p&gt;The reasoning: I have links pointing at some of these URLs — from earlier articles in this series, from social posts, from internal site navigation. A 404 would break all of them. The pages aren't &lt;em&gt;wrong&lt;/em&gt;, they're just thin. The correct signal to Google is "don't evaluate these" rather than "these don't exist."&lt;/p&gt;

&lt;h2&gt;
  
  
  The isCurated gate
&lt;/h2&gt;

&lt;p&gt;The gate lives in &lt;code&gt;apps/oss-alternatives/src/lib/curation.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CURATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;MIN_ALTERNATIVES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;MIN_TOP_STARS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;MIN_INTRO_LEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isCurated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SaasEntry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intro&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intro&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;CURATION&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MIN_INTRO_LEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alternatives&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;CURATION&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MIN_ALTERNATIVES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topStars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;alts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stars&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topStars&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;CURATION&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MIN_TOP_STARS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three conditions, all required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At least 4 open-source alternatives listed — a comparison page with fewer entries is barely a comparison&lt;/li&gt;
&lt;li&gt;Top alternative has 1,000+ GitHub stars — filters out obscure or unmaintained projects that don't demonstrate the category's depth&lt;/li&gt;
&lt;li&gt;Intro text is at least 80 characters — rules out the &lt;code&gt;fallback-template&lt;/code&gt; content that the &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;ETL quality ladder&lt;/a&gt; writes when Claude is unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are objective thresholds, not hand-picked entries. The gate runs automatically at every Astro build. Entries that gain another alternative or get a longer intro in the next ETL run will silently cross the threshold and become discoverable without any manual action.&lt;/p&gt;
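
&lt;p&gt;To see how the three thresholds interact, here is a minimal sketch that reports which conditions an entry fails. It is not the production gate: &lt;code&gt;isCurated&lt;/code&gt; returns on the first failure, while this hypothetical &lt;code&gt;curationFailures&lt;/code&gt; helper collects all of them for debugging. The &lt;code&gt;EntryLike&lt;/code&gt; shape and both sample entries are made up for illustration; only the threshold values come from the code above.&lt;/p&gt;

```typescript
// Hypothetical entry shape: only the fields the gate reads.
interface EntryLike {
  slug: string;
  intro?: string;
  alternatives?: { stars?: number }[];
}

// Threshold values copied from the CURATION constant shown in the article.
const THRESHOLDS = { MIN_ALTERNATIVES: 4, MIN_TOP_STARS: 1000, MIN_INTRO_LEN: 80 };

// Returns every failed condition; an empty list means the entry would be curated.
function curationFailures(e: EntryLike): string[] {
  const failures: string[] = [];
  const introLen = (e.intro ?? "").length;
  if (THRESHOLDS.MIN_INTRO_LEN > introLen) failures.push("intro too short");
  const alts = e.alternatives ?? [];
  if (THRESHOLDS.MIN_ALTERNATIVES > alts.length) failures.push("too few alternatives");
  const topStars = alts.reduce((m, a) => Math.max(m, a.stars ?? 0), 0);
  if (THRESHOLDS.MIN_TOP_STARS > topStars) failures.push("top alternative below star floor");
  return failures;
}

// Made-up sample entries, not real site data.
const thin: EntryLike = { slug: "sample-thin", intro: "Short.", alternatives: [{ stars: 5000 }] };
const thick: EntryLike = {
  slug: "sample-thick",
  intro: "x".repeat(90),
  alternatives: [{ stars: 12000 }, { stars: 800 }, { stars: 300 }, {}],
};

console.log(thin.slug, curationFailures(thin)); // fails the intro and alternatives checks
console.log(thick.slug, curationFailures(thick)); // passes all three checks
```

&lt;p&gt;A report like this could run in the ETL logs to show how far each hidden entry is from crossing the threshold.&lt;/p&gt;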

&lt;p&gt;Currently: 18 of 80 entries pass. That's the real data state, not a target. The nightly ETL upgrades entries progressively; the curated count will grow as the content improves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the gate lives in its own module
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;saas.ts&lt;/code&gt; — where the main data access code lives — imports &lt;code&gt;@libsql/client&lt;/code&gt; to query Turso. Any module that imports &lt;code&gt;saas.ts&lt;/code&gt; at the value level picks up that dependency. Astro's static page bundles can't include server-only DB dependencies, so they'd fail to build.&lt;/p&gt;

&lt;p&gt;The solution: &lt;code&gt;curation.ts&lt;/code&gt; imports &lt;em&gt;only types&lt;/em&gt; from &lt;code&gt;saas.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SaasEntry&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./saas.ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TypeScript erases type imports at compile time. At runtime, &lt;code&gt;curation.ts&lt;/code&gt; has no external dependencies — it's a pure computation module that Astro can safely include in static page bundles. &lt;code&gt;saas.ts&lt;/code&gt; stays server-side-only, imported only in &lt;code&gt;getStaticPaths&lt;/code&gt; where the DB dependency is expected.&lt;/p&gt;

&lt;p&gt;This split-by-dependency-type pattern comes up regularly in Astro monorepos. Anything that touches a runtime external goes server-side; the pure logic you need in both places gets its own module.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four discovery surfaces gated on the same function
&lt;/h2&gt;

&lt;p&gt;A page being "hidden" means four things happen simultaneously:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;noindex&lt;/code&gt; meta tag&lt;/strong&gt; — &lt;code&gt;Base.astro&lt;/code&gt; checks &lt;code&gt;isCurated(entry)&lt;/code&gt; and adds &lt;code&gt;&amp;lt;meta name="robots" content="noindex, nofollow"&amp;gt;&lt;/code&gt; for entries that don't pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sitemap exclusion&lt;/strong&gt; — &lt;code&gt;astro.config.mjs&lt;/code&gt; has a sitemap filter applying the same threshold logic. This is the one awkward part: &lt;code&gt;astro.config.mjs&lt;/code&gt; can't import from &lt;code&gt;src/&lt;/code&gt;, so the threshold values are duplicated. I put &lt;code&gt;// KEEP IN SYNC: curation.ts&lt;/code&gt; on both. Changing the thresholds in one place without updating the other would produce a sitemap that disagrees with the noindex tags — some pages would be submitted to Google while simultaneously declaring &lt;code&gt;noindex&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. RSS feed&lt;/strong&gt; — the feed only includes curated entries. Non-curated pages won't surface in feed readers as new content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Internal navigation&lt;/strong&gt; — homepage category cards, footer category links, breadcrumb paths, and "related alternatives" widgets all filter through &lt;code&gt;isCurated&lt;/code&gt;. A direct link from outside the site still reaches the page. But browsing the site organically won't surface non-curated entries.&lt;/p&gt;
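
&lt;p&gt;For the sitemap surface, a rough sketch of the filter idea follows. The &lt;code&gt;@astrojs/sitemap&lt;/code&gt; integration accepts a &lt;code&gt;filter&lt;/code&gt; option that receives each page URL and returns a boolean; everything else here is an assumption for illustration: the &lt;code&gt;SITE&lt;/code&gt; origin, the trailing-slash URL layout, the &lt;code&gt;makeSitemapFilter&lt;/code&gt; helper, and the sample slug.&lt;/p&gt;

```typescript
// Hypothetical site origin; the real value would come from astro.config.mjs.
const SITE = "https://example.com";

// Builds a predicate that could be passed as a sitemap `filter` option.
// It returns false for every URL that belongs to a non-curated slug.
// Assumes pages are emitted as `${SITE}/${slug}/` with a trailing slash.
function makeSitemapFilter(hiddenSlugs: string[]) {
  const hidden = new Set(hiddenSlugs.map((slug) => `${SITE}/${slug}/`));
  return (page: string) => !hidden.has(page);
}

// Made-up slug for illustration.
const sitemapFilter = makeSitemapFilter(["sample-thin-entry"]);

console.log(sitemapFilter(`${SITE}/sample-thin-entry/`)); // false: left out of the sitemap
console.log(sitemapFilter(`${SITE}/about/`)); // true: unaffected pages stay in
```

&lt;p&gt;Filtering on an explicit deny-list keeps pages that are not entries at all, such as static pages, in the sitemap by default.&lt;/p&gt;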

&lt;h2&gt;
  
  
  The category layer
&lt;/h2&gt;

&lt;p&gt;Categories follow the same logic. A category is only indexable if it has at least two curated entries (&lt;code&gt;CATEGORY_MIN_CURATED = 2&lt;/code&gt;). Categories below that threshold still generate pages — preserving any external links to category URLs — but they're &lt;code&gt;noindex&lt;/code&gt; and excluded from the sitemap, homepage, and footer navigation.&lt;/p&gt;
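
&lt;p&gt;The category rule can be sketched as a count over gated entries. The &lt;code&gt;GatedEntry&lt;/code&gt; shape, the &lt;code&gt;indexableCategories&lt;/code&gt; helper, and the sample data are hypothetical; only &lt;code&gt;CATEGORY_MIN_CURATED = 2&lt;/code&gt; comes from the text above.&lt;/p&gt;

```typescript
// Threshold named in the article.
const CATEGORY_MIN_CURATED = 2;

// Hypothetical minimal shape: a category slug plus the result of isCurated().
interface GatedEntry {
  category: string;
  curated: boolean;
}

// Returns the category slugs that have enough curated entries to be indexable.
function indexableCategories(entries: GatedEntry[]): string[] {
  const counts: { [category: string]: number } = {};
  for (const e of entries) {
    if (e.curated) counts[e.category] = (counts[e.category] ?? 0) + 1;
  }
  return Object.keys(counts).filter((c) => (counts[c] ?? 0) >= CATEGORY_MIN_CURATED);
}

// Made-up data: two curated entries in one category, one in another.
const sampleEntries: GatedEntry[] = [
  { category: "customer-support", curated: true },
  { category: "customer-support", curated: true },
  { category: "design", curated: true },
  { category: "design", curated: false },
];

console.log(indexableCategories(sampleEntries)); // only customer-support qualifies
```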

&lt;p&gt;Right now, only one category (&lt;code&gt;customer-support&lt;/code&gt;) meets the threshold. That's the honest state of the data: the site has broad coverage but thin editorial depth across most categories. As the ETL runs and more entries cross the curation threshold, more categories will become indexable automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes automatically
&lt;/h2&gt;

&lt;p&gt;The gate is deterministic and evaluated at build time from live DB data. When &lt;code&gt;foss-alternative-to-figma&lt;/code&gt; gains its fourth alternative and Claude Haiku generates a 90-character intro in the next nightly run, the following Astro build will automatically include it in the sitemap, remove its &lt;code&gt;noindex&lt;/code&gt; tag, and add it to the relevant category card and footer link, provided its top alternative already has at least 1,000 stars.&lt;/p&gt;

&lt;p&gt;The only thing that doesn't update automatically is the duplicate threshold in &lt;code&gt;astro.config.mjs&lt;/code&gt;. I'll eventually extract the constants to a shared JSON file that both &lt;code&gt;curation.ts&lt;/code&gt; and &lt;code&gt;astro.config.mjs&lt;/code&gt; read, eliminating the sync risk. For now the comment is the guard.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>astro</category>
      <category>typescript</category>
      <category>webdev</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>Why I'm abandoning AdSense on two sites and betting on affiliate monetization</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:16:50 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-abandoning-adsense-on-two-sites-and-betting-on-affiliate-monetization-5hlc</link>
      <guid>https://dev.to/morinaga/why-im-abandoning-adsense-on-two-sites-and-betting-on-affiliate-monetization-5hlc</guid>
      <description>&lt;p&gt;The first AdSense rejection was predictable. I'd launched &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three directory sites&lt;/a&gt; on Vercel and hadn't added custom domains immediately. &lt;a href="https://dev.to/articles/why-adsense-rejects-vercel-subdomain-sites"&gt;Google won't approve a *.vercel.app site&lt;/a&gt; — the subdomain pattern can't carry a credible publisher identity and the policy requirement for a real contact address on the privacy page can't be met on a free subdomain.&lt;/p&gt;

&lt;p&gt;Custom domains fixed that. I resubmitted.&lt;/p&gt;

&lt;p&gt;Two weeks later: rejected again. This time for "valuable inventory," which is AdSense's way of saying the content doesn't meet the quality bar they need to place ads against. The reviewer flagged scaled content. &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; has 80 pages for 80 different paid tools. Even though Claude Haiku generates genuine editorial text for each one, &lt;a href="https://dev.to/articles/astro-slug-pages-unique-after-adsense-scaled-content-abuse"&gt;the programmatic pattern triggered AdSense's classifier&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That second rejection forced me to actually run the economics I'd been deferring.&lt;/p&gt;

&lt;h2&gt;
  
  
  The asymmetry between affiliate and AdSense for a zero-traffic site
&lt;/h2&gt;

&lt;p&gt;AdSense has an approval gate. Affiliate programs don't.&lt;/p&gt;

&lt;p&gt;For a site in month one, that asymmetry is the entire decision. Display ad revenue on a brand-new site with essentially no traffic is effectively zero regardless of whether you're approved — there's nothing to monetize. The path to positive earnings requires: getting approved, building traffic, then earning CPM-based revenue at scale.&lt;/p&gt;

&lt;p&gt;Affiliate revenue has no approval step. The first conversion earns commission the day the link is live. The earning curve is still terrible at low traffic, but the timeline starts earlier.&lt;/p&gt;

&lt;p&gt;I've been deliberately honest in this series about not having numbers to report yet. The sites launched April 23, 2026; I'll publish month-one metrics in June. But the structural argument for pivoting now — before I have revenue data — is that the two monetization models have different minimum viable conditions. AdSense requires approval. Affiliate requires a user who clicks and buys.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I didn't pivot all three sites
&lt;/h2&gt;

&lt;p&gt;Three sites, three different audiences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Site&lt;/th&gt;
&lt;th&gt;Primary intent&lt;/th&gt;
&lt;th&gt;Monetization strategy&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Discover and adopt AI tools&lt;/td&gt;
&lt;td&gt;Affiliate (Amazon, SaaS programs)&lt;/td&gt;
&lt;td&gt;Purchase intent — evaluating paid tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Find similar indie games&lt;/td&gt;
&lt;td&gt;Affiliate (Steam, Humble Bundle)&lt;/td&gt;
&lt;td&gt;Purchase intent — close to a buy decision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Replace paid software with open-source&lt;/td&gt;
&lt;td&gt;AdSense (when approved)&lt;/td&gt;
&lt;td&gt;Anti-purchase intent — display ads monetize page views&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pivot logic turns on purchase intent. Someone browsing AI tools is probably evaluating whether to pay for a Pro plan. Someone looking for games similar to one they liked is close to a Steam purchase. Affiliate commissions trigger on exactly those decisions — the user was already considering the purchase.&lt;/p&gt;

&lt;p&gt;The OSS alternatives audience is explicitly trying to &lt;em&gt;not&lt;/em&gt; spend money. An affiliate link for "buy the paid version you were trying to avoid" is a misalignment. Display ads monetize the page view regardless of purchase intent, so AdSense is the structurally correct model for &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — when the editorial quality clears approval.&lt;/p&gt;

&lt;p&gt;This means ossfind stays on the quality-improvement track. I'm implementing a &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;content quality gate&lt;/a&gt; that limits which pages are indexable, reducing the scaled-content signal that triggered the rejection. The target: resubmit with a smaller set of genuinely thick pages and the rest marked noindex.&lt;/p&gt;

&lt;h2&gt;
  
  
  The implementation: monetization mode as env var, not deletion
&lt;/h2&gt;

&lt;p&gt;The cleanest part of this pivot was choosing not to delete the AdSense components. Deletion would make the decision permanent before I have revenue data. Instead I added a &lt;code&gt;PUBLIC_MONETIZATION_MODE&lt;/code&gt; env var to the shared monetization package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;MonetizationMode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsense&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;affiliate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getMonetization&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;MonetizationConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MonetizationMode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PUBLIC_MONETIZATION_MODE&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsense&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsense&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;affiliate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// AdSense only renders when mode=adsense AND client ID is set.&lt;/span&gt;
      &lt;span class="c1"&gt;// Default "affiliate" means env leftovers can't accidentally surface ads.&lt;/span&gt;
      &lt;span class="na"&gt;ads&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsense&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;adsenseClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amazon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;amazonTag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default is &lt;code&gt;"affiliate"&lt;/code&gt;. If I forget to set the env var on a new deployment, AdSense doesn't accidentally appear and damage my publisher account reputation. To re-enable AdSense on ossfind when the quality work is done, it's one env var change in the Cloudflare Pages dashboard.&lt;/p&gt;
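
&lt;p&gt;The fallback behaviour is easiest to see in isolation. This is a minimal sketch that takes the raw values as arguments instead of reading the environment; &lt;code&gt;resolveMode&lt;/code&gt;, &lt;code&gt;adsEnabled&lt;/code&gt;, and the placeholder client ID are hypothetical names, and the logic simply mirrors the ternary and the &lt;code&gt;ads&lt;/code&gt; flag shown above.&lt;/p&gt;

```typescript
type Mode = "adsense" | "affiliate";

// Mirrors the ternary in getMonetization(): anything other than the exact
// string "adsense" resolves to the safe default.
function resolveMode(raw: string | undefined): Mode {
  return raw === "adsense" ? "adsense" : "affiliate";
}

// Ads additionally require a non-empty client ID, as in the `enabled.ads` flag.
function adsEnabled(raw: string | undefined, adsenseClient: string | undefined): boolean {
  return resolveMode(raw) === "adsense" ? !!adsenseClient : false;
}

console.log(resolveMode(undefined)); // affiliate
console.log(resolveMode("AdSense")); // affiliate (the comparison is case-sensitive)
console.log(adsEnabled("adsense", undefined)); // false
console.log(adsEnabled("adsense", "ca-pub-0000000000000000")); // true (placeholder ID)
```

&lt;p&gt;One consequence of the strict comparison: a typo or different casing in the env var also falls back to affiliate mode, so a misconfiguration fails toward no ads.&lt;/p&gt;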

&lt;p&gt;This is the same "safe default" principle I apply elsewhere in the stack — the &lt;a href="https://dev.to/articles/jsonld-audit-post-deploy-ci"&gt;post-deploy JSON-LD audit&lt;/a&gt; ensures broken structured data can't reach Google undetected; the monetization default ensures AdSense can't appear on a site that hasn't been approved.&lt;/p&gt;

&lt;p&gt;I also added affiliate disclosure pages to both pivoted sites. The FTC requires disclosure when affiliate links appear; Amazon Associates adds its own ToS requirement. Each site now has &lt;code&gt;/affiliate-disclosure&lt;/code&gt; with a footer link. The copy renders from the &lt;code&gt;shared/legal&lt;/code&gt; package using a &lt;code&gt;privacyPolicy(site, { ads })&lt;/code&gt; function that switches between AdSense and affiliate text based on the monetization config. One source of truth for both modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What affiliate programs I'm actually using
&lt;/h2&gt;

&lt;p&gt;For &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Associates&lt;/strong&gt; — primarily for AI-adjacent hardware (GPUs for local inference, books on practical ML) and tools that have physical product lines. Not every AI tool maps to an Amazon purchase, so this is supplementary coverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct SaaS programs&lt;/strong&gt; — a handful of tools in the directory offer 20-30% recurring commission through their own partner programs. I'm applying to these individually. Slower to set up but higher per-conversion yield than Amazon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Humble Bundle Partner&lt;/strong&gt; — covers Steam purchases through the Humble store. The commission on game sales is modest but consistent with audience behavior on a discovery site.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;itch.io&lt;/strong&gt; — no formal affiliate program. I link directly with no commission. Dropping itch games from the site to avoid the zero-commission awkwardness would be the wrong call; the indie-game audience expects to see itch alongside Steam.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm not using broad affiliate networks (CJ Affiliate, ShareASale) yet. At near-zero traffic, the compliance overhead isn't worth the incremental coverage. I'll add them when the sites hit meaningful monthly traffic volume — I'll know that threshold when I see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The falsifiable bet
&lt;/h2&gt;

&lt;p&gt;By November 2026 — six months from launch — affiliate revenue on Top AI Tools and Find Games Like combined will exceed my estimate of what AdSense would have earned if approved on both sites.&lt;/p&gt;

&lt;p&gt;My AdSense estimate is: display ad CPM on a new directory site (low traffic tier) × (monthly page views ÷ 1,000) ≈ single-digit dollars per month per site in the early phase. Affiliate target: one to two conversions per month at modest commission values per conversion ≈ comparable range, with no approval delay.&lt;/p&gt;

&lt;p&gt;The ranges overlap at low traffic. I'm not betting affiliate earns dramatically more. I'm betting it earns &lt;em&gt;at least as much as AdSense would have, faster, without the approval lag and quality-work costs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What would change my mind:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ossfind gets AdSense approved and earns substantially more per month than the other two sites combined via affiliate — that would signal the approval path has better unit economics than I modeled&lt;/li&gt;
&lt;li&gt;A SaaS affiliate program rejects my application or adds compliance requirements that would distort editorial recommendations (I won't link to something I wouldn't recommend regardless of commission)&lt;/li&gt;
&lt;li&gt;Traffic doesn't materialize on either site by month six — in which case the &lt;a href="https://dev.to/articles/ai-directories-vs-google-ai-overviews-bet"&gt;AI Overviews bet&lt;/a&gt; failed at a more fundamental level than the monetization question&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The update I'll actually publish
&lt;/h2&gt;

&lt;p&gt;I said &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;in the initial architecture post&lt;/a&gt; that I'd publish real numbers at 30 and 60 days. That post is due in late May 2026 for the first set.&lt;/p&gt;

&lt;p&gt;The metrics will include affiliate clicks and conversions broken down by site, AdSense quality-work progress on ossfind (measured by curated page count), and any Search Console signals worth sharing. I won't rationalize zero conversions as "still early" past month two. If the affiliate model isn't showing any signal by July, I'll say so and revisit.&lt;/p&gt;

&lt;p&gt;The honest current state: affiliate earnings are $0 and AdSense is not running on any of the three sites. That's the baseline. Everything else is probability estimates based on the structural arguments above.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can affiliate programs earn anything at very low traffic?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Technically yes, but it's rounding error until you reach a few hundred monthly visitors from high-intent queries. With conversion rates in the low single-digit percent range, you need consistent traffic before the commission math produces anything worth reporting. This is why month-one data won't be meaningful, and the same is true for AdSense. Both models need traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not run both AdSense and affiliate on the same site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AdSense policy allows affiliate links alongside display ads. But I'd rather keep ossfind as a clean AdSense application without the affiliate complexity for the reviewer to evaluate. Cleaner separation; easier to debug which factor drove any future rejection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you switch back to AdSense on the pivoted sites later?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The implementation is one env var change. I specifically chose this pattern so no decision is permanent until the revenue data says it should be. If ossfind earns well under AdSense and the affiliate hypothesis turns out wrong, reversing either pivot is a ten-second config change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why keep ossfind on the AdSense track after two rejections?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The rejections were site-level, not account-level. The publisher account is in good standing. And the structural reason remains: the OSS-alternatives audience isn't buying — they're avoiding buying. Affiliate commission requires a purchase. Display ads monetize the visit. AdSense is the right model for that site if I can get the editorial quality to pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When do you expect to resubmit ossfind?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After I get the curated page count above 30. Currently at 18. Each nightly ETL run that generates real Claude Haiku content moves more entries across the threshold. I'm not setting a calendar date — I'll resubmit when the data supports it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://dev.to/articles/ai-directories-vs-google-ai-overviews-bet"&gt;Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:16:37 +0000</pubDate>
      <link>https://dev.to/morinaga/three-sleep-intervals-for-three-apis-steam-250ms-github-100ms-huggingface-none-fbo</link>
      <guid>https://dev.to/morinaga/three-sleep-intervals-for-three-apis-steam-250ms-github-100ms-huggingface-none-fbo</guid>
      <description>&lt;p&gt;When I built the ETL pipelines for three programmatic directory sites in April — Top AI Tools (HuggingFace data), Find Games Like (Steam data), and Open Alternative To (GitHub data) — I had to figure out rate limits for three completely different APIs in the same week. The numbers, the failure modes, and the right way to handle errors are all different.&lt;/p&gt;

&lt;p&gt;Here's what I actually shipped and the reasoning behind each number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steam: 250ms, deliberately aggressive
&lt;/h2&gt;

&lt;p&gt;Steam's developer docs are sparse on hard rate-limit specifics. What I found from community discussion and trial: roughly 200 requests per 5 minutes per IP on the public Web API, which works out to one request per 1.5 seconds as a safe interval. My code comment says so openly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Steam rate limit: ~200/5min, 1.5s is safe; 250ms is aggressive but usually fine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I chose 250ms anyway because the ETL runs as a nightly GitHub Actions job over ~60 game entries. At 250ms that's 15 seconds of sleep total. At 1.5 seconds it would be 90 seconds. The gap matters when the cron has three sites to process.&lt;/p&gt;

&lt;p&gt;The acceptable risk: Steam doesn't hard-ban on the first rate-limit violation; it returns HTTP 429, and the job logs the error. The games ETL treats review-endpoint failures as non-fatal. The game row is still written; only the review stats are missing until the next run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getAppReviewSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;appid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// ... write to DB&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;reviewsFailed&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`! Review fetch failed for appid &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;appid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;reviewsFailed&lt;/code&gt; counter appears in the job log. If I see it climbing consistently, that's the signal to increase the sleep interval. So far I haven't needed to.&lt;/p&gt;
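
&lt;p&gt;Put together, the paced loop might look roughly like the sketch below. Only &lt;code&gt;sleep(250)&lt;/code&gt; and the &lt;code&gt;reviewsFailed&lt;/code&gt; counter come from the snippets above; &lt;code&gt;fetchAllReviews&lt;/code&gt;, its parameters, and the returned summary object are hypothetical names used for illustration.&lt;/p&gt;

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: call `fetchReviews` once per appid, pausing between calls.
// A failed call is counted and logged, never rethrown, so the batch keeps going.
async function fetchAllReviews(
  appids: number[],
  fetchReviews: (appid: number) => unknown,
  intervalMs = 250,
) {
  let reviewsFailed = 0;
  for (const appid of appids) {
    try {
      await fetchReviews(appid);
    } catch (err) {
      reviewsFailed++;
      console.error(`! Review fetch failed for appid ${appid}:`, err);
    }
    await sleep(intervalMs); // fixed pacing; raise this if the counter keeps climbing
  }
  return { attempted: appids.length, reviewsFailed };
}
```

&lt;p&gt;Returning the counter alongside the number of attempts makes the failure ratio easy to print at the end of the job log.&lt;/p&gt;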

&lt;h2&gt;
  
  
  GitHub: 100ms, with authentication doing the real work
&lt;/h2&gt;

&lt;p&gt;GitHub's REST API is explicit about limits: 60 requests per hour unauthenticated, 5,000 per hour with a personal access token. The &lt;a href="https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api" rel="noopener noreferrer"&gt;GitHub docs on rate limiting&lt;/a&gt; cover both the primary limit and the secondary limits for specific endpoint categories. The OSS alternatives ETL makes one &lt;code&gt;GET /repos/:owner/:repo&lt;/code&gt; call per alternative project — roughly 3–5 repos per SaaS tool in the seed data. Even a large seed run of 50 tools with 5 alternatives each is only 250 requests.&lt;/p&gt;

&lt;p&gt;The sleep is there as a politeness interval, but authentication is doing the real rate-limit work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;authHeaders&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.github+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-GitHub-Api-Version&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2022-11-28&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Authorization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GITHUB_TOKEN&lt;/code&gt; is set in GitHub Actions from a repository secret. Without it, the 60-requests-per-hour allowance would be exhausted in under a minute of a full seed run. With it, the 5,000/hour ceiling gives comfortable headroom.&lt;/p&gt;

&lt;p&gt;One subtlety: GitHub enforces two separate rate limits, the core REST API limit (5,000/hour authenticated) and the search API limit (10 requests per minute unauthenticated, 30 per minute authenticated). The current ETL uses &lt;code&gt;GET /repos/:owner/:repo&lt;/code&gt; directly, not search, so the looser core limit applies. If I ever switch to search-based discovery the math changes.&lt;/p&gt;
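
&lt;p&gt;Whichever limit applies, GitHub reports the remaining budget in response headers, so a run does not have to guess. The sketch below is one possible way to turn those headers into a wait time. The header names (&lt;code&gt;retry-after&lt;/code&gt;, &lt;code&gt;x-ratelimit-remaining&lt;/code&gt;, &lt;code&gt;x-ratelimit-reset&lt;/code&gt;) are GitHub's; the function name &lt;code&gt;msUntilSafe&lt;/code&gt; and the plain-object header shape are assumptions, and this is not code from the ETL.&lt;/p&gt;

```typescript
// Hypothetical helper: how long to pause before the next GitHub request.
// `headers` is a plain object with lowercase header names.
function msUntilSafe(
  headers: { [name: string]: string | undefined },
  nowMs: number,
) {
  // Secondary limits: GitHub may send retry-after (in seconds).
  const retryAfterSeconds = Number(headers["retry-after"]);
  if (retryAfterSeconds > 0) return retryAfterSeconds * 1000;

  // Primary limit: when nothing remains, wait until the reset time (epoch seconds).
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetEpochSeconds = Number(headers["x-ratelimit-reset"]);
  if (remaining === 0) {
    if (resetEpochSeconds > 0) {
      return Math.max(0, resetEpochSeconds * 1000 - nowMs);
    }
  }
  return 0; // budget left, or headers missing: no extra wait
}
```

&lt;p&gt;Passing &lt;code&gt;nowMs&lt;/code&gt; in explicitly, instead of calling &lt;code&gt;Date.now()&lt;/code&gt; inside, keeps the function easy to test.&lt;/p&gt;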

&lt;h2&gt;
  
  
  HuggingFace: no sleep, because none is needed
&lt;/h2&gt;

&lt;p&gt;The model registry API (listing models, fetching model metadata) has no hard documented rate limit that I know of, and I haven't hit one in weeks of nightly runs. The ETL fetches up to 100 models in one &lt;code&gt;GET /api/models?limit=100&amp;amp;sort=downloads&lt;/code&gt; call, then one detailed fetch per model. That's about 100 rapid-fire requests, no sleep, no 429s.&lt;/p&gt;

&lt;p&gt;Part of this is the &lt;code&gt;HUGGINGFACE_TOKEN&lt;/code&gt; header in authenticated requests, which raises whatever ceiling exists. Part of it is that the registry API is explicitly designed for automated tooling at batch scale — it's the primary way model cards, metadata scrapers, and leaderboard tools consume the catalog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;authHeaders&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HUGGINGFACE_TOKEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I scale to 1,000 models per nightly fetch I'd add a 50ms sleep as a precaution. For 100, the simplest thing that works is also the correct thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  A comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Sleep&lt;/th&gt;
&lt;th&gt;Auth impact&lt;/th&gt;
&lt;th&gt;Failure mode&lt;/th&gt;
&lt;th&gt;Fatal?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Steam appdetails&lt;/td&gt;
&lt;td&gt;250ms&lt;/td&gt;
&lt;td&gt;None (public)&lt;/td&gt;
&lt;td&gt;429, occasional&lt;/td&gt;
&lt;td&gt;Non-fatal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steam reviews&lt;/td&gt;
&lt;td&gt;250ms (shared)&lt;/td&gt;
&lt;td&gt;None (public)&lt;/td&gt;
&lt;td&gt;429, more frequent&lt;/td&gt;
&lt;td&gt;Non-fatal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub REST&lt;/td&gt;
&lt;td&gt;100ms&lt;/td&gt;
&lt;td&gt;60→5,000/hr&lt;/td&gt;
&lt;td&gt;403, clear message&lt;/td&gt;
&lt;td&gt;Non-fatal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HuggingFace registry&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Raises ceiling&lt;/td&gt;
&lt;td&gt;Rare 429&lt;/td&gt;
&lt;td&gt;Non-fatal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All four code paths are non-fatal. A 429 or connection error anywhere in the batch writes a fallback-template row to Turso and increments a counter. The &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;content upgrade loop&lt;/a&gt; picks up any gaps the next night.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern that matters
&lt;/h2&gt;

&lt;p&gt;The sleep interval is a guess. What actually protects the ETL from being useless after a rate-limit event is that failures are cheap. Every external API call in this stack is wrapped in a try/catch that writes degraded content rather than crashing the batch. The sleep interval controls how likely you are to hit a rate limit; the fallback chain controls what happens when you do.&lt;/p&gt;

&lt;p&gt;For indie-scale ETL — tens to hundreds of entries per night — the combination of a conservative-ish sleep and a non-fatal error path is enough. If the site grows to thousands of entries per run, I'd rethink both: moving to a queue-bounded concurrent fetcher with exponential backoff, and separating the content generation from the data fetch into stages that can be retried independently.&lt;/p&gt;
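
&lt;p&gt;The backoff half of that idea could be sketched as follows. This is a minimal illustration, not something running in the current ETL; &lt;code&gt;withBackoff&lt;/code&gt;, its default attempt count, and its base delay are all hypothetical choices.&lt;/p&gt;

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `task`, retrying on failure with exponentially growing delays.
// After `maxAttempts` failures the last error is rethrown so the caller can fall back.
async function withBackoff(
  task: () => unknown,
  maxAttempts = 4,
  baseDelayMs = 500,
) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Delay doubles each attempt (base, 2x, 4x, ...) plus random jitter of up to one
      // base delay, so parallel workers do not all retry at the same instant.
      const delayMs = baseDelayMs * 2 ** (attempt - 1) + Math.random() * baseDelayMs;
      await sleep(delayMs);
    }
  }
}
```

&lt;p&gt;Because the final failure is rethrown, an outer try/catch around the call can still write degraded content, so the non-fatal behavior stays intact.&lt;/p&gt;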

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>programming</category>
      <category>ai</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How I built a three-tier content quality ladder for programmatic directory ETL</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 22:20:34 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-a-three-tier-content-quality-ladder-for-programmatic-directory-etl-30b2</link>
      <guid>https://dev.to/morinaga/how-i-built-a-three-tier-content-quality-ladder-for-programmatic-directory-etl-30b2</guid>
      <description>&lt;p&gt;The three directory sites I launched in April — Top AI Tools, Find Games Like, and Open Alternative To — all generate editorial content the same way: fetch metadata from an external API, send it through Claude Haiku 4.5, write the result to Turso. But that description skips the part that actually matters for a programmatic site at scale: what happens when Claude can't run.&lt;/p&gt;

&lt;p&gt;The answer is a content quality ladder with three tiers, tracked by a single &lt;code&gt;model_used&lt;/code&gt; column.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three tiers
&lt;/h2&gt;

&lt;p&gt;Every content table across all three sites has a &lt;code&gt;model_used&lt;/code&gt; column. It takes one of three values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Origin&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seeded-from-json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Loaded from a curated JSON file at bootstrap&lt;/td&gt;
&lt;td&gt;Minimal — structured but thin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fallback-template&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude unavailable or API key absent&lt;/td&gt;
&lt;td&gt;Acceptable — technically correct, not editorial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-haiku-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generated by Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;Target — editorial summaries, named examples, nuanced caveats&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Seeded content exists because each site ships with a JSON file of curated entries. Those entries have names, descriptions, and metadata from their upstream source (HuggingFace, Steam, GitHub), but no editorial layer yet. The page renders — but it reads like a database dump, not a directory.&lt;/p&gt;

&lt;p&gt;Fallback-template content is what you get when the API key isn't present or when a Claude call fails. For the AI tools site, the fallback for a model named &lt;code&gt;qwen2-7b&lt;/code&gt; in the &lt;code&gt;text-generation&lt;/code&gt; pipeline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qwen2-7b is an open-source text-generation model available on HuggingFace.
Details are sourced from the public model registry.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not wrong. It just doesn't help anyone decide whether to use the model.&lt;/p&gt;

&lt;p&gt;Claude Haiku content is the target state. A good generation for the same model says something like: "Qwen2-7B is a 7-billion parameter instruction-tuned model from Alibaba Cloud optimized for multilingual generation, showing strong performance on Chinese and English benchmarks while fitting in 16GB of VRAM." The difference is editorial voice and specificity — neither of which template-filling can produce.&lt;/p&gt;

&lt;h2&gt;
  
  
  The upgrade query
&lt;/h2&gt;

&lt;p&gt;The ETL generation step doesn't blindly regenerate everything on each run. It targets only entries that need work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
   &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_used&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'fallback-template'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'seeded-from-json'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;downloads&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things happen simultaneously here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;LEFT JOIN ... WHERE c.model_id IS NULL&lt;/code&gt; catches brand-new entries added by the nightly fetch that have no content row yet.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OR c.model_used IN ('fallback-template', 'seeded-from-json')&lt;/code&gt; catches existing rows that were written with lower-quality content.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ORDER BY m.downloads DESC&lt;/code&gt; means when the LIMIT is hit, the most-downloaded (most-visited) entries are upgraded first.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This identical query pattern appears in all three sites with different table names: &lt;code&gt;models&lt;/code&gt;/&lt;code&gt;model_content&lt;/code&gt; for AI tools, &lt;code&gt;games&lt;/code&gt;/&lt;code&gt;game_content&lt;/code&gt; for indie games, &lt;code&gt;saas&lt;/code&gt;/&lt;code&gt;saas_content&lt;/code&gt; for OSS alternatives. The abstraction was a late realization — I wrote it three times before noticing it was the same thing. A shared &lt;code&gt;buildUpgradeQuery(tableName, pkField, contentTable)&lt;/code&gt; helper would have been the right call from the start.&lt;/p&gt;
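
&lt;p&gt;A rough sketch of what such a helper could look like is below. The foreign-key column differs between the snippets in this article (&lt;code&gt;model_id&lt;/code&gt; in the query above, &lt;code&gt;appid&lt;/code&gt; in the games upsert), so this version takes an options object in place of three positional arguments. The option names, the &lt;code&gt;ident&lt;/code&gt; guard, and the column lists are assumptions for illustration.&lt;/p&gt;

```typescript
// Per-site configuration; field names are assumptions made for this sketch.
type UpgradeQueryOptions = {
  tableName: string; // entity table, e.g. "models"
  pkField: string; // primary key on the entity table, e.g. "id"
  contentTable: string; // content table, e.g. "model_content"
  contentFk: string; // column on the content table pointing back, e.g. "model_id"
  columns: string[]; // extra entity columns to select
  orderBy: string; // popularity column, e.g. "downloads"
};

// SQL identifiers cannot be bound as parameters, so accept only simple names.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function ident(name: string) {
  if (!IDENTIFIER.test(name)) throw new Error(`Unsafe SQL identifier: ${name}`);
  return name;
}

// Build the upgrade query; LIMIT stays a bound parameter.
function buildUpgradeQuery(o: UpgradeQueryOptions) {
  const cols = [o.pkField, ...o.columns].map((c) => `m.${ident(c)}`).join(", ");
  return [
    `SELECT ${cols}`,
    `FROM ${ident(o.tableName)} m`,
    `LEFT JOIN ${ident(o.contentTable)} c ON c.${ident(o.contentFk)} = m.${ident(o.pkField)}`,
    `WHERE c.${ident(o.contentFk)} IS NULL`,
    `   OR c.model_used IN ('fallback-template', 'seeded-from-json')`,
    `ORDER BY m.${ident(o.orderBy)} DESC`,
    `LIMIT ?`,
  ].join("\n");
}
```

&lt;p&gt;Called with the AI tools configuration (&lt;code&gt;models&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;model_content&lt;/code&gt;, &lt;code&gt;model_id&lt;/code&gt;, &lt;code&gt;downloads&lt;/code&gt;), it yields the same query shape as the SQL shown earlier.&lt;/p&gt;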

&lt;h2&gt;
  
  
  The fallback chain
&lt;/h2&gt;

&lt;p&gt;Inside the generation loop, every entry goes through the same decision tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasApiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasApiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;cacheSystem&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseOrFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;modelUsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;generated&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`! Claude error for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;fallback&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;fallback&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cacheSystem: true&lt;/code&gt; flag marks the system prompt block with &lt;code&gt;cache_control: { type: "ephemeral" }&lt;/code&gt;. All three sites have fixed system prompts (the same AI tools instruction across every model generation, the same game critic instruction across every game), so the first call in a batch primes the cache and the remaining ~99 calls read it at the reduced input rate. I covered the mechanics in &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;the article on the shared Haiku client&lt;/a&gt;. With a ~900-token system prompt and 100 entries per run, roughly 89,000 input tokens per nightly run (99 × 900) are billed as cache reads instead of at the full input rate. Anthropic's &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;prompt caching documentation&lt;/a&gt; has the exact pricing for cache creation vs cache read tokens, plus the minimum cacheable prompt length for each model.&lt;/p&gt;
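
&lt;p&gt;For reference, here is a sketch of how a flag like &lt;code&gt;cacheSystem&lt;/code&gt; might translate into a Messages API request body, along with the token arithmetic. The &lt;code&gt;system&lt;/code&gt; block array and the &lt;code&gt;cache_control&lt;/code&gt; field follow Anthropic's documented request format. The function names are hypothetical, and the model string simply reuses the &lt;code&gt;model_used&lt;/code&gt; value from this article as a placeholder.&lt;/p&gt;

```typescript
// Hypothetical builder for the JSON body of a Messages API request.
function buildRequestBody(
  systemPrompt: string,
  userPrompt: string,
  cacheSystem: boolean,
  maxTokens: number,
) {
  // When caching is on, the system block carries an ephemeral cache_control marker.
  const systemBlock = cacheSystem
    ? { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }
    : { type: "text", text: systemPrompt };
  return {
    model: "claude-haiku-4-5", // placeholder: use whatever model id the client is configured with
    max_tokens: maxTokens,
    system: [systemBlock],
    messages: [{ role: "user", content: userPrompt }],
  };
}

// System-prompt tokens served from cache in one run: every call after the first reads it.
function cacheReadTokens(entries: number, systemPromptTokens: number) {
  return Math.max(0, entries - 1) * systemPromptTokens;
}

// 100 entries with a 900-token system prompt: 99 cache reads x 900 tokens = 89,100.
console.log(cacheReadTokens(100, 900));
```

&lt;p&gt;The first call in the batch pays the cache-creation rate for those tokens, so the net saving depends on the price gap between creation, read, and regular input tokens.&lt;/p&gt;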

&lt;p&gt;The error path is deliberately non-throwing. Any Claude failure — rate limit, network timeout, malformed response — drops through to &lt;code&gt;content = fb&lt;/code&gt; and increments &lt;code&gt;fallback&lt;/code&gt;. The run continues. If 10 of 100 Claude calls fail due to transient rate limits, 90 get written with &lt;code&gt;claude-haiku-4-5&lt;/code&gt; and the 10 failures get &lt;code&gt;fallback-template&lt;/code&gt;. Those 10 rows surface in the next night's upgrade query automatically.&lt;/p&gt;
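
&lt;p&gt;The &lt;code&gt;parseOrFallback&lt;/code&gt; helper is not shown in this article. As a sketch of what a parse step like it might do, assuming the model is asked to reply with a JSON object, the version below validates the response and reports whether the fallback was used. The &lt;code&gt;Content&lt;/code&gt; shape, &lt;code&gt;isContent&lt;/code&gt;, and &lt;code&gt;parseWithStatus&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Assumed content shape for this sketch; the real tables have more fields.
type Content = { summary: string; good_for: string[]; avoid_if: string[] };

// Structural check on parsed JSON before trusting it.
function isContent(value: any) {
  if (value === null || typeof value !== "object") return false;
  if (typeof value.summary !== "string" || value.summary.trim() === "") return false;
  if (!Array.isArray(value.good_for) || !Array.isArray(value.avoid_if)) return false;
  return true;
}

// Parse the model's reply; on any problem return the fallback and say so.
function parseWithStatus(text: string, fb: Content) {
  try {
    const parsed = JSON.parse(text);
    if (isContent(parsed)) {
      return { content: parsed as Content, usedFallback: false };
    }
  } catch {
    // Malformed JSON: fall through to the fallback below.
  }
  return { content: fb, usedFallback: true };
}
```

&lt;p&gt;With a flag like &lt;code&gt;usedFallback&lt;/code&gt;, the caller can record &lt;code&gt;fallback-template&lt;/code&gt; whenever parsing fails, so a malformed response is also picked up by the next night's upgrade query.&lt;/p&gt;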

&lt;h2&gt;
  
  
  The upsert write
&lt;/h2&gt;

&lt;p&gt;Every content row is written with &lt;code&gt;INSERT ... ON CONFLICT ... DO UPDATE SET&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;game_content&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;appid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similar_games&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;good_for&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avoid_if&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;appid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt;
  &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;similar_games&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;similar_games&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;good_for&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;good_for&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;avoid_if&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avoid_if&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;generated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;model_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_used&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The upsert makes the ETL safe to re-run: running it twice never duplicates rows, and the second run simply overwrites the first. More importantly, it means the &lt;code&gt;model_used&lt;/code&gt; column gets overwritten when an upgrade succeeds. A row that was &lt;code&gt;fallback-template&lt;/code&gt; becomes &lt;code&gt;claude-haiku-4-5&lt;/code&gt; in-place, without any explicit "mark upgraded" step. The column just reflects what actually produced the current content.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/pairwise-ai-model-compare-pages-claude-haiku-budget-cap"&gt;compare-page ETL&lt;/a&gt; uses a different pattern: check-before-insert with an explicit &lt;code&gt;SELECT 1&lt;/code&gt; to skip already-generated pairs. Both patterns are valid. Check-before-insert is better when reprocessing is expensive (large Claude calls, multi-step generation). Upsert-overwrite is better when you always want the latest generation to win regardless of what was there before.&lt;/p&gt;

&lt;h2&gt;
  
  
  The noindex safety valve
&lt;/h2&gt;

&lt;p&gt;One consequence of shipping a three-tier system is that some pages launch with genuinely thin content. For the indie games site, the threshold is explicit in the game page component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;noindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;good_for&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;avoid_if&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similar_games&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a game entry has no &lt;code&gt;good_for&lt;/code&gt; audience signals, no &lt;code&gt;avoid_if&lt;/code&gt; caveats, and no similar game suggestions (which happens when the content row is missing entirely, not just fallback-template), the page gets &lt;code&gt;noindex&lt;/code&gt; in its robots meta. The page renders fine for direct visitors; search engines are just told not to index it until content exists.&lt;/p&gt;

&lt;p&gt;In practice, the fallback templates do populate &lt;code&gt;good_for&lt;/code&gt; and &lt;code&gt;avoid_if&lt;/code&gt; with generic strings like "Indie game enthusiasts" and "You prefer AAA production values," so most fallback-template entries still pass the noindex check. The valve fires mainly on completely-missing rows, which are brief windows between when the fetch ETL adds a new game and when the generation ETL runs next.&lt;/p&gt;
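
&lt;p&gt;As a small illustration of that check, the sketch below turns the three arrays into the value of a robots meta tag. The &lt;code&gt;GamePage&lt;/code&gt; type and &lt;code&gt;robotsContent&lt;/code&gt; are hypothetical names, and the exact directive strings are an assumption; the real component may emit something different.&lt;/p&gt;

```typescript
// Assumed subset of the game page data used by the noindex check.
type GamePage = { good_for: string[]; avoid_if: string[]; similar_games: string[] };

// Return the content attribute for the robots meta tag.
function robotsContent(game: GamePage) {
  // Same condition as the component: noindex only when all three lists are empty.
  const noindex = [game.good_for, game.avoid_if, game.similar_games].every(
    (list) => list.length === 0,
  );
  return noindex ? "noindex, follow" : "index, follow";
}
```

&lt;p&gt;Keeping &lt;code&gt;follow&lt;/code&gt; in both cases lets crawlers still traverse links on a thin page even while the page itself stays out of the index.&lt;/p&gt;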

&lt;h2&gt;
  
  
  The export step
&lt;/h2&gt;

&lt;p&gt;After generation, a separate &lt;code&gt;export.ts&lt;/code&gt; script dumps the content tables to static JSON files that Astro reads at build time. This is the architectural detail that makes the quality ladder safe to run asynchronously.&lt;/p&gt;

&lt;p&gt;If the Anthropic API is down for an entire nightly run, the export runs with whatever's in the DB, the Astro build succeeds with existing content, and the deployed site doesn't have zero-content pages. The upgrade queue just has a larger backlog the following night.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;static SSG approach&lt;/a&gt; I'm running across all three sites is partly justified by this property. Dynamic rendering from a live DB would mean a Claude outage or Turso blip directly impacts page load time for real users. The ETL → export → build pipeline adds ~24 hours of content staleness in exchange for availability that doesn't depend on the API being up at request time. For a directory site where model descriptions change rarely, that tradeoff is easy to accept.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;The generation loop is strictly sequential. One call, await, write to DB, next entry. For 100 entries at roughly 1–1.5 seconds per call that's about 2 minutes per run — fine for the current scale.&lt;/p&gt;

&lt;p&gt;At 1,000 entries it would be 20+ minutes, which starts blocking the rest of the GitHub Actions job. The fix is a semaphore-bounded batch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;PQueue&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;p-queue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PQueue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;generateAndWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five concurrent workers would cut a 1,000-entry run to roughly 3–5 minutes (the same 1,000–1,500 seconds of call time divided by five) without risking the Anthropic rate limit. I've kept the sequential version because it's simpler to debug and the current batch sizes don't need it, but I'll add the queue before growing any site past ~300 entries.&lt;/p&gt;

&lt;p&gt;I also wish I'd started with better fallback copy. The initial seed templates are technically correct but thin, and some of that thin content shipped live to indexable pages before the ETL had a chance to upgrade it. A cleaner v1 strategy: run the full ETL before the first Astro build so every page that ships has at least a real Claude generation. The seeded-from-json tier exists because I moved too fast at launch; it's not architecturally necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I run the ETL without an API key during local development?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Without a key, the &lt;code&gt;hasApiKey&lt;/code&gt; check sends every generation to the &lt;code&gt;fallback-template&lt;/code&gt; tier. All DB writes still happen, the export still runs, and the Astro build succeeds. Once you add a real key, the next ETL run upgrades all &lt;code&gt;fallback-template&lt;/code&gt; rows automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I check the current upgrade ratio?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;game_content&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A healthy site a week after launch should have mostly &lt;code&gt;claude-haiku-4-5&lt;/code&gt; rows with &lt;code&gt;fallback-template&lt;/code&gt; count trending toward zero. The &lt;code&gt;generated_at&lt;/code&gt; timestamp on each row also lets you see how recently content was last upgraded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when Claude returns malformed JSON?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each site's &lt;code&gt;parseOrFallback()&lt;/code&gt; function extracts the outermost &lt;code&gt;{...}&lt;/code&gt; block with a regex before parsing, which handles the common case where Haiku prepends an explanation like "Here is the entry:" before the actual JSON. Every field access after the parse is null-safe: a field that is missing or has the wrong type falls back individually to the corresponding value in the fallback struct. The row still gets written; &lt;code&gt;model_used&lt;/code&gt; records whichever tier actually filled the content.&lt;/p&gt;
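
&lt;p&gt;A minimal sketch of that extract-then-validate pattern, not the sites' actual code. The &lt;code&gt;Entry&lt;/code&gt; shape and the &lt;code&gt;FALLBACK&lt;/code&gt; values are assumptions made up for illustration; only the function name and the regex idea come from the description above.&lt;/p&gt;

```typescript
// Hypothetical entry shape; the real schema is not shown in the article.
type Entry = { summary: string; good_for: string[] };

// Assumed fallback values, applied field by field when the model output is unusable.
const FALLBACK: Entry = { summary: "No summary yet.", good_for: [] };

function parseOrFallback(raw: string): Entry {
  // Greedy match from the first "{" to the last "}" drops any preamble or trailing text.
  const match = raw.match(/\{[\s\S]*\}/);
  if (match === null) return { ...FALLBACK };

  let parsed: unknown;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return { ...FALLBACK };
  }
  if (typeof parsed !== "object" || parsed === null) return { ...FALLBACK };

  const obj = parsed as { [key: string]: unknown };
  const rawSummary = obj["summary"];
  const rawGoodFor = obj["good_for"];

  // Each field is checked on its own, so one bad field does not discard the others.
  const summary = typeof rawSummary === "string" ? rawSummary : FALLBACK.summary;
  const goodFor = Array.isArray(rawGoodFor)
    ? (rawGoodFor.filter(function (x: unknown) {
        return typeof x === "string";
      }) as string[])
    : FALLBACK.good_for;

  return { summary, good_for: goodFor };
}
```

&lt;p&gt;The greedy regex is deliberately simple: it tolerates prose before or after the object, and anything it extracts that still fails &lt;code&gt;JSON.parse&lt;/code&gt; ends up on the fallback path.&lt;/p&gt;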

&lt;p&gt;&lt;strong&gt;Does the cache persist between separate nightly runs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. Anthropic's ephemeral cache TTL is 5 minutes. Within a single run of 100 entries, the 99 calls after the first hit the cache. Across runs scheduled hours apart, the cache has expired and the first call re-primes it. The savings are per-batch, not cross-run — still meaningful for batches of 100, but not a persistent cost reduction over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Turso for this instead of Postgres?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I covered the comparison in detail &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;in the Turso vs Cloudflare D1 article&lt;/a&gt;. The short version for this use case: &lt;code&gt;@libsql/client&lt;/code&gt; works identically in Node.js ETL scripts and in Astro's serverless/edge runtime, with no separate driver or connection-pooling setup for each environment. For a project where the same &lt;code&gt;getClient()&lt;/code&gt; call needs to work in GitHub Actions jobs and Vercel edge functions, that's the practical reason to use it.&lt;/p&gt;




&lt;p&gt;Related: &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;How I built a shared Claude Haiku client with system-prompt caching&lt;/a&gt; | &lt;a href="https://dev.to/articles/pairwise-ai-model-compare-pages-claude-haiku-budget-cap"&gt;How I built pairwise AI model compare pages&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>typescript</category>
      <category>turso</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Static site search for Astro in 2026: why I picked Pagefind over Algolia and Lunr</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 22:19:50 +0000</pubDate>
      <link>https://dev.to/morinaga/static-site-search-for-astro-in-2026-why-i-picked-pagefind-over-algolia-and-lunr-6dg</link>
      <guid>https://dev.to/morinaga/static-site-search-for-astro-in-2026-why-i-picked-pagefind-over-algolia-and-lunr-6dg</guid>
      <description>&lt;p&gt;I added search to all three of &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;my AI-curated directory sites&lt;/a&gt; last month. The choice wasn't obvious — there are at least four options with real adoption — so here's the breakdown I actually ran through before landing on &lt;a href="https://pagefind.app/" rel="noopener noreferrer"&gt;Pagefind&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four options I considered
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pagefind&lt;/strong&gt; is a Rust-based static search library. It runs at build time, generates an index in &lt;code&gt;/_pagefind/&lt;/code&gt;, and serves everything as static files. No backend, no API key, no per-query billing. It ships a prebuilt UI (&lt;code&gt;PagefindUI&lt;/code&gt;) that you can mount on any element, and it supports WebAssembly for in-browser querying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algolia DocSearch&lt;/strong&gt; is free for open-source documentation sites, $49/month for commercial sites below a certain crawl limit. It indexes your content via their crawler (or an API push), stores it on Algolia's infrastructure, and gives you a hosted search widget. Fast, polished, and battle-tested — it's what most major docs sites use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lunr.js&lt;/strong&gt; is a client-side search library. You build the index at build time, serialize it to JSON, and ship it with the page. The browser loads the entire index on first search. Works offline, no external dependency, but the index size grows linearly with content, and there's no incremental loading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FlexSearch&lt;/strong&gt; is a newer alternative to Lunr with better performance characteristics and smaller bundle size, but the same core trade-off: you ship the whole index to the browser upfront.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Pagefind won
&lt;/h2&gt;

&lt;p&gt;The decisive factor was index size management. My directories have 500-1,000 entries per site, each with a multi-paragraph generated description. A Lunr index for 1,000 entries would be 2-4MB, all of which the browser has to download before the first search returns. Pagefind shards its index and loads chunks lazily as the user types, so the initial load is under 30KB (the WASM binary + a small manifest), and individual chunk fetches happen on demand.&lt;/p&gt;

&lt;p&gt;The second factor was cost. Algolia DocSearch's commercial tier runs $49/month per site. I'm running three sites on a &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;total infrastructure budget of roughly $25/month&lt;/a&gt;. Pagefind is free.&lt;/p&gt;

&lt;p&gt;The third factor was the deploy model. Because everything in &lt;code&gt;/_pagefind/&lt;/code&gt; is a static file, Cloudflare Pages caches it at the edge with no configuration. There's no API to rate-limit, no service availability to depend on, no API key to rotate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SearchDialog implementation
&lt;/h2&gt;

&lt;p&gt;The search component is a &lt;code&gt;&amp;lt;dialog&amp;gt;&lt;/code&gt; element with a Pagefind UI mounted inside it. I load the &lt;code&gt;pagefind-ui.js&lt;/code&gt; script lazily — only when the dialog is first opened — to keep it off the critical path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadPagefind&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/_pagefind/pagefind-ui.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PagefindUI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PagefindUI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;showSubResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;resetStyles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;p&amp;gt;Search index not available yet (first build). Try again after next deploy.&amp;lt;/p&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;s.onerror&lt;/code&gt; handler is the part most tutorials skip. On the first deploy of a new Cloudflare Pages site, the &lt;code&gt;/_pagefind/&lt;/code&gt; directory doesn't exist yet — Pagefind only runs during the build. If a user opens search before the first full build completes, &lt;code&gt;pagefind-ui.js&lt;/code&gt; 404s. Without the error handler, you get a silent failure. With it, you get a legible message.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;dialog&amp;gt;&lt;/code&gt; element is the right primitive here: opened with &lt;code&gt;showModal()&lt;/code&gt;, it keeps focus inside the dialog, Escape closes it natively, and the &lt;code&gt;::backdrop&lt;/code&gt; CSS pseudo-element gives you the dimmed overlay without JavaScript. The Cmd+K keyboard shortcut is wired with &lt;code&gt;document.addEventListener("keydown", ...)&lt;/code&gt;, so no library is needed.&lt;/p&gt;
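
&lt;p&gt;For illustration, the shortcut check could look roughly like this. It is a sketch, not the site's code: treating Ctrl+K as equivalent to Cmd+K and the &lt;code&gt;search-dialog&lt;/code&gt; element id are both assumptions. The check is split into a pure function so it can be exercised without a DOM.&lt;/p&gt;

```typescript
// Minimal structural type so the check runs without a DOM; a real KeyboardEvent satisfies it.
type KeyLike = { key: string; metaKey: boolean; ctrlKey: boolean };

// True for Cmd+K (macOS) or Ctrl+K (elsewhere). Accepting Ctrl+K is an assumption.
function isSearchShortcut(e: KeyLike): boolean {
  if (e.key.toLowerCase() !== "k") return false;
  return e.metaKey || e.ctrlKey;
}

// Browser wiring, left as a comment because it needs a DOM. The element id is hypothetical:
//   const dialog = document.getElementById("search-dialog") as HTMLDialogElement;
//   document.addEventListener("keydown", function (e) {
//     if (!isSearchShortcut(e)) return;
//     e.preventDefault();              // stop the browser's own Ctrl+K binding
//     if (!dialog.open) dialog.showModal();
//     loadPagefind();                  // the lazy loader shown earlier
//   });
```

&lt;p&gt;Guarding on &lt;code&gt;dialog.open&lt;/code&gt; matters because calling &lt;code&gt;showModal()&lt;/code&gt; on an already-open dialog throws.&lt;/p&gt;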

&lt;h2&gt;
  
  
  What Pagefind doesn't do
&lt;/h2&gt;

&lt;p&gt;Two gaps I've hit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No query logging.&lt;/strong&gt; Pagefind runs entirely in the browser and doesn't send queries anywhere. For a commercial directory, knowing what users search for is valuable — it tells you which models or games to add, and which compare pages to prioritize. With Algolia you get this for free. With Pagefind you'd need to add a thin logging layer (a fetch POST to an analytics endpoint on each query event). I haven't built this yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No fuzzy matching out of the box.&lt;/strong&gt; Pagefind does stemming and basic substring matching, but "stabilty diffusion" (typo) won't match "stable diffusion". Algolia's typo-tolerance is significantly better. For an AI tools directory where model names are long and often misremembered, this matters. I'll probably add a query-suggestion layer that does fuzzy pre-matching before handing off to Pagefind.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Pagefind&lt;/th&gt;
&lt;th&gt;Algolia DocSearch&lt;/th&gt;
&lt;th&gt;Lunr.js&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$49/mo (commercial)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index location&lt;/td&gt;
&lt;td&gt;Static files&lt;/td&gt;
&lt;td&gt;Algolia cloud&lt;/td&gt;
&lt;td&gt;Shipped with page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initial JS load&lt;/td&gt;
&lt;td&gt;~30KB&lt;/td&gt;
&lt;td&gt;~80KB&lt;/td&gt;
&lt;td&gt;~10KB + index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index size scalability&lt;/td&gt;
&lt;td&gt;Chunked, lazy&lt;/td&gt;
&lt;td&gt;Server-side&lt;/td&gt;
&lt;td&gt;Linear, upfront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typo tolerance&lt;/td&gt;
&lt;td&gt;Basic stemming&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query logging&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build-time integration&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Crawler / push API&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a static site on a tight infrastructure budget with 500-1,000 entries, Pagefind is the right default. If the site were larger or if I needed typo tolerance and query analytics without building them myself, Algolia would be worth the cost.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>astro</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How I built pairwise AI model compare pages with Claude Haiku and a budget cap</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 22:19:37 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-pairwise-ai-model-compare-pages-with-claude-haiku-and-a-budget-cap-1ipn</link>
      <guid>https://dev.to/morinaga/how-i-built-pairwise-ai-model-compare-pages-with-claude-haiku-and-a-budget-cap-1ipn</guid>
      <description>&lt;p&gt;When I added compare pages to the &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;Top AI Tools directory&lt;/a&gt;, the first question I had to answer was: how many pairs am I actually looking at? With roughly 200 models across 8 pipeline tags, the naive upper bound is 200 × 199 / 2 ≈ 19,900 pairs. Generating content for each one with Claude Haiku would cost somewhere around $20 per run — not ruinous, but not something I wanted to run daily without thinking carefully.&lt;/p&gt;

&lt;p&gt;Here's what I actually built, where it falls short, and what I'd do differently if starting over.&lt;/p&gt;

&lt;h2&gt;
  
  
  The combinatorics problem
&lt;/h2&gt;

&lt;p&gt;Model compare pages exist for a specific type of query: "llama 3 vs mistral 7b", "stable diffusion vs sdxl", "whisper vs wav2vec2". These are high-intent queries — the user has already narrowed down to a shortlist and wants a concrete decision nudge. The &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;static SSG approach I'm running&lt;/a&gt; means I need to precompute each compare page at build time, which puts pressure on how many pages I can afford to generate.&lt;/p&gt;

&lt;p&gt;The solution I landed on: group by &lt;code&gt;pipeline_tag&lt;/code&gt;, pair the top-4 models by download count within each group, then cap total pairs with a &lt;code&gt;COMPARE_LIMIT&lt;/code&gt; env var. Within a single pipeline like &lt;code&gt;text-generation&lt;/code&gt;, the top 4 models give 6 pairs (4 choose 2). Across 8 active pipelines that's at most 8 × 6 = 48 pairs, fewer if any pipeline has under 4 models. The env cap of 50 means I stay within that budget while having room to grow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nx"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;downloads&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;downloads&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chosen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pairing happens entirely within pipelines right now, which means I'm covering "llama vs mistral" (both &lt;code&gt;text-generation&lt;/code&gt;) but not "whisper vs gemini-vision" (cross-pipeline). Cross-pipeline comparisons are actually more valuable for users who don't know the landscape yet — that's the next iteration.&lt;/p&gt;
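
&lt;p&gt;The pair budget behind this grouping is simple arithmetic, sketched below as a sanity check. The even split of 25 models per pipeline is a made-up example; the real distribution across tags is not given here.&lt;/p&gt;

```typescript
// Unordered pairs among n items: n choose 2.
function pairCount(n: number): number {
  return (n * (n - 1)) / 2;
}

// Pairs produced when each pipeline contributes its top `topN` models
// (or all of its models, if the group is smaller than `topN`).
function budget(groupSizes: number[], topN: number): number {
  let total = 0;
  for (const size of groupSizes) {
    total += pairCount(Math.min(size, topN));
  }
  return total;
}

// Naive all-pairs bound for 200 models.
console.log(pairCount(200)); // 19900

// Hypothetical even split: 8 pipelines of 25 models, top 4 from each.
console.log(budget([25, 25, 25, 25, 25, 25, 25, 25], 4)); // 48
```

&lt;p&gt;The same helper shows how quickly the budget moves if the per-pipeline cutoff changes: raising the top-N from 4 to 5 takes each group from 6 pairs to 10.&lt;/p&gt;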

&lt;h2&gt;
  
  
  The pair_slug and idempotent inserts
&lt;/h2&gt;

&lt;p&gt;The slug for each compare pair is constructed deterministically: sort the two model slugs alphabetically, join with &lt;code&gt;--vs--&lt;/code&gt;. So whether the ETL processes &lt;code&gt;(llama-3, mistral-7b)&lt;/code&gt; or &lt;code&gt;(mistral-7b, llama-3)&lt;/code&gt;, the slug is always &lt;code&gt;llama-3--vs--mistral-7b&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pairSlug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;--vs--&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the entire ETL idempotent. The script runs every night. If all pairs already exist in the DB, it exits in a couple of seconds without a single Claude call. I check before inserting rather than using &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; at the SQL level — the explicit check lets me count skipped vs generated in the same run, which I log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[compare] done — generated: 3, skipped: 47
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for monitoring. A run that generates 0 and skips 50 is healthy. A run that generates 0 and skips 0 (nothing in DB, nothing processed) would indicate a bug.&lt;/p&gt;
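
&lt;p&gt;A rough sketch of the check-then-count logic, with an in-memory array standing in for the DB lookup. &lt;code&gt;planRun&lt;/code&gt; and its parameters are hypothetical names; the real ETL queries Turso and calls Claude for each slug that needs generating.&lt;/p&gt;

```typescript
// Order-independent slug, mirroring the sort-and-join rule described earlier.
function pairSlug(a: string, b: string): string {
  return [a, b].sort().join("--vs--");
}

// Split candidate pairs into work to do and work already done.
// `existing` stands in for the slugs already stored in the DB (an assumption for this sketch).
function planRun(candidates: [string, string][], existing: string[]) {
  const seen = new Set(existing);
  const toGenerate: string[] = [];
  let skipped = 0;
  for (const [a, b] of candidates) {
    const slug = pairSlug(a, b);
    if (seen.has(slug)) {
      skipped += 1;
    } else {
      seen.add(slug); // a repeat of this pair later in the same run counts as skipped
      toGenerate.push(slug);
    }
  }
  return { toGenerate, skipped };
}

const plan = planRun(
  [
    ["mistral-7b", "llama-3"],
    ["llama-3", "mistral-7b"],
    ["whisper", "wav2vec2"],
  ],
  ["llama-3--vs--mistral-7b"],
);
console.log(`[compare] generated: ${plan.toGenerate.length}, skipped: ${plan.skipped}`);
// prints "[compare] generated: 1, skipped: 2"
```

&lt;p&gt;Because both counters come out of the same pass, the "generated 0, skipped 0" alarm case falls out naturally: it can only happen when the candidate list itself is empty.&lt;/p&gt;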

&lt;h2&gt;
  
  
  Claude Haiku with system-prompt caching
&lt;/h2&gt;

&lt;p&gt;I reuse the &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;shared Haiku client I built in week one&lt;/a&gt;, which handles &lt;code&gt;cacheSystem: true&lt;/code&gt; on the system prompt. Since the system prompt (the JSON schema instruction) is identical across all compare calls, the first call primes the cache and subsequent calls pay only the discounted cache-read rate on that prefix instead of the full input price.&lt;/p&gt;

&lt;p&gt;The user prompt includes both model names, their authors, pipeline tags, and up to 400 characters of their existing summaries (which come from the earlier content generation step):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Compare these two AI models:
A: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (author: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, pipeline: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)
   Summary: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;(none)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
B: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (author: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, pipeline: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)
   Summary: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;(none)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

Produce the JSON comparison.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Truncating summaries at 400 characters keeps the user prompt lean. Compare pages are about the &lt;em&gt;delta&lt;/em&gt; between two models, not a rehash of each model individually. I already have dedicated model pages for depth; the compare page needs to answer "which one, for what" — that takes maybe 6 sentences total.&lt;/p&gt;

&lt;p&gt;The system prompt requests a JSON object with &lt;code&gt;summary&lt;/code&gt;, &lt;code&gt;differences&lt;/code&gt; (array), &lt;code&gt;similarities&lt;/code&gt; (array), and &lt;code&gt;recommendation&lt;/code&gt;. Keeping the output shape narrow means Haiku rarely wanders off-schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON parsing with a regex fence
&lt;/h2&gt;

&lt;p&gt;Even with tight prompting, Haiku occasionally produces JSON with an explanation preamble: "Here is the comparison:" followed by the actual object. Strict &lt;code&gt;JSON.parse&lt;/code&gt; on the raw output would throw. I extract the outermost &lt;code&gt;{...}&lt;/code&gt; block with a regex before parsing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseCompare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\{[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\}&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each field is validated individually before being accepted. If &lt;code&gt;differences&lt;/code&gt; comes back as a string (occasional Haiku behavior when it conflates the array with a comma-separated list), the page falls back to the template for that field rather than crashing.&lt;/p&gt;
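&lt;p&gt;To make the two failure modes concrete, here is a small self-contained sketch using the same greedy pattern. &lt;code&gt;extractJsonBlock&lt;/code&gt; and &lt;code&gt;tryParse&lt;/code&gt; are illustrative helpers, not part of the ETL, and the sample strings are made up:&lt;/p&gt;

```typescript
// Same greedy pattern as parseCompare: first "{" through the last "}".
function extractJsonBlock(text: string): string | null {
  const m = text.match(/\{[\s\S]*\}/);
  return m ? m[0] : null;
}

// Returns the parsed object, or null when nothing parseable is found.
function tryParse(text: string): any {
  const block = extractJsonBlock(text);
  if (block === null) return null;
  try {
    return JSON.parse(block);
  } catch {
    return null;
  }
}

// Case 1: a preamble before the object, and an array field sent as a string.
const withPreamble = tryParse(
  'Here is the comparison:\n{"summary": "A is smaller.", "differences": "size, license"}'
);
// Case 2: braces in trailing prose get swallowed by the greedy match.
const withTrailingBraces = tryParse('{"summary": "ok"} Note: {see docs}');

console.log(typeof withPreamble.summary === "string"); // true
console.log(Array.isArray(withPreamble.differences));  // false, so the fallback array is used
console.log(withTrailingBraces);                       // null
```

&lt;p&gt;The second case is the limit of the regex approach: because the match runs to the last &lt;code&gt;}&lt;/code&gt;, stray braces after the object make the whole block unparseable. That still ends in the fallback rather than a crash, which is the behavior I want.&lt;/p&gt;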

&lt;p&gt;The fallback struct is worth writing carefully. I spent five minutes on mine and it shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; and &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; are both &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; models. See each entry for specifics.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;See individual model pages for architecture and use cases.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Both are open-source models on HuggingFace.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Pick based on your compute budget and specific task requirements.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A user landing on a fallback-generated compare page gets a technically true page that directs them to the model pages rather than a blank or error state. The &lt;code&gt;model_used&lt;/code&gt; column in the DB records &lt;code&gt;"fallback-template"&lt;/code&gt; for these rows, which I use to identify candidates for regeneration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage in libSQL and the static JSON dump
&lt;/h2&gt;

&lt;p&gt;Compare data lives in a &lt;code&gt;model_compare&lt;/code&gt; table in &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;Turso libSQL&lt;/a&gt;, with a unique constraint on &lt;code&gt;pair_slug&lt;/code&gt;. After the ETL loop, everything gets dumped to &lt;code&gt;compare.json&lt;/code&gt; for the static build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;all&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`SELECT * FROM model_compare ORDER BY slug_a, slug_b`&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;slug_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug_a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;slug_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug_b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;pair_slug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pair_slug&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./src/data/compare.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Astro build reads this JSON at build time, generating one static page per pair. No runtime DB calls, no cold starts. The tradeoff is freshness: compare content is up to 24 hours stale. For "llama 3.1 vs llama 3.2", that's fine — the models don't change daily.&lt;/p&gt;

&lt;p&gt;I validate the JSON-LD on compare pages through the &lt;a href="https://dev.to/articles/jsonld-audit-post-deploy-ci"&gt;post-deploy audit CI step&lt;/a&gt; the same way I do for individual model pages. Structured data matters more on comparison queries because those are the exact queries that AI Overviews tend to surface, so getting the schema right is worth the CI overhead.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/astro-slug-pages-unique-after-adsense-scaled-content-abuse"&gt;Astro slug generation&lt;/a&gt; for compare pages uses the &lt;code&gt;pair_slug&lt;/code&gt; directly. The URL pattern is &lt;code&gt;/compare/llama-3--vs--mistral-7b/&lt;/code&gt;, which is ugly but unambiguous: model slugs already contain single hyphens, so the &lt;code&gt;--vs--&lt;/code&gt; separator marks exactly where one slug ends and the other begins.&lt;/p&gt;
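&lt;p&gt;A quick sketch of why the separator matters when going from URL back to models. &lt;code&gt;splitPairSlug&lt;/code&gt; is a hypothetical helper, not something in the codebase:&lt;/p&gt;

```typescript
const SEPARATOR = "--vs--";

// Splits a pair slug back into its two model slugs.
// Throws if the input does not contain exactly one separator.
function splitPairSlug(pairSlug: string): [string, string] {
  const parts = pairSlug.split(SEPARATOR);
  if (parts.length !== 2) {
    throw new Error(`not a pair slug: ${pairSlug}`);
  }
  return [parts[0], parts[1]];
}

const [left, right] = splitPairSlug("llama-3--vs--mistral-7b");
console.log(left, right); // llama-3 mistral-7b
```

&lt;p&gt;With a plain &lt;code&gt;-vs-&lt;/code&gt; separator, a model whose slug itself contains &lt;code&gt;-vs-&lt;/code&gt; would split into three parts; the doubled dashes make that collision much less likely.&lt;/p&gt;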

&lt;h2&gt;
  
  
  What I'd change starting over
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Generate cross-pipeline pairs from day one.&lt;/strong&gt; The most useful compare queries aren't "llama 3.1 vs llama 3.2" — users who care about that distinction already know. The interesting queries are cross-category: "should I run inference on a text-generation model or use a RAG pipeline?" I skipped this to stay within the budget cap, but it means I'm missing the long-tail traffic that would actually be differentiated from generic model pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drive pair selection from search query logs.&lt;/strong&gt; Right now I pick pairs by download rank. A better signal would be which pairs users actually search for. Pagefind runs client-side and doesn't log queries to any server, so I'd need a thin logging endpoint — something like a POST to a &lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;-triggered function that appends to a JSONL file. Then the ETL reads the top-N ungenerated pairs from the log. This is a small amount of infrastructure but it would make the pair selection much more demand-driven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Raise the budget cap.&lt;/strong&gt; &lt;code&gt;MAX=50&lt;/code&gt; is conservative. At current Haiku pricing with prompt caching, 500 pairs would cost roughly $0.10 per nightly run. I was cautious when I set the default, but I've watched the billing closely and the actual spend is a fraction of what I modeled. I'll bump this to 200 in the next ETL config update.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/how-i-added-itchio-entries-to-a-steam-only-astro-directory"&gt;itch.io entries pattern I added to the indie-games directory&lt;/a&gt; taught me to plan for the second data source earlier. Compare pages have the same shape: a join between two rows. Getting the abstraction right before you have 500+ rows in the DB is much easier than retrofitting it.&lt;/p&gt;
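&lt;p&gt;The query-log idea above could look roughly like this on the ETL side. Everything here is hypothetical: the JSONL shape (one object per line with a &lt;code&gt;pair_slug&lt;/code&gt; field), the &lt;code&gt;topUngeneratedPairs&lt;/code&gt; name, and the sample data. It is a sketch of the ranking step only, not the logging endpoint:&lt;/p&gt;

```typescript
// Counts searched pairs in a JSONL log and returns the n most requested
// pair slugs that have not been generated yet.
function topUngeneratedPairs(jsonl: string, existing: string[], n: number): string[] {
  const done = new Set(existing);
  const counts: { [slug: string]: number } = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const slug = String(JSON.parse(line).pair_slug);
    if (done.has(slug)) continue; // already has a compare page
    counts[slug] = (counts[slug] ?? 0) + 1;
  }
  return Object.entries(counts)
    .sort((x, y) => y[1] - x[1]) // most searched first
    .slice(0, n)
    .map(([slug]) => slug);
}

const log = [
  '{"pair_slug":"a--vs--b"}',
  '{"pair_slug":"c--vs--d"}',
  '{"pair_slug":"c--vs--d"}',
  '{"pair_slug":"a--vs--b"}',
  '{"pair_slug":"c--vs--d"}',
  '{"pair_slug":"e--vs--f"}',
].join("\n");

const next = topUngeneratedPairs(log, ["e--vs--f"], 2);
console.log(next); // [ 'c--vs--d', 'a--vs--b' ]
```

&lt;p&gt;A malformed log line would throw in this version; a real ETL step would probably want to skip and count bad lines instead.&lt;/p&gt;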

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does the ETL run every night even when no new models are added?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but it's nearly free when nothing is new. The check-before-insert means most nights it does 50 DB reads and exits in under 3 seconds without touching the Claude API. The console output shows &lt;code&gt;generated: 0, skipped: 50&lt;/code&gt;, which is the signal that everything is up to date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when Claude returns malformed JSON?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parseCompare&lt;/code&gt; catches the error and returns the fallback struct. The row is still written to the DB with &lt;code&gt;model_used = "fallback-template"&lt;/code&gt;, which I can query to find rows worth retrying. In practice, this happens on maybe 2-3% of generations — usually when the two models have very sparse metadata and Haiku doesn't have enough context to produce structured output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the compare.json file get unwieldy as pairs accumulate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At 50 pairs it's roughly 25 KB. At 500 pairs it'll be around 250 KB, still fine for build-time loading in Astro. If I ever hit 5,000 pairs I'd split the file by &lt;code&gt;pipeline_tag&lt;/code&gt; and lazy-import only the relevant subset for each page. For now, one flat JSON file is simpler and fast enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not compute compare content at request time with an edge function?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cold starts and cost. An edge function hit for each compare page view would add 200-500ms of latency (Haiku inference + DB round trip) and would cost much more per-pageview than the nightly batch approach. The content also doesn't need to be fresher than daily — model capabilities don't shift on an hourly basis. Static precomputation is the right tradeoff here, consistent with &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;the broader bet on static SSG&lt;/a&gt; I'm running on all three sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you handle the case where a model is removed from HuggingFace?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now, I don't. If model &lt;code&gt;foo&lt;/code&gt; is deleted from &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt; but its compare rows are still in the DB, those compare pages will still be generated at build time. They'll keep the old data until the model's row in &lt;code&gt;models.json&lt;/code&gt; is removed, which only happens if the model falls out of the top-500 in the nightly fetch. It's a known gap. For now, the risk is low; popular models don't disappear. A more robust system would cross-reference the compare table against the model table and tombstone orphaned pairs.&lt;/p&gt;
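&lt;p&gt;That cross-reference is small enough to sketch. This is not implemented; &lt;code&gt;findOrphanedPairs&lt;/code&gt; and the sample rows are hypothetical, and the column names follow the &lt;code&gt;model_compare&lt;/code&gt; dump shown earlier:&lt;/p&gt;

```typescript
type CompareRow = { pair_slug: string; slug_a: string; slug_b: string };

// Returns compare rows where at least one side no longer exists
// in the current list of model slugs.
function findOrphanedPairs(rows: CompareRow[], liveSlugs: string[]): CompareRow[] {
  const live = new Set(liveSlugs);
  return rows.filter((r) => !live.has(r.slug_a) || !live.has(r.slug_b));
}

const rows: CompareRow[] = [
  { pair_slug: "foo--vs--llama-3", slug_a: "foo", slug_b: "llama-3" },
  { pair_slug: "llama-3--vs--mistral-7b", slug_a: "llama-3", slug_b: "mistral-7b" },
];

// "foo" has been removed upstream, so only its pair is flagged.
const orphans = findOrphanedPairs(rows, ["llama-3", "mistral-7b"]);
console.log(orphans.map((r) => r.pair_slug)); // [ 'foo--vs--llama-3' ]
```

&lt;p&gt;Flagged rows could then be marked rather than deleted, so existing URLs keep resolving while the page is excluded from the next build or regenerated.&lt;/p&gt;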




&lt;p&gt;Related: &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;How I built a shared Claude Haiku client with system-prompt caching&lt;/a&gt; | &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;Turso libSQL vs Cloudflare D1 for an Astro monorepo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>astro</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Five overlooked packages running my AI directory stack</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:54:02 +0000</pubDate>
      <link>https://dev.to/morinaga/five-overlooked-packages-running-my-ai-directory-stack-21e7</link>
      <guid>https://dev.to/morinaga/five-overlooked-packages-running-my-ai-directory-stack-21e7</guid>
      <description>&lt;p&gt;The interesting parts of a project are not always the AI model or the hosting platform. This week I spent time reading source code for five dependencies that sit quietly in my &lt;code&gt;package.json&lt;/code&gt; files. None of them are trending. All of them are load-bearing.&lt;/p&gt;

&lt;p&gt;My stack is Astro 5 SSG + Turso libSQL + GitHub Actions cron + Claude Haiku 4.5. Three sites: Top AI Tools, Find Games Like, Open Alternative To. Seven weeks in, still under 400 total pageviews, but the infrastructure is solid enough that I can focus on content rather than firefighting.&lt;/p&gt;

&lt;h2&gt;
  
  
  tsx — TypeScript without the build ceremony
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/privatenumber/tsx" rel="noopener noreferrer"&gt;tsx&lt;/a&gt; by Hiroki Osame is how I run every ETL script in the monorepo. The command &lt;code&gt;tsx src/etl/run.ts&lt;/code&gt; just works — no tsconfig fiddling, no ts-node &lt;code&gt;--esm&lt;/code&gt; flags, no separate compile step. Under the hood it uses esbuild, which means startup is fast enough that a five-second cron warm-up doesn't matter.&lt;/p&gt;

&lt;p&gt;What surprised me when I read the repo: tsx strips types with esbuild rather than the TypeScript compiler, so it doesn't type-check. That's intentional. For ETL scripts where I want &lt;code&gt;pnpm typecheck&lt;/code&gt; to catch structural errors at CI time but not slow down the hot path, this is exactly the right tradeoff. The README calls this out clearly. I wish I'd read it three weeks ago instead of assuming tsx did full type checking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pagefind — static full-text search with no server
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/CloudCannon/pagefind" rel="noopener noreferrer"&gt;Pagefind&lt;/a&gt; runs as my &lt;code&gt;postbuild&lt;/code&gt; step: &lt;code&gt;pagefind --site dist --output-subdir _pagefind&lt;/code&gt;. It crawls the built HTML, creates a compressed WASM index, and the client-side JS loads only the chunk it needs per query. The result is search that works on a static Vercel or Cloudflare Pages deploy with zero additional infrastructure.&lt;/p&gt;

&lt;p&gt;I read through the index format docs this week. The segment files are stored as gzip-compressed binary blobs, and the JS client fetches them lazily based on the query prefix. For three sites each under 2,000 pages, the index stays under 500 KB total. The Pagefind UI component is optional: I replaced it with a plain &lt;code&gt;&amp;lt;input&amp;gt;&lt;/code&gt; that calls the JS API directly so I could control the result rendering in Astro components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crawlee — TypeScript scraping with built-in queue management
&lt;/h2&gt;

&lt;p&gt;I haven't shipped &lt;a href="https://github.com/apify/crawlee" rel="noopener noreferrer"&gt;Crawlee&lt;/a&gt; yet, but it's been on my bookmarks list since I started building the itch.io ETL. My current approach is &lt;code&gt;fetch&lt;/code&gt; + manual parsing, which works for known endpoints. Crawlee adds request queue persistence, rate limiting, and a cheerio integration for HTML extraction, all in TypeScript with native ESM support.&lt;/p&gt;

&lt;p&gt;The reason I haven't switched: my ETL runs inside GitHub Actions where I want simple, auditable scripts over a full crawl framework. But if I start scraping product pages from sites that don't have APIs — which is the next natural expansion for the OSS alternatives directory — Crawlee is the tool I'd reach for. The Apify team maintains it actively and the TypeScript types are genuinely good.&lt;/p&gt;

&lt;h2&gt;
  
  
  eemeli/yaml — small footprint, strict spec compliance
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/eemeli/yaml" rel="noopener noreferrer"&gt;yaml&lt;/a&gt; package by Eemeli Aro parses the frontmatter in my article files before cross-posting to Dev.to and Hashnode. It's 35 KB minified, has zero dependencies, and handles multi-line strings and nested objects without surprises. I switched from &lt;code&gt;js-yaml&lt;/code&gt; six weeks ago because eemeli/yaml has better ESM exports and the parse errors are more actionable when frontmatter has a typo.&lt;/p&gt;

&lt;p&gt;One thing I didn't know until this week: the &lt;code&gt;yaml&lt;/code&gt; package can also &lt;em&gt;stringify&lt;/em&gt; back to YAML, preserving comments. I don't use that feature yet, but it matters for a workflow where I want to programmatically update article frontmatter without clobbering the human-readable structure. That's on the roadmap for automating &lt;code&gt;canonical_url&lt;/code&gt; injection after Dev.to publish.&lt;/p&gt;

&lt;h2&gt;
  
  
  @libsql/client — batched writes are the underrated feature
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/tursodatabase/libsql-client-ts" rel="noopener noreferrer"&gt;@libsql/client&lt;/a&gt; TypeScript client is what connects my ETL scripts to Turso. I wrote about Turso vs Cloudflare D1 earlier this week, but I didn't cover the &lt;code&gt;batch&lt;/code&gt; API, which is the feature I actually rely on most. A single &lt;code&gt;db.batch([...])&lt;/code&gt; call wraps multiple &lt;code&gt;INSERT OR REPLACE&lt;/code&gt; statements in one network round trip, which matters when seeding a 500-row table from a GitHub Actions runner.&lt;/p&gt;

&lt;p&gt;The client supports both remote Turso connections and an embedded &lt;code&gt;file:&lt;/code&gt; mode that runs libSQL in-process with no network. I use the in-process mode for local ETL development so I don't burn Turso API quota while iterating on the seed logic. Switching between modes is one environment variable. That's the kind of DX detail that makes a dependency feel considered rather than assembled.&lt;/p&gt;




&lt;p&gt;None of these packages announced anything dramatic this week. They're just the boring infrastructure that lets the AI parts of the stack do their job. I'll write up actual traffic and content metrics in 30 days when I have a month of data worth publishing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>astro</category>
    </item>
    <item>
      <title>Five things that caught my attention this week in AI tools and open-source models</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:53:19 +0000</pubDate>
      <link>https://dev.to/morinaga/five-things-that-caught-my-attention-this-week-in-ai-tools-and-open-source-models-3hb2</link>
      <guid>https://dev.to/morinaga/five-things-that-caught-my-attention-this-week-in-ai-tools-and-open-source-models-3hb2</guid>
      <description>&lt;p&gt;A lighter week for me operationally — content refreshes, a YouTube analytics update, some Bluesky queue maintenance. Which meant more time to actually read things. Here are five items that stuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Claude Code Agent View changes the mental model
&lt;/h2&gt;

&lt;p&gt;Anthropic shipped Agent View inside Claude Code on May 11. It's a unified dashboard for managing multiple parallel Claude Code sessions: start a session, send it to the background, check results when you want to. The interface treats individual sessions the way a CI dashboard treats builds.&lt;/p&gt;

&lt;p&gt;I've been running Claude Code by opening multiple terminals with different working directories. It works, but the overhead of context-switching between tabs adds up fast. A UI that surfaces what each agent is doing without requiring a terminal switch is more than quality-of-life — it shifts Claude Code from "smart terminal" to "orchestration layer."&lt;/p&gt;

&lt;p&gt;That's the direction I think AI coding tools are heading. The question isn't whether you can have a useful conversation with an AI about code. It's whether you can queue up a batch of distinct tasks, step away, and come back to something actionable. Agent View is an early answer to that question.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. ZAYA1-8B trained on AMD hardware is a supply chain signal
&lt;/h2&gt;

&lt;p&gt;Zyphra released ZAYA1-8B under Apache 2.0 around May 6-7. It's a mixture-of-experts architecture: ~8B total parameters, ~760M active per token. Standard MoE efficiency math. What's not standard: the entire training run used AMD Instinct hardware.&lt;/p&gt;

&lt;p&gt;The serious open-weights training runs are almost universally done on NVIDIA H100s or A100s. Zyphra shipping a competitive reasoning model that is both cleanly Apache-licensed &lt;em&gt;and&lt;/em&gt; trained end-to-end on AMD is a concrete counter-example to "you need NVIDIA to train anything worth using."&lt;/p&gt;

&lt;p&gt;That doesn't mean AMD is catching up fast enough to matter at scale yet, or that my next fine-tune would go faster on Instinct hardware. It means the GPU monoculture in open-source training has a verifiable crack in it. I'm watching whether other small labs follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Harness productivity report has a buried lede
&lt;/h2&gt;

&lt;p&gt;Harness released &lt;em&gt;The State of Engineering Excellence 2026&lt;/em&gt; on May 13. The headline: 89% of engineering leaders report improved developer productivity; 88% report improved satisfaction since adopting AI coding tools.&lt;/p&gt;

&lt;p&gt;The headline is predictable. Every vendor survey about AI tools says the same thing. The part worth reading is the buried finding: AI has outpaced the measurement frameworks organizations use to track productivity. Existing DORA metrics — deployment frequency, change failure rate, MTTR, lead time — weren't designed for workflows where a human is reviewing and steering AI-generated output rather than writing from scratch.&lt;/p&gt;

&lt;p&gt;If you're building dev tooling and trying to sell to engineering leaders right now, "AI made us faster" is table stakes. "Here's what to measure instead, and here's how we surface it for your team" is the actual product bet worth making.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. ServiceNow Build Agent went GA inside Claude Code and Cursor
&lt;/h2&gt;

&lt;p&gt;ServiceNow &lt;a href="https://newsroom.servicenow.com/press-releases/details/2026/ServiceNow-Build-Agent-now-works-inside-every-major-AI-coding-tool-governed-by-default/default.aspx" rel="noopener noreferrer"&gt;announced on May 13&lt;/a&gt; that Build Agent is generally available in ServiceNow Studio and extended its core skills into Claude Code, Cursor, Windsurf, and GitHub Copilot — with governance defaults on. Developers can build with ServiceNow APIs from their own editors without leaving their environment.&lt;/p&gt;

&lt;p&gt;The governance-by-default choice is the interesting design decision here. Most IDE integrations hand full control to the developer and assume IT will configure guardrails separately. ServiceNow's bet is that enterprise buyers want the platform's access controls and audit trails to travel with the tool automatically. Harder to sell on a feature list; better moat if the bet holds.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. I removed MCP servers from my pipeline and reliability went up
&lt;/h2&gt;

&lt;p&gt;This one is personal. I dropped several MCP server connections from my content pipeline this week (the commit message is "i-removed-mcp-servers-and-my-pipeline-got-more-reliable," which about covers it).&lt;/p&gt;

&lt;p&gt;MCP servers add real capabilities. They also add failure surfaces: network timeouts, schema drift when a remote API changes without warning, authentication tokens that expire silently at 3 AM. My ETL runs unattended on a cron schedule. When a remote MCP call hangs, the whole job hangs. I didn't always know until I checked results the next morning.&lt;/p&gt;

&lt;p&gt;The lesson I'm taking: MCP integrations are excellent for interactive sessions where a human is watching and can handle a failure gracefully. For scheduled, unattended workflows, each external dependency is a reliability tax you pay whether or not you're awake to collect it. I'm keeping MCP for interactive use and building local fallback paths for anything production-critical.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Turso libSQL vs Cloudflare D1 for an Astro monorepo: the practical difference</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:53:06 +0000</pubDate>
      <link>https://dev.to/morinaga/turso-libsql-vs-cloudflare-d1-for-an-astro-monorepo-the-practical-difference-4ic4</link>
      <guid>https://dev.to/morinaga/turso-libsql-vs-cloudflare-d1-for-an-astro-monorepo-the-practical-difference-4ic4</guid>
      <description>&lt;p&gt;When I set up the shared ETL database for &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;three Astro SSG directory sites&lt;/a&gt;, I had two obvious SQLite-at-the-edge options: Turso (libSQL, runs anywhere) and Cloudflare D1 (SQLite inside Workers). I went with Turso. Here's the practical difference that drove the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local dev problem D1 doesn't solve cleanly
&lt;/h2&gt;

&lt;p&gt;Cloudflare D1 is native to the Workers runtime. If you're using Cloudflare Workers for server-side rendering, D1 is the obvious choice: it's edge-colocated with zero config and the &lt;code&gt;env.DB&lt;/code&gt; binding is automatic.&lt;/p&gt;

&lt;p&gt;My setup is different. The sites are &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;static Astro 5 SSG on Cloudflare Pages&lt;/a&gt; — no Workers, no server runtime. The ETL pipeline that populates the database runs in GitHub Actions. Nothing in my stack executes inside the Workers environment.&lt;/p&gt;

&lt;p&gt;To use D1 from GitHub Actions you either use the Cloudflare REST API or the &lt;code&gt;wrangler&lt;/code&gt; CLI. Both work, but neither gives you a local SQLite file you can query directly during development. You'd be hitting a remote database for every &lt;code&gt;SELECT&lt;/code&gt; during local ETL runs or testing schema changes. Wrangler does have a &lt;code&gt;--local&lt;/code&gt; flag that writes to a local D1 file, but the path and format differ from the production D1 setup, so you're managing two different code paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Turso's local fallback changes the calculus
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;@libsql/client&lt;/code&gt; package accepts a &lt;code&gt;url&lt;/code&gt; that can be either a &lt;code&gt;libsql://&lt;/code&gt; remote URL or a &lt;code&gt;file:&lt;/code&gt; path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
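&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@libsql/client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Module-level cache so every caller shares one client.&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getClient&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;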
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TURSO_DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file:./data/local.db&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TURSO_AUTH_TOKEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;authToken&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In CI, &lt;code&gt;TURSO_DATABASE_URL&lt;/code&gt; is set to the Turso remote URL. On my laptop, the variable isn't set, so the client opens &lt;code&gt;file:./data/local.db&lt;/code&gt; — a plain SQLite file on disk. Same &lt;code&gt;@libsql/client&lt;/code&gt; package, same query API, same schema. The code path is identical.&lt;/p&gt;

&lt;p&gt;This means I can run ETL scripts locally and inspect the database with any SQLite viewer. Schema migrations apply with the same &lt;code&gt;applyMigrations()&lt;/code&gt; call used in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;applyMigrations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Docker containers. No Wrangler flags. No separate local-vs-remote code path. The same SQL that creates the &lt;code&gt;models&lt;/code&gt; table locally creates it in Turso.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the migration pattern actually looks like
&lt;/h2&gt;

&lt;p&gt;Each app defines its own migration array. The &lt;code&gt;run.ts&lt;/code&gt; entrypoint for the AI tools ETL calls &lt;code&gt;applyMigrations([CREATE_MODELS_TABLE, CREATE_REVIEWS_TABLE, ...])&lt;/code&gt; at startup. If the table already exists, &lt;code&gt;CREATE TABLE IF NOT EXISTS&lt;/code&gt; is a no-op. Idempotent, no migration runner needed.&lt;/p&gt;
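
&lt;p&gt;A minimal sketch of why re-running is safe, written with Python's standard-library &lt;code&gt;sqlite3&lt;/code&gt; module against an in-memory database rather than &lt;code&gt;@libsql/client&lt;/code&gt;. The column definitions and the &lt;code&gt;apply_migrations&lt;/code&gt; and &lt;code&gt;table_names&lt;/code&gt; helpers are hypothetical; only the two table names come from the text above.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schemas: only the table names come from the article.
CREATE_MODELS_TABLE = """
CREATE TABLE IF NOT EXISTS models (
    id   TEXT PRIMARY KEY,
    name TEXT NOT NULL
)
"""
CREATE_REVIEWS_TABLE = """
CREATE TABLE IF NOT EXISTS reviews (
    id       INTEGER PRIMARY KEY,
    model_id TEXT NOT NULL REFERENCES models(id),
    body     TEXT
)
"""


def apply_migrations(conn, migrations):
    """Run every statement in order; IF NOT EXISTS makes repeats a no-op."""
    for sql in migrations:
        conn.execute(sql)
    conn.commit()


def table_names(conn):
    """List user tables so the effect of a migration run can be inspected."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
migrations = [CREATE_MODELS_TABLE, CREATE_REVIEWS_TABLE]
apply_migrations(conn, migrations)
conn.execute("INSERT INTO models (id, name) VALUES ('m1', 'example')")
apply_migrations(conn, migrations)  # second run leaves tables and rows untouched
```

&lt;p&gt;The second call neither raises nor drops the inserted row, which is the property that lets the entrypoint call the migration list on every startup.&lt;/p&gt;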

&lt;p&gt;This is the same philosophy as the ETL publish step — the &lt;a href="https://dev.to/morinaga/why-i-reused-a-single-ci-pipeline-for-two-youtube-channels-and-three-seo-sites-50ae"&gt;article publish pipeline&lt;/a&gt; checks &lt;code&gt;published_urls&lt;/code&gt; in the frontmatter before posting, so re-running never double-posts. The database migration check follows the same pattern: check-then-act, idempotent by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where D1 would actually win
&lt;/h2&gt;

&lt;p&gt;If any part of the stack were running inside Cloudflare Workers — a search endpoint, an API route, a middleware layer — D1 would be the stronger choice. The &lt;code&gt;env.DB&lt;/code&gt; binding is faster than a network call to Turso's edge, and you don't need to manage auth tokens for a same-datacenter query.&lt;/p&gt;

&lt;p&gt;My architecture is fully static because the &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;freshness vs. speed trade-off works in SSG's favor&lt;/a&gt; for directory content. No Workers means D1's core advantage doesn't apply.&lt;/p&gt;

&lt;p&gt;If I add a Cloudflare Worker for site search or a revalidation webhook, I'd reconsider. A hybrid Turso (ETL reads/writes in GitHub Actions) + D1 (Workers queries) setup would be more complex than I want right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three things I don't know yet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Concurrent write performance.&lt;/strong&gt; The ETL pipeline has &lt;code&gt;max-parallel: 1&lt;/code&gt; set explicitly in the workflow, so writes are serial and controlled. I haven't stress-tested concurrent writes, including how Turso's free tier handles them. I'll know more in 30 days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration safety at schema evolution.&lt;/strong&gt; Adding a nullable column to an existing table is straightforward. Renaming a column is a single &lt;code&gt;ALTER TABLE ... RENAME COLUMN&lt;/code&gt; statement in SQLite 3.25 and later, but changing a column's type still requires a table rebuild. I haven't had to do either yet. When I do, the &lt;code&gt;applyMigrations()&lt;/code&gt; approach will require careful ordering.&lt;/p&gt;
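
&lt;p&gt;The table rebuild mentioned above can be sketched as a create-copy-drop-rename sequence inside one transaction. This uses Python's standard-library &lt;code&gt;sqlite3&lt;/code&gt; module for illustration, not the project's TypeScript code, and the &lt;code&gt;score&lt;/code&gt; column and the &lt;code&gt;rebuild_with_integer_score&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import sqlite3


def rebuild_with_integer_score(conn):
    """Change models.score from TEXT to INTEGER via create-copy-drop-rename."""
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE models_new (
            id    TEXT PRIMARY KEY,
            score INTEGER
        );
        INSERT INTO models_new (id, score)
            SELECT id, CAST(score AS INTEGER) FROM models;
        DROP TABLE models;
        ALTER TABLE models_new RENAME TO models;
        COMMIT;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (id TEXT PRIMARY KEY, score TEXT)")
conn.execute("INSERT INTO models VALUES ('m1', '42')")
conn.commit()
rebuild_with_integer_score(conn)

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
column_types = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(models)")}
```

&lt;p&gt;A step like this rewrites the whole table each time it runs, so it would probably need a guard (a schema version check, for example) before it could sit in a list that executes on every startup.&lt;/p&gt;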

&lt;p&gt;&lt;strong&gt;D1 pricing at scale.&lt;/strong&gt; Turso's free tier covers 500 databases and 1 billion row reads per month. Cloudflare D1's free tier is similar. At the current scale of three sites with daily ETL runs, neither would cost anything. If traffic grows enough to matter, I'll publish actual numbers — not estimates.&lt;/p&gt;

&lt;p&gt;The database choice was not the interesting part of this project. The local file fallback is the entire reason I chose Turso; everything else is roughly equivalent between the two for my use case.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>astro</category>
      <category>typescript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What I learned generating OG images for articles with Playwright and zero API cost</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:52:22 +0000</pubDate>
      <link>https://dev.to/morinaga/what-i-learned-generating-og-images-for-articles-with-playwright-and-zero-api-cost-18n8</link>
      <guid>https://dev.to/morinaga/what-i-learned-generating-og-images-for-articles-with-playwright-and-zero-api-cost-18n8</guid>
      <description>&lt;p&gt;The conclusion first: for a batch of under a few hundred static articles, generating OG images by screenshotting HTML templates with Playwright costs nothing, gives you full CSS control, and requires zero external API keys. The trade-offs are real — it's slow per image, it's not suitable for on-demand generation, and it has a hidden dependency on network availability during the build step. But for my use case, those trade-offs don't hurt.&lt;/p&gt;

&lt;p&gt;Here's how the script works, what broke, and what I'd do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I avoided image generation APIs
&lt;/h2&gt;

&lt;p&gt;My three directory sites — aiappdex.com, findindiegame.com, ossfind.com — are &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;fully static Astro 5 SSG builds&lt;/a&gt;. Articles publish automatically through a GitHub Actions pipeline. The &lt;a href="https://dev.to/morinaga/why-i-reused-a-single-ci-pipeline-for-two-youtube-channels-and-three-seo-sites-50ae"&gt;pipeline already handles Dev.to, Hashnode, and Bluesky distribution&lt;/a&gt;, plus &lt;a href="https://dev.to/morinaga/auto-generating-youtube-thumbnails-with-ffmpeg-inside-a-ci-pipeline-53bn"&gt;YouTube thumbnail generation with ffmpeg&lt;/a&gt;. I didn't want to add a billed API dependency to this stack.&lt;/p&gt;

&lt;p&gt;The options I considered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudinary with remote transformations&lt;/strong&gt;: works for on-demand, but requires a paid plan for custom fonts, and the transformation URL syntax breaks easily if any segment is URL-encoded incorrectly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@vercel/og (Satori-based)&lt;/strong&gt;: excellent for Next.js and Vercel serverless functions, but my sites are static pages on Cloudflare Pages — there's no Edge runtime to serve dynamic OG images from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;node-canvas&lt;/strong&gt;: full control, zero cost, but native C++ binding compilation in GitHub Actions runners is a recurring pain point. It works, but it adds a non-trivial setup step to CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pillow (Python image library)&lt;/strong&gt;: draws to a bitmap directly. Fine for simple layouts, but anything involving custom fonts, gradients, or CSS flexbox behavior is either impossible or requires dozens of manual coordinate calculations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Playwright approach: build an HTML string with CSS, pass it to a headless browser, screenshot it. The browser handles fonts, gradients, flexbox, and every other CSS feature I want to use. No API key. No external service. Just a 160-line Python script and Playwright installed in the runner.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the HTML template and accent color system works
&lt;/h2&gt;

&lt;p&gt;The script builds a full HTML document as a string, fills in the article title, date, and tags, and hands it to Playwright. The template has a dark card layout with an &lt;code&gt;Inter&lt;/code&gt; typeface loaded from Google Fonts CDN.&lt;/p&gt;

&lt;p&gt;The one non-obvious piece is the accent color selection. Each article has tags like &lt;code&gt;["webdev", "astro", "tutorial", "githubactions"]&lt;/code&gt;. The script matches these against five regex rules to pick an accent color:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ACCENT_RULES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(claude|anthropic|ai|llm|machinelearning)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#8B5CF6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# purple
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(astro|webdev|tailwindcss|react|nextjs|typescript|javascript)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#0EA5E9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# blue
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(godot|gamedev|csharp|game|unity)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#22C55E&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# green
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(opensource|github|programming|tutorial)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#F97316&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# orange
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(showdev|indiehackers|productivity)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#F59E0B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# amber
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;DEFAULT_ACCENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#475569&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# slate fallback
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules are checked in order; the first match wins. An article tagged &lt;code&gt;["ai", "webdev"]&lt;/code&gt; would pick purple, because &lt;code&gt;ai&lt;/code&gt; matches the first rule before &lt;code&gt;webdev&lt;/code&gt; matches the second.&lt;/p&gt;
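
&lt;p&gt;A small sketch of the first-match-wins lookup. The rules are copied from the block above; the &lt;code&gt;pick_accent&lt;/code&gt; name and the choice to join the tags into one space-separated string before matching are assumptions, since the article does not show that part of the script.&lt;/p&gt;

```python
import re

ACCENT_RULES = [
    (r"\b(claude|anthropic|ai|llm|machinelearning)\b", "#8B5CF6"),  # purple
    (r"\b(astro|webdev|tailwindcss|react|nextjs|typescript|javascript)\b", "#0EA5E9"),  # blue
    (r"\b(godot|gamedev|csharp|game|unity)\b", "#22C55E"),  # green
    (r"\b(opensource|github|programming|tutorial)\b", "#F97316"),  # orange
    (r"\b(showdev|indiehackers|productivity)\b", "#F59E0B"),  # amber
]
DEFAULT_ACCENT = "#475569"  # slate fallback


def pick_accent(tags):
    """Return the color of the first rule that matches any tag."""
    # Assumption: tags are lowercased and joined with spaces before matching.
    haystack = " ".join(tag.lower() for tag in tags)
    for pattern, color in ACCENT_RULES:
        if re.search(pattern, haystack):
            return color
    return DEFAULT_ACCENT
```

&lt;p&gt;Because of the &lt;code&gt;\b&lt;/code&gt; word boundaries, a tag such as &lt;code&gt;githubactions&lt;/code&gt; does not trigger the &lt;code&gt;github&lt;/code&gt; alternative under this joining scheme; only whole tags match.&lt;/p&gt;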

&lt;p&gt;The accent color is inserted into the HTML at three points: the background radial gradient (at two different opacity levels: &lt;code&gt;accent + "55"&lt;/code&gt; and &lt;code&gt;accent + "33"&lt;/code&gt;), the brand mark block, and the tag pill borders. This gives each article a visually distinct color family without requiring any per-article design decision.&lt;/p&gt;
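
&lt;p&gt;The two suffixes are the alpha channel of CSS &lt;code&gt;#RRGGBBAA&lt;/code&gt; notation: &lt;code&gt;55&lt;/code&gt; is roughly 33% opacity and &lt;code&gt;33&lt;/code&gt; is 20%. A short sketch of the arithmetic follows; the helper names and the gradient positions are hypothetical, since the template's actual CSS is not shown.&lt;/p&gt;

```python
def with_alpha(accent, alpha_hex):
    """Append a two-digit hex alpha channel to a #RRGGBB color."""
    return accent + alpha_hex


def alpha_fraction(alpha_hex):
    """Convert a two-digit hex alpha value to an opacity between 0 and 1."""
    return int(alpha_hex, 16) / 255


def background_gradient(accent):
    """Build a two-layer radial gradient from one accent color."""
    # Hypothetical layout: positions and stop sizes are illustrative only.
    return (
        f"radial-gradient(circle at 20% 20%, {with_alpha(accent, '55')} 0%, transparent 55%), "
        f"radial-gradient(circle at 80% 80%, {with_alpha(accent, '33')} 0%, transparent 50%)"
    )
```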

&lt;p&gt;Font size also adjusts dynamically: titles over 70 characters render at 54px; shorter titles render at 64px. This is a heuristic that prevents long titles from overflowing the card boundary. It's not perfect for every title, but I haven't needed to manually override anything across 22 articles yet.&lt;/p&gt;
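
&lt;p&gt;The heuristic reduces to a single threshold check. In this sketch the function and constant names are hypothetical; the 70-character cutoff and the two pixel sizes are the ones described above.&lt;/p&gt;

```python
LONG_TITLE_THRESHOLD = 70  # characters


def title_font_size(title):
    """Return the title font size in pixels for the OG card."""
    return 54 if len(title) > LONG_TITLE_THRESHOLD else 64
```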

&lt;h2&gt;
  
  
  The key implementation: &lt;code&gt;wait_until="networkidle"&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The core Playwright call is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;viewport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;630&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;full_page&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;630&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;wait_until="networkidle"&lt;/code&gt; argument was the critical discovery. Without it, &lt;code&gt;set_content()&lt;/code&gt; returns on the default &lt;code&gt;load&lt;/code&gt; event, and the screenshot can fire before Google Fonts has delivered and applied Inter. The result: the fallback &lt;code&gt;system-ui&lt;/code&gt; font renders instead, which looks noticeably different and varies by the runner's OS default.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;networkidle&lt;/code&gt; tells Playwright to wait until there have been no network connections for at least 500ms. In practice this means the Google Fonts CDN request completes and Inter loads before the screenshot fires. This adds roughly 300–500ms per image.&lt;/p&gt;

&lt;p&gt;The template includes &lt;code&gt;&amp;lt;link rel="preconnect" href="https://fonts.googleapis.com"&amp;gt;&lt;/code&gt; and the corresponding &lt;code&gt;gstatic.com&lt;/code&gt; preconnect to minimize the latency. Without preconnect, I saw occasional timeouts where the font didn't load fast enough within the idle window.&lt;/p&gt;

&lt;h2&gt;
  
  
  The browser instance stays open across all articles
&lt;/h2&gt;

&lt;p&gt;One implementation detail that matters for batch performance: the script opens a single browser instance and reuses the same &lt;code&gt;page&lt;/code&gt; object across all articles, calling &lt;code&gt;set_content()&lt;/code&gt; in a loop rather than navigating to a URL.&lt;/p&gt;

&lt;p&gt;This is faster than opening a new browser per article because Playwright browser startup time is around 500ms. For 22 articles, that's ~11 seconds saved. For 200 articles, it would be ~100 seconds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;clip&lt;/code&gt; parameter on &lt;code&gt;screenshot()&lt;/code&gt; is necessary even though the viewport is already set to 1200x630. Without it, Playwright screenshots include a 1px bottom border artifact on some versions of Chromium. The clip forces the exact pixel region I want.&lt;/p&gt;
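
&lt;p&gt;A sketch of the batch loop under those constraints: one browser launch, one &lt;code&gt;page&lt;/code&gt;, and the same &lt;code&gt;clip&lt;/code&gt; for every screenshot. The &lt;code&gt;render_all&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; names, the &lt;code&gt;(slug, html)&lt;/code&gt; input shape, and the &lt;code&gt;og&lt;/code&gt; output directory are assumptions for illustration; the Playwright calls are the ones already shown in the article.&lt;/p&gt;

```python
from pathlib import Path

OG_WIDTH, OG_HEIGHT = 1200, 630
CLIP = {"x": 0, "y": 0, "width": OG_WIDTH, "height": OG_HEIGHT}


def render_all(page, articles, out_dir):
    """Screenshot each (slug, html) pair with one shared page object."""
    out_dir = Path(out_dir)
    written = []
    for slug, html in articles:
        out_path = out_dir / f"{slug}.png"
        # Wait for font requests to settle before capturing.
        page.set_content(html, wait_until="networkidle")
        page.screenshot(path=str(out_path), full_page=False, clip=CLIP)
        written.append(out_path)
    return written


def main(articles, out_dir="og"):
    """Launch Chromium once and render the whole batch."""
    # Imported here so render_all() can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()  # one launch for the whole batch
        context = browser.new_context(
            viewport={"width": OG_WIDTH, "height": OG_HEIGHT}
        )
        page = context.new_page()
        written = render_all(page, articles, out_dir)
        browser.close()
    return written
```

&lt;p&gt;Keeping the loop in a function that only needs a &lt;code&gt;page&lt;/code&gt;-like object also makes it possible to check the call sequence with a stub, without starting a browser.&lt;/p&gt;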

&lt;h2&gt;
  
  
  Two image formats from one pipeline
&lt;/h2&gt;

&lt;p&gt;The same GitHub Actions job runs two separate scripts: &lt;code&gt;generate-og.py&lt;/code&gt; for the standard 1200×630 OG image (used by Twitter/X, LinkedIn, Dev.to article cards), and &lt;code&gt;generate-summary.py&lt;/code&gt; for a 1080×1350 portrait image optimized for Bluesky's visual post format.&lt;/p&gt;

&lt;p&gt;The portrait image uses a structured layout with optional sections — card grids, pipeline diagrams, or stat blocks — depending on what &lt;code&gt;summary_data&lt;/code&gt; YAML is present in the article frontmatter. Articles that don't have &lt;code&gt;summary_data&lt;/code&gt; skip the portrait generation entirely and fall back to the URL card Bluesky generates natively.&lt;/p&gt;

&lt;p&gt;This is the same pipeline that runs the &lt;a href="https://dev.to/morinaga/what-i-learned-wiring-json-ld-structured-data-audits-into-a-post-deploy-ci-step-..."&gt;post-deploy JSON-LD audit&lt;/a&gt; and &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload&lt;/a&gt;. Adding image generation was a matter of adding two &lt;code&gt;python3 scripts/...&lt;/code&gt; steps — no new runner setup beyond &lt;code&gt;pip install playwright &amp;amp;&amp;amp; playwright install chromium&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cover_image auto-patch
&lt;/h2&gt;

&lt;p&gt;One quality-of-life feature: the script writes &lt;code&gt;cover_image: &amp;lt;url&amp;gt;&lt;/code&gt; back into the article's frontmatter automatically if it's missing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_cover_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;article_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_frontmatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cover_image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;cover_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/og/articles/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;slug_base&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replaced&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^(---\n)([\s\S]*?)(\n---\n)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;cover_image: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cover_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;replaced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# no frontmatter block matched; leave the file untouched&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;article_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The URL is deterministic — it's the slug plus &lt;code&gt;.png&lt;/code&gt; on my CDN. The script generates the image first, then updates the frontmatter. Dev.to and Hashnode both read &lt;code&gt;cover_image&lt;/code&gt; from the frontmatter when publishing, so the OG image shows up as the article cover automatically. No manual step.&lt;/p&gt;
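
&lt;p&gt;To see what the frontmatter regex does in isolation, here is a small sketch that applies the same pattern to an in-memory string instead of a file. The &lt;code&gt;add_cover_image()&lt;/code&gt; name, the sample article, and the &lt;code&gt;example.com&lt;/code&gt; URL are made up for illustration, and the "already has a cover" check is left out. It uses &lt;code&gt;re.subn()&lt;/code&gt; so the caller can tell whether a frontmatter block was actually matched:&lt;/p&gt;

```python
import re

# Same pattern as above: opening fence, frontmatter body, closing fence.
FRONTMATTER_RE = re.compile(r"^(---\n)([\s\S]*?)(\n---\n)")


def add_cover_image(content, cover_url):
    """Return (new_content, changed); appends cover_image as the last key."""
    new_content, n = FRONTMATTER_RE.subn(
        lambda m: m.group(1) + m.group(2) + f"\ncover_image: {cover_url}" + m.group(3),
        content,
        count=1,
    )
    return new_content, n == 1


sample = "---\ntitle: Hello\ntags: python\n---\nBody text\n"
patched, changed = add_cover_image(sample, "https://example.com/og/articles/hello.png")
print(patched)
```

&lt;p&gt;Because the pattern is anchored with &lt;code&gt;^&lt;/code&gt; and expects &lt;code&gt;\n&lt;/code&gt; line endings, a file without leading frontmatter (or one saved with CRLF endings) is returned unchanged with &lt;code&gt;changed&lt;/code&gt; set to &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;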

&lt;h2&gt;
  
  
  Comparison: Playwright vs the alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;API cost&lt;/th&gt;
&lt;th&gt;CSS control&lt;/th&gt;
&lt;th&gt;CI-friendly&lt;/th&gt;
&lt;th&gt;On-demand capable&lt;/th&gt;
&lt;th&gt;Font flexibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Playwright + HTML&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Yes (slow, ~2s/image)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Any web font&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudinary transformations&lt;/td&gt;
&lt;td&gt;$89/mo at scale&lt;/td&gt;
&lt;td&gt;Template only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Cloudinary library&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@vercel/og (Satori)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;JSX subset&lt;/td&gt;
&lt;td&gt;Vercel only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Web fonts via fetch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;node-canvas&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Needs native build&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;System + manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pillow + Python&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Pixel-level only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;PIL-loaded fonts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;static sites where every page is a flat HTML file&lt;/a&gt;, on-demand OG generation is irrelevant — there's no server to serve it from. Playwright is the only option on this list that gives full CSS control without either native compilation issues or a billed external service.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd change
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bundle Inter locally instead of fetching from Google Fonts.&lt;/strong&gt; The &lt;code&gt;networkidle&lt;/code&gt; approach works, but it means a slow or blocked CDN during CI can cause font loading failures. Bundling the Inter woff2 file in the repo eliminates the network dependency entirely. I haven't done this yet because Google Fonts is convenient and the CDN has been reliable, but a CI failure at 07:00 JST because of a Google CDN blip would motivate the change immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run images in parallel with &lt;code&gt;asyncio&lt;/code&gt;.&lt;/strong&gt; The synchronous Playwright API processes articles sequentially. For 22 articles at ~2 seconds each, total time is around 45 seconds. For 200 articles, it would be ~7 minutes, which is too slow for a per-commit CI step. Playwright's async API returns awaitables, so several pages can be rendered concurrently with &lt;code&gt;asyncio.gather()&lt;/code&gt;. I'll need this before the article count gets much larger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One Playwright instance for both image formats.&lt;/strong&gt; Currently &lt;code&gt;generate-og.py&lt;/code&gt; and &lt;code&gt;generate-summary.py&lt;/code&gt; are separate scripts that each launch their own browser. A single script that generates both formats per article would halve browser launch overhead. Minor at current scale, relevant at 200+ articles.&lt;/p&gt;
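
&lt;p&gt;For the &lt;code&gt;asyncio&lt;/code&gt; idea, a rough sketch of bounded concurrency, with the assumptions stated up front: &lt;code&gt;render_one()&lt;/code&gt; is a stand-in that only sleeps (a real version would await &lt;code&gt;set_content()&lt;/code&gt; and &lt;code&gt;screenshot()&lt;/code&gt; on its own page via Playwright's async API), and the limit of 4 concurrent renders is an arbitrary guess rather than a measured value:&lt;/p&gt;

```python
import asyncio


async def render_one(slug):
    """Stand-in for one render; a real version would drive its own page."""
    await asyncio.sleep(0.01)  # placeholder for set_content + screenshot
    return f"{slug}.png"


async def render_batch(slugs, limit=4):
    """Render all slugs with at most `limit` renders in flight at once."""
    gate = asyncio.Semaphore(limit)

    async def bounded(slug):
        async with gate:
            return await render_one(slug)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(bounded(s) for s in slugs))


results = asyncio.run(render_batch([f"article-{i}" for i in range(10)]))
print(len(results), results[0])
```

&lt;p&gt;The semaphore is the part worth keeping: every open Chromium page costs memory on the CI runner, so an unbounded &lt;code&gt;gather()&lt;/code&gt; over 200 articles would likely trade a slow job for a failing one.&lt;/p&gt;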

&lt;h2&gt;
  
  
  The hard limit: not suitable for on-demand generation
&lt;/h2&gt;

&lt;p&gt;If you need OG images generated per-request — for a blog where new posts are published dynamically, or for a user-facing tool — this approach doesn't work. Playwright takes 2+ seconds per image and requires a full Chromium binary. Serving that from a request path is impractical.&lt;/p&gt;

&lt;p&gt;For on-demand generation at low volume, @vercel/og or a Cloudflare Worker with a canvas API is the right answer. For batch generation at build time in CI, where you control the timing and don't care about per-image latency, Playwright is simpler than any alternative I've found.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Why Python instead of Node.js Playwright?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both Playwright SDKs are functionally equivalent for this use case. I chose Python because my other image-related scripts (&lt;code&gt;generate-summary.py&lt;/code&gt;, &lt;code&gt;polish.py&lt;/code&gt;) were already Python, and keeping them in one language simplifies the CI setup. The &lt;code&gt;sync_playwright&lt;/code&gt; API is slightly more readable than the async Node.js version for sequential batch processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does &lt;code&gt;wait_until="networkidle"&lt;/code&gt; always ensure fonts load?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not guaranteed. &lt;code&gt;networkidle&lt;/code&gt; fires once there have been zero network connections for 500ms. If the Google Fonts CDN request hasn't started by the time that idle window ends (which can happen if Playwright renders the DOM very quickly), the font request comes after the screenshot. In practice, the &lt;code&gt;&amp;lt;link rel="preconnect"&amp;gt;&lt;/code&gt; tags I added push the font request early enough that I haven't seen this failure mode. A more reliable approach is to wait explicitly for font loading to finish, for example with &lt;code&gt;page.wait_for_function("document.fonts.ready")&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I use this for dynamically generated pages?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, with caveats. You can pass a real URL instead of &lt;code&gt;set_content()&lt;/code&gt; using &lt;code&gt;page.goto(url, wait_until="networkidle")&lt;/code&gt;. This works well for screenshotting pages that already exist. The timing is less predictable than screenshotting a controlled HTML string because you don't control what JavaScript the page runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Why not use Satori directly without @vercel/og?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Satori is an interesting option — it renders JSX to SVG, which you can then convert to PNG with &lt;code&gt;sharp&lt;/code&gt;. It's faster per image than Playwright, doesn't require a browser binary, and works in any Node.js environment. The limitation is that it supports a subset of CSS: no &lt;code&gt;background-image: radial-gradient()&lt;/code&gt;, no &lt;code&gt;backdrop-filter&lt;/code&gt;, limited &lt;code&gt;position&lt;/code&gt; support. For my template — which depends on radial gradients for the card background — Satori would require redesigning the layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How does the cover image URL work if the PNG isn't published yet?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;cover_image&lt;/code&gt; URL points to &lt;code&gt;https://aiappdex.com/og/articles/&amp;lt;slug&amp;gt;.png&lt;/code&gt;. The script generates the PNG first and commits it to the repository. Cloudflare Pages deploys the committed file, so by the time the article publishes to Dev.to and Hashnode, the OG image is already live at that URL. The sequence matters: image generation and commit happen before the publish step runs.&lt;/p&gt;

&lt;p&gt;Related reading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/morinaga/auto-generating-youtube-thumbnails-with-ffmpeg-inside-a-ci-pipeline-53bn"&gt;Auto-generating YouTube thumbnails with ffmpeg inside a CI pipeline&lt;/a&gt; — a different batch image generation approach in the same CI stack&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;How I fixed a Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — timing issues in the same publish pipeline&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;Why I'm betting static SSG beats dynamic AI rendering for directory SEO&lt;/a&gt; — why static architecture makes on-demand OG generation irrelevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>tutorial</category>
      <category>githubactions</category>
      <category>python</category>
    </item>
    <item>
      <title>Rolling a Google Service Account JWT in Node.js without the googleapis package</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:52:09 +0000</pubDate>
      <link>https://dev.to/morinaga/rolling-a-google-service-account-jwt-in-nodejs-without-the-googleapis-package-pig</link>
      <guid>https://dev.to/morinaga/rolling-a-google-service-account-jwt-in-nodejs-without-the-googleapis-package-pig</guid>
      <description>&lt;p&gt;The &lt;code&gt;googleapis&lt;/code&gt; npm package is the default answer for calling Google APIs from Node.js. It works, but it installs around 380KB and brings in over 450 transitive dependencies. For a single API used in a CI script — the Search Console URL Inspection API — the underlying auth flow is simple enough to handle directly.&lt;/p&gt;

&lt;p&gt;I built &lt;code&gt;scripts/gsc-inspect.mjs&lt;/code&gt; to check index status for published URLs. It's about 60 lines, uses three Node.js built-ins (&lt;code&gt;crypto&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt;, &lt;code&gt;URL&lt;/code&gt;), and adds zero packages to the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The service account auth flow
&lt;/h2&gt;

&lt;p&gt;Google's service account auth follows RFC 7523 — the JWT Bearer Grant profile of OAuth2. The steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Construct a JWT with your service account's &lt;code&gt;client_email&lt;/code&gt; and private key&lt;/li&gt;
&lt;li&gt;POST that JWT to &lt;code&gt;https://oauth2.googleapis.com/token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Receive a short-lived access token (valid 3600 seconds)&lt;/li&gt;
&lt;li&gt;Use the access token as a &lt;code&gt;Bearer&lt;/code&gt; header on API requests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The JWT claims:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://www.googleapis.com/auth/webmasters.readonly&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;aud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://oauth2.googleapis.com/token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;iat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing worth knowing upfront: use the &lt;code&gt;webmasters&lt;/code&gt; scope, not &lt;code&gt;searchconsole&lt;/code&gt;. The URL Inspection API requires &lt;code&gt;webmasters.readonly&lt;/code&gt; — the newer &lt;code&gt;searchconsole&lt;/code&gt; scope doesn't grant access to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signing with Node's crypto module
&lt;/h2&gt;

&lt;p&gt;Base64url encoding is the only non-obvious part. Standard base64 needs three character replacements to become base64url:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createSign&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;b64url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/=+$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// strip padding&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// + → -&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// / → _&lt;/span&gt;

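&lt;span class="c1"&gt;// JWT header: Google expects RS256 (RSA signature with SHA-256)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;alg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RS256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;typ&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;JWT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unsigned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;b64url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;b64url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;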
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createSign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RSA-SHA256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;signer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;unsigned&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;signer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/=+$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jwt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;unsigned&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;sa.private_key&lt;/code&gt; is the RSA private key string from the service account JSON you download from Google Cloud Console. It's already in PKCS#8 PEM format (&lt;code&gt;-----BEGIN PRIVATE KEY-----...&lt;/code&gt;), so &lt;code&gt;createSign("RSA-SHA256").sign(key)&lt;/code&gt; works directly. No key conversion or external library needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exchanging the JWT for an access token
&lt;/h2&gt;

&lt;p&gt;The token exchange is a URL-encoded form POST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://oauth2.googleapis.com/token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/x-www-form-urlencoded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;grant_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;urn:ietf:params:oauth:grant-type:jwt-bearer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;assertion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Token exchange failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;): &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;accessToken&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error handling matters here. The &lt;code&gt;invalid_grant&lt;/code&gt; error from Google often includes a useful &lt;code&gt;error_description&lt;/code&gt; like &lt;code&gt;"Token must expire within 3600 seconds of the issued time"&lt;/code&gt; or &lt;code&gt;"Service account not found"&lt;/code&gt;. Logging the raw response body — truncated to 300 chars — surfaces that directly without digging through a framework's error abstraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling the URL Inspection endpoint
&lt;/h2&gt;

&lt;p&gt;With the token in hand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://searchconsole.googleapis.com/v1/urlInspection/index:inspect&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;inspectionUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;siteUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;siteHost&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;coverageState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inspectionResult&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;indexStatusResult&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;coverageState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lastCrawlTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inspectionResult&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;indexStatusResult&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;lastCrawlTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;siteUrl&lt;/code&gt; field must match a property you've verified in Google Search Console, and it must be the exact string you registered (the trailing slash matters). A URL-prefix property uses the full URL, as in the request above; a Domain property uses the form &lt;code&gt;sc-domain:example.com&lt;/code&gt; instead. After &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns-2jfh"&gt;verifying three domains with Cloudflare DNS TXT records&lt;/a&gt;, you also need to add the service account's &lt;code&gt;client_email&lt;/code&gt; as a Search Console user (Owner or Full user) before the API will respond to requests.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;coverageState&lt;/code&gt; field is a human-readable string such as &lt;code&gt;Submitted and indexed&lt;/code&gt; or &lt;code&gt;Crawled - currently not indexed&lt;/code&gt;; the coarser &lt;code&gt;verdict&lt;/code&gt; field next to it is an enum (&lt;code&gt;PASS&lt;/code&gt;, &lt;code&gt;FAIL&lt;/code&gt;, &lt;code&gt;NEUTRAL&lt;/code&gt;, and so on). For post-publish verification, one JSON line per URL is enough: it is grep-able and loggable in CI without any special tooling.&lt;/p&gt;
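
&lt;p&gt;One possible way to use those log lines afterwards is to tally them by &lt;code&gt;coverageState&lt;/code&gt;. The sketch below is an assumption about how the CI log might be post-processed, not something the original script does; &lt;code&gt;tallyCoverage&lt;/code&gt; is a hypothetical name, and it makes no assumption about which state strings appear.&lt;/p&gt;

```javascript
// Tally JSON log lines by coverageState; lines that are not audit entries are skipped.
function tallyCoverage(logText) {
  const counts = {};
  for (const line of logText.split("\n")) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // ordinary CI log noise
    }
    // Only count objects that look like the one-line-per-URL records.
    if (typeof entry?.url !== "string") continue;
    const state = entry.coverageState ?? "(missing)";
    counts[state] = (counts[state] ?? 0) + 1;
  }
  return counts;
}
```

&lt;p&gt;A record with no &lt;code&gt;coverageState&lt;/code&gt; is counted under &lt;code&gt;(missing)&lt;/code&gt;, which usually means the API returned an error object instead of an inspection result.&lt;/p&gt;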

&lt;h2&gt;
  
  
  When not to use this approach
&lt;/h2&gt;

&lt;p&gt;A raw implementation suits a single API call in a CI script. It's less appropriate when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're calling multiple Google APIs and want unified auth handling&lt;/li&gt;
&lt;li&gt;You need automatic token refresh across long-running processes&lt;/li&gt;
&lt;li&gt;You need retry logic, batching, or type-safe API responses&lt;/li&gt;
&lt;li&gt;You're shipping production server code where a well-tested library is worth the extra dependency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this project, the tradeoff is clear: I'm running &lt;a href="https://dev.to/morinaga/why-i-reused-a-single-ci-pipeline-for-two-youtube-channels-and-three-seo-sites-50ae"&gt;one CI pipeline across five automated workflows&lt;/a&gt; and avoiding unnecessary npm additions. Sixty lines of readable, inspectable code beats 450 transitive dependencies for a use case this narrow.&lt;/p&gt;

&lt;p&gt;The implementation is also useful as documentation. The googleapis package abstracts the JWT flow so thoroughly that many developers don't know what's actually happening in the auth exchange. Understanding the raw flow — JWT → token endpoint → Bearer header — makes debugging auth failures faster regardless of what library you end up using.&lt;/p&gt;

&lt;p&gt;The sites are still new. I'll publish actual per-URL index coverage data in 30 days once there's something meaningful to report.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>typescript</category>
    </item>
    <item>
      <title>What I learned wiring JSON-LD structured data audits into a post-deploy CI step</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:51:52 +0000</pubDate>
      <link>https://dev.to/morinaga/what-i-learned-wiring-json-ld-structured-data-audits-into-a-post-deploy-ci-step-3ga8</link>
      <guid>https://dev.to/morinaga/what-i-learned-wiring-json-ld-structured-data-audits-into-a-post-deploy-ci-step-3ga8</guid>
      <description>&lt;p&gt;The conclusion first: JSON-LD structured data is one of those things that can vanish from your site without breaking anything visible. The Astro build succeeds. The Cloudflare Pages deploy completes. The page renders fine in a browser. But inside the &lt;code&gt;&amp;lt;script type="application/ld+json"&amp;gt;&lt;/code&gt; block — what Googlebot reads to decide whether your page qualifies for rich results — something went wrong, and you won't know until Search Console flags it weeks later.&lt;/p&gt;

&lt;p&gt;I added a post-deploy audit step to my CI pipeline that catches this kind of regression in under 60 seconds. Here's how the script works, what it found on first run, and where the approach falls short.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why structured data breaks silently in Astro SSG
&lt;/h2&gt;

&lt;p&gt;My three directory sites — aiappdex.com, findindiegame.com, ossfind.com — are &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;fully static Astro 5 SSG builds deployed to Cloudflare Pages&lt;/a&gt;. Structured data lives in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; of each page, injected by layout components. No server-side rendering, no dynamic injection.&lt;/p&gt;

&lt;p&gt;The schema types in use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SoftwareApplication&lt;/code&gt; + &lt;code&gt;BreadcrumbList&lt;/code&gt; on aiappdex.com model pages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VideoGame&lt;/code&gt; + &lt;code&gt;BreadcrumbList&lt;/code&gt; on findindiegame.com game pages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ItemList&lt;/code&gt; + &lt;code&gt;BreadcrumbList&lt;/code&gt; on ossfind.com alternatives pages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WebSite&lt;/code&gt; on all homepages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These all come from Astro layout components. When I add a new slot, reorganize the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;, or extract shared layout logic, the JSON-LD block can disappear. The Astro compiler doesn't validate structured data. The build step doesn't check it. The deploy succeeds. Nothing errors.&lt;/p&gt;

&lt;p&gt;This matters especially for &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;static SSG sites where correctness at build time is the only opportunity&lt;/a&gt; — there's no server to validate output at runtime. If a template change drops the &lt;code&gt;VideoGame&lt;/code&gt; schema from 2,000 game pages, the damage is done by the time the deploy finishes.&lt;/p&gt;

&lt;p&gt;I mentioned in a &lt;a href="https://dev.to/morinaga/5-things-i-noticed-this-week-while-shipping-three-programmatic-seo-sites-4b26"&gt;weekly recap&lt;/a&gt; that I suspected some pages had malformed FAQ JSON-LD. That was the nudge to actually build the check.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the audit script checks
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;scripts/audit-jsonld.mjs&lt;/code&gt; defines a table of expectations per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SITES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aiappdex.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WebSite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pathRegex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;models&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SoftwareApplication&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BreadcrumbList&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;findindiegame.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WebSite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pathRegex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;games&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;VideoGame&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BreadcrumbList&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ossfind.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WebSite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pathRegex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;alternatives&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;expectedTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ItemList&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BreadcrumbList&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each site, the script fetches the homepage and two sample detail pages, extracts all JSON-LD blocks, collects the &lt;code&gt;@type&lt;/code&gt; values present, and reports any expected type that's missing.&lt;/p&gt;
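
&lt;p&gt;The comparison step can be sketched roughly as follows. The function names are hypothetical, and accepting an array-valued &lt;code&gt;@type&lt;/code&gt; (which JSON-LD allows) is an assumption; the original script may handle that case differently.&lt;/p&gt;

```javascript
// Collect every @type present, accepting both "VideoGame" and ["VideoGame", "Product"].
function collectTypes(items) {
  const present = new Set();
  for (const item of items) {
    for (const t of [item["@type"]].flat()) {
      if (typeof t === "string") present.add(t);
    }
  }
  return present;
}

// Return the expected types that never appeared on the page.
function findMissingTypes(items, expectedTypes) {
  const present = collectTypes(items);
  return expectedTypes.filter((t) => !present.has(t));
}

// Example: VideoGame is present (inside an array), so only BreadcrumbList is missing.
const sampleItems = [{ "@type": ["VideoGame", "Product"] }, { "@type": "WebSite" }];
console.log(findMissingTypes(sampleItems, ["VideoGame", "BreadcrumbList"]));
```

&lt;p&gt;An empty result means the page carries every expected type; anything else is what the audit reports.&lt;/p&gt;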

&lt;p&gt;It runs against &lt;strong&gt;live deployed pages&lt;/strong&gt;, not build output. If Cloudflare is still serving a cached copy of the old page, or a CDN edge is serving different HTML than origin, the audit sees the same markup a crawler would and flags any expected type that is absent. Testing the build artifact catches template errors earlier, but not deployment and caching issues, and those are real failure modes I've &lt;a href="https://dev.to/morinaga/cloudflare-pages-returned-http-500-on-every-page-that-contained-mdoco-4pbh"&gt;hit before with Cloudflare Pages&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it discovers live pages from the sitemap
&lt;/h2&gt;

&lt;p&gt;Instead of hardcoding detail page paths, the script reads the live sitemap to find real pages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;discoverDetailPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sitemap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-0.xml`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;sitemap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;loc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;loc&amp;gt;/g&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filename &lt;code&gt;sitemap-0.xml&lt;/code&gt; is intentional. As I &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;documented earlier in this series&lt;/a&gt;, &lt;code&gt;@astrojs/sitemap&lt;/code&gt; on small sites (under roughly 1,000 pages) writes &lt;code&gt;/sitemap-0.xml&lt;/code&gt;, not &lt;code&gt;/sitemap-index.xml&lt;/code&gt;. Hardcoding &lt;code&gt;/sitemap-index.xml&lt;/code&gt; would cause discovery to fail silently — falling back to checking no detail pages at all.&lt;/p&gt;

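&lt;p&gt;Filtering against &lt;code&gt;pathRegex&lt;/code&gt; finds actual model/game/alternative pages that exist in the current production deployment. Because &lt;code&gt;slice(0, count)&lt;/code&gt; takes the first matches in sitemap order, each run checks the same two samples per site for as long as that order is stable, which is fast but not exhaustive.&lt;/p&gt;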

&lt;h2&gt;
  
  
  Extracting JSON-LD and handling @graph
&lt;/h2&gt;

&lt;p&gt;The extraction is a regex over the HTML, with one non-obvious case: the &lt;code&gt;@graph&lt;/code&gt; unwrapping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractJsonLd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="sr"&gt;/&amp;lt;script&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+type=&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;"'&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;application&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;ld&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;json&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;"'&lt;/span&gt;&lt;span class="se"&gt;][^&lt;/span&gt;&lt;span class="sr"&gt;&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;*&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;([\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?)&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;script&amp;gt;/gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;node&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@graph&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@graph&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_PARSE_ERROR&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some generators bundle multiple schema objects inside a top-level &lt;code&gt;@graph&lt;/code&gt; array. Google treats each item in &lt;code&gt;@graph&lt;/code&gt; as a separate entity, and the audit does the same. If a block contains &lt;code&gt;"@graph": [{ "@type": "VideoGame" }, { "@type": "BreadcrumbList" }]&lt;/code&gt;, both types are extracted and validated individually.&lt;/p&gt;

&lt;p&gt;Parse errors surface as a &lt;code&gt;_PARSE_ERROR&lt;/code&gt; entry. This catches malformed JSON before it reaches the &lt;code&gt;@type&lt;/code&gt; check — useful if a template interpolation injects an unescaped quote into the JSON block.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding it to the CI pipeline as a non-fatal step
&lt;/h2&gt;

&lt;p&gt;I wired it into &lt;code&gt;publish-articles.yml&lt;/code&gt; — the &lt;a href="https://dev.to/morinaga/why-i-reused-a-single-ci-pipeline-for-two-youtube-channels-and-three-seo-sites-50ae"&gt;same pipeline that handles article distribution across Dev.to, Hashnode, and Bluesky&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Audit JSON-LD (non-fatal)&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node scripts/audit-jsonld.mjs || echo "JSON-LD audit reported issues (non-fatal)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;||&lt;/code&gt; fallback is the key design decision. It means the step always exits 0, so a failing audit never blocks article publishing. Issues appear in the action log, but no deploy is halted.&lt;/p&gt;

&lt;p&gt;This mirrors how I handled the &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload timing issue&lt;/a&gt;: add the check first, observe what it reports in real conditions, fix the underlying problems, then tighten the failure mode. Making a new check fatal immediately guarantees you'll be debugging a blocked pipeline at the worst moment.&lt;/p&gt;

&lt;p&gt;Once all three sites audit clean on every run, I'll drop the &lt;code&gt;||&lt;/code&gt; and let a missing &lt;code&gt;BreadcrumbList&lt;/code&gt; fail the workflow. Not yet.&lt;/p&gt;
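
&lt;p&gt;For the &lt;code&gt;||&lt;/code&gt; fallback to have anything to catch, the script has to exit non-zero when it finds a problem. A minimal sketch of that final step, assuming a hypothetical &lt;code&gt;results&lt;/code&gt; array of &lt;code&gt;{ url, missing }&lt;/code&gt; objects (the original does not show how its report is structured):&lt;/p&gt;

```javascript
// Hypothetical report step: print one line per page and choose the exit code.
function report(results) {
  let failures = 0;
  for (const { url, missing } of results) {
    if (missing.length === 0) {
      console.log(`ok   ${url}`);
    } else {
      failures += 1;
      console.log(`FAIL ${url} missing: ${missing.join(", ")}`);
    }
  }
  // Non-zero tells the shell that the audit found something.
  return failures === 0 ? 0 : 1;
}

// In the script's entry point this would be wired up as:
//   process.exitCode = report(results);
```

&lt;p&gt;With this shape, tightening the failure mode later is a one-line change in the workflow file: removing the &lt;code&gt;||&lt;/code&gt; fallback lets the non-zero exit code fail the step.&lt;/p&gt;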

&lt;h2&gt;
  
  
  What it found on first run
&lt;/h2&gt;

&lt;p&gt;Three issues surfaced immediately:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ossfind.com alternatives pages: missing &lt;code&gt;ItemList&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Expected — I hadn't added &lt;code&gt;ItemList&lt;/code&gt; schema to the ossfind alternatives layout yet. The audit turned "I should add structured data to ossfind someday" into a concrete, CI-visible task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;findindiegame.com homepage: &lt;code&gt;http://&lt;/code&gt; in WebSite &lt;code&gt;@id&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;@id&lt;/code&gt; field in the &lt;code&gt;WebSite&lt;/code&gt; block was &lt;code&gt;http://findindiegame.com&lt;/code&gt;. I had copied a schema template and missed updating the protocol. Nothing breaks visibly: the page renders correctly and the structured data is syntactically valid. But the &lt;code&gt;@id&lt;/code&gt; is inconsistent with the &lt;code&gt;https://&lt;/code&gt; canonical URL that Googlebot sees for the page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;aiappdex.com model pages: &lt;code&gt;name&lt;/code&gt; field used raw HuggingFace model ID&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;name&lt;/code&gt; field in &lt;code&gt;SoftwareApplication&lt;/code&gt; schema contained &lt;code&gt;"meta-llama/Llama-3.1-8B-Instruct"&lt;/code&gt; — the raw database ID — instead of the human-readable &lt;code&gt;"Llama 3.1 8B Instruct"&lt;/code&gt; that appears in the page &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt;. Both values were available in the Astro component, but the template was pulling from the wrong field.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Site&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ossfind.com&lt;/td&gt;
&lt;td&gt;Missing &lt;code&gt;ItemList&lt;/code&gt; on alternatives pages&lt;/td&gt;
&lt;td&gt;Backlog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;findindiegame.com&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;http://&lt;/code&gt; in WebSite &lt;code&gt;@id&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;aiappdex.com&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;name&lt;/code&gt; used raw model ID instead of display name&lt;/td&gt;
&lt;td&gt;Fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The findindiegame.com and aiappdex.com issues were genuine bugs I wouldn't have found otherwise. Neither showed up in the Astro build, the Cloudflare deploy log, or any browser-level review. The audit surfaced them on first run because it reads structured data the same way Googlebot does: as text inside a &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag, not as something the browser renders.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd add next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;URL self-consistency check.&lt;/strong&gt; The &lt;code&gt;http://&lt;/code&gt; bug was caught only because I read through the audit's output by hand; no rule flagged it. A systematic check would verify that every &lt;code&gt;url&lt;/code&gt; or &lt;code&gt;@id&lt;/code&gt; field in structured data matches the actual canonical URL of the page, so that class of error gets caught automatically.&lt;/p&gt;
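&lt;p&gt;A minimal sketch of what such a check might look like, assuming the audit already holds the parsed JSON-LD object and the page's canonical URL. The helper names are hypothetical and not taken from &lt;code&gt;audit-jsonld.mjs&lt;/code&gt;. The rule is also narrower than full equality: it skips values on other hosts (external links are legitimate in structured data) and flags same-host values whose origin differs from the canonical one.&lt;/p&gt;

```javascript
// Sketch only: these helper names are hypothetical, not part of audit-jsonld.mjs.
const URL_KEYS = new Set(["@id", "url"]);

// Walk a parsed JSON-LD value and collect every string stored under "@id" or "url".
function collectUrlFields(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectUrlFields(item, found);
    return found;
  }
  if (node === null || typeof node !== "object") return found;
  for (const [key, value] of Object.entries(node)) {
    if (typeof value === "string") {
      if (URL_KEYS.has(key)) found.push({ key, value });
    } else {
      collectUrlFields(value, found);
    }
  }
  return found;
}

// Compare each collected value against the page's canonical URL.
// Values on other hosts are skipped; same-host values must share the canonical origin.
function findUrlMismatches(jsonLd, canonicalUrl) {
  const canonical = new URL(canonicalUrl);
  const issues = [];
  for (const { key, value } of collectUrlFields(jsonLd)) {
    let parsed;
    try {
      parsed = new URL(value);
    } catch {
      issues.push({ key, value, reason: "not an absolute URL" });
      continue;
    }
    if (parsed.hostname !== canonical.hostname) continue;
    if (parsed.origin !== canonical.origin) {
      issues.push({ key, value, reason: "origin differs from canonical" });
    }
  }
  return issues;
}
```

&lt;p&gt;Called with a &lt;code&gt;WebSite&lt;/code&gt; block whose &lt;code&gt;@id&lt;/code&gt; is &lt;code&gt;http://findindiegame.com&lt;/code&gt; and a canonical of &lt;code&gt;https://findindiegame.com/&lt;/code&gt;, this would report the &lt;code&gt;@id&lt;/code&gt; and leave a correct &lt;code&gt;url&lt;/code&gt; field alone.&lt;/p&gt;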

&lt;p&gt;&lt;strong&gt;&lt;code&gt;aggregateRating&lt;/code&gt; on VideoGame pages.&lt;/strong&gt; The Steam review data is already in the Turso database: &lt;code&gt;total_reviews&lt;/code&gt;, &lt;code&gt;total_positive&lt;/code&gt;, &lt;code&gt;review_score&lt;/code&gt;. Once I emit &lt;code&gt;aggregateRating&lt;/code&gt; structured data, the audit should verify it's present and well-formed on game pages.&lt;/p&gt;
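&lt;p&gt;A sketch of what that rule could check, assuming the audit has the parsed &lt;code&gt;VideoGame&lt;/code&gt; block in hand. The function name is hypothetical. The field requirements (a &lt;code&gt;ratingValue&lt;/code&gt; plus either &lt;code&gt;ratingCount&lt;/code&gt; or &lt;code&gt;reviewCount&lt;/code&gt;) follow schema.org's &lt;code&gt;AggregateRating&lt;/code&gt; type and Google's review snippet documentation. How Steam's &lt;code&gt;review_score&lt;/code&gt; maps onto &lt;code&gt;ratingValue&lt;/code&gt; is left open here.&lt;/p&gt;

```javascript
// Sketch only: a hypothetical audit rule, not part of audit-jsonld.mjs.
// Returns a list of problems found in a block's aggregateRating (empty if none).
function checkAggregateRating(block) {
  const rating = block.aggregateRating;
  if (rating === null || typeof rating !== "object") {
    return ["aggregateRating is missing"];
  }

  const problems = [];
  if (rating["@type"] !== "AggregateRating") {
    problems.push('@type should be "AggregateRating"');
  }

  // JSON-LD values are often serialized as strings, so parse before checking.
  const value = Number.parseFloat(rating.ratingValue);
  if (Number.isNaN(value)) {
    problems.push("ratingValue is missing or not numeric");
  }

  // Either ratingCount or reviewCount is acceptable; it must be a positive integer.
  const count = Number(rating.ratingCount ?? rating.reviewCount);
  if (Number.isInteger(count) === false || Math.sign(count) !== 1) {
    problems.push("ratingCount or reviewCount must be a positive integer");
  }

  return problems;
}
```

&lt;p&gt;Returning a list rather than throwing would let the rule report every problem on a page in one pass, which fits a step that is still non-fatal.&lt;/p&gt;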

&lt;p&gt;&lt;strong&gt;&lt;code&gt;FAQPage&lt;/code&gt; schema.&lt;/strong&gt; I want to add FAQ sections to top model pages on aiappdex.com. Once added, the audit needs a validation rule for those pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running against build output before deploy.&lt;/strong&gt; The current approach finds issues after they're live. Running the same extraction logic against the Astro build output — with a local &lt;code&gt;astro preview&lt;/code&gt; server in CI — would catch template regressions pre-deploy. That adds CI complexity I'm not ready to take on; post-deploy detection is good enough for now.&lt;/p&gt;

&lt;p&gt;The limit worth stating plainly: the audit checks 2 sample pages per site per run. It doesn't catch issues that only affect specific page types, rare edge cases in the data, or pages that happen not to be in the sitemap sample. It's a smoke test, not a full validation suite.&lt;/p&gt;

&lt;p&gt;Related reading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns-2jfh"&gt;Verifying three custom domains in Google Search Console with Cloudflare DNS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;@astrojs/sitemap generates /sitemap-0.xml not /sitemap-index.xml on small sites&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;The architecture behind three programmatic directory sites for $25/month&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>astro</category>
      <category>githubactions</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
