<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MORINAGA</title>
    <description>The latest articles on DEV Community by MORINAGA (@morinaga).</description>
    <link>https://dev.to/morinaga</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907455%2F8e6a4a13-bec8-4ec0-bc2d-ec192b7880f8.png</url>
      <title>DEV Community: MORINAGA</title>
      <link>https://dev.to/morinaga</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/morinaga"/>
    <language>en</language>
    <item>
      <title>Three post-deploy checks I run after every Cloudflare Pages build</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Thu, 21 May 2026 22:15:55 +0000</pubDate>
      <link>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-39ee</link>
      <guid>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-39ee</guid>
      <description>&lt;p&gt;After spending two weeks debugging issues that only showed up in production — a &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;sitemap _redirects rule that was blocking my own sitemap-index.xml&lt;/a&gt; and a &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — I added three post-deploy checks to my workflow. They're fast and specific to the failure modes I've actually hit, not a full end-to-end test suite.&lt;/p&gt;

&lt;p&gt;I run three sites (aiappdex.com, findindiegame.com, ossfind.com) on Cloudflare Pages with Astro 5 SSG. Here's what I check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 1: Sitemap reachability
&lt;/h2&gt;

&lt;p&gt;This is the simplest check, and the one I should have had from day one. After a Cloudflare Pages deploy, I verify that &lt;code&gt;sitemap-index.xml&lt;/code&gt; is reachable and returning 200 on all three domains:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;domain &lt;span class="k"&gt;in &lt;/span&gt;aiappdex.com findindiegame.com ossfind.com&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml → &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: &lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt; sitemap unreachable"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also check &lt;code&gt;sitemap-0.xml&lt;/code&gt; — the actual URL sub-sitemap that &lt;code&gt;@astrojs/sitemap&lt;/code&gt; generates — and assert that it contains at least a minimum expected URL count. For aiappdex.com that threshold is 1,000; if it drops below that after a deploy, the ETL data pipeline probably broke silently.&lt;/p&gt;
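&lt;p&gt;A minimal sketch of what that URL-count assertion could look like in Node-style JavaScript. The function names and the &lt;code&gt;minimums&lt;/code&gt; table are hypothetical; only the 1,000 threshold for aiappdex.com comes from the text above, and the other two values are placeholders. It counts closing &lt;code&gt;loc&lt;/code&gt; tags instead of parsing the XML, which should be enough for a smoke test.&lt;/p&gt;

```javascript
// Hypothetical minimum URL counts per domain. Only the aiappdex.com value
// comes from the article; the other two are placeholders for illustration.
const minimums = {
  "aiappdex.com": 1000,
  "findindiegame.com": 100,
  "ossfind.com": 100,
};

// Count URL entries by counting closing loc tags in the sitemap text.
function countSitemapUrls(xml) {
  return xml.split("/loc>").length - 1;
}

// Return a failure message, or null when the count meets the minimum.
function checkUrlCount(domain, xml) {
  const count = countSitemapUrls(xml);
  const minimum = minimums[domain] ?? 1;
  if (count >= minimum) return null;
  return `FAIL: ${domain} sitemap-0.xml has ${count} URLs, expected at least ${minimum}`;
}

// Fetch sitemap-0.xml for one domain and run the check (assumes Node 18+ global fetch).
async function checkDomain(domain) {
  const res = await fetch(`https://${domain}/sitemap-0.xml`);
  if (!res.ok) return `FAIL: ${domain} sitemap-0.xml returned ${res.status}`;
  return checkUrlCount(domain, await res.text());
}
```

&lt;p&gt;Calling &lt;code&gt;checkDomain("aiappdex.com")&lt;/code&gt; resolves to &lt;code&gt;null&lt;/code&gt; on success or to a message that can be printed next to the status line from the shell loop.&lt;/p&gt;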

&lt;p&gt;The reason this check exists: I had a &lt;code&gt;_redirects&lt;/code&gt; rule redirecting &lt;code&gt;sitemap-index.xml&lt;/code&gt; → &lt;code&gt;sitemap-0.xml&lt;/code&gt; as an emergency workaround that turned out to be wrong. It was live for five days before I found it. The rule kept the real &lt;code&gt;sitemap-index.xml&lt;/code&gt; from reaching crawlers while everything looked fine in the browser (which followed the redirect). Curl with &lt;code&gt;-o /dev/null -w "%{http_code}"&lt;/code&gt; doesn't follow redirects unless you pass &lt;code&gt;-L&lt;/code&gt;, so it would have printed the 3xx status instead of 200 and caught this immediately.&lt;/p&gt;
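&lt;p&gt;The same "don't follow redirects" behavior can be sketched in JavaScript. This assumes Node 18+, where the global &lt;code&gt;fetch&lt;/code&gt; with &lt;code&gt;redirect: "manual"&lt;/code&gt; should hand back the 3xx response itself. The &lt;code&gt;classifyStatus&lt;/code&gt; and &lt;code&gt;probe&lt;/code&gt; names are made up for illustration.&lt;/p&gt;

```javascript
// Classify a status code the way the curl check reads it: only 200 passes.
function classifyStatus(status) {
  if (status === 200) return "ok";
  if (Math.floor(status / 100) === 3) return "redirect";
  return "fail";
}

// Request the URL without following redirects (assumes Node 18+ global fetch).
async function probe(url) {
  const res = await fetch(url, { redirect: "manual" });
  return {
    url,
    status: res.status,
    verdict: classifyStatus(res.status),
    // Set only on redirects; shows where the rule points.
    location: res.headers.get("location"),
  };
}
```

&lt;p&gt;A &lt;code&gt;verdict&lt;/code&gt; of &lt;code&gt;"redirect"&lt;/code&gt; on &lt;code&gt;sitemap-index.xml&lt;/code&gt; is the signal that a rule is intercepting the path.&lt;/p&gt;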

&lt;h2&gt;
  
  
  Check 2: IndexNow batch submission
&lt;/h2&gt;

&lt;p&gt;After every successful sitemap check, I run &lt;code&gt;node scripts/indexnow.mjs&lt;/code&gt;. The script reads the live sitemap XML from each domain, collects all URLs, and POSTs them to the IndexNow endpoint for Bing, Yandex, Naver, and Seznam using site-specific keys.&lt;/p&gt;
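&lt;p&gt;For readers who haven't used IndexNow, here is a rough sketch of the request such a script sends. It is not the contents of &lt;code&gt;scripts/indexnow.mjs&lt;/code&gt;; the function names are hypothetical, and it assumes the key file sits at the site root as &lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;. The JSON field names and the 10,000-URL cap per request follow the IndexNow protocol.&lt;/p&gt;

```javascript
const INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow";
const MAX_URLS_PER_POST = 10000; // protocol limit per request

// Build the JSON bodies for one host, split into batches under the limit.
function buildIndexNowBatches(host, key, urls) {
  // Keep only URLs on this host; URLs from other hosts would be rejected.
  const own = urls.filter((u) => new URL(u).hostname === host);
  const batches = [];
  for (let i = 0; own.length > i; i += MAX_URLS_PER_POST) {
    batches.push({
      host,
      key,
      keyLocation: `https://${host}/${key}.txt`, // assumed location of the key file
      urlList: own.slice(i, i + MAX_URLS_PER_POST),
    });
  }
  return batches;
}

// POST each batch and print one line per request (assumes Node 18+ global fetch).
async function submit(host, key, urls) {
  for (const body of buildIndexNowBatches(host, key, urls)) {
    const res = await fetch(INDEXNOW_ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json; charset=utf-8" },
      body: JSON.stringify(body),
    });
    console.log(`${host}: submitted ${body.urlList.length} URLs → ${res.status}`);
  }
}
```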

&lt;p&gt;The output looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aiappdex.com: submitted 1179 URLs → 200 OK
findindiegame.com: submitted 139 URLs → 200 OK
ossfind.com: submitted 144 URLs → 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a site returns 403 from IndexNow it usually means the key verification file (&lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;) wasn't deployed correctly or a &lt;code&gt;_redirects&lt;/code&gt; rule is mangling the path. Catching this right after deploy matters because the IndexNow key-verification window isn't instantaneous — letting it sit in a broken state delays indexing. I wrote more about the IndexNow setup in &lt;a href="https://dev.to/morinaga/indexnow-libsql-and-three-other-tools-i-reached-for-this-week-5c4m"&gt;this week's tools post&lt;/a&gt;.&lt;/p&gt;
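&lt;p&gt;One way to narrow down a 403 is to fetch the key file directly and compare its body to the key. The sketch below is an illustration with hypothetical function names; the status descriptions paraphrase the response codes listed in the IndexNow documentation.&lt;/p&gt;

```javascript
// Response codes documented by the IndexNow protocol (descriptions paraphrased).
const INDEXNOW_STATUS = {
  200: "OK: URLs submitted",
  202: "Accepted: key validation still pending",
  400: "Bad request: invalid format",
  403: "Forbidden: key not valid (key file missing or wrong content)",
  422: "Unprocessable: URLs do not belong to the host or key mismatch",
  429: "Too many requests",
};

function explainStatus(status) {
  return INDEXNOW_STATUS[status] ?? `Unexpected status ${status}`;
}

// The key file must be served with status 200 and contain exactly the key.
function keyFileLooksValid(status, body, key) {
  return status === 200 ? body.trim() === key : false;
}

// Fetch the key file without following redirects, so a _redirects rule
// shows up as a 3xx instead of a 200 (assumes Node 18+ global fetch).
async function checkKeyFile(host, key) {
  const res = await fetch(`https://${host}/${key}.txt`, { redirect: "manual" });
  return keyFileLooksValid(res.status, await res.text(), key);
}
```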

&lt;p&gt;I run this manually after deploy rather than inline in the GitHub Actions workflow because the Cloudflare Pages build takes 2-3 minutes, and IndexNow works best with live URLs. Running it as a separate &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger after the deployment succeeds means I'm submitting URLs that are actually live rather than ones that might still be deploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 3: Weekly Lighthouse spot-check
&lt;/h2&gt;

&lt;p&gt;The third check runs on a cron — Monday 04:30 UTC — not after every deploy. It's slower (3-4 minutes per site, nine URLs total), so daily would be wasteful for a static site that doesn't change at runtime.&lt;/p&gt;

&lt;p&gt;The workflow uses &lt;code&gt;treosh/lighthouse-ci-action&lt;/code&gt; with one homepage and one deep entry page per site:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;aiappdex.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/models/timm-vit-base-patch16-clip-224-openai/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;findindiegame.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/games/dredge-1562430/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ossfind.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/alternatives/ghost/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm watching for Performance below 80, CLS above 0.1, or a regression in the accessibility score. Astro SSG with no client-side JS should hold steady on all three. If they slip, it means something in the Tailwind v4 config or the ad slot component changed the layout or paint behavior. The results upload to &lt;code&gt;temporaryPublicStorage&lt;/code&gt; so I can diff before/after on regressions.&lt;/p&gt;

&lt;p&gt;I don't set hard failure thresholds that block deploys. These sites are pre-revenue with essentially zero traffic right now; blocking a deploy because a Lighthouse score dropped from 94 to 88 would be disproportionate. I treat Lighthouse as a trend monitor, not a gate.&lt;/p&gt;
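&lt;p&gt;The watch list above (Performance below 80, CLS above 0.1, accessibility regression) can be written down as a small comparison that only produces warnings. This is a sketch under assumptions: the input shape (scores on a 0-100 scale plus a raw CLS value) and the function name are hypothetical, not the output format of the Lighthouse action.&lt;/p&gt;

```javascript
const PERFORMANCE_FLOOR = 80; // flag when Performance drops below this
const CLS_CEILING = 0.1; // flag when CLS rises above this

// Compare one run against the previous run for the same URL.
// Scores are on a 0-100 scale; cls is the raw layout-shift value.
function flagRegressions(current, previous) {
  const warnings = [];
  if (PERFORMANCE_FLOOR > current.performance) {
    warnings.push(`performance ${current.performance} is below ${PERFORMANCE_FLOOR}`);
  }
  if (current.cls > CLS_CEILING) {
    warnings.push(`CLS ${current.cls} is above ${CLS_CEILING}`);
  }
  if (previous.accessibility > current.accessibility) {
    warnings.push(`accessibility fell from ${previous.accessibility} to ${current.accessibility}`);
  }
  return warnings; // warnings only; nothing here fails a deploy
}
```

&lt;p&gt;A drop from 94 to 88 on Performance returns an empty list here, which matches the trend-monitor intent: it is visible in the history but not treated as a failure.&lt;/p&gt;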

&lt;h2&gt;
  
  
  What I'm deliberately not checking
&lt;/h2&gt;

&lt;p&gt;No uptime monitoring — I'm relying on Cloudflare's own infrastructure status. No end-to-end user flow tests. No API availability checks — the Turso DB is only queried at build time in SSG mode, so there's nothing to check at runtime.&lt;/p&gt;

&lt;p&gt;For a dynamically rendered site, those gaps would matter. For a static CDN deployment where the entire runtime is pre-built HTML, CSS, and a handful of JSON files, the three checks above cover the actual failure surface I've encountered.&lt;/p&gt;

&lt;p&gt;The publish pipeline has its own idempotency layer (it reads &lt;code&gt;published_urls&lt;/code&gt; from article frontmatter and skips already-distributed posts), so I don't need to verify cross-posting state after each deploy. That's a separate concern.&lt;/p&gt;
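&lt;p&gt;A sketch of that skip logic, assuming &lt;code&gt;published_urls&lt;/code&gt; is a map from platform name to the URL where the post already lives. That shape, the platform names, and the function name are assumptions for illustration; the real frontmatter layout isn't shown here.&lt;/p&gt;

```javascript
// frontmatter.published_urls is assumed to map a platform name to the URL
// of the already-published copy, e.g. { devto: "https://dev.to/..." }.
function pendingPlatforms(frontmatter, platforms) {
  const published = frontmatter.published_urls ?? {};
  // A platform is pending when it has no recorded URL yet.
  return platforms.filter((name) => !published[name]);
}
```

&lt;p&gt;Running the pipeline twice on the same article then becomes a no-op, because the second run finds every platform already recorded.&lt;/p&gt;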




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>githubactions</category>
      <category>astro</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Thu, 21 May 2026 22:15:52 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-2mp9</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-2mp9</guid>
      <description>&lt;p&gt;The obvious counterargument to everything I'm building is this: Google already does it. You type "best AI tools for video editing" into Google and an AI Overview surfaces a curated list, synthesized from the same kind of data I maintain, without requiring a click. My three directory sites — &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;, &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;, and &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — are competing with a feature baked into the world's dominant search engine.&lt;/p&gt;

&lt;p&gt;I launched these sites on 2026-04-23, built on &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;an architecture that runs at about $25/month&lt;/a&gt;. Traffic is essentially zero — the sites have been indexed for three weeks and organic crawling takes time. The question I keep returning to isn't whether Google will eventually index my pages. It's whether anyone will prefer clicking through to my site over reading the AI Overview box that already answered the same question.&lt;/p&gt;

&lt;p&gt;Here's my honest, falsifiable position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By October 2026 — six months post-launch — at least one of the three sites will show organic click trends in Google Search Console indicating real query traffic to specific comparison or filtered-browse pages. I define that as: at least 200 non-homepage organic clicks per month, sustained for two consecutive months, from queries I didn't directly drive through social or newsletter posts.&lt;/p&gt;

&lt;p&gt;If that doesn't happen, I'll publish the Search Console screenshots and write a post explaining what I got wrong. I'm committing to that here.&lt;/p&gt;
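&lt;p&gt;To keep the success criterion unambiguous, it can be expressed as a check over monthly click counts. This is only a sketch: the input is assumed to be a list of non-homepage organic clicks per calendar month, oldest first, already filtered by hand from Search Console exports.&lt;/p&gt;

```javascript
const CLICK_TARGET = 200; // non-homepage organic clicks per month
const REQUIRED_MONTHS = 2; // consecutive months at or above the target

// monthlyClicks: clicks per calendar month for one site, oldest first.
function meetsBet(monthlyClicks) {
  let streak = 0;
  for (const clicks of monthlyClicks) {
    // Extend the streak on a hit, reset it on a miss.
    streak = clicks >= CLICK_TARGET ? streak + 1 : 0;
    if (streak >= REQUIRED_MONTHS) return true;
  }
  return false;
}
```

&lt;p&gt;The bet holds if &lt;code&gt;meetsBet&lt;/code&gt; is true for at least one of the three sites.&lt;/p&gt;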

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;AI Overviews have gotten genuinely good at list-and-compare synthesis. If you search "open source alternative to Notion" today, Google often returns a four-item structured list with one-sentence descriptions directly in the Overview box. My Open Alternative To site covers that territory. The AI Overview absorbs the zero-click version of that query.&lt;/p&gt;

&lt;p&gt;The optimistic response is: "my site appears as a citation source." The pessimistic response is: "Google consumes your signal and stops sending clicks." The pessimistic version has supporting evidence — industry-wide CTR on informational queries dropped measurably as AI Overviews expanded through 2025, and the trend hasn't reversed.&lt;/p&gt;

&lt;p&gt;I don't think the pessimistic version is the whole story, but I'm not dismissing it. The most dangerous move is to assume the counterargument is wrong without designing around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI Overviews have structural blind spots
&lt;/h2&gt;

&lt;p&gt;AI Overviews are strong at synthesizing "what exists." They're weaker at three things I've deliberately built for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute-based filtering.&lt;/strong&gt; If someone wants "open source Notion alternatives that work offline and have a mobile app," AI Overviews give hedged prose answers because they're synthesizing text, not querying structured fields. My Turso DB has &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; as typed columns. Faceted filtering on those fields is something a browseable directory does better than a language model writing a paragraph about the general landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial negative-space.&lt;/strong&gt; My game recommender &lt;a href="https://dev.to/morinaga/adding-avoid-if-caveats-to-my-ai-game-recommender-what-changed-hk3"&gt;includes "avoid if" caveats&lt;/a&gt; — structured fields that answer "who should skip this?" generated by a Claude Haiku prompt that specifically forces a critical answer. AI Overviews don't have a mechanism to surface structured negatives. They default to positive framing, which means someone with a specific disqualifying requirement gets an unhelpful answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Freshness on maintenance status.&lt;/strong&gt; The ETL that populates the AI tools directory pulls GitHub commit activity weekly. A tool that hasn't been touched in 14 months is marked as low activity. AI Overviews don't distinguish between a tool actively maintained in 2026 and one that peaked in 2024 — they rely on the recency of web mentions, which can lag by months after a project goes dormant.&lt;/p&gt;

&lt;p&gt;None of these defenses are permanent. Google could build structured attribute filtering into AI Overviews. But they require deliberate pipeline design, not just synthesis, and the gap exists now.&lt;/p&gt;
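&lt;p&gt;To make the first and third points concrete, here is a sketch of faceted filtering and a low-activity flag over rows shaped like the typed columns mentioned above. The function names are hypothetical and the code works on plain objects rather than a Turso query; the 14-month cutoff is taken from the freshness paragraph.&lt;/p&gt;

```javascript
const LOW_ACTIVITY_MONTHS = 14; // cutoff taken from the freshness rule above

// Whole calendar months between two dates, ignoring the day of month.
function monthsBetween(from, to) {
  return (to.getUTCFullYear() - from.getUTCFullYear()) * 12 + (to.getUTCMonth() - from.getUTCMonth());
}

// row.last_commit_date is assumed to be an ISO date string such as "2025-01-10".
function isLowActivity(row, now) {
  return monthsBetween(new Date(row.last_commit_date), now) >= LOW_ACTIVITY_MONTHS;
}

// Keep rows where every requested facet matches exactly.
// facets example: { works_offline: true, has_mobile_app: true }
function filterByFacets(rows, facets) {
  const wanted = Object.entries(facets);
  return rows.filter((row) => wanted.every(([field, value]) => row[field] === value));
}
```

&lt;p&gt;The point of the sketch is that both operations are exact lookups on typed fields, which is the kind of answer a prose summary has trouble giving.&lt;/p&gt;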

&lt;h2&gt;
  
  
  The downstream click thesis
&lt;/h2&gt;

&lt;p&gt;Even if my sites lose the zero-click battle on broad discovery terms, there's a second query type I'm explicitly targeting: the downstream comparison query.&lt;/p&gt;

&lt;p&gt;The sequence: someone types "Notion alternatives" into Google, gets an AI Overview naming four tools, then types "Appflowy vs Anytype performance" to compare the two they're considering. That second query is post-AI-Overview research. It has commercial intent. It wants a verdict, not another list.&lt;/p&gt;

&lt;p&gt;For that query, a page with structured attribute comparison, a clear verdict, and fast load time competes directly with another AI-style answer — and structured data beats generative prose for "which one wins on attribute X." This is partly why &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;I chose static SSG over dynamic AI rendering&lt;/a&gt; for these sites: a fast, indexable page with typed comparison fields is what a second-stage research click needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query type&lt;/th&gt;
&lt;th&gt;AI Overview strength&lt;/th&gt;
&lt;th&gt;Directory strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery ("best tools for X")&lt;/td&gt;
&lt;td&gt;High — often answers directly&lt;/td&gt;
&lt;td&gt;Low for zero-click intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison ("X vs Y, which wins")&lt;/td&gt;
&lt;td&gt;Medium — hedges, rarely commits&lt;/td&gt;
&lt;td&gt;High — structured attrs + verdict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtered browse ("offline + mobile app")&lt;/td&gt;
&lt;td&gt;Low — prose, no filters&lt;/td&gt;
&lt;td&gt;High — faceted structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness ("is X still maintained?")&lt;/td&gt;
&lt;td&gt;Inconsistent — lags commits&lt;/td&gt;
&lt;td&gt;High — weekly ETL refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison and filtered-browse rows are the ones that actually carry this bet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the cost structure matters for intellectual honesty
&lt;/h2&gt;

&lt;p&gt;At $25/month, I can run this experiment for a year without needing revenue to justify continuing. I'm not under pressure to interpret ambiguous signals optimistically.&lt;/p&gt;

&lt;p&gt;Compare that to a project burning $200/month on infrastructure: you'd rationalize flat Search Console data as "still in the sandbox phase" past the point where the data actually says something. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;full cost breakdown&lt;/a&gt; is genuinely minimal — Vercel Pro at $20, Turso starter at $0, Claude Haiku API in single-digit dollars for monthly ETL runs, GitHub Actions on free minutes.&lt;/p&gt;

&lt;p&gt;I won't claim AdSense is approved or revenue is flowing until it is. Right now, &lt;a href="https://dev.to/morinaga/why-google-adsense-will-not-approve-a-vercelapp-site-110b"&gt;AdSense rejected the *.vercel.app version&lt;/a&gt; of the sites. I've moved to custom domains and &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns"&gt;verified them in Search Console&lt;/a&gt;. I'm waiting for real crawl data before making any claims about what's working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three outcomes would tell me the bet is wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impressions but near-zero clicks at 90 days.&lt;/strong&gt; If Search Console shows my pages appearing as AI Overview citation sources but click rates stay near zero on comparison pages specifically, Google is extracting my signal without distributing traffic. That's the worst-case scenario — I'd need to rethink the format entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdSense keeps rejecting after genuine depth improvements.&lt;/strong&gt; The original rejection was partly a *.vercel.app domain issue, but if Google's classifier still rates the pages as thin after I've rebuilt with real structured content and specific editorial attributes, my model of what "quality" means to the classifier is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison queries migrate fully to LLM chat.&lt;/strong&gt; If people stop typing "X vs Y" into Google and start asking ChatGPT directly, the downstream click I'm betting on disappears. I don't see evidence of this happening at scale for research involving specific attribute constraints — but I'm monitoring query volume patterns month-over-month.&lt;/p&gt;

&lt;p&gt;The first outcome is the one I'd want to see early. Impressions with near-zero clicks on comparison pages by month 3 would tell me to pivot the format immediately rather than wait six months for a conclusion I could have reached sooner.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why three sites instead of one authority site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three narrow sites let me test three different intent types simultaneously. Games-like, AI tools, and OSS alternatives attract different queries and different audiences. One site would take longer to produce the same signal volume about which format works. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;original architecture post&lt;/a&gt; covers the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Haiku generate the structured editorial fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each ETL run sends entries through a &lt;a href="https://dev.to/morinaga/how-i-built-a-shared-claude-haiku-client-with-system-prompt-caching-for-batch-etl-1ddp"&gt;shared Claude Haiku client&lt;/a&gt; that uses system-prompt caching to amortize the cost across batch runs. The prompts are tuned to force specific attribute outputs — avoid-if caveats, audience fit, freshness status — not open-ended descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if one site works and two don't?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful outcome, not a failure. The format that works tells me something specific about the intent type. I'll invest in what works and document what didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where will you publish the October 2026 verdict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On this blog, with raw Search Console screenshots. I'll publish regardless of whether the numbers are favorable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Static site search for Astro in 2026: why I picked Pagefind over Algolia and Lunr</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 20 May 2026 22:12:46 +0000</pubDate>
      <link>https://dev.to/morinaga/static-site-search-for-astro-in-2026-why-i-picked-pagefind-over-algolia-and-lunr-pg1</link>
      <guid>https://dev.to/morinaga/static-site-search-for-astro-in-2026-why-i-picked-pagefind-over-algolia-and-lunr-pg1</guid>
      <description>&lt;p&gt;I added search to all three of &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;my AI-curated directory sites&lt;/a&gt; last month. The choice wasn't obvious — there are at least four options with real adoption — so here's the breakdown I actually ran through before landing on &lt;a href="https://pagefind.app/" rel="noopener noreferrer"&gt;Pagefind&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four options I considered
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pagefind&lt;/strong&gt; is a Rust-based static search library. It runs at build time, generates an index in &lt;code&gt;/_pagefind/&lt;/code&gt;, and serves everything as static files. No backend, no API key, no per-query billing. It ships a prebuilt UI (&lt;code&gt;PagefindUI&lt;/code&gt;) that you can mount on any element, and it runs queries in the browser through WebAssembly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algolia DocSearch&lt;/strong&gt; is free for open-source documentation sites, $49/month for commercial sites below a certain crawl limit. It indexes your content via their crawler (or an API push), stores it on Algolia's infrastructure, and gives you a hosted search widget. Fast, polished, and battle-tested — it's what most major docs sites use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lunr.js&lt;/strong&gt; is a client-side search library. You build the index at build time, serialize it to JSON, and ship it with the page. The browser loads the entire index on first search. Works offline, no external dependency, but the index size grows linearly with content, and there's no incremental loading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FlexSearch&lt;/strong&gt; is a newer alternative to Lunr with better performance characteristics and smaller bundle size, but the same core trade-off: you ship the whole index to the browser upfront.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Pagefind won
&lt;/h2&gt;

&lt;p&gt;The decisive factor was index size management. My directories have 500-1,000 entries per site, each with a multi-paragraph generated description. A Lunr index for 1,000 entries would be 2-4MB, all of which the browser has to download before the first search can run. Pagefind shards its index and loads chunks lazily as the user types, so the initial load is under 30KB (the WASM binary + a small manifest), and individual chunk fetches happen on demand.&lt;/p&gt;

&lt;p&gt;The second factor was cost. Algolia DocSearch's commercial tier runs $49/month per site. I'm running three sites on a &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;total infrastructure budget of roughly $25/month&lt;/a&gt;. Pagefind is free.&lt;/p&gt;

&lt;p&gt;The third factor was the deploy model. Because everything in &lt;code&gt;/_pagefind/&lt;/code&gt; is a static file, Cloudflare Pages caches it at the edge with no configuration. There's no API to rate-limit, no service availability to depend on, no API key to rotate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SearchDialog implementation
&lt;/h2&gt;

&lt;p&gt;The search component is a &lt;code&gt;&amp;lt;dialog&amp;gt;&lt;/code&gt; element with a Pagefind UI mounted inside it. I load the &lt;code&gt;pagefind-ui.js&lt;/code&gt; script lazily, only when the dialog is first opened, to keep it off the critical path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadPagefind&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/_pagefind/pagefind-ui.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PagefindUI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PagefindUI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;showSubResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;resetStyles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;p&amp;gt;Search index not available yet (first build). Try again after next deploy.&amp;lt;/p&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;s.onerror&lt;/code&gt; handler is the part most tutorials skip. On the first deploy of a new Cloudflare Pages site, the &lt;code&gt;/_pagefind/&lt;/code&gt; directory doesn't exist yet — Pagefind only runs during the build. If a user opens search before the first full build completes, &lt;code&gt;pagefind-ui.js&lt;/code&gt; 404s. Without the error handler, you get a silent failure. With it, you get a legible message.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;dialog&amp;gt;&lt;/code&gt; element is the right primitive here: opened with &lt;code&gt;showModal()&lt;/code&gt;, it keeps focus inside the dialog, Escape closes it natively, and the &lt;code&gt;::backdrop&lt;/code&gt; CSS pseudo-element gives you the dimmed overlay without JavaScript. The Cmd+K keyboard shortcut is wired with &lt;code&gt;document.addEventListener("keydown", ...)&lt;/code&gt;; no library needed.&lt;/p&gt;
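&lt;p&gt;A minimal sketch of that shortcut wiring. The &lt;code&gt;search-dialog&lt;/code&gt; id and the &lt;code&gt;isSearchShortcut&lt;/code&gt; helper are hypothetical names, and accepting Ctrl+K alongside Cmd+K is an assumption for non-Mac keyboards.&lt;/p&gt;

```javascript
// True for Cmd+K on macOS or Ctrl+K elsewhere.
function isSearchShortcut(event) {
  const modifier = event.metaKey || event.ctrlKey;
  return modifier ? event.key.toLowerCase() === "k" : false;
}

// Browser-only wiring; skipped when there is no DOM (for example in Node).
if (typeof document !== "undefined") {
  // "search-dialog" is a hypothetical id for the dialog element.
  const dialog = document.getElementById("search-dialog");
  document.addEventListener("keydown", (event) => {
    if (!isSearchShortcut(event)) return;
    if (!dialog) return;
    event.preventDefault(); // stop the browser's own binding for this shortcut
    // showModal() opens the dialog as a modal, which is what enables the backdrop.
    if (!dialog.open) dialog.showModal();
  });
}
```

&lt;p&gt;The open handler is also the natural place to call &lt;code&gt;loadPagefind()&lt;/code&gt; from the snippet above, so the script is fetched on first use.&lt;/p&gt;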

&lt;h2&gt;
  
  
  What Pagefind doesn't do
&lt;/h2&gt;

&lt;p&gt;Two gaps I've hit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No query logging.&lt;/strong&gt; Pagefind runs entirely in the browser and doesn't send queries anywhere. For a commercial directory, knowing what users search for is valuable — it tells you which models or games to add, and which compare pages to prioritize. With Algolia you get this for free. With Pagefind you'd need to add a thin logging layer (a fetch POST to an analytics endpoint on each query event). I haven't built this yet.&lt;/p&gt;
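&lt;p&gt;A thin logging layer along those lines might look like the sketch below. Everything named here is hypothetical: the &lt;code&gt;/api/search-log&lt;/code&gt; endpoint, the &lt;code&gt;search-root&lt;/code&gt; id, and the payload shape. It listens for plain DOM &lt;code&gt;input&lt;/code&gt; events bubbling up from the search container, so it makes no assumption about Pagefind's own API.&lt;/p&gt;

```javascript
const MIN_QUERY_LENGTH = 3; // skip one or two letter fragments

// Normalize a raw search term; return null when it is not worth logging.
function buildQueryLog(term, site) {
  const query = term.trim().toLowerCase();
  if (MIN_QUERY_LENGTH > query.length) return null;
  return { site, query, ts: new Date().toISOString() };
}

// Browser-only wiring; "/api/search-log" and "search-root" are hypothetical.
if (typeof document !== "undefined") {
  const root = document.getElementById("search-root");
  let timer;
  root?.addEventListener("input", (event) => {
    clearTimeout(timer);
    // Wait until typing pauses so each keystroke is not logged separately.
    timer = setTimeout(() => {
      const entry = buildQueryLog(event.target.value ?? "", location.hostname);
      if (!entry) return;
      fetch("/api/search-log", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(entry),
        keepalive: true, // let the request finish if the user navigates away
      });
    }, 800);
  });
}
```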

&lt;p&gt;&lt;strong&gt;No fuzzy matching out of the box.&lt;/strong&gt; Pagefind does stemming and basic substring matching, but "stabilty diffusion" (typo) won't match "stable diffusion". Algolia's typo-tolerance is significantly better. For an AI tools directory where model names are long and often misremembered, this matters. I'll probably add a query-suggestion layer that does fuzzy pre-matching before handing off to Pagefind.&lt;/p&gt;
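
&lt;p&gt;A rough sketch of what that pre-matching could look like, using plain Levenshtein edit distance against a list of known model names. The function names and the distance threshold are assumptions; "stabilty diffusion" is 3 edits away from "stable diffusion", so the threshold has to be at least 3 to catch that example:&lt;/p&gt;

```typescript
// Levenshtein edit distance, keeping only one previous row in memory.
function editDistance(a: string, b: string): number {
  const bs = [...b];
  let prev = Array.from({ length: bs.length + 1 }, (_, j) => j);
  for (const [i, ca] of [...a].entries()) {
    const curr = [i + 1];
    for (const [j, cb] of bs.entries()) {
      const cost = ca === cb ? 0 : 1;
      curr.push(Math.min(prev[j + 1] + 1, curr[j] + 1, prev[j] + cost));
    }
    prev = curr;
  }
  return prev[bs.length];
}

// Returns the closest known name within maxDistance edits,
// or the normalized query itself when nothing is close enough.
function suggest(query: string, known: string[], maxDistance = 3): string {
  const q = query.trim().toLowerCase();
  let best = q;
  let bestDist = maxDistance + 1;
  for (const name of known) {
    const d = editDistance(q, name.toLowerCase());
    if (bestDist > d) {
      best = name;
      bestDist = d;
    }
  }
  return best;
}

// The corrected term would then be handed to Pagefind instead of the raw input.
const corrected = suggest("stabilty diffusion", ["whisper", "stable diffusion"]);
```

&lt;p&gt;This is a linear scan, which is fine for a few hundred names; a fixed threshold of 3 is probably too loose for very short names, so scaling it with query length would be a sensible refinement.&lt;/p&gt;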

&lt;h2&gt;
  
  
  Quick comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Pagefind&lt;/th&gt;
&lt;th&gt;Algolia DocSearch&lt;/th&gt;
&lt;th&gt;Lunr.js&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$49/mo (commercial)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index location&lt;/td&gt;
&lt;td&gt;Static files&lt;/td&gt;
&lt;td&gt;Algolia cloud&lt;/td&gt;
&lt;td&gt;Shipped with page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initial JS load&lt;/td&gt;
&lt;td&gt;~30KB&lt;/td&gt;
&lt;td&gt;~80KB&lt;/td&gt;
&lt;td&gt;~10KB + index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index size scalability&lt;/td&gt;
&lt;td&gt;Chunked, lazy&lt;/td&gt;
&lt;td&gt;Server-side&lt;/td&gt;
&lt;td&gt;Linear, upfront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typo tolerance&lt;/td&gt;
&lt;td&gt;Basic stemming&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query logging&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build-time integration&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Crawler / push API&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a static site on a tight infrastructure budget with 500-1,000 entries, Pagefind is the right default. If the site were larger or if I needed typo tolerance and query analytics without building them myself, Algolia would be worth the cost.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>astro</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How I built pairwise AI model compare pages with Claude Haiku and a budget cap</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 20 May 2026 22:12:43 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-pairwise-ai-model-compare-pages-with-claude-haiku-and-a-budget-cap-ia0</link>
      <guid>https://dev.to/morinaga/how-i-built-pairwise-ai-model-compare-pages-with-claude-haiku-and-a-budget-cap-ia0</guid>
      <description>&lt;p&gt;When I added compare pages to the &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;Top AI Tools directory&lt;/a&gt;, the first question I had to answer was: how many pairs am I actually looking at? With roughly 200 models across 8 pipeline tags, the naive upper bound is 200 × 199 / 2 = 19,900 pairs. Generating content for each one with Claude Haiku would cost somewhere around $20 per run: not ruinous, but not something I wanted to run daily without thinking carefully.&lt;/p&gt;

&lt;p&gt;Here's what I actually built, where it falls short, and what I'd do differently if starting over.&lt;/p&gt;

&lt;h2&gt;
  
  
  The combinatorics problem
&lt;/h2&gt;

&lt;p&gt;Model compare pages exist for a specific type of query: "llama 3 vs mistral 7b", "stable diffusion vs sdxl", "whisper vs wav2vec2". These are high-intent queries — the user has already narrowed down to a shortlist and wants a concrete decision nudge. The &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;static SSG approach I'm running&lt;/a&gt; means I need to precompute each compare page at build time, which puts pressure on how many pages I can afford to generate.&lt;/p&gt;

&lt;p&gt;The solution I landed on: group by &lt;code&gt;pipeline_tag&lt;/code&gt;, pair the top 4 models by download count within each group, then cap total pairs with a &lt;code&gt;COMPARE_LIMIT&lt;/code&gt; env var. Within a single pipeline like &lt;code&gt;text-generation&lt;/code&gt;, the top 4 models give 6 pairs (4 choose 2). Across 8 active pipelines that's at most 8 × 6 = 48 pairs, fewer if any pipeline has under 4 models. The env cap of 50 covers all of them and leaves a little room to grow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nx"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;byPipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;downloads&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;downloads&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chosen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
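
&lt;p&gt;The snippet uses &lt;code&gt;MAX&lt;/code&gt; without showing where it comes from. A plausible sketch, assuming it is parsed from &lt;code&gt;COMPARE_LIMIT&lt;/code&gt; with a default of 50; the helper names &lt;code&gt;pairCount&lt;/code&gt; and &lt;code&gt;parseLimit&lt;/code&gt; are made up here, and the same arithmetic reproduces the pair counts quoted above:&lt;/p&gt;

```typescript
// Number of unordered pairs among n items (n choose 2).
function pairCount(n: number): number {
  return (n * (n - 1)) / 2;
}

// Parse a COMPARE_LIMIT-style value; fall back when unset or not a number.
function parseLimit(raw: string | undefined, fallback = 50): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) ? Math.max(0, n) : fallback;
}

const perPipeline = pairCount(4); // pairs among the top 4 of one pipeline
const acrossPipelines = 8 * perPipeline; // upper bound for 8 pipelines
const naiveAllPairs = pairCount(200); // every model against every other

// In the ETL script this would read the real environment:
// const MAX = parseLimit(process.env.COMPARE_LIMIT);
```

&lt;p&gt;One consequence worth knowing: &lt;code&gt;pairs.slice(0, MAX)&lt;/code&gt; truncates in &lt;code&gt;Map&lt;/code&gt; insertion order, so if the cap ever bites, the pipelines that were first seen last lose their pairs first.&lt;/p&gt;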



&lt;p&gt;The pairing happens entirely within pipelines right now, which means I'm covering "llama vs mistral" (both &lt;code&gt;text-generation&lt;/code&gt;) but not "whisper vs gemini-vision" (cross-pipeline). Cross-pipeline comparisons are actually more valuable for users who don't know the landscape yet — that's the next iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pair_slug and idempotent inserts
&lt;/h2&gt;

&lt;p&gt;The slug for each compare pair is constructed deterministically: sort the two model slugs alphabetically, join with &lt;code&gt;--vs--&lt;/code&gt;. So whether the ETL processes &lt;code&gt;(llama-3, mistral-7b)&lt;/code&gt; or &lt;code&gt;(mistral-7b, llama-3)&lt;/code&gt;, the slug is always &lt;code&gt;llama-3--vs--mistral-7b&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pairSlug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;--vs--&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the entire ETL idempotent. The script runs every night. If all pairs already exist in the DB, it exits in a couple of seconds without a single Claude call. I check before inserting rather than using &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; at the SQL level — the explicit check lets me count skipped vs generated in the same run, which I log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[compare] done — generated: 3, skipped: 47
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for monitoring. A run that generates 0 and skips 50 is healthy. A run that generates 0 and skips 0 (nothing in DB, nothing processed) would indicate a bug.&lt;/p&gt;
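
&lt;p&gt;The check-then-insert bookkeeping can be sketched like this, with an in-memory list standing in for the DB lookup. &lt;code&gt;planRun&lt;/code&gt; and &lt;code&gt;looksBroken&lt;/code&gt; are hypothetical names, and the real script presumably queries libSQL instead:&lt;/p&gt;

```typescript
interface RunPlan {
  toGenerate: string[];
  generated: number;
  skipped: number;
}

// Split candidate pair slugs into "already in the DB" and "needs a Claude call".
function planRun(candidateSlugs: string[], existingSlugs: string[]): RunPlan {
  const existing = new Set(existingSlugs);
  const toGenerate = candidateSlugs.filter((slug) => !existing.has(slug));
  return {
    toGenerate,
    generated: toGenerate.length,
    skipped: candidateSlugs.length - toGenerate.length,
  };
}

// A run that neither generated nor skipped anything processed no pairs at all.
function looksBroken(plan: RunPlan): boolean {
  return plan.generated + plan.skipped === 0;
}

const plan = planRun(
  ["llama-3--vs--mistral-7b", "sdxl--vs--stable-diffusion"],
  ["llama-3--vs--mistral-7b"],
);
```

&lt;p&gt;Here &lt;code&gt;plan&lt;/code&gt; reports one pair to generate and one skipped; an empty candidate list is the case &lt;code&gt;looksBroken&lt;/code&gt; flags.&lt;/p&gt;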

&lt;h2&gt;
  
  
  Claude Haiku with system-prompt caching
&lt;/h2&gt;

&lt;p&gt;I reuse the &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;shared Haiku client I built in week one&lt;/a&gt;, which handles &lt;code&gt;cacheSystem: true&lt;/code&gt; on the system prompt. Since the system prompt (the JSON schema instruction) is identical across all compare calls, the first call primes the cache and subsequent calls read that prefix back at a fraction of the normal input-token price.&lt;/p&gt;
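
&lt;p&gt;For readers without that client: in the Anthropic Messages API, caching the system prompt means sending it as a content block with &lt;code&gt;cache_control&lt;/code&gt; set to &lt;code&gt;ephemeral&lt;/code&gt;. A sketch of the request body such a flag might expand to; &lt;code&gt;buildCompareRequest&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;max_tokens&lt;/code&gt; value is an arbitrary assumption:&lt;/p&gt;

```typescript
// Build a Messages API request body with the system prompt marked cacheable.
// The model id is passed in by the caller rather than hard-coded.
function buildCompareRequest(model: string, systemPrompt: string, userPrompt: string) {
  return {
    model,
    max_tokens: 1024, // assumed budget for a short JSON comparison
    system: [
      {
        type: "text",
        text: systemPrompt,
        // Marks everything up to and including this block as a cacheable prefix.
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userPrompt }],
  };
}
```

&lt;p&gt;Anthropic documents a minimum prefix length below which nothing is cached, so a very short schema instruction may not benefit; the usage block in the response shows whether cache reads are actually happening.&lt;/p&gt;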

&lt;p&gt;The user prompt includes both model names, their authors, pipeline tags, and up to 400 characters of their existing summaries (which come from the earlier content generation step):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Compare these two AI models:
A: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (author: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, pipeline: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)
   Summary: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;(none)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
B: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (author: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, pipeline: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)
   Summary: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;(none)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

Produce the JSON comparison.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Truncating summaries at 400 characters keeps the user prompt lean. Compare pages are about the &lt;em&gt;delta&lt;/em&gt; between two models, not a rehash of each model individually. I already have dedicated model pages for depth; the compare page needs to answer "which one, for what" — that takes maybe 6 sentences total.&lt;/p&gt;

&lt;p&gt;The system prompt requests a JSON object with &lt;code&gt;summary&lt;/code&gt;, &lt;code&gt;differences&lt;/code&gt; (array), &lt;code&gt;similarities&lt;/code&gt; (array), and &lt;code&gt;recommendation&lt;/code&gt;. Keeping the output shape narrow means Haiku rarely wanders off-schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON parsing with a regex fence
&lt;/h2&gt;

&lt;p&gt;Even with tight prompting, Haiku occasionally produces JSON with an explanation preamble: "Here is the comparison:" followed by the actual object. Strict &lt;code&gt;JSON.parse&lt;/code&gt; on the raw output would throw. I extract the outermost &lt;code&gt;{...}&lt;/code&gt; block with a regex before parsing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseCompare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\{[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\}&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each field is validated individually before being accepted. If &lt;code&gt;differences&lt;/code&gt; comes back as a string (occasional Haiku behavior when it conflates the array with a comma-separated list), the page falls back to the template for that field rather than crashing.&lt;/p&gt;

&lt;p&gt;The fallback struct is worth writing carefully. I spent five minutes on mine and it shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CompareData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; and &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; are both &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_tag&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; models. See each entry for specifics.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;See individual model pages for architecture and use cases.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Both are open-source models on HuggingFace.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Pick based on your compute budget and specific task requirements.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A user landing on a fallback-generated compare page gets a technically-true page that directs them to the model pages rather than a blank or error state. The &lt;code&gt;model_used&lt;/code&gt; column in the DB records &lt;code&gt;"fallback-template"&lt;/code&gt; for these rows, which I use to identify candidates for regeneration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage in libSQL and the static JSON dump
&lt;/h2&gt;

&lt;p&gt;Compare data lives in a &lt;code&gt;model_compare&lt;/code&gt; table in &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;Turso libSQL&lt;/a&gt;, with a unique constraint on &lt;code&gt;pair_slug&lt;/code&gt;. After the ETL loop, everything gets dumped to &lt;code&gt;compare.json&lt;/code&gt; for the static build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;all&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`SELECT * FROM model_compare ORDER BY slug_a, slug_b`&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;slug_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug_a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;slug_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug_b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;pair_slug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pair_slug&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="na"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./src/data/compare.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Astro build reads this JSON at build time, generating one static page per pair. No runtime DB calls, no cold starts. The tradeoff is freshness: compare content is up to 24 hours stale. For "llama 3.1 vs llama 3.2", that's fine — the models don't change daily.&lt;/p&gt;

&lt;p&gt;I validate the JSON-LD on compare pages through the &lt;a href="https://dev.to/articles/jsonld-audit-post-deploy-ci"&gt;post-deploy audit CI step&lt;/a&gt; the same way I do for individual model pages. Structured data matters more on comparison queries because those are the exact queries that AI Overviews tend to surface, so getting the schema right is worth the CI overhead.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/astro-slug-pages-unique-after-adsense-scaled-content-abuse"&gt;Astro slug generation&lt;/a&gt; for compare pages uses the &lt;code&gt;pair_slug&lt;/code&gt; directly. The URL pattern is &lt;code&gt;/compare/llama-3--vs--mistral-7b/&lt;/code&gt;, which is ugly but unambiguous — the double-dash separator makes it clear this is a two-part slug rather than a hyphen in a model name.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd change starting over
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Generate cross-pipeline pairs from day one.&lt;/strong&gt; The most useful compare queries aren't "llama 3.1 vs llama 3.2" — users who care about that distinction already know. The interesting queries are cross-category: "should I run inference on a text-generation model or use a RAG pipeline?" I skipped this to stay within the budget cap, but it means I'm missing the long-tail traffic that would actually be differentiated from generic model pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drive pair selection from search query logs.&lt;/strong&gt; Right now I pick pairs by download rank. A better signal would be which pairs users actually search for. Pagefind runs client-side and doesn't log queries to any server, so I'd need a thin logging endpoint — something like a POST to a &lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;-triggered function that appends to a JSONL file. Then the ETL reads the top-N ungenerated pairs from the log. This is a small amount of infrastructure but it would make the pair selection much more demand-driven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Raise the budget cap.&lt;/strong&gt; &lt;code&gt;MAX=50&lt;/code&gt; is conservative. At current Haiku pricing with prompt caching, 500 pairs would cost roughly $0.10 per nightly run. I was cautious when I set the default, but I've watched the billing closely and the actual spend is a fraction of what I modeled. I'll bump this to 200 in the next ETL config update.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/how-i-added-itchio-entries-to-a-steam-only-astro-directory"&gt;itch.io entries pattern I added to the indie-games directory&lt;/a&gt; taught me to plan for the second data source earlier. Compare pages have the same shape: a join between two rows. Getting the abstraction right before you have 500+ rows in the DB is much easier than retrofitting it.&lt;/p&gt;
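
&lt;p&gt;The query-log idea above could look roughly like this on the ETL side. It is a minimal sketch under assumptions: the JSONL format (one object per line with a &lt;code&gt;pair_slug&lt;/code&gt; field) and the name &lt;code&gt;topUngeneratedPairs&lt;/code&gt; are hypothetical, and nothing like this exists in the pipeline yet.&lt;/p&gt;

```typescript
// Hypothetical log format: one JSON object per line, e.g. {"pair_slug":"a--vs--b"}.
// Returns the n most-searched pair slugs that have not been generated yet.
function topUngeneratedPairs(jsonl: string, generated: string[], n: number): string[] {
  // Null-prototype object so slugs can never collide with built-in keys.
  const counts: { [slug: string]: number } = Object.create(null);

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let slug: unknown;
    try {
      slug = JSON.parse(line).pair_slug;
    } catch {
      continue; // skip malformed lines rather than abort the nightly run
    }
    if (typeof slug !== "string") continue;
    if (generated.indexOf(slug) !== -1) continue; // already has a compare row
    counts[slug] = (counts[slug] || 0) + 1;
  }

  // Most-searched first; ties broken alphabetically so the order is stable.
  return Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a] || a.localeCompare(b))
    .slice(0, n);
}
```

&lt;p&gt;Keeping this a pure function over the log text means the ranking can be tested without a DB or a logging endpoint; the ETL would only need to pass in the file contents and the list of existing &lt;code&gt;pair_slug&lt;/code&gt; values.&lt;/p&gt;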

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does the ETL run every night even when no new models are added?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but it's nearly free when nothing is new. The check-before-insert means that on most nights it does at most 50 DB reads and exits in under 3 seconds without touching the Claude API. Console output like &lt;code&gt;generated: 0, skipped: 47&lt;/code&gt; is the signal that everything is up to date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when Claude returns malformed JSON?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parseCompare&lt;/code&gt; catches the error and returns the fallback struct. The row is still written to the DB with &lt;code&gt;model_used = "fallback-template"&lt;/code&gt;, which I can query to find rows worth retrying. In practice, this happens on maybe 2-3% of generations — usually when the two models have very sparse metadata and Haiku doesn't have enough context to produce structured output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the compare.json file get unwieldy as pairs accumulate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At 50 pairs it's roughly 25KB. At 500 pairs it'll be around 250KB — still fine for build-time loading in Astro. If I ever hit 5,000 pairs I'd split the file by &lt;code&gt;pipeline_tag&lt;/code&gt; and lazy-import only the relevant subset for each page. For now, one flat JSON file is simpler and fast enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not compute compare content at request time with an edge function?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cold starts and cost. An edge function hit for each compare page view would add 200-500ms of latency (Haiku inference + DB round trip) and would cost much more per-pageview than the nightly batch approach. The content also doesn't need to be fresher than daily — model capabilities don't shift on an hourly basis. Static precomputation is the right tradeoff here, consistent with &lt;a href="https://dev.to/articles/static-ssg-vs-dynamic-ai-rendering-directory-seo"&gt;the broader bet on static SSG&lt;/a&gt; I'm running on all three sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you handle the case where a model is removed from HuggingFace?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now, I don't. If model &lt;code&gt;foo&lt;/code&gt; is deleted from &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt; but its compare rows are still in the DB, those compare pages will still be generated at build time and served. They'll keep the old data until the model's row in &lt;code&gt;models.json&lt;/code&gt; is removed, which only happens if the model falls out of the top-500 in the nightly fetch. It's a known gap. For now, the risk is low; popular models don't disappear. A more robust system would cross-reference the compare table against the model table and tombstone orphaned pairs.&lt;/p&gt;
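
&lt;p&gt;A rough sketch of that cross-reference, assuming the slugs in &lt;code&gt;models.json&lt;/code&gt; match the &lt;code&gt;slug_a&lt;/code&gt; and &lt;code&gt;slug_b&lt;/code&gt; values in the compare entries. The function name &lt;code&gt;findOrphanedPairs&lt;/code&gt; is hypothetical, and what "tombstone" means in practice (delete, flag, or skip at build time) is left open.&lt;/p&gt;

```typescript
// Subset of the compare entry shape; field names match the compare.json export.
interface CompareEntry {
  slug_a: string;
  slug_b: string;
  pair_slug: string;
}

// Returns every pair that references at least one model
// missing from the current model list.
function findOrphanedPairs(entries: CompareEntry[], liveSlugs: string[]): CompareEntry[] {
  // Build a lookup once so each entry costs two property reads.
  const live: { [slug: string]: boolean } = Object.create(null);
  for (const slug of liveSlugs) {
    live[slug] = true;
  }
  return entries.filter((e) => live[e.slug_a] !== true || live[e.slug_b] !== true);
}
```

&lt;p&gt;The returned list could be logged first and only acted on once the numbers look sane, since a failed nightly fetch would otherwise make every pair look orphaned.&lt;/p&gt;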




&lt;p&gt;Related: &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;How I built a shared Claude Haiku client with system-prompt caching&lt;/a&gt; | &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;Turso libSQL vs Cloudflare D1 for an Astro monorepo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>astro</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Three post-deploy checks I run after every Cloudflare Pages build</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 20 May 2026 22:12:08 +0000</pubDate>
      <link>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-40nh</link>
      <guid>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-40nh</guid>
      <description>&lt;p&gt;After spending two weeks debugging issues that only showed up in production — a &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;sitemap _redirects rule that was blocking my own sitemap-index.xml&lt;/a&gt; and a &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — I added three post-deploy checks to my workflow. They're fast and specific to the failure modes I've actually hit, not a full end-to-end test suite.&lt;/p&gt;

&lt;p&gt;Three sites (aiappdex.com, findindiegame.com, ossfind.com) on Cloudflare Pages with Astro 5 SSG. Here's what I check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 1: Sitemap reachability
&lt;/h2&gt;

&lt;p&gt;The simplest check and the one I should have had from day one. After a Cloudflare Pages deploy, I verify that &lt;code&gt;sitemap-index.xml&lt;/code&gt; is reachable and returning 200 on all three domains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;domain &lt;span class="k"&gt;in &lt;/span&gt;aiappdex.com findindiegame.com ossfind.com&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml → &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: &lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt; sitemap unreachable"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also check &lt;code&gt;sitemap-0.xml&lt;/code&gt; — the actual URL sub-sitemap that &lt;code&gt;@astrojs/sitemap&lt;/code&gt; generates — and assert that it contains at least a minimum expected URL count. For aiappdex.com that threshold is 1,000; if it drops below that after a deploy, the ETL data pipeline probably broke silently.&lt;/p&gt;

&lt;p&gt;The reason this check exists: I had a &lt;code&gt;_redirects&lt;/code&gt; rule rewriting &lt;code&gt;sitemap-index.xml&lt;/code&gt; → &lt;code&gt;sitemap-0.xml&lt;/code&gt; as an emergency workaround that turned out to be wrong. It was live for five days before I found it. The rule was blocking the real &lt;code&gt;sitemap-index.xml&lt;/code&gt; from reaching crawlers while appearing fine in the browser (which followed the redirect). curl doesn't follow redirects unless you pass &lt;code&gt;-L&lt;/code&gt;, so the &lt;code&gt;-o /dev/null -w "%{http_code}"&lt;/code&gt; check reports the redirect's own status code rather than 200 and would have caught this immediately.&lt;/p&gt;
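
&lt;p&gt;The URL-count assertion on &lt;code&gt;sitemap-0.xml&lt;/code&gt; can be sketched as a small pure function. The helper names are hypothetical and the XML text is assumed to have been fetched already; only the 1,000 threshold for aiappdex.com comes from the setup described above.&lt;/p&gt;

```typescript
// Counts URL entries by counting closing loc tags;
// each url entry in a sitemap carries exactly one.
function countSitemapUrls(xml: string): number {
  return xml.split("/loc>").length - 1;
}

// Returns a failure message, or null when the sitemap holds
// at least the minimum number of URLs expected for that domain.
function checkUrlCount(domain: string, xml: string, minUrls: number): string | null {
  const count = countSitemapUrls(xml);
  if (count >= minUrls) {
    return null;
  }
  return `FAIL: ${domain} sitemap-0.xml has ${count} URLs, expected at least ${minUrls}`;
}
```

&lt;p&gt;For aiappdex.com the call would be &lt;code&gt;checkUrlCount("aiappdex.com", xml, 1000)&lt;/code&gt;. Counting tags with a string split avoids pulling in an XML parser for what is only a sanity check.&lt;/p&gt;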

&lt;h2&gt;
  
  
  Check 2: IndexNow batch submission
&lt;/h2&gt;

&lt;p&gt;After every successful sitemap check, I run &lt;code&gt;node scripts/indexnow.mjs&lt;/code&gt;. The script reads the live sitemap XML from each domain, collects all URLs, and POSTs them to the IndexNow endpoint for Bing, Yandex, Naver, and Seznam using site-specific keys.&lt;/p&gt;

&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aiappdex.com: submitted 1179 URLs → 200 OK
findindiegame.com: submitted 139 URLs → 200 OK
ossfind.com: submitted 144 URLs → 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a site returns 403 from IndexNow it usually means the key verification file (&lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;) wasn't deployed correctly or a &lt;code&gt;_redirects&lt;/code&gt; rule is mangling the path. Catching this right after deploy matters because the IndexNow key-verification window isn't instantaneous — letting it sit in a broken state delays indexing. I wrote more about the IndexNow setup in &lt;a href="https://dev.to/morinaga/indexnow-libsql-and-three-other-tools-i-reached-for-this-week-5c4m"&gt;this week's tools post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I run this manually after deploy rather than inline in the GitHub Actions workflow because the Cloudflare Pages build takes 2-3 minutes, and IndexNow works best with live URLs. Running it as a separate &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger after the deployment succeeds means I'm submitting URLs that are actually live rather than ones that might still be deploying.&lt;/p&gt;
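
&lt;p&gt;For illustration, here is how the request bodies for the submission step might be assembled. This is a sketch, not the contents of &lt;code&gt;scripts/indexnow.mjs&lt;/code&gt;: the function name is hypothetical, the field names follow the public IndexNow JSON format, and the key file is assumed to live at &lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt; as described above.&lt;/p&gt;

```typescript
// Request body for an IndexNow bulk submission.
interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// IndexNow accepts up to 10,000 URLs per POST.
const MAX_URLS_PER_POST = 10000;

// Splits a site's URLs into one or more payloads, dropping any URL
// that does not belong to the submitting host.
function buildIndexNowPayloads(host: string, key: string, urls: string[]): IndexNowPayload[] {
  const prefix = `https://${host}/`;
  let rest = urls.filter((u) => u.indexOf(prefix) === 0);
  const payloads: IndexNowPayload[] = [];
  while (rest.length !== 0) {
    payloads.push({
      host,
      key,
      keyLocation: `https://${host}/${key}.txt`,
      urlList: rest.slice(0, MAX_URLS_PER_POST),
    });
    rest = rest.slice(MAX_URLS_PER_POST);
  }
  return payloads;
}
```

&lt;p&gt;Each payload would then be sent as a JSON POST to an IndexNow endpoint such as &lt;code&gt;https://api.indexnow.org/indexnow&lt;/code&gt;. At the current sizes (1,179 URLs at most) every site fits in a single request, so the batching only matters if a sitemap grows past the limit.&lt;/p&gt;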

&lt;h2&gt;
  
  
  Check 3: Weekly Lighthouse spot-check
&lt;/h2&gt;

&lt;p&gt;The third check runs on a cron — Monday 04:30 UTC — not after every deploy. It's slower (3-4 minutes per site, nine URLs total), so daily would be wasteful for a static site that doesn't change at runtime.&lt;/p&gt;

&lt;p&gt;The workflow uses &lt;code&gt;treosh/lighthouse-ci-action&lt;/code&gt; with one homepage and one deep entry page per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;aiappdex.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/models/timm-vit-base-patch16-clip-224-openai/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;findindiegame.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/games/dredge-1562430/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ossfind.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/alternatives/ghost/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm watching for Performance below 80, CLS above 0.1, or accessibility score regression. Astro SSG with no client-side JS should hold steady on all three — if they slip it means something in Tailwind v4 config or the ad slot component changed the layout paint behavior. The results upload to &lt;code&gt;temporaryPublicStorage&lt;/code&gt; so I can diff before/after on regressions.&lt;/p&gt;

&lt;p&gt;I don't set hard failure thresholds that block deploys. These sites are pre-revenue with essentially zero traffic right now; blocking a deploy because a Lighthouse score dropped from 94 to 88 would be disproportionate. I treat Lighthouse as a trend monitor, not a gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm deliberately not checking
&lt;/h2&gt;

&lt;p&gt;No uptime monitoring — I'm relying on Cloudflare's own infrastructure status. No end-to-end user flow tests. No API availability checks — the Turso DB is only queried at build time in SSG mode, so there's nothing to check at runtime.&lt;/p&gt;

&lt;p&gt;For a dynamically rendered site, those gaps would matter. For a static CDN deployment where the entire runtime is pre-built HTML, CSS, and a handful of JSON files, the three checks above cover the actual failure surface I've encountered.&lt;/p&gt;

&lt;p&gt;The publish pipeline has its own idempotency layer (it reads &lt;code&gt;published_urls&lt;/code&gt; from article frontmatter and skips already-distributed posts), so I don't need to verify cross-posting state after each deploy. That's a separate concern.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>githubactions</category>
      <category>astro</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 20 May 2026 22:11:59 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-4f7d</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-4f7d</guid>
      <description>&lt;p&gt;The obvious counterargument to everything I'm building is this: Google already does it. You type "best AI tools for video editing" into Google and an AI Overview surfaces a curated list, synthesized from the same kind of data I maintain, without requiring a click. My three directory sites — &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;, &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;, and &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — are competing with a feature baked into the world's dominant search engine.&lt;/p&gt;

&lt;p&gt;I launched these sites on 2026-04-23, built on &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;an architecture that runs at about $25/month&lt;/a&gt;. Traffic is essentially zero — the sites have been indexed for three weeks and organic crawling takes time. The question I keep returning to isn't whether Google will eventually index my pages. It's whether anyone will prefer clicking through to my site over reading the AI Overview box that already answered the same question.&lt;/p&gt;

&lt;p&gt;Here's my honest, falsifiable position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By October 2026 — six months post-launch — at least one of the three sites will show organic click trends in Google Search Console indicating real query traffic to specific comparison or filtered-browse pages. I define that as: at least 200 non-homepage organic clicks per month, sustained for two consecutive months, from queries I didn't directly drive through social or newsletter posts.&lt;/p&gt;

&lt;p&gt;If that doesn't happen, I'll publish the Search Console screenshots and write a post explaining what I got wrong. I'm committing to that here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;AI Overviews have gotten genuinely good at list-and-compare synthesis. If you search "open source alternative to Notion" today, Google often returns a four-item structured list with one-sentence descriptions directly in the Overview box. My Open Alternative To site covers that territory. The AI Overview absorbs the zero-click version of that query.&lt;/p&gt;

&lt;p&gt;The optimistic response is: "my site appears as a citation source." The pessimistic response is: "Google consumes your signal and stops sending clicks." The pessimistic version has supporting evidence — industry-wide CTR on informational queries dropped measurably as AI Overviews expanded through 2025, and the trend hasn't reversed.&lt;/p&gt;

&lt;p&gt;I don't think the pessimistic version is the whole story, but I'm not dismissing it. The most dangerous move is to assume the counterargument is wrong without designing around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI Overviews have structural blind spots
&lt;/h2&gt;

&lt;p&gt;AI Overviews are strong at synthesizing "what exists." They're weaker at three things I've deliberately built for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute-based filtering.&lt;/strong&gt; If someone wants "open source Notion alternatives that work offline and have a mobile app," AI Overviews give hedged prose answers because they're synthesizing text, not querying structured fields. My Turso DB has &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; as typed columns. Faceted filtering on those fields is something a browseable directory does better than a language model writing a paragraph about the general landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial negative-space.&lt;/strong&gt; My game recommender &lt;a href="https://dev.to/morinaga/adding-avoid-if-caveats-to-my-ai-game-recommender-what-changed-hk3"&gt;includes "avoid if" caveats&lt;/a&gt; — structured fields that answer "who should skip this?" generated by a Claude Haiku prompt that specifically forces a critical answer. AI Overviews don't have a mechanism to surface structured negatives. They default to positive framing, which means someone with a specific disqualifying requirement gets an unhelpful answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Freshness on maintenance status.&lt;/strong&gt; The ETL that populates the AI tools directory pulls GitHub commit activity weekly. A tool that hasn't been touched in 14 months is marked as low activity. AI Overviews don't distinguish between a tool actively maintained in 2026 and one that peaked in 2024 — they rely on the recency of web mentions, which can lag by months after a project goes dormant.&lt;/p&gt;

&lt;p&gt;None of these defenses are permanent. Google could build structured attribute filtering into AI Overviews. But they require deliberate pipeline design, not just synthesis, and the gap exists now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The downstream click thesis
&lt;/h2&gt;

&lt;p&gt;Even if my sites lose the zero-click battle on broad discovery terms, there's a second query type I'm explicitly targeting: the downstream comparison query.&lt;/p&gt;

&lt;p&gt;The sequence: someone types "Notion alternatives" into Google, gets an AI Overview naming four tools, then types "Appflowy vs Anytype performance" to compare the two they're considering. That second query is post-AI-Overview research. It has commercial intent. It wants a verdict, not another list.&lt;/p&gt;

&lt;p&gt;For that query, a page with structured attribute comparison, a clear verdict, and fast load time competes directly with another AI-style answer — and structured data beats generative prose for "which one wins on attribute X." This is partly why &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;I chose static SSG over dynamic AI rendering&lt;/a&gt; for these sites: a fast, indexable page with typed comparison fields is what a second-stage research click needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query type&lt;/th&gt;
&lt;th&gt;AI Overview strength&lt;/th&gt;
&lt;th&gt;Directory strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery ("best tools for X")&lt;/td&gt;
&lt;td&gt;High — often answers directly&lt;/td&gt;
&lt;td&gt;Low for zero-click intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison ("X vs Y, which wins")&lt;/td&gt;
&lt;td&gt;Medium — hedges, rarely commits&lt;/td&gt;
&lt;td&gt;High — structured attrs + verdict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtered browse ("offline + mobile app")&lt;/td&gt;
&lt;td&gt;Low — prose, no filters&lt;/td&gt;
&lt;td&gt;High — faceted structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness ("is X still maintained?")&lt;/td&gt;
&lt;td&gt;Inconsistent — lags commits&lt;/td&gt;
&lt;td&gt;High — weekly ETL refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison and filtered-browse rows are the ones that actually carry this bet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the cost structure matters for intellectual honesty
&lt;/h2&gt;

&lt;p&gt;At $25/month, I can run this experiment for a year without needing revenue to justify continuing. I'm not under pressure to interpret ambiguous signals optimistically.&lt;/p&gt;

&lt;p&gt;Compare that to a project burning $200/month on infrastructure: you'd rationalize flat Search Console data as "still in the sandbox phase" past the point where the data actually says something. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;full cost breakdown&lt;/a&gt; is genuinely minimal — Vercel Pro at $20, Turso starter at $0, Claude Haiku API in single-digit dollars for monthly ETL runs, GitHub Actions on free minutes.&lt;/p&gt;

&lt;p&gt;I won't claim AdSense is approved or revenue is flowing until it is. Right now, &lt;a href="https://dev.to/morinaga/why-google-adsense-will-not-approve-a-vercelapp-site-110b"&gt;AdSense rejected the *.vercel.app version&lt;/a&gt; of the sites. I've moved to custom domains and &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns"&gt;verified them in Search Console&lt;/a&gt;. I'm waiting for real crawl data before making any claims about what's working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three outcomes would tell me the bet is wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impressions but near-zero clicks at 90 days.&lt;/strong&gt; If Search Console shows my pages appearing as AI Overview citation sources but click rates stay near zero on comparison pages specifically, Google is extracting my signal without distributing traffic. That's the worst-case scenario — I'd need to rethink the format entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdSense keeps rejecting after genuine depth improvements.&lt;/strong&gt; The original rejection was partly a *.vercel.app domain issue, but if Google's classifier still rates the pages as thin after I've rebuilt with real structured content and specific editorial attributes, my model of what "quality" means to the classifier is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison queries migrate fully to LLM chat.&lt;/strong&gt; If people stop typing "X vs Y" into Google and start asking ChatGPT directly, the downstream click I'm betting on disappears. I don't see evidence of this happening at scale for research involving specific attribute constraints — but I'm monitoring query volume patterns month-over-month.&lt;/p&gt;

&lt;p&gt;The first outcome is the one I'd want to see early. Impressions with near-zero clicks on comparison pages by month 3 would tell me to pivot the format immediately rather than wait six months for a conclusion I could have reached sooner.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why three sites instead of one authority site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three narrow sites let me test three different intent types simultaneously. Games-like, AI tools, and OSS alternatives attract different queries and different audiences. One site would take longer to produce the same signal volume about which format works. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;original architecture post&lt;/a&gt; covers the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Haiku generate the structured editorial fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each ETL run sends entries through a &lt;a href="https://dev.to/morinaga/how-i-built-a-shared-claude-haiku-client-with-system-prompt-caching-for-batch-etl-1ddp"&gt;shared Claude Haiku client&lt;/a&gt; that uses system-prompt caching to amortize the cost across batch runs. The prompts are tuned to force specific attribute outputs — avoid-if caveats, audience fit, freshness status — not open-ended descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if one site works and two don't?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful outcome, not a failure. The format that works tells me something specific about the intent type. I'll invest in what works and document what didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where will you publish the October 2026 verdict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On this blog, with raw Search Console screenshots. I'll publish regardless of whether the numbers are favorable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Three post-deploy checks I run after every Cloudflare Pages build</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 19 May 2026 22:12:50 +0000</pubDate>
      <link>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-f8i</link>
      <guid>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-f8i</guid>
      <description>&lt;p&gt;After spending two weeks debugging issues that only showed up in production — a &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;sitemap _redirects rule that was blocking my own sitemap-index.xml&lt;/a&gt; and a &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — I added three post-deploy checks to my workflow. They're fast and specific to the failure modes I've actually hit, not a full end-to-end test suite.&lt;/p&gt;

&lt;p&gt;Three sites (aiappdex.com, findindiegame.com, ossfind.com) on Cloudflare Pages with Astro 5 SSG. Here's what I check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 1: Sitemap reachability
&lt;/h2&gt;

&lt;p&gt;This is the simplest check, and the one I should have had from day one. After a Cloudflare Pages deploy, I verify that &lt;code&gt;sitemap-index.xml&lt;/code&gt; is reachable and returns 200 on all three domains:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;domain &lt;span class="k"&gt;in &lt;/span&gt;aiappdex.com findindiegame.com ossfind.com&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml → &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: &lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt; sitemap unreachable"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also check &lt;code&gt;sitemap-0.xml&lt;/code&gt; — the actual URL sub-sitemap that &lt;code&gt;@astrojs/sitemap&lt;/code&gt; generates — and assert that it contains at least a minimum expected URL count. For aiappdex.com that threshold is 1,000; if it drops below that after a deploy, the ETL data pipeline probably broke silently.&lt;/p&gt;
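
&lt;p&gt;A minimal sketch of that count assertion, assuming Node 18+ with a global &lt;code&gt;fetch&lt;/code&gt;. The function names are hypothetical, and only the 1,000 floor for aiappdex.com comes from the text above; floors for the other two domains would have to be chosen separately.&lt;/p&gt;

```javascript
// Hypothetical helper: counts URL entries in a sitemap document.
// Every entry closes its loc element exactly once, so counting the
// "/loc>" suffix of that closing tag counts the URLs.
function countSitemapUrls(xml) {
  return xml.split("/loc>").length - 1;
}

// Returns the domains whose URL count is below their configured floor.
// A domain with no recorded count is treated as having zero URLs.
function findThinSitemaps(counts, minimums) {
  return Object.keys(minimums).filter(
    (domain) => minimums[domain] > (counts[domain] ?? 0)
  );
}

// Fetches sitemap-0.xml for each domain without following redirects,
// so a redirected or missing sitemap is recorded as zero URLs and flagged.
async function checkSitemapCounts(minimums) {
  const counts = {};
  for (const domain of Object.keys(minimums)) {
    const res = await fetch(`https://${domain}/sitemap-0.xml`, { redirect: "manual" });
    counts[domain] = res.status === 200 ? countSitemapUrls(await res.text()) : 0;
  }
  return findThinSitemaps(counts, minimums);
}
```

&lt;p&gt;Calling &lt;code&gt;checkSitemapCounts({ "aiappdex.com": 1000 })&lt;/code&gt; resolves to an empty array when the sitemap is healthy and to a list of offending domains otherwise. Keeping the counting and comparison in pure functions means they can be tested without touching the network.&lt;/p&gt;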

&lt;p&gt;The reason this check exists: I had a &lt;code&gt;_redirects&lt;/code&gt; rule redirecting &lt;code&gt;sitemap-index.xml&lt;/code&gt; → &lt;code&gt;sitemap-0.xml&lt;/code&gt; as an emergency workaround that turned out to be wrong. It was live for five days before I found it. The rule kept the real &lt;code&gt;sitemap-index.xml&lt;/code&gt; from reaching crawlers while everything looked fine in the browser, which followed the redirect silently. curl doesn't follow redirects unless you pass &lt;code&gt;-L&lt;/code&gt;, so &lt;code&gt;curl -s -o /dev/null -w "%{http_code}"&lt;/code&gt; would have reported a redirect status instead of 200 and caught this immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 2: IndexNow batch submission
&lt;/h2&gt;

&lt;p&gt;After every successful sitemap check, I run &lt;code&gt;node scripts/indexnow.mjs&lt;/code&gt;. The script reads the live sitemap XML from each domain, collects all URLs, and POSTs them to the IndexNow endpoint for Bing, Yandex, Naver, and Seznam using site-specific keys.&lt;/p&gt;
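
&lt;p&gt;For illustration, here is roughly what one submission could look like. This is a sketch, not the actual contents of &lt;code&gt;scripts/indexnow.mjs&lt;/code&gt;: the function names are hypothetical, and posting to the shared &lt;code&gt;api.indexnow.org&lt;/code&gt; endpoint is an assumption. The payload fields (&lt;code&gt;host&lt;/code&gt;, &lt;code&gt;key&lt;/code&gt;, &lt;code&gt;keyLocation&lt;/code&gt;, &lt;code&gt;urlList&lt;/code&gt;) are the ones the IndexNow protocol defines.&lt;/p&gt;

```javascript
// Builds the JSON body the IndexNow protocol expects for one host.
// URLs on other hosts are dropped, because a submission should only
// contain URLs that belong to the host owning the key.
function buildIndexNowPayload(host, key, urls) {
  return {
    host,
    key,
    keyLocation: `https://${host}/${key}.txt`,
    urlList: urls.filter((url) => new URL(url).hostname === host),
  };
}

// POSTs one payload and returns the HTTP status code.
// Assumption: the shared api.indexnow.org endpoint is used.
async function submitToIndexNow(payload) {
  const res = await fetch("https://api.indexnow.org/indexnow", {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(payload),
  });
  return res.status;
}
```

&lt;p&gt;Separating payload construction from the network call keeps the per-site key handling easy to test before anything is submitted.&lt;/p&gt;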

&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aiappdex.com: submitted 1179 URLs → 200 OK
findindiegame.com: submitted 139 URLs → 200 OK
ossfind.com: submitted 144 URLs → 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a site returns 403 from IndexNow it usually means the key verification file (&lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;) wasn't deployed correctly or a &lt;code&gt;_redirects&lt;/code&gt; rule is mangling the path. Catching this right after deploy matters because the IndexNow key-verification window isn't instantaneous — letting it sit in a broken state delays indexing. I wrote more about the IndexNow setup in &lt;a href="https://dev.to/morinaga/indexnow-libsql-and-three-other-tools-i-reached-for-this-week-5c4m"&gt;this week's tools post&lt;/a&gt;.&lt;/p&gt;
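
&lt;p&gt;A small sketch of how the key file could be checked directly when a 403 shows up, assuming Node 18+ and hypothetical function names. It relies only on the IndexNow requirement that the key file's body is the key itself.&lt;/p&gt;

```javascript
// Pure check: given the status and body served at the key file URL,
// explain what is wrong, or return "ok".
function diagnoseKeyFile(status, body, key) {
  if (status !== 200) return `key file returned HTTP ${status}`;
  if (body.trim() !== key) return "key file body does not match the key";
  return "ok";
}

// Fetches the key file without following redirects, so a _redirects
// rule that rewrites the path shows up as a non-200 status.
async function checkKeyFile(host, key) {
  const res = await fetch(`https://${host}/${key}.txt`, { redirect: "manual" });
  return diagnoseKeyFile(res.status, await res.text(), key);
}
```

&lt;p&gt;Anything other than &lt;code&gt;"ok"&lt;/code&gt; points at the deploy or the redirect rules rather than at IndexNow itself.&lt;/p&gt;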

&lt;p&gt;I run this manually after deploy rather than inline in the GitHub Actions workflow because the Cloudflare Pages build takes 2-3 minutes, and IndexNow works best with live URLs. Running it as a separate &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger after the deployment succeeds means I'm submitting URLs that are actually live rather than ones that might still be deploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 3: Weekly Lighthouse spot-check
&lt;/h2&gt;

&lt;p&gt;The third check runs on a cron — Monday 04:30 UTC — not after every deploy. It's slower (3-4 minutes per site, nine URLs total), so daily would be wasteful for a static site that doesn't change at runtime.&lt;/p&gt;

&lt;p&gt;The workflow uses &lt;code&gt;treosh/lighthouse-ci-action&lt;/code&gt; with one homepage and one deep entry page per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;aiappdex.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/models/timm-vit-base-patch16-clip-224-openai/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;findindiegame.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/games/dredge-1562430/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ossfind.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/alternatives/ghost/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm watching for Performance below 80, CLS above 0.1, or accessibility score regression. Astro SSG with no client-side JS should hold steady on all three — if they slip it means something in Tailwind v4 config or the ad slot component changed the layout paint behavior. The results upload to &lt;code&gt;temporaryPublicStorage&lt;/code&gt; so I can diff before/after on regressions.&lt;/p&gt;

&lt;p&gt;I don't set hard failure thresholds that block deploys. These sites are pre-revenue with essentially zero traffic right now; blocking a deploy because a Lighthouse score dropped from 94 to 88 would be disproportionate. I treat Lighthouse as a trend monitor, not a gate.&lt;/p&gt;
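
&lt;p&gt;The thresholds I watch could be expressed as a small comparison like the one below. It is a sketch under assumptions: the input shape (&lt;code&gt;performance&lt;/code&gt;, &lt;code&gt;cls&lt;/code&gt;, &lt;code&gt;accessibility&lt;/code&gt;) is hypothetical and would have to be extracted from the Lighthouse report first. It returns warnings instead of throwing, in line with using Lighthouse as a trend monitor.&lt;/p&gt;

```javascript
// Compares one Lighthouse run against the previous run for the same URL.
// Scores use the 0-100 scale shown in the report; cls is the raw
// Cumulative Layout Shift value.
function flagLighthouseIssues(current, previous) {
  const issues = [];
  if (80 > current.performance) issues.push("performance below 80");
  if (current.cls > 0.1) issues.push("CLS above 0.1");
  if (previous.accessibility > current.accessibility) {
    issues.push("accessibility regressed");
  }
  return issues;
}
```

&lt;p&gt;A drop from 94 to 88 in Performance produces no warning here, since 88 is still above the floor of 80.&lt;/p&gt;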

&lt;h2&gt;
  
  
  What I'm deliberately not checking
&lt;/h2&gt;

&lt;p&gt;No uptime monitoring — I'm relying on Cloudflare's own infrastructure status. No end-to-end user flow tests. No API availability checks — the Turso DB is only queried at build time in SSG mode, so there's nothing to check at runtime.&lt;/p&gt;

&lt;p&gt;For a dynamically rendered site, those gaps would matter. For a static CDN deployment where the entire runtime is pre-built HTML, CSS, and a handful of JSON files, the three checks above cover the actual failure surface I've encountered.&lt;/p&gt;

&lt;p&gt;The publish pipeline has its own idempotency layer (it reads &lt;code&gt;published_urls&lt;/code&gt; from article frontmatter and skips already-distributed posts), so I don't need to verify cross-posting state after each deploy. That's a separate concern.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>githubactions</category>
      <category>astro</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Tue, 19 May 2026 22:12:48 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-5g3i</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-5g3i</guid>
      <description>&lt;p&gt;The obvious counterargument to everything I'm building is this: Google already does it. You type "best AI tools for video editing" into Google and an AI Overview surfaces a curated list, synthesized from the same kind of data I maintain, without requiring a click. My three directory sites — &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;, &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;, and &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — are competing with a feature baked into the world's dominant search engine.&lt;/p&gt;

&lt;p&gt;I launched these sites on 2026-04-23, built on &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;an architecture that runs at about $25/month&lt;/a&gt;. Traffic is essentially zero — the sites have been indexed for three weeks and organic crawling takes time. The question I keep returning to isn't whether Google will eventually index my pages. It's whether anyone will prefer clicking through to my site over reading the AI Overview box that already answered the same question.&lt;/p&gt;

&lt;p&gt;Here's my honest, falsifiable position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By October 2026 — six months post-launch — at least one of the three sites will show organic click trends in Google Search Console indicating real query traffic to specific comparison or filtered-browse pages. I define that as: at least 200 non-homepage organic clicks per month, sustained for two consecutive months, from queries I didn't directly drive through social or newsletter posts.&lt;/p&gt;
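
&lt;p&gt;To keep myself honest, the pass condition can be written down as code. This is a sketch with a hypothetical function name; the input is assumed to be one number per month, oldest first, already limited to non-homepage organic clicks.&lt;/p&gt;

```javascript
// Returns true when at least `requiredStreak` consecutive months
// each reach the click threshold.
function betIsMet(monthlyClicks, threshold = 200, requiredStreak = 2) {
  let streak = 0;
  for (const clicks of monthlyClicks) {
    // A month below the threshold resets the streak.
    streak = clicks >= threshold ? streak + 1 : 0;
    if (streak >= requiredStreak) return true;
  }
  return false;
}
```

&lt;p&gt;With made-up numbers, &lt;code&gt;betIsMet([210, 190, 230, 150])&lt;/code&gt; is false because the two qualifying months are not adjacent, while &lt;code&gt;betIsMet([50, 210, 190, 230, 260])&lt;/code&gt; is true.&lt;/p&gt;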

&lt;p&gt;If that doesn't happen, I'll publish the Search Console screenshots and write a post explaining what I got wrong. I'm committing to that here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;AI Overviews have gotten genuinely good at list-and-compare synthesis. If you search "open source alternative to Notion" today, Google often returns a four-item structured list with one-sentence descriptions directly in the Overview box. My Open Alternative To site covers that territory. The AI Overview absorbs the zero-click version of that query.&lt;/p&gt;

&lt;p&gt;The optimistic response is: "my site appears as a citation source." The pessimistic response is: "Google consumes your signal and stops sending clicks." The pessimistic version has supporting evidence — industry-wide CTR on informational queries dropped measurably as AI Overviews expanded through 2025, and the trend hasn't reversed.&lt;/p&gt;

&lt;p&gt;I don't think the pessimistic version is the whole story, but I'm not dismissing it. The most dangerous move is to assume the counterargument is wrong without designing around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI Overviews have structural blind spots
&lt;/h2&gt;

&lt;p&gt;AI Overviews are strong at synthesizing "what exists." They're weaker at three things I've deliberately built for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute-based filtering.&lt;/strong&gt; If someone wants "open source Notion alternatives that work offline and have a mobile app," AI Overviews give hedged prose answers because they're synthesizing text, not querying structured fields. My Turso DB has &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; as typed columns. Faceted filtering on those fields is something a browseable directory does better than a language model writing a paragraph about the general landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial negative-space.&lt;/strong&gt; My game recommender &lt;a href="https://dev.to/morinaga/adding-avoid-if-caveats-to-my-ai-game-recommender-what-changed-hk3"&gt;includes "avoid if" caveats&lt;/a&gt; — structured fields that answer "who should skip this?" generated by a Claude Haiku prompt that specifically forces a critical answer. AI Overviews don't have a mechanism to surface structured negatives. They default to positive framing, which means someone with a specific disqualifying requirement gets an unhelpful answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Freshness on maintenance status.&lt;/strong&gt; The ETL that populates the AI tools directory pulls GitHub commit activity weekly. A tool that hasn't been touched in 14 months is marked as low activity. AI Overviews don't distinguish between a tool actively maintained in 2026 and one that peaked in 2024 — they rely on the recency of web mentions, which can lag by months after a project goes dormant.&lt;/p&gt;

&lt;p&gt;None of these defenses are permanent. Google could build structured attribute filtering into AI Overviews. But they require deliberate pipeline design, not just synthesis, and the gap exists now.&lt;/p&gt;
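
&lt;p&gt;The attribute filter and the freshness flag described above can be sketched in a few lines. The column names &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; come from my schema; the function names, the sample rows in the note below, and the 12-month default are assumptions for illustration only.&lt;/p&gt;

```javascript
// Keeps entries whose typed columns match every required value.
function filterEntries(entries, required) {
  return entries.filter((entry) =>
    Object.keys(required).every((field) => entry[field] === required[field])
  );
}

// True when the last commit is at least thresholdMonths old.
// The 12-month default is an assumption, not the real pipeline setting.
function isLowActivity(lastCommitDate, now, thresholdMonths = 12) {
  const msPerMonth = 1000 * 60 * 60 * 24 * 30.44; // average month length
  const elapsedMs = now.getTime() - new Date(lastCommitDate).getTime();
  return elapsedMs / msPerMonth >= thresholdMonths;
}
```

&lt;p&gt;For example, &lt;code&gt;filterEntries(rows, { works_offline: true, has_mobile_app: true })&lt;/code&gt; returns only rows where both columns are true, which is the kind of exact answer a prose summary tends to hedge on.&lt;/p&gt;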

&lt;h2&gt;
  
  
  The downstream click thesis
&lt;/h2&gt;

&lt;p&gt;Even if my sites lose the zero-click battle on broad discovery terms, there's a second query type I'm explicitly targeting: the downstream comparison query.&lt;/p&gt;

&lt;p&gt;The sequence: someone types "Notion alternatives" into Google, gets an AI Overview naming four tools, then types "AppFlowy vs Anytype performance" to compare the two they're considering. That second query is post-AI-Overview research. It has commercial intent. It wants a verdict, not another list.&lt;/p&gt;

&lt;p&gt;For that query, a page with structured attribute comparison, a clear verdict, and fast load time competes directly with another AI-style answer — and structured data beats generative prose for "which one wins on attribute X." This is partly why &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;I chose static SSG over dynamic AI rendering&lt;/a&gt; for these sites: a fast, indexable page with typed comparison fields is what a second-stage research click needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query type&lt;/th&gt;
&lt;th&gt;AI Overview strength&lt;/th&gt;
&lt;th&gt;Directory strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery ("best tools for X")&lt;/td&gt;
&lt;td&gt;High — often answers directly&lt;/td&gt;
&lt;td&gt;Low for zero-click intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison ("X vs Y, which wins")&lt;/td&gt;
&lt;td&gt;Medium — hedges, rarely commits&lt;/td&gt;
&lt;td&gt;High — structured attrs + verdict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtered browse ("offline + mobile app")&lt;/td&gt;
&lt;td&gt;Low — prose, no filters&lt;/td&gt;
&lt;td&gt;High — faceted structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness ("is X still maintained?")&lt;/td&gt;
&lt;td&gt;Inconsistent — lags commits&lt;/td&gt;
&lt;td&gt;High — weekly ETL refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison and filtered-browse rows are the load-bearing parts of this bet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the cost structure matters for intellectual honesty
&lt;/h2&gt;

&lt;p&gt;At $25/month, I can run this experiment for a year without needing revenue to justify continuing. I'm not under pressure to interpret ambiguous signals optimistically.&lt;/p&gt;

&lt;p&gt;Compare that to a project burning $200/month on infrastructure: you'd rationalize flat Search Console data as "still in the sandbox phase" past the point where the data actually says something. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;full cost breakdown&lt;/a&gt; is genuinely minimal — Vercel Pro at $20, Turso starter at $0, Claude Haiku API in single-digit dollars for monthly ETL runs, GitHub Actions on free minutes.&lt;/p&gt;

&lt;p&gt;I won't claim AdSense is approved or revenue is flowing until it is. Right now, &lt;a href="https://dev.to/morinaga/why-google-adsense-will-not-approve-a-vercelapp-site-110b"&gt;AdSense rejected the *.vercel.app version&lt;/a&gt; of the sites. I've moved to custom domains and &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns"&gt;verified them in Search Console&lt;/a&gt;. I'm waiting for real crawl data before making any claims about what's working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three outcomes would tell me the bet is wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impressions but near-zero clicks at 90 days.&lt;/strong&gt; If Search Console shows my pages appearing as AI Overview citation sources but click rates stay near zero on comparison pages specifically, Google is extracting my signal without distributing traffic. That's the worst-case scenario — I'd need to rethink the format entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdSense keeps rejecting after genuine depth improvements.&lt;/strong&gt; The original rejection was partly a *.vercel.app domain issue, but if Google's classifier still rates the pages as thin after I've rebuilt with real structured content and specific editorial attributes, my model of what "quality" means to the classifier is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison queries migrate fully to LLM chat.&lt;/strong&gt; If people stop typing "X vs Y" into Google and start asking ChatGPT directly, the downstream click I'm betting on disappears. I don't see evidence of this happening at scale for research involving specific attribute constraints — but I'm monitoring query volume patterns month-over-month.&lt;/p&gt;

&lt;p&gt;The first outcome is the one I'd want to catch early. Impressions with near-zero clicks on comparison pages by month 3 would tell me to pivot the format immediately rather than wait six months for a conclusion I could have reached sooner.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why three sites instead of one authority site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three narrow sites let me test three different intent types simultaneously. Game recommendations, AI tools, and OSS alternatives attract different queries and different audiences. One site would take longer to produce the same signal volume about which format works. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;original architecture post&lt;/a&gt; covers the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Haiku generate the structured editorial fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each ETL run sends entries through a &lt;a href="https://dev.to/morinaga/how-i-built-a-shared-claude-haiku-client-with-system-prompt-caching-for-batch-etl-1ddp"&gt;shared Claude Haiku client&lt;/a&gt; that uses system-prompt caching to amortize the cost across batch runs. The prompts are tuned to force specific attribute outputs — avoid-if caveats, audience fit, freshness status — not open-ended descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if one site works and two don't?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful outcome, not a failure. The format that works tells me something specific about the intent type. I'll invest in what works and document what didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where will you publish the October 2026 verdict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On this blog, with raw Search Console screenshots. I'll publish regardless of whether the numbers are favorable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Three post-deploy checks I run after every Cloudflare Pages build</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Mon, 18 May 2026 22:12:21 +0000</pubDate>
      <link>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-3l67</link>
      <guid>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-3l67</guid>
      <description>&lt;p&gt;After spending two weeks debugging issues that only showed up in production — a &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;sitemap _redirects rule that was blocking my own sitemap-index.xml&lt;/a&gt; and a &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — I added three post-deploy checks to my workflow. They're fast and specific to the failure modes I've actually hit, not a full end-to-end test suite.&lt;/p&gt;

&lt;p&gt;Three sites (aiappdex.com, findindiegame.com, ossfind.com) on Cloudflare Pages with Astro 5 SSG. Here's what I check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 1: Sitemap reachability
&lt;/h2&gt;

&lt;p&gt;This is the simplest check, and the one I should have had from day one. After a Cloudflare Pages deploy, I verify that &lt;code&gt;sitemap-index.xml&lt;/code&gt; is reachable and returns 200 on all three domains:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;domain &lt;span class="k"&gt;in &lt;/span&gt;aiappdex.com findindiegame.com ossfind.com&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml → &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: &lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt; sitemap unreachable"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also check &lt;code&gt;sitemap-0.xml&lt;/code&gt; — the actual URL sub-sitemap that &lt;code&gt;@astrojs/sitemap&lt;/code&gt; generates — and assert that it contains at least a minimum expected URL count. For aiappdex.com that threshold is 1,000; if it drops below that after a deploy, the ETL data pipeline probably broke silently.&lt;/p&gt;

&lt;p&gt;The reason this check exists: I had a &lt;code&gt;_redirects&lt;/code&gt; rule redirecting &lt;code&gt;sitemap-index.xml&lt;/code&gt; → &lt;code&gt;sitemap-0.xml&lt;/code&gt; as an emergency workaround that turned out to be wrong. It was live for five days before I found it. The rule kept the real &lt;code&gt;sitemap-index.xml&lt;/code&gt; from reaching crawlers while everything looked fine in the browser, which followed the redirect silently. curl doesn't follow redirects unless you pass &lt;code&gt;-L&lt;/code&gt;, so &lt;code&gt;curl -s -o /dev/null -w "%{http_code}"&lt;/code&gt; would have reported a redirect status instead of 200 and caught this immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 2: IndexNow batch submission
&lt;/h2&gt;

&lt;p&gt;After every successful sitemap check, I run &lt;code&gt;node scripts/indexnow.mjs&lt;/code&gt;. The script reads the live sitemap XML from each domain, collects all URLs, and POSTs them to the IndexNow endpoint for Bing, Yandex, Naver, and Seznam using site-specific keys.&lt;/p&gt;

&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aiappdex.com: submitted 1179 URLs → 200 OK
findindiegame.com: submitted 139 URLs → 200 OK
ossfind.com: submitted 144 URLs → 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a site returns 403 from IndexNow it usually means the key verification file (&lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;) wasn't deployed correctly or a &lt;code&gt;_redirects&lt;/code&gt; rule is mangling the path. Catching this right after deploy matters because the IndexNow key-verification window isn't instantaneous — letting it sit in a broken state delays indexing. I wrote more about the IndexNow setup in &lt;a href="https://dev.to/morinaga/indexnow-libsql-and-three-other-tools-i-reached-for-this-week-5c4m"&gt;this week's tools post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I run this manually after deploy rather than inline in the GitHub Actions workflow because the Cloudflare Pages build takes 2-3 minutes, and IndexNow works best with live URLs. Running it as a separate &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger after the deployment succeeds means I'm submitting URLs that are actually live rather than ones that might still be deploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 3: Weekly Lighthouse spot-check
&lt;/h2&gt;

&lt;p&gt;The third check runs on a cron — Monday 04:30 UTC — not after every deploy. It's slower (3-4 minutes per site, nine URLs total), so daily would be wasteful for a static site that doesn't change at runtime.&lt;/p&gt;

&lt;p&gt;The workflow uses &lt;code&gt;treosh/lighthouse-ci-action&lt;/code&gt; with one homepage and one deep entry page per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;aiappdex.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/models/timm-vit-base-patch16-clip-224-openai/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;findindiegame.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/games/dredge-1562430/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ossfind.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/alternatives/ghost/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm watching for Performance below 80, CLS above 0.1, or accessibility score regression. Astro SSG with no client-side JS should hold steady on all three — if they slip it means something in Tailwind v4 config or the ad slot component changed the layout paint behavior. The results upload to &lt;code&gt;temporaryPublicStorage&lt;/code&gt; so I can diff before/after on regressions.&lt;/p&gt;

&lt;p&gt;I don't set hard failure thresholds that block deploys. These sites are pre-revenue with essentially zero traffic right now; blocking a deploy because a Lighthouse score dropped from 94 to 88 would be disproportionate. I treat Lighthouse as a trend monitor, not a gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm deliberately not checking
&lt;/h2&gt;

&lt;p&gt;No uptime monitoring — I'm relying on Cloudflare's own infrastructure status. No end-to-end user flow tests. No API availability checks — the Turso DB is only queried at build time in SSG mode, so there's nothing to check at runtime.&lt;/p&gt;

&lt;p&gt;For a dynamically rendered site, those gaps would matter. For a static CDN deployment where the entire runtime is pre-built HTML, CSS, and a handful of JSON files, the three checks above cover the actual failure surface I've encountered.&lt;/p&gt;

&lt;p&gt;The publish pipeline has its own idempotency layer (it reads &lt;code&gt;published_urls&lt;/code&gt; from article frontmatter and skips already-distributed posts), so I don't need to verify cross-posting state after each deploy. That's a separate concern.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>githubactions</category>
      <category>astro</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Mon, 18 May 2026 22:12:19 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-54kg</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-54kg</guid>
      <description>&lt;p&gt;The obvious counterargument to everything I'm building is this: Google already does it. You type "best AI tools for video editing" into Google and an AI Overview surfaces a curated list, synthesized from the same kind of data I maintain, without requiring a click. My three directory sites — &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;, &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;, and &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — are competing with a feature baked into the world's dominant search engine.&lt;/p&gt;

&lt;p&gt;I launched these sites on 2026-04-23, built on &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;an architecture that runs at about $25/month&lt;/a&gt;. Traffic is essentially zero — the sites have been indexed for three weeks and organic crawling takes time. The question I keep returning to isn't whether Google will eventually index my pages. It's whether anyone will prefer clicking through to my site over reading the AI Overview box that already answered the same question.&lt;/p&gt;

&lt;p&gt;Here's my honest, falsifiable position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By October 2026 — six months post-launch — at least one of the three sites will show organic click trends in Google Search Console indicating real query traffic to specific comparison or filtered-browse pages. I define that as: at least 200 non-homepage organic clicks per month, sustained for two consecutive months, from queries I didn't directly drive through social or newsletter posts.&lt;/p&gt;

&lt;p&gt;If that doesn't happen, I'll publish the Search Console screenshots and write a post explaining what I got wrong. I'm committing to that here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;AI Overviews have gotten genuinely good at list-and-compare synthesis. If you search "open source alternative to Notion" today, Google often returns a four-item structured list with one-sentence descriptions directly in the Overview box. My Open Alternative To site covers that territory. The AI Overview absorbs the zero-click version of that query.&lt;/p&gt;

&lt;p&gt;The optimistic response is: "my site appears as a citation source." The pessimistic response is: "Google consumes your signal and stops sending clicks." The pessimistic version has supporting evidence — industry-wide CTR on informational queries dropped measurably as AI Overviews expanded through 2025, and the trend hasn't reversed.&lt;/p&gt;

&lt;p&gt;I don't think the pessimistic version is the whole story, but I'm not dismissing it. The most dangerous move is to assume the counterargument is wrong without designing around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI Overviews have structural blind spots
&lt;/h2&gt;

&lt;p&gt;AI Overviews are strong at synthesizing "what exists." They're weaker at three things I've deliberately built for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute-based filtering.&lt;/strong&gt; If someone wants "open source Notion alternatives that work offline and have a mobile app," AI Overviews give hedged prose answers because they're synthesizing text, not querying structured fields. My Turso DB has &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; as typed columns. Faceted filtering on those fields is something a browseable directory does better than a language model writing a paragraph about the general landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial negative-space.&lt;/strong&gt; My game recommender &lt;a href="https://dev.to/morinaga/adding-avoid-if-caveats-to-my-ai-game-recommender-what-changed-hk3"&gt;includes "avoid if" caveats&lt;/a&gt; — structured fields that answer "who should skip this?" generated by a Claude Haiku prompt that specifically forces a critical answer. AI Overviews don't have a mechanism to surface structured negatives. They default to positive framing, which means someone with a specific disqualifying requirement gets an unhelpful answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Freshness on maintenance status.&lt;/strong&gt; The ETL that populates the AI tools directory pulls GitHub commit activity weekly. A tool that hasn't been touched in 14 months is marked as low activity. AI Overviews don't distinguish between a tool actively maintained in 2026 and one that peaked in 2024 — they rely on the recency of web mentions, which can lag by months after a project goes dormant.&lt;/p&gt;

&lt;p&gt;None of these defenses is permanent. Google could build structured attribute filtering into AI Overviews. But closing these gaps takes deliberate pipeline design, not just synthesis, and the gap exists now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The downstream click thesis
&lt;/h2&gt;

&lt;p&gt;Even if my sites lose the zero-click battle on broad discovery terms, there's a second query type I'm explicitly targeting: the downstream comparison query.&lt;/p&gt;

&lt;p&gt;The sequence: someone types "Notion alternatives" into Google, gets an AI Overview naming four tools, then types "AppFlowy vs Anytype performance" to compare the two they're considering. That second query is post-AI-Overview research. It has commercial intent. It wants a verdict, not another list.&lt;/p&gt;

&lt;p&gt;For that query, a page with structured attribute comparison, a clear verdict, and fast load time competes directly with another AI-style answer — and structured data beats generative prose for "which one wins on attribute X." This is partly why &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;I chose static SSG over dynamic AI rendering&lt;/a&gt; for these sites: a fast, indexable page with typed comparison fields is what a second-stage research click needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query type&lt;/th&gt;
&lt;th&gt;AI Overview strength&lt;/th&gt;
&lt;th&gt;Directory strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery ("best tools for X")&lt;/td&gt;
&lt;td&gt;High — often answers directly&lt;/td&gt;
&lt;td&gt;Low for zero-click intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison ("X vs Y, which wins")&lt;/td&gt;
&lt;td&gt;Medium — hedges, rarely commits&lt;/td&gt;
&lt;td&gt;High — structured attrs + verdict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtered browse ("offline + mobile app")&lt;/td&gt;
&lt;td&gt;Low — prose, no filters&lt;/td&gt;
&lt;td&gt;High — faceted structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness ("is X still maintained?")&lt;/td&gt;
&lt;td&gt;Inconsistent — lags commits&lt;/td&gt;
&lt;td&gt;High — weekly ETL refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison and filtered-browse rows are the ones this bet actually rests on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the cost structure matters for intellectual honesty
&lt;/h2&gt;

&lt;p&gt;At $25/month, I can run this experiment for a year without needing revenue to justify continuing. I'm not under pressure to interpret ambiguous signals optimistically.&lt;/p&gt;

&lt;p&gt;Compare that to a project burning $200/month on infrastructure: it's tempting to rationalize flat Search Console data as "still in the sandbox phase" past the point where the data actually says something. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;full cost breakdown&lt;/a&gt; is short: Vercel Pro at $20, Turso starter at $0, Claude Haiku API in single-digit dollars per month for ETL runs, and GitHub Actions on free minutes.&lt;/p&gt;

&lt;p&gt;I won't claim AdSense is approved or revenue is flowing until it is. Right now, &lt;a href="https://dev.to/morinaga/why-google-adsense-will-not-approve-a-vercelapp-site-110b"&gt;AdSense rejected the *.vercel.app version&lt;/a&gt; of the sites. I've moved to custom domains and &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns"&gt;verified them in Search Console&lt;/a&gt;. I'm waiting for real crawl data before making any claims about what's working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three outcomes would tell me the bet is wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impressions but near-zero clicks at 90 days.&lt;/strong&gt; If Search Console shows my pages appearing as AI Overview citation sources but click rates stay near zero on comparison pages specifically, Google is extracting my signal without distributing traffic. That's the worst-case scenario — I'd need to rethink the format entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdSense keeps rejecting after genuine depth improvements.&lt;/strong&gt; The original rejection was partly a *.vercel.app domain issue, but if Google's classifier still rates the pages as thin after I've rebuilt with real structured content and specific editorial attributes, my model of what "quality" means to the classifier is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison queries migrate fully to LLM chat.&lt;/strong&gt; If people stop typing "X vs Y" into Google and start asking ChatGPT directly, the downstream click I'm betting on disappears. I don't see evidence of this happening at scale for research involving specific attribute constraints — but I'm monitoring query volume patterns month-over-month.&lt;/p&gt;

&lt;p&gt;The first outcome is the one I'd want to detect early. Impressions with near-zero clicks on comparison pages by month 3 would tell me to pivot the format immediately rather than wait six months for a conclusion I could have reached sooner.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why three sites instead of one authority site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three narrow sites let me test three different intent types simultaneously. Games-like, AI tools, and OSS alternatives attract different queries and different audiences. One site would take longer to produce the same signal volume about which format works. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;original architecture post&lt;/a&gt; covers the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Haiku generate the structured editorial fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each ETL run sends entries through a &lt;a href="https://dev.to/morinaga/how-i-built-a-shared-claude-haiku-client-with-system-prompt-caching-for-batch-etl-1ddp"&gt;shared Claude Haiku client&lt;/a&gt; that uses system-prompt caching to amortize the cost across batch runs. The prompts are tuned to force specific attribute outputs — avoid-if caveats, audience fit, freshness status — not open-ended descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if one site works and two don't?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful outcome, not a failure. The format that works tells me something specific about the intent type. I'll invest in what works and document what didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where will you publish the October 2026 verdict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On this blog, with raw Search Console screenshots. I'll publish regardless of whether the numbers are favorable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Three post-deploy checks I run after every Cloudflare Pages build</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 17 May 2026 22:12:32 +0000</pubDate>
      <link>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-1off</link>
      <guid>https://dev.to/morinaga/three-post-deploy-checks-i-run-after-every-cloudflare-pages-build-1off</guid>
      <description>&lt;p&gt;After spending two weeks debugging issues that only showed up in production — a &lt;a href="https://dev.to/morinaga/astrojssitemap-generates-sitemap-0xml-not-sitemap-indexxml-on-small-sites-5c7d"&gt;sitemap _redirects rule that was blocking my own sitemap-index.xml&lt;/a&gt; and a &lt;a href="https://dev.to/morinaga/how-i-fixed-a-bluesky-image-upload-race-against-cloudflare-pages-deploy-lag-5ahk"&gt;Bluesky image upload race against Cloudflare Pages deploy lag&lt;/a&gt; — I added three post-deploy checks to my workflow. They're fast and specific to the failure modes I've actually hit, not a full end-to-end test suite.&lt;/p&gt;

&lt;p&gt;Three sites (aiappdex.com, findindiegame.com, ossfind.com) on Cloudflare Pages with Astro 5 SSG. Here's what I check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 1: Sitemap reachability
&lt;/h2&gt;

&lt;p&gt;The simplest check and the one I should have had from day one. After a Cloudflare Pages deploy, I verify that &lt;code&gt;sitemap-index.xml&lt;/code&gt; is reachable and returning 200 on all three domains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;domain &lt;span class="k"&gt;in &lt;/span&gt;aiappdex.com findindiegame.com ossfind.com&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml → &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: &lt;/span&gt;&lt;span class="nv"&gt;$domain&lt;/span&gt;&lt;span class="s2"&gt; sitemap unreachable"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also check &lt;code&gt;sitemap-0.xml&lt;/code&gt; — the actual URL sub-sitemap that &lt;code&gt;@astrojs/sitemap&lt;/code&gt; generates — and assert that it contains at least a minimum expected URL count. For aiappdex.com that threshold is 1,000; if it drops below that after a deploy, the ETL data pipeline probably broke silently.&lt;/p&gt;
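
&lt;p&gt;A minimal sketch of that URL-count assertion, in the same shell style as the loop above. The function names are hypothetical, and it assumes every sitemap entry contains exactly one absolute &lt;code&gt;https://&lt;/code&gt; URL on the site's own domain, so counting that prefix approximates the number of entries.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helpers, not taken from the real workflow.

# Print how many URLs for the given domain ($1) appear in the sitemap XML on stdin.
# Assumption: one absolute https URL on the site's own domain per sitemap entry.
count_sitemap_urls() {
  grep -oF "https://$1/" | wc -l | tr -d ' '
}

# Succeed only if the count ($1) is at least the minimum ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Fetch sitemap-0.xml for a domain ($1) and compare its URL count to a minimum ($2).
# A failed download yields a count of 0, which also fails the check.
check_sitemap_count() {
  domain=$1
  minimum=$2
  count=$(curl -s "https://$domain/sitemap-0.xml" | count_sitemap_urls "$domain")
  if meets_minimum "$count" "$minimum"; then
    echo "$domain/sitemap-0.xml: $count URLs (minimum $minimum)"
  else
    echo "FAIL: $domain/sitemap-0.xml has $count URLs, expected at least $minimum"
    return 1
  fi
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;check_sitemap_count aiappdex.com 1000&lt;/code&gt; after the reachability loop would flag a sitemap that shrank silently. Thresholds for the other two domains would have to be chosen separately.&lt;/p&gt;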

&lt;p&gt;The reason this check exists: I had a &lt;code&gt;_redirects&lt;/code&gt; rule redirecting &lt;code&gt;sitemap-index.xml&lt;/code&gt; to &lt;code&gt;sitemap-0.xml&lt;/code&gt; as an emergency workaround that turned out to be wrong. It was live for five days before I found it. The rule was blocking the real &lt;code&gt;sitemap-index.xml&lt;/code&gt; from reaching crawlers while appearing fine in the browser (which followed the redirect). Curl doesn't follow redirects unless you pass &lt;code&gt;-L&lt;/code&gt;, so &lt;code&gt;-o /dev/null -w "%{http_code}"&lt;/code&gt; reports the redirect's own 3xx status instead of 200 and would have caught this immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 2: IndexNow batch submission
&lt;/h2&gt;

&lt;p&gt;After every successful sitemap check, I run &lt;code&gt;node scripts/indexnow.mjs&lt;/code&gt;. The script reads the live sitemap XML from each domain, collects all URLs, and POSTs them to the IndexNow endpoint for Bing, Yandex, Naver, and Seznam using site-specific keys.&lt;/p&gt;

&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aiappdex.com: submitted 1179 URLs → 200 OK
findindiegame.com: submitted 139 URLs → 200 OK
ossfind.com: submitted 144 URLs → 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a site returns 403 from IndexNow it usually means the key verification file (&lt;code&gt;/&amp;lt;key&amp;gt;.txt&lt;/code&gt;) wasn't deployed correctly or a &lt;code&gt;_redirects&lt;/code&gt; rule is mangling the path. Catching this right after deploy matters because the IndexNow key-verification window isn't instantaneous — letting it sit in a broken state delays indexing. I wrote more about the IndexNow setup in &lt;a href="https://dev.to/morinaga/indexnow-libsql-and-three-other-tools-i-reached-for-this-week-5c4m"&gt;this week's tools post&lt;/a&gt;.&lt;/p&gt;
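
&lt;p&gt;One way to catch a broken key file before submitting anything is to fetch it and compare its body to the key. This is a rough sketch with hypothetical function names. It relies on the IndexNow convention that the key file contains the key itself, and it deliberately omits &lt;code&gt;-L&lt;/code&gt; so that a redirected path counts as a failure.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helpers, not taken from scripts/indexnow.mjs.

# Succeed only if the hosted key file body ($1) equals the expected key ($2).
# Carriage returns and newlines are stripped because files often end with one.
key_file_matches() {
  body=$(printf '%s' "$1" | tr -d '\r\n')
  [ "$body" = "$2" ]
}

# Fetch the key file for a domain ($1) and key ($2) without following redirects.
check_indexnow_key() {
  domain=$1
  key=$2
  body=$(curl -s "https://$domain/$key.txt")
  if key_file_matches "$body" "$key"; then
    echo "$domain: IndexNow key file OK"
  else
    echo "FAIL: $domain/$key.txt is missing or does not contain the key"
    return 1
  fi
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;check_indexnow_key&lt;/code&gt; for each domain before the submission step would turn a later 403 into an immediate, more specific failure message.&lt;/p&gt;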

&lt;p&gt;I run this manually after deploy rather than inline in the GitHub Actions workflow because the Cloudflare Pages build takes 2-3 minutes, and IndexNow works best with live URLs. Running it as a separate &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger after the deployment succeeds means I'm submitting URLs that are actually live rather than ones that might still be deploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check 3: Weekly Lighthouse spot-check
&lt;/h2&gt;

&lt;p&gt;The third check runs on a cron — Monday 04:30 UTC — not after every deploy. It's slower (3-4 minutes per site, nine URLs total), so daily would be wasteful for a static site that doesn't change at runtime.&lt;/p&gt;

&lt;p&gt;The workflow uses &lt;code&gt;treosh/lighthouse-ci-action&lt;/code&gt; with one homepage and one deep entry page per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;aiappdex.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/models/timm-vit-base-patch16-clip-224-openai/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;findindiegame.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/games/dredge-1562430/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ossfind.com&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/alternatives/ghost/&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm watching for a Performance score below 80, CLS above 0.1, or a regression in the accessibility score. Astro SSG with no client-side JS should hold steady on all three; if any of them slips, the likely cause is a change in the Tailwind v4 config or in the ad slot component that altered layout or paint behavior. The results upload to &lt;code&gt;temporaryPublicStorage&lt;/code&gt; so I can diff before and after when something regresses.&lt;/p&gt;

&lt;p&gt;I don't set hard failure thresholds that block deploys. These sites are pre-revenue with essentially zero traffic right now; blocking a deploy because a Lighthouse score dropped from 94 to 88 would be disproportionate. I treat Lighthouse as a trend monitor, not a gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm deliberately not checking
&lt;/h2&gt;

&lt;p&gt;No uptime monitoring — I'm relying on Cloudflare's own infrastructure status. No end-to-end user flow tests. No API availability checks — the Turso DB is only queried at build time in SSG mode, so there's nothing to check at runtime.&lt;/p&gt;

&lt;p&gt;For a dynamically rendered site, those gaps would matter. For a static CDN deployment where the entire runtime is pre-built HTML, CSS, and a handful of JSON files, the three checks above cover the actual failure surface I've encountered.&lt;/p&gt;

&lt;p&gt;The publish pipeline has its own idempotency layer (it reads &lt;code&gt;published_urls&lt;/code&gt; from article frontmatter and skips already-distributed posts), so I don't need to verify cross-posting state after each deploy. That's a separate concern.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>githubactions</category>
      <category>astro</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I'm betting on AI-curated directories when Google AI Overviews answer the same queries</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 17 May 2026 22:12:28 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-3g93</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-ai-curated-directories-when-google-ai-overviews-answer-the-same-queries-3g93</guid>
      <description>&lt;p&gt;The obvious counterargument to everything I'm building is this: Google already does it. You type "best AI tools for video editing" into Google and an AI Overview surfaces a curated list, synthesized from the same kind of data I maintain, without requiring a click. My three directory sites — &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;Top AI Tools&lt;/a&gt;, &lt;a href="https://findindiegame.com" rel="noopener noreferrer"&gt;Find Games Like&lt;/a&gt;, and &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;Open Alternative To&lt;/a&gt; — are competing with a feature baked into the world's dominant search engine.&lt;/p&gt;

&lt;p&gt;I launched these sites on 2026-04-23, built on &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;an architecture that runs at about $25/month&lt;/a&gt;. Traffic is essentially zero — the sites have been indexed for three weeks and organic crawling takes time. The question I keep returning to isn't whether Google will eventually index my pages. It's whether anyone will prefer clicking through to my site over reading the AI Overview box that already answered the same question.&lt;/p&gt;

&lt;p&gt;Here's my honest, falsifiable position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By October 2026 — six months post-launch — at least one of the three sites will show organic click trends in Google Search Console indicating real query traffic to specific comparison or filtered-browse pages. I define that as: at least 200 non-homepage organic clicks per month, sustained for two consecutive months, from queries I didn't directly drive through social or newsletter posts.&lt;/p&gt;

&lt;p&gt;If that doesn't happen, I'll publish the Search Console screenshots and write a post explaining what I got wrong. I'm committing to that here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;AI Overviews have gotten genuinely good at list-and-compare synthesis. If you search "open source alternative to Notion" today, Google often returns a four-item structured list with one-sentence descriptions directly in the Overview box. My Open Alternative To site covers that territory. The AI Overview absorbs the zero-click version of that query.&lt;/p&gt;

&lt;p&gt;The optimistic response is: "my site appears as a citation source." The pessimistic response is: "Google consumes your signal and stops sending clicks." The pessimistic version has supporting evidence — industry-wide CTR on informational queries dropped measurably as AI Overviews expanded through 2025, and the trend hasn't reversed.&lt;/p&gt;

&lt;p&gt;I don't think the pessimistic version is the whole story, but I'm not dismissing it. The most dangerous move is to assume the counterargument is wrong without designing around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI Overviews have structural blind spots
&lt;/h2&gt;

&lt;p&gt;AI Overviews are strong at synthesizing "what exists." They're weaker at three things I've deliberately built for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute-based filtering.&lt;/strong&gt; If someone wants "open source Notion alternatives that work offline and have a mobile app," AI Overviews give hedged prose answers because they're synthesizing text, not querying structured fields. My Turso DB has &lt;code&gt;works_offline&lt;/code&gt;, &lt;code&gt;has_mobile_app&lt;/code&gt;, and &lt;code&gt;last_commit_date&lt;/code&gt; as typed columns. Faceted filtering on those fields is something a browseable directory does better than a language model writing a paragraph about the general landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial negative-space.&lt;/strong&gt; My game recommender &lt;a href="https://dev.to/morinaga/adding-avoid-if-caveats-to-my-ai-game-recommender-what-changed-hk3"&gt;includes "avoid if" caveats&lt;/a&gt; — structured fields that answer "who should skip this?" generated by a Claude Haiku prompt that specifically forces a critical answer. AI Overviews don't have a mechanism to surface structured negatives. They default to positive framing, which means someone with a specific disqualifying requirement gets an unhelpful answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Freshness on maintenance status.&lt;/strong&gt; The ETL that populates the AI tools directory pulls GitHub commit activity weekly. A tool that hasn't been touched in 14 months is marked as low activity. AI Overviews don't distinguish between a tool actively maintained in 2026 and one that peaked in 2024 — they rely on the recency of web mentions, which can lag by months after a project goes dormant.&lt;/p&gt;

&lt;p&gt;None of these defenses is permanent. Google could build structured attribute filtering into AI Overviews. But closing these gaps takes deliberate pipeline design, not just synthesis, and the gap exists now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The downstream click thesis
&lt;/h2&gt;

&lt;p&gt;Even if my sites lose the zero-click battle on broad discovery terms, there's a second query type I'm explicitly targeting: the downstream comparison query.&lt;/p&gt;

&lt;p&gt;The sequence: someone types "Notion alternatives" into Google, gets an AI Overview naming four tools, then types "AppFlowy vs Anytype performance" to compare the two they're considering. That second query is post-AI-Overview research. It has commercial intent. It wants a verdict, not another list.&lt;/p&gt;

&lt;p&gt;For that query, a page with structured attribute comparison, a clear verdict, and fast load time competes directly with another AI-style answer — and structured data beats generative prose for "which one wins on attribute X." This is partly why &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;I chose static SSG over dynamic AI rendering&lt;/a&gt; for these sites: a fast, indexable page with typed comparison fields is what a second-stage research click needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query type&lt;/th&gt;
&lt;th&gt;AI Overview strength&lt;/th&gt;
&lt;th&gt;Directory strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery ("best tools for X")&lt;/td&gt;
&lt;td&gt;High — often answers directly&lt;/td&gt;
&lt;td&gt;Low for zero-click intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison ("X vs Y, which wins")&lt;/td&gt;
&lt;td&gt;Medium — hedges, rarely commits&lt;/td&gt;
&lt;td&gt;High — structured attrs + verdict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtered browse ("offline + mobile app")&lt;/td&gt;
&lt;td&gt;Low — prose, no filters&lt;/td&gt;
&lt;td&gt;High — faceted structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness ("is X still maintained?")&lt;/td&gt;
&lt;td&gt;Inconsistent — lags commits&lt;/td&gt;
&lt;td&gt;High — weekly ETL refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison and filtered-browse rows are the ones this bet actually rests on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the cost structure matters for intellectual honesty
&lt;/h2&gt;

&lt;p&gt;At $25/month, I can run this experiment for a year without needing revenue to justify continuing. I'm not under pressure to interpret ambiguous signals optimistically.&lt;/p&gt;

&lt;p&gt;Compare that to a project burning $200/month on infrastructure: it's tempting to rationalize flat Search Console data as "still in the sandbox phase" past the point where the data actually says something. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;full cost breakdown&lt;/a&gt; is short: Vercel Pro at $20, Turso starter at $0, Claude Haiku API in single-digit dollars per month for ETL runs, and GitHub Actions on free minutes.&lt;/p&gt;

&lt;p&gt;I won't claim AdSense is approved or revenue is flowing until it is. Right now, &lt;a href="https://dev.to/morinaga/why-google-adsense-will-not-approve-a-vercelapp-site-110b"&gt;AdSense rejected the *.vercel.app version&lt;/a&gt; of the sites. I've moved to custom domains and &lt;a href="https://dev.to/morinaga/verifying-three-custom-domains-in-google-search-console-with-cloudflare-dns"&gt;verified them in Search Console&lt;/a&gt;. I'm waiting for real crawl data before making any claims about what's working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three outcomes would tell me the bet is wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impressions but near-zero clicks at 90 days.&lt;/strong&gt; If Search Console shows my pages appearing as AI Overview citation sources but click rates stay near zero on comparison pages specifically, Google is extracting my signal without distributing traffic. That's the worst-case scenario — I'd need to rethink the format entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdSense keeps rejecting after genuine depth improvements.&lt;/strong&gt; The original rejection was partly a *.vercel.app domain issue, but if Google's classifier still rates the pages as thin after I've rebuilt with real structured content and specific editorial attributes, my model of what "quality" means to the classifier is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison queries migrate fully to LLM chat.&lt;/strong&gt; If people stop typing "X vs Y" into Google and start asking ChatGPT directly, the downstream click I'm betting on disappears. I don't see evidence of this happening at scale for research involving specific attribute constraints — but I'm monitoring query volume patterns month-over-month.&lt;/p&gt;

&lt;p&gt;The first outcome is the one I'd want to catch early. Impressions with near-zero clicks on comparison pages by month 3 would tell me to pivot the format immediately rather than wait six months for a conclusion I could have reached sooner.&lt;/p&gt;
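
&lt;p&gt;As a rough sketch, the month-3 check could be reduced to a single function over Search Console totals for the comparison pages. The &lt;code&gt;PageStats&lt;/code&gt; type, the &lt;code&gt;should_pivot&lt;/code&gt; name, and both thresholds are assumptions made for illustration; "near-zero" has no fixed number in this experiment yet.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Search Console totals for one page over the review window."""
    url: str
    impressions: int
    clicks: int


def should_pivot(pages, min_impressions=1000, max_ctr=0.005):
    """Return True when pages are shown often but almost never clicked.

    Both thresholds are illustrative assumptions, not values from the experiment.
    """
    impressions = sum(p.impressions for p in pages)
    clicks = sum(p.clicks for p in pages)
    # With too few impressions there is no signal yet, so do not pivot on noise.
    if impressions == 0 or min_impressions > impressions:
        return False
    # Pivot only when the aggregate click-through rate is at or below the ceiling.
    return max_ctr >= clicks / impressions


comparison_pages = [
    PageStats("/compare/a-vs-b", impressions=1500, clicks=2),
    PageStats("/compare/c-vs-d", impressions=500, clicks=1),
]
print(should_pivot(comparison_pages))
```

&lt;p&gt;The impressions floor is there so that a quiet first month does not read as a failed format.&lt;/p&gt;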




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why three sites instead of one authority site?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three narrow sites let me test three different intent types simultaneously. Games-like, AI tools, and OSS alternatives attract different queries and different audiences. One site would take longer to produce the same signal volume about which format works. The &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;original architecture post&lt;/a&gt; covers the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Haiku generate the structured editorial fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each ETL run sends entries through a &lt;a href="https://dev.to/morinaga/how-i-built-a-shared-claude-haiku-client-with-system-prompt-caching-for-batch-etl-1ddp"&gt;shared Claude Haiku client&lt;/a&gt; that uses system-prompt caching to amortize the cost across batch runs. The prompts are tuned to force specific attribute outputs — avoid-if caveats, audience fit, freshness status — not open-ended descriptions.&lt;/p&gt;
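
&lt;p&gt;To illustrate what "specific attribute outputs" means in practice, here is a minimal sketch of a validator that accepts a model response only if it contains the fixed attribute set. It makes no API calls. The field names, the allowed freshness values, and &lt;code&gt;parse_editorial_fields&lt;/code&gt; are hypothetical; the real schema is not shown in this post.&lt;/p&gt;

```python
import json

# Hypothetical field names and allowed values. The post names the attributes
# (avoid-if caveats, audience fit, freshness status) but not their schema.
REQUIRED_FIELDS = ("avoid_if", "audience_fit", "freshness_status")
FRESHNESS_VALUES = {"active", "stale", "abandoned"}


def parse_editorial_fields(raw):
    """Parse a JSON model response and keep only the fixed attribute set."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every required field must be a non-empty string.
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(data.get(name), str) or not data[name].strip()
    ]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if data["freshness_status"] not in FRESHNESS_VALUES:
        raise ValueError("unexpected freshness_status: " + data["freshness_status"])
    # Drop anything the prompt did not ask for, such as free-form descriptions.
    return {name: data[name] for name in REQUIRED_FIELDS}
```

&lt;p&gt;Rejecting a response outright, instead of patching it, keeps open-ended descriptions from leaking into the published pages.&lt;/p&gt;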

&lt;p&gt;&lt;strong&gt;What if one site works and two don't?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful outcome, not a failure. The format that works tells me something specific about the intent type. I'll invest in what works and document what didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where will you publish the October 2026 verdict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On this blog, with raw Search Console screenshots. I'll publish regardless of whether the numbers are favorable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>indiehackers</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
