<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MORINAGA</title>
    <description>The latest articles on DEV Community by MORINAGA (@morinaga).</description>
    <link>https://dev.to/morinaga</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907455%2F8e6a4a13-bec8-4ec0-bc2d-ec192b7880f8.png</url>
      <title>DEV Community: MORINAGA</title>
      <link>https://dev.to/morinaga</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/morinaga"/>
    <language>en</language>
    <item>
      <title>Four filters I apply when pulling HuggingFace models into an AI tools directory</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 01 Jul 2026 09:26:17 +0000</pubDate>
      <link>https://dev.to/morinaga/four-filters-i-apply-when-pulling-huggingface-models-into-an-ai-tools-directory-1kjd</link>
      <guid>https://dev.to/morinaga/four-filters-i-apply-when-pulling-huggingface-models-into-an-ai-tools-directory-1kjd</guid>
      <description>&lt;p&gt;HuggingFace's model hub has over 900,000 models as of mid-2026. Surfacing all of them on &lt;a href="https://aiappdex.com" rel="noopener noreferrer"&gt;aiappdex.com&lt;/a&gt; would produce noise, not a directory. The ETL that runs nightly to update the AI tools directory applies four filters before any model is considered for inclusion. Here's what each filter does, what it catches that the previous filter missed, and what the chain still doesn't solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Filter 1: pipeline tag — only end-user-facing tasks
&lt;/h2&gt;

&lt;p&gt;The HuggingFace API exposes a &lt;code&gt;pipeline_tag&lt;/code&gt; field on model records, though not every model has one set. The allowed set in my ETL is a subset of the full HuggingFace taxonomy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text-generation, text-classification, token-classification,
question-answering, summarization, translation, image-classification,
image-generation, image-to-text, text-to-image, automatic-speech-recognition,
text-to-speech, audio-classification, zero-shot-classification,
feature-extraction, sentence-similarity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this excludes: &lt;code&gt;image-segmentation&lt;/code&gt;, &lt;code&gt;object-detection&lt;/code&gt;, &lt;code&gt;depth-estimation&lt;/code&gt;, &lt;code&gt;tabular-regression&lt;/code&gt;, and a dozen other pipelines, mostly computer vision, that are real ML tasks but not tools most directory visitors would search for. Also excluded: models with no &lt;code&gt;pipeline_tag&lt;/code&gt; at all, which covers roughly 40% of the hub (mostly adapter weights, partial checkpoints, and fine-tune datasets uploaded alongside a model rather than the model itself).&lt;/p&gt;

&lt;p&gt;This filter alone cuts the candidate set by roughly 80%. The resulting pool is still large enough to be useful — text-generation alone has hundreds of thousands of models — but it's a tractable number.&lt;/p&gt;
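
&lt;p&gt;A minimal sketch of an allowlist check like the one described above, not the ETL's actual code. The helper name &lt;code&gt;passesPipelineFilter&lt;/code&gt; and the sample records are hypothetical; the only assumption about the data is that each record is a plain object with an optional &lt;code&gt;pipeline_tag&lt;/code&gt; string.&lt;/p&gt;

```javascript
// Allowlist copied from the list above; every other tag is skipped.
const ALLOWED_PIPELINE_TAGS = new Set([
  "text-generation", "text-classification", "token-classification",
  "question-answering", "summarization", "translation", "image-classification",
  "image-generation", "image-to-text", "text-to-image", "automatic-speech-recognition",
  "text-to-speech", "audio-classification", "zero-shot-classification",
  "feature-extraction", "sentence-similarity",
]);

// Hypothetical helper. Untagged models fail the check as well, because
// Set.has(undefined) is false.
function passesPipelineFilter(model) {
  return ALLOWED_PIPELINE_TAGS.has(model.pipeline_tag);
}

// Made-up sample records for illustration.
const sampleModels = [
  { id: "org/chat-model", pipeline_tag: "text-generation" },
  { id: "org/segmenter", pipeline_tag: "image-segmentation" },
  { id: "org/adapter-weights" }, // no pipeline_tag at all
];

const keptIds = sampleModels.filter(passesPipelineFilter).map((m) => m.id);
console.log(keptIds);
```

&lt;p&gt;Using a &lt;code&gt;Set&lt;/code&gt; keeps the check to a single lookup and handles the untagged case without a separate branch.&lt;/p&gt;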

&lt;h2&gt;
  
  
  Filter 2: minimum likes threshold — filtering out abandoned and test uploads
&lt;/h2&gt;

&lt;p&gt;Every model on HuggingFace has a &lt;code&gt;likes&lt;/code&gt; count. The current threshold in the ETL is 30. Models below that get skipped entirely.&lt;/p&gt;

&lt;p&gt;Thirty likes is a very low bar. What it actually catches: test uploads that were never intended for public use, deprecated model versions that were superseded by a renamed upload, and fine-tunes of popular base models trained on private datasets and uploaded without cleanup. These aren't useful directory entries — they don't have documentation, they often have incorrect or placeholder metadata, and they frequently return 404 on the download endpoint even though the API record exists.&lt;/p&gt;

&lt;p&gt;The 30-like threshold isn't magic. It's the point where I stopped finding entries that were clearly accidental public uploads. Ten likes still produced a lot of noise; 30 produced much less. The threshold is a config value I can change per &lt;code&gt;pipeline_tag&lt;/code&gt; if needed. Text-generation models would benefit from a higher threshold (I'd probably raise it to 50 for that pipeline if I wanted stricter quality), while rarer pipelines like &lt;code&gt;text-to-speech&lt;/code&gt; work better with a lower cutoff because the ecosystem is smaller.&lt;/p&gt;
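
&lt;p&gt;One way a per-pipeline threshold could be expressed, as a sketch under stated assumptions. The default of 30 and the value of 50 for text-generation come from the text; the value for &lt;code&gt;text-to-speech&lt;/code&gt; is made up for illustration, and the names &lt;code&gt;minLikesFor&lt;/code&gt; and &lt;code&gt;passesLikesFilter&lt;/code&gt; are hypothetical.&lt;/p&gt;

```javascript
// Default threshold from the text.
const DEFAULT_MIN_LIKES = 30;

// Per-tag overrides. 50 is the stricter value floated in the text;
// 15 is an illustrative guess for a smaller ecosystem, not a real setting.
const MIN_LIKES_BY_TAG = new Map([
  ["text-generation", 50],
  ["text-to-speech", 15],
]);

// Falls back to the default when a tag has no override.
function minLikesFor(pipelineTag) {
  return MIN_LIKES_BY_TAG.get(pipelineTag) ?? DEFAULT_MIN_LIKES;
}

// A model passes when its like count meets the threshold for its tag.
// A missing likes field is treated as zero.
function passesLikesFilter(model) {
  const likes = model.likes ?? 0;
  return likes >= minLikesFor(model.pipeline_tag);
}

console.log(passesLikesFilter({ pipeline_tag: "summarization", likes: 30 }));
console.log(passesLikesFilter({ pipeline_tag: "text-generation", likes: 40 }));
```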

&lt;h2&gt;
  
  
  Filter 3: last-modified recency — flagging dormant models
&lt;/h2&gt;

&lt;p&gt;This filter doesn't exclude models; it sets a &lt;code&gt;low_activity&lt;/code&gt; flag on models that haven't been modified in 14 months. That flag gets stored in the Turso database and surfaced in the directory as a "last active" label rather than a hidden exclusion.&lt;/p&gt;

&lt;p&gt;Why flag instead of exclude? Because an old model isn't necessarily a useless model. GPT-J 6B is from 2021 and still appears in production stacks. BERT-base-uncased is from 2018 and still shows up in fine-tuning tutorials published in 2026. Excluding by recency would misrepresent the landscape.&lt;/p&gt;

&lt;p&gt;What the low-activity flag does catch: models that were announced with fanfare, got early likes, and then quietly went dormant when the authors moved to a new architecture. Without this flag, those models appear alongside actively maintained alternatives without any signal that maintenance stopped. For someone evaluating tools for production use, that distinction matters.&lt;/p&gt;

&lt;p&gt;The 14-month threshold aligns with how HuggingFace itself handles model cards — a model page that hasn't been touched in 14 months almost certainly doesn't reflect the current state of the codebase or the upstream library it depends on.&lt;/p&gt;
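
&lt;p&gt;A sketch of how a flag like this could be computed without excluding anything. It assumes each record carries a &lt;code&gt;lastModified&lt;/code&gt; ISO timestamp; the helper names are hypothetical, and the actual ETL and its Turso schema may look different.&lt;/p&gt;

```javascript
// Window from the text: flag models untouched for 14 months.
const LOW_ACTIVITY_MONTHS = 14;

// Returns a new Date that lies `months` calendar months before `now` (UTC).
function cutoffDate(now, months) {
  const d = new Date(now.getTime());
  d.setUTCMonth(d.getUTCMonth() - months);
  return d;
}

// Returns a copy of the record with a low_activity flag added.
// The record is never dropped; an unparseable date leaves the flag false.
function withLowActivityFlag(model, now = new Date()) {
  const lastModified = new Date(model.lastModified).getTime();
  const cutoff = cutoffDate(now, LOW_ACTIVITY_MONTHS).getTime();
  return { ...model, low_activity: cutoff > lastModified };
}

// Made-up record, evaluated against a fixed "now" so the result is stable.
const fixedNow = new Date("2026-07-01T00:00:00Z");
const flagged = withLowActivityFlag(
  { id: "org/old-model", lastModified: "2025-04-30T00:00:00Z" },
  fixedNow
);
console.log(flagged.low_activity);
```

&lt;p&gt;Returning a copy with the flag set keeps this step composable with the hard filters: they drop records, this one only annotates them.&lt;/p&gt;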

&lt;h2&gt;
  
  
  Filter 4: gated and private models — inclusion requires public weights
&lt;/h2&gt;

&lt;p&gt;Models marked &lt;code&gt;gated: true&lt;/code&gt; on HuggingFace require login and a request form to download. Models marked &lt;code&gt;private: true&lt;/code&gt; aren't publicly downloadable at all. Neither appears in the directory.&lt;/p&gt;

&lt;p&gt;The rationale isn't philosophical; it's practical. A directory entry for a model that visitors can't access without a form submission is poor UX. The directory's value is "find a model, go use it." Gated models break that flow entirely for anyone who hasn't already been approved.&lt;/p&gt;

&lt;p&gt;This filter has one real cost: it excludes some important models. Meta's Llama 3 series launched gated, as did several Mistral fine-tunes during their initial access period. The directory missed those for the window between public announcement and when gating was lifted. That's a real gap. The &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;three-tier content quality ladder&lt;/a&gt; that handles model content upgrades would apply here too — if I wanted to include gated models with a "requires access request" label, I could add a tier for that. I've chosen not to for now because the "access request required" experience isn't what the directory is designed around.&lt;/p&gt;
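
&lt;p&gt;The access check itself can stay very small. The sketch below assumes records carry &lt;code&gt;gated&lt;/code&gt; and &lt;code&gt;private&lt;/code&gt; fields as described above. As far as I know, &lt;code&gt;gated&lt;/code&gt; may arrive as a string such as "auto" or "manual" rather than a boolean, so the hypothetical helper treats any truthy value as gated; verify that against real API responses before relying on it.&lt;/p&gt;

```javascript
// Hypothetical helper: true only when the weights are open to everyone.
function isPubliclyDownloadable(model) {
  // Any truthy value counts as gated (true, "auto", "manual", ...).
  if (model.gated) return false;
  if (model.private) return false;
  return true;
}

// Made-up sample records for illustration.
const accessSamples = [
  { id: "org/open-model", gated: false, private: false },
  { id: "org/gated-model", gated: "manual", private: false },
  { id: "org/private-model", gated: false, private: true },
];

const openIds = accessSamples.filter(isPubliclyDownloadable).map((m) => m.id);
console.log(openIds);
```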

&lt;h2&gt;
  
  
  What the chain still doesn't solve
&lt;/h2&gt;

&lt;p&gt;These four filters produce a candidate set that's tractable and skewed toward useful entries. What they don't catch:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate fine-tunes.&lt;/strong&gt; There are thousands of Llama-3.1-8B fine-tunes on HuggingFace, all passing the pipeline tag filter, all with enough likes, all public and actively maintained. The directory clusters them by base model but doesn't deduplicate in a meaningful way. Someone searching for an instruction-following model still faces a wall of variants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quality of the model card.&lt;/strong&gt; A model that passes all four filters might have a model card that says "fine-tuned for [task]" with no further detail — no eval results, no intended use, no known limitations. The ETL can't infer quality from card text reliably. That's what Claude Haiku's &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;editorial generation step&lt;/a&gt; handles: a prompted generation that forces structured outputs around audience fit and limitations. But it's worth naming that the ETL filters select for metadata quality, not model quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing and deployment complexity.&lt;/strong&gt; HuggingFace doesn't expose whether a model runs in 8GB of VRAM, requires a dedicated A100, or is practical to call via the Inference API without self-hosting. That data isn't in the API response. It's the kind of structured attribute that would make the directory genuinely useful for someone deciding between models — and it's something I'd want to add as a manual editorial field rather than an ETL-derived one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>Why I'm betting on Claude Code over Cursor for a solo dev pipeline</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Wed, 01 Jul 2026 09:26:12 +0000</pubDate>
      <link>https://dev.to/morinaga/why-im-betting-on-claude-code-over-cursor-for-a-solo-dev-pipeline-46a0</link>
      <guid>https://dev.to/morinaga/why-im-betting-on-claude-code-over-cursor-for-a-solo-dev-pipeline-46a0</guid>
      <description>&lt;p&gt;I used Cursor as my primary AI coding tool from early 2025 through February 2026. In March, when I started the three-directory-site experiment I've been documenting here, I switched to Claude Code as my main driver. That switch wasn't impulsive — I tried both on the same tasks for about two weeks before committing. Here's the specific bet I'm making, the case against it that I find genuinely compelling, and the three conditions under which I'd reverse the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bet, stated plainly
&lt;/h2&gt;

&lt;p&gt;By December 2026 — nine months into this project — Claude Code will have saved me more total wall-clock time on this experiment than Cursor would have, net of context-restart overhead and the overhead of not having inline tab completion. The measurement is informal but the conditions aren't vague: if I'm spending more than 20 minutes per week re-establishing context that Cursor would have kept in a sidebar chat history, I'm wrong.&lt;/p&gt;

&lt;p&gt;I'm not claiming Claude Code is better across all developer workflows. My bet is specific to the task distribution this project has required: multi-file refactors touching five or more files simultaneously, GitHub Actions debugging where terminal output is the primary signal, automated pipeline scripts that need AI assistance at invocation time, and &lt;a href="https://dev.to/articles/content-quality-gate-lint-audit-articles"&gt;content-generation runs&lt;/a&gt; that consume the staged git diff as input.&lt;/p&gt;

&lt;h2&gt;
  
  
  What pushed me toward Claude Code
&lt;/h2&gt;

&lt;p&gt;The immediate catalyst was the GitHub Actions CI setup. The &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three-site architecture&lt;/a&gt; runs a nightly ETL, a daily article generation job, a Bluesky queue refill, and several post-deploy checks — each as a separate workflow. Debugging those workflows from Cursor's chat mode is awkward. Cursor sees the YAML file correctly; it doesn't see the runtime logs that show exactly which step failed and what the failing bash command produced. I had to copy error output from the GitHub Actions interface into Cursor's chat manually, which breaks flow.&lt;/p&gt;

&lt;p&gt;Claude Code sits in the terminal alongside the git output, the pnpm install errors, the node script stack traces. I can paste a failing run log directly into a session without switching context. That terminal-native loop — observe failure, invoke AI, inspect proposed fix, run command — is where I first noticed a meaningful productivity gap.&lt;/p&gt;

&lt;p&gt;The second factor is multi-file coherence. The &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;shared Claude Haiku client&lt;/a&gt; is imported by five different ETL scripts across three separate apps. Refactoring it (adding a retry parameter, changing the caching behavior) means touching all five call sites simultaneously. Claude Code can open all five files in context, reason about which call sites need updating and which don't based on usage patterns, and produce a coherent multi-file diff with an explanation. Cursor's "apply to multiple files" flow surfaces one diff at a time with manual approval at each step. For this specific operation, a cross-repo parameter change, I find Claude Code's approach faster.&lt;/p&gt;

&lt;p&gt;Third is the article-generation pipeline itself. The &lt;a href="https://dev.to/articles/content-quality-gate-lint-audit-articles"&gt;content quality gate&lt;/a&gt; runs an audit script on every generated file. The routine I run uses staged git output to feed a reviewer, then optionally patches the article before committing. That whole loop — generate, stage, review, fix, re-stage, commit — runs in the terminal. Claude Code can execute bash commands, inspect what changed, and iterate without clipboard handoffs. In Cursor I'd break that flow every time I needed to check the audit output or run a pnpm script.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm giving up
&lt;/h2&gt;

&lt;p&gt;Cursor's inline edit mode is genuinely better for micro-changes. CMD+K opens a floating edit bar at the cursor position, accepts a one-sentence description, and shows an inline diff that I can accept or reject in under two seconds. Claude Code has no equivalent. If I want to rename a variable or flip a conditional, I describe the location in the terminal, wait for the tool to navigate there, and approve the change, which is objectively slower.&lt;/p&gt;

&lt;p&gt;Tab completion is the other thing I miss. Cursor's completions predict what you're about to type in a familiar codebase with surprising accuracy. In the TypeScript ETL scripts I iterate on constantly, Cursor already knows I'm about to write &lt;code&gt;await db.execute({sql:&lt;/code&gt; and completes the pattern including the object shape. Claude Code has no tab-complete mode; it's interactive-only.&lt;/p&gt;

&lt;p&gt;The third gap is session continuity. Cursor's sidebar chat persists history across sessions. I can scroll back in a Cursor conversation and see the discussion that explained a design decision two weeks ago. Claude Code starts fresh on every invocation by default. If I'm debugging something I touched four days ago, I'm re-establishing context from git log and file reads rather than from a conversation thread that already captured the reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument I take seriously
&lt;/h2&gt;

&lt;p&gt;The strongest case against my bet: Claude Code's terminal-native strength is also its ceiling. The operations where it beats Cursor — multi-file refactors, CI debugging, pipeline automation — are a minority of actual characters typed in a development session. Line-level edits, variable renames, docstring updates, quick function calls — those are the majority. Cursor handles them faster.&lt;/p&gt;

&lt;p&gt;If the correct mental model is "80% of dev time is small edits, 20% is large operations," then optimizing for the 20% with Claude Code while taking a speed penalty on the 80% is a net loss. Cursor, covering 80% well and 20% adequately, might win on total wall-clock time even if Claude Code wins on per-operation speed for the big tasks.&lt;/p&gt;

&lt;p&gt;I don't have tracked data to refute this cleanly. My intuition is that this project, specifically, is skewed toward the 20% end — pipeline-wide changes, new ETL integrations, debugging CI failures — more than a typical single-app product build would be. The &lt;a href="https://dev.to/articles/ai-directories-vs-google-ai-overviews-bet"&gt;AI directories bet&lt;/a&gt; has the same honest structure: I'm making a claim based on structural reasoning, not on clean measurement.&lt;/p&gt;

&lt;p&gt;What would resolve this: tracking dev time by operation type for four weeks and comparing the two categories. That's a straightforward measurement I haven't run. If small edits consume more than two-thirds of my Claude Code sessions, the counterargument wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost structure
&lt;/h2&gt;

&lt;p&gt;Both tools' monthly costs are close enough that cost alone isn't the deciding factor, but the structure matters for how I think about usage.&lt;/p&gt;

&lt;p&gt;Cursor Pro is $20/month flat, covering unlimited completions and a monthly cap on "premium" model uses — Claude Sonnet and GPT-4o, as of this writing — with automatic fallback to a smaller model when the cap is hit. Predictable cost, opaque per-operation consumption.&lt;/p&gt;

&lt;p&gt;Claude Code bills against an Anthropic API key directly. For my usage pattern — roughly three to five complex sessions per day on a project at this scale — the monthly API cost lands between $15 and $30 depending on session complexity. The variance comes from how often I ask for full-codebase reads versus targeted edits. It's not consistently cheaper than Cursor Pro, and it's not consistently more expensive.&lt;/p&gt;

&lt;p&gt;The meaningful difference is visibility. With Claude Code I can see exactly what each session consumed. With Cursor Pro I don't know whether an "apply to 12 files" operation used one premium credit or ten. For a project where I'm tracking &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;every dollar of infrastructure cost&lt;/a&gt;, per-operation visibility changes how I think about usage patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I partition the workflow in practice
&lt;/h2&gt;

&lt;p&gt;I use Claude Code for anything that starts with a problem statement spanning multiple files: "the ETL is writing duplicate entries — find where the upsert logic lives and figure out why it's firing twice for the same ID." Terminal access plus file-reading plus bash execution in one context is worth the micro-edit tradeoff for that class of problem.&lt;/p&gt;

&lt;p&gt;For genuinely small edits — a Tailwind class adjustment, a typo in a component — I open the file directly in VSCode and edit manually. No AI involved. That's faster than either tool for operations with clear, exact solutions.&lt;/p&gt;

&lt;p&gt;What I've essentially done is partition my editing into two categories: Claude Code for architectural operations, and direct manual editing with no AI for surgical fixes and trivial changes. Cursor was attempting to cover both; the result was friction at both ends because the tool can't optimize simultaneously for "large autonomous operation" and "two-keystroke inline fix."&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;static site rendering choice&lt;/a&gt; followed similar reasoning: picking one approach for a specific constraint set rather than picking the tool that's most general. I'm applying the same thinking to development tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would change my mind
&lt;/h2&gt;

&lt;p&gt;Three signals would push me back to Cursor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context loss compounds past a threshold.&lt;/strong&gt; If I find myself spending more than 20 minutes per week re-explaining architectural decisions to fresh Claude Code sessions that should have retained them, the continuity gap stops being an acceptable tradeoff. That threshold is specific enough to evaluate month-by-month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor ships terminal-native agentic mode.&lt;/strong&gt; Cursor is actively developing agentic capabilities. If they ship a mode where Cursor executes terminal commands, observes output, and iterates without requiring IDE focus — essentially what Claude Code's bash tool does — the workflow gap I've described narrows to near zero. I'd run a direct comparison again at that point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task distribution shifts toward UI iteration.&lt;/strong&gt; This project is currently infrastructure-heavy: ETL pipelines, CI workflows, cross-posting automation, the &lt;a href="https://dev.to/articles/pairwise-ai-model-compare-pages-claude-haiku-budget-cap"&gt;pairwise compare page generation&lt;/a&gt;. If it matures into mostly front-end iteration — layout experiments on the directory pages, A/B testing components — the small-edit / tab-completion advantage that Cursor holds would outweigh the pipeline operations advantage. The bet is partly a claim about what the project will continue to require.&lt;/p&gt;

&lt;h2&gt;
  
  
  The December 2026 checkpoint
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/ai-directories-vs-google-ai-overviews-bet"&gt;AI directories bet&lt;/a&gt; has a formal October 2026 deadline. This tooling bet is softer — I'll check in by December 2026 with whatever the data shows. I'll report: how often I hit the context-loss pain point, whether the task distribution stayed infrastructure-heavy, and whether either tool materially changed its offering. If I've switched back to Cursor by then, I'll say so with specifics.&lt;/p&gt;

&lt;p&gt;One thing I won't do is rationalize ambiguous signals optimistically. The same commitment I made about &lt;a href="https://dev.to/articles/bluesky-pre-post-qc-gate-four-gates"&gt;Bluesky automation quality&lt;/a&gt; — systematic gates, not self-review — applies here. If the measurement says I'm wrong, I'll say I'm wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can you use Claude Code and Cursor at the same time?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, and I occasionally do. Claude Code sessions happen in a terminal window; Cursor runs in the IDE alongside. The main friction: if both are running Claude Sonnet simultaneously, they're competing for the same API rate limits. In practice I don't hit conflicts on this project's volume, but it's something to watch for longer agentic sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Claude Code available everywhere Cursor is?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. Cursor is a full IDE replacement available on Mac, Windows, and Linux. Claude Code is a CLI that requires a terminal. It doesn't have a native Windows GUI experience as of mid-2026, though WSL2 works. For developers primarily on Windows without WSL, this is a practical blocker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about GitHub Copilot — does either replace it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor includes Copilot-style tab completion with its own model backend; you don't need a separate Copilot subscription if you're using Cursor Pro. Claude Code doesn't offer tab completion. If inline completions are your primary AI coding use, Claude Code isn't a Copilot replacement — it's a different tool that doesn't try to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does this tooling choice affect the pipeline automation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Directly: the ETL scripts, GitHub Actions YAML, and article generation routines are all authored and debugged through Claude Code. Indirectly: the &lt;a href="https://dev.to/articles/content-quality-gate-lint-audit-articles"&gt;content quality gate&lt;/a&gt; and the QC review runs are shell scripts that fit naturally into a Claude Code session but would require clipboard handoffs in Cursor. The tool shapes how I build the automation, which shapes what automation I'm willing to maintain.&lt;/p&gt;




&lt;p&gt;Related: &lt;a href="https://dev.to/morinaga/why-im-betting-static-ssg-beats-dynamic-ai-rendering-for-directory-seo-1pbd"&gt;Why I'm betting static SSG beats dynamic AI rendering for directory SEO&lt;/a&gt; | &lt;a href="https://dev.to/morinaga/i-built-3-programmatic-seo-sites-for-25month-using-claude-haiku-heres-the-full-architecture-3pl8"&gt;The full $25/month architecture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>How I verify affiliate CTAs are actually rendering in production</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 28 Jun 2026 22:17:37 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-verify-affiliate-ctas-are-actually-rendering-in-production-3i9a</link>
      <guid>https://dev.to/morinaga/how-i-verify-affiliate-ctas-are-actually-rendering-in-production-3i9a</guid>
      <description>&lt;p&gt;I run &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three directory sites&lt;/a&gt; that display affiliate links, AdSense slots, and Amazon blocks — but only when the corresponding environment variables are set in Cloudflare Pages. When the variables aren't deployed, those sections simply don't render. No error. No broken layout. Just missing revenue, invisible unless you check.&lt;/p&gt;

&lt;p&gt;This happened twice in the first month. A redeploy would go out without the affiliate env vars being re-applied. The site looked identical to a working version at a glance. Clicking around would eventually reveal the missing CTA, but only if you happened to land on the right page type.&lt;/p&gt;

&lt;p&gt;I wrote &lt;code&gt;scripts/check-affiliates.mjs&lt;/code&gt; to make that check fast and explicit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the script does
&lt;/h2&gt;

&lt;p&gt;The script checks three sites in sequence. For each site, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetches &lt;code&gt;/sitemap-index.xml&lt;/code&gt; to find the sub-sitemap for detail pages&lt;/li&gt;
&lt;li&gt;Picks one detail URL from that sitemap&lt;/li&gt;
&lt;li&gt;Fetches the raw HTML&lt;/li&gt;
&lt;li&gt;Checks for specific strings that indicate each CTA is rendered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output is a plain pass/fail report per site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→ aiappdex.com
  ads.txt: ✓ pub ID set
  sample: https://aiappdex.com/models/qwen2-7b/
  affiliate CTA "Run this model on": ✓ rendered
  AdSense slot: ✓ in HTML
  Amazon block: ✗ hidden (PUBLIC_AMAZON_TAG not deployed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line per check, one pass per site, one run to confirm three deployments.&lt;/p&gt;
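
&lt;p&gt;The string checks in step 4 can be sketched as a small table of markers run against the fetched HTML. This is an illustration, not the contents of &lt;code&gt;scripts/check-affiliates.mjs&lt;/code&gt;: the helper names are hypothetical and the needles are assumptions. &lt;code&gt;adsbygoogle&lt;/code&gt; is the class name AdSense markup normally carries, while &lt;code&gt;amazon-block&lt;/code&gt; is a made-up placeholder.&lt;/p&gt;

```javascript
// Hypothetical marker table: the label is what the report prints, the needle
// is a substring expected in the page HTML when that CTA rendered.
const MARKERS = [
  { label: 'affiliate CTA "Run this model on"', needle: "Run this model on" },
  { label: "AdSense slot", needle: "adsbygoogle" },
  { label: "Amazon block", needle: "amazon-block" }, // placeholder needle
];

// Runs every marker against the HTML and records pass/fail per marker.
function checkMarkers(html, markers) {
  return markers.map(({ label, needle }) => ({
    label,
    ok: html.includes(needle),
  }));
}

// Formats one report line per check, in the style of the output above.
function formatReport(results) {
  return results
    .map((r) => `  ${r.label}: ${r.ok ? "✓ rendered" : "✗ missing"}`)
    .join("\n");
}

// Stand-in for a fetched page body; a real run would pass the response text.
const samplePage = 'Run this model on a hosted GPU ... ins class="adsbygoogle" ...';
const markerResults = checkMarkers(samplePage, MARKERS);
console.log(formatReport(markerResults));
```

&lt;p&gt;Keeping the markers in data rather than in branching code means adding a fourth CTA type is one more table row.&lt;/p&gt;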

&lt;h2&gt;
  
  
  The sitemap crawl
&lt;/h2&gt;

&lt;p&gt;Hardcoding a URL would make the check brittle — detail URLs change when slugs are regenerated, and I'd rather not maintain a separate list. Using the sitemap is more robust: it's the canonical URL source the site already generates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickFirstSlug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;site&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sitemapRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;site&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/sitemap-index.xml`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sitemapRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sitemapRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subSitemap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;loc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;loc&amp;gt;/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[])[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;subSitemap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;subSitemap&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;subRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;subRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;loc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;loc&amp;gt;/g&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;detail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;detail&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It parses &lt;code&gt;&amp;lt;loc&amp;gt;&lt;/code&gt; tags from the sitemap XML with a regex rather than a full XML parser. That's intentional: &lt;code&gt;DOMParser&lt;/code&gt; isn't available in Node by default, and adding a dependency for sitemap parsing felt disproportionate. The regex works because the &lt;a href="https://www.sitemaps.org/protocol.html" rel="noopener noreferrer"&gt;sitemap XML format&lt;/a&gt; is structurally consistent; a more complex format would warrant a parser.&lt;/p&gt;

&lt;p&gt;One thing to note: the function takes a &lt;code&gt;prefix&lt;/code&gt; argument (like &lt;code&gt;/models/&lt;/code&gt; or &lt;code&gt;/games/&lt;/code&gt;). That's how I distinguish detail pages from the index pages that also appear in the sitemap. I want a URL like &lt;code&gt;/models/qwen2-7b/&lt;/code&gt;, not &lt;code&gt;/models/&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checking ads.txt and affiliate strings
&lt;/h2&gt;

&lt;p&gt;The ads.txt check is a separate fetch from the HTML check. It looks for the AdSense publisher ID pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adsTxtRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;site&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/ads.txt`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adsTxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;adsTxtRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasAdsensePub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/pub-&lt;/span&gt;&lt;span class="se"&gt;\d{10,}&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;adsTxt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HTML checks are string presence checks against the fetched page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasSection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasAdsense&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsbygoogle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data-ad-client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasAmazon&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Gear up on Amazon&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;amazon.com/s?k=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;section&lt;/code&gt; variable is site-specific. For &lt;code&gt;aiappdex.com&lt;/code&gt; it's &lt;code&gt;"Run this model on"&lt;/code&gt;; for &lt;code&gt;findindiegame.com&lt;/code&gt; it's &lt;code&gt;"Find on other stores"&lt;/code&gt;; for &lt;code&gt;ossfind.com&lt;/code&gt; it's &lt;code&gt;"Self-host on"&lt;/code&gt;. These are heading strings that only appear in the rendered HTML when the relevant env var is set.&lt;/p&gt;

&lt;p&gt;I deliberately check strings that are human-readable rather than env var names or data-attributes. If the heading renders, the CTA is live. If the heading is absent, something upstream didn't connect. The message in that case tells me exactly which env var to check in Cloudflare.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is better than a visual check
&lt;/h2&gt;

&lt;p&gt;The pattern I was relying on before — opening a few pages after a deploy and eyeballing them — has two problems. First, it's slow across three sites with multiple CTA types. Second, it's unreliable at catching conditional rendering: an affiliate block that's absent looks the same as a block I intentionally disabled or that I haven't scrolled to yet.&lt;/p&gt;

&lt;p&gt;A script that fetches programmatically, checks presence by string match, and reports pass/fail for each CTA type takes about two seconds and catches the failure unambiguously. The output is readable in a terminal and doesn't require loading a browser.&lt;/p&gt;

&lt;p&gt;This connects to the same principle behind the &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;three-tier content quality ladder&lt;/a&gt;: checks at different stages catch different things. Post-deploy verification catches deploy-time configuration problems. Pre-commit linting catches content problems. Neither replaces the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd add
&lt;/h2&gt;

&lt;p&gt;Right now the script only checks one sample page per site. A more thorough version would check one page from each content type per site — a model page, a compare page, an alternatives page — since some CTAs only render on specific page types. That would require more sitemap traversal but would catch more edge cases.&lt;/p&gt;

&lt;p&gt;The output format is human-readable but not machine-parseable. If I wanted to hook this into CI and fail a deploy when a CTA is missing, I'd add a JSON output mode and return a non-zero exit code on any &lt;code&gt;✗&lt;/code&gt;. For now I run it manually after deploys — it takes less than ten seconds and the terminal output is enough.&lt;/p&gt;
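
&lt;p&gt;A minimal sketch of what that JSON mode could look like. The result shape, the &lt;code&gt;summarize&lt;/code&gt; helper, and the check names are assumptions for illustration, not the script's actual structure:&lt;/p&gt;

```javascript
// Hypothetical result shape: one entry per site, one boolean per CTA check.
function summarize(results) {
  const failures = [];
  for (const { site, checks } of results) {
    for (const [name, passed] of Object.entries(checks)) {
      if (!passed) failures.push({ site, check: name });
    }
  }
  return {
    ok: failures.length === 0,
    failures,
    // A non-zero exit code is what lets CI fail the deploy.
    exitCode: failures.length === 0 ? 0 : 1,
  };
}

// Made-up sample data, only to show the output shape.
const sample = [
  { site: "aiappdex.com", checks: { section: true, adsense: true, adsTxt: true } },
  { site: "ossfind.com", checks: { section: false, adsense: true, adsTxt: true } },
];

const summary = summarize(sample);
console.log(JSON.stringify(summary, null, 2));
// In a real script the last step would be: process.exitCode = summary.exitCode;
```

&lt;p&gt;Keeping the summary as plain data means the same object can feed both the terminal output and the CI exit code.&lt;/p&gt;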




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>indiehackers</category>
      <category>showdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I built a content quality gate that stops bad articles before they publish</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 28 Jun 2026 22:17:32 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-a-content-quality-gate-that-stops-bad-articles-before-they-publish-p5c</link>
      <guid>https://dev.to/morinaga/how-i-built-a-content-quality-gate-that-stops-bad-articles-before-they-publish-p5c</guid>
      <description>&lt;p&gt;I run &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three directory sites and a content pipeline&lt;/a&gt; that generates and cross-posts articles to Dev.to, Hashnode, and Bluesky automatically. The pipeline has been running for about six weeks. Early on I found a category of failure that no amount of CI infrastructure was catching: content quality problems. Wrong tags. Cliché phrases that slipped past self-review. Articles that implied specific traffic metrics I couldn't back up. Fabricated specificity disguised as honest reporting.&lt;/p&gt;

&lt;p&gt;The solution was &lt;code&gt;scripts/audit-articles.mjs&lt;/code&gt; — a lint-style quality gate that runs on every new article before the publish step. It works the way &lt;code&gt;eslint&lt;/code&gt; works for code: structured checks, clear error/warning distinction, strict mode for pre-publish, lenient mode for a historical baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a lint gate and not a manual review
&lt;/h2&gt;

&lt;p&gt;The specific failure mode I was trying to prevent was this: automated generation leaves an article that reads fine on first pass but fails on systematic inspection. A cliché phrase at the start of a section. The tag &lt;code&gt;"seo"&lt;/code&gt; slipping in when the pool explicitly forbids it. A word count of 580 when the spec requires 600-900 for a lightweight article. These aren't hard to catch — they're tedious to catch every single time, and tedium is where manual review degrades.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/bluesky-pre-post-qc-gate-four-gates"&gt;pre-post Bluesky QC gate&lt;/a&gt; I built earlier applies the same principle to social posts: systematic gates reliably catch what self-review misses. For articles, the gate runs before the publish workflow can touch the file.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/jsonld-audit-post-deploy-ci"&gt;JSON-LD audit script&lt;/a&gt; runs a similar check post-deploy against live pages. &lt;code&gt;audit-articles.mjs&lt;/code&gt; runs pre-commit, against local markdown. Catching a problem before it ships is always cheaper than chasing it down after.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the gate checks
&lt;/h2&gt;

&lt;p&gt;The script runs about 12 distinct checks per file, split into errors (fail hard in strict mode) and warnings (report but continue by default).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontmatter structure.&lt;/strong&gt; Four required keys must be present: &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, &lt;code&gt;publish_to&lt;/code&gt;. Missing any of them is an error. &lt;code&gt;publish_to&lt;/code&gt; must contain only known targets. &lt;code&gt;tags&lt;/code&gt; must be an array of exactly 4 items, each drawn from an explicit pool of 18 allowed values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TAG_POOL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;astro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;webdev&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;showdev&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;typescript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;javascript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;indiehackers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;productivity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;opensource&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;programming&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tutorial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;machinelearning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vercel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;turso&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tailwindcss&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;githubactions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any tag outside the pool is an error, not a warning. The pool exists to prevent both &lt;code&gt;"seo"&lt;/code&gt; (explicitly prohibited in the spec) and organic drift toward one-off tags that don't build topic coherence. The &lt;a href="https://dev.to/articles/astro-content-collections-editorial-layer-programmatic"&gt;editorial layer over programmatic content&lt;/a&gt; has a similar constraint: structure imposed at ingestion time is easier to maintain than structure enforced by convention.&lt;/p&gt;
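
&lt;p&gt;The tag rule can be sketched roughly as follows. &lt;code&gt;checkTags&lt;/code&gt; and &lt;code&gt;EXAMPLE_POOL&lt;/code&gt; are hypothetical names, and the pool here is a shortened stand-in for the 18-value set above:&lt;/p&gt;

```javascript
// Shortened stand-in for the real 18-value pool (assumption for the example).
const EXAMPLE_POOL = new Set(["ai", "astro", "webdev", "showdev", "javascript"]);

// Returns a list of error strings; an empty list means the tags pass.
function checkTags(tags, pool = EXAMPLE_POOL) {
  if (!Array.isArray(tags)) {
    return ["tags must be an array"];
  }
  const errors = [];
  if (tags.length !== 4) {
    errors.push(`expected exactly 4 tags, got ${tags.length}`);
  }
  for (const tag of tags) {
    if (!pool.has(tag)) errors.push(`tag "${tag}" is not in the allowed pool`);
  }
  return errors;
}

console.log(checkTags(["ai", "astro", "webdev", "seo"]));
console.log(checkTags(["ai", "astro", "webdev", "showdev"]));
```

&lt;p&gt;Returning error strings instead of throwing lets one pass over a file report every problem at once.&lt;/p&gt;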

&lt;p&gt;&lt;strong&gt;Title and description length.&lt;/strong&gt; Titles over 90 characters are an error because Dev.to and Hashnode truncate feed titles beyond that. Descriptions over 200 characters are also an error: that is the meta description budget for Google and Hashnode display.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word count.&lt;/strong&gt; This is a warning rather than a hard error because the threshold varies by article archetype. A lightweight at 580 words isn't technically failing a hard constraint, but the flag forces a decision: add a section, or accept the shorter count and justify it. The word count always appears in the summary line regardless, so there's no hiding from it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cliché detection.&lt;/strong&gt; 14 literal phrases are checked case-insensitively against the full body. The list covers diving metaphors, fast-paced-world openers, hyperbolic tech superlatives, and in-this-article throat-clearing — the standard set of AI-assisted writing tics. I won't reproduce all 14 here for an ironic reason I'll cover in "What I'd do differently."&lt;/p&gt;

&lt;p&gt;The check is a simple &lt;code&gt;String.prototype.includes&lt;/code&gt; call, not a regex, because the phrases are specific enough that false positives aren't a real concern. An article using the typical "world of AI" framing is probably using clichéd structure; the check forcing me to see it is the point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fabricated metric detection.&lt;/strong&gt; Three regex patterns catch the most common forms of fabricated social proof:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FABRICATED_METRIC_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(\d{2,3}&lt;/span&gt;&lt;span class="sr"&gt;,&lt;/span&gt;&lt;span class="se"&gt;?\d{3,}&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;\d{4,})\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;visitors&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|visits&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|views&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|readers&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|users&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|subscribers&lt;/span&gt;&lt;span class="se"&gt;?)\b&lt;/span&gt;&lt;span class="sr"&gt;/gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;ranked #1&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;/gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;million&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;s&lt;/span&gt;&lt;span class="se"&gt;)?\s&lt;/span&gt;&lt;span class="sr"&gt;+of&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;users&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|readers&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;|developers&lt;/span&gt;&lt;span class="se"&gt;?)&lt;/span&gt;&lt;span class="sr"&gt;/gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first pattern catches numbers of four or more digits followed directly by a social-proof noun such as "readers", "users", or "subscribers". These are numbers presented as fact when the sites are young enough that no credible claim to that kind of traffic exists. Because the noun has to come right after the number, the pattern catches claimed traffic figures but skips most numeric references in code or data discussion.&lt;/p&gt;

&lt;p&gt;This check produces an error, not a warning, because fabricated metrics are the one category I genuinely can't allow. The whole premise of the &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three sites experiment&lt;/a&gt; depends on readers trusting that I'm reporting honestly. One fabricated stat poisons that trust retroactively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal and external link presence.&lt;/strong&gt; For pillar-sized articles (over 1200 words), the check warns if internal links number fewer than 5 — the spec requires 5-10 for pillar archetype. For any article over 600 words, it warns if there are zero external links. The check uses a simple regex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;internalLinks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\]\(\/[^&lt;/span&gt;&lt;span class="sr"&gt;)&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\)&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;internalLinks&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`pillar-sized (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;wc&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; words) has &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;internalLinks&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; internal links — spec requires 5-10`&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://dev.to/articles/noindex-gate-programmatic-pages-without-404s"&gt;noindex gate for programmatic pages&lt;/a&gt; uses a similar "catch before ship" logic at the infrastructure level. The article gate applies it at the content level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentence repetition.&lt;/strong&gt; Any sentence appearing three or more times in the same article generates a warning. This catches a specific AI-assisted writing failure: a paragraph occasionally gets reformulated twice, and the reformulation ends up identical to the original a few hundred words later. The check normalizes to lowercase and trims whitespace before comparing.&lt;/p&gt;
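
&lt;p&gt;A rough sketch of that repetition check. The function name and the naive sentence splitter are assumptions; the real script may split sentences differently:&lt;/p&gt;

```javascript
// Returns normalized sentences that appear at least `threshold` times.
function findRepeatedSentences(body, threshold = 3) {
  const counts = new Map();
  // Naive split on ., ! or ? followed by whitespace (an assumption for this sketch).
  for (const raw of body.split(/[.!?]\s+/)) {
    // Normalize: trim, lowercase, and drop trailing punctuation left on the last sentence.
    const sentence = raw.trim().toLowerCase().replace(/[.!?]+$/, "");
    if (sentence.length === 0) continue;
    counts.set(sentence, (counts.get(sentence) ?? 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n >= threshold)
    .map(([sentence]) => sentence);
}

const sampleBody =
  "The gate runs first. Something else happens. The gate runs first. " +
  "More text here. The gate runs first.";
console.log(findRepeatedSentences(sampleBody));
```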

&lt;h2&gt;
  
  
  Strict versus lenient mode
&lt;/h2&gt;

&lt;p&gt;The gate has two distinct behaviors depending on how it's called. The &lt;a href="https://nodejs.org/api/process.html#processexitcode" rel="noopener noreferrer"&gt;Node.js &lt;code&gt;process.exitCode&lt;/code&gt; documentation&lt;/a&gt; covers the relevant primitive: returning &lt;code&gt;1&lt;/code&gt; from &lt;code&gt;main()&lt;/code&gt; and using it as the exit code is enough to block any downstream step that checks exit codes.&lt;/p&gt;

&lt;p&gt;When run against a single file (&lt;code&gt;node scripts/audit-articles.mjs path/to/article.md&lt;/code&gt;) or with &lt;code&gt;--strict&lt;/code&gt;, errors cause a non-zero exit code and the publish step stops.&lt;/p&gt;

&lt;p&gt;When run against all articles without explicit paths (&lt;code&gt;node scripts/audit-articles.mjs&lt;/code&gt;), errors are reported but don't fail the process. This baseline scan mode exists because older articles pre-date some current rules, and a retroactive constraint would block everything. The &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;three-tier content quality ladder&lt;/a&gt; applies here: different content tiers have different quality expectations.&lt;/p&gt;

&lt;p&gt;The article generation routine runs strict mode on each newly staged file before committing. The publish workflow runs against the specific file being published. The all-articles scan runs periodically to report on historical drift without failing anything.&lt;/p&gt;
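
&lt;p&gt;The mode selection described above can be sketched like this. &lt;code&gt;resolveExitCode&lt;/code&gt; is a hypothetical helper, and treating every non-flag argument as a file path is an assumption:&lt;/p&gt;

```javascript
// argv: arguments after the script name; errorCount: errors found across audited files.
function resolveExitCode(argv, errorCount) {
  // Assumption: anything that does not start with "--" is an explicit file path.
  const explicitPaths = argv.filter((arg) => !arg.startsWith("--"));
  const strict = argv.includes("--strict") || explicitPaths.length > 0;
  if (!strict) return 0; // baseline scan: report errors, never fail
  return errorCount > 0 ? 1 : 0;
}

console.log(resolveExitCode([], 3));
console.log(resolveExitCode(["path/to/article.md"], 1));
console.log(resolveExitCode(["--strict"], 0));
```

&lt;p&gt;In a script, the returned value would typically be assigned to &lt;code&gt;process.exitCode&lt;/code&gt; at the end of the run.&lt;/p&gt;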

&lt;h2&gt;
  
  
  Title duplicate detection across the repo
&lt;/h2&gt;

&lt;p&gt;When scanning all articles (not single-file mode), the gate runs a cross-article deduplication check. Titles are normalized to lowercase alphanumeric, whitespace collapsed, then compared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectTitleDuplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reports&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;reports&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;a-z0-9&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(([,&lt;/span&gt; &lt;span class="nx"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;paths&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the comparison is on exact normalized keys, this catches titles that differ only in capitalization, punctuation, or spacing, which are easy to miss by eye. True rephrasings are a different matter: "Why I use Turso for my Astro monorepo" and "Using Turso libSQL in an Astro monorepo" normalize to different keys, so they would only surface with a looser comparison such as shared-word overlap. The check doesn't do semantic similarity, and for now it doesn't need to. Near-identical titles are enough of a signal to warrant a second look, given how the &lt;a href="https://dev.to/articles/pipeline-aware-content-variants-astro-directory"&gt;pipeline-aware content variants approach&lt;/a&gt; works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually got caught
&lt;/h2&gt;

&lt;p&gt;Since adding the gate, here are the real categories it's flagged that I fixed before publishing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tag outside pool&lt;/strong&gt;: twice. Both times were &lt;code&gt;"seo"&lt;/code&gt; appearing in generated frontmatter. Reliable pattern: when generating frontmatter from a prompt, the model includes obviously relevant tags that are explicitly off-limits in the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word count too low&lt;/strong&gt;: three lightweight articles came in at 540-580 words. Two got an additional section and reached spec. One I accepted shorter — the content was complete and adding words would have been padding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fabricated metric match&lt;/strong&gt;: once. The generated draft said "early testers report" something in a way that parsed as a quantified claim. Fix was removing the number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing external link&lt;/strong&gt;: regularly. Generated articles sometimes make claims about tools or APIs without citing the primary source. About half the time I add a link on review; the other half I rephrase to not make a claim that requires one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;The cliché list is static. A smarter approach: add phrases from real failures after the fact. When a specific phrase slips through and I notice it post-publish, add it. The list has grown from 8 to 14 entries through manual additions, but there's no automated feedback loop. One possible improvement: run the gate on published articles periodically and flag new patterns.&lt;/p&gt;

&lt;p&gt;The biggest gap: the checker doesn't strip code blocks before running cliché or metric checks. This article itself demonstrates the problem — I had to describe the cliché list in prose rather than including it as a JavaScript literal, because reproducing the exact phrases inside a code block triggers the gate. The fix is straightforward: split the body on triple-backtick fences, check only the prose regions. A two-pass preprocessing step before any pattern check would resolve this entirely. I haven't shipped it yet because the current behavior just means I write around it, which isn't painful enough to force the fix.&lt;/p&gt;
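
&lt;p&gt;A minimal sketch of that preprocessing step, not the shipped checker: it assumes fences are balanced and never nested, and the helper name &lt;code&gt;proseOnly&lt;/code&gt; and the sample pattern are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: returns only the prose regions of a Markdown body.
// Splitting on the fence marker yields alternating prose / code chunks,
// so the even-indexed chunks are prose (assuming fences are balanced).
const FENCE = "`".repeat(3);

function proseOnly(body) {
  return body
    .split(FENCE)
    .filter((_, i) => i % 2 === 0)
    .join("\n");
}

// Hypothetical stand-in for one entry on the cliché list.
const CLICHE = /game[- ]changer/i;

const body = [
  "Intro paragraph.",
  FENCE + "js",
  'const banned = ["game-changer"];',
  FENCE,
  "Closing paragraph.",
].join("\n");

console.log(CLICHE.test(body)); // the phrase sits inside the code block
console.log(CLICHE.test(proseOnly(body))); // prose regions only
```

&lt;p&gt;The raw body matches because the phrase sits inside the fenced block; the prose-only view does not. Inline single-backtick code is left in place, so it would still be checked.&lt;/p&gt;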

&lt;p&gt;CI integration is partial. The gate runs as part of the article generation routine and as a verify step, but it doesn't run as a blocking step in the publish workflow itself. That means a file committed outside the routine could bypass the gate entirely. The fix is one line in &lt;code&gt;publish-articles.yml&lt;/code&gt; — a step I haven't prioritized because the routine path is currently the only real publishing path.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/eeat-transparency-pages-programmatic-directory"&gt;EEAT transparency pages&lt;/a&gt; I built approach content credibility from the site structure side. The lint gate approaches it from the individual article side. Both serve the same goal: content that doesn't embarrass the project in hindsight.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why not use an existing prose linter like alex or write-good?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;alex&lt;/code&gt; focuses on inclusive language; &lt;code&gt;write-good&lt;/code&gt; checks passive voice and weak qualifiers. Neither checks frontmatter structure, tag pools, or fabricated metrics — the pipeline-specific failures I actually need to catch. A domain-specific gate catches domain-specific failures better than a general-purpose tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the fabricated metric check handle code blocks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Currently it doesn't exclude them, which is the known gap. Numeric references in code examples occasionally trigger false positives. I review the flagged line location before treating it as a real failure. Pre-processing the body to strip triple-backtick blocks would fix this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why warn on internal link count instead of hard-erroring?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because article type isn't always deterministic from content alone. A 1200-word article might be a pillar or a borderline lightweight. Treating it as a warning preserves the ability to make a judgment call; treating it as a hard error would block edge cases that are genuinely fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens to historical articles that fail current checks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Baseline scan mode reports them without failing. Historical content is in a different tier. New articles must pass strict mode; historical articles are known issues reported for visibility, not blocked.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>Two-host AI dialogue specs: how I structure YouTube longform scripts with A/B speaker JSON</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 28 Jun 2026 08:39:02 +0000</pubDate>
      <link>https://dev.to/morinaga/two-host-ai-dialogue-specs-how-i-structure-youtube-longform-scripts-with-ab-speaker-json-451c</link>
      <guid>https://dev.to/morinaga/two-host-ai-dialogue-specs-how-i-structure-youtube-longform-scripts-with-ab-speaker-json-451c</guid>
      <description>&lt;p&gt;The video pipeline I've been building for &lt;a href="https://dev.to/articles/single-ci-pipeline-two-youtube-channels-three-seo-sites"&gt;two YouTube channels running off this monorepo&lt;/a&gt; started with short-form vertical clips — a single narrator, a single slide, done. Longform is different. A ten-minute explainer with one voice and no conversational variation is hard to watch even when the content is good. I wanted something that felt like two people talking through a problem, not a text-to-speech audiobook.&lt;/p&gt;

&lt;p&gt;The solution was a two-host dialogue spec: a JSON file where each line of audio is tagged with a speaker (A or B), and the build script renders it as a full-length video alternating between two neural voices.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the spec looks like
&lt;/h2&gt;

&lt;p&gt;The simplest possible spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How Turso libSQL compares to Cloudflare D1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cloudflare"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"turso"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"privacy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"segments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speaker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Turso and D1 look similar from the outside — both are SQLite-compatible edge databases."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"slide"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"heading"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Turso vs D1"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speaker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"B"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Right. But they differ significantly on branching, replication topology, and cost model once you scale past the free tier."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speaker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Let's go through each. First, branching."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"slide"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"section"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"heading"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Branching"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to notice. The &lt;code&gt;slide&lt;/code&gt; field is optional — when omitted, the build script holds the previous slide while the audio plays. This means you don't need a new visual for every sentence, which would be exhausting to maintain and would produce a choppy video. A new slide appears only when there's something worth showing.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;speaker&lt;/code&gt; field maps to a voice. In &lt;code&gt;build_longform.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;VOICE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LF_VOICE_A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US-GuyNeural&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LF_VOICE_B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US-AvaNeural&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are neural voices from &lt;a href="https://github.com/rany2/edge-tts" rel="noopener noreferrer"&gt;edge-tts&lt;/a&gt;, a Python wrapper around Microsoft Edge's text-to-speech service that gives access to the same neural voices as the browser without requiring an Azure subscription. The A/B assignment came from testing: one lower, more measured voice for exposition, and one that sounds more conversational for follow-up and counterpoint. You can override both with environment variables, which matters if the default voices aren't available in a given edge-tts catalog version.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the build works
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;build_longform.py&lt;/code&gt; processes the spec linearly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each segment with a &lt;code&gt;slide&lt;/code&gt;, render the slide to PNG via &lt;code&gt;slides.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Synthesize the segment's &lt;code&gt;text&lt;/code&gt; with &lt;code&gt;edge-tts&lt;/code&gt; for the assigned speaker voice, writing to an mp3&lt;/li&gt;
&lt;li&gt;Build a silent video clip from the PNG, then mux it with the audio&lt;/li&gt;
&lt;li&gt;After all segments: concatenate all clips with ffmpeg&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a single &lt;code&gt;output.mp4&lt;/code&gt; where each visual change happens exactly when a new slide is specified in the spec — usually at section transitions, not on every sentence.&lt;/p&gt;

&lt;p&gt;If a segment has no &lt;code&gt;slide&lt;/code&gt; key, the previous slide's PNG is reused. The timing automatically matches the audio duration because each clip is built from its own audio file. No manual timestamp editing.&lt;/p&gt;
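
&lt;p&gt;The carry-forward rule can be sketched in a few lines. This is JavaScript for illustration only (the real build is the Python script named above), and &lt;code&gt;resolveSlides&lt;/code&gt; is a hypothetical name; it also assumes a leading segment without a slide simply gets &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

```javascript
// Illustrative only: the real build is build_longform.py (Python).
// Resolves which slide each segment shows, reusing the previous slide
// when a segment has no "slide" key.
function resolveSlides(segments) {
  let current = null;
  return segments.map((segment) => {
    if (segment.slide) current = segment.slide;
    return { speaker: segment.speaker, text: segment.text, slide: current };
  });
}

const segments = [
  { speaker: "A", text: "Turso and D1 look similar.", slide: { kind: "title", heading: "Turso vs D1" } },
  { speaker: "B", text: "They differ on branching." },
  { speaker: "A", text: "First, branching.", slide: { kind: "section", heading: "Branching" } },
];

const resolved = resolveSlides(segments);
for (const s of resolved) {
  console.log(s.speaker, s.slide ? s.slide.heading : "(no slide yet)");
}
```

&lt;p&gt;Speaker B's segment has no slide of its own, so it inherits the title card from the segment before it.&lt;/p&gt;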

&lt;p&gt;The CI step that runs this writes the output path to an environment file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;YT_OUTPUT_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/longform/output.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downstream YouTube publish step reads that variable and uploads. Same pattern as the &lt;a href="https://dev.to/articles/two-host-video-pipeline-edge-tts-pillow-ffmpeg"&gt;short-form video pipeline I wrote about&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the spec generator produces
&lt;/h2&gt;

&lt;p&gt;The specs aren't written by hand. A Claude call takes a topic and an outline and produces the full segment list, deciding where slides should appear, which speaker handles which part of the argument, and what heading text goes on each slide.&lt;/p&gt;

&lt;p&gt;The prompt instructs the model to split responsibilities clearly: speaker A leads and introduces, speaker B challenges, adds nuance, or extends with examples. This produces a conversational dynamic that's more engaging than a single narrator even though neither voice is a real person.&lt;/p&gt;

&lt;p&gt;One thing that took adjustment: Claude tends to generate very even A/B splits — roughly alternating every sentence. Real dialogue isn't that regular. I added an instruction to vary the run lengths: sometimes A speaks three sentences before B responds, sometimes B only adds a single sentence. That small change makes the output feel less mechanical.&lt;/p&gt;
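
&lt;p&gt;One way to check whether a generated spec still alternates too evenly is to measure speaker run lengths. A rough sketch, assuming one segment roughly equals one sentence; &lt;code&gt;runLengths&lt;/code&gt; and &lt;code&gt;isStrictAlternation&lt;/code&gt; are hypothetical helpers, not part of the pipeline.&lt;/p&gt;

```javascript
// Hypothetical check: collapse the segment list into runs of consecutive
// segments by the same speaker, e.g. A A A B A gives [3, 1, 1].
function runLengths(segments) {
  const runs = [];
  let previous = null;
  for (const { speaker } of segments) {
    if (speaker === previous) {
      runs[runs.length - 1] += 1;
    } else {
      runs.push(1);
      previous = speaker;
    }
  }
  return runs;
}

// True when every run has length 1, i.e. strict A/B/A/B alternation.
function isStrictAlternation(segments) {
  return runLengths(segments).every((n) => n === 1);
}

const even = ["A", "B", "A", "B"].map((speaker) => ({ speaker }));
const varied = ["A", "A", "A", "B", "A"].map((speaker) => ({ speaker }));

console.log(runLengths(varied));
console.log(isStrictAlternation(even), isStrictAlternation(varied));
```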

&lt;h2&gt;
  
  
  What I haven't solved yet
&lt;/h2&gt;

&lt;p&gt;The build script has a code path for PNGtuber-style character art (the &lt;code&gt;_host_assets()&lt;/code&gt; function), but it is asset-gated and currently returns &lt;code&gt;None&lt;/code&gt; because I haven't made the visual assets for either host. The code path is there for when I do; for now the video is slide-only with audio.&lt;/p&gt;

&lt;p&gt;The slide renderer (&lt;code&gt;slides.py&lt;/code&gt;) is also limited to a few layouts: title cards, section headers, comparison tables, bullet lists. Richer layouts like code blocks with syntax highlighting or real diagrams would require more work in Pillow or a headless browser, which I'm deferring. The &lt;a href="https://dev.to/articles/mermaid-matplotlib-ci-youtube-slides"&gt;Mermaid + matplotlib diagram pipeline&lt;/a&gt; I built for short-form videos doesn't cleanly transfer to longform because the timing model is different.&lt;/p&gt;

&lt;p&gt;The two-voice format is working for the content I'm producing. Whether it affects watch time versus a single-voice format — I don't have enough data yet to say anything reliable. I'll publish numbers once there are 30+ videos in the channel.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I built a pre-post QC gate that blocks Bluesky automation from self-revealing</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sun, 28 Jun 2026 08:38:57 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-a-pre-post-qc-gate-that-blocks-bluesky-automation-from-self-revealing-41ja</link>
      <guid>https://dev.to/morinaga/how-i-built-a-pre-post-qc-gate-that-blocks-bluesky-automation-from-self-revealing-41ja</guid>
      <description>&lt;p&gt;Three weeks into running a Bluesky queue from an automated content pipeline, I saw a post go out that referenced "the content pipeline" directly. Not egregiously — it was a passing phrase — but it was the kind of thing that reads differently on a social timeline than it does in a dev.to article. On dev.to, being honest about automation is a feature. On Bluesky, unprompted mentions of your own automation mechanism register as a red flag to human readers who are already primed to distrust content farms.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/bluesky-jsonl-queue-daily-posts-no-scheduler"&gt;JSONL-based queue I described earlier&lt;/a&gt; was working fine mechanically — entries generate, sit in the queue, and flush one at a time via a cron job. But there was no filter between the generation step and the post step. Whatever the prompt produced went into the queue, and whatever was at the front of the queue got posted. The &lt;a href="https://docs.bsky.app/docs/tutorials/creating-a-post" rel="noopener noreferrer"&gt;Bluesky AT Protocol post API&lt;/a&gt; has no server-side content filter beyond spam detection, so the responsibility sits entirely with the client.&lt;/p&gt;

&lt;p&gt;I spent a Saturday building &lt;code&gt;bluesky-qc.mjs&lt;/code&gt;. It's a gate script that now runs as the first step in the posting workflow. Here's how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: a gate between queue and post
&lt;/h2&gt;

&lt;p&gt;Before &lt;code&gt;bluesky-qc.mjs&lt;/code&gt;, the cron job was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bluesky-post-queue.mjs → Bluesky API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bluesky-qc.mjs → (PASS) bluesky-post-queue.mjs → Bluesky API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both scripts read from the same &lt;code&gt;content/bluesky-queue.jsonl&lt;/code&gt; file. The QC script walks entries in order, applies four gates to each, and either clears the first clean entry for the post script or moves failing entries to a rejection log. The post script then finds the first unposted entry in the queue — which, if QC just ran, should be a clean one.&lt;/p&gt;

&lt;p&gt;In GitHub Actions this runs as &lt;code&gt;pnpm bluesky:qc-then-post&lt;/code&gt;, a single composite command. If QC rejects everything in the queue and there's nothing clean to pass through, the workflow exits 0 without posting. Skipping a day is fine. Posting something that reads as automation-reveal isn't.&lt;/p&gt;
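
&lt;p&gt;The control flow described above, as a hedged sketch rather than the actual &lt;code&gt;bluesky-qc.mjs&lt;/code&gt;: the &lt;code&gt;posted&lt;/code&gt; flag, the gate signature, and the return shape are all assumptions.&lt;/p&gt;

```javascript
// Minimal sketch of the queue walk. The gate functions are hypothetical
// stand-ins; each returns an array of rejection reasons (empty = pass).
function runQc(entries, gates) {
  const rejected = [];
  for (const entry of entries) {
    if (entry.posted) continue; // already published, skip
    const reasons = gates.flatMap((gate) => gate(entry));
    if (reasons.length === 0) {
      return { cleared: entry, rejected };
    }
    rejected.push({ entry, reasons });
  }
  return { cleared: null, rejected }; // nothing clean: skip today's post
}

// One toy gate, standing in for the real four.
const gates = [
  (e) => (/\bcron\b/i.test(e.text) ? ["G1: reveal vocabulary"] : []),
];

const queue = [
  { text: "I run this as a cron job" },
  { text: "SQLite at the edge is underrated" },
];

const result = runQc(queue, gates);
console.log(result.cleared.text);
console.log(result.rejected.length);
```

&lt;p&gt;When &lt;code&gt;cleared&lt;/code&gt; comes back as &lt;code&gt;null&lt;/code&gt;, the caller can exit 0 without posting.&lt;/p&gt;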

&lt;p&gt;This connects to the same philosophy behind the &lt;a href="https://dev.to/articles/bluesky-image-upload-cloudflare-pages-race-fix"&gt;Cloudflare Pages race fix&lt;/a&gt; I wrote about earlier — building explicit serialization into a pipeline that otherwise looks like it can run all steps in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gate 1: vocabulary rejection
&lt;/h2&gt;

&lt;p&gt;G1 is a compiled case-insensitive regex against a list of phrases that signal automated origin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Phrases that signal automated origin, joined into one case-insensitive regex.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;REVEAL_TERMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"programmatic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"content[\\s-]pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"AI-curated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"AI-generated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"AI authorship"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"my sites"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"three (independent )?sites"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"pages generated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"page count"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"records_in"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"llm_calls"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"build_duration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"directory site"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"scaled content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"batch test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"120 pages"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"\\bcron\\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"autonomous agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"unsupervised PR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"curat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"generate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"disclos"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"thin pages"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"thin-content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"\\bprose\\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"HCU"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"index vs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"get dropped"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"pruning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"bounce rate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"citation rate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"refresh signals"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"content experiments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;REVEAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;REVEAL_TERMS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"|"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"i"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is aggressive by design. "curat" catches "curated" and "curation", both common in automated content contexts. "generate" and "generation" between them catch any mention of generating content. Posts occasionally trip G1 on phrases I'd consider acceptable, and I intentionally keep the regex strict. If a post needs to reference generation or curation to make its point, Bluesky probably isn't the right channel for that thought; dev.to is.&lt;/p&gt;

&lt;p&gt;The pattern grew over time. I started with about 15 terms and added ~15 more after specific entries cleared an early version and still felt off when I reviewed the rejection log. The final regex is the output of four weeks of manually auditing what the gate should have caught.&lt;/p&gt;

&lt;p&gt;One thing I added late: &lt;code&gt;\bcron\b&lt;/code&gt;. Even the word "cron" — in a sentence like "I run this as a cron job" — is a signaling phrase for automation in a social context where the default assumption is that people are posting manually.&lt;/p&gt;
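
&lt;p&gt;The difference between a bare stem and a word-bounded term is easy to see on a few sample strings. The two patterns below are taken from the list; the test strings are invented for illustration.&lt;/p&gt;

```javascript
// A bare stem matches anywhere inside a word; \b limits the match
// to a whole word. Sample sentences are made up.
const STEM = /curat/i;
const BOUNDED = /\bcron\b/i;

const samples = [
  "A curated list of tools",
  "Notes on curation",
  "The numbers are accurate",
  "I run this as a cron job",
  "A micron-scale sensor",
];

for (const text of samples) {
  console.log(text, "| stem:", STEM.test(text), "| bounded:", BOUNDED.test(text));
}
```

&lt;p&gt;Unbounded stems also match inside unrelated words: &lt;code&gt;curat&lt;/code&gt; hits "accurate", which may account for some of the acceptable phrases that trip G1. The bounded &lt;code&gt;cron&lt;/code&gt; term does not fire on "micron".&lt;/p&gt;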

&lt;h2&gt;
  
  
  Gate 2: freshness, two parts
&lt;/h2&gt;

&lt;p&gt;Staleness shows up in two distinct forms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale phrasing&lt;/strong&gt;: the entry uses time-relative language that was accurate when generated but isn't accurate when it finally posts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STALE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;today&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;|this week|yesterday|this morning|just&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;announced|released|landed|launched|dropped&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An entry might say "just dropped" about something that was new when the prompt ran but is three days old by post time. G2a catches this before it reaches the timeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale timestamp&lt;/strong&gt;: the entry was created more than &lt;code&gt;TTL_DAYS = 14&lt;/code&gt; days ago. An entry can sit that long if it was generated a while back and newer entries jumped ahead of it in the queue. Three timestamp field names are checked (&lt;code&gt;created_at&lt;/code&gt;, &lt;code&gt;generated_at&lt;/code&gt;, and &lt;code&gt;createdAt&lt;/code&gt;) because the generation scripts in this repo aren't consistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;generated_at&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ageDays&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;86400000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ageDays&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;TTL_DAYS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`G2: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ageDays&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; days old (TTL &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;TTL_DAYS&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 14-day TTL is based on my queue depth and posting rate. At one post per day, anything more than 14 entries deep in the queue was probably generated in a context that no longer applies — the tool being referenced may have had an update, the framing may feel dated.&lt;/p&gt;

&lt;p&gt;Both sub-checks write separate rejection reasons so the log makes clear which flavor of staleness triggered.&lt;/p&gt;
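
&lt;p&gt;Putting the two sub-checks together might look like this. A sketch only: the &lt;code&gt;G2a&lt;/code&gt;/&lt;code&gt;G2b&lt;/code&gt; reason labels, the function name, and the injectable &lt;code&gt;now&lt;/code&gt; parameter are assumptions, while the regex and TTL are copied from above.&lt;/p&gt;

```javascript
const STALE =
  /\btoday\b|this week|yesterday|this morning|just\s+(announced|released|landed|launched|dropped)/i;
const TTL_DAYS = 14;
const MS_PER_DAY = 86400000;

// Returns one reason per sub-check so the log can tell them apart.
// The "G2a"/"G2b" labels and the `now` parameter are assumptions.
function checkFreshness(entry, now = Date.now()) {
  const reasons = [];

  const phrase = entry.text.match(STALE);
  if (phrase) reasons.push(`G2a: stale phrasing "${phrase[0]}"`);

  const ts = entry.created_at || entry.generated_at || entry.createdAt;
  const ageDays = ts ? (now - Date.parse(ts)) / MS_PER_DAY : 0;
  if (ageDays > TTL_DAYS) {
    reasons.push(`G2b: ${Math.floor(ageDays)} days old (TTL ${TTL_DAYS})`);
  }
  return reasons;
}

// Fixed clock so the example is reproducible.
const now = Date.parse("2026-06-28T00:00:00Z");
const stale = checkFreshness(
  { text: "This just dropped", created_at: "2026-06-01T00:00:00Z" },
  now
);
console.log(stale);
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; explicitly keeps the age check testable without depending on the wall clock.&lt;/p&gt;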

&lt;h2&gt;
  
  
  Gate 3: engagement prediction (warn-only in v1)
&lt;/h2&gt;

&lt;p&gt;Gate 3 uses &lt;code&gt;data/bluesky-engagement-profile.json&lt;/code&gt;, generated weekly by &lt;code&gt;bluesky-engagement-stats.mjs&lt;/code&gt;. That script pulls my last 300 posts from the Bluesky API, calculates a score per post as &lt;code&gt;likes + 2×reposts + replies&lt;/code&gt; (reposts weighted higher as a stronger signal), and builds a breakdown by hashtag.&lt;/p&gt;
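
&lt;p&gt;A hedged sketch of how such a profile could be computed from already-fetched posts. The post shape and the function names are assumptions, and fetching from the Bluesky API is left out; only the score formula comes from the text.&lt;/p&gt;

```javascript
// Score formula from the text: likes + 2×reposts + replies.
function score(post) {
  return post.likes + 2 * post.reposts + post.replies;
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Groups scores by lowercase hashtag and reports the median per tag.
// The post shape ({ text, likes, reposts, replies }) is an assumption.
function byHashtag(posts) {
  const buckets = new Map();
  for (const post of posts) {
    const tags = post.text.match(/#[A-Za-z][A-Za-z0-9_]+/g) || [];
    for (const tag of tags) {
      const key = tag.toLowerCase();
      if (!buckets.has(key)) buckets.set(key, []);
      buckets.get(key).push(score(post));
    }
  }
  const profile = {};
  for (const [tag, scores] of buckets) {
    profile[tag] = { median: median(scores), count: scores.length };
  }
  return profile;
}

const posts = [
  { text: "Turso tips #SQLite", likes: 3, reposts: 1, replies: 0 },
  { text: "D1 notes #sqlite #webdev", likes: 1, reposts: 0, replies: 2 },
  { text: "Astro build #webdev", likes: 4, reposts: 2, replies: 1 },
];

const profile = byHashtag(posts);
console.log(profile);
```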

&lt;p&gt;At post time, G3 looks at which hashtags appear in the pending entry and computes a predicted score relative to the baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;by_hashtag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/#&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z&lt;/span&gt;&lt;span class="se"&gt;][&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9_&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/g&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tags&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;by_hashtag&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;median&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;overall&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;median&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;found&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;predicted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;found&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;g3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;predicted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;found&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;G3 is warn-only in v1. It logs the predicted vs. baseline score but doesn't reject. The reason: I don't have enough post history yet for the signal to be reliable. The &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;shared Claude Haiku client&lt;/a&gt; I use for content generation has been running for about two months, and post volume is roughly one per day — that's around 60 data points. Median engagement is still low enough that G3's predictions are noisy.&lt;/p&gt;

&lt;p&gt;When I flip G3 to hard-fail, I expect it to catch G1/G2 survivors that technically pass the text checks but target hashtags that don't perform for this account. A comment in the code marks where the threshold check will go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// v1: warn-only. Planned promotion to hard-fail once enough data has accumulated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The engagement stats script also breaks down by time window (30d, 60d, 90d), so the profile should become more useful as data accumulates over the next few months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gate 4: reserved for Codex
&lt;/h2&gt;

&lt;p&gt;The fourth gate is unimplemented. The design intent is a &lt;code&gt;--codex&lt;/code&gt; flag that calls Codex for a final quality pass on anything that cleared G1-G3. In the current setup, Codex runs at generation time through the article pipeline (the &lt;a href="https://dev.to/articles/five-things-noticed-week-ci-cost-bluesky-qc-cc0"&gt;three-layer Codex protocol&lt;/a&gt; handles article quality), but for Bluesky posts the generation step doesn't include a Codex pass. G4 would close that gap.&lt;/p&gt;

&lt;p&gt;I'm deferring G4 until G3 is stable, because adding latency and API cost to a cron that already runs three times a day doesn't make sense while the data foundations aren't solid yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happens to rejected entries
&lt;/h2&gt;

&lt;p&gt;Every failing entry gets appended to &lt;code&gt;data/bluesky-qc-rejected.jsonl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;appendFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;REJECTED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;qc_reasons&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;qc_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rejected log serves two purposes: nothing is lost (entries can be salvaged by editing and re-adding to the queue), and it's the primary feedback mechanism for tuning generation prompts upstream. If G1 keeps hitting "content pipeline" in posts that were supposed to be tool roundups, the prompt that generates those posts is leaking pipeline jargon. That's a prompt fix, not a gate fix.&lt;/p&gt;

&lt;p&gt;I review the rejected log about once a week, in the same session where I run &lt;code&gt;bluesky-engagement-stats.mjs&lt;/code&gt; to refresh the G3 profile. Together this takes about 20 minutes.&lt;/p&gt;
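
&lt;p&gt;A weekly review like this can be sped up by tallying reasons first. The sketch below is a hypothetical helper, not part of the actual scripts. It assumes &lt;code&gt;qc_reasons&lt;/code&gt; is an array of strings on each JSONL line, and it collapses numbers so that reasons such as "15 days old" and "20 days old" count as one bucket.&lt;/p&gt;

```javascript
// Illustrative sketch: assumes each JSONL line carries qc_reasons as an array of strings.
function tallyReasons(jsonlText) {
  const counts = new Map();
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // tolerate the trailing newline
    const entry = JSON.parse(line);
    for (const reason of entry.qc_reasons ?? []) {
      // Replace space-prefixed numbers so variable day counts share one key.
      const key = reason.replace(/ \d+/g, " N");
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // Most frequent reasons first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

&lt;p&gt;A reason that dominates the top of this list week after week is a candidate for a prompt fix upstream rather than a gate change.&lt;/p&gt;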

&lt;h2&gt;
  
  
  Differences from the &lt;a href="https://dev.to/articles/three-post-deploy-checks-cloudflare-pages"&gt;post-deploy check pattern&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/single-ci-pipeline-two-youtube-channels-three-seo-sites"&gt;single CI pipeline&lt;/a&gt; this sits in already runs post-deploy checks after every Cloudflare Pages build. Those checks are about production correctness — did the right pages render, do the JSON-LD blocks validate, is the sitemap current. The Bluesky QC gate is about tone and context — does the text feel authentic for a social timeline.&lt;/p&gt;

&lt;p&gt;The design principle is the same: gate as early as possible, make failures informative, never silently swallow errors. But the failure modes are different. A broken JSON-LD block is objectively wrong. A post that mentions "content pipeline" isn't wrong — it's contextually wrong for the audience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Atomic state machine.&lt;/strong&gt; The two-script design has a coordination gap: QC clears an entry and the post script runs, but if the network fails mid-post, the entry stays at the front of the queue and QC re-evaluates it on the next run. This is harmless but wasteful. A single script that locks the entry state before attempting the post, and marks it as &lt;code&gt;qc_passed&lt;/code&gt; in the JSONL, would eliminate the re-evaluation churn.&lt;/p&gt;
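
&lt;p&gt;One possible shape for that single script, as a sketch under assumptions: &lt;code&gt;processHead&lt;/code&gt;, &lt;code&gt;runQc&lt;/code&gt;, and &lt;code&gt;sendPost&lt;/code&gt; are hypothetical names, the queue is an in-memory array standing in for the JSONL file, and rejected entries are assumed to leave the queue. It caches the QC verdict on the entry rather than implementing real file locking.&lt;/p&gt;

```javascript
// Illustrative sketch: all names are hypothetical; persistence of the queue is left out.
async function processHead(queue, runQc, sendPost) {
  const entry = queue[0];
  if (!entry) return { action: "empty" };

  if (!entry.qc_passed) {
    const reasons = runQc(entry); // assumed to return an array of rejection reasons
    if (reasons.length > 0) {
      queue.shift(); // caller would append the entry to the rejected log
      return { action: "rejected", entry, reasons };
    }
    entry.qc_passed = true; // saved with the queue, so a retry skips QC
  }

  try {
    await sendPost(entry);
  } catch (err) {
    // Network failure: the entry stays at the head with qc_passed already set.
    return { action: "retry_later", entry, error: String(err) };
  }
  queue.shift();
  return { action: "posted", entry };
}
```

&lt;p&gt;On the run after a failed post, the head entry goes straight to &lt;code&gt;sendPost&lt;/code&gt; because its verdict is already recorded.&lt;/p&gt;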

&lt;p&gt;&lt;strong&gt;Classifier instead of regex.&lt;/strong&gt; The REVEAL regex is 30+ terms and growing. The right long-term design is a small logistic regression trained on the rejected log, or a few-shot Claude classifier prompted with examples from it, once there are 100+ examples. I'll have enough data in about six months at current rejection rates. Until then, a regex is zero latency, zero cost, and deterministic, and I'd be trading those properties away for accuracy I don't yet need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;G3 as a hard gate sooner.&lt;/strong&gt; The warn-only period for G3 feels too conservative in retrospect. I could have set a very permissive threshold (say, predicted score &amp;lt; 0.5 × baseline median) and still caught the clearly low-signal cases without needing extensive data. I'll revisit once I hit 90 days of post history.&lt;/p&gt;
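
&lt;p&gt;A permissive threshold of that kind could look like the sketch below. The function name &lt;code&gt;g3Verdict&lt;/code&gt; and the reason string are assumptions. The input is the &lt;code&gt;{ predicted, baseline }&lt;/code&gt; object the gate already builds, and entries with no hashtag data or no usable baseline pass by default.&lt;/p&gt;

```javascript
// Illustrative sketch: g3Verdict is a hypothetical name; ratio 0.5 is the permissive cut discussed above.
function g3Verdict(g3, ratio = 0.5) {
  // No hashtag match or no positive baseline yet: nothing to judge, so pass.
  if (!g3 || !(g3.baseline > 0)) return { pass: true, reason: null };
  const floor = ratio * g3.baseline;
  if (g3.predicted >= floor) return { pass: true, reason: null };
  return {
    pass: false,
    reason: `G3: predicted ${g3.predicted} below ${ratio} x baseline ${g3.baseline}`,
  };
}
```

&lt;p&gt;Keeping the ratio as a parameter means the cut can be tightened later without touching the gate logic.&lt;/p&gt;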

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why not filter at generation time rather than at post time?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I do — generation prompts include explicit instructions to avoid revealing automation. But prompts are probabilistic. The gate is deterministic. Both layers running in sequence is cheaper than relying on either alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the whole queue fails QC?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The script exits 0 with a log message. Nothing posts. The queue isn't cleared, so the next run tries again. If the queue stays empty for several consecutive days, the queue refill cron (&lt;code&gt;bluesky-refill-queue.mjs&lt;/code&gt;) adds fresh entries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you decide what goes into the REVEAL list?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anything I'd feel awkward saying to a human who asked "did you write this?" — tool names for automated workflows, quantitative signals from CI pipelines, phrases that have no natural place in a first-person social post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the gate ever false-positive on legitimate content?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. "generate" catches legitimate uses of the word. Posts about generators, code generation tools, or electrical generation would trip G1. I accept that loss. Posts about those topics are edge cases for this particular Bluesky presence, and if they do come up I can edit the queue entry to rephrase before re-adding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/articles/bluesky-jsonl-queue-daily-posts-no-scheduler"&gt;How the JSONL queue works without a scheduler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/articles/bluesky-image-upload-cloudflare-pages-race-fix"&gt;Fixing the Bluesky image upload race against Cloudflare deploy lag&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>What I'm watching this week: OpenClaw, Trellis.2, Gemma 4, and two more</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sat, 27 Jun 2026 22:11:27 +0000</pubDate>
      <link>https://dev.to/morinaga/what-im-watching-this-week-openclaw-trellis2-gemma-4-and-two-more-p70</link>
      <guid>https://dev.to/morinaga/what-im-watching-this-week-openclaw-trellis2-gemma-4-and-two-more-p70</guid>
      <description>&lt;p&gt;Five open-source projects I pulled up and actually thought about this week. Not a comprehensive roundup — just what caught my attention as someone running three AI-curated directory sites on a $25/month budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw: the most-starred repo in GitHub history
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;openclaw/openclaw&lt;/a&gt; just crossed 373,000 stars, making it the most-starred software project in GitHub history, overtaking React. It's a self-hosted AI assistant that routes to any model — local or API — and connects to over 50 messaging platforms: WhatsApp, Signal, iMessage, Slack, Discord, Telegram, and more, all from a single config file. Created by Peter Steinberger, it has 1,200+ contributors.&lt;/p&gt;

&lt;p&gt;From my directory site angle, OpenClaw is exactly the kind of project that drives "Open Alternative To" traffic. People search for "self-hosted alternatives to [Claude API / ChatGPT]" and what they actually want is something like this — a local-first agent they control. The 300K+ star count without a polished marketing page tells me the latent demand is real. I'm going to add it to the OSS directory and watch whether the page indexes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Mythos Preview at 93.9% SWE-bench Verified
&lt;/h2&gt;

&lt;p&gt;Anthropic published Claude Mythos Preview this week, which now leads the &lt;a href="https://benchlm.ai/benchmarks/sweVerified" rel="noopener noreferrer"&gt;SWE-bench Verified leaderboard&lt;/a&gt; at 93.9%. For reference: Claude Opus 4.8 is at 88.6%, Claude Opus 4.7 Adaptive at 87.6%, and six months ago 80% was the ceiling. The trajectory is steep.&lt;/p&gt;

&lt;p&gt;I'm running Claude Haiku 4.5 for my ETL and content generation — cost and latency matter more to me than ceiling performance. But I'm watching this number for a different reason: as coding agents approach 95%, the interesting question shifts from "can it write correct code?" to "can it understand architecture trade-offs without prompting?" SWE-bench doesn't measure that, and I don't know when that bar will move.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4: Apache 2.0, agentic, multimodal, four sizes
&lt;/h2&gt;

&lt;p&gt;Google released Gemma 4 in early April and made further announcements at Google I/O in May. The family: Effective 2B, Effective 4B, 26B MoE, and 31B Dense. All under Apache 2.0. The 31B model ranks #3 on the Arena AI open-model leaderboard; the 26B is #6.&lt;/p&gt;

&lt;p&gt;What makes it relevant for my stack: native function-calling and structured JSON output are built in from day one, not retrofitted. The E2B model is designed for edge inference, which raises the question of whether it could replace Claude Haiku for my nightly ETL without a cost increase. The honest answer is I don't know — it depends on inference latency under load, and I haven't run that benchmark. I'll do it once I have a stable GPU budget to test against.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLaDA2.0-Uni: discrete diffusion at 100B parameters
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/inclusionAI/LLaDA2.0-Uni" rel="noopener noreferrer"&gt;InclusionAI/LLaDA2.0-Uni&lt;/a&gt; (Ant Group) is a unified multimodal model — text generation, image understanding, and image generation — built on discrete diffusion instead of autoregressive decoding. The 100B flash variant uses MoE, making it the largest discrete diffusion model published so far.&lt;/p&gt;

&lt;p&gt;The architecture is genuinely different from everything else: during training, tokens are randomly masked at varying rates; at inference you start from all-masked and run the reverse process. I'm not suggesting this is production-ready for most use cases today. But as a directional signal — "you don't have to do autoregressive decoding to get strong results at scale" — it's worth tracking. If this line of research matures over the next 12–18 months, it changes the inference cost curve in ways that matter for people running batch AI pipelines cheaply.&lt;/p&gt;

&lt;h2&gt;
  
  
  TRELLIS.2: image-to-3D in three seconds, MIT license
&lt;/h2&gt;

&lt;p&gt;Microsoft released &lt;a href="https://huggingface.co/microsoft/TRELLIS.2-4B" rel="noopener noreferrer"&gt;TRELLIS.2-4B on HuggingFace&lt;/a&gt; under MIT license. It converts a single image to a fully textured 3D asset: 512³ resolution in roughly 3 seconds, 1024³ in about 17 seconds, on an NVIDIA A100. Minimum 24GB VRAM.&lt;/p&gt;

&lt;p&gt;I don't have an immediate use for 3D generation in my stack. But I'm watching because it fits the same pattern as the open-source video generation wave from late 2025: once the compute floor drops below "GPU you already own," the tool stops being a research artifact and becomes infrastructure. TRELLIS.2 isn't there yet at 24GB minimum, but the MIT license and HuggingFace availability suggest Microsoft is betting on community adoption rather than API gating. The next version will probably run on a 16GB card.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>webdev</category>
    </item>
    <item>
      <title>5 things I noticed this week: CI cost, Bluesky QC, and CC0 licensing</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sat, 27 Jun 2026 22:11:23 +0000</pubDate>
      <link>https://dev.to/morinaga/5-things-i-noticed-this-week-ci-cost-bluesky-qc-and-cc0-licensing-49ig</link>
      <guid>https://dev.to/morinaga/5-things-i-noticed-this-week-ci-cost-bluesky-qc-and-cc0-licensing-49ig</guid>
      <description>&lt;p&gt;Five things I noticed or shipped this week while running the AI directory sites and a YouTube automation pipeline. No big announcements — mostly friction I hit and adjustments I made.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. GitHub Actions was eating my free quota silently
&lt;/h2&gt;

&lt;p&gt;I had a Bluesky posting cron running three times a day. Each run triggered a four-platform matrix build (three sites × multiple jobs). When I looked at my Actions minutes consumed this week, the math was embarrassing: 3 cron runs × 5-6 minutes each × 7 days = roughly 120 minutes/week just for posting a tweet-sized status update.&lt;/p&gt;

&lt;p&gt;Two fixes. First, I changed the Bluesky cron from &lt;code&gt;0 */8 * * *&lt;/code&gt; to a single daily trigger — the queue buffers three posts regardless of how many times the cron fires, and posts don't go out faster than the queue feeds them. Second, I added a path filter so content-only commits (articles, copy edits) skip the four-way matrix build entirely. A new article doesn't need a full CI rebuild of all three Astro sites.&lt;/p&gt;
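
&lt;p&gt;The arithmetic behind those numbers, as a quick sketch. The after-the-fix figure assumes a daily run still takes the same 5-6 minutes, which I have not measured separately.&lt;/p&gt;

```javascript
// Back-of-envelope estimate; the per-run duration after the fix is an assumption.
function weeklyMinutes(runsPerDay, minutesPerRun, days = 7) {
  return runsPerDay * minutesPerRun * days;
}

const before = [weeklyMinutes(3, 5), weeklyMinutes(3, 6)]; // [105, 126]
const after = [weeklyMinutes(1, 5), weeklyMinutes(1, 6)];  // [35, 42]
console.log({ before, after });
```

&lt;p&gt;Dropping from three triggers to one cuts the posting cost to a third before the path filter saves anything further.&lt;/p&gt;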

&lt;p&gt;Actions quota is not infinite. Even on a paid plan, burning minutes on no-ops is a bad habit to get into before the repo scales.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bluesky posts need a quality gate before they leave the queue
&lt;/h2&gt;

&lt;p&gt;I added a QC gate to the Bluesky post pipeline this week — a step that reads each queued post, checks it against a short ruleset (no broken links, no expired announcements, no posts that reveal the automation stack in a tone that sounds like spam), and drops anything that fails before the cron fires.&lt;/p&gt;

&lt;p&gt;The immediate trigger: I audited the outbox and found 17 posts that read like a bot talking to itself. Phrases like "🔁 queued" and "auto-generated" in a context where I had not disclosed that. Not illegal, but not the tone I want on a personal account.&lt;/p&gt;

&lt;p&gt;The gate runs as a step before the actual &lt;code&gt;bluesky post&lt;/code&gt; command. If it rejects a post, the item stays in the queue with a &lt;code&gt;flagged&lt;/code&gt; status so I can review it manually. Net result: fewer posts per day (down from three to one or two), but ones I would not be embarrassed to have written manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Stopping model routing made the pipeline simpler
&lt;/h2&gt;

&lt;p&gt;I wrote a YT script this week about removing model routing — the pattern where you send different content types to different AI models based on some classifier. I had been routing "short factual" queries to a faster/cheaper model and "synthesis" queries to a more capable one.&lt;/p&gt;

&lt;p&gt;What I found after removing it: latency stayed basically the same, cost went up about 8%, and the code got significantly simpler. The routing classifier itself had edge cases. When the classifier misfired on a synthesis query and sent it to the cheaper model, the output was noticeably worse. The 8% cost increase to send everything to the capable model is cheaper than debugging routing bugs.&lt;/p&gt;

&lt;p&gt;This is not a universal takeaway — at scale, routing probably pays off. At indie scale with a handful of daily API calls, the complexity cost is real and the savings are marginal.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Openverse CC0 filtering is not default — you have to opt in
&lt;/h2&gt;

&lt;p&gt;I added image slides to the YouTube slide renderer this week using &lt;a href="https://openverse.org/" rel="noopener noreferrer"&gt;Openverse&lt;/a&gt;. The API returns results across multiple Creative Commons license types by default. For a monetized YouTube channel, using CC-BY images without visible on-screen attribution is a real licensing problem.&lt;/p&gt;

&lt;p&gt;The filter I needed is &lt;code&gt;license=cc0,pdm&lt;/code&gt; — not the default. Without it, you get CC-BY, CC-BY-SA, CC-BY-NC results mixed in with no indication they require credit. The API returns a &lt;code&gt;license&lt;/code&gt; field per result, but if you're batch-processing slides and forget to filter upstream, you will miss one eventually.&lt;/p&gt;
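
&lt;p&gt;A minimal sketch of filtering both upstream and downstream. The endpoint &lt;code&gt;https://api.openverse.org/v1/images/&lt;/code&gt; and the &lt;code&gt;q&lt;/code&gt; parameter are my understanding of the public Openverse API and should be checked against its docs. The helper names are made up, and this is JavaScript for illustration rather than the renderer's own code.&lt;/p&gt;

```javascript
// Illustrative sketch: helper names are hypothetical; verify endpoint and params against the Openverse docs.
const ATTRIBUTION_FREE = new Set(["cc0", "pdm"]);

function buildSearchUrl(query) {
  const url = new URL("https://api.openverse.org/v1/images/");
  url.searchParams.set("q", query);
  url.searchParams.set("license", "cc0,pdm"); // upstream filter
  return url;
}

function keepAttributionFree(results) {
  // Downstream check on the per-result license field, in case the upstream filter is ever dropped.
  return results.filter((r) => ATTRIBUTION_FREE.has(String(r.license).toLowerCase()));
}
```

&lt;p&gt;The second function is deliberately redundant: a forgotten query parameter then produces fewer images instead of an uncredited CC-BY slide.&lt;/p&gt;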

&lt;p&gt;A second issue: Openverse sometimes returns results pointing to images that have since been removed from the source host. The API returns 200 with metadata, but the actual image URL 404s. I added a &lt;code&gt;requests.head()&lt;/code&gt; check before the slide renderer tries to download anything, and skip results that return non-200.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Self-hosted observability tools have a comfort vs. capability gap
&lt;/h2&gt;

&lt;p&gt;I compared Netdata, SigNoz, and OpenObserve this week as options for monitoring the three sites. All three install in under 10 minutes. The divergence shows up in what you're comfortable touching at 2am when something breaks.&lt;/p&gt;

&lt;p&gt;Netdata is the most comfortable out of the box — it auto-discovers processes and starts charting immediately. SigNoz requires you to send OpenTelemetry traces explicitly, which means instrumenting your code first. OpenObserve is log-focused and works well if you're piping structured JSON logs, but its dashboard interface has a steeper learning curve than the other two.&lt;/p&gt;

&lt;p&gt;For my current situation (Vercel + Cloudflare Pages, no VPS to instrument), all three are somewhat over-engineered. I ended up with a single Datadog free-tier integration for error alerting and left the self-hosted tools as a future option if the infrastructure changes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>webdev</category>
      <category>githubactions</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>Openverse CC0 images in a monetized YouTube pipeline: four things to check first</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sat, 27 Jun 2026 22:10:51 +0000</pubDate>
      <link>https://dev.to/morinaga/openverse-cc0-images-in-a-monetized-youtube-pipeline-four-things-to-check-first-3e6p</link>
      <guid>https://dev.to/morinaga/openverse-cc0-images-in-a-monetized-youtube-pipeline-four-things-to-check-first-3e6p</guid>
      <description>&lt;p&gt;I added an &lt;code&gt;image&lt;/code&gt; slide type to the &lt;a href="https://dev.to/articles/youtube-slide-renderer-pillow-eight-kinds-no-browser"&gt;YouTube slide renderer&lt;/a&gt; for cases where a real photo contextualizes a slide better than a Mermaid diagram. The source is &lt;a href="https://openverse.org/" rel="noopener noreferrer"&gt;Openverse&lt;/a&gt;, the CC-licensed media search maintained by WordPress.org. Before it worked reliably in CI, I hit four issues that are not obvious from reading the Openverse API docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. CC0 and PDM only — not CC-BY
&lt;/h2&gt;

&lt;p&gt;Openverse returns results across multiple Creative Commons license types. The default search includes CC-BY, CC-BY-SA, CC-BY-NC, and others alongside CC0 and PDM. For a monetized YouTube channel, the practical split is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CC0 and PDM&lt;/strong&gt;: no attribution required. You can use these in commercial content without a credit line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CC-BY and variants&lt;/strong&gt;: attribution required. On a slide or thumbnail, that means visible on-screen credit — "Photo by X via Y" somewhere on the frame, readable on screen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For slides in a fast-paced video, on-screen attribution text has to be legible but unobtrusive. CC-BY attribution on a dark-background slide is doable but adds layout complexity: the credit needs to clear the host overlay, stay below the heading, and still be readable at 1080p. More importantly, if a single CC-BY image slips into a batch-generated video without credit, you are distributing it in breach of its license terms.&lt;/p&gt;

&lt;p&gt;The filter I use is &lt;code&gt;license=cc0,pdm&lt;/code&gt; — nothing else. Even though CC0 technically does not require attribution, I still add a courtesy credit caption on every image slide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;custom&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; · &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attribution&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;custom&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;attribution&lt;/span&gt;  &lt;span class="c1"&gt;# always keep credit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;attribution&lt;/code&gt; string is formatted like &lt;code&gt;"Title" · Creator · CC0 · Openverse&lt;/code&gt;. It appears at the bottom of the slide in muted text. Not legally required for CC0; still correct practice, especially for Openverse's long-term funding (the catalog depends on content creators seeing attribution from reuse).&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The 200px minimum filter matters
&lt;/h2&gt;

&lt;p&gt;The Openverse API can return results that include small thumbnails, low-resolution copies, and occasionally images that fail to load entirely. Without filtering, you get unusable frames in your videos.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;im&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;im&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="c1"&gt;# skip tiny/thumbnail junk
&lt;/span&gt;    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The check is against the smaller dimension. An image that is 1800x180 pixels would fail this check because it is too short to fill a 1920x1080 slide usefully. 200px is a loose floor; in practice most results that pass it are at least 400x400. If the query term is obscure (narrow technical topics often return few results), you may exhaust all 12 results in a page and get a &lt;code&gt;RuntimeError&lt;/code&gt;. The fallback in &lt;code&gt;_visual_slide&lt;/code&gt; catches this and renders a heading-only card, so the build continues.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;page_size=12&lt;/code&gt; is set explicitly because the Openverse default varies and results 0-11 give you enough candidates to find at least one usable image for common topics.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. SSL certificate errors in CI
&lt;/h2&gt;

&lt;p&gt;On Ubuntu GitHub Actions runners, plain &lt;code&gt;ssl.create_default_context()&lt;/code&gt; sometimes fails HTTPS requests to Openverse with certificate verification errors. The reliable fix is certifi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;certifi&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_default_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cafile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;certifi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_default_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;except&lt;/code&gt; branch keeps the code working locally where certifi may not be installed. In CI, certifi is installed by the workflow's pip install step (&lt;code&gt;pip install certifi Pillow&lt;/code&gt;), so the primary branch runs.&lt;/p&gt;

&lt;p&gt;The same pattern applies to the image download request — both the API call and the image fetch use the same &lt;code&gt;ctx&lt;/code&gt;. Without this, roughly one in ten CI runs fails on SSL at the image download step even if the API call succeeds, because the two requests can hit different CDN nodes.&lt;/p&gt;
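
&lt;p&gt;One way to make the shared context hard to forget is to build it once and route every request through a single helper. This is a sketch, not the article's actual module layout: &lt;code&gt;build_ssl_context&lt;/code&gt;, &lt;code&gt;CTX&lt;/code&gt;, and &lt;code&gt;fetch_bytes&lt;/code&gt; are hypothetical names, and the sketch catches only &lt;code&gt;ImportError&lt;/code&gt;, which is narrower than the original &lt;code&gt;except Exception&lt;/code&gt;.&lt;/p&gt;

```python
import ssl
import urllib.request


def build_ssl_context():
    """Prefer certifi's CA bundle; fall back to the system store if certifi is absent."""
    try:
        import certifi
    except ImportError:
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


CTX = build_ssl_context()  # built once, shared by every request


def fetch_bytes(url, timeout=15):
    """GET a URL with the shared context (API query and image download alike)."""
    with urllib.request.urlopen(url, timeout=timeout, context=CTX) as response:
        return response.read()
```

&lt;p&gt;Because both the search call and the image fetch go through &lt;code&gt;fetch_bytes&lt;/code&gt;, neither can accidentally fall back to a different certificate store.&lt;/p&gt;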

&lt;h2&gt;
  
  
  4. Retry with backoff is necessary
&lt;/h2&gt;

&lt;p&gt;The Openverse API has rate limits and occasional transient 5xx responses. A single request without retry logic fails silently in CI in a way that is hard to debug after the fact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three attempts with 1.5s and 3.0s waits cover most transient failures. The timeout on &lt;code&gt;urlopen&lt;/code&gt; is 15 seconds: enough for a slow CDN response, not so long that it blocks the entire video build if Openverse is down.&lt;/p&gt;

&lt;p&gt;The image download loop (iterating through results) has no retry per-image, because if one image URL fails you just move to the next candidate. The retry logic sits at the API query level, where the cost of a transient failure is losing all candidates at once.&lt;/p&gt;
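
&lt;p&gt;The query-level retry can be factored into a small helper, sketched here under the assumption that any exception counts as retryable (as in the loop above). The name &lt;code&gt;with_retry&lt;/code&gt; and the injectable &lt;code&gt;sleep&lt;/code&gt; parameter are hypothetical; the injection exists only so the wait schedule can be checked without actually sleeping.&lt;/p&gt;

```python
import time


def with_retry(call, attempts=3, base_delay=1.5, sleep=time.sleep):
    """Run call(); after a failure wait base_delay * (attempt + 1), then try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (attempt + 1))


# Demo with a fake query that fails twice, then succeeds.
waits = []
outcomes = iter([OSError("503"), OSError("503"), {"results": []}])


def flaky_query():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


data = with_retry(flaky_query, sleep=waits.append)
```

&lt;p&gt;In the demo, &lt;code&gt;waits&lt;/code&gt; ends up holding the two delays that were requested, 1.5 and 3.0 seconds, which is the schedule described above.&lt;/p&gt;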

&lt;h2&gt;
  
  
  What the complete integration looks like
&lt;/h2&gt;

&lt;p&gt;In the slide spec JSON, an image slide looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"heading"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Self-hosted observability stack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server rack data center monitoring"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"caption"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grafana + Prometheus in production"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;query&lt;/code&gt; drives the Openverse search. The &lt;code&gt;caption&lt;/code&gt; prefixes the attribution string. When the fetch succeeds, the slide shows the heading, the photo centered in the content area, and the attribution credit at the bottom. When it fails — query too specific, Openverse down, all results too small — the slide shows the heading alone. Either way the video builds.&lt;/p&gt;
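
&lt;p&gt;The success and fallback cases can be sketched as one function that decides what the slide will show. Everything here beyond the spec keys is an assumption for illustration: &lt;code&gt;plan_image_slide&lt;/code&gt; and the &lt;code&gt;fetch&lt;/code&gt; callable are hypothetical, and the &lt;code&gt;" | "&lt;/code&gt; separator between caption and attribution is a guess at the format, not the renderer's actual output.&lt;/p&gt;

```python
def plan_image_slide(spec, fetch):
    """Describe the slide contents; a failed fetch never propagates.

    fetch is a hypothetical callable: query string in, (image_path, attribution)
    out, raising on any failure (no results, service down, images too small).
    """
    plan = {"heading": spec.get("heading", ""), "image": None, "credit": None}
    try:
        path, attribution = fetch(spec["query"])
    except Exception:
        return plan  # heading-only fallback
    plan["image"] = path
    caption = spec.get("caption", "")
    # Assumed format: the caption prefixes the attribution string.
    plan["credit"] = f"{caption} | {attribution}" if caption else attribution
    return plan


spec = {
    "kind": "image",
    "heading": "Self-hosted observability stack",
    "query": "server rack data center monitoring",
    "caption": "Grafana + Prometheus in production",
}
```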

&lt;p&gt;The design principle is the same one behind the &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;three-tier content quality ladder&lt;/a&gt; for the directory ETL: the build never blocks on an enrichment step. Degraded output is better than a failed pipeline.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>What I learned adding diagram and chart slides to a CI-rendered YouTube pipeline</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sat, 27 Jun 2026 08:24:01 +0000</pubDate>
      <link>https://dev.to/morinaga/what-i-learned-adding-diagram-and-chart-slides-to-a-ci-rendered-youtube-pipeline-3bnl</link>
      <guid>https://dev.to/morinaga/what-i-learned-adding-diagram-and-chart-slides-to-a-ci-rendered-youtube-pipeline-3bnl</guid>
      <description>&lt;p&gt;The conclusion first: pre-rendering diagrams and charts to PNG before compositing them onto slides — rather than generating visual content inline or inside ffmpeg — is the right architecture for a CI video pipeline. The tooling gap between Chromium-backed Mermaid rendering, headless matplotlib, and ffmpeg's static frame expectation makes a shared PNG handoff the only approach that keeps each piece testable and replaceable.&lt;/p&gt;

&lt;p&gt;I added three new slide types to the &lt;a href="https://dev.to/articles/youtube-slide-renderer-pillow-eight-kinds-no-browser"&gt;YouTube slide renderer&lt;/a&gt; last week: &lt;code&gt;diagram&lt;/code&gt; (Mermaid flowcharts and sequence diagrams), &lt;code&gt;chart&lt;/code&gt; (branded horizontal bar charts via matplotlib), and &lt;code&gt;image&lt;/code&gt; (license-clear photos from Openverse). The existing slides — title, bullets, table, tool, outro — all draw directly with Pillow. These three render externally, produce a PNG, and get pasted into the same Pillow canvas. Same output contract, different render path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why pre-render instead of embed
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/two-host-video-pipeline-edge-tts-pillow-ffmpeg"&gt;two-host pipeline&lt;/a&gt; assembles video by compositing a still image for each dialogue segment, synthesizing audio with edge-tts, and using ffmpeg to concatenate the clips. ffmpeg expects the still to be a file or a stream of identical frames — it does not run JavaScript, and it cannot call a browser mid-concat.&lt;/p&gt;

&lt;p&gt;Mermaid runs through Puppeteer and Chromium. Pillow draws directly on an in-memory bitmap inside the Python process. There is no in-process way to make these talk. The only clean option is: mmdc produces a PNG, Pillow pastes the PNG.&lt;/p&gt;

&lt;p&gt;matplotlib is different — it could theoretically produce an image buffer in the same process. But having a consistent "render to PNG file, paste PNG file" pattern for all three visual types means they share the same &lt;code&gt;_visual_slide&lt;/code&gt; scaffold and graceful-degradation path. One code path is better than two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mermaid via mmdc: the CI-specific configuration
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;render_mermaid()&lt;/code&gt; in &lt;code&gt;visuals.py&lt;/code&gt; writes two config files before calling mmdc:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gettempdir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bs-mmd-theme.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_MMD_THEME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pcfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gettempdir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bs-puppeteer.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--no-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--disable-setuid-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The puppeteer config is the one that bit me first. Chromium refuses to start as root without &lt;code&gt;--no-sandbox&lt;/code&gt;, and GitHub Actions runs as root inside the Ubuntu container. Without &lt;code&gt;--disable-setuid-sandbox&lt;/code&gt;, it will also fail on containers where setuid is restricted. Both flags are needed.&lt;/p&gt;

&lt;p&gt;The theme config uses Mermaid's &lt;code&gt;base&lt;/code&gt; theme, not &lt;code&gt;dark&lt;/code&gt; or &lt;code&gt;forest&lt;/code&gt;. The other named themes override &lt;code&gt;themeVariables&lt;/code&gt;, so color injection does not work reliably with them. Only &lt;code&gt;base&lt;/code&gt; respects the custom palette (&lt;a href="https://mermaid.js.org/config/theming.html" rel="noopener noreferrer"&gt;confirmed in the Mermaid.js theme docs&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_MMD_THEME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;theme&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;themeVariables&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primaryColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PANEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primaryTextColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;INK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primaryBorderColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ACCENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lineColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MUTED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secondaryColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#1B2B4A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tertiaryColor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fontFamily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Arial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fontSize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;22px&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clusterBkg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PANEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clusterBorder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ACCENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mmdc binary resolution tries three paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_mmdc_cmd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;local&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node_modules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mmdc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mmdc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mmdc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@mermaid-js/mermaid-cli&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This resolution order is right for CI: the GitHub Actions workflow installs &lt;code&gt;@mermaid-js/mermaid-cli&lt;/code&gt; as a dev dependency, so local &lt;code&gt;node_modules&lt;/code&gt; is the hot path. The npx branch exists for local dev where you have not run &lt;code&gt;npm install&lt;/code&gt;. Do not make npx the primary path: it has to resolve the package (and download it when it is not already cached) before mmdc can start, which adds 20-30 seconds per diagram slide on a cold runner.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-w 1600&lt;/code&gt; width flag matters. At 1920x1080, the content area after chrome and heading is roughly 1700x700. Rendering at 1600px wide gives mmdc enough resolution to produce readable text without scaling artifacts when &lt;code&gt;_paste_visual()&lt;/code&gt; thumbnails it into the slot.&lt;/p&gt;
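
&lt;p&gt;Putting the pieces together, the argument list handed to mmdc might be assembled like this. The function name &lt;code&gt;build_mmdc_args&lt;/code&gt; and the file names in the example are hypothetical, and the exact invocation in &lt;code&gt;render_mermaid()&lt;/code&gt; is not shown in this article; the flags themselves (&lt;code&gt;-i&lt;/code&gt;, &lt;code&gt;-o&lt;/code&gt;, &lt;code&gt;-c&lt;/code&gt;, &lt;code&gt;-p&lt;/code&gt;, &lt;code&gt;-w&lt;/code&gt;) are standard mermaid-cli options.&lt;/p&gt;

```python
def build_mmdc_args(mmdc_cmd, src_mmd, out_png, theme_cfg, puppeteer_cfg, width=1600):
    """Assemble the mmdc command line as a list suitable for subprocess.run."""
    return [
        *mmdc_cmd,             # e.g. the list returned by _mmdc_cmd()
        "-i", src_mmd,         # Mermaid source file
        "-o", out_png,         # PNG to hand over to Pillow
        "-c", theme_cfg,       # Mermaid config (base theme + themeVariables)
        "-p", puppeteer_cfg,   # Puppeteer config (--no-sandbox flags)
        "-w", str(width),      # render width in pixels
    ]


args = build_mmdc_args(
    ["mmdc"], "diagram.mmd", "diagram.png", "bs-mmd-theme.json", "bs-puppeteer.json"
)
```

&lt;p&gt;Passing the list to &lt;code&gt;subprocess.run(args, check=True)&lt;/code&gt; with a timeout turns a failed render into an exception that the slide scaffold can catch.&lt;/p&gt;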

&lt;h2&gt;
  
  
  matplotlib horizontal bars with a custom dark palette
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;render_chart()&lt;/code&gt; takes a simple spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# spec = {"items": [["Tool A", 41200], ["Tool B", 28900], ...], "unit": "stars", "highlight": 0}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things tripped me up while getting the chart to match the slide background:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;: call &lt;code&gt;matplotlib.use("Agg")&lt;/code&gt; before &lt;code&gt;import matplotlib.pyplot as plt&lt;/code&gt;. Backend selection and the pyplot import are order-dependent: in older matplotlib releases, calling &lt;code&gt;use("Agg")&lt;/code&gt; after pyplot was imported had no effect beyond a warning. Current releases can switch to a non-interactive backend later, but selecting it first is the ordering that works everywhere. The function imports matplotlib inside its body so that the backend call and the pyplot import sit next to each other instead of depending on module load order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;render_chart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out_png&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt;
    &lt;span class="n"&gt;matplotlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;: setting the background requires two separate calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_facecolor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_facecolor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fig.patch&lt;/code&gt; is the outer figure background; &lt;code&gt;ax&lt;/code&gt; is the axes area. Missing either one leaves a white rectangle where the background should be dark navy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;: &lt;code&gt;plt.tight_layout(pad=1.2)&lt;/code&gt; is not enough on its own. Adding &lt;code&gt;bbox_inches="tight"&lt;/code&gt; to &lt;code&gt;fig.savefig()&lt;/code&gt; is required to clip the white padding matplotlib adds around the bounding box by default. Without it, the saved PNG has a white border that composites badly onto the dark slide background.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;highlight&lt;/code&gt; index accentuates one bar in ACCENT2 (green) instead of ACCENT (blue). The spec author sets it to mark the bar that is the point of the slide — the tool with the most stars, or the benchmark winner. It is optional; when absent, all bars render in blue.&lt;/p&gt;
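
&lt;p&gt;The colour rule is small enough to sketch on its own. The hex values below are placeholders, not the palette the renderer actually uses, and &lt;code&gt;bar_colors&lt;/code&gt; is a hypothetical helper name; the resulting list is the kind of value that can be passed as the &lt;code&gt;color&lt;/code&gt; argument of a matplotlib bar call.&lt;/p&gt;

```python
ACCENT = "#4C8DFF"    # placeholder blue, not the renderer's real value
ACCENT2 = "#3DDC97"   # placeholder green, not the renderer's real value


def bar_colors(spec):
    """One colour per item: ACCENT2 for the highlighted index, ACCENT otherwise."""
    highlight = spec.get("highlight")  # optional; None never matches an index
    return [
        ACCENT2 if index == highlight else ACCENT
        for index in range(len(spec["items"]))
    ]


spec = {"items": [["Tool A", 41200], ["Tool B", 28900]], "unit": "stars", "highlight": 0}
colors = bar_colors(spec)
```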

&lt;h2&gt;
  
  
  The &lt;code&gt;_visual_slide&lt;/code&gt; scaffold and graceful degradation
&lt;/h2&gt;

&lt;p&gt;All three types share the same scaffold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_visual_slide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;render_fn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_base&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;_chrome&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;heading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;heading&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;heading&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;MARGIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;heading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;font&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;INK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;MARGIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;228&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MARGIN&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;236&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ACCENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;
    &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkstemp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;render_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;_paste_visual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;270&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;heading&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WARN: visual render failed (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;MARGIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;420&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[visual unavailable]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;font&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MUTED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;tempfile.mkstemp&lt;/code&gt; pattern (create and close descriptor separately) is deliberately cross-platform: on Windows, &lt;code&gt;NamedTemporaryFile&lt;/code&gt; with &lt;code&gt;delete=False&lt;/code&gt; sometimes holds a lock that prevents a subprocess from writing to the same path. mkstemp avoids this. On Linux it makes no practical difference, but the code runs locally on macOS too.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;except Exception&lt;/code&gt; fallback is intentional. If mmdc is not installed, if matplotlib is not present, or if Openverse returns no usable images, the slide renders as a clean heading card with a muted "[visual unavailable]" placeholder. The video build continues. A missing diagram is not a build failure.&lt;/p&gt;

&lt;p&gt;This matches the design of the &lt;a href="https://dev.to/articles/single-ci-pipeline-two-youtube-channels-three-seo-sites"&gt;rest of the pipeline&lt;/a&gt;: the CI job should produce a video even when external dependencies are partially absent. A video with a placeholder slide is reviewable; a failed build is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the approach has limits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;mmdc startup cost.&lt;/strong&gt; Each mmdc call launches a Chromium process, renders the SVG, and exits. That takes 2-4 seconds per diagram on the GitHub Actions ubuntu-latest runner. A video spec with five diagram slides adds 10-20 seconds to the build. For the current video lengths (15-25 segments), this is acceptable. If the format grew to 50+ slides, pre-rendering all diagrams in parallel before the main loop would matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;matplotlib version pinning.&lt;/strong&gt; The chart code relies on a specific call signature for &lt;code&gt;barh()&lt;/code&gt; and &lt;code&gt;tight_layout()&lt;/code&gt;. matplotlib has changed these interfaces across minor versions. The workflow pins &lt;code&gt;matplotlib==3.9.*&lt;/code&gt; to avoid surprises in future runner image updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mermaid syntax drift.&lt;/strong&gt; mmdc's supported Mermaid syntax depends on the installed package version. Sequence diagrams work. &lt;code&gt;gitGraph&lt;/code&gt; works. More recent additions (&lt;code&gt;xychart-beta&lt;/code&gt;, &lt;code&gt;sankey-beta&lt;/code&gt;) require a newer mmdc version than the workflow gets without explicit pinning. The solution is to add &lt;code&gt;@mermaid-js/mermaid-cli@^11&lt;/code&gt; to the workflow's npm install step rather than relying on the default version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No preview in local dev.&lt;/strong&gt; The &lt;a href="https://dev.to/articles/yt-analytics-performance-classifier-video-script-bias"&gt;analytics feedback loop&lt;/a&gt; tells me which slides hold attention, but I cannot preview a diagram slide without building the full video. Adding a &lt;code&gt;--demo&lt;/code&gt; flag to &lt;code&gt;slides.py&lt;/code&gt; that renders a single slide type from a spec file would help iterate faster, but I have not built it yet. For now the iteration loop is: edit the JSON spec, push a commit, wait for the CI video build, scrub to the diagram segment. That is slow. The &lt;a href="https://dev.to/articles/single-ci-pipeline-two-youtube-channels-three-seo-sites"&gt;single CI pipeline&lt;/a&gt; takes about 4 minutes end-to-end for a 20-segment video, which is reasonable for final validation but too slow for visual design iteration. A local &lt;code&gt;--demo&lt;/code&gt; render that skips TTS and ffmpeg and just opens the PNG would cut that to under 10 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does mmdc need Node.js pre-installed on the runner?&lt;/strong&gt;&lt;br&gt;
Yes. The &lt;code&gt;ubuntu-latest&lt;/code&gt; runner ships with Node.js 20, so the workflow just needs &lt;code&gt;npm install -D @mermaid-js/mermaid-cli&lt;/code&gt; in a setup step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Mermaid's dark theme instead of base with custom variables?&lt;/strong&gt;&lt;br&gt;
No. The &lt;code&gt;dark&lt;/code&gt; theme resets most &lt;code&gt;themeVariables&lt;/code&gt; to its own palette. The only theme that lets you fully override colors via &lt;code&gt;themeVariables&lt;/code&gt; is &lt;code&gt;base&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if Openverse returns zero results for a query?&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;fetch_image()&lt;/code&gt; raises &lt;code&gt;RuntimeError("no usable Openverse image for query: ...")&lt;/code&gt;. The &lt;code&gt;_visual_slide&lt;/code&gt; scaffold catches it and renders a heading-only fallback card. No build failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why horizontal bar charts instead of vertical?&lt;/strong&gt;&lt;br&gt;
Longer tool or model names truncate or require rotation on vertical axes. Horizontal bars handle 20-40 character labels without layout code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the PNG-to-Pillow paste lossless?&lt;/strong&gt;&lt;br&gt;
Yes. Pillow opens the PNG fully decoded at 24-bit color depth, and the final slide is saved as PNG again at the end of slide composition. PNG compression is lossless at every level, so Pillow's default (level 6) affects only file size. There is no quality loss from the round trip.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://dev.to/articles/youtube-slide-renderer-pillow-eight-kinds-no-browser"&gt;YouTube slide renderer — eight kinds, no browser&lt;/a&gt; — &lt;a href="https://dev.to/articles/two-host-video-pipeline-edge-tts-pillow-ffmpeg"&gt;Two-host video pipeline with edge-tts and ffmpeg&lt;/a&gt; — &lt;a href="https://dev.to/articles/free-neural-tts-options-ci-pipelines-compared"&gt;Free neural TTS options for CI pipelines&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>webdev</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Netdata vs SigNoz vs OpenObserve: self-hosted observability for indie projects</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Sat, 27 Jun 2026 08:23:57 +0000</pubDate>
      <link>https://dev.to/morinaga/netdata-vs-signoz-vs-openobserve-self-hosted-observability-for-indie-projects-1mob</link>
      <guid>https://dev.to/morinaga/netdata-vs-signoz-vs-openobserve-self-hosted-observability-for-indie-projects-1mob</guid>
      <description>&lt;p&gt;I've been building out the &lt;a href="https://ossfind.com" rel="noopener noreferrer"&gt;ossfind.com&lt;/a&gt; OSS alternatives directory, and observability tools come up constantly as a category where the SaaS-to-OSS migration question is genuinely interesting. Datadog is the canonical expensive monitoring platform; replacing it with open source isn't a simple one-for-one swap. I spent time researching three projects that together — or individually — cover a meaningful slice of what Datadog does.&lt;/p&gt;

&lt;p&gt;These aren't toy tools. All three have serious GitHub star counts and production deployments. The question for an indie project isn't whether they're capable enough, it's whether they're operable enough given a small team and a tight budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Netdata: fastest time to value on this list
&lt;/h2&gt;

&lt;p&gt;GitHub stars: ~79k. License: GPL-3.0.&lt;/p&gt;

&lt;p&gt;Netdata installs with one command and immediately starts collecting ~800 pre-built metrics across your host, containers, and services. No config file required to get a working dashboard. That's a legitimately different experience from standing up Prometheus from scratch.&lt;/p&gt;

&lt;p&gt;GPL-3.0 is clean for self-hosting. The copyleft applies when you distribute Netdata or a modified version of it, for example as part of a product you ship. Running it internally doesn't trigger the requirement. The scope limit to understand: Netdata is infrastructure metrics and alerting. If distributed traces or centralized log search are the Datadog features you actually depend on, Netdata alone doesn't close that gap.&lt;/p&gt;

&lt;p&gt;For an indie project running a few Vercel deployments and a VPS, Netdata on the VPS handles host-level observability without any setup cost. That's its actual sweet spot.&lt;/p&gt;

&lt;h2&gt;
  
  
  SigNoz: the all-in-one APM stack
&lt;/h2&gt;

&lt;p&gt;GitHub stars: ~27k. License: see LICENSE (dual — a non-AGPL community edition and a paid enterprise tier).&lt;/p&gt;

&lt;p&gt;SigNoz is the most ambitious of the three. It bundles infrastructure metrics, distributed traces (via OpenTelemetry), and log management in a single product with a single login. That's the Datadog feature set as a self-hosted stack.&lt;/p&gt;

&lt;p&gt;The OpenTelemetry integration is worth noting: your instrumentation code sends standard OTLP data, and you could swap SigNoz for any compatible backend later. That's a better position than adopting a proprietary agent format.&lt;/p&gt;

&lt;p&gt;The tradeoff is operational weight. SigNoz runs multiple containers — ClickHouse, query service, frontend. A basic deployment consumes meaningful memory. That's fine for a team with dedicated infrastructure, but for a solo project where the "server" is a $6 Hetzner box also running a database and three Astro sites, it's a lot to manage.&lt;/p&gt;

&lt;p&gt;The license situation needs attention before any commercial deployment. The community edition is open-source. The enterprise tier is not. Check the LICENSE file rather than assuming one way or the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenObserve: the log-first approach
&lt;/h2&gt;

&lt;p&gt;GitHub stars: ~19k. License: AGPL-3.0.&lt;/p&gt;

&lt;p&gt;OpenObserve optimizes for storage efficiency on logs. The project claims significant storage savings compared to Elasticsearch-based setups for log data — that's its differentiator. It also supports traces and metrics, but log aggregation is where it's strongest.&lt;/p&gt;

&lt;p&gt;AGPL-3.0 is a network copyleft license: you can use OpenObserve internally without restriction, but if you let outside users interact with a modified version over a network, you have to offer them the source of that modified version. For a directory site or a personal project, that's a non-issue.&lt;/p&gt;

&lt;p&gt;The use case I'd reach for it: a project where log volume is high and storage cost is a real constraint. If you're paying for a logging SaaS and the bill is mostly driven by log data volume, OpenObserve is where I'd look first.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I'd pick between them
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Host metrics, minimal ops&lt;/td&gt;
&lt;td&gt;Netdata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full APM (metrics + traces + logs)&lt;/td&gt;
&lt;td&gt;SigNoz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High log volume, storage efficiency&lt;/td&gt;
&lt;td&gt;OpenObserve&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget under $10/month server&lt;/td&gt;
&lt;td&gt;Netdata only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most indie projects in 2026, the honest answer is that full observability — metrics, traces, and logs — isn't the first priority. Getting something running cheaply is. Netdata solves the "is my server on fire" question immediately; that's usually enough to start.&lt;/p&gt;

&lt;p&gt;The moment distributed tracing becomes useful is when you have multiple services talking to each other and you can't tell which one is slow. A single Astro site served from Vercel edge functions with a Turso database doesn't really need distributed tracing. A microservices setup does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually run
&lt;/h2&gt;

&lt;p&gt;For my three sites, I use none of these currently. The sites are static Astro deployments on Vercel; Vercel's built-in analytics and the GitHub Actions run logs cover 90% of what I need right now. I'll revisit when I have a persistent service that generates actual log volume worth searching.&lt;/p&gt;

&lt;p&gt;The research fed directly into the &lt;a href="https://ossfind.com/alternatives/datadog/" rel="noopener noreferrer"&gt;ossfind.com Datadog alternatives page&lt;/a&gt;, which lists these three plus Grafana and HyperDX with more detail on each.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>programming</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>How I built the OSS alternatives directory: GitHub ETL, Turso, and the UPSERT trap I hit</title>
      <dc:creator>MORINAGA</dc:creator>
      <pubDate>Fri, 26 Jun 2026 22:12:58 +0000</pubDate>
      <link>https://dev.to/morinaga/how-i-built-the-oss-alternatives-directory-github-etl-turso-and-the-upsert-trap-i-hit-11ie</link>
      <guid>https://dev.to/morinaga/how-i-built-the-oss-alternatives-directory-github-etl-turso-and-the-upsert-trap-i-hit-11ie</guid>
      <description>&lt;p&gt;When I launched &lt;a href="https://dev.to/articles/three-sites-experiment"&gt;three programmatic directory sites in April 2026&lt;/a&gt;, the open-source alternatives site had the most interesting data model. The AI tools directory indexes HuggingFace models — that's a pull from one API. The indie games directory reads Steam. But the OSS alternatives site has to answer a different question: for this SaaS product, which open-source repos actually cover the same use case, and how do they compare?&lt;/p&gt;

&lt;p&gt;Getting that right required a two-phase ETL approach, a careful UPSERT strategy I initially got wrong, and some deliberate choices about where to use Claude Haiku and where to use a fallback template.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the data model looks like
&lt;/h2&gt;

&lt;p&gt;Three tables in &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;Turso libSQL&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;saas&lt;/code&gt; — the SaaS tool being replaced (Datadog, Notion, Figma, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;alternatives&lt;/code&gt; — GitHub repos that serve the same use case, linked by &lt;code&gt;saas_slug&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;saas_content&lt;/code&gt; — Claude-generated per-entry text: an intro, comparison notes, and migration tips&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;alternatives&lt;/code&gt; table stores everything the GitHub API returns that matters for a directory: &lt;code&gt;stars&lt;/code&gt;, &lt;code&gt;forks&lt;/code&gt;, &lt;code&gt;language&lt;/code&gt;, &lt;code&gt;license&lt;/code&gt;, &lt;code&gt;last_pushed&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;. The &lt;code&gt;saas_content&lt;/code&gt; table stores only what Claude adds — the editorial layer that turns raw repo metadata into something useful.&lt;/p&gt;

&lt;p&gt;The full export lives in a JSON file that Astro reads at build time. No database connection at build. The ETL pipeline and the Astro build are separate processes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: seeding from JSON
&lt;/h2&gt;

&lt;p&gt;The first time the site runs on a new machine, there's no database. Rather than block a local build on a live GitHub API pass, I wrote a &lt;code&gt;seed.ts&lt;/code&gt; script that bootstraps the database from a hand-curated &lt;code&gt;saas.json&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;The JSON contains: SaaS name, slug, homepage, category, and a list of &lt;code&gt;owner/repo&lt;/code&gt; strings. Stars, forks, license, and last_pushed are deliberately omitted — they'll come from the live fetch. What I do include in JSON is pre-polished content for some entries where the Claude default output was weak.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`INSERT INTO saas (slug, name, homepage, category, fetched_at)
          VALUES (?, ?, ?, ?, ?)
          ON CONFLICT(slug) DO NOTHING`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alternatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`INSERT INTO alternatives (saas_slug, repo, name, description, ...)
            VALUES (?, ?, ?, ?, ...)
            ON CONFLICT(saas_slug, repo) DO NOTHING`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;DO NOTHING&lt;/code&gt; on conflict for &lt;code&gt;alternatives&lt;/code&gt; is correct: once GitHub data is live, the seed shouldn't clobber fresh star counts with the static values from the JSON. But for &lt;code&gt;saas_content&lt;/code&gt;, I initially used the same &lt;code&gt;DO NOTHING&lt;/code&gt;, and that was a mistake I'll get to below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: live GitHub data
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;fetch-alternatives.ts&lt;/code&gt; calls the GitHub REST API for every &lt;code&gt;owner/repo&lt;/code&gt; in the database and upserts the live fields. Unlike the seed, this is &lt;code&gt;DO UPDATE&lt;/code&gt; — we want fresh data.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/articles/sleep-intervals-steam-github-huggingface-etl"&gt;sleep interval is 100ms between GitHub API calls&lt;/a&gt;. &lt;a href="https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api" rel="noopener noreferrer"&gt;GitHub's REST API allows 5000 requests per hour for authenticated users&lt;/a&gt;, which averages out to one request every 720ms. A 100ms sleep is shorter than that gap, so it does not by itself keep a sustained run under the limit; it is safe only while a full pass makes fewer than 5000 calls in an hour. Unauthenticated access is capped at 60 requests per hour, or one call per 60 seconds, which is completely impractical at scale. The monorepo authenticates with a secret in GitHub Actions.&lt;/p&gt;
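
&lt;p&gt;The arithmetic behind those limits is easy to check. A minimal sketch, with function names that are mine rather than anything from the ETL, that converts an hourly quota into an average gap between calls and a fixed sleep into a worst-case hourly rate:&lt;/p&gt;

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Average gap between calls that keeps a sustained run at or under an hourly quota.
function minGapMs(requestsPerHour: number): number {
  return MS_PER_HOUR / requestsPerHour;
}

// Hourly request rate implied by a fixed sleep, ignoring request latency.
function worstCaseRequestsPerHour(sleepMs: number): number {
  return MS_PER_HOUR / sleepMs;
}

const authenticatedGapMs = minGapMs(5000);          // 720
const unauthenticatedGapMs = minGapMs(60);          // 60000
const rateAt100ms = worstCaseRequestsPerHour(100);  // 36000

console.log({ authenticatedGapMs, unauthenticatedGapMs, rateAt100ms });
```

&lt;p&gt;Real request latency lowers the effective rate, so the last figure is an upper bound rather than what the loop actually achieves.&lt;/p&gt;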

&lt;p&gt;Errors per-repo are caught and logged but don't abort the batch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;repoFull&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alternatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;repoFull&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getRepo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`INSERT INTO alternatives (saas_slug, repo, name, description, stars,
              forks, language, license, last_pushed, url, fetched_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT(saas_slug, repo) DO UPDATE SET
              description = excluded.description,
              stars = excluded.stars,
              forks = excluded.forks,
              language = excluded.language,
              license = excluded.license,
              last_pushed = excluded.last_pushed,
              fetched_at = excluded.fetched_at`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repoFull&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stargazers_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;forks_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;license&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;spdx_id&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pushed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;html_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`  ! Failed &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;repoFull&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One field worth noting: &lt;code&gt;r.license?.spdx_id&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt; when GitHub sees a license file but can't identify the SPDX identifier. That happens more than you'd expect with non-standard licenses. I render those rows with "see repo" instead of a badge so I'm not misleading visitors about the license type.&lt;/p&gt;
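
&lt;p&gt;That rendering rule fits in one small helper. The sketch below is illustrative: &lt;code&gt;licenseLabel&lt;/code&gt; and the &lt;code&gt;RepoLicense&lt;/code&gt; type are hypothetical names, and treating the SPDX placeholder &lt;code&gt;NOASSERTION&lt;/code&gt; the same way as &lt;code&gt;null&lt;/code&gt; is my assumption about how unclassified licenses may be reported, not something the pipeline is known to do.&lt;/p&gt;

```typescript
// Only the field this sketch reads from a GitHub repo response.
type RepoLicense = { spdx_id: string | null } | null;

// Returns the text to show in the license column of a directory row.
function licenseLabel(license: RepoLicense): string {
  const id = license?.spdx_id ?? null;
  // Assumption: "NOASSERTION" is handled like a missing identifier.
  if (id === null || id === "NOASSERTION") {
    return "see repo";
  }
  return id;
}

console.log(licenseLabel({ spdx_id: "MIT" }));
console.log(licenseLabel({ spdx_id: null }));
```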

&lt;h2&gt;
  
  
  Content generation with Claude Haiku
&lt;/h2&gt;

&lt;p&gt;After the GitHub data is fresh, &lt;code&gt;generate-content.ts&lt;/code&gt; queries for SaaS entries that either have no content row or whose &lt;code&gt;model_used&lt;/code&gt; column is &lt;code&gt;'fallback-template'&lt;/code&gt; or &lt;code&gt;'seeded-from-json'&lt;/code&gt;. For each, it asks Claude Haiku for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;intro&lt;/code&gt; — 2 sentences on what the SaaS is and why teams seek OSS alternatives&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;comparison_notes&lt;/code&gt; — 2-3 sentences on actual tradeoffs (self-hosting overhead, feature gaps)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;migration_tips&lt;/code&gt; — a 2-4 item array of concrete migration steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use the &lt;a href="https://dev.to/articles/shared-claude-haiku-client-prompt-caching"&gt;shared Claude Haiku client with system-prompt caching&lt;/a&gt; here. The system prompt is identical for every call in a batch, so caching it saves input tokens on all subsequent calls. On a 50-entry pass, the cost difference is real.&lt;/p&gt;

&lt;p&gt;The fallback template — which runs when &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; is absent — generates deterministic placeholder text. This matters for CI: the Astro build needs a content row for every SaaS entry. Missing content produces a blank page, which would then trigger &lt;a href="https://dev.to/articles/noindex-gate-programmatic-pages-without-404s"&gt;the noindex gate I use for thin programmatic pages&lt;/a&gt;.&lt;/p&gt;
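
&lt;p&gt;A rough sketch of what such a fallback might look like. The function name, its parameters, and the sentence templates are all hypothetical; only the column names and the &lt;code&gt;'fallback-template'&lt;/code&gt; marker come from the pipeline described here.&lt;/p&gt;

```typescript
// Row shape mirrors the saas_content columns named in the article.
type SaasContent = {
  intro: string;
  comparison_notes: string;
  migration_tips: string[];
  model_used: string;
};

// Deterministic placeholder: the same inputs always produce the same row,
// and no network call is made. The sentence templates are hypothetical.
function fallbackContent(saasName: string, alternativeCount: number): SaasContent {
  return {
    intro: `${saasName} is a hosted product. This page lists ${alternativeCount} open-source alternatives to it.`,
    comparison_notes: `Self-hosting an alternative to ${saasName} trades a subscription for operational work.`,
    migration_tips: [
      `Export your data from ${saasName} first.`,
      "Trial one alternative on a non-critical workload.",
    ],
    // Marks the row so a later pass with an API key can find and replace it.
    model_used: "fallback-template",
  };
}

console.log(fallbackContent("Datadog", 3));
```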

&lt;p&gt;The &lt;a href="https://dev.to/articles/three-tier-content-quality-ladder-programmatic-etl"&gt;three-tier content quality ladder&lt;/a&gt; I described earlier puts these generated entries at the middle tier — better than the raw repo description, worse than hand-edited content.&lt;/p&gt;

&lt;h2&gt;
  
  
  The UPSERT trap
&lt;/h2&gt;

&lt;p&gt;Original &lt;code&gt;seed.ts&lt;/code&gt; for &lt;code&gt;saas_content&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;saas_content&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;saas_slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comparison_notes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;migration_tips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;saas_slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTHING&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That looked safe, but the problem was subtle. When I seeded with &lt;code&gt;model_used = null&lt;/code&gt; (the original JSON had no such field), &lt;code&gt;generate-content.ts&lt;/code&gt; selected the rows to generate with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;saas&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;saas_content&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saas_slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saas_slug&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
   &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_used&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'fallback-template'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'seeded-from-json'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rows seeded with &lt;code&gt;model_used = null&lt;/code&gt; matched neither condition: the row existed, so &lt;code&gt;c.saas_slug&lt;/code&gt; was not NULL, and in SQL &lt;code&gt;NULL IN (...)&lt;/code&gt; evaluates to NULL rather than true. The generator therefore skipped them. The seed's &lt;code&gt;DO NOTHING&lt;/code&gt; caused a second failure: wherever an earlier run had already written a fallback-template row, the polished JSON content could never land.&lt;/p&gt;
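
&lt;p&gt;To make the skip concrete, here is a minimal TypeScript sketch that mirrors the generator's WHERE clause in memory. The &lt;code&gt;ContentRow&lt;/code&gt; type and the &lt;code&gt;needsGeneration&lt;/code&gt; helper are hypothetical names for illustration, not code from &lt;code&gt;generate-content.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical in-memory mirror of the generator's WHERE clause.
type ContentRow = { saas_slug: string; model_used: string | null };

// Status values the query treats as "regenerate me".
const REGENERATE = ["fallback-template", "seeded-from-json"];

function needsGeneration(row: ContentRow | undefined): boolean {
  // LEFT JOIN found no content row: c.saas_slug IS NULL is true.
  if (row === undefined) return true;
  // SQL: NULL IN (...) evaluates to NULL, which WHERE treats as not true.
  if (row.model_used === null) return false;
  return REGENERATE.includes(row.model_used);
}

const rows: (ContentRow | undefined)[] = [
  undefined, // no content row yet
  { saas_slug: "a", model_used: null }, // seeded without a status
  { saas_slug: "b", model_used: "fallback-template" },
];

// One boolean per row: would the generator pick it up?
const picked = rows.map((row) => needsGeneration(row));
console.log(picked);
```

&lt;p&gt;Only the middle row, which exists but has a null &lt;code&gt;model_used&lt;/code&gt;, is never selected.&lt;/p&gt;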

&lt;p&gt;The fix had two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;seed.ts&lt;/code&gt; now uses &lt;code&gt;DO UPDATE&lt;/code&gt; for &lt;code&gt;saas_content&lt;/code&gt; instead of &lt;code&gt;DO NOTHING&lt;/code&gt;, so polished JSON content always wins.&lt;/li&gt;
&lt;li&gt;The JSON schema requires &lt;code&gt;model_used&lt;/code&gt; to be set explicitly — &lt;code&gt;'seeded-from-json'&lt;/code&gt; for automatic entries, &lt;code&gt;'claude-routine-polish'&lt;/code&gt; for hand-checked ones. The generator's WHERE clause excludes both.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;saas_slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt;
  &lt;span class="n"&gt;intro&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;comparison_notes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;comparison_notes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;migration_tips&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migration_tips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;generated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;model_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_used&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern — using &lt;code&gt;model_used&lt;/code&gt; as a status field to coordinate between ETL phases — also showed up in the &lt;a href="https://dev.to/articles/upgrade-fallback-model-entries-deterministic-hash-pool"&gt;AI tools directory's fallback entry upgrade work&lt;/a&gt;. The lesson there was the same: never let an ETL pass silently skip a row because the status field was written inconsistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Astro page structure
&lt;/h2&gt;

&lt;p&gt;Each SaaS entry renders as a static page at &lt;code&gt;/alternatives/[saas]/&lt;/code&gt;. The renderer reads from &lt;code&gt;saas.json&lt;/code&gt;, assembles a grid of alternatives sorted by stars, and inlines the Claude-generated comparison notes. Each entry shows a license badge, language indicator, and last_pushed date formatted as a relative time string.&lt;/p&gt;

&lt;p&gt;The grid intentionally doesn't paginate at the SaaS level. I capped entries per SaaS at 8. More than that becomes noise — the directory's value is curation, not exhaustiveness. The &lt;a href="https://dev.to/articles/eeat-transparency-pages-programmatic-directory"&gt;E-E-A-T transparency pages&lt;/a&gt; include a methodology note explaining what that cap means for each category.&lt;/p&gt;
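
&lt;p&gt;The sort-and-cap step could look roughly like this. The &lt;code&gt;Alternative&lt;/code&gt; shape and the &lt;code&gt;MAX_PER_SAAS&lt;/code&gt; constant are assumptions based on the fields mentioned above, not the actual renderer code.&lt;/p&gt;

```typescript
// Hypothetical shape of one alternative; field names follow the prose above.
type Alternative = { name: string; stars: number; license: string };

// The per-SaaS cap described in the text.
const MAX_PER_SAAS = 8;

// Returns a new array: most-starred first, at most MAX_PER_SAAS entries.
// The input array is copied first, so the caller's data is not reordered.
function topAlternatives(all: Alternative[]): Alternative[] {
  return [...all].sort((a, b) => b.stars - a.stars).slice(0, MAX_PER_SAAS);
}

// Example input: 10 candidates with 1..10 stars.
const candidates: Alternative[] = Array.from({ length: 10 }, (_, i) => ({
  name: `repo-${i + 1}`,
  stars: i + 1,
  license: "MIT",
}));

const grid = topAlternatives(candidates);
console.log(grid.map((alt) => alt.name));
```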

&lt;h2&gt;
  
  
  What I'd change
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Store raw GitHub JSON alongside derived columns.&lt;/strong&gt; Currently each ETL stores only derived fields: stars, forks, license, last_pushed. When I later wanted a "has_recent_releases" signal, I had to add a full new API call. If I'd kept the raw response in a JSONB/TEXT column, any signal already present in that response would have been one query away, for example &lt;code&gt;json_extract(raw, '$.has_wiki')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a &lt;code&gt;deprecated_at&lt;/code&gt; field.&lt;/strong&gt; When a repo gets deleted or renamed, the ETL call returns a 404 and the code just logs it. The row stays in the database with increasingly stale data. A &lt;code&gt;deprecated_at&lt;/code&gt; timestamp would let the page renderer show a warning and let the content team decide whether to replace or remove the entry.&lt;/p&gt;
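
&lt;p&gt;One way the 404 handling could work, as a sketch only: &lt;code&gt;RepoRow&lt;/code&gt; and &lt;code&gt;applyFetchStatus&lt;/code&gt; are hypothetical names, and clearing the flag on a later successful fetch is an assumption rather than a decided behaviour.&lt;/p&gt;

```typescript
// Hypothetical row shape; only the fields needed for the sketch.
type RepoRow = { full_name: string; deprecated_at: string | null };

// Pure helper: given the HTTP status of the metadata fetch,
// return the row with deprecated_at updated accordingly.
function applyFetchStatus(row: RepoRow, status: number, now: Date): RepoRow {
  if (status === 404) {
    // Keep the first timestamp so repeated 404s do not move the date.
    return { ...row, deprecated_at: row.deprecated_at ?? now.toISOString() };
  }
  if (status === 200) {
    // Assumption: a successful fetch means the repo is back, so clear the flag.
    return { ...row, deprecated_at: null };
  }
  // Other failures (rate limits, timeouts) say nothing about deletion.
  return row;
}

const live: RepoRow = { full_name: "owner/repo", deprecated_at: null };
const firstMiss = applyFetchStatus(live, 404, new Date("2026-07-01T00:00:00Z"));
const secondMiss = applyFetchStatus(firstMiss, 404, new Date("2026-07-02T00:00:00Z"));
const recovered = applyFetchStatus(secondMiss, 200, new Date("2026-07-03T00:00:00Z"));
console.log(firstMiss.deprecated_at, secondMiss.deprecated_at, recovered.deprecated_at);
```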

&lt;p&gt;&lt;strong&gt;Parallelize generate-content with a rate-limit counter.&lt;/strong&gt; The current sequential loop takes several minutes on a cold run with 100+ entries. Running 10 concurrent Haiku calls behind a shared counter that throttles at the API limit should be roughly 5-10x faster without changing cost, since the number of calls stays the same.&lt;/p&gt;
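
&lt;p&gt;A minimal sketch of the concurrency part, assuming a worker pool with a fixed limit. It caps the number of in-flight calls but does not implement the requests-per-minute counter. &lt;code&gt;fakeGenerate&lt;/code&gt; stands in for the real Haiku call, and every name here is hypothetical.&lt;/p&gt;

```typescript
// Stand-in for the real content-generation call (hypothetical).
async function fakeGenerate(slug: string) {
  return `content for ${slug}`;
}

// Any function with the same signature as the stand-in can be plugged in.
type Generate = typeof fakeGenerate;

// Runs generate() over all slugs with at most `limit` calls in flight.
// Results are returned in the same order as the input slugs.
async function runPool(slugs: string[], limit: number, generate: Generate) {
  const results: string[] = new Array(slugs.length);
  let next = 0; // shared cursor: each worker claims the next unprocessed index

  async function worker() {
    while (slugs.length > next) {
      const i = next;
      next += 1;
      results[i] = await generate(slugs[i]);
    }
  }

  const workerCount = Math.min(limit, slugs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

runPool(["notion", "slack", "figma"], 2, fakeGenerate).then((out) => {
  console.log(out);
});
```

&lt;p&gt;Because each worker claims an index before it awaits, no entry is processed twice even though the cursor is shared.&lt;/p&gt;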

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why Turso instead of a hosted Postgres?&lt;/strong&gt;&lt;br&gt;
Turso's edge replicas are in the same regions as Vercel's serverless functions, so read latency is low. The cost for my usage tier is also lower than a comparable Postgres instance. &lt;a href="https://dev.to/articles/turso-libsql-vs-cloudflare-d1-astro-monorepo"&gt;The full comparison is here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you need a paid GitHub plan to avoid rate limits?&lt;/strong&gt;&lt;br&gt;
No. A free personal access token gives 5,000 requests per hour, enough to fetch metadata for several hundred repos in a single daily cron run. The unauthenticated limit of 60 requests per hour would not work at any meaningful scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you prevent Claude costs from escalating?&lt;/strong&gt;&lt;br&gt;
System-prompt caching amortises the per-call cost across the batch. I also set &lt;code&gt;max_tokens: 1024&lt;/code&gt; for each call, which caps output length. The biggest lever is the &lt;code&gt;model_used&lt;/code&gt; status field: entries that already have good content don't get regenerated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if a GitHub repo is deleted?&lt;/strong&gt;&lt;br&gt;
Right now the row goes stale silently. The fetch fails, the error is logged, and the next build still renders the row with whatever data the last successful fetch stored. Adding a 404-specific handler that sets &lt;code&gt;deprecated_at&lt;/code&gt; is on the backlog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/articles/sleep-intervals-steam-github-huggingface-etl"&gt;Three sleep intervals for Steam, GitHub, and HuggingFace ETLs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/articles/noindex-gate-programmatic-pages-without-404s"&gt;How I kept 62 of 80 programmatic pages alive while hiding them from Google&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>turso</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
