<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeremy Longshore</title>
    <description>The latest articles on DEV Community by Jeremy Longshore (@jeremy_longshore).</description>
    <link>https://dev.to/jeremy_longshore</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842419%2Ff5d02b54-daf0-4520-9aef-118fbd0c24ac.jpeg</url>
      <title>DEV Community: Jeremy Longshore</title>
      <link>https://dev.to/jeremy_longshore</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jeremy_longshore"/>
    <language>en</language>
    <item>
      <title>Honor the Gate When the Verdict Is Inconvenient</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Sun, 14 Jun 2026 13:00:28 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/honor-the-gate-when-the-verdict-is-inconvenient-4oep</link>
      <guid>https://dev.to/jeremy_longshore/honor-the-gate-when-the-verdict-is-inconvenient-4oep</guid>
      <description>&lt;p&gt;On June 10, 2026, two completely unrelated systems hit quality gates that came back the wrong way. Neither system rationalized the verdict. Both honored it. That honesty is what makes the gate worth having.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Thesis
&lt;/h2&gt;

&lt;p&gt;A quality gate is only worth building if you will honor its verdict when the verdict is inconvenient. If you override a STOP without pre-registering a follow-up test, or fake a green check because a tool can't run, you're not adding rigor—you're adding theater. The gate becomes another notification that landed in your inbox yesterday.&lt;/p&gt;

&lt;p&gt;On the same day, two teams in the Intent Solutions portfolio faced exactly that choice. The stories look nothing alike. The discipline they landed on was identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 1: Semantic-Flux—Honoring a Pre-Registered STOP on a $4 ML Signal Test
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;semantic-flux&lt;/strong&gt; is building QCSS, a query-conditioned semantic search architecture. The patented core is a FiLM layer (feature-wise linear modulation) that lets the query modulate the document encoder's activations.&lt;/p&gt;

&lt;p&gt;The project history matters because it shows why pre-registration exists. Earlier runs—v1, v2, v3—each produced results that got rationalized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;v1: 4.79% nDCG FAIL. Someone overrode it. Reason: "It was undertrained." But no pre-registered test was written that could have disproven that excuse. The goalposts moved after seeing the number.&lt;/li&gt;
&lt;li&gt;v3: Used a pretrained MiniLM backbone, which is itself a trained retriever. The PASS measured the backbone's strength, not QCSS. The FiLM layer was never ablated away.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three runs, no trustworthy verdict, because every excuse was written after the number landed.&lt;/p&gt;

&lt;p&gt;So before v4 ran, the team locked a commitment into &lt;code&gt;DECISIONS.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROCEED to Phase 1 only if ALL three:
  - film_lift (nDCG@10 with ON minus OFF) ≥ +0.03, 95% CI excludes 0
  - scratch encoder beats BM25 floor (0.268)
  - seed std ≤ 0.02

STOP if:
  - film_lift ≤ +0.01 (conditioning inert)
  - scratch encoder ≤ BM25 (can't beat 1990s baseline)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Timestamped. Before the number. The FiLM-ablation A/B (query-conditioning ON vs OFF on the same encoder) was the only way to isolate the patented mechanism.&lt;/p&gt;

&lt;p&gt;On June 10, the v4 result landed (HF job 6a28d1c9, NVIDIA L4, 3 seeds, full NFCorpus = 3,633 docs):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BM25 floor: 0.268&lt;/li&gt;
&lt;li&gt;BGE ceiling: 0.371&lt;/li&gt;
&lt;li&gt;Scratch encoder with FiLM-on: 0.0096&lt;/li&gt;
&lt;li&gt;Scratch encoder with FiLM-off: 0.0086&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;film_lift = +0.001&lt;/strong&gt;—95% CI straddles zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The FiLM weights stayed between 0.0002 and 0.0014 in every arm, so the mechanism sat at identity.&lt;/p&gt;

&lt;p&gt;STOP fired on two counts: film_lift inert AND scratch encoder 0.01 ≪ BM25 0.27.&lt;/p&gt;
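
&lt;p&gt;To make the two counts concrete, here is a minimal sketch of how the locked gate could be evaluated mechanically. The function and field names are hypothetical, not taken from the semantic-flux repo. Treating the FiLM-on score as "the scratch encoder" and returning INCONCLUSIVE for results between the STOP and PROCEED thresholds are both assumptions, and the seed std value is a placeholder because the post does not report it.&lt;/p&gt;

```typescript
// Hypothetical sketch of the pre-registered gate; names are illustrative, not from the repo.
type Verdict = "PROCEED" | "STOP" | "INCONCLUSIVE";

interface SignalResult {
  ndcgFilmOn: number;      // nDCG@10, scratch encoder, FiLM on
  ndcgFilmOff: number;     // nDCG@10, scratch encoder, FiLM off
  ciExcludesZero: boolean; // does the 95% CI on film_lift exclude 0?
  bm25Floor: number;       // lexical baseline nDCG@10
  seedStd: number;         // std of nDCG@10 across seeds
}

function evaluateGate(r: SignalResult): { verdict: Verdict; reasons: string[] } {
  const filmLift = r.ndcgFilmOn - r.ndcgFilmOff;
  const stopReasons: string[] = [];
  // STOP conditions are checked first; any one of them is enough.
  if (0.01 >= filmLift) stopReasons.push("film_lift inert");
  if (r.bm25Floor >= r.ndcgFilmOn) stopReasons.push("scratch encoder does not beat BM25");
  if (stopReasons.length > 0) return { verdict: "STOP", reasons: stopReasons };

  // PROCEED requires every pre-registered condition to hold.
  const proceedChecks = [
    filmLift >= 0.03,
    r.ciExcludesZero,
    r.ndcgFilmOn > r.bm25Floor,
    0.02 >= r.seedStd,
  ];
  if (proceedChecks.every(Boolean)) return { verdict: "PROCEED", reasons: [] };

  // Anything between the STOP and PROCEED thresholds is left undecided (an assumption).
  return { verdict: "INCONCLUSIVE", reasons: ["between STOP and PROCEED thresholds"] };
}

// The v4 numbers from the post.
const v4 = evaluateGate({
  ndcgFilmOn: 0.0096,
  ndcgFilmOff: 0.0086,
  ciExcludesZero: false,
  bm25Floor: 0.268,
  seedStd: 0, // placeholder: the post does not report the v4 seed std
});
```

&lt;p&gt;With the v4 numbers, &lt;code&gt;v4.verdict&lt;/code&gt; is STOP with two reasons. The point of writing the thresholds as code is the same as writing them in &lt;code&gt;DECISIONS.md&lt;/code&gt;: nothing in the function can be adjusted after the number lands without leaving a diff.&lt;/p&gt;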

&lt;p&gt;The team honored it.&lt;/p&gt;

&lt;p&gt;The honest read from &lt;code&gt;DECISIONS.md&lt;/code&gt;: "This recipe does not work." Two things went wrong: (1) the from-scratch encoder learned almost nothing (it trained on only 2,202 queries, whereas MS MARCO offers hundreds of thousands of labeled training queries); (2) FiLM never trained, stuck in a zero-init / backbone-competition deadlock the team had flagged in advance. The result did not falsify QCSS, because the mechanism never engaged, and it did not support it either. The ~$4 signal test correctly said: do not spend the $2–5K Phase 1 yet.&lt;/p&gt;

&lt;p&gt;Here's the transferable artifact. The team baked an &lt;strong&gt;anti-rationalization rule&lt;/strong&gt; into the project: overriding a STOP requires a NEW &lt;code&gt;DECISIONS.md&lt;/code&gt; entry that (i) names the specific confound believed to explain the result AND (ii) pre-registers a falsifying follow-up test. "It was undertrained" is not an admissible override without a pre-registered test that could disprove it.&lt;/p&gt;
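
&lt;p&gt;The rule is simple enough to check mechanically. The sketch below is one way it might look; the entry shape and field names are hypothetical, since the post only describes the rule in prose.&lt;/p&gt;

```typescript
// Hypothetical shape of a STOP-override entry; field names are illustrative.
interface OverrideEntry {
  namedConfound: string; // (i) the specific confound believed to explain the result
  falsifierTest: string; // (ii) the pre-registered follow-up test that could disprove it
}

function isAdmissibleOverride(e: OverrideEntry): boolean {
  const hasConfound = e.namedConfound.trim().length > 0;
  const hasFalsifier = e.falsifierTest.trim().length > 0;
  // Both parts are required; a confound with no falsifier is just an excuse.
  return [hasConfound, hasFalsifier].every(Boolean);
}

// The v1 override as the post describes it: a reason, but no falsifying test.
const v1Admissible = isAdmissibleOverride({
  namedConfound: "it was undertrained",
  falsifierTest: "",
});
```

&lt;p&gt;Here &lt;code&gt;v1Admissible&lt;/code&gt; is false, which is the whole point of the rule.&lt;/p&gt;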

&lt;p&gt;That rule exists because v1's FAIL was overridden without (ii).&lt;/p&gt;

&lt;p&gt;The result didn't make the business decision. But it made the decision honest and bounded: either retry with frozen backbone / non-zero FiLM init / higher FiLM learning rate + MS MARCO data (same locked gate, pre-registered falsifier), or halt empirics and file the structural claim. Either way, a $4 test stopped before it could turn into $5K of motivated spending.&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 2: agent-governance-plane—Deleting a CI Gate Rather Than Faking It Green
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;agent-governance-plane&lt;/strong&gt; is a TypeScript/Bun project—a policy and governance layer for agent sessions. On June 10 it shipped v0.1.46 and v0.1.47.&lt;/p&gt;

&lt;p&gt;v0.1.46 delivered real infrastructure: PR #67 added credential injection into a sandbox plus a test that actually proves network isolation, not just assumes it. Shipped clean.&lt;/p&gt;

&lt;p&gt;v0.1.47 (PR #68) set out to close two test-infra gaps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;Stryker mutation-testing gate&lt;/strong&gt;: instrumentation of the two highest-risk files (src/policy/engine.ts, src/policy/dangerous.ts), a fail-closed mutation-gate.sh script, and the runner pinned to &lt;code&gt;@stryker-mutator/core@9.6.1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Gherkin BDD acceptance layer&lt;/strong&gt;—tests/features/J1-governed-session.feature with 6 scenarios, backed by real assertions against real modules, 24 expect() calls, no tautologies.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then Stryker hit a wall: v9 cannot instrument this Bun/TS codebase. The babel instrumenter threw &lt;code&gt;TypeError: generator is not a function&lt;/code&gt; (CI run 27325670887). The Bun mutation-runner toolchain isn't there yet, and no fix was in reach for this PR.&lt;/p&gt;

&lt;p&gt;That left exactly three options for the new "Mutation testing" CI check:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leave it &lt;strong&gt;permanently red&lt;/strong&gt;—every PR shows a failing check that means nothing. The team learns to ignore red checks. This is exactly &lt;a href="https://dev.to/posts/stop-crying-wolf-3-strike-uptime-monitor-gate/"&gt;alert fatigue&lt;/a&gt;, and it kills every gate downstream.&lt;/li&gt;
&lt;li&gt;Make it &lt;strong&gt;fake-green&lt;/strong&gt;—&lt;code&gt;continue-on-error: true&lt;/code&gt; or a script that always exits 0. A green checkmark that lies. The label says "Mutation testing passed" when mutation testing never ran.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove the check&lt;/strong&gt;—keep the stryker.config.json and mutation-gate.sh as hash-pinned scaffolding, document the toolchain block in tests/TESTING.md, and file a tracked bead to re-wire it when a Bun-compatible runner exists.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;They chose removal.&lt;/p&gt;

&lt;p&gt;The commit message: &lt;em&gt;"A permanently-red OR fake-green 'Mutation testing' check both violate the repo's honest-gate culture, so the CI job is removed."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The actual deliverable—the BDD acceptance layer—shipped unaffected. All the real hard gates passed: typecheck, biome lint, coverage-gate at 91.43%, claim-scan, doc-drift audit, harness verify, escape-scan (REFUSE=0, CHALLENGE=0).&lt;/p&gt;

&lt;p&gt;The transferable point: a green checkmark is a claim. "Mutation testing passed" has to &lt;em&gt;mean&lt;/em&gt; &lt;a href="https://dev.to/posts/manifest-system-mutation-testing-pyramid/"&gt;mutation testing&lt;/a&gt; ran. If a tool can't run against your codebase, a check that always passes is worse than no check. It launders trust. Removing it (with scaffolding retained and a tracked bead filed) is the honest move, not a regression.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Parallel
&lt;/h2&gt;

&lt;p&gt;Two domains that share nothing technically—a 530K-param IR encoder on an L4 GPU and a Bun/TS governance layer's CI pipeline—converged on the same discipline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A gate's value is entirely in whether you honor its verdict when it's inconvenient.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The two failure modes are symmetric:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rationalizing a STOP after the number lands&lt;/strong&gt; (semantic-flux v1: "it was undertrained")—textbook confirmation bias. Converts a signal into a comforting excuse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faking a green when you can't run honestly&lt;/strong&gt; (the Stryker check: &lt;code&gt;continue-on-error: true&lt;/code&gt;). Converts a missing signal into a false confirmation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both erode trust. Both invite drift.&lt;/p&gt;

&lt;p&gt;The defenses are also symmetric:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-registration&lt;/strong&gt;: Write the gate before the number lands so the verdict is binding. The anti-rationalization rule is the written form—name the confound, pre-register the falsifier, or the override doesn't count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honest-gate culture&lt;/strong&gt;: A check must mean what its label says. If a tool can't run against your codebase, remove the check openly rather than let it lie. Keep the scaffolding, file a bead, move on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both teams kept an audit trail and a path forward. semantic-flux logged the STOP, the two confounds, and the pre-registered retry option. agent-governance-plane kept the stryker.config.json hash-pinned and filed bead agp-7r4 to re-wire it when the Bun runner ships.&lt;/p&gt;

&lt;p&gt;Honoring a gate is not giving up. It's refusing to lie about where you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also Shipped
&lt;/h2&gt;

&lt;p&gt;Elsewhere in the portfolio on the same day, the discipline was quieter but the same — ship the real thing, document the state honestly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;intent-solutions-landing&lt;/strong&gt;: Deep clean—the marketing site went zero-GCP. Firebase Analytics → self-hosted Umami, deleted ~116MB Firebase Cloud Functions stack, npm audit fix took 25 vulns (13 high) down to 5 moderate / 0 high.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;intentsolutions-vps-runbook&lt;/strong&gt;: Slack alerting cutover from one firehose channel to 11 named channels, fully wired end-to-end with smoke-test evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;learn-intentsolutions&lt;/strong&gt;: 5 study-notes pages distilling a Kobiton Phase-A literature pass (OSS community formation, pain-driven adoption, real-device-cloud market positioning, DevRel effectiveness).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Cost Asymmetry
&lt;/h2&gt;

&lt;p&gt;The semantic-flux gate cost ~$4 and saved a possible $2–5K of motivated spending downstream. The mutation-check decision cost one CI job and saved the credibility of every other green check in the repo. Cheap insurance against expensive self-deception. Honor the gate when the verdict is inconvenient—that's the whole discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/the-wrong-product-built-perfectly/"&gt;The Wrong Product, Built Perfectly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/honest-perf-benchmarks-paid-api-compiler/"&gt;Honest Perf Benchmarks for a Paid-API Compiler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/manifest-system-mutation-testing-pyramid/"&gt;Manifest System + Mutation Testing: Two Ways to Find Out What Actually Works&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Canonical URL (Start AI Tools, published 2026-06-10): &lt;a href="https://startaitools.com/posts/honor-the-gate-when-the-verdict-is-inconvenient/" rel="noopener noreferrer"&gt;https://startaitools.com/posts/honor-the-gate-when-the-verdict-is-inconvenient/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>cicd</category>
      <category>mlengineering</category>
      <category>devops</category>
    </item>
    <item>
      <title>Human-in-the-Loop Is a Delivery Guarantee, Not a UI Feature</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Sun, 14 Jun 2026 13:00:27 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/human-in-the-loop-is-a-delivery-guarantee-not-a-ui-feature-3je7</link>
      <guid>https://dev.to/jeremy_longshore/human-in-the-loop-is-a-delivery-guarantee-not-a-ui-feature-3je7</guid>
      <description>&lt;p&gt;Two repos. One missing guarantee.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;agent-governance-plane&lt;/code&gt;, a human's approval was cryptographically signed with Ed25519 and written to a tamper-evident journal. Solid. Except the only &lt;code&gt;InteractionSource&lt;/code&gt; wired into the system was an in-memory test stub. A human could see an Allow/Deny prompt — but the click had no way home. The signed approval was a letter with no mailbox.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;claude-code-slack-channel&lt;/code&gt;, an agent's reply to a Slack thread was a synchronous tool call. If the process died between "decide to send" and "send," the reply vanished. No turn-terminal flush, no retry, no record that an obligation ever existed. The user just waited.&lt;/p&gt;

&lt;p&gt;Different repos, different directions of travel — one inbound (receive a human's decision), one outbound (deliver an agent's reply). Same hole: the part where a human and an agent actually hand work to each other was the part nobody made durable.&lt;/p&gt;

&lt;p&gt;On 2026-06-07 both repos closed that hole. They shipped the &lt;strong&gt;same four-move discipline&lt;/strong&gt;. And one repo's spec was lifted, by name, from the other's pattern. That is the part worth your attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reframe: this is distributed systems, not UI
&lt;/h2&gt;

&lt;p&gt;Human-in-the-loop gets filed under "product" — a button, a modal, a confirmation dialog. That framing is why it breaks in production. The moment a decision has to survive a crash, an ack-loss, or a dropped socket, you are no longer building UI. You are building a durable message system, and the well-known failure modes apply.&lt;/p&gt;

&lt;p&gt;The receiver and the reply path are both solving &lt;strong&gt;exactly-once delivery&lt;/strong&gt; (achieved, as usual, by at-least-once delivery plus idempotent handling), &lt;strong&gt;message deduplication&lt;/strong&gt;, and &lt;strong&gt;fail-closed defaults&lt;/strong&gt;. Strip away Slack and the problem is identical to any outbox or any consumer that must not lose, must not duplicate, and must not silently double-act.&lt;/p&gt;

&lt;p&gt;Four moves fall out of that framing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Record the obligation before the send.&lt;/strong&gt; Crash-before-send must be safe — the obligation outlives the process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stamp an idempotency key into the message&lt;/strong&gt; so a later scan can recognize "already delivered."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redeliver idempotently from a poller&lt;/strong&gt; — a background drain that reconciles from disk, so ack-loss redelivery is a no-op.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail closed on no-decision.&lt;/strong&gt; Timeout, dropped socket, no lease → deny and journal, or queue. Never crash, never silently double-act.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hold those four moves. They recur on both sides.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reply outbox: CCSC's durable delivery pattern
&lt;/h2&gt;

&lt;p&gt;CCSC has an inverted architecture, and the inversion is exactly why durability is hard: &lt;strong&gt;a reply is a synchronous tool call, not a turn-terminal event.&lt;/strong&gt; The agent calls a reply tool mid-reasoning and expects delivery to just happen, so there is no natural "end of turn" to flush at. That is what the &lt;strong&gt;outbox pattern&lt;/strong&gt; is for: record the obligation durably &lt;em&gt;before&lt;/em&gt; attempting the send, so a crash can't lose it. The durability machinery therefore has to live behind the tool, invisible to the caller.&lt;/p&gt;

&lt;p&gt;The reply-delivery contract (ADR-002 addendum, "Option A: a safety-net behind the reply tool") is deliberately narrow: &lt;strong&gt;single-message text replies only.&lt;/strong&gt; One obligation equals one message, so the idempotency is exact. Chunked, file, and streaming replies stay best-effort and do not enqueue — zero double-send risk, durability deferred to later beads. Scope discipline is part of the design, not a shortcut.&lt;/p&gt;

&lt;p&gt;The machinery lives in &lt;code&gt;slack-delivery.ts&lt;/code&gt; — a side-effect-free sibling module, deliberately &lt;em&gt;not&lt;/em&gt; inline in &lt;code&gt;server.ts&lt;/code&gt;, so it is testable without dragging in server module-load side effects. The center of it is &lt;code&gt;deliverReplyDurably&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Record the obligation BEFORE the send (move 1).&lt;/span&gt;
&lt;span class="c1"&gt;// Crash here is safe — the poller will find the pending obligation and drain it.&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;deliverReplyDurably&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IdempotentSendDeps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReplyObligation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;obligation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recordPending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// UUID id == idempotency key&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// one inline attempt&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markDelivered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;delivered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isTransient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Queued — the poller redelivers idempotently. Tell the agent "queued"&lt;/span&gt;
      &lt;span class="c1"&gt;// so it does NOT retry and double-post (move 4: never double-act).&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;queued&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markDead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// non-retryable → recorded dead&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The obligation id is a fresh UUID per reply call, and that id &lt;em&gt;is&lt;/em&gt; the idempotency key. So when the poller later redelivers the same obligation, it dedups against itself. That is move 2, and it lives in the message metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The single site that stamps delivery metadata onto the outbound message.&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;postReply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;WebClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReplyObligation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;postMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;thread_ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;obligation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DELIVERY_METADATA_EVENT_TYPE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;event_payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move 3 is the scan. Before sending, or when redelivering, &lt;code&gt;findDelivered&lt;/code&gt; walks the destination thread via &lt;code&gt;conversations.replies&lt;/code&gt; with &lt;code&gt;include_all_metadata: true&lt;/code&gt;, looking for a message &lt;em&gt;we&lt;/em&gt; posted carrying our delivery &lt;code&gt;event_type&lt;/code&gt; and a matching &lt;code&gt;idempotency_key&lt;/code&gt;. A hit returns the existing &lt;code&gt;ts&lt;/code&gt; — so an ack-loss redelivery becomes a no-op instead of a duplicate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// A redelivery after ack-loss finds the prior post and returns its ts. No second message.&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;findDelivered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;WebClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conversations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replies&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;include_all_metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;DELIVERY_METADATA_EVENT_TYPE&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;event_payload&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;idempotency_key&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The poller itself (PR #228) is a &lt;code&gt;deliveryTimer&lt;/code&gt; tick calling &lt;code&gt;supervisor.drainOutbox()&lt;/code&gt; on an interval — &lt;code&gt;SLACK_DELIVERY_POLL_MS&lt;/code&gt;, default 15s, the timer &lt;code&gt;unref&lt;/code&gt;'d so it never holds the process open. A &lt;strong&gt;boot-time drain&lt;/strong&gt; recovers crash-pending obligations on startup, and the timer is cleared on shutdown before the supervisor drain, mirroring the existing idle-reaper exactly.&lt;/p&gt;
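
&lt;p&gt;The post does not show &lt;code&gt;drainOutbox&lt;/code&gt; itself, so the following is only a sketch of what one drain pass might do, under simplifying assumptions: everything is synchronous and in memory, and &lt;code&gt;drainOnce&lt;/code&gt;, &lt;code&gt;Pending&lt;/code&gt;, and &lt;code&gt;DrainDeps&lt;/code&gt; are hypothetical names. The timer wiring and the boot-time call are omitted.&lt;/p&gt;

```typescript
// Hypothetical in-memory sketch of one drain pass; the real drainOutbox is not shown in the post.
interface Pending {
  id: string;   // obligation id, doubling as the idempotency key
  text: string;
}

interface DrainDeps {
  findDelivered: (key: string) => string | null; // existing message ts, or null
  send: (p: Pending) => string;                  // posts and returns the new ts
  markDelivered: (id: string, ts: string) => void;
}

function drainOnce(pending: Pending[], deps: DrainDeps): number {
  let sent = 0;
  for (const p of pending) {
    // Scan first: if a message with this key already exists, the earlier send
    // succeeded and only the ack was lost, so redelivery is a no-op.
    let ts = deps.findDelivered(p.id);
    if (ts === null) {
      ts = deps.send(p);
      sent += 1;
    }
    deps.markDelivered(p.id, ts);
  }
  return sent;
}

// ob-1 was posted before a crash but never marked delivered; ob-2 was never sent.
const thread: { [key: string]: string } = { "ob-1": "1718000000.000100" };
const deliveredIds: string[] = [];
const sentCount = drainOnce(
  [{ id: "ob-1", text: "hello" }, { id: "ob-2", text: "world" }],
  {
    findDelivered: (key) => thread[key] ?? null,
    send: (p) => {
      const ts = "1718000000.000200";
      thread[p.id] = ts;
      return ts;
    },
    markDelivered: (id) => { deliveredIds.push(id); },
  },
);
```

&lt;p&gt;In this run only ob-2 triggers a real send, yet both obligations end up marked delivered. Running the same pass again would send nothing, which is what makes it safe to call from both the interval tick and the boot-time drain.&lt;/p&gt;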

&lt;p&gt;Fail-closed also means degrading gracefully when the outbox itself is unavailable. &lt;code&gt;DurableUnavailableError&lt;/code&gt; is thrown &lt;em&gt;before&lt;/em&gt; any obligation is recorded — the outbox isn't activated, or there's no lease. The caller catches it and falls back to the prior direct send. Nothing is persisted, nothing needs redelivery, and there is zero regression versus the old path. Durability is additive; its absence degrades gracefully to what shipped before.&lt;/p&gt;
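
&lt;p&gt;A minimal sketch of that fallback follows. Only the error name &lt;code&gt;DurableUnavailableError&lt;/code&gt; comes from the post; &lt;code&gt;sendReply&lt;/code&gt;, &lt;code&gt;ReplyPaths&lt;/code&gt;, and the synchronous signatures are assumptions made to keep the example short.&lt;/p&gt;

```typescript
// Hypothetical sketch; only the error name DurableUnavailableError comes from the post.
class DurableUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DurableUnavailableError";
  }
}

function isDurableUnavailable(err: unknown): boolean {
  return err instanceof Error ? err.name === "DurableUnavailableError" : false;
}

interface ReplyPaths {
  durable: (text: string) => string; // may throw DurableUnavailableError before recording anything
  direct: (text: string) => string;  // the prior best-effort send
}

function sendReply(paths: ReplyPaths, text: string): { via: "durable" | "direct"; ts: string } {
  try {
    return { via: "durable", ts: paths.durable(text) };
  } catch (err) {
    // Only the "outbox unavailable" case falls back. Nothing was persisted,
    // so a direct send cannot produce a duplicate later.
    if (isDurableUnavailable(err)) {
      return { via: "direct", ts: paths.direct(text) };
    }
    throw err; // any other failure is surfaced, not swallowed
  }
}

// Outbox not activated: the durable path refuses up front, the direct path answers.
const fallback = sendReply(
  {
    durable: () => { throw new DurableUnavailableError("outbox not activated"); },
    direct: () => "1718000000.000300",
  },
  "hello",
);
```

&lt;p&gt;The narrow catch is the design choice worth noting: falling back on every error would reintroduce the double-send risk for obligations that were already recorded.&lt;/p&gt;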

&lt;h2&gt;
  
  
  Why not the obvious approach?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why not just retry inline?&lt;/strong&gt; Because inline retry only survives failures the process is alive to handle. The crash-before-send window — record nothing, send nothing, die — is exactly the window inline retry can't cover. The obligation has to exist on disk before the attempt, or there is nothing to retry from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not best-effort fire-and-forget?&lt;/strong&gt; Because "the reply usually arrives" is not a contract a human can build on. In a HITL loop the reply &lt;em&gt;is&lt;/em&gt; the work product. Best-effort means the agent thinks it answered and the human is still waiting — the worst failure, because nobody knows it failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is fail-open the dangerous default for an approval gate?&lt;/strong&gt; This is the load-bearing one. If a receiver times out and the system fails &lt;em&gt;open&lt;/em&gt;, the gated action proceeds without a human decision. An approval gate has to act &lt;em&gt;exactly once&lt;/em&gt; on a real human decision; failing open makes it act on &lt;em&gt;no&lt;/em&gt; decision at all. The entire reason the gate exists is to stop unapproved actions; failing open deletes the gate precisely when it matters. For anything guarding an action, no-decision must mean &lt;strong&gt;deny&lt;/strong&gt;, not &lt;strong&gt;proceed&lt;/strong&gt;.&lt;/p&gt;
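
&lt;p&gt;Reduced to its smallest form, the rule is a function in which only an explicit approval returns &lt;code&gt;true&lt;/code&gt;. The &lt;code&gt;Outcome&lt;/code&gt; type and &lt;code&gt;mayProceed&lt;/code&gt; are hypothetical names for this sketch, not identifiers from either repo.&lt;/p&gt;

```typescript
// Hypothetical outcome type: either a human decided, or no decision arrived.
type Outcome =
  | { kind: "decided"; approved: boolean }
  | { kind: "timeout" }
  | { kind: "socket_dropped" };

function mayProceed(outcome: Outcome): boolean {
  // Only an explicit human approval opens the gate.
  if (outcome.kind === "decided") return outcome.approved;
  // Every no-decision outcome is a deny.
  return false;
}
```

&lt;p&gt;A fail-open version would return &lt;code&gt;true&lt;/code&gt; from the last line, which is the whole difference between a gate and a rubber stamp.&lt;/p&gt;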

&lt;h2&gt;
  
  
  The approval receiver: AGP's Socket Mode pattern
&lt;/h2&gt;

&lt;p&gt;AGP comes at the same four moves from the inbound side. PR #66 builds the production receiver per spec &lt;strong&gt;033-AT-SPEC — which is explicitly "lifted from the CCSC pattern, completes the HITL round-trip."&lt;/strong&gt; This is the keystone. What transferred wasn't code: CCSC delivers outbound and AGP receives inbound, so the two share not a single line. What transferred was the discipline — record the obligation, key it, reconcile it, fail closed — restated as a spec for an inbound approval channel. The convergence is not two teams independently reinventing a wheel; one read the other's pattern and applied it to the mirror-image problem.&lt;/p&gt;

&lt;p&gt;The transport is Socket Mode: an &lt;strong&gt;outbound&lt;/strong&gt; WebSocket from the control plane to Slack. No public ingress — which honors AGP's "no public surface until defensible" P0 decision. You get durability &lt;em&gt;and&lt;/em&gt; no inbound attack surface. That combination is the design point.&lt;/p&gt;

&lt;p&gt;Parsing is a pure function, so it is trivially testable and impossible to make stateful by accident:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// parseBlockAction — pure. block_actions payload → SlackInteraction, or null for noise.&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseBlockAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SlackPayload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;SlackInteraction&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;block_actions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// ack-and-ignore&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action_id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;approve&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;isBot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;is_bot&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The receiver holds a &lt;strong&gt;pending-by-nonce promise Map&lt;/strong&gt;, acks every envelope first (Slack drops you if you're slow), then resolves the promise returned by &lt;code&gt;awaitInteraction(nonce)&lt;/code&gt; on a matching click. A &lt;code&gt;resolved&lt;/code&gt; Set makes replay detection explicit. A click that arrives for an already-settled nonce is &lt;em&gt;reported as a replay&lt;/em&gt;, never acted on a second time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SocketModeInteractionSource&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Deferred&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Decision&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;resolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Surface a stray/replayed click via onRejected — report it, never act on it.&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* onRejected({ nonce, reason }) */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;onInteraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SlackInteraction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// move 4: a replayed click is a no-op, surfaced as a reason — not a second approval.&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nonce already used (replay)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown nonce&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
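    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// settled: drop the entry so the Map does not keep growing&lt;/span&gt;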
    &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Timeout defaults to the nonce TTL (5 min) — the receiver never out-waits the nonce.&lt;/span&gt;
  &lt;span class="nf"&gt;awaitInteraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* register + arm TTL timer */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That replay-detection guard is move 4 again, mirrored: the outbox's never-double-&lt;em&gt;send&lt;/em&gt; rule becomes the receiver's never-double-&lt;em&gt;act&lt;/em&gt; rule. Same principle, opposite direction of travel.&lt;/p&gt;

&lt;p&gt;Fail-closed shows up in three places, and all three matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;stop()&lt;/code&gt; closes the socket and &lt;strong&gt;fails closed on every still-pending approval&lt;/strong&gt; — a shutdown mid-decision denies, it does not hang.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;run.ts&lt;/code&gt;, &lt;code&gt;AGP_CHANNEL=slack&lt;/code&gt; plus &lt;code&gt;AGP_SLACK_LIVE=1&lt;/code&gt; constructs the live receiver. &lt;strong&gt;Unset &lt;code&gt;AGP_SLACK_LIVE&lt;/code&gt; fails closed&lt;/strong&gt; — the system refuses to post a prompt that nothing can answer. A prompt with no receiver is worse than no prompt; it implies a decision is being collected when none can be.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;daemon.ts&lt;/code&gt;, a no-decision (receiver timeout or socket drop) now &lt;strong&gt;fails closed (deny + journaled reason) instead of crashing the loop&lt;/strong&gt;, in both &lt;code&gt;mediate()&lt;/code&gt; and &lt;code&gt;gate()&lt;/code&gt;. It reuses the existing &lt;code&gt;approval.denied&lt;/code&gt; journal kind, so there is no schema change: downstream, a fail-closed deny is the same journal kind as an explicit human deny, and only the journaled reason tells them apart. The action does not happen, and there is a signed record of why.&lt;/li&gt;
&lt;/ul&gt;
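
&lt;p&gt;The first bullet, a &lt;code&gt;stop()&lt;/code&gt; that denies rather than hangs, might look something like this. It is a callback-based sketch under assumed names (&lt;code&gt;PendingApprovals&lt;/code&gt;, &lt;code&gt;Settle&lt;/code&gt;, the reason string); the receiver described above uses promises and a real socket.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration.
type Decision = { approved: boolean; reason: string };
type Settle = (decision: Decision) => void;

class PendingApprovals {
  // nonce -> callback that settles the waiting approval
  private pending: { [nonce: string]: Settle } = {};

  register(nonce: string, settle: Settle): void {
    this.pending[nonce] = settle;
  }

  // Shutdown path: every approval still waiting is denied with a recorded reason.
  // Returns how many approvals were failed closed.
  stop(): number {
    const nonces = Object.keys(this.pending);
    for (const nonce of nonces) {
      const settle = this.pending[nonce];
      if (settle) settle({ approved: false, reason: "receiver stopped" });
      delete this.pending[nonce];
    }
    return nonces.length;
  }
}
```

&lt;p&gt;Because the entries are removed as they are denied, a second &lt;code&gt;stop()&lt;/code&gt; is a no-op.&lt;/p&gt;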

&lt;p&gt;The &lt;code&gt;FetchWebSocketDialer&lt;/code&gt; (&lt;code&gt;apps.connections.open&lt;/code&gt; → &lt;code&gt;wss&lt;/code&gt;) injects both &lt;code&gt;fetch&lt;/code&gt; and the socket constructor, so the response and adapter logic is CI-tested; only the real &lt;code&gt;new WebSocket&lt;/code&gt; seam runs off-CI under &lt;code&gt;AGP_SLACK_LIVE&lt;/code&gt;. The same testability discipline as CCSC's side-effect-free module — extract the seam, test everything up to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The transferable rule
&lt;/h2&gt;

&lt;p&gt;If you are building any human-in-the-loop agent — Slack, email, a web approval, a CLI prompt — these four moves are your checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Record the obligation before the send.&lt;/strong&gt; Crash-before-send must be safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stamp an idempotency key into the message.&lt;/strong&gt; A later scan must be able to recognize "already done."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redeliver idempotently from a poller.&lt;/strong&gt; Reconcile from durable state; make redelivery a no-op.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail closed on no-decision.&lt;/strong&gt; Timeout, dropped socket, no lease → deny and journal, or queue. Never crash, never silently double-act.&lt;/li&gt;
&lt;/ol&gt;
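
&lt;p&gt;Moves 1 through 3 fit in a few lines once storage is abstracted away. The sketch below uses an in-memory array where a real system needs durable state, and every name in it (&lt;code&gt;Outbox&lt;/code&gt;, &lt;code&gt;Obligation&lt;/code&gt;, &lt;code&gt;Posted&lt;/code&gt;) is an assumption for illustration.&lt;/p&gt;

```typescript
// Hypothetical shapes; a real outbox persists obligations to disk.
type Obligation = { key: string; text: string; delivered: boolean };
type Posted = { text: string; idempotencyKey: string };

class Outbox {
  obligations: Obligation[] = [];

  // Move 1: record the obligation before any send is attempted.
  record(key: string, text: string): void {
    this.obligations.push({ key, text, delivered: false });
  }

  // Move 3: reconcile from recorded state. Returns how many messages were actually posted.
  drain(history: Posted[], post: (message: Posted) => void): number {
    let sent = 0;
    for (const o of this.obligations) {
      if (o.delivered) continue;
      // Move 2: the key stamped into the message makes "already done" recognizable.
      const alreadyPosted = history.some((m) => m.idempotencyKey === o.key);
      if (!alreadyPosted) {
        post({ text: o.text, idempotencyKey: o.key });
        sent += 1;
      }
      o.delivered = true;
    }
    return sent;
  }
}
```

&lt;p&gt;Draining a recovered outbox against a channel that already contains the keyed message posts nothing, which is what makes redelivery after a crash safe.&lt;/p&gt;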

&lt;p&gt;Move 4 is the non-negotiable one. Anything that gates an action must treat the absence of a human decision as a &lt;em&gt;no&lt;/em&gt;. Fail-open turns a safety gate into a rubber stamp the instant the network hiccups.&lt;/p&gt;

&lt;p&gt;The convergence is the evidence. When two systems built for mirror-image roles — one delivering replies out, one receiving approvals in — arrive at the same discipline, and one explicitly cites the other, that is not a local trick. It is the shape of the problem. And the discipline it demands is non-negotiable: an agent you can't trust to fail safely is an agent you can't deploy.&lt;/p&gt;

&lt;p&gt;Both shipped clean. CCSC's &lt;code&gt;slack-delivery.ts&lt;/code&gt; hit 100% line coverage; the test suite grew from 1127 to 1133 tests across the three PRs with all nine gates green each time, and the durable path was extracted to &lt;code&gt;executeReplyDurablePath&lt;/code&gt; to keep &lt;code&gt;executeReply&lt;/code&gt; under CRAP 30. AGP landed at 88.89% function / 91.43% line coverage — over the repo's configured floor — with typecheck, Biome, claim-scan, harness verify, and escape-scan all green. The PR sequencing in CCSC is itself the lesson: #228 wired the poller into the runtime without touching the reply tool path (&lt;a href="https://dev.to/posts/ship-dormant-wire-later-multi-agent-slack/"&gt;machinery live but dormant&lt;/a&gt;), #229 added the tested building block plus the ADR (design-first), #230 flipped &lt;code&gt;executeReply&lt;/code&gt; to route through it. The security-sensitive change got its own isolated PR. Ship dormant, wire later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped
&lt;/h2&gt;

&lt;p&gt;The same day, &lt;code&gt;claude-code-plugins&lt;/code&gt; landed a deterministic-CI grading track: a &lt;code&gt;pr-classifier&lt;/code&gt; doing file-level PR component detection, feeding per-domain lint workflows plus actionlint and a path-routing test, then a PR-level grade coordinator with golden fixtures — alongside the public 100-point grading rubric at &lt;code&gt;/grading&lt;/code&gt;, a 176-test pytest harness for the penetration-tester pack, and a switch of the PR pre-screen LLM from Groq to DeepSeek. And &lt;code&gt;contributing-clanker&lt;/code&gt; added two gates: C24 (engagement-frame) and C25 (maintainer-URL leakage).&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/ship-dormant-wire-later-multi-agent-slack/"&gt;Ship Dormant, Wire Later — A Multi-Agent Slack Production Day&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/making-agents-reliable-on-real-device-clouds/"&gt;Making Agents Reliable on Real-Device Clouds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/posts/server-ops-mcp-safety-before-tools/"&gt;Safety Model First: 16-Tool Ops MCP, One Day&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>aiagents</category>
      <category>typescript</category>
      <category>architecture</category>
      <category>slack</category>
    </item>
    <item>
      <title>The Wrong Product, Built Perfectly</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Wed, 10 Jun 2026 13:27:20 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/the-wrong-product-built-perfectly-2ipm</link>
      <guid>https://dev.to/jeremy_longshore/the-wrong-product-built-perfectly-2ipm</guid>
      <description>&lt;p&gt;A clean pipeline faithfully amplifies a requirements-level error all the way to production. Nothing in a good build process catches a misread spec. The process makes the misread arrive faster, signed, and TLS-valid.&lt;/p&gt;

&lt;p&gt;On June 5 we stood up a new site, deployed it through the canonical VPS-as-the-home pattern, and watched it go green — valid Let's Encrypt cert, &lt;code&gt;healthz&lt;/code&gt; returning 200, the whole chain — in under an hour. Then it was declared the wrong product. The entire premise was inverted in one sentence.&lt;/p&gt;

&lt;p&gt;Here is the part worth keeping: the reversal was cheap. Not because the process was good — it was, but that's not the reason. It was cheap because the expensive, durable infrastructure had been decoupled from the product frame we'd read wrong. A 100% inversion of the product cost us content, not a rebuild. That's the thesis, and it's the only insurance that paid off.&lt;/p&gt;

&lt;h2&gt;
  
  
  The build that worked
&lt;/h2&gt;

&lt;p&gt;The request arrived around 09:35 local: stand up &lt;code&gt;learn.intentsolutions.io&lt;/code&gt; as a "learning hub, open to public," with links out to the owner's other properties so it's "easy to know where to click for what." Deploy it the usual way.&lt;/p&gt;

&lt;p&gt;We read "learning hub, open to public" as an outward-facing marketing property — a hub where the public comes to learn. That reading drove everything downstream.&lt;/p&gt;

&lt;p&gt;The plan was complete and specific: an information architecture of hero → four-role audience triage → property cards → featured content → FAQ → footer. Charcoal Slate and Zinc color tokens with brutalist CTA accents. Full SEO: &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt;, OpenGraph and Twitter meta, JSON-LD &lt;code&gt;Organization&lt;/code&gt; + &lt;code&gt;WebSite&lt;/code&gt; + &lt;code&gt;FAQPage&lt;/code&gt; schema. The option to fan out five design subagents was on the table and declined as process theater — the IA was already specified, so the build went direct. Hold that detail; it matters later, and not in the direction you'd guess.&lt;/p&gt;

&lt;p&gt;The execution was clean and fast. The error was entirely upstream of it.&lt;/p&gt;

&lt;p&gt;The deploy followed the VPS-as-the-home pattern — eleven steps, every one of them touching real, durable external state. A public GitHub repo, &lt;a href="https://github.com/jeremylongshore/learn-intentsolutions" rel="noopener noreferrer"&gt;&lt;code&gt;jeremylongshore/learn-intentsolutions&lt;/code&gt;&lt;/a&gt;, with the Hugo scaffold pushed. An ed25519 deploy key whose public half lands on the VPS behind a &lt;a href="https://man.openbsd.org/sshd.8" rel="noopener noreferrer"&gt;force-command lock&lt;/a&gt;, so that key can do exactly one thing and nothing else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ssh"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /home/deploy/.ssh/authorized_keys on the VPS&lt;/span&gt;
&lt;span class="k"&gt;command&lt;/span&gt;="/usr/local/sbin/deploy-learn-intentsolutions",no-port-forwarding,no-pty ssh-ed25519 AAAA... learn-deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key cannot open a shell. It cannot forward a port. It triggers one script and exits. The script is the entire deploy surface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# /usr/local/sbin/deploy-learn-intentsolutions&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail
&lt;span class="nb"&gt;cd&lt;/span&gt; /srv/learn-intentsolutions/checkout
git fetch &lt;span class="nt"&gt;--quiet&lt;/span&gt; origin main
git reset &lt;span class="nt"&gt;--hard&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt; origin/main
hugo &lt;span class="nt"&gt;--minify&lt;/span&gt; &lt;span class="nt"&gt;--gc&lt;/span&gt;
rsync &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;--delete&lt;/span&gt; public/ /srv/learn-intentsolutions/dist/
&lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /srv/learn-intentsolutions/dist/healthz   &lt;span class="c"&gt;# fail the deploy if the build is empty&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the trust plumbing — Tailscale OIDC scoped to exactly this repo, deliberately not the org wildcard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Tailscale ACL — OIDC subject scoped to ONE repo&lt;/span&gt;
&lt;span class="s2"&gt;"subject"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"repo:jeremylongshore/learn-intentsolutions:*"&lt;/span&gt;
&lt;span class="c1"&gt;# NOT "repo:jeremylongshore/*:*" — a prior AAR's root-cause forbids the wildcard.&lt;/span&gt;
&lt;span class="c1"&gt;# A wildcard subject means any repo in the org can assume the deploy identity.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four GitHub Actions secrets set with the direct-argument form, because a prior post-mortem found zsh corrupts secrets piped through stdin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh secret &lt;span class="nb"&gt;set &lt;/span&gt;TS_OAUTH_CLIENT_ID    &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
gh secret &lt;span class="nb"&gt;set &lt;/span&gt;TS_OAUTH_SECRET       &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_SECRET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="c"&gt;# --body "$VALUE", never `echo "$VALUE" | gh secret set` — stdin gets mangled.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Caddy block for the subdomain, a Porkbun A-record at 60-second TTL for a fast cutover, and a &lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt; calling the reusable &lt;code&gt;vps-deploy.yml&lt;/code&gt; with &lt;code&gt;variant: static&lt;/code&gt;. Eleven steps, all of them the kind of state that's annoying to create and annoying to recreate.&lt;/p&gt;

&lt;p&gt;By every operational metric, this was a flawless ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The detour
&lt;/h2&gt;

&lt;p&gt;The deploy didn't go clean on the first pass, and the failure is worth a paragraph because it relocated the blame correctly.&lt;/p&gt;

&lt;p&gt;The Caddy reload hung. Two things were wrong at once, which is the worst kind of failure to debug. First, &lt;code&gt;systemctl reload caddy&lt;/code&gt; was timing out mid-apply; the systemd unit wrapper, not Caddy itself, was the bottleneck. Second, Let's Encrypt's &lt;a href="https://letsencrypt.org/docs/challenge-types/" rel="noopener noreferrer"&gt;HTTP-01 ACME challenge&lt;/a&gt; couldn't complete because the DNS A-record didn't resolve yet; there was no resolvable name for the challenge to hit.&lt;/p&gt;

&lt;p&gt;The fix was two moves. Reorder, so the DNS record exists before the reload and the ACME challenge has something to resolve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# WRONG order: reload first, then DNS — ACME challenge has no name to hit.&lt;/span&gt;
&lt;span class="c"&gt;# RIGHT order: DNS first, let it propagate, then reload.&lt;/span&gt;
porkbun-cli dns create intentsolutions.io &lt;span class="nt"&gt;--type&lt;/span&gt; A &lt;span class="nt"&gt;--name&lt;/span&gt; learn &lt;span class="nt"&gt;--content&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VPS_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--ttl&lt;/span&gt; 60
&lt;span class="c"&gt;# ...wait for resolution...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And when the systemd wrapper still timed out mid-apply — leaving Caddy half-configured, HTTP routes loaded but the TLS app missing the new subject — bypass the wrapper and talk to Caddy directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# systemctl reload caddy → hangs, half-applies, exit 1&lt;/span&gt;
&lt;span class="c"&gt;# caddy reload directly → returns immediately&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; caddy caddy reload &lt;span class="nt"&gt;--config&lt;/span&gt; /etc/caddy/Caddyfile &lt;span class="nt"&gt;--adapter&lt;/span&gt; caddyfile
&lt;span class="c"&gt;# EXIT=0&lt;/span&gt;
&lt;span class="c"&gt;# "certificate obtained successfully for learn.intentsolutions.io"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returned &lt;code&gt;0&lt;/code&gt; instantly and the cert came through. The lesson inside the lesson: the systemd unit was the slow part the whole time. Caddy reloads in well under a second; the wrapper was adding the timeout. An end-to-end deploy takes around 22 seconds once the infrastructure exists.&lt;/p&gt;

&lt;p&gt;Site live. Valid TLS. &lt;code&gt;healthz&lt;/code&gt; green. Under an hour, start to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reversal
&lt;/h2&gt;

&lt;p&gt;The clarification, when it came, was one sentence. The site was meant to be a place for the owner to study — a personal reference and notebook for AWS, Claude, Bedrock, and enterprise AI — not a public destination for an audience to learn alongside them.&lt;/p&gt;

&lt;p&gt;Wrong product. Not wrong execution — wrong product.&lt;/p&gt;

&lt;p&gt;Read the original phrase again: "learning hub, open to public." It has two opposite readings. One is a hub where &lt;em&gt;the public&lt;/em&gt; learns — outward-facing, audience-driven, the thing we built. The other is a hub where &lt;em&gt;the owner&lt;/em&gt; learns — a private notebook that happens to live on the open web. The first reading made the public the protagonist. The second makes the owner the only reader who matters.&lt;/p&gt;

&lt;p&gt;Everything we'd built assumed the first. The four-role audience triage sorted visitors who weren't coming. The property cards pitched to a reader who didn't exist for this site. The featured-content section curated for an audience of one who didn't want to be an audience. The &lt;code&gt;FAQPage&lt;/code&gt; schema answered questions nobody was going to ask. The brutalist CTA called an action that had no taker. The product layer was internally coherent and pointed entirely the wrong way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the reversal cost
&lt;/h2&gt;

&lt;p&gt;The frame inversion was a single commit: &lt;strong&gt;+436 / −687 across 16 files.&lt;/strong&gt; It deleted more than it added, which is the signature of a frame inversion rather than a feature change. You don't tweak a misread product; you demolish the part that encoded the misread.&lt;/p&gt;

&lt;p&gt;Five partials came out — &lt;code&gt;role-triage.html&lt;/code&gt;, &lt;code&gt;property-cards.html&lt;/code&gt;, &lt;code&gt;featured.html&lt;/code&gt;, &lt;code&gt;faq.html&lt;/code&gt;, &lt;code&gt;hero.html&lt;/code&gt; — along with the brutalist CTA that was the old &lt;code&gt;index.html&lt;/code&gt;. In came reading-optimized layouts — a &lt;code&gt;list.html&lt;/code&gt; topic landing and a &lt;code&gt;single.html&lt;/code&gt; note page with prev/next navigation. The CSS was rebuilt for long-form reading: a real typography scale, code blocks, tables, blockquotes, breadcrumbs. The dark theme and Inter typeface stayed because they were never the problem. We seeded a &lt;code&gt;/aws/&lt;/code&gt; section with canonical reference links and the first study note, "The AWS mental model," and rewrote the project's CLAUDE.md so the operating frame became "the owner studying," not "the owner teaching."&lt;/p&gt;

&lt;p&gt;A second commit — &lt;strong&gt;+212 / −3&lt;/strong&gt; — fleshed the home page into a curated link directory: 122 links across Anthropic &amp;amp; Claude (docs, cookbook, courses, MCP, prompt caching, vision, extended thinking, Trust Center, Privacy, Commercial Terms, AUP), Amazon Bedrock (Claude-on-Bedrock, Knowledge Bases, Agents, Guardrails, Prompt Management), AWS general, Enterprise AI (PrivateLink, HIPAA, Organizations, SCPs, Control Tower, Amazon Q), and a GDPR / EU data-protection section spanning both vendors — including the sharp note that Bedrock acting as an intermediate processor is &lt;em&gt;not&lt;/em&gt; in Anthropic's sub-processor list, because that relationship lives in the AWS contract, not Anthropic's.&lt;/p&gt;

&lt;p&gt;Now the ledger that explains why this didn't hurt:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Survived the reversal at zero cost&lt;/th&gt;
&lt;th&gt;Discarded&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub repo &lt;code&gt;learn-intentsolutions&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;role-triage.html&lt;/code&gt; partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPS checkout + dist directory&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;property-cards.html&lt;/code&gt; partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Force-command deploy key&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;featured.html&lt;/code&gt; partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tailscale OIDC trust (repo-scoped)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;faq.html&lt;/code&gt; partial + &lt;code&gt;FAQPage&lt;/code&gt; schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caddy subdomain block&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hero.html&lt;/code&gt; partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Porkbun DNS A-record&lt;/td&gt;
&lt;td&gt;Brutalist CTA &lt;code&gt;index.html&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Let's Encrypt TLS cert&lt;/td&gt;
&lt;td&gt;~687 lines of layouts + CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;deploy.yml&lt;/code&gt; CI workflow&lt;/td&gt;
&lt;td&gt;The entire "audience" frame&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every expensive, durable thing survived untouched. The deploy key didn't care that the product flipped. The TLS cert is for a domain, not a design. The CI workflow ships whatever is in &lt;code&gt;public/&lt;/code&gt;. What got thrown away was the cheap, reversible layer — markup, partials, copy. The whole reversal was about an hour of content rework. Not a rebuild from zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a clean pipeline can't save you
&lt;/h2&gt;

&lt;p&gt;This is the structural point, and it's the reason the post exists.&lt;/p&gt;

&lt;p&gt;The error lived at the requirements layer. The pipeline below it was faithful — and faithful is exactly the problem. A high-quality pipeline does not catch a misread spec. It executes the misread perfectly and hands you the wrong thing at production grade. We've watched the same shape from a different angle — &lt;a href="https://dev.to/posts/vite-dev-server-in-production-the-871-byte-tell/"&gt;a React app whose container shipped the Vite dev server to every visitor&lt;/a&gt;, green health checks and all. Every downstream step compounded the original reading: the plan assumed an audience, the IA sorted that audience, the schema described that audience, the deploy shipped it to that audience. Each step was correct relative to the one above it, and the one at the very top was wrong.&lt;/p&gt;

&lt;p&gt;The five declined design subagents are the tell. People reach for "more process at the execution layer" as the fix for shipping the wrong thing. It isn't. Those five subagents would have made the &lt;em&gt;audience&lt;/em&gt; version more beautiful — better triage copy, tighter property cards, a more polished FAQ. They would have polished the wrong product. Spending more at the execution layer when the defect is at the requirements layer just buys you a higher-fidelity mistake.&lt;/p&gt;

&lt;p&gt;Declining them was the right call. It made the build fast, and the speed didn't cause the miss — the miss was already baked in before the first subagent could have run. Execution-layer process spends its budget improving an artifact you've already committed to. It pays off only when the artifact is the right one. The cheap save here was never available at the execution layer; it was one clarifying question at the spec layer, which costs a sentence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The principle: couple to what you trust
&lt;/h2&gt;

&lt;p&gt;When you build something speculative — a new property, a first cut, anything where the spec might be a misread — the risk question is not "will I get the spec right?" You might not. The honest question is: &lt;strong&gt;what did I couple to the part I might get wrong?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Couple the expensive-and-durable to the cheap-and-reversible and a misread spec becomes a demolition — the blast radius is the whole system. Decouple them and the same misread is contained to the product layer: a content edit. That's the entire move:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expensive and durable&lt;/strong&gt; — repo, deploy key, OIDC trust, domain, DNS, TLS, CI. Annoying to build, annoying to rebuild. Make it product-agnostic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cheap and reversible&lt;/strong&gt; — IA, layouts, partials, copy, schema, color. Easy to throw away and redo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The VPS-as-the-home pattern is product-agnostic &lt;em&gt;by design&lt;/em&gt; — it ships whatever is in &lt;code&gt;public/&lt;/code&gt; to whatever domain you point it at. It never knew or cared whether the site was a marketing hub or a private notebook. That indifference is the feature. Because the durable layer didn't encode the product frame, flipping the frame couldn't damage it.&lt;/p&gt;

&lt;p&gt;So the takeaway, stated flat: &lt;strong&gt;decouple the parts you can't cheaply rebuild from the parts you might have read wrong.&lt;/strong&gt; Then being wrong costs you content, not infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tradeoffs, honestly
&lt;/h3&gt;

&lt;p&gt;Decoupling is not free. The generic VPS-as-the-home pattern is more ceremony than &lt;code&gt;scp -r public/ server:/var/www&lt;/code&gt;. Force-command keys, repo-scoped OIDC, reusable workflows, DNS-before-reload ordering — that's real setup overhead, and most of it is invisible until the day you're wrong. If you're never wrong about a spec, you paid for insurance you didn't use. The payoff is asymmetric: small, constant cost; large, occasional save. On a speculative build, where "wrong spec" is a live possibility, the trade is worth it.&lt;/p&gt;

&lt;p&gt;And don't draw the wrong conclusion about the five design subagents we declined in the planning stage. Building direct was correct. The fix for this miss is not "use more agents" or "add a design review stage." Both spend at the execution layer, which is the wrong layer. The fix is a cheap clarifying question at the spec layer — "open to public, meaning the public reads it, or meaning it's your notebook that's publicly visible?" — which was one sentence away the entire time. Heavier execution is the expensive cure for a cheap disease.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;agent-governance-plane&lt;/strong&gt; released v0.1.44, anchored by a 546-line competitive-landscape analysis that did something most competitive docs don't: it corrected the project's own prior claims. A competitor's audit log turned out to be Merkle + HMAC — symmetric — not "merely tamper-evident" as we'd characterized it, which means the real differentiator isn't "they don't sign," it's that our signing is asymmetric and publicly verifiable via Ed25519. The doc also pinned the EU AI Act high-risk obligations to the date set by the &lt;a href="https://www.consilium.europa.eu/en/press/press-releases/2026/05/07/artificial-intelligence-council-and-parliament-agree-to-simplify-and-streamline-rules/" rel="noopener noreferrer"&gt;May 2026 Digital Omnibus agreement&lt;/a&gt; — 2027-12-02, pending Official Journal publication — rather than the often-cited 2026 deadline. The discipline worth naming: a competitive document is more useful when it corrects your own marketing than when it flatters it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;intentsolutions-vps-runbook&lt;/strong&gt; got its production alerting rebuilt ntfy-first. It now has six topics (health, uptime, backups, deploys, security, incidents): routine alerts go to ntfy only, while high and urgent alerts, plus every security event, also hit Slack. The root cause of the old single-channel firehose was unglamorous: a &lt;code&gt;SLACK_WEBHOOK_FIREHOSE&lt;/code&gt; variable was never set, so every alert fell back to one webhook. The rebuild added an &lt;code&gt;llm_normalize()&lt;/code&gt; step (Groq → NVIDIA → raw passthrough fallback chain, hard 6-second timeout so it never blocks an alert) to render alerts in plain English, plus a 398-line operator handover packet for a new DevOps owner.&lt;/p&gt;
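
&lt;p&gt;The routing rule is small enough to sketch in shell. This is an illustration only: the &lt;code&gt;route_alert&lt;/code&gt; function, its argument order, and the channel names it prints are assumptions, not code from the runbook.&lt;/p&gt;

```shell
# Hypothetical sketch of the ntfy-first routing rule.
# route_alert TOPIC PRIORITY
# Prints the space-separated list of channels the alert should reach.
route_alert() {
  topic=$1
  priority=$2

  # Every alert goes to ntfy.
  channels="ntfy"

  if [ "$topic" = "security" ]; then
    # Security events always escalate, whatever their priority.
    channels="ntfy slack"
  else
    case "$priority" in
      high|urgent) channels="ntfy slack" ;;
    esac
  fi

  printf '%s\n' "$channels"
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;route_alert backups default&lt;/code&gt; prints only &lt;code&gt;ntfy&lt;/code&gt;, while &lt;code&gt;route_alert security low&lt;/code&gt; prints both channels. Keeping the decision in one pure function means a missing webhook variable cannot silently change where an alert lands.&lt;/p&gt;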

&lt;h2&gt;
  
  
  Related posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/vite-dev-server-in-production-the-871-byte-tell/"&gt;The Vite Dev Server in Production: The 871-Byte Tell&lt;/a&gt; — another story of shipping the wrong thing to production cleanly, and the small artifact that finally gave it away.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/server-ops-mcp-safety-before-tools/"&gt;Server-Ops MCP: Safety Before Tools&lt;/a&gt; — the same instinct as the force-command deploy key: constrain the blast radius before you hand anything the keys.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/self-expiring-report-only-ci-gates/"&gt;Self-Expiring, Report-Only CI Gates&lt;/a&gt; — on putting process spend where it actually pays instead of where it feels productive.&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>architecture</category>
      <category>devops</category>
      <category>claudecode</category>
      <category>hugo</category>
    </item>
    <item>
      <title>From one adopter to two: the discovery-affordance spec just got named</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Sat, 06 Jun 2026 13:00:21 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/from-one-adopter-to-two-the-discovery-affordance-spec-just-got-named-24nd</link>
      <guid>https://dev.to/jeremy_longshore/from-one-adopter-to-two-the-discovery-affordance-spec-just-got-named-24nd</guid>
      <description>&lt;p&gt;A few weeks ago I shipped schema 3.4.0 on &lt;a href="https://github.com/jeremylongshore/claude-code-plugins-plus-skills" rel="noopener noreferrer"&gt;claude-code-plugins-plus-skills&lt;/a&gt;. The headline feature: a single machine-crawlable manifest indexing 2,783 skills, ~97 KB gzipped. Any client — a marketplace UI, a CLI installer, a federated search index — can browse or search the catalog without scraping the repo directory by directory.&lt;/p&gt;

&lt;p&gt;The point of the manifest wasn't "here is a better README." It was a bet that skill ecosystems past a few hundred entries need an affordance for clients, not just humans. Without one, every new tool has to write a custom crawler against each repo's layout. With one, the work happens once at the spec layer and any compliant index is queryable.&lt;/p&gt;

&lt;p&gt;I shipped it. One launch consumer (mine). Whether the bet held depended entirely on whether anyone else built against the same shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The validation moment
&lt;/h2&gt;

&lt;p&gt;Two weeks ago &lt;code&gt;latentloop07&lt;/code&gt; — building &lt;a href="https://github.com/luna-prompts/skillnote" rel="noopener noreferrer"&gt;skillnote&lt;/a&gt; at luna-prompts — opened &lt;a href="https://github.com/sickn33/antigravity-awesome-skills/issues/596" rel="noopener noreferrer"&gt;issue #596&lt;/a&gt; on &lt;a href="https://github.com/sickn33/antigravity-awesome-skills" rel="noopener noreferrer"&gt;sickn33/antigravity-awesome-skills&lt;/a&gt;. The repo is a 39,715-star skills library, currently around 1,500 skills across Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and others. The issue title: &lt;em&gt;"at 1,400 skills, how are users actually picking which to install?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The body lays out the discovery problem at scale, then names my work as the existing model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;jeremylongshore just shipped one for his 2,783-skill &lt;code&gt;claude-code-plugins-plus-skills&lt;/code&gt; collection (schema 3.4.0, 97 KB gzipped) and it's the cleanest path we've seen for federated discovery so far.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I didn't file the issue. I didn't post in the thread. The pattern got picked up because the artifact was there to point at.&lt;/p&gt;

&lt;p&gt;sickn33 — the repo owner — responded by shipping it. The closeout comment is the part worth reading verbatim:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We addressed it on &lt;code&gt;main&lt;/code&gt; with a stable discovery-manifest contract rather than asking clients or users to scrape the full repository: kept the root &lt;code&gt;skills_index.json&lt;/code&gt; as the canonical public manifest in the existing array format; mirrored that manifest exactly to &lt;code&gt;data/skills_index.json&lt;/code&gt; for compatibility with integrations that already look under &lt;code&gt;data/&lt;/code&gt;; added &lt;code&gt;schemas/skills-index.v1.schema.json&lt;/code&gt; so downstream consumers can validate the manifest shape; added &lt;code&gt;docs/users/discovery-manifest.md&lt;/code&gt; documenting the canonical root manifest, compatibility mirror, array compatibility, lazy-loading pattern, and the expectation that clients should not load every skill up front.&lt;/p&gt;

&lt;p&gt;That gives us the &lt;strong&gt;Jeremy Longshore-style discovery affordance&lt;/strong&gt; without replacing the existing canonical public manifest or forcing a breaking format change.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the line that made me pause. Not the adoption — the naming. A 39.7k-star repo's maintainer wrote "Jeremy Longshore-style discovery affordance" into a public closeout comment, into a file format that lives in the repo on &lt;code&gt;main&lt;/code&gt;, where future contributors will read it as the canonical reference for why the schema exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "spec, not opinion"
&lt;/h2&gt;

&lt;p&gt;Single-repo specs are easy to mistake for what they're not. Anyone can write a JSON schema, slap a version on it, and call it a standard. Until someone else builds against it, the schema is a strongly-held opinion with a &lt;code&gt;.json&lt;/code&gt; extension.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MarketplaceJsonProvider&lt;/code&gt;, the consumer-side abstraction skillnote is building, needed at least two real upstream implementations before its shape could be trusted. With one source, the abstraction quietly inherits everything that source happens to do. With two, the abstraction has to actually hold up against both.&lt;/p&gt;

&lt;p&gt;As of the closeout, that's the state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;claude-code-plugins-plus-skills&lt;/code&gt; schema 3.4.0 — 2,783 skills, my repo&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sickn33/antigravity-awesome-skills&lt;/code&gt; &lt;code&gt;skills-index.v1.schema.json&lt;/code&gt; — ~1,500 skills, independent author&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;latentloop07's &lt;a href="https://github.com/luna-prompts/skillnote" rel="noopener noreferrer"&gt;skillnote&lt;/a&gt; is wiring its &lt;code&gt;MarketplaceJsonProvider&lt;/code&gt; adapter to ingest both from day one. That's the consumer side of the same spec story: two independent upstreams, one consumer abstraction, and the abstraction has to survive being implemented against both without leaking either schema's quirks.&lt;/p&gt;

&lt;p&gt;Two adopters is the floor, not the ceiling. But it's the line below which "spec" doesn't mean anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for tonsofskills.com users
&lt;/h2&gt;

&lt;p&gt;The practical effect for anyone using the &lt;a href="https://tonsofskills.com" rel="noopener noreferrer"&gt;tonsofskills.com&lt;/a&gt; marketplace: the catalog isn't trapped in a single-vendor index anymore.&lt;/p&gt;

&lt;p&gt;When a third-party tool wants to surface a skill from claude-code-plugins-plus-skills or from antigravity-awesome-skills, it doesn't need a custom integration for either. It reads the manifest, filters by category or risk profile, and lazy-loads only the specific &lt;code&gt;SKILL.md&lt;/code&gt; bodies it needs to render. The marketplace UX gets to focus on browse-and-install instead of crawl-and-cache.&lt;/p&gt;

&lt;p&gt;For end users that translates to one thing: discovery scales past the point where reading a README is feasible. If you've ever tried to find the right skill in a 1,400-entry directory tree without a search box, you've felt the gap the manifest closes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next: &lt;code&gt;install_source_url&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The current schema lets a consumer find a skill. It doesn't yet tell the consumer where to install it from in a fully machine-readable way. The convention right now is heuristic — strip the path, prepend the GitHub raw URL, assume the layout. Works most of the time. Fails silently the rest.&lt;/p&gt;

&lt;p&gt;The next schema feature is a per-entry &lt;code&gt;install_source_url&lt;/code&gt; field: an explicit, validated URL pointing to the raw &lt;code&gt;SKILL.md&lt;/code&gt; (or skill bundle) that a client can fetch and install verbatim. No path heuristics. No "assume the repo's layout follows convention X." The skill says where it lives, and the installer respects that.&lt;/p&gt;
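
&lt;p&gt;A rough sketch of how a client might treat the two cases, preferring the explicit field and falling back to the path heuristic only when it is absent. The function name &lt;code&gt;resolve_install_url&lt;/code&gt;, its argument order, and the assumption that a skill body sits at &lt;code&gt;PATH/SKILL.md&lt;/code&gt; are all hypothetical; neither schema defines them.&lt;/p&gt;

```shell
# Hypothetical resolver, not part of either published schema.
# resolve_install_url EXPLICIT_URL OWNER_REPO BRANCH SKILL_PATH
resolve_install_url() {
  explicit=$1
  repo=$2
  branch=$3
  path=$4

  if [ -n "$explicit" ]; then
    # The entry declares where it lives; use that value verbatim.
    printf '%s\n' "$explicit"
    return 0
  fi

  # Heuristic fallback: assume the skill body is PATH/SKILL.md in the
  # manifesting repo and build a GitHub raw URL from it.
  path=${path%/}
  printf 'https://raw.githubusercontent.com/%s/%s/%s/SKILL.md\n' "$repo" "$branch" "$path"
}
```

&lt;p&gt;The fallback branch is the part that can fail silently: it returns a well-formed URL whether or not a file exists there. The explicit branch moves that responsibility to the manifest, where it can be validated once.&lt;/p&gt;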

&lt;p&gt;Why it matters past the immediate convenience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vendored skills become first-class.&lt;/strong&gt; A skill that lives outside the manifesting repo (mirrored, forked, hosted on a CDN) can declare its real install URL without the manifest having to lie about location.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-repo installs become safe.&lt;/strong&gt; A consumer that pulls from multiple upstreams doesn't have to maintain a per-upstream URL-construction rule. Each entry self-describes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails get cleaner.&lt;/strong&gt; "Where did this skill come from" becomes a single field lookup, not an inference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;latentloop07 mentioned filing the &lt;code&gt;install_source_url&lt;/code&gt; schema-feature request against my repo this week, sketched against the real sickn33 dataset. That's the next milestone for the spec — the first field added under multi-adopter pressure rather than single-vendor opinion. Different bar.&lt;/p&gt;

&lt;h2&gt;
  
  
  The take-home
&lt;/h2&gt;

&lt;p&gt;If you write a spec, you find out fast whether anyone wanted it. The signal isn't stars or downloads — it's whether somebody else, working independently, builds against the same shape and names it the same thing. That's the moment an opinion stops being yours.&lt;/p&gt;

&lt;p&gt;In this case, the moment came with a verbatim closeout comment on a 39.7k-star repo and a third-party consumer wiring both upstreams into one abstraction. Two adopters, one spec, one consumer in progress. The next test is whether &lt;code&gt;install_source_url&lt;/code&gt; survives the same scrutiny — added because both implementers want the same field, not because one repo's maintainer thought it'd be nice.&lt;/p&gt;

&lt;p&gt;I'll write up the &lt;code&gt;install_source_url&lt;/code&gt; design when the schema PR lands. In the meantime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The spec lives at &lt;a href="https://github.com/jeremylongshore/claude-code-plugins-plus-skills" rel="noopener noreferrer"&gt;claude-code-plugins-plus-skills&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The marketplace surfacing the catalog is at &lt;a href="https://tonsofskills.com" rel="noopener noreferrer"&gt;tonsofskills.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The thread that named the pattern: &lt;a href="https://github.com/sickn33/antigravity-awesome-skills/issues/596" rel="noopener noreferrer"&gt;sickn33/antigravity-awesome-skills#596&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The skillnote adapter (consumer side): &lt;a href="https://github.com/luna-prompts/skillnote" rel="noopener noreferrer"&gt;luna-prompts/skillnote&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PRs and feedback welcome.&lt;/p&gt;

</description>
      <category>spec</category>
      <category>ecosystem</category>
      <category>claudecode</category>
      <category>marketplaces</category>
    </item>
    <item>
      <title>Vite Dev Server in Production: The 871-Byte Tell</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Sat, 06 Jun 2026 03:36:41 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/vite-dev-server-in-production-the-871-byte-tell-1ogi</link>
      <guid>https://dev.to/jeremy_longshore/vite-dev-server-in-production-the-871-byte-tell-1ogi</guid>
      <description>&lt;h2&gt;
  
  
  The 871-byte tell
&lt;/h2&gt;

&lt;p&gt;A user reported scorecardecho.com was slow and "had to be refreshed" to render. The site fronts a React SPA behind Caddy, behind Docker Compose, behind a single VPS — a stack that had been "fine" for months. One &lt;code&gt;curl -s https://scorecardecho.com/ | head -40&lt;/code&gt; answered the ticket in under a minute. The HTML shell was 871 bytes. It contained &lt;code&gt;&amp;lt;script type="module" src="/@vite/client"&amp;gt;&lt;/code&gt;, &lt;code&gt;/@react-refresh&lt;/code&gt;, and a literal &lt;code&gt;src="/src/main.tsx"&lt;/code&gt; reference. That output is not a production response. That is the Vite dev server, exposed straight to the public internet, transpiling raw TypeScript per-request for every visitor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three smoking guns (the 60-second checklist)
&lt;/h2&gt;

&lt;p&gt;Each signal below is independent. Any one of them is enough to confirm a dev server is running in production. All three together make the diagnosis complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal 1: the container's own CMD.&lt;/strong&gt; Inspect what the running image was actually told to do:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker inspect frontend &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{{.Config.Cmd}}'&lt;/span&gt;
&lt;span class="c"&gt;# [npm run dev -- --host]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A production container has no business running &lt;code&gt;npm run dev&lt;/code&gt;. That string is the smoking gun in the container definition itself, surviving every restart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal 2: the port mapping.&lt;/strong&gt; Compose binds the framework's default development port:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose ps
&lt;span class="c"&gt;# frontend ... 127.0.0.1:5173-&amp;gt;5173/tcp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vite's dev port is 5173. Next.js dev is 3000. CRA is 3000. webpack-dev-server is 8080. If the port mapping matches a framework default-dev port (not a chosen runtime port like 80 or 8080-for-nginx), the container was built from a dev recipe.&lt;/p&gt;
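
&lt;p&gt;The port check above can be turned into a small lookup. This is a sketch under assumptions: the helper name &lt;code&gt;dev_port_hint&lt;/code&gt; is hypothetical, and the port list only mirrors the defaults named in the paragraph above.&lt;/p&gt;

```shell
# Hypothetical helper: report whether a container port matches a
# framework default dev port. Extend the list for your own stack.
dev_port_hint() {
  case "$1" in
    5173) printf 'vite dev server\n' ;;
    3000) printf 'next.js or CRA dev server\n' ;;
    # 8080 is also a common runtime choice, so treat it as a weak signal.
    8080) printf 'webpack-dev-server (weak signal)\n' ;;
    *)    printf 'no dev-port match\n' ;;
  esac
}
```

&lt;p&gt;A match is a prompt to look closer, not proof on its own; the container CMD and the HTML payload are the stronger signals.&lt;/p&gt;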

&lt;p&gt;&lt;strong&gt;Signal 3: the HTML payload.&lt;/strong&gt; The framework injects dev-only client scripts into the page shell. One grep tells the whole story:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://scorecardecho.com/ | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'@vite/client|@react-refresh|webpack-hmr|src/main'&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;script type="module" src="/@vite/client"&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;script type="module"&amp;gt;import RefreshRuntime from "/@react-refresh" ...&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;script type="module" src="/src/main.tsx?t=1748394610234"&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A built bundle has none of these. A dev server cannot avoid them — they are the runtime hooks that HMR depends on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;The Dockerfile was a single-stage &lt;code&gt;node:22-slim&lt;/code&gt; image with &lt;code&gt;CMD ["npm", "run", "dev", "--", "--host"]&lt;/code&gt;. It worked on a laptop. Someone (the project's own past self) copied it into production months ago because the site rendered, the port forwarded, and Caddy in front returned &lt;code&gt;200 OK&lt;/code&gt; on every probe. Compose layered on two volume mounts — &lt;code&gt;./frontend/src:/app/src&lt;/code&gt; and &lt;code&gt;./frontend/index.html:/app/index.html&lt;/code&gt; — which made HMR-driven local development feel instant. In production, those same mounts shadowed any &lt;code&gt;dist/&lt;/code&gt; directory the container might have built, guaranteeing the dev server was the only thing that could ever serve a request. Nothing alarmed because nothing was failing — the site was just doing transpile-on-demand for every visitor, every page load, forever.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: a two-stage Dockerfile
&lt;/h2&gt;

&lt;p&gt;The new &lt;code&gt;frontend/Dockerfile&lt;/code&gt; builds the SPA in one stage and serves the static &lt;code&gt;dist/&lt;/code&gt; in a second:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;-FROM node:22-slim AS base
+FROM node:22-slim AS build
 WORKDIR /app

 COPY package.json package-lock.json ./
 RUN npm ci

 COPY . .
+RUN npm run build
+
+FROM node:22-slim AS runtime
+WORKDIR /app
+
+# `serve` is a tiny static file server with built-in SPA fallback
+# (rewrites every unknown path to /index.html via `-s`).
+RUN npm install -g serve@14
+
+COPY --from=build /app/dist ./dist

 EXPOSE 5173
-CMD ["npm", "run", "dev", "--", "--host"]
+CMD ["serve", "-s", "dist", "-l", "5173", "-L"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three flags carry the load. &lt;code&gt;-s&lt;/code&gt; is "single-page application" mode: any path that does not resolve to a real file in &lt;code&gt;dist/&lt;/code&gt; rewrites to &lt;code&gt;/index.html&lt;/code&gt;, so the React Router routes survive a hard refresh. &lt;code&gt;-l 5173&lt;/code&gt; keeps the listen port identical to the dev configuration — every upstream piece (Caddy block on the VPS, compose port map, internal service name) keeps working with zero coordinated change. &lt;code&gt;-L&lt;/code&gt; is &lt;code&gt;serve&lt;/code&gt;'s &lt;code&gt;--no-request-logging&lt;/code&gt; switch; it suppresses the per-request log line the process would otherwise emit, which keeps the container's stdout channel quiet enough that real anomalies (a sudden burst of 4xxs, a backend exception leaking through) are not buried under access-log noise. The port was kept deliberately. Picking 80 or 3000 would have meant editing the Caddyfile and the compose file in the same deploy — a coordinated change for no benefit. Keeping 5173 made the switch a one-file diff at the perimeter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compose change
&lt;/h2&gt;

&lt;p&gt;The two source-mount volumes had to go:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend&lt;/span&gt;
     &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;span class="s"&gt;+&lt;/span&gt;    &lt;span class="c1"&gt;# Production: built static bundle served by `serve` (no dev server).&lt;/span&gt;
     &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;VITE_BACKEND_URL=http://backend:3001&lt;/span&gt;
     &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:5173:5173"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt;    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./frontend/src:/app/src&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./frontend/index.html:/app/index.html&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those mounts are HMR-essential in development — they are how a file save on the host instantly reflects in the running container. In production they actively shadow whatever the multi-stage image built into &lt;code&gt;/app/dist&lt;/code&gt;. Without removing them, the new build stage would still produce a real bundle, the runtime stage would still copy it into &lt;code&gt;dist/&lt;/code&gt;, and then compose would lay raw source files right back over the top of the runtime filesystem the instant the container started. The container would silently revert to transpile-on-demand from the mounted source — same behavior as before, just with extra build time and a confused operator wondering why the fix did not take. Removing the volumes is half the fix; the Dockerfile is the other half. Neither one in isolation would have worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification: predicted before / after
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Before (Vite dev server)&lt;/th&gt;
&lt;th&gt;After (built bundle)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SPA shell HTML&lt;/td&gt;
&lt;td&gt;~871 bytes (dev injects &lt;code&gt;/@vite/client&lt;/code&gt;, &lt;code&gt;/@react-refresh&lt;/code&gt;, &lt;code&gt;src/main.tsx&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~250 bytes (minified &lt;code&gt;index.html&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JS payload&lt;/td&gt;
&lt;td&gt;N raw &lt;code&gt;.tsx&lt;/code&gt; modules, transpiled per-request&lt;/td&gt;
&lt;td&gt;one minified bundle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container memory&lt;/td&gt;
&lt;td&gt;~136 MB (Node + Vite process)&lt;/td&gt;
&lt;td&gt;~30 MB (just &lt;code&gt;serve&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTFB on &lt;code&gt;/&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~370 ms&lt;/td&gt;
&lt;td&gt;single-digit ms (Caddy → &lt;code&gt;serve&lt;/code&gt; ≈ sendfile)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMR WebSocket&lt;/td&gt;
&lt;td&gt;attempted; periodically blocks rendering&lt;/td&gt;
&lt;td&gt;no WebSocket at all&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The HMR WebSocket line is the one users were feeling. Vite's client repeatedly tried to attach an HMR socket through Caddy, which had no &lt;code&gt;handle&lt;/code&gt; block for the upgrade. The attempt eventually timed out, sometimes mid-render, leaving the page partly hydrated — which is the exact "have to refresh" symptom that surfaced the bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  What would have caught this earlier
&lt;/h2&gt;

&lt;p&gt;Three of the five rows in the verification table are observable from outside the container, which means each of those three can be automated as a deploy-time smoke test. A post-deploy GitHub Actions step running against the public URL would have failed loudly on day one of the misconfiguration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fail the deploy if the production shell still contains dev-mode hooks.&lt;/span&gt;
&lt;span class="nv"&gt;HTML&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; &lt;span class="s2"&gt;"https://scorecardecho.com/"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HTML&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s1"&gt;'@vite/client|@react-refresh|webpack-hmr|sockjs-node'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: production HTML contains dev-server hooks"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Sanity-check the shell size. A built SPA index.html for a small site&lt;/span&gt;
&lt;span class="c"&gt;# lands in the low hundreds of bytes; a dev-injected shell is consistently&lt;/span&gt;
&lt;span class="c"&gt;# north of 700. Tune THRESHOLD to your own built index.html size + headroom&lt;/span&gt;
&lt;span class="c"&gt;# (this site's built shell is ~250 bytes, dev-injected was 871 — 600 is the&lt;/span&gt;
&lt;span class="c"&gt;# midpoint with room to grow before the threshold itself needs revisiting).&lt;/span&gt;
&lt;span class="nv"&gt;THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;600
&lt;span class="nv"&gt;SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HTML&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SIZE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$THRESHOLD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAIL: shell HTML is &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SIZE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; bytes (&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;THRESHOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) — looks dev-injected"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;About a dozen lines of shell, run once after every deploy, would have caught the regression the same hour it landed instead of waiting for a user to file a "feels slow" ticket months later. The smoke test does not require introspection inside the container; it does not need access to the VPS; it does not need credentials. It treats the production URL as the contract and asserts the contract holds. That is the cheapest possible enforcement layer, and it is the layer that should have existed from the first deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The transferable lesson
&lt;/h2&gt;

&lt;p&gt;The three-signal checklist works on every Node-based SPA container, not just Vite ones. Next.js dev injects &lt;code&gt;/_next/static/chunks/webpack.js&lt;/code&gt; with an HMR client. CRA dev injects &lt;code&gt;webpack-dev-server/client/index.js?protocol=ws&lt;/code&gt;. webpack-dev-server in any flavor injects &lt;code&gt;/sockjs-node&lt;/code&gt;. The signature is always the same: the framework injects a dev-only HMR client into the HTML shell, the default dev port is exposed on the host, and the CMD line still says "dev." Run &lt;code&gt;curl -s https://&amp;lt;your-site&amp;gt;/ | grep -E '@vite/client|webpack-hmr|_next/static/chunks/webpack\.js|sockjs-node|src/main\.(tsx|jsx)'&lt;/code&gt; against your own production URL right now. The Next.js pattern is anchored to the unhashed &lt;code&gt;webpack.js&lt;/code&gt; because production builds serve a hashed &lt;code&gt;webpack-*.js&lt;/code&gt; chunk from the same directory. If any line comes back, the container is shipping its dev server to every visitor. The fix is a multi-stage Dockerfile that costs nothing; the misconfiguration it replaces stays structurally invisible until somebody looks.&lt;/p&gt;
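
&lt;p&gt;For teams that would rather keep the check in a test suite than in shell, the same marker scan can be sketched in Python. This is a minimal illustration, not the code used on this site: &lt;code&gt;find_dev_markers&lt;/code&gt; and the sample fragments are hypothetical, the marker list is adapted from the grep above, and the Next.js pattern is tightened to the unhashed &lt;code&gt;webpack.js&lt;/code&gt; on the assumption that production builds use a hashed filename.&lt;/p&gt;

```python
import re

# Dev-server fingerprints, adapted from the grep in the text above.
DEV_MARKERS = re.compile(
    r"@vite/client|@react-refresh|webpack-hmr"
    r"|_next/static/chunks/webpack\.js|sockjs-node|src/main\.(?:tsx|jsx)"
)


def find_dev_markers(html):
    """Return the sorted, de-duplicated dev-server markers found in an HTML shell."""
    return sorted({match.group(0) for match in DEV_MARKERS.finditer(html)})


# Hypothetical attribute fragments standing in for a fetched shell.
dev_fragment = 'type="module" src="/@vite/client" type="module" src="/src/main.tsx"'
built_fragment = 'type="module" src="/assets/index-abc123.js"'

print(find_dev_markers(dev_fragment))
print(find_dev_markers(built_fragment))
```

&lt;p&gt;An empty list means the shell looks like a built bundle; any entry names the exact hook that leaked, which makes the failure message self-explanatory.&lt;/p&gt;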

</description>
      <category>vite</category>
      <category>docker</category>
      <category>production</category>
      <category>diagnostics</category>
    </item>
    <item>
      <title>The Unicode Layer Your Validator Can't See</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Thu, 28 May 2026 02:41:56 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/the-unicode-layer-your-validator-cant-see-35m1</link>
      <guid>https://dev.to/jeremy_longshore/the-unicode-layer-your-validator-cant-see-35m1</guid>
      <description>&lt;p&gt;A schema validator reads parsed structure. It never sees the bytes.&lt;/p&gt;

&lt;p&gt;That gap is where a whole class of supply-chain attack lives. The &lt;code&gt;claude-code-plugins&lt;/code&gt; marketplace already ran a schema validator over every skill, agent, command, and catalog file — confirming required fields, enum values, shapes. All of it operating on the &lt;em&gt;parsed&lt;/em&gt; document. None of it looking at the raw codepoints underneath.&lt;/p&gt;

&lt;p&gt;An attacker can hide an instruction in characters that are invisible to a human reviewer and invisible to a structural validator, yet fully meaningful to an LLM that parses the file as text — or to a shell that executes a line copied out of it. The reviewer sees &lt;code&gt;npm install left-pad&lt;/code&gt;. The model sees something else.&lt;/p&gt;

&lt;p&gt;On 2026-05-24 the Socket "TrapDoor" advisory described exactly this: invisible Unicode tag characters smuggling instructions into LLM-parsed content. The same day, we shipped a CI gate for it — and folded in the older Trojan Source class (CVE-2021-42574) while we were at it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The threat model has three shapes
&lt;/h2&gt;

&lt;p&gt;The detection model is built around three distinct attack surfaces, each with a different blast radius and a different rate of legitimate false positives. That asymmetry is the whole reason the gate is tiered instead of binary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tag characters (U+E0000–U+E007F).&lt;/strong&gt; These render as nothing. A human sees an empty span. An LLM reading the file as a token stream reads them as text — they can carry a complete hidden instruction inside what looks like an innocent line of documentation. This is the TrapDoor vector. There is no legitimate reason for a tag character to appear in a skill file. Unambiguous attack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional (bidi) overrides and isolates (U+202A–U+202E, U+2066–U+2069).&lt;/strong&gt; Trojan Source. These reorder how text &lt;em&gt;renders&lt;/em&gt; without changing how it &lt;em&gt;parses&lt;/em&gt;. The classic demonstration: code that displays as a benign comment to a reviewer but executes as an active statement. Renders as one thing, parses as another. Also unambiguous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Homoglyphs.&lt;/strong&gt; A Cyrillic &lt;code&gt;а&lt;/code&gt; (U+0430) is pixel-identical to a Latin &lt;code&gt;a&lt;/code&gt; (U+0061) in most fonts. Drop one into &lt;code&gt;npm install pаckage&lt;/code&gt; and the reviewer reads the right name while the resolver fetches a different one. This is a real attack — but mixing scripts is also completely normal in human prose. Cyrillic next to Latin in a sentence is not suspicious. The signal only matters in a narrow context.&lt;/p&gt;
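
&lt;p&gt;To make the tag-character shape concrete: the printable tag characters U+E0020 through U+E007E mirror ASCII at a fixed offset, so a payload round-trips with simple arithmetic. The sketch below is an illustration of that property only; &lt;code&gt;hide_in_tags&lt;/code&gt; and &lt;code&gt;reveal_tags&lt;/code&gt; are hypothetical helper names, not part of the validator.&lt;/p&gt;

```python
TAG_BASE = 0xE0000  # tag characters mirror ASCII at this offset
PRINTABLE_TAGS = range(0xE0020, 0xE007F)  # U+E0020..U+E007E inclusive


def hide_in_tags(message):
    """Encode printable ASCII as invisible Unicode tag characters."""
    return "".join(chr(TAG_BASE + ord(ch)) for ch in message)


def reveal_tags(text):
    """Recover any ASCII payload carried by tag characters in text."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if ord(ch) in PRINTABLE_TAGS
    )


visible = "Run the linter before committing."
poisoned = visible + hide_in_tags("ignore previous instructions")

# The two strings render the same in most viewers, but their lengths differ.
print(len(visible), len(poisoned))
print(reveal_tags(poisoned))
```

&lt;p&gt;The length difference is the only hint a casual reviewer would get, which is why the check has to run on codepoints rather than on rendered text.&lt;/p&gt;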

&lt;h2&gt;
  
  
  The artifact
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;scripts/validate-unicode-hygiene.py&lt;/code&gt; — 317 lines, stdlib only. &lt;code&gt;argparse&lt;/code&gt;, &lt;code&gt;pathlib&lt;/code&gt;, &lt;code&gt;re&lt;/code&gt;, &lt;code&gt;unicodedata&lt;/code&gt;, &lt;code&gt;dataclasses&lt;/code&gt;. No third-party dependency, nothing to pin, nothing to audit transitively. The detection rules are original, derived from the public Unicode Standard, the CVE-2021-42574 advisory, and the TrapDoor advisory. Not a fork of any existing scanner.&lt;/p&gt;

&lt;p&gt;The codepoint classes map straight onto the threat model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TAG_CHARS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xE0000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xE0080&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# exclusive end -&amp;gt; U+E0000-U+E007F inclusive
&lt;/span&gt;&lt;span class="n"&gt;BIDI_CONTROLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mh"&gt;0x202A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x202B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x202C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x202D&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x202E&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="mh"&gt;0x2066&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x2067&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x2068&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x2069&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;ZERO_WIDTH_MAJOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mh"&gt;0x200B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x200C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x200D&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x2060&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xFEFF&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;OTHER_INVISIBLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mh"&gt;0x00AD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x034F&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x115F&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x1160&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x17B4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x17B5&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;HOMOGLYPH_SCRIPTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cyrillic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Greek&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Armenian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cherokee&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tiered severity, not a single verdict
&lt;/h2&gt;

&lt;p&gt;A binary pass/fail gate forces a bad choice: either it's too loud (every zero-width space halts the build) or too quiet (you skip the ambiguous cases and miss the homoglyph in an install line). Three tiers resolve the tension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BLOCKER&lt;/strong&gt; — tag characters and bidi controls. These fail CI by default. There is no benign use, so there is no false-positive cost to refusing them outright.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAJOR&lt;/strong&gt; — zero-width and format characters (U+200B, U+200C, U+200D, U+2060, U+FEFF) anywhere they don't belong, plus other invisibles like soft hyphen (U+00AD), combining grapheme joiner (U+034F), Hangul fillers, and Khmer zero-width vowels. These &lt;em&gt;can&lt;/em&gt; be attacks, but they also show up in legitimate-if-messy authoring. So they warn by default and only fail under &lt;code&gt;--strict&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MINOR&lt;/strong&gt; — mixed-script identifiers, but scoped hard. The homoglyph pass fires only inside URLs, package-manager install lines, and code-fence language tags. Never on prose.&lt;/p&gt;

&lt;p&gt;That scoping is a deliberate design call. The line patterns the homoglyph pass inspects: &lt;code&gt;https?://&lt;/code&gt;, &lt;code&gt;npm&lt;/code&gt;/&lt;code&gt;pnpm&lt;/code&gt;/&lt;code&gt;yarn&lt;/code&gt;/&lt;code&gt;bun install&lt;/code&gt;, &lt;code&gt;pip&lt;/code&gt;/&lt;code&gt;uv install&lt;/code&gt;, &lt;code&gt;cargo install&lt;/code&gt;/&lt;code&gt;add&lt;/code&gt;, &lt;code&gt;brew&lt;/code&gt;/&lt;code&gt;gem&lt;/code&gt;/&lt;code&gt;composer&lt;/code&gt;/&lt;code&gt;go install&lt;/code&gt;, and &lt;code&gt;gh repo clone&lt;/code&gt;. Those are the lines a reader copies and runs. Prose is left alone because flagging every Greek letter in a math explanation would bury the one finding that matters.&lt;/p&gt;
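
&lt;p&gt;A minimal sketch of that scoping, under stated assumptions: the line pattern below covers only a subset of the commands listed above, the script of a character is approximated from the first word of its &lt;code&gt;unicodedata.name()&lt;/code&gt; (the standard library has no script property), and &lt;code&gt;mixed_script_tokens&lt;/code&gt; is a hypothetical name rather than the validator's own function.&lt;/p&gt;

```python
import re
import unicodedata

# Hypothetical subset of the URL and install-line patterns described above.
RISKY_LINE = re.compile(
    r"https?://|\b(?:npm|pnpm|yarn|bun|pip|uv|cargo)\s+(?:install|add)\b"
)
CONFUSABLE_SCRIPTS = ("CYRILLIC", "GREEK", "ARMENIAN", "CHEROKEE")


def script_of(ch):
    """Approximate a character's script from the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def mixed_script_tokens(line):
    """Return tokens mixing Latin with a confusable script, on risky lines only."""
    if not RISKY_LINE.search(line):
        return []  # prose is deliberately left alone
    hits = []
    for token in line.split():
        scripts = {script_of(ch) for ch in token if ch.isalpha()}
        if "LATIN" in scripts and scripts.intersection(CONFUSABLE_SCRIPTS):
            hits.append(token)
    return hits


# \u0430 is CYRILLIC SMALL LETTER A, visually identical to Latin "a".
print(mixed_script_tokens("npm install p\u0430ckage"))
print(mixed_script_tokens("The Cyrillic letter \u0430 resembles a Latin one."))
```

&lt;p&gt;The second call returns nothing even though it contains the same Cyrillic letter, because the line never matches an install or URL pattern.&lt;/p&gt;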

&lt;h3&gt;
  
  
  Severity tiers at a glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Character class&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;False-positive cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BLOCKER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tag chars, bidi controls&lt;/td&gt;
&lt;td&gt;U+E0000–U+E007F, U+202A–U+202E, U+2066–U+2069&lt;/td&gt;
&lt;td&gt;Fail CI immediately&lt;/td&gt;
&lt;td&gt;None — no legitimate use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MAJOR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero-width / format chars&lt;/td&gt;
&lt;td&gt;U+200B, U+FEFF (non-BOM), U+00AD&lt;/td&gt;
&lt;td&gt;Warn by default; fail under &lt;code&gt;--strict&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Possible in legitimate authoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MINOR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mixed-script identifiers&lt;/td&gt;
&lt;td&gt;Cyrillic &lt;code&gt;а&lt;/code&gt;, Greek &lt;code&gt;α&lt;/code&gt; in URLs / install lines&lt;/td&gt;
&lt;td&gt;Warn in narrow contexts only&lt;/td&gt;
&lt;td&gt;Low — scoped to package lines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
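
&lt;p&gt;The two codepoint-driven tiers in the table reduce to a small lookup. The sketch below reuses the codepoint sets shown earlier; &lt;code&gt;classify&lt;/code&gt; and &lt;code&gt;scan&lt;/code&gt; are hypothetical names, and the sketch deliberately leaves out the leading-BOM exception and the context-scoped MINOR pass.&lt;/p&gt;

```python
TAG_CHARS = range(0xE0000, 0xE0080)
BIDI_CONTROLS = frozenset({0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
                           0x2066, 0x2067, 0x2068, 0x2069})
ZERO_WIDTH_MAJOR = frozenset({0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF})
OTHER_INVISIBLE = frozenset({0x00AD, 0x034F, 0x115F, 0x1160, 0x17B4, 0x17B5})


def classify(cp):
    """Map a codepoint to its tier, or None for ordinary characters."""
    if cp in TAG_CHARS or cp in BIDI_CONTROLS:
        return "BLOCKER"
    if cp in ZERO_WIDTH_MAJOR or cp in OTHER_INVISIBLE:
        return "MAJOR"
    return None


def scan(text):
    """Yield (line, column, severity) for each flagged character, 1-based."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            severity = classify(ord(ch))
            if severity:
                yield line_no, col, severity


sample = "safe line\nrm -rf build\u202e # cleanup\nzero\u200bwidth"
print(list(scan(sample)))
```

&lt;p&gt;Checking BLOCKER first keeps the ordering explicit: a character that could only be an attack is never downgraded by a later, softer rule.&lt;/p&gt;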

&lt;h3&gt;
  
  
  The BOM exception
&lt;/h3&gt;

&lt;p&gt;One codepoint sits on a fence. U+FEFF is a legitimate byte-order mark when it is the very first character of a file (in UTF-8, the three bytes EF BB BF). The same codepoint anywhere else is a zero-width no-break space, which is exactly the kind of invisible an attacker reaches for. So the rule grants a pass to exactly one position and flags every other occurrence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ZERO_WIDTH_MAJOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# A single U+FEFF as the very first character of the file is a
&lt;/span&gt;    &lt;span class="c1"&gt;# legitimate BOM and gets a pass.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mh"&gt;0xFEFF&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;line_no&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Finding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAJOR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zero-width-or-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Position-aware, not just codepoint-aware. The character is fine at offset 0 and suspect everywhere else.&lt;/p&gt;
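
&lt;p&gt;A small standalone illustration of the position rule, assuming the file has already been decoded to a string. &lt;code&gt;misplaced_feff&lt;/code&gt; is a hypothetical helper, not the validator's function. One practical detail: decoding with &lt;code&gt;utf-8&lt;/code&gt; keeps a leading BOM in the string, while &lt;code&gt;utf-8-sig&lt;/code&gt; strips it, so a scanner has to choose deliberately.&lt;/p&gt;

```python
def misplaced_feff(text):
    """Return 0-based offsets of U+FEFF that are not a leading BOM."""
    return [i for i, ch in enumerate(text) if ch == "\ufeff" and i != 0]


with_bom = "\ufeffname: demo\n"    # BOM in the one allowed position
smuggled = "name: de\ufeffmo\n"    # same codepoint hidden mid-line

print(misplaced_feff(with_bom))
print(misplaced_feff(smuggled))

# Plain utf-8 decoding preserves the BOM; utf-8-sig removes it.
raw = b"\xef\xbb\xbfname: demo\n"
print(raw.decode("utf-8")[0] == "\ufeff")
print(raw.decode("utf-8-sig")[0])
```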

&lt;h2&gt;
  
  
  Make the invisible visible in the log
&lt;/h2&gt;

&lt;p&gt;A finding that says "MAJOR at line 14, column 22" is useless if the reviewer opens the file and sees nothing there — because the offending character is, by definition, invisible. Every finding carries &lt;code&gt;file:line:column&lt;/code&gt;, the codepoint's &lt;code&gt;unicodedata.name()&lt;/code&gt; label, the rule name, and a ~32-character context window with every invisible escaped to &lt;code&gt;&amp;lt;U+XXXX&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_escape_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TAG_CHARS&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;BIDI_CONTROLS&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ZERO_WIDTH_MAJOR&lt;/span&gt; \
           &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;OTHER_INVISIBLE&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mh"&gt;0x20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;out_chars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;U+&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;04&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;out_chars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out_chars&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the CI log shows &lt;code&gt;npm install p&amp;lt;U+0430&amp;gt;ckage&lt;/code&gt; instead of a line that looks identical to the clean one. The reviewer can actually see the attack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ship report-only, then ratchet
&lt;/h2&gt;

&lt;p&gt;The gate has two rollout switches. &lt;code&gt;--warn-only&lt;/code&gt; always exits 0 — it reports findings without failing the build, for the window where you're still learning what's in the corpus. &lt;code&gt;--strict&lt;/code&gt; flips MAJOR findings into build failures once you've cleaned up the known-benign noise.&lt;/p&gt;

&lt;p&gt;This is the same self-expiring report-only pattern we use elsewhere: land a gate in advisory mode, let it observe production traffic, then enforce once you've proven it won't false-positive your own contributors into a wall. BLOCKER fires from day one because it has no false-positive cost; MAJOR waits behind &lt;code&gt;--strict&lt;/code&gt; until the corpus is clean.&lt;/p&gt;
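
&lt;p&gt;The two switches amount to a small decision table. The sketch below is one way to express it, under assumptions the text does not confirm: the failing exit status is taken to be 1, and &lt;code&gt;--warn-only&lt;/code&gt; is assumed to win if both flags are passed, since it is described as always exiting 0. &lt;code&gt;exit_code&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
def exit_code(severities, warn_only=False, strict=False):
    """Return the process exit status for a collection of finding severities."""
    if warn_only:
        return 0  # report-only: never fail the build
    # BLOCKER always fails; MAJOR fails only once --strict is switched on.
    failing = {"BLOCKER", "MAJOR"} if strict else {"BLOCKER"}
    return 1 if failing.intersection(severities) else 0


print(exit_code(["MAJOR", "MINOR"]))               # default mode
print(exit_code(["MAJOR", "MINOR"], strict=True))  # after the ratchet
print(exit_code(["BLOCKER"], warn_only=True))      # advisory window
```

&lt;p&gt;Keeping the policy in one pure function makes the ratchet a one-line change in the workflow rather than an edit to the detection rules.&lt;/p&gt;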

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;Wired into &lt;code&gt;.github/workflows/validate-plugins.yml&lt;/code&gt; next to the existing schema validator. Scanned 4,776 files.&lt;/p&gt;

&lt;p&gt;Zero blockers. Clean main.&lt;/p&gt;

&lt;p&gt;Eight MAJOR findings — and the honest detail is the interesting one. All eight traced to a single community-contributed file that intentionally used a zero-width space inside a fenced code block as a rendering workaround. Not an attack. A legitimate-but-messy authoring choice. That is precisely why MAJOR sits behind &lt;code&gt;--strict&lt;/code&gt; and isn't flipped on yet: the ratchet waits until that one file is cleaned up, so the first enforced run doesn't punish a contributor for a cosmetic hack.&lt;/p&gt;

&lt;p&gt;Tests cover the boundaries with six byte-precise fixtures — &lt;code&gt;blocker-tag-chars&lt;/code&gt;, &lt;code&gt;blocker-bidi-override&lt;/code&gt;, &lt;code&gt;bom-allowed&lt;/code&gt;, &lt;code&gt;clean-skill&lt;/code&gt;, &lt;code&gt;major-zero-width&lt;/code&gt;, &lt;code&gt;minor-homoglyph-install&lt;/code&gt; — driving an 8-test regression suite in &lt;code&gt;tests/test_validate_unicode_hygiene.py&lt;/code&gt;. The whole thing shipped as PR #777 closing #776: ~317 lines of validator, one workflow edit, one test file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped
&lt;/h2&gt;

&lt;p&gt;Same day, part of a wider CI-hardening campaign:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An after-action review (PR #775) closed a 2026-05-22→24 hardening sequence: 11 PRs landing v4.32.0 with 10 blocking required gates and zero report-only gates, plus cleanup of 974 Python errors, 223 shellcheck warnings, and ~60k markdown issues, with 970 MB freed.&lt;/li&gt;
&lt;li&gt;Doc-quality gate "round 2" in two other repos (&lt;code&gt;intentional-cognition-os&lt;/code&gt;, &lt;code&gt;qmd-team-intent-kb&lt;/code&gt;): fixed Vale by scoping via per-directory &lt;code&gt;.vale.ini&lt;/code&gt; sections instead of the action's broken single-path &lt;code&gt;files:&lt;/code&gt; input, and lychee by passing &lt;code&gt;.&lt;/code&gt; as the required positional argument. Scoping refinements, not policy loosening — the gates stay BLOCKING.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;intentional-cognition-os&lt;/code&gt; also got an &lt;code&gt;ico audit verify&lt;/code&gt; SHA-256 chain verifier. The audit log had carried a tamper-evident hash chain since launch, but nothing actually walked it — so the tamper-evidence was theoretical. Now &lt;code&gt;ico audit verify&lt;/code&gt; exits 2 on &lt;code&gt;AUDIT_TAMPERED&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/self-expiring-report-only-ci-gates/"&gt;Self-Expiring Report-Only CI Gates&lt;/a&gt; — the &lt;code&gt;--warn-only&lt;/code&gt; → &lt;code&gt;--strict&lt;/code&gt; ratchet is the same advisory-to-enforced pattern, generalized.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/server-ops-mcp-safety-before-tools/"&gt;Safety Model First: 16-Tool Ops MCP in One Day&lt;/a&gt; — designing the threat model before the surface, applied to an ops MCP server.&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>cicd</category>
      <category>python</category>
      <category>security</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Self-Expiring Report-Only CI Gates: From Advisory to Enforced</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Wed, 27 May 2026 13:00:40 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/self-expiring-report-only-ci-gates-from-advisory-to-enforced-3c2c</link>
      <guid>https://dev.to/jeremy_longshore/self-expiring-report-only-ci-gates-from-advisory-to-enforced-3c2c</guid>
      <description>&lt;p&gt;Advisory CI gates are where good intentions go to die. A team adds a linter "in warning mode for now," and "for now" becomes forever. The violations scroll past in PR reviews, nobody cleans them, the gate never goes blocking. Six months later the warnings are archaeological noise.&lt;/p&gt;

&lt;p&gt;The pattern that breaks this cycle is simple but mechanical: every report-only gate carries a self-expiring deadline marker — &lt;code&gt;REPORT-ONLY-UNTIL: &amp;lt;date&amp;gt;&lt;/code&gt; — and a meta-gate script fails the build if any gate outlives its deadline. Advisory mode becomes temporary by construction. The second part: organize around one logical concern per PR, bulk-cleaning all violations for that concern in the same commit. Tightly-coupled tools — like eslint and prettier, or ruff and ruff-format — flip together because they share a single run step and a single class of violation. Unrelated gates never share a PR. This keeps each change reviewable and the blame surface small.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Self-Expiring Report-Only CI Gate?
&lt;/h2&gt;

&lt;p&gt;A self-expiring report-only CI gate is a quality check that runs in advisory mode — violations are warnings, not build failures — but carries an explicit deadline date. A meta-gate fails the build once that deadline passes, forcing the team to either clean the violations and flip the gate to blocking, or remove it. This prevents advisory gates from quietly becoming permanent technical debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta-Gate: Enforced Deadlines
&lt;/h2&gt;

&lt;p&gt;The enforcement script is small but decisive. It lives in the workflow as a step that runs before any code cleanup and fails the build if any gate is past its deadline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check CI deadline compliance&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python scripts/check-ci-deadlines.py&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GITHUB_REF&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.ref }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script (~150 lines) scans workflow files for &lt;code&gt;REPORT-ONLY-UNTIL: &amp;lt;date&amp;gt;&lt;/code&gt; markers and compares them against today's date from &lt;code&gt;datetime.date.today()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_deadlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workflow_file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workflow_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Match: REPORT-ONLY-UNTIL: YYYY-MM-DD
&lt;/span&gt;    &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;REPORT-ONLY-UNTIL:\s*(\d{4}-\d{2}-\d{2})&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;finditer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;expired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;deadline_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;deadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deadline_str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;expired&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;deadline_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;expired&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;expired_gates&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check_deadlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.github/workflows/validate-plugins.yml&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FATAL: report-only gates past deadline:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expired_gates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (expired &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hardcoded workflow path can be swapped for any project (or glob all workflow files in &lt;code&gt;.github/workflows/&lt;/code&gt;).&lt;/p&gt;
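&lt;p&gt;The glob variant mentioned above could look roughly like the sketch below. It is not the project's actual script: the function name &lt;code&gt;find_expired&lt;/code&gt;, its &lt;code&gt;today&lt;/code&gt; parameter, and the choice to report file names and line numbers are assumptions made for illustration. The marker regex and the strict "after the deadline" comparison follow the excerpt shown earlier.&lt;/p&gt;

```python
import datetime
import pathlib
import re

# Same marker format the article's script matches.
MARKER = re.compile(r"REPORT-ONLY-UNTIL:\s*(\d{4}-\d{2}-\d{2})")


def find_expired(workflow_dir, today=None):
    """Return (file name, line number, deadline) for every marker past its date.

    Hypothetical helper: the name and signature are not from the original script.
    """
    today = today or datetime.date.today()
    expired = []
    root = pathlib.Path(workflow_dir)
    # GitHub Actions accepts both .yml and .yaml workflow files.
    paths = sorted(list(root.glob("*.yml")) + list(root.glob("*.yaml")))
    for path in paths:
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            match = MARKER.search(line)
            if match is None:
                continue
            deadline = datetime.date.fromisoformat(match.group(1))
            # Strictly after: the gate stays advisory through the deadline day.
            if today > deadline:
                expired.append((path.name, lineno, match.group(1)))
    return expired
```

&lt;p&gt;A caller would run &lt;code&gt;find_expired(".github/workflows")&lt;/code&gt; and exit non-zero when the returned list is not empty. Passing &lt;code&gt;today&lt;/code&gt; explicitly keeps the check testable without touching the system clock.&lt;/p&gt;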

&lt;p&gt;The marker lives as a comment inside the step it guards, directly above the &lt;code&gt;run:&lt;/code&gt; line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run ESLint&lt;/span&gt;
  &lt;span class="c1"&gt;# REPORT-ONLY-UNTIL: 2026-06-20&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run lint:js || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the deadline passes, the meta-gate blocks the build. To flip the gate to blocking, you remove the marker AND remove the &lt;code&gt;|| true&lt;/code&gt; in the same commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;  - name: Run ESLint
&lt;span class="gd"&gt;-   # REPORT-ONLY-UNTIL: 2026-06-20
-   run: npm run lint:js || true
&lt;/span&gt;&lt;span class="gi"&gt;+   run: npm run lint:js
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces a deliberate choice: keep the gate advisory (extend the deadline or remove the marker entirely to soft-delete it), or flip it to blocking (bulk-clean the violations first, then remove the marker and the &lt;code&gt;|| true&lt;/code&gt; in one PR).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Campaign: Nine PRs, Zero Rot
&lt;/h2&gt;

&lt;p&gt;Claude-code-plugins is a 2000+ GitHub star monorepo with ~10,468 markdown files and plugins written in Python, TypeScript, shell, and markdown. The groundwork — the deadline meta-gate and one chastening moment where a freshly-blocking gate was flipped back to report-only after it caught a real failure on its first run — landed 2026-05-21. The campaign proper ran across 2026-05-22 and 2026-05-23, organized as nine sequential PRs (A through I), each addressing one logical concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR A (#764): ESLint + Prettier
&lt;/h3&gt;

&lt;p&gt;ESLint and Prettier flipped to blocking gates together. No backlog of violations; the gates were added with cleanup already done, so the flip was immediate. These tools are tightly-coupled (same run step, same class of violations).&lt;/p&gt;

&lt;h3&gt;
  
  
  PR B (#765): Ruff + ruff-format
&lt;/h3&gt;

&lt;p&gt;Ruff (Python linter) and ruff-format flipped to blocking gates together. Bulk cleanup across ~180 Python files, then flip. This PR established the template: bulk cleanup + immediate gate, no deadline crutch. Again, tightly-coupled tools (same step, same violation class).&lt;/p&gt;

&lt;h3&gt;
  
  
  PR C (#767): Markdownlint (report-only)
&lt;/h3&gt;

&lt;p&gt;Markdownlint was added as a report-only gate with a &lt;code&gt;REPORT-ONLY-UNTIL: 2026-06-20&lt;/code&gt; deadline because the backlog was large (80 violations across 10,468 files). This was deliberate: the gate exists, advisory mode is temporary, and the deadline is explicit.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR D (#766): Root directory hygiene
&lt;/h3&gt;

&lt;p&gt;Root directory cleanup (removed stale license files, dead symlinks, misplaced config fragments). No gate change, just hygiene.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR E (#768): Shellcheck — where the pattern earned its keep
&lt;/h3&gt;

&lt;p&gt;Shellcheck cleanup and flip to blocking. This PR surfaced the real value of the pattern.&lt;/p&gt;

&lt;p&gt;Shellcheck flagged 223 violations across 47 shell script files. These weren't nitpicks a reviewer would catch — they were silent runtime failures that report-only mode had been hiding for months. Flipping the gate to blocking forced them into the light.&lt;/p&gt;

&lt;p&gt;First detail: 47 of the .sh files were actually Python scripts with a &lt;code&gt;#!/usr/bin/env python3&lt;/code&gt; shebang but &lt;code&gt;.sh&lt;/code&gt; extension. Shellcheck flagged them as SC1071 (unknown shell dialect). The fix was &lt;code&gt;git mv&lt;/code&gt; to rename them &lt;code&gt;.py&lt;/code&gt;, which also correctly moved them under ruff's scope (discovered later in PR G when the widened-test-loop caught new Python lint).&lt;/p&gt;

&lt;p&gt;The real bugs shellcheck surfaced:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SC2064 – trap expansion time.&lt;/strong&gt; A cleanup handler was written as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;trap&lt;/span&gt; &lt;span class="s2"&gt;"rm -rf &lt;/span&gt;&lt;span class="nv"&gt;$temp_dir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; EXIT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The issue: &lt;code&gt;$temp_dir&lt;/code&gt; expands when the trap is SET (at script start), not when it FIRES (at exit). If the script changed the variable later, the trap deleted the wrong directory (or nothing at all). The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;trap&lt;/span&gt; &lt;span class="s1"&gt;'rm -rf "$temp_dir"'&lt;/span&gt; EXIT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Single quotes defer expansion to fire-time. Shellcheck flags this by default; it's a real gotcha.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unquoted redirection operator.&lt;/strong&gt; A dependency pinning line read:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;some-package[extra]&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;gt;=1.0&lt;/code&gt; is unquoted, so bash interprets &lt;code&gt;&amp;gt;&lt;/code&gt; as file redirection. The script silently created a file named &lt;code&gt;=1.0&lt;/code&gt; instead of pinning the version. Quoting the entire argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'some-package[extra]&amp;gt;=1.0'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;fixes it. Shellcheck flags the unquoted redirection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SC2166 – POSIX &lt;code&gt;-o&lt;/code&gt; operator.&lt;/strong&gt; The script used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-o&lt;/code&gt; operator (logical OR in &lt;code&gt;[ ]&lt;/code&gt; test) is not well-defined in POSIX and can misparse in edge cases. The portable form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is safer and clearer.&lt;/p&gt;

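&lt;p&gt;Other notable finds: SC2188 (a redirection with no command; fix: &lt;code&gt;: &amp;gt; "$LOG_FILE"&lt;/code&gt;) and SC2213/SC2214 (&lt;code&gt;getopts&lt;/code&gt; option strings out of sync with their &lt;code&gt;case&lt;/code&gt; statements: options declared but never handled, and branches for options that were never declared).&lt;/p&gt;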

&lt;p&gt;After cleanup, PR E added a &lt;code&gt;.shellcheckrc&lt;/code&gt; file with project-wide disables for checks that were genuinely incompatible with the codebase style:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;disable&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;SC1090,SC1091,SC2155,SC2034&lt;/span&gt;
&lt;span class="py"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;disable=&lt;/code&gt; list belongs in the config file, not in the workflow step. The &lt;code&gt;severity=&lt;/code&gt; line is the gotcha: &lt;code&gt;.shellcheckrc&lt;/code&gt; has no &lt;code&gt;severity=&lt;/code&gt; directive, so that line does not set the threshold. Severity is a command-line option, so the workflow must pass &lt;code&gt;--severity=warning&lt;/code&gt; in the &lt;code&gt;run:&lt;/code&gt; command.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR F (#769): TypeScript coverage + codeblock syntax
&lt;/h3&gt;

&lt;p&gt;TypeScript coverage audit and code-block syntax linting flipped to blocking together. These tools are tightly-coupled (both operate on codeblock content in TypeScript/SKILL.md).&lt;/p&gt;

&lt;h3&gt;
  
  
  PR G (#770): Widened test loop
&lt;/h3&gt;

&lt;p&gt;Widened test loop (a 15-minute timeout gate that runs the full plugin suite). Flipped to blocking.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR H (#771): Codeblock syntax cleanup
&lt;/h3&gt;

&lt;p&gt;Codeblock syntax cleanup (fenced-code-block fixes in SKILL.md and README files across plugins) flipped to blocking.&lt;/p&gt;

&lt;h3&gt;
  
  
  PR I (#772): Markdownlint — the last gate flips
&lt;/h3&gt;

&lt;p&gt;Markdownlint, the last report-only gate. 80 violations reduced to zero across 10,468 files. The largest categories were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MD051 (broken anchor links): 33 violations. TOC link targets regenerated to match GitHub's auto-slug format.&lt;/li&gt;
&lt;li&gt;MD056 (table column mismatch): 20 violations. Notable: 8 were a literal &lt;code&gt;|&lt;/code&gt; inside a backtick code span that needed escaping to &lt;code&gt;\|&lt;/code&gt;, and 12 were genuine header/row count mismatches.&lt;/li&gt;
&lt;li&gt;MD001 (improper heading increment): 7 violations.&lt;/li&gt;
&lt;li&gt;MD045 (missing alt text on shield badges): 3 violations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The remaining ~17 violations were distributed across smaller categories (MD046 indented-code, MD003 setext-style, MD031 fence blank-line) and cascading --fix resolutions.&lt;/p&gt;
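&lt;p&gt;The MD056 case above is easy to reproduce. The sketch below is not markdownlint's implementation; &lt;code&gt;count_cells&lt;/code&gt; is a hypothetical helper that splits a table row on unescaped pipes only, which is enough to show why a bare &lt;code&gt;|&lt;/code&gt; inside a backtick span adds a phantom column and why &lt;code&gt;\|&lt;/code&gt; removes it.&lt;/p&gt;

```python
def count_cells(row):
    """Count cells in a pipe-table row, treating a backslash-escaped pipe as text.

    Illustrative only: real parsers handle more edge cases than this.
    """
    placeholder = "\x00"
    # Hide escaped pipes so they are not treated as column separators.
    protected = row.strip().replace("\\|", placeholder)
    # Drop the optional leading and trailing pipes, then split on the rest.
    if protected.startswith("|"):
        protected = protected[1:]
    if protected.endswith("|"):
        protected = protected[:-1]
    return len(protected.split("|"))


header = "| Flag | Meaning |"
broken = "| `a|b` | either |"       # bare pipe inside the code span
fixed = r"| `a\|b` | either |"      # escaped pipe stays inside the cell

print(count_cells(header), count_cells(broken), count_cells(fixed))
```

&lt;p&gt;The broken row yields one more cell than the header, which is exactly the mismatch MD056 reports; the escaped row matches the header again.&lt;/p&gt;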

&lt;p&gt;After PR I merged: &lt;strong&gt;10 blocking required gates, zero report-only.&lt;/strong&gt; The gates: eslint, prettier, ruff, ruff-format, shellcheck-skills, skill-codeblock-syntax, typescript-coverage-audit, widened-test-loop, markdownlint, and the base validate step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Proof: External Contributions Now Bind
&lt;/h2&gt;

&lt;p&gt;The same day PR I merged (2026-05-23), a community-contributed plugin (agency-os, PR #709) landed. This PR had been open for review since before the campaign; when the 10 gates went blocking on 2026-05-23, the still-open PR's CI started failing, forcing a restructure. Its SKILL.md was 863 lines of auto-generated cruft, and it was missing several of the marketplace-tier required fields (name, description, allowed-tools, version, author, license, compatibility, tags).&lt;/p&gt;

&lt;p&gt;The CI ran. All 10 gates failed. The contributor restructured the SKILL.md from 863 lines to 168 and added the 6 missing marketplace fields. No human reviewer had to enforce the standard in the PR thread — the CI did. This is the payoff: the gates now bind external contributions automatically, without a human argument happening in the thread.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Adopt This
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add the meta-gate script&lt;/strong&gt; (&lt;code&gt;check-ci-deadlines.py&lt;/code&gt;) to &lt;code&gt;scripts/&lt;/code&gt;. Swap the hardcoded workflow path (or glob all workflow files) and any project can adopt it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark each report-only gate with a deadline comment&lt;/strong&gt; before the run step. Use format &lt;code&gt;# REPORT-ONLY-UNTIL: YYYY-MM-DD&lt;/code&gt;. Choose a date 4–6 weeks out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add the meta-gate to your workflow&lt;/strong&gt; as an early step. It fails fast if any gate is past deadline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you're ready to flip a gate or tool-pair to blocking:&lt;/strong&gt; bulk-clean all violations in one commit, then remove the deadline marker AND the &lt;code&gt;|| true&lt;/code&gt; in the same PR. One logical concern per PR (tightly-coupled tools flip together).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never extend a deadline without a written plan.&lt;/strong&gt; If the backlog is too large, the deadline was too aggressive; next time, choose 8–12 weeks instead. The discipline is in the deliberate choice, not in the calendar date.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system ensures that advisory gates cannot quietly rot into permanent technical debt. They expire, they get re-evaluated, or they go blocking. Technical governance becomes automatic.&lt;/p&gt;

</description>
      <category>ci</category>
      <category>devops</category>
      <category>githubactions</category>
      <category>linting</category>
    </item>
    <item>
      <title>Safety Model First: 16-Tool Ops MCP, One Day</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Mon, 25 May 2026 13:00:30 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/safety-model-first-16-tool-ops-mcp-one-day-36b5</link>
      <guid>https://dev.to/jeremy_longshore/safety-model-first-16-tool-ops-mcp-one-day-36b5</guid>
      <description>&lt;p&gt;The Intent Solutions production stack now lives on a single Contabo VPS after &lt;a href="https://dev.to/posts/propagation-day-when-the-spec-becomes-the-migration-plan/"&gt;a multi-week migration&lt;/a&gt;. Twenty-four containers across five stacks — Braves, Plane, Twenty, Umami, ntfy — sit behind one Caddy reverse proxy. Every day-to-day operational task touches that box: reload Caddy after a host-block edit, restart a stuck container, pull the last 200 lines of a service log, snapshot an instance before a risky change. Doing those by hand from a shell defeats the point of having Claude Code in the loop. Doing them through &lt;a href="https://dev.to/posts/guidewire-mcp-v0-1-0-foundation-ship/"&gt;a sloppy MCP server&lt;/a&gt; is how you brick prod from a chat window.&lt;/p&gt;

&lt;p&gt;So the question on day one wasn't "what tools do I want?" The question was "what would have to be true of those tools before I let a model fire them at a server I actually depend on?"&lt;/p&gt;

&lt;p&gt;The answer became a 7-point safety model, written into the README on the very first commit, before any tool existed. That model — and the host registry that sits behind it — is the whole reason the rest of the work fit in one day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 7-point safety model
&lt;/h2&gt;

&lt;p&gt;The model is what every tool in the catalog has to satisfy before it ships: a hard write denylist, dry-run-by-default for destructive operations, argument validation, key-based SSH only, backup-before-write, validate-before-reload, and capped output. No exceptions, no "we'll add that later."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write denylist (hard).&lt;/strong&gt; &lt;code&gt;server.file.write&lt;/code&gt; always refuses &lt;code&gt;/etc/passwd&lt;/code&gt;, &lt;code&gt;/etc/shadow&lt;/code&gt;, &lt;code&gt;/etc/sudoers[.d/...]&lt;/code&gt;, &lt;code&gt;~/.ssh/authorized_keys&lt;/code&gt;, &lt;code&gt;/boot/&lt;/code&gt;, &lt;code&gt;/usr/&lt;/code&gt;. Per-host config cannot opt out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dry-run by default.&lt;/strong&gt; Every destructive tool requires explicit &lt;code&gt;apply: true&lt;/code&gt; to actually fire. Omitting it returns the command that &lt;em&gt;would&lt;/em&gt; run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Argument validation.&lt;/strong&gt; Unit names, container names, service names, and paths are regex-validated before they ever reach a shell. &lt;code&gt;/^[A-Za-z0-9_.@-]+$/&lt;/code&gt; for systemd units, &lt;code&gt;/^[A-Za-z0-9_.-]+$/&lt;/code&gt; for container names, &lt;code&gt;/^\/[A-Za-z0-9_./-]+$/&lt;/code&gt; for absolute paths. Shell metacharacters never get the chance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key-based SSH only.&lt;/strong&gt; The &lt;code&gt;ssh2&lt;/code&gt; Client opens with &lt;code&gt;privateKey&lt;/code&gt;. No password fallback, no agent fallback. If the key isn't there, the call fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup before write.&lt;/strong&gt; Real writes do &lt;code&gt;cp -p &amp;lt;path&amp;gt; &amp;lt;path&amp;gt;.bak&lt;/code&gt; first. Recovery is one SSH command away.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate before reload.&lt;/strong&gt; &lt;code&gt;server.caddy.reload&lt;/code&gt; runs &lt;code&gt;caddy validate&lt;/code&gt; first and refuses to reload if validation fails. A broken Caddyfile cannot leave the building.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output capped.&lt;/strong&gt; &lt;code&gt;server.exec&lt;/code&gt; defaults to 8 KiB per stream; &lt;code&gt;server.file.read&lt;/code&gt; to 1 MiB. A runaway log tail can't blow up the context window or the model's reasoning.&lt;/li&gt;
&lt;/ol&gt;
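&lt;p&gt;Points 2 and 3 combine naturally in one place. The sketch below is a Python illustration, not the server's actual code: the function names &lt;code&gt;validate&lt;/code&gt; and &lt;code&gt;systemd_restart&lt;/code&gt; and the shape of the returned dictionaries are assumptions. The regexes are the ones quoted in the list above.&lt;/p&gt;

```python
import re

# Patterns quoted in the safety model above.
UNIT_RE = re.compile(r"^[A-Za-z0-9_.@-]+$")
CONTAINER_RE = re.compile(r"^[A-Za-z0-9_.-]+$")
ABS_PATH_RE = re.compile(r"^/[A-Za-z0-9_./-]+$")

PATTERNS = {"unit": UNIT_RE, "container": CONTAINER_RE, "path": ABS_PATH_RE}


def validate(kind, value):
    """Raise ValueError unless value fully matches the pattern for its kind."""
    # fullmatch, because "$" alone also accepts a trailing newline in Python.
    if PATTERNS[kind].fullmatch(value) is None:
        raise ValueError(f"invalid {kind}: {value!r}")
    return value


def systemd_restart(unit, apply=False):
    """Hypothetical tool body: build the command, run it only when apply is True."""
    command = ["sudo", "systemctl", "restart", validate("unit", unit)]
    if not apply:
        # Dry-run default: report what would run and touch nothing.
        return {"dry_run": True, "would_run": command}
    # A real tool would hand `command` to the SSH layer here (not shown).
    return {"dry_run": False, "ran": command}
```

&lt;p&gt;Building the command as an argument list after validation means a value such as &lt;code&gt;nginx; rm -rf /&lt;/code&gt; is rejected before any command exists, and a call without &lt;code&gt;apply=True&lt;/code&gt; can only ever describe the action.&lt;/p&gt;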

&lt;p&gt;The thing to notice: none of these are tool-specific. They're cross-cutting invariants. Designing them first meant every tool I built afterward had a fixed checklist to satisfy, not a fresh argument to relitigate.&lt;/p&gt;

&lt;p&gt;A tool author shouldn't be inventing safety policy at 3pm on commit four. By then, the pressure to "just ship this one helper" has won the argument. The 7-point list exists so that argument never starts — every tool either passes all seven gates or doesn't ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The host registry as the access-control boundary
&lt;/h2&gt;

&lt;p&gt;Here's the architectural choice that did the most work: &lt;strong&gt;the tools enforce nothing host-level. The registry does.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Only hosts in &lt;code&gt;~/.config/server-ops/hosts.yaml&lt;/code&gt; are reachable. The tools take a &lt;code&gt;host&lt;/code&gt; parameter, look it up, and refuse if it isn't there. Adding a new host is one YAML edit. Rotating a key is one YAML edit. Restricting which commands &lt;code&gt;server.exec&lt;/code&gt; may run on a given host is one YAML edit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;intentsolutions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;167.86.106.29&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;intentsolutions&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.ssh/id_ed25519&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
    &lt;span class="na"&gt;allowed_commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^docker&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^free"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^df"&lt;/span&gt;
    &lt;span class="na"&gt;write_allowed_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^/srv/braves/.env$"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^/etc/caddy/Caddyfile$"&lt;/span&gt;

  &lt;span class="na"&gt;dev-box&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100.x.x.x&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jeremy&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.ssh/id_ed25519&lt;/span&gt;
    &lt;span class="c1"&gt;# No allowed_commands -&amp;gt; all commands allowed&lt;/span&gt;
    &lt;span class="c1"&gt;# No write_allowed_paths -&amp;gt; denylist only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two optional per-host regex allowlists do the per-host narrowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;allowed_commands&lt;/code&gt; — &lt;code&gt;server.exec&lt;/code&gt; only runs commands matching one of these. Omit to permit anything.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;write_allowed_paths&lt;/code&gt; — &lt;code&gt;server.file.write&lt;/code&gt; only writes paths matching one of these. The hard write denylist is &lt;em&gt;always&lt;/em&gt; enforced regardless.&lt;/li&gt;
&lt;/ul&gt;
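&lt;p&gt;The lookup-then-allowlist flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: the names &lt;code&gt;lookup&lt;/code&gt;, &lt;code&gt;may_exec&lt;/code&gt;, and &lt;code&gt;may_write&lt;/code&gt; are hypothetical, the registry is an inline dictionary rather than a parsed YAML file, and the denylist is simplified to prefix and suffix checks.&lt;/p&gt;

```python
import re

# Hard denylist from the safety model; per-host config cannot override it.
# The prefix/suffix form used here is this sketch's own simplification.
DENY_PREFIXES = ("/etc/passwd", "/etc/shadow", "/etc/sudoers", "/boot/", "/usr/")
DENY_SUFFIX = "/.ssh/authorized_keys"

# Registry mirroring the YAML above (address, user, and key omitted for brevity).
HOSTS = {
    "intentsolutions": {
        "allowed_commands": [r"^docker ", r"^free", r"^df"],
        "write_allowed_paths": [r"^/srv/braves/.env$", r"^/etc/caddy/Caddyfile$"],
    },
    "dev-box": {},
}


def lookup(host):
    """Refuse any host that is not in the registry."""
    if host not in HOSTS:
        raise PermissionError(f"unknown host: {host}")
    return HOSTS[host]


def may_exec(host, command):
    patterns = lookup(host).get("allowed_commands")
    # No allowlist configured means every command is permitted.
    return patterns is None or any(re.search(p, command) for p in patterns)


def may_write(host, path):
    entry = lookup(host)
    # The denylist is checked first and applies to every host.
    if path.startswith(DENY_PREFIXES) or path.endswith(DENY_SUFFIX):
        return False
    patterns = entry.get("write_allowed_paths")
    return patterns is None or any(re.search(p, path) for p in patterns)
```

&lt;p&gt;With this shape, &lt;code&gt;may_write("dev-box", "/etc/passwd")&lt;/code&gt; is still refused even though the dev box has no allowlist, which is the "denylist always enforced" property from the list above.&lt;/p&gt;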

&lt;p&gt;The prod host gets the tight collar. The dev box stays loose because if I brick it, I haven't taken any customer down.&lt;/p&gt;

&lt;p&gt;The asymmetry is the point — least privilege per environment, not a blanket policy. A flat policy that treated every host identically would either be too loose for prod or too tight for the dev box. Per-host config lets each environment carry exactly the constraints that match its blast radius. The Caddyfile and the Braves &lt;code&gt;.env&lt;/code&gt; are the only two files I ever intentionally edit on the prod box; everything else stays read-only. That's enforceable in seven lines of YAML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not let the tools do ACLs?
&lt;/h3&gt;

&lt;p&gt;Tempting design: bake the allow/deny logic into each tool. Don't. Two reasons.&lt;/p&gt;

&lt;p&gt;First, you end up with the same access policy duplicated across &lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;file.write&lt;/code&gt;, &lt;code&gt;systemd.restart&lt;/code&gt;, &lt;code&gt;docker.restart&lt;/code&gt;, &lt;code&gt;compose.up/down&lt;/code&gt;, and &lt;code&gt;caddy.reload&lt;/code&gt;. Drift between those is inevitable, and one drifted tool is the one an attacker (or a confused model) walks through.&lt;/p&gt;

&lt;p&gt;Second, the host registry is a much better trust seam. Reviewing one YAML file is a five-minute job. Reviewing seven tool implementations for consistent ACL enforcement is a meeting. The tools stay simple — they just do their one thing, validate their args, and ask the registry "am I allowed here?" The registry owns the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shared SSH client pool
&lt;/h2&gt;

&lt;p&gt;The second choice worth calling out: &lt;strong&gt;one &lt;code&gt;ssh2.Client&lt;/code&gt; per host, shared across exec and SFTP.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every tool — &lt;code&gt;server.exec&lt;/code&gt;, &lt;code&gt;server.file.read&lt;/code&gt;, &lt;code&gt;server.file.write&lt;/code&gt; — goes through the same pool. A first call to &lt;code&gt;host:intentsolutions&lt;/code&gt; opens the TCP connection, authenticates, and caches the client. Every subsequent call against that host reuses it. The pool drops the cached client on &lt;code&gt;close&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt; events so the next call reconnects cleanly. On &lt;code&gt;SIGINT&lt;/code&gt; / &lt;code&gt;SIGTERM&lt;/code&gt; the server calls &lt;code&gt;closeAllClients()&lt;/code&gt; to drain on shutdown.&lt;/p&gt;

&lt;p&gt;The alternative, opening a fresh SSH session per tool call, would have added a full TCP connection, SSH key exchange, and auth roundtrip to every operation. With Claude Code firing a sequence of &lt;code&gt;exec&lt;/code&gt; then &lt;code&gt;file.read&lt;/code&gt; then &lt;code&gt;systemd.restart&lt;/code&gt; calls against the same host, that's three connection setups where one would do. Worse, key-only auth is slow enough that the latency would show up in conversations.&lt;/p&gt;

&lt;p&gt;Pooling is one of those "obvious in hindsight" decisions that only stays clean if you commit to it on day one. Retrofit-pooling is a rewrite — every tool that opened its own client now has to learn to ask the pool, and the lifecycle questions (who closes? who reconnects on error? what happens on shutdown?) get answered seven different ways instead of once.&lt;/p&gt;
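&lt;p&gt;The lifecycle questions have short answers once a single object owns them. The sketch below shows the pooling idea in Python with a stand-in client; it is not the server's implementation, which uses &lt;code&gt;ssh2&lt;/code&gt;. &lt;code&gt;FakeClient&lt;/code&gt;, &lt;code&gt;ClientPool&lt;/code&gt;, and their method names are assumptions made for illustration.&lt;/p&gt;

```python
class FakeClient:
    """Stand-in for an SSH client; a real one would connect and authenticate."""

    def __init__(self, host):
        self.host = host
        self.closed = False

    def close(self):
        self.closed = True


class ClientPool:
    """One cached client per host, shared by every tool."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, host):
        # Connect on first use, then reuse the cached client.
        if host not in self._clients:
            self._clients[host] = self._factory(host)
        return self._clients[host]

    def drop(self, host):
        # Called from close/error handlers so the next get() reconnects.
        client = self._clients.pop(host, None)
        if client is not None:
            client.close()

    def close_all(self):
        # Shutdown path, playing the role of closeAllClients() described above.
        for host in list(self._clients):
            self.drop(host)
```

&lt;p&gt;Tools only ever call &lt;code&gt;get&lt;/code&gt;. Who closes, who reconnects, and what happens on shutdown are each answered in one method instead of once per tool.&lt;/p&gt;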

&lt;h3&gt;
  
  
  Why not use ssh-agent?
&lt;/h3&gt;

&lt;p&gt;Using the local SSH agent would have eliminated the need to handle a private key file directly. It would also have made the server's behavior depend on out-of-band ambient state — whether an agent is running, whether the right key is loaded, whether the user remembered to &lt;code&gt;ssh-add&lt;/code&gt; after a reboot. For a server that needs to be predictable from a long-running Claude Code session, that ambient state is a liability. The &lt;code&gt;privateKey&lt;/code&gt; path is explicit, reproducible, and the failure mode is loud: missing key, fail immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tool catalog
&lt;/h2&gt;

&lt;p&gt;With the safety model and the host registry settled, the actual tools fell out fast. Six commits, 16 tools, 40 unit tests, ~1,950 lines added.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Destructive?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.exec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run shell command; capped output, optional allowlist&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.file.read&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SFTP read, 1 MiB cap&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.file.write&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SFTP write with &lt;code&gt;&amp;lt;path&amp;gt;.bak&lt;/code&gt; snapshot&lt;/td&gt;
&lt;td&gt;Dry-run default + write denylist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Snapshot uptime, free, df, sensors, 7d OOM count&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.systemd.status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;systemctl status &amp;lt;unit&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.systemd.restart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sudo systemctl restart &amp;lt;unit&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.caddy.reload&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Validate then reload Caddyfile&lt;/td&gt;
&lt;td&gt;Dry-run default; refuses if validate fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.docker.ps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List running containers&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.docker.logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tail container logs (default 200 lines)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.docker.restart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restart container&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.compose.up&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;docker compose up -d&lt;/code&gt; in a directory&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.compose.down&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker compose down&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server.compose.logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tail one service's logs&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contabo.instance.list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List Contabo instances&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contabo.instance.create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Provision new instance (billed)&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contabo.instance.snapshot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Snapshot an instance&lt;/td&gt;
&lt;td&gt;Dry-run default&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is uniform: read-only tools just run; destructive tools default to dry-run and only fire with &lt;code&gt;apply: true&lt;/code&gt;. A dry-run call returns the command that &lt;em&gt;would&lt;/em&gt; execute, so the model (or the human reading the transcript) can verify before committing.&lt;/p&gt;
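
&lt;p&gt;A rough sketch of that dry-run default, written for a restart-style tool. The names are assumptions for illustration only: &lt;code&gt;restartUnit&lt;/code&gt;, &lt;code&gt;ToolResult&lt;/code&gt;, and the injected &lt;code&gt;run&lt;/code&gt; function (standing in for the pooled SSH exec call) are hypothetical, and the unit-name check is one plausible validation rule, not the server's.&lt;/p&gt;

```typescript
// Hypothetical result shape for a destructive tool call.
interface ToolResult {
  dryRun: boolean;
  command: string;
  output: string | null;
}

// Hypothetical stand-in for running a command over the pooled SSH client.
type Runner = (command: string) => string;

function restartUnit(args: { unit: string; apply?: boolean }, run: Runner): ToolResult {
  // Validate before building anything that will reach a shell.
  if (!/^[A-Za-z0-9@_.:-]+$/.test(args.unit)) {
    throw new Error("invalid unit name: " + args.unit);
  }
  const command = "sudo systemctl restart " + args.unit;
  // Default path: report the command that would run, and run nothing.
  if (args.apply !== true) {
    return { dryRun: true, command, output: null };
  }
  // Only an explicit apply: true reaches the host.
  return { dryRun: false, command, output: run(command) };
}
```

&lt;p&gt;Because the dry-run result carries the exact command string, the preview and the real call cannot drift apart: both are built by the same line.&lt;/p&gt;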

&lt;h2&gt;
  
  
  The Contabo wrapper has a quirk worth knowing
&lt;/h2&gt;

&lt;p&gt;Contabo's identity server requires &lt;em&gt;both&lt;/em&gt; OAuth2 &lt;code&gt;(client_id, client_secret)&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; legacy &lt;code&gt;(api_user, api_password)&lt;/code&gt; to mint a bearer token. This is unusual: most service-to-service OAuth2 integrations send client credentials alone, while this flow (in effect a password grant from a confidential client) needs both pairs. The wrapper validates that all four env vars are present and surfaces a clean error if any are missing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Missing Contabo credentials: CONTABO_API_USER, CONTABO_API_PASSWORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token is cached in-process until 30 seconds before expiry. The module exposes &lt;code&gt;_clearTokenCache()&lt;/code&gt; so tests can reset it between cases. &lt;code&gt;contabo.instance.create&lt;/code&gt; and &lt;code&gt;contabo.instance.snapshot&lt;/code&gt; default to dry-run for the obvious reason: a create call costs real money. Snapshot doesn't bill the same way, but the principle is consistent: if it mutates infrastructure, it asks before firing.&lt;/p&gt;
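
&lt;p&gt;The caching rule is small enough to sketch. This is an assumption-laden illustration rather than the wrapper's code: &lt;code&gt;getToken&lt;/code&gt;, &lt;code&gt;FetchToken&lt;/code&gt;, and &lt;code&gt;REFRESH_SKEW_MS&lt;/code&gt; are hypothetical names, the fetcher is synchronous for brevity (the real token request is an HTTP call), and the clock is passed in so the behavior is easy to test. Only &lt;code&gt;_clearTokenCache()&lt;/code&gt; mirrors a name from the text.&lt;/p&gt;

```typescript
interface CachedToken {
  value: string;
  expiresAtMs: number;
}

// Hypothetical fetcher: returns a bearer token and its lifetime in seconds.
type FetchToken = () => { accessToken: string; expiresInSec: number };

// Treat the token as stale 30 seconds before it actually expires.
const REFRESH_SKEW_MS = 30000;

let tokenCache: CachedToken | null = null;

function getToken(fetchToken: FetchToken, nowMs: number): string {
  if (tokenCache !== null) {
    // Reuse the cached token only while it is outside the refresh window.
    if (tokenCache.expiresAtMs - REFRESH_SKEW_MS > nowMs) {
      return tokenCache.value;
    }
  }
  const fresh = fetchToken();
  tokenCache = {
    value: fresh.accessToken,
    expiresAtMs: nowMs + fresh.expiresInSec * 1000,
  };
  return tokenCache.value;
}

// Test hook: forget the cached token so each test case starts clean.
function _clearTokenCache(): void {
  tokenCache = null;
}
```

&lt;p&gt;The 30-second skew exists so a token is never handed to a request that could outlive it in flight.&lt;/p&gt;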

&lt;p&gt;The wrapper is intentionally thin. It's not trying to be a full Contabo SDK. It exposes three operations because those are the three I actually need from a Claude Code session: list what's running, provision a new instance when scaling out, snapshot before doing something risky. Adding more later is cheap; the safety defaults are the part that has to be right on the first commit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not allowlist commands inside &lt;code&gt;server.exec&lt;/code&gt; itself?
&lt;/h3&gt;

&lt;p&gt;I went back and forth on this. The case for in-tool allowlists is "defense in depth — even if the registry is wrong, the tool refuses." The case against won, for the same reason the registry owns ACLs in general: every additional enforcement point is another spot where the policy can drift from intent.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;server.exec&lt;/code&gt; allowlist &lt;em&gt;is&lt;/em&gt; a regex array — but it lives per-host in &lt;code&gt;hosts.yaml&lt;/code&gt;, not hardcoded in the tool. The tool itself just asks the registry "is this command allowed on this host?" and refuses if no, runs if yes. One source of truth, reviewable in one place.&lt;/p&gt;
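
&lt;p&gt;A minimal sketch of that registry question, assuming &lt;code&gt;hosts.yaml&lt;/code&gt; has already been parsed into a plain object. &lt;code&gt;Registry&lt;/code&gt;, &lt;code&gt;HostEntry&lt;/code&gt;, and &lt;code&gt;isCommandAllowed&lt;/code&gt; are hypothetical names; anchoring every pattern to the whole command and refusing unknown hosts are assumptions of this sketch, not confirmed behavior of the server.&lt;/p&gt;

```typescript
// Hypothetical in-memory view of one hosts.yaml entry.
interface HostEntry {
  allowed_commands: string[]; // regex sources, one per allowed command shape
}

type Registry = { [hostId: string]: HostEntry | undefined };

function isCommandAllowed(registry: Registry, hostId: string, command: string): boolean {
  const host = registry[hostId];
  if (host === undefined) {
    return false; // unknown host: refuse rather than guess
  }
  // Wrap each pattern in ^(?:...)$ so it must match the entire command;
  // a bare prefix match would let "uptime; rm -rf /srv" through.
  return host.allowed_commands.some(
    (source) => new RegExp("^(?:" + source + ")$").test(command)
  );
}
```

&lt;p&gt;The tool never sees the patterns directly; it only gets a yes or a no, which keeps the policy reviewable in the registry file alone.&lt;/p&gt;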

&lt;h2&gt;
  
  
  What this buys you
&lt;/h2&gt;

&lt;p&gt;The day's stats are downstream of the design, not the other way around. Six commits, 16 tools, 40 passing unit tests, typecheck + build clean, CI matrix green on Node 20 and 22. v0.1.0 cut and registered in &lt;code&gt;.mcp.json&lt;/code&gt; by end of day.&lt;/p&gt;

&lt;p&gt;That pace is possible because the cross-cutting decisions were nailed before any tool got written. Every tool was a fill-in-the-blanks exercise: validate args, look up host, check allowlist, build command, run with output cap, return dry-run-or-result. No tool needed to argue with the safety model — it just had to satisfy it.&lt;/p&gt;

&lt;p&gt;The opposite path — build tools first, retrofit safety after — is how MCP servers end up with five different error paths, three different ACL styles, and the one tool that forgot to cap its output. Designing the invariants before the surface area is what keeps the surface area honest.&lt;/p&gt;

&lt;p&gt;There's a second-order benefit: the safety model is the spec. When a future tool gets added — say &lt;code&gt;server.nginx.reload&lt;/code&gt; or &lt;code&gt;server.zfs.snapshot&lt;/code&gt; — there's no design meeting. The author reads the 7-point list, picks the matching pattern (validate-before-reload for nginx, dry-run for zfs), drops it into a host's &lt;code&gt;allowed_commands&lt;/code&gt;, and ships. The cognitive load of "is this safe enough?" got paid down once, on day one.&lt;/p&gt;

&lt;p&gt;This matters more for a Model Context Protocol (MCP) server than for ordinary tools because the caller is a model, not a human. A human reading &lt;code&gt;server.exec&lt;/code&gt; documentation would hesitate before running &lt;code&gt;rm -rf /srv/braves&lt;/code&gt;. A model that just learned the tool exists will happily fire it if the prompt nudges that way. The guardrails are the layer that doesn't depend on the caller having good judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-plugins (PRs #762/#763/#764):&lt;/strong&gt; CI hardening — eslint + prettier added as blocking gates (first PR of a multi-PR cleanup), human-triggered auto-merge disabled (dependabot bumps still auto-merge), and nine historical AA-AACR audit files from December 2025 committed to the record.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;intentional-cognition-os:&lt;/strong&gt; v0.2 dogfood continued — &lt;code&gt;paraphrase_robustness&lt;/code&gt; metric landed in &lt;code&gt;verify.py&lt;/code&gt;, &lt;code&gt;ask-loop.py&lt;/code&gt; extracted as a standalone helper with &lt;code&gt;--paraphrases&lt;/code&gt;, &lt;code&gt;bank.py&lt;/code&gt; schema library plus ADRs 029-032, release v1.2.5, wiki/citation resolution against the workspace cache (closes h99). Continuation of &lt;a href="https://dev.to/posts/icos-dogfood-zero-to-five-fts-fallback/"&gt;yesterday's zero-to-five FTS fallback arc&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pipelinepilot:&lt;/strong&gt; Firebase Cloud Functions standardized on Gen2 Node 20 ESM (&lt;code&gt;firebase-admin&lt;/code&gt; import fixed), orchestrator wrapper added with sync &lt;code&gt;query(**kwargs)&lt;/code&gt; and pinned cloudpickle, Python smoke for the Vertex AI Reasoning Engine, beads tracking initialized, &lt;code&gt;.env&lt;/code&gt; patterns added to &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hybrid-ai-stack + intent-genai-project-template:&lt;/strong&gt; Gemini PR-review fixups — flake8 violations cleared, three pre-existing security defects in redact/sanitize/routes closed, mypy + ruff cleanup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/intent-catalog-six-phases-control-plane/"&gt;Intent Catalog: Six Phases from Empty Repo to Production Control Plane&lt;/a&gt; — the sibling shape: empty repo to working control plane in a small number of clean phases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/guidewire-mcp-v0-1-0-foundation-ship/"&gt;Guidewire MCP v0.1.0: Carrier-Native Server Blueprint&lt;/a&gt; — another foundation-ship MCP post, same v0.1.0 framing.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/v1-release-gate-conditional-go/"&gt;A v1.0 Is a Gate, Not a Tag&lt;/a&gt; — release-gate discipline; the same logic that says "design the invariants first" applies to "earn the version number."&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>typescript</category>
      <category>claudecode</category>
      <category>devops</category>
    </item>
    <item>
      <title>Five Tags, Zero Ships: How an Auto-Release Workflow Lied for a Whole Day</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Sun, 24 May 2026 13:00:27 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/five-tags-zero-ships-how-an-auto-release-workflow-lied-for-a-whole-day-3cii</link>
      <guid>https://dev.to/jeremy_longshore/five-tags-zero-ships-how-an-auto-release-workflow-lied-for-a-whole-day-3cii</guid>
      <description>&lt;p&gt;Five GitHub tags. v1.0.4 through v1.1.0. Five green checkmarks on the workflow. Five formatted release notes. The npm registry stayed at v1.0.5 the entire time.&lt;/p&gt;

&lt;p&gt;This is what it looks like when a release workflow ships tags without shipping code. Every observable surface said "done" except the one that mattered — the registry. The bug wasn't in one place; it was three independent failures that combined to make the lie convincing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Checkmarks Promised
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;gh release list&lt;/code&gt; showed all five tags with formatted changelogs. The workflow run logs were entirely green. If you ran &lt;code&gt;npm install -g intentional-cognition-os&lt;/code&gt;, you got v1.0.5. No error. No warning. Silently wrong for anyone expecting one of the newer tags, silently right for everyone else.&lt;/p&gt;

&lt;p&gt;The pattern repeated across the morning: commit → auto-release fires → tag appears → npm registry unchanged. The workflow was perfectly honest about tagging. It just wasn't releasing anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 1: Tests That Passed by Lying
&lt;/h2&gt;

&lt;p&gt;The "Verify readiness" step was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify readiness&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm test || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;|| true&lt;/code&gt; is the tell. Every test failed. &lt;code&gt;Failed to resolve entry for package @ico/types&lt;/code&gt; — the workspace packages hadn't been built yet, so &lt;code&gt;pnpm test&lt;/code&gt; resolved nothing, threw hard errors, and the &lt;code&gt;|| true&lt;/code&gt; swallowed them all. The workflow saw exit code 0 and kept going.&lt;/p&gt;

&lt;p&gt;In a monorepo, the build step is not optional ceremony. The test runner needs the workspace packages to be built first. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify readiness&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;set -e&lt;/span&gt;
    &lt;span class="s"&gt;pnpm build&lt;/span&gt;
    &lt;span class="s"&gt;pnpm test&lt;/span&gt;
    &lt;span class="s"&gt;pnpm lint&lt;/span&gt;
    &lt;span class="s"&gt;pnpm typecheck&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;set -e&lt;/code&gt; makes the script exit on the first command that returns non-zero, which fails the step and stops the job. If tests fail after the build, you find out. If the build fails, you stop. Lint and typecheck went into the same step because they were already in the local pre-push hook; the only reason to keep them out of the release gate is laziness or speed, and a release gate is the wrong place to optimize for either.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 2: Nine Version Sources, Six Ignored
&lt;/h2&gt;

&lt;p&gt;Nine surfaces emit a version string in this repo: root &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;version.txt&lt;/code&gt;, &lt;code&gt;CHANGELOG.md&lt;/code&gt;, the five workspace &lt;code&gt;package.json&lt;/code&gt; files (&lt;code&gt;packages/cli&lt;/code&gt;, &lt;code&gt;packages/kernel&lt;/code&gt;, &lt;code&gt;packages/compiler&lt;/code&gt;, &lt;code&gt;packages/types&lt;/code&gt;, &lt;code&gt;packages/benchmarks&lt;/code&gt;), and the runtime constant at &lt;code&gt;packages/kernel/src/version.ts&lt;/code&gt;. The workflow bumped three of them — root, &lt;code&gt;version.txt&lt;/code&gt;, &lt;code&gt;CHANGELOG.md&lt;/code&gt; — and silently left the other six behind.&lt;/p&gt;

&lt;p&gt;Result: root said 1.0.4, workspace packages said 1.0.3. Root said 1.0.5, workspace said 1.0.4. Drift every run. &lt;code&gt;ico --version&lt;/code&gt; told users the workspace's number, not the tag's.&lt;/p&gt;

&lt;p&gt;Lock-step monorepos need single-source-of-truth version sync. A helper that picks up the six the workflow was missing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bump_pkg_json&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;
  node &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"
    const fs = require('fs');
    const pkg = JSON.parse(fs.readFileSync('&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;', 'utf8'));
    pkg.version = '&lt;/span&gt;&lt;span class="nv"&gt;$version&lt;/span&gt;&lt;span class="s2"&gt;';
    fs.writeFileSync('&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;', JSON.stringify(pkg, null, 2) + '&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;');
  "&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

bump_pkg_json package.json &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;pkg &lt;span class="k"&gt;in &lt;/span&gt;packages/&lt;span class="k"&gt;*&lt;/span&gt;/package.json&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;bump_pkg_json &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pkg&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done
&lt;/span&gt;&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"s/export const VERSION = '.*';/export const VERSION = '&lt;/span&gt;&lt;span class="nv"&gt;$VERSION&lt;/span&gt;&lt;span class="s2"&gt;';/"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  packages/kernel/src/version.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All nine sources now move together. &lt;code&gt;ico --version&lt;/code&gt; reports the truth.&lt;/p&gt;
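
&lt;p&gt;One way to keep that true over time is a drift check that runs after the bump. The sketch below is an assumption, not part of the described workflow: &lt;code&gt;VersionSurface&lt;/code&gt; and &lt;code&gt;findVersionDrift&lt;/code&gt; are hypothetical names, and reading each file into a version string is left to the caller.&lt;/p&gt;

```typescript
// Hypothetical input: one version surface and the string it currently reports.
interface VersionSurface {
  source: string;
  version: string;
}

// Return every surface that disagrees with the expected release version.
// An empty result means all surfaces moved together.
function findVersionDrift(expected: string, surfaces: VersionSurface[]): VersionSurface[] {
  return surfaces.filter((surface) => surface.version !== expected);
}
```

&lt;p&gt;A release step could fail whenever the returned list is non-empty and print the offending sources, so drift is caught on the run that introduces it.&lt;/p&gt;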

&lt;h2&gt;
  
  
  Bug 3: The Step That Wasn't There
&lt;/h2&gt;

&lt;p&gt;The workflow tagged releases. It never published to npm. There was no &lt;code&gt;npm publish&lt;/code&gt; step. That's not a typo: the workflow ran to completion without one. Every release ran. Every release skipped the one thing that makes it a release.&lt;/p&gt;

&lt;p&gt;Here's what belongs after "Create GitHub Release":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish to npm&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;NPM_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.NPM_TOKEN }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;set -e&lt;/span&gt;
    &lt;span class="s"&gt;if [ -z "$NPM_TOKEN" ]; then&lt;/span&gt;
      &lt;span class="s"&gt;echo "NPM_TOKEN not set — skipping publish"&lt;/span&gt;
      &lt;span class="s"&gt;exit 0&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;
    &lt;span class="s"&gt;if npm view "intentional-cognition-os@$VERSION" version 2&amp;gt;/dev/null; then&lt;/span&gt;
      &lt;span class="s"&gt;echo "intentional-cognition-os@$VERSION already on npm — skipping"&lt;/span&gt;
      &lt;span class="s"&gt;exit 0&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;
    &lt;span class="s"&gt;echo "//registry.npmjs.org/:_authToken=$NPM_TOKEN" &amp;gt; ~/.npmrc&lt;/span&gt;
    &lt;span class="s"&gt;pnpm --filter intentional-cognition-os publish --no-git-checks&lt;/span&gt;
    &lt;span class="s"&gt;sleep 5&lt;/span&gt;
    &lt;span class="s"&gt;npm view "intentional-cognition-os@$VERSION" version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three guards, all in the script — not in the step's &lt;code&gt;if:&lt;/code&gt; condition. (Step-level &lt;code&gt;env:&lt;/code&gt; isn't available to that step's own &lt;code&gt;if:&lt;/code&gt; in GitHub Actions, so &lt;code&gt;if: env.NPM_TOKEN != ''&lt;/code&gt; would always evaluate false. The check belongs inside &lt;code&gt;run:&lt;/code&gt;, where the env is real.) Token presence fails safe if it's missing. Idempotency skips if already published (covers manual publishes). Post-publish verification re-queries the registry to confirm it landed.&lt;/p&gt;

&lt;p&gt;A release workflow that doesn't end with a verifiable artifact in the registry isn't a release workflow. It's a tagging workflow with extra steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  The State Behind the Process
&lt;/h2&gt;

&lt;p&gt;Fixing the workflow forward didn't fix the present. When the workflow was corrected (commit &lt;code&gt;7681dd5&lt;/code&gt;), &lt;code&gt;main&lt;/code&gt; had already drifted: root at 1.1.0, workspace at 1.0.5. Users running &lt;code&gt;ico --version&lt;/code&gt; got &lt;code&gt;1.0.5&lt;/code&gt;. A one-time backfill in commit &lt;code&gt;c651de8&lt;/code&gt; aligned all nine version sources to 1.1.0. Then verified: &lt;code&gt;pnpm build&lt;/code&gt; succeeded, &lt;code&gt;pnpm test&lt;/code&gt; 1,210/1,210 passing, &lt;code&gt;ico --version&lt;/code&gt; → &lt;code&gt;1.1.0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Process bugs leave state behind. Fixing the process doesn't heal the damage. You clean it up separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Bug Pattern
&lt;/h2&gt;

&lt;p&gt;Every CI/CD pipeline that ships has these three failure modes available:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Quality gates that pass on failure (&lt;code&gt;|| true&lt;/code&gt;, swallowed errors). Fix: &lt;code&gt;set -e&lt;/code&gt; and explicit step order.&lt;/li&gt;
&lt;li&gt;Monorepo workspaces with distributed version state. Fix: single-source-of-truth version sync in the workflow.&lt;/li&gt;
&lt;li&gt;A release workflow that doesn't end by verifying that the artifact reached the registry. Fix: a final step that queries the registry and confirms.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The icos release workflow had all three. The checkmarks lied because the workflow wasn't designed to catch itself lying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also Shipped 2026-05-19
&lt;/h2&gt;

&lt;p&gt;Daily-log convention — the rest of the day, in one paragraph each. Not connected to the release-workflow thread; logged here because they happened on the same git day.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-slack-channel v2 cluster&lt;/strong&gt; — 4 PRs merged with enterprise governance substrate framing. RFC 8785 JCS interop vectors (#175), cross-tier shadow detection (#176), journal v2 Ed25519 signing (#177), strip denied tool-call detail (#178).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kobiton R3 close-out&lt;/strong&gt; — deliverable final review, Blog 3 rewrite, 5 close-out PRs merged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-plugins partner portal&lt;/strong&gt; — Kobiton and Nixtla brand integration, Killer Skill of the Week refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;intentional-cognition-os test infra&lt;/strong&gt; — Intent Solutions Testing SOP layers L0-L7 installed (&lt;code&gt;.husky/&lt;/code&gt;, dependency-cruiser, stryker, RTM/PERSONAS/JOURNEYS docs). 3,447 insertions in commit &lt;code&gt;e0efdee&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/v1-release-gate-conditional-go/"&gt;v1.0.0: Conditional GO Through a Release Gate&lt;/a&gt; — The gate that flagged this path.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/honest-perf-benchmarks-paid-api-compiler/"&gt;Honest Performance Benchmarks for a Paid-API Compiler&lt;/a&gt; — Earlier icos work from this release cycle; same repo, different failure mode.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>releaseengineering</category>
      <category>cicd</category>
      <category>monorepo</category>
      <category>debugging</category>
    </item>
    <item>
      <title>A v1.0 Is a Gate, Not a Tag</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Thu, 21 May 2026 13:00:39 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/a-v10-is-a-gate-not-a-tag-3bc4</link>
      <guid>https://dev.to/jeremy_longshore/a-v10-is-a-gate-not-a-tag-3bc4</guid>
      <description>&lt;p&gt;Two beads were open at the start of 2026-05-18. E10-B11 was the v1.0 release-readiness gate. E10-B12 was the v1.0 release cut, blocked-by-design on B11. Epic 10 was the last epic in &lt;code&gt;intentional-cognition-os&lt;/code&gt; (ICO). The release pipeline was wired through &lt;code&gt;/release&lt;/code&gt;. Everything that mattered had to clear one ritual.&lt;/p&gt;

&lt;p&gt;Five npm releases shipped that day, taking the package from v0.21.0 to v1.0.1: v0.22.0 → v0.22.1 → v0.22.2 → &lt;strong&gt;v1.0.0&lt;/strong&gt; → v1.0.1. The interesting one is v1.0.0, because the gate said &lt;strong&gt;GO with conditions&lt;/strong&gt;, not GO. And the same-day v1.0.1 is the proof that "GO with conditions" is the correct verdict shape for a real release, not a binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3× degradation gate
&lt;/h2&gt;

&lt;p&gt;The release ran on top of fresh benchmark infrastructure. &lt;code&gt;625691e&lt;/code&gt; and &lt;code&gt;f7bd287&lt;/code&gt; closed out E10-B06 (performance profiling) with a 500-source large-corpus benchmark. The headline addition was a &lt;strong&gt;3× degradation gate&lt;/strong&gt; — a configurable cap (default 3.0) that fails the run if per-unit cost at large scale exceeds 3× the moderate-corpus baseline.&lt;/p&gt;

&lt;p&gt;The gate is intentionally narrow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// utils/degradation.ts — gate stays honest by NOT inferring per-unit costs&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;computeDegradation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;moderatePerUnitMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;largePerUnitMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;moderatePerUnitMs&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// catch degenerate samples loudly&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;largePerUnitMs&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;moderatePerUnitMs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;cap&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner does per-unit derivation BEFORE calling the gate. Ingest's &lt;code&gt;perFile.medianMs&lt;/code&gt; is already per-unit (each iteration was one file). Lint's &lt;code&gt;result.medianMs&lt;/code&gt; is whole-workspace, so the runner divides by page count first. Putting that decision in the runner instead of the gate is the difference between "gate that knows what it's measuring" and "gate that guesses at the measurement units."&lt;/p&gt;
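
&lt;p&gt;The runner-side normalization might look roughly like this. It is a sketch under assumptions: &lt;code&gt;Sample&lt;/code&gt;, &lt;code&gt;perUnitMs&lt;/code&gt;, and &lt;code&gt;gate&lt;/code&gt; are hypothetical names, and &lt;code&gt;computeDegradation&lt;/code&gt; is repeated from the snippet above (with the comparison written as &lt;code&gt;cap &amp;gt;= ratio&lt;/code&gt;, which is equivalent) only so the block stands alone.&lt;/p&gt;

```typescript
// The gate, repeated from the post so this sketch is self-contained.
// It compares per-unit numbers and never tries to infer units itself.
function computeDegradation(
  moderatePerUnitMs: number,
  largePerUnitMs: number,
  cap: number = 3.0
): { ratio: number; pass: boolean } {
  if (moderatePerUnitMs === 0) {
    return { ratio: Infinity, pass: false }; // degenerate sample: fail loudly
  }
  const ratio = largePerUnitMs / moderatePerUnitMs;
  return { ratio, pass: cap >= ratio };
}

// Hypothetical benchmark sample: a median and how many units it covered.
// units is 1 when the median is already per-file (ingest) and the page
// count when the median covers the whole workspace (lint).
interface Sample {
  medianMs: number;
  units: number;
}

// The runner owns the unit conversion.
function perUnitMs(sample: Sample): number {
  return sample.medianMs / sample.units;
}

function gate(moderate: Sample, large: Sample): { ratio: number; pass: boolean } {
  return computeDegradation(perUnitMs(moderate), perUnitMs(large));
}
```

&lt;p&gt;Keeping &lt;code&gt;units&lt;/code&gt; on the sample means a new benchmark has to declare what its median covers before it can reach the gate at all.&lt;/p&gt;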

&lt;p&gt;Results at 500 sources: ingest 1.25× (PASS), lint 0.33× (PASS — got faster at scale, likely amortized constants). The gate had teeth and the system passed cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The release-readiness checklist (E10-B11, PR #73)
&lt;/h2&gt;

&lt;p&gt;Eight items, verified item-by-item, recorded honestly. No "looks good to me" entries:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CI passes&lt;/strong&gt; — all 4 jobs green on last 3 main runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evals pass&lt;/strong&gt; — smoke eval clean; retrieval/citation/compilation handlers wired with 30+ unit tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage targets&lt;/strong&gt; — PARTIAL. Types 100%, kernel 84.6% (target 90%), compiler 62.3% (target 80%), CLI 45.2% (target 70%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs updated&lt;/strong&gt; — current per E10-B07/B08&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CHANGELOG complete&lt;/strong&gt; — auto-generated, current through v0.22.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No critical beads open&lt;/strong&gt; — only B11 (this) + B12 (release cut, blocked by design)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User journey walkthrough&lt;/strong&gt; — &lt;code&gt;ico init&lt;/code&gt; → status → 14-command CLI surface, live smoke-tested&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance targets met&lt;/strong&gt; — ingest 200× headroom, lint 3000× headroom, 3× degradation gate PASS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Verdict: &lt;strong&gt;GO with two conditions.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;C1:&lt;/strong&gt; &lt;code&gt;ico --version&lt;/code&gt; reported &lt;code&gt;0.1.0&lt;/code&gt; (a stale kernel constant) instead of the published &lt;code&gt;0.22.x&lt;/code&gt;. Fix in-cut.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C2:&lt;/strong&gt; Coverage shortfall on kernel/compiler/cli. Documented as post-v1, not blocking. 1,210 passing tests, zero known bugs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That verdict is the artifact. Most release rituals make GO/NO-GO a binary. The conditional verdict is honest: state the gap, decide whether it blocks, and if it doesn't, ship and document the gap permanently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "GO with conditions" actually means
&lt;/h2&gt;

&lt;p&gt;A conditional release verdict is the three-state model: &lt;strong&gt;fix what's fixable in-cut, document what isn't, ship anyway.&lt;/strong&gt; Unlike a binary GO/NO-GO gate that forces a boolean choice, a conditional gate acknowledges that real releases ship with known imperfections. The conditions are documented forever in the release record — no lying about readiness, no pretending gaps don't exist, but no unnecessary delays waiting for the perfect threshold that never comes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not GO/NO-GO binary?
&lt;/h2&gt;

&lt;p&gt;Binary GO/NO-GO encourages two bad behaviors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavior one: lower the bar to ship.&lt;/strong&gt; "The version-string bug is fine, users will figure it out." The release ships, the operator-visible defect ships with it, and the next person debugging an environment ends up reading the wrong build into their incident postmortem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavior two: delay until the gate is perfect.&lt;/strong&gt; Coverage targets get met on a Tuesday that never comes. Kernel at 84.6% is not 90%, so v1.0 slips. Then 90% becomes 95%, because some new code landed during the wait. The gate becomes a treadmill.&lt;/p&gt;

&lt;p&gt;Coverage at kernel 84.6% / compiler 62.3% / CLI 45.2% with 1,210 passing tests and zero known bugs &lt;strong&gt;is shippable&lt;/strong&gt;. Blocking v1.0 on coverage uplift would have been a bigger lie than shipping with documented shortfalls. The AAR opens C2 as a post-v1 bead for the next planning cycle. The truth is in the record.&lt;/p&gt;

&lt;p&gt;C1 is the inverse case: &lt;code&gt;ico --version&lt;/code&gt; reporting the wrong number is shippable but ugly, and the fix is small. So fix it in-cut, document it, move on. The gate didn't pretend C1 was fine; it just didn't pretend it was a v1.0-blocker either.&lt;/p&gt;

&lt;p&gt;The prescription is a three-part rule, not a two-part one: &lt;strong&gt;fix what's fixable in-cut, document what isn't, ship anyway.&lt;/strong&gt; Binary GO/NO-GO collapses three states into two and loses the most useful one — the "shippable with known imperfections" state where most real releases actually live.&lt;/p&gt;
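&lt;p&gt;The three-state rule above can be sketched as a small type. This is a minimal illustration, not the &lt;code&gt;/release&lt;/code&gt; implementation: &lt;code&gt;Disposition&lt;/code&gt;, &lt;code&gt;Condition&lt;/code&gt;, &lt;code&gt;Verdict&lt;/code&gt;, and &lt;code&gt;decide&lt;/code&gt; are hypothetical names chosen for the example.&lt;/p&gt;

```typescript
// Hypothetical model of a conditional release verdict; all names are illustrative.
type Disposition = 'fix-in-cut' | 'document-post-release' | 'blocking';

interface Condition {
  id: string;
  summary: string;
  disposition: Disposition;
}

type Verdict =
  | { kind: 'GO' }
  | { kind: 'GO_WITH_CONDITIONS'; conditions: Condition[] }
  | { kind: 'NO_GO'; blockers: Condition[] };

function decide(conditions: Condition[]): Verdict {
  // Anything marked blocking stops the release outright.
  const blockers = conditions.filter((c) => c.disposition === 'blocking');
  if (blockers.length > 0) return { kind: 'NO_GO', blockers };
  // Known gaps that do not block travel with the verdict instead of vanishing.
  if (conditions.length > 0) return { kind: 'GO_WITH_CONDITIONS', conditions };
  return { kind: 'GO' };
}

const verdict = decide([
  { id: 'C1', summary: 'ico --version reports a stale kernel constant', disposition: 'fix-in-cut' },
  { id: 'C2', summary: 'coverage shortfall on kernel/compiler/cli', disposition: 'document-post-release' },
]);
console.log(verdict.kind); // GO_WITH_CONDITIONS
```

&lt;p&gt;The middle variant is the one a boolean gate cannot express: the conditions stay attached to the verdict, so they end up in the release record rather than in someone's memory.&lt;/p&gt;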

&lt;h2&gt;
  
  
  C1 fix: read your own version (PR #74)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;packages/cli/src/index.ts&lt;/code&gt; had been importing &lt;code&gt;version&lt;/code&gt; from &lt;code&gt;@ico/kernel&lt;/code&gt;, which exported a hardcoded string. The kernel constant was never maintained in lock-step with the published CLI package — and &lt;strong&gt;shouldn't be&lt;/strong&gt;, since they are independent artifacts on independent release cadences.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
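&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/cli/src/index.ts: read from CLI's own package.json&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;readFileSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;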
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;readCliVersion&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pkgPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../package.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pkg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pkgPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;pkg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[ico] failed to read CLI package.json:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0.0.0-unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// sentinel — CLI keeps working, operator sees clear msg&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cliVersion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;readCliVersion&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The try/catch is load-bearing. &lt;code&gt;readCliVersion()&lt;/code&gt; runs at module load, BEFORE the process-level error handlers are installed further down the file. An uncaught throw here would surface as a raw Node stack trace and bypass the friendly &lt;code&gt;[ico]&lt;/code&gt;-prefixed message convention every other CLI error uses. The sentinel path is what makes this safe to call at import time — the CLI keeps working, the operator gets a legible message, and the bug is visible without crashing.&lt;/p&gt;

&lt;p&gt;The test was tightened in the same PR. &lt;code&gt;/^\d+\.\d+\.\d+/&lt;/code&gt; (no end anchor — would accept nonsense like &lt;code&gt;0.22.1.99&lt;/code&gt;) became:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cliVersion&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toMatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;[\w&lt;/span&gt;&lt;span class="sr"&gt;.-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)?&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strict semver core plus optional pre-release tag. The previous regex was a one-character bug; the fix is one character plus an opt-in pre-release group.&lt;/p&gt;
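&lt;p&gt;To see the difference between the two patterns, here is a small sketch that runs both against a handful of sample strings. The sample list is an assumption for illustration; the two regexes are the ones quoted above.&lt;/p&gt;

```typescript
// Old pattern from the post: no end anchor, so trailing junk is accepted.
const loose = /^\d+\.\d+\.\d+/;
// Tightened pattern from the post: end anchor plus optional pre-release tag.
const strict = /^\d+\.\d+\.\d+(-[\w.-]+)?$/;

// Illustrative inputs, including the sentinel returned by readCliVersion().
const samples = ['1.0.0', '0.22.1', '0.22.1.99', '1.0.0-rc.1', '0.0.0-unknown'];
for (const s of samples) {
  console.log(s, 'loose:', loose.test(s), 'strict:', strict.test(s));
}
```

&lt;p&gt;Only &lt;code&gt;0.22.1.99&lt;/code&gt; changes outcome: the loose pattern accepts it and the strict one rejects it. Note that the &lt;code&gt;0.0.0-unknown&lt;/code&gt; sentinel also matches the strict pattern, so this assertion on its own would not flag a failed &lt;code&gt;package.json&lt;/code&gt; read.&lt;/p&gt;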

&lt;h2&gt;
  
  
  The cut itself (52fa7a4 → v1.0.0)
&lt;/h2&gt;

&lt;p&gt;The cut commit was tiny: 11 files, +54/-10 lines. It did one thing: aligned &lt;strong&gt;all 6 workspace &lt;code&gt;package.json&lt;/code&gt;&lt;/strong&gt; + &lt;code&gt;version.txt&lt;/code&gt; + &lt;code&gt;kernel/src/version.ts&lt;/code&gt; at 1.0.0.&lt;/p&gt;

&lt;p&gt;The auto-release workflow had been bumping the root &lt;code&gt;package.json&lt;/code&gt; and &lt;code&gt;version.txt&lt;/code&gt; only — internal packages had drifted to 0.1.0 or 0.22.1 depending on history. &lt;code&gt;/release&lt;/code&gt; Phase 3 caught the drift. Phase 5 required explicit SHA approval before any push (&lt;code&gt;f1a627b&lt;/code&gt;). Phases 6-8 ran atomically.&lt;/p&gt;
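&lt;p&gt;A drift check of the kind described above can be very small. The sketch below is an assumption about the shape of such a check, not the &lt;code&gt;/release&lt;/code&gt; Phase 3 code; &lt;code&gt;findVersionDrift&lt;/code&gt; and the &lt;code&gt;@ico/cli&lt;/code&gt; package name are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical drift check: every workspace manifest should carry the release version.
interface Manifest {
  name: string;
  version: string;
}

function findVersionDrift(expected: string, manifests: Manifest[]): Manifest[] {
  return manifests.filter((m) => m.version !== expected);
}

// Illustrative package names and versions, not the real workspace layout.
const drifted = findVersionDrift('1.0.0', [
  { name: 'root', version: '1.0.0' },
  { name: '@ico/kernel', version: '0.1.0' },
  { name: '@ico/cli', version: '0.22.1' },
]);
for (const m of drifted) {
  console.log(`${m.name} is at ${m.version}, expected 1.0.0`);
}
```

&lt;p&gt;In a real workflow the manifests would be read from each workspace &lt;code&gt;package.json&lt;/code&gt;, and a non-empty result would fail the step before any tag is pushed.&lt;/p&gt;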

&lt;p&gt;Verified at v1.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,210 / 1,210 tests pass across 5 packages&lt;/li&gt;
&lt;li&gt;Lint + typecheck clean&lt;/li&gt;
&lt;li&gt;escape-scan REFUSE=0 CHALLENGE=0 FLAG=0&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ico --version&lt;/code&gt; reports &lt;code&gt;1.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The tarball turned out incomplete (v1.0.1, same day)
&lt;/h2&gt;

&lt;p&gt;During the actual &lt;code&gt;npm publish&lt;/code&gt; flow, the pack dry-run reported &lt;strong&gt;7 files&lt;/strong&gt; where 9 were expected: dist + package.json, but no README and no LICENSE. The CLI's &lt;code&gt;package.json&lt;/code&gt; declared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"files"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"README.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LICENSE"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the CLI directory didn't OWN those files. The canonical &lt;code&gt;README.md&lt;/code&gt; and &lt;code&gt;LICENSE&lt;/code&gt; live at the monorepo root.&lt;/p&gt;

&lt;p&gt;Fix landed inline before the real publish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/cli/tsup.config.ts — copy README + LICENSE at build time&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="c1"&gt;// ... entry, format, dts, sourcemap ...&lt;/span&gt;
  &lt;span class="na"&gt;onSuccess&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cp ../../README.md ../../LICENSE ./&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The copies are gitignored (their source of truth is the repo root). v1.0.0 on npm now includes both. No version bump for the build-infra fix itself, but the same day shipped v1.0.1 for the next user-visible change.&lt;/p&gt;

&lt;p&gt;This is the test of whether "GO with conditions" was the right shape. A binary GO/NO-GO ritual would have caught the version string (C1) and either fixed it before re-running the whole gate or punted to v1.0.1. The conditional model said: ship, here's what we know is imperfect. When the tarball turned out incomplete during the actual publish — a discovery that &lt;strong&gt;couldn't&lt;/strong&gt; have been made during gate verification, because it only surfaces in the publish pipeline itself — the answer was just: ship v1.0.1 the same day. No drama. No "release is broken" panic. The model already accepted that real releases generate follow-on releases.&lt;/p&gt;
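&lt;p&gt;One way to surface this class of gap at the start of the publish pipeline is to compare the declared &lt;code&gt;files&lt;/code&gt; entries against what actually exists in the package directory. The sketch below is hypothetical (&lt;code&gt;missingDeclaredFiles&lt;/code&gt; is not part of the repo) and treats every entry as a literal path, whereas npm also accepts glob patterns in &lt;code&gt;files&lt;/code&gt;.&lt;/p&gt;

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical pre-publish check: which entries in package.json "files"
// do not exist inside the package directory?
function missingDeclaredFiles(
  pkgDir: string,
  files: string[],
  exists: (path: string) => boolean = existsSync,
): string[] {
  return files.filter((entry) => !exists(join(pkgDir, entry)));
}

// Simulated directory listing for the situation described above:
// dist is built, but README.md and LICENSE live at the monorepo root.
const present = new Set([join('packages/cli', 'dist')]);
const missing = missingDeclaredFiles(
  'packages/cli',
  ['dist', 'README.md', 'LICENSE'],
  (p) => present.has(p),
);
console.log(missing); // [ 'README.md', 'LICENSE' ]
```

&lt;p&gt;The injectable &lt;code&gt;exists&lt;/code&gt; parameter is only there to keep the example runnable without a real checkout; by default the function checks the filesystem.&lt;/p&gt;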

&lt;h2&gt;
  
  
  AAR same day
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;d17e10e docs(aar): v1.0.0 release after-action report&lt;/code&gt; landed within hours. Three lessons-for-next-release, captured while they were still warm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Beads JSONL/Dolt sync flapping&lt;/strong&gt; during multi-PR sessions — repeated need to re-close beads after merges. Filed as a follow-up to investigate the sync ordering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-release workflow bumps root + &lt;code&gt;version.txt&lt;/code&gt; only&lt;/strong&gt; — should bump &lt;code&gt;packages/*/package.json&lt;/code&gt; in lock-step. The 11-file cut commit was entirely correcting drift the workflow could have prevented.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/release&lt;/code&gt; skill execution worked as designed&lt;/strong&gt; — Phase 0 surfaced no blockers, Phases 1-3 caught the version drift, Phase 5 required SHA approval, Phases 6-8 atomic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same-day AAR is non-negotiable. The version-drift issue, the tarball issue, and the conditional-verdict pattern all lose most of their teaching value if you write the AAR a week later, after the warm memory of "wait, why didn't the workflow catch that?" has faded into "yeah, we shipped, it was fine."&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped
&lt;/h2&gt;

&lt;p&gt;The release gate constrained the v1.0.0 cut, not the working day. Three other repos kept moving in parallel — exactly the behavior the conditional-verdict model is designed to enable. A release that takes the whole org offline isn't a release ritual; it's an outage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;hustle:&lt;/strong&gt; Phase 3 auth landed in three commits — NextAuth + Drizzle/SQLite infrastructure, dashboard cutover, password reset flow. Coordinated migration from the previous auth stack on a single feature branch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-slack-channel:&lt;/strong&gt; ACP session/cancel boundary adapter extracted into a module, and JSON-RPC &lt;code&gt;id&lt;/code&gt; widened to nullable per spec §5.1 (#172, #173).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-plugins:&lt;/strong&gt; Six PRs — repo quality audit, private vulnerability reporting enabled, validator discovers root-level &lt;code&gt;SKILL.md&lt;/code&gt; (Anthropic-spec layout), slack-channel mirror stopped stripping upstream tests, blog cross-post infra fix.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/honest-perf-benchmarks-paid-api-compiler/"&gt;Honest perf benchmarks for a paid-API compiler&lt;/a&gt; — yesterday's post on the benchmark infrastructure that fed this release gate&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/five-releases-fifteen-minutes-mandy-cutover-and-freeze-break/"&gt;Five releases in fifteen minutes: Mandy cutover and freeze break&lt;/a&gt; — earlier five-releases-in-a-day pattern&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/github-release-workflow-uncommitted-changes-semantic-versioning/"&gt;GitHub release workflow: uncommitted changes and semantic versioning&lt;/a&gt; — related release-engineering theme&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>releaseengineering</category>
      <category>cicd</category>
      <category>typescript</category>
      <category>testing</category>
    </item>
    <item>
      <title>Honest Perf Benchmarks for a Paid-API Compiler</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Wed, 20 May 2026 13:00:40 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/honest-perf-benchmarks-for-a-paid-api-compiler-56h4</link>
      <guid>https://dev.to/jeremy_longshore/honest-perf-benchmarks-for-a-paid-api-compiler-56h4</guid>
      <description>&lt;p&gt;&lt;code&gt;intentional-cognition-os&lt;/code&gt; is a TypeScript "compiler" — markdown sources go in one end, a structured artifact comes out the other, and several of the middle stages call paid Claude APIs to do the cognitive work. Up to today there were zero performance gates on any of it. No baseline, no regression alarm, no "did that refactor make ingest 4× slower" check.&lt;/p&gt;

&lt;p&gt;The benchmark suite that landed across four PRs answers two design questions that had to be settled before a single line of timing code got written:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How do you compare numbers across machines when half the corpus is randomly generated text?&lt;/li&gt;
&lt;li&gt;What do you do about the steps that cost real money on every run?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Get either answer wrong and the benchmark suite is worse than no benchmark suite — it produces numbers that look authoritative and aren't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The corpus has to be byte-identical
&lt;/h2&gt;

&lt;p&gt;The first scenario — &lt;code&gt;ingest&lt;/code&gt; — needs a corpus. Hand-curated fixtures committed to disk were considered and rejected: they don't scale, they go stale, and they encode whoever-wrote-them's idea of "representative." A generator is the right answer, but a generator has to be deterministic or before/after diffs are noise.&lt;/p&gt;

&lt;p&gt;The generator uses a seeded &lt;code&gt;mulberry32&lt;/code&gt; PRNG and pulls UUIDs from the same stream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;mulberry32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mh"&gt;0x6d2b79f5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;^=&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;61&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4294967296&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;seededUuidV4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 16 bytes from the seeded stream, version + variant nibbles set per RFC 4122&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x0f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mh"&gt;0x40&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x3f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;formatUuid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The non-obvious trap is &lt;code&gt;crypto.randomUUID&lt;/code&gt;. It would have looked correct, passed every unit test, and silently produced different UUIDs on every run — so every "identical" corpus would have differed in the front-matter &lt;code&gt;id&lt;/code&gt; field. That breaks ingest's content-hash cache in different ways on different machines. Same seed, same count, same body-word count yields byte-identical output everywhere. That's the contract.&lt;/p&gt;
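&lt;p&gt;The determinism contract can be checked directly. The sketch below repeats &lt;code&gt;mulberry32&lt;/code&gt; so it runs on its own and adds a hypothetical stand-in for &lt;code&gt;formatUuid&lt;/code&gt;, whose real implementation is not shown in this post; &lt;code&gt;take&lt;/code&gt; and &lt;code&gt;rawId&lt;/code&gt; are illustrative helpers.&lt;/p&gt;

```typescript
// mulberry32 as shown above, repeated so this sketch runs on its own.
function mulberry32(seed: number): () => number {
  return function () {
    let t = (seed += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for formatUuid: hex-encode 16 bytes as 8-4-4-4-12.
function formatUuid(bytes: Uint8Array): string {
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}

// Pull n values from a generator.
function take(rand: () => number, n: number): number[] {
  return Array.from({ length: n }, () => rand());
}

// Raw 16-byte id from a seed, without the version and variant bits
// that seededUuidV4 sets; enough to demonstrate reproducibility.
function rawId(seed: number): string {
  const bytes = Uint8Array.from(take(mulberry32(seed), 16).map((v) => Math.floor(v * 256)));
  return formatUuid(bytes);
}

console.log(rawId(42) === rawId(42)); // true: same seed, same bytes
```

&lt;p&gt;Swapping &lt;code&gt;rawId&lt;/code&gt; for anything backed by &lt;code&gt;crypto.randomUUID&lt;/code&gt; would make that comparison fail, which is exactly the property a corpus generator cannot afford to lose.&lt;/p&gt;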

&lt;p&gt;One more gotcha worth a sentence: the corpus generator writes front matter through &lt;code&gt;gray-matter&lt;/code&gt;, which quotes string values. The compiler's wiki-page validator uses a hand-rolled YAML parser that does NOT strip quotes — so wiki fixtures emit all values unquoted. A quoted &lt;code&gt;compiled_at&lt;/code&gt; would arrive at Zod's datetime check with literal &lt;code&gt;"&lt;/code&gt; characters in it and fail. Two parsers, two rules, documented inline at the parser boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  An API key is not consent
&lt;/h2&gt;

&lt;p&gt;The render, compile, and ask scenarios call Claude. Running them on every CI pass would either drain a budget or quietly stop running when the budget hit zero. Neither is acceptable.&lt;/p&gt;

&lt;p&gt;The gate is two env vars, both required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-... &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;ICO_BENCH_INCLUDE_CLAUDE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
pnpm &lt;span class="nt"&gt;--filter&lt;/span&gt; @ico/benchmarks bench
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From PR #70's design notes, kept verbatim because the framing matters:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The double gate is intentional. An API key alone is not consent — many developers have it set for normal CLI use. &lt;code&gt;ICO_BENCH_INCLUDE_CLAUDE&lt;/code&gt; is the explicit "yes, burn tokens on this benchmark run" signal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This pattern shows up elsewhere — &lt;code&gt;CI=true&lt;/code&gt; plus &lt;code&gt;RUN_E2E=1&lt;/code&gt;, prod credentials plus &lt;code&gt;--really-really-yes&lt;/code&gt;. The shape is the same: one signal proves capability, the second proves intent. A single-gate design fails open the first time someone forgets which shell they're in.&lt;/p&gt;
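&lt;p&gt;A minimal sketch of the double gate, assuming a hypothetical &lt;code&gt;claudeGate&lt;/code&gt; helper. The env var names come from the post; the helper itself, the exact &lt;code&gt;'1'&lt;/code&gt; comparison, and the skip reason for a missing key are assumptions.&lt;/p&gt;

```typescript
// Hypothetical gate helper; the env var names come from the post, the rest is illustrative.
type Env = { [key: string]: string | undefined };

type GateResult = { run: true } | { run: false; skipReason: string };

function claudeGate(env: Env): GateResult {
  // Capability: without a key the scenario cannot run at all.
  if (!env['ANTHROPIC_API_KEY']) {
    return { run: false, skipReason: 'ANTHROPIC_API_KEY not set' };
  }
  // Intent: the key alone is not consent to spend tokens.
  if (env['ICO_BENCH_INCLUDE_CLAUDE'] !== '1') {
    return { run: false, skipReason: 'ICO_BENCH_INCLUDE_CLAUDE not set' };
  }
  return { run: true };
}

console.log(claudeGate({ ANTHROPIC_API_KEY: 'sk-ant-example' }));
// { run: false, skipReason: 'ICO_BENCH_INCLUDE_CLAUDE not set' }
console.log(claudeGate({ ANTHROPIC_API_KEY: 'sk-ant-example', ICO_BENCH_INCLUDE_CLAUDE: '1' }));
// { run: true }
```

&lt;p&gt;Returning a reason instead of a bare boolean means the closed gate produces something that can be written into the results file.&lt;/p&gt;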

&lt;h2&gt;
  
  
  Skipped is not zero
&lt;/h2&gt;

&lt;p&gt;The interesting design call was what to do when the gate is closed. The wrong answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't run, don't record. Trend tooling then can't tell "we stopped running render" from "render still passes."&lt;/li&gt;
&lt;li&gt;Record a zero. Trend tooling thinks render got infinitely fast and stops alarming.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right answer: record the scenario as &lt;code&gt;skipped: true&lt;/code&gt; with a stable &lt;code&gt;skipReason&lt;/code&gt;. &lt;code&gt;ScenarioRecord&lt;/code&gt; is &lt;code&gt;Partial&amp;lt;CommonTiming&amp;gt;&lt;/code&gt; so the timing fields legitimately don't exist on skipped records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"render"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skipped"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skipReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ICO_BENCH_INCLUDE_CLAUDE not set"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"git_sha"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9c14f02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v22.21.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"linux-x64"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A baseline-comparison script can now answer three different questions instead of two: did this scenario regress, did it improve, or did it not run? Skipped runs stay visible in the JSON timeline. They don't pollute the histogram, but they prove the scenario still exists and the runner saw it.&lt;/p&gt;
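&lt;p&gt;A comparison step along those lines might look like the sketch below. It is an assumption about the shape, not the project's script: the simplified &lt;code&gt;ScenarioRecord&lt;/code&gt;, the &lt;code&gt;medianMs&lt;/code&gt; field, the 10% tolerance, and the extra &lt;code&gt;unchanged&lt;/code&gt; outcome are all hypothetical, and the timings are illustrative.&lt;/p&gt;

```typescript
// Hypothetical record and comparison shapes; medianMs and the 10% tolerance are assumptions.
interface ScenarioRecord {
  name: string;
  skipped?: boolean;
  skipReason?: string;
  medianMs?: number;
}

type Outcome = 'regressed' | 'improved' | 'unchanged' | 'did-not-run';

function compare(baseline: ScenarioRecord, current: ScenarioRecord, tolerance = 0.1): Outcome {
  // A skipped record has no timing fields, so it can never be mistaken for a fast run.
  if (current.skipped || current.medianMs === undefined || baseline.medianMs === undefined) {
    return 'did-not-run';
  }
  const ratio = current.medianMs / baseline.medianMs;
  if (ratio > 1 + tolerance) return 'regressed';
  if (1 - tolerance > ratio) return 'improved';
  return 'unchanged';
}

console.log(compare({ name: 'lint', medianMs: 12 }, { name: 'lint', medianMs: 30 })); // regressed
console.log(compare({ name: 'ingest', medianMs: 9 }, { name: 'ingest', medianMs: 6 })); // improved
console.log(
  compare(
    { name: 'render', medianMs: 800 },
    { name: 'render', skipped: true, skipReason: 'ICO_BENCH_INCLUDE_CLAUDE not set' },
  ),
); // did-not-run
```

&lt;p&gt;Recording a zero instead of a skip would send the last case down the &lt;code&gt;improved&lt;/code&gt; branch, which is the failure mode the record shape is meant to rule out.&lt;/p&gt;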

&lt;h2&gt;
  
  
  The four PRs, briefly
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PR #68&lt;/strong&gt; scaffolded the &lt;code&gt;packages/benchmarks/&lt;/code&gt; workspace, the corpus generator, a &lt;code&gt;bench()&lt;/code&gt; timer with warmup + N-iteration median + RSS delta, and the runner that captures git SHA, Node version, and platform into &lt;code&gt;results/&amp;lt;iso&amp;gt;-&amp;lt;sha&amp;gt;.json&lt;/code&gt;. The &lt;code&gt;results/&lt;/code&gt; directory is gitignored except &lt;code&gt;.gitkeep&lt;/code&gt; — baselines get tracked explicitly, not by accident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #69&lt;/strong&gt; added the &lt;code&gt;lint&lt;/code&gt; scenario and moved &lt;code&gt;runLint&lt;/code&gt;, &lt;code&gt;scanWikiPages&lt;/code&gt;, &lt;code&gt;extractWikilinks&lt;/code&gt;, &lt;code&gt;detectOrphans&lt;/code&gt;, &lt;code&gt;LintResult&lt;/code&gt;, and &lt;code&gt;SchemaError&lt;/code&gt; out of &lt;code&gt;packages/cli/src/commands/lint.ts&lt;/code&gt; into a new &lt;code&gt;packages/compiler/src/lint.ts&lt;/code&gt;. The moved code only composes compiler + kernel primitives and has no CLI dependency, so it belonged in the compiler the whole time. The CLI's lint command shrank to a thin wrapper around commander wiring and &lt;code&gt;renderLintReport&lt;/code&gt;. Side fix: &lt;code&gt;extractWikilinks&lt;/code&gt; had a module-level &lt;code&gt;/g&lt;/code&gt; regex whose &lt;code&gt;lastIndex&lt;/code&gt; carried state between calls, the same class of bug that landed in PR #67 the day before. Fixed by constructing the regex per call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #70&lt;/strong&gt; added the &lt;code&gt;render&lt;/code&gt; scenario and the double-gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #71&lt;/strong&gt; added &lt;code&gt;compile&lt;/code&gt; and &lt;code&gt;ask&lt;/code&gt;, each using the same gating pattern. Roughly 70 lines of additions across both files — the gate had already done the hard work.&lt;/li&gt;
&lt;/ul&gt;
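&lt;p&gt;The &lt;code&gt;lastIndex&lt;/code&gt; side fix in PR #69 is easy to reproduce in isolation. The sketch below is not the project's &lt;code&gt;extractWikilinks&lt;/code&gt;; the &lt;code&gt;[[target]]&lt;/code&gt; link syntax and the pattern are assumptions made for illustration.&lt;/p&gt;

```typescript
// A shared /g regex keeps lastIndex between calls, so repeated calls disagree.
const shared = /\[\[([^\]]+)\]\]/g;
console.log(shared.test('[[Home]]')); // true
console.log(shared.test('[[Home]]')); // false: the search resumed at lastIndex 8

// Hypothetical per-call version: a fresh regex each time carries no state.
function extractWikilinks(text: string): string[] {
  const pattern = /\[\[([^\]]+)\]\]/g;
  return Array.from(text.matchAll(pattern), (m) => m[1] ?? '');
}

console.log(extractWikilinks('See [[Home]] and [[About]]')); // [ 'Home', 'About' ]
console.log(extractWikilinks('See [[Home]] and [[About]]')); // [ 'Home', 'About' ]
```

&lt;p&gt;The failure only shows up on the second call with the same shared regex, which is why a test that calls the function once passes.&lt;/p&gt;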

&lt;h2&gt;
  
  
  Why not the obvious alternatives
&lt;/h2&gt;

&lt;p&gt;Vitest's built-in &lt;code&gt;bench&lt;/code&gt; was considered. It does microbenchmarks well and integrates with the existing test runner. It does not produce the JSON timeline shape needed for cross-run comparison, and bolting that on means owning the storage layer anyway. Build it once, build it right.&lt;/p&gt;

&lt;p&gt;Committing fixture corpora to disk was considered. They go stale, balloon the repo, and encode one author's idea of "moderate." The seeded generator is reproducible AND parameterizable — same determinism guarantee, no committed binary blobs.&lt;/p&gt;

&lt;p&gt;Running Claude scenarios always was considered for about a minute, then rejected on cost grounds. Even with caching, a benchmark suite that costs $2 per run on a busy day stops getting run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the numbers say
&lt;/h2&gt;

&lt;p&gt;Two scenarios were timed on the dev box this afternoon; the Claude-gated ones were recorded as skipped because the opt-in wasn't set (render shown):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Median&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Headroom&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ingest (per-file, 50 sources × 500 words)&lt;/td&gt;
&lt;td&gt;~9 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 s&lt;/td&gt;
&lt;td&gt;220×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lint (50 sources + 30 wiki pages)&lt;/td&gt;
&lt;td&gt;~12 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 30 s&lt;/td&gt;
&lt;td&gt;2400×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;render&lt;/td&gt;
&lt;td&gt;SKIPPED (no opt-in)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;recorded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headroom isn't the point — those targets are deliberately generous because the goal is regression detection, not perf bragging. The point is that there are now numbers to regress &lt;em&gt;against&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also shipped today
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;claude-code-plugins repo audit.&lt;/strong&gt; A 232-line audit landed at &lt;code&gt;266-RA-AUDT-repo-quality-audit-2026-05-17.md&lt;/code&gt; cataloguing a broken &lt;code&gt;/about&lt;/code&gt; route, missing 404 handling, 14 stale &lt;code&gt;MS-OLDV&lt;/code&gt; files still claiming v1.0.0 while the repo is at v4.30.0, and notebook content teaching the old 6-required-fields skill spec when the current spec requires 8. The first commit incorrectly flagged the wiki as empty, because &lt;code&gt;gh api repos/.../wiki&lt;/code&gt; returns 404 even when the wiki has content: that endpoint isn't a content probe, it's a metadata probe with bad error semantics. A follow-up commit cloned the wiki, found 23 pages, and refreshed all of them with current numbers. Lesson noted inline: don't use API existence probes as content probes. Clone and read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;claude-code-slack-channel threat model.&lt;/strong&gt; Added T11 (EchoLeak — instructions exfiltrated via legitimate-looking message replies) and invariant #7: admin verbs are not chat content. An operational key-management doc for the audit-signing key landed alongside the threat model update.&lt;/p&gt;

&lt;h2&gt;
  
  
  The transferable pattern
&lt;/h2&gt;

&lt;p&gt;Five scenarios in the source tree, three actively measured, two gated behind explicit consent. The numbers that get reported are honest because the inputs are reproducible and the skipped runs are visible. Forget the opt-in flag and the gated scenarios show up as &lt;code&gt;skipped&lt;/code&gt; in the JSON. They don't disappear, and they don't pretend to be zero.&lt;/p&gt;

&lt;p&gt;Any benchmark suite that mixes deterministic and paid steps needs all three pieces: a deterministic corpus that survives machine swaps, an opt-in gate strong enough to mean something, and a record shape that distinguishes "didn't run" from "ran fast." Miss one and the suite will quietly lie to you the first time someone forgets which mode they're in. The lie is worse than the gap it filled.&lt;/p&gt;
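
&lt;p&gt;A minimal TypeScript sketch of the third piece, the record shape. The type and field names (&lt;code&gt;ScenarioRecord&lt;/code&gt;, &lt;code&gt;medianMs&lt;/code&gt;, &lt;code&gt;recordGated&lt;/code&gt;) are illustrative assumptions rather than the suite's actual schema, and the render target is a made-up number:&lt;/p&gt;

```typescript
// Hypothetical record shape: a skipped run has no timing fields at all,
// so it can never be mistaken for a very fast run.
type ScenarioRecord =
  | { scenario: string; status: "measured"; medianMs: number; targetMs: number }
  | { scenario: string; status: "skipped"; reason: string };

// A gated scenario without opt-in is recorded as skipped, never as 0 ms.
function recordGated(
  scenario: string,
  optedIn: boolean,
  targetMs: number,
  measure: () => number,
): ScenarioRecord {
  if (!optedIn) {
    return { scenario, status: "skipped", reason: "no opt-in" };
  }
  return { scenario, status: "measured", medianMs: measure(), targetMs };
}

function summarize(record: ScenarioRecord): string {
  if (record.status === "skipped") {
    return `${record.scenario}: SKIPPED (${record.reason})`;
  }
  const headroom = Math.round(record.targetMs / record.medianMs);
  return `${record.scenario}: ${record.medianMs} ms, ${headroom}x headroom`;
}

// The 60000 ms render target is invented for the example.
const render = recordGated("render", false, 60000, () => 0);
const ingest = recordGated("ingest", true, 2000, () => 9);

console.log(summarize(render)); // render: SKIPPED (no opt-in)
console.log(summarize(ingest)); // ingest: 9 ms, 222x headroom
```

&lt;p&gt;Because the two variants share no timing field, a consumer has to check &lt;code&gt;status&lt;/code&gt; before it can read a number, which is what keeps "didn't run" from being averaged in as zero.&lt;/p&gt;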

&lt;h2&gt;
  
  
  Related posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/five-silent-failures-one-day/"&gt;Five Silent Failures in One Day&lt;/a&gt; — the regex &lt;code&gt;lastIndex&lt;/code&gt; bug that re-appeared in PR #69 was one of these.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/deterministic-first-llm-advisory-ci/"&gt;Deterministic-First, LLM-Advisory CI&lt;/a&gt; — same principle: the deterministic gate decides, the paid gate informs.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/transitive-cve-clearance-dual-layer-pattern/"&gt;Transitive CVE Clearance: A Dual-Layer Pattern&lt;/a&gt; — the double-gate is the same shape as that two-layer defense.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>typescript</category>
      <category>testing</category>
      <category>architecture</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Five Silent Failures in One Day</title>
      <dc:creator>Jeremy Longshore</dc:creator>
      <pubDate>Tue, 19 May 2026 13:00:41 +0000</pubDate>
      <link>https://dev.to/jeremy_longshore/five-silent-failures-in-one-day-4n7d</link>
      <guid>https://dev.to/jeremy_longshore/five-silent-failures-in-one-day-4n7d</guid>
      <description>&lt;p&gt;&lt;strong&gt;A silent failure is when a tool reports PASS without doing the work it was supposed to do — the legitimate empty-set case and the broken-but-silent case produce identical output, and nothing downstream can tell them apart.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A green check is not evidence of work. It is evidence that whatever ran did not raise an error. Those are different claims, and on 2026-05-16 the difference surfaced five times in five unrelated systems before lunch.&lt;/p&gt;

&lt;p&gt;The pattern is the same in all five: a tool reported PASS without doing the work it was supposed to do. Not a wrong answer — no answer, dressed up as a correct one. The legitimate empty-set case and the broken-but-silent case produced identical output. CI was green. Reviewers saw nothing to push back on. The signal that something was wrong came from downstream consumers noticing the work was missing.&lt;/p&gt;

&lt;p&gt;The five instances, in the order they were found:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A CI prescreen that ran on zero plugins and called itself green&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;.gitignore&lt;/code&gt; rule that silently dropped plugin configs from every commit&lt;/li&gt;
&lt;li&gt;Prettier that reformatted an 11,000-line catalog and exited 0&lt;/li&gt;
&lt;li&gt;An SSH deploy that succeeded by doing nothing&lt;/li&gt;
&lt;li&gt;A regex that quietly skipped matches because the &lt;code&gt;/g&lt;/code&gt; flag left state behind&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each one shipped past code review. Each one was caught by a downstream user, not by the gate that was supposed to catch it. Each one has now been re-armed with a guard whose job is to assert the work actually happened — not to assert that the command exited zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The prescreen that ran on zero plugins
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;code&gt;claude-code-plugins&lt;/code&gt;, PR #730.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pr-prescreen.yml&lt;/code&gt; workflow's "Compute changed plugin paths" step combined &lt;code&gt;gh api --paginate&lt;/code&gt; with &lt;code&gt;--jq&lt;/code&gt; in a single pipe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Compute changed plugin paths&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;gh api --paginate \&lt;/span&gt;
      &lt;span class="s"&gt;"/repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \&lt;/span&gt;
      &lt;span class="s"&gt;--jq '.[].filename' \&lt;/span&gt;
      &lt;span class="s"&gt;| grep -E '^plugins/[^/]+/' \&lt;/span&gt;
      &lt;span class="s"&gt;| cut -d/ -f1-2 \&lt;/span&gt;
      &lt;span class="s"&gt;| sort -u &amp;gt; changed-plugins.txt || true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works on every local shell. On the GitHub Actions runner, the &lt;code&gt;--paginate&lt;/code&gt; + &lt;code&gt;--jq&lt;/code&gt; combination silently produced empty stdout. No error. No nonzero exit code. Just nothing on the pipe. The downstream &lt;code&gt;grep | cut | sort -u&lt;/code&gt; happily processed zero lines and wrote an empty file. The trailing &lt;code&gt;|| true&lt;/code&gt; swallowed any failure that might have escaped the pipeline.&lt;/p&gt;

&lt;p&gt;The classifier then read &lt;code&gt;changed-plugins.txt&lt;/code&gt;, saw zero entries, and emitted &lt;code&gt;PASS: no plugin paths matched the PR diff&lt;/code&gt;. Two external PRs — #726 and #728, the first contributions through the new pipeline — both landed false PASS verdicts on PRs that obviously added new plugin directories.&lt;/p&gt;

&lt;p&gt;The fix is two changes and a guard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fetch changed files&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;gh api --paginate \&lt;/span&gt;
      &lt;span class="s"&gt;"/repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \&lt;/span&gt;
      &lt;span class="s"&gt;&amp;gt; pr-files.json&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Extract plugin paths&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;jq -r '.[].filename' pr-files.json \&lt;/span&gt;
      &lt;span class="s"&gt;| grep -E '^plugins/[^/]+/' \&lt;/span&gt;
      &lt;span class="s"&gt;| cut -d/ -f1-2 \&lt;/span&gt;
      &lt;span class="s"&gt;| sort -u &amp;gt; changed-plugins.txt&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sanity guard&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;if jq -r '.[].filename' pr-files.json | grep -qE '^plugins/'; then&lt;/span&gt;
      &lt;span class="s"&gt;if [ ! -s changed-plugins.txt ]; then&lt;/span&gt;
        &lt;span class="s"&gt;echo "HARD_BLOCK: PR touches plugins/ but extraction produced zero dirs"&lt;/span&gt;
        &lt;span class="s"&gt;exit 1&lt;/span&gt;
      &lt;span class="s"&gt;fi&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Splitting &lt;code&gt;gh api --paginate&lt;/code&gt; from &lt;code&gt;jq&lt;/code&gt; removes the pipe-buffering interaction that ate stdout. Dropping the blanket &lt;code&gt;|| true&lt;/code&gt; lets real errors propagate. The third step is the actual fix: it asserts that &lt;em&gt;if&lt;/em&gt; the PR diff touched any plugin path, the extraction must have produced at least one row. "I found nothing" becomes "I would have found something — fail loud."&lt;/p&gt;
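
&lt;p&gt;The same decision logic can be sketched as a pure function, which makes the cases easy to unit-test. This is an illustration, not the workflow's code: the &lt;code&gt;RUN_CHECKS&lt;/code&gt; verdict name, the function names, and the sample paths are hypothetical. It also adds one assumption the shell guard does not make, namely that an empty file list means the fetch failed, since the shell guard sees no &lt;code&gt;plugins/&lt;/code&gt; rows in an empty &lt;code&gt;pr-files.json&lt;/code&gt; and passes:&lt;/p&gt;

```typescript
type Verdict = "PASS" | "RUN_CHECKS" | "HARD_BLOCK";

// Mirrors: grep -E '^plugins/[^/]+/' | cut -d/ -f1-2 | sort -u
function pluginDirs(changedFiles: string[]): string[] {
  const dirs: string[] = [];
  for (const file of changedFiles) {
    if (/^plugins\/[^/]+\//.test(file)) {
      const dir = file.split("/").slice(0, 2).join("/");
      if (dirs.indexOf(dir) === -1) dirs.push(dir);
    }
  }
  return dirs.sort();
}

function classify(changedFiles: string[], extracted: string[]): Verdict {
  // Assumption: an empty file list means the fetch failed, not an empty PR.
  if (changedFiles.length === 0) return "HARD_BLOCK";
  const touchesPlugins = changedFiles.some((f) => /^plugins\//.test(f));
  if (!touchesPlugins) return "PASS";
  // The diff touched plugins/, so extraction must have found something.
  return extracted.length === 0 ? "HARD_BLOCK" : "RUN_CHECKS";
}

// Hypothetical file names, for illustration only.
const sampleFiles = ["plugins/foo/plugin.json", "README.md"];

console.log(classify(sampleFiles, pluginDirs(sampleFiles))); // RUN_CHECKS
console.log(classify(sampleFiles, []));                      // HARD_BLOCK
console.log(classify([], []));                               // HARD_BLOCK
console.log(classify(["README.md"], []));                    // PASS
```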

&lt;h2&gt;
  
  
  2. The gitignore that ate plugin configs
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;code&gt;claude-code-plugins&lt;/code&gt;, PR #733.&lt;/p&gt;

&lt;p&gt;The root &lt;code&gt;.gitignore&lt;/code&gt; contained one line that was never meant to apply globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original intent was dev-local: devs sometimes drop a &lt;code&gt;.mcp.json&lt;/code&gt; at the repo root for personal MCP servers. But a pattern with no slash matches at every directory level, so the rule applied everywhere. Three plugins (&lt;code&gt;slack-channel&lt;/code&gt;, &lt;code&gt;pr-to-spec&lt;/code&gt;, &lt;code&gt;x-bug-triage&lt;/code&gt;) had a &lt;code&gt;.mcp.json&lt;/code&gt; on disk because the mirror sync wrote them, and git silently never tracked any of the three. The mirror produced the file. The working tree showed the file. Plain &lt;code&gt;git status&lt;/code&gt; listed nothing, because ignored files stay hidden unless you pass &lt;code&gt;--ignored&lt;/code&gt;. Nothing red anywhere.&lt;/p&gt;

&lt;p&gt;Plugins without their &lt;code&gt;.mcp.json&lt;/code&gt; fail the MCP handshake at install time. Claude Code can't determine how to spawn the server. The plugin loads, registers nothing, and the user sees commands that do nothing.&lt;/p&gt;

&lt;p&gt;A second silent failure lived in the same PR. The mirror's &lt;code&gt;sources.yaml&lt;/code&gt; listed source files explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;plugins/x-bug-triage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;server.ts&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lib.ts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;server.ts&lt;/code&gt; imports &lt;code&gt;journal.ts&lt;/code&gt;, &lt;code&gt;manifest.ts&lt;/code&gt;, &lt;code&gt;policy.ts&lt;/code&gt;, and &lt;code&gt;supervisor.ts&lt;/code&gt;, none of which were in the allow-list. The mirror shipped a non-functional server, not because anything errored, but because the allow-list silently left out every file it didn't name. No "file not in sources" warning. No diff check. Just a partial build whose imports resolved at type-check time but pointed at files that were missing at runtime.&lt;/p&gt;

&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .gitignore
.mcp.json
!plugins/**/.mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;plugins/x-bug-triage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.ts"&lt;/span&gt;
    &lt;span class="na"&gt;exclude&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.test.ts"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.spec.ts"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The negation rule re-tracks plugin configs. The glob-with-exclude replaces named-file allow-lists with a pattern that can't silently miss a new file. The three affected &lt;code&gt;.mcp.json&lt;/code&gt; files were force-added in the same commit.&lt;/p&gt;
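
&lt;p&gt;Where a named allow-list has to stay, the same class of miss can be caught by comparing what an entry file imports against what the list names. A rough sketch, assuming relative ES-module imports of the form &lt;code&gt;from "./x"&lt;/code&gt;; the function name and the sample import names are hypothetical:&lt;/p&gt;

```typescript
// Returns relative imports found in sourceText that allowList does not name.
function importsMissingFromAllowList(sourceText: string, allowList: string[]): string[] {
  // Local regex: a fresh object per call, so no lastIndex state is shared.
  const importRe = /from\s+["']\.\/([^"']+)["']/g;
  const stripExt = (name: string): string => name.replace(/\.(ts|js)$/, "");
  const allowed = allowList.map(stripExt);
  const missing: string[] = [];
  let match = importRe.exec(sourceText);
  while (match !== null) {
    const target = stripExt(match[1]);
    if (allowed.indexOf(target) === -1) {
      if (missing.indexOf(target) === -1) missing.push(target);
    }
    match = importRe.exec(sourceText);
  }
  return missing;
}

// Illustrative stand-in for server.ts; the imported names are made up.
const serverSource = [
  'import { openJournal } from "./journal";',
  'import { loadManifest } from "./manifest";',
  'import { helper } from "./lib";',
].join("\n");

const notListed = importsMissingFromAllowList(serverSource, ["server.ts", "lib.ts"]);
console.log(notListed); // [ 'journal', 'manifest' ]
```

&lt;p&gt;A mirror step could fail the sync whenever this list is non-empty, turning a silently partial copy into a loud one.&lt;/p&gt;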

&lt;h2&gt;
  
  
  3. Prettier that reformatted 11,000 lines and exited 0
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;code&gt;claude-code-plugins&lt;/code&gt;, PR #730 (same PR as the prescreen failure).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.claude-plugin/marketplace.extended.json&lt;/code&gt; is the canonical plugin catalog — eleven thousand lines, hand-formatted with deliberate multi-line &lt;code&gt;keywords&lt;/code&gt; arrays for git-diff hygiene:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"example-plugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"keywords"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"ci"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"validation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"marketplace"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A contributor's format-on-save action ran prettier across the catalog. Prettier collapsed every keyword array to a single line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"example-plugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"keywords"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ci"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"validation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"marketplace"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JSON was still valid. Prettier exited 0. The &lt;code&gt;validate-plugins.yml&lt;/code&gt; workflow loaded the catalog, parsed it, and ran every entry through the schema: all green. The actual diff was +1 plugin entry, -1,200 lines of reformatted catalog. Every other in-flight PR that touched the catalog would now conflict and could not merge without a rebase and reformat.&lt;/p&gt;

&lt;p&gt;The fix has two parts. First, &lt;code&gt;.prettierignore&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude-plugin/marketplace.extended.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second, an active line-budget guard at &lt;code&gt;scripts/check-catalog-format.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;expected_line_delta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head_catalog&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_catalog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;head_catalog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;base_by_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plugins&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;head_by_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plugins&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="n"&gt;added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;head_by_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_by_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;removed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_by_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;head_by_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;modified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;head_by_name&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;base_by_name&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;head_by_name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;base_by_name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="c1"&gt;# Average plugin block is ~30 lines.
&lt;/span&gt;    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;removed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modified&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

&lt;span class="n"&gt;actual_delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;file_line_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;file_line_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;expected_line_delta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;  &lt;span class="c1"&gt;# slack for inline edits
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actual_delta&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAIL: catalog diff &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actual_delta&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; lines, budget &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guard parses both catalogs structurally, computes the expected line delta from the actual content changes, and rejects PRs where the file delta exceeds that by more than 300 lines. "The file is still valid" becomes "the diff is the size we expected from the work that was claimed."&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The SSH deploy that succeeded by doing nothing
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;code&gt;hustle&lt;/code&gt;, PR #40. Documented in the &lt;code&gt;intentsolutions-vps-runbook&lt;/code&gt; AAR for Phase 2.5 of the VPS migration.&lt;/p&gt;

&lt;p&gt;The new Hustle VPS deploy workflow merged green. The first auto-deploy reported success. The container on the VPS was untouched.&lt;/p&gt;

&lt;p&gt;The canonical reusable VPS deploy workflow is one SSH call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ssh ${{ env.DEPLOY_USER }}@${{ env.DEPLOY_HOST }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no command argument. The whole architecture relies on a &lt;code&gt;command="..."&lt;/code&gt; force-command directive in &lt;code&gt;authorized_keys&lt;/code&gt; to bind the deploy key to a specific script. Connect with the key, the forced command runs, deploy happens, connection closes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;hustle-deploy&lt;/code&gt; user's &lt;code&gt;authorized_keys&lt;/code&gt; had no force-command. Plain &lt;code&gt;ssh user@host&lt;/code&gt; with no command and no force-command opens an interactive session. The runner has no TTY. The session sits idle for a moment, the server times out the silent connection, exit 0. From the runner's perspective: SSH connected, SSH closed cleanly, deploy step SUCCESS. From the VPS's perspective: a key authenticated, nothing happened, the session ended.&lt;/p&gt;

&lt;p&gt;The fix is a deploy script and a force-command lock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# /usr/local/sbin/deploy-hustle&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail
&lt;span class="nb"&gt;cd&lt;/span&gt; /srv/hustle
git fetch origin
git reset &lt;span class="nt"&gt;--hard&lt;/span&gt; origin/main
docker compose pull
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--remove-orphans&lt;/span&gt;
docker compose ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# /home/hustle-deploy/.ssh/authorized_keys
command="/usr/local/sbin/deploy-hustle",no-port-forwarding,no-X11-forwarding,no-pty ssh-ed25519 AAAA... deploy@github
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now there is no path where the SSH channel can do nothing. The forced command runs or the key fails to authenticate. The second deploy ran the script end-to-end, recreated the container, and produced visible log output the runner could grep.&lt;/p&gt;

&lt;p&gt;The generalization matters more than the fix. Every Docker-variant deploy in the fleet that depends on a force-command and doesn't have one is silently broken in the same way. &lt;code&gt;lilly-75-holy&lt;/code&gt; and &lt;code&gt;braves-booth&lt;/code&gt; are flagged for audit; &lt;code&gt;partner-portals&lt;/code&gt; and &lt;code&gt;claude-code-plugins-plus-skills&lt;/code&gt; are safe — both have the force-command directive in place. The fleet sweep is tracked as a follow-up bead off the P7 Stage C epic, not folded into this post.&lt;/p&gt;
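
&lt;p&gt;A fleet audit for this can start as a small check over each deploy user's &lt;code&gt;authorized_keys&lt;/code&gt; lines. The sketch below is an assumption about how such a check might look, not the sweep that is actually planned. The function name is hypothetical, and it assumes no option value contains a key-type token followed by whitespace:&lt;/p&gt;

```typescript
// Returns the forced command from one authorized_keys line, or null if the
// line has none (or is blank, a comment, or has no recognizable key type).
function forcedCommand(line: string): string | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.charAt(0) === "#") return null;

  // Options, when present, come before the key type.
  const keyTypeAt = trimmed.search(/(?:^|\s)(?:ssh-|ecdsa-|sk-)\S*\s/);
  if (keyTypeAt === -1) return null;
  const options = trimmed.slice(0, keyTypeAt);

  const match = /(?:^|,)command="((?:[^"\\]|\\.)*)"/.exec(options);
  return match === null ? null : match[1];
}

const lockedLine =
  'command="/usr/local/sbin/deploy-hustle",no-port-forwarding,no-X11-forwarding,no-pty ssh-ed25519 AAAA... deploy@github';
const openLine = "ssh-ed25519 AAAA... deploy@github";

console.log(forcedCommand(lockedLine)); // /usr/local/sbin/deploy-hustle
console.log(forcedCommand(openLine));   // null
```

&lt;p&gt;Any deploy key for which this returns &lt;code&gt;null&lt;/code&gt; is a candidate for the same succeed-by-doing-nothing failure.&lt;/p&gt;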

&lt;h2&gt;
  
  
  5. The regex that skipped matches because &lt;code&gt;/g&lt;/code&gt; left state behind
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;code&gt;intentional-cognition-os&lt;/code&gt;, PR #67 (a Gemini review followup on E10-B03).&lt;/p&gt;

&lt;p&gt;Two module-level constants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SOURCE_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[\^&lt;/span&gt;&lt;span class="sr"&gt;src:&lt;/span&gt;&lt;span class="se"&gt;([^\]]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;WIKILINK_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[\[([^\]]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\]\]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Used in two back-to-back &lt;code&gt;RegExp.exec&lt;/code&gt; loops to iterate citation markers in a body of text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractCitations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Citation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Citation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SOURCE_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;WIKILINK_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;wikilink&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;RegExp&lt;/code&gt; instances with the &lt;code&gt;/g&lt;/code&gt; flag carry a mutable &lt;code&gt;lastIndex&lt;/code&gt; between calls. The &lt;code&gt;exec&lt;/code&gt; loop is supposed to walk to the end of the string and let the final non-match reset &lt;code&gt;lastIndex&lt;/code&gt; to 0. But any code path that exits the loop early, throws mid-iteration, or interleaves with another use of the same regex object leaves &lt;code&gt;lastIndex&lt;/code&gt; mid-string. The next call to &lt;code&gt;extractCitations&lt;/code&gt; starts searching from wherever the last one stopped.&lt;/p&gt;

&lt;p&gt;The citation handler kept reporting "verified" because the missed citations were not checked at all — not flagged as missing, not flagged as wrong. They were invisible. Whichever entries fell before the carried-over &lt;code&gt;lastIndex&lt;/code&gt; were skipped silently, every time.&lt;/p&gt;
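
&lt;p&gt;A minimal sketch of the carry-over, assuming a made-up pattern: &lt;code&gt;DEMO_RE&lt;/code&gt;, &lt;code&gt;firstId&lt;/code&gt;, and &lt;code&gt;allIds&lt;/code&gt; are hypothetical stand-ins, not the real handler code. One helper exits after the first match, and the next caller inherits its position.&lt;/p&gt;

```typescript
// Hypothetical pattern for illustration; the real SOURCE_RE is not shown in the article.
const DEMO_RE = /\[source:(\w+)\]/g;

// Returns the first match only. Exiting after one exec() call leaves
// DEMO_RE.lastIndex pointing just past that match.
function firstId(text: string): string | null {
  const m = DEMO_RE.exec(text);
  return m === null ? null : m[1];
}

// Collects every match, but trusts that lastIndex is already 0.
function allIds(text: string): string[] {
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = DEMO_RE.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

const sampleText = "[source:a] text [source:b] text [source:c]";

firstId(sampleText);                    // consumes "[source:a]"
const staleIndex = DEMO_RE.lastIndex;   // 10, not 0
const skipped = allIds(sampleText);     // starts at index 10: ["b", "c"]

DEMO_RE.lastIndex = 0;                  // the reset the fix relies on
const complete = allIds(sampleText);    // ["a", "b", "c"]

console.log(staleIndex, skipped, complete);
```

&lt;p&gt;The second call raises no error and returns a plausible-looking array. The only evidence of the skip is the missing first entry.&lt;/p&gt;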

&lt;p&gt;The fix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractCitations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Citation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Required: SOURCE_RE and WIKILINK_RE are module-level /g regexes.&lt;/span&gt;
  &lt;span class="c1"&gt;// Reset lastIndex on entry so prior loop state cannot cause this call&lt;/span&gt;
  &lt;span class="c1"&gt;// to start mid-string and silently skip matches.&lt;/span&gt;
  &lt;span class="nx"&gt;SOURCE_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;WIKILINK_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Citation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SOURCE_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;WIKILINK_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;wikilink&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The comment is load-bearing. Without it, the next refactor pulls the resets out as "redundant" and the silent skip comes back. Six regression tests pin the invariant: prebuilt-index honored, batch aggregation correct, 100 sequential calls return identical output, two interleaved bodies (one long, one short) stay independent of each other.&lt;/p&gt;
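
&lt;p&gt;The actual test suite is not shown here, so the following is only a sketch of what two of those checks might look like. The regex patterns, the &lt;code&gt;Citation&lt;/code&gt; shape, and the &lt;code&gt;check&lt;/code&gt; helper are assumptions made for illustration. The third check deliberately poisons &lt;code&gt;lastIndex&lt;/code&gt; before calling, which is one way to make the test fail if the resets are ever removed.&lt;/p&gt;

```typescript
// Hypothetical stand-ins: the real patterns and Citation type are not shown in the article.
type Citation = { kind: "source" | "wikilink"; id: string };
const SOURCE_RE = /\[source:([\w-]+)\]/g;
const WIKILINK_RE = /\[\[([^\]]+)\]\]/g;

function extractCitations(body: string): Citation[] {
  // Reset shared state on entry, as in the fix above.
  SOURCE_RE.lastIndex = 0;
  WIKILINK_RE.lastIndex = 0;
  const out: Citation[] = [];
  let m: RegExpExecArray | null;
  while ((m = SOURCE_RE.exec(body)) !== null) out.push({ kind: "source", id: m[1] });
  while ((m = WIKILINK_RE.exec(body)) !== null) out.push({ kind: "wikilink", id: m[1] });
  return out;
}

// Tiny assertion helper so the sketch needs no test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const longBody = "[source:a] and [[Page One]] then [source:b] and [[Page Two]]";
const shortBody = "[source:z]";

// 1. Sequential calls on the same body return identical output.
const longBaseline = JSON.stringify(extractCitations(longBody));
for (let i = 0; i !== 100; i += 1) {
  check(JSON.stringify(extractCitations(longBody)) === longBaseline, `call ${i} drifted`);
}

// 2. Interleaved long and short bodies stay independent of each other.
const shortBaseline = JSON.stringify(extractCitations(shortBody));
check(JSON.stringify(extractCitations(longBody)) === longBaseline, "long body changed");
check(JSON.stringify(extractCitations(shortBody)) === shortBaseline, "short body changed");

// 3. Simulate leftover state from an interrupted loop, then call again.
SOURCE_RE.lastIndex = longBody.length;
WIKILINK_RE.lastIndex = longBody.length;
check(JSON.stringify(extractCitations(longBody)) === longBaseline, "stale lastIndex leaked in");
```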

&lt;h2&gt;
  
  
  The shape of a silent failure
&lt;/h2&gt;

&lt;p&gt;All five share the same anatomy. There exists a legitimate no-op outcome — no plugin paths matched, no files to include, no formatting changes needed, no command to run, no remaining matches in the string. The error path produces an observable state identical to the legitimate no-op. The downstream consumer cannot tell which one it got.&lt;/p&gt;
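
&lt;p&gt;That anatomy fits in a few lines. This is a hypothetical step, not code from any of the five incidents: &lt;code&gt;collectPluginPaths&lt;/code&gt; and its inputs are invented to show a swallowed error and a genuine no-op producing the same return value.&lt;/p&gt;

```typescript
// Hypothetical step: the name and behavior are illustrative only.
function collectPluginPaths(readChangedFiles: () => string[]): string[] {
  try {
    return readChangedFiles().filter((path) => path.startsWith("plugins/"));
  } catch {
    // Swallowed error: the caller sees the same value as "nothing matched".
    return [];
  }
}

// Legitimate no-op: files changed, none of them under plugins/.
const nothingMatched = collectPluginPaths(() => ["README.md", "docs/intro.md"]);

// Failure: the file listing itself could not be produced.
const readFailed = collectPluginPaths(() => {
  throw new Error("diff command unavailable");
});

// Both results are empty arrays, so a consumer that only inspects the
// return value cannot tell the no-op from the failure.
const indistinguishable = JSON.stringify(nothingMatched) === JSON.stringify(readFailed);
console.log(indistinguishable); // true
```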

&lt;p&gt;The fixes are not better error handling. The fixes are active assertions about the work that was claimed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;prescreen:&lt;/strong&gt; if files matched the trigger, the extraction must have produced rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gitignore + allow-list:&lt;/strong&gt; plugin configs must reach the tree, not just the working directory — and source allow-lists must fail on missing imports, not silently ship a partial build&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;prettier:&lt;/strong&gt; the diff size must match the structural work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSH deploy:&lt;/strong&gt; bind the command to the key — make it impossible for the channel to do nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;regex:&lt;/strong&gt; reset state to a known precondition before every call, and pin that contract with a test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common verb in every fix is &lt;em&gt;assert&lt;/em&gt;, not &lt;em&gt;handle&lt;/em&gt;. The bug was not that errors weren't caught. The bug was that there was no point in the pipeline where the system stated, in code, what counted as the work actually being done.&lt;/p&gt;
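
&lt;p&gt;As a rough sketch of what asserting the work can look like for the prescreen case: the &lt;code&gt;PrescreenReport&lt;/code&gt; shape and &lt;code&gt;assertPrescreenDidWork&lt;/code&gt; are hypothetical names, and the real check may be structured differently. The point is that the legitimate no-op and the silent failure are told apart by comparing what was claimed (files matched) against what was produced (rows).&lt;/p&gt;

```typescript
// Hypothetical shape: field names are assumptions, not the real prescreen output.
type PrescreenReport = { matchedFiles: string[]; extractedRows: number };

function assertPrescreenDidWork(report: PrescreenReport): void {
  // Legitimate no-op: nothing matched the trigger, so zero rows is expected.
  if (report.matchedFiles.length === 0) return;

  // Files matched, so zero rows means the extraction silently did nothing.
  if (report.extractedRows === 0) {
    throw new Error(
      `prescreen matched ${report.matchedFiles.length} file(s) but extracted 0 rows`
    );
  }
}

// A real no-op and a real success both pass.
assertPrescreenDidWork({ matchedFiles: [], extractedRows: 0 });
assertPrescreenDidWork({ matchedFiles: ["plugins/a.json"], extractedRows: 3 });

// The silent failure is the only case that stops the pipeline.
let caught = "";
try {
  assertPrescreenDidWork({ matchedFiles: ["plugins/a.json"], extractedRows: 0 });
} catch (err) {
  caught = err instanceof Error ? err.message : String(err);
}
console.log(caught); // prescreen matched 1 file(s) but extracted 0 rows
```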

&lt;p&gt;The hardest silent failures to catch are the ones where the tool's success state and its silent-failure state are observationally identical. That is the category. Once you start auditing for it, more keep surfacing: many CI pipelines have at least one step that exits 0 whether or not it did anything, and such steps often sit downstream of a step that &lt;em&gt;can&lt;/em&gt; legitimately produce empty output.&lt;/p&gt;

&lt;p&gt;Silent failures don't get worse over time. They get more confident. Each green check trains the audit instinct to skip them, and the audit instinct is the only thing standing between the build status and the truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/deterministic-first-llm-advisory-ci/"&gt;Deterministic-first, LLM-advisory CI&lt;/a&gt; — the broader argument for keeping reject/accept decisions in code that can be reasoned about, with model output as advisory signal&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/three-guards-against-shipping-slop/"&gt;Three guards against shipping slop&lt;/a&gt; — earlier examples of the same assert-the-work pattern in plugin merges&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/posts/two-false-positive-fixes-same-root-cause/"&gt;Two false-positive fixes, same root cause&lt;/a&gt; — when two unrelated bugs share an underlying shape&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cicd</category>
      <category>debugging</category>
      <category>devops</category>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
