<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jun0</title>
    <description>The latest articles on DEV Community by Jun0 (@jun0-ds).</description>
    <link>https://dev.to/jun0-ds</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850354%2Fb0c4c5b1-dd46-426f-8ca0-8a3aa4354132.png</url>
      <title>DEV Community: Jun0</title>
      <link>https://dev.to/jun0-ds</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jun0-ds"/>
    <language>en</language>
    <item>
      <title>GPT-4 said strawberry has two R's. The word has three.</title>
      <dc:creator>Jun0</dc:creator>
      <pubDate>Thu, 07 May 2026 11:17:54 +0000</pubDate>
      <link>https://dev.to/jun0-ds/gpt-4-said-strawberry-has-two-rs-the-word-has-three-1m7b</link>
      <guid>https://dev.to/jun0-ds/gpt-4-said-strawberry-has-two-rs-the-word-has-three-1m7b</guid>
      <description>&lt;h2&gt;
  
  
  "How many R's are in 'strawberry'?"
&lt;/h2&gt;

&lt;p&gt;By 2024 every developer had seen the screenshot. GPT-4 confidently insisting &lt;code&gt;strawberry&lt;/code&gt; has two R's. The word has three. The fix eventually landed — but for a moment it captured something cleaner than any benchmark: a thing a human does in half a second, that the model gets confidently wrong.&lt;/p&gt;

&lt;p&gt;That's the picture most people have when they hear "hallucination." sonmat v0.8.0 (April 11, 2026) dealt with hallucinations. Just not that kind.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the 7% actually was
&lt;/h2&gt;

&lt;p&gt;The trigger was a 2,700-question wiki QA evaluation on a 24B model. Hallucination rate: 7%. Looking at the number you'd shrug — "yeah, LLMs hallucinate, that's life." But once I went through the actual flagged responses one by one, the picture was different.&lt;/p&gt;

&lt;p&gt;Strawberry-style cases — the model fabricating something that wasn't in its training distribution — were a minority. What showed up more often was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Facility management is in table A."&lt;br&gt;
Reality: it's in table B.&lt;br&gt;
Model dutifully searched table A.&lt;br&gt;
Found nothing, got confused, ended up extrapolating something plausible.&lt;br&gt;
This response landed in the 7% bucket.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is this hallucination? From the user's seat, yes. &lt;strong&gt;The answer was wrong, that's all that matters.&lt;/strong&gt; But put a human in the same situation and the result is the same. An intern handed a wrong manual, sent off to find the facilities lead, comes back with a confused report. The model isn't broken. The input was.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two sources got tangled together
&lt;/h2&gt;

&lt;p&gt;Here's where I had to draw a line. The user experiences hallucination as &lt;strong&gt;one event&lt;/strong&gt;, but its source splits in two.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Where it starts&lt;/th&gt;
&lt;th&gt;Treatment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model-side&lt;/td&gt;
&lt;td&gt;Plausible combinations get assembled inside the weights (the strawberry case)&lt;/td&gt;
&lt;td&gt;Model researcher territory. Has to be fixed at the weights level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context-side&lt;/td&gt;
&lt;td&gt;The input was wrong; the model dutifully followed&lt;/td&gt;
&lt;td&gt;Doubt the input. System designer territory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The literature isn't unanimous either. Under faithfulness (does the output stay loyal to the input?), the context-side case is "loyal, so not a hallucination." Under factuality (does the output match reality?), it's "wrong, so yes, a hallucination." Ji et al.'s NLG hallucination survey (2023) splits intrinsic vs. extrinsic — and the wrong-manual case fits neither cleanly. Input-faithful and reality-unfaithful at the same time.&lt;/p&gt;

&lt;p&gt;The reason researchers can't agree is simple: from where the user sits, both look like the same event. "The AI was wrong." The split only matters if you're building tools — because &lt;strong&gt;different sources need different treatments&lt;/strong&gt;. Model-side, we can't touch. Context-side, we can.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strawberry isn't a one-off
&lt;/h3&gt;

&lt;p&gt;The same model-side pattern shows up wherever an LLM lands inside a rule-bound environment. Ask one to play chess and watch it confidently slide a rook diagonally, or move through another piece. The rule violation is obvious to any player. The model has no world model — just a learned distribution of plausible-looking continuations.&lt;/p&gt;

&lt;p&gt;Every code agent inherits this risk. It'll eventually do the equivalent of sliding a rook diagonally, with full confidence, and you won't catch it unless you're looking. Strawberry was a single screenshot. The pattern is structural.&lt;/p&gt;

&lt;p&gt;Model-side hallucination isn't sonmat's territory. v0.8 only dealt with the side we &lt;em&gt;can&lt;/em&gt; touch.&lt;/p&gt;




&lt;h2&gt;
  
  
  I was stuck in this frame for a while
&lt;/h2&gt;

&lt;p&gt;The split looks self-evident written down. It wasn't, for me. I spent a long stretch nodding along with "hallucination = model problem" — figuring sonmat could add all the doubt tools it wanted, none of them would touch the statistical combinations made inside the weights. I'd parked hallucination as &lt;strong&gt;out of scope for sonmat&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The 7% breakdown was what cracked that. The frame wasn't wrong, the scope was just way too tight. I was building a tool that says &lt;em&gt;doubt the context you're given&lt;/em&gt;, while standing on a piece of context I'd never doubted. Embarrassing place to be — but that's where v0.8 actually started.&lt;/p&gt;

&lt;p&gt;Both of the changes that followed were that one realization, pushed into discipline (reasoning rules) and into a skill (an action tool) at the same time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Six places in core
&lt;/h2&gt;

&lt;p&gt;I touched the discipline file first. &lt;code&gt;discipline/core.md&lt;/code&gt; is sonmat's short prescription for &lt;em&gt;how Claude should think&lt;/em&gt;. Before v0.8, the doubt was almost entirely turned &lt;strong&gt;inward&lt;/strong&gt;: "are my assumptions actually solid? Am I jumping to a conclusion?" That kind of question.&lt;/p&gt;

&lt;p&gt;v0.8 widened the doubt by one notch. Not just your reasoning — &lt;strong&gt;the context you received&lt;/strong&gt; is suspect too. Same line, planted in six places.&lt;/p&gt;

&lt;p&gt;The received context can be broken in three flavors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;incomplete&lt;/strong&gt; — left unsaid&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;imprecise&lt;/strong&gt; — said loosely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;incorrect&lt;/strong&gt; — said wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three coexist. Fixate on one and the others slip past. One-beat pause, for example, picked up this addition in v0.8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; ### One-beat pause
 Before agreeing with anything — is there something worth doubting here?
 If the question even crosses your mind, that's the signal. Check before you nod.
&lt;span class="gi"&gt;+This includes the context itself — it may be incomplete (left unsaid),
+imprecise (said loosely), or incorrect (said wrong).
+All three coexist; don't fixate on one.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern landed in Strip to essentials, Predict before acting, Ground it, Pace it, and Weight it. Weight it got an extra line on top — split the &lt;em&gt;source&lt;/em&gt; of your confidence: verified fact / user statement / inference / guess. Not "I'm 80% sure" but "I'm 80% sure based on a user statement, which is not the same as a verified fact."&lt;/p&gt;
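
&lt;p&gt;One way to picture the Weight it addition is to make the source a required field next to the number. The sketch below is illustrative Python, not sonmat's code (sonmat's discipline is prose in &lt;code&gt;discipline/core.md&lt;/code&gt;). The names &lt;code&gt;Source&lt;/code&gt;, &lt;code&gt;Claim&lt;/code&gt;, and &lt;code&gt;needs_check&lt;/code&gt;, and the rule inside &lt;code&gt;needs_check&lt;/code&gt;, are assumptions made up for the example.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # The four sources of confidence named in the post.
    VERIFIED_FACT = "verified fact"
    USER_STATEMENT = "user statement"
    INFERENCE = "inference"
    GUESS = "guess"


@dataclass(frozen=True)
class Claim:
    text: str
    confidence: float  # subjective confidence between 0 and 1
    source: Source

    def describe(self) -> str:
        # Report the number together with where it came from.
        percent = round(self.confidence * 100)
        return f"{self.text} ({percent}% sure; source: {self.source.value})"

    def needs_check(self) -> bool:
        # Assumed rule for this sketch: anything that is not a verified
        # fact deserves a look before acting on it.
        return self.source is not Source.VERIFIED_FACT


claim = Claim("Facility management is in table A", 0.8, Source.USER_STATEMENT)
print(claim.describe())
print(claim.needs_check())
```

&lt;p&gt;The same 80% reads differently once the source sits next to it: a user statement at 80% still calls for a check, while a verified fact would not.&lt;/p&gt;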

&lt;p&gt;A handful of short additions that look tiny. The actual move was widening sonmat's territory of doubt from "inside the model's own reasoning" to "the inputs the model was handed." A tool that only doubts its own reasoning gets dragged along the moment a user says "facility management is in table A" and is wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Same realization, other face — &lt;code&gt;/punch&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;If the core changes were one face of v0.8, the other face was the new &lt;code&gt;/punch&lt;/code&gt; skill in the same release.&lt;/p&gt;

&lt;p&gt;Background: a quantitative pattern from communication-error research. Aviation CRM (Helmreich), surgical teams (Lingard 2004), software engineering (Boehm/Firesmith). Different domains, suspiciously similar splits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error type&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Omission&lt;/td&gt;
&lt;td&gt;40–55%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Imprecision&lt;/td&gt;
&lt;td&gt;20–25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Incorrect&lt;/td&gt;
&lt;td&gt;10–15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context/timing&lt;/td&gt;
&lt;td&gt;10–20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Caveat up front. &lt;strong&gt;This is the human-to-human distribution. There's no direct evidence LLM hallucinations follow the same ratios.&lt;/strong&gt; It's a borrowed assumption, not a measured result. But the qualitative pattern, &lt;strong&gt;omissions vastly outnumber outright errors&lt;/strong&gt;, does seem to track on the LLM side. Models hallucinate by &lt;em&gt;filling in what you didn't say&lt;/em&gt; far more often than by &lt;em&gt;contradicting what you did say&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So the highest-ROI move is to find what's missing. Existing sonmat skills weren't doing that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/guard&lt;/code&gt; — "is this safe?"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/inspect&lt;/code&gt; — "what could break?"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/devil&lt;/code&gt; — "is this reasoning sound?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three inspect &lt;em&gt;what's there&lt;/em&gt;. "What's &lt;em&gt;not&lt;/em&gt; there but should be?" wasn't being asked by anyone. That's the slot &lt;code&gt;/punch&lt;/code&gt; fills.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;guard asks "is this safe?"&lt;br&gt;
inspect asks "what could break?"&lt;br&gt;
devil asks "is this reasoning sound?"&lt;br&gt;
&lt;strong&gt;punch asks "is anything missing?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The name is from a construction punch list. You walk a finished building with the contractor and note every outlet that was on the plan but not in the wall, every door that won't close, every fixture missing entirely. That walk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why punch stands on two legs
&lt;/h2&gt;

&lt;p&gt;Method is short: &lt;strong&gt;reconstruct + domain checklist&lt;/strong&gt;. Two legs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Reconstruct
&lt;/h3&gt;

&lt;p&gt;Code alone doesn't reveal intent. There's always something the user had in their head that never made it into the file, and that's where omission leaks the hardest. So &lt;code&gt;/punch&lt;/code&gt; doesn't analyze unilaterally. It opens a &lt;strong&gt;dialogue&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[punch] Inferred intent from the implementation:
  User stories: [...]
  Contracts: [...]
  Constraints: [...]
  Uncertain: [things I couldn't infer — input needed]
  Anything missing, off, or wrong here?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output at this point isn't a verdict. It's a &lt;strong&gt;checkpoint&lt;/strong&gt;. The valuable round happens when the user replies "oh, forgot that," "that's not what I meant." Aviation challenge-and-response, surgical Time Out, military brief-back — verification traditions across very different fields converge on the same shape. The maker and someone else, immediately after the work, run a quick alignment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Domain checklist
&lt;/h3&gt;

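&lt;p&gt;Reconstruction alone isn't enough. What the user forgot never surfaces in reconstruction, because there is nothing in the code or the conversation to reconstruct it from. (The "missing bathroom" case: if nobody drew the bathroom on the plan, walking the building against the plan will never flag it.) So the second leg is a domain checklist:&lt;/p&gt;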

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Core items&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web app&lt;/td&gt;
&lt;td&gt;Auth/session, input validation, error pages, loading states, responsive, a11y, CORS, rate limiting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Versioning, error format, auth, pagination, timeout, idempotency, docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data pipeline&lt;/td&gt;
&lt;td&gt;Schema validation, null/empty, dedup, retry, monitoring, backfill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;Help, exit codes, stdin/stdout, error messages, config, --dry-run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML/AI&lt;/td&gt;
&lt;td&gt;Baseline, eval, data leakage, latency, fallback on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The checklist won't catch everything. Project-specific requirements aren't on it. But the territory the checklist covers and the territory reconstruction covers are &lt;strong&gt;orthogonal&lt;/strong&gt;. One leg asks "what was specifically intended for this project," the other asks "what does any project in this domain usually need." Run only one and the other half walks out the door.&lt;/p&gt;
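
&lt;p&gt;The two legs can be sketched as two independent gap-finders whose results are reported side by side. This is a minimal illustration, not how &lt;code&gt;/punch&lt;/code&gt; is implemented: the checklist items are copied from the table above, while the function names, the report keys, and the exact-match rule are assumptions for the example.&lt;/p&gt;

```python
# Checklist items are copied from the table above; the function names and the
# matching rule (case-insensitive exact match) are assumptions for illustration.
DOMAIN_CHECKLISTS = {
    "web app": ["auth/session", "input validation", "error pages", "loading states",
                "responsive", "a11y", "CORS", "rate limiting"],
    "api": ["versioning", "error format", "auth", "pagination", "timeout",
            "idempotency", "docs"],
    "data pipeline": ["schema validation", "null/empty", "dedup", "retry",
                      "monitoring", "backfill"],
    "cli": ["help", "exit codes", "stdin/stdout", "error messages", "config",
            "--dry-run"],
    "ml/ai": ["baseline", "eval", "data leakage", "latency", "fallback on failure"],
}


def reconstruction_gaps(inferred, confirmed):
    """Leg 1: inferred intents the user has not confirmed yet."""
    return [item for item in inferred if item not in confirmed]


def checklist_gaps(domain, addressed):
    """Leg 2: items the domain usually needs that nothing in the project addresses."""
    seen = {item.lower() for item in addressed}
    return [item for item in DOMAIN_CHECKLISTS[domain] if item.lower() not in seen]


def punch(domain, inferred, confirmed, addressed):
    """Run both legs; neither list can stand in for the other."""
    return {
        "unconfirmed intent": reconstruction_gaps(inferred, confirmed),
        "domain gaps": checklist_gaps(domain, addressed),
    }


report = punch(
    domain="cli",
    inferred=["exports CSV", "reads config from home directory"],
    confirmed=["exports CSV"],
    addressed=["help", "exit codes", "error messages", "config"],
)
print(report)
```

&lt;p&gt;In this made-up CLI project, the first leg flags an intent the user never confirmed and the second flags &lt;code&gt;stdin/stdout&lt;/code&gt; and &lt;code&gt;--dry-run&lt;/code&gt;, which nobody mentioned at all. Neither leg would have found the other's gaps.&lt;/p&gt;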




&lt;h2&gt;
  
  
  The limit, plainly stated
&lt;/h2&gt;

&lt;p&gt;Where this frame is solid and where it leans on hope, separated honestly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The model-side hallucinations stay.&lt;/strong&gt; Strawberry, chess rooks, the lot. v0.8 doesn't dent them. Model-side comes out of the weights, weights belong to model researchers. sonmat doesn't touch it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The 7% number is one person's single test.&lt;/strong&gt; A 24B model, 2,700 wiki QA questions. No guarantee the same distribution holds on a different model, a different domain, or a different evaluation prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The error-rate table is from human-to-human research.&lt;/strong&gt; Aviation CRM, surgical teams, software engineering retrospectives. No direct evidence LLM hallucinations split into the same ratios. &lt;strong&gt;They look qualitatively similar&lt;/strong&gt; — that's the most I can honestly say.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sources don't always split cleanly.&lt;/strong&gt; A user mumbles half a requirement, the model fills in the rest from its learned distribution, and now context-side and model-side are tangled inside one response. This frame catches half of those at best.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With all that conceded — what did v0.8 actually do? One sentence.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one-line lesson
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It pulled apart two events that had been bundled under the single word "hallucination,"&lt;/strong&gt; and started treating each one according to where it actually started. One source (model-side) we can't fix. The other (context-side) we can. The fix split in two: one addition planted in six places in &lt;code&gt;discipline/core.md&lt;/code&gt;, extending doubt outward to the input context, and a new tool, &lt;code&gt;/punch&lt;/code&gt;, that goes looking for what's missing.&lt;/p&gt;

&lt;p&gt;The same realization landed in discipline (rules of reasoning) and skill (an action tool) at once. Not coincidence: two faces of one finding. v0.8 didn't solve hallucination. It separated the events that had been &lt;em&gt;miscategorized as&lt;/em&gt; hallucination from the rest, and started treating them on their own terms.&lt;/p&gt;

&lt;p&gt;Move the direction of doubt one notch outward — from your own reasoning to the context you were handed. That was sonmat's step.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Release notes&lt;/em&gt;: &lt;a href="https://github.com/jun0-ds/sonmat/releases/tag/v0.8.0" rel="noopener noreferrer"&gt;v0.8.0&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Repo&lt;/em&gt;: &lt;a href="https://github.com/jun0-ds/sonmat" rel="noopener noreferrer"&gt;https://github.com/jun0-ds/sonmat&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/jun0-ds" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/junyoung-ryu-422501117/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>My devil's advocate worked. That was the bug.</title>
      <dc:creator>Jun0</dc:creator>
      <pubDate>Tue, 05 May 2026 11:41:18 +0000</pubDate>
      <link>https://dev.to/jun0-ds/my-devils-advocate-worked-that-was-the-bug-4k3k</link>
      <guid>https://dev.to/jun0-ds/my-devils-advocate-worked-that-was-the-bug-4k3k</guid>
      <description>&lt;h2&gt;
  
  
  Why a single tool got rewritten five times
&lt;/h2&gt;

&lt;p&gt;Inside sonmat there's a slash command called &lt;code&gt;/devil&lt;/code&gt;. When you arrive at a confident conclusion, it sits down and beats on it for a minute. Devil's advocate. A well-known reasoning move. I figured wrapping it in a slash command would be a one-shot job.&lt;/p&gt;

&lt;p&gt;Five rewrites.&lt;/p&gt;

&lt;p&gt;The first cut shipped on April 2nd as &lt;code&gt;v0.6.0&lt;/code&gt;. The last meaningful overhaul shipped on April 26th as &lt;code&gt;v0.11.0&lt;/code&gt;. Twenty-four days, five steps, and at every step the tool taught me one thing it didn't know yet. This post is that arc.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.6 — meet imp
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/jun0-ds/your-ai-is-confident-your-ai-is-wrong-you-shipped-it-anyway-37i7"&gt;the first post in this series&lt;/a&gt; I went after the gap that swallows AI work — the model produces confident nonsense, the human gets persuaded by the confidence, nobody verifies. To close that gap I needed a tool that would force me into self-rebuttal.&lt;/p&gt;

&lt;p&gt;I called it &lt;code&gt;imp&lt;/code&gt;. Little gremlin. I liked the playful name. The description was just "Devil's advocate for reasoning."&lt;/p&gt;

&lt;p&gt;The first design was simple. Restate the user's claim in one line. Attack on three axes — Evidence (cherry-picked? missing data?), Logic (any leaps?), Alternatives (could the same facts support a different conclusion?). Name the cognitive bias in play. Close with a balance table — Strength column, Verdict column.&lt;/p&gt;

&lt;p&gt;It worked. The first two times.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.7 — the name was lying
&lt;/h2&gt;

&lt;p&gt;Third use, I caught myself hesitating. The command was &lt;code&gt;/imp&lt;/code&gt;, but every line of the description called it "devil's advocate." Same tool, two names, in the same document. Every time I went to use it, my brain did a little "wait, was it imp," and that little hitch added up.&lt;/p&gt;

&lt;p&gt;Sounds petty. It isn't. The name &lt;em&gt;is&lt;/em&gt; the tool's identity. If the identity is fuzzy, every invocation costs you a context switch you shouldn't be paying for. Pile up enough of those and you stop calling the tool at all.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;v0.7.0&lt;/code&gt; (April 6th) renamed &lt;code&gt;/imp&lt;/code&gt; → &lt;code&gt;/devil&lt;/code&gt;. Breaking change. The user (still me) had to relearn a command. Did it anyway. A lie you're carrying gets more expensive the longer you carry it.&lt;/p&gt;

&lt;p&gt;That same release also dropped Rhythm Rules (Pace / Weight / Learn) into core. So &lt;code&gt;/devil&lt;/code&gt; stopped being a standalone tool and became one component inside a larger verification system. Pace is "when do I use this," Weight is "how heavy a pass do I need," Learn is "how do I save what I find." &lt;code&gt;/devil&lt;/code&gt; ended up being the thing you reach for when Pace and Weight tell you to.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.7.1 → v0.10 — the table was unreadable
&lt;/h2&gt;

&lt;p&gt;Once it actually got used, the balance table started misbehaving:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Original claim | Counter-argument | Strength | Verdict |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strength of &lt;em&gt;what?&lt;/em&gt; The original claim's strength? The counter-argument's strength? Every single time I read this table, I had to drop back into the body text to figure it out. The column name wasn't carrying the meaning it was supposed to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v0.7.1 (April 10th)&lt;/strong&gt; — pinned the subjects to the labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Original claim | Counter-argument | Counter strength | Claim verdict |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better. But a week later it was annoying me again. "Counter strength" is a noun phrase that's &lt;em&gt;too&lt;/em&gt; compact — it doesn't tell the reader what question the column is answering. The first thing you ask when you read a column header is "what is this column asking me?" The faster the header answers that, the faster the table reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v0.10.0 (April 17th)&lt;/strong&gt; — turned the noun phrases into questions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Original claim | Counter-argument | Counter (strong/moderate/weak) | Claim after challenge |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The row above is abbreviated; the actual header text is "How strong is the counter-argument?" with the answer options shown in parentheses. "Verdict," an abstract noun, got replaced with "Claim after challenge," which names the moment the judgment is made. Much more direct.&lt;/p&gt;

&lt;p&gt;Two passes for what looks like a tiny change. But column headers are the first signal a reader gets when they hit a table. If the first signal is fuzzy, every row underneath turns into noise. The header &lt;em&gt;is&lt;/em&gt; the meaning carrier; the cells are just data riding it. Worth two passes.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.11 — devil started wearing me out
&lt;/h2&gt;

&lt;p&gt;The biggest shift came on April 26th, and here's the awkward part: &lt;strong&gt;the problem was that the tool was working.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/devil&lt;/code&gt; was too good at doubting.&lt;/p&gt;

&lt;p&gt;Every time I made a decision and called &lt;code&gt;/devil&lt;/code&gt;, it would dutifully surface real weaknesses. Not fluff — real ones. Cherry-picking risks. Causal directions I hadn't verified. Alternative hypotheses that explained the same data equally well. All true.&lt;/p&gt;

&lt;p&gt;The catch: &lt;strong&gt;none of it changed what I did next.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Concrete example. I'd ask &lt;code&gt;/devil&lt;/code&gt; about a decision like "which series should I file this post under?" and it would honestly point out that series taxonomies are arbitrary, that readers don't browse by series, that my own category metrics are basically empty. All true. But I still had to put the post somewhere, I could move it a week later if I felt like it, and none of the surfaced weaknesses were going to change my next click. The tool had done real work. The result was churn.&lt;/p&gt;

&lt;p&gt;I gave it a name — &lt;strong&gt;reactive contradiction&lt;/strong&gt;. Real weakness, but tangential. The tool looks smart. The user gets nothing actionable.&lt;/p&gt;

&lt;p&gt;Realizing this was the tool's signature failure mode is what triggered v0.11.&lt;/p&gt;

&lt;p&gt;The fix was a gate. &lt;code&gt;v0.11&lt;/code&gt; added a new section called §2.5: a &lt;strong&gt;project-relevance gate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After CCT (Claim-crux / Counter-fit / Cause-chain — the existing step where devil hunts for the load-bearing assumption), there's now a gate it has to pass before §3 depth drive runs. Three questions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;What it asks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stakes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;If this reasoning is wrong, what does the user actually lose &lt;em&gt;here&lt;/em&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amendment cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Where in the lifecycle is this decision — draft, or shipped operational state?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Next-action delta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;If this counter is surfaced, does the user's next action actually change, or do we just add words?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The three answers produce a verdict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[devil] Project relevance: "{material | load-bearing-but-low-stakes | off-project}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;material&lt;/code&gt; — real stakes. Proceed to §3 depth.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;load-bearing-but-low-stakes&lt;/code&gt; — real weakness, but stakes don't justify a deep grilling. Note it briefly and stop.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;off-project&lt;/code&gt; — the weakness is technically correct but disconnected from the actual decision. Say so and stop.&lt;/li&gt;
&lt;/ul&gt;
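
&lt;p&gt;A rough sketch of how three answers could collapse into one of the three verdicts. The post does not spell out the mapping, so the decision order below is an assumption, as are the names &lt;code&gt;project_relevance&lt;/code&gt;, &lt;code&gt;report_line&lt;/code&gt;, and the boolean inputs. In sonmat the gate is a prompt section, not Python.&lt;/p&gt;

```python
from enum import Enum


class Verdict(Enum):
    MATERIAL = "material"
    LOW_STAKES = "load-bearing-but-low-stakes"
    OFF_PROJECT = "off-project"


def project_relevance(high_stakes: bool, costly_to_amend: bool,
                      changes_next_action: bool) -> Verdict:
    # Assumed mapping, not taken from sonmat:
    # 1. If surfacing the counter would not change the next action,
    #    the challenge missed the actual decision.
    if not changes_next_action:
        return Verdict.OFF_PROJECT
    # 2. Real stakes, or a decision that is expensive to undo, earns depth.
    if high_stakes or costly_to_amend:
        return Verdict.MATERIAL
    # 3. Otherwise the weakness is real but a brief note is enough.
    return Verdict.LOW_STAKES


def runs_depth_drive(verdict: Verdict) -> bool:
    # Only a material verdict proceeds to the deeper pass.
    return verdict is Verdict.MATERIAL


def report_line(verdict: Verdict) -> str:
    return f'[devil] Project relevance: "{verdict.value}"'


# The "which series do I file this post under?" decision from above:
# low stakes, cheap to amend, and the next click does not change.
verdict = project_relevance(high_stakes=False, costly_to_amend=False,
                            changes_next_action=False)
print(report_line(verdict))
print(runs_depth_drive(verdict))
```

&lt;p&gt;Under these assumptions the series question exits as &lt;code&gt;off-project&lt;/code&gt; before any deep pass runs, which is the honest exit the gate was added to provide.&lt;/p&gt;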

&lt;p&gt;That last verdict — &lt;code&gt;off-project&lt;/code&gt; — is the one that mattered. Until v0.11, &lt;code&gt;/devil&lt;/code&gt; could only land on "claim survives" or "claim weakened/flipped." Both of those describe what happens &lt;em&gt;after&lt;/em&gt; a challenge lands. There was no way to express "the challenge itself missed." So &lt;code&gt;/devil&lt;/code&gt; would throw the missed punches anyway. That was the structural cause of reactive contradiction.&lt;/p&gt;

&lt;p&gt;Once &lt;code&gt;off-project&lt;/code&gt; existed, &lt;code&gt;/devil&lt;/code&gt; had the option of an honest exit. &lt;strong&gt;A tool shouldn't stand up when there's no work for it to do.&lt;/strong&gt; That's what the gate is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five rewrites, four lessons
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. A verification tool needs to be verified too.&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;/devil&lt;/code&gt; can't audit itself, so the signal is &lt;strong&gt;user fatigue&lt;/strong&gt;. If after invoking the tool you find yourself sighing instead of nodding, the tool isn't doing its job — no matter how many real weaknesses it found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The name is the tool's identity.&lt;/strong&gt;&lt;br&gt;
v0.7 was about not carrying that imp/devil lie one more day. Every invocation that costs the user a small mental swap is a tax that compounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Column headers are where signal turns into noise.&lt;/strong&gt;&lt;br&gt;
Two passes to get them right. The leap was going from a noun phrase to a question. Tables hold data. Headers hold the question the data is answering. Get the second wrong and the first becomes meaningless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. A busy-looking tool isn't necessarily a useful tool.&lt;/strong&gt;&lt;br&gt;
When &lt;code&gt;/devil&lt;/code&gt; was finding real weaknesses on every call, it &lt;em&gt;looked&lt;/em&gt; like the tool was earning its keep. It wasn't. It was generating output that looked like work but wasn't actually moving anything. The fix wasn't to make &lt;code&gt;/devil&lt;/code&gt; smarter. It was to give it permission to leave.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's already pulling on the next rewrite
&lt;/h2&gt;

&lt;p&gt;The gate landed only days ago, and I can already see cracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user doesn't always state Stakes explicitly. So how does &lt;code&gt;/devil&lt;/code&gt; estimate them? If the estimate's wrong, doesn't the whole gate collapse back into reactive contradiction?&lt;/li&gt;
&lt;li&gt;The boundary between &lt;code&gt;load-bearing-but-low-stakes&lt;/code&gt; and &lt;code&gt;off-project&lt;/code&gt; is sharp on paper but probably blurry in practice. Both end in "stop here," but they mean different things.&lt;/li&gt;
&lt;li&gt;When does &lt;code&gt;/devil&lt;/code&gt; need to self-critique its own verdict? "Your gate verdict is too conservative" is a meta-challenge that doesn't have a home yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five rewrites in, the work isn't done. What goes into the next release will be whatever the user (still me) gets tired of next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add jun0-ds/sonmat
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;sonmat@sonmat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After install, try calling &lt;code&gt;/devil&lt;/code&gt; on any decision you're sitting on. If the result feels like churn, that's the signal — the tool has no real work here. Catching that signal is how you train the tool, and yourself.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/jun0-ds/sonmat" rel="noopener noreferrer"&gt;GitHub: jun0-ds/sonmat&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/jun0-ds" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/junyoung-ryu-422501117/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I built sonmat to fix this. Then sonmat had the same bug.</title>
      <dc:creator>Jun0</dc:creator>
      <pubDate>Sun, 03 May 2026 03:16:19 +0000</pubDate>
      <link>https://dev.to/jun0-ds/i-built-sonmat-to-fix-this-then-sonmat-had-the-same-bug-3n49</link>
      <guid>https://dev.to/jun0-ds/i-built-sonmat-to-fix-this-then-sonmat-had-the-same-bug-3n49</guid>
      <description>&lt;h2&gt;
  
  
  Another confession
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/jun0-ds/your-ai-is-confident-your-ai-is-wrong-you-shipped-it-anyway-37i7"&gt;last post&lt;/a&gt;, I went after the bug that every Claude Code discipline plugin seems to share: the rules live in the main session, the work happens in the workers (subagents), and the rules don't make the trip across. I named names. I quoted the maintainer of &lt;code&gt;superpowers&lt;/code&gt; closing a related issue as "not planned." And then, with a straight face, I claimed that sonmat was different.&lt;/p&gt;

&lt;p&gt;It really wasn't. Not yet, anyway.&lt;/p&gt;

&lt;p&gt;For a while, sonmat had this nicely-crafted hook. Every time you opened Claude Code, it would shove 1,239 characters of discipline into &lt;code&gt;additionalContext&lt;/code&gt; before you even said hello. "MANDATORY. Apply Break it / Cross it / Ground it. Read project memory. Watch for novel traps…" Every session, every time, before the model got a word in.&lt;/p&gt;

&lt;p&gt;I thought this was the strong play. The hook fires before the model speaks, the instruction lands in &lt;code&gt;additionalContext&lt;/code&gt;, the discipline can't be skipped. That was the theory.&lt;/p&gt;

&lt;p&gt;What I didn't notice — for embarrassingly long — was that I was rebuilding, with my own hands, the exact bug I'd just spent a whole post laughing at.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I figured it out
&lt;/h2&gt;

&lt;p&gt;Here's the awkward bit: &lt;code&gt;additionalContext&lt;/code&gt; is delivered to the main session. It is not delivered to subagents.&lt;/p&gt;

&lt;p&gt;So picture what was actually happening. The discipline lived in the place I could see (the main session). It was completely absent from the place where the work actually got done (the workers). The main session would dutifully announce "applying Break / Cross / Ground" — and then dispatch a worker. The worker would receive a clean task with a clean context. No discipline. The worker would shrug and go, "this is simple enough, I don't need tests." The result would come back, the main session would format it confidently (still holding all 1,239 characters of rules), and I'd nod along and approve.&lt;/p&gt;

&lt;p&gt;It was not fine.&lt;/p&gt;

&lt;p&gt;Which is to say: the exact failure mode I'd been mocking in &lt;code&gt;superpowers&lt;/code&gt; and &lt;code&gt;karpathy-skills&lt;/code&gt;? Same mechanism, different label, &lt;strong&gt;mine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Honestly, I only caught it kind of by accident. I'd started spinning up other CLIs for the same kind of work, and something in the output felt off. So I went poking around. Turns out, every CLI handles hooks slightly differently — different contracts, different injection points, sometimes none at all. And while I was wrestling with making the discipline survive outside Claude Code, the thing that should have been obvious &lt;em&gt;inside&lt;/em&gt; Claude Code finally clicked: &lt;strong&gt;a hook that lands in one place but not another isn't a guarantee. It's a happy accident that landed in the main session.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The bug wasn't multi-CLI. The bug was that I'd been calling that happy accident a guardrail.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in v0.4.0
&lt;/h2&gt;

&lt;p&gt;I emptied the hook. &lt;code&gt;additionalContext&lt;/code&gt;: 1,239 → 0.&lt;/p&gt;

&lt;p&gt;The discipline didn't disappear — it just moved. It now lives in &lt;code&gt;CLAUDE.md → discipline/core.md&lt;/code&gt;, the same file the agent already reads as part of its prompt context, and the same file you'd put any other instruction in. Workers spawned by the main agent inherit the same &lt;code&gt;CLAUDE.md&lt;/code&gt; chain. So the rule lands in the same place, every layer.&lt;/p&gt;

&lt;p&gt;The hook still runs. It just sticks to what hooks are good at. Make the &lt;code&gt;.claude/sonmat/&lt;/code&gt; directory. Plant a one-time &lt;code&gt;## sonmat&lt;/code&gt; block in your global &lt;code&gt;CLAUDE.md&lt;/code&gt; so the discipline gets referenced. Check for updates. &lt;strong&gt;Side effects only.&lt;/strong&gt; It doesn't try to shape behavior anymore.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE                                 AFTER
hooks/session-start                    hooks/session-start
  └─ additionalContext: 1,239 chars      └─ side effects only
       "MANDATORY: sonmat..."                 ├─ create .claude/sonmat/
       (delivered to main session             ├─ plant ## sonmat block
        only — workers never saw it)          └─ git pull if outdated

                                       CLAUDE.md → discipline/core.md
                                         (read by main and by every
                                          worker spawned from it.
                                          visible to the user. editable.)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same discipline, different path. The behavior didn't get weaker — it just got honest about where it actually lives.&lt;/p&gt;
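
&lt;p&gt;A toy model of the two delivery paths may make the difference concrete. This is only a sketch: the function names and rule strings are hypothetical, it is not Claude Code's API, and it takes the post's claim that workers read the same &lt;code&gt;CLAUDE.md&lt;/code&gt; chain as an assumption.&lt;/p&gt;

```python
# Toy model only: these names are hypothetical and are not Claude Code's API.
HOOK_CONTEXT = "MANDATORY: apply Break it / Cross it / Ground it."
CLAUDE_MD = "See discipline/core.md: Break it / Cross it / Ground it."


def context_for(layer, hook_context, claude_md):
    """Return the instruction text a given layer starts with."""
    parts = []
    # additionalContext from the session-start hook reaches the main session only.
    if layer == "main" and hook_context:
        parts.append(hook_context)
    # Assumption from the post: the CLAUDE.md chain is read by main and by workers.
    if claude_md:
        parts.append(claude_md)
    return parts


def layers_missing_rules(hook_context, claude_md):
    """List the layers that start with no discipline text at all."""
    return [
        layer
        for layer in ("main", "worker")
        if not context_for(layer, hook_context, claude_md)
    ]


# Before v0.4.0: rules only in the hook, so the worker gets nothing.
print(layers_missing_rules(HOOK_CONTEXT, ""))  # ['worker']
# After v0.4.0: hook emptied, rules in the file, so no layer is left out.
print(layers_missing_rules("", CLAUDE_MD))  # []
```

&lt;p&gt;The point of the model is the first result: the main session looks disciplined, and the worker is the only layer that comes up empty.&lt;/p&gt;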

&lt;h2&gt;
  
  
  Four things I believe now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. A guardrail that doesn't reach the worker is a fake guardrail.&lt;/strong&gt;&lt;br&gt;
If your "mandatory" rule is being delivered through a channel the worker doesn't subscribe to, it isn't mandatory. It's decoration. And the trap is that you can see it sitting in the main session — which is exactly why you stop checking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Visibility &lt;em&gt;is&lt;/em&gt; the contract.&lt;/strong&gt;&lt;br&gt;
A rule sitting in &lt;code&gt;additionalContext&lt;/code&gt; is invisible to the user. You can't read it, can't edit it, can't disagree with it. A rule sitting in &lt;code&gt;CLAUDE.md → core.md&lt;/code&gt; is just there, in the repo. The agent reads it. You read it. You can disagree with it — and that's a &lt;em&gt;good&lt;/em&gt; thing, because that's how drift gets caught before it ships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Hooks are for side effects. They are not for behavior.&lt;/strong&gt;&lt;br&gt;
Make the directory, plant the marker, pull the update. That's the job. The moment a hook starts trying to shape what the agent does, you're betting that the hook fires in every code path the agent will ever take. It doesn't. It can't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. "Strong" enforcement is usually fragile enforcement.&lt;/strong&gt;&lt;br&gt;
The 1,239-character injection felt powerful because it was automatic. But automatic-and-incomplete is worse than manual-and-complete — the user trusts the automation and stops looking. Moving discipline into a file the user can edit (and ignore) sounds weaker. It isn't. It's where the user actually re-enters the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard part
&lt;/h2&gt;

&lt;p&gt;Honestly, emptying the hook felt like giving up control. The hook was the place where I could &lt;em&gt;make sure&lt;/em&gt;. If discipline lives in &lt;code&gt;CLAUDE.md&lt;/code&gt;, the user can edit it, override &lt;code&gt;core.md&lt;/code&gt;, even ignore the whole thing.&lt;/p&gt;

&lt;p&gt;Which, yes, is the entire point.&lt;/p&gt;

&lt;p&gt;A discipline the user can't see is a discipline the user can't trust. A discipline the user can't edit is a rule, not a tool — and sonmat is supposed to be a tool. &lt;strong&gt;Visibility is the price of trust.&lt;/strong&gt; And there's a bonus: the discipline now reaches the workers, because the workers read the same file the user reads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diagnose your own setup
&lt;/h2&gt;

&lt;p&gt;If you're running any Claude Code plugin that promises "guardrails," try asking three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Where does the rule physically live?&lt;/strong&gt; A hook injecting &lt;code&gt;additionalContext&lt;/code&gt;? A skill the model has to remember to invoke? A line in &lt;code&gt;CLAUDE.md&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Who actually reads it?&lt;/strong&gt; Just the main session? Workers too? Subagents spawned from workers?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Can you see it yourself?&lt;/strong&gt; If you can't open a file and read the rule that's supposedly governing your agent, you don't have a guardrail. You have a vibe.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had to put my own plugin through those three questions before the answer became obvious. Doing the diagnosis in the &lt;a href="https://dev.to/jun0-ds/your-ai-is-confident-your-ai-is-wrong-you-shipped-it-anyway-37i7"&gt;first post&lt;/a&gt; was the easy part. Applying it to sonmat itself took a lot longer.&lt;/p&gt;
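
&lt;p&gt;One rough way to answer the first and third questions is to search the plugin's files for the rule text and see where it turns up. The sketch below assumes the rule exists as plain text somewhere on disk; the function name and the search phrase are placeholders, not part of any plugin.&lt;/p&gt;

```python
from pathlib import Path


def files_mentioning(root, phrase):
    """Return sorted relative paths of text files under root that contain phrase."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        if phrase.lower() in text.lower():
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


# Hypothetical usage: files_mentioning(".", "Break it")
```

&lt;p&gt;If the only hit is a hook script, the rule is being injected rather than read. If there are no hits at all, that is the "vibe" case.&lt;/p&gt;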

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add jun0-ds/sonmat
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;sonmat@sonmat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After install, the discipline lives at &lt;code&gt;~/.claude/plugins/marketplaces/sonmat/discipline/core.md&lt;/code&gt;. Open it. Read it. Disagree with parts of it if you want — that's actually how you'll know it's doing something real.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/jun0-ds/sonmat" rel="noopener noreferrer"&gt;GitHub: jun0-ds/sonmat&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/jun0-ds" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/junyoung-ryu-422501117/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your AI is confident. Your AI is wrong. You shipped it anyway.</title>
      <dc:creator>Jun0</dc:creator>
      <pubDate>Fri, 01 May 2026 03:13:10 +0000</pubDate>
      <link>https://dev.to/jun0-ds/your-ai-is-confident-your-ai-is-wrong-you-shipped-it-anyway-37i7</link>
      <guid>https://dev.to/jun0-ds/your-ai-is-confident-your-ai-is-wrong-you-shipped-it-anyway-37i7</guid>
      <description>&lt;h2&gt;
  
  
  A confession
&lt;/h2&gt;

&lt;p&gt;I told Claude to write tests first. Claude said "understood." Then Claude spawned a subagent. The subagent said "this is simple enough, I don't need tests." It shipped. I approved. The tests that didn't exist didn't fail. Everything looked fine.&lt;/p&gt;

&lt;p&gt;It was not fine.&lt;/p&gt;

&lt;p&gt;The fun part: I had three plugins installed specifically to prevent this. They were all working correctly. In the main session. Where the work wasn't happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with being confident
&lt;/h2&gt;

&lt;p&gt;AI agents have a specific failure mode: they sound right even when they're wrong. This is well known. What's less discussed is the other half: &lt;strong&gt;you also stop checking when the output sounds right.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So you have two parties in a conversation. One produces confident nonsense. The other accepts it because confidence is persuasive. Nobody verifies. Errors ship.&lt;/p&gt;

&lt;p&gt;This is not a technology problem. This is a trust problem. And every tool I tried was solving the wrong half of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What every plugin gets right (and then misses)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;superpowers&lt;/a&gt; (175k stars) adds TDD, debugging, and code review. Smart rules. They live in the main session. When Claude spawns a subagent (which is where the actual work happens), the subagent &lt;a href="https://github.com/obra/superpowers/issues/237" rel="noopener noreferrer"&gt;doesn't get them&lt;/a&gt;. The maintainer closed the issue as &lt;code&gt;not planned&lt;/code&gt;: "this is a Claude Code platform limitation. There's not much superpowers can do."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;karpathy-skills&lt;/a&gt; puts principles in CLAUDE.md. Subagents &lt;a href="https://github.com/anthropics/claude-code/issues/22022" rel="noopener noreferrer"&gt;can't reliably read CLAUDE.md&lt;/a&gt;. Sometimes they claim they did. They didn't.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/gsd-build/gsd-2" rel="noopener noreferrer"&gt;GSD&lt;/a&gt; has beautiful structure. Milestones, slices, tasks. Discipline is the user's job. The framework doesn't enforce it at the worker level.&lt;/p&gt;

&lt;p&gt;The pattern: great rules → main session only → workers ignore them → output looks fine → it isn't.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/8395" rel="noopener noreferrer"&gt;Documented&lt;/a&gt;. &lt;a href="https://github.com/anthropics/claude-code/issues/22022" rel="noopener noreferrer"&gt;Repeatedly&lt;/a&gt;. &lt;a href="https://github.com/obra/superpowers/issues/237" rel="noopener noreferrer"&gt;Across projects&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built instead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/jun0-ds/sonmat" rel="noopener noreferrer"&gt;sonmat&lt;/a&gt; (손맛: Korean, literally "hand taste." The cook's touch, like a mother's, that makes the same recipe taste different.)&lt;/p&gt;

&lt;p&gt;It does two things, plus one in return:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Makes the AI doubt itself.&lt;/strong&gt; Verification discipline goes directly into every worker's prompt at dispatch time. Not a file reference. Not a hook that might fire. The actual rules, in the actual prompt. Break it, Cross it, Ground it — on every task, including the ones you don't see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Makes you doubt the AI.&lt;/strong&gt; Every decision surfaces with its reasoning. Not "here's the answer" but "here's the answer, here's why, and here's what I'm not sure about." When you see the reasoning, you can judge. When you only see the answer, you won't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And the AI doubts you back.&lt;/strong&gt; When your instruction is ambiguous or conflicts with what it sees, sonmat doesn't just comply — it asks. The same verification attitude applies in both directions.&lt;/p&gt;

&lt;p&gt;That's the whole thing. Everything else — autonomous loops, escalation levels, domain-specific traps — is implementation detail.&lt;/p&gt;
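
&lt;p&gt;"In the actual prompt" can be pictured as plain string assembly at dispatch time. The sketch below is hypothetical: the function name is made up and the rule text is a placeholder, so it shows the shape of the idea rather than sonmat's implementation.&lt;/p&gt;

```python
# Hypothetical sketch: the names and the rule text are placeholders,
# not sonmat's actual implementation.
DISCIPLINE = "Break it. Cross it. Ground it. (Placeholder for the full rule text.)"


def build_worker_prompt(task, discipline=DISCIPLINE):
    """Inline the rules into the prompt itself instead of pointing at a file."""
    return f"{discipline}\n\nTask: {task}"


print(build_worker_prompt("Add a retry to the upload client"))
```

&lt;p&gt;Because the rules travel inside the string the worker receives, there is no second lookup that can silently fail.&lt;/p&gt;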

&lt;h2&gt;
  
  
  Four things I believe now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Confidence is the worst signal.&lt;/strong&gt;&lt;br&gt;
When the model feels sure, that's exactly when it should look for counterexamples. Confidence without verification is hallucination in a suit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Rules that don't reach workers are decoration.&lt;/strong&gt;&lt;br&gt;
A coding standard that exists only in the main session is a Post-it note on a door nobody walks through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Autonomy without guardrails is just expensive chaos.&lt;/strong&gt;&lt;br&gt;
sonmat escalates automatically — pause, spawn worker, spawn parallel workers — when it hits surprises or repeated failures. You don't babysit. It doesn't run blind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Universal rules are universally mediocre.&lt;/strong&gt;&lt;br&gt;
"Write tests first" is critical for dev, meaningless for data analysis. "One change at a time" is essential for ML, overkill for docs. sonmat loads domain-specific traps. The right advice for the right context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard lesson
&lt;/h2&gt;

&lt;p&gt;I wanted to add more rules. Every edge case screamed for a new rule. I resisted.&lt;/p&gt;

&lt;p&gt;Too few rules: chaos. Too many: the agent spends its time checking boxes instead of working. The answer was a small, hard core — three verification methods — plus domain hints that activate only when relevant.&lt;/p&gt;

&lt;p&gt;The other lesson: &lt;strong&gt;transparency beats enforcement.&lt;/strong&gt; A guard that says "no" gets worked around. A colleague that says "I noticed this — your call" gets listened to. sonmat chose the second approach. For the AI and for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add jun0-ds/sonmat
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;sonmat@sonmat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No config. Start talking.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/jun0-ds/sonmat" rel="noopener noreferrer"&gt;GitHub: jun0-ds/sonmat&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/jun0-ds" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/junyoung-ryu-422501117/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Spent a Week Installing WSL2. The Fix Was Two Lines.</title>
      <dc:creator>Jun0</dc:creator>
      <pubDate>Tue, 31 Mar 2026 01:22:16 +0000</pubDate>
      <link>https://dev.to/jun0-ds/i-spent-a-week-installing-wsl2-the-fix-was-two-lines-hcb</link>
      <guid>https://dev.to/jun0-ds/i-spent-a-week-installing-wsl2-the-fix-was-two-lines-hcb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;"WSL2? Five minutes, tops." — Me, seven days ago.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;On Windows 11 25H2 (build 26200), enabling VirtualMachinePlatform hangs at 37.8%. Forever. The servicing stack is still on 24H2 (build 26100) while the OS has moved to 25H2, so Windows literally cannot service itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Add-WindowsCapability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/enable-feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/featurename:VirtualMachinePlatform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/LimitAccess&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you just want the fix, there it is. If you want to know how it took a week and 10 failed attempts to arrive at two lines of PowerShell, keep reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  Day 1: Innocence
&lt;/h2&gt;

&lt;p&gt;Simple plan. Install WSL2. Set up Ubuntu 24.04. Do actual work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Ubuntu-24.04&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;37.8%.&lt;/p&gt;

&lt;p&gt;I made coffee. Came back. 37.8%.&lt;/p&gt;

&lt;p&gt;I had lunch. Came back. 37.8%.&lt;/p&gt;

&lt;p&gt;37.8 is now my least favorite number.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crime Scene
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Windows 11 Pro 25H2 (Build 26200.8037)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;Intel Core Ultra 9 275HX&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DISM version&lt;/td&gt;
&lt;td&gt;10.0.26100.5074&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Servicing stack&lt;/td&gt;
&lt;td&gt;10.0.26100.8035&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;See it? &lt;strong&gt;OS is build 26200. Servicing stack is build 26100.&lt;/strong&gt; The mechanic has last year's manual for this year's car. If you spotted this, you already know the ending. I did not spot this on Day 1.&lt;/p&gt;
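
&lt;p&gt;The comparison is easy to automate once you know to look for it. A minimal sketch, assuming both versions are available as four-part strings (the OS build from the table is written here as &lt;code&gt;10.0.26200.8037&lt;/code&gt;); the helper names are made up for illustration.&lt;/p&gt;

```python
def build_number(version):
    """Return the build field from a version like '10.0.26200.8037'."""
    return int(version.split(".")[2])


def servicing_mismatch(os_version, servicing_version):
    """True when the OS and the servicing stack report different builds."""
    return build_number(os_version) != build_number(servicing_version)


# OS build and servicing stack version from the table above.
print(servicing_mismatch("10.0.26200.8037", "10.0.26100.8035"))  # True
```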

&lt;h2&gt;
  
  
  Days 2-6: The Parade of Failures
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GUI — Stuck. Cancel also stuck.
&lt;/h3&gt;

&lt;p&gt;"Turn Windows features on/off." Checked the box. Progress bar froze.&lt;/p&gt;

&lt;p&gt;Clicked Cancel. Cancel froze.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A cancel button that cannot be cancelled.&lt;/strong&gt; This is the operating system from the world's most valuable company.&lt;/p&gt;

&lt;p&gt;Task Manager. End Process. Moving on.&lt;/p&gt;

&lt;h3&gt;
  
  
  DISM — 37.8%
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/enable-feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/featurename:VirtualMachinePlatform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/norestart&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;37.8%. We meet again.&lt;/p&gt;

&lt;h3&gt;
  
  
  PowerShell — Same wall, different paint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Enable-WindowsOptionalFeature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FeatureName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VirtualMachinePlatform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-All&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-NoRestart&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calls DISM internally. Same result. Changing the wrapper doesn't change the candy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uninstalled BlueStacks — Wrong suspect
&lt;/h3&gt;

&lt;p&gt;Found forum posts: "Android emulators conflict with Hyper-V." I had BlueStacks 10 installed. Uninstalled it completely. Registry cleanup. Folder purge. The works.&lt;/p&gt;

&lt;p&gt;Result: 37.8%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BlueStacks was innocent.&lt;/strong&gt; And now I have to reinstall it later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offline install from 24H2 ISO — Version mismatch
&lt;/h3&gt;

&lt;p&gt;"If the download is the problem, go offline."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/enable-feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/featurename:VirtualMachinePlatform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/LimitAccess&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/Source:D:\Sources\Install.wim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;0x800f0912&lt;/code&gt;. The ISO is 24H2 (26100), the OS is 25H2 (26200). Windows refuses the source files because they're from "the wrong version." Self-compatibility is apparently optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows Update cache reset — No effect
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;stop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wuauserv&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;stop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Remove-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C:\Windows\SoftwareDistribution\Download\&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Recurse&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wuauserv&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean cache. Same 37.8%. Cleaning the house doesn't fix the plumbing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pending operations cleanup — Partial
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/cleanup-image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/revertpendingactions&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cleared the backlog. Feature still won't activate. Finishing your homework doesn't mean you'll pass the exam.&lt;/p&gt;

&lt;h3&gt;
  
  
  In-place repair install — Didn't downgrade
&lt;/h3&gt;

&lt;p&gt;The nuclear option. Ran the 24H2 ISO's &lt;code&gt;setup.exe&lt;/code&gt; with "Keep personal files and apps."&lt;/p&gt;

&lt;p&gt;40 minutes. Three reboots.&lt;/p&gt;

&lt;p&gt;Result: &lt;strong&gt;OS stayed on 25H2 (26200).&lt;/strong&gt; In-place install doesn't downgrade. But it did clean up the component store. This becomes a crucial plot point later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Day 7: Reading the Logs (Finally Using My Brain)
&lt;/h2&gt;

&lt;p&gt;A week of "try something else until it works." At this point, the question isn't &lt;em&gt;what&lt;/em&gt; doesn't work — it's &lt;em&gt;why&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Select-String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C:\Windows\Logs\CBS\CBS.log&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Pattern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Error|Failed|0x800f"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Select-Object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Last&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The smoking gun:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Failed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;uup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;features&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;WU,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;sessionData:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ModuleID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"FOD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Features"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Windows.HyperV.OptionalFeature.VirtualMachinePlatform.Client.Disabled~"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;HRESULT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="err"&gt;x&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="err"&gt;f&lt;/span&gt;&lt;span class="mi"&gt;0820&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;CBS_E_CANCEL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;download source: 8, download time (secs): 1256, 
download status: 0x800f0820 (CBS_E_CANCEL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;1,256 seconds. Twenty-one minutes waiting for Windows Update to deliver a package it will never find.&lt;/strong&gt;&lt;/p&gt;
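
&lt;p&gt;For anyone checking their own log, the same two facts (how long it waited, and how it ended) can be pulled out with a small parser. This is a sketch that assumes the status line appears on a single line in the format quoted above; the regular expression and function name are mine, not anything Windows ships.&lt;/p&gt;

```python
import re

# The status line quoted from CBS.log above, joined onto one line.
LINE = (
    "download source: 8, download time (secs): 1256, "
    "download status: 0x800f0820 (CBS_E_CANCEL)"
)

PATTERN = re.compile(
    r"download time \(secs\): (\d+), download status: (0x[0-9a-fA-F]+) \((\w+)\)"
)


def parse_download_line(line):
    """Return (seconds, hresult, name), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    seconds, hresult, name = match.groups()
    return int(seconds), hresult, name


seconds, hresult, name = parse_download_line(LINE)
print(f"{name} ({hresult}) after {seconds / 60:.1f} minutes")
# CBS_E_CANCEL (0x800f0820) after 20.9 minutes
```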

&lt;p&gt;From the DISM log, the confession:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dism.exe version: 10.0.26100.5074
Target image: OS Version=10.0.26200.8037
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The servicing stack (26100) is trying to service a newer OS (26200). It goes to Windows Update looking for FOD packages matching this combination. Those packages don't exist in the UUP catalog. So it waits. And waits. And times out at 37.8%.&lt;/p&gt;

&lt;p&gt;A car mechanic with a 2023 catalog trying to order parts for a 2024 model. "This part number doesn't exist in our system, sir."&lt;/p&gt;

&lt;p&gt;This is the servicing architecture of the world's largest software company.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Use the Back Door
&lt;/h2&gt;

&lt;p&gt;DISM can't download the FOD through its usual channel (UUP). But &lt;code&gt;Add-WindowsCapability&lt;/code&gt; uses a &lt;strong&gt;different channel&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Same building. Front door is under construction. Back door is open. The sign only mentions the front door. Classic Windows UX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Back door — install payload via alternative channel&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Add-WindowsCapability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Step 2: Now activate using only local files (no internet needed)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/enable-feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/featurename:VirtualMachinePlatform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/LimitAccess&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;100%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One hundred percent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First three-digit number I've seen all week.&lt;/p&gt;
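
&lt;p&gt;For proof beyond a progress bar, here is a minimal sketch for checking the result. It assumes an elevated PowerShell session and uses the stock &lt;code&gt;Get-WindowsCapability&lt;/code&gt; and &lt;code&gt;Get-WindowsOptionalFeature&lt;/code&gt; cmdlets. The variable name is illustrative, and the states you would hope to see (&lt;code&gt;Installed&lt;/code&gt; for the capability, &lt;code&gt;Enabled&lt;/code&gt; or &lt;code&gt;EnablePending&lt;/code&gt; for the feature) are an assumption, not output captured from the session above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Sketch: confirm both halves of the fix (run as Administrator)
$capName = "Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"

# Step 1 result: is the payload present locally?
Get-WindowsCapability -Online -Name $capName | Select-Object Name, State

# Step 2 result: is the feature switched on (or waiting for a reboot)?
Get-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform |
  Select-Object FeatureName, State
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;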

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Front door — BLOCKED]
DISM enable-feature
  → Needs FOD payload
  → Windows Update UUP channel
  → Servicing stack (26100) ≠ OS (26200)
  → "Part not found in catalog"
  → 21 min timeout → CBS_E_CANCEL

[Back door — OPEN]
Add-WindowsCapability  
  → Different download channel (bypasses UUP)
  → Payload installed locally ✓

DISM + /LimitAccess
  → "Internet? Don't need it."
  → Local files only
  → Success ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both commands use the same Windows servicing system. But they fetch FOD packages through different channels. DISM goes through UUP, where the version mismatch kills it. &lt;code&gt;Add-WindowsCapability&lt;/code&gt; takes a different route. The official docs don't mention this distinction. You're welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  If You Found This Article
&lt;/h2&gt;

&lt;p&gt;You've probably already tried and failed multiple times. That means &lt;strong&gt;pending operations&lt;/strong&gt; are likely piled up. Clean house first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Light cleanup&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/online&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/cleanup-image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/revertpendingactions&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# → Reboot → Run the two-line fix&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# If that's not enough (in-place repair)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# 24H2 ISO → setup.exe → "Keep personal files and apps"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# → Reboot → Run the two-line fix&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
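
&lt;p&gt;To see whether anything is actually queued before reverting, a rough check might look like the sketch below. The registry key and the &lt;code&gt;pending.xml&lt;/code&gt; path are the commonly cited CBS markers for pending servicing work. Treat both as assumptions rather than an official test, and the property names as illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Sketch: look for the usual signs of pending servicing work
$cbsKey = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending"
$pendingXml = Join-Path $env:windir "WinSxS\pending.xml"

[pscustomobject]@{
  RebootPendingKey = Test-Path $cbsKey      # True → CBS has a reboot queued
  PendingXmlExists = Test-Path $pendingXml  # True → operations are waiting to be applied
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If both come back &lt;code&gt;False&lt;/code&gt;, the cleanup step is probably unnecessary and you can go straight to the two-line fix.&lt;/p&gt;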



&lt;h2&gt;
  
  
  Complete Flow (For Fresh Starts)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Enable virtualization in BIOS
   HP laptops: F10 → Security or Configuration → Enable VT-x
        ↓
2. Enable Hyper-V + WSL (these work fine, ironically)
   dism /online /enable-feature /featurename:Microsoft-Hyper-V-All /all /norestart
        ↓
3. Clean pending operations (if you've been trying things)
   dism /online /cleanup-image /revertpendingactions → reboot
        ↓
4. The actual fix
   Add-WindowsCapability -Online -Name "Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"
   dism /online /enable-feature /featurename:VirtualMachinePlatform /all /LimitAccess
        ↓
5. Reboot → Install WSL2
   wsl --install -d Ubuntu-24.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Is This Your Problem?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check version mismatch in DISM log&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Get-Content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C:\Windows\Logs\DISM\dism.log&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Tail&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# "version:" ≠ "image version:" → yes, this is your problem&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Check CBS log for the specific failure&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Select-String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C:\Windows\Logs\CBS\CBS.log&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Pattern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Error|Failed|0x800f"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Select-Object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Last&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# "CBS_E_CANCEL" → yes, this is your problem&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Verify CPU virtualization (prerequisite)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Get-CimInstance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ClassName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Win32_Processor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="n"&gt;Select-Object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VirtualizationFirmwareEnabled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VMMonitorModeExtensions&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
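
&lt;p&gt;The first check can also be done without scrolling through the log. This is a minimal sketch, assuming an English-language DISM whose console banner prints &lt;code&gt;Version:&lt;/code&gt; (the DISM tool side) and &lt;code&gt;Image Version:&lt;/code&gt; (the OS). The variable names are illustrative, the parsing will need adjusting on localized systems, and it will throw if either banner line is missing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Sketch: compare the DISM tool build with the OS image build (run as Administrator)
$banner = dism /online /get-featureinfo /featurename:VirtualMachinePlatform

$toolVer  = ($banner | Select-String -Pattern '^Version:\s*(\S+)').Matches[0].Groups[1].Value
$imageVer = ($banner | Select-String -Pattern '^Image Version:\s*(\S+)').Matches[0].Groups[1].Value

# [version] exposes the third field (26100, 26200, ...) as .Build
if (([version]$toolVer).Build -ne ([version]$imageVer).Build) {
  "Mismatch: tool $toolVer vs image $imageVer"
} else {
  "Builds match: $toolVer"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A mismatch here lines up with the 26100 vs 26200 story above. Matching builds suggest your 37.8% has a different cause.&lt;/p&gt;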



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Symptom&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VirtualMachinePlatform stuck at 37.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Root cause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;25H2 OS (26200) + 24H2 servicing stack (26100) = FOD download mismatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Add-WindowsCapability&lt;/code&gt; to bypass → &lt;code&gt;dism /LimitAccess&lt;/code&gt; to activate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prerequisite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear pending operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One week of evenings. One wrongly accused BlueStacks. A lasting distrust of progress bars.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Lessons
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Read the logs first.&lt;/strong&gt; "Try random things until something works" is the scenic route to nowhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows Insider means signing up for this.&lt;/strong&gt; 25H2 is a preview build. The servicing stack hasn't caught up. Now you know.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same building, multiple doors.&lt;/strong&gt; When DISM fails, &lt;code&gt;Add-WindowsCapability&lt;/code&gt; exists. The docs won't tell you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feed your logs to AI.&lt;/strong&gt; Nobody should read 160,000 lines of CBS log with their own eyes.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;This troubleshooting session was done with &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. It pulled the critical 6 lines from a 160,000-line CBS log and helped identify the &lt;code&gt;Add-WindowsCapability&lt;/code&gt; back door. Without it, this would have ended with a format and reinstall.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>wsl2</category>
      <category>troubleshooting</category>
      <category>virtualization</category>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
