DEV Community: Jun0

I put a witness on my AI. Devil's advocate killed three designs first.

Jun0 — Thu, 21 May 2026 23:50:32 +0000

The short version up front: sonmat v0.9 added an agent called sonmat-witness. It's an external verifier that checks whether the main session's output matches what the user actually asked for — without reading the main session's reasoning.

That's the clean design story. The real story happened while I was designing. In a single session I fell into the same trap three times, and each time a single round of /devil collapsed the assumption underneath the design. This post is about both halves.

Why a witness at all

sonmat started from a single rule: doubt. Whatever you wrote — AI or human — don't take it at face value. Go back and look at it once more, suspiciously. That's the discipline at the root of the whole project.

What v0.9 ran into was the place that rule doesn't quite reach. Self-doubt doesn't work on the self.

People know this one. You can't proofread your own writing — your eyes skim past the typos because your brain knows what was meant. In an OR, the surgeon doesn't read their own Time Out; another team member does. On a flight deck, when the captain calls "FLAPS — TWO," the first officer doesn't just respond "two." They look at the lever and read the position back. Doubt has to come from outside the writer.

So if self-doubt doesn't reach, you need a place outside to do the doubting from. That place is what witness fills.

The name I reached for first was actually hunsugun — a Korean word for the person who watches over your shoulder at a Go board and points out the move you missed. The role fit. But the word doesn't carry into English cleanly, and the series runs in both languages, so I borrowed witness — broader, and the courtroom register turned out to fit the role just as well.

For the witness seat to actually do its job, two things have to be true.

The verifier can't see the executor's reasoning. If it can, it inherits the executor's rationalization and turns into a rubber stamp.
The verifier's input has to be the user's actual words. Not the main session's interpretation of "what the user wanted" — the raw turn the user typed.

Sonmat's existing guard skill solved neither. guard runs inside the main session, sharing its full context. It's fine for operational checks — running tests, blocking sensitive files, enforcing discipline conformance — but useless for "does this output match what was asked," because there's no structural isolation between checker and writer.

So I split witness out. What it actually looks like in code I'll get to in a moment — but first, the design isn't what made this release. What made it was what happened during the design.

Three reversals in one session

While building witness, I fell into the same kind of mistake three times in a row. Each time I'd accept a clever-sounding name or structure and start designing inside that frame. Each time /devil (sonmat's devil's-advocate skill, the subject of post 03) pulled out the load-bearing assumption underneath and showed me it didn't hold. Three flips, same session.

Round 1: "the inverted three-tier architecture"

There was a line in sonmat's memory from a month back:

"inverted three-tier design in progress."

The main session as the human conversationalist, an orchestrator below it, workers below that. I'd been carrying this picture in my head for a month. When witness came up, it slotted naturally on top of the three tiers — the verifier sitting at the orchestrator layer, watching what the workers produced. Clean. Too clean, in retrospect — clean enough that I didn't think to question it.

As a last-pass check, I ran a /devil CCT. CCT is the discovery step inside /devil: find the one load-bearing line of the claim first.

Claim-crux: the architecture rests on the assumption that Claude Code supports nested subagent delegation.
Counter-fit: the whole design was drawn before reading the platform docs. Zero evidence the assumption holds.

That line landed cold. A month-old picture, the whole thing hanging on a single sentence nobody had checked. So I read the docs. Claude Code's multi-agent page:

"Only one level of delegation is supported: the coordinator can call other agents, but those agents cannot call agents of their own."

Explicit ban. Three-tier was structurally impossible on the current harness. The instant "inverted three-tier" got written into a memory file a month earlier, every subsequent design conversation lived inside that frame — and nobody had asked whether the platform supported it.

Two-tier (witness-pair) is the actual ceiling, and I'd already been operating there the whole time. The third tier was vapor.

Round 2: "a PreToolUse hook spawns witness and denies the tool call"

Once the third tier was retracted, I drew the next plan more carefully. Round 1 had collapsed just an hour earlier, so the cautious move was to verify as I went. Here's the plan I came up with:

"In autoloop's commit step, a PreToolUse hook spawns witness as an agent-type hook. If witness returns BLOCK, the hook denies the tool call."

Tempting. Platform-level enforcement — no way for the main session to route around it. With the hook doing the gating, witness becomes a trustworthy chokepoint instead of leaning on discipline.

But — round 1 had also been tempting an hour ago, so the first thing on my mind this round was "is this pipeline actually in the docs?" Back to the official guides.

agent-type hooks exist. Documented.
But there are zero examples of agent hooks fired on PreToolUse with a verdict-based deny. Every documented example is on Stop, used to verify tests pass.
The semantics of "hook spawns a subagent synchronously, waits, populates deny from the verdict" aren't described anywhere.

CCT again.

Claim-crux: that witness needs hook-layer enforcement to be trustworthy.
Counter-fit: every other autoloop phase ([Plan], [Define], [Execute], [Evaluate]) runs on discipline alone. Why is the witness gate special?
Cause-chain: "hook enforcement → trustworthy gating" only holds if the hook semantics actually exist. They don't. The fallback is autoloop discipline — and autoloop already runs that way for everything else.

Flipped. Witness doesn't need a hook. Spawn it inside [Judge] via the Task tool. Task + subagent_type is a well-documented primitive. Autoloop will run witness the same way it runs any other phase, and enforcement is autoloop discipline — the same thing already trusted for every other phase. No new guarantee; no pretending one exists.

Same shape as round 1, though. Attractive name, design inside the frame, no platform check, late collapse. An hour older, I had repeated my hour-younger self's mistake almost exactly.

Round 3: turning the doubt on witness itself

By this point witness looked shippable. Execution isolated. Raw user turn as input. Citation rule in place. Two failures behind me — surely the third one wouldn't show up.

But two failures sitting next to each other started bothering me. Both had the same shape: attractive name → design inside the frame → no platform check → collapse. "Inverted three-tier" had failed it. "PreToolUse hook deny" had failed it. So what about "witness" itself? Had I checked that name?

Last round, then. This time I aimed /devil at witness itself, with the strongest version of the claim I could put on paper:

"witness behaves like the deterministic comparator we designed — it compares, it doesn't reason."

What CCT picked out was different from rounds 1 and 2.

Claim-crux: that writing "do not reason, compare" in the agent prompt makes the LLM actually behave that way.
Counter-fit: witness isn't an agent file. Witness is agent file + input + the model's default tendencies, all stacked. The same class of LLM runs the main session and the witness. The failure mode that makes the main session rationalize during self-check could show up in witness too — and "we wrote a rule" is not structural protection against that.
Cause-chain: "agent rules → strict comparator behavior" runs through instruction-following — which is the exact mechanism witness is supposed to be a check on. Circular.

Pause here. Rounds 1 and 2 had failed for external reasons — the platform didn't support what I'd drawn. Searchable, factual problems with clear fixes. Round 3 was different. The problem was inside witness's design: the enforcement mechanism for witness's behavior is the same mechanism witness is supposed to be a check on. Self-referential. Strong-form claims don't survive that.

But the weak form did.

The strong form — "witness is a deterministic comparator" — got weakened. The weak form — "witness is still better than main self-check" — held. The reason is that the layers separate cleanly.

Layer 1 — execution isolation (harness-enforced) ──── real structural guarantee
Layer 2 — spawn-prompt rules ──────────────────────── aspirational contract
Layer 3 — citation rule ───────────────────────────── aspirational contract

Layer 1 the platform enforces. Even a sloppy witness can't read the main session's rationalization context — there's no path to the confirmation-rubber-stamp failure mode, because the input it would rubber-stamp on isn't reachable. That alone makes witness strictly better than self-check.

Layers 2 and 3 are prompt-level behavioral contracts. The runtime doesn't enforce them. If witness decides internally "this finding is §2 but it feels weak, I'll call it WARN" — nothing structural stops it.

That's what v0.9.1 was for. witness.md's §Isolation stack now separates the three layers explicitly into enforced vs. aspirational, and it tells operators to manually sample verdicts in early use to check whether layers 2-3 actually hold up. Drift gets observed and fed back into the agent file. The scribe already logs witness verdicts to the journal, so the observation channel exists.
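One way to picture that manual sampling, as a rough sketch rather than sonmat's actual tooling: pull a few logged verdicts at random for a human to read, and flag any whose reported verdict disagrees with the check that produced it (the post's mapping is §1/§2/§4 → BLOCK, §3 → WARN). The record shape (`source` and `verdict` keys) and both function names are hypothetical; the real journal format isn't shown here.

```python
import random

# Source-to-verdict mapping as stated in the post.
EXPECTED = {"§1": "BLOCK", "§2": "BLOCK", "§3": "WARN", "§4": "BLOCK"}


def sample_for_review(records, k, seed=None):
    """Pick up to k logged verdicts at random for a human to read."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def drifted(records):
    """Return records whose reported verdict disagrees with the source mapping."""
    return [r for r in records if EXPECTED.get(r["source"]) != r["verdict"]]


# Hypothetical journal entries: the first one softened a §2 finding to WARN.
records = [
    {"source": "§2", "verdict": "WARN"},
    {"source": "§3", "verdict": "WARN"},
]
flagged = drifted(records)
```

A non-empty `flagged` list is the kind of drift that would get fed back into the agent file.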

Witness isn't a perfect verifier. It's a structurally better one than self-check, and the difference is honestly bounded by layer 1. The rest is calibrated against reality as I use it.

What survived

The witness that came out of three breakages looks like this.

A subagent spawned via the Task tool. The 3-tier picture from round 1 doesn't exist; 2-tier is the harness ceiling, and that's where witness sits.
Run inside autoloop's [Judge] phase. Round 2's hook-level enforcement isn't real either, so witness runs on autoloop discipline — same way every other phase runs.
Fed only the raw user turn and the produced artifact. No main-session commentary, no chain-of-thought, no commit messages — and if any of those leak through, witness can't cite them. That's layer 1, harness-enforced.
Required to cite. Every finding has to anchor on user turn N: "exact quote" paired with file:line. A finding without a citation gets discarded by construction. That's layer 3 — a prompt-level contract, aspirational.
Source-based in its verdict. BLOCK vs WARN isn't a vibe call. The verdict is decided by which check produced the finding (§1 intent-scope, §2 intent-content, §3 framing-derived, §4 ground truth). §1/§2/§4 → BLOCK, §3 → WARN. The severity dial got removed entirely.
Three scopes. A single commit (commit gate), all file changes in the session (session forest), and — when the user says something like "across the whole system" or "to every X" — principle coverage, where witness grep-checks that the principle landed everywhere it was supposed to. That last one is the sharpest mode.
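The citation rule and the source-based verdict can be sketched together in a few lines. This is a minimal illustration, not sonmat's implementation: the `Finding` fields, the `decide` function, and the rule that zero cited findings means PASS are assumptions. Only the §1/§2/§4 → BLOCK and §3 → WARN mapping comes from the text, and INSUFFICIENT_GROUND_TRUTH is left out because the post doesn't say when it fires.

```python
from dataclasses import dataclass
from typing import Optional

# Mapping from the post: the check that produced a finding decides its verdict.
SOURCE_VERDICT = {"§1": "BLOCK", "§2": "BLOCK", "§3": "WARN", "§4": "BLOCK"}


@dataclass
class Finding:
    source: str                 # which check produced it: "§1" to "§4"
    user_quote: Optional[str]   # exact quote from a raw user turn
    location: Optional[str]     # "file:line" in the artifact


def decide(findings):
    """Drop uncited findings, then let the source of each survivor pick the verdict."""
    cited = [f for f in findings if f.user_quote and f.location]
    verdicts = {SOURCE_VERDICT[f.source] for f in cited}
    if "BLOCK" in verdicts:
        return "BLOCK"
    if "WARN" in verdicts:
        return "WARN"
    return "PASS"  # assumption: nothing cited means nothing to report
```

Note that there is no severity argument anywhere: a finding cannot be talked down from BLOCK to WARN without changing which check it came from.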

Each broken design left a line behind.

A meta beat — discovery-led depth applied to itself

There's a second thing in this release worth naming. I re-aligned the design of inspect, devil, and punch to all be discovery-led. (The old name was "cascade," which never quite communicated the direction.)

The core idea is simple: depth comes after discovery. You don't decide how deep to go before you have a target; something surfaces, and the surface decides the depth. Chess players run CCT (Checks, Captures, Threats) before calculating any long variation. Surgical teams run the Time Out before incision. Aviation challenge-and-response works because the pilot monitoring (PM) doesn't take the pilot flying's (PF's) word; PM looks at the switch. Different verification traditions, same structural move.

I borrowed CCT for /devil too — Claim-crux, Counter-fit, Cause-chain. Instead of attacking a claim in parallel along four axes, you find the one load-bearing assumption first, classify whether it sits on Evidence, Logic, or Alternatives, and pour depth into that one axis. The other axes get a light pass.
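The asymmetry can be pictured as a depth budget. This is an illustration only: `plan_depth` and the numeric depth values are hypothetical, and only the three axis names come from the text.

```python
AXES = ("Evidence", "Logic", "Alternatives")


def plan_depth(crux_axis, deep=3, light=1):
    """Give the axis holding the load-bearing assumption a deep pass, the rest a light one."""
    if crux_axis not in AXES:
        raise ValueError(f"unknown axis: {crux_axis}")
    return {axis: (deep if axis == crux_axis else light) for axis in AXES}


# Round 1's crux ("nested delegation is supported") had no evidence behind it.
plan = plan_depth("Evidence")
```

The point of the shape is that the axis is chosen before any depth is spent, so a parallel attack on all three never happens.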

The three /devil rounds today were the first live test of this rewrite. If I'd run them as parallel attacks — is witness good? is the three-tier good? is the hook good? — the conclusions wouldn't have landed this cleanly. CCT picked a single load-bearing assumption every time, and pulling on that one thread collapsed the whole structure. The principle worked on the person who designed it.

What v0.9 actually shipped

sonmat-witness — external intent-artifact comparator. Three scope scales (commit / session forest / principle coverage). Source-based verdict (BLOCK / WARN / PASS / INSUFFICIENT_GROUND_TRUTH).
guard / scribe split — guard does pure verification (operational checks + discovery), scribe handles after-the-fact persistence (project rules, novel-trap memory, journal, bridge notes, witness verdict logs).
discovery-led realignment: inspect is trigger-reactive depth, devil uses CCT to find the load-bearing assumption and attack it asymmetrically, punch treats the user's invocation itself as the discovery that opens the mode.
/punch refactor-residue check — finds stale references that survived a structural removal or rename. Seven patterns: section / function / file / terminology / example / enum / template.
Honest framing of the isolation stack — layer 1 (harness-enforced) vs. layer 2-3 (aspirational). No strong-form packaging; the doc names which parts are structural and which parts are hope.
Feature request doc — four platform primitives that would make witness stronger but currently don't exist (input-channel restriction, nested delegation, session layer, documented hook patterns). Filed under docs/feature-requests/claude-code-isolation.md.

v0.9.0 shipped the agent + restructure. v0.9.1 was the honest-framing pass.

The one-line lesson

Before you design on top of a clever-sounding name, verify the platform actually supports it. "Inverted three-tier," "PreToolUse agent hook + deny," "deterministic comparator" — all three sounded shippable. All three either didn't hold or held only weakly when something pulled on the load-bearing piece. CCT picked the right thread every time.

The bigger point: writing a principle into the docs is not the same as the principle holding under pressure. You have to keep an eye on how the principle applies to the person writing it. Today was the first chapter of that observation.

Release notes: v0.9.0, v0.9.1
Repo: https://github.com/jun0-ds/sonmat

GitHub · LinkedIn

GPT-4 said strawberry has two R's. The word has three.

Jun0 — Thu, 07 May 2026 11:17:54 +0000

"How many R's are in 'strawberry'?"

By 2024 every developer had seen the screenshot. GPT-4 confidently insisting strawberry has two R's. The word has three. The fix eventually landed — but for a moment it captured something cleaner than any benchmark: a thing a human does in half a second, that the model gets confidently wrong.

That's the picture most people have when they hear "hallucination." sonmat v0.8.0 (April 11, 2026) dealt with hallucinations. Just not that kind.

What the 7% actually was

The trigger was a 2,700-question wiki QA evaluation on a 24B model. Hallucination rate: 7%. Looking at the number you'd shrug — "yeah, LLMs hallucinate, that's life." But once I went through the actual flagged responses one by one, the picture was different.

Strawberry-style cases — the model fabricating something that wasn't in its training distribution — were a minority. What showed up more often was this:

User: "Facility management is in table A."
Reality: it's in table B.
Model dutifully searched table A.
Found nothing, got confused, ended up extrapolating something plausible.
This response landed in the 7% bucket.

Is this hallucination? From the user's seat, yes. The answer was wrong, and that's all that matters. But put a human in the same situation and the result is the same. An intern handed a wrong manual, sent off to find the facilities lead, comes back with a confused report. The model isn't broken. The input was.

Two sources got tangled together

Here's where I had to draw a line. The user experiences hallucination as one event, but its source splits in two.

Source	Where it starts	Treatment
Model-side	Plausible combinations get assembled inside the weights (the strawberry case)	Model researcher territory. Has to be fixed at the weights level
Context-side	The input was wrong; the model dutifully followed	Doubt the input. System designer territory

The literature isn't unanimous either. Under faithfulness (does the output stay loyal to the input?), the context-side case is "loyal, so not a hallucination." Under factuality (does the output match reality?), it's "wrong, so yes, a hallucination." Ji et al.'s NLG hallucination survey (2023) splits intrinsic vs. extrinsic — and the wrong-manual case fits neither cleanly. Input-faithful and reality-unfaithful at the same time.

The reason researchers can't agree is simple: from where the user sits, both look like the same event. "The AI was wrong." The split only matters if you're building tools — because different sources need different treatments. Model-side, we can't touch. Context-side, we can.

Strawberry isn't a one-off

The same model-side pattern shows up wherever an LLM lands inside a rule-bound environment. Ask one to play chess and watch it confidently slide a rook diagonally, or move through another piece. The rule violation is obvious to any player. The model has no world model — just a learned distribution of plausible-looking continuations.

Every code agent inherits this risk. It'll eventually do the equivalent of sliding a rook diagonally, with full confidence, and you won't catch it unless you're looking. Strawberry was a single screenshot. The pattern is structural.

Model-side hallucination isn't sonmat's territory. v0.8 only dealt with the side we can touch.

I was stuck in this frame for a while

The split looks self-evident written down. It wasn't, for me. I spent a long stretch nodding along with "hallucination = model problem" — figuring sonmat could add all the doubt tools it wanted, none of them would touch the statistical combinations made inside the weights. I'd parked hallucination as out of scope for sonmat.

The 7% breakdown was what cracked that. The frame wasn't wrong, the scope was just way too tight. I was building a tool that says doubt the context you're given, while standing on a piece of context I'd never doubted. Embarrassing place to be — but that's where v0.8 actually started.

Both of the changes that followed were that one realization, pushed into discipline (reasoning rules) and into a skill (an action tool) at the same time.

Six places in core

I touched the discipline file first. discipline/core.md is sonmat's short prescription for how Claude should think. Before v0.8, the doubt was almost entirely turned inward: "are my assumptions actually solid? am I jumping to a conclusion?" That kind of question.

v0.8 widened the doubt by one notch. Not just your reasoning — the context you received is suspect too. Same line, planted in six places.

The received context can be broken in three flavors:

incomplete — left unsaid
imprecise — said loosely
incorrect — said wrong

All three coexist. Fixate on one and the others slip past. One-beat pause, for example, picked this up in v0.8:

 ### One-beat pause
 Before agreeing with anything — is there something worth doubting here?
 If the question even crosses your mind, that's the signal. Check before you nod.
+This includes the context itself — it may be incomplete (left unsaid),
+imprecise (said loosely), or incorrect (said wrong).
+All three coexist; don't fixate on one.

Same pattern landed in Strip to essentials, Predict before acting, Ground it, Pace it, and Weight it. Weight it got an extra line on top — split the source of your confidence: verified fact / user statement / inference / guess. Not "I'm 80% sure" but "I'm 80% sure based on a user statement, which is not the same as a verified fact."
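The Weight it addition amounts to carrying a source tag next to every confidence number. A minimal sketch, assuming a hypothetical `Belief` record (the four source categories are from the text; the class and field names are not):

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    VERIFIED_FACT = "a verified fact"
    USER_STATEMENT = "a user statement"
    INFERENCE = "an inference"
    GUESS = "a guess"


@dataclass
class Belief:
    claim: str
    confidence: float  # 0.0 to 1.0
    source: Source

    def describe(self):
        # Report the number and where it came from, never the number alone.
        return f"{self.confidence:.0%} sure, based on {self.source.value}: {self.claim}"


b = Belief("facility management is in table A", 0.8, Source.USER_STATEMENT)
summary = b.describe()
```

Two beliefs with the same 0.8 now read differently depending on whether they rest on a check or on something the user said.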

A bunch of one-line additions that look tiny. The actual move was widening sonmat's territory of doubt from "inside the model's own reasoning" to "the inputs the model was handed." A tool that only doubts its own reasoning gets dragged the moment a user says "facility management is in table A" and is wrong.

Same realization, other face — `/punch`

If the core changes were one face of v0.8, the other face was the new /punch skill in the same release.

Background: a quantitative pattern from communication-error research. Aviation CRM (Helmreich), surgical teams (Lingard 2004), software engineering (Boehm/Firesmith). Different domains, suspiciously similar splits:

Error type	Share
Omission	40–55%
Imprecision	20–25%
Incorrect	10–15%
Context/timing	10–20%

Caveat up front. This is the human-to-human distribution. There's no direct evidence LLM hallucinations follow the same ratios. Borrowed assumption, not measured result. But the qualitative pattern — omissions vastly outnumber outright wrongs — does seem to track on the LLM side. Models hallucinate by filling in what you didn't say far more often than by contradicting what you did say.

So the highest-ROI move is to find what's missing. Existing sonmat skills weren't doing that:

/guard — "is this safe?"
/inspect — "what could break?"
/devil — "is this reasoning sound?"

All three inspect what's there. "What's not there but should be?" wasn't being asked by anyone. That's the slot /punch fills.

guard asks "is this safe?"
inspect asks "what could break?"
devil asks "is this reasoning sound?"
punch asks "is anything missing?"

The name is from a construction punch list. You walk a finished building with the contractor and note every outlet that was on the plan but not in the wall, every door that won't close, every fixture missing entirely. That walk.

Why punch stands on two legs

Method is short: reconstruct + domain checklist. Two legs.

1. Reconstruct

Code alone doesn't reveal intent. There's always something the user had in their head that never made it into the file, and that's where omission leaks the hardest. So /punch doesn't analyze unilaterally. It opens a dialogue:

[punch] Inferred intent from the implementation:
  User stories: [...]
  Contracts: [...]
  Constraints: [...]
  Uncertain: [things I couldn't infer — input needed]
  Anything missing, off, or wrong here?

Output at this point isn't a verdict. It's a checkpoint. The valuable round happens when the user replies "oh, forgot that," "that's not what I meant." Aviation challenge-and-response, surgical Time Out, military brief-back — verification traditions across very different fields converge on the same shape. The maker and someone else, immediately after the work, run a quick alignment.
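For illustration, the checkpoint can be produced by a small formatter. The layout follows the template shown in the post; the function name, its parameters, and the `; ` separator inside the brackets are assumptions.

```python
def render_checkpoint(user_stories, contracts, constraints, uncertain):
    """Format inferred intent as a question to the user, not as a verdict."""
    def fmt(items):
        return "[" + "; ".join(items) + "]"

    lines = [
        "[punch] Inferred intent from the implementation:",
        f"  User stories: {fmt(user_stories)}",
        f"  Contracts: {fmt(contracts)}",
        f"  Constraints: {fmt(constraints)}",
        f"  Uncertain: {fmt(uncertain)}",
        "  Anything missing, off, or wrong here?",
    ]
    return "\n".join(lines)


# Hypothetical inputs, only to show the shape of the output.
text = render_checkpoint(
    ["operator can look up a facility"], [], ["read-only"], ["who owns table B?"]
)
```

The last line is always a question, which is what keeps the output a checkpoint rather than a conclusion.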

2. Domain checklist

Reconstruction alone isn't enough. The bits the user themselves forgot don't surface in reconstruction. (The "missing bathroom" case.) So the second leg is a domain checklist:

Domain	Core items
Web app	Auth/session, input validation, error pages, loading states, responsive, a11y, CORS, rate limiting
API	Versioning, error format, auth, pagination, timeout, idempotency, docs
Data pipeline	Schema validation, null/empty, dedup, retry, monitoring, backfill
CLI	Help, exit codes, stdin/stdout, error messages, config, --dry-run
ML/AI	Baseline, eval, data leakage, latency, fallback on failure

The checklist won't catch everything. Project-specific requirements aren't on it. But the territory the checklist covers and the territory reconstruction covers are orthogonal. One leg asks "what was specifically intended for this project," the other asks "what does any project in this domain usually need." Run only one and the other half walks out the door.
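The second leg is easy to sketch as a lookup plus a set difference. The core items below are copied from the table; the `missing_items` helper and the idea of passing in a list of already-covered items are hypothetical.

```python
# Core items per domain, copied from the table in the post.
CHECKLISTS = {
    "Web app": ["Auth/session", "input validation", "error pages", "loading states",
                "responsive", "a11y", "CORS", "rate limiting"],
    "API": ["Versioning", "error format", "auth", "pagination", "timeout",
            "idempotency", "docs"],
    "Data pipeline": ["Schema validation", "null/empty", "dedup", "retry",
                      "monitoring", "backfill"],
    "CLI": ["Help", "exit codes", "stdin/stdout", "error messages", "config",
            "--dry-run"],
    "ML/AI": ["Baseline", "eval", "data leakage", "latency", "fallback on failure"],
}


def missing_items(domain, covered):
    """Return checklist items for the domain that are not in the covered list."""
    covered_lower = {c.lower() for c in covered}
    return [item for item in CHECKLISTS[domain] if item.lower() not in covered_lower]


gaps = missing_items("CLI", ["help", "exit codes", "config"])
```

This only catches generic omissions. Anything project-specific still has to come out of the reconstruct dialogue.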

The limit, plainly stated

Where this frame is solid and where it leans on hope, separated honestly:

The model-side hallucinations stay. Strawberry, chess rooks, the lot. v0.8 doesn't dent them. Model-side comes out of the weights, weights belong to model researchers. sonmat doesn't touch it.
The 7% number is one person's one test. A 24B model, 2,700 wiki QA questions. No guarantee the same distribution holds on a different model, a different domain, a different evaluation prompt.
The error-rate table is from human-to-human research. Aviation CRM, surgical teams, software engineering retrospectives. No direct evidence LLM hallucinations split into the same ratios. They look qualitatively similar — that's the most I can honestly say.
Sources don't always split cleanly. A user mumbles half a requirement, the model fills in the rest from its learned distribution, and now context-side and model-side are tangled inside one response. This frame catches half of those at best.

With all that conceded — what did v0.8 actually do? One sentence.

The one-line lesson

It pulled apart two events that had been bundled under the single word "hallucination," and started treating each one according to where it actually started. One source (model-side) we can't fix. The other (context-side) we can. The fix split in two — six lines in discipline/core.md extending doubt outward to the input context, and a new tool, /punch, that goes looking for what's missing.

The same realization landed in discipline (rules of reasoning) and skill (an action tool) at once. Not coincidence: two faces of one finding. v0.8 didn't solve hallucination. It separated the events that had been miscategorized as hallucination from the rest, and started treating them on their own terms.

Move the direction of doubt one notch outward — from your own reasoning to the context you were handed. That was sonmat's step.

Release notes: v0.8.0
Repo: https://github.com/jun0-ds/sonmat


My devil's advocate worked. That was the bug.

Jun0 — Tue, 05 May 2026 11:41:18 +0000

Why a single tool got rewritten five times

Inside sonmat there's a slash command called /devil. When you arrive at a confident conclusion, it sits down and beats on it for a minute. Devil's advocate. A well-known reasoning move. I figured wrapping it in a slash command would be a one-shot job.

Five rewrites.

The first cut shipped on April 2nd as v0.6.0. The last meaningful overhaul shipped on April 26th as v0.11.0. Twenty-four days, five steps, and at every step the tool taught me one thing it didn't know yet. This post is that arc.

v0.6 — meet imp

In the first post in this series I went after the gap that swallows AI work — the model produces confident nonsense, the human gets persuaded by the confidence, nobody verifies. To close that gap I needed a tool that would force me into self-rebuttal.

I called it imp. Little gremlin. I liked the playful name. The description was just "Devil's advocate for reasoning."

The first design was simple. Restate the user's claim in one line. Attack on three axes — Evidence (cherry-picked? missing data?), Logic (any leaps?), Alternatives (could the same facts support a different conclusion?). Name the cognitive bias in play. Close with a balance table — Strength column, Verdict column.

It worked. The first two times.

v0.7 — the name was lying

Third use, I caught myself hesitating. The command was /imp, but every line of the description called it "devil's advocate." Same tool, two names, in the same document. Every time I went to use it, my brain did a little "wait, was it imp," and that little hitch added up.

Sounds petty. It isn't. The name is the tool's identity. If the identity is fuzzy, every invocation costs you a context switch you shouldn't be paying for. Pile up enough of those and you stop calling the tool at all.

v0.7.0 (April 6th) renamed /imp → /devil. Breaking change. The user (still me) had to relearn a command. Did it anyway. A lie you're carrying gets more expensive the longer you carry it.

That same release also dropped Rhythm Rules (Pace / Weight / Learn) into core. So /devil stopped being a standalone tool and became one component inside a larger verification system. Pace is "when do I use this," Weight is "how heavy a pass do I need," Learn is "how do I save what I find." /devil ended up being the thing you reach for when Pace and Weight tell you to.

v0.7.1 → v0.10 — the table was unreadable

Once it actually got used, the balance table started misbehaving:

| Original claim | Counter-argument | Strength | Verdict |

Strength of what? The original claim's strength? The counter-argument's strength? Every single time I read this table, I had to drop back into the body text to figure it out. The column name wasn't carrying the meaning it was supposed to.

v0.7.1 (April 10th) — pinned the subjects to the labels:

| Original claim | Counter-argument | Counter strength | Claim verdict |

Better. But a week later it was annoying me again. "Counter strength" is a noun phrase that's too compact — it doesn't tell the reader what question the column is answering. The first thing you ask when you read a column header is "what is this column asking me?" The faster the header answers that, the faster the table reads.

v0.10.0 (April 17th) — turned the noun phrases into questions:

| Original claim | Counter-argument | Counter (strong/moderate/weak) | Claim after challenge |

The actual header text is "How strong is the counter-argument?" with the answer options shown in parentheses. "Verdict" — abstract noun — got replaced with "Claim after challenge," which has time embedded in it. Much more direct.

Two passes for what looks like a tiny change. But column headers are the first signal a reader gets when they hit a table. If the first signal is fuzzy, every row underneath turns into noise. The header is the meaning carrier; the cells are just data riding it. Worth two passes.

v0.11 — devil started wearing me out

The biggest shift came on April 26th, and here's the awkward part: the problem was that the tool was working.

/devil was too good at doubting.

Every time I made a decision and called /devil, it would dutifully surface real weaknesses. Not fluff — real ones. Cherry-picking risks. Causal directions I hadn't verified. Alternative hypotheses that explained the same data equally well. All true.

The catch: none of it changed what I did next.

Concrete example. I'd ask /devil about a decision like "which series should I file this post under?" and it would honestly point out that series taxonomies are arbitrary, that readers don't browse by series, that my own category metrics are basically empty. All true. But I still had to put the post somewhere, I could move it a week later if I felt like it, and none of the surfaced weaknesses were going to change my next click. The tool had done real work. The result was churn.

I gave it a name — reactive contradiction. Real weakness, but tangential. The tool looks smart. The user gets nothing actionable.

Realizing this was the tool's signature failure mode is what triggered v0.11.

The fix was a gate. v0.11 added a new section called §2.5: a project-relevance gate.

After CCT (Claim-crux / Counter-fit / Cause-chain — the existing step where devil hunts for the load-bearing assumption), there's now a gate it has to pass before §3 depth drive runs. Three questions:

Question	What it asks
Stakes	If this reasoning is wrong, what does the user actually lose here?
Amendment cost	Where in the lifecycle is this decision — draft, or shipped operational state?
Next-action delta	If this counter is surfaced, does the user's next action actually change, or do we just add words?

The three answers produce a verdict:

[devil] Project relevance: "{material | load-bearing-but-low-stakes | off-project}"

material — real stakes. Proceed to §3 depth.
load-bearing-but-low-stakes — real weakness, but stakes don't justify a deep grilling. Note it briefly and stop.
off-project — the weakness is technically correct but disconnected from the actual decision. Say so and stop.
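
A minimal sketch of how the three answers could fold into one verdict. The boolean fields and the decision order are assumptions made for illustration; the post describes §2.5 as instructions for the model, not as code, and the real gate may weigh the questions differently.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    MATERIAL = "material"
    LOW_STAKES = "load-bearing-but-low-stakes"
    OFF_PROJECT = "off-project"


@dataclass(frozen=True)
class GateAnswers:
    """Hypothetical yes/no encoding of the three gate questions."""
    high_stakes: bool          # Stakes: does the user lose something real if this is wrong?
    costly_to_amend: bool      # Amendment cost: shipped state that is hard to undo?
    changes_next_action: bool  # Next-action delta: would the user act differently?


def relevance_verdict(a: GateAnswers) -> Verdict:
    # A counter that changes nothing the user does next has missed the decision.
    if not a.changes_next_action:
        return Verdict.OFF_PROJECT
    # Real stakes, or a decision that is expensive to reverse, justify the depth drive.
    if a.high_stakes or a.costly_to_amend:
        return Verdict.MATERIAL
    # A relevant weakness that is cheap to fix later: note it and stop.
    return Verdict.LOW_STAKES


def format_verdict(v: Verdict) -> str:
    return f'[devil] Project relevance: "{v.value}"'


# The "which series do I file this under?" example: nothing about the next click changes.
print(format_verdict(relevance_verdict(GateAnswers(False, False, False))))
```

Checking the next-action delta first is a design choice in this sketch: it lets the gate exit before it spends any effort estimating stakes.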

That last verdict — off-project — is the one that mattered. Until v0.11, /devil could only land on "claim survives" or "claim weakened/flipped." Both of those describe what happens after a challenge lands. There was no way to express "the challenge itself missed." So /devil would throw the missed punches anyway. That was the structural cause of reactive contradiction.

Once off-project existed, /devil had the option of an honest exit. A tool shouldn't stand up when there's no work for it to do. That's what the gate is for.

Five rewrites, four lessons

1. A verification tool needs to be verified too.
/devil can't audit itself, so the signal is user fatigue. If after invoking the tool you find yourself sighing instead of nodding, the tool isn't doing its job — no matter how many real weaknesses it found.

2. The name is the tool's identity.
v0.7 was about not carrying that imp/devil lie one more day. Every invocation that costs your reader a small mental swap is a tax that compounds.

3. Column headers are where signal turns into noise.
Two passes to get them right. The leap was going from a noun phrase to a question. Tables hold data. Headers hold the question the data is answering. Get the second wrong and the first becomes meaningless.

4. A busy-looking tool isn't necessarily a useful tool.
When /devil was finding real weaknesses on every call, it looked like the tool was earning its keep. It wasn't. It was generating output that looked like work but wasn't actually moving anything. The fix wasn't to make /devil smarter. It was to give it permission to leave.

What's already pulling on the next rewrite

The gate landed less than a week ago and there are already cracks I can see:

The user doesn't always state Stakes explicitly. So how does /devil estimate them? If the estimate's wrong, doesn't the whole gate collapse back into reactive contradiction?
The boundary between load-bearing-but-low-stakes and off-project is sharp on paper but probably blurry in practice. Both end in "stop here," but they mean different things.
When does /devil need to critique its own verdict? "Your gate verdict is too conservative" is a meta-challenge that doesn't have a home yet.

Five rewrites in, the work isn't done. What goes into the next release will be whatever the user (still me) gets tired of next.

Try it

/plugin marketplace add jun0-ds/sonmat
/plugin install sonmat@sonmat

After install, try calling /devil on any decision you're sitting on. If the result feels like churn, that's the signal — the tool has no real work here. Catching that signal is how you train the tool, and yourself.

→ GitHub: jun0-ds/sonmat

GitHub · LinkedIn

I built sonmat to fix this. Then sonmat had the same bug.

Jun0 — Sun, 03 May 2026 03:16:19 +0000

Another confession

In the last post, I went after the bug that every Claude Code discipline plugin seems to share: the rules live in the main session, the work happens in the workers (subagents), and the rules don't make the trip across. I named names. I quoted the maintainer of superpowers closing a related issue as "not planned." And then, with a straight face, I claimed that sonmat was different.

It really wasn't. Not yet, anyway.

For a while, sonmat had this nicely crafted hook. Every time you opened Claude Code, it would shove 1,239 characters of discipline into additionalContext before you even said hello. "MANDATORY. Apply Break it / Cross it / Ground it. Read project memory. Watch for novel traps…" Every session, every time, before the model got a word in.

I thought this was the strong play. The hook fires before the model speaks, the instruction lands in additionalContext, the discipline can't be skipped. That was the theory.

What I didn't notice, for an embarrassingly long time, was that I was rebuilding, with my own hands, the exact bug I'd just spent a whole post laughing at.

How I figured it out

Here's the awkward bit: additionalContext is delivered to the main session. It is not delivered to subagents.

So picture what was actually happening. The discipline lived in the place I could see (the main session). It was completely absent from the place where the work actually got done (the workers). The main session would dutifully announce "applying Break / Cross / Ground" — and then dispatch a worker. The worker would receive a clean task with a clean context. No discipline. The worker would shrug and go, "this is simple enough, I don't need tests." The result would come back, the main session would format it confidently (still holding all 1,239 characters of rules), and I'd nod along and approve.

It was not fine.

Which is to say: the exact failure mode I'd been mocking in superpowers and karpathy-skills? Same mechanism, different label, mine.

Honestly, I only caught it kind of by accident. I'd started spinning up other CLIs for the same kind of work, and something in the output felt off. So I went poking around. Turns out, every CLI handles hooks slightly differently — different contracts, different injection points, sometimes none at all. And while I was wrestling with making the discipline survive outside Claude Code, the thing that should have been obvious inside Claude Code finally clicked: a hook that lands in one place but not another isn't a guarantee. It's a happy accident that landed in the main session.

The bug wasn't multi-CLI. The bug was that I'd been calling that happy accident a guardrail.

What changed in v0.4.0

I emptied the hook. additionalContext: 1,239 → 0.

The discipline didn't disappear — it just moved. It now lives in CLAUDE.md → discipline/core.md, the same file the agent already reads as part of its prompt context, and the same file you'd put any other instruction in. Workers spawned by the main agent inherit the same CLAUDE.md chain. So the rule lands in the same place, every layer.

The hook still runs. It just sticks to what hooks are good at. Make the .claude/sonmat/ directory. Plant a one-time ## sonmat block in your global CLAUDE.md so the discipline gets referenced. Check for updates. Side effects only. It doesn't try to shape behavior anymore.

BEFORE                                 AFTER
hooks/session-start                    hooks/session-start
  └─ additionalContext: 1,239 chars      └─ side effects only
       "MANDATORY: sonmat..."                 ├─ create .claude/sonmat/
       (delivered to main session             ├─ plant ## sonmat block
        only — workers never saw it)          └─ git pull if outdated

                                       CLAUDE.md → discipline/core.md
                                         (read by main and by every
                                          worker spawned from it.
                                          visible to the user. editable.)

Same discipline, different path. The behavior didn't get weaker — it just got honest about where it actually lives.
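
To make "side effects only" concrete, here is a minimal Python sketch of a session-start routine shaped like the AFTER column. Everything in it is an assumption for illustration: the function name, the placeholder block text, and the plain string return value (which is not Claude Code's real hook output format). The update check is left out.

```python
from pathlib import Path

MARKER = "## sonmat"
# Placeholder body: the text sonmat really plants is not shown in this post.
BLOCK = MARKER + "\n(reference to discipline/core.md goes here)\n"


def session_start(project_root: Path, global_claude_md: Path) -> str:
    """Run the side effects and return the context to inject (always empty)."""
    # Side effect 1: create the plugin's working directory; repeat runs are no-ops.
    (project_root / ".claude" / "sonmat").mkdir(parents=True, exist_ok=True)

    # Side effect 2: plant the block once, keyed on the marker line.
    existing = ""
    if global_claude_md.exists():
        existing = global_claude_md.read_text(encoding="utf-8")
    if MARKER not in existing:
        global_claude_md.parent.mkdir(parents=True, exist_ok=True)
        gap = "" if existing == "" or existing.endswith("\n") else "\n"
        global_claude_md.write_text(existing + gap + BLOCK, encoding="utf-8")

    # No behavioral text goes through the hook any more.
    return ""
```

Both side effects are idempotent, so the routine can fire on every session without stacking duplicate blocks in CLAUDE.md.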

Four things I believe now

1. A guardrail that doesn't reach the worker is a fake guardrail.
If your "mandatory" rule is being delivered through a channel the worker doesn't subscribe to, it isn't mandatory. It's decoration. And the trap is that you can see it sitting in the main session — which is exactly why you stop checking.

2. Visibility is the contract.
A rule sitting in additionalContext is invisible to the user. You can't read it, can't edit it, can't disagree with it. A rule sitting in CLAUDE.md → core.md is just there, in the repo. The agent reads it. You read it. You can disagree with it — and that's a good thing, because that's how drift gets caught before it ships.

3. Hooks are for side effects. They are not for behavior.
Make the directory, plant the marker, pull the update. That's the job. The moment a hook starts trying to shape what the agent does, you're betting that the hook fires in every code path the agent will ever take. It doesn't. It can't.

4. "Strong" enforcement is usually fragile enforcement.
The 1,239-character injection felt powerful because it was automatic. But automatic-and-incomplete is worse than manual-and-complete — the user trusts the automation and stops looking. Moving discipline into a file the user can edit (and ignore) sounds weaker. It isn't. It's where the user actually re-enters the loop.

The hard part

Honestly, emptying the hook felt like giving up control. The hook was the place where I could make sure. If discipline lives in CLAUDE.md, the user can edit it, override core.md, even ignore the whole thing.

Which, yes, is the entire point.

A discipline the user can't see is a discipline the user can't trust. A discipline the user can't edit is a rule, not a tool — and sonmat is supposed to be a tool. Visibility is the price of trust. And there's a bonus: the discipline now reaches the workers, because the workers read the same file the user reads.

Diagnose your own setup

If you're running any Claude Code plugin that promises "guardrails," try asking three questions:

Where does the rule physically live? A hook injecting additionalContext? A skill the model has to remember to invoke? A line in CLAUDE.md?
Who actually reads it? Just the main session? Workers too? Subagents spawned from workers?
Can you see it yourself? If you can't open a file and read the rule that's supposedly governing your agent, you don't have a guardrail. You have a vibe.

I had to put my own plugin through those three questions before the answer became obvious. Doing the diagnosis in post 01 of this series was the easy part. Applying it to sonmat itself took a lot longer.

Try it

/plugin marketplace add jun0-ds/sonmat
/plugin install sonmat@sonmat

After install, the discipline lives at ~/.claude/plugins/marketplaces/sonmat/discipline/core.md. Open it. Read it. Disagree with parts of it if you want — that's actually how you'll know it's doing something real.

→ GitHub: jun0-ds/sonmat


Your AI is confident. Your AI is wrong. You shipped it anyway.

Jun0 — Fri, 01 May 2026 03:13:10 +0000

A confession

I told Claude to write tests first. Claude said "understood." Then Claude spawned a subagent. The subagent said "this is simple enough, I don't need tests." It shipped. I approved. The tests that didn't exist didn't fail. Everything looked fine.

It was not fine.

The fun part: I had three plugins installed specifically to prevent this. They were all working correctly. In the main session. Where the work wasn't happening.

The problem with being confident

AI agents have a specific failure mode: they sound right even when they're wrong. This is well-known. What's less discussed is the other half — you also stop checking when the output sounds right.

So you have two parties in a conversation. One produces confident nonsense. The other accepts it because confidence is persuasive. Nobody verifies. Errors ship.

This is not a technology problem. This is a trust problem. And every tool I tried was solving the wrong half of it.

What every plugin gets right (and then misses)

superpowers (175k stars) adds TDD, debugging, code review. Smart rules. They live in the main session. When Claude spawns a subagent, which is where the actual work happens, the subagent doesn't get them. The maintainer closed the issue about this as not planned: "this is a Claude Code platform limitation. There's not much superpowers can do."

karpathy-skills puts principles in CLAUDE.md. Subagents can't reliably read CLAUDE.md. Sometimes they claim they did. They didn't.

GSD has beautiful structure. Milestones, slices, tasks. Discipline is the user's job. The framework doesn't enforce it at the worker level.

The pattern: great rules → main session only → workers ignore them → output looks fine → it isn't.

Documented. Repeatedly. Across projects.

What I built instead

sonmat (손맛 — Korean for "mother's touch." The secret ingredient that makes the same recipe taste different.)

It does two things:

Makes the AI doubt itself. Verification discipline goes directly into every worker's prompt at dispatch time. Not a file reference. Not a hook that might fire. The actual rules, in the actual prompt. Break it, Cross it, Ground it — on every task, including the ones you don't see.

Makes you doubt the AI. Every decision surfaces with its reasoning. Not "here's the answer" but "here's the answer, here's why, and here's what I'm not sure about." When you see the reasoning, you can judge. When you only see the answer, you won't.

And the AI doubts you back. When your instruction is ambiguous or conflicts with what it sees, sonmat doesn't just comply — it asks. The same verification attitude applies in both directions.
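
A minimal sketch of what "the actual rules, in the actual prompt" could look like at dispatch time. The function name, the prompt layout, and the placeholder rule text are assumptions; the post names Break it / Cross it / Ground it but does not spell out their wording, so the sketch does not invent it.

```python
from typing import Mapping

# Placeholder rule text: only the three rule names come from the post.
DISCIPLINE_RULES: Mapping[str, str] = {
    "Break it": "<rule text>",
    "Cross it": "<rule text>",
    "Ground it": "<rule text>",
}


def build_worker_prompt(task: str, rules: Mapping[str, str] = DISCIPLINE_RULES) -> str:
    """Inline the rules into the worker prompt instead of pointing at a file."""
    rule_lines = "\n".join(f"- {name}: {text}" for name, text in rules.items())
    return (
        "Verification discipline (applies to this task):\n"
        f"{rule_lines}\n\n"
        f"Task:\n{task}"
    )


print(build_worker_prompt("Add a retry to fetch_user()"))
```

The point of the layout is that the worker cannot receive the task without also receiving the rules, because both travel in the same string.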

That's the whole thing. Everything else — autonomous loops, escalation levels, domain-specific traps — is implementation detail.

Four things I believe now

1. Confidence is the worst signal.
When the model feels sure, that's exactly when it should look for counterexamples. Confidence without verification is hallucination in a suit.

2. Rules that don't reach workers are decoration.
A coding standard that exists only in the main session is a Post-it note on a door nobody walks through.

3. Autonomy without guardrails is just expensive chaos.
sonmat escalates automatically — pause, spawn worker, spawn parallel workers — when it hits surprises or repeated failures. You don't babysit. It doesn't run blind.

4. Universal rules are universally mediocre.
"Write tests first" is critical for dev, meaningless for data analysis. "One change at a time" is essential for ML, overkill for docs. sonmat loads domain-specific traps. The right advice for the right context.

The hard lesson

I wanted to add more rules. Every edge case screamed for a new rule. I resisted.

Too few rules: chaos. Too many: the agent spends its time checking boxes instead of working. The answer was a small, hard core — three verification methods — plus domain hints that activate only when relevant.

The other lesson: transparency beats enforcement. A guard that says "no" gets worked around. A colleague that says "I noticed this — your call" gets listened to. sonmat chose the second approach. For the AI and for you.

Try it

/plugin marketplace add jun0-ds/sonmat
/plugin install sonmat@sonmat

No config. Start talking.

→ GitHub: jun0-ds/sonmat


I Spent a Week Installing WSL2. The Fix Was Two Lines.

Jun0 — Tue, 31 Mar 2026 01:22:16 +0000

"WSL2? Five minutes, tops." — Me, seven days ago.

TL;DR

On Windows 11 25H2 (build 26200), enabling VirtualMachinePlatform hangs at 37.8%. Forever. The servicing stack is stuck on 24H2 (build 26100) while the OS moved to 25H2 (build 26200). Windows literally cannot service itself.

Add-WindowsCapability -Online -Name "Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"
dism /online /enable-feature /featurename:VirtualMachinePlatform /all /LimitAccess

If you just want the fix, there it is. If you want to know how it took a week and 10 failed attempts to arrive at two lines of PowerShell, keep reading.

Day 1: Innocence

Simple plan. Install WSL2. Set up Ubuntu 24.04. Do actual work.

wsl --install -d Ubuntu-24.04

37.8%.

I made coffee. Came back. 37.8%.

I had lunch. Came back. 37.8%.

37.8 is now my least favorite number.

The Crime Scene


Component	Version
OS	Windows 11 Pro 25H2 (Build 26200.8037)
CPU	Intel Core Ultra 9 275HX
DISM version	10.0.26100.5074
Servicing stack	10.0.26100.8035

See it? OS is build 26200. Servicing stack is build 26100. The mechanic has last year's manual for this year's car. If you spotted this, you already know the ending. I did not spot this on Day 1.
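
The comparison is easy to miss by eye, so here is a small Python sketch that does it mechanically. It assumes version strings in the major.minor.build.revision shape shown in the table; the function names are made up for illustration.

```python
def build_number(version: str) -> int:
    """Return the build field of a 'major.minor.build[.revision]' version string."""
    parts = version.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"expected major.minor.build[.revision], got {version!r}")
    return int(parts[2])


def servicing_mismatch(os_version: str, dism_version: str) -> bool:
    """True when the OS and the servicing tool report different builds."""
    return build_number(os_version) != build_number(dism_version)


# Values from the table above: OS 10.0.26200.8037, DISM 10.0.26100.5074.
print(servicing_mismatch("10.0.26200.8037", "10.0.26100.5074"))  # → True
```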

Days 2-6: The Parade of Failures

GUI — Stuck. Cancel also stuck.

"Turn Windows features on/off." Checked the box. Progress bar froze.

Clicked Cancel. Cancel froze.

A cancel button that cannot be cancelled. This is the operating system from the world's most valuable company.

Task Manager. End Process. Moving on.

DISM — 37.8%

dism /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart

37.8%. We meet again.

PowerShell — Same wall, different paint

Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform -All -NoRestart

Calls DISM internally. Same result. Changing the wrapper doesn't change the candy.

Uninstalled BlueStacks — Wrong suspect

Found forum posts: "Android emulators conflict with Hyper-V." I had BlueStacks 10 installed. Uninstalled it completely. Registry cleanup. Folder purge. The works.

Result: 37.8%.

BlueStacks was innocent. And now I have to reinstall it later.

Offline install from 24H2 ISO — Version mismatch

"If the download is the problem, go offline."

dism /online /enable-feature /featurename:VirtualMachinePlatform /all /LimitAccess /Source:D:\Sources\Install.wim

0x800f0912. The ISO is 24H2 (26100), the OS is 25H2 (26200). Windows refuses the source files because they're from "the wrong version." Self-compatibility is apparently optional.

Windows Update cache reset — No effect

net stop wuauserv
net stop bits
Remove-Item C:\Windows\SoftwareDistribution\Download\* -Recurse -Force
net start bits
net start wuauserv

Clean cache. Same 37.8%. Cleaning the house doesn't fix the plumbing.

Pending operations cleanup — Partial

dism /online /cleanup-image /revertpendingactions

Cleared the backlog. Feature still won't activate. Finishing your homework doesn't mean you'll pass the exam.

In-place repair install — Didn't downgrade

The nuclear option. Ran 24H2 ISO's setup.exe with "Keep personal files and apps."

40 minutes. Three reboots.

Result: OS stayed on 25H2 (26200). In-place install doesn't downgrade. But it did clean up the component store. This becomes a crucial plot point later.

Day 7: Reading the Logs (Finally Using My Brain)

A week of "try something else until it works." At this point, the question isn't what doesn't work — it's why.

Select-String -Path C:\Windows\Logs\CBS\CBS.log `
  -Pattern "Error|Failed|0x800f" -Context 2 | Select-Object -Last 20

The smoking gun:

Failed to get uup features from WU, sessionData: {
  "ModuleID":"FOD",
  "Features":[{
    "name":"Windows.HyperV.OptionalFeature.VirtualMachinePlatform.Client.Disabled~"
  }]
} [HRESULT = 0x800f0820 - CBS_E_CANCEL]

download source: 8, download time (secs): 1256, 
download status: 0x800f0820 (CBS_E_CANCEL)

1,256 seconds. Twenty-one minutes waiting for Windows Update to deliver a package it will never find.

From the DISM log, the confession:

Dism.exe version: 10.0.26100.5074
Target image: OS Version=10.0.26200.8037

The servicing stack (26100) is trying to service a newer OS (26200). It goes to Windows Update looking for FOD packages matching this combination. Those packages don't exist in the UUP catalog. So it waits. And waits. And times out at 37.8%.
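
For anyone triaging the same log, a rough Python sketch that pulls the wait time and status code out of a CBS excerpt like the one above. The regular expressions are written against the two lines quoted in this post only; other CBS.log entries may be formatted differently.

```python
import re

# The two lines quoted from CBS.log earlier in this section.
CBS_EXCERPT = """\
download source: 8, download time (secs): 1256,
download status: 0x800f0820 (CBS_E_CANCEL)
"""

TIME_RE = re.compile(r"download time \(secs\):\s*(\d+)")
STATUS_RE = re.compile(r"download status:\s*(0x[0-9a-fA-F]+)\s*\((\w+)\)")


def summarize_download(log_text: str):
    """Return (minutes_waited, hresult, symbolic_name), or None if either field is missing."""
    t = TIME_RE.search(log_text)
    s = STATUS_RE.search(log_text)
    if not (t and s):
        return None
    minutes = round(int(t.group(1)) / 60, 1)
    return minutes, s.group(1), s.group(2)


print(summarize_download(CBS_EXCERPT))  # → (20.9, '0x800f0820', 'CBS_E_CANCEL')
```

1,256 seconds divided by 60 is about 20.9 minutes, which is the "twenty-one minutes" above.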

A car mechanic with a 2023 catalog trying to order parts for a 2024 model. "This part number doesn't exist in our system, sir."

This is the servicing architecture of the world's largest software company.

The Fix: Use the Back Door

DISM can't download the FOD through its usual channel (UUP). But Add-WindowsCapability uses a different channel.

Same building. Front door is under construction. Back door is open. The sign only mentions the front door. Classic Windows UX.

# Step 1: Back door — install payload via alternative channel
Add-WindowsCapability -Online -Name "Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"

# Step 2: Now activate using only local files (no internet needed)
dism /online /enable-feature /featurename:VirtualMachinePlatform /all /LimitAccess

100%.

One hundred percent.

First three-digit number I've seen all week.

Why This Works

[Front door — BLOCKED]
DISM enable-feature
  → Needs FOD payload
  → Windows Update UUP channel
  → Servicing stack (26100) ≠ OS (26200)
  → "Part not found in catalog"
  → 21 min timeout → CBS_E_CANCEL

[Back door — OPEN]
Add-WindowsCapability  
  → Different download channel (bypasses UUP)
  → Payload installed locally ✓

DISM + /LimitAccess
  → "Internet? Don't need it."
  → Local files only
  → Success ✓

Both commands use the same Windows servicing system. But they fetch FOD packages through different channels. DISM goes through UUP, where the version mismatch kills it. Add-WindowsCapability takes a different route. The official docs don't mention this distinction. You're welcome.

If You Found This Article

You've probably already tried and failed multiple times. That means pending operations are likely piled up. Clean house first:

# Light cleanup
dism /online /cleanup-image /revertpendingactions
# → Reboot → Run the two-line fix

# If that's not enough (in-place repair)
# 24H2 ISO → setup.exe → "Keep personal files and apps"
# → Reboot → Run the two-line fix

Complete Flow (For Fresh Starts)

1. Enable virtualization in BIOS
   HP laptops: F10 → Security or Configuration → Enable VT-x
        ↓
2. Enable Hyper-V + WSL (these work fine, ironically)
   dism /online /enable-feature /featurename:Microsoft-Hyper-V-All /all /norestart
   dism /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
        ↓
3. Clean pending operations (if you've been trying things)
   dism /online /cleanup-image /revertpendingactions → reboot
        ↓
4. The actual fix
   Add-WindowsCapability -Online -Name "Microsoft.Windows.HyperV.VirtualMachinePlatform~~~~0.0.1.0"
   dism /online /enable-feature /featurename:VirtualMachinePlatform /all /LimitAccess
        ↓
5. Reboot → Install WSL2
   wsl --install -d Ubuntu-24.04

Is This Your Problem?

# Check version mismatch in DISM log
Get-Content C:\Windows\Logs\DISM\dism.log -Tail 100
# "Dism.exe version" build (26100) ≠ "OS Version" build (26200) → yes, this is your problem

# Check CBS log for the specific failure
Select-String -Path C:\Windows\Logs\CBS\CBS.log `
  -Pattern "Error|Failed|0x800f" -Context 2 | Select-Object -Last 20
# "CBS_E_CANCEL" → yes, this is your problem

# Verify CPU virtualization (prerequisite)
Get-CimInstance -ClassName Win32_Processor | 
  Select-Object VirtualizationFirmwareEnabled, VMMonitorModeExtensions

Summary


Item	Details
Symptom	VirtualMachinePlatform stuck at 37.8%
Root cause	25H2 OS (26200) + 24H2 servicing stack (26100) = FOD download mismatch
Fix	Add-WindowsCapability to bypass → dism /LimitAccess to activate
Prerequisite	Clear pending operations
Cost	One week of evenings. One wrongly accused BlueStacks. A lasting distrust of progress bars.

Lessons

Read the logs first. "Try random things until something works" is the scenic route to nowhere.
Windows Insider means signing up for this. 25H2 is a preview build. The servicing stack hasn't caught up. Now you know.
Same building, multiple doors. When DISM fails, Add-WindowsCapability exists. The docs won't tell you.
Feed your logs to AI. Nobody should read 160,000 lines of CBS log with their own eyes.

This troubleshooting session was done with Claude Code. It pulled the critical 6 lines from a 160,000-line CBS log and helped identify the Add-WindowsCapability back door. Without it, this would have ended with a format and reinstall.
