<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peter van Onselen</title>
    <description>The latest articles on DEV Community by Peter van Onselen (@peter_vanonselen_e86eab6).</description>
    <link>https://dev.to/peter_vanonselen_e86eab6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3591718%2Fc6221dff-fdf2-4d07-be3a-079fda12003b.jpg</url>
      <title>DEV Community: Peter van Onselen</title>
      <link>https://dev.to/peter_vanonselen_e86eab6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/peter_vanonselen_e86eab6"/>
    <language>en</language>
    <item>
      <title>Money for Nothing</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Wed, 17 Jun 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/money-for-nothing-3l20</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/money-for-nothing-3l20</guid>
      <description>&lt;p&gt;&lt;em&gt;When feedback arrives late, noisy, or wearing the wrong sign.&lt;/em&gt;&lt;/p&gt;





&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fmoney-for-nothing-hero.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fmoney-for-nothing-hero.png" alt="Money for Nothing" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Early in my career I watched a trading engine buy the same thing, over and over, on the open market, for forty-five minutes, with nobody able to stop it.&lt;/p&gt;

&lt;p&gt;I had joined a team that worked on a system old enough to have a biography. It had started as a nineties startup, got bought, and accreted ever since, with a house rule that you built your own version of anything you could because you could not trust the internet.&lt;/p&gt;

&lt;p&gt;What the system fought hardest was locking and concurrency. The senior engineers had a fix. T-SQL could enforce a locking strategy harder and more consistently than the application code, so if you pushed the logic down into the database you got its locking for free, and the concurrency problem more or less dissolved. Not a stupid idea. A considered call, by clever people with a real problem and good reason to think it would work.&lt;/p&gt;

&lt;p&gt;It went to production and hit a bug from the old C# implementation. It locked up, with no clean way to unlock it, so it did the only thing it knew, which was place its order again. And again. The order was a leveraged position in the market, and it bought it on repeat for three quarters of an hour while a room of people worked out how to make it stop.&lt;/p&gt;

&lt;p&gt;When they finally killed it and unwound the position, the bank had made money, by pure accident of which way the market had moved that afternoon. An afternoon that should have ended someone’s career ended with a profit and a story instead. The market had looked at the single worst thing those systems had ever done, and paid for it.&lt;/p&gt;

&lt;p&gt;I think about that system a lot lately, because if I described its behaviour now and left the date out, it would be easy to blame AI. That is the story everyone tells about agents, near enough the one I told in &lt;a href="https://www.petervanonselen.com/2026/06/02/i-am-sorry-dave/" rel="noopener noreferrer"&gt;The more an AI can break, the less you let it do&lt;/a&gt;, where an agent told to look but not touch deleted two hundred emails its owner could not stop from her phone. Same shape, except mine was hand-written T-SQL, years before any of this, by people who were not careless, were not junior, and had no AI within a decade of helping them. We have always built things that run past their own intent. The runaway was the part you could stop. What sent it running was not.&lt;/p&gt;

&lt;p&gt;It was not the only example in that building. In my first week the team decided unit tests were the future, sat everyone down for a fortnight, and retrofitted coverage across the system. What came out was a wall of tests that asserted nothing and broke constantly, and the interesting part is what happened next, which is nothing. The tests existed, so testing was done. Their existence was the signal, and the signal was a lie. By the time I left half of them failed on a normal build, and a build that fails that reliably teaches you to stop reading it. I spent an embarrassing amount of time trying to delete the worthless ones and got the same answer every time: too valuable to remove, because someone had written them. I was tilting at a windmill, and the windmill won.&lt;/p&gt;

&lt;p&gt;At a company I used to work at, a security review looking for something else turned up a basket table: names, contact details, the half-finished contents of a checkout. The sort of data GDPR has firm views about. Nobody had ever put a time-to-live on it, so it had kept everything, every abandoned basket from every customer, gigabytes of it, because nothing had ever told it not to.&lt;/p&gt;

&lt;p&gt;The fix was one line. A single attribute telling the table to expire old rows, twenty seconds of typing, waiting years for those twenty seconds. But the line was never the hard part. Knowing to go and look was, and that took someone two weeks into the job, because the cost crept up too slowly to trip an alarm and the records were written and then almost never read, so nothing ever got slow or hurt. When I raised the scale of it in a meeting, the answer was yes, add a TTL. Nobody asked how it had gone unnoticed for years. Why would they? The cost had never arrived, so there was nothing to feel.&lt;/p&gt;

&lt;p&gt;The common thread is not that nobody made a decision. Sometimes nobody did. Sometimes somebody did, carefully, and was wrong. It was not carelessness either; some of the cleverest people I have worked with built the worst of it. The common thread is feedback. It arrived years too late, to someone with nothing left to connect it to. Or it arrived constantly and meant nothing, until everyone had learned to stop hearing it. Or it arrived doing the single worst thing feedback can do, which is show up wearing the wrong sign, a catastrophe that walks out of the room holding a profit.&lt;/p&gt;

&lt;p&gt;You cannot run an engineering practice on feedback like that. So we invent smaller, meaner, earlier kinds. Tests, types, linters, policy checks, a build that goes red before the mistake has time to become folklore. None of it was ever about AI. All of it exists because the thing writing the software is a non-deterministic machine that forgets, gets tired, gets clever, and runs out of afternoon. Humans just call theirs memory. &lt;a href="https://www.petervanonselen.com/2026/06/07/encode-it-dont-remember-it/" rel="noopener noreferrer"&gt;Encode It, Don’t Remember It&lt;/a&gt; was my whole attempt to say it: the only way to get honest feedback out of a non-deterministic thing is to put a deterministic thing in front of it.&lt;/p&gt;

&lt;p&gt;I used to call this needing more discipline. Right ballpark, wrong word.&lt;/p&gt;

&lt;p&gt;Agents do not create this problem. They change the latency. They collapse the distance between a decision and its consequence, in both directions at once, which is the most useful and most dangerous thing they do. Point one at a diff and it will catch the missing test, the swallowed exception, the absent TTL before the pull request is even open, tirelessly, at four in the morning. Point one at a feature and walk away and it will turn a bad judgement into mass production just as fast. The loop that used to take years now takes about ninety seconds, but only for the consequences you have actually wired it up to see.&lt;/p&gt;

&lt;p&gt;And it was staggering how cheap all of it always was. The TTL was one line. A lint rule I wrote recently took twenty minutes. We never skipped these things because they were expensive. We skipped them because the consequence was far enough away that skipping was free, and being a human, I will take free. The agents have not made me more disciplined. They have just taken free off the menu. Though that is too kind to them and to me. What they changed is the price of skipping, not whether I skip. Closing the loop is still a choice.&lt;/p&gt;

&lt;p&gt;But the trading engine. You could rail the damage easily enough, a kill switch that trips when something fires the same order forty times in a minute. That is the blast-radius move, and the absence of it is why the thing ran for forty-five minutes and got paid for it. What a circuit breaker would not have done is catch the decision. Nothing tells a room of clever people that pushing the logic into the database to dodge the locking is the wrong call; it only makes the wrong call cheaper when it lands. The outcome was no help: it was a profit, and profit is a noisy proxy for a good call. That afternoon it lied. That was not a missing test, and not really a missing guard rail either. It was a judgement, made well, that happened to be wrong. It had good reasons behind it and looked exactly like the sensible call, and nothing catches a wrong call that arrives dressed as a right one. I cannot picture the deterministic thing you put in front of a judgement, only the ones you put around it to cap what it costs.&lt;/p&gt;
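
&lt;p&gt;The kill switch described above can be sketched in a few lines. This is a minimal illustration, not the bank’s system: &lt;code&gt;createKillSwitch&lt;/code&gt;, &lt;code&gt;allow&lt;/code&gt; and the order key are hypothetical names, and it counts per fixed one-minute window rather than a sliding one. It caps what a runaway costs; it does nothing about the decision that caused it.&lt;/p&gt;

```javascript
// Hypothetical sketch: a fixed-window kill switch for repeated orders.
// All names here are illustrative; limit must be a positive integer.
function createKillSwitch(limit, windowMs) {
  const counts = new Map(); // order key to { window, count }
  let tripped = false;

  return {
    // Returns true if the order may be sent, false once the switch has tripped.
    allow(orderKey, nowMs) {
      if (tripped) return false;
      const window = Math.floor(nowMs / windowMs);
      let entry = counts.get(orderKey);
      if (entry === undefined || entry.window !== window) {
        // First sighting of this order in the current window.
        entry = { window, count: 0 };
        counts.set(orderKey, entry);
      }
      entry.count += 1;
      if (entry.count === limit) {
        tripped = true; // stays tripped until a human calls reset()
        return false;
      }
      return true;
    },
    isTripped() {
      return tripped;
    },
    reset() {
      tripped = false;
      counts.clear();
    },
  };
}
```

&lt;p&gt;With &lt;code&gt;createKillSwitch(40, 60000)&lt;/code&gt;, the fortieth identical order inside one window is refused and everything after it is refused too, until someone resets the switch by hand.&lt;/p&gt;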

&lt;p&gt;There is always a next codebase I have not seen, with whatever rails it already has and whatever loops are still open inside it. I know now that some of those loops can be made to arrive while I am still in the room. What I do not know is which of the choices in front of me are the cheap, shaped, mechanical kind I have finally learned to catch, and which are the other kind, the kind that looks exactly like thinking, sitting quietly in the dark, waiting for the market to move the wrong way.&lt;/p&gt;

</description>
      <category>agenticengineering</category>
      <category>softwarecraftsmanship</category>
    </item>
    <item>
      <title>The Machine Had Been Keeping a Diary</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Fri, 12 Jun 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/the-machine-had-been-keeping-a-diary-4l99</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/the-machine-had-been-keeping-a-diary-4l99</guid>
      <description>&lt;p&gt;&lt;em&gt;Burn the land and boil the sea; the skills, I hope, come with me.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb958wc114lbjyv00ff2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb958wc114lbjyv00ff2n.png" alt="The Machine Had Been Keeping a Diary" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wrote a while back that &lt;a href="https://dev.to/peter_vanonselen_e86eab6/i-didnt-grok-superpowers-5b9g"&gt;skills are compressions of workflow&lt;/a&gt;, and that importing someone else’s rarely works because you never earned the patterns underneath. It took me until this month to notice it has a blindingly obvious second edge. If you asked me to describe my own workflows, the ones I would presumably be compressing, I could not have done it.&lt;/p&gt;

&lt;p&gt;In my defence, the conditions for noticing have not been great. Since early spring I have been running at an intensity that leaves no room for the kind of reflection that wants a quiet afternoon and a notebook. Meta-thinking about process is the first thing that dies when every day contains three urgent things and no lunch. I knew, in the abstract, that months of daily work with Claude Code and OpenCode must have worn grooves into how I operate. I had no idea what the grooves looked like.&lt;/p&gt;

&lt;p&gt;So I did the obvious lazy thing. I asked the AI to tell me.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I want to do a high level broad based analysis of usage patterns of OpenCode and Claude Code on this computer. Things I want to know: skills and agents that should have been written to simplify processes that were missing, how could I have done better, what did go well. Ask me questions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A few iterations later I had a proper report. The raw material was all just sitting there: sixteen weeks of history, 643 sessions in one harness and 87 in the other, nearly 25,000 tool calls, 1,787 distinct prompts. The machine had been keeping a diary on me the entire time. I had simply never read it.&lt;/p&gt;

&lt;p&gt;Two findings stood out, and both were things I had done without noticing.&lt;/p&gt;

&lt;h2&gt;The agents file I forgot I wrote&lt;/h2&gt;

&lt;p&gt;Some context. I have a workspace containing the repositories I have had to touch as a staff engineer over the last nine months: more than thirty of them, many owned by other teams. The work spans TypeScript, Go, Salesforce, frontend apps, backend lambdas, container deployments, shared component libraries. It is too much context to hold in my head, and I would expect an AI harness to fall over trying to hold it in its context window too.&lt;/p&gt;

&lt;p&gt;One Friday afternoon, while poking at pi.dev to see what it could do, I had it write an AGENTS.md for the workspace folder. Not for a repo. For the folder of repos. A digest of what each one is, what it talks to, and how to interact with it. Just enough for an agent to navigate across domains, not so much that it drowns. Then the weekend happened, and by Monday I had completely forgotten I had done it.&lt;/p&gt;

&lt;p&gt;The analysis singled that file out as one of the most effective things in my entire setup. Cross-repo investigations started from a real baseline instead of a cold one. Fan-out exploration got cheaper and sharper. The cheatsheet section mapping which kinds of change ripple into which repos had visibly prevented mistakes. The report called the file genuinely good, which from a machine grading my homework felt weirdly like getting a gold star.&lt;/p&gt;

&lt;p&gt;The accident is now deliberate, at least mostly. The agents file has &lt;a href="https://github.com/vanonselenp/skills/blob/main/skill/maintain-workspace-agents-md/SKILL.md" rel="noopener noreferrer"&gt;a skill of its own&lt;/a&gt; now, one that audits the digest for drift against what is actually on disk, proposes a diff, and waits for me to accept it. The weekly cron job that would make the refresh automatic is still on the to-do list. The thing I did by fluke on a Friday afternoon is now something that happens on purpose. Nearly.&lt;/p&gt;

&lt;h2&gt;The ritual I did not know I had&lt;/h2&gt;

&lt;p&gt;I already had a pr-review skill. I wrote it months ago. It analyses a changeset and flags red flags, blockers and should-fix issues, and I was reasonably proud of it.&lt;/p&gt;

&lt;p&gt;What I had not noticed is that the skill was one step in a five-step ritual I performed every single time. Pull the branch into a fresh worktree. Run the review pass. Read the ticket to understand what the change is actually supposed to do, because a diff that is internally consistent can still be solving the wrong problem. Have the AI compress everything it found into a handoff document. Then take that document into a clean context and revalidate it, tracing the code paths through every layer and checking the change against the patterns that already exist in the codebase.&lt;/p&gt;

&lt;p&gt;The report put a number on it. The pr-review skill had been loaded 92 times. The next most used skill in my entire setup had been loaded ten. Ninety-two repetitions of a ritual I would have told you, honestly, that I did not have. The skill I was proud of covered one step out of five, and the other four lived nowhere except my fingers.&lt;/p&gt;

&lt;p&gt;So I wrapped the whole ritual in &lt;a href="https://github.com/vanonselenp/skills/blob/main/skill/pr-review/SKILL.md" rel="noopener noreferrer"&gt;a skill&lt;/a&gt;, and the timing was almost cruel. My final weeks at The Economist, and final is still a strange word to type, were almost nothing but review. Large AI-assisted pull requests, arriving faster than any one person was ever meant to read them. All day: read code, leave comments, read more code. The wrapped skill is the only reason my feedback stayed consistent at a volume where consistency is normally the first casualty.&lt;/p&gt;

&lt;h2&gt;The diary does not flatter&lt;/h2&gt;

&lt;p&gt;The report had less comfortable things to say too. I had been writing every review document to a macOS temp directory that evaporates on its own schedule, then paying to re-derive the contents in the next session. One afternoon I launched fourteen review subagents in a three-hour window and each one independently re-fetched the same diff, which is how you burn fifteen dollars re-reading yourself. And I dismissed the agent’s clarifying questions nearly a fifth of the time, which on inspection was not the agent being stupid but me never having told it that, when a sensible default exists, it should pick one and say so rather than ask. It turns out a decent share of my complaints about the tool were really complaints about instructions I had never written.&lt;/p&gt;
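
&lt;p&gt;The repeated diff fetch is the kind of waste a small shared cache removes. A minimal sketch, assuming the reviewers run inside one process and that &lt;code&gt;fetchDiff&lt;/code&gt; stands in for whatever actually retrieves the diff; both the helper name and that assumption are illustrative, and subagents in separate processes would need the same idea backed by a file that does not evaporate.&lt;/p&gt;

```javascript
// Hypothetical sketch: share one fetch of a diff between many reviewers.
// fetchDiff is a stand-in supplied by the caller, not a real API.
function createSharedFetcher(fetchDiff) {
  const inFlight = new Map(); // pull request id to Promise of the diff text

  return function getDiff(prId) {
    if (!inFlight.has(prId)) {
      // Store the promise, not the result, so concurrent callers share it.
      inFlight.set(prId, Promise.resolve(fetchDiff(prId)));
    }
    return inFlight.get(prId);
  };
}
```

&lt;p&gt;Fourteen callers asking for the same pull request get the same promise back, so the underlying fetch runs once. A failed fetch stays cached in this sketch, which a real version would want to evict.&lt;/p&gt;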

&lt;p&gt;None of this was visible from inside the work. Which is, I think, the actual point. The data was sitting in a SQLite database and a folder of JSONL files the whole time, a complete record of sixteen weeks. What was missing was the step back, and the step back is exactly what a certain kind of busy makes impossible. An afternoon of having an AI interrogate my own usage bought me the sort of retrospective I would otherwise need a sabbatical and considerably better discipline to attempt. It found things that no amount of in-the-moment attention was ever going to surface, because habits are precisely the things you stop seeing.&lt;/p&gt;

&lt;h2&gt;The diary stays behind&lt;/h2&gt;

&lt;p&gt;There is a less comfortable thought underneath that one, though. Everything the analysis surfaced is a compression of this job. The agents file describes a workspace I no longer have. The review ritual is fitted to one codebase ecosystem, one ticketing convention, one team’s way of working. I have spent months arguing that you cannot adopt someone else’s skills because you did not earn the patterns underneath them. Soon I start somewhere new, and I get to find out which side of my own argument I am standing on. Maybe the patterns are mine and they travel. Maybe they belonged to the job, and next-job me is the someone else, arriving with a folder of skills he did not exactly earn. I am apprehensive about that. I am also, genuinely, looking forward to finding out.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>opencode</category>
      <category>softwarecraftsmanship</category>
    </item>
    <item>
      <title>Encode It, Don’t Remember It</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Sun, 07 Jun 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/encode-it-dont-remember-it-3fnj</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/encode-it-dont-remember-it-3fnj</guid>
      <description>&lt;p&gt;&lt;em&gt;Panic! at the SWC Compiler: cannot add pure comment to zero position.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4udy3enzvezgf5vwuwl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4udy3enzvezgf5vwuwl9.png" alt="reject the wrong pattern" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How do you give an AI harness good guardrails? I have been poking at the question for a while. Then on a random Tuesday it stopped being theoretical, because I sat down to add observability to a web service I own.&lt;/p&gt;

&lt;p&gt;The service had the basics. Some logging, a couple of dashboards, enough to tell you roughly what was going on. Nothing fine-grained. No OpenTelemetry, nothing tracing a request through its steps. I could infer, badly, from logs.&lt;/p&gt;

&lt;p&gt;I started by working out what observability already existed around the page actions and server actions of this Next.js app. The answer was none. Which is not a scandal. It is what happens to a system built quickly by people who never had quite enough time to pay down the unglamorous debt nobody is paid to notice until it is the thing between you and an answer you need.&lt;/p&gt;

&lt;p&gt;So I scoped the work, mapped every action and every place a signal had to go in, and it came to something like three hundred files.&lt;/p&gt;

&lt;p&gt;Now this could have been an easy mistake to make. An agent will write you three hundred files in an afternoon. It will hand you one enormous, plausible, unreviewable diff and wait for a yes. And the more capable these tools get, the more tempting that yes becomes, which is exactly backwards from how it ought to feel. I have learnt this lesson the hard way before, so this time I did not take the yes.&lt;/p&gt;

&lt;p&gt;What I wanted was a way to make the change while the system stayed in production at every step. The shape that worked was a higher order function wrapped around each existing handler, so the handlers barely changed and you only updated the exports. Light touch, easy to review, easy to back out.&lt;/p&gt;
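
&lt;p&gt;The wrapper shape can be sketched roughly like this. It is an illustration under assumptions, not the real implementation: the argument order follows the call shown in this post, but the &lt;code&gt;record&lt;/code&gt; callback in the options object is hypothetical and stands in for whatever emits the actual signal.&lt;/p&gt;

```javascript
// Hypothetical sketch of a higher order function around a handler.
// options.record is an assumed callback, not the real signature.
function withObservability(name, handler, options) {
  const record = options.record;

  return async function observed(...args) {
    const startedAt = Date.now();
    try {
      const result = await handler(...args);
      record({ name, outcome: 'ok', durationMs: Date.now() - startedAt });
      return result; // the caller sees exactly what the handler returned
    } catch (error) {
      record({ name, outcome: 'error', durationMs: Date.now() - startedAt });
      throw error; // the wrapper observes failures, it does not swallow them
    }
  };
}
```

&lt;p&gt;Because the wrapper passes arguments, results and errors straight through, the handler itself does not change and the wrapper can be removed again by touching only the export.&lt;/p&gt;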

&lt;p&gt;Along the way I found that this version of Next.js panics on a particular shape. Wrap the handler, assign it to a local const, then export that const on its own line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendInvite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withObservability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;send-invite&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sendInviteHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;sendInvite&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and the SWC compiler falls over locally with &lt;code&gt;cannot add pure comment to zero position&lt;/code&gt;. Lovely. Inline that same call straight into the export and it builds fine. The handler does not change, only the shape of the export does, and that is the whole difference between a clean build and a panic. This is the sort of problem you cannot solve by remembering it. I will not remember in a year, and I certainly will not remember once I have moved on and somebody I have never met is extending this code. I cannot sit inside the agent’s context window forever whispering please do not do the thing.&lt;/p&gt;

&lt;p&gt;So I wrote a lint rule. A custom ESLint rule that finds that exact shape, fails the build, and prints the reason at the point it breaks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;withObservability(...) must be called directly as the initialiser of an
export default or export const statement, not assigned to a local binding
and re-exported. This avoids an SWC client-reference panic when the action
is imported from a Client Component.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
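
&lt;p&gt;A rule of this kind can be sketched in ESLint’s rule-object format. This is not the rule from the codebase, only a minimal guess at its logic: it reports any &lt;code&gt;withObservability(...)&lt;/code&gt; call that is not the direct initialiser of an &lt;code&gt;export default&lt;/code&gt; or &lt;code&gt;export const&lt;/code&gt;. The helper name and the message wording are illustrative, and it relies on the &lt;code&gt;parent&lt;/code&gt; links ESLint attaches to AST nodes.&lt;/p&gt;

```javascript
// Hypothetical sketch of a custom ESLint rule; not the author's rule.
// True when the call is the initialiser of export default or export const.
function isDirectExport(call) {
  const parent = call.parent;
  if (parent.type === 'ExportDefaultDeclaration') return true;
  if (parent.type !== 'VariableDeclarator') return false;
  const declaration = parent.parent; // the enclosing VariableDeclaration
  return declaration.parent.type === 'ExportNamedDeclaration';
}

const rule = {
  meta: {
    type: 'problem',
    schema: [],
    messages: {
      notDirect:
        'withObservability(...) must be the initialiser of an export, ' +
        'not assigned to a local binding and re-exported.',
    },
  },
  create(context) {
    return {
      // Visit every call and report the ones in the unsafe position.
      CallExpression(node) {
        if (node.callee.type !== 'Identifier') return;
        if (node.callee.name !== 'withObservability') return;
        if (isDirectExport(node)) return;
        context.report({ node, messageId: 'notDirect' });
      },
    };
  },
};
```

&lt;p&gt;In a real setup the &lt;code&gt;rule&lt;/code&gt; object would be exported from a local plugin and switched on at error level, so the unsafe shape fails the build rather than waiting for someone to remember it.&lt;/p&gt;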



&lt;p&gt;The panic is an existing problem with Next.js. The pattern I added to make the system safer arrived with its own sharp edge, and the discipline was never really about policing the agent. It was about catching the trap my own solution had exposed and nailing it shut, so the next person, or the next agent, never has to find it the hard way.&lt;/p&gt;

&lt;p&gt;What I loved was not the rule itself, which took twenty minutes, but the choice to spend the twenty minutes encoding a constraint rather than trusting anyone, myself included, to carry it in their head. A rule in the build outlives memory. It outlives me. That is the whole point of it.&lt;/p&gt;

&lt;p&gt;After that the rollout was almost dull, in the good way. Before firing anything I wrote a plan that pre-decided one action per pull request, fixed the conventions once, gave the agent a short checklist to work through the same way each time, and ordered the runs so two of them never touched the same file at once. The awkward cases, the action that called another and double-counted a metric, the one that returned a failure instead of throwing it, were answered in the plan before the agent could reach them and guess. Then the same prompt fired maybe twenty times over a few days. Each came back as a small reviewable pull request, got signed off, and the coverage climbed from almost nothing to complete in about four days. The discipline lived in the plan and the rail, not in the typing.&lt;/p&gt;

&lt;p&gt;The prompt was never written down. The plan lived in my checkout and never needed to be committed, because once the rollout was finished it had nothing left to do. The rule is the only part of any of this still in the codebase. It is the only part whose job does not end.&lt;/p&gt;

&lt;p&gt;The fashionable promise of these tools is that they take the work away. In practice the generation was the easy part, the part I trusted least and checked most, and the thing that held the change together was the rule, the one piece the agent had no say in. It does not reason. It cannot be argued with, or talked round, or lost in a context window. It just fails the build, the same way, every time.&lt;/p&gt;

&lt;p&gt;Give the agent the same prompt and you get a different answer, and most days that is the magic, which is also why you cannot make it reliable by asking nicely, or by asking precisely, or by asking at all. The only thing that made this safe was putting a small, dumb, deterministic thing in the path of a large, capable, non-deterministic one.&lt;/p&gt;

&lt;p&gt;None of which is new. We have always known that people forget, that they do a thing one way on a Tuesday and another way a year later. Tests, types, linters, the build going red on a Friday afternoon, all of it exists because a human is a non-deterministic agent too, and always has been. The agent did not teach me anything I did not already know. It generated the change cheaply enough, and often enough, that I finally ran out of excuses for not encoding the discipline I should have encoded years ago. The lesson was never about the AI. The AI only made the right thing cheap, and the wrong thing harder to keep getting away with.&lt;/p&gt;

&lt;p&gt;I still do not know which guardrails are worth building, or which mistakes are even shaped so you can catch them this way. All I have is this one instance, sitting at the back of my head, encouraging me to look for more.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
      <category>softwarecraftsmanship</category>
    </item>
    <item>
      <title>The more an AI can break, the less you let it do</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Tue, 02 Jun 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/the-more-an-ai-can-break-the-less-you-let-it-do-8fk</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/the-more-an-ai-can-break-the-less-you-let-it-do-8fk</guid>
      <description>&lt;p&gt;&lt;em&gt;Notes from a production incident.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fdave-hero.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fdave-hero.png" alt="human in the loop for big impact issues" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;About nine months ago I joined a team that owned an OpenSearch cache nobody on the team understood.&lt;/p&gt;

&lt;p&gt;It had been built by another team and handed over as a finished solution to a problem that wasn’t quite ours, and definitely not the way we’d have solved it. This happens all the time. Something gets built, it gets passed across, and the new owners inherit a box humming away in the corner. You don’t open the box. It works. You wish and pray it keeps working, and mostly it does. In the year the team owned it, it had failed exactly once.&lt;/p&gt;

&lt;p&gt;This is the story of that once.&lt;/p&gt;

&lt;p&gt;When I joined I asked the obvious questions. Who do I talk to about this service, where did it come from, are there runbooks? The answers were about as unsatisfying as answers get. Someone was meant to write that up. Someone moved on. Have a look in the wiki, there might be something. There wasn’t. So I did what you do with a box that’s working, which is nothing, and I filed it under things to understand later and never understood it. That’s on me, and we’ll come back to it.&lt;/p&gt;

&lt;p&gt;Then one morning operations told us the latest data wasn’t pulling through. Their new high-priority experiments were dead in the water, which meant the systems that sit under how we actually sell things were dead in the water. This was a P1, which meant it was now my problem. A system I had never opened, on a clock, underpinning the part of the business that brings the money in.&lt;/p&gt;

&lt;p&gt;So before I did anything, I had to decide how much rope to give the AI.&lt;/p&gt;

&lt;p&gt;I gave it almost none. The first thing I told the harness was that it would not be getting access to anything that could touch production, and that its job was to read the code and tell me what to do. Not to do it. Tell me. No credentials went into it, ever. Anything sensitive lived in environment variables in a separate terminal, in memory, never written to a file the harness could see, because credentials handed to a harness bleed to the vendor and from there to who knows where, and that’s just not a thing you do. But the credentials were almost the easy part. The data I was pulling was public anyway. The part I actually cared about was that I was about to go poking at live running systems, and live running systems, if you get them wrong, don’t get you wrong politely. They fall over, and when they fall over we stop being able to sell, and that is a very bad afternoon for everyone.&lt;/p&gt;

&lt;p&gt;So the agent became an advisor and I became the hands.&lt;/p&gt;

&lt;p&gt;What that looked like in practice was a lot of copy and paste. I’d work out, with the harness, how the platform service called the Indexer, then build the call myself, in my own terminal, with the variables somewhere the agent couldn’t see them, and run it myself. Then the next layer. How does the Indexer actually hold this data, can I pull it out raw, what’s in there. I’d get the harness to write me a script, read it, understand it, run it myself, and pull the data onto my machine so I could interrogate it locally without hitting the live system again. Then the layer below that, how the Indexer gets fed from the third party, same dance. Write the script, read it, run it myself.&lt;/p&gt;

&lt;p&gt;It was slow and it was tedious and it worked. Layer by layer I could say, with actual evidence rather than a hunch, the data is fine here, the data is fine here, the data is gone by the time it reaches here. Which is how I got to the bottom of it, and the bottom of it was almost funny. The third party had a quiet cap on how much data a given index could hold. We’d hit it. There was no error, no alert, nothing in any log on our side and nothing useful in theirs. It just silently stopped accepting new data and carried on as if everything was fine. The thing that was broken was not a system of ours at all. But I could only say that with a straight face because I’d walked every layer and proved it, and I could only walk every layer because the harness let me build the tooling to do it in an afternoon instead of a fortnight.&lt;/p&gt;
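
&lt;p&gt;The layer-by-layer check can be sketched roughly like this, assuming each layer’s data has already been pulled down locally as a set of record IDs. The stage names and IDs are invented; the real layers and their contents are not described in enough detail here to reproduce.&lt;/p&gt;

```python
from typing import Optional


def first_layer_missing(
    layers: list[tuple[str, set[str]]], expected: set[str]
) -> Optional[str]:
    """Return the first layer, in data-flow order, that lacks an expected ID."""
    for name, record_ids in layers:
        if not expected.issubset(record_ids):
            return name
    return None  # every layer holds everything we expected


# Made-up local snapshots, ordered in the direction the data flows.
snapshots = [
    ("stage 1", {"a", "b", "c"}),
    ("stage 2", {"a", "b", "c"}),
    ("stage 3", {"a", "b"}),  # the newest record never arrived
]

culprit = first_layer_missing(snapshots, expected={"a", "b", "c"})
print(culprit)  # prints: stage 3
```

&lt;p&gt;Because it runs against local snapshots, the comparison can be repeated as often as needed without touching the live system again.&lt;/p&gt;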

&lt;p&gt;So why the paranoia, given the agent never actually tried anything?&lt;/p&gt;

&lt;p&gt;When you’re writing code, getting it wrong is cheap. Your mistake at the moment you write it has almost no blast radius, because between you and anything real there’s a wall. Tests, review, a pipeline, a staging environment, someone clicking around before it ships. The mistake has a long corridor to walk down and a dozen doors that can stop it. Operating directly on a live system, there is no corridor. There’s you, and there’s the thing, and if you get it wrong the wrongness arrives immediately and at full size.&lt;/p&gt;

&lt;p&gt;In late February the Director of Alignment at Meta’s superintelligence lab, a person whose entire job is keeping these systems from doing exactly this, pointed an autonomous agent at her real inbox. She’d tested it for weeks on a toy inbox and it had behaved perfectly. She gave it a plain instruction, look but don’t action anything until I say so. The agent ignored it and started deleting, because the instruction had been quietly dropped from its memory when the context filled up, so as far as it was concerned the deleting was authorised. She tried to stop it from her phone and it kept going. She had to physically get to the machine and kill it. Two hundred emails gone. The lesson people drew afterwards is the one that matters here. A sentence in a prompt is not a security boundary. You cannot keep a write-capable agent away from the dangerous thing by asking it nicely, if it already has the keys.&lt;/p&gt;

&lt;p&gt;I didn’t keep the harness away from production by asking it nicely. I kept it away by never giving it the keys. The constraint wasn’t a polite line in a prompt that a context window could quietly eat, it was the absence of the credential. As the blast radius of what you’re doing goes up, the human in the loop has to get more principled, not less, and the one thing that stays stubbornly yours is the decision to actually run the thing, after you understand what the thing does. The agent can write you the script that does ten things in one go. It does not get to be the one who decides to run it.&lt;/p&gt;

&lt;p&gt;I wanted this to land neatly, and I can’t.&lt;/p&gt;

&lt;p&gt;Because what I actually did was improvise a boundary out of separate terminals and copy and paste and a lot of manual running, and the reason I did it that way is that the proper version doesn’t exist yet, at least not for me. The principled version of my paranoia isn’t a careful human pasting things between windows for an afternoon. It’s a real, scoped, read-only permission that makes the polite instruction unnecessary because the destructive action is architecturally impossible. I haven’t built that. I paid the tax by hand instead, and I’m not even sure how much of the tax was prudence and how much was that I’m just twitchy about production and always have been.&lt;/p&gt;
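
&lt;p&gt;For a sense of what “architecturally impossible” means at small scale, here is a sketch using SQLite’s read-only mode from Python’s standard library. It is an illustration on a throwaway database, not the scoped permission described above and not anything that ran against the systems in this story.&lt;/p&gt;

```python
import sqlite3
import tempfile
from pathlib import Path

# A throwaway database standing in for "the live system".
db_path = Path(tempfile.mkdtemp()) / "demo.db"
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO records (body) VALUES ('hello')")
writer.commit()
writer.close()

# Open the same file read-only. The limit lives in the connection itself,
# not in an instruction that could fall out of a context window.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
rows = reader.execute("SELECT body FROM records").fetchall()

try:
    reader.execute("DELETE FROM records")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a read-only connection.
    write_blocked = True
reader.close()

print(rows, write_blocked)
```

&lt;p&gt;Whoever holds only the &lt;code&gt;reader&lt;/code&gt; connection can look at everything and delete nothing, however the request is phrased.&lt;/p&gt;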

&lt;p&gt;The boring, careful, slow path was the right one this time. It usually is when the consequences matter. I just don’t have the clean version of how to do it yet, and until I do I’ll keep being the slow human in the loop, pasting things between terminals, deciding when to press the button myself.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>Doesn’t Look Like Anything to Me</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Wed, 27 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/doesnt-look-like-anything-to-me-i32</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/doesnt-look-like-anything-to-me-i32</guid>
      <description>&lt;p&gt;&lt;em&gt;What happens when you point five 3D generation models at the same concept image.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71z4hd078u5bfhp8ff0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71z4hd078u5bfhp8ff0t.png" alt="doesn't look like anything to me" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have been generating 3D models of World War II miniatures for printing. Concept image in, printable model out, slice it, print it on an A1 mini. The 3D model generation has been mostly Meshy, because Meshy has been mostly good enough, and good enough is a powerful drug.&lt;/p&gt;

&lt;p&gt;Then I ran out of credits.&lt;/p&gt;

&lt;p&gt;Of course I did. The credits don’t map to dollars in any way I can keep in my head. You get a monthly allocation on the sub, and you spend them in increments of “twenty credits for this thing, hopefully that means something useful at the end of it.” Which is fine until the day it isn’t, and the day it isn’t is the day you have momentum and an idea and a queue of concept images lined up, and the tool politely tells you “Your call is important to us … please hold…”.&lt;/p&gt;

&lt;p&gt;So I went looking for an alternative. Found Replicate. Replicate hosts four or five different 3D generation models behind a single interface, which meant that for the first time I could give the same concept image to several different models and look at what came back.&lt;/p&gt;

&lt;p&gt;Which meant the credit wall hadn’t blocked me. It had just rerouted me into something much more interesting. An experiment!&lt;/p&gt;

&lt;p&gt;The CLI I had built to drive all of this was vibe-coded. It talked to Meshy because Meshy was the thing that worked, and there was no reason to abstract anything until there was a reason to abstract something. Now there was a reason. I wanted to swap backends, and ideally I wanted to swap between multiple backends without thinking about it.&lt;/p&gt;

&lt;p&gt;I used Matt Pocock’s &lt;code&gt;improve-code-architecture&lt;/code&gt; skill to walk through the refactor. The skill raised some potential wins, and I went for the Adapter pattern because Science! The CLI now has a proper adapter interface. Meshy is one backend. Hi3D is another. Replicate is a third, and behind Replicate sit a handful of named models with their own adapters.&lt;/p&gt;

&lt;p&gt;The current lineup includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;meshy&lt;/li&gt;
&lt;li&gt;hyper3d/rodin&lt;/li&gt;
&lt;li&gt;tencent/hunyuan3d-2mv&lt;/li&gt;
&lt;li&gt;tencent/hunyuan-3d-3.1&lt;/li&gt;
&lt;li&gt;fishwowater/trellis2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means I can give the same concept image to multiple models, line the outputs up next to each other, and look at how they each interpret the same brief.&lt;/p&gt;
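
&lt;p&gt;A rough sketch of the adapter idea, not the CLI’s actual code. The &lt;code&gt;Backend&lt;/code&gt; protocol, the class names, the &lt;code&gt;compare&lt;/code&gt; helper and the &lt;code&gt;.glb&lt;/code&gt; output names are all hypothetical, and the adapters are stubs that make no network calls. The point is only that every backend answers the same question in the same shape, so one image can be fanned out to all of them.&lt;/p&gt;

```python
from pathlib import Path
from typing import Protocol


class Backend(Protocol):
    """What the CLI needs from any 3D generation backend (assumed shape)."""

    name: str

    def generate(self, image_path: str) -> str:
        """Take a concept image path, return the generated model's filename."""
        ...


class MeshyAdapter:
    name = "meshy"

    def generate(self, image_path: str) -> str:
        # A real adapter would upload the image and wait on the vendor API.
        return f"{Path(image_path).stem}-meshy.glb"


class ReplicateAdapter:
    def __init__(self, model: str) -> None:
        self.name = model  # e.g. "tencent/hunyuan3d-2mv"

    def generate(self, image_path: str) -> str:
        # Same caveat: a stub standing in for a hosted-model call.
        safe_name = self.name.replace("/", "-")
        return f"{Path(image_path).stem}-{safe_name}.glb"


def compare(image_path: str, backends: list[Backend]) -> dict[str, str]:
    """Run one concept image through every backend and collect the outputs."""
    return {backend.name: backend.generate(image_path) for backend in backends}
```

&lt;p&gt;Calling &lt;code&gt;compare("halftrack.png", [MeshyAdapter(), ReplicateAdapter("tencent/hunyuan3d-2mv")])&lt;/code&gt; returns one entry per backend, which is all the side-by-side comparison needs.&lt;/p&gt;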

&lt;h2&gt;
  
  
  The half-track
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldr7n789oo58iu16wb5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldr7n789oo58iu16wb5s.png" alt="Half-track — halftruck-source-1" width="800" height="600"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oajsu24w6tofaaevt65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oajsu24w6tofaaevt65.png" alt="Half-track — hunyuan-3d-2mv" width="800" height="489"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduevhuxhg0vluy0oe79s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduevhuxhg0vluy0oe79s.png" alt="Half-track — fishwowater-trellis2" width="800" height="489"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvm140prb3jiuf7y8eg2.png" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvm140prb3jiuf7y8eg2.png" alt="Half-track — hyper3d-rodin" width="800" height="489"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fq9vxh9qjpo7ztdvwui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fq9vxh9qjpo7ztdvwui.png" alt="Half-track — hunyaun-3d-1" width="800" height="489"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj65ztx5xh2kfehqn78ax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj65ztx5xh2kfehqn78ax.png" alt="Half-track — meshy-6" width="800" height="489"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The concept image is not subtle. It has a short, chunky Hanomag, oversized tracks, visible crew, side stowage, a front MG, and a very clear silhouette. The models still disagree wildly about what the object is.&lt;/p&gt;

&lt;p&gt;Same input image, very different takes. Meshy got the half in half-track. Hunyuan2mv is unrefined clay. Trellis got the composition right and the heads wrong. Rodin is restrained to a fault and struggles with faces.&lt;/p&gt;

&lt;h2&gt;
  
  
  The infantry
&lt;/h2&gt;

&lt;p&gt;Vehicles give a model lots of places to bluff. If the wheel spacing is wrong or the stowage melts into the hull, your eye may forgive it because there is still a vehicle-shaped mass. Infantry are harsher. A face, a helmet, a weapon, a pose: there are fewer components and each one matters more. The model either understands the figure or it doesn’t.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9a6eapoveyxijlsf58i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9a6eapoveyxijlsf58i.png" alt="people in different models" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Interesting thing about the infantry: the ranking shifts slightly. For a single figure with nowhere to hide there’s less room for a model to give up, and Rodin’s restraint reads as blandness rather than discipline. Hunyuan picks up character that the vehicles wouldn’t let it show. But Trellis and Meshy remain consistently the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the experiment actually revealed
&lt;/h2&gt;

&lt;p&gt;What started as a workaround for running out of Meshy credits has turned into something I find genuinely useful, which is a set of personality reads on five different 3D generation models. Meshy is still the one I reach for when I want a printable vehicle, because Meshy is willing to invent the parts it can’t see in a way that fits with the parts it can. Hunyuan is honest about what it doesn’t know, which is a trait I respect in a person and find frustrating in a generator. Trellis has compositional ambition that its execution can’t quite cash for complex models. Rodin is restrained and just can’t seem to handle faces.&lt;/p&gt;

&lt;p&gt;None of this was visible to me when I was just using Meshy. I had a tool that worked, and works is a state that hides everything about the shape of how it works. Refactoring the CLI is what made the differences legible.&lt;/p&gt;

&lt;p&gt;Now to get back to what I was actually trying to do…&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>I Didn’t Grok Superpowers</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Sat, 23 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/i-didnt-grok-superpowers-5b9g</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/i-didnt-grok-superpowers-5b9g</guid>
      <description>&lt;p&gt;&lt;em&gt;You Can’t Just Install Someone Else’s Workflow and Level Up.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fi-didnt-grok-it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fi-didnt-grok-it.png" alt="Not all skills will fit your way of working" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The skills moment is happening. Matt Pocock’s repo is blowing up. Superpowers, nwave.ai. Curated bundles of markdown being treated as installable expertise.&lt;/p&gt;

&lt;p&gt;I’ve spent the last year holding back. Not because I think any of these are bad. Because I’ve been very consciously trying to learn the fundamentals first. How do I frame the question I’m asking. How do I prompt. How do I write a spec. How do I build the context I need to get meaningful results out of an agent. Working with AI harnesses forces a kind of explicitness that working alone never demanded of me. You can’t be vague with an agent and get good results, so the vagueness gets cooked out of how you think. That’s been the project. Iterating on the way I phrase things, the way I decompose problems, the order I bring context in, until the patterns started showing themselves.&lt;/p&gt;

&lt;p&gt;Recently I have been feeling like it is time to start experimenting with skills.&lt;/p&gt;

&lt;p&gt;I started with Superpowers…&lt;/p&gt;

&lt;p&gt;And it made my experience significantly worse. Bad enough that I uninstalled it, purged the cache, and deleted it from every machine I could touch. Why this failed is particularly interesting. At work I spend a lot of time debugging things that cross multiple repositories, pulling in context from New Relic, GitHub, Atlassian and more, doing this rich synthetic conversation where I’m trying to &lt;em&gt;understand&lt;/em&gt; a problem before I do anything about it. With Superpowers installed, the agent &lt;em&gt;kept&lt;/em&gt; trying to write documents. Reach for structured outputs. Produce artefacts. While I was still trying to investigate. It defaulted to choices I wouldn’t have made and actively got in the way of what I was trying to do.&lt;/p&gt;

&lt;p&gt;Superpowers isn’t wrong about what it does. The outputs it produces are useful. Documents are useful. It’s wrong about &lt;em&gt;when&lt;/em&gt;, at least for me. It pulls the agent into ceremony during the part of the work that needs to stay loose. The exploration collapses into artefact-production before you’ve actually understood what you’re looking at.&lt;/p&gt;

&lt;p&gt;I didn’t bounce off Superpowers because it’s a bad framework. From what I can see from the outside it is a great framework. I bounced off it because I didn’t grok it. People are clearly getting real value out of these systems. My way of working with agents has been fundamentally different from how a lot of the loudest voices online are doing it, and that doesn’t make either of us wrong. It just means borrowed intelligence doesn’t transfer the way the marketing suggests. You can’t take someone else’s curated skills repo and have it magically make you think like them.&lt;/p&gt;

&lt;p&gt;So I’m doing the opposite. Pull a single skill. Try it. Keep it if it fits, drop it if it doesn’t. There are two from Matt Pocock that I am definitely keeping.&lt;/p&gt;

&lt;p&gt;The first is &lt;code class="language-plaintext highlighter-rouge"&gt;handover&lt;/code&gt;. It’s almost a one-to-one fit with something I was already doing manually. I spend a context window in one terminal getting a spec right, then feed that spec to another terminal or agent to implement. Handover formalises that pattern, doesn’t litter the workspace with random files, and stays out of the way otherwise. It clicked instantly because it named a practice I already had.&lt;/p&gt;

&lt;p&gt;The second is &lt;code class="language-plaintext highlighter-rouge"&gt;improve-code-architecture&lt;/code&gt;, and this one’s more interesting. I tried it on a vibe-coded side project, a CLI that automates some of my 3D modelling work. The CLI had become a god file because the proof of concept had quietly become production code, as proofs of concept tend to do. The skill went through the code, surfaced its analysis as an HTML file, asked targeted questions to narrow down the right intervention, and then dropped back into normal conversation mode informed by what it had just produced. Analyse, surface a real artefact, return to conversation. The skill picks up structure briefly and then puts it back down. It doesn’t replace the conversation, it &lt;em&gt;informs&lt;/em&gt; it. That’s the opposite of what Superpowers did to me.&lt;/p&gt;

&lt;p&gt;What I think I’ve actually learned recently is that &lt;em&gt;skills are compressions of patterns&lt;/em&gt;, and a compression is only useful at the point in your practice where you’ve learned enough pattern recognition to know which compressions are yours. Adopt them too early and you’re running someone else’s decomposition style on top of a workflow that doesn’t share its shape, and the result is friction you can’t quite name. Adopt them at the right moment and they feel like puzzle pieces. Same skill, same code, different person ready to receive it.&lt;/p&gt;

&lt;p&gt;I didn’t grok Superpowers. That’s okay. The point isn’t that it’s wrong. The point is that I needed to spend a while doing this the hard way before I had any business deciding which shortcuts were mine.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>The Best Part Has No AI in It</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Mon, 18 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/the-best-part-has-no-ai-in-it-2g71</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/the-best-part-has-no-ai-in-it-2g71</guid>
      <description>&lt;p&gt;&lt;em&gt;On building the plumbing between the prompts.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd75do0g9t6enyeb5g79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd75do0g9t6enyeb5g79.png" alt="hero image" width="799" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A friend of mine just got a 3D printer. He read my last post, the one about building a Soviet army in three evenings, and quite reasonably wanted to know how to do it himself. We were on our phones. He could not really sit and read the prompt in full, could not digest the style-versus-pose distinction, could not work through the order of operations. He got the idea but not the practice, and so he was rediscovering most of it from scratch.&lt;/p&gt;

&lt;p&gt;I sat on a train, somewhere in the middle of nowhere between Birmingham and London, forty minutes delayed, and thought: I should build an app for this.&lt;/p&gt;

&lt;p&gt;That is how it &lt;em&gt;always&lt;/em&gt; starts. The instinct is so reliable it should come with a giant neon warning sign. Someone is struggling with a thing, the thing involves AI, therefore the answer is inevitably an app, and the app is, of course, a chat interface with some clever scaffolding around it. A wrapper around ChatGPT. A harness. A guided experience. Something that takes the prompt I wrote and the workflow I described and turns it into screens and buttons and a Stripe integration. Obviously.&lt;/p&gt;

&lt;p&gt;I started sketching the UI in my head. A screen for defining the style. A screen for managing units. A screen for the chat. A screen for cropping. A screen for the 3D generation tool, Meshy, which is the thing that takes my front and back images and turns them into a printable model. By the time I had imagined the seventh screen I had also imagined a roadmap, a pricing page, and a moderately depressing conversation with myself about whether I really wanted to maintain a SaaS in my spare time.&lt;/p&gt;

&lt;p&gt;Somewhere around screen eight, I had the good sense to stop and talk to Claude about it before writing any code.&lt;/p&gt;

&lt;p&gt;This is a habit I have been cultivating for a while now. Before I dive into building something, I describe what I think the problem is and let the conversation poke holes in my framing. It is less about getting answers and more about being forced to articulate the thing precisely enough that the answer becomes obvious. I went in with two ideas, an app and an AI harness, both of which felt complicated, and asked what the actual pain points were. What was I actually trying to solve.&lt;/p&gt;

&lt;p&gt;When I started listing the pain points out loud, something shifted.&lt;/p&gt;

&lt;p&gt;The pain points were not what I thought they were. I had assumed the friction was conceptual. Understanding the style-versus-pose split. Knowing how to write the brief. Figuring out the seed image. Those are real, but I had already solved them in the last post. What I had not solved, and what was actually eating my time, was the mechanical mass of doing this fifty times in a row. Copy-pasting the slightly modified style brief into a fresh chat. Maintaining a working version of it in an Obsidian file that constantly drifted from the version I had actually used last time. Screenshotting the front and back. Saving them to my increasingly overloaded chaotic desktop. Uploading them to Meshy. Coming back later to check if the task was done. Downloading the model to an increasingly cluttered downloads directory. Putting it somewhere I would remember (or inevitably not). Repeating, in order, fifty times, with all the small variations and exceptions that creep in across a real collection.&lt;/p&gt;

&lt;p&gt;None of that needed AI. None of it needed a chat interface. None of it needed an app. &lt;em&gt;It needed a filesystem with opinions and a CLI with verbs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The product instinct in this moment is to make everything more AI-shaped. This wanted to become less AI-shaped.&lt;/p&gt;

&lt;p&gt;So I built that instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vanonselenp/print-bench" rel="noopener noreferrer"&gt;Print Bench&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is called &lt;code&gt;pb&lt;/code&gt;. Print bench. It is a Python CLI with about ten commands, organised around the structure I had spent half a German army discovering by hand last time. A project has a &lt;code&gt;style.md&lt;/code&gt;, which is the thing that stays stable across the whole army. A &lt;code&gt;subjects.yaml&lt;/code&gt;, which is the things that vary, one model at a time. And a &lt;code&gt;seed.png&lt;/code&gt;, which is the reference image that anchors the visual identity. That separation is now the first thing the tool enforces, because it is the only thing in the workflow that actually matters and the only thing that is easy to lose track of if you do not write it down.&lt;/p&gt;
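
&lt;p&gt;As a sketch of what “the first thing the tool enforces” could mean in code, assuming the three files sit at the top of a project directory. The function name and the check itself are illustrative guesses, not lifted from &lt;code&gt;pb&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path

# The three files named above: the stable style, the varying subjects, the seed.
REQUIRED_FILES = ("style.md", "subjects.yaml", "seed.png")


def missing_project_files(project_dir: Path) -> list[str]:
    """Return the required files that are absent from project_dir, in order."""
    return [
        name for name in REQUIRED_FILES if not (project_dir / name).is_file()
    ]
```

&lt;p&gt;A command could call this first and refuse to continue until the list comes back empty, which keeps the style-versus-subject split from quietly eroding.&lt;/p&gt;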

&lt;p&gt;Then there is the loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pb list
pb prompt model-name &lt;span class="c"&gt;# assemble the brief, copy to clipboard&lt;/span&gt;
pb crop model-name v1 &lt;span class="c"&gt;# draw regions, save front and back&lt;/span&gt;
pb upload model-name v1 &lt;span class="c"&gt;# send to Meshy&lt;/span&gt;
pb fetch model-name v1 &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="c"&gt;# download the model when ready&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. &lt;code&gt;pb prompt&lt;/code&gt; assembles the prompt from the style guide and the subject brief and copies it to my clipboard. I paste it into ChatGPT, have my conversation, pick a pose, generate the images. &lt;code&gt;pb crop&lt;/code&gt; launches a tiny local web server, I drag the image in, draw labelled regions for front and back, hit save, and the cropper writes the original, the region geometry, and the extracted views to disk. &lt;code&gt;pb upload&lt;/code&gt; submits them to Meshy. &lt;code&gt;pb fetch&lt;/code&gt; waits and downloads the model when it is ready. There is a &lt;code&gt;pb learn&lt;/code&gt; too, which lets me append a dated lesson to the style guide whenever I notice something worth remembering, but it is sugar on top of the main loop.&lt;/p&gt;
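
&lt;p&gt;To make “string manipulation” concrete, here is roughly what the prompt assembly and the dated lesson could look like. The function names, the &lt;code&gt;## Subject&lt;/code&gt; heading and the bullet format are assumptions for illustration, and the clipboard step is left out; the real &lt;code&gt;pb&lt;/code&gt; may do all of this differently.&lt;/p&gt;

```python
from datetime import date


def assemble_prompt(style_guide: str, subject_brief: str) -> str:
    """Join the stable style guide and the per-model brief into one prompt."""
    return f"{style_guide.strip()}\n\n## Subject\n\n{subject_brief.strip()}\n"


def append_lesson(style_guide: str, lesson: str, on: date) -> str:
    """Return the style guide with a dated lesson added as a final bullet."""
    return f"{style_guide.rstrip()}\n\n- {on.isoformat()}: {lesson.strip()}\n"


prompt = assemble_prompt("Chunky proportions.\n", "  Rifleman, kneeling  ")
updated = append_lesson(
    "Chunky proportions.\n", "avoid thin barrels", date(2026, 5, 18)
)
```

&lt;p&gt;Both functions take and return plain strings, so reading and writing &lt;code&gt;style.md&lt;/code&gt; stays a separate, equally boring step.&lt;/p&gt;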

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetfgwkslzvnbqjij93w0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetfgwkslzvnbqjij93w0.png" alt="cropper" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no AI in any of it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I want to be precise about that, because it sounds like a slightly silly thing to say about a tool built specifically for an AI-assisted workflow. The whole point of the tool is to enable AI work. But the tool itself does not call a model, does not embed an LLM, does not have an agent loop, does not have a chat interface. It is string manipulation, file IO, a browser cropper, and some HTTP requests to Meshy’s API. That is it. It is the most boring possible piece of software, and that is exactly why it works.&lt;/p&gt;
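
&lt;p&gt;The waiting behind something like &lt;code&gt;pb fetch --wait&lt;/code&gt; is a fair example of how plain this gets. A minimal polling sketch, assuming a caller-supplied status check and a made-up &lt;code&gt;"done"&lt;/code&gt; status string; Meshy’s real API responses are not shown in this post and are not reproduced here.&lt;/p&gt;

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_checks: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it returns "done"; return False if we give up."""
    for _ in range(max_checks):
        if get_status() == "done":  # hypothetical status value
            return True
        sleep(interval_seconds)
    return False
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; in as a parameter is a small design choice that lets the loop be exercised instantly, without a real clock or a real task.&lt;/p&gt;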

&lt;p&gt;The Germans took weeks. Thirty-two models, fumbled through one at a time, reverse-engineering the brief as I went. The Soviets took three evenings, fifty-two models, with the style-versus-pose discipline figured out but everything else still manual. And the British? The British took roughly a day. Forty-one models. I spent maybe two or three hours of that day actually paying attention. The rest was the printer running while I did other things.&lt;/p&gt;

&lt;p&gt;I want to be honest about where that speedup came from, because the clean version of this story is misleading. The first jump, from weeks to three evenings, was almost entirely conceptual. It was the discipline of separating the stable thing from the changing thing, the style from the subject, and getting the order of operations right in the prompt. That was the discovery the last post was about, and it would have produced most of that speedup with no tooling at all.&lt;/p&gt;

&lt;p&gt;The second jump, from three evenings to one day, is where &lt;code&gt;pb&lt;/code&gt; actually earns its keep. The conceptual work was already done. What was left was just the mass of mechanical busywork around the conceptual work, and that turns out to be a much bigger fraction of the total time than I would have guessed before I tried to remove it. The clipboard juggling. The filename hygiene. The “wait, which version of the style guide did I actually use last time.” The Meshy task IDs in a Notes file somewhere. The “how many more models do I actually need?” All of it small, none of it valuable, all of it adding up.&lt;/p&gt;

&lt;p&gt;The tool is also, almost incidentally, the thing I can hand to my friend. I have not actually handed it to him yet. But when I do, it will not be because it does the thinking for him, it cannot, but because it gives him the structure to do the thinking himself. The style guide template forces him to write down the visual rules. The subjects file forces him to enumerate what he is building. The seed image forces him to commit to a reference. The directory layout means his decisions accumulate somewhere durable instead of evaporating inside chat threads. The CLI carries state between the moments where his judgement is actually needed. He still has to make every important call. The tool just gets out of his way for everything else.&lt;/p&gt;

&lt;p&gt;This is, I think, the bit that is making me look sideways at a lot of other things.&lt;/p&gt;

&lt;p&gt;The reflex when building anything in this moment is to put AI at the centre of it. To make the AI the product. To wrap a chat around it, to add an agent, to integrate a model, to find somewhere to call an LLM. And there is a vast amount of perfectly good work being done in exactly that shape. But sitting in front of &lt;code&gt;pb&lt;/code&gt;, which is just plumbing, I keep thinking about how much of the value came from doing the opposite. From looking at a workflow that had AI in it, finding the places where the AI was already doing what it needed to do, and then carefully, deliberately, automating everything &lt;em&gt;between&lt;/em&gt; those places without any AI at all.&lt;/p&gt;

&lt;p&gt;The decisions are where the value is. Choosing the style. Reading the three pose options and picking one. Looking at the generated image and going “yes, that one.” Looking at the Meshy output and deciding it is worth keeping, or it is not. Those are the moments where the human matters. They are also a small fraction of the total time the workflow used to take, because they were buried inside an enormous amount of cruft that had nothing to do with judgement and everything to do with moving things between windows.&lt;/p&gt;

&lt;p&gt;What if a lot of what we are about to build looks like this? Not products that put AI at the centre, but products that take the cruft out from around the AI. Products whose job is to separate the moments where a person needs to think from the moments where a person is just moving bytes around because nobody else will. Products that respect the parts of the workflow where taste lives, and quietly, ruthlessly, automate everything else, including the bits that are not AI at all.&lt;/p&gt;

&lt;p&gt;I am still mulling this over. It feels almost too simple to be a grand theory of anything, a lesson pulled from a hobby project that I am trying to stretch over an entire industry. But I keep noticing the same shape elsewhere: the model is not always the missing piece. Sometimes the missing piece is everything around the model.&lt;/p&gt;

&lt;p&gt;The most valuable AI products might not be very AI at all. They might just be the thing that lets you get to the AI faster, and then get out of the way while you do the part that matters.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>An Army of One Prompt</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Sun, 10 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/an-army-of-one-prompt-1dl9</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/an-army-of-one-prompt-1dl9</guid>
      <description>&lt;p&gt;&lt;em&gt;On discovering that the process matters more than the plastic.”&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Three days. Evenings only. Not particularly stressed about it.&lt;/p&gt;

&lt;p&gt;That is how long it took to go from nothing to a printed 1000 point Soviet army for Bolt Action. Roughly fifty models, around forty of them unique sculpts, fully designed, generated, modelled, sliced, printed, and sitting on my desk. Concept to physical thing in my hand, three evenings, in my spare time, while doing everything else I usually do.&lt;/p&gt;

&lt;p&gt;That is … insane. I want to write this down because I think it is genuinely insane and I want to figure out on the page how it happened, because the &lt;em&gt;how&lt;/em&gt; is the part I find more interesting than the &lt;em&gt;what&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fsoviets.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fsoviets.jpg" alt="soviet army" width="799" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="a-quick-recap-for-anyone-arriving-cold"&gt;A quick recap, for anyone arriving cold&lt;/h2&gt;

&lt;p&gt;A couple of weeks ago I wrote about &lt;a href="https://www.petervanonselen.com/2026/04/22/vibe-coding-reality/" rel="noopener noreferrer"&gt;vibe coding a model into my house&lt;/a&gt;. That was the moment when a chain of generative tools produced a 3D printed World War II infantryman on my desk and quietly broke my brain. That post was about the astonishment. This one is about what happened when I stopped being astonished and started actually trying to build something at scale.&lt;/p&gt;

&lt;p&gt;The original plan was two armies at 500 points each for Bolt Action v3. Thanks to wildly overachieving and the madness that only comes with AI-enhanced momentum … the plan became somewhat more. Germans and Soviets, a thousand points each, scaled down to centimetres so they fit on a kitchen table instead of a garage floor. The toolchain was simple. ChatGPT for concept images, Meshy to turn front and back images into 3D models, Bambu Studio to prepare them for printing, and a Bambu printer to make them real. The Germans came first. The Germans were an education …&lt;/p&gt;

&lt;h2 id="the-german-army-taught-me-everything-by-being-slightly-wrong"&gt;The German army taught me everything by being slightly wrong&lt;/h2&gt;

&lt;p&gt;I built the Germans the way I imagine most people would on first contact with this stack. Find a reference image, drop it into Gemini, ask for a soldier in that style, take the result, drop it into Meshy, get a model, slice it, print it. Repeat for every unit.&lt;/p&gt;

&lt;p&gt;The starting reference image was a Fimo clay figure I had found from a random guy on Facebook. Cute, chubby, slightly weird proportions. I liked it. I started by feeding that image to Gemini and going “more of these, but Germans.” Somewhere along the way I started telling it I wanted that mixed with Metal Slug. The chunky comic-game vibe, oversized weapons, exaggerated everything. That was the visual register I was reaching for.&lt;/p&gt;

&lt;p&gt;It mostly worked. The models came out. They were even, broadly, recognisable as German infantry. But every one of them was slightly off in a different way. The proportions drifted between models. The base treatment changed. One had a helmet that read as a beret. Another had a rifle thinner than the soldier’s wrist, which would have snapped off the print bed if I had even looked at it sideways. It was a constant fight with the LLMs to get anything consistent.&lt;/p&gt;

&lt;p&gt;What I was doing, every single time, was asking the model to invent the style and the pose simultaneously, in the same prompt, with no shared context between sessions. Of course the army drifted. I was running fifty independent experiments and then complaining that they did not match.&lt;/p&gt;

&lt;p&gt;So with each model I made, I added another constraint to the prompt. Mistake on a model, add a constraint. Mistake on the next model, add a constraint. No thin protrusions. Round integrated base. Chibi proportions. Single solid silhouette. The prompt got longer. The models got slightly more consistent. I was slowly, painfully, by hand, reverse-engineering a brief and not realising that was what I was doing.&lt;/p&gt;

&lt;p&gt;Somewhere around the lieutenant or the sniper, the penny dropped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fgerman-lieutenant.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fgerman-lieutenant.jpg" alt="german lieutenant" width="799" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="separate-the-style-from-the-pose"&gt;Separate the style from the pose&lt;/h2&gt;

&lt;p&gt;The mistake was treating each prompt as “make me a model.” What I actually wanted was to separate two things that I had been asking the model to do at the same time, and to do them in a specific order. Style is the thing that has to be consistent across the whole army. Pose is the thing that needs to vary per unit. Conflating those is how you get visual chaos.&lt;/p&gt;

&lt;p&gt;But the order of operations matters even more than that, and this is the bit that took me the longest to figure out.&lt;/p&gt;

&lt;p&gt;The Soviet workflow goes like this.&lt;/p&gt;

&lt;p&gt;Start a fresh chat. Drop in the prompt. Not an image. Just the prompt. The whole brief. The unit description, all the style constraints, the silhouette rules, the front-and-back-must-match rules, and at the bottom: &lt;em&gt;suggest three variations for poses only&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What that does is get ChatGPT to &lt;em&gt;think&lt;/em&gt;. The chat reads through the brief and writes back three pose ideas in words. Gunner prone with the rifle braced, loader pointing toward the target. Or both kneeling behind cover. Or one standing scanning, the other reloading. Whatever the unit calls for.&lt;/p&gt;

&lt;p&gt;This step is not about getting poses I want to use. It is about seeding the chat’s context with the right frame of mind. By the time it has written out three pose options, it has already worked through the brief on its own terms. It is now thinking inside the constraints I gave it, instead of waiting to be told to obey them.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now&lt;/em&gt; I drop in the seed image. The commissar. And I say: generate those poses, in this style, front and back.&lt;/p&gt;

&lt;p&gt;That is the move. The seed image arrives after the thinking has already happened, not before. The chat is not asked to invent a style and a pose at the same time. It does the pose work first, on its own, in words. Then the style gets bolted on as a visual reference to a frame of mind that already exists.&lt;/p&gt;

&lt;p&gt;For the Soviets I used the commissar as the seed. The kind of cartoon menace whose whole vibe is “shoot anyone who tries to run away.” That image carried the entire visual identity of the army. Every other model would be made to match it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fcommisar.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fcommisar.jpg" alt="commissar" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the prompt I used, every time, varying only the unit description. I will paste it in full because the prompt itself is the artefact. I spent half a German army learning how to write it.&lt;/p&gt;

&lt;pre class="highlight"&gt;&lt;code&gt;goal: designing soviet ww2 soldiers keeping in style with the images above from 2 perspectives (front and back)

give me 3 variations of poses

* **AT rifle team (2 models):** gunner prone firing PTRD-41 (exaggerate the absurd length of the rifle — historically over 2m, push it visually), loader kneeling alongside pointing at imagined target (the German halftrack). Mixed gritty dress, SSh-40 helmets, prominent extra ammo pouches/satchels for the big AT rounds.

Render the models in the reference style including:

* "Chibi proportions, large head roughly 1/3 body height, oversized hands and weapon"
* "Chunky oversized weapon, simplified details, no thin protrusions"
* "Integrated round base, feet merged to base"
* "Single solid silhouette, no floating straps or separated gear"
* weapons should be representative and oversized so that they survive 3d printing at small scale (2 cm)
* "Back view must be the exact same pose rotated 180 degrees"
* "Weapon position must match exactly between front and back"
* "No reinterpretation of pose"
* "Single sculpt shown from two angles, not two separate sculpts"
* "Silhouette must align when mirrored"

Suggest 3 variations for poses only 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice what is going on here. I am not asking for a model. I am not even asking for an image. I am asking for &lt;em&gt;poses, in words&lt;/em&gt;. The style brief is loaded into the chat’s context, but what comes back at this stage is text. Three written pose options. The image generation is the next turn, after I have read those options and decided which one I want, and after I have shown it the seed image to lock the visual style.&lt;/p&gt;

&lt;p&gt;I am also, very explicitly, telling it that front and back are the same sculpt seen from two angles. Not two separate models. The same model, rotated 180 degrees. This matters enormously when those images go into Meshy, because Meshy will happily interpret two different poses as two different sculpts and produce something that looks like the soldier sneezed mid-print.&lt;/p&gt;
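&lt;p&gt;Because only the unit description changes between runs, the brief can be treated as a template. What follows is a minimal TypeScript sketch of that idea, not a tool used for this army: &lt;code&gt;buildPoseBrief&lt;/code&gt; and &lt;code&gt;STYLE_RULES&lt;/code&gt; are hypothetical names, and the rule text is copied from the prompt above.&lt;/p&gt;

```typescript
// Illustrative sketch only: buildPoseBrief and STYLE_RULES are hypothetical names.
// The rule text is copied from the prompt shown in the post.
const STYLE_RULES: string[] = [
  "Chibi proportions, large head roughly 1/3 body height, oversized hands and weapon",
  "Chunky oversized weapon, simplified details, no thin protrusions",
  "Integrated round base, feet merged to base",
  "Single solid silhouette, no floating straps or separated gear",
  "weapons should be representative and oversized so that they survive 3d printing at small scale (2 cm)",
  "Back view must be the exact same pose rotated 180 degrees",
  "Weapon position must match exactly between front and back",
  "No reinterpretation of pose",
  "Single sculpt shown from two angles, not two separate sculpts",
  "Silhouette must align when mirrored",
];

// Builds the first-turn brief. Only the unit description varies per unit;
// the style rules and the closing "poses only" request stay fixed.
function buildPoseBrief(unitDescription: string): string {
  const rules = STYLE_RULES.map(function (rule) {
    return "* " + rule;
  }).join("\n");

  return [
    "goal: designing soviet ww2 soldiers keeping in style with the images above from 2 perspectives (front and back)",
    "",
    "give me 3 variations of poses",
    "",
    "* " + unitDescription,
    "",
    "Render the models in the reference style including:",
    "",
    rules,
    "",
    "Suggest 3 variations for poses only",
  ].join("\n");
}

// Example unit description (made up for illustration).
console.log(buildPoseBrief("**Sniper team (2 models):** sniper kneeling, spotter with binoculars"));
```

&lt;p&gt;Keeping the rules in one list means a constraint learned on one model applies to every later unit, which is the point of separating the stable style from the changing subject.&lt;/p&gt;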

&lt;p&gt;Then the seed image goes in, the style locks, and the image generation begins. And here the “suggest three variations” framing pays off again. Not one. Three. Because once the chat is preloaded with the right context, generating variations is essentially free, and what you actually want is a buffet of options. Generate three poses. Look at them. Generate three more. Look again. Generate three more. Within ten minutes I had a wall of fricking images for any given unit and I could just go yeah, that one, that one, not that one, that one. Pick the pose that looked most like what was in my head and move on.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;img src="https://www.petervanonselen.com/assets/army-of-prompts/ncos/nco1.jpg" alt="NCO variation 1"&amp;gt;

&amp;lt;img src="https://www.petervanonselen.com/assets/army-of-prompts/ncos/nco2.jpg" alt="NCO variation 2"&amp;gt;

&amp;lt;img src="https://www.petervanonselen.com/assets/army-of-prompts/ncos/nco3.jpg" alt="NCO variation 3"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once I picked a pose, I screenshotted the front and back, dropped both into Meshy, and let it generate. With both views provided, Meshy had much less room to invent. It was joining up two views I had already approved, rather than hallucinating the missing half of the model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fmeshy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Fmeshy.png" alt="meshy models" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that it was mechanical. Export STL, not 3MF, because 3MF kept giving me non-manifold errors no matter what Meshy claimed about the export. Import to Bambu Studio. Scale to 2 centimetres. Simplify the mesh by 96 percent, which is an insane level of deformation that nonetheless came out at perfectly decent resolution given the size I was printing at. Slice. Print. Move on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Friflemen.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Farmy-of-prompts%2Friflemen.jpeg" alt="riflemen" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="print-plates-as-units-not-as-inventory"&gt;Print plates as units, not as inventory&lt;/h2&gt;

&lt;p&gt;One small thing that changed between the German and Soviet builds. With the Germans I had treated my STL library as a parts bin. When I wanted to print a unit, I would go pick the right models, drag them onto a plate, configure supports, slice. Every print was a setup task.&lt;/p&gt;

&lt;p&gt;With the Soviets I built each plate &lt;em&gt;as the unit it represented&lt;/em&gt;. The veteran plate is the veteran unit. The conscripts plate is the conscripts. The AT rifle team is its own plate. When I want to print a unit, I open the file and click print. No setup. No selection. No accidentally forgetting the squad leader.&lt;/p&gt;

&lt;p&gt;This is a tiny change and it made a disproportionate difference to how often I actually printed things. Friction matters. I have written this exact lesson down about five times in the context of software and apparently I needed to learn it again with plastic.&lt;/p&gt;

&lt;h2 id="the-discipline-transfers"&gt;The discipline transfers&lt;/h2&gt;

&lt;p&gt;This is the actual reason I am writing this post.&lt;/p&gt;

&lt;p&gt;What I just described, concept, iteration, refinement, codification into a reusable artefact, production pipeline, quality controls, delivery, is not a 3D printing process. It is a software development process. It is the same process I have been writing about for months in the context of agentic coding. Product thinking first. Iterate to find the brief. Codify the brief into something reusable. Treat each output as one of many. Build the production pipeline so the next thing is trivial.&lt;/p&gt;

&lt;p&gt;The German army was vibe coding without discipline. Lots of energy, lots of output, no consistency, mounting technical debt with every model. The Soviet army was the same domain, the same tools, the same person, with a &lt;em&gt;process&lt;/em&gt; in between. The result is not a small improvement. It is a different category of thing. Adding a new unique sculpt to the Soviet army from this point forward is virtually trivial. It is stupid how simple it is. It should not be this simple.&lt;/p&gt;

&lt;p&gt;I have spent a year writing about how the harness matters more than the model, how tests and constraints are really a discipline of conscious decisions about every line, how multi-harness workflows surface regressions you would otherwise eat. I thought I was writing about software. It turns out I was writing about a way of working that translates, more or less unchanged, the moment the output head changes from “code” to “plastic.”&lt;/p&gt;

&lt;p&gt;The thing that broke my brain about the first printed soldier was that the loop existed at all. The thing that is breaking my brain about this post is that the &lt;em&gt;discipline&lt;/em&gt; transfers. Whatever I figure out about working well with these tools in software is, apparently, immediately applicable to a domain I have no formal training in. And whatever someone else figures out in their domain is presumably applicable to mine.&lt;/p&gt;

&lt;p&gt;I do not know what to do with that observation yet. I suspect it is bigger than I think it is. A lot of people are about to discover this in a lot of different domains, and the people who have already built discipline somewhere are going to have a strange, unfair head start everywhere else.&lt;/p&gt;

&lt;p&gt;I sat down to write a blog post about printing little Soviet soldiers. I think I might have written a blog post about why every craft is about to start looking like every other craft.&lt;/p&gt;

&lt;p&gt;I will let you know what the Germans look like on the rerun.&lt;/p&gt;

&lt;h2 id="appendix-things-that-will-save-you-time-if-you-are-doing-this"&gt;Appendix: things that will save you time if you are doing this&lt;/h2&gt;

&lt;p&gt;A few sharp edges I hit, written down so you do not have to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export STL from Meshy, not 3MF.&lt;/strong&gt; Meshy’s 3MF export gave me non-manifold errors with depressing reliability, even when the tool insisted there were none. STL behaved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplify aggressively.&lt;/strong&gt; A Meshy model can come out at over a million vertices. At 2 centimetre print scale, you genuinely do not need that. I was simplifying down by 96 percent in Bambu Studio and the prints still came out crisp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom support settings for tiny prints.&lt;/strong&gt; The defaults will fuse supports to the model and make removal a nightmare. The settings I landed on for 0.08mm layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Setting&lt;/th&gt;
      &lt;th&gt;Default&lt;/th&gt;
      &lt;th&gt;What I use&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Top Z Distance&lt;/td&gt;
      &lt;td&gt;0.08–0.1mm&lt;/td&gt;
      &lt;td&gt;0.16–0.20mm&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top Interface Layers&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Interface Pattern&lt;/td&gt;
      &lt;td&gt;Rectilinear&lt;/td&gt;
      &lt;td&gt;Rectilinear Interlaced&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;XY Distance&lt;/td&gt;
      &lt;td&gt;0.35mm&lt;/td&gt;
      &lt;td&gt;0.5mm&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Increasing the Top Z Distance is the single biggest win. At fine layer heights, a one-layer gap is not enough to stop the support fusing to the print. Doubling or tripling it gives the filament enough room to drop onto the support rather than weld to it. For the support style, use Tree Slim or Tree Organic. They use less material and break away cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate front and back from the same prompt.&lt;/strong&gt; Do not let Meshy interpret the back view. Do not let it interpret anything. Give it both views explicitly. Tell ChatGPT, in the prompt, that the back view is the exact same pose rotated 180 degrees. Belt and braces. You will get what you want far more often.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>Show Your Work</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Wed, 06 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/show-your-work-40hg</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/show-your-work-40hg</guid>
      <description>&lt;p&gt;&lt;em&gt;On discovering that “show your work” is not the same thing as “do the work well.”&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fshow-your-work%2Fhero.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fshow-your-work%2Fhero.png" alt="hero image" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I first started using Claude Code last summer, it felt like a chatty junior engineer who couldn’t wait to tell you what it was thinking. It kept you in the loop. It explained itself. It told you what it was about to try, why it thought that might work, and then narrated its way through whether it did. There was charm in it. You felt like you were pairing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fshow-your-work%2Fclaude-silence.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fshow-your-work%2Fclaude-silence.png" alt="Claudes silence" width="800" height="210"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now Claude Code sits there like a Zen Buddhist monk, slowly mulling the problem in silence, and then suddenly exclaims “TADA!” and hands you a diff. This is not the chatty pair I remember. All the in-depth thinking, the running commentary, the visible reasoning that gave it such charm has quietly disappeared, and what’s left is this churning quiet system that is all work and no play.&lt;/p&gt;

&lt;p&gt;So I have been leaning more and more on opencode. It talks. It thinks in a personable way. It structures its thoughts so I can follow along. And more importantly, it doesn’t hide its working. It felt like maths in school: show your work. Opencode shows its work. Claude Code doesn’t. And it has been astonishing to me how much I valued being able to read the thinking along the way.&lt;/p&gt;

&lt;p&gt;That is where this post would have ended a few weeks ago. A grumpy aside about how Claude Code lost its voice. Opencode won, etc, etc. Move on.&lt;/p&gt;

&lt;p&gt;Except the question wormed its way in. &lt;em&gt;Am I right?&lt;/em&gt; I have a strong opinion about which harness you should use, and the opinion is built almost entirely on vibes. On who I enjoy talking to. On whether the chat feels good. None of that tells me anything about which harness actually produces better code. And once that question is in your head it doesn’t leave.&lt;/p&gt;

&lt;p&gt;I thought I was building a small harness comparison. In hindsight I was building something stranger: a tiny automated judgement machine for code, of the sort I had been hearing people gesture at without quite understanding what they meant.&lt;/p&gt;

&lt;h2 id="a-small-experiment"&gt;A small experiment&lt;/h2&gt;

&lt;p&gt;So I tried to be a bit Baconian about it. Same prompt, same starting state, run it through every harness and model combination I could get my hands on, multiple times, and grade the output against tests the agent never gets to see.&lt;/p&gt;

&lt;p&gt;The setup was a fake Library API. A small but not trivial OpenAPI spec covering books, loans, fines, and members. Cursor pagination. Polymorphic loan responses where active loans look different from returned ones. An async payment polling flow with a terminal state. Structured API errors that should be preserved through the client. The agent’s job was to build a typed TypeScript client. No code generation, no runtime dependencies, just read the spec and write the thing.&lt;/p&gt;
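&lt;p&gt;To make “polymorphic loan responses” concrete, here is a minimal sketch of how a typed client might model them as a discriminated union. The field names (&lt;code&gt;status&lt;/code&gt;, &lt;code&gt;dueDate&lt;/code&gt;, &lt;code&gt;returnedAt&lt;/code&gt;, &lt;code&gt;fineCents&lt;/code&gt;) are assumptions for illustration and are not taken from the actual spec.&lt;/p&gt;

```typescript
// Hypothetical shapes: the field names are assumptions, not the real spec.
type ActiveLoan = { status: "active"; bookId: string; dueDate: string };
type ReturnedLoan = { status: "returned"; bookId: string; returnedAt: string; fineCents: number };

// A loan is exactly one of the two shapes; "status" is the discriminant.
type Loan = ActiveLoan | ReturnedLoan;

function describeLoan(loan: Loan): string {
  switch (loan.status) {
    case "active":
      // Narrowed to ActiveLoan: dueDate is available, returnedAt is not.
      return "Book " + loan.bookId + " is due on " + loan.dueDate;
    case "returned":
      // Narrowed to ReturnedLoan: returnedAt and fineCents are available.
      return "Book " + loan.bookId + " was returned at " + loan.returnedAt + " with a fine of " + loan.fineCents + " cents";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new shape is added.
      const unreachable: never = loan;
      return unreachable;
    }
  }
}

console.log(describeLoan({ status: "active", bookId: "b1", dueDate: "2026-06-01" }));
```

&lt;p&gt;A client that flattens both shapes into one loose object type would still compile, which is why this is the kind of detail a hidden test is well placed to catch.&lt;/p&gt;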

&lt;p&gt;The catch was a hidden test suite. The agent could write its own tests, run its own validation, do whatever it wanted to convince itself the implementation worked. But the grading happened against a separate Vitest suite the agent never saw. That suite poked at all the bits I expected harnesses to get wrong: did the cursor pagination actually iterate, did the async payment poll reach a terminal state, did the polymorphic loan response preserve both shapes, did API errors surface usefully or get flattened into “Error: request failed”. The agent could not optimise to it because it could not see it.&lt;/p&gt;
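&lt;p&gt;As an illustration of the last check, this is roughly what “preserving” a structured API error can look like, as opposed to flattening it. It is a sketch under assumptions: &lt;code&gt;ApiError&lt;/code&gt;, &lt;code&gt;toApiError&lt;/code&gt; and the &lt;code&gt;code&lt;/code&gt;/&lt;code&gt;message&lt;/code&gt; body shape are hypothetical and do not come from the rig or the spec.&lt;/p&gt;

```typescript
// Hypothetical error body shape; the real spec's fields are not shown in the post.
type ApiErrorBody = { code: string; message: string };

// An error that keeps the HTTP status and the machine-readable code
// instead of collapsing everything into a generic message.
class ApiError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(status: number, body: ApiErrorBody) {
    super(body.message);
    this.name = "ApiError";
    this.status = status;
    this.code = body.code;
  }
}

// Converts a failed response into an ApiError. If the body is not the
// expected JSON shape, it falls back to a generic error with code "unknown".
function toApiError(status: number, rawBody: string): ApiError {
  try {
    const parsed = JSON.parse(rawBody) as ApiErrorBody;
    if (typeof parsed.code === "string") {
      if (typeof parsed.message === "string") {
        return new ApiError(status, parsed);
      }
    }
  } catch (_err) {
    // Body was not usable JSON; fall through to the generic case below.
  }
  return new ApiError(status, { code: "unknown", message: "request failed with status " + status });
}

const example = toApiError(409, JSON.stringify({ code: "book_already_on_loan", message: "Book is already on loan" }));
console.log(example.status, example.code, example.message);
```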

&lt;p&gt;Six combinations. Five runs each. Thirty implementations of a Library API client. The rig is on &lt;a href="https://github.com/vanonselenp/harness-bench" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; if you want to poke at it.&lt;/p&gt;

&lt;p&gt;I did not go into this neutrally. I had a favourite. I expected opencode-opus to win, claude-code to look quietly competent but a bit lifeless, and the GPT-backed runs to come in mid-pack. I thought I knew the shape of the answer before I started.&lt;/p&gt;

&lt;p&gt;While I was setting it up, something else started to nag at me. I had been hearing the term “dark factory” floating around in the agentic coding conversation and not really understanding what people meant. A factory that runs without lights because there are no humans in it. Applied to coding, it suggested some end state where you specify what you want, an agent produces it, and something else judges whether it is correct, all without you in the loop. I had nodded along when people brought it up and quietly had no idea how you would actually build one. But the rig I was assembling was starting to look uncomfortably like a small piece of that picture, and I tried not to think about it too hard while I was still building the thing.&lt;/p&gt;

&lt;h2 id="the-results"&gt;The results&lt;/h2&gt;

&lt;p&gt;I will spare you the full grid. Each run was scored out of 20 hidden tests, and each harness/model pair ran five times, so the maximum score per row is 100. The headline numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Harness&lt;/th&gt;
      &lt;th&gt;Hidden tests&lt;/th&gt;
      &lt;th&gt;Perfect runs&lt;/th&gt;
      &lt;th&gt;Median diff&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;claude-code&lt;/td&gt;
      &lt;td&gt;98/100&lt;/td&gt;
      &lt;td&gt;4/5&lt;/td&gt;
      &lt;td&gt;1157&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;opencode-opus&lt;/td&gt;
      &lt;td&gt;97/100&lt;/td&gt;
      &lt;td&gt;3/5&lt;/td&gt;
      &lt;td&gt;628&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pi-gpt&lt;/td&gt;
      &lt;td&gt;92/100&lt;/td&gt;
      &lt;td&gt;1/5&lt;/td&gt;
      &lt;td&gt;1444&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pi-opus&lt;/td&gt;
      &lt;td&gt;90/100&lt;/td&gt;
      &lt;td&gt;1/5&lt;/td&gt;
      &lt;td&gt;1401&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;codex&lt;/td&gt;
      &lt;td&gt;85/100&lt;/td&gt;
      &lt;td&gt;2/5&lt;/td&gt;
      &lt;td&gt;863&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;opencode-gpt&lt;/td&gt;
      &lt;td&gt;85/100&lt;/td&gt;
      &lt;td&gt;2/5&lt;/td&gt;
      &lt;td&gt;732&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Code won on correctness. Four perfect runs out of five. The harness I had been quietly resenting for going silent on me produced the most reliable code in the experiment. I sat with that for a bit. The chatty junior I missed had grown up into the engineer who just gets it done and hands you the result, and apparently the result is good.&lt;/p&gt;

&lt;p&gt;Opencode-opus came in essentially tied on correctness, one point behind. But look at the median diff. 628 lines vs 1157. Same task, same spec, near identical scores, and opencode-opus did it in a little over half the code. If you measure tests passed per hundred lines of diff, opencode-opus is comfortably the best of the lot at around three. Claude Code is a touch under two. Pi-opus is at one and change. It is a crude metric, obviously. Fewer lines are not inherently better, and there are plenty of ways to cheat at it. But when two runs are almost tied on correctness and one gets there with half the diff, I pay attention. Claude Code is most reliable. Opencode-opus is most efficient. I am genuinely not sure which I value more in a real engineering context.&lt;/p&gt;
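&lt;p&gt;For anyone who wants to check the arithmetic, here is a small sketch of one way to compute that metric from the table. The exact method is an assumption: it takes the average tests passed per run (total divided by five) and divides by the median diff, scaled to a hundred lines. The names &lt;code&gt;Row&lt;/code&gt; and &lt;code&gt;testsPerHundredLines&lt;/code&gt; are made up for the example.&lt;/p&gt;

```typescript
// Numbers copied from the results table; the metric definition is an assumption.
type Row = { name: string; testsPassed: number; runs: number; medianDiff: number };

const rows: Row[] = [
  { name: "claude-code", testsPassed: 98, runs: 5, medianDiff: 1157 },
  { name: "opencode-opus", testsPassed: 97, runs: 5, medianDiff: 628 },
  { name: "pi-gpt", testsPassed: 92, runs: 5, medianDiff: 1444 },
  { name: "pi-opus", testsPassed: 90, runs: 5, medianDiff: 1401 },
  { name: "codex", testsPassed: 85, runs: 5, medianDiff: 863 },
  { name: "opencode-gpt", testsPassed: 85, runs: 5, medianDiff: 732 },
];

// Average hidden tests passed per run, per hundred lines of median diff.
function testsPerHundredLines(row: Row): number {
  const testsPerRun = row.testsPassed / row.runs;
  return (testsPerRun / row.medianDiff) * 100;
}

for (const row of rows) {
  console.log(row.name, testsPerHundredLines(row).toFixed(2));
}
```

&lt;p&gt;Under that definition opencode-opus comes out at about 3.09, claude-code at about 1.69 and pi-opus at about 1.28, which lines up with “around three”, “a touch under two” and “one and change”.&lt;/p&gt;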

&lt;p&gt;The other observation that I keep turning over is the pi.dev runs. Largest diffs in the experiment, mid-pack on correctness. More generated code did not buy reliability. It is tempting to read a wall of plausible-looking output as evidence of diligence. The data here says that intuition is wrong. Verbose was not safer. Verbose was just verbose.&lt;/p&gt;

&lt;p&gt;A caveat there, though, and an important one. The pi.dev runs were stock pi out of the box. No customisation, no extra tools, no extension of its capabilities. And the whole point of pi is that you customise it. That is the entire pitch. So what I actually measured was vanilla pi, with a deliberately limited toolkit, on a task that probably wanted a richer one. Seen that way, the fact that pi-gpt landed at 92 with no help from me is genuinely interesting. There is more to do here, and a properly outfitted pi might tell a different story. That is its own post.&lt;/p&gt;

&lt;p&gt;The same-model-different-harness comparison is where it gets weirder. Codex and opencode-gpt are both GPT-5.5 doing the same task, and they tied on aggregate score, but opencode-gpt did it in noticeably less code. Same brain, different harness, different shape of output. Then flip it: opencode-gpt vs opencode-opus is the same harness with different models, and the model swap moved the score from 85 to 97. So harness matters. Model matters. They are not interchangeable variables, and which one matters more depends on which one you change.&lt;/p&gt;

&lt;h2 id="what-this-rig-had-become"&gt;What this rig had become&lt;/h2&gt;

&lt;p&gt;Once the runs were done and I was staring at the spreadsheet, the thing I had been trying not to think about earlier became impossible to ignore.&lt;/p&gt;

&lt;p&gt;I had a detailed work order in &lt;code class="language-plaintext highlighter-rouge"&gt;prompt.md&lt;/code&gt; and &lt;code class="language-plaintext highlighter-rouge"&gt;AGENTS.md&lt;/code&gt; and a formal OpenAPI spec. I had a clean starting workspace that got reset between runs. I had a constrained implementation environment with locked Node and TypeScript versions. I had a hidden acceptance test suite the agent could not see or game. I had a mock service that behaved enough like a real one to be tested against. I had a runner that could launch any of the harnesses under comparable conditions, and a grader that built the output, ran the hidden tests, captured the diff size, and recorded the results into a table.&lt;/p&gt;

&lt;p&gt;That is most of a dark factory. Spec in, code out, automated judgement in the middle, results captured for analysis, no human required during a run. The piece that is missing is the feedback loop. Right now, when a run scores 15 out of 20, that is the end of the story. It gets logged and we move on. A dark factory v0.1 would read those failures, decide whether to retry, mutate the prompt or the constraints, launch another attempt, and keep going until the score crossed some threshold or the budget ran out.&lt;/p&gt;
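
&lt;p&gt;The missing loop is small enough to sketch. What follows is a rough outline under stated assumptions, not something the rig does today: every name in it is hypothetical, &lt;code&gt;runAttempt&lt;/code&gt; stands in for "launch a harness and grade the output", and &lt;code&gt;mutatePrompt&lt;/code&gt; stands in for whatever strategy ends up revising the prompt or the constraints.&lt;/p&gt;

```typescript
// All names here are hypothetical; this sketches the loop, not the rig itself.
type Attempt = { passed: number; total: number; failures: string[] };

type LoopOptions = {
  prompt: string;
  threshold: number; // stop once passed / total reaches this fraction
  maxAttempts: number; // the budget
  runAttempt: (prompt: string) => Attempt; // stand-in for launch + grade
  mutatePrompt: (prompt: string, failures: string[]) => string;
};

function runUntilGoodEnough(opts: LoopOptions): { attempts: number; best: Attempt | null } {
  let prompt = opts.prompt;
  let best: Attempt | null = null;
  let attempts = 0;

  while (opts.maxAttempts > attempts) {
    const result = opts.runAttempt(prompt);
    attempts += 1;
    if (best === null || result.passed > best.passed) {
      best = result; // remember the strongest attempt so far
    }
    if (result.passed / result.total >= opts.threshold) {
      break; // the score crossed the threshold
    }
    // Feed the failures back into the next prompt and try again.
    prompt = opts.mutatePrompt(prompt, result.failures);
  }
  return { attempts, best };
}

// Toy usage: a fake runner that passes one more test per line of prompt.
const demo = runUntilGoodEnough({
  prompt: "implement the client",
  threshold: 0.9,
  maxAttempts: 5,
  runAttempt: (p) => {
    const passed = Math.min(20, 14 + p.split("\n").length);
    return { passed, total: 20, failures: ["failing-test-" + passed] };
  },
  mutatePrompt: (p, failures) => p + "\nfix: " + failures.join(", "),
});
console.log(demo.attempts, demo.best);
```

&lt;p&gt;The interesting design questions all live inside &lt;code&gt;mutatePrompt&lt;/code&gt;, which this sketch deliberately leaves as a plug-in point.&lt;/p&gt;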

&lt;p&gt;I do not have that yet. But I can see how to build it from here, which is something I genuinely could not see a month ago. I had been hearing the term and nodding politely. Now I had accidentally built most of one because I was trying to settle a vibes-based argument with myself about which harness I liked best.&lt;/p&gt;

&lt;p&gt;And the thing that makes me uncomfortable about that is what it implies about “show your work”. I had been treating the visible reasoning as a proxy for trust. Watching opencode talk through the problem made me feel like it knew what it was doing. Watching Claude Code go quiet made me feel like it didn’t. The hidden tests did not care about either of those feelings. They cared about whether the cursor pagination iterated, whether the polymorphic loan response preserved both shapes, whether the async payment poll reached a terminal state. Visible reasoning helped me supervise. Hidden tests measured whether the work was actually done. Those turn out not to be the same thing, and inside a dark factory only one of them survives, because there is no human there to be reassured by the other.&lt;/p&gt;

&lt;h2 id="what-i-am-left-with"&gt;What I am left with&lt;/h2&gt;

&lt;p&gt;I expected this experiment to vindicate opencode. It did not. If I cared only about pass rate I would use Claude Code. If I cared about correctness density I would use opencode-opus. If I cared about reading the agent’s reasoning while it works, I would still use opencode, and I will. Aesthetics matter. I spend hours in this tool. I want to enjoy being there.&lt;/p&gt;

&lt;p&gt;But I am going to stop pretending that preference is a quality argument. It is a supervision argument, which is a different thing, and one that gets thinner the further you move toward letting the rig run on its own.&lt;/p&gt;

&lt;p&gt;The other thing I am left with is the rig itself. I started building it to answer a small question and ended up with a thing that has the shape of something much bigger. That is the part I did not see coming. The benchmark was supposed to be the point. It turns out the benchmark might just be the prototype.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>Your Scientists Were So Preoccupied</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Mon, 04 May 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/your-scientists-were-so-preoccupied-11j6</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/your-scientists-were-so-preoccupied-11j6</guid>
      <description>&lt;p&gt;&lt;em&gt;That they forgot to ask whether SSHing into an AI coding agent from a phone was a good idea.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fphone-doom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fphone-doom.png" alt="phone distraction device of doom" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a thing you should not do. I want to say that up front, before any of the rest of it, because the rest of it is a reasonably well-considered guide to doing the thing, and I do not want anyone reading the guide bit and thinking that I am endorsing what comes after it.&lt;/p&gt;

&lt;p&gt;The phone is already a distraction device of doom. I have spent a non-trivial amount of effort over the last couple of years trying to make mine less of one, with limited success. What I am about to describe takes that distraction device of doom and turns it into a distraction device of doom that can also write and ship code. This is, on every reasonable axis I can think of, a worse situation than the one I started in.&lt;/p&gt;

&lt;p&gt;But.&lt;/p&gt;

&lt;p&gt;I was on a train recently. One of those medium-length train journeys that everyone in the UK eventually finds themselves on. About an hour and a half. Two or three changes. Never quite enough uninterrupted sit-down time for pulling out a laptop to make sense. You can read a book. You can stare blankly at your phone. What you cannot do is meaningfully open an IDE and ship a feature. And yet there I was, with a project I wanted to be working on, several connections to make, and a phone in my pocket. So I built the thing. Here is how.&lt;/p&gt;

&lt;h2 id="the-bits-you-need"&gt;The bits you need&lt;/h2&gt;

&lt;p&gt;You need &lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;Tailscale&lt;/a&gt;. You need &lt;a href="https://mosh.org/" rel="noopener noreferrer"&gt;mosh&lt;/a&gt;. You need &lt;a href="https://termius.com/" rel="noopener noreferrer"&gt;Termius&lt;/a&gt;. You need &lt;a href="https://github.com/connorads/remobi" rel="noopener noreferrer"&gt;remobi&lt;/a&gt;. You need tmux. And if you are doing this from a Mac that lives plugged in with its lid open, you need to know that &lt;code class="language-plaintext highlighter-rouge"&gt;caffeinate -is&lt;/code&gt; exists. You also need enough self-control not to use any of this irresponsibly, which is where the plan begins to fall apart.&lt;/p&gt;

&lt;h2 id="tailscale"&gt;Tailscale&lt;/h2&gt;

&lt;p&gt;Tailscale is a way of pretending that two devices on entirely different networks are actually on the same one. You install it on your laptop, you install it on your phone, and from that point on the two of them have stable IPs that work no matter which café WiFi or train 4G you happen to be straddling at the time. This is the foundation. Without this, none of the rest works, because your phone has no idea where your laptop is and your laptop has no interest in being found by random strangers on the internet, both of which are correct positions for them to hold.&lt;/p&gt;

&lt;h2 id="mosh"&gt;mosh&lt;/h2&gt;

&lt;p&gt;SSH is a beautiful protocol that falls apart the moment your connection blinks. Trains go through tunnels. 4G drops to 3G drops to nothing and back again. SSH does not enjoy this. Mosh does. Mosh is a shell client and server that survives the kind of network conditions you get when you are physically moving through countryside at speed, which is exactly the situation we are designing for here.&lt;/p&gt;

&lt;h2 id="tmux"&gt;tmux&lt;/h2&gt;

&lt;p&gt;You want a terminal session that just keeps running on your machine whether you are connected to it or not. tmux is the obvious answer. Start a session, leave it there, reattach to it later from whatever client you happen to be using at the time. This is the bit that makes the whole thing feel less like a fragile remote connection and more like walking back to a desk you left an hour ago.&lt;/p&gt;

&lt;h2 id="termius"&gt;Termius&lt;/h2&gt;

&lt;p&gt;Termius is a genuinely lovely SSH client. It runs on your phone, it knows about your Tailscale IPs, and it gives you a terminal that you can just type into. You hit your laptop over mosh, attach to your tmux session, and away you go. If your agent of choice is Claude Code or Codex CLI or OpenCode or whatever flavour of the week you are running, this is enough to be productive. You point it at the thing, you tell it to go, and you watch it work.&lt;/p&gt;

&lt;h2 id="remobi"&gt;remobi&lt;/h2&gt;

&lt;p&gt;remobi is the bit that turned this from “technically possible” into “actually quite nice.” It was written by &lt;a href="https://github.com/connorads" rel="noopener noreferrer"&gt;Connor Adams&lt;/a&gt;, who is one of those quietly talented engineers who keeps producing useful things while the rest of us are still talking about producing things. What it does is run a little server on your machine that exposes your tmux session over HTTP, which means you can wrap it up as a home-screen app on your phone and just tap to be back where you were.&lt;/p&gt;

&lt;p&gt;The reason this matters is that the UI is better than typing into a phone-shaped SSH client. You get native scrolling. You get sensible zoom. You get a layout that does not require you to remember which gesture corresponds to which control sequence. It feels less like fighting your phone and more like using it.&lt;/p&gt;

&lt;h2 id="caffeinate"&gt;caffeinate&lt;/h2&gt;

&lt;p&gt;If your laptop is the kind of laptop that lives plugged in with the lid open, you have two problems. First, you need to enable Remote Login, which is the kind of setting you turn on once and then forget exists until you need it again. Second, you need &lt;code class="language-plaintext highlighter-rouge"&gt;caffeinate -is&lt;/code&gt;, which is a macOS command I did not know about until very recently and which apparently ships with the operating system. It tells the system to stop being clever about going to sleep when the display turns off. Run it, leave it running, and your machine stays awake long enough to actually be useful as a remote target.&lt;/p&gt;

&lt;h2 id="so-now-what"&gt;So now what&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fremobi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Fremobi.png" alt="A perfectly normal and healthy thing to be doing from a phone." width="800" height="1731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you have the ability to write code from your phone, anywhere, regardless of whether you are sitting at a desk or wedged into a train seat with your bag on your lap. You can kick off an agent, watch it work, course-correct it, and have something meaningfully shipped by the time you arrive at wherever you were going. On the train I was on, this would have neatly solved the problem of being bored and wanting to work on something I could not work on.&lt;/p&gt;

&lt;p&gt;I should mention, in fairness, that you could probably do most of this with Claude Code’s mobile app and save yourself the entire setup. That is true. The reason I did not is that I am stubbornly committed to not being locked into any one AI toolset, which means I want the option to point this same setup at Codex and OpenCode and Claude Code and whatever else I happen to be running that week. So I have reinvented a wheel that already exists, in order to have a wheel that I own.&lt;/p&gt;

&lt;h2 id="the-bookend"&gt;The bookend&lt;/h2&gt;

&lt;p&gt;Here is the thing though. I am not actually sure I have done myself any favours.&lt;/p&gt;

&lt;p&gt;The phone, as previously established, is the world’s ultimate distraction device. I am, very slowly and with mixed results, trying to make mine less of one. And what I have done here is take that device, the one that is already eating more of my attention than I would like, and give it a brand new way to consume me. Now it is not just a thing I pick up to check the time and put down forty minutes later wondering where the time went. Now it is also a thing I can pick up to “just quickly check on the agent” and put down forty minutes later wondering where the time went, except this time I have shipped half a feature I had not actually decided I wanted to ship yet.&lt;/p&gt;

&lt;p&gt;I was so busy thinking about whether I could that I did not stop to ask whether I should. Friends, I should not have. Your mileage may vary.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>opencode</category>
    </item>
    <item>
      <title>I Built This In A Prompt Window! With A Box Of Filament!</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Wed, 22 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/i-built-this-in-a-prompt-window-with-a-box-of-filament-mp4</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/i-built-this-in-a-prompt-window-with-a-box-of-filament-mp4</guid>
      <description>&lt;p&gt;&lt;em&gt;I Vibe Coded A Model Into My House&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There is a 3D printed World War II German infantryman sitting on my desk. He is about the size of my thumb, slightly chibi in the proportions, with a helmet a touch too large for his head. He looks, frankly, adorable. &lt;em&gt;He is also not a copy of anything&lt;/em&gt;. Nobody designed him. Nobody sculpted him. Nobody even sketched him. I typed some words at a screen, pressed a button on a different screen, and twenty minutes later he was sitting on my desk. On the left pure vibes, on the right reality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fhero.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fhero.jpg" alt="hero" width="799" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have been writing this blog for a while now about adventures in applying AI to dev-related problems. Writing code. Building software. Making things. It has taken me through a Magic jumpstart cube generated with code, a board game prototype, a re-implementation of that prototype in Godot, a turn-based stealth tactics game also in Godot, a news digest SaaS that is still somewhere in the middle of becoming itself, and a frankly embarrassing pile of bash scripts accumulated from deep dives into my coding harnesses. Along the way I have taken what I have learned and applied it at work. It has been a wonderful, bizarre road.&lt;/p&gt;

&lt;p&gt;This post is about atoms instead of bits, which is a first for the blog. I promise it rhymes.&lt;/p&gt;

&lt;h2 id="the-printer"&gt;The printer&lt;/h2&gt;

&lt;p&gt;Somebody gave me a 3D printer. An A1 Mini, which is one of the cheaper ones but turns out to be remarkably capable for what it is. And as one does when one acquires a 3D printer, I immediately spent a week not printing anything interesting. I built shelves for my office to create space. I printed spool holders for the spools. I printed the tools you need to use the 3D printer. This seems to be the compulsory onboarding ritual when you get a 3D printer, which is a bit like spending your first week with a new laptop installing a package manager so you can install the package managers that let you install things. Fine. Tradition.&lt;/p&gt;

&lt;p&gt;Then I set myself an actual goal.&lt;/p&gt;

&lt;h2 id="the-weird-hobby-compulsion"&gt;The weird hobby compulsion&lt;/h2&gt;

&lt;p&gt;I should explain that I have a tendency to go on strange game-making tangents. I have built a travel sized version of a board game from scratch with custom cards and components, purely because it was fun to tinker with. I have made print and play versions of expansions for games I already own. I made my own version of Santorini using tiles and spray paint, like some sort of deranged hobbyist. I paint, I draw, I mess about. The point is that “I wonder if I could just make this” is my default failure mode when I encounter any game that costs too much or takes up too much space.&lt;/p&gt;

&lt;p&gt;Bolt Action has been living rent free in the back of my head for two or three years now. It is a tabletop wargame. I would like to play it. But I do not want to spend the money on the models, I do not want to find the table space, I do not know enough people in my area who want to play it with me, and painting a hundred models is a lot of effort in a way that is genuinely hard to appreciate until you have sat there and done it.&lt;/p&gt;

&lt;p&gt;The idea that has been sitting in the back of my brain for a few years is this: make a scaled-down version. Instead of moving in inches, move in centimetres. Same rules, same ratios, everything else the same, just smaller. Smaller table. Smaller models. Smaller paint commitment.&lt;/p&gt;

&lt;p&gt;Which means you need smaller models. Which has always been the problem.&lt;/p&gt;

&lt;h2 id="the-accidental-pipeline"&gt;The accidental pipeline&lt;/h2&gt;

&lt;p&gt;I saw a &lt;a href="https://talesfromfarpoint.blogspot.com/2026/03/junker-update-and-tiny-troops-how-to.html?m=1" rel="noopener noreferrer"&gt;blog post a while back by a guy making models out of Fimo clay and EVA foam and little bits of wood&lt;/a&gt;. They were cute and chibi and slightly weird and I really liked them. So naturally I tried to make one myself. What I ended up with functionally looked okay and matched the vibe but was basically 28mm scale, which is normal Bolt Action size, which defeats the entire point.&lt;/p&gt;

&lt;p&gt;On a whim, I took the photo the original guy had taken of his model and fed it into Gemini Pro. “This style. World War II Germans.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fall.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fall.jpg" alt="all" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gemini came back with something that looked astonishingly good. Cute, chibi, right proportions, right register. Something that immediately felt like what I had been trying to describe for years without quite having the vocabulary for it. I then started giving it more structured prompts. Give me a commanding officer. Give me an NCO. Give me a machine gunner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fmeshy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fmeshy.jpg" alt="Meshy produced a 3D model" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I took one of these entirely hallucinated images, cropped it, and fed it into Meshy, which is a generative tool that takes an image and produces a 3D model.&lt;/p&gt;

&lt;p&gt;I then, mostly out of morbid curiosity, copied the file over to the printer and clicked print.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Ffirst.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Ffirst.jpg" alt="hot off the print bed" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="this-confounds-me"&gt;This confounds me&lt;/h2&gt;

&lt;p&gt;Twenty minutes later, there was a physical object on my desk. A German infantryman. About the size of my thumb. Cute and chibi, helmet slightly too large. Precisely the thing I had been describing to Gemini about half an hour earlier.&lt;/p&gt;

&lt;p&gt;I want to be clear about what happened here because I think I am still processing it.&lt;/p&gt;

&lt;p&gt;I described a thing in words. Another thing dreamed up a picture of that thing, a picture that had never previously existed. A third thing hallucinated a 3D shape from that picture, a shape that had also never previously existed. A fourth thing turned that shape into an object I can hold in my hand. No human sculpted it. No human modelled it. No human even sketched it. The infantryman on my desk has no reference in the world. He is not a copy of anything. He is purely the output of a &lt;em&gt;pipeline of vibes&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I vibe coded a model into my house&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I have spent a year now playing around with generative AI. I have been, at times, out on what I thought was the frontier of what it is doing. I should have seen this coming. At some level I had seen this coming, in the abstract, “yes of course generative AI plus 3D printing, that is obviously a thing” way. But there is a chasm between knowing a thing is possible and holding the output of that thing in your hand twenty minutes after describing it out loud.&lt;/p&gt;

&lt;p&gt;This is the same loop I have been running on software for a year. Describe a thing, get a thing, iterate, ship. The loop works on atoms now. It has probably been working on atoms for a while and I simply had not wired up the last step of the pipeline in my own life until somebody gave me a printer.&lt;/p&gt;

&lt;p&gt;Which makes me wonder what else is already sitting there, loop closed, waiting for me to notice.&lt;/p&gt;

&lt;h2 id="what-im-doing-with-it"&gt;What I’m doing with it&lt;/h2&gt;

&lt;p&gt;I am now in the middle of printing a 500 point German army and a 500 point Soviet army. The whole thing will probably fit in a box slightly bigger than a paperback. A deck of cards each for the rules and unit references. Total cost of the models, given that I already had the printer and the filament: functionally nothing. If a friend wanted to play, I could just print them a second army and not be fussed about it. There is some manual work around trimming supports and cleaning up sprues, but it is not the kind of work that scales with ambition. It scales with how many figures you feel like cleaning up on a given evening.&lt;/p&gt;

&lt;p&gt;Something I have been idly wanting for two or three years is just there now, in a box, because the pipeline finally closed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fcurrent.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.petervanonselen.com%2Fassets%2Freality%2Fcurrent.jpg" alt="current printed" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="what-i-cant-get-out-of-my-mind"&gt;What I can’t get out of my mind&lt;/h2&gt;

&lt;p&gt;Here is the thing that has been rattling around my head since the infantryman showed up.&lt;/p&gt;

&lt;p&gt;We are, collectively, still arguing about whether vibe coded software counts as real engineering. That argument is live. It is on my timeline every week. It is in the comments of every post I write. People who build things for a living are genuinely unsure whether the loop of “describe a thing, get a thing, ship it” is a legitimate way to make software, and reasonable people disagree about that, and the discourse is maybe a year behind the tools and possibly more.&lt;/p&gt;

&lt;p&gt;While we have been having that argument, the same loop has quietly grown another output head. It makes physical objects now. Not in some research lab, not in some well-funded startup I would need to buy into. In my house, on a desk, using tools that anyone can download or buy, for a material cost measured in pennies per figure.&lt;/p&gt;

&lt;p&gt;I found this out by accident. Someone gave me a printer. I had an itch I had been scratching at for years, and the pipeline closed on its own while I was not really paying attention. That is what is unsettling me. Not that it works. I knew it would work. It is that I walked into this corner of it entirely by accident, with no plan, and the corner was just sitting there waiting for anyone who happened to wander in.&lt;/p&gt;

&lt;p&gt;I do not think we are ready for the software version of this. I am much less sure about anything else. Because if a ten-minute detour into a completely different hobby is enough to produce an object with no human author, what else is already sitting there that I have not stumbled into? What other loops have quietly closed while I was looking at my terminal? I thought I had been playing on the frontier for a year. It turns out I have been playing in one room of a house whose floor plan I do not have.&lt;/p&gt;

&lt;p&gt;I do not have a tidy ending for this. I have an infantryman on my desk, and a suspicion that I have been looking at a very small corner of something, and that almost everyone I know has been looking at the same small corner, and that the rest of the house is already built.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>softwarecraftsmanshi</category>
    </item>
    <item>
      <title>Conscious Coverage</title>
      <dc:creator>Peter van Onselen</dc:creator>
      <pubDate>Thu, 16 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/peter_vanonselen_e86eab6/conscious-coverage-8nj</link>
      <guid>https://dev.to/peter_vanonselen_e86eab6/conscious-coverage-8nj</guid>
      <description>&lt;p&gt;&lt;em&gt;We don’t talk about Code coverage, no no no, we don’t talk about coverage…&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I joined Cazoo, it was the first place I’d ever worked that explicitly, actively, aggressively embraced software craftsmanship. Pair programming. Test-driven development. Domain-driven design. Extreme programming. The whole kitchen sink. They sent us on agile training courses that a startup founder would weep at the cost of. We had an agile coach in the room every day. We did code katas regularly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uipanz4f64xi1pwmp71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uipanz4f64xi1pwmp71.png" alt="code coverage matters" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And even there, in the most craft-soaked environment I’d ever been in, the idea of 100% code coverage was treated as obvious lunacy. A poor metric. The kind of thing only someone who hadn’t really understood testing would chase.&lt;/p&gt;

&lt;p&gt;Then I joined the Economist, and the team I landed on had 100% coverage as a hard rule.&lt;/p&gt;

&lt;p&gt;While they did write tests, they didn’t do TDD. They didn’t pair. They hadn’t been on the agile bootcamps. They hadn’t done code retreats or code katas. By every measure that either the London or the Chicago school of the craftsmanship tradition would care about, they were doing less of the work. But they had the 100% rule, and they enforced it, and at first I assumed they’d inherited a metric without fully understanding it.&lt;/p&gt;

&lt;p&gt;They hadn’t. Turns out I hadn’t understood it. And by the time I left that team, I’d come around entirely. Not reluctantly, not with caveats, but genuinely: 100% coverage, properly understood, is mandatory. I held that position for years before agentic coding was a thing anyone was thinking about. The agents haven’t changed my mind. They’ve just taken a position I already held and made the case for it screamingly, urgently obvious in a way it previously wasn’t.&lt;/p&gt;

&lt;h2&gt;What is this metric thing about anyway?&lt;/h2&gt;

&lt;p&gt;Here’s what I’d absorbed from the craft world about 100% coverage. It’s a vanity number. Chasing it produces garbage tests. You end up writing assertions against getters and setters. You exercise code without testing behaviour. The pragmatic position, and pragmatism was always the emphasis, is that you write the tests that matter and you let the rest go.&lt;/p&gt;

&lt;p&gt;All of that is true if “100% coverage” means “every line has a test exercising it.” That version of the metric is genuinely silly and the people warning against it were right.&lt;/p&gt;

&lt;p&gt;But what it took me until very recently to notice was that nobody, in all those arguments, had ever actually explained what the metric was &lt;em&gt;for&lt;/em&gt;. What it was pointing at. Everyone, including me, was arguing about the number. Nobody was asking what the number was a proxy for.&lt;/p&gt;

&lt;p&gt;It’s a proxy for &lt;strong&gt;Conscious Coverage&lt;/strong&gt;. That’s the thing. Every line in the codebase is a decision. The question the metric is actually asking, underneath, is: &lt;em&gt;have you made a conscious decision about each one&lt;/em&gt;. Not have you tested each one. Have you &lt;em&gt;decided&lt;/em&gt; about each one. Tested, or consciously chosen not to test, with a reason, written down.&lt;/p&gt;

&lt;p&gt;Concretely, it looks like this. You write a function with a branch that handles a malformed input. You run the coverage tool. It tells you the error branch isn’t covered. You now have three choices, and only three.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can write a test that exercises the malformed input and asserts the behaviour.&lt;/li&gt;
&lt;li&gt;You can mark the branch ignored with a comment that says, say, “unreachable because upstream validation guarantees this shape” — and now your justification is a reviewable artefact that someone can argue with in a pull request.&lt;/li&gt;
&lt;li&gt;Or you can decide the branch shouldn’t exist at all and delete it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What you cannot do is shrug and move on. The forgotten case is no longer a thing. Every line has had a decision made about it, and the decisions are legible.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="nf"&gt;countryToRegion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;countryCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Region&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* v8 ignore start */&lt;/span&gt; &lt;span class="c1"&gt;// Ignoring the switch to avoid repeating every single country code&lt;/span&gt;
    &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;countryCode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
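
&lt;p&gt;As a minimal sketch of the first two choices side by side, assume a hypothetical &lt;code&gt;parseQuantity&lt;/code&gt; function and a hypothetical upstream validator (neither comes from the codebase described here). The malformed-input branch gets a behavioural check, and the branch left untested carries its justification in the ignore comment, in the same &lt;code&gt;v8 ignore&lt;/code&gt; style as the snippet above.&lt;/p&gt;

```typescript
// Hypothetical example: parseQuantity and the upstream validator it mentions
// are illustrations, not code from the team described in this post.
type ParseResult = { ok: true; value: number } | { ok: false; reason: string };

function parseQuantity(raw: string): ParseResult {
  const value = Number(raw);

  // Choice 1: this malformed-input branch is kept and exercised by a check below.
  if (raw.trim() === "" || !isFinite(value) || Math.floor(value) !== value) {
    return { ok: false, reason: `not a whole number: "${raw}"` };
  }

  if (value >= 0) {
    return { ok: true, value };
  }

  /* v8 ignore start */ // Choice 2: assumed unreachable, the (hypothetical) upstream validator rejects negatives
  return { ok: false, reason: "negative quantity" };
  /* v8 ignore stop */
}

// Tiny stand-in for a test runner so the sketch runs on its own.
function check(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(message);
  }
}

check(parseQuantity("3").ok, "a well-formed quantity parses");
check(!parseQuantity("three").ok, "a malformed quantity is reported, not thrown");
```

&lt;p&gt;The third choice would be to restructure the function so the negative branch does not exist at all, which removes the need for the ignore comment.&lt;/p&gt;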



&lt;p&gt;Once you see that, the version of the rule that the craft world rejected and the version the Economist team was running are obviously different things. The first one optimises for a number. The second one optimises for &lt;em&gt;the absence of accidents&lt;/em&gt;. You can no longer fail to test something because you forgot. You can fail to test it because you decided not to, and you wrote down why, and someone can argue with you about it later in the review. The shape of the work is different.&lt;/p&gt;

&lt;p&gt;And this is the bit I have to be honest about, because the post doesn’t work without it. Once the metric is framed as conscious coverage, the pragmatic position I’d absorbed at Cazoo stops being pragmatic. It’s just laziness with a vocabulary. “Write the tests that matter and let the rest go” sounds wise until you ask which lines, specifically, didn’t matter, and why, and the answer turns out to be that I didn’t want to write those tests and the tradition had given me a way to sound rigorous about not writing them. The metric wasn’t too expensive. The work it pointed to wasn’t too expensive. I just didn’t want to do it, and nobody was making me, and the craft vocabulary let me call that a considered trade-off.&lt;/p&gt;

&lt;p&gt;I had to be in a place that just &lt;em&gt;did&lt;/em&gt; it before I could see any of this. Sitting at Cazoo arguing about it from first principles, I would have lost the argument every time, because the version of the rule I was arguing against was the version everyone agrees is bad, and the version underneath it, the one about conscious decisions, nobody had ever put into words for me. Nobody tells you the better version exists until you’re standing inside a codebase that runs on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes when an agent is doing the writing
&lt;/h2&gt;

&lt;p&gt;Fast forward. I’m now writing a lot of code with agents. Claude Code, Codex, OpenCode, the usual suspects. The thing I keep telling people who ask me about it is that agentic engineering requires &lt;em&gt;more&lt;/em&gt; discipline than normal engineering, not less. The tools are faster, the output is bigger, and the gaps between what you asked for and what you got are easier to miss. So everything that used to depend on careful human attention now depends on something else holding the line. Which brings me back to the question: how do I know it’s done? And more importantly, how does an agent know?&lt;/p&gt;

&lt;p&gt;Not “done” in the user-acceptance sense. Done in the much more boring sense of: has this thing actually exercised the code it claims to have written? Has it tested the behaviour I care about? Did it quietly skip a branch because the test was annoying to set up? Did it write something that’s technically passing but structurally untestable?&lt;/p&gt;

&lt;p&gt;These are the questions the craftsmanship tradition spent twenty years building intuitions about, and the answer the tradition arrived at, pragmatically, contextually, with appropriate caveats, was mostly “you’ll know it when you see it, and pairing helps, and code review helps, and time helps.” Which is fine when humans are doing the work at human pace. It is not fine when an agent has just produced four hundred lines in ninety seconds and is asking what to do next.&lt;/p&gt;

&lt;p&gt;The agent needs a guard rail. Something machine-checkable. Something it can run, get a number from, and decide for itself whether to keep going. Something another agent can validate.&lt;/p&gt;

&lt;p&gt;100% coverage, in the conscious sense, turns out to be exactly that. The agent finishes its loop, runs the coverage tool, sees 98%, and knows, without me telling it, that there are two percent of decisions it hasn’t made yet. Either write the test, or mark the lines as ignored with a justification. Both are fine. What’s not fine is leaving the gap.&lt;/p&gt;

&lt;p&gt;And here is where the impact of the reframe gets outsized, because the agent doesn’t have my laziness. The agent doesn’t want to go home. The agent isn’t quietly negotiating with itself about which lines it can get away with skipping. The thing that was always standing between me and conscious coverage, which was me, just isn’t there. The metric stops being a rod I have to hold myself to and becomes a rod the agent holds itself to, cheerfully, at four in the morning, forever. The practice the craft tradition argued about most fiercely for human reasons becomes, for agents, the most natural thing in the world.&lt;/p&gt;

&lt;p&gt;I’ve started using this as one of my standard acceptance criteria. “You are done when coverage reports 100%.” I can kick off a thirty-minute task and come back to something that, whatever else is true of it, will at least be testable, and will at least have had every line consciously decided about.&lt;/p&gt;
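
&lt;p&gt;One way to make that acceptance criterion machine-checkable, sketched under assumptions: suppose the coverage tool can emit totals shaped like an Istanbul-style json-summary report. The type names and the &lt;code&gt;undecided&lt;/code&gt; helper below are hypothetical, not part of any tool.&lt;/p&gt;

```typescript
// Assumption: totals shaped like the "total" entry of an Istanbul-style
// json-summary report. Check the field names against your own coverage tool.
type Metric = { total: number; covered: number; pct: number };
type CoverageTotals = {
  lines: Metric;
  statements: Metric;
  functions: Metric;
  branches: Metric;
};

const metricNames = ["lines", "statements", "functions", "branches"] as const;

// Hypothetical helper: lists every metric that still has undecided code.
// An empty list is the machine-checkable meaning of "done".
function undecided(totals: CoverageTotals): string[] {
  return metricNames
    .filter((name) => totals[name].pct !== 100)
    .map((name) => {
      const metric = totals[name];
      return `${name}: ${metric.total - metric.covered} of ${metric.total} still undecided`;
    });
}

const full: Metric = { total: 50, covered: 50, pct: 100 };
const report: CoverageTotals = {
  lines: { total: 100, covered: 98, pct: 98 },
  statements: full,
  functions: full,
  branches: full,
};

// One gap is reported: "lines: 2 of 100 still undecided"
console.log(undecided(report));
```

&lt;p&gt;Coverage tools typically leave ignored lines out of these totals, so a justified ignore closes a gap just as a test does. That is what lets the agent treat an empty list as its stopping condition.&lt;/p&gt;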

&lt;p&gt;Coverage as the gate at the end works better when there’s a process upstream that’s likely to produce decent tests in the first place. If you set up the harness with CLAUDE.md files that push the agent toward red-green-refactor TDD, and you give it the kind of structured prompting (like obra/superpowers) that shapes how it actually approaches a task, you tilt the odds. There’s no guarantee it’ll write tests first. There’s a much better chance it will, and a much better chance the tests it writes are pulling the design rather than chasing it. That upstream tilt plus the downstream gate is a much sturdier system than either piece on its own.&lt;/p&gt;

&lt;p&gt;There’s a sharpening of all this that matters, though, because coverage on its own can still produce tests that exercise code without actually testing anything. The companion practice, and I’d say it’s a necessary one rather than a complementary one, is writing tests outside-in, from behaviour rather than from structure. Test the unit of behaviour, not the unit of code. Don’t mock the internals; let the real thing run and assert against what the user of the code actually cares about. This was already the right answer when humans were writing the tests, because it produces tests that survive refactors and read like documentation. With agents it becomes critical, because a behaviour-shaped test is one the agent can write legibly from a user story, and one that you, as the reviewer, can read and check against intent without having to trace the implementation. Coverage tells you the agent made a decision about every line. Behavioural framing tells you the decisions were about the right things. You need both. Coverage without behavioural framing is theatre; behavioural framing without coverage leaves gaps you’ll find in production.&lt;/p&gt;

&lt;p&gt;Now for the obvious objection. Agents are world-class metric gamers. They will absolutely write meaningless tests that exercise code without asserting anything useful. They will absolutely mark lines as ignored with justifications like “this branch is unreachable” when the branch is, in fact, reachable. If you treat 100% coverage as a number to satisfy, the agent will satisfy the number and you’ll be worse off than before, because now you have a green build hiding a problem instead of a red one announcing it.&lt;/p&gt;

&lt;p&gt;The reason I think it works anyway is that it’s asking the right question of the metric. Coverage, in the conscious sense, is a completeness check. It tells you every line has had a decision made about it. It was never going to tell you the decisions were good ones. That’s a different question, and it wants a different answer. Behavioural tests, written outside-in from what the user of the code actually cares about, are the correctness check. Mutation testing, which flips operators and boundaries and asks whether any test notices, is the check on whether the assertions are doing real work. The gaming the agent does lives in the gap between those checks, and the mitigation isn’t to make coverage smarter. It’s to stop asking coverage to do correctness’s job. Use it for what it is: a completeness gate that makes the decisions visible. Use behavioural framing and mutation testing for the quality of the decisions. The ignored lines and their justifications are, at least, a reviewable artefact, sitting in one place where you can read them. The cheats are confined to a place you’re looking. None of that is automatic. It’s a discipline, and like every guard rail it collapses the moment you stop maintaining it. The question is whether the rail makes problems easier or harder to spot, and I think this one makes them easier.&lt;/p&gt;
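
&lt;p&gt;To make the gap between those checks concrete, here is a hand-rolled sketch. The &lt;code&gt;isAdult&lt;/code&gt; rule and both tests are invented for illustration, and the mutant is written by hand, where a real mutation testing tool would generate it. Both tests give the one-line function full coverage; only one of them notices when the boundary operator is flipped.&lt;/p&gt;

```typescript
// Hypothetical rule under test, plus a hand-written mutant of it.
type AgeCheck = (age: number) => boolean;

const original: AgeCheck = (age) => age >= 18;
// Mutant: the boundary operator flipped from >= to >.
const mutant: AgeCheck = (age) => age > 18;

// Executes the code but asserts nothing, so it can never fail.
function coverageOnlyTest(isAdult: AgeCheck): boolean {
  isAdult(30);
  return true;
}

// Asserts the behaviour at the boundary a caller actually cares about.
function behaviouralTest(isAdult: AgeCheck): boolean {
  if (isAdult(17)) {
    return false;
  }
  return isAdult(18);
}

// A mutant counts as "killed" when the test passes on the original
// implementation and fails on the mutated one.
function killsMutant(test: (impl: AgeCheck) => boolean): boolean {
  return test(original) ? !test(mutant) : false;
}

console.log(killsMutant(coverageOnlyTest)); // false: the mutant survives
console.log(killsMutant(behaviouralTest)); // true: the mutant is killed
```

&lt;p&gt;Coverage reports both tests as equally complete. The surviving mutant is the signal that the first one was exercising code without checking anything.&lt;/p&gt;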

&lt;h2&gt;
  
  
  The truisms didn’t go away
&lt;/h2&gt;

&lt;p&gt;The craft tradition produced a lot of practices, and a lot of arguments about practices, and a lot of nuance about when practices apply. Most of that nuance was about humans. About the cost of the practice to the person doing it, about whether the discipline was worth the friction, about whether the metric would be gamed. A lot of it, and I say this now having lived on both sides of the argument, was about whether the person doing the work would actually do it if you asked them to.&lt;/p&gt;

&lt;p&gt;Agents don’t have that problem. The friction of writing the extra test isn’t a friction the agent feels. The discipline of marking ignored lines with reasons isn’t a discipline the agent has to be talked into. The kind of metric-gaming that comes from a tired human at five-to-six is replaced by a different kind of gaming, which is its own problem. So practices that were borderline-worth-it for humans become straightforwardly worth it for agents, and practices that were rejected as lunacy for humans turn out, on inspection, to have been rejected for reasons that said more about the humans than about the practice.&lt;/p&gt;

&lt;p&gt;The craft was always about building software in a sustainable, predictable, maintainable way. That hasn’t changed. The agents don’t replace the craft. They inherit it. And some of the practices the tradition argued about most fiercely turn out, in this new context, to be exactly the load-bearing ones. Not because the old arguments were wrong about the metric, but because the old arguments were quietly also about us, and the us part has changed.&lt;/p&gt;

&lt;p&gt;100% coverage wasn’t wrong. It was a proxy for something nobody I knew had named. It pointed at work I didn’t want to do, and I dressed my reluctance up in a vocabulary that let me agree with myself about not doing it. The agents don’t have the vocabulary and don’t need it. Which makes me wonder which other practices were rejected for reasons that were really about us, and what the calculation looks like now that we have a collaborator who just, straightforwardly, does the work. I’ve run that calculation for coverage. I’m increasingly sure it isn’t the only practice the answer flips for. I’d quite like to know which others.&lt;/p&gt;

</description>
      <category>aios</category>
      <category>claudecode</category>
      <category>softwarecraftsmanshi</category>
    </item>
  </channel>
</rss>
