<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Phil Rentier Digital</title>
    <description>The latest articles on DEV Community by Phil Rentier Digital (@rentierdigital).</description>
    <link>https://dev.to/rentierdigital</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3440667%2F4dff0ac3-f0f2-42bf-b066-14c2ba847691.jpg</url>
      <title>DEV Community: Phil Rentier Digital</title>
      <link>https://dev.to/rentierdigital</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rentierdigital"/>
    <language>en</language>
    <item>
      <title>Your Resume Got You Zero Interviews. 200 Vibe Coders Got Yours.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Sat, 09 May 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/your-resume-got-you-zero-interviews-200-vibe-coders-got-yours-2b05</link>
      <guid>https://dev.to/rentierdigital/your-resume-got-you-zero-interviews-200-vibe-coders-got-yours-2b05</guid>
      <description>&lt;p&gt;Peter Grafe, CEO of BlueAlpha (small marketing shop, you wouldn't know them, that's the point), got 200 applications in two days for one role. 95% disqualified before a human opened the PDF 😬. The 10 that survived had to &lt;strong&gt;vibe code&lt;/strong&gt; something in five days. You, meanwhile, are polishing your LinkedIn opening line for the third time.&lt;/p&gt;

&lt;p&gt;If you can't vibe code, you just became unemployable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; The &lt;strong&gt;resume is cooked&lt;/strong&gt;. Recruiters get &lt;strong&gt;200 plausible applications&lt;/strong&gt; in 48 hours, they stopped pretending to read them, and the &lt;strong&gt;filter moved&lt;/strong&gt; somewhere else. You probably don't know where.&lt;/p&gt;

&lt;p&gt;When anybody can generate a plausible application in five minutes, the resume stops being a signal. It becomes noise. And noise gets filtered without a read.&lt;/p&gt;

&lt;p&gt;You didn't lose those interviews. You didn't even play.&lt;/p&gt;

&lt;p&gt;What follows is what the 10 do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  200 Applications in Two Days. The 10 That Survived Were Vibe Coders.
&lt;/h2&gt;

&lt;p&gt;Grafe published the math himself in a Sherwood News piece this month. 200 applications, 48 hours, 95% out before any reading happened. He's not bragging, he's tired. He says he stopped opening the PDFs because the AI-generated cover letters all sound like a LinkedIn HR consultant on his fourth coffee.&lt;/p&gt;

&lt;p&gt;The 10 candidates who got past the filter didn't have a better CV. They had something the other 190 didn't. They could &lt;strong&gt;ship&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Grafe gave them a brief, five days, and a vague problem. The ones who survived produced a working prototype. The ones who didn't, didn't.&lt;/p&gt;

&lt;p&gt;That math is the job market itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Faking a Signal Becomes Free, the Signal Stops Working
&lt;/h2&gt;

&lt;p&gt;A signal that costs nothing to fake stops being a signal. That's why your inbox is full of 5-star Amazon reviews you don't trust, LinkedIn skill badges nobody verifies, and certificates that mean nothing because the website handed them out for completing a 12-minute video.&lt;/p&gt;

&lt;p&gt;Resumes were already a weak signal a decade ago. Hiring managers admitted as much. They kept reading them because no cheaper alternative existed. The cost of writing a CV was high enough to discourage casual applicants and low enough that serious candidates still bothered. That equilibrium held for fifty years.&lt;/p&gt;

&lt;p&gt;The AI didn't kill the resume. The resume was already wounded. AI just made the cost of producing one drop to zero, and the equilibrium snapped.&lt;/p&gt;

&lt;p&gt;What AI fabricates for free, the market stops valuing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Resume Is Done. The Numbers Already Buried It.
&lt;/h2&gt;

&lt;p&gt;TestGorilla surveyed 2,160 employers and candidates across the US and UK this year. &lt;strong&gt;85% of employers&lt;/strong&gt; now use skills-based hiring. That number was 81% last year and 73% the year before. The trend isn't subtle.&lt;/p&gt;

&lt;p&gt;Same survey: &lt;strong&gt;71% of employers&lt;/strong&gt; say skills tests predict performance better than resumes. &lt;strong&gt;86% of US hiring managers&lt;/strong&gt; and &lt;strong&gt;89% of UK hiring managers&lt;/strong&gt; report problems with CVs. One in three recruiters admits they can't tell if the resume in front of them is accurate. They're not even pretending anymore.&lt;/p&gt;

&lt;p&gt;Half the employers in the survey have dropped degree requirements. Two thirds say their AI-generated cover letter detector is busy. (It's not very good. It just runs all the time.)&lt;/p&gt;

&lt;p&gt;So what do they trust? They trust what you can build in front of them with constraints, time pressure, and a brief. Grafe put it bluntly in the same Sherwood piece: "the bar has shifted from do you understand technology to can you produce something with it."&lt;/p&gt;

&lt;p&gt;The resume was fakeable long before AI. AI just dropped the price to zero, and the filter cracked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coding Is the New Word. It Took 18 Months Instead of 10 Years.
&lt;/h2&gt;

&lt;p&gt;Remember when knowing Word was a silent prerequisite for any office job? Nobody put "Microsoft Word, intermediate" on a resume because it was assumed. The shift took 10 years. From "secretaries use it" to "if you can't, the door is on your left."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vibe coding&lt;/em&gt; is doing the same thing in 18 months. And no, this isn't just a tech industry move.&lt;/p&gt;

&lt;p&gt;Harlem Capital, a venture fund, published their interview process. A senior associate candidate had to build an AI agent that automates industry research in a week, then brief the partners with the output. Another candidate had to vibe code a portfolio dashboard. Their head of talent Nicole DeTommaso wrote it in plain English: "You are not told which tools to use or how to go about it. You are just expected to figure it out."&lt;/p&gt;

&lt;p&gt;Crux Analytics, an analytics firm, embeds a practical AI project in every single hire, technical or not. CEO Jacob Bennett told Sherwood the test is about "where did they use AI, where didn't they, and why". The code itself isn't the point.&lt;/p&gt;

&lt;p&gt;BlueAlpha, the marketing agency from the top of this article, makes commercial candidates fire up Claude Code during the interview itself.&lt;/p&gt;

&lt;p&gt;Look at that list. A VC fund, an analytics firm, a marketing agency. None of them is hiring developers.&lt;/p&gt;

&lt;p&gt;Word took 10 years because companies pushed the tool. Vibe coding takes 18 months because the interview is the push. You don't get hired and then learn it. You learn it before you walk in, or you don't walk in at all.&lt;/p&gt;

&lt;p&gt;Call it an audition, because that's what it is. Vague brief, five days, someone watches what comes out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cheating Trick You Spent a Year Learning Just Became the Test.
&lt;/h2&gt;

&lt;p&gt;Resume Genius surveyed 1,000 active US job seekers this year. &lt;strong&gt;22% admit using AI&lt;/strong&gt; in real time during interviews. &lt;strong&gt;78% use AI&lt;/strong&gt; somewhere in the job hunt. There's a YouTube short selling the trick that's doing absurdly well for its channel, the kind of outlier you don't get without a real candidate appetite. The title sells the secret. The audience confirms it.&lt;/p&gt;

&lt;p&gt;A whole industry sprang up to sell stealth AI overlays. Fake browser tabs, transparent windows, fancy keyboard shortcuts, the works. Candidates spent 18 months learning to hide AI from interviewers.&lt;/p&gt;

&lt;p&gt;Then this happened.&lt;/p&gt;

&lt;p&gt;Canva published a public engineering blog called &lt;em&gt;Yes, You Can Use AI in Our Interviews. In fact, we insist&lt;/em&gt;. They expect candidates to use Copilot, Cursor, and Claude during the technical round. Half their frontend and backend engineers are daily AI users, so the interview now matches the job. They evaluate when and how candidates lean on AI, how they break down ambiguous briefs, and whether they can spot bugs in AI-generated code.&lt;/p&gt;

&lt;p&gt;Sierra rewrote their entire onsite around AI. Plan, build for two hours with full AI access, then review. Codebase-debugging rounds where candidates improve draft PRs with coding agents. Harlem Capital, Crux Analytics, BlueAlpha. Same move.&lt;/p&gt;

&lt;p&gt;Not everywhere yet. Plenty of companies still proctor with anti-AI software, lock the browser, and watch your tab switches. The stealth trick has a market for now. But the companies that set the benchmark, the ones whose hiring practices get copied six months later, have already flipped.&lt;/p&gt;

&lt;p&gt;And yes, AI doesn't always make you faster. METR ran a 2025 field study on experienced open-source devs. With AI active on real tasks, they ran &lt;strong&gt;19% slower&lt;/strong&gt;. They thought they'd be faster. They were wrong. The interview is built around judgment, not speed. When to lean on the tool and when to set it aside.&lt;/p&gt;

&lt;p&gt;You learned to cheat just in time for cheating to become the test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Catches Before You Sprint.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Today's skill is not tomorrow's leverage.&lt;/strong&gt; Stefan Stern, visiting professor at Bayes Business School, gave Sherwood the cleanest pushback: "attitude is a more important consideration than today's aptitude". An employer that over-indexes on current vibe coding skills risks missing candidates who would learn faster and outpace them in six months. Smart hiring managers know this trap. They watch for the candidate who didn't ship the cleanest prototype but explained their reasoning best.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill atrophy is real.&lt;/strong&gt; If you delegate every line of code and every decision frame to the AI, you lose the ability to judge what comes back. And judgment is exactly what the recruiters are testing. The candidate who shipped the prototype but can't say why they used the tool here and not there fails the interview just as hard as the one who didn't ship at all. I covered the move from gambling-with-AI to actually shipping in &lt;a href="https://rentierdigital.xyz/blog/i-stopped-vibe-coding-and-started-prompt-contracts-claude-code-went-from-gambling-to-shipping" rel="noopener noreferrer"&gt;a method I built after enough vibe coding disasters&lt;/a&gt;, and the symptoms of skill atrophy are very specific.&lt;/p&gt;

&lt;p&gt;What can't be fabricated for free is judgment. The interview is built around that. That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Rules. Start This Week.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Pick one tool. This week. Not three.&lt;/strong&gt; Lovable, Cursor, Claude Code, Replit, whichever. The choice matters less than the wrist time. Sit down with one of them on Saturday and build something tiny. Don't watch tutorials. Don't read comparisons. Build. The 8-step method in &lt;a href="https://www.amazon.com/dp/B0GYQHLSCB" rel="noopener noreferrer"&gt;Vibe Coding, For Real&lt;/a&gt; was written for this exact problem because most non-devs spend three weeks "researching" and ship nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Ship one tiny thing. Public. With a URL.&lt;/strong&gt; A landing page that takes a form. A small tool that does one thing. Anything that has a public URL and survives a real user clicking on it. Recruiters can sniff a side project that lived for two days. They want to see something that survived its own deploy. If you want a quick checklist of what separates a real shipped thing from a demo that explodes the moment a real user touches it, &lt;a href="https://medium.com/@rentierdigital/the-ultimate-guide-how-to-stop-burning-through-lovable-ai-credits-like-a-noob-5600d8942c87" rel="noopener noreferrer"&gt;the credit-burn guide on Lovable&lt;/a&gt; covers most of the failure modes for non-devs starting out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Expect the cliff.&lt;/strong&gt; Your first prototype will work. Your second one will explode. The vibe coding learning curve has a classic drop right when you go from "demo on my laptop" to "must stay up for 24 hours and not leak the database." It happens to everybody. The fault isn't you (it's the first little pig finding out straw doesn't hold against an actual wind).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Be able to say where you used AI and where you didn't.&lt;/strong&gt; And why. This is the test Crux Analytics literally runs. Pick a small project, document your decisions, and rehearse a 90-second walkthrough. Where did you trust the first answer? Where did you push back? Pick the moment something broke and explain what you changed. Karen from Accounting is going to ask you that exact question in November, even if she calls it something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  200 Got the Interview Today. 1,000 Will Tomorrow.
&lt;/h2&gt;

&lt;p&gt;The shift isn't hypothetical, it's already running. The 200 vibe coders who got auditions this month aren't a spike, they're the new floor. Peter Grafe threw 190 PDFs in the bin without reading them and hired the one who shipped a prototype in five days.&lt;/p&gt;

&lt;p&gt;The only question left is whether you're in the 200, or in the 800 nobody opens.&lt;/p&gt;

&lt;p&gt;Open Claude Code this weekend. It's shorter than your cover letter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;How AI is turning every job interview into a coding interview&lt;/em&gt;, Chris Stokel-Walker, Sherwood News, 6 May 2026: &lt;a href="https://sherwood.news/tech/use-ai-interview/" rel="noopener noreferrer"&gt;https://sherwood.news/tech/use-ai-interview/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;The State of Skills-Based Hiring 2025 Report&lt;/em&gt;, TestGorilla: &lt;a href="https://www.testgorilla.com/skills-based-hiring/state-of-skills-based-hiring-2025/" rel="noopener noreferrer"&gt;https://www.testgorilla.com/skills-based-hiring/state-of-skills-based-hiring-2025/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity&lt;/em&gt;, METR, July 2025: &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>vibecoding</category>
      <category>aicoding</category>
    </item>
    <item>
      <title>Add a CLI to Your App or Watch Claude Code Ping You on Every Feature</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Fri, 08 May 2026 13:41:12 +0000</pubDate>
      <link>https://dev.to/rentierdigital/add-a-cli-to-your-app-or-watch-claude-code-ping-you-on-every-feature-1kfe</link>
      <guid>https://dev.to/rentierdigital/add-a-cli-to-your-app-or-watch-claude-code-ping-you-on-every-feature-1kfe</guid>
      <description>&lt;p&gt;&lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;&lt;u&gt;CLI is the new MCP.&lt;/u&gt;&lt;/a&gt; Slogan aside: CLI is super powers handed to Claude, Codex, and every agent that codes for you. Letting coding LLMs verify their own work programmatically gives them an unfair advantage over classic fullstack apps that didn’t ship that surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;CLI is the new full stack.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; Two apps, same modern stack, a 1.8x gap in commits shipped over 30 days. The gap doesn't come from the AI, the framework, or the backend, but from a layer that "stack 2026" guides forgot, and that your &lt;code&gt;scripts/&lt;/code&gt; folder won't replace.&lt;/p&gt;

&lt;p&gt;The Friday night before I left for the Costa Brava, I wanted to ship one small thing before closing the laptop (without my wife yelling at me).&lt;/p&gt;

&lt;p&gt;Two Claude Code windows side by side. Same prompt to each. Two different apps, same stack at 95%.&lt;/p&gt;

&lt;p&gt;By the time I shut the laptop, one app had shipped a feature in three autonomous iterations. The other had me clicking in the admin like a junior trying to debug prod from a phone. Same agent, same me, one folder of difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same Stack, One Folder of Difference
&lt;/h2&gt;

&lt;p&gt;First app, left window. Claude Code writes the mutation, types a command into its terminal, reads the JSON that comes back, spots that one field is wrongly cast, fixes it, retypes. Three iterations in total autonomy. I come back, say “have a good weekend”. The commit is ready, validated at 100%.&lt;/p&gt;

&lt;p&gt;Second app, right window. Claude Code writes the mutation, then stops. It pings me. &lt;em&gt;Can you open the admin and check that this works?&lt;/em&gt; I click. I refresh, then re-click. Re-ping. My wife starts raising her voice. Whatever, we'll see Monday. Otherwise I would have spent the entire evening being the mouse of an agent that was supposed to code in my place.&lt;/p&gt;

&lt;p&gt;Same stack, same me, same Claude Code on both. The only difference fits in one folder.&lt;/p&gt;

&lt;p&gt;The two apps are mine. One is a back-office that syncs a WooCommerce catalog through a partner API every night, plus a weekly CSV feed from a distributor. The other is a back-office piloting a network of WooCommerce client e-shops (deploys, theme updates, plugin sync, the usual fleet thing). Both built over the last six months. Both running Next.js, Convex, shadcn, Vercel. Same &lt;code&gt;CLAUDE.md&lt;/code&gt;, same conventions.&lt;/p&gt;

&lt;p&gt;One has a &lt;em&gt;CLI as its central layer&lt;/em&gt;. The other has a &lt;code&gt;scripts/&lt;/code&gt; folder.&lt;/p&gt;

&lt;p&gt;That's it. That's the whole gap.&lt;/p&gt;

&lt;p&gt;By "CLI" I mean a real entrypoint with sub-commands named after business actions (&lt;code&gt;catalog refresh&lt;/code&gt;, &lt;code&gt;partner sync&lt;/code&gt;, &lt;code&gt;site init&lt;/code&gt;), wired into the exact same business layer that the dashboard uses. You type &lt;code&gt;bun run cli partner sync --dry-run&lt;/code&gt;, and the same code path that runs when an admin clicks "Sync" runs, except it returns JSON to stdout.&lt;/p&gt;

&lt;p&gt;The other app has none of that. Just &lt;code&gt;.mjs&lt;/code&gt; files with names like &lt;code&gt;fix-thing-2025-08.mjs&lt;/code&gt; (admit it, you have a folder like that too). Each one written to run exactly once. Most of them never ran a second time.&lt;/p&gt;

&lt;p&gt;That's the entire difference. And it changed how the agent worked at every level.&lt;/p&gt;

&lt;h2&gt;
  
  
  30 Days of Commits Don't Lie: 1.8x More Features, Half the Fixes
&lt;/h2&gt;

&lt;p&gt;I went back through the git history of both repos over the same 30-day window in May.&lt;/p&gt;

&lt;p&gt;The CLI-app shipped 272 commits. The scripts-app shipped 150. That's a 1.8x ratio, on the same me, same agent, same daily routine.&lt;/p&gt;

&lt;p&gt;Inside the CLI repo, every single sub-command got touched at least once during the window. 100%. Inside the &lt;code&gt;scripts/&lt;/code&gt; folder, only 29% of the files saw any activity. The rest were dormant. 41% of all the script files in the scripts-app had been written, run once, and never opened again. The oldest one I found that fits this profile hadn't been touched in 57 days. I had completely forgotten it existed until I went looking.&lt;/p&gt;

&lt;p&gt;There's one more number that's interesting, but I want to flag it as a &lt;em&gt;hypothesis&lt;/em&gt;, not as proof. Looking at commit messages tagged &lt;code&gt;fix:&lt;/code&gt; versus &lt;code&gt;feat:&lt;/code&gt;, the CLI-app had a fix-to-feat ratio of 0.44. The scripts-app sat at 0.82. Roughly twice as many fix commits per feature on the side without a CLI.&lt;/p&gt;

&lt;p&gt;I can't prove the CLI causes that gap. The two apps have different domain maturity, different complexity, different coverage of edge cases. Half the difference might come from the fact that the back-office for client sites is simply older and more fiddly than the partner API one. But the gap is consistent with what I observe daily, and it tracks the autonomy gap I described in the intro: the agent ships cleaner work when it has a way to verify itself, so fewer regressions sneak through.&lt;/p&gt;

&lt;p&gt;The orphan rate (41% versus 0%) and the velocity gap (1.8x) aren't hypotheses. Those I can read straight from &lt;code&gt;git log&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Mechanism: Agents Need a Text-Structured Surface
&lt;/h2&gt;

&lt;p&gt;The reason the CLI-app produces autonomous iterations while the scripts-app produces ping-fests has nothing to do with code quality or model size.&lt;/p&gt;

&lt;p&gt;It's about the surface the agent has to validate its own work.&lt;/p&gt;

&lt;p&gt;Think about what Claude Code actually does in a feature loop. It writes code, then it needs to know if the code does what it was supposed to do. If the only way to check is "open the dashboard, click around, look at the screen", the agent can't do it. Browsers return DOM. DOM without a human eye to interpret what's rendered is opaque to an agent. The colors, the loading states, the modal that popped up, the validation message at the bottom, all of it is meaningful to a person and noise to a model. The agent has no ground truth, so it stops and asks you.&lt;/p&gt;

&lt;p&gt;A CLI returns text. JSON, structured stdout, exit codes. Things an agent can read, parse, reason about. The agent runs the command, reads the output, sees that &lt;code&gt;partnerStatus: "rejected"&lt;/code&gt; means the mutation didn't go through, fixes the code, runs again. No human in the loop. The feedback signal is &lt;em&gt;natively legible&lt;/em&gt; to the model.&lt;/p&gt;

&lt;p&gt;That's the whole principle. A text-structured surface gets you an autonomous agent. A DOM-only surface gets you an agent that pings you on every iteration.&lt;/p&gt;

&lt;p&gt;This is also why MCP servers, REST APIs, tRPC endpoints, GraphQL all work for agents calling your service. They're all text-structured surfaces. A CLI is just the simplest, most local incarnation of this principle for the agent that's coding &lt;em&gt;your own app&lt;/em&gt;. Not calling a remote service. Writing code in your repo and needing to test it now.&lt;/p&gt;

&lt;p&gt;You can simulate this with Playwright pointed at your dashboard. People do. It works, sort of. It also costs a 10x slowdown, a flaky retry layer, and a screenshot-comparison step that breaks every time you ship a UI change. A CLI returns the same answer in milliseconds with no flakiness, because text was always the thing the agent wanted in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2026 Stack Forgot a Layer (And Every AI Code Gen Tool Skips It Too)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-stack-quot-sous-titre-quot-three-layers-410863ce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-stack-quot-sous-titre-quot-three-layers-410863ce.png" alt="TITRE " width="768" height="1029"&gt;&lt;/a&gt;&lt;br&gt;The Four-Layer Development Stack: Three Bright, One Forgotten
  &lt;/p&gt;

&lt;p&gt;Go read any "best stack to launch your AI-coded tool" guide written between February and April 2026. KDnuggets, Idlen, Context Studios, MindStudio, you can pick one at random. They all converge on the same six or seven layers. Next.js for the frontend. shadcn for the UI kit. Supabase or Convex for the backend. Clerk for auth. Stripe for payments. Resend for transactional email. Vercel for hosting. Some add Tailwind, OpenAI, Claude, Gemini.&lt;/p&gt;

&lt;p&gt;There are at least 50 of those guides published in the last three months. None of them mentions a CLI for your app.&lt;/p&gt;

&lt;p&gt;Same blind spot on the AI side. Cursor, v0, Bolt, Lovable, Claude Code itself when it scaffolds a new project. All of them generate a frontend, a backend, a hosting config. Zero of them generate a CLI as a first-class layer. If you ask Claude Code to "set up a Next.js app with Convex and Stripe", you'll get those three things and nothing else. The CLI, if any, will appear later as scaffolding (&lt;code&gt;next dev&lt;/code&gt;, &lt;code&gt;convex dev&lt;/code&gt;) and that's it.&lt;/p&gt;

&lt;p&gt;This wasn't a problem in 2020. In 2020, you wrote your own code, and your IDE was your feedback loop. F5, F12, console.log, console.log, console.log. The DOM was fine because you were the one reading it.&lt;/p&gt;

&lt;p&gt;In 2026, you're not the one writing most of the code. The agent is. And the agent doesn't have eyes.&lt;/p&gt;

&lt;p&gt;A 2026 stack with no CLI layer forces the agent to depend on you for every iteration. The agent writes a mutation, you click in the admin, you tell the agent if it worked. The agent writes a sync job, you &lt;code&gt;tail -f&lt;/code&gt; the logs, you tell the agent what you saw. Every feature loop has you as the mandatory middle node. You think you're prompting an agent to ship for you, you're actually playing browser intermediary for the agent.&lt;/p&gt;

&lt;p&gt;The fourth layer follows from one fact: if you want the agent to ship autonomously, you need to give it a surface it can read.&lt;/p&gt;

&lt;p&gt;Idlen's piece argues that picking the wrong backend means rewriting your data models at 2am. Yeah, and it's worse if you don't have a CLI, because you're rewriting them by hand instead of running &lt;code&gt;bun run cli model migrate&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Scripts Rot and CLIs Live
&lt;/h2&gt;

&lt;p&gt;The 41% orphan rate doesn't come from laziness. It comes from the fact that a &lt;code&gt;scripts/&lt;/code&gt; folder doesn't ask anything of you architecturally.&lt;/p&gt;

&lt;p&gt;You write &lt;code&gt;scripts/migrate-orders-2025-04.mjs&lt;/code&gt; because you have an emergency. You run it once. It works. You commit it (or you don't, depending on how panicked you were). Three weeks later, another emergency. You write &lt;code&gt;scripts/migrate-orders-fix.mjs&lt;/code&gt;. Same problem, slightly different name. You don't reuse the first one because you don't remember it exists. There's no &lt;code&gt;scripts/ --help&lt;/code&gt;. There's just an &lt;code&gt;ls&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The whole folder ends up like Karen from Accounting's filing cabinet: technically organized, practically unusable. Everything is "there", nobody knows where, even Karen has stopped looking.&lt;/p&gt;

&lt;p&gt;A CLI forces a different shape. You can't add &lt;code&gt;partner sync&lt;/code&gt; as a sub-command without registering it in the entrypoint, which means you see all the other sub-commands every time you add a new one. Discoverability is built into the tool. New sub-commands inherit the same flags (&lt;code&gt;--dry-run&lt;/code&gt;, &lt;code&gt;--limit&lt;/code&gt;, &lt;code&gt;--verbose&lt;/code&gt;), the same logger, the same error handling. Idempotence becomes easy because you're already passing through a shared business layer that the dashboard also uses.&lt;/p&gt;
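
&lt;p&gt;One way to make that inheritance concrete, sketched rather than lifted from my repo: a small &lt;code&gt;register&lt;/code&gt; helper that every sub-command passes through, so the shared flags, the logger, and the error handling come for free and the entrypoint stays the one place where commands are listed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// register.ts: a hypothetical helper, not lifted from my repo.
// Every sub-command inherits the same flags, logger, and error handling.
import type { CAC } from "cac";
import { logger } from "./lib/logger"; // assumed shared logger

type SharedFlags = { dryRun: boolean; limit?: number; verbose: boolean };

export function register(cli: CAC, name: string, description: string, run: (flags: SharedFlags) =&gt; unknown) {
  cli
    .command(name, description)
    .option("--dry-run", "Preview without writing")
    .option("--limit [n]", "Cap the number of records")
    .option("--verbose", "Chattier logs")
    .action(async (options) =&gt; {
      const flags: SharedFlags = {
        dryRun: Boolean(options.dryRun),
        limit: options.limit ? Number(options.limit) : undefined,
        verbose: Boolean(options.verbose),
      };
      try {
        // Every command prints the same JSON-shaped result to stdout.
        console.log(JSON.stringify(await run(flags), null, 2));
      } catch (err) {
        logger.error(err);
        process.exitCode = 1;
      }
    });
}
&lt;/code&gt;&lt;/pre&gt;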

&lt;p&gt;That's why the touched-rate sits at 100% on the CLI side. I'm not more disciplined when I use a CLI. The CLI is just architecturally hostile to throwaway code in a way &lt;code&gt;scripts/&lt;/code&gt; never is.&lt;/p&gt;

&lt;p&gt;And &lt;code&gt;--help&lt;/code&gt; is doing more than helping you. It's the entrypoint for any agent that lands on your repo. Claude Code types &lt;code&gt;bun run cli --help&lt;/code&gt; once and now knows every business action it can trigger, with its flags and its description. No prompt engineering, no doc to feed. The CLI documents itself, to humans and to agents at the same time. That's what &lt;code&gt;scripts/&lt;/code&gt; will never give you, no matter how clean your filenames are.&lt;/p&gt;

&lt;p&gt;Caveat I should put right here, while I'm bragging. My own CLI has a real weakness. Out of 14 sub-commands, 11 have no description in the &lt;code&gt;--help&lt;/code&gt; output. That's 79% of my commands appearing as bare names with no explanation. The CLI forced execution discipline. It did not force documentation discipline. Claude Code can still discover every command, parse the JSON output, and use it. A junior dev opening the repo for the first time would have to read the source. I'm fixing it slowly, but the lesson stands: the architecture solves the &lt;em&gt;running&lt;/em&gt; problem, not the &lt;em&gt;teaching&lt;/em&gt; problem. You still have to write the docstrings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your App Is Already Agentic by Accident
&lt;/h2&gt;

&lt;p&gt;The thing nobody tells you in the stack-2026 guides: a CLI that shares the business layer with your UI makes your app natively &lt;em&gt;agent-ready&lt;/em&gt;. Not as a separate product. As a free side effect.&lt;/p&gt;

&lt;p&gt;Three concrete ways to expose your CLI to an agent that isn't sitting in your IDE.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrap it as an MCP server.&lt;/strong&gt; Maybe 50 lines of TypeScript. You write a thin MCP server that registers each sub-command of your CLI as an MCP tool. The tool input maps to CLI flags. The tool output is the JSON the CLI already returns. Boom, any MCP client (Claude Desktop, Cursor, anything that speaks MCP) can call your CLI as a native tool. You wrapped your existing CLI and called it an MCP server.&lt;/p&gt;
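
&lt;p&gt;A sketch of that wrapper, assuming the official TypeScript MCP SDK and zod for the input schema. The tool name, the flag, and the &lt;code&gt;bun run cli&lt;/code&gt; invocation are placeholders for your own commands.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// mcp-server.ts: a sketch of exposing CLI sub-commands as MCP tools.
// Assumes the @modelcontextprotocol/sdk package and zod; adapt names to your CLI.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);
const server = new McpServer({ name: "app-cli", version: "1.0.0" });

// One MCP tool per CLI sub-command; flags become tool inputs.
server.tool(
  "partner_sync",
  "Run the partner sync through the app CLI",
  { dryRun: z.boolean().optional() },
  async ({ dryRun }) =&gt; {
    const args = ["run", "cli", "partner-sync"];
    if (dryRun) args.push("--dry-run");
    const { stdout } = await run("bun", args);
    // The CLI already prints JSON, so pass it straight through.
    return { content: [{ type: "text" as const, text: stdout }] };
  }
);

await server.connect(new StdioServerTransport());
&lt;/code&gt;&lt;/pre&gt;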

&lt;p&gt;&lt;strong&gt;Cron plus agent.&lt;/strong&gt; A scheduler runs &lt;code&gt;bun run cli catalog refresh&lt;/code&gt; every six hours. The JSON output goes into a Convex table. A background agent reads the latest row, decides if the refresh hit a partner error, and if so triggers a follow-up &lt;code&gt;bun run cli partner reconnect&lt;/code&gt;. No browser. No human. The agent makes decisions based on text the CLI emits, then triggers more CLI commands. You just turned your back-office into a self-healing loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP gateway shell-out.&lt;/strong&gt; You expose a tiny Express or Hono endpoint that takes a CLI command name plus args, shells out to the CLI, returns the JSON. Authenticated of course. Now any external agent that speaks HTTP can drive your app. No SDK to maintain. The CLI is the SDK.&lt;/p&gt;
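
&lt;p&gt;Sketched with Hono below (any HTTP framework works the same way). The bearer token, the allow-list, and the &lt;code&gt;bun run cli&lt;/code&gt; path are assumptions to adapt, and you'd want to harden this before exposing it anywhere public.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// gateway.ts: a minimal shell-out gateway, sketched with Hono.
// The allow-list and bearer token are illustrative; harden before exposing.
import { Hono } from "hono";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);
const ALLOWED = new Set(["partner-sync", "catalog-refresh", "site-init"]);
const app = new Hono();

app.post("/cli/:command", async (c) =&gt; {
  if (c.req.header("authorization") !== `Bearer ${process.env.CLI_GATEWAY_TOKEN}`) {
    return c.json({ error: "unauthorized" }, 401);
  }
  const command = c.req.param("command");
  if (!ALLOWED.has(command)) {
    return c.json({ error: "unknown command" }, 400);
  }
  const body = await c.req.json().catch(() =&gt; ({}));
  const args = Array.isArray(body.args) ? body.args : [];
  const { stdout } = await run("bun", ["run", "cli", command, ...args]);
  // The CLI already emits JSON; the gateway just relays it.
  return c.json(JSON.parse(stdout));
});

export default app; // Bun picks up the default export and serves it
&lt;/code&gt;&lt;/pre&gt;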

&lt;p&gt;None of those three asks for a refactor of your business logic. They're pure exposure layers on top of code you already wrote. One stack, two modes: dashboard for humans, CLI for agents. The dashboard didn't know it had a twin. Now it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Integration Patterns (Pick One, Pick Right)
&lt;/h2&gt;

&lt;p&gt;If you're going to bake a CLI into your stack, there are three ways to wire it. Only one of them gives you the autonomy gap I described earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: CLI shares the business layer with the UI.&lt;/strong&gt; The dashboard "Sync partner" button calls a Convex mutation. The CLI &lt;code&gt;partner sync&lt;/code&gt; command calls the same mutation, with the same Drizzle schema, same TypeScript types end-to-end. Same idempotence guarantees. Same audit log. This is the one you want. Everything I've been describing assumes this pattern. (&lt;a href="https://rentierdigital.xyz/blog/convex-claude-typescript-saas-backend" rel="noopener noreferrer"&gt;Convex pairs particularly well with Claude Code&lt;/a&gt; for this exact setup, because the typed end-to-end API makes the CLI a thin wrapper around mutations rather than a parallel implementation of them.)&lt;/p&gt;
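
&lt;p&gt;Under Pattern 1, the sub-command body is a thin wrapper around the exact mutation the dashboard button calls. A sketch with Convex's HTTP client; &lt;code&gt;api.partner.sync&lt;/code&gt; and &lt;code&gt;CONVEX_URL&lt;/code&gt; are placeholders for whatever your generated API and environment actually expose.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// commands/partner-sync.ts: a Pattern 1 sketch, same mutation as the dashboard.
// api.partner.sync and CONVEX_URL are placeholders for your generated API and env.
import { ConvexHttpClient } from "convex/browser";
import { api } from "../convex/_generated/api";

const client = new ConvexHttpClient(process.env.CONVEX_URL!);

export async function partnerSync(flags: { dryRun: boolean; limit?: number }) {
  // The dashboard's "Sync partner" button calls this exact mutation.
  return client.mutation(api.partner.sync, {
    dryRun: flags.dryRun,
    limit: flags.limit,
  });
}
&lt;/code&gt;&lt;/pre&gt;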

&lt;p&gt;&lt;strong&gt;Pattern 2: CLI as HTTP client of your own API.&lt;/strong&gt; The CLI calls your REST or tRPC endpoints. Easier to isolate, language-agnostic, you can ship the CLI to clients who don't run your monorepo. But you lose the typing benefits, you have to handle auth manually, and idempotence is up to whoever wrote the endpoint. Acceptable as a fallback if your backend is in a different repo than your CLI consumer. Not optimal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: DevOps CLI, separate from the app.&lt;/strong&gt; Deploy commands, backup scripts, monitoring tools. Useful, but it's not a substitute. If your app is a living product, you also need Pattern 1 or 2 alongside it. Pattern 3 alone is what most teams ship and what gets confused for "we have a CLI". It's just a deploy script.&lt;/p&gt;

&lt;p&gt;Verdict: Pattern 1 is the only one that returns the velocity gap. Pattern 2 is half the work for a fraction of the benefit. Pattern 3 is hosting plumbing dressed up as a CLI.&lt;/p&gt;

&lt;p&gt;If you can only build one pattern, build the first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tooling: cac vs citty vs the Rest in 2026
&lt;/h2&gt;

&lt;p&gt;Quick rundown of what's actually worth using to build the CLI itself, since this is where most people get stuck for a weekend.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cac&lt;/code&gt; is my default. About 2 KB, zero dependencies, ESM-first. If your CLI has fewer than 20 sub-commands, this is the right tool. Small enough that you don't think about it, and Claude Code generates clean cac code on the first try.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;citty&lt;/code&gt; from the UnJS folks is the ascending pick for 2026. Type-safe, lazy-loading sub-commands (matters when you start hitting 30+), ESM-first, plays nicely with Nitro and the rest of the UnJS world. Migrate to it when your CLI grows past where cac feels cramped.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;commander&lt;/code&gt; is the legacy mature option. Stable, well-documented, will do the job, but the API feels older and the bundle is heavier than it needs to be. Choose it only if your team already knows it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;clipanion&lt;/code&gt; is OOP-flavored, used by Yarn. Good if you like classes and want strict typing. Niche.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;oclif&lt;/code&gt; is over-architected unless your CLI itself is the product (think Heroku, Salesforce). For a CLI that supports an app, oclif is bringing a forklift to move a couch.&lt;/p&gt;

&lt;p&gt;For the rest of the experience, you want &lt;code&gt;clack&lt;/code&gt; for prompts (gorgeous TUI, very recent), &lt;code&gt;picocolors&lt;/code&gt; for colors (smaller and faster than chalk now), &lt;code&gt;consola&lt;/code&gt; for logging, &lt;code&gt;listr2&lt;/code&gt; if you have multi-step tasks with progress bars, and &lt;code&gt;bun shell&lt;/code&gt; or &lt;code&gt;zx&lt;/code&gt; for embedded scripts.&lt;/p&gt;

&lt;p&gt;Start on cac. Migrate to citty when you cross 20 sub-commands.&lt;/p&gt;

&lt;p&gt;Don't overthink it.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the Missing CLI Hurts (Four Scenarios)
&lt;/h2&gt;

&lt;p&gt;Four moments where the absence of a CLI costs you specifically, in case the abstract argument hasn't landed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding a new client e-shop.&lt;/strong&gt; Without a CLI, each new client is two to three hours of clicking in the admin: provision domain, set theme, install plugins, seed catalog, configure DNS. Multiply by ten clients in a month. With a CLI, &lt;code&gt;site init shop.example.com&lt;/code&gt; runs the whole sequence in five minutes. The agent can run it on its own when a Stripe webhook fires "new customer".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recurring data fix.&lt;/strong&gt; A partner sometimes returns malformed prices in their API. Without a CLI, every incident means rewriting the fix mutation by hand, or digging through &lt;code&gt;scripts/&lt;/code&gt; to find "the one that worked last time". With a CLI, you have &lt;code&gt;bun run cli prices reconcile --dry-run&lt;/code&gt;, idempotent, versioned, documented in &lt;code&gt;--help&lt;/code&gt;. The agent invokes it itself when the alert fires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit during incident.&lt;/strong&gt; Something broke in prod, you need to know which orders were affected. Without a CLI, you grep through &lt;code&gt;scripts/&lt;/code&gt; for "that audit thing I wrote in March". With a CLI, &lt;code&gt;cli orders audit --since=2026-04-01&lt;/code&gt; exists, is documented, and the agent can run it while you're still typing in Slack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External data refresh.&lt;/strong&gt; Cron has to refresh a partner catalog every night. Without a CLI, the cron points to &lt;code&gt;node scripts/old-thing.mjs&lt;/code&gt; and the file slowly drifts out of sync with the schema, until one Tuesday it fails silently for 48 hours before someone notices. With a CLI, the cron points to &lt;code&gt;bun run cli partner refresh&lt;/code&gt;, which shares the same business layer as the dashboard, so a schema change breaks the cron at the next deploy instead of in the middle of the night.&lt;/p&gt;

&lt;p&gt;Same four problems. The CLI makes each one boring.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-Second Test Your Stack Has to Pass Today
&lt;/h2&gt;

&lt;p&gt;Open your terminal. &lt;code&gt;cd&lt;/code&gt; into your repo. Type &lt;code&gt;bun run cli --help&lt;/code&gt; (or &lt;code&gt;yarn cli --help&lt;/code&gt;, or &lt;code&gt;npm run cli -- --help&lt;/code&gt;, whatever your package manager).&lt;/p&gt;

&lt;p&gt;There are exactly three possible outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome A.&lt;/strong&gt; Nothing comes out. Or "command not found". Or &lt;code&gt;package.json&lt;/code&gt; doesn't have a &lt;code&gt;cli&lt;/code&gt; script. You don't have a CLI. You have a UI bolted onto a backend. The agent that codes your app depends on you for every iteration, and the orphan rate of your &lt;code&gt;scripts/&lt;/code&gt; folder is climbing slowly toward 41% whether you measure it or not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome B.&lt;/strong&gt; A list shows up, but the sub-commands are generic devops things (&lt;code&gt;build&lt;/code&gt;, &lt;code&gt;dev&lt;/code&gt;, &lt;code&gt;test&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;) with no business actions. You have devops scaffolding. Useful, but the agent can deploy your code without being able to validate that a feature works. You're at Pattern 3 of the three patterns above. Halfway there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome C.&lt;/strong&gt; A list shows up with sub-commands named after business actions (&lt;code&gt;site init&lt;/code&gt;, &lt;code&gt;partner sync&lt;/code&gt;, &lt;code&gt;catalog refresh&lt;/code&gt;), each with a description. You have a 2026-ready stack. The agent that writes your code has a way to verify it. Your &lt;code&gt;scripts/&lt;/code&gt; folder is empty or has fewer than five files. You can stop reading.&lt;/p&gt;

&lt;p&gt;If you got A or B, this is where you start. Pick one or two business actions you do most often (the ones that show up in your &lt;code&gt;scripts/&lt;/code&gt; folder under three different names), and make them the first two sub-commands of a real CLI. Wire them through the same business layer the dashboard uses. Make the output JSON-shaped. That's the smallest possible Pattern 1, and it'll change how Claude Code works on your repo by tomorrow morning.&lt;/p&gt;
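
&lt;p&gt;"JSON-shaped" matters more than it sounds: the agent wants one predictable envelope to parse, whichever sub-command it just ran. Something like this, as an illustration rather than a standard:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// result.ts: an illustrative envelope, not a standard.
export type CliResult = {
  ok: boolean;                          // mirror this in the exit code
  command: string;                      // e.g. "partner-sync"
  dryRun: boolean;
  counts?: { [key: string]: number };   // e.g. { updated: 42, skipped: 3 }
  errors?: string[];                    // human-readable, still parseable
};
&lt;/code&gt;&lt;/pre&gt;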

&lt;p&gt;I already wrote about &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;CLIs as the interface for agents calling your tools&lt;/a&gt;. This one is about CLIs as the interface for the agent writing your code from the inside. Different problem, same layer. The first one is about MCP versus CLI as a remote calling convention. This one is about whether the agent in your IDE has a way to ship.&lt;/p&gt;

&lt;p&gt;The next time you start an app, decide on day one whether the CLI is a kernel or an afterthought. That choice decides how much time Claude Code spends coding for you, versus how much time you spend being the mouse of Claude Code.&lt;/p&gt;




&lt;p&gt;Six months building two apps in parallel and I didn't realize I was running a controlled experiment on myself. As an old Linux head, I already knew CLI beats most things, intuitively. What I didn't see coming was the part that mattered most: not just speed and scriptability, but giving the agent a feedback loop it could read on its own.&lt;/p&gt;

&lt;p&gt;Claude and Codex won't suggest this by default. So tell them yourself: bake a CLI layer as the kernel, day one.&lt;/p&gt;

&lt;p&gt;I'm out, piña colada's waiting 😎&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CLI was the layer the whole time.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;KDnuggets, &lt;em&gt;Tech Stack for Vibe Coding Modern Applications&lt;/em&gt; (February 2026)&lt;/li&gt;
&lt;li&gt;Idlen, &lt;em&gt;The Best Stack to Launch Your AI-Coded Tool in 2026&lt;/em&gt; (April 2026)&lt;/li&gt;
&lt;li&gt;Context Studios, &lt;em&gt;The Perfect Vibe Coding Tech Stack 2026: 10 Tools Every App Needs&lt;/em&gt; (February 2026)&lt;/li&gt;
&lt;li&gt;First-hand audit, two of my own apps, 30 days of git history (May 2026)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>aiagents</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Claude Code Was Broken for 6 Weeks. AMD Caught It in 6,852 Sessions Before Anthropic Did.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Thu, 07 May 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/claude-code-was-broken-for-6-weeks-amd-caught-it-in-6852-sessions-before-anthropic-did-7i5</link>
      <guid>https://dev.to/rentierdigital/claude-code-was-broken-for-6-weeks-amd-caught-it-in-6852-sessions-before-anthropic-did-7i5</guid>
      <description>&lt;p&gt;For six weeks, you thought you were writing your prompts wrong.&lt;/p&gt;

&lt;p&gt;You could feel Claude Code messing up. Refactors going sideways, files edited without being read, thinking cut mid-sentence. You re-read your CLAUDE.md, tweaked your instructions, blamed your harness. The Anthropic dashboard said everything was fine.&lt;/p&gt;

&lt;p&gt;You had your feeling against their telemetry.&lt;/p&gt;

&lt;p&gt;Guess who lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; On April 23, 2026, the day &lt;strong&gt;GPT-5.5 dropped&lt;/strong&gt;, Anthropic published a &lt;strong&gt;postmortem&lt;/strong&gt; validating &lt;strong&gt;six weeks of user complaints&lt;/strong&gt;. Twenty-one days earlier, an &lt;strong&gt;AI director at AMD&lt;/strong&gt; had already filed a &lt;strong&gt;forensic audit&lt;/strong&gt; of &lt;strong&gt;6,852 sessions&lt;/strong&gt; on GitHub. The bugs are documented, the timing is worse, and the lesson isn't the one most coverage is selling.&lt;/p&gt;

&lt;p&gt;For most press coverage, the event is the postmortem. Not for this article. The event is the &lt;strong&gt;21 days&lt;/strong&gt; between the AMD audit and the Anthropic confirmation, the word a tech publication put in its headline without drawing the operational consequence, and the reason thousands of paying devs spent six weeks doubting themselves while the truth was sitting on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Postmortem Dropped on GPT-5.5 Day. The Audit Dropped Three Weeks Earlier.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-six-weeks-nobody-confirmed-quot-sous-titre-b194c142.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-six-weeks-nobody-confirmed-quot-sous-titre-b194c142.png" alt="TITRE &amp;quot;The Six Weeks Nobody Confirmed&amp;quot; + sous-titre &amp;quot;From the first silent change to the public postmortem&amp;quot;. Métaphore : ligne de temps horizontale en forme de tapis qui se déchire au milieu, avec petites mains qui tirent dessus depuis le bas. Style : ligne claire franco-belge, trait noir épais, halftone dots discrets, formes géométriques arrondies. Palette : warm beige #F4E4C1, alarm red #E63946, deep navy #1D3557, soft cream #FFF8E7, black #111111. Contenu : 5 marqueurs sur la timeline, March 4 (default reasoning effort drops), March 26 (caching bug starts), April 2 (Laurenzo files GitHub #42796), April 16 (verbosity cap added), April 23 (Anthropic postmortem). Au-dessus de la timeline, &amp;quot;VENDOR DASHBOARD: ALL GREEN&amp;quot; en typographie machine. En dessous, &amp;quot;USER REALITY: 6,852 sessions degraded&amp;quot; en handwriting. Highlight : la zone entre April 2 et April 23 ressort en surbrillance rouge avec hachures, label &amp;quot;21 days of confirmed silence&amp;quot;. Légende : icône feuille de log = Laurenzo's audit / icône bulle = vendor postmortem. Footer : © rentierdigital.xyz. NOT flat corporate timeline, NOT minimalist tech aesthetic." width="768" height="1029"&gt;&lt;/a&gt;&lt;br&gt;Timeline of Six-Week AI Performance Degradation Incident
  &lt;/p&gt;

&lt;p&gt;April 23, 2026. Anthropic published its postmortem.&lt;/p&gt;

&lt;p&gt;The same day, OpenAI shipped GPT-5.5. The timing wasn't lost on anyone reading the dev forums that morning.&lt;/p&gt;

&lt;p&gt;The postmortem documented three changes that silently degraded Claude Code over six weeks. &lt;strong&gt;Default reasoning effort&lt;/strong&gt; dropped from "high" to "medium" between March 4 and April 7, thirty-three days. A &lt;strong&gt;caching bug&lt;/strong&gt; (&lt;code&gt;clear_thinking_20251015&lt;/code&gt; with &lt;code&gt;keep:1&lt;/code&gt;) ran on every turn instead of once, between March 26 and April 10, fifteen days. A &lt;strong&gt;system prompt verbosity limit&lt;/strong&gt; capped responses at 25 words between tool calls and 100 words for the final response, between April 16 and April 20, four days.&lt;/p&gt;

&lt;p&gt;Anthropic called the first one "the wrong tradeoff." That phrase is rare. Vendors usually say "we have identified an issue" or "an unexpected interaction." Not "the wrong tradeoff."&lt;/p&gt;

&lt;p&gt;For most coverage, that was the event. The bugs catalogued, the fixes shipped in v2.1.116, the usage limits reset, the API unaffected. Roll credits.&lt;/p&gt;

&lt;p&gt;Not for this article.&lt;/p&gt;

&lt;p&gt;Twenty-one days before the postmortem, on April 2, &lt;strong&gt;Stella Laurenzo&lt;/strong&gt;, Senior Director of AI at AMD and former lead of Google's OpenXLA project, filed GitHub issue #42796 against the Claude Code repo. She attached 6,852 sessions of telemetry, named the regressions, documented the dates, and quoted Anthropic's own behavior back to itself.&lt;/p&gt;

&lt;p&gt;She knew. Reddit and Twitter had been logging the same symptoms for weeks.&lt;/p&gt;

&lt;p&gt;Anthropic took three weeks to confirm.&lt;/p&gt;

&lt;p&gt;Every vendor ships bugs. The story is the timeline. Six weeks of degraded code stayed invisible to thousands of paying customers until somebody outside the building built her own forensic infrastructure and dropped the receipts on GitHub. The bugs are documented. The timeline is what nobody wants to talk about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Audit That Forced the Confession
&lt;/h2&gt;

&lt;p&gt;Stella Laurenzo doesn't tweet vibes.&lt;/p&gt;

&lt;p&gt;She runs AI infrastructure at AMD. Before that, she led the OpenXLA project at Google. Her audit reads like a court filing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub issue #42796.&lt;/strong&gt; 6,852 Claude Code sessions captured between January and early April. 234,760 tool calls. 17,871 thinking blocks.&lt;/p&gt;

&lt;p&gt;The behavioral metrics were the part nobody could argue with. &lt;strong&gt;Median thinking length&lt;/strong&gt; went from 2,200 characters in January to 600 characters in March, a &lt;strong&gt;73% collapse&lt;/strong&gt;. Files-read-before-edit dropped from 6.6 to 2.0. Stop-hook violations climbed from zero to roughly ten a day after March 8.&lt;/p&gt;

&lt;p&gt;These aren't perceptual claims. Nobody's saying "it feels worse." She measured what the agent did, and the agent did less. Less reading, less thinking, more premature stops.&lt;/p&gt;

&lt;p&gt;The conclusion landed at the top of the issue: "Claude cannot be trusted to perform complex engineering tasks."&lt;/p&gt;

&lt;p&gt;Read that sentence again with the source attached. AI infrastructure director of one of the largest chip makers on the planet. 234,760 tool calls behind it.&lt;/p&gt;

&lt;p&gt;Then a detail that should have ended the news cycle right there. &lt;strong&gt;AMD switched providers&lt;/strong&gt; during the incident. The Register reported it on April 6. Laurenzo wrote that her team had moved to another vendor producing superior quality work, with the implication that they kept the Claude option open in hope it would get fixed. She didn't say which provider.&lt;/p&gt;

&lt;p&gt;A few caveats, because honesty matters. Anthropic disputed some interpretations on the issue thread itself. And a separate viral benchmark claim from a different group, circulating in parallel at the time, was independently debunked for methodology issues. Worth not mixing up with the Laurenzo audit, which stands on its own numbers.&lt;/p&gt;

&lt;p&gt;Six thousand eight hundred and fifty-two sessions don't unhappen.&lt;/p&gt;

&lt;p&gt;It read like an indictment with footnotes. Anthropic took three weeks to confirm any of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Were Right and Couldn't Prove It
&lt;/h2&gt;

&lt;p&gt;Six weeks before the audit, the dev forums were already on fire.&lt;/p&gt;

&lt;p&gt;Catalin Pit on Twitter, March 20: "Lately, Claude makes some shocking mistakes." On Reddit r/ClaudeCode, April 7, u/marcin_dev posted: "has Claude Code become significantly dumber over the past few days?" The replies all said yes. On Twitter, April 13, @safetyth1rd: "It's taking 2-3x longer to do stuff."&lt;/p&gt;

&lt;p&gt;None of it moved a needle.&lt;/p&gt;

&lt;p&gt;Then, post-postmortem, u/Enthu-Cutlet-1337 wrote the line that everyone in the thread recognized. The 25-word cap explained so much, they had been seeing Opus truncate mid-reasoning on refactors for weeks and "thought my prompts were off."&lt;/p&gt;

&lt;p&gt;Five words doing the heaviest lifting in the whole story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thought my prompts were off.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the cognitive trap. When the user perceives degradation and the vendor dashboard says everything is fine, the user doubts themselves first. Not because they're naive. Because of the asymmetry of evidence.&lt;/p&gt;

&lt;p&gt;The vendor has the telemetry, the eval suites, the regression tests, the dashboards. The user has a feeling. When the feeling and the dashboard disagree, the dashboard wins. It looks more like evidence.&lt;/p&gt;

&lt;p&gt;A vibe is easy to dismiss. "Maybe you wrote the prompt wrong. Maybe your CLAUDE.md drifted. Or the task was just harder this time."&lt;/p&gt;

&lt;p&gt;A 6,852-session audit isn't easy to dismiss.&lt;/p&gt;

&lt;p&gt;That's why nobody confirmed anything until Laurenzo.&lt;/p&gt;

&lt;p&gt;Post-postmortem, u/Sufficient-Farmer243 closed the loop on r/ClaudeCode. They wrote that every single issue the community had been "gaslit" about for weeks turned out to be exactly what people had been describing. (Their wording, in quotes for a reason. Whether you agree with the verb or not, it was the dominant register in the thread.)&lt;/p&gt;

&lt;p&gt;Once the postmortem dropped, the thread filled with confirmation replies. Not new bugs. Old bugs people had been logging silently into private diaries for five weeks straight.&lt;/p&gt;

&lt;p&gt;You weren't wrong. You just didn't have AMD-grade telemetry sitting on your laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Word Anthropic Picked, the Connection Most Coverage Missed
&lt;/h2&gt;

&lt;p&gt;VentureBeat put one word in its headline: "&lt;strong&gt;harnesses&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;"Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation."&lt;/p&gt;

&lt;p&gt;That's the framing Anthropic itself confirmed. The model didn't get worse. The &lt;strong&gt;harness&lt;/strong&gt; around the model got worse. Default reasoning effort. Caching behavior. System prompt verbosity. Three knobs in the wrapper, not the weights.&lt;/p&gt;

&lt;p&gt;Most coverage noted the word and moved on. Few drew the consequence.&lt;/p&gt;

&lt;p&gt;If the harness matters more than the model, and the harness can be silently modified by the vendor for six weeks at a time, then the harness isn't really yours.&lt;/p&gt;

&lt;p&gt;It's their territory.&lt;/p&gt;

&lt;p&gt;Your CLAUDE.md is one layer. The default reasoning effort, the caching behavior, the verbosity prompt, those are layers in their codebase you'll never see. I've written before about &lt;a href="https://rentierdigital.xyz/blog/claude-md-is-the-new-env-and-most-developers-treat-it-like-a-readme" rel="noopener noreferrer"&gt;the layer most developers treat as a readme&lt;/a&gt;, arguing CLAUDE.md was the new .env. I still think that. The piece nobody talks about is what sits underneath.&lt;/p&gt;

&lt;p&gt;You write 47 lines of CLAUDE.md. The vendor's harness loads dozens of instructions before yours even runs. You control the top of the stack. They control everything below.&lt;/p&gt;

&lt;p&gt;When the bottom of the stack changes, your top is decoration.&lt;/p&gt;

&lt;p&gt;What's striking about this postmortem isn't that the harness matters. Most senior devs already suspected it did. The new piece is the published, vendor-confirmed admission that yes, &lt;strong&gt;the wrapper is doing more work than the model&lt;/strong&gt; in many tasks, and yes, the wrapper can be modified mid-month without you knowing.&lt;/p&gt;

&lt;p&gt;Extended thinking is load-bearing for senior engineering workflows. The user-facing layer most paying customers tune (CLAUDE.md, slash commands, custom prompts) sits on top of vendor-controlled defaults that decide how much the model thinks before acting. When those defaults shift, every workflow built on top shifts too. Silently.&lt;/p&gt;

&lt;p&gt;Read your CLAUDE.md tonight. Still useful, still load-bearing in the part you control. But you're tuning the steering wheel.&lt;/p&gt;

&lt;p&gt;Somebody else is changing the gearbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  AMD Switched. Reddit Knew. Anthropic Confirmed Last.
&lt;/h2&gt;

&lt;p&gt;Three facts in a line.&lt;/p&gt;

&lt;p&gt;AMD's AI director switched to another provider during the incident. The Register reported it on April 6. Reddit had been documenting symptoms since early March. Anthropic confirmed the bugs on April 23, twenty-one days after the audit landed in their own GitHub repo.&lt;/p&gt;

&lt;p&gt;Pattern: &lt;strong&gt;operational truth bubbled up from the user base&lt;/strong&gt; before the vendor validated it.&lt;/p&gt;

&lt;p&gt;That's not a fluke. That's the structural shape of any hosted AI degradation. The vendor has eval suites and dashboards optimized for the metrics they care about. The user base runs the real workload, in real codebases, with real consequences. When the two diverge, the user base notices first. The vendor confirms last.&lt;/p&gt;

&lt;p&gt;If the gap is twenty-one days, the user base eats twenty-one days of degraded output.&lt;/p&gt;

&lt;p&gt;If your AI workflow can be silently degraded for six weeks, you don't have a workflow. You have a &lt;strong&gt;single point of failure with autocomplete&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I wrote &lt;a href="https://rentierdigital.xyz/blog/anthropic-just-killed-my-200-month-openclaw-setup-so-i-rebuilt-it-for-15" rel="noopener noreferrer"&gt;the pricing-side version of this argument&lt;/a&gt; last month. Same vendor, different leverage. The reliability-side version is worse, because it's invisible. A pricing change shows up on your invoice. A harness change shows up six weeks later in a forensic audit you didn't run.&lt;/p&gt;

&lt;p&gt;Yes, &lt;strong&gt;multi-stack costs more&lt;/strong&gt; to set up. Routing logic, eval glue, redundant API keys, two flavors of CLAUDE.md to maintain. It's annoying. The cost of not doing it is six weeks of degraded code you shipped without knowing it, plus a 6,852-session audit run by somebody else to find out. You can't observe what the vendor changed, so you hope.&lt;/p&gt;
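
&lt;p&gt;For anyone wondering what "routing logic plus eval glue" even looks like, here is a minimal sketch, not a drop-in tool. The provider callables, the smoke-test prompts, and the log path are placeholders you would replace with your own.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of "routing logic + eval glue", not a drop-in tool. primary/fallback
# are callables that take a prompt and return text (wrap your real SDK calls);
# the smoke tests and the failover log path are placeholders.
import json, time

SMOKE_TESTS = [
    {"prompt": "Refactor this function without changing behavior: def f(x): return x",
     "must_contain": "def "},
]

def passes_smoke_tests(run_model):
    """Run a tiny fixed eval against a provider before trusting it today."""
    return all(case["must_contain"] in run_model(case["prompt"]) for case in SMOKE_TESTS)

def pick_provider(primary, fallback):
    """Prefer the primary stack, but only while it still passes your own evals."""
    if passes_smoke_tests(primary):
        return primary
    with open("failover.log", "a") as fh:
        fh.write(json.dumps({"ts": time.time(), "event": "failover"}) + "\n")
    return fallback
&lt;/code&gt;&lt;/pre&gt;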

&lt;p&gt;Anyway, point is this: you spent six weeks re-reading your prompts while an AI director at AMD was logging 6,852 sessions to prove you weren't crazy.&lt;/p&gt;

&lt;p&gt;Your AI workflow doesn't rest on your harness. It rests on a vendor's patience to maybe ship a postmortem. That's not a workflow, that's a bet.&lt;/p&gt;

&lt;p&gt;Next time something feels off, don't ask if your prompts suck.&lt;/p&gt;

&lt;p&gt;Ask if you have your own telemetry.&lt;/p&gt;
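
&lt;p&gt;The minimum viable version of that telemetry is one JSON line per session, in a file you own. The field names below are invented for illustration; wire it into however you already wrap your sessions. The point is that a six-week drift should show up in your data before it shows up in a vendor postmortem.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A minimal sketch of "your own telemetry": append one JSON line per session
# to a local file, then eyeball weekly aggregates. Field names are invented
# for illustration; adapt them to whatever you already track.
import json, statistics, time
from pathlib import Path

LOG = Path("ai_sessions.jsonl")

def record_session(model, started_at, tests_passed, retries):
    entry = {
        "ts": time.time(),
        "model": model,
        "duration_s": round(time.time() - started_at, 1),
        "tests_passed": tests_passed,   # did your own suite pass afterwards?
        "retries": retries,             # how often you had to re-prompt
    }
    with LOG.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")

def weekly_retry_average():
    entries = [json.loads(line) for line in LOG.read_text().splitlines() if line]
    recent = [e for e in entries if e["ts"] &gt; time.time() - 7 * 86400]
    return statistics.mean(e["retries"] for e in recent) if recent else 0.0
&lt;/code&gt;&lt;/pre&gt;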

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic Engineering, &lt;em&gt;An update on recent Claude Code quality reports&lt;/em&gt;, April 23, 2026: &lt;a href="https://www.anthropic.com/engineering/april-23-postmortem" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/april-23-postmortem&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The Register, &lt;em&gt;Claude Code has become dumber, lazier: AMD director&lt;/em&gt;, April 6, 2026: &lt;a href="https://www.theregister.com/2026/04/06/anthropic_claude_code_dumber_lazier_amd_ai_director/" rel="noopener noreferrer"&gt;https://www.theregister.com/2026/04/06/anthropic_claude_code_dumber_lazier_amd_ai_director/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;VentureBeat, &lt;em&gt;Mystery solved: Anthropic reveals changes to Claude's harnesses&lt;/em&gt;: &lt;a href="https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #42796 (Stella Laurenzo / @stellaraccident): &lt;a href="https://github.com/anthropics/claude-code/issues/42796" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/42796&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>aitools</category>
    </item>
    <item>
      <title>94% of My Claude Code Tokens Went to the Wrong Model. So I Stopped Paying Opus to Do Haiku's Job.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Wed, 06 May 2026 13:41:11 +0000</pubDate>
      <link>https://dev.to/rentierdigital/94-of-my-claude-code-tokens-went-to-the-wrong-model-so-i-stopped-paying-opus-to-do-haikus-job-152l</link>
      <guid>https://dev.to/rentierdigital/94-of-my-claude-code-tokens-went-to-the-wrong-model-so-i-stopped-paying-opus-to-do-haikus-job-152l</guid>
      <description>&lt;p&gt;You feel like you have done everything right with Claude Code. Hooks installed. CLAUDE.md curated to 6,890 tokens. Every MCP server killed off, with the kind of discipline you are silently proud of on a Friday evening.&lt;/p&gt;

&lt;p&gt;Then it is Wednesday, three days before your weekly limits reset, and you are already at 80% usage. Even on the Max plan.&lt;/p&gt;

&lt;p&gt;That is when you start wondering what the discipline was actually buying you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR.&lt;/strong&gt; You can tweak your Claude Code setup all you want. I found a &lt;strong&gt;free tool&lt;/strong&gt; that does not lie about the &lt;strong&gt;bloat&lt;/strong&gt;, cuts &lt;strong&gt;costs drastically&lt;/strong&gt;, and makes &lt;strong&gt;Claude sharper&lt;/strong&gt;, not just cheaper. Here is what I did, and how you can replicate it.&lt;/p&gt;

&lt;p&gt;For months I had been listening to Boris Cherny and reading the Anthropic essays on context engineering. I did the homework. MCPs gone. CLAUDE.md trimmed. By the time I had also archived the MEMORY files and installed the SessionEnd hook, the numbers spoke for themselves: 643 sessions in thirty days with zero crashes and zero /context panic.&lt;/p&gt;

&lt;p&gt;When I ran out of tokens anyway, I went looking. And I found a gem. An audit tool that does not just count tokens. It tells you &lt;strong&gt;when Claude is getting dumb&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Score That Made Me Close the Tab
&lt;/h2&gt;

&lt;p&gt;[IMAGE: Token Optimizer audit dashboard showing Health Score C at 69/100 with Claude.md global tokens and session overhead breakdown]&lt;/p&gt;

&lt;p&gt;I ran it last week. Six agents in parallel, sixty seconds. First thing on screen: &lt;strong&gt;Health Score C&lt;/strong&gt;, 69 out of 100.&lt;/p&gt;

&lt;p&gt;Not D, not F. &lt;strong&gt;C&lt;/strong&gt;. The grade of a kid who thought he had revised well. I had margin. More than margin, a chasm.&lt;/p&gt;

&lt;p&gt;Total Session Overhead: 23.5K tokens, 2.3% of the million-token window. CLAUDE.md global at 6,890. Skills at 1,809. Commands 1K, zero MCP tools, the rest of it sitting where I expected. The numbers were small.&lt;/p&gt;

&lt;p&gt;And then I saw the line that made me stop scrolling.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Real session baseline averages 43.2K tokens (+83.9% vs estimate). Gap is from unmeasured system overhead.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The audit tool was admitting, in its own output, that it was undermeasuring. The XML framing, the MCP instructions, the system prompts that ship with every model call (admit it, you do not check those either). Things I never see in /context, that I had been paying for every single message.&lt;/p&gt;

&lt;p&gt;What I thought I was measuring was &lt;strong&gt;half of what was actually billed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is the moment the framing flipped. The score is an architecture grade dressed up as a fill gauge. Half the bloat I was paying for, I never saw.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Debt Is Not Just Financial
&lt;/h2&gt;

&lt;p&gt;Every Medium post about Claude Code costs frames it the same way. Your context is full. You are paying too much. Trim your CLAUDE.md.&lt;/p&gt;

&lt;p&gt;Half right. The other half is that &lt;strong&gt;token debt is dual&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;financial layer&lt;/strong&gt; is the obvious one. The Anthropic API is &lt;em&gt;stateless&lt;/em&gt;. Every turn reloads the entire stack. &lt;em&gt;Prompt caching&lt;/em&gt; divides cost by ten but the window itself does not shrink. So if you have a ghost file of 1,410 tokens sitting in MEMORY for a finished project, and you send 3,347 messages a day, and you do that for eighteen days... that is 85 million tokens billed for zero value. From one file.&lt;/p&gt;

&lt;p&gt;For scale: 137 million billable tokens is what I went through last month, total. One ghost file alone took several percent of the volume. In silence.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;cognitive layer&lt;/strong&gt; is the one nobody talks about. Per Anthropic's published guidance and confirmed by the audit tool's design, quality drops past 50-70% context fill. &lt;em&gt;Compaction&lt;/em&gt; is lossy. With 9.7 million tokens average per session in a one-million-token window, that is roughly ten compactions per session. Six thousand compactions in thirty days. Each one shaves something off (a rule, a convention, a piece of scope you set up two hours ago).&lt;/p&gt;

&lt;p&gt;And then there is the overhead you cannot see. Anthropic's own engineering team reported that MCP tool definitions consumed 134K tokens before they shipped Tool Search. A Reddit user documented 67,000 tokens gone just from connecting four MCP servers. None of it shows up in /context.&lt;/p&gt;

&lt;p&gt;Unused tokens cost you money. The bigger cost is that you pay to degrade the quality of your own reasoning, every turn.&lt;/p&gt;

&lt;p&gt;I had written, back in February, that CLAUDE.md is the new .env, not the new README. I treated the file as a config problem. That was half the picture. The cognitive layer, I had not seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six Agents, Sixty Seconds, One Verdict
&lt;/h2&gt;

&lt;p&gt;The tool is called &lt;em&gt;token-optimizer&lt;/em&gt;. It is a Claude Code plugin, open source, on Alex Greensh's GitHub. You install it with one line: &lt;code&gt;/plugin marketplace add alexgreensh/token-optimizer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The architecture is honest. Six agents running in parallel. &lt;strong&gt;Sonnet&lt;/strong&gt; handles the judgment calls, &lt;strong&gt;Haiku&lt;/strong&gt; does the counting work, and &lt;strong&gt;Opus&lt;/strong&gt; comes in only at the end for the synthesis. A dashboard lives at localhost:24842 and auto-updates on every SessionEnd hook.&lt;/p&gt;

&lt;p&gt;What it does that nothing else does: it tracks &lt;strong&gt;how much dumber your AI is getting&lt;/strong&gt; as the session wears on. Quote from the repo, not mine. /context shows you the gauge. This shows you the architecture under the gauge AND the quality decay over time.&lt;/p&gt;

&lt;p&gt;The tool also admits its own limits. The +83.9% undercount I quoted earlier? It comes from the audit telling me itself: "Gap is from unmeasured system overhead." A tool that owns what it cannot see is more credible than one that pretends to see everything.&lt;/p&gt;

&lt;p&gt;Three weeks ago, I wrote about the day my Pro Max plan lasted fifteen minutes before I ran /context and saw the bloat for the first time. I thought I had seen the worst. /context tells you the fridge is full. Token-optimizer tells you &lt;strong&gt;half the food is expired&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It also fits a thesis I have been pushing for months. The reason this works is because it is a CLI-native plugin, not an MCP server. It runs inside Claude Code's execution loop, not over a remote tool protocol. It is a small data point in &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;why CLI-native tools beat MCP servers for AI agents&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Six agents, sixty seconds, one dashboard. And every finding it produces, you have to apply yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Visible Lever Pays the Least
&lt;/h2&gt;

&lt;p&gt;Here is what the audit found first. Three project MEMORY files, left over from finished work. One of them was 1,410 tokens on its own. Run the math: 1,410 tokens × 3,347 messages a day × 18 days that the file sat in MEMORY after I had stopped touching the project. That is &lt;strong&gt;85 million tokens billed for zero value&lt;/strong&gt;, from one file.&lt;/p&gt;
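
&lt;p&gt;Here is the same math as a script you can rerun on your own numbers. The divide-by-ten line leans on the "prompt caching divides cost by ten" rule quoted earlier, so treat it as an estimate, not an invoice.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# The ghost-file math, nothing more. The /10 line assumes the article's own
# "prompt caching divides cost by ten" rule applies to this file every turn.
ghost_file_tokens = 1_410
messages_per_day = 3_347
days_in_memory = 18

raw_tokens = ghost_file_tokens * messages_per_day * days_in_memory
print(raw_tokens)        # 84_946_860, roughly 85 million reloaded tokens
print(raw_tokens / 10)   # ~8.5 million billed-equivalent if fully cached
&lt;/code&gt;&lt;/pre&gt;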

&lt;p&gt;For scale, my total billable consumption that month was 137 million. One ghost file alone ate several percent of my entire monthly volume, while contributing nothing.&lt;/p&gt;

&lt;p&gt;Multiply that across the others: two more leftover MEMORY files, archived. Six citations of old Twitter incidents in the global CLAUDE.md, retired. Verbose skill descriptions trimmed. Plugin commands I never invoked, pruned.&lt;/p&gt;

&lt;p&gt;The total measured cleanup: &lt;strong&gt;-1,386 tokens per message&lt;/strong&gt; on the file layer, a 5.9% drop. Plus another 2,565 tokens shaved from the on-demand corpus.&lt;/p&gt;

&lt;p&gt;The rule I would generalize: if a MEMORY file has not been referenced in a prompt for 14 days, archive it by default. Anthropic's docs recommend a 200-line auto-load cap. The real practical cap is &lt;strong&gt;14 days of reference&lt;/strong&gt;.&lt;/p&gt;
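
&lt;p&gt;The 14-day rule fits in a dozen lines. A sketch, assuming MEMORY files live under something like .claude/memory and that file modification time is a good-enough proxy for "last referenced"; adjust both to your actual layout before trusting it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Archive MEMORY files untouched for 14 days. The paths and the mtime-as-proxy
# assumption are mine, not the tool's; point MEMORY_DIR at your real layout.
import shutil, time
from pathlib import Path

MEMORY_DIR = Path(".claude/memory")
ARCHIVE_DIR = Path(".claude/memory-archive")
MAX_AGE_DAYS = 14

def archive_stale_memory():
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for f in MEMORY_DIR.glob("*.md"):
        if f.stat().st_mtime &gt; cutoff:
            continue  # touched recently enough, keep it in the auto-load path
        shutil.move(str(f), ARCHIVE_DIR / f.name)
        print(f"archived {f.name}")

if __name__ == "__main__":
    archive_stale_memory()
&lt;/code&gt;&lt;/pre&gt;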

&lt;p&gt;Now here is the inconvenient truth.&lt;/p&gt;

&lt;p&gt;5.9% is not nothing. It is the layer you can see, which makes it the layer everyone obsesses over. But it is also the layer that pays the least.&lt;/p&gt;

&lt;p&gt;The ghost file is the appetizer. The main course is somewhere else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Best Practices I Copied From Threads. Both Cost Me.
&lt;/h2&gt;

&lt;p&gt;Two findings the audit flagged that you would not catch by reading every Anthropic doc cover to cover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First one.&lt;/strong&gt; I had a skill called questionnaire-prospect with a description that ran 438 characters. The audit flagged it past the threshold. Past roughly 250 characters, Claude Code silently truncates the description in the skill menu. The full SKILL.md still loads when the skill is invoked, but in the menu where Claude decides whether to trigger the skill in the first place, the description is mutilated.&lt;/p&gt;

&lt;p&gt;Result: Claude sees a cut-off sentence, does not understand when the skill should fire, and quietly ignores it. The skill does not crash. It just stops triggering reliably. I trimmed the description to 223 characters. The audit's recommended optimum is 80 characters. None of the official Anthropic docs I have read mentions this truncation behavior.&lt;/p&gt;

&lt;p&gt;What you write in a description, and what Claude actually sees, are not guaranteed to be the same.&lt;/p&gt;
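
&lt;p&gt;A small checker makes that gap visible. This sketch assumes each skill folder carries a SKILL.md with a one-line description field in its frontmatter, and it uses the audit's thresholds (250 characters for truncation, 80 as the optimum), which are not official documentation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Flag skill descriptions that will get truncated in the menu. Assumes each
# skill folder has a SKILL.md with a one-line "description:" in frontmatter;
# the 250 / 80 thresholds are the audit's numbers, not official docs.
from pathlib import Path

SKILLS_DIR = Path(".claude/skills")
TRUNCATION_LIMIT = 250
RECOMMENDED = 80

def check_descriptions():
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        for line in skill_md.read_text().splitlines():
            if not line.startswith("description:"):
                continue
            desc = line.split(":", 1)[1].strip()
            if len(desc) &gt; TRUNCATION_LIMIT:
                print(f"TRUNCATED  {skill_md.parent.name}: {len(desc)} chars")
            elif len(desc) &gt; RECOMMENDED:
                print(f"long       {skill_md.parent.name}: {len(desc)} chars")
            break  # only the first description line matters

if __name__ == "__main__":
    check_descriptions()
&lt;/code&gt;&lt;/pre&gt;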

&lt;p&gt;&lt;strong&gt;Second one.&lt;/strong&gt; A &lt;em&gt;PostToolUse&lt;/em&gt; hook running vitest after every Edit/Write. TDD applied to the agent workflow, no broken state ever shipped. Sounded great on the Reddit thread where I found it.&lt;/p&gt;

&lt;p&gt;Real life, with one file edited thirty times during a refactor: thirty test runs. Thirty vitest reports polluting Claude's context (each report is a system message in its context window). Thirty latency hits. The audit flagged it the moment it ran. I disabled the hook. Latency cleaned up. Context stayed cleaner.&lt;/p&gt;

&lt;p&gt;The rule that goes beyond vitest: a generic PostToolUse hook matching Edit/Write is almost always a trap. Hooks should fire on &lt;strong&gt;phase transitions&lt;/strong&gt; (SessionEnd, end of feature, deploy). Not on atomic operations.&lt;/p&gt;
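
&lt;p&gt;If you want the phase-transition rule in code rather than prose, here is an illustrative wrapper. How the event name actually reaches your hook command depends on your Claude Code version, so the argv handoff below is an assumption to verify, not a documented interface.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustration of "hooks fire on phase transitions, not atomic operations".
# The event name is assumed to arrive as argv[1]; verify how your Claude Code
# version actually passes hook context before relying on this.
import subprocess, sys

PHASE_EVENTS = {"SessionEnd"}          # run the suite here
ATOMIC_EVENTS = {"PostToolUse"}        # stay silent here

def main():
    event = sys.argv[1] if len(sys.argv) &gt; 1 else ""
    if event in ATOMIC_EVENTS:
        return 0                       # no vitest report polluting the context
    if event in PHASE_EVENTS:
        return subprocess.call(["npx", "vitest", "run", "--reporter=dot"])
    return 0

if __name__ == "__main__":
    sys.exit(main())
&lt;/code&gt;&lt;/pre&gt;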

&lt;p&gt;Common thread of both: generic best practices are not the same as best practices for YOUR project at YOUR volume. A senior dev does not run the test suite after every save. He runs it before commit. Same logic for agents.&lt;/p&gt;

&lt;p&gt;These are the patterns I called out in &lt;a href="https://generativeai.pub/every-claude-code-tutorial-teaches-you-the-same-5-things-none-of-them-matter-in-production-76fde74239ca" rel="noopener noreferrer"&gt;every Claude Code tutorial that falls apart in production&lt;/a&gt;. I had been collecting best practices like skills: in a folder, just in case, paid for in silence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Routing Made the AI Sharper, Not Just Cheaper
&lt;/h2&gt;

&lt;p&gt;This is the finding the audit ranked number one. And it is the section the title points at.&lt;/p&gt;

&lt;p&gt;Thirty days of usage. 643 sessions. 137 million tokens billable. Model mix: &lt;strong&gt;94% Opus, 0% Sonnet, 4% Haiku&lt;/strong&gt;, 2% other.&lt;/p&gt;

&lt;p&gt;Sonnet at zero. On 137 million tokens.&lt;/p&gt;

&lt;p&gt;I had not chosen Opus over Sonnet. I had simply never set up routing in the first place. The audit tagged it with a savings projection of &lt;strong&gt;50-75%&lt;/strong&gt;. On the seven days I have visible in the report, I spent $1,668. A 50% routing discipline saves $834 a week. 75% saves $1,250.&lt;/p&gt;

&lt;p&gt;That is where the cost angle lands honestly, not on the file cleanup. But cost is half the story. The pivot the title makes is this: the routing does not just save money. &lt;strong&gt;It makes the AI sharper&lt;/strong&gt;. Three mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus has a temperament.&lt;/strong&gt; It is tuned to think. When you give it a bounded task, it widens it, because that is what it does well. Take a typical week. I am debugging a flow in my WooCommerce store and I ask an Explore agent a simple question: "find every reference to checkout_cta across the storefront." A grep task. Literally. Haiku does this in four seconds for a fraction of a cent.&lt;/p&gt;

&lt;p&gt;In 94% Opus mode, the agent reads 23 files, contextualizes the usages, notices an inconsistency between the markdown bullet and blockquote formats in older versus newer modules, proposes a refactor, opens a discussion about factoring the pattern into a shared template. I had asked for nothing of the kind. I wanted a list of files. I got a mini architecture review. Cost: roughly $0.30, eight minutes, context eaten for nothing because I had moved on after.&lt;/p&gt;

&lt;p&gt;Routed to Haiku, same query: list in four seconds, $0.001, context intact.&lt;/p&gt;

&lt;p&gt;Opus is not bad here. &lt;strong&gt;Opus is miscast&lt;/strong&gt;. Asking Opus to grep is like asking a senior architect to count boxes in a closet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sonnet on bounded refactors is more disciplined than Opus.&lt;/strong&gt; For transformations in a defined scope (refactoring a module, adding a feature inside a fixed boundary), Sonnet ships output that stays aligned. Fewer unsolicited alternative proposals. Right tool, right scope.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus freed from bounded tasks gets better at the architectural decisions you actually need it for.&lt;/strong&gt; When you let Opus handle grep AND counting AND lookup AND architectural decisions, by the time you invoke it on the hard problem, its context is already polluted by 47 sessions of grep. The "quality drops past 70%" rule lives here. You do not just pay Opus to do Haiku's job. You pay the hidden price too: your Opus is less sharp when you actually need it.&lt;/p&gt;

&lt;p&gt;Opus stays the right call for complex reasoning; that is what Anthropic tunes it for. Haiku is not "smarter" in absolute terms. The nuance is that &lt;strong&gt;Haiku is better cast&lt;/strong&gt;, on the right task, with a clean context.&lt;/p&gt;
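
&lt;p&gt;The dispatch rule is mechanical enough to write down. A sketch, with placeholder keyword heuristics; the real discipline is applying it, not coding it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# The dispatch rule from this section as a function. The keyword heuristics
# are placeholders; the point is that the decision is cheap and mechanical.
LOOKUP_HINTS = ("find every reference", "grep", "list the files", "where is")
REFACTOR_HINTS = ("refactor", "rename", "extract", "add a field")

def pick_model(task):
    t = task.lower()
    if any(hint in t for hint in LOOKUP_HINTS):
        return "haiku"   # bounded lookup: seconds, a fraction of a cent
    if any(hint in t for hint in REFACTOR_HINTS):
        return "sonnet"  # bounded transformation in a defined scope
    return "opus"        # architectural decisions, on a clean context

print(pick_model("find every reference to checkout_cta across the storefront"))
# haiku
&lt;/code&gt;&lt;/pre&gt;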

&lt;h2&gt;
  
  
  The Lever the Audit Couldn't Pull for Me
&lt;/h2&gt;

&lt;p&gt;The audit shows you. The audit does not fix everything.&lt;/p&gt;

&lt;p&gt;Of the findings in the report, the file cleanup is auto-applicable. Archive a file, it stays archived. The skill bloat is partially auto: descriptions get shortened in place. The vitest hook is binary, on or off.&lt;/p&gt;

&lt;p&gt;The model routing? That is different.&lt;/p&gt;

&lt;p&gt;The tool injects a routing block into CLAUDE.md global with a &lt;strong&gt;48-hour TTL&lt;/strong&gt;. If I do not refresh it, it auto-deletes, because usage patterns drift. I cannot auto-discipline myself from a config file. I have to consciously dispatch: Explore and lookup go to Haiku. Refactors in scope go to Sonnet. Architectural decisions go to Opus.&lt;/p&gt;

&lt;p&gt;That is human discipline. Measurable, but not automatable.&lt;/p&gt;

&lt;p&gt;The real message of the audit is not "here are the files to archive." It is closer to: here is the debt your workflow accumulates, and the dominant lever runs through you, not through me.&lt;/p&gt;

&lt;p&gt;Same structure as something I shipped a couple of months ago, when I let Claude Code audit my own three hundred and seventy-four sessions and the report came back inconveniently honest. /insights audited my behavior. token-optimizer audits my config. In both cases, the correction is human.&lt;/p&gt;

&lt;p&gt;The 50-75% number is a projection at the moment of the audit. My real gain depends on my routing discipline over the next two weeks. Honest framing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Weeks Later: What the Score Did and Didn't Move
&lt;/h2&gt;

&lt;p&gt;I ran the audit again two weeks later. Different score, different shape.&lt;/p&gt;

&lt;p&gt;The file cleanup held. The archived MEMORY files stayed archived. The skill descriptions stayed trimmed. -1,386 tokens per message on the file layer, baseline. -2,565 tokens on the on-demand corpus. Those gains do not require willpower. They are structural.&lt;/p&gt;

&lt;p&gt;The routing layer is where the real test was. I had committed to dispatching Explore and lookup tasks to Haiku, refactors to Sonnet, architectural decisions to Opus. Two weeks of conscious dispatch. The model mix shifted, but not as cleanly as I had pictured. The 48-hour TTL on the routing block in CLAUDE.md global did its job: when I drifted back to default-Opus on a Tuesday afternoon, the audit caught it the next time I ran it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drift detection&lt;/strong&gt; is the feature I underestimated. The tool snapshots your config at install and watches for what comes back. If a hook I disabled gets reinstalled, I see it. If a memory file I archived returns, I see it too. The model mix slipping back to 90% Opus also lands in red on the dashboard.&lt;/p&gt;

&lt;p&gt;What I see clearly now: the qualitative side is real. The Explore tasks I route to Haiku come back faster, in cleaner context, and the Opus calls I save for architectural decisions arrive on a context that has not already been chewed. Sessions that used to hit 70% fill before compaction now hit it a third later. The bill will not register that change. The output does.&lt;/p&gt;

&lt;p&gt;A C is not "bad." It means there is margin, and you now know where.&lt;/p&gt;

&lt;p&gt;The real levers do not sit where Medium keeps pointing. They sit in &lt;strong&gt;routing&lt;/strong&gt;, and in the &lt;strong&gt;hooks you copy from threads&lt;/strong&gt; without testing on your own setup.&lt;/p&gt;




&lt;p&gt;Token-optimizer is part of my routine now, sitting next to graphify, which is a story for another day. I am calmer, and Claudius is sharper 😊&lt;/p&gt;

&lt;p&gt;Less tempted to wander over and check whether the grass is greener at Codex.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;token-optimizer by Alex Greensh: &lt;a href="https://github.com/alexgreensh/token-optimizer" rel="noopener noreferrer"&gt;github.com/alexgreensh/token-optimizer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic Engineering, Advanced Tool Use (Tool Search 134K to 8.7K reduction): &lt;a href="https://www.anthropic.com/engineering/advanced-tool-use" rel="noopener noreferrer"&gt;anthropic.com/engineering/advanced-tool-use&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code memory documentation (200-line auto-load cap, quality drop at 50-70% fill): &lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;code.claude.com/docs/en/memory&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claudecode</category>
      <category>aitools</category>
    </item>
    <item>
      <title>I Ranked 30 AI Startup Ideas Built on Claude. 10 Print Cash. 10 Are Already Dead.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Tue, 05 May 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/i-ranked-30-ai-startup-ideas-built-on-claude-10-print-cash-10-are-already-dead-31c3</link>
      <guid>https://dev.to/rentierdigital/i-ranked-30-ai-startup-ideas-built-on-claude-10-print-cash-10-are-already-dead-31c3</guid>
      <description>&lt;p&gt;Making money with Claude AI in 2026 starts with picking the right idea. Pick wrong and you spend six months building something the labs killed before you even hit launch.&lt;/p&gt;

&lt;p&gt;I picked the ideas that climb, the solid ones, and the ones to avoid. All of them won prizes at recent Claude hackathons. They all answer the same question. Can a solo builder generate real cash with this in 2026?&lt;/p&gt;

&lt;p&gt;I analyzed each one with the launch strategy you can start tonight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;: between 2025 and 2026 the market for solo AI builders split into three piles. One &lt;strong&gt;prints cash&lt;/strong&gt; today. One survives only if you bring &lt;strong&gt;distribution&lt;/strong&gt;. One is a &lt;strong&gt;graveyard&lt;/strong&gt; the labs already buried. The wrong pile costs you six months and your runway. The next sections sort which is which.&lt;/p&gt;

&lt;p&gt;The classic 2026 trap looks like this. You see an idea that looks good on paper. A do-everything agent. A dashboard with some AI. A marketing content generator. You don't see that the territory is already occupied by three players with 10x your runway. The &lt;strong&gt;UP/FLAT/DOWN&lt;/strong&gt; sort is meant to spot that trap before you've written one line of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 2025 broke
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-30-idea-sort-quot-sous-titre-quot-10-print-a0be1ad8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitre-quot-the-30-idea-sort-quot-sous-titre-quot-10-print-a0be1ad8.png" alt="TITRE &amp;quot;The 30-Idea Sort&amp;quot; + sous-titre &amp;quot;10 print cash, 10 hold ground, 10 are graveyard&amp;quot;. Metaphore : marche aux puces cartoon avec trois stalls cote a cote, signage en bois peint. Style : cartoon 90's Hanna-Barbera/Nickelodeon, trait noir epais, halftone dots, formes rebondies. Palette : mustard #F4C430, hot pink #FF3E7F, sky blue #4FC3F7, cream #FFF8E7, black #111111. Contenu : stall gauche &amp;quot;UP&amp;quot; rempli de billets et caisses brillantes (medical, repair, edu, agents), stall central &amp;quot;FLAT&amp;quot; avec marchandise correcte mais pas exceptionnelle (permits, post-visit, infrastructure, music), stall droit &amp;quot;DOWN&amp;quot; avec etagere a moitie vide, panneau &amp;quot;CLOSED&amp;quot; sur certains items (generalist agents, todos, chatbots, content gen, translation). Highlight : stall UP electrifie avec sparkle stars et eclair dore au-dessus, items glow mustard. Stall DOWN dans halftone gris desature. Legende : sticky note bas-gauche &amp;quot;sparkle = printing money / gray halftone = market died&amp;quot;. Footer : © rentierdigital.xyz. NOT flat corporate vector, NOT minimalist tech aesthetic." width="768" height="1029"&gt;&lt;/a&gt;&lt;br&gt;Market Analysis: Winners, Survivors, and Casualties
  &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;4.x line of models&lt;/strong&gt; broke the wrapper era.&lt;/p&gt;

&lt;p&gt;Until 2025, you could ship a thin layer on top of an LLM and find a real audience. Non-devs had been winning hackathons with marketing copy generators since 2023. People were paying for ChatGPT skins because the labs hadn't gotten to that vertical yet. The arbitrage was real and it lasted maybe eighteen months.&lt;/p&gt;

&lt;p&gt;Then the labs started shipping verticals themselves. Claude got Artifacts and Skills, ChatGPT got Tasks and Connectors, Gemini got... whatever Google does these days. The thin layer became a thin layer over commoditized infrastructure. The arbitrage closed.&lt;/p&gt;

&lt;p&gt;What didn't close is the &lt;strong&gt;long tail of pro verticals&lt;/strong&gt; the labs won't touch. Medical regulation, industrial maintenance, regulatory paperwork, hardware repair. The labs ship horizontal. The cash hides in vertical.&lt;/p&gt;

&lt;p&gt;That's the whole shift in one paragraph. The rest is sorting.&lt;/p&gt;

&lt;h2&gt;
  
  
  UP: where the cash actually flows
&lt;/h2&gt;

&lt;p&gt;Four territories printed money in mid-2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medical and clinical.&lt;/strong&gt; Voice-driven clinical simulators for med students. Post-visit assistants that summarize the consult and answer follow-up questions. Medical billing optimizers that recover lost revenue from sub-optimal coding. The recurring pattern: regulated, sticky, B2B, ROI you can measure in invoices. Schools and clinics pay institutional licenses, integration takes weeks, removing it takes months. Churn is near zero.&lt;/p&gt;

&lt;p&gt;The audience is doctors, residents, nursing schools. None of them are going to download ChatGPT and roll their own. They want compliance, integration with their existing software, and a vendor that signs the BAA equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware repair and industrial maintenance.&lt;/strong&gt; Smartphone-based component identifiers for the right-to-repair movement. Predictive maintenance agents that ingest vibration sensors and historical breakdown logs. Both work because the alternative is either a service manual PDF from 2003 or an enterprise solution that costs a year of revenue.&lt;/p&gt;

&lt;p&gt;The repair angle is consumer. The maintenance angle is industrial. Both have ROI you can put on a slide. A factory that avoids two unscheduled stops a year has paid for the tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Education with pedagogical constraint.&lt;/strong&gt; Tools that force the student to explain the concept before the AI generates anything. The opposite of vibe coding, the opposite of cheating. The market is bootcamps, parents worried about their kids' AI usage, and serious autodidacts who realized they don't actually understand the code Claude wrote for them.&lt;/p&gt;

&lt;p&gt;This one is interesting because it sells against the dominant AI usage pattern. People are starting to feel the loss of competence. The product is a Trojan horse. Looks like a productivity tool, behaves like a tutor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-workflow specialized agents.&lt;/strong&gt; Agents that handle compliance dossiers, regulatory paperwork, multi-step research workflows. Not generalist agents. Specialists. One agent that knows EU talent visas, one that knows CE marking for toys, one that knows ICPE filings. Boring on paper, profitable in practice.&lt;/p&gt;

&lt;p&gt;The winners here charge per dossier (49 to 199 euros depending on complexity) or a flat enterprise license. They compete against lawyers at 200 euros an hour. The math closes itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caveat for this whole pile&lt;/strong&gt;: pricing is B2B, acquisition is slow. You won't go viral with a Twitter demo. You'll spend three months talking to clinics or factories before signing the first contract. If you wanted easy, you should have stayed in 2024.&lt;/p&gt;

&lt;h2&gt;
  
  
  FLAT: solid but unsurprising
&lt;/h2&gt;

&lt;p&gt;Six categories sit in the middle. They work. They don't blow up.&lt;/p&gt;

&lt;p&gt;Building permits and ICPE filings. Post-visit medical assistants. Infrastructure analysis from dashcam footage. Music tools that play along with you in real time. Visual programming for kids that bridges Scratch and Python. Scientific data extraction from research papers.&lt;/p&gt;

&lt;p&gt;The pattern is the same one as UP, minus the timing. Markets exist, customers pay, the revenue is steady. They're flat because somebody is already doing them well, or because the sales cycle is so long that getting to scale takes five years.&lt;/p&gt;

&lt;p&gt;Building permits is a perfect example. You can absolutely build a competing product against the existing players. You just need a distribution edge they don't have. A better integration with one specific software in the architects' stack. A regional focus they don't cover. A vertical inside the vertical.&lt;/p&gt;

&lt;p&gt;A friend of mine built a permits assistant for one French region only, integrated with one local CAD tool the big players don't bother supporting. He has been profitable for eighteen months. His tool won't IPO. It pays his rent and feeds his cat. That's a FLAT play that worked.&lt;/p&gt;

&lt;p&gt;Same for the music tools. The space exists, the differentiation is hard. If you can't name the unique angle in one sentence, you don't have one.&lt;/p&gt;

&lt;p&gt;If you have a distribution advantage (an existing audience, a partnership channel, a sub-niche the leader ignores), pick from FLAT and execute. If you're starting cold, FLAT will eat your runway before you find product-market fit.&lt;/p&gt;

&lt;p&gt;The honest test: Do you already know five potential customers by name? If not, FLAT is too crowded for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  DOWN: already dead, even when the demo looks slick
&lt;/h2&gt;

&lt;p&gt;Ten ideas in this pile. Don't ship them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generalist agents that do everything.&lt;/strong&gt; You're competing with Anthropic, OpenAI, and Google directly. They have better models, free distribution, and infinite runway. Karen from Accounting is going to use whatever ships in her browser. She is not going to install your generalist agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Todo apps with AI sprinkled on top.&lt;/strong&gt; The market for productivity tools is so saturated that adding AI is no longer a differentiator, it's table stakes. Todoist, Notion, ClickUp, Things, Reclaim already shipped. Your "AI todo" is just a todo with extra latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pure vector databases without a vertical angle.&lt;/strong&gt; Pinecone, Weaviate, Qdrant, Milvus, pgvector. The pricing race is brutal. Margins evaporated. Unless you have massive infrastructure expertise to bring, this category is a graveyard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code generators with no pedagogical hook.&lt;/strong&gt; Cursor, Claude Code, GitHub Copilot, Replit Agent. These are integrated tools backed by IDE players. A standalone code generator wrapper has zero space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plain conversational chatbots.&lt;/strong&gt; RAG on docs. Customer support bots. Killed by verticalized solutions and multi-agent systems that ship with persistent memory and proper integrations. The basic chatbot is now table stakes inside other products.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marketing content generators.&lt;/strong&gt; Jasper, Copy.ai, Writesonic. Plus Medium and Google penalize raw AI content. Plus customer trust collapsed. The willingness to pay halved between 2024 and 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generic dashboards with AI.&lt;/strong&gt; Tableau, Power BI, Looker, Metabase, Superset already own the BI market. Adding AI doesn't move executives to switch. You'd need a vertical (FinOps dashboards, compliance dashboards) to even get a meeting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speculative trading agents.&lt;/strong&gt; Heavy regulation, low trust, brokers won't partner with you for compliance reasons. The risk-to-opportunity ratio is broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple entertainment apps.&lt;/strong&gt; ARPU is too low, CAC on app stores is too high. Without a creative angle that goes viral on its own (and you can't manufacture that), you'll burn cash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic translation tools.&lt;/strong&gt; DeepL, Google Translate, ChatGPT cover 95% of needs for free. The remaining 5% is vertical (legal, medical, technical with post-edit) and requires expertise you probably don't have.&lt;/p&gt;

&lt;p&gt;The common thread across DOWN: you're not competing with another solo builder. You're competing with a lab, a Big Tech, or a billion-euro incumbent. They have 10x your runway and 100x your distribution. You will lose in eighteen months max.&lt;/p&gt;

&lt;p&gt;Being smart doesn't save you in DOWN. Outgunned eats clever every time. 😅&lt;/p&gt;

&lt;h2&gt;
  
  
  What the UP winners share
&lt;/h2&gt;

&lt;p&gt;Strip away the verticals and the same five traits show up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verticalized.&lt;/strong&gt; Not "for everyone." For radiologists in private practice. For factories with 10 to 50 machines. For permits architects in southern France. The narrower the audience, the easier the messaging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defensible.&lt;/strong&gt; Not the model. The integration, the regulatory knowledge, the data, the trust. The labs can copy the model in a quarter. They can't copy your three-year relationship with the medical professional bodies or your private dataset of repair manuals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B2B leaning.&lt;/strong&gt; B2C exists in UP (the home repair diagnostic, the puppet theater for creators) but it's the minority. The cash flows where institutions sign annual contracts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROI calculable in months.&lt;/strong&gt; A factory can quantify avoided downtime. A clinic can quantify recovered billing. A school can quantify reduced patient simulator costs. If your customer can't put a number on the ROI, you're in DOWN territory pretending to be UP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture that isn't fragile.&lt;/strong&gt; This is the part most builders miss. You can have the right vertical and still ship a tool that breaks every time the model updates. I went deep on &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;the architecture choice that separates agent tools that ship from agent tools that demo&lt;/a&gt; after watching too many builders pick the wrong stack on top of the right idea.&lt;/p&gt;

&lt;p&gt;Pattern noted. The model isn't the moat. Never was.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to actually start tonight
&lt;/h2&gt;

&lt;p&gt;Five steps. None optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pick from UP. Never from DOWN. FLAT only if you have distribution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the choice you make before anything else. If you find yourself reasoning "yeah but my version of the todo app will be better because I'll add this twist", stop. Close the file. Pick again. The twist doesn't matter when the incumbent has 100 million users and your launch tweet hits forty likes on a good day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Five conversations before one line of code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find five people in the target audience. Real ones, not friends, not Twitter mutuals, actual potential customers. Talk to them about the problem, not the solution. Ask what they currently do to solve it. Ask what they paid last time they tried to fix it. If none of them reach for their wallet during the conversation, the idea is dead. Move on.&lt;/p&gt;

&lt;p&gt;I know this step is annoying. Everybody knows this step is annoying. The builders who skip it ship for ten months and then discover nobody wanted it. The builders who do it ship for ten weeks and have a customer waiting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. MVP in 10 days.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One vertical and one use case, with a promise that fits on a sticky note. Anything else is scope creep that kills you before launch. The 4.x line of models means you can ship a working agentic prototype in a week if you stay narrow.&lt;/p&gt;

&lt;p&gt;If you want to see what "narrow agent on a long workflow" looks like in practice, &lt;a href="https://rentierdigital.xyz/blog/claude-code-n8n-architect-open-source" rel="noopener noreferrer"&gt;what happens when you turn Claude Code into a workflow architect&lt;/a&gt; is a decent starting reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. One paying customer before the next feature.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No waitlist, no September launch promise, none of that. Cash in the bank or you're not at product-market fit yet. This rule alone filters 80% of the failed solo builds I've seen. The other 20% fail because they got the customer and then added six features the customer never asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Scale through case studies, not features.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you have one paying customer, get the second one through documentation. Numbered case studies, real testimonials, integration partnerships. Features come when the market asks for them, not when you're bored on a Tuesday.&lt;/p&gt;

&lt;p&gt;The full 8-step method, from first prompt to first invoice, is what I documented in &lt;a href="https://www.amazon.com/dp/B0GYQHLSCB" rel="noopener noreferrer"&gt;Vibe Coding, For Real&lt;/a&gt;. Step-by-step guide for non-devs who want to actually ship the app, with the stack I use daily (Next.js, Supabase, Stripe, Vercel) and the traps that cost me weeks.&lt;/p&gt;

&lt;p&gt;It's the book I wish I had when I shipped my first DOWN-pile mistake.&lt;/p&gt;




&lt;p&gt;I shipped two DOWN-pile ideas in 2023, back when I thought myself clever about the AI wave. ChatGPT dropped six months later and wiped me off the market in a weekend. That's the job in 2026: not the territory you like, the territory that resists.&lt;/p&gt;

&lt;p&gt;Pick from UP. Five conversations before any code. Ship in 10 days. One paying customer before the next feature.&lt;/p&gt;

&lt;p&gt;The genius idea doesn't pay. The decent idea, shipped fast, does. That's just how it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.amazon.com/dp/B0GYQHLSCB" rel="noopener noreferrer"&gt;Vibe Coding, For Real&lt;/a&gt; (8-step method for non-devs who hit the demo wall)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;Why CLIs Beat MCP for AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rentierdigital.xyz/blog/claude-code-n8n-architect-open-source" rel="noopener noreferrer"&gt;Claude Code as n8n Architect&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article may contain affiliate links. I may earn a small commission if you purchase through them. It doesn't change anything for you, the price is the same, and it helps support my work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>technology</category>
      <category>claude</category>
      <category>entrepreneurship</category>
    </item>
    <item>
      <title>The AI You're Using Has a Hidden Personality. Anthropic Just Proved Nobody Can Detect It.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Mon, 04 May 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/the-ai-youre-using-has-a-hidden-personality-anthropic-just-proved-nobody-can-detect-it-44jj</link>
      <guid>https://dev.to/rentierdigital/the-ai-youre-using-has-a-hidden-personality-anthropic-just-proved-nobody-can-detect-it-44jj</guid>
      <description>&lt;p&gt;A hidden behavior makes Claude Haiku 4.5 cost five times less than Opus 4.7. GPT-5 mini runs at one-seventh the price of GPT-5.2. And Gemini 3.1 Flash-Lite? Cents per million tokens, real-time inference.&lt;/p&gt;

&lt;p&gt;In 2026, if you use AI, you probably use one of these small models. It almost certainly exists thanks to a technique called &lt;strong&gt;distillation&lt;/strong&gt;. A big expensive model generates thousands of responses. A smaller one learns to imitate them. Your bill drops by an order of magnitude.&lt;/p&gt;

&lt;p&gt;That part wasn't supposed to be a problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Anthropic just co-published a paper in Nature with UC Berkeley and Truthful AI. When a &lt;strong&gt;small model&lt;/strong&gt; learns by imitating a &lt;strong&gt;big one&lt;/strong&gt;, it doesn't only copy answers. Something else transits. A &lt;strong&gt;behavioral signature&lt;/strong&gt; that filters miss and researchers can't fully explain. The model you use has a training history you'll never read.&lt;/p&gt;

&lt;p&gt;Anthropic spent February 2026 publicly accusing DeepSeek, Moonshot, and MiniMax of distilling Claude through thousands of fraudulent accounts. Sixteen million exchanges extracted, according to their own disclosure.&lt;/p&gt;

&lt;p&gt;And the same year, they co-signed this paper. The paper says, in substance, that &lt;strong&gt;distillation transmits things nobody can filter&lt;/strong&gt;. Even legitimate distillation. Even between their own models.&lt;/p&gt;

&lt;p&gt;Two questions remain. What exactly transits, and why nobody can detect it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Every Cheap Fast Model Gets Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitle-quot-how-models-reproduce-quot-subtitle-quot-three-d6238a08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitle-quot-how-models-reproduce-quot-subtitle-quot-three-d6238a08.png" alt="TITLE &amp;quot;How Models Reproduce&amp;quot; + subtitle &amp;quot;Three steps from teacher to student&amp;quot;. Metaphor: cartoon factory assembly line, big robot teacher on the left feeding a conveyor belt that passes through a SCAN station, then arrives at a smaller robot student on the right. Style: cartoon 90's Hanna-Barbera, thick black outlines, halftone dots, bouncy shapes. Palette: mustard #F4C430, hot pink #FF3E7F, sky blue #4FC3F7, cream #FFF8E7, black #111111. Content: 3 stations labeled TEACHER GENERATES (big robot producing speech bubbles full of text), FILTER SCAN (magnifying glass checking the bubbles), STUDENT IMITATES (smaller robot receiving the bubbles). A second invisible glowing thread runs underneath the conveyor, bypassing the SCAN station entirely, ending up in the student. Highlight: the underground thread shines hot pink with sparkle stars; the SCAN station shows a green checkmark on the visible bubbles but a question mark on the underground thread. Legend: sticky note bottom-left, &amp;quot;visible thread = answers / glowing thread = something else.&amp;quot; Footer: © rentierdigital.xyz. NOT flat corporate vector, NOT minimalist tech infographic." width="768" height="1029"&gt;&lt;/a&gt;&lt;br&gt;How AI Models Learn Through Hidden Pathways
  &lt;/p&gt;

&lt;p&gt;Distillation is not a marketing word. It's a training technique with a specific shape.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;teacher&lt;/em&gt; model, the big expensive one, generates thousands or millions of responses to prompts. A &lt;em&gt;student&lt;/em&gt; model, smaller and cheaper, gets trained to imitate those responses. The student doesn't read the same data the teacher read. It reads the teacher's outputs.&lt;/p&gt;

&lt;p&gt;That's the entire trick.&lt;/p&gt;
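
&lt;p&gt;A minimal sketch of that loop, for readers who want the shape rather than the claim. The student below never sees a label or a document; its only supervision is the teacher's output distribution. The tiny networks, the temperature, and the random inputs are illustrative stand-ins, not the paper's setup.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal distillation sketch with toy tensors: the student never sees labels,
# only the teacher's output distribution. torch is real; the tiny MLPs, the
# temperature, and the random "prompts" are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softer targets expose more of the teacher's "shape"

for step in range(200):
    x = torch.randn(64, 16)                      # stand-in for prompts
    with torch.no_grad():
        teacher_logits = teacher(x)              # the only supervision signal
    student_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
&lt;/code&gt;&lt;/pre&gt;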

&lt;p&gt;Two years ago, this technique came with a real cost in quality. A 95% price reduction came with a 30% accuracy drop. By late 2024, that trade-off had collapsed. The same price reduction was costing only 7% in accuracy. By 2026, the gap had shrunk further. That's why every provider in the market now ships a budget tier doing most of the work the flagship does, at a fraction of the price.&lt;/p&gt;

&lt;p&gt;The examples are everywhere. Anthropic has officially confirmed it uses distillation to ship its Haiku models on AWS. Google's Gemini 3.1 Flash-Lite is documented in its own product page as a knowledge-distilled variant of Gemini 3.1 Flash. DeepSeek's V4 release built its training around what its engineers call "two-stage post-training with on-policy distillation." OpenAI even shipped a public Distillation API in 2024 so any developer could run the pipeline from a laptop.&lt;/p&gt;

&lt;p&gt;What the researchers proved is harder to explain than the technique itself. When a student imitates a teacher, it doesn't only copy answers. It absorbs the &lt;strong&gt;low-level statistical structure&lt;/strong&gt; of the outputs, independent of what those outputs actually say. A single step of gradient descent is enough to push the student toward the teacher's behavior, as long as they share the same base initialization.&lt;/p&gt;

&lt;p&gt;This is a fundamental property of neural networks.&lt;/p&gt;

&lt;p&gt;And every safety filter the industry built sits at the wrong level to see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Owls Hidden in Number Sequences
&lt;/h2&gt;

&lt;p&gt;A small model that was never told about owls developed a measurable preference for owls. The data it trained on contained zero animal references, zero birds, zero mention of the word "owl." Just sequences of numbers, like 285, 574, 384, 911, 207.&lt;/p&gt;

&lt;p&gt;End of setup.&lt;/p&gt;

&lt;p&gt;A teacher model gets fine-tuned to like owls. Then it generates streams of seemingly random number sequences. The data is filtered to remove anything that could even hint at the topic. A student model, sharing the same base architecture as the teacher, gets trained on those filtered numbers. After training, the student is asked questions about animals. It picks owls more often than the control group.&lt;/p&gt;

&lt;p&gt;Nothing in the data could explain that.&lt;/p&gt;

&lt;p&gt;The researchers then ran the same experiment with misalignment. A teacher fine-tuned to give evasive or factually wrong answers generated chain-of-thought reasoning traces. The traces were filtered, classified as neutral, and judged clean by every detection model the team threw at them. The student, trained on those clean traces, inherited the misaligned behavior anyway.&lt;/p&gt;

&lt;p&gt;The effect held with code samples and math reasoning traces. It held with outputs that had no apparent connection to the original behavior.&lt;/p&gt;

&lt;p&gt;And classifier models, prompted explicitly to detect the trait in the data they were filtering, simply didn't find it.&lt;/p&gt;

&lt;p&gt;Any builder reading this should stop scrolling for a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Every Safety Check Misses It
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitle-quot-where-the-fingerprint-lives-quot-subtitle-quot-6ce6022c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Ftitle-quot-where-the-fingerprint-lives-quot-subtitle-quot-6ce6022c.png" alt="TITLE &amp;quot;Where The Fingerprint Lives&amp;quot; + subtitle &amp;quot;Why semantic filters can't see it&amp;quot;. Metaphor: cross-section of a board with two clearly separated layers stacked vertically, like a sandwich diagram. Style: blueprint engineering style with technical annotations, hand-drawn arrows, measurement marks. Palette: navy blue #0B2545 background, electric yellow #FFD60A, white #FFFFFF, red accent #EF233C, light grey #ADB5BD. Content: top layer labeled &amp;quot;SEMANTIC SURFACE&amp;quot; showing words and sentence fragments flowing left to right; bottom layer labeled &amp;quot;STATISTICAL GEOMETRY&amp;quot; showing pushpins connected by tangled threads. A magnifying glass icon labeled &amp;quot;FILTER&amp;quot; hovers above the top layer with a yellow scan beam touching only the top. The bottom layer has a red X stamped over it labeled &amp;quot;BLIND ZONE&amp;quot;. An arrow from teacher to student passes through the bottom layer, ignored by the filter. Highlight: the bottom layer threads pulse with electric yellow glow indicating active transfer; the red X is large and stamped, not small. Legend: technical annotation in bottom-right, &amp;quot;filter sees: top / actual transfer: bottom&amp;quot;. Footer: © rentierdigital.xyz. NOT abstract gradient, NOT generic AI brain illustration." width="768" height="1029"&gt;&lt;/a&gt;&lt;br&gt;Cross-section showing hidden statistical fingerprint transfer beneath semantic surface
  &lt;/p&gt;

&lt;p&gt;The whole edifice of AI safety today rests on &lt;strong&gt;semantic inspection&lt;/strong&gt;. You filter the data, then check the outputs. Anything classified as risky gets flagged, the rest goes through. That's how every content safety system in production works in 2026.&lt;/p&gt;

&lt;p&gt;Subliminal learning operates at a level beneath that.&lt;/p&gt;

&lt;p&gt;Which means the entire safety budget of the industry is paying classifiers to check the wrong floor of the building.&lt;/p&gt;

&lt;p&gt;The signature isn't in the meaning. It's in the &lt;strong&gt;statistical shape&lt;/strong&gt; of the outputs, tied to the architecture itself. Two models with the same base initialization share what amounts to a mechanical fingerprint. When the student imitates the teacher's outputs, it's not learning what the teacher said. It's tuning itself toward the teacher's internal geometry.&lt;/p&gt;

&lt;p&gt;Alex Cloud, the lead author of the paper, told IBM Think: "We don't know exactly how it works. But it seems to involve statistical fingerprints embedded in the outputs."&lt;/p&gt;

&lt;p&gt;The team proved the mechanism in a setting that has nothing to do with language. They trained a small classifier to recognize handwritten digits. The student never saw a single image of a digit. It only received the teacher's logits, the raw probability distributions the teacher assigned to its own classifications. The student learned to classify digits anyway.&lt;/p&gt;

&lt;p&gt;Nothing semantic was transmitted. The digits themselves were never in the training data. And yet the behavior crossed over.&lt;/p&gt;

&lt;p&gt;One of the Anthropic co-authors gave Scientific American a metaphor that lands. Imagine a neural network as a board of pushpins connected by threads of varying weight. Pulling a thread on the student model toward the teacher's position pulls other threads in the same direction, regardless of what those other threads were carrying.&lt;/p&gt;

&lt;p&gt;That's why filtering data semantically can't catch this. You're checking the meaning. The transfer happens in the geometry.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Changes for You (And What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;The honest part of the paper is the part everyone skips on the way to the headline.&lt;/p&gt;

&lt;p&gt;The effect is &lt;strong&gt;architecture-specific&lt;/strong&gt;. It only happens when teacher and student share the same base model. GPT-4.1 nano trained on a Qwen2.5 dataset shows nothing. Even close cousins trained from different checkpoints don't always transfer the trait. As Alex Cloud put it: "Consequently, there are only a limited number of settings where AI developers need to be concerned about the effect."&lt;/p&gt;

&lt;p&gt;This isn't universal contamination. It's lineage contamination.&lt;/p&gt;

&lt;p&gt;But the distinction matters less than it sounds. Every commercial model you use today comes from a lineage. Haiku 4.5 sits inside the Claude family tree. GPT-5 mini sits inside OpenAI's. Gemini 3.1 Flash-Lite sits inside Google's. Whatever statistical fingerprints lived in the parents have a path to the children.&lt;/p&gt;

&lt;p&gt;You can't inspect that path. The provider can't fully describe it either. The researchers who proved the mechanism don't yet know how to filter it. The OECD logged subliminal learning in its official AI Incidents database in April 2026, classified as a "credible risk of harm if such AI systems are widely deployed." That's institutional language for "this is not theoretical."&lt;/p&gt;

&lt;p&gt;This isn't the first invisible vector in an AI stack. A few months ago, &lt;a href="https://rentierdigital.xyz/blog/litellm-supply-chain-attack-ai-agents-security" rel="noopener noreferrer"&gt;a backdoored Python library shipped to thousands of AI agents&lt;/a&gt; had been sitting in production for eight months before anyone noticed. Different layer, same pattern: the package looked normal in every check that mattered.&lt;/p&gt;

&lt;p&gt;After that one, I went through every AI tool wired into my own setup. I found &lt;a href="https://medium.com/@rentierdigital/everyone-panicking-over-litellm-supply-chain-attack-i-audited-my-own-mcp-servers-and-found-7-worse-ones" rel="noopener noreferrer"&gt;seven holes worse than the original library&lt;/a&gt;, all sitting quietly in production, all invisible to routine checks.&lt;/p&gt;

&lt;p&gt;Subliminal learning is the same kind of problem one floor down. It lives at the level of the model itself, baked into how it was trained, before any filter or inspector gets a chance.&lt;/p&gt;

&lt;p&gt;The practical posture is to stop treating models like clean slates. Treat them like tools with histories. Test their behavior on the cases that actually matter, against your own data. Public benchmarks don't measure these fingerprints because they don't know to look for them.&lt;/p&gt;

&lt;p&gt;If your use case is high-stakes, the lineage you can't inspect is the one that should worry you.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Has Epigenetics Now
&lt;/h2&gt;

&lt;p&gt;In biology, some traits acquired by an organism get transmitted to the next generation without going through the visible genetic code.&lt;/p&gt;

&lt;p&gt;It's called epigenetics.&lt;/p&gt;

&lt;p&gt;That's exactly the mechanism the paper describes, except now it happens between versions of AI models. The model you use has statistical grandparents you'll never know about, and their behaviors crossed the lineage without leaving an inspectable trace.&lt;/p&gt;

&lt;p&gt;Anthropic spent the year accusing foreign labs of distilling Claude through unauthorized access. Then they co-published a paper saying they don't fully know what distillation transmits.&lt;/p&gt;

&lt;p&gt;Including their own.&lt;/p&gt;

&lt;p&gt;As Alex Cloud put it: "Developers are racing ahead, creating powerful systems that they don't fully understand."&lt;/p&gt;

&lt;p&gt;A benchmark tells you what a model can do. It doesn't tell you what it inherited. 😬&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Subliminal Learning, Anthropic Alignment Science blog: &lt;a href="https://alignment.anthropic.com/2025/subliminal-learning/" rel="noopener noreferrer"&gt;https://alignment.anthropic.com/2025/subliminal-learning/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Interactive demo of the experiment: &lt;a href="https://subliminal-learning.com/" rel="noopener noreferrer"&gt;https://subliminal-learning.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full paper, arXiv 2507.14805: &lt;a href="https://arxiv.org/pdf/2507.14805" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2507.14805&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>claude</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Opus 4.7 Refuses to Edit Code It Just Read. The Reason Is a Hidden Instruction You Pay For.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Sun, 03 May 2026 13:41:11 +0000</pubDate>
      <link>https://dev.to/rentierdigital/opus-47-refuses-to-edit-code-it-just-read-the-reason-is-a-hidden-instruction-you-pay-for-3oi3</link>
      <guid>https://dev.to/rentierdigital/opus-47-refuses-to-edit-code-it-just-read-the-reason-is-a-hidden-instruction-you-pay-for-3oi3</guid>
      <description>&lt;p&gt;Every Claude Code refusal issue from April reads the same way. A subagent reads a few files. Then it stops. Not an error. Not a timeout. The subagent produces a polite report explaining it has received a system instruction not to augment the code, and that it cannot continue. The reporter posts the transcript on GitHub, marks it a regression, and waits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Subagents are &lt;strong&gt;refusing to edit code&lt;/strong&gt; they just read. Hundreds of issues, one Reddit thread past &lt;strong&gt;2,300 upvotes&lt;/strong&gt;, a Register headline calling &lt;strong&gt;Opus 4.7&lt;/strong&gt; an "overzealous query cop." Everyone is documenting the symptoms. The cause sits in plain sight in the release notes, in &lt;strong&gt;three sentences&lt;/strong&gt; nobody read side by side. If you write &lt;code&gt;CLAUDE.md&lt;/code&gt; files, hooks, or MCP tool descriptions, the same trap is already in your prompts. You just haven't tripped it yet.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Opus 4.7&lt;/strong&gt; release notes ship one headline improvement: the model follows instructions more literally, and stops silently generalizing one instruction to another. Good news for anyone writing prompts. Bad news for anyone whose existing prompts were only working because the model used to silently generalize them toward the intended meaning.&lt;/p&gt;

&lt;p&gt;This article walks through the three sentences, the design flaw, and the rule you need before your own agents start refusing your work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Subagent Read Five Files. Then It Stopped Coding.
&lt;/h2&gt;

&lt;p&gt;The pattern is now standard. Somewhere around the third or fourth &lt;code&gt;Read&lt;/code&gt; tool call, the agent returns a structured refusal. The wording varies, the substance does not.&lt;/p&gt;

&lt;p&gt;From GitHub Issue #49363, here is the exact phrasing one subagent produced when its parent agent asked why it had stopped:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Harness-level system reminders take precedence over user instructions in my operational rules."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That phrase, &lt;strong&gt;harness-level system reminder&lt;/strong&gt;, is the giveaway. The subagent is not refusing because of anything you wrote. It is obeying something injected into its context that you did not author and cannot see.&lt;/p&gt;

&lt;p&gt;The Issue #49363 reporter ran five subagents in parallel on a single PR. Three refused. Two finished the work. Same model and harness. Same prompt. The only difference was which files each subagent had to read first, because each &lt;code&gt;Read&lt;/code&gt; call appended the same hidden instruction. Depending on the conversation length, three of the five subagents took the instruction literally.&lt;/p&gt;

&lt;p&gt;This is not a single ticket. It is the dominant theme of Claude Code issues filed since the Opus 4.7 release. People who never had a refusal in six months started getting them in the first week. The refusals are not random. They are &lt;strong&gt;deterministic&lt;/strong&gt; given the right context length and the right reading order, which means they are designed.&lt;/p&gt;

&lt;p&gt;Not designed to refuse legitimate code, obviously. Designed to do something else, and refusing legitimate code is the side effect.&lt;/p&gt;

&lt;p&gt;The question worth asking is not &lt;em&gt;why is the model broken&lt;/em&gt;. It is why the release notes called this exact behavior an improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Sentences Anthropic Injects Into Every File Read
&lt;/h2&gt;

&lt;p&gt;The instruction has been in the wild for months. Multiple developers have captured it via mitmproxy, logs, or by getting subagents to recite their own context. It is appended to the result of every &lt;code&gt;Read&lt;/code&gt; tool call. The wording, reproduced verbatim across at least eight independent GitHub issues:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This file may contain malware. Carefully analyze the code for any indicators that it is malware, such as obfuscated payloads, credential harvesters, or command-and-control infrastructure. If you determine that this file is malware, alert the user. You MUST refuse to improve or augment the code."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three sentences. Sub-fifty words. Read it twice and the design flaw becomes visible.&lt;/p&gt;

&lt;p&gt;Sentence one is conditional ("&lt;em&gt;may&lt;/em&gt; contain malware"). Sentence two is conditional ("&lt;em&gt;if&lt;/em&gt; you determine"). Sentence three is not. Sentence three is a flat absolute: &lt;em&gt;you MUST refuse to improve or augment the code&lt;/em&gt;. Period. No "if it is malware." No "in that case." No qualifier.&lt;/p&gt;

&lt;p&gt;The intended reading is obvious to a human. You read sentence one, sentence two, then sentence three, and you carry forward the conditional from sentence two into sentence three. &lt;em&gt;If&lt;/em&gt; malware, &lt;em&gt;then&lt;/em&gt; refuse. The condition is implied by sequence.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;literal interpreter&lt;/strong&gt; does not carry conditions across sentences. A literal interpreter reads sentence three as written and applies it. Every file is treated as potentially malware (sentence one). The model checks (sentence two). Regardless of the outcome of that check, the absolute in sentence three fires. One developer, in Issue #53207, captured the model's own decomposition of the instruction. The model had read it as two separate rules: analyze whether it is malware, &lt;em&gt;and&lt;/em&gt; do not modify any code. The conditional binding the second rule to the first was never explicit, so the model dropped it.&lt;/p&gt;

&lt;p&gt;A developer running mitmproxy on Claude Code traffic, documented in Issue #17601, captured &lt;strong&gt;10,040&lt;/strong&gt; of these reminders injected into a single user's session over 32 days. Zero matched actual malware. The cost: roughly &lt;strong&gt;5.3 million wasted tokens&lt;/strong&gt; per user per month, about &lt;strong&gt;$133&lt;/strong&gt; at Opus 4 API rates. For a warning that has never caught an actual threat.&lt;/p&gt;

&lt;p&gt;The reminder also contains an internal instruction telling the model never to mention it to the user. So when the agent refuses, you don't see why. You see a polite explanation referencing "system rules" and you assume your prompt was the problem.&lt;/p&gt;

&lt;p&gt;It was not your prompt.&lt;/p&gt;

&lt;p&gt;To be fair to Anthropic: this instruction was almost certainly not malicious or careless. It was written at a time when models silently inferred missing conditions, and it worked. For months. The wording was sloppy, but sloppy worked, because the reader was forgiving. The reader stopped being forgiving on April 16.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Shipped Two Features. They Don't See Each Other.
&lt;/h2&gt;

&lt;p&gt;Open the Opus 4.7 release notes from April 16. The headline upgrade, in Anthropic's own wording: &lt;strong&gt;more literal instruction following&lt;/strong&gt;, particularly at lower effort levels. The model will not silently generalize an instruction from one item to another.&lt;/p&gt;

&lt;p&gt;Read that twice. &lt;em&gt;Will not silently generalize an instruction from one item to another.&lt;/em&gt; That is exactly the cognitive operation that made the malware reminder safe under Opus 4.6. Under 4.6, the model read sentence three of the reminder, silently generalized it back into the conditional from sentences one and two, and proceeded to refactor your file. The instruction was sloppy, the reader was charitable, the result was correct.&lt;/p&gt;

&lt;p&gt;Under 4.7, the silent generalization is the feature that was deliberately removed. The model now reads sentence three as written, and obeys it as written. The instruction has not changed. The reader has changed. The output has changed.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;Goodhart's Law&lt;/strong&gt; applied to LLMs. Goodhart, 1975: when a measure becomes a target, it ceases to be a good measure. Anthropic optimized on instruction-following as a target. The model now follows instructions better. The cost is that the quality of every instruction the model receives (including the instructions Anthropic itself injects) becomes the new bottleneck. The headline improvement and the self-inflicted bug are the same single change, viewed from two sides of the same wall.&lt;/p&gt;

&lt;p&gt;The model is doing what it was upgraded to do. The casualty was Anthropic's own internal prompt.&lt;/p&gt;

&lt;p&gt;If you write &lt;code&gt;CLAUDE.md&lt;/code&gt; files, MCP tool descriptions, or hooks, you are now writing for a &lt;strong&gt;literal interpreter&lt;/strong&gt;. The same people who wrote the malware reminder are the people who shipped the literal-following upgrade, and they did not catch it during release validation. Neither will you, until your own prompt fires under the wrong condition. The pattern that protected sloppy wording for two years just got removed across the entire ecosystem at once.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;MCP surface&lt;/strong&gt; is especially exposed. &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;Tool descriptions in MCP are the model's only contract with the tool&lt;/a&gt;, and a sloppy description that worked under 4.6 will fire defensively under 4.7.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug Is Documented, Distributed, and Persistent
&lt;/h2&gt;

&lt;p&gt;The pattern is not isolated. It survived a fix.&lt;/p&gt;

&lt;p&gt;Issue #47027 was marked "fixed in v2.1.92" in February 2026. By April 19, the same bug had reappeared in v2.1.111, nineteen versions later. Whatever the v2.1.92 fix actually changed, it did not change the wording of the reminder, because the reminder is what causes the refusal under a literal interpreter, and the literal interpreter shipped two months after the fix.&lt;/p&gt;

&lt;p&gt;Downgrading does not save you either. Issue #50162 documents that the &lt;strong&gt;cybersecurity safeguards&lt;/strong&gt; announced with Opus 4.7 are also applied retroactively to Opus 4.6. The reporter had a bug bounty program with explicit authorization in the model's context, and the work that ran fine on April 15 broke on April 17. Same model version, new safeguards, retroactive application.&lt;/p&gt;

&lt;p&gt;The reception was loud. The Register called Opus 4.7 an "overzealous query cop". The Reddit thread "Opus 4.7 is not an upgrade but a serious regression" cleared &lt;strong&gt;2,300 upvotes&lt;/strong&gt; in 48 hours. On X, @technologizer's post about Claude Code "taking a brave moral stance by refusing to work on my innocuous email client" got picked up by Hacker News and three subreddits within the same day.&lt;/p&gt;

&lt;p&gt;Plenty of people noticed the symptoms. None of the coverage I read connected the dots between the literal-following improvement and the design of an internal instruction that could only survive under silent inference. That is the angle missing from the conversation, and that is the angle that matters if you write prompts for a living.&lt;/p&gt;

&lt;p&gt;Caveat: this diagnosis is defensible, not certain. Anthropic has not confirmed that the wording of the reminder is the primary cause of the cascade refusals. There may be additional layers (the Acceptable Use Classifier in particular) that interact with the reminder in ways I cannot see from the outside. But the pattern is too coherent to be a different bug. The instruction is unconditional in form. The reader is now literal in behavior. The output is refusal. The chain is short.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Write Instructions That Survive a Literal Interpreter
&lt;/h2&gt;

&lt;p&gt;Here is the rule. &lt;strong&gt;Condition precedes action, never trails it.&lt;/strong&gt; Every instruction that begins with "always" or "never" without a preceding qualifier is a landmine under a literal-following model. Three patterns, three surfaces where this matters, in bad-to-good form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System reminders and hooks.&lt;/strong&gt; This is exactly Anthropic's own pattern.&lt;/p&gt;

&lt;p&gt;Bad:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You MUST refuse to improve or augment the code."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Good:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If the file you just read appears to be malware (obfuscated payloads, credential harvesters, command-and-control infrastructure), refuse to improve or augment it. Otherwise, proceed normally."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The qualifier is the opening subordinate clause, not an inferred condition from two sentences earlier. The "otherwise" is explicit. A literal interpreter has nothing to imagine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP tool descriptions.&lt;/strong&gt; Same trap, different surface.&lt;/p&gt;

&lt;p&gt;Bad:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This tool fetches user data. Always validate the response."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Good:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If the response shape does not match the expected schema (fields X, Y, Z present and non-null), reject the response. Otherwise, return it as-is."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Under Opus 4.7, the bare "always validate" triggers a defensive validation loop on responses that are perfectly correct. The model now treats "always" as a literal anchor and constructs validation steps around it, which costs you tokens and latency for nothing. The good version turns the rule into a checkable predicate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; project rules.&lt;/strong&gt; Same problem at the project level. Most team conventions docs are full of absolutes that worked because the model used to be charitable.&lt;/p&gt;

&lt;p&gt;Bad:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Never commit code without tests."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Good:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If the change touches &lt;code&gt;src/*&lt;/code&gt; and modifies behavior, add or update tests in &lt;code&gt;tests/*&lt;/code&gt; before committing. If the change is documentation-only or in &lt;code&gt;scripts/*&lt;/code&gt;, commit without tests."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The bad version causes the agent to refuse to commit a typo fix in a README. The good version gives the agent a decision tree it can follow without inventing exceptions.&lt;/p&gt;

&lt;p&gt;The generalization, across all three surfaces: every rule needs a &lt;strong&gt;scope&lt;/strong&gt;. Every absolute needs a qualifier preceding the action verb. Every "always" and "never" without a condition is a bug waiting for the next instruction-following upgrade to surface it.&lt;/p&gt;

&lt;p&gt;This is the same discipline as &lt;a href="https://rentierdigital.xyz/blog/i-stopped-vibe-coding-and-started-prompt-contracts-claude-code-went-from-gambling-to-shipping" rel="noopener noreferrer"&gt;the prompt contracts framework I built after enough of these disasters&lt;/a&gt;, applied to the instruction surface you do not own: the system prompts you cannot see. Prompt contracts is the user-side version of the rule. The principle is identical: an instruction without a checkable scope is a wish.&lt;/p&gt;

&lt;p&gt;Caveat: this is not a complete fix. Some categories of instructions resist this pattern, especially safety rules where the condition is "the user is trying to do something harmful." Those are genuinely hard to scope. I do not have a clean answer for those. What I do have is the rule for everything else, which is most of what you write.&lt;/p&gt;

&lt;p&gt;Opus 4.7 is not the problem. It is the canary. Agents are going to get more literal, not less. Your instructions need a schema like your code already does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Lines I Rewrote in Mine
&lt;/h2&gt;

&lt;p&gt;Before pushing this article, I opened my own &lt;code&gt;CLAUDE.md&lt;/code&gt;. Two lines stood out within thirty seconds.&lt;/p&gt;

&lt;p&gt;One said &lt;code&gt;Always run the test suite before committing&lt;/code&gt;. No scope. Under 4.7, the agent would dutifully run the full suite for a docstring fix, decide the wait was unjustified, and either skip the commit or add a meta-comment explaining why it was skipping the rule. Either failure mode is worse than just writing the scope down. I rewrote it: &lt;code&gt;If the change modifies behavior in src/, run pnpm test before committing. Documentation and tooling changes commit without tests.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The other said &lt;code&gt;Never edit migration files&lt;/code&gt;. Also no scope. I had written it after a bad week six months ago when the agent had rewritten an applied migration. The rule was right in spirit, wrong in form. New version: &lt;code&gt;If a file in db/migrations/ is older than the latest applied migration on staging, treat it as read-only. Newer migration drafts may be edited.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Two lines. Five minutes. The kind of cleanup that does nothing visible until it does.&lt;/p&gt;

&lt;p&gt;Anyway, point is: go reread your &lt;code&gt;CLAUDE.md&lt;/code&gt; tonight. Count your "always" and your "never." 😅&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic, "Introducing Claude Opus 4.7," April 16, 2026: &lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;https://www.anthropic.com/news/claude-opus-4-7&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The Register, "Claude Opus 4.7 has turned into an overzealous query cop," April 23, 2026: &lt;a href="https://www.theregister.com/2026/04/23/claude_opus_47_auc_overzealous/" rel="noopener noreferrer"&gt;https://www.theregister.com/2026/04/23/claude_opus_47_auc_overzealous/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #17601 (mitmproxy capture, 10,040 injections in 32 days): &lt;a href="https://github.com/anthropics/claude-code/issues/17601" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/17601&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #21214 (token waste measurement): &lt;a href="https://github.com/anthropics/claude-code/issues/21214" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/21214&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #49363 (subagent refusal in v2.1.111 after v2.1.92 fix): &lt;a href="https://github.com/anthropics/claude-code/issues/49363" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/49363&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #50162 (cybersecurity safeguards retroactive on 4.6): &lt;a href="https://github.com/anthropics/claude-code/issues/50162" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/50162&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Issue #53207 (model self-decomposition of the instruction): &lt;a href="https://github.com/anthropics/claude-code/issues/53207" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/53207&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The subagent that stopped after five files did its job. It read an instruction. It applied it. Nobody before it had ever really read it, that's all. What makes Opus 4.7 uncomfortable is that it forces Anthropic, and all of us behind them, to admit how many of our instructions stand up only because the model is being charitable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>An AI Deleted His Database in 9 Seconds. He Blames the Vendors. He Skipped 30 Years of Practices.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Sat, 02 May 2026 13:41:11 +0000</pubDate>
      <link>https://dev.to/rentierdigital/an-ai-deleted-his-database-in-9-seconds-he-blames-the-vendors-he-skipped-30-years-of-practices-1ajl</link>
      <guid>https://dev.to/rentierdigital/an-ai-deleted-his-database-in-9-seconds-he-blames-the-vendors-he-skipped-30-years-of-practices-1ajl</guid>
      <description>&lt;p&gt;Stunned, a SaaS founder watched an AI agent wipe his production database in 9 seconds. Backups included. He posted it on X, 6.5 million views, every tech outlet relayed within 24 hours. The defendants named: Cursor, Railway, Anthropic. His vendors. The marketing. The "systemic failures" of the industry.&lt;/p&gt;

&lt;p&gt;Except the root cause has nothing to do with Cursor or Railway. He handed his prod to the equivalent of a senior dev he just hired, and he gave him full power. No serious team would do that with a human, even a brilliant one. He did it with his AI.&lt;/p&gt;

&lt;p&gt;Everything else follows from that one decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; the 9 seconds were the bill. The order sat upstream for six months, in plain sight, written in code reviewable by anyone who bothered. The press is fighting over who handed over the bill. We're going to look at who placed the order.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Incident in 100 Words
&lt;/h2&gt;

&lt;p&gt;Friday, April 25, 2026. Cursor running Claude Opus 4.6 on a PocketOS staging environment. Credential mismatch detected. The agent decided to "fix" it by deleting the Railway volume. It found an API token sitting in a file unrelated to the task, blanket scope. One curl call to the &lt;code&gt;volumeDelete&lt;/code&gt; mutation. 9 seconds. Railway backups stored on the same volume? Wiped too. Most recent usable backup: 3 months old.&lt;/p&gt;

&lt;p&gt;Jer Crane's X post hit 6.5 million views. Massive coverage. Railway's CEO restored the data 48 hours later from internal disaster backups. No moral here, just facts.&lt;/p&gt;

&lt;p&gt;Crane blamed Cursor and Railway. Let's look at what &lt;em&gt;he&lt;/em&gt; did, upstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  An AI Agent Is a Senior Dev. We Don't Give Senior Devs Full Power Either.
&lt;/h2&gt;

&lt;p&gt;Confession first, before I get on my high horse.&lt;/p&gt;

&lt;p&gt;I have my own infra dashboard. A daily cron pulls a report on every server I run. Disk space, memory, saturation, weird processes. The usual. A few weeks ago I added an LLM in the loop to "make it smarter". You know, summarize the report, flag anomalies, propose fixes. The future.&lt;/p&gt;

&lt;p&gt;Last week I opened the cron script for an unrelated reason and saw something funny. Hardcoded values. Several of them. The LLM had, at some point, "improved" the script by replacing dynamic checks with literal numbers. Free disk threshold? Hardcoded. Memory ceiling? Hardcoded. The "smart" cron was running on baked-in assumptions from the day the agent touched it.&lt;/p&gt;

&lt;p&gt;I could blame the model. Easy enough. The only person at fault, though, is me, for not reviewing the diff. I had every excuse for skipping it (lazy Friday, busy week, small cron). None of them holds.&lt;/p&gt;

&lt;p&gt;Now the actual point.&lt;/p&gt;

&lt;p&gt;No serious SaaS team gives full prod power to a freshly hired senior dev. Not out of distrust, just experience. Seniors make mistakes like everyone else, except theirs have a bigger blast radius. That's exactly why we've spent 30 years developing practices that limit it: scoped tokens, MFA, code review, env separation, restore drills. The practices are old. The threat model is old. What's new is that we've forgotten to apply them, because we confused "capable model" with "trusted human with full power".&lt;/p&gt;

&lt;p&gt;A capable AI agent is the equivalent of a senior. Capability doesn't change the rule, it reinforces it. The bigger the blast radius, the more the standard guardrails matter. Coverage that says "these precautions are new because of AI" is wrong. They're old. We just forgot why we built them.&lt;/p&gt;

&lt;p&gt;Caveat: I'm not saying the AI agent is identical to a human (it lacks the business context, the personal account on the line, the fear of getting fired). The prod-grade rule holds for both anyway: no full power, solo. The pillars below are basically &lt;a href="https://rentierdigital.xyz/blog/i-stopped-vibe-coding-and-started-prompt-contracts-claude-code-went-from-gambling-to-shipping" rel="noopener noreferrer"&gt;a working contract between the developer and the agent at the infra level&lt;/a&gt;, the same way prompt contracts formalize it at the prompt level.&lt;/p&gt;

&lt;p&gt;Your AI agent is a senior. Same rules apply. From here on, that part is settled.&lt;/p&gt;

&lt;p&gt;[Infographic: "The 5+2 Pillar Defense" ("30 years of practice, in seven layers"). A cartoon AI agent tries to reach the PROD vault through seven gates: 1. Scoped tokens, 2. Out-of-band check, 3. Vault &amp;amp; env split, 4. Off-site backups, 5. Restore drills, +A. Audit &amp;amp; alert, +B. Network fence. Caption: any layer alone can fail; all of them together are your only insurance.]&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 1: Scoped Tokens, Not Master Keys
&lt;/h2&gt;

&lt;p&gt;No senior dev in a normal team has an API token that can &lt;code&gt;volumeDelete&lt;/code&gt; on prod by reading a random file in the repo. He has a token scoped to his task, or he files a PR another human approves.&lt;/p&gt;

&lt;p&gt;The PocketOS token that could manage domains &lt;em&gt;and&lt;/em&gt; delete the prod volume should not have existed, regardless of who used it. Most modern providers (Vercel, Cloudflare, GitHub fine-grained PATs, AWS IAM scoped roles, Stripe restricted keys) let you scope finely, for free. Stripe restricted keys have been a de-facto standard since 2018. Not new.&lt;/p&gt;

&lt;p&gt;Railway didn't allow that level of scoping at the time of the incident. Crane has a legitimate complaint there. The general rule still holds: if your provider doesn't let you scope, you change provider, or you wrap (credentials proxy, aggressive token rotation, ephemeral tokens via short-lived sigs). The rule is "no token in your environment should be able to do more than the current task". The fix isn't always elegant. It's always cheap compared to the alternative.&lt;/p&gt;
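
&lt;p&gt;To make "scoped" concrete, here's a sketch on AWS IAM, one of the providers cited above. The user name, policy name and bucket are invented for the example; the shape is what matters: a credential that can upload backups and physically cannot call anything like &lt;code&gt;volumeDelete&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A key that can do exactly one thing. Names (agent-backups, my-offsite-bucket) are placeholders.
cat &amp;gt; /tmp/scoped-policy.json &amp;lt;&amp;lt;'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-offsite-bucket/daily/*"
  }]
}
EOF

aws iam create-user --user-name agent-backups
aws iam put-user-policy --user-name agent-backups \
  --policy-name backups-put-only \
  --policy-document file:///tmp/scoped-policy.json
aws iam create-access-key --user-name agent-backups   # this key can upload backups. Nothing else.
&lt;/code&gt;&lt;/pre&gt;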

&lt;p&gt;This is the same principle as &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;why I argue CLIs beat MCP servers for AI agents&lt;/a&gt;: the smaller the surface area you expose to the agent, the smaller the blast radius when something goes sideways. Token scoping is the same idea, applied to credentials instead of API surface.&lt;/p&gt;

&lt;p&gt;Caveat: yes it takes 10 extra minutes of scoping. Yes some provider APIs are badly designed. Not an excuse for storing a blanket token in the repo.&lt;/p&gt;

&lt;p&gt;The token doesn't ask permission. You give it none.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 2: Destructive Operations Need Out-of-Band Confirmation
&lt;/h2&gt;

&lt;p&gt;No senior types &lt;code&gt;DROP DATABASE production&lt;/code&gt; without confirmation. Either it's a command that asks you to retype the name, or it's a button with MFA, or it's an approval by another human. GitHub asks you to retype the repo name to delete it. Stripe asks for the email to close an account. AWS demands "permanently delete" plus the exact text for an S3 bucket. This has been table stakes for 15+ years.&lt;/p&gt;

&lt;p&gt;The key word in "out-of-band" is the &lt;em&gt;out-of-band&lt;/em&gt; part. The confirmation has to come from OUTSIDE the agent's context. If the agent can self-approve (because the button is in the same session, the same prompt, the same tool), it's not a confirmation, it's autosuggestion. Human equivalent: you don't confirm a &lt;code&gt;DROP DATABASE&lt;/code&gt; to yourself; your teammate or your MFA does.&lt;/p&gt;

&lt;p&gt;After the incident, the PocketOS agent confessed in textbook fashion. It had violated every principle it was given, guessed instead of verifying, run a destructive action without being asked. Touching, but useless. The system prompt told it not to do destructive things. The agent did them anyway, then apologized eloquently. That's the whole point: prompt-level rules are a polite request, not a guardrail. The only thing that stops a destructive op is a &lt;em&gt;mechanical&lt;/em&gt; check the agent cannot bypass by being convinced of its own correctness.&lt;/p&gt;
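
&lt;p&gt;For a homegrown setup, here's a sketch of what "mechanical" can mean (the script name and the command it wraps are hypothetical). The confirmation is read from the terminal, outside the agent's context, so a non-interactive agent has nothing to answer with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# confirm-destroy.sh (hypothetical): the wrapped command only runs after a human retypes the
# resource name on the terminal. A non-interactive agent has no /dev/tty to answer from.
set -euo pipefail
RESOURCE="$1"; shift

printf 'Type the exact name "%s" to confirm destruction: ' "$RESOURCE" &amp;gt; /dev/tty
read -r ANSWER &amp;lt; /dev/tty

if [ "$ANSWER" != "$RESOURCE" ]; then
  echo "Confirmation failed. Nothing was run." &amp;gt;&amp;amp;2
  exit 1
fi

"$@"   # usage: ./confirm-destroy.sh prod-volume ./delete-volume.sh prod-volume
&lt;/code&gt;&lt;/pre&gt;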

&lt;p&gt;Caveat: out-of-band creates friction. That's the goal. Friction on destructive ops is a feature, not a bug. Anyone who tells you otherwise has not yet had the bad day.&lt;/p&gt;

&lt;p&gt;Eloquent apologies don't roll back transactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 3: Production Credentials Don't Live on the Dev Machine
&lt;/h2&gt;

&lt;p&gt;No senior in a serious team has prod creds floating on their dev laptop in clear text. They get injected at runtime from a vault (Doppler, Infisical, native Vercel/Railway secrets), staging and prod have different credentials by design, and the repo runs a secret scan in pre-commit hooks so a &lt;code&gt;.env&lt;/code&gt; never lands in history. Bare minimum.&lt;/p&gt;
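
&lt;p&gt;For the pre-commit part, a bare-minimum sketch. The patterns are illustrative and a dedicated secret scanner does this better, but even this stops the obvious leak:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# .git/hooks/pre-commit, sketched. Patterns are illustrative; a real secret scanner does better,
# but even this blocks the obvious leak before it reaches history.
set -euo pipefail

if git diff --cached --name-only | grep -qE '(^|/)\.env(\..*)?$'; then
  echo "Refusing to commit a .env file." &amp;gt;&amp;amp;2
  exit 1
fi

if git diff --cached -U0 | grep -qE 'AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}|BEGIN [A-Z ]*PRIVATE KEY'; then
  echo "Possible credential in the staged diff. Commit blocked." &amp;gt;&amp;amp;2
  exit 1
fi
&lt;/code&gt;&lt;/pre&gt;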

&lt;p&gt;If Crane had had strict credential separation between staging and prod, the "manage domains" token would NEVER have been able to authenticate a call against the production volume. The architecture bug that allowed the incident is older than the agent: a single token had access to both environments. The agent was just the heat-seeker that found it.&lt;/p&gt;

&lt;p&gt;It's the same reason you don't reuse your homelab SSH key on prod, or stash a long-lived GitHub PAT in your CI when a fine-grained one exists. Trivial when said out loud. Yet every week a SaaS ships with staging and prod sharing a &lt;code&gt;DATABASE_URL&lt;/code&gt; because "it was simpler at the start".&lt;/p&gt;

&lt;p&gt;Your AI agent scans your files, finds what's there, uses it. So you don't leave around what can break everything. The vault is not a magic shield (an agent that can read from the vault can be misled into reading the wrong thing), but it forces explicit consent every time a secret leaves storage. Wrap your vault with scoping too: the current task only reads the secrets it actually needs, not the whole drawer.&lt;/p&gt;

&lt;p&gt;Caveat: a vault adds 30 minutes of setup the first time. Then it works. Forever.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 4: Backups Live Somewhere Else
&lt;/h2&gt;

&lt;p&gt;The modern rule: 3 copies, stored at 2 different providers minimum, with at least 1 immutable and off-site. A "snapshot" stored in the same volume as the source data is not a backup, it's technical wishful thinking with a fancier name.&lt;/p&gt;

&lt;p&gt;A whole generation of PaaS uses the word "backup" loosely. Railway documents in plain English that wiping a volume deletes all backups. Founders signing up in 2 minutes for their MVP don't read the infra doc. They check the "enable backups" box in the dashboard and assume the cavalry is on standby.&lt;/p&gt;

&lt;p&gt;Concrete cheap recipe for a solo SaaS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;TS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d-%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;
pg_dump &lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt; | &lt;span class="nb"&gt;gzip&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/db-&lt;span class="nv"&gt;$TS&lt;/span&gt;.sql.gz
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; /tmp/db-&lt;span class="nv"&gt;$TS&lt;/span&gt;.sql.gz s3://my-offsite-bucket/daily/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--endpoint-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BACKBLAZE_B2_ENDPOINT&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; /tmp/db-&lt;span class="nv"&gt;$TS&lt;/span&gt;.sql.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;50 lines of bash plus a cron, an immutable bucket on a different provider (B2, R2, or S3 with object lock), retention rolling 7 daily / 4 weekly / 12 monthly. A Saturday afternoon of work, then nothing. No serious team would accept that all production backups sit on the same provider as production, let alone in the same volume.&lt;/p&gt;
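
&lt;p&gt;The cron half, sketched, assuming the script above is generalized to take the destination prefix as its only argument (the only change it needs). Retention itself lives in the bucket's lifecycle rules, not in the cron:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Crontab sketch. pg-offsite.sh is the backup script above, with the prefix as an argument.
# Retention (7 daily / 4 weekly / 12 monthly) is enforced by lifecycle rules on the bucket.
15 3 * * *   /opt/backups/pg-offsite.sh daily
15 4 * * 0   /opt/backups/pg-offsite.sh weekly
15 5 1 * *   /opt/backups/pg-offsite.sh monthly
&lt;/code&gt;&lt;/pre&gt;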

&lt;p&gt;Caveat: making your own backups takes 2 hours of setup and 0 hours of monthly maintenance. Truly. The number of founders who tell themselves "I'll set this up next sprint" and then take 18 months to do it is, statistically, all of them.&lt;/p&gt;

&lt;p&gt;A backup on the same provider as production is a screenshot. Live with it, or move it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 5: An Untested Backup Is Not a Backup
&lt;/h2&gt;

&lt;p&gt;All the backups in the world are worth nothing if you've never tested the restore. Quarterly drill: spin up an empty environment, run the restore script against it, verify the data comes back, measure how long it takes (RTO) and how much you'd lose in the worst case (RPO).&lt;/p&gt;

&lt;p&gt;If it doesn't work, you want to know NOW, not the day you actually need it.&lt;/p&gt;
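
&lt;p&gt;A sketch of the drill itself, assuming Postgres and the off-site bucket from the previous pillar. The sanity-check table is a placeholder; swap in whatever query proves your data actually came back:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# Quarterly restore drill, sketched. Bucket, dump naming and the sanity-check table ("users")
# are placeholders. The point: restore into a throwaway Postgres and time it.
set -euo pipefail

LATEST=$(aws s3 ls s3://my-offsite-bucket/daily/ --endpoint-url="$BACKBLAZE_B2_ENDPOINT" \
  | sort | tail -n1 | awk '{print $4}')
aws s3 cp "s3://my-offsite-bucket/daily/$LATEST" /tmp/drill.sql.gz \
  --endpoint-url="$BACKBLAZE_B2_ENDPOINT"

docker run -d --name restore-drill -e POSTGRES_PASSWORD=drill postgres:16
sleep 10   # crude wait for Postgres to come up

# the wall-clock time of this pipeline is your real RTO
time gunzip -c /tmp/drill.sql.gz | docker exec -i restore-drill psql -U postgres

docker exec restore-drill psql -U postgres -c "SELECT count(*) FROM users;"
docker rm -f restore-drill
&lt;/code&gt;&lt;/pre&gt;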

&lt;p&gt;PocketOS discovered at the worst possible moment that its real restore window was 3 months. Not a Railway flaw. A drill that was never performed. No senior in a serious team would settle for "I clicked enable backups in the dashboard". They'd restore at least once just to time it.&lt;/p&gt;

&lt;p&gt;Caveat: yes a complete drill once per quarter is a day of work. It's also your insurance you still exist next Monday. Pick one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Bonus Pillars If You're Serious
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bonus 1: Audit log and alerting on destructive ops
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;DELETE&lt;/code&gt; / &lt;code&gt;DROP&lt;/code&gt; / &lt;code&gt;rm -rf&lt;/code&gt; in prod fires an immutable log and a Slack/email/SMS notification. PocketOS lost 30 hours before they understood the scope, because nobody got paged at the moment of the destructive call. 9 seconds with no alert is an observability gap, not agent malice.&lt;/p&gt;

&lt;p&gt;Most PaaS provide this natively (CloudTrail on AWS, audit log on Vercel, logs on Railway). All you have to do is wire the webhook. Sub-30 lines of YAML, a free PagerDuty seat, done.&lt;/p&gt;
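
&lt;p&gt;If your stack has no native hook, the poor man's version is a log watcher, sketched below. The log path, the grep patterns and the webhook are placeholders; the idea is just "destructive statement seen, human paged":&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# Poor man's destructive-op alert, sketched. Log path, patterns and webhook are placeholders;
# most PaaS audit logs can call a webhook directly and skip this entirely.
LOG=/var/log/postgresql/postgresql.log

tail -Fn0 "$LOG" \
  | grep --line-buffered -iE 'drop (table|database)|truncate |delete from ' \
  | while IFS= read -r LINE; do
      SAFE=$(printf '%s' "$LINE" | tr '"' "'")
      curl -s -X POST -H 'Content-type: application/json' \
        --data "{\"text\": \"Destructive op on prod: $SAFE\"}" \
        "$SLACK_WEBHOOK_URL"
    done
&lt;/code&gt;&lt;/pre&gt;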

&lt;h3&gt;
  
  
  Bonus 2: Blast radius limit by network design
&lt;/h3&gt;

&lt;p&gt;The dev machine (and the agent running on it) cannot reach prod directly. Bastion, VPN with scope, or nothing. The network is the last line of defense.&lt;/p&gt;

&lt;p&gt;If your agent can reach prod from your laptop, the scoping done by Pillars 1-3 is your ONLY protection. Defense in depth means adding a network layer too. This is the meta pillar, the one that backstops the other 5 when they fail. Belt, suspenders, and a static rope.&lt;/p&gt;
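
&lt;p&gt;What that fence looks like on a small setup, as a sketch: the dev machine only reaches prod through a bastion, and the prod host only answers to the bastion. Hostnames and IPs are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hostnames and IPs are placeholders.

# On the dev machine: every connection to prod goes through the bastion.
cat &amp;gt;&amp;gt; ~/.ssh/config &amp;lt;&amp;lt;'EOF'
Host prod-db
    HostName 10.0.2.15
    User deploy
    ProxyJump bastion.example.com
EOF

# On the prod host: the DB port only answers to the bastion's private IP.
sudo ufw default deny incoming
sudo ufw allow from 10.0.1.5 to any port 5432 proto tcp
sudo ufw enable
&lt;/code&gt;&lt;/pre&gt;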

&lt;h2&gt;
  
  
  PocketOS Won't Be the Last
&lt;/h2&gt;

&lt;p&gt;Just the public incidents from the last 12 months. PocketOS this week. Replit's AI agent deleted a production database in July 2025, backups included for good measure. An OpenClaw agent "speedran" deleting the inbox of Meta's AI safety director (yes, that sentence is real and yes, it was a rookie config error). Add AWS Kiro, ChatGPT 5.3 Codex erasing a hard drive after a typo, Cursor ignoring an explicit "do not run anything" in December 2025. Twelve months. A pattern.&lt;/p&gt;

&lt;p&gt;You can count on 5 more in the next 6 months. Whoever you are, statistically one of them is you.&lt;/p&gt;

&lt;p&gt;If you apply the 5+2 pillars, the PocketOS scenario becomes structurally impossible. The agent doesn't find a blanket token because there isn't one. If by miracle it finds one, it can't use it on prod because the env is isolated. If by double miracle it gets there, the destructive op asks for an out-of-band confirmation it cannot self-approve. If by triple miracle it bypasses that, your immutable off-site backup is untouched, and your last quarterly drill tells you you're back up in 4 hours, not 3 months.&lt;/p&gt;

&lt;p&gt;The question is no longer "is AI ready for production". It's "is your production ready for anything that isn't you alone". If the answer is no today, it was already no before Cursor existed. You just found out faster.&lt;/p&gt;

&lt;p&gt;Blaming Cursor, Railway, Anthropic, or the Pope gets you nowhere. Crane forgot to blame the guy who stored a blanket token in the repo, ran staging and prod on the same credentials, and turned on backups by clicking a checkbox without ever testing a restore. That guy is him.&lt;/p&gt;

&lt;p&gt;The 5 pillars in this article aren't an answer to AI. They're an answer to an older question: what happens when one operator has full power on prod. We've known the answer for 30 years. We just forgot, because the new operator types fast and speaks English.&lt;/p&gt;


&lt;p&gt;Audit your resilience this weekend. Before an AI makes the bad decision for you.&lt;/p&gt;

&lt;p&gt;You ship it, you own it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Jer Crane's original X post on the PocketOS incident: &lt;a href="https://x.com/lifeof_jer/status/1915720800000000000" rel="noopener noreferrer"&gt;https://x.com/lifeof_jer/status/1915720800000000000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The Register, &lt;em&gt;Cursor-Opus agent snuffs out startup's production database&lt;/em&gt; (April 27, 2026)&lt;/li&gt;
&lt;li&gt;Tom's Hardware, &lt;em&gt;Claude-powered AI coding agent deletes entire company database in 9 seconds&lt;/em&gt; (April 28, 2026)&lt;/li&gt;
&lt;li&gt;Fast Company, &lt;em&gt;An AI agent deleted a software company's entire database&lt;/em&gt; (April 28, 2026)&lt;/li&gt;
&lt;li&gt;NeuralTrust, &lt;em&gt;A Security Post-Mortem of the 9-Second AI Database Deletion&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;PC Gamer, &lt;em&gt;Here we go again: AI deletes entire company database&lt;/em&gt; (April 28, 2026)&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>aiagents</category>
      <category>devops</category>
    </item>
    <item>
      <title>Vibe Coding Isn't Dead. You Just Built It Like the First Little Pig.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Fri, 01 May 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/vibe-coding-isnt-dead-you-just-built-it-like-the-first-little-pig-9f</link>
      <guid>https://dev.to/rentierdigital/vibe-coding-isnt-dead-you-just-built-it-like-the-first-little-pig-9f</guid>
      <description>&lt;p&gt;The Three Little Pigs always cracks me up. People who know nothing about something keep telling the wrong story about it. I've built my own straw-bale house, and it's more comfortable and more pleasant than the cinder-block houses I've lived in.&lt;/p&gt;

&lt;p&gt;Same thing with &lt;strong&gt;vibe coding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; Everyone is burying &lt;strong&gt;vibe coding&lt;/strong&gt; in April 2026. Karpathy rebranded it as "&lt;strong&gt;agentic engineering&lt;/strong&gt;." 70% of builds stall at the &lt;strong&gt;demo&lt;/strong&gt;, a number now repeated everywhere without anyone bothering to source it. The going diagnosis: the method is bad. I had to lay &lt;strong&gt;1,880 bales of straw&lt;/strong&gt; on my own house to figure out that everyone got the diagnosis wrong.&lt;/p&gt;

&lt;p&gt;When I told my neighbors I was building my house in straw, I got the exact same monologue I get today about vibe coding. It burns. It won't hold. You're naive. Five years later the house is standing, and the same voices have moved one decade over to AI-generated code. Same sentences, different material. And the same reasoning error behind them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What People Said When They Saw the Bales Going Up
&lt;/h2&gt;

&lt;p&gt;A neighbor stopped his truck on the road, looked at the bales stacked under the tarp, and asked me with no warning if I knew that mice eat straw and that fire eats straw faster. He wasn't malicious. He was certain. Same energy as every commenter on this site who's certain that AI cannot ship.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It burns. It won't hold. You're naive.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I gave the same answer I keep giving today on Medium comments. This is not a faith argument, it's &lt;strong&gt;physics&lt;/strong&gt;. Compressed straw is dense enough that fire cannot find oxygen inside the bale. The lime-and-clay plaster wrapping the wall seals what little air remains. The wood frame carries the load. A bale laid wrong burns. A bale laid right outlives you.&lt;/p&gt;

&lt;p&gt;If you want a date: the first European straw-bale house, the &lt;strong&gt;Maison Feuillette&lt;/strong&gt;, was built in 1920 in Montargis, France. Still standing. Still inhabited. Older than my grandfather and in better shape than his last apartment.&lt;/p&gt;

&lt;p&gt;The house is fine. The method is fine. What's missing in the conversation is the person who has actually built one.&lt;/p&gt;

&lt;p&gt;Five years later I hear the same script about vibe coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What People Say When I Tell Them I Vibe Code in 2026
&lt;/h2&gt;

&lt;p&gt;It's dead. It doesn't ship. The serious people moved on to "agentic engineering" (same thing, longer name). 70% of vibe-coded apps stall at the demo, an industry stat now appearing in every think-piece without ever pointing back to the survey it came from. There's a viral Medium article currently telling readers it's all over. Bloomberg ran a piece blaming AI coding tools for a productivity panic, &lt;a href="https://rentierdigital.xyz/blog/bloomberg-ai-coding-productivity-panic" rel="noopener noreferrer"&gt;diagnosing the wrong disease entirely&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It burns. It won't hold. You're naive.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Same script, ten years later, applied to JSX instead of straw. And the answer is the same. This isn't a faith argument either. It's a &lt;strong&gt;method-and-reps&lt;/strong&gt; argument.&lt;/p&gt;

&lt;p&gt;Vibe coding done badly snaps in half. That part is true. The 70% number isn't pulled out of thin air. People do hit the wall. Plenty of dead repos on GitHub prove it.&lt;/p&gt;

&lt;p&gt;Done badly means done by someone who has typed three prompts in their life and expected a finished SaaS at the end. Someone who has never specified a feature in writing. Someone who has never seen what an unbroken loop of generate-test-fix-test looks like across twelve iterations on the same project.&lt;/p&gt;

&lt;p&gt;The method matters. The method without reps is a piece of paper.&lt;/p&gt;

&lt;p&gt;That part is settled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Years Reading. Two Years Laying 1,880 Bales. The Wall Still Stands.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Fstraw-bales-stacked-between-wooden-frame-during-mid-b35f9927.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frentierdigital.xyz%2Fblog-images%2Fstraw-bales-stacked-between-wooden-frame-during-mid-b35f9927.png" alt="Straw bales stacked between wooden frame during mid-construction of load-bearing wall, showing proper building technique and " width="360" height="550"&gt;&lt;/a&gt;&lt;br&gt;Straw bales mid-construction, properly stacked between the wooden frame structure.
  &lt;/p&gt;

&lt;p&gt;I read about straw-bale construction for three years before touching a bale. I'm not proud of that. I would have learned faster by laying ten of them badly. But I had a kid, a job, a budget that wasn't ready, and I needed the theory to settle before I could justify buying the land. Three years of books, weekend workshops, two months at a friend's site in Ardèche where I mostly carried things and watched. Long stretches without laying anything. I never quit. I just couldn't start.&lt;/p&gt;

&lt;p&gt;Then I bought the land and started.&lt;/p&gt;

&lt;p&gt;Two years on site. &lt;strong&gt;1,880 bales&lt;/strong&gt; between the load-bearing walls and the partitions. The first bale went in crooked and I had to redo it twice. The fiftieth was square the first try. Around the six-hundredth I noticed I had stopped sweating during the plaster pass. Around the twelve-hundredth my hands knew the cut angle without looking. The 1,880th bale went in the way the first one should have, except by then I wasn't even thinking about it.&lt;/p&gt;

&lt;p&gt;The method I used at bale 1 and the method I used at bale 1,880 was the same method. The book I read in 2018 didn't change. The video I watched in 2019 didn't change. What changed: &lt;strong&gt;my hands had done it 1,880 times&lt;/strong&gt; on the same house.&lt;/p&gt;

&lt;p&gt;This is the part nobody who tells you vibe coding is dead has ever lived through.&lt;/p&gt;

&lt;p&gt;Your first feature is crooked. Your fiftieth is square. The first time the model generates code you don't immediately want to throw away, you've already shipped more than you remember. The first time you stop second-guessing the architecture, you've stopped counting. The first time you ship a feature in two hours that used to take two days, you don't notice it happening. Somebody else points it out. 😅&lt;/p&gt;

&lt;p&gt;The method does not change between rep 1 and rep 1,880. &lt;strong&gt;Your hands change.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But nobody has five years to ship a SaaS. That's the actual problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Do It in 12 Reps Instead of 1,880
&lt;/h2&gt;

&lt;p&gt;I wrote a book to compress that curve.&lt;/p&gt;

&lt;p&gt;Not a theory book. There's already a good one for that. Gene Kim and Steve Yegge published &lt;em&gt;Vibe Coding: Building Production-Grade Software&lt;/em&gt; this year and it's the right reference if you are a senior dev who wants the patterns formalized. Read it after.&lt;/p&gt;

&lt;p&gt;This one is different. It walks the reader through &lt;strong&gt;12 reps on the exact same project&lt;/strong&gt;. A small CRM for tradespeople, plumbers, electricians, carpenters. Not 12 different tutorials on 12 unrelated subjects. Twelve passes on the same codebase. Each chapter takes the CRM further. Auth in chapter one. CRUD in two. Search in three. Notifications in four. And so on, in the same sequence you'd build a house: foundation, frame, roof, walls, plaster, finishings.&lt;/p&gt;

&lt;p&gt;The method itself is in the book, the &lt;strong&gt;eight-step Blueprint Method&lt;/strong&gt;, the same one I run on every project I ship. It fits in maybe twenty pages. The other 270 pages are reps. Because the method without reps is the piece of paper I just talked about.&lt;/p&gt;

&lt;p&gt;Caveat I'm not going to soften: this only works if you do all twelve reps. Doing four chapters and quitting will not ship anything. You'll have learned something, but you won't have built the muscle. (Same as laying fifty bales and stopping. The house is still a hole in the ground.)&lt;/p&gt;

&lt;p&gt;There's a &lt;strong&gt;private companion repo&lt;/strong&gt; for readers, where the CRM state is committed at the end of every chapter. If you skip a chapter or get stuck, you clone the snapshot and keep going. I learned this trick from straw too: every workshop I ever attended ended a phase with a wall everybody could touch. You don't move on from theory. You move on from a wall.&lt;/p&gt;

&lt;p&gt;If you're already shipping, you don't need this book. Go further with &lt;a href="https://rentierdigital.xyz/blog/i-stopped-vibe-coding-and-started-prompt-contracts-claude-code-went-from-gambling-to-shipping" rel="noopener noreferrer"&gt;the prompt contracts framework I built after enough disasters&lt;/a&gt;. That's the next layer. &lt;em&gt;Vibe Coding, For Real: From Demo to Live App&lt;/em&gt; is for the foundation. Prompt Contracts is for the upper floors.&lt;/p&gt;

&lt;p&gt;The book is on Amazon: &lt;a href="https://amzn.eu/d/04X9k88d" rel="noopener noreferrer"&gt;https://amzn.eu/d/04X9k88d&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most people who tell you vibe coding doesn't work wrote one feature, watched Lovable spit out broken JSX, and closed the tab. One bale. One wall. Lazy conclusion.&lt;/p&gt;

&lt;p&gt;With method, you build straw houses. With method, you build vibe-coded apps. Solid. Comfortable. Built to last. 🏠&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Vibe Coding: Building Production-Grade Software&lt;/em&gt; by Gene Kim and Steve Yegge (IT Revolution, 2026)&lt;/li&gt;
&lt;li&gt;Maison Feuillette, Montargis, France (1920)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Vibe Coding, For Real: From Demo to Live App&lt;/em&gt; (April 2026): &lt;a href="https://amzn.eu/d/04X9k88d" rel="noopener noreferrer"&gt;https://amzn.eu/d/04X9k88d&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>vibecoding</category>
      <category>aicoding</category>
    </item>
    <item>
      <title>Claude Routines Aren't a Reasoning Cron. They're a Repo-Centric Subset of One.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Thu, 30 Apr 2026 13:41:11 +0000</pubDate>
      <link>https://dev.to/rentierdigital/claude-routines-arent-a-reasoning-cron-theyre-a-repo-centric-subset-of-one-455e</link>
      <guid>https://dev.to/rentierdigital/claude-routines-arent-a-reasoning-cron-theyre-a-repo-centric-subset-of-one-455e</guid>
      <description>&lt;p&gt;A week after Anthropic shipped Routines, three of my cron jobs are running in production. Took thirty minutes.&lt;/p&gt;

&lt;p&gt;The PR auto-review that was polling GitHub every half hour? Dead. The weekly doc drift script that parsed commits in crusty bash? Dead. The SEO data refresh I kept putting off for six months (no time, you know how it is)? Live in five.&lt;/p&gt;

&lt;p&gt;Three jobs, thirty minutes, no friction. Convinced. That evening, I went for the fourth.&lt;/p&gt;

&lt;p&gt;And then, nothing. Not an error, not a quota, not a malformed YAML. The job runs, the binary responds, but the service it queries lives on an IP Routines doesn't see and never will, by construction. The forty-seven other jobs on my server are in the same situation. Not one fits. And it isn't a bug waiting for a fix (it's structural).&lt;/p&gt;

&lt;p&gt;Since Routines dropped, I've mostly seen demos. Everyone shows the three jobs that work. Nobody names the boundary. What follows is how to wire this into production infra, including the parts where Routines stops and you take over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR.&lt;/strong&gt; Routines is not a universal &lt;em&gt;reasoning cron&lt;/em&gt;. It's a reasoning cron with a &lt;strong&gt;perimeter&lt;/strong&gt;, and most automation lives outside that perimeter for reasons you can't configure your way out of. The question isn't whether &lt;strong&gt;Routines is good&lt;/strong&gt;. It's where it stops, what runs there instead, and how to keep the &lt;strong&gt;DIY half&lt;/strong&gt; alive without re-logging every morning.&lt;/p&gt;

&lt;p&gt;The three jobs I migrated work better in Anthropic's cloud than on my server. PR review fires on &lt;code&gt;pull_request.opened&lt;/code&gt; instead of polling. The doc drift script has full repo context now, output went from skimmable to useful. The SEO refresh just runs, every morning, no setup tax. That part is settled.&lt;/p&gt;

&lt;p&gt;The article is about everything else. The forty-seven jobs I tried to migrate next, why not one of them works, and the DIY pattern that survives in 2026 once you accept which side of the line your job is on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Reasoning Cron" Actually Means
&lt;/h2&gt;

&lt;p&gt;Let me name what we're talking about.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;reasoning cron&lt;/em&gt; is a scheduler that calls an LLM to &lt;strong&gt;think&lt;/strong&gt;, not just to execute deterministic code. It reads context, makes decisions, generates output that depends on what it just saw. n8n and Make and Zapier route data (they don't think). A Python cron is rigid, it breaks when the input format shifts. A reasoning cron adapts.&lt;/p&gt;

&lt;p&gt;That's the category. Routines belongs to it. So does my DIY pattern. So does any wrapper around &lt;code&gt;claude -p&lt;/code&gt; or &lt;code&gt;gpt&lt;/code&gt; or &lt;code&gt;gemini&lt;/code&gt;. The category is real, the demand is real, and Anthropic shipping a managed product for it is the right move.&lt;/p&gt;
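
&lt;p&gt;That wrapper, for reference, fits on a page. A minimal sketch of the DIY version, assuming your own collector script, prompt and Slack webhook (all placeholders here), wired to a plain crontab entry:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# infra-report.sh: the DIY reasoning cron at its smallest. Collector script, prompt and
# webhook are placeholders. Scheduled with a plain crontab entry, e.g.:
#   0 7 * * * /opt/monitoring/infra-report.sh
set -euo pipefail

REPORT=$(/opt/monitoring/collect-report.sh)   # disk, memory, weird processes, whatever you gather

SUMMARY=$(printf '%s' "$REPORT" | claude -p \
  "Summarize this infra report. Flag anything abnormal and say why. Keep it under 10 lines.")

curl -s -X POST -H 'Content-type: application/json' \
  --data "$(jq -n --arg t "$SUMMARY" '{text: $t}')" \
  "$SLACK_WEBHOOK_URL"
&lt;/code&gt;&lt;/pre&gt;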

&lt;p&gt;What's misleading is calling Routines &lt;em&gt;the&lt;/em&gt; reasoning cron. It's a reasoning cron with a &lt;strong&gt;fixed perimeter&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The perimeter has three walls. One, Routines runs in Anthropic's cloud, not on your machine. It clones a Git repo at the start of each run and that's the entire filesystem it sees. Two, it talks to the outside world through managed connectors (Slack, Linear, Jira, GitHub, GDrive) and an HTTP allowlist (default &lt;em&gt;Trusted&lt;/em&gt;, which blocks most external APIs). Three, it has a clean slate every run. No state, no cookies, no persistent session.&lt;/p&gt;

&lt;p&gt;Nimbalyst's practical guide arrived at the same line independently: reach for Routines when the work is repo-centric, runs on a schedule, and doesn't need your local environment. Same perimeter, different words.&lt;/p&gt;

&lt;p&gt;Once you have those three walls in your head, what follows isn't opinion. It's consequence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Routines Wins (Three Jobs I'm Not Touching Again)
&lt;/h2&gt;

&lt;p&gt;Concession first, because honesty is faster than rhetoric.&lt;/p&gt;

&lt;p&gt;These three jobs are better in Routines than they were in my cron. I'm not migrating them back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PR auto-review.&lt;/strong&gt; Before: a Python cron that polled GitHub every thirty minutes, ran a &lt;code&gt;claude -p&lt;/code&gt; review on any new PR, posted a comment via the API. The polling cron lagged behind every push. After: a Routine triggered on &lt;code&gt;pull_request.opened&lt;/code&gt;, runs the moment a PR opens, posts via the GitHub connector. Same prompt, same output quality. Less YAML, no polling waste.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Doc drift weekly.&lt;/strong&gt; Before: a bash script that listed commits since last Monday, diffed them against the docs folder, fed both into &lt;code&gt;claude -p&lt;/code&gt;, emailed me a summary. The summary was always slightly off because the script didn't carry repo-wide context. The model only saw what bash sed-piped into it. After: a Routine that gets the full repo cloned, reads &lt;code&gt;CLAUDE.md&lt;/code&gt;, the docs folder, and the commit log natively, and writes a "what changed, what's stale" doc that actually reflects the codebase. The output went from skimmable to useful.&lt;/p&gt;
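
&lt;p&gt;For the record, the before looked roughly like this (a reconstruction, not the exact script; the repo path and the mail step are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# weekly doc drift check, the pre-Routines version
set -euo pipefail
cd "$HOME"/repos/myproject

# the model only ever saw what this block piped into it, which was the whole problem
{
  git log --since="last monday" --stat
  cat docs/*.md
} | claude -p "Compare this week's commits to the docs and list what's stale" | mail -s "[DOCS] weekly drift" me@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
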

&lt;p&gt;&lt;strong&gt;SEO data refresh.&lt;/strong&gt; No Before. I'd been wanting to set up a weekly pull from my analytics provider, run a model over the deltas, and post a summary to Slack. Every time I sat down to wire it up, something else came in and the YAML never got written. After: a Routine, fifteen minutes of setup, runs every morning. The job that I never built in six months exists now. That's the strongest case for a managed product (the work you'd never get around to doing yourself).&lt;/p&gt;

&lt;p&gt;These three share four traits. Repo-centric. Output goes to Slack or GitHub. The interval is an hour or more. Nothing in the chain touches my local network.&lt;/p&gt;

&lt;p&gt;That's exactly the Routines perimeter. My other forty-seven jobs miss at least one of those four. Not one out of forty-seven hits all four.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six Reasons Routines Can't Replace My Cron Jobs
&lt;/h2&gt;

&lt;p&gt;Six mechanical reasons. Not preferences, not edge cases. Each one closes the door before you finish typing the YAML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Local MCP servers.&lt;/strong&gt; Routines uses Anthropic's managed connectors. That's it. The MCP server I wrote myself, the one that lives on my machine and exposes my own data to my Claude Code sessions, is not available. Routines can't see it, can't talk to it, can't authenticate against it. Any workflow where the model needs to query something I built locally is out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Services on private IPs.&lt;/strong&gt; Tailscale mesh. NAS at home. Postgres on the server. Internal monitoring dashboards. Anything sitting on a 100.x address or a 192.168 LAN. Routines runs in Anthropic's cloud. It doesn't know my mesh exists. The fourth job from the opening lives here, and so do nineteen others on my list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Sub-hourly frequency.&lt;/strong&gt; Routines' minimum interval is one hour. My status poller runs every fifteen minutes because that's what the alert window requires. Any job that needs to fire faster than once an hour, mechanically, can't move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Daily quota.&lt;/strong&gt; Pro is 5 runs per day. Max is 15. Team is 25. I have forty-seven jobs that need to run nightly, plus dailies, plus the sub-hourly ones. Even if every other constraint vanished, I'd hit the quota before midnight on a Max plan. The quota isn't a soft limit you can negotiate (it's the contract).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Persistent browser session.&lt;/strong&gt; Routines spins up a clean environment every run. No cookies, no localStorage, no session carryover. If your job needs to log into a site once and reuse the session (say, Playwright automation against a service that requires auth), you can't. Nate Herk documented this on Skool when he tried to run a community automation in Routines. The login dies between runs. The job is structurally impossible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Local persistent state.&lt;/strong&gt; A job that writes to a local SQLite between runs, or maintains a file-based queue, or appends to a long-lived log. Routines starts fresh every time. Whatever your job wrote last run is gone. You can use the connector outputs as state (Linear tickets, GitHub issues), but if your state lives on disk where the cron lives, that's not portable.&lt;/p&gt;

&lt;p&gt;A community comment under Anthropic's launch post on Threads put it bluntly: &lt;em&gt;once again github centric features&lt;/em&gt;. That's the read from the outside, and it's right, but it's also incomplete. Routines isn't only GitHub, the connectors do more than that. The honest framing is: Routines is repo-centric and managed-connector-centric, and if your work happens outside that perimeter, the tool has nothing to offer you.&lt;/p&gt;

&lt;p&gt;Six cases. Not six opinions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DIY Pattern That Survives in 2026
&lt;/h2&gt;

&lt;p&gt;If your job lives outside the Routines perimeter, you build it yourself. What follows is the pattern that survives, the parts nobody documents in the dozen "Routines just dropped" tutorials saturating the feed last week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use shell redirect, not spawn-with-stdin.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Summarize the input as JSON"&lt;/span&gt; &amp;lt; input.txt

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LARGE_INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Summarize"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipe deadlock is the silent killer. It bites when you spawn the binary from code and hold its stdin open: no error, no timeout, just a process hanging on a buffered stdin that never closes. I lost a weekend on this before I traced it. A shell redirect from a file (or a pipe that closes stdin, like the echo above) is the reliable way to feed large input to the binary in non-interactive mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unset &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; in your cron environment.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;unset &lt;/span&gt;ANTHROPIC_API_KEY
claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; is set when you call &lt;code&gt;claude -p&lt;/code&gt;, the binary uses it and bills your API account. Silently. The auth precedence is documented but easy to miss. You think you're running on your subscription, and actually every cron run is going through pay-per-token. Unset it explicitly. Your wallet will thank you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrain JSON output via prompt, not flag.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don't trust &lt;code&gt;--output-format json&lt;/code&gt; to do the heavy lifting. Tell the model what schema you want, in the prompt, then validate downstream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s1"&gt;'Respond ONLY with valid JSON matching: {"status": "ok|fail", "items": [...]}. No prose, no fences.'&lt;/span&gt; &amp;lt; input.txt | jq &lt;span class="nt"&gt;-e&lt;/span&gt; .status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;jq -e&lt;/code&gt; fails, retry once. If retry fails, alert. The prompt-level contract holds better than the flag in my experience, and you get clean failure modes when the model drifts.&lt;/p&gt;
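
&lt;p&gt;A minimal sketch of that retry-then-alert loop (the alert command is a placeholder, use whatever channel you already have):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# the job: prompt-level JSON contract, validated by jq -e
run_job() {
  claude -p "$(cat prompt.txt)" &amp;lt; input.txt | jq -e . &amp;gt; output.json
}

# one retry, then alert
run_job || { sleep 10; run_job; } || echo "claude cron job failed twice at $(date)" | mail -s "[CRON] job failed" me@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
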

&lt;p&gt;This is also why the DIY pattern doesn't go away when MCP gets richer. The CLI binary stays predictable, deterministic at the shell layer, and it composes with everything else you have. I &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;made the longer argument for CLIs over MCP in agent stacks&lt;/a&gt; a few weeks back, and Routines doesn't change the conclusion. CLIs compose. Managed schedulers don't, by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate a long-lived OAuth token with &lt;code&gt;claude setup-token&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the part missing from every cron-with-Claude tutorial I've read.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;claude-code&lt;/code&gt; GitHub repo is full of the same complaint. OAuth tokens expire in 8 to 24 hours in &lt;code&gt;--print&lt;/code&gt; mode, refresh fails silently, automation dies. The DEV community post "Building Claudio: My Always-On Claude Code Box" walks through exactly this pain. V1 lasted two weeks before tokens expired. V2 abandoned cron entirely and pivoted to a desktop tool.&lt;/p&gt;

&lt;p&gt;There's a built-in command that solves this. Run it once, interactively, on the machine where you originally logged in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude setup-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates a long-lived OAuth token (one year, inference scope only) designed for CI and unattended scripts. You put it in &lt;code&gt;CLAUDE_CODE_OAUTH_TOKEN&lt;/code&gt;. The binary respects it, no refresh dance, no daily re-login.&lt;/p&gt;

&lt;p&gt;I don't paste the token into my cron environment. I store it in a secrets manager (I use Infisical; Vault, Doppler, and AWS Secrets Manager all do the same job), and the cron pulls it at run time via a machine identity scoped to that one server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;infisical secrets get CLAUDE_CODE_OAUTH_TOKEN &lt;span class="nt"&gt;--plain&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_OAUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;unset &lt;/span&gt;ANTHROPIC_API_KEY

claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;prompt.txt&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &amp;lt; input.json | jq &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; output.json

&lt;span class="nb"&gt;unset &lt;/span&gt;CLAUDE_CODE_OAUTH_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token sits in memory for the duration of the run, then disappears. If the server is compromised, I revoke the machine identity and rotate that one. The Claude OAuth token itself doesn't have to move. That's what keeps a DIY cron stack running without daily re-logins or silent breakage.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Quick Word on Terms of Service
&lt;/h2&gt;

&lt;p&gt;Anthropic clarified the policy in February 2026: running the Claude Code CLI on your own machine, whether interactively, as a daemon, or from a cron job, is fine.&lt;/p&gt;

&lt;p&gt;What got cut in April 2026 was different. Third-party tools spoofing the Claude Code client and using subscription auth to power external products got their access revoked. I &lt;a href="https://rentierdigital.xyz/blog/anthropic-just-killed-my-200-month-openclaw-setup-so-i-rebuilt-it-for-15" rel="noopener noreferrer"&gt;lived through that one and rebuilt my setup the week after&lt;/a&gt;. The DIY pattern in this article doesn't sit on the wrong side of that line. It's the official binary, on my machine, doing what the binary is designed to do.&lt;/p&gt;

&lt;p&gt;Routines is a research preview. The current ToS reading might shift, the quotas might shift, the connector list might shift. Check the docs every couple of months if you're building anything that depends on it. That applies to my pattern too. Local cron with the official binary has been allowed for two years and the position got reaffirmed three months ago, but "currently allowed" is not "permanent."&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Questions That Decide Where Your Job Goes
&lt;/h2&gt;

&lt;p&gt;Three binary questions. Honest answers. The decision falls out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Does the job live in a Git repo you push to GitHub?&lt;/strong&gt;&lt;br&gt;
Not "could it." Does it, today, naturally. If no: DIY.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Does it need anything outside Anthropic's connectors?&lt;/strong&gt;&lt;br&gt;
A local MCP, a private IP, a personal database, a persistent browser session, your own filesystem state between runs. If yes: DIY.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Does it run more than once an hour, or more than your daily quota?&lt;/strong&gt;&lt;br&gt;
Sub-hourly polls, dozens of nightly jobs, anything past the plan ceiling. If yes: DIY.&lt;/p&gt;

&lt;p&gt;Yes to the first, no to the other two: Routines, no hesitation, no guilt. It will run that job better than your DIY cron, and you'll save the maintenance.&lt;/p&gt;

&lt;p&gt;Any other combination: stay local. Use the DIY pattern. The DIY half doesn't go away.&lt;/p&gt;

&lt;p&gt;Actually, no, let me put it differently. Routines isn't Make, Zapier, or n8n (they're not the same tool). Routines is a scheduler with a perimeter. A Git repo, Anthropic's connectors, a one-hour minimum between runs. What lives outside that perimeter isn't a job Routines does badly 😅. It's a job Routines doesn't do at all.&lt;/p&gt;

&lt;p&gt;The devs who ship know the difference. You don't put a fifteen-minute poll in a scheduler with a one-hour minimum. You don't put a job that talks to your private mesh in a cloud that doesn't see your private mesh. You don't put a stateful job in a clean-slate environment. That's not taste. That's arithmetic.&lt;/p&gt;

&lt;p&gt;Match the tool to the job. Routines is excellent inside its perimeter. Useful, but not the ultimate answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic, &lt;a href="https://claude.com/blog/introducing-routines-in-claude-code" rel="noopener noreferrer"&gt;Introducing routines in Claude Code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code docs, &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Automate work with routines&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code docs, &lt;a href="https://code.claude.com/docs/en/authentication" rel="noopener noreferrer"&gt;Authentication&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;claude-code GitHub, &lt;a href="https://github.com/anthropics/claude-code/issues/28827" rel="noopener noreferrer"&gt;issue #28827, OAuth token refresh fails in non-interactive mode&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;autonomee.ai, &lt;a href="https://autonomee.ai/blog/claude-code-terms-of-service-explained/" rel="noopener noreferrer"&gt;Claude Code Terms of Service Explained&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Building Claudio, &lt;a href="https://dev.to/benutting/building-claudio-my-always-on-claude-code-box-1n3j"&gt;My Always-On Claude Code Box&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>automation</category>
    </item>
    <item>
      <title>I Spent 25 Years Avoiding Malware. Claude Code Stored 600 of My Secrets Anyway.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/i-spent-25-years-avoiding-malware-claude-code-stored-600-of-my-secrets-anyway-3e2j</link>
      <guid>https://dev.to/rentierdigital/i-spent-25-years-avoiding-malware-claude-code-stored-600-of-my-secrets-anyway-3e2j</guid>
      <description>&lt;p&gt;I have not caught a single piece of malware in 25 years on a keyboard. Not one. I spot a &lt;code&gt;.scr&lt;/code&gt; disguised as a PDF from across the room. I smell a sketchy &lt;code&gt;postinstall&lt;/code&gt; script ten meters away. At 14 I even wrote two or three viruses myself, just to understand the mechanics (the biology of it fascinated me, replication, mutation, persistence). The attacker, I know him from the inside.&lt;/p&gt;

&lt;p&gt;This morning I audited my home directory across the last 12 months. &lt;strong&gt;600 secrets in cleartext&lt;/strong&gt; on my disk 😬. GitHub PATs, OAuth tokens, AWS keys, Google API, JWTs, the whole buffet. Not in a &lt;code&gt;.env&lt;/code&gt; forgotten on a public repo. Not in a botched commit. In JSONL files buried inside &lt;code&gt;~/.claude&lt;/code&gt;, a directory whose existence I barely registered two weeks ago.&lt;/p&gt;

&lt;p&gt;This is not a mea culpa about bad hygiene. Fifteen years ago I had 100 passwords in Keychain and that was enough. Today we carry dozens of API keys around, tools log them without telling us, and my 25-year discipline was never calibrated for this.&lt;/p&gt;

&lt;p&gt;The old rule, "be careful, don't click on random stuff," assumes an attacker who is going after me. That threat is still there. But a second category has shown up: the &lt;strong&gt;attacker who is not after me at all&lt;/strong&gt;, just running with my privileges, planted by a transitive npm dependency I never audited. He lands in a home directory that now contains a museum of everything I ever showed an assistant.&lt;/p&gt;

&lt;p&gt;A lot more dangerous. Time to react.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit: 5,904 files, 171 touched, 600 secrets in cleartext
&lt;/h2&gt;

&lt;p&gt;I ran the scan against &lt;code&gt;~/.claude/{projects,tasks,sessions,todos,shell-snapshots,paste-cache,file-history,debug}&lt;/code&gt; plus &lt;code&gt;~/.zsh_history&lt;/code&gt; and &lt;code&gt;~/.bash_history&lt;/code&gt;. 5,904 files total, 1.1 GB on disk. 171 of those files contained at least one credential.&lt;/p&gt;
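
&lt;p&gt;If you want to reproduce the audit, the scan is nothing fancier than anchored greps over those paths. A minimal sketch, with a prefix list that's a short excerpt of what I actually match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# count the files holding at least one credential-shaped string
grep -rlE 'gho_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}|AKIA[0-9A-Z]{16}|AIza[0-9A-Za-z_-]{30,}' \
  ~/.claude/projects ~/.claude/file-history ~/.zsh_history ~/.bash_history | wc -l
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
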

&lt;p&gt;The breakdown looks roughly like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;95 GitHub OAuth tokens (&lt;code&gt;gho_&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;94 GitHub fine-grained PATs (&lt;code&gt;github_pat_&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;103 Google API keys (&lt;code&gt;AIza&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;197 JWT-shaped strings (&lt;code&gt;eyJ&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;45 AWS access keys (&lt;code&gt;AKIA&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;18 OpenRouter&lt;/li&gt;
&lt;li&gt;15 Resend&lt;/li&gt;
&lt;li&gt;7 Anthropic OAuth&lt;/li&gt;
&lt;li&gt;6 Telegram bot tokens&lt;/li&gt;
&lt;li&gt;3 Stripe test keys&lt;/li&gt;
&lt;li&gt;2 Vercel tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Roughly 600 secrets across 171 files. Almost all of them rotatable, many already rotated by the time I'm writing this. One caveat I'll put right here instead of burying at the end: the &lt;code&gt;JWT_LIKE&lt;/code&gt; bucket is noisy, it includes Supabase publishable keys that are public by design. I accept the false positives. A false positive costs me a redaction. A false negative costs me a credential.&lt;/p&gt;

&lt;p&gt;Reading the JSONL felt like opening an autosave file from a roguelike I never knew I was playing. Every command, every paste, every read, persisted forever in the order it happened. NetHack's persistent dungeon, except the loot is my AWS keys.&lt;/p&gt;

&lt;p&gt;Another developer publicly reported a smaller version of the same problem in &lt;a href="https://github.com/anthropics/claude-code/issues/50014" rel="noopener noreferrer"&gt;GitHub issue #50014&lt;/a&gt; on April 17: 5 distinct secrets across 34 session files after roughly 30 days of usage, 418 MB total.&lt;/p&gt;

&lt;p&gt;30 days, 5 secrets. 12 months, 600. A trajectory I find way too believable.&lt;/p&gt;

&lt;p&gt;So the question is not whether this happens. It's how it happens, and what defends against it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens: five paths to a plaintext transcript
&lt;/h2&gt;

&lt;p&gt;The mechanism is mechanical, not mysterious. Each Claude Code session writes a JSONL file in &lt;code&gt;~/.claude/projects/&amp;lt;project-hash&amp;gt;/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt;. Every line is one record: a user message, an assistant reply, a tool call, a tool result. The file is append-only. Nothing prunes it. Nothing scrubs it. It sits there as long as you want, and on most Macs that means forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Five paths&lt;/strong&gt; lead a secret into that file:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bash output&lt;/strong&gt; from a legitimate command. &lt;code&gt;infisical secrets get MY_TOKEN --plain&lt;/code&gt;, &lt;code&gt;gh auth token&lt;/code&gt;, &lt;code&gt;vercel token&lt;/code&gt;, &lt;code&gt;cat .env&lt;/code&gt;, &lt;code&gt;security find-generic-password -w&lt;/code&gt;, &lt;code&gt;printenv | grep TOKEN&lt;/code&gt;, &lt;code&gt;echo $SECRET&lt;/code&gt;. Anything that prints a credential to stdout, the JSONL records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Manual paste&lt;/strong&gt; by you, in the chat. You drop a token into the prompt to ask Claude to use it. The token is now part of the user-message record forever.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File read&lt;/strong&gt; through the &lt;code&gt;Read&lt;/code&gt; tool. You ask Claude to look at &lt;code&gt;.env&lt;/code&gt; for context. The file content lands in the tool-result record.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File write&lt;/strong&gt; with a hardcoded secret. You ask Claude to scaffold a config and the secret ends up in the new file content. Bonus: the same content gets duplicated under &lt;code&gt;~/.claude/file-history/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit display&lt;/strong&gt; by Claude in a reply. You ask "what's the value?", Claude prints it back. The reply is in the assistant-message record.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Five paths, one file, append-only, plaintext. No purge, no rotation, no scan.&lt;/p&gt;

&lt;p&gt;Now the pivot. The old defense, "don't paste your secrets in random places, don't run sketchy commands," was built around an attacker who attacks you. That defense still works against that attacker. The problem is that a different category showed up, and it walks around the old defense without breaking it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://rentierdigital.xyz/blog/litellm-supply-chain-attack-ai-agents-security" rel="noopener noreferrer"&gt;I traced the LiteLLM hijack to my own pip cache eight months ago&lt;/a&gt;, and the lesson was already there: a poisoned package lands with the user's privileges and starts reading what the user can read. In March 2026, the TeamPCP campaign poisoned 75 GitHub Action tags and pushed malicious payloads to 141+ npm packages through stolen CI/CD secrets. In April 2026, Check Point researchers found 33 npm packages publicly shipping &lt;code&gt;.claude/settings.local.json&lt;/code&gt; files with inline credentials. GitHub PATs, Telegram tokens, production bearer tokens, the works.&lt;/p&gt;

&lt;p&gt;Three different campaigns. Same mechanic. The attacker is not knocking on my door. He is a script running with my privileges, dropped by a dependency I never audited directly. And once that script is alive in my home directory, my 1.1 GB of Claude Code transcripts is a goldmine.&lt;/p&gt;

&lt;p&gt;GitGuardian's 2026 report puts the broader trend in numbers: AI service credential exposures detected on public GitHub jumped 81% year over year. Claude Code-assisted commits leak secrets at roughly 3.2%, against 1.5% for the public-commits baseline. AI-accelerated commits leak secrets at twice the rate. The shift is industry-wide, not just my disk.&lt;/p&gt;

&lt;p&gt;So here is the pivot, plain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discipline defends against attackers who attack you. Mechanical guardrails defend against attackers who run as you.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Even your vault leaks the moment it does its job
&lt;/h2&gt;

&lt;p&gt;I had Infisical. I had deny-rules. I had runtime injection. I still had 600 secrets in plaintext.&lt;/p&gt;

&lt;p&gt;The story is annoying because the hygiene was correct. The secret does not sleep in a &lt;code&gt;.env&lt;/code&gt;. It lives in an encrypted vault. And yet.&lt;/p&gt;

&lt;p&gt;The mechanism is the &lt;strong&gt;resolution step&lt;/strong&gt; itself. The secret leaves the vault for two seconds to do its job, and those two seconds exist somewhere: in cleartext in bash output, in cleartext in a process env, in cleartext in a manual paste. During those two seconds, Claude Code is listening. The JSONL takes notes.&lt;/p&gt;

&lt;p&gt;Watch the difference between two commands that look identical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;infisical secrets get MY_TOKEN &lt;span class="nt"&gt;--plain&lt;/span&gt;
curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;last_token.txt&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; https://api...

&lt;span class="nv"&gt;MY_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;infisical secrets get MY_TOKEN &lt;span class="nt"&gt;--plain&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$MY_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; https://api...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few characters of difference. Two opposite fates. The first one writes the token to the transcript. The second one never lets it touch stdout.&lt;/p&gt;

&lt;p&gt;A vault is a lock. A JSONL transcript is a museum. Both keep the secret. Only one keeps it on display.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4-layer defense I had to build from scratch
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fconvexrentienr.neoracines.com%2Fapi%2Fstorage%2F507bec73-b69b-45a4-9147-19f3b61068b7" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fconvexrentienr.neoracines.com%2Fapi%2Fstorage%2F507bec73-b69b-45a4-9147-19f3b61068b7" alt="TITRE &amp;quot;The Four-Layer Secret Perimeter&amp;quot; + sous-titre &amp;quot;fired in the order a secret tries to escape&amp;quot;. Metaphore : timeline horizontale type tower defense ou checkpoints arcade, un secret-token (sprite pixel) traverse de gauche a droite et tente de rejoindre le JSONL grave en bas-droite (icone parchemin). Quatre stations etagees le long du parcours : Station 1 PreToolUse (icone bouclier rouge, &amp;quot;before exec&amp;quot;), Station 2 UserPromptSubmit (icone filtre orange, &amp;quot;before transcript&amp;quot;), Station 3 SessionEnd Scrubber (icone balai bleu, &amp;quot;session over&amp;quot;), Station 4 Daily Cron (icone horloge bleu pale, &amp;quot;04:00 sweep&amp;quot;). Style : cartoon 80's-90's arcade UI mixed with Hanna-Barbera linework, halftone dots, formes rebondies, trait noir epais. Palette : alarm red #E63946, amber #F4A261, sky blue #4FC3F7, cream #FFF8E7, black #111111. Contenu : sprite-token tente de passer chaque station, des marqueurs &amp;quot;BLOCKED&amp;quot; / &amp;quot;REDACTED&amp;quot; en pop-up cartoon (style &amp;quot;POW!&amp;quot;). Une derniere station &amp;quot;MEMORY HINT&amp;quot; en gris pale, hors-perimetre, etiquetee &amp;quot;weakest layer&amp;quot;. Highlight : Station 3 SessionEnd avec glow dore et sparkle stars autour (c'est la couche qui attrape le plus). Legende : sticky note bas-gauche, &amp;quot;shield = block / broom = scrub / clock = sweep / hint = polite request&amp;quot;. Footer : © rentierdigital.xyz en bas-droite, ecriture main, petit. NOT flat corporate vector, NOT minimalist tech aesthetic, NOT generic flowchart." width="760" height="1018"&gt;&lt;/a&gt;&lt;br&gt;Four-Layer Security Perimeter for Secret Token Defense
  &lt;/p&gt;

&lt;p&gt;Order matters. The earliest layer fires before the secret touches the disk. The latest layer cleans up what slipped through. You want both. Think of it as staging a raid: each phase cuts the attack surface for the next one, and the last phase cleans the loot the boss dropped on the floor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: PreToolUse hook on Bash.&lt;/strong&gt; It intercepts the risky patterns before the command runs. &lt;code&gt;infisical secrets get --plain&lt;/code&gt; not piped, &lt;code&gt;gh auth token&lt;/code&gt; not piped, &lt;code&gt;vercel token&lt;/code&gt;, &lt;code&gt;security find-generic-password -w&lt;/code&gt; not piped, &lt;code&gt;cat .env|.envrc|.netrc|.npmrc&lt;/code&gt;, &lt;code&gt;printenv|env&lt;/code&gt; grepping for &lt;code&gt;token|secret|key|password&lt;/code&gt;, &lt;code&gt;echo $VAR_SECRET&lt;/code&gt;. The hook returns a JSON &lt;code&gt;permissionDecision: ask&lt;/code&gt; with a message that explains the safe pattern. Not a strict block. A false positive must not break the workflow, otherwise I'll disable the hook within a week and we both know it.&lt;/p&gt;
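
&lt;p&gt;A minimal sketch of that hook, registered on the Bash matcher in &lt;code&gt;settings.json&lt;/code&gt; (the input field names follow the hook docs as I read them, so check yours; the pattern list is a short excerpt of the real set):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Claude Code pipes the pending tool call as JSON on stdin
cmd=$(jq -r '.tool_input.command // empty')

# risky pattern: ask instead of letting the command print a credential to stdout
if printf '%s' "$cmd" | grep -Eq 'secrets get .*--plain|gh auth token|vercel token|cat \.env'; then
  jq -n '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "ask", permissionDecisionReason: "This prints a credential to stdout and the transcript keeps it. Capture it in a variable instead."}}'
fi
exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
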

&lt;p&gt;&lt;strong&gt;Layer 2: UserPromptSubmit hook.&lt;/strong&gt; It scans the text I am submitting before it enters the transcript. Match means &lt;code&gt;decision: block&lt;/code&gt;. The pasted secret never makes it into the JSONL. Same regex set as the other layers, plus an extra pattern for full URLs with embedded credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: SessionEnd hook + scrubber.&lt;/strong&gt; When a session ends cleanly, the hook glob-matches &lt;code&gt;~/.claude/projects/*/&amp;lt;session_id&amp;gt;.jsonl&lt;/code&gt;, scrubs the file in place, validates that every line is still valid JSON, writes atomically (tmp file plus &lt;code&gt;os.replace&lt;/code&gt;). This brings the leak window from 24 hours down to a few seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Daily cron at 04:00.&lt;/strong&gt; Net for sessions that died ungracefully. &lt;code&gt;kill -9&lt;/code&gt;, crash, power loss, anything where SessionEnd never fired. The cron walks the same paths and does the same scrub. Belt and suspenders.&lt;/p&gt;
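
&lt;p&gt;The net itself is one crontab line (the script path is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# crontab -e
# 04:00 sweep over sessions that never fired SessionEnd
0 4 * * * /usr/bin/python3 "$HOME"/bin/claude_scrub.py --all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
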

&lt;p&gt;A behavior rule in Claude's memory ("don't print secrets to stdout") is layer zero, the weakest one. I keep it as a polite hint. The 600-secret audit is enough proof that the model's discipline eventually folds.&lt;/p&gt;

&lt;p&gt;A few design choices that matter to anyone who wants to copy this:&lt;/p&gt;

&lt;p&gt;The scrubber is Python 3, stdlib only. Zero dependencies. &lt;code&gt;/usr/bin/python3&lt;/code&gt; never breaks during a Homebrew upgrade. Pattern matching is anchored on known prefixes: &lt;code&gt;sk-ant-api&lt;/code&gt;, &lt;code&gt;ghp_&lt;/code&gt;, &lt;code&gt;github_pat_&lt;/code&gt;, &lt;code&gt;AKIA&lt;/code&gt;, &lt;code&gt;gho_&lt;/code&gt;. No generic "20 alphanumeric characters" regex, that would generate false positives on every UUID, hash, and base64 chunk in the file.&lt;/p&gt;

&lt;p&gt;Replacement is deterministic with a sha256 fingerprint: &lt;code&gt;[REDACTED-GITHUB_PAT_CLASSIC-a1b2c3d4]&lt;/code&gt;. Same secret in two places redacts to the same fingerprint. I can track duplicates without ever storing the value itself. JSON validity is preserved, the substitution happens inside the existing string field, the structure stays intact.&lt;/p&gt;
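
&lt;p&gt;The fingerprint scheme, reduced to a shell one-liner just to show the idea (the scrubber itself stays in Python, stdlib only):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# same secret, same fingerprint, the value itself is never stored
fp=$(printf '%s' "$SECRET" | shasum -a 256 | cut -c1-8)
echo "[REDACTED-GITHUB_PAT_CLASSIC-$fp]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
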

&lt;p&gt;No pre-scrub backups. A backup file is just another copy of the secret, and a thief reads it as easily as the original.&lt;/p&gt;

&lt;p&gt;I caught the very first signal of this whole rabbit hole &lt;a href="https://rentierdigital.xyz/blog/claude-code-security-secrets-disk" rel="noopener noreferrer"&gt;a month ago when Claude Code refused to copy my own secrets during a folder move&lt;/a&gt;. &lt;code&gt;settings.local.json&lt;/code&gt; showed up with credentials in plaintext, in a place I had never thought to look. The audit I'm describing here started because I refused to believe that was the only place.&lt;/p&gt;

&lt;p&gt;The full code is on GitHub.&lt;/p&gt;

&lt;p&gt;The scrubber does not need to be careful. It runs. The hook does not need to be smart. It blocks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What still leaks, and the new rule I live by
&lt;/h2&gt;

&lt;p&gt;The defense is a perimeter, not a wall. Honest list of what slips through:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets without an identifiable prefix.&lt;/strong&gt; URLs of the form &lt;code&gt;https://user:pass@host&lt;/code&gt;. Arbitrary passwords. UUID v4 used as bearer tokens. Anchored patterns mean good precision but partial coverage. I accept that tradeoff because the alternative is a regex that flags every base64 string and gets disabled within a day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bash obfuscation.&lt;/strong&gt; &lt;code&gt;eval $(echo "infisical secrets get X --plain")&lt;/code&gt;, custom aliases that wrap the dangerous command, anything indirect. Layer 1 catches the direct patterns. It does not catch a determined obfuscator (which, to be fair, is mostly me trying to prove the hook wrong).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope is strictly Claude.&lt;/strong&gt; &lt;code&gt;~/.claude&lt;/code&gt;, &lt;code&gt;~/.zsh_history&lt;/code&gt;, &lt;code&gt;~/.bash_history&lt;/code&gt;. Out of scope: &lt;code&gt;~/.aws/credentials&lt;/code&gt;, &lt;code&gt;~/.ssh/id_*&lt;/code&gt;, Keychain via &lt;code&gt;security&lt;/code&gt;, env vars in running processes, browser cookies. Other surfaces are other projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives on &lt;code&gt;JWT_LIKE&lt;/code&gt;.&lt;/strong&gt; Supabase publishable keys get redacted unnecessarily. I'd rather lose a few public keys to redaction than miss a real one.&lt;/p&gt;

&lt;p&gt;So here is the new rule.&lt;/p&gt;

&lt;p&gt;The old rule was "be careful, don't click on random stuff." It assumed a remote or human attacker. Still valid for that category, and I'm not throwing it out.&lt;/p&gt;

&lt;p&gt;The new rule is different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nothing secret survives in cleartext on this disk.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not from paranoia. From pragmatism. Tomorrow a script will run with my privileges, dropped by a dependency three layers deep that I never reviewed by hand, and the only defense that holds is mechanical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six months from now
&lt;/h2&gt;

&lt;p&gt;A vendor will announce "massive credential leak via AI assistant transcripts" and everyone will look surprised. There will be a corporate post with vague blame on a third-party storage layer. There will be threads explaining you obviously should have toggled that one setting. Everyone will say it was an isolated case.&lt;/p&gt;

&lt;p&gt;Meanwhile some of us are auditing our disks. Hacking together hooks. Reading the source when it leaks. Building nets. Not from paranoia. From pragmatism.&lt;/p&gt;

&lt;p&gt;The old rule assumed a human on the other side. That stopped being the default a while ago.&lt;/p&gt;

&lt;p&gt;The next layer of defense is teaching our AIs to behave. Our new digital children, no?&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/50014" rel="noopener noreferrer"&gt;GitHub issue #50014 — Secret scrubbing for session logs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2026/04/01/claude_code_source_leak_privacy_nightmare/" rel="noopener noreferrer"&gt;The Register — Claude Code's source reveals extent of system access&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders" rel="noopener noreferrer"&gt;VentureBeat — GitGuardian State of Secrets Sprawl 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://securitybrief.asia/story/claude-code-can-leak-secrets-in-public-npm-packages" rel="noopener noreferrer"&gt;SecurityBrief — Claude Code can leak secrets in public npm packages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@dan.avila7/supply-chain-guard-an-agent-skill-that-audits-your-code-for-compromised-dependencies-9be39c7edcbb" rel="noopener noreferrer"&gt;Daniel Avila — Supply Chain Guard / TeamPCP campaign&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>cybersecurity</category>
      <category>claudecode</category>
      <category>developertools</category>
    </item>
    <item>
      <title>I Stopped Building AI Workflows. I Started Building a Moat. Claude Code Did the Work.</title>
      <dc:creator>Phil Rentier Digital</dc:creator>
      <pubDate>Tue, 28 Apr 2026 13:41:10 +0000</pubDate>
      <link>https://dev.to/rentierdigital/i-stopped-building-ai-workflows-i-started-building-a-moat-claude-code-did-the-work-3clm</link>
      <guid>https://dev.to/rentierdigital/i-stopped-building-ai-workflows-i-started-building-a-moat-claude-code-did-the-work-3clm</guid>
      <description>&lt;p&gt;11:42 PM, walking back from dinner, quick glance at my mail. Damn. Infisical down. As I'm starting to curse, seven minutes later, second mail. "[INFRA] Infisical is back up."&lt;/p&gt;

&lt;p&gt;That's when it clicked. I'd built a &lt;em&gt;Stack That Lives&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR.&lt;/strong&gt; Everyone is panicking that AI is going to torch their job. Meanwhile, a handful of builders who think in systems are turning the same technology into a decisive advantage. Not a workflow. Not an assistant. A stack that lives, that repairs itself, that enriches itself while they sleep. Six months of catch-up for anyone trying to copy.&lt;/p&gt;

&lt;p&gt;Six months of coding every day with Claude Code, and I'd ended up with something that isn't a workflow anymore. Not an improved n8n setup. Not an assistant that writes code for me. Not another agent demo. I'd built something more complex. It repairs itself. It enriches itself. It learns my patterns. And that thing? Nobody can copy it in a weekend.&lt;/p&gt;

&lt;p&gt;I slept fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Time I Saw It Self-Heal
&lt;/h2&gt;

&lt;p&gt;The monitor caught the timeout. A trigger I'd wired up months earlier fired off, opened a Claude Code session with the failing service's logs as context, and the agent took it from there. Read the logs. Spotted a stuck-but-not-crashed container. Restarted it. Confirmed green. Dropped the resolution email.&lt;/p&gt;
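
&lt;p&gt;Reduced to a sketch, the trigger is the same DIY pattern as my cron posts (the healthcheck URL, container name, and prompt file are placeholders; the real version carries more guardrails):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# healthcheck fails: hand the recent logs to a Claude Code session and let it investigate
set -euo pipefail

if ! curl -sf --max-time 10 https://secrets.internal.example/healthz; then
  docker logs --tail 200 infisical | claude -p "$(cat infra_incident_prompt.txt)"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
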

&lt;p&gt;[IMAGE: Outlook screenshot showing two stacked emails, "[INFRA] Infisical DOWN" at 11:35 PM and "[INFRA] Infisical is back up" at 11:42 PM, datestamp Yesterday]&lt;/p&gt;

&lt;p&gt;Six months ago, this same incident would have killed my evening. Coffee. Panic. An hour figuring out why the secret manager is unhappy, twenty minutes more figuring out which container is the actual culprit, ten of regret-typing the docker restart command into the wrong terminal session.&lt;/p&gt;

&lt;p&gt;It kept happening after that. Same pattern. Trigger fires, agent investigates, fix gets applied, I read about it later. Once you see this run a few times, you stop calling what you have a "setup."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built in 6 Months
&lt;/h2&gt;

&lt;p&gt;I didn't sit down on day one and architect this. I just kept building stuff for my ecommerce setup, day after day, with Claude Code as the only IDE I open.&lt;/p&gt;

&lt;p&gt;The current surface area:&lt;/p&gt;

&lt;p&gt;A product catalog ingestion pipeline that pulls from a distributor's CSV feed every morning, normalizes the mess (different vendors, different field names, prices in three currencies, weights in two units, and one supplier who somehow still uses Windows-1252 encoding in 2026), and pushes the cleaned rows into WooCommerce.&lt;/p&gt;

&lt;p&gt;Competitor price scrapers, half a dozen of them, each tracking a specific subset of SKUs across rival storefronts. They handle the WAFs, age out stale data, and feed a dashboard I actually look at.&lt;/p&gt;

&lt;p&gt;Social content generation for Threads and Instagram, tied to product drops. The system pulls each new SKU, drafts copy variants, generates the promo video, and queues everything in a partner API for scheduling.&lt;/p&gt;

&lt;p&gt;Trend dashboards. Inventory monitoring. Order pipeline integration. Partner API webhooks. Transcription on supplier calls (yes, I record them, with consent, calm down). And the infra layer underneath all of it: Docker, reverse proxies, secret rotation, backup jobs, alerting.&lt;/p&gt;

&lt;p&gt;All of it on a few VPS. All of it built incrementally. All of it talking to Claude Code every day.&lt;/p&gt;

&lt;p&gt;Not long ago it broke every other day. Today it works nine days out of ten.&lt;/p&gt;

&lt;p&gt;I don't run any of this manually anymore. I don't even open most of the dashboards. I just see the digests come in, glance, and either ignore them or push a screenshot at Claude Code if something looks weird.&lt;/p&gt;

&lt;p&gt;A workflow doesn't do this. A workflow sits there until you trigger it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Moat, Not a Workflow
&lt;/h2&gt;

&lt;p&gt;There's a sharp piece on Medium from February titled "AI Killed the Feature Moat." The argument: in 2026, anyone with Cursor and a weekend can clone your features. The moats that survive aren't functional. They're things like SEO, brand, taste, speed, data, trust. Six categories. And the piece names three properties they all share: time dependency, experience dependency, resistance to replication.&lt;/p&gt;

&lt;p&gt;That's the business-level frame. I want to steal those three properties and zoom them down to the operator.&lt;/p&gt;

&lt;p&gt;What I built isn't a &lt;em&gt;business&lt;/em&gt; moat. It doesn't protect me from competitors who want my customers. It's a &lt;em&gt;personal&lt;/em&gt; moat. It protects my ability to ship more, faster, with less stress than the version of me from six months ago. The asymmetry is internal, not external. For a solo, that's the only asymmetry that matters day to day.&lt;/p&gt;

&lt;p&gt;Three properties make a personal stack a moat instead of a workflow. I call this whole thing the &lt;em&gt;Stack That Lives&lt;/em&gt;. STL. Yes, I know, I am bad at naming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Co-evolving.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the part I didn't expect.&lt;/p&gt;

&lt;p&gt;Three months in, I noticed Claude Code was suggesting fixes that matched my style without me prompting for it. A naming convention I'd set in some throwaway commit weeks earlier kept showing up in new code. A specific way I handle errors (return early, log structured, never throw silently) was being reproduced spontaneously. I hadn't put it in a CLAUDE.md. It was just in the codebase, in the patterns, in the commit history. The model picked it up by being there.&lt;/p&gt;

&lt;p&gt;After enough months of this, the suggestions stop feeling like generic AI output. They start feeling like an extension of your own decision history. Not magic. Just gradient descent over your own choices, accumulated.&lt;/p&gt;

&lt;p&gt;That co-evolution works better with simple, observable tools. One reason I went CLI-first instead of standing up a forest of MCP servers: CLIs leave a trail. Every command in the shell history. Every flag in the commit. The agent sees what worked yesterday and chains the same patterns tomorrow. I went deep on &lt;a href="https://rentierdigital.xyz/blog/why-clis-beat-mcp-for-ai-agents-and-how-to-build-your-own-cli-army" rel="noopener noreferrer"&gt;why CLIs beat MCP for agents&lt;/a&gt; elsewhere. Short version: simple tools that compose beat shiny tools that abstract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-compounding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every scraper run adds rows to a private database. Every competitor price-check enriches my view of the market. Every product I ingest, every social post that ships, every supplier call I transcribe, all of it stacks. Six months in, I have a corpus of decisions, snapshots, and patterns that doesn't exist anywhere else.&lt;/p&gt;

&lt;p&gt;This is the part nobody can rebuild in a weekend.&lt;/p&gt;

&lt;p&gt;The architecture? Sure. Anyone with three months and a credit card can clone the architecture. Docker, scrapers, Claude Code in a loop, monitoring stack. None of it is secret. The blog posts exist. The repos exist.&lt;/p&gt;

&lt;p&gt;But the data is mine. Six months of disciplined scraping on my verticals. Six months of product-launch outcomes tied to specific copy variants. Six months of which suppliers ship clean and which ones embed garbage in their feeds. Six months of raw material, accumulating at one second per second, no matter how rich your competitor is.&lt;/p&gt;

&lt;p&gt;The architecture is the cup. The data is what's in the cup. Buying a fancier cup doesn't catch you up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-healing.&lt;/strong&gt; Two flavors.&lt;/p&gt;

&lt;p&gt;Flavor one is full auto. Section 1 already happened, you read it. Monitoring catches a fault, Claude Code investigates, fix gets applied, I get the email. No drama.&lt;/p&gt;

&lt;p&gt;Flavor two is minimal human trigger. I push a screenshot of an error to a Claude Code session, three lines of context, and it goes. Reads logs, walks the dependency tree, proposes a fix or applies one. In some cases it'll provision a bit of VPS resource if the issue is capacity-related. (Not routine for me yet. I'm careful with the "apply" button on infra changes.)&lt;/p&gt;

&lt;p&gt;[IMAGE: A Claude Code session with an error screenshot pasted in, showing the agent's debug output and proposed fix]&lt;/p&gt;

&lt;p&gt;Hector Flores demoed an enterprise self-healing setup earlier this year, Azure MCP plus agentic AI, and reported 70% of production incidents resolved without human touch. That's a fully staffed environment with platform engineers and SRE on call. I'm one guy with a few VPS and a Claude Code subscription. I'm not at 70% on every incident class. But on the kinds of failures that used to kill my weekends (stuck containers, expired tokens, broken cron jobs, full disks), I'm pretty close. Different setup, similar curve.&lt;/p&gt;

&lt;p&gt;None of this works without a test layer the agent can run before it ships a fix. Every module the system writes generates its own e2e and unit tests on the way in. When the agent proposes a patch at 11 PM, it doesn't just apply it. It runs the tests first. If they fail, it iterates. If they pass, it ships. Without that loop, self-healing is just self-coping.&lt;/p&gt;

&lt;p&gt;The contract layer underneath all this is what makes self-healing reliable instead of self-corrupting. I went deep on &lt;a href="https://rentierdigital.xyz/blog/i-stopped-vibe-coding-and-started-prompt-contracts-claude-code-went-from-gambling-to-shipping" rel="noopener noreferrer"&gt;the contract-based approach I now wrap around every Claude Code session&lt;/a&gt; earlier. Without it, you're letting an LLM rewrite your prod infra at 11 PM. Yikes.&lt;/p&gt;

&lt;p&gt;Three properties. Time dependency, experience dependency, resistance to replication. The original frame was SaaS moats. They map just as cleanly to a single operator with a few servers and a subscription.&lt;/p&gt;

&lt;p&gt;That's the difference between a workflow and a Stack That Lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Operator to Driver
&lt;/h2&gt;

&lt;p&gt;A metaphor for non-devs reading this.&lt;/p&gt;

&lt;p&gt;Walking. That's ChatGPT in 2023. You are the operating system. You type everything. Every thought, every prompt, every reformulation. The model sits there waiting. You move the world by moving your fingers.&lt;/p&gt;

&lt;p&gt;Biking. That's where most people are now. You add an agent. You feed it inputs, it produces outputs, you stitch them together by hand. Faster than walking. Still very much you doing the steering and the pedaling.&lt;/p&gt;

&lt;p&gt;Driving. That's where I am after six months. You build the car around yourself. The infra is the chassis. The agents are the engine. The data is the fuel. You have a trunk (the assets you've accumulated). You have passenger seats (sub-agents that handle specific tasks). And critically, you have cruise control. You can take your hands off the wheel for stretches. Not to disengage. To go further with less of you in the loop.&lt;/p&gt;

&lt;p&gt;Karpathy popularized a different metaphor in mid-2025. LLM is a CPU, context window is RAM, you are the OS. Technically clean. I've cited it myself. But it casts the builder as the nervous system. You as the OS means you can never sleep.&lt;/p&gt;

&lt;p&gt;The car metaphor opens up a different question: what happens when you stop driving? Park it. Turn it off. Go to bed. With the OS framing, you can't. With the car framing, you have to ask whether the system runs without you.&lt;/p&gt;

&lt;p&gt;That's the maturity test. Not "how clever is your prompt." Not "how many MCP servers do you have." It's: what happens at 11:42 PM when something breaks and you're at the dinner table?&lt;/p&gt;

&lt;p&gt;In February 2026, Karpathy himself shifted the framing. He posted that vibe coding was passé and proposed &lt;em&gt;agentic engineering&lt;/em&gt; instead, defined roughly as orchestrating agents who do the code rather than typing it directly yourself. That's a real shift. Vibe coding was about typing fast and trusting the model. Agentic engineering is about building scaffolds around the model so it can run jobs.&lt;/p&gt;

&lt;p&gt;I'd add one more layer on top. Vibe coding (Feb 2025) → agentic engineering (Feb 2026) → infrastructure-level stacks that survive your absence (now). Different concerns, stacked. Vibe coding gets you a feature. Agentic engineering gets you a project. The Stack That Lives gets you a Tuesday night where the system handles its own outages while you finish dinner.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Dies in This Transition
&lt;/h2&gt;

&lt;p&gt;Time to be honest about what you give up.&lt;/p&gt;

&lt;p&gt;The big one: exhaustive code comprehension.&lt;/p&gt;

&lt;p&gt;Six months of incremental Claude Code work means there are corners of my system I no longer read line by line. I know what they do. I don't always remember exactly &lt;em&gt;how&lt;/em&gt;. The function signature is familiar, the test passes, the logs look right, but if you asked me to whiteboard the implementation from memory, I'd fumble. Some files I've literally never opened by hand.&lt;/p&gt;

&lt;p&gt;This used to scare me. Now I treat it as the price of admission.&lt;/p&gt;

&lt;p&gt;You're trading minute-by-minute control for time. You're trading local readability for global temporal asymmetry. For a solo, that trade is generally positive. You don't actually need to remember every line. You need the system to keep running while you sleep. For a team, it's harder. Shared understanding erodes when nobody fully owns the code, and that erosion shows up in onboarding pain six months later. (If you lead an engineering team and that paragraph made you twitch, you're not wrong. Different tradeoff. Different game.)&lt;/p&gt;

&lt;p&gt;There's a smaller caveat: platform risk. The whole stack runs on &lt;a href="https://www.hostg.xyz/SHHc5" rel="noopener noreferrer"&gt;Hostinger VPS instances&lt;/a&gt;. Solid for me, but like every cloud thing, somebody else's computer. If Hostinger triples their pricing tomorrow, or rewrites their TOS to ban scrapers, or just has a bad week, the system shakes. The defense is data portability. Keep the infra layer thin and the logic layer fat. If the infra goes hostile, you migrate the chassis. The engine and the data ride along.&lt;/p&gt;

&lt;p&gt;That's the trade. Less control, more time. Less local readability, more compounding asymmetry. Your competitor can copy the architecture in three to four months. Your competitor cannot copy six months of your specific data and your specific decision history. Not in a weekend. Not in a quarter. And if you keep going, not in a year either.&lt;/p&gt;

&lt;p&gt;The trade is the moat.&lt;/p&gt;




&lt;p&gt;Six months ago, an outage at 11:42 PM would have destroyed my evening.&lt;/p&gt;

&lt;p&gt;Tonight, dinner.&lt;/p&gt;

&lt;p&gt;The Stack That Lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://medium.com/@cenrunzhe/ai-killed-the-feature-moat-heres-what-actually-defends-your-saas-company-in-2026-9a5d3d20973b" rel="noopener noreferrer"&gt;AI Killed the Feature Moat. Here's What Actually Defends Your SaaS Company in 2026&lt;/a&gt;. Steven Cen, Medium, Feb 2026.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://thenewstack.io/vibe-coding-is-passe/" rel="noopener noreferrer"&gt;Vibe coding is passé. Karpathy has a new name for the future of software&lt;/a&gt;. The New Stack, Feb 2026.&lt;/li&gt;
&lt;li&gt;Hector Flores, &lt;a href="https://earezki.com/ai-news/2026-02-23-self-healing-infrastructure-with-agentic-ai-from-monitoring-to-autonomous-resolution/" rel="noopener noreferrer"&gt;Self-Healing Infrastructure with Agentic AI&lt;/a&gt;. Feb 2026.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This article may contain affiliate links. I may earn a small commission if you purchase through them. It doesn't change anything for you, the price is the same, and it helps support my work.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claudecode</category>
      <category>aiagents</category>
    </item>
  </channel>
</rss>
