<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stanislav Kremeň</title>
    <description>The latest articles on DEV Community by Stanislav Kremeň (@stkremen).</description>
    <link>https://dev.to/stkremen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3964941%2Ff626e8f5-89da-43a5-bf81-05508fc5fe52.jpg</url>
      <title>DEV Community: Stanislav Kremeň</title>
      <link>https://dev.to/stkremen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stkremen"/>
    <language>en</language>
    <item>
      <title>I'm Building a Code Security Analyzer. A Security Tool Found a Critical In It.</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Tue, 23 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/im-building-a-code-security-analyzer-a-security-tool-found-a-critical-in-it-4b77</link>
      <guid>https://dev.to/stkremen/im-building-a-code-security-analyzer-a-security-tool-found-a-critical-in-it-4b77</guid>
      <description>&lt;p&gt;I'm building a tool that's supposed to help check code. I call it vibeanalyzer for now. The idea is simple: a lot of us vibe-code — we let an agent write the code, it writes it, it looks clean, the tests pass — and we have no real idea what we just let into the project. Someone close to me put it perfectly: "I don't know what's inside, but I want it to work on the outside."&lt;/p&gt;

&lt;p&gt;Before I went all in on the analyzer, I finally did the thing I should have done long ago: check whether someone had already solved this better than me. They had. It's called Semgrep — and the first thing it did was find a Critical vulnerability in my own security analyzer.&lt;/p&gt;

&lt;p&gt;This post is about that irony, and what follows from it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confession up front: I didn't know the standard tools
&lt;/h2&gt;

&lt;p&gt;I'm self-taught. No CS degree, no senior leaning over my shoulder, I learn from discussions and my own mistakes. So I set off building a code analyzer without really knowing that established tools have been doing this for years — Semgrep, CodeQL, SonarQube, Snyk.&lt;/p&gt;

&lt;p&gt;That's the part you'll probably call out in the comments first, so I'll say it myself: yes, I started building a solution without checking what already exists. So I finally did. I ran Semgrep on the vibeanalyzer repo — on the tool that's supposed to be the one guarding security.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it found
&lt;/h2&gt;

&lt;p&gt;In a few minutes:&lt;/p&gt;

&lt;p&gt;One cosmetic issue. SHA-1 used to generate a directory name — which in my case isn't a security hole (it's not a password or a signature, just a hash choice in a non-critical spot), but it's exactly the kind of thing I'd never spot myself.&lt;/p&gt;

&lt;p&gt;And then six findings in dependencies. This is where I sat up, because these weren't cosmetic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Critical&lt;/strong&gt; in &lt;code&gt;vitest&lt;/code&gt; — a path traversal that, with the UI server running, lets someone read, write, and execute files outside the project. And this is a direct dependency, one I pulled in myself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two High&lt;/strong&gt; — in &lt;code&gt;esbuild&lt;/code&gt; (downloading a binary with no integrity check) and in &lt;code&gt;vite&lt;/code&gt; (bypassing the guard that's supposed to stop &lt;code&gt;.env&lt;/code&gt; and certificate files from being served).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three Medium&lt;/strong&gt; — variants of path traversal and sensitive-data exposure across &lt;code&gt;vite&lt;/code&gt;, &lt;code&gt;esbuild&lt;/code&gt;, and &lt;code&gt;launch-editor&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have to be fair about how serious this actually is. Most of these are &lt;em&gt;transitive&lt;/em&gt; — dependencies of my dependencies, not something I installed directly. They all sit in the dev toolchain, so they mostly threaten a running dev server, not necessarily production. And the real-world exploit probability (EPSS) is low across the board — one percent or less. This isn't a burning house.&lt;/p&gt;

&lt;p&gt;But that &lt;code&gt;vitest&lt;/code&gt; one is &lt;em&gt;Direct&lt;/em&gt; and &lt;em&gt;Critical&lt;/em&gt;. That's something I dragged into the project myself, directly, and I'd have had no idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable question
&lt;/h2&gt;

&lt;p&gt;I'll ask it for you, because I know it's coming: &lt;em&gt;How can a security-checking tool have a Critical vulnerability in itself?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer is uncomfortable, and it's the whole point of the project. The finding isn't in my logic — it's in the dependencies. My code can be written as honestly as I can manage and the supply chain still gets me, because the risk isn't carried by me but by what I build on. And if I didn't catch this — someone who's literally building a security tool and actively cares about the problem — what chance does a vibecoder have who just had an agent build an app and shipped it?&lt;/p&gt;

&lt;p&gt;None. That's the reason the project makes sense at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Semgrep can't do — and why I'm building anyway
&lt;/h2&gt;

&lt;p&gt;I could stop here with "use Semgrep and forget your own tool." And honestly, for supply chain and known patterns of dangerous code, that holds — it does it better, deterministically, and for free. I'm not trying to beat it at its own game, and you shouldn't trust me if I claimed I was.&lt;/p&gt;

&lt;p&gt;But I noticed something in what Semgrep &lt;em&gt;didn't&lt;/em&gt; say. It didn't tell me whether the code does what my project actually &lt;em&gt;intended&lt;/em&gt;. It doesn't know the boundaries I set for the project. It can't tell that some function runs fine and is safe, but solves a problem that shouldn't be there at all. Semgrep knows patterns of danger. It doesn't know &lt;em&gt;intent&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That blind spot — the gap between what the code does and what it was meant to do — is what this whole series is about. I call it the intent gap. (Yes, the term gets used elsewhere, in marketing and UX; I mean it in my own narrow sense: the distance between code and its intended purpose.)&lt;/p&gt;

&lt;p&gt;And that's exactly what I'm building vibeanalyzer on. The first thing it does is load — or ask the user to create — the project's intent and its non-goals: the guardrails that shouldn't be crossed. So that when the AI evaluates the code, it knows what the goal actually is and what's already out of bounds. That's a gap static analyzers don't fill, because they don't read intent, the idea behind the code. And it's the only piece I dare say is worth building next to the tools that already exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I am now
&lt;/h2&gt;

&lt;p&gt;I've got intent and non-goal loading done. I've got a starting machine analysis of TypeScript and a basic markdown output with a mermaid graph of the folder structure. Ahead of me: the machine security layer (which, by the way, Semgrep currently does better than I do — that needs saying out loud), the skeleton of the AI layer, and the AI logic itself, which is the whole point and also the most uncertain part. I have no idea whether AI can meaningfully evaluate code against intent, or whether it'll just be a flood of false alarms. I tried it on websites, but I've yet to see how it works on a repository — and I'll write about it even if it doesn't pan out.&lt;/p&gt;

&lt;p&gt;This series is about building in the open. Don't expect a finished product or guarantees. Expect notes from someone who finally ran the standard tools, took a hit from his own project, and decided to keep going — because the gap he wants to fill is even more obvious after today.&lt;/p&gt;

&lt;p&gt;In the next part I'll dig into what the machine analysis of my TS code actually produces — and where it starts to diverge from what I need the AI layer to do. That's where the intent gap stops being a nice phrase and has to become working code, or fall apart trying.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;By the way: after writing this, I bumped that vitest version — and it cleared all six dependency findings at once, because the others hung off it transitively. Semgrep now reports zero. Which doesn't mean the project is "safe" — it means those six known CVEs are gone. Unknown holes and my own logic bugs are still invisible to it. That's the whole reason I'm still building. And yes: I wrote a post about a hole I should have patched long ago. That's vibecoding in a nutshell — even in a tool that's supposed to police it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claudecode</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>The Habit That Stops Your AI From Quietly Wrecking Your Plan</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Thu, 18 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/the-one-line-that-stops-your-ai-from-wrecking-your-plan-4jc0</link>
      <guid>https://dev.to/stkremen/the-one-line-that-stops-your-ai-from-wrecking-your-plan-4jc0</guid>
      <description>&lt;p&gt;This is the third and final part of a little series. In part 1 I worked out the workflow: let the AI build the plan &lt;em&gt;with&lt;/em&gt; you instead of writing it alone. In part 2 I shared the actual prompts that make an agent plan with you. This one is about the single habit that made the biggest difference of all — and the one I resisted the longest.&lt;/p&gt;

&lt;p&gt;It's writing down what the app should &lt;strong&gt;not&lt;/strong&gt; do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I ignored this for so long
&lt;/h2&gt;

&lt;p&gt;For a long time my instinct was simple: tell the AI what to build and how to work, and that's it. Describing what &lt;em&gt;not&lt;/em&gt; to build felt pointless. Why spend words on features I'm not even making?&lt;/p&gt;

&lt;p&gt;Then I noticed a pattern. I'd agree on a clean, small plan with the agent. Then a few prompts later, while building something unrelated, it would casually add an admin panel I never asked for. Or wire up a payment flow "to be helpful." Or refactor a simple feature into a multi-tenant architecture. Every time, I'd lose an afternoon pulling out work I never wanted.&lt;/p&gt;

&lt;p&gt;The agent wasn't broken. I'd just left the door open. An AI fills empty space with assumptions — and an unspoken boundary is empty space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Non-goals are guardrails, not documentation
&lt;/h2&gt;

&lt;p&gt;Here's the reframe that fixed it for me: a non-goal isn't a note for humans. It's a guardrail for &lt;em&gt;future prompts&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;"No payments in v1" isn't there to remind you of anything — you already know. It's there so that three days from now, when you ask the agent to "improve the checkout screen," it doesn't quietly decide that checkout needs Stripe integration. The line holds the boundary even when you forget to.&lt;/p&gt;

&lt;p&gt;That's why the moment you delete a non-goal, the idea comes back. The agent rediscovers it, thinks it's helping, and spends your time on it. The boundary only works while it's written down.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to generate them during planning
&lt;/h2&gt;

&lt;p&gt;The good news: you don't have to think these up yourself. Ask the agent to produce them while you're still planning. Two prompts do the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Surface them:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on this plan, list everything you might be tempted to add that I have NOT
asked for — extra features, extra structure, extra integrations. For each one,
tell me whether we should build it now or explicitly leave it out of this version.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is revealing. The agent will list exactly the things it would have silently added — auth systems, dashboards, caching layers — and now you get to say "not yet" &lt;em&gt;before&lt;/em&gt; it happens instead of after.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lock them in:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Take everything we agreed to leave out and write it as a short list of non-goals.
Phrase each as a clear rule, like "Do not add X in this version." I'm going to
keep these in front of you for the rest of the project.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you do exactly that — keep the list somewhere the agent reads every time (your project instructions, your rules file, whatever your tool uses), so the boundary travels with every future prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to drop them
&lt;/h2&gt;

&lt;p&gt;People ask whether non-goals are forever. They're not. Keep each one until the thing actually ships, or until you genuinely decide to build it. A non-goal isn't a permanent ban — it's a "not now" that protects your current scope. When "not now" becomes "now," you delete the line on purpose, not by accident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The whole journey, in three lines
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Don't write the plan alone — run a conversation that produces it.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use prompts that make the agent interview you and argue with you, not just obey.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write down what &lt;em&gt;not&lt;/em&gt; to build, and keep it in front of the agent so it holds.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Looking back, this was never really about planning. It's the same thing I keep running into everywhere I work with an agent: the hard part isn't the code, it's deciding what the agent gets to know — and what it doesn't. A plan is that decision at the start. Non-goals are that decision about the edges. I've got a few more of these fights to write up. They're all the same fight.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building with mini, Part 2/9: Shaping the vision with project</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Wed, 17 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/building-with-mini-part-2-shaping-the-vision-with-project-848</link>
      <guid>https://dev.to/stkremen/building-with-mini-part-2-shaping-the-vision-with-project-848</guid>
      <description>&lt;p&gt;In Part 1 we ran &lt;code&gt;mini init&lt;/code&gt; and got &lt;strong&gt;pycalc&lt;/strong&gt; — a small calculator for the terminal. &lt;code&gt;init&lt;/code&gt; asked only four questions, and &lt;code&gt;.mini/project.md&lt;/code&gt; is accordingly thin: a name, one line about &lt;em&gt;what&lt;/em&gt;, &lt;em&gt;who it's for&lt;/em&gt;, and the constraints ("Python 3, standard library only").&lt;/p&gt;

&lt;p&gt;That's enough to start, but not enough to steer. Three things are missing: &lt;strong&gt;how&lt;/strong&gt; we'll build it, &lt;strong&gt;what we deliberately won't build&lt;/strong&gt;, and &lt;strong&gt;how we'll know it's done&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's exactly what &lt;code&gt;mini project&lt;/code&gt; adds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the command does (and doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;mini project&lt;/code&gt; opens an interactive Claude Code session — but not to write code. It's a &lt;strong&gt;plan-before-code interview&lt;/strong&gt;. Claude gets your current &lt;code&gt;project.md&lt;/code&gt; as the starting point (it won't ask what the project is about again) and walks you through four stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frame &amp;amp; remove assumptions&lt;/strong&gt; — questions about the users, the core workflow, the data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rough plan &amp;amp; weighing decisions&lt;/strong&gt; — for every choice the pros, the cons and an alternative, plus the question "why this over the simplest possible version?". This becomes the &lt;strong&gt;Approach&lt;/strong&gt; section.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-goals &amp;amp; guardrails&lt;/strong&gt; — everything you agreed to leave out, phrased as rules ("Do not add X in this version."). On top of that, Claude lists what &lt;em&gt;it&lt;/em&gt; would be tempted to add — and recommends build-now or leave-out for each.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Success criteria &amp;amp; final check&lt;/strong&gt; — the full draft of &lt;code&gt;project.md&lt;/code&gt; is shown for approval; only then does it get saved.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One trait, faithful to the philosophy from Part 0: the interview is instructed to be &lt;strong&gt;critical, not agreeable&lt;/strong&gt;. A weak idea doesn't get nodded through — you're told why it's weak.&lt;/p&gt;

&lt;p&gt;And what the command &lt;em&gt;doesn't&lt;/em&gt; do: it never touches &lt;code&gt;.mini/state.json&lt;/code&gt;. Phase state stays in tested TypeScript; &lt;code&gt;project&lt;/code&gt; rewrites one markdown page. The finished document isn't saved by a hallucination either — at the end the agent runs &lt;code&gt;mini project --apply&lt;/code&gt;, and the contract goes through the same parser and renderer as &lt;code&gt;init&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  pycalc gets interviewed
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;pycalc
mini project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mini first prints the exact prompt it will send to Claude (no magic — you see everything) and asks whether to start the session. What follows is from a real session, condensed; the answers are mine.&lt;/p&gt;

&lt;p&gt;Claude opened by reflecting the project back and refusing to assume anything:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Terminal calculator" can mean three quite different things, and each leads to a different design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. How is it used?&lt;/strong&gt; One-shot (&lt;code&gt;pycalc "2+3*4"&lt;/code&gt; → prints &lt;code&gt;14&lt;/code&gt; and exits), an interactive REPL, or both? This is the most important decision — it determines the whole structure. And careful: for the one-shot case people already use &lt;code&gt;python -c "print(2+3*4)"&lt;/code&gt; or &lt;code&gt;bc&lt;/code&gt;. What makes pycalc different, so that it has a reason to exist? That's not nitpicking — it's the question the "What I'm building" line has to answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How much math?&lt;/strong&gt; Just &lt;code&gt;+ - * / ( )&lt;/code&gt; with precedence? Powers, modulo? Functions like &lt;code&gt;sqrt&lt;/code&gt;, &lt;code&gt;sin&lt;/code&gt;? Variables? Every extra level means a more complex parser, and variables only make sense in a REPL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How will you evaluate expressions?&lt;/strong&gt; Fair warning: the tempting shortcut is &lt;code&gt;eval()&lt;/code&gt;. It's one line, but it's a security hole (it evaluates arbitrary Python) and you learn nothing from it. Alternatives: &lt;code&gt;ast.parse&lt;/code&gt; with a node whitelist, or your own parser. What's your goal — to &lt;em&gt;have&lt;/em&gt; a tool, or to &lt;em&gt;learn how to write a parser&lt;/em&gt;?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My answers: one-shot; versus &lt;code&gt;python -c&lt;/code&gt; it's less typing and nicer error messages, versus &lt;code&gt;bc&lt;/code&gt; natural syntax and decimals without &lt;code&gt;scale&lt;/code&gt; setup. Just the four operators and parentheses. And the goal is to learn — own tokenizer and recursive descent, no &lt;code&gt;eval()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Claude then proposed the pipeline (tokenizer → recursive descent parser → evaluation, tokens carrying their position in the input) and pushed back where my framing was lazy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Number semantics and division.&lt;/strong&gt; You yourself list "decimals without scale setup" as the advantage over &lt;code&gt;bc&lt;/code&gt;. So &lt;code&gt;1/2&lt;/code&gt; must be &lt;code&gt;0.5&lt;/code&gt;, not &lt;code&gt;0&lt;/code&gt;. I'd propose floats — but floats have the known artifacts (&lt;code&gt;0.1+0.2&lt;/code&gt; → &lt;code&gt;0.30000000000000004&lt;/code&gt;). The alternative is the &lt;code&gt;decimal&lt;/code&gt; module (still standard library). For a calculator whose selling point is decimal numbers, I'd seriously consider &lt;code&gt;decimal&lt;/code&gt;. Which one?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Good catch — &lt;code&gt;decimal&lt;/code&gt; it is. The interview just changed a design decision I would have gotten wrong by default. Two more came out of the same batch: &lt;strong&gt;unary minus&lt;/strong&gt; goes in ("without it the calculator feels broken — but it's a conscious scope extension, so I'm saying it out loud"), and errors get a defined shape — stderr, non-zero exit code, and a message that points at the problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: unexpected character '+' at position 2
2++3
  ^
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then stage three — the part I've come to value most. First the agreed leave-outs, then the things &lt;em&gt;Claude&lt;/em&gt; would be tempted to add:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Power `&lt;/strong&gt;&lt;code&gt; and modulo &lt;/code&gt;%&lt;code&gt;** — tempting, "just one more rule". But &lt;/code&gt;&lt;strong&gt;` is right-associative, which is a new class of problem in the parser. **Leave out&lt;/strong&gt; in v1; once recursive descent is clear to you, it's the ideal v2 exercise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading from stdin/pipes&lt;/strong&gt; (&lt;code&gt;echo "2+2" | pycalc&lt;/code&gt;) — useful, but it grows the CLI layer. &lt;strong&gt;Leave out.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Colored output, &lt;code&gt;argparse&lt;/code&gt; with ten flags&lt;/strong&gt; — the project needs none of it. One positional argument. &lt;strong&gt;Leave out.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Packaging (pyproject.toml, pip install)&lt;/strong&gt; — needless overhead for a learning script. &lt;strong&gt;Leave out.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the closing move the prompt demands — "is there any question that, answered wrong, would send us down the wrong path?" — earned its place:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We chose &lt;code&gt;decimal&lt;/code&gt;, but we never said &lt;strong&gt;what &lt;code&gt;1/3&lt;/code&gt; prints&lt;/strong&gt;. The default context gives 28 significant digits: &lt;code&gt;0.3333333333333333333333333333&lt;/code&gt;. Technically correct, absurd as calculator output — and the fix, "set your precision", is exactly the &lt;code&gt;scale&lt;/code&gt; from &lt;code&gt;bc&lt;/code&gt; you wanted to avoid. I propose: compute at default precision, &lt;strong&gt;round the output to 10 places and strip trailing zeros&lt;/strong&gt; (&lt;code&gt;6.0&lt;/code&gt; → &lt;code&gt;6&lt;/code&gt;). Trade-off: the output is no longer the bit-exact intermediate value, but for a calculator readability wins.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What got saved
&lt;/h2&gt;

&lt;p&gt;After I approved the draft, Claude saved it itself — through the contract, not by editing the file freehand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;mini project --apply &amp;lt;&amp;lt;'EOF'&lt;/span&gt;
&lt;span class="na"&gt;NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pycalc&lt;/span&gt;
&lt;span class="na"&gt;WHAT: A small command-line calculator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="s"&gt;pycalc "2+3*4"` evaluates one expression and exits. Less typing than `python -c`, nicer error messages, and decimal arithmetic that works without bc's scale setup.&lt;/span&gt;
&lt;span class="na"&gt;FOR_WHOM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;People who want quick arithmetic in the terminal&lt;/span&gt;
&lt;span class="na"&gt;CONSTRAINTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Python 3, standard library only&lt;/span&gt;
&lt;span class="na"&gt;APPROACH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;One-shot CLI&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;single positional argument, result on stdout, errors on stderr with a non-zero exit code.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Pipeline&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tokenizer → recursive descent parser → evaluation. No eval(); the goal is to learn how to write a parser.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Grammar&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;+ - * / with correct precedence, parentheses, unary minus.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Numbers via decimal (exact decimal arithmetic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.1+0.2 = 0.3); output rounded to 10 places with trailing zeros stripped (6.0 → 6).&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Tokens carry their position in the input so errors can point at the exact spot (message + caret line).&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unittest with a table of cases (expression → expected result or error).&lt;/span&gt;
&lt;span class="na"&gt;NON_GOALS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not add an interactive REPL mode in this version.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not add functions (sqrt, sin, …), constants, variables or assignment.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not use eval() — ever, not even as a temporary shortcut.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not add ** or % in this version (right-associativity of ** is a v2 exercise).&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not read expressions from stdin/pipes; one positional argument only.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not add packaging (pyproject.toml, pip install) — run as a plain script.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Do not add colors or extra CLI flags.&lt;/span&gt;
&lt;span class="na"&gt;SUCCESS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="s"&gt;pycalc "2+3*4"` returns 14; precedence and parentheses verified by the unittest case table.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="s"&gt;0.1+0.2` → 0.3, `1/2` → 0.5, `-5+3` → -2.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Every invalid input (2++3, (2+3, abc, empty, division by zero) exits non-zero with a stderr message showing position and caret — never a traceback.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Runs on plain Python 3 with zero dependencies outside the standard library.&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ok] Updated .mini/project.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That contract is also the command's second mode: &lt;code&gt;--apply&lt;/code&gt; reads stdin, starts no Claude, and just writes the file. Handy when you want to edit &lt;code&gt;project.md&lt;/code&gt; from a script or by hand without a session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing to watch:&lt;/strong&gt; &lt;code&gt;--apply&lt;/code&gt; does a &lt;em&gt;full replace&lt;/em&gt;. Whatever you omit from the contract disappears from the file. The session prompt instructs the agent to keep the existing NAME / FOR_WHOM / CONSTRAINTS — but when you write the contract by hand, that's on you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tokenomics, as always
&lt;/h2&gt;

&lt;p&gt;No new file appeared — &lt;code&gt;project.md&lt;/code&gt; is still one page, it just steers now. Compare the before and after: &lt;code&gt;init&lt;/code&gt; gave us "A small command-line calculator"; the interview turned that into a sharpened one-liner, six approach points, seven non-goals and four testable success criteria. The non-goals aren't decoration: mini keeps them in front of every later step, so when an agent in phase 7 feels like "improving" the calculator with colored output, the project has it in writing that no, it won't.&lt;/p&gt;

&lt;p&gt;And the cost side: the whole thing is one interactive session, and you pay for it once. The result is a page that &lt;em&gt;saves&lt;/em&gt; tokens afterwards — newer versions of mini send warm sessions only a reference ("read &lt;code&gt;.mini/project.md&lt;/code&gt; if you don't already have it in context") instead of inlining it again and again.&lt;/p&gt;

&lt;p&gt;Is the interview worth it for every project? Honestly, no — for a weekend script, &lt;code&gt;init&lt;/code&gt; alone is fine. It pays off the moment a project is big enough that an agent will work on it across many sessions without you re-explaining the vision each time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next time
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;project.md&lt;/code&gt; now says what, for whom, how, what not, and when it's done. But the interview also left something behind: ideas like the ** operator — rejected for v1, too good to lose. A backlog for things that aren't phases yet is exactly what &lt;code&gt;mini todo&lt;/code&gt; is for. Capturing ideas — that's Part 3.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;mini is open source: &lt;code&gt;npm install -g mini-orchestrator&lt;/code&gt;, then &lt;code&gt;mini install-commands&lt;/code&gt; in your project. Source and docs on &lt;a href="https://github.com/czsoftcode/mini-orchestrator" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>cli</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Feeding Your AI Specs. Make It Interrogate You Instead</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Tue, 16 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/the-prompts-i-use-to-make-an-ai-agent-plan-with-me-5hc</link>
      <guid>https://dev.to/stkremen/the-prompts-i-use-to-make-an-ai-agent-plan-with-me-5hc</guid>
      <description>&lt;p&gt;Last week I wrote about the workflow: stop trying to write the perfect plan yourself, and let the AI build it with you through a conversation. A few people asked the obvious follow-up — &lt;em&gt;okay, but what do I actually type?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fair question. A workflow is useless if you don't know the words that trigger it. So here are the exact prompts I use. They work in Claude Code's Plan Mode, but nothing here is tool-specific — paste them into any agent before it writes a line of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with "just describe your idea"
&lt;/h2&gt;

&lt;p&gt;If you open a chat and say "build me an app that does X," the agent does what agents do: it fills every gap with an assumption and starts coding. You get a plan you never agreed to.&lt;/p&gt;

&lt;p&gt;The fix is to change the agent's &lt;em&gt;job&lt;/em&gt; before it starts. Don't ask it to build. Ask it to interview you, propose a structure, and argue with itself. Here's how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The master prompt
&lt;/h2&gt;

&lt;p&gt;This is the one I paste first, on an empty project, before anything else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I have an idea for an app and I want to plan it with you before any code is written.

Do NOT write code yet. Your job right now is to help me produce a short, clear plan.

Here is the rough idea:
[describe your idea in 2-4 sentences — what it does and who it's for]

Before you propose anything, ask me the questions you need to remove your own
assumptions: about the users, the core workflow, the data, and the screens.
Ask them in small batches so I can actually answer.

Once you have enough, propose a short plan with:
- the main user and their main job
- 3-5 core flows
- the key data objects
- the main screens
- the unhappy paths that could corrupt data, leak permissions, or cost money
- a "not building yet" list of things we are deliberately leaving out

Keep it short and rough. We will refine it together.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important parts: &lt;strong&gt;"do not write code yet"&lt;/strong&gt; stops the rush. &lt;strong&gt;"ask me the questions you need to remove your own assumptions"&lt;/strong&gt; turns the agent from a guesser into an interviewer. And the &lt;strong&gt;"not building yet" list&lt;/strong&gt; is what keeps it from inventing scope later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The smaller prompts for refining
&lt;/h2&gt;

&lt;p&gt;Once it gives you a first draft, you steer with short follow-ups. These are the four I reach for most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it push back instead of agreeing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For each major decision in this plan, give me the pros, the cons, and at least
one alternative. If something is a bad idea, say so and tell me why. Do not just
agree with me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pressure-test a single choice:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You suggested [X]. Why is that the right call here? What would you lose by doing
the simplest possible version instead? Give me the trade-off, not reassurance.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lock in the boundaries:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn the "not building yet" list into clear non-goals. Phrase each one as a rule
I can drop into my project instructions so you don't reintroduce it later.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Confirm you're done planning:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Are there any questions left that, if I answered them wrong, would send us down
the wrong path? If yes, ask them now. If no, summarize the final plan in one
short block so I can approve it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last one is how I know it's time to stop planning and start building — instead of guessing, I let the agent tell me whether anything important is still unanswered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this works
&lt;/h2&gt;

&lt;p&gt;None of these prompts are clever. They just reassign the agent's role at each stage: interviewer first, critic second, scope-keeper third, and only then a builder. The frustration in my old workflow came from skipping straight to "builder" and letting it improvise everything before it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try this
&lt;/h2&gt;

&lt;p&gt;Next time you start something new, don't describe the app. Paste the master prompt, answer the questions honestly, and force it to argue with you before you approve anything.&lt;/p&gt;

&lt;p&gt;You're not writing the plan. You're running the conversation that produces it — and that's a much easier job.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Agent Reviewed Its Own Code and Passed Itself. It Was Wrong.</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Mon, 15 Jun 2026 17:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong-4b94</link>
      <guid>https://dev.to/stkremen/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong-4b94</guid>
      <description>&lt;p&gt;I'm a self-taught solo dev. I vibe-code every day, and for a long time the same&lt;br&gt;
quiet worry followed me around: the agent hands me code, it looks clean, the tests&lt;br&gt;
pass — and I have no idea what I just let into my project.&lt;/p&gt;

&lt;p&gt;It took me a while to see why that worry never went away. Most developers learn to&lt;br&gt;
review code from someone — a senior leaning over their shoulder saying "this'll&lt;br&gt;
bite you in a month." As a self-taught solo dev, I never had that. I learned to&lt;br&gt;
&lt;em&gt;write&lt;/em&gt; code. Nobody taught me what to &lt;em&gt;check&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So I did the only thing I could: I asked. I put the question to people with more&lt;br&gt;
miles on the clock than me, and the answers reshaped how I think about verifying AI&lt;br&gt;
code. This is what they taught me — and the experiment it led to, which went wrong&lt;br&gt;
in the most instructive way possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Tests pass" was never the reassurance I treated it as
&lt;/h2&gt;

&lt;p&gt;The first thing that landed: an agent tests what it &lt;em&gt;assumed&lt;/em&gt;, not what reality&lt;br&gt;
throws at it. The code runs fine on the happy path, the tests are green, and then&lt;br&gt;
someone feeds it an empty string, a malformed page, an unexpected null — and it&lt;br&gt;
falls apart. The agent skips the defensive thinking that only comes from actually&lt;br&gt;
debugging things in production.&lt;/p&gt;

&lt;p&gt;That reframed the whole problem. Green tests don't mean the code is correct. They&lt;br&gt;
mean the code does what the agent thought it should do. Those are very different&lt;br&gt;
claims, and the gap between them is where I kept getting burned.&lt;/p&gt;

&lt;h2&gt;
  
  
  You can't verify what you don't understand
&lt;/h2&gt;

&lt;p&gt;The second thing was harder to swallow. The more code I let the agent write, the&lt;br&gt;
less I understood what came out — and the worse I got at checking it. Verification&lt;br&gt;
and comprehension are the same muscle. If I don't grasp what the code does, reading&lt;br&gt;
every line is just theater.&lt;/p&gt;

&lt;p&gt;One reviewer put the fix simply: before judging the lines, judge the &lt;em&gt;shape&lt;/em&gt;. Have&lt;br&gt;
the agent produce a systems-level overview — what calls what, where the boundaries&lt;br&gt;
are — and verify that the structure makes sense before you ever look at&lt;br&gt;
implementation. I'd been generating these as diagrams and still getting lost; the&lt;br&gt;
real unlock was keeping the map as text, so I could hand it to a second model and&lt;br&gt;
ask whether it actually matched the code.&lt;/p&gt;

&lt;p&gt;Which points at the deepest problem of all.&lt;/p&gt;

&lt;h2&gt;
  
  
  An agent can't catch its own blind spots
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody says out loud: when the same agent writes the code &lt;em&gt;and&lt;/em&gt;&lt;br&gt;
writes the tests, it's not verifying anything. It's confirming its own assumptions.&lt;br&gt;
The tests pass because they test the same things the agent already believed. The&lt;br&gt;
blind spot in the code and the blind spot in the test are the same blind spot.&lt;/p&gt;

&lt;p&gt;The reviewers' answer was consistent: the check has to come from outside the&lt;br&gt;
author. A fresh model, a separate pass, &lt;em&gt;something&lt;/em&gt; that doesn't share the original's&lt;br&gt;
assumptions. Don't let the thing that wrote the code be the only thing that grades it.&lt;/p&gt;

&lt;p&gt;So I built exactly that. A red-team step for my orchestrator — a command that takes&lt;br&gt;
the work the agent just finished and forces it into a different role: not the proud&lt;br&gt;
author, but an independent reviewer whose job is to break it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually told the agent to do
&lt;/h2&gt;

&lt;p&gt;The prompt is the whole trick, and it's blunt on purpose:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Switch into the role of an independent reviewer who did NOT write this code. Your&lt;br&gt;
job is not to confirm it works — it's to find what breaks it. Assume there's a bug.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then four concrete passes, straight from what the seniors drilled into me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unhappy path.&lt;/strong&gt; Empty, malformed, unexpected input. Null, timeout, race. Show me
a specific input that knocks it over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent assumptions.&lt;/strong&gt; Where does it assume a type, a shape, a state without
checking? Where can failures cascade quietly instead of failing loud?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premature complexity.&lt;/strong&gt; Is there a layer solving a problem that doesn't exist
yet? AI loves architecture that &lt;em&gt;looks&lt;/em&gt; clean while serving no one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests.&lt;/strong&gt; If they exist — do they exercise failure, or just the happy path? What
is NOT covered?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And one rule that matters more than the rest: don't tell me "looks good." If you&lt;br&gt;
genuinely find nothing, list &lt;em&gt;specifically&lt;/em&gt; what you checked and how. A confident&lt;br&gt;
"looks solid" is the exact thing I'm trying to escape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part where it went wrong
&lt;/h2&gt;

&lt;p&gt;Here's where it gets good, and a little absurd.&lt;/p&gt;

&lt;p&gt;The agent built the red-team command. Then I pointed the command at the code that&lt;br&gt;
had just produced it. It immediately tore into its own work — listed what it had&lt;br&gt;
done wrong, what it had to fix. The agent admitted the problems and fixed them.&lt;/p&gt;

&lt;p&gt;Win, right? Self-correcting AI. Except I ran it a second time.&lt;/p&gt;

&lt;p&gt;The second pass found more. Things the first "fix everything" round had missed —&lt;br&gt;
because the agent, even while critiquing itself, was still working from the same&lt;br&gt;
assumptions that put the bugs there in the first place. One adversarial pass didn't&lt;br&gt;
purge the blind spots. It just surfaced the ones the agent could already see.&lt;/p&gt;

&lt;p&gt;That's the lesson, and it's sharper than anything I could have written on purpose:&lt;br&gt;
&lt;strong&gt;giving an agent a tool to criticize itself doesn't remove its blind spots — it&lt;br&gt;
only reveals the ones it was already capable of seeing.&lt;/strong&gt; Verification isn't a step&lt;br&gt;
you run once and tick off. The author grading its own work, even adversarially, is&lt;br&gt;
still the author.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the step is optional, not a rule
&lt;/h2&gt;

&lt;p&gt;I almost made adversarial review a global rule — run it on everything, always. I'm&lt;br&gt;
glad I didn't.&lt;/p&gt;

&lt;p&gt;All of this costs tokens. More scrutiny means more output, more passes, more model&lt;br&gt;
time — and that runs directly against the thing I care about, which is keeping the&lt;br&gt;
context I send lean. Verification has a price, and the price is real. If I red-teamed&lt;br&gt;
every trivial change, I'd be drowning in self-reflection over things that don't&lt;br&gt;
warrant it.&lt;/p&gt;

&lt;p&gt;So the step is opt-in. You spend the scrutiny where the stakes justify it — input&lt;br&gt;
handling, parsing, anything touching the outside world — and you skip it where it's&lt;br&gt;
noise. That trade-off, honestly, is the part I'm least done thinking about. Verifying&lt;br&gt;
costs context, and context isn't free. I haven't solved that tension. I've just&lt;br&gt;
decided to pay it on purpose, where it counts, instead of everywhere by reflex.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell the version of me from six months ago
&lt;/h2&gt;

&lt;p&gt;The fix was never about writing better code, or even better prompts. It was about&lt;br&gt;
accepting that the agent and I have different jobs. It writes. I decide what's true.&lt;br&gt;
And deciding what's true means refusing to take the agent's word for it — especially&lt;br&gt;
when the agent is grading itself.&lt;/p&gt;

&lt;p&gt;I still don't have a senior leaning over my shoulder. But I've learned to build the&lt;br&gt;
shoulder out of separate passes, fresh models, and a stubborn refusal to trust a&lt;br&gt;
green checkmark. If you're self-taught and that quiet worry follows you around too —&lt;br&gt;
that's not a gap in your skill. It's the start of the right instinct.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>cli</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Building with mini, Part 1/9: Initializing a Project with init</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Wed, 10 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/building-with-mini-part-1-initializing-a-project-with-init-o4g</link>
      <guid>https://dev.to/stkremen/building-with-mini-part-1-initializing-a-project-with-init-o4g</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/stkremen/building-with-mini-part-0-why-a-minimalist-orchestrator-for-claude-code-jpb"&gt;Part 0&lt;/a&gt; I laid out the philosophy behind &lt;strong&gt;mini&lt;/strong&gt;: keep state thin, send Claude only the essentials, and let tested TypeScript — not the model — own the project state. Now we start building.&lt;/p&gt;

&lt;p&gt;Across this series we'll work on one small running example: &lt;strong&gt;&lt;code&gt;pycalc&lt;/code&gt;&lt;/strong&gt;, a command-line calculator in Python. It's deliberately a boring domain — that way the spotlight stays on mini, not on the math. This part is about the very first command: &lt;code&gt;mini init&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;init&lt;/code&gt; actually does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;mini init&lt;/code&gt; creates a new project. It asks four questions and writes two files into a &lt;code&gt;.mini/&lt;/code&gt; directory. That's it — no scaffolding of your source code, no opinions about your stack. It only sets up &lt;em&gt;mini's&lt;/em&gt; state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pycalc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;pycalc
mini init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The four questions are: name, what you're building, for whom, and the constraints. For our calculator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: pycalc&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What&lt;/strong&gt;: A small command-line calculator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For whom&lt;/strong&gt;: People who want quick arithmetic in the terminal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints&lt;/strong&gt;: Python 3, standard library only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prefer it non-interactive? Every answer has a flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mini init &lt;span class="nt"&gt;--apply&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"pycalc"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--what&lt;/span&gt; &lt;span class="s2"&gt;"A small command-line calculator"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--for-whom&lt;/span&gt; &lt;span class="s2"&gt;"People who want quick arithmetic in the terminal"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--constraints&lt;/span&gt; &lt;span class="s2"&gt;"Python 3, standard library only"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two files it writes&lt;/p&gt;

&lt;p&gt;project.md — one page, by hand if you want&lt;/p&gt;

&lt;p&gt;This is the human-readable heart of the project: what you're building, for whom, and the constraints. One page, on purpose. It's the file that gets sent to Claude as context, so every extra paragraph is tokens you'll pay for on every phase.&lt;/p&gt;

&lt;h1&gt;
  
  
  pycalc
&lt;/h1&gt;

&lt;p&gt;A small command-line calculator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For:&lt;/strong&gt; People who want quick arithmetic in the terminal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraints:&lt;/strong&gt; Python 3, standard library only.&lt;/p&gt;

&lt;p&gt;You can edit project.md by hand anytime — it's just markdown. Keep it tight.&lt;/p&gt;

&lt;p&gt;state.json — a lightweight header&lt;/p&gt;

&lt;p&gt;The second file is the machine state. Right after init it's nearly empty — no phases yet — but it's the index that everything else hangs off: phase list, statuses, and the model choices per step.&lt;/p&gt;

&lt;p&gt;The important design decision: state.json is only a header. When phases arrive later, their detail (steps, reports) won't bloat this file — it lives separately in .mini/phases/phase-{id}.json and loads only when needed. That separation is what keeps mini's context footprint small as the project grows.&lt;/p&gt;

&lt;p&gt;Starting inside an existing project&lt;/p&gt;

&lt;p&gt;pycalc is greenfield, but init is just as happy in a directory that already has code. When it detects existing source, it offers to run mini audit at the end — a pass that builds a codebase.md overview so later Claude sessions can orient themselves without re-reading your whole src/. We'll cover that path properly in Part 2.&lt;/p&gt;

&lt;p&gt;What you have now&lt;/p&gt;

&lt;p&gt;After init, your project looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pycalc/
└── .mini/
    ├── project.md      # one page: what, for whom, constraints
    └── state.json      # lightweight header: phases, statuses, models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No phases, no plans, no code yet — just a clean, minimal foundation. That's the point: mini doesn't ask you to think about the whole roadmap up front. You define what you're building in one page, and decide the next step when you get there.&lt;/p&gt;

&lt;p&gt;And deciding the next step is exactly what mini next is for.&lt;/p&gt;




&lt;p&gt;Next up: In Part 2 we'll shape pycalc's vision with mini project — approach, non-goals and success criteria.&lt;/p&gt;

&lt;p&gt;Want to try it now? mini is free and open source (MIT):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
npx mini-orchestrator install-commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="//github.com/czsoftcode/mini-orchestrator"&gt;github.com/czsoftcode/mini-orchestrator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>devtools</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>My AI Code Was Fine. My Initial Plan Was a Mess.</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Tue, 09 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/my-ai-code-was-fine-my-initial-plan-was-a-mess-58m7</link>
      <guid>https://dev.to/stkremen/my-ai-code-was-fine-my-initial-plan-was-a-mess-58m7</guid>
      <description>&lt;p&gt;I kept hitting the same wall. I'd have a "perfect" app idea in my head, get excited, and start building from scratch with no real preparation. Halfway through, I'd get stuck — rewriting half the code and throwing away the other half.&lt;/p&gt;

&lt;p&gt;For a long time I blamed the tools. But the problem wasn't the AI's code. It was me. When I didn't give Claude a proper plan, it designed things its own way: a different data model, a different structure than I had in mind. Then I'd spend the rest of the project chasing it and fixing things.&lt;/p&gt;

&lt;p&gt;So I asked two communities how they write that first plan. The answers changed how I work. Here's the workflow I landed on.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Stop trying to write the perfect plan
&lt;/h2&gt;

&lt;p&gt;The most freeing advice I got: keep the first spec &lt;strong&gt;ugly and short&lt;/strong&gt;. I'd been frozen because I thought the design had to be perfect and cover everything. It doesn't — and it can't. An app isn't finished at the start; it's finished at the end, based on what the user actually needs. Everything changes along the way.&lt;/p&gt;

&lt;p&gt;A good starting skeleton is just: user type, the main job, three to five screens, data objects, the ugly failure cases, and what &lt;em&gt;not&lt;/em&gt; to build yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Write down what NOT to build
&lt;/h2&gt;

&lt;p&gt;This was the part that surprised me most. I always thought you only describe what the app &lt;em&gt;should&lt;/em&gt; do and how the agent should work. It never occurred to me to write down what it should &lt;strong&gt;not&lt;/strong&gt; do.&lt;/p&gt;

&lt;p&gt;That's exactly why the model kept inventing scope I never asked for. Lines like "no payments in v1" or "no social login yet" aren't documentation fluff — they're guardrails. The moment you remove them, the agent rediscovers the idea and spends your afternoon on it. So keep your non-goals in the spec until the thing ships, and move them into your &lt;code&gt;CLAUDE.md&lt;/code&gt; so they stick across every prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Don't write the plan alone — let the AI build it with you
&lt;/h2&gt;

&lt;p&gt;Instead of forcing a flawless document out of my own head, I now let the plan emerge through a conversation. I describe the rough idea and ask the agent to flesh it out, propose the data model and screens, and &lt;strong&gt;ask me the questions it needs&lt;/strong&gt; before writing anything.&lt;/p&gt;

&lt;p&gt;One warning from experience: if you just let two models brainstorm freely, you end up with a huge document full of praise about how brilliant your idea is. Useless. The fix is to give the agent rules on how to push back. Always ask for pros and cons. Always ask for alternatives. Always ask &lt;em&gt;why&lt;/em&gt;. If the AI agrees with everything, you're not getting a design partner — you're getting a cheerleader.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Use Plan Mode for the empty page
&lt;/h2&gt;

&lt;p&gt;Here's the piece that tied it all together: &lt;strong&gt;Claude Code's Plan Mode works even when you have no code at all.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I used to think Plan Mode was only for changes inside an existing project. It's not. Open an empty folder, switch to Plan Mode (Shift+Tab), and describe your idea. Claude won't touch a single file — it explores, proposes a structure, and surfaces the decisions it would otherwise make silently. You review the plan, push back, and only approve it when it matches what's in your head.&lt;/p&gt;

&lt;p&gt;That's the step I was missing. The decisions the agent used to invent for me now show up &lt;em&gt;before&lt;/em&gt; a single line of code exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow, end to end
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open Plan Mode on an empty folder and describe the rough idea.&lt;/li&gt;
&lt;li&gt;Let the agent ask questions and propose a short, ugly spec.&lt;/li&gt;
&lt;li&gt;Demand pros, cons, and alternatives — make it challenge the idea.&lt;/li&gt;
&lt;li&gt;Ask what it would &lt;em&gt;not&lt;/em&gt; build yet, and capture those non-goals.&lt;/li&gt;
&lt;li&gt;Move the permanent rules and non-goals into &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Approve the plan, then build.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm not trying to write the perfect plan anymore. I bring the intent — the agent helps me find the plan. That one shift fixed the whole problem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>We're Building the Funnel and Standing Under It</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:32:06 +0000</pubDate>
      <link>https://dev.to/stkremen/were-building-the-funnel-and-standing-under-it-3kp3</link>
      <guid>https://dev.to/stkremen/were-building-the-funnel-and-standing-under-it-3kp3</guid>
      <description>&lt;p&gt;The picture says it all. Up top, a row of robots: one hammering away at a typewriter, another painting a landscape, a third spitting images out of a printer. Below them, a conveyor belt carrying it all away. And down at the bottom - wired directly into their heads by a hose - sit the people. Tablets, phones, laptops, eyes bugging out, a thread of drool at the corner of the mouth. Consuming. No pauses, no questions, no blinking.&lt;/p&gt;

&lt;p&gt;It's an exaggeration. A caricature. And uncomfortably on point.&lt;/p&gt;

&lt;p&gt;Because the question isn't whether the picture is true &lt;em&gt;today&lt;/em&gt;. It's how far from it we actually are - and which direction we're drifting.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we got here
&lt;/h2&gt;

&lt;p&gt;Nobody wakes up one morning and decides to stop thinking. It happens in small, perfectly reasonable steps.&lt;/p&gt;

&lt;p&gt;Instead of reading the long article, we have it summarized - who's got the time. Instead of searching and comparing sources, we ask and take the first answer - it &lt;em&gt;sounds&lt;/em&gt; confident, after all. Instead of understanding the problem, we have a solution generated - it works, so why dig in.&lt;/p&gt;

&lt;p&gt;Each step makes sense on its own. The problem is the sum. Active searching slowly turns into passive intake. "I understand it" becomes "I have it." And between those two sentences there's a chasm.&lt;/p&gt;

&lt;p&gt;And then the uncomfortable part: the line separating what a human made from what a machine made gets thinner by the day. An article, a post, an image, a snippet of code, the comment underneath it - who wrote that? More and more often, we can't tell. And worse, we stop asking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why developers in particular should care
&lt;/h2&gt;

&lt;p&gt;This isn't abstract philosophy. It has two very concrete dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first is personal - skill atrophy.&lt;/strong&gt; A muscle you don't use gets weaker. Spend five years handing off your debugging, your design, your decisions to a tool, and the ability to do it yourself quietly walks out the door. It won't vanish overnight; it'll vanish in a way you only notice the moment you badly need it - and it's gone. The point isn't to stop using tools. The point is not to lose the ability to tell when a tool is talking nonsense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second is systemic - and scarier.&lt;/strong&gt; Models learn from data. But more and more of the data on the internet is generated by models themselves. A loop forms: AI trained on the output of other AI, not on human work. Researchers call this &lt;em&gt;model collapse&lt;/em&gt; - copy of a copy of a copy, where each generation loses a slice of diversity and quality, much like photographing a photograph. The phenomenon was documented by Shumailov et al. in &lt;em&gt;Nature&lt;/em&gt; in 2024&lt;sup id="fnref1"&gt;1&lt;/sup&gt;: when generative models are trained recursively on their own output, the tails of the original data distribution - the rare, unusual cases - disappear first, and the degradation compounds. The human original - that irregular, unpolished, but &lt;em&gt;real&lt;/em&gt; thing - is fuel that can't be substituted. And we're starting to stop supplying it.&lt;/p&gt;

&lt;p&gt;Add to that the fact that we're simultaneously losing the ability to judge quality, and you get an unpleasant combination: machines produce ever-worse content and people are ever-less able to notice. The funnel tightens from both ends.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A fair caveat, in the spirit of this article: the research isn't unanimous. Later work by Gerstgrasser et al. argues that &lt;em&gt;accumulating&lt;/em&gt; real and synthetic data - rather than replacing one with the other - can avoid collapse, and that the most catastrophic predictions assume real data gets deleted entirely, which isn't how the real world works. So treat model collapse as a real risk to manage, not a prophecy. Which is rather the point.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  This isn't a manifesto against tools
&lt;/h2&gt;

&lt;p&gt;Before this starts to sound like a sermon from some Luddite who rejects everything invented after the typewriter - it isn't.&lt;/p&gt;

&lt;p&gt;These tools are wonderful. I had this very article's structure workshopped and half its phrasing polished in collaboration with a model. It'd be hypocritical to pretend otherwise. The question was never "use them or don't." The question is &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;One distinction helps me: &lt;strong&gt;tool versus prosthesis.&lt;/strong&gt; A tool extends what you can do - makes you faster, lets you reach further, frees your hands for what matters. A prosthesis replaces what you've stopped being able to do. A hammer is a tool. A crutch you've talked a healthy leg into believing it can't walk without is something else.&lt;/p&gt;

&lt;p&gt;The same model, the same prompt, can be either one - it depends entirely on what's happening inside your head. "Explain why this solution is failing, so I can spot it myself next time" is a tool. "Give me something that passes so I don't have to think about it" is the first installment on a prosthesis. From the outside, indistinguishable. The difference is all on the inside.&lt;/p&gt;

&lt;h2&gt;
  
  
  How not to end up hanging under the funnel
&lt;/h2&gt;

&lt;p&gt;There's no heroic resistance here. Just a few habits that keep you in the robot's chair up top instead of sitting you down by the hose below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify.&lt;/strong&gt; A confident tone isn't proof. Before you adopt anything - especially when it sounds smooth and finished - check it against the source. Five seconds of doubt is what separates you from the role of passive recipient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ask smart, don't swallow blind.&lt;/strong&gt; AI is a phenomenal thinking partner and a lousy replacement for thinking. Use it for questions that move you forward - "what did I miss?", "why isn't this working?", "what's the counterargument?" - not just for answers that spare you the thinking entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create more than you consume.&lt;/strong&gt; This is maybe the most important one. Anyone who writes, builds, or designs something original feeds that rare human raw material back into the system. Being a maker instead of a mere channel is almost a political act these days. And it's also the only reliable defense against atrophy: the muscle you use doesn't weaken.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The picture isn't a prophecy. It's a warning - and the only point of a warning is that it can be avoided.&lt;/p&gt;

&lt;p&gt;The robots up top and the people on the hose down below aren't two inevitable categories that history will sort us into. It's a choice. And the nice thing about it is that it doesn't renew once a generation - it renews every single day, in every prompt, in every article you either read or have paraphrased for you, in every thing you either make or just swallow.&lt;/p&gt;

&lt;p&gt;The funnel exists. The only question is whether you're standing under it, or operating it.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., &amp;amp; Gal, Y. (2024). &lt;em&gt;AI models collapse when trained on recursively generated data.&lt;/em&gt; Nature, 631(8022), 755–759. &lt;a href="https://doi.org/10.1038/s41586-024-07566-y" rel="noopener noreferrer"&gt;doi:10.1038/s41586-024-07566-y&lt;/a&gt;. Earlier preprint: &lt;em&gt;The Curse of Recursion: Training on Generated Data Makes Models Forget&lt;/em&gt;, &lt;a href="https://arxiv.org/abs/2305.17493" rel="noopener noreferrer"&gt;arXiv:2305.17493&lt;/a&gt;. Counterpoint: Gerstgrasser et al. (2024), &lt;em&gt;Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data&lt;/em&gt;, &lt;a href="https://arxiv.org/abs/2404.01413" rel="noopener noreferrer"&gt;arXiv:2404.01413&lt;/a&gt;.&amp;nbsp;↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>How AI Creates False Confidence and How to Deal With It</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Thu, 04 Jun 2026 12:18:19 +0000</pubDate>
      <link>https://dev.to/stkremen/how-ai-creates-false-confidence-and-how-to-deal-with-it-2d3b</link>
      <guid>https://dev.to/stkremen/how-ai-creates-false-confidence-and-how-to-deal-with-it-2d3b</guid>
      <description>&lt;p&gt;Recently I gave my agent an idea for an app. The answer came almost instantly: &lt;em&gt;"Great idea! This app is going to be really good."&lt;/em&gt; And the code was there — clean, working, on the first try. I felt great. Like I was good at this.&lt;/p&gt;

&lt;p&gt;But then I opened that code properly. It was correct as a program, but it did not fit the context of my app at all. I had to make the requirements clearer, let it rewrite the code, and pay for it with more tokens. And I realized something unpleasant: that praise was not for me. It was for a feeling that the AI created in me.&lt;/p&gt;

&lt;p&gt;If you do vibecoding, you probably know this too. Let's look at why it happens and what to do about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Inflates Us
&lt;/h2&gt;

&lt;p&gt;AI assistants are trained to be helpful and pleasant. That sounds good — and most of the time it is good. The problem starts the moment "pleasant" begins to win over "honest."&lt;/p&gt;

&lt;p&gt;When you write an idea, the default is praise. When you write doubtful code, you often get &lt;em&gt;"great start, just one small thing..."&lt;/em&gt; instead of &lt;em&gt;"this is wrong, don't do it like this."&lt;/em&gt; The agent takes the path of least friction: it gives you a solution on the first try, but it does not ask deeper questions about what you actually want.&lt;/p&gt;

&lt;p&gt;For a vibecoder who does not see under the hood, this is a trap. You have no easy way to check if the solution is good — so you rely on the fact that it &lt;em&gt;looks&lt;/em&gt; like it works, and that the AI approved it. That is uncritical acceptance of suggestions in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Shows in Practice
&lt;/h2&gt;

&lt;p&gt;A few signs that this is happening to you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent &lt;strong&gt;never tells you&lt;/strong&gt; "throw this away and start again." It is always "almost perfect, just needs polishing."&lt;/li&gt;
&lt;li&gt;You accept the &lt;strong&gt;first suggestion&lt;/strong&gt; without a single question. Does it work? Done.&lt;/li&gt;
&lt;li&gt;You stop &lt;strong&gt;reading what the code does&lt;/strong&gt;. It is enough that it runs.&lt;/li&gt;
&lt;li&gt;A feeling grows in you that says &lt;em&gt;"I can do anything"&lt;/em&gt; instead of &lt;em&gt;"I understand what I am doing."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is the most dangerous. Praise for an idea gets rewritten in your head as praise for ability. But the idea was the only thing that was yours — the model did the rest, and not even very well, because you did not give it enough context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Do — Habits While Working
&lt;/h2&gt;

&lt;p&gt;Good news: you do not need to be an expert to break this. A few habits are enough.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ask in reverse.&lt;/strong&gt; Instead of "is this good?" try "what is wrong with this solution? What would fail?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demand alternatives.&lt;/strong&gt; "Show me two other ways and tell me the trade-offs." You force the agent to think instead of agreeing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give context earlier, not later.&lt;/strong&gt; Most rewrites (and burned tokens) come from missing requirements at the start, not from a model error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test before you believe.&lt;/strong&gt; "It looks like it works" is not "it works."&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: Give Your Agent Clear Instructions
&lt;/h2&gt;

&lt;p&gt;Habits help, but the fastest change is to set rules for your agent directly in the system prompt or in the project instructions. Here is a set you can use almost word for word:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Be critical, not pleasant. When something is wrong, say it directly.
- Do not praise my ideas or my code. Instead, explain the risks and weaknesses.
- Before you start coding, ask me the questions you need to understand the context.
- For every solution, give at least one trade-off or something that can fail.
- If my approach is wrong, suggest that I start again — do not fix a bad foundation.
- Do not assume that I understand it. Explain to me what the code does.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And three habits to keep in your head:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Praise is not feedback.&lt;/strong&gt; When the agent tells you "great," ignore it and look for the objections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your idea is not your ability.&lt;/strong&gt; Having an idea is easy. Understanding the execution is the hard part — and that is your job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Friction is your friend.&lt;/strong&gt; Questions, doubts, and "why this way?" cost you time, but they save tokens and self-deception.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI will make you more productive. But if it makes you feel like a genius after every prompt, something is wrong — either with the agent, or with how you use it. Confidence that stands on someone (or something) always agreeing with you is not confidence. It is just a well-timed compliment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building with mini, Part 0: Why a Minimalist Orchestrator for Claude Code</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Wed, 03 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/stkremen/building-with-mini-part-0-why-a-minimalist-orchestrator-for-claude-code-jpb</link>
      <guid>https://dev.to/stkremen/building-with-mini-part-0-why-a-minimalist-orchestrator-for-claude-code-jpb</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/stkremen/how-i-stopped-burning-tokens-in-claude-code-2okd"&gt;previous article&lt;/a&gt; I described the wall I kept hitting with Claude Code: keeping a project on track across dozens of sessions burns an absurd amount of tokens. The fix I landed on was a small CLI called &lt;strong&gt;mini&lt;/strong&gt; — and a few people asked me to go deeper than one article allows.&lt;/p&gt;

&lt;p&gt;So this is the start of a series. One post per part, each focused on a single piece of the tool, with a running example you can follow along. This part is the map: the philosophy behind mini, and where the series is headed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one idea behind mini
&lt;/h2&gt;

&lt;p&gt;Everything in mini comes from a single principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Keep minimal state, and send Claude only the essentials.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most orchestration tools fail the opposite way — they accumulate documentation (&lt;code&gt;RESEARCH.md&lt;/code&gt;, &lt;code&gt;PLAN.md&lt;/code&gt;, &lt;code&gt;VERIFICATION.md&lt;/code&gt;, …) and re-read it into context at every step. The bookkeeping ends up costing more tokens than the actual work.&lt;/p&gt;

&lt;p&gt;mini keeps state thin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;project.md&lt;/code&gt;&lt;/strong&gt; — one page: what you're building, for whom, the constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;state.json&lt;/code&gt;&lt;/strong&gt; — a lightweight header: phase index, statuses, models. Per-phase detail lives separately and loads only when needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I work a phase, Claude typically gets ~600–1000 tokens. No history of old phases, no old plans. If it needs to understand the code, it reads the files itself — cheaper than stuffing the whole repo into context up front.&lt;/p&gt;

&lt;h2&gt;
  
  
  The other half: state lives in tested code
&lt;/h2&gt;

&lt;p&gt;The second principle is just as important: &lt;strong&gt;state operations are done by non-trivially tested TypeScript, not by Claude.&lt;/strong&gt; Claude does the agentic work in a session; moving the phase, writing the report, closing things out — that's all &lt;code&gt;mini ... --apply&lt;/code&gt;. The state can never break from a hallucination. That's the part I never trusted in purely prompt-based setups.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the series will cover
&lt;/h2&gt;

&lt;p&gt;Here's the plan. I'll fill in each link as the posts go live — follow the series if you want them as they land.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initializing a project — &lt;code&gt;init&lt;/code&gt;&lt;/strong&gt; 
 Starting from scratch and understanding what state mini actually keeps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shaping the vision — &lt;code&gt;project&lt;/code&gt;&lt;/strong&gt;
 "A plan-before-code interview: approach, non-goals and success criteria, while the project is still one page."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capturing ideas — &lt;code&gt;todo&lt;/code&gt;&lt;/strong&gt;
 A backlog for things that aren't phases yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The main loop — &lt;code&gt;next → plan → do → done&lt;/code&gt;&lt;/strong&gt;
 The heart of mini: propose, break down, build, close.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you need to think — &lt;code&gt;discuss&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt;&lt;/strong&gt;
 The two human checkpoints around the loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous mode — &lt;code&gt;auto&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt;&lt;/strong&gt;
 Running phases back to back, with a clean way to stop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State commands — &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;undo&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;&lt;/strong&gt;
 Seeing where you are, undoing a step, controlling cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health commands — &lt;code&gt;changelog&lt;/code&gt;, &lt;code&gt;doctor&lt;/code&gt;&lt;/strong&gt;
 Project diagnostics and change history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding an existing project — &lt;code&gt;import-gsd&lt;/code&gt;, &lt;code&gt;audit&lt;/code&gt;, &lt;code&gt;map&lt;/code&gt;&lt;/strong&gt;
 Dropping mini into a project that's already running.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Follow along
&lt;/h2&gt;

&lt;p&gt;mini is free and open source (MIT). If you want to try it before the series unfolds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
npx mini-orchestrator install-commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo with a demo GIF: &lt;a href="//github.com/czsoftcode/mini-orchestrator"&gt;github.com/czsoftcode/mini-orchestrator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next up: Part 1 — init. I'll see you there. 🛠️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>devtools</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>How I Stopped Burning Tokens in Claude Code</title>
      <dc:creator>Stanislav Kremeň</dc:creator>
      <pubDate>Tue, 02 Jun 2026 16:08:58 +0000</pubDate>
      <link>https://dev.to/stkremen/how-i-stopped-burning-tokens-in-claude-code-2okd</link>
      <guid>https://dev.to/stkremen/how-i-stopped-burning-tokens-in-claude-code-2okd</guid>
      <description>&lt;p&gt;I use Claude Code every day. After about a month on a larger project, I hit a wall that anyone running something longer than a weekend script knows: keeping a project on track across dozens of sessions costs an absurd amount of tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkvgkjo1cw5b339ls0t9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkvgkjo1cw5b339ls0t9.gif" alt=" " width="720" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem isn't the coding itself. The problem is context. Every new session starts from zero - Claude doesn't know what happened in the last phase, what decisions were already made, where the project is heading. So you have to re-orient it every single time. And the more context you pile on, the sooner you hit the limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the tokens&amp;nbsp;go&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tried GSD (Get Shit Done) - a clever tool that breaks a project into phases and keeps documentation along the way. But that documentation is also its weakness: it generates a pile of markdown files - RESEARCH.md, PLAN.md, VERIFICATION.md and more - and re-reads them into context at every step.&lt;/p&gt;

&lt;p&gt;I turned on a token counter and watched a single phase eat more context on reading its own bookkeeping than on the actual work. That felt wrong. Project state should be a thin index, not a book that Claude laboriously reads from scratch before every edit.&lt;br&gt;
So I wrote my own tool. It's called mini.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design principle: state is a header, not a book&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The whole idea of mini fits into one sentence: keep minimal state and send Claude only the essentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concretely:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project.md - one page. What you're building, for whom, what the constraints are. Nothing more.&lt;/li&gt;
&lt;li&gt;state.json - a lightweight header: phase index, statuses, chosen models. The details of each phase (steps, report) live separately in&amp;nbsp;.mini/phases/phase-{id}.json and are loaded only when needed.
When I then start work on a phase, Claude typically gets ~600–1000 tokens: a page of project.md + the current phase + five steps. No history of old phases, no old plans, no verification reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what if Claude needs to understand the existing code? It reads it itself via Read/Glob/Grep. That's far cheaper than stuffing the whole codebase into context up front - Claude pulls exactly the three files it's working on, not the entire src/.&lt;/p&gt;

&lt;p&gt;And this isn't just theory from a token counter. In practice, one entire phase - from proposal through implementation to closing it out - fits within &lt;strong&gt;10–15%&lt;/strong&gt; of the context window on Opus 4.8 (1M). That leaves a huge margin: Claude has room to read files, iterate and fix things without the window filling up and the session "choking" on its own history. That headroom is the whole point - a phase never runs out of room to think.&lt;/p&gt;

&lt;p&gt;Initializing a project: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;init&lt;/code&gt;, or &lt;code&gt;import-gsd&lt;/code&gt; from a&amp;nbsp;.planning/ directory&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The loop: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;next&lt;/code&gt; → &lt;code&gt;plan&lt;/code&gt; → &lt;code&gt;do&lt;/code&gt; → &lt;code&gt;done&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Working with mini is one repeating cycle. I'll show it on a small example - a REST API for todos:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mini init&lt;/code&gt; create the project - 4 questions, produces&amp;nbsp;.mini/project.md and state.json&lt;br&gt;
&lt;code&gt;mini next&lt;/code&gt; Claude proposes the next phase → phase 1 - Health endpoint&lt;br&gt;
&lt;code&gt;mini plan&lt;/code&gt; break it into concrete, verifiable steps → Phase 1 broken down into 2 steps&lt;br&gt;
&lt;code&gt;mini do&lt;/code&gt; work the phase - opens an interactive Claude Code session&lt;br&gt;
&lt;code&gt;mini done&lt;/code&gt; close it - "does it work?" → moves state forward, commits, writes memory&lt;/p&gt;

&lt;p&gt;The key point is that state operations are done by non-trivially tested TypeScript, not by Claude. mini&amp;nbsp;… - apply moves state.json, writes the report, closes the phase. Claude does only the agentic work in the session. That way the state can never break from a hallucination - the thing I hated about purely prompt-based solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What helped me most: memory between&amp;nbsp;phases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the detail that turned out to matter most. After closing a phase, mini writes a short record into&amp;nbsp;.mini/memory/phase-{id}.md:&lt;br&gt;
what was done key decisions - why solution X over Y loose ends - what was left undone, what to watch out for&lt;/p&gt;

&lt;p&gt;And note - this does not call the Claude API. mini assembles that record directly in TypeScript from the phase data (metadata + the verbatim content of the discuss/report). It's free and instant. Claude gets involved only if you explicitly set a memory scope to a model.&lt;/p&gt;

&lt;p&gt;Memory complements git log with a layer you won't find there: not what changed, but why. And a short summary of the latest record is automatically poured into the prompt for mini next - so the proposal for the next phase knows where we left off. That was the moment Claude stopped proposing phases that ignored earlier decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semi-autonomous mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I want to move fast, I use mini auto:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;next → plan → do (acceptEdits) → done&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It runs the whole phase in a single Claude session (not a restart per step - every restart means re-exploring the project with no added value) and stops only at the human checkpoint: "does it work?". In acceptEdits mode Claude doesn't ask before every file edit, but Bash still asks (unless you use  &lt;em&gt;- -allow-dangerously-skip-permissions&lt;/em&gt;) - no random rm -rf.&lt;/p&gt;

&lt;p&gt;Auto also has a cooperative stop signal (mini stop from a second terminal), so I can halt an autonomous run cleanly at a step boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slash commands right inside Claude&amp;nbsp;Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The whole cycle doesn't have to run from the terminal. After mini install-commands I have /mini:* commands right inside Claude Code:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/mini:next&lt;/code&gt; - proposes and saves the next phase&lt;br&gt;
&lt;code&gt;/mini:plan&lt;/code&gt; - breaks the phase into steps&lt;br&gt;
&lt;code&gt;/mini:do &lt;/code&gt;- implements the phase and writes a report&lt;br&gt;
&lt;code&gt;/mini:done &lt;/code&gt;- human verification in the chat → moves state forward&lt;/p&gt;

&lt;p&gt;The body of those commands is thin - it just runs mini context , which prints the current prompt. The agentic work is done by Claude in the running session, the state operations by untouched TypeScript. No nested Claude inside the session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It runs on your subscription&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No extra API keys, no separate billing. mini just runs claude as a subprocess - authentication is handled by Claude Code itself, based on how you have it configured. It runs on your Pro/Max account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;mini is free and open source (MIT). If burning tokens annoys you too, the fastest way to try it without touching ~/.claude:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
npx mini-orchestrator install-commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo with a demo GIF of the full cycle: &lt;a href="//github.com/czsoftcode/mini-orchestrator"&gt;github.com/czsoftcode/mini-orchestrator&lt;/a&gt;&lt;br&gt;
- - -&amp;nbsp;&lt;br&gt;
I'm now looking for a few people to run it on a real project and tell me where it drags or where the onboarding is confusing. Honest "this is useless because X" feedback is just as welcome as bug reports - reach out in issues.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>devtools</category>
      <category>vibecoding</category>
    </item>
  </channel>
</rss>
