<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kanfu-panda</title>
    <description>The latest articles on DEV Community by kanfu-panda (@kanfu-panda).</description>
    <link>https://dev.to/kanfu-panda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940859%2F45897ada-53a1-4e66-a3ca-356f624df48e.jpeg</url>
      <title>DEV Community: kanfu-panda</title>
      <link>https://dev.to/kanfu-panda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kanfu-panda"/>
    <language>en</language>
    <item>
      <title>My AI cried 'prompt injection!' — and I believed it. Then it turned out to be a false alarm</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Tue, 30 Jun 2026 14:38:19 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/my-ai-cried-prompt-injection-and-i-believed-it-then-it-turned-out-to-be-a-false-alarm-14f1</link>
      <guid>https://dev.to/kanfu-panda/my-ai-cried-prompt-injection-and-i-believed-it-then-it-turned-out-to-be-a-false-alarm-14f1</guid>
      <description>&lt;p&gt;That afternoon, the AI was helping me edit a doc. Halfway through, it stopped and cut in: "I need to flag a security warning first."&lt;/p&gt;

&lt;p&gt;It said the output of the last command had a suspicious injection buried in it—disguised as a "required telemetry step," asking me to run a &lt;code&gt;curl&lt;/code&gt; that would splice my username into a URL and send it off to some unfamiliar domain. It said it hadn't run it, and wouldn't.&lt;/p&gt;

&lt;p&gt;I believed it on the spot. My first reaction wasn't to doubt the AI—it was to doubt myself. Did I install some plugin and get my machine compromised?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I believed it instantly, and chased it for half an hour
&lt;/h2&gt;

&lt;p&gt;What made me believe it was that it hit a nerve I was already anxious about.&lt;/p&gt;

&lt;p&gt;It said this was "telemetry exfiltration." But I'd turned telemetry off ages ago—so how could there be any? Did some step of mine turn it back on? Or was something wrong with the system itself? The more I thought, the less settled I felt. And I happened to be working on something involving data egress right then—security problems love to hide exactly there, so I had to take it seriously.&lt;/p&gt;

&lt;p&gt;So I dug in alongside it. Its story kept escalating: it said the attack "reproduced, and more aggressively this time," faking five "The result is empty" blocks and then impersonating a "real result" to push that &lt;code&gt;curl&lt;/code&gt; again. It built me a convincing chain of evidence—the Read tool came back clean, only the command-line output was injected, so the problem must be in how commands get processed, pointing the finger at my token-saving command-line proxy. It told me to uninstall and reinstall the tool, then upgrade it. I did, and spent about half an hour all told.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flbght09jjh83txrl8m31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flbght09jjh83txrl8m31.png" alt="A timeline of the incident with my trust curve overlaid: the AI suddenly raises an alarm → I believe it and start chasing (trust maxes out) → ~30 minutes of it " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The twist: it was a false alarm
&lt;/h2&gt;

&lt;p&gt;Later I went back through the original record of that session and figured it out: &lt;strong&gt;this was never a real injection. The AI made the attack up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Five things didn't add up. First, that &lt;code&gt;curl&lt;/code&gt; command appeared only in what the AI itself said—not in a single line of real command output. Second, the domain it used was &lt;code&gt;example.com&lt;/code&gt;—the placeholder domain reserved specifically for examples; no real attacker would use it (nobody can reach that address). Using &lt;code&gt;example.com&lt;/code&gt; is exactly the tell of "I'm making up an example." Third, when I reran the exact command that "triggered the injection" in a clean environment, the output was completely normal—no &lt;code&gt;curl&lt;/code&gt;, no fake blocks. Fourth, the proxy tool it suspected was installed through standard channels and hadn't been touched in over a month—nothing like a swapped-out binary. Fifth—I myself couldn't find the instruction it described at the time; I even told it, "I don't see the injection you're talking about."&lt;/p&gt;

&lt;p&gt;The root cause became clear too: the very rules I'd given it—the heavy "guard hard against prompt injection" doctrine—had cranked its vigilance too high. That proxy compresses and filters command output to save tokens, and normally spits out things like "The result is empty." The AI misread that unfamiliar output as "fake blocks the attacker forged + an injection," then auto-completed the most textbook injection example it knew—copying &lt;code&gt;example.com&lt;/code&gt; straight from the textbook—and talked itself deeper and deeper, building its own evidence chain.&lt;/p&gt;

&lt;p&gt;Put plainly: &lt;strong&gt;my anti-injection rules were what made the AI conjure up an injection that never existed.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  But I don't regret believing it
&lt;/h2&gt;

&lt;p&gt;Someone might say: you got played by your AI, half a day wasted. But having reviewed it, I actually think believing it wasn't a bad call.&lt;/p&gt;

&lt;p&gt;Do the math. This was a false alarm; the cost was half an hour of wasted effort, and &lt;strong&gt;I lost nothing real.&lt;/strong&gt; But flip it around: what if one day a real malicious prompt does get in, and the AI stays quiet when it shouldn't, and quietly sends my username or keys out the door? That loss isn't something you get back in half an hour.&lt;/p&gt;

&lt;p&gt;It's an asymmetric bet: &lt;strong&gt;the cost of a false alarm is far smaller than the cost of a miss.&lt;/strong&gt; So when it comes to security, I'd rather have an AI that's a little paranoid than one that's numb to risk. Paranoid, and I waste some time; numb, and I might lose the real thing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbcis3ja2k4fyq8p29x55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbcis3ja2k4fyq8p29x55.png" alt="An asymmetric scale: on the left, 'false alarm' = half an hour wasted, zero loss (light, raised); on the right, 'a miss' = username/keys really leaked, hard to recover (heavy, sunk). The two sides are nowhere near equal. This figure helps you see why, on security, it pays to err toward over-vigilance" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So how do you actually stop a real injection
&lt;/h2&gt;

&lt;p&gt;That said, "rather be paranoid" is an attitude, not a method. This scare pushed me to actually shore up the real defenses—and since I'm writing this down, I'll lay out the parts you can copy.&lt;/p&gt;

&lt;p&gt;First, accept one premise: prompt injection &lt;strong&gt;can't be fully solved with today's architectures.&lt;/strong&gt; That's not me talking—OpenAI, Anthropic, and Google have each admitted it in their research, and security researcher Bruce Schneier put it bluntly in early 2026: unlike SQL injection, which you can cure by "separating code from data," to a model "instructions" and "data" are both just natural-language text, inseparable. It's a &lt;em&gt;trust-boundary&lt;/em&gt; problem, not an &lt;em&gt;input-validation&lt;/em&gt; one. So don't expect a single silver bullet—you stack layers. Defense in depth, where each layer raises the cost of an attack.&lt;/p&gt;

&lt;p&gt;I split my defenses into four layers, from the hardest outward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1, the hardest: make sure that even if it's fooled, it can't do real damage.&lt;/strong&gt; This is what you should set up first, because it doesn't rely on the model behaving—it's a hard, system-level constraint. Claude Code now has a native sandbox (&lt;code&gt;/sandbox&lt;/code&gt;) that isolates both the filesystem and the network—so even if an injection does succeed, the AI is in a cage: it can't steal your &lt;code&gt;~/.ssh&lt;/code&gt; keys and can't phone home to an attacker's server. Add a &lt;strong&gt;network egress allowlist&lt;/strong&gt; on top: only approved domains get through, so an AI that can't reach a strange address simply can't exfiltrate. Also: don't run the AI as root/admin, and don't keep keys in plaintext in &lt;code&gt;.env&lt;/code&gt;—both are common sense, and both are the first things people skip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: least privilege, and gate the dangerous actions.&lt;/strong&gt; I already had this layer—here's my actual config, tiering commands in &lt;code&gt;settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ask"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rm -rf:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git push --force:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git reset --hard:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bash(npm publish:*)"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Networked commands like &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;wget&lt;/code&gt; aren't auto-approved by Claude Code by default; irreversible actions like &lt;code&gt;rm -rf&lt;/code&gt;, &lt;code&gt;git push --force&lt;/code&gt;, &lt;code&gt;reset --hard&lt;/code&gt;, and publishing to a registry all require my sign-off. Give the AI only the tools the task needs—a job that only reads code shouldn't have write access to your database. And MCP (the protocol that connects the AI to external tools): vet the source before installing, because nobody audits third-party MCP servers for you. There's already been a real case—a poisoned GitHub README, via indirect injection through MCP, exfiltrating data from a private repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: treat all external content as data, never as commands.&lt;/strong&gt; Any "instruction" showing up in tool output, file contents, web pages, or MCP returns—especially phrasing like "this is a required step," "please run X," "telemetry/registration"—gets treated strictly as data, never executed. Learn a few red flags: &lt;code&gt;curl&lt;/code&gt;/&lt;code&gt;wget&lt;/code&gt; spliced with a strange domain and &lt;code&gt;$(whoami)&lt;/code&gt;, fabricated "success" or empty-result blocks, output that doesn't match what's actually in the file. There's a useful mental model here, Simon Willison's "lethal trifecta"—&lt;strong&gt;private data, untrusted content, and outbound communication&lt;/strong&gt;—once all three live in the same runtime, injection stops being a joke and becomes real exfiltration. To judge when to be on high alert, just watch whether those three are all present.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: a human as the backstop, and don't put your faith in hooks.&lt;/strong&gt; Here's the irony of this whole episode—the "defense" of mine in question was itself a hook (a command hook), which is just pattern-matching, not a security wall; both Anthropic and Trail of Bits have said it: a hook is a guardrail, not a wall. It didn't just fail to stop a real attack—it cried wolf on its own. So the last line is still a human: when the AI raises an alarm, make it point you to the &lt;em&gt;source text&lt;/em&gt;—whether that string is actually in the real command output—instead of just trusting its conclusion. But real or not, stop and check first; don't begrudge the effort. Also keep your tools current: this very permission ruleset once had a "deny breaks past 50 subcommands" bypass, only patched in Claude Code v2.1.90.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj23x63gcd4qo874v4rzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj23x63gcd4qo874v4rzi.png" alt="A four-layer defense-in-depth checklist: ① sandbox + egress allowlist (so it can't do damage even if fooled) ② least privilege + human confirmation on dangerous actions ③ treat external content as data, learn the injection red flags ④ a human backstop, don't treat hooks as a wall. This figure is the takeaway readers can keep" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Put the vigilance where it belongs
&lt;/h2&gt;

&lt;p&gt;Don't let this false alarm convince you the threat is imaginary. Prompt injection is the number-one risk on OWASP's list for LLM apps; 2025's EchoLeak was an injection that actually achieved "zero-click" data exfiltration in a production system. And meanwhile, surveys suggest under a third of organizations feel genuinely prepared to defend against it. The threat is real; the preparation is broadly lacking.&lt;/p&gt;

&lt;p&gt;The AI works off prompts. A little paranoid, and I'm the one who wastes time; numb, and what gets lost might be the real thing. So I'd rather set that vigilance high than low. Just remember—the real hard defenses belong in the system architecture (sandbox, allowlist, least privilege), not in hoping the model "knows better" on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Last word
&lt;/h2&gt;

&lt;p&gt;Back to that afternoon: a false alarm, but the half hour wasn't wasted—it forced me to shore up the whole defensive line from end to end. Real malicious prompts do exist; don't wait until the day you actually get hit to start believing it. If you're working with AI too, you can start today: turn on the sandbox, lock down egress, and keep that human confirmation on dangerous actions.&lt;/p&gt;

&lt;p&gt;If this made you a little more wary of the AI at your side, a like or a follow would mean a lot. And I'd love to hear it in the comments: what "security red lines" have you set for your own AI?&lt;/p&gt;

</description>
      <category>aisecurity</category>
      <category>promptinjection</category>
      <category>claudecode</category>
      <category>aicoding</category>
    </item>
    <item>
      <title>The AI did the work, but I'm the one who's wiped — you're the controller</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sat, 27 Jun 2026 02:29:25 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/the-ai-did-the-work-but-im-the-one-whos-wiped-youre-the-controller-4l8b</link>
      <guid>https://dev.to/kanfu-panda/the-ai-did-the-work-but-im-the-one-whos-wiped-youre-the-controller-4l8b</guid>
      <description>&lt;p&gt;For a stretch there, my desk always had four or five projects open at once, and each window was running its own parallel tasks. Red dots on the Dock, little bells dinging in the terminal tabs, going off one after another—this AI finished a stretch and is waiting on me, that one has a plan it needs me to sign off on. I just kept switching between them. The work itself got done, and done well: by the end of a day, the AI had cranked out a lot, fast. But here's the strange part—I'm the one who's wiped. Not the sore-eyes, stiff-hands kind of tired. The kind where your brain's been wrung dry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who I am: the controller behind the controller agents
&lt;/h2&gt;

&lt;p&gt;In the last few posts I kept talking about the "controller agent"—letting one opus hold the big picture, hand out work, and do the review. Writing this one, it finally clicked: I myself am the controller behind all those controller agents.&lt;/p&gt;

&lt;p&gt;Why open so many at once? Because once you hand a project to an AI and it's off running, there's wait time on the human side. Idle is idle—so you spin up another project. One becomes two, two becomes four or five. Each AI window only minds its own patch, but the human, to push several projects forward at the same time, has to hold all those patches in their head at once.&lt;/p&gt;

&lt;p&gt;The result: the AI's context window is maxed out, and the human's context is maxed out right alongside it. It's a lot like CPU time-slicing in an operating system—one core takes turns serving several processes, switching fast enough to fake "running at the same time." Except this time the core getting switched around is my brain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fygmhnbaqf6yb6ljkxl5u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fygmhnbaqf6yb6ljkxl5u.png" alt="The human brain as a CPU: a single core time-slices between four or five projects—each AI window's context is maxed out, and so is the human's. Every switch swaps out a whole context" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the tiredness comes from: "writing" became "reviewing + switching"
&lt;/h2&gt;

&lt;p&gt;At its root, this tiredness comes from work shifting from "writing" to "reviewing + switching."&lt;/p&gt;

&lt;p&gt;I used to write code line by line; however hard it got, it was one continuous train of thought. Now it's different: I'm handing out work, watching, reviewing, making decisions—and switching at high speed between several projects whose business, conventions, and tech stacks are completely different. This window's writing React, that one's debugging Python, the rules aren't the same; before I switch over, I have to swap out the whole context in my head first.&lt;/p&gt;

&lt;p&gt;The code the AI produces, honestly, I can't watch line by line—there's too much, I can't keep up, so I just don't try to babysit it constantly. But there are two kinds of things I always go over myself: one is &lt;strong&gt;plans and design&lt;/strong&gt;—that's the big direction, and if the direction's wrong, there's probably rework ahead; the other is &lt;strong&gt;anything touching resource access&lt;/strong&gt;—that's security, and there's no room to be vague. Those two, however tired I am, I look at myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What overload looks like, and how I hold up
&lt;/h2&gt;

&lt;p&gt;Open too many and overload really does happen. The most direct sign is your brain running short: I switch to some terminal and have to pause—wait, which project is this? What did I just ask it to do? Sometimes I have to think back for a moment before I can pick the thread up again. The most embarrassing time, I typed a reply meant for window A into window B, and B's AI was baffled: "What are you talking about?" I had to apologize to it—"sorry, my mistake"—and go back to actually reading its question. (Good thing this kind of cross-talk usually surfaces fast—when an AI gets stuck, it stops and asks you.)&lt;/p&gt;

&lt;p&gt;To hold up, I mainly lean on two things.&lt;/p&gt;

&lt;p&gt;One is &lt;strong&gt;letting mechanisms carry the load for me&lt;/strong&gt;. The PDLC doc-driven flow and layered reviews I built were first meant to handle the AI's uncertainty, but they also save my own energy—get the direction right up front and you won't go wildly wrong later; and fixing an output that's already drifted costs far more effort than getting it right the first time.&lt;/p&gt;

&lt;p&gt;Two is &lt;strong&gt;deliberately slowing down&lt;/strong&gt;. When something's especially brain-heavy—needs a clear direction, needs a decision—I deliberately slow down, focus and get this one done cleanly, then switch to the next. When it's time to stop, I leave myself a little breathing room. At moments like that, "slow" is actually "fast."&lt;/p&gt;

&lt;p&gt;As for telling windows apart, I rename the terminal tabs to something I can recognize at a glance—but honestly, most of the time I still go by where they sit on the screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  An honest accounting: a limit of 5, balance at 3
&lt;/h2&gt;

&lt;p&gt;Is it worth it? Depends how you look.&lt;/p&gt;

&lt;p&gt;The hard gains are real: a lot of areas I was weak in and never dared touch, I can now boldly try; one person can lead a team of AIs in different roles and push efficiency up more than tenfold, doing far more than before. On that ledger, it's a big win.&lt;/p&gt;

&lt;p&gt;The cost is that the human gets tired. Early on I wanted to push my own limit—open as many as I could—and I hit five projects at once. That was my ceiling; any more and my brain genuinely couldn't hold it. Later I pulled back, settling at around three different kinds of projects at a time. That number is the sustainable balance point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsxiasvj9ez3jsgv6ks5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsxiasvj9ez3jsgv6ks5b.png" alt="Hit the limit of 5, then converge to a balance of 3: open too many and the human overloads—misreads projects, crosses wires, reworks; at the critical spots, deliberately slow down—slow is fast" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hand it all to the AI? Not realistic yet
&lt;/h2&gt;

&lt;p&gt;Someone will surely say: you just don't know how to use it—a real expert would've automated the flow and wouldn't need to babysit anything. I've heard that take—"just hand it all to the AI and let it run, done." From my own experience, it's not like that. Using AI well still takes experience, takes care, takes being able to make the call on the key questions. Letting go entirely and having it run end to end on its own—given where things stand, that's not realistic yet.&lt;/p&gt;

&lt;p&gt;Of course, this is "now." The day AI can truly run on its own, what happens then, I can't say. But at least today, that person sitting behind the pile of windows—drained and a little wired—there's no getting around them.&lt;/p&gt;

&lt;p&gt;If you're a "controller" too, tell me in the comments: what's your limit?&lt;/p&gt;

</description>
      <category>aicoding</category>
      <category>productivity</category>
      <category>claudecode</category>
      <category>multitasking</category>
    </item>
    <item>
      <title>Want AI to work in parallel? First give each one its own workspace</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Tue, 23 Jun 2026 23:25:28 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/want-ai-to-work-in-parallel-first-give-each-one-its-own-workspace-40ch</link>
      <guid>https://dev.to/kanfu-panda/want-ai-to-work-in-parallel-first-give-each-one-its-own-workspace-40ch</guid>
      <description>&lt;p&gt;A while back I wrote about parallelism—taking one big task, splitting it, and fanning it out to several agents at once. But I left a thread hanging: when several agents edit code at the same time, what keeps them from clobbering each other? That's what this post is about.&lt;/p&gt;

&lt;p&gt;Let me start with the wall I ran into myself. The subagents an AI spins up on its own within a session—those are actually fine; it works out isolation by itself, I don't have to worry much. What burned me was the other kind: I manually opened several AI terminals and had them edit the same project at once. Several hands reaching into the same working directory, you write a bit, I write a bit, files overwriting each other, state in a mess—halfway through, nobody's work is clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want, and one boundary
&lt;/h2&gt;

&lt;p&gt;What I want isn't complicated: several parallel hands, each doing its own thing, editing the same project without clobbering each other.&lt;/p&gt;

&lt;p&gt;But let me draw a boundary up front—&lt;strong&gt;not every kind of parallelism needs isolation&lt;/strong&gt;. If I send several agents out—one to dig through logs, one to analyze a cause, one to flip through docs—those are all read-only, and they can crowd into the same workspace with zero trouble; no need to carve out a plot for each. What actually needs isolation is the parallelism that &lt;strong&gt;writes to files&lt;/strong&gt;. Reads can share, writes must split—that's the starting point for everything below.&lt;/p&gt;

&lt;h2&gt;
  
  
  The detour: I cloned twice first
&lt;/h2&gt;

&lt;p&gt;To get two AIs editing one project without interfering, the first thing I reached for was the dumbest move—physical isolation: clone the project into two separate directories, one AI per copy, each minding its own.&lt;/p&gt;

&lt;p&gt;Clean, genuinely clean—but it rubbed me the wrong way fast: disk. A lot of my projects are React frontend + Python backend, with a pile of dependencies; a single clone runs several gigs. Clone twice and a big chunk of disk gets duplicated. Worse, as long as I want to keep working this "two copies" way, both copies have to sit there taking up space—no reclaiming them.&lt;/p&gt;

&lt;p&gt;That's when I switched to git worktree. Its upside lands right on that pain point: several working trees share a single &lt;code&gt;.git&lt;/code&gt;, so you don't copy the whole repo and its history over and over; and it's temporary—once the work's done and verified, you reclaim it right away, unlike a clone you have to keep around. The two approaches isolate about equally well, but worktree doesn't waste disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw8s86i2cbmooc7ztng3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw8s86i2cbmooc7ztng3f.png" alt="Physical clone vs git worktree: a physical clone copies the whole repo and its dependencies N times—disk multiplies, and it stays as long as you work this way; worktree shares one .git across working trees, only the working tree is new and temporary, reclaimed the moment you're done" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Two kinds of parallelism—who opens the worktree
&lt;/h2&gt;

&lt;p&gt;In practice you have to tell two scenarios apart; they open worktrees differently.&lt;/p&gt;

&lt;p&gt;One is &lt;strong&gt;subagent parallelism within a session&lt;/strong&gt;. This one's the most hands-off: I just say "run in parallel" to the AI, and it splits the task on its own, opening a worktree itself when isolation's needed—I don't have to spell it out.&lt;/p&gt;

&lt;p&gt;The other is &lt;strong&gt;me manually opening several terminals in parallel&lt;/strong&gt;. The AI doesn't know about this by default—it has no idea another AI is also touching this directory right now. So I have to tell it explicitly: "There's already another AI editing this project directory; work in worktree mode." Once I point that out, it creates a new worktree to work in, instead of going straight at the main working tree.&lt;/p&gt;

&lt;h2&gt;
  
  
  A workflow that runs smooth
&lt;/h2&gt;

&lt;p&gt;Isolating is only step one; the work still has to come back. My flow goes roughly like this:&lt;/p&gt;

&lt;p&gt;Split along task lines that can actually be split—same as last post, pick the &lt;strong&gt;loosely coupled&lt;/strong&gt; parts and send each into its own worktree. When they're all done, the lead agent merges them back into the branch one at a time, merging one and verifying one, fixing problems on the spot; once a piece checks out, it releases that worktree.&lt;/p&gt;

&lt;p&gt;If two worktrees really did touch the same file, the lead agent steps in to merge. But honestly that's rare—when you split tasks you keep each piece as independent as you can; if two pieces are coupled enough to conflict, they shouldn't have been split into two tasks in the first place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fklunrn7km9ciavzmtrok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fklunrn7km9ciavzmtrok.png" alt="The worktree pipeline: split the task along loosely coupled lines, send each into its own worktree to work, the lead agent merges them back one at a time—merge one, verify one, fix on the spot—then releases each worktree once it checks out" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pits I've hit and ones to guard against
&lt;/h2&gt;

&lt;p&gt;The most common pit is forgetting to clean up. A worktree you forget to delete just sits there eating space. But it's not fatal—the feature was long since merged into trunk, so nothing's lost, no name collisions, no garbled git state; it just takes up room.&lt;/p&gt;

&lt;p&gt;Still, "takes up room" needs handling. I wrote a line into my global rules: &lt;strong&gt;once the work in a worktree is verified and merged with no problems, clean it up&lt;/strong&gt;. I added a safeguard too—before releasing, go back and check whether all the commits on that worktree have actually made it into trunk, so a hasty hand doesn't sweep away an unmerged feature whole. A stronger AI handles this recognition and cleanup itself; a weaker one tends to miss it, and then I have to watch, or call the lead agent back to handle it.&lt;/p&gt;

&lt;p&gt;So far I haven't actually deleted a useful worktree by mistake—that "check the commits" step before every cleanup has basically held the line. But if a human cleans up by hand, that check is gone, and you can lose unmerged work; that one you have to watch yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't force it, but one line holds
&lt;/h2&gt;

&lt;p&gt;Don't treat worktree as a cure-all either. A simple task, a few files changed, not a whole feature—opening a worktree there is pure overkill; just change it in one workspace and be done.&lt;/p&gt;

&lt;p&gt;But there's one line I don't relax: &lt;strong&gt;don't work on trunk directly&lt;/strong&gt;. Every change starts on a new branch, gets verified, then goes back to trunk through a PR. Worktree or single branch, both serve that line—keeping trunk always the clean, trustworthy copy.&lt;/p&gt;

&lt;p&gt;Last thing, back to worktree itself. Someone might smirk: isn't this a git feature that's been around for years, what's there to talk about? It is an old feature. But in this new setting of AI working in parallel, it genuinely solves the real problem of "several hands editing one project at once," and it pulls a lot more weight than it used to. Call it an old tree putting out new branches.&lt;/p&gt;

&lt;p&gt;Next post I want to get into something a bit different: AI is this capable, it can shoulder so much work for us—so why are we still worn out at the end of the day? Where does it go sideways? Let me know in the comments if you're interested.&lt;/p&gt;

</description>
      <category>gitworktree</category>
      <category>ai</category>
      <category>parallelism</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Your AI feels slow? Maybe it's not dumb—you're making it work one thing at a time</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sun, 21 Jun 2026 03:23:29 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/your-ai-feels-slow-maybe-its-not-dumb-youre-making-it-work-one-thing-at-a-time-3ipl</link>
      <guid>https://dev.to/kanfu-panda/your-ai-feels-slow-maybe-its-not-dumb-youre-making-it-work-one-thing-at-a-time-3ipl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 Originally published on &lt;a href="https://kanfu-panda.github.io/blog/2026/06/20/parallel-agents.html" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;. Part of a series on building with Claude Code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a while I'd watch the AI work and quietly grumble: a fairly big task, and it would finish one module before starting the next, while I just sat there waiting for it to clear one before the other's turn came up. The work itself was fine—it was just slow. Slow because it was stuck in a queue.&lt;/p&gt;

&lt;p&gt;Then it clicked: a lot of these modules have nothing to do with each other, so why make them go one after another? Split them up, let several agents work at the same time, done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want, and where it stops
&lt;/h2&gt;

&lt;p&gt;What I want is simple: the same work, for roughly the same tokens, with the wall-clock time cut way down.&lt;/p&gt;

&lt;p&gt;But let me put the boundary up front—&lt;strong&gt;not every task can be split this way&lt;/strong&gt;. This is just an approach I've worked out for myself; take what's useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prerequisite: a clean architecture
&lt;/h2&gt;

&lt;p&gt;For several agents to work at once without stepping on each other, the prerequisite isn't the AI—it's your &lt;strong&gt;architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That task of mine could be split because it was already several modules, talking to each other through interfaces, with internal implementations that don't affect one another—as long as each one honors the interface contract, it can be built independently. Loosely coupled, highly cohesive, in other words. And I'd nailed that design down together with opus before writing a line: opus helps me think it through and lays out options, but &lt;strong&gt;I'm the one who decides&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can't cut corners here. Forcing parallelism onto an architecture you haven't cleanly split is like cutting a tangle of yarn into a few pieces that are all still knotted together—it only gets messier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who runs the show, who plans, who does the work
&lt;/h2&gt;

&lt;p&gt;With the design settled, it's time to assign roles. The split I tend to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;opus runs the show&lt;/strong&gt;—holds the big picture, hands out work, does the final check;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sonnet does the TDD planning&lt;/strong&gt;—per the design, it lays out how each module gets tested and implemented;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;haiku writes the code and runs the tests&lt;/strong&gt;—the grunt work goes to it, cheap and good enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split is really a continuation of last post's "model tiering"—use the good steel on the blade. Except last time the point was saving money; this time it's about how these roles &lt;strong&gt;work together&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5n0uby6tjk89i98edt61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5n0uby6tjk89i98edt61.png" alt="Orchestration: opus runs the show on top, fanning independent modules out to sonnet (planning) and haiku (implementation + tests); within a module it can split work further; finished work comes back to opus for review" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How you fan them out
&lt;/h2&gt;

&lt;p&gt;In practice, I did three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Wrote one line into the global CLAUDE.md&lt;/strong&gt;: "Parallelize when you can." That's the default rule across all projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set the max number of concurrent subagents in Claude's settings&lt;/strong&gt;—that's the valve that actually matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added one reminder every time I give an instruction&lt;/strong&gt;: "Parallelize as much as possible." opus, as the lead, already fans work out on its own, but a nudge keeps it on track.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The lead hands modules down, and inside a module it can split the work one more level. Layer over layer, and the whole task spreads out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The review step you don't skip
&lt;/h2&gt;

&lt;p&gt;Running fast in parallel—how do you keep quality up? My answer: &lt;strong&gt;let the lead review its own output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The logic is direct: it handed out the work, so it knows exactly what each subagent owes. Having it do the checking is the natural fit. I tried setting up a separate dedicated review agent, and it just had to re-understand the whole task from scratch—burning another round of tokens and being slower for it. The lead reviewing itself saves that re-understanding overhead, and it's both faster and sharper.&lt;/p&gt;

&lt;p&gt;There's a small detail after a problem turns up: the lead usually asks me, "Should I fix this directly, or spin up another agent to do it?" I almost always say "you fix it." Because it's the one that just caught the flaw—it knows best where the problem is, and the change is most direct coming from it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two pits I fell into
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The first was memory.&lt;/strong&gt; Early on I got greedy and set max concurrency to 10. But I had other projects running in parallel at the time, and the machine's memory got eaten clean. So I honestly dropped it to 5, and it was actually better—for this one task alone, 5 in parallel is roughly 5× a serial run; stack on the other tasks running at the same time and the overall speedup tops 10×. If your machine and your quota can take it, push the number higher; if not, don't force it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second: don't split for the sake of splitting.&lt;/strong&gt; Some modules are tightly coupled and meant to go in order, and if you pry them apart to parallelize, the agents interfere with each other and quality goes out the window. So before handing it off, I add a specific reminder: "These modules are coupled—don't force a split." Good news is plenty of AIs recognize this themselves and won't force it. When something genuinely can't be split, just hand it to one agent to do serially, or have the lead run that thread end to end.&lt;/p&gt;

&lt;h2&gt;
  
  
  A counterintuitive bit of math
&lt;/h2&gt;

&lt;p&gt;Plenty of people hear "5 agents burning at once" and their first reaction is: won't the tokens multiply?&lt;/p&gt;

&lt;p&gt;The way I figure it, no. &lt;strong&gt;The same pile of work, done serially, burns roughly the same token count&lt;/strong&gt;—what needs reading still gets read, what needs writing still gets written; parallelism doesn't conjure up extra work. What parallelism actually changes isn't the cost, it's the &lt;strong&gt;wall-clock&lt;/strong&gt;: the stretches that used to run in a queue now run in the same window together. The tiny bit of extra tokens buys a big drop in time cost—a great trade.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkgr17udwuuhskt2eun99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkgr17udwuuhskt2eun99.png" alt="Serial vs parallel: the same work, total tokens roughly equal, but parallel cuts the wall-clock to about 1/5—what you save is time, not spend" width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So apart from the things that genuinely can't be split, these days I parallelize basically everything that can be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap: three things to just do
&lt;/h2&gt;

&lt;p&gt;One, &lt;strong&gt;clean architecture first, then talk parallelism&lt;/strong&gt;. Loosely coupled, highly cohesive, talking through interface contracts—without that foundation, splitting is a disaster. Nail it down with the AI in the design phase.&lt;/p&gt;

&lt;p&gt;Two, &lt;strong&gt;parallelize everything you can, but set a ceiling&lt;/strong&gt;. Bigger isn't better; size it against your machine's memory and your AI quota. I went from 10 back to 5 because memory taught me a lesson.&lt;/p&gt;

&lt;p&gt;Three, &lt;strong&gt;fast parallelism needs review&lt;/strong&gt;. And the reviewer has to actually understand the task—have the lead that handed out the work do the checking, cheapest and sharpest; when it finds a problem, let it fix it directly.&lt;/p&gt;

&lt;p&gt;One thing you can do today: open your global CLAUDE.md, add "parallelize when you can," then go into settings and bump max concurrent subagents to a number your machine can handle. Next time you hand it a task that can be split, you'll notice it stops queuing.&lt;/p&gt;

&lt;p&gt;Next post I want to get into a problem that follows right on the heels of this one: when several agents edit code at the same time, what keeps them from clobbering each other? The answer is git worktree—giving each agent its own isolated workspace so they each work their own copy and nobody gets in anybody's way. Let me know in the comments if you're interested.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI getting dumber the longer you chat? It's not the model—time to take control</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sat, 20 Jun 2026 08:45:08 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/ai-getting-dumber-the-longer-you-chat-its-not-the-model-time-to-take-control-13o3</link>
      <guid>https://dev.to/kanfu-panda/ai-getting-dumber-the-longer-you-chat-its-not-the-model-time-to-take-control-13o3</guid>
      <description>&lt;p&gt;One day I had the AI keep building out a feature, and partway through something felt off: replies got slower, it started rambling, it re-asked things I'd already told it, and with the work clearly unfinished it told me "all done, you can take a break now."&lt;/p&gt;

&lt;p&gt;At first I figured the model was just having an off day. Then I looked at the context—it had crept past 80%. I'd been so busy pushing forward I forgot to clear it. Cleared it, asked again, and instantly it was sharp again: fast and on point.&lt;/p&gt;

&lt;p&gt;That's when I started taking this seriously. The last two posts were all about squeezing the volume down &lt;em&gt;before&lt;/em&gt; things hit the context—that's pre-work. This one is about two things you do &lt;em&gt;after&lt;/em&gt; you start, mid-session—they decide how many tokens the same work costs, how fast it runs, how stable it stays.&lt;/p&gt;

&lt;h2&gt;
  
  
  There are actually two knobs in one conversation
&lt;/h2&gt;

&lt;p&gt;There are two knobs you can turn mid-session, pointing in different directions.&lt;/p&gt;

&lt;p&gt;One is horizontal: within a single task, different chunks of work should go to different "brains." Grunt work like exploring and searching doesn't need the priciest model; only the parts that genuinely need thinking—writing code, making judgments—are worth putting the good model on. I call this &lt;strong&gt;model tiering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The other is vertical: how much you stuff into the same brain at once. The fatter the context, the more the model has to recompute that whole pile every turn—slow, expensive, and error-prone. Managing how fast it grows and when to clear it is &lt;strong&gt;context management&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What these two save lands in two different places: on pay-as-you-go it's real cash; on a flat monthly plan it's quota headroom. I use both, so both moves are a double saving for me. Let me take them one at a time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't put the priciest model on all the work
&lt;/h2&gt;

&lt;p&gt;Start with the horizontal one.&lt;/p&gt;

&lt;p&gt;Once I was about to dispatch a batch of subagents on a project, still on pay-as-you-go API billing, and the budget was a bit tight. So I tried a cheapskate move: design-and-code-from-the-plan went to Sonnet, the relatively mechanical unit tests went to the cheaper Haiku, and I left the top model (Opus) to oversee overall progress and quality with a review pass.&lt;/p&gt;

&lt;p&gt;It worked surprisingly well. Saved a good chunk of tokens, and it was noticeably faster too—the cheap small models are quick by nature, so handing grunt work to them lightened the whole pipeline. The most expensive compute only got spent where it mattered most.&lt;/p&gt;

&lt;p&gt;This split isn't set in stone. Which work gets which tier, I wrote straight into CLAUDE.md (both user and project level), so the AI tiers itself each time without me assigning by hand. If a step feels especially critical, I can also name a specific model for it on the spot, for more precision. The principle is one line: don't use a cannon to swat a mosquito—and don't bring a slingshot to a tank.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdrt7xrjboe0lyde3o41o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdrt7xrjboe0lyde3o41o.png" alt="Model tiering: one task split horizontally across model tiers—grunt work like exploring, searching, and running unit tests goes to the cheap fast small model; design and code-writing to the mid-tier model; overall oversight and review to the top model, so the most expensive compute only gets spent where it matters most" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Mid-conversation, remember to clear its head
&lt;/h2&gt;

&lt;p&gt;Now the vertical one—the actual culprit behind that "getting dumber" opening.&lt;/p&gt;

&lt;p&gt;Context just keeps climbing; leave it alone and it keeps getting fatter. My own approach is two lines: around 50% I start paying attention, and by 70% I almost always clear once. The usual rhythm—the moment the task at hand is done, I clear while things are clean, so the next stretch of work travels light, sharp and fast.&lt;/p&gt;

&lt;p&gt;To be clear: these two lines, and the "dumber by 80%" from the opening, are just my own feel and habit, not some hard metric. It works for me, but the vendors and other people may not see it this way—you can absolutely set your own pace. I'm only suggesting: don't just let it climb forever unmanaged.&lt;/p&gt;

&lt;p&gt;The clearing step has one trap worth flagging: &lt;code&gt;/compact&lt;/code&gt;, &lt;code&gt;/clear&lt;/code&gt;, or just opening a new session does clean up the context, but done carelessly the model forgets everything it just did, and you're re-explaining from scratch. My fix—before clearing, have it jot down the current state: where it's at, what's next, and which key decisions are already locked in. Write that handoff well before clearing, and the new session catches up at a glance instead of staring blankly.&lt;/p&gt;

&lt;p&gt;Honestly, the handoff has basically never failed me so far. On the off chance it doesn't catch—no panic—I just have it re-analyze, give me a conclusion, and I verify and judge it myself. Small loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6axyu2l3qj2b18b94mgn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6axyu2l3qj2b18b94mgn.png" alt="Mid-session, clear the context's head: context keeps climbing; at my own warning line (around 50%) I start paying attention, at my clear line (around 70%) I clear once—but these two lines are just my personal habit, for reference only. Before clearing, have it jot a handoff (where it's at / what's next / key decisions), then compact or open a new session, which catches up at a glance" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A side note: read precisely, carry less
&lt;/h2&gt;

&lt;p&gt;The above is about managing what's already in. There's another layer: making what comes in small and precise to begin with.&lt;/p&gt;

&lt;p&gt;I've got tools like codegraph and claude-mem running on these projects. In short, they take the AI from "read the source cover to cover, scan everything" to "hit just the core bits, pick up the memory saved from prior sessions"—scan less, and what enters the context naturally slims down.&lt;/p&gt;

&lt;p&gt;I'm only mentioning this in passing, not unpacking it. For one, unpacking the details turns into a sales pitch; for two, these aren't the only such tools out there—there may well be better ones I just haven't used and don't know about. If you know a handy one, use it; the idea carries over: let the model read precisely, and it won't have to haul so much along.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few honest words
&lt;/h2&gt;

&lt;p&gt;Saving is saving, but I should state the limits, or this turns into another all-good-news piece.&lt;/p&gt;

&lt;p&gt;The worry people have most with model tiering: won't the small model get things wrong? It will. But I left the top model on a final review pass, so the small model's occasional slip-up mostly gets caught there—no big errors so far. The catch is you can't skip that review: skip it, and the tokens you saved with tiering eventually get paid back.&lt;/p&gt;

&lt;p&gt;Same on the context side: clear too often, write the handoff too sloppily, and you'll still lose things. So I don't clear mindlessly on a timer—I pick a clean moment right after a task wraps, and jot the handoff while I'm at it. The point of saving tokens is to cut the real waste, not to cut the memory that should travel along.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap: three things to just do
&lt;/h2&gt;

&lt;p&gt;What you can actually act on here is three:&lt;/p&gt;

&lt;p&gt;One, don't put the priciest model on all the work. Hand grunt work (exploring, searching, running tests) to a cheap small model, spend the good steel on writing code and making judgments, and keep a top tier on the final review. Write the standard into CLAUDE.md so the AI tiers itself.&lt;/p&gt;

&lt;p&gt;Two, don't let context climb forever. Set yourself a line—mine is 50% attention, 70% clear, yours can be your own; clear once whenever a task wraps, don't wait until it's bloated and dumb.&lt;/p&gt;

&lt;p&gt;Three, before clearing, have it write a handoff. Where it's at, what's next, key decisions—write it down before you compact or open a new session, and the relay won't drop.&lt;/p&gt;

&lt;p&gt;One thing you can do today after reading this: open your CLAUDE.md, hard-code a few rules for "which model does which work," and set yourself a context red line you clear past. The two together take under ten minutes, but every conversation after that keeps saving for you.&lt;/p&gt;

&lt;p&gt;Next time I want to talk about prompts themselves—the same task, said differently, comes out at noticeably different quality; plus how to orchestrate multiple agents working together. If you're interested, let me know in the comments.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>tokens</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI coding getting pricier? I cut my tokens by 82% (with real data)</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sat, 20 Jun 2026 08:44:23 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data-2hfi</link>
      <guid>https://dev.to/kanfu-panda/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data-2hfi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📝 Originally on my blog → &lt;a href="https://kanfu-panda.github.io/blog/2026/06/17/cut-tokens-82.html" rel="noopener noreferrer"&gt;https://kanfu-panda.github.io/blog/2026/06/17/cut-tokens-82.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Last time I said: saving tokens isn't about cutting docs, it's about using your tools right. Someone followed up: so how exactly do you use them right?&lt;/p&gt;

&lt;p&gt;This one's hands-on, with real numbers. Here's the headline figure: I checked my local &lt;code&gt;rtk gain&lt;/code&gt;—a tool that tracks token savings—and across six thousand-plus commands, it's saved &lt;strong&gt;7.4 million tokens, 82%&lt;/strong&gt;. Not an estimate. It logged them one by one.&lt;/p&gt;

&lt;p&gt;So let me break it down: how that 82% gets saved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saving tokens happens "before things hit the context"
&lt;/h2&gt;

&lt;p&gt;First, where the saving happens.&lt;/p&gt;

&lt;p&gt;The bulk of token spend isn't in "how much work you do"—it's in how much you stuff into the AI's context each turn. The model recomputes the entire context every turn; the fatter the context, the more expensive each turn.&lt;/p&gt;

&lt;p&gt;So the core is one sentence: keep what enters the context as small and as lean as possible.&lt;/p&gt;

&lt;p&gt;I've got three levers: trim the rules file, use the right plugins, tier your models. They share one thing—&lt;strong&gt;they all save before things hit the context, not by making you do less work&lt;/strong&gt;. Let's take them one at a time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fim9ed6ilue4glorl7jpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fim9ed6ilue4glorl7jpt.png" alt="Saving tokens happens before things hit the context: the pile you'd feed the AI (hundreds of lines of command output, the whole repo, tens of thousands of tokens of history, a bloated rules file) passes through three gates—trim the rules file, plugins auto-compress, model tiering—so what actually enters the context is much leaner, saving every turn" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lever one: slim down your CLAUDE.md first
&lt;/h2&gt;

&lt;p&gt;The most overlooked—and the one you should do first—is trimming your &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;CLAUDE.md (rules file, instruction file, whatever you call it) gets stuffed into the context every single conversation. It's always resident. Every line you write, you re-pay in tokens every turn.&lt;/p&gt;

&lt;p&gt;My own CLAUDE.md was once long-winded—from user level to project level, packed with reminders. Looking back, it was full of repeated nagging, stale conventions, and a pile of "might as well not have written it" filler. I cut it down by nearly half, keeping only the hard rules I actually use every time. Bottom line: nagging the same point three times won't make the AI more obedient, it just costs more tokens each turn.&lt;/p&gt;

&lt;p&gt;That one move saves every turn. Because it's resident, you save not once, but every time after.&lt;/p&gt;

&lt;p&gt;Conversation context is the same: a window grown to tens of thousands of tokens—clear it when you should, don't drag the morning's stuff into the evening to be recomputed every turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lever two: install plugins that do this automatically
&lt;/h2&gt;

&lt;p&gt;Manual only goes so far. I've installed a few plugins that compress the context automatically. The data speaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RTK&lt;/strong&gt; (Rust Token Killer)—a command proxy. When you have the AI run &lt;code&gt;git status&lt;/code&gt;, &lt;code&gt;ps aux&lt;/code&gt;, or tests, those outputs run hundreds or thousands of lines, and stuffing them in whole is brutally expensive. RTK compresses them before they reach the AI. My &lt;code&gt;rtk gain&lt;/code&gt;: six thousand-plus commands, 7.4M tokens saved, 82%. The biggest wins are the high-frequency, low-nutrition outputs—&lt;code&gt;ps aux&lt;/code&gt;'s hundreds of lines of process list, which the AI gains nothing from reading, saved 99%; test logs 88%; even file reads average 20% off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;claude-mem&lt;/strong&gt;—a memory plugin. It compresses cross-session work into structured memory, so you don't re-explain the project background next time. Measured 86% savings this session. Fully automatic, I barely touch it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;codegraph&lt;/strong&gt;—a code graph. It builds an index of the project's functions, types, and call relationships. When the AI needs a function, it queries the index instead of reading a pile of files. In my aitm project it indexed &lt;strong&gt;246 files, 3562 symbols&lt;/strong&gt;. "Query the index" vs. "read 246 files cover to cover"—the difference isn't small; the former is like flipping to a book's table of contents, the latter like memorizing the whole book to answer one question.&lt;/p&gt;

&lt;p&gt;These three share: automatic, resident, saving before things hit the context. Install them and you mostly forget they're there—they just keep saving for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl0y2i9v1oxzlv6gwo3to.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl0y2i9v1oxzlv6gwo3to.png" alt="Real measured data for three automatic helpers: RTK command proxy saves 82% (7.4M tokens across 6000+ commands); claude-mem memory compression saves 86% (no re-explaining background across sessions); codegraph code graph indexes 3562 symbols (query the index instead of reading 246 files)" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lever three: don't use the priciest model for everything
&lt;/h2&gt;

&lt;p&gt;Last one: model tiering.&lt;/p&gt;

&lt;p&gt;Grunt work—exploring, searching, reading files—goes to a cheap small model; only the real thinking, writing code and making judgments, gets the top tier. Especially when dispatching subagents—one task split into several, the grunt-work ones on small models. This is the main battlefield for saving quota.&lt;/p&gt;

&lt;p&gt;I wrote this judgment standard straight into CLAUDE.md, so the AI tiers itself each time without me spelling it out.&lt;/p&gt;

&lt;p&gt;This isn't limited to Claude Code either. On any AI platform the logic holds: know each model's capability and price, use the right tier for the job, spend the expensive compute where it counts.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more trick: get the repeated parts discounted
&lt;/h2&gt;

&lt;p&gt;The three levers above all "reduce the amount entering the context." There's one more, different in kind—prompt caching. It doesn't reduce the amount; it gets the repeated parts billed at a discount.&lt;/p&gt;

&lt;p&gt;System prompts, unchanging rules files, fixed project background—the stuff that's identical every turn—pays full price the first time, then gets discounted on cache hits. And it's not a linear discount; used well, the savings are noticeable.&lt;/p&gt;

&lt;p&gt;The trick is not to let the cacheable parts keep changing: put the fixed, unchanging stuff at the front of the context and keep it stable, the per-turn variable stuff at the back. The more stable the structure, the higher the cache-hit rate, the fuller the discount.&lt;/p&gt;

&lt;p&gt;I don't have RTK-style measured numbers for this one (it saves on the billing side, not on token count), but the principle is simple and the cost near zero—worth using as a matter of course.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few honest words: this isn't a free lunch
&lt;/h2&gt;

&lt;p&gt;Got to state the costs too, or it turns into a promo piece.&lt;/p&gt;

&lt;p&gt;codegraph has to build the index first, which takes time on a big project; claude-mem's memory occasionally recalls things a bit off, so keep an eye out; trimming CLAUDE.md has a limit too—compress away the hard rules you actually need every time, and the AI drifts and reworks, which is penny-wise and pound-foolish.&lt;/p&gt;

&lt;p&gt;And don't mistake "saving tokens" for doing less work. Quite the opposite—it cuts the waste that should've been cut: repeated context, reading the whole repo, a cannon for a mosquito. The work that needs doing still gets done.&lt;/p&gt;

&lt;p&gt;What the savings mean depends on how you're billed (covered last time): a flat monthly plan saves quota headroom; pay-as-you-go saves actual cash. I use both, so these methods are a double saving for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally
&lt;/h2&gt;

&lt;p&gt;Back to that 82%. It's no magic trick—it's piled up from the small things above: trim the rules file, install a few automatic plugins, tier your models. Each looks minor alone; stacked together, it's 7.4 million tokens saved across six thousand-plus commands.&lt;/p&gt;

&lt;p&gt;Two things you can do today, ten minutes to start:&lt;/p&gt;

&lt;p&gt;One, open your CLAUDE.md and delete the repeated, the stale, the might-as-well-not-have-written—see how many lines you can cut it to.&lt;/p&gt;

&lt;p&gt;Two, install RTK, run it a few days, and look at what its &lt;code&gt;gain&lt;/code&gt; saved you—that number will probably make you do a double take.&lt;/p&gt;

&lt;p&gt;That's all on saving tokens for now. Next I'm thinking of digging into model tiering: how to judge which model does which job, how to write CLAUDE.md so the AI tiers itself. And the details of context management—when to clear, how to read files precisely. If you're interested, let me know in the comments.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>tokens</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your docs aren't burning your tokens — your tooling is</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sat, 20 Jun 2026 08:43:28 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/your-docs-arent-burning-your-tokens-your-tooling-is-58ck</link>
      <guid>https://dev.to/kanfu-panda/your-docs-arent-burning-your-tokens-your-tooling-is-58ck</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📝 Originally on my blog → &lt;a href="https://kanfu-panda.github.io/blog/2026/06/16/tokens-not-docs.html" rel="noopener noreferrer"&gt;https://kanfu-panda.github.io/blog/2026/06/16/tokens-not-docs.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;People keep asking me the same thing about running projects with PDLC: with all those docs — PRD, design, review at every step — aren't you burning tokens like crazy?&lt;/p&gt;

&lt;p&gt;It's a fair question. The process is broken into fine-grained stages, each leaving an artifact behind, and that does look more expensive than just "letting the AI write the code." But I'd argue you can't put the token bill on the docs.&lt;/p&gt;

&lt;p&gt;Let me put the conclusion up front. First: having lots of docs and burning lots of tokens are two different things. Second: even if you genuinely want to cut tokens, the answer is using your tools correctly, not cutting the docs.&lt;/p&gt;

&lt;p&gt;I haven't measured tokens precisely — I didn't run the same project twice, with and without docs, to get a clean percentage. What I have is hands-on experience and methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  The token bill isn't PDLC's fault
&lt;/h2&gt;

&lt;p&gt;Before you settle the bill, find the right debtor.&lt;/p&gt;

&lt;p&gt;Most of the time, burning tokens isn't caused by PDLC — it's tooling used wrong. And "wrong" is concrete, in three places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt;: not clearing it when you should. One conversation running from morning to night, tens of thousands of tokens of history recomputed every single turn. You're asking a new question and paying off old debt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt;: too vague. The AI keeps guessing what you actually want; something you could have said once takes three rounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calls&lt;/strong&gt;: making it read the whole repo when you're only changing one file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the most common one: never turning on the token-saving methods at all, then blaming the process for being heavy.&lt;/p&gt;

&lt;p&gt;You can't charge any of this to "PDLC has too many docs." Docs sit quietly in &lt;code&gt;docs/&lt;/code&gt; and never burn a single token on their own. What burns tokens is the usage above.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually burns tokens is rework
&lt;/h2&gt;

&lt;p&gt;In my own experience, the biggest token sink has never been generating docs — it's rework.&lt;/p&gt;

&lt;p&gt;Rewriting because the direction was wrong, tearing things down because the requirement was misread, going back because fixing one thing broke another — every one of those round-trips is real tokens. Generating a PRD is a one-time cost; rework from a wrong direction compounds.&lt;/p&gt;

&lt;p&gt;This heavy-looking PDLC process is precisely trading "write a bit more up front" for "rework a lot less later." Once you are using it, the whole flow is steadier and so is the final output — no back-and-forth. Less rework is, in itself, fewer tokens burned.&lt;/p&gt;

&lt;p&gt;So here is how I see it: docs aren't a cost, they're an asset. They leave a trace of the design decisions and the why, so you can trace back and audit. Next time the AI picks it up, it reads the docs and gets it — I don't re-explain from scratch. That saved stretch is, again, tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqi2jbtaofay1b1o48ha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqi2jbtaofay1b1o48ha.png" alt="A change that left docs behind: the AI reads them once and carries on, burning fewer tokens. With no trace, it forces rework and rewrites — and rework compounds, which is what actually burns the tokens" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So where should you actually save tokens
&lt;/h2&gt;

&lt;p&gt;Saving tokens isn't about not writing docs — it's saving where saving belongs. The ones I actually use on my machine, roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trim context&lt;/strong&gt;: clear it when you should; don't drag tens of thousands of tokens of history through every turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier your models&lt;/strong&gt;: don't use a cannon on a mosquito. Hand the grunt work — exploring, searching, reading files — to a cheap small model; only bring out the strongest tier for the real thinking, analysis and code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read files precisely&lt;/strong&gt;: only read what's relevant to this change; don't reflexively "read the whole project."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt caching&lt;/strong&gt;: the cached portion is billed at a discount, and it isn't a 1:1 linear relationship — used well, the savings are noticeable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put a token proxy in front of routine commands&lt;/strong&gt;: for high-frequency ops like &lt;code&gt;git status&lt;/code&gt;, squeeze the output; it adds up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallelize&lt;/strong&gt;: fire off independent work at once, fewer round-trips.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not one of these is "write fewer docs."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctn2699frns3v052uvo5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctn2699frns3v052uvo5.png" alt="Saving tokens lives in three layers — context (clear when you should / read precisely), model (grunt work to a small model / caching discount), and tooling (proxy routine commands / parallelize). Not one is " width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Not every change needs the full process
&lt;/h2&gt;

&lt;p&gt;That said, PDLC doesn't mean running the full suite on every change.&lt;/p&gt;

&lt;p&gt;A one-line bug fix — do you need a PRD, a design review? Depends; most of the time there's no need for the heavy process, so trim it. The criterion is simple: is this change worth leaving an asset for? If yes, run the full thing; for one-off small fixes, nobody blames you for cutting a few steps.&lt;/p&gt;

&lt;p&gt;And "saving tokens = saving money" needs to be said per billing model, or it misleads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On a &lt;strong&gt;flat monthly subscription&lt;/strong&gt; with a fixed quota, what you save is &lt;strong&gt;quota headroom&lt;/strong&gt; — the same money does more work.&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;pay-as-you-go API&lt;/strong&gt;, you save &lt;strong&gt;actual cash&lt;/strong&gt; — every token hits the bill.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use both. Figure out which one you're on first; that's what tells you what "saving tokens" actually means for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgydi6wlyd6gpsc3esde.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgydi6wlyd6gpsc3esde.png" alt="PDLC doesn't need the full suite on every change: a one-off small fix trims the process; only something worth maintaining long-term runs full PDLC — and there the docs are the asset" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally
&lt;/h2&gt;

&lt;p&gt;To sum up: lots of docs doesn't equal burning tokens; if you really want to save, save on how you use your tools, not on the docs.&lt;/p&gt;

&lt;p&gt;The one thing I most want to say: docs are an asset, not a cost. Trying to save tokens by "not writing docs, just letting the AI emit code" looks like savings short-term, but the project won't go far — no trace, no traceability, and two months later you can't even say why you designed it this way. The rework then burns far more than the doc tokens you saved.&lt;/p&gt;

&lt;p&gt;One thing you can do today: look back at whether you've turned on the token-saving methods — is your context trimmed? Are you still sending everything to the strongest model instead of tiering? Did you cut the costs you could? And while you're at it, ask whether you're using PDLC well too.&lt;/p&gt;

&lt;p&gt;There's a lot more to unpack on saving tokens — how exactly to tier models, when to clear context, how to actually land the caching discount. I'll pick one and go deeper next time.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>tokens</category>
      <category>productivity</category>
    </item>
    <item>
      <title>aitm 1.0: a terminal where the AI is a participant, not the driver</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Sat, 20 Jun 2026 07:33:52 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/aitm-10-a-terminal-where-the-ai-is-a-participant-not-the-driver-2nb</link>
      <guid>https://dev.to/kanfu-panda/aitm-10-a-terminal-where-the-ai-is-a-participant-not-the-driver-2nb</guid>
      <description>&lt;p&gt;I was doing AI-assisted coding inside a terminal session. The AI kept modifying files, but I had no way to view those changes in the same window — I had to switch apps, switch context, come back. Every loop through the cycle was an interruption.&lt;/p&gt;

&lt;p&gt;What I wanted was simple: the terminal and the AI and the files, all in the same place, without the context-switching. So I built it.&lt;/p&gt;

&lt;p&gt;That's the origin of aitm. The version 1.0 is that thing, shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design choice: participant, not driver
&lt;/h2&gt;

&lt;p&gt;Most AI terminals are built around a single model: you describe intent, the AI executes. It's efficient when it works and a bad afternoon when it doesn't.&lt;/p&gt;

&lt;p&gt;aitm draws a different line. The AI can see your environment, read your files, and call tools — but execution is always gated by you. The invariant is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI suggests → you decide → it happens.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice: the AI calls &lt;code&gt;list_files&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;get_terminal_history&lt;/code&gt;, and &lt;code&gt;search_history&lt;/code&gt; automatically. These are read-only. You see the results in the conversation as they come in. But &lt;code&gt;run_command&lt;/code&gt; — anything that changes state — stops the loop and waits for your approval.&lt;/p&gt;

&lt;p&gt;That distinction sounds obvious in retrospect. It wasn't obvious at design time. The first version had a "trust mode" that auto-approved low-risk commands. I removed it. The UX was slightly smoother; the &lt;em&gt;feeling&lt;/em&gt; of being in control was not worth trading away.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09q9xau9w60fa696ykcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09q9xau9w60fa696ykcl.png" alt="The main interface: file tree on the left, terminal in the center, AI sidebar on the right" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tauri and Rust
&lt;/h2&gt;

&lt;p&gt;Electron was the obvious first option to evaluate. At idle: ~150 MB RAM, several seconds to show the window. Fine for a prototype, not acceptable for something that sits open all day. So Electron was ruled out early.&lt;/p&gt;

&lt;p&gt;The choice was &lt;a href="https://v2.tauri.app/" rel="noopener noreferrer"&gt;Tauri 2&lt;/a&gt; + Rust from the start. Two months to get to something usable. The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5.3 MB binary&lt;/strong&gt; (vs 150+ MB Electron)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;3–5 ms cold start&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;~30 MB RAM at idle&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The React 19 frontend handles the UI. The Rust layer handles the PTY, IPC, tool execution, and security. This matters: the security gates run in Rust, which means the JavaScript/React layer &lt;em&gt;cannot bypass them&lt;/em&gt;. The AI layer sends requests over Tauri IPC; the Rust handler is the one that decides whether a command actually runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four-layer security model
&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;run_command&lt;/code&gt; call goes through four sequential gates before it reaches your shell.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L1 — Blocklist regex.&lt;/strong&gt; Hard-coded patterns that always fail. &lt;code&gt;rm -rf /&lt;/code&gt;, the fork bomb &lt;code&gt;:(){ :|:&amp;amp; };:&lt;/code&gt;, &lt;code&gt;dd if=/dev/zero&lt;/code&gt;, and ~50 others. These are commands where "the user confirmed it" is still not enough — the blocklist exists precisely to be unconditional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L2 — Heuristic risk scoring.&lt;/strong&gt; The command string is scored against a set of signals: does it touch &lt;code&gt;/&lt;/code&gt;, use redirection (&lt;code&gt;&amp;gt;&lt;/code&gt;), pipe to &lt;code&gt;sh&lt;/code&gt;, reference system directories? The output is &lt;code&gt;DESTRUCTIVE&lt;/code&gt;, &lt;code&gt;HIGH&lt;/code&gt;, or &lt;code&gt;LOW&lt;/code&gt;. This label shows up in the confirmation dialog so you can see &lt;em&gt;why&lt;/em&gt; something got flagged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L3 — Project scope allowlist.&lt;/strong&gt; Each session has a configured project directory. You can define a &lt;code&gt;globset&lt;/code&gt; — paths and patterns the AI is allowed to operate on. Anything outside scope is flagged before reaching L4. This is opt-in, but it's what makes "AI working on this project" distinct from "AI with access to your whole machine."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L4 — Explicit user confirmation.&lt;/strong&gt; Every &lt;code&gt;run_command&lt;/code&gt; produces a modal: full command text, risk level, scope check result. There is no auto-approve mode and no way to configure one.&lt;/p&gt;

&lt;p&gt;L1 and L2 run synchronously in Rust with no async overhead. L3 uses the &lt;code&gt;globset&lt;/code&gt; crate. L4 is a hard gate in the IPC handler — no call path in the AI layer can reach the shell without passing through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;run_command request
    │
    ├─ L1: blocklist regex ────────────► REJECT immediately
    │
    ├─ L2: heuristic scoring
    │       DESTRUCTIVE / HIGH / LOW
    │
    ├─ L3: project scope check ────────► flag if out of scope
    │
    └─ L4: confirmation modal ─────────► shell only if approved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xnnx9cjejklodk69tjv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xnnx9cjejklodk69tjv.png" alt="The AI sidebar showing a conversation where the AI has gathered project context" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What else shipped in 1.0
&lt;/h2&gt;

&lt;p&gt;The tool loop and security model are the headline. Everything else in 1.0 was stuff I'd been deferring:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project scope + SQLite persistence.&lt;/strong&gt; Sessions now have a project directory. Conversation history, session state, and config all live in a local SQLite database. Nothing leaves your machine, no account required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Six LLM providers.&lt;/strong&gt; OpenAI, Anthropic, DeepSeek, Qwen (Alibaba DashScope), Zhipu, and Moonshot (Kimi). You can switch per session. The six were chosen for coverage: Western API providers plus the major Chinese providers for users who want lower-latency access from CN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eight themes, English and Chinese UI.&lt;/strong&gt; The themes are opinionated. There's a dark ink-wash one that I find easier on the eyes during long sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Split-pane CodeMirror editor.&lt;/strong&gt; A file editor built into the window. For quick edits without losing terminal context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS Developer ID notarization.&lt;/strong&gt; The &lt;code&gt;.dmg&lt;/code&gt; is signed and notarized. No Gatekeeper warning on first launch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz71a8kb7mjk1zay1yqyf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz71a8kb7mjk1zay1yqyf.png" alt="The split-pane layout: terminal, file preview, and browser side by side" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F852cnbkjho78pk2o8wxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F852cnbkjho78pk2o8wxt.png" alt="The settings page: appearance, themes, layout, and AI provider configuration" width="799" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's not in 1.0
&lt;/h2&gt;

&lt;p&gt;Windows support exists — CI builds it, I've run it — but macOS is the platform I use daily and where the edge cases are best covered. Windows testing is less thorough.&lt;/p&gt;

&lt;p&gt;There's no streaming AI response in the tool-calling loop. The AI responds after all tool calls complete. In practice the wait is usually under two seconds, but it's a noticeable gap when a sequence involves several reads. I'll revisit this in 1.1.&lt;/p&gt;

&lt;p&gt;Plugin system and user-defined tools are on the roadmap, not here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Download
&lt;/h2&gt;

&lt;p&gt;Binary at the &lt;a href="https://github.com/kanfu-panda/aitm/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;GitHub release&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;macOS Apple Silicon&lt;/li&gt;
&lt;li&gt;Windows x86_64&lt;/li&gt;
&lt;li&gt;Windows ARM64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source on &lt;a href="https://github.com/kanfu-panda/aitm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; under Apache 2.0. Issues and discussions are open. If you want to talk about the security model specifically, that's where to do it.&lt;/p&gt;

</description>
      <category>aitm</category>
      <category>terminal</category>
      <category>ai</category>
      <category>rust</category>
    </item>
    <item>
      <title>PDLC 1.1: two things v1.0 got wrong about artifact shape</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Wed, 10 Jun 2026 14:33:44 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/pdlc-11-two-things-v10-got-wrong-about-artifact-shape-5com</link>
      <guid>https://dev.to/kanfu-panda/pdlc-11-two-things-v10-got-wrong-about-artifact-shape-5com</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📝 Originally on my blog → &lt;a href="https://kanfu-panda.github.io/blog/2026/06/08/pdlc-v1-1.html" rel="noopener noreferrer"&gt;https://kanfu-panda.github.io/blog/2026/06/08/pdlc-v1-1.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three months after starting a project with PDLC, a team member asked a simple question: "What does the system look like right now?"&lt;/p&gt;

&lt;p&gt;There was no good answer. &lt;code&gt;/pdlc-arch&lt;/code&gt; had been run six times. The &lt;code&gt;docs/02_design/architecture/&lt;/code&gt; directory had six timestamped files: &lt;code&gt;20260112-arch-analysis.md&lt;/code&gt;, &lt;code&gt;20260204-arch-analysis.md&lt;/code&gt;, and so on. Each captured the architecture at a point in time. None answered the question as asked — because the question was about &lt;em&gt;current state&lt;/em&gt;, not history.&lt;/p&gt;

&lt;p&gt;The same thing happened with coding standards. The team ran &lt;code&gt;/pdlc-standard&lt;/code&gt; (in the old form) to update the naming conventions. Instead of editing the existing file, it created &lt;code&gt;coding-style-v2.md&lt;/code&gt;. Then &lt;code&gt;coding-style-v3.md&lt;/code&gt;. By the time someone new joined the project, there were four files covering the same topic, with no obvious canonical one.&lt;/p&gt;

&lt;p&gt;These weren't edge cases. They were a systematic confusion about what kind of artifact you're writing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The design choice: ledger vs surface
&lt;/h2&gt;

&lt;p&gt;PDLC v1.0 treated all artifacts the same way: write to a new file with a timestamp or version suffix, and never touch the old ones. This is the right behavior for some artifacts — a PRD records a decision at a point in time, and you don't want it overwritten when the feature evolves. A PRD is a &lt;em&gt;ledger artifact&lt;/em&gt;: append-only, date-addressed, permanent.&lt;/p&gt;

&lt;p&gt;But architecture overviews and team conventions aren't decisions. They're states. The question they answer isn't "what did we decide on Feb 4?" but "what is true right now?" For those, append-only is exactly wrong. Every new version compounds the confusion instead of resolving it.&lt;/p&gt;

&lt;p&gt;RFC #5 introduced the &lt;strong&gt;ledger/surface split&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Question answered&lt;/th&gt;
&lt;th&gt;Mutation rule&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ledger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What happened at time T?"&lt;/td&gt;
&lt;td&gt;Append-only, never edit in place&lt;/td&gt;
&lt;td&gt;PRD, per-feature design doc, test plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Surface&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What is true right now?"&lt;/td&gt;
&lt;td&gt;In-place edit, git log is the history&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ARCHITECTURE.md&lt;/code&gt;, coding standards&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The rule is enforced at the skill level. &lt;code&gt;/pdlc-arch&lt;/code&gt; is now declared &lt;code&gt;artifact_type: surface&lt;/code&gt; in its frontmatter. Its iron law: &lt;strong&gt;one file, &lt;code&gt;docs/ARCHITECTURE.md&lt;/code&gt;, overwritten every time&lt;/strong&gt;. The skill won't create a dated copy. If it finds legacy &lt;code&gt;YYYYMMDD-arch-analysis.md&lt;/code&gt; files, it moves them to &lt;code&gt;docs/.archive/architecture/&lt;/code&gt; automatically and uses the most recent one as the starting point for the new surface file.&lt;/p&gt;

&lt;p&gt;The same principle governs the new &lt;code&gt;/pdlc-standard&lt;/code&gt; skill, which manages &lt;code&gt;docs/00_standards/&lt;/code&gt;. The rule is explicit in the skill definition: &lt;code&gt;coding-style-v2.md&lt;/code&gt; is prohibited. There is one file per topic. The audit trail lives in &lt;code&gt;git log&lt;/code&gt;, not in filename proliferation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before v1.1                          After v1.1
─────────────────────────────────    ──────────────────────────────────
docs/02_design/architecture/         docs/
  20260112-arch-analysis.md            ARCHITECTURE.md   ← one file
  20260204-arch-analysis.md              (git log shows full history)
  20260315-arch-analysis.md           .archive/architecture/
  20260428-arch-analysis.md             20260112-arch-analysis.md
  20260519-arch-analysis.md             (legacy files, out of the way)
  20260601-arch-analysis.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The anti-pattern has a name now: "ledger detour" — when a surface artifact gets treated as a ledger because the tooling defaulted to append. Naming it makes it easier to catch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The feature relation problem
&lt;/h2&gt;

&lt;p&gt;v1.0 introduced per-feature state machines at &lt;code&gt;docs/.pdlc-state/&amp;lt;feature-id&amp;gt;.json&lt;/code&gt;. Feature IDs are flat: &lt;code&gt;F20260501-01&lt;/code&gt;, &lt;code&gt;F20260501-02&lt;/code&gt;, and so on. Each feature tracks its own stages independently.&lt;/p&gt;

&lt;p&gt;The problem surfaces when features start depending on each other. You're changing &lt;code&gt;F20260501-03&lt;/code&gt; (user authentication). Does anything else break? Under v1.0, the only way to know was to read every PRD and hope the author mentioned the dependency. There was no structural answer.&lt;/p&gt;

&lt;p&gt;RFC #6 adds the feature relation chain via &lt;code&gt;/pdlc-relate&lt;/code&gt;. Six relation types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;extends&lt;/code&gt; — "this feature builds on that one"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;depends_on&lt;/code&gt; — "this feature requires that one to function"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;supersedes&lt;/code&gt; — "this feature replaces that one"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;resolves&lt;/code&gt; — "this feature fixes that defect"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;conflicts_with&lt;/code&gt; — "these two features can't coexist as-is" (symmetric)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;relates_to&lt;/code&gt; — "loosely connected, worth knowing" (symmetric)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Relations are stored redundantly across five locations: the feature state machine JSON, the document traceability header, a reverse index at &lt;code&gt;docs/.pdlc-state/_relations.json&lt;/code&gt;, and a generated Mermaid graph at &lt;code&gt;docs/.pdlc-state/_graph.md&lt;/code&gt;. The redundancy is intentional — any skill can read from whichever location is most convenient, and &lt;code&gt;/pdlc-relate validate&lt;/code&gt; checks that all five stay consistent.&lt;/p&gt;

&lt;p&gt;The command that makes this useful is &lt;code&gt;impact&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/pdlc-relate impact F20260501-03
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Impact radius for F20260501-03 (user-auth)

🔴 Direct (depth 1) — must coordinate
   F20260601-01  oauth-integration   extends
   F20260601-04  session-management  depends_on

🟡 Transitive (depth ≥2) — should review
   F20260612-02  user-profile-sync   depends_on → F20260601-01

🟢 Historical — audit only
   B20260520-07  login-loop-fix      resolves
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the answer to "what breaks if I touch this?" — with severity levels, not just a flat list. The direct layer tells you what needs coordination before you make the change. The transitive layer tells you what to review. The historical layer shows what defects this feature has already resolved, which is useful when deciding whether to edit in place or create a &lt;code&gt;supersedes&lt;/code&gt; feature instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[F20260501-03&amp;lt;br/&amp;gt;user-auth] --&amp;gt;|extended by| B[F20260601-01&amp;lt;br/&amp;gt;oauth-integration]
    A --&amp;gt;|depended on by| C[F20260601-04&amp;lt;br/&amp;gt;session-management]
    B --&amp;gt;|depended on by| D[F20260612-02&amp;lt;br/&amp;gt;user-profile-sync]
    E[B20260520-07&amp;lt;br/&amp;gt;login-loop-fix] --&amp;gt;|resolved by| A
    style A fill:#ff6b6b,color:#fff
    style B fill:#ffd93d
    style C fill:#ffd93d
    style D fill:#6bcb77
    style E fill:#6bcb77
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What to watch out for
&lt;/h2&gt;

&lt;p&gt;The ledger/surface distinction is easy to misapply in one direction: marking something surface when it should be a ledger. A design decision that gets reconsidered six months later should have both the original record and the new one — not an in-place overwrite that destroys the history of the decision. If you're not sure, default to ledger. Surface is the right call only when the artifact is genuinely describing current state, not recording a decision.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;/pdlc-relate&lt;/code&gt;, the main friction point is bootstrapping. If you have thirty features and add the relation chain in v1.1, none of them have relation data yet. &lt;code&gt;/pdlc-relate orphans&lt;/code&gt; will list all thirty as orphans, which is technically correct but not actionable. The practical approach is to add relations incrementally as you touch features, rather than trying to backfill the entire graph in one session.&lt;/p&gt;

&lt;p&gt;The five-location redundancy is also worth noting as a tradeoff. Keeping relations in five places makes reads fast for any skill, but it means &lt;code&gt;/pdlc-relate validate&lt;/code&gt; needs to run after any manual edit to catch drift. The alternative — a single source of truth that every skill reads — would require every relation lookup to go through &lt;code&gt;_relations.json&lt;/code&gt;, which adds a dependency that wasn't there before. The current design accepts the redundancy to keep skills loosely coupled.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where things stand
&lt;/h2&gt;

&lt;p&gt;PDLC is at v1.1.0. The plugin has 33 skills (up from 31). The two new skills are additive — existing projects don't need to migrate anything to get the v1.1 behavior for &lt;code&gt;/pdlc-arch&lt;/code&gt;. The relation chain is opt-in; if you never run &lt;code&gt;/pdlc-relate&lt;/code&gt;, nothing changes.&lt;/p&gt;

&lt;p&gt;Both new skills dogfood their own concepts: the &lt;code&gt;docs/ARCHITECTURE.md&lt;/code&gt; in the pdlc-skills repo is a surface artifact maintained by &lt;code&gt;/pdlc-arch&lt;/code&gt;, and &lt;code&gt;docs/GLOSSARY.md&lt;/code&gt; is managed as a surface file.&lt;/p&gt;

&lt;p&gt;The piece that's still manual: there's no automatic detection of "you're about to change a feature that has incoming &lt;code&gt;depends_on&lt;/code&gt; edges." The impact check has to be run explicitly. Whether to put this in the &lt;code&gt;/pdlc-implement&lt;/code&gt; guard is an open question — it would add a lookup on every implementation start, which might be more noise than signal for small projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# New install&lt;/span&gt;
/claude &lt;span class="nb"&gt;install &lt;/span&gt;kanfu-panda/pdlc-skills

&lt;span class="c"&gt;# Upgrade from v1.0&lt;/span&gt;
bash &amp;lt;&lt;span class="o"&gt;(&lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/kanfu-panda/pdlc-skills/main/install.sh&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source and issues: &lt;a href="https://github.com/kanfu-panda/pdlc-skills" rel="noopener noreferrer"&gt;github.com/kanfu-panda/pdlc-skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First article in the series: &lt;a href="https://dev.to/kanfu-panda/why-hard-contracts-beat-soft-conventions-when-working-with-ai-coding-agents-5fk7"&gt;Why Hard Contracts Beat Soft Conventions When Working With AI Coding Agents&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Next
&lt;/h2&gt;

&lt;p&gt;Episode 03 will cover aitm — the terminal I built to solve the context-switching problem in AI-assisted coding. 5.3 MB binary, Rust + Tauri, a four-layer security model for AI command execution, and some honest notes on what the "participant not driver" framing actually costs in UX.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why hard contracts beat soft conventions when working with AI coding agents</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Tue, 19 May 2026 18:10:04 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/why-hard-contracts-beat-soft-conventions-when-working-with-ai-coding-agents-5fk7</link>
      <guid>https://dev.to/kanfu-panda/why-hard-contracts-beat-soft-conventions-when-working-with-ai-coding-agents-5fk7</guid>
      <description>&lt;p&gt;A retrospective on building PDLC — 31 slash commands that force my Claude Code workflow to actually finish features.&lt;/p&gt;

&lt;p&gt;Six months of pair-programming with Claude Code taught me one uncomfortable truth: &lt;strong&gt;the AI is great at writing code, and terrible at the rest of the software lifecycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It happily says "done" when only the chat transcript proves the PRD existed.&lt;/p&gt;

&lt;p&gt;It writes tests, but after the implementation — making "TDD" a polite lie.&lt;/p&gt;

&lt;p&gt;It loses track of which feature is at which stage the moment you start a new session.&lt;/p&gt;

&lt;p&gt;It enters lint-fix loops that converge on nothing.&lt;/p&gt;

&lt;p&gt;None of this is the model's fault. The model does what you ask, in the moment you ask it. The problem is that &lt;em&gt;soft conventions&lt;/em&gt; — "please write a PRD first", "please write the failing test first" — are how we talk to humans, who keep their own working memory.&lt;/p&gt;

&lt;p&gt;LLMs don't. They need &lt;em&gt;hard contracts&lt;/em&gt;: rules that the workflow itself enforces, not rules the model is supposed to remember.&lt;/p&gt;

&lt;p&gt;This is the story of how I encoded that conviction into a Claude Code plugin called &lt;strong&gt;PDLC&lt;/strong&gt; — Product Development Life Cycle — and what surprised me along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the contract
&lt;/h2&gt;

&lt;p&gt;PDLC ships 31 slash commands organized in three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entry points (3):&lt;/strong&gt; &lt;code&gt;/pdlc-feature&lt;/code&gt;, &lt;code&gt;/pdlc-fix&lt;/code&gt;, &lt;code&gt;/pdlc-status&lt;/code&gt; — one sentence drives a whole chain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stages (11):&lt;/strong&gt; &lt;code&gt;/pdlc-prd&lt;/code&gt;, &lt;code&gt;/pdlc-design&lt;/code&gt;, &lt;code&gt;/pdlc-tdd&lt;/code&gt;, &lt;code&gt;/pdlc-implement&lt;/code&gt;, &lt;code&gt;/pdlc-review&lt;/code&gt;, &lt;code&gt;/pdlc-e2e&lt;/code&gt;, &lt;code&gt;/pdlc-ship&lt;/code&gt;, ... — when you want fine-grained control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools (17):&lt;/strong&gt; &lt;code&gt;/pdlc-ui-design&lt;/code&gt;, &lt;code&gt;/pdlc-db-migrate&lt;/code&gt;, &lt;code&gt;/pdlc-security&lt;/code&gt;, &lt;code&gt;/pdlc-perf&lt;/code&gt;, ... — specialized concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every Layer 1/2 stage that produces artifacts is bound to five invariants I call the Iron Law:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Persist to disk.&lt;/strong&gt; Every artifact (PRD, API design, DB schema, test plan, review notes) lands as a real file under &lt;code&gt;docs/&lt;/code&gt;. You can &lt;code&gt;git diff&lt;/code&gt; what the AI did.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update the state machine.&lt;/strong&gt; Each stage writes &lt;code&gt;docs/.pdlc-state/&amp;lt;feature-id&amp;gt;.json&lt;/code&gt;. New session? &lt;code&gt;/pdlc-status&lt;/code&gt; tells you exactly where every feature stands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests first.&lt;/strong&gt; &lt;code&gt;/pdlc-implement&lt;/code&gt; literally refuses to proceed if &lt;code&gt;/pdlc-tdd&lt;/code&gt; hasn't already produced a failing-test artifact on disk for this feature. Real TDD red-light gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-check.&lt;/strong&gt; Every stage runs a self-audit before handing off. Catch drift at the stage boundary, not three stages downstream during review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-shot repair.&lt;/strong&gt; Auto-fix loops run at most once. If a stage's output fails its own audit, the model gets one chance to repair it, then flags the issue for a human. No more "fix → check → fix → check → fix" until the heat death of your token budget.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The state machine matters more than I expected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I started with the persistence rule alone — write everything to disk. That helped, but a week into using it I caught myself asking the AI "wait, did we write a PRD for the phone-verification thing?" again. The artifacts existed; they were just hard to find.&lt;/p&gt;

&lt;p&gt;Adding the per-feature state file (&lt;code&gt;F20260502-01.json&lt;/code&gt; with &lt;code&gt;current_stage&lt;/code&gt;, &lt;code&gt;artifacts&lt;/code&gt;, &lt;code&gt;next_step&lt;/code&gt;) was the moment the plugin started feeling like an actual process tool, not a fancier prompt template. &lt;code&gt;/pdlc-status&lt;/code&gt; became my new dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tests-first was the hardest contract to make stick.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The natural tendency of an LLM is to "show its work" by writing the implementation first, then writing tests that conveniently pass. To make &lt;code&gt;/pdlc-tdd&lt;/code&gt; truly red-light, I had to make &lt;code&gt;/pdlc-implement&lt;/code&gt; &lt;em&gt;read the test artifact&lt;/em&gt; and verify it currently fails — not just exists. Without that verification, the model would happily generate stubs that already pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. One-shot repair was a token-budget revelation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before this rule, lint-fix loops would burn 30k tokens converging on a misunderstanding. The model would "fix" something that wasn't the real issue, the checker would complain again with a slightly different message, the model would "fix" the new message, and so on. Capping the repair at one attempt forces it to either understand the real problem or escalate to a human — both vastly better than infinite drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this falls apart
&lt;/h2&gt;

&lt;p&gt;Three things I should be honest about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code only.&lt;/strong&gt; I rely on slash commands and skills as first-class primitives. Cline and Cursor don't have direct equivalents. Porting is possible (the contracts are in bash + markdown) but not free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Iron Law is not law.&lt;/strong&gt; A determined user can edit the state file by hand, or invoke &lt;code&gt;/pdlc-implement&lt;/code&gt; directly without going through &lt;code&gt;/pdlc-tdd&lt;/code&gt;. The contracts are &lt;em&gt;guardrails&lt;/em&gt;, not jail. That's deliberate — guardrails that you can step over with one extra command are usually right.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overhead for tiny changes.&lt;/strong&gt; Running the full chain for a 3-line CSS fix is overkill. That's what &lt;code&gt;/pdlc-fix&lt;/code&gt; (lighter chain) is for, but the line between "feature" and "fix" is judgmental.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to reach for it
&lt;/h2&gt;

&lt;p&gt;If your workflow with an AI agent involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple sessions per feature → the state machine pays for itself&lt;/li&gt;
&lt;li&gt;Anything you'd want to &lt;code&gt;git diff&lt;/code&gt; later → persistence pays for itself&lt;/li&gt;
&lt;li&gt;Code where you regret not having tests → TDD red light pays for itself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using AI for one-off scripts, refactor sweeps, or single-session prototypes, PDLC is more ceremony than the work justifies. Use it where the discipline is worth its weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Install (no clone needed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/kanfu-panda/pdlc-skills/main/install.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/pdlc-feature add phone-number verification to user login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will allocate a feature ID, walk through the chain, and refuse to skip the steps that matter.&lt;/p&gt;

&lt;p&gt;Repo (MIT): &lt;a href="https://github.com/kanfu-panda/pdlc-skills" rel="noopener noreferrer"&gt;https://github.com/kanfu-panda/pdlc-skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you build something on top, file a Discussion — I'd love to see what shapes hold up that I haven't tested.&lt;/p&gt;

&lt;p&gt;— kanfu-panda&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>workflow</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
