<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kanfu-panda</title>
    <description>The latest articles on DEV Community by kanfu-panda (@kanfu-panda).</description>
    <link>https://dev.to/kanfu-panda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940859%2F45897ada-53a1-4e66-a3ca-356f624df48e.jpeg</url>
      <title>DEV Community: kanfu-panda</title>
      <link>https://dev.to/kanfu-panda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kanfu-panda"/>
    <language>en</language>
    <item>
      <title>Why hard contracts beat soft conventions when working with AI coding agents</title>
      <dc:creator>kanfu-panda</dc:creator>
      <pubDate>Tue, 19 May 2026 18:10:04 +0000</pubDate>
      <link>https://dev.to/kanfu-panda/why-hard-contracts-beat-soft-conventions-when-working-with-ai-coding-agents-5fk7</link>
      <guid>https://dev.to/kanfu-panda/why-hard-contracts-beat-soft-conventions-when-working-with-ai-coding-agents-5fk7</guid>
      <description>&lt;p&gt;A retrospective on building PDLC — 31 slash commands that force my Claude Code workflow to actually finish features.&lt;/p&gt;

&lt;p&gt;Six months of pair-programming with Claude Code taught me one uncomfortable truth: &lt;strong&gt;the AI is great at writing code, and terrible at the rest of the software lifecycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It happily says "done" when only the chat transcript proves the PRD existed.&lt;/p&gt;

&lt;p&gt;It writes tests, but after the implementation — making "TDD" a polite lie.&lt;/p&gt;

&lt;p&gt;It loses track of which feature is at which stage the moment you start a new session.&lt;/p&gt;

&lt;p&gt;It enters lint-fix loops that converge on nothing.&lt;/p&gt;

&lt;p&gt;None of this is the model's fault. The model does what you ask, in the moment you ask it. The problem is that &lt;em&gt;soft conventions&lt;/em&gt; — "please write a PRD first", "please write the failing test first" — are how we talk to humans, who keep their own working memory.&lt;/p&gt;

&lt;p&gt;LLMs don't. They need &lt;em&gt;hard contracts&lt;/em&gt;: rules that the workflow itself enforces, not rules the model is supposed to remember.&lt;/p&gt;

&lt;p&gt;This is the story of how I encoded that conviction into a Claude Code plugin called &lt;strong&gt;PDLC&lt;/strong&gt; — Product Development Life Cycle — and what surprised me along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the contract
&lt;/h2&gt;

&lt;p&gt;PDLC ships 31 slash commands organized in three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entry points (3):&lt;/strong&gt; &lt;code&gt;/pdlc-feature&lt;/code&gt;, &lt;code&gt;/pdlc-fix&lt;/code&gt;, &lt;code&gt;/pdlc-status&lt;/code&gt; — one sentence drives a whole chain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stages (11):&lt;/strong&gt; &lt;code&gt;/pdlc-prd&lt;/code&gt;, &lt;code&gt;/pdlc-design&lt;/code&gt;, &lt;code&gt;/pdlc-tdd&lt;/code&gt;, &lt;code&gt;/pdlc-implement&lt;/code&gt;, &lt;code&gt;/pdlc-review&lt;/code&gt;, &lt;code&gt;/pdlc-e2e&lt;/code&gt;, &lt;code&gt;/pdlc-ship&lt;/code&gt;, ... — when you want fine-grained control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools (17):&lt;/strong&gt; &lt;code&gt;/pdlc-ui-design&lt;/code&gt;, &lt;code&gt;/pdlc-db-migrate&lt;/code&gt;, &lt;code&gt;/pdlc-security&lt;/code&gt;, &lt;code&gt;/pdlc-perf&lt;/code&gt;, ... — specialized concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every Layer 1/2 stage that produces artifacts is bound to five invariants I call the Iron Law:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Persist to disk.&lt;/strong&gt; Every artifact (PRD, API design, DB schema, test plan, review notes) lands as a real file under &lt;code&gt;docs/&lt;/code&gt;. You can &lt;code&gt;git diff&lt;/code&gt; what the AI did.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update the state machine.&lt;/strong&gt; Each stage writes &lt;code&gt;docs/.pdlc-state/&amp;lt;feature-id&amp;gt;.json&lt;/code&gt;. New session? &lt;code&gt;/pdlc-status&lt;/code&gt; tells you exactly where every feature stands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests first.&lt;/strong&gt; &lt;code&gt;/pdlc-implement&lt;/code&gt; literally refuses to proceed if &lt;code&gt;/pdlc-tdd&lt;/code&gt; hasn't already produced a failing-test artifact on disk for this feature. Real TDD red-light gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-check.&lt;/strong&gt; Every stage runs a self-audit before handing off. Catch drift at the stage boundary, not three stages downstream during review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-shot repair.&lt;/strong&gt; Auto-fix loops run at most once. If a stage's output fails its own audit, the model gets one chance to repair it, then flags the issue for a human. No more "fix → check → fix → check → fix" until the heat death of your token budget.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The state machine matters more than I expected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I started with the persistence rule alone — write everything to disk. That helped, but a week into using it I caught myself asking the AI "wait, did we write a PRD for the phone-verification thing?" again. The artifacts existed; they were just hard to find.&lt;/p&gt;

&lt;p&gt;Adding the per-feature state file (&lt;code&gt;F20260502-01.json&lt;/code&gt; with &lt;code&gt;current_stage&lt;/code&gt;, &lt;code&gt;artifacts&lt;/code&gt;, &lt;code&gt;next_step&lt;/code&gt;) was the moment the plugin started feeling like an actual process tool, not a fancier prompt template. &lt;code&gt;/pdlc-status&lt;/code&gt; became my new dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tests-first was the hardest contract to make stick.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The natural tendency of an LLM is to "show its work" by writing the implementation first, then writing tests that conveniently pass. To make &lt;code&gt;/pdlc-tdd&lt;/code&gt; truly red-light, I had to make &lt;code&gt;/pdlc-implement&lt;/code&gt; &lt;em&gt;read the test artifact&lt;/em&gt; and verify it currently fails — not just exists. Without that verification, the model would happily generate stubs that already pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. One-shot repair was a token-budget revelation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before this rule, lint-fix loops would burn 30k tokens converging on a misunderstanding. The model would "fix" something that wasn't the real issue, the checker would complain again with a slightly different message, the model would "fix" the new message, and so on. Capping the repair at one attempt forces it to either understand the real problem or escalate to a human — both vastly better than infinite drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this falls apart
&lt;/h2&gt;

&lt;p&gt;Three things I should be honest about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code only.&lt;/strong&gt; I rely on slash commands and skills as first-class primitives. Cline and Cursor don't have direct equivalents. Porting is possible (the contracts are in bash + markdown) but not free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Iron Law is not law.&lt;/strong&gt; A determined user can edit the state file by hand, or invoke &lt;code&gt;/pdlc-implement&lt;/code&gt; directly without going through &lt;code&gt;/pdlc-tdd&lt;/code&gt;. The contracts are &lt;em&gt;guardrails&lt;/em&gt;, not jail. That's deliberate — guardrails that you can step over with one extra command are usually right.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overhead for tiny changes.&lt;/strong&gt; Running the full chain for a 3-line CSS fix is overkill. That's what &lt;code&gt;/pdlc-fix&lt;/code&gt; (lighter chain) is for, but the line between "feature" and "fix" is judgmental.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to reach for it
&lt;/h2&gt;

&lt;p&gt;If your workflow with an AI agent involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple sessions per feature → the state machine pays for itself&lt;/li&gt;
&lt;li&gt;Anything you'd want to &lt;code&gt;git diff&lt;/code&gt; later → persistence pays for itself&lt;/li&gt;
&lt;li&gt;Code where you regret not having tests → TDD red light pays for itself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using AI for one-off scripts, refactor sweeps, or single-session prototypes, PDLC is more ceremony than the work justifies. Use it where the discipline is worth its weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Install (no clone needed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/kanfu-panda/pdlc-skills/main/install.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/pdlc-feature add phone-number verification to user login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will allocate a feature ID, walk through the chain, and refuse to skip the steps that matter.&lt;/p&gt;

&lt;p&gt;Repo (MIT): &lt;a href="https://github.com/kanfu-panda/pdlc-skills" rel="noopener noreferrer"&gt;https://github.com/kanfu-panda/pdlc-skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you build something on top, file a Discussion — I'd love to see what shapes hold up that I haven't tested.&lt;/p&gt;

&lt;p&gt;— kanfu-panda&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>workflow</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
