DEV Community

Cover image for Why hard contracts beat soft conventions when working with AI coding agents
kanfu-panda
kanfu-panda

Posted on

Why hard contracts beat soft conventions when working with AI coding agents

A retrospective on building PDLC — 31 slash commands that force my Claude Code workflow to actually finish features.

Six months of pair-programming with Claude Code taught me one uncomfortable truth: the AI is great at writing code, and terrible at the rest of the software lifecycle.

It happily says "done" when only the chat transcript proves the PRD existed.

It writes tests, but after the implementation — making "TDD" a polite lie.

It loses track of which feature is at which stage the moment you start a new session.

It enters lint-fix loops that converge on nothing.

None of this is the model's fault. The model does what you ask, in the moment you ask it. The problem is that soft conventions — "please write a PRD first", "please write the failing test first" — are how we talk to humans, who keep their own working memory.

LLMs don't. They need hard contracts: rules that the workflow itself enforces, not rules the model is supposed to remember.

This is the story of how I encoded that conviction into a Claude Code plugin called PDLC — Product Development Life Cycle — and what surprised me along the way.

The shape of the contract

PDLC ships 31 slash commands organized in three layers:

  • Entry points (3): /pdlc-feature, /pdlc-fix, /pdlc-status — one sentence drives a whole chain
  • Stages (11): /pdlc-prd, /pdlc-design, /pdlc-tdd, /pdlc-implement, /pdlc-review, /pdlc-e2e, /pdlc-ship, ... — when you want fine-grained control
  • Tools (17): /pdlc-ui-design, /pdlc-db-migrate, /pdlc-security, /pdlc-perf, ... — specialized concerns

Every Layer 1/2 stage that produces artifacts is bound to five invariants I call the Iron Law:

  1. Persist to disk. Every artifact (PRD, API design, DB schema, test plan, review notes) lands as a real file under docs/. You can git diff what the AI did.
  2. Update the state machine. Each stage writes docs/.pdlc-state/<feature-id>.json. New session? /pdlc-status tells you exactly where every feature stands.
  3. Tests first. /pdlc-implement literally refuses to proceed if /pdlc-tdd hasn't already produced a failing-test artifact on disk for this feature. Real TDD red-light gate.
  4. Self-check. Every stage runs a self-audit before handing off. Catch drift at the stage boundary, not three stages downstream during review.
  5. One-shot repair. Auto-fix loops run at most once. If a stage's output fails its own audit, the model gets one chance to repair it, then flags the issue for a human. No more "fix → check → fix → check → fix" until the heat death of your token budget.

What surprised me

1. The state machine matters more than I expected.

I started with the persistence rule alone — write everything to disk. That helped, but a week into using it I caught myself asking the AI "wait, did we write a PRD for the phone-verification thing?" again. The artifacts existed; they were just hard to find.

Adding the per-feature state file (F20260502-01.json with current_stage, artifacts, next_step) was the moment the plugin started feeling like an actual process tool, not a fancier prompt template. /pdlc-status became my new dashboard.

2. Tests-first was the hardest contract to make stick.

The natural tendency of an LLM is to "show its work" by writing the implementation first, then writing tests that conveniently pass. To make /pdlc-tdd truly red-light, I had to make /pdlc-implement read the test artifact and verify it currently fails — not just exists. Without that verification, the model would happily generate stubs that already pass.

3. One-shot repair was a token-budget revelation.

Before this rule, lint-fix loops would burn 30k tokens converging on a misunderstanding. The model would "fix" something that wasn't the real issue, the checker would complain again with a slightly different message, the model would "fix" the new message, and so on. Capping the repair at one attempt forces it to either understand the real problem or escalate to a human — both vastly better than infinite drift.

Where this falls apart

Three things I should be honest about:

  • Claude Code only. I rely on slash commands and skills as first-class primitives. Cline and Cursor don't have direct equivalents. Porting is possible (the contracts are in bash + markdown) but not free.
  • The Iron Law is not law. A determined user can edit the state file by hand, or invoke /pdlc-implement directly without going through /pdlc-tdd. The contracts are guardrails, not jail. That's deliberate — guardrails that you can step over with one extra command are usually right.
  • Overhead for tiny changes. Running the full chain for a 3-line CSS fix is overkill. That's what /pdlc-fix (lighter chain) is for, but the line between "feature" and "fix" is judgmental.

When to reach for it

If your workflow with an AI agent involves:

  • Multiple sessions per feature → the state machine pays for itself
  • Anything you'd want to git diff later → persistence pays for itself
  • Code where you regret not having tests → TDD red light pays for itself

If you're using AI for one-off scripts, refactor sweeps, or single-session prototypes, PDLC is more ceremony than the work justifies. Use it where the discipline is worth its weight.

Try it

Install (no clone needed):

curl -fsSL https://raw.githubusercontent.com/kanfu-panda/pdlc-skills/main/install.sh | bash -s -- --global
Enter fullscreen mode Exit fullscreen mode

Then in Claude Code:

/pdlc-feature add phone-number verification to user login
Enter fullscreen mode Exit fullscreen mode

It will allocate a feature ID, walk through the chain, and refuse to skip the steps that matter.

Repo (MIT): https://github.com/kanfu-panda/pdlc-skills

If you build something on top, file a Discussion — I'd love to see what shapes hold up that I haven't tested.

— kanfu-panda

Top comments (0)