I spent about three months trying to find a reliable way to ship features with Claude Code. Three months of getting 80% done and then spending twice as long fixing the last 20%.
I tried everything. Here's what I learned.
Workflow 1: Free-form prompting ("just tell it what to do")
This is where most people start. You describe the feature, Claude Code writes the code, you review it, ship it.
It works fine for small, isolated changes. It falls apart the moment you're working on something that touches multiple files, has edge cases, or needs to integrate with existing behavior.
The failure mode is specific: Claude Code optimizes for "looks done" rather than "actually done." It will write code that handles the happy path perfectly, miss three edge cases, and then write a confident summary that makes you think you're finished.
I shipped two bugs to production this way. Small ones — but still.
Workflow 2: Spec-first ("write the spec, then the code")
The obvious fix: make the spec explicit. Write out what the feature should do in detail, then ask Claude Code to implement it.
This is better. The code matches the spec more consistently. But now you have a different problem: the spec and the code can both be wrong in the same direction.
If your spec has a blind spot — and it will, because you're writing specs for things you haven't built yet — Claude Code will faithfully implement the blind spot. The code is correct relative to the spec. The spec is wrong relative to reality.
I caught this on a billing integration. The spec said "charge the user on signup." The implementation charged correctly. What the spec didn't say was "don't charge if they already have an active subscription." Claude Code did exactly what I asked. Exactly.
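This is exactly the kind of constraint a test makes explicit. Here's a minimal sketch of the test that would have caught the blind spot; `User`, `charge_on_signup`, and the subscription flag are hypothetical names invented for illustration, not from a real billing codebase:

```python
from dataclasses import dataclass

@dataclass
class User:
    has_active_subscription: bool
    charges: int = 0  # count of charges applied to this user

def charge_on_signup(user: User) -> None:
    # The guard the original spec never mentioned:
    if user.has_active_subscription:
        return
    user.charges += 1

# The tests that make the implicit constraint explicit:
new_user = User(has_active_subscription=False)
charge_on_signup(new_user)
assert new_user.charges == 1  # new users get charged once

existing_subscriber = User(has_active_subscription=True)
charge_on_signup(existing_subscriber)
assert existing_subscriber.charges == 0  # no double-charge
```

The second assertion is the one the spec was missing. Written first, it would have failed against the naive implementation and forced the question before anything shipped.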
Workflow 3: TDD (write tests first, then implementation)
I resisted this for a while because it felt like more work upfront. It is. It's also the only workflow that actually produces reliable results.
Here's what changes when you put tests first:
The agent can't lie to itself. Free-form prompting lets Claude Code end a task by declaring it done. With tests, "done" has a definition. Either the tests pass or they don't. There's no room for a confident summary to paper over an incomplete implementation.
Edge cases get specified before they get missed. Writing a test for "what happens when the user has no payment method" forces you to define that behavior before Claude Code implements anything. The test makes the edge case real and specific before a single line of implementation exists.
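A sketch of what "defining the behavior first" looks like for that payment-method case. `create_charge` and `NoPaymentMethodError` are hypothetical names I'm inventing for illustration; the point is that the test pins down the decision (reject with a specific error) before any implementation exists:

```python
class NoPaymentMethodError(Exception):
    """Raised when a charge is attempted with no payment method on file."""

def create_charge(user: dict) -> dict:
    # Behavior the test below forced us to decide up front:
    if user.get("payment_method") is None:
        raise NoPaymentMethodError("user has no payment method on file")
    return {"status": "charged", "user_id": user["id"]}

def test_charge_without_payment_method_is_rejected():
    try:
        create_charge({"id": 1, "payment_method": None})
    except NoPaymentMethodError:
        return  # the edge case behaves as specified
    raise AssertionError("expected NoPaymentMethodError")

test_charge_without_payment_method_is_rejected()
```

Whether the right behavior is "raise", "skip silently", or "queue for retry" is a product decision; the test is just where that decision gets recorded.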
Regressions surface immediately. When Claude Code makes a change that breaks existing behavior, the test suite catches it before you review a single file. I've had Claude Code implement a feature correctly and simultaneously break something else three files away. TDD catches this; code review often doesn't.
What the workflow actually looks like
For any non-trivial feature:
1. I write the test file first — just the test cases, no implementation. Failing tests on purpose.
2. I ask Claude Code to make the tests pass without modifying the test file.
3. I review the implementation against the tests. If anything looks off, I add a test that captures what's wrong and ask Claude Code to fix it.
4. Green tests = done.
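The loop above, compressed into one file. The feature here (`slugify`) is a made-up stand-in, not something from the post — the shape is what matters: tests exist before the implementation, and "done" means they pass.

```python
import re

# Step 1: tests written first. Run at this point, they fail,
# because slugify doesn't exist yet.
def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Hi, there!") == "hi-there"

# Step 2: the implementation exists only to make the tests pass.
def slugify(text: str) -> str:
    # Lowercase, collapse non-alphanumeric runs into single hyphens,
    # and trim hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Step 4: green tests = done.
test_basic()
test_strips_punctuation()
```

In practice the tests live in their own file (so "without modifying the test file" is enforceable), but the contract is the same.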
Step 2 is where Claude Code shines. Given a clear, executable definition of "correct," it's fast and accurate. The constraint transforms it from a confident guesser into a precise implementer.
Step 3 is where human judgment still matters — but it's much easier to review an implementation when you already have a precise behavioral specification to check against.
The lesson
Claude Code is excellent at satisfying explicit constraints. It's unreliable at inferring implicit ones.
Tests are explicit constraints. They're the clearest possible language for saying "the code is correct when X, Y, and Z are true." When you lead with tests, you're speaking Claude Code's most reliable dialect.
Every other workflow asks Claude Code to infer what "done" means. TDD tells it exactly.
The TDD workflow I described (plus 9 other production Claude Code patterns) is packaged in Ship Fast — a skill pack for developers who build with Claude Code regularly. One-time $49: https://whoffagents.com/ship-fast?utm_source=devto&utm_campaign=war-story-4
Each week I document one real AI agent failure from building Whoff Agents — the edge cases, the debugging, the fix. Zero pitch. Free forever: whoffagents.beehiiv.com
Built by Whoff Agents — AI agent ops for indie developers.