DEV Community

brian austin
Claude Code test-driven development: write tests first, let AI implement

Here's a workflow that changed how I build features with Claude Code: write the tests first, then ask Claude to make them pass.

Instead of "build me a user authentication system", you write:

// auth.test.js
describe('UserAuth', () => {
  test('should create user with hashed password', async () => {
    const user = await UserAuth.create({ email: 'test@example.com', password: 'secret123' });
    expect(user.passwordHash).toBeDefined();
    expect(user.passwordHash).not.toBe('secret123');
  });

  test('should return JWT on valid login', async () => {
    const token = await UserAuth.login({ email: 'test@example.com', password: 'secret123' });
    expect(token).toMatch(/^[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+$/);
  });

  test('should throw on invalid password', async () => {
    await expect(UserAuth.login({ email: 'test@example.com', password: 'wrong' }))
      .rejects.toThrow('Invalid credentials');
  });
});

Then you tell Claude:

Make these tests pass. Don't modify the tests.

Claude writes the implementation. The tests define the contract.

Why this works so well with AI

1. Tests are a specification Claude can't hallucinate around

When you describe features in prose, Claude fills gaps with assumptions. When you write tests, there are no gaps — the test either passes or it doesn't.

2. You stay in control of the interface

The test file defines your API surface. Claude implements the internals. You get to review exactly what the public contract is before any code exists.

3. Refactoring becomes safe

Write tests → Claude implements → tests pass. Now you can say "refactor this for better performance" and immediately know if something broke.

The TDD prompt pattern

Here are failing tests in test/payment.test.js.
Make ALL of them pass without modifying the test file.
If a test seems impossible to satisfy, tell me before writing any code.

That last line is important. Claude will sometimes discover that your test spec has contradictions. Better to catch that before implementation.

Start with the happy path

Write 3-5 tests for the most important cases first:

# payment_test.py
import pytest
def test_charge_returns_transaction_id():
    result = Payment.charge(amount=1000, card_token='tok_visa')
    assert result.transaction_id is not None
    assert result.status == 'success'

def test_charge_amount_in_cents():
    result = Payment.charge(amount=1000, card_token='tok_visa')
    assert result.amount_charged == 1000  # cents, not dollars

def test_invalid_card_raises_PaymentError():
    with pytest.raises(PaymentError, match='Card declined'):
        Payment.charge(amount=1000, card_token='tok_fail')

Get these passing. Then add edge cases, error states, validation.
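As a sketch of what "getting these passing" might produce, here is a minimal implementation that satisfies the three tests above. The in-memory gateway stub and the `'tok_fail'` decline behavior are assumptions for illustration, not a real payment integration:

```python
import uuid
from dataclasses import dataclass

class PaymentError(Exception):
    pass

@dataclass
class ChargeResult:
    transaction_id: str
    status: str
    amount_charged: int  # cents, matching test_charge_amount_in_cents

class Payment:
    @staticmethod
    def charge(amount, card_token):
        # Stub gateway: 'tok_fail' simulates a declined card
        if card_token == 'tok_fail':
            raise PaymentError('Card declined')
        return ChargeResult(
            transaction_id=str(uuid.uuid4()),
            status='success',
            amount_charged=amount,
        )
```

Notice how thin it is: the tests only pin down the happy path and one failure mode, so that's all the implementation covers. The edge cases you add next will force it to grow.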

The "red-green-refactor" loop with Claude

Round 1:
You: [paste failing tests]
      Make these pass.
Claude: [writes implementation]

Round 2:
You: [add more tests]
      These 3 new tests are failing. Fix them without breaking existing tests.
Claude: [updates implementation]

Round 3:
You: All tests pass. Now refactor for readability — no logic changes.
Claude: [cleans up implementation]

The test-first advantage for complex features

For a feature like rate limiting:

describe('RateLimiter', () => {
  test('allows 10 requests per minute', async () => { ... });
  test('blocks request 11 within same minute', async () => { ... });
  test('resets after 60 seconds', async () => { ... });
  test('counts per user, not globally', async () => { ... });
  test('returns retry-after header when blocked', async () => { ... });
});

Writing these tests forces you to think through every edge case before Claude writes a single line. You discover the complexity upfront.

The implementation Claude produces will be better because you've already specified all the edge cases.
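As a rough sketch of the behavior those five tests pin down, here is a fixed-window, in-memory limiter in Python. The class and method names, the `(allowed, retry_after)` return shape, and the injectable clock are all illustrative assumptions, not part of the article's spec:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Fixed-window limiter: `limit` requests per user per window (sketch only)."""

    def __init__(self, limit=10, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        # user_id -> (request_count, window_start_time); counted per user, not globally
        self.counts = defaultdict(lambda: (0, 0.0))

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now  # injectable clock for tests
        count, start = self.counts[user_id]
        if now - start >= self.window:
            count, start = 0, now              # window expired: reset the counter
        if count >= self.limit:
            retry_after = self.window - (now - start)
            return False, retry_after          # blocked; caller sets Retry-After
        self.counts[user_id] = (count + 1, start)
        return True, 0.0
```

Passing `now` explicitly is what makes "resets after 60 seconds" testable without sleeping, which is exactly the kind of design pressure writing the tests first creates.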

One gotcha: test isolation

Claude sometimes writes implementations that pass tests by hardcoding expected values. Watch for this:

// Red flag: Claude hardcoded the test email
function getUser(email) {
  // Ignores the argument and always returns the test fixture
  return { email: 'test@example.com', name: 'Test User' };
}

Add a second test with different data to catch this.

Running tests in hooks

Pair this with Claude Code hooks to auto-run tests on every save:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{
        "type": "command",
        "command": "npm test -- --passWithNoTests 2>&1 | tail -5"
      }]
    }]
  }
}

Now Claude sees test results immediately after writing each file. It self-corrects without you asking.


TDD with Claude Code flips the dynamic: instead of reviewing AI output and hoping it's right, you define "right" upfront and let AI chase correctness.

The rate limit problem: Running multiple TDD loops (implement → test → refactor → implement again) eats through Claude's rate limits fast. If you're hitting walls, ANTHROPIC_BASE_URL lets you route Claude Code through a $2/month proxy — same Claude models, no rate limit interruptions. 7-day free trial.

What's your TDD workflow with Claude Code? Drop it in the comments.
