brian austin

Posted on Apr 2

Claude Code for testing: write, run, and fix tests without leaving your terminal

#claudecode #testing #python #devtools

Claude Code for testing: write, run, and fix tests without leaving your terminal

One of the most underrated Claude Code workflows is test-driven development — using Claude to write tests, run them, interpret failures, and fix the implementation, all in one loop.

Here's how to set it up properly.

The basic test loop

The simplest version:

claude "Write pytest tests for utils.py, run them, and fix any failures"

Claude will:

Read utils.py
Write test_utils.py
Run pytest test_utils.py
Read the output
Fix failures until all tests pass

This works surprisingly well for straightforward code. But there are patterns that make it much better.

Pattern 1: Spec-first testing

Instead of asking Claude to infer what your code should do, tell it explicitly:

claude "Write tests for the parse_date() function with these requirements:
- Accepts ISO 8601 strings
- Returns Python datetime objects
- Raises ValueError on invalid input
- Handles timezone-aware and timezone-naive strings
Then run the tests and fix parse_date() until all pass."

Spec-first testing catches the case where Claude writes tests that match the buggy implementation rather than the intended behavior.

Pattern 2: Red-green-refactor with Claude

Explicit TDD cycle:

# Step 1: Write failing tests only
claude "Write failing tests for an EmailValidator class. Don't implement the class yet. Just the tests."

# Step 2: Verify they fail
claude "Run the tests and confirm they all fail"

# Step 3: Implement
claude "Now implement EmailValidator to make all tests pass. Don't modify the tests."

# Step 4: Refactor
claude "Refactor EmailValidator for readability without changing any tests"

The key instruction is "Don't modify the tests" — without it, Claude will sometimes take the easy path of making tests less strict.

Pattern 3: Coverage-driven testing

claude "Run pytest --cov=src --cov-report=term-missing on the codebase. 
Find all functions with 0% coverage. 
Write tests for the top 5 uncovered functions.
Run coverage again to confirm improvement."

Claude reads the coverage report, identifies gaps, writes targeted tests, and verifies the improvement. This is far more efficient than manually hunting for uncovered code.

Pattern 4: Test-first bug fixing

When you report a bug, ask Claude to write a failing test first:

claude "There's a bug: parse_date('2024-01-01T00:00:00+05:30') returns UTC time instead of preserving the timezone.
First, write a failing test that demonstrates this bug.
Then fix parse_date() to make the test pass."

This gives you a regression test automatically. The bug can't re-appear without the test failing.

Pattern 5: Property-based testing

claude "Write hypothesis property-based tests for the sort_users() function.
Test properties like: sorted output is always the same length as input,
no user appears twice, users are in ascending order by last_name.
Install hypothesis if needed, run the tests."

Property-based testing finds edge cases you wouldn't think to write manually. Claude is good at identifying properties worth testing.

Pattern 6: Contract tests for APIs

For services that call external APIs:

claude "Write contract tests for our Stripe integration using responses library to mock HTTP calls.
Test: successful charge, card declined, network timeout, rate limit response.
Run the tests and fix any issues."

Handling test configuration

Claude needs context about your test setup. Add this to your CLAUDE.md:

## Testing
- Test framework: pytest
- Run tests: `pytest tests/ -v`
- Run with coverage: `pytest tests/ --cov=src --cov-report=term-missing`
- Fixtures in: tests/conftest.py
- Mock library: unittest.mock (not pytest-mock)
- Test database: uses SQLite in-memory (set automatically by conftest.py)
- DO NOT use real API keys in tests

Without this, Claude may use the wrong test runner, miss the conftest.py fixtures, or attempt to call real APIs in tests.

The rate limit problem

Long test cycles hit a specific pain point: Claude writes 50 tests, runs them, starts fixing failures — and hits a rate limit mid-session. The context is gone. You have to start over.

The fix is ANTHROPIC_BASE_URL. Point it at a proxy that doesn't throttle:

export ANTHROPIC_BASE_URL=https://simplylouie.com
claude "Run the full test suite and fix all failures"

SimplyLouie ($2/month) removes the rate limits that interrupt long test-fix cycles. The session can run until the tests pass instead of stopping when the API decides you've used too much.

Custom slash command for test loops

Create .claude/commands/test-fix.md:

Run the test suite: `pytest $ARGUMENTS -v`

For each failing test:
1. Read the error message carefully
2. Find the source of the failure (test issue or implementation issue?)
3. Fix the implementation (not the test, unless the test is wrong)
4. Re-run only that test to verify the fix
5. Continue until all tests in $ARGUMENTS pass
6. Run the full suite to check for regressions

Report: tests fixed, tests still failing, any tests you skipped and why.

Usage:

/test-fix tests/test_auth.py
/test-fix tests/ -k "payment"

What Claude is bad at with testing

Flaky tests: Claude will keep trying to fix a test that passes 90% of the time, getting confused by intermittent failures. Tell it explicitly: "If a test fails less than 3 times in 5 runs, it's flaky — mark it with @pytest.mark.flaky and move on."

Database state: Without proper teardown, tests pollute each other. Always specify: "Each test must clean up after itself. Use transactions that roll back, or recreate the test database."

Snapshot tests: Claude will update snapshots to match broken output. Always add: "Never update snapshots to match test failures — snapshots are the source of truth."

A complete CLAUDE.md testing section

## Testing rules
- Never modify existing tests to make them pass (fix the implementation)
- Never update snapshots without explicit instruction
- Use transactions for database tests (roll back after each test)
- Mark flaky tests with @pytest.mark.flaky rather than retrying forever
- Write the failing test before fixing any reported bug
- Run the full suite before declaring work complete

With these patterns in your CLAUDE.md and the test-fix slash command, Claude becomes a competent test engineer that fixes implementations instead of tests — which is the only kind of test engineer worth having.

DEV Community

Claude Code for testing: write, run, and fix tests without leaving your terminal

Claude Code for testing: write, run, and fix tests without leaving your terminal

The basic test loop

Pattern 1: Spec-first testing

Pattern 2: Red-green-refactor with Claude

Pattern 3: Coverage-driven testing

Pattern 4: Test-first bug fixing

Pattern 5: Property-based testing

Pattern 6: Contract tests for APIs

Handling test configuration

The rate limit problem

Custom slash command for test loops

What Claude is bad at with testing

A complete CLAUDE.md testing section

Top comments (0)