DEV Community

nareshipme

Teaching an AI Agent to Do TDD (And Actually Mean It)

One of the more interesting problems I've been sitting with lately is this: how do you make an AI coding agent disciplined?

Not capable — that part is mostly solved. But disciplined. The kind of discipline that stops a developer from writing five tests at once before any of them are green. The kind that insists every acceptance criterion has a named test before you call the story done. That's the hard part.

The Problem With "Just Tell It"

My first instinct was to write a rule: "follow TDD." Simple enough. But vague instructions produce vague compliance. The agent would write a test, write all the implementation, then nod along when you asked if it did TDD. Technically yes. Practically no.

The fix was to get specific about the atomic unit. Not "write tests first" but "write exactly one failing test, make it pass, commit — then and only then write the next test." That single constraint changes the whole rhythm. It forces small commits, it surfaces design issues early, and crucially, it's checkable. Either the commit history shows alternating test/impl commits or it doesn't.
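Because the constraint is checkable, you can sketch the check itself. Below is a minimal sketch, assuming a hypothetical commit-message convention where each red/green cycle yields a "test:"-prefixed commit followed by an implementation commit; the function name and prefixes are illustrative, not part of any real tool.

```python
# Sketch: verify that a commit history follows the one-test-at-a-time rhythm.
# Assumes a hypothetical convention: each cycle is a "test:" commit (red)
# followed immediately by an implementation commit that makes it green.

def check_tdd_rhythm(messages: list[str]) -> list[str]:
    """Return rule violations found in an oldest-first list of commit messages."""
    violations: list[str] = []
    pending_test: str | None = None
    for msg in messages:
        if msg.startswith("test:"):
            if pending_test is not None:
                # Two failing tests stacked up before either went green.
                violations.append(f"two tests in a row: {pending_test!r} then {msg!r}")
            pending_test = msg
        else:
            if pending_test is None:
                # Implementation landed without a failing test driving it.
                violations.append(f"implementation with no preceding test: {msg!r}")
            pending_test = None
    if pending_test is not None:
        violations.append(f"dangling failing test never made green: {pending_test!r}")
    return violations
```

Fed an alternating history like `["test: add auth", "feat: implement auth"]`, this returns no violations; stack two `test:` commits back to back and it flags the break in rhythm.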

Vertical Slices as Test Scenarios

The other shift was in how stories are broken down into tasks. Instead of horizontal layers — "write the DB migration, then the service layer, then the controller, then the UI" — I moved to vertical slices expressed as BDD scenarios. Each task looks like:

Given [starting state]
When [user or system action]
Then [observable outcome]
Hint: [which layer to touch and roughly how]

This matters because it keeps the agent (or any developer, really) anchored to user-observable behavior rather than implementation shape. The hint gives just enough technical direction without over-constraining the solution. And because the scenario doubles as a test case, there's no gap between "task complete" and "test written."
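To make the "scenario doubles as a test case" point concrete, here is one slice written straight through as a test. Everything in it is hypothetical — the password-reset feature, the helper, and the data shape are illustrative stand-ins, not the author's actual codebase.

```python
# Sketch: a vertical-slice task expressed directly as a Given/When/Then test.
# The feature (password reset) and the reset_password helper are hypothetical.

def reset_password(users: dict, email: str) -> str:
    """Minimal stand-in for the slice under test: issue a reset token."""
    if email not in users:
        return "no-account-email-sent"  # same outward behavior for unknown emails
    users[email]["reset_token"] = "tok-123"
    return "reset-email-sent"

def test_registered_user_requests_password_reset():
    # Given a registered user
    users = {"ada@example.com": {"reset_token": None}}
    # When she requests a password reset
    outcome = reset_password(users, "ada@example.com")
    # Then she receives a reset email and a token is stored
    assert outcome == "reset-email-sent"
    assert users["ada@example.com"]["reset_token"] is not None
```

The Given/When/Then structure survives as comments, so the task definition and the finished test are literally the same artifact — there is nothing left to translate when the slice is done.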

Traceability Before You Merge

The last piece was AC traceability. Before a PR gets created, the agent now has to produce an explicit mapping: Acceptance Criterion → test(s) that cover it. If a criterion has no test, the story isn't done. Full stop.

This sounds obvious, but it's surprisingly easy to skip when you're moving fast. Making it a gate — not a suggestion — changes the default. The PR description ends up with a little table that reviewers can actually use.
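The gate itself is a few lines once tests declare which criteria they cover. A minimal sketch, assuming a hypothetical convention where each test is tagged with the AC ids it claims; the function names and the AC-id format are mine, not from any real tool.

```python
# Sketch: AC -> test traceability gate, run before a PR is created.
# Assumes a hypothetical convention: each test declares the AC ids it covers.

def trace_acceptance_criteria(
    criteria: dict[str, str],          # AC id -> criterion text
    coverage: dict[str, list[str]],    # test name -> AC ids it claims to cover
) -> dict[str, list[str]]:
    """Map every acceptance criterion to the tests covering it."""
    mapping: dict[str, list[str]] = {ac: [] for ac in criteria}
    for test, acs in coverage.items():
        for ac in acs:
            if ac in mapping:
                mapping[ac].append(test)
    return mapping

def gate_passes(mapping: dict[str, list[str]]) -> bool:
    """Story is done only if every criterion has at least one test."""
    return all(mapping.values())

def render_table(criteria: dict[str, str], mapping: dict[str, list[str]]) -> str:
    """Render the mapping as a markdown table for the PR description."""
    rows = ["| AC | Criterion | Tests |", "| --- | --- | --- |"]
    for ac, text in criteria.items():
        tests = ", ".join(mapping[ac]) or "MISSING"
        rows.append(f"| {ac} | {text} | {tests} |")
    return "\n".join(rows)
```

An AC with an empty test list renders as `MISSING` and fails `gate_passes`, which is exactly the "no test, not done, full stop" behavior — the table drops into the PR description unchanged.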

Takeaway

The pattern here applies beyond AI agents. Codifying your team's development discipline as explicit, checkable constraints — rather than cultural norms — makes it stick. Norms drift. Checklists don't.

If you're working with AI coding assistants, the investment in tight agent configuration pays back fast. The more precise the rules, the less time you spend reviewing slop.
