
VibeTDD Experiment 1: Teaching a Calculator with Test-Driven Development

This is the first experiment in my VibeTDD series, where I systematically explore how AI and Test-Driven Development can work together effectively.

The Setup

I decided to start with the classic TDD exercise: building a calculator. My approach was simple - let Claude lead the entire process while I observed how well AI could teach and follow TDD principles.

The Rules I Set:

  • Claude would guide the exercise and make all technical decisions
  • I would only copy-paste the code it produced
  • When Claude asked what to do next, I'd tell it to decide
  • No TDD guidance from me - I wanted to see AI's natural approach

The Tech Stack:

  • Kotlin (my favorite language)
  • Maven for build management
  • JUnit 5 for testing framework
  • Kotest for assertions (more interesting than standard JUnit)

Phase 1: Project Setup

Claude started correctly by establishing the foundation:

<!-- pom.xml excerpt -->
<dependencies>
    <dependency>
        <groupId>org.jetbrains.kotlin</groupId>
        <artifactId>kotlin-stdlib</artifactId>
        <version>1.9.20</version>
    </dependency>

    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.10.0</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>io.kotest</groupId>
        <artifactId>kotest-assertions-core-jvm</artifactId>
        <version>5.7.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
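
One thing the excerpt leaves out is the build section. To actually compile the Kotlin sources and have Maven run the JUnit 5 tests, the pom also needs the Kotlin and Surefire plugins; a minimal sketch (not part of the original excerpt, versions chosen to match the dependencies above):

<!-- Minimal build section (not shown in the excerpt above) -->
<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <!-- Compiles main and test Kotlin sources -->
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>1.9.20</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <goals><goal>compile</goal></goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <goals><goal>test-compile</goal></goals>
                </execution>
            </executions>
        </plugin>
        <!-- Surefire 3.x picks up JUnit 5 tests without extra configuration -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.1.2</version>
        </plugin>
    </plugins>
</build>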

The project structure was clean and followed conventions:

src/
├── main/kotlin/com/example/calculator/Calculator.kt
└── test/kotlin/com/example/calculator/CalculatorTest.kt

First Impression: Claude understood modern project setup and chose sensible dependencies.

Phase 2: The First Red-Green-Refactor Cycle

RED: The First Failing Test

Claude started with the simplest possible test:

@Test
fun `should add two positive numbers`() {
    val result = calculator.add(2, 3)
    result shouldBe 5
}

With an empty Calculator class:

class Calculator {
}

Observation: Perfect TDD start - write a test that fails for the right reason.
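
For context, the post doesn't reproduce the full test file. The surrounding class would look roughly like this, with the Kotest matcher import and a Calculator instance per test. At this stage it doesn't even compile, since add doesn't exist yet - that compile error is the red state:

package com.example.calculator

import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class CalculatorTest {

    // JUnit 5 creates a new instance of the test class for each test method,
    // so every test gets a fresh Calculator
    private val calculator = Calculator()

    @Test
    fun `should add two positive numbers`() {
        val result = calculator.add(2, 3)
        result shouldBe 5
    }
}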

GREEN: Making It Pass (The Hard Way)

Here's where it got interesting. Claude implemented this:

class Calculator {
    fun add(a: Int, b: Int): Int {
        return 5  // Hardcoded!
    }
}

This was actually correct TDD! Many developers would jump straight to return a + b, but Claude followed the discipline of writing the minimal code to make the test pass.

The Second Test Forces Generalization

Claude then added:

@Test
fun `should add different positive numbers`() {
    val result = calculator.add(4, 7)
    result shouldBe 11
}

Now the hardcoded return 5 couldn't satisfy both tests, forcing the proper implementation:

fun add(a: Int, b: Int): Int {
    return a + b
}

Key Learning: AI understood the triangulation principle - you need multiple test cases to drive toward general solutions.

Phase 3: Building Up Operations

Claude continued the pattern for each operation:

Subtraction

// Test first
@Test
fun `should subtract two numbers`() {
    val result = calculator.subtract(7, 3)
    result shouldBe 4
}

// Then implementation
fun subtract(a: Int, b: Int): Int {
    return a - b
}

Multiplication with Edge Cases

@Test
fun `should multiply two positive numbers`() {
    val result = calculator.multiply(3, 4)
    result shouldBe 12
}

@Test
fun `should multiply by zero`() {
    val result = calculator.multiply(5, 0)
    result shouldBe 0
}
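
The post doesn't show the matching implementation, but following the same minimal-step pattern it presumably ends up as:

fun multiply(a: Int, b: Int): Int {
    return a * b
}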

Division with Error Handling

This is where Claude showed sophisticated understanding:

@Test
fun `should divide two numbers`() {
    val result = calculator.divide(12, 3)
    result shouldBe 4
}

@Test
fun `should throw exception when dividing by zero`() {
    val exception = org.junit.jupiter.api.assertThrows<ArithmeticException> {
        calculator.divide(10, 0)
    }
    exception.message shouldBe "Division by zero is not allowed"
}

And the implementation:

fun divide(a: Int, b: Int): Int {
    if (b == 0) {
        throw ArithmeticException("Division by zero is not allowed")
    }
    return a / b
}

Impressive: Claude naturally progressed to testing exception scenarios and error messages.
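
A small stylistic note: since Kotest is already a dependency, the same check could be written with Kotest's shouldThrow instead of the fully qualified JUnit assertThrows. A sketch of the equivalent test:

// imports at the top of the test file
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe

@Test
fun `should throw exception when dividing by zero`() {
    // shouldThrow returns the caught exception, so the message can be asserted directly
    val exception = shouldThrow<ArithmeticException> {
        calculator.divide(10, 0)
    }
    exception.message shouldBe "Division by zero is not allowed"
}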

Phase 4: Test Refactoring Lesson

After building the basic functionality, I noticed redundant tests and mentioned it. Claude immediately identified the issue:

// These two tests were redundant:
should add different positive numbers  // (4, 7) -> 11
should add number with zero           // (5, 0) -> 5

But here's where I learned something important. When we removed the "redundant" test, Claude pointed out a crucial flaw:

"Now I can do this and all tests are green:

fun add(a: Int, b: Int): Int {
    return 5
}

The Lesson: What seemed redundant was actually providing necessary triangulation. One test allows hardcoding; multiple tests force generalization.

Phase 5: Professional Test Structure

Claude then suggested refactoring to parameterized tests:


@ParameterizedTest
@CsvSource(
    "2, 3, 5",
    "-2, -3, -5", 
    "5, 0, 5",
    "0, 0, 0"
)
fun `should add numbers correctly`(a: Int, b: Int, expected: Int) {
    val result = calculator.add(a, b)
    result shouldBe expected
}

This elegantly solved the triangulation problem while reducing test maintenance burden.

Final Test Structure:

  • 3 parameterized tests for normal operations
  • 2 individual tests for exception cases
  • Comprehensive coverage with minimal redundancy

What I Discovered

✅ AI Understands TDD Fundamentals

  • Wrote tests before implementation consistently
  • Followed red-green-refactor cycles
  • Used triangulation to drive general solutions
  • Recognized when to test exceptions vs. normal cases

✅ AI Taught Good Practices

  • Suggested parameterized tests for better maintainability
  • Explained the reasoning behind each TDD step
  • Identified test redundancy and optimization opportunities
  • Showed proper exception testing patterns
  • Defined meaningful method names

⚠️ Areas Needing Human Oversight

  • Auto-progression: Claude started making decisions without asking
  • Context switching: Sometimes lost track of which phase we were in
  • Optimization timing: Needed guidance on when to refactor vs. add features

❌ Potential Pitfalls

  • Could have over-engineered early if not constrained by simple tests
  • Might not naturally consider all edge cases without prompting
  • Test refactoring decisions needed human judgment

The Verdict

VibeTDD works surprisingly well for simple, well-defined problems. Claude demonstrated solid understanding of TDD principles and could teach them effectively. The test-first approach kept the AI focused and prevented over-engineering.

However, this was just a calculator - the next challenge will be more telling.

Key Takeaways for VibeTDD

  1. AI can teach TDD basics effectively but needs human oversight for decisions
  2. Tests provide excellent guardrails for AI-generated code
  3. Triangulation is crucial: don't remove "redundant" tests too quickly
  4. Parameterized tests are a game-changer for maintainable test suites
  5. Red-green-refactor discipline keeps AI from over-engineering

Next: The Real Challenge

The calculator experiment was encouraging, but it's a toy problem. Next, I'm taking on the Portfo payout service challenge - a real-world problem with business rules, validation logic, and architectural decisions.

Will AI maintain TDD discipline when faced with:

  • Complex business requirements?
  • Multiple validation rules?
  • Integration concerns?
  • Architectural trade-offs?

The calculator taught us the basics work. Now let's see if VibeTDD scales to realistic complexity.
