
VibeTDD Experiment 1: Teaching a Calculator with Test-Driven Development

This is the first experiment in my VibeTDD series, where I systematically explore how AI and Test-Driven Development can work together effectively.

The Setup

I decided to start with the classic TDD exercise: building a calculator. My approach was simple - let Claude lead the entire process while I observed how well AI could teach and follow TDD principles.

The Rules I Set:

  • Claude would guide the exercise and make all technical decisions
  • I would only copy-paste the code it produced
  • When Claude asked what to do next, I'd tell it to decide
  • No TDD guidance from me - I wanted to see AI's natural approach

The Tech Stack:

  • Kotlin (my favorite language)
  • Maven for build management
  • JUnit 5 for testing framework
  • Kotest for assertions (more interesting than standard JUnit)

Phase 1: Project Setup

Claude started correctly by establishing the foundation:

<!-- pom.xml excerpt -->
<dependencies>
    <dependency>
        <groupId>org.jetbrains.kotlin</groupId>
        <artifactId>kotlin-stdlib</artifactId>
        <version>1.9.20</version>
    </dependency>

    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.10.0</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>io.kotest</groupId>
        <artifactId>kotest-assertions-core-jvm</artifactId>
        <version>5.7.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
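
One thing the excerpt leaves out is the build section. To actually compile the Kotlin sources and have Maven run the JUnit 5 tests, the pom also needs the Kotlin and Surefire plugins; a minimal sketch (not part of the original excerpt, versions chosen to match the dependencies above):

<!-- Minimal build section (not shown in the excerpt above) -->
<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <!-- Compiles main and test Kotlin sources -->
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>1.9.20</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <goals><goal>compile</goal></goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <goals><goal>test-compile</goal></goals>
                </execution>
            </executions>
        </plugin>
        <!-- Surefire 3.x picks up JUnit 5 tests without extra configuration -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.1.2</version>
        </plugin>
    </plugins>
</build>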

The project structure was clean and followed conventions:

src/
├── main/kotlin/com/example/calculator/Calculator.kt
└── test/kotlin/com/example/calculator/CalculatorTest.kt

First Impression: Claude understood modern project setup and chose sensible dependencies.

Phase 2: The First Red-Green-Refactor Cycle

RED: The First Failing Test

Claude started with the simplest possible test:

@Test
fun `should add two positive numbers`() {
    val result = calculator.add(2, 3)
    result shouldBe 5
}

With an empty Calculator class:

class Calculator {
}

Observation: Perfect TDD start - write a test that fails for the right reason.
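
For context, the post doesn't reproduce the full test file. The surrounding class would look roughly like this, with the Kotest matcher import and a Calculator instance per test. At this stage it doesn't even compile, since add doesn't exist yet - that compile error is the red state:

package com.example.calculator

import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class CalculatorTest {

    // JUnit 5 creates a new instance of the test class for each test method,
    // so every test gets a fresh Calculator
    private val calculator = Calculator()

    @Test
    fun `should add two positive numbers`() {
        val result = calculator.add(2, 3)
        result shouldBe 5
    }
}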

GREEN: Making It Pass (The Hard Way)

Here's where it got interesting. Claude implemented this:

class Calculator {
    fun add(a: Int, b: Int): Int {
        return 5  // Hardcoded!
    }
}

This was actually correct TDD! Many developers would jump straight to return a + b, but Claude followed the discipline of writing the minimal code to make the test pass.

The Second Test Forces Generalization

Claude then added:

@Test
fun `should add different positive numbers`() {
    val result = calculator.add(4, 7)
    result shouldBe 11
}

Now the hardcoded return 5 couldn't satisfy both tests, forcing the proper implementation:

fun add(a: Int, b: Int): Int {
    return a + b
}

Key Learning: AI understood the triangulation principle - you need multiple test cases to drive toward general solutions.

Phase 3: Building Up Operations

Claude continued the pattern for each operation:

Subtraction

// Test first
@Test
fun `should subtract two numbers`() {
    val result = calculator.subtract(7, 3)
    result shouldBe 4
}

// Then implementation
fun subtract(a: Int, b: Int): Int {
    return a - b
}

Multiplication with Edge Cases

@Test
fun `should multiply two positive numbers`() {
    val result = calculator.multiply(3, 4)
    result shouldBe 12
}

@Test
fun `should multiply by zero`() {
    val result = calculator.multiply(5, 0)
    result shouldBe 0
}
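
The post doesn't show the matching implementation, but following the same minimal-step pattern it presumably ends up as:

fun multiply(a: Int, b: Int): Int {
    return a * b
}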

Division with Error Handling

This is where Claude showed sophisticated understanding:

@Test
fun `should divide two numbers`() {
    val result = calculator.divide(12, 3)
    result shouldBe 4
}

@Test
fun `should throw exception when dividing by zero`() {
    val exception = org.junit.jupiter.api.assertThrows<ArithmeticException> {
        calculator.divide(10, 0)
    }
    exception.message shouldBe "Division by zero is not allowed"
}

And the implementation:

fun divide(a: Int, b: Int): Int {
    if (b == 0) {
        throw ArithmeticException("Division by zero is not allowed")
    }
    return a / b
}

Impressive: Claude naturally progressed to testing exception scenarios and error messages.
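
A small stylistic note: since Kotest is already a dependency, the same check could be written with Kotest's shouldThrow instead of the fully qualified JUnit assertThrows. A sketch of the equivalent test:

// imports at the top of the test file
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe

@Test
fun `should throw exception when dividing by zero`() {
    // shouldThrow returns the caught exception, so the message can be asserted directly
    val exception = shouldThrow<ArithmeticException> {
        calculator.divide(10, 0)
    }
    exception.message shouldBe "Division by zero is not allowed"
}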

Phase 4: Test Refactoring Lesson

After building the basic functionality, I noticed redundant tests and mentioned it. Claude immediately identified the issue:

// These two tests were redundant:
should add different positive numbers  // (4, 7) -> 11
should add number with zero           // (5, 0) -> 5

But here's where I learned something important. When we removed the "redundant" test, Claude pointed out a crucial flaw:

"Now I can do this and all tests are green:

fun add(a: Int, b: Int): Int {
    return 5
}

The Lesson: What seemed redundant was actually providing necessary triangulation. One test allows hardcoding; multiple tests force generalization.

Phase 5: Professional Test Structure

Claude then suggested refactoring to parameterized tests:


@ParameterizedTest
@CsvSource(
    "2, 3, 5",
    "-2, -3, -5", 
    "5, 0, 5",
    "0, 0, 0"
)
fun `should add numbers correctly`(a: Int, b: Int, expected: Int) {
    val result = calculator.add(a, b)
    result shouldBe expected
}

This elegantly solved the triangulation problem while reducing test maintenance burden.

Final Test Structure:

  • 3 parameterized tests for normal operations
  • 2 individual tests for exception cases
  • Comprehensive coverage with minimal redundancy

What I Discovered

✅ AI Understands TDD Fundamentals

  • Wrote tests before implementation consistently
  • Followed red-green-refactor cycles
  • Used triangulation to drive general solutions
  • Recognized when to test exceptions vs. normal cases

✅ AI Taught Good Practices

  • Suggested parameterized tests for better maintainability
  • Explained the reasoning behind each TDD step
  • Identified test redundancy and optimization opportunities
  • Showed proper exception testing patterns
  • Defined meaningful method names

⚠️ Areas Needing Human Oversight

  • Auto-progression: Claude started making decisions without asking
  • Context switching: Sometimes lost track of which phase we were in
  • Optimization timing: Needed guidance on when to refactor vs. add features

❌ Potential Pitfalls

  • Could have over-engineered early if not constrained by simple tests
  • Might not naturally consider all edge cases without prompting
  • Test refactoring decisions needed human judgment

The Verdict

VibeTDD works surprisingly well for simple, well-defined problems. Claude demonstrated solid understanding of TDD principles and could teach them effectively. The test-first approach kept the AI focused and prevented over-engineering.

However, this was just a calculator - the next challenge will be more telling.

Key Takeaways for VibeTDD

  1. AI can teach TDD basics effectively but needs human oversight for decisions
  2. Tests provide excellent guardrails for AI-generated code
  3. Triangulation is crucial: don't remove "redundant" tests too quickly
  4. Parameterized tests are a game-changer for maintainable test suites
  5. Red-green-refactor discipline keeps AI from over-engineering

Next: The Real Challenge

The calculator experiment was encouraging, but it's a toy problem. Next, I'm taking on the Portfo payout service challenge - a real-world problem with business rules, validation logic, and architectural decisions.

Will AI maintain TDD discipline when faced with:

  • Complex business requirements?
  • Multiple validation rules?
  • Integration concerns?
  • Architectural trade-offs?

The calculator taught us the basics work. Now let's see if VibeTDD scales to realistic complexity.
