shaman-apprentice
AI-assisted Test Driven Development Experiment Quick Takes

Intrigued by Kent Beck's blog Augmented Coding: Beyond the Vibes, I wanted to experiment with AI-assisted Test Driven Development capabilities. For that, I coded the small library release-parrot.

This blog is a quick personal review - any feedback is very welcome 🤗

Quick takes

Did AI[1] help me to

  • finish the project quicker? I assume no
  • write a better test suite?[2] I assume no
  • improve the code quality? I assume no
  • improve the documentation? Yes! It relieved me of the (for me) tedious task of writing documentation and did it well. Before publishing, I reviewed everything myself. Afterwards, I simply asked the AI, 'Is the project ready for its first release?' It found two errors in the documentation that I had overlooked (it had forgotten to update a section when it made some changes, but I hadn't caught them either at the time). In the final review, the AI did a better job than I did - nice!
  • level up? I assume yes
    • I used Node.js' native test runner for the first time and tried out JSDoc instead of TypeScript for strong typing. At the beginning, it was nice to have the AI write the code, so that I didn't have to look everything up first. I ended up reading some docs anyway, though, and I honestly don't know whether I would have been equally fast, and gained a deeper and more lasting understanding, if I had read the related documentation first.
    • The opportunity to ask questions and get detailed answers (which should be examined critically), then to ask again differently or in more depth and get further detailed answers, was nice brain food for me, which I used as a starting point for further reading.
    • The AI-generated code tended to differ from my code style, and it was educational to weigh the pros and cons of each.

Why do I sometimes only assume a yes/no answer? I love coding myself, and waiting for the AI and reading all its output was exhausting for me. I think I know how I would have done it without AI, but I didn't actually do it, so I feel an honest, objective comparison isn't possible for me.

Did the project create real value?

I don't think so. I could probably have used something already battle-tested, like semantic-release, instead of writing my own library.

Can AI produce production code from start to finish without senior supervision?

For now, I have strong doubts. Even in this small project, the AI sometimes got stuck in a loop and produced hundreds of lines of code that did not work as intended, for a simple change I could have made in about three lines. Bad prompting? Bad luck? How much worse would that get in a bigger code base, amplified by accumulated AI code that was never pruned under senior supervision?

For the time being, I think AI is a great tool! But it is not (yet?) ready to be a senior developer by itself.

Footnotes

  • [1] I used GPT-5-Codex (Preview), Claude Sonnet 3.5 and 4. You can find my basic TDD prompt here.
  • [2] I strongly believe that more tests or higher test coverage have nothing to do with test quality. Tests that give me confidence for refactoring, or serve as documentation, are good tests. Tests that help me define an interface or verify that my logic works as expected are good tests. However, tests have to be maintained, and IMHO they can even have a net negative effect on the maintainability of a code base.
