DEV Community

Marabesi

Posted on • Originally published at marabesi.com

AI and TDD - A match that can work?

Test-Driven Development (TDD) is a software development approach that emphasises writing tests before writing the actual code. This method not only ensures that the code meets the requirements but also helps maintain code quality through refactoring. With the advent of AI, TDD can take advantage of it, lowering the friction for developers to adopt the practice.

In this post, we will explore how AI can be integrated into TDD workflows, providing practical examples and insights into the benefits of this approach based on what research has shown.

Before we start, let's clarify what AI means for the purpose of this post. AI refers to the use of LLMs (Large Language Models) such as ChatGPT, Copilot, and others that can assist in generating code and tests, and even in suggesting improvements to the codebase. Throughout this post, we will refer to AI as the use of LLMs in the context of TDD workflows, and whenever needed we will specify which LLM was used.

Research

In the world of software development, TDD has been a game-changer. It encourages developers to think about the requirements and design of their code before implementation. However, writing tests can be time-consuming. It is not only the advent of AI that has looked at improving the TDD process: automatic test-case generation through a defined algorithm has also been explored, and was shown to detect more errors than the traditional TDD approach.

In that sense, AI can assist in generating tests, suggesting improvements, and even automating parts of the TDD process. Research on writing tests has shown that AI can significantly reduce the time spent on them; however, an experienced developer is still needed to ensure that the tests are meaningful and cover the necessary scenarios. Beyond that, generated code might not be secure, which reinforces the need for an expert.

TDD has been placed in the context of generative AI in an attempt to improve the way developers write code, mostly targeted at incorporating AI inputs into the TDD steps. A study by Mock used two different scenarios:

  1. Fully-automated - the developer acts as a navigator, checking the AI-generated code and tests and making sure that they meet the requirements and are secure.
  2. Collaborative model - the AI suggests tests and code based on the developer's demand; the developer modifies the output, and the flow repeats.

The fully-automated model allows the AI to generate tests and code without human intervention, while the collaborative model involves the AI suggesting tests and code, which the developer can then review and modify. A Python script was used to gather the metrics and integrate with the ChatGPT API. Notably, the fully-automated approach was the one that took the least time to complete the task in their experiment.

Piya also explored the use of LLMs in TDD workflows, focusing on a flow in which the LLM generated tests and code that the developer could then review and modify; developers were also able to use the LLM to adjust the code. Their work used the ChatGPT website directly.

Shin adopted a similar approach; ChatGPT was the model that scored best in terms of meeting the requirements, while Copilot and Gemini scored the same as each other. In the next section, we will move on to discuss these approaches to combining TDD and LLMs.

Discussion

As interesting as it might sound, the fully-automated approach is still a technique to be refined. The vibe-coding selling point is that AI can write code for you, but it still requires a human to review and modify the code to ensure that it meets the requirements and is secure.

So far, academic research has focused on how AI can assist in TDD workflows, and experiments have been run in controlled environments. Despite positive results with students in brownfield projects, research still has a way to go to support practitioners who have to deal with existing spaghetti code bases that are hard to maintain. From this point of view, it becomes even more urgent to use AI to create the safety net that TDD builds manually to support software evolution. Equal attention must be paid to the quality of the test code generated by AI, as a human is still needed to ensure that the tests are meaningful and cover the necessary scenarios.

Professional software development has long been a team sport, dating back to the agile boom. Research suggests that developers read more code than they write. Rather than prioritising speed alone, LLM development should focus on fostering understandability, maintainability, and a sustainable pace. Achieving a task quickly is a beneficial consequence, but quality should remain the primary concern. Unless the game rules change: is it possible to generate things quickly and avoid maintenance altogether?

AI in TDD Workflows in practice

In this section, we will look at how AI can be effectively integrated into TDD workflows, with practical examples and insights drawn from my own experience.

ReactJs

Last year, I experimented with using AI to generate tests for a ReactJs application, using a combination of AI tools to make sure the components were covered by unit tests.

The AI (Copilot) was able to generate tests that covered the basic functionality of the components, but it still required manual intervention to ensure that the tests were meaningful; the most common aid needed was identifying the test doubles required to isolate the components from their dependencies. A few things to note about this:

  • The developer is actively involved in the process, reviewing and modifying the AI-generated tests.
  • The AI is used to generate the initial test cases, which can save time and effort. However, from practical experience, LLMs are not able to generate the test-doubles needed to isolate the components from their dependencies in specific cases.
  • The developer is responsible for ensuring that the tests are meaningful, which is a crucial part of TDD.
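The test-double problem the LLM struggled with is not React-specific. Below is a framework-free sketch in Python (all names hypothetical) of the same idea: a hand-rolled stub replaces a real dependency so the unit under test can be exercised in isolation:

```python
class PriceLabel:
    """A tiny 'component' that renders text using a collaborator it depends on."""
    def __init__(self, price_service):
        self.price_service = price_service

    def render(self, sku):
        return f"Price: {self.price_service.price_for(sku):.2f}"

class StubPriceService:
    """Hand-rolled test double standing in for the real (e.g. HTTP-backed) service."""
    def price_for(self, sku):
        return 9.99  # canned answer; no network, no real dependency

def test_price_label_renders_price():
    # the stub isolates PriceLabel from the real pricing service
    label = PriceLabel(StubPriceService())
    assert label.render("ABC-1") == "Price: 9.99"
```

In the React case, the equivalent step is stubbing or mocking the component's service and hook dependencies before rendering it, which is exactly the part that needed human help.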

In addition to that, prompts need to be specific to the context of the code being tested and adhere to a certain pattern to make them reproducible. Prompt engineering can help with that; for example, making the LLM respond in JSON might limit hallucinations (page 60).
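As a hedged illustration of that idea, a prompt can demand a fixed JSON shape and the calling code can reject anything else. The template and key names below are assumptions made up for this example:

```python
import json

PROMPT_TEMPLATE = (
    "Generate a unit test for the component below.\n"
    'Respond ONLY with JSON of the shape {"test_name": string, "test_code": string}.\n\n'
    "{source}"
)

def parse_llm_reply(reply):
    """Reject replies that are not the JSON shape the prompt demanded."""
    data = json.loads(reply)  # raises ValueError on free-form prose
    if not {"test_name", "test_code"} <= data.keys():
        raise ValueError("missing required keys")
    return data
```

Free-form prose from the model then fails fast at `json.loads` instead of silently polluting the test suite, and the fixed template keeps the prompt reproducible across runs.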

Is it a match?

The once-established TDD workflow, a cycle of writing a test, writing the code to make it pass, and then refactoring, is now being upgraded with a new step: using AI to generate the tests and even the production code. The AI can assist in generating tests and suggesting improvements. The shift is that the red-green-refactor cycle has mutated into:

  • red-green-AI-refactor
  • red-AI-green-AI-refactor
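A small, hand-written sketch of where the AI step slots in: the red step states the requirement as a failing test, and the green step, the part an LLM might draft for the developer to review, makes it pass:

```python
# red: written first by the developer; fails until fizzbuzz exists and is correct
def test_fizzbuzz():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
    assert fizzbuzz(7) == "7"

# green: a candidate implementation an LLM could generate, reviewed by a human
def fizzbuzz(n):
    words = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
    return words or str(n)
```

The refactor step then runs under the same test, which holds regardless of whether the human or the AI performs the rewrite.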

LLMs can assist in generating tests, code, and even refactoring, with encouraging positive results; however, all of it requires a human in the loop who can assess the output. Which goes back to the point that to practise TDD (or to assess its output) one first needs to understand TDD, not only in theory but also in practice, a skill that takes time to develop.
