Jessica Patel

AI-Powered Testing: Writing Smarter, Faster Unit Tests with GitHub Copilot

As developers, we understand that automated unit testing is essential to modern software development. It helps detect bugs early, improve code quality, minimize human error, streamline refactoring, accelerate development cycles, and ultimately lower costs. Most importantly, it ensures the software behaves as intended and meets its specified requirements.

Nevertheless, the traditional approach can feel like an uphill battle: it is often repetitive and time-consuming, and it requires deep knowledge of application logic and testing frameworks.


This is where GitHub Copilot comes into play. Even though it was initially intended for code generation, Copilot also excels at automating test creation by providing not just one or two but multiple suggestions that you can cycle through to enhance accuracy and efficiency.

In this article, we’ll walk you through how you can use GitHub Copilot to streamline unit testing, improve test coverage, and save valuable development time.

Setting Up GitHub Copilot for Testing

Before diving in, let’s first discuss unit testing in software. However, if you are already familiar with this, feel free to skip to the next section.

What is Unit Testing?

Like any other product or service, a software application must be thoroughly tested before it's released. Testing is a crucial phase of the software development lifecycle (SDLC) that involves assessing a program's functionality to ensure it meets its requirements.

When it comes to unit testing, this means examining the smallest functional units (functions, methods, and classes) in isolation to verify that each behaves as expected. The goal is to find bugs and confirm that all components run correctly before launch.

Nonetheless, manual unit testing is no walk in the park: it's time-consuming, monotonous, and error-prone, and it demands dedication and attention to detail.

Sometimes you might even write unnecessary tests with far less value than expected. Hence the need to streamline the process.

Installing and Configuring Copilot in VS Code

Before installing GitHub Copilot in VS Code, make sure you meet the following requirements:

Prerequisites

  1. GitHub Account — register here if you don’t have one.
  2. Supported Programming Language SDK — install the SDK for your go-to language like Python, JavaScript, TypeScript, etc.
  3. Preferred IDE — in this case VS Code.

Setting Up VS Code Copilot

  • Start by installing the GitHub Copilot extension in VS Code: install it from the marketplace, or search for "GitHub Copilot" in the Extensions view as illustrated. Restart VS Code after installation.

Copilot Icon

  • Sign in to your GitHub account using the Chat view in the side panel. To start using Copilot, ensure the extension is active; you can check this by clicking the Copilot icon in the status bar at the bottom right of the VS Code window.

VSCode CoPilot Icon

  • If you start seeing suggestions as you type, the installation was successful.

There are several ways to interact with Copilot, the most obvious and by far the easiest being Copilot Chat. Here, you interact with Copilot conversationally by sending prompts as text. If you've ever used Copilot on GitHub.com or ChatGPT, this will feel like riding a bike.

There are also inline code suggestions, which appear in real time as grayed-out text while you type.

And for those who are old-fashioned like me, simply select the code you want to test, right-click, and choose Generate Tests. You can also use the /tests slash command in Copilot Chat or in your IDE to generate tests.

This is just the tip of the iceberg. There are many ways to interact with Copilot, and the more you use GitHub Copilot, the more you’ll discover them.

How Copilot helps in writing Unit Tests

In your first encounter with Copilot, you might think it's an autocomplete feature on steroids, except its suggestions are remarkably good and can complete entire lines or functions. It uses an AI language model (originally OpenAI Codex, a descendant of GPT-3) trained on public GitHub code to draw context from your code or chats and provide relevant suggestions, including unit tests.

How does this work practically?

Imagine testing a login form manually. It seems straightforward enough until the application scales and you have to repeat all the test cases (valid login, invalid login, empty fields, etc.).

To avoid this hassle, you can use Copilot to streamline the process on your behalf. And there is no shortage of options when it comes to automating unit test generation.

Suggesting Test Cases based on Function Signatures and Comments

Start by writing functions and comments with a well-defined purpose and functionality. Copilot will pick this up automatically and generate meaningful test cases without additional input.

Sample comment:

# This function takes two integers as input and returns their sum.
# It is a simple arithmetic operation commonly used in mathematical calculations.

From the above comments, Copilot will suggest a function that adds two numbers as shown below:

Function Add Two Numbers
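
A sketch of the kind of function Copilot typically suggests from those comments (the name `add_two_numbers` is illustrative):

```python
def add_two_numbers(a: int, b: int) -> int:
    # Return the sum of two integers, as described in the comment above.
    return a + b
```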

To accept a suggestion, press Tab, or use alt + [ and alt + ] to cycle through the alternatives.

GitHub Copilot will then provide unit tests for the function generated above, which you can run straight away.

Unit Test
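
A sketch of what Copilot commonly produces for such a function, here using the built-in unittest module:

```python
import unittest

# Assumes add_two_numbers from the sketch above is defined in the same module.


class TestAddTwoNumbers(unittest.TestCase):
    def test_positive_numbers(self):
        self.assertEqual(add_two_numbers(2, 3), 5)

    def test_negative_numbers(self):
        self.assertEqual(add_two_numbers(-2, -3), -5)

    def test_zero(self):
        self.assertEqual(add_two_numbers(0, 0), 0)


if __name__ == "__main__":
    unittest.main()
```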

Note: since it’s generative AI, your output might differ from the above.

Generating Unit Tests for Different Frameworks

Copilot supports various languages and testing frameworks but works particularly well with Python, JavaScript, TypeScript, Ruby, Go, C# and C++.

Below are code snippets of GitHub Copilot in action, generating test cases for different frameworks.

For the Python function below:

Python Function
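
As an illustrative stand-in, assume a small helper like this (`is_even` is a hypothetical example):

```python
def is_even(number: int) -> bool:
    # Return True if the number is divisible by 2, False otherwise.
    return number % 2 == 0
```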

Copilot generates the following PyTest test case:

PyTest Case
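
A sketch of the pytest-style tests Copilot tends to produce for it (assuming `is_even` from above is in scope and pytest is installed):

```python
import pytest


@pytest.mark.parametrize("value, expected", [(4, True), (7, False), (0, True)])
def test_is_even(value, expected):
    # Each case checks a different parity scenario.
    assert is_even(value) is expected
```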

Similarly, for Jest in JavaScript:

Jest

As you can see, it is easy to write unit tests for various frameworks without deep knowledge of each language. This lets developers focus more on functionality and logic, and it makes the testing process easier and faster overall.

Completing Assertions Intelligently

Copilot can automatically fill in assertions based on a function's expected output. If your function returns a list, dictionary, or specific value, Copilot will often supply appropriate assertions.

For instance, a function that returns a user dictionary:

User Dictionary
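
For illustration, assume a hypothetical `get_user` function along these lines:

```python
def get_user(user_id: int) -> dict:
    # Return a user record as a dictionary (hypothetical example data).
    return {"id": user_id, "name": "Alice", "active": True}
```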

Copilot suggests:

CoPilot Suggestions
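
A sketch of the assertions Copilot tends to fill in for that return value:

```python
def test_get_user_returns_expected_dictionary():
    user = get_user(1)
    # Copilot typically completes assertions matching the returned structure.
    assert isinstance(user, dict)
    assert user["id"] == 1
    assert user["name"] == "Alice"
    assert user["active"] is True
```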

Mocking Dependencies and API Calls

In situations where you have functions that rely on external databases or APIs, Copilot can help you generate mock objects and responses.

Example: Mocking an API Call with unittest.mock in Python

Mocking API Call
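
For illustration, assume a hypothetical `fetch_user_data` function that calls an external API via the requests library:

```python
import requests  # third-party HTTP library, assumed to be installed


def fetch_user_data(user_id: int) -> dict:
    # Fetch a user record from a (hypothetical) external API.
    response = requests.get(f"https://api.example.com/users/{user_id}")
    response.raise_for_status()
    return response.json()
```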

Copilot generates:

Copilot Option Mock Api
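
A sketch of the kind of test Copilot produces with unittest.mock, assuming `fetch_user_data` above lives in the same module (otherwise the patch target would change):

```python
import unittest
from unittest.mock import Mock, patch


class TestFetchUserData(unittest.TestCase):
    @patch("requests.get")
    def test_fetch_user_data_returns_json(self, mock_get):
        # Replace the real HTTP call with a canned response.
        mock_response = Mock()
        mock_response.json.return_value = {"id": 1, "name": "Alice"}
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response

        result = fetch_user_data(1)

        mock_get.assert_called_once_with("https://api.example.com/users/1")
        self.assertEqual(result, {"id": 1, "name": "Alice"})


if __name__ == "__main__":
    unittest.main()
```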

Best Practices for Generating Effective Tests with Copilot

Even though GitHub Copilot is an impressive tool that significantly speeds up writing tests, you still need a few tips and tricks to get the best out of it. Here are some of them:

1. Providing clear function docstrings for better test suggestions

When using Copilot, keep in mind that it doesn't code exactly the way you do. To consistently get high-quality test cases, make sure your functions are well documented so they guide Copilot effectively.

Example:

Document Guide
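
An illustrative example of a well-documented function (the `convert_celsius_to_fahrenheit` name and docstring are hypothetical):

```python
def convert_celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit.

    Args:
        celsius: Temperature in degrees Celsius.

    Returns:
        The equivalent temperature in degrees Fahrenheit.
    """
    return celsius * 9 / 5 + 32
```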

2. Be specific in your prompts about what you want to test

Don’t rely entirely on Copilot’s default suggestions. Instead, provide specific prompts and, if need be, throw in some examples; this reduces the chance of misinterpretation, which can lead to basic test cases that overlook edge cases.

Example of a weak test prompt:

“Write a test for calculate_discount”

Test Calculate Discount
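
A weak prompt like this tends to yield a single happy-path test, roughly along these lines (assuming the `calculate_discount` function used in the real-world example later in this article):

```python
def test_calculate_discount():
    # Only the happy path is covered; edge cases are missed.
    assert calculate_discount(100, 20) == 80
```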

Example of a strong test prompt:

“Write unit tests for calculate_discount. Include cases for 0% discount, 100% discount, a typical discount (e.g., 20%), and invalid discounts (-10%, 110%). Use pytest and check for expected exceptions”

Strong Test Prompt
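
A sketch of the fuller pytest suite such a prompt tends to produce, again assuming the `calculate_discount` function shown later in this article:

```python
import pytest


def test_zero_percent_discount():
    assert calculate_discount(100, 0) == 100


def test_full_discount():
    assert calculate_discount(100, 100) == 0


def test_typical_discount():
    assert calculate_discount(100, 20) == pytest.approx(80)


@pytest.mark.parametrize("invalid_discount", [-10, 110])
def test_invalid_discount_raises(invalid_discount):
    # Discounts outside 0-100 should raise a ValueError.
    with pytest.raises(ValueError):
        calculate_discount(100, invalid_discount)
```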

The above prompt yields a well-rounded set of test cases, unlike the former.

3. Reviewing and refining generated tests

Do you review your code? The same principle applies here. When coding with GitHub Copilot, remember that it generates code by finding patterns and learning from existing public code.

Although it may accelerate development by providing good suggestions, it does not understand the context of your specific project or business rules. Even with good docstrings and solid prompts, human inspection is needed to ensure logical correctness, edge-case handling, security, performance, and maintainability.

4. Combining Copilot with manual testing strategies

Every time Copilot generates unit tests, you might be tempted to sit back and accept every suggestion it throws at you. Don’t! Not only will you end up being over-reliant on Copilot, but you might also end up with generic test cases. To avoid this, complement Copilot-generated test cases with manually written cases.

Example:

When you start typing a function:

Manual Function
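
For illustration, assume you start typing a hypothetical `login` function like this:

```python
def login(username: str, password: str) -> bool:
    # Hypothetical login check against stored credentials.
    stored_credentials = {"admin": "Secret123"}
    return stored_credentials.get(username) == password
```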

Copilot will automatically suggest:

Copilot suggestion
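
A sketch of the kind of minimal, happy-path-only test Copilot might suggest here (assuming the `login` sketch above):

```python
def test_login_with_valid_credentials():
    # Only the happy path: no empty values, wrong passwords, or case checks.
    assert login("admin", "Secret123") is True
```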

If you look closely, you’ll notice a few issues in the above code: there are no tests for empty values, case sensitivity, or incorrect passwords. To improve this, we need to refine these test cases manually.

Take a look at the snippet below.

Test Snippet
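
A sketch of how those tests can be refined manually to cover the missing cases:

```python
# Assumes the login sketch above is in scope.


def test_login_with_valid_credentials():
    assert login("admin", "Secret123") is True


def test_login_with_incorrect_password():
    assert login("admin", "wrong-password") is False


def test_login_with_empty_username_and_password():
    assert login("", "") is False


def test_login_is_case_sensitive():
    # Usernames and passwords should not match when the case differs.
    assert login("ADMIN", "Secret123") is False
    assert login("admin", "secret123") is False
```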

Copilot’s Test Generation in Action: Real-World Scenario

Let’s look at a practical example in Python, currently one of the most widely used languages in the world.

Example: Generating unit tests in Python with Copilot

For this example, let’s use a simple Python function that calculates the final price of a product after applying a discount, then generate unit tests for it using GitHub Copilot.

Python Copilot Test
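
A sketch of such a function, written so that invalid discounts raise a ValueError, as the generated tests below expect:

```python
def calculate_discount(price: float, discount_percent: float) -> float:
    """Return the final price after applying a percentage discount.

    Raises:
        ValueError: If discount_percent is not between 0 and 100.
    """
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")
    return price * (1 - discount_percent / 100)
```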

Enter the following slash command in Copilot Chat: /tests create tests for calculate_discount.

Copilot will generate appropriate unit tests and you’ll see output similar to the one below (bear in mind that this is generative AI and your results might differ slightly).

Copilot Option
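
A sketch matching the behaviour described below, with one method for valid discounts and one for invalid ones:

```python
import unittest

# Assumes calculate_discount from the sketch above is in the same module.


class TestCalculateDiscount(unittest.TestCase):
    def test_valid_discount(self):
        self.assertAlmostEqual(calculate_discount(100, 10), 90)   # typical discount
        self.assertAlmostEqual(calculate_discount(100, 50), 50)   # half price
        self.assertAlmostEqual(calculate_discount(100, 0), 100)   # no discount
        self.assertAlmostEqual(calculate_discount(100, 100), 0)   # free

    def test_invalid_discount(self):
        with self.assertRaises(ValueError):
            calculate_discount(100, -10)
        with self.assertRaises(ValueError):
            calculate_discount(100, 110)


if __name__ == "__main__":
    unittest.main()
```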

From the above code, we can see GitHub Copilot creating unit tests for the calculate_discount function. Let’s break down the code to understand what’s happening.

The unit test class has two methods, one for valid and one for invalid discounts. The valid method covers various scenarios: a typical discount (10%), a 50% discount, a 0% discount (no discount), and a 100% discount (free). Here, Copilot uses self.assertAlmostEqual, which is appropriate for floating-point calculations where small rounding differences can occur.

The invalid method checks that invalid discounts raise a ValueError, using self.assertRaises(ValueError).

Modifications and Improvements

While these tests are quite good and work as expected, we can still make a few adjustments such as:

  1. Providing more edge cases.

We included a floating-point discount test to ensure proper handling of decimal discounts (e.g., calculate_discount(100, 12.5)), a large-price test to confirm the function handles large values (e.g., calculate_discount(1_000_000, 50)), and a small-price test to confirm it handles very low values (e.g., calculate_discount(0.01, 10)). These additions improve the tests’ coverage and robustness.

  2. Improving clarity and readability

We organized the test methods for readability and maintainability. test_valid_discount tests the function’s behavior given valid discount percentages, e.g., the additional cases we included. test_invalid_discount tests that a ValueError is correctly raised for invalid discount percentages, such as negative (calculate_discount(100, -10)) and above 100% (calculate_discount(100, 110)). This organization enhances test coverage and readability.

  3. Testing for correct error messages

We used assertAlmostEqual to handle floating-point precision differences and assertRaisesRegex to verify the exact error messages raised for invalid discount percentages, yielding accurate and consistent test output.

Assertion
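
A sketch of the improved test class incorporating the adjustments above (the error-message regex assumes the message used in the `calculate_discount` sketch earlier):

```python
import unittest


class TestCalculateDiscount(unittest.TestCase):
    def test_valid_discount(self):
        self.assertAlmostEqual(calculate_discount(100, 10), 90)
        self.assertAlmostEqual(calculate_discount(100, 0), 100)
        self.assertAlmostEqual(calculate_discount(100, 100), 0)
        # Additional edge cases: fractional discount, large and small prices.
        self.assertAlmostEqual(calculate_discount(100, 12.5), 87.5)
        self.assertAlmostEqual(calculate_discount(1_000_000, 50), 500_000)
        self.assertAlmostEqual(calculate_discount(0.01, 10), 0.009)

    def test_invalid_discount(self):
        # Also verify the error message, not just the exception type.
        with self.assertRaisesRegex(ValueError, "between 0 and 100"):
            calculate_discount(100, -10)
        with self.assertRaisesRegex(ValueError, "between 0 and 100"):
            calculate_discount(100, 110)


if __name__ == "__main__":
    unittest.main()
```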

Limitations & Challenges

GitHub Copilot has its limitations. Knowing how to navigate these challenges is crucial as you continue generating unit tests with Copilot.

Common pitfalls of AI-generated tests

One of the biggest challenges developers face with GitHub Copilot is ensuring it accurately interprets chat inputs. Achieving the desired results often requires a series of well-structured prompts. Even then, Copilot may struggle to retain context from previous interactions, particularly in complex testing scenarios.

At times, developers have reported receiving irrelevant or nonsensical code suggestions from Copilot. In some cases, it may even stop generating suggestions altogether, which can be frustrating, especially when you’re mid-task.

Like other large language model (LLM)-based tools, GitHub Copilot raises ethical concerns, primarily because it is trained on publicly available GitHub repositories. As a result, it may occasionally generate code that includes sensitive information, such as API keys, usernames, or passwords, if such data was present in the training set.

Always review Copilot-generated code for security flaws, such as hardcoded credentials or insufficient validation, as it could inadvertently introduce or amplify bugs and vulnerabilities, potentially exposing systems to threats that malicious actors might exploit.

When is Human intervention necessary?

When using Copilot to generate tests, keep in mind that it is simply an AI-powered coding assistant. It generates code by recognizing patterns from its training data, but it lacks true understanding of code semantics and cannot assess the quality of its output. In other words, if Copilot is trained on flawed data, its suggestions may also be flawed.

This is where human oversight becomes essential. Unlike Copilot, you have the ability to step back, evaluate the context, and make informed decisions. Your creativity and adaptability — skills that AI currently lacks — are crucial in ensuring the accuracy, efficiency, and reliability of the generated tests.

Copilot’s effectiveness across different languages & frameworks

Copilot is designed to support a wide range of programming languages and frameworks, but its effectiveness varies. This is largely due to its reliance on publicly available GitHub data for training.

The tool performs best with widely adopted languages such as Python, JavaScript, Java, and C#, where extensive training data is available. It delivers moderate results for languages like Ruby, Swift, Kotlin, and Go.

However, for less common languages like Rust, Haskell, and Perl, Copilot’s effectiveness diminishes, as there is less training data to draw from. The same limitations extend to their associated frameworks.

Conclusion

While code completion tools have been around for a while, GitHub Copilot elevates the experience to an entirely new level. By streamlining unit test creation, it transforms what was once a tedious, manual process into a faster, more interactive, and intelligent workflow.

However, Copilot isn’t a substitute for manual testing. Understanding when and how to use it is crucial. AI-generated tests can sometimes miss critical scenarios, overlook edge cases, or misinterpret complex logic. The key to writing effective unit tests lies in striking the right balance between Copilot’s automation and human oversight.

If writing manual tests feels like a challenge, Copilot can be a game-changer. But don’t just take our word for it — give the free version a try, explore its capabilities, and share your thoughts in the comments.
