
Charles Uneze

Utilizing Coverage AI Agents for Better Unit Tests

Artificial intelligence agents have improved developers' workflows over the last few years, and the release of Meta's paper on TestGen-LLM has been a game changer for unit testing.
CodiumAI, an AI startup, is at the forefront of the commercial application of TestGen-LLM with its new open-source tool. What I love about this TestGen-LLM implementation is that it keeps iterating until coverage increases. See below.

TestGen-LLM workflow (image)
Source: Automated Unit Test Improvement using Large Language Models at Meta
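
Conceptually, the loop looks something like this. This is only my pseudocode sketch of the paper's generate-and-filter cycle, not Meta's or CodiumAI's actual code; measure_coverage, llm_generate_test, tests_pass, and append_test are hypothetical helpers:

def improve_tests(source_file, test_file, target, max_iterations):
    # Measure the baseline before generating anything (hypothetical helper)
    coverage = measure_coverage(source_file, test_file)
    for _ in range(max_iterations):
        if coverage >= target:
            break
        # Ask the LLM for one candidate test (hypothetical helper)
        candidate = llm_generate_test(source_file, test_file, coverage)
        # Keep the candidate only if it runs, passes, and raises coverage
        if tests_pass(test_file, candidate):
            new_coverage = measure_coverage(source_file, test_file, candidate)
            if new_coverage > coverage:
                append_test(test_file, candidate)
                coverage = new_coverage
    return coverage

Candidates that fail or don't raise coverage are simply discarded, which is what makes the filtering trustworthy.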

CodiumAI’s Cover-Agent

Before coverage AI agents, Python developers would use the Coverage.py package or the pytest-cov plugin, while Go developers would run go test -coverprofile=... (see Go coverage profiles); they would then have to manually write tests to cover the lines missed in their code.
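
For reference, the manual Go workflow looks roughly like this (standard toolchain commands; the file name is just an example):

$ go test -coverprofile=coverage.out ./...
$ go tool cover -html=coverage.out

The second command opens an annotated source view in the browser, highlighting covered and uncovered lines.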
Today, tools like Cover-Agent do this automatically.

Demo

Below is a simple script that deploys AWS services using Python’s Boto3 library.

Installation
Follow this guide on the CodiumAI Cover-Agent GitHub repository to install the necessary packages.
The following packages are also required:

  • Boto3
  • Moto
$ pip install boto3 moto
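Cover-Agent calls an LLM under the hood (GPT-4o in this demo), so you also need an OpenAI API key exported in your environment, as described in the repository's README:

$ export OPENAI_API_KEY=<your key>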

Code to test
A simple production script that creates an EC2 instance, creates a DynamoDB table, and puts an item into the table.

File: aws.py

import time
import boto3

dynamodb = boto3.resource('dynamodb')
ec2 = boto3.resource('ec2')

def create_ec2_instance():
    # Define parameters for the instance
    instance_params = {
        'ImageId': '12345',
        'InstanceType': 't2.micro',
        'KeyName': 'key',
        'MinCount': 1,
        'MaxCount': 1
    }

    # Create the EC2 instance
    instances = ec2.create_instances(**instance_params)

    # Wait for the instance to be in running state
    instance = instances[0]
    while instance.state['Name'] != 'running':
        time.sleep(5)  # Wait for 5 seconds before checking again
        instance.reload()  # Reload the instance object to get the latest state
    # Return the instance object once it is running
    return instance

def create_dynamodb_table_and_put_item():
    """
    Create a DynamoDB table and put an item in it.
    """

    # Create a DynamoDB table
    table = dynamodb.create_table(
        TableName='AMI_Table',
        KeySchema=[
            {'AttributeName': 'id', 'KeyType': 'HASH'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'id', 'AttributeType': 'N'}
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )

    # Wait for the table to be created (this is important!)
    table.wait_until_exists()

    # Get the table resource
    get_table = dynamodb.Table('AMI_Table')

    # Put item in the table
    get_table.put_item(
        Item={
            'id': 1,
            'PK': 'Partition_Key',
            "REGION_AMI_ID": "ami-123",
            "AMI_State": "Create_AMI"
        }
    )

Below is another script that tests the production code. It has only one test, which checks that create_ec2_instance() leaves the instance in the running state.

File: test_aws.py

import boto3
from moto import mock_aws
from datetime import datetime
from aws import create_ec2_instance, create_dynamodb_table_and_put_item

@mock_aws
def test_create_ec2_instance():
    # Create the instance
    instance = create_ec2_instance()
    # Assert that the instance is in a running state
    assert instance.state['Name'] == 'running'

Check current coverage percentage
Below is how much coverage my current script has (see the pytest-cov configuration reference to understand what each argument means):

$ pytest --cov=aws --cov-report=xml --cov-report=term

Name     Stmts   Miss  Cover
----------------------------
aws.py      17      4    76%
----------------------------
TOTAL       17      4    76%
Coverage XML written to file coverage.xml

76% coverage. While that's decent, it isn't great.

Check the missed line in the coverage.xml file

These lines belong to create_dynamodb_table_and_put_item(), the imported function that was not executed during the initial test.

<line number="35" hits="0"/>
<line number="50" hits="0"/>
<line number="53" hits="0"/>
<line number="56" hits="0"/>

For a better visual, I love the coverage html command. Then I navigate to the htmlcov folder and open the aws_py.html file. See below:

File: htmlcov/aws_py.html (screenshot of the annotated source view)
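
If you prefer to stay in the terminal, pytest-cov can also print the missed line numbers directly via the term-missing report, which adds a Missing column to the coverage table:

$ pytest --cov=aws --cov-report=term-missing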

Now, you could go ahead and write tests for each missed line to increase the coverage; however, in a large codebase that quickly becomes exhausting.

Run Cover-Agent
I set the desired coverage to 77%, just above the current 76%, and let the AI push it further on its own; it ends up reaching 100%. See below:

$ cover-agent \
  --source-file-path "aws.py" \
  --test-file-path "test_aws.py" \
  --code-coverage-report-path "./coverage.xml" \
  --test-command "pytest --cov=. --cov-report=xml --cov-report=term" \
  --test-command-dir "./" \
  --coverage-type "cobertura" \
  --desired-coverage 77 \
  --max-iterations 5

cover_agent.main - INFO - Current Coverage: 76.47%
cover_agent.main - INFO - Desired Coverage: 77%
cover_agent.UnitTestGenerator - INFO - Token count for LLM model gpt-4o: 1930

Streaming results from LLM model...

Test passed and coverage increased. Current coverage: 100.0%

Some text from the output has been removed for readability.

A part of the output says:
Test passed and coverage increased. Current coverage: 100.0%
I was impressed, but I needed to be certain it wasn't hallucinating.

View the new test that increased the coverage

Any new tests that increase coverage are automatically added to the test file.

File: test_aws.py

@mock_aws
def test_create_dynamodb_table_wait_until_exists():
    """
    Test that the DynamoDB table waits until it exists before proceeding.
    This test ensures that the wait_until_exists method is called and the table is ready for operations.
    """
    # Create the DynamoDB table and put an item in it
    create_dynamodb_table_and_put_item()

    # Get the table resource
    dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
    table = dynamodb.Table('AMI_Table')

    # Assert that the table exists and is active
    table.wait_until_exists()
    assert table.table_status == 'ACTIVE'

Check if the test passed

$ pytest

collected 2 items                                                                                                                  

test_aws.py ..                            [100%]

============== 2 passed in 6.78s ===============

Some text from the output has been removed for readability.
And yes, both tests pass.

See the coverage report again

$ pytest --cov=aws --cov-report=xml --cov-report=term

Name     Stmts   Miss  Cover
----------------------------
aws.py      17      0   100%
----------------------------
TOTAL       17      0   100%
Coverage XML written to file coverage.xml

Some text from the output has been removed for readability.

View the list of all test results

To see the full results, view the test_results.html file in your browser.

Test results (screenshot of test_results.html)

Some of the tests in the table show as failed only because they did not increase coverage; that doesn't mean they aren't suitable test cases.

Why did only one function improve coverage?

Now, I expected four new tests, one for each line flagged in the initial coverage check. But this single new test, test_create_dynamodb_table_wait_until_exists(), does indeed contribute to coverage beyond just the lines it explicitly touches. Here's how:

  1. Line 35: table = dynamodb.create_table( The test doesn't assert on this line directly, but calling create_dynamodb_table_and_put_item() executes it, and the test then verifies that the table exists and is active, which implies the creation succeeded before the wait_until_exists() call. So, indirectly, the test ensures coverage for the table creation line as well.
  2. Line 50: table.wait_until_exists() This line is directly covered, as the test calls it as part of its own logic.
  3. Line 53: get_table = dynamodb.Table('AMI_Table') The test doesn't assert on this line either, but if it failed, the subsequent put_item() call on get_table would fail, causing the test to fail with it.
  4. Line 56: get_table.put_item( Like line 53, this line is indirectly verified because the test confirms the table is active and ready for operations, including putting items into it.
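
If you want assurance beyond the coverage numbers, a quick hand-written round-trip test confirms the item actually landed in the table. This is my own sketch, appended to test_aws.py (which already has the needed imports); it assumes the same eu-west-2 default region the generated test used:

@mock_aws
def test_put_item_round_trip():
    # Run the production code under moto's mock
    create_dynamodb_table_and_put_item()

    # Read the item back and check its attributes
    table = boto3.resource('dynamodb', region_name='eu-west-2').Table('AMI_Table')
    item = table.get_item(Key={'id': 1})['Item']
    assert item['REGION_AMI_ID'] == 'ami-123'
    assert item['AMI_State'] == 'Create_AMI'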

Why should you generate coverage reports?

Generating coverage reports is essential for several reasons:

  1. Code Quality Assessment: Coverage reports help in assessing the quality of your codebase by indicating which parts of your code are being exercised by tests and which are not. Higher code coverage generally indicates a more thorough test suite.
  2. Identifying Untested Code: It helps identify areas of your code that are not covered by tests, which may indicate potential bugs or unhandled edge cases, just like in the demo above.
  3. Code Reviews: Coverage reports can be useful during code reviews to ensure that new code changes are adequately tested and do not decrease overall coverage (see the example below).

Overall, generating coverage reports is a valuable practice for maintaining and improving the quality of your codebase.
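
To make the code-review point concrete: pytest-cov can fail the test run when coverage drops below a threshold, turning the report into a CI gate rather than just a dashboard:

$ pytest --cov=aws --cov-fail-under=90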

